Porting Smalltalk to the Amstrad CPC: some progress

Posted by pulkomandy on Tue Mar 19 21:15:18 2024  •  Comments (0)  • 

If it doesn't work, keep trying

My previous article on porting Little Smalltalk to the Amstrad CPC received some attention and support messages from other people, who either find it useful, or just interesting to read. I got a comment from someone who was happy that I am doing retrocomputing and not just retrogaming. I'll take that as a compliment and I think it is a somewhat common theme in the Amstrad CPC scene.

As you can probably guess from the previous post on garbage collectors, I am still thinking about this project and trying to figure out a way to make things work.

First, I noticed that I had made a mistake in my first attempt: by not looking closely enough, I assumed that the image file would be somewhat specific to the word size and endianness of the target machine, leading me into attempting to build the image directly on the CPC. A closer look at the code made me realize that this is not the case. The stored image format is generic and does not change depending on endianness and word size.

So, that means I can throw away my work on porting the image builder, and focus on the VM itself. But I was still curious why the built image was more than 100 kilobytes. Surely there is not *that* much code in Little Smalltalk?

I started taking a closer look at the image file in an hex editor. And I noticed something that I should have found much earlier, especially as it was hinted at in the Little Smalltalk book. The image contains not only the bytecode, but also the sourcecode to all methods. This allows Little Smalltalk to not include a disassembler. You can open the sourcecode in an editor, edit it as you want, and then recompile it to create a new bytecode. But the price to pay is a lot of used up space in the image. I think later I will write (or port) a bytecode disassembler (I think there is one in one of the later versions/forks of Little Smalltalk?). For now, I disabled the sourcecode generation in the ImageBuilder, and it resulted in a much smaller image of 49KB. Now that's a size I have at least some chance of fitting into the CPC RAM! So let's try that.

The porting of the Smalltalk VM to the CPC is pretty similar to what I did for the ImageBuilder. I compile the C code, set up a linker script, replace file IO with AMSDOS routines which are much simpler than POSIX, but should still be enough for our needs at least to load the initial image. I encounter similar problems in the image loading than in the image saving (too much stack usage by recursive functions) and apply the same workaround (moving the stack to a different place before calling the heavily recursive functions).

Anyway, I got it running. Sort of. And then I ran out of memory again.

Memory layout of the CPC

To understand the problem (and the next steps), it is useful to understand how memory is addressed on the CPC. First, remember that the machine uses a Z80 CPU, and so, it can address 64KiB of memory using 16-bit addresses. On the CPC this is organized as such (rounded to the next 256th byte for simplicity):

  • 0000-0100: Reset vectors and "low jumpblock" from the firmware. This is the handlers for the RST instruction, interrupts, etc.
  • 0100-A700: Available for applications.
  • A700-BFFF: Used by the OS. Variables from the disk ROM, indirections for various system routines that are actually implemented in ROM, etc.
  • C000-FFFF: video RAM

So we have about 42KiB of RAM free for our application. Initially I tried to fit both the interpreter code and the Smalltalk heap (and any variables needed for the interpreter) into this space, and it didn't work. I removed various parts of the interpreter, I replaced the garbage collector with a stupid one that can't free any memory (this particular idea being brought up by one of the papers on garbage collectors I read, apparently some people were using their LISP machines this way for performance reasons and it was used also as a benchmark of the fastest possible GC algorithm).

Anyway, even with just half an interpreter, I still was out of memory rather quickly. So it's time to find other approaches.

Mocing the code to a ROM

The Amstrad CPC OS is well designed to allow running code from expansion ROMs. These ROMs are mapped at C000-FFFF (overlapping the screen memory), and can easily call into the system ROM (which maps at a different address) and also call into other ROMs as neede (for example, the AMSDOS ROM for disk access).

To put our code in ROM, we need some changes to our linker script and CRT0. I based this on scripts I wrote earlier while making a ROM version of the Contiki operating system.

-i allinone.ihx
-b _INITIALIZED=0x0100
-b _CODE = 0xC000
-k /bin/../data/sdcc/lib/z80
-k /packages/sdcc-4.4.0-1/.self/data/sdcc/lib/z80
-l z80


The main change here is that _CODE is now at 0xC000, and we also add that _INITIALIZED is at 0x0100. _INITIALIZED is where the RAM area starts, the other RAM sections are put right after it. So now the linker will put our code in ROM, and our data in RAM.

Of course we also need to adjust the CRT0 file.

;; FILE: crt0.s
;; Generic crt0.s for runing code as a CPC ROM.

  .module crt0
        .globl  _main

        .globl _progend

        ;; Ordering of segments for the linker.
        ; Things that go in RAM
        .area _INITIALIZED

        .area _DATA
        .area _BSS
    .area _HEAP

        ;; Things that go in ROM
        .area _HOME
        .area   _CODE (REL,CON)
        .area _INITIALIZER (REL,CON)
        .area   _GSINIT (REL,CON)
    .area   _GSFINAL (REL,CON)

        .area   _CODE (REL,CON)

	    ;; The _CODE area starts with the ROM header.
		;; This has some version markers, a ROM type (background here, meaning
		;; it will not immediately take control of the system during boot),
		;; And a table of "RSXs", that is, commands that can be called from
		;; BASIC or ASM code by name.
        .db #1  ; Background ROM
        .db #4  ; Version 4.0
        .db #0
        .db #0
        .dw rsx_table

		;; The first routine is called during boot, in our case we do nothing
		;; at that point. The second is invoked by the name defined in the
		;; table below (|ST in our case).
        jp do_nothing
        jp init


        .db 'T' + #0x80
        .ascii "S"
        .db 'T' + #0x80
        .db #0

; This is the program entry point. Execution starts here
;; Initialise global variables, clear BSS areas
;; Note: not used in Smalltalk, I removed all initialized variables.
;; Probably still a good idea to clear the BSS part so all global variables are
;; initially set to 0...
;       ld      bc, #l__INITIALIZER
;       ld      a, b
;       or      a, c
;       jr      Z, gsinit_next
;       ld      de, #s__INITIALIZED
;       ld      hl, #s__INITIALIZER
;       ldir
; Clear BSS sections
;       ld hl,#s__DATA
;       ld (hl),#0
;       ld de,#s__DATA + #1
;       ld bc,#s__DATA + #0x200
;       ldir

; Initialize disk ROM. We do the simplest thing here and just initialize ROM 7.
; It wouls be more compatible to save the currently active drive and see if
; there are other ROMs to initialize.
        ld hl,#0xabff
        ld de,#0x40
        ld c,#7
        call 0xbcce

        ; Init the heap. Do it after the disk ROM is ready, because that may change HIMEM
		; Not needed: no malloc/free is used.
        ; call __sdcc_heap_init

; Enter the C main.
        call    _main
        ; TODO figure out how to return to BASIC cleanly (we probably erased a bit much)
        jr endless


That's all we need, we can now run our C code from ROM. If it fits there, that is.

Making the code a bit smaller

At this point I had about 20-24K of code. A ROM is 16K, and, for now, I don't want the headache of switching between different ROMs for different parts of the code (maybe later, if I can't fit it all in one single ROM). So, I commented out more code from the interpreter and garbage collector and started simplifying some things. There is no difference anymore between static and dynamic allocations. All integer math code is removed as well as most other things. Also a big way to save space was replacing printf with simpler custom functions that don't do much custom formatting. I have way to print 16 and 8 bit values, NULL terminated strings, and "Pascal" style strings with a length known in advance. That is good enough, and it saved several kilobytes to not need printf's % parsing, decimal integer formatting, etc.

After this, I could finally try to run my VM again. And... it ran out of memory.

I guess that makes sense. Even after moving the VM code out of the way in a ROM, I still get only 42K of RAM free. And that has to hold not only the Smalltalk image data, but also all variables needed by the interpreter, the AMSDOS file buffers, etc. Well, it doesn't fit.

But, if you know a bit about the CPC, you have noticed the existence of the CPC 6128, a machine with twice as much RAM as what I have explained so far. How does this work? Well, the CPU cannot access any more RAM directly. So, there is a special hardware register to map extra RAM pages. There are various ways to use this, they are somewhat handcrafted to run CP/M Plus, a modified version of CP/M that was shipped with the machine and put the user code all in expansion memory, allowing applications up to 63K of RAM usage, with the video memory, and all system internals remaning in the main 64K of RAM.

Anyway, I decided to do something simple, and the simplest way to handle this memory in a way that's friendly to the CPC firmware is to map it by blocks of 16K at addresses 0x4000-0x7FFF. In my previous article about garbage collectors, I considered some more advanced options, but in the end I decided to keep things as simple as possible: all the Smalltalk heap and objects will live in this 4000-7FFF space, and I can keep the remaining parts of RAM (0100-3FFF and 8000-A7000) for the internal needs of the interpreter and GC (which will both need some variables and structures).

Of course, this create a new difficulty: the C compiler isn't designed to work with such memory banks. So, I will have to explicitly map the right memory page before accessing these variables. There are again multiple ways to handle this (so many choices...). For example, I could use a 16-bit address referring to a position in the 64K of extended RAM. But that would prevent having anything more than 64K of extended RAM. There are memory expansions for the CPC, and if just loading the base image already requires nearly 50K, I should rather prepare for use of one of these.

So, I will instead use indirect pointers. The idea is that pointers to objects can be represented as such:

struct indirection {
        unsigned char bank;
        struct object* ptr;

The "bank" here is the value to send to the Gate Array (the Amstrad CPC memory management chip) to map the correct RRAM page. For now we have to know about the vlaues C4, C5, C6 and C7 (these allow access to the 64K of extended memory) as well as C8 (this will put the 4000-7FFF from the base RAM back where it was).

The code to dereference an indirect pointer is like so:

__sfr __banked __at(0x7F00) GateArray;
// __sfr means this is accessed with the IN and OUT instructions of the Z80, instead of RAM access instructions
// __banked means the address is still 16bit (normally for IN and OUT it should be 8 bits, but manufacturers of hardware using the Z80 didn't respect this)
// __at(0x7f00) forces the variable to be at that address
// This means we can now write:
// GateArray = xx;
// And this will generate the appropriate assembler code:
// LD BC,0x7f00
// LD A,xx
// OUT (C),A

struct object* DEREF(struct indirection* x)
	    // Get the pointer first, in case the indirection is itself stored in
	    // the currently active bank (I didn't use that in the end, so maybe I
	    // can simplify this)
        struct object* o = x->ptr;
        GateArray = x->bank;
        return o;

I then modified the Smalltalk object structure to contain "indirections" instead of pointers to other objects. And I also added a lot of calls to DEREF all over the code of the interpreter, to map the correct memory page before accessing anything. As a first step, I didn't try to be smart about it (well, of course I did, then I had a lot of bugs, and I stopped trying to be smart, and then it worked better). So, I made a rule to not have any direct object* pointer stored anywhere in the interpreter code, only indirection* and calls to DEREF over and over. This will not be fast, as every other instruction will be switching to a different memory bank (or uselessly activating a bank that is already activated). But let's try to make it work this way first.

Starting from here, I have two choices. I can store indirections directly inside objects, or separately. Basically, one of these two:

// Indirection stored directly into the object
struct object {
        uint size;
        struct indirection class;
        struct indirection data[0];

// Indirection stored separately
struct object {
        uint size;
        struct indirection *class;
        struct indirection *data[0];

As you probably guessed by the fact that I named this structure "indirection", I went with the second option. The reason for this is that it is much easier to code when object sizes are in multiple of 2 bytes, and not in "multiples of 3 + 2". And also because you would have to be very careful to store the destination pointer before connecting the new bank, if the indirection itself was stored in a bank.

So, now, I have a big table of indirections living somewhere in 0100-3FFF. This table is filled with pointers to objects in the banked RAM. It creates a limitation to about 5000 objects in the system (16K/3). I have not hit this limit yet, but the initial image is already about half of that. We'll see how that goes.

This indirection also may simplify work for the garbage collector: it allows to easily move objects from one address to another, since all references to an object will be through the indirection table: just update the indirection table entry, and you're done. I think the garbage collector will benefit from that simplicity of not having to crawl through all other objects to find references to the one being moved.

Another advantage, mainly for debugging, is that there are now a lot of constraints on what a valid pointer is. An indirection must have an address in the 0100-3fff range. The pointed object must have a bank in the C4-C8 range, and a pointer in the 4000-7FFF value range. I have added some checks to my DEREF function to validate this. If the code attempts to dereference an invalid reference, execution stops with an error message.

Finally, these contraints also mean we have several unused bits, both in the pointers and in the indirections themselves. These are useful to store extra information. In indirection pointers, the high bit being set means that the value is not an indirection, but a direct 15-bit integer value. In indirections, unused bits in the bank or in the pointer fields can be used by the garbage collector to track which objects have already been garbage collected (the original code also did similar things, but using the size field in the objects themselves for the gc, and the lower bits of the pointers to mark integers, relying on the fact that allocations are aligned to 2 bytes and so for pointers that bit is always 0).

So, after all these changes and a bit of debugging, we're getting somewhere. The full image can be loaded into banks, and then execution starts. Unfortunately, it seems to be confused after a short time, returning to the wrong place after executing a "block2 (blocks are similar to lambdas in other programming languages).

So, I have more debugging to do. I have to keep a balance between adding more debug statements in the code, and not overrunning my budget of 16K of code in a single ROM. For now, I do not want to leave C and start rewriting things in assembler, because it would make it harder to change things and try new ideas. I hope I can delay that until the code is mostly running and there are at least some parts I'm sure won't change anymore. Then I can rewrite or hand optimize these first, and save enough space for debugging the other parts. Also, working in the ACE emulator has been very helpful, setting breakpoints in strategic places allowed me to find and fix many of the problems.

I continued debugging, identifying and fixing various issues. For example, if I use the DEREF function twice in the same line of code, like so:

DEREF(x) = DEREF(y);

that doesn't work right, because the = operator is not a sequence point. That means the bank activations for each side will not happen in the right order for things to be read and written where they should. So this must always be done through a temporary variable, which ensures proper sequencing.

Other mistakes include too many or not enough calls to DEREF, or simply writing code that doesn't behave the same way as the original while adding the DEREF calls. Eventually, I got it to run! Well, sort of.

Running out of space (again)

One of the very first things the Smalltalk image does is creating the 256 "Char" objects for each of the ASCII characters. Each of them is a full-blown object, using up 6 bytes in the heap. This saves a lot of special-casing in the VM. But, the loop to create them ends up creating a lot of objects and filling up my indirection array. Eventually the indirection array grows into the 4000-FFFF space, overwrites some objects there, and things crash badly in various ways. So maybe this indirection was not such a great idea. Or maybe I need to start thinking about the garbage collection, and see how many entries in that array it would manage to free. But the problem with that is, at the moment, I have less than 128 bytes left in my ROM, so a GC is not going to fit. Unless I start removing debugging traces and optimizing some code in assembler. Which I don't want to do until I'm sure the indirection table approach is working.

Well, that's all for today. See you in the next article in the series...

Ideas for a garbage collector for memory-banked systems

Posted by pulkomandy on Thu Feb 29 21:20:52 2024  •  Comments (0)  • 

Well, apparently I'm not ready to give up on Smalltalk for the Amstrad CPC yet.

The previous attempt showed that it is not possible to fit even a rather simplified Smalltalk interpreter into the main memory of the CPC. I guess this comes as a surprise to no one, since there are 16KB used by the display, about 8 used by the OS, and so only around 40K left for everything else.

It turns out, others have tried this and somewhat succeeded. Pocket Smalltalk for the Palm Pilot promizes to fit a complete game (Dungeoneers) in a similar space (I have not managed to run the game and have no idea if it is good). The Palm is a bit quirky in the way it works, but also, Pocket Smalltalk abandons the idea of editing things directly in the live Smalltalk image, instead it is designed as a cross development environment, more similar to a Java virtual machine in a way. That's why I had not considered it in my porting attempts. Well, maybe I should revisit it.

There is also TerseTalk, but that seems a bit too limited for me (only 256 objects?) and also not implemented yet (the code is extremely incomplete)

On my CPC, I currently have 2MB of memory, so, surely there would not be any problems running a larger version of Smalltalk. Maybe even Smalltalk-80 could be made to work somehow. IT will probably be slow. But would it even be possible?

The Z80 CPU cannot access all this memory linearly. Instead, all we get is a 16KB window that can be moved to any part of that 2MB space. This is one of the annoying things to handle with native code. Some people choose to keep their work running on the CPC 464, and not use any of this extended memory, to save the headaches. (Not attacking or ridiculing anyone here, it is a valid choice and there are other reasons for it). Anyway, I got thinking about how the abstractions of Smalltalk could make it possible to handle this memory mapping scheme transparently for the developper.

So I started to read about garbage collectors. This post is an attempt to ogganize these thoughts and see if we can come up with a design that will have acceptable performance. I will continue updating it as I continue thinking about it...

Garbage collection basics

First let's talk about the general idea of dynamic memory allocation. The basic need is simple: an application is running and it wants to use memory for temporary storage, and release it later when not needed anymore, so it can be reused by something else.

One way to handle this is a very simple API like the one available in C: malloc() and free(). The application tells the memory allocator when it needs a new block of memory (nidicating how large the block should be), and gets a pointer to that block. It returns the block to the allocator when it does not need it anymore.

This seems as simple as it gets. Just two functions. But, it turns out, the job of the memory allocator is quite complicated. While the application keeps track of the currently allocated memory blocks, the allocator is responsible for keeping track of the ones that are not in use. This will tipically be managed with a "free list" of all the blocks that are not currently allocated. This can be implemented in a quite simple way, but there can be tricky problems, for example if the application forgets to free some block, or even worse, if it tries to free the same block multiple times. And even if the application has no bugs, this approach has another limitation: as long as a block is allocated and in use, it is not possible to move it in memory to a different address. This is because the application has a direct pointer to the block, and the allocator does not know what the application is doing with it.

The idea of garbage collectors is to solve some of these problems. From the API side, the problems all come from the free() function. It is hard to mis-use the malloc function. At worst, the application asks for more memory than is available, and gets an error instead of a valid pointer. The free() function however, can be misused, by calling it with addresses that were not returned by malloc, by calling it twice with the same address, by forgetting to call it, by trying to free blocks where information stored by the memory allocator has been erased, etc.

So, let's remove that function! Problem solved. This is the end of the article. Thanks for reading.

Oh wait. Of course that introduces a new problem. How is the allocator going to know when a piece of memory can be reclaimed and put back into the free list? Garbage collectors solve this problem by analyzing the content of the memory block. This, of course, requires some level of cooperation from the application. The general idea is that the garbage collector can access the data structures stored in the allocated blocks, and identify if they have references to other blocks. A block that is not referenced by anything else is unreachable by the application, and so, its memory can be reused.

The trick now is thinking about an efficient way to find such unreferenced blocks. But, since the garbage collector knows how to access the application memory and identify references in there, it may also write this memory and update the references to point to some new place. This gives us a new tool: it is now possible to move existing blocks in memory to a different place. As we will see, this opens a lot of new possibilities. The obvious one is that, after a while, with a typical malloc/free allocator, the memory becomes fragmented, and there are only very small pieces left. There is no easy way to get out of that situation. But with a garbage collector, you can. This makes garbage collection well suited to small memory systems, where fragmentation will quickly be a problem.

Establishing our invariants

I have read some texts about garbage collectors, and most of them tend to present algorithms starting with the earliest ones, and introducing the new ideas gradually. This is a good idea, but I think I will make a change still: first I want to introduce some terms and explanations. The thing is, each of the algorithm is described in its original research paper with its own terminology, especially for the earlier ones. This, for me, makes it harder to think about how these algorithms can be combined together, which part of one can be reused in another, etc. We will need this, because most of the research on garbage collectors is done with much larger machines in mind, and no banked memory (at least the ones I could easily find. I am interested if you have papers or algorithm descriptions for something using banked memory).

The abstraction I found very useful is the tricolor invariant. It formalizes a bit what we need to do. Let's have a brief look at it.

The situation is that we have a set of objects in our memory. These objects have references (pointers) to each other. What we need to do is, starting from a root object or set of objects, determine another set of objects where we are sure that, by following any number of references, it is not possible to reach that second set starting from the first one.

If the root set is correctly chosen, this will allow to determine a set of objects that cannot be reached in any way from the application code. These objects are unused, and their memory can be reused.

So, how does a garbage collector goes about doing that? In very general terms, and without giving any specific algorithm, we can establish a solution that is proven to work. This is exactly what the tricolor invariant is.

What it does is attribute a color to each object in the system. There are 3 different colors. It could have been red, green and blue, but due to the technical limitation of printing technologies at the time, most litterature uses black, white and grey instead. I will stick with these to keep things simple.

That's the "tricolor" part explained. Now let's talk about the invariant. An invariant is a condition that must always be verified. The invariant is:

A black object must never have a direct reference to a white object.

At first this does not seem helpful. But it is the key to understanding all garbage collection algorithms. Let's look at our 3 set of objects now:

  • There is a set of black objects. They may reference grey ones and other black ones, but never white ones.
  • There is a set of white objects. They may reference anything.
  • There is a set of grey objects. They may reference anything as well.

So, because of the invariant, if we want to reach an object from another, we have a constraint (which is essentially a different way to express the invariant):

All paths from a black object to a white object will at some point travel through a grey object

Now, let's consider a special case. What if there are no grey objects in the system? In this case, because of the invariant, we are sure that there are no references from any of the black objects to the white ones.

I think you can see where we're going with this: if the "black" set is the objects (potentially) reachable by the application, and the grey set is empty, we are sure that the white set is made of unused objects, which can be deleted.

So, we can now redefine the job of the garbage collector as a two step process:

  • Step 1: colorize all objects such as the invariant is verified
  • Step 2: modify the coloring of objects to try to reduce the number of grey objects to 0
  • Step 3: when there are no grey objects, the remaining white objects are unused and their memory can be reclaimed

We have to eliminate one edge case: the stupid garbage collector that just makes all objects white. Then the invariant is verified, there are no grey objects, and so everything can be deleted. Simple, but not very useful. So, the initial setup in step 1 must have either one black or one grey object. Usually this is called the root: the object(s) that allow to reach all others. In Smalltalk, that is the object references currently in the application stack.

As I have written, this does not describe any specific algorithm just yet. Garbage collectors have to decide on various things, including: how to remember the color of nodes, how to assign the initial colors, and how to process the nodes to reduce the number of grey ones.

malloc/free, again

Having established all this theory, let's look at the practical example of a typical malloc/free allocator.

Initially all memory is unreachable by the application. We can consider this a "white" memory block.

Every call to malloc takes some memory and makes it available to the application. Since the content of that memory is not yet filled by the application, it cannot at this time contain any reference to anything in the application, and even less any references to any other "white" (unused) area of memory. So, we can safely mark this block as black.

free() is used to return memory to the application. In theory, we don't know if the application still has references to that block in any other part of its "black" memory. So, we should mark this block black or grey. But, since we have no way to know what the application is doing, we don't have any algorithm to make this block white again. So, we have to trust the application that it is correct, and mark the block white.

It seems difficult to make anything simpler. But, if we look closer, we notice two things: first, it is not as simple as it looks. malloc() is given just a size, and it can't magically find a block of "white" memory where that fits. So, there will be some data structure to do that. We will see later how garbage collectors solve this. For a typicall malloc/free implementation, this will need to store a pointer or size at the start of each black memory block, and also at the start of white blocks, but since these are not used by the app, it's not a problem.

The second problem is more important, it's the way the invariant is maintained true. Essentially, we have delegated our job back to the application. This is what makes malloc/free an unreliable solution: the application developper probably does not want to think about memory allocation too much, and surely they will occasionally get it wrong. That's the problem we are trying to solve here. So, let's move on to better options.

The garbage collector in Little Smalltalk

Let's see how Little Smalltalk handles this. According to the comments in the sourcecode, this is the Baker two-space algorithm. It is relatively simple.

First let's look at the way new objects are allocated. We start from the first available byte in memory. At every allocation, we increment a pointer by the size of the allocated object. That's it. Objects are allocated next to each other this way, until we reach half of the available memory. Then, the garbage collector starts working.

As we know, it must perform its job in three steps.

Step 1: colorize all objects such as the invariant is verified

All objects in the garbage collected area are first considered white, except the root, which is grey.

Since there are no black objects, the invariant is verified.

Note: the root can be either the first object in the memory, or it can be stored outside of the memory in some separate storage (for example, in Smalltalk, the execution stack is the root and is not typically a dynamically allocated object).

Step 2: modify the coloring of objects to try to reduce the number of grey objects to 0

Get the next grey object.

Analyze each reference in that object. Mark all the referenced objects as grey if they are still white.

When done, mark the object black.

Continue with the next grey object.

Eventually, there will be no more grey objects and the algorithm is complete. In the worst case, this stops when all white objects have been reached and marked grey, and eventually marked black after processing.

So far, this is how you would obviously implement such an algorithm. The trick is in how the algorithm remembers which objects are white, grey of black.

In Baker's two space algorithm, the "black" and "grey" objects are moved to the second half of the memory, while the white objects remain in the first half. This is pretty interesting, because it means all the information needed is already stored in the object addresses. No additional data structures at all are needed. Pretty impressive. But that comes at the cost of not being able to use half of the memory. Maybe that's a bit much.

Other algorithms instead reserve 1 or 2 bits in the objects to store the current color (only 1 bit of grey objects are remembered by some other means, for example, storing their addresses in a stack). That's likely more efficient, especially if you can find some available space for that extra bit. For example, if you have a 16-bit size field in objects, you could easily use one of these bits for the color. No one is going to fill the entire RAM with a single object, that would be stupid. So the size won't ever get that large.

Anyway, there is another problem with that general algorithm. We started by making almost all objects white. Since white objects have to be moved to a new place in memory, that means eventually moving all "live" objects into a new place, and also updating all references to these objects in other objects. If we do this for a large memory (say, 512KB or 2MB) at once, this is going to lock the program from running for quite a long time. Incremental garbage collectors solve this by making only one subset of objects white. But it is not so simple to do that, since we must make sure that the invariant is established (no black objects point to white ones).

Specificities of the Amstrad CPC

One thing we need to keep in mind is the CPC memory is addressed in banks. This means we can only access 16K at once (well, not exactly, but let's keep this simple). Is there a garbage collector taking this into account?

There does not seem to be a lot of research on such things. Which makes sense: when garbage collectors for larger memory became a reasonable thing to do, it was at least on 16 or even 32 bit CPUs, and possibly with virtual memory, and even hardware acceleration to help the GC on some LISP machines. All entertaining reading, but not really what I need.

Note: after writing the first version of this article I found about this article from Lieberman and Hewitt, which is quite similar to the solution I ended up with. It turns out, there probably is more research then you think about these things, it's just not immediately obvious where to find it.

So, I have to get back to the basics and consider how I can tweak them for my need. The idea is to work one 16K page at a time. When the page is full, do a garbage collecting cycle on it, so that we don't leave too many unused objects behind. Then, move on to the next 16K page and start over. Hopefully, we can then have one "in use" page with the recently allocated objects, some of which will have a short life and be garbage collected as soon as the page is full. And we have only a limited number of accesses to objects in other pages.

But, how do we implement that? Since we don't want to go through all of the memory, we can take some inspiration from more complex garbage collectors.

The algorithm can be used to do a complete garbage collection of all the system. This is done by first making all objects "white" (this is easy: just create any new object with the marking bit set to the "white" value). Then, start from the "root" (the references to objects in the program stack), and mark all referenced objects as grey. Then one by one, take one grey object, add all its referenced objects to the grey list, and so on, until there are no more grey objects.

When you are done, all allocated objects are now "black". Then, the memory can be accessed directly (page by page, not using object references), and the available space can be reused for allocating new objects. If you are used to C style memory allocators, you would then start adding all this free space to some kind of free list or something like that. But that's a bad idea: you are using memory to keep track of your free memory. With a garbage collected system, what you can do instead, is move objects around so that the remaining "black" objects are all tightly packed at the start of memory. Then, allocations simply consists of appending new objects at the end. Since each object has a field indicating its size, you can then use that info to figure out where the next object in memory is, and allocations are just incrementing the "last object" pointer by adding the size of the object to it.

Of course, when moving objects around, you must update references to them. In the two spaces method, this is done by putting a special marker at the old place where the object was, indicating its new address. As objects are moved to the new space, they can access this info. Once everything is moved to the new space, the old space is empty and has only unneeded redirections. But again, doing it this way would require going through all memory.

The tricolor scheme is better than that. One interesting thing is, if you read the above carefully, at some point I said "first, make all objects white". But this is a stronger requirement than what we really need. We have 1 condition that we must satisfy: black objects must not point to white objects directly, only to other grey or black ones. What if, initially, all our objects are black instead of white? This also verifies that condition, and the algorithm can run from there.

This opens some new possibilities. It means we don't have to perform garbage collection on all the RAM everytime. If we manage to build a small set of white objects, it can be scanned quite fast and without having to go in all the memory banks. For example, let's start by making all objects initially black. Then, we create a new object (in the currently active bank) and make it white. Surely we will store a reference to this object somewhere soon, and we will store it in another object. Since all our other objects are black, we are breaking our invariant condition here (we would have a black object pointing to a white one). So, we have two solutions: make the new object grey, or make the old one that points on it grey. Both of these preserve the condition. Making the new object grey is not very interesting, however: quickly, all our new objects will all become grey, then be considered by the GC as still used, and never deleted. So we must take the other option.

When we decide to run our garbage collection (for example, because the working memory bank is full), we know that it can only delete objects from the current bank (the only ones that can be white at this time). The "grey" list of references was built as we modified any other objects in other banks. We can check these objects, but we can optimize that check: since all banks other than the current one contains only grey and black objects, we only need to check references to the current bank. All other references would either point to a black object (that has already been scanned), or a grey object (that will be scanned already). We will not learn anything by checking these. And so we can quickly scan the current bank, find if all objects in it are still referenced, and then compact it and start over. We can do this until there is only one free 16K bank left. Then, we need to do a complete GC pass on all the RAM, that will take a longer time, but it will clean all remaining objects that became unreferenced after they were pushed out of the current bank.

Note an important thing here: the current bank always ends up being full, then garbage collected and the remaining objects moved elsewhere. There is also a lot of back and forth between this current bank and all the others during the gc phase (but also probably during running the Smalltalk interpreter). So, I suggest the following layout:

  • 0000-3FFF: The "current bank" with the newly allocated objects (sharing space with other things if needed: interrupt handlers, stack, global variables for the interpreter)
  • 4000-7FFF: In main RAM, the screen memory. Overlaid with memory banks being accessed when the VM is running. Just map the bank out when printing something to the screen.
  • 8000-FFFF: Interpreter and GC code, with the AMSDOS variables in the middle.

So we maintain two pointers for memory allocations: one in the 0000-3FFF area for freshly created objects. And one for the 4000-7FFF area in banks, for the longer lifetime objects. Once the 0000-3FFF RAM is full of objects, we do a small GC cycle of if to take care of the recently created and short-lived objects, then we move the remaining things to banks, and we start over. Until there are no banks left. Then, one big GC cycle to compact all the banks (if we ever manage to fill that much RAM, that is). We may even save ourselves the trouble for the "big GC": instead of compacting the RAM in place, if you run out of RAM, just save your session to disk. This is like a GC but still-referenced objects are written to disk instead of another RAM space. Then reload it. That's it, you just defragmented your whole memory!

One problem with this system remains: Little Smalltalk normally represents objects using 16-bit "cells" (a cell can be either: a pointer to another object, the object size, or a direct integer value. The VM uses the low bit to distinguish integers from pointers (pointers are always to an even address; integers are shifted by 1 and the low bit is always set for example. We could change this and use the highest bit instead, since all our pointers to Smalltalk objects would be in 0000-7FFF). But we have to store info in all pointers/references to know which bank it is pointing to. This would mean switching to 24-bit cells. This makes everything more memory hungry by 50%. Or, we could keep the internal pointers as 16 bit values, and have an indirection table somewhere to find the right destination. For example let's say the interpreter code is now moved to a ROM, so we can do this:

  • 0000-3FFF: The "current bank" with the newly allocated objects (sharing space with other things if needed: interrupt handlers, stack, global variables for the interpreter)
  • 4000-7FFF: In main RAM, the indirection table. Overlaid with memory banks being accessed when the VM is running. Just map the bank out when printing something to the screen.
  • 8000-BFFF: AMSDOS and firmware stuff, variables for the interpreter if needed.
  • C000-FFFF: In main RAM, screen memory (write-only). In ROM, the Smalltalk interpreter code.

So we have 16K bytes indirection table. If each reference needs 3 bytes, that's enough for 5461 objects. Maybe the table can go up to 9FFF, and then we have enough for 8000 objects. That would be enough to fill the 2MB memory if each object is exactly 256 bytes. Is this a reasonable compromise? I think objects will typically be smaller than that. So maybe using 24-bit cells is better. But then, the indirection table also simplifies things when moving objects around since you can use the table to redirect to the new address of an object when it is moved. Otherwise, you have to update all objects which are pointing to it. Not necessarily a big deal, this can be done at the same time as scanning the grey objects: look at the grey object, see that it references something white in the "new allocations" area, immediately move the white thing (if not already moved) to a more persistent storage in extended memory, and update the grey object with the new address.

Well, all of this easier said than done. I guess I have to write an implementation now?

Porting Smalltalk to the Amstrad CPC: a failed experiment

Posted by pulkomandy on Sat Feb 24 12:38:04 2024  •  Comments (0)  • 

This week project was an experiment with porting Smalltalk to the Amstrad CPC. It didn't work, but it was interesting. So here are my notes.

About Smalltalk

Smalltalk is an object oriented language developed at Xerox PARC. It started in 1969 but the first release outside of PARC was in 1980, and was in fact the third major rewrite of the language.

Smalltalk was part of the "Dynabook" project, which was a research project about imagining the future of computing. The vision for this was a small personal tablet computer (hardware somewhat similar to an iPad), where you could run your own heavily customized system. Of course, this was way out of reach of 1970s technology. But that didn't stop the team. They built an "interim" machine to be the first step towards this: the Xerox Alto.

The Alto is a personal computer, but with a large CRT display and not portable at all. It implements the instruction set from the Data General Nova, an earlier machine that has a rather simple CPU, on top of a microcoded system that allows to reuse the same logic for also driving the display, implementing the disk controller, and a few other things.

The early versions of the hardware didn't have a lot of RAM, but since the project was about designing the future of computing, that was a problem. So, Smalltalk runs in a virtual machine, implemented in software, that implements virtual memory: most of the data is stored on disk, and loaded on-demand when the Smalltalk software needs it, and all of this, without the user/programmer having to think about it.

Why port it to the Amstrad CPC?

The CPC was one of the first computers I had access to. I still own one (several in fact) and still use it from time to time, as well as developping hardware expansions for it. The initial machine has 64K of RAM, later expanded to 128. But mine has a 2MB expansion card connected to it. However, the machine runs a Z80 CPU, which can only access a total of 64K of memory. As a result, extended memory is accessible in banks of 16K, one of which can be accessed by the CPU at a time. There are no languages that can handle this automatically: the BASIC shipped with the machine was never updated, and the SDCC compiler can do it, but in a bit limited way.

Anyway, the BASIC on the CPC is a rather good implementation, but it is still BASIC: not very fast, and not very well suited to structured programming. And C requires a cross-development setup, where the code is written and compiled on another machine, and the CPC is used just to run it. That's a bit sad, the machine should be able to work on its own.

There aren't many languages that can fit in these 64K of memory. Well, you can go with Pascal, and in fact there are pretty good versions of it for CP/M. You can go with C, but the Z80 CPU is not very well suited to a typical C compiler, and so the generated assembler code is messy and not efficient, and, more importantly, there is no hope for a good C compiler running directly on the machine (that would be possible with a simpler CPU or maybe if it was more reasonable to use the memory banks).

Moreover, my goal here is to have a language for quick prototyping and experiments. When I need the full performance of the machine, I'm happy with using a C compiler or even hand optimizing some parts in assembler. But this is a bit annoying to do on a larger scale. Essentially, I want something acting as the main interface for the machine, just like BASIC does, but with a more modern language.

This reduces the set of candidates to: BASIC itself, stack based languages such as Forth and LISP (which would both be good choices but are a bit too different from what I'm used to), maybe something like Lua or Wren (but not the existing implementation of Wren, which is based on floating point numbers and NaN tagging, not really suitable for the Amstrad. And there is Smalltalk.

The Smalltalk from Xerox as it was released in 1980 is interesting. The main interpreter could fit in a dozen or so kilobytes of code. That's just the amount we have on a CPC ROM expansion bank, so it could fit in the place occupied by the BASIC. However, Smalltalk is not just a language, it is a complete system, with a GUI, overlapping windows, a built-in debugger, and so on. Well, at least it would be interesting to see all of this running on the CPC, and how slow it would be. But maybe not as a practical thing to use, at least not as a replacement for the BASIC. The Smalltalk world mainly continued from there, and today you can find Squeak and Pharo, which are direct desendants of this version of Smalltalk, but keep getting larger and larger, and are probably much more than the CPC can handle.

Little Smalltalk

So, I had this idea of writing a Smalltalk VM for the CPC in the back of my head for a while. Lately, I looked again for existing Smalltalk implementation that I could maybe use as a starting point. You know, start from working C code, even if slow and inefficient, and gradually rewrite things out in assembler until I have something usable.

These searches brought me to Little Smalltalk. This is a dialect of Smalltalk, not quite compatible with the standard Smalltalk-80. The syntax is very similar, with a few minor differences. But, maybe more interestingly, the GUI environment and built-in debugger and all that stuff is not available. This is a command-line Smalltalk, initially designed for use from a simple serial terminal on a UNIX timeshared computer.

There are four versions of Little Smalltalk, each improving in the previous one and making the code simpler and nicer. Since this is rather simple C code, and it compiles to a reasonably small executable, it was not unreasonable to attempt porting it to the CPC. So I started hacking.

Little Smalltalk is a bit different from Smalltalk-80 in that it is not a completely abstract VM specification. For speed, it retains the pointer size of the host machine and uses that as its main type. With this and a few other tricks, and the removal of the GUI parts, it should be quite simpler than Smalltalk-80. Simple enough that I could attempt running it in a few evenings of work.

I started by compiling all 4 versions on my PC, and looking at the code size for each. Version 4 is indeed the smaller one, so I started with that. It also is the only one not using K&R style function prototypes, which is good for me, because SDCC (the C compiler I'm going to use) doesn't implement those.

From the start I noticed that the "image" size (the Smalltalk bytecode) was too large to fit in RAM, but I thought this was because of building it for a 64bit machine in my tests, and the final version would be considerably smaller.

A note on running C on the Amstrad CPC

There are various tutorials for this out there, but the ones I found were slightly out of date. So, I'm going to share my own files here, and explain what I needed to change in the Little Smalltalk implementation to get it running as much as I did (not enough for anything usable).

C application startup

If you know something about C, you know that execution of a C program starts with the main() function. Well, that isn't entirely true. Before this can be run, the code should make sure that all global and static variables are initialized to the correct values. In SDCC, this is implemented by the GSINIT code segment. To make sure this code is run before everything else, we need a little bit of assembler code, usually called CRT0. SDCC ships with a CRT0 meant for their own Z80 simulator platform, but that is not quite suitable for the CPC. So, here is my version suitable for the CPC.

  ;; This is the CRT0 module. Declaring this will make sure it shows properly
  ;; in .map files and other places.
  .module crt0

  ;; Note that the main function is a global that can be found elsewhere.
  ;; The assembler then knows how to emit a reference to it for the linker to
  ;; handle. Note that C functions are translated to assembler labels with an
  ;; extra _ in front of them.
  .globl  _main

  ;; The output file starts with a HEADER section, that is not loaded. In theory
  ;; we could put a standard AMSDOS header here, but I chose to do that with a
  ;; separate tool instead.
  .area _HEADER

  ;; Then we have all the sections with read-only code and data
  ;; GSINIT and GSFINAL are for the initialization code run before main (see
  ;; explanation below where they are filled in).
  ;; HOME and CODE is for the code and constants
  ;; INITIALIZER and INITIALIZED are for global and static variables initialized
  ;; with some value in the code. The GSINIT code should copy them from INITIALIZER
  ;; (in ROM) to INITIALIZED (in RAM) so they have the right value. In our case,
  ;; all the code and data is loaded in RAM, so we don't really bother with that.
  ;; DATA and BSS are the RAM areas used by the software. BSS should be cleared
  ;; to 0 during program startup.
  ;; Finally there is the HEAP, this is the area used for dynamic allocations.
  ;; This could be SDCC's malloc, but it is not a great implementation, in our
  ;; case we will use a custom and simplified allocator that can't free any memory.
  ;; Each compilation unit (.rel file) can add data independently to each of these
  ;; sections. Then, the linker concatenates the section from each of them
  ;; (for example, all the GSINITs together) to generate the final output file.
  .area   _GSINIT
  .area   _GSFINAL
  .area   _HOME
  .area   _CODE
  .area   _INITIALIZER

  .area   _DATA
  .area _BSEG
  .area   _BSS

  ;; Now that we have defined all sections and in what order we want them (well,
  ;; in theory, but there seem to be other things messing with this order currently),
  ;; we can start filling in these sections with data or define symbols that point
  ;; to important parts of the memory, so these symbols can be accessed by the
  ;; C code (there will be examples of this later in the article).
  .area   _HEAP
  ;; Define a symbol for the "heap" start. This is the first address not used by SDCC
  ;; generated code, so anything above that is free for our own needs. Our simplified
  ;; memory allocator will make use of it.

  .area   _GSINIT
  ;; All code that needs to be run at global init (before entering main) is inserted here by SDCC
  ;; Each compilation unit will generate little bits of code for each thing that needs initialization
  ;; (for example if you initialize a global or static variable with some complex expression).
  .area   _GSFINAL
  ;; The GSFINAL area is inserted after the GSINIT one. Usually nothing else gets added here, but
  ;; it could be used for example to implement atexit() or similar (code to run after main has
  ;; returned).
  ;; So, after the GSINIT code is done, we can call into the main function. We use JP to do a tail-call
  ;; optimization, the main function will return directly to the caller (to the BASIC interpreter for example).
  jp      _main

  .area   _CODE
  ;; Finally we need some code in the CODE segment. This will be put at the start
  ;; of the output file at 0x100 (see the linker script). For simplicity (since SDCC
  ;; does not generate the AMSDOS header for us), we put the entry point at that address.
  ;; TODO; it would be even better if the GSINIT and GSFINAL code was already at 0x100,
  ;; then we could directly have the program entry point pointing there.
  ;; Another option is to have GSINIT and GSFINAL at the end of the code, and overlapping
  ;; with the HEAP area. Since the init code is run only once, this allows to have this code
  ;; erased after it is run and not needed anymore. Maybe I will write about this
  ;; trick in another article later.
        jp gsinit

  ;; We also define the exit() function to be an infinite loop, because our code
  ;; expects that to be available.
        jr _exit

So, we have a CRT0 file now. How do we use it? Well, we tell the linker about it using a linker script. Here is the one for the ImageBuilder tool (more on what that is in the next section).

-i imagebuilder.ihx
-b _CODE = 0x100
-k /bin/../data/sdcc/lib/z80
-k /packages/sdcc-4.4.0-1/.self/data/sdcc/lib/z80
-l z80


This generates a file named "imagebuilder.ihx" in Intel hex format. It also forces the _CODE section to start at address 0x100 (this is where we want to load our program). It then declares the Z80 standard libraries, the linker will use these to provide implementations of standard functions (printf, malloc, strcpy, ...) if they are used. And finally, we have our two object files, the CRT0 and the output of the C compiler.

Finally we can tie this all together with a simple build script:

sdcc -c -mz80 --no-std-crt0 --opt-code-size --max-allocs-per-node 100000 imageBuilder.c -I../source
sdldz80 -f lst4.lk
makebin -p -o 0x100 < imagebuilder.ihx > imagebuilder.bin
hideur imagebuilder.bin -o ../IB.BIN -t 2 -x "&100" -l "&100"

The first two steps are compiling the .c file, and linking it using the above linker script. This result in a file in intel hex format, but we need a binary. The makebin tool is used to perform the conversion. Finally, we use hideur maikeur to add an AMSDOS header setting up the load address and entry point. As a result, our executable can be run directly from BASIC. That's all, we're up and running!

Porting ImageBuilder

ImageBuilder is a tool that parses a text description of smalltalk classes and methods, and converts it into a binary smalltalk "Image" containing class and method descriptions, and the bytecode to run for each of the methods. I need to run it on the CPC because the image format is platform specific (mainly, the class and object headers depend on the native pointer size).

Besides the things described in the chapter above, I made a number of changes to the ImageBuilder to get it to run. I believe these can be useful to anyone attempting to port C code to the CPC, so, here is the details of what I changed.

Console output

Of course, we want to see what the program is doing. This is traced by printf statements througout the code. SDCC provides an implementation of printf, but it needs to know how to send the characters to the screen. For this, we need to use the firmware BB5A function. This function takes a parameter in A that is the character to print.

That's a good thing, because that is exactly the calling convention used by SDCC. So, we can just tell SDCC that this function exists at that address. We do this with a little function pointer trick.

#define bb5a ((void(*)(char))(0xBB5A))

What does this do exactly? Well, we define a macro called "bb5a" (this could be any name we want to give to the function). This macro takes the value 0xBB5A, and casts it into a function pointer, to a function that takes a char argument and returns nothing. That's all SDCC needs to know to call that function. Now, when we want to print a character, we can do this:


Unfortunately, that is not enough to get printf to work. SDCC follows the C standard a bit too closely, and needs a putchar function that takes its parameter as a 16-bit value. Also, the code uses \n for newlines, while the CPC needs \r\n. We can take care of these two problems by defining our putchar function this way:

int putchar(int c) {
       if (c == '\n')

File I/O

The ImageBuilder needs an input and an output file (the sourcecode and the generated bytercode image, respectively). The original code is written for UNIX, using FILE* and the corresponding standard APIs. These are not available on the CPC, but instead we have AMSDOS, which, in this case, provides a good enough support for file access. But, it is not designed for accessing it from C, and a few tricks will be needed.

First of all, we need to initialize the disk ROM. On the CPC, when a program is started with the RUN command, the system is de-initialized, and all ROMs are turned off. This means any attempts at disk access would instead try to target the tape drive. Well, technically that would work, but it's easier to use some modern mass storage (in my case, it is an SD card connected to the Albireo mass storage expansion). So, we need a function to re-initialize the disk ROM.

static inline void amsdos_init(void)
               LD HL, #0xABFF
               LD DE, #0x40
               LD C, #7
               CALL #0xBCCE

I kept this extremely simplified. Normally, this function should also save the current drive number for AMSDOS and restore it after reinitializing the ROM. It should probably not use hardcoded values for HL and DE, but determine where it is safe to put the AMSDOS buffers. Maybe it should init all the ROMs instead of just ROM 7. Anyway, for the ImageBuilder I just wanted to get up and running quickly. So, I call this from the start of the main() function.

Next, we need to open a file for input. Unfortunately, the firmware function does not accept its parameters in the SDCC calling convention. SDCC puts the first 8bit parameter in A, the second parameter in DE, and then everything else on the stack. The CAS_IN_OPEN firmware vector needs the filename length in B, the filename pointer in HL, and a temporary work buffer in DE. So, only DE is at the right place. However, by looking at the generated code, we can see that, to push the filename pointer to the stack, SDCC is also accidentally putting it in HL. I decided to abuse this since it made my life a little easier. So we end up with the following:

static char amsdos_in_open(char fnlength, const char* buffer, const char* const fn)
       // A = filename length (we need it in B)
       // DE = buffer
       // on stack = filename (but it is also in HL by accident)
               ld b,a
               ;pop hl
               call #0xbc77

       // TODO error handling
       // Popping is taken care by SDCC
       return 0;

While we're at it, the function to open a file for output is pretty similar:

static char amsdos_out_open(int fnlength, const char* buffer, const char* const fn)
               ld b,a
               ;pop hl
               call #0xbc8c

       // TODO error handling
       return 0;

And we can also add the functions to read and write characters, and close the files when done. These match the SDCC calling convention, so, we handle them like the BB5A function from earlier:

#define amsdos_getc ((char(*)(void))(0xBC80))
#define amsdos_putc ((void(*)(char))(0xBC95))
#define amsdos_in_close ((void(*)(void))(0xBC7A))
#define amsdos_out_close ((void(*)(void))(0xBC8F))

Finally, the original code also uses fgets to read the input file line by line. AMSDOS doesn't natively provide that, but we can easily provide an implementation. We just need to read the file until there is a newline, and make sure to add a NULL terminator after it.

static char amsdos_gets(char* buffer)
       char* fin = buffer;
       for (;;) {
               // FIXME test carry instead for returning
               *fin = amsdos_getc();
               if (*fin == '\n') {
                       *++fin = 0;
                       return 1;
               if (*fin == '\0')
                       return 0;

This is not really correct, the end of file is not properly detected (in fact, that's a problem from the way amsdos_getc was implemented here). But the work never got as far as reading the file until the end anyways.

Now we can implement our file IO. Note that in AMSDOS there are no file descriptors like in UNIX, there can be only one input and one output file opened at the same time. Fortunately, our needs are even simpler: the ImageBuilder first reads all of the input, closes it, and only then writes the output. So, all that's left to do is declare a working buffer for AMSDOS:

static char amsdos_buffer[2048];

And then we can replace all the calls to fopen, fgetc, fgets, fputc and fclose in the sourcecode.

Memory allocations

After that, we try to run the program and... it immediately fails after trying to allocate some memory. It turns out, SDCC only provides 1000 bytes of memory to use for malloc() by default. That is, of course, ridiculously small. Changing that requires manually recompiling the memory allocator. I have no idea why they made it that way. Anyway, for the vast majority of allocations, the ImageBuilder never frees them. The idea is to keep everything in RAM until the final image can be generated. This means we can use an extremely simple allocator:

static void* memalloc(int size)
       extern char heap_start;
       // Make sure heap is aligned since the low bit is used for non-pointer stuff
       static char* ptr = (char*)(((unsigned int)(&heap_start) + 1) & 0xfffe);

       char* allocated = ptr;
       ptr += size;

       return allocated;

This allocator works by having a pointer to the start of the free memory, that is simply incremented whenever we need a new allocation. Where is the "start of free memory"? Well, remember in the linker script we declared a "heap_start" label that points there. So, we just need to grab the address of that label. So we declare it as an extern variable, and take its address. We then make sure it is an even address. This is because I suspect some things in Little Smalltalk use the lowest bit for storing a flag, so all allocated addresses must be even.

The code also makes use of strdup, the implementation from SDCC would still want to use its own malloc, so let's replace that as well:

static char* mystrdup(const char* in) { // keep heap aligned int len = strlen(in) + 1; if (len & 1) len++; const char* out = memalloc(len); if (out != NULL) strcpy(out, in); return out; }

Here as well, we take care of keeping the heap aligned. Otherwise, there is nothing special about it.

Finally, there was one place where malloc/free was used, that is in the ClassCommands methods that does something like this:

	char *class, *super, *ivars;

	// Some code here ...

	class = strdup(...);
	super = strdup(...);
	ivars = strdup(...);

	// More code here using those strings...

	free(class); free(super); free(ivars);

This is easily replaced with the following:

	char class[16], super[16], ivars[83];

	// Some code here ...

	strcpy(class, ...);
	strcpy(super, ...);
	strcpy(ivars, ...);

	// More code here using those strings...

And that's all, we can now run the ImageBuilder!

... Or can we? (Running out of memory)

Well, it starts parsing and compiling things, and it seems to work. But after a while, it crashes. Fortunately, I'm running this all in the ACE CPC emulator, which provides a good debugger and allows me to watch the CPU registers in realtime. I see that the code seems to be using quite a lot of stack space, and the SP register (the stack pointer) ends up way lower than it is allowed to be. The CPC firmware only allocates a relatively small space to the stack, since it didn't expect me to run complicated C code like this, with recursion and use of stack buffers to store strings.

That's not really a problem, we can move the stack somewhere else. But where? there is now a lot of things going on, and we don't really have much space to fit it all. Of course, it would be convenient if we could use the display RAM. On the CPC, this uses 16K of RAM, or 1/4 of the available space. Can we put the stack there? Of course there's the problem that we can't use that space for both the stack and for displaying messages. So, let's try to do it only while the parser function with the big recursive calls is running. And we can restore it to its normal location when we are done with the complicated parsing code. Let's add this to the parseBody function:

// Declare a new global variable where we can temporarily store the stack pointer
int savesp;
	// When entering the parseBody function:
	// Disable interrupts (so the firmware doesn't notice us doing strange things with the SP register)
	// Save the old value of SP to a global variable
	// Point SP to the screen RAM
               ld (#_savesp),sp
               ld sp,#0xffff
	// When exiting the parseBody function:
	// Restore the stack pointer to where it was
               ld sp,(#_savesp)

OK, it is a bit crazy to do this without telling the C compiler about it. But here it works because this function has no parameters and no local variables. And we do these things just at the right places so the compiler does not attempt to access the stack by other ways (such as by using the IX register as the frame pointer).

Now, we can see some glitches on screen while the parser is running, but at least it doesn't erase some important things in memory while running. And this gets us a little bit further into parsing the file...

But how far, exactly?

Unfortunatly, it does not get us far enough. With this setup, and tuning the compiler for optimizations a bit (that's the --max-allocs-per-node setting you saw earlier in the build script), we end up with about 16K of RAM available for the heap. I made some tweaks to other parts of the code, reducing the size of some memory buffers that didn't need to be so large. This heap fills rather fast as the parser creates more and more object declarations and compiles bytecode for several methods from the imageSource definition file. Unfortunately, we run out of memory after compiling barely 10% of the input file. So, a rough estimation is that we'll need 10x as much memory to get through the complete file. There's no way that's going to fit into the Z80 address space. Even if I removed most of the firmware, even if I used all the screen RAM. And then, the resulting image file would still be around 100K. So, the conclusion is that this is no match for the space efficiency of the Amstrad CPC BASIC, unfortunately.


Do I give up on running Smalltalk on the CPC? No, not really. But, it will need a Smalltalk VM that can manage banked memory. That's doable, as I have said earlier in the article, in the Xerox Alto they even did it by swapping things to disk, so, surely, memory banks are not a problem. But, if I'm going to do that, I may as well use the fact that I have 2MB of RAM, and so I can run a complete Smalltalk-80 image. I don't know how ridiculously slow that will be. But in any case, there is little gain from constraining myself to Little Smalltalk. If you're going to need hundreds of kilobytes of RAM for something, you may as well throw in a full-blown GUI in it.

Should I try some other language someday as a possible alternative to BASIC? Well, I'd love to. Ideally the requirements for that would be to use only 16K of code (fitting in a ROM), and maybe a few hundred bytes of RAM, leaving as much as possible free for the user. I think I would be fine with needing to write scripts in a text editor (such as Protext) and not have to use the way it is done in BASIC. Or maybe just a REPL where one can define custom functions and so would be fine. In Little Smalltalk, the editing of functions is left to an external text editor, and when the text editor returns, the resulting text is parsed and turned into bytecode. I think that's an interesting idea. Well, I'll keep looking at other options...

IFF catalog add-on for Haiku Locale Kit

Posted by pulkomandy on Sun Oct 15 15:58:38 2023  •  Comments (0)  • 

I made this as part of my work on porting ACE to Haiku. ACE originates from MorphOS, and so it uses "catalog" files for internationalization in the format used there. The format is based on the IFF container typical on Amiga originated software.

Since the Haiku version of ACE uses pretty much exactly the same strings as the MorphOS version, it would be silly to have to re-translate all the strings. So I made this add-on to Haiku locale kit to handle catalogs in the MorphOS format.

At the moment this is not useful to any other software than ACE. However, maybe someone will find it useful. The format is a bit different than what is done in Haiku. In the Haiku locale kit, the original (usually English) string is used as a key for translations. This requires hashing the whole string, which makes everything a bit slower. It isn't very noticeable on modern machines, but it is definitely part of what makes Haiku slower than BeOS. On the other hand, the format used on MorphOS assigns an integer to each string, which can be used to refer to it efficiently. It still provides fallback strings inside the executable, so if there is no catalog, things will still work.

Maybe we could use this format more on Haiku and get better performance. But it would require a lot of changes to the tooling built around the existing format. Maybe a project for later...

You can get the iffcatalog sourcecode:

If you want to simply use it, the binaries are available in Haiku package manager.

Version history:

  • 0.1 - 04/2015 - First public release
  • 0.2 - 11/2017 - Skip '\0' in translated strings (used for keyboard shortcuts on MorphOS)
  • 0.3 - 06/2020 - Fix catalog search directory
  • 0.4 - 02/2022 - Add a python script to generate locale_strings.h from catalog definitions
  • 0.5 - 10/2023 - Remove dependency on libtextencoding, which is not safely usable in a Haiku add-on unless the host app links with it as well. Fix a bug in the locale_strings.h generator with handling of escape sequences.

Personal notes on the smallweb

Posted by pulkomandy on Sun Oct 15 14:10:04 2023  •  Comments (0)  • 

Lately I have taken part in some fediverse discussions about the "small web". As you probably know, many websites these days are bloated. They will load several megabytes of resources. I remember surfing the web as a kid, I would download some games and shareware, and I would be annoyed when one of them would be larger than 1MB, as that would take a while to download. Now, a lot of webpage (just the webpage, not the downloaded software!) are already 10 times larger than that.

Yet, they do not seem to offer much more than they did 25 years ago. A web page mainly shows text with some pictures where relevantm or just to make it look nice. So, what happened? Well, I don't really know. If anything, my website is getting simpler and smaller over the years. Early versions had MIDI music and a Java applet emulating the Start Wars opening scroll on the landing page.

Anyway, so, the "small web". It seems this idea started gaining traction in the last few years. It comes from multiple directions: retrocomputing is trendy, and, well, it always was, but now, there are decidedly retro machines that had their commercial life during the early days of the worldwide web. So there is demand from retrocomputer users who want to browse the web like in 1996 or so. Another nostalgic aspect is the "what if?" exploration of Gopher, an early competitor to the web that largely lost to it. And there is also the concerns caused, I guess, indirectly by climate change, of building a more sustainable tech that's less power hungry. Surely, simpler websites and web browsers could be part of that, as well as being genuinely useful to people who have a limited internet access for various reasons (living in very remote areas, not willing or being able to pay for a super fast connection, or being in a place where the infrastructure just isn't there for many possible reasons).

One thing that is somewhat succesful in that domain is the Gemini protocol. This is inspired by Gopher but made more modern and a little less quirky, for example, it requires SSL where Gopher was mainly plaintext. Unlike HTML, it gives very limited control on the text formatting to people writing webpages. As a result, it is quite easy to write a Gemini browser, even one working in a command-line, and indeed there are quite a few of them. I had only a brief look at it, and decided it doesn't work for me for various reasons (this is my personal reaction to it, not a complaint or so, I'm sure other people are happy with these choices and that's good for them). The SSL requirement makes it unreachable for the 8-bit retrocomputers I care about. The limited formatting is often not enough for what I want to do with the web, and likewise, I find the lack of cookies, URL parameters and the like a bit of an extreme measure. I understand the idea of enforcing privacy, but I am not convinced it is worth the limitations that result.

But, my main problem with it is: it is new technology. There is a new protocol, a new page formatting language, new specifications, and new software to write. I get it, a lot of people will find these new things exciting, it's fun to write software and specs from scratch, without having to care about legacy quirks and past mistakes. This is not how I usually think about things. As I learnt in school, a good software developer is a lazy software developer. So, my approach is often trying to make most use of what's already available. In this case, this means the existing web browsers. That's a good idea because we already have a lot of software, a lot of people who know how to write webpages, and a good understanding of the problems and limitations. And there are also users (people browsing the web) who won't need to be convinced to install yet another piece of software on their computers. Can we achieve that? I don't know, but I find that idea a lot more interesting to explore than the "what if we erased everything we have and started over?" approach of Gemini.

So, I took a look at the existing specifications for the web. My goal here is to see how to make websites that are "light" (a webpage should be a dozen to a hundred kilobytes or so), fast to load, and yet allow websites to have some visual personality. The latter can't really be done in Gemini and I think it's an important part of the web: allowing everyone to have their own space, and really make it reflect who they are. There will be the wall of text just using the default formatting. The cute page with colorful text and animated GIFs. The red-text-on-black-background crazy conspiracy theory website. And so on.

From the technical side, let's try to limit ourselves to the technologies already widely supported by web browsers. Or, to word it differently: I don't think the bloat of the web is a technical problem to be solved by throwing more tech at it. Instead, it's just a matter of making better and more efficient use of what we already have. The core set of technologies is in fact quite reasonable. On the protocol side, we have a choice of at least HTTP 1.0 and 1.1, possibly HTTP 2, possibly some other more experimental protocols. I'm not sure HTTP 2 complexity is really useful for such small websites, 1.1 will probably be good enough.

More interesting is the content format. Here, we can review the different available options. The oldest one is HTML 3, which does not have CSS and mixes up content and presentation. This is not great by today's standards and does not really make things simpler. The newest one is HTML 5, which is in a kind of "rolling release" mode and keeps changing all the time. Personally I find this to be a problem, I would prefer a fixed set of features.

So, that leaves us with HTML 4. Which turns out to be really interesting, because it is from a time where mobile phones were starting to go online, and so, there was a simultaneous demand for richer webpages on desktop machines, and really simple ones for then limited mobile phones and other similar devices. Also, it is from a time where the web attempted to move to XML with XHTML 4. This seems to still be controversial: XHTML is a much stricter syntax, while HTML 4 retains the more flexible way of working of HTML3 and later HTML versions. Basically, whatever you write in a text document, a web browser can parse it as HTML. It is very tolerant on unclosed tags, weird nesting of things (putting a paragraph inside a table inside a list item), and so on. XHTML, on the other hand, is allowed to reject a page as invalid XML. Well, it doesn't really matter anyway, existing browsers already accept both, so we can use both depending on the context. If you are writing a static website generator, you can probably make it generate valid XHTML. If you are manually editing webpages or allowing your users to do so, maybe HTML will provide a less frustrating experience? I'm not sure about that, in my early days of writing HTML and Javascript I think I would have preferred to have clear error messages instead of a browser that always did something, but rarely exactly what I wanted.

Anyway, with both XHTML and HTML4, one interesting aspect that the w3c worked on is modularity. You can pick a profile like XHTML Basic, which gives a set of recommendations for which tags should be used or not. The selected set seems reasonable for my use of the web, it does not seem very constraining to me. Likewise for CSS, you can easily decide to limit yourself to "level 1" or "level 2" features. Or at least, you can make sure your website "degrades" to an acceptable rendering on browsers that support only these, while making use of more advanced features on browsers that can handle it.

Finally, we have to talk about javascript. Javascript is not a great language. We know why that is: it was designed in a very short time with requirements along the line of "Java is trendy, but objecto riented programming is too complicated, can you make us a version of Java without objects? We need it in two weeks". As you expect, the result is not great, and the language did get objects later on anyways. Well, unfortunately, we don't really have a choice. So, my approach to this is to limit Javascript usage to the minimum possible, and make sure my websites work well on browsers that don't parse Javascript. But I still find it useful, for example for client-side pre-validation of forms. It is useful to have immediate feedback on required fields in a form, and not discover problems only after you have submitted it. That kind of thing.

Also, Javascript allows to set up things like "AJAX" (OK, no one calls it this way anymore), basically, you can have dynamic webpages where just one part of the page is reloaded or generated by Javascript code from a loaded network resource. Sure, this makes the browser quite a bit more complex, and the webpage will need some processing. But it doesn't necessarily have to turn into megabytes of bloat. In fact, if used well, this can be very efficient in terms of bandwidth, since the whole webpage does not need to be reloaded. So, I don't like the idea of banning Javascript. Just make its use minimal, and, where it makes sense, have a fallback for browsers without Javascript.

Finally, one thing that is a bit unused in the current web is RSS feeds. Or, more broadly, the idea to expose website data in a format that's easy to parse and process by applications. I thnk this is a path that deserves to be explored more. Not only RSS, but also more generally exposing any type of data in a well structured XML, and maybe using XSLT to process it into other formats. This is well supported in most web browsers, and certainly could get more use. I like the idea of websites exposing data as XML that can also be easily downloaded and processed elsewhere (by some other website, or by a "native" app on the user's computer). This is kind of a tangent to the small web, but maybe there's an opportunity to build something new and exciting here.

So, what will I do, then? Well, not much. This website should be XTML-1.0 basic compatible and uses no Javascript, because it doesn't need any more than that. It does use some modern CSS features, but even with CSS disabled, you can read the text in the articles and use the website in an acceptable way. I hope this inspires other people to do the same. Maybe if more and more websites are built like this, the bloat of the big modern web3 ones will stand out more, and people will notice and complain about it?

Down the rabbit hole: I just wanted to write a videogame

Posted by pulkomandy on Sat Jul 1 17:02:59 2023  •  Comments (0)  • 

It is one of these weeks where nothing goes as planned.

Rabbit hole level 0: I wanted to write videogames

For some time now I've been part of a team trying to better document the VTech V.Smile console and make it easier to write games for it. They contacted me because I had some experience (and blog articles) about other VTech hardware.

The current efforts are documented in my VTech wiki. Most of the work was already done several years ago: datasheet and schematics were found, hardware was documented, games were dumped, emulators were written. But there was no documentation and no opensource tools to build new games, or at least, nothing quite production-ready. The only option would be the compiler suite provided by the CPU manufacturer (this is a custom CPU core, used in a few other game consoles).

So, after writing the documentation in the wiki, I started to experiment with writing an assembler and compiler. I initially started looking into vasm and vbcc, because my experience with these in the past had been rather good. The developers are helpful and the code is understandable by me and designed to make adding more CPU architectures easy.

Rabbit hole level 1: I need a C compiler

I quickly ran into problems with vasm, however. The CPU in the V.Smile is a purely 16-bit thing, which means it can't address individual bytes. While vasm has some support for this in the code, it was never used, and in fact, does not work. I discussed this with the vasm developers, and the solution they suggested is that all addresses in the assembler code should be prefixed with some special character, and the assembler frontend can multiply or divide them by two as needed.

I looked into that, but decided it would make writing assembler code more complicated and annoying than needed, as there is a risk of forgetting the marker and suddenly having your address all wrong.

On vbcc side, I did not have much problems, the porting guide is very complete and there is not too much work needed to get a basic version of the compiler running. But, without an assembler, it is not very useful. I did some experiments with Mikke Kohn's naken_asm, which has support for the UNSP CPU used in the V.Smile, but it is a simple assembler that can only directly generate a final binary. It has no support for temporary .o files and a linker. So in my tests I had to let the compiler generate assembler files and not assemble them, and also generate them in a way that they could be concatenated together at "link" stage before being assembled into a binary.

I got this to work for simple cases, but it is not great to work this way.

Rabbit hole level 2: I need an assembler and linker

I let the project sit for a while (I think it's been about a year?) hoping that someone else would do it(tm) or I would find a more suitable assembler somehow. I looked into ASXXXX, but this looks somewhat limited and not super easy to port.

So, eventually, I decided, if I'm going to port somethign not super easy, I may as well go for the Real Thing, and port GNU binutils. My research showed me that there is a porting guide, even if it's fairly short. And I think I have spent enough time doing low-level stuff (compilers, assemblers, wriing linker scripts, baremetal programming on AVR and ARM) that this should be within my reach. And so I cloned the git repository and started following the guide.

After just a few hours, I had something compiling and generating various executables: assembler, ar, objdump, etc for my architecture. I don't expect any of these to actually work, I started by just filling in empty functions and adjusting the buildsystem to get it to compile all things. The idea is then to run each of them, find what doesn't work, and add the missing functions as I go.

Binutils comes with a test suite, so I thought I would start by running that, look at all the failing tests, and fix them by adding bits of the code for my port, looking at how it's done for other CPUs.

Rabbit hole level 3: running the binutils test suite

This doesn't look too complicated: install the needed software, run "make check", investigate and fix bugs, and repeat.

So I went ahead and installed DejaGNU and expect which form the base of the testing framework. I then ran "make check" and… the testsuite immediately failed.

I had not heard of DejaGNU before, it seems to be a set of extensions to expect used to run tests on cross-development environments, typically, compile software on one computer, run it on another, and check that the results are as expected. I am not sure if anyone else uses it outside of binutils and gdb.

In any case, it is written in expect, which itself is written in TCL. And in the binutils case, it is also intertwined with the binutils build system which is written using autotools (and a specific version of it).

Rabbit hole level 4: learning to use expect

So my next step was trying to run a simple "expect" program. I quickly found that expect was completely broken, and it was a known problem with a bugreport opened at Haikuports since 2021. I have not mentionned that I am doing all this using the Haiku operating system, I would not run into these problems if I had chosen a more stable and finished operating system. But where would be the fun in that?

Anyway, so expect doesn't know how to open a PTY to communicate with another process (which is the main thing it is designed to do: spawn a process, read its output, match that with some regular expressions, and reply with some input according to a script).

A quick look at the code and buildsystem helped me find the problem: expect can handle many ways to open PTYs, and on Haiku, the preferred one was not picked because it requires linking an extra library that the expect configure script could not figure out. I quickly fixed that and… immediately hit another bug.

Rabbit hole level 5: coreutils

Now expect would correctly open a PTY, but it would fail to configure it. I once again dug into the sourcecode and found that it does this by running "stty sane" using the system() call. So I ran that same command in my shell, and indeed was greeted with the exact same error message.

Quick sidenote: I found the use of "stty sane" using strace and looking for calls to the exec system call. This almost didn't work: support for printing the command line of the executed command for exec in strace was added in Haiku by another developer just 3 weeks ago. So that's one rabbit hole jumped over, yay!

stty is a standard command provided by GNU coreutils (in Haiku at least, other operating systems may have their own version or one written by someone else under a different license or using a different programming language).

The expectation is that coreutils will detect and check a lot of things about the OS in their configure script while building, and compile the tools in a way that works for each system. But, they didn't handle the case where termios.h defined speed_t to be an unsigned char type. They are trying to set speed_t variables to -1 and later compare them equal to -1, and due to integer promotion rules in C, this is not the case. If someone is trying to tell you Javascript makes no sense, if you want them to go away, tell them about C integer promotion rules.

Anyway, I added the missing type cast, and stty started working. I thought I was finally ready to go one level up the rabbit hole towards the surface. I was wrong.

Rabbit hole level 4-and-a-half: expect again

I installed my newly built coreutils on my system, ran expect again, tried to run a child process, and this time, not only expect would start, but I managed to read the output from the launched program.

I then returned to the binutils test suite and ran 'make check' again. This time, it ran 2 tests, and the 3rd one made it stop waiting for something. I was a bit annoyed, not only because I had already fixed more bugs than I wanted to, but also because I was not too sure which part of the stack was wrong this time.

Eventually I found how to enable expect debug mode, and found which command it was running. I confirmed that the same command, ran standalone, returned immediately and with the correct results. So that wasn't a problem and I turned my attention to the test framework.

I studied the DejaGNU script for the failing test, and, while it took some time to peel all the layers, eventually I found that it was something quite simple: run 'ar' with some arguments, wait for the command to complete, and then check the output file. The failing part was 'wait for the command to complete'.

After some more experimentation with expect, I wrote a two line script that reproduces the issue. I ran it on Linux and confirmed that it has no problem there. Since that script is short, here is a copy of it:

spawn echo
expect eof

So basically, we start the 'echo' command and wait for it to terminate. And expect doesn't noticed that it terminates. 10 seconds later, there is a timeout (that doesn't happen in the coreutils tests because they set the timeout to 300 seconds instead of 10).

I turned to strace again, but I could not see a lot more. I also tried to follow the code in expect and in the tcl interpreter, but I quickly got lost. So I opened a support request on the expect bugtracker describing my problem, and went to sleep.

The next day, I had some answers from expect developers, mainly suggesting things that I had already tried but not included in my short ticket, so I shared the info (strace output) with them. And my fresher brain after a night of sleep also helped looking at things in more details. I know that expect uses a PTY to communicate with the spawned process, and so I decided to write a simple test program to do something similar with less "moving parts" involved: spawn a child process attached to a PTY, let it exit, and verify that the parent process waiting on the other side of the PTY is notified that the child is done.

Rabbit hole level 6: PTYs and poll

So I picked an example of PTY usage and started modifying it to my needs. And, I could easily reproduce the problem. Once again I made sure to run the program on Linux and Haiku to compare outputs. On Linux, when the child process exits, the PTY is closed and the poll in the parent process is notified. On Haiku, this does not seem to be the case, and so this program remains locked waiting forever. However, removing the poll call, a read call does not block, and properly returns an end of file. So it is just a problem of notifying the process waiting on poll that the file descriptor it is waiting on is now closed.

Now the next step is to fix that bug in Haiku. And, even if I do that, I don't know if it will also fix the problem in expect, as I was not able to find where in Tcl the waiting for file descriptors is handled.

So, as of now, I don't know if this rabbit hole has more rooms for me to explore, or if I will find my way up at least one level. Maybe I will lose interest in this and do other things for a few months before I get back to it. And probably I will uncover many more rabbit holes.


For people who think Haiku should not be in "beta" releases, I hope this helps you understand what we mean when we tell Haiku is not finished. It is not a safe ground to build any software on. Sure, a lot of commercial systems don't do any better, or didn't in the past, but still, the other options currently available aren't that bad nowadays. And not everyone is willing to get depp into these things like I do.

For people who wanted to use my C compiler to port games to the V.Smile: well, if you don't run Haiku, you can stay compfortably at level 1 or 2 of this rabbit hole and still be of help. If someone else was porting this assembler and compiler, I wouldn't need to run the binutils testsuite and all the deeper levels could be skipped. For now, at least.

For myself: sometimes it feels like I'm making no progress, but that's not true. It's just a lot of work in directions I didn't initially plan to go in. And such things are probably helpful for future projects as well. Also: I am surprised there were not more complaints about expect not working, and about PTYs being broken on Haiku. I thought these would be used a bit more often in typical UNIX toolchains?

BGhostview postscript viewer for Haiku

Posted by pulkomandy on Sat Apr 29 13:27:30 2023  •  Comments (0)  • 

Today I released version 1.0 of BGhostView, a postscript document viewer for Haiku.

Screenshot of BGhostView, showing some USB document from Openboot specifications

This software started in the late 90s as a port of a postscript viewer from UNIX/Linux to BeOS. Back then, Ghostscript did not have a cross platform API, and the BeOS port had to work with a patched up version of the Windows GSDLL API, heavily modified to run on BeOS.

I started working on it because in my past attempt to port Haiku to SPARC machines, I found that a lot of the documentation for these was distributed as PS files instead of the now more typical PDF. At the time, no version of ghostscript was available for Haiku. So I started digging and found an old port of Ghostscript whic provided a starting point, and this viewer I could use it with. But it wasn't working very well.

I then found that Ghostscript now has a slighly better API, and I could make use of that instead. So now BGhostView is running with an up to date version of Ghostscript (thanks to other people who also have postscript interpretation needs on Haiku, this was not taken care entirely by me).

I had not touched BGhostview since 2019, but I got reports that it was crashing recently. So, this week I dug into the code again and made some fixes and updates and decided to make a 1.0.0 version for all to enjoy. It is certainly not yet perfect, but for the basic needs of viewing Postscript documents, it should be fine.

This is yet another one of these applications that is currently hosted by HaikuArchives on Github, meaning it is more or less open for many people to contribute to, but left without someone to really take care of it and move it forward. Well, I guess that can be me when there are bugs to fix, but I probably won't have time to manage the larger refactorings and cleanups that would be needed: converting the UI to Haiku layout system so that it can automatically scale for High DPI displays, reindenting and reformatting all the sourcecode (it's very inconsistent at the moment, I guess most of it was written without a proper code editor that would watch the indentation a bit?), and reviewing the Ghostscript integration code to make better use of the APIs available in modern versions. Postscript isn't exactly great for that kind of usage, just figuring out how many pages there are in a document turns out to be a somewhat tricky problem.

There's also the question of wether we want a separate viewer for each document format. Wouldn't it be nice if the same viewer could do both PDF and PostScript? And what about DVI and XPS and FOP? And maybe docx and opendocument while we're at it? Could we use translators for this, so we can write the GUI once and then have all formats added to it later on? That would certainly be a cool project, but I already have many other things on my TODO list, so, not for now...

Get the sourcecode here

If you just want to run BGhostView on Haiku you can simply install it from HaikuDepot as you'd usually do.

Developer console

Posted by pulkomandy on Mon Apr 10 15:45:46 2023  •  Comments (0)  • 

I wrote new software today! Well, sort of.

This does not happen very often. A lot of my software work is fixing and improving existing code, and not writing new things. Maybe because I'm a bit lazy and I find it easier to fix some bugs in existing code than having to start from a blank page. It provides easier reward for me (that may not be the case for everyone, digging into an existing codebase is a learnt skill).

Anyway, so, the story is, somewhat recently (ok, actually, it's already more than a year ago), I got a new laptop. This was an opportunity to do an install of Haiku from scratch, and while doing so, I decided to go with the 64bit version. The only limitation in the 64bit version of Haiku is that it can't run 32bit software, including several applications that were compiled two decades ago for BeOS, and for which the sourcecode isn't publicly available.

I didn't think I was needing any of these applications, as, over the years, a lot of them have either been open sourced (mainly thanks to the effort of the Haiku Archives project in collecting such software and reaching out to the original authors to get the sourcecode published and/or relicensed under opensource licenses), or has been replaced by newer software or rewriten.

It turns out, one piece of software I occasionally use had not been through this yet. And so I got to work on rewriting it.

The software in question is BeDC. No, not the Direct Connect client of that name, but the rewrite of the earlier "Developer Console" (I think? it is unclear how BeDC and Developer Console are related). The idea is quite simple: this application receives text messages from other applications and logs them in a window. I discovered this software while working on PadBlocker, an input filter add-on that disables the touchpad while the keyboard is being typed on. Because of the way input server add-ons work, it is not easy to grab their output in the common way (sending it to stdout) because they run inside of input_server, which normally does not have its stdout route to anywhere.

I thought the idea was interesting and started using the app in some of my other projects, mainly in WebKit, where it was useful to collect logs from the different processes started by a single web browser instance and clearly marking where each message comes from, and in Renga, an XMPP client, as a way to log incoming and outgoing XML messages for debugging.

So, of course, on my shiny new 64bit system I did not have access to this nice tool, until now. I have rewritten my own version of it. That was made quite simple thanks to the APIs available in Haiku: the whole thing fit in about 200 lines of code. It is fully compatible with existing apps that used to target the old BeDC app, but it looks a bit nicer, thanks to the modern UI classes in Haiku. For example, when logging a BMessage, it can be shown as a nice foldable structure instead of just a bunch of lines.

I have some future plans for this, mainly to make it more useful with Renga where having some formatting of the XML would be great. But I don't know when I'll get to that. Until then, you can find the sourcecode on my Gerrit.

Pourquoi les hackatons, c'est de la merde

Posted by pulkomandy on Wed May 25 19:02:11 2022  •  Comments (0)  • 

Bon, on est sortis du confinement, on commence à voir des gens organiser des évènements en présentiel. C'est donc logiquement le retour des hackatons organisés par plein d'entreprises. J'ai pas trouvé (en cherchant 3 secondes) de page qui résumait pourquoi c'est pas une bonne idée de faire ou de participer à un hackaton. Alors j'vais en écrire une.

Pour ceux qui n'ont pas suivi le sujet, le principe d'un hackaton est de constituer des équipes qui vont développer un projet en un temps assez court, par exemple 48h (mais maintenant on voit des hackaton de une semaine). Dans ces 48h, il faut avoir une idée et commencer à écrire un "proof of concept", un logiciel qui permet de démontrer rapidement le principe. À la fin, chaque équipe présente son travailm, et un jury décide d'une équipe gagnante. On attend des participants qu'ils travaillent sur leur projet à fond, jour et nuit sans s'arrêter, compte tenu du délai très court accordé.

Bien sûr les participants ne sont pas payés, sauf éventuellement l'équipe gagnante qui récoltera au mieux quelques centaines d'euros (divisé par le nombre de membres de l'équipe, ça ne fait pas un salaire raisonable pour du travail de nuit).

La plupart du temps, le code développé va finir à la poubelle. Et c'est probablement la meilleure fin. Quand on décide de réaliser un projet dans un temps aussi court, on a pas le temps de réfléchir à une architecture propre. On va donc surtout créer de la dette technique et pas grand chose d'exploitable. Si quelqu'un trouve que l'idée de départ est bonne, il fera probablement mieux de refaire le code de zéro.

Parlons des idées justement. Normalement quand on a une idée et qu'on envisage d'en faire quelque chose, soit on la garde secrète jusqu'à que le produit soit prêt, ou alors si c'est un truc un peu plus avancé, on peut éventuellement poser un brevet (qui suppose que l'idée n'était pas diffusée avant, sinon c'est plus possible). Un hackaton, c'est une très bonne solution pour une entreprise de vous forcer à publier gratuitement vos idées pour les réutiliser ensuite. Vous allez donc travailler gratuitement pour eux.

En plus de l'aspect compétition, les hackatons représentent la culture d'entreprise et la façon de travailler la pire qui soit: on va jeter à la fois la qualité du code et la santé des développeurs, juste pour tenir un délai arbitraire. C'est tout le contraire de ce qu'il faudrait faire. Sur le long ou même le moyen terme (plus de deux jours), un projet a surtout besoin d'un code bien architecturé avec une dette technique gérable, et d'une équipe en pleine forme qui peut durer plusieurs années. Je vous laisse réfléchir qui peut avoir intérêt à habituer les développeurs ou futur développeurs à autre chose. Dans un hackaton, l'aspect compétitif est là pour mettre la pression sur les participants et les pousser à bout.

En plus de ça, ce mode de fonctionnement où on s'attend à ce que les gens soient disponibles 48h ou même parfois une semaine d'affilée en continu n'est pas du tout inclusif. Vous ne trouverez par exemple pas parmi les participants (liste bien sûr pas du tout exhaustive):

  • Des personnes qui ont une famille dont ils doivent s'occuper,
  • Des personnes qui ont un emploi ou des études avec des horaires fixes,
  • Des personnes souffrant de toute une liste de handicaps qui nécessitent du repos ou imposent d'autres contraintes,
  • Tout simplement des personnes qui préfèrent avoir une hygiène de vie plus raisonable et n'ont pas envie de faire une ou plusieurs nuit blanches.

Some random thoughts about XMPP spaces

Posted by pulkomandy on Tue Aug 17 17:58:46 2021  •  Comments (0)  • 

You may or may not know I am involved in XMPP in a way or another for some time. Recently I have started work on Renga, an XMPP client for Haiku, and participated in an online meeting and discussion about why Discord is so succesful and what ideas XMPP could borrow from it. A part of the discussion revolved around the way Discord organizes multiple channels in a "server" and how that fits very well with their user base.

Today someone contacted me and shared a work in progress document about XMPP "spaces" which is an attempt to see how something similar could be made in XMPP. I was surprised to see the document dive straight into discussion about protocols and stuff like that, with the UI/UX part being "TODO write this later". I am not sure this is the right way to design the thing. I was asked my input on it, so here it is. I have not a lot of experience of the XMPP protocol, but as a user, I chat on various systems with various people and there are several instances where I can see an use for such things.

So, let's tackle this from the user experience point of view. I will completely ignore the "but how do we do that?" part and focus on what I think could work well. Let's start by going back to basics and define some use cases. Why do we want to do "Spaces" in the first place?

Use cases

Let's imagine that XMPP is a very popular protocol and everyone uses it. No other chat system exists anymore. Let's see what it did to become so succesful. I will take my own usage of various IM networks to see how this could look in that alternate (or future?) world.

Communicating with my family

My parents are doing ok with phones and computers, but still, let's keep things simple for them. A single chat channel and ability to send 1:1 messages will be enough. A media library archiving all the pictures and links we sent to each other could be nice. There will be almost no movement of users joining/leaving this space (maybe a new girlfriend/boyfriend joining the family or leaving after a breakup once every few years?), and everyone can be a moderator or it could be just one person.

I think that was the easy case which is already mostly covered by existing options

In the office

I am working remotely this year and in our company this meant reviewing and improving our chat system, which we now use a lot.

I work in a company that has in total about 800 employees, of which 150 in the local branch I work at. We are software engineers and developers. We work on many projects for different customers. Each team is typically 3 to 30 people (with sub-teams in the largest teams). We also have some people who need to do things for many teams at once (for example our sysadmins are taking care of services and tools deployed globally or specifically for a given team).

In our current chat system, we have one single space for the 150 persons in the local branch. Each project has one or more private channels, which are not listed anywhere in the UI. When people join a project, their account is invited to all the corresponding channels. This works quite nicely for us and there doesn't seem to be a reason to group channels together here.

What we would like to have, however, is a way to create temporary sub-channels for discussing specific issues. Something that would be similar to an e-mail thread. Slack and Zulip are examples of chat systems which allow this. Zulip is very close to the email way, having separate threads at the core of its UI. In Slack, it is done by picking a specific message in a channel and replying to it, which creates a sub-discussion. This would be great to organize our chats and more easily keep the info we need.

Other nice to have features are a way to search for old messages in a specific set of channels (but probably this doesn't need to have them formally tied together as a "space"), and a way to pin things (mainly URLs or file attachments) to be able to find them easily. I can imagine also more advanced features such as a shared calendar to place our meetings and days off work in.

Probably, larger companies will want a more segregated system, and I can imagine companies which have non-computer-inclined people (not software engineers) may need some more centralized admin roles to oversee who has access to what. So that probably means accounts tied to some LDAP server, not being able to list MUCs unless your account is added to the appropriate space, and not allowing people to leave a space on their own because they would be unable to re-join it.

Opensource projects

I contribute to a lot of projects, some large, some small.

I will not go on for very long about the small projects because the existing solution (just a single MUC) is just fine for me. So let's see about the larger ones which have a need to split their discussion into multiple aspects.

So, in this case, the main thing to think about is onboarding. If you don't care about onboarding, you will be just fine with a dozen independant channels which have no apparent relation to each other, except they are listed together on your website, and maybe they all have your project logo or some variation of it on them. If you care about onboarding, you want to make it easy for a newcomer to click on a single button/link and immediately join your Space and discover all the channels from inside there, in their favourite IM client.

You will probably want some kind of "backstage" channel for moderators to discuss ongoing issues. This should not be visible to regular users, of course. Which means multiple channels in a space can have different access rights. On the other hand, you may want to nominate moderators and automatically allow them to be moderators on all the channels. Speaking of moderation, you also want the ability to kick/ban someone from the whole space if they misbehave.

As an opensource project, you want to be transparent and have an archive of everything that was said and shared, possibly over the course of decades. This includes channels that are currently unused because the project was reorganized. Possibly you'll want to split the history of a space because one project was split into two separate parts. You may want to copy it to create a fork of the project while retaining the past history in both branches of the fork. And you may also want to merge the history from two projects together and form a single space, but probably I'm going a little crazy here.

You also want to preserve the privacy of your users. It should not be easily possible to identify that user A in space X is in fact the same person as user B in space Y, if they decided to use different nicknames in each place. On the other hand, you want to be really sure that if you talk to someone named "user A", it is really them, and not some other person using the same nickname.

Another aspect to think about here is notifications. For high traffic channels and projects, probably I won't read everything. I will have the chat client on my computer and read it when I have time or maybe if someone pings me. But I don't want this to ring my phone everytime something happens. It should be a distraction free thing that I can have running in the background. This mean I need easy configuration for which notifications I want on each of my clients. I think both for the whole space, and for specific channels (there may be some channels I have no interest at all in following, maybe because they are in languages I don't understand, maybe because the project is large and some topics are not interesting to me).

Chat with friends

One of my uses of IM currently is organizing board games sessions with friends (but whatever your hobby is, probably some of the same applies). Here, there isn't really a notion of a fixed "space". Some of my friends don't know each other or have met once during a board game afternoon and then never met again. Currently I have one rather large channel with a lot of people but I think I will just create and delete smaller groups as needed. In my case, a "space" is probably not useful here.

Gamer communities

I am a lot less familiar with this. I think a large part of the "opensource" section will also apply. Probably channels with restricted permissions (only a few people can talk) are needed. Also, some nice to have things: custom "stickers"/emojis specific to the server, ability to define and rename roles and assign them specific permissions, ... Just read the Discord documentation.

Chat with strangers

One place where IRC is still somewhat popular. There are chat services with various thematic or so channels (by age, location, or shared interests) all thrown together into a "space". People can join and talk with complete strangers. There are a lot of trolls and people with inappropriate behavior. Users of the service need an easy way to signal such things so a moderator can quickly intervene. If the space is big enough, there will be separate moderator staff for each channel, but probably still a common backstage channel for coordination

Thinking about the user interface

So, what do we need to put in our user interface? Here is an attempt to summarize it:

  • Single-button way to join a space
  • Ability to see a list of channels in a space you joined, with a description of what their purpose is
  • Media library with all pictures/links/? or pinned messages
  • Ability to see long term logs (multiple years) of all channels, including now inactive ones
  • Possibility for space moderators to archive a channel (only past logs available, no way to post new messages)
  • Manage permissions for a single channel and for the whole space (who can talk, who is a moderator, etc)
  • Ability to configure notifications, per-client, globally on a space and more specifically on each channel
  • Know who is joined in a space, ability to reliably ban people (in a way they can't avoid just by rejoining with a different nickname)
  • No way to identify that two users in two different spaces are in fact the same person
  • Multiple levels of administration: the owner of a space can nominate moderators for different channels, control which channels are visible to all users or to users with some specific role only, etc. Moderators can adjust some, but not all, settings of the channel they are moderating
  • Ability to join a space but only join some of the channels inside and not all

In terms of user interface, channels from the same space should of course be grouped together. There will probably be a LOT of channels so you probably won't get away with a single tree view, it will never fit everything on screen. Which means you need a first level with a list of all the spaces, showing which ones have ongoing activity. Then you can select one of the spaces and see the channels inside.

In the XMPP world, one thing to think about is how to handle things that are not in a space. Maybe they can just be put into a "default" space from the UI point of view?

If you know someone's real JID, and you start a chat with them from inside a MUC, it would be super annoying if that ended up being a separate chat history than if you contact them directly. Or maybe it's a feature to have separate discussions (let's say if you have a colleague and you talk work things, but they're also a friend and at other times you talk non-work things).

You will have some kind of management menu (maybe right click on the space icon/name) to decide if you want to leave a space, configure notifications, see who is a moderator or admin.

Quick notes on building gcc

Posted by pulkomandy on Sat Apr 1 13:44:05 2017  •  Comments (0)  • 

This may not be up to date anymore. A complete GCC for AVR (and AMR) is now available as HaikuPorts recipes, which provide a more complete process, with a C library and everything. Refer to these recipes if you need to do it (even on other platforms, the recipes aren't all that hard to read and adjust)

As you may have noticed, I like to develop stuff for all kind of weird devices. For this, I usually need a C compiler, and most of the time it's gcc (not always).

gcc is a big piece of software and there are some tricks needed to build it. Also, I run the Haiku operating system, which is quite nonstandard, so additional workaround are needed.

Today I built gcc for avr. Here are notes on how to do it so I don't spent a month figuring it out next time.

# gcc compilation for gcc 4.4.5 (4.5.x needs more stuff. maybe later)
# Made by PulkoMandy in 2010
# Before you start :
# * Download gmp and mpfr from HaikuPorts (http://ports.haiku-files.org/wiki/Downloads) and extract to /boot
# * Download gcc-core-4.4.5.tar.bz2 from gcc mirror and extract to work directory
mkdir obj && cd obj # This is the output folder. So you can keep the source area clean
setgcc gcc4 # We can't compile gcc4 with gcc2.
../gcc-4.4.5/configure --target=avr --enable-languages=c --prefix=/boot/common/ --with-mpfr=/boot/common/
	# Tell the target, then language we want, and where to install the result. Binaries will be called avr-* so don't care about overwriting other ones.
	# For some obscure reason mpfr isn't detected properly, so we force the prefix.
make all-gcc ; make install-gcc # This does compile only gcc, not libgcc; which failed to work for me.

There are other things to watch out : I had to remove a -lm somewhere as Haiku doesn't have a separate libmath.

Next : build a PlayStation development toolchain, including gcc MIPS target.

Projects I'm NOT coding

Posted by pulkomandy on Sat Apr 1 13:41:22 2017  •  Comments (0)  • 

SometimesI have ideas about software that could be interesting to write or useful to use; but I'm already contributing to a lot of projetcts and I'd rather not start all of the new ones.

Following a talk on #haiku irc channel, I decided to put the list online so other people can pick these projects up and start working on them.

Please let me know if you made (part of) one of them. So I can link to you here :)

PulkoMandy's ever growing TODO-list

Older items first.

  • Port the znax flash game to the Atari Lynx console.
  • Create a device that plug on the amiga clockport and can serve as a base for complicated projects. A DSP to decode OGG would be nice.
  • Build a mouse adapter, similar to AmiPS/2, but using the AMXmouse protocol for the CPC.
  • Compile the SDL version of Road Fighter for the GP2X
  • Port Rick Dangerous2, Prince of Persia 1 and 2, The Lost Vikings 1 and 2, and Jazz JackRabbit 2 to SDL or another lib and make them run on the GP2X.
  • Maplegochi : an electronic Maple tree for the Haiku desktop. You feed it with some water and you watch it grow day after day. The tree is built with random fractals so everyone gets an unique tree on its desktop. It changes over the seasons (depending on the system date and the locale). It is a replicant living on the desktop and acts as a living background,without being too disturbing.
  • Make the various usb to serial chips work on haiku with a simple terminal program.
  • Code a ROMDOS D1 filesystem add-on for Haiku to read/write floppies for Amstrad/Schneider CPC computers.
  • Network-shared whiteboard application for Haiku, allowing people to draw diagrams and see each other drawings. Likely use Playground as a base.

Some projects

Posted by pulkomandy on Fri Nov 8 12:56:42 2013  •  Comments (0)  • 

Some time ago I set up a trac install on my server to put all my running stuff. As I ended up using the provided wiki for tech documents, there is nothing visible on this website. This article list all these projects so they are linked from somewhere, maybe this way Google will index them better.

Also, there's more on my github page, where I plan to migrate most of the stuff above, someday (because git is better, github is more visible, and Trac in fastCGI mode has a very annoying memory leak making it the first source of problems on my homeserver). There's also my Google Code Prject Hosting page, which I plan on not using anymore but there are some projects to migrate, still.

Amelie, a filesystem for 8-bit computers

Posted by pulkomandy on Sat Jun 8 23:23:19 2013  •  Comments (0)  • 

This is a project I've worked on and off for the past year. It all started with the MO5SD project by Daniel Coulom (French page). The idea of this is to use an SD card plugged to the tape port of the french Thomson MO5 computer, which happens to use TTL logic levels.

I reused the ideas from MO5SD to build a similar interface for the Amstrad CPC printer port. However, the Amstrad operating system has better capabilities than the MO5 one and is friendly to expansion ROMs. This makes it possible to run a full-blown filesystem rather than a simple bootloader like the MO5 version does.

Getting the SD card read/write code working was the easy part. After spending some time (with help from SyX and Cloudstrife) optimizing the z80 assembler code for SPI bit banging, I started looking for a suitable filesystem. The most common floppy disc ROMs for the CPC are AMSDOS (the one that shipped with the machine) and Parados, which improves support for dual side, 80 tracks floppy drives from PC. None of them easily allow to reroute disk access to anything else than the floppy controller. I heard that RODOS, a less known disk ROM, has such a capability, and that it also has directories, permissions, and some other nice features. However, close inspection of the RODOS ROM showed that there is actually no way to redirect disk access to anything else than the FDC (if you know a way...)

Moreover, the RODOS filesystem design looked like it would be slow. My bit-bang access to the SD card doesn't go very fast, and jumping around between several sectors isn't a good thing. I wanted to keep files without too much fragmentation so I could load them fast. RODOS is also limited to 20MB volumes, which sounded huge in the 1980s but felt ridiculous for my 4GB SD card. Finally, RODOS requires some RAM, and any ROM that does that on the CPC reduces compatibility with some software.

So, I decided to design my own filesystems. The goals are simple:

  • Use as few RAM as possible. AMSDOS allocates a 2K buffer, that should be the only space we are allowed to use (and some stack, of course).
  • Limit access to the storage medium whenever possible. Try to not read the same sector multiple times.
  • Allow the use of big drives. This requires directories, and also some other tricks.
  • Be z80-friendly. No 32-bit math is ever done.
I worked on a C++ version first. This allowed me to keep the code readable while I experimented with various things. I did all the testing on PC and will start converting the whole stuff to z80 code only when I'm fairly sure there won't be big changes to it again.

An SD card can hold up to 4GB of data. I decided to split this into 256 volumes that map to the CPC notion of "users". On an AMSDOS floppy, each file belongs to a single user and can't be seen by others. This limits each user to 16 megabytes of data. Since the files are allocated on 512-byte sectors, these can all be addressed with a 16-bit offset. I also limited the number of files and directories to fit them on a 16-bit counter.

The filesystem uses a block map with sector granularity (the 16 first sectors are used for that), and a file/directory list based on extents. When the filesystem is not fragmented, a directory will use just 16 bytes of space (including the 11 character name and up to 256 entries), and a file will use a 16 bytes header to store up to 128K of data. These entries can be extended, so directories have unlimited number of entries, and files have unlimited size.

You can read more about the data structures in the filesystem readme, and also check the source code. They are both available at CPCSDK. The C++ code makes use of some C++ features such as vector, but this could easily be converted into plain C. You will also find a WIP ROM code for the z80 version, mostly done by SyX. I'll start filling it with actual disk access code someday, for now it's just managing the patching of AMSDOS code and forwarding the access to floppy drives.

GuideML AmigaGuide converter for Haiku

Posted by pulkomandy on Tue May 28 18:22:36 2013  •  Comments (2)  • 

I'm currently porting some Amiga software (namely, the ACE CPC Emulator). As usual with Amiga software, the user documentations are written in AmigaGuide. I wanted to convert it to a more usual format for Haiku users. I found some tools, but they all run only on Amiga systems. Fortunately, one of them is written in C and open source.

You can get GuideML for Haiku sourcecode at my GitHub account.

The interesting part (development-wise) of this is that I used a set of wrapper header that convert the Amiga API calls into Haiku ones. The Amiga API is based on BCPL, which predates C and has differences such as using FPutC instead of fputc, it also has extra stuff such as support for lists, and a different way to allocate and free memory (where you have to tell FreeVec() the size of the block you're freeing).

This set of header allowed me to make very few changes to the core code of GuideML. I'm also using these headers for the port of ACE, where I could getthe emulator core running quite easily in a short time.

I'm still refining these headers to remove warnings, make them C++ safe (as my user interface for ACE is written in C++) and make them behave as close as possible to the original system. I even reused parts of AROS, an open source rewrite of the Amiga OS, which aims for source-level compatibility. Their ReadArgs implementation was not too hard to port to Haiku, and now my software can use just the same smart argument parsing as on Amiga. I still have to get something similar to icon tooltypes working, however, but that shouldn't be too hard as ReadArgs can parse strings from a file, instead of the command line.

IUP portable user interface

Posted by pulkomandy on Fri Mar 30 22:48:11 2012  •  Comments (5)  • 

Just a quick note to say I started a project with IUP as the framework for user interface. After spending some time with Qt and wxWidgets, I've finally found an UItoolkit that just does what's expected. No need for a precompiler, no replacement of my main function by some wrapper, no rewrite of the C++ STL.

IUP is written in C, but has a nice attribute-based interface that makes it very easy and pleasant to use. I've made good progress on building my windows and the layouting system is nice to work with (still fighting with Qt one...). IUP is cross platform as it uses either comctl32, GTK or Motif. I think I'll write an Haiku/BeAPI backend for it as it's going to be rather useful.

It's quite easy to install in MinGW, just get the prebuilt binaries (either dll or static linked) and includes and drop them at the right place. No need to recompile for hours like wxWidgets.

I still have to see how integration of custom widgets is possible. This gets useful quicker than one may think, as soon as you need an hex-editor, a music-tracker like interface, or something similar. But these seem to be handled by IUP with a generic "grid" control that looks quite flexible. In wxWidgets both of these were a mess, with no easy way to do a custom control like an hex spinbox, and a button grid giving very bad performance. Having to manually call freeze() and thaw() on widgets got boring really quickly. Not to mention the complete lack of threading support...

Let's see how it goes in a few weeks!

IUP with C++

While the documenation suggests a way of using IUP in C++ and encourages it, I was not so happy with using friend functions or static methods. So I came up with my own solution involving a bit of C++11 magic (variadic templates). The result is the IUP++ class that reigsters itself as an IUP callback (with any arguments) and forwards the call to a C++ object and method. It's used this way :

Callback<Gui>::create(menu_open, "ACTION", this, &Gui::doStuff);
Callback<Gui, int, int>::create(toggle, "ACTION", this, &Gui::doMoreStuff);

Where Gui is the class you want to answer to the event, menu_open and toggle are IUP handles to UI objects, "ACTION" is the callback name, this is the object to forward the event to (an instance of Gui), and doStuff and doMoreStuff are methods called. Notice the Callback also needs the parameters to these - that's the second "int" in the second example (the first one being the return type, that defaults to int if missing, but is needed when you add parameters). I'm looking for suggestions on how to make this simpler, as there is still some repetition in it...

Smarter vim filetype detection

Posted by pulkomandy on Tue Jan 3 20:46:05 2012  •  Comments (0)  • 

Vim is, as you may know, is my favorite editor for all development purposes. The syntax highlighting is powerful and extensible easily. Most of the time, the file type detection for this is based on file extensions. Works well, unless you have files named .src or .asm for assembly language on different CPUs...

Vim documentation only shows how to set the filetype guessing from the file extension. Here's an example of doing something a bit more smart.

The idea is to put the CPU name on the first line of the file (in a comment). Then use the powerful regexp match features of vim to detect it:

; vimfiles/ftdetect/z80.vim
au BufRead,BufNewFile *.z80	set filetype=z80
	; The usual way to do it for clear file extensions

func! s:detect()
	if getline(1) =~ z80
		set filetype=z80

au BufRead *.src	call s:detect()
au BufRead *.asm	call s:detect()
	; And the smart one. Note it is useless on BufNewFile,
	; as the file will not have the header yet.

Do a similar file for each of your CPUs.

Note, it should be possible to scan fr the use of particular mnemonics to go without the header, but that requires a bit more work to identify many CPUs. Any volunteer ?

The software archive

Posted by pulkomandy on Sat Jul 16 20:50:40 2011  •  Comments (0)  • 

This is the script that runs my BeOS software archive

This is a website presenting software, similar to aminet for the amiga or others repositories. It runs without a database and is meant to be easily open to external contributions through ftp uploading.

The full script is less than 200 lines of perl and features a category hierarchy, screenshots, and some other useful infos about the software.

As usual, it is distributed under the MIT Licence.

#!/bin/perl -w
use strict;
use CGI::Carp qw(fatalsToBrowser);
use URI::Escape;
use Time::HiRes qw(tv_interval gettimeofday);

my $t0 = [gettimeofday];

my @QUERY = split(/&/, $ENV{'QUERY_STRING'});
my %query;
foreach my $i (@QUERY)
    my $mydata;
    my $varname;
    ($varname, $mydata) = split(/=/, $i);
    $query{$varname} = $mydata ;
my $line;
read(STDIN, $line, $ENV{'CONTENT_LENGTH'});
@QUERY = split(/&/, $line);
foreach my $i (@QUERY)
    my $mydata;
    my $varname;
    ($varname, $mydata) = split(/=/, $i);
    $query{$varname} = $mydata ;

print "Content-Type: text/html\n\n";
print <<ENDHTML;
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<meta name="keywords" content="Haiku Software Archive BeOS" />
<meta name="description" content="BeOS software archive" />
<meta http-equiv="content-type" content="text/html; charset=iso-8859-15" />
<title>PulkoMandy's BeOS software archive</title>
<link href="style.css" rel="stylesheet" type="text/css" media="screen" />
<h1>PulkoMandy's BeOS software archive</h1>
<p>This is an archive of BeOS software. Unlike BeBits, the files are archived locally so if their origin gets lost, they'll still here safely.</p>

<p>Most of the files in this archive are from the emupt website, gathered by Xeon3D. Thanks to him for the great work.</p>
<p>You can access the archive directly if you don't like this web interface <a href="archive/">here</a>.</p>
<p>Most of the stuff from emupt is still unsorted, have a look <a href="www.emupt.com">here</a>.</p>

<p>The main goal of this website is to try to get some of this software open sourced so it can be improved for future use.
Tracking of the original authors is sometimes difficult, but usually gives good results. They may also be happy to know the
BeOS world is still alive. This goal is the reason for the separation in three folders.</p>
<li>Nosource : unfortunately, these apps are closed source. We need to get in touch with the authors and ask them to release the code.</li>
<li>Source: these application are distributed with their source code, but they are not updated anymore. Take over the development of some of them !</li>
<li>Adopted : these apps are living their own life somewhere else.</li>

if ($query{file}) { # User wants to show details about a file
    my $file = $query{file};
    open(F, "$file.desc");
    my %data;
    my $description;
    while (<F>) {
        my $field;
        my $value;
        ($field, $value) = split(/:/,$_,2);
        if (length($value) > 0 && $field =~ /^[a-z]+$/) {
            $data{$field} = $value;
        } else {
            $description = $description . $_;
    print "
    <h2><a href=\"$file\">Download $data{name}</a></h2>
    <tr><th>Short description</th><td>$data{shortdesc}</td></tr>
    <tr><th>Author</th><td><a href=\"$data{url}\">$data{author}</a></td></tr>
    <img src=\"$data{screenshot}\" alt=\"screenshot\"/>";
    print "<p>$description</p>";

    print "<a href=\"/~beosarchive/\">Go up</a>";
} else { #User wants the full software list
    print "<h2>Full software list</h2>";

    sub loopDir {
        my $f;
        my($dir) = @_;
        opendir(DIR, "$dir");
        while ($f=readdir(DIR)) {
            next if ($f eq "." || $f eq "..");

            my $path = "$dir/$f";

            if (-d $path) {
                # We found a directory: recurse into it
                print "<li class=\"dir\">$f</li>\n";
                print("<ul class=\"dir\">");
            } elsif($path =~ /\.desc$/) {
                # we found a .desc file
                $f = substr($f, 0, -5);
                $path = "$dir/$f";
                print "<li class=\"file\"><a href=\"?file=$path\">$f</a></li>\n";
        print "</ul>";

    print "<div style=\"float:left\"><h3>Files without source</h3><ul>";
    print "</ul></div><div style=\"float:left\"><h3>Files with source looking \
        for a maintainer</h3>";
    print "</ul></div><div style=\"float:left\"><h3>Adopted projects</h3>";
    print "</ul></div>";

my $elapsed = tv_interval ( $t0 );
print "<p style=\"clear:both; width:100%; border-top: 1px solid #ECC;\">\
    Page generated in $elapsed seconds.</p></body>";

SVN to IRC commit bot

Posted by pulkomandy on Thu Jul 14 22:26:08 2011  •  Comments (0)  • 

CommitOMatic is an SVN post-commit hook that connects to IRC and tells the logmessage.

Put this file as "post-commit" in the hooks directory of a subversion repository, and set the settings as you need.

#!/usr/bin/perl -w
# see http://www.javalinux.it/wordpress/2009/10/15/writing-an-irc-bot-for-svn-commit-notification/
# see http://oreilly.com/pub/h/1964
use strict;
# We will use a raw socket to connect to the IRC server.
use IO::Socket;

# The server to connect to and our details.
my $server = "irc.freenode.net";
my $nick = "Commit-O-Matic";
my $login = "pulkobot";
my $channel = "#commits";

my $repos = $ARGV[0];
my $rev = $ARGV[1];
my $commit = `/usr/bin/svnlook log $repos -r$rev`;
my $user = `/usr/bin/svnlook author $repos -r$rev`;
chomp $user;

# Connect to the IRC server.
my $sock = new IO::Socket::INET(PeerAddr => $server,
                                PeerPort => 6667,
                                Proto => 'tcp') or
                                die "Can't connect\n";
# Log on to the server.
print $sock "NICK $nick\r\n";
print $sock "USER $login 8 * :Commit-O-Matic Robot\r\n";

# Read lines from the server until it tells us we have connected.
while (my $input = <$sock>) {
    # Check the numerical responses from the server.
    if ($input =~ /004/) {
        last; # exit the loop
    elsif ($input =~ /433/) {
        die "Nickname is already in use.";

# We are now logged in : join a channel
print $sock "JOIN $channel\r\n";

# Now wait for the "end of name list" message
while (my $input = <$sock>) {
    # Check the numerical responses from the server.
    if ($input =~ /366/) {
        last; # exit the loop
    elsif ($input =~ /433/) {
        die "Nickname is already in use.";

# We are now logged in.
my $cmd = "PRIVMSG $channel :";
$repos =~ s#/home/subversion/##;
print $sock "$cmd$repos: $user * r$rev\r\n";
chomp $commit; #svnlook add an extra newline.
chomp $commit; #svnlook add an extra newline.
my $com = $commit;
$com =~ s/\n/\n$cmd/g;
$com =~ s/^/$cmd/g;
print $sock $com;
print $sock "\n";

# Get out of it
print $sock "QUIT bye... \r\n";

Opensource your abandonware

Posted by pulkomandy on Tue Nov 30 22:27:26 2010  •  Comments (3)  • 

As you may know, back in 2007 I ressurected the development of GrafX2. this old pixelart program, made only for DOS, was left out 6 years earlier by the authors, that had moved on to more modern computers. Today, graFX2 is amongst the best tool for pixelling, particularly on Linux or other alternative operating systems. Many people are using it daily to draw really nice pictures. The newer versions added a lot of features such as layers, and there's more to come.

This was possible only because the authors decided to release the source when the project stopped. The code wasn't perfectly clean; it was tied to ms-dos with some optimized parts written directly in assembly language and accessing the video card hardware directly. Of course, getting an SDL-based version out of it was not easy. But still, it took considerably less time than rewriting everything from scratch. Also, part of the userbase for the old GrafX2 upgraded to the new one. For some of them it felt like getting back home after years of using suboptimal tools.

During the revival of GrafX2, I had to develop my web searching skills a lot. First, the original GrafX2 website was offline, and the sourcecode was gone with it. Thanks to filewatcher, an ftpsearch engine, and the web archive, I was able to locate a copy on some russian FTP. Then, I wanted to get in touch with the authors to let them know their software finally found some use.

But grafX2 isn't the main purpose of this article. Last month, I downloaded APlayer, a music player for BeOS. After some hacking to get it working on Haiku (which eventually led to uncovering and fixing a compatibility bug), I noticed that most musics from Burned Sounds, my preferred chiptune collection, didn't load. The strange thing is that most of them were in formats supposed to be recognized by APlayer. But looking closer, it turned out they are packed using the Shrink algorithm. This is a packing system from Amiga days, which can be unpacked only on amiga for lack of any sourcecode or format information. Well, that was until yesterday. Using my high web searching skills, I found the author of Shrink and kindly asked him by mail if he was willing to release the sourcecode for hissoftware, the last version being from 1996.

He was a bit surprised to see there was some files using Shrink still around, but he had a linux version of the archiver. This version is now released as GPL sourcecode at sourceforge. This is an important step for me in getting more open source software available ; but also in preserving old files packed in this format. I hope some other people will find it useful too.

I forgot to mention I also made possible the release of a whole lot of other BeOS software made by Arvid and Jonas Norberg. This include the sawteeth sound synthetizer, as well as the backslash n demo, and also some other unfinished code.

The overall message is, for developpers : think about opensourcing your old projects. Even if the source is not as clean as it could be ; even if they are of no use for you ; even if they only work on a dead since 10 years operating system : someone, somewhere, may find it useful. You can visit the Unamintained Free Software page to get some examples of how passing on a project to someone else may work. But not everything goes through this website.

For non-developpers, don't hesitate to get in touch with the devs, even for unmaintained apps, and ask for an open source release. If the software is dead, the author isn't going to get anymoney for it, so why not release it so other people can improve it ? This is the fastest way to get more open source software. And don't be shy, developpers are, above all, normal people, and they do like hearing from users.

Static linking nightmares

Posted by pulkomandy on Mon May 31 11:25:58 2010  •  Comments (0)  • 

I recently ported Reloaded, an Amstrad CPC emulator, to Windows. This turned out to be more complicated than expected, and I encoutered problems for which I can't find any proper solution on the internet, so I decided to tell you how I solved them.

Step one : SDL and wxWidgets

Reloaded is a very special project. It started its life as a fork of the Caprice32 emulator. Caprice was an emulator designed for Windows, but later ported to Linux using SDL to render the screen. SDL didn't provide support for anything else than pixels, and we wanted complex windows, so we decided to use wxWidgets. After some hacks, we got that running : SDL was used for sound and timers, and wxWidgets for display rendering.

It worked quite fine on Linux, but when trying to run it on Windows, I encountered a problem : both SDL and wxWidgets are trying to define the WinMain function to do some special inits and call the application's main. Of course, having two WinMains didn't please my compiler.

wxWidgets offers a nice way to solve that : their WinMain is defined in the macro IMPLEMENT_APP, which is easy to replace with something else if you want so (you have to do somme inits by yourself, but that's ok). SDL, however, doesn't allow you to do so : as soon as you #include SDL.h, it is not possible anymore to define WinMain yourself!

Reloaded is now using Portaudio for sound and doesn't rely on SDL anymore. I also had to disable OpenGL as our code used SDL_Surfaces, and rewrite some timer handling to use native windows functions.

Step 2 : getting it to link

Another particularity of Reloaded is the way its built : we wanted to create a platform-independant core, and a gui part that wraps around it and provide the windows and graphical stuff. This allows for really easy porting : you have very little things to alter in the core and only mess with rewriting the GUI part.

This was counting without autotools limitations : to build these files properly, we had to use different defines on each side. The core will have to look for portaudio includes while the gui will want wxWidgets. We solved that by building them into two separate .a static libraries, then linking these libs to a single executable.

Again, this was working well on Linux, but Windows strangely failed to link the thing, giving undefined references to portaudio and some other DLLs we are using. Also, I was getting an "undefined reference to main" for no apparent reason (as there was a WinMain function in my program). After some searching, I found I was supposed to add -lmingw32 _before_ my .a in g++ command line, or else the runtime loader wouldn't find WinMain. But doing so would result in undefined reference to every possible function in nwxWidgets.

After a full week of trial and error, I finally managed to get it working : you have to link -lmingw32 first, then your static libs (in the right order so they can link to each other), and finally link wxWidgets and other stuff you need.

I don't know why the gcc tools want the static libs to be in the right order while dynamic one will not care, I think that's not very practical and even annoying. But I hope this article will help you solve the same kind of problem, shouldyou ever encounter it.

Atari Lynx development under Linux

Posted by pulkomandy on Sun Jul 5 01:33:33 2009  •  Comments (0)  • 

Bon alors, j'ai une Lynx 2 qui traine dans mon bazar et j'ai dcid d'essayer de programmer un peu dessus. Le hardware a l'air plutt sympa et assez costaud pour permettre de dvelopper des trucs rapidement. Sprites zoomables et dplaables ligne par ligne permettant de faire des polygones texturs, son 4 canaux base de gnrateurs polynomiaux, et quelques autres gadgets sympa. Donc voil une catgorie de mon site ddie mes aventures avec cette console.


cc65 est un compilateur pour le 6502 qui est le processeur de la Lynx. Il est fourni avec des bibliothques de base permettant de faire quelques trucs. Ces bibliothques sont portables sur plusieurs plateformes (c64, NES, ...) mais avec pas mal de limites. Elles sont loin d'exploiter les capacits de la console fond. Je vais donc probablement crire mon propre toolkit pour grer tout a. D'autre part cc65 a l'air d'tre assez limit niveau optimisation du C, il faut tout faire la main pour avoir du code efficace, aussi je pense que je vais rapidement devoir apprendre l'assembleur 6502 pour passer aux choses srieuses. Cela dit, je parle dj l'ARM et le z80 donc la transition devrait se faire sans trop de mal

CC65 n'est pas disponible dans les paquets Debian, pour des raisons de problmes de licence. Le code est trop ancien pour avoir t mis sous GPL... Il faut donc faire l'installation la main. Hereusement, a se passe plutt bien.

wget ftp://ftp.musoftware.de/pub/uz/cc65/cc65-sources-2.12.0.tar.bz2 # tlchargement de la dernire version de cc65
tar xvf cc65-sources-2.12.0.tar.bz2 # on dcompresse...
cd cc65-2.12.0                      # on va voir ce qu'il y a dans le dossier
make -f make/gcc.mak                # on lance la compilation
sudo make -f make/gcc.mak install   # on installe!

Une seule petite subtilit : il faut ajuster deux variables d'environnement pour dire cc65 o il est install. J'ai mis a dans mon fichier .bashrc pour rgler le problme une fois pour toutes.

# Configuration de schemins d'accs importants pour cc65
export CC65_INC=/usr/local/lib/cc65/include 
export CC65_LIB=/usr/local/lib/cc65/lib

Squelette de projet

Bon, voil, j'ai maintenant un compilateur qui fonctionne. Pour le tester (et pour s'en servir plus tard), on a besoin du package d'exemple de Karri. Ce package contient un projet complet avec makefile et tout le bazar pour gnrer un fichier .lnx. C'est un programme simple qui fait une petite application de dessin. a servira de base mes prochains projets mais je vais surement faire quelques modifications dans le makefile. Bref, pour le moment, je lance make, et a me compile un fichier .lnx sans le moindre problme. Cool.

mulateur: Handy SDL

Pour l'instant je n'ai pas de cartouche ni de cble BLL pour ma Lynx. J'ai donc besoin d'un mulateur pour tester mon bazar. J'ai essay mednafen qui est dans les paquets Debian, mais il s'est lamentablement plant avec une segfault au premier lancement. Poubelle, donc. J'ai tlcharg le code source de Handy SDL. Petit problme avec cette archive, les dossiers n'ont pas les droits en excution ce qui empche la compilation de fonctionner. Un chmod +x -R sur le dossier extrait de l'archive rgle le problme. Ensuite, a compile tout seul. Pour utiliser l'mulateur il faut aussi un dump de la ROM interne de la Lynx. Celui qu'on trouve chez planetemu a l'air trs bien. Il semblerait que vous n'ayez pas le droit d'avoir ce fichier si vous ne disposez pas d'une vraie Lynx. Je suis pas spcialiste en droit, renseignez vous.

Pour que Handy marche bien chez moi, je dois mettre le bpp 16, sinon l'image ne s'affiche pas correctement. part a tout se passe bien. J'ai aussi l'impression que la musique de ma cartouche de test (faite avec ABC) ne fonctionne pas. Mais de toutes faons, ABC n'a pas l'air trs simple utiliser donc mon premier objectif pour la Lynx sera de coder un vrai tracker capable d'exploiter fond les capacits de la machine, qui ont l'air plutt sympathiques.

La suite...

Prochaines tapes pour quand j'aurais le temps :

  • Faire une cartouche et un cable BLL pour tester des trucs en vrai sur la Lynx,
  • Coder un tracker capable d'exploiter les capacits de la console fond,
  • Coder un jeu ou une dmo sur la console. Commencer simple et puis faire des trucs plus compliqus aprs,
  • Vendre les jeux programms (bon l on rve hein, pas avant 2020 au moins...),
  • Tenter de conqurir le monde !