PulkoMandy's home on the web

Porting Smalltalk to the Amstrad CPC: some progress

Posted by pulkomandy on Tue Mar 19 21:15:18 2024 • Comments (0) •

If it doesn't work, keep trying

My previous article on porting Little Smalltalk to the Amstrad CPC received some attention and support messages from other people, who either find it useful, or just interesting to read. I got a comment from someone who was happy that I am doing retrocomputing and not just retrogaming. I'll take that as a compliment and I think it is a somewhat common theme in the Amstrad CPC scene.

As you can probably guess from the previous post on garbage collectors, I am still thinking about this project and trying to figure out a way to make things work.

First, I noticed that I had made a mistake in my first attempt: by not looking closely enough, I assumed that the image file would be somewhat specific to the word size and endianness of the target machine, leading me into attempting to build the image directly on the CPC. A closer look at the code made me realize that this is not the case. The stored image format is generic and does not change depending on endianness and word size.

So, that means I can throw away my work on porting the image builder, and focus on the VM itself. But I was still curious why the built image was more than 100 kilobytes. Surely there is not *that* much code in Little Smalltalk?

I started taking a closer look at the image file in an hex editor. And I noticed something that I should have found much earlier, especially as it was hinted at in the Little Smalltalk book. The image contains not only the bytecode, but also the sourcecode to all methods. This allows Little Smalltalk to not include a disassembler. You can open the sourcecode in an editor, edit it as you want, and then recompile it to create a new bytecode. But the price to pay is a lot of used up space in the image. I think later I will write (or port) a bytecode disassembler (I think there is one in one of the later versions/forks of Little Smalltalk?). For now, I disabled the sourcecode generation in the ImageBuilder, and it resulted in a much smaller image of 49KB. Now that's a size I have at least some chance of fitting into the CPC RAM! So let's try that.

The porting of the Smalltalk VM to the CPC is pretty similar to what I did for the ImageBuilder. I compile the C code, set up a linker script, replace file IO with AMSDOS routines which are much simpler than POSIX, but should still be enough for our needs at least to load the initial image. I encounter similar problems in the image loading than in the image saving (too much stack usage by recursive functions) and apply the same workaround (moving the stack to a different place before calling the heavily recursive functions).

Anyway, I got it running. Sort of. And then I ran out of memory again.

Memory layout of the CPC

To understand the problem (and the next steps), it is useful to understand how memory is addressed on the CPC. First, remember that the machine uses a Z80 CPU, and so, it can address 64KiB of memory using 16-bit addresses. On the CPC this is organized as such (rounded to the next 256th byte for simplicity):

0000-0100: Reset vectors and "low jumpblock" from the firmware. This is the handlers for the RST instruction, interrupts, etc.
0100-A700: Available for applications.
A700-BFFF: Used by the OS. Variables from the disk ROM, indirections for various system routines that are actually implemented in ROM, etc.
C000-FFFF: video RAM

So we have about 42KiB of RAM free for our application. Initially I tried to fit both the interpreter code and the Smalltalk heap (and any variables needed for the interpreter) into this space, and it didn't work. I removed various parts of the interpreter, I replaced the garbage collector with a stupid one that can't free any memory (this particular idea being brought up by one of the papers on garbage collectors I read, apparently some people were using their LISP machines this way for performance reasons and it was used also as a benchmark of the fastest possible GC algorithm).

Anyway, even with just half an interpreter, I still was out of memory rather quickly. So it's time to find other approaches.

Mocing the code to a ROM

The Amstrad CPC OS is well designed to allow running code from expansion ROMs. These ROMs are mapped at C000-FFFF (overlapping the screen memory), and can easily call into the system ROM (which maps at a different address) and also call into other ROMs as neede (for example, the AMSDOS ROM for disk access).

To put our code in ROM, we need some changes to our linker script and CRT0. I based this on scripts I wrote earlier while making a ROM version of the Contiki operating system.

-mwx
-i allinone.ihx
-b _INITIALIZED=0x0100
-b _CODE = 0xC000
-k /bin/../data/sdcc/lib/z80
-k /packages/sdcc-4.4.0-1/.self/data/sdcc/lib/z80
-l z80
crt0_rom.rel
allinone.rel

-e

The main change here is that _CODE is now at 0xC000, and we also add that _INITIALIZED is at 0x0100. _INITIALIZED is where the RAM area starts, the other RAM sections are put right after it. So now the linker will put our code in ROM, and our data in RAM.

Of course we also need to adjust the CRT0 file.

;; FILE: crt0.s
;; Generic crt0.s for runing code as a CPC ROM.

  .module crt0
        .globl  _main

        .globl _progend

        ;; Ordering of segments for the linker.
        ; Things that go in RAM
        .area _INITIALIZED

        .area _DATA
        .area _BSS
    .area _HEAP
_progend::

        ;; Things that go in ROM
        .area _HOME
        .area   _CODE (REL,CON)
        .area _INITIALIZER (REL,CON)
        .area   _GSINIT (REL,CON)
    .area   _GSFINAL (REL,CON)

        .area   _CODE (REL,CON)

	    ;; The _CODE area starts with the ROM header.
		;; This has some version markers, a ROM type (background here, meaning
		;; it will not immediately take control of the system during boot),
		;; And a table of "RSXs", that is, commands that can be called from
		;; BASIC or ASM code by name.
        .db #1  ; Background ROM
        .db #4  ; Version 4.0
        .db #0
        .db #0
        .dw rsx_table

		;; The first routine is called during boot, in our case we do nothing
		;; at that point. The second is invoked by the name defined in the
		;; table below (|ST in our case).
        jp do_nothing
        jp init

rsx_table:

        .db 'T' + #0x80
        .ascii "S"
        .db 'T' + #0x80
        .db #0

init:
; This is the program entry point. Execution starts here
;; Initialise global variables, clear BSS areas
;; Note: not used in Smalltalk, I removed all initialized variables.
;; Probably still a good idea to clear the BSS part so all global variables are
;; initially set to 0...
;       ld      bc, #l__INITIALIZER
;       ld      a, b
;       or      a, c
;       jr      Z, gsinit_next
;       ld      de, #s__INITIALIZED
;       ld      hl, #s__INITIALIZER
;       ldir
;gsinit_next:
; Clear BSS sections
;       ld hl,#s__DATA
;       ld (hl),#0
;       ld de,#s__DATA + #1
;       ld bc,#s__DATA + #0x200
;       ldir

; Initialize disk ROM. We do the simplest thing here and just initialize ROM 7.
; It wouls be more compatible to save the currently active drive and see if
; there are other ROMs to initialize.
        ld hl,#0xabff
        ld de,#0x40
        ld c,#7
        call 0xbcce

        ; Init the heap. Do it after the disk ROM is ready, because that may change HIMEM
		; Not needed: no malloc/free is used.
        ; call __sdcc_heap_init

; Enter the C main.
        call    _main
        ; TODO figure out how to return to BASIC cleanly (we probably erased a bit much)
endless:
_exit::
        jr endless

do_nothing:
        scf
        ret

That's all we need, we can now run our C code from ROM. If it fits there, that is.

Making the code a bit smaller

At this point I had about 20-24K of code. A ROM is 16K, and, for now, I don't want the headache of switching between different ROMs for different parts of the code (maybe later, if I can't fit it all in one single ROM). So, I commented out more code from the interpreter and garbage collector and started simplifying some things. There is no difference anymore between static and dynamic allocations. All integer math code is removed as well as most other things. Also a big way to save space was replacing printf with simpler custom functions that don't do much custom formatting. I have way to print 16 and 8 bit values, NULL terminated strings, and "Pascal" style strings with a length known in advance. That is good enough, and it saved several kilobytes to not need printf's % parsing, decimal integer formatting, etc.

After this, I could finally try to run my VM again. And... it ran out of memory.

I guess that makes sense. Even after moving the VM code out of the way in a ROM, I still get only 42K of RAM free. And that has to hold not only the Smalltalk image data, but also all variables needed by the interpreter, the AMSDOS file buffers, etc. Well, it doesn't fit.

But, if you know a bit about the CPC, you have noticed the existence of the CPC 6128, a machine with twice as much RAM as what I have explained so far. How does this work? Well, the CPU cannot access any more RAM directly. So, there is a special hardware register to map extra RAM pages. There are various ways to use this, they are somewhat handcrafted to run CP/M Plus, a modified version of CP/M that was shipped with the machine and put the user code all in expansion memory, allowing applications up to 63K of RAM usage, with the video memory, and all system internals remaning in the main 64K of RAM.

Anyway, I decided to do something simple, and the simplest way to handle this memory in a way that's friendly to the CPC firmware is to map it by blocks of 16K at addresses 0x4000-0x7FFF. In my previous article about garbage collectors, I considered some more advanced options, but in the end I decided to keep things as simple as possible: all the Smalltalk heap and objects will live in this 4000-7FFF space, and I can keep the remaining parts of RAM (0100-3FFF and 8000-A7000) for the internal needs of the interpreter and GC (which will both need some variables and structures).

Of course, this create a new difficulty: the C compiler isn't designed to work with such memory banks. So, I will have to explicitly map the right memory page before accessing these variables. There are again multiple ways to handle this (so many choices...). For example, I could use a 16-bit address referring to a position in the 64K of extended RAM. But that would prevent having anything more than 64K of extended RAM. There are memory expansions for the CPC, and if just loading the base image already requires nearly 50K, I should rather prepare for use of one of these.

So, I will instead use indirect pointers. The idea is that pointers to objects can be represented as such:

struct indirection {
        unsigned char bank;
        struct object* ptr;
};

The "bank" here is the value to send to the Gate Array (the Amstrad CPC memory management chip) to map the correct RRAM page. For now we have to know about the vlaues C4, C5, C6 and C7 (these allow access to the 64K of extended memory) as well as C8 (this will put the 4000-7FFF from the base RAM back where it was).

The code to dereference an indirect pointer is like so:

__sfr __banked __at(0x7F00) GateArray;
// __sfr means this is accessed with the IN and OUT instructions of the Z80, instead of RAM access instructions
// __banked means the address is still 16bit (normally for IN and OUT it should be 8 bits, but manufacturers of hardware using the Z80 didn't respect this)
// __at(0x7f00) forces the variable to be at that address
// This means we can now write:
// GateArray = xx;
// And this will generate the appropriate assembler code:
// LD BC,0x7f00
// LD A,xx
// OUT (C),A

struct object* DEREF(struct indirection* x)
{
	    // Get the pointer first, in case the indirection is itself stored in
	    // the currently active bank (I didn't use that in the end, so maybe I
	    // can simplify this)
        struct object* o = x->ptr;
        GateArray = x->bank;
        return o;
}

I then modified the Smalltalk object structure to contain "indirections" instead of pointers to other objects. And I also added a lot of calls to DEREF all over the code of the interpreter, to map the correct memory page before accessing anything. As a first step, I didn't try to be smart about it (well, of course I did, then I had a lot of bugs, and I stopped trying to be smart, and then it worked better). So, I made a rule to not have any direct object* pointer stored anywhere in the interpreter code, only indirection* and calls to DEREF over and over. This will not be fast, as every other instruction will be switching to a different memory bank (or uselessly activating a bank that is already activated). But let's try to make it work this way first.

Starting from here, I have two choices. I can store indirections directly inside objects, or separately. Basically, one of these two:

// Indirection stored directly into the object
struct object {
        uint size;
        struct indirection class;
        struct indirection data[0];
};

// Indirection stored separately
struct object {
        uint size;
        struct indirection *class;
        struct indirection *data[0];
};

As you probably guessed by the fact that I named this structure "indirection", I went with the second option. The reason for this is that it is much easier to code when object sizes are in multiple of 2 bytes, and not in "multiples of 3 + 2". And also because you would have to be very careful to store the destination pointer before connecting the new bank, if the indirection itself was stored in a bank.

So, now, I have a big table of indirections living somewhere in 0100-3FFF. This table is filled with pointers to objects in the banked RAM. It creates a limitation to about 5000 objects in the system (16K/3). I have not hit this limit yet, but the initial image is already about half of that. We'll see how that goes.

This indirection also may simplify work for the garbage collector: it allows to easily move objects from one address to another, since all references to an object will be through the indirection table: just update the indirection table entry, and you're done. I think the garbage collector will benefit from that simplicity of not having to crawl through all other objects to find references to the one being moved.

Another advantage, mainly for debugging, is that there are now a lot of constraints on what a valid pointer is. An indirection must have an address in the 0100-3fff range. The pointed object must have a bank in the C4-C8 range, and a pointer in the 4000-7FFF value range. I have added some checks to my DEREF function to validate this. If the code attempts to dereference an invalid reference, execution stops with an error message.

Finally, these contraints also mean we have several unused bits, both in the pointers and in the indirections themselves. These are useful to store extra information. In indirection pointers, the high bit being set means that the value is not an indirection, but a direct 15-bit integer value. In indirections, unused bits in the bank or in the pointer fields can be used by the garbage collector to track which objects have already been garbage collected (the original code also did similar things, but using the size field in the objects themselves for the gc, and the lower bits of the pointers to mark integers, relying on the fact that allocations are aligned to 2 bytes and so for pointers that bit is always 0).

So, after all these changes and a bit of debugging, we're getting somewhere. The full image can be loaded into banks, and then execution starts. Unfortunately, it seems to be confused after a short time, returning to the wrong place after executing a "block2 (blocks are similar to lambdas in other programming languages).

So, I have more debugging to do. I have to keep a balance between adding more debug statements in the code, and not overrunning my budget of 16K of code in a single ROM. For now, I do not want to leave C and start rewriting things in assembler, because it would make it harder to change things and try new ideas. I hope I can delay that until the code is mostly running and there are at least some parts I'm sure won't change anymore. Then I can rewrite or hand optimize these first, and save enough space for debugging the other parts. Also, working in the ACE emulator has been very helpful, setting breakpoints in strategic places allowed me to find and fix many of the problems.

I continued debugging, identifying and fixing various issues. For example, if I use the DEREF function twice in the same line of code, like so:

DEREF(x) = DEREF(y);

that doesn't work right, because the = operator is not a sequence point. That means the bank activations for each side will not happen in the right order for things to be read and written where they should. So this must always be done through a temporary variable, which ensures proper sequencing.

Other mistakes include too many or not enough calls to DEREF, or simply writing code that doesn't behave the same way as the original while adding the DEREF calls. Eventually, I got it to run! Well, sort of.

Running out of space (again)

One of the very first things the Smalltalk image does is creating the 256 "Char" objects for each of the ASCII characters. Each of them is a full-blown object, using up 6 bytes in the heap. This saves a lot of special-casing in the VM. But, the loop to create them ends up creating a lot of objects and filling up my indirection array. Eventually the indirection array grows into the 4000-FFFF space, overwrites some objects there, and things crash badly in various ways. So maybe this indirection was not such a great idea. Or maybe I need to start thinking about the garbage collection, and see how many entries in that array it would manage to free. But the problem with that is, at the moment, I have less than 128 bytes left in my ROM, so a GC is not going to fit. Unless I start removing debugging traces and optimizing some code in assembler. Which I don't want to do until I'm sure the indirection table approach is working.

Well, that's all for today. See you in the next article in the series...

Ideas for a garbage collector for memory-banked systems

Posted by pulkomandy on Thu Feb 29 21:20:52 2024 • Comments (0) •

Well, apparently I'm not ready to give up on Smalltalk for the Amstrad CPC yet.

The previous attempt showed that it is not possible to fit even a rather simplified Smalltalk interpreter into the main memory of the CPC. I guess this comes as a surprise to no one, since there are 16KB used by the display, about 8 used by the OS, and so only around 40K left for everything else.

It turns out, others have tried this and somewhat succeeded. Pocket Smalltalk for the Palm Pilot promizes to fit a complete game (Dungeoneers) in a similar space (I have not managed to run the game and have no idea if it is good). The Palm is a bit quirky in the way it works, but also, Pocket Smalltalk abandons the idea of editing things directly in the live Smalltalk image, instead it is designed as a cross development environment, more similar to a Java virtual machine in a way. That's why I had not considered it in my porting attempts. Well, maybe I should revisit it.

There is also TerseTalk, but that seems a bit too limited for me (only 256 objects?) and also not implemented yet (the code is extremely incomplete)

On my CPC, I currently have 2MB of memory, so, surely there would not be any problems running a larger version of Smalltalk. Maybe even Smalltalk-80 could be made to work somehow. IT will probably be slow. But would it even be possible?

The Z80 CPU cannot access all this memory linearly. Instead, all we get is a 16KB window that can be moved to any part of that 2MB space. This is one of the annoying things to handle with native code. Some people choose to keep their work running on the CPC 464, and not use any of this extended memory, to save the headaches. (Not attacking or ridiculing anyone here, it is a valid choice and there are other reasons for it). Anyway, I got thinking about how the abstractions of Smalltalk could make it possible to handle this memory mapping scheme transparently for the developper.

So I started to read about garbage collectors. This post is an attempt to ogganize these thoughts and see if we can come up with a design that will have acceptable performance. I will continue updating it as I continue thinking about it...

Garbage collection basics

First let's talk about the general idea of dynamic memory allocation. The basic need is simple: an application is running and it wants to use memory for temporary storage, and release it later when not needed anymore, so it can be reused by something else.

One way to handle this is a very simple API like the one available in C: malloc() and free(). The application tells the memory allocator when it needs a new block of memory (nidicating how large the block should be), and gets a pointer to that block. It returns the block to the allocator when it does not need it anymore.

This seems as simple as it gets. Just two functions. But, it turns out, the job of the memory allocator is quite complicated. While the application keeps track of the currently allocated memory blocks, the allocator is responsible for keeping track of the ones that are not in use. This will tipically be managed with a "free list" of all the blocks that are not currently allocated. This can be implemented in a quite simple way, but there can be tricky problems, for example if the application forgets to free some block, or even worse, if it tries to free the same block multiple times. And even if the application has no bugs, this approach has another limitation: as long as a block is allocated and in use, it is not possible to move it in memory to a different address. This is because the application has a direct pointer to the block, and the allocator does not know what the application is doing with it.

The idea of garbage collectors is to solve some of these problems. From the API side, the problems all come from the free() function. It is hard to mis-use the malloc function. At worst, the application asks for more memory than is available, and gets an error instead of a valid pointer. The free() function however, can be misused, by calling it with addresses that were not returned by malloc, by calling it twice with the same address, by forgetting to call it, by trying to free blocks where information stored by the memory allocator has been erased, etc.

So, let's remove that function! Problem solved. This is the end of the article. Thanks for reading.

Oh wait. Of course that introduces a new problem. How is the allocator going to know when a piece of memory can be reclaimed and put back into the free list? Garbage collectors solve this problem by analyzing the content of the memory block. This, of course, requires some level of cooperation from the application. The general idea is that the garbage collector can access the data structures stored in the allocated blocks, and identify if they have references to other blocks. A block that is not referenced by anything else is unreachable by the application, and so, its memory can be reused.

The trick now is thinking about an efficient way to find such unreferenced blocks. But, since the garbage collector knows how to access the application memory and identify references in there, it may also write this memory and update the references to point to some new place. This gives us a new tool: it is now possible to move existing blocks in memory to a different place. As we will see, this opens a lot of new possibilities. The obvious one is that, after a while, with a typical malloc/free allocator, the memory becomes fragmented, and there are only very small pieces left. There is no easy way to get out of that situation. But with a garbage collector, you can. This makes garbage collection well suited to small memory systems, where fragmentation will quickly be a problem.

Establishing our invariants

I have read some texts about garbage collectors, and most of them tend to present algorithms starting with the earliest ones, and introducing the new ideas gradually. This is a good idea, but I think I will make a change still: first I want to introduce some terms and explanations. The thing is, each of the algorithm is described in its original research paper with its own terminology, especially for the earlier ones. This, for me, makes it harder to think about how these algorithms can be combined together, which part of one can be reused in another, etc. We will need this, because most of the research on garbage collectors is done with much larger machines in mind, and no banked memory (at least the ones I could easily find. I am interested if you have papers or algorithm descriptions for something using banked memory).

The abstraction I found very useful is the tricolor invariant. It formalizes a bit what we need to do. Let's have a brief look at it.

The situation is that we have a set of objects in our memory. These objects have references (pointers) to each other. What we need to do is, starting from a root object or set of objects, determine another set of objects where we are sure that, by following any number of references, it is not possible to reach that second set starting from the first one.

If the root set is correctly chosen, this will allow to determine a set of objects that cannot be reached in any way from the application code. These objects are unused, and their memory can be reused.

So, how does a garbage collector goes about doing that? In very general terms, and without giving any specific algorithm, we can establish a solution that is proven to work. This is exactly what the tricolor invariant is.

What it does is attribute a color to each object in the system. There are 3 different colors. It could have been red, green and blue, but due to the technical limitation of printing technologies at the time, most litterature uses black, white and grey instead. I will stick with these to keep things simple.

That's the "tricolor" part explained. Now let's talk about the invariant. An invariant is a condition that must always be verified. The invariant is:

A black object must never have a direct reference to a white object.

At first this does not seem helpful. But it is the key to understanding all garbage collection algorithms. Let's look at our 3 set of objects now:

There is a set of black objects. They may reference grey ones and other black ones, but never white ones.
There is a set of white objects. They may reference anything.
There is a set of grey objects. They may reference anything as well.

So, because of the invariant, if we want to reach an object from another, we have a constraint (which is essentially a different way to express the invariant):

All paths from a black object to a white object will at some point travel through a grey object

Now, let's consider a special case. What if there are no grey objects in the system? In this case, because of the invariant, we are sure that there are no references from any of the black objects to the white ones.

I think you can see where we're going with this: if the "black" set is the objects (potentially) reachable by the application, and the grey set is empty, we are sure that the white set is made of unused objects, which can be deleted.

So, we can now redefine the job of the garbage collector as a two step process:

Step 1: colorize all objects such as the invariant is verified
Step 2: modify the coloring of objects to try to reduce the number of grey objects to 0
Step 3: when there are no grey objects, the remaining white objects are unused and their memory can be reclaimed

We have to eliminate one edge case: the stupid garbage collector that just makes all objects white. Then the invariant is verified, there are no grey objects, and so everything can be deleted. Simple, but not very useful. So, the initial setup in step 1 must have either one black or one grey object. Usually this is called the root: the object(s) that allow to reach all others. In Smalltalk, that is the object references currently in the application stack.

As I have written, this does not describe any specific algorithm just yet. Garbage collectors have to decide on various things, including: how to remember the color of nodes, how to assign the initial colors, and how to process the nodes to reduce the number of grey ones.

malloc/free, again

Having established all this theory, let's look at the practical example of a typical malloc/free allocator.

Initially all memory is unreachable by the application. We can consider this a "white" memory block.

Every call to malloc takes some memory and makes it available to the application. Since the content of that memory is not yet filled by the application, it cannot at this time contain any reference to anything in the application, and even less any references to any other "white" (unused) area of memory. So, we can safely mark this block as black.

free() is used to return memory to the application. In theory, we don't know if the application still has references to that block in any other part of its "black" memory. So, we should mark this block black or grey. But, since we have no way to know what the application is doing, we don't have any algorithm to make this block white again. So, we have to trust the application that it is correct, and mark the block white.

It seems difficult to make anything simpler. But, if we look closer, we notice two things: first, it is not as simple as it looks. malloc() is given just a size, and it can't magically find a block of "white" memory where that fits. So, there will be some data structure to do that. We will see later how garbage collectors solve this. For a typicall malloc/free implementation, this will need to store a pointer or size at the start of each black memory block, and also at the start of white blocks, but since these are not used by the app, it's not a problem.

The second problem is more important, it's the way the invariant is maintained true. Essentially, we have delegated our job back to the application. This is what makes malloc/free an unreliable solution: the application developper probably does not want to think about memory allocation too much, and surely they will occasionally get it wrong. That's the problem we are trying to solve here. So, let's move on to better options.

The garbage collector in Little Smalltalk

Let's see how Little Smalltalk handles this. According to the comments in the sourcecode, this is the Baker two-space algorithm. It is relatively simple.

First let's look at the way new objects are allocated. We start from the first available byte in memory. At every allocation, we increment a pointer by the size of the allocated object. That's it. Objects are allocated next to each other this way, until we reach half of the available memory. Then, the garbage collector starts working.

As we know, it must perform its job in three steps.

Step 1: colorize all objects such as the invariant is verified

All objects in the garbage collected area are first considered white, except the root, which is grey.

Since there are no black objects, the invariant is verified.

Note: the root can be either the first object in the memory, or it can be stored outside of the memory in some separate storage (for example, in Smalltalk, the execution stack is the root and is not typically a dynamically allocated object).

Step 2: modify the coloring of objects to try to reduce the number of grey objects to 0

Get the next grey object.

Analyze each reference in that object. Mark all the referenced objects as grey if they are still white.

When done, mark the object black.

Continue with the next grey object.

Eventually, there will be no more grey objects and the algorithm is complete. In the worst case, this stops when all white objects have been reached and marked grey, and eventually marked black after processing.

So far, this is how you would obviously implement such an algorithm. The trick is in how the algorithm remembers which objects are white, grey of black.

In Baker's two space algorithm, the "black" and "grey" objects are moved to the second half of the memory, while the white objects remain in the first half. This is pretty interesting, because it means all the information needed is already stored in the object addresses. No additional data structures at all are needed. Pretty impressive. But that comes at the cost of not being able to use half of the memory. Maybe that's a bit much.

Other algorithms instead reserve 1 or 2 bits in the objects to store the current color (only 1 bit of grey objects are remembered by some other means, for example, storing their addresses in a stack). That's likely more efficient, especially if you can find some available space for that extra bit. For example, if you have a 16-bit size field in objects, you could easily use one of these bits for the color. No one is going to fill the entire RAM with a single object, that would be stupid. So the size won't ever get that large.

Anyway, there is another problem with that general algorithm. We started by making almost all objects white. Since white objects have to be moved to a new place in memory, that means eventually moving all "live" objects into a new place, and also updating all references to these objects in other objects. If we do this for a large memory (say, 512KB or 2MB) at once, this is going to lock the program from running for quite a long time. Incremental garbage collectors solve this by making only one subset of objects white. But it is not so simple to do that, since we must make sure that the invariant is established (no black objects point to white ones).

Specificities of the Amstrad CPC

One thing we need to keep in mind is the CPC memory is addressed in banks. This means we can only access 16K at once (well, not exactly, but let's keep this simple). Is there a garbage collector taking this into account?

There does not seem to be a lot of research on such things. Which makes sense: when garbage collectors for larger memory became a reasonable thing to do, it was at least on 16 or even 32 bit CPUs, and possibly with virtual memory, and even hardware acceleration to help the GC on some LISP machines. All entertaining reading, but not really what I need.

Note: after writing the first version of this article I found about this article from Lieberman and Hewitt, which is quite similar to the solution I ended up with. It turns out, there probably is more research then you think about these things, it's just not immediately obvious where to find it.

So, I have to get back to the basics and consider how I can tweak them for my need. The idea is to work one 16K page at a time. When the page is full, do a garbage collecting cycle on it, so that we don't leave too many unused objects behind. Then, move on to the next 16K page and start over. Hopefully, we can then have one "in use" page with the recently allocated objects, some of which will have a short life and be garbage collected as soon as the page is full. And we have only a limited number of accesses to objects in other pages.

But, how do we implement that? Since we don't want to go through all of the memory, we can take some inspiration from more complex garbage collectors.

The algorithm can be used to do a complete garbage collection of all the system. This is done by first making all objects "white" (this is easy: just create any new object with the marking bit set to the "white" value). Then, start from the "root" (the references to objects in the program stack), and mark all referenced objects as grey. Then one by one, take one grey object, add all its referenced objects to the grey list, and so on, until there are no more grey objects.

When you are done, all allocated objects are now "black". Then, the memory can be accessed directly (page by page, not using object references), and the available space can be reused for allocating new objects. If you are used to C style memory allocators, you would then start adding all this free space to some kind of free list or something like that. But that's a bad idea: you are using memory to keep track of your free memory. With a garbage collected system, what you can do instead, is move objects around so that the remaining "black" objects are all tightly packed at the start of memory. Then, allocations simply consists of appending new objects at the end. Since each object has a field indicating its size, you can then use that info to figure out where the next object in memory is, and allocations are just incrementing the "last object" pointer by adding the size of the object to it.

Of course, when moving objects around, you must update references to them. In the two spaces method, this is done by putting a special marker at the old place where the object was, indicating its new address. As objects are moved to the new space, they can access this info. Once everything is moved to the new space, the old space is empty and has only unneeded redirections. But again, doing it this way would require going through all memory.

The tricolor scheme is better than that. One interesting thing is, if you read the above carefully, at some point I said "first, make all objects white". But this is a stronger requirement than what we really need. We have 1 condition that we must satisfy: black objects must not point to white objects directly, only to other grey or black ones. What if, initially, all our objects are black instead of white? This also verifies that condition, and the algorithm can run from there.

This opens some new possibilities. It means we don't have to perform garbage collection on all the RAM everytime. If we manage to build a small set of white objects, it can be scanned quite fast and without having to go in all the memory banks. For example, let's start by making all objects initially black. Then, we create a new object (in the currently active bank) and make it white. Surely we will store a reference to this object somewhere soon, and we will store it in another object. Since all our other objects are black, we are breaking our invariant condition here (we would have a black object pointing to a white one). So, we have two solutions: make the new object grey, or make the old one that points on it grey. Both of these preserve the condition. Making the new object grey is not very interesting, however: quickly, all our new objects will all become grey, then be considered by the GC as still used, and never deleted. So we must take the other option.

When we decide to run our garbage collection (for example, because the working memory bank is full), we know that it can only delete objects from the current bank (the only ones that can be white at this time). The "grey" list of references was built as we modified any other objects in other banks. We can check these objects, but we can optimize that check: since all banks other than the current one contains only grey and black objects, we only need to check references to the current bank. All other references would either point to a black object (that has already been scanned), or a grey object (that will be scanned already). We will not learn anything by checking these. And so we can quickly scan the current bank, find if all objects in it are still referenced, and then compact it and start over. We can do this until there is only one free 16K bank left. Then, we need to do a complete GC pass on all the RAM, that will take a longer time, but it will clean all remaining objects that became unreferenced after they were pushed out of the current bank.

Note an important thing here: the current bank always ends up being full, then garbage collected and the remaining objects moved elsewhere. There is also a lot of back and forth between this current bank and all the others during the gc phase (but also probably during running the Smalltalk interpreter). So, I suggest the following layout:

0000-3FFF: The "current bank" with the newly allocated objects (sharing space with other things if needed: interrupt handlers, stack, global variables for the interpreter)
4000-7FFF: In main RAM, the screen memory. Overlaid with memory banks being accessed when the VM is running. Just map the bank out when printing something to the screen.
8000-FFFF: Interpreter and GC code, with the AMSDOS variables in the middle.

So we maintain two pointers for memory allocations: one in the 0000-3FFF area for freshly created objects. And one for the 4000-7FFF area in banks, for the longer lifetime objects. Once the 0000-3FFF RAM is full of objects, we do a small GC cycle of if to take care of the recently created and short-lived objects, then we move the remaining things to banks, and we start over. Until there are no banks left. Then, one big GC cycle to compact all the banks (if we ever manage to fill that much RAM, that is). We may even save ourselves the trouble for the "big GC": instead of compacting the RAM in place, if you run out of RAM, just save your session to disk. This is like a GC but still-referenced objects are written to disk instead of another RAM space. Then reload it. That's it, you just defragmented your whole memory!

One problem with this system remains: Little Smalltalk normally represents objects using 16-bit "cells" (a cell can be either: a pointer to another object, the object size, or a direct integer value. The VM uses the low bit to distinguish integers from pointers (pointers are always to an even address; integers are shifted by 1 and the low bit is always set for example. We could change this and use the highest bit instead, since all our pointers to Smalltalk objects would be in 0000-7FFF). But we have to store info in all pointers/references to know which bank it is pointing to. This would mean switching to 24-bit cells. This makes everything more memory hungry by 50%. Or, we could keep the internal pointers as 16 bit values, and have an indirection table somewhere to find the right destination. For example let's say the interpreter code is now moved to a ROM, so we can do this:

0000-3FFF: The "current bank" with the newly allocated objects (sharing space with other things if needed: interrupt handlers, stack, global variables for the interpreter)
4000-7FFF: In main RAM, the indirection table. Overlaid with memory banks being accessed when the VM is running. Just map the bank out when printing something to the screen.
8000-BFFF: AMSDOS and firmware stuff, variables for the interpreter if needed.
C000-FFFF: In main RAM, screen memory (write-only). In ROM, the Smalltalk interpreter code.

So we have 16K bytes indirection table. If each reference needs 3 bytes, that's enough for 5461 objects. Maybe the table can go up to 9FFF, and then we have enough for 8000 objects. That would be enough to fill the 2MB memory if each object is exactly 256 bytes. Is this a reasonable compromise? I think objects will typically be smaller than that. So maybe using 24-bit cells is better. But then, the indirection table also simplifies things when moving objects around since you can use the table to redirect to the new address of an object when it is moved. Otherwise, you have to update all objects which are pointing to it. Not necessarily a big deal, this can be done at the same time as scanning the grey objects: look at the grey object, see that it references something white in the "new allocations" area, immediately move the white thing (if not already moved) to a more persistent storage in extended memory, and update the grey object with the new address.

Well, all of this easier said than done. I guess I have to write an implementation now?

Porting Smalltalk to the Amstrad CPC: a failed experiment

Posted by pulkomandy on Sat Feb 24 12:38:04 2024 • Comments (0) •

This week project was an experiment with porting Smalltalk to the Amstrad CPC. It didn't work, but it was interesting. So here are my notes.

About Smalltalk

Smalltalk is an object oriented language developed at Xerox PARC. It started in 1969 but the first release outside of PARC was in 1980, and was in fact the third major rewrite of the language.

Smalltalk was part of the "Dynabook" project, which was a research project about imagining the future of computing. The vision for this was a small personal tablet computer (hardware somewhat similar to an iPad), where you could run your own heavily customized system. Of course, this was way out of reach of 1970s technology. But that didn't stop the team. They built an "interim" machine to be the first step towards this: the Xerox Alto.

The Alto is a personal computer, but with a large CRT display and not portable at all. It implements the instruction set from the Data General Nova, an earlier machine that has a rather simple CPU, on top of a microcoded system that allows to reuse the same logic for also driving the display, implementing the disk controller, and a few other things.

The early versions of the hardware didn't have a lot of RAM, but since the project was about designing the future of computing, that was a problem. So, Smalltalk runs in a virtual machine, implemented in software, that implements virtual memory: most of the data is stored on disk, and loaded on-demand when the Smalltalk software needs it, and all of this, without the user/programmer having to think about it.

Why port it to the Amstrad CPC?

The CPC was one of the first computers I had access to. I still own one (several in fact) and still use it from time to time, as well as developping hardware expansions for it. The initial machine has 64K of RAM, later expanded to 128. But mine has a 2MB expansion card connected to it. However, the machine runs a Z80 CPU, which can only access a total of 64K of memory. As a result, extended memory is accessible in banks of 16K, one of which can be accessed by the CPU at a time. There are no languages that can handle this automatically: the BASIC shipped with the machine was never updated, and the SDCC compiler can do it, but in a bit limited way.

Anyway, the BASIC on the CPC is a rather good implementation, but it is still BASIC: not very fast, and not very well suited to structured programming. And C requires a cross-development setup, where the code is written and compiled on another machine, and the CPC is used just to run it. That's a bit sad, the machine should be able to work on its own.

There aren't many languages that can fit in these 64K of memory. Well, you can go with Pascal, and in fact there are pretty good versions of it for CP/M. You can go with C, but the Z80 CPU is not very well suited to a typical C compiler, and so the generated assembler code is messy and not efficient, and, more importantly, there is no hope for a good C compiler running directly on the machine (that would be possible with a simpler CPU or maybe if it was more reasonable to use the memory banks).

Moreover, my goal here is to have a language for quick prototyping and experiments. When I need the full performance of the machine, I'm happy with using a C compiler or even hand optimizing some parts in assembler. But this is a bit annoying to do on a larger scale. Essentially, I want something acting as the main interface for the machine, just like BASIC does, but with a more modern language.

This reduces the set of candidates to: BASIC itself, stack based languages such as Forth and LISP (which would both be good choices but are a bit too different from what I'm used to), maybe something like Lua or Wren (but not the existing implementation of Wren, which is based on floating point numbers and NaN tagging, not really suitable for the Amstrad. And there is Smalltalk.

The Smalltalk from Xerox as it was released in 1980 is interesting. The main interpreter could fit in a dozen or so kilobytes of code. That's just the amount we have on a CPC ROM expansion bank, so it could fit in the place occupied by the BASIC. However, Smalltalk is not just a language, it is a complete system, with a GUI, overlapping windows, a built-in debugger, and so on. Well, at least it would be interesting to see all of this running on the CPC, and how slow it would be. But maybe not as a practical thing to use, at least not as a replacement for the BASIC. The Smalltalk world mainly continued from there, and today you can find Squeak and Pharo, which are direct desendants of this version of Smalltalk, but keep getting larger and larger, and are probably much more than the CPC can handle.

Little Smalltalk

So, I had this idea of writing a Smalltalk VM for the CPC in the back of my head for a while. Lately, I looked again for existing Smalltalk implementation that I could maybe use as a starting point. You know, start from working C code, even if slow and inefficient, and gradually rewrite things out in assembler until I have something usable.

These searches brought me to Little Smalltalk. This is a dialect of Smalltalk, not quite compatible with the standard Smalltalk-80. The syntax is very similar, with a few minor differences. But, maybe more interestingly, the GUI environment and built-in debugger and all that stuff is not available. This is a command-line Smalltalk, initially designed for use from a simple serial terminal on a UNIX timeshared computer.

There are four versions of Little Smalltalk, each improving in the previous one and making the code simpler and nicer. Since this is rather simple C code, and it compiles to a reasonably small executable, it was not unreasonable to attempt porting it to the CPC. So I started hacking.

Little Smalltalk is a bit different from Smalltalk-80 in that it is not a completely abstract VM specification. For speed, it retains the pointer size of the host machine and uses that as its main type. With this and a few other tricks, and the removal of the GUI parts, it should be quite simpler than Smalltalk-80. Simple enough that I could attempt running it in a few evenings of work.

I started by compiling all 4 versions on my PC, and looking at the code size for each. Version 4 is indeed the smaller one, so I started with that. It also is the only one not using K&R style function prototypes, which is good for me, because SDCC (the C compiler I'm going to use) doesn't implement those.

From the start I noticed that the "image" size (the Smalltalk bytecode) was too large to fit in RAM, but I thought this was because of building it for a 64bit machine in my tests, and the final version would be considerably smaller.

A note on running C on the Amstrad CPC

There are various tutorials for this out there, but the ones I found were slightly out of date. So, I'm going to share my own files here, and explain what I needed to change in the Little Smalltalk implementation to get it running as much as I did (not enough for anything usable).

C application startup

If you know something about C, you know that execution of a C program starts with the main() function. Well, that isn't entirely true. Before this can be run, the code should make sure that all global and static variables are initialized to the correct values. In SDCC, this is implemented by the GSINIT code segment. To make sure this code is run before everything else, we need a little bit of assembler code, usually called CRT0. SDCC ships with a CRT0 meant for their own Z80 simulator platform, but that is not quite suitable for the CPC. So, here is my version suitable for the CPC.

  ;; This is the CRT0 module. Declaring this will make sure it shows properly
  ;; in .map files and other places.
  .module crt0

  ;; Note that the main function is a global that can be found elsewhere.
  ;; The assembler then knows how to emit a reference to it for the linker to
  ;; handle. Note that C functions are translated to assembler labels with an
  ;; extra _ in front of them.
  .globl  _main

  ;; The output file starts with a HEADER section, that is not loaded. In theory
  ;; we could put a standard AMSDOS header here, but I chose to do that with a
  ;; separate tool instead.
  .area _HEADER

  ;; Then we have all the sections with read-only code and data
  ;; GSINIT and GSFINAL are for the initialization code run before main (see
  ;; explanation below where they are filled in).
  ;; HOME and CODE is for the code and constants
  ;; INITIALIZER and INITIALIZED are for global and static variables initialized
  ;; with some value in the code. The GSINIT code should copy them from INITIALIZER
  ;; (in ROM) to INITIALIZED (in RAM) so they have the right value. In our case,
  ;; all the code and data is loaded in RAM, so we don't really bother with that.
  ;; DATA and BSS are the RAM areas used by the software. BSS should be cleared
  ;; to 0 during program startup.
  ;; Finally there is the HEAP, this is the area used for dynamic allocations.
  ;; This could be SDCC's malloc, but it is not a great implementation, in our
  ;; case we will use a custom and simplified allocator that can't free any memory.
  ;;
  ;; Each compilation unit (.rel file) can add data independently to each of these
  ;; sections. Then, the linker concatenates the section from each of them
  ;; (for example, all the GSINITs together) to generate the final output file.
  .area   _GSINIT
  .area   _GSFINAL
  .area   _HOME
  .area   _CODE
  .area   _INITIALIZER

  .area _INITIALIZED
  .area   _DATA
  .area _BSEG
  .area   _BSS

  ;; Now that we have defined all sections and in what order we want them (well,
  ;; in theory, but there seem to be other things messing with this order currently),
  ;; we can start filling in these sections with data or define symbols that point
  ;; to important parts of the memory, so these symbols can be accessed by the
  ;; C code (there will be examples of this later in the article).
  .area   _HEAP
  ;; Define a symbol for the "heap" start. This is the first address not used by SDCC
  ;; generated code, so anything above that is free for our own needs. Our simplified
  ;; memory allocator will make use of it.
_heap_start::

  .area   _GSINIT
gsinit::
  ;; All code that needs to be run at global init (before entering main) is inserted here by SDCC
  ;; Each compilation unit will generate little bits of code for each thing that needs initialization
  ;; (for example if you initialize a global or static variable with some complex expression).
  .area   _GSFINAL
  ;; The GSFINAL area is inserted after the GSINIT one. Usually nothing else gets added here, but
  ;; it could be used for example to implement atexit() or similar (code to run after main has
  ;; returned).
  ;; So, after the GSINIT code is done, we can call into the main function. We use JP to do a tail-call
  ;; optimization, the main function will return directly to the caller (to the BASIC interpreter for example).
  jp      _main

  .area   _CODE
  ;; Finally we need some code in the CODE segment. This will be put at the start
  ;; of the output file at 0x100 (see the linker script). For simplicity (since SDCC
  ;; does not generate the AMSDOS header for us), we put the entry point at that address.
  ;; TODO; it would be even better if the GSINIT and GSFINAL code was already at 0x100,
  ;; then we could directly have the program entry point pointing there.
  ;; Another option is to have GSINIT and GSFINAL at the end of the code, and overlapping
  ;; with the HEAP area. Since the init code is run only once, this allows to have this code
  ;; erased after it is run and not needed anymore. Maybe I will write about this
  ;; trick in another article later.
_start::
        jp gsinit

  ;; We also define the exit() function to be an infinite loop, because our code
  ;; expects that to be available.
_exit::
        jr _exit

So, we have a CRT0 file now. How do we use it? Well, we tell the linker about it using a linker script. Here is the one for the ImageBuilder tool (more on what that is in the next section).

-mjwx
-i imagebuilder.ihx
-b _CODE = 0x100
-k /bin/../data/sdcc/lib/z80
-k /packages/sdcc-4.4.0-1/.self/data/sdcc/lib/z80
-l z80
../crt0_cpc.rel
imageBuilder.rel

-e

This generates a file named "imagebuilder.ihx" in Intel hex format. It also forces the _CODE section to start at address 0x100 (this is where we want to load our program). It then declares the Z80 standard libraries, the linker will use these to provide implementations of standard functions (printf, malloc, strcpy, ...) if they are used. And finally, we have our two object files, the CRT0 and the output of the C compiler.

Finally we can tie this all together with a simple build script:

sdcc -c -mz80 --no-std-crt0 --opt-code-size --max-allocs-per-node 100000 imageBuilder.c -I../source
sdldz80 -f lst4.lk
makebin -p -o 0x100 < imagebuilder.ihx > imagebuilder.bin
hideur imagebuilder.bin -o ../IB.BIN -t 2 -x "&100" -l "&100"

The first two steps are compiling the .c file, and linking it using the above linker script. This result in a file in intel hex format, but we need a binary. The makebin tool is used to perform the conversion. Finally, we use hideur maikeur to add an AMSDOS header setting up the load address and entry point. As a result, our executable can be run directly from BASIC. That's all, we're up and running!

Porting ImageBuilder

ImageBuilder is a tool that parses a text description of smalltalk classes and methods, and converts it into a binary smalltalk "Image" containing class and method descriptions, and the bytecode to run for each of the methods. I need to run it on the CPC because the image format is platform specific (mainly, the class and object headers depend on the native pointer size).

Besides the things described in the chapter above, I made a number of changes to the ImageBuilder to get it to run. I believe these can be useful to anyone attempting to port C code to the CPC, so, here is the details of what I changed.

Console output

Of course, we want to see what the program is doing. This is traced by printf statements througout the code. SDCC provides an implementation of printf, but it needs to know how to send the characters to the screen. For this, we need to use the firmware BB5A function. This function takes a parameter in A that is the character to print.

That's a good thing, because that is exactly the calling convention used by SDCC. So, we can just tell SDCC that this function exists at that address. We do this with a little function pointer trick.

#define bb5a ((void(*)(char))(0xBB5A))

What does this do exactly? Well, we define a macro called "bb5a" (this could be any name we want to give to the function). This macro takes the value 0xBB5A, and casts it into a function pointer, to a function that takes a char argument and returns nothing. That's all SDCC needs to know to call that function. Now, when we want to print a character, we can do this:

	bb5a('h');

Unfortunately, that is not enough to get printf to work. SDCC follows the C standard a bit too closely, and needs a putchar function that takes its parameter as a 16-bit value. Also, the code uses \n for newlines, while the CPC needs \r\n. We can take care of these two problems by defining our putchar function this way:

int putchar(int c) {
       if (c == '\n')
               bb5a('\r');
       bb5a(c);
}

File I/O

The ImageBuilder needs an input and an output file (the sourcecode and the generated bytercode image, respectively). The original code is written for UNIX, using FILE* and the corresponding standard APIs. These are not available on the CPC, but instead we have AMSDOS, which, in this case, provides a good enough support for file access. But, it is not designed for accessing it from C, and a few tricks will be needed.

First of all, we need to initialize the disk ROM. On the CPC, when a program is started with the RUN command, the system is de-initialized, and all ROMs are turned off. This means any attempts at disk access would instead try to target the tape drive. Well, technically that would work, but it's easier to use some modern mass storage (in my case, it is an SD card connected to the Albireo mass storage expansion). So, we need a function to re-initialize the disk ROM.

static inline void amsdos_init(void)
{
       __asm
               LD HL, #0xABFF
               LD DE, #0x40
               LD C, #7
               CALL #0xBCCE
       __endasm;
}

I kept this extremely simplified. Normally, this function should also save the current drive number for AMSDOS and restore it after reinitializing the ROM. It should probably not use hardcoded values for HL and DE, but determine where it is safe to put the AMSDOS buffers. Maybe it should init all the ROMs instead of just ROM 7. Anyway, for the ImageBuilder I just wanted to get up and running quickly. So, I call this from the start of the main() function.

Next, we need to open a file for input. Unfortunately, the firmware function does not accept its parameters in the SDCC calling convention. SDCC puts the first 8bit parameter in A, the second parameter in DE, and then everything else on the stack. The CAS_IN_OPEN firmware vector needs the filename length in B, the filename pointer in HL, and a temporary work buffer in DE. So, only DE is at the right place. However, by looking at the generated code, we can see that, to push the filename pointer to the stack, SDCC is also accidentally putting it in HL. I decided to abuse this since it made my life a little easier. So we end up with the following:

static char amsdos_in_open(char fnlength, const char* buffer, const char* const fn)
{
       // A = filename length (we need it in B)
       // DE = buffer
       // on stack = filename (but it is also in HL by accident)
       __asm
               ld b,a
               ;pop hl
               call #0xbc77
       __endasm;

       // TODO error handling
       // Popping is taken care by SDCC
       return 0;
}

While we're at it, the function to open a file for output is pretty similar:

static char amsdos_out_open(int fnlength, const char* buffer, const char* const fn)
{
       __asm
               ld b,a
               ;pop hl
               call #0xbc8c
       __endasm;

       // TODO error handling
       return 0;
}

And we can also add the functions to read and write characters, and close the files when done. These match the SDCC calling convention, so, we handle them like the BB5A function from earlier:

#define amsdos_getc ((char(*)(void))(0xBC80))
#define amsdos_putc ((void(*)(char))(0xBC95))
#define amsdos_in_close ((void(*)(void))(0xBC7A))
#define amsdos_out_close ((void(*)(void))(0xBC8F))

Finally, the original code also uses fgets to read the input file line by line. AMSDOS doesn't natively provide that, but we can easily provide an implementation. We just need to read the file until there is a newline, and make sure to add a NULL terminator after it.

static char amsdos_gets(char* buffer)
{
       char* fin = buffer;
       for (;;) {
               // FIXME test carry instead for returning
               *fin = amsdos_getc();
               if (*fin == '\n') {
                       *++fin = 0;
                       return 1;
               }
               if (*fin == '\0')
                       return 0;
               fin++;
       }
}

This is not really correct, the end of file is not properly detected (in fact, that's a problem from the way amsdos_getc was implemented here). But the work never got as far as reading the file until the end anyways.

Now we can implement our file IO. Note that in AMSDOS there are no file descriptors like in UNIX, there can be only one input and one output file opened at the same time. Fortunately, our needs are even simpler: the ImageBuilder first reads all of the input, closes it, and only then writes the output. So, all that's left to do is declare a working buffer for AMSDOS:

static char amsdos_buffer[2048];

And then we can replace all the calls to fopen, fgetc, fgets, fputc and fclose in the sourcecode.

Memory allocations

After that, we try to run the program and... it immediately fails after trying to allocate some memory. It turns out, SDCC only provides 1000 bytes of memory to use for malloc() by default. That is, of course, ridiculously small. Changing that requires manually recompiling the memory allocator. I have no idea why they made it that way. Anyway, for the vast majority of allocations, the ImageBuilder never frees them. The idea is to keep everything in RAM until the final image can be generated. This means we can use an extremely simple allocator:

static void* memalloc(int size)
{
       extern char heap_start;
       // Make sure heap is aligned since the low bit is used for non-pointer stuff
       static char* ptr = (char*)(((unsigned int)(&heap_start) + 1) & 0xfffe);

       char* allocated = ptr;
       ptr += size;

       return allocated;
}

This allocator works by having a pointer to the start of the free memory, that is simply incremented whenever we need a new allocation. Where is the "start of free memory"? Well, remember in the linker script we declared a "heap_start" label that points there. So, we just need to grab the address of that label. So we declare it as an extern variable, and take its address. We then make sure it is an even address. This is because I suspect some things in Little Smalltalk use the lowest bit for storing a flag, so all allocated addresses must be even.

The code also makes use of strdup, the implementation from SDCC would still want to use its own malloc, so let's replace that as well:

static char* mystrdup(const char* in) { // keep heap aligned int len = strlen(in) + 1; if (len & 1) len++; const char* out = memalloc(len); if (out != NULL) strcpy(out, in); return out; }

Here as well, we take care of keeping the heap aligned. Otherwise, there is nothing special about it.

Finally, there was one place where malloc/free was used, that is in the ClassCommands methods that does something like this:

	char *class, *super, *ivars;

	// Some code here ...

	class = strdup(...);
	super = strdup(...);
	ivars = strdup(...);

	// More code here using those strings...

	free(class); free(super); free(ivars);

This is easily replaced with the following:

	char class[16], super[16], ivars[83];

	// Some code here ...

	strcpy(class, ...);
	strcpy(super, ...);
	strcpy(ivars, ...);

	// More code here using those strings...

And that's all, we can now run the ImageBuilder!

... Or can we? (Running out of memory)

Well, it starts parsing and compiling things, and it seems to work. But after a while, it crashes. Fortunately, I'm running this all in the ACE CPC emulator, which provides a good debugger and allows me to watch the CPU registers in realtime. I see that the code seems to be using quite a lot of stack space, and the SP register (the stack pointer) ends up way lower than it is allowed to be. The CPC firmware only allocates a relatively small space to the stack, since it didn't expect me to run complicated C code like this, with recursion and use of stack buffers to store strings.

That's not really a problem, we can move the stack somewhere else. But where? there is now a lot of things going on, and we don't really have much space to fit it all. Of course, it would be convenient if we could use the display RAM. On the CPC, this uses 16K of RAM, or 1/4 of the available space. Can we put the stack there? Of course there's the problem that we can't use that space for both the stack and for displaying messages. So, let's try to do it only while the parser function with the big recursive calls is running. And we can restore it to its normal location when we are done with the complicated parsing code. Let's add this to the parseBody function:

// Declare a new global variable where we can temporarily store the stack pointer
int savesp;

	// When entering the parseBody function:
	// Disable interrupts (so the firmware doesn't notice us doing strange things with the SP register)
	// Save the old value of SP to a global variable
	// Point SP to the screen RAM
       __asm
               DI
               ld (#_savesp),sp
               ld sp,#0xffff
       __endasm;

	// When exiting the parseBody function:
	// Restore the stack pointer to where it was
       __asm
               ld sp,(#_savesp)
       __endasm;

OK, it is a bit crazy to do this without telling the C compiler about it. But here it works because this function has no parameters and no local variables. And we do these things just at the right places so the compiler does not attempt to access the stack by other ways (such as by using the IX register as the frame pointer).

Now, we can see some glitches on screen while the parser is running, but at least it doesn't erase some important things in memory while running. And this gets us a little bit further into parsing the file...

But how far, exactly?

Unfortunatly, it does not get us far enough. With this setup, and tuning the compiler for optimizations a bit (that's the --max-allocs-per-node setting you saw earlier in the build script), we end up with about 16K of RAM available for the heap. I made some tweaks to other parts of the code, reducing the size of some memory buffers that didn't need to be so large. This heap fills rather fast as the parser creates more and more object declarations and compiles bytecode for several methods from the imageSource definition file. Unfortunately, we run out of memory after compiling barely 10% of the input file. So, a rough estimation is that we'll need 10x as much memory to get through the complete file. There's no way that's going to fit into the Z80 address space. Even if I removed most of the firmware, even if I used all the screen RAM. And then, the resulting image file would still be around 100K. So, the conclusion is that this is no match for the space efficiency of the Amstrad CPC BASIC, unfortunately.

Conclusion

Do I give up on running Smalltalk on the CPC? No, not really. But, it will need a Smalltalk VM that can manage banked memory. That's doable, as I have said earlier in the article, in the Xerox Alto they even did it by swapping things to disk, so, surely, memory banks are not a problem. But, if I'm going to do that, I may as well use the fact that I have 2MB of RAM, and so I can run a complete Smalltalk-80 image. I don't know how ridiculously slow that will be. But in any case, there is little gain from constraining myself to Little Smalltalk. If you're going to need hundreds of kilobytes of RAM for something, you may as well throw in a full-blown GUI in it.

Should I try some other language someday as a possible alternative to BASIC? Well, I'd love to. Ideally the requirements for that would be to use only 16K of code (fitting in a ROM), and maybe a few hundred bytes of RAM, leaving as much as possible free for the user. I think I would be fine with needing to write scripts in a text editor (such as Protext) and not have to use the way it is done in BASIC. Or maybe just a REPL where one can define custom functions and so would be fine. In Little Smalltalk, the editing of functions is left to an external text editor, and when the text editor returns, the resulting text is parsed and turned into bytecode. I think that's an interesting idea. Well, I'll keep looking at other options...

Attempting to revive a PlayStation 3

Posted by pulkomandy on Mon Jan 1 11:25:33 2024 • Comments (0) •

Many years ago, my cousin had a problem with his PS3. Apparently, the blueray drive stopped working. He disassembled it to see if he could fix it, but he couldn't. Then he asked me if I could take a look.

At the time, I thought that I could look into hacking it to allow playing games from the hard disk only. Surely that would be good enough. But, it was still the early days of PS3 hacking, at the time, the exploits needed a special USB device, which I didn't have. So, the console ended up sitting on a shelf in my paren'ts flat, and nothing happened.

Recently I decided to take a look at it again. So I took the console home with me and started investigating. I don't know what happened to this poor blueray drive when my cousin attempted fixing it, but it seems completely dead now. I didn't even manage to insert a disk in it. A brief research showed me two good news, however: hacking PS3s is now much easier, and there is also a special "noBD" firmware for people in this situation, because the console normally won't launch any game without a Blueray drive.

So, I got the noBD firmware and installed it. I did this a while ago, so I don't remember all the details, but it was straightforward, especially as this PS3 was running a very old firmware which is easy to hack. I am now running Evilnat's firmware (CEX, noBD). I also installed WebMAN which is the tool for running games from disk images on USB or hard disk.

Then the problems started. I wanted to install some PS1 games. One nice feature of the PS3 (at least the early models) is the backwards compatibility with the older consoles. Surely I could put some fun games there and play with friends? Well, it turns out, not really. The problem is, the console really wants to communicate with a blueray drive to start games that are meant to run from a CD. And for the PS1 games, there doens't seem to be any workaround for that.

Then I thought, OK, the BlueRay drive is mechanically dead, but I don't need it to actually read any discs. It just needs to be there. In reality, just the control board has to be there. So, I got the drive back (initially I had left it in my parent's flat) and I reconnected it insode the console. And I tried to install a normal firmware with Blueray enabled.

Well, this didn't work. The firmware cannot communicate with the blueray and this results in an installation error. Then the console reboots and tries again. And again. And again. And you can't exit that update loop, as there isn't even a way to access the service menu/failsafe mode in that state.

How do you get out of this? Well, the firmware is on the hard disk during installation. So, you remove and reformat the hard disk (I took the opportunity to replace it with a larger one). You don't need a specific format for the disk, just remove what's on it. The PS3 will reformat it. Then, prepare an USB key with a firmware update (a noBD one). Install the new HDD and hold the power button until the 3rd beep. Then release it and hold it again, again until the 3rd beep (this time it will be a double beep). You get to the service menu and from there you can install the noBD firmware again.

I still don't have a solution for installing PS1 games. I might try PS2 and PS3 ones. At least games in PKG format should work, so I can explore some of the interesting things that were sold on the PSN store, I guess. And also there are emulators for some older consoles.

But, in the end, the experience of a console with noisy fans is not that great. Should I bother with this? I don't know, maybe I will install a random set of games and put it in the break room at work, and we can have fun playing it during breaks with my colleagues. Do you have any games suggestions?

Archiving my old CDs

Posted by pulkomandy on Thu Nov 30 18:52:17 2023 • Comments (0) •

I have a shelf of old CDs, both audio and games. Some are very common stuff and not so interesting. Others are quite a bit more obscure, for a variety of reasons:

Audio CDs I bought at shows or from websites, that are not currently re-edited
Various shareware compilations and tie-in discs
Even for big well-known games, sometimes the French edition is not well archived even if the English one is

Anyway, it was one small worry in my life that if these discs would be damaged, lost, or stolen, I don't really have a backup plan. Actually, while archiving, I found some cases where the disc is missing (nothing impossible to replace, fortunately).

The other part of this problem is I had no remote backup. I used to do a "backup exchange" system with a few friends, where we would send our homeservers backups to each other. But people move on, servers go offline, and this is not available anymore. Related to that (and less so to the CDs archiving), I'd like to do more with my homeserver, but as long as there is no backup plan, and the server is a machine in my living room, the risk of data loss is just too high. So, I finally registered an account on iDrive so I can upload the stuff there. The server is now backed up, and I'm dumping the software and music there as well.

Due to copyright laws, this is a private archive for my own use. However, I will publicly list what I have. If it turns out some of this is not readily available, contact me and we'll figure out something.

I tried to do things "right", dumping ISO images where appropriate, all sound tracks as flac (both for the games and music CDs), as well as the extra content for Audio CDs. I also scanned the booklets and the disks themselves.

PC GAMES

123 Free Solitaire 2003
8 jeux de casse-tête (Pointsoft collection of Tetris-like and other braintwister games)
3D Ultra Pinball - Le continent perdu
3D Maze
Alien Detector Pro
Astrobeer
Blobby Volley
Blueprint menu editor (for editing The Sims objects)
Bousille écran
Boston Bomb Club (edited by Pointsoft)
CD Force 1 - Octobre 1999
CD Fun 1 and 2 - Février 2000
Car Tycoon (French version, I remember this being unplayable due to too many bugs)
Command And Conquer (with "Opérations Survie" extra disk)
Crop Circles
Dune II
Dungeon Keeper Gold (multilingual)
EcoQuest
EcoQuest 2
Freelog Hors Série 11 - Démos et Merveilles
Inca II Wiracocha
Get Medieval (Microïds CD with Shogun, Rage of Mages, and Corsairs
Grim Fandango
Gruntz
GT Racing 97
Heroes of Might and Magic III (GOG.com edition)
Hi-Octane
Hugo
Indiana Jones and the Fate of Atlantis
Lego Creator
Lemmings for Window 95
Lemmings 2
Lemmings 3
Lemmings 3D
Lemmings Revolution (a different print than the one available at WebArchive)
Les fourmis - Les Guerres de l'Ouest
Les Mondes Engloutis 2.0 (scenario for Age of Mythology)
Little Big Adventure
Lord of the Rings II
Lost in Time
Lost in Time 2
Ludorace
Mais où se cache Carmen San Diego edition Junior (Pointsoft and TLC edusoft)
MaxiMax Septembre 1999
Micro Machines v3
Monthy Python Sacré Grall
New World 3 - Trip to Australia demo
Oddworld l'Exode d'Abe
Pandora's Box
Pandemonium
Pandemonium 2
PC Mag hors série 21 (Shareware compilation)
Plane Crazy (PC Jeux 26 - Nov 99 - CD1)
Pointsoft Jeux pour Windows (several games edited by Pointsoft: Jungle colors, Snowmanland, Lost In Time, Wizards). The games are clones of well known things (Puzzle Bobble, Dr Mario, ...), but they have decent pixel graphics and great music by then demoscene musician Traven
Powerslide
Prince of Persia 2
RAMA
Rayman Forever
Red Baron (PC Fun Hors Série)
Q*Bert (version from 1999. My CD looks different from the ones already dumped at the web archive, I guess the French version got a different artwork?)
Rising Lands (Microïds)
RollerCoaster Tycoon
RollerCoaster Tyconn: Loopings en Folie scenario disc
Sim Tower (Maxis)
Sim City 2000 édition spéciale Media Pocket (a low-cost re-release, including the DOS, WIN16 and WIN32 versions with SCURK and other extras. The CD includes PDF user manuals in french)
Sim City 3000 OEM edition
Sim Life
Sim Town
Street Wars (Constructor 2) (Acclaim)
Les Super Maths (PC Junior CD #16) - A MAcromedia made game where You play X-Men characters in 4 minigames involving mathematical operations, graphs, triangles, and so on.
The Dig
The Incredible Machine 2
The Sims Transmogrifier 1.4
A random asortment of custom objects to add to The Sims
Theme Hospital
Theme Park
Traffic Giant (GOG.com edition)
Tux Racer 1.1
Tomb Raider II
Warcraft II Battle.net edition (in English)
Warcraft II Beyond The Dark Portal
Wild Tangent Front 9 minigolf
Worms 2

Music

Alternine: both the original EP and the 2.0 from the rebooted band
André Minvielle et Lionel Suarez - Tandem
Antoine Reneaut - En Plénitude
Les aventures de Bibounet et Patacloum (companion CD to a DVD that I don't have)
Chorale Iroise - Couleurs Celtiques
Daft Punk - Ton Legacy soundtrack
Datarock - Datarock Datarock (EU version with extra tracks), also including the videoclips from the CD
Datarock - RED
Dr Wong - Heavy Metal Banania Unplugged, Tipiak, and covers of Zelda and Tetris
Eiffel 65 and Bloom 06 - All albums, nothing very exciting here I think
Eurovision Song Contest 2009 (CD1 only)
Fabulous Trobadors - Ma ville est le plus beau park
Flash (some Atari XL music)
Flavien Berger - Léviathan and Contre-Temps
François Hadji-Lazaro - Un recueil frais et disco (including DVD with videoclips)
Les Frères Brothers - First 2 albums
Guilhem Desq - Visions
Harajuku - Just One Look (incomplete)
La Tordue - All 6 albums
La Patate - L'alboom (if you know French you may remember their flash animation "La chanson de la patate" but you probably didn't there is a full album)
Les Escrocs - All 3 albums
Les Wriggles - Justice avec des saucisses
Malaussene - 2 albums and bonus tracks. They had these for free on their website
Mélusine - Chansons à écouter au coin du feu (a tie-in music CD sold with a comic book. Not bad music at all from people whose main job is comic book scenario and drawing!)
Mickey 3D - All 3 albums from a boxed set
Paula Powered - Level Up
Poison - The last legend
Pomplamoose - Invisible People
Hurricane on the Bayou (soundtrack from the movie/documentary of the same name
Tribute to Arno (Putain, putain, une tribu pour Arno)

Playstation games

SLES-00278-2107629 - Bust A Move - Arcade 2 Edition
SLES-00838 - Oddworld - L'odyssée d'Abe
SLES-01502 - Oddworld - L'exode d'Abe
SLES-01893 - Bomberman
SLES-04166 - FIFA Football 2005

Software and drivers

3D Prophet Kyro II driver CD
Akai LPD8 CD (Mac and Windows software and user manual)
Altera Complete Design Suite 7.2 for Windows and UNIX/Linux, with Cyclone II starter kit CD
ASUS Windows 7 Anytime Upgrade
BeOS "developer edition" distribution (versions 1.1 and 2.2)
Borland Delphi pour Windows (Presqu'Offert hors série juillet 1998)
CD from my high school music teacher with multimedia presentations about the music we had to study for exams
ClarisWorks 3.0 (Presqu'Offert 9)
DirectX 9.0 redistributable
DIV Games Studio
E-Icons 4.15
Genius Genitab III driver for Windows 98 and NT4, version 1.2
Global American Inc drivers - for a VIA embedded PC motherboard, a GPS received and a PCI serial card
GP2X F100 userguide (English and Korean)
HP PSC1310 / OfficeJet4200 / OfficeJet 5500 printer drivers (Windows 98-XP, possibly also Mac)
Iomega Zip driver
Klik and Play (French, Mac)
La Bible PC - Programmation système - 5ème édition (sold with the book of the same name, including all examples from the book, and also a Micro Application software and book catalog with weird background music, and a low resolution version of the "Don't copy that floppy" rap clip that takes more space than everything else on the CD)
Liquid Desktop screensaver
Matrox Millenium G450 driver CD
Micrografx picture publisher 6.0 (also including ABC Media Manager, older 16bit versions of Picture Publisher, and demos of other Micrografx software)
MP3 player driver (don't remember for which cheap MP3, either an S1mp3 compatible one or some later ipod Nano like thing)
Mustek ScanExpress 1200 SP (includes TextBridge and IPhoto Plus for Windows and Color It! for Macintosh)
NetBurner support CD (this is a Motorola Coldfire development board with an Ethernet port)
OfficeOne 7 - An office suite built from existing components (OpenOffice, Paint.net, Avast, ...) also including some games. Sold with an Asus computer back in 2008 or so
Partition Logic 0.66
PC100 Video Inside motherboard drivers
Shino Disquette v3.7
SNes9x 1.39a FR
Sound Blaster Live
ToUCam fun Philips webcam drivers, PCVC 730K version 1.01
Ultimate Boot CD 5.1.1
USB to IDE driver CD (I don't remember what piece of hardware it goes with, includes drivers for various chips)
USB to serial adapter driver (a Prolific based adapter)
USB 2.0 Digital HDTV Tuner
Various Palm software
Visual C++ Express 2005
Windows XP (first release)
Windows XP Pro SP3
Windows 7 Pro 64bit with Service Pack 1, French (ok you probably don't need this one...)

DVD and movies

Blanche Neige et les sept nains (Zylo / Goodtimes platinum series)
Heidi (Zylo / Goodtimes platinum series)
Le Bossu de Notre Dame (Zylo / Goodtimes platinum series)
Pocahontas

Wanted / Missing

Some things I couldn't archive in time... Some of this is probably easy to find, and just notes to myself to get it. Other things are more complicated.

The Daft punk soundtrack is supposed to have some bonus on the disk. Unfortunateely, what's actually on the disk is an EXE file that will just send you to a now offline website. The web archive only has a backup of the landing page, because the actual content was behind a login wall. There were apparently 3 trailers for the movie, the videoclip for Derezzed, and a picture gallery. Did anyone doznload the pictures from the gallery or other stuff from that website?
Complete "Just One Look" album by Harajuku
My copy of Tomb Raider ("version longue"/extended edition) has only CD2. CD1 has been accidentally replaced by a CD of Tomb Raider II. Anyone has the matching Tomb Raider II case with CD1 of Tomb Raider 1 in it?
Bénabar - Infréquentable, I have the case but the CD is not in it
Petite Oreille by Boucherie Prod - A compilation CD released in 1996 with children songs by artists who don't usually do children songs. My CD is lost and this is long out of print, I'll have to locate a copy at acceptable price.
Megarace 2 - CD2 (French version), mine is too damaged to rip properly
Command and Conquer Soleil de Tibérium - GDI disc

Adding an RGB switch to my TV

Posted by pulkomandy on Thu Oct 26 13:07:55 2023 • Comments (0) •

I have a flat screen TV I bought in 2009. I use it for various retro computer things, mainly when I'm travelling and don't want to carry a CRT around. It is also useful because the VGA port will accept any timing, down to the 18KHz refresh rate of MDA/Hercules at least. It also has a few other cool things my CRTs will not do, like Teletext support.

There is just one problem with it, however: there is no way in the menus to force the SCART connector into RGB mode. This can be done only by sending the proper voltage (1 to 3 volts) to the SCART connector itself. Normally this isn't a problem: proper RGB SCART things will send that voltage, and everything will be fine. But I also want to use this port for other stuff, where I make an adapter myself for some other type of RGB output. Often there isn't a voltage source available on the power connector then. As a result, the TV switches to composite mode and shows a grey screen (since it uses the sync signal as a video signal, and so there is nothing to show).

Previously I did this as an external mod: getting 5V from the USB port of the TV, and feeding back into the SCART. But this requires making SCART connectors with a 5V input, the cable can break when transporting the TV... well, it's annoying. So I decided to make it internally instead.

I get the 5V from the Common Interface slot, this is a PCMCIA slot where you could plug a satellite TV receiver or crypted TV decoder or the like. Since this has a standard pinout, the 5V was easy to locate, and also they made it with very large soldering pads so adding a wire there was very easy.

Since the TV expects 1 to 3V on the RGB switching line, I use a 220 ohm resistor. The TV has a 75 ohm impedance, so this creates a voltage divider in the right range. Any resistor from 50 to 300 ohms would do.

Two wires soldered on the PCB, going from CI slot pin 17 to a switch (not visible, on the back of the PCB) and then to SCART pin 16

Finally, I mounted the switch by screwing it into the Common Interface slot. The slot is just wide enough to put the switch through, and will provide enough mechanical support once the switch is secured there.

the switch on the back of the reassembled TV

When the switch is ON, this will force the TV to use RGB input on the SCART port. When it's off, the TV will work as normal since the mod is disconnected. A quite simple change in the end, but very useful for me.

Haiku Screenmode command line tool backported to BeOS

Posted by pulkomandy on Wed Oct 25 10:41:24 2023 • Comments (0) •

Just a little thing today...

Here is a version of Haiku's

screenmode

command line utility modified and recompiled to run on BeOS.

I made this because BeOS screen preferences didn't allow me to set custom resolutions easily. Now I can set up any timings I want, and so I can get my BeOS system running on my 1080p display in native resolution.

Download screenmode for BeOS (17KB, zip)

IFF catalog add-on for Haiku Locale Kit

Posted by pulkomandy on Sun Oct 15 15:58:38 2023 • Comments (0) •

I made this as part of my work on porting ACE to Haiku. ACE originates from MorphOS, and so it uses "catalog" files for internationalization in the format used there. The format is based on the IFF container typical on Amiga originated software.

Since the Haiku version of ACE uses pretty much exactly the same strings as the MorphOS version, it would be silly to have to re-translate all the strings. So I made this add-on to Haiku locale kit to handle catalogs in the MorphOS format.

At the moment this is not useful to any other software than ACE. However, maybe someone will find it useful. The format is a bit different than what is done in Haiku. In the Haiku locale kit, the original (usually English) string is used as a key for translations. This requires hashing the whole string, which makes everything a bit slower. It isn't very noticeable on modern machines, but it is definitely part of what makes Haiku slower than BeOS. On the other hand, the format used on MorphOS assigns an integer to each string, which can be used to refer to it efficiently. It still provides fallback strings inside the executable, so if there is no catalog, things will still work.

Maybe we could use this format more on Haiku and get better performance. But it would require a lot of changes to the tooling built around the existing format. Maybe a project for later...

You can get the iffcatalog sourcecode:

If you want to simply use it, the binaries are available in Haiku package manager.

Version history:

0.1 - 04/2015 - First public release
0.2 - 11/2017 - Skip '\0' in translated strings (used for keyboard shortcuts on MorphOS)
0.3 - 06/2020 - Fix catalog search directory
0.4 - 02/2022 - Add a python script to generate locale_strings.h from catalog definitions
0.5 - 10/2023 - Remove dependency on libtextencoding, which is not safely usable in a Haiku add-on unless the host app links with it as well. Fix a bug in the locale_strings.h generator with handling of escape sequences.

Personal notes on the smallweb

Posted by pulkomandy on Sun Oct 15 14:10:04 2023 • Comments (0) •

Lately I have taken part in some fediverse discussions about the "small web". As you probably know, many websites these days are bloated. They will load several megabytes of resources. I remember surfing the web as a kid, I would download some games and shareware, and I would be annoyed when one of them would be larger than 1MB, as that would take a while to download. Now, a lot of webpage (just the webpage, not the downloaded software!) are already 10 times larger than that.

Yet, they do not seem to offer much more than they did 25 years ago. A web page mainly shows text with some pictures where relevantm or just to make it look nice. So, what happened? Well, I don't really know. If anything, my website is getting simpler and smaller over the years. Early versions had MIDI music and a Java applet emulating the Start Wars opening scroll on the landing page.

Anyway, so, the "small web". It seems this idea started gaining traction in the last few years. It comes from multiple directions: retrocomputing is trendy, and, well, it always was, but now, there are decidedly retro machines that had their commercial life during the early days of the worldwide web. So there is demand from retrocomputer users who want to browse the web like in 1996 or so. Another nostalgic aspect is the "what if?" exploration of Gopher, an early competitor to the web that largely lost to it. And there is also the concerns caused, I guess, indirectly by climate change, of building a more sustainable tech that's less power hungry. Surely, simpler websites and web browsers could be part of that, as well as being genuinely useful to people who have a limited internet access for various reasons (living in very remote areas, not willing or being able to pay for a super fast connection, or being in a place where the infrastructure just isn't there for many possible reasons).

One thing that is somewhat succesful in that domain is the Gemini protocol. This is inspired by Gopher but made more modern and a little less quirky, for example, it requires SSL where Gopher was mainly plaintext. Unlike HTML, it gives very limited control on the text formatting to people writing webpages. As a result, it is quite easy to write a Gemini browser, even one working in a command-line, and indeed there are quite a few of them. I had only a brief look at it, and decided it doesn't work for me for various reasons (this is my personal reaction to it, not a complaint or so, I'm sure other people are happy with these choices and that's good for them). The SSL requirement makes it unreachable for the 8-bit retrocomputers I care about. The limited formatting is often not enough for what I want to do with the web, and likewise, I find the lack of cookies, URL parameters and the like a bit of an extreme measure. I understand the idea of enforcing privacy, but I am not convinced it is worth the limitations that result.

But, my main problem with it is: it is new technology. There is a new protocol, a new page formatting language, new specifications, and new software to write. I get it, a lot of people will find these new things exciting, it's fun to write software and specs from scratch, without having to care about legacy quirks and past mistakes. This is not how I usually think about things. As I learnt in school, a good software developer is a lazy software developer. So, my approach is often trying to make most use of what's already available. In this case, this means the existing web browsers. That's a good idea because we already have a lot of software, a lot of people who know how to write webpages, and a good understanding of the problems and limitations. And there are also users (people browsing the web) who won't need to be convinced to install yet another piece of software on their computers. Can we achieve that? I don't know, but I find that idea a lot more interesting to explore than the "what if we erased everything we have and started over?" approach of Gemini.

So, I took a look at the existing specifications for the web. My goal here is to see how to make websites that are "light" (a webpage should be a dozen to a hundred kilobytes or so), fast to load, and yet allow websites to have some visual personality. The latter can't really be done in Gemini and I think it's an important part of the web: allowing everyone to have their own space, and really make it reflect who they are. There will be the wall of text just using the default formatting. The cute page with colorful text and animated GIFs. The red-text-on-black-background crazy conspiracy theory website. And so on.

From the technical side, let's try to limit ourselves to the technologies already widely supported by web browsers. Or, to word it differently: I don't think the bloat of the web is a technical problem to be solved by throwing more tech at it. Instead, it's just a matter of making better and more efficient use of what we already have. The core set of technologies is in fact quite reasonable. On the protocol side, we have a choice of at least HTTP 1.0 and 1.1, possibly HTTP 2, possibly some other more experimental protocols. I'm not sure HTTP 2 complexity is really useful for such small websites, 1.1 will probably be good enough.

More interesting is the content format. Here, we can review the different available options. The oldest one is HTML 3, which does not have CSS and mixes up content and presentation. This is not great by today's standards and does not really make things simpler. The newest one is HTML 5, which is in a kind of "rolling release" mode and keeps changing all the time. Personally I find this to be a problem, I would prefer a fixed set of features.

So, that leaves us with HTML 4. Which turns out to be really interesting, because it is from a time where mobile phones were starting to go online, and so, there was a simultaneous demand for richer webpages on desktop machines, and really simple ones for then limited mobile phones and other similar devices. Also, it is from a time where the web attempted to move to XML with XHTML 4. This seems to still be controversial: XHTML is a much stricter syntax, while HTML 4 retains the more flexible way of working of HTML3 and later HTML versions. Basically, whatever you write in a text document, a web browser can parse it as HTML. It is very tolerant on unclosed tags, weird nesting of things (putting a paragraph inside a table inside a list item), and so on. XHTML, on the other hand, is allowed to reject a page as invalid XML. Well, it doesn't really matter anyway, existing browsers already accept both, so we can use both depending on the context. If you are writing a static website generator, you can probably make it generate valid XHTML. If you are manually editing webpages or allowing your users to do so, maybe HTML will provide a less frustrating experience? I'm not sure about that, in my early days of writing HTML and Javascript I think I would have preferred to have clear error messages instead of a browser that always did something, but rarely exactly what I wanted.

Anyway, with both XHTML and HTML4, one interesting aspect that the w3c worked on is modularity. You can pick a profile like XHTML Basic, which gives a set of recommendations for which tags should be used or not. The selected set seems reasonable for my use of the web, it does not seem very constraining to me. Likewise for CSS, you can easily decide to limit yourself to "level 1" or "level 2" features. Or at least, you can make sure your website "degrades" to an acceptable rendering on browsers that support only these, while making use of more advanced features on browsers that can handle it.

Finally, we have to talk about javascript. Javascript is not a great language. We know why that is: it was designed in a very short time with requirements along the line of "Java is trendy, but objecto riented programming is too complicated, can you make us a version of Java without objects? We need it in two weeks". As you expect, the result is not great, and the language did get objects later on anyways. Well, unfortunately, we don't really have a choice. So, my approach to this is to limit Javascript usage to the minimum possible, and make sure my websites work well on browsers that don't parse Javascript. But I still find it useful, for example for client-side pre-validation of forms. It is useful to have immediate feedback on required fields in a form, and not discover problems only after you have submitted it. That kind of thing.

Also, Javascript allows to set up things like "AJAX" (OK, no one calls it this way anymore), basically, you can have dynamic webpages where just one part of the page is reloaded or generated by Javascript code from a loaded network resource. Sure, this makes the browser quite a bit more complex, and the webpage will need some processing. But it doesn't necessarily have to turn into megabytes of bloat. In fact, if used well, this can be very efficient in terms of bandwidth, since the whole webpage does not need to be reloaded. So, I don't like the idea of banning Javascript. Just make its use minimal, and, where it makes sense, have a fallback for browsers without Javascript.

Finally, one thing that is a bit unused in the current web is RSS feeds. Or, more broadly, the idea to expose website data in a format that's easy to parse and process by applications. I thnk this is a path that deserves to be explored more. Not only RSS, but also more generally exposing any type of data in a well structured XML, and maybe using XSLT to process it into other formats. This is well supported in most web browsers, and certainly could get more use. I like the idea of websites exposing data as XML that can also be easily downloaded and processed elsewhere (by some other website, or by a "native" app on the user's computer). This is kind of a tangent to the small web, but maybe there's an opportunity to build something new and exciting here.

So, what will I do, then? Well, not much. This website should be XTML-1.0 basic compatible and uses no Javascript, because it doesn't need any more than that. It does use some modern CSS features, but even with CSS disabled, you can read the text in the articles and use the website in an acceptable way. I hope this inspires other people to do the same. Maybe if more and more websites are built like this, the bloat of the big modern web3 ones will stand out more, and people will notice and complain about it?

A deep dive into Thomson EFCIS graphic controllers

Posted by pulkomandy on Sat Sep 23 22:03:50 2023 • Comments (0) •

So, this week I thought about Thomson EFCIS graphic controllers again. I know that Thomson made at least two families of controllers, the EF934x, which runs the Minitel, and the EF936x, which is designed for pixel graphics, but I did not know as much about this one. It seems a bit more of a "high end" thing, but at the same time, it is loosely similar to the display system used in Thomson TO and MO microcomputers.

I also knew that Thomson EFCIS sold the color palette circuit originally developped for the TO9 as a general color palette chip (the TO9 has it described as a custom chip in its technical manual, but then it became an off the shelf chip).

Anyway, I became curious about the other chips and why and how they were designed. And then I read a lot of datasheets and research papers and patents. Here is what I found.

Note: this is based on some random web searches, I wasn't there at the time (in fact I wasn't born at the time) and I could be misunderstanding or misremembering what I read. Don't trust me too much...

What are we talking about, anyways?

EFCIS is a French semiconductor company. It was created by a public research lab which turned part of its activity into, apparenly, an integrated circuit design and manufacturing company. At the time, France was a bit behind the USA in terms of semiconductors manufacturing. The government helped solve this by injecting money into the sector, but also there would be various merge of companies to join forces, and partnerships with US companies to first resell their chips, and then manufacture them locally.

In the case of EFCIS, it would merge with several other companies to become Thomson Semiconducteurs, which in turn merged with Italian SGS to become SGS-Thomson, later shortened to ST Microelectronics. That company is still around today.

But our story does not start there. Instead, we have to dig into what was happening in the mid-70s in the Laboratoire d'Informatique de l'École Normale Supérieure (LIENS). That's another French reserach lab that, at the time, worked on computer systems and architecture. This is the 70s, so we're talking about probably only one computer, and various terminals to serve it. The terminals would either be text-only, or graphic ones such as the Tektronix 4010 series.

These terminals are somewhat complex and costly hardware. The 4010 is a strange device by today's standard, it does not use RAM to store display data, instead, the cathode ray tube itself keeps the pixels "on". This makes the system cheaper, but has the major downside that you can only erase the whole screen, not just subsets of it, and erasing takes a second or so. As a result, this is not giving a great user experience for interactive graphics.

So, the people at LIENS have an idea: what if we could use a TV and some RAM to make a cheaper terminal? It would allow interactivity, and also maybe color.

Their first try was a text terminal. This was reasonable to do, with 1Kx8bit of RAM and a character ROM, it is possible to display a 64x16 character grid on a TV. This allowed to have a first go at generating the display timings, clock, and pixel shifting.

There was already prior art in doing this, for example, Dan Lancaster's TV Typewriter from 1973 is based on very similar concepts (at only half the resolution, 32x16 characters). That TV typewriter allowed home computers to get some kind of cheap terminal system, and the video output in early home computers like the Apple I and the SOL-20 were based on this design.

Anyway, with the goal of making the cost lower, one solution is to reduce the number of chips by using an LSI (large scale integration) chip. This would result in the SFF96364, with a patent application in early 1977 (FR 77.0524, also known as FR2382049B1, and its equivalent US patent US4328557A, attributed to Jean Gastinel), I think just a little time before similar chips like the MC6845 were announced (but it's hard to find exact dates).

Of course, there would be no point in manufacturing an LSI chip just for use in the research lab, and so, the SESCOSEM, the chip manufacturer, also introduces it for sale to the general public. I have not introduced the SESCOSEM yet, but I don't have much info on it, it seems it merges with EFCIS just a few months after releasing this chip, and the chip is then rebranded as EF9364.

It's also worth mentionning that SESCOSEM was also a second source for Motorola products. Well, or at least that's how they advertised it, it looks like initially, the chips were manufactured by Motorola but rebadged with SESCOSEM part numbers. This means you could get both the 6845 and the 96364 from the same company.

However, the two chips, while similar in the way they drive the CRT, are different on the way they are interfaced on the other side. The 6845 is designed for use with a microprocessor (initially a 6800, but people figured out how to make it work with other CPU families as well). On the other hand, the 96364 is designed instead to interface directly with an UART chip. This means the chip is less flexible and has a fixed resolution of 64x16 characters, whereas the 6845 can be configured into many videomodes, but on the other hand, and unlike the 6845, it will manage memory writes and cursor movement, with some special commands allowing to move the cursor around. This makes it possible to build a serial terminal ("glass teletype") from a TV set, this chip, an UART, a character generator and 1Kx7 of RAM, with a bit of glue logic.

So, we now have this chip that makes it easy and reasonable for DIY computer builders to assemble a text terminal for their computers. This is taken up by several electronics magazines and kit distributors, one of the well-known ones being the Elektor Elekterminal. Suddenly it is possible to build a terminal that will not require hundreds of TTL chips, or include a microcontroller, and cost more than the computer it is attached to.

Into graphics controllers

But for the LIENS, this was just a first test of the concept. Their plan is not to do just text, they are working on designing hardware, and they want to use CAD applications. They want to replace the Tektronix 4010 with a TV set and a relatively cheap adapter board.

As the SFF96394 is finally made available to the general public in mid-78, with terminals and VDU cards starting to appear as kits or prebuilt at the end of that year, the design for the next system is published, as four patents and a paper in SIGGRAPH 78 from Philippe Matherat. Now my research was made quite a bit easier because he links to all of this from his personal website :).

So, this next design retains the basic idea of designing a chip for use in a Terminal. However, at this time, and for a more complex graphics terminal, it becomes acceptable to include a microcontroller, or maybe even a microprocessor and turn the terminal into a kind of simple workstation. This design aims to once again reuse a standard TV receiver as the display, but support Tektronix's PLOT10 API or even Tektronix serial line protocol. This way, the existing applications will run on the new, cheap terminal as a starting point, and then the terminal can be extended with color display, faster refresh times, etc.

At the time, the main constraint is not so much on the memory size. 16K DRAMs are now available at an acceptable price for this application. The main problem is rather the memory speed. It is not possible to output single bits from a single DRAM chip fast enough to match the needed pixel clock (the project aims for a 512x512 interlaced display window). The soluion to this is to put several DRAM chips in parallel (as you do when building a computer to address them in bytes), and then use a shift register to send the bits to the display circuit. Essentially, this removes the character ROM from the previous text based system, instead feeding the RAM output directly into the already existing shift register.

This way, the shift register is the only thing that needs to be really fast, the RAM will only need to fetch one 8-bit word at a time.

The second problem is fast vector drawing. Microprocessors at the time could not be programmed to do this fast enough to keep up with even the Tektronix terminals. Part of the reason for this is that the RAM is completely busy with sending pixels to the display in this design, probably to keep it simpler. Other machines at the time (such as the Apple ][) are figuring out how to intermix accesses to the RAM for the display and CPU, but here a different direction is taken. The CPU will have only limited access (in fact, it's possible to have no access at all) and instead the hardware will provide high level drawing commands, with the goal of drawing one pixel per clock cycle (during borders and blanking, when the memory is not busy with either display or refresh cycles).

The commands start with vector operations. The display controller keeps track of X and Y pointers and implements Bresenham's algorithm in hardware for drawing straight lines. Additionally, it can also draw characters (from the ASCII) table, with optional scaling, rotation, and slanting.

The chip internally manages a 4096x4096 coordinate space, of which a part is not visible on screen. This allows line vectors to go out of screen and the drawing to continue correctly with the next vectors. The part that would be outside the screen are clipped cleanly and the drawing is not distorted.

Access to the memory is done in parallel (all 8 chips) when reading, but writes are done one bit at a time, by enabling the RAS pin for just one of the chips.

The SFF96364 could fit in a 28 pins DIP package, but the new EF9365 needs a 40 pins package, and even that is a bit too limiting, so a few extra chips are needed: a demultiplexer to select the RAM chips, a shift register, and also a PROM to derive some signals from the clock. The EF9366 is a simplified version that can only draw a 256x256 display, but requires a little less support logic. Both versions of the chip do include a lightpen controller as well.

Color graphics are possible by adding parallel banks of memory chips, typically one for each of red, green, and blue. Then, the CPU can program an external register to store a 3 bit color, that is loaded to the RAMs when a pixel needs to be plotted.

In one of his later articles, Philippe Matherat explains why this approach turned out not to be such a great idea: it was done with the goal of replacing the Tektronix displays, and it did a decent job at that. But, a few year laters, bitmap displays would become more common, and faster CPUs such as the 68000 would be available, which would do a faster job at drawing lines and text than this chip could do, and also allowed more advanced operations (scrolling, overlapping windows, etc).

There were attempts to workaround these limitations, for example by adding extra operators on the data path between the video controller and the RAM for write operations. That would allow some operations like XOR with a cursor, intersection with existing drawings, etc. However, adding more complex drawing primitives require a larger redesign of the chip, more on that later.

Initially the chip finds some use, both in DIY videocard projects (again with Elektor offering one) and in scientific hardware where it makes it possible to show realtime and color display of measurements (for example I read about its use to display spectrograms). While the initial design was ready by may of 1978 (as you can see in the patents and papers published then), the chip would be available to customers only in late 1979 or early 1980. At that time, computers and workstations with raster graphics are quickly catching up, which means the chip did not really get a chance to be used.

Meanwhile, in Brittany, ...

Meanwhile, other innovations are going on in France. There are experiments to replace the phonebook with an electronic device, basically a terminal that would be plugged into a phone line and allow to send and receive data. Experiments on this start at about the same time in 1978. Of course, the plan is to replace the phonebook, which means the terminals will have to be produced in a large quantity, and be cheap. This justifies the use of an LSI chip again.

This time, there is no room for complex graphics: in a desire to keep the costs low, there are big constraints on the amount of RAM and ROM in the terminal. And so, it's back to the simple character based system. But maybe not quite so simple as the EF9364. The people working on this first experiment with hardware and protocols based on ANTIOPE, the early Teletext system in France. They also want to make the terminal user friendly, and for this they need various character attributes, maybe colors (eventually it will be 8 greyscale levels), double size characters. Hardware vertical scrolling ("roll") also seems useful to render long pages of text at an acceptable speed.

These requirements will lead to a new design, initially as a set of two chips based on the earlier combination of a text mode controller with a character generator. Eventually, both of the chips become a lot more complex than what they were initially, and, as it seems usual for EFCIS, they are not reserved to the Minitel and added to the general catalog of available ICs. These are the EF9340 and EF9341, also known as VIN and GEN and introduced in 1980. They will also find an use in the Philips Videopac+ consoles where their video incrustation support will be used to mix the video signal with the existing one from the Intel 8244 in the older generation Videopac.

The Minitel hardare will be revised several times over the years, to introduce new features (such as a dialer and local phonebook, and a generic ASCII 80 column terminal mode). This leads to new generations of video controllers, the EF9345 (introduced 1983, also ends up being used in the Matra Alice computers as well as the Philips VG5000) and the TS9347 (introduced 1988, with EFCIS now renamed Thomson Semiconducteurs, this one gets a new manufacturer prefix).

These chip are quite far from the initial "dumb terminal" usage. They can redefine some custom characters in RAM, have a "compression" system where a short byte sequence can expand into a longer string that is stored in memory, etc. They also provide a rather large 8x10 font, fitting 40x25 characters in a 320x256 video output.

Ok, back to the pixel based ones

The EF936x family also gets an upgrade in the form of the EF9367. This chip allows doubling the horizontal resolution (to 1024x512 pixels), but it does so in a not so convenient way. It looks like this was first done as an extension to an EF9365 videocard, and then pushed back into the video chip. It also changes the video timing: the initial timing for the 9365/66 resulted in a square, 256x256 or 512x512 window at the center of the display. But, this results in a lot of wasted space to the left and right of that window. The new chip changes the timing, it now operates on a slower clock but keeps the same horizontal frequency for the display. As a result, in 256x256 or 512x512 interlaced modes, the pixels will be slightly wider than high, matching the 4:3 ratio of the display, and at 512x256 or 1024x512 interlaced modes, they will be slightly higher than wide, with a ratio around 2:3, still much better than the 1:2 you would get in the same conditions with the original chip.

Another new feature is the possibility of a 60Hz mode, reducing the video resolution. The datasheet and appnotes also add a few extra tricks to the circuitry, showing how to implement vertical scrolling (useful on the 60Hz version because it reduces the vertical resolution, and so, parts of the pixels stored in RAM are actually not visible).

Finally, one last iteration is the TS68483. This is a much extended design, made to interface with a 68000 CPU and its 16 bit databus (but 8 bit access is also still possible). It adds more complex drawing primitives for curves, arcs, and so on. It also uses a larger 64 pin DIP package, which allows to have more of the logic integrated in the chip, for example, external shift registers are not needed. It can also produce even higher video resolutions. But, at this point, the design with memory completely isolated from the CPU is not appropriate anymore. Graphics terminals are a thing of the past, and other computer systems have shown that it is possible to get quite fast drawing in software, or, if there is anything to accelerate, it's rather moving blocks of data around, with something like a blitter.

Anyway, the EF936x series remains an ahead of its time graphics accelerator. It would take a few more years to see 2D acceleration implemented in hardware, and 3D acceleration would follow.

But what if 8 colors is not enough?

In the EF936x documentation, it is suggested to use 3 bitplanes for an RGB image. This is similar to what was used in the Thomson TO7, but it felt quite limiting. Later models at Thomson Microcomputers added a 4th bit of color, either darkening or brightening the other colors. This was implemented by feeding the 4 color bits from the video generator into a lookup ROM, and then a resistor network. Of course, this can be added to an EF936x system.

But Thomson computers did not stop there. The next step, introduced first in the Thomson TO9, was to use a dedicated palette chip. This would have 16 color registers that would store a 12 bit color (4 bit each for R, G, and B) and replace the ROM used in the previous models. Then, the color palette becomes fully programmable. Typically, you can switch to 16 grey levels, or mix 8 basic colors and 8 greyscales.

Once again, EFCIS is the company manufacturing that chip, first as a custom chip for the TO9, but later on added to their general catalog as the EF9369.

There is also a later revision, the TS9370, which can handle higher pixel clocks, and removes the need for a separate video buffer/amplifier, further simplifying the design. I don't know if there ever was any graphics hardware combining the color palette chip with the graphics processors (or even the textmode one). These products were manufactured by the same company, but for different projects, and it's unclear if the fact that they would fit together quite well is intentional, or just an happy incident.

Leaving Twitter

Posted by pulkomandy on Wed Aug 30 18:25:20 2023 • Comments (0) •

That's it. I'm off Twitter now.

I used it for 5 years. I now have archives of everything I posted there.

Now you can find me on Mastodon instead. See you there!

Website mirroring

Posted by pulkomandy on Sat Jan 1 15:19:32 2022 • Comments (0) •

Please do not mirror my whole website using wget mirroring mode, WebReaper, or other similar tools. It results in a lot of requests to my poor old server, too much memory and CPU use, and the server fan spins up and makes a lot of annoying noise. So I will ban your IP or IP range from accessing the website if you try this.

If you want a full copy of some parts of the website, please contact me so we can set up a better way to share the data (I can set up an ftp or sftp or rsync thing, for example)

Hello World

Posted by pulkomandy on Sun Feb 18 16:30:21 2018 • Comments (4) •

This is the new server of the PulkoTeam! Enjoy your stay here.

On this website you can find various stuff I work on, around the Haiku project, electronics, 8bit computers, and random things depending on the day mood.