It is one of those weeks where nothing goes as planned.
Rabbit hole level 0: I wanted to write videogames
For some time now I've been part of a team trying to better document the
VTech V.Smile console and make it easier to write games for it. They contacted
me because I had some experience with (and blog articles about) other VTech hardware.
The current efforts are documented in my VTech wiki.
Most of the work was already done several years ago: datasheet and schematics were found,
hardware was documented, games were dumped, emulators were written. But there
was no documentation and no open source tools to build new games, or at least,
nothing quite production-ready. The only option would be the compiler suite
provided by the CPU manufacturer (this is a custom CPU core, used in a few
other game consoles).
So, after writing the documentation in the wiki, I started to experiment with
writing an assembler and compiler. I initially started looking into vasm and vbcc,
because my experience with these in the past had been rather good. The developers
are helpful and the code is understandable by me and designed to make adding more
CPU architectures easy.
Rabbit hole level 1: I need a C compiler
I quickly ran into problems with vasm, however. The CPU in the V.Smile is a
purely 16-bit thing, which means it can't address individual bytes. While vasm
has some support for this in the code, it was never used, and in fact, does not
work. I discussed this with the vasm developers, and the solution they suggested
is that all addresses in the assembler code should be prefixed with some special
character, and the assembler frontend can multiply or divide them by two as needed.
I looked into that, but decided it would make writing assembler code more
complicated and annoying than needed, as there is a risk of forgetting the marker
and suddenly having your address all wrong.
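To make the factor of two concrete, here is a minimal sketch in C of the conversion such a frontend would have to apply. The function names are made up for illustration and this is not vasm's actual code. It only assumes a CPU where each address names one 16-bit word, and a tool that counts in bytes internally.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers for illustration, not taken from vasm. */

/* One word address covers two bytes, so a byte-counting tool doubles it. */
uint32_t word_addr_to_byte_offset(uint32_t word_addr)
{
    return word_addr * 2u;
}

/* Going back only makes sense for even byte offsets: an odd one would
   point into the middle of a word, which the CPU cannot address. */
uint32_t byte_offset_to_word_addr(uint32_t byte_offset)
{
    assert(byte_offset % 2u == 0);
    return byte_offset / 2u;
}
```

Forgetting the marker amounts to skipping one of these conversions, which leaves the address off by a factor of two.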
On the vbcc side, I did not have many problems: the porting guide is very complete
and there is not too much work needed to get a basic version of the compiler running.
But, without an assembler, it is not very useful. I did some experiments with
Mike Kohn's naken_asm, which has support for the UNSP CPU used in the V.Smile,
but it is a simple assembler that can only directly generate a final binary. It
has no support for temporary .o files and a linker. So in my tests I had to
let the compiler generate assembler files and not assemble them, and also generate
them in a way that they could be concatenated together at "link" stage before
being assembled into a binary.
I got this to work for simple cases, but it is not great to work this way.
Rabbit hole level 2: I need an assembler and linker
I let the project sit for a while (I think it's been about a year?) hoping
that someone else would do it(tm) or I would find a more suitable assembler
somehow. I looked into ASXXXX, but this looks somewhat limited and not super
easy to port.
So, eventually, I decided that if I'm going to port something not super easy, I
may as well go for the Real Thing, and port GNU binutils. My research showed me
that there is a porting guide, even if it's fairly short. And I think I have
spent enough time doing low-level stuff (compilers, assemblers, writing linker scripts,
baremetal programming on AVR and ARM) that this should be within my reach. And
so I cloned the git repository and started following the guide.
After just a few hours, I had something compiling and generating various executables:
assembler, ar, objdump, etc. for my architecture. I don't expect any of these to
actually work yet: I started by just filling in empty functions and adjusting the
build system to get everything to compile. The idea is then to run each of them,
find what doesn't work, and add the missing functions as I go.
Binutils comes with a test suite, so I thought I would start by running that,
look at all the failing tests, and fix them by adding bits of the code for my port,
looking at how it's done for other CPUs.
Rabbit hole level 3: running the binutils test suite
This doesn't look too complicated: install the needed software, run "make check",
investigate and fix bugs, and repeat.
So I went ahead and installed DejaGNU and expect, which form the base of the
testing framework. I then ran "make check" and… the testsuite immediately failed.
I had not heard of DejaGNU before. It seems to be a set of extensions to
expect used to run tests on cross-development environments: typically, compile
software on one computer, run it on another, and check that the results are
as expected. I am not sure if anyone else uses it outside of binutils and gdb.
In any case, it is written in expect, which is itself an extension of Tcl. And in
the binutils case, it is also intertwined with the binutils build system, which
is written using autotools (and a specific version of it).
Rabbit hole level 4: learning to use expect
So my next step was trying to run a simple "expect" program. I quickly found
that expect was completely broken, and that it was a known problem, with a bug report
open at HaikuPorts since 2021. I have not mentioned yet that I am doing all this
on the Haiku operating system. I would not run into these problems if I had
chosen a more stable and finished operating system. But where would be the fun
in that?
Anyway, so expect doesn't know how to open a PTY to communicate with another
process (which is the main thing it is designed to do: spawn a process, read its
output, match that with some regular expressions, and reply with some input
according to a script).
A quick look at the code and build system helped me find the problem: expect
can handle many ways to open PTYs, and on Haiku, the preferred one was not picked
because it requires linking an extra library that the expect configure script could not figure out.
I quickly fixed that and… immediately hit another bug.
Rabbit hole level 5: coreutils
Now expect would correctly open a PTY, but it would fail to configure it.
I once again dug into the source code and found that it does this by running
"stty sane" using the system() call. So I ran that same command in my shell,
and indeed was greeted with the exact same error message.
Quick sidenote: I found the use of "stty sane" using strace and looking for
calls to the exec system call. This almost didn't work: support for printing
the command line of the executed command for exec in strace was added in Haiku
by another developer just 3 weeks ago. So that's one rabbit hole jumped over, yay!
stty is a standard command provided by GNU coreutils (in Haiku at least,
other operating systems may have their own version or one written by someone else
under a different license or using a different programming language).
The expectation is that coreutils will detect and check a lot of things about
the OS in their configure script while building, and compile the tools in a way
that works for each system. But they didn't handle the case where termios.h
defines speed_t as an unsigned char type. They set speed_t
variables to -1 and later compare them against -1. Due to integer promotion rules
in C, that comparison is never true: the stored value is 255, it gets promoted to
an int, and 255 is not equal to -1. If someone is trying to tell you JavaScript makes no
sense and you want them to go away, tell them about C integer promotion rules.
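For the curious, here is a minimal sketch of the trap in C. It is not the coreutils code: the type and function names are hypothetical, and the typedef only stands in for a termios.h that defines speed_t as unsigned char. It shows the comparison failing, and how a cast on the sentinel makes it behave.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption for this sketch: a stand-in for a termios.h that defines
   speed_t as unsigned char. The name is hypothetical. */
typedef unsigned char narrow_speed_t;

/* Integer promotion only widens types that are smaller than int. */
static_assert(sizeof(narrow_speed_t) < sizeof(int), "promotion to int applies");

/* Naive check: speed is promoted to int and keeps its non-negative value,
   so it can never equal a negative sentinel such as -1. */
bool matches_sentinel_naive(narrow_speed_t speed, int sentinel)
{
    return speed == sentinel;
}

/* With the cast, the sentinel is first converted to the narrow type
   (-1 wraps around to the largest value), so both sides agree. */
bool matches_sentinel_cast(narrow_speed_t speed, int sentinel)
{
    return speed == (narrow_speed_t)sentinel;
}
```

Storing -1 in a narrow_speed_t variable and passing it to matches_sentinel_naive with -1 returns false, while matches_sentinel_cast returns true for the same arguments.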
Anyway, I added the missing type cast, and stty started working. I thought I
was finally ready to go one level up the rabbit hole towards the surface. I was
wrong.
Rabbit hole level 4-and-a-half: expect again
I installed my newly built coreutils on my system, ran expect again, tried
to run a child process, and this time, not only did expect start, but I managed
to read the output from the launched program.
I then returned to the binutils test suite and ran 'make check' again. This
time, it ran 2 tests, and the 3rd one made it stop waiting for something. I was
a bit annoyed, not only because I had already fixed more bugs than I wanted to,
but also because I was not too sure which part of the stack was wrong this time.
Eventually I found how to enable expect debug mode, and found which command
it was running. I confirmed that the same command, run standalone, returned
immediately and with the correct results. So that wasn't a problem and I turned
my attention to the test framework.
I studied the DejaGNU script for the failing test, and, while it took some
time to peel all the layers, eventually I found that it was something quite simple:
run 'ar' with some arguments, wait for the command to complete, and then check
the output file. The failing part was 'wait for the command to complete'.
After some more experimentation with expect, I wrote a two line script that
reproduces the issue. I ran it on Linux and confirmed that it has no problem
there. Since that script is short, here is a copy of it:
spawn echo
expect eof
So basically, we start the 'echo' command and wait for it to terminate. And
expect doesn't notice that it terminates. 10 seconds later, there is a timeout
(that doesn't happen in the coreutils tests because they set the timeout to 300
seconds instead of 10).
I turned to strace again, but I could not see a lot more. I also tried to
follow the code in expect and in the tcl interpreter, but I quickly got lost.
So I opened a support request on the expect bugtracker describing my problem,
and went to sleep.
The next day, I had some answers from expect developers, mainly suggesting
things that I had already tried but not included in my short ticket, so I shared
the info (strace output) with them. A night of sleep
also helped me look at things in more detail. I know that expect uses a PTY
to communicate with the spawned process, and so I decided to write a simple
test program to do something similar with less "moving parts" involved: spawn
a child process attached to a PTY, let it exit, and verify that the parent
process waiting on the other side of the PTY is notified that the child is done.
Rabbit hole level 6: PTYs and poll
So I picked an example of PTY usage and started modifying it to my needs.
And, I could easily reproduce the problem. Once again I made sure to run the
program on Linux and Haiku to compare outputs. On Linux, when the child process
exits, the PTY is closed and the poll in the parent process is notified. On
Haiku, this does not seem to be the case, and so this program remains locked
waiting forever. However, removing the poll call, a read call does not block,
and properly returns an end of file. So it is just a problem of notifying the
process waiting on poll that the file descriptor it is waiting on is now closed.
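Here is a minimal sketch of that kind of test in C. It is not the exact program I used: the function name and return convention are made up for illustration, and it assumes the POSIX posix_openpt family of calls is available. The child opens the slave side of the PTY and exits immediately, then the parent polls the master side with a timeout.

```c
#define _XOPEN_SOURCE 600
#include <assert.h>
#include <fcntl.h>
#include <poll.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper. Returns 1 if poll() reports an event on the PTY
   master after the child has exited, 0 if poll() times out, and -1 if
   the PTY or the child could not be set up. */
int poll_sees_child_exit(int timeout_ms)
{
    assert(timeout_ms >= 0); /* a negative timeout would wait forever */

    int master = posix_openpt(O_RDWR | O_NOCTTY);
    if (master < 0)
        return -1;
    if (grantpt(master) < 0 || unlockpt(master) < 0) {
        close(master);
        return -1;
    }
    const char *slave_name = ptsname(master);
    if (slave_name == NULL) {
        close(master);
        return -1;
    }

    pid_t child = fork();
    if (child < 0) {
        close(master);
        return -1;
    }
    if (child == 0) {
        /* Child: open the slave side, then exit, which closes it again. */
        close(master);
        int slave = open(slave_name, O_RDWR);
        _exit(slave < 0 ? 1 : 0);
    }

    /* Parent: make sure the child is really gone before polling. */
    int status = 0;
    if (waitpid(child, &status, 0) < 0 ||
        !WIFEXITED(status) || WEXITSTATUS(status) != 0) {
        close(master);
        return -1;
    }

    struct pollfd pfd = { .fd = master, .events = POLLIN };
    int ready = poll(&pfd, 1, timeout_ms);
    close(master);
    if (ready < 0)
        return -1;
    return ready > 0 ? 1 : 0;
}
```

Calling poll_sees_child_exit(2000) should return 1 on a system where closing the slave side wakes up the poll on the master. A return value of 0 corresponds to the behaviour described above for Haiku, where the poll just sits there until the timeout.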
Now the next step is to fix that bug in Haiku. And, even if I do that, I
don't know if it will also fix the problem in expect, as I was not able to find
where in Tcl the waiting for file descriptors is handled.
So, as of now, I don't know if this rabbit hole has more rooms for me to
explore, or if I will find my way up at least one level. Maybe I will lose interest
in this and do other things for a few months before I get back to it. And probably
I will uncover many more rabbit holes.
Conclusion
For people who think Haiku should not be in "beta" releases, I hope this helps
you understand what we mean when we say Haiku is not finished. It is not a safe
ground to build any software on. Sure, a lot of commercial systems don't do any
better, or didn't in the past, but still, the other options currently available
aren't that bad nowadays. And not everyone is willing to get deep into these
things like I do.
For people who wanted to use my C compiler to port games to the V.Smile: well,
if you don't run Haiku, you can stay comfortably at level 1 or 2 of this rabbit
hole and still be of help. If someone else was porting this assembler and compiler,
I wouldn't need to run the binutils testsuite and all the deeper levels could be skipped. For now, at least.
For myself: sometimes it feels like I'm making no progress, but that's not
true. It's just a lot of work in directions I didn't initially plan to go in.
And such things are probably helpful for future projects as well. Also: I am
surprised there were not more complaints about expect not working, and about
PTYs being broken on Haiku. I thought these would be used a bit more often in
typical UNIX toolchains?