Germ as a C Compiler
Germ can run MesCC.
Introduction
Germ can now compile basic C code using MesCC, and that’s a nice milestone to write a blog post about! I have a bunch to say about it, but I want to start by showing how you can try it for yourself. Note that Germ only runs on the Linux kernel and only on x86-compatible architectures (I’m running Guix on top of x86_64). Also, I assume you have access to the Guix package manager. If not, you will have to scrounge together the dependencies some other way.
Demo
For the demo, I will assume that we’re working out of a single directory. That is, the Mes and Germ checkouts should live in the same place. Note that building Mes is quite slow (dozens of minutes) and building the MesCC image with Germ is also quite slow (about 10 minutes).
Obtaining and building Mes
First, we need to check out and build Mes 0.27.1:
git clone -b v0.27.1 https://git.savannah.gnu.org/git/mes.git
cd mes
guix shell -D -f guix.scm
CC=i686-unknown-linux-gnu-gcc ./configure
make
exit
cd ..
We do this because we need the headers and libraries from its C library. Germ might be able to build everything itself, but I haven’t tried that yet. For now, we will simply link with the Mes-built libraries.
Getting the MesCC demo version of Germ
I’ve prepared a special branch for this demo called demo-mescc (it will only be available temporarily, so sorry, readers from the distant future). It provides the NYACC and MesCC modules, along with the changes to them and to Germ needed to make things “just work”. Check out this branch and build Germ:
git clone -b demo-mescc https://git.ngyro.com/germ/
cd germ
guix shell -m manifest.scm
autoreconf -vif
./configure
make
exit
Building a MesCC image
Germ supports serializing a program’s state to disk, and this is used as a way to “compile” programs. The demo branch includes a script you can use to make a MesCC executable:
./pre-inst-germ mescc-image.scm "$(pwd)"/germ-continue pre-inst-mescc
chmod +x pre-inst-mescc
Our test subject
We now need a C program to compile. We don’t dare challenge the status quo:
#include <stdio.h>
int
main ()
{
puts ("Hello world!");
return 0;
}
Save this as “hello.c”.
Invoking the compiler
MesCC relies on an assembler called “M1” and a linker called “hex2”. Get them using Guix:
guix shell -e '(@@ (gnu packages commencement) stage0-posix)'
Finally, we can run MesCC using Germ:
./pre-inst-mescc --arch=x86 -I ../mes/include \
-L ../mes/lib -L ../mes/mescc-lib -o hello hello.c
Et voilà:
./hello
Hello world!
What’s new
Since getting NYACC working, there have been 34 new commits and a fair bit of cleanup of the branch history. Let’s look at some highlights.
Clean up
All the shortcuts from the last round of hacking are gone. Each change at the kernel level has been carried over to both the Scheme and the C versions of the kernel. I’ve been using the C version of the kernel to test the portability of the growing base of Scheme modules shipped with Germ. I can build a working x86_64 version of the kernel and use it to make sure that all the system calls made from Scheme code use the correct IDs and word size for the architecture. This reassures me that an ARM or RISC-V port would be possible and would not be full of unpleasant surprises.
On top of that, I’m now fairly confident in the kernel changes and the module system, so I’ve merged the WIP branches into the main branch.
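The portability check described above comes down to two per-architecture facts: system call numbers and word size. As a rough illustration (a C sketch, not Germ’s test code), here is a hypothetical table for the two x86 ABIs mentioned in this post, cross-checked against the host’s own <sys/syscall.h> when compiled natively. The names syscall_abi, find_abi, host_abi, and check_host_abi are made up; the write and execve numbers are the standard Linux values for i386 and x86_64.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/syscall.h>

/* Hypothetical per-architecture description of the Linux syscall ABI. */
struct syscall_abi {
    const char *arch;
    int word_size;   /* size in bytes of a pointer-sized syscall argument */
    long nr_write;   /* system call number of write(2) */
    long nr_execve;  /* system call number of execve(2) */
};

static const struct syscall_abi abis[] = {
    { "i386",   4, 4, 11 },
    { "x86_64", 8, 1, 59 },
};

/* Look up an ABI by name; returns NULL for architectures not in the table. */
const struct syscall_abi *find_abi(const char *arch)
{
    for (size_t i = 0; i < sizeof abis / sizeof abis[0]; i++)
        if (strcmp(abis[i].arch, arch) == 0)
            return &abis[i];
    return NULL;
}

/* The table entry for the ABI this file is compiled for, if there is one. */
const struct syscall_abi *host_abi(void)
{
#if defined(__x86_64__) && !defined(__ILP32__)
    return find_abi("x86_64");
#elif defined(__i386__)
    return find_abi("i386");
#else
    return NULL;
#endif
}

/* Cross-check the table against the host's own headers. */
void check_host_abi(void)
{
    const struct syscall_abi *abi = host_abi();
    if (abi == NULL)
        return; /* host architecture is not covered by the table */
    assert(abi->word_size == (int)sizeof(void *));
    assert(abi->nr_write == SYS_write);
    assert(abi->nr_execve == SYS_execve);
}
```

The two x86 rows show why both fields matter: the numbers differ completely between the 32-bit and 64-bit ABIs, and so does the size of each argument.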
Image loading
My original architecture for image loading was to use the same kernel
utility, and have it run specialized boot code to load an image. Now,
there’s a specialized utility that loads an image directly. This means
there are two utilities for each implementation of Germ: one called
“start” and the other called “continue”. The “start” utility reads
source code from scratch, while the “continue” utility loads a pre-built
image. As a result, loading an image takes only a few milliseconds.
I’ve also taken to using the “continue” utility to add a shebang line to
images, so that they can be executed directly. Now, the final germ
utility is just an image that is about to process command line arguments
and either run a script or launch a REPL as requested.
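When an executable file starts with #!, Linux runs the interpreter named on that line and passes it the file’s path as an argument, which is what lets an image be executed directly. A minimal C sketch of the writing side might look like this. It treats the image as an opaque blob of bytes, write_executable_image is a hypothetical helper, and it assumes the “continue” utility knows to skip the #! line when it reads the image.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical helper: write IMAGE (LEN bytes, treated as opaque) to PATH
   behind a "#!INTERPRETER" line and make the file executable.
   Returns 0 on success and -1 on failure. */
int write_executable_image(const char *path, const char *interpreter,
                           const void *image, size_t len)
{
    /* The kernel only reads the first line, so it must be a single line. */
    assert(strchr(interpreter, '\n') == NULL);

    FILE *out = fopen(path, "wb");
    if (out == NULL)
        return -1;
    int ok = fprintf(out, "#!%s\n", interpreter) >= 0
             && fwrite(image, 1, len, out) == len;
    if (fclose(out) != 0)
        ok = 0;
    if (!ok)
        return -1;
    /* rwxr-xr-x, so the file can be run as ./PATH */
    return chmod(path, 0755) == 0 ? 0 : -1;
}
```

For example, if the first line named an absolute path to germ-continue, running ./pre-inst-mescc hello.c would make the kernel start that germ-continue with ./pre-inst-mescc and hello.c as its arguments.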
Lists in system calls
MesCC calls out to other utilities using the system*
procedure. Under the hood, this uses the fork, exec, and wait
system calls. The exec system call is a little peculiar in that you
have to pass it lists: one for the arguments and one for the
environment. To the Linux kernel, a list of strings is a
NULL-terminated sequence of pointers to NUL-terminated sequences of
bytes.
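For comparison, here is the same fork, exec, and wait sequence written directly in C. It only illustrates the kernel interface and is not Germ’s implementation; run_program is a made-up name. Note that argv and envp are plain arrays of string pointers whose final element is a NULL pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Roughly what a system*-style procedure does: fork, execve, then wait.
   ARGV and ENVP must each end with a NULL pointer; that is how the kernel
   knows where each list stops.  Returns the child's exit status (127 if
   execve failed), or -1 if the child could not be started or did not
   exit normally. */
int run_program(const char *path, char *const argv[], char *const envp[])
{
    assert(argv != NULL && envp != NULL);

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execve(path, argv, envp);
        _exit(127); /* only reached if execve failed */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Germ’s system* goes through the same three system calls, but it has to build those argv and envp arrays itself inside its system call buffer.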
To support this style of system call, the syscall primitive now
accepts “l” as a parameter specification. This works basically like the
old “p” specification, except that it adjusts a NULL-terminated list
of pointers already in the system call buffer.
With this, plus a good deal of supporting Scheme code, we can make exec
system calls from Scheme. The tricky part on the Scheme side is loading
(and aligning) the strings correctly in the system call buffer. That
buffer also had to be made larger to hold an entire environment (full
of store paths, because I’m running Guix). I’ve considered using
bytevectors for this in the past, but so far I don’t see an urgent need
for them.
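Here is a rough C sketch of that kind of layout. It is a guess at the general idea rather than Germ’s actual buffer format, and every name in it is hypothetical: pack_string_list writes a pointer-aligned table of offsets followed by the NUL-terminated strings, and relocate_list does what this sketch assumes the “l” specification does, turning a NULL-terminated table of buffer offsets into real addresses. Packing argv and then envp into one buffer shows why alignment matters: the second table has to start on a pointer boundary even though the strings before it end wherever they end.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Slots hold an offset before relocation and a pointer after it. */
_Static_assert(sizeof(uintptr_t) == sizeof(char *), "slot size mismatch");

/* Round N up to the next multiple of the pointer size. */
static size_t align_up(size_t n)
{
    size_t a = sizeof(char *);
    return (n + a - 1) / a * a;
}

/* Hypothetical layout: at the first pointer-aligned offset at or after
   *POS, write a table of COUNT + 1 slots, then the strings.  Slot I holds
   the offset of string I; the last slot holds 0, which can never be a
   string offset because every string comes after its own table.
   Returns the table offset and advances *POS, or SIZE_MAX if BUF is
   too small. */
size_t pack_string_list(unsigned char *buf, size_t size, size_t *pos,
                        const char *const strings[], size_t count)
{
    size_t table = align_up(*pos);
    size_t next = table + (count + 1) * sizeof(uintptr_t);
    if (next > size)
        return SIZE_MAX;
    for (size_t i = 0; i < count; i++) {
        size_t len = strlen(strings[i]) + 1; /* include the NUL */
        if (len > size - next)
            return SIZE_MAX;
        memcpy(buf + next, strings[i], len);
        uintptr_t off = next;
        memcpy(buf + table + i * sizeof off, &off, sizeof off);
        next += len;
    }
    uintptr_t end = 0;
    memcpy(buf + table + count * sizeof end, &end, sizeof end);
    *pos = next;
    return table;
}

/* Assumed behaviour of the "l" step: walk a 0-terminated table of offsets
   in place, replacing each with a real address and the 0 with NULL. */
void relocate_list(unsigned char *buf, size_t table)
{
    assert(table % sizeof(char *) == 0);
    for (size_t i = 0;; i++) {
        uintptr_t off;
        memcpy(&off, buf + table + i * sizeof off, sizeof off);
        char *p = off == 0 ? NULL : (char *)buf + off;
        memcpy(buf + table + i * sizeof p, &p, sizeof p);
        if (p == NULL)
            break;
    }
}

/* Read slot I of a relocated table without assuming BUF's alignment. */
const char *list_item(const unsigned char *buf, size_t table, size_t i)
{
    const char *p;
    memcpy(&p, buf + table + i * sizeof p, sizeof p);
    return p;
}
```

After relocation, both tables are laid out the way execve expects its argv and envp arguments, provided the buffer itself starts on a pointer boundary.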
Misc. Scheme improvements
There’s a never-ending list of Guile features that are slowly creeping into Germ. To keep it brief, here are a few of the most interesting ones:
- Record inheritance
- Exception objects (think SRFI 35)
- Variadic fold and map
- The basename and dirname procedures
- Procedure specialization with cut (SRFI 26)
It’s getting much easier to add features now, too. New code can make
use of match. Existing code can be copied from Guile and expected to
work without modification. Having backtraces and near-instant startup
makes testing new code almost fun! (Most things are fun compared to
debugging an unadorned “Segmentation fault” that takes a minute to
reproduce.)
A bit of a snag
One of the biggest problems I encountered while doing this is that NYACC relies on the evaluation order of operands in a few places. As an example, it has code like:
(define (do-stuff first-char second-char)
  (frobnicate! first-char second-char))
(do-stuff (read-char) (read-char))
Now, if you process the operands from left to right, this code makes
perfect sense. However, Scheme does not specify the order in which the
operands are evaluated, and portable code should not depend on it. I
was hoping to take advantage of that when I simplified things by
evaluating the operands from right to left. That means that when
running on Germ, the first character read would be called second-char
while the second would be called first-char. This cropped up four
times in the NYACC code exercised to parse “hello.c”.
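C has the same rule: the order in which a function call’s arguments are evaluated is unspecified. The portable fix is the same in both languages: perform the side effects in explicit sequence before the call. In Scheme, one way is to bind the two (read-char) results with let* first; the C sketch below shows the equivalent, with next_char and do_stuff as made-up stand-ins for read-char and frobnicate!.

```c
#include <assert.h>

/* A toy character source standing in for (read-char). */
static const char *cursor;

static int next_char(void)
{
    assert(*cursor != '\0'); /* reading past the end is a bug here */
    return *cursor++;
}

/* Stand-in for frobnicate!: just records what it was given. */
static int seen_first, seen_second;

static void do_stuff(int first_char, int second_char)
{
    seen_first = first_char;
    seen_second = second_char;
}

/* Fragile: C, like Scheme, does not specify which argument is evaluated
   first, so the two characters may arrive swapped. */
void read_pair_unsafe(void)
{
    do_stuff(next_char(), next_char());
}

/* Portable: each read is its own statement, so the order is fixed. */
void read_pair(void)
{
    int first = next_char();
    int second = next_char();
    do_stuff(first, second);
}
```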
So far, I’ve just patched NYACC. It wouldn’t be too hard to change Germ to evaluate operands from left to right, but maybe it would be better for Germ to catch this kind of thing instead. It’s something I’ll have to ponder.
Looking ahead
I’ve already started a wip-guix-build branch to tackle
running Scheme code from Guix’s “build” modules. With some beefing up
of its reader and expander, Germ can load the gnu-build-system module
(along with its dependencies). This is nice because there are a few
gnarly macros in there, and Germ seems to be able to chew through them
just fine (famous last words – I’m sure it will fall over as soon as it
tries to do anything with them). The goal here is for Germ to build the
commencement Guix packages (i.e., Guile on the host side, Germ on the
build side). I guess a good start would be the stage0-posix package,
which uses the trivial-build-system and a handful of procedures from
(guix build utils). From there, I could move on to mes-boot to try
the gnu-build-system.
The other half of that job is getting Germ built from scratch in Guix.
I have a package that builds germ0, but I still need to use that to
build the rest of Germ with all of its modules. This will likely use
the same self-extracting Scheme trick I used for Bootar
(here’s an example of it). It’s basically just a
simplified tarball rendered as an S-Expression with enough Scheme code
to write the files to disk.
I think I’ll leave the future work at that. There’s a bunch to do with MesCC and Gash, but the Guix build code is a big enough job for now.