Germ as Bootstrap Guile
Germ can run the GNU build system from Guix
Introduction
Very recently, Germ learned another new trick. It can now replace Guile as the build-side Scheme interpreter for Guix when building the GNU Hello package. This is an exciting milestone, as the fear in my heart of trying to do this with Mes was what motivated Germ in the first place. Of course, Germ can’t do all of Mes’s tricks just yet, but this is the first thing that Germ can do that Mes cannot.
What it looks like
Here are some shell commands you can run in your own home to see the results for yourself:
git clone https://git.ngyro.com/germ/
cd germ
git checkout wip-guix-boot
guix shell -m manifest.scm
autoreconf -vif
./configure
make germ0-start germ0-boot.scm dist
gunzip -c germ-0.1.0.tar.gz \
| ./etc/make-ses.scm \
> germ-0.1.0.ses
guix build -f guix/boot.scm
If everything goes well, you should see something like:
successfully built /gnu/store/...-hello-2.12.3.drv
/gnu/store/...-hello-2.12.3
Of course whether this works depends on the version of Guix you’re using, but the build-side code rarely changes.
Let’s look at the packages of the “guix/boot.scm” file to see what’s going on.
Package 0: guile-via-germ0-seed
This is the bootstrap seed. It takes three inputs: the “germ0-start”
binary, the “germ0-boot.scm” script, and an install script which is
made using plain-file from Guix. The install script
copies the binary and the bootstrap script into the store, and sets
the shebang line of the bootstrap script to be the location of the
binary. It calls this script “guile” so that we can use it for
build-side Guix code. Note that this uses an opaque binary “blob”,
but it’s very small and it’s the only one we need.
This package uses a custom “raw” build system. This is modelled after the bootstrap Guile package. It creates a derivation whose builder is just a single command. In this case, the command is like
germ0-start germ0-boot.scm \
germ0-install.scm germ0-start germ0-boot.scm
but with each part replaced with a full store path.
Package 1: guile-via-germ0
This package uses the previous one as it’s Guile, and runs a G-expression via the “trivial” build system. This expression does two things:
- It unpacks the Germ sources, and
- It runs the “scm/germ0-to-germ0.scm” install script contained therein.
The first point is interesting, because the Germ sources are packaged as a Scheme program. To unpack them, you simply run the program! This is a nice way to avoid building a program for unpacking a tarball until later.
The output of this package is much the same as the first one, except that we built it ourselves. We are now “bootstrapped”.
Package 2: guile-via-germ-sans-image
This one is an intermediate step. It uses Germ0 to build regular Germ,
but it cannot create an image because Germ0 has no way to execute
another program. This is because the exec family of system calls
requires a list argument, and I left support for that out of Germ0 to
save space. Hence, we build the full Germ twice.
Package 3: guile-via-germ
This is the final Germ package. It is built by Germ, and improves on the previous package by including a pre-built image for fast startup. It will serve as the Scheme interpreter for all future packages.
Package 4: hello-germ
This is the main attraction. This is the hello package, but with
Guile replaced by Germ (using the #:guile package argument). This
requires Germ to run the gnu-build procedure from the GNU build
system.
What it took
Loads! I’m looking at 128 commits now on the WIP branch I’m using for this! The changes fall into three broad categories: new features, bug fixes, and heinous hacks.
New features
Install scripts
Germ now has three install scripts (besides the standard GNU tools). One builds Germ0 from Germ0, the next builds Germ from Germ0, and the last builds Germ from Germ. In order for these to work, both Germ and Germ0 needed to learn how to create and change to directories, as well as set file permissions.
Furthermore, in order to run these scripts from the Guix trivial build system, Germ0 needed to be able to read environment variables.
Exceptions
Germ now has structured exception objects (with inheritance and
everything). I’ve also added the two relevant SRFI’s: 34 (Exception Handling for
Programs) and 35 (Conditions). On top of this,
system calls now generally raise structured errors, which catch
converts to old-style Guile system call errors.
scandir
Germ now has the scandir procedure that I wrote for Mes. It also has
file-system-fold taken from Guile.
Random access, multi-mode ports
The ports system now supports random access for file ports. This is
built on the lseek system call. (String ports should support this,
too, but I didn’t get around to it.) Additionally, a single port can
be an input and output port now. There’s some details about when
ports get flushed that I may have borked, but I did think hard about
it!
Bignums
I ran into the famous Y2K4 bug that all Unix users face when using
31-bit signed integers to represent time. So now Germ supports bignums
(arbitrary precision integers). These bignums were bigdeals to
implement. (I almost wrote a whole post about how division works, but I
decided it’s actually pretty tedious.) This all happens in Scheme.
Every arithmetic operation now has to be generic, and fixnum arithmetic
has to be checked for overflow. I thought this would be a major
performance problem, but so far I haven’t noticed when running regular
code. The fibonacci benchmark (see my previous post)
takes 3.3 times as long, but it’s artificially arithmetic heavy.
It took a long time to write the bignum code. Partly it was hard, but maybe more importantly it was boring. After figuring out how division worked the rest of it was mostly bookkeeping, but very careful bookkeeping. I wrote it with Guile and Guile-QuickCheck in hand, the latter making testing pretty painless. Briefly, I considered using code from the reference implementation of SRFI 77, but decided against it because I was concerned about both its license and its quality. However, I laughed at one of the comments:
; Divide m/n: find q and r such that m = q*n + r, where 0 <= r < n.
; Oh no... time to get out Knuth...
I did indeed get out Knuth, which is always a riot if you have a really thorny problem and a lot of patience for math and history.
POSIX procedures
Many more procedures for interacting with the operating system are now
in Germ. Most of them deal with file system metadata, but my personal
favourite is fork. I also implemented copy-file using Linux’s
sendfile. That’s right: arithmetic is slow but you can copy files as
fast as the kernel can sling bytes!
Bug fixes
As usual, many little bugs needed fixing, but I want to recount a bug
fixing story, so I’ll only talk about one. It showed up after I
implemented bignums. Every once in a while, fixnum? would throw a
“wrong number of arguments” exception. The problem is that the source
code clearly shows this is impossible. Furthermore, it would happen
sporadically. If you change anything about the program, the error might
go away or pop up at a completely different time. Sounds like a garbage
collector bug... *shudders*
In order to get some more information, I wrapped the fixnum? procedure
up so that I could catch the exception and print more details about
what’s going on. Doing this showed that it only seemed to happen with
the numbers 12 and 13. For a while it was always 13, and I was worried
that Germ is refusing to work because it’s superstitious, but when 12
showed up I knew it had to be a real bug. Eventually I made it so that
the wrapper was basically:
(define (wrapped-fixnum? obj)
(or (false-if-exception (fixnum? obj))
(fixnum? obj)))
That made the errors stop, but obviously it’s ridiculous.
The cause of the error is how fixnum? is defined. It checks that the
object isn’t any other type. Germ only has four principle types, and
fixnum? is one of them. To check if an object is a fixnum, I wrote a
procedure that checks that it is not any of the other principle types:
(define (fixnum? obj)
(and (not (pair? obj))
(not (pointer? obj))
(not (special? obj))))
Germ maintains a very small number of primitives, and so this seemed
like a clever way to avoid making fixnum? one of them. Of the other
three, both pair? and pointer? are primitives. The last one,
special?, was defined in a goofy way:
(define (special? obj)
(eq? obj (make-special obj)))
(Nerds will appreciate the fact that specials are simply the fixpoints
of make-special.) The make-special procedure very unsafely tags
the object it’s given as a special. I used it to generate special
objects for procedures like eof-object?. However, there are a few
objects that are very bad to have in user code. The worst of them is
the broken heart object. This object tells the garbage collector that
a value has already been moved, and that it should use the new
location. If it shows up in user code, it steers the garbage
collector into a ditch with a dangling pointer.
What becomes the broken heart object when passed into make-special?
That’s right: the numbers 12 and 13! If you ask (fixnum? 13) it will
in turn ask (special? 13), which will briefly generate a broken heart
object. If you are very unlucky, the garbage collector will run into
this value, get confused, and poison your cat.
In order to fix this bug, I replaced make-special with fixnum?
(called number? during the Germ boot script) as a primitive. Now the
reader has support for generating specials. They are merely literals in
the source code. It’s a bit hacky, but I’m glad to be no longer haunted
by those numbers.
Heinous hacks
So far Germ has been relatively free of shortcuts, but I’ve added some doozies now.
Fake system
We can’t easily support Guile’s system procedure, as it parses shell
commands. Assuming we have no other shell or parser, we would need to
pull in Gash for that job, which would be pretty messy for one
little procedure. Instead, I’ve included a version of system that
simply ignores the one unnecessary use that Guix has for it.
Fake division
Guix keeps track of how much time each “phase” of a build script takes, and presents this information to the user:
phase `build' succeeded after 1.3 seconds
The way it does this is to convert a time object—made up of seconds and nanoseconds—into an inexact number. Specifically, it runs
(+ (time-second delta)
(/ (time-nanosecond delta) 1e9))
Now, Germ does not support inexact numbers. It’s not impossible to
add them to the kernel or to emulate them, but I’m hoping that it
won’t be necessary. In the mean time, I’ve added just enough hacks so
that the above math works and the result can be consumed by format’s
floating-point printer.
Now what?
In terms of achieving bootstrapping goals, the next step would be to
try Germ on the full Guix bootstrap. Rather than just building the
hello package, it would have to build everything up until Guile
itself can be built. This is within reach, but it’s a pretty
frustrating time (speaking from past experience). Even if that
worked, we would still be using %bootstrap-guile for Gash and
company. Fixing that involves porting those things to Germ. That
should be doable, since I’ve already done much of that work for
Mes. If those two parts worked, we could get rid of
%bootstrap-guile (on x86, anyway).
Another question is performance. The fun side of that is speeding up
the interpreter itself. Germ takes about 2.5× the time that Guile 1.8
does to compute (fib 35) fully recursively. Doing some experiments,
I can get the C version of Germ close to parity (hooray computed
gotos). That’s still an order of magnitude slower
than modern Guile. It’s also 3–4× the time it takes Chibi to
compute it. Ekaitz Zarraga has been working on implementing
a compiler and bytecode interpreter for Mes, and honestly I’ve got a
little FOMO. However, I
don’t have good insight into what’s actually slowing Germ down outside
of benchmarks. I could work on making procedure calls twice as fast,
or I could work on making half as many procedures calls—it’s the same
outcome! On that front I could speed up I/O by making system calls work directly
on bytes in the heap, rather than going through an intermediary step.
Or I could add let or even letrec to the kernel instead of
emulating them with lambda (making temporary closures along the
way). Plenty of experiments to try.
There’s also the need to run on other architectures. I’ve worked to keep the Scheme code portable, and test it from time to time by running it on an x86_64 kernel (built from C sources). It’s probably not perfect, but it gives me confidence that a port would require minimal Scheme changes. Porting the assembly code (and the assembler) would be a substantial effort. Honestly, it would be kinda relaxing—like doing a puzzle. The problem is that doing so would entrench the current design, and make changes very difficult. Until I’m sure about the design, I’m not interested in porting.
Parting words
Ending blog posts with a big fat list of jobs is a bit of a downer. This is supposed to be a celebration of a milestone: go forth and bootstrap your gexps! (Don’t actually Forth, though—LISP rules; Forth drools!)