Introducing Germ Lisp

Presenting my tiny, self-hosted Lisp.

Background

For many years, I’ve been helping to make software more “bootstrappable”. This has all been done in the context of the GNU Guix project. Guix makes the data flow from source code to running software environment explicit: these sources become these binaries by way of this compiler, whose sources become binaries by way of this other compiler, etc. How deep into compilers of compilers of compilers can you go? For many programs, all the way back to a 357-byte proto-compiler (on x86 Linux). There is a bit of hand waving here, as all of the builds are scripted in Scheme and need a compiled version Guile to run. Guile is several mebibytes. We’re close to getting rid of Guile, too, but that will have to wait for another post.

The 357-byte proto-compiler, hex0, simply turns ASCII hex digits into binary form. From there, we work up to a primitive assembler called M0. Then, with M0, we can build a compiler for a C-like language (a subset of C, really). This process is iterated a few times as the tools become more sophisticated, resulting in more capable compilers, standard libraries, assemblers, linkers, and several other programs. Then, we can build Mes. Mes is a Scheme interpreter. With Mes, we can run an even more capable C compiler called MesCC and use a more complete standard library. From here, we can build TinyCC, which can build an earlier version of GCC, which can build a later version of GCC, and so on until we have built the latest GNU build system.

This is all great! Really! It’s all ready to be integrated into Guix and make the bulk of the distro bootstrappable from soup to nuts. But sometimes I find myself wondering why we go from C to Lisp and then back again? Especially in Guix, where we end up going from C to Lisp, then back to C, and then finally back to Lisp. In fact, I’ve often wondered if we could insert Lisp immediately after hex0. A primitive Lisp interpreter is not much more complicated than a primitive assembler. It’s much more powerful, though!

This question has been in the back of my mind for several years now. I’ve built a few prototypes and explored various ways to pack a lot of punch in a small Lisp interpreter. Recently, I built a functional proof of concept which I’m delighted to present to you, dear reader, in this blog post.

Germ Lisp

I call the thing Germ Lisp (think “germinate”). It is written in assembly, runs on x86 Linux, and weighs in at about 2.25KiB. It works in two stages. The first stage is the kernel which is a tree-walking interpreter with a garbage collector, system interfaces, and enough of a parser to read basic S-expressions. The second stage runs directly on that kernel, adding a more sophisticated parser (apostrophe for quote, hex numbers, character literals, etc.) and a basic Lisp-style macro system. This second stage is Germ Lisp, and if you run it, you will be inside of a REPL, ready to send expressions to the interpreter.

System interface

The system interface is very Spartan, but extremely capable. It includes:

The combine-numbers procedure provides an indexed interface in to the various operations on machine integers. For example, you give it 0 for addition, 1 subtraction, and so on. It exposes 12 operations in total, including comparison. This is wonky, but saves space and goes away with a few simple definitions like:

(define + (lambda (x y) (combine-numbers 0 x y)))

The syscall procedure invokes the Linux kernel. In order to manipulate C-style strings, the Germ kernel provides a fixed-size buffer that can be read and written to using the buf-ref and buf-set! procedures. The syscall procedure takes a specification string for argument types. If the string has an “n”, the corresponding argument is a number and should be passed to the Linux kernel as-is. If the string has a “p” the corresponding argument is an index into the buffer and should be converted to a pointer before going to the Linux kernel. For example, to write the letter “A” to standard output, you could write:

(define %sys-write 4)
(define %stdout-handle 1)

(buf-set! 0 65)
(buf-set! 1 0)
(syscall "nnpn" %sys-write %stdout-handle 0 1)

Of course users of Germ wouldn’t usually use combine-numbers or syscall directly. The bootstrap script renames all the basic arithmetic procedures and provides a very primitive “ports” interface for input/output.

Memory management

To save space, the kernel only supports three types of objects: pairs, numbers, and “special” values (sigils, essentially). Since numbers and special values are atomic, the garbage collector only has to worry about pairs. This means that strings have to be lists of numbers with a “this is a string” special value in the car of the first pair. It’s horrifyingly inefficient, but it means the garbage collector itself is tiny.

It’s just a stop-and-copy system: stop execution, copy over some root objects, and then keep copying the objects they refer to (transitively) until you run out of new objects. When you copy a pair, put a “broken heart” special value in the old car, and the new address in the old cdr. That way, you never copy the same object more than once. The whole thing is just two subroutines: collect which traverses live objects and copy which copies them.

Self-hosting

Germ can build itself. The only script I’ve written for Germ (so far) is an x86 assembler, and a Lispified version of the Germ assembly code. The assembler constructs an embedded domain-specific language for writing assembly programs in Lisp. Here’s what the exit subroutine looks like:

;;; exit
;;; Exit the program with the code in EAX.
(: 'exit)
(mov %eax %ebx)                       ; Use EAX for the exit code
(movst ($ sys-exit) %eax)             ; EAX := SYS_exit
(int ($ #x80))                        ; Call kernel

If you’re familiar with x86, GNU assembler, and Lisp this should be pretty easy to understand. (The movst instruction is like an assembler macro that does the move via the stack, which is one byte cheaper.) Basically, each instruction is a Lisp procedure that manipulates some global state in the assembler. When you are finished adding instructions, you can run assemble! to get a list of bytes corresponding to the assembled instructions. These bytes go on to write-elf, which adds an ELF header to the machine code. The resulting ELF file is bit-for-bit identical to Germ as built with the GNU assembler.

As a bonus, the assembler written in Lisp can be run by Chibi Scheme (or likely any R⁷RS-compatible system). That means there are three ways to build Germ. Using the GNU assembler, using a Scheme interpreter, or using Germ itself. They all create identical results.

Limitations

It’s small and simple; what’s the catch? There are two major design decisions that trade performance for simplicity. The first is that variable values are resolved by linear search. This means that as programs grow in complexity performance degrades. For the small assembler, this is not a big deal. Germ is orders of magnitude slower than Chibi, but manages to assemble itself in a few seconds on my aging hardware. The second limitation is the representation of strings. First of all, it wastes a lot of memory. But it also wastes a lot of time to do string manipulations like linked list manipulations. And don’t even get me started on the extra GC pressure this causes!

On the other hand, it’s not as small as I would like. When I started, I hoped to keep the thing under 2048 bytes. The current ELF file is 2313 bytes. At this point I’m torn: it might be possible to bring it down to 2048, but there are one or two features the kernel is missing that would make the bootstrap script simpler. For the moment I’m trying to avoid getting distracted by nice, round numbers.

Inspiration

Like many Lispers, I’ve built a number of interpreters over the years. (It’s hard to resist!) When I first started exploring this space, I thought that it would require some kind of fancy trick to keep things small. Maybe inventing a clever pointer tagging system or avoiding having a top-level environment or something like that. In the end, the tried-and-true approaches yielded the best results. Germ looks a lot like any other interpreter. Rather than any kind of breakthrough, it’s the result of careful and parsimonious implementations of old ideas.

There are two projects that inspired me to take this approach. They both are, in many ways, careful takes on old ideas. The first is the rather famous SectorLISP. It shows how simple memory management and reading S-expressions can be. In some ways Germ is a beefier, Linux-compatible version of SectorLISP. The other project is Ribbit – a Scheme system composed of a bytecode VM and compiler. The original system provided a REPL in only 4K, and it was then updated and expanded to provide a complete R⁴RS system in 7K. (For comparison, when you include its bootstrap script, Germ is close to 30K and offers fewer features.) While Ribbit’s architecture is not directly applicable to bootstrapping, the simplicity of its bytecode VM helped convince me to throw away unnecessary features.

What’s next?

In order to make Germ more than just a proof of concept, it would have to be integrated into a larger bootstrapping process. To do this, it would have to be expanded in three directions.

First, it would need to be expanded down in order to connect to something simpler. It’s small, but maybe not small enough to be the bootstrap seed. It would be nice to keep hex0 in that role. This means converting the Germ sources to hex0 (or hex1 or even hex2). This is not a complicated job, just a bit tedious!

Next, it needs to be expanded up in order to connect to something more powerful. For this, a second version of Germ might be necessary to remove the two performance limitations discussed above. This would allow Germ to run more complicated programs – perhaps even MesCC. This should be done by conditionally assembling features into the Germ kernel. That way, we could keep the 2.25KiB version of the kernel, and use it to build a bigger version that can support a more capable scripting environment.

Lastly, it should be expanded sideways. By this I mean ported to other systems. A good start would be a RISC-V port. This should be pretty easy, as having more registers makes hand-writing assembly with bespoke calling conventions easier. It would be interesting to explore running on other kernels, of course, but that would require significant changes.

To be clear, I’m not sure if I’ll do any of these. My guess is that I’ll look at making Germ more capable. I’ve written an R⁷RS front end that provides syntax-case macros and a module system. I also have a handful of syntax and procedures from R⁷RS lying around. If I could get these to run (or even crawl) on a slightly expanded Germ kernel, that would be pretty neat.

Conclusion

It’s always nice to have a project with well-defined goal. “Is it small?” and “can it build itself?” are two questions that have pretty concrete answers. Now that Germ answers both questions in the affirmative, it’s time to put it out there. Please have a look at the code and feel free to send me feedback.