Germ Lisp Update
Germ is fitter, healthier, more productive.
Previously…
In a previous post, I introduced Germ Lisp. Germ is a small (less than 2.5 KiB) Lisp interpreter – written in x86 assembly – that can assemble itself. At the end of my previous post, I presented three ways I would like to see Germ improve. I’ve made significant progress on making Germ more capable, which is what I’ll present in this post.
Two Germs
In order to keep the original Germ tiny, all of the improvements are behind a feature flag. This means there are two Germs: the original one, now called “Germ0” and the new one, which has stolen the name “Germ”. This allowed me to write whatever new features I wanted without worrying about size.
Kernel improvements
Speaking of new features, there are quite a few! There are also a few bug fixes, but I’m not going to cover them. Here are the splashy new features at the kernel level.
Memory management
Germ now supports allocations of contiguous bytes (“bytes objects”) and allocations of contiguous objects (“structs”). Bytes objects make strings and symbols more efficient, and enable proper bytevectors. Structs enable vectors and records.
Symbol hash tables
Symbols are now hashed when created, and can be used as keys in hash tables. This is used to speed up symbol interning, and also for top-level and module lookups.
Modules
The kernel now supports simple module search, and maintains a “current module”. Much like Guile, the current module starts off as a raw hash table, but it can be replaced with a module object that has its own hash table along with a list of imports.
There is also a new hash table of loaded modules, which can be accessed (and modified) with the primitive loaded-modules.
Syntax
I added support for two new keywords: set! and @@. There’s nothing special about set!: it works as you would expect. The @@ keyword allows module search. It takes a module name and variable name, and looks up the variable in the module (making use of the list of loaded modules).
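For example, assuming the same surface syntax as Guile’s @@ (the exact spelling in Germ may differ), looking up a variable in a specific module could be written like this, using the (germ boot) module and its boot thunk from later in this post:

;; Fetch the value of `boot' from the (germ boot) module without importing it.
(@@ (germ boot) boot)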
Multiple values
The kernel provides call-with-values and values, both of which run with very little overhead. If the producer returns a single value normally, call-with-values has to allocate an argument list for it. If the producer uses values instead, all that happens is that the consumer procedure is popped off the stack and the arguments given to values are passed directly to it.
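For reference, here is the usual pairing of the two (standard Scheme, nothing Germ-specific):

;; The producer hands its results straight to the consumer; with values
;; there is no intermediate argument list to allocate.
(call-with-values
    (lambda () (values 1 2 3)) ; producer
  (lambda (a b c) (+ a b c)))  ; consumer
;; => 6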
Composable continuations
Rather than using the traditional call-with-current-continuation, I opted for “composable” (or “delimited”) continuations. I followed SRFI 226 (Control Features) for the interface, and wrote primitives for these mouthfuls: call-with-continuation-prompt, abort-current-continuation, and call-with-composable-continuation. These allow for all kinds of fun, but so far I’ve been using them for error handling.
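As a rough sketch of the error-handling pattern, assuming SRFI 226 semantics (the prompt tag constructor and the messages below are illustrative, not taken from Germ):

;; Install a prompt, then abort to it when something goes wrong.
;; The handler receives whatever was passed to abort-current-continuation.
(define error-tag (make-continuation-prompt-tag 'error))

(call-with-continuation-prompt
 (lambda ()
   (abort-current-continuation error-tag "car: Not a pair")
   'not-reached)
 error-tag
 (lambda (message)
   (string-append "caught: " message)))
;; => "caught: car: Not a pair"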
Dynamic guards
I wrote a special primitive to support dynamic-wind. It’s called call-with-dynamic-guards, and it works just like dynamic-wind, but doesn’t call the pre-thunk or post-thunk in normal circumstances (only during stack winding and unwinding). It’s much easier to implement without the extra calls – remember this is all in assembly! Germ uses dynamic-wind for parameterize. I did flirt with continuation marks (see SRFI 226) for a while, but decided against it for now (they are good, just not required for this project).
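To make the relationship concrete, here is a hedged sketch of how dynamic-wind might be layered on top of it – my reconstruction from the description above, not Germ’s actual code, and it only handles a single return value for brevity:

;; call-with-dynamic-guards installs the guards but does not call them
;; itself in the normal path, so the wrapper makes the ordinary calls.
(define (dynamic-wind pre thunk post)
  (pre)
  (let ((result (call-with-dynamic-guards pre thunk post)))
    (post)
    result))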
Error checking
After a while, you get sick of seeing segfaults. It’s much nicer if (car 0) signals an error instead of crashing the whole program. I added quite a few little type checks and assertions. While it’s still possible to crash Germ from user code, it’s much harder to do it by accident. On top of that, I implemented stack trace printing so that you can quickly locate the source of an error. Here’s an example:
> (define (bar)
    (car 0)
    #t) ; avoid tail position
=> bar
> (define (foo)
    (bar)
    #t) ; ditto
=> foo
> (foo)
Unhandled exception in procedure bar
  created in module user
called by foo
  created in module user
car: Not a pair (16 0)
This makes it easy to solve little errors like calling a missing procedure or passing the wrong number of operands.
Modules
The original Germ0 initialization script only did two things: load a script and launch a REPL. On Germ, it now bootstraps a module system and then calls the boot thunk defined in the (germ boot) module.
There are actually two module systems: a “boot” module system and a regular one. The difference between the two is that the “boot” module system uses the simple reader and expander from the initialization script, while the regular module system uses fully featured versions built using the “boot” module system.
Both of these module systems are like Guile’s. That is, they use define-module for creating modules and use-modules for importing them (as opposed to define-library and import like in R⁷RS). Not everything is implemented yet, but you can use #:select and #:prefix while importing and #:export, #:re-export, and #:pure while defining. Both exporting and re-exporting support renaming.
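To illustrate the forms that do work, a module definition and an import might look like this. All of the module and binding names are made up; the keywords are the ones listed above, plus #:use-module, which I’m assuming behaves as it does in Guile:

;; A module that imports with a selection, exports one binding, and
;; re-exports an imported one.
(define-module (germ example)
  #:use-module ((ice-9 regex) #:select (string-match))
  #:export (frob)
  #:re-export (string-match))

(define (frob str) (string-match "fro+b" str))

;; Elsewhere: import it with a prefix.
(use-modules ((germ example) #:prefix ex:))
(ex:frob "frooob")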
Unlike Mes, which copies Guile 1.8 code for the module system, I’ve opted to write it all from scratch. Guile’s module system is very complex, and supports fancy things like autoloading and observers. I don’t think Germ will have to support any of that (except maybe autoloading, which we can probably fake). So far it’s been easier having simple code that only provides what’s actually needed instead of dealing with the much more complicated code from Guile.
Macros
Again, looking at Germ0, we only had Lisp-style macros (define-macro) – and we didn’t even have quasiquote! Now I’ve ported PopSyntax to Germ, which provides hygienic syntax-case macros. PopSyntax uses the “sets of scopes” approach, developed by Matthew Flatt. After fixing a few ellipsis bugs – turns out that feature is complicated – it’s working really well. It can expand all kinds of complicated macros, like Alex Shinn’s match and the quasisyntax and quasiquote implementations from (old) psyntax (Portable syntax-case). The main problem is that it’s very slow. I’ll discuss performance in general at the end of this post.
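For a sense of what the expander now handles, here is a textbook syntax-case macro with an ellipsis – standard fare, nothing Germ-specific:

;; `or' as a hygienic macro: the introduced binding `t' cannot capture
;; a `t' at the macro's use site, and the ellipsis handles any arity.
(define-syntax my-or
  (lambda (stx)
    (syntax-case stx ()
      ((_) #'#f)
      ((_ e) #'e)
      ((_ e1 e2 ...) #'(let ((t e1)) (if t t (my-or e2 ...)))))))

(my-or #f 'found) ; => found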
The main thunk
With all this in place, the boot thunk (which was developed using the simple reader and expander) can load the main thunk using a better reader and PopSyntax as an expander. The main thunk is normal (Guile-flavoured) Scheme code. It uses getopt-long to parse its command line arguments, and – don’t be too disappointed – can load scripts and launch REPLs!
At this point, though, the scripts are Guile-like scripts, and the REPLs are Guile-like REPLs. If I wanted to, I could add support for “-L” for adding module directories or “-e” to run a particular procedure. It would all be straightforward. Yes, we have come full circle, but the scripts and REPLs can now be very complicated.
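Since getopt-long is plain Guile API, the main thunk’s argument handling presumably looks something like this – a hedged sketch rather than Germ’s actual code, with made-up option names and hypothetical show-help and start-repl helpers:

(use-modules (ice-9 getopt-long))

;; Parse the command line, then decide between loading a script and
;; launching a REPL.
(define option-spec
  '((help    (single-char #\h) (value #f))
    (version (single-char #\v) (value #f))))

(define (main args)
  (let* ((options (getopt-long args option-spec))
         (rest    (option-ref options '() '())))
    (cond ((option-ref options 'help #f) (show-help)) ; hypothetical helper
          ((pair? rest) (load (car rest)))            ; run a script
          (else (start-repl)))))                      ; hypothetical helper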
How complicated?
Well, at first I was excited and satisfied that Germ could run match. That’s a feature that’s as useful as it is taxing on the expander. But I kept going. It also runs getopt-long, like I wrote above. It runs it completely unmodified from the current Guile sources. Did you know that getopt-long uses (ice-9 regex)? Guile uses the C library for regular expressions, which is not an option for Germ. However, I had already built an (ice-9 regex) wrapper around pregexp (Portable Regular Expressions for Scheme and Common Lisp) to add regular expressions to Mes. That also works unmodified on Germ.
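As a quick taste of that interface (string-match and match:substring are the real procedure names from Guile’s (ice-9 regex); the pattern is just an example):

(use-modules (ice-9 regex))

;; string-match returns a match structure; match:substring extracts groups.
(define m (string-match "-(v[0-9]+)[.]tar" "germ-v2.tar.gz"))
(match:substring m 1) ; => "v2"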
That’s not the end though. I kept poking, and was able to get NYACC to parse C code. That’s right! Germ will read
int
main (void)
{
  printf ("hello\n");
  return 0;
}
and then spit out
(trans-unit
  (fctn-defn (decl-spec-list
               (type-spec
                 (fixed-type "int")))
             (ftn-declr
               (ident "main")
               (param-list
                 (param-decl
                   (decl-spec-list
                     (type-spec (void))))))
             (compd-stmt
               (block-item-list
                 (expr-stmt (fctn-call
                              (p-expr (ident "printf"))
                              (expr-list
                                (p-expr (string "hello\n")))))
                 (return (p-expr (fixed "0")))))))
Hooray! I bet MesCC wouldn’t be too much extra work, which is exciting. I’m more interested in getting some of the build machinery from GNU Guix running, but it intimidates me. It makes rather liberal use of Guile features. Ultimately I want the early bootstrap packages in Guix to be able to use this machinery along with G-expressions without relying on a pre-built Guile.
Performance
There’s good news and bad news when it comes to performance. I’ll start with the bad news. Booting Germ – getting to the point where the main thunk is loaded – takes 16 seconds on my computer. Six. Teen. Seconds. This is really rough. There seem to be two major reasons for this. First, because Germ is written in assembly, it’s harder to implement features in the kernel. Most things are delegated to the Lisp layer. At the moment, everything that’s not required to be in the kernel, isn’t. This means that Germ has to read and expand a lot of code in order to know how to do basic things. Ports, read, write, hash table lookups, records, and much more are all outside the kernel. Second, PopSyntax is much more “correct” than it is “fast”. This is a problem with using sets of scopes, and PopSyntax is not even a fast implementation of sets of scopes. Basically, Germ has to expand a ton of code with a slow expander.
How about some good news? Sure, but first: a story. A fellow Mes hacker, Ekaitz Zárraga, wrote a blog post looking at the performance of Mes, Guile, and Chibi using the naïve Fibonacci function as a benchmark:
(define (fib n)
  (if (< n 2)
      n
      (+ (fib (- n 1))
         (fib (- n 2)))))
(write (fib 35))
(newline)
Computing (fib 35) results in calling fib 29 860 703 times! Unsurprisingly, Mes was slow, Guile was blazing fast, and Chibi was pretty darn good. Well, how did Germ do? Poorly! Germ took 4 times as long as Mes built with GCC and twice as long as Mes built with M2-Planet. This didn’t make any sense to me, since I know the internals of both Mes and Germ, and Germ should be faster….
Now the good news! The reason Germ is so slow is that its arithmetic procedures (+, -, and <) are not fully built into the kernel. The kernel exposes those operations, but Germ adds variadic wrappers over the binary primitives. That means that Mes is calling C code directly for arithmetic, while Germ is calling assembly code via a few layers of Lisp. If you adjust the benchmark so that Germ uses its primitives directly, it only takes 40% of the time that GCC-built Mes does! It’s still 8 times longer than Chibi, but I’m pleased that it’s at least theoretically faster than Mes.
“Unexec”
What do you do with a Lisp system that’s slow to boot? Be like (ancient) Emacs and “unexec” it. I’m kidding, but only sort of. I’ve been experimenting with primitives that essentially serialize and deserialize Germ’s heap and its current continuation. They are called dump-image and load-image. After booting and before calling main, the boot thunk instead calls dump-image to write an image of the bootstrapped Germ to disk. Then, another script can run directly on the kernel to bootstrap just enough to call load-image. The load-image call is basically like calling a continuation that has been serialized to disk. Germ is immediately back at the point where it was about to call main. This works beautifully, bringing up a REPL in the blink of an eye.
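In boot-thunk terms, the flow looks roughly like this (the image file name and the exact arguments to the primitives are illustrative):

;; At the end of boot, instead of calling main:
(dump-image "germ.image")   ; hypothetical argument: where to write the image

;; Later, a tiny script running on the bare kernel:
(load-image "germ.image")   ; control resumes right before main is called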
I’m conflicted about this approach, since it feels like I’m sweeping serious performance problems under the rug. On the other hand, having images for other scripts could be really handy. You could have an image capturing the point right before you call the main procedure from MesCC or Gash or whatever. That way, you pay the cost of expanding macros only once. You could even go so far as to pre-compute the top-level lookups, embedding the locations directly in the code (but that may be a topic for a future post).
Other performance improvements
One issue that I suspect is slowing Germ down a lot is that a closure gets allocated for every lambda, even throwaway ones. As an example,
(let ((foo 42)) (+ foo 15))
expands to
((lambda (foo) (+ foo 15)) 42)
and that lambda generates a full closure that gets run and then immediately thrown away. This creates a lot of unnecessary GC pressure and just wastes time in general. It wouldn’t be too hard to recognize that special case, and speed it up. Since it’s used pervasively, I suspect it would improve things a bunch.
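A hedged sketch of what recognizing that special case might look like in an evaluator (this is not Germ’s actual evaluator; eval*, extend-env, and default-eval are hypothetical names):

;; When the operator is a literal lambda, evaluate the body in an
;; extended environment instead of building a closure object first.
(define (eval* expr env)
  (if (and (pair? expr)
           (pair? (car expr))
           (eq? (caar expr) 'lambda))
      (let* ((formals (cadr (car expr)))
             (body    (caddr (car expr)))
             (args    (map (lambda (a) (eval* a env)) (cdr expr))))
        (eval* body (extend-env env formals args)))
      (default-eval expr env)))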
As explained above, wrapping performance-critical procedures like + makes everything crawl. There are two ways to improve this. First, I could add a bunch of more complicated primitives. This would certainly help, but it’s quite cumbersome. Programming in assembly is fine when you get a feel for it, but it sure keeps your ambition in check. Another option would be to implement case-lambda in the kernel. That way binary + would be the primitive, while all other kinds of + would dispatch to Lisp code. This would also help with map and its friends, whose fully variadic versions are much more complicated than their fixed arity counterparts. It would also help with simple optional operands (like passing a port to write).
As a final performance thought, I’m tempted to allow Lisp code to install new primitives into the kernel. Maybe I could reserve some fixed amount of space for new primitives, and rig up a way to write, compile, and install new primitives directly from Lisp. Or maybe write the primitives in C and use MesCC to compile and link them into the kernel in a new build. Or maybe Pre-Scheme…. This all seems a little crazy, but clearly has excellent “fun hack” value.
Future
First, I need to clean up these new features. Most of them are fine, but I left a few things in WIP status to keep momentum up. The big issue is that I maintain two versions of Germ (and Germ0): one written in GNU assembly (for the GNU assembler) and one written in Scheme assembly (for Germ itself). This is mainly because the GNU assembler can make a version of Germ with debugging symbols. It also helps make sure there are no Scheme assembler bugs (the two outputs should be bit-for-bit identical). After a while, I got bored of porting from GNU assembler to Scheme, so I left some stuff unported. It’s a rather mechanical task that I just need to chew through.
Second, I took a few shortcuts to get NYACC running, and will have to clean those up, too. Because boot times are so bad, I iterated (in classic Lisp fashion) by keeping an image running and monkey patching new features into Germ as much as I could. This means I have a little script that reaches into the guts of various modules and fixes little issues. I need to move all those fixes into the module source files.
After those “eat your vegetables” tasks are done, I can continue having fun getting MesCC to work. Ultimately it would make sense to rewrite portions of MesCC to use an assembler and linker written in Lisp. I don’t plan to worry about that too soon, though.
Of course, there will have to be some performance experiments, and the other two tasks from the first blog post still remain: porting Germ to other processor architectures and porting Germ to hex0.
The main goal I’m looking at now is to get Germ running as the bootstrap seed for Guix. I’ve already experimented with Germ0 assembling Germ0, and that worked. Getting the rest of the Germ sources into Guix and building and running it is the next step. After that, I want to experiment with G-expressions and then the build machinery. If that all works, we’ll really be cooking.
Summary
Germ continues to show promise both as a tool to clean up the early Guix bootstrap and as a viable Scheme interpreter for much of the rest of the bootstrap process (until Guile is available). We are starting to hit significant performance issues, which will have to be addressed. In theory, Germ should be fast enough, but a few little oversights are tanking its performance. It will be interesting to see if Germ can speed up C compilation, which has been a big issue for those working on bootstrapping. Overall I’ve been pleased with how stable Germ is (especially with the new error checking). It’s becoming a fun little system to hack on, which helps keep momentum (and motivation) up.