I’ve written a few (toy) Lisp interpreters over the years. Who hasn’t right? :-) At the core of such interpreters, there’s the eval-apply cycle. It basically works like this:
- To evaluate an expression, call eval on it.
- If the expression is atomic, this is just a matter of a simple name lookup, or the expression evaluates to itself (in the case of literals).
- If the expression is composite (i.e. a list), it is considered to be a function call, where the first element of the list is a function, and the other elements are the arguments passed to that function. We evaluate all of these elements (again by calling eval on them), then call apply to do the actual function application.
- If said function is built-in, we just call it, get a result back, and be done. If it’s a user-defined function, then we need to evaluate the expressions in that function’s body, using the procedure described here.
This is a simplified version — it doesn’t take into account special forms or macros, for example — but it does capture the essence of the eval-apply cycle.
Anyway, since expressions can be nested arbitrarily deep, it should be clear that eval and apply can and often do call themselves and each other.
Now, when writing a Scheme interpreter in a language like Python, it is tempting to use a straightforward translation of this principle. You write functions (or methods, if you wish) eval() and apply(), which may call themselves and each other. This works rather well — until you blow the stack. The implementation language’s stack, that is… because every call to eval or apply in the Scheme interpreter, also uses the implementation language’s call stack.
This is a problem, especially in Scheme, which relies on tail recursion to define “iterative” processes. (These still use recursion, but because of tail call optimization, they operate in constant stack space. See SICP 1.2.1.)
We can simulate tail recursion by using exceptions (as demonstrated by this Cookbook recipe, for example). This is the approach I followed in most of my projects. While it does work, sooner or later you’ll run into other problems, because the interpreter is still piggybacking on Python’s call stack, rather than having one of its own. So, implementing continuations, for example, becomes very difficult, as the notion of “the calling expression” is implicit.
So, I decided to use a different approach. And now we’re finally getting to the point of this blog post. :-)
This time I started with a call stack. The expression we want to evaluate is pushed onto it. For example,
(+ 1 a)
Then, I wrote a method run() which does one “evaluation step” using this call stack. If it’s done evaluating, it returns the result value, otherwise it returns None, as an indication that there’s still more work to do. (So, in order to evaluate an expression, you call it multiple times until you get a result back.)
How does such an evaluation step work? We take the topmost expression on the call stack, and start evaluating it. In this case, it’s a composite expression, so we just start at the beginning. We want to evaluate the symbol +. To do so, we push + onto the stack, and put a placeholder in its original position, indicated by “$$”.
;; call stack now looks like:
($$ 1 a) +
We’re now done with this step. In the next step, we again look at the topmost expression on the stack. It’s a symbol; we look it up, get a function back (represented by <lambda:+>, and stick it back into the original expression. We then move on to the next element of the list, which is 1, and push it.
(<lambda:+> $$ a) 1
Another step, and we get:
(<lambda:+> 1 $$) a
Let’s say the variable a has been defined and its value is 4. We then get:
(<lambda:+> 1 4)
Now we’re done evaluating the elements of this expression, and we can do the actual function application. Finally, we get the result 5.
This gets more complex when we use nested expressions, but the principle is the same.
(+ (* a (- c d)))
($$ (* a (- c d))) +
(<lambda:+> $$) (* a (- c d))
(<lambda:+> $$) ($$ a (- c d)) *
(<lambda:+> $$) (<lambda:*> $$ (- c d)) a
(<lambda:+> $$) (<lambda:*> 4 $$) (- c d)
It becomes more interesting when we introduce special forms into the mix. But since we have an explicit call stack now, we can clearly see what is going on with it, and tail recursion starts making much more sense. For example, if our topmost expression is
(if condition (do-this) (do-that))
and during evaluation we determine that condition evaluates to true, we can at that point replace the whole expression by
rather than having to push (do-this) and then sticking the result back into the if expression. Ditto for the last expression inside a begin block, and so on.
The special forms require a bit of hackery, since we don’t evaluate all of their arguments… but there’s only a few of them.
The proof-of-concept implementation that uses this concept is called dollop and is available at github. (Requires Python 3.0.) The name is because it’s only a “dollop of Lisp” (or rather, Scheme); it only supports a few special forms (begin, define, if, lambda), and a few functions for example programs (+, -, *, =, list). It cuts corners in other ways as well, as my goal was to get a working proof-of-concept out, not to write a complete Scheme interpreter.
Anyway, comments, bug fixes, improvements, etc, welcome.
(I will also release two other Lisp-y interpreters, but those will get blog posts and project pages of their own…)