A tiny interpreter for a stack-based language

After yesterday’s post, I figured I might as well explain how stack-based languages work. After all, they’re not that common, so maybe some clarification would be in order.

As the name indicates, a stack-based (or “concatenative”) language relies heavily on a stack. Functions are not called with parameters; rather, they may take a number of items off the stack, do something with them, maybe push a result back onto the stack, etc.

To illustrate this, let’s write a tiny interpreter for a stack-based language. I’ll use Python, but it should be trivial to translate to other higher-level languages.

First, let’s describe the language. A program consists of a number of words separated by whitespace. Words can be numbers, actual words, operators, pretty much anything…  For now, we’ll stick to integers and a small number of built-in words.

The interpreter starts with an empty stack. When it encounters a number, it will push that number onto the stack. For example, “1 2 3″ is a valid program, that pushes the numbers 1, 2 and 3, in that order, on the stack. At this point, the topmost element of the stack is 3. This is important, because words take elements from the top.

Our language will then support the following words:

  • +, which takes the two topmost elements off the stack, adds them, and pushes the result back.
  • dup, which duplicates the topmost element.
  • drop, which takes the topmost element off (and does not do anything with it).
  • swap, which takes the two topmost elements off the stack, and pushes them back in reverse order. Or, in other words, it swaps the two topmost elements.
  • print, which takes the topmost element and displays it.

(“Real” stack-based languages have many, many more words, but this small subset is enough for our purposes.)

Now, let’s write the interpreter. First, we define a Stack class.

class Stack:
    def __init__(self):
        self.data = []
    def push(self, x):
        self.data.append(x)
    def pop(self):
        return self.data.pop()

All we really need right now is two methods: push(x) to push an item onto the stack, and pop() to take an item off.

Now let’s define a rudimentary interpreter. The process() method takes a word and tries to do something with it… if it’s a number, it will push it, if it’s a built-in word, it will execute it, otherwise it’s an error.

class Interpreter:

    def __init__(self):
        self.stack = Stack()

    def process(self, word):
        if is_int(word):
            n = int(word)
            self.stack.push(n)
        else:
            try:
                f = words[word]
            except KeyError:
                raise SyntaxError, "Unknown word: %s" % word

            # determine number of arguments
            n = num_args(f)

            # pop that number of args from stack
            args = reversed([self.stack.pop() for i in range(n)])

            # call f with that list
            result = f(*args)

            # take return value (a list)
            # push elements on stack
            for x in result or []:
                self.stack.push(x)

    def process_line(self, line):
        words = line.split()
        for word in words:
            self.process(word)

Now let’s define a few built-in words. The idea is that we can define these words as Python functions. The number of arguments these functions take, is the number of items that will be taken from the stack. They should return a list containing the items that should be pushed, or None, in which case nothing will be pushed.

def _print(x):
    print x

words = {
    '+': lambda x, y: [x+y],
    'dup': lambda x: [x, x],
    'drop': lambda x: [],
    'swap': lambda x, y: [y, x],
    'print': _print,
}

(Lambdas are especially useful here to add simple functions as one-liners, but of course we can use named functions as well.)

Now all we need is a few helper functions, and we’re good to go:

def num_args(f):
    """ Return the number of arguments that function <f> takes. """
    return f.func_code.co_argcount

def is_int(s):
    try:
        int(s)
    except ValueError:
        return False
    else:
        return True

(These are clumsy and incomplete; for example, num_args doesn’t work correctly for methods; but for now they will do.)

Throw all the code above in a file, and add this __main__ block:

if __name__ == "__main__":

    e = Interpreter()

    while 1:
        line = raw_input("> ")
        e.process_line(line)
        print "stack:", e.stack.data

…and run that file through Python.  You should now have a very small but working interpreter. Let’s test it.

> 1
stack: [1]

Aha! This pushes the number 1 on the stack. Let’s try a few more numbers…

> 2 3
stack: [1, 2, 3]

We now have three numbers on the stack. Remember that 3 is the topmost one… so in this representation, the right represents the top of the stack.

> dup
stack: [1, 2, 3, 3]

dup duplicates the topmost element… so now we have an extra 3.

> +
stack: [1, 2, 6]

+ takes the two top elements, add them, and pushes the result back. So it takes 3 and 3, adds them, and pushes 6 back, which is what we see. The other items on the stack (1 and 2) are unaffected by this.

What if we want to add, say, 1 and 6, and we aren’t interested in the 2? Use some of the other words:

> swap
stack: [1, 6, 2]
> drop
stack: [1, 6]
> +
stack: [7]
> print
7
stack: []

Well. This is basically how a stack-based language works. Of course, languages like Forth, Joy and Factor have much more than this small subset; they have pretty much everything that mainstream languages have, and possibly more. Even my small llizard interpreter has more: you get to define functions, variables, macros, use strings and lists, manipulate the stack in other ways, etc. The nice thing is that in the end, almost everything is just a word.

In the next posts I will talk about the way these built-in words are defined here, and probably release the OCaml code for my latest toy interpreter.

8 Comments

  1. Tim Hoffman said,

    December 21, 2008 @ 8:56 am

    Hi Hans

    One very widely available (though most people don’t use it directly ;-)
    stacked based language that you should mention is postscript.

    Whole OO systems including mutliple inheritance and delegation have been built in it. (A really good example was TNT which ran on NeWS from Sun) a totally awesome system imho ;-)

    Cheers

    T

  2. Sidhath said,

    December 21, 2008 @ 2:23 pm

    Possible Bug.

    if __name__ == “__main__”:

    e = Engine()

    Shouldn’t this be:

    e = Interpreter()

    Sorry if I’m wrong…

  3. Hans Nowak said,

    December 21, 2008 @ 2:25 pm

    You are right. I renamed this class, but forgot the __main__ block. Will fix.

  4. Evgueni said,

    December 22, 2008 @ 11:48 am

    I could be mistaken, but isn’t Python itself uses a stack machine inside (with appropriate bytecodes for it)

  5. Hans Nowak said,

    December 22, 2008 @ 12:13 pm

    Yes, the Python VM is stack-based as well… but people don’t usually program in it directly.

  6. Nico said,

    January 5, 2009 @ 6:34 pm

    Another not very known but mature stack-based language is Pop-11. It’s the base of the Poplog VM that also include implementations of Common Lisp, Prolog, Scheme and SML. Used for many years for AI, it was open sourced some years ago. Available for a number of OSs, can be found at: http://www.cs.bham.ac.uk/research/poplog/freepoplog.html

    Pop-11 is very funny to program in, allowing things that are impossible in most other languages.

    A curious detail is that assignement is done from left to right:

    5 -> x;

    It’s done through the user stack, so it could be splitted:

    5;
    -> x;

    Although its called VM, Poplog isn’t an interpreter or uses bytecode, but it’s incrementally compiled.

  7. Hans Nowak said,

    January 5, 2009 @ 6:38 pm

    Ah, yes, Poplog… I wrote about it a few years ago. Here are the links to those posts:

    http://zephyrfalcon.org/weblog2/arch_e10_00560.html#e562

    http://zephyrfalcon.org/weblog2/arch_e10_00560.html#e563

    I wonder if it has evolved since then? I’ll have to take a look.

  8. Link Backlog | Programmer's Log said,

    January 6, 2009 @ 5:34 pm

    [...] A tiny interpreter for a stack-based language Writing an interpreter in Python. [...]

RSS feed for comments on this post