Another tiny interpreter for a stack-based language… this time in OCaml

So a few days ago I was wondering if simple stack functions could be implemented concisely in a statically typed language like OCaml… similar to the way it can be done in Scheme and Python, as discussed here. This prompted an interesting comment by Juri Pakaste.

However, one thing that does come to mind is that there’s a limited number of stack effects you need to support – maybe create higher order functions like one_two (for dup), two_two (for swap), etc, that would take a function as a parameter and return a function that takes in the env and ns parameters, pop the correct number of items, give them as args to the function received as a parameter, then push the return values?

So, using this idea, I wrote another proof-of-concept “interpreter” in OCaml. I’ve deliberately kept it as simple as possible; the stack only takes integers, and there’s only a few words, and no real parser; rather, you pass a list of words (as strings) to the process_words function. 

(Again, my OCaml is clunky, but then again I haven’t read ML for the Working Programmer yet.)

First of all, let’s define a simple stack:

type stack = { mutable items: int list };;

let push stk x =
    stk.items <- x :: stk.items;;

let pop stk =
    match stk.items with
      [] -> failwith "stack is empty"
    | hd::tl ->
      stk.items <- tl;
      hd;;

let stack_repr stk =
    let contents = List.map string_of_int (List.rev stk.items) in
    let s = String.concat " " contents in
    "[" ^ s ^ "]";;

let make_stack () =
    { items=[] };;

Next, let’s write a few common “stack transformers”.

let one_two f =
    let g stk =
        let n1 = pop stk in
        let (r1, r2) = f n1 in
        push stk r1; push stk r2
    in g;;

let two_one f =
    let g stk =
        let n2 = pop stk in
        let n1 = pop stk in
        let r1 = f n1 n2 in
        push stk r1
    in g;;

let two_two f =
    let g stk =
        let n2 = pop stk in
        let n1 = pop stk in
        let (r1, r2) = f n1 n2 in
        push stk r1; push stk r2
    in g;;

let one_zero f =
    let g stk =
        let n1 = pop stk in
        let _ = f n1 in
        ()
    in g;;  

The idea is to take functions with different signatures; e.g. one_two wants a function that takes one argument and returns a tuple of two items, two_one takes two arguments and returns one, etc. They are then “transformed” to (or rather, encapsulated in) functions that take one argument (a stack) and apply the wrapped function to it, popping and pushing the right number of items.

Using these transformers, we can now define our built-in words.

let words = [
    ("dup", one_two (fun x -> (x, x)));
    ("swap", two_two (fun a b -> (b, a)));
    ("drop", one_zero (fun x -> ()));
    ("+", two_one (fun x y -> x + y));
    ("-", two_one (fun x y -> x - y))
];;

It looks a bit odd, but the actual functions have different signatures, and OCaml won’t let us store them in the same list, unless we wrap them. (Python’s decorators come to mind…)

Next up is the “meat” of the interpreter… processing words, determining whether they are numbers, looking them up in our list of builtins.

let lookup_word word =
    let rec lookup_word_aux words_left =
        match words_left with
          [] -> failwith ("unknown word: " ^ word)
        | (name, f) :: tl ->
          if name = word
          then f
          else lookup_word_aux tl
    in
    lookup_word_aux words;;

let is_integer word =
    try
        let _ = int_of_string word in
        true
    with Failure e ->
        false;;

let process word stk =
    if is_integer word
    then push stk (int_of_string word)
    else
      let f = lookup_word word in
      let _ = f stk in
      ();;

let process_words words stk =
    let g word =
        print_string ("> " ^ word ^ "\n");
        process word stk;
        print_string ("stack: " ^ (stack_repr stk) ^ "\n")
    in
    List.iter g words;;

Test test…

process_words ["1"; "2"; "3"; "+"; "dup"; "drop"; "swap"; "-"]
    (make_stack ());;

…produces:

> 1
stack: [1]
> 2
stack: [1 2]
> 3
stack: [1 2 3]
> +
stack: [1 5]
> dup
stack: [1 5 5]
> drop
stack: [1 5]
> swap
stack: [5 1]
> -
stack: [4]

Fun! Now, let’s try this in Java… ;-)

:: Comments (7)

Manta (everything old is new again)

So, this language with the working title llizard… I’m going to change its name to Manta.

I’ve been tinkering with various incarnations of this language/idea for a *long* time. I believe the oldest versions were called Manta. This was back in, say, 1992, when I was still using Turbo Pascal. :-)

Anyway, the OCaml sources are now available. Get them here: [manta-20081222.tar.gz].

Big fat disclaimers apply. This code is not so recent (written in 2006), and it was my first non-trivial project in OCaml, so I’m sure I’m doing a lot of clumsy things… like writing my own version of functions and data structures that are probably available in some library, among other things. It matters not. This is only an experimental version, and I am making the code available for masochistic adventurous souls.

There’s currently no documentation included. Build it using the build script (Unixoid systems only). Obviously you’ll need a recent version of OCaml (3.x), and Python if you want to run build.py (which generates a makefile of some sort). More info will be available when I get around to giving this thing a separate project page. :-/

A newer version, if any, will probably be written in Python 3.0 or Clojure or something. I have no concrete plans right now, but this project has a tendency to resurface every few years or so.

:: Comments off

Interpreter, redux

This post is a followup on:

Let’s take another look at the way the built-in words were defined, in the interpreter. This may look straightforward, but I used to do this differently, even in Python.

In the OCaml code (which will be made available soon), all built-in words are functions with the same signature. They all take two parameters, an environment (basically an object that holds the stack and other interpreter state) and a namespace. These functions then use the environment to pop elements off the stack, etc., resulting in code that looks like this:

(* dup  ( x -- x x ) *)
let dup env ns =
  let obj = Env.top env in
  Env.push env obj;;

(* swap  ( a b -- b a ) *)
let swap env ns =
  let a, b = Env.pop2 env in
  Env.push env a;
  Env.push env b;;

(* -  ( n1 n2 -- n3 )
*)
let minus env ns =
  let top_stack = Env.top_stack env in
  let x1 = Stack.pop top_stack in
  let x2 = Stack.pop top_stack in
  match x1.obj, x2.obj with
    LzInt i1, LzInt i2 ->
      let result = i2 - i1 in
      let newobj = Env.create_object env (LzInt result) in
      Stack.push top_stack newobj
  | _ -> raise Invalid_arguments "-: requires two integers"
;;

(Note: This was my first non-trivial project in OCaml, written about two years ago… I’m sure the code can be improved, but bear with me.)

For a long time I thought this was a fairly straightforward way to do it… until I saw the source code of this Joy interpreter (by John Cowan). Written in Scheme, it puts macros to good use in order to make word definitions simpler and clearer. The words mentioned above can be defined like this:

(joy-prim-list (dup x) (list x x))
(joy-prim-list (swap x y) (list y x))
(joy-prim (- i j) (- i j))

Definitions like (dup x) and (swap x y) tell us how many items are popped off the stack, *and* they get named automatically, to be used in the following expression.

Python doesn’t have macros, but it has other ways to do something similar. The tiny interpreter I wrote about yesterday uses a form of introspection to determine the number of arguments a function has. Using this, we can then write built-in words like this:

words = {
    '+': lambda x, y: [x+y],
    'dup': lambda x: [x, x],
    'drop': lambda x: [],
    'swap': lambda x, y: [y, x],
    'print': _print,
}

In other words, the function associated with +, takes two items off the stack, adds them, and puts them back. The function associated with dup takes one item, pushes back two, etc. This allows for definitions that are much more concise and straightforward… at least in these cases.

I wonder if the same can be done in statically typed languages like OCaml (or Haskell)? I don’t see an obvious way to do it, but I’d love to be enlightened. :-)

:: Comments (2)