Interpreter, redux

This post is a followup on:

Let’s take another look at the way the built-in words were defined, in the interpreter. This may look straightforward, but I used to do this differently, even in Python.

In the OCaml code (which will be made available soon), all built-in words are functions with the same signature. They all take two parameters, an environment (basically an object that holds the stack and other interpreter state) and a namespace. These functions then use the environment to pop elements off the stack, etc., resulting in code that looks like this:

(* dup  ( x -- x x ) *)
let dup env ns =
  let obj = Env.top env in
  Env.push env obj;;

(* swap  ( a b -- b a ) *)
let swap env ns =
  let a, b = Env.pop2 env in
  Env.push env a;
  Env.push env b;;

(* -  ( n1 n2 -- n3 )
*)
let minus env ns =
  let top_stack = Env.top_stack env in
  let x1 = Stack.pop top_stack in
  let x2 = Stack.pop top_stack in
  match x1.obj, x2.obj with
    LzInt i1, LzInt i2 ->
      let result = i2 - i1 in
      let newobj = Env.create_object env (LzInt result) in
      Stack.push top_stack newobj
  | _ -> raise Invalid_arguments "-: requires two integers"
;;

(Note: This was my first non-trivial project in OCaml, written about two years ago… I’m sure the code can be improved, but bear with me.)

For a long time I thought this was a fairly straightforward way to do it… until I saw the source code of this Joy interpreter (by John Cowan). Written in Scheme, it puts macros to good use in order to make word definitions simpler and clearer. The words mentioned above can be defined like this:

(joy-prim-list (dup x) (list x x))
(joy-prim-list (swap x y) (list y x))
(joy-prim (- i j) (- i j))

Definitions like (dup x) and (swap x y) tell us how many items are popped off the stack, *and* they get named automatically, to be used in the following expression.

Python doesn’t have macros, but it has other ways to do something similar. The tiny interpreter I wrote about yesterday uses a form of introspection to determine the number of arguments a function has. Using this, we can then write built-in words like this:

words = {
    '+': lambda x, y: [x+y],
    'dup': lambda x: [x, x],
    'drop': lambda x: [],
    'swap': lambda x, y: [y, x],
    'print': _print,
}

In other words, the function associated with +, takes two items off the stack, adds them, and puts them back. The function associated with dup takes one item, pushes back two, etc. This allows for definitions that are much more concise and straightforward… at least in these cases.

I wonder if the same can be done in statically typed languages like OCaml (or Haskell)? I don’t see an obvious way to do it, but I’d love to be enlightened. :-)

:: Comments (2)

Cutting

I did some long overdue Chicken hackery yesterday, and by accident I found out that Chicken’s (or rather, SRFI-26‘s) cut/cute macros are not the same as Arc’s [ ] syntax after all.

Scheme/Arc aficionados already knew this, of course, but to me it was news since I’ve never really used cut much. Here’s a short explanation.

Scheme’s cut produces an anonymous function that takes as many arguments as there are <> symbols at the “top level”. E.g.

> (cut + 1 <>)
#<procedure (? g3)>
; has one argument

> (cut + 1 <> <>)
#<procedure (? g4 g5)>
; has two arguments

; etc...

However, any <> that is found in a nested expression, is not considered as a parameter. Therefore:

;this works...
> (map (cut + 1 <>) '(1 2 3))
(2 3 4)

; but this does not:
> (map (cut + 1 (* 2 <>)) '(1 2 3))
Error: bad argument count - received 1 but expected 0: #<procedure (?)>

By contrast, Arc’s [ ] syntax produces an anonymous function that expects one and only one argument, which is represented by a single underscore. As a result, it’s simultaneously more and less limited than cut:

arc> (map [+ 1 _] '(1 2 3))
(2 3 4)
arc> (map [+ 1 (* 2 _)] '(1 2 3))
(3 5 7)
; no problem

; but two parameters doesn't work:
arc> (map [cons _ _] '(1 2 3) '(4 5 6))
Error: "#<procedure>: expects 1 argument, given 2: 1 4"

Of course, for anonymous functions that are more complex than this, it’s probably better to just use lambda… :-/ Or a named function.

:: Comments (2)

Interlude: Dude, where’s my Chicken?

This started out as a Chicken blog, where I would report my adventures with Chicken Scheme. But these days it’s more like a Python blog again. (Which isn’t too surprising, given my experience with the language, and the fact that I use it for work and most of my personal projects.)

Heck, it has even been added to Planet Python again, which wouldn’t have been the case if I wrote about Scheme all the time.

But really, I want to learn more about Scheme as well, *and* be able to use it in a pragmatic way. I just need a bit of focus. Right now I’m wondering what to look at next. Maybe object systems?

:: Comments off

Python vs Scheme: using files as both modules and programs

Python has a useful idiom, that allows one to use the same file as both a module and a program. Consider this simple example:

# foo.py

def bar(x):
    print "bar says:", x

if __name__ == "__main__":

    bar(42)

The if __name__ == “__main__” clause is only executed if foo.py is run as the “main program”. In other words, we can do both

$ python foo.py
bar says: 42

and

$ python
Python 2.5.1 (r251:54869, Apr 18 2007, 22:08:04)
[GCC 4.0.1 (Apple Computer, Inc. build 5367)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import foo
>>> foo.bar(33)
bar says: 33

Although admittedly a bit of a hack, this construct is well-known and often used.

Now, on to Chicken Scheme. Can we do the same? It required a bit of poking around in documentation and mailing list, but it appears the answer is yes.

;; foo.scm

(define (foo x)
  (print "foo says: " x))

(define (main args)
  (print "args: " args)
  (foo 42))

Now, we can run this as a script with csi -ss (which looks for a function called main and automagically calls it):

$ csi -ss foo.scm
args: ()
foo says: 42

Notice that the main function has an argument args, which contains the command line arguments passed to the program:

$ csi -ss foo.scm 1 2 3
args: (1 2 3)
foo says: 42

We can also import foo.scm from within an interactive session, in which cases main is not called:

$ csi

CHICKEN
Version 3.0.0 - macosx-unix-gnu-x86     [ manyargs dload ptables applyhook ]
(c)2000-2008 Felix L. Winkelmann        compiled 2008-03-05 on niflheim.local (Darwin)

; loading /Users/zephyrfalcon/.csirc ...
; loading /usr/local/lib/chicken/3/readline.so ...
#;1> (use foo)
; loading ./foo.scm ...
#;2> (foo 101)
foo says: 101

But, but! Isn’t Chicken primarily a compiler? Does the above work too when using csc rather than csi? Actually it does, but the invocation is different. I use the following:

$ csc foo.scm -postlude "(main (cdr (argv)))"

$ ./foo
args: ()
foo says: 42

$ ./foo 1 2 3
args: (1 2 3)
foo says: 42

The -postlude option can be used to specify code that runs when the executable is called. (Actually, the official explanation is: “Add EXPRESSIONS after all other toplevel expressions in the compiled file. This option may be given multiple times. Processing of this option takes place after processing of -epilogue.”)

I use (main (cdr (argv))) as the postlude expression, which seems to pass command line arguments the same way as csi -ss passes them to the main function (although there might be a catch that I’m not aware of). The cdr is necessary because the first item of the list returned by (argv) is the name of the calling program (e.g. “./foo”).

(If there’s a better way, please let me know, as my knowledge about the compiler is limited.)

Next up: parsing command line arguments in both Python and Scheme…

:: Comments (2)

Scheme crooks & nannies: the case form

You know R5RS’s case construct? It works much like case/switch statements in other languages, except it has more parentheses: ;-)

#;6> (define language 'chicken)
#;7> (case language
--->   ((c c++ java) "bleh")
--->   ((python perl ruby) "better")
--->   ((chicken scheme) "cool")
--->   (else "never heard of it"))
"cool"

Now, yesterday there was an interesting question in chicken-users. The following code works:

#;8> (define foo 'a)
#;9> (case foo
--->   ('a 3)
--->   (else 4))
3

Does that mean that Chicken’s case accepts qualifiers other than lists? Actually, no. Here’s a hint:

#;10> (define bar 'quote)
#;11> (case bar
---->   ('a 3)
---->   (else 4))
3

As it turns out, ‘a expands to (quote a), so it *is* a list, and therefore “correct”, in the sense that case treats it like a list containing the symbols quote and a. (Which is not what was intended, but it *is* valid.)

Too bad, the following does not work:

#;13> (define cool-names '(guido larry matz felix))
#;14> (case 'bill
---->   (cool-names "cool!")
---->   (else "uncool"))
Error: (map) during expansion of (case ...) -
 argument is not a proper list: cool-names

Inspecting a case construct with macroexpand-1 may give more insight in how case works exactly. I was going to leave this as an exercise to the reader, but thought better of it. :-) Here’s an example (formatting added by me):

#;1> (macroexpand-1 '(case a ((x y z) 5) (else 4)))
(let ((g2 a))
  (if (or (eqv? g2 (quote x))
          (eqv? g2 (quote y))
          (eqv? g2 (quote z)))
      (begin 5)
      (begin 4)))

:: Comments off

Python vs Scheme: function parameters (part III)

(Also see: part I, part II)

Just discovered something neat. In Python, if a function has arguments that have a default value, then those defaults are bound when the function is defined. So the following function, when called with no arguments, always returns the same value:

>>> def f(x=random.randrange(0, 100)): return x
...
>>> f()
32
>>> f()
32
>>> f()
32

…because the default value for x is determined when f is defined, rather than when it’s called.

And this is not allowed at all:

>>> def z(a, b=a): print a, b
...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'a' is not defined

I assumed that these rules would be the same in Chicken, but it turns out that this is not the case. This allows for some cool constructs that simply aren’t possible in Python. Like the example with the random number:

> (define (f #!optional (x (random 100))) x)
> (f)
23
> (f)
89
> (f)
97

It appears that (random 100) is computed whenever f is called, rather than when it’s defined. We can also refer to other arguments in this default expression:

> (define (g a #!optional (b (+ a 10)))
>   (list a b))
> (g 3)
(3 13)
> (g 40)
(40 50)

Good to know. This behavior is intentional rather than accidental, judging from the Extensions to the standard section in the user manual.

:: Comments (1)

Macro trial balloons

I’ve been hacking on and off on s1, my vaporware Awk-like tool that uses Scheme for scripting. The “language” defined by s1 includes a few macros. Strictly speaking, these are redundant, but they do make things shorter, which is important for writing one-liners.

Anyway, I am still kind of new to macros… so I’m wondering if the code I wrote is sound. I’m offering the definitions here for review. (Comments from experienced Schemers are welcome.)

1. inc! is used to easily increase variables. It can take any number of arguments; if none are given, 1 is added by default. For example:

  • (inc! x) adds 1 to x
  • (inc! x 33) adds 33 to x
  • (inc! x 1 2 3 4 5) adds (+ 1 2 3 4 5) to x

As the exclamation mark indicates, inc! changes the variable in-place. Here’s its current definition:

(define-macro (inc! name . args)
  (let ((total (gensym)))
    `(let ((,total
             (if (null? (list ,@args))
                 1
                 (apply + (list ,@args)))))
       (set! ,name (+ ,name ,total)))))

2. Defining multiple variables does not mix well with one-liners. So I added a macro def that allows one to define them quickly and concisely. Like inc!, def takes any number of arguments. If an argument is a list (name value), then a variable is created with the given name and value. If an argument is a symbol, then a variable is created with that name and a value of 0. (Awk is often used to add numbers, so zero seems the most sensible default, IMHO.)

Examples:

  • (def x) — same as (define x 0)
  • (def x y) — same as (define x 0) (define y 0)
  • (def (a 1) (b “hello”)) — same as (define a 1) (define b “hello”)
  • (def q (w 3)) — same as (define q 0) (define w 3)

Here’s the current definition:

(define-macro (def . args)
  (if (null? args)
      #f
      (let* ((a (car args))
             (rest-args (cdr args))
             (name (if (list? a) (car a) a))
             (value (if (list? a) (cadr a) 0)))
        `(begin
           (define ,name ,value)
           (def ,@rest-args)))))

(I’m not sure about the #f; it’s not supposed to return a value anyway.)

In any case, s1‘s auxiliary functions and macros allow for concise code. (Some of it is sloppy, but useful for “scripting”, especially one-liners. Naturally, it’s always possible to write longer scripts using “cleaner” code.)

For example, here’s a one-liner that takes the last words on the given lines and adds them up (assuming they are numbers):

s1 '(B (def s)) (inc! s &$nf) (A (out s))'

(I’m not sure about the & syntax yet; it’s used here as a shortcut for the as-number function, which attempts to convert a string to a number, returning 0 by default.)

B and A are shorthands for BEFORE and AFTER, blocks that are executed before and after the main code (which is executed for each line in the given text). The actual order in which these appear doesn’t matter, but it’s probably more intuitive to do before-main-after.

Print the number of lines, words and characters (like wc):

s1 '(B (def c w)) (inc! w nf) (inc! c (len $0) 1) (A (out nl w c))'

Print names starting with “Ga”:

s1 '(if (~ #/^Ga/) (print $0))' /usr/share/dict/propernames

(I’m using regex literals, and ~ is the same as string-search, except it matches against $0 (the whole line) by default.)

These are just teasers. Actual code is subject to change. I will release this when the “API” is somewhat stable. It’s still mostly a toy, though… :-)

:: Comments (4)

Modules

I noticed there’s a new version of Arc. According to the web page, “The most dramatic change is probably the ability to use x.y and x!y as abbreviations for (x y) and (x ‘y) respectively.”

The x.y syntax reminds me of one of the things I miss in Scheme: a “Pythonic” module system. By “Pythonic” I mean, that you can import modules, and get a module object back, that you can access using x.y syntax. I’m mostly interesting in the namespace issue here, as the convention in Scheme seems to be, to just stick everything in the global namespace.

In other words, I am somewhat uncomfortable that it’s not possible to do this:

;; --- foo.scm ---

(define (bar x)
  (+ x 1))

;; --- REPL ---

> (import foo)
> foo
#<module foo>
> (foo.bar 3)
4
> foo.bar
#<procedure (foo.bar x)>

…or something to that effect.

Maybe this is an irrational “need”, but I like namespaces, and have gotten used to them over the years, and loading the toplevel namespace with lots and lots of definitions just seems a bit “unsafe” to me.

I found a comparison of Python’s and PLT Scheme’s module systems, which explains the issue better:

Once we’ve done ‘(require (lib “math.ss”))’, we have access to the internals of the math library. But there’s one surprise: unlike Python, ‘math‘ itself is not a first-class object. By default, the require form has the same semantics as Python’s “from [module] import *“!

The article then mentions the following technique to add a prefix to the names imported from a module:

> (require (prefix math. (lib "math.ss" "mzlib")))
> math.e
2.718281828459045

I suppose this would not be too bad as an alternative, although you still cannot do things like inspecting a module, pass it around, etc. Chicken does not seem to support it though, and Schemers generally don’t seem to miss the feature. (Or maybe there’s a reason why it would be a bad idea in Scheme.) So maybe I should just learn to live without it. Thoughts welcome…

:: Comments (4)

SRE regular expression syntax

Looking for eggs that handle regular expressions, I found scsh-regexp. It implements most of the regexp API, as described in “Proposed SRE regular-expression notation“.

Now, I’m not particularly interested in the API, as Chicken already supports regular expressions out of the box (see the regex unit). What I find much more interesting is the part of the proposal that *isn’t* implemented. It also defines a syntax for describing regular expressions. This part isn’t available in the egg; according to scsh-regexp‘s documentation, it was only implemented in SCSH.

Many questions arise. Why wasn’t this implemented elsewhere? Would it be useful (i.e. compared to usual regex notation)? Would it be hard to port to Chicken? Would it be hard to write from scratch, going by the documentation only?

Most likely, I’m not up to the task, but that doesn’t stop me from dreaming about things like a sed-like stream editor using this kind of syntax for regular expressions. It would be super verbose, but also more readable… at least, that is the assumption.

:: Comments (3)

Making s1 code shorter

More efforts to make s1 code shorter…

I added/changed the following:

  • slice function (this will come in handy for string manipulation, although (slice s a b) is still longer than s[a:b]
  • nf variable (indicates the number of fields in a line)
  • $ now accepts any expression, not just a number literal (so we can say $nf or $(- x 1) or whatever)
  • -f command line option to quickly set the field separator
  • created an alias s1 for ‘csi -ss /path/to/s1.scm’ (OK, this is not a change to the tool proper, but it helps… nobody wants to type csi -ss with the full path, all the time)

None of this is particularly original (as it’s all borrowed from Awk and Python), but it does help to make code much shorter. I can now write a one-liner like this:

find "${1-.}" -type d | s1 -f/ '(print "   |" (make-string nf #\-) $nf)'

…which produces almost the same tree as described here. Almost, because it uses one “-” for each directory rather than two. We can fix that by making the example somewhat longer (and less readable to non-Schemers, although admittedly the Awk example isn’t very readable to non-Awkers either):

find "${1-.}" -type d | s1 -f/ '(print "   |" (make-string (* (- nf 1) 2) #\-) $nf)'

Next up: regular expressions…

:: Comments off

« Previous Page« Previous entries « Previous Page · Next Page » Next entries »Next Page »