Defining custom literals in Chicken Scheme

In a previous post, I briefly pondered what a Scheme-based Awk-like tool would look like. Awk has a concise syntax to access fields; e.g. $1 is the first field in a line, etc. A similar tool written in Scheme would benefit from having such syntax as well.

So, I wondered how much work it would be to add it to Chicken. Let's say, something that maps $1 to (field 1) (assuming a function called field exists, of course). As it turns out, it's not much work at all, not even for someone who has never hacked the Scheme reader (that would be me :-).

First of all, we need to look at the set-read-syntax function, which is helpfully built into Chicken. It is used like this:

(set-read-syntax! <character>
  (lambda (port)
    ...read characters...
    ...return custom value...

...where <character> is the first character of the new literal. The lambda that follows can then do custom reading from port, resulting in data that can be manipulated at will. In this case, I want the literal to start with $, then read digits, and stop as soon as a non-digit is encountered. So my code should look something like this:

(set-read-syntax! #\$
  (lambda (port)
    (let* ((s (read-number port))
           (i (string->number s)))
      (field i))))

Except that Chicken doesn't have a read-number function. Fortunately, it's not hard to write one. Here's a version using read-token. (read-token reads a character at a time and tests it against a predicate, collecting characters that match the predicate, stopping as soon as one doesn't match, and returning the collected characters as a string.)

(define (number-char? c)
  (member c '(#\0 #\1 #\2 #\3 #\4 #\5 #\6 #\7 #\8 #\9)))

(define (read-number port)
  (read-token number-char? port))

Now let's test it with a dummy implementation of field.

(define *fields* '())

(define (field n)
  (cond
    ((< n 1) "")
    ((> n (length *fields*)) "")
    (else (list-ref *fields* (- n 1)))))

(set! *fields* (string-split "the quick brown fox jumps over the lazy dog"))

(printf "~a ~a ~a~n" $1 $9 $5)

...prints "the dog jumps". :-)

(And yeah, my code isn't perfect, but it's just for demonstration purposes.)

Much like Ruby's monkeypatching, defining custom literals is probably not something that should be used in libraries a lot, but it looks like it could be very useful in DSLs or tools like the one mentioned.

:: Comments (1)

Simple unit testing with Chicken Scheme

I'm currently translating a small toy project of mine to Chicken Scheme. Since the original (written in Python) has unit tests, it made sense to look for similar functionality for Chicken. As it turns out, there are several testing frameworks available. Since right now I'm still exploring, I settled for a relatively simple one, the test egg (by Alex Shinn).

Unlike Python's unittest, the test library is not object-oriented at all. Rather, it defines a small number of useful functions and macros, which can be used at will (e.g. nested, inside lets and loops, etc).

The most important interface is (test <expected-value> <expression>), which, naturally, checks if expression evaluates to expected-value. For example, (test 4 (+ 2 2)) would be a valid use of test. (Note that switching the arguments does *not* work; (test (+ 2 2) 4) is illegal.)

Here's some actual code:

(use test)
(load "tools")

(test-group "make-address"
  (test #xC000 (make-address #xC0 #x00))
  (test #x1234 (make-address #x12 #x34)))

(test-group "high"
  (test #x08 (high #x0801))
  (test #xC0 (high #xC000))
  (test #x12 (high #x1234)))

(test-group "low"
  (test #x01 (low #x0801))
  (test #x00 (low #xC000))
  (test #x34 (low #x1234)))

(test-group "signed->unsigned"
  (define s->u signed->unsigned)
  (test 0 (s->u 0))
  (test 20 (s->u 20))
  (test #xFF (s->u -1))
  (test #xF7 (s->u -9))
  (test #x80 (s->u -128))
  (test-error (s->u 200)))

(test-group "unsigned->signed"
  (define u->s unsigned->signed)
  (test 0 (u->s 0))
  (test 20 (u->s 20))
  (test -128 (u->s #x80))
  (test -1 (u->s #xFF))
  (test -9 (u->s #xF7))
  (test-error (u->s -1)))

The test-group form is a simple but effective way to group tests together; they show up in separate sections in test reports. It has its own lexical scope, so anything defined inside it is not visible outside of it.

When an expression is expected to produce an error, use test-error to catch this and test for it. To simply assert that an expression is non-false, use test-assert.

Of course, tests can be used inside loops, which makes it really easy to test each item in a list. Each of these tests will show up as a separate entry in the resulting test report. (Something which isn't so easy to accomplish with Python's unittest.) Some more actual code (which tests all items in a list, and defines a meaningful name for each test):

(test-group "*opcodes*"
  (define (test-opcode opcode)
    (define test-name (sprintf "test opcode: ~s" opcode))
    (test-assert test-name (member (opcode-type opcode) *opcode-types*)))
  (for-each test-opcode *opcodes*))

When running these scripts, test reports are produced that look like this:

-- testing make-address ------------------------------------------------------
(make-address 192 0) ................................................. [ PASS]
(make-address 18 52) ................................................. [ PASS]
2 tests completed in 0.002 seconds.
2 out of 2 (100%) tests passed.
-- done testing make-address -------------------------------------------------

-- testing high --------------------------------------------------------------
(high 2049) .......................................................... [ PASS]
(high 49152) ......................................................... [ PASS]
(high 4660) .......................................................... [ PASS]
3 tests completed in 0 seconds.
3 out of 3 (100%) tests passed.
-- done testing high ---------------------------------------------------------

...etc.

The test egg works for my current purposes; writing tests is easy, and so is grouping them, or running several files of tests. (Just create a file test-all.scm that loads all the other test files.) However, in the future I might want to look for a testing framework that is a little more configurable. For example, as far as I can tell, it's impossible to make small changes to the test report output, without rewriting the whole function that handles it (and registering it as the current-test-group-reporter). But I'll stick with it for now -- it's a great way to do real testing (with surprising flexibility) with little work.

Update (2008-01-24): I found it somewhat annoying that, if you have a lot of tests in separate groups, you have to scroll up in the terminal window to see if any failed, because totals are only displayed for each group, but not for all groups as a whole. However, this is easily fixed with a small shell script:

csi -s test-all.scm | grep FAIL

shows quickly if there were any failures. It's not perfect (e.g. it doesn't show which groups the failed tests were in), but it helps.

:: Comments (3)

Scripting with Chicken

Today a cool script was posted to the chicken-users mailing list. Not only does it illustrate how to make an egg, it also demonstrates how to use Chicken Scheme as a scripting language.

The eggify function is somewhat long, but still relatively easy to read; the second let* part is long, but not complicated, as it can be read line by line. It almost reads like line-for-line statements, e.g.

# pseudo-translation to Python
metafile = metafiles[0]
metadata = handle_exceptions(...)
egg(metadata['egg'][0])
...etc...

And the actual body of it is just three "statements" (yeah, yeah, I know Scheme doesn't really have statements), dealing with the file system and issuing commands (like tar).

This is of course nothing special for languages that are often regarded as "scripting languages" -- like Perl, Python and Ruby -- but it's unusual to see it in Scheme. I'm delighted that it's possible. This is one of the reasons I like Chicken -- it tends to be more "real world" and "pragmatic" than some other implementations.

:: Comments

Cool listcomp, revisited

About two weeks ago I looked at a list comprehension in Haskell and Python. Since then I have also looked at eager comprehensions in Scheme (using SRFI-42). So, I just thought I'd be interesting to write that same listcomp in Scheme. Here it is...

(use syntax-case)
(use srfi-42)

(define (f n)
  (list-ec (: x 1 (+ n 1))
           (: y (+ x 1) (+ n 1))
           (: z (+ y 1) (+ n 1))
           (if (= (+ (* x x) (* y y))
                  (* z z)))
           (list x y z)))

(print (f 20))

From a readability point of view, this code might benefit from something like

  (let ((m (+ n 1)))
      (list-ec (: x 1 m)
               (: y (+ x 1) m)
               (: z (+ y 1) m)
               ...

as the ranges used by eager comprehensions don't include the upper bound, much like Python's range function. (I.e. (: x 10) includes the numbers 0 through 9, but not 10.)

:: Comments

Python vs Scheme: Simple list comprehensions

Python has a nifty feature called list comprehensions. Originally borrowed from Haskell, this is a (mostly) compact way to generate lists, without having to resort to the map and filter functions (which are often considered less clear and tend to rely on Python's underpowered lambda).

Scheme, being a functional language, has an entirely different attitude toward map, filter and friends. Their use is encouraged, rather than being downplayed. In spite of that, there are situations where list comprehensions (or some form thereof) would be useful. So, SRFI-42 introduces "eager comprehensions".

Rather than providing one construct, SRFI-42 has a number of forms, some of which seem to exist for efficiency reasons. For now, I'm just going to look at list-ec, which (in its most basic form) is a close cousin of Python's list comprehensions.

Let's run a simple example in Chicken:

> (use syntax-case)
; loading /usr/local/lib/chicken/3/syntax-case.so ...
; loading /usr/local/lib/chicken/3/syntax-case-chicken-macros.scm ...
> (use srfi-42)
; loading /usr/local/lib/chicken/3/srfi-42.scm ...
; loading /usr/local/lib/chicken/3/srfi-42-support.so ...

> (list-ec (: i 5) (* i i))
(0 1 4 9 16)

(Note: srfi-42 is an egg that needs to be installed first. It depends on another egg, syntax-case. Make sure these are installed, and imported in the correct order, or the examples won't work!)

In Python, this would look like:

>>> [i*i for i in range(5)]
[0, 1, 4, 9, 16]

In this case, we could easily write this with a map... if Scheme had something similar to Python's range function, which it doesn't. In fact, the lack of such a construct seems to have been a major motivation for writing SRFI-42. The author states:

"The origin of this SRFI is my frustration that there is no simple [form] for the list of integers from 0 to n-1. With this SRFI it is (list-ec (: i n) i)."

Moreover, much like in Python, eager comprehensions are capable of more powerful expressions, that are not so easily written using map. Like this one, which isn't terribly complex:

> (list-ec (: n 1 4) (: i n) (list n i))
((1 0) (2 0) (2 1) (3 0) (3 1) (3 2))

# Python:
>>> [(n,i) for n in range(1, 4) for i in range(n)]
[(1, 0), (2, 0), (2, 1), (3, 0), (3, 1), (3, 2)]

Much, much more is possible using SRFI-42, most of which I don't understand yet :-), so for now I'm sticking to these simple examples. There will probably be a part II to this post, at some point. However, as a teaser, here's a way to write Python's enumerate (that works on strings and lists and probably some other types):

> (define (enumerate seq)
>   (list-ec (: x (index i) seq) (list i x)))

> (enumerate "hello")
((0 #\h) (1 #\e) (2 #\l) (3 #\l) (4 #\o))
> (enumerate '(guido larry matz))
((0 guido) (1 larry) (2 matz))

:: Comments (1)

Eggs, eggs, eggs

Eggs are official Chicken extensions. As you can see in the Eggs Unlimited repository, there are lots of them, for all kinds of purposes: extensions to the language, object systems, testing frameworks, graphics, web, XML, and much more. (As such, the repository is not unlike Perl's CPAN, or RubyGems, or Python's Cheese Shop.)

Eggs are Chicken-specific; while some other Schemes have their own packages, they are not compatible with eggs.

These extensions are (usually) easily installed, using the chicken-setup script. You can find the full documentation here, but the short story is as follows.

Look up the egg you want in Eggs Unlimited. Let's say you don't like the fact that Chicken doesn't support the full numerical tower out of the box. So you want to install the numbers egg. Doing so is as easy as

$ chicken-setup numbers

or, if you need admin privileges, maybe

$ sudo chicken-setup numbers

(which is what I use on my Mac).

This will look for the numbers egg, download it, build it, and install it. Note that, like most package managers, chicken-setup will figure out any dependent eggs, and (if necessary) download and install them as well.

Some eggs require external libraries; for example, numbers wants to be compiled with -lgmp. Such libraries need to be installed first (e.g. via MacPorts on OS X).

In order to see which eggs have been installed, use the -l switch:

$ chicken-setup -l

I find the output a bit hard to read, because of the distance between the package name and the version. This one-liner might help:

$ chicken-setup -l | awk '{ print $1, $3 }'

This produces output that looks like this:

args 1.2
chicken-wrap 1.91
easyffi 1.91
environments 1.5
format 3.1.3
...etc...

To see detailed information about a specific egg, use the -l switch with a name, like:

$ chicken-setup -l numbers

...which will give you information about the files in the egg, the names that it exports, the release date, the version, dependencies, etc.

chicken-setup supports many more options (use the --help switch to see them), but these are among the most useful.

By the way, there's a caveat that might not be a problem for most users unless they're running on Windows without Cygwin:

"The mechanism for loading compiled extensions is based on dynamically loadable code and as such is only available on systems on which loading compiled code at runtime is supported. Currently these are most UNIX-compatible platforms that provide the libdl functionality like Linux, Solaris, BSD, Mac OS X and Windows using Cygwin."

:: Comments

Python vs Scheme: function parameters (part II)

In part I, I looked at the handling of function arguments as specified in R5RS. Chicken adds three more ways to handle special arguments, using the keywords #!optional, #!rest and #!key. (Here's the relevant page in the Chicken documentation.)

The way I understand it, #!rest is just another way to collect optional arguments in a list. For our purposes, the following code is equivalent to (f a b . args):

> (define (r a b #!rest args)
     (list a b args))
> (r 1 2)
(1 2 ())
> (r 1 2 3)
(1 2 (3))

Optional arguments can also be specified separately (rather than collecting them all in a list). For this, use the #!optional keyword, followed by a list of names. Arguments passed to the function are associated with these names based on position. If not specified, the name defaults to #f.

> (define (o a #!optional b c)
     (list a b c))
> (o 1)
(1 #f #f)
> (o 2 'fred)
(2 fred #f)

;; roughly equivalent to Python:
;; def o(a, b=None, c=None): ...

We can also specify defaults:

> (define (p #!optional (a 42) (b 'cookie))
     (list a b))
> (p)
(42 cookie)
> (p 103)
(103 cookie)

;; roughly equivalent to Python:
;; def p(a=42, b="cookie"): ...

Again, values are associated with names based on position.

It would be useful if we could specify a parameter's name. What if, in the above example, we want to override b but not a? In that case, we need to use #!key. This works much like #!optional, except that we can specify names (using #:name syntax). In fact, we *must* specify names, because positional arguments are ignored (unless we also specify #!optional or #!rest).

> (define (k #!key (a 42) (b 'cookie))
        (list a b))
> (k)
(42 cookie)
> (k 1)
(42 cookie)  ;; neither a nor b is 1
> (k #:b 'possum)
(42 possum)
> (k #:b 'possum #:a 99) ; order of args doesn't matter
(99 possum)
> (k b: 'possum) ; this is also allowed
(42 possum)

So, in order to set a value for a, we must use #a: value in the function call. value: is also allowed.

Note if a function specifies both #!key and #!rest, keyword arguments passed to the function will show up in the rest argument as well, no matter whether there's a matching name in #!key or not.

> (define (z #!rest rest #!key (a 42) (b 'cookie))
     (list a b rest))
> (z)
(42 cookie ())
> (z 1 2)
(42 cookie (1 2))
> (z #:b 'soup 52)
(42 soup (b: soup 52))
> (z #:a 10 #:q 129)
(10 cookie (a: 10 q: 129))

In fact, using just #!rest, we can emulate Python's **kwargs construct (sort of):

> (define (y #!rest r) r)
> (y #:a 3)
(a: 3)
> (y 1 2 3 #:foo 'bar)
(1 2 3 foo: bar)

("Parsing" this list of rest args requires a bit of special code to collect the pairs; at this point, I'm not sure if Chicken provides such a function out of the box.)

By the way, we can pass the resulting list to apply without problems:

> (apply y '(1 2 3))
(1 2 3)
> (apply y '(1 2 3 foo: bar))
(1 2 3 foo: bar)
> (apply y '(1 2 3 #:foo bar))
(1 2 3 foo: bar)

What I like about the Chicken construct is that it doesn't mix up positional and keyword arguments (unlike Python, although there's a PEP to fix this in Python 3000).

Anyway, while all this is powerful, it's probably generally a good idea not to make argument lists too complicated. (The same is true in Python, by the way.)

:: Comments

Compiling Chicken code (part II)

One of the cool features of Chicken is that you may choose to compile parts of your Scheme program to C, in order to get better performance.

Let's write a (naive) implementation of the factorial function. Store it in fac.scm:

;; fac.scm

(define (fac n)
(if (<= n 1)
1
(* n (fac (- n 1)))))

Now, write a second file main.scm that uses our fac function. Normally this would be as simple as

(load "fac")

(print (fac 30))

But I'd like to include some simple benchmark code, to see how well fac performs, interpreted vs compiled. I'm using the following code to call (fac 30) 100,000 times, and time the results.

;; main.scm

(load "fac")

(define *max-iterations* 100000)

(define t1 (current-milliseconds))
(let loop ((i 0))
(unless (= i *max-iterations*)
(fac 30)
(loop (+ i 1))))
(define t2 (current-milliseconds))

(define seconds (/ (- t2 t1) 1000))
(printf "~a iterations in ~a seconds.~n" *max-iterations* seconds)

Interpreted, it takes a little more than two seconds on my machine:

$ csi -s main.scm
100000 iterations in 2.191 seconds.

We can do better than that. Let's compile fac.scm and see if it makes a difference.

$ csc -s fac.scm  # produces fac.so
$ csi -s main.scm
100000 iterations in 0.326 seconds.

That's right -- we still run main.scm interpreted, but it now uses the *compiled* fac.scm, which makes execution much faster. ^_^

We could do a lot more with this, e.g. compile main.scm to an executable, etc, but I just wanted to demonstrate how easy it is to precompile libraries, with no change in code. (At least, not for simple cases like this... but that's a different story.)

:: Comments (2)

Compiling Chicken code (part I)

(This post is just a quick introduction to using the Chicken compiler and interpreter.)

Consider this simple file:

;; hello.scm

(print "Hello, world!")

Save this as hello.scm.

Running this as a script in the interpreter is done as follows:

$ csi -s hello.scm

Note that if the -s is omitted, the script will still run, after which the interactive interpreter is started. This can be useful in some situations (e.g. to do post-execution inspection).

Alternatively, we may choose to compile the script to an executable:

$ csc hello.scm

(csc has a lot of options, but the simplest case does what most people would expect, i.e. it compiles an the executable named hello.)

Lo and behold, it works:

$ ./hello
Hello, world!

On my system (using Chicken 2.732, Mac OS X 10.4, Intel Dual Core) this produces a 14K executable, which seems quite reasonable. However, the way I understand it, this file dynamically links to the Chicken libraries. To create a standalone executable, we must use the -static command line option, which produces a much bigger file:

$ csc -static hello.scm
$ ls -l hello
-rwxr-xr-x   1 zephyrfa  zephyrfa  1370732 Dec 30 13:54 hello*

By the way, I'm not entirely sure that this really produces an executable that is 100% standalone... I'll have to do more testing.

There is much, much more to the Chicken compiler. Expect to see more about it in future posts. (Like the next one... :-)

:: Comments

A resolution for 2008

You know how they say that in order to improve yourself as a programmer, it's a good idea to learn a new language each year? Well, I feel that I haven't been learning much for the last few years. In fact, I've been stagnant for quite a while.

So, maybe I should make a change for 2008, and devote it to learning. The language I chose for this is Scheme. Chicken Scheme, to be precise. (I chose a Lisp language because it's a step up from Python and likely to challenge me in many ways; Scheme because it's clean; and Chicken because it seems to be more of a pragmatic/practical Scheme variant, rather than being focused on theory.)

I tinkered with Chicken before, but I didn't really get very far, possibly because I chose a project that felt like a dead end. But there is much, much to learn. I want to learn about Scheme macros, and continuations, and SRFIs, and object models, and eggs, and interfacing with C code, and many other things. I want to study SICP and EOPL and HTDP. And most importantly, I want to use Scheme much like I use Python now, in a way that produces readable, understandable and maintainable code.

In this blog, I'm hoping to record my progress, talk about things I learned (or that stump me), compare Scheme to other languages (most importantly Python), etc. Hopefully, while I'm at it, I'll also produce some code. :-)

(That said, this blog won't be *exclusively* about my adventures with Scheme...)

:: Comments

Next entries »