Cutting

I did some long overdue Chicken hackery yesterday, and by accident I found out that Chicken's (or rather, SRFI-26's) cut/cute macros are not the same as Arc's [ ] syntax after all.

Scheme/Arc aficionados already knew this, of course, but to me it was news since I've never really used cut much. Here's a short explanation.

Scheme's cut produces an anonymous function that takes as many arguments as there are <> symbols at the "top level". E.g.

> (cut + 1 <>)
#<procedure (? g3)>
; has one argument

> (cut + 1 <> <>)
#<procedure (? g4 g5)>
; has two arguments

; etc...

However, any <> that is found in a nested expression, is not considered as a parameter. Therefore:

;this works...
> (map (cut + 1 <>) '(1 2 3))
(2 3 4)

; but this does not:
> (map (cut + 1 (* 2 <>)) '(1 2 3))
Error: bad argument count - received 1 but expected 0: #<procedure (?)>

By contrast, Arc's [ ] syntax produces an anonymous function that expects one and only one argument, which is represented by a single underscore. As a result, it's simultaneously more and less limited than cut:

arc> (map [+ 1 _] '(1 2 3))
(2 3 4)
arc> (map [+ 1 (* 2 _)] '(1 2 3))
(3 5 7)
; no problem

; but two parameters doesn't work:
arc> (map [cons _ _] '(1 2 3) '(4 5 6))
Error: "#<procedure>: expects 1 argument, given 2: 1 4"

Of course, for anonymous functions that are more complex than this, it's probably better to just use lambda... :-/ Or a named function.

:: Comments (2)

Interlude: Dude, where's my Chicken?

This started out as a Chicken blog, where I would report my adventures with Chicken Scheme. But these days it's more like a Python blog again. (Which isn't too surprising, given my experience with the language, and the fact that I use it for work and most of my personal projects.)

Heck, it has even been added to Planet Python again, which wouldn't have been the case if I wrote about Scheme all the time.

But really, I want to learn more about Scheme as well, *and* be able to use it in a pragmatic way. I just need a bit of focus. Right now I'm wondering what to look at next. Maybe object systems?

:: Comments

Python vs Scheme: using files as both modules and programs

Python has a useful idiom, that allows one to use the same file as both a module and a program. Consider this simple example:

# foo.py

def bar(x):
    print "bar says:", x

if __name__ == "__main__":

    bar(42)

The if __name__ == "__main__" clause is only executed if foo.py is run as the "main program". In other words, we can do both

$ python foo.py
bar says: 42

and

$ python
Python 2.5.1 (r251:54869, Apr 18 2007, 22:08:04)
[GCC 4.0.1 (Apple Computer, Inc. build 5367)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import foo
>>> foo.bar(33)
bar says: 33

Although admittedly a bit of a hack, this construct is well-known and often used.

Now, on to Chicken Scheme. Can we do the same? It required a bit of poking around in documentation and mailing list, but it appears the answer is yes.

;; foo.scm

(define (foo x)
  (print "foo says: " x))

(define (main args)
  (print "args: " args)
  (foo 42))

Now, we can run this as a script with csi -ss (which looks for a function called main and automagically calls it):

$ csi -ss foo.scm
args: ()
foo says: 42

Notice that the main function has an argument args, which contains the command line arguments passed to the program:

$ csi -ss foo.scm 1 2 3
args: (1 2 3)
foo says: 42

We can also import foo.scm from within an interactive session, in which cases main is not called:

$ csi

CHICKEN
Version 3.0.0 - macosx-unix-gnu-x86     [ manyargs dload ptables applyhook ]
(c)2000-2008 Felix L. Winkelmann        compiled 2008-03-05 on niflheim.local (Darwin)

; loading /Users/zephyrfalcon/.csirc ...
; loading /usr/local/lib/chicken/3/readline.so ...
#;1> (use foo)
; loading ./foo.scm ...
#;2> (foo 101)
foo says: 101

But, but! Isn't Chicken primarily a compiler? Does the above work too when using csc rather than csi? Actually it does, but the invocation is different. I use the following:

$ csc foo.scm -postlude "(main (cdr (argv)))"

$ ./foo
args: ()
foo says: 42

$ ./foo 1 2 3
args: (1 2 3)
foo says: 42

The -postlude option can be used to specify code that runs when the executable is called. (Actually, the official explanation is: "Add EXPRESSIONS after all other toplevel expressions in the compiled file. This option may be given multiple times. Processing of this option takes place after processing of -epilogue.")

I use (main (cdr (argv))) as the postlude expression, which seems to pass command line arguments the same way as csi -ss passes them to the main function (although there might be a catch that I'm not aware of). The cdr is necessary because the first item of the list returned by (argv) is the name of the calling program (e.g. "./foo").

(If there's a better way, please let me know, as my knowledge about the compiler is limited.)

Next up: parsing command line arguments in both Python and Scheme...

:: Comments (2)

Scheme crooks & nannies: the case form

You know R5RS's case construct? It works much like case/switch statements in other languages, except it has more parentheses: ;-)

#;6> (define language 'chicken)
#;7> (case language
--->   ((c c++ java) "bleh")
--->   ((python perl ruby) "better")
--->   ((chicken scheme) "cool")
--->   (else "never heard of it"))
"cool"

Now, yesterday there was an interesting question in chicken-users. The following code works:

#;8> (define foo 'a)
#;9> (case foo
--->   ('a 3)
--->   (else 4))
3

Does that mean that Chicken's case accepts qualifiers other than lists? Actually, no. Here's a hint:

#;10> (define bar 'quote)
#;11> (case bar
---->   ('a 3)
---->   (else 4))
3

As it turns out, 'a expands to (quote a), so it *is* a list, and therefore "correct", in the sense that case treats it like a list containing the symbols quote and a. (Which is not what was intended, but it *is* valid.)

Too bad, the following does not work:

#;13> (define cool-names '(guido larry matz felix))
#;14> (case 'bill
---->   (cool-names "cool!")
---->   (else "uncool"))
Error: (map) during expansion of (case ...) -
 argument is not a proper list: cool-names

Inspecting a case construct with macroexpand-1 may give more insight in how case works exactly. I was going to leave this as an exercise to the reader, but thought better of it. :-) Here's an example (formatting added by me):

#;1> (macroexpand-1 '(case a ((x y z) 5) (else 4)))
(let ((g2 a))
  (if (or (eqv? g2 (quote x))
          (eqv? g2 (quote y))
          (eqv? g2 (quote z)))
      (begin 5)
      (begin 4)))

:: Comments

Python vs Scheme: function parameters (part III)

(Also see: part I, part II)

Just discovered something neat. In Python, if a function has arguments that have a default value, then those defaults are bound when the function is defined. So the following function, when called with no arguments, always returns the same value:

>>> def f(x=random.randrange(0, 100)): return x
...
>>> f()
32
>>> f()
32
>>> f()
32

...because the default value for x is determined when f is defined, rather than when it's called.

And this is not allowed at all:

>>> def z(a, b=a): print a, b
...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'a' is not defined

I assumed that these rules would be the same in Chicken, but it turns out that this is not the case. This allows for some cool constructs that simply aren't possible in Python. Like the example with the random number:

> (define (f #!optional (x (random 100))) x)
> (f)
23
> (f)
89
> (f)
97

It appears that (random 100) is computed whenever f is called, rather than when it's defined. We can also refer to other arguments in this default expression:

> (define (g a #!optional (b (+ a 10)))
>   (list a b))
> (g 3)
(3 13)
> (g 40)
(40 50)

Good to know. This behavior is intentional rather than accidental, judging from the Extensions to the standard section in the user manual.

:: Comments (1)

Macro trial balloons

I've been hacking on and off on s1, my vaporware Awk-like tool that uses Scheme for scripting. The "language" defined by s1 includes a few macros. Strictly speaking, these are redundant, but they do make things shorter, which is important for writing one-liners.

Anyway, I am still kind of new to macros... so I'm wondering if the code I wrote is sound. I'm offering the definitions here for review. (Comments from experienced Schemers are welcome.)

1. inc! is used to easily increase variables. It can take any number of arguments; if none are given, 1 is added by default. For example:

  • (inc! x) adds 1 to x
  • (inc! x 33) adds 33 to x
  • (inc! x 1 2 3 4 5) adds (+ 1 2 3 4 5) to x

As the exclamation mark indicates, inc! changes the variable in-place. Here's its current definition:

(define-macro (inc! name . args)
  (let ((total (gensym)))
    `(let ((,total
             (if (null? (list ,@args))
                 1
                 (apply + (list ,@args)))))
       (set! ,name (+ ,name ,total)))))

2. Defining multiple variables does not mix well with one-liners. So I added a macro def that allows one to define them quickly and concisely. Like inc!, def takes any number of arguments. If an argument is a list (name value), then a variable is created with the given name and value. If an argument is a symbol, then a variable is created with that name and a value of 0. (Awk is often used to add numbers, so zero seems the most sensible default, IMHO.)

Examples:

  • (def x) -- same as (define x 0)
  • (def x y) -- same as (define x 0) (define y 0)
  • (def (a 1) (b "hello")) -- same as (define a 1) (define b "hello")
  • (def q (w 3)) -- same as (define q 0) (define w 3)

Here's the current definition:

(define-macro (def . args)
  (if (null? args)
      #f
      (let* ((a (car args))
             (rest-args (cdr args))
             (name (if (list? a) (car a) a))
             (value (if (list? a) (cadr a) 0)))
        `(begin
           (define ,name ,value)
           (def ,@rest-args)))))

(I'm not sure about the #f; it's not supposed to return a value anyway.)

In any case, s1's auxiliary functions and macros allow for concise code. (Some of it is sloppy, but useful for "scripting", especially one-liners. Naturally, it's always possible to write longer scripts using "cleaner" code.)

For example, here's a one-liner that takes the last words on the given lines and adds them up (assuming they are numbers):

s1 '(B (def s)) (inc! s &$nf) (A (out s))'

(I'm not sure about the & syntax yet; it's used here as a shortcut for the as-number function, which attempts to convert a string to a number, returning 0 by default.)

B and A are shorthands for BEFORE and AFTER, blocks that are executed before and after the main code (which is executed for each line in the given text). The actual order in which these appear doesn't matter, but it's probably more intuitive to do before-main-after.

Print the number of lines, words and characters (like wc):

s1 '(B (def c w)) (inc! w nf) (inc! c (len $0) 1) (A (out nl w c))'

Print names starting with "Ga":

s1 '(if (~ #/^Ga/) (print $0))' /usr/share/dict/propernames

(I'm using regex literals, and ~ is the same as string-search, except it matches against $0 (the whole line) by default.)

These are just teasers. Actual code is subject to change. I will release this when the "API" is somewhat stable. It's still mostly a toy, though... :-)

:: Comments (4)

Modules

I noticed there's a new version of Arc. According to the web page, "The most dramatic change is probably the ability to use x.y and x!y as abbreviations for (x y) and (x 'y) respectively."

The x.y syntax reminds me of one of the things I miss in Scheme: a "Pythonic" module system. By "Pythonic" I mean, that you can import modules, and get a module object back, that you can access using x.y syntax. I'm mostly interesting in the namespace issue here, as the convention in Scheme seems to be, to just stick everything in the global namespace.

In other words, I am somewhat uncomfortable that it's not possible to do this:

;; --- foo.scm ---

(define (bar x)
  (+ x 1))

;; --- REPL ---

> (import foo)
> foo
#<module foo>
> (foo.bar 3)
4
> foo.bar
#<procedure (foo.bar x)>

...or something to that effect.

Maybe this is an irrational "need", but I like namespaces, and have gotten used to them over the years, and loading the toplevel namespace with lots and lots of definitions just seems a bit "unsafe" to me.

I found a comparison of Python's and PLT Scheme's module systems, which explains the issue better:

Once we've done '(require (lib "math.ss"))', we have access to the internals of the math library. But there's one surprise: unlike Python, 'math' itself is not a first-class object. By default, the require form has the same semantics as Python's "from [module] import *"!

The article then mentions the following technique to add a prefix to the names imported from a module:

> (require (prefix math. (lib "math.ss" "mzlib")))
> math.e
2.718281828459045

I suppose this would not be too bad as an alternative, although you still cannot do things like inspecting a module, pass it around, etc. Chicken does not seem to support it though, and Schemers generally don't seem to miss the feature. (Or maybe there's a reason why it would be a bad idea in Scheme.) So maybe I should just learn to live without it. Thoughts welcome...

:: Comments (4)

SRE regular expression syntax

Looking for eggs that handle regular expressions, I found scsh-regexp. It implements most of the regexp API, as described in "Proposed SRE regular-expression notation".

Now, I'm not particularly interested in the API, as Chicken already supports regular expressions out of the box (see the regex unit). What I find much more interesting is the part of the proposal that *isn't* implemented. It also defines a syntax for describing regular expressions. This part isn't available in the egg; according to scsh-regexp's documentation, it was only implemented in SCSH.

Many questions arise. Why wasn't this implemented elsewhere? Would it be useful (i.e. compared to usual regex notation)? Would it be hard to port to Chicken? Would it be hard to write from scratch, going by the documentation only?

Most likely, I'm not up to the task, but that doesn't stop me from dreaming about things like a sed-like stream editor using this kind of syntax for regular expressions. It would be super verbose, but also more readable... at least, that is the assumption.

:: Comments (3)

Making s1 code shorter

More efforts to make s1 code shorter...

I added/changed the following:

  • slice function (this will come in handy for string manipulation, although (slice s a b) is still longer than s[a:b]
  • nf variable (indicates the number of fields in a line)
  • $ now accepts any expression, not just a number literal (so we can say $nf or $(- x 1) or whatever)
  • -f command line option to quickly set the field separator
  • created an alias s1 for 'csi -ss /path/to/s1.scm' (OK, this is not a change to the tool proper, but it helps... nobody wants to type csi -ss with the full path, all the time)

None of this is particularly original (as it's all borrowed from Awk and Python), but it does help to make code much shorter. I can now write a one-liner like this:

find "${1-.}" -type d | s1 -f/ '(print "   |" (make-string nf #\-) $nf)'

...which produces almost the same tree as described here. Almost, because it uses one "-" for each directory rather than two. We can fix that by making the example somewhat longer (and less readable to non-Schemers, although admittedly the Awk example isn't very readable to non-Awkers either):

find "${1-.}" -type d | s1 -f/ '(print "   |" (make-string (* (- nf 1) 2) #\-) $nf)'

Next up: regular expressions...

:: Comments

I can has scripting?

Brainstorming a bit here...

So I saw this cool one-liner script. It displays a directory tree:

ls -R | grep ":$" | sed -e 's/:$//' -e 's/[^-][^\/]*\//--/g' -e 's/^/   /' -e 's/-/|/'

I have been hacking (on and off) on an Awk-like tool that uses Chicken for scripting. Its working title is s1, and so far it supports begin/main/end blocks like Awk, and $N-notation for quick access to fields.

The real problem is not, adding Awk-like functionality as Scheme functions, but rather, making the Scheme concise enough that it can be used for one-liners. For example, the following works, but is still too verbose:

ls -l | csi -ss s1.scm '(before (set! total 0)) (set! total (+ total (as-number $5))) (after (print total))'

(It shows the total size of files in the current directory, in case it wasn't obvious. :-)

It gets even worse when we try to emulate the ls/grep/sed script mentioned above. Writing this as a one-line just isn't feasible at this point; rather, I have to use a script:

;; tree.s1
;; Call with: ls -aR | csi -ss s1.scm -s tree.s1

(define (branch parts)
  (let* ((parts (reverse parts))
         (dirname (string-trim-right (first parts) #\:))
         (line (apply conc (map (lambda (s) "--") (cdr parts)))))
    (conc "|" line dirname)))

(when (string-suffix? ":" $0)
  (let* ((parts (string-split $0 "/")))
    (print "   " (branch parts))))

Much of the work goes into processing strings and lists, but even without that the code probably still would not fit on one line.

Granted, it's hard to beat the conciseness of regular expressions (as used by grep and sed), but if I want s1 to be useful, it needs to support (much) shorter code. (Maybe I should rewrite it in Arc? ;-)

Here are some thoughts about making this shorter:

1. I dearly miss slicing. In Python, I would have used s[:-1] to chop off the trailing colon. Adding a slice function to the toolbox would probably be useful.

2. I could leverage some of the Awk-like features. This Reddit comment does just that:

find "${1-.}" -type d | awk -F/ '{printf "  |%*s%s\n",(NF-1)*2,"",$NF}'

It takes advantage of the following features:

  • using "/" as a field separator (effectively splitting the path with no extra code)
  • using NF (indicating the number of fields in a line) to compute the indentation level
  • using $NF to display the last field (i.e. the directory name we want to see)

As it happens, s1 can do all these things as well, but it won't be quite as concise. This is something to work on. Maybe there could be a special variable (or symbol) that translates to (length *fields*). I would have to hack the current $-literal to accept any expression, not just a number literal.

The tricky part is to make things concise while still keeping the spirit of Scheme. :-/ s1 will never automagically coerce numbers to strings (or vice versa), for example. I can, however, make it easy to convert them explicitly. (This is also the Pythoneer in me speaking... explicit is better than implicit, and all that good stuff.)

Anyway, there was really no point to this post, except letting the world know that I am tinkering with a new toy. :-) If it ever actually becomes useful, I'll release it... Until then, I might talk more about it here.

:: Comments (3)

« Previous entries