A foolish consistency, etc

People new (and not so new) to Python are often confused by the fact that the language uses functions in some situations, but methods in others. I see this come up a lot these days. The most recent example is this comment:

Say you have a list, how is a beginner supposed to follow/comprehend the bigger picture when faced with code like this:

mylist = [1,3,2]
print len( mylist )
print sorted( mylist )
print mylist.sort()
print mylist.reverse()
print mylist[0]

Honestly, what is going on here? Why different syntaxes for slightly different ideas?

This gives many people the feeling that OO in Python was added later, or "bolted on", as Taw puts it.

I figure the reasons for the discrepancy are like this:

In 1991, when Python was first released, *pure* object-oriented languages were rare. Sure, there was Smalltalk, but it was fairly obscure. Other languages that called themselves object-oriented, like C++ and Object Pascal, were actually hybrids; objects were indeed "bolted on" to an existing language.

In this light, it makes sense that Python 0.9.1 did the same. It had built-in types and user-defined objects. Some of the types, like lists and dictionaries, had methods, possibly because they were mutable and therefore contained "state". Other types, like numbers, strings and tuples, did not have methods (and were therefore not "perceived" as objects, even though behind the scenes they were). In fact, back in the early 90s, calling a method on a number seemed like a really odd idea (unless you were a Smalltalker :-).

As a result, in order to take the length of a string or a tuple, you needed a *function* (because you could not call a method on these types). While list and dict could easily have had such a method, for consistency's sake this was omitted; instead, the len() function was to be used on them too.

And so it still is. But the world has changed since then. OO has become much more mainstream, and the notion of using methods for everything is no longer odd, but expected by many. I can understand why someone coming from Ruby (or maybe Java) would wonder why in some cases functions are required, where a method would seem more natural. It makes Python seem inconsistent, when initially the use of len() actually improved consistency.

This is something that *could* have been fixed in Python 3.0, but won't be, as far as I understand it. I'm not sure whether this is a missed opportunity, or whether it will keep Python closer to its original spirit (i.e., a multi-paradigm language).

(Note: I am aware of the __len__ method of course, but honestly, who is really going to use that instead of len()? __len__ is a hook to make len() work, not an alternative for it.)
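A small sketch of that hook relationship, in Python 3 syntax (the class Playlist is made up for illustration): len() looks up __len__ on the object's type and calls it, so a user-defined container works with the same function as strings, tuples and dicts.

```python
class Playlist:
    """A hypothetical container type, used only for illustration."""

    def __init__(self, songs):
        self._songs = list(songs)

    def __len__(self):
        # len() finds and calls this hook on the object's type
        return len(self._songs)


p = Playlist(["Alone Again Or", "Koyaanisqatsi", "Gimme Shelter"])

print(len(p))        # 3: the usual spelling
print(p.__len__())   # 3: works, but nobody writes it this way
print(len("hello"), len((1, 2)), len({"a": 1}))  # same function for built-in types
```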

ABCs

So Python 3000 will have abstract base classes.

As far as I know, in the Python world, an abstract base class used to mean a base class that cannot be instantiated. Kind of like:

class Foo:
    def __init__(self):
        raise NotImplementedError("Foo cannot be instantiated")

It would, however, be perfectly possible for an abstract base class to have valid, working methods, to be used by classes inheriting from Foo.
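For comparison, the abc module that came out of PEP 3119 still supports this older meaning. The following is a minimal sketch in Python 3 syntax, not taken from the PEP; Foo and Bar are illustrative names. A class with an abstractmethod refuses to be instantiated, while its concrete methods are inherited by real subclasses as before.

```python
from abc import ABC, abstractmethod


class Foo(ABC):
    @abstractmethod
    def name(self):
        """Subclasses must provide this."""

    def greet(self):
        # A concrete, working method that subclasses inherit normally
        return "hello from " + self.name()


class Bar(Foo):
    def name(self):
        return "Bar"


try:
    Foo()  # fails: name() is still abstract
except TypeError as exc:
    print("cannot instantiate Foo:", exc)

print(Bar().greet())  # hello from Bar
```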

Apparently this isn't the case any longer. If I read PEP 3119 correctly, in Py3K-speak, an abstract base class is a way to trick isinstance() and issubclass() into believing that a given class derives from another class, when this actually isn't the case. For example:

>>> list.__bases__
(<class 'object'>,)
>>> object.__bases__
()

list derives from object, which in turn derives from nothing else. But:

>>> import collections
>>> issubclass(list, collections.Sequence)
True

In other words, list *pretends* to derive from Sequence, but it really doesn't, and so any methods defined in Sequence will not be inherited by list.
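The same distinction can be seen with a class of your own. A minimal sketch in Python 3 (Greeter and Plain are hypothetical names): register() makes issubclass() and isinstance() say yes, but the ABC never appears in the MRO, so its methods are not inherited. In current Python 3 the Sequence ABC lives at collections.abc.Sequence; the collections.Sequence alias was later removed.

```python
from abc import ABC
from collections.abc import Sequence


class Greeter(ABC):
    # An ABC with a concrete method and no abstract ones
    def greet(self):
        return "hello"


class Plain:
    # An unrelated class; it does not inherit from Greeter
    pass


# Declare Plain a "virtual subclass" of Greeter
Greeter.register(Plain)

print(issubclass(Plain, Greeter))     # True: the ABC vouches for it
print(isinstance(Plain(), Greeter))   # True
print(Greeter in Plain.__mro__)       # False: not a real base class
print(hasattr(Plain(), "greet"))      # False: greet() is not inherited

# The built-in list is in the same position with respect to Sequence
print(issubclass(list, Sequence), Sequence in list.__mro__)  # True False
```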

Filtered through my old-school-Python-riddled brain, this implies the following:

  • abstract base classes mean something different now, and
  • they break isinstance and issubclass, and
  • methods defined in the ABC cannot be used by the "subclass".

Seriously, the more I read about Python 3.0, the harder it becomes for me not to be appalled. The language makes a heroic effort at fixing warts and problems and trimming fat, but at the same time it's becoming more and more complex.

I'd love to be proven wrong though. I like Python a lot, and I am (and have been) worried about what it's evolving into. But maybe Python 3.0 will actually be easy and intuitive to use, like the Python of old, and all the overly complex stuff will stay in the back rather than get in the way. I guess time will tell. (Or maybe I'm misunderstanding things... feel free to point that out. :-)

(This is mostly a matter of personal preference, of course -- for the last few years I've been gravitating toward languages with little syntax, that allow powerful new constructs to be written in the language itself. Think Scheme and Io. Python, however, has steadily been moving in the other direction.)

list(dict)

The other day there was a thread on comp.lang.python that started off by discussing a common problem: deleting keys from a dict while iterating over it causes an exception. In other words, the following code fails:

>>> d = {1: 2, 3: 4, 5: 6}
>>> for i in d: del d[i]
...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: dictionary changed size during iteration

This is easily solved (in Python 2.x) by replacing for i in d with for i in d.keys(). The former iterates over the keys one by one, while the latter creates a list of keys first, then iterates over *that*; because the list is a separate copy, deleting keys from the dictionary does not change it during the loop, so the error above does not occur.

However, in Python 3000 this is going to change. d.keys() will then return a view on the dictionary's keys rather than a list, so iterating over it behaves just like for i in d; that is, it raises the same exception. This can be solved (in both 2.5 and 3.0) by forcing a list; e.g. for i in list(d.keys()).
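A minimal sketch of both cases as they behave in Python 3, where d.keys() returns a live view of the dictionary:

```python
d = {1: 2, 3: 4, 5: 6}
failed = False

# Iterating over the live keys view while deleting fails, just like "for i in d"
try:
    for k in d.keys():
        del d[k]
except RuntimeError as exc:
    failed = True
    print("RuntimeError:", exc)

# Snapshot the keys into a list first; deletions do not affect the list
d = {1: 2, 3: 4, 5: 6}
for k in list(d.keys()):
    del d[k]
print(d)  # {}
```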

Then someone pointed out that this can be written as for i in list(d) as well. I had not given this much thought before, but this would work as well, of course. It's just that list(d) strikes me as very counter-intuitive. I mean, it makes sense, in a way; if you can iterate over a dictionary's keys by doing for i in d, then list(d) would do just that: iterate over the keys, collect them in a list, and return that list. Except that list(d) kind of reads to me as, "convert a dictionary to a list", which is not what it does. The reverse operation does not work because of this; dict(list(d)) doesn't fly. Maybe iterating over a dictionary's items would have been a better choice, after all. :-/
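A quick sketch (Python 3) of the asymmetry: list(d) yields just the keys, which dict() cannot turn back into a dictionary, while list(d.items()) round-trips.

```python
d = {1: 2, 3: 4, 5: 6}

keys = list(d)             # iterates over the keys only: [1, 3, 5]
pairs = list(d.items())    # [(1, 2), (3, 4), (5, 6)]

# dict() expects (key, value) pairs, so a bare list of keys is rejected
try:
    dict(keys)
    roundtrip_ok = True
except TypeError:
    roundtrip_ok = False

# The list of items, on the other hand, converts straight back
restored = dict(pairs)
print(keys, pairs, roundtrip_ok, restored == d)
```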

Python 3000 in 60 seconds

After studying the relevant PEPs and other documents, I've come to the following conclusions:

  • Python 3000 used to be the mythical, revolutionary new release, implemented from scratch (possibly in C++), that would solve all of Python's problems, and then some.
  • However, over time, features that were originally intended for Python 3000 were added to the 2.x branch instead. (Actually, many of these were more like cleanups than new features: fixing the class/type dichotomy, nested scopes, unifying longs and ints, changing the division operator, etc.)
  • So, in 2008, Python 3000 is not quite so revolutionary anymore, compared to 2.5. As PEP 3000 says, "Python 3000 will be implemented in C, and the implementation will be derived as an evolution of the Python 2 code base. [...] Since Python 3000 as a language is a relatively mild improvement on Python 2, we can gain a lot by not attempting to reimplement the language from scratch."
  • In spite of that, Python 3000 is still special, because it introduces a number of changes that will break backward compatibility. Again, many of these deal with the fixing of warts and inconsistencies. For example: print will become a function; as and with will be keywords, and so will True, False and None; some functions (like map) now return iterators rather than lists; exception handling is different; etc. More here. (I will probably write about some of these changes in more detail later.)
  • Don't count on changes that will change the face of Python completely: explicit self and limited lambda are here to stay, not to mention indentation-based syntax and case sensitivity. (See PEP 3099.)
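A quick illustration of two of those changes as they ended up in Python 3 (a sketch, not taken from the PEPs):

```python
# print is an ordinary function, so it accepts keyword arguments
print("spam", "eggs", sep=", ")   # spam, eggs

# map() returns a lazy iterator instead of a list
squares = map(lambda n: n * n, [1, 2, 3])
first_pass = list(squares)    # [1, 4, 9]
second_pass = list(squares)   # []: the iterator is already exhausted
print(first_pass, second_pass)
```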

Python vs Scheme: using files as both modules and programs

Python has a useful idiom, that allows one to use the same file as both a module and a program. Consider this simple example:

# foo.py

def bar(x):
    print "bar says:", x

if __name__ == "__main__":

    bar(42)

The if __name__ == "__main__" clause is only executed if foo.py is run as the "main program". In other words, we can do both

$ python foo.py
bar says: 42

and

$ python
Python 2.5.1 (r251:54869, Apr 18 2007, 22:08:04)
[GCC 4.0.1 (Apple Computer, Inc. build 5367)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import foo
>>> foo.bar(33)
bar says: 33

Although admittedly a bit of a hack, this construct is well-known and often used.

Now, on to Chicken Scheme. Can we do the same? It took a bit of poking around in the documentation and on the mailing list, but it appears the answer is yes.

;; foo.scm

(define (foo x)
  (print "foo says: " x))

(define (main args)
  (print "args: " args)
  (foo 42))

Now, we can run this as a script with csi -ss (which looks for a function called main and automagically calls it):

$ csi -ss foo.scm
args: ()
foo says: 42

Notice that the main function has an argument args, which contains the command line arguments passed to the program:

$ csi -ss foo.scm 1 2 3
args: (1 2 3)
foo says: 42

We can also import foo.scm from within an interactive session, in which case main is not called:

$ csi

CHICKEN
Version 3.0.0 - macosx-unix-gnu-x86     [ manyargs dload ptables applyhook ]
(c)2000-2008 Felix L. Winkelmann        compiled 2008-03-05 on niflheim.local (Darwin)

; loading /Users/zephyrfalcon/.csirc ...
; loading /usr/local/lib/chicken/3/readline.so ...
#;1> (use foo)
; loading ./foo.scm ...
#;2> (foo 101)
foo says: 101

But, but! Isn't Chicken primarily a compiler? Does the above work too when using csc rather than csi? Actually it does, but the invocation is different. I use the following:

$ csc foo.scm -postlude "(main (cdr (argv)))"

$ ./foo
args: ()
foo says: 42

$ ./foo 1 2 3
args: (1 2 3)
foo says: 42

The -postlude option can be used to specify code that runs when the executable is called. (Actually, the official explanation is: "Add EXPRESSIONS after all other toplevel expressions in the compiled file. This option may be given multiple times. Processing of this option takes place after processing of -epilogue.")

I use (main (cdr (argv))) as the postlude expression, which seems to pass command line arguments the same way as csi -ss passes them to the main function (although there might be a catch that I'm not aware of). The cdr is necessary because the first item of the list returned by (argv) is the name of the calling program (e.g. "./foo").
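For comparison, here is a sketch of the Python file restructured to mirror the Scheme version, with an explicit main(args). The slice sys.argv[1:] plays the same role as (cdr (argv)), since sys.argv[0] is the script name. Python 3 syntax; the function names simply follow the Scheme example.

```python
import sys


def foo(x):
    print("foo says:", x)


def main(args):
    # args holds only the user-supplied arguments, like csi -ss passes them
    print("args:", args)
    foo(42)


if __name__ == "__main__":
    # sys.argv[0] is the script name, the counterpart of (car (argv)),
    # so slice it off just as (cdr (argv)) does
    main(sys.argv[1:])
```

One difference worth keeping in mind: the entries of sys.argv are always strings, so numeric arguments have to be converted explicitly.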

(If there's a better way, please let me know, as my knowledge about the compiler is limited.)

Next up: parsing command line arguments in both Python and Scheme...

Python vs Scheme: function parameters (part III)

(Also see: part I, part II)

Just discovered something neat. In Python, if a function has arguments with default values, those defaults are evaluated once, when the function is defined. So the following function, when called with no arguments, always returns the same value:

>>> import random
>>> def f(x=random.randrange(0, 100)): return x
...
>>> f()
32
>>> f()
32
>>> f()
32

...because the default value for x is determined when f is defined, rather than when it's called.
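A deterministic way to see this, without relying on random numbers: a sketch in which the default expression records each evaluation (make_default and evaluations are made-up names; Python 3 syntax).

```python
evaluations = []


def make_default():
    # Record each time the default expression is evaluated
    evaluations.append("evaluated")
    return 42


def f(x=make_default()):   # make_default() runs right here, once
    return x


f()
f()
f()
print(len(evaluations))    # 1: evaluated at definition time, not per call
```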

And this is not allowed at all:

>>> def z(a, b=a): print a, b
...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'a' is not defined

I assumed that these rules would be the same in Chicken, but it turns out that this is not the case. This allows for some cool constructs that simply aren't possible in Python. Like the example with the random number:

> (define (f #!optional (x (random 100))) x)
> (f)
23
> (f)
89
> (f)
97

It appears that (random 100) is computed whenever f is called, rather than when it's defined. We can also refer to other arguments in this default expression:

> (define (g a #!optional (b (+ a 10)))
>   (list a b))
> (g 3)
(3 13)
> (g 40)
(40 50)

Good to know. This behavior is intentional rather than accidental, judging from the Extensions to the standard section in the user manual.
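Python can approximate this with a common idiom: use None as the default and compute the real fallback inside the function body, where earlier parameters are in scope. A sketch in Python 3 syntax mirroring f and g above; it is an emulation rather than the same feature, since an explicit None argument is treated as "not supplied".

```python
import random


def f(x=None):
    # None means "not supplied"; compute the fallback on every call
    if x is None:
        x = random.randrange(0, 100)
    return x


def g(a, b=None):
    # The fallback may refer to an earlier parameter, like (b (+ a 10))
    if b is None:
        b = a + 10
    return [a, b]


print(g(3))     # [3, 13]
print(g(40))    # [40, 50]
print(g(1, 2))  # [1, 2]
```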

Modules

I noticed there's a new version of Arc. According to the web page, "The most dramatic change is probably the ability to use x.y and x!y as abbreviations for (x y) and (x 'y) respectively."

The x.y syntax reminds me of one of the things I miss in Scheme: a "Pythonic" module system. By "Pythonic" I mean that you can import a module and get a module object back, which you can access using x.y syntax. I'm mostly interested in the namespace issue here, as the convention in Scheme seems to be to just stick everything in the global namespace.

In other words, I am somewhat uncomfortable that it's not possible to do this:

;; --- foo.scm ---

(define (bar x)
  (+ x 1))

;; --- REPL ---

> (import foo)
> foo
#<module foo>
> (foo.bar 3)
4
> foo.bar
#<procedure (foo.bar x)>

...or something to that effect.

Maybe this is an irrational "need", but I like namespaces, and have gotten used to them over the years, and loading the toplevel namespace with lots and lots of definitions just seems a bit "unsafe" to me.

I found a comparison of Python's and PLT Scheme's module systems, which explains the issue better:

Once we've done '(require (lib "math.ss"))', we have access to the internals of the math library. But there's one surprise: unlike Python, 'math' itself is not a first-class object. By default, the require form has the same semantics as Python's "from [module] import *"!

The article then mentions the following technique to add a prefix to the names imported from a module:

> (require (prefix math. (lib "math.ss" "mzlib")))
> math.e
2.718281828459045

I suppose this would not be too bad as an alternative, although you still cannot do things like inspect a module, pass it around, etc. Chicken does not seem to support it, though, and Schemers generally don't seem to miss the feature. (Or maybe there's a reason why it would be a bad idea in Scheme.) So maybe I should just learn to live without it. Thoughts welcome...
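For reference, a sketch of what "first-class" buys you on the Python side (Python 3; pick is a hypothetical helper): a module can be passed to a function, inspected with getattr() and hasattr(), and even created at runtime with types.ModuleType, much like the imaginary foo.bar example above.

```python
import math
import types


def pick(module, names):
    # Modules are ordinary objects: they can be passed in and inspected
    return {name: getattr(module, name) for name in names if hasattr(module, name)}


consts = pick(math, ["pi", "e", "no_such_name"])
print(type(math).__name__)   # module
print(sorted(consts))        # ['e', 'pi']

# A module object can even be built at runtime, mirroring foo.bar
foo = types.ModuleType("foo")
foo.bar = lambda x: x + 1
print(foo.bar(3))            # 4
```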

Python vs Scheme: strings

Python and Scheme have different philosophies when it comes to strings. Scheme strings are mutable and consist of characters, which are a separate type. By contrast, Python's strings are immutable, and its "characters" are really strings with a length of one.

Also, Python uses both " " and ' ' for string literals, while Scheme only uses " ".

Aside from that, strings can be used in these languages in ways that are very similar (as opposed to e.g. C's strings which tend to involve memory allocation and pointer arithmetic). So in this post, I will be focusing on common string operations, and what they look like in both Python and Scheme.

In Python, all these string operations work out of the box. In Scheme, some are provided by R5RS, others are found in SRFI-13 (a very useful library with a large number of non-trivial string operations), and yet others are provided by Chicken (but not necessarily by other Scheme implementations). In the examples below, I'm assuming Chicken with SRFI-13 imported.

Joining multiple strings »

Python has the very obvious + operator to concatenate two strings, something which won't work in Scheme; (+ "a" "b") is an error. It also has the butt-ugly str.join method to join a list of strings. Scheme has string-append (R5RS) and string-join (SRFI-13).

# Python
>>> "hello" + " " + "world"
'hello world'
>>> " ".join(['my', 'name', 'is', 'poison'])
'my name is poison'

;; Scheme
> (string-append "hello" " " "world")
"hello world"
> (string-join '("my" "name" "is" "poison") " ")
"my name is poison"

Getting the length »

These functions are very simple, but I'm mentioning them anyway because there might be surprises here for people coming from other languages. Python uses the len() function rather than a method (like e.g. Ruby and Io do). Scheme uses string-length rather than length (which only works on lists).

# Python
>>> len("koyaanisqatsi")
13

;; Scheme
> (string-length "koyaanisqatsi")
13

Substrings »

Python uses the [] syntax for indexing and slicing; it also accepts negative numbers (to count from the end of the string). Scheme has string-ref and substring, which work similarly, except they don't take negative values. (Note that string-ref returns a *character* rather than a one-length string.)

# Python
>>> s = "hello"
>>> s[0]
'h'
>>> s[2]
'l'
>>> s[1:3]
'el'
>>> s[-3:]
'llo'

;; Scheme
> (define s "hello")
> (string-ref s 0)
#\h
> (string-ref s 2)
#\l
> (substring s 1 3)
"el"

Comparing strings »

In Python, strings are compared with the usual == family of operators. Case matters; "a" does not compare equal to "A". Scheme, on the other hand, has a number of functions to do the comparison; string=? and friends for case-sensitive comparing like in Python, and the string-ci=? family for case-insensitive comparing.

# Python
>>> "abc" == "abc"
True
>>> "abc" == "ABC"
False
>>> "b" > "a"
True

;; Scheme
> (string=? "abc" "ABC")
#f
> (string-ci=? "abc" "ABC")
#t
> (string>? "b" "a")
#t
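Python has no separate case-insensitive comparison operator; a common approach is to normalize case first. A sketch (equal_ci is a hypothetical helper; str.casefold() needs Python 3.3 or later and handles more non-ASCII cases than lower()):

```python
def equal_ci(a, b):
    # Rough counterpart of string-ci=?: normalize case, then compare
    return a.casefold() == b.casefold()


print("abc" == "ABC")            # False
print(equal_ci("abc", "ABC"))    # True
```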

SRFI-13 also provides equivalents for Python's useful startswith() and endswith() methods:

> (string-prefix? "He" "Herbert")
#t
> (string-suffix? "tt" "Abbott")
#t

Changing case »

Speaks for itself. Note that R5RS defines char-upcase and char-downcase, but not string-upcase or string-downcase (those are in SRFI-13).

# Python
>>> "kibbles and bits".upper()
'KIBBLES AND BITS'
>>> "KIBBLES AND BITS".lower()
'kibbles and bits'
>>> "kibbles and bits".capitalize()
'Kibbles and bits'
>>> "kibbles and bits".title()
'Kibbles And Bits'

;; Scheme
> (string-upcase "kibbles and bits")
"KIBBLES AND BITS"
> (string-downcase "KIBBLES AND BITS")
"kibbles and bits"
> (string-titlecase "kibbles and bits")
"Kibbles And Bits"

Splitting »

Splitting a string into a list of smaller strings is a common thing to do in high-level languages. Luckily, for common cases, we don't have to resort to regular expressions. Python uses the split() method, Scheme has string-tokenize (SRFI-13) and string-split (Chicken).

# Python
>>> "a few good men".split()
['a', 'few', 'good', 'men']
>>> "abracadabra".split("b")
['a', 'racada', 'ra']

;; Scheme
> (string-tokenize "a few good men")  ;; SRFI-13
("a" "few" "good" "men")
> (string-split "a few good men")     ;; Chicken built-in
("a" "few" "good" "men")
> (string-split "abracadabra" "b")
("a" "racada" "ra")

Trimming »

To trim characters from the left and/or right side of a string, Python uses the lstrip (from the left), rstrip (from the right) or strip (both sides) methods. Somewhat asymmetrically, in Scheme (or, more precisely, SRFI-13) these functions are called string-trim (from the left), string-trim-right (from the right) and string-trim-both (both sides).

Both Python and Scheme allow you to specify the character(s) that need to be stripped. By default, whitespace is removed, as this seems to be the most common use case.

# Python
>>> s = "  i like cookies  "
>>> s.strip()
'i like cookies'
>>> s.lstrip()
'i like cookies  '
>>> s.rstrip()
'  i like cookies'
>>> "xxxhi!xxx".strip("x")
'hi!'

;; Scheme
> (define s "  i like cookies  ")
> (string-trim s)
"i like cookies  "
> (string-trim-right s)
"  i like cookies"
> (string-trim-both s)
"i like cookies"
> (string-trim-both "xxxhi!xxx" #\x)
"hi!"

Looping »

In Python, you can loop over a string (using for) or turn it into a list, but what you get is essentially a list of strings with length one. In Scheme, you get characters. Use string->list to get a list of characters, and string-map to map one string to another (much like the regular map, but it takes and returns a string).

# Python
>>> for c in "hello": print c,
...
h e l l o
>>> list("hello")
['h', 'e', 'l', 'l', 'o']

;; Scheme
> (string->list "hello")
(#\h #\e #\l #\l #\o)
> (for-each
>   (lambda (c) (printf "~a! " c))
>   (string->list "hello"))
h! e! l! l! o!
> (string-map char-upcase "hello")
"HELLO"

Searching »

Python has several ways to search strings for contents... like the find/rfind methods (and their index/rindex counterparts) to find the index of a matching substring, and the in operator if you just want to know if a string has a certain substring, but don't need to know where exactly it starts.

You can do the same things in Scheme, assuming you use SRFI-13, as R5RS does not define any of this. string-index searches for a character (or a character set or a predicate), string-contains searches for a substring. When found, it returns the index, otherwise #f (which is useful because it allows one to write (if (string-contains s1 s2) ...)).

# Python
>>> "lemon-flavored jellibeans".find("e")
1
>>> "lemon-flavored jellibeans".rfind("e")
21
>>> "lemon-flavored jellibeans".find("el")
16
>>> "lemon-flavored jellibeans".find("xyz")
-1
>>> "el" in "lemon-flavored jellibeans"
True

;; Scheme
> (string-index "lemon-flavored jellibeans" #\e)
1
> (string-index-right "lemon-flavored jellibeans" #\e)
21
> (string-contains "lemon-flavored jellibeans" "el")
16
> (string-contains "lemon-flavored jellibeans" "xyz")
#f
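The #f convention matters because Python's find() returns 0 for a match at the very start and -1 for no match, so its result cannot be used directly as a truth value. A sketch in Python 3 (contains_at is a hypothetical helper that mimics string-contains by returning None where Scheme returns #f):

```python
s = "lemon-flavored jellibeans"

print(bool(s.find("lemon")))   # False, although "lemon" is there (index 0)
print(bool(s.find("xyz")))     # True, although "xyz" is missing (-1)

# index() raises ValueError instead of returning -1
try:
    s.index("xyz")
except ValueError:
    print("xyz not found")


def contains_at(s, sub):
    """Return the index of sub in s, or None if absent (like #f)."""
    i = s.find(sub)
    return None if i == -1 else i
```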

Replacing »

Python's replace() method is very easy to use: simply specify the substring that needs to be replaced, and its replacement. By contrast, SRFI-13's string-replace is more sophisticated. It takes a string, a replacement string, and start/end indices that indicate which part of the string needs to be replaced. See the example below.

# Python
>>> "I like cookies".replace("cookie", "hot dog")
'I like hot dogs'

;; Scheme
> (string-replace "i like cookies" "hot dog" 7 13)
"i like hot dogs"

I don't know if there's a version that is easier to use floating around somewhere (in a SRFI or otherwise), but it's not so hard to write something that emulates the Python behavior:

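(define (string-replace-v2 s before after)
  ;; Replace every occurrence of before (assumed non-empty) with after
  (let ((idx (string-contains s before)))
    (if idx
        (let ((rest (substring s (+ idx (string-length before))
                               (string-length s))))
          (string-append (substring s 0 idx)
                         after
                         (string-replace-v2 rest before after)))
        s)))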
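When emulating it, keep in mind that Python's replace() substitutes every occurrence by default; an optional third argument limits the number of replacements. A quick sketch:

```python
s = "cookies and more cookies"

all_replaced = s.replace("cookie", "hot dog")     # every occurrence
first_only = s.replace("cookie", "hot dog", 1)    # at most one replacement

print(all_replaced)   # hot dogs and more hot dogs
print(first_only)     # hot dogs and more cookies
```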

Also, Chicken has string-translate*, which works for our purposes, but is used with a table of elements to be replaced:

> (string-translate* "i like cookies"
>   '(("cookie" . "hot dog")))
"i like hot dogs"

:::

This has become a long post, longer than I intended, and there are still many things I haven't even touched upon yet... like Unicode, or the fact that some of the aforementioned functions have equivalents that change the string in-place, rather than returning a new string. Anyway, this wasn't meant to be a complete reference; it's more of a starting point, or a quick way to look up "I can do X in Python, how do I do it in Scheme?"

Chandler

Chandler is pining for the fjords. Carlos Perez blames Python; Ned Batchelder comes to Python's defense.

First of all: I have always thought that Chandler was dubious marketing for Python, to be honest. When it was first announced, much was made of the fact that it was going to use Python as the development language... and then it just sat there for years, showing very little progress.

Upon reading Dreaming In Code, I got the impression that programming languages (whether Python or Java) are not to blame, but rather the fact that the project's goals were very vague and unclear, right from the start (which caused a slew of other, related, problems). Writing a revolutionary PIM is a great idea... but nobody knew what it should look like exactly, much less what it should do.

Python is great, but it doesn't design the program for you. :-} As such, I don't think it has anything to do with Chandler's failure (and neither does the static-vs-dynamic typing issue). If you don't know what you want, you can have the most productive language known to man, but it won't do you any good.

(And, of course, the fact that most of the developers were unfamiliar with Python did not exactly push the project forward either...)

Python vs Scheme: dictionaries

Python has a separate dictionary type. I'm assuming most of my readers are familiar with it, as it's very common; if not, please read the fine tutorial first, followed by the library reference.

In short, Python's dictionaries are mutable objects that associate unique keys with values. There is special syntax to create them.

Now, going by R5RS, Scheme doesn't even have anything similar. All it has is three closely related functions that look up pairs in a list, based on a "key" which is matched against the first element of each pair. These functions are assq, assv and assoc. (See here in R5RS.)

Naturally, it's possible to write extensions in Scheme that look more like Python's dict (or Ruby's Hash, etc), and I'm sure people have done so; for example, SRFI-69 defines hash tables. But for now, let's see what we can do with the bare-bones approach.

The usage is simple: you define a list of pairs, possibly augmenting it by consing new pairs onto it, or deleting elements from it. (The order doesn't really matter.) Then you use the aforementioned functions to look up "keys" (matched against the first element of each pair). If the key is not found, #f is returned; otherwise, the matching pair is.

That's right; the whole pair is returned, not just the second element of the pair. By doing so, Scheme sidesteps a problem that some languages have (e.g. Ruby): it either returns a pair (found) or #f (not found), so there can never be any confusion about whether the key was found. By contrast, in Ruby, if myhash[key] returns nil, that could mean that the key was found and its associated value was nil, *or* that it was not found at all. (Python doesn't have this problem either; it raises a KeyError exception if the key is not found. The Ruby behavior can be emulated with the dict.get() method.)

Anyway, here's an example (R5RS only but we're secretly assuming that filter exists):

(define language-designers
  '((guido python)
    (matz ruby)
    (rasmus php)
    (larry perl)))

; add some...
(define language-designers
  (cons '(felix chicken) language-designers))

; I don't like PHP; remove it :-)
(define language-designers
  (filter (lambda (pair) (not (equal? (cadr pair) 'php)))
          language-designers))

(print (assoc 'guido language-designers))  ; => (guido python)
(print (assoc 'hans language-designers))   ; => #f
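For comparison, roughly the same steps with a Python dict (a sketch in Python 3, using strings where the Scheme version uses symbols):

```python
language_designers = {
    "guido": "python",
    "matz": "ruby",
    "rasmus": "php",
    "larry": "perl",
}

# add some...
language_designers["felix"] = "chicken"

# remove every entry whose value is "php", like the filter above
language_designers = {k: v for k, v in language_designers.items() if v != "php"}

print(language_designers.get("guido"))   # python
print(language_designers.get("hans"))    # None
```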

Note how the usage is completely different from Python's dicts. This can be written differently -- hell, OF COURSE it can be written differently, it's Scheme! :-) But let's stick with this approach for a minute.

(I'm using assoc here, which compares key and first element using the equal? predicate. assq and assv basically do the same thing, using the eq? and eqv? predicates, respectively. Equality testing in Scheme will be dealt with in yet another forthcoming post...)

Basically, this is all you need. Since a "dictionary" is just a list of pairs, all the usual list operators apply, and can be used to write any Python-esque dict operations fairly easily: keys, has_key, adding and removing using [], etc. (Doing so is left as an exercise for the reader. ;-)

Fortunately, the author of SRFI-1 (a list library) recognized the need for such functions, and supplied a few of them. Using the SRFI, we could write:

(define language-designers
  ;; alist-cons takes a key and a datum; '(chicken) keeps the (felix chicken) shape
  (alist-cons 'felix '(chicken) language-designers))

(define language-designers
  (alist-delete 'rasmus language-designers))

... which is at least a bit clearer.

(More about Scheme lists, and SRFI-1 which is *very* necessary, in a separate post.)
