Python vs Scheme: strings

Python and Scheme have different philosophies when it comes to strings. Scheme strings are mutable and consist of characters, which are a separate type. By contrast, Python's strings are immutable, and its "characters" are really strings with a length of one.
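
To make the Python side of that concrete, here is a small sketch (the variable names are only for illustration). Indexing a string yields another string of length one, and assigning to an index raises a TypeError:

```python
s = "hello"

# Indexing gives back a str of length one, not a separate character type.
first = s[0]
is_str = isinstance(first, str)
length = len(first)

# Strings are immutable: item assignment is rejected.
try:
    s[0] = "j"
    mutated = True
except TypeError:
    mutated = False

# The usual workaround is to build a new string instead.
changed = "j" + s[1:]
```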

Also, Python uses both " " and ' ' for string literals, while Scheme only uses " ".

Aside from that, strings can be used in these languages in ways that are very similar (as opposed to e.g. C's strings which tend to involve memory allocation and pointer arithmetic). So in this post, I will be focusing on common string operations, and what they look like in both Python and Scheme.

In Python, all these string operations work out of the box. In Scheme, some are provided by R5RS, while others are found in SRFI-13 (a very useful library which has a large number of non-trivial string operations), and yet others are included by Chicken (but not necessarily part of other Scheme implementations). In the examples below, I'm assuming Chicken with SRFI-13 imported.

Joining multiple strings »

Python has the very obvious + operator to concatenate two strings, something which won't work in Scheme; (+ "a" "b") is an error. It also has the butt-ugly str.join method to join a list of strings. Scheme has string-append (R5RS) and string-join (SRFI-13).

# Python
>>> "hello" + " " + "world"
'hello world'
>>> " ".join(['my', 'name', 'is', 'poison'])
'my name is poison'

;; Scheme
> (string-append "hello" " " "world")
"hello world"
> (string-join '("my" "name" "is" "poison") " ")
"my name is poison"

Getting the length »

These functions are very simple, but I'm mentioning them anyway because there might be surprises here for people coming from other languages. Python uses the len() function rather than a method (like e.g. Ruby and Io do). Scheme uses string-length rather than length (which only works on lists).

# Python
>>> len("koyaanisqatsi")
13

;; Scheme
> (string-length "koyaanisqatsi")
13

Substrings »

Python uses the [] syntax for indexing and slicing; it also accepts negative numbers (to count from the end of the string). Scheme has string-ref and substring, which work similarly, except they don't take negative values. (Note that string-ref returns a *character* rather than a one-length string.)
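
Because string-ref and substring don't accept negative values, a negative Python index has to be rewritten as an offset from the length before it can be carried over to Scheme. A minimal sketch of that arithmetic, in Python:

```python
s = "hello"
n = len(s)

# A negative index -k refers to position n - k.
tail_negative = s[-3:]
tail_explicit = s[n - 3:n]   # same slice with non-negative indices only

last_negative = s[-1]
last_explicit = s[n - 1]
```

The explicit form is the one that translates directly; in Scheme it would presumably look like (substring s (- (string-length s) 3) (string-length s)).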

# Python
>>> s = "hello"
>>> s[0]
'h'
>>> s[2]
'l'
>>> s[1:3]
'el'
>>> s[-3:]
'llo'

;; Scheme
> (define s "hello")
> (string-ref s 0)
#\h
> (string-ref s 2)
#\l
> (substring s 1 3)
"el"

Comparing strings »

In Python, strings are compared with the usual == family of operators. Case matters; "a" does not compare equal to "A". Scheme, on the other hand, has a number of functions to do the comparison; string=? and friends for case-sensitive comparing like in Python, and the string-ci=? family for case-insensitive comparing.

# Python
>>> "abc" == "abc"
True
>>> "abc" == "ABC"
False
>>> "b" > "a"
True

;; Scheme
> (string=? "abc" "ABC")
#f
> (string-ci=? "abc" "ABC")
#t
> (string>? "b" "a")
#t
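
Python has no direct counterpart to the string-ci=? family; a common workaround is to normalize the case on both sides before comparing. A minimal sketch, where equal_ignoring_case is a hypothetical helper name, not a built-in:

```python
def equal_ignoring_case(a, b):
    """Compare two strings after lowercasing both (fine for ASCII text)."""
    return a.lower() == b.lower()

same = equal_ignoring_case("abc", "ABC")
different = equal_ignoring_case("abc", "abd")
```

This is adequate for ASCII; case-insensitive comparison of Unicode text is more involved.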

SRFI-13 also provides equivalents for Python's useful startswith() and endswith() methods:

> (string-prefix? "He" "Herbert")
#t
> (string-suffix? "tt" "Abbott")
#t
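
For comparison, the Python side might look like the sketch below. Note the argument order: in SRFI-13 the prefix or suffix comes first, while in Python the method is called on the full string.

```python
starts = "Herbert".startswith("He")
ends = "Abbott".endswith("tt")

# Like ==, these checks are case-sensitive.
wrong_case = "Herbert".startswith("he")
```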

Changing case »

Speaks for itself. Note that R5RS defines char-upcase and char-downcase, but not string-upcase or string-downcase (those are in SRFI-13).

# Python
>>> "kibbles and bits".upper()
'KIBBLES AND BITS'
>>> "KIBBLES AND BITS".lower()
'kibbles and bits'
>>> "kibbles and bits".capitalize()
'Kibbles and bits'
>>> "kibbles and bits".title()
'Kibbles And Bits'

;; Scheme
> (string-upcase "kibbles and bits")
"KIBBLES AND BITS"
> (string-downcase "KIBBLES AND BITS")
"kibbles and bits"
> (string-titlecase "kibbles and bits")
"Kibbles And Bits"

Splitting »

Splitting a string into a list of smaller strings is a common thing to do in high-level languages. Luckily, for common cases, we don't have to resort to regular expressions. Python uses the split() method, Scheme has string-tokenize (SRFI-13) and string-split (Chicken).

# Python
>>> "a few good men".split()
['a', 'few', 'good', 'men']
>>> "abracadabra".split("b")
['a', 'racada', 'ra']

;; Scheme
> (string-tokenize "a few good men")  ;; SRFI-13
("a" "few" "good" "men")
> (string-split "a few good men")     ;; Chicken built-in
("a" "few" "good" "men")
> (string-split "abracadabra" "b")
("a" "racada" "ra")

Trimming »

To trim characters from the left and/or right side of a string, Python uses the lstrip (from the left), rstrip (from the right) or strip (both sides) methods. Somewhat asymmetrically, in Scheme (or, more precisely, SRFI-13) these functions are called string-trim (from the left), string-trim-right (from the right) and string-trim-both (both sides).

Both Python and Scheme allow you to specify the character(s) that need to be stripped. By default, whitespace is removed, as this seems to be the most common use case.
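
One detail on the Python side that is easy to miss: the argument to strip() is treated as a set of characters to remove, not as a literal prefix or suffix, and only the ends of the string are touched. A short sketch:

```python
# Any mix of "x" and "y" is removed from the ends.
both = "xyxhi!yx".strip("xy")
left = "xyxhi!yx".lstrip("xy")

# Matching characters in the middle are left alone.
inner = "xhixhix".strip("x")
```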

# Python
>>> s = "  i like cookies  "
>>> s.strip()
'i like cookies'
>>> s.lstrip()
'i like cookies  '
>>> s.rstrip()
'  i like cookies'
>>> "xxxhi!xxx".strip("x")
'hi!'

;; Scheme
> (define s "  i like cookies  ")
> (string-trim s)
"i like cookies  "
> (string-trim-right s)
"  i like cookies"
> (string-trim-both s)
"i like cookies"
> (string-trim-both "xxxhi!xxx" #\x)
"hi!"

Looping »

In Python, you can loop over a string (using for) or turn it into a list, but what you get is essentially a list of strings with length one. In Scheme, you get characters. Use string->list to get a list of characters, and string-map to map one string to another (much like the regular map, but it takes and returns a string).
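
Python has no string-map, but joining the results of a per-character transformation gives much the same effect. A small sketch; shift is a made-up helper for illustration:

```python
# Transform each one-character string, then glue the pieces back together.
upper = "".join(c.upper() for c in "hello")

def shift(c):
    """Return the character one code point after c."""
    return chr(ord(c) + 1)

shifted = "".join(map(shift, "abc"))
```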

# Python
>>> for c in "hello": print(c, end=" ")
...
h e l l o
>>> list("hello")
['h', 'e', 'l', 'l', 'o']

;; Scheme
> (string->list "hello")
(#\h #\e #\l #\l #\o)
> (for-each
>   (lambda (c) (printf "~a! " c))
>   (string->list "hello"))
h! e! l! l! o!
> (string-map char-upcase "hello")
"HELLO"

Searching »

Python has several ways to search a string. The find/rfind methods (and their index/rindex counterparts) return the index of a matching substring, and the in operator tells you whether a string contains a certain substring when you don't need to know where exactly it starts.

You can do the same things in Scheme, assuming you use SRFI-13, as R5RS does not define any of this. string-index searches for a character (or a character set or a predicate), string-contains searches for a substring. Both return the index when a match is found, and #f otherwise (which is useful because it allows one to write (if (string-contains s1 s2) ...)).

# Python
>>> "lemon-flavored jellibeans".find("e")
1
>>> "lemon-flavored jellibeans".rfind("e")
21
>>> "lemon-flavored jellibeans".find("el")
16
>>> "lemon-flavored jellibeans".find("xyz")
-1
>>> "el" in "lemon-flavored jellibeans"
True

;; Scheme
> (string-index "lemon-flavored jellibeans" #\e)
1
> (string-index-right "lemon-flavored jellibeans" #\e)
21
> (string-contains "lemon-flavored jellibeans" "el")
16
> (string-contains "lemon-flavored jellibeans" "xyz")
#f
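
The #f convention has no direct equivalent in Python: find() returns -1 for "not found", which is truthy, while a match at index 0 is falsy. So the result of find() should not be used directly as a condition. A sketch of the pitfall, along with index(), which raises instead:

```python
s = "lemon-flavored jellibeans"

at_start = s.find("lemon")   # match at index 0
missing = s.find("xyz")      # no match: -1

# Truthiness is the opposite of what one might hope for.
truthy_when_missing = bool(missing)
falsy_when_at_start = bool(at_start)

# index() behaves like find() but raises ValueError when nothing matches.
try:
    s.index("xyz")
    raised = False
except ValueError:
    raised = True

# For a plain yes/no test, use the in operator.
found = "xyz" in s
```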

Replacing »

Python's replace() method is very easy: simply specify the substring that needs to be replaced, and its replacement. By contrast, SRFI-13's string-replace is more sophisticated. It takes a string, a replacement string, and start/end indices that indicate what part of the string needs to be replaced. See the example below.

# Python
>>> "I like cookies".replace("cookie", "hot dog")
'I like hot dogs'

;; Scheme
> (string-replace "i like cookies" "hot dog" 7 13)
"i like hot dogs"

I don't know if there's a version that is easier to use floating around somewhere (in a SRFI or otherwise), but it's not so hard to write something that emulates the Python behavior for the first occurrence of the substring:

(define (string-replace-v2 s before after)
  (let ((idx (string-contains s before)))
    (if idx
        (string-replace s after idx (+ idx (string-length before)))
        s)))
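
One difference worth keeping in mind: Python's replace() substitutes every occurrence by default, whereas a helper built on a single string-contains call stops after the first match. Python's optional third argument limits the number of replacements, which makes the two behaviors easy to compare in a short sketch:

```python
s = "cookies, cookies, cookies"

# Default: every occurrence is replaced.
all_replaced = s.replace("cookie", "hot dog")

# With a count of 1, only the first occurrence is replaced.
first_only = s.replace("cookie", "hot dog", 1)
```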

Also, Chicken has string-translate*, which works for our purposes, but is used with a table of elements to be replaced:

> (string-translate* "i like cookies"
>   '(("cookie" . "hot dog")))
"i like hot dogs"

:::

This has become a long post, longer than I intended, and there are still many things I haven't even touched upon yet... like Unicode, or the fact that some of the aforementioned functions have equivalents that change the string in-place, rather than returning a new string. Anyway, this wasn't meant to be a complete reference; it's more of a starting point, or a quick way to look up "I can do X in Python, how do I do it in Scheme?"

5 Comments »

  1. John Cowan said,

    January 27, 2008 @ 12:08 am

    The reason R5RS doesn't have string-{up,down}case is probably that it was assumed those are just lifts of char-{up,down}case over the string, and in ASCII they are. In the real (Unicode) world, alas, they are not; for example, the uppercase form of "ß" is "SS". Consequently, R6RS includes proper string casing functions and warns against using the character functions.

  2. John Nowak said,

    January 27, 2008 @ 11:13 am

    How is using '+' for a non-commutative operation obvious? Here's what's not obvious:

    >>> a = "foo"
    >>> b = "bar"
    >>> a + b == b + a
    False

  3. Hans Nowak said,

    January 27, 2008 @ 12:42 pm

    I think it's obvious when you think of "+" as "adding two things together" (rather than "a commutative mathematical operation").

  4. John Cowan said,

    January 29, 2008 @ 1:50 pm

    I should also mention that Chicken characters cover the full Unicode range, but how strings are interpreted depends on the egg: by default they are Latin-1 (and Unicode escapes in strings outside the Latin-1 range produce bizarre results), but if you load the utf-8 egg, they are treated as UTF-8, with consequent effects on string-length, string-ref, string-set!, etc.

  5. troels said,

    January 30, 2008 @ 4:43 pm

    I'm fairly new to lisp/scheme myself, and seeing your Python-comparisons made a lot of sense to me. Please post more like this.
