This is a guide for common character and string tasks.
Scheme has had standard string and character datatypes since forever. They are fully distinct types: a character is neither a string nor an integer. String, bytevector, vector and list are also distinct types: none of them is a subtype of another.
A Scheme character object represents one Unicode codepoint.
char->integer and integer->char convert between integer codepoints and character objects. (map char->integer (string->list s)) shows all the codepoints in the string s.
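As a small illustration of the conversion described above, the following sketch turns a string into its codepoints and back. The variable names are made up for this example, and the imports assume an R7RS implementation.

```scheme
(import (scheme base) (scheme write))

;; Characters are not integers, so the conversion is always explicit.
(define codepoints (map char->integer (string->list "abc")))
(write codepoints)              ; prints (97 98 99)
(newline)

;; Round trip: codepoints back to characters, then to a string.
(define round-trip (list->string (map integer->char codepoints)))
(write round-trip)              ; prints "abc"
(newline)
```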
Scheme strings are generally mutable. This means you can change individual characters in the string using string-set! at any time after the string has been created.
However, actual Scheme code that mutates strings is somewhat rare. It is generally best to avoid mutating them if you can manage without. In the medium-to-long term, Scheme may evolve in a direction where strings are immutable, or where mutable strings are second-class citizens and immutable strings are the default thing to use.
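A minimal sketch of both styles, assuming an R7RS implementation. The helper replace-first is hypothetical and only there for illustration. Note that in R7RS it is an error to mutate a string literal, so the mutating variant works on a fresh copy.

```scheme
(import (scheme base))

;; Mutating style: string-copy returns a fresh, mutable string.
(define s (string-copy "cat"))
(string-set! s 0 #\b)           ; s is now "bat"

;; Non-mutating style: build a new string and leave the input alone.
;; replace-first is a hypothetical helper, not a standard procedure.
(define (replace-first str ch)
  (string-append (string ch)
                 (substring str 1 (string-length str))))

(define original "cat")
(define changed (replace-first original #\h))   ; "hat"; original is still "cat"
```

The second style costs an extra allocation, but it keeps working if strings ever become immutable.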
Scheme has had several standard char- and string- procedures since forever (R2RS). Since R6RS they have been Unicode-aware.
SRFI 13 (String Libraries) is the most popular library for extra convenience. SRFI 130 (Cursor-based string library) is mostly a drop-in replacement, but additionally supports string cursors for walking a string character-by-character while keeping track of the current position.
R7RS: If you don’t need string cursors, you can use the following cond-expand. It will import whichever one of 130 and 13 is available in any given Scheme implementation. Almost all R7RS Schemes come with one or both libraries.
(cond-expand
  ((library (srfi 130)) (import (srfi 130)))
  ((library (srfi 13)) (import (srfi 13))))
char-alphabetic?
char-numeric?
char-whitespace?
char-upper-case?
char-lower-case?
SRFI 175 (ASCII character library) has ASCII-only versions of these.
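The predicates can be combined into a small classifier. This is only a sketch: classify is a hypothetical name, and the case predicates are called only on alphabetic characters because R7RS specifies their argument as a letter.

```scheme
(import (scheme base) (scheme char))

;; classify is a hypothetical helper that names the class of a character.
(define (classify ch)
  (cond ((char-alphabetic? ch)
         (cond ((char-upper-case? ch) 'upper)
               ((char-lower-case? ch) 'lower)
               (else 'letter)))          ; letters without case
        ((char-numeric? ch) 'digit)
        ((char-whitespace? ch) 'space)
        (else 'other)))

(define classes (map classify (string->list "aB 7!")))
;; classes is (lower upper space digit other)
```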
char-ci=?
char-ci<?
char-ci>?
char-ci<=?
char-ci>=?
string-ci=?
string-ci<?
string-ci>?
string-ci<=?
string-ci>=?
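A short sketch of how the case-insensitive procedures differ from the case-sensitive ones, assuming R7RS with (scheme char). The variable names are made up for the example.

```scheme
(import (scheme base) (scheme char))

(define same-char (char-ci=? #\a #\A))                ; #t
(define same-string (string-ci=? "Hello" "hELLO"))    ; #t

;; Case-sensitive ordering compares codepoints: #\B (66) sorts before #\a (97).
(define sensitive (string<? "apple" "Banana"))        ; #f
;; Case-insensitive ordering folds case first.
(define insensitive (string-ci<? "apple" "Banana"))   ; #t
```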
char-upcase
char-downcase
string-upcase
string-downcase
SRFI 129 (Titlecase procedures) has Unicode-aware title-casing.
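For example, assuming R7RS with (scheme char), the case conversion procedures can be used like this. They return new values and do not modify their arguments.

```scheme
(import (scheme base) (scheme char))

(define up-char (char-upcase #\a))                   ; #\A
(define down-char (char-downcase #\A))               ; #\a
(define shouting (string-upcase "Hello, World"))     ; "HELLO, WORLD"
(define quiet (string-downcase "Hello, World"))      ; "hello, world"
```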
string->number and number->string parse and print numbers using Scheme numeric syntax. (number->string number base) can output binary, octal or hexadecimal numbers.
digit-value (standard in R7RS) converts an individual character to a number.
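The following sketch shows these conversions in an R7RS implementation; digit-value comes from (scheme char). The variable names are made up for the example. The letter case of hexadecimal output is not fixed by the standard, so no exact result is shown for it.

```scheme
(import (scheme base) (scheme char))

(define n (string->number "42"))          ; 42
(define from-hex (string->number "ff" 16)) ; 255, using the optional radix
(define bad (string->number "abc"))       ; #f, not valid numeric syntax

(define binary (number->string 255 2))    ; "11111111"
(define octal (number->string 255 8))     ; "377"
(define hex (number->string 255 16))      ; hexadecimal digits for 255

(define seven (digit-value #\7))          ; 7
(define no-digit (digit-value #\x))       ; #f, not a digit
```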
string->utf8 and utf8->string convert between strings and UTF-8 bytevectors, which takes care of most needs.
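A small round-trip sketch, assuming R7RS. The string contains a Greek lambda (U+03BB), written as an escape so the example does not depend on the encoding of the source file. The variable names are made up for the example.

```scheme
(import (scheme base))

(define text "\x3bb;x")                   ; lambda followed by x: 2 characters
(define bytes (string->utf8 text))        ; lambda takes 2 bytes, x takes 1

(define char-count (string-length text))        ; 2
(define byte-count (bytevector-length bytes))   ; 3
(define decoded (utf8->string bytes))           ; equal to text
```

The character count and the byte count differ as soon as the string contains anything outside ASCII.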
Scheme does not have lazy strings (or "ropes"). Doing (string-append a b) makes copies of the underlying bytes of both a and b. The resulting string does not share structure with a or b. This means building new strings is somewhat expensive.
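The lack of shared structure can be observed directly. In this sketch (R7RS assumed, names made up for the example), mutating an input after the append does not affect the result.

```scheme
(import (scheme base))

(define a (string-copy "foo"))
(define b (string-copy "bar"))
(define ab (string-append a b))   ; a fresh string holding "foobar"

(string-set! a 0 #\z)             ; a is now "zoo"
;; ab is still "foobar" because it holds its own copy of the characters.
```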
list->string allows accumulating individual characters into a list and then turning them into a string at the end. This is fast enough for everyday tasks.
(let letters ((cc (char->integer #\a)) (chars '()))
  (if (<= cc (char->integer #\z))
      (letters (+ cc 1) (cons (integer->char cc) chars))
      (list->string (reverse chars))))
Same without the reverse:
(let letters ((cc (char->integer #\z)) (chars '()))
  (if (< cc (char->integer #\a))
      (list->string chars)
      (letters (- cc 1) (cons (integer->char cc) chars))))
R7RS has a standard open-output-string procedure. Writing to a string output port can be faster (and in some cases more convenient) than list->string.
(call-with-port (open-output-string)
  (lambda (out)
    (let letters ((cc (char->integer #\a)))
      (cond ((<= cc (char->integer #\z))
             (write-char (integer->char cc) out)
             (letters (+ cc 1)))
            (else (get-output-string out))))))
read-char (R7RS), a.k.a. get-char (R6RS), reads one character at a time from a string port. In some cases, using a byte port instead of a string port is more resilient against character encoding gotchas: the bytes are read into a bytevector first, and utf8->string is called once at the end instead of constructing the string character-by-character.
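The byte-port approach might look like the following sketch, assuming R7RS. The helper read-all-utf8 and the chunk size of 4096 are choices made for this example, and an in-memory bytevector port stands in for a real file or socket.

```scheme
(import (scheme base))

;; read-all-utf8 is a hypothetical helper: it reads every byte from a
;; binary input port and decodes the whole thing as UTF-8 exactly once.
(define (read-all-utf8 in)
  (let loop ((chunks '()))
    (let ((chunk (read-bytevector 4096 in)))
      (if (eof-object? chunk)
          (utf8->string (apply bytevector-append (reverse chunks)))
          (loop (cons chunk chunks))))))

;; Demo input: an in-memory binary port.
(define text
  (read-all-utf8 (open-input-bytevector (string->utf8 "hello"))))
;; text is "hello"
```

Decoding once at the end avoids splitting a multi-byte character across two reads.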
A string can be iterated by incrementing the character index in a loop:
(define (display-chars s)
  (let ((n (string-length s)))
    (let loop ((i 0))
      (when (< i n)
        (display (string-ref s i))
        (newline)
        (loop (+ i 1))))))
string-ref is a constant-time operation in implementations that store string characters internally as a vector of 32-bit integers. Implementations that store a string as UTF-8 generally have to traverse the string from the beginning for each string-ref.
R7RS has a standard open-input-string procedure. Reading from a string port can be faster than string-ref, depending on the implementation.
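A sketch of port-based iteration, assuming R7RS. The helper port-chars is a hypothetical name; it collects the characters into a list, but the loop body could just as well process each character in place.

```scheme
(import (scheme base))

;; port-chars is a hypothetical helper that walks a string through a
;; string input port instead of indexing it with string-ref.
(define (port-chars s)
  (let ((in (open-input-string s)))
    (let loop ((acc '()))
      (let ((ch (read-char in)))
        (if (eof-object? ch)
            (reverse acc)
            (loop (cons ch acc)))))))

(define chars (port-chars "abc"))   ; (#\a #\b #\c)
```

Whether this beats an index loop depends on how the implementation stores strings, so it is worth measuring before committing to either.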