An Introduction to Scheme and its Implementation - Improving the Simple Interpreter

Go to the first, previous, next, last section, table of contents.

Improving the Simple Interpreter

We can easily improve the little interpreter in lots of ways. [ We should put the code in a file minieval.scm so people can experiment with it. Need to debug it first, of course. It's changed since the one I've used in class. ]

First, we can add a toplevel binding environment, so we can have some variables. (Local variables will be discussed in the next chapter.) To make them useful, we need some special forms, like define and (while we're at it) set!.

We can also add a few more data types; for now, we'll just add booleans.

Here's what our new main dispatch routine looks like:

(define (eval expr)
   (cond ;; variable reference
         ((symbol? expr)
          (eval-variable expr))
         ;; combination OR special form (any parenthesized expr)
         ((pair? expr)
          (eval-list expr))
         ;; any kind of self-evaluating object
         ((self-evaluating? expr)
          expr)
         (else
          (error "Illegal expression: " expr))))

Since we're adding variables to our interpreter, symbols can be expressions by themselves now--references to top-level variable bindings. We've added a branch to our cond to handle that, and a helper procedure eval-variable. (We'll discuss how the variable lookup is done shortly.)

We need to recognize two kinds of self-evaluating types now (and may add more later), so we come up with a procedure self-evaluating? that covers both cases and can easily be extended.

(define (self-evaluating? expr)
   (or (number? expr) (boolean? expr)))

[hmm... haven't extended the reader to handle booleans yet... need to use the standard Scheme reader, or extend micro-read]

Be sure you understand the significance of this. We chose to read numeric literals as Scheme numbers, and boolean literals as Scheme objects. This representation makes them easy to evaluate--we just return the same object that the reader created.

We also need to recognize two basic types of compound expressions: combinations and special forms. These (and only these) are represented as lists, so we can use pair? as a test, and dispatch to eval-list.

What we're really doing here is checking to see whether we're evaluating a parenthesized expression of either kind. Since parenthesized expressions are read in as lists, we can just check to see whether the s-expression is a list. That is, we chose to parse (read) parenthesized expressions as lists, and by checking to see if something's a list, we're implicitly checking whether it was a parenthesized expression in the original code. (We could have chosen to represent parse trees as some other kind of tree, rather than s-expressions, but s-expressions are handy because Scheme provides procedures for manipulating and displaying them.)

Here's the code for eval-list, which just checks to see whether a compound expression is a special form. It dispatches to eval-special-form if it is, or to eval-combo if it's not.

(define (eval-list expr)
   (if (and (symbol? (car expr))
            (special-form-name? (car expr)))
       (eval-special-form expr)
       (eval-combo)))

We could use a cond to check whether symbols are special form names, but using member on a literal list is clearer and easily extensible--you can just add names to the list.

(define (special-form-name? expr)
   (member '(if define set!)))

eval-special-form just dispatches again, calling a routine that handles whatever kind of special form it's faced with. (Later, we'll see prettier ways of doing this kind of dispatching, using first-class procedures.) From here, we've done most of the analysis, and are dispatching to little procedures that actually do the work.

[ need to come back to this after discussing backquote--this would make a good example ]

(define (eval-special-form expr)
  (let ((name (car expr)))
     (cond ((eq? name 'define)
            (eval-define expr))
           ((eq? name 'set!)
            (eval-set! expr))
           ((eq? name 'if)
            (eval-if expr)))))

Notice that each special form has its own special routine to interpret it. This is what makes special forms "special," in contrast to combinations, which can be handled by one procedure, eval-combo.

Once the evaluator has recognized an if expression, it calls eval-if to do the work. eval-if calls eval recursively, to evaluate the condition expression, and depending on the result, calls it again to evaluate the "then" branch or the "else" branch. (One slight complication is that we may have a one-branch else, so eval-if has to check to see if the else branch is there. If not, it just returns #f.)

(define (eval-if expr)
   (let ((expr-length (length expr)))
      (if (eval (cadr expr))
          (eval (caddr expr))
          (if (= expr-length 4))
              (eval (cadddr expr))
              #f)))

[ note that what we're doing includes parsing... one-branch vs. two branch if. Should actually be doing more parsing, checking syntax and signaling errors gracefully. E.g., should check to see that expr-length is a legal length. ]

[ Also note that we're snarfing booleans, and our if behaves like a Scheme if... but we don't have to. We could put a different interpretation on if, e.g., only interpreting #t as a true value. ]

Implementing top-level variable bindings

For a toplevel binding environment, we'll use an association list. (A more serious interpreter would probably use a hash table, but a association list will suffice to demonstrate the principles.)

We start by declaring a variable to hold our interpreter's environment, and initializing it with an empty list.

(define t-l-envt '())

To add bindings, we can define a routine to add an association to the association list.

(define (toplevel-bind! name value)
   (let ((bdg (assoc name t-l-envt))) ;; search for association of name
      ;; if binding already exists, put new value (in cadr) of association
      ;; else create a new association with given value
      (if bdg
          (set-car! (cdr bdg) value)
          (set! t-l-envt
                (cons (list name value) t-l-envt)))))

Recall that the elements of an association list are "associations," which are just lists whose first value is used as a key. We'll use the second element of the list as the actual storage for a variable.

For example, an environment containing just bindings of foo and bar with values 2 and 3 (respectively) would look like ((foo 2) (bar 3)).

At the level of the little Scheme subset we're implementing, we'd draw this environment this way:

         +-------+        [envt]
t-l-envt |   *---+------>+-------+
         +-------+   foo |   *---+---> 2
                         +-------+
                     bar |   *---+---> 3
                         +-------+

This emphasizes the fact that these are variable bindings with values, i.e., named storage locations. Notice that t-l-envt is a variable in the language we're using to implement our interpreter, but foo and bar are variables in the language the interpreter implements.

If we want to show how it's implemented at the level of the Scheme we're writing our interpreter in, we can draw it more like this:

         +-------+
t-l-envt |   *---+---->+---+---+                   +---+---+
         +-------+     | * | *-+------------------>| * | * +
                       +-|-+---+                   +-+-+---+
                         |                           |
                       +---+---+   +---+---+       +---+---+   +---+---+
                       | * | +-+-->| * | * |       | * | *-+-->| * | * |
                       +-|-+---+   +-|-+---+       +-|-+---+   +-+-+---+
                         |           |               |           |
                        foo          2              bar          3

Now we can add the four procedures we had in the math evaluator:

(toplevel-bind! '+ +)
(toplevel-bind! '- -)
(toplevel-bind! '* *)
(toplevel-bind! '/ /)

Again, we're just snarfing procedures straight from the Scheme we're implementing our interpreter in. We put them in our binding environment under the same names.

Now we need accessor routines to get and set values of bindings for variable lookups and set!

(define (toplevel-get name)
   (cadr (assoc name t-l-envt)))

(define (toplevel-set! name value)
   (set-car! (cdr (assoc name t-l-envt))
             value))

[ of course, these really should have some error checking--give examples that signal unbound variable errors? ]

Given this machinery, we can now write eval-variable. (Recall that when eval encounters an expression that's just a symbol, it interprets it as a reference to a variable by calling eval-variable.)

(define (eval-variable symbol)
   (toplevel-get symbol))

We can also define eval-define and eval-set!. All they do is extract a variable name from the define or set! expression, and create binding for that name or update its value. (Recall that eval-define! and eval-set! are called from eval-special-form to interpret expressions of the forms (define ...) or (set! ...), respectively.)

(define (eval-define! expr)
   (toplevel-bind! (cadr expr)
                   (eval (caddr expr))))

(define (eval-set! expr)
   (toplevel-set! (cadr expr)
                  (eval (caddr expr))))

Running the improved interpreter

[ need some example uses ]

Go to the first, previous, next, last section, table of contents.