An Introduction to Scheme and its Implementation - An Interpreter with let and lambda

Go to the first, previous, next, last section, table of contents.

An Interpreter with `let` and `lambda`

In this section, I'll present a new interpreter for a bigger subset of Scheme; it handles all of the essential special forms of Scheme, except for macro definitions. (A macro facility would be easy to add, as well, and would make it easy to implement the remaining special forms by automatic transformation, in terms of the special forms the interpreter "understands" directly. A later chapter will show how to do this.)

The new interpreter is very much like the one from the last chapter, with three important differences:

It implements local binding environments as well as a top-level environment. Evaluating an expression (such as a let) may create a new environment, and subexpressions (such as the let body) can simply be evaluated in the new environment by recursive calls to eval
It allows new procedures to be defined, creating closures. Closures pair environments with code bodies that are interpreted by the interpreter. Calling a closure is much like evaluating a let. The arguments are bound in a local environment (like let variables), and the body is interpreted in that environment.
We will treat special forms differently, binding special form names in much the same way as normal variable names. This will make the interpreter cleaner and more extensible.

Here is our new eval:

(define (eval expr envt)
   (cond ((symbol? expr)
          (eval-symbol expr envt))
         ((pair? expr)
          (eval-list expr envt))
         ((self-evaluating? expr)
          expr)
         (#t
          (error "Illegal expression form" expr))))

Notice that not much has changed---eval still just analyzes expressions and dispatches to more specialized helper procedures that handle particular kinds of expressions.

The important difference is that eval expects an environment argument envt, which represents the binding environment in which to evaluate an expression. That is, the environment argument is used to keep track of the meaning of variable names--what storage they refer to--as teh interpretation process moves in and out of scopes.

Nested Environments and Recursive Evaluation

Instead of using the old "flat" representation of an environment, which was just a table of name-value pairs, we'll represent nested environments as a list of tables, or environment chain.

When we begin interpreting, the environment chain will consist of one table, the top-level environment. When we evaluate a binding construct such as a let, we will create a new table, or environment frame, which binds the local variables. This frame will contain the name-value pairs bound locally, plus a pointer to the next enclosing environment. The environment chain is thus a linked list that acts like a stack, for the most part--new enviornment frames are pushed on the front of the list when entering a binding construct, and popped off the front of the list when exiting it.

We could implement this stack-like behavior with an explicit stack data structure in the interpreter, but it's easier to use the activation "stack" of the language we're using to implement the interpreter. (In this case, that happens to be Scheme, but if we were implementing the interpreter in C, we could use C's activation stack.) We'll just use recursion to evaluate subexpressions, and rely on the language we're implementing the interpreter in to remember where we were in interpreting the enclosing expressions.

At any given point during evaluation, the current environment is the environment referred to by the interpreter's internal variable envt, an in particular the most recent binding of envt.

When we evaluate an expression that doesn't change the interpretive environment, and call eval recursively to evaluate subexpressions, we simply pass the envt variable's value to the recursive calls. This will ensure that the subexpressions execute in the same environment as the enclosing expression expression.

When we evaluate a binding construct, and evaluate subexpressions in that environment, we create a new environment and pass that to the recursive calls to eval, so the subexpressions will execute in the new enviornment instead.

Notice that we don't actually modify the environment chain when creating a new environment--we simply create a new frame which holds a pointer to the old environment, and pass it to the recursive eval. The fact that we don't actually modify the structure of the environment is important--it's will let us implement closure correctly.

When the interpreter returns from evaluating a subexpression, it returns to an enclosing invocation of eval; the old environment will become visible again because we return to an eval where that environment is the value of the envt argument.

For example, consider what happens when we interpret the following expression, starting at the top level:

(let ((foo 1))
   (if (a)
       (let ((bar 2))
          (if (b)
              (c)
              (d))
          (e))
       (f)
   (g))

[ We'll focus on the nested calls to eval corresponding to the nesting of let, if, let, if ]

If we look at the nested calls to eval, we first see a call that evaluates the whole expression in the top-level environment:

                            +-----+
eval  expr: (let...)  envt: |  *--+--> [toplevel envt]
                            +-----+

(I've given a textual representation of the expr argument, but a pictorial representation of the envt argument to eval.)

eval will dispatch to eval-let, passing it the same environment. eval-let will evaluate the initial value expression 1 in that environment, and create a new environment binding foo. (I'll ignore the recursive call to eval to evaluate the argument.) It will then call eval recursively to evaluate the let body in that environment.

I'll depict the nested invocations of eval and eval-let top-to-bottom, showing the stack growing toward the bottom of the picture. (This just turns out to be simpler than drawing the stack growing up.)

                                +-----+
eval     expr: (let...)   envt: |  *--+--> [toplevel envt]
                                +-----+      /|\     /|\
                                              |       |
                                +-----+       |       |
eval-let expr: (let...)   envt: |  *--+-------+       |
                                +-----+               |
                                                      |
                                +-----+               |
eval     expr: (if...)    envt: |  *--+-->  [ [foo 1] * ]
                                +-----+

eval-if will evaluate the condition expression (a) in the given environment. We'll ignore that recursive call to eval, but assume it returns a true value. In that case, eval-if will evaluate its consequent, the inner let expression, by another recursive call to eval.

At this point, the "stack" of invocations of eval, eval-let, and eval-if looks like this:

                                +-----+
eval     expr: (let...)   envt: |  *--+--> [toplevel envt]
                                +-----+      /|\     /|\
                                              |       |
                                +-----+       |       |
eval-let expr: (let...)   envt: |  *--+-------+       |
                                +-----+               |
                                                      |
                                +-----+               |
eval     expr: (if...)    envt: |  *--+---> [ [foo 1] * ]
                                +-----+      /|\  
                                              | 
                                              | 
                                +-----+       |   
eval-if  expr: (if...)    envt: |  *--+-------+       
                                +-----+       |       
                                              |      
                                +-----+       |        
eval     expr: (let...)   envt: |  *--+-------+
                                +-----+

Again, the let will evaluate the intial value expression, 2, by a recursive call to eval, which we will ignore here. Then it will bind bar in a new environment frame, and call eval recursively to evaluate the body in that environment. The body consists of another if, so eval-if will be called, and it will evaluate its argument expression and either the consequent or the alternative in that environment.

Assuming the condition returns true and it evaluates the consequent, (c), here's the "stack" of invocations of eval, eval-let, and eval-if at the point where (c) is evaluated:

                                +-----+
eval     expr: (let...)   envt: |  *--+--> [toplevel envt]
                                +-----+      /|\     /|\
                                              |       |
                                +-----+       |       |
eval-let expr: (let...)   envt: |  *--+-------+       |
                                +-----+               |
                                                      |
                                +-----+               |
eval     expr: (if...)    envt: |  *--+---> [ [foo 1] * ]
                                +-----+      /|\     /|\
                                              |       |
                                              |       |
                                +-----+       |       |
eval-if  expr: (if...)    envt: |  *--+-------+       |
                                +-----+       |       |
                                              |       |
                                +-----+       |       | 
eval     expr: (let...)   envt: |  *--+-------+       |
                                +-----+               |
                                                      |
                                +-----+               |
eval     expr: (if...)    envt: |  *--+---> [ [bar 2] * ]
                                +-----+      /|\  
                                              | 
                                +-----+       | 
eval     expr: (c)        envt: |  *--+-------+    
                                +-----+

[ Note that the pictures above all depict evaluation of nested non-tail expressions. In the case of tail expressions, the "stack" will not include as much information, because the state of the calls to eval, etc., will not be saved before the calls that evaluate subexpressions.

Our interpreter is written in good tail-recursive style, with tail calls to evaluate expressions that are tails of expressions in the language we're interpreting. This means that the intepreter is tail-recursive wherever the program it's implementing is tail-recursive, and since it's implemented in a tail-recursive language (Scheme), we preserve the tail-recurson of the program we're interpreting. In effect, we snarf tail-call optimization from the underlying Scheme system. If we were implementing our interpreter in C, we'd have to use special tricks to preserve tail recursion. We'll show how this can be done later, when we discuss our compiler. ]

Integrated, Extensible Treatment of Special Forms

In the interpreter in the last chapter, we implemented special forms directly in the interpreter---eval-list checked compound expressions to see if they began with a special form name. In effect, we hardcoded the meanings of special form names in the procedure eval-special-form.

In our new interpreter, we'll use a cleaner approach, which treats special form definitions pretty much like variable definitions. This will let us put special forms in particular environments, and use the normal scoping mechanisms to look up the routines that compile them.

This has several advantages. The first is that it makes our interpreter more modular. We can create different environments with different special forms, and use the same interpreter to interpret different languages. That is, we separate out the basic operation of the interpreter from the particular special forms we decide on.

The second advantage is that it will allow us to build an elegant macro facility, so that new special forms can be defined in terms of old ones. (This will be described in detail in [ a later chapter ].)

[ this is out of place, but fwd ref idea anyway? Shorten? Or just move?]

A Scheme interpreter or compiler only needs to "understand" procedure calling and a few basic special forms---lambda, if, set!, quote, and one very special special form for defining new special forms (macros). (We can write cond as a macro using if, let as a macro using lambda, letrec as a macro using let, lambda, and set!, and so on.)

The third advantage is that we can use the same scoping rules for special forms that we use for variables. This will be very convenient later, because we will be able to define local macros, in much the same way we define local procedures.

To support this, we need to represent bindings slightly differently. In the simple interpreter from the last chapter, each binding was just a name-value pair. Now we'll have a third part to each binding, telling what kind of binding it is--a variable binding, a special form binding, or a macro binding.

We can still use associations to represent the bindings. Where the simpler interpreter representing each binding as an association of the form (name value), the new one will use bindings of the form (name type whatever). In the case of a normal variable binding, the "whatever" is the actual value of the variable. In the case of a special form, the "whatever" is the information the interpreter needs to interpret that particular special form, including the procedure to evaluate it. For example, when binding the name let, we can store a pointer to the procedure eval-let right there in the binding information.

Since the exact representation of bindings is irrelevant, and we may want to change it, we'll call the whole thing a binding-info data structure. This reflects that fact that it may not hold just a binding, but also any auxiliary information we want to store.

To abstract away from exactly how bindings are implemented, we'll define several procedures that operate on binding-info's. These include:

bdg-type, which returns a symbol saying what kind of binding it is: <variable> for a normal variable, <special-form> for a built-in special form binding, and <syntax> for a syntax (macro) binding.
bdg-variable-ref, which returns the value of a normal variable binding.
bdg-special-form-evaluator, which returns an evaluation procedure for a special form binding.

For now we'll ignore <syntax> bindings, which will be discussed in a later chapter.

[ give actual code for accessors, etc? ]

Here's our new eval-list for handling compound expressions:

(define (eval-list list-expr envt)
   ;; only try to consider it specially if the head is a symbol
   (if (symbol? (car list-expr))

       ;; look it up in the current lexical environment
       (let ((binding-info (envt-lexical-lookup envt (car list-expr))))

          ;; switch on the type of thing that it is
          (cond ((not binding-info)
                 (error "Unbound symbol" (car list-expr)))
                (else
                 (cond
                    ;; special forms just call the special-form
                    ;; evaluator, which is stored in the binding-info
                    ;; object itself
                   ((eq? (binding-type binding-info) '<special-form>)
                    ((bdg-special-form-evaluator binding-info) list-expr
                                                               envt))

                   ((eq? (binding-type binding-info) '<variable>)
                    (eval-combo (bdg-variable-ref binding-info)
                                (cdr list-expr)
                                envt))
                   ((eq? (binding-type binding-info) '<syntax>)
                    (eval-macro-call (bdg-syntax-transformer binding-info)
                                     list-expr
                                     envt))
                   (else
                    (error "Unrecognized binding type"))))))

         ;; the head of the list is not a symbol, so evaluate it
         ;; and then do an eval-combo to evaluate the args and
         ;; call the procedure
         (eval-combo (eval (car list-expr) envt)
                     (cdr list-expr)
                     envt)))

eval-list first checks to see whether the head of the list is a symbol; if not, it's just a combination (procedure call expression), and is handled by eval-combo. (Remember that a combination can have an arbitrary expression as its operator, and that expression is assumed to return a procedure to call.)

If it is a symbol, the binding of the variable is looked up. If it's a special form binding, the evaluation procedure is extracted from the binding info, and called to evaluate the expression.

If the head of the list is just the name of a normal variable, that's also just a combination, and eval-combo is called in that case, too.

If the head of the list is the name of a syntax binding (macro), we call eval-macro-call to deal with it; don't worry about this for now--it will be discussed in detail in Chapter [ whatever ].

Notice that in all cases, the environment is passed along unchanged to whatever procedure handles the expression.

Go to the first, previous, next, last section, table of contents.