let
and lambda
In this section, I'll present a new interpreter for a bigger subset of Scheme; it handles all of the essential special forms of Scheme, except for macro definitions. (A macro facility would be easy to add, as well, and would make it easy to implement the remaining special forms by automatic transformation, in terms of the special forms the interpreter "understands" directly. A later chapter will show how to do this.)
The new interpreter is very much like the one from the last chapter, with three important differences:
let
) may
create a new environment, and subexpressions (such as the let
body) can simply be evaluated in the new environment by recursive
calls to eval
let
.
The arguments are bound in a local environment (like let
variables), and the body is interpreted in that environment.
Here is our new eval
:
(define (eval expr envt) (cond ((symbol? expr) (eval-symbol expr envt)) ((pair? expr) (eval-list expr envt)) ((self-evaluating? expr) expr) (#t (error "Illegal expression form" expr))))
Notice that not much has changed---eval
still just analyzes
expressions and dispatches to more specialized helper procedures that
handle particular kinds of expressions.
The important difference is that eval
expects an environment
argument envt
, which represents the binding environment in
which to evaluate an expression. That is, the environment
argument is used to keep track of the meaning of variable names--what
storage they refer to--as teh interpretation process moves in and
out of scopes.
Instead of using the old "flat" representation of an environment, which was just a table of name-value pairs, we'll represent nested environments as a list of tables, or environment chain.
When we begin interpreting, the environment chain will consist
of one table, the top-level environment. When we evaluate a binding
construct such as a let
, we will create a new table, or
environment frame, which binds the local variables. This
frame will contain the name-value pairs bound locally, plus a pointer
to the next enclosing environment. The environment chain is thus
a linked list that acts like a stack, for the most part--new enviornment
frames are pushed on the front of the list when entering a binding
construct, and popped off the front of the list when exiting it.
We could implement this stack-like behavior with an explicit stack data structure in the interpreter, but it's easier to use the activation "stack" of the language we're using to implement the interpreter. (In this case, that happens to be Scheme, but if we were implementing the interpreter in C, we could use C's activation stack.) We'll just use recursion to evaluate subexpressions, and rely on the language we're implementing the interpreter in to remember where we were in interpreting the enclosing expressions.
At any given point during evaluation, the current environment
is the environment referred to by the interpreter's internal
variable envt
, an in particular the most recent binding of
envt
.
When we evaluate an expression that doesn't change the interpretive
environment, and call eval
recursively to evaluate subexpressions,
we simply pass the envt
variable's value to the recursive
calls. This will ensure that the subexpressions execute in the
same environment as the enclosing expression expression.
When we evaluate a binding construct, and evaluate subexpressions in
that environment, we create a new environment and pass that
to the recursive calls to eval
, so the subexpressions will
execute in the new enviornment instead.
Notice that we don't actually modify the environment chain when creating
a new environment--we simply create a new frame which holds a pointer
to the old environment, and pass it to the recursive eval
. The
fact that we don't actually modify the structure of the environment
is important--it's will let us implement closure correctly.
When the interpreter returns from evaluating a subexpression, it
returns to an enclosing invocation of eval
; the old
environment will become visible again because we return to
an eval
where that environment is the value of the envt
argument.
For example, consider what happens when we interpret the following expression, starting at the top level:
(let ((foo 1)) (if (a) (let ((bar 2)) (if (b) (c) (d)) (e)) (f) (g))
[ We'll focus on the nested calls to eval corresponding to the nesting of let, if, let, if ]
If we look at the nested calls to eval
, we first see a call
that evaluates the whole expression in the top-level environment:
+-----+ eval expr: (let...) envt: | *--+--> [toplevel envt] +-----+
(I've given a textual representation of the expr
argument, but
a pictorial representation of the envt
argument to eval
.)
eval
will dispatch to eval-let
, passing it the same
environment. eval-let
will evaluate the initial value expression
1
in that environment, and create a new environment binding
foo
.
(I'll ignore the recursive call to eval
to evaluate the argument.)
It will then call eval
recursively to evaluate the let
body in that environment.
I'll depict the nested invocations of eval
and eval-let
top-to-bottom, showing the stack growing toward the bottom of the
picture. (This just turns out to be simpler than drawing the stack
growing up.)
+-----+ eval expr: (let...) envt: | *--+--> [toplevel envt] +-----+ /|\ /|\ | | +-----+ | | eval-let expr: (let...) envt: | *--+-------+ | +-----+ | | +-----+ | eval expr: (if...) envt: | *--+--> [ [foo 1] * ] +-----+
eval-if
will evaluate the condition expression (a)
in the given environment. We'll ignore that recursive call to
eval
, but assume it returns a true value. In that case,
eval-if
will evaluate its consequent, the inner let
expression, by another recursive call to eval
.
At this point, the "stack" of invocations of eval
,
eval-let
, and eval-if
looks like this:
+-----+ eval expr: (let...) envt: | *--+--> [toplevel envt] +-----+ /|\ /|\ | | +-----+ | | eval-let expr: (let...) envt: | *--+-------+ | +-----+ | | +-----+ | eval expr: (if...) envt: | *--+---> [ [foo 1] * ] +-----+ /|\ | | +-----+ | eval-if expr: (if...) envt: | *--+-------+ +-----+ | | +-----+ | eval expr: (let...) envt: | *--+-------+ +-----+
Again, the let
will evaluate the intial value expression, 2
,
by a recursive call to eval
, which we will ignore here. Then
it will bind bar
in a new environment frame, and call eval
recursively to evaluate the body in that environment. The body consists
of another if, so eval-if
will be called, and it will evaluate
its argument expression and either the consequent or the alternative
in that environment.
Assuming the condition returns true and it evaluates the consequent,
(c)
, here's the "stack" of invocations of eval
,
eval-let
, and eval-if
at the point where (c)
is
evaluated:
+-----+ eval expr: (let...) envt: | *--+--> [toplevel envt] +-----+ /|\ /|\ | | +-----+ | | eval-let expr: (let...) envt: | *--+-------+ | +-----+ | | +-----+ | eval expr: (if...) envt: | *--+---> [ [foo 1] * ] +-----+ /|\ /|\ | | | | +-----+ | | eval-if expr: (if...) envt: | *--+-------+ | +-----+ | | | | +-----+ | | eval expr: (let...) envt: | *--+-------+ | +-----+ | | +-----+ | eval expr: (if...) envt: | *--+---> [ [bar 2] * ] +-----+ /|\ | +-----+ | eval expr: (c) envt: | *--+-------+ +-----+
[ Note that the pictures above all depict evaluation of nested non-tail
expressions. In the case of tail expressions, the "stack" will not
include as much information, because the state of the calls to eval
,
etc., will not be saved before the calls that evaluate subexpressions.
Our interpreter is written in good tail-recursive style, with tail calls to evaluate expressions that are tails of expressions in the language we're interpreting. This means that the intepreter is tail-recursive wherever the program it's implementing is tail-recursive, and since it's implemented in a tail-recursive language (Scheme), we preserve the tail-recurson of the program we're interpreting. In effect, we snarf tail-call optimization from the underlying Scheme system. If we were implementing our interpreter in C, we'd have to use special tricks to preserve tail recursion. We'll show how this can be done later, when we discuss our compiler. ]
In the interpreter in the last chapter, we implemented special forms
directly in the interpreter---eval-list
checked compound
expressions to see if they began with a special form name. In effect,
we hardcoded the meanings of special form names in the procedure
eval-special-form
.
In our new interpreter, we'll use a cleaner approach, which treats special form definitions pretty much like variable definitions. This will let us put special forms in particular environments, and use the normal scoping mechanisms to look up the routines that compile them.
This has several advantages. The first is that it makes our interpreter more modular. We can create different environments with different special forms, and use the same interpreter to interpret different languages. That is, we separate out the basic operation of the interpreter from the particular special forms we decide on.
The second advantage is that it will allow us to build an elegant macro facility, so that new special forms can be defined in terms of old ones. (This will be described in detail in [ a later chapter ].)
[ this is out of place, but fwd ref idea anyway? Shorten? Or just move?]
A Scheme interpreter or compiler only needs to "understand"
procedure calling and a few basic special forms---lambda
,
if
, set!
, quote
, and one very special
special form for defining new special forms (macros). (We can write
cond
as a macro using if
, let
as a macro using
lambda
, letrec
as a macro using let
, lambda
,
and set!
, and so on.)
The third advantage is that we can use the same scoping rules for special forms that we use for variables. This will be very convenient later, because we will be able to define local macros, in much the same way we define local procedures.
To support this, we need to represent bindings slightly differently. In the simple interpreter from the last chapter, each binding was just a name-value pair. Now we'll have a third part to each binding, telling what kind of binding it is--a variable binding, a special form binding, or a macro binding.
We can still use associations to represent the bindings. Where the
simpler interpreter representing each binding as an association of
the form (name value)
, the new one will use bindings of
the form (name type
whatever)
. In the case of a
normal variable binding, the "whatever" is the actual value of the
variable. In the case of a special form, the "whatever" is the
information the interpreter needs to interpret that particular special
form, including the procedure to evaluate it. For example, when binding
the name let
, we can store a pointer to the procedure eval-let
right there in the binding information.
Since the exact representation of bindings is irrelevant, and we may
want to change it, we'll call the whole thing a binding-info
data structure. This reflects that fact that it may not hold just
a binding, but also any auxiliary information we want to store.
To abstract away from exactly how bindings are implemented, we'll define
several procedures that operate on binding-info
's. These include:
bdg-type
, which returns a symbol saying what kind of binding
it is: <variable>
for a normal variable, <special-form>
for a built-in special form binding, and <syntax>
for a syntax
(macro) binding.
bdg-variable-ref
, which returns the value of a normal variable
binding.
bdg-special-form-evaluator
, which returns an evaluation procedure
for a special form binding.
For now we'll ignore <syntax>
bindings, which will be discussed
in a later chapter.
[ give actual code for accessors, etc? ]
Here's our new eval-list
for handling compound expressions:
(define (eval-list list-expr envt) ;; only try to consider it specially if the head is a symbol (if (symbol? (car list-expr)) ;; look it up in the current lexical environment (let ((binding-info (envt-lexical-lookup envt (car list-expr)))) ;; switch on the type of thing that it is (cond ((not binding-info) (error "Unbound symbol" (car list-expr))) (else (cond ;; special forms just call the special-form ;; evaluator, which is stored in the binding-info ;; object itself ((eq? (binding-type binding-info) '<special-form>) ((bdg-special-form-evaluator binding-info) list-expr envt)) ((eq? (binding-type binding-info) '<variable>) (eval-combo (bdg-variable-ref binding-info) (cdr list-expr) envt)) ((eq? (binding-type binding-info) '<syntax>) (eval-macro-call (bdg-syntax-transformer binding-info) list-expr envt)) (else (error "Unrecognized binding type")))))) ;; the head of the list is not a symbol, so evaluate it ;; and then do an eval-combo to evaluate the args and ;; call the procedure (eval-combo (eval (car list-expr) envt) (cdr list-expr) envt)))
eval-list
first checks to see whether the head of the list
is a symbol; if not, it's just a combination (procedure call expression),
and is handled by eval-combo
. (Remember that a combination
can have an arbitrary expression as its operator, and that expression
is assumed to return a procedure to call.)
If it is a symbol, the binding of the variable is looked up. If it's a special form binding, the evaluation procedure is extracted from the binding info, and called to evaluate the expression.
If the head of the list is just the name of a normal variable, that's
also just a combination, and eval-combo
is called in that
case, too.
If the head of the list is the name of a syntax binding (macro), we
call eval-macro-call
to deal with it; don't worry about
this for now--it will be discussed in detail in Chapter [ whatever ].
Notice that in all cases, the environment is passed along unchanged to whatever procedure handles the expression.