Most programming languages have a standard facility for defining
new types of records or structures. A record is
an object with named fields. For example, we might define a
point
record type, to represent geometric points in a plane.
Each point
object might have a x
field and a y
field, giving the horizontal and vertical coordinates of a point
relative to the origin. Once this point
class is defined,
we can create instances of it, i.e., actual objects of type
point, to represent particular points in space.
Scheme is an unusual language in that there is not a standard facility for defining new types. We can build a type-definition facility, however, using macros.
In this chapter, I'll show a simple record definition facility written in Scheme. Then I'll describe a simple object-oriented programming system for Scheme, and show how it can be implemented in Scheme, too. (Both of these systems rely on Lisp-style macros, which are not standard Scheme, but are available in almost every Scheme implementation.)
Scheme's main abstraction mechanism is procedural abstraction. We can define procedures that represent higher-level operations, i.e., operations not built into the language, but which are useful for our purposes. We can construct abstract data types, which are data types that represent higher-level concepts (such as points in a coordinate space), and use procedures to implement the operations.
For example, we can fake a point
data type by hand, by writing
a set of procedures that will construct point objects and access
their fields. We can choose a representation of points in terms
of preexisting Scheme types, and write our procedures accordingly.
For example, we can use Scheme vectors to represent points, with
each point represented as a small vector, with a slot for the x
field and a slot for the y field. We can write a handful of
procedures to create and operate on instances of our point
data type, which will really allocate Scheme vectors and operate
on them in ways that are consistent with our higher-level point
abstraction.
We start with a constructor procedure make-point
, which will
create ("construct") a point object and initialize its x
and
y
fields. It really allocates a Scheme vector. The zeroth slot
of the vector holds the symbol point
, so that we can tell it
represents a point object.
; a point is represented as a three-element vector, with the 0th ; slot holding the symbol point, the 1st slot representing ; the x field,, and the 2nd slot representing the y field. (define (make-point x y) (vector 'point x y))
We also define a predicate for testing whether an object is a
point
record. It checks to see if the object is actuall
a Scheme vector and its zeroth slot holds the symbol point
.
This isn't perfect--we might mistake another vector that happens
to hold that symbol in its zeroth slot for a point, but we'll
ignore that for now. (It's easy to fix, and we'll fix it later
when we build a more sophisticated object system.)
; check to see if something is a point by checking to see if it's ; a vector whose 0th slot holds the symbol point. (define (point? obj) (and (vector? obj) (eq? (vector-ref obj 0) 'point)))
Now we define accessor procedures to get and set the x and y fields of our points--the 1st and 2nd slots of the vector we use to represent a point.
; accessors to get and set the value of a point's x field. (define (point-x obj) (vector-ref obj 1)) (define (point-x-set! obj value) (vector-set obj 1 value)) ; accessors to get and set the value of a point's y field. (define (point-y obj) (vector-ref obj 2)) (define (point-y-set! obj) (vector-set! obj 2 value))
This isn't perfect, either--we should probably test to make sure
an object is a point
before operating on it as a point
.
For example, point-x
should be more like this:
(define (point-x obj) (if (point? obj) (vector-ref obj 1) (error "attempt to apply point-x to a non-point)))
Once we've defined the procedures that represent operations on
an abstract data type, we can ignore how it's implemented--we
no longer have to worry about how point
s are represented.
We can also change the implementation of an abstract data type by redefining the procedures that create and operate on instances of that type.
For example, we could decide to represent points as lists, rather than vectors, and redefine the constructor, predicate, and accessors to use that representation.
We could also change the representation to polar form, rather than Cartesian, storing a direction and distance from the origin rather than x and y distances. With a polar representation, we could still support the operations that return or set x coordinates, using trigonometric functions to compute them from the direction and distance.
As I just showed, it's easy to define an abstract data type in Scheme, by hand, using procedural abstraction. Doing this for every abstract data type is very tedious, however, so it would be good to automate the process and provide a declarative interface to it.
We'd like to be able to write something like this:
(define-structure point x y)
and have Scheme automatically construct the constructor, type predicate,
and accessor procedures for us. In most languages, this is done by
the compiler, but we can tell Scheme how to do it by defining
define-structure
as a macro. Whenever the interpreter or
compiler encounters a define-structure
form, our macro
transformation procedure will be called and will generate the
relevant procedures, which will then be interpreted or compiled
in place of the define-structure
form.
We'll use a define-macro
(Lisp-style) macro for this.
this macro will intercept each define-structure
form,
analyze it, and produce an s-expression that is a sequence of
procedure definitions to be interpreted or compiled. Each
define-structure form will be translated into a begin
form containing a series of procedure definitions.
; define-struct is a macro that takes a struct name and any number of field ; names, all of which should be symbols. Then it generates a begin expression ; to be compiled, where the begin expression contains the constructor for this ; structure type, a predicate to identify instances of this structure type, ; and all of the accessor definitions for its fields. (define-macro (define-struct struct-name . field-names) ; analyze the macro call expression and construct some handy symbols ; and an s-expression that will define and record the accessor methods. (let* ((maker-name (string->symbol (string-append "make-" (symbol->string struct-name)))) (pred-name (string->symbol (string-append (symbol->string struct-name) "?"))) (accessor-defns (generate-accessor-defns struct-name field-names))) ; return an s-expression that's a series of definitions to be ; interpreted or compiled. `(begin (define (,maker-name ,@field-names) (vector ',struct-name ,@field-names)) (define (,pred-name obj) (and (vector? obj) (eq? (vector-ref obj 0) ,struct-name))) ,@accessor-defns)))
To generate all of the accessor definitions, we call a special helper
routine, generate-accessor-defns
, and splice the result into
the sequence of definitions using unquote-splicing
(,@
).
generate-accessor-definitions
simply iterates over the
list of slot names tail-recursively (using named let
), consing
two definitions onto the definitions for the rest of the slots:
; generate-accessor-defns generates a list of s-expressions that ; define the accessors (getters and setters) for a structure. (define (generate-accessor-defns structname fnames) (let ((structname-string (symbol->string structname))) ; loop over the fieldnames, and for each fieldname, generate two ; s-expressions: one that is a definition of a getter, and one that's ; a definition of a setter. ; As we loop, increment a counter i so that we can use it as the index ; for each slot we're generating accessors for (let loop ((fieldnames fnames) (i 1)) (if (null? fieldnames) '() ; take a fieldname symbol, convert to string, append it to the ; struct name string with a hyphen in the middle, and convert ; that to a symbol... (let* ((fieldname-string (symbol->string (car fieldnames))) (getter-name (string->symbol (string-append structname-string "-" fieldname-string))) (setter-name (string->symbol (string-append structname-string "-" fieldname-string "-set!")))) ; now construct the define forms and cons them onto the ; front of the list of the remaining define forms, generated ; iteratively (tail-recursively) (cons `(define (,getter-name obj) (vector-ref obj ,i)) (cons `(define (,setter-name obj value) (vector-set! obj ,i value)) (loop (cdr fieldnames) (+ i 1)))))))))
In this section, I'll discuss a simple object system and how it it used. This object system is not part of Standard Scheme, but can be [almost entirely ?] implemented in portable Scheme, and used in any Scheme system with a reasonably powerful macro system.
The object system is based on classes and generic procedures. It is a subset of the RScheme object system, and its basic functionality is similar to a subset of CLOS object system for Common Lisp, the Dylan object system, Meroon, TinyCLOS, and STkLOS.
One of the major features of object-based and object-oriented programming is late binding of methods, which means that we can write code in terms of abstract operations without knowing exactly which concrete operations will be executed at run time.
For example, consider a graphical program that maintains a list of objects whose graphical reprsentations are visiblle on the user's screen, and periodically redraws those objects. It might iterate over this "display list" of objects, applying a drawing routine to each object to display it on the screen. In most interesting applications, there would be a variety of graphical object types, each of which is drawn in a different ways.
If our graphical objects are represented as traditional records, such
as C structs or Pascal records, the drawing routine must be modified
each time a new graphical type is added to the program. For example,
suppose we have a routine draw
which can draw any kind
of object on the screen. draw
might be written with
a case expression, like this:
(define (draw obj) (cond ((triangle? obj) (draw-triangle obj)) ((square? obj) (draw-square obj)) ((circle? obj) (draw-circle obj)) ; more branches... ; . ; . ; . ((regular-pentagon? obj) (draw-regular-pentagon obj))
Each time we define a new kind of record that represents a graphical
object, we must add a branch to this cond
to check for that
kind of object, and call the appropriate drawing routine.
In large, sophisticated programs that deal with many kinds of objects,
the code may be littered with cond
or case
statements
like this, which represent abstract operations, and map them
onto concrete operations for specific types. (This example maps
the abstract operation "draw an object" onto concrete operations
like draw-triangle
, draw-square
, and so on.)
Such code is very difficult to maintain and extend. Whenever a new
type is added to the system, all of the cond
or case
expressions that could be affected must be located and modified.
What we would like is a way of specifying how an abstract operation
is implemented for a particular kind of object, and having the system
keep track of the details. For example, we'd like to say at one
point in the program, "here's how you draw a regular pentagon,"
and then be able to use regular pentagons freely. We can then use
the abstract operation draw
, and rely on the system to automatically
check what kind of object is being drawn, find the appropriate drawing
routine for that type, and call it to draw that particular object.
For example, the routine that draws all of the visible objects might just look like this:
(map draw display-list)
When we later add a new type, such as irregular-hexagon
, we can
just define a method for drawing irregular hexagons, and the
system will automatically make the draw
operation work for
irregular hexagons. We don't have to go find all of the code that
might encounter irregular hexagons and modify it.
This feature is called late binding of methods. When we write code that uses an abstract operation, we don't have to specify exactly what concrete operation should be performed.
(Note: here we're using
a fairly general sense of the word "binding," which is more
general than the notion of variable binding. We're making an association
between a piece of code and the operation it represents, rather than
between a name and a piece of storage. In this general sense, "binding"
means to associate something with something else, and in this example,
we associating the abstract operation draw
with the particular
procedure needed to draw a particular object at run time.)
As we'll see a little later, we can define a generic procedure
that reprsents the abstract draw
operation, and rely on an object
system to bind that abstract operation to the appropriate drawing
procedure for a particular type at run time. When we later define
new types and methods for drawing them, the generic procedure will be
automatically updated to handle them. This lets us write most of
our code at a higher level of abstraction, in terms of operations
that "just work" for all of the relevant types. (E.g., we might
have abstract operations that can draw, move, and hide any kind
of graphical object, so that we don't need to worry about the differences
between the different kinds of graphical objects if those differences
don't matter for what we're trying to do.)
A class is an object that describes a particular kind of object. A class definition is an expression like a record or structure definition, which defines the structure of that kind of object. Classes can also have associated behavior or methods, which are routines for performing particular operations on instances of a class.
For example, suppose we would like to have a class of objects that
can be used to represent points in two-dimensional space. Each
point object will have an x
slot and a y
slot, which
hold the object's position in the x and y dimensions.
(A slot is a field of an object, which in other languages may be known as an instance variable, a data member, an attribute, or a feature.)
We can define our point class like this:
(define-class <point> (<object>) (x init-value: 0) (y init-value: 0))
Here we have chosen to name the class <point>
. By convention,
we use angle brackets to begin and end the names of classes, so that
it's clear that they are class names, not names of normal objects.
The parenthesized expression after the class name <point>
is
a sequence of superclass names, which will be explained
later.(12)
(When in doubt, it is a good idea to use <object>
as
the sole superclass, so use (<object>)
after the class
name in the class definition.)
The two remaining fields after the superclasses are the slot
specifications, which say what kinds of fields an instance of
<point>
will have. A slot specification is written
in parentheses, and the first thing is the name of the slot.
After that come keyword/value pairs. Here we use
the keyword init-value:
followed by the value 0
.
The specification (x init-value: 0)
says that each instance
of <point> will have a slot (field) named x
, and that the
initial value of the field is 0
. That is, when we create a
<point>
instance to represent a 2-d point, the initial
x value will be zero. Likewise, the slot specification
(y init-value 0
says that each point will also have a
y
slot whose initial value is 0
.
We can create an instance of an object by using the special
form make
, which is actually implemented as a macro.
The make
operation takes a class as its first argument,
and returns a new object that is an instance of that class.
To make a <point>
, we might use the make
expression
(make <point>)
This expression returns a new point whose x and y slots are initialized to zero.
If we want the slots of an object to be initialized to a requested
value at the time the object is initialized--rather than always
being initialized the to the same value for every object, we
can omit the initial value specification in the class definition,
and provide it to the make
call that creates an object.
(define-class <point> (<object>) (x) (y))
Given this class definition, we can use make
to create
a <point>
instance with particular x and y values:
(define my-point (make <point> x: 10 y: 20))
Here we've created a point object with an x value of 10 and a y
value of 20. Note that the x value is labeled by a keyword
x:
. As in a class definition, a keyword argument to
make
looks sort of like an argument, but it really isn't:
it's the name of the following argument.
Keyword arguments to define-class
and make
let
you write the arguments in any order, by giving the name
before the value. We could have written the above call to
make
with the values in the opposite order:
(define my-point (make <point> y: 20 x: 10))
The result of this definition is exactly the same as the
earlier one. The make
macro will sort out the arguments,
looking at the keyword to figure out what the following arguments
are for.
By default, when we define a class with slots x
and y
,
we implicitly define operations on those fields of those objects.
For each field, two routines are defined, a getter, which
fetches the value of the field, and a setter, which sets
the value of the field. The name of the getter is just
the name of the field. The name of the setter starts with set-
,
followed by the name of the field, followed by an exclamation point to
indicate that the operation is destructive (i.e., modifies the state
of the object by replacing an old value with a new one.)
Given the point we created, we can ask the value of its x
field by evaluating the expression (x my-point)
, which
will return 10
. We can change teh value to 50 by evaluating
the expression (set-x! my-point 50)
. We can increment
it by 1 with the expression
(set-x! my-point (+ 1 (x my-point)))
Different kinds of objects can have fields with the same name, and the getters and setters will operate on the appropriate field of whatever kind of object they are applied to. (Accessors are actually generic procedures, which will be explained later.)
A generic procedure is a procedure that does a certain operation, but may do it in different ways depending on what kind of argument it is given. A generic procedure can be specialized, telling it how to perform a particular kind of operation for a particular kind of argument.
A method definition specifies how a generic operation should be done for an object of a particular type. Conceptually, a generic function keeps track of all of the methods which perform a particular operation on different kinds of objects. A generic procedure is called just like any other function, but the first thing it does is to look up the appropriate method for the kind of object being operated on. Then it applies that method. A generic procedure is therefore a kind of dispatcher, which maps abstract operations onto the actual procedures for performing them.
For example, suppose we would like to define several classes, @code<stack>, @code<queue>, and @code<d-e-queue>, to represent stacks, queues, and double-ended queues, respectively.
We could define stack this way:
(define-class <stack> (<object>) (items init-value: '()) ; list of items in the stack
An instance of <stack>
has one field, items
, which
points to a list of items in the stack. We can push items onto the
stack by cons
ing them onto the front of its list of items, or
pop items off of the stack by cdr
ing the list.
To define the behavior of <stack>
---and things like stacks--we
need some generic procedures, insert-first!
and remove-first!
.
These will add an item to the front (top) of a stack, or remove
and return the item from the front (top) of a stack, respectively.
(define-generic (insert-first! obj item)) (define-generic (remove-first! obj))
These two generic procedures define "generic operations" which may be supported by different classes, but do semantically "the same thing." That is, the generic procedures don't represent how to do a particular kind of operation on a particular kind of object, but instead represent a general kind of operation that we can define for different kinds of objects.
This pair of generic procedures therefore acts as an abstract data type, which represents object that can behave as stacks. The don't say how any particular implementation of stacks works.
To make the generic operations work for the particular class
<stack>
, we need to define methods that say
how to perform the insert-first!
and
remove-first!
operations on objects that are instances
of class <stack>
.
For this, we use the macro define-method
. Here's the
definition of the insert-first!
operation for the class
<stack>
:
(define-method (insert-first! (self <stack>) item) (set-items! self (cons item (items self))))
This method definition is very much like a procedure
definition. Here we're defining a method that takes two
arguments, named self and item
. The calling form
(insert-first! (<stack> self) item)
says that this is
the particular procedure to use for the generic procedure
insert-first!
operation when it's given two arguments,
and the first argument is an instance of class <stack>
.
That is, we're defining a procedure of two arguments, self
and item
, but we're also saying that this procedure is to
be used by the generic procedure insert-first!
only when its first argument is a stack. (The names self
and item
were chosen for convenience--as with a normal
procedure, we can name arguments anything we want.)
Given this definition, when insert-first!
is called
with two arguments, and the first is a stack, this procedure
will be executed to perform the operation in the appropriate
way for stacks. We say that we are specializing the
generic procedure insert-first!
for instances of the
class <stack>
.
The body of this method definition refers to the stack being
operated on as self
, the name given as the first
argument name; it refers to the second argument, which is
being pushed on the stack, as item
. The body
of the method is
(set-items! self (cons item (items self)))
which relies on the getter and setter implicitly defined for
the items
slot by the class definition. It fetches
the value of the head
slot of self
using
head
, cons
es the argument item
onto
that list, and assigns the result to the head slot using
set-head!
.
The method for the generic procedure remove-first!
when
applied to stacks could be defined like this:
(define-method (remove-first! (self <stack>)) (let ((first-item (car (items self)))) (set-items! (cdr (items self)))))
Now let's implement a queue data type. Like a stack, a queue data
type allows you to push an item on the front of an ordered sequence
of items--it supports the insert-first!
operation.
However, a queue doesn't let you add items to the front--it only
lets you add items to the rear. So our <queue>
class should
support remove-first!
, like <stack>
, but insert-last!
instead of insert-first!
.
This means that we can define a method for <queue>
on the
remove-first!
generic procedures, but we need a new generic
procedure insert-last!
, which represents the abstract operation
of removing the last item from an ordered sequence.
(define-generic insert-last!)
The pair of generic operations insert-last!
and remove-first!
represent the abstract datatype of queues and things that can behave
like queues.
To actually implement queues, we need a class definition and some method definitions, to say how a queue should be represented, and how the queue operations should be done on it.
For a queue, it's good for accesses to be fast at either end, so we'll
want a doubly-linked list, rather than a simple list of pairs.
Here's a class definition for <queue>
:
(define-class <queue> (<object>) (front '()) (rear '()))
Each <queue>
s keep a pointer to the beginning of the linked list
and a pointer to the end of the linked list. The queue itself is
structured as a doubly-linked list of queue nodes, each of which has a
pointer to an item that's conceptually in the queue, plus a
next
pointer to the next doubly-linked list node, and a prev
pointer to the previous one.
To implement the doubly-linked list, we'll use a helper class to implement
the list nodes, called <d-l-list-node>
.
(define-class <d-l-list-node> (<object>) (item) (next) (prev))
This definition will implicitly define setters and getters for the
fields, e.g., set-next!
and set-next!
for the next
field of a <d-l-list-node>
.
Now we can define the methods for the remove-first!
and
insert-last!
operations on instances of <queue>
.
(define (insert-last! (self <queue>) item) (let ((new-node (make <d-l-list-node> item: item prev: (rear self)) next: '()))) (cond ((null? (front self)) ; nothing in queue yet? (set-front! self new-node) ; this will be first (else ; otherwise (set-next! (rear self) new-node))) ; append to rear of list (set-rear! self new-node)))) ; update rear pointer
(define (remove-first! (self <queue>)) (let ((first-node (front self))) (if (null? first-node) (error "attempt to remove-first! from an empty queue:" self) (let* ((first-item (item first-node)) (rest (next first-node))) (cond((null? rest) ; no nodes left in queue? (set-front! self '()) (set-rear! self '())) (else (set-prev! rest '()) (set-front! self rest)))))))
Note that what stacks and queues both support the abstract operation of removing the first item, but each does it in a different way--the same operation (generic procedure) is implemented by different code (methods).
A generic procedure is a procedure, like any other--it is a first-class object that happens to be callable as a procedure. You can therefore use store generic procedures in data structures, pass them as arguments to other procedures, and so on.
For example, in a graphical program, we may have a generic draw
procedure to display any kind of graphical object, and each class of
graphical object may have its own draw
method. By mapping the
generic procedure draw
over a list of graphical objects, like this,
(map draw list-of-objects-to-be-drawn)
we can invoke the appropriate draw method for each kind of object.
In our system, classes are also first class. When we use define-class
to define a class named <point>
, we are actually doing two things:
we are creating a special kind of object to represent the class,
and we are defining a variable named <point>
initialized
with a pointer to the class object.
In this section, I'll present a simple implementation of the simple object system described so far. Our object system is based on metaobjects, i.e., objects which represent or manipulate other objects such as class instances and methods. (The meta- is Greek for "about," "beyond," or "after".
In programming language terminology, metaobjects are objects that are "about" other objects or procedures. The two most important kinds of metaobjects are class objects and generic procedure objects. A class object represents instances of a particular class, and a generic procedure object represents a generic operation.
Metaobjects control how other objects behave. For example, a class object controls how instances of the class are constructed, and a generic procedure object controls when and how the particular methods on that generic procedure are invoked to do the right thing for particular kinds of objects.
A big advantage of the metaobject approach is that since metaobjects are just objects in the language, we can implement most or all of the object system in the language--in this section, we'll show how to implement a simple object system for Scheme, in portable Scheme. (We will rely on macros, which some versions of Scheme don't support yet, however.) An advantage of writing a Scheme object system in Scheme is that we can modify and extend the object system without having to change the compiler.
We will use macros to translate class, generic procedure, and method definitions into standard Scheme data structures and procedures. A class object in our system is just a data structure, for which we'll use a vector (one-dimensional array) as the main representation. A class object will record all of the information necessary to create instances of that class.
Instances of a class will also be represented as Scheme vectors. Each slot of an object will be represented as a field of a vector, and we'll translate slot names into vector indexes.
Generic procedures will be represented as Scheme procedures, constructed in a way that lets us define methods--each generic procedure will maintain a table of methods indexed by which classes they work for. When a generic procedure is called in the normal way, it check the class of the object it's being applied to, and will search its table of methods for the appropriate method, and call that method, passing along the same arguments. Methods will also be represented as Scheme procedures.
define-class
define-class
is a macro which accepts the users's description
of a class, massages it a little, and passes it on to the procedure
create-class
to construct a class object.
The reason that define-class
is written as a macro and not
a procedure is so that the arguments to the macro won't be evaluated
immediately. For example, the class name (e.g., <point>
or
<queue
passed to define-class
isn't a variable to
be evaluated--it's a symbol to be used as the name of the class.
When a call to define-macro
is compiled (or interpreted),
the transformation procedure for the macro does two things.
First, it constructs the class object and adds it to a special
data structure by calling register-class
. Then it
generates code to define a variable whose name is the name
of the class, and initialize that with a pointer to the
class. The generated code (the variable definition) is returned
by the transformer, and that's what's interpreted or compiled
at the point where the macro was called.
For example, consider a call to create a <point>
class:
(define-class <point> (<object>) (x) (y))
This should be translated by macro processing into a variable definition
for <point>
, which will hold a pointer to the class object, like
this:
(define <point>
complicated_expression)
where complicated_expression has the side-effect of constructing
the class object, registering its existence with related objects (virtual
procedures for the accessors), and so on. complicated_expression
should look something like this, for our <point>
definition:
; construct an association list describing the slots of this kind of object, ; indexed by slot name and holding the routines to get and set the slot ; values. (let ((slots-alist (generate-slots-alist '((x) (y))))) ; create the class object, implemented as a Scheme vector (let ((class-object (vector <<class>> ; pointer to class of class '<point> ; name symbol for this class (list <object>) ; list of superclass objects slots-alist ; slot names/getters/setters '*dummy*))) ; placeholder ; create and install the instance allocation routine, which will create ; and initialize an instance of this class, implemented as a vector (vector-set! class-object 4 (lambda (x y) (vector class-object x y))) ; register accessor methods with appropriate generic procedures (register-accessor-methods class-object slots-alist) ; and return the class object we constructed class-object))
In more detail, what this generated code does is:
<point>
object.
It creates procedures that will get and set the values of the
slots x
and y
, which have been mapped to
indexed fields 1
and 2
of the vector used
to represent an instance. (These are the methods for the
generic procedures x
, set-x!
, y
,
and set-y!
, which will be registered with those
generic procedures.)
<class>
,
which identifies this object as a class object.(13) The 1st slot holds
a pointer to the name symbol that is this class's class name.
(This is just for documentation purposes.) The 2nd slot holds
a list of pointers to this object's immediate superclasses.
(Note that this is a list of pointers to actual class objects,
not name symbols.(14)
make
macro will ensure that
arguments are passed in the right order from calls to make
using keywords.)
Since this is all done in the initial value expression of the
definition of the variable <point>
, the returned class
object becomes the initial value of that variable binding.
Once all this is done, we could create an instance of class point by extracting the allocator procedure from the class object and calling it with the initial values in the proper order. For example,
((vector-ref <point> 4) 20 30)
would extract the point-allocator procedure from the <point> class
object, and call it to create a <point>
instance with an
x
value of 20 and a y
value of 30. (The make
macro will provide a friendlier interface.)
Now we'll show a simplified version of the procedure
generate-class-code
, which generates the kind of class-creating
s-expression shown above.
Now let's look at the macro to produce code like this from a simple class definition.
For now, we'll assume that the body of the class definition consists of nothing but slot declarations with no keword options--initial value specifiers or other options--i.e., they're one-element lists holding just a symbol that names a slot. Ignoring inheritance and assuming that a class includes only the slots declared in this class definition, we'll simply assign slots index numbers in the order they're declared.
We'll also continue to ignore issues of inheritance and automatic generation of generic procedures for slot accessor methods. When we implement inheritance, described later, we'll need to do something with the list of superclasses.)
(define-macro (define-class class-name superclass-list . slot-decls) `(define ,class-name (let ((slots-slist (generate-slots-alist ',slot-decls 1))) ; create the class object, implemented as a Scheme vector (let ((class-object (vector <<class>> ; metaclass ',class-name ; name (list ,@superclass-list) ; supers slots-alist ; slots '*dummy*))) ; creator ; install a routine to create instances (vector-set! class-object 4 ; creation routine takes slot values ; as args, creates a vector w/class ; pointer for this class followed by ; slot values in place. (lambda ,(map car slot-decls) (vector class-object ,@(map car slot-decls)))) ; register accessor methods with appropriate generic procs (register-accessor-methods class-object slots-alist) class-object))
Two important helper routines are used by this macro:
generate-slots-alist
and register-accessor-methods
.
The initial value expression for slots-alist
is a call to generate-slots-alist
, with an argument
that is a quoted version of the argument declarations passed to
the macro. Notice that we're using unquote inside a quoted
expression, and this works. The value of slot-decls will be
substituted inside the quote expression during macro processing.
For the <point>
definition, the expression
(generate-slots-alist ',slot-decls 1)
will translate to
(generate-slots-alist '((x) (y)) 1)
.(15)
Several other expressions in the macro work this way, as well:
For the <point>
example, ',class-name
will translate
into '<point>
, a literal referring to the name symbol for the
particular class we're defining.
Likewise, (list ,@superclass-list)
, which uses unquote-splicing,
will be translated to (list <object>)
; when that expression is
evaluated, the value of the variable <object>
will be fetched and
put in a list. (Notice that this makes a list with the actual class
object in it, not the symbol <object>
.) The lambda
expression that generates an instance creating procedure uses
both unquote and unquote-splicing:
(lambda ,(map car slot-decls) (vector class-name ,@(map car slot-decls))
It will translate to
(lambda (x y) (vector class-name x y))
generate-slots-alist
just traverses the list of slot
declarations recursively, inrementing an index of which slot number is next,
and constructs list of associations, one per slot. Each association is a
list hose car (i.e., the key) is the name of the slot, and its second
and third elements are procedures to access the slot. The actual
accessor procedures are generated by calls to slot-n-getter
and slot-n-setter
, which return procedures to get or
set the nth slot of a vector.
(define (generate-slots-alist slot-decls slot-num) (if (null? slot-decls) '() (cons `(,(caar slot-decls) ,(slot-n-getter slot-num) ,(slot-n-setter slot-num)) (generate-slots-alist (cdr slot-decls) (+ 1 slot-num)))))
(This procedure is initially called with a slot-num of 1, reserving the zeroth slot for the class pointer.)
Here are simple versions of slot-n-getter
and slot-n-setter
.
Each one simply makes a closure of an accessor procedure, capturing
the environment where n is bound, to specialize the accessor to access
a particular slot. (If we handle keyword options, we'll have to make
the code a little more complicated.)
(define (slot-n-getter offset) (lambda (obj) ; return a procedure to read (vector-ref obj offset))) ; slot n of an object (define (slot-n-setter offset) (lambda (obj value) ; return a procedure to update (vector-set! obj offset value))) ; slot n of an object
We construct a new closure for each slot accessor, but that really isn't necessary. We could cache the closures, and always return the same closure when we need an accessor for a particular slot offset.
<<class>>
Our simple object system implementation assumes that every instance is represented as a Scheme vector whose 0th slot holds a pointer to a class object, which is also an object in the system. This implies that a class object must also have a class pointer in its zeroth slot. A question naturally arises as to what the class of a class object is, and what its class pointer points to.
This is actually a deep philosophical question, and for advanced
and powerful object system, it has practical consequences. For
our little object system, we'll settle the question in a simple
way. All class objects have a class pointer that points to
a special object, the class of all classes. We call this object
<<class>>
, where the doubled angle brackets suggest that
it is not only a class, but the class of other class objects.
This is known as a metaclass
, because it's a class that's
about classes.
It doesn't do very much--it just gives a special object we can use as the class value for other class objects, so that we can tell that they're classes.
In our simple system, the unique object <<class>>
has
a class pointer that points to itself---that is, it describes
itself in the same sense that it describes other classes. This
circularity isn't harmful, and allows us to terminate the possibly
infinite regression of classes, metaclasses, meta-metaclasses,
and so on.
We construct this one special object "by hand." Like other class objects in our system, it is represented as a Scheme vector whose first element points to itself, and which has a few other standard fields. Most of the standard fields will be empty, because class <<class>> has no superclasses, no slots, and no allocator--because we create the one instance specially.
The following definition suffices to create the class <<class>>
:
(define <<class>> (let ((the-object (vector '*dummy* ; placeholder for class ptr '<<class>> ; name of this class '() ; superclasses (none) '() ; slots (none) #f ; allocator (none) '()))) ; prop. list (initially empty) ; set class pointer to refer to itself (vector-set! the-object 0 the-object) ; and return the class object as initial value for define the-object))
Once this is done, we can define a few other routines that will come in handy in implementing the rest of the object system:
instance?
is a predicate that checks whether an object is an
instance of a class in our class system, as opposed to a plain old
Scheme object like a pair or a number. (In a better object system,
like RScheme's, all Scheme objects would also be instances of classes,
but we'll ignore that for now.)
; An object is an instance of a class if it's represented as a ; Scheme vector whose 0th slot holds a class object. ; Note: we assume that we never shove class objects into other ; vectors. We could relax this assumption, but our code ; would be slower. (define (instance? obj) (and (vector? obj) (class? (vector-ref 0 obj)))
; An object is a class (meta)object if it's represented as a Scheme ; vector whose 0th slot holds a pointer to the class <<class>>. ; Note: we assume that we never shove the <<class>> object into ; other vectors. We could relax this, at a speed cost. (define (class? obj) (and (vector? obj) (eq? (vector-ref 0 obj) <<class>>)))
; We can fetch the class of an instance by extracting the value ; in its zeroth slot. Note that we don't check that the argument ; obj *is* an instance, so applying this to a non-instance is an error. (define (class-of-instance obj) (vector-ref obj 0))
define-generic
Each generic procedure maintains a table of methods that are defined on it, indexed by the classes they are applicable to. In our simple object system implementation, this table will be implemented as an association list, keyed by class pointer. That is, the association list is a list of lists, and each of those lists holds a class object and a procedure. The class object represents the class on which the method is defined, and the procedure is the method itself.
When the generic procedure is called on a particular instance, it will extract the class pointer from the zeroth slot of the instance, and use it as a key to probe its own association list. It will then extract the procedure that's the second element of the resulting list, and call it. When calling the method, it will pass along the same arguments it received.
This scheme can be rather slow--a linear search of all methods may be slow if there are many methods defined on a generic procedure, and especially if the frequently-called ones are not near the front of the list. We could speed this up considerably by using caching tricks, e.g., reorganizing the list to put recently-used elements at the front. A more aggressive system could figure out how to avoid looking up methods at runtime in most cases, but that's considerably more complicated. We won't bother with any of that for now, to keep our example system simple.
(Understanding this simple system will be a good start toward understanding more sophisticated systems that perform much better, and even this simple system is fast enough for many real-world uses, such as most scripting and GUI programming, or coarse-grained object-oriented programming where most of the real work is done in non-object-oriented code.)
When we evaluate an expression such as
(define-generic (insert-first! obj item))
we would like the macro to be translated into code that will do several things:
insert-first!
and initialize
the binding with a pointer to the generic procedure.
The first two and the last are easy, and we'll ignore the third for
now. define-generic
can generate code like this:
(define insert-first! ; create an environment that only the generic procedure will ; be able to see. (let ((method-alist '())) ; create and return the generic procedure that can see that ; method a-list. (lambda (obj item) (let* ((class (class-of-instance obj)) (method (cadr (assq class method-alist)))) (if method (method obj item) (error "method not found"))))))
Here we use let
to create a local variable binding to hold
the association list, and capture it by using lambda to create
the generic procedure in its scope. Once the procedure is returned
from the let, only that procedure will ever be able to operate on
that association list.
The procedure returned by lambda
will take the two arguments
specified by the generic procedure declaration, extract the class pointer
from the first argument object, probe the association list to get the
appropriate method for that class, and (tail-)call that method, passing
along the original arguments. If it fails to find a method for the
class of the instance it's being applied to, it signals an error.
Keeping in mind that this code doesn't quite work because we can't
actually add methods to the method association list, we could
define define-generic
as a macro this way:
(define-macro (define-generic name . args) `(define ,name (let ((method-alist '())) (lambda (,@args) (let* ((class (class-of-instance ,(car args)))) (method (cadr (assq class method-alist)))) (if method (method obj item) (error "method not found"))))))
To allow methods to be added to the method-alist, we'll change
the macro to create another procedure, along with the generic
procedure, in the environment where method-list
is visible.
This procedure can be used to add a new method to the method
association lists. This table will be an association list
stored in the global variable *generic-procedures*
.
We'll also maintain a table of generic procedures and the corresponding
procedures that add methods to their association lists. While we're
at it, we'll modify define-generic
record the name of a generic
procedure when it's defined, so that it can print out a more helpful
error message when a lookup fails. The inital value expression
will be a letrec which lets us define four variables, two of
which are procedure-valued, and then returns one of those procedures,
the actual generic procedure
(define *generic-procedures* '()) (define-macro (define-generic name . args) `(define ,name (letrec ((gp-name ,name) (method-alist '()) (method-adder (lambda (generic-proc method) (set! method-alist (cons (list generic-proc method) method-alist)))) (generic-proc (lambda (,@args) (let* ((class (class-of-instance ,(car args)))) (method (cadr (assq class method-alist)))) (if method (method obj item) (error "method not found for " gp-name)))))) ; add the generic procedure and its method-adding ; routine to the association list of generic procedures (set! *generic-procedures* (cons (list generic-proc method-adder) *generic-procedures*)) generic-procedure)))
define-method
Now that each generic procedure is associated with a method-adding
procedure that can add to its list of methods, we can define the
define-method
macro. define-method
will create a
method using lambda
, and add it to the generic procedure's
method association list, indexed by the class that it is to
be used for.
In this simple system, where only the first argument is dispatched on (used in selecting the appropriate method), we only need to treat the first argument declaration specially.
Consider an example the example of defining an insert-first!
method for class stack
.
(define-method (insert-first! (self <stack>) item) (set-items! self (cons item (items self))))
We'd like this to be translated by macro processing into the equivalent
(add-method-to-generic-proc insert-first! <stack> (lambda (self item) (set-items! self (cons item (items self)))))
The real work is done by the procedure add-method-to-generic-procedure
,
which we can write as
(define (add-method-to-generic-procedure generic-proc class method) (let ((method-adder! (cadr (assq *generic-procedures* generic-proc)))) (method-adder! class method)))
This procedure expects three arguments--a generic procedure object,
a class object, and a closure that implments the corresponding method.
It searces the association list
The calling pattern for the define-method
macro will ensure that
the actual calling expression is destructured into three parts, giving
us the first argument's name and the name and its class.
(define-macro (define-method (gp (arg1 class) . args) . body) `(add-method-to-generic-proc ,gp ,class (lambda (arg1 ,@args) ,@body)))
Given the code we've seen so far, we've almost got a working object system,
but we left out a detail when we defined define-class
. Recall that
the accessor routines for a class's slots are supposed to be used as
methods on generic procedures such as x
. define-class
generates code that calls register-accessor-methods
, to install
the accessor routines for the slots of a class as methods on generic
procedures.
register-accessor-methods
iterates over the slots association
list of the class, looking at each slot name and its corresponding
accessors, and adding the accessor procedures to the appropriate generic
procedure. For a given slot name, the appropriate generic procedure
name is automatically constructed using the accessor naming conventions.
[ OOPS--theres a hitch here. We didn't index the generic procedures by name... it's also awkward that Scheme doesn't provide a standard bound? procedure so that we can tell if the generic procedure already exists. Is it even possible to automatically define the generic procedures in absolutely portable Scheme, without doing something painful? I suppose that if we can search the list of generic procedures by name, the macro transformer for define-class can look to see which accessor names don't have corresponding generic functions, BEFORE actually generating the transformed code. It could then add a (define-generic ...) to its output for each accessor that doesn't already have an existing generic procedure to add it to. Tedious, and annoying to have to explain. ]
So far we've described a simple object-based programming system and shown how it can be implemented. A fully object-oriented system requires another feature---inheritance.
Inheritance allows you to define a new class in terms of another class.
For example, we might have a class <point>
, and want to define
a similar class, <colored-point>
, which records the color to
be used to display a point when it is drawn on the user's screen.
Given our simple object-based system so far, we would have to define
colored-point
from scratch, defining its x and y fields as
well as its color field. This definition would be mostly redundant
with the definition of <point>
, making the code harder to
understand and maintain.
Inheritance lets us define new classes by describing its differences from another class. For example, we could define colored-point like this:
(define-class <colored-point> (<point>) (color))
This definition says that instances of <colored-point>
have
all of the slots of <points>
(i.e., x
and y
),
as well as another slot, color
. We say that <colored-point>
inherits the slotss defined for <point>
.
Inheritance applies to methods as well as slots. The definition
above tells our object system that the methods defined for the
superclass <point>
should also be used for <colored-point>s
,
unless we specifically define new methods just for <colored-point>s
on the same generic procedures.
This gives us a concise declarative way of defining classes--we can
declare that a <colored-point>
is like a <point>
, except
for the particular differences we specify. The object system then
infers what slots a <colored-point>
must have from this declaration
(and methods we define for this class) plus the declarations for
<point>
and its methods.
Note that inheritance is transitive. If we define a subclass of
<colored-point>, say <flashing-colored-point>
, it will inherit
the slots and methods of <colored-point>
, and also the slots
and methods of <point>
.
By default, a class inherits all of the methods defined for
its superclasses. We can override an inherited definition,
though, by defining a method definition explicitly. For example,
we might have a draw
method for class <point>
which
simply draws a black pixel on the screen at the point's coordinates.
(This might be through a call to an underlying graphics library
provided by the operating system.) For <colored-point>
,
we would probably want to define a new draw
method so that
the point would be drawn in color.
Sometimes, we don't want to completely redefine an inherited method for a new class, but we would like to refine it--we may want to define the new method in terms of the inherited method.
For example, suppose we have a class <queue>
, which maintains
a queue as we saw earlier, and we woulto refine it to
create a new kind of queue that keeps track of the size of the
queue--i.e., the number of items in the queue.
We might define <counted-queue>
as a subclass of <queue>
,
but with a size slot, like this:
(define <counted-queue> (<queue>) (size initial-value: 0))
Then we can define the get-first
and put-last
methods
for counted-queue
in terms of the corresponding methods for
<queue>
. We do this by using a special pseudo-procedure
called next-method
. Inside a method definition, the name
next-method
refers to an inherited procedure by the same
name. This allows us to call the inherited version of a method
even though we're overriding that definition.
(define-method (get-first! (self <counted-queue>)) (count-set! self (- (count self) 1)) ; update count of items, and (next-method self)) ; call inherited get-first
(define-method (put-last! (self <counted-queue>) item) (next-method self item) (count-set! self (+ (count self) 1))
next-method
's name comes from the fact that it represents the
next most specific method for this operation applied to this class,
according to the inheritance graph. The method we're defining is the
most specific method, because it's defined for this class exactly, and
the inherited one is the next most specific. (The inherited
one may in turn call a method that was inherited earlier, which will
in turn be the next most specific method, and so on.)
The simple object system
(And no bound?
either, so it's hard to ensure things like generation
of generic procedures for accessors exactly once.)
(Check-me-on-this declarations vs. trust-me declarations.)
bound?
procedure