fun-defs)]
[id (v) (error ’interp ”free identifier”)]
[app (fun-name arg-expr)
(local ([define the-fun-def (lookup-fundef fun-name fun-defs)])
(interp (subst (fundef-body the-fun-def )
(fundef-arg-name the-fun-def )
(num (interp arg-expr fun-defs)))
fun-defs))]))
Figure 4.2: Implementation of Functions: Interpreter
Chapter 5
Deferring Substitution
Let’s examine the process of interpreting the following small program. Consider the following sequence of
evaluation steps:
{with {x 3}
{with {y 4}
{with {z 5}
{+ x {+ y z}}}}}
= {with {y 4}
{with {z 5}
{+ 3 {+ y z}}}}
= {with {z 5}
{+ 3 {+ 4 z}}}
= {+ 3 {+ 4 5}}
at which point it reduces to an arithmetic problem. To reduce it, though, the interpreter had to apply substi-
tution three times: once for each with. This is slow! How slow? Well, if the program has size n (measured
in abstract syntax tree nodes), then each substitution sweeps the rest of the program once, making the com-
plexity of this interpreter at least O(n2). That seems rather wasteful; surely we can do better.
How do we avoid this computational redundancy? We should create and use a repository of deferred
substitutions. Concretely, here’s the idea. Initially, we have no substitutions to perform, so the repository
is empty. Every time we encounter a substitution (in the form of a with or application), we augment the
repository with one more entry, recording the identifier’s name and the value (if eager) or expression (if
lazy) it should eventually be substituted with. We continue to evaluate without actually performing the
substitution.
This strategy breaks a key invariant we had established earlier, which is that any identifier the interpreter
encounters is of necessity free, for had it been bound, it would have been replaced by substitution. Because
we’re no longer using substitution, we will encounter bound identifiers during interpretation. How do we
handle them? We must substitute them with by consulting the repository.
Exercise 5.0.2 Can the complexity of substitution be worse than O(n2)?
33
34
CHAPTER 5. DEFERRING SUBSTITUTION
5.1
The Substitution Repository
Let’s provide a data definition for the repository:
(define-type DefrdSub
[mtSub]
[aSub (name symbol?) (value number?) (ds DefrdSub?)])
where DefrdSub stands for “deferred substitutions”. A DefrdSub has two forms: it’s either empty (mtSub1)
or non-empty (represented by an aSub structure). The latter contains a reference to the rest of the repository
in its third field.
The interpreter obviously needs to consume a repository in addition to the expression to interpret. There-
fore, its contract becomes
;; interp : F1WAE listof(fundef) DefrdSub → number
It will need a helper function that looks up the value of identifiers in the repository. Its code is:
;; lookup : symbol DefrdSub → F1WAE
(define (lookup name ds)
(type-case DefrdSub ds
[mtSub () (error ’lookup ”no binding for identifier”)]
[aSub (bound-name bound-value rest-ds)
(if (symbol=? bound-name name)
bound-value
(lookup name rest-ds))]))
With that introduction, we can now present the interpreter:
(define (interp expr fun-defs ds)
(type-case F1WAE expr
[num (n) n]
[add (l r) (+ (interp l fun-defs ds) (interp r fun-defs ds))]
[with (bound-id named-expr bound-body)
(interp bound-body
fun-defs
(aSub bound-id
(interp named-expr
fun-defs
ds)
ds))]
[id (v) (lookup v ds)]
[app (fun-name arg-expr)
(local ([define the-fun-def (lookup-fundef fun-name fun-defs)])
1“Empty sub”—get it?
5.2. DEFERRING SUBSTITUTION CORRECTLY
35
(interp (fundef-body the-fun-def )
fun-defs
(aSub (fundef-arg-name the-fun-def )
(interp arg-expr fun-defs ds)
ds)))]))
Three clauses have changed: those for with, identifiers and applications. Applications must look up the
value of an identifier in the repository. The rule for with evaluates the body in a repository that extends
the current one (ds) with a binding for the with-bound identifier to its interpreted value. The rule for
an application similarly evaluates the body of the function with the repository extended with the formal
argument bound to the value of the actual argument.
To make sure this is correct, we recommend that you first study its behavior on programs that have no
identifiers—i.e., verify that the arithmetic rules do the right thing—and only then proceed to the rules that
involve identifiers.
5.2
Deferring Substitution Correctly
Consider the evaluation of the expression
{with {n 5}
{f 10}}
in the following list of function definitions:
(list (fundef ’f ’p (id ’n)))
That is, f consumes an argument p and returns the value bound to n. This corresponds to the Scheme
definition
(define (f p) n)
followed by the application
(local ([define n 5])
(f 10))
What result does Scheme produce?
Our interpreter produces the value 5. Is this the correct answer? Well, it’s certainly possible that this is
correct—after all, it’s what the interpreter returns, and this could well be the interpreter for some language.
But we do have a better way of answering this question.
Recall that the interpreter was using the repository to conduct substitution more efficiently. We hope that
that’s all it does—that is, it must not also change the meaning of a program! Our “reference implementation”
is the one that performs explicit substitution. If we want to know what the value of the program really “is”,
we need to return to that implementation.
What does the substitution-based interpreter return for this program? It says the program has an unbound
identifier (specifically, n). So we have to regard our interpreter with deferred substitutions as buggy.
36
CHAPTER 5. DEFERRING SUBSTITUTION
While this new interpreter is clearly buggy relative to substitution, which it was supposed to represent,
let’s think for a moment about what we, as the human programmer, would want this program to evaluate to.
It produces the value 5 because the identifier n gets the value it was bound to by the with expression, that
is, from the scope in which the function f is used. Is this a reasonable way for the language to behave? A
priori, is one interpretation better than the other? Before we tackle that, let’s introduce some terminology to
make it easier to refer to these two behaviors.
Definition 10 (Static Scope) In a language with static scope, the scope of an identifier’s binding is a syn-
tactically delimited region.
A typical region would be the body of a function or other binding construct. In contrast:
Definition 11 (Dynamic Scope) In a language with dynamic scope, the scope of an identifier’s binding is
the entire remainder of the execution during which that binding is in effect.
That is, in a language with dynamic scope, if a function g binds identifier n and then invokes f, then f can
refer to n—and so can every other function invoked by g until it completes its execution—even though f
has no locally visible binding for n.
Armed with this terminology, we claim that dynamic scope is entirely unreasonable. The problem is
that we simply cannot determine what the value of a program will be without knowing everything about its
execution history. If the function f were invoked by some other sequence of functions that did not bind a
value for n, then that particular application of f would result in an error, even though a previous application
of f in the very same program’s execution completed successfully! In other words, simply by looking at the
source text of f, it would be impossible to determine one of the most rudimentary properties of a program:
whether or not a given identifier was bound. You can only imagine the mayhem this would cause in a large
software system, especially with multiple developers and complex flows of control. We will therefore regard
dynamic scope as an error and reject its use in the remainder of this text.
5.3
Fixing the Interpreter
Let’s return to our interpreter. Our choice of static over dynamic scope has the benefit of confirming that
the substituting interpreter did the right thing, so all we need do is make the new interpeter be a correct
reimplementation of it. We only need to focus our attention on one rule, that for function application. This
currently reads:
[app (fun-name arg-expr)
(local ([define the-fun-def (lookup-fundef fun-name fun-defs)])
(interp (fundef-body the-fun-def )
fun-defs
(aSub (fundef-arg-name the-fun-def )
(interp arg-expr fun-defs ds)
ds)))]
5.3. FIXING THE INTERPRETER
37
When the interpreter evaluates the body of the function definition, what deferred substitutions does it rec-
ognize? It recognizes all those already are in ds, with one more for the function’s formal parameter to
be replaced with the value of the actual parameter. But how many substitutions should be in effect in the
function’s body? In our substitution-based interpreter, we implemented application as follows:
[app (fun-name arg-expr)
(local ([define the-fun-def (lookup-fundef fun-name fun-defs)])
(interp (subst (fundef-body the-fun-def )
(fundef-arg-name the-fun-def )
(num (interp arg-expr fun-defs)))
fun-defs))]
This performs only one substitution on the function’s body: the formal parameter for its value. The code
demonstrates that none of the substitutions applied to the calling function are in effect in the body of the
called function. Therefore, at the point of invoking a function, our new interpeter must “forget” about the
current substitutions. Put differently, at the beginning of every function’s body, there is only one bound
identifier—the function’s formal parameter—independent of the invoking context.
How do we fix our implementation? We clearly need to create a substitution for the formal parameter
(which we obtain using the expression (fundef-arg-name the-fun-def )). But the remaining substitutions must
be empty, so as to not pick up the bindings of the calling context. Thus,
[app (fun-name arg-expr)
(local ([define the-fun-def (lookup-fundef fun-name fun-defs)])
(interp (fundef-body the-fun-def )
fun-defs
(aSub (fundef-arg-name the-fun-def )
(interp arg-expr fun-defs ds)
(mtSub) )))]
That is, we use the empty repository to initiate evaluation of a function’s body, extending it immediately with
the formal parameter but no more. The difference between using ds and (mtSub) in the position of the box
succintly captures the implementation distinction between dynamic and static scope, respectively—though
the consequences of that distinction are far more profound that this small code change might suggest.
Exercise 5.3.1 How come we never seem to “undo” additions to the repository? Doesn’t this run the risk
that one substitution might override another in a way that destroys static scoping?
Exercise 5.3.2 Why is the last ds in the interpretation of with also not replaced with (mtSub)? What
would happen if we were to effect this replacement? Write a program that illustrates the difference, and
argue whether the replacement is sound or not.
Exercise 5.3.3 Our implementation of lookup can take time linear in the size of the program to find some
identifiers. Therefore, it’s not clear we have really solved the time-complexity problem that motivated the
use of a substitution repository. We could address this by using a better data structure and algorithm for
lookup: a hash table, say. What changes do we need to make if we use a hash table?
Hint: This is tied closely to Exercise 5.3.1!
38
CHAPTER 5. DEFERRING SUBSTITUTION
(define-type F1WAE
[num (n number?)]
[add (lhs F1WAE?) (rhs F1WAE?)]
[with (name symbol?) (named-expr F1WAE?) (body F1WAE?)]
[id (name symbol?)]
[app (fun-name symbol?) (arg F1WAE?)])
(define-type FunDef
[fundef (fun-name symbol?)
(arg-name symbol?)
(body F1WAE?)])
(define-type DefrdSub
[mtSub]
[aSub (name symbol?) (value number?) (ds DefrdSub?)])
;; lookup-fundef : symbol listof(FunDef) −→ FunDef
(define (lookup-fundef fun-name fundefs)
(cond
[(empty? fundefs) (error fun-name ”function not found”)]
[else (if (symbol=? fun-name (fundef-fun-name (first fundefs)))
(first fundefs)
(lookup-fundef fun-name (rest fundefs)))]))
;; lookup : symbol DefrdSub → F1WAE
(define (lookup name ds)
(type-case DefrdSub ds
[mtSub () (error ’lookup ”no binding for identifier”)]
[aSub (bound-name bound-value rest-ds)
(if (symbol=? bound-name name)
bound-value
(lookup name rest-ds))]))
Figure 5.1: Functions with Deferred Substitutions: Support Code
5.3. FIXING THE INTERPRETER
39
;; interp : F1WAE listof(fundef) DefrdSub → number
(define (interp expr fun-defs ds)
(type-case F1WAE expr
[num (n) n]
[add (l r) (+ (interp l fun-defs ds) (interp r fun-defs ds))]
[with (bound-id named-expr bound-body)
(interp bound-body
fun-defs
(aSub bound-id
(interp named-expr
fun-defs
ds)
ds))]
[id (v) (lookup v ds)]
[app (fun-name arg-expr)
(local ([define the-fun-def (lookup-fundef fun-name fun-defs)])
(interp (fundef-body the-fun-def )
fun-defs
(aSub (fundef-arg-name the-fun-def )
(interp arg-expr fun-defs ds)
(mtSub))))]))
Figure 5.2: Functions with Deferred Substitutions: Interpreter
40
CHAPTER 5. DEFERRING SUBSTITUTION
Chapter 6
First-Class Functions
We began Section 4 by observing the similarity between a with expression and a function definition applied
immediately to a value. Specifically, we observed that
{with {x 5} {+ x 3}}
is essentially the same as
f (x) = x + 3; f (5)
Actually, that’s not quite right: in the math equation above, we give the function a name, f , whereas there is
no identifier named f anywhere in the WAE program. We can, however, rewrite the mathematical formula-
tion as
f = λ (x).x + 3; f (5)
which we can then rewrite as
(λ (x)x + 3)(5)
to get rid of the unnecessary name ( f ).
Notice, however, that our langugae F1WAE does not permit anonymous functions of the style we’ve
used above. Because such functions are useful in their own right, we now extend our study of functions.
6.1
A Taxonomy of Functions
The translation of with into mathematical notation exploits two features of functions: the ability to create
anonymous functions, and the ability to define functions anywhere in the program (in this case, in the func-
tion position of an application). Not every programming language offers one or both of these capabilities.
There is, therefore, a taxonomy that governs these different features, which we can use when discussing
what kind of functions a language provides:
first-order Functions are not values in the language. They can only be defined in a designated portion of
the program, where they must be given names for use in the remainder of the program. The functions
in F1WAE are of this nature, which explains the 1 in the name of the language.
41
42
CHAPTER 6. FIRST-CLASS FUNCTIONS
higher-order Functions can return other functions as values.
first-class Functions are values with all the rights of other values. In particular, they can be supplied as the
value of arguments to functions, returned by functions as answers, and stored in data structures.
We would like to extend F1WAE to have the full power of functions, to reflect the capability of Scheme. In
fact, it will be easier to return to WAE and extend it with first-class functions.
6.2
Enriching the Language with Functions
To add functions to WAE, we must define their concrete and abstract syntax. First let’s examine some
concrete programs:
{{fun {x} {+ x 4}}
5}
This program defines a function that adds 4 to its argument and immediately applies this function to 5,
resulting in the value 9. This one
{with {double {fun {x} {+ x x}}}
{+ {double 10}
{double 5}}}
evaluates to 30. The program defines a function, binds it to double, then uses that name twice in slightly
different contexts (i.e., it instantiates the formal parameter with different actual parameters).
From these examples, it should be clear that we must introduce two new kinds of expressions: function
applications (as before), as well as (anonymous) function definitions. Here’s the revised BNF corresponding
to these examples:
<FWAE> ::= <num>
| {+ <FWAE> <FWAE>}
| {with {<id> <FWAE>} <FWAE>}
| <id>
| {fun {<id>} <FWAE>}
| {<FWAE> <FWAE>}
Note that F1WAE did not have function definitions as part of the expression language, since the definitions
were assumed to reside outside the expression being evaluated. In this language, functions can be defined
anywhere, including in the function position of an application (the last BNF production): instead of just
the name of a function, programmers can write an arbitrary expression that must be evaluated to obtain the
function to apply. The corresponding abstract syntax is:
(define-type FWAE
[num (n number?)]
[add (lhs FWAE?) (rhs FWAE?)]
[with (name symbol?) (named-expr FWAE?) (body FWAE?)]
6.2. ENRICHING THE LANGUAGE WITH FUNCTIONS
43
[id (name symbol?)]
[fun (param symbol?) (body FWAE?)]
[app (fun-expr FWAE?) (arg-expr FWAE?)])
To define our interpreter, we must think a little about what kinds of values it consumes and produces.
Naturally, the interpreter consumes values of type FWAE. What does it produce? Clearly, a program that
meets the WAE description must yield numbers. As we have seen above, some program that use functions
and applications also evaluate to numbers. How about a program that consists solely of a function? That is,
what is the value of the program
{fun {x} x}
? It clearly doesn’t represent a number. It may be a function that, when applied to a numeric argument,
produces a number, but it’s not itself a number (if you think differently, you need to indicate which number
it will be: 0? 1? 1729?). We instead realize from this that functions are also values that may be the result of
a computation.
We could design an elaborate representation for function values, but for now, we’ll remain modest. We’ll
let the function evaluate to its abstract syntax representation (i.e., a fun structure). (We will soon get more
sophisticated than this.) For consistency, we’ll also let numbers evaluate to num structures. Thus, the result
of evaluating the program above would be the value
(fun ’x (id ’x))
Now we’re ready to write the interpreter. We must pick a type for the value that interp returns. Since
we’ve decided to represent function and number answers using the abstract syntax, it makes sense to use
FWAE, with the caveat that only two kinds of FWAE terms can appear in the output: numbers and functions.
Our first interpreter will use explicit substitution, to offer a direct comparison with the corresponding WAE
and F1WAE interpreters.
;; interp : FWAE → FWAE
;; evaluates FWAE expressions by reducing them to their corresponding values
;; return values are either num or fun
(define (interp expr)
(type-case FWAE expr
[num (n) expr]
[add (l r) (add-numbers (interp l) (interp r))]
[with (bound-id named-expr bound-body)
(interp (subst bound-body
bound-id
(interp named-expr)))]
[id (v) (error ’interp ”free identifier”)]
44
CHAPTER 6. FIRST-CLASS FUNCTIONS
[fun (bound-id bound-body)
expr]
[app (fun-expr arg-expr)
(local ([define fun-val (interp fun-expr)])
(interp (subst (fun-body fun-val)
(fun-param fun-val)
(interp arg-expr))))]
))
(We’ve made a small change to the rule for add: it uses a helper named add-numbers because interp now
returns an FWAE. As an exercise, define this helper function for yourself.)
The rule for a function says, simply, to return the function itself. (Notice the similarity to the rule for
numbers, the other kind of constant in our language!) That leaves only the rule for applications to study.
This rule first evaluates the function position of an application. This is because that position may itself
contain a complex expression that needs to be reduced to an actual function. For instance, in the expression
{{{fun {x} x}
{fun {x} {+ x 5}}}
3}
the outer function position consists of the application of the identity function to a function that adds five to
its argument.
When evaluated, the function position had better reduce to a function value, not a number (or anything
else). For now, we implicitly assume that the programs fed to the interpreter have no errors. (Chapter X will study how we can detect erroneous programs.) Given a function, we need to evaluate its body after having
substituted the formal argument with its actual value. That’s what the rest of the program does: evaluate
the expression that will become the bound value, bind it using substitution, and then interpet the resulting
expression. The last few lines are very similar to the code for with.
To understand this interpreter better, consider what it produces in response to evaluating the following
term:
{with {x 3}
{fun {y}
{+ x y}}}
DrScheme prints the following:
(fun ’y (add (num 3) (id ’y)))
Notice that the x inside the function body has been replaced by 3 as a result of substitution, so the function
has no references to x left in it.
Exercise 6.2.1 What induced the small change in the rule for add? Explain, with an example, what would
go wrong if we did not make this change.
Exercise 6.2.2 Did you notice the small change in the interpretation of with?
Exercise 6.2.3 What goes wrong if the interpreter fails to evaluate the function position (by invoking the
interpreter on it)? Write a program and present the expected and actual results.
6.3. MAKING WITH REDUNDANT
45
6.3
Making with Redundant
Now that we have functions and function invocation as two distinct primitives, we can combine them to
recover the behavior of with as a special case. Every time we encounter an expression of the form
{with {var named} body}
we can replace it with
{{fun {var} body}
named}
and obtain the same effect. The result of this translation does not use with, so it can be evaluated by a more
primitive interpreter: one for AE enriched with functions.
Exercise 6.3.1 Implement a pre-processor that performs this translation.
We will assume the existence of such a pre-processor, and use the language FAE as our basis for subse-
quent exploration. Section 36 discusses such pre-processors in great detail.
6.4
Implementing Functions using Deferred Substitutions
As Section 5 described, our implementation will be more sprightly if we deferred substitutions instead of
performing them greedily. Let’s study how to adapt our interpreter to use this more efficient strategy.
First, we must provide a definition of the substitution repository. The repository associates identifiers