Given this new kind of value, let’s study the interpretation of the four new constructs.
Sequences are easy. The interpreter evaluates the first sub-expression, ignores the resulting value (thus
having evaluated the sub-expression only for the effect it might have on the store)—notice that e1-value is
bound but never used—and returns the result of evaluating the second expression in the store (potentially)
modified by evaluating the first sub-expression:
[seqn (e1 e2)
(type-case Value×Store (interp e1 env store)
[v×s (e1-value e1-store)
(interp e2 env e1-store)])]
The essence of newbox is to obtain a new storage location, wrap its address in a boxV, and return the
boxV as the value portion of the response accompanied by an extended store.
[newbox (value-expr)
(type-case Value×Store (interp value-expr env store)
[v×s (expr-value expr-store)
(local ([define new-loc (next-location expr-store)])
(v×s (boxV new-loc)
(aSto new-loc expr-value expr-store)))])]
To modify the content of a box, the interpreter first evaluates the first sub-expression to a location, then
updates the store with a new value for the same location. Because all expressions must return a value,
setbox chooses to return the new value put in the box as the value of the entire expression.
[setbox (box-expr value-expr)
(type-case Value×Store (interp box-expr env store)
[v×s (box-value box-store)
(type-case Value×Store (interp value-expr env box-store)
[v×s (value-value value-store)
(v×s value-value
(aSto (boxV-location box-value)
value-value
value-store))])])]
Opening a box is straightforward: get a location, look it up in the store, and return the resulting value.
[openbox (box-expr)
(type-case Value×Store (interp box-expr env store)
13.4. SCOPE VERSUS EXTENT
129
[v×s (box-value box-store)
(v×s (store-lookup (boxV-location box-value)
box-store)
box-store)])]
Of course, the term “the store” is ambiguous, because there are different stores before and after evaluating
the sub-expression. Which store should we use? Does this interpreter do the right thing?
All that remains is to implement next-location. Here’s one implementation:
(define next-location
(local ([define last-loc (box −1)])
(lambda (store)
(begin
(set-box! last-loc (+ 1 (unbox last-loc)))
(unbox last-loc)))))
This is an extremely unsatisfying way to implement next-location, because it ultimately relies on a box!
However, this box is not essential. Can you get rid of it?
The core of the interpreter is in Figure 13.1 and Figure 13.2.
Exercise 13.3.2 Define next-location so it does not have side-effects.
Hint: You may need to modify the interpreter to do this.
13.4
Scope versus Extent
Notice that while closures refer to the environment of definition, they do not refer to the corresponding
store. The store is therefore a global record of changes made during execution. As a result, stores and
environments have different patterns of flow. Whereas the interpreter employs the same environment for
both arms of an addition, for instance, it cascades the store from one arm to the next and then back out
alongside the resulting value. This latter kind of flow is sometimes called threading, since it resembles the
action of a needle through cloth.
These two flows of values through the interpreter correspond to a deep difference between names and
values. A value persists in the store long after the name that introduced it has disappeared from the envi-
ronment. This is not inherently a problem, because the value may have been the result of a computation,
and some other name may have since come to be associated with that value. In other words, identifiers have
lexical scope; values themselves, however, potentially have indefinite, dynamic extent.
Some languages confuse these two ideas. As a result, when an identifier ceases to be in scope, they
remove the value corresponding to the identifier. That value may be the result of the computation, however,
and some other identifier may still have a reference to it. This premature removal of the value will, therefore,
inevitably lead to a system crash. Depending on how the implementation “removes” the value, however, the
system may crash later instead of sooner, leading to extremely vexing bugs. This is a common problem in
languages like C and C++.
In most cases, garbage collection (Section 21) lets languages dissociate scope from the reclamation of
space consumed by values. The performance of garbage collectors is often far better than we might na¨ıvely
imagine (especially in comparison to the alternative).
130
CHAPTER 13. MUTABLE DATA STRUCTURES
In terms of language design, there are many reasons why C and C++ have adopted the broken policy of
not distinguishing between scope and extent. These reasons roughly fall into the following categories:
• Justified concerns about fine-grained performance control.
• Mistakes arising from misconceptions about performance.
• History (we understand things better now than we did then).
• Ignorance of concepts that were known even at that time.
Whatever their reasons, these language design flaws have genuine and expensive consequences: they cause
both errors and poor performance in programs. These errors, in particular, can lead to serious security
problems, which have serious financial and social consequences. Therefore, the questions we raise here are
not merely academic.
While programmers who are experts in these languages have evolved a series of ad hoc techniques for
contending with these problems, we students of programming languages should know better. We should
recognize their techniques for what they are, namely symptoms of a broken programming language design
rather than proper solutions to a problem. Serious students of languages and related computer science
technologies take these flaws as a starting point for exploring new and better designs.
Exercise 13.4.1 Modify the interpreter to evaluate addition from right to left instead of left-to-right. Con-
struct a test case that should yield different answers in the two cases, and show that your implementation
returns the right value on your test case.
Exercise 13.4.2 Modify seqn to permit an arbitrary number of sub-expressions, not just two. They should
evaluate in left-to-right order.
Exercise 13.4.3 New assignments to a location currently mask the old thanks to the way we’ve defined
store-lookup, but the data structure still has a record of the old assignments. Modify the implementation of
stores so that they have at most one assignment for each location.
Exercise 13.4.4 Use Scheme procedures to implement the store as a partial function.
13.4. SCOPE VERSUS EXTENT
131
;; interp : BCFAE Env Store → Value×Store
(define (interp expr env store)
(type-case BCFAE expr
[num (n) (v×s (numV n) store)]
[add (l r)
(type-case Value×Store (interp l env store)
[v×s (l-value l-store)
(type-case Value×Store (interp r env l-store)
[v×s (r-value r-store)
(v×s (num+ l-value r-value)
r-store)])])]
[id (v) (v×s (store-lookup (env-lookup v env) store) store)]
[fun (bound-id bound-body)
(v×s (closureV bound-id bound-body env) store)]
[app (fun-expr arg-expr)
(type-case Value×Store (interp fun-expr env store)
[v×s (fun-value fun-store)
(type-case Value×Store (interp arg-expr env fun-store)
[v×s (arg-value arg-store)
(local ([define new-loc (next-location arg-store)])
(interp (closureV-body fun-value)
(aSub (closureV-param fun-value)
new-loc
(closureV-env fun-value))
(aSto new-loc
arg-value
arg-store)))])])]
[if0 (test truth falsity)
(type-case Value×Store (interp test env store)
[v×s (test-value test-store)
(if (num-zero? test-value)
(interp truth env test-store)
(interp falsity env test-store))])]
...
Figure 13.1: Implementing Mutable Data Structures , Part 1
132
CHAPTER 13. MUTABLE DATA STRUCTURES
...
[newbox (value-expr)
(type-case Value×Store (interp value-expr env store)
[v×s (expr-value expr-store)
(local ([define new-loc (next-location expr-store)])
(v×s (boxV new-loc)
(aSto new-loc expr-value expr-store)))])]
[setbox (box-expr value-expr)
(type-case Value×Store (interp box-expr env store)
[v×s (box-value box-store)
(type-case Value×Store (interp value-expr env box-store)
[v×s (value-value value-store)
(v×s value-value
(aSto (boxV-location box-value)
value-value
value-store))])])]
[openbox (box-expr)
(type-case Value×Store (interp box-expr env store)
[v×s (box-value box-store)
(v×s (store-lookup (boxV-location box-value)
box-store)
box-store)])]
[seqn (e1 e2)
(type-case Value×Store (interp e1 env store)
[v×s (e1-value e1-store)
(interp e2 env e1-store)])]))
Figure 13.2: Implementing Mutable Data Structures, Part 2
Chapter 14
Variables
In Section 13 we studied the implementation of mutable data structures. The boxes we studied there could
just as well have been vectors or other container types, such as objects with fields.
In traditional languages like C and Java, there are actually two forms of mutation. One is mutating the
value in a container, such as an object (in Java). The expression
o.f = e
evaluates o to an object, e to some value, and changes the content of field f of o to hold the value of e. Note
that o can be an arbitrary expression (for instance, it can look up an object in some other data structure) that
is evaluated to a value. In contrast, a programmer can also write a method such as
void m (int i) {
i = 4;
}
Here, i must literally be an identifier; it cannot be an arbitrary expression that evaluates to an identifier.
That is, we are not mutating the value contained within a box (or position in a vector, or a field); rather, we
are mutating the value bound to an identifier itself. That makes the identifier a variable. A more interesting
example would be a pattern that repeatedly occurs in object-oriented programming:
private int x;
void set_x (int y) {
x = y;
}
void get_x () {
return x;
}
Here, the variable x is private to the object, and can be accessed only through the getter and setter methods.
The setter assigns a new value to x.
133
134
CHAPTER 14. VARIABLES
14.1
Implementing Variables
First, let’s extend our language to include variables:
<VCFAE> ::= ...
| {set <id> <VCFAE>}
| {seqn <VCFAE> <VCFAE>}
Observe that the set expression expects a literal identifier after the keyword.
Implementing variables is a little different from implementing boxes. In the latter case, we first evaluate
the position that identifies the box:
[setbox (box-expr value-expr)
(type-case Value×Store (interp box-expr env store)
[v×s (box-value box-store)
...])]
In contrast, in a language with variables, identifiers do not represent boxes. Therefore, the corresponding
code:
[set (var value)
(type-case Value×Store (interp var env store)
[v×s (var-value var-store)
...])]
would be counter-productive. Evaluating the identifier would result in a value, which we cannot mutate, as
opposed to a location, whose content we can modify by updating the store. This immediately suggests a
slightly different evaluation strategy:
[set (var value)
(type-case Value×Store (interp value env store)
[v×s (value-value value-store)
(local ([define the-loc (env-lookup var env)])
...)])]
That is, we evaluate the expression that represents the new value to be stored in the interpreter. Instead of
evaluating the identifier, however, we only look it up in the environment. This results in a location where
the new value should be stored. In particular, notice an unusual pattern: the interpreter dereferences the
identifier in the environment, but does not dereference the result (the identifier’s location) in the store. We
have not seen this pattern before, and will never see it again after this material.
Many languages make a distinction between mutable data structures and mutable identifiers. When a
mutable identifier appears in the assignment position of an assignment operator (many languages use the
same syntactic operator, = or :=, to represent both operations), the language implementation only partially
resolves the identifier. This special kind of value—the location of an identifier—is traditionally known as
an l-value (pronounced “ell-value”).
Whence this unusual name? Consider the following two statements in C:
14.2. INTERACTION BETWEEN VARIABLES AND FUNCTION APPLICATION
135
x = 2;
y = x;
In the second statement, x must be reduced to a value—i.e., store-lookup and env-lookup must be composed
and applied to its content—whereas in the first statement, x must only be reduced to a location, not to a
value. In languages where locations are not values (more on that below), this odd kind of “value” is known
as an “l-value”, since it appears only on the left-hand-side of assignment statements.
Given this insight, we can now easily complete the definition of the assignment statement:
[set (var value)
(type-case Value×Store (interp value env store)
[v×s (value-value value-store)
(local ([define the-loc (env-lookup var env)])
(v×s value-value
(aSto the-loc value-value value-store)))])]
The rest of the interpreter remains unchanged. Note, in particular, that it still employs store-passing style.
Figure 14.1 and Figure 14.2 present the core of the interpreter.
14.2
Interaction Between Variables and Function Application
Variables and function application appear to be two independent language features, but perhaps they are not.
Consider the following program:
{with {v 0}
{with {f {fun {y}
{set y 5}}}
{seqn {f v}
v}}}
What do we expect it to evaluate to? There are two different, reasonable answers: 0 and 5. The first assumes
that the mutation is to the formal variable, y, and does not affect the actual argument, v; the second assumes
that this mutation does have the effect of modifying the actual argument.
Our current implementation yields the value 0. This is because the act of invoking the function binds
the formal parameter to a new location:
(local ([define new-loc (next-location arg-store)])
(interp (closureV-body fun-value)
(aSub (closureV-param fun-value)
new-loc
(closureV-env fun-value))
(aSto new-loc
arg-value
arg-store)))
136
CHAPTER 14. VARIABLES
The evaluated argument is held in this new location. Therefore, changes to the content of that location in the
store do not affect the actual parameter. This is the standard form of eager evaluation, traditionally called
call-by-value.
Let’s now explore the alternative. This form of evaluation is called call-by-reference. This new technique
gets its name because we will pass a reference to the actual argument, rather than merely its value. Thus,
updates to the reference within the called procedure will become visible to the calling context, too.
To explore this design, let’s extend our language further so we have two kinds of procedures: call-by-
value (fun) and call-by-reference (refun):
<RVCFAE> ::= ...
| {refun {<id>} <RVCFAE>}
| {set <id> <RVCFAE>}
| {seqn <RVCFAE> <RVCFAE>}
That is, syntactically a call-by-reference procedure looks the same as a call-by-value procedure other than
the distinguishing keyword. It is their interpretation that will distinguish them.
All the code we have developed thus far remains the same for call-by-value procedure invocation. In
particular, with expressions should continue to expand into immediate applications of fun-defined proce-
dures. Let us proceed to defining the interpretation of call-by-reference procedures.
The first step is to evaluate a reference procedure definition. This is straightforward:
[refun (bound-id bound-body)
(v×s (refclosV bound-id bound-body env) store)]
We create a new kind of closure so we can later distinguish what kind of procedure we are about to apply,
but its fields are the same:
(define-type RVCFAE-Value
[numV (n number?)]
[closureV (param symbol?)
(body RVCFAE?)
(sc SubCache?)]
[refclosV (param symbol?)
(body RVCFAE?)
(sc SubCache?)])
Now let us study the interpretation of application. After evaluating the procedure position, we must check
which kind of procedure it is before evaluating the argument. If it’s a call-by-value procedure, we proceed
as before:
[app (fun-expr arg-expr)
(type-case Value×Store (interp fun-expr env store)
[v×s (fun-value fun-store)
(type-case RVCFAE-Value fun-value
[closureV (cl-param cl-body cl-env)
(type-case Value×Store (interp arg-expr env fun-store)
14.3. PERSPECTIVE
137
[v×s (arg-value arg-store)
(local ([define new-loc (next-location arg-store)])
(interp cl-body
(aSub cl-param
new-loc
cl-env)
(aSto new-loc
arg-value
arg-store)))])]
[refclosV (cl-param cl-body cl-env)
...]
[numV ( ) (error ’interp ”trying to apply a number”)])])]
We can thus focus our attention on the interpretation of function applications where the function position
evaluates to a reference procedure closure.
When applying a call-by-reference procedure, we must supply it with the location of the actual argument.
This presupposes that the actual argument will be a variable. We put this down to an implicit constraint of
the language, namely that whenever applying a reference procedure, we assume that the argument expres-
sion is syntactically a variable. Given this, we can easily determine its location, and extend the closure’s
environment with the formal parameter bound to this location:
[refclosV (cl-param cl-body cl-env)
(local ([define arg-loc (env-lookup (id-name arg-expr) env)])
(interp cl-body
(aSub cl-param
arg-loc
cl-env)
fun-store))]
Notice the recurrence of the l-value pattern: an environment lookup without a corresponding store lookup.
(This is why we dispatched on the type of the closure without first evaluating the argument: had we not done
so, the argument expression would have been reduced to a value, which would be either useless or incorrect
in the case of reference procedures.) As a result, any mutations to the formal parameter are now changes
to the same location as the actual parameter, and are thus effectively mutations to the actual parameter also.
Thus, the example that inaugurated this section will yield the result 5.
Figure 14.3 and Figure 14.4 present the core of the interpreter.
14.3
Perspective
Should languages have reference procedures? Passing references to procedures has the following dangerous
property: the formal parameter becomes an alias of the actual parameter, as all changes to the formal
manifest as changes to the actual also. This is especially insidiuous because the programmer may not
know he is about to apply a reference procedure: Some languages like C offer the ability to mark specific
138
CHAPTER 14. VARIABLES
parameters of multi-parameter procedures with keywords such as & and ref, meaning they alone should be
passed by reference (these are known as reference parameters). The client of such a procedure may thus find
that, mysteriously, the act of invoking this procedure has changed the value of his identifiers. This aliasing
effect can lead to errors that are particularly difficult to detect and diagnose.
This phenomenon cannot occur with call-by-value: changes to the variable in the called procedure do
not affect the caller. There is, therefore, nearly universal agreement in modern languages that arguments
should be passed by value. If the called procedure intends to mutate a value, it must consume a box (or other
container data structure); the caller must, in turn, signal acceptance of this behavior by passing a box as the
actual argument. The caller then has the freedom to inspect the content of the (possibly mutated) box and
determine whether to accept this mutation in the remainder of the program, or to reject it by ignoring the
altered content of the box.
Why did languages introduce reference parameters? For one thing, they are “cheaper”: they do not
require additional allocation. (We can see this difference clearly when we contrast the two kinds of procedure
application.) However, the problems they introduce arguably far outweigh this small savings in memory.
Reference parameters do, however, also confer a small expressiveness benefit. Without reference pa-
rameters, we cannot define a procedure that swaps the content of two variables. In the following code,
{with {swap {fun {x}
{fun {y}
{with {z x}
{seqn {set x y}
{set y z}}}}}}
{with {a 3}
{with {b 2}
{seqn {{swap a} b}
b}}}}
the result of the computation is still 2, because the mutations to x and y inside the procedure do not affect
a and b. In contrast,
{with {swap {refun {x}
{refun {y}
{with {z x}
{seqn {set x y}
{set y z}}}}}}
{with {a 3}
{with {b 2}
{seqn {{swap a} b}
b}}}}
results in the value 3: since x and y are just aliases to a and b, mutations to the former are reflected as
mutations to the latter. (Note that both procedures must be refuns and not funs, else the swap is at best
partial.)
This example also, however, illustrates why aliasing can cause problems. The implementor of the pro-
cedure may have used mutation accidentally, without meaning to affect the caller. The procedure boundary
14.3. PERSPECTIVE
139
abstraction has, however, been compromised by the aliasing, and accidental side-effects can leak into the
calling contexts, exposing unnecessary implementation details of the procedure.
In the early days of programming language design, before programs were particularly sophisticated, the
ability to write simple abstractions such as swap was considered valuable (since it is used, for instance, in
the implementation of some sorting algorithms). Today, however, we recognize that such abstractions are
rather meager in the face of the needs of modern systems. We pay greater attention, instead, to the need
for creating useful abstraction boundaries between units of modularity such as procedures: the fewer hidden
interactions they have, and the less they interfere with one another, the more easily we can reason about their
behavior in isolation.
Exercise 14.3.1 While call-by-value preserves the value of variables in the calling context, it does not
protect all values. In particular, in many call-by-value languages, a composite data structure (such as a
vector) passed as an argument may be mutated by the callee, with the effects visible to the caller.
1. Does this behavior contradict the claim that the language is passing “values” as arguments? Use our
investigation of mutable data structures in Section 13 to make your argument rigorous.
Hint: Implement an int