Programming Languages: Application and Interpretation by Shriram Krishnamurthi - HTML preview

PLEASE NOTE: This is an HTML preview only and some elements such as links or page numbers may be incorrect.
Download the book in PDF, ePub, Kindle for a complete version.

Given this new kind of value, let’s study the interpretation of the four new constructs.

Sequences are easy. The interpreter evaluates the first sub-expression, ignores the resulting value (thus

having evaluated the sub-expression only for the effect it might have on the store)—notice that e1-value is

bound but never used—and returns the result of evaluating the second expression in the store (potentially)

modified by evaluating the first sub-expression:

[seqn (e1 e2)

(type-case Value×Store (interp e1 env store)

[v×s (e1-value e1-store)

(interp e2 env e1-store)])]

The essence of newbox is to obtain a new storage location, wrap its address in a boxV, and return the

boxV as the value portion of the response accompanied by an extended store.

[newbox (value-expr)

(type-case Value×Store (interp value-expr env store)

[v×s (expr-value expr-store)

(local ([define new-loc (next-location expr-store)])

(v×s (boxV new-loc)

(aSto new-loc expr-value expr-store)))])]

To modify the content of a box, the interpreter first evaluates the first sub-expression to a location, then

updates the store with a new value for the same location. Because all expressions must return a value,

setbox chooses to return the new value put in the box as the value of the entire expression.

[setbox (box-expr value-expr)

(type-case Value×Store (interp box-expr env store)

[v×s (box-value box-store)

(type-case Value×Store (interp value-expr env box-store)

[v×s (value-value value-store)

(v×s value-value

(aSto (boxV-location box-value)

value-value

value-store))])])]

Opening a box is straightforward: get a location, look it up in the store, and return the resulting value.

[openbox (box-expr)

(type-case Value×Store (interp box-expr env store)

13.4. SCOPE VERSUS EXTENT

129

[v×s (box-value box-store)

(v×s (store-lookup (boxV-location box-value)

box-store)

box-store)])]

Of course, the term “the store” is ambiguous, because there are different stores before and after evaluating

the sub-expression. Which store should we use? Does this interpreter do the right thing?

All that remains is to implement next-location. Here’s one implementation:

(define next-location

(local ([define last-loc (box −1)])

(lambda (store)

(begin

(set-box! last-loc (+ 1 (unbox last-loc)))

(unbox last-loc)))))

This is an extremely unsatisfying way to implement next-location, because it ultimately relies on a box!

However, this box is not essential. Can you get rid of it?

The core of the interpreter is in Figure 13.1 and Figure 13.2.

Exercise 13.3.2 Define next-location so it does not have side-effects.

Hint: You may need to modify the interpreter to do this.

13.4

Scope versus Extent

Notice that while closures refer to the environment of definition, they do not refer to the corresponding

store. The store is therefore a global record of changes made during execution. As a result, stores and

environments have different patterns of flow. Whereas the interpreter employs the same environment for

both arms of an addition, for instance, it cascades the store from one arm to the next and then back out

alongside the resulting value. This latter kind of flow is sometimes called threading, since it resembles the

action of a needle through cloth.

These two flows of values through the interpreter correspond to a deep difference between names and

values. A value persists in the store long after the name that introduced it has disappeared from the envi-

ronment. This is not inherently a problem, because the value may have been the result of a computation,

and some other name may have since come to be associated with that value. In other words, identifiers have

lexical scope; values themselves, however, potentially have indefinite, dynamic extent.

Some languages confuse these two ideas. As a result, when an identifier ceases to be in scope, they

remove the value corresponding to the identifier. That value may be the result of the computation, however,

and some other identifier may still have a reference to it. This premature removal of the value will, therefore,

inevitably lead to a system crash. Depending on how the implementation “removes” the value, however, the

system may crash later instead of sooner, leading to extremely vexing bugs. This is a common problem in

languages like C and C++.

In most cases, garbage collection (Section 21) lets languages dissociate scope from the reclamation of

space consumed by values. The performance of garbage collectors is often far better than we might na¨ıvely

imagine (especially in comparison to the alternative).

130

CHAPTER 13. MUTABLE DATA STRUCTURES

In terms of language design, there are many reasons why C and C++ have adopted the broken policy of

not distinguishing between scope and extent. These reasons roughly fall into the following categories:

• Justified concerns about fine-grained performance control.

• Mistakes arising from misconceptions about performance.

• History (we understand things better now than we did then).

• Ignorance of concepts that were known even at that time.

Whatever their reasons, these language design flaws have genuine and expensive consequences: they cause

both errors and poor performance in programs. These errors, in particular, can lead to serious security

problems, which have serious financial and social consequences. Therefore, the questions we raise here are

not merely academic.

While programmers who are experts in these languages have evolved a series of ad hoc techniques for

contending with these problems, we students of programming languages should know better. We should

recognize their techniques for what they are, namely symptoms of a broken programming language design

rather than proper solutions to a problem. Serious students of languages and related computer science

technologies take these flaws as a starting point for exploring new and better designs.

Exercise 13.4.1 Modify the interpreter to evaluate addition from right to left instead of left-to-right. Con-

struct a test case that should yield different answers in the two cases, and show that your implementation

returns the right value on your test case.

Exercise 13.4.2 Modify seqn to permit an arbitrary number of sub-expressions, not just two. They should

evaluate in left-to-right order.

Exercise 13.4.3 New assignments to a location currently mask the old thanks to the way we’ve defined

store-lookup, but the data structure still has a record of the old assignments. Modify the implementation of

stores so that they have at most one assignment for each location.

Exercise 13.4.4 Use Scheme procedures to implement the store as a partial function.

13.4. SCOPE VERSUS EXTENT

131

;; interp : BCFAE Env Store → Value×Store

(define (interp expr env store)

(type-case BCFAE expr

[num (n) (v×s (numV n) store)]

[add (l r)

(type-case Value×Store (interp l env store)

[v×s (l-value l-store)

(type-case Value×Store (interp r env l-store)

[v×s (r-value r-store)

(v×s (num+ l-value r-value)

r-store)])])]

[id (v) (v×s (store-lookup (env-lookup v env) store) store)]

[fun (bound-id bound-body)

(v×s (closureV bound-id bound-body env) store)]

[app (fun-expr arg-expr)

(type-case Value×Store (interp fun-expr env store)

[v×s (fun-value fun-store)

(type-case Value×Store (interp arg-expr env fun-store)

[v×s (arg-value arg-store)

(local ([define new-loc (next-location arg-store)])

(interp (closureV-body fun-value)

(aSub (closureV-param fun-value)

new-loc

(closureV-env fun-value))

(aSto new-loc

arg-value

arg-store)))])])]

[if0 (test truth falsity)

(type-case Value×Store (interp test env store)

[v×s (test-value test-store)

(if (num-zero? test-value)

(interp truth env test-store)

(interp falsity env test-store))])]

...

Figure 13.1: Implementing Mutable Data Structures , Part 1

132

CHAPTER 13. MUTABLE DATA STRUCTURES

...

[newbox (value-expr)

(type-case Value×Store (interp value-expr env store)

[v×s (expr-value expr-store)

(local ([define new-loc (next-location expr-store)])

(v×s (boxV new-loc)

(aSto new-loc expr-value expr-store)))])]

[setbox (box-expr value-expr)

(type-case Value×Store (interp box-expr env store)

[v×s (box-value box-store)

(type-case Value×Store (interp value-expr env box-store)

[v×s (value-value value-store)

(v×s value-value

(aSto (boxV-location box-value)

value-value

value-store))])])]

[openbox (box-expr)

(type-case Value×Store (interp box-expr env store)

[v×s (box-value box-store)

(v×s (store-lookup (boxV-location box-value)

box-store)

box-store)])]

[seqn (e1 e2)

(type-case Value×Store (interp e1 env store)

[v×s (e1-value e1-store)

(interp e2 env e1-store)])]))

Figure 13.2: Implementing Mutable Data Structures, Part 2

Chapter 14

Variables

In Section 13 we studied the implementation of mutable data structures. The boxes we studied there could

just as well have been vectors or other container types, such as objects with fields.

In traditional languages like C and Java, there are actually two forms of mutation. One is mutating the

value in a container, such as an object (in Java). The expression

o.f = e

evaluates o to an object, e to some value, and changes the content of field f of o to hold the value of e. Note

that o can be an arbitrary expression (for instance, it can look up an object in some other data structure) that

is evaluated to a value. In contrast, a programmer can also write a method such as

void m (int i) {

i = 4;

}

Here, i must literally be an identifier; it cannot be an arbitrary expression that evaluates to an identifier.

That is, we are not mutating the value contained within a box (or position in a vector, or a field); rather, we

are mutating the value bound to an identifier itself. That makes the identifier a variable. A more interesting

example would be a pattern that repeatedly occurs in object-oriented programming:

private int x;

void set_x (int y) {

x = y;

}

void get_x () {

return x;

}

Here, the variable x is private to the object, and can be accessed only through the getter and setter methods.

The setter assigns a new value to x.

133

134

CHAPTER 14. VARIABLES

14.1

Implementing Variables

First, let’s extend our language to include variables:

<VCFAE> ::= ...

| {set <id> <VCFAE>}

| {seqn <VCFAE> <VCFAE>}

Observe that the set expression expects a literal identifier after the keyword.

Implementing variables is a little different from implementing boxes. In the latter case, we first evaluate

the position that identifies the box:

[setbox (box-expr value-expr)

(type-case Value×Store (interp box-expr env store)

[v×s (box-value box-store)

...])]

In contrast, in a language with variables, identifiers do not represent boxes. Therefore, the corresponding

code:

[set (var value)

(type-case Value×Store (interp var env store)

[v×s (var-value var-store)

...])]

would be counter-productive. Evaluating the identifier would result in a value, which we cannot mutate, as

opposed to a location, whose content we can modify by updating the store. This immediately suggests a

slightly different evaluation strategy:

[set (var value)

(type-case Value×Store (interp value env store)

[v×s (value-value value-store)

(local ([define the-loc (env-lookup var env)])

...)])]

That is, we evaluate the expression that represents the new value to be stored in the interpreter. Instead of

evaluating the identifier, however, we only look it up in the environment. This results in a location where

the new value should be stored. In particular, notice an unusual pattern: the interpreter dereferences the

identifier in the environment, but does not dereference the result (the identifier’s location) in the store. We

have not seen this pattern before, and will never see it again after this material.

Many languages make a distinction between mutable data structures and mutable identifiers. When a

mutable identifier appears in the assignment position of an assignment operator (many languages use the

same syntactic operator, = or :=, to represent both operations), the language implementation only partially

resolves the identifier. This special kind of value—the location of an identifier—is traditionally known as

an l-value (pronounced “ell-value”).

Whence this unusual name? Consider the following two statements in C:

14.2. INTERACTION BETWEEN VARIABLES AND FUNCTION APPLICATION

135

x = 2;

y = x;

In the second statement, x must be reduced to a value—i.e., store-lookup and env-lookup must be composed

and applied to its content—whereas in the first statement, x must only be reduced to a location, not to a

value. In languages where locations are not values (more on that below), this odd kind of “value” is known

as an “l-value”, since it appears only on the left-hand-side of assignment statements.

Given this insight, we can now easily complete the definition of the assignment statement:

[set (var value)

(type-case Value×Store (interp value env store)

[v×s (value-value value-store)

(local ([define the-loc (env-lookup var env)])

(v×s value-value

(aSto the-loc value-value value-store)))])]

The rest of the interpreter remains unchanged. Note, in particular, that it still employs store-passing style.

Figure 14.1 and Figure 14.2 present the core of the interpreter.

14.2

Interaction Between Variables and Function Application

Variables and function application appear to be two independent language features, but perhaps they are not.

Consider the following program:

{with {v 0}

{with {f {fun {y}

{set y 5}}}

{seqn {f v}

v}}}

What do we expect it to evaluate to? There are two different, reasonable answers: 0 and 5. The first assumes

that the mutation is to the formal variable, y, and does not affect the actual argument, v; the second assumes

that this mutation does have the effect of modifying the actual argument.

Our current implementation yields the value 0. This is because the act of invoking the function binds

the formal parameter to a new location:

(local ([define new-loc (next-location arg-store)])

(interp (closureV-body fun-value)

(aSub (closureV-param fun-value)

new-loc

(closureV-env fun-value))

(aSto new-loc

arg-value

arg-store)))

136

CHAPTER 14. VARIABLES

The evaluated argument is held in this new location. Therefore, changes to the content of that location in the

store do not affect the actual parameter. This is the standard form of eager evaluation, traditionally called

call-by-value.

Let’s now explore the alternative. This form of evaluation is called call-by-reference. This new technique

gets its name because we will pass a reference to the actual argument, rather than merely its value. Thus,

updates to the reference within the called procedure will become visible to the calling context, too.

To explore this design, let’s extend our language further so we have two kinds of procedures: call-by-

value (fun) and call-by-reference (refun):

<RVCFAE> ::= ...

| {refun {<id>} <RVCFAE>}

| {set <id> <RVCFAE>}

| {seqn <RVCFAE> <RVCFAE>}

That is, syntactically a call-by-reference procedure looks the same as a call-by-value procedure other than

the distinguishing keyword. It is their interpretation that will distinguish them.

All the code we have developed thus far remains the same for call-by-value procedure invocation. In

particular, with expressions should continue to expand into immediate applications of fun-defined proce-

dures. Let us proceed to defining the interpretation of call-by-reference procedures.

The first step is to evaluate a reference procedure definition. This is straightforward:

[refun (bound-id bound-body)

(v×s (refclosV bound-id bound-body env) store)]

We create a new kind of closure so we can later distinguish what kind of procedure we are about to apply,

but its fields are the same:

(define-type RVCFAE-Value

[numV (n number?)]

[closureV (param symbol?)

(body RVCFAE?)

(sc SubCache?)]

[refclosV (param symbol?)

(body RVCFAE?)

(sc SubCache?)])

Now let us study the interpretation of application. After evaluating the procedure position, we must check

which kind of procedure it is before evaluating the argument. If it’s a call-by-value procedure, we proceed

as before:

[app (fun-expr arg-expr)

(type-case Value×Store (interp fun-expr env store)

[v×s (fun-value fun-store)

(type-case RVCFAE-Value fun-value

[closureV (cl-param cl-body cl-env)

(type-case Value×Store (interp arg-expr env fun-store)

14.3. PERSPECTIVE

137

[v×s (arg-value arg-store)

(local ([define new-loc (next-location arg-store)])

(interp cl-body

(aSub cl-param

new-loc

cl-env)

(aSto new-loc

arg-value

arg-store)))])]

[refclosV (cl-param cl-body cl-env)

...]

[numV ( ) (error ’interp ”trying to apply a number”)])])]

We can thus focus our attention on the interpretation of function applications where the function position

evaluates to a reference procedure closure.

When applying a call-by-reference procedure, we must supply it with the location of the actual argument.

This presupposes that the actual argument will be a variable. We put this down to an implicit constraint of

the language, namely that whenever applying a reference procedure, we assume that the argument expres-

sion is syntactically a variable. Given this, we can easily determine its location, and extend the closure’s

environment with the formal parameter bound to this location:

[refclosV (cl-param cl-body cl-env)

(local ([define arg-loc (env-lookup (id-name arg-expr) env)])

(interp cl-body

(aSub cl-param

arg-loc

cl-env)

fun-store))]

Notice the recurrence of the l-value pattern: an environment lookup without a corresponding store lookup.

(This is why we dispatched on the type of the closure without first evaluating the argument: had we not done

so, the argument expression would have been reduced to a value, which would be either useless or incorrect

in the case of reference procedures.) As a result, any mutations to the formal parameter are now changes

to the same location as the actual parameter, and are thus effectively mutations to the actual parameter also.

Thus, the example that inaugurated this section will yield the result 5.

Figure 14.3 and Figure 14.4 present the core of the interpreter.

14.3

Perspective

Should languages have reference procedures? Passing references to procedures has the following dangerous

property: the formal parameter becomes an alias of the actual parameter, as all changes to the formal

manifest as changes to the actual also. This is especially insidiuous because the programmer may not

know he is about to apply a reference procedure: Some languages like C offer the ability to mark specific

138

CHAPTER 14. VARIABLES

parameters of multi-parameter procedures with keywords such as & and ref, meaning they alone should be

passed by reference (these are known as reference parameters). The client of such a procedure may thus find

that, mysteriously, the act of invoking this procedure has changed the value of his identifiers. This aliasing

effect can lead to errors that are particularly difficult to detect and diagnose.

This phenomenon cannot occur with call-by-value: changes to the variable in the called procedure do

not affect the caller. There is, therefore, nearly universal agreement in modern languages that arguments

should be passed by value. If the called procedure intends to mutate a value, it must consume a box (or other

container data structure); the caller must, in turn, signal acceptance of this behavior by passing a box as the

actual argument. The caller then has the freedom to inspect the content of the (possibly mutated) box and

determine whether to accept this mutation in the remainder of the program, or to reject it by ignoring the

altered content of the box.

Why did languages introduce reference parameters? For one thing, they are “cheaper”: they do not

require additional allocation. (We can see this difference clearly when we contrast the two kinds of procedure

application.) However, the problems they introduce arguably far outweigh this small savings in memory.

Reference parameters do, however, also confer a small expressiveness benefit. Without reference pa-

rameters, we cannot define a procedure that swaps the content of two variables. In the following code,

{with {swap {fun {x}

{fun {y}

{with {z x}

{seqn {set x y}

{set y z}}}}}}

{with {a 3}

{with {b 2}

{seqn {{swap a} b}

b}}}}

the result of the computation is still 2, because the mutations to x and y inside the procedure do not affect

a and b. In contrast,

{with {swap {refun {x}

{refun {y}

{with {z x}

{seqn {set x y}

{set y z}}}}}}

{with {a 3}

{with {b 2}

{seqn {{swap a} b}

b}}}}

results in the value 3: since x and y are just aliases to a and b, mutations to the former are reflected as

mutations to the latter. (Note that both procedures must be refuns and not funs, else the swap is at best

partial.)

This example also, however, illustrates why aliasing can cause problems. The implementor of the pro-

cedure may have used mutation accidentally, without meaning to affect the caller. The procedure boundary

14.3. PERSPECTIVE

139

abstraction has, however, been compromised by the aliasing, and accidental side-effects can leak into the

calling contexts, exposing unnecessary implementation details of the procedure.

In the early days of programming language design, before programs were particularly sophisticated, the

ability to write simple abstractions such as swap was considered valuable (since it is used, for instance, in

the implementation of some sorting algorithms). Today, however, we recognize that such abstractions are

rather meager in the face of the needs of modern systems. We pay greater attention, instead, to the need

for creating useful abstraction boundaries between units of modularity such as procedures: the fewer hidden

interactions they have, and the less they interfere with one another, the more easily we can reason about their

behavior in isolation.

Exercise 14.3.1 While call-by-value preserves the value of variables in the calling context, it does not

protect all values. In particular, in many call-by-value languages, a composite data structure (such as a

vector) passed as an argument may be mutated by the callee, with the effects visible to the caller.

1. Does this behavior contradict the claim that the language is passing “values” as arguments? Use our

investigation of mutable data structures in Section 13 to make your argument rigorous.

Hint: Implement an int