Programming Languages: Application and Interpretation by Shriram Krishnamurthi - HTML preview


This seems to create an entirely new level of difficulty. Addition seemed to need the ability to modify the numeral to apply the first argument one more time, but we found a clever way of applying it “from the outside”. Subtraction, on the other hand, requires that a procedure not be applied, which seems harder still.

The solution to this problem is to make the following observation. Consider the pair ⟨0, 0⟩. Now apply the following algorithm. Given such a pair of numbers, create a new pair. The new pair’s left component is the old pair’s right component; the new pair’s right component is the old pair’s right component incremented by one. Visually,

initial value      : ⟨0, 0⟩
after 1 iteration  : ⟨0, 1⟩
after 2 iterations : ⟨1, 2⟩
after 3 iterations : ⟨2, 3⟩
after 4 iterations : ⟨3, 4⟩

and so on.

You might find this procedure rather strange, in that it entirely ignores the left component of each preceding pair to create the next one in the sequence. Notice, however, that after n iterations, the left component holds the value n − 1. Furthermore, observe the operations that we used to obtain these pairs: creation of an initial pair, pair deconstruction, increment by one, and new pair construction. That is, the following procedure represents the algorithm applied at each step:

(lambda (p)
  (pair (right p)
        (succ (right p))))

The following represents the initial value:

(pair zero zero)

If we apply this n times, then read out the left value of the resulting pair, we get . . . n − 1!
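To see the pattern concretely, here is a small sketch (in Python rather than Scheme, and with ordinary tuples standing in for the encoded pairs) of the algorithm above:

```python
# One step of the algorithm: the new left is the old right;
# the new right is the old right incremented by one.
def step(p):
    return (p[1], p[1] + 1)

def iterate(n):
    p = (0, 0)          # initial value <0, 0>
    for _ in range(n):  # apply the step n times
        p = step(p)
    return p

# After n iterations (n >= 1) the left component holds n - 1.
```

For instance, iterate(4) produces (3, 4), whose left component is 4 − 1.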

pred ≡
(lambda (n)
  (left
   ((n (lambda (p)
         (pair (right p)
               (succ (right p)))))
    (pair zero zero))))
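The whole construction can be checked end to end. Below is a sketch in Python, whose lambda plays the role of Scheme’s; zero, succ, pair, left, and right are the standard Church encodings this chapter assumes, and to_int is only a decoding helper for inspection:

```python
# Standard Church encodings, as assumed from earlier in the chapter.
zero = lambda f: lambda x: x
succ = lambda n: lambda f: lambda x: f(n(f)(x))
pair = lambda a: lambda b: lambda sel: sel(a)(b)
left = lambda p: p(lambda a: lambda b: a)
right = lambda p: p(lambda a: lambda b: b)

# pred: apply the pair-shifting step n times to <0, 0>,
# then read out the left slot.
pred = lambda n: left(
    n(lambda p: pair(right(p))(succ(right(p))))
    (pair(zero)(zero)))

# Decode a numeral into a Python int, for inspection only.
to_int = lambda n: n(lambda x: x + 1)(0)
```

With three = succ(succ(succ(zero))), to_int(pred(three)) gives 2; note also that pred of zero is zero, since zero applies the step no times at all.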

Once we have subtraction by one, we can implement regular subtraction and other such operations. That leaves only one arithmetic-related primitive we need to implement factorial, namely Scheme’s zero?. What does this operator do? Given a representation for zero, it returns true, otherwise false. What is the one characteristic that distinguishes the numeral for zero from that for all non-zero numbers? The latter all apply their first argument to their second at least once, while the former does not. Therefore, the following defines zero?:

iszero ≡
(lambda (n)
  ((n (lambda (ignore) no)) yes))

If the first argument is applied at all, no matter how many times it’s applied, it returns the representation of false; if it never is, then the “zero” value, the representation of true, results.
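Continuing the same Python sketch (with yes and no as the Church booleans, selecting their first and second arguments respectively), the definition can be tried out directly:

```python
# Church booleans: yes selects its first argument, no its second.
yes = lambda a: lambda b: a
no = lambda a: lambda b: b

# Church numerals, as before.
zero = lambda f: lambda x: x
succ = lambda n: lambda f: lambda x: f(n(f)(x))

# If n applies its first argument even once, the constant-no
# function fires; otherwise the initial yes survives untouched.
iszero = lambda n: n(lambda ignore: no)(yes)

# Decode a Church boolean into a Python bool, for inspection only.
to_bool = lambda b: b(True)(False)
```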

Historical aside: These numerals are known as the Church numerals, in honor of their inventor, Alonzo Church.

Exercise 22.3.1 Can you extend this encoding to other kinds of numbers: negative integers, rationals, reals, complex numbers, . . . ?
Hint: Some of these are easy to encode using pairs of other kinds of numbers.

Exercise 22.3.2 Here are two alternate representations of the exponentiation operation. Which one is faster?

(define exp1
  (lambda (m)
    (lambda (n)
      ((n (prod m)) one))))

(define exp2
  (lambda (m)
    (lambda (n)
      (n m))))
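To experiment with this exercise, both encodings can be transcribed into a Python sketch and checked to agree on small inputs (here prod and one are assumed to be Church multiplication and the numeral for one, as used earlier); which is faster is left to the exercise:

```python
# Church numerals and multiplication, as assumed in the chapter.
zero = lambda f: lambda x: x
succ = lambda n: lambda f: lambda x: f(n(f)(x))
one = succ(zero)
prod = lambda m: lambda n: lambda f: m(n(f))  # Church multiplication

exp1 = lambda m: lambda n: n(prod(m))(one)  # n-fold multiplication by m
exp2 = lambda m: lambda n: n(m)             # apply the numeral n to m

# Decode a numeral into a Python int, for inspection only.
to_int = lambda n: n(lambda x: x + 1)(0)
```

Both encodings compute 2³ = 8 from the numerals for 2 and 3, so any difference between them must lie elsewhere.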

22.4 Eliminating Recursion

The define construct of Scheme is surprisingly powerful. It not only assigns values to names, it also enables the construction of recursive procedures (otherwise the definition of factorial given above would not function). To eliminate define, therefore, we must create a way of defining recursive procedures . . . without recursion!


CHAPTER 22. SHRINKING THE LANGUAGE

As we do this, we will sometimes encounter expressions that we’re not yet sure how to write. We will use the symbol • to represent a special value: if the computation ever tries to apply it, it halts immediately. Think of it as a landmine that the computation should never apply.²

Let’s now study the recursion in factorial. Let’s begin with the following skeletal definition:

fact ≡
(lambda (n)
  (if (zero? n)
      1
      (∗ n (• (sub1 n)))))

This definition is not entirely useless. Indeed, it correctly computes the factorial on the input 0. On any input greater than 0, however, the computation terminates uselessly.

We can make a more useful definition by including a copy of fact as follows:

fact ≡
(lambda (n)
  (if (zero? n)
      1
      (∗ n ((lambda (n)
              (if (zero? n)
                  1
                  (∗ n (• (sub1 n)))))
            (sub1 n)))))

This definition works perfectly well on inputs of 0 and 1, but not greater. We can repeat this process endlessly—a process, not at all coincidentally, reminiscent of creating the proper environment for recursion in Section 9—but obviously, we will never get the true definition of factorial. We’ll have to do better.

While we’re trying to generate a spark of insight, let’s try to clean up the code above. Instead of relying on this unspecified • operation, we might as well just parameterize over that location in the program:

mk-fact ≡
(lambda (f)
  (lambda (n)
    (if (zero? n)
        1
        (∗ n (f (sub1 n))))))

The resulting procedure isn’t quite factorial itself, but rather a factorial-maker: given the right value for the parameter, it will yield the proper factorial procedure. That still leaves the question, however, of what to supply as a parameter.

Let’s go back to our doomed attempt to nest copies of the factorial procedure. This has the advantage that, until the copies run out, there is always another copy available. So we have a clearer handle on the problem now: we need to provide as a parameter something that will create another copy upon demand.

² A good approximation of • is the Scheme procedure exit.


It would seem that mk-fact is just such a creator of copies. So what happens if we feed mk-fact as the argument to mk-fact—

(mk-fact mk-fact)

—to fill the hole where we need a factorial generator? This application results in the procedure

(lambda (n)
  (if (zero? n)
      1
      (∗ n (mk-fact (sub1 n)))))

(We’ve just substituted mk-fact for f in the body of mk-fact.) We can safely apply this procedure to 0 and obtain 1, but if we apply it to any larger input, we get an error: mk-fact is expecting a procedure as its argument, but here we’re applying it to a number.

Okay, so we cannot apply mk-fact to a number. To gain some insight into what we can apply it to, let’s apply it to •, so the new definition of mk-fact is:

(lambda (f)
  (lambda (n)
    (if (zero? n)
        1
        (∗ n ((f •) (sub1 n))))))

and apply this to mk-fact. Upon substitution, this evaluates to

(lambda (n)
  (if (zero? n)
      1
      (∗ n ((mk-fact •) (sub1 n)))))

This procedure clearly works correctly on the argument value 0. For argument 1, evaluation results in:

(∗ 1
   (if (zero? 0)
       1
       (∗ 0 ((• •) (sub1 0)))))

which also works correctly. For initial arguments of 2 and greater, however, evaluation halts on trying to apply •. In short, we have a definition of mk-fact that, when applied to itself, works correctly for values of 0 and 1, but no higher.

By itself, this does not seem like any kind of progress at all. In fact, however, we have come most of the way to a solution. Recall that earlier, to define a factorial procedure for values of 0 and 1, we had to copy the definition of the procedure, resulting in a definition that was roughly twice as large as the one we wanted. Now we have one that is roughly the same size as the original, i.e., it involves no copying.

It is not surprising that the computation eventually terminated, when we supplied • as the argument. But what else could we have supplied? Observe that when we supplied mk-fact as an argument, the term in the “recursive” position evaluated to (mk-fact •), which continued computation correctly (for one more step); whereas supplying • made that term (• •), which terminated it. Therefore, we should ensure that the function position of that application is always mk-fact. This means precisely that instead of supplying • as the argument, we should supply mk-fact! This should not be surprising: just as applying mk-fact to itself outside the body of mk-fact was useful in initiating the computation, doing the same inside is useful in continuing it.

That is, we must define

mk-fact ≡
(lambda (f)
  (lambda (n)
    (if (zero? n)
        1
        (∗ n ((f f) (sub1 n))))))

such that factorial is ostensibly defined by

(mk-fact mk-fact)

Does that work? Substituting as before, we get

(lambda (n)
  (if (zero? n)
      1
      (∗ n ((mk-fact mk-fact) (sub1 n)))))

This of course means we can substitute the inner application also:

(lambda (n)
  (if (zero? n)
      1
      (∗ n ((lambda (n)
              (if (zero? n)
                  1
                  (∗ n ((mk-fact mk-fact) (sub1 n)))))
            (sub1 n)))))

and now we can see the recursion unwind: as we need another copy of factorial, the application of mk-fact to itself generates a fresh copy. Thus, we have a satisfactory solution to the problem of defining the “recursive” factorial function without any use of recursion!

To summarize, we have the following definition,

mk-fact ≡
(lambda (f)
  (lambda (n)
    (if (zero? n)
        1
        (∗ n ((f f) (sub1 n))))))


with factorial defined by

fact ≡
(mk-fact mk-fact)

That is, factorial is

((lambda (mk-fact)
   (mk-fact mk-fact))
 (lambda (f)
   (lambda (n)
     (if (zero? n)
         1
         (∗ n ((f f) (sub1 n)))))))

(Test this! In a fresh Scheme session, apply this expression directly to numeric values and make sure you get the factorial of the input as a result. Pretty amazing, huh?)
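The same experiment can be run outside Scheme; here is the expression transcribed into a Python sketch, with the identical structure of self-application:

```python
# factorial as pure self-application: no define, no recursion.
# The outer lambda binds mk_fact and immediately applies it to itself;
# inside, f(f) manufactures a fresh "copy" on each recursive step.
fact = (lambda mk_fact: mk_fact(mk_fact))(
    lambda f: lambda n: 1 if n == 0 else n * f(f)(n - 1))
```

Applying fact to 5 yields 120. Note that under Python’s eager evaluation this still terminates, because f(f) sits inside the untaken branch until it is actually needed.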

While this is a correct implementation of factorial, we seem to be writing a lot of code relative to the recursive version defined using define. Furthermore, we would like to know how much of this solution can apply to other functions also. With that in mind, let’s try to refactor this code a little. What would we like to write? As programmers, we would rather not have to keep track of the self-application in the body; that is, we would rather write

(make-recursive-procedure
 (lambda (fact)
   (lambda (n)
     (if (zero? n)
         1
         (∗ n (fact (sub1 n)))))))

which looks almost exactly like the definition created using define. So, how do we get the self-application out?

Observe that the definition of factorial above is equivalent to this one:

((lambda (mk-fact)
   (mk-fact mk-fact))
 (lambda (f)
   ((lambda (g)
      (lambda (n)
        (if (zero? n)
            1
            (∗ n (g (sub1 n))))))
    (f f))))

All we have done is introduce a new level of abstraction, binding g to the self-application of f. Note, however, that the inner (lambda (g) . . . ) expression is precisely the definition of factorial that we wanted to write, except that fact is called g. One more level of abstraction separates the factorial-specific code from the recursive function generator:

make-recursive-procedure ≡
(lambda (p)
  ((lambda (mk-fact)
     (mk-fact mk-fact))
   (lambda (f)
     (p (f f)))))

In fact, because this code has nothing to do with defining factorial at all, we can rename mk-fact:

make-recursive-procedure ≡
(lambda (p)
  ((lambda (f)
     (f f))
   (lambda (f)
     (p (f f)))))

This is now a generic procedure that creates recursive procedures out of its argument! It is remarkable that such a procedure even exists; its structure is daunting at first sight, but relatively easy to understand once you grasp the need for “copies” of a procedure, and that self-application generates as many of these copies as necessary.

In the literature, the procedure make-recursive-procedure is known as the Y combinator. It is sometimes also known as a “fixed-point combinator”, because it computes the fixed-point of its argument procedure.
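A transcription into an eager host language needs one adjustment: the inner self-application must be delayed behind a lambda, since evaluating (f f) immediately would never terminate. With that caveat, a Python sketch looks like this:

```python
# A fixed-point combinator usable under eager evaluation: the inner
# self-application f(f) is wrapped in a lambda so it is evaluated
# only when the recursive call is actually made.
make_recursive_procedure = lambda p: (
    (lambda f: f(f))
    (lambda f: p(lambda v: f(f)(v))))

# Create factorial exactly as in the text, with no explicit recursion.
fact = make_recursive_procedure(
    lambda fact: lambda n: 1 if n == 0 else n * fact(n - 1))
```

Here fact(6) yields 720; the delaying wrapper is the only departure from the definition above.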

The Lambda Calculus

With the definition of the Y combinator, we have reduced all of Scheme to just three primitives: procedure definition, procedure application, and variables. With just those three, we have provided an encoding of all of the rest of the language. This compact little language is the core of what is known, also for historical reasons, as the lambda calculus. The “calculus” part of the language is beyond the scope of our study.

In the 1930s, several mathematicians were asking fundamental questions about what could be computed procedurally, and about the relative power of different formalisms. While Alan Turing was defining his Turing machine formalism, Alonzo Church and several others created an alternate formalism: the lambda calculus. These mathematicians were able to demonstrate that several of their formalisms—particularly these two—were equivalent in expressive power, so theoreticians could choose one or the other based on convenience and suitability, without worrying about expressive constraints. (To this day, many choose to use the lambda calculus and its variants since it offers so much more expressive power than a Turing machine.) Indeed, the fact that so many independently-derived formalisms had the same expressive power led to the formulation of the Church-Turing thesis: that no formal language is more powerful than those defined by Church and Turing (the lambda calculus and the Turing machine, respectively).


Exercise 22.4.1 Type the definition of make-recursive-procedure into Scheme and use it to create a recursive factorial procedure:

(make-recursive-procedure
 (lambda (fact)
   (lambda (n)
     (if (zero? n)
         1
         (∗ n (fact (sub1 n)))))))

What do you observe? Explain and correct.


Chapter 23

Semantics

We have been writing interpreters in Scheme in order to understand various features of programming languages. What if we want to explain our interpreter to someone else? If that person doesn’t know Scheme, we can’t communicate how our interpreter works. It would be convenient to have some common language for explaining interpreters. We already have one: math!

Let’s try some simple examples. If our program is a number n, it just evaluates to some mathematical representation of n. We’ll use n to represent this number, whereas n itself will hold the numeral. For instance, the numeral 5 is represented by the number 5 (note the subtle differences in typesetting!). In other words, we will write

n ⇒ n

where we read ⇒ as “reduces to”. Numbers are already values, so they don’t need further reduction.

How about addition? We might be tempted to write

{+ l r} ⇒ l + r

In particular, the addition to the left of the ⇒ is in the programming language, while the one on the right happens in mathematics and results in a number. That is, the addition symbol on the left is syntactic. It could map to any mathematical operation. A particularly perverse language might map it to multiplication, but more realistically, it is likely to map to addition modulo some base to reflect fixed-precision arithmetic. It is the expression on the right that gives it meaning, and in this case it assigns the meaning we would expect (corresponding, say, to DrScheme’s use of unlimited-precision numbers for integers and rationals).

That said, this definition is unsatisfactory. Mathematical addition only works on numbers, but l and r might each be complex expressions in need of reduction to a value (in particular, a number) so they can be added together. We denote this as follows:

  l ⇒ lv        r ⇒ rv
  ---------------------
  {+ l r} ⇒ lv + rv

The terms above the bar are called the antecedents, and those below are the consequents. This rule is just a convenient way of writing an “if . . . then” expression: it says that if the conditions in the antecedent hold, then those in the consequent hold. If there are multiple conditions in the antecedent, they must all hold for the rule to hold. So we read the rule above as: if l reduces to lv, and if r reduces to rv, then adding the respective expressions results in the sum of their values. (In particular, it makes sense to add lv and rv, since each is now a number.) A rule of this form is called a judgment, because based on the truth of the conditions in the antecedent, it issues a judgment in the consequent (in this case, that the sum will be a particular value).

These rules subtly also bind names to values. That is, a different way of reading the rule is not as an “if . . . then” but rather as an imperative: it says “reduce l, call the result lv; reduce r, call its result rv; if these two succeed, then add lv and rv, and declare the sum the result for the entire expression”. Seen this way, l and r are bound in the consequent to the sub-expressions of the addition term, while lv and rv are bound in the antecedent to the results of evaluation (or reduction). This representation truly is an abstract description of the interpreter.

Let’s turn our attention to functions. We want them to evaluate to closures, which consist of a name, a body and an environment. How do we represent a structure in mathematics? A structure is simply a tuple, in this case a triple. (If we had multiple kinds of tuples, we might use tags to distinguish between them, but for now that won’t be necessary.) We would like to write

{fun {i} b} ⇒ ⟨i, b, ???⟩

but the problem is we don’t have a value for the environment to store in the closure. So we’ll have to make the environment explicit. From now on, ⇒ will always have a term and an environment on the left, and a value on the right. We first rewrite our two existing reduction rules:

  n, E ⇒ n

  l, E ⇒ lv        r, E ⇒ rv
  --------------------------
  {+ l r}, E ⇒ lv + rv

Now we can define a reduction rule for functions:

  {fun {i} b}, E ⇒ ⟨i, b, E⟩

Given an environment, we can also look up the value of identifiers:

  i, E ⇒ E(i)

All that remains is application. As with addition, application must first evaluate its subexpressions, so the general form of an application must be as follows:

  f, E ⇒ ???        a, E ⇒ ???
  -----------------------------
  {f a}, E ⇒ ???

What kind of value must f reduce to? A closure, naturally:

  f, E ⇒ ⟨i, b, E′⟩        a, E ⇒ ???
  -----------------------------------
  {f a}, E ⇒ ???

(We’ll use E′ to represent the closure environment, to make clear that it may be different from E.) We don’t particularly care what kind of value a reduces to; we’re just going to substitute it:

  f, E ⇒ ⟨i, b, E′⟩        a, E ⇒ av
  ----------------------------------
  {f a}, E ⇒ ???

But what do we write below? We have to evaluate the body, b, in the extended environment; whatever value it returns is the value of the application. So the evaluation of b also moves into the antecedent:

  f, E ⇒ ⟨i, b, E′⟩        a, E ⇒ av        b, ??? ⇒ bv
  -----------------------------------------------------
  {f a}, E ⇒ bv

In what environment do we reduce the body? It has to be the environment in the closure; if we use E instead of E′, we introduce dynamic rather than static scoping! But additionally, we must extend E′ with a binding for the identifier named by i; in particular, it must be bound to the value of the argument. We can write all this concisely as

  f, E ⇒ ⟨i, b, E′⟩        a, E ⇒ av        b, E′[i←av] ⇒ bv
  ----------------------------------------------------------
  {f a}, E ⇒ bv

where E′[i←av] means “the environment E′ extended with the identifier i bound to the value av”. If E′ already has a binding for i, this extension shadows that binding.

The judicious use of names conveys information here. We’re demanding that the value used to extend the environment must be the same as that resulting from evaluating a: the use of av in both places indicates that. It also places an ordering on operations: clearly the environment can’t be extended until av is available, so the argument must evaluate before application can proceed with the function’s body. The choice of two different names for environments—E and E′—denotes that the two environments need not be the same.

We call this a big-step operational semantics. It’s a semantics because it ascribes meanings to programs. (We can see how a small change can result in dynamic instead of static scope and, more mundanely, that the meaning of + is given to be addition, not some other binary operation.) It’s operational because evaluation largely proceeds in a mechanical fashion; we aren’t compiling the entire program into a mathematical object and using fancy math to reduce it to an answer. Finally, it’s big-step because ⇒ reduces expressions down to irreducible answers. In contrast, a small-step semantics performs one atomic reduction at a time, rather like a stepper in a programming environment.
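These judgments translate nearly line for line into an interpreter. The following Python sketch assumes one concrete representation (integers for numerals, tagged tuples for compound terms, a dict for the environment E, and a triple for closures); each branch implements one of the rules above:

```python
def interp(e, env):
    """A big-step evaluator implementing the judgments above."""
    if isinstance(e, int):                 # n, E => n
        return e
    tag = e[0]
    if tag == "add":                       # {+ l r}, E => lv + rv
        return interp(e[1], env) + interp(e[2], env)
    if tag == "id":                        # i, E => E(i)
        return env[e[1]]
    if tag == "fun":                       # {fun {i} b}, E => <i, b, E>
        return (e[1], e[2], env)
    if tag == "app":                       # {f a}, E => bv
        i, b, cenv = interp(e[1], env)     # f, E => <i, b, E'>
        av = interp(e[2], env)             # a, E => av
        return interp(b, {**cenv, i: av})  # b, E'[i <- av] => bv
    raise ValueError("unknown term")
```

For instance, interp(("app", ("fun", "x", ("add", ("id", "x"), 3)), 4), {}) yields 7; and replacing cenv with env in the last rule would give dynamic rather than static scope, exactly as the text warns.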

Exercise 23.0.2 Extend the semantics to capture conditionals.

Exercise 23.0.3 Extend the semantics to capture lists.

Hint: You may want to consider tagging tuples.

Exercise 23.0.4 Extend the semantics to capture recursion.

Exercise 23.0.5 Alter the semantics to reflect lazy evaluation instead.


Part X

Types


Chapter 24

Introduction

Until now, we’ve largely ignored the problem of program errors. We haven’t done so entirely: if a programmer writes

{fun {x}}

we do reject this program, because it isn’t syntactically legal—every function must have a body. But what if, instead, he were to write

{+ 3
   {fun {x} x}}

? Right now, our interpreter might produce an error such as

num-n: not a number

A check deep in the bowels of our interpreter is flagging the use of a non-numeric value in a position expecting a number.
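The analogous behavior is easy to reproduce in other hosts. In Python, for instance, the corresponding program parses fine and fails only when the host’s own + is applied:

```python
# Adding a number and a function is parseable but meaningless;
# the error surfaces from the host's + operation at run time,
# not from any check we wrote ourselves.
try:
    3 + (lambda x: x)
    message = "no error"
except TypeError as err:
    message = str(err)
```

The message reported ("unsupported operand type(s) for +: ...") comes from deep inside the host, just as num-n: not a number comes from deep inside our interpreter.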

At this point, we can make the same distinction between the syntactic and meta levels about errors as we did about representations. The error above is an error at the syntactic level,¹ because the interpreter is checking for the correct use of its internal representation. Suppose we had division in the interpreted language, and the corresponding num/ procedure failed to check that the denominator was non-zero; then the interpreter’s behavior would be that of Scheme’s on division-by-zero. If we had expected an error and Scheme did not flag one (or vice versa), then the interpreter would be unfaithful to the intent of the interpreted language.

Of course, this discussion about the source of error messages somewhat misses the point: we really ought to reject this program without ever executing it. But rejecting it is difficult because this program is legitimate from the perspective of the parser. It’s only illegal from the semantic viewpoint: it is the meaning, as opposed to the syntax, of + that does not accept functions as arguments. Therefore, we clearly need a more sophisticated layer that checks for the validity of programs.

¹ Not to be confused with a syntax error!


How hard is this? Rejecting the example above seems