This seems to create an entirely new level of difficulty. Addition seemed to need the ability to modify the
numeral to apply the first argument one more time, but we found a clever way of applying it “from the
outside”. Subtraction, on the other hand, requires that a procedure not be applied, which seems harder still.
The solution to this problem is to make the following observation. Consider the pair ⟨0, 0⟩. Now apply
the following algorithm. Given such a pair of numbers, create a new pair. The new pair’s left component is
the old pair’s right component; the new pair’s right component is the old pair’s right component incremented
by one. Visually,
initial value      : ⟨0, 0⟩
after 1 iteration  : ⟨0, 1⟩
after 2 iterations : ⟨1, 2⟩
after 3 iterations : ⟨2, 3⟩
after 4 iterations : ⟨3, 4⟩
and so on.
You might find this procedure rather strange, in that it entirely ignores the left component of each
preceding pair to create the next one in the sequence. Notice, however, that after n iterations, the left
component holds the value n − 1. Furthermore, observe the operations that we used to obtain these pairs:
creation of an initial pair, pair deconstruction, increment by one, and new pair construction. That is, the
following procedure represents the algorithm applied at each step:
(lambda (p)
(pair (right p)
(succ (right p))))
The following represents the initial value:
(pair zero zero)
If we apply this n times, then read out the left value of the resulting pair, we get. . . n − 1!
pred ≡
(lambda (n)
  (left
   ((n (lambda (p)
         (pair (right p)
               (succ (right p)))))
    (pair zero zero))))
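To experiment with pred, here is a minimal, self-contained sketch. The helper definitions below are not fixed by the text; they are one standard way to fill in the encodings developed earlier in this chapter (curried numerals, two-argument pairs), and church->number is a hypothetical helper for reading results back out. This runs in Racket; in other Schemes, replace add1 with (lambda (x) (+ x 1)).

;; Assumed encodings (one standard choice):
(define zero  (lambda (f) (lambda (x) x)))
(define succ  (lambda (n) (lambda (f) (lambda (x) (f ((n f) x))))))
(define pair  (lambda (a b) (lambda (sel) (sel a b))))
(define left  (lambda (p) (p (lambda (a b) a))))
(define right (lambda (p) (p (lambda (a b) b))))

(define pred
  (lambda (n)
    (left
     ((n (lambda (p)
           (pair (right p)
                 (succ (right p)))))
      (pair zero zero)))))

;; Hypothetical helper: read a numeral back as a Scheme number.
(define (church->number n) ((n add1) 0))

(church->number (pred (succ (succ (succ zero)))))  ; ⇒ 2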
Once we have subtraction by one, we can implement regular subtraction and other such operations.
That leaves only one arithmetic-related primitive that we need in order to implement factorial, namely Scheme’s
zero?. What does this operator do? Given a representation for zero, it returns true, otherwise false. What
is the one characteristic that distinguishes the numeral for zero from that for all non-zero numbers? The
latter all apply their first argument to their second at least once, while the former does not. Therefore, the
following defines zero?:
iszero ≡
(lambda (n)
((n (lambda (ignore) no)) yes))
If the first argument is applied at all, no matter how many times it’s applied, it returns the representation of
false; if it never is, then the “zero” value, the representation of true, results.
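As a quick check, reusing the sketched encodings above: the booleans here are an assumed representation (yes selects its first argument, no its second), consistent with how the chapter’s earlier encodings work.

(define yes (lambda (t) (lambda (f) t)))
(define no  (lambda (t) (lambda (f) f)))

(define iszero
  (lambda (n)
    ((n (lambda (ignore) no)) yes)))

(((iszero zero) 'zero) 'non-zero)         ; ⇒ zero
(((iszero (succ zero)) 'zero) 'non-zero)  ; ⇒ non-zero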
Historical aside: These numerals are known as the Church numerals, in honor of their inventor, Alonzo
Church.
Exercise 22.3.1 Can you extend this encoding to other kinds of numbers: negative integers, rationals, reals,
complex numbers, . . . ?
Hint: Some of these are easy to encode using pairs of other kinds of numbers.
Exercise 22.3.2 Here are two alternate representations of the exponentiation operation. Which one is
faster?
(define exp1
(lambda (m)
(lambda (n)
((n (prod m)) one))))
(define exp2
(lambda (m)
(lambda (n)
(n m))))
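One way to begin investigating: convert each result back to an ordinary number and time the computation. This sketch assumes the chapter’s earlier definitions of prod and one, plus hypothetical sample numerals three and four; the time form is Racket-specific.

(define (church->number n) ((n add1) 0))
(time (church->number ((exp1 three) four)))
(time (church->number ((exp2 three) four)))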
22.4 Eliminating Recursion
The define construct of Scheme is surprisingly powerful. It not only assigns values to names, it also en-
ables the construction of recursive procedures (otherwise the definition of factorial given above would not
function). To eliminate define, therefore, we must create a way of defining recursive procedures. . . without
recursion!
As we do this, we will sometimes encounter expressions that we’re not yet sure how to write. We will
use the symbol • to represent a special value: if the computation ever tries to apply it, it halts immediately.
Think of it as a landmine that the computation should never apply.²
Let’s now study the recursion in factorial. Let’s begin with the following skeletal definition:
fact ≡
(lambda (n)
(if (zero? n)
1
(∗ n (• (sub1 n)))))
This definition is not entirely useless. Indeed, it correctly computes the factorial on the input 0. On any
input greater than 0, however, the computation terminates uselessly.
We can make a more useful definition by including a copy of fact as follows:
fact ≡
(lambda (n)
  (if (zero? n)
      1
      (∗ n ((lambda (n)
              (if (zero? n)
                  1
                  (∗ n (• (sub1 n)))))
            (sub1 n)))))
This definition works perfectly well on inputs of 0 and 1, but not greater. We can repeat this process
endlessly—a process, not at all coincidentally, reminiscent of creating the proper environment for recursion
in Section 9—but obviously, we will never get the true definition of factorial. We’ll have to do better.
While we’re trying to generate a spark of insight, let’s try to clean up the code above. Instead of relying
on this unspecified • operation, we might as well just parameterize over that location in the program:
mk-fact ≡
(lambda (f )
(lambda (n)
(if (zero? n)
1
(∗ n (f (sub1 n))))))
The resulting procedure isn’t quite factorial itself, but rather a factorial-maker: given the right value for
the parameter, it will yield the proper factorial procedure. That still leaves the question, however, of what to
supply as a parameter.
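Before answering, a sanity check on the “maker” reading: if we hand mk-fact any working factorial, we get a working factorial back. The recursive real-fact below is a placeholder that defeats the purpose of the exercise; it only illustrates the contract (assuming mk-fact has been entered as a definition).

(define (real-fact n) (if (zero? n) 1 (* n (real-fact (- n 1)))))
((mk-fact real-fact) 5)  ; ⇒ 120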
Let’s go back to our doomed attempt to nest copies of the factorial procedure. This has the advantage
that, until the copies run out, there is always another copy available. So we have a clearer handle on the
problem now: we need to provide as a parameter something that will create another copy upon demand.
² A good approximation of • is the Scheme procedure exit.
It would seem that mk-fact is just such a creator of copies. So what happens if we feed mk-fact as the
argument to mk-fact—
(mk-fact mk-fact)
—to fill the hole where we need a factorial generator? This application results in the procedure
(lambda (n)
  (if (zero? n)
      1
      (∗ n (mk-fact (sub1 n)))))
(We’ve just substituted mk-fact for f in the body of mk-fact.) We can safely apply this procedure to 0 and
obtain 1, but if we apply it to any larger input, we get an error: mk-fact is expecting a procedure as its
argument, but here we’re applying it to a number.
Okay, so we cannot apply mk-fact to a number. To gain some insight into what we can apply it to, let’s
apply it to •, so the new definition of mk-fact is:
(lambda (f )
(lambda (n)
(if (zero? n)
1
(∗ n ((f •) (sub1 n))))))
and apply this to mk-fact. Upon substitution, this evaluates to
(lambda (n)
  (if (zero? n)
      1
      (∗ n ((mk-fact •) (sub1 n)))))
This procedure clearly works correctly on the argument value 0. For argument 1, evaluation results in:
(∗ 1
   (if (zero? 0)
       1
       (∗ 0 ((• •) (sub1 0)))))
which also works correctly. For initial arguments of 2 and greater, however, evaluation halts on trying to
apply •. In short, we have a definition of mk-fact that, when applied to itself, works correctly for values of
0 and 1, but no higher.
By itself, this does not seem like any kind of progress at all. In fact, however, we have come most of the
way to a solution. Recall that earlier, to define a factorial procedure for values of 0 and 1, we had to copy the
definition of the procedure, resulting in a definition that was roughly twice as large as the one we wanted.
Now we have one that is roughly the same size as the original, i.e., it involves no copying.
It is not surprising that the computation eventually terminated, when we supplied • as the argument.
But what else could we have supplied? Observe that when we supplied mk-fact as an argument, the term
in the “recursive” position evaluated to (mk-fact •), which continued computation correctly (for one more
step); whereas supplying • made that term (• •), which terminated it. Therefore, we should ensure that
the function position of that application is always mk-fact. This means precisely that instead of supplying
• as the argument, we should supply mk-fact! This should not be surprising: just as applying mk-fact to
itself outside the body of mk-fact was useful in initiating the computation, doing the same inside is useful in
continuing it.
That is, we must define
mk-fact ≡
(lambda (f )
(lambda (n)
(if (zero? n)
1
(∗ n ((f f ) (sub1 n))))))
such that factorial is ostensibly defined by
(mk-fact mk-fact)
Does that work? Substituting as before, we get
(lambda (n)
  (if (zero? n)
      1
      (∗ n ((mk-fact mk-fact) (sub1 n)))))
This of course means we can substitute the inner application also:
(lambda (n)
  (if (zero? n)
      1
      (∗ n ((lambda (n)
              (if (zero? n)
                  1
                  (∗ n ((mk-fact mk-fact) (sub1 n)))))
            (sub1 n)))))
and now we can see the recursion unwind: as we need another copy of factorial, the application of mk-fact to
itself generates a fresh copy. Thus, we have a satisfactory solution to the problem of defining the “recursive”
factorial function without any use of recursion!
To summarize, we have the following definition,
mk-fact ≡
(lambda (f )
(lambda (n)
(if (zero? n)
1
(∗ n ((f f ) (sub1 n))))))
with factorial defined by
fact ≡
(mk-fact mk-fact)
That is, factorial is
((lambda (mk-fact)
(mk-fact mk-fact))
(lambda (f )
(lambda (n)
(if (zero? n)
1
(∗ n ((f f ) (sub1 n)))))))
(Test this! In a fresh Scheme session, apply this expression directly to numeric values and make sure you
get the factorial of the input as a result. Pretty amazing, huh?)
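For instance, the following transcript names the expression and applies it. The use of define here only binds a name; it plays no part in the recursion. (sub1 is built into Racket; in other Schemes, define it as (lambda (n) (- n 1)).)

(define fact
  ((lambda (mk-fact)
     (mk-fact mk-fact))
   (lambda (f)
     (lambda (n)
       (if (zero? n)
           1
           (* n ((f f) (sub1 n))))))))

(fact 5)   ; ⇒ 120
(fact 10)  ; ⇒ 3628800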
While this is a correct implementation of factorial, we seem to be writing a lot of code relative to the
recursive version defined using define. Furthermore, we would like to know how much of this solution can
apply to other functions also. With that in mind, let’s try to refactor this code a little. What would we like to
write? As programmers, we would rather not have to keep track of the self-application in the body; that is,
we would rather write
(make-recursive-procedure
(lambda (fact)
(lambda (n)
(if (zero? n)
1
(∗ n (fact (sub1 n)))))))
which looks almost exactly like the definition created using define. So, how do we get the self-application
out?
Observe that the definition of factorial above is equivalent to this one:
((lambda (mk-fact)
   (mk-fact mk-fact))
 (lambda (f)
   ((lambda (g)
      (lambda (n)
        (if (zero? n)
            1
            (∗ n (g (sub1 n))))))
    (f f))))
All we have done is introduce a new level of abstraction, binding g to the self-application of f . Note,
however, that the inner expression, (lambda (g) (lambda (n) . . . )), is precisely the definition of factorial
that we wanted to write, except that fact is called g. One more level of abstraction separates the factorial-specific code from the recursive
function generator:
make-recursive-procedure ≡
(lambda (p)
  ((lambda (mk-fact)
     (mk-fact mk-fact))
   (lambda (f)
     (p (f f)))))
In fact, because this code has nothing to do with defining factorial at all, we can rename mk-fact:
make-recursive-procedure ≡
(lambda (p)
  ((lambda (f)
     (f f))
   (lambda (f)
     (p (f f)))))
This is now a generic procedure that creates recursive procedures out of its argument! It is remarkable that
such a procedure even exists; its structure is daunting at first sight, but relatively easy to understand once
you grasp the need for “copies” of a procedure, and that self-application generates as many of these copies
as necessary.
In the literature, the procedure make-recursive-procedure is known as the Y combinator. It is sometimes
also known as a “fixed-point combinator”, because it computes the fixed-point of its argument procedure.
The Lambda Calculus
With the definition of the Y combinator, we have reduced all of Scheme to just three primitives:
procedure definition, procedure application, and variables. With just those three, we have provided
an encoding of all of the rest of the language. This compact little language is the core of what is
known, also for historical reasons, as the lambda calculus. The “calculus” part of the language is
beyond the scope of our study.
In the 1930s, several mathematicians were asking fundamental questions about what could be com-
puted procedurally, and about the relative power of different formalisms. While Alan Turing was
defining his Turing machine formalism, Alonzo Church and several others created an alternate for-
malism: the lambda calculus. These mathematicians were able to demonstrate that several of their
formalisms—particularly these two—were equivalent in expressive power, so theoreticians could
choose one or the other based on convenience and suitability, without worrying about expressive
constraints. (To this day, many choose to use the lambda calculus and its variants, since it is so much more
convenient to work with than a Turing machine.) Indeed, the fact that so many independently-
derived formalisms had the same expressive power led to the formulation of the Church-Turing
thesis: that no formal language is more powerful than those defined by Church and Turing (the
lambda calculus and the Turing machine, respectively).
Exercise 22.4.1 Type the definition of make-recursive-procedure into Scheme and use it to create a recursive
factorial procedure:
(make-recursive-procedure
(lambda (fact)
(lambda (n)
(if (zero? n)
1
(∗ n (fact (sub1 n)))))))
What do you observe? Explain and correct.
Chapter 23
Semantics
We have been writing interpreters in Scheme in order to understand various features of programming lan-
guages. What if we want to explain our interpreter to someone else? If that person doesn’t know Scheme,
we can’t communicate how our interpreter works. It would be convenient to have some common language
for explaining interpreters. We already have one: math!
Let’s try some simple examples. If our program is a number n, it just evaluates to some mathematical
representation of n. We’ll use n to represent this number, whereas n itself holds the numeral. For
instance, the numeral 5 is represented by the number 5 (the difference between the two is one of typesetting:
the first is program text, the second a mathematical object). In other words, we will write

n ⇒ n

where we read ⇒ as “reduces to”. Numbers are already values, so they don’t need further reduction.
How about addition? We might be tempted to write
{+ l r} ⇒ l + r
In particular, the addition to the left of the ⇒ is in the programming language, while the one on the right
happens in mathematics and results in a number. That is, the addition symbol on the left is syntactic. It could
map to any mathematical operation. A particularly perverse language might map it to multiplication, but
more realistically, it is likely to map to addition modulo some base to reflect fixed-precision arithmetic. It
is the expression on the right that gives it meaning, and in this case it assigns the meaning we would expect
(corresponding, say, to DrScheme’s use of unlimited-precision numbers for integers and rationals).
That said, this definition is unsatisfactory. Mathematical addition only works on numbers, but l and r
might each be complex expressions in need of reduction to a value (in particular, a number) so they can be
added together. We denote this as follows:

  l ⇒ lv        r ⇒ rv
─────────────────────────
   {+ l r} ⇒ lv + rv
The terms above the bar are called the antecedents, and those below are the consequents. This rule is just a
convenient way of writing an “if . . . then” expression: it says that if the conditions in the antecedent hold,
then those in the consequent hold. If there are multiple conditions in the antecedent, they must all hold for
the rule to hold. So we read the rule above as: if l reduces to lv, and if r reduces to rv, then adding the
respective expressions results in the sum of their values. (In particular, it makes sense to add lv and rv, since
each is now a number.) A rule of this form is called a judgment, because based on the truth of the conditions
in the antecedent, it issues a judgment in the consequent (in this case, that the sum will be a particular value).
These rules subtly also bind names to values. That is, a different way of reading the rule is not as an “if
. . . then” but rather as an imperative: it says “reduce l, call the result lv; reduce r, call its result rv; if these
two succeed, then add lv and rv, and declare the sum the result for the entire expression”. Seen this way, l
and r are bound in the consequent to the sub-expressions of the addition term, while lv and rv are bound in
the antecedent to the results of evaluation (or reduction). This representation truly is an abstract description
of the interpreter.
Let’s turn our attention to functions. We want them to evaluate to closures, which consist of a name, a
body and an environment. How do we represent a structure in mathematics? A structure is simply a tuple,
in this case a triple. (If we had multiple kinds of tuples, we might use tags to distinguish between them, but
for now that won’t be necessary.) We would like to write
{fun {i} b} ⇒ ⟨i, b, ???⟩
but the problem is we don’t have a value for the environment to store in the closure. So we’ll have to make
the environment explicit. From now on, ⇒ will always have a term and an environment on the left, and a
value on the right. We first rewrite our two existing reduction rules:

n, E ⇒ n

  l, E ⇒ lv        r, E ⇒ rv
───────────────────────────────
    {+ l r}, E ⇒ lv + rv
Now we can define a reduction rule for functions:
{fun {i} b}, E ⇒ ⟨i, b, E⟩
Given an environment, we can also look up the value of identifiers:
i, E ⇒ E(i)
All that remains is application. As with addition, application must first evaluate its subexpressions, so
the general form of an application must be as follows:
  f, E ⇒ ???        a, E ⇒ ???
─────────────────────────────────
        {f a}, E ⇒ ???
What kind of value must f reduce to? A closure, naturally:
  f, E ⇒ ⟨i, b, E′⟩        a, E ⇒ ???
────────────────────────────────────────
          {f a}, E ⇒ ???
(We’ll use E′ to represent the closure environment, to make clear that it may be different from E.) We don’t
particularly care what kind of value a reduces to; we’re just going to substitute it:

  f, E ⇒ ⟨i, b, E′⟩        a, E ⇒ av
───────────────────────────────────────
          {f a}, E ⇒ ???
But what do we write below? We have to evaluate the body, b, in the extended environment; whatever value
it returns is the value of the application. So the evaluation of b also moves into the antecedent:
  f, E ⇒ ⟨i, b, E′⟩        a, E ⇒ av        b, ??? ⇒ bv
──────────────────────────────────────────────────────────
                  {f a}, E ⇒ bv
In what environment do we reduce the body? It has to be the environment in the closure; if we use E instead
of E′, we introduce dynamic rather than static scoping! But additionally, we must extend E′ with a binding
for the identifier named by i; in particular, it must be bound to the value of the argument. We can write all
this concisely as

  f, E ⇒ ⟨i, b, E′⟩        a, E ⇒ av        b, E′[i←av] ⇒ bv
────────────────────────────────────────────────────────────────
                    {f a}, E ⇒ bv

where E′[i←av] means “the environment E′ extended with the identifier i bound to the value av”. If E′
already has a binding for i, this extension shadows that binding.
The judicious use of names conveys information here. We’re demanding that the value used to extend
the environment must be the same as that resulting from evaluating a: the use of av in both places indicates
that. It also places an ordering on operations: clearly the environment can’t be extended until av is available,
so the argument must evaluate before application can proceed with the function’s body. The choice of two
different names for environments—E and E′—denotes that the two environments need not be the same.
We call this a big-step operational semantics. It’s a semantics because it ascribes meanings to programs.
(We can see how a small change can result in dynamic instead of static scope and, more mundanely, that the
meaning of + is given to be addition, not some other binary operation.) It’s operational because evaluation
largely proceeds in a mechanical fashion; we aren’t compiling the entire program into a mathematical object
and using fancy math to reduce it to an answer. Finally, it’s big-step because ⇒ reduces expressions down
to irreducible answers. In contrast, a small-step semantics performs one atomic reduction at a time, rather
like a stepper in a programming environment.
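To make the phrase “abstract description of the interpreter” concrete, here is a sketch that transcribes the judgments above into Scheme. The term representation (quoted lists) and the environment representation (an association list) are illustrative assumptions, not something the semantics fixes.

(define (interp expr env)
  (cond
    ;; n, E ⇒ n
    ((number? expr) expr)
    ;; i, E ⇒ E(i)
    ((symbol? expr) (cdr (assq expr env)))
    ;; {+ l r}, E ⇒ lv + rv
    ((eq? (car expr) '+)
     (+ (interp (cadr expr) env)
        (interp (caddr expr) env)))
    ;; {fun {i} b}, E ⇒ ⟨i, b, E⟩
    ((eq? (car expr) 'fun)
     (list 'closure (car (cadr expr)) (caddr expr) env))
    ;; {f a}, E ⇒ bv, with b reduced in E′[i←av]; passing env
    ;; here instead of the closure's environment would yield
    ;; dynamic scope.
    (else
     (let ((clos (interp (car expr) env))
           (av   (interp (cadr expr) env)))
       (interp (caddr clos)
               (cons (cons (cadr clos) av)
                     (cadddr clos)))))))

;; Example: {{fun {x} {+ x 3}} 4} reduces to 7
(interp '((fun (x) (+ x 3)) 4) '())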
Exercise 23.0.2 Extend the semantics to capture conditionals.
Exercise 23.0.3 Extend the semantics to capture lists.
Hint: You may want to consider tagging tuples.
Exercise 23.0.4 Extend the semantics to capture recursion.
Exercise 23.0.5 Alter the semantics to reflect lazy evaluation instead.
Part X
Types
Chapter 24
Introduction
Until now, we’ve largely ignored the problem of program errors. We haven’t done so entirely: if a program-
mer writes
{fun {x}}
we do reject this program, because it isn’t syntactically legal—every function must have a body. But what
if, instead, he were to write
{+ 3
{fun {x} x}}
? Right now, our interpreter might produce an error such as
num-n: not a number
A check deep in the bowels of our interpreter is flagging the use of a non-numeric value in a position
expecting a number.
At this point, we can make the same distinction between the syntactic and meta levels about errors as
we did about representations. The error above is an error at the syntactic level,¹ because the interpreter
is checking for the correct use of its internal representation. Suppose we had division in the interpreted
language, and the corresponding num/ procedure failed to check that the denominator was non-zero; then
the interpreter’s behavior would be that of Scheme’s on division-by-zero. If we had expected an error
and Scheme did not flag one (or vice versa), then the interpreter would be unfaithful to the intent of the
interpreted language.
Of course, this discussion about the source of error messages somewhat misses the point: we really
ought to reject this program without ever executing it. But rejecting it is difficult because this program is
legitimate from the perspective of the parser. It’s only illegal from the semantic viewpoint: it is the meaning,
as opposed to the syntax, of + that does not accept functions as arguments. Therefore, we clearly need a
more sophisticated layer that checks for the validity of programs.
¹ Not to be confused with a syntax error!
How hard is this? Rejecting the example above seems