Programming Languages: Application and Interpretation by Shriram Krishnamurthi

so the resulting tree above does not match the form of the addition judgment (no matter what goes in

place of the two ???’s). To match the addition judgment the tree must have the form

∅ ⊢ 3 : number    ∅ ⊢ {fun {x : number} : number x} : number
-------------------------------------------------------------
∅ ⊢ {+ 3 {fun {x : number} : number x}} : ???

Unfortunately, we do not have any judgments that let us conclude that a syntactic function term can

have a numeric type. So this doesn’t work either.

In short, we cannot construct a legal type derivation tree for the original term. Notice that this is not

the same as saying that the tree directly identifies an error: it does not. A type error occurs when we

are unable to construct a type judgment tree.


CHAPTER 25. TYPE JUDGMENTS

This is subtle enough to bear repeating: To flag a program as erroneous, we must prove that no type

derivation tree can possibly exist for that term. But perhaps some sequence of judgments that we haven’t

thought of exists that (a) is legal and (b) correctly ascribes a type to the term! To avoid this we may need

to employ quite a sophisticated proof technique, even human knowledge. (In the third example above, for

instance, we say, “we do not have any judgments that let us conclude that a syntactic function term can have

a numeric type”. But how do we know this is true? We can only conclude this by carefully studying the

structure of the judgments. A computer program might not be so lucky, and in fact may get stuck endlessly

trying judgments!)

This is why a set of type judgments alone does not suffice: what we’re really interested in is a type system

that includes an algorithm for type-checking. For the set of judgments we’ve written here, and indeed for the

ones we’ll study initially, a simple top-down, syntax-directed algorithm suffices for (a) determining the type

of each expression, and (b) concluding that some expressions manifest type errors. As our type judgments

get more sophisticated, we will need to develop more complex algorithms to continue producing tractable

and useful type systems.
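The top-down, syntax-directed algorithm can be made concrete. Below is a minimal sketch in Python (the book's own implementations use Scheme; the tuple encoding of expressions and all names such as type_of, arrow, and NoDerivation are our own illustrative choices, not the book's), covering just numbers, addition, functions, and application:

```python
# A sketch, not the book's implementation: a syntax-directed checker for a
# tiny fragment (numbers, +, fun, application).  Expressions are encoded as
# Python tuples; the encoding and all names here are our own choices.

NUM = "number"

def arrow(dom, rng):
    """The function type (dom -> rng)."""
    return ("->", dom, rng)

class NoDerivation(Exception):
    """Raised when no type derivation tree exists for the term."""

def type_of(expr, env):
    # env plays the role of Gamma: identifier -> type
    if isinstance(expr, (int, float)):               # 3 : number
        return NUM
    if isinstance(expr, str):                        # identifier lookup
        if expr not in env:
            raise NoDerivation("free identifier: " + expr)
        return env[expr]
    tag = expr[0]
    if tag == "+":                                   # {+ l r} : number
        _, l, r = expr
        if type_of(l, env) != NUM or type_of(r, env) != NUM:
            raise NoDerivation("+ needs two numbers")
        return NUM
    if tag == "fun":                                 # {fun {x : ta} : tr body}
        _, x, t_arg, t_res, body = expr
        if type_of(body, {**env, x: t_arg}) != t_res:
            raise NoDerivation("body disagrees with declared result type")
        return arrow(t_arg, t_res)
    if tag == "app":                                 # {f a}
        _, f, a = expr
        t_f = type_of(f, env)
        if not (isinstance(t_f, tuple) and t_f[0] == "->"):
            raise NoDerivation("applying a non-function")
        if type_of(a, env) != t_f[1]:
            raise NoDerivation("argument type mismatch")
        return t_f[2]
    raise NoDerivation("unknown form")
```

On the chapter's example, type_of(("+", 3, ("fun", "x", NUM, NUM, "x")), {}) raises NoDerivation, mirroring the failure to construct a derivation tree above.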

Chapter 26

Typing Control

26.1

Conditionals

Let’s expand our language with a conditional construct. We can use if0 like before, but for generality it’s

going to be more convenient to have a proper conditional and a language of predicates. The type judgment

for the conditional must have the following form:

Γ ⊢ c : ???    Γ ⊢ t : ???    Γ ⊢ e : ???
------------------------------------------
Γ ⊢ {if c t e} : ???

where c is the conditional, t the “then”-expression, and e the “else”-expression.

Let’s begin with the type for c. What should it be? In a language like Scheme we permit any value, but

in a stricter, typed language, we might demand that the expression always evaluate to a boolean. (After all,

if the point is to detect errors sooner, then it does us no good to be overly lax in our type rules.) However,

we don’t yet have such a type in our type language, so we must first extend that language:

<type> ::= number

| boolean

| (<type> -> <type>)

Armed with the new type, we can now ascribe a type to the conditional expression:

Γ ⊢ c : boolean    Γ ⊢ t : ???    Γ ⊢ e : ???
----------------------------------------------
Γ ⊢ {if c t e} : ???

Now what of the other two, and of the result of the expression? One option is, naturally, to allow both arms

of the conditional to have whatever types the programmer wants:

Γ ⊢ c : boolean    Γ ⊢ t : τ1    Γ ⊢ e : τ2
--------------------------------------------
Γ ⊢ {if c t e} : ???

By using two distinct type variables, we do not demand any conformity between the actual types of the

arms. By permitting this flexibility, however, we encounter two problems. The first is that it isn’t clear what

type to ascribe to the expression overall.1 Second, it reduces our ability to trap program errors. Consider a program like this:

{+ 3
   {if {is-zero mystery}
       5
       {fun {x} x}}}

Because we know nothing about mystery, we must conservatively conclude that it might be non-zero,

which means eventually we are going to see a type error that we only catch at run-time. But why permit the

programmer to write such a program at all? We might as well prevent it from ever executing. Therefore, we

use the following rule to type conditionals:

Γ ⊢ c : boolean    Γ ⊢ t : τ    Γ ⊢ e : τ
------------------------------------------
Γ ⊢ {if c t e} : τ

Notice that by forcing the two arms to have the same type, we can assign that common type to the entire

expression, so the type system does not need to know which branch was chosen on a given execution: the

type remains the same.
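This rule translates into code almost directly. The sketch below (Python, with our own illustrative names and type encoding, not the book's) checks exactly the three premises: a boolean test, and two arms sharing one common type:

```python
# A sketch of the conditional rule (names and encoding ours): the test must
# be boolean, the arms must agree, and their common type is the result.

BOOL, NUM = "boolean", "number"

class NoDerivation(Exception):
    pass

def type_of_if(t_cond, t_then, t_else):
    """Takes the already-computed types of the three subexpressions."""
    if t_cond != BOOL:
        raise NoDerivation("test of a conditional must be boolean")
    if t_then != t_else:
        raise NoDerivation("both arms must have the same type")
    return t_then          # tau: the common type of the two arms
```

The mystery example above would fail here before ever running: one arm has type number while the other would have a function type, so type_of_if rejects the term.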

Having added conditionals and the type boolean isn’t very useful yet, because we haven’t yet introduced

predicates to use in the test position of the conditional. Indeed, we can easily see that this is true

because we have not yet written a function type with boolean on the right-hand side of the arrow. You

can, however, easily imagine adding procedures such as is-zero, with type number -> boolean.

26.2

Recursion

Now that we have conditionals, if we can also implement recursion, we will have a Turing-complete

language (that could, for instance, with a little more arithmetic support, enable writing factorial). So the

next major piece of the puzzle is typing recursion.

Given the language TFAE (typed FAE), can we write a recursive program? Let’s try to write an infinite

loop. Our first attempt might be this FAE program

{with {f {fun {i}
           {f i}}}
  {f 10}}

which, expanded out, becomes

{{fun {f}
    {f 10}}
 {fun {i}
    {f i}}}

1It’s tempting to create a new kind of type, a union type, so that the type of the expression is τ1 ∪ τ2. This has far-reaching

consequences, however, including a significant reduction in type-based guarantee of program reliability.


When we place type annotations on this program, we get

{{fun {f : (num -> num)} : num
    {f 10}}
 {fun {i : num} : num
    {f i}}}

These last two steps don’t matter, of course. This program doesn’t result in an infinite loop, because the f

in the body of the function isn’t bound, so after the first iteration, the program halts with an error.

As an aside, this error is easier to see in the typed program: when the type checker tries to check the

type of the annotated program, it finds no type for f on the last line. Therefore, it would halt with a type

error, preventing this erroneous program from ever executing.2

Okay, that didn’t work, but we knew about that problem: we saw it in Section 9 when introducing

recursion. At the time, we asked you to consider whether it was possible to write a recursive function

without an explicit recursion construct, and Section 22 shows that it is indeed possible. The essence of the solution presented there is to use self-application:

{with {omega {fun {x}
               {x x}}}
  {omega omega}}

How does this work? Simply substituting omega with the function, we get

{{fun {x} {x x}}

{fun {x} {x x}}}

Substituting again, we get

{{fun {x} {x x}}

{fun {x} {x x}}}

and so on. In other words, this program executes forever. It is conventional to call the function ω (lower-case

Greek “omega”), and the entire expression Ω (upper-case Greek “omega”).3

Okay, so Ω seems to be our ticket. This is clearly an infinite loop in FAE. All we need to do is convert

it to TFAE, which is simply a matter of annotating all procedures. Since there’s only one, ω, this should be

especially easy.

To annotate ω, we must provide a type for the argument and one for the result. Let’s call the argument

type, namely the type of x, τa and that of the result τr, so that ω : τa→τr. The body of ω is {x x}. From

this, we can conclude that τa must be a function (arrow) type, since we use x in the function position of an

application. That is, τa has the form τ1→τ2, for some τ1 and τ2 yet to be determined.

2In this particular case, of course, a simpler check would prevent the erroneous program from starting to execute, namely

checking to ensure there are no free variables.

3Strictly speaking, it seems anachronistic to refer to the lower and upper “case” for the Greek alphabet, since the language

predates moveable type in the West by nearly two millennia.


What can we say about τ1 and τ2? τ1 must be whatever type x’s argument has. Since x’s argument is

itself x, τ1 must be the same as the type of x. We just said that x has type τa. This immediately implies that

τa = τ1→τ2 = τa→τ2

In other words,

τa = τa→τ2

What type can we write that satisfies this equation? In fact, no types in our type language can satisfy

it, because this type is recursive without a base case. Any type we try to write will end up being infinitely

long. Since we cannot write an infinitely long type (recall that we’re trying to annotate ω, so if the type is

infinitely long, we’d never get around to finishing the text of the program), it follows by contradiction4 that ω and Ω cannot be typed in our type system, and therefore their corresponding programs are not programs

in TFAE. (We are being rather lax here—what we’ve provided is informal reasoning, not a proof—but such

a proof does exist.)
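The impossibility of solving τa = τa→τ2 with a finite type is exactly what unification-based type checkers detect with an "occurs check". A small sketch (Python; the tuple encoding of arrow types is our own):

```python
# Occurs check, sketched: a type variable must not occur inside the type it
# is equated with, or the equation has no finite solution.  Arrow types are
# encoded as ("->", domain, range); the encoding is ours, for illustration.

def occurs(var, ty):
    if ty == var:
        return True
    if isinstance(ty, tuple) and ty[0] == "->":
        return occurs(var, ty[1]) or occurs(var, ty[2])
    return False

# tau_a = tau_a -> tau_2: tau_a occurs on the right, so omega is untypable.
assert occurs("tau_a", ("->", "tau_a", "tau_2"))
# a harmless equation such as tau_a = tau_2 -> tau_2 passes the check
assert not occurs("tau_a", ("->", "tau_2", "tau_2"))
```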

26.3

Termination

We concluded our exploration of the type of Ω by saying that the annotation on the argument of ω must be

infinitely long. A curious reader ought to ask, is there any connection between the boundlessness of the type

and the fact that we’re trying to perform a non-terminating computation? Or is it mere coincidence?

TFAE, which is a first cousin of a language you’ll sometimes see referred to as the simply-typed lambda

calculus,5 enjoys a rather interesting property: it is said to be strongly normalizing. This intimidating term says of a programming language that no matter what program you write in the language, it will always

terminate!

To understand why this property holds, think about our type language. The only way to create compound

types is through the function constructor. But every time we apply a function, we discharge one function

constructor: that is, we “erase an arrow”. Therefore, after a finite number of function invocations, the

computation must “run out of arrows”.6 Because only function applications can keep a computation running,

the computation is forced to terminate.
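The "erase an arrow" intuition can be made concrete with a simple measure: count the arrow constructors in a type. Applying a function of type τ1→τ2 produces a result of type τ2, which always contains strictly fewer arrows than the function's type. This crude measure by itself is not the proof (the real argument needs considerably more care), but the sketch below, using our own tuple encoding, illustrates the intuition:

```python
def arrows(ty):
    """Count '->' constructors in a type encoded as ("->", dom, rng)."""
    if isinstance(ty, tuple) and ty[0] == "->":
        return 1 + arrows(ty[1]) + arrows(ty[2])
    return 0

f_ty = ("->", ("->", "number", "number"), "number")
result_ty = f_ty[2]                 # applying f "erases" its outer arrow
assert arrows(f_ty) == 2
assert arrows(result_ty) < arrows(f_ty)
```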

This is a very informal argument for why this property holds—it is certainly far from a proof (though,

again, a formal proof of this property does exist). However, it does help us see why we must inevitably have

bumped into an infinitely long type while trying to annotate the infinite loop.

What good is a language without infinite loops? There are in fact lots of programs that we would like to

ensure will not run forever. These include:

• inner-loops of real-time systems

• program linkers

• packet filters in network stacks

• client-side Web scripts

• network routers

• device (such as photocopier) initialization

• configuration files (such as Makefiles)

4 We implicitly assumed it would be possible to annotate ω and explored what that type annotation would be. The contradiction is that no such annotation is possible.

5 Why “simply”? You’ll see what other options there are next week.

6 Oddly, this never happens to mythological heroes.

and so on. That’s what makes the simply-typed lambda calculus so wonderful: instead of pondering and

testing endlessly (so to speak), we get mathematical certitude that, with a correct implementation of the type

checker, no infinite loops can sneak past us. In fact, the module system of the SML programming language

is effectively an implementation of the simply-typed lambda calculus, thereby guaranteeing that no matter

how complex a linking specification we write, the linking phase of the compiler will always terminate.

Exercise 26.3.1 We’ve been told that the Halting Problem is undecidable. Yet here we have a language

accompanied by a theorem that proves that all programs will terminate. In particular, then, the Halting

Problem is not only very decidable, it’s actually quite simple: In response to the question “Does this program

halt?”, the answer is always “Yes!” Reconcile.

Exercise 26.3.2 While the simply-typed lambda calculus is fun to discuss, it may not be the most pliant

programming language, even as the target of a compiler (much less something programmers write explicitly).

Partly this is because it doesn’t quite focus on the right problem. To a Web browsing user, for instance, what

matters is whether a downloaded program runs immediately; five minutes isn’t really distinguishable from

non-termination.

Consequently, a better variant of the lambda calculus might be one whose types reflect resources, such

as time and space. The “type” checker would then ask the user running the program for resource bounds,

then determine whether the program can actually execute within the provided resources. Can you design

and implement such a language? Can you write useful programs in it?

26.4

Typed Recursive Programming

Strong normalization says we must provide an explicit recursion construct. To do this, we’ll simply reintroduce

our rec construct to define the language TRCFAE. The BNF for the language is

<TRCFAE> ::= ...
           | {rec {<id> : <type> <TRCFAE>} <TRCFAE>}

with the same type language. Note that the rec construct needs an explicit type annotation also.

What is the type judgment for rec? It must be of the form

???
---------------------------
Γ ⊢ {rec {i : τi v} b} : τ


since we want to conclude something about the entire term. What goes in the antecedent? We can determine

this more easily by realizing that a rec is a bit like an immediate function application. So just as with

functions, we’re going to have assumptions and guarantees—just both in the same rule.

We want to assume that τi is a legal annotation, and use that to check the body; but we also want to

guarantee that τi is a legal annotation. Let’s do them in that order. The former is relatively easy:

Γ[i←τi] ⊢ b : τ    ???
---------------------------
Γ ⊢ {rec {i : τi v} b} : τ

Now let’s hazard a guess about the form of the latter:

Γ[i←τi] ⊢ b : τ    Γ ⊢ v : τ
-----------------------------
Γ ⊢ {rec {i : τi v} b} : τ

But what is the structure of the term named by v? Surely it has references to the identifier named by i in

it, but i is almost certainly not bound in Γ (and even if it is, it’s not bound to the value we want for i).

Therefore, we’ll have to extend Γ with a binding for i—not surprising, if you think about the scope of i in a

rec term—to check v also:

Γ[i←τi] ⊢ b : τ    Γ[i←τi] ⊢ v : τ
-----------------------------------
Γ ⊢ {rec {i : τi v} b} : τ

Is that right? Do we want v to have type τ, the type of the entire expression? Not quite: we want it to have

the type we promised it would have, namely τi:

Γ[i←τi] ⊢ b : τ    Γ[i←τi] ⊢ v : τi
------------------------------------
Γ ⊢ {rec {i : τi v} b} : τ

Now we can understand how the typing of recursion works. We extend the environment not once, but

twice. The extension to type b is the one that initiates the recursion; the extension to type v is the one that

sustains it. Both extensions are therefore necessary. And because a type checker doesn’t actually run the

program, it doesn’t need an infinite number of arrows. When type checking is done and execution begins,

the run-time system does, in some sense, need “an infinite quiver of arrows”, but we’ve already seen how to

implement that in Section 10.
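The final rule reads off as code almost directly. In the sketch below (Python; the names and the dictionary environment are ours, not the book's), a single extended environment serves to check both the named expression and the body:

```python
# A sketch of the rec rule (names ours): Gamma is extended once with i:tau_i,
# and that one extension is used to check both v and the body b.

class NoDerivation(Exception):
    pass

def type_of_rec(i, tau_i, v, b, env, type_of):
    """type_of(expr, env) is assumed to check the rest of the language."""
    env2 = {**env, i: tau_i}            # Gamma[i <- tau_i]
    if type_of(v, env2) != tau_i:       # v must deliver the promised tau_i
        raise NoDerivation("named expression disagrees with its annotation")
    return type_of(b, env2)             # the body's type is the rec's type
```

With a stub type_of that merely looks identifiers up, checking {rec {f : τ f} f} returns τ, showing both premises using the same extension of Γ.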

Exercise 26.4.1 Define the BNF entry and generate a type judgment for with in the typed language.

Exercise 26.4.2 Typing recursion looks simple, but it’s actually worth studying in detail. Take a simple

example such as Ω and work through the rules:

• Write Ω with type annotations so it passes the type checker. Draw the type judgment tree to make sure

you understand why this version of Ω types.

• Does the expression named by v in rec have to be a procedure? Do the typing rules for rec depend

on this?

Chapter 27

Typing Data

27.1

Recursive Types

27.1.1

Declaring Recursive Types

We saw in the previous lecture how rec was necessary to write recursive programs. But what about defining

recursive types? Recursive types are fundamental to computer science: even basic data structures like lists

and trees are recursive (since the rest of a list is also a list, and each sub-tree is itself a tree).

Suppose we try to type the program

{rec {length : ???
      {fun {l : ???} : number
        {if {empty? l}
            0
            {+ 1 {length {rest l}}}}}}
  {length {numCons 1 {numCons 2 {numCons 3 numEmpty}}}}}

What should we write in place of the question marks?

Let’s consider the type of l. What kind of value can be an argument to l? Clearly a numeric cons,

because that’s the argument supplied in the first invocation of length. But eventually, a numeric empty is

passed to l also. This means l needs to have two types: (numeric) cons and empty.

In languages like ML (and Java), procedures do not consume arguments of more than one distinct type.

Instead, they force programmers to define a new type that encompasses all the possible arguments. This is

precisely what a datatype definition, of the kind we have been writing in Scheme, permits us to do. So let’s

try to write down such a datatype in a hypothetical extension to our (typed) implemented language:

{datatype numList
  {[numEmpty]
   [numCons {fst : number}
            {rst : ???}]}
  {rec {length : (numList -> number) ...}
    {length ...}}}


We assume that a datatype declaration introduces a collection of variants, followed by an actual body that

uses the datatype. What type annotation should we place on rst? This should be precisely the new type we

are introducing, namely numList.

A datatype declaration therefore enables us to do a few distinct things all in one notation:

1. Give names to new types.

2. Introduce conditionally-defined types (variants).

3. Permit recursive definitions.

If these are truly distinct, we should consider whether there are more primitive operators that we may provide

so a programmer can mix-and-match them as necessary.1

But how distinct are these three operations, really? Giving a type a new name would be only so useful

if the type were simple (for instance, creating the name bool as an alias for boolean may be convenient,

but it’s certainly not conceptually significant), so this capability is most useful when the name is assigned to

a complex type. Recursion needs a name to use for declaring self-references, so it depends on the ability to

introduce a new name. Finally, well-founded recursion depends on having both recursive and non-recursive

cases, meaning the recursive type must be defined as a collection of variants (of which at least one is not

self-referential). So the three capabilities coalesce very nicely.
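The same three capabilities can be seen bundled together in most typed languages. Purely as an illustration (in Python, not the book's implemented language), here is numList rendered as a new name covering two variants, one of which refers back to the name:

```python
# numList as a name (NumList) covering two variants, one of which refers
# back to the name.  Python is used only for illustration.

from dataclasses import dataclass
from typing import Union

@dataclass
class NumEmpty:          # the non-recursive variant: the base case
    pass

@dataclass
class NumCons:           # the recursive variant
    fst: float
    rst: "NumList"       # self-reference through the newly introduced name

NumList = Union[NumEmpty, NumCons]

def length(l: NumList) -> int:
    return 0 if isinstance(l, NumEmpty) else 1 + length(l.rst)
```

Here length(NumCons(1, NumCons(2, NumCons(3, NumEmpty())))) evaluates to 3, the same computation the {rec ...} program above describes.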

As you may have noticed above, the datatypes we have introduced in our typed language are a bit

different from those we’re using in Scheme. Our Scheme datatypes are defined at the top-level, while those

in the implemented language enclose the expressions that refer to them. This is primarily to make it easier

to deal with the scope of the introduced types. Obviously, a full-fledged language (like ML and Haskell)

permits apparently top-level datatype declarations, but we’ll make this simplifying assumption here.

27.1.2

Judgments for Recursive Types

Let’s consider another example of a recursive type: a family tree.

{datatype FamilyTree
  {[unknown]
   [person {name : string}
           {mother : FamilyTree}
           {father : FamilyTree}]}
  ...}

This data definition allows us to describe as much of the genealogy as we know, and terminate the construction when we reach an unknown person. What type declarations ensue from this definition?

unknown : → FamilyTree

person : string × FamilyTree × FamilyTree → FamilyTree

1“Programming languages should be designed not by piling feature on top of feature, but by removing the weaknesses and

restrictions that make additional features appear necessary”.


This doesn’t yet give us a way of distinguishing between the two variants, and of selecting the fields in each

variant. In Scheme, we use type-case to perform both of these operations. A corresponding case dispatcher

for the above datatype might look like

{FamilyTree-cases v
  [{unknown} ...]
  [{person n m f} ...]}

Its pieces would be typed as follows:

Γ ⊢ v : FamilyTree    Γ ⊢ e1 : τ    Γ[n←string, m←FamilyTree, f←FamilyTree] ⊢ e2 : τ
-------------------------------------------------------------------------------------
Γ ⊢ {FamilyTree-cases v {[unknown] e1} {[person n m f] e2}} : τ

In other words, to determine the type of the entire FamilyTree-cases expression, τ, we first ensure that

the value being dispatched is of the right type. Then we must make sure each branch of the switch returns a

τ.2 We can ensure that by checking each of the bodies in the right type environment. Because unknown has

no fields, its cases branch binds no variables, so we check e1 in Γ. In the branch for person, however,

we bind three variables, so we must check the type of e2 in a suitably extended Γ.

Though the judgment above is for a very specific type declaration, the general principle should be clear

from what we’ve written. Effectively, the type checker introduces a new type rule for each typed cases

statement based on the type declaration at the time it sees the declaration. Writing the judgment above in

terms of subscripted parameters is tedious but easy.
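In code, the rule's bookkeeping looks like the sketch below (Python; the string representation of types and the helper names are ours): the unknown body is checked in Γ unchanged, the person body in Γ extended with the three field bindings, and the two results must agree:

```python
# A sketch of checking FamilyTree-cases (names and encoding ours).

class NoDerivation(Exception):
    pass

def type_of_cases(t_v, check_unknown, check_person, env):
    """check_unknown(env) and check_person(env) return their body's type."""
    if t_v != "FamilyTree":
        raise NoDerivation("cases must dispatch on a FamilyTree value")
    t1 = check_unknown(env)                      # unknown binds no fields
    t2 = check_person({**env,                    # person binds n, m, f
                       "n": "string",
                       "m": "FamilyTree",
                       "f": "FamilyTree"})
    if t1 != t2:
        raise NoDerivation("every branch must produce the same type")
    return t1
```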

Given the type rules above, consider the following program:

{datatype FamilyTree
  {[unknown]
   [person {name : string}
           {mother : FamilyTree}
           {father : FamilyTree}]}
  {person "Mitochondrial Eve" {unknown} {unknown}}}

What is the type of the expression in the body of the datatype declaration? It’s FamilyTree. But when the

value escapes from the body of the declaration, how can we access it any longer? (We assume that the type

checker renames types consistently, so FamilyTree in one scope is different from FamilyTree in another

scope—just because the names are the same, the types should not conflate.) It basically becomes an opaque

type that is no longer usable. This does not appear to be very useful at all!3

At any rate, the type checker permitted a program that is quite useless, and we might want to prevent

this. Therefore, we could place the restriction that the type defined in the datatype (in this case, FamilyTree)

should be different from the type of the expression body τ. This prevents programmers from inadvertently

returning values that nobody else can use.

2Based on the preceding discussion, if the two cases needed to return different types of values, how would you address this need

in a language that enforced the type judgment above?

3Actually, you could use this to define the essence of a module or object system. These are called existential types. But we

won’t study them further in this course.


Obviously, this restriction doesn’t reach far enough. Returning a vector of FamilyTree values avoids the

restriction above, but the effect is the same: no part of the program outside the scope of the datatype can use

these values. So we may want a more stringent restriction: the type being defined should not appear free

in τ.
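The stricter "does not appear free in τ" restriction is mechanically checkable with a traversal of the type. A sketch (Python; the tuple encoding of compound types is our own):

```python
def appears_in(name, ty):
    """Does the type name occur anywhere inside ty?"""
    if ty == name:
        return True
    if isinstance(ty, tuple):      # compound types: (constructor, parts...)
        return any(appears_in(name, part) for part in ty[1:])
    return False

# returning the person value directly leaks the type:
assert appears_in("FamilyTree", "FamilyTree")
# hiding it in a vector still trips the stricter check:
assert appears_in("FamilyTree", ("vector", "FamilyTree"))
# a number can safely escape the declaration's scope:
assert not appears_in("FamilyTree", "number")
```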

This restriction may be overreaching, however. For instance, a programmer might define a new type,

and return a package (a vector, say) consisting of two valu