return set_contains(s.rest, v)
>>> set_contains(s, 0)
False
How many steps does this save? In the worst case, the item we are looking for may be the largest one in the
set, so the number of steps is the same as for the unordered representation. On the other hand, if we search for
items of many different sizes we can expect that sometimes we will be able to stop searching at a point near the
beginning of the list and that other times we will still need to examine most of the list. On average we should
expect to have to examine about half of the items in the set. Thus, the average number of steps required will be
about n . This is still
2
Θ(n) growth, but it does save us, on average, a factor of 2 in the number of steps over the
previous implementation.
We can obtain a more impressive speedup by re-implementing intersect_set. In the unordered repre-
sentation, this operation required Θ(n2) steps because we performed a complete scan of set2 for each element
of set1. But with the ordered representation, we can use a more clever method. We iterate through both sets
simultaneously, tracking an element e1 in set1 and e2 in set2. When e1 and e2 are equal, we include that
element in the intersection.
Suppose, however, that e1 is less than e2. Since e2 is smaller than the remaining elements of set2, we
can immediately conclude that e1 cannot appear anywhere in the remainder of set2 and hence is not in the
intersection. Thus, we no longer need to consider e1; we discard it and proceed to the next element of set1.
Similar logic advances through the elements of set2 when e2 < e1. Here is the function:
>>> def intersect_set(set1, set2):
if empty(set1) or empty(set2):
return Rlist.empty
e1, e2 = set1.first, set2.first
if e1 == e2:
return Rlist(e1, intersect_set(set1.rest, set2.rest))
elif e1 < e2:
return intersect_set(set1.rest, set2)
elif e2 < e1:
return intersect_set(set1, set2.rest)
>>> intersect_set(s, s.rest)
Rlist(2, Rlist(3))
To estimate the number of steps required by this process, observe that in each step we shrink the size of at
least one of the sets. Thus, the number of steps required is at most the sum of the sizes of set1 and set2, rather
than the product of the sizes, as with the unordered representation. This is Θ(n) growth rather than Θ(n2) -- a
considerable speedup, even for sets of moderate size. For example, the intersection of two sets of size 100 will
take around 200 steps, rather than 10,000 for the unordered representation.
Adjunction and union for sets represented as ordered sequences can also be computed in linear time. These
implementations are left as an exercise.
Sets as binary trees. We can do better than the ordered-list representation by arranging the set elements in the
form of a tree. We use the Tree class introduced previously. The entry of the root of the tree holds one element
of the set. The entries within the left branch include all elements smaller than the one at the root. Entries in
the right branch include all elements greater than the one at the root. The figure below shows some trees that
16
represent the set {1, 3, 5, 7, 9, 11}. The same set may be represented by a tree in a number of different
ways. The only thing we require for a valid representation is that all elements in the left subtree be smaller than
the tree entry and that all elements in the right subtree be larger.
The advantage of the tree representation is this: Suppose we want to check whether a value v is contained in
a set. We begin by comparing v with entry. If v is less than this, we know that we need only search the left
subtree; if v is greater, we need only search the right subtree. Now, if the tree is “balanced,” each of these
subtrees will be about half the size of the original. Thus, in one step we have reduced the problem of searching a
tree of size n to searching a tree of size n . Since the size of the tree is halved at each step, we should expect that the 2
number of steps needed to search a tree grows as Θ(log n). For large sets, this will be a significant speedup over
the previous representations. This set_contains function exploits the ordering structure of the tree-structured
set.
>>> def set_contains(s, v):
if s is None:
return False
elif s.entry == v:
return True
elif s.entry < v:
return set_contains(s.right, v)
elif s.entry > v:
return set_contains(s.left, v)
Adjoining an item to a set is implemented similarly and also requires Θ(log n) steps. To adjoin a value v, we
compare v with entry to determine whether v should be added to the right or to the left branch, and having
adjoined v to the appropriate branch we piece this newly constructed branch together with the original entry
and the other branch. If v is equal to the entry, we just return the node. If we are asked to adjoin v to an empty
tree, we generate a Tree that has v as the entry and empty right and left branches. Here is the function:
>>> def adjoin_set(s, v):
if s is None:
return Tree(v)
if s.entry == v:
return s
if s.entry < v:
return Tree(s.entry, s.left, adjoin_set(s.right, v))
if s.entry > v:
return Tree(s.entry, adjoin_set(s.left, v), s.right)
>>> adjoin_set(adjoin_set(adjoin_set(None, 2), 3), 1)
Tree(2, Tree(1), Tree(3))
17
Our claim that searching the tree can be performed in a logarithmic number of steps rests on the assumption
that the tree is “balanced,” i.e., that the left and the right subtree of every tree have approximately the same number
of elements, so that each subtree contains about half the elements of its parent. But how can we be certain that the
trees we construct will be balanced? Even if we start with a balanced tree, adding elements with adjoin_set
may produce an unbalanced result. Since the position of a newly adjoined element depends on how the element
compares with the items already in the set, we can expect that if we add elements “randomly” the tree will tend to
be balanced on the average.
But this is not a guarantee. For example, if we start with an empty set and adjoin the numbers 1 through 7 in
sequence we end up with a highly unbalanced tree in which all the left subtrees are empty, so it has no advantage
over a simple ordered list. One way to solve this problem is to define an operation that transforms an arbitrary tree
into a balanced tree with the same elements. We can perform this transformation after every few adjoin_set
operations to keep our set in balance.
Intersection and union operations can be performed on tree-structured sets in linear time by converting them
to ordered lists and back. The details are left as an exercise.
Python set implementation. The set type that is built into Python does not use any of these representations
internally. Instead, Python uses a representation that gives constant-time membership tests and adjoin operations
based on a technique called hashing, which is a topic for another course. Built-in Python sets cannot contain
mutable data types, such as lists, dictionaries, or other sets. To allow for nested sets, Python also includes a
built-in immutable frozenset class that shares methods with the set class but excludes mutation methods and
operators.
3.4 Exceptions
Programmers must be always mindful of possible errors that may arise in their programs. Examples abound:
a function may not receive arguments that it is designed to accept, a necessary resource may be missing, or
a connection across a network may be lost. When designing a program, one must anticipate the exceptional
circumstances that may arise and take appropriate measures to handle them.
There is no single correct approach to handling errors in a program. Programs designed to provide some
persistent service like a web server should be robust to errors, logging them for later consideration but continuing
to service new requests as long as possible. On the other hand, the Python interpreter handles errors by terminating
immediately and printing an error message, so that programmers can address issues as soon as they arise. In any
case, programmers must make conscious choices about how their programs should react to exceptional conditions.
Exceptions, the topic of this section, provides a general mechanism for adding error-handling logic to pro-
grams. Raising an exception is a technique for interrupting the normal flow of execution in a program, signaling
that some exceptional circumstance has arisen, and returning directly to an enclosing part of the program that was
designated to react to that circumstance. The Python interpreter raises an exception each time it detects an error
in an expression or statement. Users can also raise exceptions with raise and assert statements.
Raising exceptions. An exception is a object instance with a class that inherits, either directly or indirectly,
from the BaseException class. The assert statement introduced in Chapter 1 raises an exception with the
class AssertionError. In general, any exception instance can be raised with the raise statement. The
general form of raise statements are described in the Python docs. The most common use of raise constructs an exception instance and raises it.
>>> raise Exception(’An error occurred’)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
Exception: an error occurred
When an exception is raised, no further statements in the current block of code are executed. Unless the
exception is handled (described below), the interpreter will return directly to the interactive read-eval-print loop,
or terminate entirely if Python was started with a file argument. In addition, the interpreter will print a stack
backtrace, which is a structured block of text that describes the nested set of active function calls in the branch
of execution in which the exception was raised. In the example above, the file name <stdin> indicates that the
exception was raised by the user in an interactive session, rather than from code in a file.
Handling exceptions. An exception can be handled by an enclosing try statement. A try statement consists
of multiple clauses; the first begins with try and the rest begin with except:
18
try:
<try suite>
except <exception class> as <name>:
<except suite>
...
The <try suite> is always executed immediately when the try statement is executed. Suites of the
except clauses are only executed when an exception is raised during the course of executing the <try suite>.
Each except clause specifies the particular class of exception to handle. For instance, if the <exception
class> is AssertionError, then any instance of a class inheriting from AssertionError that is raised
during the course of executing the <try suite> will be handled by the following <except suite>. Within
the <except suite>, the identifier <name> is bound to the exception object that was raised, but this binding
does not persist beyond the <except suite>.
For example, we can handle a ZeroDivisionError exception using a try statement that binds the name
x to 0 when the exception is raised.
>>> try:
x = 1/0
except ZeroDivisionError as e:
print(’handling a’, type(e))
x = 0
handling a <class ’ZeroDivisionError’>
>>> x
0
A try statement will handle exceptions that occur within the body of a function that is applied (either directly
or indirectly) within the <try suite>. When an exception is raised, control jumps directly to the body of the
<except suite> of the most recent try statement that handles that type of exception.
>>> def invert(x):
result = 1/x
# Raises a ZeroDivisionError if x is 0
print(’Never printed if x is 0’)
return result
>>> def invert_safe(x):
try:
return invert(x)
except ZeroDivisionError as e:
return str(e)
>>> invert_safe(2)
Never printed if x is 0
0.5
>>> invert_safe(0)
’division by zero’
This example illustrates that the print expression in invert is never evaluated, and instead control is
transferred to the suite of the except clause in handler. Coercing the ZeroDivisionError e to a string
gives the human-interpretable string returned by handler: ’division by zero’.
3.4.1 Exception Objects
Exception objects themselves carry attributes, such as the error message stated in an assert statement and
information about where in the course of execution the exception was raised. User-defined exception classes can
carry additional attributes.
In Chapter 1, we implemented Newton’s method to find the zeroes of arbitrary functions. The following
example defines an exception class that returns the best guess discovered in the course of iterative improvement
whenever a ValueError occurs. A math domain error (a type of ValueError) is raised when sqrt is applied
19
to a negative number. This exception is handled by raising an IterImproveError that stores the most recent
guess from Newton’s method as an attribute.
First, we define a new class that inherits from Exception.
>>> class IterImproveError(Exception):
def __init__(self, last_guess):
self.last_guess = last_guess
Next, we define a version of IterImprove, our generic iterative improvement algorithm. This version
handles any ValueError by raising an IterImproveError that stores the most recent guess. As before,
iter_improve takes as arguments two functions, each of which takes a single numerical argument. The
update function returns new guesses, while the done function returns a boolean indicating that improvement
has converged to a correct value.
>>> def iter_improve(update, done, guess=1, max_updates=1000):
k = 0
try:
while not done(guess) and k < max_updates:
guess = update(guess)
k = k + 1
return guess
except ValueError:
raise IterImproveError(guess)
Finally, we define find_root, which returns the result of iter_improve applied to a Newton update
function returned by newton_update, which is defined in Chapter 1 and requires no changes for this example.
This version of find_root handles an IterImproveError by returning its last guess.
>>> def find_root(f, guess=1):
def done(x):
return f(x) == 0
try:
return iter_improve(newton_update(f), done, guess)
except IterImproveError as e:
return e.last_guess
√
Consider applying find_root to find the zero of the function 2x2 +
x. This function has a zero at 0, but
evaluating it on any negative number will raise a ValueError. Our Chapter 1 implementation of Newton’s
Method would raise that error and fail to return any guess of the zero. Our revised implementation returns the last
guess found before the error.
>>> from math import sqrt
>>> find_root(lambda x: 2*x*x + sqrt(x))
-0.030211203830201594
While this approximation is still far from the correct answer of 0, some applications would prefer this coarse
approximation to a ValueError.
Exceptions are another technique that help us as programs to separate the concerns of our program into modular
parts. In this example, Python’s exception mechanism allowed us to separate the logic for iterative improvement,
which appears unchanged in the suite of the try clause, from the logic for handling errors, which appears in
except clauses. We will also find that exceptions are a very useful feature when implementing interpreters in
Python.
3.5 Interpreters for Languages with Combination
The software running on any modern computer is written in a variety of programming languages. There are
physical languages, such as the machine languages for particular computers. These languages are concerned with
the representation of data and control in terms of individual bits of storage and primitive machine instructions. The
20
machine-language programmer is concerned with using the given hardware to erect systems and utilities for the
efficient implementation of resource-limited computations. High-level languages, erected on a machine-language
substrate, hide concerns about the representation of data as collections of bits and the representation of programs
as sequences of primitive instructions. These languages have means of combination and abstraction, such as
procedure definition, that are appropriate to the larger-scale organization of software systems.
Metalinguistic abstraction -- establishing new languages -- plays an important role in all branches of engi-
neering design. It is particularly important to computer programming, because in programming not only can we
formulate new languages but we can also implement these languages by constructing interpreters. An interpreter
for a programming language is a function that, when applied to an expression of the language, performs the actions
required to evaluate that expression.
We now embark on a tour of the technology by which languages are established in terms of other languages.
We will first define an interpreter for a limited language called Calculator that shares the syntax of Python call
expressions. We will then develop a sketch interpreters for the Scheme and Logo languages, which is are dialects
of Lisp, the second oldest language still in widespread use today. The interpreter we create will be complete in the
sense that it will allow us to write fully general programs in Logo. To do so, it will implement the environment
model of evaluation that we have developed over the course of this text.
3.5.1 Calculator
Our first new language is Calculator, an expression language for the arithmetic operations of addition, subtraction,
multiplication, and division. Calculator shares Python’s call expression syntax, but its operators are more flexible
in the number of arguments they accept. For instance, the Calculator operators add and mul take an arbitrary
number of arguments:
calc> add(1, 2, 3, 4)
10
calc> mul()
1
The sub operator has two behaviors. With one argument, it negates the argument. With at least two arguments,
it subtracts all but the first from the first. The div operator has the semantics of Python’s operator.truediv
function and takes exactly two arguments:
calc> sub(10, 1, 2, 3)
4
calc> sub(3)
-3
calc> div(15, 12)
1.25
As in Python, call expression nesting provides a means of combination in the Calculator language. To condense
notation, the names of operators can also be replaced by their standard symbols:
calc> sub(100, mul(7, add(8, div(-12, -3))))
16.0
calc> -(100, *(7, +(8, /(-12, -3))))
16.0
We will implement an interpreter for Calculator in Python. That is, we will write a Python program that takes
a string as input and either returns the result of evaluating that string if it is a well-formed Calculator expression
or raises an appropriate exception if it is not. The core of the interpreter for the Calculator language is a recursive
function called calc_eval that evaluates a tree-structured expression object.
Expression trees. Until this point in the course, expression trees have been conceptual entities to which we
have referred in describing the process of evaluation; we have never before explicitly represented expression trees
as data in our programs. In order to write an interpreter, we must operate on expressions as data. In the course of
this chapter, many of the concepts introduced in previous chapters will finally by realized in code.
A primitive expression is just a number in Calculator, either an int or float type. All combined expres-
sions are call expressions. A call expression is represented as a class Exp that has two attribute instances. The
21
operator in Calculator is always a string: an arithmetic operator name or symbol. The operands are either
primitive expressions or themselves instances of Exp.
>>> class Exp(object):
"""A call expression in Calculator."""
def __init__(self, operator, operands):
self.operator = operator
self.operands = operands
def __repr__(self):
return ’Exp({0}, {1})’.format(repr(self.operator), repr(self.operands))
def __str__(self):
operand_strs = ’, ’.join(map(str, self.operands))
return ’{0}({1})’.format(self.operator, operand_strs)
An Exp instance defines two string methods. The __repr__ method returns Python expression, while the
__str__ method returns a Calculator expression.
>>> Exp(’add’, [1, 2])
Exp(’add’, [1, 2])
>>> str(Exp(’add’, [1, 2]))
’add(1, 2)’
>>> Exp(’add’, [1, Exp(’mul’, [2, 3, 4])])
Exp(’add’, [1, Exp(’mul’, [2, 3, 4])])
>>> str(Exp(’add’, [1, Exp(’mul’, [2, 3, 4])]))
’add(1, mul(2, 3, 4))’
This final example demonstrates how the Exp class represents the hierarchical structure in expression trees by
including instances of Exp as elements of operands.
Evaluation. The calc_eval function itself takes an expression as an argument and returns its value. It
classifies the expression by its form and directs its evaluation. For Calculator, the only two syntactic forms of
expressions are numbers and call expressions, which are Exp instances. Numbers are self-evaluating; they can be
returned directly from calc_eval. Call expressions require function application.
>>> def calc_eval(exp):
"""Evaluate a Calculator expression."""
if type(exp) in (int, float):
return exp
elif type(exp) == Exp:
arguments = list(map(calc_eval, exp.operands))
return calc_apply(exp.operator, arguments)
Call expressions are evaluated by first recursively mapping the calc_eval function to the list of operands
to compute a list of arguments. Then, the operator is applied to those arguments in a second function,
calc_apply.
The Calculator language is simple enough that we can easily express the logic of applying each operator in the
body of a single function. In calc_apply, each conditional clause corresponds to applying one operator.
>>> from operator import mul
>>> from functools import reduce
>>> def calc_apply(operator, args):
"""Apply the named operator to a list of args."""
if operator in (’add’, ’+’):
return sum(args)
if operator in (’sub’, ’-’):
if len(args) == 0:
raise TypeError(operator + ’ requires at least 1 argument’)
if len(args) == 1:
return -args[0]
22
return sum(args[:1] + [-arg for arg in args[1:]])
if operator in (’mul’, ’*’):
return reduce(mul, args, 1)
if operator in (’div’, ’/’):
if len(args) != 2:
raise TypeError(operator + ’ requires exactly 2 arguments’)
numer, denom = args
return numer/denom
Above, each suite computes the result of a different operator, or raises an appropriate TypeError when the
wrong number of arguments is given. The calc_apply function can be applied directly, but it must be passed a
list of values as arguments rather than a list of operand expressions.
>>> calc_apply(’+’, [1, 2, 3])
6
>>> calc_apply(’-’, [10, 1, 2, 3])
4
>>> calc_apply(’*’, [])
1
>>> calc_apply(’/’, [40, 5])
8.0
The role of calc_eval is to make proper calls to calc_apply by first computing the value of operand
sub-expressions before passing them as arguments to calc_apply. Thus, calc_eval can accept a nested
expression.
>>> e = Exp(’add’, [2, Exp(’mul’, [4, 6])])
>>> str(e)
’add(2, mul(4, 6))’
>>> calc_eval(e)
26
The structure of calc_eval is an example of dispatching on type: the form of the expression. The first form
of expression is a number, which requires no additional evaluation step. In general, primitive expressions that do
not require an additional evaluation step are called self-evaluating. The only self-evaluating expressions in our
Cal