Chapter 1: Building Abstractions with Functions by Harold Abelson, Gerald Jay Sussman, Julie Sussman - HTML preview

PLEASE NOTE: This is an HTML preview only and some elements such as links or page numbers may be incorrect.
Download the book in PDF, ePub, Kindle for a complete version.

return set_contains(s.rest, v)

>>> set_contains(s, 0)

False

How many steps does this save? In the worst case, the item we are looking for may be the largest one in the

set, so the number of steps is the same as for the unordered representation. On the other hand, if we search for

items of many different sizes we can expect that sometimes we will be able to stop searching at a point near the

beginning of the list and that other times we will still need to examine most of the list. On average we should

expect to have to examine about half of the items in the set. Thus, the average number of steps required will be

about n . This is still

2

Θ(n) growth, but it does save us, on average, a factor of 2 in the number of steps over the

previous implementation.

We can obtain a more impressive speedup by re-implementing intersect_set. In the unordered repre-

sentation, this operation required Θ(n2) steps because we performed a complete scan of set2 for each element

of set1. But with the ordered representation, we can use a more clever method. We iterate through both sets

simultaneously, tracking an element e1 in set1 and e2 in set2. When e1 and e2 are equal, we include that

element in the intersection.

Suppose, however, that e1 is less than e2. Since e2 is smaller than the remaining elements of set2, we

can immediately conclude that e1 cannot appear anywhere in the remainder of set2 and hence is not in the

intersection. Thus, we no longer need to consider e1; we discard it and proceed to the next element of set1.

Similar logic advances through the elements of set2 when e2 < e1. Here is the function:

>>> def intersect_set(set1, set2):

if empty(set1) or empty(set2):

return Rlist.empty

e1, e2 = set1.first, set2.first

if e1 == e2:

return Rlist(e1, intersect_set(set1.rest, set2.rest))

elif e1 < e2:

return intersect_set(set1.rest, set2)

elif e2 < e1:

return intersect_set(set1, set2.rest)

>>> intersect_set(s, s.rest)

Rlist(2, Rlist(3))

To estimate the number of steps required by this process, observe that in each step we shrink the size of at

least one of the sets. Thus, the number of steps required is at most the sum of the sizes of set1 and set2, rather

than the product of the sizes, as with the unordered representation. This is Θ(n) growth rather than Θ(n2) -- a

considerable speedup, even for sets of moderate size. For example, the intersection of two sets of size 100 will

take around 200 steps, rather than 10,000 for the unordered representation.

Adjunction and union for sets represented as ordered sequences can also be computed in linear time. These

implementations are left as an exercise.

Sets as binary trees. We can do better than the ordered-list representation by arranging the set elements in the

form of a tree. We use the Tree class introduced previously. The entry of the root of the tree holds one element

of the set. The entries within the left branch include all elements smaller than the one at the root. Entries in

the right branch include all elements greater than the one at the root. The figure below shows some trees that

16

index-106_1.png

represent the set {1, 3, 5, 7, 9, 11}. The same set may be represented by a tree in a number of different

ways. The only thing we require for a valid representation is that all elements in the left subtree be smaller than

the tree entry and that all elements in the right subtree be larger.

The advantage of the tree representation is this: Suppose we want to check whether a value v is contained in

a set. We begin by comparing v with entry. If v is less than this, we know that we need only search the left

subtree; if v is greater, we need only search the right subtree. Now, if the tree is “balanced,” each of these

subtrees will be about half the size of the original. Thus, in one step we have reduced the problem of searching a

tree of size n to searching a tree of size n . Since the size of the tree is halved at each step, we should expect that the 2

number of steps needed to search a tree grows as Θ(log n). For large sets, this will be a significant speedup over

the previous representations. This set_contains function exploits the ordering structure of the tree-structured

set.

>>> def set_contains(s, v):

if s is None:

return False

elif s.entry == v:

return True

elif s.entry < v:

return set_contains(s.right, v)

elif s.entry > v:

return set_contains(s.left, v)

Adjoining an item to a set is implemented similarly and also requires Θ(log n) steps. To adjoin a value v, we

compare v with entry to determine whether v should be added to the right or to the left branch, and having

adjoined v to the appropriate branch we piece this newly constructed branch together with the original entry

and the other branch. If v is equal to the entry, we just return the node. If we are asked to adjoin v to an empty

tree, we generate a Tree that has v as the entry and empty right and left branches. Here is the function:

>>> def adjoin_set(s, v):

if s is None:

return Tree(v)

if s.entry == v:

return s

if s.entry < v:

return Tree(s.entry, s.left, adjoin_set(s.right, v))

if s.entry > v:

return Tree(s.entry, adjoin_set(s.left, v), s.right)

>>> adjoin_set(adjoin_set(adjoin_set(None, 2), 3), 1)

Tree(2, Tree(1), Tree(3))

17

Our claim that searching the tree can be performed in a logarithmic number of steps rests on the assumption

that the tree is “balanced,” i.e., that the left and the right subtree of every tree have approximately the same number

of elements, so that each subtree contains about half the elements of its parent. But how can we be certain that the

trees we construct will be balanced? Even if we start with a balanced tree, adding elements with adjoin_set

may produce an unbalanced result. Since the position of a newly adjoined element depends on how the element

compares with the items already in the set, we can expect that if we add elements “randomly” the tree will tend to

be balanced on the average.

But this is not a guarantee. For example, if we start with an empty set and adjoin the numbers 1 through 7 in

sequence we end up with a highly unbalanced tree in which all the left subtrees are empty, so it has no advantage

over a simple ordered list. One way to solve this problem is to define an operation that transforms an arbitrary tree

into a balanced tree with the same elements. We can perform this transformation after every few adjoin_set

operations to keep our set in balance.

Intersection and union operations can be performed on tree-structured sets in linear time by converting them

to ordered lists and back. The details are left as an exercise.

Python set implementation. The set type that is built into Python does not use any of these representations

internally. Instead, Python uses a representation that gives constant-time membership tests and adjoin operations

based on a technique called hashing, which is a topic for another course. Built-in Python sets cannot contain

mutable data types, such as lists, dictionaries, or other sets. To allow for nested sets, Python also includes a

built-in immutable frozenset class that shares methods with the set class but excludes mutation methods and

operators.

3.4 Exceptions

Programmers must be always mindful of possible errors that may arise in their programs. Examples abound:

a function may not receive arguments that it is designed to accept, a necessary resource may be missing, or

a connection across a network may be lost. When designing a program, one must anticipate the exceptional

circumstances that may arise and take appropriate measures to handle them.

There is no single correct approach to handling errors in a program. Programs designed to provide some

persistent service like a web server should be robust to errors, logging them for later consideration but continuing

to service new requests as long as possible. On the other hand, the Python interpreter handles errors by terminating

immediately and printing an error message, so that programmers can address issues as soon as they arise. In any

case, programmers must make conscious choices about how their programs should react to exceptional conditions.

Exceptions, the topic of this section, provides a general mechanism for adding error-handling logic to pro-

grams. Raising an exception is a technique for interrupting the normal flow of execution in a program, signaling

that some exceptional circumstance has arisen, and returning directly to an enclosing part of the program that was

designated to react to that circumstance. The Python interpreter raises an exception each time it detects an error

in an expression or statement. Users can also raise exceptions with raise and assert statements.

Raising exceptions. An exception is a object instance with a class that inherits, either directly or indirectly,

from the BaseException class. The assert statement introduced in Chapter 1 raises an exception with the

class AssertionError. In general, any exception instance can be raised with the raise statement. The

general form of raise statements are described in the Python docs. The most common use of raise constructs an exception instance and raises it.

>>> raise Exception(’An error occurred’)

Traceback (most recent call last):

File "<stdin>", line 1, in <module>

Exception: an error occurred

When an exception is raised, no further statements in the current block of code are executed. Unless the

exception is handled (described below), the interpreter will return directly to the interactive read-eval-print loop,

or terminate entirely if Python was started with a file argument. In addition, the interpreter will print a stack

backtrace, which is a structured block of text that describes the nested set of active function calls in the branch

of execution in which the exception was raised. In the example above, the file name <stdin> indicates that the

exception was raised by the user in an interactive session, rather than from code in a file.

Handling exceptions. An exception can be handled by an enclosing try statement. A try statement consists

of multiple clauses; the first begins with try and the rest begin with except:

18

try:

<try suite>

except <exception class> as <name>:

<except suite>

...

The <try suite> is always executed immediately when the try statement is executed. Suites of the

except clauses are only executed when an exception is raised during the course of executing the <try suite>.

Each except clause specifies the particular class of exception to handle. For instance, if the <exception

class> is AssertionError, then any instance of a class inheriting from AssertionError that is raised

during the course of executing the <try suite> will be handled by the following <except suite>. Within

the <except suite>, the identifier <name> is bound to the exception object that was raised, but this binding

does not persist beyond the <except suite>.

For example, we can handle a ZeroDivisionError exception using a try statement that binds the name

x to 0 when the exception is raised.

>>> try:

x = 1/0

except ZeroDivisionError as e:

print(’handling a’, type(e))

x = 0

handling a <class ’ZeroDivisionError’>

>>> x

0

A try statement will handle exceptions that occur within the body of a function that is applied (either directly

or indirectly) within the <try suite>. When an exception is raised, control jumps directly to the body of the

<except suite> of the most recent try statement that handles that type of exception.

>>> def invert(x):

result = 1/x

# Raises a ZeroDivisionError if x is 0

print(’Never printed if x is 0’)

return result

>>> def invert_safe(x):

try:

return invert(x)

except ZeroDivisionError as e:

return str(e)

>>> invert_safe(2)

Never printed if x is 0

0.5

>>> invert_safe(0)

’division by zero’

This example illustrates that the print expression in invert is never evaluated, and instead control is

transferred to the suite of the except clause in handler. Coercing the ZeroDivisionError e to a string

gives the human-interpretable string returned by handler: ’division by zero’.

3.4.1 Exception Objects

Exception objects themselves carry attributes, such as the error message stated in an assert statement and

information about where in the course of execution the exception was raised. User-defined exception classes can

carry additional attributes.

In Chapter 1, we implemented Newton’s method to find the zeroes of arbitrary functions. The following

example defines an exception class that returns the best guess discovered in the course of iterative improvement

whenever a ValueError occurs. A math domain error (a type of ValueError) is raised when sqrt is applied

19

to a negative number. This exception is handled by raising an IterImproveError that stores the most recent

guess from Newton’s method as an attribute.

First, we define a new class that inherits from Exception.

>>> class IterImproveError(Exception):

def __init__(self, last_guess):

self.last_guess = last_guess

Next, we define a version of IterImprove, our generic iterative improvement algorithm. This version

handles any ValueError by raising an IterImproveError that stores the most recent guess. As before,

iter_improve takes as arguments two functions, each of which takes a single numerical argument. The

update function returns new guesses, while the done function returns a boolean indicating that improvement

has converged to a correct value.

>>> def iter_improve(update, done, guess=1, max_updates=1000):

k = 0

try:

while not done(guess) and k < max_updates:

guess = update(guess)

k = k + 1

return guess

except ValueError:

raise IterImproveError(guess)

Finally, we define find_root, which returns the result of iter_improve applied to a Newton update

function returned by newton_update, which is defined in Chapter 1 and requires no changes for this example.

This version of find_root handles an IterImproveError by returning its last guess.

>>> def find_root(f, guess=1):

def done(x):

return f(x) == 0

try:

return iter_improve(newton_update(f), done, guess)

except IterImproveError as e:

return e.last_guess

Consider applying find_root to find the zero of the function 2x2 +

x. This function has a zero at 0, but

evaluating it on any negative number will raise a ValueError. Our Chapter 1 implementation of Newton’s

Method would raise that error and fail to return any guess of the zero. Our revised implementation returns the last

guess found before the error.

>>> from math import sqrt

>>> find_root(lambda x: 2*x*x + sqrt(x))

-0.030211203830201594

While this approximation is still far from the correct answer of 0, some applications would prefer this coarse

approximation to a ValueError.

Exceptions are another technique that help us as programs to separate the concerns of our program into modular

parts. In this example, Python’s exception mechanism allowed us to separate the logic for iterative improvement,

which appears unchanged in the suite of the try clause, from the logic for handling errors, which appears in

except clauses. We will also find that exceptions are a very useful feature when implementing interpreters in

Python.

3.5 Interpreters for Languages with Combination

The software running on any modern computer is written in a variety of programming languages. There are

physical languages, such as the machine languages for particular computers. These languages are concerned with

the representation of data and control in terms of individual bits of storage and primitive machine instructions. The

20

machine-language programmer is concerned with using the given hardware to erect systems and utilities for the

efficient implementation of resource-limited computations. High-level languages, erected on a machine-language

substrate, hide concerns about the representation of data as collections of bits and the representation of programs

as sequences of primitive instructions. These languages have means of combination and abstraction, such as

procedure definition, that are appropriate to the larger-scale organization of software systems.

Metalinguistic abstraction -- establishing new languages -- plays an important role in all branches of engi-

neering design. It is particularly important to computer programming, because in programming not only can we

formulate new languages but we can also implement these languages by constructing interpreters. An interpreter

for a programming language is a function that, when applied to an expression of the language, performs the actions

required to evaluate that expression.

We now embark on a tour of the technology by which languages are established in terms of other languages.

We will first define an interpreter for a limited language called Calculator that shares the syntax of Python call

expressions. We will then develop a sketch interpreters for the Scheme and Logo languages, which is are dialects

of Lisp, the second oldest language still in widespread use today. The interpreter we create will be complete in the

sense that it will allow us to write fully general programs in Logo. To do so, it will implement the environment

model of evaluation that we have developed over the course of this text.

3.5.1 Calculator

Our first new language is Calculator, an expression language for the arithmetic operations of addition, subtraction,

multiplication, and division. Calculator shares Python’s call expression syntax, but its operators are more flexible

in the number of arguments they accept. For instance, the Calculator operators add and mul take an arbitrary

number of arguments:

calc> add(1, 2, 3, 4)

10

calc> mul()

1

The sub operator has two behaviors. With one argument, it negates the argument. With at least two arguments,

it subtracts all but the first from the first. The div operator has the semantics of Python’s operator.truediv

function and takes exactly two arguments:

calc> sub(10, 1, 2, 3)

4

calc> sub(3)

-3

calc> div(15, 12)

1.25

As in Python, call expression nesting provides a means of combination in the Calculator language. To condense

notation, the names of operators can also be replaced by their standard symbols:

calc> sub(100, mul(7, add(8, div(-12, -3))))

16.0

calc> -(100, *(7, +(8, /(-12, -3))))

16.0

We will implement an interpreter for Calculator in Python. That is, we will write a Python program that takes

a string as input and either returns the result of evaluating that string if it is a well-formed Calculator expression

or raises an appropriate exception if it is not. The core of the interpreter for the Calculator language is a recursive

function called calc_eval that evaluates a tree-structured expression object.

Expression trees. Until this point in the course, expression trees have been conceptual entities to which we

have referred in describing the process of evaluation; we have never before explicitly represented expression trees

as data in our programs. In order to write an interpreter, we must operate on expressions as data. In the course of

this chapter, many of the concepts introduced in previous chapters will finally by realized in code.

A primitive expression is just a number in Calculator, either an int or float type. All combined expres-

sions are call expressions. A call expression is represented as a class Exp that has two attribute instances. The

21

operator in Calculator is always a string: an arithmetic operator name or symbol. The operands are either

primitive expressions or themselves instances of Exp.

>>> class Exp(object):

"""A call expression in Calculator."""

def __init__(self, operator, operands):

self.operator = operator

self.operands = operands

def __repr__(self):

return ’Exp({0}, {1})’.format(repr(self.operator), repr(self.operands))

def __str__(self):

operand_strs = ’, ’.join(map(str, self.operands))

return ’{0}({1})’.format(self.operator, operand_strs)

An Exp instance defines two string methods. The __repr__ method returns Python expression, while the

__str__ method returns a Calculator expression.

>>> Exp(’add’, [1, 2])

Exp(’add’, [1, 2])

>>> str(Exp(’add’, [1, 2]))

’add(1, 2)’

>>> Exp(’add’, [1, Exp(’mul’, [2, 3, 4])])

Exp(’add’, [1, Exp(’mul’, [2, 3, 4])])

>>> str(Exp(’add’, [1, Exp(’mul’, [2, 3, 4])]))

’add(1, mul(2, 3, 4))’

This final example demonstrates how the Exp class represents the hierarchical structure in expression trees by

including instances of Exp as elements of operands.

Evaluation. The calc_eval function itself takes an expression as an argument and returns its value. It

classifies the expression by its form and directs its evaluation. For Calculator, the only two syntactic forms of

expressions are numbers and call expressions, which are Exp instances. Numbers are self-evaluating; they can be

returned directly from calc_eval. Call expressions require function application.

>>> def calc_eval(exp):

"""Evaluate a Calculator expression."""

if type(exp) in (int, float):

return exp

elif type(exp) == Exp:

arguments = list(map(calc_eval, exp.operands))

return calc_apply(exp.operator, arguments)

Call expressions are evaluated by first recursively mapping the calc_eval function to the list of operands

to compute a list of arguments. Then, the operator is applied to those arguments in a second function,

calc_apply.

The Calculator language is simple enough that we can easily express the logic of applying each operator in the

body of a single function. In calc_apply, each conditional clause corresponds to applying one operator.

>>> from operator import mul

>>> from functools import reduce

>>> def calc_apply(operator, args):

"""Apply the named operator to a list of args."""

if operator in (’add’, ’+’):

return sum(args)

if operator in (’sub’, ’-’):

if len(args) == 0:

raise TypeError(operator + ’ requires at least 1 argument’)

if len(args) == 1:

return -args[0]

22

return sum(args[:1] + [-arg for arg in args[1:]])

if operator in (’mul’, ’*’):

return reduce(mul, args, 1)

if operator in (’div’, ’/’):

if len(args) != 2:

raise TypeError(operator + ’ requires exactly 2 arguments’)

numer, denom = args

return numer/denom

Above, each suite computes the result of a different operator, or raises an appropriate TypeError when the

wrong number of arguments is given. The calc_apply function can be applied directly, but it must be passed a

list of values as arguments rather than a list of operand expressions.

>>> calc_apply(’+’, [1, 2, 3])

6

>>> calc_apply(’-’, [10, 1, 2, 3])

4

>>> calc_apply(’*’, [])

1

>>> calc_apply(’/’, [40, 5])

8.0

The role of calc_eval is to make proper calls to calc_apply by first computing the value of operand

sub-expressions before passing them as arguments to calc_apply. Thus, calc_eval can accept a nested

expression.

>>> e = Exp(’add’, [2, Exp(’mul’, [4, 6])])

>>> str(e)

’add(2, mul(4, 6))’

>>> calc_eval(e)

26

The structure of calc_eval is an example of dispatching on type: the form of the expression. The first form

of expression is a number, which requires no additional evaluation step. In general, primitive expressions that do

not require an additional evaluation step are called self-evaluating. The only self-evaluating expressions in our

Cal