Programming Languages: Application and Interpretation by Shriram Krishnamurthi - HTML preview

/ Home / Computer Sciences / Programming Languages: Application and Interpretation

PLEASE NOTE: This is an HTML preview only and some elements such as links or page numbers may be incorrect.
Download the book in PDF, ePub, Kindle for a complete version.

(- (current-milliseconds) begin-time)))]))

When we test this, we find

> (my-time (expt 2 1000))

Hmm! But ever hopeful:

> (my-time (expt 2 10000))

> (my-time (expt 2 100000))

> (my-time (expt 2 1000000))

2053

which is what we expect.

How does this version of my-time work? The Scheme macro system trawls the program source and

gathers all the syntax definitions. It then substitutes all the uses of these syntax definitions with the bodies,

where each syntax definition is defined by pattern-matching (we’ll see several more examples). Only after

finishing all the substitution does it hand the program to the Scheme evaluator, which therefore doesn’t need

to know anything about the syntax definitions. That is, given the above syntax definition and the program

(my-time (expt 2 10000)), the program that the Scheme evaluator actually sees is

(let ([begin-time (current-milliseconds)])

(begin

(expt 2 10000)

(- (current-milliseconds) begin-time)))

This is the right-hand-side of the first (and only) clause in the list of rules, except e has been substituted with

the exponentiation expression. This is now an ordinary Scheme expression that the evaluator can reduce to a

322

CHAPTER 36. MACROS AS COMPILERS

value. Notice that the current time is now measured before and after the expression evaluates, thus ensuring

that we do in fact clock its evaluation.1

Exercise 36.1.1 Do macros merely implement laziness?

36.1.2

Example: Local Definitions

We saw earlier this semester that

{with {var val} body}

could be rewritten as

{{fun {var} body} val}

by a preprocessor, so our core evaluator did not need to implement with directly. The same is true of the

let construct in Scheme. Here’s a simple macro for let (again, we’ll use the my- convention to avoid any

clashes):

(define-syntax my-let-1

(syntax-rules ()

[(my-let-1 (var val) body)

((lambda (var) body) val)]))

Sure enough,

> (my-let-1 (x 3) (+ x 4))

> (my-let-1 (x 3) (my-let-1 (y 4) (+ x y)))

In full Scheme, however, the let construct is a bit more complex: it permits binding several identifiers at the

same time (as we saw in a homework assignment regarding with). Therefore, the true translation should

be regarded as something along these lines:

(let ([var val] . . .) body) =⇒ ((lambda (var . . .) body) val . . .)

That is, we want each of the variables to remain in the same order, and likewise each of the value expressions—

except we don’t know how many we will encounter, so we use . . . to indicate “zero or more”.

How are we going to define this macro? In fact, it couldn’t be easier. A researcher, Eugene Kohlbecker,

observed that numerous extensions to Scheme had this same “zero or more” form, and noticed that people

always wrote them informally using the stylized notation above. He therefore simply defined a macro system

that processed that notation:

(define-syntax my-let

(syntax-rules ()

[(my-let ([var val] · · ·) body)

((lambda (var · · ·) body) val · · ·)]))

1Technically, this isn’t exactly the expression that evaluates. We’ll return to this in a bit.

36.1. LANGUAGE REUSE

323

Therefore (my-let ([x 3] [y 4]) (+ x y)) translates into ((lambda (x y) body) 3 4) which, sure enough, reduces

to 7. Notice how the macro system is smart enough to treat the ([var val] · · ·) pattern as being the composite

of the var · · · and val · · · patterns.2 In particular, if no identifiers are bound, then this turns into an immediate application of a thunk to no arguments, which just evaluates the body.

36.1.3

Example: Nested Local Definitions

In a let, all the named expressions are bound in the same scope, which doesn’t include any of the bound

names. Sometimes, it’s useful to bind names sequentially so later bindings can refer to earlier ones. Scheme

provides the construct let∗ for this task:

(let∗ ([a 5]

[b 12]

[aˆ2 (∗ a a)]

[bˆ2 (∗ b b)]

[aˆ2+bˆ2 (+ aˆ2 bˆ2)])

(sqrt aˆ2+bˆ2))

(Think of what this would evaluate to with let instead of let∗.)

We can implement let∗ very easily by unraveling it into a sequence of lets:

(let∗ ([var val] · · ·) body) =⇒ (let ([var0 val0])

(let ([var1 val1])

· · ·

(let ([varn valn])

body)))

There is a stylized way of writing such macros in Scheme, which is to split them into two cases: when the

sequence is empty and when the sequence has one or more elements. When there are no identifiers being

bound, then let∗ does the same thing as let (which is to reduce to the expression itself):

(let∗ () body) =⇒ body

Since each · · · means “zero or more”, we need to use a more refined pattern to indicate “one or more”:

(let∗ ([var0 val0] [var-rest val-rest] · · ·) body)

The rewrite rule then becomes

(let∗ ([var0 val0] [var-rest val-rest] · · ·) body) =⇒ (let ([var0 val0])

(let∗ ([var-rest val-rest]

...)

body))

That is, we apply the macro for let∗ recursively. Written in Scheme syntax, this is expressed as (notice the

two cases):

2The use of brackets versus parentheses is purely stylistic.

324

CHAPTER 36. MACROS AS COMPILERS

(define-syntax my-let∗

(syntax-rules ()

[(my-let∗ () body)

body]

[(my-let∗ ([var0 val0]

[var-rest val-rest] · · ·)

body)

(let ([var0 val0])

(my-let∗ ([var-rest val-rest] · · ·)

body))]))

There is nothing in Scheme that prevents a runaway expansion. Therefore, it’s possible to write a

misbehaving macro that expands forever, so that evaluation never even begins. However, most macros

follow the simple stylistic pattern above, which guarantees termination (the recursion is over the bound

identifiers, and each time through, one more identifier-value pair is taken off).

36.1.4

Example: Simple Conditional

Let’s say we want a simplified form of conditional that has only two branches and one conditional. This is

effectively the same as if:

(cond2 [t e1] [else e2]) =⇒ (if t e1 e2)

We might try the following macro:

(define-syntax cond2

(syntax-rules ()

[(cond2 (t e1) (else e2))

(if t e1 e2)]))

This correctly evaluates expressions such as

(cond2 [(even? (current-seconds)) ’even]

[else

’odd])

Unfortunately, this also permits expressions such as

(cond2 [(even? (current-seconds)) ’even]

[(odd? (current-seconds)) ’odd])

This shouldn’t be syntactically legal, because cond2 permits only one conditional; in place of the second,

we require programmers to write else. We can see that this second expression doesn’t get evaluated at all by

writing something atrocious:

(cond2 [false ’even]

[(/ 1 0) ’odd])

which evaluates to ’odd.

36.1. LANGUAGE REUSE

325

What we want is for the cond2 macro to simply reject any uses that don’t have else in the second question

position. This is where the mystical () after syntax-rules comes in: it lists the keywords in the macro. That

is, we should instead define the macro as

(define-syntax cond2

(syntax-rules (else)

[(cond2 (t e1) (else e2))

(if t e1 e2)]))

Then, we get the following interaction:

> (cond2 [false ’even]

[(/ 1 0) ’odd])

cond2: bad syntax in: (cond2 (false (quote even)) ((/ 1 0) (quote odd)))}

Without the keyword designation, Scheme has no way of knowing that else has a special status; naturally,

it makes no sense to build that knowledge into the macro system. Absent such knowledge, it simply treats

else as a macro variable, and matches it against whatever term is in that position. When we put else in the

keyword list, however, the expander no longer binds it but rather expects to find it in the right position—or

else rejects the program.

36.1.5

Example: Disjunction

Let’s consider one more example from Scheme lore. In Scheme, conditionals like or and and short-circuit:

that is, when they reach a term whose value determines the result of the expression, they do not evaluate the

subsequent terms. Let’s try to implement or.

To begin with, let’s define the two-arm version of or:

(define (my-or2-fun e1 e2)

(if e1

e2))

Sure enough, a very simple example appears to work

> (my-or2-fun false true)

but it fails on a more complex example:

> (let ([x 0])

(my-or2-fun (zero? x)

(zero? (/ 1 x))))

/: division by zero

whereas a short-circuiting evaluator would not have permitted the error to occur. The problem is, once again,

Scheme’s eager evaluation regime, which performs the divison before it ever gets to the body of my-or2-fun.

In contrast, a macro does not have this problem:

326

CHAPTER 36. MACROS AS COMPILERS

(define-syntax my-or2

(syntax-rules ()

[(my-or2 e1 e2)

(if e1 e1 e2)]))

which yields

> (my-or2 false true)

> (let ([x 0])

(my-or2 (zero? x)

(zero? (/ 1 x))))

In particular, the second expression translates into

(let ([x 0])

(if (zero? x)

(zero? x)

(zero? (/ 1 x))))

(just replace e1 and e2 consistently).

As this expansion begins to demonstrate, however, this is an unsatisfying macro. We evaluate the first

expression twice, which has the potential to be inefficient but also downright wrong. (Suppose the first

expression were to output something; then we’d see the output twice. If the expression wrote a value into a

database and returned a code, executing it a second time may produce a different result than the first time.)

Therefore, we’d really like to hold on to the value of the first evaluation and return it directly if it’s not false.

That is, we want

(define-syntax my-or2

(syntax-rules ()

[(my-or2 e1 e2)

(let ([result e1])

(if result

result

e2))]))

This expands the second expression into

(let ([x 0])

(let ([result (zero? x)])

(if result

result

(zero? (/ 1 x)))))

Since Scheme is eager, the expression in the e1 position gets evaluated only once. You should construct test

cases that demonstrate this.

36.2. HYGIENE

327

36.2

Hygiene

Now what if the use of my-or2 really looked like this?

(let ([result true])

(my-or2 false

result))

which should evaluate to true. The expansion, however, is

(let ([result true])

(let ([result false])

(if result

result

result)))

which evaluates to false!

What happened here? When we look at just the input expression, we do not see only one binding of

result. Reasoning locally to that expression, we assume that my-or2 will evaluate the first expression and,

finding it false, will evaluate the second; since this is result, which is bound to true, the overall response

should also be true. Instead, however, the use of result within the macro definition interferes with result in

the context of its use, resulting in the incorrect result.

The problem we see here should seem awfully familiar: this is exactly the same problem we saw under

a different guise when trying to understand scope. Here, result in the second arm of the disjunction is bound

in the let just outside the disjunction. In contrast, result inside the macro is bound inside the macro. We as

programmers should not need to know about all the names used within macros—just as we don’t need to

know the names of identifiers used within functions! Therefore, macros should be forced to obey the scoping

rules of the language.

Just to be sure, let’s try this expression in our evaluator:

> (let ([result true])

(my-or2 false result))

We get true! This is because Scheme’s macro system is hygienic. That is, it automatically renames identifiers

to avoid accidental name clashes. The expression that actually evaluates is something like

(let ([result true])

(let ([g1729 false])

(if g1729

g1729

result)))

where g1729 is a uniquely-generated identifier name. Notice that only the results within the macro definition

get renamed. In fact, because let is itself a macro, its identifiers also get renamed (as do those introduced by

lambda and other binding forms), so the real program sent to the evaluator might well be

(let ([g4104 true])

328

CHAPTER 36. MACROS AS COMPILERS

(let ([g1729 false])

(if g1729

g1729

g4104)))

Many macro systems, such as that of C, are not hygienic. Programmers sometimes try to circumvent this by

using hideous identifier names, such as __macro_result_. This is not a solution!

1. Not only is a it painful to have to program this way, small typos would greatly increase development

time, and the macro would be much harder to decipher when a programmer tries to modify or correct

it later.

2. This solution is only as good as the programmer’s imagination; the problem still persists, lying in wait

for just the right (wrong!) identifier to be bound in the context of use. Indeed, while a programmer

may choose a sufficiently obscure name from the perspective of other programmers, not all source is

written by humans. A tool generating C code (such as a Scheme-to-C compiler) may happen to use

exactly this naming convention.

3. This name is only obscure “upto one level”. If the macro definition is recursive, then recursive in-

stances of the macro may interfere with one another.

4. If you use this macro to debug the source that contains the macro (e.g., compiling the C compiler

using itself), then your carefully-chosen “obscure” name is now guaranteed to clash!

In short, to return to a theme of this course: we should view these kinds of contortions by programmers as

a symptom of a problem that must be addressed by better language design. Don’t settle for mediocrity! In

this case, hygiene is that solution.3

Notice, by the way, that we needed hygiene for the proper execution of our very first macro, because

my-time introduced the identifier begin-time. At the time, we never even gave a thought to this identifier,

which means in the absence of hygiene, we had a disaster waiting to happen. With hygiene, we can program

using normal names (like begin-time and result) and not have to worry about the consequencs down the line,

just as with static scope we can use reasonable names for local identifiers.

36.3

More Macrology by Example

Many languages provide a looping construct for iterating through integers sequentially. Scheme doesn’t for

three reasons:

1. Because most such loops are anyway inappropriate: the indices only exist to traverse sequential data

structures. Uses of map or filter over a list accomplish the same thing but at a higher level of abstrac-

tion.

2. Because recursion in the presence of tail calls has the same computational effect.

3The algorithm, in effect, “paints” each expression on expansion, then consistently renames identifiers that have the same paints.

36.3. MORE MACROLOGY BY EXAMPLE

329

3. Because, if we really crave a more traditional syntax, we can define it using a macro!

We’ll build up a loop macro in three stages.

36.3.1

Loops with Named Iteration Identifiers

Here’s our first attempt at a for loop macro.4 We’ve generously embellished it with keywords to increase

readability:

(define-syntax for0

(syntax-rules (from to in)

[(for0 var from low to high in bodies · · ·)

(local ([define loop (lambda ( var )

(if (> var

high )

’done

(begin

bodies · · ·

(loop (+ var 1)))))])

(loop low ))]))

This lets us write programs such as

(for0 x

from 2

to 5

in (display x))

which prints 2, 3, 4 and 5. However, when we try this on a program like this

(for0 x

from 2

to (read)

in (display x))

we notice an unpleasant phenomenon: the program reads the upper-bound of the loop every time through

the loop. To correct it, we should make sure it evaluates the upper-bound expression only once, which we

can do with a small change to the macro:

(define-syntax for1

(syntax-rules (from to in)

[(for1 var from low to high in bodies · · ·)

(local ([define high-value high ]

[define loop (lambda ( var )

(if (> var high-value)

’done

(begin

4We’re using the convention of wrapping macro pattern-variables in · to emphasize their relationship to BNF.

330

CHAPTER 36. MACROS AS COMPILERS

bodies · · ·

(loop (+ var 1)))))])

(loop low ))]))

In general, we must be very careful with macros to ensure expressions are evaluated the right number of

times. In this instance, low is going to be evaluated only once and var is only an identifier name, but we

have to make sure high is evaluated only once.

In fact, however, this version is also buggy! If there is a (read) in the low position, that’s going to get

evaluated second instead of first, which is presumably not what we wanted (though notice that we didn’t

formally specify the behavior of for, either). So to get it right, we really need to evaluate low and bind its

value to an identifier first.

In general, it’s safer to bind all expression positions to names. Scheme’s eager evaluation semantics

ensures the expressions will only be evaluted once. We don’t always want this, but we want it so often that

we may as well do it by default. (The times we accidentally bind an expression too early—for instance, the

conditional expression of a while loop—we will usually discover the problem pretty quickly by testing.) In

addition we must be sure to do this binding in the right order, mirroring what the user expects (and what

our documentation for the new language construct specifies). (Observe that the problematic expression in

this example is (read), which has the side-effect of prompting the user. Of course, we may want to limit

evaluation for efficiency reasons also.)

36.3.2

Overriding Hygiene: Loops with Implicit Iteration Identifiers

When we define a loop such as the one above, we often have no real use for the loop variable. It might be

convenient to simply introduce an identifier, say it, that is automatically bound to the current value of the

loop index. Thus, the first loop example above might instead be written as

(for2 from 2

to 5

in (display it))

Here’s a proposed macro that implements this construct:

(define-syntax for2

(syntax-rules (from to in)

[(for2 from low to high in bodies · · ·)

(local ([define high-value high ]

[define loop (lambda (it)

(if (> it high-value)

’done

(begin

bodies · · ·

(loop (+ it 1)))))])

(loop low ))]))

Notice that in place of var , we are now using it. When we run this in DrScheme, we get:

36.3. MORE MACROLOGY BY EXAMPLE

331

> (for2 from 2 to 5 in (display it))

reference to undefined identifier: it

Oops! What happened?

Actually, the macro system did exactly what it should. Remember hygiene? This was supposed to

prevent inadvertent capture of identifiers across the macro definition/macro use boundary. It just so happens

that in this case, we really do want it written in the macro use to be bound by it in the macro definition.

Clearly, here’s a good example of where we want to “break” hygiene, intentionally.

Unfortunately, the simple syntax-rules mechanism we’ve been using so far isn’t quite up to this task;

we must instead switch to a slightly more complex macro definition mechanism called syntax-case. For the

most part, this looks an awful lot like syntax-rules, with a little more notation. For instance, we can define

for3 to be the same macro as for1, except written using the new macro definition mechanism instead:

(define-syntax (for3 x)

(syntax-case x (from to in)

[(for3 var from low to high in bodies · · ·)

(syntax

(local ([define high-value high ]

[define loop (lambda ( var )

(if (> var high-value)

’done

(begin

bodies · · ·

(loop (+ var 1)))))])

(loop low )) ) ]))

To convert any syntax-rules macro definition into a corresponding one that uses syntax-case, we must

make the three changes boxed above (adding a parameter to the macro name, providing the parameter as an

explicit argument to syntax-case, and wrapping the entire output expression in (syntax · · ·).

We can similarly define for4:

(define-syntax (for4 x)

(syntax-case x (from to in)

[(for4 from low to high in bodies · · ·)

(syntax

(local ([define high-value high ]

[define loop (lambda (it)

(if (> it high-value)

’done

(begin

bodies · · ·

(loop (+ it 1)))))])

(loop low )))]))

332

CHAPTER 36. MACROS AS COMPILERS

This does not solve the hygiene problem; it simply enables it by converting the macro to use syntax-case.

The reason is that syntax-case provides additional capabilities. In particular, it provides a procedure called

datum→syntax-object, which takes an arbitrary Scheme datum and a term in the macro body, and “paints”

the datum with the same colors as those on the macro body. This has the effect of persuading the hygiene

mechanism to treat the introduced term as if it were written by the programmer. As a result, it gets renamed

consistently. Thus, we must write

(define-syntax (for4 x)

(syntax-case x (from to in)

[(for4 from low to high in bodies · · ·)

(with-syntax ([it (datum→syntax-object (syntax for4) ’it)])

(syntax

(local ([define high-value high ]

[define loop (lambda (it)

(if (> it high-value)

’done

(begin

bodies · · ·

(loop (+ it 1)))))])

(loop low ))))]))

The with-syntax construct introduces new pattern variables for use in the output. The first argument to

datum→syntax-object identifies which expression the identifier the expander must pretend “introduced” the

identifier. The second, in this case, is the symbol that will be painted appropriately. Therefore, the result of

expansion on our running example will look something like

(local ([define high-value 5]

[define loop (lambda (g1729)

(if (> g1729 high-value)

’done

(begin

(display g1729)

(loop (+ g1729 1)))))])

(loop 2))

Observe how the uses of it are all renamed consistently. (In practice, other bound identifiers such as high-

value and even loop will also acquire fresh names, but we don’t show that here to keep the code more

readable.) Indeed, this mechanism is sufficiently robust that it will even do the right thing with nested loops:

(for4 from 2 to 5 in

(for4 from 1 to it in

(display it))