(- (current-milliseconds) begin-time)))]))
When we test this, we find
> (my-time (expt 2 1000))
0
Hmm! But ever hopeful:
> (my-time (expt 2 10000))
10
> (my-time (expt 2 100000))
70
> (my-time (expt 2 1000000))
2053
which is what we expect.
How does this version of my-time work? The Scheme macro system trawls the program source and
gathers all the syntax definitions. It then substitutes all the uses of these syntax definitions with the bodies,
where each syntax definition is defined by pattern-matching (we’ll see several more examples). Only after
finishing all the substitution does it hand the program to the Scheme evaluator, which therefore doesn’t need
to know anything about the syntax definitions. That is, given the above syntax definition and the program
(my-time (expt 2 10000)), the program that the Scheme evaluator actually sees is
(let ([begin-time (current-milliseconds)])
(begin
(expt 2 10000)
(- (current-milliseconds) begin-time)))
This is the right-hand-side of the first (and only) clause in the list of rules, except e has been substituted with
the exponentiation expression. This is now an ordinary Scheme expression that the evaluator can reduce to a
322
CHAPTER 36. MACROS AS COMPILERS
value. Notice that the current time is now measured before and after the expression evaluates, thus ensuring
that we do in fact clock its evaluation.1
Exercise 36.1.1 Do macros merely implement laziness?
36.1.2
Example: Local Definitions
We saw earlier this semester that
{with {var val} body}
could be rewritten as
{{fun {var} body} val}
by a preprocessor, so our core evaluator did not need to implement with directly. The same is true of the
let construct in Scheme. Here’s a simple macro for let (again, we’ll use the my- convention to avoid any
clashes):
(define-syntax my-let-1
(syntax-rules ()
[(my-let-1 (var val) body)
((lambda (var) body) val)]))
Sure enough,
> (my-let-1 (x 3) (+ x 4))
7
> (my-let-1 (x 3) (my-let-1 (y 4) (+ x y)))
7
In full Scheme, however, the let construct is a bit more complex: it permits binding several identifiers at the
same time (as we saw in a homework assignment regarding with). Therefore, the true translation should
be regarded as something along these lines:
(let ([var val] . . .) body) =⇒ ((lambda (var . . .) body) val . . .)
That is, we want each of the variables to remain in the same order, and likewise each of the value expressions—
except we don’t know how many we will encounter, so we use . . . to indicate “zero or more”.
How are we going to define this macro? In fact, it couldn’t be easier. A researcher, Eugene Kohlbecker,
observed that numerous extensions to Scheme had this same “zero or more” form, and noticed that people
always wrote them informally using the stylized notation above. He therefore simply defined a macro system
that processed that notation:
(define-syntax my-let
(syntax-rules ()
[(my-let ([var val] · · ·) body)
((lambda (var · · ·) body) val · · ·)]))
1Technically, this isn’t exactly the expression that evaluates. We’ll return to this in a bit.
36.1. LANGUAGE REUSE
323
Therefore (my-let ([x 3] [y 4]) (+ x y)) translates into ((lambda (x y) body) 3 4) which, sure enough, reduces
to 7. Notice how the macro system is smart enough to treat the ([var val] · · ·) pattern as being the composite
of the var · · · and val · · · patterns.2 In particular, if no identifiers are bound, then this turns into an immediate application of a thunk to no arguments, which just evaluates the body.
36.1.3
Example: Nested Local Definitions
In a let, all the named expressions are bound in the same scope, which doesn’t include any of the bound
names. Sometimes, it’s useful to bind names sequentially so later bindings can refer to earlier ones. Scheme
provides the construct let∗ for this task:
(let∗ ([a 5]
[b 12]
[aˆ2 (∗ a a)]
[bˆ2 (∗ b b)]
[aˆ2+bˆ2 (+ aˆ2 bˆ2)])
(sqrt aˆ2+bˆ2))
(Think of what this would evaluate to with let instead of let∗.)
We can implement let∗ very easily by unraveling it into a sequence of lets:
(let∗ ([var val] · · ·) body) =⇒ (let ([var0 val0])
(let ([var1 val1])
· · ·
(let ([varn valn])
body)))
There is a stylized way of writing such macros in Scheme, which is to split them into two cases: when the
sequence is empty and when the sequence has one or more elements. When there are no identifiers being
bound, then let∗ does the same thing as let (which is to reduce to the expression itself):
(let∗ () body) =⇒ body
Since each · · · means “zero or more”, we need to use a more refined pattern to indicate “one or more”:
(let∗ ([var0 val0] [var-rest val-rest] · · ·) body)
The rewrite rule then becomes
(let∗ ([var0 val0] [var-rest val-rest] · · ·) body) =⇒ (let ([var0 val0])
(let∗ ([var-rest val-rest]
...)
body))
That is, we apply the macro for let∗ recursively. Written in Scheme syntax, this is expressed as (notice the
two cases):
2The use of brackets versus parentheses is purely stylistic.
324
CHAPTER 36. MACROS AS COMPILERS
(define-syntax my-let∗
(syntax-rules ()
[(my-let∗ () body)
body]
[(my-let∗ ([var0 val0]
[var-rest val-rest] · · ·)
body)
(let ([var0 val0])
(my-let∗ ([var-rest val-rest] · · ·)
body))]))
There is nothing in Scheme that prevents a runaway expansion. Therefore, it’s possible to write a
misbehaving macro that expands forever, so that evaluation never even begins. However, most macros
follow the simple stylistic pattern above, which guarantees termination (the recursion is over the bound
identifiers, and each time through, one more identifier-value pair is taken off).
36.1.4
Example: Simple Conditional
Let’s say we want a simplified form of conditional that has only two branches and one conditional. This is
effectively the same as if:
(cond2 [t e1] [else e2]) =⇒ (if t e1 e2)
We might try the following macro:
(define-syntax cond2
(syntax-rules ()
[(cond2 (t e1) (else e2))
(if t e1 e2)]))
This correctly evaluates expressions such as
(cond2 [(even? (current-seconds)) ’even]
[else
’odd])
Unfortunately, this also permits expressions such as
(cond2 [(even? (current-seconds)) ’even]
[(odd? (current-seconds)) ’odd])
This shouldn’t be syntactically legal, because cond2 permits only one conditional; in place of the second,
we require programmers to write else. We can see that this second expression doesn’t get evaluated at all by
writing something atrocious:
(cond2 [false ’even]
[(/ 1 0) ’odd])
which evaluates to ’odd.
36.1. LANGUAGE REUSE
325
What we want is for the cond2 macro to simply reject any uses that don’t have else in the second question
position. This is where the mystical () after syntax-rules comes in: it lists the keywords in the macro. That
is, we should instead define the macro as
(define-syntax cond2
(syntax-rules (else)
[(cond2 (t e1) (else e2))
(if t e1 e2)]))
Then, we get the following interaction:
> (cond2 [false ’even]
[(/ 1 0) ’odd])
cond2: bad syntax in: (cond2 (false (quote even)) ((/ 1 0) (quote odd)))}
Without the keyword designation, Scheme has no way of knowing that else has a special status; naturally,
it makes no sense to build that knowledge into the macro system. Absent such knowledge, it simply treats
else as a macro variable, and matches it against whatever term is in that position. When we put else in the
keyword list, however, the expander no longer binds it but rather expects to find it in the right position—or
else rejects the program.
36.1.5
Example: Disjunction
Let’s consider one more example from Scheme lore. In Scheme, conditionals like or and and short-circuit:
that is, when they reach a term whose value determines the result of the expression, they do not evaluate the
subsequent terms. Let’s try to implement or.
To begin with, let’s define the two-arm version of or:
(define (my-or2-fun e1 e2)
(if e1
e1
e2))
Sure enough, a very simple example appears to work
> (my-or2-fun false true)
#t
but it fails on a more complex example:
> (let ([x 0])
(my-or2-fun (zero? x)
(zero? (/ 1 x))))
/: division by zero
whereas a short-circuiting evaluator would not have permitted the error to occur. The problem is, once again,
Scheme’s eager evaluation regime, which performs the divison before it ever gets to the body of my-or2-fun.
In contrast, a macro does not have this problem:
326
CHAPTER 36. MACROS AS COMPILERS
(define-syntax my-or2
(syntax-rules ()
[(my-or2 e1 e2)
(if e1 e1 e2)]))
which yields
> (my-or2 false true)
#t
> (let ([x 0])
(my-or2 (zero? x)
(zero? (/ 1 x))))
#t
In particular, the second expression translates into
(let ([x 0])
(if (zero? x)
(zero? x)
(zero? (/ 1 x))))
(just replace e1 and e2 consistently).
As this expansion begins to demonstrate, however, this is an unsatisfying macro. We evaluate the first
expression twice, which has the potential to be inefficient but also downright wrong. (Suppose the first
expression were to output something; then we’d see the output twice. If the expression wrote a value into a
database and returned a code, executing it a second time may produce a different result than the first time.)
Therefore, we’d really like to hold on to the value of the first evaluation and return it directly if it’s not false.
That is, we want
(define-syntax my-or2
(syntax-rules ()
[(my-or2 e1 e2)
(let ([result e1])
(if result
result
e2))]))
This expands the second expression into
(let ([x 0])
(let ([result (zero? x)])
(if result
result
(zero? (/ 1 x)))))
Since Scheme is eager, the expression in the e1 position gets evaluated only once. You should construct test
cases that demonstrate this.
36.2. HYGIENE
327
36.2
Hygiene
Now what if the use of my-or2 really looked like this?
(let ([result true])
(my-or2 false
result))
which should evaluate to true. The expansion, however, is
(let ([result true])
(let ([result false])
(if result
result
result)))
which evaluates to false!
What happened here? When we look at just the input expression, we do not see only one binding of
result. Reasoning locally to that expression, we assume that my-or2 will evaluate the first expression and,
finding it false, will evaluate the second; since this is result, which is bound to true, the overall response
should also be true. Instead, however, the use of result within the macro definition interferes with result in
the context of its use, resulting in the incorrect result.
The problem we see here should seem awfully familiar: this is exactly the same problem we saw under
a different guise when trying to understand scope. Here, result in the second arm of the disjunction is bound
in the let just outside the disjunction. In contrast, result inside the macro is bound inside the macro. We as
programmers should not need to know about all the names used within macros—just as we don’t need to
know the names of identifiers used within functions! Therefore, macros should be forced to obey the scoping
rules of the language.
Just to be sure, let’s try this expression in our evaluator:
> (let ([result true])
(my-or2 false result))
#t
We get true! This is because Scheme’s macro system is hygienic. That is, it automatically renames identifiers
to avoid accidental name clashes. The expression that actually evaluates is something like
(let ([result true])
(let ([g1729 false])
(if g1729
g1729
result)))
where g1729 is a uniquely-generated identifier name. Notice that only the results within the macro definition
get renamed. In fact, because let is itself a macro, its identifiers also get renamed (as do those introduced by
lambda and other binding forms), so the real program sent to the evaluator might well be
(let ([g4104 true])
328
CHAPTER 36. MACROS AS COMPILERS
(let ([g1729 false])
(if g1729
g1729
g4104)))
Many macro systems, such as that of C, are not hygienic. Programmers sometimes try to circumvent this by
using hideous identifier names, such as __macro_result_. This is not a solution!
1. Not only is a it painful to have to program this way, small typos would greatly increase development
time, and the macro would be much harder to decipher when a programmer tries to modify or correct
it later.
2. This solution is only as good as the programmer’s imagination; the problem still persists, lying in wait
for just the right (wrong!) identifier to be bound in the context of use. Indeed, while a programmer
may choose a sufficiently obscure name from the perspective of other programmers, not all source is
written by humans. A tool generating C code (such as a Scheme-to-C compiler) may happen to use
exactly this naming convention.
3. This name is only obscure “upto one level”. If the macro definition is recursive, then recursive in-
stances of the macro may interfere with one another.
4. If you use this macro to debug the source that contains the macro (e.g., compiling the C compiler
using itself), then your carefully-chosen “obscure” name is now guaranteed to clash!
In short, to return to a theme of this course: we should view these kinds of contortions by programmers as
a symptom of a problem that must be addressed by better language design. Don’t settle for mediocrity! In
this case, hygiene is that solution.3
Notice, by the way, that we needed hygiene for the proper execution of our very first macro, because
my-time introduced the identifier begin-time. At the time, we never even gave a thought to this identifier,
which means in the absence of hygiene, we had a disaster waiting to happen. With hygiene, we can program
using normal names (like begin-time and result) and not have to worry about the consequencs down the line,
just as with static scope we can use reasonable names for local identifiers.
36.3
More Macrology by Example
Many languages provide a looping construct for iterating through integers sequentially. Scheme doesn’t for
three reasons:
1. Because most such loops are anyway inappropriate: the indices only exist to traverse sequential data
structures. Uses of map or filter over a list accomplish the same thing but at a higher level of abstrac-
tion.
2. Because recursion in the presence of tail calls has the same computational effect.
3The algorithm, in effect, “paints” each expression on expansion, then consistently renames identifiers that have the same paints.
36.3. MORE MACROLOGY BY EXAMPLE
329
3. Because, if we really crave a more traditional syntax, we can define it using a macro!
We’ll build up a loop macro in three stages.
36.3.1
Loops with Named Iteration Identifiers
Here’s our first attempt at a for loop macro.4 We’ve generously embellished it with keywords to increase
readability:
(define-syntax for0
(syntax-rules (from to in)
[(for0 var from low to high in bodies · · ·)
(local ([define loop (lambda ( var )
(if (> var
high )
’done
(begin
bodies · · ·
(loop (+ var 1)))))])
(loop low ))]))
This lets us write programs such as
(for0 x
from 2
to 5
in (display x))
which prints 2, 3, 4 and 5. However, when we try this on a program like this
(for0 x
from 2
to (read)
in (display x))
we notice an unpleasant phenomenon: the program reads the upper-bound of the loop every time through
the loop. To correct it, we should make sure it evaluates the upper-bound expression only once, which we
can do with a small change to the macro:
(define-syntax for1
(syntax-rules (from to in)
[(for1 var from low to high in bodies · · ·)
(local ([define high-value high ]
[define loop (lambda ( var )
(if (> var high-value)
’done
(begin
4We’re using the convention of wrapping macro pattern-variables in · to emphasize their relationship to BNF.
330
CHAPTER 36. MACROS AS COMPILERS
bodies · · ·
(loop (+ var 1)))))])
(loop low ))]))
In general, we must be very careful with macros to ensure expressions are evaluated the right number of
times. In this instance, low is going to be evaluated only once and var is only an identifier name, but we
have to make sure high is evaluated only once.
In fact, however, this version is also buggy! If there is a (read) in the low position, that’s going to get
evaluated second instead of first, which is presumably not what we wanted (though notice that we didn’t
formally specify the behavior of for, either). So to get it right, we really need to evaluate low and bind its
value to an identifier first.
In general, it’s safer to bind all expression positions to names. Scheme’s eager evaluation semantics
ensures the expressions will only be evaluted once. We don’t always want this, but we want it so often that
we may as well do it by default. (The times we accidentally bind an expression too early—for instance, the
conditional expression of a while loop—we will usually discover the problem pretty quickly by testing.) In
addition we must be sure to do this binding in the right order, mirroring what the user expects (and what
our documentation for the new language construct specifies). (Observe that the problematic expression in
this example is (read), which has the side-effect of prompting the user. Of course, we may want to limit
evaluation for efficiency reasons also.)
36.3.2
Overriding Hygiene: Loops with Implicit Iteration Identifiers
When we define a loop such as the one above, we often have no real use for the loop variable. It might be
convenient to simply introduce an identifier, say it, that is automatically bound to the current value of the
loop index. Thus, the first loop example above might instead be written as
(for2 from 2
to 5
in (display it))
Here’s a proposed macro that implements this construct:
(define-syntax for2
(syntax-rules (from to in)
[(for2 from low to high in bodies · · ·)
(local ([define high-value high ]
[define loop (lambda (it)
(if (> it high-value)
’done
(begin
bodies · · ·
(loop (+ it 1)))))])
(loop low ))]))
Notice that in place of var , we are now using it. When we run this in DrScheme, we get:
36.3. MORE MACROLOGY BY EXAMPLE
331
> (for2 from 2 to 5 in (display it))
reference to undefined identifier: it
Oops! What happened?
Actually, the macro system did exactly what it should. Remember hygiene? This was supposed to
prevent inadvertent capture of identifiers across the macro definition/macro use boundary. It just so happens
that in this case, we really do want it written in the macro use to be bound by it in the macro definition.
Clearly, here’s a good example of where we want to “break” hygiene, intentionally.
Unfortunately, the simple syntax-rules mechanism we’ve been using so far isn’t quite up to this task;
we must instead switch to a slightly more complex macro definition mechanism called syntax-case. For the
most part, this looks an awful lot like syntax-rules, with a little more notation. For instance, we can define
for3 to be the same macro as for1, except written using the new macro definition mechanism instead:
(define-syntax (for3 x)
(syntax-case x (from to in)
[(for3 var from low to high in bodies · · ·)
(syntax
(local ([define high-value high ]
[define loop (lambda ( var )
(if (> var high-value)
’done
(begin
bodies · · ·
(loop (+ var 1)))))])
(loop low )) ) ]))
To convert any syntax-rules macro definition into a corresponding one that uses syntax-case, we must
make the three changes boxed above (adding a parameter to the macro name, providing the parameter as an
explicit argument to syntax-case, and wrapping the entire output expression in (syntax · · ·).
We can similarly define for4:
(define-syntax (for4 x)
(syntax-case x (from to in)
[(for4 from low to high in bodies · · ·)
(syntax
(local ([define high-value high ]
[define loop (lambda (it)
(if (> it high-value)
’done
(begin
bodies · · ·
(loop (+ it 1)))))])
(loop low )))]))
332
CHAPTER 36. MACROS AS COMPILERS
This does not solve the hygiene problem; it simply enables it by converting the macro to use syntax-case.
The reason is that syntax-case provides additional capabilities. In particular, it provides a procedure called
datum→syntax-object, which takes an arbitrary Scheme datum and a term in the macro body, and “paints”
the datum with the same colors as those on the macro body. This has the effect of persuading the hygiene
mechanism to treat the introduced term as if it were written by the programmer. As a result, it gets renamed
consistently. Thus, we must write
(define-syntax (for4 x)
(syntax-case x (from to in)
[(for4 from low to high in bodies · · ·)
(with-syntax ([it (datum→syntax-object (syntax for4) ’it)])
(syntax
(local ([define high-value high ]
[define loop (lambda (it)
(if (> it high-value)
’done
(begin
bodies · · ·
(loop (+ it 1)))))])
(loop low ))))]))
The with-syntax construct introduces new pattern variables for use in the output. The first argument to
datum→syntax-object identifies which expression the identifier the expander must pretend “introduced” the
identifier. The second, in this case, is the symbol that will be painted appropriately. Therefore, the result of
expansion on our running example will look something like
(local ([define high-value 5]
[define loop (lambda (g1729)
(if (> g1729 high-value)
’done
(begin
(display g1729)
(loop (+ g1729 1)))))])
(loop 2))
Observe how the uses of it are all renamed consistently. (In practice, other bound identifiers such as high-
value and even loop will also acquire fresh names, but we don’t show that here to keep the code more
readable.) Indeed, this mechanism is sufficiently robust that it will even do the right thing with nested loops:
(for4 from 2 to 5 in
(for4 from 1 to it in
(display it))