Notes on Scheme macro systems that I’ve taken while working on Peroxide, my R5RS implementation in Rust.
Some of this overview is based on this excellent message by Alex Shinn.
A macro system is a language feature that let users specify ways in which some expressions should be rewritten before they are compiled or interpreted. Macro expansion is much harder to reason about than function application, so macros should be used with care, but judicious use of macros can reduce tedium or create very powerful DSLs.
Macro systems are available in many programming languages, although their use has varied with time. They tended to be left out in the 90s, with languages such as Java and C++, and still tend not to be present in many dynamically typed languages, as these generally offer more flexibility without having to resort to compile-time code manipulation. More recent statically typed languages, such as Scala and Rust, do include more or less advanced macro facilities.
This post examines macros in Scheme and especially the syntactic closure paradigm. It assumes basic familiarity with Lisp macros.
To follow this post, it’s important to have some clear definitions. You can also consult sections 1 and 3 of R5RS for refreshers on the Scheme binding model.
A binding is the association of an identifier to a keyword, a macro, or a memory location where a value is stored. An identifier is simply a symbol that is used in a binding. The scope of a binding is the region of code where the binding is in effect (this definition only makes sense because Scheme is lexically scoped; in dynamically scoped languages, a binding’s scope cannot be determined statically). A binding A can shadow another binding B if A is within the scope of B, and A binds an identifier that is already bound in B. For instance, in
(let ((x 1)) ; outer
(let ((x 2)) ; inner
x))
the inner let
x
binding shadows the outer let
’s.
Finally, an environment is the set of bindings that are visible at some point in the code.
The need for hygiene
There are two separate issues with unhygienic macro systems. Let’s take the classic swap
macro as an example:
(defmacro swap! (a b)
`(let ((value ,a))
(set! ,a ,b)
(set! ,b value)))
-
When a macro introduces new symbols, they become visible to any code that’s transplanted from the outer scope, possibly shadowing bindings or at least giving rise to obscure errors. For instance, if
a
isvalue
:(swap! value other-value)
expands into
(let ((value value)) (set! value other-value) (set! other-value value))
which is a nonsensical expansion.
-
A macro’s expected bindings might no longer be current where the macro is expanded. For instance,
(let ((set! display)) (swap! x y))
Here,
swap!
expectedset!
to be the primitive, but it’s some different binding that the user has defined in the macro application context.
In some sense, the second case is harder to solve: the first issue can always be solved
by making sure that all new identifiers introduced in generated code (value
in this case)
are created by gensym
, a function that returns a fresh symbol on each call. This technique
does not help with the second case.
In either case, what we want is for identifiers to be tied to their environment of definition, not the environment in which they happen to be transplanted by macro expansion.
Note that this is not always true! There are examples of useful non-hygienic macros, even though hygiene is desired most of the time. Scheme macro systems such as syntactic closures let us control hygiene very finely if needed, so we can get the best of both worlds.
A classic example of a purposefully non-hygienic macro is loop
. This macro binds the
identifier exit
to a continuation which lets us escape the loop, so we can write things
like:
(let* ((x (list 1 2 3 4 5 6)))
(loop
(if (= x 4)
(exit)
(set! x (cdr x))))
x)
=> (1 2 3)
It’s clear that loop
isn’t hygienic, because exit
is an identifier that is introduced
by the macro, but visible from the calling code. If we are not careful, we risk shadowing
an existing exit
binding and causing much confusion (this is issue #2 above): for instance,
maybe the user had defined exit
as an alias to read-char
outside the loop; it is not obvious from just
looking at the code that exit
has been shadowed by the loop
macro.
We’ve been using the defmacro
example so far, but let’s look at some other macro
systems. Scheme’s syntax-rules
facility is perhaps the most well-known hygienic Lisp
macro system, and is contrasted with Common Lisp’s unhygienic defmacro
. But this distinction
tends to muddy the waters a bit, because defmacro
and syntax-rules
also have an unrelated
distinction: defmacro
is low-level, whereas syntax-rules
is high-level. In a nutshell, in a
low-level system, users construct the output expression “by hand”, using regular text or list
manipulation methods. In a high-level system, users leverage pattern matching features,
reducing expressiveness but making simple transformations much easier to define.
Being high- or low-level is unrelated to being hygienic. There are examples of macro systems with any combination of these:
Low-level | High-level | |
---|---|---|
Hygienic | Syntactic closures | Scheme’s syntax-rules |
Unhygienic | Common Lisp’s defmacro |
None in Lisp? m4 is an example. |
A low-level macro system: syntactic closures
How can we implement hygiene? There are several solutions, but Peroxide uses syntactic closures, which are due to Bawden & Rees. 1Bawden, Alan, and Jonathan Rees. Syntactic Closures. No. AI-M-1049. MASSACHUSETTS INST OF TECH CAMBRIDGE ARTIFICIAL INTELLIGENCE LAB, 1988. https://apps.dtic.mil/dtic/tr/fulltext/u2/a195921.pdf
In Peroxide and many other Scheme systems, a macro is a lambda that takes three arguments: the form to expand, the macro’s definition environment, and the macro’s expansion environment. The macro produces code as a result, which will be inserted at the macro call site.
For instance, you can declare a low-level macro in the following way:
(define-syntax unless
; Note the characteristic signature
(lambda (form usage-env macro-env)
(let ((condition (cadr form))
(consequent (caddr form)))
`(if (not ,condition) ,consequent))))
When the compiler sees this declaration, it immediately compiles the lambda, and binds
it to the macro unless
. Later, code like
(unless (> 0 count) (fail))
will result in our lambda being called with parameters (unless (> 0 count) (fail))
(it’s
not critically important, but note that the macro name itself is passed to the macro as
part of the form to expand), and two environment objects representing the current and
definition environments.
The lambda outputs (if (not (> 0 count)) (fail))
. Symbols are treated completely normally,
i.e. they are assumed to refer to bindings that exist in the environment at the use site, not
at the macro definition site. This means that our unless
macro is unhygienic and subject
to both issues outlined above.
The syntactic closure primitives
What do these mysterious environment objects (the ones being passed to the lambda) look like? And how do we use them to guarantee hygiene?
There’s not much you can do with an environment object, except shove it in a syntactic
closure. A syntactic closure is created using make-syntactic-closure
:
(make-syntactic-closure env free-variables form)
This returns an object that is equivalent to form
, except any free identifiers in form
will refer to their bindings in env
, not whatever the local environment is, except for
any identifiers specified in free-variables
, which do refer to bindings in the local
environment.
The easiest way to use a syntactic closure is on a symbol. Take our unless
macro as
an example. Its implementation above is vulnerable to shadowing if
and not
. We can
rewrite the macro using syntactic closures:
(define-syntax unless
(lambda (form usage-env macro-env)
(let ((renamed-not (make-syntactic-closure macro-env '() 'not))
(renamed-if (make-syntactic-closure macro-env '() 'if)))
`(,renamed-if (,renamed-not ,(cadr form)) ,(caddr form)))))
The calls to make-syntactic-closure
here produce symbols that point to the specified
environment (macro-env
), instead of the current environment.
make-syntactic-closure
can also be used to make all symbols within a large form point
to a different environment. For instance:
(define x 'outer)
(let ((x 'middle))
(let-syntax ((print-middle-x
(lambda (form usage-env macro-env)
(make-syntactic-closure macro-env '() '(display x)))))
(let ((x 'inner) (display #f))
(print-middle-x))))
prints middle
, even though display
and x
are shadowed at the macro expansion point.
Overall, this technique lets us precisely control which identifiers should come from which environment, solving hygiene problem #1.
Shadowing a symbol-in-a-syntactic-closure
After introducing syntactic closures to a Scheme system, we end up with two kinds of identifiers: regular old symbols, and symbols in a (possibly nested) syntactic closure, which I’ll call SSCs.
All identifiers can be assigned to, and the meaning is straightforward: for an SSC, we simply edit the memory location pointed to by the binding, like we would for a regular symbol. The issue of shadowing is more complex. Ignoring syntactic sugar, the only fundamental way to shadow an identifier is by introducing a lambda that uses that identifier as a parameter.
So when an SSC is used as a lambda parameter, any references to it within the body of that lambda will instead refer to that parameter. Outside the body of the lambda, the SSC does not change meaning. This lets a macro effectively declare a binding as private by using a syntactically closed symbol in a lambda argument or a let definition.
Note that two different SSCs, even with the same environment and the same closed symbol, will refer to two different parameters if they are shadowed. For instance:
(define x 0)
(define-syntax mymacro
(lambda (f use-env mac-env)
(let ((x1 (make-syntactic-closure mac-env '() 'x))
(x2 (make-syntactic-closure mac-env '() 'x)))
`(list
,(identifier=? mac-env x1 mac-env x2)
,(let ((x1 2))
(identifier=? mac-env x1 mac-env x2))))))
(mymacro)
=> (#t #f)
In effect, if you need to solve problem 2 by creating a variable that’s invisible to
expanded code, you can use an SSC as the target of your let
. If you’re using the SSC
just for that, it also doesn’t matter which environment you create the syntactic closure
in.
Derived macro operations
Many scheme systems based on syntactic closures do not actually let users pass lambdas
in macro definitions. Instead, macros have to be created using one of sc-macro-transformer
,
rsc-macro-transformer
, or er-macro-transformer
. Here are their (slightly edited) definitions in Peroxide:
(define sc-macro-transformer
(lambda (f)
(lambda (expr use-env mac-env)
(make-syntactic-closure mac-env '() (f expr use-env)))))
(define rsc-macro-transformer
(lambda (f)
(lambda (expr use-env mac-env)
(f expr mac-env))))
(define er-macro-transformer
(lambda (f)
(lambda (expr use-env mac-env)
(let ((rename
(let ((renames '()))
(lambda (identifier)
(let ((cell (assq identifier renames)))
(if cell
(cdr cell)
(let ((name (make-syntactic-closure mac-env '() identifier)))
(set! renames (cons (cons identifier name) renames))
name))))))
(compare
(lambda (x y) (identifier=? use-env x use-env y))))
(f expr rename compare))))))
rsc-macro-transformer
is basically the same as raw macros in Peroxide (but contrary to raw macros, also exists in other implementations, such as MIT scheme).sc-macro-transformer
is reversed: all identifiers in the macro’s expansion refer to the macro’s definition environment, unless they are explicitly rebound by usingmake-syntactic-closure
onuse-env
.-
er-macro-transformer
is a somewhat higher-level version ofrsc-macro-transformer
: instead of being provided with two environments, a macro defined wither-macro-transformer
is provided with two functions:rename
turns a symbol into an identifier in the macro’s definition environment, andcompare
checks wether two symbols refer to the same binding. Symbols that are notrenamed
will refer to bindings in the expansion environment.er-macro-transformer
is very easy to use to write low-level macros, so it’s often preferred to(r)sc-macro-transformer
. The only thing to remember is that any identifier that we wish to hygienically insert into the output (which is probably most or all of them) must first be passed throughrename
. For instance, here is a hygienicunless
:
(define-syntax unless
(er-macro-transformer (lambda (expr rename compare)
`(,(rename 'if) (,(rename 'not) ,(cadr expr)) ,(caddr expr)))))
(unless (> 0 1) 42)
~> (if (not (> 0 1)) 42)
=> 42
From there, we can define syntax-case
as a (relatively complex) macro transformer. Here
is Chibi-Scheme’s definition.
Limits of the syntactic closure model and syntax-case
As noted by Marc Nieper-Wißkirchen, the syntactic closures system has limitations related to nested unhygienic macros.
The essential problem is that bare symbols are assumed to refer to the definition environment, but if the definition environment already uses symbol renaming, there is no way to refer to these symbols.
Take the following unhygienic macro:
(define-syntax unhygienic
(er-macro-transformer (lambda (exp rename compare)
'my-value)))
We have
(let ((my-value 42))
(unhygienic))
=> 42
While this macro is unhygienic, it is easy to make its use hygienic by wrapping it in an environment
where my-value
is bound. A program using the code snippet above would not be able to see that
unhygienic
is unhygienic outside of the outer let
: the only unhygienically leaking identifier,
my-value
, has been captured.
This would lead us to think that we can use such a macro to construct a different hygienic macro;
we just need to make sure that the hygienic macro inserts the call to unhygienic
inside an environment where
my-value
is defined.
But wait! We said above that any identifiers that we want to insert hygienically from an
er-macro-transformer
macro need to be renamed
. This would give us:
(define-syntax hygienic-using-unhygienic
(er-macro-transformer (lambda (exp rename compare)
`(let ((,(rename my-value) 42))
(unhygienic)))))
This is where the syntactic closure system partially breaks down: because we are passing my-value
through rename
, the expanded body of the unhygienic
macro does not see my-value
. On the other
hand, if we don not call rename
on my-value
, hygienic-using-unhygienic
would work as
expected… except it would shadow my-value
from the expansion environment, making it unhygienic.
More sophisticated macro systems, like syntax-case
, defined in R6RS and implemented in e.g. Racket,
can get around this limitation by introducing a new operator, (datum->syntax id expr)
. This operator takes
two arguments, an identifier id
and an arbitrary expression expr
, and returns a syntax object equivalent
to expr
, but where all symbols have been turned into identifiers in the same scope as the identifier id
.
Introducing such an operator requires modifying er-macro-transformer
and the way it defines rename
;
see this Chibi PR for a possible implementation.
We can now rewrite unhygienic
as
(define-syntax unhygienic
(er-macro-transformer (lambda (exp rename compare)
(datum->syntax (car exp) 'my-value))))
This magic ensures that unhygienic
is always locating my-value
in the environment from which it was called,
even if it was renamed.
Conclusion
Despite some shortcomings for advanced unhygienic macros, syntactic closure-based macro systems
are easy to implement and let us define a standards-compliant syntax-case
implementation, as
well as a range of unhygienic macros.
-
Bawden, Alan, and Jonathan Rees. Syntactic Closures. No. AI-M-1049. MASSACHUSETTS INST OF TECH CAMBRIDGE ARTIFICIAL INTELLIGENCE LAB, 1988. https://apps.dtic.mil/dtic/tr/fulltext/u2/a195921.pdf ↩
Comments