Notes on Scheme macro systems that I’ve taken while working on Peroxide, my R5RS implementation in Rust.
Some of this overview is based on this excellent message by Alex Shinn.
A macro system is a language feature that let users specify ways in which some expressions should be rewritten before they are compiled or interpreted. Macro expansion is much harder to reason about than function application, so macros should be used with care, but judicious use of macros can reduce tedium or create very powerful DSLs.
Macro systems are available in many programming languages, although their use has varied with time. They tended to be left out in the 90s, with languages such as Java and C++, and still tend not to be present in many dynamically typed languages, as these generally offer more flexibility without having to resort to compile-time code manipulation. More recent statically typed languages, such as Scala and Rust, do include more or less advanced macro facilities.
This post examines macros in Scheme and especially the syntactic closure paradigm. It assumes basic familiarity with Lisp macros.
To follow this post, it’s important to have some clear definitions. You can also consult sections 1 and 3 of R5RS for refreshers on the Scheme binding model.
A binding is the association of an identifier to a keyword, a macro, or a memory location where a value is stored. An identifier is simply a symbol that is used in a binding. The scope of a binding is the region of code where the binding is in effect (this definition only makes sense because Scheme is lexically scoped; in dynamically scoped languages, a binding’s scope cannot be determined statically). A binding A can shadow another binding B if A is within the scope of B, and A binds an identifier that is already bound in B. For instance, in
(let ((x 1)) ; outer (let ((x 2)) ; inner x))
x binding shadows the outer
Finally, an environment is the set of bindings that are visible at some point in the code.
The need for hygiene
There are two separate issues with unhygienic macro systems. Let’s take the classic
macro as an example:
(defmacro swap! (a b) `(let ((value ,a)) (set! ,a ,b) (set! ,b value)))
When a macro introduces new symbols, they become visible to any code that’s transplanted from the outer scope, possibly shadowing bindings or at least giving rise to obscure errors. For instance, if
(swap! value other-value)
(let ((value value)) (set! value other-value) (set! other-value value))
which is a nonsensical expansion.
A macro’s expected bindings might no longer be current where the macro is expanded. For instance,
(let ((set! display)) (swap! x y))
set!to be the primitive, but it’s some different binding that the user has defined in the macro application context.
In some sense, the second case is harder to solve: the first issue can always be solved
by making sure that all new identifiers introduced in generated code (
value in this case)
are created by
gensym, a function that returns a fresh symbol on each call. This technique
does not help with the second case.
In either case, what we want is for identifiers to be tied to their environment of definition, not the environment in which they happen to be transplanted by macro expansion.
Note that this is not always true! There are examples of useful non-hygienic macros, even though hygiene is desired most of the time. Scheme macro systems such as syntactic closures let us control hygiene very finely if needed, so we can get the best of both worlds.
A classic example of a purposefully non-hygienic macro is
loop. This macro binds the
exit to a continuation which lets us escape the loop, so we can write things
(let* ((x (list 1 2 3 4 5 6))) (loop (if (= x 4) (exit) (set! x (cdr x)))) x) => (1 2 3)
It’s clear that
loop isn’t hygienic, because
exit is an identifier that is introduced
by the macro, but visible from the calling code. If we are not careful, we risk shadowing
exit binding and causing much confusion (this is issue #2 above): for instance,
maybe the user had defined
exit as an alias to
read-char outside the loop; it is not obvious from just
looking at the code that
exit has been shadowed by the
We’ve been using the
defmacro example so far, but let’s look at some other macro
syntax-rules facility is perhaps the most well-known hygienic Lisp
macro system, and is contrasted with Common Lisp’s unhygienic
defmacro. But this distinction
tends to muddy the waters a bit, because
syntax-rules also have an unrelated
defmacro is low-level, whereas
syntax-rules is high-level. In a nutshell, in a
low-level system, users construct the output expression “by hand”, using regular text or list
manipulation methods. In a high-level system, users leverage pattern matching features,
reducing expressiveness but making simple transformations much easier to define.
Being high- or low-level is unrelated to being hygienic. There are examples of macro systems with any combination of these:
||None in Lisp? m4 is an example.|
A low-level macro system: syntactic closures
How can we implement hygiene? There are several solutions, but Peroxide uses syntactic closures, which are due to Bawden & Rees. 1Bawden, Alan, and Jonathan Rees. Syntactic Closures. No. AI-M-1049. MASSACHUSETTS INST OF TECH CAMBRIDGE ARTIFICIAL INTELLIGENCE LAB, 1988. https://apps.dtic.mil/dtic/tr/fulltext/u2/a195921.pdf
In Peroxide and many other Scheme systems, a macro is a lambda that takes three arguments: the form to expand, the macro’s definition environment, and the macro’s expansion environment. The macro produces code as a result, which will be inserted at the macro call site.
For instance, you can declare a low-level macro in the following way:
(define-syntax unless ; Note the characteristic signature (lambda (form usage-env macro-env) (let ((condition (cadr form)) (consequent (caddr form))) `(if (not ,condition) ,consequent))))
When the compiler sees this declaration, it immediately compiles the lambda, and binds
it to the macro
unless. Later, code like
(unless (> 0 count) (fail))
will result in our lambda being called with parameters
(unless (> 0 count) (fail)) (it’s
not critically important, but note that the macro name itself is passed to the macro as
part of the form to expand), and two environment objects representing the current and
The lambda outputs
(if (not (> 0 count)) (fail)). Symbols are treated completely normally,
i.e. they are assumed to refer to bindings that exist in the environment at the use site, not
at the macro definition site. This means that our
unless macro is unhygienic and subject
to both issues outlined above.
The syntactic closure primitives
What do these mysterious environment objects (the ones being passed to the lambda) look like? And how do we use them to guarantee hygiene?
There’s not much you can do with an environment object, except shove it in a syntactic
closure. A syntactic closure is created using
(make-syntactic-closure env free-variables form)
This returns an object that is equivalent to
form, except any free identifiers in
will refer to their bindings in
env, not whatever the local environment is, except for
any identifiers specified in
free-variables, which do refer to bindings in the local
The easiest way to use a syntactic closure is on a symbol. Take our
unless macro as
an example. Its implementation above is vulnerable to shadowing
not. We can
rewrite the macro using syntactic closures:
(define-syntax unless (lambda (form usage-env macro-env) (let ((renamed-not (make-syntactic-closure macro-env '() 'not)) (renamed-if (make-syntactic-closure macro-env '() 'if))) `(,renamed-if (,renamed-not ,(cadr form)) ,(caddr form)))))
The calls to
make-syntactic-closure here produce symbols that point to the specified
macro-env), instead of the current environment.
make-syntactic-closure can also be used to make all symbols within a large form point
to a different environment. For instance:
(define x 'outer) (let ((x 'middle)) (let-syntax ((print-middle-x (lambda (form usage-env macro-env) (make-syntactic-closure macro-env '() '(display x))))) (let ((x 'inner) (display #f)) (print-middle-x))))
middle, even though
x are shadowed at the macro expansion point.
Overall, this technique lets us precisely control which identifiers should come from which environment, solving hygiene problem #1.
Shadowing a symbol-in-a-syntactic-closure
After introducing syntactic closures to a Scheme system, we end up with two kinds of identifiers: regular old symbols, and symbols in a (possibly nested) syntactic closure, which I’ll call SSCs.
All identifiers can be assigned to, and the meaning is straightforward: for an SSC, we simply edit the memory location pointed to by the binding, like we would for a regular symbol. The issue of shadowing is more complex. Ignoring syntactic sugar, the only fundamental way to shadow an identifier is by introducing a lambda that uses that identifier as a parameter.
So when an SSC is used as a lambda parameter, any references to it within the body of that lambda will instead refer to that parameter. Outside the body of the lambda, the SSC does not change meaning. This lets a macro effectively declare a binding as private by using a syntactically closed symbol in a lambda argument or a let definition.
Note that two different SSCs, even with the same environment and the same closed symbol, will refer to two different parameters if they are shadowed. For instance:
(define x 0) (define-syntax mymacro (lambda (f use-env mac-env) (let ((x1 (make-syntactic-closure mac-env '() 'x)) (x2 (make-syntactic-closure mac-env '() 'x))) `(list ,(identifier=? mac-env x1 mac-env x2) ,(let ((x1 2)) (identifier=? mac-env x1 mac-env x2)))))) (mymacro) => (#t #f)
In effect, if you need to solve problem 2 by creating a variable that’s invisible to
expanded code, you can use an SSC as the target of your
let. If you’re using the SSC
just for that, it also doesn’t matter which environment you create the syntactic closure
Derived macro operations
Many scheme systems based on syntactic closures do not actually let users pass lambdas
in macro definitions. Instead, macros have to be created using one of
er-macro-transformer. Here are their (slightly edited) definitions in Peroxide:
(define sc-macro-transformer (lambda (f) (lambda (expr use-env mac-env) (make-syntactic-closure mac-env '() (f expr use-env))))) (define rsc-macro-transformer (lambda (f) (lambda (expr use-env mac-env) (f expr mac-env)))) (define er-macro-transformer (lambda (f) (lambda (expr use-env mac-env) (let ((rename (let ((renames '())) (lambda (identifier) (let ((cell (assq identifier renames))) (if cell (cdr cell) (let ((name (make-syntactic-closure mac-env '() identifier))) (set! renames (cons (cons identifier name) renames)) name)))))) (compare (lambda (x y) (identifier=? use-env x use-env y)))) (f expr rename compare))))))
rsc-macro-transformeris basically the same as raw macros in Peroxide (but contrary to raw macros, also exists in other implementations, such as MIT scheme).
sc-macro-transformeris reversed: all identifiers in the macro’s expansion refer to the macro’s definition environment, unless they are explicitly rebound by using
er-macro-transformeris a somewhat higher-level version of
rsc-macro-transformer: instead of being provided with two environments, a macro defined with
er-macro-transformeris provided with two functions:
renameturns a symbol into an identifier in the macro’s definition environment, and
comparechecks wether two symbols refer to the same binding. Symbols that are not
renamedwill refer to bindings in the expansion environment.
er-macro-transformeris very easy to use to write low-level macros, so it’s often preferred to
(r)sc-macro-transformer. The only thing to remember is that any identifier that we wish to hygienically insert into the output (which is probably most or all of them) must first be passed through
rename. For instance, here is a hygienic
(define-syntax unless (er-macro-transformer (lambda (expr rename compare) `(,(rename 'if) (,(rename 'not) ,(cadr expr)) ,(caddr expr))))) (unless (> 0 1) 42) ~> (if (not (> 0 1)) 42) => 42
From there, we can define
syntax-case as a (relatively complex) macro transformer. Here
is Chibi-Scheme’s definition.
Limits of the syntactic closure model and
As noted by Marc Nieper-Wißkirchen, the syntactic closures system has limitations related to nested unhygienic macros.
The essential problem is that bare symbols are assumed to refer to the definition environment, but if the definition environment already uses symbol renaming, there is no way to refer to these symbols.
Take the following unhygienic macro:
(define-syntax unhygienic (er-macro-transformer (lambda (exp rename compare) 'my-value)))
(let ((my-value 42)) (unhygienic)) => 42
While this macro is unhygienic, it is easy to make its use hygienic by wrapping it in an environment
my-value is bound. A program using the code snippet above would not be able to see that
unhygienic is unhygienic outside of the outer
let: the only unhygienically leaking identifier,
my-value, has been captured.
This would lead us to think that we can use such a macro to construct a different hygienic macro;
we just need to make sure that the hygienic macro inserts the call to
unhygienic inside an environment where
my-value is defined.
But wait! We said above that any identifiers that we want to insert hygienically from an
er-macro-transformer macro need to be
renamed. This would give us:
(define-syntax hygienic-using-unhygienic (er-macro-transformer (lambda (exp rename compare) `(let ((,(rename my-value) 42)) (unhygienic)))))
This is where the syntactic closure system partially breaks down: because we are passing
rename, the expanded body of the
unhygienic macro does not see
my-value. On the other
hand, if we don not call
hygienic-using-unhygienic would work as
expected… except it would shadow
my-value from the expansion environment, making it unhygienic.
More sophisticated macro systems, like
syntax-case, defined in R6RS and implemented in e.g. Racket,
can get around this limitation by introducing a new operator,
(datum->syntax id expr). This operator takes
two arguments, an identifier
id and an arbitrary expression
expr, and returns a syntax object equivalent
expr, but where all symbols have been turned into identifiers in the same scope as the identifier
Introducing such an operator requires modifying
er-macro-transformer and the way it defines
see this Chibi PR for a possible implementation.
We can now rewrite
(define-syntax unhygienic (er-macro-transformer (lambda (exp rename compare) (datum->syntax (car exp) 'my-value))))
This magic ensures that
unhygienic is always locating
my-value in the environment from which it was called,
even if it was renamed.
Despite some shortcomings for advanced unhygienic macros, syntactic closure-based macro systems
are easy to implement and let us define a standards-compliant
syntax-case implementation, as
well as a range of unhygienic macros.
Bawden, Alan, and Jonathan Rees. Syntactic Closures. No. AI-M-1049. MASSACHUSETTS INST OF TECH CAMBRIDGE ARTIFICIAL INTELLIGENCE LAB, 1988. https://apps.dtic.mil/dtic/tr/fulltext/u2/a195921.pdf ↩