Macro systems in Scheme

13 May 2020

Notes on Scheme macro systems that I’ve taken while working on Peroxide, my R5RS implementation in Rust.

Some of this overview is based on this excellent message by Alex Shinn.


A macro system is a language feature that let users specify ways in which some expressions should be rewritten before they are compiled or interpreted. Macro expansion is much harder to reason about than function application, so macros should be used with care, but judicious use of macros can reduce tedium or create very powerful DSLs.

Macro systems are available in many programming languages, although their use has varied with time. They tended to be left out in the 90s, with languages such as Java and C++, and still tend not to be present in many dynamically typed languages, as they generally offer more flexibility without having to resort to compile-time code manipulation. More recent statically typed languages, such as Scala and Rust, do include more or less advanced macro facilities.

This post examines macros in Scheme and especially the syntactic closure paradigm. It assumes basic familiarity with Lisp macros.

To follow this post, it’s important to have some clear definitions. You can also consult sections 1 and 3 of R5RS for refreshers on the Scheme binding model.

A binding is the association of an identifier to a keyword, a macro, or a memory location where a value is stored. An identifier is simply a symbol that is used in a binding. The scope of a binding is the region of code where the binding is in effect (this definition only makes sense because Scheme is lexically scoped; in dynamically scoped languages, a binding’s scope cannot be determined statically). A binding A can shadow another binding B if A is within the scope of B, and A binds an identifier that is already bound in B. For instance, in

(let ((x 1))    ; outer
  (let ((x 2))  ; inner
    x))

the inner let x binding shadows the outer let’s.

Finally, an environment is a set of nested bindings.

The need for hygiene

There are two separate issues with unhygienic macro systems. Let’s take the classic swap macro as an example:

(defmacro swap! (a b)
  `(let ((value ,a)) 
     (set! ,a ,b) 
     (set! ,b value))) 
  1. When a macro introduces new symbols, they become visible to any code that’s transplanted from the outer scope, possibly shadowing bindings or at least giving rise to obscure errors. For instance, if a is value:

    (swap! value other-value)
    ~>
    (let ((value value))
      (set! value other-value)
      (set! other-value value))
    

    which is a nonsensical expansion.

  2. A macro’s expected bindings might no longer be current where the macro is expanded. For instance,

    (let ((set! display))
       (swap! x y))
    

    Here, swap! expected set! to be the primitive, but it’s some different binding that the user has defined in the macro application context.

In some sense, the second case is harder to solve: the first issue can always be solved by making sure that all new identifiers introduced in generated code (value in this case) are created by gensym, a function that returns a fresh symbol on each call. This technique does not help with the second case.

In either case, what we want is for identifiers to be tied to their environment of definition, not the environment in which they happen to be transplanted by macro expansion.

Note that this is not always true! There are examples of useful non-hygienic macros, even though hygiene is desired most of the time. Scheme macro systems such as syntactic closures let us control hygiene very finely if needed, so we can get the best of both worlds.

A classic example of a purposefully non-hygienic macro is loop. This macro binds the identifier exit to a continuation which lets us escape the loop, so we can write things like:

(let* ((x (list 1 2 3 4 5 6)))
  (loop
    (if (= x 4)
        (exit)
        (set! x (cdr x))))
  x)
  
=> (1 2 3)

It’s clear that loop isn’t hygienic, because exit is an identifier that is introduced by the macro, but visible from the calling code. If we are not careful, we risk shadowing an existing exit binding and causing much confusion (this is issue #2 above): for instance, maybe the user had defined exit as an alias to read-char outside the loop; it is not obvious from just looking at the code that exit has been shadowed by the loop macro.

We’ve been using the defmacro example so far, but let’s look at some other macro systems. Scheme’s syntax-rules facility is perhaps the most well-known hygienic Lisp macro system, and is contrasted with Common Lisp’s unhygienic defmacro. But this distinction tends to muddy the waters a bit, because defmacro and syntax-rules also have an unrelated distinction: defmacro is low-level, whereas syntax-rules is high-level. In a nutshell, in a low-level system, users construct the output expression “by hand”, using regular text or list manipulation methods. In a high-level system, users leverage pattern matching features, reducing expressiveness but making simple transformations much easier to define.

Being high- or low-level is unrelated to being hygienic. There are examples of macro systems with any combination of these:

  Low-level High-level
Hygienic Syntactic closures Scheme’s syntax-rules
Unhygienic Common Lisp’s defmacro None in Lisp? m4 is an example.

A low-level macro system: syntactic closures

How can we implement hygiene? There are several solutions, but Peroxide uses syntactic closures, which are due to Bawden & Rees. 1

In Peroxide and many other Scheme systems, a macro is a lambda that takes three arguments: the form to expand, the macro’s definition environment, and the macro’s expansion environment. The macro produces code as a result, which will be inserted at the macro call site.

For instance, you can declare a low-level macro in the following way:

(define-syntax unless
  ; Note the characteristic signature
  (lambda (form usage-env macro-env)
    (let ((condition (cadr form))
          (consequent (caddr form)))
      `(if (not ,condition) ,consequent))))  

When the compiler sees this declaration, it immediately compiles the lambda, and binds it to the macro unless. Later, code like

(unless (> 0 count) (fail))

will result in our lambda being called with parameters (unless (> 0 count) (fail)) (it’s not critically important, but note that the macro name itself is passed to the macro as part of the form to expand), and two environment objects representing the current and definition environments.

The lambda outputs (if (not (> 0 count)) (fail)). Symbols are treated completely normally, i.e. they are assumed to refer to bindings that exist in the environment at the use site, not at the macro definition site. This means that our unless macro is unhygienic and subject to both issues outlined above.

The syntactic closure primitives

What do these mysterious environment objects (the ones being passed to the lambda) look like? And how do we use them to guarantee hygiene?

There’s not much you can do with an environment object, except shove it in a syntactic closure. A syntactic closure is created using make-syntactic-closure:

(make-syntactic-closure env free-variables form)

This returns an object that is equivalent to form, except any free identifiers in form will refer to their bindings in env, not whatever the local environment is, except for any identifiers specified in free-variables, which do refer to bindings in the local environment.

The easiest way to use a syntactic closure is on a symbol. Take our unless macro as an example. Its implementation above is vulnerable to shadowing if and not. We can rewrite the macro using syntactic closures:

(define-syntax unless
  (lambda (form usage-env macro-env)
    (let ((renamed-not (make-syntactic-closure macro-env '() 'not))
          (renamed-if (make-syntactic-closure macro-env '() 'if)))
      `(,renamed-if (,renamed-not ,(cadr form)) ,(caddr form)))))

The calls to make-syntactic-closure here produce symbols that point to the specified environment (macro-env), instead of the current environment.

make-syntactic-closure can also be used to make all symbols within a large form point to a different environment. For instance:

(define x 'outer)
(let ((x 'middle))
  (let-syntax ((print-middle-x
                 (lambda (form usage-env macro-env)
                   (make-syntactic-closure macro-env '() '(display x)))))
    (let ((x 'inner) (display #f))
      (print-middle-x))))

prints middle, even though display and x are shadowed at the macro expansion point.

Overall, this technique lets us precisely control which identifiers should come from which environment, solving hygiene problem #1.

Shadowing a symbol-in-a-syntactic-closure

After introducing syntactic closures to a Scheme system, we end up with two kinds of identifiers: regular old symbols, and symbols in a (possibly nested) syntactic closure, which I’ll call SSCs.

All identifiers can be assigned to, and the meaning is straightforward: for an SSC, we simply edit the memory location pointed to by the binding, like we would for a regular symbol. The issue of shadowing is more complex. Ignoring syntactic sugar, the only fundamental way to shadow an identifier is by introducing a lambda that uses that identifier as a parameter.

So when an SSC is used as a lambda parameter, any references to it within the body of that lambda will instead refer to that parameter. Outside the body of the lambda, the SSC does not change meaning. This lets a macro effectively declare a binding as private by using a syntactically closed symbol in a lambda argument or a let definition.

Note that two different SSCs, even with the same environment and the same closed symbol, will refer to two different parameters if they are shadowed. For instance:

(define x 0)
(define-syntax mymacro
  (lambda (f use-env mac-env)
    (let ((x1 (make-syntactic-closure mac-env '() 'x))
          (x2 (make-syntactic-closure mac-env '() 'x)))
      `(list
         ,(identifier=? mac-env x1 mac-env x2)
         ,(let ((x1 2))
            (identifier=? mac-env x1 mac-env x2))))))

(mymacro)
=> (#t #f)

In effect, if you need to solve problem 2 by creating a variable that’s invisible to expanded code, you can use an SSC as the target of your let. If you’re using the SSC just for that, it also doesn’t matter which environment you create the syntactic closure in.

Derived macro operations

Many scheme systems based on syntactic closures do not actually let users pass lambdas in macro definitions. Instead, macros have to be created using one of sc-macro-transformer, rsc-macro-transformer, or er-macro-transformer. Here are their (slightly edited) definitions in Peroxide:


(define sc-macro-transformer
  (lambda (f)
    (lambda (expr use-env mac-env)
      (make-syntactic-closure mac-env '() (f expr use-env)))))

(define rsc-macro-transformer
  (lambda (f)
    (lambda (expr use-env mac-env)
      (f expr mac-env))))


(define er-macro-transformer
  (lambda (f)
    (lambda (expr use-env mac-env)
      (let ((rename 
              (let ((renames '()))
                (lambda (identifier)
                  (let ((cell (assq identifier renames)))
                    (if cell
                        (cdr cell)
                        (let ((name (make-syntactic-closure mac-env '() identifier)))
                          (set! renames (cons (cons identifier name) renames))
                          name))))))
            (compare 
              (lambda (x y) (identifier=? use-env x use-env y))))
        (f expr rename compare))))))
  • rsc-macro-transformer is basically the same as raw macros in Peroxide (but contrary to raw macros, also exists in other implementations, such as MIT scheme).
  • sc-macro-transformer is reversed: all identifiers in the macro’s expansion refer to the macro’s definition environment, unless they are explicitly rebound by using make-syntactic-closure on use-env.
  • er-macro-transformer is a somewhat higher-level version of rsc-macro-transformer: instead of being provided with two environments, a macro defined with er-macro-transformer is provided with two functions: rename turns a symbol into an identifier in the macro’s definition environment, and compare checks wether two symbols refer to the same binding. Symbols that are not renamed will refer to bindings in the expansion environment.

    er-macro-transformer is very easy to use to write low-level macros, so it’s often preferred to (r)sc-macro-transformer. The only thing to remember is that any identifier that we wish to hygienically insert into the output (which is probably most or all of them) must first be passed through rename. For instance, here is a hygienic unless:

(define-syntax unless
  (er-macro-transformer (lambda (expr rename compare)
    `(,(rename 'if) (,(rename 'not) ,(cadr expr)) ,(caddr expr)))))

(unless (> 0 1) 42)
~> (if (not (> 0 1)) 42)
=> 42

From there, we can define syntax-case as a (relatively complex) macro transformer. Here is Chibi-Scheme’s definition.

Limits of the syntactic closure model and syntax-case

As noted by Marc Nieper-Wißkirchen, the syntactic closures system has limitations related to nested unhygienic macros.

The essential problem is that bare symbols are assumed to refer to the definition environment, but if the definition environment already uses symbol renaming, there is no way to refer to these symbols.

Take the following unhygienic macro:

(define-syntax unhygienic
  (er-macro-transformer (lambda (exp rename compare)
    'my-value)))

We have

(let ((my-value 42))
  (unhygienic))
=> 42

While this macro is unhygienic, it is easy to make its use hygienic by wrapping it in an environment where my-value is bound. A program using the code snippet above would not be able to see that unhygienic is unhygienic outside of the outer let: the only unhygienically leaking identifier, my-value, has been captured.

This would lead us to think that we can use such a macro to construct a different hygienic macro; we just need to make sure that the hygienic macro inserts the call to unhygienic inside an environment where my-value is defined.

But wait! We said above that any identifiers that we want to insert hygienically from an er-macro-transformer macro need to be renamed. This would give us:

(define-syntax hygienic-using-unhygienic
  (er-macro-transformer (lambda (exp rename compare)
    `(let ((,(rename my-value) 42))
        (unhygienic)))))

This is where the syntactic closure system partially breaks down: because we are passing my-value through rename, the expanded body of the unhygienic macro does not see my-value. On the other hand, if we don not call rename on my-value, hygienic-using-unhygienic would work as expected… except it would shadow my-value from the expansion environment, making it unhygienic.

More sophisticated macro systems, like syntax-case, defined in R6RS and implemented in e.g. Racket, can get around this limitation by introducing a new operator, (datum->syntax id expr). This operator takes two arguments, an identifier id and an arbitrary expression expr, and returns a syntax object equivalent to expr, but where all symbols have been turned into identifiers in the same scope as the identifier id.

Introducing such an operator requires modifying er-macro-transformer and the way it defines rename; see this Chibi PR for a possible implementation.

We can now rewrite unhygienic as

(define-syntax unhygienic
  (er-macro-transformer (lambda (exp rename compare)
    (datum->syntax (car exp) 'my-value))))

This magic ensures that unhygienic is always locating my-value in the environment from which it was called, even if it was renamed.

Conclusion

Despite some shortcomings for advanced unhygienic macros, syntactic closure-based macro systems are easy to implement and let us define a standards-compliant syntax-case implementation, as well as a range of unhygienic macros.

  1. Bawden, Alan, and Jonathan Rees. Syntactic Closures. No. AI-M-1049. MASSACHUSETTS INST OF TECH CAMBRIDGE ARTIFICIAL INTELLIGENCE LAB, 1988. https://apps.dtic.mil/dtic/tr/fulltext/u2/a195921.pdf 

Comments