# Implementing Unlambda

7 September 2019

I recently implemented Relambda, a bytecode compiling Unlambda interpreter. In this post, I review Unlambda, Bytecode compilation, and the tricks I used to make them work together.

Unlambda is an esoteric programming language in which all values are functions. I’ll review the basics of the language here, but the Unlambda homepage by David Madore (creator of the language) has a lot more details on how to actually write Unlambda programs.

An unresolved question asked by the Unlambda homepage is whether Unlambda can be meaningfully compiled, and what kind of runtime it would need. I try to adress this in the section about the d special form, which is the most problematic part.

## Unlambda overview

Unlambda is essentially an implementation of SKI calculus. You give it a SKI expression, and it reduces it until it either gets a single combinator, or loops for ever (as is the curse of Turing completeness).

Unlambda uses a nonstandard syntax for applications: instead of (xy) or xy, it uses xy (which is a bit of a pain to write in Markdown).

Besides the usual K, S and I, denoted k, s and i, we also have a few other operators:

• v returns itself when applied. It is none other than $UU$, where $U = S(S(KS)K)K$, as we saw in the SKI post,
• .x, where x is any letter, behaves like i but has the side-effect of printing the letter x when applied,
• r is like .x, but prints a line feed instead,
• c is Scheme’s call/cc. If you don’t know Scheme, this is a bit complicated to explain. When applying c to another combinator x,
• the current continuation is saved and bundled into a new combinator, which we’ll call c',
• x is called on c', and
• applying c' at any later time will teleport control flow back to the original c application, but this time the result of the c application will be the argument to c'. More on this later;
• d is a special operator, which behaves very differently from the rest. The expression that d would normally be applied to is not evaluated; instead, it’s packaged into a promise, which d returns. We’ll come back to the exact semantics;
• Unlambda 2.0 introduces e to exit and @, ? and | for input. These aren’t very difficult to implement, so we won’t be focusing on them.

For example, a simple hello world program Unlambda is

r.!.d.l.r.o.w. .,.o.l.l.e.Hi


Evaluation of this program works like this:

• first we evaluate .Hi, printing ‘H’ and returning i,
• then we evaluate .eHi = ei, printing ‘e’ end returning i,
• finally, we evaluate ri, printing a newline and returning i.

## Implementing the base combinators

Let’s focus on s, k, and i for now; v and .x can essentially be expressed in terms of these combinators, so they don’t have much relevance for the design of the interpreter. We’ll come back to c and d later.

An input expression, say skki, gets parsed into a tree where leaf nodes represent base combinators and inner nodes represent applications.

This is easy enough to interpret. For any node, recursively interpret both subtrees, then perform the application. If the node is a leaf (i.e. a combinator), just return it.

The result of evaluating ix, where x is any combinator, is just x, so that’s easy. But how should we represent the result of evaluating, kx? Well, we can just create a new combinator class, K1, which holds the combinator k was applied to. Thus kx returns the object K1(x). We are effectively just performing currying manually again. We can apply the same trick to the s combinator, and we create two partially-applied s objects, S1 and S2.

Evaluating an application of K1 is easy (K1(x)y = y). Evaluating S2 is not much harder: we need to create a new subtree and recursively run the evaluator on it.

Here is a very basic interpreter (in Python) that works with S, K and I. I’ve tried to keep it as small as possible at the expense of cleanliness, but I hope I’ll be forgiven:

@attr.s
class I: pass
@attr.s
class K: pass
@attr.s
class S: pass
@attr.s
class K1: val = attr.ib()
@attr.s
class S1: x = attr.ib()
@attr.s
class S2: x = attr.ib(); y = attr.ib()

def interpret(tree):
if isinstance(tree, tuple):
operator = interpret(tree)
operand = interpret(tree)
if isinstance(operator, I):
return operand
if isinstance(operator, K):
return K1(operand)
if isinstance(operator, S):
return S1(operand)
if isinstance(operator, K1):
return operator.val
if isinstance(operator, S1):
return S2(operator.x, operand)
if isinstance(operator, S2):
x, y, z = operator.x, operator.y, operand
return interpret(((x, z), (y, z)))
else:
return tree

interpret((S(), (K(), (I(), S()))))
# => S1(x=K1(val=S()))


## Trampolines: Adding an explicit stack

We know that SKI is Turing complete, and what makes a language Turing complete is basically the ability to write unbounded loops. A simple example of a loop is the combinator $L := SII$, which loops forever when applied to itself. Indeed,

Notice how the evaluation rules run forever but in constant space. However, if you try running

L = (('S', 'I'), 'I')
interpret((L, L))


Python will die with a RecursionError, because the evaluation of the S2 operator results in a tail call that isn’t optimized away. This issue isn’t limited to this pathological case: any looping behavior in Unlambda has to rely on recursion and proper tail call optimization (TCO) on the part of the interpreter. (If you need a refresher on how that works, I recommend section 1.2 of SICP.) So we need our interpreter to perform TCO.

The key observation is that the interpreter above is implicitly using Python’s stack as the Unlambda call stack, so Python not having TCO prevents our interpreter from doing it either. We can undo this by using a longtime favorite of functional interpreters: trampolining (aka continuation passing style, or CPS).

The idea of CPS here is to build up an object (called a continuation) that represents what the program needs to do next. Each call to interpret will take a continuation and a value, and return a new continuation and value. This continues until the continuation we get represents termination of the program. The continuation itself will enclose the Unlambda stack; in fact, it is easiest to represent a continuation in memory a stack of tasks that need to be taken.

All programs start with the continuation [Complete(), Eval()], and the current value set to the input program. In the first iteration, the Eval() task is popped off the continuation and executed. If the current value is a combinator, it does nothing else. In the second iteration, we pop the Complete() task off the stack and terminate the program with the current value, which is still that combinator, as the final result. On the other hand, if the current value for Eval() is an application, more tasks will be pushed on the current continuation. Tasks can also store parameters if they need to.

So here is a CPS interpreter for an Unlambda tree (reusing the combinator definitions above):

@attr.s
class Complete: pass
@attr.s
class Eval: pass
@attr.s
class ApplyUnevaluatedArg: unevaluated_operand = attr.ib()
@attr.s
class Apply: operator = attr.ib()

def interpret_cps(tree):
continuation = [Complete(), Eval()]
value = tree
while True:
return value
if isinstance(value, tuple):
continuation.append(ApplyUnevaluatedArg(value))
continuation.append(Eval())
value = value
continuation.append(Apply(value))
continuation.append(Eval())
pass
value = K1(value)
value = S1(value)
value = ((x, z), (y, z))
continuation.append(Eval())


This time, running interpret_cps((L, L)) will indeed make our interpreter run forever, using a bounded amount of memory.

This continuation-style representation has another big advantage: it lets us implement the c operator (aka call/cc) very easily. Calling c will create a new C1 combinator holding the current continuation. Calling that combinator will discard the now-current continuation and value and replace them with the saved continuation and the value C1 was applied to, respectively.

## Bytecode: Separating code and data

Our code above can actually be optimized if we notice that which tasks are in the continuation at any point can mostly be determined statically by looking at the initial code structure (which values are held by these tasks, however, is determined at runtime). We can then turn our initial tree into a list of tasks to take sequentially, and use the continuation / stack exclusively to store the values that the tasks previously held. This list of tasks is called bytecode. An advantage of bytecode interpreters is that they are conceptually close to a compiler: we can take the generated bytecode and turn it into a sequence of machine instructions, effectively compiling the interpreted code.

For instance, Relambda will convert the expression si into (essentially) Push(S), Push(I), Apply, Push(I), Apply. The Push task adds a combinator to the stack, the Apply task takes the two topmost combinators on the stack and applies the first to the second.

How does c work now? A continuation object must encapsulate the full state of the computation, “what to do next”. Clearly, this is no longer just the stack, which now only contains the data needed to perform the next steps of the program. But since the tasks are fixed ahead of time, the only other thing we need in a continuation is the index of the next step to execute. When restoring a continuation, we can simply restore the stack, jump to this saved index and continue execution from there.

The S2 combinator is a bit difficult to convert. In a bytecode interpreter, the initial abstract syntax tree (AST) is converted to bytecode, then discarded; no trace of it remains at runtime. This is necessary because mixing the two styles essentially forces the interpreter to implement eval, a method that can turn the AST into bytecode at runtime, which would complicate the machinery quite a bit1. However, notice how the CPS interpreter above takes advantage of the fact that we do keep the AST around to evaluate an S2 operation.

However, the AST created by S2 is has a completely static structure, so we can take the approach of compiling that static structure down to bytecode and having the Apply task on S2 jump to this piece of bytecode (after setting up the stack appropriately). We also need a way for execution to return to the original code once the evaluation of S2 is done. To handle this, we introduce a return stack which holds the addresses to which we should jump back when the current S2 evaluation is complete. We will reuse this return stack to implement the d operator.

## The d special form

d is interesting for being a control form that can also be used as a combinator. The effect of applying d to some argument a is unlike that of any other combinator: a is not interpreted any further than it already is, and is instead frozen as the body of a promise D(a). When the promise gets applied to an argument x, a is finally evaluated, then the resulting value is applied to x.

d is a bit tricky to compile. In most traditional languages, we can know at compile time where a control form ends up, as they are typically keywords. In Unlambda, d can be the result of evaluating an expression, so we can’t know in advance where a d will show up! There are actually three somewhat different cases where d can end up:

1. Any application in the original AST can have its operator evaluate to d. In this case, we need to freeze the whole operand value (which could be an arbitrarily deeply tree) into a promise. This is a bit complex, but it’s also the least “dynamic” of the three cases as we know where in the bytecode it can happen.
2. An expression of the form sxyz → xzyz, where xz evaluates to d. This case is also fairly static, and we are guaranteed that the value of the promise (yz) is an application of two combinators, not any tree as is the case in case 1.
3. An expression of the form cx, or sxyz, or D(d)z where x or y evaluate to d, and D(d) is a promise of d. This case is even more restricted, as we are now guaranteed that the argument to d has already been evaluated into a single combinator.

Case 2 and 3 are easily dispatched. In case 3, d is more of a regular combinator than a special form—it performs no magical freezing, as its argument is already fully evaluated as a combinator. d simply creates a promise of the combinator that acts like the combinator itself in nearly all respects2. Adding a rule for d to Apply will make this work without issue.

For case 2, we create a special task that is only used in the S2 subroutine. This task checks if the evaluated operator is d, and, if yes, creates a special promise holding the two combinators in the application. We also need to add a rule to Apply to evaluate such a promise, which can be done with a small subroutine à la S2.

At first, it might seem like case 1 makes d behave more or less like quote-eval in a language like Scheme. d’s behavior is actually much more restricted. The object returned by d is similar to a 0-argument lambda in Scheme; in fact, d is similar to a lambda keyword that can dynamically pop up at runtime. But we statically know all the places where d could potentially pop up: these are exactly the places in the bytecode following the evaluation of an operator. In each of these locations, we add a special operator that checks if the just-evaluated operator is in fact d. If not, it simply proceeds as normal. If yes, it creates a new promise holding the current index in the bytecode, and jumps to the position in bytecode after the evaluation of the argument (easy since we also statically know where that is).

To evaluate the generated promise, we push the current bytecode index to the return stack and jump to the bytecode index pointed to by the promise. Once we reach the end of the tasks that correspond to evaluating the argument (which is the same position d skipped to when the promise was created), we return to the position in bytecode at the top of the return stack (the one we pushed before evaluating the promise). After this jumping operation, the promise has been forced and the corresponding value is on top of the stack.

Taking a step back, this is almost exactly how a bytecode interpreter for a conventional language would compile and invoke a lambda. We simply use the fact that evaluating an AST is the same as executing the bytecode we created from that AST.

## Conclusion

This post was focused on Unlambda and the implementation I wrote, Relambda, but many of the concepts touched upon here are, such as TCO, CPS, and dynamic lambda creation, are applicable generally to functional language interpreters. I think the construction I give to compile d to bytecode means that Unlambda can in fact be reasonably compiled, although the exact meaning of compilation can always be debated!

1. A particular difficulty arises because we probably want to garbage-collect the bytecode generated by eval, but generally we assume that bytecode lives in a single continuous array of linear code. Another issue is that not mixing AST and bytecode will break our continuation model.
2. The only difference between a combinator and its combinator-promise is the case where the combinator itself is d. D(d) behaves differently than d in that it is not itself the d special form, so it does not prevent the operand from being evaluated. For all other combinators, D(x)x; without the d case, we could just return a combinator instead of its promise (although without d`, we couldn’t create promises in the first place).