I recently implemented Relambda, an esolang interpreter based on SKI combinator calculus. This post explores SKI and its relationship to lambda calculus. A basic understanding of lambda calculus is useful.
SKI Combinator Calculus
SKI combinator calculus is a very simple Turing complete computational system, that, according to Wikipedia, is “not convenient for writing software”. SKI can be seen as a functional counterpart to classical Turing tarpits: it’s an even more minimal lambda calculus, where Brainfuck is an even more minimal Turing machine.
In SKI, like in lambda calculus, all objects are functions: they can be applied to an argument and they will return a value. Of course, since all objects are functions, the argument and return value will have to be a functions themselves. We call these functions combinators.
Application of combinator $y$ to combinator $x$ is denoted $(xy)$. Application is leftassociative, and some parentheses can be left out. For instance, $xy=(xy)$, $w(xy)z=((w(xy))z)$, and $xy(zw)=((xy)(zw))$.
SKI starts with only 3 combinators:
 I (Identity): when applied, it returns its argument. For instance,
 $II = I$
 $IK = K$.
 K (Kestrel, or Constant): when applied to a value $x$, it returns a function $K_x$ that itself yields $x$ when applied. For instance,
 $(KI)K=(K_IK)=I$
 $I(KK)K=(IK_K)K=K_KK=K$.
 S (Starling): when applied to three values $x$, $y$, and $z$, returns $Sxyz = xz(yz)$. For instance,
 $SKKS = KS(KS) = K_SK_S = S$
 $SKSI = KI(SI) = K_I(SI) = I$
In the last expression above, I didn’t have to evaluate $SI$ because whatever the result, $K_I$ ignores it and return $I$. However, $SI$ is a combinator itself, which we’ll note $S_I$; in general, $Sx$ = $S_x$. When applied to another value, $y$, it returns $S_xy = S_{x,y}$. When that combinator is finally applied, we get the final version: $S_{x,y}z = xz(yz)$. You may know this as currying (we’ve talked about it in the past), and it’s common in functional languages.
So S doesn’t do much when applied once, it just stores its argument. If the result is applied again, it adds the second argument to storage. On the third application, it’s finally ready to perform the Starling transformation. So the rule we gave above for S is really:
This also reassuringly proves that our rule for S is compatible with our application and priority rules.
We’ll explain in a moment why S, K and I make sense as our base combinators.
I isn’t actually primitive
A quick interlude: can we write I in terms of K and S? That is, we want to get a combinator $I’$ such that $I’x = x$ for all $x$.
Well, given any value $x$, we know that $KxK = K_xK = x$. (Note that we could use any combinator instead of the rightmost $K$.) But $KxK$ doesn’t work as an identity combinator, because we have to plop the value right in the middle of it! Application only works by putting values to the right of the applied combinator.
But S lets us reorganize elements in an application. We know that $Sxyz = xz(yz)$; notice how we now have a $z$ in the middle of the expression. Substituting $x=K$, we get $SKyz = Kz(yz) = K_z(yz) = z$. So, for any $y$, $SKy = I$. In particular we have $SKK = SKS = I$, so we could do without the I combinator in our base set and create it from S and K. Everything would just be a bit lengthier to write.
SKI and Lambda Calculus
We can clearly do some stuff with SKI, but it’s intuitively unclear how much. As I’ve said above, SKI is Turing complete. But how can we perform more complicated operations? It turns out SKI is pretty easy to map to lambda calculus.
Let’s quickly recap the key elements of lambda calculus:
 Expressions are made of abstractions (e.g., $λy. y$), and applications (e.g., $xy$). Abstractions behave like lambda functions in any functional languages (in fact, that’s where lambda functions got their names).
 There are 3 transformations: α, β, and ɣreduction. The most important is βreduction, which tells us how to process an application: it says we can transform $((λ x.A) B)$ into $A’$, where $A’$ is obtained by replacing $x$ with $B$ everywhere it occurs in the body of $A$. We write that $A’ = A[x \rightarrow B]$. α and ɣreduction essentially deal with variable name conflicts, which are of little interest for SKI.
The fact that SKI and lambda calculus are equivalent may seem strange at first: SKI does not have abstractions, even though these are precisely what allows us to create Turing complete loops and the cool Y combinator.
SKI → Lambda Calculus
To start with, given an SKI expression, how can we turn it into a lambda calculus expression? If we can achieve this, we will have proven… Absolutely nothing (actually, this proves that lambda calculus is at least as powerful as SKI).
Converting the base operators is straightforward:
 I becomes, obviously enough, $\iota := λx. x$, an abstraction that just returns its argument.
 K becomes $\kappa := λx. λy. x$: keep the first argument, ignore the second.
 S becomes $\sigma := λx. λy. λz. xz(yz)$, which does its Starling thing.
Now if we just turn SKI applications into lambda calculus applications, we have fully mapped SKI into lambda calculus (and in a very rigorous way at that /s).
Example
Ok, let’s try mapping an expression into lambda calculus:
which is very good, because we knew that $SKI$ = I, and I maps to $λx. x$. This derivation is a little annoying, but really not difficult if you’re careful.
Lambda Calculus → SKI
Now for the harder part. We take a lambda calculus expression, and we want to turn it into SKI. We will still map lambda calculus applications into SKI applications, so that part is easy. The crux is mapping abstraction.
Assume we have an abstraction $f = λx. A$. The body, $A$, can be one of four things:
 $x$. In this case, $f = \iota = λx. x$.
 $e$, where $e$ is a variable distinct from $x$. In this case, $f = \kappa e$.

an application, $e_1 e_2$, where $e_1$ and $e_2$ are arbitrary expressions. In this case, $f = \sigma (λx. e_1) (λx. e_2)$. Indeed, for any $y$, $$\begin{align} fy = A[x\rightarrow y] &= e_1[x\rightarrow y]e_2[x\rightarrow y] \\ &= ((λx. e_1)y)((λx. e_2)y)\\ &= \sigma (λx. e_1) (λx. e_2) y. \end{align}$$ We then apply these rules recursively to $e_1$ and $e_2$,
 an abstraction, $λy. A’$. We can recursively apply this set of rules to this inner abstraction to eliminate it, then continue applying our rules on the result.
Assuming the original lambda expression had no free variables, applying this procedure recursively lets us rewrite $f$ in terms of $\iota$, $\kappa$ and $\sigma$ only. At that point, we can simply replace $\iota$ with I, $\kappa$ with K, and $\sigma$ with S to obtain an equivalent SKI expression.
So we see why the three combinators of SKI are S, K and I: these correspond exactly to the three structures we can find when eliminating abstractions from Lambda calculus.
Example
Let’s pick a more interesting example: the lambda function $v$ defined as
It’s interesting because for any $y$,
So $(uu)$ is an abstraction that always returns itself when applied. Let’s turn it into a combinator that does that!
 The body of $u$ is $v := λx. (ss)$, so we start by applying the rules to this inner abstraction (rule 4):
 The body of $v$ is $(ss)$, which is an application. Using rule 3, we turn it into $\sigma(λx. s)(λx. s)$. We now need to apply our rules recursively to the subexpressions.
 The first subexpression is $λx. s$, which is a variable other than $x$. We replace it using rule 2, and get $\kappa s$.
 The other subexpression is also $λx. s$. In the end, $v = \sigma(\kappa s)(\kappa s)$.
 Back to $u$. We have $u = λx. ((\sigma(\kappa s))(\kappa s))$ (I’ve added parentheses back for clarity). The body is an application! Apply rule 3 and we get $u = \sigma(λx. (\sigma(\kappa s)))(λx. \kappa s)$.
 The first subexpression is $w = λx. (\sigma(\kappa s))$. Another application, rule 3 again, we get $w = \sigma(λx. \sigma)(λx. (\kappa s))$.
 Applying rule 2, we get $w = \sigma(\kappa\sigma)(λx. (\kappa s))$.
 The second argument, $λx. \kappa s$, becomes $\sigma(\kappa\kappa)\iota$, through rules 3, 2, and 1. This gives us $w = \sigma(\kappa\sigma)(\sigma(\kappa\kappa)\iota)$
 The second subexpression of $u$ is $w’ = λx. \kappa s$. This turns into $w’ = \sigma(\kappa\kappa)\iota$ just as in step 5.
 We finally get $u = \sigma w w’ = \sigma (\sigma(\kappa\sigma)(\sigma(\kappa\kappa)\iota))(\sigma(\kappa\kappa)\iota)$
 This turns into $U = S(S(KS)(S(KK)I))(S(KK)I)$.
Again, this is tedious but completely mechanical! We can note that $S(KK)I = K$, which further simplifies $U$ into $S(S(KS)K)K$, a nicely symmetrical expression.
Conclusion
The main takeway here is that SKI is functionally equivalent to lambda calculus, at the cost of a potentially expensive transformation, but with much simpler rules; in particular, we do away completely with α and ɣtransformations.