Intuitively Deriving the Y Combinator

$$ \newcommand{\one}[1]{ \color{red}{ #1 } } \newcommand{\two}[1]{ \color{blue}{ #1 } } \newcommand{\three}[1]{ \color{green}{ #1 } } \newcommand{\four}[1]{ \color{magenta}{ #1 } } $$

The $\lambda$ calculus is designed to be a model of computation, so it may be surprising that it has no innate recursion or iteration. We (spoilers!) can indeed do recursion, but it takes a little bit of work. Let's try to implement a quadrupling function $Q$ in $\lambda$ calculus. $Q$ can be defined recursively (not yet in $\lambda$ calculus) like so: $$ Q(x) := \begin{cases} 0 & x = 0 \\ 4+Q(x-1) & \text{otherwise} \end{cases} $$

Quadrupling ($Q$) was chosen over doubling ($D$) and tripling ($T$) because the names $D$ and $T$ are used for other functions later in the post.

Conditionals and Booleans in $\lambda$ calculus

Like all recursive algorithms, $Q$ contains a base case, when $x = 0$, and a recursive case, when $x \neq 0$. If we want to implement this in $\lambda$ calculus, we'll need a way to distinguish betwen the base and recursive cases. Math does this, as shown above, with case notation; many programming languages do this with if-statements or" the ternary operator; $\lambda$ calculus does it with all it's got: functions

One could argue that an algorithm such as: Algorithm" A(x): 1. Return A(x) , which contains no base case, is still a recursive algorithm. On the contrary, Donald Knuth states in The Art of Computer Programming (Third Edition) that [an] algorithm must always terminate after a finite number of steps.

Any conditional computation if $C$ then $A$ else $B$ is choosing between two results, $A$ and $B$, based on the value of $C$. If $C$ is true, then $A$ is returned, and if $C$ is false, $B$ is returned. In $\lambda$ calculus, true and false are simply defined to return the first and second argument, respectively: $$ \underbrace{ \lambda xy.x }_\text{true} \hspace{20pt} \text{ and } \hspace{20pt} \underbrace{ \lambda xy.y }_\text{false} $$

Usage of $\lambda$ booleans differ slightly from usage of booleans in most systems. Instead of needing to be wrapped in an if-statement or some other construct, they are themselves functions which do the choosing operation. In order to express: $$\text{if $x \geq 0$ then $1$ else $-1$}$$ (the $sign$ of $x$) we just write:

$$(\geq 0x)(1)(-1).$$

Note that $-1$ is not the function $-$ applied to the value $1$ but rather the single unit negative one

Also: $\geq$, $0$, $1$, $-1$ and all other symbols which are not parameters are not real $\lambda$ calculus constructs, but merely shorthand. Assignment does not exist in $\lambda$ calculus. These values only stand for some unspecified functions, with the promise that these functions work together as expected. For instance, that $\geq00 = \text{true}$, $+24=6$, etc. This is elaborated on in another note.

Because assignment doesn't really exist and shorthand is not real $\lambda$ calculus, I tried to avoid using it in this post, generally opting for $\underbrace{\text{underbace}}$ and $\overbrace{\text{overbrace}}$ labeling notation instead. However, how $\geq$, $1$, $-1$ and the rest of the numbers are defined is out of the scope of this post, so I had little choice but to use shorthand for these items.

The behavior of $sign$ is not universally agreed upon. Wikipedia defines $sign(0)=0$ (and also calls the function $sgn$), and $sign(0)$ is sometimes left undefined. I personally have found it useful (particularly in programming) to have $sign(0)=1$. I use that definition here because it's a simpler implementation, requiring only one conditional branch rather than two.

Thus $sign(2)=1$ and $sign(-3)=-1$: $$ \begin{array}{ccc} (\overbrace{\lambda x. (\geq 0 x)(1)(-1)}^{sign})2 & \hspace{20pt} & (\overbrace{\lambda x.(\geq 0 x)(1)(-1)}^{sign})(-3) \\ (\geq 0 2)(1)(-1) && (\geq 0 (-3))(1)(-1) \\ (\underbrace{\lambda ab.a}_\text{true})(1)(-1) && (\underbrace{\lambda ab.b}_\text{false})(1)(-1) \\ 1 && -1 \end{array} $$

Back to Recursion

If we try to write $Q$ in $\lambda$ calculus, we will quickly realize that it isn't so simple. It certainly looks something like: $$ \lambda x. Zx0(+4(\text{RECUR}(-x1))) $$ but what should be put in the place of RECUR?

The $Z$ function is just $isZero$, defined as: $$ Z(x) := \begin{cases} \text{true} & x = 0 \\ \text{false} & x \neq 0 \end{cases} $$

Note that $Q^R$ is not some operation $\cdot^R$ applied to $Q$ but rather just another variable name.

Recursion is self-reference, so our implementation of $Q$ needs to have access to itself in its body. However, lambdas don't innately have this. A lambda only has access to its arguments. Perhaps, then, we could... pass the function to itself as an arguemnet? Something like: $$ \underbrace{\lambda \color{red}{R}x.Zx0(+4(\color{red}{R}(-x1)))}_{Q^R} $$ where $R$ is just $Q^R$, but passed as an argument. Then ${Q^R}{Q^R}$ should be a quadrupling function. Let's try it: $$ \begin{align*} & \overbrace{\one{(\lambda Rx.Zx0(+4(R(-x1))))}}^{Q^R} \ \two{Q^R} \ \three 2 \\ &= \one{Z\three{2}0(+4(\two{{Q^R}}(-\three{2}1)))} \\ &= \one{(\overbrace{\lambda ab.b}^\text{false})0(+4(\two{{Q^R}}(-\three{2}1)))} \\ &= \one{+4(\two{{Q^R}}(-\three{2}1))} \\ &= \one{+4(\two{{Q^R}}\three{1})} \tag{F} \\ &= \one{+4(\two{(\lambda Rx.Zx0(+4(R(-x1))))}\three{1})} \\ &= \one{+4\two{(\lambda x.Zx0(+4(\three{1}(-x1)))})} \\ \end{align*} $$ and our result is $4$ plus ... a function of $x$. The issue is illuminated on line $(F)$, $F$ as in $F$uck, there's an issue. $Q^R$ expects its first argument to be the recursive function $R$, but we're only passing it the $x$ value. We can fix this by passing $R$ itself within the body of $Q$: $$ \underbrace{\lambda Rx.Zx0(+4(\color{red}{RR}(-x1)))}_{Q^{RR}} $$ and indeed this works: $$ \begin{align*} & \overbrace{\one{(\lambda Rx.Zx0(+4(RR(-x1))))}}^{Q^{RR}} \ \two{Q^{RR}} \ \three 2 \\ &= \one{Z\three{2}0(+4(\two{Q^{RR}Q^{RR}}(-\three{2}1)))} \\ &= \one{+4(\two{Q^{RR}Q^{RR}}(-\three{2}1))} \\ &= \one{+4(\two{Q^{RR}Q^{RR}}\three{1})} \\ &= \one{+4(\two{(\lambda Rx.Zx0(+4(RR(-x1)))){Q^{RR}}}\three{1})} \\ &= \one{+4(\two{Z\three{1}0(+4({Q^{RR}}(-\three{1}1))))})} \\ &= \one{+4(\two{+4({Q^{RR}}(-\three{1}1)))})} \\ &= \one{+4(\two{+4(\four{{Q^{RR}}}{Q^{RR}}\three{0}))})} \\ &= \one{+4(\two{+4(\four{(\lambda Rx.Zx0(+4RR(-x1))))}{Q^{RR}}\three{0}))})} \\ &= \one{+4(\two{+4\four{(Z\three{0}0(+4\two{{Q^{RR}}}(-\three{0}1))))})})} \\ &= \one{+4(\two{+4\four{((\overbrace{\lambda ab.a}^\text{true})0(+4\two{{Q^{RR}}}(-\three{0}1))))})})} \\ &= \one{+4(\two{+4\four{0})})} \\ &= \one{8} \end{align*} $$

In general, a function in double-$R$ form should, when passed to itself, produce the desired recursive function. We can wrap up this passing-to-self or self-invokation into a function $I$ to do the work for us:

$I$ is a fun function because $II =_\beta II$: $$ \begin{align*} & II \\ &= (\lambda f.ff) (\lambda g.gg) \\ &= ff[f / (\lambda g.gg)] \\ &= (\lambda g.gg) (\lambda g.gg) \\ &= II \end{align*} $$

$$ \overbrace{\lambda f.ff}^I $$ so that we can just write $IQ^{RR}$ or, for any double-$R$ $f$, $If$. If $f$ is a function in double-$R$ form, $If$ is the desired recursive function.

Now I don't know about you, but writing functions in double-$R$ form seems like a bit of a pain to me. I don't want to have to remember to write $RR$ instead of $R$ each time I write a recursive function. I want $R$ to be the real recursion function. Let's make something to do this $R$-doubling for us. Call it $D$ for double: $$ \overbrace{ \lambda fR.f(RR) }^D $$

Now we can use the simpler formulation $Q^R$ instead of $Q^{RR}$ and have $D$ do the work for us: $$ \begin{align*} & (\overbrace{\one{\lambda fR.f(RR)}}^D) \ (\overbrace{\two{\lambda Rx.Zx0(+4(RR(-x1)))}}^{Q^R}) \\ &= \one{\lambda R.(\two{\lambda Rx.Zx0(+4(RR(-x1)))})(RR)} \\ &= \one{\lambda R.(\two{\lambda x.Zx0(+4(\one{RR}(-x1)))})} \\ &= \underbrace{\lambda Rx.Zx0(+4(RR(-x1))))}_{Q^{RR}} \\ \end{align*} $$

In general, $D$ should take a recursive function from single-$R$ form to double-$R$ form. And we know that a function in double-$R$ form may be transformed into the desired recursive function via $I$. Thus, if we have a function $f$ in single-$R$ form, $I(Df)$ is the desired recursive function. And $I \circ D$ takes a function from single-$R$ form to recursive form. But what is $I \circ D$? It's the same as $\lambda f. I(Df)$, which is

$$ \begin{align*} & \one{\lambda f.\two{I}(\three{D}f)} \\ &= \one{\lambda f.\two{(\lambda f.ff)}(\three{(\lambda fR.f(RR))}f)} \\ &= \one{\lambda f.\two{(\lambda f.ff)}\three{(\lambda R.\one{f}(RR))}} \\ &= \one{\lambda f.\three{(\lambda R.\one{f}(RR))}\three{(\lambda R.\one{f}(RR))}} \\ &= \lambda f. (\lambda x.{f}(xx))(\lambda x.{f}(xx)) \\ \end{align*} $$

which is the $Y$ combinator!

Though we can write e.g. $$ S := \lambda abc.b(abc) $$ this kind of definition of $S$ is only shorthand, and must be expanded to (i.e., replaced with) its definition before evaluation. $\lambda$ calculus does not have any rules on its own to evaluate these kinds of expressions.

If we write recursive shorthand such as: $$ F := \lambda x. Fx $$ attempting to expand this for evaluation will cause infinite recursion, so it can never be evaluated as real $\lambda$ calculus. We could, of course, create a new system which allows for these things, but this new system would not be $\lambda$ calculus.