compiling-the-lambda-calculus/script.md
2024-02-13 20:00:02 -07:00

23 KiB
Raw Blame History

About Me

"For the new faces, I'm Elizabeth, or simponic on the discord..."

  • I love functional programming; I've been enjoying programs functionally for over 3 years; in LISP and Elixir mostly. I've loved every second of it. Well, maybe not every second, but definitely most of them.
  • Soon to be SDE at AWS
  • I'm 20 years old, graduating this semester
  • I was president of the USU Linux club from 2021 - 2024

Of course this presentation is open source at the link you see on the board.

Flirting With Functions

Tomorrow's Valentine's day, but you realize; you don't have a Valentine!

Desparate to make yourself feel better, you turn to your computer and open Emacs, asking the M-x doctor for help.

I am the psychotherapist.  Please, describe your problems.  Each time you are finished talking, type RET twice.
> i need a valentine
Why do you say you might want to have this valentine?
> im lonely
Are you lonely often?
> sometimes
Can you elaborate on that?
> i need someone or something i can trust
What makes you believe you need someone or something you can trust?
> i want something that when i ask one question i'll get the same answer no matter what.
Is it because when you ask one question you will get the same answer no matter what that you came to me?
> yes
I see...  Well, are you sure that this is so?
> yes
Yes, hmmm... so, do you really think this is so?
> yes

well that's not super helpful, looks like the "AI stuck in a loop repeating itself" notion is nothing modern.

On our journey to get a valentine, it looks like we have stumbled across a black box. Why? Well... we could start shoving random numbers in it and see what happens.

hmm... [-1, 2]? okay, 1. [3, 4]? okay, 7. what if we try [-1, 2] again, see if it meets our criteria to be our valentine; not changing its answer. amazing!

No matter how many times we put two numbers in it gives me the same answer; we're in love with it already!

Let's look at what's behind this black box, see if there's more we can get to know.

Where could our curiosity take us?

A simple add function. That makes sense.

We're in love with its predictability, and simplicity; the assurance that no matter what we give as input or how many times it's given, the function behaves in a consistent manner.

But let's imagine, for a moment, a different kind of box. One where actions outside the box influences responses.

Imagine asking your partner about what food they want to eat. But instead of a straightforward answer based on your question alone and some state (i.e. hunger levels, craving ratios, etc), the function's response is influenced by other factors; the day of the week, and the state of the day or week prior. We can simulate this: ((go through the ratios and outputs)).

Let's see what causes this in this black box we don't love; don't pay attention to the implementation (there's some stuff we haven't talked about yet), but check out this line (MATH.RANDOM). This is a side effect; an unpredictible effect on the output.

Side effects introduce unpredictability into our programming relationships. Where output is not just determined by its input but also by the state of the outside world, or the system at the time of execution. Suddenly, the predictability we cherished is compromised.

We call a function that is unpredictable, or has side effects, "impure", and one that isn't - "pure".

It was obvious how we could cause unpredictability by including randomness. But here's another example, on left we have two impure functions.

When we execute the first block, we get the expected result of fib(5).

But because fact uses the same cache, we'll run fact(5)... fact(5) returns fib(5)!

Now, when I go and update the cache to be correct, we get the correct answer. I was able to change the behavior of the function by modifying state outside of it. Get the gist - impure vs pure functions, side effects, etc? Cool.

For now let's move back to studying our pure black boxes.

We love our black boxes a lot, and we want to share them with other people in our lives. So let's create another that takes a list of people and creates a custom valentine's letter for them.

Good! Now they will all share our love.

[SPACE] But a few months goes by, and all our friends' birthdays are soon coming up (why are so many cs people born in June)! We don't want to make them sad, as we see here, so we make another black box to generate a letter, and in how many days we should give it to them:

There, now, we can make sure our friends are happy on their birthdays!

But, this is getting annoying; what about Christmas, Thanksgiving, or Easter? Making a new black box to make a list of new cards, going through each person, every single time to construct a list of cards, is getting really tedious.

[SPACE] What if we generalized this? [SPACE] We create a couple of black boxes that take a person, and generate them a card, specifically; like a template for a card you could print off and fill in manually. Like this, for alan turing.

Then, we can use a new black box that takes a list of people, and applies this template to each person.

The ability in a language to pass a function around like this - like a variable - is what makes functions "first class". And the buildCards function takes a function as input, making it a "higher order function".

Functional Reproduction

After exploring the simple yet profound beauty of higher order black boxes, our journey into the functional programming romance takes an intriguing turn. What's better than one black box? A black box that creates more black boxes.

All life on Earth contains instructions in the form of DNA. During mitosis, special mechanisms in our cells read the instructions and create new instructions to give to another cell.

Very similarly, we can give our black boxes DNA from parents to reproduce and create new child black boxes. But in our world of black boxes, we refer to this DNA as "closures"; a function with its "environment" - the variables it has access to - attached to it. [SPACE]

Here we've created a black box that outputs a black box to create a certain type of letter for a person encoded in its DNA. If the type is "birthday", the output is a birthday card generator, and if it's "valentine", it's a valentine letter generator.

When we put in "Alan Turing" and "valentine", we obtain a black box with "person" (in this case, Alan Turing) in its DNA [OPEN DEV TOOLS -> valentineCardGenerator]. Now we evaluate the black box, and it returns a valentine letter for that person.

We can even go a step further:

Wait, do you guys smell that?

Quick Aside: Delicious Curry

Sneakily, we've actually started preparing curry!

We've broken up the input of the black box ~cardGenerator~ into a series of functions with a single input. This is called ~currying~.

To curry the function cardGeneratorFor we: first created a new closure over the person to provide a function that takes only the type returned that closure removed the now unnecessary type argument since that's now curried and finally changed the function signature to reflect the curried function

There's another way that we could implement the same functionality - "partial function application"; a fancy name for a simple idea; locking some arguments to partial inputs in a function and passing the rest to be applied.

Immutability

We briefly mentioned side effects and alluded them to the unpredictability of a partner choosing their food. We love our black boxes because they're reliable. But, we've actually had an impostor among us throughout this presentation.

Specifically in buildCards [SPACE]

This function actually has a small side effect - though it's contained entirely within the function - when we append to the cards list. If we were to go about living a side effect free life, that means we can't change anything lest something else depends on it.

In this case, looping through "people" inherently produces a side effect via mutation of the loop variable, so we need to eliminate it.

So how do we sus out an impostor? With copies and recursion.

Here we're not changing anything at all (except the stack), but we've kept the same functionality; this code is "immutable". When we call build_cards on a list of people, we're 100% certain we're not gonna change something and get something odd, short of a bug.

At a high level there are so many benefits to immutable functional programming:

  • Readability and reasoning about code. Functional, immutable code reads like a book!
  • Concurrency. When everything is immutable, it's much easier to prevent race conditions and deadlock. If you can avoid a mutex or semaphore here and there, that simplifies things so much. You also know for certain every thread or processor is working on the same data; it couldn't have changed when it was distributed.
  • No side effects. I can't think about how many times I've spent hours poring through a code base to find where exactly something is changing or what it can effect. This helps verify correctness and makes unit testing a dream.
  • I suppose this is more subjective, but it gives you an entirely new view of how computers work. When you work with functions you're directly treating the computer as an automaton or turing machine. There's so much to learn around functional programming, type theory, and it touches almost any given field of mathematics and computer science.

But there are also downsides:

  • Keeping immutability requires more computation in many cases (special data structures and copying)
    • Typical arrays go out the window in most applications. The cons cell / linked list is much preferred. - You also don't typically have mutable datastructures like hash maps. Instead, you need to reach to balanced trees or tries. It takes a lot of practice but you can always write code just as efficiently in terms of time complexity.
  • All things considered, to write and learn, it's difficult. The rabbit hole is as deep as your imagination. I haven't even talked about monads or treating code as data (macros); barely even scratched the surface.

But like all relationships, we need to compromise with our black boxes. Side effects are effectively unavoidable as programmers; when people press the button, they don't want the computation to just exist out there in the computer void. No! They want to see the pop up box, and the value updated on their account. We need to be able to mutate state where the tradeoff of extra compute time is unavoidable like in I/O or a relatively large database.

Writing Code Functionally

The best way to learn is by practicing. So we'll spend some time refactoring some code to be more functional and develop some higher order functions and practice currying.

We'll start with a simple example: a function that takes a list of numbers and returns the sum of the squares of the even numbers.

For millenia humans have thought about how best to find if a number is even. So the first function is the most efficient way to do so; we know 0 is even, and then numbers alternatve between even and odd. We could prove this by induction but it's up to the reader.

Then for our sumOfEvenSquares we go through all the numbers upTo and add its square if it's even.

To make this functional, we first need to keep isEven pure and immutable. We can do this again via recursion; 0 is even, so we do the same alternation between even and odd.

For sumOfEvenSquares, we can make some higher order function to filter out the even numbers, and then map over the list to square them, and then reduce to sum them.

const isEvenFn = (n: number) => {
  if (n === 0) {
    return true;
  }

  return !isEvenFn(n - 1);
};

const map = (
  transform: (x: number, ind: number) => number,
  list: number[],
  index = 0
) => {
  if (list.length === 0) {
    return [];
  }

  const [head, ...tail] = list;
  const transformed = transform(head, index);
  return [transformed].concat(map(transform, tail, index + 1));
};

const filter = (predicate: (x: number) => boolean, list: number[]) => {
  if (list.length === 0) {
    return [];
  }

  const [head, ...tail] = list;
  if (predicate(head)) {
    return [head].concat(filter(predicate, tail));
  }
  return filter(predicate, tail);
};

const reduce = (
  accumulation: (x: number, acc: number) => number,
  accumulator: number,
  list: number[]
) => {
  if (list.length === 0) {
    return accumulator;
  }

  const [head, ...tail] = list;
  const newAccumulator = accumulation(head, accumulator);
  return reduce(accumulation, newAccumulator, tail);
};

const sumOfEvenSquaresFn = (upTo: number) => {
  const numbersUpTo = map((_, i) => i, Array(upTo + 1).fill(0));
  const evens = filter((x) => isEvenFn(x), numbersUpTo);
  const evensSquared = map((x) => x * x, evens);

  return reduce((x, acc) => x + acc, 0, evensSquared);
};

And all our tests pass!

We can also use currying to make our code more readable and easier to use; for example, we can curry our reduce function to implement a sum function.

const curriedReduce = (accumulation: (x: number, acc: number) => number) => {
  return (accumulator: number) => {
    const reduce = (list: number[]) => {
      if (list.length === 0) {
        return accumulator;
      }

      const [head, ...tail] = list;
      const newAccumulator = accumulation(head, accumulator);
      return curriedReduce(accumulation)(newAccumulator)(tail);
    };

    return reduce;
  };
};

const sum = curriedReduce((x: number, acc: number) => x + acc)(0);

...
return sum(evensSquared);
...

And our tests pass once again! But one of them is commented... hmm... it's pretty big

And, we fail, getting a "maximum call stack size exceeded" error in our incredibly naive recursive implementation of filter, map, and reduce. Thankfully javascript has these builtin, so we'll use those.

const sumOfEvenSquaresFn = (upTo: number) => {
  const numbersUpTo = Array(upTo + 1)
    .fill(0)
    .map((_, i) => i);

  return numbersUpTo
    .filter((x) => isEvenFn(x))
    .map((x) => x * x)
    .reduce((x, acc) => x + acc, 0);
};

There, much shorter. But, we still get the same error. This time in isEven at the recursive call.

Now, we're gonna start getting into the weeds. The reason this is happening is because essentially the code is being transformed into:

const isEvenFn = (n: number) => {
  if (n === 0) {
    return true;
  }

  const isEven = isEvenFn(n - 1);
  return !isEven;
};

So everytime we check the number previous, we create a whole new stack frame to evaluate isEvenFn(n-1) and return control to the callee. Then we evaluate "!isEven" which requires that stack frame.

The runtime I'm using, Bun, uses JavaScriptCore - which supports Tail Call Optimization (which is, sadly, not in the JS spec). What is TCO? Tail Call Optimizations allows us to write recursive functions that overwrite the same stack frame when the last expression evaluated does not leave control of the function.

To do this, we need to ensure that inside of isEvenFn, the last expression evaluated is always isEvenFn, not !isEven.

It takes a bit of ingenuity, but to do this, we can use an accumulator inside an argument and change the logic up a bit. Here's what I mean:

const isEvenHelper = (curr: number, max: number, previous = false) => {
  if (curr === max) {
    return !previous;
  }
  return isEvenHelper(curr + 1, !previous);
};
const isEvenFn = (n: number) => {
  return isEvenHelper(0, n);
};

And there we go! It's passing. But, it's much much slower than the iterative solution. But from my experimentation, Tail Call Optimization won't help us much here since we spend most of our time doing our isEven check in linear time, and we're constantly doing garbage collection whereas our iterative solution has nothing to clean up.

The keen eyed among you might notice that since we can represent the sum of the first n natural number's squares as (n _ (n + 1) _ (2 * n + 1)) / 6, then transform each natural number to its nearest even entry by scaling by two, and halve the bounds of the sum. Which is constant time. Yeah. But that's not as fun.

========

Part Two - Marrying Functions

The Lambda Calculus

The Lambda calculus, developed by Alonzo Church in the 1930s, is a "formal system in mathematical logic for expressing computation based on function abstraction and application using variable binding and substitution".

There are three rules that define the semantics of the Lambda Calculus:

  1. "x": A variable is a character or string representing a parameter, and is a valid lambda term.
  2. "(λ x . t)" is an "abstraction" - a function definition, taking as input the bound variable x and returning the lambda term t.
  3. (M N) is an "application", applying a lambda term M with a lambda term N.

There, we're done; the rest is an exercise to the reader. Have a good day everyone! No... I wouldn't be that mean.

We can "evaluate" a lambda term by performing three types of reduction:

  1. α - conversion: Renaming of bound variables to avoid name clashes (out of scope - we're gonna assume user knows not to bind variable names already chosen elsewhere).
  2. β - reduction: Application of a function to an argument.
  3. η - reduction: Conversion of a function to a point-free form (out of scope).

To perform a β-reduction, we substitute the argument into the body of the function, replacing all instances of the bound variable with the argument.

For example, given the lambda term (λ x . x) y, we can do a β-reduction to obtain the value y after substituting x. This folllows pretty intuitively, but more formally - we define substitution recursively - given M and N as lambda terms and x,y as variables, as follows:

  1. x[x := N] = N
  2. y[x := N] = y` if y != x
  3. (M N)[x := P] = (M[x := P]) (N[x := P])
  4. (λ x . M)[x := N] = λ x . M
  5. (λ y . M)[x := N] = λ y . (M[x := N]) if y != x and y is not free in N

A variable is free in a lambda term if it is not bound by a λ abstraction. In other words, a free variable is one that is not a parameter of the function, or in the immediate scope of an abstraction.

Lambda Calculus as a Computational Model

We encode all computation and values in the lambda calculus as a series of functions and applications. For example, we can encode a natural number n as a function that takes a function f and a value x, and applies f to x n times.

  1. 0 = λ f . λ x . x (no application of f)
  2. 1 = λ f . λ x . f x (one application of f )
  3. 2 = λ f . λ x . f (f x) (two applications of f)

This is called the "Church encoding" of the natural numbers.

We can also define the successor function succ as a function that takes a church encoded natural number n, and a function f, and a value x, and performs f on x n times, and then once more.

  1. succ = λ n . λ f . λ x . n f (f x)

For example, we do an application of succ on 0: (succ 0) = 1, and (succ 1) = 2.

  1. We perform substituion on ,n with the church encoding for 0
  2. Substituting g with f has no effect and returns the term (lambda y . y)
  3. (lambda y . y) is the identity function.

We're Done! We've constructed 1 from 0. We'll go through the same process to construct 2 from 1.

We can define the boolean values true and false as functions that take two arguments and return the first and second, respectively.

  1. true = λ x . λ y . x
  2. false = λ x . λ y . y

We can then define the logical operators and, or, and not in terms of these boolean values.

  1. and = λ p . λ q . p q false
  2. or = λ p . λ q . p true q
  3. not = λ p . p false true

(a few examples...)

Recursion is also possible in the lambda calculus via the funny-looking thing on the board, which you've probably seen before - the Y combinator. The Y combinator should satisfy the equation (Y f) = (f (Y f)), and is defined as follows:

  1. Y = λ f . (λ x . f (x x)) (λ x . f (x x))

If we let it run by itself, we can see how our recursion works. We can kinda see some formation of a "stack" of sorts.

When we apply the Y combinator to a function, it will apply the function to itself, and then apply the function to itself again, and so on, until the function returns a value.

Writing a Quick Lambda Calculus Interpreter

I think the best way to learn the lambda calculus is by building an interpreter for it, it's a quick exercise and it's a lot of fun.

When writing a Lambda Calculus interpreter, there are a few decisions to be made:

  • is it the typed or untyped lambda calculus?
  • is the language lazy or strict / eager?
  • call by value or call by name? (call by value - environment, closures)
  LambdaTerm
    = Abstraction
    / Application
    / Variable

  Application
    = LPAREN _? left:LambdaTerm _? right:LambdaTerm _? RPAREN {
        return { left, right };
      }

  Abstraction
    = LPAREN _? LAMBDA _? param:Variable _? DOT _? body:LambdaTerm _? RPAREN {
        return { param, body };
      }

  Variable
    = name:([a-z] [A-Z0-9a-z]*) { return { name: name[0] + name[1].join('') }; }

  LAMBDA = "λ" / "\"

  DOT = "."

  LPAREN = "("

  RPAREN = ")"

  _ = (" " / "\n" / "\t" / "\t\n")+

  private substitute(
    ast: LambdaTerm,
    param: string,
    replacement: LambdaTerm
  ): LambdaTerm {
    const node = ast as any;

    if (this.isVariable(node)) {
      return node.name === param ? replacement : node;
    } else if (this.isApplication(node)) {
      const left = this.substitute(node.left, param, replacement);
      const right = this.substitute(node.right, param, replacement);

      return { left, right } as Application;
    }

    return {
      param: node.param,
      body: this.substitute(node.body, param, replacement) as Abstraction,
    };
  }

  public fullBetaReduction(ast: LambdaTerm = this.ast): LambdaTerm {
    const node = ast as any;

    if (this.isApplication(node)) {
      const { left, right } = node as Application;

      if (this.isAbstraction(left)) {
        return this.fullBetaReduction(
          this.substitute(left.body, left.param.name, right)
        );
      }

      return {
        left: this.fullBetaReduction(left),
        right: this.fullBetaReduction(right),
      } as Application;
    }

    if (this.isAbstraction(node)) {
      return {
        param: node.param,
        body: this.fullBetaReduction(node.body),
      } as Abstraction;
    }

    return node;
  }

There are a lot of bugs in our interpreter; first, there is no alpha conversion, there's a high chance we could have name conflicts within lambda terms. We could use de brujin indexing, but that's outside the scope of this presentation.

Further

Obviously we've only scratched the surface of the lambda calculus and functional programming; I didn't even mention a lot of things I wanted to get to. So here's a list you can study on your own:

Pattern matching is a fun paradigm and allows one to write much more declarative code.

The lambda calculus is a "universal model of computation", meaning that any computable function can be expressed in the lambda calculus. This is known as the Church-Turing thesis, and is the basis for the theory of computation.

I highly recommend reading the paper "Compilation of Lambda-Calculus into Functional Machine Code". You know how we alluded to the "stack" that the Y combinator produced? This paper goes into how we can utilize that stack to compile lambda calculus into machine code on a stack machine by reversing the order through continuation passing style.

Extending our interpreter into A Compiler As A Sequence of Transformations

Continuation Passing Style

=======

Conclusions and Comparisons