I barely know where to start with this absolute unit of a course by the amazing Prof. Politz. It single-handedly taught me more about compilers than all of my previous adventures combined. The course, with a focus on backend compiler design and optimization, was a fun class I looked forward to every other day. It was structured around a dynamically typed language (with a snake name for each evolution of the compiler and a source file extension of .snek) that could be both compiled ahead-of-time to x86_64 and interpreted/evaluated. Each evolution also had extensions that made it even more engaging, letting students build and learn more sophisticated concepts like a REPL.
What is a backend compiler?
Compilers are usually made up of two or three stages, called (with very original names) the front, middle, and back end. The front end parses the source code of the program into an Intermediate Representation (IR). The middle end performs most of the optimizations, either on the IR in place or by generating a new, optimized IR. The back end handles code generation, emitting machine code that can finally run on the target hardware. This course did not include the front end, instead using the well-known S-expression syntax and existing libraries to parse into the first IR and continue from there.
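To make the front-end step concrete, here is a minimal sketch of tokenizing and parsing S-expressions into nested lists. This is my own illustration, not the course's actual parser (which used an existing library):

```python
# Minimal S-expression parser sketch: split on parentheses, then build
# nested lists recursively. Atoms become ints where possible.
def tokenize(src):
    return src.replace("(", " ( ").replace(")", " ) ").split()

def parse(tokens):
    tok = tokens.pop(0)
    if tok == "(":
        lst = []
        while tokens[0] != ")":
            lst.append(parse(tokens))
        tokens.pop(0)  # drop the closing ")"
        return lst
    return int(tok) if tok.lstrip("-").isdigit() else tok

ast = parse(tokenize("(let ((x 5)) (add1 x))"))
# ast == ["let", [["x", 5]], ["add1", "x"]]
```

From here, the nested-list form is already a usable first IR for later passes.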
We started off simple, with the whole language consisting of numbers (integers) and the unary operators increment, decrement, and negation. This let us understand what the pipeline would look like, so we could iterate on it to add more features down the line.
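A backend for this tiny language can be sketched as a recursive function that emits x86_64 assembly text, keeping the result in rax. The exact instruction choices here are my assumption for illustration, not the course's exact output:

```python
# Compile the numbers + add1/sub1/negate language to a list of x86_64
# assembly lines; the running value lives in rax.
def compile_expr(e):
    if isinstance(e, int):
        return [f"mov rax, {e}"]
    op, arg = e
    instrs = compile_expr(arg)   # evaluate the operand first
    if op == "add1":
        instrs.append("add rax, 1")
    elif op == "sub1":
        instrs.append("sub rax, 1")
    elif op == "negate":
        instrs.append("neg rax")
    else:
        raise ValueError(f"unknown op {op}")
    return instrs

compile_expr(("add1", ("negate", 7)))
# → ["mov rax, 7", "neg rax", "add rax, 1"]
```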
Boa introduced variables and let bindings, along with binary operators like addition, subtraction, and multiplication. Bindings helped us understand how explicit scoping works, while the binary operators let us observe how evaluation order can affect both the performance and the correctness of the code.
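The scoping story is easiest to see in an environment-based evaluator: each let binding extends the environment for its body only. A minimal sketch (my own AST shape, not the course's):

```python
# Environment-based evaluation for let bindings and binary operators.
# Scoping is modeled by extending a dict per binding, so the extension
# disappears when the let body is done.
def interp(e, env):
    if isinstance(e, int):
        return e
    if isinstance(e, str):
        return env[e]                  # variable lookup
    if e[0] == "let":                  # ("let", name, bound_expr, body)
        _, name, bound, body = e
        return interp(body, {**env, name: interp(bound, env)})
    op, lhs, rhs = e                   # left operand evaluated first
    l, r = interp(lhs, env), interp(rhs, env)
    return {"+": l + r, "-": l - r, "*": l * r}[op]

interp(("let", "x", 5, ("+", "x", ("*", "x", 2))), {})   # → 15
```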
The extension here was the first introduction to how Read-Eval-Print Loops, or REPLs, work, and how we can dynamically inject code segments into memory and jump into them for execution. It also added a new way to globally define variables that could be used like regular bindings but with global scope. A runtime assembler was a godsend in terms of making this work in such a short amount of time.
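The core trick, placing machine code into executable memory and calling into it, can be sketched with just the standard library on a typical x86_64 Linux setup. This is an illustration of the idea, not the course's runtime assembler, and it will only run where writable-and-executable mappings are allowed:

```python
import ctypes, mmap

# Raw x86_64 machine code for "return 42": mov eax, 42 ; ret
code = bytes([0xB8, 0x2A, 0x00, 0x00, 0x00, 0xC3])

# Map a page that is readable, writable, and executable, copy the code
# in, and wrap the page's address as a C function pointer.
buf = mmap.mmap(-1, mmap.PAGESIZE,
                prot=mmap.PROT_READ | mmap.PROT_WRITE | mmap.PROT_EXEC)
buf.write(code)
fn = ctypes.CFUNCTYPE(ctypes.c_int)(
    ctypes.addressof(ctypes.c_char.from_buffer(buf)))
result = fn()   # jumps into the injected code segment
```

A REPL repeats exactly this: assemble the next statement, drop the bytes into executable memory, and jump to them.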
Cobra stepped the compiler up into something that could be used to build real programs by introducing booleans, mutation of let bindings, and control flow. A block construct made it possible to sequence more than one statement without introducing additional scoping. This also meant we looked at how to distinguish the type of a given value by tagging the value with its type. At this point, we needed to write a non-trivial number of tests to make sure we weren't breaking anything while adding new features. Mutation also changed how we think about scopes and the assumptions we could make about what a variable contains at a given point in time.
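Tagging can be sketched as stealing the low bit of each machine word: integers are shifted left with a 0 tag, and booleans get a distinct low-bit pattern. The exact encoding below is an assumption for illustration, not necessarily the course's scheme:

```python
# Value tagging sketch: the low bit distinguishes integers from
# booleans, so a single machine word can carry either.
def to_repr(v):
    if isinstance(v, bool):
        return 0b11 if v else 0b01      # 0b01 = false, 0b11 = true
    return v << 1                        # low bit 0 marks an integer

def from_repr(bits):
    if bits & 1:                         # boolean tag set
        return bits == 0b11
    return bits >> 1                     # undo the shift

assert from_repr(to_repr(10)) == 10
assert from_repr(to_repr(True)) is True
```

The cost is one bit of integer range; the benefit is that a runtime tag check is a single bit test.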
The extension of Cobra was two-fold. The first part was eliminating type checks for known types in the REPL: once a statement has executed, its type information is known statically before the next statement runs, including for globally defined variables. The second was adding an Eval mode alongside the REPL, which evaluates the code in memory and outputs the result without ever compiling the source to target machine code.
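The check-elimination idea can be sketched as a compile-time type table consulted during code generation; when a variable's type is already known, the tag check is simply not emitted. All names and instruction choices here are my own illustration:

```python
# Sketch: skip the runtime tag check when the REPL already knows a
# variable holds an integer. Assumes the 1-bit-shifted integer tagging,
# so "+1" on the tagged value is "add 2".
known_types = {}   # variable -> "int" / "bool", filled after each statement

def emit_add1(var):
    instrs = []
    if known_types.get(var) != "int":
        instrs.append(f"test {var}, 1")       # tag check: low bit must be 0
        instrs.append("jnz error_not_int")
    instrs.append(f"add {var}, 2")            # tagged +1
    return instrs
```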
Caduceus was a checkpoint to take stock of what we had built so far and study others' implementations to see different approaches to the same problems. This was a pleasant change from other evaluation methods, where past assignments never seem to mean much going forward.
With Diamondback, the compiler was getting seriously usable. We added functions, a core concept in most programming languages. Functions were defined before the main body, and we had variable-argument function calls. Interesting concepts such as stack frame construction and our own calling conventions versus the C style were introduced.
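A calling convention is ultimately just an agreement about what goes on the stack and where. Here is a toy sketch of one such agreement (the specific layout is my assumption, not the course's convention):

```python
# Toy calling convention: arguments pushed right-to-left, then the
# return address; the callee finds argument i relative to the frame
# base, which points at the return address.
stack = []

def call(arg_values, ret_addr):
    for a in reversed(arg_values):     # push args right-to-left
        stack.append(a)
    stack.append(ret_addr)             # return address on top
    return len(stack) - 1              # frame base = slot of ret addr

def get_arg(frame_base, i):
    return stack[frame_base - 1 - i]   # args sit just below the base

fb = call([10, 20], ret_addr=0x401000)
get_arg(fb, 0)   # 10
```

Choosing your own convention is freeing, but any call into C code still has to follow the System V ABI.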
The extension here was one of the best concepts ever introduced alongside functions: Proper Tail Calls, or Tail Call Optimization, which is safe-for-space. This means that certain (recursive) function calls in tail position can be made without constructing a new stack frame. As a result, a factorial of 10 and a factorial of 20 both take the same amount of space (just one stack frame) to evaluate! There were subtleties here, with care needed for mutually recursive functions, which can have different numbers of arguments and different stack frame sizes.
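What the compiler effectively does for a call in tail position is overwrite the current arguments and jump back to the top of the function, rather than pushing a new frame. In Python (which does not eliminate tail calls) the two forms look like this:

```python
# Tail-recursive factorial: the recursive call is the very last action,
# so a compiler with proper tail calls can reuse the current frame.
def fact_tail(n, acc=1):
    if n <= 1:
        return acc
    return fact_tail(n - 1, acc * n)   # tail position

# What the compiled code effectively becomes: update the arguments in
# place and loop, using a single constant-size frame.
def fact_loop(n, acc=1):
    while n > 1:
        n, acc = n - 1, acc * n
    return acc
```

fact_loop(100000) runs in constant stack space, while fact_tail at that depth would blow CPython's recursion limit, which is exactly the gap proper tail calls close.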
The next extension was to port functions to the REPL and implement dynamic, partial, single-level monomorphization: the first call to a function generates two versions, a fast one with optimizations that is called when the argument types match those seen at the first call, and a slower generic one similar to the functions generated before. This was a significant feature to implement, going deep into the weeds of memory layout to generate code that can modify its own code at runtime and insert the fast version into the code segment. I, for one, absolutely loved every second of implementing it.
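The dispatch side of this can be sketched with a wrapper that records the argument types of the first call and routes later calls accordingly. All names here are my own, and the "fast" version is a stand-in for the specialized machine code the real compiler would emit:

```python
# Single-level monomorphization sketch: specialize on the types seen at
# the first call; fall back to the generic version on a mismatch.
def specialize(generic_fn):
    state = {"sig": None, "fast": None}

    def dispatch(*args):
        sig = tuple(type(a) for a in args)
        if state["sig"] is None:
            state["sig"] = sig
            state["fast"] = generic_fn   # stand-in for generated fast code
        if sig == state["sig"]:
            return state["fast"](*args)  # fast, type-check-free path
        return generic_fn(*args)         # slow generic fallback
    return dispatch

add = specialize(lambda a, b: a + b)
add(1, 2)        # first call: specializes for (int, int)
add("a", "b")    # mismatched types: takes the generic path
```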
Just when you might think the compiler was good enough, Egg Eater came along and introduced the heap and heap-allocated structures, specifically fixed-size lists or vectors. New tags for values and a heap layout for lists were discussed. This made it possible to implement mainstream data structures like binary search trees, graphs, linked lists, etc.
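A typical layout stores each list as a length header followed by its elements, handed out by a bump allocator. The details below (word offsets as references, a length-first header) are assumptions for illustration:

```python
# Heap-allocated fixed-size lists: a flat "heap" of words with a bump
# pointer. Each list is [length, elem0, elem1, ...] and is referred to
# by the word offset of its header.
heap = [0] * 64
heap_ptr = 0

def alloc_list(elems):
    global heap_ptr
    addr = heap_ptr
    heap[heap_ptr] = len(elems)                          # length header
    heap[heap_ptr + 1 : heap_ptr + 1 + len(elems)] = elems
    heap_ptr += 1 + len(elems)                           # bump
    return addr

def list_get(addr, i):
    assert 0 <= i < heap[addr], "index out of bounds"
    return heap[addr + 1 + i]

a = alloc_list([10, 20, 30])
list_get(a, 2)   # 30
```

The length header is what makes runtime bounds checks possible, and (later) tells the garbage collector how big each object is.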
The extensions here were to add mutation of list elements, which brings new problems we did not have before, like reference cycles: lists referring to other lists in a loop. To address that, we also had to implement structural equality, which gets into concepts like generating directed acyclic graph representations of lists to compare, and settling on a definition of what counts as structurally equal. As always, we ported all of this to the REPL and the interpreter as well.
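The heart of cycle-safe structural equality is remembering which pairs of nodes are already being compared, so a cycle ends the recursion instead of looping forever. A sketch of one reasonable definition (not necessarily the course's exact one):

```python
# Structural equality that terminates on reference cycles: if we are
# already in the middle of comparing this pair, treat it as equal.
def struct_eq(a, b, seen=None):
    seen = seen if seen is not None else set()
    if isinstance(a, list) and isinstance(b, list):
        key = (id(a), id(b))
        if key in seen:
            return True                # cycle: assume equal to break it
        seen.add(key)
        return len(a) == len(b) and all(
            struct_eq(x, y, seen) for x, y in zip(a, b))
    return a == b

x = [1, 2, None]; x[2] = x             # a list that refers to itself
y = [1, 2, None]; y[2] = y
struct_eq(x, y)                        # True, and it terminates
```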
What else does a cool dynamic language need? Dynamic heap allocation and garbage collection, of course! Forest Flame brought in variable-sized lists or vectors with runtime garbage collection using the LISP2 mark/compact algorithm. While it may look simple to just go and clean up unused data, the concepts behind it, from stack frame traversal to defragmenting the heap while cleaning up, were very intriguing pieces of the puzzle.
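After marking, LISP2 compaction runs in three passes: compute a forwarding address for every live object, rewrite every reference to use the forwarding addresses, then slide the objects down. A toy sketch over a heap of fixed two-word objects (the simplified model is mine; real objects are variable-sized):

```python
# Toy LISP2-style compaction over a heap of 2-word objects.
def compact(heap, live, ptrs):
    """heap: flat list of words; live: set of live object offsets;
    ptrs: offsets of slots that hold an object reference."""
    fwd, nxt = {}, 0
    for addr in sorted(live):          # pass 1: forwarding addresses
        fwd[addr] = nxt
        nxt += 2
    for off in ptrs:                   # pass 2: rewrite references
        heap[off] = fwd[heap[off]]
    for addr in sorted(live):          # pass 3: slide objects down
        heap[fwd[addr]:fwd[addr] + 2] = heap[addr:addr + 2]
    return heap[:nxt]

# Object at 0 points to object at 4; the object at 2 is garbage.
compact(["A0", 4, "X", "X", "B0", "B1"], {0, 4}, {1})
# → ["A0", 2, "B0", "B1"]
```

Sliding low-to-high is safe because a live object's forwarding address is never above its current address.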
The extension to this was to implement generational garbage collection, with the heap divided into a nursery and a main heap. This came along with implementing write barriers and remembered sets to make copying nursery collections fast, while also having a full-fledged GC running on the main heap. It was an amazing feeling to implement something similar to the likes of SGen, the GC used by Mono (the open-source .NET framework).
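The write barrier itself is a small check on every heap write: if an old-generation object is made to point at a nursery object, record it, so a nursery collection can find that pointer without scanning the whole main heap. A minimal sketch with assumed structure names:

```python
# Write barrier sketch: record old-to-young pointers in a remembered
# set at the moment they are created.
nursery, old_gen, remembered = set(), set(), set()
fields = {}   # (object, field index) -> stored value

def write_field(obj, field, value):
    fields[(obj, field)] = value
    if obj in old_gen and value in nursery:
        remembered.add(obj)            # old-to-young pointer recorded

old_gen.add("parent")
nursery.add("child")
write_field("parent", 0, "child")
# "parent" is now in the remembered set
```

During a nursery collection, the roots are then the stack plus the remembered set, which is what keeps minor collections fast.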
Finally, coming to the end of this impressive class, we had an optimization challenge to improve the performance of our compilers in various aspects. This meant bringing all of our knowledge into action: the type checker, optimization techniques such as Constant Folding, Constant Propagation, and Dead Code Elimination, generating new IRs for Control Flow Analysis, and so on.
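Two of those passes compose nicely in a few lines. Here is a sketch of constant folding plus a simple dead-binding elimination over the same ("let", name, bound, body) AST shape used above; the pass structure is my own illustration, not the course's IR:

```python
# Constant folding + dead let-binding elimination sketch.
def fold(e):
    if isinstance(e, tuple) and e[0] in ("+", "*"):
        l, r = fold(e[1]), fold(e[2])
        if isinstance(l, int) and isinstance(r, int):
            return l + r if e[0] == "+" else l * r   # fold at compile time
        return (e[0], l, r)
    if isinstance(e, tuple) and e[0] == "let":
        _, name, bound, body = e
        body = fold(body)
        if not uses(body, name):   # dead code: binding never read
            return body            # (safe here since bound exprs are pure)
        return ("let", name, fold(bound), body)
    return e

def uses(e, name):
    if isinstance(e, tuple):
        return any(uses(sub, name) for sub in e[1:])
    return e == name

fold(("let", "x", ("+", 1, 2), ("*", 4, 5)))   # → 20
```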
All of this was done targeting x86_64 on an M1 ARM CPU, with a translation layer in between and all the intricacies of macOS as opposed to Linux. The quantity of information I ingested in this one course is astounding, and I am elated to have had the opportunity to take it.
My implementation of all the things I talked about is on my GitHub. I'll move it to its final home soon™, once I have the chance to refactor and organize it a bit better than I could over the quarter.