Sunday, February 1, 2015

The Incredible Rewards Of Creating a Programming Language

Many of us enjoy seeing programming languages pop up that we might one day use, especially with cool ones like rust (memory safe without GC) and avail (extremely literate programming).

But I would argue that many of us programmers, whether we like these trending languages or not, should start putting time into programming language infrastructure. That infrastructure is what enables us to do what we do every day, and the work is incredibly rewarding.

For the past year I have been creating the programming language Wake, and it is indescribable how cool a side-project it has been.

Language creation forces you to think like both a programmer and a computer at the same time. You will learn things about your favorite programming languages whether you create something on your own, or contribute to a powerhouse like java, or help a new kid on the block like rust, or join the Wake team.

I'm not the only person to think that making compilers and compiler tools can turn you from a great programmer to a best programmer. Here's my story of discovery and accomplishment in creating Wake.

The First Wake Program

When I first started making Wake, I doodled away on a text file, making little syntaxes and commenting them like a madman writing on his padded walls.

I wanted Wake to be a testable, statically typed, compiled language, ensuring any and all code you write will have testable seams via built-in dependency injection. This turned out to be a bit of a challenge for creating the first running wake program, since I had to make a parser, and a code generator, a semantic analyzer, and a dependency injection library all in one before I could so much as print "hello world."

I do not have a formal education, so I turned to Coursera to learn the basics of parsing, type checking, and codegen before getting too far along. I highly recommend it to you too, for getting started.

And at long last, I was able to write, compile, and run the first ever Wake program. It felt like I had created life.

every Main is:
    needs Printer;
    main() {
        Printer.printLine("Hello World!");

This actually ran on my computer. And just seeing the two most overused words in all of computer science, it actually felt like Wake was opening its eyes for the first time, and letting the world know it was here.

Playing With Optional Names Instead Of Optional Types

One of the biggest experiments in Wake is making variable names universally optional in lieu of type information.

Rather than relying on semantic-restricting and/or fallible type inference to keep programs clean, I set to work on a simpler idea: that types are often a great name for variables, which both the programmer and compiler can use to easily and effectively understand most common code.

As Wake became more complete I began writing unit tests in Wake that tested Wake, and found myself constantly stopping to admire my work instead of finishing it. Reading these test let it sink in that I had created a whole little language of my own. Just by skimming these little tests it was clear that they could only be Wake. And even cooler? They all passed.

Notice how not a single variable in this code has a name. They all only have types.

import Asserts;

every AssignmentsTest is:

    testPlusEquals(Asserts) {
        var Num = 5;
        Num += 0;
        Num += 1;
        Num += 5;

This is probably the most controversial idea in Wake. It gets the most positive feedback of any feature when shown in person, and the most negative feedback of any feature when shown on the internet. Don't worry, variable names are allowed absolutely everywhere, they're just, surprisingly, often not that useful.

Its been a staple of Wake syntax, and while I've run into a few syntactic hitches in emulating languages that keep the two as separate as oil and water, I know to ignore the haters because it works, exactly as well as I always dreamed.

This idea also led to the best happy-accident in Wake's design. Months and months later, I realized one day that thanks to this simple idea, I could give Wake the tightest loop syntax ever. In disbelief I began writing yet more working code that reads like a book. See how the compiler automatically changes the array variable WatchedArgument[] to its singular counterpart WatchedArgument for me.

        foreach(WatchedArgument[]) {
            if(!Text[].contains(WatchedArgument.getWatchedType())) {

and it runs. I still, to this day, get an incredibly warm feeling when I review code like this.

A Big Boy Language

Since running loops with an automatic variable name felt so cool, I had to keep going. At this point, I'm just plain hooked.

But not every feature needs to be a brand new idea to feel that mix of pride and power. In fact, that feeling is no more intense anywhere than when I look back to making generics, lambdas, and type inference.

At this point, simply parsing the generics felt like little kids' stuff. But writing the code that understands them, and validates them, with micro transformations based on context...and thinking endlessly about the subtle effects of generic programming...that process is completely unique.

Generics in Wake look much like those in Java or C#. Excuse this example for being a bit long, its an excerpt from Wake's mocking framework wockito. Specifically, this code tracks what return values have been stubbed for a method on a mock, which is a perfect place for generic programming.

every When{T} is:

    with T[] returns = [];
    with Num = 0;

    When{T} -- thenReturn(T) {
        return this;

    T -- get() {
        var T? = returns[?Num];
        if T exists {
            Num += 1;
            return T;
        } else {
            Num = 0.orMaxOf(Num - 1); // go back one, stopping at 0
            return returns[Num]; // will throw when we stopped at 0

I had honestly never expected Wake to get this far. There's still work to be done on generics; issues which I've tackled as the standard library development demands it. But the fact that this code compiles and runs still blows my mind. Some day I will post about all the subtle quirks of generics I've realized, simply by being forced to implement them. I'd wonder how Java handled these corner cases, and write up a little program to test it. And every time, java simply didn't. Its the type of understanding that type-system geeks brag about, and writing compilers forces you to discover these things if you don't know them already.

Interfacing With Javascript

We're still working on the best ways to interface Wake with javascript and other languages, and having closures simply couldn't (and shouldn't!) be avoided.

Closures were yet another far off, likely-never-to-be-implemented sort of idea, that led me down a crazy path of type inference research and even refactoring the typechecker into a different design pattern, but as of a week ago I readied everything to cut the first release of Wake that included closure support.

These are the first anonymous functions written into the wake-on-wake test suite, including type inference in all of its glory...and yet I immediately had a problem.

    ~[test list.filter]~(Asserts) {
        var Text[] = ["hey", "hey", "man"];
        Asserts.that(Text[].filter({ t -> return true; }).getSize())Equals(3);
        Asserts.that(Text[].filter({ t -> return false; }).getSize())Equals(0);

        var filtered Text[] = Text[].filter({ t -> return t != "hey"; });

I could make List{T}.filter(), and .any(), and .sort(), but I could not make List{T}.map(), or .foldr(), or .foldl(), without creating yet another feature in a category of its own.

Generic Methods

Basking in my pride over turning out closures with type inference in such a short amount of time, I decided to try my hand at generic methods before releasing. I simply didn't want to have stupid caveats in my release notes.

It would seem at first that generic methods are very similar to generic classes, and to some degree this is true. After all, List{T}.map() could be written with generic classes only without too much hassle.

every ListMapper{T, R} is:

    R[] -- map(T[], R--fn(T) mapper) {
        var R[] = [];
        foreach(T[]) {
        return R[];

But this solution requires injecting a ListMapper of the right type into your classes, which is simply not good enough for my dream language.

To make the leap from generic classes to generic functions required writing three new recursive algorithms in the type checker, and some tweaking to work with type inference. Here's how it looks now:

every List{T} is:

    {R} R[] -- map(R -- fn(T) mapper) {

And calling it:

Text[].map({ t -> return t.getSize(); });

In our type signature, we have to save R differently than T, because one is essentially a wildcard where the other is just a fancily-tracked Object. The distinction means that within map() you can recurse into map() with any value for R that you want, but T must be the same. Then from outside of map(), T is actually bound to something, for instance, Num in Num[]. And to make matters worse, T can be bound to another generic type or argument.

warning: things are about to get like way detailed

So here's what we actually do:
  • save R as a 'generic argument'
  • typecheck the method as if all Rs are Objects, but only compatible with other Rs
  • upon use of the generic method, typecheck each argument so we know what type was actually passed in (i.e. Num)
  • recurse over each argument's signature type, and the passed-in-type, alongside each other
  • check the inside of nested types, such as lists, lambdas, and other generics
  • bail out upon any structural differences between the signature and actual types (i.e. passing Num[] into R?)
  • whenever we find a generic type, save it by name (i.e. R) to the actual value passed in (i.e. Num)
  • never allow two definitions of these generic types within one structural traversal (i.e. passing anything into Map{R, R}). Instead, assert the redefinition is the same as the previous definition.
  • copy the signature, and rewrite all Rs with the type that we found
  • go on to the next argument
This is the basics of what is required. As a final detail, by waiting to check arguments with lambda declarations until the last step, we can figure figure out that

Text[].foldl({ t, acc -> return acc + t.getSize(); }, 0);

has 0, a Num, passed into the accumulator, of type R. Therefore we take the type of the first argument (R--fn(T,R)) combined with our generic bindings (T=Text and R=Num), to infer the argument types accordingly.

I apologize for overly detailing an algorithm you've probably never cared about before, but I hope I've given you a new sense of appreciation for the types of challenges that come up when creating a compiler, and just how fun those problems can be to solve.

So Where Is Wake Now?

Wake just released v0.1.0, and there is so much more we want it to be able to do.

A lot of the dream features are in, so we're focusing now on the standard library, solving bugs, and javascript integration.

There is still a huge amount of flexibility in where we take it, and some great challenges left to solve. Integrating Wake smoothly with javascript and more is no small challenge, and we warmly invite you all to join us in having a blast designing the interop layer.

There is always a desire for more feedback, and some awesome pending features like currying and recipes and type tags. We have a roadmap describing some of it here:

Of course, if Wake isn't the perfect project for you, then make something on your own, or hack on the Rust compiler, or write some static analysis tools. The realm of possible projects is massive.

So Go Hack On A Compiler!

You'll learn more than you imagined, experience phenomenal pride in your work, and thoroughly strengthen the way you read code.