Sunday, February 1, 2015

The Incredible Rewards Of Creating a Programming Language

Many of us enjoy seeing programming languages pop up that we might one day use, especially with cool ones like rust (memory safe without GC) and avail (extremely literate programming).

But I would argue that many of us programmers, whether we like these trending languages or not, should start putting time into programming language infrastructure. That infrastructure is what enables us to do what we do every day, and the work is incredibly rewarding.

For the past year I have been creating the programming language Wake, and it is indescribable how cool a side-project it has been.

Language creation forces you to think like both a programmer and a computer at the same time. You will learn things about your favorite programming languages whether you create something on your own, or contribute to a powerhouse like java, or help a new kid on the block like rust, or join the Wake team.

I'm not the only person to think that making compilers and compiler tools can turn you from a great programmer to a best programmer. Here's my story of discovery and accomplishment in creating Wake.

The First Wake Program

When I first started making Wake, I doodled away on a text file, making little syntaxes and commenting them like a madman writing on his padded walls.

I wanted Wake to be a testable, statically typed, compiled language, ensuring any and all code you write will have testable seams via built-in dependency injection. This turned out to be a bit of a challenge for creating the first running wake program, since I had to make a parser, and a code generator, a semantic analyzer, and a dependency injection library all in one before I could so much as print "hello world."

I do not have a formal education, so I turned to Coursera to learn the basics of parsing, type checking, and codegen before getting too far along. I highly recommend it to you too, for getting started.

And at long last, I was able to write, compile, and run the first ever Wake program. It felt like I had created life.

every Main is:
    needs Printer;
    main() {
        Printer.printLine("Hello World!");

This actually ran on my computer. And just seeing the two most overused words in all of computer science, it actually felt like Wake was opening its eyes for the first time, and letting the world know it was here.

Playing With Optional Names Instead Of Optional Types

One of the biggest experiments in Wake is making variable names universally optional in lieu of type information.

Rather than relying on semantic-restricting and/or fallible type inference to keep programs clean, I set to work on a simpler idea: that types are often a great name for variables, which both the programmer and compiler can use to easily and effectively understand most common code.

As Wake became more complete I began writing unit tests in Wake that tested Wake, and found myself constantly stopping to admire my work instead of finishing it. Reading these test let it sink in that I had created a whole little language of my own. Just by skimming these little tests it was clear that they could only be Wake. And even cooler? They all passed.

Notice how not a single variable in this code has a name. They all only have types.

import Asserts;

every AssignmentsTest is:

    testPlusEquals(Asserts) {
        var Num = 5;
        Num += 0;
        Num += 1;
        Num += 5;

This is probably the most controversial idea in Wake. It gets the most positive feedback of any feature when shown in person, and the most negative feedback of any feature when shown on the internet. Don't worry, variable names are allowed absolutely everywhere, they're just, surprisingly, often not that useful.

Its been a staple of Wake syntax, and while I've run into a few syntactic hitches in emulating languages that keep the two as separate as oil and water, I know to ignore the haters because it works, exactly as well as I always dreamed.

This idea also led to the best happy-accident in Wake's design. Months and months later, I realized one day that thanks to this simple idea, I could give Wake the tightest loop syntax ever. In disbelief I began writing yet more working code that reads like a book. See how the compiler automatically changes the array variable WatchedArgument[] to its singular counterpart WatchedArgument for me.

        foreach(WatchedArgument[]) {
            if(!Text[].contains(WatchedArgument.getWatchedType())) {

and it runs. I still, to this day, get an incredibly warm feeling when I review code like this.

A Big Boy Language

Since running loops with an automatic variable name felt so cool, I had to keep going. At this point, I'm just plain hooked.

But not every feature needs to be a brand new idea to feel that mix of pride and power. In fact, that feeling is no more intense anywhere than when I look back to making generics, lambdas, and type inference.

At this point, simply parsing the generics felt like little kids' stuff. But writing the code that understands them, and validates them, with micro transformations based on context...and thinking endlessly about the subtle effects of generic programming...that process is completely unique.

Generics in Wake look much like those in Java or C#. Excuse this example for being a bit long, its an excerpt from Wake's mocking framework wockito. Specifically, this code tracks what return values have been stubbed for a method on a mock, which is a perfect place for generic programming.

every When{T} is:

    with T[] returns = [];
    with Num = 0;

    When{T} -- thenReturn(T) {
        return this;

    T -- get() {
        var T? = returns[?Num];
        if T exists {
            Num += 1;
            return T;
        } else {
            Num = 0.orMaxOf(Num - 1); // go back one, stopping at 0
            return returns[Num]; // will throw when we stopped at 0

I had honestly never expected Wake to get this far. There's still work to be done on generics; issues which I've tackled as the standard library development demands it. But the fact that this code compiles and runs still blows my mind. Some day I will post about all the subtle quirks of generics I've realized, simply by being forced to implement them. I'd wonder how Java handled these corner cases, and write up a little program to test it. And every time, java simply didn't. Its the type of understanding that type-system geeks brag about, and writing compilers forces you to discover these things if you don't know them already.

Interfacing With Javascript

We're still working on the best ways to interface Wake with javascript and other languages, and having closures simply couldn't (and shouldn't!) be avoided.

Closures were yet another far off, likely-never-to-be-implemented sort of idea, that led me down a crazy path of type inference research and even refactoring the typechecker into a different design pattern, but as of a week ago I readied everything to cut the first release of Wake that included closure support.

These are the first anonymous functions written into the wake-on-wake test suite, including type inference in all of its glory...and yet I immediately had a problem.

    ~[test list.filter]~(Asserts) {
        var Text[] = ["hey", "hey", "man"];
        Asserts.that(Text[].filter({ t -> return true; }).getSize())Equals(3);
        Asserts.that(Text[].filter({ t -> return false; }).getSize())Equals(0);

        var filtered Text[] = Text[].filter({ t -> return t != "hey"; });

I could make List{T}.filter(), and .any(), and .sort(), but I could not make List{T}.map(), or .foldr(), or .foldl(), without creating yet another feature in a category of its own.

Generic Methods

Basking in my pride over turning out closures with type inference in such a short amount of time, I decided to try my hand at generic methods before releasing. I simply didn't want to have stupid caveats in my release notes.

It would seem at first that generic methods are very similar to generic classes, and to some degree this is true. After all, List{T}.map() could be written with generic classes only without too much hassle.

every ListMapper{T, R} is:

    R[] -- map(T[], R--fn(T) mapper) {
        var R[] = [];
        foreach(T[]) {
        return R[];

But this solution requires injecting a ListMapper of the right type into your classes, which is simply not good enough for my dream language.

To make the leap from generic classes to generic functions required writing three new recursive algorithms in the type checker, and some tweaking to work with type inference. Here's how it looks now:

every List{T} is:

    {R} R[] -- map(R -- fn(T) mapper) {

And calling it:

Text[].map({ t -> return t.getSize(); });

In our type signature, we have to save R differently than T, because one is essentially a wildcard where the other is just a fancily-tracked Object. The distinction means that within map() you can recurse into map() with any value for R that you want, but T must be the same. Then from outside of map(), T is actually bound to something, for instance, Num in Num[]. And to make matters worse, T can be bound to another generic type or argument.

warning: things are about to get like way detailed

So here's what we actually do:
  • save R as a 'generic argument'
  • typecheck the method as if all Rs are Objects, but only compatible with other Rs
  • upon use of the generic method, typecheck each argument so we know what type was actually passed in (i.e. Num)
  • recurse over each argument's signature type, and the passed-in-type, alongside each other
  • check the inside of nested types, such as lists, lambdas, and other generics
  • bail out upon any structural differences between the signature and actual types (i.e. passing Num[] into R?)
  • whenever we find a generic type, save it by name (i.e. R) to the actual value passed in (i.e. Num)
  • never allow two definitions of these generic types within one structural traversal (i.e. passing anything into Map{R, R}). Instead, assert the redefinition is the same as the previous definition.
  • copy the signature, and rewrite all Rs with the type that we found
  • go on to the next argument
This is the basics of what is required. As a final detail, by waiting to check arguments with lambda declarations until the last step, we can figure figure out that

Text[].foldl({ t, acc -> return acc + t.getSize(); }, 0);

has 0, a Num, passed into the accumulator, of type R. Therefore we take the type of the first argument (R--fn(T,R)) combined with our generic bindings (T=Text and R=Num), to infer the argument types accordingly.

I apologize for overly detailing an algorithm you've probably never cared about before, but I hope I've given you a new sense of appreciation for the types of challenges that come up when creating a compiler, and just how fun those problems can be to solve.

So Where Is Wake Now?

Wake just released v0.1.0, and there is so much more we want it to be able to do.

A lot of the dream features are in, so we're focusing now on the standard library, solving bugs, and javascript integration.

There is still a huge amount of flexibility in where we take it, and some great challenges left to solve. Integrating Wake smoothly with javascript and more is no small challenge, and we warmly invite you all to join us in having a blast designing the interop layer.

There is always a desire for more feedback, and some awesome pending features like currying and recipes and type tags. We have a roadmap describing some of it here:

Of course, if Wake isn't the perfect project for you, then make something on your own, or hack on the Rust compiler, or write some static analysis tools. The realm of possible projects is massive.

So Go Hack On A Compiler!

You'll learn more than you imagined, experience phenomenal pride in your work, and thoroughly strengthen the way you read code.


  1. Neither you, or the guy you reference, explain sufficiently why learning to create a programming language would make you better programmer. You can learn how your compiler works by using it, or reading papers.

    Also bad programmers can create programming languages just as well. Bad programmer writing a language doesn't make him a good programmer magically. He simply writes a crap programming language.

    I hope you've read your SICP.

    1. What do you mean by sufficiently? Reading this I can clearly see why creating a programming language gives you a better perspective on both the code you write, how you write it and how the language you use works. I don't think this post is declaring that writing a language makes you a "good programmer magically" simply it, at least for me, demonstrates the benefits and help that it's provided to this person. Frankly, I agree that having a deeper understanding by creating something of your own, hands on, is better than reading about it or using someone else's.

      I'd like to learn more about this language myself!

    2. "I'm not the only person to think that making compilers and compiler tools can turn you from a great programmer to a best programmer." does sounds magical... I believe that crappy languages comes out of crappy programmers and also from good programmers. Putting enough effort they can become good programming languages but that doesn't imply the programmer is progressing at the same rate...

    3. Well, along the lines of "explain sufficiently."

      Let us suppose for the sake of argument that you have never written a compiler.

      If you were then to say, "you can learn how your complier works by using it, or reading papers," and he were to say "no you can't," whom should we believe? Which one of you knows the difference? It is not you, surely. We know that he understands how a compiler works; he has a working compiler to show for it.

      So, in this hypothetical situation, if he says "you can understand programming so much better if you face the challenge of writing a programming language", and you simply reply "no, I already know, I read about in a difficult book, also I know all the gcc command-line options by heart," well, these are genuine accomplishments, but listening to the two of you talk, we can be confident that he knows what you are talking about, but we cannot have the same confidence that you actually know what he is talking about.

      In such a case, you would not come off very sympathetically, because you would be making a bunch of assertions rather than listening to someone who clearly knows more than you do; and on his own blog, too, not even in an argument on Reddit or a mailing list. This would your response to his accomplishment, then, to deny it, from a position that could only be ignorance.

      Now, I know for a fact that you have in fact written a compiler before, or at least taken the first steps towards this, because you mention SICP; this is a book that, if you work through the exercises, will take you step by step through the process of writing a programming language. This is frequently mentioned as one of its great virtues, in fact, that it forces the student to think about programming by considering how programming languages work from the perspective of actually implementing one.

      However, I am afraid, then, that without more explanation, your comment comes off as not really all that helpful or interesting, and actually it reflects rather poorly on you. And here is the interesting thing: it would still reflect poorly on you even if you were right.

      Here is a person who has taken the time to teach what he believes he has learned, and you have recognized and understood his errors: but you have no interest in actually teaching us why he is wrong. If you are going to say something, you should explain from your own experience, from your own knowledge, why he is in error. But you don't do this thing, and so we end up thinking badly of you.

    4. I thought I pointed out very clear what I think is the problem here.

      It is the assumption that implementing a programming language makes you a great programmer. For that to be true, there needs to be desire to study, and preferably having read everything you can about the subject would be good too. Besides, unless it's religion you should cast away beliefs and work on reasons.

      Anyone can implement a programming language. To prove that, here's a forth in python.

      >>> sp = []
      >>> for token in raw_input('> '):
      ... if token == '+':
      ... sp.append(sp.pop() + sp.pop())
      ... else:
      ... sp.append(int(token))
      > 1 2 3 + +
      >>> sp

      It's so simple that you may have done it accidentally couple times, if you haven't done it consciously this far.

      Now the author has probably done bit more than that but I hope to be clear that it's been lot of effort think out programming like it's today. To produce a better language, or to overall learn greatly, you need to read on what people have done before you.

  2. I did a language too a few years ago, and I totally understand your feelings. There's so much things to learn with such a project !

  3. And then, the real learning come from teaching others what you did. Many a bug is found that way. In all of implementation and deign and requirements.

  4. I apologize if this comment gets posted twice. My first submission sent me to a google login page which sent me back here with no indication if my comment had been posted. I'll try to recreate my post from memory:

    It's a shame the first comment is so negative. Wake is the first programming language in a while to make me want to stop what I was doing and immediately try it. It also made me want to try my hand at helping out on the language or even creating my own, so this post was particularly insightful for me.

    Wake looks like the best parts of Java packaged with the expressiveness of Ruby. Keep up the good work!

    1. All that enthusiasm is worth a praise, but I'd still rather see reasonable enthusiasm rather than blind enthusiasm.

      Btw. When you say 'best parts of Java' or 'expressiveness of ruby', what do you mean?

  5. This comment has been removed by a blog administrator.

  6. This comment has been removed by a blog administrator.

  7. This blog awesome and i learn a lot about programming from here.The best thing about this blog is that you doing from beginning to experts level.

    Love from