Sunday, November 29, 2015

Going Open with "Encase"

I've been pondering for a while a concept for a new language. Partly its interesting to me because I want to get more diverse compiler experience than just Wake's set of problems,and partly because its a program I know I would use.

I'm going do something that I didn't do with Wake, that I think in hindsight is actually the best way to start a language -- I'm going to make it open and public from day one.

Now that wake has been officially "alpha released," there are still mountains to do with it. And just the same I have seen interest and I feel like I have made a positive impact on programmers who even just ask the question, "what problem is wake trying to solve?"

With that new experience (along with some advice that its never too early to publish an idea for an open source project), I wanted to share what little I have to say on it yet, and invite you all to critique it -- or, if it ever makes it into other developers hands in a serious way, it will serve as a cool journal of how Encase came to be.

A publicly commentable specification draft of Encase can be found here. It is still a work in progress.

What problem is Encase trying to solve?

I love the command line interface, and one of the reasons why is the magnificent composability of bash. Naturally I have a sort of terminal-formatting fetish, where I'm constantly trying to make my programs look as cool as possible.

But the two concepts are actually very separate. On the one hand, you have awk, cut, and xargs, which cooperate as beautifully as a centaur hugging a unicorn, and on the other hand you have top, less, and vim, which are a set of distant vacation spots. They look great by themselves but you can't be in both Sicily and Venice at the same time. These programs are built in ncurses, a complex GUI framework for the terminal.

I believe that ncurses was not *nix-y enough, and that it is the choice of making ncurses a library that separates the top from the less. I believe that if we could make ncurses into a command in just the right way, you could awk your less and maybe even less your top.

My approach with Encase will be to adapt ncurses into a language. You can write a program that simply spits out a mix of content and Encase, and pipe it into any program you choose, usually that program being an interpreter that renders it for you live as your program runs.

Many immediate uses come to mind.

top # view the encase code raw
top | encase # view it how humans are meant to see it
top | strip-encase > computerFriendlyLog # save JUST the actual content to a file
top | encase-record > topReplay.ncs # "record" a specific run of a program
top | custom_top_reskin.sh | encase # completely re-skin an Encase app??!

The last one specifically may be a stretch; but the great thing about ncurses as a language is it would suddenly just be strings. You could trivially, say, replace all instances of 'red' with 'blue', all borders with one twice as thick, etc etc.

As an added benefit, you would be able to run curses in any language without any required re-learning: python, bash, c, c++, php, ruby, javascript. No new APIs to learn, no special porting or required compilation flags (looking at you, PHP).

And last but not least, if done well it would make a sleek CLI program interface a breeze, no longer requiring you to look up the right ANSI escapes, and in a much more tailored way than a library (ncurses) could ever provide.

What inspired the (perhaps temporary) name?

At first I was going to go with Curse as a blatant ncurses reference, but, I thought it was too negative and too similar to ye olde curses. After all it will be a very different beast in the end.

I then thought about using "jinx" as a synonym of curse, but wanted a stronger hat tilt to ncurses than that. I almost chose "njinxes" but it was very similar to the web server nginx.

Playing still more off of ncurses, I found myself saying "ncase" when trying to play off of its phonetics. Combined with the layout engine inside it that "encases" your content, I decided that "encase" hit on the marks of inspiration, coolness, and brevity.

Some Examples

While literally every idea I'm about to present is subject to change, and/or may never see the light of day, it wouldn't be very "open" of me to tell you the goals (and even describe the name!) without showing what I had in mind.

First of all, one goal is to make it very easy to ignore most (if not all) Encase code if you want to. For this reason, I've decided to make every line of Encase begin with a standard sequence. The rest is output.

$$ this is some Encase code
this is raw output

I chose $ because it is (hopefully somewhat uniformly) rare, and also does not require an escape. I almost used backslash for its rarity, but didn't want it to be heavily escaped in other languages (ie, println("\\ your Encase code")). I chose to use multiple $s so that the chances of ever needing to worry about escaping your output is really really low.

Beyond that, it should have imports, and it should be turing complete (so that your turing complete custom UI components can be shared across languages).

$$ import alert
$$ n + 1 times alert "hello" // turing completeness

Here we assume someone has written a plugin in Encase which creates momentary popup alerts. We can import it (provided its installed on the right path...) and then use it. Notice that we're using a variable (n), doing some math (n + 1), and looping (n + 1 times ...).

I have a feeling the times construct will work very nicely in creating formatted output. I included it in Wake as part of the standard library and liked it; its a great example of the type of constructs one would find in a language tailored to string formatting.

Writers of plugins (such as this alert plugin shown above) would be able to leverage a box layout engine to create the effect of windows, panes, buttons, and more. They would (ideally) all be easily describable in Encase itself even if the layout engine they used was a native layer.

$$ format window title
$$  open border
$$      open horizbox
$$          height 2
$$          open centerAlign
$$              name titleSection
$$          end
$$      end 
$$      open horizbox
$$          open scrollbox
$$              name contentSection
$$          end
$$      end 
$$  end 
$$  render content
$$      use titleSection
$$      clear
$$      print title
$$      width times print "-" 
$$      use contentSection
$$      clear
$$      print content
$$  end 
$$ end

Here we would create a format named window, which would require the title as a parameter. The whole box is inside a border format (which would be equally simple and just draw a box around whatever inner content it gets). Inside that border, we create a horizontal box containing the title and its dividing line, then below that a scrollbox. We name those sections for use later.

When we say open border we can print to it and put other formats inside it. Same thing goes for our window. This is why we say render content, we are going to describe how to render or rerender a window whenever our content or sizing has changed. content in this case is a variable but its quite nicely readable when content is the chosen name.

The rendering engine would buffer this in memory, and at a fixed frame rate it would diff the previous state with subsequent states, performing as few and as optimal updates as possible to keep performance high.

In terms of the layout engine itself, there are many papers on some layout engines and their time complexity. I have yet to choose which I'll use but I'm sure a good one exists.

This leaves us in a nice place to design a layout, name its components, and then as our app runs we can switch panes, change their contents. We should be able to add and remove components on the fly to create very dynamic interfaces.

Problems I'm Still Thinking On

This section will be very disorganized! Muddle through as long as it holds your interest.

Unlike most GUIs, I am deliberately running the visual code in a separate process than the application. This makes conventional two-way binding and/or event handler concepts more or less impossible. For a while I thought my biggest problem would be managing scroll states, but I did realize that I can rely on the mouse (where supported by the terminal) to handle scrolling. Still, this leaves a burden on the app developer to performantly follow the state of the visuals and update them as needed, capture keystrokes and form navigation themselves. If you wanted to add click-to-select-text-box functionality, you'd be out of luck.

I think this might be elegantly solvable by changing top | encase to encase top, and let the encase interpreter call top to watch all I/O. However I'm not sure how exactly this would all work out. Perhaps a fifo streams events into the inner process? Still an unsolved issue. Remember too that the goal is to make programs composable; how could such a fifo be made simple to use in your own way?

I'm also not entirely sure what to make of multithreading. And I don't mean multi threading in terms of efficiently rendering the layouts on a multi core processor, I mean a threaded program where each thread spits out encase code that works perfectly on its own, but doesn't work perfectly when the lines get all jumbled up. On the one hand multi threading support would be difficult to seamlessly add (obviously the interpreter would have to be threadsafe, and have multiple states per thread all interacting at once...). On the other hand it might be clunky. Every thread would need to have its own prefix, and going into no-code mode for printing content would be ambiguous. If Encase ran all lines of code on one thread, anyone doing parallized output would need to manually synchronize their updates into sensical batches of code -- not exactly a minor quirk.

I'm also kind of curious about late binding variables. Notice in the window example, we have a title that is passed in when the window is created. What if you want to change the title? We would probably add in some functions specific to windows that could be called whenever you are inside a window box. But it would be cooler if there was a slick way to make it automatically rerender the title of the window whenever the original variable it came from was changed.

I'm also thinking pretty hard on what types of mistakes would be common to code in encase as its written. Anythnig I can do to avoid confusion (forgetting to end a window, forgetting a rerender function, using formats in the render function instead of the format body and getting lowered performance, etc etc) and make it less obvious. There will be no type system to catch this stuff for the users!


What I'm Excited About

Mostly I'm excited to try to add JIT aspects to the compiler. It may be a challenge for me to get it right cross platform and everything, but I think this is a great example of a language that could use a JIT (all that rerendering from scratch in a necessarily dynamic language!), and I would have a blast even optimizing a few things for a specific architecture.

I'm also excited in that it seems like it would be very easy to whip up some clones of some classic unix utilities like nano, top, and/or some bigger fish like a simple vim clone. It sounds like a cool project...maybe even make a Roguelike or two!

Either way (if I make it and its as good as I hope, or not) it has been fun to think about, and I invite you to comment with your thoughts/suggestions.

Sunday, February 1, 2015

The Incredible Rewards Of Creating a Programming Language

Many of us enjoy seeing programming languages pop up that we might one day use, especially with cool ones like rust (memory safe without GC) and avail (extremely literate programming).

But I would argue that many of us programmers, whether we like these trending languages or not, should start putting time into programming language infrastructure. That infrastructure is what enables us to do what we do every day, and the work is incredibly rewarding.

For the past year I have been creating the programming language Wake, and it is indescribable how cool a side-project it has been.

Language creation forces you to think like both a programmer and a computer at the same time. You will learn things about your favorite programming languages whether you create something on your own, or contribute to a powerhouse like java, or help a new kid on the block like rust, or join the Wake team.

I'm not the only person to think that making compilers and compiler tools can turn you from a great programmer to a best programmer. Here's my story of discovery and accomplishment in creating Wake.

The First Wake Program

When I first started making Wake, I doodled away on a text file, making little syntaxes and commenting them like a madman writing on his padded walls.

I wanted Wake to be a testable, statically typed, compiled language, ensuring any and all code you write will have testable seams via built-in dependency injection. This turned out to be a bit of a challenge for creating the first running wake program, since I had to make a parser, and a code generator, a semantic analyzer, and a dependency injection library all in one before I could so much as print "hello world."

I do not have a formal education, so I turned to Coursera to learn the basics of parsing, type checking, and codegen before getting too far along. I highly recommend it to you too, for getting started.

And at long last, I was able to write, compile, and run the first ever Wake program. It felt like I had created life.

every Main is:
    needs Printer;
    main() {
        Printer.printLine("Hello World!");
    }

This actually ran on my computer. And just seeing the two most overused words in all of computer science, it actually felt like Wake was opening its eyes for the first time, and letting the world know it was here.

Playing With Optional Names Instead Of Optional Types

One of the biggest experiments in Wake is making variable names universally optional in lieu of type information.

Rather than relying on semantic-restricting and/or fallible type inference to keep programs clean, I set to work on a simpler idea: that types are often a great name for variables, which both the programmer and compiler can use to easily and effectively understand most common code.

As Wake became more complete I began writing unit tests in Wake that tested Wake, and found myself constantly stopping to admire my work instead of finishing it. Reading these test let it sink in that I had created a whole little language of my own. Just by skimming these little tests it was clear that they could only be Wake. And even cooler? They all passed.

Notice how not a single variable in this code has a name. They all only have types.

import Asserts;

@TestClass
every AssignmentsTest is:

    @Test
    testPlusEquals(Asserts) {
        var Num = 5;
        Num += 0;
        Asserts.that(Num)Equals(5);
        Num += 1;
        Asserts.that(Num)Equals(6);
        Num += 5;
        Asserts.that(Num)Equals(11);
    }

This is probably the most controversial idea in Wake. It gets the most positive feedback of any feature when shown in person, and the most negative feedback of any feature when shown on the internet. Don't worry, variable names are allowed absolutely everywhere, they're just, surprisingly, often not that useful.

Its been a staple of Wake syntax, and while I've run into a few syntactic hitches in emulating languages that keep the two as separate as oil and water, I know to ignore the haters because it works, exactly as well as I always dreamed.

This idea also led to the best happy-accident in Wake's design. Months and months later, I realized one day that thanks to this simple idea, I could give Wake the tightest loop syntax ever. In disbelief I began writing yet more working code that reads like a book. See how the compiler automatically changes the array variable WatchedArgument[] to its singular counterpart WatchedArgument for me.

        foreach(WatchedArgument[]) {
            if(!Text[].contains(WatchedArgument.getWatchedType())) {
                Text[].push(WatchedArgument.getWatchedType());
            }
        }

and it runs. I still, to this day, get an incredibly warm feeling when I review code like this.

A Big Boy Language

Since running loops with an automatic variable name felt so cool, I had to keep going. At this point, I'm just plain hooked.

But not every feature needs to be a brand new idea to feel that mix of pride and power. In fact, that feeling is no more intense anywhere than when I look back to making generics, lambdas, and type inference.

At this point, simply parsing the generics felt like little kids' stuff. But writing the code that understands them, and validates them, with micro transformations based on context...and thinking endlessly about the subtle effects of generic programming...that process is completely unique.

Generics in Wake look much like those in Java or C#. Excuse this example for being a bit long, its an excerpt from Wake's mocking framework wockito. Specifically, this code tracks what return values have been stubbed for a method on a mock, which is a perfect place for generic programming.

every When{T} is:

    with T[] returns = [];
    with Num = 0;

    When{T} -- thenReturn(T) {
        returns.push(T);
        return this;
    }

    T -- get() {
        var T? = returns[?Num];
        if T exists {
            Num += 1;
            return T;
        } else {
            Num = 0.orMaxOf(Num - 1); // go back one, stopping at 0
            return returns[Num]; // will throw when we stopped at 0
        }
    }

I had honestly never expected Wake to get this far. There's still work to be done on generics; issues which I've tackled as the standard library development demands it. But the fact that this code compiles and runs still blows my mind. Some day I will post about all the subtle quirks of generics I've realized, simply by being forced to implement them. I'd wonder how Java handled these corner cases, and write up a little program to test it. And every time, java simply didn't. Its the type of understanding that type-system geeks brag about, and writing compilers forces you to discover these things if you don't know them already.

Interfacing With Javascript

We're still working on the best ways to interface Wake with javascript and other languages, and having closures simply couldn't (and shouldn't!) be avoided.

Closures were yet another far off, likely-never-to-be-implemented sort of idea, that led me down a crazy path of type inference research and even refactoring the typechecker into a different design pattern, but as of a week ago I readied everything to cut the first release of Wake that included closure support.

These are the first anonymous functions written into the wake-on-wake test suite, including type inference in all of its glory...and yet I immediately had a problem.

    @Test
    ~[test list.filter]~(Asserts) {
        var Text[] = ["hey", "hey", "man"];
        Asserts.that(Text[].filter({ t -> return true; }).getSize())Equals(3);
        Asserts.that(Text[].filter({ t -> return false; }).getSize())Equals(0);

        var filtered Text[] = Text[].filter({ t -> return t != "hey"; });
        Asserts.that(filtered.getSize())Equals(2);
        Asserts.that(filtered[0])Equals("hello");
        Asserts.that(filtered[1])Equals("man");
    }

I could make List{T}.filter(), and .any(), and .sort(), but I could not make List{T}.map(), or .foldr(), or .foldl(), without creating yet another feature in a category of its own.

Generic Methods

Basking in my pride over turning out closures with type inference in such a short amount of time, I decided to try my hand at generic methods before releasing. I simply didn't want to have stupid caveats in my release notes.

It would seem at first that generic methods are very similar to generic classes, and to some degree this is true. After all, List{T}.map() could be written with generic classes only without too much hassle.

every ListMapper{T, R} is:

    R[] -- map(T[], R--fn(T) mapper) {
        var R[] = [];
        foreach(T[]) {
            R[].push(mapper(T));
        }
        return R[];
    }

But this solution requires injecting a ListMapper of the right type into your classes, which is simply not good enough for my dream language.

To make the leap from generic classes to generic functions required writing three new recursive algorithms in the type checker, and some tweaking to work with type inference. Here's how it looks now:

every List{T} is:

    {R} R[] -- map(R -- fn(T) mapper) {
        //...
    }
    //...

And calling it:

Text[].map({ t -> return t.getSize(); });

In our type signature, we have to save R differently than T, because one is essentially a wildcard where the other is just a fancily-tracked Object. The distinction means that within map() you can recurse into map() with any value for R that you want, but T must be the same. Then from outside of map(), T is actually bound to something, for instance, Num in Num[]. And to make matters worse, T can be bound to another generic type or argument.

warning: things are about to get like way detailed

So here's what we actually do:
  • save R as a 'generic argument'
  • typecheck the method as if all Rs are Objects, but only compatible with other Rs
  • upon use of the generic method, typecheck each argument so we know what type was actually passed in (i.e. Num)
  • recurse over each argument's signature type, and the passed-in-type, alongside each other
  • check the inside of nested types, such as lists, lambdas, and other generics
  • bail out upon any structural differences between the signature and actual types (i.e. passing Num[] into R?)
  • whenever we find a generic type, save it by name (i.e. R) to the actual value passed in (i.e. Num)
  • never allow two definitions of these generic types within one structural traversal (i.e. passing anything into Map{R, R}). Instead, assert the redefinition is the same as the previous definition.
  • copy the signature, and rewrite all Rs with the type that we found
  • go on to the next argument
This is the basics of what is required. As a final detail, by waiting to check arguments with lambda declarations until the last step, we can figure figure out that

Text[].foldl({ t, acc -> return acc + t.getSize(); }, 0);

has 0, a Num, passed into the accumulator, of type R. Therefore we take the type of the first argument (R--fn(T,R)) combined with our generic bindings (T=Text and R=Num), to infer the argument types accordingly.

I apologize for overly detailing an algorithm you've probably never cared about before, but I hope I've given you a new sense of appreciation for the types of challenges that come up when creating a compiler, and just how fun those problems can be to solve.

So Where Is Wake Now?

Wake just released v0.1.0, and there is so much more we want it to be able to do.

A lot of the dream features are in, so we're focusing now on the standard library, solving bugs, and javascript integration.

There is still a huge amount of flexibility in where we take it, and some great challenges left to solve. Integrating Wake smoothly with javascript and more is no small challenge, and we warmly invite you all to join us in having a blast designing the interop layer.

There is always a desire for more feedback, and some awesome pending features like currying and recipes and type tags. We have a roadmap describing some of it here: http://wakelang.com/roadmap.html

Of course, if Wake isn't the perfect project for you, then make something on your own, or hack on the Rust compiler, or write some static analysis tools. The realm of possible projects is massive.

So Go Hack On A Compiler!


You'll learn more than you imagined, experience phenomenal pride in your work, and thoroughly strengthen the way you read code.

Friday, January 9, 2015

The Visitor Pattern vs Object Composition

Over the last week or so, I've been going down a large refactoring of the wake compiler. And in all the hours I've spent rewriting the parse tree's data structure, I have yet to choose between two design patterns. Its possible that there is already a third that unifies the two, and if not then I'd like to share this problem as a little hole in computer science.

The design problem is as follows: say you want to traverse a binary tree.

We'd probably do something like:

class BinaryTree {
    Object item;
    BinaryTree left;
    BinaryTree right;
}

From here you have two choices. Either create your algorithms as methods on the tree:

   public Object findByPath(int binaryPath) {
       if(!binaryPath) return item;
       else if(binaryPath & 1) {
            return left.findByPath(binaryPath << 1);
        } else {
            return right.findByPath(binaryPath << 1);
        }
    }

or you could create a class for each algorithm that operates on the tree.

class BinaryPathFinder {

    public Object findByPath(BinaryTree root, int binaryPath) {
         if(!binaryPath) return root.item;
         else if(binaryPath & 1) {
             return findByPath(root.left, binaryPath << 1);
         } else {
             return findByPath(root.right, binaryPath << 1);
         }
     }
}

This could at times be the better approach -- a binary tree has many types of algorithms that could be performed on it, so this separates the algorithms from the data structure and allows users of your API to perform operations on your data that you didn't forsee.

But our tree isn't very typesafe right now. We'd be able to, for instance, set only a left node, or a right node, and/or set both nodes and a value at once. Maybe that's expected, but maybe not. Solving this typesafety is where we create the key differences between the Visitor pattern and plain 'ol object composition.

So, we can easily make it typesafe by going with the first approach:

interface BinaryTreeNode {
    public Object findByPath(int binaryPath);
}

class BinaryTreeLeaf  implements BinaryTreeNode {
    Object item;
    public Object findByPath(binaryPath) {
        return item;
     }
}

class BinaryTreeFork implements BinaryTreeNode {
    BinaryTreeNode left;
    BinaryTreeNode right;
    public Object findByPath(binaryPath) {
         ....
    }
}

But the problem is that we cannot decouple this data structure from its algorithm without actually worsening our typesafety with reverse casts and instanceof checks, turning BinaryTreeNodes into BinayTreeForks and BinaryTreeLeafs. In fact, without our algorithm's method 'findByPath(int)', our BinaryTreeNode interface would be 100% empty and 100% useless.

The Visitor pattern is a simple but clever solution to this problem.

interface BinaryTreeNode {
    public Object visit(BinaryTreeVisitor visitor);
}

interface BinaryTreeVisitor {
    public Object visit(BinaryTreeLeaf leaf);
    public Object visit(BinaryTreeFork fork);
}

class BinaryTreeLeaf implements BinaryTreeNode {
    Object item;
    public Object visit(BinayTreeVisitor visitor) {
        return visitor.visit(this);
    }
}

class BinaryTreeFork implements BinaryTreeNode {
     BinaryTreeNode left;
     BinaryTreeNode right;
     public Object visit(BinaryTreeVisitor visitor) {
         return visitor.visit(this);
     }
}

class BinaryTreeFindByPath implements BinaryTreeVisitor {
    public __construct(int binaryPath) {
        ...
    }

    public Object visit(BinaryTreeLeaf leaf) {
        ...
    }

    public Object visit(BinaryTreeFork fork) {
        ...
    }
}

Note: this may technically be a variation from the visitor pattern. The visitor pattern is often used to attach visitors to the tree in one step and then operate on it separately, while this does both in one step.

This visitor is more complicated here, because it needs to track the state of binaryPath, but the important thing is that its typesafe and decoupled. Especially assuming a stateless algorithm, this could once again be the better choice.

But there's an advantage to the non-visitor solution that gets lost once we make that change. Object composition allows us to create invisible little nifty wrapper classes around BinaryTree. For instance, we could write a quick little caching layer.

class BinaryTreeCache implements BinaryTreeNode {

    public __construct(BinaryTreeNode wrapped) { ... }
    public Object[] cache;
    public Object findByPath(int path) {
        if(cache[path]) return cache[path];
        else return wrapped.findByPath(path);
    }

}

This allows any single node (leaf or binary) to be cached, without caching all of them. You could use inheritance here, but you'd have to choose whether to extend BinaryTreeFork or BinaryTreeLeaf. With a composition based approach, you can extend a leaf, a branch, or one of your other wrapper classes, all seamlessly.

Notice that this behaviour would not work with the visitor pattern. Without extra code for detecting a change in the visitor, you would return an outdated, possibly nonsensical result. This is because we have no longer fully decoupled behaviour from the tree, and it may step on what is supposedly the visitor's job. It's supposed to be the visitors job to handle these behaviour changes, but the visitor can not easily add caching over individual nodes in as nearly elegant of a way.

You could definitely try to mix the two patterns naively, and for some things it would work. For instance, you could make a BinaryTreeNode that lazy-loads its contents from a file, which the visitor would be able to traverse without issue. But any tweaks to the behavior of your algorithm would not be guaranteed to play nicely with the visitor (timeouts, threading, probabilistics, distributing across servers).

Essentially, the object composition approach gives you the ability to separate how to configure an algorithm from the actual moment it runs. If you want a cache at every three layers and a distributed answer for every five, you can put all those if-statements into code that builds or transforms the tree, instead of putting those if-statements inside the code that operates on the tree.

Putting my rambling about the greatness of object composition aside, the advantages here specifically include:
  • Decoupling edge-case code from main use-cases
  • Separating configuration step from operations step
  • Code-driven configuration for maximum flexibility
  • Infinite layerability of edge-case wrappers
  • Easy to extend behaviour without breaking SRP
  • Small classes
  • Possibility of tree transformers that transform your configuration


By giving the visitor the responsibility of behaviour instead of the nodes, we need to be able to compose the visitor in order to compose behavior in these neat ways. But if we wrap or extend the visitor class, we end up changing the way that every leaf or branch is used. I can't see any easy ways around this.

So the question is, could the visitor pattern be upgraded to allow object composition to change the behaviour of the algorithm on a node by node basis? I'd welcome the solution, and it would live happily in the belly of the wake compiler to stand the test of time.