Wednesday, June 27, 2018

What I Learned Making My Own JIT Language

Early last year I started creating a little JIT language named Vaiven. It is a super minimal little language including objects, lists, control flow, functions, ints, and doubles. Vaiven currently includes GC, an interpreter, a first pass assembler, and hot code gets optimized via an SSA optimization pipeline. The assembled code can call back into the interpreter as needed and vice versa. It's a functioning (but tiny) JIT. It was a super fun little project, and I had a number of reasons why I wanted to make it.

After creating Wake, which AOT compiled into JS, I wanted to go from AOT to JIT, from the "assembly" of the web to, well, real assembly!

I imagined a sort of ncurses-as-a-language TUI framework (Text User Interface), and that gave me an excuse to create a new language. I think it will work better as a language than a library for reasons detailed in my original blogpost about why I wanted to make it. This is not entirely crazy, as Elm sort of took the same approach. The TUI part is still completely untouched because it’s really something outside of my expertise. Still, it’s possible that a good, tailored JIT like what I have started with Vaiven will make such a TUI language possible.

I was also a few months into newly joining the Dart team. Dart was created by the minds behind V8. As a perk of my job, I was able to ask some brilliant VM engineers questions, and I wanted to understand the roots of my project language as well as I could.

The current state of Vaiven has some gaping holes — it has no object literal syntax, no load/store optimization, or stack traces, and the GC is a pile of hacks for reasons I wish I had space to fit into this post. Nevertheless, what I was able to do taught me a lot about JITs in general — how to best use them, when to trust or distrust them, how a JIT affects language design. And what I didn't do has arguably taught me even more! I simply needed to write up my thoughts about it.

A Brief Tour of Vaiven

Vaiven has a very keyword-based syntax, and so defining a function looks like this:
fn foo is
Typing that into the debug version of the JIT gives you a bunch of assembly output, shown below. At the moment, functions are eagerly compiled for the first optimization level (which is, almost none). This can be lazy in the future.
push rbp
mov rbp, rsp
mov eax, dword [+94810657183568]        ; mov v0, dword [+94810657183568]
add eax, 1                              ; add v0, 1
mov dword [+94810657183568], eax        ; mov dword [+94810657183568], v0
cmp eax, 10                             ; cmp v0, 10
je L2                                   ; je L2
mov rax, 281474976710656                ; mov v3, 281474976710656
jmp L1
mov rdi, 140730010364832                ; mov v4, 140730010364832
mov rsi, 94810657183664                 ; mov v5, 94810657183664
call 94810634424347
call rax
mov rsp, rbp
pop rbp
Most of this code is going crazy with giant integers — these are magical addresses specific to the compilation of this function. Each function gets, in addition to assembly that can run it, an object that tracks the data types used and waits for the function to get “hot” to optimize it.

The first value, 94810657183568, is the memory address of the call counter. You can see we get the value at that address in dword form (32 bit int). We add 1 to it to track the current function call, and we then store that 32 bit result back in memory.

We then cmp eax, 10. So on the 10th invocation, we je (jump on equal) to L2, which has more fancy big numbers!

These are addresses to the source code, and the profiling info object of that function. We call the optimizer function by address, with the other two pointers being arguments. Lastly, we call rax to execute the newly optimized assembly that we just produced!

In the "cold" case, we do not je (jump equal), so we execute the L10 block. In this case, we load a special 64 bit value which represents the number 0 into the return register and then jump to L1 which is the exit of the function (performs common stack cleanup).

The value 0 here is represented by a special 64 bit boxed value using a method described in this paper. This allows us to store and differentiate 64 bit pointers, 64 bit floats, booleans, null, 32 bit integers, and more, all inside a single 64 bit register.

We can call this function from the interpreter by using the REPL:
which prints:
Int: 0
And after doing so 10 times, we trigger the optimizer:
var x = 0
Int: 0

for x < 10 do foo(); x += 1; end

push rbp
mov rbp, rsp
mov rax, 281474976710656                ; mov v0, 281474976710656
jmp L1
mov rsp, rbp
pop rbp
You can see how the assembly we generate is not perfect; the jmp L1 is useless, but the code isn't aware of that because it is trying to avoid executing the (empty) L2 block.

But otherwise, all we do is save and store the base pointer (for GC purposes — in this case it could be optimized out by a smarter JIT) and move our special 0 value into the return register.

This looks like a huge improvement, but it’s not, because we had nothing to optimize. The only actual change is in the prolog and epilog of the function. But what if we throw a level of indirection at it, and add a parameter, as shown below:
fn bar of x is
  if x == 0
we get a much more significant amount of asm on the first pass:
push rbp
mov rbp, rsp
push rbx
sub rsp, 8
mov eax, dword [+94775382205392]        ; mov v1, dword [+94775382205392]
add eax, 1                              ; add v1, 1
mov dword [+94775382205392], eax        ; mov dword [+94775382205392], v1
cmp eax, 10                             ; cmp v1, 10
je L2                                   ; je L2
mov rax, 140737488355327                ; mov v2, 140737488355327
cmp rdi, rax                            ; cmp v0, v2
jae L4                                  ; jae L4
mov rcx, [rdi]                          ; mov v3, [v0]
shl rcx, 4                              ; shl v3, 4
jmp L3                                  ; jmp L3
mov rax, 2251799813685247               ; mov v2, 2251799813685247
mov rcx, 8                              ; mov v3, 8
cmp rdi, rax                            ; cmp v0, v2
jae L3                                  ; jae L3
mov rcx, rdi                            ; mov v3, v0
shr rcx, 48                             ; shr v3, 48
mov ax, word [+94775382205396]          ; mov v2, word [+94775382205396]
or ax, cx                               ; or v2, v3
mov word [+94775382205396], ax          ; mov word [+94775382205396], v2
mov rsi, 281474976710656                ; mov v4, 281474976710656
mov qword [rsp], rdi                    ; [Save] 
call 94775360292932
test eax, eax                           ; test v5, v5
jz L14                                  ; jz L14
mov rbx, 94775382202000                 ; mov v11, 94775382202000
call [rbx]
mov rdi, rax                            ; [Move] 
call 94775360285928
mov rax, 562949953421312                ; mov v9, 562949953421312
jmp L1
mov qword [rsp], rdi                    ; [Spill] 
mov rdi, 140730134516032                ; mov v12, 140730134516032
mov rsi, 94775382205680                 ; mov v13, 94775382205680
call 94775360237595
mov rdi, qword [rsp]                    ; [Alloc] 
call rax
lea rsp, [rbp-8]
pop rbx
pop rbp
I won't break all of this down, but I will say that this code tracks bits for the data type of x in its hardcoded metadata, it calls a function to compare x and 0 (because comparison is special-cased for Strings), and it looks up foo()'s current compiled address to invoke it. The rest is the same.

Call this function 10 times, passing an int each time, and you get the following:
for x < 10 do bar(x) ; x += 1 end

push rbp
mov rbp, rsp
sub rsp, 16
mov rax, rdi                            ; mov v5, v0
shr rax, 32                             ; shr v5, 32
cmp rax, 65536                          ; cmp v5, 65536
jne L2                                  ; jne L2
cmp edi, 0                              ; cmp v0, 0
jne L14                                 ; jne L5
mov qword [rsp], rdi                    ; [Spill] 
mov rdi, 281474976710656                ; mov v2, 281474976710656
call 93978509257960
mov rax, 562949953421312                ; mov v4, 562949953421312
jmp L1
call 140676951605440
mov rsp, rbp
pop rbp
mov qword [rsp], rdi                    ; [Spill] 
short jmp L5
Obviously this code is shorter, but why?

Firstly, it checks to see that the value passed into x was actually an integer like it has seen before. If it’s wrong, it jumps to L2 to call the deoptimized code which can handle this case. Otherwise it continues uninterrupted to L13.

That block then knows (thanks to the earlier invariant checks) that it can simply and safely run a 32 bit comparison on x against 0, and jumps if not equal (jne) to the function exit. And when x does equal 0, it continues to L4 which is the code for print(foo()).

Note that the call to foo() is inlined (foo always returns 0, remember?), and the special tagged 0 value is passed directly into the print() function, saving call overhead. Both cases (x == 0 or not) load the return register with the special void value, 562949953421312, before exiting.

Lastly, if we run a simple fib.vvn / fib.js comparison benchmark, Vaiven currently beats v8 by a hair and loses to Dart by a step:
[fibvvn] 12.676s
[fibjs] 13.400s
[fibdart] 10.956s
:: fibvvn 0.86X faster than fibdart (10.956 / 12.676)
:: fibvvn 1.06X faster than fibjs (13.400 / 12.676)
You should not be impressed by Vaiven beating v8 here, as it is over 5x slower in my other benchmark (lcf). Still, it shows that there are certain, very limited places where the Vaiven VM can act like a real JIT.

Lesson 1: Deoptimization is the root of all awesome

Before creating Vaiven, I thought of JITs like V8 as optimization engines. And of course that's true, but there's a missing piece to that statement. JITs are actually phenomenal deoptimization engines.

One thing that I have not yet implemented in Vaiven is the ability to profile types across call sites. That means that this:
fn addOne of x is x + 1 end
can be optimized into native machine addition, and this:
fn addOneToResult is getNumber() + 1 end
can not. Why? Because a JIT VM has to go through a very crazy dance when getNumber() does the wrong thing and doesn't return an integer like before. The optimized code has some set of registers that mean certain things at different times during the function execution, and uses those values efficiently. The unoptimized code does too, but those registers and stack values mean completely different things.

When getNumber() returns a string, you are in the middle of one piece of assembly code, and need to hop across that barrier into a different piece of assembly code, and you need to dynamically fix up your registers to the right stuff in order to do it. This is a slow, expensive process, and one that I couldn't do because my assembler's register allocator simply doesn't track enough data (that aside, my assembler, asmjit, was extremely nice to use).

This dance is incredibly important: without it, a JIT has no reason to exist. It’s only the ability to hop from here to there (in Vaiven's case, it can only do this hop before beginning the optimized code) that lets the machine run an efficient instruction set on your variables.

Lesson 2: Inlining is awesome, too

In Vaiven I have gotten by so far without doing on-stack deoptimization because I wrote an inliner. While you can't inline everything, you really should not be afraid of splitting functions up for readability.

If I can write a VM that inlines half decently, then you bet V8 is doing an even better job. That's not to say that there won't be times where manual inlining speeds up your code, it just shows that you should stick to the common principle: premature optimization is the root of all evil. Don't manually inline anything without benchmarking, and don't keep your work (no matter how dear it is to you) if it doesn't give you a sizeable performance bump.

The same goes for constant folding. Please write this code:
var x = 2 * PI;
and not this code:
var x = 6.243.... // 2 * pi
The compiler can do this for you, even in a JIT.

Lesson 3: Be Leery of Micro-Benchmarks

The biggest example of this lesson is that Vaiven can beat V8 on certain benchmarks. Does this make Vaiven a faster language than JS? Absolutely not. No, its faster because I picked it, and I picked it because I knew Vaiven would be able to do it quickly.

Different versions of v8 will have certain bugs, and any optimization pass can get confused by simple things. This is especially true in a JIT, where optimization passes need to run quickly.

Related to lesson two, you never really know what you're testing. You may think that you're testing multiplication here:
for(var x = 0; x < 100000; ++x) {
  x * 2;
but in reality a smart JIT will generate an empty loop, or no loop at all. You have to interfere with the optimizer to prevent this, and how to best interfere with it may be version specific to your JIT, or have certain limitations/caveats/tradeoffs. Any microbenchmark is likely to be fairly flawed.

If you really want to know what the overhead of X is in language Y, you should research the actual implementation. It may have some crazy worst case scenarios, or be very consistent. I know it’s not easy, but it’s better than learning the wrong lesson because a micro benchmark did something sneaky behind the scenes.

There is an optimization called on-stack-replacement which I didn't implement in Vaiven that handles this exact case. In Lesson 1 I talked about deoptimization being a hop from two completely different batches of assembly. On-stack-replacement is like the opposite of that, where a loop hops from the deoptimized version into the optimized version without waiting for a neat barrier like a function call site. Asking the Dart/V8 VM engineers about how hard it is to implement, they laughed. They asked me if I thought it was actually useful for real programs, and said the best reason for its existence is simply to speed up all the microbenchmarks people write, which otherwise would completely lose out on all the optimizations the JIT can provide.

Lesson 4: A lot depends on heuristics

In my VM, I have an optimization counter that detects "hot" code on the 10th run. I believe in Dart that counter is triggered on the third run. Why did I choose 10? Because it felt right at the time and I have no way of validating that any changes I make to that are a help or a hindrance.

I probably should have changed it to match the Dart VM, but there's an interesting thing about that as well: the integer three is not much granularity. Think about that. At some point, some engineer sat down and tried 2. They tried 4. They found 3 was better than the others, committed that, and then moved on to the next thing. Maybe they had lunch.

Now, Dart and V8 also have background threads optimizing code, and they have on-stack-replacement as I mentioned in lesson three. But still, it goes to show that the subtle science of picking what code is or isn't hot, at some point comes down to a meager handfull (or less!) of options. And the best option for one program may not be the best for others, though, one has to choose a single value for all.

This comes up in all sorts of places. Inliners need to guess the chances that a certain function is a good candidate or not; if it’s too eager, it will optimize the function in the current context only to have to throw it away. If it’s too cautious, it will skip long functions which would have been constant folded down into small ones and produced sizeable speed-ups to the program. This is speculative inlining, which I haven't even implemented yet in Vaiven. No, I instead have my own heuristic where I guess how much a function will get optimized. Choosing a candidate is a one-way street and my compiler has no safety mechanism to back out if the candidate leads to code explosion.

Part of why I was able to beat V8 in a fib benchmark is that I also tuned a heuristic for recursive inlining. By inlining a function into itself, I'm able to avoid the deoptimization checks on the way in and out. It’s similar to loop unrolling, and it can be a bad optimization that leads to code explosion. But I had fun implementing it, and it’s a heuristic that is valid as long as I have a benchmark telling me that it is.

The amount of heuristics here means that you should necessarily expect inconsistency across versions of your JIT. It means that any cross language benchmarks may be showing you the difference in how your program operates under a set of heuristic choices rather than measuring that language's "speed."

Even register allocation is a heuristic; by keeping data in registers rather than pushing them onto the stack, programs can run orders of magnitude faster. The problem of allocating a continually evolving set of live data into a fixed number of registers (and spilling the rest to the stack) is reducible to a Graph Coloring problem. Which is np hard. So you can be pretty sure that whatever V8 is doing is an approximation of that, with lots of shortcuts.

When everything that runs on the metal is so deeply affected by heuristics, it makes you wish you could write the assembly code yourself sometimes. There's definitely a part of me that wishes higher level languages like JavaScript and Dart offered ways to hint the inliner and register allocator and optimizer and such. However, the plethora of misguided microbenchmarks out there show that if any such options existed, that they'd be misused by people just trying to get work done, which would ultimately be a disservice. The effectiveness of such hints would be subject to all kinds of other wild factors. JITs are not simple machines.

Lesson 5: Dynamic languages can be damn fast

Lastly, I learned that there really aren't that many limits on what a JIT can do. Vaiven as it exists is more like a C compiler than a JS interpreter. Most of the type checking code involves throwing errors, and I kept it that way because it lets me optimize more aggressively afterwards.

Things like bound checks and overflow checks are not inherently very expensive. Sure, they can be, but there are optimizations that JITs perform that move these checks into spare cycles of your CPU while it’s waiting to fetch or store data into or out of memory. Additionally, if you have a language where a function can stop returning numbers entirely (in order to trigger deoptimization), it's often possible to do sneaky tricks like checking that it returned an integer, and that the integer is within a certain range, at the same time. On that note, why not throw in some arbitrary precision? You're often already paying the cost for it, in an amazingly minimized way.

Switching to AOT compilation can in fact make things common in dynamic programming become suddenly very slow. See this blog post about what Ruby developers need to adapt to if they want to get the high performance promised by Crystal.

Sure, there are things that can be "poison pills" for performance, even in JITs. For example, another reason why my fib benchmark beats V8 is that V8 has to continually check if the fib function was redefined as a deoptimization check. I don't allow that in Vaiven, so I can produce faster code. Dart was designed to have fewer of these poison pills, for instance.

There will always be applications for bare metal programming, and C/C++/Rust/ASM will always have a niche to fill for those cases. Specifically, the ability to optimize data structures themselves, and offering the programmer ways to improve CPU cache performance by increasing memory locality, are the types of things that may never be optimizable by a JIT. I hope that one days JIT get us 99% of the way there, by improving our JITs or the languages we're JITting. We'll see, of course, as both of those things evolve over time.

Closing thoughts

Making a JIT turned out to be much more similar to writing a C compiler than I had thought it would be. The amount of code dedicated to normal SSA optimization is much greater than the amount of code dedicated to logging type info, communicating between compilation strategies/phases, and handling code changing out from under you.

It was absolutely a fun experience, and I am hoping to continue making Vaiven a useful  JIT so I can plug it into other things, or just leave it on a shelf with a pretty frame.

I encourage all readers who do or don't use JITed languages to trust them, even when it seems impossible that they could really be that good. They really are (or can be). Throw out your micro-optimizations, and choose a language that will make your code easy to work on and maintain. Focus on algorithmic performance instead of line counts.

In the end, this is just another blogpost backing the age old adage, coined by Donald Knuth, that premature optimization is the root of all evil.

Editing Credits: Dulce Flores Cruz, Wm Leler

Tuesday, March 27, 2018

What I did last year at Google: IDE Tooling for AngularDart 5

I haven't blogged in over a year now, because I ended up joining Google to work on Dart, where I've been slowly easing into my opinionated state while learning from the experts.

I've also been really really focused on this new tooling which I've already announced elsewhere but wanted to share here too!


Dart is a statically typed language which I've grown to really enjoy using, and Dart 2 will be even more strictly typed. That is why it makes such a great platform for the Angular framework, where components are mostly statically linked together in templates to create performant UIs.

So, to further improve the productive and safe developer experience we've created for developers, I got the chance to work on IDE tools for preserving that type-safety inside of AngularDart templates! It requires AngularDart 5, and works out of the box with IntelliJ/WebStorm. It can also be configured to work in VSCode, vim, and more.

It's been a blast to create, and I am so fortunate to have had the chance to do this all open source. Let's see what it offers so far!


The new analysis plugin will find many type errors inside your templates for you. Expressions are validated against the directives you use, the inputs they contain, and the references you bind (#foo, let item of, …).

Here’s a misspelled member, so you don’t have to play human-spellchecker!

And mismatched types on a component input are no longer a problem either.

We can also give you errors related to $event variable types,

And we check the content you transclude inside your directives. This is one example of where we catch not only type errors but also dead code.

And we could go on! We catch a slew of other types of errors, both in templates, and component definitions.

In addition to being highlighted in your editor, the full listing of errors is displayed in the Dart Analysis panel.


We didn’t just stop at validation! While much of the work was getting our analysis to be performant and fully well-behaved in regards to the compiler, much of the value comes from using that resolved template state to deliver you the goodness of regular Dart autocompletion in your templates:

But we can complete more than regular Dart members — how about inputs, outputs, and HTML tags?

Tags? Check.

Attributes? Check.

Stars? Check.

Attributes within stars? Check.

Bonus points: Suggested tags with attributes within component transclusions!

Transclusions are a famously advanced topic within angular, so if you think you know what exactly it is that we’re suggesting, pat yourself on the back for knowing AngularDart very well.

Suffice it to say, component APIs are much easier to follow when using our plugin!


Support here varies a lot more by editor, but within IntelliJ you can click through parts of Dart expressions. In other editors you may be able to navigate on even more entities such as tags and inputs/outputs.

How To Use

Make sure you are using an AngularDart 5 beta release, and Dart 2.0.0-dev.31 or newer.

Simply add this to your analysis_options.yaml file:
      enabled: true
And restart your IDE. Note, it may take a few seconds to spin up the plugin. It will first have to download sources & dependencies for you, and the first analysis will be slower than subsequent ones.

You can also play around with it in our analyzer plugin demo project.

If you are using IntelliJ, this should be all you have to do. For other editors, they may not run our plugin on HTML without extra work — for instance, VSCode requires a flag.

Closing Thoughts

If you have never tried the Dart language, and enjoy or want to explore the angular framework, I definitely recommend AngularDart which is very popular within Google. For anyone just learning, I hope that the IDE tooling here can play interactive-tutor to get you off the ground quickly!

Our plugins source code lives on Github, which has more information on what’s supported and where you can file issues if you have questions or come across any bugs.

The new IDE tooling will be part of the next stable release of AngularDart (v5). It's been a blast to work on offering even better productivity for the AngularDart community.

And I hope I check in here with updates more regularly since I'd say this was a pretty interesting last year and a half!

Thursday, January 19, 2017

Thank You For Admitting on the Internet...the Stupid Things You've Done (3D Printer Edition)

I'd like to keep this blog post a short and sweet, I mean long and winding Thank You, to everyone who's ever admitted somewhere on the internet that they've done something stupid. One such person saved me a lot of hassle and a lot of money on my Printrbot Simple Metal.

The exact issue I had? Mysteriously, it would stop two to three layers into my prints. Are you having this issue? Please don't read my rambling. skip to the bottom for the solution that worked for me. Oh, and then come back and read the rambling because I worked hard on it.

I just moved to San Francisco from Portland to work from Google a few months ago. In the process, I got full custody of the 3D printer that my brother and I had bought together. Woo hoo!

A few months in, I start really getting the itch to print something again. I know, months. I'm not really a hardcore user. That will be obvious by the end of the post. Though I had a few things going against me: I had recently ruined my Fedora install in a botched attempt to install 32 bit libs (which had my slicer, modeler, favorite settings...) and an even more botched attempt to fix the failure. If I were to get that working again, I had other more pressing things to do with more important data on that disk. I didn't have a lot of time because it as a new job. And even where to put the printer was up in the air. No, not literally. It ended up on a desk.

Side note, the best part of unpacking our movers' good work once we got settled in the right place in San Francisco, was that everything was labeled by room & general contents: books, dishes, clothes. No, that's not the best part. The best part is the 3D printer was in a box labeled "complicated machine." Which it certainly is. They did their best to pack it well, but the filament strand into the printer broke, leaving likely all kinds of twists in the spool during hundreds of miles of bumpy transport. I had just fixed a twisted spool of pla, and its the simple things that discourage you. Still, better than when the UPS people delivered it to us with the words "THIS SIDE UP" in big letters and completely upside down.

So I officially missed the Christmas season for printing presents and decorations, but alas, there's always one next year.

So two days ago I found the cables, installed the modeler and slicer anew and designed my model. I went to print, and a few minor hitches later got underway on breaking the fast that is my 3D printing hobby. Gotta love breakfast.

But two layers in, it just stopped. Shit.

I of course first simply tried again. Still shit.

USB connection? Nope -- tried it on a microUSB card and it had the same issue.

At this point I reasoned, it could be an issue related to the gcode. Either corrupt gcode from the slicer, or a hardware failure that happens for a particular instruction. Except, no, because the same exact print would fail at different times.

I mean, I'm a smart person usually, and I did a bunch of smart things.

I double-checked that no wires were loose coming into the board. I think that was really smart of me. I did indeed later find that as one of the official instructions for what to do when you have this problem. I smiled that I had beaten that advice to the punch.

But still, no luck. I had already, at this point, dialed up the cost of a replacement board, and a replacement power supply. $80+ was not what I wanted to see. But sometimes that's just how 3D printers roll.

So just before calling it quits, and ordering a part, I decided to read through a long long long long long (like, longer than this blogpost telling one stupid dumb story long), long forum thread about people having this issue. And by the grace of God, there was an idiot. On the thread. An idiot who...admitted it.

Blah blah blah blah, "and oh wow do I feel dumb" were his words. Basically, he had two different power supplies under his desk. One of them was the right one. That is not what he plugged in to his computer.

With less power available, it can only keep the hothead hot and the servos spinning at high speed for so long. Start a print, and you're ticking down until the powersupply is overwhelmed, its voltage drops, and then suddenly...the board crashes. And then recovers, because the power requirements immediately go down. Tries to print again.

At this point it is absolutely crucial that you reread this already really long blogpost about one tiny thing, from the top. You don't want to? Really? Ok, I'll give you a spoiler; I mentioned something about unboxing the cables...

Turns out my Logitech GT25 or something racing wheel has a DC power supply with the same connector as my Printrbot Simple Metal. So that's what I had grabbed instead of the right one. Much like my equally dumb friend on the internet.

Did my dumb friend on the internet have to post that he was dumb? Did he have to post his solution at all? Couldn't he have worded it in the highly-ignorable-but-face-saving-manner of "just check the right cable is plugged in, you jerks!" Yes. And unlike most people on the internet, that's not what he did.

He did something dumb that day and he admitted it.

I did something dumb and I'm admitting it.

The next time you do something dumb you should post it on the internet, too. You might just save someone $80+, time, stress, and the hairloss that comes with it. We're all stupid, and it helps to admit it.

Tuesday, June 21, 2016

Else If Is Not Special (Except in Python)

Prepare yourselves, for I am about to make a code formatting argument, which is almost always certainly a waste of time. But this formatting argument is rooted in a truth of how programming languages work, and as a programming language designer, I just feel too damn compelled to speak up.

Elif Hell

OK OK, so you want to know why else if is not special, probably about 101x more than you want to know how I happen to horizontally and vertically align my code using spaces, tabs, or god forbid a mixture of both. I don't blame you. But forgive me delaying the good parts for just a moment, and let me plunge you into the deep dark depths of code formatting hell. Hell is fun too right?

This is a simple hell, but imagine, if you will, that else if had not yet been invented.

Even in the real world today, using lots of else ifs is an antipattern. But here in elif hell, lots of else ifs is a cause of tears, stress wrinkles, and death.

if (statusCode == 200) {
  // ...
} else {
  if (statusCode == 201) {
    // ...
  } else {
    if (statusCode == 202) {
      // ...
    } else {
      if (statusCode == 203) {
        // ...
      } else {
        // Nesting Continues Forever In Elif Hell

In this hell, for anything more than three else ifs you can get fired unless you resort to needlessly complicating the runtime in order to not require else if. Imagine, someone writing a library that does for elif hell what promises did for callback hell. In this case, the runtime is not the problem and a runtime solution would be crazy.

But before you start praying to the elif gods for salvation, you can take in a breath of fresh air, because in most languages, elif hell could not exist.

Defining If And Else

The standard definition of if/else looks like this:

ifstmt -> 'if' '(' expr ')' stmt_or_block
       | 'if' '(' expr ')' stmt_or_block 'else' stmt_or_block

Now, I wouldn't be able to call myself a language designer if I didn't tell you that this has an ambiguity which will make parser generators rear their ugly heads and tell you it has a shift/reduce conflict. The problem is related to allowing statements after if conditions, allowing the parser to see if(...) if(...) if(...) if(...) else ... In this case, the parse rules above could in theory match the else against any of the four if statements that had been opened. This is called the "dangling else" problem.

In practice, choosing to match the else to the last opened if statement (shift over reduce) works like a charm, which is how c, c++, java, php, and javascript, etc all work. This approach has been working for decades. Others demand blocks as if it were the only way to catch the infamous "goto fail" bug, a request I both sympathize with and intend to slightly complicate later.

So long as else may be followed by a stmt_or_block, elif hell cannot exist. The following code is, by these simple rules, a natural phenomenon which parses down according to how I commented it.

if (statusCode == 200) {
  // within an if
} else if (statusCode == 201) {
  // within an if within an else after an if
} else if (statusCode == 202) {
  // within an if within an else after an if within an else after an if
} else if (statusCode == 203) {
  // within an if within an else after an if within an else after an if within
  // an else after an if
} else {
  // within an else after an if within an else after an if within an else after
  // an if within an else after an if

Although it gets complicated in the final parse tree, code which is parsed recursively need not be formatted in such a way. This is absolutely how else ifs should be formatted, not nested forever and ever, and this style springs right out of the ground based on the most intuitive definitions of else/if, no help needed.

There is nothing special here. Usually. (I'm looking at you, python, bash, ruby...)

An Unnamed Hell We Live In Today

There is a lesson to be learned here, from a language designer's perspective, though I do not know who to give credit for seeing through the stigma and finding it. All I know is I first saw it in action while browsing through the code of gnome-terminal.

Although else if seems like a first-class language concept, it absolutely is not. It is just a combination of two language constructs that serve a purpose together. And let me tell you, else and if are not the only constructs with useful combinations.

Would you enjoy a language that had a loop where feature? An if then loop feature? A loop case feature, or a loop loop feature? Would you be interested in someone like me designing one?

Programming is all about combining constructs, and these constructs are ripe to be combined. They are literally...figuratively begging you to combine them. And the general programming community has sworn itself into some kind of hell we don't yet have a name for, where we write code recursively with nesting where we need not.

With that grandeur, I present code formatted in a way that you will probably hate.

// loop where
for (item in items)
if (item.isHub()) {
  // ...

// if then loop
if (items.containsHub())
for (item in items) {
  // ...

// loop case
for (item in items)
switch(...) {
   case ...
   // ...

// loop loop
for (item in items)
for (tokens in item.getTokens()) {
  // ...

These statements are no different from an if else, and are waiting eagerly to be used.

While you may not find them palatable at first, I implore you to remember how you first felt about camel case, or snake case, or curly braces, or whitespace significance, or spaces around your operators. There must be some type of formatting that you remember hating at first, which has grown on you since then. I urge you to give this formatting style half the chance you gave your last preference change.

Among the reasons of why this formatting style really is worth considering, is that the lack of nesting not only looks simple, but it calls out the simplicity of how the two control statements are 100% linked. That is, 100% of the code within the inner control structure makes up 100% of code within the outer control structure. Once delimiters are added and indentation is used, that should signify that that type of linked relationship is not true, and the code requires more careful reading.

I do not expect this formatting style to win, because the fact is that it's already out there, and not spreading, not catching on. And who am I to call that a loss for us programmers?

Well, I would ask in turn, who are you to not give it a chance?

Obligatory Why-Did-You-Rip-On-Python-And-Ruby Addendum

How Python and Ruby etc were designed is really not the goal of this blog post, but it is worth calling out and exploring the unexpected consequences of their design choices. After all, every language design choice is a tradeoff, and it is hardly insulting either language to explore that.

As I said earlier, else if actually is special when else cannot be followed by a statement instead of a block. Python and ruby have specific parse rules designed to handle else ifs to prevent elif hell, and that's part of why they don't use the keywords else if but rather elif and elsif. For the same reasons, in Python and Ruby, you cannot write if for, or for if, etc., without using unnecessary nesting, or waiting for them to be manually indoctrinated in the language.

In Python's case this trick simply cannot work, and perhaps that is OK, as whitespace significance does carry some inarguable benefits. In Ruby's case, it is more an expression of how programmers prefer keywords to symbols, so much that curly brace languages have failed make them required due to "verbosity" while keyword based languages have managed to convince everyone to type three alphabetic characters to end everything in the pursuit of readability.

What gets in the way is how these languages try to do us a favor. It does seem to myself and to many people that allowing if statements and else statements without blocks is an unnecessary risk. You risk running into the goto fail bug, and in fact, I myself never omit curly braces except when combining constructs as listed above.

In any case, we need not throw the baby out with the bathwater. Perhaps if the languages which support this formatting style already begin to use it and champion its benefits, then maybe our next round of languages and our current generation of linters can be made to accept control flow combinations but not lone statements otherwise, and maybe we will be all the slightly better off for it.

In either case, for most languages, else if is not special, and perhaps the trend of treating it as such is a step backwards.

Sunday, November 29, 2015

Going Open with "Encase"

I've been pondering for a while a concept for a new language. Partly its interesting to me because I want to get more diverse compiler experience than just Wake's set of problems,and partly because its a program I know I would use.

I'm going do something that I didn't do with Wake, that I think in hindsight is actually the best way to start a language -- I'm going to make it open and public from day one.

Now that wake has been officially "alpha released," there are still mountains to do with it. And just the same I have seen interest and I feel like I have made a positive impact on programmers who even just ask the question, "what problem is wake trying to solve?"

With that new experience (along with some advice that its never too early to publish an idea for an open source project), I wanted to share what little I have to say on it yet, and invite you all to critique it -- or, if it ever makes it into other developers hands in a serious way, it will serve as a cool journal of how Encase came to be.

A publicly commentable specification draft of Encase can be found here. It is still a work in progress.

What problem is Encase trying to solve?

I love the command line interface, and one of the reasons why is the magnificent composability of bash. Naturally I have a sort of terminal-formatting fetish, where I'm constantly trying to make my programs look as cool as possible.

But the two concepts are actually very separate. On the one hand, you have awk, cut, and xargs, which cooperate as beautifully as a centaur hugging a unicorn, and on the other hand you have top, less, and vim, which are a set of distant vacation spots. They look great by themselves but you can't be in both Sicily and Venice at the same time. These programs are built in ncurses, a complex GUI framework for the terminal.

I believe that ncurses was not *nix-y enough, and that it is the choice of making ncurses a library that separates the top from the less. I believe that if we could make ncurses into a command in just the right way, you could awk your less and maybe even less your top.

My approach with Encase will be to adapt ncurses into a language. You can write a program that simply spits out a mix of content and Encase, and pipe it into any program you choose, usually that program being an interpreter that renders it for you live as your program runs.

Many immediate uses come to mind.

top # view the encase code raw
top | encase # view it how humans are meant to see it
top | strip-encase > computerFriendlyLog # save JUST the actual content to a file
top | encase-record > topReplay.ncs # "record" a specific run of a program
top | | encase # completely re-skin an Encase app??!

The last one specifically may be a stretch; but the great thing about ncurses as a language is it would suddenly just be strings. You could trivially, say, replace all instances of 'red' with 'blue', all borders with one twice as thick, etc etc.

As an added benefit, you would be able to run curses in any language without any required re-learning: python, bash, c, c++, php, ruby, javascript. No new APIs to learn, no special porting or required compilation flags (looking at you, PHP).

And last but not least, if done well it would make a sleek CLI program interface a breeze, no longer requiring you to look up the right ANSI escapes, and in a much more tailored way than a library (ncurses) could ever provide.

What inspired the (perhaps temporary) name?

At first I was going to go with Curse as a blatant ncurses reference, but, I thought it was too negative and too similar to ye olde curses. After all it will be a very different beast in the end.

I then thought about using "jinx" as a synonym of curse, but wanted a stronger hat tilt to ncurses than that. I almost chose "njinxes" but it was very similar to the web server nginx.

Playing still more off of ncurses, I found myself saying "ncase" when trying to play off of its phonetics. Combined with the layout engine inside it that "encases" your content, I decided that "encase" hit on the marks of inspiration, coolness, and brevity.

Some Examples

While literally every idea I'm about to present is subject to change, and/or may never see the light of day, it wouldn't be very "open" of me to tell you the goals (and even describe the name!) without showing what I had in mind.

First of all, one goal is to make it very easy to ignore most (if not all) Encase code if you want to. For this reason, I've decided to make every line of Encase begin with a standard sequence. The rest is output.

$$ this is some Encase code
this is raw output

I chose $ because it is (hopefully somewhat uniformly) rare, and also does not require an escape. I almost used backslash for its rarity, but didn't want it to be heavily escaped in other languages (ie, println("\\ your Encase code")). I chose to use multiple $s so that the chances of ever needing to worry about escaping your output is really really low.

Beyond that, it should have imports, and it should be turing complete (so that your turing complete custom UI components can be shared across languages).

$$ import alert
$$ n + 1 times alert "hello" // turing completeness

Here we assume someone has written a plugin in Encase which creates momentary popup alerts. We can import it (provided its installed on the right path...) and then use it. Notice that we're using a variable (n), doing some math (n + 1), and looping (n + 1 times ...).

I have a feeling the times construct will work very nicely in creating formatted output. I included it in Wake as part of the standard library and liked it; its a great example of the type of constructs one would find in a language tailored to string formatting.

Writers of plugins (such as this alert plugin shown above) would be able to leverage a box layout engine to create the effect of windows, panes, buttons, and more. They would (ideally) all be easily describable in Encase itself even if the layout engine they used was a native layer.

$$ format window title
$$  open border
$$      open horizbox
$$          height 2
$$          open centerAlign
$$              name titleSection
$$          end
$$      end 
$$      open horizbox
$$          open scrollbox
$$              name contentSection
$$          end
$$      end 
$$  end 
$$  render content
$$      use titleSection
$$      clear
$$      print title
$$      width times print "-" 
$$      use contentSection
$$      clear
$$      print content
$$  end 
$$ end

Here we would create a format named window, which would require the title as a parameter. The whole box is inside a border format (which would be equally simple and just draw a box around whatever inner content it gets). Inside that border, we create a horizontal box containing the title and its dividing line, then below that a scrollbox. We name those sections for use later.

When we say open border we can print to it and put other formats inside it. Same thing goes for our window. This is why we say render content, we are going to describe how to render or rerender a window whenever our content or sizing has changed. content in this case is a variable but its quite nicely readable when content is the chosen name.

The rendering engine would buffer this in memory, and at a fixed frame rate it would diff the previous state with subsequent states, performing as few and as optimal updates as possible to keep performance high.

In terms of the layout engine itself, there are many papers on some layout engines and their time complexity. I have yet to choose which I'll use but I'm sure a good one exists.

This leaves us in a nice place to design a layout, name its components, and then as our app runs we can switch panes, change their contents. We should be able to add and remove components on the fly to create very dynamic interfaces.

Problems I'm Still Thinking On

This section will be very disorganized! Muddle through as long as it holds your interest.

Unlike most GUIs, I am deliberately running the visual code in a separate process than the application. This makes conventional two-way binding and/or event handler concepts more or less impossible. For a while I thought my biggest problem would be managing scroll states, but I did realize that I can rely on the mouse (where supported by the terminal) to handle scrolling. Still, this leaves a burden on the app developer to performantly follow the state of the visuals and update them as needed, capture keystrokes and form navigation themselves. If you wanted to add click-to-select-text-box functionality, you'd be out of luck.

I think this might be elegantly solvable by changing top | encase to encase top, and let the encase interpreter call top to watch all I/O. However I'm not sure how exactly this would all work out. Perhaps a fifo streams events into the inner process? Still an unsolved issue. Remember too that the goal is to make programs composable; how could such a fifo be made simple to use in your own way?

I'm also not entirely sure what to make of multithreading. And I don't mean multi threading in terms of efficiently rendering the layouts on a multi core processor, I mean a threaded program where each thread spits out encase code that works perfectly on its own, but doesn't work perfectly when the lines get all jumbled up. On the one hand multi threading support would be difficult to seamlessly add (obviously the interpreter would have to be threadsafe, and have multiple states per thread all interacting at once...). On the other hand it might be clunky. Every thread would need to have its own prefix, and going into no-code mode for printing content would be ambiguous. If Encase ran all lines of code on one thread, anyone doing parallized output would need to manually synchronize their updates into sensical batches of code -- not exactly a minor quirk.

I'm also kind of curious about late binding variables. Notice in the window example, we have a title that is passed in when the window is created. What if you want to change the title? We would probably add in some functions specific to windows that could be called whenever you are inside a window box. But it would be cooler if there was a slick way to make it automatically rerender the title of the window whenever the original variable it came from was changed.

I'm also thinking pretty hard on what types of mistakes would be common to code in encase as its written. Anythnig I can do to avoid confusion (forgetting to end a window, forgetting a rerender function, using formats in the render function instead of the format body and getting lowered performance, etc etc) and make it less obvious. There will be no type system to catch this stuff for the users!

What I'm Excited About

Mostly I'm excited to try to add JIT aspects to the compiler. It may be a challenge for me to get it right cross platform and everything, but I think this is a great example of a language that could use a JIT (all that rerendering from scratch in a necessarily dynamic language!), and I would have a blast even optimizing a few things for a specific architecture.

I'm also excited in that it seems like it would be very easy to whip up some clones of some classic unix utilities like nano, top, and/or some bigger fish like a simple vim clone. It sounds like a cool project...maybe even make a Roguelike or two!

Either way (if I make it and its as good as I hope, or not) it has been fun to think about, and I invite you to comment with your thoughts/suggestions.

Sunday, February 1, 2015

The Incredible Rewards Of Creating a Programming Language

Many of us enjoy seeing programming languages pop up that we might one day use, especially with cool ones like rust (memory safe without GC) and avail (extremely literate programming).

But I would argue that many of us programmers, whether we like these trending languages or not, should start putting time into programming language infrastructure. That infrastructure is what enables us to do what we do every day, and the work is incredibly rewarding.

For the past year I have been creating the programming language Wake, and it is indescribable how cool a side-project it has been.

Language creation forces you to think like both a programmer and a computer at the same time. You will learn things about your favorite programming languages whether you create something on your own, or contribute to a powerhouse like java, or help a new kid on the block like rust, or join the Wake team.

I'm not the only person to think that making compilers and compiler tools can turn you from a great programmer to a best programmer. Here's my story of discovery and accomplishment in creating Wake.

The First Wake Program

When I first started making Wake, I doodled away on a text file, making little syntaxes and commenting them like a madman writing on his padded walls.

I wanted Wake to be a testable, statically typed, compiled language, ensuring any and all code you write will have testable seams via built-in dependency injection. This turned out to be a bit of a challenge for creating the first running wake program, since I had to make a parser, and a code generator, a semantic analyzer, and a dependency injection library all in one before I could so much as print "hello world."

I do not have a formal education, so I turned to Coursera to learn the basics of parsing, type checking, and codegen before getting too far along. I highly recommend it to you too, for getting started.

And at long last, I was able to write, compile, and run the first ever Wake program. It felt like I had created life.

every Main is:
    needs Printer;
    main() {
        Printer.printLine("Hello World!");

This actually ran on my computer. And just seeing the two most overused words in all of computer science, it actually felt like Wake was opening its eyes for the first time, and letting the world know it was here.

Playing With Optional Names Instead Of Optional Types

One of the biggest experiments in Wake is making variable names universally optional in lieu of type information.

Rather than relying on semantic-restricting and/or fallible type inference to keep programs clean, I set to work on a simpler idea: that types are often a great name for variables, which both the programmer and compiler can use to easily and effectively understand most common code.

As Wake became more complete I began writing unit tests in Wake that tested Wake, and found myself constantly stopping to admire my work instead of finishing it. Reading these test let it sink in that I had created a whole little language of my own. Just by skimming these little tests it was clear that they could only be Wake. And even cooler? They all passed.

Notice how not a single variable in this code has a name. They all only have types.

import Asserts;

every AssignmentsTest is:

    testPlusEquals(Asserts) {
        var Num = 5;
        Num += 0;
        Num += 1;
        Num += 5;

This is probably the most controversial idea in Wake. It gets the most positive feedback of any feature when shown in person, and the most negative feedback of any feature when shown on the internet. Don't worry, variable names are allowed absolutely everywhere, they're just, surprisingly, often not that useful.

Its been a staple of Wake syntax, and while I've run into a few syntactic hitches in emulating languages that keep the two as separate as oil and water, I know to ignore the haters because it works, exactly as well as I always dreamed.

This idea also led to the best happy-accident in Wake's design. Months and months later, I realized one day that thanks to this simple idea, I could give Wake the tightest loop syntax ever. In disbelief I began writing yet more working code that reads like a book. See how the compiler automatically changes the array variable WatchedArgument[] to its singular counterpart WatchedArgument for me.

        foreach(WatchedArgument[]) {
            if(!Text[].contains(WatchedArgument.getWatchedType())) {

and it runs. I still, to this day, get an incredibly warm feeling when I review code like this.

A Big Boy Language

Since running loops with an automatic variable name felt so cool, I had to keep going. At this point, I'm just plain hooked.

But not every feature needs to be a brand new idea to feel that mix of pride and power. In fact, that feeling is no more intense anywhere than when I look back to making generics, lambdas, and type inference.

At this point, simply parsing the generics felt like little kids' stuff. But writing the code that understands them, and validates them, with micro transformations based on context...and thinking endlessly about the subtle effects of generic programming...that process is completely unique.

Generics in Wake look much like those in Java or C#. Excuse this example for being a bit long, its an excerpt from Wake's mocking framework wockito. Specifically, this code tracks what return values have been stubbed for a method on a mock, which is a perfect place for generic programming.

every When{T} is:

    with T[] returns = [];
    with Num = 0;

    When{T} -- thenReturn(T) {
        return this;

    T -- get() {
        var T? = returns[?Num];
        if T exists {
            Num += 1;
            return T;
        } else {
            Num = 0.orMaxOf(Num - 1); // go back one, stopping at 0
            return returns[Num]; // will throw when we stopped at 0

I had honestly never expected Wake to get this far. There's still work to be done on generics; issues which I've tackled as the standard library development demands it. But the fact that this code compiles and runs still blows my mind. Some day I will post about all the subtle quirks of generics I've realized, simply by being forced to implement them. I'd wonder how Java handled these corner cases, and write up a little program to test it. And every time, java simply didn't. Its the type of understanding that type-system geeks brag about, and writing compilers forces you to discover these things if you don't know them already.

Interfacing With Javascript

We're still working on the best ways to interface Wake with javascript and other languages, and having closures simply couldn't (and shouldn't!) be avoided.

Closures were yet another far off, likely-never-to-be-implemented sort of idea, that led me down a crazy path of type inference research and even refactoring the typechecker into a different design pattern, but as of a week ago I readied everything to cut the first release of Wake that included closure support.

These are the first anonymous functions written into the wake-on-wake test suite, including type inference in all of its glory...and yet I immediately had a problem.

    ~[test list.filter]~(Asserts) {
        var Text[] = ["hey", "hey", "man"];
        Asserts.that(Text[].filter({ t -> return true; }).getSize())Equals(3);
        Asserts.that(Text[].filter({ t -> return false; }).getSize())Equals(0);

        var filtered Text[] = Text[].filter({ t -> return t != "hey"; });

I could make List{T}.filter(), and .any(), and .sort(), but I could not make List{T}.map(), or .foldr(), or .foldl(), without creating yet another feature in a category of its own.

Generic Methods

Basking in my pride over turning out closures with type inference in such a short amount of time, I decided to try my hand at generic methods before releasing. I simply didn't want to have stupid caveats in my release notes.

It would seem at first that generic methods are very similar to generic classes, and to some degree this is true. After all, List{T}.map() could be written with generic classes only without too much hassle.

every ListMapper{T, R} is:

    R[] -- map(T[], R--fn(T) mapper) {
        var R[] = [];
        foreach(T[]) {
        return R[];

But this solution requires injecting a ListMapper of the right type into your classes, which is simply not good enough for my dream language.

To make the leap from generic classes to generic functions required writing three new recursive algorithms in the type checker, and some tweaking to work with type inference. Here's how it looks now:

every List{T} is:

    {R} R[] -- map(R -- fn(T) mapper) {

And calling it:

Text[].map({ t -> return t.getSize(); });

In our type signature, we have to save R differently than T, because one is essentially a wildcard where the other is just a fancily-tracked Object. The distinction means that within map() you can recurse into map() with any value for R that you want, but T must be the same. Then from outside of map(), T is actually bound to something, for instance, Num in Num[]. And to make matters worse, T can be bound to another generic type or argument.

warning: things are about to get like way detailed

So here's what we actually do:
  • save R as a 'generic argument'
  • typecheck the method as if all Rs are Objects, but only compatible with other Rs
  • upon use of the generic method, typecheck each argument so we know what type was actually passed in (i.e. Num)
  • recurse over each argument's signature type, and the passed-in-type, alongside each other
  • check the inside of nested types, such as lists, lambdas, and other generics
  • bail out upon any structural differences between the signature and actual types (i.e. passing Num[] into R?)
  • whenever we find a generic type, save it by name (i.e. R) to the actual value passed in (i.e. Num)
  • never allow two definitions of these generic types within one structural traversal (i.e. passing anything into Map{R, R}). Instead, assert the redefinition is the same as the previous definition.
  • copy the signature, and rewrite all Rs with the type that we found
  • go on to the next argument
This is the basics of what is required. As a final detail, by waiting to check arguments with lambda declarations until the last step, we can figure figure out that

Text[].foldl({ t, acc -> return acc + t.getSize(); }, 0);

has 0, a Num, passed into the accumulator, of type R. Therefore we take the type of the first argument (R--fn(T,R)) combined with our generic bindings (T=Text and R=Num), to infer the argument types accordingly.

I apologize for overly detailing an algorithm you've probably never cared about before, but I hope I've given you a new sense of appreciation for the types of challenges that come up when creating a compiler, and just how fun those problems can be to solve.

So Where Is Wake Now?

Wake just released v0.1.0, and there is so much more we want it to be able to do.

A lot of the dream features are in, so we're focusing now on the standard library, solving bugs, and javascript integration.

There is still a huge amount of flexibility in where we take it, and some great challenges left to solve. Integrating Wake smoothly with javascript and more is no small challenge, and we warmly invite you all to join us in having a blast designing the interop layer.

There is always a desire for more feedback, and some awesome pending features like currying and recipes and type tags. We have a roadmap describing some of it here:

Of course, if Wake isn't the perfect project for you, then make something on your own, or hack on the Rust compiler, or write some static analysis tools. The realm of possible projects is massive.

So Go Hack On A Compiler!

You'll learn more than you imagined, experience phenomenal pride in your work, and thoroughly strengthen the way you read code.

Friday, January 9, 2015

The Visitor Pattern vs Object Composition

Over the last week or so, I've been going down a large refactoring of the wake compiler. And in all the hours I've spent rewriting the parse tree's data structure, I have yet to choose between two design patterns. Its possible that there is already a third that unifies the two, and if not then I'd like to share this problem as a little hole in computer science.

The design problem is as follows: say you want to traverse a binary tree.

We'd probably do something like:

class BinaryTree {
    Object item;
    BinaryTree left;
    BinaryTree right;

From here you have two choices. Either create your algorithms as methods on the tree:

   public Object findByPath(int binaryPath) {
       if(!binaryPath) return item;
       else if(binaryPath & 1) {
            return left.findByPath(binaryPath << 1);
        } else {
            return right.findByPath(binaryPath << 1);

or you could create a class for each algorithm that operates on the tree.

class BinaryPathFinder {

    public Object findByPath(BinaryTree root, int binaryPath) {
         if(!binaryPath) return root.item;
         else if(binaryPath & 1) {
             return findByPath(root.left, binaryPath << 1);
         } else {
             return findByPath(root.right, binaryPath << 1);

This could at times be the better approach -- a binary tree has many types of algorithms that could be performed on it, so this separates the algorithms from the data structure and allows users of your API to perform operations on your data that you didn't forsee.

But our tree isn't very typesafe right now. We'd be able to, for instance, set only a left node, or a right node, and/or set both nodes and a value at once. Maybe that's expected, but maybe not. Solving this typesafety is where we create the key differences between the Visitor pattern and plain 'ol object composition.

So, we can easily make it typesafe by going with the first approach:

interface BinaryTreeNode {
    public Object findByPath(int binaryPath);

class BinaryTreeLeaf  implements BinaryTreeNode {
    Object item;
    public Object findByPath(binaryPath) {
        return item;

class BinaryTreeFork implements BinaryTreeNode {
    BinaryTreeNode left;
    BinaryTreeNode right;
    public Object findByPath(binaryPath) {

But the problem is that we cannot decouple this data structure from its algorithm without actually worsening our typesafety with reverse casts and instanceof checks, turning BinaryTreeNodes into BinayTreeForks and BinaryTreeLeafs. In fact, without our algorithm's method 'findByPath(int)', our BinaryTreeNode interface would be 100% empty and 100% useless.

The Visitor pattern is a simple but clever solution to this problem.

interface BinaryTreeNode {
    public Object visit(BinaryTreeVisitor visitor);

interface BinaryTreeVisitor {
    public Object visit(BinaryTreeLeaf leaf);
    public Object visit(BinaryTreeFork fork);

class BinaryTreeLeaf implements BinaryTreeNode {
    Object item;
    public Object visit(BinayTreeVisitor visitor) {
        return visitor.visit(this);

class BinaryTreeFork implements BinaryTreeNode {
     BinaryTreeNode left;
     BinaryTreeNode right;
     public Object visit(BinaryTreeVisitor visitor) {
         return visitor.visit(this);

class BinaryTreeFindByPath implements BinaryTreeVisitor {
    public __construct(int binaryPath) {

    public Object visit(BinaryTreeLeaf leaf) {

    public Object visit(BinaryTreeFork fork) {

Note: this may technically be a variation from the visitor pattern. The visitor pattern is often used to attach visitors to the tree in one step and then operate on it separately, while this does both in one step.

This visitor is more complicated here, because it needs to track the state of binaryPath, but the important thing is that its typesafe and decoupled. Especially assuming a stateless algorithm, this could once again be the better choice.

But there's an advantage to the non-visitor solution that gets lost once we make that change. Object composition allows us to create invisible little nifty wrapper classes around BinaryTree. For instance, we could write a quick little caching layer.

class BinaryTreeCache implements BinaryTreeNode {

    public __construct(BinaryTreeNode wrapped) { ... }
    public Object[] cache;
    public Object findByPath(int path) {
        if(cache[path]) return cache[path];
        else return wrapped.findByPath(path);


This allows any single node (leaf or binary) to be cached, without caching all of them. You could use inheritance here, but you'd have to choose whether to extend BinaryTreeFork or BinaryTreeLeaf. With a composition based approach, you can extend a leaf, a branch, or one of your other wrapper classes, all seamlessly.

Notice that this behaviour would not work with the visitor pattern. Without extra code for detecting a change in the visitor, you would return an outdated, possibly nonsensical result. This is because we have no longer fully decoupled behaviour from the tree, and it may step on what is supposedly the visitor's job. It's supposed to be the visitors job to handle these behaviour changes, but the visitor can not easily add caching over individual nodes in as nearly elegant of a way.

You could definitely try to mix the two patterns naively, and for some things it would work. For instance, you could make a BinaryTreeNode that lazy-loads its contents from a file, which the visitor would be able to traverse without issue. But any tweaks to the behavior of your algorithm would not be guaranteed to play nicely with the visitor (timeouts, threading, probabilistics, distributing across servers).

Essentially, the object composition approach gives you the ability to separate how to configure an algorithm from the actual moment it runs. If you want a cache at every three layers and a distributed answer for every five, you can put all those if-statements into code that builds or transforms the tree, instead of putting those if-statements inside the code that operates on the tree.

Putting my rambling about the greatness of object composition aside, the advantages here specifically include:
  • Decoupling edge-case code from main use-cases
  • Separating configuration step from operations step
  • Code-driven configuration for maximum flexibility
  • Infinite layerability of edge-case wrappers
  • Easy to extend behaviour without breaking SRP
  • Small classes
  • Possibility of tree transformers that transform your configuration

By giving the visitor the responsibility of behaviour instead of the nodes, we need to be able to compose the visitor in order to compose behavior in these neat ways. But if we wrap or extend the visitor class, we end up changing the way that every leaf or branch is used. I can't see any easy ways around this.

So the question is, could the visitor pattern be upgraded to allow object composition to change the behaviour of the algorithm on a node by node basis? I'd welcome the solution, and it would live happily in the belly of the wake compiler to stand the test of time.