Tuesday, June 21, 2016

Else If Is Not Special (Except in Python)

Prepare yourselves, for I am about to make a code formatting argument, which is almost always a waste of time. But this formatting argument is rooted in a truth of how programming languages work, and as a programming language designer, I just feel too damn compelled to speak up.

Elif Hell


OK OK, so you want to know why else if is not special, probably about 101x more than you want to know how I happen to horizontally and vertically align my code using spaces, tabs, or god forbid a mixture of both. I don't blame you. But forgive me for delaying the good parts for just a moment, and let me plunge you into the deep dark depths of code formatting hell. Hell is fun too, right?

This is a simple hell, but imagine, if you will, that else if had not yet been invented.

Even in the real world today, using lots of else ifs is an antipattern. But here in elif hell, lots of else ifs is a cause of tears, stress wrinkles, and death.

if (statusCode == 200) {
  // ...
} else {
  if (statusCode == 201) {
    // ...
  } else {
    if (statusCode == 202) {
      // ...
    } else {
      if (statusCode == 203) {
        // ...
      } else {
        // Nesting Continues Forever In Elif Hell
      }
    }
  }
}

In this hell, anything more than three else ifs can get you fired, unless you resort to needlessly complicating the runtime so that you never need else if at all. Imagine someone writing a library that does for elif hell what promises did for callback hell. Here the runtime is not the problem, and a runtime solution would be crazy.

But before you start praying to the elif gods for salvation, you can take a breath of fresh air, because in most languages, elif hell could not exist.

Defining If And Else


The standard definition of if/else looks like this:

ifstmt -> 'if' '(' expr ')' stmt_or_block
       | 'if' '(' expr ')' stmt_or_block 'else' stmt_or_block


Now, I wouldn't be able to call myself a language designer if I didn't tell you that this has an ambiguity which will make parser generators rear their ugly heads and tell you it has a shift/reduce conflict. The problem comes from allowing a bare statement after an if condition, which lets the parser see if(...) if(...) if(...) if(...) stmt else ... In this case, the parse rules above could in theory match the else against any of the four if statements that have been opened. This is called the "dangling else" problem.

In practice, choosing to match the else to the last opened if statement (shift over reduce) works like a charm, which is how C, C++, Java, PHP, JavaScript, and others all work. This approach has been working for decades. Others demand blocks, as if requiring them were the only way to catch the infamous "goto fail" bug, a request I both sympathize with and intend to slightly complicate later.
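To watch the shift-over-reduce choice at work, here is a small sketch in JavaScript, one of the languages listed above. The function name describe and its result strings are made up for illustration; the indentation shows which if the else actually belongs to.

```javascript
// Dangling else: the else attaches to the nearest if that is still open.
function describe(a, b) {
  let result = "no branch ran";
  if (a)
    if (b) result = "inner if";
    else result = "else"; // belongs to `if (b)`, not to `if (a)`
  return result;
}

console.log(describe(true, false));  // else
console.log(describe(false, false)); // no branch ran
```

If the else belonged to the outer if instead, the second call would have reported "else" rather than leaving the default untouched.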

So long as else may be followed by a stmt_or_block, elif hell cannot exist. The following code is, by these simple rules, a natural phenomenon which parses exactly the way I have commented it.

if (statusCode == 200) {
  // within an if
} else if (statusCode == 201) {
  // within an if within an else after an if
} else if (statusCode == 202) {
  // within an if within an else after an if within an else after an if
} else if (statusCode == 203) {
  // within an if within an else after an if within an else after an if within
  // an else after an if
} else {
  // within an else after an if within an else after an if within an else after
  // an if within an else after an if
}

Although the final parse tree is deeply nested, code which is parsed recursively need not be formatted recursively. This is absolutely how else ifs should be formatted, not nested forever and ever, and this style springs right out of the ground based on the most intuitive definitions of if and else, no help needed.

There is nothing special here. Usually. (I'm looking at you, Python, Bash, Ruby...)
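For the skeptical, here is a rough sketch of a toy recursive descent parser in JavaScript for a cut-down version of the grammar above. Everything in it is an assumption made for illustration: the parse function, the node shapes, and the simplification that conditions and statements are bare names. The point is only that no rule mentions else if, yet else if chains parse anyway.

```javascript
// Toy grammar (hypothetical, simplified from the rules above):
//   stmt -> 'if' '(' name ')' stmt [ 'else' stmt ] | '{' stmt* '}' | name ';'
function parse(source) {
  const tokens = source.match(/[A-Za-z_]\w*|[(){};]/g) || [];
  let pos = 0;

  function expect(token) {
    if (tokens[pos] !== token) {
      throw new SyntaxError(`expected ${token} but found ${tokens[pos]}`);
    }
    pos++;
  }

  function statement() {
    if (tokens[pos] === "if") {
      pos++;
      expect("(");
      const test = tokens[pos++];
      expect(")");
      const then = statement();
      let otherwise = null;
      // Shift over reduce: an else is claimed by the innermost open if.
      if (tokens[pos] === "else") {
        pos++;
        otherwise = statement(); // no special case: an if is just a statement
      }
      return { type: "if", test, then, otherwise };
    }
    if (tokens[pos] === "{") {
      pos++;
      const body = [];
      while (tokens[pos] !== "}") {
        if (pos >= tokens.length) throw new SyntaxError("unclosed block");
        body.push(statement());
      }
      pos++;
      return { type: "block", body };
    }
    const name = tokens[pos++];
    if (name === undefined || !/^[A-Za-z_]/.test(name)) {
      throw new SyntaxError(`expected a statement but found ${name}`);
    }
    expect(";");
    return { type: "call", name };
  }

  const tree = statement();
  if (pos !== tokens.length) throw new SyntaxError("unexpected trailing tokens");
  return tree;
}

const chained = parse("if (a) { x; } else if (b) { y; } else { z; }");
const nested = parse("if (a) { x; } else { if (b) { y; } else { z; } }");

console.log(chained.otherwise.type); // if
console.log(
  JSON.stringify(nested.otherwise.body[0]) === JSON.stringify(chained.otherwise)
); // true
```

The chained form and the elif hell form produce the same inner if node; the hell version merely wraps it in one extra block.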

An Unnamed Hell We Live In Today


There is a lesson to be learned here, from a language designer's perspective, though I do not know whom to credit for seeing past the stigma and finding it. All I know is I first saw it in action while browsing through the code of gnome-terminal.

Although else if seems like a first-class language concept, it absolutely is not. It is just a combination of two language constructs that serve a purpose together. And let me tell you, else and if are not the only constructs with useful combinations.

Would you enjoy a language that had a loop where feature? An if then loop feature? A loop case feature, or a loop loop feature? Would you be interested in someone like me designing one?

Programming is all about combining constructs, and these constructs are ripe to be combined. They are literally...figuratively begging you to combine them. And the general programming community has sworn itself into some kind of hell we don't yet have a name for, where we write code recursively with nesting where we need not.

With that grandeur, I present code formatted in a way that you will probably hate.

// loop where
for (item in items)
if (item.isHub()) {
  // ...
}

// if then loop
if (items.containsHub())
for (item in items) {
  // ...
}

// loop case
for (item in items)
switch (...) {
  case ...
  // ...
}

// loop loop
for (item in items)
for (token in item.getTokens()) {
  // ...
}

These statements are no different from an if else, and are waiting eagerly to be used.
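The snippets above are language-neutral pseudocode, so here is a hedged sketch of two of them as runnable JavaScript. The items array and its fields (name, isHub, tokens) are invented sample data, not anything from a real codebase.

```javascript
// Hypothetical sample data, assumed purely for illustration.
const items = [
  { name: "router", isHub: false, tokens: ["a"] },
  { name: "switch", isHub: true, tokens: ["b", "c"] },
  { name: "bridge", isHub: true, tokens: [] },
];

// loop where: the if is the entire body of the for, so one block is enough.
const hubNames = [];
for (const item of items)
if (item.isHub) {
  hubNames.push(item.name);
}

// loop loop: the inner for is the entire body of the outer for.
const allTokens = [];
for (const item of items)
for (const token of item.tokens) {
  allTokens.push(token);
}

console.log(hubNames.join(", "));  // switch, bridge
console.log(allTokens.join(", ")); // a, b, c
```

No new syntax is involved. A for body may be any statement, and an if or another for is a statement, which is the same reason else if works.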

While you may not find them palatable at first, I implore you to remember how you first felt about camel case, or snake case, or curly braces, or whitespace significance, or spaces around your operators. There must be some type of formatting that you remember hating at first, which has grown on you since then. I urge you to give this formatting style half the chance you gave your last preference change.

Among the reasons this formatting style really is worth considering is that the lack of nesting not only looks simple, it also calls out how the two control statements are 100% linked. That is, the inner control structure makes up 100% of the code within the outer control structure. Once delimiters are added and indentation is used, that should signal that this linked relationship does not hold, and that the code requires more careful reading.

I do not expect this formatting style to win, because the fact is that it's already out there, and not spreading, not catching on. And who am I to call that a loss for us programmers?

Well, I would ask in turn, who are you to not give it a chance?

Obligatory Why-Did-You-Rip-On-Python-And-Ruby Addendum


How Python, Ruby, and friends were designed is really not the subject of this blog post, but it is worth calling out and exploring the unexpected consequences of their design choices. After all, every language design choice is a tradeoff, and it is hardly insulting to either language to explore that.

As I hinted earlier, else if actually is special when else must be followed by a block rather than a statement. Python and Ruby have specific parse rules designed to handle else ifs and prevent elif hell, and that's part of why they don't use the keywords else if but rather elif and elsif. For the same reason, in Python and Ruby you cannot write if for, or for if, etc., without using unnecessary nesting, or without waiting for each combination to be added to the language by hand.

In Python's case this trick simply cannot work, and perhaps that is OK, as whitespace significance does carry some inarguable benefits. In Ruby's case, it is more an expression of how programmers prefer keywords to symbols, so much so that curly brace languages have failed to make braces required due to "verbosity", while keyword-based languages have managed to convince everyone to type three alphabetic characters to end everything in the pursuit of readability.

What gets in the way is how these languages try to do us a favor. It does seem to me and to many people that allowing if statements and else statements without blocks is an unnecessary risk. You risk running into the goto fail bug, and in fact, I myself never omit curly braces except when combining constructs as listed above.

In any case, we need not throw the baby out with the bathwater. Perhaps if users of the languages which already support this formatting style begin to use it and champion its benefits, then maybe our next round of languages and our current generation of linters can be made to accept control flow combinations but not other lone statements, and maybe we will all be slightly better off for it.
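To make that linter idea a little more concrete, here is a minimal sketch of such a rule in JavaScript. It is not based on any real linter's API: the node shapes, the findLoneBodies function, and the choice to cover only if and for are all assumptions made for illustration.

```javascript
// Hypothetical AST node shapes, assumed for this sketch only:
//   { type: "if", then, otherwise }   { type: "for", body }
//   { type: "block", body: [...] }    { type: "call", name }
const CONTROL_TYPES = new Set(["if", "for"]);

function childrenOf(node) {
  if (node.type === "if") return [node.then, node.otherwise].filter(Boolean);
  if (node.type === "for") return [node.body];
  if (node.type === "block") return node.body;
  return [];
}

// Collects the names of lone statements used as an unbraced control-flow body.
// A body is accepted if it is a block or another control statement.
function findLoneBodies(node, problems = []) {
  for (const child of childrenOf(node)) {
    const parentIsControl = CONTROL_TYPES.has(node.type);
    const childIsAllowed =
      child.type === "block" || CONTROL_TYPES.has(child.type);
    if (parentIsControl && !childIsAllowed) problems.push(child.name);
    findLoneBodies(child, problems);
  }
  return problems;
}

const call = (name) => ({ type: "call", name });
const block = (...body) => ({ type: "block", body });

// for (...) if (...) { visit; }   combined constructs sharing one block
const combined = {
  type: "for",
  body: { type: "if", then: block(call("visit")), otherwise: null },
};

// if (...) gotoFail;   a lone statement with no block at all
const risky = { type: "if", then: call("gotoFail"), otherwise: null };

console.log(findLoneBodies(combined).length);   // 0
console.log(findLoneBodies(risky).join(", "));  // gotoFail
```

Under this rule else if still passes untouched, since an if sitting directly after else is a control statement rather than a lone statement.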

In either case, for most languages, else if is not special, and perhaps the trend of treating it as such is a step backwards.

2 comments:

  1. I would use that style if I still had to program in Pascal/Delphi or any language where you need the begin keyword to open your blocks. I find "then begin" and "do begin" so cumbersome that, to reuse your words, the blocks seem to beg to be left out when they are not strictly needed. Unfortunately, I remember that the slightest refactoring (such as adding logging statements) would force you to re-add the blocks anyway.

    There was another pattern similar to this one that people used in Delphi a while ago (when it was hot): the "try try" pattern. Since you needed one "try" with the except clause and another one with finally (you couldn't combine them), people would pretend that "try try ... except finally end end" was a single block for indentation purposes.

    I guess indentation styles were much more diverse in the 70s and 80s. Some old code used to indent the block delimiters {/} or begin/end together with the block they delimit.

    for i := 0 to 10 do
    →begin
    →code;
    →code;
    →end;

    That looks totally unreadable to me.

  2. I have used a variant on that style, except I put the if and the for on the same line.
