Wednesday, December 24, 2014

Why did we need this to be a framework?

While this post will be about Spring, the Java framework, I highly urge you to read this as more than a criticism of Spring. In fact, I think Spring + JPA + an RDBMS is currently an unrivaled web development stack.

But I want to question what Spring does, or at least used to do, for developers. And ultimately I want to question why developers won't do these things for themselves.

Dependency Injection

For the time being, let's put aside @Autowired. We're often told that Dependency Injection is a good thing, and the prevalence of Spring seems to support that. We're also often told that Dependency Injection is a pattern and not a framework, and Spring seems to fly right in the face of that.

Let's take a look at Spring's claim of decoupling, with regard to its XML-based dependency injection.

<bean id="mySpringComponent" 
    class="com.wakelang.HowDoIUseOutsideOfSpring"
    p:injected-ref="myOtherSpringComponent" />

<bean id="myOtherSpringComponent"
    class="com.wakelang.AlsoHowDoIUseThisOutsideOfSpring"
    p:injectedNumber="10" />

Of course this code is decoupled from Spring! I hope I'm not shocking anyone when I present the following code:

AlsoHowDoIUseThisOutsideOfSpring myOtherSpringComponent
    = new AlsoHowDoIUseThisOutsideOfSpring();
myOtherSpringComponent.setInjectedNumber(10);

HowDoIUseOutsideOfSpring mySpringComponent = new HowDoIUseOutsideOfSpring();
mySpringComponent.setInjected(myOtherSpringComponent);

Give these more realistic classnames and our example would be both clearer and terser than the XML, and have something called "type safety" that's usually a big hit among Java devs. Did I mention there's also no startup cost for XML parsing and analysis plus reflection?

Clearly, with this many drawbacks to the XML approach, it never would've gotten off the drawing board. So let's look at some other things Spring does for you that you couldn't have done more easily yourself. For instance, the point of DI is to not call new on our services. Spring solved that for us, right?

It's a Design Pattern, Jim, But Not As We Know It

The only really important part of Dependency Injection is that the code which wires your dependencies is not your business logic. If the code that pulls your app's weight around reaches into global state, calls a static method, or calls new, then you're not going to be able to swap out or reuse bite-sized chunks of business logic without also rewriting and/or complicating that business logic.

So spring lets us have a main method that looks like:

ApplicationContext ctx =
    new ClassPathXmlApplicationContext("/blah/blah.xml");

This main method can be changed without complicating your business logic, and the XML config can be changed without complicating your business logic. Seems pretty good. Otherwise we have a main method that looks like:

AlsoHowDoIUseThisOutsideOfSpring myOtherSpringComponent
    = new AlsoHowDoIUseThisOutsideOfSpring();
myOtherSpringComponent.setInjectedNumber(10);

HowDoIUseOutsideOfSpring mySpringComponent = new HowDoIUseOutsideOfSpring();
mySpringComponent.setInjected(myOtherSpringComponent);

Oh no! This is the exact same code example we had before! This seems to suggest our verbose xml solution is even clunkier than our verbose xml led us to believe.

I will note that we now have established an advantage of Spring xml config, which is that the xml can be changed by a user without compiling a fork of Main and reassembling your jar. Which is nice. Why would I try to take that from everyone?

Type Safety: A Love Story

Type safety is a spectrum. We've all done things with Objects that can't be done to any old object, suppressed casting warnings, and held perfectly typeable data in string form because we were in a situation where we simply weren't worried about it, for better or worse. Meanwhile, languages keep coming out that improve our type systems, with things like null-safe type systems, memory-safe type systems, and automatic extraction of impurity out of functions (though I can't seem to find the language which pioneered that one).

We don't need to abandon type safety wholesale to allow truly dynamic configuration.

Class<?> clazz = Class.forName(properties.getProperty("LoggerFactory"));
Constructor<?> constructor = clazz.getConstructor(Properties.class);
LoggerFactory loggerFactory = (LoggerFactory) constructor.newInstance(properties);

Now you can create any main class you want, to read any properties file you want, to dynamically construct any factories you want. You can simply include an implementation of LoggerFactory on the classpath, and it can enjoy the benefits of type safety in the way it constructs the Logger, even though it can do so in absolutely any way it wants.
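To make that contract concrete, here's a minimal sketch of what such a pluggable factory might look like. Logger, LoggerFactory, and ConsoleLoggerFactory are stand-ins I'm inventing for illustration; the only requirement the reflective bootstrap above actually imposes is a public constructor taking Properties.

import java.util.Properties;

// Stand-in interfaces for this example.
interface Logger { void log(String message); }
interface LoggerFactory { Logger create(String name); }

// Discoverable by the reflective bootstrap above: it needs only a public
// (Properties) constructor and a spot on the classpath.
public class ConsoleLoggerFactory implements LoggerFactory {
    private final String prefix;

    public ConsoleLoggerFactory(Properties properties) {
        // Typesafe from here on: the factory reads its own settings.
        this.prefix = properties.getProperty("logPrefix", "[app] ");
    }

    @Override
    public Logger create(String name) {
        return message -> System.out.println(prefix + name + ": " + message);
    }
}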

Alternatively, if you are lower on the I-care-about-type-safety spectrum and want to save yourself the classpath hassle and/or the compile step, you can implement an XMLLoggerFactory which uses the properties or the classpath to find an XML file that would probably look more or less like the Spring files I was ripping on earlier. Either way, you get to mix typesafe config with dynamic config.

And as for aspects such as transactions and discovering controller routes, I believe the approach all along should have been object-graph transformers, which traverse every object property recursively and replace them with wrapped instances implementing the new behaviours.

aspects.applyTo(mySpringComponent, myOtherSpringComponent);
webmvc.applyTo(mySpringComponent); 
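To be clear about what I'm proposing, here is a minimal sketch of the idea. GraphTransformer and Wrapper are names I'm making up; a real wrapper would typically return a java.lang.reflect.Proxy implementing the field's interface, opening a transaction (or registering a route) around each method call.

import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

public class GraphTransformer {
    public interface Wrapper { Object wrap(Object target); }

    private final Wrapper wrapper;
    private final Set<Object> visited =
        Collections.newSetFromMap(new IdentityHashMap<>());

    public GraphTransformer(Wrapper wrapper) { this.wrapper = wrapper; }

    // Walk the object graph recursively, replacing interface-typed fields
    // with wrapped instances implementing the new behaviour. A real
    // implementation would also walk collections and arrays.
    public void applyTo(Object root) throws IllegalAccessException {
        if (root == null || !visited.add(root)) return;             // cycle guard
        if (root.getClass().getName().startsWith("java.")) return;  // skip JDK types
        for (Field field : root.getClass().getDeclaredFields()) {
            if (Modifier.isStatic(field.getModifiers())) continue;
            field.setAccessible(true);
            Object value = field.get(root);
            if (value == null) continue;
            applyTo(value); // transform children first
            if (field.getType().isInterface()) {
                field.set(root, wrapper.wrap(value));
            }
        }
    }
}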

In short, we don't need no stinking framework. With a little bit of diligence and some isolated libraries, we can do what Spring does, in an arguably superior way. This qualifies dependency injection and aspect-oriented programming as design patterns. So why did it take a framework for us to adopt them?

Passing in Straw-Man Arguments

I hope I can more or less invalidate my own point here by pointing out that modern Spring has many features which my approach wholly fails to address, such as autowiring private properties, circular constructor references, and enforcing more-or-less declarative configs. But without these things, the original Spring probably would've been better off as just a thin library with a standardized XML structure for loading factories by classname.

Spring's newest annotation-based config is leaps and bounds better than the XML solutions, and better than the one I offered as well. As are many other dependency injection frameworks out there: Guice, Dagger, and others. All of these frameworks are about simplifying the code we write when implementing these design patterns, and should do little else.
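For contrast, a Java-config equivalent of the XML from the top of this post might look like the following, reusing the made-up component names from earlier; @Configuration and @Bean are the real Spring annotations.

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class AppConfig {
    @Bean
    public AlsoHowDoIUseThisOutsideOfSpring myOtherSpringComponent() {
        AlsoHowDoIUseThisOutsideOfSpring component = new AlsoHowDoIUseThisOutsideOfSpring();
        component.setInjectedNumber(10);
        return component;
    }

    @Bean
    public HowDoIUseOutsideOfSpring mySpringComponent() {
        // Spring proxies this call so both beans share one instance.
        HowDoIUseOutsideOfSpring component = new HowDoIUseOutsideOfSpring();
        component.setInjected(myOtherSpringComponent());
        return component;
    }
}

The wiring is still plain Java method calls, which is the whole point.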

Challenge Yourself as a Programmer

After all this, I still wonder why the early days of Spring ever inspired anyone. Or, more accurately, why so few people were moved by the design pattern itself, and why it took a framework which does mostly nothing to convince the masses.

Ultimately, I would say it's that we don't challenge ourselves enough as programmers. Our threshold for appreciating new ideas and others' code is so high that we require completely new implementations of simple ideas in order to accept their power.

Challenge yourself to improve all of the simple things first. Sometimes good code can look like boilerplate code at first. Challenge yourself to discover and explore why others' suggestions have more merit than you might've first thought.

Not all advancements in programming are a crazy new framework, build tool, library, algorithm, or language. Challenge yourself to move forward in the simple ways too.

Thursday, November 13, 2014

Annotations, Pretty errors now built into Wake

It becomes tougher and tougher to keep making quick releases of Wake. In addition to compiling the feature, I now have a slew of libraries written in Wake that might need to be updated, and/or recompiled in the proper order.

This was the case with the most recent release of Wake, which involved the creation of Annotations. Notably, they are now used by wUnit to find what's a @TestClass and what's a @TestMethod. From here on, they can be used to create javascript interoperability, as I covered in my last blog post.

This annotation data is stored in a binary file that just tells the compiler how a class can be used, which I've called a table file (and it has the .table extension). It is basically the same as a .hi file, for haskellers out there. While many features and bugfixes for the Wake compiler do not affect these files, such as the new error reporting, other features do.

But changing the table files is not a compiler-only change. The libraries that you use in your wake projects must be rebuilt with a compatible version, and some libraries themselves, such as the wake reflection library, must be updated to read the latest stuff. It's a blast to spend time writing in Wake instead of C++, but it risks lowering productivity as we near V1.

That's why, behind the scenes, this release features a new build server for Wake. When features come out that are not binary compatible with the last version of the compiler, it is now a simple task to rebuild all of the core libraries in order with the latest compiler, and save those libs in a bundle for our users to download and use in their projects.

Infrastructure like this is exactly what we need to keep productivity high as features keep coming out and we keep building new Wake libraries to maintain.

We are still looking for helping hands, and still chugging onward. See you again for the next major feature addition!

Tuesday, October 28, 2014

Thoughts on the First Steps of Wake's JS Interop

For the last several months I have been creating the language Wake, and now, with hard work and help from friends, it's nearing feature completion as a usable language. However, there's a huge goal of the project that has yet to be realized -- compiling to multiple languages, and having smooth interoperability with them.

At the moment Wake compiles to JavaScript, so that's where our interop features will begin. My original plan for Wake's JS interop was to create a whole new compiler from scratch, written in Wake, that compiled a file with a custom syntax and spat out ordinary Wake compiler artifacts:

wake-js-interop JQuery.wkjs -d bin/waketable -o bin/wakeobj/JQuery.o
creates files: bin/waketable/JQuery.table, bin/wakeobj/JQuery.o

There does have to be some "glue" code that sits between Wake world and real JQuery, so JQuery.o is the glue and JQuery.table describes its usage to the Wake compiler. JQuery.wkjs would mostly look like ordinary Wake, but parts of it would be very custom syntax.

every JQuery is:
    Object[] -- map(Object[] what, Object fn(Object) mapper) = "$.map(what, mapper)"
    JQElements -- find(Text selector) = "return $(selector)";

What are the problems with this approach? Once you take into account all of the semantics and file encoders and everything else that has already been written in the Wake compiler, from inheritance to interfaces to providers to circular dependency relationships, it's a ton of work to do. Meanwhile, the first step would be to code a clone of Yacc in Wake, which is its own challenge. So last night I took a different approach, and it's already a solid proof of concept! Simply create an abstract class in Wake:

every Console is:
    log(Text);

compile its table file (usage definitions) only:

wake Console.wk -d bin/waketable -t

and use the 'wake-js-gen' github project on the table file (in this case the last argument is saying that we're wrapping the singleton var named "console" but there are other options such as wrapping constructors)

wake-js-gen bin/waketable Console bin/wakeobj/Console.o console

and now you can console.log in js from wake!

Console.log('test!');
//test!

In the immediate moment, I'll probably use this to create some simple wrappers for JQuery, Regexp, etc., so that we can code against them ASAP. Next, we have to come up with a strategy for making this more flexible, and I'm thinking about bringing in annotations for the task:

extern JQuery is:
// 'extern' instead of 'every' will tell the compiler not to create a .o file
// and that these abstract methods aren't abstract

    @Js("return $(selector)")
    JQElements -- find(Text selector);

    @JsSingletonMethodOn("$")
    Object[] -- map(Object[], Object fn(Object));

This way your wake source code can be the master copy, and the tools can still be codegen-only, no validation or parsing. Not only does it make my job easier for the js generator, but it makes my job easier for every other language we support from here on, and it makes it easier for others to come by and create these generators themselves. What do you think? Is this a good interop strategy or a bad one?

Thursday, October 2, 2014

The Am I Vulnerable To Shellshock Checklist

There are a ton of people sharing a test to see if your shell is vulnerable to shellshock. However, exploiting the shellshock bug (unlike heartbleed) requires multiple failure points.
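For reference, the widely shared test is a one-liner along these lines; a vulnerable bash prints "vulnerable" before the echoed text, while a patched one prints only the echoed text:

env x='() { :;}; echo vulnerable' bash -c "echo this is a test"

Passing that test, however, is only the first of several failure points needed for a real exploit.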

It is impossible to precisely define these failure points, and this has the internet in much more fear than is actually warranted. Are you concerned about your vulnerability to the shellshock bug? Perhaps after reading this checklist you will change your mind.

Disclaimer: This checklist is not comprehensive, and will not necessarily be up to date. This checklist attempts to be accurate for the vast majority of servers as of the time it is written. Use your own brains too.

1. Are you on a unix system?

This includes Mac, FreeBSD, Solaris, and Linux. If no, you are not vulnerable. Otherwise, proceed to step 2.

2. Are you on a server or a personal computer?

Only servers, or personal computers with server software installed, are likely to be vulnerable. If you don't know, you can probably call yourself safe. Otherwise proceed.

3. Are you using Apache mod_cgi or restricted SSH sessions?

If you are using Apache mod_cgi, proceed to step 4. As for SSH, of course all SSH sessions are restricted except for root's, but don't be concerned just yet. The type of restricted SSH session we are referring to means you have a file in ~/.ssh/authorized_keys that limits a user with a key to a specific command. If you are using this on your server, proceed to step 5. Otherwise go to step 6. Note that realistically anyone using mod_cgi should patch, just in case. It's really crazy not to if you are using mod_cgi at all.

4. Are you using bash scripts in your cgi-bin?

If you use #!/bin/bash then you need to patch. If not, go to step 12.

5. Do any malicious parties have an authorized key to your server?

If so, you are vulnerable. If not, you are still vulnerable to non-malicious parties doing more than they are supposed to, which is another, less severe type of vulnerability. Patch before they break out of their session restrictions, even innocently.

6. Congratulations, you have no mass-targetable vulnerabilities!

At this point you are only vulnerable to targeted attacks from parties that specifically dislike you. Remember that you have a lot to worry about should this be the case, and shellshock might not be the most important thing to patch. exec($_REQUEST[...]) comes to mind as something a bit more pressing than shellshock. Continue to step 7.

7. Do you have any open ports to custom apps?

Noncustom apps could have vulnerabilities too, but so far none have come up aside from the two mentioned. For this reason we'll look at custom apps only - web services, database services, etc. If you don't have any, you are probably completely safe from shellshock! Otherwise proceed to step 8.

8. Do your custom apps set any environment variables based on user input?

This is unlikely. In fact, GitHub searches reveal that PHP codebases are about 10x more likely to be executing arbitrary user input than to be setting it in an environment variable. If your custom apps and the libraries they use aren't doing this, you are completely safe from shellshock exploits. Otherwise, keep going on to step 9.

9. Are there high restrictions to setting these environment variables?

Take a moment to reflect on the restrictions of your last answer. When users can set environment variables on your system, do they need to be on a certain IP? Do they need a password? These things might protect you adequately. If not, you may be surprised to hear that we still aren't at an exploit yet and you'll need to go to step 10.

10. Could your custom app run any system commands after setting these variables?

Defining this failure point is difficult and varies from app to app. Some apps like PHP may set an environment variable in a process only responsible for a single page load. Other apps might set it in a persistent daemon. Any processes spawned from these places will inherit the environment variables. Should such a process invoke a system command (even something harmless like `ls`), proceed to step 11. Otherwise, you are safe, but in dangerous waters since custom apps usually change over time.

11. Is your user's default shell bash?

This is possible either through a unix user's configuration, or through a symlink from /bin/sh to /bin/bash. If neither of these things has happened, go to step 12. Unfortunately, if bash is your default shell, you are vulnerable.
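A quick way to check both on a typical Linux box (illustrative commands; adjust for your system):

ls -l /bin/sh                      # is sh a symlink to bash?
getent passwd $USER | cut -d: -f7  # this account's login shell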

12. Are any of your invoked commands bash?

By this I mean, does /bin/sh ever invoke the command 'bash ...'? Note that environments are inherited across processes, so you are equally vulnerable if sh launches a process that launches bash. If not, you are in severely dangerous waters but are technically probably safe for now. Congratulations for being right on the edge! To everyone else: you need to patch.


It's far worse than I thought!


thought no one, ever. As you can see, shellshock has only limited avenues of effective mass attack, and severely limited avenues even for targeted attacks. You likely have bigger things to worry about, like updating your WordPress site and protecting against XSS attacks. There are probably more websites storing passwords in cleartext than websites vulnerable to shellshock.

A blogger claimed to have found three thousand vulnerable servers, but never responded to my comment asking how many total servers had been scanned in the process. Until the answer comes out, we can only assume from the author's tone that it is "very low." The checklist here supports this.

I already made my disclaimer: this list might have some holes for niche servers. Anyone with critiques of my checklist is welcome to share them, and I'll do my best to correct inaccuracies.

Friday, September 5, 2014

Announcing angular-dep.vim, a Vim Plugin for Angular.js Developers

One of the best practices in angular.js development that I have trouble adhering to is minifiable injectables.

I have always been a huge fan of dependency injection, and I am constantly advocating for it in my work projects that aren't fortunate enough to have it. When I attempt to convince a developer who hasn't experienced the wonders of dependency injection and the slew of benefits it brings to testability, they only seem to see it as an API. A verbose API.

That is part of why I am creating the language Wake: to make DI syntactically beautiful and to require its use. Because without a dedicated language doing so, dependency injection is inevitably more work - and that's the work you have to sell to that skeptical developer. Take Java:

@Inject
public MyInjectedClass(MyDependency1 mdep1, MyDependency2 mdep2, MyDependency3 mdep3, MyDependency4 mdep4) {
    myDep1 = mdep1;
    myDep2 = mdep2;
    myDep3 = mdep3;
    myDep4 = mdep4;
}

protected MyDependency1 myDep1;
protected MyDependency2 myDep2;
protected MyDependency3 myDep3;
protected MyDependency4 myDep4;

I have stumbled over solving this problem in PHP, and I can rarely talk Spring developers out of using @Autowired on properties instead of constructors. It seems to be a constant battle.

So I was blown away and thrilled to find angular.js, an MVC framework with so many excellent features including beautiful dependency injection.


app.controller('MyCtrl', function($my, $injected, $items, here, and, here) {
    // code
});

It was like coming up for fresh air, and yes, I now finally have many friends who appreciate and understand dependency injection. It worked.

So when people tell me the truth that I don't want to hear,

You shouldn't use that syntax in your injectables. It isn't compatible with minifiers. Use ["$deps", "here", function($deps, here) { ... }] instead.

I grudgingly nod my head. Beautiful dependency injection without boilerplate had slipped through my fingers.

But this boilerplate is inherently predictable, so I decided I'd had enough. I sunk some time into creating a plugin that manages the two for me, with extra goodies to add/remove/reorder the dependencies.
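For anyone who hasn't run into it, the minify-safe form that has to be kept in sync with the parameter list looks like this:

app.controller('MyCtrl', ['$my', '$injected', '$items',
    function($my, $injected, $items) {
        // code
    }]);

The string names survive minification, so angular can still resolve the dependencies after the parameter names are mangled.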

Here's a gif of it in action, recorded with the handy-dandy ttygif project.

Check it out on github! 


I suppose a vim plugin will have to do until Wake makes its first production release.

Saturday, July 26, 2014

From the Ashes of Google's Noop, Comes the Language Wake

The programming language Noop was not an official Google product, though it was often given credit for being so. Dedication to testability was what made Noop so unique and exciting, and what generated hype about its eagerly-awaited first release.

Unfortunately Noop went the way of most languages, and was cancelled before its enthusiastic followers got a chance to try it out.

Now our small team is excited to announce that Noop's unrealized feature set has inspired Wake, a practical object-oriented language in Alpha release status. Anyone who was interested in Noop can now instead read Wake's sleek documentation, try it out online, or join the drive to V1.

Why Noop Didn't Die

The Java community is probably the most dedicated to unit testing, yet that is not reflected in the language's design or standard library. A couple of really bad ideas, like URL equality making blocking DNS lookups, exemplify the lack of now-common practices such as Dependency Injection. It becomes a full career to explain how to write testable code (see: Misko Hevery, the author of that article I just cited on Dependency Injection). There isn't a language in existence yet which makes testability inevitable.

Wake, like Noop, resolves the issue entirely. In Wake your program startup is dependency injected. This means you can simply declare:

every Main is:
    needs Printer;

and your startup class will get a Printer, the Printer gets what it needs, etc. Then in your test cases you can pass in any Printer you wish to verify its output.

every MainTest is:
    provides Main, Printer <- MockPrinter;
    testSomething() {
        var Main from this;
        // Test code
    }

As you can see in this example, Wake has removed the new keyword, requiring all new objects to be created by injectable providers. This means statements you used to write as new Printer() are now Printer from PrinterProvider, a simple change that allows PrinterProvider to return any (more testable) implementation of Printer which it desires.

New Language, New Tricks

We didn't stop at testability.

A huge goal of Wake is to make the smoothest syntax of any statically-typed language. Type inference is one option, but it doesn't play as well with OOP languages as it does with functional ones. The easiest solution we saw was to stop requiring developers to name their typed variables.

greet(Person) {
    // code
}

Here we have a method that works on a Person, which in Java we would've defined as Person person, or worse, Person p. This syntax is only possible in a language which scrapped static methods because they are death to testability.

This has an amazing side effect on iteration. Since lists are also valid variable names, we have a completely new form of iteration in Wake.

var Entity[] = getEntities();
foreach(Entity[]) {
    save(Entity);
}

This foreach statement takes a list of Entity objects named Entity[], and lowers it to the variable Entity so that you can execute code on each. For those who are curious, foreach operates on expressions, not variable names. This means the above could have been written as foreach(getEntities()) save(Entity). This is a concept only possible in a language which understands your program's types and lets your variables be named directly after them.

Current Status of Wake

Wake supports a huge array of features, including null-safe types, inheritance, method overloading, and generics. It even has a unit-testing framework, a mocking framework, and compile-time reflection. Tooling is underway, and some major features such as closures are still in progress. We won't call anything V1 until we have these features.

It is also worth noting that Wake currently compiles to Javascript. Not to spill too many beans, but we're hoping to create several more compile targets in the future. Maybe Wake will become a swiss army knife for our constantly diverging platforms.

If you are looking to try something new, advise the direction of the project, or even hack at the compiler (written in C++), then Wake might be the next (or at least the newest) language to be worthy of your excitement or contributions.

Monday, June 9, 2014

Why You Want to Test Private Methods

As programmers, we want to leave behind an API that "can't go wrong," that doesn't cause bugs by being used in the wrong order. Often the first tool we grab to do this is private methods.

This leads to a classic question.
How do I test private methods?
And the standard answer, why do you want to?

It's a response that makes one feel like a wise, bearded, mysterious master of code. But I already know the answer, for the vast majority of cases. The answer to why someone wants to test a private method is:

They shouldn't be using a private method.

At least, this is the answer 99% of the time.

But Not Me, Seriously

Odds are against you. Let's see what private methods should not necessarily be used for. Odds are pretty high that your reason for using private methods is in this list.
  • Keeping inner data structures coherent
  • Reducing duplicate code
  • Isolating pure functions from stateful ones
But I've made a list of cryptic nonsense. This is a time for examples.

Keeping inner data structures coherent
class Teacher {
    private List<String> students;

    // Use this instead of adding directly to students, so
    // that the list is always alphabetical.
    private void addStudent(String student) {
        students.add(student);
        alphabetizeStudents();
    }
    // some public methods...
}

Here we wrote a private method to handle writes into our list of students and maintain its coherence - in this case, alphabetization. You then think: alphabetization is easy to test. How do I test this?

Reducing duplicate code
class StudentCsv {
    private void writeLine(String... vals) {
        boolean first = true;
        for(String val : vals) {
            if(!first) File.write(",");
            File.write(val);
            first = false;
        }
    }
    // some public methods...
}

Here we found ourselves writing a lot of comma management in this CSV generator. You think, let's abstract this into a method. Comma separation is easy to test, but now it's private. How do I test this?

Isolating pure functions from stateful ones
class CollisionDetector {
    public double euclidianDistance(Point a, Point b) {
       return sqrt(sqr(a.x - b.x) + sqr(a.y - b.y));
    }
    // some public methods...
}

Your heart tells you a distance function doesn't belong buried inside other code in your collision detector. Why? It's stateless. So it could be anywhere, and it's reusable; maybe you'll use it somewhere else in CollisionDetector. It's not reusable otherwise yet, but that doesn't matter. Don't people say to make code usable before reusable? And so now the question in your mind is, how do I test that it's usable?

Make It Reusable

One of the reasons why testable code is so highly related to good code is that it enforces decoupling. To test your code, don't make the method public, make the method reusable.

We want our solution to keep our APIs clean and impossible to use wrong. And of course we can keep that. We can't call private methods of course, just like you can't call a public method on a private class member.

This gives us a simple pattern for preserving encapsulation, honoring the Single Responsibility Principle, and making your private methods reusable.
  1. Create a new class to handle whatever responsibility that private method had. Suppose we're refactoring Recipe and creating a class IngredientSet.
  2. Move any other Recipe code which belongs to the IngredientSet responsibility into the IngredientSet code.
  3. Give Recipe a new private member of type IngredientSet, as sketched below.
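Here is a minimal sketch of those three steps, using the hypothetical Recipe and IngredientSet:

import java.util.ArrayList;
import java.util.List;

// Step 1: the private method's responsibility gets a class of its own.
class IngredientSet {
    private final List<String> ingredients = new ArrayList<>();

    // Step 2: any Recipe code about managing ingredients moves here,
    // where it is public, testable, and reusable.
    public void add(String ingredient) { ingredients.add(ingredient); }
    public boolean contains(String ingredient) { return ingredients.contains(ingredient); }
}

class Recipe {
    // Step 3: a private member keeps Recipe's public API clean.
    private final IngredientSet ingredients = new IngredientSet();
    // some public methods...
}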

And that's it. Let me apply this to the simplest example:

Isolate pure functions from stateful classes
class DistanceCalculator {
    public double euclidianDistance(Point a, Point b) { ... }
}

class CollisionDetector {
    private DistanceCalculator distCalc = new DistanceCalculator();
    // some public methods...
}

By creating a class DistanceCalculator, we now have the ability to check distances anywhere in our code, including our tests. At the same time, we haven't polluted CollisionDetector's interface to include distance checking that isn't its responsibility.

Next up, why are we coupling CSV output directly in our StudentCsv class? Reducing code is good, but you missed the fact that you can:

Reduce responsibilities in a class
class Csv {
    public void write(File file, String... values) { ... }
}

class StudentCsv {
    private Csv csv = new Csv();
    // some public methods
}

It only took creating a CSV class to handle comma separation, and now anyone can quickly create a new report. Our student CSV only handles gathering the student information and feeding it into the CSV generator.

Finally, we're up against our alphabetized students listing. While making it testable is reward enough on its own, changing our implementation will also:

Actually ensure data coherence
class AlphabetizedList extends ArrayList<String> {
    @Override
    public boolean add(String s) {
        boolean changed = super.add(s);
        alphabetize();
        return changed;
    }

    private void alphabetize() { ... }
}

class Teacher {
     private AlphabetizedList students = new AlphabetizedList();
}

Here we created a subclass of ArrayList which imposes a constraint on top of its superclass. On each write, it alphabetizes itself. Now our Teacher class can simply treat its list of students as a list of students, and it's impossible to have any code which changes the list and doesn't also alphabetize it.

If you're still actually reading my code examples, you noticed something special about this last code example. It contained a private method, alphabetize(), on AlphabetizedList!

When to Use Private Methods

This is where I reaffirm that private methods have a place, and describe what gives alphabetize() the privilege.

Except, alphabetize() should not have been a private method. It was a trick. Alphabetize is an example of isolating pure functions from stateful classes. It is entirely conceivable that someone may want to one day alphabetize some other list. The fact that I made it private in the first place should be an immediate clue that the code might deserve a class of its own.

So when is it correct to use private methods, if ever? I could come up with many long-winded, vague, highly-nuanced answers, and it wouldn't help anyone at all. So I'll give you the practical, albeit informal, answer instead.

Use private methods any time you already would have, and when you neither want to test, nor reuse the code.

Of course, if you probably will test/reuse it in the future, you might want to extract it into a class now before it gets ingrained into the system as a bad choice.

So next time someone says in a smarmy voice, "Why do you want to test a private method?" make sure to give them the answer.

Monday, June 2, 2014

Compression Driven Development

Good Code is Hard to Measure

And all too often, even good programmers get into a battle of the-least-code-wins. Not to the point of code golf; I mean real production code that's tightened as much as possible.

Codebases all eventually get bloated and buggy and hard to navigate. We all know that less code means less code to trudge through, and fewer headaches. And of course, where bloated code is annoying, duplicated code is an inexcusable bug waiting to happen. This would seem to indicate that excess code causes all of our woes.

A recent blogpost touts these facts as the reasons for "Compression-oriented Programming," telling programmers to develop in this process:
Begin by just typing out exactly what [needs] to happen in each specific case, without any regard to “correctness” or “abstraction” ...when doing the same thing a second time somewhere else, ...pull out the reusable portion and share it.
But before we slip back into focusing on the battle of the least code, let's explore the consequences of taking this methodology too far.

What Does Bad Compression Look Like?

Let's start with some repetitive code, and let's see how badly this advice could be followed.
cairo_set_color(cairo, 0, 20, 20, 30);
cairo_line_to(cairo, 7, 8);
cairo_set_color(cairo, 0, 30, 30, 20);
cairo_line_to(cairo, 8, 9);
cairo_set_color(cairo, 0, 20, 20, 30);
cairo_line_to(cairo, 9, 10);
cairo_set_color(cairo, 0, 30, 30, 20);
cairo_line_to(cairo, 10, 11);

This code draws from 7,8 to 8,9 to 9,10 to 10,11 in alternating colors. It shouldn't be hard to figure that out, and it shouldn't be that hard to maintain.

But let's follow the compression-oriented programming guide and say that these eight lines are certainly doing something more than once. Let's see what we are doing repeatedly that we can streamline.

We are
  • setting the color to 0, 20, 20, 30 twice
  • setting the color to 0, 30, 30, 20 twice
  • drawing from x, y to x+1, y+1 three times
  • using 'cairo_set_color(cairo' four times
  • using 'cairo_line_to(cairo' four times.
So let's "clean this code up."
#define my_cairo_set_color(a, r, g, b) (cairo_set_color(cairo, a, r, g, b))
#define my_cairo_line_to(x, y) (cairo_line_to(cairo, x, y))
#define cairo_move_to_next_xy my_cairo_line_to(x, y); x++; y++;

void cairo_set_subtle_blue_grey() {
  my_cairo_set_color(0, 20, 20, 30);
}

void cairo_set_subtle_red_grey() {
  my_cairo_set_color(0, 30, 30, 20);
}

run() {
  int x = 7; int y = 8;
  cairo_set_subtle_blue_grey();
  cairo_move_to_next_xy;
  cairo_set_subtle_red_grey();
  cairo_move_to_next_xy;
  cairo_set_subtle_blue_grey();
  cairo_move_to_next_xy;
  cairo_set_subtle_red_grey();
  cairo_move_to_next_xy;
}

Once you understand the domain language, the 'cleaned up' code is certainly more literate. But at a huge cost! I have deliberately chosen bad abstractions, such as macros using invisible variables, and hidden alteration of state.

But our code in run() is still so redundant! And after all that work! Let's see if we can fix it in another pass of our compressor.
run() {
  int x = 7; int y = 8;
  for(int i = 0; i < 4; i++) {
    i % 2 ? cairo_set_subtle_red_grey() : cairo_set_subtle_blue_grey();
    cairo_move_to_next_xy;
  }
}

Now this is what we wanted! It might incur runtime costs, but hopefully our optimizing compiler will see through it. It might take a split second to determine whether red or blue will come first, and there are still hidden variables and hidden alteration of state, but overall it's much more compact.

I'm going to 'clean up' some duplication in our macros and helper functions, and then here is our final result.
#define call_cairo(fn) fn(cairo,
#define my_cairo_set_color(a, r, g, b) (call_cairo cairo_set_color a, r, g, b))
#define my_cairo_line_to(x, y) (call_cairo cairo_line_to x, y))
#define cairo_move_to_next_xy my_cairo_line_to(x, y); x++; y++;
#define select_hue(expected) (hue == expected ? 30 : 20)
#define HUE_RED 0
#define HUE_BLUE 1
#define HUE_GREEN 2

void cairo_set_subtle_grey_hue(int hue) {
  my_cairo_set_color(0, select_hue(HUE_RED), select_hue(HUE_GREEN), select_hue(HUE_BLUE));
}

void cairo_set_subtle_blue_grey() {
  cairo_set_subtle_grey_hue(HUE_BLUE);
}

void cairo_set_subtle_red_grey() {
  cairo_set_subtle_grey_hue(HUE_RED);
}

run() {
  int x = 7; int y = 8;
  for(int i = 0; i < 4; i++) {
    i % 2 ? cairo_set_subtle_red_grey() : cairo_set_subtle_blue_grey();
    cairo_move_to_next_xy;
  }
}

Hey! I Can See Through Your Tricks

The problem that I forced in my example compression is that I chose the wrong abstractions in the wrong order, and focused on the wrong things.

I compressed code paths that are incidental to the program, instead of paths that are inherent to the problem domain.

And, as a cherry on top, I compressed in ways that I knew would simply lead to another round of compression.

How extreme is this example? I don't think it's that extreme. Once the code has been split out to this level of abstraction, I don't think any programmer is going to fix it any time soon. I've made a pretty tangled hairball in just 27 lines of code; who would risk fixing it if it were spread out across a hundred lines? A thousand? Five thousand?

That novice developer focused on removing duplicate code went too far, and nobody stopped him, and nobody will undo his mistake.

How it Should Have Ended, in Six Lines

int x = 7; int y = 8;
for(int i = 0; i < 4; i++) {
  i % 2 ? cairo_set_color(cairo, 0, 30, 30, 20) : cairo_set_color(cairo, 0, 20, 20, 30);
  cairo_line_to(cairo, x, y);
  x++; y++;
}

This code is more maintainable than the eight-line version, but it is also harder to read. Given how rare repetitive code like this is, we've maybe turned an 8k simple-to-follow codebase into a 7k harder-to-follow codebase. It's not exactly worth a raise.

Abstraction Trees

Code is hierarchical, and that hierarchy often (though not always) needs to be understood in order to solve bugs, add new features, and refactor code. Any time you consolidate code you heighten the abstraction tree.

Following the abstraction tree of the good example, we must
  1. understand that we'll run the code four times
  2. understand the state change each time
  3. understand the color swap each time.
In our bad example, we must additionally
  1. understand that cairo_move_to_next_xy uses & modifies state
  2. understand call_cairo's syntax trickery
  3. understand how hues in RGB work
  4. understand that select_hue chooses a high/low value to match

We've doubled what we need to know in order to follow the code from top to bottom, and that is a huge price to pay.

So when is compression-driven development worth pursuing? When is it an antipattern?

Compression-driven Development is an Antipattern

But not always.

You are turning yourself into a compression algorithm with the restriction of syntactically valid results. Don't do it.

Approach the problem of duplicated code as a domain-specific problem. Don't solve it with the first abstraction that comes to mind; solve it in the way that the problem-space is necessarily tied to.

Abstraction should be used when the behavior you are compressing is clearly part of the problem domain, and when the difficulty of traversing the abstraction tree from top to bottom is clearly lower than the cost of modifying duplicated code.

Compression-oriented programming can give you insights on good vs bad abstractions. Compression-driven development (where compression is the means as well as the end) is only ever incidentally good to a codebase at best, and actually encourages the creation of monolithic, bloated codebases at worst.

Wednesday, April 2, 2014

Tutorial: The mechanisms behind Gradle

Gradle is a build system made to rival Maven. Gradle has declarative scripts, plugins, dependency management, artifact publishing, and multiproject builds, all under a sleek Groovy DSL. For random tasks I would often still use make instead, but for structured projects I highly recommend you ditch Maven and move to Gradle.

You either like maven or you don't

This isn't a Maven-bashing article (though I would love to write one), but I will simply say that for my latest project at work, migrating from Maven to Gradle cut our build times down by 30%-70%, ~600 lines of XML became ~150 lines of Groovy, and weeks of various people troubleshooting various issues with Maven became three days of trying to grok Groovy's workings.

Assuming you are moving from Maven to Gradle, you may wish for a better tutorial, like I did. Gradle's documentation is great for standard workflows, but Gradle is more flexible than that. It was figuring these things out that cost me the most time, so I figured I'd publish what I'd learned to further Gradle as a great tool.

What this tutorial covers

I'm going to walk you through what I went through. We're going to begin with gradle basics, move to multi-project setup, and then end on some advanced notes that explain how it all works.

This tutorial does not cover every way things could be accomplished, or every feature you may like to know. It is intended to get you on firm ground, from which point the documentation can take over. This tutorial is also not intended to start projects from the ground up; there are plugins that automatically do much of what I'm about to describe, and often they do everything you need.

If plugins don't do everything you need, here is how to cleanly write build scripts that integrate with them.

Basic tasks

To start off, let's create a project where gradle build prints "hello." It's extremely straightforward:

build.gradle
task build << {
  println "hello";
}

By using the << operator, you can specify code to be run when a task is triggered. But don't worry, you don't have to write redundant code for things like running commands or managing files. Here we define some basic tasks leveraging the "task type" paradigm.

build.gradle
task myDeletionTask(type: Delete) {
  delete fileTree('**.o');
}

task myCopyTask(type: Copy) {
  from 'path/to/source/file'
  into 'path/to/destination/directory'
}

task myCommandTask(type: Exec) {
  commandLine "echo", "hello", "world"
}

Gradle will even capture the output of "myCommandTask" and print it with gradle formatting automatically.

This does bring us to an important distinction. When a task is declared with <<, its code is run when the task is run. However, if the task is declared without <<, its code will be run when the task is configured (i.e., immediately). Thus, our first "hello" task needed << to delay echoing until build, while our copy, delete, and exec tasks cannot have <<, or the configurations will be delayed.
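A quick way to see the difference for yourself (the task names here are mine):

build.gradle
task atConfiguration {
  println "printed during every configuration phase"
}

task atExecution << {
  println "printed only when this task actually runs"
}

Running `gradle atExecution` prints both lines; running any other task still prints the first, because every task gets configured regardless of what runs.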

Task dependencies

You can add dependencies between tasks in a couple ways, but these seem to be the most common:

build.gradle
task myTask {
  dependsOn build
}

myOtherTask.dependsOn myTask

tasks.withType(Compile) {
  task -> task.dependsOn myTask
}

And voilà, with this simple example, myTask will run before all compilation tasks, and also before myOtherTask.

If you are curious, the last of the three is functional programming in Groovy. What it means is this: for each compile task, invoke this function. The task will be the argument named task (as per task ->), and on it we use the tried-and-true method dependsOn.

Breaking up your build into projects

When you run a `gradle` command, it will search up the directory tree looking for a settings.gradle file which defines a project structure. You can declare a project setup like this:

settings.gradle
include "ProjectA", "ProjectB"

and if your projects' names don't directly correspond to their paths, you can fix that as well:

settings.gradle
include "ProjectA", "ProjectB"

project(":ProjectA").projectDir = file("path/to/project/a")
project(":ProjectB").projectDir = file("path/to/project/b")

You can now have build.gradle files inside ProjectA's root directory and ProjectB's directory that configure the projects just as you would expect, but you can also create relationships between these projects and factor redundant configurations out into a master build file. It is noteworthy that you do not need build scripts in these directories at all; often the master build script can take care of everything these subproject build files might need to do.

Multi-project configurations


Next to this settings.gradle file, you can create a root build.gradle file which will serve as a master build script. Here we just add empty tasks to the root project, but we'll talk about configuration injection in a bit.

build.gradle
task build
task clean

Quite boring. What is this master build script for? Well, you can configure subprojects directly, either by name or all at once.

build.gradle
project(":ProjectA") {
  task projectAOnlyTask << { println "Project A!"; }
}

subprojects { subProject ->
  task allProjectsTask << { println "Any project!"; }
}

Project A now has a task "projectAOnlyTask," and every project has a task "allProjectsTask." This is simple but incredibly useful code consolidation. It essentially lets you define your own convention-over-configuration in your root build.gradle. Notice that the form for configuring all subprojects uses the same functional Groovy syntax as tasks.withType -- a function with one argument named "subProject" that's executed for each subproject.

Waiting for subproject evaluation


One gotcha here is that this root project script is evaluated before each subproject script. That may sound inconsequential, but it means any task defined in a subproject build.gradle file cannot be used here, for instance, in a dependency relationship. Gradle solves this with some more functional sugarcoating.

build.gradle
subprojects { subProject ->
  afterEvaluate {
    taskDefinedInSubProject.dependsOn taskDefinedInRootProject
  }
}

The extension API


So lastly, what if you find you have 20 lines of code duplicated between two different projects? It's simple: using the extension properties API, you can set flags/properties in subproject build scripts that the root build script looks for.

projectA/build.gradle
ext {
  runsDocumenter = true
}
build.gradle
subprojects { subproject ->
  ext {
    runsDocumenter = false
  }
  afterEvaluate {
    if(subproject.ext.runsDocumenter == true) {
      task document(type: Exec) {
        commandLine "document.sh", subproject.projectDir.path
      }
    }
  }
}

You have all the power of Groovy to make this API get your job done.

Dependencies between projects

If you split one project up into multiple smaller projects, you probably have a relationship between the source trees. In this circumstance you almost certainly have a compilation relationship as well. In gradle, you can declare dependencies from project to project like this:

projectA/build.gradle
dependencies {
  compile project(":ProjectB")
}
projectB/build.gradle
dependencies {
  compile project(":ProjectC")
}

Gradle will now incrementally look back up the project dependency tree, as well as pass around jar files, and more. It really is too good to be true, yet somehow it is.

Advanced configuration

What's discussed so far, plus some plugins and supplemental API reading, should get you an advanced build script for an advanced project. However, there is still much magic in the way these multi-project builds work. To finish the article, I will break down a simple but mysterious declaration from the code sample above.

projectA/build.gradle
dependencies {
  compile project(":ProjectB")
}

What does this do? What does it mean?

I recently created protobuf generation as a subproject of a sprawling Android project. Since there is no protobuf plugin, I hit strange issues when trying to create dependencies on that project in this manner. Stumbling through it greatly improved my understanding of gradle.

We'll take two projects that run simple commands, have no plugins, and then create a dependency between them.

dependency/build.gradle
task build(type: Exec) {
  commandLine "touch", "myfile"
}
dependent/build.gradle
dependencies {
  compile project(":dependency")
}
task build << {
  println "hello"
}

The command `gradle build` now yields us a vague error,

A problem occurred evaluating project ':dependent'.
> Could not find method compile() for argument [project ':dependency'] on project ':dependent'

I am glad to say that the "compile" part of the dependency declaration is not an inherent part of gradle's DSL. What this means is that in our current project, something is missing. What is missing? A project configuration. These are added automatically by most plugins, but in our case we have to understand how they work and add one ourselves. Hint: it's easy.

dependent/build.gradle
configurations {
  compile
}

Running this now nets us a message "hello" and an empty file at dependency/myfile. However, in more complex scenarios you'll find that there is a problem. Firstly, dependent:build will not build incrementally. Secondly, this will not, surprisingly, guarantee that dependency:build runs before dependent:build.

To solve the first problem you can use the dsl quite neatly:

dependency/build.gradle
task build(type: Exec) {
  commandLine "touch", "myfile"
  outputs.file file("myfile")
}

By declaring outputs (and by the way, there is an "inputs" property too), we can prevent gradle from needlessly regenerating this file. But what's wrong with our order?
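Before we answer that, note that a task can declare both sides; with inputs and outputs declared, gradle will skip the task entirely (marking it UP-TO-DATE) when neither has changed. Here generate.sh and spec.txt are placeholders of my own:

build.gradle
task generate(type: Exec) {
  commandLine "generate.sh", "spec.txt"
  inputs.file file("spec.txt")
  outputs.file file("generated.txt")
}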

The problem is that gradle doesn't know which tasks in :dependency generate the artifacts needed by :dependent, nor which tasks in :dependent consume them. While declaring inputs and outputs should solve this problem in theory, it isn't scalable, and crossing project boundaries like that is against gradle's One True Way.

Telling gradle that :dependent:build should use the artifacts in the compile configuration is easy, so we'll fix that first.

dependent/build.gradle
task build << {
  println "hello"
}

build.dependsOn configurations.compile

It's strange how simple changes can clear away so much fog. Unfortunately, there isn't an easy fix for the next problem.

What gradle expects to see is something like this. Depending on a project means you declare a relationship with its artifacts. Artifacts are matched up to configurations and tasks. While you can specify exact artifacts from a project by having multiple configurations, the standard DSL format expects a configuration named "default".

dependency/build.gradle
configurations {
  default
}

artifacts {
  default build
}  

Here we give our dependency a default configuration that's pulled in by our dependent project. Then, by use of artifacts, we can match that configuration to a specific task. However, since default is a keyword in Groovy, we actually have to do this instead:

dependency/build.gradle
configurations {
  delegate.default
}

artifacts {
  delegate.default build
}  

Be aware, we're in the belly of the beast. Our problems don't end here. The next problem you'll see if you try to run this is that "build" is not a task of type "AbstractArchiveTask". To truly get this code working, you would need to create a plugin which defines a new task type implementing the right methods. However, you don't want to do that, because it's too much work, and that's why you ditched maven in the first place. Fear not, there is an easy hack to solve your woes.

dependency/build.gradle
configurations {
  delegate.default
}

task jarHack(type: Jar)

artifacts {
  delegate.default jarHack
}

build.dependsOn jarHack  

And now we have made gradle happy by appeasing its focus on Java. The important part is not the big picture, but the details. You should now have a better grip on gradle than most, and there should not be much in a build process that escapes you.

By doing it the real way rather than with this hack, you can share the artifacts with your other build processes. However, I am not qualified to describe that process, so I will leave it in your hands to do so and maybe blog about it. Or maybe I'll get around to it eventually.

In summary

Hopefully you can now look at buildscripts and see through the DSL and into the core. If you have any questions, feel free to leave a comment and I'll try to clarify any subjects/topics that I missed.

Happy gradling!

Sunday, March 16, 2014

Programming languages are an artist's medium.

I'm an opinionated programmer, and I like that about myself. I often find myself in the next debate about best practices, ways a program could be written differently, and ways that languages could be designed differently. And one thing that's surprising about these debates is how often the phrase "a language is a tool" comes up. When I said static methods are almost always bad, I heard this:
A language is a tool. To not use all of its features is to not use it to its full potential.
Later, when I kind of wanted to remove the "=" operator in my language Wake to prevent that classic assignment-inside-an-if-condition bug, my C++-minded friend also told me that languages were tools. This time the point was that I can't prevent their misuse, and should leave them alone.

In a way the metaphor is perfect. I think both of those objections to my ideas, valid or not, were truly more along the lines of, "Don't reorganize the tools in my workshop, and definitely don't break them."

Building an App in Hammer

The idea that x is a tool is a truism. One definition of "tool" you will find online is a means to an end. Another one that I saw said a tool is something used in the practice of a vocation. So sticky notes are tools. Chairs are tools. Ideas are tools.

For such a broad definition of "tool" as this, those two arguments from my friends often don't apply. You probably don't think it's a good idea to enact every idea you've ever had, just to use your ideas to their full potential. You definitely wouldn't say that sticky notes couldn't use stronger adhesive, just because using sticky notes will always fundamentally require knowing which surfaces they adhere to best.

And if C++ was the hammer of choice for your latest app, you wouldn't say you built that app in hammer.

Viewing Programming Languages as a Medium

I read a blog post by Jamis Buck stating that Java is the LEGO of programming languages...Ruby is the Play-Doh. You can build a house in Legos and you can rebuild it later in Play-Doh. His metaphor also coincidentally communicates the difficulty of swapping and merging languages.

Applications are sculpted in our medium of choice. That makes our compilers, IDEs, and VCSs into our tools - like our glazes, our bench, and our catalogs. Some languages have to be fired to finalize the end result and others don't, like clay vs marble, C++ vs PHP. Some materials can be glued together beautifully and some not so much -- depending on whether you view the JNI layer as "pretty".

Problems in a language (such as the classic assignment-in-an-if-statement problem) can be viewed as bubbles in your medium, waiting to appear without warning. Problems in a language's usability can be compared to malleability, fragility, the difficulty of working against the grain.

It's been said that Java is naturally untestable, and I would agree insofar as the easiest (most intuitive) way to build an app out of Java, PHP, or C++ often leads to rigid, untestable code. That is why I am in the process of creating Wake: to create a medium that can be intuitively built into testable structures, but without the Play-Doh of Ruby. It is also my goal to make it compatible with as many other mediums as possible, something that a few recent languages have attempted.

Languages are a Medium, so Insert Lesson Here

Since a medium is a tool, many expressions involving tools translate directly over to mediums. You'll see that this isn't a meaningless distinction of semantics, but rather that there really are more applicable insights that follow from a single changed word.
Pick the right medium for the job.
Know when your app has to be light and fast, or strong and durable. Choose a medium sculpted by techniques you already know. Unlike tools, be careful to only mix compatible materials.
A bad workman blames his materials.
Know your materials, their strengths and weaknesses. If the material is a poor choice, do whatever you can to change it. Unlike tools, sometimes projects are past the point of feasibly changing the medium. At this point, a good workman is of course allowed to blame somebody else's materials.
Don't work for the material, make the material work for you.
For tools like a hammer, this is an encouragement to sometimes just use the handle, or to occasionally use it to prop open a door. In other words, it's a way of saying you may be doing too little with your hammer. For a material, however, this advice warns you of doing too much. Don't go against the grain. Don't make deep gashes that may have to be filled in later. See the shapes already present in your block of wood and blend them into your design. Do as little with the material as you have to in order to get a final product.

Lastly, My Own Personal Advice

Languages are a medium. The future will always offer new and improved concoctions. Do not be afraid to learn how to work with them.

Sunday, January 19, 2014

The power of Eclim vs. the stranglehold of IDEs

It took about a year of my professional programming career for me to grow out of IDEs. At the time I was working on a very large project (read: over a million lines of code) and of course inexperienced to boot. On day one, I was given a netbeans installation, and I was wowed by the code completion.

My first C++ development was done on Code::Blocks, and I believe I used JBuilder on my first java programs. At the time it simply "was the way," because I had followed guides that led me there. I didn't know that it was a giant horrible dominating slow layer on top of powerful highly optimized tools, and besides, I was so much slower to think than to write.

There were a couple things about netbeans that drove me crazy. The project was so big that code completion popups were slower to appear than typing everything out myself. Crashes led to lost work. My favorite, though, was new file creation. If you used netbeans to make the file, it would sometimes hang for a couple of minutes. However, you could get around this by making the file from a terminal, and netbeans would pick it up right away.


Going without an IDE, a million lines at a time

I went cold turkey one day and couldn't have been happier. I learned more than just :q!, :w, and :wq in vim, and found myself with a natural, fully-fledged scripting language beautifully molded into the editor. In 15 minutes you can create auto-comment commands, a helper for internationalizing a template file, and so, so much more. I was free.

I found that I was even more obsessed with type-safety and clear, reliable method names. I learned grep really well, and sed, and find -exec...

Not remembering a method name led to my setting up project documentation generators. Needing to know which classes did what turned me into an encyclopedia of sorts; the other developers would usually ask me about the code in a subsystem that neither of us had touched since last May.

The problem with IDEs is they are written...backwards. The culture of *nix is to make features out of existing tools. IDEs do use existing tools to ease compilation, debugging, and more, but usually by, say, pretending .classpath doesn't exist and creating a new, unintuitive, unnecessary graphical editor for a simple xml file. And then all of the features of powerful, mature editors are slowly rebuilt from scratch. IDEs don't care if you love emacs or vim or any other hardcore editors out there; they think they know what you want better than you do.

Eclipse is literally a home-grown window manager, running a home-grown file browser, housing a home-grown editor, with a tremendously complex plugin API if you want to extend it. Plugins are great, but sometimes I just want simple inputs and outputs that I can leverage from my own context.

I tried Vrapper, an eclipse plugin that brings "vim-like editing" to the main editor, but it had so many issues. Rectangular select didn't work for me, vim-style redo/undo didn't work for me, ^C wouldn't trigger <ESC>, and the popups appearing with unpredictable timing made muscle memory unreliable.

It isn't Vrapper's fault - it is, after all, re-implementing a very complex interface from scratch through plugins to a very complex engine, instead of querying a very complex engine from an already-built complex interface.


Eclim, the best of both worlds

I tip my hat to the creators of Eclim. It has automatic imports, scope-sensitive refactoring, type-safe autocompletion, project tree browsers, error reporting, and provides helper methods to edit .classpath should you desire them. All of these features are simple vim functions, provided by a plugin that talks to a headless eclipse daemon.

Eclim has proven to me in just three days:
  1. that it's fast
  2. that it's powerful
  3. that it stays out of your way
  4. that it has great docs
  5. that it has very few bugs (error reporting may get out of sync with your edits)
  6. that it has almost every feature you want
  7. that it's easy to install
The best part, by far, is that if you have any gripes with the way it does things, you can probably make it work how you want with just a little vimscript knowledge.

We live in a world of awful IDEs and powerful editors, and it's about time someone unified them.

Friday, January 17, 2014

A log fought battle

I wouldn't be the first to say that static methods are the root of all evil, but I also wouldn't be far from the truth if I did.

In case you aren't aware already, static methods reduce flexibility, testability, and take the Object out of your Object-Oriented code.

It's a wonder that so many people don't see this, when it's in the name - and yes, it deserves it. Forget for a moment that statics are simply namespaces and global state; a static method or a static value is the opposite of a runtime method or a runtime value. I don't mean it's more type-safe, I mean it's cement. It's the anti-if-statement. It's stealing the code from another class and pretending that doesn't violate the single responsibility principle.

Using something static means... forcing exactly my desired behavior on everyone who uses my code, as if that weren't the opposite of flexibility.

If you don't know what I mean, take this example:

class PasswordVerifier {
    public boolean verify(String password, byte[] hash) {
        // Arrays.equals compares contents; calling equals() on an array
        // only compares references and would never match.
        return Arrays.equals(Crypto.md5(password), hash);
    }
}

Now say a few months pass. I no longer want to use md5, since sha256 is much safer, and I found out that I should use a unique salt for each user to prevent rainbow table attacks. And, hashes being irreversible, I still need to support the old database that used plain md5. The old implementation of PasswordVerifier is just plain wrong.

Isn't that normal, though? Changing requirements lead to changing code. Of course, now this function must accept a salt, and maybe a User, to detect if we use md5 or sha256.
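
Concretely, the widened signature might end up looking something like this sketch (reusing the hypothetical Crypto helpers, and assuming the User knows its salt and whether it predates the migration):

class PasswordVerifier {
    // One possible widened signature: every existing caller now has to change,
    // just so verify() can choose between md5 and salted sha256 internally.
    public boolean verify(String password, byte[] hash, User user) {
        byte[] computed = user.getIsLegacyPassword()
            ? Crypto.md5(password)
            : Crypto.sha256(user.getSalt() + password);
        return Arrays.equals(computed, hash);
    }
}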

But a changing method signature means changing all other code using it. It means I can't merge the change into the version one branch after solving the bug in master.

On the other hand,

interface Hasher {
    byte[] hash(String input);
}

class PasswordVerifier {
    public boolean verify(String password, byte[] hash, Hasher hasher) {
        return Arrays.equals(hasher.hash(password), hash);
    }
}

Now I can change how the code operates at the different times it's used, not just at the different times it's written.

In my bug fix on master, I just do

boolean valid = verifier.verify(password, hash, new Hasher() {
    @Override
    public byte[] hash(String input) {
        // Legacy accounts keep their original plain-md5 hashes;
        // everyone else gets salted sha256.
        return user.getIsLegacyPassword()
            ? Crypto.md5(input)
            : Crypto.sha256(user.getSalt() + input);
    }
});

And here you will see the other problem with static methods: so far this post has focused on why not to use them, yet I just did with Crypto.sha256(). This is actually intentional, and demonstrates why not to write static methods. It's simple; if you write them you will force others to use them.

I have fought a long battle over the (in)convenience and (in)flexibility of static methods, and in that time I've learned that my argument is more likely to be heard if I "concede" on at least one point. I can use that point to explain why a line is drawn and where I draw it. After all, it's a hard sell to say static methods are always bad.

The example I freely gave of a "good" static method is a logger. After all, loggers are never the condition of a test, never complicate writing tests, never change APIs, and never need to be done in multiple ways, right? And in the rare case they do, it's easy to refactor and definitely won't get merged into the v1 branch.
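
The shape of that concession, roughly -- the usual pattern I was blessing (a sketch using java.util.logging; the class name is hypothetical):

import java.util.logging.Logger;

class OrderService {
    // The conventional static logger: one per class, wired at class-load time,
    // reachable from anywhere, swappable from nowhere.
    private static final Logger LOG = Logger.getLogger(OrderService.class.getName());

    void placeOrder(String id) {
        LOG.info("placing order " + id);
    }
}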

Every time I said loggers should be static I cringed just a bit, because I knew the static implementation was still flawed and unnecessary. And still I forced myself to believe it, just to be even slightly moderate.

Now please, if you will, import the static logger android.util.Log;

A week ago the logcat app on my android test workstation started acting strangely. It would crash unexpectedly, clear when I didn't want it to, and it decided not to print exception traces ever again. Here I am running tests on an android library, and reading the debug output requires unplugging my android motherboard from my monitor, plugging it back into my development workstation, opening a terminal, and running the adb logcat command.

I eventually rewrote every Log.v(Tag, String) call to print on the interface, a change that could have been as easy as subclassing the app's logger instance. Of course I cannot merge these changes into master as it is for test purposes only. And I thought nothing of it.

It's strange how quickly afterwards I lost another day over this static logger.

Today I began unit tests on that framework. To keep the tests fast enough that they'll run on every build, I took only the code with no relationship to the android framework. With no need for a slow android emulator, I wrote a test that could run anywhere java is installed.

    1 test failed, 0 tests passed, 1 test total.
    MyTestClass.myTestMethod: Stub!

Because the component I am testing is written for android, even the code that is pure business logic and has nothing to do with android will use android.util.Log. But android apps don't compile against the real code. Google instead gives you a special .jar file in which every single method call, including Log.v(), throws an exception until you run it on a real device.
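
For the curious, every stubbed method in that jar effectively boils down to this (my paraphrase, not Google's actual source):

public static int v(String tag, String msg) {
    throw new RuntimeException("Stub!");  // the entire "implementation"
}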

In the end I found that I could include Robolectric to generate bytecode that overwrites the "Stub!" implementation of android.util.Log, but not without bringing in a huge compilation dependency, tearing my hair out, and craving a systemwide refactoring.

It would've been a hell of a lot easier if these classes accepted a non-static Logger instance as a dependency, and I could pass in my own NoopLogger or InterfaceLogger, and go home on time.
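
To make that concrete, here's a minimal sketch of what I mean -- the Logger interface and the class names here are mine, not android's:

interface Logger {
    void v(String tag, String message);
}

// Perfect for fast, android-free unit tests: logging becomes a no-op.
class NoopLogger implements Logger {
    public void v(String tag, String message) {}
}

// The business logic depends on the abstraction, not on android.util.Log.
class SyncWorker {
    private final Logger log;

    SyncWorker(Logger log) {
        this.log = log;  // injected, so any environment can supply its own
    }

    void sync() {
        log.v("SyncWorker", "starting sync");
        // ... pure business logic, no android classes required ...
    }
}

In production you'd hand it an adapter that delegates to android.util.Log; in the unit tests, new SyncWorker(new NoopLogger()) runs anywhere java does.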

It's a log fought battle, but I will keep fighting it, and I hope you do the same.