Monday, June 9, 2014

Why You Want to Test Private Methods

As programmers, we want to leave behind an API that "can't go wrong," that doesn't cause bugs by being used in the wrong order. Often the first tool we grab to do this is private methods.

This leads to a classic question.
How do I test private methods?
And the standard answer, why do you want to?

It's a response that makes one feel like a wise, bearded, mysterious master of code. But I already know the answer, for the vast majority of cases. The answer to why someone wants to test a private method is:

They shouldn't be using a private method.

At least, this is the answer 99% of the time.

But Not Me, Seriously

Odds are against you. Lets see what private methods should not necessarily be used for. Odds are pretty high that your reason for using private methods is in this list.
  • Keeping inner data structures coherent
  • Reducing duplicate code
  • Isolating pure functions from stateful ones
But I've made a list of cryptic nonsense. This is a time for examples.

Keeping inner data structures coherent
class Teacher {
    private String[] students;

    // Use this instead of adding directly to student, so
    // that its always alphabetical.
    private void addStudent(String student) {
        students.add(student);
        alphabetizeStudents(); 
    }
    // some public methods... 
}

Here we wrote a private method to handle writes into our array of students, and maintain their coherence - in this case, alphabetization. You then think, alphabetization is easy to test. How do I test this?

Reducing duplicate code
class StudentCsv {
    private writeLine(String... vals) {
       bool first = true; 
        for(String val : vals) {
            File.write(val);
            if(!first) File.write(',');
            else first = false;
       }
    }
    // some public methods... 
}

Here we found yourself writing a lot of comma management in this CSV generator. You think, lets abstract this into a method. And comma separation is easy to test, but now its private. How do I test this?

Isolating pure functions from stateful ones
class CollisionDetector {
    public long euclidianDistance(Point a, Point b) {
       return sqrt(sqr(a.x - b.x) - sqr(a.y - b.y));
    }
    // some public methods... 
}

Your heart tells you a distance function doesn't belong buried inside other code in your collision detector. Why? Its stateless. So it could be anywhere, and its reusable; maybe you'll use it somewhere else in CollisionDetector. Its not reusable otherwise yet, but that doesn't matter. Don't people say to make code usable before reusable? And so now the question in your mind is, how do I test that it's usable?

Make It Reusable

One of the reasons why testable code is so highly related to good code is that it enforces decoupling. To test your code, don't make the method public, make the method reusable.

We want our solution to keep our APIs clean and impossible to use wrong. And of course we can keep that. We can't call private methods of course, just like you can't call a public method on a private class member.

This gives us a simple pattern for preserving encapsulation, honoring the Single responsibility Principle, and making your private methods reusable.
  1. Create a new class to handle whatever responsibility that private method had. Suppose we're refactoring Recipe and creating a class Ingredient.
  2. Move any other Recipe code which belongs to the IngredienSet responsibility into the IngredientSet code.
  3. Give a new private member of type IngredientSet to Recipe

And that's it. Let me apply this to the simplest example:

Isolate pure functions from stateful classes
class DistanceCalculator {
    public long euclidianDistance(Point a, Point b) { ... }
}

class CollisionDetector {
    private DistanceCalculator distCalc = new DistanceCalculator();
    // some public methods...
}

By creating a class DistanceCalculator, we now have the ability to check distances anywhere in our code, including our tests. At the same time, we haven't polluted CollisionDetector's interface to include distance checking that isn't its responsibility.

Next up, why are we coupling CSV output directly in our StudentCsv class? Reducing code is good, but you missed the fact that you can:

Reduce responsibilities in a class
class Csv {
    public void write(File file, String... values) { ... }
}

class StudentCsv {
    private Csv = new Csv();
    // some public methods
}

It only took creating a CSV class to handle comma separation, and now anyone can quickly create a new report. Our student CSV only handles gathering the student information and feeding it into the CSV generator.

Finally, we're up against our alphabetized students listing. While making it testable is reward enough on its own, changing our implementation will also:

Actually ensure data coherence
class AlphabetizedList extends ArrayList<String> {
    @Override
    public void add(String s) {
        parent.add(s);
        alphabetize();
    }

    private void alphabetize() { ... } 
}

class Teacher {
     private AlphabetizedList students = new AlphabetizedList();
}

Here we created a subclass of ArrayList which imposes a limitation on its superclass. On each write, it alphabetizes itself. Now our Teacher class can simply treat its list of students as a list of students, and its impossible to have any code which changes the list and doesn't also alphabetize it.

If you're still actually reading my code examples, you noticed that something special about this last code example. It contained a private method, alphabetize() on AlphabetizedList!

When to Use Private Methods

This is where I reaffirm that private methods have a place, and describe gives alphabetize() the privilege.

Except, alphabetize() should not have been a private method. It was a trick. Alphabetize is an example of isolating pure functions from stateful classes. It is entirely conceivable that someone may want to one day alphabetize some other list. The fact that I made it private in the first place should be an immediate clue that the code might deserve a class of its own.

So when is it correct to use private methods, if ever? I could come up with many longwinded and vague, highly-nuanced answers, and it wouldn't help anyone at all. So I'll give you the practical, albeit informal answer instead.

Use private methods any time you already would have, and when you neither want to test, nor reuse the code.

Of course, if you probably will test/reuse it in the future, you might want to extract it into a class now before it gets ingrained into the system as a bad choice.

So next time someone says in a smarmy voice, "Why do you want to test a private method?" make sure to give them the answer.

Monday, June 2, 2014

Compression Driven Development

Good Code is Hard to Measure

And all too often, even good programmers get into a battle of the-least-code-wins. And not to the point of code golf, I mean real production code that's tightened as much as possible.

Codebases all eventually get bloated and buggy and hard to navigate. We all know that less code means less code to trudge through, and fewer headaches. And of course, where bloated code is annoying, duplicated code is an inexcusable bug waiting to happen. This would seem to indicate that excess code causes all of our woes.

A recent blogpost touts these facts as the reasons for "Compression-oriented Programming," telling programmers to develop in this process:
Begin by just typing out exactly what [needs] to happen in each specific case, without any regard to “correctness” or “abstraction” ...when doing the same thing a second time somewhere else, ...pull out the reusable portion and share it.
But before we slip back into focusing on the battle of the least code, let's explore the consequences of taking this methodology too far.

What Does Bad Compression Look Like?

Let's start with some repetitive code, and let's see how badly this advice could be followed.
cairo_set_color(cairo, 0, 20, 20, 30);
cairo_line_to(cairo, 7, 8);
cairo_set_color(cairo, 0, 30, 30, 20);
cairo_line_to(cairo, 8, 9);
cairo_set_color(cairo, 0, 20, 20, 30);
cairo_line_to(cairo, 9, 10);
cairo_set_color(cairo, 0, 30, 30, 20);
cairo_line_to(cairo, 10, 11);

This code draws from 7,8 to 8,9 to 9,10 to 10,11 in alternating colors. It shouldn't be hard to figure that out, and it shouldn't be that hard to maintain.

But let's follow the compression-oriented programming guide and say that these eight lines are certainly doing something more than once. Let's see what we are doing repeatedly that we can streamline.

We are
  • setting the color to 0, 20, 20, 30 twice
  • setting the color to 0, 30, 20, 20 twice
  • drawing from x, y to x+1, y+1 three times
  • using 'cairo_set_color(cairo' four times
  • using 'cairo_line_to(cairo' four times.
So let's "clean this code up."
#define my_cairo_set_color(a, r, g, b) (cairo_set_color(cairo, a, r, g, b))
#define my_cairo_line_to(x, y) (cairo_line_to(cairo, x, y))
#define cairo_move_to_next_xy my_cairo_line_to(x, y); x++; y++;

void cairo_set_subtle_blue_grey() {
  my_cairo_set_color(0, 20, 20, 30);
}

void cairo_set_subtle_red_grey() {
  my_cairo_set_color(0, 30, 30, 20);
}

run() {
  int x = 7; int y = 8;
  cairo_set_subtle_blue_grey();
  cairo_move_to_next_xy;
  cairo_set_subtle_red_grey();
  cairo_move_to_next_xy;
  cairo_set_subtle_blue_grey();
  cairo_move_to_next_xy;
  cairo_set_subtle_red_grey();
  cairo_move_to_next_xy;
}

Once you understand the domain language, the 'cleaned up' code is certainly more literate. But at a huge cost! I have deliberately chosen bad abstractions, such as macros using invisible variables, and hidden alteration of state.

But our code in run() is still so redundant! And after all that work! Lets see if we can fix it in another pass of our compressor.
run() {
  int x = 7; int y = 8;
  for(int i = 0; i < 4; i++) {
    i % 2 ? cairo_set_subtle_red_grey() : cairo_set_subtle_blue_grey();
    cairo_move_to_next_xy;
  }
}

Now this is what we wanted! It might invoke runtime costs but hopefully our optimizing compiler will see through it. It might take a split second to determine whether red or blue will come first, and there's still hidden variables and hidden alteration of state, but overall it's much more compact.

I'm going to 'clean up' some duplication in our macros and helper functions, and then here is our final result.
#define call_cairo(fn) fn(cairo,
#define my_cairo_set_color(a, r, g, b) (call_cairo cairo_set_color a, r, g, b))
#define my_cairo_line_to(x, y) (call_cairo cairo_line_to x, y))
#define cairo_move_to_next_xy my_cairo_line_to(x, y); x++; y++;
#define select_hue(expected) (hue == expected ? 30 : 20)
#define HUE_RED 0
#define HUE_BLUE 1
#define HUE_GREEN 2

void cairo_set_subtle_grey_hue(int hue) {
  my_cairo_set_color(0, select_hue(HUE_RED), select_hue(HUE, GREEN), select_hue(HUE, BLUE)); 

void cairo_set_subtle_blue_grey() {
  my_cairo_set_subtle_grey_hue(HUE_BLUE);
}

void cairo_set_subtle_red_grey() {
  my_cairo_set_subtle_grey_hue(HUE_RED);
}

run() {
  int x = 7; int y = 8;
  for(int i = 0; i < 4; i++) {
    i % 2 ? cairo_set_subtle_red_grey() : cairo_set_subtle_blue_grey();
    cairo_move_to_next_xy;
  }
}

Hey! I Can See Through Your Tricks

The problem that I forced in my example compression is that I chose the wrong abstractions in the wrong order, and focused on the wrong things.

I compressed code paths that are incidental to the program, instead of paths that are inherent to the problem domain.

And, as a cherry on top, I compressed in ways that I knew would simply lead to another round of compression.

How extreme is this example? I don't think it's that extreme. Once the code has been split out to this level abstraction, I don't think any programmer is going to fix it any time soon. I've made a pretty tangled hairball in just 27 lines of code, who would risk fixing it if it were spread out across a hundred lines? A thousand? Five thousand?

That novice developer focused on removing duplicate code went too far, and nobody stopped him, and nobody will undo his mistake.

How it Should Have Ended, in Six Lines

int x = 7; int y = 8;
for(int i = 0; i < 4; i++) {
  i % 2 ? cairo_set_color(cairo, 0, 20, 20, 30) : cairo_set_color(cairo, 0, 30, 20, 20);
  cairo_move_to(x, y);
  x++; y++;
}

This code is more maintainable than the eight-line version, but it is also harder to read. Given how this repetitive code is rare, we've maybe turned an 8k simple-to-follow codebase into a 7k harder-to-follow codebase. Its not exactly worth a raise.

Abstraction Trees

Code is hierarchical, and that hierarchy often (though not always) needs to be understood in order to solve bugs, add new features, and refactor code. Any time you consolidate code you heighten the abstraction tree.

Following the abstraction tree of the good example, we must
  1. understand that we'll run the code four times
  2. understand the state change each time
  3. understand the color swap each time.
In our bad example, we must additionally
  1. understand that my_cairo_move_to uses & modifies state
  2. understand call_cairo's syntax trickery
  3. understand how hues in RGB work
  4. understand that select_hue chooses a high/low value to match

We've doubled what we need to know in order to follow the code from top to bottom, and that is a huge price to pay.

So when is compression-driven development worth pursuing? When is it an antipattern?

Compression-driven Development is an Antipattern

But not always.

You are turning yourself into a compression algorithm with the restriction of syntactically valid results. Don't do it.

Approach the problem of duplicated code as a domain-specific problem. Don't solve it with the first abstraction that comes to mind, solve it in the way that the problem-space is necessarily tied to.

Abstraction should be used when the behavior you are compressing is clearly part of the problem domain, and when the difficulty of traversing the abstraction tree from top to bottom is clearly lower than the cost of modifying duplicated code.

Compression-oriented programming can give you insights on good vs bad abstractions. Compression-driven development (where compression is the means as well as the end) is only ever incidentally good to a codebase at best, and actually encourages the creation of monolithic, bloated codebases at worst.