Friday, January 9, 2015

The Visitor Pattern vs Object Composition

Over the last week or so, I've been going down a large refactoring of the wake compiler. And in all the hours I've spent rewriting the parse tree's data structure, I have yet to choose between two design patterns. Its possible that there is already a third that unifies the two, and if not then I'd like to share this problem as a little hole in computer science.

The design problem is as follows: say you want to traverse a binary tree.

We'd probably do something like:

class BinaryTree {
    Object item;
    BinaryTree left;
    BinaryTree right;
}

From here you have two choices. Either create your algorithms as methods on the tree:

   public Object findByPath(int binaryPath) {
       if(!binaryPath) return item;
       else if(binaryPath & 1) {
            return left.findByPath(binaryPath << 1);
        } else {
            return right.findByPath(binaryPath << 1);
        }
    }

or you could create a class for each algorithm that operates on the tree.

class BinaryPathFinder {

    public Object findByPath(BinaryTree root, int binaryPath) {
         if(!binaryPath) return root.item;
         else if(binaryPath & 1) {
             return findByPath(root.left, binaryPath << 1);
         } else {
             return findByPath(root.right, binaryPath << 1);
         }
     }
}

This could at times be the better approach -- a binary tree has many types of algorithms that could be performed on it, so this separates the algorithms from the data structure and allows users of your API to perform operations on your data that you didn't forsee.

But our tree isn't very typesafe right now. We'd be able to, for instance, set only a left node, or a right node, and/or set both nodes and a value at once. Maybe that's expected, but maybe not. Solving this typesafety is where we create the key differences between the Visitor pattern and plain 'ol object composition.

So, we can easily make it typesafe by going with the first approach:

interface BinaryTreeNode {
    public Object findByPath(int binaryPath);
}

class BinaryTreeLeaf  implements BinaryTreeNode {
    Object item;
    public Object findByPath(binaryPath) {
        return item;
     }
}

class BinaryTreeFork implements BinaryTreeNode {
    BinaryTreeNode left;
    BinaryTreeNode right;
    public Object findByPath(binaryPath) {
         ....
    }
}

But the problem is that we cannot decouple this data structure from its algorithm without actually worsening our typesafety with reverse casts and instanceof checks, turning BinaryTreeNodes into BinayTreeForks and BinaryTreeLeafs. In fact, without our algorithm's method 'findByPath(int)', our BinaryTreeNode interface would be 100% empty and 100% useless.

The Visitor pattern is a simple but clever solution to this problem.

interface BinaryTreeNode {
    public Object visit(BinaryTreeVisitor visitor);
}

interface BinaryTreeVisitor {
    public Object visit(BinaryTreeLeaf leaf);
    public Object visit(BinaryTreeFork fork);
}

class BinaryTreeLeaf implements BinaryTreeNode {
    Object item;
    public Object visit(BinayTreeVisitor visitor) {
        return visitor.visit(this);
    }
}

class BinaryTreeFork implements BinaryTreeNode {
     BinaryTreeNode left;
     BinaryTreeNode right;
     public Object visit(BinaryTreeVisitor visitor) {
         return visitor.visit(this);
     }
}

class BinaryTreeFindByPath implements BinaryTreeVisitor {
    public __construct(int binaryPath) {
        ...
    }

    public Object visit(BinaryTreeLeaf leaf) {
        ...
    }

    public Object visit(BinaryTreeFork fork) {
        ...
    }
}

Note: this may technically be a variation from the visitor pattern. The visitor pattern is often used to attach visitors to the tree in one step and then operate on it separately, while this does both in one step.

This visitor is more complicated here, because it needs to track the state of binaryPath, but the important thing is that its typesafe and decoupled. Especially assuming a stateless algorithm, this could once again be the better choice.

But there's an advantage to the non-visitor solution that gets lost once we make that change. Object composition allows us to create invisible little nifty wrapper classes around BinaryTree. For instance, we could write a quick little caching layer.

class BinaryTreeCache implements BinaryTreeNode {

    public __construct(BinaryTreeNode wrapped) { ... }
    public Object[] cache;
    public Object findByPath(int path) {
        if(cache[path]) return cache[path];
        else return wrapped.findByPath(path);
    }

}

This allows any single node (leaf or binary) to be cached, without caching all of them. You could use inheritance here, but you'd have to choose whether to extend BinaryTreeFork or BinaryTreeLeaf. With a composition based approach, you can extend a leaf, a branch, or one of your other wrapper classes, all seamlessly.

Notice that this behaviour would not work with the visitor pattern. Without extra code for detecting a change in the visitor, you would return an outdated, possibly nonsensical result. This is because we have no longer fully decoupled behaviour from the tree, and it may step on what is supposedly the visitor's job. It's supposed to be the visitors job to handle these behaviour changes, but the visitor can not easily add caching over individual nodes in as nearly elegant of a way.

You could definitely try to mix the two patterns naively, and for some things it would work. For instance, you could make a BinaryTreeNode that lazy-loads its contents from a file, which the visitor would be able to traverse without issue. But any tweaks to the behavior of your algorithm would not be guaranteed to play nicely with the visitor (timeouts, threading, probabilistics, distributing across servers).

Essentially, the object composition approach gives you the ability to separate how to configure an algorithm from the actual moment it runs. If you want a cache at every three layers and a distributed answer for every five, you can put all those if-statements into code that builds or transforms the tree, instead of putting those if-statements inside the code that operates on the tree.

Putting my rambling about the greatness of object composition aside, the advantages here specifically include:
  • Decoupling edge-case code from main use-cases
  • Separating configuration step from operations step
  • Code-driven configuration for maximum flexibility
  • Infinite layerability of edge-case wrappers
  • Easy to extend behaviour without breaking SRP
  • Small classes
  • Possibility of tree transformers that transform your configuration


By giving the visitor the responsibility of behaviour instead of the nodes, we need to be able to compose the visitor in order to compose behavior in these neat ways. But if we wrap or extend the visitor class, we end up changing the way that every leaf or branch is used. I can't see any easy ways around this.

So the question is, could the visitor pattern be upgraded to allow object composition to change the behaviour of the algorithm on a node by node basis? I'd welcome the solution, and it would live happily in the belly of the wake compiler to stand the test of time.