Tuesday, August 04, 2009

Variations on the visitor pattern

In the current project I've written so far at least 20 class hierarchies (+ their visitors) representing ADTs.

The common scenario is that we have some data which is created by one module and consumed by others. Since the data is shared by 2+ modules, most operations on it require several services from several modules.
So we're out of luck with the Interpreter Pattern, we use the Visitor Pattern/Double Dispatch:

abstract class Expression {
....static class Number extends Expression {
........public final int n;
........Number(int n) { this.n = n; }
....}
....static class Plus extends Expression {
........// since the data is immutable, there makes no sense to create getters
........public final Expression left;
........ public final Expression right;
........Plus(left, right) { this.left = left; this.right = right; }
....}

....// toString, equals, hashCode are implemented with apache commons-lang, based on reflection up to the Expression class
....}

The classic Visitor pattern says:

interface Visitor {
....void visit(Number n);
....void visit(Plus plus);
}

and

abstract void Expression#accept(Visitor v);
....void Number#accept(Visitor v) { v.visit(this); }
....void Plus#accept(Visitor v) { v.visit(this); }
// noticed the copy-paste !? (imagine doing that 40+ times...)

In this example the visitor is a stateful closure which encapsulates a computation its initial data and its final (accumulated) result.

The problem which might arise is how do we handle adding more data type (i.e. classes) to our hierarchy. See the Expression Problem, Extensible Visitor/book (more on google ;).
The proposed solution is to drill a hole in the type safety, in order to make old code compatible with new code:

interface Visitor {
....void visit(Number n);
....void visit(Plus plus);
....void otherwise(Expression x);
}

Now we want to make the visitor stateless, maybe some visitors could be injectable services, eg: ExpressionPrinter, ExpressionEvaluator etc. In this scenario we want to extract the initial data & the result from the visitor:

interface Visitor[A,R] {
....R visit(Number n, A args);
....R visit(Plus plus, A args);
....R otherwise(Expression x, A args);
}

we also need to change the signature of the Expression#accept

abstract [A,R] R Expression#accept(Visitor v, A args);
....[A,R] R Number#accept(Visitor v, A args) { v.visit(this, args); }
....[A,R] R Plus#accept(Visitor v, A args) { v.visit(this, args); }
// noticed the copy-paste !? (imagine doing that 40+ times...)

In this scenarion we can pass the initial data as args , the Expression will pass that to the visitor, which computes the final result R.

Another variation is when the visitor returns an element of the type which he visited. For that we need to modify the Expression:

interface Expression {
....E accept(Visitor v, A args);
....static class Number implements Expression { ... }
}

In this case we might use the A args in order to accumulate other results than the specified E type (we use the A as an accumulator).

If we want to really keep things clean and separate input from output, we could return on accept a tuple[R, E], but that's not needed in 90% of the cases.

Compare that with pattern matching on case-classes:
* no need for toString, equals, hashCode since they are generated by the compiler
* no need for visitors due to pattern matching
** no issues with statefull visitors
* no need for manual written ctors since they are written by the compiler,
* we get cca 50% less code to maintain.

Now scale that to 20+ class hierarchies...

No comments: