Game of Pricks

You can glean a lot from a type signature. An example (in Haskell):

mystery :: a -> [a] -> Bool

Haskell

There are really only a couple of things this function can sensibly do. Further, because this is Haskell (which wisely shuns false economies like mutation) we can tell that mystery does not depend upon nor modify any global mutable state.

Well-crafted object-oriented code can imply similar behavioural attributes through method signatures. An example class:

class RegistrationService(object):
def __init__(self, userRepo, mailService):
# elided

Python

Just from the constructor the reader can infer that RegistrationService is going to be doing something with users — maybe updating the database — and something with mail — sending a welcome message perhaps. We can immediately see the interactions this class has with the rest of the system, and we can see exactly what dependencies have to be satisfied, either in production or test code.

It can be difficult for client code to wire up those dependencies but we have dependency injection these days to take care of this. Now, injection is fantastic in production, but I’m weary of using it in test code1, so tests tend to directly invoke constructors with mocked dependencies in code.

The problem here is that as a class’ dependencies change over time, those instantiations in test code must also be changed. This is especially bad in poorly isolated “unit” tests that effectively hand-wire each layer in some jaunty incomprehensible pyramid, mocking only where absolutely necessary.

One “solution” is to reach for the deus ex machina of service location. I vehemently oppose this cheat. Service location is nothing more than global variables for people who should know better. We must eschew such “pragmatic” laziness for it obscures the requirements and behaviours of our class that are clearly observed and inferred when dependencies are specified up front in the constructor. Code becomes intractable and temporally coupled to the state of the service locator, and those tests you were trying to avoid maintaining are just going to start failing anyway because you haven’t added the new dependency to the service locator!

Instead we should view the pain of maintaining tests not as something to be avoided, but as a necessary signal that something is wrong and that we must fix it.

If a constructor specifies twelve collaborators as parameters, well that’s unfortunate but at least it’s honest. Those parameters are hard dependencies that must be satisfied for the code to compile. Culling them and replacing them with calls to a service locator does not remove the dependency, it doesn’t simplify or fix the system, it just hides the problem temporarily. So, reduce and manage your dependencies, even if it means splitting services into multiple classes, and focus your unit tests on individual units for Pete’s sake!


  1. Believe me, it has been tried: I advise against it.  

□ □ □

Yumbo

A dear colleague — the talented and ever-forgiving Mr Sigal — recently broached the subject of parsing yUML‘s concise graph DSL for class diagrams. A quick perusal of the samples evinced a simple (if terse) regular grammar. As such, a regular expression would be a sufficient parser, but that would miss out on some prime neckbeardery!

No, the perfect tool for the job would be a combination of two classic Unix tools: lex(1) and yacc(1). The basic workings of the two are quite similar — rustle up a hodge-podge of declarative rules and C-ish actions, run the lex or yacc binary over it, and out pops some lovely, brain melting C, ready to integrate however you please. The difference: lex produces a regular expression-driven lexer (or scanner; yylex) which reads a bytestream, splitting it up into tokens, while yacc produces a shift-reduce parser (yyparse) which endeavours to fit those tokens into a language you define using grammar rules.

You don’t have to use the two together (you can easily create your own lexer if you so please), but something about the names gives me the impression that they’re built to go together.

As an example, here’s a yacc grammar to recognize and evaluate simple arithmetic expressions1:

Expression : Expression Term '\n'   { printf("= %d\n", $2); }
|
;
 
Term : Term '+' Term { $$ = $1 + $3; }
| Term '-' Term { $$ = $1 - $3; }
| Factor { $$ = $1; }
;
 
Factor : '(' Term ')' { $$ = $2; }
| Factor '*' Factor { $$ = $1 * $3; }
| Factor '/' Factor { $$ = $1 / $3; }
| T_Number { $$ = yylval; }
;

Yacc

Note that $$ is the return value from the current rule, $n is the result of the nth part of the production, T_Number represents a numeric token identified by the lexer, and yylval is its value. The corresponding lex definition is simple:

[0-9]+      { yylval = atoi(yytext); return T_Number; }
[-+*/()\n] { return yytext[0]; }
. { /* ignored */ }

Lex

After a couple of hours playing with similar examples, I had a basic recognizer for a subset of the desired language. A couple more evenings and it covered about everything I needed, and collected the results into some simple data structures.

Next step: create some diagrams. I briefly flirted with the idea of generating ASCII diagrams, but it turns out to require some actual talent, and thus was out of my reach. However, after a bit of research it was clear that Graphviz could be coaxed into creating some pretty reasonable class diagrams. A couple more spurts of hackery and I had a generator for the dot language, and a web front-end to wrap everything up.

Here is the resulting program in webular form, and here is the source.


  1. All the important bits are elided; please refer to the repo for details.  

□ □ □

Worse Than Wrong

In a fit of mania, this happened.

I’m sorry.

□ □ □

Kiwi PyCon 2011

Your humble author was positively bristling with anticipation ahead of last weekend’s PieCon. Imagine the surprise when upon arriving I realised that this was no festival of appreciation for New Zealand’s culinary gift to the world – the meat pie – but instead a conference for adherents of the Python programming language!

Disappointed but not deterred, I endeavoured to make the most of the situation. After all, delicious pizza (almost as good as a pie) was to be supplied.

Jeff Rush’s presentation on metaprogramming was an early highlight, given my predilection for clever code. It left your author with a deep appreciation for the malleability of a language where abstract concepts (like method dispatch) are exposed in the same way as standard language operations (like dictionary lookups).

Two presentations addressed asynchronicity. Alex Dong of trunk.ly covered gevent which – amazingly – monkey patches the standard library to convert synchronous calls into the asynchronous equivalents, and Redis, which is used as a message queue between a master process and any number of worker processes. Aurynn Shaw of Weta covered the alternatives: the venerable Twisted Matrix and Tornado. I’m intrigued by Twisted’s use of deferred objects passed up to the event loop as opposed to Tornado’s explicit CPS (which is more in line with upstart Node).

Lightning talks were a particular highlight. “Just Because You Can, Doesn’t Mean You Should”1, and “Job Security (Through Obfuscation)” were standouts – again: impenetrability is an epic win in my books.

From Mark Ramm came an uproarious keynote on non-relational databases. I honestly believe Sourceforge would have a bright future if they gave up on project hosting and instead dedicated the site to Mark’s hilarious anecdotes. Technically, he also managed to clarify C.A.P. and somewhat temper the hype around so-called NoSQL datastores.

GitHub’s Corey Donohue gave a seizure-inducing overview of their devops process. I can’t help but feel envious of the (technological and cultural) ability to deploy to production on a whim (up to twenty times a day), simply by messaging a chat bot2! I hesitate to use the word “empowerment” but its difficult to find another way of describing the gestalt of the GitHub philosophy.

Embedded systems were covered (tangentially) by Michael Hudson-Doyle – his work for Linaro concerns automated testing of the Linux kernel across racks of disparate ARM chips – and Jeremy Stott. Jeremy’s demonstration may have been the most ambitious I’ve yet seen: an ARM-based mBed running a Python VM (PyMite) seated in a mobile robot controlled over Wi-Fi, and filmed with an Android smartphone impersonating a webcam. Great stuff!

One of the last talks was one I was looking forward to all weekend. Douglas Bagnall described the various types of pseudo-random number generators, from linear congruential generators (used by libc, Java, and JavaScript, and quite abysmal) to generalised feedback shift register and the Mersenne Twister (as used by Python) to more advanced stream ciphers. The most amusing moment of the weekend was the description of cryptographers – they all hate each other so they put a lot of effort into breaking each other’s PRNGs – and DJB in particular: lots of people hate him so his are really well tested!

All in all, a fun time, despite the lack of pastry.


  1. I contend that anything entitled evil ranks as a “should”.  

  2. My current project deploys (to system test) a maximum of once a week.  

□ □ □

The most pernicious feature of the C-derived languages is certainly the NULL pointer. The modern Haskell type system has to some extent tamed the serpent, but in the crude tools of today’s business world we must constantly be vigilant against following an uninitialised reference, lest we fall victim to a segfault, NPE, or NRE.

Foo foo = new Foo {
Bar = new Bar {
Baz = new Baz {
Quux = null
}
}
};
 
// egads!
Console.Out.WriteLine(foo.Bar.Baz.Quux.Z);

C#

In Haskell, variables that may have no value are represented by a Maybe type, which provides an instance of a monad. Computations on such types are chained using the “bind” operator (>>=) – if any computation fails (by returning None), the entire computation results in None.

C# foolishly forbids one from overloading >>=, but we can create something similar:

interface IMaybe<Ta>
{
IMaybe<Tb> Bind<Tb>(Func<Ta, IMaybe<Tb>> f);
 
Ta FromMaybe(Ta deflt);
}
 
class Just<Ta> : IMaybe<Ta>
{
private readonly Ta x;
 
public Just(Ta x) {
this.x = x;
}
 
public IMaybe<Tb> Bind<Tb>(Func<Ta, IMaybe<Tb>> f)
{
return f(x);
}
 
public Ta FromMaybe(Ta deflt)
{
return x;
}
}
 
class None<Ta> : IMaybe<Ta>
{
public IMaybe<Tb> Bind<Tb>(Func<Ta, IMaybe<Tb>> f)
{
return new None<Tb>();
}
 
public Ta FromMaybe(Ta deflt)
{
return deflt;
}
}

C#

A static utility class can now “lift” unclean C# values into a Maybe type:

static class Maybe
{
public static IMaybe<T> None<T>()
{
return new None<T>();
}
 
public static IMaybe<T> Just<T>(T x) {
return new Just<T>(x);
}
 
public static IMaybe<T> Lift<T>(T obj)
{
return obj == null ? None<T>() : Just(obj);
}
}

C#

The use of this abstraction is now obvious!

int z = Maybe.Lift(foo)
.Bind(f => Maybe.Lift(f.Bar))
.Bind(b => Maybe.Lift(b.Baz))
.Bind(b => Maybe.Lift(b.Quux))
.Bind(q => Maybe.Lift(q.Z))
.FromMaybe(0);

C#

We now have something analogous to chaining a series of >>= operators. But Haskell also provides “do”-syntax, where monadic chaining masquerades as a simple imperative language. Can we do something similar in C#?

The language (again, foolishly) won’t permit us to define our own syntax, however, we can traverse and manipulate existing syntax in the way of expression trees. A feasible approximation:

int z = Maybe.Do(foo, f => f.Bar.Baz.Quux.Z);

C#

Let’s try and map a chain of property accesses into a form equivalent to the monadic version above. Essentially, each occurrence of the . operator must be mapped into a .Bind(x => Maybe.Lift(x.y)) – that is, a property expression inside a call to Lift inside a λ expression inside a call to Bind.

static class Maybe
{
// ...
 

// traverse the expression starting from `a' and return a default value if
// anything is null
public static Tb Do<Ta, Tb>(Ta a, Expression<Func<Ta, Tb>> expr)
{
return Do(a, expr, default (Tb));
}
 
// traverse the expression starting from `a' and return `deflt' if
// anything is null
public static Tb Do<Ta, Tb>(Ta a, Expression<Func<Ta, Tb>> expr, Tb deflt)
{
Func<IMaybe<Ta>, IMaybe<Tb>> f = DoExpr(expr).Compile()
as Func<IMaybe<Ta>, IMaybe<Tb>>;
return f(Lift(a)).FromMaybe(deflt);
}
 
// map the simple `expr' into one resplendent with Lifts and Binds
private static LambdaExpression DoExpr<Ta, Tb>(Expression<Func<Ta, Tb>> expr)
{
ParameterExpression param = Expression.Parameter(typeof (IMaybe<Ta>));
return Expression.Lambda(WrapProperty(expr.Body, param), param);
}
 
// recursively walk along a sequence of member expressions converting each
// into Bindful form
// n.b. that this starts from the rightmost occurance of `.' and walks back
// until it hits our `a' from above
private static Expression WrapProperty(Expression expr, Expression param)
{
MemberExpression memberExpr = (MemberExpression) expr;
 
Type memberType = memberExpr.Type;
Type exprType = memberExpr.Expression.Type;
Type maybeType = typeof (IMaybe<>).MakeGenericType(exprType);
 
MethodInfo genericBind = maybeType.GetMethod("Bind");
MethodInfo bind = genericBind.MakeGenericMethod(memberType);
 
MethodInfo genericLift = typeof (Maybe).GetMethod("Lift");
MethodInfo lift = genericLift.MakeGenericMethod(memberType);
 
ParameterExpression lambdaParam = Expression.Parameter(exprType);
Expression property = Expression.Property(lambdaParam, memberExpr.Member.Name);
Expression lambda = Expression.Lambda(
Expression.Call(lift, property),
lambdaParam);
 
Expression target = memberExpr.Expression is MemberExpression
? WrapProperty(memberExpr.Expression, param)
: param;
return Expression.Call(target, bind, lambda);
}
}

C#

There’s further opportunity to provide a LINQ-compatible Select method and thus enable query expressions, but the result is not nearly as beautiful as that shown.