July 11, 2009

Deferred object models, plus caveat

I've been doing a bunch of work in JavaScript lately using Dojo's asynchronous I/O object implementation, dojo.Deferred, based off Twisted Python's deferred model. Deferreds represent a piece of asynchronous I/O (i.e. data retrieval from a web service) and a chain of callbacks that should be performed when it completes.

Deferreds are an excellent way to keep your head from exploding when dealing with callback-oriented APIs, especially when you need to coordinate the point at which several asynchronous operations occur.

Object Models

Let's say that you so happened to be eliminating web service dependencies with a language-specific abstraction barrier — using deferreds to represent your lazy data retrieval is a very nice abstraction. [*]

/**
 * Constructor.
 */
function RemoteAnimal(uri, kwargs) {
    var self = this;
    self._uri = uri;
    if (!kwargs) {
        return;
    }
    self._children = kwargs.children;
}

/**
 * Simple (synchronous) getter.
 */
RemoteAnimal.prototype.getURI = function() {
    // Simple, synchronous getter.
    var self = this;
    return self._uri;
};

/**
 * Asynchronous fetcher.
 */
RemoteAnimal.prototype.fetchChildren = function() {
    if (self._children !== undefined) {
        // Already fetched
        return self._children;
    }
    var deferred = dojo.Deferred();
    asyncFind('animal', // type
        'Parent = ' + self.getURI(), // query
        function callback(results) {
            var objs = dojo.map(results, self.constructor);
            self._children = objs;
            deferred.callback(self._children);
        },
        function errback(errors) {
            console && console.dir(errors);
            throw new Exception('Problem finding children');
        });
    return deferred;
};

The above uses an obvious naming convention to differentiate between synchronous and asynchronous getters — it's important to know when a method will be returning a deferred as opposed to the desired value itself!

There is, however, a sneaky issue with RemoteAnimal.prototype.fetchChildren -- can you tell what it is? You have to think pretty hard about what kinds of execution asynchronous I/O allows from the perspective of a caller.

Caveat

The key is to realize that fetchChildren could be called a whole lot of times in a row on a really slow connection (or if the caller is calling it in a for loop, which is more likely to be a problem). This will create n outstanding asyncFinds, triggering n asynchronous XMLHttpRequests, where we really only needed one. To fix this, you need to patch your code so that only one request could be outstanding at any given time, like so:

RemoteAnimal.prototype.fetchChildren = function() {
    // Property where the first deferred gets stashed.
    var flagProperty = '_childrenDeferred';
    // Property where runner-up deferreds get stashed.
    var othersProperty = '_childrenDeferredOthers';

    if (self._children !== undefined) {
        // Already fetched
        return self._children;
    }
    var deferred = dojo.Deferred();

    if (self[flagProperty]) {
        // Fetch is outstanding... latch on
        if (!(othersProperty in self)) {
            self[othersProperty] = [];
        }
        self[othersProperty].push(deferred);
        return deferred;
    }

    asyncFind('animal', // type
        'Parent = ' + self.getURI(), // query
        function callback(results) {
            var objs = dojo.map(results, self.constructor);
            self._children = objs;
            deferred.callback(self._children);
            delete self[flagProperty];
            if (!self[othersProperty]) {
                // No runner-ups.
                return;
            }
            // Trigger the runner-up deferreds in the order that they came.
            dojo.forEach(self[othersProperty], function(otherDeferred) {
                otherDeferred.callback(self._children);
            });
            delete self[othersProperty];
        },
        function errback(errors) {
            console && console.dir(errors);
            throw new Exception('Problem finding children');
        });
    self[flagProperty] = deferred; // Flag as outstanding
    return deferred;
}

There's a non-trivial amount of added complexity there:

The annoying part is that you can't just tack the runner-up deferred triggers onto the original as callbacks. (That would make the construction so much easier!) Between the first and any later invocation of fetchChildren;, there may be callbacks added to the original deferred that would arbitrarily change the outcome. As a result, we have to call the other deferreds back directly with the found children from the original fetch.

It's possible to factor out this runner-up-aggregating behavior in a highly reusable way — this code was meant to demonstrate the caveat and how to fix it. Encapsulating the runner-up triggering behavior is left as an exercise to the reader. :-)

Non-Obvious Operations with Deferreds

For educational purposes I've enumerated a few non-obvious operations that you can perform using deferreds. [†]

Barrier (block until all parallelized operations complete)
var deferreds = dojo.map(animals, function(animal) {
    return animal.fetchChildren();
});
var megaDeferred = DeferredList.gatherResults(deferreds);
megaDeferred.addCallback(function process(listOfChildren) {
    // "Oh yeah, there's definitely more than one children in there.."
    // Extra points if you can name the quote!
    assertEquals(listOfChildren.length, animals.length,
        'Must have equal number of animals and lists of children');
    // ...
});
return megaDeferred;

The great thing about this is that you can take embarrassingly parallel I/O operations and perform them all at once without the headaches that accompany threading.

Recursion (keep issuing I/O until a base condition is met) [‡]
RemoteAnimal.getSubtree = function(animals) {
    var all = new Array(animals);

    function flattenAndFilter(listOfLists) {
        var accumulator = [];
        dojo.forEach(listOfLists, function(list) {
            if (!list) { // Filter nulls
                return;
            }
            accumulator.push.apply(accumulator, list);
        });
        return accumulator;
    }

    function iterate(generation) {
        if (generation.length === 0) {
            return all;
        }
        var deferreds = dojo.map(generation, function(animal) {
            return animal.getChildren();
        });
        var megaDeferred = Deferred.gatherResults(deferreds);
        megaDeferred.addCallback(flattenAndFilter);
        megaDeferred.addCallback(iterate);
        return megaDeferred;
    }

    // Initialize
    var deferreds = dojo.map(animals, function(animal) {
        return animal.getChildren();
    });
    var megaDeferred = DeferredList.gatherResults(deferreds);
    megaDeferred.addCallback(flattenAndFilter);
    megaDeferred.addCallback(iterate);
    return megaDeferred;
}

The key to understanding this example is that returning a deferred from a callback chains that deferred in. In other words, if you return a deferred from a callback, all that deferred's callbacks will be executed before you come back to the original. That way, we can perform recursion without worrying whether or not somebody has added more callbacks onto the initial megaDeferred returned from the getSubtree function — the chained callbacks execute right away, before any callbacks added by the caller.

Footnotes

[*]

It's important to note that you may also want to implement vector operations for efficiency reasons, as in RemoteAnimal.getSubtree above.

[†]

Note that dojo.DeferredList.gatherResults is actually mapped as dojo.DeferredList.prototype.gatherResults in my version of Dojo (1.3.1) -- not sure what's up with that. It's certainly a factory function, so I've mapped it onto the dojo.DeferredList constructor as well.

[‡]

Some potential refactoring is omitted for clarity.