June 1, 2009

Registry pattern trumps import magic

The other night I saw an interesting tweet in the #Python Twitter channel -- Patrick was looking to harness the dynamism of a language like Python in a way that many Pythonistas would consider magical. [*] Coming from languages with more rigid execution models, it's understandably easy to confuse dynamic and magical. [†]

What is magic?

To quote the jargon file, magic is:

Characteristic of something that works although no one really understands why (this is especially called black magic).

Taken in the context of programming, magic refers to code that works without a straightforward way of determining why it works.

Today's more flexible languages provide the programmer with a significant amount of power at runtime, making the barrier to "accidental magic" much lower. As a programmer who works with dynamic languages, there's an important responsibility to keep in mind: err on the side of caution with the Principle of Least Surprise.

[T]o design usable interfaces, it's best when possible not to design an entire new interface model. Novelty is a barrier to entry; it puts a learning burden on the user, so minimize it.

This principle indicates that using well known design patterns and language idioms is a "best practice" in library design. When you follow that guideline, people will already have an understanding of the interface that you're providing; therefore, they will have one less thing to worry about in leveraging your library to write their code.

Discovery Mechanism Proposals

Patrick is solving a common category of problem: he wants to allow clients to flexibly extend his parsing library's capabilities. For example, if his module knows how to parse xml and yaml files out of the box, programmers using his library should be able to add their own rst and html parser capabilities with ease.

Patrick's proposal is this:

If you were to do this, you would use the various utilities in the imp module to load the modules dynamically, then determine the appropriate classes via the inspect module. [‡]

My counter-proposal is this, which is also known as the Registry Pattern, a form of runtime configuration and behavior extension:

Parser library:

class UnknownMimetypeException(Exception): pass
class ParseError(Exception): pass

class IParser:
    """
    Reference interface for parser classes;
    inheritance is not necessary.
    """

    parseable_mimetypes = set()

    def __init__(self, file):
        self.file = file
        self.doctree = None

    def parse(self):
        """
        Parse :ivar:`file` and place the parsed document
        tree into :ivar:`doctree`.
        """
        raise NotImplementedError


class ParserFacade:
    """
    Assumes that there can only be one parser per mimetype.
    :ivar mimetype_to_parser_cls: Storage for parser registry.
    """

    def __init__(self):
        self.mimetype_to_parser_cls = {}

    def register_parser(self, cls):
        for mimetype in cls.parseable_mimetypes:
            self.mimetype_to_parser_cls[mimetype] = cls

        return cls # For use as a decorator.

    def parse(self, file, mimetype):
        """
        Determine the appropriate parser for the mimetype,
        create a parser to parse the file, and perform
        the parsing.

        :return: The parser object.
        """
        try:
            parser_cls = self.mimetype_to_parser_cls[mimetype]
        except KeyError:
            raise UnknownMimetypeException(mimetype)

        parser = parser_cls(file)
        parser.parse() # May raise ParseError
        return parser


default_facade = ParserFacade()
register_parser = default_facade.register_parser
parse = default_facade.parse

Client code:

from parser_lib import register_parser

@register_parser
class SpamParser:
    """
    Parses ``.spam`` files.
    Conforms to implicit parser interface of `parser_lib`.
    """

    parseable_mimetypes = {'text/spam'}

    def __init__(self, file):
        self.file = file
        self.doctree = None

    def parse(self):
        raise NotImplementedError

After the client code executes, the SpamParser will then be available for parsing text/spam mimetype files via parser_lib.parse.

Here are some of my considerations in determining which of these is the least magical:

Magical Allure

The problem with magic is that it is freaking cool and it drives all the ladies crazy. [¶] As a result, the right hemisphere of your developer-brain yearns for your library clients to read instructions like:

Drag and drop your Python code into my directory — I'll take care of it from there.

That's right, that's all there is to it.

Oh, I know what you're thinking — yes, I'm available — check out parser_lib.PHONE_NUMBER and give me a call sometime.

But, as you envision phone calls from sexy Pythonistas, the left hemisphere of your brain is screaming at the top of its lungs! [#]

Magic leaves the audience wondering how the trick is done, and the analytical side of the programmer mind hates that. It implies that there's a non-trivial abstraction somewhere that does reasonably complex things, but it's unclear where it can be found or how to leverage it differently.

Coders need control and understanding of their code and, by extension, as much control and understanding over third party code as is reasonably possible. Because of this, concise, loosely coupled, and extensible abstractions are always preferred to the imposition of elaborate usage design ideas on clients of your code. It's best to assume that people will want to leverage the functionality your code provides, but that you can't foresee the use cases.

To Reiterate: Dynamic does not Imply Magical

Revisiting my opening point: anecdotal evidence suggests that some members of the static typing camp see we programming-dynamism dynamos as anarchic lovers of programming chaos. Shoot-from-the-hip cowboys, strolling into lawless towns of code, type checking blowing by the vacant sheriff's station as tumbleweeds in the wind. (Enough imagery for you?) With this outlook, it's easy to see why you would start doing all sorts of fancy things when you cross into dynamism town — little do you know, we don't take kindly to that 'round these parts.

In other, more intelligble words, this is a serious misconception — dynamism isn't a free pass to disregard the Principle of Least Surprise — dynamism proponents still want order in the programming universe. Perhaps we value our sanity even more! The key insight is that programming dynamism does allow you additional flexibility when it's required or practical to use. More rigid execution models require you to use workarounds, laboriously at times, for a similar degree of flexibility.

As demonstrated by Marius' comment in my last entry, Python coders have a healthy respect for the power of late binding, arbitrary code execution on module import, and seamless platform integration. Accompanying this is a healthy wariness of black magic.

Caveat

It's possible that Patrick was developing a closed-system application (e.g. the Eclipse IDE) and not a library like I was assuming.

In the application case, extensions are typically discovered (though not necessarily activated) by enumerating a directory. When the user activates such an extension, the modules found within it are loaded into the application. This is the commonly found plugin model — it's typically more difficult to wrap the application interface and do configurations at load time, so the application developer must provide an extension hook.

However, the registration pattern should still be preferred to reflection in this case! When the extension is activated and the extension modules load, the registration decorator will be executed along with all the other top-level code in the extension modules.

The extension has the capability to inform the application of the extension's functionality instead having the application query the plugin for its capabilities. This is a form of loosely coupled cooperative configuration that eases the burden on the application and eliminates the requirement to foresee needs of the extensions. [♠]

Footnotes

[*]

Note that you can't call it dynamic programming, as that would alias a well known term from the branch of computer science concerned with algorithms. Programming language dynamism it is!

[†]

Much like a dehydrated wanderer in the desert mistakes a shapely pile of sand for an oasis!

[‡]

As of the date of this publishing, Patrick's implementation seems to have gone a bit astray with text processing of Python source files. Prefer dynamic module loading and inspection to text processing source code! Enumerating the reasons this is preferred is beyond the scope of this article.

[§]

In Python < 3.0 you can perform class decoration without the decorator syntax. Decorator syntax is just syntactic sugar for "invoke this method and rebind the identifier in this scope", like so:

class SomeClass(object):
    pass
SomeClass = my_class_decorator(SomeClass) # Decorate the class.
[¶]

Perhaps men as well, but I've never seen any TV evidence to justify that conclusion.

[#]

Yes, in this analogy brains have lungs. If you've read this far you're probably not a biologist anyway.

[♠]

Of course, the plugin model always has security implications. Unless you go out of your way to make a sandboxed Python environment for plugins, you need to trust the plugins that you activate — they have the ability to execute arbitrary code.