Registry pattern trumps import magic
The other night I saw an interesting tweet in the #Python Twitter channel — Patrick was looking to harness the dynamism of a language like Python in a way that many Pythonistas would consider magical. [*] Coming from languages with more rigid execution models, it’s understandably easy to confuse dynamic and magical. [†]
What is magic?
To quote the jargon file, magic is:
Characteristic of something that works although no one really understands why (this is especially called black magic).
Taken in the context of programming, magic refers to code that works without a straightforward way of determining why it works.
Today’s more flexible languages provide the programmer with a significant amount of power at runtime, making the barrier to "accidental magic" much lower. Programmers who work with dynamic languages therefore have an important responsibility: err on the side of caution and follow the Principle of Least Surprise.
[T]o design usable interfaces, it’s best when possible not to design an entire new interface model. Novelty is a barrier to entry; it puts a learning burden on the user, so minimize it.
This principle indicates that using well known design patterns and language idioms is a "best practice" in library design. When you follow that guideline, people will already have an understanding of the interface that you’re providing; therefore, they will have one less thing to worry about in leveraging your library to write their code.
Discovery Mechanism Proposals
Patrick is solving a common category of problem: he wants to allow clients to flexibly extend his parsing library’s capabilities. For example, if his module knows how to parse xml and yaml files out of the box, programmers using his library should be able to add their own rst and html parser capabilities with ease.
Patrick’s proposal is this:
- Have the programmer place all extension modules that might contain parser classes in a known directory.
- In a factory class constructor, take a directory listing of the known directory.
- Import every module present in that listing.
- Inspect each module imported this way for class members.
- For each class found, add it to an accumulator if it inherits from a Parser abstract base class provided by the module.
If you were to do this, you would use the various utilities in the imp module to load the modules dynamically, then determine the appropriate classes via the inspect module. [‡]
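For comparison, the directory-scanning steps above might look roughly like the sketch below. This is not Patrick’s code: the function name `discover_subclasses` and its parameters are made up for illustration, and it uses `importlib` rather than `imp`, since `imp` was deprecated and then removed in Python 3.12.

```python
import importlib.util
import inspect
from pathlib import Path


def discover_subclasses(directory, base):
    """Import every ``.py`` file in `directory` and collect subclasses of `base`.

    Hypothetical helper for illustration; `base` plays the role of the
    library's Parser abstract base class.
    """
    found = []
    for path in sorted(Path(directory).glob("*.py")):
        # Build a module object from the file path and execute its code.
        spec = importlib.util.spec_from_file_location(path.stem, path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        # Inspect the module's class members and keep subclasses of `base`.
        for _name, obj in inspect.getmembers(module, inspect.isclass):
            if issubclass(obj, base) and obj is not base:
                found.append(obj)
    return found
```

A library would call this with its own base class, for example `discover_subclasses('parsers', Parser)`. Note that every matching class visible in a module is collected, including classes the module merely imported from elsewhere.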
My counter-proposal is the Registry Pattern, a form of runtime configuration and behavior extension:
- Have the programmer import a decorator from our module.
- Let them decorate any class [§] that conforms to the implicit Parser interface.
Parser library:
    class UnknownMimetypeException(Exception):
        pass


    class ParseError(Exception):
        pass


    class IParser:
        """Reference interface for parser classes -- inheritance is not necessary."""

        parseable_mimetypes = set()

        def __init__(self, file):
            self.file = file
            self.doctree = None

        def parse(self):
            """Parse :ivar:`file` and place the parsed document tree into
            :ivar:`doctree`.
            """
            raise NotImplementedError


    class ParserFacade:
        """Assumes that there can only be one parser per mimetype.

        :ivar mimetype_to_parser_cls: Storage for parser registry.
        """

        def __init__(self):
            self.mimetype_to_parser_cls = {}

        def register_parser(self, cls):
            for mimetype in cls.parseable_mimetypes:
                self.mimetype_to_parser_cls[mimetype] = cls
            # Return the class so that decorator use leaves the name bound
            # to the class rather than to None.
            return cls

        def parse(self, file, mimetype):
            """Determine the appropriate parser for the mimetype, create a
            parser to parse the file, and perform the parsing.

            :return: The parser object.
            """
            try:
                parser_cls = self.mimetype_to_parser_cls[mimetype]
            except KeyError:
                raise UnknownMimetypeException(mimetype)
            parser = parser_cls(file)
            parser.parse()  # May raise ParseError
            return parser


    default_facade = ParserFacade()
    register_parser = default_facade.register_parser
    parse = default_facade.parse
Client code:
    from parser_lib import register_parser


    @register_parser
    class SpamParser:
        """Parses ``.spam`` files.

        Conforms to implicit parser interface of `parser_lib`.
        """

        parseable_mimetypes = {'text/spam'}

        def __init__(self, file):
            self.file = file
            self.doctree = None

        def parse(self):
            raise NotImplementedError
After the client code executes, the SpamParser will then be available for parsing text/spam mimetype files via parser_lib.parse.
Here are some of my considerations in determining which of these is the least magical:
- Which interface is the easiest to explain?
- Which implementation will be the easiest to explain?
- Which is more fragile? (Which is most likely to break when "special case uses" crop up?)
- Which is easier to test?
Magical Allure
The problem with magic is that it is freaking cool and it drives all the ladies crazy. [¶] As a result, the right hemisphere of your developer-brain yearns for your library clients to read instructions like:
Drag and drop your Python code into my directory — I’ll take care of it from there.
That’s right, that’s all there is to it.
Oh, I know what you’re thinking — yes, I’m available — check out parser_lib.PHONE_NUMBER and give me a call sometime.
But, as you envision phone calls from sexy Pythonistas, the left hemisphere of your brain is screaming at the top of its lungs! [#]
Magic leaves the audience wondering how the trick is done, and the analytical side of the programmer mind hates that. It implies that there’s a non-trivial abstraction somewhere that does reasonably complex things, but it’s unclear where it can be found or how to leverage it differently.
Coders need control and understanding of their code and, by extension, as much control and understanding over third party code as is reasonably possible. Because of this, concise, loosely coupled, and extensible abstractions are always preferred to the imposition of elaborate usage design ideas on clients of your code. It’s best to assume that people will want to leverage the functionality your code provides, but that you can’t foresee the use cases.
To Reiterate: Dynamic does not Imply Magical
Revisiting my opening point: anecdotal evidence suggests that some members of the static typing camp see us programming-dynamism dynamos as anarchic lovers of programming chaos: shoot-from-the-hip cowboys strolling into lawless towns of code, type checking blowing by the vacant sheriff’s station like tumbleweeds in the wind. (Enough imagery for you?) With this outlook, it’s easy to see why you would start doing all sorts of fancy things when you cross into dynamism town. Little do you know, we don’t take kindly to that ’round these parts.
In other, more intelligible words, this is a serious misconception. Dynamism isn’t a free pass to disregard the Principle of Least Surprise; dynamism proponents still want order in the programming universe. Perhaps we value our sanity even more! The key insight is that programming dynamism does allow you additional flexibility when it’s required or practical to use. More rigid execution models require you to use workarounds, laboriously at times, for a similar degree of flexibility.
As demonstrated by Marius’ comment in my last entry, Python coders have a healthy respect for the power of late binding, arbitrary code execution on module import, and seamless platform integration. Accompanying this is a healthy wariness of black magic.
Caveat
It’s possible that Patrick was developing a closed-system application (e.g. the Eclipse IDE) and not a library, as I was assuming.
In the application case, extensions are typically discovered (though not necessarily activated) by enumerating a directory. When the user activates such an extension, the modules found within it are loaded into the application. This is the commonly found plugin model — it’s typically more difficult to wrap the application interface and do configurations at load time, so the application developer must provide an extension hook.
However, the registration pattern should still be preferred to reflection in this case! When the extension is activated and the extension modules load, the registration decorator will be executed along with all the other top-level code in the extension modules.
The extension has the capability to inform the application of the extension’s functionality instead of having the application query the plugin for its capabilities. This is a form of loosely coupled cooperative configuration that eases the burden on the application and eliminates the requirement to foresee needs of the extensions. [♠]
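As a rough illustration of the application case, the sketch below enumerates a directory and imports what it finds, but leaves registration to the extensions themselves. Everything named here is an assumption for the sake of the example: the `app_registry` module, the `activate_extensions` function, and the extension source. The registry module is assembled by hand with `types.ModuleType` only so that the sketch runs as a single file; a real application would use an ordinary importable module.

```python
import importlib.util
import sys
import tempfile
import types
from pathlib import Path

# Hand-built stand-in for the application's registry module (hypothetical).
app_registry = types.ModuleType("app_registry")
app_registry.mimetype_to_parser_cls = {}


def register_parser(cls):
    for mimetype in cls.parseable_mimetypes:
        app_registry.mimetype_to_parser_cls[mimetype] = cls
    return cls


app_registry.register_parser = register_parser
sys.modules["app_registry"] = app_registry  # Make it importable by extensions.


def activate_extensions(directory):
    """Import each module in `directory`. The application does no inspection;
    registration happens when each module's top-level code runs."""
    for path in sorted(Path(directory).glob("*.py")):
        spec = importlib.util.spec_from_file_location(path.stem, path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)


# Source of a made-up extension that registers itself on import.
EXTENSION_SOURCE = '''\
from app_registry import register_parser


@register_parser
class SpamParser:
    parseable_mimetypes = {'text/spam'}

    def __init__(self, file):
        self.file = file
        self.doctree = None

    def parse(self):
        raise NotImplementedError
'''

with tempfile.TemporaryDirectory() as extension_dir:
    Path(extension_dir, "spam_extension.py").write_text(EXTENSION_SOURCE)
    activate_extensions(extension_dir)

print(sorted(app_registry.mimetype_to_parser_cls))  # ['text/spam']
```

The loader knows nothing about parser classes, base classes, or naming conventions; it only imports modules, and each extension decides what to announce.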
Footnotes
| [*] | Note that you can’t call it dynamic programming, as that would alias a well known term from the branch of computer science concerned with algorithms. Programming language dynamism it is! |
| [†] | Much like a dehydrated wanderer in the desert mistakes a shapely pile of sand for an oasis! |
| [‡] | As of the date of this publishing, Patrick’s implementation seems to have gone a bit astray with text processing of Python source files. Prefer dynamic module loading and inspection to text processing source code! Enumerating the reasons this is preferred is beyond the scope of this article. |
| [§] | In Python versions before 2.6, which lack class decorator syntax, you can still perform class decoration by hand. Decorator syntax is just syntactic sugar for "invoke this callable and rebind the identifier in this scope", like so: |
    class SomeClass(object):
        pass
    SomeClass = my_class_decorator(SomeClass)  # Decorate the class.
| [¶] | Perhaps men as well, but I’ve never seen any TV evidence to justify that conclusion. |
| [#] | Yes, in this analogy brains have lungs. If you’ve read this far you’re probably not a biologist anyway. |
| [♠] | Of course, the plugin model always has security implications. Unless you go out of your way to make a sandboxed Python environment for plugins, you need to trust the plugins that you activate — they have the ability to execute arbitrary code. |
Tags: Design Patterns, Dynamism, Jargon, Magic, Principle of Least Surprise
June 1st, 2009 at 12:14
It might be tempting to make things magical, but just like with the ladies, when the magic fades, reality sets in. I agree wholeheartedly with this post. And in the case of an extensible application, it’s best to do some import “magic” but still allow the registry to take it from there. You nailed it. Agree. +1.
If I see another “list the directory, import everything, register/call/etc any class that extends X” I’m going to scream. What if I don’t want to subclass your lame abstract class? I should always be allowed to use the wonderful “implicit interfaces” of my dynamic language.
June 1st, 2009 at 14:06
Just today I’ve been looking at ways to implement plugins and yours is the clearest explanation I’ve read.
I am lazy and therefore keen on being DRY. Is it wrong to want some kind of autodiscovery mechanism? It seems tiresome to have to manually import every ‘plugin’ by name.
June 1st, 2009 at 14:21
@Andy: Staying DRY is certainly important. (Don’t forget to bring a towel!)
Autodiscovery is not wrong in and of itself. Check out the Caveat section: if you’re writing a stand-alone application (as opposed to a library), you will probably want to enumerate an extension directory with the imp module. Even so, you will probably want users to activate the extension through some manual process (perhaps just by name), for security reasons.
Assuming that you *do* perform autodiscovery this way, the registry pattern is not much repetition to use within the extension modules. Does that clear things up, or are you saying that the decorator violates DRY?
June 1st, 2009 at 14:45
Thanks for the reply.
I am writing a Django server-side app so if anyone is writing to my filesystem then my plugin security is the least of my problems!
I was rather keen on an architecture that involves just dropping .py files into a specified folder without needing to explicitly import them. Is that exactly what you are warning against?
June 1st, 2009 at 15:06
@Andy: Since it sounds like you’re writing a self-contained application, you fall under the caveat. In that case I’m saying it’s fine to find the extension dynamically, but it’s better to let the extensions configure themselves into your application (as with the registry pattern) than it is for the application to inspect them on import and pull things out of them. That part is *really* magical.
Let me know if that doesn’t clear things up! Thanks for your comments.
June 1st, 2009 at 15:53
Auto discovery is overkill unless the number of modules is high. If it is your own project adding an ‘import myparser’ isn’t a hardship; if you are passing the class upstream to the framework author then they can stick the import in the module with all the other parsers.
Explicit registration is good. As I mentioned in my PyCon talk about Class Decorators “inherits means implements” falls apart. Eventually you will have an intermediate class that isn’t a parser itself but is inherited by parsers. e.g. a class that implements a common subset of XML and HTML4, or HTML4 and HTML5.
It is a matter of style but I would move the parseable_mimetypes attribute out of the parser class and into the registration call. As it is you have explicit registration that then implicitly introspects the class; you might as well go 100% explicit.
@register_parser('text/plain', 'text/ascii')
class TextParser(IParser): pass
June 1st, 2009 at 16:29
@Jack: Excellent point — the mimetypes should optionally be specified in the decorator. I would use the attribute as a fallback mechanism, since you won’t be able to get at the class variables easily from the place the decorator is used in the common case (it’s usually before the class definition).
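One way this could look is sketched below, with the caveat that it is only an illustration: the module-level `mimetype_to_parser_cls` dictionary stands in for the facade’s storage, and the dual-form decorator is one possible design rather than the library’s actual API. Mimetypes passed to the decorator win; otherwise the class attribute is used as the fallback.

```python
mimetype_to_parser_cls = {}  # Stand-in for the facade's registry storage.


def _register(cls, mimetypes):
    for mimetype in mimetypes:
        mimetype_to_parser_cls[mimetype] = cls
    return cls


def register_parser(*args):
    """Usable as ``@register_parser`` or ``@register_parser('a/b', ...)``."""
    # Bare form: the decorator receives the class itself as its only argument.
    if len(args) == 1 and isinstance(args[0], type):
        cls = args[0]
        return _register(cls, cls.parseable_mimetypes)

    # Parameterized form: the arguments are mimetype strings.
    def decorator(cls):
        # Fall back to the class attribute when no mimetypes were passed.
        return _register(cls, args or cls.parseable_mimetypes)

    return decorator


@register_parser('text/plain', 'text/ascii')
class TextParser:
    pass


@register_parser
class SpamParser:
    parseable_mimetypes = {'text/spam'}


print(sorted(mimetype_to_parser_cls))
# ['text/ascii', 'text/plain', 'text/spam']
```

Supporting both forms costs an `isinstance` check, which is a small dose of cleverness in its own right; a stricter design might offer only the parameterized form.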
June 5th, 2009 at 02:51
Nice to see that progression is made!
First of all I must admit that I’m fairly new to Python and tried some things to have the code act like the way I wanted. Beforehand I knew that this would eventually not be the way to go, but I needed a starting point for a discussion :)
My actual idea was that I can write a parser, put it in a parser directory and from then it just works without any further constraints. From there I went on to try to accomplish that (by any means, regarding the code). Of course you are right that text processing is done to find out which classes are parser classes and that it’s not a preferable situation.
Anyway: Nice post and discussion is very welcome, let’s learn!
June 27th, 2009 at 02:48
[...] Any thoughts, comments and discussions are appreciated. For more information: Chris Leary has posted an improvement here [...]