The other night I saw an interesting tweet in the #Python Twitter channel — Patrick was looking to harness the dynamism of a language like Python in a way that many Pythonistas would consider magical. Coming from languages with more rigid execution models, it’s understandably easy to confuse dynamic and magical.
What is magic?
To quote the Jargon File, magic is:
"Characteristic of something that works although no one really understands why (this is especially called black magic)."
Taken in the context of programming, magic refers to code that works without a straightforward way of determining why it works.
Today’s more flexible languages provide the programmer with a significant amount of power at runtime, making the barrier to "accidental magic" much lower. As a programmer who works with dynamic languages, you have an important responsibility: err on the side of caution and follow the Principle of Least Surprise.
"[T]o design usable interfaces, it’s best when possible not to design an entire new interface model. Novelty is a barrier to entry; it puts a learning burden on the user, so minimize it."
This principle indicates that using well known design patterns and language idioms is a "best practice" in library design. When you follow that guideline, people will already have an understanding of the interface that you’re providing; therefore, they will have one less thing to worry about in leveraging your library to write their code.
Discovery Mechanism Proposals
Patrick is solving a common category of problem: he wants to allow clients to flexibly extend his parsing library’s capabilities. For example, if his module knows how to parse xml and yaml files out of the box, programmers using his library should be able to add their own rst and html parser capabilities with ease.
Patrick’s proposal is this:
- Have the programmer place all extension modules that might contain parser classes in a known directory.
- In a factory class constructor, take a directory listing of the known directory.
- Import every module present in that listing.
- Inspect each module imported this way for class members.
- For each class found, add it to an accumulator if it inherits from a Parser abstract base class provided by the module.
If you were to do this, you would use the various utilities in the imp module to load the modules dynamically, then determine the appropriate classes via the inspect module.
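Here is a minimal sketch of what that scanning approach might look like. The imp module has since been deprecated and was removed in Python 3.12, so this sketch uses importlib instead. The function name discover_parsers and the file name spam_ext.py are made up for illustration, and the standard library's HTMLParser stands in for the library's Parser base class purely so the snippet runs on its own.

```python
import importlib.util
import inspect
import tempfile
from html.parser import HTMLParser  # Stand-in for the library's Parser base class
from pathlib import Path


def discover_parsers(directory, base_cls):
    """Import every ``.py`` file in `directory` and collect the classes
    that inherit from `base_cls` (hypothetical helper, not from the post).
    """
    found = []
    for path in sorted(Path(directory).glob("*.py")):
        spec = importlib.util.spec_from_file_location(path.stem, path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)  # Runs the module's top-level code
        for _name, cls in inspect.getmembers(module, inspect.isclass):
            # Skip the base class itself, which the module had to import.
            if issubclass(cls, base_cls) and cls is not base_cls:
                found.append(cls)
    return found


# Demo: write one extension module into a temporary "known directory".
PLUGIN_SOURCE = (
    "from html.parser import HTMLParser\n"
    "class SpamParser(HTMLParser):\n"
    "    pass\n"
    "class Unrelated:\n"
    "    pass\n"
)

with tempfile.TemporaryDirectory() as plugin_dir:
    Path(plugin_dir, "spam_ext.py").write_text(PLUGIN_SOURCE)
    found_names = [cls.__name__ for cls in discover_parsers(plugin_dir, HTMLParser)]

print(found_names)
```

Note how much the scanner has to know: where files live, how to import them, and how to tell a parser class apart from everything else in the module's namespace, including the base class the extension itself imported.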
My counter-proposal is the Registry Pattern, a form of runtime configuration and behavior extension:
- Have the programmer import a decorator from our module.
- Let them decorate any class that conforms to the implicit Parser interface.
Parser library:
class UnknownMimetypeException(Exception): pass
class ParseError(Exception): pass


class IParser:
    """Reference interface for parser classes -- inheritance is not
    necessary."""

    parseable_mimetypes = set()

    def __init__(self, file):
        self.file = file
        self.doctree = None

    def parse(self):
        """Parse :ivar:`file` and place the parsed document tree into
        :ivar:`doctree`.
        """
        raise NotImplementedError


class ParserFacade:
    """Assumes that there can only be one parser per mimetype.

    :ivar mimetype_to_parser_cls: Storage for parser registry.
    """

    def __init__(self):
        self.mimetype_to_parser_cls = {}

    def register_parser(self, cls):
        for mimetype in cls.parseable_mimetypes:
            self.mimetype_to_parser_cls[mimetype] = cls
        # Return the class so the decorator leaves the client's name bound to it.
        return cls

    def parse(self, file, mimetype):
        """Determine the appropriate parser for the mimetype, create a
        parser to parse the file, and perform the parsing.

        :return: The parser object.
        """
        try:
            parser_cls = self.mimetype_to_parser_cls[mimetype]
        except KeyError:
            raise UnknownMimetypeException(mimetype)
        parser = parser_cls(file)
        parser.parse()  # May raise ParseError
        return parser


default_facade = ParserFacade()
register_parser = default_facade.register_parser
parse = default_facade.parse
Client code:
from parser_lib import register_parser


@register_parser
class SpamParser:
    """Parses ``.spam`` files.

    Conforms to implicit parser interface of `parser_lib`.
    """

    parseable_mimetypes = {'text/spam'}

    def __init__(self, file):
        self.file = file
        self.doctree = None

    def parse(self):
        raise NotImplementedError
After the client code executes, the SpamParser will then be available for parsing text/spam mimetype files via parser_lib.parse.
Here are some of my considerations in determining which of these is the least magical:
- Which interface is the easiest to explain?
- Which implementation will be the easiest to explain?
- Which is more fragile? (Which is most likely to break when "special case uses" crop up?)
- Which is easier to test?
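To make the testing question concrete, here is a rough sketch of how the registry approach might be tested. ParserFacade below is a condensed copy of the one in the parser library listing; WordCountParser and the text/words mimetype are hypothetical names invented for this example. The point is that each test builds its own fresh registry, with no directory fixtures and no import machinery.

```python
import io


class UnknownMimetypeException(Exception):
    pass


class ParserFacade:
    """Condensed copy of the registry from the parser library listing."""

    def __init__(self):
        self.mimetype_to_parser_cls = {}

    def register_parser(self, cls):
        for mimetype in cls.parseable_mimetypes:
            self.mimetype_to_parser_cls[mimetype] = cls
        return cls  # Assumed here so the method also works as a decorator

    def parse(self, file, mimetype):
        try:
            parser_cls = self.mimetype_to_parser_cls[mimetype]
        except KeyError:
            raise UnknownMimetypeException(mimetype)
        parser = parser_cls(file)
        parser.parse()
        return parser


class WordCountParser:
    """Hypothetical toy parser used only for these tests."""

    parseable_mimetypes = {"text/words"}

    def __init__(self, file):
        self.file = file
        self.doctree = None

    def parse(self):
        self.doctree = self.file.read().split()


def test_registered_parser_is_used():
    facade = ParserFacade()  # Fresh registry: no global state, no filesystem
    facade.register_parser(WordCountParser)
    parser = facade.parse(io.StringIO("spam and eggs"), "text/words")
    assert parser.doctree == ["spam", "and", "eggs"]


def test_unknown_mimetype_is_rejected():
    facade = ParserFacade()
    try:
        facade.parse(io.StringIO(""), "text/unknown")
    except UnknownMimetypeException:
        pass
    else:
        raise AssertionError("expected UnknownMimetypeException")


test_registered_parser_is_used()
test_unknown_mimetype_is_rejected()
```

Testing the directory-scanning approach would, by contrast, require writing module files to disk (or mocking the import system) before a single assertion can run.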
Magical Allure
The problem with magic is that it is freaking cool and it drives all the ladies crazy. As a result, the right hemisphere of your developer-brain yearns for your library clients to read instructions like:
Drag and drop your Python code into my directory — I’ll take care of it from there.
That’s right, that’s all there is to it.
Oh, I know what you’re thinking — yes, I’m available — check out parser_lib.PHONE_NUMBER and give me a call sometime.
But, as you envision phone calls from sexy Pythonistas, the left hemisphere of your brain is screaming at the top of its lungs!
Magic leaves the audience wondering how the trick is done, and the analytical side of the programmer mind hates that. It implies that there’s a non-trivial abstraction somewhere that does reasonably complex things, but it’s unclear where it can be found or how to leverage it differently.
Coders need control and understanding of their code and, by extension, as much control and understanding over third party code as is reasonably possible. Because of this, concise, loosely coupled, and extensible abstractions are always preferred to the imposition of elaborate usage design ideas on clients of your code. It’s best to assume that people will want to leverage the functionality your code provides, but that you can’t foresee the use cases.
To Reiterate: Dynamic does not Imply Magical
Revisiting my opening point: anecdotal evidence suggests that some members of the static typing camp see us programming-dynamism dynamos as anarchic lovers of programming chaos. Shoot-from-the-hip cowboys, strolling into lawless towns of code, type checking blowing by the vacant sheriff’s station like tumbleweeds in the wind. (Enough imagery for you?) With this outlook, it’s easy to see why you would start doing all sorts of fancy things when you cross into dynamism town. Little do you know, we don’t take kindly to that ’round these parts.
In other, more intelligible words, this is a serious misconception. Dynamism isn’t a free pass to disregard the Principle of Least Surprise; dynamism proponents still want order in the programming universe. Perhaps we value our sanity even more! The key insight is that programming dynamism does allow you additional flexibility when it’s required or practical to use. More rigid execution models require you to use workarounds, laboriously at times, for a similar degree of flexibility.
As demonstrated by Marius’ comment in my last entry, Python coders have a healthy respect for the power of late binding, arbitrary code execution on module import, and seamless platform integration. Accompanying this is a healthy wariness of black magic.
Caveat
It’s possible that Patrick was developing a closed-system application (e.g. the Eclipse IDE) and not a library like I was assuming.
In the application case, extensions are typically discovered (though not necessarily activated) by enumerating a directory. When the user activates such an extension, the modules found within it are loaded into the application. This is the commonly found plugin model — it’s typically more difficult to wrap the application interface and do configurations at load time, so the application developer must provide an extension hook.
However, the registration pattern should still be preferred to reflection in this case! When the extension is activated and the extension modules load, the registration decorator will be executed along with all the other top-level code in the extension modules.
The extension has the capability to inform the application of the extension’s functionality instead of having the application query the plugin for its capabilities. This is a form of loosely coupled cooperative configuration that eases the burden on the application and eliminates the requirement to foresee needs of the extensions.
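As a rough sketch of that application case: the application still enumerates and loads the extension's modules, but it never inspects them. Everything named here (demo_app, activate_extension, PARSERS, spam_ext.py) is hypothetical, and the sys.modules trick exists only so the snippet runs as a single file; a real application would simply ship an importable module that exposes the hook.

```python
import importlib.util
import sys
import tempfile
import types
from pathlib import Path

PARSERS = {}  # mimetype -> parser class, filled in by extensions


def register_parser(cls):
    """Decorator that extensions apply to their parser classes."""
    for mimetype in cls.parseable_mimetypes:
        PARSERS[mimetype] = cls
    return cls


# Demo-only scaffolding: expose the hook under an importable name so the
# extension below can write ``from demo_app import register_parser``.
demo_app = types.ModuleType("demo_app")
demo_app.register_parser = register_parser
sys.modules["demo_app"] = demo_app


def activate_extension(directory):
    """Load each module of an activated extension; no class inspection."""
    for path in sorted(Path(directory).glob("*.py")):
        spec = importlib.util.spec_from_file_location(path.stem, path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)  # Decorators run here and register


EXTENSION_SOURCE = (
    "from demo_app import register_parser\n"
    "@register_parser\n"
    "class SpamParser:\n"
    "    parseable_mimetypes = {'text/spam'}\n"
)

with tempfile.TemporaryDirectory() as ext_dir:
    Path(ext_dir, "spam_ext.py").write_text(EXTENSION_SOURCE)
    activate_extension(ext_dir)

print(sorted(PARSERS))
```

The loader has shrunk to "import the files"; deciding what counts as a parser is left entirely to the extension author, who says so explicitly with the decorator.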