The other night I saw an interesting tweet in the #Python Twitter channel --
Patrick was looking to harness the dynamism of a language like Python in a
way that many Pythonistas would consider magical. [*] Coming from languages
with more rigid execution models, it's understandably easy to confuse
dynamic and magical. [†]
What is magic?
To quote the jargon file, magic is:
Characteristic of something that works although no one really understands
why (this is especially called black magic).
Taken in the context of programming, magic refers to code that works without a
straightforward way of determining why it works.
Today's more flexible languages provide the programmer with a significant
amount of power at runtime, making the barrier to "accidental magic" much
lower. As a programmer who works with dynamic languages, you have an important
responsibility to keep in mind: err on the side of caution with the Principle
of Least Surprise.
[T]o design usable interfaces, it's best when possible not to design an
entire new interface model. Novelty is a barrier to entry; it puts a
learning burden on the user, so minimize it.
This principle indicates that using well known design patterns and language
idioms is a "best practice" in library design. When you follow that
guideline, people will already have an understanding of the interface that
you're providing; therefore, they will have one less thing to worry about in
leveraging your library to write their code.
Discovery Mechanism Proposals
Patrick is solving a common category of problem: he wants to allow clients to
flexibly extend his parsing library's capabilities. For example, if his
module knows how to parse xml and yaml files out of the box,
programmers using his library should be able to add their own rst and
html parser capabilities with ease.
Patrick's proposal is this:
1. Have the programmer place all extension modules that might contain
   parser classes in a known directory.
2. In a factory class constructor, take a directory listing of the known
   directory.
3. Import every module present in that listing.
4. Inspect each module imported this way for class members.
5. For each class found, add it to an accumulator if it inherits from a
   Parser abstract base class provided by the module.
If you were to do this, you would use the various utilities in the imp
module (superseded by importlib, and removed entirely in Python 3.12) to load
the modules dynamically, then determine the appropriate classes via the
inspect module. [‡]
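A minimal sketch of the discovery steps above, assuming Python 3 and importlib rather than imp. The `Parser` base class, the `discover_parsers` name, and the shortcut of injecting `Parser` into each module's namespace are assumptions made to keep the example self-contained; this is not Patrick's actual code.

```python
import importlib.util
import inspect
import pathlib


class Parser:
    """Hypothetical base class that extension parsers must inherit from."""


def discover_parsers(directory):
    """Import every .py file in `directory` and collect Parser subclasses."""
    found = []
    for path in sorted(pathlib.Path(directory).glob("*.py")):
        spec = importlib.util.spec_from_file_location(path.stem, path)
        module = importlib.util.module_from_spec(spec)
        # Assumption for this sketch only: hand the base class to the
        # extension so it does not need to import it from a real package.
        module.Parser = Parser
        spec.loader.exec_module(module)  # Runs arbitrary top-level code.
        for _, cls in inspect.getmembers(module, inspect.isclass):
            if issubclass(cls, Parser) and cls is not Parser:
                found.append(cls)
    return found
```

Note how much has to be right for this to work: the directory location, the file naming, the base class, and the import side effects of every file found. None of that is visible from the extension author's side.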
My counter-proposal is the Registry Pattern, a form of runtime
configuration and behavior extension:
Parser library:
    class UnknownMimetypeException(Exception): pass

    class ParseError(Exception): pass


    class IParser:
        """Reference interface for parser classes -- inheritance is not
        necessary."""

        parseable_mimetypes = set()

        def __init__(self, file):
            self.file = file
            self.doctree = None

        def parse(self):
            """Parse :ivar:`file` and place the parsed document tree into
            :ivar:`doctree`.
            """
            raise NotImplementedError


    class ParserFacade:
        """Assumes that there can only be one parser per mimetype.

        :ivar mimetype_to_parser_cls: Storage for parser registry.
        """

        def __init__(self):
            self.mimetype_to_parser_cls = {}

        def register_parser(self, cls):
            for mimetype in cls.parseable_mimetypes:
                self.mimetype_to_parser_cls[mimetype] = cls
            # Return the class so that, when used as a decorator, the
            # decorated name stays bound to the class instead of None.
            return cls

        def parse(self, file, mimetype):
            """Determine the appropriate parser for the mimetype, create a
            parser to parse the file, and perform the parsing.

            :return: The parser object.
            """
            try:
                parser_cls = self.mimetype_to_parser_cls[mimetype]
            except KeyError:
                raise UnknownMimetypeException(mimetype)
            parser = parser_cls(file)
            parser.parse()  # May raise ParseError
            return parser


    default_facade = ParserFacade()
    register_parser = default_facade.register_parser
    parse = default_facade.parse
Client code: [§]
    from parser_lib import register_parser


    @register_parser
    class SpamParser:
        """Parses ``.spam`` files.

        Conforms to implicit parser interface of `parser_lib`.
        """

        parseable_mimetypes = {'text/spam'}

        def __init__(self, file):
            self.file = file
            self.doctree = None

        def parse(self):
            raise NotImplementedError
After the client code executes, the SpamParser will then be available for
parsing text/spam mimetype files via parser_lib.parse.
Here are some of my considerations in determining which of these is the least
magical:
Which interface is the easiest to explain?
Which implementation will be the easiest to explain?
Which is more fragile? (Which is most likely to break when
"special case uses" crop up?)
Which is easier to test?
Magical Allure
The problem with magic is that it is freaking cool and it drives all the
ladies crazy. [¶] As a result, the right hemisphere of your developer-brain
yearns for your library clients to read instructions like:
Drag and drop your Python code into my directory — I'll take care of it
from there.
That's right, that's all there is to it.
Oh, I know what you're thinking — yes, I'm available — check out
parser_lib.PHONE_NUMBER and give me a call sometime.
But, as you envision phone calls from sexy Pythonistas, the left hemisphere of
your brain is screaming at the top of its lungs! [#]
Magic leaves the audience wondering how the trick is done, and the analytical
side of the programmer mind hates that. It implies that there's a non-trivial
abstraction somewhere that does reasonably complex things, but it's unclear
where it can be found or how to leverage it differently.
Coders need control and understanding of their code and, by extension, as much
control and understanding over third party code as is reasonably possible.
Because of this, concise, loosely coupled, and extensible abstractions are
always preferred to the imposition of elaborate usage design ideas on
clients of your code. It's best to assume that people will want to leverage the
functionality your code provides, but that you can't foresee the use cases.
To Reiterate: Dynamic does not Imply Magical
Revisiting my opening point: anecdotal evidence suggests that some members of
the static typing camp see us programming-dynamism dynamos as anarchic
lovers of programming chaos. Shoot-from-the-hip cowboys, strolling into
lawless towns of code, type checking blowing by the vacant sheriff's station as
tumbleweeds in the wind. (Enough imagery for you?) With this outlook, it's easy
to see why you would start doing all sorts of fancy things when you cross into
dynamism town — little do you know, we don't take kindly to that 'round these
parts.
In other, more intelligible words, this is a serious misconception. Dynamism
isn't a free pass to disregard the Principle of Least Surprise; dynamism
proponents still want order in the programming universe. Perhaps we value our
sanity even more! The key insight is that programming dynamism does allow
you additional flexibility when it's required or practical to use. More
rigid execution models require you to use workarounds, laboriously at times,
for a similar degree of flexibility.
As demonstrated by Marius' comment in my last entry, Python coders
have a healthy respect for the power of late binding, arbitrary
code execution on module import, and seamless platform integration.
Accompanying this is a healthy wariness of black magic.
Caveat
It's possible that Patrick was developing a closed-system application (e.g.
the Eclipse IDE) and not a library, as I was assuming.
In the application case, extensions are typically discovered (though not
necessarily activated) by enumerating a directory. When the user activates such
an extension, the modules found within it are loaded into the application.
This is the commonly found plugin model — it's typically more difficult to
wrap the application interface and do configurations at load time, so the
application developer must provide an extension hook.
However, the registration pattern should still be preferred to reflection in
this case! When the extension is activated and the extension modules load, the
registration decorator will be executed along with all the other top-level code
in the extension modules.
The extension has the capability to inform the application of the extension's
functionality instead of having the application query the plugin for its
capabilities. This is a form of loosely coupled cooperative configuration
that eases the burden on the application and eliminates the requirement to
foresee needs of the extensions. [♠]
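As a rough illustration of the application case, the sketch below activates one extension file and lets its top-level code register itself. Everything named here (`host_api`, `activate_extension`, the `RstParser` extension and its mimetype) is hypothetical, and the host API is faked with an in-memory module so the example runs on its own.

```python
import importlib.util
import pathlib
import sys
import tempfile
import types

# Hypothetical host-application API that extensions import from.
host_api = types.ModuleType("host_api")
host_api.registry = {}


def register_parser(cls):
    """Record cls under each mimetype it claims, then return it."""
    for mimetype in cls.parseable_mimetypes:
        host_api.registry[mimetype] = cls
    return cls


host_api.register_parser = register_parser
sys.modules["host_api"] = host_api  # Make `import host_api` work in extensions.

# Source of a made-up extension, as a plugin author might write it.
EXTENSION_SOURCE = '''
from host_api import register_parser

@register_parser
class RstParser:
    parseable_mimetypes = {"text/x-rst"}
'''


def activate_extension(path):
    """Load one extension module; its top-level code registers itself."""
    spec = importlib.util.spec_from_file_location(path.stem, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


with tempfile.TemporaryDirectory() as ext_dir:
    ext_path = pathlib.Path(ext_dir, "rst_extension.py")
    ext_path.write_text(EXTENSION_SOURCE)
    activate_extension(ext_path)

print(sorted(host_api.registry))  # ['text/x-rst']
```

The application never inspects the loaded module. It only executes it, and the extension decides what to announce.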
Footnotes
| [*] | Note that you can't call it dynamic programming, as that would alias a
well known term from the branch of computer science concerned with
algorithms. Programming language dynamism it is! |
| [†] | Much like a dehydrated wanderer in the desert mistakes a shapely pile
of sand for an oasis! |
| [‡] | As of the date of this publishing, Patrick's implementation seems to have
gone a bit astray with text processing of Python source files. Prefer
dynamic module loading and inspection to text processing source code!
Enumerating the reasons this is preferred is beyond the scope of this
article. |
| [§] | In Python versions that predate class decorator syntax (before 2.6)
you can perform class decoration without the decorator syntax. Decorator
syntax is just syntactic sugar for "call this callable and rebind the
identifier in this scope", like so:

    class SomeClass(object):
        pass

    SomeClass = my_class_decorator(SomeClass)  # Decorate the class.
|
| [¶] | Perhaps men as well, but I've never seen any TV evidence to justify that
conclusion. |
| [#] | Yes, in this analogy brains have lungs. If you've read this far you're
probably not a biologist anyway. |
| [♠] | Of course, the plugin model always has security implications. Unless you go
out of your way to make a sandboxed Python environment for plugins, you
need to trust the plugins that you activate — they have the ability to
execute arbitrary code. |