Why you should bother!
I write this entry in response to Why Should I Bother?. The answer, in
short, is that I find Python to be a great language for getting things done,
and you shouldn't let stupid interviewers deter you from learning a language
that allows you to get more done.
I think I'm a pretty tough interviewer, so I also describe the things I'd
recommend that a Python coder knows before applying to a Java position, based
on my own technical-interview tactics.
A spectrum of formality
As my friend pointed out to me long ago, many computer scientists don't
care about effective programming methods, because they prefer theory.
Understandably, we computer scientists qua programmers (AKA software engineers)
find ourselves rent in twain.
As a computer science degree candidate, you are inevitably enamored with
complex formalities, terminology, [*] and a robust knowledge of mathematical
models that you'll use a few times per programming project (if you're lucky).
Pragmatic programming during a course of study often takes a back seat to
familiar computer science concepts and conformance to industry desires.
As a programmer, I enjoy Python because I find it minimizes boilerplate,
maximizes my time thinking about the problem domain, and permits me to use
whichever paradigm works best. I find that I write programs more quickly and
spend less time working around language deficiencies. Importantly, the
execution model fits in my brain.
Real Computer Scientists (TM) tend to love pure-functional programming
languages because they fit into mathematical models nicely — founded on
recursion, Curry-Howard isomorphism, and what have you — whereas Python is
strongly imperative and, in its dynamism, lacks the same sort of formality.
Languages like Java sit somewhere in the middle. They're still strongly
imperative (there are no higher-order functions in Java), but there are more
formalities. As the most notable example, compile-time type checking rules out
whole classes of type errors before the program runs, which gives some
programmers a sense of safety. [†] Such languages still let scientists chew on
some computer-sciencey problems where values clash with the type system; for
example, provably eliminating NullPointerExceptions, which is fun, but
difficult!
The cost of this increased formality is that such languages are more
syntax-heavy and lean on design patterns to recover some of the flexibility
dynamic typing gives you up front.
It's debatable which category of languages is easiest to learn, but Java-like
languages have footholds in the industry from historical C++ developer bases,
Sun's successful marketing of Java off of C++, and the more recent successes of
the C# .NET platform.
It makes sense that we're predominantly taught this category of languages in
school: as a result, we can play the percentages and apply for most available
developer jobs. Given that you have to learn it, you might as well do some
throw-away programming in it now and again to keep yourself from forgetting
everything; however, I'd recommend, as a programmer, that you save the fun
projects for whichever language(s) you find most intriguing.
I picture ease-and-rapidity of development-and-maintenance on a spectrum from
low to high friction — other languages I've worked in fall somewhere on that
spectrum as higher friction than Python. Though many computer scientists much
smarter than I seem to conflate formality and safety, I'm fairly convinced I
attain code completion and maintainability goals more readily with the
imperative and flexible Python language. Plus, perhaps most importantly, I
have fun.
My technical-interview protocol
The protocol I use to interview candidates is pretty simple. [‡]
1. Analyze their background experience for potential weaknesses that would
affect their ability to perform in the position.
2. Don't let the candidate talk about anything until the last n minutes,
when they can ask questions or comment on the interview experience.
3. Home in on the areas of potential weakness to determine how bad they are
(if they're bad at all).
4. Evaluate how easy it is to overcome those weaknesses. Determine where their
level in relevant areas falls in the Dreyfus model and how much it matters
to the description of the position.
Potential Java interview weaknesses
Interviewing a candidate whose background is primarily Python based for a
generic Java developer position (as in Sayamindu's entry), I would
immediately flag the following areas as potential weaknesses:
- Primitive data types
A programmer can pretty much get away with never knowing how a number works in
Python, since integers that overflow are promoted to appropriately sized
(arbitrary-precision) types automatically.
The candidate needs to know what all the Java primitives are when the
names are provided to them, and must be able to describe why you would
choose to use one over another. Knowing pass-by-value versus
pass-by-reference is a plus. In Python there is a somewhat similar
distinction between mutable and immutable types — if they understand
the subtleties of identifier binding, learning by-ref versus by-value will
be a cinch. If they don't know either, I'll be worried.
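To make both points concrete, here's a quick Python 2 shell session
(sys.maxint shown for a 32-bit build; the exact value varies by platform):
>>> import sys
>>> sys.maxint + 1  # A fixed-size int silently promotes to a long.
2147483648L
>>> def rebind(x):
...     x = x + [4]  # Rebinds the local name; the caller is unaffected.
...
>>> def mutate(x):
...     x += [4]  # Mutates the shared list in place.
...
>>> items = [1, 2, 3]
>>> rebind(items)
>>> items
[1, 2, 3]
>>> mutate(items)
>>> items
[1, 2, 3, 4]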
- Object oriented design
The candidate's Python background must not be entirely procedural, or they
won't fare well in a Java environment (which forces object orientation).
Additionally, this would indicate that they probably haven't done much
design work: even if they're an object-orientation iconoclast, they have to
know what they're rebelling against and why.
They need to know:
When polymorphism is appropriate.
What should be exposed from an encapsulation perspective.
What the benefits of interfaces are (in a statically typed,
single-inheritance language).
Basically, if they don't know the fundamentals of object oriented design,
I'll assume they've only ever written "scripts," by which I mean,
"Small, unimportant code that glues the I/O of several real applications
together." I don't use the term lightly.
- Unit testing
If they've been writing real-world Python without a single unit test or
doctest, they've been Doing it Wrong (TM).
unittest is purposefully modeled on xUnit. They may have to learn the
new JUnit 4 annotation syntax (akin to Python decorators) when they start
work, but they should be able to claim they've worked with a JUnit 3-style API.
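For reference, a minimal unittest suite looks like the following toy sketch;
structurally it's the same xUnit arrangement a JUnit 3 veteran would recognize:
import unittest

class SequenceTest(unittest.TestCase):
    """Toy test case: fixture methods and assertions mirror xUnit."""

    def setUp(self):
        self.seq = range(5)

    def test_reversal(self):
        self.seq.reverse()
        self.assertEqual(self.seq, [4, 3, 2, 1, 0])

    def test_index_error(self):
        self.assertRaises(IndexError, lambda: self.seq[10])

if __name__ == '__main__':
    unittest.main()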
- Abstract data structures
Python has tuples, lists and dictionaries — all polymorphic containers —
and they'll do everything that your programmer heart desires. [§] Some
other languages don't have such nice abstractions.
They must know (in ascending order of importance):
The difference between a HashMap and a TreeMap.
The difference between a vector and a linked list, and when one should be
preferred over the other. The names are unimportant — I'd clarify that a
vector was a dynamically growing array.
The "difference" between a tree and a graph.
Turning attention to you, the reader: if you're lacking in data structures
knowledge, I recommend you read a data structures book and actually
implement the data structures. Then, take a few minutes to figure out where
you'd actually use them in an application. They stick in your head fairly
well once you've implemented them once.
Some interviewers will ask stupid questions like how to implement sorting
algorithms. Again, just pick up a data structures book and implement them
once, and you'll get the gist. Refresh yourself before the interview,
because these are a silly favorite — very few people have to implement
sorting functions anymore.
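For instance, an insertion sort is only a few lines of Python; a refresher
sketch:
def insertion_sort(seq):
    """Sort `seq` in place by shifting each element left into position."""
    for i in range(1, len(seq)):
        value = seq[i]
        j = i - 1
        while j >= 0 and seq[j] > value:
            seq[j + 1] = seq[j]  # Shift larger elements right.
            j -= 1
        seq[j + 1] = value
    return seq

assert insertion_sort([3, 1, 4, 1, 5]) == [1, 1, 3, 4, 5]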
- Design patterns
Design patterns serve several purposes:
They establish a common language for communicating proposed solutions to
commonly found problems.
They prevent developers from inventing stupid solutions to a solved class
of problems.
They contain a suite of workarounds for inflexibilities in statically
typed languages.
I would want to assure myself that you had an appropriate knowledge of
relevant design patterns. More important than the names: if I describe them
to you, will you recognize them and their useful applications?
For example, have you ever used the observer pattern? Adapter pattern?
Proxying? Facade? You almost certainly had to use all of those if you've
done major design work in Python.
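To illustrate, here's a bare-bones observer in Python. The names are invented
for the example, but the shape is the one you'd be expected to recognize:
import sys

class Subject(object):
    """Minimal observer-pattern subject; callbacks stand in for observers."""

    def __init__(self):
        self._observers = []

    def subscribe(self, callback):
        self._observers.append(callback)

    def notify(self, event):
        for callback in self._observers:
            callback(event)

subject = Subject()
subject.subscribe(lambda event: sys.stdout.write('observed %r\n' % event))
subject.notify('an event')  # Prints: observed 'an event'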
- Background concepts
These are some things that I would feel extra good about if the candidate
knew and could accurately describe how they relate to their Python analogs:
The importance of string builders (the Python list-joining idiom; see the
sketch after this list)
Basic idea of how I/O streams work (Python files under the hood)
Basic knowledge of typecasting (Python has implicit polymorphism)
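Here's the string-builder point spelled out in Python; the word list is
invented for illustration:
words = ['spam'] * 1000  # Hypothetical data.

# Repeated += concatenation may build a new string on every iteration:
built = ''
for word in words:
    built += word

# The idiomatic Python "string builder" joins a sequence in one pass:
joined = ''.join(words)

assert built == joined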
Practical advice
Some (bad) interviewers just won't like you because you don't know their
favorite language. If you're interviewing for a position that's likely to be
Java oriented, find the easiest IDE out there and write an application in it
for fun. Try porting a Python application you wrote and see how the concepts
translate — that's often an eye-opener. Or katas!
If you find yourself unawares in an interview with these "language crusaders,"
there's nothing you can do but show that you have the capacity to learn their
language in the few weeks vacation you have before you start. If it makes you
feel better, keep a mental map from languages to number of jerks you've
encountered — even normalizing by developer-base size the results can be
surprising. ;-)
Learning Python by example: RC4
One of my friends at work was fakeplaining [*] that he had been on the Python
programming mailing list at work for some time, yet still did not know Python.
Being hopelessly suggestible in the face of obvious sarcasm, I decided to
sacrifice a few hours of sleep to the god of blog. [†]
Note that this entry is aimed at people who already know how to program and
have been looking for a tidbit to try in Python.
There are a lot of side notes I've left out for simplicity of explanation;
however, I also attempted to make the experience interesting by introducing one
of Python's more advanced features, called "generator functions," into the mix.
Hopefully it strikes a balance. Please comment if you are utterly confused by
generators — I may add an alternate section that allows the reader to avoid
them altogether.
You kata wanna...
A number of leaders in the programming community are hot on this trend called
"code katas." I'm actually a big fan of this trend, mainly because I've
been writing code for no reason, hating on it, throwing it away, and
subsequently rewriting it for my entire life, but I now get to call it
something cool- and ninja-sounding. Doing such things in my spare time is no
longer considered "inexcusable nerdiness;" rather, it's my small endeavor to
bring professionalism to the field of software engineering. *cough*
One reason that I really enjoy this new trend is that programmers are posting
their own morsel-sized programming problems left and right, giving ample
opportunities to explore new languages (and dusty corners of ones you know
well) without accidentally becoming BDFL of a seminal framework or utility.
[‡]
RC4 Pseudocode
Case in point, I'll use the recent kata from Programming Praxis for this
Python exercise, as they provide straightforward pseudocode. Here's the
encryption algorithm named RC4, as quoted from Programming Praxis:
The key array K is initialized like this:
for i from 0 to 255
    K[i] := i

j := 0

for i from 0 to 255
    j := (j + K[i] + key[i mod klen]) mod 256
    swap K[i], K[j]
Once the key array is initialized, the pseudo-random byte generator uses a
similar calculation to build the stream of random bytes:
i := 0
j := 0

start:
    i := (i + 1) mod 256
    j := (j + K[i]) mod 256
    swap K[i], K[j]
    output K[ (K[i] + K[j]) mod 256 ]
    goto start
The first step in writing our RC4 program is to translate this pseudocode to
Python, while the second step is to add a command line front-end for some
off-the-cuff implementation experience.
If you'd like to look at the final program ahead of time, grab a copy of my
reference implementation.
Porting the initialization
For initialization, we use a provided key to calculate a 256 entry integer
sequence. Open a new file called rc4.py and write the following function:
def initialize(key):
    """Produce a 256-entry list based on `key` (a sequence of numbers) as
    the first step in RC4.

    Note: indices in key greater than 255 will be ignored.
    """
    k = range(256)
    j = 0
    for i in range(256):
        j = (j + k[i] + key[i % len(key)]) % 256
        k[i], k[j] = k[j], k[i]
    return k
The simplicity of the translation demonstrates why Python is sometimes called
"executable pseudocode". Breaking it down line by line:
- The def line defines a function named initialize that takes a single
argument, key.
- Next comes a documentation string ("docstring" for short). In Python,
documentation is associated with a function even at runtime, in contrast to
traditional JavaDoc or POD. [§] If the first statement in a function is a
string literal, it is used as the docstring for that function. [¶]
- The built-in range function returns a list of values. [#] "Built-in"
is the terminology used for items that are "available all the time without
explicitly importing anything."
This function also has a two-argument form, range(start, stop);
however, in the single-argument form, start has a default of 0, so the
function invocation returns a list of all the integers in the mathematical
interval [0, 256), for a total of 256 values.
- There is only one for loop syntax: for [identifier] in [iterable].
Lists are iterable because they contain a sequence of objects.
- Finite collections also support the built-in function len([sizable]).
- The way that numerical arithmetic works and sequence indexing via
seq[idx] should be familiar.
- Python has an elegant swap capability — what's important to note is
that the entire right hand side is evaluated, then assigned to the left
hand side.
- Python functions optionally return a value. If no return statement is
encountered, None is returned, which indicates the absence of a value
(docs).
Generators: functions that pause
Python has a convenient feature, called "generator functions," that allows you
to create a stream of values using function-definition syntax. [♠] You can
think of generator functions as special functions that can pause and resume,
returning a value each time they pause.
The canonical example illustrates the concept well — use the interactive
Python shell to explore how generator functions work, by running the python
command without arguments. Make sure the version is python2.3 or above.
Once you're in the interactive shell, type the following:
>>> def gen_counter():
...     i = 0
...     while True:
...         yield i
...         i += 1
...
>>>
Note the use of a yield statement, which tells Python that it is a
generator function. Calling a generator function creates an iterable generator
object, which can then produce a potentially infinite series of values:
>>> counter = gen_counter()
>>> print counter
<generator object gen_counter at 0xb7e3fbe4>
>>> counter.next()
0
>>> counter.next()
1
Note that because the stream of values is potentially infinite and lazily
evaluated, there's no concept of length: it's not representative of a container
so much as a sequence:
>>> len(counter)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: object of type 'generator' has no len()
Also important to note is that none of the local values in the generator
instances are shared; i.e. instantiating a second generator has no effect on
the first:
>>> one_counter = gen_counter()
>>> another_counter = gen_counter()
>>> one_counter.next()
0
>>> another_counter.next()
0
Since generators are iterable, you can use them in a for loop just like
containers. (But watch out for infinite generators and no break condition
in your for loop!)
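For example, you can bound an infinite generator with an explicit break:
>>> for n in gen_counter():
...     if n > 3:
...         break
...     print n,
...
0 1 2 3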
If you're still confused, the official tutorial's section on generators
may help to clarify things. For an in-depth look at generators and why they're
awesome, David M. Beazley's PyCon presentation on generators is
excellent.
Applying generators to RC4
This dove-tails nicely with the second part of the algorithm, which requires a
stream of values to XOR against. The generator is nearly a direct translation
from the pseudocode, which you may also add to rc4.py:
def gen_random_bytes(k):
    """Yield a pseudo-random stream of bytes based on 256-byte array `k`."""
    i = 0
    j = 0
    while True:
        i = (i + 1) % 256
        j = (j + k[i]) % 256
        k[i], k[j] = k[j], k[i]
        yield k[(k[i] + k[j]) % 256]
Each time .next() is called on the generator instance, the function
executes until the next yield statement is encountered, produces that
value, and saves the function state for later.
Yes, we could create a big list of pseudo-random values the length of the
text, but creating them all at the same time adds O(len(text)) memory
overhead, whereas the generator has constant memory overhead (and is
computationally efficient).
Tying it together
Now we just need a function that does the XORing, which teaches us about
strings and characters.
def run_rc4(k, text):
    cipher_chars = []
    random_byte_gen = gen_random_bytes(k)
    for char in text:
        byte = ord(char)
        cipher_byte = byte ^ random_byte_gen.next()
        cipher_chars.append(chr(cipher_byte))
    return ''.join(cipher_chars)
Line by line:
- An empty list is created to accumulate cipher characters.
- The generator object is instantiated by calling the generator function.
- As you can see from the for loop, Python strings are iterable as
sequences of characters. Characters in Python are just strings of length
one, so you can think of a string iterator as stepping over all of its
one-character substrings in order.
- To convert a textual character into its character-code numerical value, the
built-in ord function is used (docs).
- The meat of the algorithm: XOR the textual character's value with the next
pseudo-random byte from the byte stream.
- After obtaining the cipher-byte through the XOR, we convert back to a
textual (character) representation via the built-in chr function (docs).
We then place that character into a sequence of cipher characters. [♥]
- To join the characters together into a string, we use the
str.join([iterable]) method (docs). [♦] Note that, on some
platforms, this is much more efficient than using += (for string
concatenation) over and over again. It's a best practice to use this
sequence-joining idiom to avoid possible concatenation overhead. [♣]
Front-end fun
If you thought that the pseudo-code translation looked like a piece of cake,
you may feel up to a challenge: write a command line interface that:
- Asks for an encryption key.
- Turns the key into a sequence of integer values and initializes with it.
- Continually asks for user-provided text to translate and spits out the
corresponding cipher text.
What you need to know
In Python 2.x print is a statement that is followed by comma-separated
values, where each comma turns into a space. The print statement puts a
newline at the end of its output by default:
>>> print 'a', 'b', 'c', 'd'
a b c d
To suppress the newline (for example, in a loop), leave on a trailing
comma:
>>> for char in ['a', 'b', 'c', 'd']:
...     print char,
...
a b c d
If there's something that you can't do using the above, I'll refer you to
the Python tutorial's section on fancier output formatting.
The built-in function called raw_input (docs) displays a message
and then requests user input as a command line, returning the user input as
a string. For example:
name = raw_input('What is your name? ')
print 'Your name is', name
The built-in function called repr (docs) returns the Python
representation of a string (or any object) — this is useful for escaping
strings with funky, non-printable characters in them, as our cipher
algorithm is likely to produce. For example, you'll probably want to use
something along the lines of:
cipher_text = run_rc4(k, text)
print 'Cipher text:', repr(cipher_text)
The character-escaping performed by repr can be reversed using the
string decode method: str.decode('string_escape'). For example:
>>> text_with_escapes = '\\x61\\x62\\x63'
>>> print text_with_escapes
\x61\x62\x63
>>> print text_with_escapes.decode('string_escape')
abc
So if you want to allow the user to enter ciphertext at the command prompt,
you can read it in and decode it from the escaped format.
If you just put the CLI front-end code to execute at the bottom of the
rc4.py file, it will work; however, it's a best practice to test to
make sure that rc4.py is the file that's being executed. To do this,
wrap the command line interface code in a test like the following:
if __name__ == '__main__': # This is the file being executed.
    main()
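Putting those pieces together, here's a rough sketch of one possible front
end. My reference implementation differs in its details, and the
quote-detection heuristic here is purely illustrative:
def main():
    key_text = raw_input('Enter an encryption key: ')
    key = [ord(char) for char in key_text]
    while True:  # Loop until the user interrupts (Ctrl-C).
        text = raw_input('Enter plain or cipher text: ')
        if text.startswith("'") and text.endswith("'"):
            print 'Unescaping ciphertext...'
            text = text[1:-1].decode('string_escape')
        k = initialize(key)  # Fresh key stream for every message.
        print 'Your RC4 text is:', repr(run_rc4(k, text))

if __name__ == '__main__':
    main()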
If you need help
I wrote a reference implementation and posted it to github — feel free to
check it out if you get stuck.
Here's a sample usage of my implementation:
===========
RC4 Utility
===========
The output values are valid Python strings. They may contain escape characters
of the form \xhh to avoid confusing your terminal emulator. Only the first 256
characters of the encryption key are used.
Enter an encryption key: an encryption key!
Enter plain or cipher text: Victory is mine!
Your RC4 text is: '\xecL\xce(\x16\x8e3\xf02!\xcd\xc6\x9a\xc0j\x98'
Enter plain or cipher text: '\xecL\xce(\x16\x8e3\xf02!\xcd\xc6\x9a\xc0j\x98'
Unescaping ciphertext...
Your RC4 text is: 'Victory is mine!'
Once you find that your cipher is reversible, you've probably got it right!
Again, please comment if anything is unclear.
Python 3.1, you shouldn't have!
I highly appreciate the presents that the Python 3.1 team (unwittingly) got me
for my birthday this year. This morning I wrote the following snippet to
determine the day-frequency of birthday occurrences: [*]
#!/usr/bin/env python3.1
import collections
import datetime
from operator import itemgetter

birthday = {part: int(input('Enter birth {}: '.format(part)))
            for part in 'year month day'.split()}
now = datetime.datetime.now()
days = []
for year in range(birthday['year'], now.year + 1):
    iter_birthday = dict(birthday)
    iter_birthday['year'] = year
    date = datetime.date(**iter_birthday)
    days.append(date.strftime('%A'))

counter = collections.Counter(days)
fmt = '{:15} {:>3}'
for day, frequency in sorted(counter.items(), key=itemgetter(1),
                             reverse=True):
    print(fmt.format(day, frequency))
print(fmt.format('Total', sum(counter.values())))
Check out the Python 3.1 goodness: automatically numbered format strings and
collections.Counter. It's also a relief that I no longer have to type
iter, x, or raw — Python is even prettier without the 2.x cruft.
Turns out that my original day of birth is still in the lead!
Updates
Added dict comprehensions per astute comments. :-)
Replaced lambda argument to sorted with the more readable
operator.itemgetter, which I didn't realize exists! Thanks @xtian.
Registry pattern trumps import magic
The other night I saw an interesting tweet in the #Python Twitter channel —
Patrick was looking to harness the dynamism of a language like Python in a
way that many Pythonistas would consider magical. [*] Coming from languages
with more rigid execution models, it's understandably easy to confuse
dynamic and magical. [†]
What is magic?
To quote the jargon file, magic is:
Characteristic of something that works although no one really understands
why (this is especially called black magic).
Taken in the context of programming, magic refers to code that works without a
straightforward way of determining why it works.
Today's more flexible languages provide the programmer with a significant
amount of power at runtime, making the barrier to "accidental magic" much
lower. As a programmer who works with dynamic languages, you have an important
responsibility to keep in mind: err on the side of caution with the Principle
of Least Surprise.
[T]o design usable interfaces, it's best when possible not to design an
entire new interface model. Novelty is a barrier to entry; it puts a
learning burden on the user, so minimize it.
This principle indicates that using well known design patterns and language
idioms is a "best practice" in library design. When you follow that
guideline, people will already have an understanding of the interface that
you're providing; therefore, they will have one less thing to worry about in
leveraging your library to write their code.
Discovery Mechanism Proposals
Patrick is solving a common category of problem: he wants to allow clients to
flexibly extend his parsing library's capabilities. For example, if his
module knows how to parse xml and yaml files out of the box,
programmers using his library should be able to add their own rst and
html parser capabilities with ease.
Patrick's proposal is this:
1. Have the programmer place all extension modules that might contain
parser classes in a known directory.
2. In a factory class constructor, take a directory listing of the known
directory.
3. Import every module present in that listing.
4. Inspect each module imported this way for class members.
5. For each class found, add it to an accumulator if it inherits from a
Parser abstract base class provided by the module.
If you were to do this, you would use the various utilities in the imp
module to load the modules dynamically, then determine the appropriate
classes via the inspect module. [‡]
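For concreteness, that reflective approach might look something like this
sketch; the directory layout and the Parser base class are hypothetical:
import imp
import inspect
import os

def discover_parsers(plugin_dir, parser_base_cls):
    """Load every module in `plugin_dir` and collect Parser subclasses."""
    parsers = []
    for filename in os.listdir(plugin_dir):
        modname, ext = os.path.splitext(filename)
        if ext != '.py':
            continue
        file, path, desc = imp.find_module(modname, [plugin_dir])
        try:
            module = imp.load_module(modname, file, path, desc)
        finally:
            if file:
                file.close()
        for name, cls in inspect.getmembers(module, inspect.isclass):
            if issubclass(cls, parser_base_cls) and cls is not parser_base_cls:
                parsers.append(cls)
    return parsers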
My counter-proposal is this, which is also known as the Registry
Pattern, a form of runtime configuration and behavior extension:
Parser library:
class UnknownMimetypeException(Exception): pass
class ParseError(Exception): pass


class IParser:
    """
    Reference interface for parser classes;
    inheritance is not necessary.
    """

    parseable_mimetypes = set()

    def __init__(self, file):
        self.file = file
        self.doctree = None

    def parse(self):
        """
        Parse :ivar:`file` and place the parsed document
        tree into :ivar:`doctree`.
        """
        raise NotImplementedError


class ParserFacade:
    """
    Assumes that there can only be one parser per mimetype.

    :ivar mimetype_to_parser_cls: Storage for parser registry.
    """

    def __init__(self):
        self.mimetype_to_parser_cls = {}

    def register_parser(self, cls):
        for mimetype in cls.parseable_mimetypes:
            self.mimetype_to_parser_cls[mimetype] = cls
        return cls # For use as a decorator.

    def parse(self, file, mimetype):
        """
        Determine the appropriate parser for the mimetype,
        create a parser to parse the file, and perform
        the parsing.

        :return: The parser object.
        """
        try:
            parser_cls = self.mimetype_to_parser_cls[mimetype]
        except KeyError:
            raise UnknownMimetypeException(mimetype)
        parser = parser_cls(file)
        parser.parse() # May raise ParseError
        return parser


default_facade = ParserFacade()
register_parser = default_facade.register_parser
parse = default_facade.parse
Client code:
from parser_lib import register_parser

@register_parser
class SpamParser:
    """
    Parses ``.spam`` files.

    Conforms to implicit parser interface of `parser_lib`.
    """

    parseable_mimetypes = {'text/spam'}

    def __init__(self, file):
        self.file = file
        self.doctree = None

    def parse(self):
        raise NotImplementedError
After the client code executes, the SpamParser will then be available for
parsing text/spam mimetype files via parser_lib.parse.
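Usage then looks like ordinary library code. The module and file names here
are invented, and a real SpamParser.parse would build a doctree rather than
raising:
import parser_lib
import spam_parser  # Hypothetical module containing the SpamParser above.

parser = parser_lib.parse(open('dinner.spam'), 'text/spam')
print parser.doctree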
Here are some of my considerations in determining which of these is the least
magical:
Which interface is the easiest to explain?
Which implementation will be the easiest to explain?
Which is more fragile? (Which is most likely to break when
"special case uses" crop up?)
Which is easier to test?
Magical Allure
The problem with magic is that it is freaking cool and it drives all the
ladies crazy. [¶] As a result, the right hemisphere of your developer-brain
yearns for your library clients to read instructions like:
Drag and drop your Python code into my directory — I'll take care of it
from there.
That's right, that's all there is to it.
Oh, I know what you're thinking — yes, I'm available — check out
parser_lib.PHONE_NUMBER and give me a call sometime.
But, as you envision phone calls from sexy Pythonistas, the left hemisphere of
your brain is screaming at the top of its lungs! [#]
Magic leaves the audience wondering how the trick is done, and the analytical
side of the programmer mind hates that. It implies that there's a non-trivial
abstraction somewhere that does reasonably complex things, but it's unclear
where it can be found or how to leverage it differently.
Coders need control and understanding of their code and, by extension, as much
control and understanding over third party code as is reasonably possible.
Because of this, concise, loosely coupled, and extensible abstractions are
always preferred to the imposition of elaborate usage design ideas on
clients of your code. It's best to assume that people will want to leverage the
functionality your code provides, but that you can't foresee the use cases.
To Reiterate: Dynamic does not Imply Magical
Revisiting my opening point: anecdotal evidence suggests that some members of
the static typing camp see us programming-dynamism dynamos as anarchic
lovers of programming chaos. Shoot-from-the-hip cowboys, strolling into
lawless towns of code, type checking blowing by the vacant sheriff's station as
tumbleweeds in the wind. (Enough imagery for you?) With this outlook, it's easy
to see why you would start doing all sorts of fancy things when you cross into
dynamism town — little do you know, we don't take kindly to that 'round these
parts.
In other, more intelligible words, this is a serious misconception — dynamism
isn't a free pass to disregard the Principle of Least Surprise — dynamism
proponents still want order in the programming universe. Perhaps we value our
sanity even more! The key insight is that programming dynamism does allow
you additional flexibility when it's required or practical to use. More
rigid execution models require you to use workarounds, laboriously at times,
for a similar degree of flexibility.
As demonstrated by Marius' comment in my last entry, Python coders
have a healthy respect for the power of late binding, arbitrary
code execution on module import, and seamless platform integration.
Accompanying this is a healthy wariness of black magic.
Caveat
It's possible that Patrick was developing a closed-system application (e.g.
the Eclipse IDE) and not a library like I was assuming.
In the application case, extensions are typically discovered (though not
necessarily activated) by enumerating a directory. When the user activates such
an extension, the modules found within it are loaded into the application.
This is the commonly found plugin model — it's typically more difficult to
wrap the application interface and do configurations at load time, so the
application developer must provide an extension hook.
However, the registration pattern should still be preferred to reflection in
this case! When the extension is activated and the extension modules load, the
registration decorator will be executed along with all the other top-level code
in the extension modules.
The extension has the capability to inform the application of the extension's
functionality instead of having the application query the plugin for its
capabilities. This is a form of loosely coupled cooperative configuration
that eases the burden on the application and eliminates the requirement to
foresee needs of the extensions. [♠]
Monstrous polymorphism and a Python post-import hook decorator
I queue up a few thousand things to do before I get on an airplane: synchronize
two-thousand Google Reader entries, load up a bunch of websites I've been
meaning to read, and make sure for-fun projects are pulled from their most
updated branches.
Then, once I get up in the air, I realize that I don't really want to do 90%
of those things crammed into a seat with no elbow room. I end up doing one or
two. Along with reading Stevey's Drunken Blog Rant: When Polymorphism
Fails, this entry is all the productivity I can claim. The full code
repository for this entry is online if you'd like to follow along.
Polymorphism Recap
The word "polymorphic" comes from Greek roots meaning "many shaped." (Or they
lied to me in school — one of those.) From a worldly perspective I can see
this meaning two things:
A single object can take on many shapes, or
Requirements for a general "shape" can be satisfied by different categories
of objects.
As it turns out, both of these concepts apply to Object-Oriented
programming, but the canonical meaning is the latter. [*] As Yegge
says:
If you have a bunch of similar objects [...], and they're all supposed to
respond differently to some situation, then you add a virtual method to them
and implement it differently for each object.
(If you don't know what a virtual method is, the Wikipedia page has an
alternate explanation.)
Yegge's Example
Yegge demonstrates that strictly adhering to the principles of polymorphism
does not always produce the best design:
Let's say you've got a big installed base of monsters. [...] Now let's say one
of your users wants to come in and write a little OpinionatedElf monster. [...]
Let's say the OpinionatedElf's sole purpose in life is to proclaim whether it
likes other monsters or not. It sits on your shoulder, and whenever you run
into, say, an Orc, it screams bloodthirstily: "I hate Orcs!!! Aaaaaargh!!!"
(This, incidentally, is how I feel about C++.)
The polymorphic approach to this problem is simple: go through every one of
your 150 monsters and add a doesMrOpinionatedElfHateYou() method.
This is a great counterexample — it induces an instant recognition of absurdity.
He then touches on the fact that dynamic languages allow you to do neat things
consistent with polymorphism due to the flexibility of the object structure
(which is typically just a hash map from identifiers to arbitrary object
values):
I guess if you could somehow enumerate all the classes in the system, and
check if they derive from Monster, then you could do this whole thing in a few
lines of code. In Ruby, I bet you can... but only for the already-loaded
classes. It doesn't work for classes still sitting on disk! You could solve
that, but then there's the network...
This is clearly impractical, but I figured there was some exploratory value to
implementing this challenge in Python. This entry is a small walk-through for
code to detect interface conformity by inspection, enumerate the classes in the
environment, manipulate classes in place, and add an import hook to manipulate
classes loaded from future modules.
The Antagonist
Double entendre intended. :-)
class OpinionatedElf(object):

    is_liked_by_class_name = {
        'OpinionatedElf': True,
        'Orc': False,
        'Troll': False}

    def __init__(self, name):
        self.name = name

    def be_scary(self):
        print("I'm small, ugly, and don't like the cut of your jib!")

    def proclaim_penchance(self, other):
        if not IMonster.is_conforming(other):
            print("I can't even tell what that is!")
            return
        is_liked = other.is_liked_by_elf()
        class_name = other.__class__.__name__
        if is_liked is None:
            print("I'm not sure how I feel about %s" % class_name)
            return
        emotion = 'love' if is_liked else 'hate'
        print('I %s %s!!! Aaaaaargh!!!' % (emotion, class_name))
Determining which Classes are Monsters
First of all, Python doesn't require (nor does it encourage) a rigid type
hierarchy. Python's all about the interfaces, which are often implicit.
Step one is to create a way to recognize classes that implement the
monster interface:
import logging

class IMonster(object):

    required_methods = ['be_scary']

    def be_scary(self):
        raise NotImplementedError

    @classmethod
    def is_conforming(cls, object):
        result = all(callable(getattr(object, attr_name, None))
                     for attr_name in cls.required_methods)
        logging.debug('%s conforms? %s', object, result)
        return result

assert IMonster.is_conforming(IMonster)
This is a simple little class — there are better third party libraries to use
if you want real interface functionality (i.e. more generic conformity
testing and Design By Contract).
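A quick demonstration with invented classes shows the duck-typed conformance
check at work:
class Orc(object):
    def be_scary(self):
        print('Grrraaahhh!')

class Puppy(object):
    def be_cute(self):
        print('Wag.')

assert IMonster.is_conforming(Orc)        # Has a callable be_scary.
assert not IMonster.is_conforming(Puppy)  # Lacks the monster interface.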
Enumerating the Classes in the Environment
All of the modules that have been loaded into the Python environment are placed
into sys.modules. By inspecting each of these modules, we can
manipulate the classes contained inside if they conform to our monster
interface.
for name, module in sys.modules.iteritems():
    extend_monsters(module)
The extend_monsters function is a bit nuanced because immutable modules
also live in sys.modules. We skip those, along with abstract base classes,
which have trouble with inspect.getmembers():
import imp
import inspect
import logging

def extend_monsters(module, extension_tag='_opinionated_extended'):
    """Extend monsters in the module's top-level namespace to
    tell if they are liked by the :class:`OpinionatedElf`,
    and tag the module with :param:`extension_tag` as a flag name.

    Do not attempt to extend already-flagged modules.
    Do not clobber existing methods with the extension method name.

    Warning: swallows exceptional cases where :param:`module`
    is builtin, frozen, or None.
    """
    name = module.__name__ if module else None
    logging.info('Instrumenting module %s', name)
    if not module or imp.is_builtin(name) or imp.is_frozen(name) \
            or getattr(module, extension_tag, False):
        logging.info('Skipping module: %s', name)
        return
    setattr(module, extension_tag, True)  # Flag module as already extended.
    classes = inspect.getmembers(module, inspect.isclass)
    for name, cls in classes:
        logging.debug('%s: %s', name, cls)
        try:
            conforming = IMonster.is_conforming(cls)
        except AttributeError, e:
            if '__abstractmethods__' in str(e):  # Abstract base class.
                continue
            raise
        if not conforming:
            continue
        class_name = cls.__name__
        logging.debug('Instrumenting class %s', class_name)
        attr_name = 'is_liked_by_elf'
        if hasattr(cls, attr_name):  # Don't clobber existing methods.
            logging.warn('Method already exists: %s', cls)
            continue
        logging.info('Setting %s on %s', attr_name, class_name)
        setattr(cls, attr_name,
                lambda self: OpinionatedElf.is_liked_by_class_name.get(
                    self.__class__.__name__, None))
If we were going to be thorough, we would recurse on the members of the class
to see if the class scope was enclosing any more IMonster classes, but
you're never really going to find them all: if a module defines a monster class
in a function-local scope, there's no good way to get the local class statement
and modify it through inspection.
In any case, we're at the point where we can modify all monsters in the
top-level namespace of already-loaded modules. What about modules that we have
yet to load?
Post-import Hook
There is no standard post-import hook (that I know of) in Python. PEP 369
looks promising, but I couldn't find any record of additional work being done
on it. The current import hooks, described in PEP 302, are all pre-import
hooks. As such, you have to decorate the __import__ builtin, wrapping the
original with your intended post-import functionality, like so: [†]
def import_decorator(old_import, post_processor):
    """
    :param old_import: The import function to decorate, most likely
        ``__builtin__.__import__``.
    :param post_processor: Function of the form
        `post_processor(module) -> module`.
    :return: A new import function, most likely to be assigned to
        ``__builtin__.__import__``.
    """
    assert all(callable(fun) for fun in (old_import, post_processor))
    def new_import(*args, **kwargs):
        module = old_import(*args, **kwargs)
        return post_processor(module)
    return new_import
After which we can replace the old __import__ with its decorated counterpart:
import __builtin__

__builtin__.__import__ = import_decorator(__builtin__.__import__,
                                          extend_monsters)
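From then on, any freshly imported module gets the same treatment. For
example, with a hypothetical bestiary module that defines more monster
classes:
>>> import bestiary  # Hypothetical module; extend_monsters runs on it.
>>> bestiary.Orc().is_liked_by_elf()
False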
The Network
Yegge brings up the issue of dynamically generated classes by mentioning
network communications, calling to mind examples such as Java's RMI and
CORBA. This is a scary place to go, even just conceptualizing. If
metaclasses are used, I don't see any difficulty in decorating __new__ with
the same kind of inspection we employed above; however, code generation
presents potentially insurmountable problems.
Decorating the eval family of functions to modify new classes created
seems possible, but it would be challenging and requires additional
research on my part. exec is a keyword/statement, which I would think is a
hopeless cause.