Posts Tagged ‘Dynamism’

Inedible vectors of spam: learning non-reified generics by example

Friday, December 4th, 2009

I’ve been playing with the new Java features, only having done minor projects in it since 1.4, and there have been a lot of nice improvements! One thing that made me do a double take, however, was a run-in with non-reified types in Java generics. Luckily, one of my Java-head friends was online and beat me with a stick of enlightenment until I understood what was going on.

In the Java generic system, the collections are represented by two separate, yet equally important concepts — the compile-time generic parameters and the run-time casts who check the collection members. These are their stories.

Dun, dun!

An example: worth a thousand lame intros

The following code is a distilled representation of the situation I encountered:

import java.util.Arrays;
import java.util.List;
 
public class BadCast {
 
    static interface Edible {}
 
    static class Spam implements Edible {}
 
    List<Spam> canSomeSpam() {
        return Arrays.asList(new Spam(), new Spam(), new Spam());
    }
 
    /**
     * @note Return type *must* be List<Editable> (because we intend to
     *       implement an interface that requires it).
     */
    List<Edible> castSomeSpam() {
        return (List<Edible>) canSomeSpam();
    }
 
}

It produced the following error in my IDE:

Cannot cast from List<BadCast.Spam> to List<BadCast.Edible>

At which point I scratched my head and thought, "If all Spams are Edible, [*] why won’t it let me cast List<Spam> to List<Edible>? This seems silly."

Potential for error

A slightly expanded example points out where that simple view goes wrong: [†]

import java.util.Arrays;
import java.util.List;
import java.util.Vector;
 
public class GenericFun implements Runnable {
 
    static interface Edible {}
 
    static class Spam implements Edible {
 
        void decompose() {}
    }
 
    List<Spam> canSomeSpam() {
        return Arrays.asList(new Spam(), new Spam(), new Spam());
    }
 
    /**
     * Loves to stick his apples into things.
     */
    static class JohnnyAppleseed {
 
        static class Apple implements Edible {}
 
        JohnnyAppleseed(List<Edible> edibles) {
            edibles.add(new Apple());
        }
 
    }
 
    @Override
    public void run() {
        List<Spam> spams = new Vector<Spam>(canSomeSpam());
        List<Edible> edibles = (List<Edible>) spams;
        new JohnnyAppleseed(edibles); // He puts his apple in our spams!
        for (Spam s : spams) {
            s.decompose(); // What does this do when it gets to the apple!?
        }
    }
}

We make a (mutable) collection of spams, but this time, unlike in the previous example, we keep a reference to that collection. Then, when we give it to JohnnyAppleseed, he sticks a damn Apple in there, invalidating the supposed type of spams! (If you still don’t see it, note that the object referenced by spams is aliased to edibles.) Then, when we invoke the decompose method on the Apple that is confused with a Spam, what could possibly happen?!

The red pill: there is no runtime-generic-type-parameterization!

Though the above code won’t compile, this kind of thing actually is possible, and it’s where the implementation of generics starts to leak through the abstraction. To quote Neal Gafter:

Many people are unsatisfied with the restrictions caused by the way generics are implemented in Java. Specifically, they are unhappy that generic type parameters are not reified: they are not available at runtime. Generics are implemented using erasure, in which generic type parameters are simply removed at runtime. That doesn’t render generics useless, because you get typechecking at compile-time based on the generic type parameters, and also because the compiler inserts casts in the code (so that you don’t have to) based on the type parameters.

The implementation of generics using erasure also causes Java to have unchecked operations, which are operations that would normally check something at runtime but can’t do so because not enough information is available. For example, a cast to the type List<String> is an unchecked cast, because the generated code checks that the object is a List but doesn’t check whether it is the right kind of list.

At runtime, List<Edible> is no different from List. At compile-time, however, a List<Edible> cannot be cast to from List<Spam>, because it knows what evil things you could then do (like sticking Apples in there).

But if you did stick an Apple in there (like I told you that you can actually do, with evidence to follow shortly), you wouldn’t know anything was wrong until you tried to use it like a Spam. This is a clear violation of the "error out early" policy that allows you to localize your debugging. [‡]

In what way does the program error out when you try to use the masquerading Apple like a Spam? Well, when you write:

for (Spam s : spams) {
    s.decompose(); // What does this do when it gets to the apple!?
}

The code the compiler actually generates is:

for (Object s : spams) {
    ((Spam)s).decompose();
}

At which point it’s clear what will happen to the Apple instance — a ClassCastException, because it’s not a Spam!

Exception in thread "main" java.lang.ClassCastException: GenericFun$JohnnyAppleseed$Apple cannot be cast to GenericFun$Spam
        at GenericFun.run(GenericFun.java:36)

Backpedaling

Okay, so in the first example we didn’t keep a reference to the List around, making it acceptable (but bad style) to perform an unchecked cast:

Since, under the hood, the generic type parameters are erased, there’s no runtime difference between List<Edible> and plain ol’ List. If we just cast to List, it will give us a warning:

Type safety: The expression of type List needs unchecked conversion to conform to List<BadCast.Edible>

The real solution, though, is to just make an unnecessary "defensive copy" when you cross this function boundary; i.e.

List<Edible> castSomeSpam() {
    return new Vector<Edible>(canSomeSpam());
}

Footnotes

[*] Obviously a point of contention among ham connoisseurs.
[†]

This doesn’t compile, because we’re imagining that the cast were possible. Compilers don’t respond well when you ask them to imagine things:

$ javac 'Imagine you could cast List<Spam> to List<Edible>!'
javac: invalid flag: Imagine you could cast List<Spam> to List<Edible>!
Usage: javac <options> <source files>
use -help for a list of possible options
[‡] Note that if you must do something like this, you can use a Collections.checkedList to get the early detection. Still, the client is going to be pissed that they tried to put their delicious Ham in there and got an unexpected ClassCastException — probably best to use Collections.unmodifiableList if the reference ownership isn’t fully transferred.

Registry pattern trumps import magic

Monday, June 1st, 2009

The other night I saw an interesting tweet in the #Python Twitter channelPatrick was looking to harness the dynamism of a language like Python in a way that many Pythonistas would consider magical. [*] Coming from languages with more rigid execution models, it’s understandably easy to confuse dynamic and magical. [†]

What is magic?

To quote the jargon file, magic is:

Characteristic of something that works although no one really understands why (this is especially called black magic).

Taken in the context of programming, magic refers to code that works without a straightforward way of determining why it works.

Today’s more flexible languages provide the programmer with a significant amount of power at runtime, making the barrier to "accidental magic" much lower. As a programmer who works with dynamic languages, there’s an important responsibility to keep in mind: err on the side of caution with the Principle of Least Surprise.

[T]o design usable interfaces, it’s best when possible not to design an entire new interface model. Novelty is a barrier to entry; it puts a learning burden on the user, so minimize it.

This principle indicates that using well known design patterns and language idioms is a "best practice" in library design. When you follow that guideline, people will already have an understanding of the interface that you’re providing; therefore, they will have one less thing to worry about in leveraging your library to write their code.

Discovery Mechanism Proposals

Patrick is solving a common category of problem: he wants to allow clients to flexibly extend his parsing library’s capabilities. For example, if his module knows how to parse xml and yaml files out of the box, programmers using his library should be able to add their own rst and html parser capabilities with ease.

Patrick’s proposal is this:

  • Have the programmer place all extension modules that might contain parser classes in a known directory.
  • In a factory class constructor, take a directory listing of the known directory.
  • Import every module present in that listing.
  • Inspect each module imported this way for class members.
  • For each class found, add it to an accumulator if it inherits from a Parser abstract base class provided by the module.

If you were to do this, you would use the various utilities in the imp module to load the modules dynamically, then determine the appropriate classes via the inspect module. [‡]

My counter-proposal is this, which is also known as the Registry Pattern, a form of runtime configuration and behavior extension:

  • Have the programmer import a decorator from our module.
  • Let them decorate any class [§] that conforms to the implicit Parser interface.

Parser library:

class UnknownMimetypeException(Exception): pass
class ParseError(Exception): pass
 
class IParser:
 
    """Reference interface for parser classes -- inheritance is not
    necessary."""
 
    parseable_mimetypes = set()
 
    def __init__(self, file):
        self.file = file
        self.doctree = None
 
    def parse(self):
        """Parse :ivar:`file` and place the parsed document tree into
        :ivar:`doctree`.
        """
        raise NotImplementedError
 
 
class ParserFacade:
 
    """Assumes that there can only be one parser per mimetype.
    :ivar mimetype_to_parser_cls: Storage for parser registry.
    """
 
    def __init__(self):
        self.mimetype_to_parser_cls = {}
 
    def register_parser(self, cls):
        for mimetype in cls.parseable_mimetypes:
            self.mimetype_to_parser_cls[mimetype] = cls
 
    def parse(self, file, mimetype):
        """Determine the appropriate parser for the mimetype, create a
        parser to parse the file, and perform the parsing.
 
        :return: The parser object.
        """
        try:
            parser_cls = self.mimetype_to_parser_cls[mimetype]
        except KeyError:
            raise UnknownMimetypeException(mimetype)
 
        parser = parser_cls(file)
        parser.parse() # May raise ParseError
        return parser
 
 
default_facade = ParserFacade()
register_parser = default_facade.register_parser
parse = default_facade.parse

Client code:

from parser_lib import register_parser
 
@register_parser
class SpamParser:
 
    """Parses ``.spam`` files.
    Conforms to implicit parser interface of `parser_lib`.
    """
 
    parseable_mimetypes = {'text/spam'}
 
    def __init__(self, file):
        self.file = file
        self.doctree = None
 
    def parse(self):
        raise NotImplementedError

After the client code executes, the SpamParser will then be available for parsing text/spam mimetype files via parser_lib.parse.

Here are some of my considerations in determining which of these is the least magical:

  • Which interface is the easiest to explain?
  • Which implementation will be the easiest to explain?
  • Which is more fragile? (Which is most likely to break when "special case uses" crop up?)
  • Which is easier to test?

Magical Allure

The problem with magic is that it is freaking cool and it drives all the ladies crazy. [¶] As a result, the right hemisphere of your developer-brain yearns for your library clients to read instructions like:

Drag and drop your Python code into my directory — I’ll take care of it from there.

That’s right, that’s all there is to it.

Oh, I know what you’re thinking — yes, I’m available — check out parser_lib.PHONE_NUMBER and give me a call sometime.

But, as you envision phone calls from sexy Pythonistas, the left hemisphere of your brain is screaming at the top of its lungs! [#]

Magic leaves the audience wondering how the trick is done, and the analytical side of the programmer mind hates that. It implies that there’s a non-trivial abstraction somewhere that does reasonably complex things, but it’s unclear where it can be found or how to leverage it differently.

Coders need control and understanding of their code and, by extension, as much control and understanding over third party code as is reasonably possible. Because of this, concise, loosely coupled, and extensible abstractions are always preferred to the imposition of elaborate usage design ideas on clients of your code. It’s best to assume that people will want to leverage the functionality your code provides, but that you can’t foresee the use cases.

To Reiterate: Dynamic does not Imply Magical

Revisiting my opening point: anecdotal evidence suggests that some members of the static typing camp see we programming-dynamism dynamos as anarchic lovers of programming chaos. Shoot-from-the-hip cowboys, strolling into lawless towns of code, type checking blowing by the vacant sheriff’s station as tumbleweeds in the wind. (Enough imagery for you?) With this outlook, it’s easy to see why you would start doing all sorts of fancy things when you cross into dynamism town — little do you know, we don’t take kindly to that ’round these parts.

In other, more intelligble words, this is a serious misconception — dynamism isn’t a free pass to disregard the Principle of Least Surprise — dynamism proponents still want order in the programming universe. Perhaps we value our sanity even more! The key insight is that programming dynamism does allow you additional flexibility when it’s required or practical to use. More rigid execution models require you to use workarounds, laboriously at times, for a similar degree of flexibility.

As demonstrated by Marius’ comment in my last entry, Python coders have a healthy respect for the power of late binding, arbitrary code execution on module import, and seamless platform integration. Accompanying this is a healthy wariness of black magic.

Caveat

It’s possible that Patrick was developing a closed-system application (e.g. the Eclipse IDE) and not a library like I was assuming.

In the application case, extensions are typically discovered (though not necessarily activated) by enumerating a directory. When the user activates such an extension, the modules found within it are loaded into the application. This is the commonly found plugin model — it’s typically more difficult to wrap the application interface and do configurations at load time, so the application developer must provide an extension hook.

However, the registration pattern should still be preferred to reflection in this case! When the extension is activated and the extension modules load, the registration decorator will be executed along with all the other top-level code in the extension modules.

The extension has the capability to inform the application of the extension’s functionality instead having the application query the plugin for its capabilities. This is a form of loosely coupled cooperative configuration that eases the burden on the application and eliminates the requirement to foresee needs of the extensions. [♠]

Footnotes

[*] Note that you can’t call it dynamic programming, as that would alias a well known term from the branch of computer science concerned with algorithms. Programming language dynamism it is!
[†] Much like a dehydrated wanderer in the desert mistakes a shapely pile of sand for an oasis!
[‡] As of the date of this publishing, Patrick’s implementation seems to have gone a bit astray with text processing of Python source files. Prefer dynamic module loading and inspection to text processing source code! Enumerating the reasons this is preferred is beyond the scope of this article.
[§]

In Python < 3.0 you can perform class decoration without the decorator syntax. Decorator syntax is just syntactic sugar for "invoke this method and rebind the identifier in this scope", like so:

class SomeClass(object):
    pass
SomeClass = my_class_decorator(SomeClass) # Decorate the class.
[¶] Perhaps men as well, but I’ve never seen any TV evidence to justify that conclusion.
[#] Yes, in this analogy brains have lungs. If you’ve read this far you’re probably not a biologist anyway.
[♠] Of course, the plugin model always has security implications. Unless you go out of your way to make a sandboxed Python environment for plugins, you need to trust the plugins that you activate — they have the ability to execute arbitrary code.

Monstrous polymorphism and a Python post-import hook decorator

Wednesday, April 15th, 2009

I queue up a few thousand things to do before I get on an airplane: synchronize two-thousand Google Reader entries, load up a bunch of websites I’ve been meaning to read, and make sure for-fun projects are pulled from their most updated branches.

Then, once I get up in the air, I realize that I don’t really want to do 90% of those things crammed into a seat with no elbow room. I end up doing one or two. Along with reading Stevey’s Drunken Blog Rant: When Polymorphism Fails, this entry is all the productivity I can claim. The full code repository for this entry is online if you’d like to follow along.

Polymorphism Recap

The word “polymorphic” comes from Greek roots meaning “many shaped.” (Or they lied to me in school — one of those.) From a worldly perspective I can see this meaning two things:

  • A single object can take on many shapes, or
  • Requirements for a general “shape” can be satisfied by different categories of objects.

As it turns out, both of these concepts apply to the Object-Oriented programming, but the canonical meaning is the latter. [*] As Yegge says:

If you have a bunch of similar objects [...], and they’re all supposed to respond differently to some situation, then you add a virtual method to them and implement it differently for each object.

(If you don’t know what a virtual method is, the Wikipedia page has an alternate explanation.)

Yegge’s Example

Yegge demonstrates that strictly adhering to the principles of polymorphism does not always produce the best design:

Let’s say you’ve got a big installed base of monsters. [...] Now let’s say one of your users wants to come in and write a little OpinionatedElf monster. [...] Let’s say the OpinionatedElf’s sole purpose in life is to proclaim whether it likes other monsters or not. It sits on your shoulder, and whenever you run into, say, an Orc, it screams bloodthirstily: “I hate Orcs!!! Aaaaaargh!!!” (This, incidentally, is how I feel about C++.)

The polymorphic approach to this problem is simple: go through every one of your 150 monsters and add a doesMrOpinionatedElfHateYou() method.

This is a great counterexample — it induces an instant recognition of absurdity.

He then touches on the fact that dynamic languages allow you to do neat things consistent with polymorphism due to the flexibility of the object structure (which is typically just a hash map from identifiers to arbitrary object values):

I guess if you could somehow enumerate all the classes in the system, and check if they derive from Monster, then you could do this whole thing in a few lines of code. In Ruby, I bet you can… but only for the already-loaded classes. It doesn’t work for classes still sitting on disk! You could solve that, but then there’s the network…

This is clearly impractical, but I figured there was some exploratory value to implementing this challenge in Python. This entry is a small walk-through for code to detect interface conformity by inspection, enumerate the classes in the environment, manipulate classes in place, and add an import hook to manipulate classes loaded from future modules.

The Antagonist

Double entendre intended. :-)

class OpinionatedElf(object):
 
    is_liked_by_class_name = {
        'OpinionatedElf': True,
        'Orc': False,
        'Troll': False}
 
    def __init__(self, name):
        self.name = name
 
    def be_scary(self):
        print("I'm small, ugly, and don't like the cut of your jib!")
 
    def proclaim_penchance(self, other):
        if not IMonster.is_conforming(other):
            print("I can't even tell what that is!")
            return
        is_liked = other.is_liked_by_elf()
        class_name = other.__class__.__name__
        if is_liked is None:
            print("I'm not sure how I feel about %s" % class_name)
            return
        emotion = 'love' if is_liked else 'hate'
        print('I %s %s!!! Aaaaaargh!!!' % (emotion, other.__class__.__name__))

Determining which Classes are Monsters

First of all, Python doesn’t require (nor does it encourage) a rigid type hierarchy. Python’s all about the interfaces, which are often implicit. Step one is to create a way to recognize classes that implement the monster interface: [%]

class IMonster(object):
 
    required_methods = ['be_scary']
 
    def be_scary(self):
        raise NotImplementedError
 
    @classmethod
    def is_conforming(cls, object):
        result = all(callable(getattr(object, attr_name, None))
            for attr_name in cls.required_methods)
        logging.debug('%s conforms? %s', object, result)
        return result
 
assert IMonster.is_conforming(IMonster)

This is a simple little class — there are better third party libraries to use if you want real interface functionality (i.e. more generic conformity testing and Design By Contract).

Enumerating the Classes in the Environment

All of the modules that have been loaded into the Python environment are placed into sys.modules. By inspecting each of these modules, we can manipulate the classes contained inside if they conform to our monster interface.

for name, module in sys.modules.iteritems():
    extend_monsters(module)

The extend_monsters function is a bit nuanced because immutable modules also live in sys.modules. We skip those, along with abstract base classes, which have trouble with inspect.getmembers():

def extend_monsters(module, extension_tag='_opinionated_extended'):
    """Extend monsters in the module's top-level namespace to
    tell if they are liked by the :class:`OpinionatedElf`.
    and tag it with the :param:`extension_tag` as a flag name.
    Do not attempt to extend already-flagged modules.
    Do not clobber existing methods with the extension method name.
 
    Warning: swallows exceptional cases where :param:`module`
        is builtin, frozen, or None.
    """ 
    name = module.__name__ if module else None
    logging.info('Instrumenting module %s', name)
    if not module or imp.is_builtin(name) or imp.is_frozen(name) \
            or getattr(module, extension_tag, False):
        logging.info('Skipping module: %s', name)
        return
    module._opinionated_instrumented = True
    classes = inspect.getmembers(module, inspect.isclass)
    for name, cls in classes:
        logging.debug('%s: %s', name, cls)
        try:
            conforming = IMonster.is_conforming(cls)
        except AttributeError, e:
            if '__abstractmethods__' in str(e): # Abstract class.
                continue
            raise
        if not conforming:
            continue
        class_name = cls.__name__
        logging.debug('Instrumenting class %s', class_name)
        attr_name = 'is_liked_by_elf'
        if hasattr(cls, attr_name): # Don't clobber existing methods.
            logging.warn('Method already exists: %s', cls)
            continue
        logging.info('Setting %s on %s', attr_name, class_name)
        setattr(cls, attr_name,
            lambda self: OpinionatedElf.is_liked_by_class_name.get(
                self.__class__.__name__, None))

If we were going to be thorough, we would recurse on the members of the class to see if the class scope was enclosing any more IMonster classes, but you’re never really going to find them all: if a module defines a monster class in a function-local scope, there’s no good way to get the local class statement and modify it through inspection.

In any case, we’re at the point where we can modify all monsters in the top-level namespace of already-loaded modules. What about modules that we have yet to load?

Post-import Hook

There is no standard post-import hook (that I know of) in Python. PEP 369 looks promising, but I couldn’t find any record of additional work being done on it. The current import hooks, described in PEP 302, are all pre-import hooks. As such, you have to decorate the __import__ builtin, wrapping the original with your intended post-input functionality, like so: [^]

def import_decorator(old_import, post_processor):
    """
    :param old_import: The import function to decorate, most likely
        ``__builtin__.__import__``.
    :param post_processor: Function of the form
        `post_processor(module) -> module`.
    :return: A new import function, most likely to be assigned to
        ``__builtin__.__import__``.
    """
    assert all(callable(fun) for fun in (old_import, post_processor))
    def new_import(*args, **kwargs):
        module = old_import(*args, **kwargs)
        return post_processor(module)
    return new_import

After which we can replace the old __import__ with its decorated counterpart:

__builtin__.__import__ = import_decorator(__builtin__.__import__,
    extend_monsters)

The Network

Yegge brings up the issue of dynamically generated classes by mentioning network communications, calling to mind examples such as Java’s RMI and CORBA. This is a scary place to go, even just conceptualizing. If metaclasses are used, I don’t see any difficulty in decorating __new__ with the same kind of inspection we employed above; however, code generation presents potentially insurmountable problems.

Decorating the eval family of functions to modify new classes created seems possible, but it would be challenging and requires additional research on my part. exec is a keyword/statement, which I would think is a hopeless cause.

Footnotes

[*] In accordance with the former, an object can implement many interfaces.
[^] This function is actually a generic decorate-with-post-processing closure, but I added the references to import for more explicit documentation.