July 20, 2010

B&B++: bed and breakfast for programmers

1. Collect background

This is the latest in my steal-my-idea-but-give-me-free-stuff-after-you-do series, with slightly more earning potential than my last installment, "Strike a Cord".

I recently spoke to some Mozillians who had participated in a "code retreat" — I'd only heard tale of such a thing in lore and folk song, but it seems like a brilliant concept.

The idea is this: a small think tank (of one or more persons) requires a large amount of code throughput on a task which requires a high degree of focus. To facilitate that, they run far from the middling issues of civilized society and deep into the wilderness [*] to code "butt-necked in harmony with Eywa". [†] Through single-minded concentration and a dearth of non-maskable interrupts, they emerge victorious. [‡]

2. ?

Follow these simple steps to steal my idea:

  1. Assume that the aforementioned code retreat process is awesome.

  2. Make a bed-and-breakfast in the outskirts of a city that's attractive to programmers (for whatever reason).

  3. Offer retreats with high-speed internet access, offices with whiteboards, mirra chairs, height-adjustable desks, pay-as-you-go phone conference equipment, high-res DLP projectors, disco balls, whatever. Make it clearly "the works". If you want to go even further, mount speakers and sound-proof the walls. [§]

  4. Make the experience as luxurious and classy as reasonably possible so that the programmers respect the "sanctity" of the retreat: chef-prepared meals, an indisputably good coffee machine, a Z80 prominently featured as a piece of wall art, and a complimentary bag-o-munchy-chips regimen. Beautiful scenery in which one can walk and think would definitely be a plus, and proximity to a nerd-friendly bar never hurt a nerdy establishment either.

The patrons have a good degree of flexibility as a result of this setup. They might hole themselves away in offices 95% of the time, emerging only to sleep, gather delicious food, and scuttle back into their offices. Alternatively, if they're on a more casual endeavor (coding vacation?), they might choose to strike up conversations with people at meals and go out to see the sights.

3. Profit!

Please do steal my idea and make a lot of money for yourself (share it with no one!) — I only ask that you offer me a free stay once you get off the ground.

I'll leave you off with a little marketing campaign idea:

B&B++: universally evaluated as the way to B, and, after each bed and breakfast, we get a little bit better. Until we overflow. [¶]



[*] Or a hotel.

[†] Sadly, I can't take credit for this phrase.

[‡] Readers familiar with XP may draw a parallel to the practice of Kanban, which has a fascinating backstory, and acknowledges the awesome power of JIT.

[§] For the mercy of those who dislike techno.

[¶] Hey, I'm giving this advice away for free, you can't expect it to all be good. No company ever survived giving their excellent primary product away for free. [#]

[#] Ugh, too much meta-humor. If you've read and understood up to this point, I apologize.

Tool teams should work like sleeper cells

I've had some unique experiences interacting-with and participating-in tool development at previous companies that I've worked for, with the quality of those experiences in the broad spectrum from train-wreck to near-satisfactory. From that mental scarring has emerged the weighty goo of an idea, which may be interesting food for thought. [*]

How it starts

At some point in a company's growth, management notices that there is a lot of sub-par, [†] redundant, and distributed tool development going on. Employees have been manufacturing quick-and-dirty tools in order to perform their jobs more efficiently.

Management then ponders the benefit of centralizing that tool development. It seems like an easy sell:

Good management will also consider the negative repercussions of turning distributed and independent resources into a shared and centrally managed resource:

How I've seen it work (warning: depressing, hyperbolic)

  1. A group at the company makes a strong enough case to the centralized-tool-management machinery — a request for tool development is granted.

  2. A series of inevitably painful meetings are scheduled where the customer dictates their requirements, after which the tool team either rejects them or misunderstands/mis-prioritizes them because: a) that's not how it works — they have to actively gather the requirements, and b) they don't have enough time to do all the silly little things that the customer wants.

    Because people are fighting each other to get what they want, everybody forgets that the customers haven't really described the problem domain in any relevant detail.

  3. The tool team developers are happy to go code in peace, without going back for more painful meetings. They create a tool according to their understanding of the requirements during the first iteration.

  4. The customer has no idea how the tool team came up with a product that was nothing like their expectation. They say something overly dramatic like, "it's all wrong," pissing off the tool team, and lose faith in the ability of the tool team to deliver the product they want.

  5. The customer goes back to doing it manually or continues to develop their own tools, expecting that the tool team will fail.

  6. The tool team fails because the customer lost interest in telling them what they actually needed and giving good feedback. It wasn't the tool that anybody was looking for because the process doomed it from the start.

I say that this scenario is depressing because tool teams exist to make life better for everybody — they enjoy writing software that makes your life easier. Working with a tool team should not be painful. You should want to jump for joy when you start working with them and take them out to beers when you're finished working with them, because they're just that good. I think that, by taking a less traditional approach, you will be able to achieve much better results...

How it should work

  1. A group at the company makes a strong enough case to the centralized-tool-management machinery — a request for tool development is granted.

  2. A small handful of tool team operatives [‡], probably around two or three people, split off from the rest of the tool team and are placed in the company hierarchy under the team of the customers. They sit in the customers' cube farm, go to their meetings to listen (but no laptops!), etc., just like a typical team member would.

  3. The customer team brings the operatives up to speed on the automatable task that must be performed each day through immersion. Depending on the frequency, breadth, and duration of the manual processes, the operatives must perform this manual process somewhere on the scale from weeks to months, until they develop a full understanding of the variety of manual processes that must be performed. [§] All operatives should be 100% assigned to the manual tasks for this duration, temporarily offloading members of the customer team after their ramp-up.

  4. Bam! With an unquestionably solid understanding of the problem domain, the tool team sleeper cells activate. 80% of the manual task load is transitioned off of the operatives so that they can begin development work. Agile-style iterations of 1-2 weeks should be used.

  5. After each iteration there must be a usable product (by definition of an iteration). As a result of this, a percentage of the manual task load is shifted back onto the operatives each iteration, augmenting the original 20%. If the tool is actually developing properly, the operatives will be able to cope with the increased load over time.

  6. As the feature set begins to stabilize or the manual task load approaches zero (because it has all been automated), the product is released to the customers for feedback and a limited amount of future-proofing is considered for final iterations.

  7. Most customer feedback is ignored, but a small and reasonable subset is acted on. If the operatives were able to make do with the full task load plus development, it's probably a lot better than it used to be, and the customer is just getting greedy.

  8. The customer takes the operatives out for beers, since the tool team saved them a crapload of time and accounted for all the issues in the problem domain.

  9. A single operative hangs back with the customer for a few more iterations to eyeball maintenance concerns and maybe do a little more future-proofing while the rest head back to the tool team. The one who hangs back gets some kind of special reward for being a team player.


In the sleeper cell approach, the operatives have a clear understanding of what's important through first hand knowledge and experience and, consequently, know the ways in which the software has to be flexible. It emulates the way that organic tool development is found in the wild, as described in the introductory paragraph, but puts the task of creating the actual tool in the hands of experienced tool developers (our operatives!).

I think it's also noteworthy that this approach adheres to a reasonable principle: to write a good program to automate a task, you have to know/understand the variety of ways in which you might perform that task by hand, across all the likely variables.

The operatives are forced to live with the fruits of their labor; i.e. a defect like slow load times will be more painful for them, because they have to work with their tool regularly and take on larger workloads on an ongoing basis, before developers can ever get their hands on it.

Notice that there's still the benefit through centralization of tool developers: central contact point for tool needs, cultivating expertise in developers, knowledge of shared code base, understanding of infrastructure and contact points for infrastructural resource needs; however, you avoid the weird customer disconnect that comes with time slicing a traditional tool team.

Tool developers may also find that they enjoy the team that they're working in so much that they request to stay on that team! How awesome of a pitch is that to new hires? "Do you have a strong background in software development? Work closely with established software experts, make connections to people who will love you when you're done awesome-ing their lives, and take a whirlwind tour of the company within one year."



[*] Yes, I'm suggesting you digest my mind-goo.

[†] For some definition of par.

[‡] I'm calling them operatives now, because their roles are different from tool developers, as you'll see.

[§] It is beneficial if a small seed of hatred for the manual task begins to develop, though care should be taken not to allow operatives to be consumed by said hatred.

Virtues of Extreme Programming practices

Aside: I've changed the name of my blog to reflect a new writing approach. I've found, with good consistency, that being near-pathologically honest and forward is a boon to my learning productivity. Sometimes it causes temporary setbacks (embarrassment, remorse) when I step into areas that I don't fully understand, but the increased rate of progress is worthwhile. For example, this approach should help me get some SQRRR accomplished more readily, as I can get more ideas out in the open and don't need to feel like an expert on everything I write about.

In my limited experience with Extreme Programming (XP) practices, I've felt it was a long-term benefit for myself and my teammates. Unfortunately, because of XP's deviation from the more standard programming practices that I was taught, the activities originally carry a certain weirdness and unapproachability about them. Tests up front? Two people writing a single piece of code? Broadcasting the concrete tasks you accomplished on a daily basis?

After shaking off the inevitable willies, I've found that those activities improve relationships between myself and other team members and help to solidify code understanding and emphasize maintainability. From what I've read, this is what the developers of XP were trying to help optimize: the productivity that results from accepting the social aspect of coding. It is strictly more useful to form good working relationships with humans than with rubber ducks.

A nice secondary effect from the social coding activity is an increased flow of institutional knowledge. Everybody knows little secrets about the corners of your code base or has figured out optimized workflows; somewhat obviously, interpersonal flow of info helps keep more people in the know. When it takes five to ten minutes to explain a code concept to someone, both parties start to get the feeling it should be documented somewhere.

It reads a bit dramatic, but this snippet from the XP website has been fairly accurate in my experience:

Extreme Programming improves a software project in five essential ways; communication, simplicity, feedback, respect, and courage. Extreme Programmers constantly communicate with their customers and fellow programmers. They keep their design simple and clean.

The cons that I've witnessed are some minor bikeshedding and the increased overhead that seems to accompany these tasks:

On the other hand, I've also witnessed these costs get amortized away:

At Mozilla we seem to have a decent code review process down, which is one of my favorite social coding practices when it's done well. At the moment, my team doesn't seem too keen on some of the other practices I've found helpful, and it's certainly not something you should force. In any case, I'm happy to be the guy who talks about how great I've found these practices when the topic comes up until somebody comes around. ;-)

Learning Python by example: list comprehensions

My friend, who is starting to learn Python 2.x, asked me what this snippet did:

def collapse(seq):
    # Preserve order.
    uniq = []
    [uniq.append(item) for item in seq if not uniq.count(item)]
    return uniq

This is not a snippet that should be emulated (i.e. it's bad); however, it makes me happy: there are so many things that can be informatively corrected!

What is a list comprehension?

A list comprehension is a special brackety syntax to perform a transform operation with an optional filter clause that always produces a new sequence (list) object as a result. To break it down visually, you perform:

new_range = [i * i          for i in range(5)   if i % 2 == 0]

Which corresponds to:

*result*  = [*transform*    *iteration*         *filter*     ]

The filter piece answers the question, "should this item be transformed?" If the answer is yes, then the transform piece is evaluated and becomes an element in the result. The iteration [*] order is preserved in the result.

Go ahead and figure out what you expect new_range to be in the prior example. You can double check me in the Python shell, but I think it comes out to be:

>>> new_range = [i * i for i in range(5) if i % 2 == 0]
>>> print new_range
[0, 4, 16]
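If it helps, the same comprehension can be unrolled into an explicit loop. This is only an illustrative sketch (the names unrolled and comprehension are mine), with each line labeled by the piece of the comprehension it corresponds to:

```python
# Unrolled equivalent of: [i * i for i in range(5) if i % 2 == 0]
unrolled = []
for i in range(5):               # iteration
    if i % 2 == 0:               # filter: should this item be transformed?
        unrolled.append(i * i)   # transform: the value that lands in the result

# The comprehension builds the same list in a single expression.
comprehension = [i * i for i in range(5) if i % 2 == 0]
same = (unrolled == comprehension)
```

The loop and the comprehension walk the range in the same order, so the two lists come out identical.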

If it still isn't clicking, we can try to make the example less noisy by getting rid of the transform and filter — can you tell what this will produce?

>>> new_range = [i for i in range(5)]

So what's wrong with that first snippet?

As we observed in the previous section, a list comprehension always produces a result list, where the elements of the result list are the transformed elements of the iteration. That means, if there's no filter piece, there are exactly as many result elements as there were iteration elements.

Weird thing number one about the snippet: the list comprehension result is unused. It's created, mind you (list comprehensions always create a value, even if you don't care what it is), but it just goes off to oblivion. (In technical terms, it becomes garbage.) When you don't need the result, just use a for loop! This is better:

def collapse(seq):
    """Preserve order."""
    uniq = []
    for item in seq:
        if not uniq.count(item):
            uniq.append(item)
    return uniq

It's two more lines, but it's less weird looking and wasteful. "Better for everybody who reads and runs your code," means you should do it.

Moral of the story: a list comprehension isn't just, "shorthand for a loop." It's shorthand for a transform from an input sequence to an output sequence with an optional filter. If it gets too complex or weird looking, just make a loop. It's not that hard and readers of your code will thank you.

Weird thing number two: the transform, list.append(item), produces None as its output value, because the return value from list.append is always None. Therefore, the result, even though it isn't kept anywhere, is a list of None values with one entry for each item that passed the filter, which makes it the same length as uniq.
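A quick way to see what the discarded list actually holds is to keep a reference to it. This is just a demonstration sketch; the names seq and leftovers are made up for the example:

```python
seq = [1, 2, 1, 3]
uniq = []

# Same comprehension as the original snippet, but we hold on to its value
# instead of letting it become garbage.
leftovers = [uniq.append(item) for item in seq if not uniq.count(item)]

# uniq was filled in as a side effect; leftovers only holds what
# list.append returned each time the filter let an item through.
all_none = all(value is None for value in leftovers)
```

Here uniq ends up as [1, 2, 3], while leftovers holds one None for each append that ran.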

Weird thing number three: list.count(item) iterates over every element in the list looking for things that == to item. If you think through the case where you call collapse on an entirely unique sequence, you can tell that the collapse algorithm is O(n^2). In fact, it's even worse than it may seem at first glance, because count will keep going all the way to the end of uniq, even if it finds item in the first index of uniq. What the original author really wanted was item not in uniq, which bails out early if it finds item in uniq.
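To make the early bail-out visible, here's a small sketch that counts equality checks. The Noisy class is a made-up helper for this demonstration, not anything from the original snippet:

```python
class Noisy(object):
    """Wraps a value and counts how many times == is evaluated."""
    comparisons = 0

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        Noisy.comparisons += 1
        return self.value == other.value


items = [Noisy(n) for n in range(10)]
target = Noisy(0)  # Equal to the very first element of items.

Noisy.comparisons = 0
items.count(target)              # Walks the whole list no matter what.
count_comparisons = Noisy.comparisons

Noisy.comparisons = 0
target in items                  # Stops at the first match.
in_comparisons = Noisy.comparisons


def collapse(seq):
    """Preserve order, using a membership test instead of count."""
    uniq = []
    for item in seq:
        if item not in uniq:
            uniq.append(item)
    return uniq
```

With the match sitting at index zero, count still performs ten comparisons while the membership test performs one. The rewritten collapse is still quadratic in the worst case; it just stops wasting work after a hit.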

Also worth mentioning for the computer-sciency folk playing along at home: if all elements of the sequence are comparable, you can bring that down to O(n * log n) by using a "shadow" sorted sequence and bisecting to test for membership. If the elements are hashable you can bring it down to O(n), perhaps by using the set datatype (the sets module arrived in Python 2.3 and the built-in set type in 2.4). Note that the common cases of strings, numbers, and tuples of hashable items (any built-in immutable datatype, for that matter) are hashable.
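Both ideas can be sketched in a few lines. The function names below are mine, and the bisecting version comes with a caveat: the number of comparisons is O(n * log n), but list.insert still shifts elements, so a tree-like structure would be needed to get that bound on total work.

```python
import bisect


def collapse_hashable(seq):
    """Order-preserving de-duplication for hashable items, O(n) on average."""
    seen = set()
    uniq = []
    for item in seq:
        if item not in seen:
            seen.add(item)
            uniq.append(item)
    return uniq


def collapse_comparable(seq):
    """Order-preserving de-duplication for items that only support ordering.

    Keeps a sorted "shadow" list and bisects it to test for membership.
    """
    shadow = []  # Sorted copy of the distinct items seen so far.
    uniq = []
    for item in seq:
        index = bisect.bisect_left(shadow, item)
        already_seen = index < len(shadow) and shadow[index] == item
        if not already_seen:
            shadow.insert(index, item)  # Keeps shadow sorted.
            uniq.append(item)
    return uniq
```

For example, collapse_hashable("abracadabra") keeps the first occurrence of each letter, and collapse_comparable([[2], [1], [2], [3], [1]]) works even though lists can't go into a set.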

From Python history

It's interesting to note that Python Enhancement Proposal (PEP) #270 considered putting a uniq function into the language distribution, but withdrew it with the following statement:

Removing duplicate elements from a list is a common task, but there are only two reasons I can see for making it a built-in. The first is if it could be done much faster, which isn't the case. The second is if it makes it significantly easier to write code. The introduction of sets.py eliminates this situation since creating a sequence without duplicates is just a matter of choosing a different data structure: a set instead of a list.

Remember that sets can only contain hashable elements (same policy as dictionary keys) and are therefore not suitable for all uniq-ifying tasks, as mentioned in the last paragraph of the previous section.
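One way to cope with that limitation, sketched here as an assumption rather than anything the PEP suggests, is to try the set-based approach first and fall back to a plain linear scan when an item turns out to be unhashable:

```python
def collapse(seq):
    """Order-preserving de-duplication that tolerates unhashable items."""
    items = list(seq)
    try:
        seen = set()
        uniq = []
        for item in items:
            if item not in seen:   # Hashes item; TypeError if unhashable.
                seen.add(item)
                uniq.append(item)
        return uniq
    except TypeError:
        # Start over with the slow path, which only needs ==.
        uniq = []
        for item in items:
            if item not in uniq:
                uniq.append(item)
        return uniq


hashable_result = collapse(['b', 'a', 'b'])
unhashable_result = collapse([[1], [2], [1]])
```

The fast path handles strings and numbers; the list-of-lists input trips the TypeError on the first membership test and is handled by the fallback.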


[*] "Iteration" is just a fancy word for "step through the sequence, element by element, and give that element a name." In our case we're giving the name i.

Registry pattern trumps import magic

The other night I saw an interesting tweet in the #Python Twitter channel -- Patrick was looking to harness the dynamism of a language like Python in a way that many Pythonistas would consider magical. [*] Coming from languages with more rigid execution models, it's understandably easy to confuse dynamic and magical. [†]

What is magic?

To quote the jargon file, magic is:

Characteristic of something that works although no one really understands why (this is especially called black magic).

Taken in the context of programming, magic refers to code that works without a straightforward way of determining why it works.

Today's more flexible languages provide the programmer with a significant amount of power at runtime, making the barrier to "accidental magic" much lower. As a programmer who works with dynamic languages, there's an important responsibility to keep in mind: err on the side of caution with the Principle of Least Surprise.

[T]o design usable interfaces, it's best when possible not to design an entire new interface model. Novelty is a barrier to entry; it puts a learning burden on the user, so minimize it.

This principle indicates that using well known design patterns and language idioms is a "best practice" in library design. When you follow that guideline, people will already have an understanding of the interface that you're providing; therefore, they will have one less thing to worry about in leveraging your library to write their code.

Discovery Mechanism Proposals

Patrick is solving a common category of problem: he wants to allow clients to flexibly extend his parsing library's capabilities. For example, if his module knows how to parse xml and yaml files out of the box, programmers using his library should be able to add their own rst and html parser capabilities with ease.

Patrick's proposal is this:

If you were to do this, you would use the various utilities in the imp module to load the modules dynamically, then determine the appropriate classes via the inspect module. [‡]
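For a feel of what that kind of discovery involves, here's a rough sketch. Several things in it are my assumptions: the directory layout, the rule that "a parser is any class with a parseable_mimetypes attribute", the helper names, and the text/x-rst placeholder mimetype. It also swaps in importlib for imp, since imp was removed in Python 3.12, so it needs Python 3.

```python
import importlib.util
import inspect
import os
import tempfile

# Stand-in for a file that a client dropped into the plugin directory.
PLUGIN_SOURCE = '''
class RstParser:
    parseable_mimetypes = {'text/x-rst'}

class Helper:
    pass
'''


def load_module_from_path(path):
    """Load a .py file as a module without it being on sys.path."""
    name = os.path.splitext(os.path.basename(path))[0]
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


def discover_parsers(directory):
    """Scan a directory and guess which classes are parsers by reflection."""
    found = {}
    for filename in sorted(os.listdir(directory)):
        if not filename.endswith('.py'):
            continue
        module = load_module_from_path(os.path.join(directory, filename))
        for _, cls in inspect.getmembers(module, inspect.isclass):
            for mimetype in getattr(cls, 'parseable_mimetypes', ()):
                found[mimetype] = cls
    return found


with tempfile.TemporaryDirectory() as plugin_dir:
    with open(os.path.join(plugin_dir, 'rst_plugin.py'), 'w') as f:
        f.write(PLUGIN_SOURCE)
    parsers = discover_parsers(plugin_dir)
```

Notice how much the library has to guess: which files count, which classes count, and what naming convention marks a parser. None of that is visible from the client's side.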

My counter-proposal is this, which is also known as the Registry Pattern, a form of runtime configuration and behavior extension:

Parser library:

class UnknownMimetypeException(Exception): pass
class ParseError(Exception): pass

class IParser:
    """
    Reference interface for parser classes;
    inheritance is not necessary.
    """

    parseable_mimetypes = set()

    def __init__(self, file):
        self.file = file
        self.doctree = None

    def parse(self):
        """
        Parse :ivar:`file` and place the parsed document
        tree into :ivar:`doctree`.
        """
        raise NotImplementedError

class ParserFacade:
    """
    Assumes that there can only be one parser per mimetype.
    :ivar mimetype_to_parser_cls: Storage for parser registry.
    """

    def __init__(self):
        self.mimetype_to_parser_cls = {}

    def register_parser(self, cls):
        for mimetype in cls.parseable_mimetypes:
            self.mimetype_to_parser_cls[mimetype] = cls

        return cls # For use as a decorator.

    def parse(self, file, mimetype):
        """
        Determine the appropriate parser for the mimetype,
        create a parser to parse the file, and perform
        the parsing.

        :return: The parser object.
        """
        try:
            parser_cls = self.mimetype_to_parser_cls[mimetype]
        except KeyError:
            raise UnknownMimetypeException(mimetype)

        parser = parser_cls(file)
        parser.parse() # May raise ParseError
        return parser

default_facade = ParserFacade()
register_parser = default_facade.register_parser
parse = default_facade.parse

Client code:

from parser_lib import register_parser

@register_parser
class SpamParser:
    """
    Parses ``.spam`` files.
    Conforms to implicit parser interface of `parser_lib`.
    """

    parseable_mimetypes = {'text/spam'}

    def __init__(self, file):
        self.file = file
        self.doctree = None

    def parse(self):
        raise NotImplementedError

After the client code executes, the SpamParser will then be available for parsing text/spam mimetype files via parser_lib.parse.

Here are some of my considerations in determining which of these is the least magical:

Magical Allure

The problem with magic is that it is freaking cool and it drives all the ladies crazy. [¶] As a result, the right hemisphere of your developer-brain yearns for your library clients to read instructions like:

Drag and drop your Python code into my directory — I'll take care of it from there.

That's right, that's all there is to it.

Oh, I know what you're thinking — yes, I'm available — check out parser_lib.PHONE_NUMBER and give me a call sometime.

But, as you envision phone calls from sexy Pythonistas, the left hemisphere of your brain is screaming at the top of its lungs! [#]

Magic leaves the audience wondering how the trick is done, and the analytical side of the programmer mind hates that. It implies that there's a non-trivial abstraction somewhere that does reasonably complex things, but it's unclear where it can be found or how to leverage it differently.

Coders need control and understanding of their code and, by extension, as much control and understanding over third party code as is reasonably possible. Because of this, concise, loosely coupled, and extensible abstractions are always preferred to the imposition of elaborate usage design ideas on clients of your code. It's best to assume that people will want to leverage the functionality your code provides, but that you can't foresee the use cases.

To Reiterate: Dynamic does not Imply Magical

Revisiting my opening point: anecdotal evidence suggests that some members of the static typing camp see us programming-dynamism dynamos as anarchic lovers of programming chaos. Shoot-from-the-hip cowboys, strolling into lawless towns of code, type checking blowing by the vacant sheriff's station as tumbleweeds in the wind. (Enough imagery for you?) With this outlook, it's easy to see why you would start doing all sorts of fancy things when you cross into dynamism town. Little do you know, we don't take kindly to that 'round these parts.

In other, more intelligible words, this is a serious misconception: dynamism isn't a free pass to disregard the Principle of Least Surprise, and dynamism proponents still want order in the programming universe. Perhaps we value our sanity even more! The key insight is that programming dynamism does allow you additional flexibility when it's required or practical to use. More rigid execution models require you to use workarounds, laboriously at times, for a similar degree of flexibility.

As demonstrated by Marius' comment in my last entry, Python coders have a healthy respect for the power of late binding, arbitrary code execution on module import, and seamless platform integration. Accompanying this is a healthy wariness of black magic.


It's possible that Patrick was developing a closed-system application (e.g. the Eclipse IDE) and not a library like I was assuming.

In the application case, extensions are typically discovered (though not necessarily activated) by enumerating a directory. When the user activates such an extension, the modules found within it are loaded into the application. This is the commonly found plugin model — it's typically more difficult to wrap the application interface and do configurations at load time, so the application developer must provide an extension hook.

However, the registration pattern should still be preferred to reflection in this case! When the extension is activated and the extension modules load, the registration decorator will be executed along with all the other top-level code in the extension modules.

The extension has the capability to inform the application of the extension's functionality instead of having the application query the plugin for its capabilities. This is a form of loosely coupled cooperative configuration that eases the burden on the application and eliminates the requirement to foresee needs of the extensions. [♠]
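Here's a small simulation of that flow. It is only a sketch under stated assumptions: activate_extension, the registry dictionary, and the HtmlParser extension are all hypothetical, and the extension source is executed from a string so the example stays self-contained (a real application would load files from its extensions directory).

```python
import types

registry = {}  # Application-side storage: mimetype -> parser class.


def register_parser(cls):
    for mimetype in cls.parseable_mimetypes:
        registry[mimetype] = cls
    return cls


# What an extension author would ship. The decorator runs as ordinary
# top-level code when the module body executes.
EXTENSION_SOURCE = '''
@register_parser
class HtmlParser(object):
    parseable_mimetypes = set(['text/html'])
'''


def activate_extension(name, source):
    """Load an extension module; registration happens as a side effect."""
    module = types.ModuleType(name)
    # Stands in for the extension doing "from app import register_parser".
    module.register_parser = register_parser
    exec(source, module.__dict__)
    return module


before = dict(registry)
extension = activate_extension('html_extension', EXTENSION_SOURCE)
after = dict(registry)
```

The application never inspects the extension's classes. It runs the module and the extension announces itself, which is the "inform rather than query" relationship described above.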



[*] Note that you can't call it dynamic programming, as that would alias a well known term from the branch of computer science concerned with algorithms. Programming language dynamism it is!

[†] Much like a dehydrated wanderer in the desert mistakes a shapely pile of sand for an oasis!

[‡] As of the date of this publishing, Patrick's implementation seems to have gone a bit astray with text processing of Python source files. Prefer dynamic module loading and inspection to text processing source code! Enumerating the reasons this is preferred is beyond the scope of this article.

[§] In Python versions that predate class decorator syntax (before 2.6), you can perform class decoration without the decorator syntax. Decorator syntax is just syntactic sugar for "invoke this callable and rebind the identifier in this scope", like so:

class SomeClass(object):
    pass
SomeClass = my_class_decorator(SomeClass) # Decorate the class.

[¶] Perhaps men as well, but I've never seen any TV evidence to justify that conclusion.

[#] Yes, in this analogy brains have lungs. If you've read this far you're probably not a biologist anyway.

[♠] Of course, the plugin model always has security implications. Unless you go out of your way to make a sandboxed Python environment for plugins, you need to trust the plugins that you activate, since they have the ability to execute arbitrary code.