July 14, 2010

Tool teams should work like sleeper cells

I've had some unique experiences interacting with and participating in tool development at previous companies that I've worked for, with the quality of those experiences spanning the broad spectrum from train wreck to near-satisfactory. From that mental scarring has emerged the weighty goo of an idea, which may be interesting food for thought. [*]

How it starts

At some point in a company's growth, management notices that there is a lot of sub-par, [†] redundant, and distributed tool development going on. Employees have been manufacturing quick-and-dirty tools in order to perform their jobs more efficiently.

Management then ponders the benefit of centralizing that tool development. It seems like an easy sell:

Good management will also consider the negative repercussions of turning distributed and independent resources into a shared and centrally managed resource:

How I've seen it work (warning: depressing, hyperbolic)

  1. A group at the company makes a strong enough case to the centralized-tool-management machinery — a request for tool development is granted.

  2. A series of inevitably painful meetings are scheduled where the customer dictates their requirements, after which the tool team either rejects them or misunderstands/mis-prioritizes them because: a) that's not how it works — they have to actively gather the requirements, and b) they don't have enough time to do all the silly little things that the customer wants.

    Because people are fighting each other to get what they want, everybody forgets that the customers haven't really described the problem domain in any relevant detail.

  3. The tool team developers are happy to go code in peace, without going back for more painful meetings. They create a tool according to their understanding of the requirements during the first iteration.

  4. The customer has no idea how the tool team came up with a product that was nothing like their expectation. They say something overly dramatic like, "it's all wrong," pissing off the tool team, and lose faith in the ability of the tool team to deliver the product they want.

  5. The customer goes back to doing it manually or continues to develop their own tools, expecting that the tool team will fail.

  6. The tool team fails because the customer lost interest in telling them what they actually needed and giving good feedback. It wasn't the tool that anybody was looking for because the process doomed it from the start.

I say that this scenario is depressing because tool teams exist to make life better for everybody — they enjoy writing software that makes your life easier. Working with a tool team should not be painful. You should want to jump for joy when you start working with them and take them out to beers when you're finished working with them, because they're just that good. I think that, by taking a less traditional approach, you will be able to achieve much better results...

How it should work

  1. A group at the company makes a strong enough case to the centralized-tool-management machinery — a request for tool development is granted.

  2. A small handful of tool team operatives [‡], probably around two or three people, split off from the rest of the tool team and are placed in the company hierarchy under the team of the customers. They sit in the customers' cube farm, go to their meetings to listen (but no laptops!), etc., just like a typical team member would.

  3. The customer team brings the operatives up to speed on the automatable task that must be performed each day through immersion. Depending on the frequency, breadth, and duration of the manual processes, the operatives must perform this manual process somewhere on the scale from weeks to months, until they develop a full understanding of the variety of manual processes that must be performed. [§] All operatives should be 100% assigned to the manual tasks for this duration, temporarily offloading members of the customer team after their ramp-up.

  4. Bam! With an unquestionably solid understanding of the problem domain, the tool team sleeper cells activate. 80% of the manual task load is transitioned off of the operatives so that they can begin development work. Agile-style iterations of 1-2 weeks should be used.

  5. After each iteration there must be a usable product (by definition of an iteration). As a result of this, a percentage of the manual task load is shifted back onto the operatives each iteration, augmenting the original 20%. If the tool is actually developing properly, the operatives will be able to cope with the increased load over time.

  6. As the feature set begins to stabilize or the manual task load approaches zero (because it has all been automated), the product is released to the customers for feedback and a limited amount of future-proofing is considered for final iterations.

  7. Most customer feedback is ignored, but a small and reasonable subset is acted on. If the operatives were able to make do with the full task load plus development, it's probably a lot better than it used to be, and the customer is just getting greedy.

  8. The customer takes the operatives out for beers, since the tool team saved them a crapload of time and accounted for all the issues in the problem domain.

  9. A single operative hangs back with the customer for a few more iterations to eyeball maintenance concerns and maybe do a little more future-proofing while the rest head back to the tool team. The one who hangs back gets some kind of special reward for being a team player.


In the sleeper cell approach, the operatives have a clear understanding of what's important through firsthand knowledge and experience and, consequently, know the ways in which the software has to be flexible. It emulates the way organic tool development happens in the wild, as described in the introductory paragraph, but puts the task of creating the actual tool in the hands of experienced tool developers (our operatives!).

I think it's also noteworthy that this approach adheres to a reasonable principle: to write a good program to automate a task, you have to know/understand the variety of ways in which you might perform that task by hand, across all the likely variables.

The operatives are forced to live with the fruits of their labor; i.e. a defect like slow load times will be more painful for them, because they have to work with their tool regularly and take on larger workloads on an ongoing basis, before developers can ever get their hands on it.

Notice that the benefits of centralizing tool developers are still there: a central contact point for tool needs, cultivated expertise in developers, knowledge of a shared code base, and an understanding of infrastructure and contact points for infrastructural resource needs; however, you avoid the weird customer disconnect that comes with time-slicing a traditional tool team.

Tool developers may also find that they enjoy the team that they're working in so much that they request to stay on that team! How awesome of a pitch is that to new hires? "Do you have a strong background in software development? Work closely with established software experts, make connections to people who will love you when you're done awesome-ing their lives, and take a whirlwind tour of the company within one year."



[*] Yes, I'm suggesting you digest my mind-goo.


[†] For some definition of par.


[‡] I'm calling them operatives now, because their roles are different from tool developers, as you'll see.


[§] It is beneficial if a small seed of hatred for the manual task begins to develop, though care should be taken not to allow operatives to be consumed by said hatred.

Efficiency of list comprehensions

I'm psyched about the awesome comments on my previous entry, Python by example: list comprehensions. Originally this entry was just a response to those comments, but people who stumbled across this entry on the interwebz found the response format too confusing, so I've restructured it for posterity.

Efficiency of the more common usage

Let's look at the efficiency of list comprehensions in the more common usage, where the comprehension's list result is actually relevant (or, in compiler-speak, live-out).

Using the following program, you can see the time spent in each implementation and the corresponding bytecode sequence:

import dis
import timeit

programs = dict(
    loop="""
result = []
for i in range(20):
    result.append(i * 2)
""",
    loop_faster="""
result = []
add = result.append
for i in range(20):
    add(i * 2)
""",
    comprehension='result = [i * 2 for i in range(20)]',
)

for name, text in programs.iteritems():
    print name, timeit.Timer(stmt=text).timeit()
    code = compile(text, '<string>', 'exec')
    dis.disassemble(code)
loop 11.1495118141
  2           0 BUILD_LIST               0
              3 STORE_NAME               0 (result)

  3           6 SETUP_LOOP              37 (to 46)
              9 LOAD_NAME                1 (range)
             12 LOAD_CONST               0 (20)
             15 CALL_FUNCTION            1
             18 GET_ITER
        >>   19 FOR_ITER                23 (to 45)
             22 STORE_NAME               2 (i)

  4          25 LOAD_NAME                0 (result)
             28 LOAD_ATTR                3 (append)
             31 LOAD_NAME                2 (i)
             34 LOAD_CONST               1 (2)
             37 BINARY_MULTIPLY
             38 CALL_FUNCTION            1
             41 POP_TOP
             42 JUMP_ABSOLUTE           19
        >>   45 POP_BLOCK
        >>   46 LOAD_CONST               2 (None)
             49 RETURN_VALUE
loop_faster 8.36096310616
  2           0 BUILD_LIST               0
              3 STORE_NAME               0 (result)

  3           6 LOAD_NAME                0 (result)
              9 LOAD_ATTR                1 (append)
             12 STORE_NAME               2 (add)

  4          15 SETUP_LOOP              34 (to 52)
             18 LOAD_NAME                3 (range)
             21 LOAD_CONST               0 (20)
             24 CALL_FUNCTION            1
             27 GET_ITER
        >>   28 FOR_ITER                20 (to 51)
             31 STORE_NAME               4 (i)

  5          34 LOAD_NAME                2 (add)
             37 LOAD_NAME                4 (i)
             40 LOAD_CONST               1 (2)
             43 BINARY_MULTIPLY
             44 CALL_FUNCTION            1
             47 POP_TOP
             48 JUMP_ABSOLUTE           28
        >>   51 POP_BLOCK
        >>   52 LOAD_CONST               2 (None)
             55 RETURN_VALUE
comprehension 7.08145213127
  1           0 BUILD_LIST               0
              3 DUP_TOP
              4 STORE_NAME               0 (_[1])
              7 LOAD_NAME                1 (range)
             10 LOAD_CONST               0 (20)
             13 CALL_FUNCTION            1
             16 GET_ITER
        >>   17 FOR_ITER                17 (to 37)
             20 STORE_NAME               2 (i)
             23 LOAD_NAME                0 (_[1])
             26 LOAD_NAME                2 (i)
             29 LOAD_CONST               1 (2)
             32 BINARY_MULTIPLY
             33 LIST_APPEND
             34 JUMP_ABSOLUTE           17
        >>   37 DELETE_NAME              0 (_[1])
             40 STORE_NAME               3 (result)
             43 LOAD_CONST               2 (None)
             46 RETURN_VALUE

List comprehensions perform better here because you don’t need to load the append attribute off of the list (loop program, bytecode 28) and call it as a function (loop program, bytecode 38). Instead, in a comprehension, a specialized LIST_APPEND bytecode is generated for a fast append onto the result list (comprehension program, bytecode 33).

In the loop_faster program, you avoid the overhead of the append attribute lookup by hoisting it out of the loop and binding the bound method to a name (bytecodes 6-12), so it loops more quickly; however, the comprehension uses a specialized LIST_APPEND bytecode instead of incurring the overhead of a function call, so it still trumps.
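For what it's worth, the same hoisting trick applies inside a function body, where the hoisted bound method lands in a true fastlocal (STORE_FAST/LOAD_FAST). A sketch with hypothetical function names — the speedup varies by CPython version, but the equivalence of results is easy to check:

```python
def slow(n=1000):
    out = []
    for i in range(n):
        out.append(i)  # LOAD_ATTR append on every iteration
    return out

def fast(n=1000):
    out = []
    append = out.append  # bound method hoisted into a fastlocal
    for i in range(n):
        append(i)
    return out

# Both produce the same list; fast() skips the per-iteration attribute lookup.
assert slow() == fast() == list(range(1000))
```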

Using list comprehensions for side effects

I want to address a point that was brought up in the previous entry as to the efficiency of for loops versus list comprehensions when used purely for side effects, but I'll discuss the subjective bit first, since that's the least sciency part.


Simple test – if you did need the result would the comprehension be easily understood? If the answer is yes then removing the assignment on the left hand side doesn’t magically make it less readable…

Michael Foord

First of all, thanks to Michael for his excellent and thought provoking comment!

My response is that removing the use of the result does indeed make it less readable, precisely because you're using a result-producing control flow construct where the result is not needed. I suppose I'm positing that it's inherently confusing to do that with your syntax: there's a looping form that doesn't produce a result, so that should be used instead. It's expressing your semantic intention via syntax.

For advanced Pythonistas it's easy to figure out what's going on at a glance, but comprehension-as-loop definitely has a "there's more than one way to do it" smell about it, which also makes it less amenable to people learning the language.

With a viable comprehension-as-loop option, every time a user goes to write a loop that doesn't require a result they now ask themselves, "Can I fit this into the list comprehension form?" Those mental branches are, to me, what "one way to do it" is designed to avoid. When I read Perl code, I take "mental exceptions" all the time because the author didn't use the construct that I would have used in the same situation. Minimizing that is a good thing, so I maintain that "no result needed" should automatically imply a loop construct.


Consider two functions, comprehension and loop:

def loop():
    accum = []
    for i in range(20):
        accum.append(i)
    return accum

def comprehension():
    accum = []
    [accum.append(i) for i in range(20)]
    return accum

N.B. This example is comparing the efficiency of a list comprehension where the result of the comprehension is ignored to a for loop that produces no result, as is discussed in the referenced entry, Python by example: list comprehensions.

Michael Foord comments:

Your alternative for the single line, easily readable, list comprehension is four lines that are less efficient because the loop happens in the interpreter rather than in C.

However, the disassembly, obtained via dis.dis(func), looks like the following for the loop:

2           0 BUILD_LIST               0
            3 STORE_FAST               0 (accum)

3           6 SETUP_LOOP              33 (to 42)
            9 LOAD_GLOBAL              0 (range)
           12 LOAD_CONST               1 (20)
           15 CALL_FUNCTION            1
           18 GET_ITER
      >>   19 FOR_ITER                19 (to 41)
           22 STORE_FAST               1 (i)

4          25 LOAD_FAST                0 (accum)
           28 LOAD_ATTR                1 (append)
           31 LOAD_FAST                1 (i)
           34 CALL_FUNCTION            1
           37 POP_TOP
           38 JUMP_ABSOLUTE           19
      >>   41 POP_BLOCK

5     >>   42 LOAD_FAST                0 (accum)
           45 RETURN_VALUE

And it looks like the following for the comprehension:

2           0 BUILD_LIST               0
            3 STORE_FAST               0 (accum)

3           6 BUILD_LIST               0
            9 DUP_TOP
           10 STORE_FAST               1 (_[1])
           13 LOAD_GLOBAL              0 (range)
           16 LOAD_CONST               1 (20)
           19 CALL_FUNCTION            1
           22 GET_ITER
      >>   23 FOR_ITER                22 (to 48)
           26 STORE_FAST               2 (i)
           29 LOAD_FAST                1 (_[1])
           32 LOAD_FAST                0 (accum)
           35 LOAD_ATTR                1 (append)
           38 LOAD_FAST                2 (i)
           41 CALL_FUNCTION            1
           44 LIST_APPEND
           45 JUMP_ABSOLUTE           23
      >>   48 DELETE_FAST              1 (_[1])
           51 POP_TOP

4          52 LOAD_FAST                0 (accum)
           55 RETURN_VALUE

By looking at the bytecode instructions, we see that the list comprehension is, at a language level, actually just "syntactic sugar" for the for loop, as mentioned by nes — they both lower down into the same control flow construct at a virtual machine level, at least in CPython.

The primary difference between the two disassemblies is that a superfluous list comprehension result is stored into fastlocal 1, which is loaded (bytecode 29) and appended to (bytecode 44) each iteration, creating some additional overhead — it's simply deleted in bytecode 48. Unless the POP_BLOCK operation (bytecode 41) of the loop disassembly is very expensive (I haven't looked into its implementation), the comprehension disassembly is guaranteed to be less efficient.
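To put rough numbers on that overhead, both functions can be timed with the standard timeit module (a sketch; absolute figures vary wildly by machine and interpreter version, but the ordering should hold):

```python
import timeit

def loop():
    accum = []
    for i in range(20):
        accum.append(i)
    return accum

def comprehension():
    accum = []
    [accum.append(i) for i in range(20)]  # builds a throwaway list of Nones
    return accum

# Both compute the same result; the comprehension additionally builds and
# discards a second, useless list (the None return values of append).
assert loop() == comprehension() == list(range(20))

print(timeit.timeit(loop, number=100000))
print(timeit.timeit(comprehension, number=100000))
```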

Because of this, I believe that Michael was mistaken in referring to an overhead that results from use of a for loop versus a list comprehension for CPython. It would be interesting to perform a survey of the list comprehension optimization techniques used in various Python implementations, but optimization seems difficult outside of something like a special Cython construct, because LOAD_GLOBAL range could potentially be changed from the builtin range function. Various issues of this kind are discussed in the (very interesting) paper The effect of unrolling and inlining for Python bytecode optimizations.

Thoughts on programming language fluency

I noticed that Effective Java's foreword is written by Guy Steele, so I actually bothered to read it. Here's the bit I found particularly intriguing:

If you have ever studied a second language yourself and then tried to use it outside the classroom, you know that there are three things you must master: how the language is structured (grammar), how to name things you want to talk about (vocabulary), and the customary and effective ways to say everyday things (usage).

When programmers enter the job market, the idea that "we have the capability to learn any programming language" gets thrown around a lot. I now realize that this sentiment is irrelevant in many cases, because the deciding factor in the hiring process is more often time to fluency.

Time to fluency as a hiring factor

Let's say that there are two candidates, Fry and Laurie, interviewing for a programming position using Haskell. [*] Fry comes off as very intelligent during the interview process, but has only used OCaml and sounds like he mutabled all of the stuff that would make your head explode using monads. Laurie, on the other hand, couldn't figure out how many ping pong balls fit into Air Force One or why manhole covers are round, [†] but is clearly fluent in Haskell. Which one gets hired?

The answer to this question is another question: When are they required to be pumping out production-quality code?

Even working all hours of the day, the time to fluency for a language is on the order of weeks, independent of other scary new-workplace factors. Although books like Effective * can get you on the right track, fluency is ultimately attained through experience. Insofar as programming is a perpetual decision of what to make flexible and what to hard-code, you must spend time in the hot seat to gain necessary intuition — each language's unique characteristics change the nature of the game.

Everybody wants to hire Fry; however, Laurie will end up with the job due to time constraints on the part of the hiring manager. I'm pretty sure that Joel's interview notions are over-idealized in the general case:

Anyway, software teams want to hire people with aptitude, not a particular skill set. Any skill set that people can bring to the job will be technologically obsolete in a couple of years, anyway, so it’s better to hire people that are going to be able to learn any new technology rather than people who happen to know how to make JDBC talk to a MySQL database right this minute.

Reqs have to be filled so that the trains run on time — it's hard to let real, here-and-now schedules slip to avoid a hypothetical, three-years-later slip.

Extreme Programming as catalyst

You remember that scene from The Matrix where Neo gets all the Kung Fu downloaded into his brain in a matter of seconds? That whole process is nearly as awesome as code reviews.

Pair programming and code reviews:

This is totally speculative, but from my experience I'd be willing to believe you can reduce the minimum-time-to-fluency by an order of magnitude with the right (read: friendly and supportive) Extreme Programming environment.

What I learned: When you create interfaces for everything (instead of base classes) it's almost less work to make a factory.



[*] You know it's a hypothetical because it's a Haskell position. Bzinga!


[†] The point is that Fry has the high ground in terms of perceived aptitude. I actually think most of the Mount Fuji questions are nearly useless in determining aptitude, though I do enjoy them. The referenced sentence is a poor attempt at a joke. ;-)

Thoughts on self-modifying code and Futurist Programmers

Around 8th grade I read an article about a faction of programmers — the Futurist Programmers — whose rallying cry is paraphrased in the following quotation:

Why does computer science reject self modifying programs? Why have some departments stopped teaching assembly language programming? On what scientific basis has this been done? Where is the experimental evidence to support these actions?

As far as I remember, this movement attempted to emphasize the purity of computer programming, which they believed was a form of artistry. This was posed as a throwback to the tenets of Italian Futurism, which were opposed to tradition and commoditization, in the context of computer programming. A Wikipedia excerpt will probably be helpful:

The Futurists admired speed, technology, youth and violence, the car, the plane and the industrial city, all that represented the technological triumph of humanity over nature, and they were passionate nationalists.

Thinking about JavaScript Just In Time compilers (JITs) today — like TraceMonkey — reminded me of this philosophy. I believe that their line of questioning was insightful, but the formulation was misdirected. Technological triumph stems primarily from computers doing what humans want them to do. It's additionally awesome if the computers can do these things extra quickly; however, if they do things incorrectly very quickly, humanity comes out much less triumphant. Perhaps we even come out worse for the experience.

Secondly, we note that humanity strives for the ability to make further progress based on the success of past experiences. This is the concept of extensibility and reusability. Standing on the shoulders of giants, if you will. Self-modifying code that I have encountered is often very clever; however, programming cleverness tends to be at odds with readability. [*] This is not to say that all self-modifying code is unreadable: in languages with dynamic method dispatch, swapping an object's methods out (with some kind of locking mechanism) is a recognized idiom that can lead to beneficial efficiency/complexity trade-offs. [†]

Ultimately, you'd have trouble finding computer enthusiasts who find speed unimportant. Everybody loves it when their computers are more efficient! The caveat is that most computer enthusiasts will, in many situations, put speed down here: after correctness and extensibility. As a testament to this, there is continuing emergence and acceptance of Very High Level Languages (VHLLs) over low level programming languages in non-academic contexts.

So how did the futurists have the right idea? "Introspective" programs are important. There's lots of information at runtime that we can use to more efficiently execute programs. [‡] Hotspot JITs, such as the aforementioned TraceMonkey, know this well: the basic premise is that they dynamically rewrite the code they're executing or, in recent developments with Google's V8, rewrite it before executing. The key here is that we can now:

  1. Write correct, extensible programs.

  2. Write correct, extensible programs to optimize the programs from 1.

  3. Run the more efficient result of combining 2 and 1.
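That loop can be illustrated in miniature with a toy sketch (not how a real JIT works — JITs operate on compiled traces, not Python bindings): a function that rewrites its own binding after the first call, so later calls skip the expensive work entirely. The function name and workload here are hypothetical.

```python
def expensive_config():
    # Pretend this computation is costly (parsing, I/O, profiling, ...).
    value = sum(i * i for i in range(1000))
    # Self-modification: rebind the module-level name so that subsequent
    # calls go straight to the cached result.
    globals()['expensive_config'] = lambda: value
    return value

first_result = expensive_config()
assert expensive_config() == first_result  # now served by the fast path
```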

Self-hosting platforms such as PyPy and intermediary representation JITs such as LLVM also show astonishing insight into introspective techniques. These platforms can be used to a number of ends, including, but not limited to, the increases in speed that the Futurist Programmers seem to be longing for.

In the end, I only have one rebuttal question for the Futurist Programmers: What kind of science disregards the accuracy and reproducibility of results for the sake of fast "experiments"? [§] We don't reject self-modifying programs without consideration — there are very important maintainability and extensibility concerns that have to be taken into account before making a decision. It's not always a choice between making something artistically beautiful or performing a feat of engineering: if most computer enthusiasts are like me, they're searching for a way to produce an appropriate mix of the two.



[*] This is generally recognized within the Python community.


[†] As an example of this, think of the singleton access pattern in a multithreaded application. After Singleton.get_instance() has instantiated the class on the first call, you could swap get_instance() with a method that simply returns the created reference. This avoids subsequent locking and singleton-instantiation checking that you would incur from the old get_instance() method.
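A minimal sketch of that idiom (the Singleton class here is hypothetical; the interesting part is the method swap after first instantiation):

```python
import threading

class Singleton(object):
    _instance = None
    _lock = threading.Lock()

    @classmethod
    def get_instance(cls):
        # Slow path: taken only until the first instantiation completes.
        with cls._lock:
            if cls._instance is None:
                cls._instance = cls()
            # Swap in a trivial accessor so later calls skip both the
            # lock acquisition and the None check.
            cls.get_instance = classmethod(lambda c: c._instance)
        return cls._instance

a = Singleton.get_instance()
b = Singleton.get_instance()  # served by the swapped-in fast path
assert a is b
```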


[‡] I recommend the Steve Yegge talk on dynamic languages for some more background on this topic.


[§] What is an application if not a software engineer's big, scary experiment?

Python's generators sure are handy

While rewriting some older code today, I ran across a good example of the clarity inherent in Python's generator expressions. Some time ago, I had written this weirdo construct:

for regex in date_regexes:
    match = regex.search(line)
    if match:
        break
else:
    return None
# ... do stuff with the match

The syntax highlighting makes the problem fairly obvious: there's way too much syntax!

First of all, I used the semi-obscure "for-else" construct. For those of you who don't read the Python BNF grammar for fun (as in: the for statement), the definition may be useful:

for_stmt ::= "for" target_list "in" expression_list ":" suite
             ["else" ":" suite]

So long as the for loop isn't (prematurely) terminated by a break statement, the code in the else suite gets evaluated. To restate (in the contrapositive): the code in the else suite doesn't get evaluated if the for loop is terminated with a break statement. From this definition we can deduce that if a match was found, I did not want to return early.
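A tiny standalone demonstration of that break/else interaction (contrived function and values, just to show the control flow):

```python
def find_first_even(numbers):
    for n in numbers:
        if n % 2 == 0:
            break  # skips the else suite
    else:
        return None  # loop ran to completion: nothing found
    return n

assert find_first_even([1, 3, 4, 5]) == 4
assert find_first_even([1, 3, 5]) is None
```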

That's way too much stuff to think about. Generators come to the rescue!

def first(iterable):
    """:return: The first item in the iterable that evaluates
    as True, or None if no such item exists.
    """
    for item in iterable:
        if item:
            return item
    return None

match = first(regex.search(line) for regex in regexes)
if not match:
    return None
# ... do stuff with the match

At a glance, this is much shorter and more comprehensible. We pass a generator expression to the first function, which performs a kind of short-circuit evaluation — as soon as a match is found, we stop running regexes (which can be expensive). This is a pretty rockin' solution, so far as I can tell.

Prior to generator expressions, to do something similar to this we'd have to use a list comprehension, like so:

match = first([regex.search(line) for regex in regexes])
if not match:
    return None
# ... do stuff with the match

We dislike this because the list comprehension will run all of the regexes, even if an earlier one already found a match. What we really want is the short-circuit evaluation provided by generator expressions and the first function, as shown above. Huzzah!
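Incidentally, the same short-circuit behavior can be had without a helper by using the built-in next function with a default (available since Python 2.6). The regexes and line below are made up for illustration:

```python
import re

regexes = [re.compile(r'\d{4}-\d{2}-\d{2}'), re.compile(r'\d{2}/\d{2}/\d{4}')]
line = 'released on 03/14/2010'

# The inner generator is consumed only until the first truthy search result,
# so later (potentially expensive) regexes never run.
match = next((m for m in (r.search(line) for r in regexes) if m), None)
assert match is not None
assert match.group(0) == '03/14/2010'
```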


Originally I thought that the any built-in returned the first object which evaluated to a boolean True, but it actually returns the boolean True if any of the objects evaluate to True. I've edited to reflect my mistake.