Posts Tagged ‘Efficiency’

Thoughts on programming language fluency

Sunday, November 29th, 2009

I noticed that Effective Java’s foreword is written by Guy Steele, so I actually bothered to read it. Here’s the bit I found particularly intriguing:

If you have ever studied a second language yourself and then tried to use it outside the classroom, you know that there are three things you must master: how the language is structured (grammar), how to name things you want to talk about (vocabulary), and the customary and effective ways to say everyday things (usage).

When programmers enter the job market, the idea that, "We have the capability to learn any programming language," gets thrown around a lot. I now realize that this sentiment is irrelevant in many cases, because the deciding factor in the hiring process is more often time to fluency.

Time to fluency as a hiring factor

Let’s say that there are two candidates, Fry and Laurie, interviewing for a programming position using Haskell. [*] Fry comes off as very intelligent during the interview process, but has only used OCaml and sounds like he mutabled all of the stuff that would make your head explode using monads. Laurie, on the other hand, couldn’t figure out how many ping pong balls fit into Air Force One or why manhole covers are round, [†] but is clearly fluent in Haskell. Which one gets hired?

The answer to this question is another question: When are they required to be pumping out production-quality code?

Even working all hours of the day, the time to fluency for a language is on the order of weeks, independent of other scary new-workplace factors. Although books like Effective * can get you on the right track, fluency is ultimately attained through experience. Insofar as programming is a perpetual decision of what to make flexible and what to hard-code, you must spend time in the hot seat to gain necessary intuition — each language’s unique characteristics change the nature of the game.

Everybody wants to hire Fry; however, Laurie will end up with the job due to time constraints on the part of the hiring manager. I’m pretty sure that Joel’s interview notions are over-idealized in the general case:

Anyway, software teams want to hire people with aptitude, not a particular skill set. Any skill set that people can bring to the job will be technologically obsolete in a couple of years, anyway, so it’s better to hire people that are going to be able to learn any new technology rather than people who happen to know how to make JDBC talk to a MySQL database right this minute.

Reqs have to be filled so that the trains run on time — it’s hard to let real, here-and-now schedules slip to avoid hypothetical, three-years-later slip.

Extreme Programming as catalyst

You remember that scene from The Matrix where Neo gets all the Kung Fu downloaded into his brain in a matter of seconds? That whole process is nearly as awesome as code reviews.

Pair programming and code reviews:

  • Trick your brain into learning everything faster through mild stress and the threat of looking noobish in your colleagues’ eyes.
  • Give you the shoulders of language-fluent programmers to stand on as they push you in the right direction.
  • Back off in accordance with your fluency acquisition.

This is totally speculative, but from my experience I’d be willing to believe you can reduce the minimum-time-to-fluency by an order of magnitude with the right (read: friendly and supportive) Extreme Programming environment.

Footnotes

[*] You know it’s a hypothetical because it’s a Haskell position. Bazinga!
[†] The point is that Fry has the high ground in terms of perceived aptitude. I actually think most of the Mount Fuji questions are nearly useless in determining aptitude, though I do enjoy them. The referenced sentence is a poor attempt at a joke. ;-)

Thoughts on self-modifying code and Futurist Programmers

Thursday, September 18th, 2008

Around 8th grade I read an article about a faction of programmers — the Futurist Programmers — whose rallying cry is paraphrased in the following quotation:

Why does computer science reject self modifying programs? Why have some departments stopped teaching assembly language programming? On what scientific basis has this been done? Where is the experimental evidence to support these actions?

As far as I remember, this movement attempted to emphasize the purity of computer programming, which they believed was a form of artistry. This was posed as a throwback to the tenets of Italian Futurism, which were opposed to tradition and commoditization, in the context of computer programming. A Wikipedia excerpt will probably be helpful:

The Futurists admired speed, technology, youth and violence, the car, the plane and the industrial city, all that represented the technological triumph of humanity over nature, and they were passionate nationalists.

Thinking about JavaScript Just In Time compilers (JITs) today — like TraceMonkey — reminded me of this philosophy. I believe that their line of questioning was insightful, but the formulation was misdirected. Technological triumph stems primarily from computers doing what humans want them to do. It’s additionally awesome if the computers can do these things extra quickly; however, if they do things incorrectly very quickly, humanity comes out much less triumphant. Perhaps we even come out worse for the experience.

Secondly, we note that humanity strives for the ability to make further progress based on the success of past experiences. This is the concept of extensibility and reusability. Standing on the shoulders of giants, if you will. Self-modifying code that I have encountered is often very clever; however, programming cleverness tends to be at odds with readability. [*] This is not to say that all self-modifying code is unreadable: in languages with dynamic method dispatch, swapping an object’s methods out (with some kind of locking mechanism) is a recognized idiom that can lead to beneficial efficiency/complexity trade-offs. [†]
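Here is a minimal sketch of that method-swapping idiom, using the singleton accessor as the example. The class layout, the lock and the inner `_fast` helper are my own assumptions for illustration; treat it as one possible shape rather than a canonical implementation.

```python
import threading


class Singleton:
    """Hypothetical singleton whose accessor replaces itself after first use."""

    _instance = None
    _lock = threading.Lock()

    @classmethod
    def get_instance(cls):
        # Slow path: take the lock, then check whether the instance exists yet.
        with cls._lock:
            if cls._instance is None:
                instance = cls()
                cls._instance = instance

                def _fast():
                    # Fast path: no locking and no None check.
                    return instance

                # Swap the accessor on the class itself. Callers that already
                # entered the slow path still get the right answer, because
                # they hold the lock and see a non-None _instance.
                cls.get_instance = staticmethod(_fast)
            return cls._instance
```

The first call pays for the lock and the check; every later `Singleton.get_instance()` lands directly on `_fast`. The price is that the class no longer looks the same at runtime as it does in the source, which is exactly the readability trade-off in question.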

Ultimately, you’d have trouble finding computer enthusiasts who find speed unimportant. Everybody loves it when their computers are more efficient! The caveat is that most computer enthusiasts will, in many situations, put speed down here: after correctness and extensibility. As a testament to this, there is continuing emergence and acceptance of Very High Level Languages (VHLLs) over low-level programming languages in non-academic contexts.

So how did the futurists have the right idea? "Introspective" programs are important. There’s lots of information at runtime that we can use to more efficiently execute programs. [‡] Hotspot JITs, such as the aforementioned TraceMonkey, know this well: the basic premise is that they dynamically rewrite the code they’re executing or, in recent developments with Google’s V8, rewrite it before executing. The key here is that we can now:

  1. Write correct, extensible programs.
  2. Write correct, extensible programs to optimize the programs from 1.
  3. Run the more efficient result of combining 2 and 1.
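To make the three steps concrete, here is a toy sketch in Python. It is emphatically not how TraceMonkey or V8 work; it only shows the shape of the idea: a plain, general function (step 1), a separate wrapper that watches calls at runtime and switches to a guarded fast path once the call is "hot" (step 2), and the combined result (step 3). Every name here (`specialize_when_hot`, `guard`, `threshold`, `int_add`) is made up for the example.

```python
def specialize_when_hot(generic, specialized, guard, threshold=3):
    """Return a wrapper that prefers `specialized` once the call is hot.

    All parameter names are hypothetical, chosen for this sketch.
    """
    calls = 0

    def wrapper(*args):
        nonlocal calls
        calls += 1
        # The guard protects correctness: the fast path only runs for inputs
        # it was written for; everything else falls back to `generic`.
        if calls > threshold and guard(*args):
            return specialized(*args)
        return generic(*args)

    return wrapper


def generic_add(x, y):
    # Step 1: the correct, general program (numbers, strings, lists, ...).
    return x + y


def int_add(x, y):
    # Stand-in for the code an optimizer might emit for the common int case.
    return int.__add__(x, y)


# Step 3: the combination of the general program and its optimizer.
add = specialize_when_hot(
    generic_add,
    int_add,
    guard=lambda x, y: type(x) is int and type(y) is int,
)
```

Calling `add(1, 2)` repeatedly eventually takes the specialized path, while `add("a", "b")` keeps falling back to the general one. Correctness comes from the general program and the guard, and speed is layered on top.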

Self-hosting platforms such as PyPy and intermediate representation JITs such as LLVM also show astonishing insight into introspective techniques. These platforms can be used to a number of ends, including, but not limited to, the increases in speed that the Futurist Programmers seem to be longing for.

In the end, I only have one rebuttal question for the Futurist Programmers: What kind of science disregards the accuracy and reproducibility of results for the sake of fast "experiments"? [§] We don’t reject self-modifying programs without consideration — there are very important maintainability and extensibility concerns that have to be taken into account before making a decision. It’s not always a choice between making something artistically beautiful or performing a feat of engineering: if most computer enthusiasts are like me, they’re searching for a way to produce an appropriate mix of the two.

Footnotes

[*] This is generally recognized within the Python community.
[†] As an example of this, think of the singleton access pattern in a multithreaded application. After Singleton.get_instance() has instantiated the class on the first call, you could swap get_instance() with a method that simply returns the created reference. This avoids subsequent locking and singleton-instantiation checking that you would incur from the old get_instance() method.
[‡] I recommend the Steve Yegge talk on dynamic languages for some more background on this topic.
[§] What is an application if not a software engineer’s big, scary experiment?

Python’s generators sure are handy

Wednesday, January 23rd, 2008

While rewriting some older code today, I ran across a good example of the clarity inherent in Python’s generator expressions. Some time ago, I had written this weirdo construct:

for regex in regexes:
    match = regex.search(line)
    if match:
        break
else:
    return
# ... do stuff with the match

The syntax highlighting makes the problem fairly obvious: there’s way too much syntax!

First of all, I used the semi-obscure "for-else" construct. For those of you who don’t read the Python BNF grammar for fun (as in: the for statement), the definition may be useful:

So long as the for loop isn’t (prematurely) terminated by a break statement, the code in the else suite gets evaluated. To restate from the other direction (strictly speaking the inverse, which also holds here): the code in the else suite doesn’t get evaluated if the for loop is terminated with a break statement. From this definition we can deduce that if a match was found, I did not want to return early.
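If the rule is easier to see in action, here is a tiny self-contained sketch of for-else. The function name and the even-number test are placeholders I picked for the example, not anything from the original code.

```python
def find_first_even(numbers):
    for n in numbers:
        if n % 2 == 0:
            break  # leaving via break skips the else suite
    else:
        # Runs only when the loop finished without hitting break,
        # which includes the case where `numbers` is empty.
        return None
    return n
```

`find_first_even([1, 3, 4, 5])` leaves the loop through `break` and returns 4, while `find_first_even([1, 3])` runs the loop to completion, falls into the else suite and returns None.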

That’s way too much stuff to think about. Generators come to the rescue!

def first(iterable):
    """:return: The first item in the iterable that evaluates
    as True.
    """
    for item in iterable:
        if item:
            return item
    return None
 
match = first(regex.search(line) for regex in regexes)
if not match:
    return
# ... do stuff with the match

At a glance, this is much shorter and more comprehensible. We pass a generator expression to the first function, which performs a kind of short-circuit evaluation — as soon as a match is found, we stop running regexes (which can be expensive). This is a pretty rockin’ solution, so far as I can tell.

Prior to generator expressions, to do something similar to this we’d have to use a list comprehension, like so:

match = first([regex.search(line) for regex in regexes])
if not match:
    return
# ... do stuff with the match

We dislike this because the list comprehension will run all of the regexes, even if one already found a match. What we really want is the short-circuit evaluation provided by generator expressions and the first function, as shown above. Huzzah!
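A quick way to convince yourself of the difference is to count how many searches each form performs. In this sketch the `search` wrapper, the `searched` log, the three patterns and the sample line are all assumptions made up for the demonstration; `first` is repeated so the snippet stands alone.

```python
import re


def first(iterable):
    """Return the first truthy item in the iterable, or None."""
    for item in iterable:
        if item:
            return item
    return None


searched = []  # patterns actually tried, in order


def search(regex, line):
    # Record each attempt so the two forms can be compared afterwards.
    searched.append(regex.pattern)
    return regex.search(line)


regexes = [
    re.compile(p)
    for p in (r"\d{4}-\d{2}-\d{2}", r"\d{2}/\d{2}/\d{4}", r"\d{8}")
]
line = "released 2008-01-23"

# Generator expression: items are produced one at a time, on demand.
lazy_match = first(search(r, line) for r in regexes)
lazy_calls = len(searched)

searched.clear()

# List comprehension: the whole list is built before first() sees it.
eager_match = first([search(r, line) for r in regexes])
eager_calls = len(searched)
```

With this input the first pattern already matches, so `lazy_calls` ends up as 1 and `eager_calls` as 3, even though both forms return an equivalent match.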

Edit

Originally I thought that the any built-in returned the first object which evaluated to a boolean True, but it actually returns the boolean True if any of the objects evaluate to True. I’ve edited to reflect my mistake.
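For the record, here is the distinction side by side, as a small sketch with made-up patterns and a made-up input line. It also shows that, as far as I can tell, the built-ins `next` and `filter` can stand in for a hand-written `first`.

```python
import re

regexes = [re.compile(r"\d{2}/\d{2}/\d{4}"), re.compile(r"\d{4}-\d{2}-\d{2}")]
line = "released 2008-01-23"

# any() short-circuits too, but it only reports True or False.
found = any(regex.search(line) for regex in regexes)

# filter(None, ...) lazily drops falsy items and next() takes the first
# survivor; the second argument to next() is the default when nothing matches.
match = next(filter(None, (regex.search(line) for regex in regexes)), None)
```

Here `found` is the boolean True, whereas `match` is the match object itself, so `match.group()` gives back the matched text.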