September 18, 2008

Thoughts on self-modifying code and Futurist Programmers

Around 8th grade I read an article about a faction of programmers — the Futurist Programmers — whose rallying cry was paraphrased in the following quotation:

Why does computer science reject self modifying programs? Why have some departments stopped teaching assembly language programming? On what scientific basis has this been done? Where is the experimental evidence to support these actions?

As far as I remember, this movement attempted to emphasize the purity of computer programming, which they believed was a form of artistry. It was posed as a throwback, in the context of computer programming, to the tenets of Italian Futurism, which opposed tradition and commoditization. A Wikipedia excerpt will probably be helpful:

The Futurists admired speed, technology, youth and violence, the car, the plane and the industrial city, all that represented the technological triumph of humanity over nature, and they were passionate nationalists.

Thinking about JavaScript just-in-time (JIT) compilers today — like TraceMonkey — reminded me of this philosophy. I believe that their line of questioning was insightful, but the formulation was misdirected. Technological triumph stems primarily from computers doing what humans want them to do. It's additionally awesome if the computers can do these things extra quickly; however, if they do things incorrectly very quickly, humanity comes out much less triumphant. Perhaps we even come out worse for the experience.

Secondly, we note that humanity strives for the ability to make further progress based on the success of past experiences. This is the concept of extensibility and reusability. Standing on the shoulders of giants, if you will. Self-modifying code that I have encountered is often very clever; however, programming cleverness tends to be at odds with readability. [*] This is not to say that all self-modifying code is unreadable: in languages with dynamic method dispatch, swapping an object's methods out (with some kind of locking mechanism) is a recognized idiom that can lead to beneficial efficiency/complexity trade-offs. [†]

Ultimately, you'd have trouble finding computer enthusiasts who find speed unimportant. Everybody loves it when their computers are more efficient! The caveat is that most computer enthusiasts will, in many situations, rank speed below correctness and extensibility. As a testament to this, there is continuing emergence and acceptance of Very High Level Languages (VHLLs) over low level programming languages in non-academic contexts.

So how did the futurists have the right idea? "Introspective" programs are important. There's lots of information at runtime that we can use to more efficiently execute programs. [‡] Hotspot JITs, such as the aforementioned TraceMonkey, know this well: the basic premise is that they dynamically rewrite the code they're executing or, in recent developments with Google's V8, rewrite it before executing. The key here is that we can now:

  1. Write correct, extensible programs.

  2. Write correct, extensible programs to optimize the programs from 1.

  3. Run the more efficient result of combining 2 and 1.
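The three-step loop above can be sketched in miniature. This is a toy Python illustration rather than a real JIT — the `hotspot` decorator and its call-count threshold are invented for this example — but it captures the premise: a correct, general program (step 1) is observed at runtime by a second program (step 2), which swaps in a faster path for the hot cases (step 3), without the author of the first program giving up readability.

```python
import functools

def hotspot(threshold=3):
    """Profile a one-argument function; once an argument has been seen
    `threshold` times, serve it from a "compiled" fast path (a cache)
    instead of re-running the general code. A stand-in for what a
    hotspot JIT does with machine code."""
    def decorator(func):
        counts = {}  # how often each argument has been seen
        cache = {}   # the specialized fast path

        @functools.wraps(func)
        def wrapper(arg):
            if arg in cache:
                return cache[arg]            # fast path: skip the general code
            counts[arg] = counts.get(arg, 0) + 1
            result = func(arg)               # slow, general, correct path
            if counts[arg] >= threshold:
                cache[arg] = result          # argument is "hot": specialize
            return result
        return wrapper
    return decorator

@hotspot(threshold=3)
def fib(n):
    # Deliberately naive: the optimizer, not the author, supplies the speed.
    return n if n < 2 else fib(n - 1) + fib(n - 2)
```

The point of the sketch is the separation of concerns: `fib` stays obviously correct and extensible, while the optimization lives in a separate, equally ordinary program.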

Self-hosting platforms such as PyPy and intermediary representation JITs such as LLVM also show astonishing insight into introspective techniques. These platforms can be used to a number of ends, including, but not limited to, the increases in speed that the Futurist Programmers seem to be longing for.

In the end, I only have one rebuttal question for the Futurist Programmers: What kind of science disregards the accuracy and reproducibility of results for the sake of fast "experiments"? [§] We don't reject self-modifying programs without consideration — there are very important maintainability and extensibility concerns that have to be taken into account before making a decision. It's not always a choice between making something artistically beautiful or performing a feat of engineering: if most computer enthusiasts are like me, they're searching for a way to produce an appropriate mix of the two.



[*] This is generally recognized within the Python community.


[†] As an example of this, think of the singleton access pattern in a multithreaded application. After Singleton.get_instance() has instantiated the class on the first call, you could swap get_instance() with a method that simply returns the created reference. This avoids the subsequent locking and singleton-instantiation checking that you would incur from the old get_instance() method.
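A minimal Python sketch of that idiom, assuming the classic get_instance() accessor (the class and method names here are illustrative, not from any particular library):

```python
import threading

class Singleton:
    _instance = None
    _lock = threading.Lock()

    @classmethod
    def get_instance(cls):
        # Slow path: take the lock and check, exactly once per process.
        with cls._lock:
            if cls._instance is None:
                cls._instance = cls()
                # Self-modification: replace this method on the class with
                # one that just returns the existing reference. Every later
                # call skips the lock and the None check entirely.
                cls.get_instance = classmethod(lambda c: c._instance)
        return cls._instance
```

After the first call, `Singleton.get_instance` is literally a different method — the efficiency/complexity trade-off mentioned above made concrete.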


[‡] I recommend the Steve Yegge talk on dynamic languages for some more background on this topic.


[§] What is an application if not a software engineer's big, scary experiment?