Notes from the JS pit: closure optimization

May 11th, 2010

In anticipation of a much-delayed dentist appointment tomorrow morning and under the assumption that hard liquor removes plaque, I’ve produced [*] an entry in the spirit of Stevey’s Drunken Blog Rants, s/wine/scotch/g. I apologize for any and all incomprehensibility, although Stevey may not mind since it’s largely an entry about funargs, which he seems to have a thing for. (Not that I blame him — I’m thinking about them while drinking…) It also appears I may need to prove myself worthy of emigration to planet Mozilla, so hopefully an entry filled with funarg debauchery will serve that purpose as well.

Background

Lately, I’ve been doing a little work on closure optimization, as permitted by static analysis; i.e. the parser/compiler marks which functions can be optimized into various closure forms.

In a language that permits nested functions and functions as first-class values, there are a few things you need to ask about each function before you optimize it:

  • Can it escape (leave the scope it’s defined in)?
  • Does it refer to free variables (identifiers declared outside the function definition)?
  • Are the free variables it refers to potentially redefined after the function definition?

Function escape (the funarg problem)

If a function can execute outside the scope in which it was lexically defined, it is said to be a "funarg", a fancy word for "potentially escaping outside the scope where it’s defined". We call certain functions in the JS runtime Algol-like closures if they are immediately applied function expressions, like so:

function outer() {
    var x = 12;
    return (function cubeX() { return x * x * x; })();
}

The function cubeX can never execute outside the confines of outer — there’s no way for the function definition to escape. It’s as if you just took the expression x * x * x, wrapped it in a lambda (function expression), and immediately executed that expression. [†]

Apparently a lot of Algol programmers had the hots for this kinda thing — the whole function-wrapping thing was totally optional, but you chose to do it, Algol programmers, and we respect your choice.

You can optimize this case through static analysis. As long as there’s no possibility of escape between a declaration and its use in a nested function, the nested function knows exactly how far to reach up the stack to retrieve/manipulate the variable — the activation record stack is totally determined at compile time. Because there’s no escaping, there’s not even any need to import the upvar into the Algol-like function.

Dijkstra’s display optimization

To optimize this Algol-like closure case we used a construct called a "Dijkstra display" (or something named along those lines). You just keep an array of stack frame pointers, with each array slot representing the frame currently executing at that function nesting level. When outer is called in the above, outer’s stack frame pointer would be placed in the display array at nesting level 0, so the array would look like:

Level 0: &outerStackFrame
Level 1: NULL
Level 2: NULL
...

Then, when cubeX is invoked, it is placed at nesting level 1:

Level 0: &outerStackFrame
Level 1: &cubeX
Level 2: NULL
...

At parse time, we tell cubeX that it can reach up to level 0, frame slot 0 to retrieve the jsval for x. [‡] Even if you have "parent" frame references in each stack frame, this array really helps when a function is reaching up many levels to retrieve an upvar, since you can do a single array lookup instead of an n link parent chain traversal. Note that this is only useful when you know the upvar-referring functions will never escape, because the display can only track stack frames for functions that are currently executing.

There’s also the possibility that two functions at the same nesting level are executing simultaneously; i.e.

function outer() {
    var x = 24;
    function innerFirst() { return x; }
    function innerSecond() {
        var x = 42;
        return innerFirst();
    }
    return innerSecond();
}

To deal with this case, each stack frame has a pointer to the "chained" display stack frame for that nesting level, which is restored when the executing function returns. To go through the motions:

Level 0: &outerStackFrame
Level 1: &innerSecond
Level 2: NULL
...

Which then activates innerFirst at the same static level (1), which saves the pointer that it’s clobbering in the display array.

Level 0: &outerStackFrame
Level 1: &innerFirst (encapsulates &innerSecond)
Level 2: NULL
...

Then, when innerFirst looks up the static levels for x, it gets the correct value, restoring innerSecond when it’s done executing in a return-style bytecode (which would be important if there were further function nesting in innerSecond). [§]

Okay, hopefully I’ve explained that well enough, because now I get to tell you that we’ve found this optimization to be fairly useless in SpiderMonkey experimental surveys and we hope to rip it out at some point. The interesting case that we actually care about (flat closures) is discussed in the second to last section.

Free variable references

Because JS is a lexically scoped language [¶] we can determine which enclosing scope a free variable is defined in. [#] If a function’s free variables only refer to bindings in the global scope, then it doesn’t need any information from the functions that enclose it. For these functions the set of free variables in nested functions is the null set, so we call it a null closure. Top-level functions are null closures. [♠]

function outer() {
    return function cube(x) { return x * x * x; }; // Null closure - no upvars.
}

Free variables are termed upvars, since they are identifiers that refer to variables in higher (enclosing) scopes. At parse time, when we’re trying to find a declaration to match up with a use, they’re called unresolved lexical dependencies. Though JavaScript scopes are less volatile — and, as some will undoubtedly point out, less flexible — I believe that the name upvar comes from this construct in Tcl, which lets you inject vars into and read vars from arbitrary scopes as determined by the runtime call stack: [♥]

set x 7
 
proc most_outer {} {
    proc outer {} {
        set x 24
        proc upvar_setter {level} {
            upvar $level x x
            set x 42
        }
        proc upvar_printer {level} {
            upvar $level x x
            puts $x
        }
        upvar_printer 1
        upvar_setter 1
        upvar_printer 1
        upvar_setter 2
        upvar_printer 2
        upvar_printer 3
        upvar_setter 3
        upvar_printer 3
    }
    outer
}
most_outer # Yields the numbers 24, 42, 42, 7, and 42.

Upvar redefinitions

If you know that the upvar is never redefined after the nested function is created, it is effectively immutable — similar to the effect of Java’s partial closures in anonymous inner classes via the final keyword. In this case, you can create an optimized closure in a form we call a flat closure — if, during static analysis, you find that none of the upvars are redefined after the function definition, you can import the upvars into the closure, effectively copying the immutable jsvals into extra function slots.

On the other hand, if variables in enclosing scopes are (re)defined after the function definition (and thus, don’t appear immutable to the function), a shared environment object has to be created so that nested functions can correctly see when the updates to the jsvals occur. Take the following example:

function outer() {
    var communicationChannel = 24;
    function innerGetter() {
        return communicationChannel();
    }
    function innerSetter() {
        communicationChannel = 42;
    }
    return [innerGetter, innerSetter];
}

Closing over references

In this case, outer must create an environment record outside of the stack so that when innerGetter and innerSetter escape on return, they can see both communicate through the upvar. This is the nice encapsulation-effect you can get through closure-by-reference, and is often used in the JS "constructor-pattern", like so:

function MooCow() {
    var hasBell = false;
    var noise = "Moo.";
    return {
        pontificate: function() { return hasBell? noise + " <GONG!>" : noise; }
        giveBell: function() { hasBell = true; }
    };
}

It’s interesting to note that all the languages I work with these days perform closure-by-reference, as opposed to closure-by-value. In constrast, closure-by-value would snapshot all identifiers in the enclosing scope, so immutable types (strings, numbers) would be impossible to change.

Sometimes, closure-by-reference can produce side effects that surprise developers, such as:

def surprise():
    funs = [lambda: x ** 2 for x in range(6)]
    assert funs[0]() == 25

This occurs because x is bound in function-local scope, and all the lambdas close over it by reference. When x is mutated in further iterations of the list comprehension (at least in Python 2.x), the lambdas are closed over the environment record of surprise, and all of them see the last value that x was updated to.

I can sympathize. In fact, I’ve wrote a program to do so:

var lambdas = [];
var condolences = ["You're totally right",
        "and I understand what you're coming from, but",
        "this is how closures work nowadays"];
for (var i = 0; i < condolences.length; i++) {
    var condolence = condolences[i];
    lambdas.push(function() { return condolence; });
}
print(lambdas[0]());

Keep in mind that var delcarations are hoisted to function scope in JS.

I implore you to note that comments will most likely be received while I’m sober.

Footnotes

[*] Vomited?
[†] Cue complaints about the imperfect lambda abstraction in JavaScript. Dang Ruby kids, go play with your blocks! ;-)
[‡] Roughly. Gory details left out for illustrative purposes.
[§] There’s also the case where the display array runs out of space for the array. I believe we emit unoptimized name-lookups in this case, but I don’t entirely recall.
[¶] With a few insidious dynamic scoping constructs thrown in. I’ll get to that in a later entry.
[#] Barring enclosing with statements and injected eval scopes.
[♠] Unless they contain an eval or with, in which case we call them "heavyweight" — though they still don’t need information from enclosing functions, they must carry a stack of environment records, so they’re not optimal. I love how many footenotes I make when I talk about the JavaScript language. ;-)
[♥] As a result, it’s extremely difficult to optimize accesses like these without whole propgram analysis.

Notes from the JS pit: lofty goals, humble beginnings

May 1st, 2010

I’ve been working at Mozilla for about two months on the JavaScript (JS) engine team, the members of which sit in an area affectionately known as the "JS pit".

Mozillians appear to try to blog on a regular basis, so I’ll be starting a series of entries prefixed "notes from the JS pit" to explain what I’ve been working on and/or thinking about.

Notably, I feel fortunate to work for a company that encourages this kind of openness.

Goals

I always feel stupid writing down goals — they seem so self-evident; however, it helps to put the work I’m doing into perspective and gives me something that I can to refer back to.

I’m also a big believer in the effectiveness of public accountability, so publishing those goals seems prudent — my notes from the JS pit are poised to help me stay motivated more than anything else.

My goals are to:

  • Do my part to meet and exceed performance targets in the JS engine. It’s an exciting time with lots of healthy competition to motivate this.
  • Immerse myself in language design concepts and seek out important implementation details, even if those details don’t pertain to something that I’m working on directly. This also entails some independent research and small projects to gain better understanding of foreign language concepts.
  • Deliberately take time to think constructively about alternative or innovative ways to accomplish our language goals. Taking things at face value as "the way we’ve always done it" is the way that innovative things get overlooked — this is relatively easy to do with a pair of fresh eyes, but similarly difficult because you’re the team noob.

Working with compiler engineers from diverse language backgrounds, it’s prime time for sucking knowledge out of people’s heads, comparing and contrasting it, and listening to them argue with each other. Heck, just look at the concepts behind JS: an imperative, Scheme-inspired, prototypal, C-and-Java-syntax conforming language that’s irrevocably tied to a practical platform, the web. It’s bound to be a fun ride.

From start to present

I started off implementing the simpler opcodes for JaegerMonkey (JM) and getting an understanding the JM code base. Not too long into it, I was told that looking into quick-and-easy parser optimizations was a priority — somebody had reported that a significant fraction of the GMail load bar time could be attributed to JS parsing. [*]

Now, JavaScript isn’t the easiest language in the world to parse; for example, automatic semicolon insertion creates some non-traditional obstacles for generated shift/reduce parsers [†] — it effectively makes an error correction algorithm part of the normal parse procedure. The details are for another entry, but suffice it to say that our recursive descent parser code gets complex, especially due to our E4X support and some of the static analyses we perform for optimizations before bytecode emission.

In pursuing JS parser optimization I assembled a suite of parsing benchmarks from sites on the web with "large" JS payloads — I call this suite parsemark. After getting some speedups from simple inlining, I attempted a somewhat fundamental change to the parser to reduce the number of branch mispredictions, in converting it to always have a token "pre-lexed" as opposed to the prior "lex-on-demand" model. Roughly, this required adding a "there’s always a lexed token" invariant to the lexer and hoisting lexer calls/modesets from substitution rules into their referring nonterminals in the parser. The details for this are also entry fodder. Sadly, it demonstrated negligible performance gains for the increase in complexity. Sure taught me a lot about our parser, though.

The biggest performance win was obtained through a basic fix to our parser arena-allocation chunk sizing. sayrer noticed that a surprising amount of time was being spent in kernel space, so we tracked the issue down. It was frustrating to work for a few weeks on a fundamental change and then realize that multiplying a constant by four can get you a 20% parsing speedup, but I’ve certainly learned to look a lot more closely at the vmem subsystem when things are slow. I have some speedup statistics and a comparison to V8 (with all its lazy parsing and parse-caching bits ripped out), but I don’t have much faith that my environment hasn’t changed in the course of all the historical data measurements — writing a script to verify speedup over changesets seems like a viable option for future notes.

In the miscellany department, I’ve been trying to do a good amount of work fixing broken windows via cleanup patches. I’m finding it difficult to strike a balance here, since there’s a lot of modularity-breaking interdependencies in the code base — what appear to be simple cleanups tend to unravel into large patches that get stale easily. However, cleanup does force you to read through the code you’re modifying, which is always good when you’re learning a new code base.

Looking back on it, it doesn’t seem like a lot of work; of course, my hope is that the time I spend up-front getting accustomed to the codebase will let me make progress on my goals more rapidly.

Stay tuned for more JS pittage — unpredictable time, but predictable channel.

Footnotes

[*] To date, I haven’t looked into this myself. Ideally, I should have verified it before starting on the parser work, but I was eager to start working on things rather than investigate the reasons behind them.
[†] Though I’ve seen that Webkit’s JavaScriptCore uses Bison — I’m going to have to check out that implementation at some point.

Efficiency of list comprehensions

April 30th, 2010

I’m psyched about the awesome comments on my previous entry, Python by example: list comprehensions. Originally this entry was just a response to those comments, but people who stumbled across this entry on the interwebz found the response format too confusing, so I’ve restructured it for posterity.

Efficiency of the more common usage

Let’s look at the efficiency of list comprehensions in the more common usage, where the comprehension’s list result is actually relevant (or, in compiler-speak, live-out).

Using the following program, you can see the time spent in each implementation and the corresponding bytecode sequence:

import dis
import inspect
import timeit
 
 
programs = dict(
    loop="""
result = []
for i in range(20):
    result.append(i * 2)
""",
   loop_faster="""
result = []
add = result.append
for i in range(20):
    add(i * 2)
""",
    comprehension='result = [i * 2 for i in range(20)]',
)
 
 
for name, text in programs.iteritems():
    print name, timeit.Timer(stmt=text).timeit()
    code = compile(text, '<string>', 'exec')
    dis.disassemble(code)
loop 11.1495118141
  2           0 BUILD_LIST               0
              3 STORE_NAME               0 (result)
 
  3           6 SETUP_LOOP              37 (to 46)
              9 LOAD_NAME                1 (range)
             12 LOAD_CONST               0 (20)
             15 CALL_FUNCTION            1
             18 GET_ITER
        >>   19 FOR_ITER                23 (to 45)
             22 STORE_NAME               2 (i)
 
  4          25 LOAD_NAME                0 (result)
             28 LOAD_ATTR                3 (append)
             31 LOAD_NAME                2 (i)
             34 LOAD_CONST               1 (2)
             37 BINARY_MULTIPLY
             38 CALL_FUNCTION            1
             41 POP_TOP
             42 JUMP_ABSOLUTE           19
        >>   45 POP_BLOCK
        >>   46 LOAD_CONST               2 (None)
             49 RETURN_VALUE
loop_faster 8.36096310616
  2           0 BUILD_LIST               0
              3 STORE_NAME               0 (result)
 
  3           6 LOAD_NAME                0 (result)
              9 LOAD_ATTR                1 (append)
             12 STORE_NAME               2 (add)
 
  4          15 SETUP_LOOP              34 (to 52)
             18 LOAD_NAME                3 (range)
             21 LOAD_CONST               0 (20)
             24 CALL_FUNCTION            1
             27 GET_ITER
        >>   28 FOR_ITER                20 (to 51)
             31 STORE_NAME               4 (i)
 
  5          34 LOAD_NAME                2 (add)
             37 LOAD_NAME                4 (i)
             40 LOAD_CONST               1 (2)
             43 BINARY_MULTIPLY
             44 CALL_FUNCTION            1
             47 POP_TOP
             48 JUMP_ABSOLUTE           28
        >>   51 POP_BLOCK
        >>   52 LOAD_CONST               2 (None)
             55 RETURN_VALUE
comprehension 7.08145213127
  1           0 BUILD_LIST               0
              3 DUP_TOP
              4 STORE_NAME               0 (_[1])
              7 LOAD_NAME                1 (range)
             10 LOAD_CONST               0 (20)
             13 CALL_FUNCTION            1
             16 GET_ITER
        >>   17 FOR_ITER                17 (to 37)
             20 STORE_NAME               2 (i)
             23 LOAD_NAME                0 (_[1])
             26 LOAD_NAME                2 (i)
             29 LOAD_CONST               1 (2)
             32 BINARY_MULTIPLY
             33 LIST_APPEND
             34 JUMP_ABSOLUTE           17
        >>   37 DELETE_NAME              0 (_[1])
             40 STORE_NAME               3 (result)
             43 LOAD_CONST               2 (None)
             46 RETURN_VALUE

List comprehensions perform better here because you don’t need to load the append attribute off of the list (loop program, bytecode 28) and call it as a function (loop program, bytecode 38). Instead, in a comprehension, a specialized LIST_APPEND bytecode is generated for a fast append onto the result list (comprehension program, bytecode 33).

In the loop_faster program, you avoid the overhead of the append attribute lookup by hoisting it out of the loop and placing the result in a fastlocal (bytecode 9-12), so it loops more quickly; however, the comprehension uses a specialized LIST_APPEND bytecode instead of incurring the overhead of a function call, so it still trumps.

Using list comprehensions for side effects

I want to address a point that was brought up in the previous entry as to the efficiency of for loops versus list comprehensions when used purely for side effects, but I’ll discuss the subjective bit first, since that’s the least sciency part.

Readability

Simple test – if you did need the result would the comprehension be easily understood? If the answer is yes then removing the assignment on the left hand side doesn’t magically make it less readable…

Michael Foord

First of all, thanks to Michael for his excellent and thought provoking comment!

My response is that removing the use of the result does indeed make it less readable, precisely because you’re using a result-producing control flow construct where the result is not needed. I suppose I’m positing that it’s inherently confusing to do that with your syntax: there’s a looping form that doesn’t produce a result, so that should be used instead. It’s expressing your semantic intention via syntax.

For advanced Pythonistas it’s easy for figure out what’s going on at a glance, but comprehension-as-loop definitely has a "there’s more than one way to do it" smell about it, which also makes it less amenable to people learning the language.

With a viable comprehension-as-loop option, every time a user goes to write a loop that doesn’t require a result they now ask themselves, "Can I fit this into the list comprehension form?" Those mental branches are, to me, what "one way to do it" is designed to avoid. When I read Perl code, I take "mental exceptions" all the time because the author didn’t use the construct that I would have used in the same situation. Minimizing that is a good thing, so I maintain that "no result needed" should automatically imply a loop construct.

Efficiency

Consider two functions, comprehension and loop:

def loop():
    accum = []
    for i in range(20):
        accum.append(i)
    return accum
 
 
def comprehension():
    accum = []
    [accum.append(i) for i in range(20)]
    return accum

N.B. This example is comparing the efficiency of a list comprehension where the result of the comprehension is ignored to a for loop that produces no result, as is discussed in the referenced entry, Python by example: list comprehensions.

Michael Foord comments:

Your alternative for the single line, easily readable, list comprehension is four lines that are less efficient because the loop happens in the interpreter rather than in C.

However, the disassembly, obtained via dis.dis(func) looks like the following for the loop:

2           0 BUILD_LIST               0
            3 STORE_FAST               0 (accum)
 
3           6 SETUP_LOOP              33 (to 42)
            9 LOAD_GLOBAL              0 (range)
           12 LOAD_CONST               1 (20)
           15 CALL_FUNCTION            1
           18 GET_ITER
      >>   19 FOR_ITER                19 (to 41)
           22 STORE_FAST               1 (i)
 
4          25 LOAD_FAST                0 (accum)
           28 LOAD_ATTR                1 (append)
           31 LOAD_FAST                1 (i)
           34 CALL_FUNCTION            1
           37 POP_TOP
           38 JUMP_ABSOLUTE           19
      >>   41 POP_BLOCK
 
5     >>   42 LOAD_FAST                0 (accum)
           45 RETURN_VALUE

And it looks like the following for the comprehension:

2           0 BUILD_LIST               0
            3 STORE_FAST               0 (accum)
 
3           6 BUILD_LIST               0
            9 DUP_TOP
           10 STORE_FAST               1 (_[1])
           13 LOAD_GLOBAL              0 (range)
           16 LOAD_CONST               1 (20)
           19 CALL_FUNCTION            1
           22 GET_ITER
      >>   23 FOR_ITER                22 (to 48)
           26 STORE_FAST               2 (i)
           29 LOAD_FAST                1 (_[1])
           32 LOAD_FAST                0 (accum)
           35 LOAD_ATTR                1 (append)
           38 LOAD_FAST                2 (i)
           41 CALL_FUNCTION            1
           44 LIST_APPEND
           45 JUMP_ABSOLUTE           23
      >>   48 DELETE_FAST              1 (_[1])
           51 POP_TOP
 
4          52 LOAD_FAST                0 (accum)
           55 RETURN_VALUE

By looking at the bytecode instructions, we see that the list comprehension is, at a language level, actually just "syntactic sugar" for the for loop, as mentioned by nes — they both lower down into the same control flow construct at a virtual machine level, at least in CPython.

The primary difference between the two disassemblies is that a superfluous list comprehension result is stored into fastlocal 1, which is loaded (bytecode 29) and appended to (bytecode 44) each iteration, creating some additional overhead — it’s simply deleted in bytecode 48. Unless the POP_BLOCK operation (bytecode 41) of the loop disassembly is very expensive (I haven’t looked into its implementation), the comprehension disassembly is guaranteed to be less efficient.

Because of this, I believe that Michael was mistaken in referring to an overhead that results from use of a for loop versus a list comprehension for CPython. It would be interesting to perform a survey of the list comprehension optimization techniques used in various Python implementations, but optimization seems difficult outside of something like a special Cython construct, because LOAD_GLOBAL range could potentially be changed from the builtin range function. Various issues of this kind are discussed in the (very interesting) paper The effect of unrolling and inlining for Python bytecode optimizations.

Learning Python by example: list comprehensions

April 29th, 2010

My friend, who is starting to learn Python 2.x, asked me what this snippet did:

def collapse(seq):
    # Preserve order.
    uniq = []
    [uniq.append(item) for item in seq if not uniq.count(item)]
    return uniq

This is not a snippet that should be emulated (i.e. it’s bad); however, it makes me happy: there are so many things that can be informatively corrected!

What is a list comprehension?

A list comprehension is a special brackety syntax to perform a transform operation with an optional filter clause that always produces a new sequence (list) object as a result. To break it down visually, you perform:

new_range = [i * i          for i in range(5)   if i % 2 == 0]

Which corresponds to:

*result*  = [*transform*    *iteration*         *filter*     ]

The filter piece answers the question, "should this item be transformed?" If the answer is yes, then the transform piece is evaluated and becomes an element in the result. The iteration [*] order is preserved in the result.

Go ahead and figure out what you expect new_range to be in the prior example. You can double check me in the Python shell, but I think it comes out to be:

>>> new_range = [i * i for i in range(5) if i % 2 == 0]
>>> print new_range
[0, 4, 16]

If it still isn’t clicking, we can try to make the example less noisy by getting rid of the transform and filter — can you tell what this will produce?

>>> new_range = [i for i in range(5)]

So what’s wrong with that first snippet?

As we observed in the previous section, a list comprehension always produces a result list, where the elements of the result list are the transformed elements of the iteration. That means, if there’s no filter piece, there are exactly as many result elements as there were iteration elements.

Weird thing number one about the snippet — the list comprehension result is unused. It’s created, mind you — list comprehension always create a value, even if you don’t care what it is — but it just goes off to oblivion. (In technical terms, it becomes garbage.) When you don’t need the result, just use a for loop! This is better:

def colapse(seq):
    """Preserve order."""
    uniq = []
    for item in seq:
        if not uniq.count(item):
            uniq.append(item)
    return uniq

It’s two more lines, but it’s less weird looking and wasteful. "Better for everybody who reads and runs your code," means you should do it.

Moral of the story: a list comprehension isn’t just, "shorthand for a loop." It’s shorthand for a transform from an input sequence to an output sequence with an optional filter. If it gets too complex or weird looking, just make a loop. It’s not that hard and readers of your code will thank you.

Weird thing number two: the transform, list.append(item), produces None as its output value, because the return value from list.append is always None. Therefore, the result, even though it isn’t kept anywhere, is a list of None values of the same length as seq (notice that there’s no filter clause).

Weird thing number three: list.count(item) iterates over every element in the list looking for things that == to item. If you think through the case where you call collapse on an entirely unique sequence, you can tell that the collapse algorithm is O(n2). In fact, it’s even worse than it may seem at first glance, because count will keep going all the way to the end of uniq, even if it finds item in the first index of uniq. What the original author really wanted was item not in uniq, which bails out early if it finds item in uniq.

Also worth mentioning for the computer-sciency folk playing along at home: if all elements of the sequence are comparable, you can bring that down to O(n * log n) by using a "shadow" sorted sequence and bisecting to test for membership. If the sequence is hashable you can bring it down to O(n), perhaps by using the set datatype if you are in Python >= 2.3. Note that the common cases of strings, numbers, and tuples (any built-in immutable datatype, for that matter) are hashable.

From Python history

It’s interesting to note that Python Enhancement Proposal (PEP) #270 considered putting a uniq function into the language distribution, but withdrew it with the following statement:

Removing duplicate elements from a list is a common task, but there are only two reasons I can see for making it a built-in. The first is if it could be done much faster, which isn’t the case. The second is if it makes it significantly easier to write code. The introduction of sets.py eliminates this situation since creating a sequence without duplicates is just a matter of choosing a different data structure: a set instead of a list.

Remember that sets can only contain hashable elements (same policy as dictionary keys) and are therefore not suitable for all uniq-ifying tasks, as mentioned in the last paragraph of the previous section.

Footnotes

[*] "Iteration" is just a fancy word for "step through the sequence, element by element, and give that element a name." In our case we’re giving the name i.

Thoughts on the effect and effectiveness of ranting

April 11th, 2010

I’m intrigued by posts like this fix-our-goddamn-Rails-tickets-you-lazy-non-contributing-user rant and Zed’s shut-up-and-learn-C-you-noob rant. I find them relatively tactless, as I’m sure others do, but that says nothing about their effectiveness.

As a writer, it’s a classic value judgment: one one hand, you’re lowering the level of discourse; on the other hand, that may be what’s necessary to reach the pathologically uninformed. From a writer’s perspective, how many new Rails contributors or interested C programmers would justify that kind of entry? I’m reminded of an ESR post I read a while back:

I listened to the others on the channel offer polite, reasoned, factually correct counterarguments to this guy, and get nowhere. And suddenly…suddenly, I understood why. It was because the beliefs the ignoramus was spouting were only surface structure; refuting them one-by-one could do no good without directly confronting the substructure, the emotional underpinnings that made ignoramus unable to consider or evaluate counter-evidence.

Fighting against people’s irrationally-held positions with well-reasoned arguments can be ineffective. When you combine this phenomenon with the web’s insatiable attention to conflict and drama, trollish writing is a predictable and potentially effective result. Especially on the web, infamous and famous aren’t far apart.