Notes from the JS pit: lofty goals, humble beginnings
Mozillians appear to try to blog on a regular basis, so I'll be starting a series of entries prefixed "notes from the JS pit" to explain what I've been working on and/or thinking about.
Notably, I feel fortunate to work for a company that encourages this kind of openness.
I always feel stupid writing down goals — they seem so self-evident; however, it helps to put the work I'm doing into perspective and gives me something that I can refer back to.
I'm also a big believer in the effectiveness of public accountability, so publishing those goals seems prudent — my notes from the JS pit are poised to help me stay motivated more than anything else.
My goals are to:
Do my part to meet and exceed performance targets in the JS engine. It's an exciting time with lots of healthy competition to motivate this.
Immerse myself in language design concepts and seek out important implementation details, even if those details don't pertain to something that I'm working on directly. This also entails some independent research and small projects to gain better understanding of foreign language concepts.
Deliberately take time to think constructively about alternative or innovative ways to accomplish our language goals. Taking things at face value as "the way we've always done it" is the way that innovative things get overlooked — this is relatively easy to do with a pair of fresh eyes, but similarly difficult because you're the team noob.
Working with compiler engineers from diverse language backgrounds, it's prime time for sucking knowledge out of people's heads, comparing and contrasting it, and listening to them argue with each other. Heck, just look at the concepts behind JS: an imperative, Scheme-inspired, prototypal, C-and-Java-syntax conforming language that's irrevocably tied to a practical platform, the web. It's bound to be a fun ride.
From start to present
I started off implementing the simpler opcodes for JaegerMonkey (JM) and getting an understanding of the JM code base. Not too long into it, I was told that looking into quick-and-easy parser optimizations was a priority — somebody had reported that a significant fraction of the GMail load bar time could be attributed to JS parsing. [*]
In pursuing JS parser optimization I assembled a suite of parsing benchmarks from sites on the web with "large" JS payloads — I call this suite parsemark. After getting some speedups from simple inlining, I attempted a somewhat fundamental change to the parser to reduce the number of branch mispredictions, by converting it to always have a token "pre-lexed" as opposed to the prior "lex-on-demand" model. Roughly, this required adding a "there's always a lexed token" invariant to the lexer and hoisting lexer calls/modesets from substitution rules into their referring nonterminals in the parser. The details for this are also entry fodder. Sadly, it demonstrated negligible performance gains for the increase in complexity. Sure taught me a lot about our parser, though.
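The pre-lexed invariant can be sketched in a few lines (a toy illustration in Python, not the actual SpiderMonkey tokenizer):

```python
def tokenize(src):
    # Hypothetical trivial lexer: whitespace-separated tokens.
    return iter(src.split())

class PreLexedStream:
    """Maintains the invariant that `current` always holds a lexed token
    (or None at end of input), so the parser can branch on it directly
    instead of asking "do I have a token yet?" on every production."""

    def __init__(self, src):
        self.tokens = tokenize(src)
        self.current = next(self.tokens, None)  # establish the invariant

    def advance(self):
        # Return the current token and immediately pre-lex the next one.
        tok, self.current = self.current, next(self.tokens, None)
        return tok

stream = PreLexedStream("var x = 1")
print(stream.current)    # the parser can peek without a lex-on-demand call
print(stream.advance())
print(stream.current)
```

Under lex-on-demand, every peek site needs a "have we lexed a token yet?" branch; the invariant moves that work to one place, at the cost of hoisting lexer calls into the referring nonterminals.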
The biggest performance win was obtained through a basic fix to our parser arena-allocation chunk sizing. sayrer noticed that a surprising amount of time was being spent in kernel space, so we tracked the issue down. It was frustrating to work for a few weeks on a fundamental change and then realize that multiplying a constant by four can get you a 20% parsing speedup, but I've certainly learned to look a lot more closely at the vmem subsystem when things are slow. I have some speedup statistics and a comparison to V8 (with all its lazy parsing and parse-caching bits ripped out), but I don't have much faith that my environment hasn't changed in the course of all the historical data measurements — writing a script to verify speedup over changesets seems like a viable option for future notes.
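To see why chunk sizing has that kind of leverage, consider a toy bump allocator (a Python sketch of the idea, not SpiderMonkey's arena code): each chunk models one request to the underlying allocator, so quadrupling the chunk size cuts the number of requests, and thus the time spent in the vmem subsystem, by roughly four.

```python
class Arena:
    """Toy arena/bump allocator; each appended chunk models one trip to
    the underlying allocator (in a real engine, a kernel-level request)."""

    def __init__(self, chunk_size):
        self.chunk_size = chunk_size
        self.chunks = []
        self.offset = chunk_size  # forces a new chunk on the first alloc

    def alloc(self, nbytes):
        assert nbytes <= self.chunk_size
        if self.offset + nbytes > self.chunk_size:
            self.chunks.append(bytearray(self.chunk_size))
            self.offset = 0
        start = self.offset
        self.offset += nbytes
        return memoryview(self.chunks[-1])[start:start + nbytes]

small, big = Arena(1024), Arena(4096)
for _ in range(1000):   # simulate a parse allocating many small nodes
    small.alloc(32)
    big.alloc(32)
print(len(small.chunks), len(big.chunks))  # → 32 8
```

Same total bytes allocated, four times fewer chunk requests with the larger size.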
In the miscellany department, I've been trying to do a good amount of work fixing broken windows via cleanup patches. I'm finding it difficult to strike a balance here, since there's a lot of modularity-breaking interdependencies in the code base — what appear to be simple cleanups tend to unravel into large patches that get stale easily. However, cleanup does force you to read through the code you're modifying, which is always good when you're learning a new code base.
Looking back on it, it doesn't seem like a lot of work; of course, my hope is that the time I spend up-front getting accustomed to the codebase will let me make progress on my goals more rapidly.
Stay tuned for more JS pittage — unpredictable time, but predictable channel.
Efficiency of list comprehensions
I'm psyched about the awesome comments on my previous entry, Python by example: list comprehensions. Originally this entry was just a response to those comments, but people who stumbled across this entry on the interwebz found the response format too confusing, so I've restructured it for posterity.
Efficiency of the more common usage
Let's look at the efficiency of list comprehensions in the more common usage, where the comprehension's list result is actually relevant (or, in compiler-speak, live-out).
Using the following program, you can see the time spent in each implementation and the corresponding bytecode sequence:
import dis
import timeit

programs = dict(
    loop="""
result = []
for i in range(20):
    result.append(i * 2)
""",
    loop_faster="""
result = []
add = result.append
for i in range(20):
    add(i * 2)
""",
    comprehension='result = [i * 2 for i in range(20)]',
)

for name, text in programs.iteritems():
    print name, timeit.Timer(stmt=text).timeit()
    code = compile(text, '<string>', 'exec')
    dis.disassemble(code)
  2           0 BUILD_LIST               0
              3 STORE_NAME               0 (result)

  3           6 SETUP_LOOP              37 (to 46)
              9 LOAD_NAME                1 (range)
             12 LOAD_CONST               0 (20)
             15 CALL_FUNCTION            1
             18 GET_ITER
        >>   19 FOR_ITER                23 (to 45)
             22 STORE_NAME               2 (i)

  4          25 LOAD_NAME                0 (result)
             28 LOAD_ATTR                3 (append)
             31 LOAD_NAME                2 (i)
             34 LOAD_CONST               1 (2)
             37 BINARY_MULTIPLY
             38 CALL_FUNCTION            1
             41 POP_TOP
             42 JUMP_ABSOLUTE           19
        >>   45 POP_BLOCK
        >>   46 LOAD_CONST               2 (None)
             49 RETURN_VALUE
  2           0 BUILD_LIST               0
              3 STORE_NAME               0 (result)

  3           6 LOAD_NAME                0 (result)
              9 LOAD_ATTR                1 (append)
             12 STORE_NAME               2 (add)

  4          15 SETUP_LOOP              34 (to 52)
             18 LOAD_NAME                3 (range)
             21 LOAD_CONST               0 (20)
             24 CALL_FUNCTION            1
             27 GET_ITER
        >>   28 FOR_ITER                20 (to 51)
             31 STORE_NAME               4 (i)

  5          34 LOAD_NAME                2 (add)
             37 LOAD_NAME                4 (i)
             40 LOAD_CONST               1 (2)
             43 BINARY_MULTIPLY
             44 CALL_FUNCTION            1
             47 POP_TOP
             48 JUMP_ABSOLUTE           28
        >>   51 POP_BLOCK
        >>   52 LOAD_CONST               2 (None)
             55 RETURN_VALUE
  1           0 BUILD_LIST               0
              3 DUP_TOP
              4 STORE_NAME               0 (_)
              7 LOAD_NAME                1 (range)
             10 LOAD_CONST               0 (20)
             13 CALL_FUNCTION            1
             16 GET_ITER
        >>   17 FOR_ITER                17 (to 37)
             20 STORE_NAME               2 (i)
             23 LOAD_NAME                0 (_)
             26 LOAD_NAME                2 (i)
             29 LOAD_CONST               1 (2)
             32 BINARY_MULTIPLY
             33 LIST_APPEND
             34 JUMP_ABSOLUTE           17
        >>   37 DELETE_NAME              0 (_)
             40 STORE_NAME               3 (result)
             43 LOAD_CONST               2 (None)
             46 RETURN_VALUE
List comprehensions perform better here because you don’t need to load the append attribute off of the list (loop program, bytecode 28) and call it as a function (loop program, bytecode 38). Instead, in a comprehension, a specialized LIST_APPEND bytecode is generated for a fast append onto the result list (comprehension program, bytecode 33).
In the loop_faster program, you avoid the overhead of the append attribute lookup by hoisting it out of the loop and placing the bound method in a local name (bytecodes 9-12), so it loops more quickly; however, the comprehension uses a specialized LIST_APPEND bytecode instead of incurring the overhead of a function call, so it still trumps.
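As a sanity check, the claim is easy to reproduce with timeit (written in current Python syntax here; exact numbers depend on your machine and interpreter version, but the ordering should hold):

```python
import timeit

# The loop appends through an attribute lookup plus a function call each
# iteration; the comprehension uses the specialized append bytecode.
loop_stmt = """
result = []
for i in range(20):
    result.append(i * 2)
"""
comp_stmt = "result = [i * 2 for i in range(20)]"

loop_time = timeit.timeit(loop_stmt, number=100000)
comp_time = timeit.timeit(comp_stmt, number=100000)
print("loop:         ", loop_time)
print("comprehension:", comp_time)
```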
Using list comprehensions for side effects
I want to address a point that was brought up in the previous entry as to the efficiency of for loops versus list comprehensions when used purely for side effects, but I'll discuss the subjective bit first, since that's the least sciency part.
Simple test – if you did need the result would the comprehension be
easily understood? If the answer is yes then removing the assignment on the
left hand side doesn’t magically make it less readable…
First of all, thanks to Michael for his excellent and thought provoking comment!
My response is that removing the use of the result does indeed make it less readable, precisely because you're using a result-producing control flow construct where the result is not needed. I suppose I'm positing that it's inherently confusing to do that with your syntax: there's a looping form that doesn't produce a result, so that should be used instead. It's expressing your semantic intention via syntax.
For advanced Pythonistas it's easy to figure out what's going on at a glance, but comprehension-as-loop definitely has a "there's more than one way to do it" smell about it, which also makes it less amenable to people learning the language.
With a viable comprehension-as-loop option, every time a user goes to write a loop that doesn't require a result they now ask themselves, "Can I fit this into the list comprehension form?" Those mental branches are, to me, what "one way to do it" is designed to avoid. When I read Perl code, I take "mental exceptions" all the time because the author didn't use the construct that I would have used in the same situation. Minimizing that is a good thing, so I maintain that "no result needed" should automatically imply a loop construct.
Consider two functions, comprehension and loop:

def loop():
    accum = []
    for i in range(20):
        accum.append(i)
    return accum

def comprehension():
    accum = []
    [accum.append(i) for i in range(20)]
    return accum
N.B. This example is comparing the efficiency of a list comprehension where the result of the comprehension is ignored to a for loop that produces no result, as is discussed in the referenced entry, Python by example: list comprehensions.
Michael Foord comments:
Your alternative for the single line, easily readable, list comprehension
is four lines that are less efficient because the loop happens in the
interpreter rather than in C.
However, the disassembly, obtained via dis.dis(func), looks like the following for the loop:
  2           0 BUILD_LIST               0
              3 STORE_FAST               0 (accum)

  3           6 SETUP_LOOP              33 (to 42)
              9 LOAD_GLOBAL              0 (range)
             12 LOAD_CONST               1 (20)
             15 CALL_FUNCTION            1
             18 GET_ITER
        >>   19 FOR_ITER                19 (to 41)
             22 STORE_FAST               1 (i)

  4          25 LOAD_FAST                0 (accum)
             28 LOAD_ATTR                1 (append)
             31 LOAD_FAST                1 (i)
             34 CALL_FUNCTION            1
             37 POP_TOP
             38 JUMP_ABSOLUTE           19
        >>   41 POP_BLOCK

  5     >>   42 LOAD_FAST                0 (accum)
             45 RETURN_VALUE
And it looks like the following for the comprehension:
  2           0 BUILD_LIST               0
              3 STORE_FAST               0 (accum)

  3           6 BUILD_LIST               0
              9 DUP_TOP
             10 STORE_FAST               1 (_)
             13 LOAD_GLOBAL              0 (range)
             16 LOAD_CONST               1 (20)
             19 CALL_FUNCTION            1
             22 GET_ITER
        >>   23 FOR_ITER                22 (to 48)
             26 STORE_FAST               2 (i)
             29 LOAD_FAST                1 (_)
             32 LOAD_FAST                0 (accum)
             35 LOAD_ATTR                1 (append)
             38 LOAD_FAST                2 (i)
             41 CALL_FUNCTION            1
             44 LIST_APPEND
             45 JUMP_ABSOLUTE           23
        >>   48 DELETE_FAST              1 (_)
             51 POP_TOP

  4          52 LOAD_FAST                0 (accum)
             55 RETURN_VALUE
By looking at the bytecode instructions, we see that the list comprehension is, at a language level, actually just "syntactic sugar" for the for loop, as mentioned by nes — they both lower down into the same control flow construct at a virtual machine level, at least in CPython.
The primary difference between the two disassemblies is that a superfluous list comprehension result is stored into fastlocal 1, which is loaded (bytecode 29) and appended to (bytecode 44) each iteration, creating some additional overhead — it's simply deleted in bytecode 48. Unless the POP_BLOCK operation (bytecode 41) of the loop disassembly is very expensive (I haven't looked into its implementation), the comprehension disassembly is guaranteed to be less efficient.
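Timing the two functions makes that overhead measurable (current Python syntax; exact numbers are machine-dependent):

```python
import timeit

# The two functions from above; both produce the same list, so any
# timing difference is pure comprehension bookkeeping overhead.
setup = """
def loop():
    accum = []
    for i in range(20):
        accum.append(i)
    return accum

def comprehension():
    accum = []
    [accum.append(i) for i in range(20)]
    return accum
"""

print("loop:         ", timeit.timeit("loop()", setup=setup, number=100000))
print("comprehension:", timeit.timeit("comprehension()", setup=setup, number=100000))
```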
Because of this, I believe that Michael was mistaken in referring to an overhead that results from use of a for loop versus a list comprehension for CPython. It would be interesting to perform a survey of the list comprehension optimization techniques used in various Python implementations, but optimization seems difficult outside of something like a special Cython construct, because LOAD_GLOBAL range could potentially be changed from the builtin range function. Various issues of this kind are discussed in the (very interesting) paper The effect of unrolling and inlining for Python bytecode optimizations.
Learning Python by example: list comprehensions
My friend, who is starting to learn Python 2.x, asked me what this snippet did:
# Preserve order.
uniq = []
[uniq.append(item) for item in seq if not uniq.count(item)]
This is not a snippet that should be emulated (i.e. it's bad); however, it
makes me happy: there are so many things that can be informatively corrected!
What is a list comprehension?
A list comprehension is a special brackety syntax to perform a transform
operation with an optional filter clause that always produces a new sequence
(list) object as a result. To break it down visually, you perform:
new_range = [i * i for i in range(5) if i % 2 == 0]
Which corresponds to:
*result* = [*transform* *iteration* *filter* ]
The filter piece answers the question, "should this item be transformed?" If
the answer is yes, then the transform piece is evaluated and becomes an
element in the result. The iteration [*] order is preserved in the result list.
Go ahead and figure out what you expect new_range to be in the prior example. You can double-check me in the Python shell, but I think it comes out like so:
>>> new_range = [i * i for i in range(5) if i % 2 == 0]
>>> print new_range
[0, 4, 16]
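To make the pieces concrete, the same computation can be spelled out as an explicit loop (a conceptual sketch of what the comprehension computes, not literally what the interpreter generates):

```python
# result = [transform for item in iteration if filter]
new_range = []
for i in range(5):               # iteration
    if i % 2 == 0:               # filter: should this item be transformed?
        new_range.append(i * i)  # transform

print(new_range)  # → [0, 4, 16]
```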
If it still isn't clicking, we can try to make the example less noisy by
getting rid of the transform and filter — can you tell what this will produce?
>>> new_range = [i for i in range(5)]
So what's wrong with that first snippet?
As we observed in the previous section, a list comprehension always produces a
result list, where the elements of the result list are the transformed
elements of the iteration. That means, if there's no filter piece, there
are exactly as many result elements as there were iteration elements.
Weird thing number one about the snippet — the list comprehension result is
unused. It's created, mind you — list comprehensions always create a value,
even if you don't care what it is — but it just goes off to oblivion. (In
technical terms, it becomes garbage.) When you don't need the result, just
use a for loop! This is better:
uniq = []
for item in seq:
    if not uniq.count(item):
        uniq.append(item)
It's two more lines, but it's less weird looking and wasteful. "Better for
everybody who reads and runs your code," means you should do it.
Moral of the story: a list comprehension isn't just, "shorthand for a loop."
It's shorthand for a transform from an input sequence to an output sequence
with an optional filter. If it gets too complex or weird looking, just make a
loop. It's not that hard and readers of your code will thank you.
Weird thing number two: the transform, list.append(item), produces None
as its output value, because the return value from list.append is always
None. Therefore, the result, even though it isn't kept anywhere, is a
list of None values of the same length as seq (notice that there's no filter piece).
Weird thing number three: list.count(item) iterates over every element in
the list looking for things that == to item. If you think through
the case where you call collapse on an entirely unique sequence, you can
tell that the collapse algorithm is O(n^2). In fact, it's even worse
than it may seem at first glance, because count will keep going all the way
to the end of uniq, even if it finds item in the first index of
uniq. What the original author really wanted was item not in uniq,
which bails out early if it finds item in uniq.
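Putting the fixes together, the loop-based version with the early-exit membership test looks like this (a sketch of the corrected snippet; the function wrapper and its name are mine):

```python
def uniq(seq):
    """Collapse duplicates while preserving first-seen order."""
    result = []
    for item in seq:
        if item not in result:   # bails out at the first match, unlike count()
            result.append(item)
    return result

print(uniq([1, 2, 1, 3, 2, 1]))  # → [1, 2, 3]
```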
Also worth mentioning for the computer-sciency folk playing along at home: if
all elements of the sequence are comparable, you can bring that down to O(n *
log n) by using a "shadow" sorted sequence and bisecting to test for
membership. If the sequence is hashable you can bring it down to O(n), perhaps
by using the set datatype if you are in Python >= 2.3. Note that the common
cases of strings, numbers, and tuples (any built-in immutable datatype, for
that matter) are hashable.
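Here are sketches of both ideas (the function names are mine; note that a plain list insert keeps the shadow version from reaching a true O(n log n) bound, as the comment explains):

```python
import bisect

def uniq_sorted_shadow(seq):
    """Order-preserving uniq using a sorted "shadow" list so membership
    tests are O(log n) bisections (elements need only be comparable).
    Caveat: list.insert is O(n), so a balanced tree would be needed to
    actually reach O(n log n); this just sketches the idea."""
    shadow, result = [], []
    for item in seq:
        i = bisect.bisect_left(shadow, item)
        if i == len(shadow) or shadow[i] != item:
            shadow.insert(i, item)
            result.append(item)
    return result

def uniq_hashed(seq):
    """O(n) order-preserving uniq for hashable elements, using a set."""
    seen = set()
    result = []
    for item in seq:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result

print(uniq_sorted_shadow([3, 1, 3, 2, 1]))  # → [3, 1, 2]
print(uniq_hashed([3, 1, 3, 2, 1]))         # → [3, 1, 2]
```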
From Python history
It's interesting to note that Python Enhancement Proposal (PEP) #270
considered putting a uniq function into the language distribution, but
withdrew it with the following statement:
Removing duplicate elements from a list is a common task, but
there are only two reasons I can see for making it a built-in.
The first is if it could be done much faster, which isn't the
case. The second is if it makes it significantly easier to
write code. The introduction of sets.py eliminates this
situation since creating a sequence without duplicates is just
a matter of choosing a different data structure: a set instead
of a list.
Remember that sets can only contain hashable elements (same policy as
dictionary keys) and are therefore not suitable for all uniq-ifying tasks, as
mentioned in the last paragraph of the previous section.
Thoughts on the effect and effectiveness of ranting
I'm intrigued by posts like this
fix-our-goddamn-Rails-tickets-you-lazy-non-contributing-user rant and
Zed's shut-up-and-learn-C-you-noob rant. I find them relatively tactless,
as I'm sure others do, but that says nothing about their effectiveness.
As a writer, it's a classic value judgment: on one hand, you're lowering the
level of discourse; on the other hand, that may be what's necessary to reach
the pathologically uninformed. From a writer's perspective, how many new Rails
contributors or interested C programmers would justify that kind of entry? I'm
reminded of an ESR post I read a while back:
I listened to the others on the channel offer polite, reasoned, factually
correct counterarguments to this guy, and get nowhere. And
suddenly…suddenly, I understood why. It was because the beliefs the
ignoramus was spouting were only surface structure; refuting them
one-by-one could do no good without directly confronting the substructure,
the emotional underpinnings that made the ignoramus unable to consider or…
Fighting against people's irrationally-held positions with well-reasoned
arguments can be ineffective. When you combine this phenomenon with the web's
insatiable attention to conflict and drama, trollish writing is a predictable
and potentially effective result. Especially on the web, infamous and
famous aren't far apart.
Code ☃ Unicode
Let's come to terms: angle brackets and forward slashes are overloaded. Between relational operators, templates, XML tags, (HTML/squiggly brace language) comments, division, regular expressions, and path separators, what don't they do?
I think it's clear to everyone that XML is the best and most human readable markup format ever conceived (data serialization and database backing store applications heartily included), so it's time for all that crufty old junk from yesteryear to learn its place. Widely adopted web standards (such as Binary XML and E4X) and well specified information exchange protocols (such as SOAP) speak for themselves through the synergy they've utilized in enterprise compute environments.
The results of a confidential survey I conducted conclusively demonstrate beyond any possibility of refutation that you type more angle brackets in an average markup document than you will type angle-bracket relational operators for the next ten years.
In conclusion, your life expectancy decreases as you continue to use the less-than operator and forward slash instead of accepting XML into your heart as a first-class syntax. I understand that some may not enjoy life or the pursuit of happiness and that they will continue to use deprecated syntaxes. To each their own.
The (intolerably whitespace-sensitive) Python programming language nearly came to a similar conclusion to use unicode more pervasively, while simultaneously making it a real programming language by way of the use of types, but did not have the wherewithal to see it through.
Of course, I'd also recommend that C++ solve its nested template delimiter issue with ☃ and ☼ to close instead of increasing the context-sensitivity of the parser. [*] It follows the logical flow of start/end delimiting.
As soon as Emoji are accepted as proper unicode code points, I will revise my recommendation to suggest using the standard poo emoticon for a template start delimiter, because increased giggling is demonstrated to reduce the likelihood of head-and-wall involved injuries during C++ compilation, second only to regular use of head protection while programming.