<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Honest to a Segfault &#187; Programming</title>
	<atom:link href="http://blog.cdleary.com/category/programming/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.cdleary.com</link>
	<description>__author__ = &#039;Chris Leary&#039;</description>
	<lastBuildDate>Sun, 05 Sep 2010 20:41:11 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
		<item>
		<title>A prototypal binding trap</title>
		<link>http://blog.cdleary.com/2010/09/a-prototypal-binding-trap/</link>
		<comments>http://blog.cdleary.com/2010/09/a-prototypal-binding-trap/#comments</comments>
		<pubDate>Sun, 05 Sep 2010 20:29:30 +0000</pubDate>
		<dc:creator>cdleary</dc:creator>
				<category><![CDATA[JavaScript]]></category>
		<category><![CDATA[Languages]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[Thoughts]]></category>
		<category><![CDATA[Prototypes]]></category>
		<category><![CDATA[Trap]]></category>

		<guid isPermaLink="false">http://blog.cdleary.com/?p=905</guid>
		<description><![CDATA[It always pains me to explain these little identifier resolution traps: #!/usr/bin/env python3 &#160; class Egg: &#160; _next_id = 1 &#160; def __init__&#40;self&#41;: self.id = self._next_id self._next_id += 1 assert Egg._next_id is self._next_id &#160; &#160; if __name__ == '__main__': f = Egg&#40;&#41; Fails the assertion. It&#8217;s decomposing the assignment-update into its constituent tmp = self._next_id [...]]]></description>
			<content:encoded><![CDATA[<p>It always pains me to explain these little identifier resolution traps:</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #808080; font-style: italic;">#!/usr/bin/env python3</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">class</span> Egg:
&nbsp;
    _next_id = <span style="color: #ff4500;">1</span>
&nbsp;
    <span style="color: #ff7700;font-weight:bold;">def</span> <span style="color: #0000cd;">__init__</span><span style="color: black;">&#40;</span><span style="color: #008000;">self</span><span style="color: black;">&#41;</span>:
        <span style="color: #008000;">self</span>.<span style="color: #008000;">id</span> = <span style="color: #008000;">self</span>._next_id
        <span style="color: #008000;">self</span>._next_id += <span style="color: #ff4500;">1</span>
        <span style="color: #ff7700;font-weight:bold;">assert</span> Egg._next_id <span style="color: #ff7700;font-weight:bold;">is</span> <span style="color: #008000;">self</span>._next_id
&nbsp;
&nbsp;
<span style="color: #ff7700;font-weight:bold;">if</span> __name__ == <span style="color: #483d8b;">'__main__'</span>:
    f = Egg<span style="color: black;">&#40;</span><span style="color: black;">&#41;</span></pre></div></div>

<p>Fails the assertion. Python decomposes the assignment-update into its constituent <tt class="docutils literal"><span class="pre">tmp</span> <span class="pre">=</span> <span class="pre">self._next_id</span> <span class="pre">+</span> <span class="pre">1;</span> <span class="pre">self._next_id</span> <span class="pre">=</span> <span class="pre">tmp</span></tt> components: the read of <tt>self._next_id</tt> falls through to the class member, while the store binds a new member on the instance. A programmer could reasonably expect a Lookup/Update hash map ADT operation to occur instead (get the slot by lookup, mutate the value found, and store back in an atomic sense, clobbering the class member with the updated value), but that&#8217;s <strong>not</strong> how it works.</p>
<p>This goes with the prototypal territory:</p>

<div class="wp_syntax"><div class="code"><pre class="javascript" style="font-family:monospace;"><span style="color: #003366; font-weight: bold;">function</span> Egg<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
    <span style="color: #000066; font-weight: bold;">this</span>.<span style="color: #660066;">id</span> <span style="color: #339933;">=</span> <span style="color: #000066; font-weight: bold;">this</span>.<span style="color: #660066;">next_id</span><span style="color: #339933;">;</span>
    <span style="color: #000066; font-weight: bold;">this</span>.<span style="color: #660066;">next_id</span> <span style="color: #339933;">+=</span> <span style="color: #CC0000;">1</span><span style="color: #339933;">;</span>
    assertEq<span style="color: #009900;">&#40;</span><span style="color: #000066; font-weight: bold;">this</span>.<span style="color: #660066;">next_id</span><span style="color: #339933;">,</span> Egg.<span style="color: #660066;">prototype</span>.<span style="color: #660066;">next_id</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span>
&nbsp;
Egg.<span style="color: #660066;">prototype</span>.<span style="color: #660066;">next_id</span> <span style="color: #339933;">=</span> <span style="color: #CC0000;">1</span><span style="color: #339933;">;</span>
<span style="color: #003366; font-weight: bold;">var</span> e <span style="color: #339933;">=</span> <span style="color: #003366; font-weight: bold;">new</span> Egg<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span> <span style="color: #006600; font-style: italic;">// Error: Assertion failed: got 2, expected 1</span></pre></div></div>

<p>Fortunately, this behavior has exactly the semantics you want once you add inheritance to the mix. Is the <tt class="docutils literal"><span class="pre">self.viscosity</span></tt> that this class implementation refers to a class member or an instance member of the base class?</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #808080; font-style: italic;">#!/usr/bin/env python3</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">import</span> sauce
&nbsp;
<span style="color: #ff7700;font-weight:bold;">class</span> AwesomeSauce<span style="color: black;">&#40;</span>sauce.<span style="color: black;">Sauce</span><span style="color: black;">&#41;</span>:
&nbsp;
    <span style="color: #ff7700;font-weight:bold;">def</span> <span style="color: #0000cd;">__init__</span><span style="color: black;">&#40;</span><span style="color: #008000;">self</span><span style="color: black;">&#41;</span>:
        <span style="color: #008000;">super</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>.<span style="color: #0000cd;">__init__</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
        <span style="color: #008000;">self</span>.<span style="color: black;">viscosity</span> += <span style="color: #ff4500;">12</span> <span style="color: #808080; font-style: italic;"># Deca-viscoses per milliliter.</span></pre></div></div>

<p>The answer is that you don&#8217;t care &#8212; it&#8217;s being rebound on the <tt class="docutils literal"><span class="pre">AwesomeSauce</span></tt> instance no matter what.</p>
<p>Moral of the story is to be mindful when using update operations on class members &#8212; if you want to rebind a class member within an instance method, you&#8217;ve got to use the class name instead of <tt>self</tt>.</p>
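<p>The same moral carries over to the JavaScript version: name the prototype instead of <tt>this</tt>. What follows is only a minimal sketch, and the variable names <tt>first</tt> and <tt>second</tt> are made up for illustration:</p>

```javascript
function Egg() {
    this.id = Egg.prototype.next_id;
    // Update the shared slot explicitly. Going through `this` would create
    // an own property on the instance that shadows the prototype's slot.
    Egg.prototype.next_id += 1;
}

Egg.prototype.next_id = 1;

var first = new Egg();
var second = new Egg();
```

<p>Each instance now gets its own <tt>id</tt>, and no instance grows a shadowing <tt>next_id</tt> of its own.</p>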
]]></content:encoded>
			<wfw:commentRss>http://blog.cdleary.com/2010/09/a-prototypal-binding-trap/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Coding style as a feature of language design</title>
		<link>http://blog.cdleary.com/2010/07/coding-style-as-a-feature-of-language-design/</link>
		<comments>http://blog.cdleary.com/2010/07/coding-style-as-a-feature-of-language-design/#comments</comments>
		<pubDate>Thu, 29 Jul 2010 16:00:09 +0000</pubDate>
		<dc:creator>cdleary</dc:creator>
				<category><![CDATA[Languages]]></category>
		<category><![CDATA[Mozilla]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[Thoughts]]></category>
		<category><![CDATA[Compilers]]></category>
		<category><![CDATA[Data Structures]]></category>
		<category><![CDATA[Parsing]]></category>
		<category><![CDATA[Syntax]]></category>

		<guid isPermaLink="false">http://blog.cdleary.com/?p=886</guid>
		<description><![CDATA[roc recently posted a thought-provoking entry titled, &#34;Coding Style as a Failure of Language Design&#34;, in which he states: Languages already make rules about syntax that are somewhat arbitrary. Projects imposing additional syntax restrictions indicate that the language did not constrain the syntax enough; if the language syntax was sufficiently constrained, projects would not feel [...]]]></description>
			<content:encoded><![CDATA[<p><a class="reference external" href="http://weblogs.mozillazine.org/roc/">roc</a> recently posted a thought-provoking entry titled, <a class="reference external" href="http://weblogs.mozillazine.org/roc/archives/2010/07/coding_style_as.html">&quot;Coding Style as a Failure of Language Design&quot;</a>, in which he states:</p>
<blockquote><p>
Languages already make rules about syntax that are somewhat arbitrary. Projects imposing additional syntax restrictions indicate that the language did not constrain the syntax enough; if the language syntax was sufficiently constrained, projects would not feel the need to do it. Syntax would be uniform within and across projects, and developers would not need to learn multiple variants of the same language.</p></blockquote>
<p>I totally agree with roc&#8217;s point that there is overhead in learning-and-conforming-to local style guidelines. I also agree that this overhead is unnecessary and that language implementers should find ways to eliminate it; however, I think that imposing additional arbitrary constraints on the syntax is heading in the wrong direction.</p>
<p>Your language&#8217;s execution engine <a class="footnote-reference" href="#id6" id="id3"><tt>[*]</tt></a> already has a method of normalizing crazy styles: it forms an abstract syntax tree. Before the abstract syntax tree (AST) is mutated <a class="footnote-reference" href="#id7" id="id4"><tt>[†]</tt></a> it is in perfect correspondence with the original source text, modulo the infinite number of possible formatting preferences. This <em>is</em> the necessary set of constraints on the syntax that can actually result in your program being executed as it is written. <a class="footnote-reference" href="#id8" id="id5"><tt>[‡]</tt></a></p>
<p>So, why don&#8217;t we just lug <em>that</em> thing around instead of the source text itself?</p>
<div class="section" id="the-dream">
<h3>The dream</h3>
<p>The feature that languages should offer is a <a class="reference external" href="http://en.wikipedia.org/wiki/Multiplexer">mux/demux</a> service: mux an infinite number of formatting preferences into an AST (via a traditional parser); demux the AST into source text via an AST-decompiler, parameterized by an arbitrarily large set of formatting options. Language implementations could ship with a pair of standalone binaries. Seriously, the <em>reference language implementation</em> should understand its own formatting parameters at least as well as Eclipse does. <a class="footnote-reference" href="#id11" id="id9"><tt>[§]</tt></a></p>
<p>Once you have the demux tool, you run it on your AST files as a post-checkout hook in your revision control system for instant style personalization. If the engine accepts an AST directly as input in lieu of source text, you would only need to demux the files you planned to work on, which could even be an optimization.</p>
<p>Different execution engines are likely to use different ASTs, but there should be little problem with composability: checked-in AST goes through standalone demux with an arbitrary set of preferences, then through the alternate compiler&#8217;s mux. So long as the engines have the same language grammar for the source text, everybody&#8217;s happy, and you don&#8217;t have to waste time writing silly AST-to-AST-prime transforms.</p>
<p>In this model, linters are just composable AST observers/transforms that have no ordering dependencies. You could even offer a service for simple grammatical extensions without going so far as <a class="reference external" href="http://en.wikipedia.org/wiki/Perl_6_rules">language level support</a>. Want a block-end delimiter in the Python code you look at? <a class="footnote-reference" href="#id14" id="id12"><tt>[¶]</tt></a> Why not? Just use a transform to rip it out before it leaves the front-end of the execution engine.</p>
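<p>To make the demux half concrete, here is a toy sketch. Everything in it is an assumption made for illustration: the node shapes, the <tt>demux</tt> function, and the two preference names are invented, and a real AST-decompiler would also have to handle operator precedence, comments, and line breaking. The mux half is just the ordinary parser.</p>

```javascript
// Hypothetical AST node shapes, invented for this sketch:
//   { type: "num", value }           { type: "name", id }
//   { type: "binop", op, left, right }
//   { type: "assign", target, value }
function demux(node, prefs) {
    var pad = prefs.spaceAroundOperators ? " " : "";
    switch (node.type) {
    case "num":
        return String(node.value);
    case "name":
        return node.id;
    case "binop":
        // No parenthesization: nested binops of mixed precedence are out of scope.
        return demux(node.left, prefs) + pad + node.op + pad + demux(node.right, prefs);
    case "assign":
        return node.target + pad + "=" + pad + demux(node.value, prefs) +
               (prefs.semicolons ? ";" : "");
    default:
        throw new Error("unknown node type: " + node.type);
    }
}

// One checked-in AST...
var ast = {
    type: "assign",
    target: "area",
    value: {
        type: "binop",
        op: "*",
        left: { type: "name", id: "width" },
        right: { type: "num", value: 2 }
    }
};

// ...rendered under two different sets of personal preferences.
var roomy = demux(ast, { spaceAroundOperators: true, semicolons: true });
var terse = demux(ast, { spaceAroundOperators: false, semicolons: false });
```

<p>Here <tt>roomy</tt> comes out as <tt>area = width * 2;</tt> and <tt>terse</tt> as <tt>area=width*2</tt>, both from the same tree.</p>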
</div>
<div class="section" id="reality">
<h3>Reality</h3>
<p>Of course, the set of languages we know and love has some overlap with the set of languages that totally suck to parse, whether due to <a class="reference external" href="http://en.wikipedia.org/wiki/C_preprocessor">preprocessors</a> or <a class="reference external" href="http://en.wikipedia.org/wiki/JavaScript_syntax#Whitespace_and_semicolons">context sensitivity</a> or <a class="reference external" href="http://en.wikipedia.org/wiki/Black_Perl">the desire to parse poems</a>, but I would bet good money that there are solutions for such languages. In any case, the symmetric difference between those two sets could get with it, and new languages would be kind to follow suit. It would certainly be an interesting post-FF4 experiment for SpiderMonkey, as we&#8217;ve got a plan on file to clean up the parser interfaces for an <a class="reference external" href="http://wiki.ecmascript.org/doku.php?id=strawman:ast">intriguing ECMAScript strawman proposal</a> anywho.</p>
</div>
<div class="section" id="footnotes">
<h3>Footnotes</h3>
<table class="docutils footnote" frame="void" id="id6" rules="none">
<colgroup>
<col class="label" />
<col /></colgroup>
<tbody valign="top">
<tr>
<td class="label"><a class="fn-backref" href="#id3"><tt>[*]</tt></a></td>
<td>Interpreter, compiler, translator, whatever.</td>
</tr>
</tbody>
</table>
<table class="docutils footnote" frame="void" id="id7" rules="none">
<colgroup>
<col class="label" />
<col /></colgroup>
<tbody valign="top">
<tr>
<td class="label"><a class="fn-backref" href="#id4"><tt>[†]</tt></a></td>
<td>To do constant folding or what have you.</td>
</tr>
</tbody>
</table>
<table class="docutils footnote" frame="void" id="id8" rules="none">
<colgroup>
<col class="label" />
<col /></colgroup>
<tbody valign="top">
<tr>
<td class="label"><a class="fn-backref" href="#id5"><tt>[‡]</tt></a></td>
<td>Oh yeah, and comments. We would have to keep those around too. They&#8217;re easy enough to throw away during the first pass over the AST.</td>
</tr>
</tbody>
</table>
<table class="docutils footnote" frame="void" id="id11" rules="none">
<colgroup>
<col class="label" />
<col /></colgroup>
<tbody valign="top">
<tr>
<td class="label"><a class="fn-backref" href="#id9"><tt>[§]</tt></a></td>
<td>Even more ideal, you&#8217;d move all of that formatting and autocompletion code out of IDEs into a language service API.</td>
</tr>
</tbody>
</table>
<table class="docutils footnote" frame="void" id="id14" rules="none">
<colgroup>
<col class="label" />
<col /></colgroup>
<tbody valign="top">
<tr>
<td class="label"><a class="fn-backref" href="#id12"><tt>[¶]</tt></a></td>
<td>Presumably because you despise all that is good and righteous in the world? ;-)</td>
</tr>
</tbody>
</table>
</div>
]]></content:encoded>
			<wfw:commentRss>http://blog.cdleary.com/2010/07/coding-style-as-a-feature-of-language-design/feed/</wfw:commentRss>
		<slash:comments>11</slash:comments>
		</item>
		<item>
		<title>Notes from the JS pit: closure optimization</title>
		<link>http://blog.cdleary.com/2010/05/notes-from-the-js-pit-closure-optimization/</link>
		<comments>http://blog.cdleary.com/2010/05/notes-from-the-js-pit-closure-optimization/#comments</comments>
		<pubDate>Tue, 11 May 2010 10:01:33 +0000</pubDate>
		<dc:creator>cdleary</dc:creator>
				<category><![CDATA[JavaScript]]></category>
		<category><![CDATA[Mozilla]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[Thoughts]]></category>
		<category><![CDATA[Compilers]]></category>
		<category><![CDATA[Dijkstra display]]></category>
		<category><![CDATA[Funarg Problem]]></category>
		<category><![CDATA[JS Pit]]></category>
		<category><![CDATA[Parsing]]></category>
		<category><![CDATA[Static Analysis]]></category>

		<guid isPermaLink="false">http://blog.cdleary.com/?p=848</guid>
		<description><![CDATA[In anticipation of a much-delayed dentist appointment tomorrow morning and under the assumption that hard liquor removes plaque, I&#8217;ve produced [*] an entry in the spirit of Stevey&#8217;s Drunken Blog Rants, s/wine/scotch/g. I apologize for any and all incomprehensibility, although Stevey may not mind since it&#8217;s largely an entry about funargs, which he seems to [...]]]></description>
			<content:encoded><![CDATA[<p>In anticipation of a much-delayed dentist appointment tomorrow morning and under the assumption that hard liquor removes plaque, I&#8217;ve produced <a class="footnote-reference" href="#id4" id="id1"><tt>[*]</tt></a> an entry in the spirit of <a class="reference external" href="http://steve-yegge.blogspot.com/2008_04_01_archive.html">Stevey&#8217;s Drunken Blog Rants</a>, <tt class="docutils literal"><span class="pre">s/wine/scotch/g</span></tt>. I apologize for any and all incomprehensibility, although Stevey may not mind since it&#8217;s largely an entry about funargs, which he seems to have a thing for. (Not that I blame him &#8212; I&#8217;m thinking about them while drinking&#8230;) It also appears I may need to prove myself worthy of emigration to <a class="reference external" href="http://planet.mozilla.org/">planet Mozilla</a>, so hopefully an entry filled with funarg debauchery will serve that purpose as well.</p>
<div class="section" id="background">
<h3>Background</h3>
<p>Lately, I&#8217;ve been doing a little work on closure optimization, as permitted by static analysis; i.e. the parser/compiler marks which functions can be optimized into various closure forms.</p>
<p>In a language that permits nested functions and functions as first-class values, there are a few things you need to ask about each function before you optimize it:</p>
<ul class="simple">
<li>Can it <em>escape</em> (leave the scope it&#8217;s defined in)?</li>
<li>Does it refer to <em>free variables</em> (identifiers declared outside the function definition)?</li>
<li>Are the free variables it refers to potentially <em>redefined</em> after the function definition?</li>
</ul>
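<p>As a rough illustration of those three questions, here is a small example with the answers written out by hand in the comments. The function and variable names are made up, and this is not output from the engine&#8217;s analysis:</p>

```javascript
function outer() {
    var counter = 0;
    var limit = 10;

    // Cannot escape: applied immediately, never visible outside outer.
    // Refers to the free variable `limit`.
    var doubled = (function twice() { return limit * 2; })();

    // Escapes (it is handed back to the caller) and refers to the free
    // variable `limit`, which is never reassigned after this definition.
    function readLimit() { return limit; }

    // Escapes and refers to the free variable `counter`, which is
    // reassigned after the definition (on every call, in fact).
    function bump() { counter += 1; return counter; }

    return { doubled: doubled, readLimit: readLimit, bump: bump };
}

var result = outer();
```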
</div>
<div class="section" id="function-escape-the-funarg-problem">
<h3>Function escape (the funarg problem)</h3>
<p>If a function can execute outside the scope in which it was lexically defined, it is said to be a &quot;funarg&quot;, a fancy word for &quot;potentially escaping outside the scope where it&#8217;s defined&quot;. At the other extreme are functions that can never escape: we call functions in the JS runtime <em>Algol-like closures</em> if they are immediately applied function expressions, like so:</p>

<div class="wp_syntax"><div class="code"><pre class="javascript" style="font-family:monospace;"><span style="color: #003366; font-weight: bold;">function</span> outer<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
    <span style="color: #003366; font-weight: bold;">var</span> x <span style="color: #339933;">=</span> <span style="color: #CC0000;">12</span><span style="color: #339933;">;</span>
    <span style="color: #000066; font-weight: bold;">return</span> <span style="color: #009900;">&#40;</span><span style="color: #003366; font-weight: bold;">function</span> cubeX<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span> <span style="color: #000066; font-weight: bold;">return</span> x <span style="color: #339933;">*</span> x <span style="color: #339933;">*</span> x<span style="color: #339933;">;</span> <span style="color: #009900;">&#125;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span></pre></div></div>

<p>The function <tt class="docutils literal"><span class="pre">cubeX</span></tt> can never execute outside the confines of <tt class="docutils literal"><span class="pre">outer</span></tt> &#8212; there&#8217;s no way for the function definition to escape. It&#8217;s as if you just took the expression <tt class="docutils literal"><span class="pre">x</span> <span class="pre">*</span> <span class="pre">x</span> <span class="pre">*</span> <span class="pre">x</span></tt>, wrapped it in a lambda (function expression), and immediately executed that expression. <a class="footnote-reference" href="#id6" id="id5"><tt>[†]</tt></a></p>
<p>Apparently a lot of Algol programmers had the hots for this kinda thing &#8212; the whole function-wrapping thing was totally optional, but you chose to do it, Algol programmers, and we respect your choice.</p>
<p>You can optimize this case through static analysis. As long as there&#8217;s no possibility of escape between a declaration and its use in a nested function, the nested function knows exactly how far to reach up the stack to retrieve/manipulate the variable &#8212; the activation record stack is totally determined at compile time. Because there&#8217;s no escaping, there&#8217;s not even any need to import the upvar into the Algol-like function.</p>
<div class="section" id="dijkstra-s-display-optimization">
<h4>Dijkstra&#8217;s display optimization</h4>
<p>To optimize this Algol-like closure case we used a construct called a &quot;Dijkstra display&quot; (or something named along those lines). You just keep an array of stack frame pointers, with each array slot representing the frame currently executing at that function nesting level. When outer is called in the above, outer&#8217;s stack frame pointer would be placed in the display array at nesting level 0, so the array would look like:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">Level 0: &amp;outerStackFrame
Level 1: NULL
Level 2: NULL
...</pre></div></div>

<p>Then, when <tt class="docutils literal"><span class="pre">cubeX</span></tt> is invoked, its stack frame pointer is placed at nesting level 1:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">Level 0: &amp;outerStackFrame
Level 1: &amp;cubeX
Level 2: NULL
...</pre></div></div>

<p>At parse time, we tell <tt class="docutils literal"><span class="pre">cubeX</span></tt> that it can reach up to level 0, frame slot 0 to retrieve the <tt class="docutils literal"><span class="pre">jsval</span></tt> for x. <a class="footnote-reference" href="#id8" id="id7"><tt>[‡]</tt></a> Even if you have &quot;parent&quot; frame references in each stack frame, this array really helps when a function is reaching up many levels to retrieve an upvar, since you can do a single array lookup instead of an <em>n</em>-link parent chain traversal. Note that this is only useful when you know the upvar-referring functions will never escape, because the display can only track stack frames for functions that are currently executing.</p>
<p>There&#8217;s also the possibility that two functions at the same nesting level are active on the call stack at the same time; for example:</p>

<div class="wp_syntax"><div class="code"><pre class="javascript" style="font-family:monospace;"><span style="color: #003366; font-weight: bold;">function</span> outer<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
    <span style="color: #003366; font-weight: bold;">var</span> x <span style="color: #339933;">=</span> <span style="color: #CC0000;">24</span><span style="color: #339933;">;</span>
    <span style="color: #003366; font-weight: bold;">function</span> innerFirst<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span> <span style="color: #000066; font-weight: bold;">return</span> x<span style="color: #339933;">;</span> <span style="color: #009900;">&#125;</span>
    <span style="color: #003366; font-weight: bold;">function</span> innerSecond<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
        <span style="color: #003366; font-weight: bold;">var</span> x <span style="color: #339933;">=</span> <span style="color: #CC0000;">42</span><span style="color: #339933;">;</span>
        <span style="color: #000066; font-weight: bold;">return</span> innerFirst<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    <span style="color: #009900;">&#125;</span>
    <span style="color: #000066; font-weight: bold;">return</span> innerSecond<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span></pre></div></div>

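<p>To deal with this case, each stack frame has a pointer to the &quot;chained&quot; display stack frame for that nesting level, which is restored when the executing function returns. To go through the motions, <tt class="docutils literal"><span class="pre">outer</span></tt> first activates <tt class="docutils literal"><span class="pre">innerSecond</span></tt> at level 1:</p>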

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">Level 0: &amp;outerStackFrame
Level 1: &amp;innerSecond
Level 2: NULL
...</pre></div></div>

<p><tt class="docutils literal"><span class="pre">innerSecond</span></tt> then activates <tt class="docutils literal"><span class="pre">innerFirst</span></tt> at the same static level (1), and the new frame saves the display pointer that it&#8217;s clobbering:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">Level 0: &amp;outerStackFrame
Level 1: &amp;innerFirst (encapsulates &amp;innerSecond)
Level 2: NULL
...</pre></div></div>

<p>Then, when <tt class="docutils literal"><span class="pre">innerFirst</span></tt> looks up the static levels for <tt class="docutils literal"><span class="pre">x</span></tt>, it gets the correct value, restoring <tt class="docutils literal"><span class="pre">innerSecond</span></tt> when it&#8217;s done executing in a <tt class="docutils literal"><span class="pre">return</span></tt>-style bytecode (which would be important if there were further function nesting in <tt class="docutils literal"><span class="pre">innerSecond</span></tt>). <a class="footnote-reference" href="#id10" id="id9"><tt>[§]</tt></a></p>
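<p>The save/restore motions above can be modeled in plain JavaScript. This is only a toy simulation under stated assumptions: the names <tt>display</tt>, <tt>enterFrame</tt>, <tt>leaveFrame</tt>, <tt>saved</tt>, and <tt>trace</tt> are hypothetical, and frames are ordinary objects rather than anything resembling the engine&#8217;s real stack frames.</p>

```javascript
// One slot per static nesting level; null means nothing is active there.
var display = [null, null, null];
var trace = [];

function enterFrame(level, frame) {
    frame.level = level;
    frame.saved = display[level];        // remember the pointer being clobbered
    display[level] = frame;
}

function leaveFrame(frame) {
    display[frame.level] = frame.saved;  // restore it on return
}

function outer() {
    var frame = { name: "outer", slots: [24] };        // slot 0 holds outer's x
    enterFrame(0, frame);
    var result = innerSecond();
    leaveFrame(frame);
    return result;
}

function innerSecond() {
    var frame = { name: "innerSecond", slots: [42] };  // its own, unrelated x
    enterFrame(1, frame);
    var result = innerFirst();
    trace.push(display[1].name);         // who owns level 1 after the call?
    leaveFrame(frame);
    return result;
}

function innerFirst() {
    var frame = { name: "innerFirst", slots: [] };
    enterFrame(1, frame);
    trace.push(display[1].name + " saved " + frame.saved.name);
    var x = display[0].slots[0];         // level 0, frame slot 0: one array lookup
    leaveFrame(frame);
    return x;
}

var answer = outer();
```

<p>In this model <tt>innerFirst</tt> reads <tt>outer</tt>&#8217;s <tt>x</tt> (24) rather than <tt>innerSecond</tt>&#8217;s, and level 1 points back at <tt>innerSecond</tt> once <tt>innerFirst</tt> has returned.</p>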
<p>Okay, hopefully I&#8217;ve explained that well enough, because now I get to tell you that we&#8217;ve found this optimization to be fairly useless in SpiderMonkey experimental surveys and we hope to rip it out at some point. The interesting case that we actually care about (flat closures) is discussed in the second to last section.</p>
</div>
</div>
<div class="section" id="free-variable-references">
<h3>Free variable references</h3>
<p>Because JS is a lexically scoped language <a class="footnote-reference" href="#id14" id="id11"><tt>[¶]</tt></a> we can determine which enclosing scope a free variable is defined in. <a class="footnote-reference" href="#id15" id="id12"><tt>[#]</tt></a> If a function&#8217;s free variables only refer to bindings in the global scope, then it doesn&#8217;t need any information from the functions that enclose it. For such a function the set of free variables that refer to enclosing functions is the null set, so we call it a <em>null closure</em>. Top-level functions are null closures. <a class="footnote-reference" href="#id16" id="id13"><tt>[♠]</tt></a></p>

<div class="wp_syntax"><div class="code"><pre class="javascript" style="font-family:monospace;"><span style="color: #003366; font-weight: bold;">function</span> outer<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
    <span style="color: #000066; font-weight: bold;">return</span> <span style="color: #003366; font-weight: bold;">function</span> cube<span style="color: #009900;">&#40;</span>x<span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span> <span style="color: #000066; font-weight: bold;">return</span> x <span style="color: #339933;">*</span> x <span style="color: #339933;">*</span> x<span style="color: #339933;">;</span> <span style="color: #009900;">&#125;</span><span style="color: #339933;">;</span> <span style="color: #006600; font-style: italic;">// Null closure - no upvars.</span>
<span style="color: #009900;">&#125;</span></pre></div></div>

<p>Free variables are termed <em>upvars</em>, since they are identifiers that refer to variables in higher (enclosing) scopes. At parse time, when we&#8217;re trying to find a declaration to match up with a use, they&#8217;re called <em>unresolved lexical dependencies</em>. Though JavaScript scopes are less volatile &#8212; and, as some will undoubtedly point out, less flexible &#8212; I believe that the name upvar comes from this construct in Tcl, which lets you inject vars into and read vars from arbitrary scopes as determined by the runtime call stack: <a class="footnote-reference" href="#id18" id="id17"><tt>[♥]</tt></a></p>

<div class="wp_syntax"><div class="code"><pre class="tcl" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">set</span> x <span style="color: #ff4500;">7</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">proc</span> most_outer <span style="color: black;">&#123;</span><span style="color: black;">&#125;</span> <span style="color: black;">&#123;</span>
    <span style="color: #ff7700;font-weight:bold;">proc</span> outer <span style="color: black;">&#123;</span><span style="color: black;">&#125;</span> <span style="color: black;">&#123;</span>
        <span style="color: #ff7700;font-weight:bold;">set</span> x <span style="color: #ff4500;">24</span>
        <span style="color: #ff7700;font-weight:bold;">proc</span> upvar_setter <span style="color: #483d8b;">{level}</span> <span style="color: black;">&#123;</span>
            <span style="color: #ff7700;font-weight:bold;">upvar</span> <span style="color: #ff3333;">$level</span> x x
            <span style="color: #ff7700;font-weight:bold;">set</span> x <span style="color: #ff4500;">42</span>
        <span style="color: black;">&#125;</span>
        <span style="color: #ff7700;font-weight:bold;">proc</span> upvar_printer <span style="color: #483d8b;">{level}</span> <span style="color: black;">&#123;</span>
            <span style="color: #ff7700;font-weight:bold;">upvar</span> <span style="color: #ff3333;">$level</span> x x
            <span style="color: #008000;">puts</span> <span style="color: #ff3333;">$x</span>
        <span style="color: black;">&#125;</span>
        upvar_printer <span style="color: #ff4500;">1</span>
        upvar_setter <span style="color: #ff4500;">1</span>
        upvar_printer <span style="color: #ff4500;">1</span>
        upvar_setter <span style="color: #ff4500;">2</span>
        upvar_printer <span style="color: #ff4500;">2</span>
        upvar_printer <span style="color: #ff4500;">3</span>
        upvar_setter <span style="color: #ff4500;">3</span>
        upvar_printer <span style="color: #ff4500;">3</span>
    <span style="color: black;">&#125;</span>
    outer
<span style="color: black;">&#125;</span>
most_outer <span style="color: #808080; font-style: italic;">;# Yields the numbers 24, 42, 42, 7, and 42.</span></pre></div></div>

</div>
<div class="section" id="upvar-redefinitions">
<h3>Upvar redefinitions</h3>
<p>If you know that the upvar is never redefined after the nested function is created, it is effectively <em>immutable</em> &#8212; similar to the effect of Java&#8217;s partial closures in anonymous inner classes via the <tt class="docutils literal"><span class="pre">final</span></tt> keyword. In this case, you can create an optimized closure in a form we call a <em>flat closure</em> &#8212; if, during static analysis, you find that none of the upvars are redefined after the function definition, you can <em>import</em> the upvars into the closure, effectively copying the immutable <tt class="docutils literal"><span class="pre">jsval</span></tt>s into extra function slots.</p>
<p>On the other hand, if variables in enclosing scopes <em>are</em> (re)defined after the function definition (and thus, don&#8217;t appear immutable to the function), a shared environment object has to be created so that nested functions can correctly see when the updates to the <tt class="docutils literal"><span class="pre">jsval</span></tt>s occur. Take the following example:</p>

<div class="wp_syntax"><div class="code"><pre class="javascript" style="font-family:monospace;"><span style="color: #003366; font-weight: bold;">function</span> outer<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
    <span style="color: #003366; font-weight: bold;">var</span> communicationChannel <span style="color: #339933;">=</span> <span style="color: #CC0000;">24</span><span style="color: #339933;">;</span>
    <span style="color: #003366; font-weight: bold;">function</span> innerGetter<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
        <span style="color: #000066; font-weight: bold;">return</span> communicationChannel<span style="color: #339933;">;</span>
    <span style="color: #009900;">&#125;</span>
    <span style="color: #003366; font-weight: bold;">function</span> innerSetter<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
        communicationChannel <span style="color: #339933;">=</span> <span style="color: #CC0000;">42</span><span style="color: #339933;">;</span>
    <span style="color: #009900;">&#125;</span>
    <span style="color: #000066; font-weight: bold;">return</span> <span style="color: #009900;">&#91;</span>innerGetter<span style="color: #339933;">,</span> innerSetter<span style="color: #009900;">&#93;</span><span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span></pre></div></div>
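
<p>A rough Python 3 analogue of the two cases, offered as a sketch and not as a description of the engine&#8217;s internals: <tt class="docutils literal"><span class="pre">outer_shared</span></tt> mirrors the JS example with a shared cell (via <tt class="docutils literal"><span class="pre">nonlocal</span></tt>), while <tt class="docutils literal"><span class="pre">outer_flat</span></tt> imitates a flat closure by copying the value into a default-argument slot at definition time. Both function names are made up for illustration.</p>

```python
def outer_shared():
    communication_channel = 24

    def inner_getter():
        return communication_channel

    def inner_setter():
        # Rebinds the variable in the enclosing scope, so both
        # functions have to share one environment cell.
        nonlocal communication_channel
        communication_channel = 42

    return inner_getter, inner_setter


def outer_flat():
    value = 24

    # The default argument is evaluated once, at definition time,
    # so the function carries its own copy of the value.
    def inner_getter(value=value):
        return value

    value = 42  # Too late: inner_getter already copied 24.
    return inner_getter


getter, setter = outer_shared()
before = getter()
setter()
after = getter()
flat = outer_flat()()
```

<p>Here <tt class="docutils literal"><span class="pre">before</span></tt> and <tt class="docutils literal"><span class="pre">after</span></tt> differ because the getter and setter share a cell, whereas the copied value in <tt class="docutils literal"><span class="pre">outer_flat</span></tt> goes stale once the variable is redefined. That staleness is why the copy is only safe when static analysis shows no later redefinition.</p>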

</div>
<div class="section" id="closing-over-references">
<h3>Closing over references</h3>
<p>In this case, <tt class="docutils literal"><span class="pre">outer</span></tt> must create an environment record outside of the stack so that when <tt class="docutils literal"><span class="pre">innerGetter</span></tt> and <tt class="docutils literal"><span class="pre">innerSetter</span></tt> escape on return, they can both communicate through the upvar. This is the nice encapsulation effect you can get through closure-by-reference, and it is often used in the JS &quot;constructor-pattern&quot;, like so:</p>

<div class="wp_syntax"><div class="code"><pre class="javascript" style="font-family:monospace;"><span style="color: #003366; font-weight: bold;">function</span> MooCow<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
    <span style="color: #003366; font-weight: bold;">var</span> hasBell <span style="color: #339933;">=</span> <span style="color: #003366; font-weight: bold;">false</span><span style="color: #339933;">;</span>
    <span style="color: #003366; font-weight: bold;">var</span> noise <span style="color: #339933;">=</span> <span style="color: #3366CC;">&quot;Moo.&quot;</span><span style="color: #339933;">;</span>
    <span style="color: #000066; font-weight: bold;">return</span> <span style="color: #009900;">&#123;</span>
        pontificate<span style="color: #339933;">:</span> <span style="color: #003366; font-weight: bold;">function</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span> <span style="color: #000066; font-weight: bold;">return</span> hasBell<span style="color: #339933;">?</span> noise <span style="color: #339933;">+</span> <span style="color: #3366CC;">&quot; &lt;GONG!&gt;&quot;</span> <span style="color: #339933;">:</span> noise<span style="color: #339933;">;</span> <span style="color: #009900;">&#125;</span><span style="color: #339933;">,</span>
        giveBell<span style="color: #339933;">:</span> <span style="color: #003366; font-weight: bold;">function</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span> hasBell <span style="color: #339933;">=</span> <span style="color: #003366; font-weight: bold;">true</span><span style="color: #339933;">;</span> <span style="color: #009900;">&#125;</span>
    <span style="color: #009900;">&#125;</span><span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span></pre></div></div>

<p>It&#8217;s interesting to note that all the languages I work with these days perform closure-by-reference, as opposed to closure-by-value. In contrast, closure-by-value would snapshot all identifiers in the enclosing scope, so immutable types (strings, numbers) would be impossible to change.</p>
<p>Sometimes, closure-by-reference can produce side effects that <a class="reference external" href="http://math.andrej.com/2009/04/09/pythons-lambda-is-broken/">surprise developers</a>, such as:</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">def</span> surprise<span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>:
    funs = <span style="color: black;">&#91;</span><span style="color: #ff7700;font-weight:bold;">lambda</span>: x <span style="color: #66cc66;">**</span> <span style="color: #ff4500;">2</span> <span style="color: #ff7700;font-weight:bold;">for</span> x <span style="color: #ff7700;font-weight:bold;">in</span> <span style="color: #008000;">range</span><span style="color: black;">&#40;</span><span style="color: #ff4500;">6</span><span style="color: black;">&#41;</span><span style="color: black;">&#93;</span>
    <span style="color: #ff7700;font-weight:bold;">assert</span> funs<span style="color: black;">&#91;</span><span style="color: #ff4500;">0</span><span style="color: black;">&#93;</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span> == <span style="color: #ff4500;">25</span></pre></div></div>

<p>This occurs because <tt class="docutils literal"><span class="pre">x</span></tt> is bound in function-local scope, and all the lambdas close over it by reference. When <tt class="docutils literal"><span class="pre">x</span></tt> is mutated in further iterations of the list comprehension (at least in Python 2.x), the lambdas are closed over the environment record of surprise, and <em>all of them</em> see the last value that <tt class="docutils literal"><span class="pre">x</span></tt> was updated to.</p>
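
<p>One common workaround is to bind the current value as a default argument, which is evaluated when each lambda is created. The sketch below assumes Python 3, where the comprehension gets its own scope but the lambdas still share a single <tt class="docutils literal"><span class="pre">x</span></tt>; the function names are only for illustration.</p>

```python
def surprise():
    # Every lambda looks up x when it is called, after the loop has finished.
    return [lambda: x ** 2 for x in range(6)]


def no_surprise():
    # x=x evaluates x at definition time, giving each lambda its own copy.
    return [lambda x=x: x ** 2 for x in range(6)]


by_reference = [f() for f in surprise()]
by_value = [f() for f in no_surprise()]
```

<p>With this change <tt class="docutils literal"><span class="pre">by_reference</span></tt> holds six copies of the last square, while <tt class="docutils literal"><span class="pre">by_value</span></tt> holds the squares of 0 through 5.</p>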
<p>I can sympathize. In fact, I&#8217;ve written a program to do so:</p>

<div class="wp_syntax"><div class="code"><pre class="javascript" style="font-family:monospace;"><span style="color: #003366; font-weight: bold;">var</span> lambdas <span style="color: #339933;">=</span> <span style="color: #009900;">&#91;</span><span style="color: #009900;">&#93;</span><span style="color: #339933;">;</span>
<span style="color: #003366; font-weight: bold;">var</span> condolences <span style="color: #339933;">=</span> <span style="color: #009900;">&#91;</span><span style="color: #3366CC;">&quot;You're totally right&quot;</span><span style="color: #339933;">,</span>
        <span style="color: #3366CC;">&quot;and I understand where you're coming from, but&quot;</span><span style="color: #339933;">,</span>
        <span style="color: #3366CC;">&quot;this is how closures work nowadays&quot;</span><span style="color: #009900;">&#93;</span><span style="color: #339933;">;</span>
<span style="color: #000066; font-weight: bold;">for</span> <span style="color: #009900;">&#40;</span><span style="color: #003366; font-weight: bold;">var</span> i <span style="color: #339933;">=</span> <span style="color: #CC0000;">0</span><span style="color: #339933;">;</span> i <span style="color: #339933;">&lt;</span> condolences.<span style="color: #660066;">length</span><span style="color: #339933;">;</span> i<span style="color: #339933;">++</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
    <span style="color: #003366; font-weight: bold;">var</span> condolence <span style="color: #339933;">=</span> condolences<span style="color: #009900;">&#91;</span>i<span style="color: #009900;">&#93;</span><span style="color: #339933;">;</span>
    lambdas.<span style="color: #660066;">push</span><span style="color: #009900;">&#40;</span><span style="color: #003366; font-weight: bold;">function</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span> <span style="color: #000066; font-weight: bold;">return</span> condolence<span style="color: #339933;">;</span> <span style="color: #009900;">&#125;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span>
<span style="color: #000066;">print</span><span style="color: #009900;">&#40;</span>lambdas<span style="color: #009900;">&#91;</span><span style="color: #CC0000;">0</span><span style="color: #009900;">&#93;</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span></pre></div></div>

<p>Keep in mind that <tt class="docutils literal"><span class="pre">var</span></tt> declarations are hoisted to function scope in JS.</p>
<p>I implore you to note that comments will most likely be received while I&#8217;m sober.</p>
</div>
<div class="section" id="footnotes">
<h3>Footnotes</h3>
<table class="docutils footnote" frame="void" id="id4" rules="none">
<colgroup>
<col class="label" />
<col /></colgroup>
<tbody valign="top">
<tr>
<td class="label"><a class="fn-backref" href="#id1"><tt>[*]</tt></a></td>
<td>Vomited?</td>
</tr>
</tbody>
</table>
<table class="docutils footnote" frame="void" id="id6" rules="none">
<colgroup>
<col class="label" />
<col /></colgroup>
<tbody valign="top">
<tr>
<td class="label"><a class="fn-backref" href="#id5"><tt>[†]</tt></a></td>
<td>Cue complaints about the imperfect lambda abstraction in JavaScript. Dang Ruby kids, go play with your blocks! ;-)</td>
</tr>
</tbody>
</table>
<table class="docutils footnote" frame="void" id="id8" rules="none">
<colgroup>
<col class="label" />
<col /></colgroup>
<tbody valign="top">
<tr>
<td class="label"><a class="fn-backref" href="#id7"><tt>[‡]</tt></a></td>
<td>Roughly. Gory details left out for illustrative purposes.</td>
</tr>
</tbody>
</table>
<table class="docutils footnote" frame="void" id="id10" rules="none">
<colgroup>
<col class="label" />
<col /></colgroup>
<tbody valign="top">
<tr>
<td class="label"><a class="fn-backref" href="#id9"><tt>[§]</tt></a></td>
<td>There&#8217;s also the case where the display array runs out of space. I believe we emit unoptimized name-lookups in this case, but I don&#8217;t entirely recall.</td>
</tr>
</tbody>
</table>
<table class="docutils footnote" frame="void" id="id14" rules="none">
<colgroup>
<col class="label" />
<col /></colgroup>
<tbody valign="top">
<tr>
<td class="label"><a class="fn-backref" href="#id11"><tt>[¶]</tt></a></td>
<td>With a few insidious dynamic scoping constructs thrown in. I&#8217;ll get to that in a later entry.</td>
</tr>
</tbody>
</table>
<table class="docutils footnote" frame="void" id="id15" rules="none">
<colgroup>
<col class="label" />
<col /></colgroup>
<tbody valign="top">
<tr>
<td class="label"><a class="fn-backref" href="#id12"><tt>[#]</tt></a></td>
<td>Barring enclosing <tt class="docutils literal"><span class="pre">with</span></tt> statements and injected eval scopes.</td>
</tr>
</tbody>
</table>
<table class="docutils footnote" frame="void" id="id16" rules="none">
<colgroup>
<col class="label" />
<col /></colgroup>
<tbody valign="top">
<tr>
<td class="label"><a class="fn-backref" href="#id13"><tt>[♠]</tt></a></td>
<td>Unless they contain an <tt class="docutils literal"><span class="pre">eval</span></tt> or <tt class="docutils literal"><span class="pre">with</span></tt>, in which case we call them &quot;heavyweight&quot; &#8212; though they still don&#8217;t need information from enclosing functions, they must carry a stack of environment records, so they&#8217;re not optimal. I love how many footnotes I make when I talk about the JavaScript language. ;-)</td>
</tr>
</tbody>
</table>
<table class="docutils footnote" frame="void" id="id18" rules="none">
<colgroup>
<col class="label" />
<col /></colgroup>
<tbody valign="top">
<tr>
<td class="label"><a class="fn-backref" href="#id17"><tt>[♥]</tt></a></td>
<td>As a result, it&#8217;s extremely difficult to optimize accesses like these without whole program analysis.</td>
</tr>
</tbody>
</table>
</div>
]]></content:encoded>
			<wfw:commentRss>http://blog.cdleary.com/2010/05/notes-from-the-js-pit-closure-optimization/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Notes from the JS pit: lofty goals, humble beginnings</title>
		<link>http://blog.cdleary.com/2010/05/notes-from-the-js-pit-lofty-goals-humble-beginnings/</link>
		<comments>http://blog.cdleary.com/2010/05/notes-from-the-js-pit-lofty-goals-humble-beginnings/#comments</comments>
		<pubDate>Sun, 02 May 2010 06:40:14 +0000</pubDate>
		<dc:creator>cdleary</dc:creator>
				<category><![CDATA[JavaScript]]></category>
		<category><![CDATA[Languages]]></category>
		<category><![CDATA[Mozilla]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[Benchmarks]]></category>
		<category><![CDATA[Compilation]]></category>
		<category><![CDATA[JS Pit]]></category>
		<category><![CDATA[Parsing]]></category>

		<guid isPermaLink="false">http://blog.cdleary.com/?p=801</guid>
		<description><![CDATA[I&#8217;ve been working at Mozilla for about two months on the JavaScript (JS) engine team, the members of which sit in an area affectionately known as the &#34;JS pit&#34;. Mozillians appear to try to blog on a regular basis, so I&#8217;ll be starting a series of entries prefixed &#34;notes from the JS pit&#34; to explain [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve been working at Mozilla for about two months on the JavaScript (JS) engine team, the members of which sit in an area affectionately known as the &quot;JS pit&quot;.</p>
<p>Mozillians appear to try to blog on a regular basis, so I&#8217;ll be starting a series of entries prefixed &quot;notes from the JS pit&quot; to explain what I&#8217;ve been working on and/or thinking about.</p>
<p>Notably, I feel fortunate to work for a company that encourages this kind of openness.</p>
<div class="section" id="goals">
<h3>Goals</h3>
<p>I always feel stupid writing down goals &#8212; they seem so self-evident; however, it helps to put the work I&#8217;m doing into perspective and gives me something that I can refer back to.</p>
<p>I&#8217;m also a big believer in the effectiveness of public accountability, so publishing those goals seems prudent &#8212; my notes from the JS pit are poised to help me stay motivated more than anything else.</p>
<p>My goals are to:</p>
<ul class="simple">
<li>Do my part to meet and exceed performance targets in the JS engine. It&#8217;s an exciting time with lots of healthy competition to motivate this.</li>
<li>Immerse myself in language design concepts and seek out important implementation details, even if those details don&#8217;t pertain to something that I&#8217;m working on directly. This also entails some independent research and small projects to gain better understanding of foreign language concepts.</li>
<li>Deliberately take time to think constructively about alternative or innovative ways to accomplish our language goals. Taking things at face value as &quot;the way we&#8217;ve always done it&quot; is the way that innovative things get overlooked &#8212; this is relatively easy to do with a pair of fresh eyes, but similarly difficult because you&#8217;re the team noob.</li>
</ul>
<p>Working with compiler engineers from diverse language backgrounds, it&#8217;s prime time for sucking knowledge out of people&#8217;s heads, comparing and contrasting it, and listening to them argue with each other. Heck, just look at the concepts behind JS: an imperative, Scheme-inspired, prototypal, C-and-Java-syntax conforming language that&#8217;s irrevocably tied to a practical platform, the web. It&#8217;s bound to be a fun ride.</p>
</div>
<div class="section" id="from-start-to-present">
<h3>From start to present</h3>
<p>I started off implementing the simpler opcodes for <a class="reference external" href="https://wiki.mozilla.org/JaegerMonkey">JaegerMonkey</a> (JM) and getting an understanding of the JM code base. Not too long into it, I was told that looking into quick-and-easy parser optimizations was a priority &#8212; somebody had reported that a significant fraction of the GMail load bar time could be attributed to JS parsing. <a class="footnote-reference" href="#id2" id="id1"><tt>[*]</tt></a></p>
<p>Now, JavaScript isn&#8217;t the easiest language in the world to parse; for example, <a class="reference external" href="http://en.wikipedia.org/wiki/JavaScript_syntax#Whitespace_and_semicolons">automatic semicolon insertion</a> creates some non-traditional obstacles for generated shift/reduce parsers <a class="footnote-reference" href="#id4" id="id3"><tt>[†]</tt></a> &#8212; it effectively makes an error correction algorithm part of the normal parse procedure. The details are for another entry, but suffice it to say that our recursive descent parser code gets complex, especially due to our <a class="reference external" href="https://developer.mozilla.org/en/E4X">E4X support</a> and some of the static analyses we perform for optimizations before bytecode emission.</p>
<p>In pursuing JS parser optimization I assembled a suite of parsing benchmarks from sites on the web with &quot;large&quot; JS payloads &#8212; I call this suite <em>parsemark</em>. After getting some speedups from simple inlining, I attempted a somewhat fundamental change to the parser to reduce the number of branch mispredictions, in converting it to always have a token &quot;pre-lexed&quot; as opposed to the prior &quot;lex-on-demand&quot; model. Roughly, this required adding a &quot;there&#8217;s always a lexed token&quot; invariant to the lexer and hoisting lexer calls/modesets from substitution rules into their referring nonterminals in the parser. The details for this are also entry fodder. Sadly, it demonstrated negligible performance gains for the increase in complexity. Sure taught me a lot about our parser, though.</p>
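
<p>To make the two lexing models concrete, here is a toy sketch in Python. It is not the engine&#8217;s code: the class names, the token grammar, and the <tt class="docutils literal"><span class="pre">tokens</span></tt> helper are all invented for illustration, and it ignores the lexer modes mentioned above.</p>

```python
import re

# Toy grammar: integers and a few single-character operators.
TOKEN_RE = re.compile(r"\s*(\d+|[-+*/()])")


class OnDemandLexer:
    """Lexes lazily: peek() must check whether a token is buffered."""

    def __init__(self, text):
        self.text, self.pos, self.buffered = text, 0, None

    def _lex(self):
        match = TOKEN_RE.match(self.text, self.pos)
        if match is None:
            return None  # end of input (or an unrecognized character)
        self.pos = match.end()
        return match.group(1)

    def peek(self):
        if self.buffered is None:  # branch taken on every lookahead
            self.buffered = self._lex()
        return self.buffered

    def next(self):
        token = self.peek()
        self.buffered = None
        return token


class PreLexedLexer(OnDemandLexer):
    """Invariant: self.current always holds the next token (None at EOF)."""

    def __init__(self, text):
        super().__init__(text)
        self.current = self._lex()  # establish the invariant up front

    def peek(self):
        return self.current  # no "is it lexed yet?" check

    def next(self):
        token, self.current = self.current, self._lex()
        return token


def tokens(lexer):
    """Drain a lexer into a list of token strings."""
    out = []
    while lexer.peek() is not None:
        out.append(lexer.next())
    return out
```

<p>Both lexers produce the same token stream; the only difference is where the lexing work happens and whether lookahead needs a conditional.</p>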
<p>The biggest performance win was obtained through a basic fix to our parser arena-allocation chunk sizing. <a class="reference external" href="http://blog.mozilla.com/rob-sayre/">sayrer</a> noticed that a surprising amount of time was being spent in kernel space, so we tracked the issue down. It was frustrating to work for a few weeks on a fundamental change and then realize that multiplying a constant by four can get you a 20% parsing speedup, but I&#8217;ve certainly learned to look a lot more closely at the vmem subsystem when things are slow. I have some speedup statistics and a comparison to V8 (with all its lazy parsing and parse-caching bits ripped out), but I don&#8217;t have much faith that my environment hasn&#8217;t changed in the course of all the historical data measurements &#8212; writing a script to verify speedup over changesets seems like a viable option for future notes.</p>
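
<p>As a rough illustration of why chunk sizing matters, the toy bump allocator below only counts how often it would have to ask the system for a fresh chunk. The chunk sizes, allocation size, and names are assumptions picked for the example, not the engine&#8217;s actual values, and the count says nothing about the measured speedup.</p>

```python
class Arena:
    """Toy bump allocator that only counts how many chunks it requests."""

    def __init__(self, chunk_size):
        self.chunk_size = chunk_size
        self.remaining = 0  # bytes left in the current chunk
        self.chunks_requested = 0

    def allocate(self, size):
        if size > self.chunk_size:
            raise ValueError("allocation larger than a chunk")
        if size > self.remaining:
            # Stand-in for asking the system for a fresh chunk.
            self.chunks_requested += 1
            self.remaining = self.chunk_size
        self.remaining -= size


def chunks_needed(chunk_size, allocations=1000, size=64):
    """Count chunk requests for a run of equally sized allocations."""
    arena = Arena(chunk_size)
    for _ in range(allocations):
        arena.allocate(size)
    return arena.chunks_requested


small = chunks_needed(1024)
large = chunks_needed(4096)
```

<p>Quadrupling the chunk size cuts the number of chunk requests to roughly a quarter for the same workload, which is the kind of saving that shows up as less time in kernel space.</p>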
<p>In the miscellany department, I&#8217;ve been trying to do a good amount of work <a class="reference external" href="http://www.artima.com/intv/fixit.html">fixing broken windows</a> via cleanup patches. I&#8217;m finding it difficult to strike a balance here, since there are a lot of modularity-breaking interdependencies in the code base &#8212; what appear to be simple cleanups tend to unravel into large patches that get stale easily. However, cleanup does force you to read through the code you&#8217;re modifying, which is always good when you&#8217;re learning a new code base.</p>
<p>Looking back on it, it doesn&#8217;t seem like a lot of work; of course, my hope is that the time I spend up-front getting accustomed to the codebase will let me make progress on my goals more rapidly.</p>
<p>Stay tuned for more JS pittage &#8212; unpredictable time, but predictable channel.</p>
</div>
<div class="section" id="footnotes">
<h3>Footnotes</h3>
<table class="docutils footnote" frame="void" id="id2" rules="none">
<colgroup>
<col class="label" />
<col /></colgroup>
<tbody valign="top">
<tr>
<td class="label"><a class="fn-backref" href="#id1"><tt>[*]</tt></a></td>
<td>To date, I haven&#8217;t looked into this myself. Ideally, I should have verified it before starting on the parser work, but I was eager to start working on things rather than investigate the reasons behind them.</td>
</tr>
</tbody>
</table>
<table class="docutils footnote" frame="void" id="id4" rules="none">
<colgroup>
<col class="label" />
<col /></colgroup>
<tbody valign="top">
<tr>
<td class="label"><a class="fn-backref" href="#id3"><tt>[†]</tt></a></td>
<td>Though I&#8217;ve seen that Webkit&#8217;s JavaScriptCore uses Bison &#8212; I&#8217;m going to have to check out that implementation at some point.</td>
</tr>
</tbody>
</table>
</div>
]]></content:encoded>
			<wfw:commentRss>http://blog.cdleary.com/2010/05/notes-from-the-js-pit-lofty-goals-humble-beginnings/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Efficiency of list comprehensions</title>
		<link>http://blog.cdleary.com/2010/04/efficiency-of-list-comprehensions/</link>
		<comments>http://blog.cdleary.com/2010/04/efficiency-of-list-comprehensions/#comments</comments>
		<pubDate>Fri, 30 Apr 2010 19:31:08 +0000</pubDate>
		<dc:creator>cdleary</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[Thoughts]]></category>
		<category><![CDATA[Efficiency]]></category>
		<category><![CDATA[List Comprehensions]]></category>

		<guid isPermaLink="false">http://blog.cdleary.com/?p=782</guid>
		<description><![CDATA[I&#8217;m psyched about the awesome comments on my previous entry, Python by example: list comprehensions. Originally this entry was just a response to those comments, but people who stumbled across this entry on the interwebz found the response format too confusing, so I&#8217;ve restructured it for posterity. Efficiency of the more common usage Let&#8217;s look [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m psyched about the awesome comments on my previous entry, <a class="reference external" href="http://blog.cdleary.com/2010/04/learning-python-by-example-list-comprehensions/">Python by example: list comprehensions</a>. Originally this entry was just a response to those comments, but people who stumbled across this entry on the interwebz found the response format too confusing, so I&#8217;ve restructured it for posterity.</p>
<div class="section" id="efficiency-of-the-more-common-usage">
<h3>Efficiency of the more common usage</h3>
<p>Let&#8217;s look at the efficiency of list comprehensions in the more common usage, where the comprehension&#8217;s list result is actually relevant (or, in compiler-speak, live-out).</p>
<p>Using the following program, you can see the time spent in each implementation and the corresponding bytecode sequence:</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">import</span> <span style="color: #dc143c;">dis</span>
<span style="color: #ff7700;font-weight:bold;">import</span> <span style="color: #dc143c;">inspect</span>
<span style="color: #ff7700;font-weight:bold;">import</span> <span style="color: #dc143c;">timeit</span>
&nbsp;
&nbsp;
programs = <span style="color: #008000;">dict</span><span style="color: black;">&#40;</span>
    loop=<span style="color: #483d8b;">&quot;&quot;&quot;
result = []
for i in range(20):
    result.append(i * 2)
&quot;&quot;&quot;</span>,
    loop_faster=<span style="color: #483d8b;">&quot;&quot;&quot;
result = []
add = result.append
for i in range(20):
    add(i * 2)
&quot;&quot;&quot;</span>,
    comprehension=<span style="color: #483d8b;">'result = [i * 2 for i in range(20)]'</span>,
<span style="color: black;">&#41;</span>
&nbsp;
&nbsp;
<span style="color: #ff7700;font-weight:bold;">for</span> name, text <span style="color: #ff7700;font-weight:bold;">in</span> programs.<span style="color: black;">iteritems</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>:
    <span style="color: #ff7700;font-weight:bold;">print</span> name, <span style="color: #dc143c;">timeit</span>.<span style="color: black;">Timer</span><span style="color: black;">&#40;</span>stmt=text<span style="color: black;">&#41;</span>.<span style="color: #dc143c;">timeit</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
    <span style="color: #dc143c;">code</span> = <span style="color: #008000;">compile</span><span style="color: black;">&#40;</span>text, <span style="color: #483d8b;">'&lt;string&gt;'</span>, <span style="color: #483d8b;">'exec'</span><span style="color: black;">&#41;</span>
    <span style="color: #dc143c;">dis</span>.<span style="color: black;">disassemble</span><span style="color: black;">&#40;</span><span style="color: #dc143c;">code</span><span style="color: black;">&#41;</span></pre></div></div>


<div class="wp_syntax"><div class="code"><pre class="python-disasm" style="font-family:monospace;">loop 11.1495118141
  2           0 BUILD_LIST               0
              3 STORE_NAME               0 (result)
&nbsp;
  3           6 SETUP_LOOP              37 (to 46)
              9 LOAD_NAME                1 (range)
             12 LOAD_CONST               0 (20)
             15 CALL_FUNCTION            1
             18 GET_ITER
        &gt;&gt;   19 FOR_ITER                23 (to 45)
             22 STORE_NAME               2 (i)
&nbsp;
  4          25 LOAD_NAME                0 (result)
             28 LOAD_ATTR                3 (append)
             31 LOAD_NAME                2 (i)
             34 LOAD_CONST               1 (2)
             37 BINARY_MULTIPLY
             38 CALL_FUNCTION            1
             41 POP_TOP
             42 JUMP_ABSOLUTE           19
        &gt;&gt;   45 POP_BLOCK
        &gt;&gt;   46 LOAD_CONST               2 (None)
             49 RETURN_VALUE
loop_faster 8.36096310616
  2           0 BUILD_LIST               0
              3 STORE_NAME               0 (result)
&nbsp;
  3           6 LOAD_NAME                0 (result)
              9 LOAD_ATTR                1 (append)
             12 STORE_NAME               2 (add)
&nbsp;
  4          15 SETUP_LOOP              34 (to 52)
             18 LOAD_NAME                3 (range)
             21 LOAD_CONST               0 (20)
             24 CALL_FUNCTION            1
             27 GET_ITER
        &gt;&gt;   28 FOR_ITER                20 (to 51)
             31 STORE_NAME               4 (i)
&nbsp;
  5          34 LOAD_NAME                2 (add)
             37 LOAD_NAME                4 (i)
             40 LOAD_CONST               1 (2)
             43 BINARY_MULTIPLY
             44 CALL_FUNCTION            1
             47 POP_TOP
             48 JUMP_ABSOLUTE           28
        &gt;&gt;   51 POP_BLOCK
        &gt;&gt;   52 LOAD_CONST               2 (None)
             55 RETURN_VALUE
comprehension 7.08145213127
  1           0 BUILD_LIST               0
              3 DUP_TOP
              4 STORE_NAME               0 (_[1])
              7 LOAD_NAME                1 (range)
             10 LOAD_CONST               0 (20)
             13 CALL_FUNCTION            1
             16 GET_ITER
        &gt;&gt;   17 FOR_ITER                17 (to 37)
             20 STORE_NAME               2 (i)
             23 LOAD_NAME                0 (_[1])
             26 LOAD_NAME                2 (i)
             29 LOAD_CONST               1 (2)
             32 BINARY_MULTIPLY
             33 LIST_APPEND
             34 JUMP_ABSOLUTE           17
        &gt;&gt;   37 DELETE_NAME              0 (_[1])
             40 STORE_NAME               3 (result)
             43 LOAD_CONST               2 (None)
             46 RETURN_VALUE</pre></div></div>

<p>List comprehensions perform better here because you don’t need to load the append attribute off of the list (loop program, bytecode 28) and call it as a function (loop program, bytecode 38). Instead, in a comprehension, a specialized <tt class="docutils literal"><span class="pre">LIST_APPEND</span></tt> bytecode is generated for a fast append onto the result list (comprehension program, bytecode 33).</p>
<p>In the <tt class="docutils literal"><span class="pre">loop_faster</span></tt> program, you avoid the overhead of the <tt class="docutils literal"><span class="pre">append</span></tt> attribute lookup by hoisting it out of the loop and placing the result in a fastlocal (bytecode 9-12), so it loops more quickly; however, the comprehension uses a specialized <tt class="docutils literal"><span class="pre">LIST_APPEND</span></tt> bytecode instead of incurring the overhead of a function call, so it still trumps.</p>
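
<p>The listing above is Python 2 (<tt class="docutils literal"><span class="pre">iteritems</span></tt>, the <tt class="docutils literal"><span class="pre">print</span></tt> statement). For anyone who wants to repeat the experiment on Python 3, a rough port might look like the sketch below. The <tt class="docutils literal"><span class="pre">run</span></tt> and <tt class="docutils literal"><span class="pre">report</span></tt> helpers are my own additions, and bytecode names and offsets differ across CPython versions, so expect the disassembly and timings to differ from the ones shown here.</p>

```python
import dis
import timeit

PROGRAMS = {
    "loop": """
result = []
for i in range(20):
    result.append(i * 2)
""",
    "loop_faster": """
result = []
add = result.append
for i in range(20):
    add(i * 2)
""",
    "comprehension": "result = [i * 2 for i in range(20)]",
}


def run(text):
    """Execute one program and return the list it builds."""
    namespace = {}
    exec(text, namespace)
    return namespace["result"]


def report(number=100000):
    """Print a timing and a disassembly for each program."""
    for name, text in PROGRAMS.items():
        print(name, timeit.timeit(stmt=text, number=number))
        dis.dis(compile(text, "<string>", "exec"))
```

<p>Calling <tt class="docutils literal"><span class="pre">report()</span></tt> prints one timing and one disassembly per program; <tt class="docutils literal"><span class="pre">run</span></tt> is there only to check that all three programs build the same list.</p>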
</div>
<div class="section" id="using-list-comprehensions-for-side-effects">
<h3>Using list comprehensions for side effects</h3>
<p>I want to address a point that was brought up in the <a class="reference external" href="http://blog.cdleary.com/2010/04/learning-python-by-example-list-comprehensions/#comments">previous entry</a> as to the efficiency of for loops versus list comprehensions when used purely for side effects, but I&#8217;ll discuss the subjective bit first, since that&#8217;s the least sciency part.</p>
<div class="section" id="readability">
<h4>Readability</h4>
<blockquote>
<p>Simple test – if you did need the result would the comprehension be easily understood? If the answer is yes then removing the assignment on the left hand side doesn’t magically make it less readable…</p>
<p class="attribution">&mdash;<a class="reference external" href="http://blog.cdleary.com/2010/04/learning-python-by-example-list-comprehensions/#comment-1539">Michael Foord</a></p>
</blockquote>
<p>First of all, thanks to Michael for his excellent and thought-provoking comment!</p>
<p>My response is that removing the use of the result does indeed make it less readable, precisely <em>because</em> you&#8217;re using a result-producing control flow construct where the result is not needed. I suppose I&#8217;m positing that it&#8217;s inherently confusing to do that with your syntax: there&#8217;s a looping form that doesn&#8217;t produce a result, so that should be used instead. It&#8217;s expressing your semantic intention via syntax.</p>
<p>For advanced Pythonistas it&#8217;s easy to figure out what&#8217;s going on at a glance, but comprehension-as-loop definitely has a &quot;there&#8217;s more than one way to do it&quot; smell about it, which also makes it less amenable to people learning the language.</p>
<p>With a viable comprehension-as-loop option, every time a user goes to write a loop that doesn&#8217;t require a result they now ask themselves, &quot;Can I fit this into the list comprehension form?&quot; Those mental branches are, to me, what &quot;one way to do it&quot; is designed to avoid. When I read Perl code, I take &quot;mental exceptions&quot; all the time because the author didn&#8217;t use the construct that I would have used in the same situation. Minimizing that is a good thing, so I maintain that &quot;no result needed&quot; should automatically imply a loop construct.</p>
</div>
<div class="section" id="efficiency">
<h4>Efficiency</h4>
<p>Consider two functions, <tt class="docutils literal"><span class="pre">comprehension</span></tt> and <tt class="docutils literal"><span class="pre">loop</span></tt>:</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">def</span> loop<span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>:
    accum = <span style="color: black;">&#91;</span><span style="color: black;">&#93;</span>
    <span style="color: #ff7700;font-weight:bold;">for</span> i <span style="color: #ff7700;font-weight:bold;">in</span> <span style="color: #008000;">range</span><span style="color: black;">&#40;</span><span style="color: #ff4500;">20</span><span style="color: black;">&#41;</span>:
        accum.<span style="color: black;">append</span><span style="color: black;">&#40;</span>i<span style="color: black;">&#41;</span>
    <span style="color: #ff7700;font-weight:bold;">return</span> accum
&nbsp;
&nbsp;
<span style="color: #ff7700;font-weight:bold;">def</span> comprehension<span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>:
    accum = <span style="color: black;">&#91;</span><span style="color: black;">&#93;</span>
    <span style="color: black;">&#91;</span>accum.<span style="color: black;">append</span><span style="color: black;">&#40;</span>i<span style="color: black;">&#41;</span> <span style="color: #ff7700;font-weight:bold;">for</span> i <span style="color: #ff7700;font-weight:bold;">in</span> <span style="color: #008000;">range</span><span style="color: black;">&#40;</span><span style="color: #ff4500;">20</span><span style="color: black;">&#41;</span><span style="color: black;">&#93;</span>
    <span style="color: #ff7700;font-weight:bold;">return</span> accum</pre></div></div>

<p>N.B. This example is comparing the efficiency of a list comprehension <strong>where the result of the comprehension is ignored</strong> to a for loop that produces no result, as is discussed in the referenced entry, <a class="reference external" href="http://blog.cdleary.com/2010/04/learning-python-by-example-list-comprehensions/">Python by example: list comprehensions</a>.</p>
<p><a class="reference external" href="http://blog.cdleary.com/2010/04/learning-python-by-example-list-comprehensions/#comment-1539">Michael Foord</a> comments:</p>
<blockquote><p>
Your alternative for the single line, easily readable, list comprehension is four lines that are less efficient because the loop happens in the interpreter rather than in C.</p></blockquote>
<p>However, the disassembly, obtained via <a class="reference external" href="http://docs.python.org/library/dis.html#dis.dis">dis.dis(func)</a> looks like the following for the loop:</p>

<div class="wp_syntax"><div class="code"><pre class="python-disasm" style="font-family:monospace;">2           0 BUILD_LIST               0
            3 STORE_FAST               0 (accum)
&nbsp;
3           6 SETUP_LOOP              33 (to 42)
            9 LOAD_GLOBAL              0 (range)
           12 LOAD_CONST               1 (20)
           15 CALL_FUNCTION            1
           18 GET_ITER
      &gt;&gt;   19 FOR_ITER                19 (to 41)
           22 STORE_FAST               1 (i)
&nbsp;
4          25 LOAD_FAST                0 (accum)
           28 LOAD_ATTR                1 (append)
           31 LOAD_FAST                1 (i)
           34 CALL_FUNCTION            1
           37 POP_TOP
           38 JUMP_ABSOLUTE           19
      &gt;&gt;   41 POP_BLOCK
&nbsp;
5     &gt;&gt;   42 LOAD_FAST                0 (accum)
           45 RETURN_VALUE</pre></div></div>

<p>And it looks like the following for the comprehension:</p>

<div class="wp_syntax"><div class="code"><pre class="python-disasm" style="font-family:monospace;">2           0 BUILD_LIST               0
            3 STORE_FAST               0 (accum)
&nbsp;
3           6 BUILD_LIST               0
            9 DUP_TOP
           10 STORE_FAST               1 (_[1])
           13 LOAD_GLOBAL              0 (range)
           16 LOAD_CONST               1 (20)
           19 CALL_FUNCTION            1
           22 GET_ITER
      &gt;&gt;   23 FOR_ITER                22 (to 48)
           26 STORE_FAST               2 (i)
           29 LOAD_FAST                1 (_[1])
           32 LOAD_FAST                0 (accum)
           35 LOAD_ATTR                1 (append)
           38 LOAD_FAST                2 (i)
           41 CALL_FUNCTION            1
           44 LIST_APPEND
           45 JUMP_ABSOLUTE           23
      &gt;&gt;   48 DELETE_FAST              1 (_[1])
           51 POP_TOP
&nbsp;
4          52 LOAD_FAST                0 (accum)
           55 RETURN_VALUE</pre></div></div>
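<p>For anyone who wants to reproduce listings like these, here is a minimal sketch that redefines the two functions from above and hands them to <tt class="docutils literal"><span class="pre">dis.dis</span></tt>. The opcodes and offsets vary between CPython versions, so the output on a newer interpreter will probably not match the listings here line for line.</p>

```python
import dis


def loop():
    accum = []
    for i in range(20):
        accum.append(i)
    return accum


def comprehension():
    accum = []
    # The list built by the comprehension itself is thrown away.
    [accum.append(i) for i in range(20)]
    return accum


# Print the bytecode of each function to stdout.
dis.dis(loop)
dis.dis(comprehension)
```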

<p>By looking at the bytecode instructions, we see that the list comprehension is, at a language level, actually just &quot;syntactic sugar&quot; for the <tt class="docutils literal"><span class="pre">for</span></tt> loop, as mentioned by <a class="reference external" href="http://blog.cdleary.com/2010/04/learning-python-by-example-list-comprehensions/#comment-1540">nes</a> &#8212; they both lower down into the same control flow construct at a virtual machine level, at least in CPython.</p>
<p>The primary difference between the two disassemblies is that a superfluous list comprehension result is stored into fastlocal 1, which is loaded (bytecode 29) and appended to (bytecode 44) each iteration, creating some additional overhead &#8212; it&#8217;s simply deleted in bytecode 48. Unless the <tt class="docutils literal"><span class="pre">POP_BLOCK</span></tt> operation (bytecode 41) of the loop disassembly is very expensive (I haven&#8217;t looked into its implementation), the comprehension disassembly is guaranteed to be less efficient.</p>
<p>Because of this, I believe that Michael was mistaken in referring to an overhead that results from use of a <tt class="docutils literal"><span class="pre">for</span></tt> loop versus a list comprehension for CPython. It would be interesting to perform a survey of the list comprehension optimization techniques used in various Python implementations, but optimization seems difficult outside of something like a special <a class="reference external" href="http://www.cython.org/">Cython</a> construct, because <tt class="docutils literal"><span class="pre">LOAD_GLOBAL</span> <span class="pre">range</span></tt> could potentially be changed from the builtin range function. Various issues of this kind are discussed in the (very interesting) paper <a class="reference external" href="http://portal.acm.org/citation.cfm?id=1534530.1534550">The effect of unrolling and inlining for Python bytecode optimizations</a>.</p>
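<p>Reading bytecode only goes so far, so a measurement is a reasonable sanity check. The following is a rough sketch using the standard <tt class="docutils literal"><span class="pre">timeit</span></tt> module; the iteration count is an arbitrary choice, and the numbers depend on the interpreter version and the machine, so no particular result is claimed here.</p>

```python
import timeit


def loop():
    accum = []
    for i in range(20):
        accum.append(i)
    return accum


def comprehension():
    accum = []
    # Result list of None values is built and then discarded.
    [accum.append(i) for i in range(20)]
    return accum


# Time many calls of each function; 100000 is an arbitrary repeat count.
for func in (loop, comprehension):
    seconds = timeit.timeit(func, number=100000)
    print("%s: %.3f s" % (func.__name__, seconds))
```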
</div>
</div>
]]></content:encoded>
			<wfw:commentRss>http://blog.cdleary.com/2010/04/efficiency-of-list-comprehensions/feed/</wfw:commentRss>
		<slash:comments>25</slash:comments>
		</item>
		<item>
		<title>Learning Python by example: list comprehensions</title>
		<link>http://blog.cdleary.com/2010/04/learning-python-by-example-list-comprehensions/</link>
		<comments>http://blog.cdleary.com/2010/04/learning-python-by-example-list-comprehensions/#comments</comments>
		<pubDate>Thu, 29 Apr 2010 16:00:59 +0000</pubDate>
		<dc:creator>cdleary</dc:creator>
				<category><![CDATA[Best Practices]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[Introductory]]></category>
		<category><![CDATA[List Comprehensions]]></category>

		<guid isPermaLink="false">http://blog.cdleary.com/?p=762</guid>
		<description><![CDATA[My friend, who is starting to learn Python 2.x, asked me what this snippet did: def collapse&#40;seq&#41;: # Preserve order. uniq = &#91;&#93; &#91;uniq.append&#40;item&#41; for item in seq if not uniq.count&#40;item&#41;&#93; return uniq This is not a snippet that should be emulated (i.e. it&#8217;s bad); however, it makes me happy: there are so many things [...]]]></description>
			<content:encoded><![CDATA[<p>My friend, who is starting to learn Python 2.x, asked me what this snippet did:</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">def</span> collapse<span style="color: black;">&#40;</span>seq<span style="color: black;">&#41;</span>:
    <span style="color: #808080; font-style: italic;"># Preserve order.</span>
    uniq = <span style="color: black;">&#91;</span><span style="color: black;">&#93;</span>
    <span style="color: black;">&#91;</span>uniq.<span style="color: black;">append</span><span style="color: black;">&#40;</span>item<span style="color: black;">&#41;</span> <span style="color: #ff7700;font-weight:bold;">for</span> item <span style="color: #ff7700;font-weight:bold;">in</span> seq <span style="color: #ff7700;font-weight:bold;">if</span> <span style="color: #ff7700;font-weight:bold;">not</span> uniq.<span style="color: black;">count</span><span style="color: black;">&#40;</span>item<span style="color: black;">&#41;</span><span style="color: black;">&#93;</span>
    <span style="color: #ff7700;font-weight:bold;">return</span> uniq</pre></div></div>

<p>This is not a snippet that should be emulated (i.e. <em>it&#8217;s bad</em>); however, it makes me happy: there are so many things that can be informatively corrected!</p>
<div class="section" id="what-is-a-list-comprehension">
<h3>What is a list comprehension?</h3>
<p>A list comprehension is a special brackety syntax to perform a <em>transform</em> operation with an optional <em>filter</em> clause that always produces a new sequence (list) object as a <em>result</em>. To break it down visually, you perform:</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;">new_range = <span style="color: black;">&#91;</span>i <span style="color: #66cc66;">*</span> i          <span style="color: #ff7700;font-weight:bold;">for</span> i <span style="color: #ff7700;font-weight:bold;">in</span> <span style="color: #008000;">range</span><span style="color: black;">&#40;</span><span style="color: #ff4500;">5</span><span style="color: black;">&#41;</span>   <span style="color: #ff7700;font-weight:bold;">if</span> i <span style="color: #66cc66;">%</span> <span style="color: #ff4500;">2</span> == <span style="color: #ff4500;">0</span><span style="color: black;">&#93;</span></pre></div></div>

<p>Which corresponds to:</p>

<div class="wp_syntax"><div class="code"><pre class="pseudocode" style="font-family:monospace;">*result*  = [*transform*    *iteration*         *filter*     ]</pre></div></div>

<p>The <em>filter</em> piece answers the question, &quot;should this item be transformed?&quot; If the answer is yes, then the <em>transform</em> piece is evaluated and becomes an element in the <em>result</em>. The <em>iteration</em> <a class="footnote-reference" href="#id2" id="id1"><tt>[*]</tt></a> order is preserved in the <em>result</em>.</p>
<p>Go ahead and figure out what you expect <tt class="docutils literal"><span class="pre">new_range</span></tt> to be in the prior example. You can double check me in the Python shell, but I think it comes out to be:</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #66cc66;">&gt;&gt;&gt;</span> new_range = <span style="color: black;">&#91;</span>i <span style="color: #66cc66;">*</span> i <span style="color: #ff7700;font-weight:bold;">for</span> i <span style="color: #ff7700;font-weight:bold;">in</span> <span style="color: #008000;">range</span><span style="color: black;">&#40;</span><span style="color: #ff4500;">5</span><span style="color: black;">&#41;</span> <span style="color: #ff7700;font-weight:bold;">if</span> i <span style="color: #66cc66;">%</span> <span style="color: #ff4500;">2</span> == <span style="color: #ff4500;">0</span><span style="color: black;">&#93;</span>
<span style="color: #66cc66;">&gt;&gt;&gt;</span> <span style="color: #ff7700;font-weight:bold;">print</span> new_range
<span style="color: black;">&#91;</span><span style="color: #ff4500;">0</span>, <span style="color: #ff4500;">4</span>, <span style="color: #ff4500;">16</span><span style="color: black;">&#93;</span></pre></div></div>

<p>If it still isn&#8217;t clicking, we can try to make the example less noisy by getting rid of the transform and filter &#8212; can you tell what this will produce?</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #66cc66;">&gt;&gt;&gt;</span> new_range = <span style="color: black;">&#91;</span>i <span style="color: #ff7700;font-weight:bold;">for</span> i <span style="color: #ff7700;font-weight:bold;">in</span> <span style="color: #008000;">range</span><span style="color: black;">&#40;</span><span style="color: #ff4500;">5</span><span style="color: black;">&#41;</span><span style="color: black;">&#93;</span></pre></div></div>

</div>
<div class="section" id="so-what-s-wrong-with-that-first-snippet">
<h3>So what&#8217;s wrong with that first snippet?</h3>
<p>As we observed in the previous section, a list comprehension always produces a <em>result</em> list, where the elements of the result list are the <em>transformed</em> elements of the <em>iteration</em>. That means, if there&#8217;s no <em>filter</em> piece, there are exactly as many <em>result</em> elements as there were <em>iteration</em> elements.</p>
<p>Weird thing number one about the snippet &#8212; the list comprehension <em>result</em> is unused. It&#8217;s created, mind you &#8212; list comprehensions always create a value, even if you don&#8217;t care what it is &#8212; but it just goes off to oblivion. (In technical terms, it becomes <em>garbage</em>.) When you don&#8217;t need the <em>result</em>, just use a <tt class="docutils literal"><span class="pre">for</span></tt> loop! This is better:</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">def</span> collapse<span style="color: black;">&#40;</span>seq<span style="color: black;">&#41;</span>:
    <span style="color: #483d8b;">&quot;&quot;&quot;Preserve order.&quot;&quot;&quot;</span>
    uniq = <span style="color: black;">&#91;</span><span style="color: black;">&#93;</span>
    <span style="color: #ff7700;font-weight:bold;">for</span> item <span style="color: #ff7700;font-weight:bold;">in</span> seq:
        <span style="color: #ff7700;font-weight:bold;">if</span> <span style="color: #ff7700;font-weight:bold;">not</span> uniq.<span style="color: black;">count</span><span style="color: black;">&#40;</span>item<span style="color: black;">&#41;</span>:
            uniq.<span style="color: black;">append</span><span style="color: black;">&#40;</span>item<span style="color: black;">&#41;</span>
    <span style="color: #ff7700;font-weight:bold;">return</span> uniq</pre></div></div>

<p>It&#8217;s two more lines, but it&#8217;s less weird looking and wasteful. &quot;Better for everybody who reads <strong>and</strong> runs your code,&quot; means <a class="reference external" href="http://steve-yegge.blogspot.com/2008/09/programmings-dirtiest-little-secret.html">you should do it</a>.</p>
<p>Moral of the story: a list comprehension isn&#8217;t just, &quot;shorthand for a loop.&quot; It&#8217;s shorthand for a transform from an input sequence to an output sequence with an optional filter. If it gets too complex or weird looking, just make a loop.  It&#8217;s not that hard and readers of your code will thank you.</p>
<p>Weird thing number two: the transform, <tt class="docutils literal"><span class="pre">list.append(item)</span></tt>, produces <tt class="docutils literal"><span class="pre">None</span></tt> as its output value, because the return value from <tt class="docutils literal"><span class="pre">list.append</span></tt> is always <tt class="docutils literal"><span class="pre">None</span></tt>. Therefore, the <em>result</em>, even though it isn&#8217;t kept anywhere, is a list of <tt class="docutils literal"><span class="pre">None</span></tt> values, one for each item that passes the filter clause (so it ends up the same length as <tt class="docutils literal"><span class="pre">uniq</span></tt>, not <tt class="docutils literal"><span class="pre">seq</span></tt>).</p>
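<p>To see the discarded list for yourself, you can bind it to a name. A small sketch, where the input list and the name <tt class="docutils literal"><span class="pre">result</span></tt> are made up for illustration; the list holds one <tt class="docutils literal"><span class="pre">None</span></tt> per appended item:</p>

```python
uniq = []
# Keep the comprehension result around instead of discarding it.
result = [uniq.append(item) for item in [1, 2, 1, 3] if not uniq.count(item)]

print(result)  # prints [None, None, None]
print(uniq)    # prints [1, 2, 3]
```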
<p>Weird thing number three: <tt class="docutils literal"><span class="pre">list.count(item)</span></tt> iterates over every element in the <tt class="docutils literal"><span class="pre">list</span></tt> looking for things that <tt class="docutils literal"><span class="pre">==</span></tt> to <tt class="docutils literal"><span class="pre">item</span></tt>. If you think through the case where you call <tt class="docutils literal"><span class="pre">collapse</span></tt> on an entirely unique sequence, you can tell that the collapse algorithm is O(n<sup>2</sup>).  In fact, it&#8217;s even worse than it may seem at first glance, because <tt class="docutils literal"><span class="pre">count</span></tt> will keep going all the way to the end of <tt class="docutils literal"><span class="pre">uniq</span></tt>, even if it finds <tt class="docutils literal"><span class="pre">item</span></tt> in the first index of <tt class="docutils literal"><span class="pre">uniq</span></tt>. What the original author really wanted was <tt class="docutils literal"><span class="pre">item</span> <span class="pre">not</span> <span class="pre">in</span> <span class="pre">uniq</span></tt>, which bails out early if it finds <tt class="docutils literal"><span class="pre">item</span></tt> in <tt class="docutils literal"><span class="pre">uniq</span></tt>.</p>
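<p>With the membership test swapped in, the loop version might look like the sketch below. It is still O(n<sup>2</sup>) in the worst case, but each check can stop at the first match:</p>

```python
def collapse(seq):
    """Remove duplicates from seq, preserving first-seen order."""
    uniq = []
    for item in seq:
        # "not in" stops scanning as soon as an equal element is found.
        if item not in uniq:
            uniq.append(item)
    return uniq


print(collapse([3, 1, 3, 2, 1]))  # prints [3, 1, 2]
```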
<p>Also worth mentioning for the computer-sciency folk playing along at home: if all elements of the sequence are comparable, you can bring that down to O(n * log n) by using a &quot;shadow&quot; sorted sequence and <a class="reference external" href="http://docs.python.org/library/bisect.html#module-bisect">bisecting</a> to test for membership. If the elements of the sequence are hashable you can bring it down to O(n), perhaps by using the built-in <a class="reference external" href="http://docs.python.org/library/stdtypes.html#set">set</a> datatype if you are in Python &gt;= 2.4 (or the <tt class="docutils literal"><span class="pre">sets</span></tt> module in Python 2.3). Note that the common cases of strings, numbers, and tuples of hashable items are hashable.</p>
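<p>Here is a rough sketch of both ideas. The function names are invented for this example. One caveat on the bisect variant: the number of comparisons is O(n * log n), but inserting into the middle of a Python list still shifts elements, so a balanced tree or similar structure would be needed to get that bound for the total work.</p>

```python
import bisect


def collapse_hashable(seq):
    """Order-preserving uniq for hashable items, using a set for lookups."""
    seen = set()
    uniq = []
    for item in seq:
        if item not in seen:
            seen.add(item)
            uniq.append(item)
    return uniq


def collapse_comparable(seq):
    """Order-preserving uniq for comparable items, using a sorted shadow list."""
    shadow = []  # Kept sorted so that bisect can binary-search it.
    uniq = []
    for item in seq:
        pos = bisect.bisect_left(shadow, item)
        # item is new if the search ran off the end or landed on a different value.
        if pos == len(shadow) or shadow[pos] != item:
            shadow.insert(pos, item)
            uniq.append(item)
    return uniq


print(collapse_hashable("mississippi"))         # prints ['m', 'i', 's', 'p']
print(collapse_comparable([3, 1, 3, 2, 1, 2]))  # prints [3, 1, 2]
```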
</div>
<div class="section" id="from-python-history">
<h3>From Python history</h3>
<p>It&#8217;s interesting to note that <a class="reference external" href="http://www.python.org/dev/peps/pep-0270/">Python Enhancement Proposal (PEP) #270</a> considered putting a <tt class="docutils literal"><span class="pre">uniq</span></tt> function into the language distribution, but withdrew it with the following statement:</p>
<blockquote><p>
Removing duplicate elements from a list is a common task, but there are only two reasons I can see for making it a built-in. The first is if it could be done much faster, which isn&#8217;t the case.  The second is if it makes it significantly easier to write code.  The introduction of <tt class="docutils literal"><span class="pre">sets.py</span></tt> eliminates this situation since creating a sequence without duplicates is just a matter of choosing a different data structure: a set instead of a list.</p></blockquote>
<p>Remember that sets can only contain hashable elements (same policy as dictionary keys) and are therefore not suitable for all uniq-ifying tasks, as mentioned in the last paragraph of the previous section.</p>
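<p>As a quick illustration of that limitation, here is a sketch with a made-up list of lists: building a set from it raises <tt class="docutils literal"><span class="pre">TypeError</span></tt>, while the plain membership-test loop still works because it only needs equality.</p>

```python
rows = [[1, 2], [3], [1, 2]]  # Lists are mutable, hence unhashable.

try:
    set(rows)
except TypeError as exc:
    print("cannot build a set: %s" % exc)

# Fall back to the quadratic, equality-based approach.
uniq = []
for row in rows:
    if row not in uniq:
        uniq.append(row)

print(uniq)  # prints [[1, 2], [3]]
```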
</div>
<div class="section" id="footnotes">
<h3>Footnotes</h3>
<table class="docutils footnote" frame="void" id="id2" rules="none">
<colgroup>
<col class="label" />
<col /></colgroup>
<tbody valign="top">
<tr>
<td class="label"><a class="fn-backref" href="#id1"><tt>[*]</tt></a></td>
<td>&quot;Iteration&quot; is just a fancy word for &quot;step through the sequence, element by element, and give that element a name.&quot; In our case we&#8217;re giving the name <tt class="docutils literal"><span class="pre">i</span></tt>.</td>
</tr>
</tbody>
</table>
</div>
]]></content:encoded>
			<wfw:commentRss>http://blog.cdleary.com/2010/04/learning-python-by-example-list-comprehensions/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Code ☃ Unicode</title>
		<link>http://blog.cdleary.com/2010/04/code-%e2%98%83-unicode/</link>
		<comments>http://blog.cdleary.com/2010/04/code-%e2%98%83-unicode/#comments</comments>
		<pubDate>Thu, 01 Apr 2010 19:00:36 +0000</pubDate>
		<dc:creator>cdleary</dc:creator>
				<category><![CDATA[JavaScript]]></category>
		<category><![CDATA[Languages]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[Sarcasm]]></category>
		<category><![CDATA[The Web]]></category>
		<category><![CDATA[Cpp]]></category>
		<category><![CDATA[Snowman]]></category>
		<category><![CDATA[Syntax]]></category>

		<guid isPermaLink="false">http://blog.cdleary.com/?p=739</guid>
		<description><![CDATA[Let&#8217;s come to terms: angle brackets and forward slashes are overloaded. Between relational operators, templates, bitwise shift operators, XML tags, (HTML/squiggly brace language) comments, division, regular expressions, and path separators, what don&#8217;t they do? I think it&#8217;s clear to everyone that XML is the best and most human readable markup format ever conceived (for all [...]]]></description>
			<content:encoded><![CDATA[<p>Let&#8217;s come to terms: angle brackets and forward slashes are overloaded. Between relational operators, templates, bitwise shift operators, XML tags, (HTML/squiggly brace language) comments, division, regular expressions, and path separators, what don&#8217;t they do?</p>
<p>I think it&#8217;s clear to everyone that XML is the best and most human readable markup format ever conceived (for all data serialization and database backing store applications without exception), so it&#8217;s time for all that crufty old junk from yesteryear to learn its place. Widely adopted web standards (such as <a class="reference external" href="http://en.wikipedia.org/wiki/Binary_XML">Binary XML</a> and <a class="reference external" href="http://en.wikipedia.org/wiki/ECMAScript_for_XML">E4X</a>) and well specified information exchange protocols (such as <a class="reference external" href="http://en.wikipedia.org/wiki/SOAP#Technical_critique">SOAP</a>) speak for themselves through the synergy they&#8217;ve utilized in enterprise compute environments.</p>
<p>The results of a confidential survey I conducted conclusively demonstrate beyond any possibility of refutation that you type more angle brackets in an average markup document than you will type angle-bracket relational operators for the next ten years.</p>
<p>In conclusion, your life expectancy decreases as you continue to use the less-than operator and forward slash instead of accepting XML into your heart as a first-class syntax. I understand that some may not enjoy life or the pursuit of happiness and that they will continue to use deprecated syntaxes. To each their own.</p>
<p>I have contributed a <a class="reference external" href="https://bugzilla.mozilla.org/show_bug.cgi?id=556488">JavaScript parser patch</a> to rectify the situation: the ☃ operator is a heart-warming replacement for the (now XML-exclusive) pointy-on-the-left angle bracket and the commonly seen tilde diaeresis ⍨ replaces slash for delimiting regular expressions. I am confident this patch will achieve swift adoption, as it decreases the context sensitivity of the JavaScript parser, which is a clear and direct benefit for browser end users.</p>
<p>The (intolerably whitespace-sensitive) Python programming language nearly came to a similar conclusion to <a class="reference external" href="http://www.python.org/dev/peps/pep-3117/">use unicode more pervasively</a>, while simultaneously making it a <em>real</em> programming language by way of the use of types, but did not have enough garments to see it through.</p>
<p>Another interesting benefit: because JavaScript files may be UTF-16 encoded, this increases the utilization of bytes in the source text by filling the upper octets with non-zero values. This, in the aggregate, will increase the meaningful bandwidth utilization of the Internet as a whole.</p>
<p>Of course, I&#8217;d also recommend that C++ solve its nested template delimiter issue with ☃ and ☼ to close instead of increasing the context-sensitivity of the parser. <a class="footnote-reference" href="#id7" id="id6"><tt>[*]</tt></a> It clearly follows the logical flow of start/end delimiting.</p>
<p>As soon as <a class="reference external" href="http://sites.google.com/site/unicodesymbols/Home/emoji-symbols">Emoji are accepted as proper unicode code points</a>, I will revise my recommendation to suggest using the standard <a class="reference external" href="http://gmailblog.blogspot.com/2009/04/new-in-labs-extra-emoticons.html">poo</a> emoticon for a template start delimiter, because <a class="reference external" href="http://colloquy.info/project/ticket/1411">increased giggling</a> is demonstrated to reduce the likelihood of head-and-wall involved injuries during C++ compilation, second only to regular use of head protection while programming.</p>
<div class="section" id="footnotes">
<h3>Footnotes</h3>
<table class="docutils footnote" frame="void" id="id7" rules="none">
<colgroup>
<col class="label" />
<col /></colgroup>
<tbody valign="top">
<tr>
<td class="label"><a class="fn-backref" href="#id6"><tt>[*]</tt></a></td>
<td>Which provides a direct detriment to the end user &#8212; optimizing compilers spend most of their time in the parser.</td>
</tr>
</tbody>
</table>
</div>
]]></content:encoded>
			<wfw:commentRss>http://blog.cdleary.com/2010/04/code-%e2%98%83-unicode/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Two postfix operations redux: sequence points</title>
		<link>http://blog.cdleary.com/2010/01/two-postfix-operations-redux-sequence-points/</link>
		<comments>http://blog.cdleary.com/2010/01/two-postfix-operations-redux-sequence-points/#comments</comments>
		<pubDate>Sun, 10 Jan 2010 19:00:43 +0000</pubDate>
		<dc:creator>cdleary</dc:creator>
				<category><![CDATA[C]]></category>
		<category><![CDATA[Languages]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[Java]]></category>
		<category><![CDATA[Language Lawyering]]></category>
		<category><![CDATA[Sequence Points]]></category>

		<guid isPermaLink="false">http://blog.cdleary.com/?p=709</guid>
		<description><![CDATA[Describes sequence points and the postincrement operator in C, contrasts with the Java language, and notes the expressions in Python that unfortunately look like preincrement/predecrement.]]></description>
			<content:encoded><![CDATA[<p>Get ready for some serious language lawyering.</p>
<p>I was going back and converting my old entries to <a class="reference external" href="http://en.wikipedia.org/wiki/ReStructuredText">reStructuredText</a> when I found <a class="reference external" href="http://blog.cdleary.com/2007/09/two-postfix-operations-in-a-single-statement-in-gcc/">an entry in which I was wrong</a>! (Shocking, I know.)</p>
<div class="section" id="c">
<h3>C</h3>
<p>Stupid old me didn&#8217;t know about sequence points back in 2007: the effects of the <tt class="docutils literal"><span class="pre">++</span></tt> operator in the C expression <tt class="docutils literal"><span class="pre">i++</span> <span class="pre">*</span> <span class="pre">i++</span></tt> are in an indeterminate state of side-effect completion until one of the language-defined <a class="reference external" href="http://en.wikipedia.org/wiki/Sequence_point">sequence points</a> is encountered (e.g. a semicolon or function invocation).</p>
<p>From the C99 standard 6.5.2.4 item 2 regarding the postfix increment and decrement operators:</p>
<blockquote><p>
The result of the postfix ++ operator is the value of the operand. After the result is obtained, the value of the operand is incremented. The side effect of updating the stored value of the operand shall occur between the previous and the next sequence point.</p></blockquote>
<p>Therefore, the compiler is totally at liberty to interpret that expression as:</p>

<div class="wp_syntax"><div class="code"><pre class="asm" style="font-family:monospace;"><span style="color: #00007f; font-weight: bold;">mov</span> lhs_result<span style="color: #339933;">,</span> i     <span style="color: #666666; font-style: italic;">; Copy the values of the postincrement evaluation.</span>
<span style="color: #00007f; font-weight: bold;">mov</span> rhs_result<span style="color: #339933;">,</span> i     <span style="color: #666666; font-style: italic;">; (Which is the original value of i.)</span>
<span style="color: #00007f; font-weight: bold;">mul</span> result<span style="color: #339933;">,</span> lhs_result<span style="color: #339933;">,</span> rhs_result
<span style="color: #00007f; font-weight: bold;">add</span> i<span style="color: #339933;">,</span> lhs_result<span style="color: #339933;">,</span> <span style="color: #0000ff;">1</span>
<span style="color: #00007f; font-weight: bold;">add</span> i<span style="color: #339933;">,</span> rhs_result<span style="color: #339933;">,</span> <span style="color: #0000ff;">1</span>  <span style="color: #666666; font-style: italic;">; Second increment clobbers with the same value!</span></pre></div></div>

<p>This produces the same result as the GCC compilation in the referenced entry: <tt class="docutils literal"><span class="pre">i</span></tt> is <tt class="docutils literal"><span class="pre">12</span></tt> and the result is <tt class="docutils literal"><span class="pre">121</span></tt>.</p>
<p>As I mentioned before, the reason this can occur is that nothing in the syntax <em>forces</em> the first postincrement to be evaluated before the second one. To give an analogy to concurrency constructs: you have a kind of compile-time &quot;race condition&quot; in your syntax between the two postincrements that could be solved with a sequence point &quot;barrier&quot;. <a class="footnote-reference" href="#id3" id="id2"><tt>[*]</tt></a></p>
<p>In this assembly, those <tt class="docutils literal"><span class="pre">add</span></tt>s can float anywhere they like after their corresponding <tt class="docutils literal"><span class="pre">mov</span></tt> instruction and can operate directly on <tt class="docutils literal"><span class="pre">i</span></tt> instead of the temporary if they&#8217;d prefer. Here&#8217;s a possible sequence that results in a value of <tt class="docutils literal"><span class="pre">132</span></tt> and <tt class="docutils literal"><span class="pre">i</span></tt> as <tt class="docutils literal"><span class="pre">13</span></tt>.</p>

<div class="wp_syntax"><div class="code"><pre class="asm" style="font-family:monospace;"><span style="color: #00007f; font-weight: bold;">mov</span> lhs_result<span style="color: #339933;">,</span> i <span style="color: #666666; font-style: italic;">; Gets the original 11.</span>
<span style="color: #00007f; font-weight: bold;">inc</span> i             <span style="color: #666666; font-style: italic;">; Increment in-place after the start value is copied.</span>
<span style="color: #00007f; font-weight: bold;">mov</span> rhs_result<span style="color: #339933;">,</span> i <span style="color: #666666; font-style: italic;">; Gets the new value 12.</span>
<span style="color: #00007f; font-weight: bold;">inc</span> i             <span style="color: #666666; font-style: italic;">; Increment occurs in-place again, making 13.</span>
<span style="color: #00007f; font-weight: bold;">mul</span> result<span style="color: #339933;">,</span> lhs_result<span style="color: #339933;">,</span> rhs_result</pre></div></div>

<p>Even if you know what you&#8217;re doing, mixing two postfix operations, or <em>any</em> side effect, using the less obvious sequence points (like function invocation) is dangerous and easy to get wrong. Clearly it is not a best practice. <a class="footnote-reference" href="#id5" id="id4"><tt>[†]</tt></a></p>
</div>
<div class="section" id="java">
<h3>Java</h3>
<p><a class="reference external" href="http://stackoverflow.com/questions/654715/in-java-how-does-a-post-increment-operator-act-in-a-return-statement/654735#654735">Through experimentation</a>, the postincrement operation appears to have sequence-point-like semantics in the Java language, and it does! From the Java language specification (page 416):</p>
<blockquote><p>
The Java programming language also guarantees that every operand of an operator (except the conditional operators &amp;&amp;, ||, and ? :) appears to be fully evaluated before any part of the operation itself is performed.</p></blockquote>
<p>Which combines with the definition of the postfix increment expression (page 485):</p>
<blockquote><p>
A postfix expression followed by a ++ operator is a postfix increment expression.</p></blockquote>
<p>As well as left-to-right expression evaluation (page 415):</p>
<blockquote><p>
The left-hand operand of a binary operator appears to be fully evaluated before any part of the right-hand operand is evaluated.</p></blockquote>
<p>To a definitive conclusion that <tt class="docutils literal"><span class="pre">i++</span> <span class="pre">*</span> <span class="pre">i++</span></tt> will always result in <tt class="docutils literal"><span class="pre">132</span> <span class="pre">==</span> <span class="pre">11</span> <span class="pre">*</span> <span class="pre">12</span></tt> and <tt class="docutils literal"><span class="pre">i</span> <span class="pre">==</span> <span class="pre">13</span></tt> when <tt class="docutils literal"><span class="pre">i</span> <span class="pre">==</span> <span class="pre">11</span></tt> to start.</p>
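<p>As a quick check of that conclusion, here is a minimal sketch. The class and method names are made up for illustration and are not part of the original entry.</p>

```java
// Sketch: exercises the left-to-right operand evaluation quoted above.
// PostIncrementDemo and multiplyPostIncrements are hypothetical names.
class PostIncrementDemo {

    // Returns {product, final value of i} for the expression i++ * i++.
    static int[] multiplyPostIncrements(int start) {
        int i = start;
        // The left operand is fully evaluated (and i incremented)
        // before the right operand is evaluated.
        int product = i++ * i++;
        return new int[] { product, i };
    }

    public static void main(String[] args) {
        int[] r = multiplyPostIncrements(11);
        System.out.println(r[0]); // 132
        System.out.println(r[1]); // 13
    }
}
```

<p>Unlike the C version, this result does not depend on the compiler: the specification pins down both the operand order and the point at which each increment takes effect.</p>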
</div>
<div class="section" id="python">
<h3>Python</h3>
<p>Python has no increment operators specifically so you don&#8217;t have to deal with this kind of nonsense.</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #66cc66;">&gt;&gt;&gt;</span> count = <span style="color: #ff4500;">0</span>
<span style="color: #66cc66;">&gt;&gt;&gt;</span> count++
  File <span style="color: #483d8b;">&quot;&lt;stdin&gt;&quot;</span>, line <span style="color: #ff4500;">1</span>
    count++
          ^
<span style="color: #008000;">SyntaxError</span>: invalid syntax</pre></div></div>

<p>Annoyingly for newbies, though, it looks like <tt class="docutils literal"><span class="pre">++count</span></tt> is a valid expression that <a class="reference external" href="http://stackoverflow.com/questions/1485841/python-behaviour-of-increment-and-decrement-operators/1485854#1485854">happens to look like preincrement</a>.</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #66cc66;">&gt;&gt;&gt;</span> count = <span style="color: #ff4500;">0</span>
<span style="color: #66cc66;">&gt;&gt;&gt;</span> ++count
<span style="color: #ff4500;">0</span>
<span style="color: #66cc66;">&gt;&gt;&gt;</span> --count
<span style="color: #ff4500;">0</span></pre></div></div>

<p>They&#8217;re actually two unary positive operators and two unary negative operators, respectively. Just one of the hazards of a context-free grammar, I suppose.</p>
</div>
<div class="section" id="footnotes">
<h3>Footnotes</h3>
<table class="docutils footnote" frame="void" id="id3" rules="none">
<colgroup>
<col class="label" />
<col /></colgroup>
<tbody valign="top">
<tr>
<td class="label"><a class="fn-backref" href="#id2"><tt>[*]</tt></a></td>
<td>I threw this in because the ordeal reminds me of the classic bank account concurrency problem. If it&#8217;s more confusing than descriptive, please ignore it. :-)</td>
</tr>
</tbody>
</table>
<table class="docutils footnote" frame="void" id="id5" rules="none">
<colgroup>
<col class="label" />
<col /></colgroup>
<tbody valign="top">
<tr>
<td class="label"><a class="fn-backref" href="#id4"><tt>[†]</tt></a></td>
<td>
<p class="first">Since function invocation defines sequence points, I <em>thought</em> this code sequence guaranteed those results:</p>

<div class="wp_syntax"><div class="code"><pre class="c" style="font-family:monospace;"><span style="color: #339933;">#include &lt;stdio.h&gt;</span>
&nbsp;
<span style="color: #993333;">int</span> identity<span style="color: #009900;">&#40;</span><span style="color: #993333;">int</span> value<span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span> <span style="color: #b1b100;">return</span> value<span style="color: #339933;">;</span> <span style="color: #009900;">&#125;</span>
&nbsp;
<span style="color: #993333;">int</span> main<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
        <span style="color: #993333;">int</span> i <span style="color: #339933;">=</span> <span style="color: #0000dd;">11</span><span style="color: #339933;">;</span>
        <span style="color: #000066;">printf</span><span style="color: #009900;">&#40;</span><span style="color: #ff0000;">&quot;%d<span style="color: #000099; font-weight: bold;">\n</span>&quot;</span><span style="color: #339933;">,</span> identity<span style="color: #009900;">&#40;</span>i<span style="color: #339933;">++</span><span style="color: #009900;">&#41;</span> <span style="color: #339933;">*</span> identity<span style="color: #009900;">&#40;</span>i<span style="color: #339933;">++</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
        <span style="color: #000066;">printf</span><span style="color: #009900;">&#40;</span><span style="color: #ff0000;">&quot;%d<span style="color: #000099; font-weight: bold;">\n</span>&quot;</span><span style="color: #339933;">,</span> i<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
        <span style="color: #b1b100;">return</span> <span style="color: #0000dd;">0</span><span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span></pre></div></div>

<p>As <a class="reference external" href="http://blog.cdleary.com/2010/01/two-postfix-operations-redux-sequence-points/comment-page-1/#comment-1393">Dan points out</a>, the order of evaluation is totally unspecified: the left-hand and right-hand subexpressions can potentially be evaluated <em>concurrently</em>.</p>
</td>
</tr>
</tbody>
</table>
</div>
]]></content:encoded>
			<wfw:commentRss>http://blog.cdleary.com/2010/01/two-postfix-operations-redux-sequence-points/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>Inedible vectors of spam: learning non-reified generics by example</title>
		<link>http://blog.cdleary.com/2009/12/inedible-vectors-of-spam-learning-non-reified-generics-by-example/</link>
		<comments>http://blog.cdleary.com/2009/12/inedible-vectors-of-spam-learning-non-reified-generics-by-example/#comments</comments>
		<pubDate>Fri, 04 Dec 2009 17:00:14 +0000</pubDate>
		<dc:creator>cdleary</dc:creator>
				<category><![CDATA[Languages]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[Dynamism]]></category>
		<category><![CDATA[Introductory]]></category>
		<category><![CDATA[Java]]></category>

		<guid isPermaLink="false">http://blog.cdleary.com/?p=677</guid>
		<description><![CDATA[In the Java generic system, the collections are represented by two separate, yet equally important concepts -- the *compile-time* generic parameters and the *run-time* casts who check the collection members. These are their stories.]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve been playing with the new Java features, only having done minor projects in it since 1.4, and there have been a lot of nice improvements!  One thing that made me do a double take, however, was a run-in with non-reified types in Java generics. Luckily, <a class="reference external" href="http://twitter.com/pvirdone">one of my Java-head friends</a> was online and beat me with a <a class="reference external" href="http://www.kwanumzen.org/pzc/oldnewsletter/v03n02-1975-february-dssn-buddhasenlightenmentdayspeech.html">stick of enlightenment</a> until I understood what was going on.</p>
<p>In the Java generic system, the collections are represented by two separate, yet equally important concepts &#8212; the <em>compile-time</em> generic parameters and the <em>run-time</em> casts who check the collection members. These are their stories.</p>
<p><a class="reference external" href="http://en.wikipedia.org/wiki/Law_%26_Order">Dun, dun!</a></p>
<div class="section" id="an-example-worth-a-thousand-lame-intros">
<h3>An example: worth a thousand lame intros</h3>
<p>The following code is a distilled representation of the situation I encountered:</p>

<div class="wp_syntax"><div class="code"><pre class="java" style="font-family:monospace;"><span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">java.util.Arrays</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">java.util.List</span><span style="color: #339933;">;</span>
&nbsp;
<span style="color: #000000; font-weight: bold;">public</span> <span style="color: #000000; font-weight: bold;">class</span> BadCast <span style="color: #009900;">&#123;</span>
&nbsp;
    <span style="color: #000000; font-weight: bold;">static</span> <span style="color: #000000; font-weight: bold;">interface</span> Edible <span style="color: #009900;">&#123;</span><span style="color: #009900;">&#125;</span>
&nbsp;
    <span style="color: #000000; font-weight: bold;">static</span> <span style="color: #000000; font-weight: bold;">class</span> Spam <span style="color: #000000; font-weight: bold;">implements</span> Edible <span style="color: #009900;">&#123;</span><span style="color: #009900;">&#125;</span>
&nbsp;
    List<span style="color: #339933;">&lt;</span>Spam<span style="color: #339933;">&gt;</span> canSomeSpam<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
        <span style="color: #000000; font-weight: bold;">return</span> <span style="color: #003399;">Arrays</span>.<span style="color: #006633;">asList</span><span style="color: #009900;">&#40;</span><span style="color: #000000; font-weight: bold;">new</span> Spam<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span>, <span style="color: #000000; font-weight: bold;">new</span> Spam<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span>, <span style="color: #000000; font-weight: bold;">new</span> Spam<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    <span style="color: #009900;">&#125;</span>
&nbsp;
    <span style="color: #008000; font-style: italic; font-weight: bold;">/**
     * @note Return type *must* be List&lt;Edible&gt; (because we intend to
     *       implement an interface that requires it).
     */</span>
    List<span style="color: #339933;">&lt;</span>Edible<span style="color: #339933;">&gt;</span> castSomeSpam<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
        <span style="color: #000000; font-weight: bold;">return</span> <span style="color: #009900;">&#40;</span>List<span style="color: #339933;">&lt;</span>Edible<span style="color: #339933;">&gt;</span><span style="color: #009900;">&#41;</span> canSomeSpam<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    <span style="color: #009900;">&#125;</span>
&nbsp;
<span style="color: #009900;">&#125;</span></pre></div></div>

<p>It produced the following error in my IDE:</p>
<blockquote><p>
Cannot cast from <tt class="docutils literal"><span class="pre">List&lt;BadCast.Spam&gt;</span></tt> to <tt class="docutils literal"><span class="pre">List&lt;BadCast.Edible&gt;</span></tt></p></blockquote>
<p>At which point I scratched my head and thought, &quot;If all <tt class="docutils literal"><span class="pre">Spam</span></tt>s are <tt class="docutils literal"><span class="pre">Edible</span></tt>, <a class="footnote-reference" href="#id5" id="id4"><tt>[*]</tt></a> why won&#8217;t it let me cast <tt class="docutils literal"><span class="pre">List&lt;Spam&gt;</span></tt> to <tt class="docutils literal"><span class="pre">List&lt;Edible&gt;</span></tt>? This seems silly.&quot;</p>
</div>
<div class="section" id="potential-for-error">
<h3>Potential for error</h3>
<p>A slightly expanded example points out where that simple view goes wrong: <a class="footnote-reference" href="#id7" id="id6"><tt>[†]</tt></a></p>

<div class="wp_syntax"><div class="code"><pre class="java" style="font-family:monospace;"><span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">java.util.Arrays</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">java.util.List</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">java.util.Vector</span><span style="color: #339933;">;</span>
&nbsp;
<span style="color: #000000; font-weight: bold;">public</span> <span style="color: #000000; font-weight: bold;">class</span> GenericFun <span style="color: #000000; font-weight: bold;">implements</span> <span style="color: #003399;">Runnable</span> <span style="color: #009900;">&#123;</span>
&nbsp;
    <span style="color: #000000; font-weight: bold;">static</span> <span style="color: #000000; font-weight: bold;">interface</span> Edible <span style="color: #009900;">&#123;</span><span style="color: #009900;">&#125;</span>
&nbsp;
    <span style="color: #000000; font-weight: bold;">static</span> <span style="color: #000000; font-weight: bold;">class</span> Spam <span style="color: #000000; font-weight: bold;">implements</span> Edible <span style="color: #009900;">&#123;</span>
&nbsp;
        <span style="color: #000066; font-weight: bold;">void</span> decompose<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span><span style="color: #009900;">&#125;</span>
    <span style="color: #009900;">&#125;</span>
&nbsp;
    List<span style="color: #339933;">&lt;</span>Spam<span style="color: #339933;">&gt;</span> canSomeSpam<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
        <span style="color: #000000; font-weight: bold;">return</span> <span style="color: #003399;">Arrays</span>.<span style="color: #006633;">asList</span><span style="color: #009900;">&#40;</span><span style="color: #000000; font-weight: bold;">new</span> Spam<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span>, <span style="color: #000000; font-weight: bold;">new</span> Spam<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span>, <span style="color: #000000; font-weight: bold;">new</span> Spam<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    <span style="color: #009900;">&#125;</span>
&nbsp;
    <span style="color: #008000; font-style: italic; font-weight: bold;">/**
     * Loves to stick his apples into things.
     */</span>
    <span style="color: #000000; font-weight: bold;">static</span> <span style="color: #000000; font-weight: bold;">class</span> JohnnyAppleseed <span style="color: #009900;">&#123;</span>
&nbsp;
        <span style="color: #000000; font-weight: bold;">static</span> <span style="color: #000000; font-weight: bold;">class</span> Apple <span style="color: #000000; font-weight: bold;">implements</span> Edible <span style="color: #009900;">&#123;</span><span style="color: #009900;">&#125;</span>
&nbsp;
        JohnnyAppleseed<span style="color: #009900;">&#40;</span>List<span style="color: #339933;">&lt;</span>Edible<span style="color: #339933;">&gt;</span> edibles<span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
            edibles.<span style="color: #006633;">add</span><span style="color: #009900;">&#40;</span><span style="color: #000000; font-weight: bold;">new</span> Apple<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
        <span style="color: #009900;">&#125;</span>
&nbsp;
    <span style="color: #009900;">&#125;</span>
&nbsp;
    @Override
    <span style="color: #000000; font-weight: bold;">public</span> <span style="color: #000066; font-weight: bold;">void</span> run<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
        List<span style="color: #339933;">&lt;</span>Spam<span style="color: #339933;">&gt;</span> spams <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> Vector<span style="color: #339933;">&lt;</span>Spam<span style="color: #339933;">&gt;</span><span style="color: #009900;">&#40;</span>canSomeSpam<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
        List<span style="color: #339933;">&lt;</span>Edible<span style="color: #339933;">&gt;</span> edibles <span style="color: #339933;">=</span> <span style="color: #009900;">&#40;</span>List<span style="color: #339933;">&lt;</span>Edible<span style="color: #339933;">&gt;</span><span style="color: #009900;">&#41;</span> spams<span style="color: #339933;">;</span>
        <span style="color: #000000; font-weight: bold;">new</span> JohnnyAppleseed<span style="color: #009900;">&#40;</span>edibles<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span> <span style="color: #666666; font-style: italic;">// He puts his apple in our spams!</span>
        <span style="color: #000000; font-weight: bold;">for</span> <span style="color: #009900;">&#40;</span>Spam s <span style="color: #339933;">:</span> spams<span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
            s.<span style="color: #006633;">decompose</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span> <span style="color: #666666; font-style: italic;">// What does this do when it gets to the apple!?</span>
        <span style="color: #009900;">&#125;</span>
    <span style="color: #009900;">&#125;</span>
<span style="color: #009900;">&#125;</span></pre></div></div>

<p>We make a (mutable) collection of spams, but this time, unlike in the previous example, <em>we keep a reference to that collection</em>. Then, when we give it to <tt class="docutils literal"><span class="pre">JohnnyAppleseed</span></tt>, he sticks a damn <tt class="docutils literal"><span class="pre">Apple</span></tt> in there, invalidating the supposed type of <tt class="docutils literal"><span class="pre">spams</span></tt>!  (If you still don&#8217;t see it, note that the object referenced by spams is aliased to edibles.) Then, when we invoke the <tt class="docutils literal"><span class="pre">decompose</span></tt> method on the <tt class="docutils literal"><span class="pre">Apple</span></tt> that is confused with a <tt class="docutils literal"><span class="pre">Spam</span></tt>, what could possibly happen?!</p>
<p><center><object width="425" height="344"><param name="movie" value="http://www.youtube.com/v/bpjXFF0zIJY&#038;hl=en_US&#038;fs=1&#038;start=22"></param><param name="allowFullScreen" value="true"></param><param name="allowscriptaccess" value="always"></param><embed src="http://www.youtube.com/v/bpjXFF0zIJY&#038;hl=en_US&#038;fs=1&#038;start=22" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="425" height="344"></embed></object></center></div>
<div class="section" id="the-red-pill-there-is-no-runtime-generic-type-parameterization">
<h3>The red pill: there is no runtime-generic-type-parameterization!</h3>
<p>Though the above code won&#8217;t compile, this kind of thing actually <em>is</em> possible, and it&#8217;s where the implementation of generics starts to <a class="reference external" href="http://en.wikipedia.org/wiki/Leaky_abstraction">leak through the abstraction</a>.  To quote <a class="reference external" href="http://gafter.blogspot.com/2006/11/reified-generics-for-java.html">Neal Gafter</a>:</p>
<blockquote>
<p>Many people are unsatisfied with the restrictions caused by the way generics are implemented in Java. Specifically, they are unhappy that generic type parameters are not reified: they are not available at runtime. Generics are implemented using erasure, in which generic type parameters are simply removed at runtime. That doesn&#8217;t render generics useless, because you get typechecking at compile-time based on the generic type parameters, and also because the compiler inserts casts in the code (so that you don&#8217;t have to) based on the type parameters.</p>
<p>&#8230;</p>
<p>The implementation of generics using erasure also causes Java to have unchecked operations, which are operations that would normally check something at runtime but can&#8217;t do so because not enough information is available. For example, a cast to the type List&lt;String&gt; is an unchecked cast, because the generated code checks that the object is a List but doesn&#8217;t check whether it is the right kind of list.</p>
</blockquote>
<p>At runtime, <tt class="docutils literal"><span class="pre">List&lt;Edible&gt;</span></tt> is no different from <tt class="docutils literal"><span class="pre">List</span></tt>. At <em>compile</em>-time, however, a <tt class="docutils literal"><span class="pre">List&lt;Spam&gt;</span></tt> cannot be cast to <tt class="docutils literal"><span class="pre">List&lt;Edible&gt;</span></tt>, because the compiler knows what evil things you could then do (like sticking <tt class="docutils literal"><span class="pre">Apple</span></tt>s in there).</p>
<p>But if you <em>did</em> stick an <tt class="docutils literal"><span class="pre">Apple</span></tt> in there (like I told you that you <em>can</em> actually do, with evidence to follow shortly), you wouldn&#8217;t know anything was wrong until you tried to <em>use</em> it like a <tt class="docutils literal"><span class="pre">Spam</span></tt>. This is a clear violation of the &quot;error out early&quot; policy that allows you to localize your debugging. <a class="footnote-reference" href="#id10" id="id8"><tt>[‡]</tt></a></p>
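<p>The erasure claim is easy to check for yourself. The following is a minimal sketch, with a made-up class name and <tt class="docutils literal"><span class="pre">ArrayList</span></tt> chosen arbitrarily as the backing collection:</p>

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: two differently parameterized lists share one runtime class,
// because the type parameters are erased. ErasureDemo is a hypothetical name.
class ErasureDemo {

    interface Edible {}

    static class Spam implements Edible {}

    static boolean sameRuntimeClass() {
        List<Spam> spams = new ArrayList<Spam>();
        List<Edible> edibles = new ArrayList<Edible>();
        // Both objects are plain java.util.ArrayList instances at runtime.
        return spams.getClass() == edibles.getClass();
    }

    public static void main(String[] args) {
        System.out.println(sameRuntimeClass()); // true
    }
}
```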
<p>In what way does the program error out when you try to use the masquerading <tt class="docutils literal"><span class="pre">Apple</span></tt> like a <tt class="docutils literal"><span class="pre">Spam</span></tt>?  Well, when you write:</p>

<div class="wp_syntax"><div class="code"><pre class="java" style="font-family:monospace;"><span style="color: #000000; font-weight: bold;">for</span> <span style="color: #009900;">&#40;</span>Spam s <span style="color: #339933;">:</span> spams<span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
    s.<span style="color: #006633;">decompose</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span> <span style="color: #666666; font-style: italic;">// What does this do when it gets to the apple!?</span>
<span style="color: #009900;">&#125;</span></pre></div></div>

<p>The code the compiler generates is roughly equivalent to:</p>

<div class="wp_syntax"><div class="code"><pre class="java" style="font-family:monospace;"><span style="color: #000000; font-weight: bold;">for</span> <span style="color: #009900;">&#40;</span><span style="color: #003399;">Object</span> s <span style="color: #339933;">:</span> spams<span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
    <span style="color: #009900;">&#40;</span><span style="color: #009900;">&#40;</span>Spam<span style="color: #009900;">&#41;</span>s<span style="color: #009900;">&#41;</span>.<span style="color: #006633;">decompose</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span></pre></div></div>

<p>At which point it&#8217;s clear what will happen to the <tt class="docutils literal"><span class="pre">Apple</span></tt> instance &#8212; a <tt class="docutils literal"><span class="pre">ClassCastException</span></tt>, because it&#8217;s not a <tt class="docutils literal"><span class="pre">Spam</span></tt>!</p>

<div class="wp_syntax"><div class="code"><pre class="tty" style="font-family:monospace;">Exception in thread &quot;main&quot; java.lang.ClassCastException: GenericFun$JohnnyAppleseed$Apple cannot be cast to GenericFun$Spam
        at GenericFun.run(GenericFun.java:36)</pre></div></div>
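<p>One way to get a compiling program that fails like this is to launder the cast through the raw <tt class="docutils literal"><span class="pre">List</span></tt> type. This is only a sketch of the idea, not the exact program behind the trace above; the class name is invented and <tt class="docutils literal"><span class="pre">ArrayList</span></tt> stands in for <tt class="docutils literal"><span class="pre">Vector</span></tt>.</p>

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: an unchecked cast lets an Apple into a List<Spam>; the failure
// only shows up later, at the compiler-inserted cast in the for-each loop.
// HeapPollutionDemo is a hypothetical name.
class HeapPollutionDemo {

    interface Edible {}

    static class Spam implements Edible {
        void decompose() {}
    }

    static class Apple implements Edible {}

    @SuppressWarnings({"unchecked", "rawtypes"})
    static String run() {
        List<Spam> spams = new ArrayList<Spam>();
        spams.add(new Spam());
        // Going through raw List turns the compile error into a warning.
        List<Edible> edibles = (List<Edible>) (List) spams;
        edibles.add(new Apple()); // No complaint here: same list, aliased.
        try {
            for (Spam s : spams) {
                s.decompose(); // Implicit (Spam) cast fails on the Apple.
            }
            return "no exception";
        } catch (ClassCastException e) {
            return "ClassCastException";
        }
    }

    public static void main(String[] args) {
        System.out.println(run()); // ClassCastException
    }
}
```

<p>Note that the <tt class="docutils literal"><span class="pre">add</span></tt> succeeds silently; the exception is raised far from the line that caused the problem.</p>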

</div>
<div class="section" id="backpedaling">
<h3>Backpedaling</h3>
<p>Okay, so in the first example we didn&#8217;t keep a reference to the <tt class="docutils literal"><span class="pre">List</span></tt> around, making it acceptable (but bad style) to perform an <em>unchecked cast</em>.</p>
<p>Since, under the hood, the generic type parameters are erased, there&#8217;s no runtime difference between <tt class="docutils literal"><span class="pre">List&lt;Edible&gt;</span></tt> and plain ol&#8217; <tt class="docutils literal"><span class="pre">List</span></tt>. If we just cast to <tt class="docutils literal"><span class="pre">List</span></tt>, it will give us a warning:</p>
<blockquote><p>
Type safety: The expression of type <tt class="docutils literal"><span class="pre">List</span></tt> needs unchecked conversion to conform to <tt class="docutils literal"><span class="pre">List&lt;BadCast.Edible&gt;</span></tt></p></blockquote>
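<p>For concreteness, the raw-type version of <tt class="docutils literal"><span class="pre">castSomeSpam</span></tt> might look like the sketch below. The enclosing class name is made up, the methods are made static for brevity, and the <tt class="docutils literal"><span class="pre">@SuppressWarnings</span></tt> annotation is my addition to silence the warning once you have decided the cast is safe.</p>

```java
import java.util.Arrays;
import java.util.List;

// Sketch: casting through raw List compiles, at the price of an
// unchecked-conversion warning. RawCast is a hypothetical name.
class RawCast {

    interface Edible {}

    static class Spam implements Edible {}

    static List<Spam> canSomeSpam() {
        return Arrays.asList(new Spam(), new Spam(), new Spam());
    }

    // Safe only because nobody else holds a List<Spam> reference
    // to the list returned by canSomeSpam().
    @SuppressWarnings({"unchecked", "rawtypes"})
    static List<Edible> castSomeSpam() {
        return (List) canSomeSpam();
    }

    public static void main(String[] args) {
        System.out.println(castSomeSpam().size()); // 3
    }
}
```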
<p>The real solution, though, is to <strong>just make an unnecessary &quot;defensive copy&quot; when you cross this function boundary</strong>; i.e.</p>

<div class="wp_syntax"><div class="code"><pre class="java" style="font-family:monospace;">List<span style="color: #339933;">&lt;</span>Edible<span style="color: #339933;">&gt;</span> castSomeSpam<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
    <span style="color: #000000; font-weight: bold;">return</span> <span style="color: #000000; font-weight: bold;">new</span> Vector<span style="color: #339933;">&lt;</span>Edible<span style="color: #339933;">&gt;</span><span style="color: #009900;">&#40;</span>canSomeSpam<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span></pre></div></div>

</div>
<div class="section" id="footnotes">
<h3>Footnotes</h3>
<table class="docutils footnote" frame="void" id="id5" rules="none">
<colgroup>
<col class="label" />
<col /></colgroup>
<tbody valign="top">
<tr>
<td class="label"><a class="fn-backref" href="#id4"><tt>[*]</tt></a></td>
<td>Obviously a point of contention among ham connoisseurs.</td>
</tr>
</tbody>
</table>
<table class="docutils footnote" frame="void" id="id7" rules="none">
<colgroup>
<col class="label" />
<col /></colgroup>
<tbody valign="top">
<tr>
<td class="label"><a class="fn-backref" href="#id6"><tt>[†]</tt></a></td>
<td>
<p class="first">This doesn&#8217;t compile, because we&#8217;re imagining that the cast were possible. Compilers don&#8217;t respond well when you ask them to imagine things:</p>

<div class="wp_syntax"><div class="code"><pre class="tty" style="font-family:monospace;">$ javac 'Imagine you could cast List&lt;Spam&gt; to List&lt;Edible&gt;!'
javac: invalid flag: Imagine you could cast List&lt;Spam&gt; to List&lt;Edible&gt;!
Usage: javac &lt;options&gt; &lt;source files&gt;
use -help for a list of possible options</pre></div></div>

</td>
</tr>
</tbody>
</table>
<table class="docutils footnote" frame="void" id="id10" rules="none">
<colgroup>
<col class="label" />
<col /></colgroup>
<tbody valign="top">
<tr>
<td class="label"><a class="fn-backref" href="#id8"><tt>[‡]</tt></a></td>
<td>Note that if you <em>must</em> do something like this, you can use a <a class="reference external" href="http://java.sun.com/javase/6/docs/api/java/util/Collections.html#checkedList(java.util.List,%20java.lang.Class)">Collections.checkedList</a> to get the early detection. Still, the client is going to be pissed that they tried to put their delicious <tt class="docutils literal"><span class="pre">Ham</span></tt> in there and got an unexpected <tt class="docutils literal"><span class="pre">ClassCastException</span></tt> &#8212; probably best to use <a class="reference external" href="http://java.sun.com/javase/6/docs/api/java/util/Collections.html#unmodifiableList%28java.util.List%29">Collections.unmodifiableList</a> if the reference ownership isn&#8217;t fully transferred.</td>
</tr>
</tbody>
</table>
</div>
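<p>The two <tt class="docutils literal"><span class="pre">Collections</span></tt> wrappers mentioned in the last footnote can be sketched as follows. The class and method names are invented for illustration; only <tt class="docutils literal"><span class="pre">Collections.checkedList</span></tt> and <tt class="docutils literal"><span class="pre">Collections.unmodifiableList</span></tt> come from the standard library.</p>

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Sketch: checkedList fails at insertion time; unmodifiableList hands out
// a read-only List<Edible> view of a List<Spam> without any cast.
// CheckedListDemo and its methods are hypothetical names.
class CheckedListDemo {

    interface Edible {}

    static class Spam implements Edible {}

    static class Ham implements Edible {}

    @SuppressWarnings({"unchecked", "rawtypes"})
    static boolean checkedListRejectsHam() {
        List<Spam> spams =
            Collections.checkedList(new ArrayList<Spam>(), Spam.class);
        List<Edible> edibles = (List<Edible>) (List) spams;
        try {
            edibles.add(new Ham()); // Checked against Spam.class right here.
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    static boolean unmodifiableViewRejectsAdd() {
        List<Spam> spams = new ArrayList<Spam>();
        spams.add(new Spam());
        // unmodifiableList accepts List<? extends T>, so no cast is needed.
        List<Edible> view = Collections.<Edible>unmodifiableList(spams);
        try {
            view.add(new Ham());
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(checkedListRejectsHam());      // true
        System.out.println(unmodifiableViewRejectsAdd()); // true
    }
}
```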
]]></content:encoded>
			<wfw:commentRss>http://blog.cdleary.com/2009/12/inedible-vectors-of-spam-learning-non-reified-generics-by-example/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Thoughts on programming language fluency</title>
		<link>http://blog.cdleary.com/2009/11/thoughts-on-programming-language-fluency/</link>
		<comments>http://blog.cdleary.com/2009/11/thoughts-on-programming-language-fluency/#comments</comments>
		<pubDate>Sun, 29 Nov 2009 19:57:09 +0000</pubDate>
		<dc:creator>cdleary</dc:creator>
				<category><![CDATA[Languages]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[Thoughts]]></category>
		<category><![CDATA[Code Reviews]]></category>
		<category><![CDATA[Efficiency]]></category>
		<category><![CDATA[Experience]]></category>
		<category><![CDATA[Interviews]]></category>
		<category><![CDATA[Short-ish]]></category>

		<guid isPermaLink="false">http://blog.cdleary.com/?p=661</guid>
		<description><![CDATA[I noticed that Effective Java&#8216;s foreword is written by Guy Steele, so I actually bothered to read it. Here&#8217;s the bit I found particularly intriguing: If you have ever studied a second language yourself and then tried to use it outside the classroom, you know that there are three things you must master: how the [...]]]></description>
			<content:encoded><![CDATA[<p>I noticed that <a class="reference external" href="http://www.amazon.com/Effective-Java-2nd-Joshua-Bloch/dp/0321356683">Effective Java</a>&#8217;s foreword is written by <a class="reference external" href="http://en.wikipedia.org/wiki/Guy_Steele">Guy Steele</a>, so I actually bothered to read it. Here&#8217;s the bit I found particularly intriguing:</p>
<blockquote><p>
If you have ever studied a second language yourself and then tried to use it outside the classroom, you know that there are three things you must master: how the language is structured (grammar), how to name things you want to talk about (vocabulary), and the customary and effective ways to say everyday things (usage).</p></blockquote>
<p>When programmers enter the job market, the idea that <strong>&quot;we have the capability to learn any programming language&quot;</strong> gets thrown around a lot. I now realize that this sentiment is irrelevant in many cases, because the deciding factor in the hiring process is more often <strong>time to fluency</strong>.</p>
<div class="section" id="time-to-fluency-as-a-hiring-factor">
<h3>Time to fluency as a hiring factor</h3>
<p>Let&#8217;s say that there are two candidates, Fry and Laurie, interviewing for a programming position using Haskell. <a class="footnote-reference" href="#id3" id="id1"><tt>[*]</tt></a>  Fry comes off as very intelligent during the interview process, but has only used OCaml and sounds like he <tt class="docutils literal"><span class="pre">mutable</span></tt>d all of the stuff that would make your <a class="reference external" href="http://www.mit.edu/~mkgray/head-explode.html">head explode</a> using <a class="reference external" href="http://en.wikipedia.org/wiki/Monad_%28functional_programming%29">monads</a>.  Laurie, on the other hand, couldn&#8217;t figure out how many ping pong balls fit into Air Force One or why manhole covers are round, <a class="footnote-reference" href="#id4" id="id2"><tt>[†]</tt></a> but is clearly fluent in Haskell. Which one gets hired?</p>
<p>The answer to this question is another question: <strong>When are they required to be pumping out production-quality code?</strong></p>
<p>Even if you work all hours of the day, the time to fluency for a language is on the order of weeks, independent of other scary new-workplace factors. Although books like <em>Effective *</em> can get you on the right track, fluency is ultimately attained through experience. Insofar as programming is a perpetual series of decisions about what to make flexible and what to hard-code, you must spend time in the hot seat to gain the necessary intuition &#8212; each language&#8217;s unique characteristics change the nature of the game.</p>
<p><strong>Everybody wants to hire Fry; however, Laurie will end up with the job due to time constraints on the part of the hiring manager.</strong> I&#8217;m pretty sure that <a class="reference external" href="http://www.joelonsoftware.com/articles/GuerrillaInterviewing3.html">Joel&#8217;s interview notions</a> are over-idealized in the general case:</p>
<blockquote><p>
Anyway, software teams want to hire people with aptitude, not a particular skill set. Any skill set that people can bring to the job will be technologically obsolete in a couple of years, anyway, so it’s better to hire people that are going to be able to learn any new technology rather than people who happen to know how to make JDBC talk to a MySQL database right this minute.</p></blockquote>
<p>Reqs have to be filled so that the trains run on time &#8212; it&#8217;s hard to let <em>real, here-and-now</em> schedules slip to avoid a <em>hypothetical, three-years-later</em> slip.</p>
</div>
<div class="section" id="extreme-programming-as-catalyst">
<h3>Extreme Programming as catalyst</h3>
<p>You remember that scene from <em>The Matrix</em> where Neo gets all the Kung Fu downloaded into his brain in a matter of seconds? That whole process is <em>nearly</em> as awesome as code reviews.</p>
<p><center><object width="425" height="344"><param name="movie" value="http://www.youtube.com/v/6vMO3XmNXe4&#038;color1=0xb1b1b1&#038;color2=0xcfcfcf&#038;hl=en_US&#038;feature=player_embedded&#038;fs=1"></param><param name="allowFullScreen" value="true"></param><param name="allowScriptAccess" value="always"></param><embed src="http://www.youtube.com/v/6vMO3XmNXe4&#038;color1=0xb1b1b1&#038;color2=0xcfcfcf&#038;hl=en_US&#038;feature=player_embedded&#038;fs=1" type="application/x-shockwave-flash" allowfullscreen="true" allowScriptAccess="always" width="425" height="344"></embed></object></center></p>
<p>Pair programming and code reviews:</p>
<ul class="simple">
<li>Trick your brain into learning everything faster through mild stress and the threat of looking noobish in your colleagues&#8217; eyes.</li>
<li>Give you the shoulders of language-fluent programmers to stand on as they push you in the right direction.</li>
<li>Back off in accordance with your fluency acquisition.</li>
</ul>
<p>This is totally speculative, but from my experience I&#8217;d be willing to believe you can reduce the minimum-time-to-fluency by an order of magnitude with the right (read: friendly and supportive) Extreme Programming environment.</p>
</div>
<div class="section" id="footnotes">
<h3>Footnotes</h3>
<table class="docutils footnote" frame="void" id="id3" rules="none">
<colgroup>
<col class="label" />
<col /></colgroup>
<tbody valign="top">
<tr>
<td class="label"><a class="fn-backref" href="#id1"><tt>[*]</tt></a></td>
<td>You know it&#8217;s a hypothetical because it&#8217;s a Haskell position. Bazinga!</td>
</tr>
</tbody>
</table>
<table class="docutils footnote" frame="void" id="id4" rules="none">
<colgroup>
<col class="label" />
<col /></colgroup>
<tbody valign="top">
<tr>
<td class="label"><a class="fn-backref" href="#id2"><tt>[†]</tt></a></td>
<td>The point is that Fry has the high ground in terms of perceived aptitude. I actually think most of the <a class="reference external" href="http://www.amazon.com/Would-Move-Mount-Fuji-ebook/dp/B000Q67H6I/">Mount Fuji</a> questions are nearly useless in determining aptitude, though I do enjoy them. The referenced sentence is a poor attempt at a joke. ;-)</td>
</tr>
</tbody>
</table>
</div>
]]></content:encoded>
			<wfw:commentRss>http://blog.cdleary.com/2009/11/thoughts-on-programming-language-fluency/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
	</channel>
</rss>
