May 11, 2010

Notes from the JS pit: closure optimization

In anticipation of a much-delayed dentist appointment tomorrow morning and under the assumption that hard liquor removes plaque, I've produced [*] an entry in the spirit of Stevey's Drunken Blog Rants, s/wine/scotch/g. I apologize for any and all incomprehensibility, although Stevey may not mind since it's largely an entry about funargs, which he seems to have a thing for. (Not that I blame him — I'm thinking about them while drinking...) It also appears I may need to prove myself worthy of emigration to planet Mozilla, so hopefully an entry filled with funarg debauchery will serve that purpose as well.


Lately, I've been doing a little work on closure optimization, as permitted by static analysis; i.e. the parser/compiler marks which functions can be optimized into various closure forms.

In a language that permits nested functions and functions as first-class values, there are a few things you need to ask about each function before you optimize it:

Function escape (the funarg problem)

If a function can execute outside the scope in which it was lexically defined, it is said to be a "funarg", a fancy word for "potentially escaping outside the scope where it's defined". We call certain functions in the JS runtime Algol-like closures if they are immediately applied function expressions, like so:

function outer() {
    var x = 12;
    return (function cubeX() { return x * x * x; })();

The function cubeX can never execute outside the confines of outer — there's no way for the function definition to escape. It's as if you just took the expression x * x * x, wrapped it in a lambda (function expression), and immediately executed that expression. [†]

Apparently a lot of Algol programmers had the hots for this kinda thing — the whole function-wrapping thing was totally optional, but you chose to do it, Algol programmers, and we respect your choice.

You can optimize this case through static analysis. As long as there's no possibility of escape between a declaration and its use in a nested function, the nested function knows exactly how far to reach up the stack to retrieve/manipulate the variable — the activation record stack is totally determined at compile time. Because there's no escaping, there's not even any need to import the upvar into the Algol-like function.

Dijkstra's display optimization

To optimize this Algol-like closure case we used a construct called a "Dijkstra display" (or something named along those lines). You just keep an array of stack frame pointers, with each array slot representing the frame currently executing at that function nesting level. When outer is called in the above, outer's stack frame pointer would be placed in the display array at nesting level 0, so the array would look like:

Level 0: &outerStackFrame
Level 1: NULL
Level 2: NULL

Then, when cubeX is invoked, it is placed at nesting level 1:

Level 0: &outerStackFrame
Level 1: &cubeX
Level 2: NULL

At parse time, we tell cubeX that it can reach up to level 0, frame slot 0 to retrieve the jsval for x. [‡] Even if you have "parent" frame references in each stack frame, this array really helps when a function is reaching up many levels to retrieve an upvar, since you can do a single array lookup instead of an n link parent chain traversal. Note that this is only useful when you know the upvar-referring functions will never escape, because the display can only track stack frames for functions that are currently executing.

There's also the possibility that two functions at the same nesting level are executing simultaneously; i.e.

function outer() {
    var x = 24;
    function innerFirst() { return x; }
    function innerSecond() {
        var x = 42;
        return innerFirst();
    return innerSecond();

To deal with this case, each stack frame has a pointer to the "chained" display stack frame for that nesting level, which is restored when the executing function returns. To go through the motions:

Level 0: &outerStackFrame
Level 1: &innerSecond
Level 2: NULL

Which then activates innerFirst at the same static level (1), which saves the pointer that it's clobbering in the display array.

Level 0: &outerStackFrame
Level 1: &innerFirst (encapsulates &innerSecond)
Level 2: NULL

Then, when innerFirst looks up the static levels for x, it gets the correct value, restoring innerSecond when it's done executing in a return-style bytecode (which would be important if there were further function nesting in innerSecond). [§]

Okay, hopefully I've explained that well enough, because now I get to tell you that we've found this optimization to be fairly useless in SpiderMonkey experimental surveys and we hope to rip it out at some point. The interesting case that we actually care about (flat closures) is discussed in the second to last section.

Free variable references

Because JS is a lexically scoped language [¶] we can determine which enclosing scope a free variable is defined in. [#] If a function's free variables only refer to bindings in the global scope, then it doesn't need any information from the functions that enclose it. For these functions the set of free variables in nested functions is the null set, so we call it a null closure. Top-level functions are null closures. [♠]

function outer() {
    return function cube(x) { return x * x * x; }; // Null closure - no upvars.

Free variables are termed upvars, since they are identifiers that refer to variables in higher (enclosing) scopes. At parse time, when we're trying to find a declaration to match up with a use, they're called unresolved lexical dependencies. Though JavaScript scopes are less volatile — and, as some will undoubtedly point out, less flexible — I believe that the name upvar comes from this construct in Tcl, which lets you inject vars into and read vars from arbitrary scopes as determined by the runtime call stack: [♥]

set x 7

proc most_outer {} {
    proc outer {} {
        set x 24
        proc upvar_setter {level} {
            upvar $level x x
            set x 42
        proc upvar_printer {level} {
            upvar $level x x
            puts $x
        upvar_printer 1
        upvar_setter 1
        upvar_printer 1
        upvar_setter 2
        upvar_printer 2
        upvar_printer 3
        upvar_setter 3
        upvar_printer 3
most_outer # Yields the numbers 24, 42, 42, 7, and 42.

Upvar redefinitions

If you know that the upvar is never redefined after the nested function is created, it is effectively immutable — similar to the effect of Java's partial closures in anonymous inner classes via the final keyword. In this case, you can create an optimized closure in a form we call a flat closure — if, during static analysis, you find that none of the upvars are redefined after the function definition, you can import the upvars into the closure, effectively copying the immutable jsvals into extra function slots.

On the other hand, if variables in enclosing scopes are (re)defined after the function definition (and thus, don't appear immutable to the function), a shared environment object has to be created so that nested functions can correctly see when the updates to the jsvals occur. Take the following example:

function outer() {
    var communicationChannel = 24;
    function innerGetter() {
        return communicationChannel();
    function innerSetter() {
        communicationChannel = 42;
    return [innerGetter, innerSetter];

Closing over references

In this case, outer must create an environment record outside of the stack so that when innerGetter and innerSetter escape on return, they can see both communicate through the upvar. This is the nice encapsulation-effect you can get through closure-by-reference, and is often used in the JS "constructor-pattern", like so:

function MooCow() {
    var hasBell = false;
    var noise = "Moo.";
    return {
        pontificate: function() { return hasBell? noise + " <GONG!>" : noise; }
        giveBell: function() { hasBell = true; }

It's interesting to note that all the languages I work with these days perform closure-by-reference, as opposed to closure-by-value. In constrast, closure-by-value would snapshot all identifiers in the enclosing scope, so immutable types (strings, numbers) would be impossible to change.

Sometimes, closure-by-reference can produce side effects that surprise developers, such as:

def surprise():
    funs = [lambda: x ** 2 for x in range(6)]
    assert funs[0]() == 25

This occurs because x is bound in function-local scope, and all the lambdas close over it by reference. When x is mutated in further iterations of the list comprehension (at least in Python 2.x), the lambdas are closed over the environment record of surprise, and all of them see the last value that x was updated to.

I can sympathize. In fact, I've wrote a program to do so:

var lambdas = [];
var condolences = ["You're totally right",
        "and I understand what you're coming from, but",
        "this is how closures work nowadays"];
for (var i = 0; i < condolences.length; i++) {
    var condolence = condolences[i];
    lambdas.push(function() { return condolence; });

Keep in mind that var delcarations are hoisted to function scope in JS.

I implore you to note that comments will most likely be received while I'm sober.





Cue complaints about the imperfect lambda abstraction in JavaScript. Dang Ruby kids, go play with your blocks! ;-)


Roughly. Gory details left out for illustrative purposes.


There's also the case where the display array runs out of space for the array. I believe we emit unoptimized name-lookups in this case, but I don't entirely recall.


With a few insidious dynamic scoping constructs thrown in. I'll get to that in a later entry.


Barring enclosing with statements and injected eval scopes.


Unless they contain an eval or with, in which case we call them "heavyweight" — though they still don't need information from enclosing functions, they must carry a stack of environment records, so they're not optimal. I love how many footenotes I make when I talk about the JavaScript language. ;-)


As a result, it's extremely difficult to optimize accesses like these without whole propgram analysis.