'JavaScript' category

« Older entries

Accomplish your new year's resolution of being more badass

I know what you're going through — I've done it all before.

The market is teeming with products that purport to help you meet your badassery quota.

First you do the shakes. Then, you go with the bars that say they're infused with the good stuff, but just seem to leave a slightly corrugated taste in your mouth. Finally, you're droppin' hard-earned dinero on a facility that you don't need or a professional badassery trainer whose appointments you desperately wish you could cancel.

But I'm here to tell you don't need to shell out cash-money to become more badass, my friends. Not anymore, thanks to the beauty of open source, the ES6 plans wiki page, and our delightful SpiderMonkey technical staff who are standing by to receive your calls for mentorship.

Allow me to explain.

Badass begets badass

You may have seen Badass JavaScript: self described as, "A showcase of awesome JavaScript code that pushes the boundaries of what's possible on the web." Check out their badass year in review if you haven't. (Some of the stuff that the interwebs has done with JavaScript even has @NeckbeardHacker envious.)

It probably won't surprise you, but do you know what those folks love? JavaScript evolution. There's nothing quite so refreshing as a fresh-squeezed JS language feature that removes that irritating itching-burning sensation. Sets that do what you mean! String repetition that you can use without copy/pasting from your last project! Inverse hyperbolic sine that you can use for... your... math! (Again, all of this is in the ES6 plans.)

I, for example, have wanted String.prototype.startsWith very badly, to the point that I've started washing people's window panes against their will as they exit highway 101. Around here, odds are that a programmer sees my sign and implements the thing just to stop me from bothering them again. (A little tactic that I call SpiderGuerilla warfare.)

Me, holding my SpiderGuerilla sign.

So what are you waiting for?

I know, you're probably already quite beefcake, but here's my three step plan:

  1. Watch the SpiderMonkey hacking intro.

  2. Pick out a bug from the ES6 plans.

  3. Come talk to great people on irc.mozilla.org in channel #jsapi (for example, cdleary, jorendorff, luke, or whoever else is in there) or comment in the bug — tell them that you're on a quest to become even more badass, describe a bug that you're interested in taking, and give a quick note on what you've done with the engine so far — for example, walking through the video in step 1! We'll find you a mentor who will get you started on the right track.

Don't miss out on this exclusive offer — SpiderMonkey contribution is not sold in stores.

In fact, if you act now, we'll throw in an IonMonkey shirt (or another Firefox shirt of equivalent awesomeness) and publish a blurb about your feature in Mozilla hacks. Of course, you can also have yourself added to about:credits, providing that's what you are into.

IonMonkey shirt.

This one-of-a-kind offer is too ridonk to pass up. Just listen to this testimonial from one of our badass contributors:

I started contributing to SpiderMonkey and now I can write a JIT compiler from scratch in a matter of days. BEEFCAKE!

@evilpies [Liberally paraphrased]

See you in the tubes!

String representation in SpiderMonkey

I'm back from holiday break and I need to limber up my tech writing a bit. Reason 1337 of my ever-growing compendium, Nerdy Reasons to Love the Internet, is that there are always interesting discussions going on. [*] I came across Never create Ruby strings longer than 23 characters the other day, and despite the link-bait title, there's a nice discussion of string representation within MRI (a Ruby VM).

My recap will be somewhat abbreviated, since I've only given myself a chunk of the morning to write this, so feel free to ask for clarification / follow up in the comments.

Basic language overview

At the language level JavaScript strings are pretty easy to understand. They are immutable, same as in Python:

>>> foo = 'abc'
>>> foo[2] = 'd'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'str' object does not support item assignment
js> options('strict')
""
js> foo = 'abc'
"abc"
js> foo[2] = 'd'
typein:3: strict warning: foo[2] is read-only
"d"

But you can (without mutating any of the original values) compare them for equality, concat them, slice them, regexp replace/split/match on them, trim whitespace from them, slap them to chop vegetables, and so forth. (See the MDN docs for String.prototype methods.) In the VM, we need to make those operations fast, with an emphasis on the operations that the web uses heavily, which are ideally [†] the ones reflected in benchmarks.

Abstractly

In an abstract sense, a primitive string in SpiderMonkey is a GC cell (i.e. small header that is managed by the garbage collector) that has a length and refers to an array of UCS2 (uniformly 16-bit) characters. [‡]

Recall that, in many dynamic language implementations, type tagging is used in order to represent the actual type of an statically-unknown-typed value at runtime. This generally allows you to work on integers (and, in SpiderMonkey, doubles) without allocating any space on the heap. Primitive strings are very important to distinguish quickly and they are subtly distinct from (non-primitive) objects, so they have their own type tag in our value representation, as you can see in the following VM function:

/*
 * Convert the given value to a string.  This method includes an inline
 * fast-path for the case where the value is already a string; if the value is
 * known not to be a string, use ToStringSlow instead.
 */
static JS_ALWAYS_INLINE JSString *
ToString(JSContext *cx, const js::Value &v)
{
    if (v.isString())
        return v.toString();
    return ToStringSlow(cx, v);
}

Aside

In JavaScript there's an annoying distinction between primitive strings and string objects that you may have seen:

js> foo = new String('abc')
(new String("abc"))
js> foo.substr(0, 2)
"ab"
js> foo[2]
"c"
js> foo.toString()
"abc"

For simplicity and because they're uninteresting, let's pretend those new String things don't exist.

Atoms

The simplest string form to describe is called an "atom", which is somewhat similar to an interned string in Python. When you write a literal string or identifier in your JavaScript code, SpiderMonkey's parser turns it into one of these atoms.

(function() {
    // Both 'someObject' and 'twenty' are atomized at parse time!
    return someObject['twenty'];
})()

Note that the user has no overt control over which strings get atomized (i.e. there is no intern builtin). Also, there are a bunch of "primordial" atoms that the engine creates when it starts up: things like the empty string, prototype, apply, and so on.

The interesting property of atoms is that any two atoms can be compared in O(1) time (via pointer comparison). Some work is required on behalf of the runtime to guarantee that property.

To get an atom within the VM, you have to say, "Hey SpiderMonkey runtime, atomize these characters for me!" In the general case the runtime then does a classic "get or create" via a hash table lookup: it determines whether or not those characters have an existing atom and, if not, creates one. The atomized primitive string that you get back always has its characters contiguous in memory — a property which is interesting by contrast to...

Ropes

Let's say that you had immutable strings, like in JavaScript, and you had three large books already in string form: let's call them AoCP I, II, and III. Then, some jerk thinks it would be funny to take the first third of the first book, the second third of the second book, and the third third of the third book, and slice them together into a single string.

What's the simplest thing that could possibly work? Let's say that each book is a 8MiB long. You could allocate a new, 8MiB array of characters and memcpy the appropriate characters from each string into the resulting buffer, but now you've added 33% memory overhead and wasted quite a few cycles.

A related issue is efficient appending and prepending. Let's say you have a program that does something like:

var resultString = '';

function onMoreText(text) {
    // If the text starts with a letter in the lower part of the alphabet,
    // put it at the beginning; otherwise, put it at the end.
    if (text[0] < 'l')
        resultString = text + resultString;
    else
        resultString = resultString + text;
};

If you did the naive "new string and memcpy" for all of the appends and prepends, you'd end up creating a lot of useless garbage inside the VM. The Python community has the conventional wisdom that you should build up a container (like a deque) and join on it, but it's difficult to hold the entire ecosystem of web programmers to such standards.

In the SpiderMonkey VM, the general solution to problems like these this is to build up a tree-like data structure that represents the sequence of immutable substrings. and collapse that datastructure only when necessary. Say that you write this:

(function weirdJoin(a, b, c, d) {
    var lhs = a + b;
    var rhs = c + d;
    var result = lhs + rhs;
    return result;
})('I', 'love', 'SpiderMonkey', '!');

The concatenation is performed lazily by using a tree-like data structure (actually a DAG, since the same string cell can appear in the structure at multiple points) that we call a rope. Say that all of the arguments are different atoms — the resulting rope would look like:

SpiderMonkey rope concatenation example.

Since strings are immutable at the language level, cycles can never form. When the character array is requested within the engine, a binary tree traversal is performed to flatten the constituent strings' characters into a single, newly-allocated buffer. Note that, when the rope is not flattened, the characters which constitute the overall string are not in a single contiguous region of memory — they're spread across several buffers!

Dependent strings

How about when you do superHugeString.substr(4096, 16384)? Naively, you need to copy the characters in that range into a new string.

However, in SpiderMonkey there's also a concept of dependent strings which simply reuse some of the buffer contents of an existing string's character array. In a very simple fashion, the derived string keeps the referred-to string alive in order to reuse the characters in the referred-to string's buffer.

Fancier and fancier!

Really small strings are a fairly common case: they are used for things like array property indices and single-characters of strings — recall that, in JavaScript, all object properties are named by strings, unlike in languages like Python which uses arbitrary hashables. To optimize for this case, we have strings with characters embedded into their GC cell header, avoiding a heap-allocated character buffer. [§] We also pre-initialize many of these (less than length-3 strings and integers up to 256) atoms when the runtime starts up to bypass the typical hash table lookup overhead.

I'm out of time, but I hope this gives some insight into the good number of tricks are played to make common uses of JavaScript strings fast within the SpiderMonkey VM. For you curious types, there's lots more information in the code!

Footnotes

[*]

Of course, reason 1336 is that there are so many lame 1337 references that it's actually still funny.

[†]

Note the emphasis on ideally. Ideally, it should make you chuckle.

[‡]

Fun aside: the maximum length of a string in our engine is currently bounded to 28 bits.

[§]

Our garbage collector is not currently capable of creating variable-sized allocations — we only have fixed-size header classes.

Screencast: SpiderMonkey hacking intro

I saw Gary Bernhardt's DestroyAllSoftware screencasts last night and thought, "Wow, that's a great idea! I bet it takes a lot less time to make a screencast than to write a blog entry with thousands of gross words in it."

And here you have it: my first screencast. Since it's only five minutes long, I was able to make it right after breakfast! It shows how to get started building the JS shell and how to add a new shell builtin that performs very important functionality.

I'm not very good at speaking and typing concurrently, so be gentle! (Also, it's hard to see if it's not fullscreen and HD quality, so I recommend using that.)

Update 2011-10-19 1700: @dukeleto points out that, if you need to work around recent issues with Mercurial clone timeouts on Mozilla servers, you can also visit the Mercurial web view for the tip revision and download via the bz2 link displayed at the top.

JS regexps implemented in JS

If you heart JS and want to learn more about how regular expressions work, I've got yet another fun project for you to work on.

Back when I was working more heavily on the regular expression engine, some ECMAScript spec correctness questions came up. As it turns out, the regular expression part of the specification is written in scary-sounding-but-really-not-that-hard-to-understand continuation passing style (and it really seems to make the most sense that way).

I tried to work through the first bug on paper, but I forgot to carry the one or something, and I got the wrong result. dmandelin quickly whipped up a program modeled on the spec to resolve that one example conclusively. Seeing how easy that was, I followed his lead and started working on a little library to resolve these questions in ways that save more trees.

I haven't worked on it much lately (according to this neat GitHub activity doohickey), but I put cdlre out on github today, and I'd be happy to review pull requests. The regular expression specification for matching (ECMAScript revision 5 section 15.10) is easy to translate into running code, and I left a lot of features unimplemented, just for you!

I'll just copy-pasta the rest from the project README:

Potential applications:

  • Regression testing the specification against host implementations.

  • Use in understanding why regular expressions succeed/fail to match.

  • Use in a metacircular interpreter (like Narcissus).

  • Use as a staging ground for regular expression optimizations and/or a regular expression compiler. (Such a compiler could target eval as a backend or a JIT code execution foreign function.)

Goals

  • Be capable of visualizing (or at least dumping out) the ECMAScript standard steps taken in matching a regular expression.

  • Be capable of enabling/disabling the de-facto quirks from various browsers which are not yet part of the standard.

  • Be capable of running a thorough regression suite against the host regular expression engine (presumably with a set of permitted quirk options).

  • Keep the JS code a direct translation from the spec where possible and practical.

I'm sure that hooking in a comprehensible visualization would be a helpful tool for web developers who want to harness the Indiana Jones-like power of regular expressions.

Go-go gadget community?

Thinking of dynamic language code as a template

You can think of a dynamic language code as a template. This code has generality because of duck typing; the idea is roughly, "If the code can work on a value, it will, with no muss or fuss."

Some programmers find this to be an empowering form of "do what I say" — you don't need to explain everything you want your code to do in terms that even an optimizing compiler can understand.

In traditional dynamic language VMs, this template is instantiated with concrete value types at runtime. When a value flows into a bit of code, the interpreter inspects the type of the value and then runs the appropriate "instantiation of the template".

Example

Consider the function:

function bitAndAdd(a, b, c) {
    return a + (b & c);
}

In its maximally generic, template-looking form it reads like this pseudo bytecode::

bitAndAdd<T0, T1, T2> =
    GETARG0 -> T0
    GETARG1 -> T1
    GETARG2 -> T2
    BITAND<T1, T2> -> T3
    ADD<T1, T3> -> T4

In the JS engine, the T-prefixed types can be any in the set {null, undefined, boolean, int32, double, object, string} — these are the engine internal value tags.

In a bytecode-based interpreter, each bytecode typically asks, "Is this thing that I'm acting on an int32? Or double? Or string? Or ...? Or user defined type?" [*] Once that's determined, it runs the appropriate sub-section of the bytecode implementation for that type. The question-asking part of the bytecode is, in effect, directing you to the appropriate template instantiation. (Note that bytecodes are kind of like mini-templates composed into the larger function template.)

When you call bitAndAdd<int32, int32, int32>, you get an instantiated code path at runtime that looks like::

bitAndAdd<int32, int32, int32> =
    GETARG0 -> int32
    GETARG1 -> int32
    GETARG2 -> int32
    BITAND<int32, int32> -> int32
    ADD<int32, int32> -> ...

This path represents all of the machine code executed in the interpreter that does useful work for this function. Imagine the bytecodes that make up the function as a bunch of enum values in an array. For each enum value in the array, the interpreter dispatches to an type-unspecialized bytecode implementation. Then, the unspecialized bytecode implementation inspects the input types and dispatches to the specialized implementation.

A decent example is the ADD opcode in JavaScript. The unspecialized version of ADD looks something like::

ADD<T0, T1>(T0 lhs, T1 rhs) {
    if (T0 is int32 and T1 is int32)
        result = ADD<int32, int32>(lhs.getInt32(), rhs.getInt32())
    else if (T0 is string and T1 is string)
        result = ADD<string, string>(lhs.getString(), rhs.getString())
    else if (...)
        ...
}

You can see the unspecialized ADD delegating to the type-specialized bytecode sub-implementations. A specialization like ADD<int32, int32> are fairly simple and read something like::

ADD<int32, int32>(int32 lhs, int32 rhs) {
    result = lhs + rhs
    if (overflow)
        result = (double) lhs + (double) rhs;
}

Specializing the template

Of course, with JIT technology, we try to be a lot smarter about making the instantiation efficient. There are a few key mechanisms:

Avoiding bytecode dispatch

The composition of mini-templates (bytecodes) into the larger program template (made out of bytecodes) causes "jump to the next bytecode in the program" cost in an interpreter loop. By creating a machine-code implementation for the sequence of bytecodes that make up a function, we naturally cause machine-code duplication, but avoid that interpretation overhead. This is a natural consequence of choosing to JIT compile a function.

Constant propagation / local type inference (prime example: JägerMonkey)

If the JIT sees that you're performing a bitwise-and, and the spec says that the result of a bitwise-and is always an int32, the JIT can safely assume that fact downstream in the code; for example, in:

function byteSquare(x) {
    var a = x & 0xff;
    var b = a * a;
    return b;
}

The inputs to the a * a multiply are int32s, without question. There's no reason to instantiate the machine code for other types when you know it's going to be a int32.

Whole program type inference (prime examples: JägerMonkey + Type Inference, IonMonkey)

Flowing conservative type information through all the code in the system tells you key interprocedural facts. For example, just by looking at your program it may be easy to see that the divmod(dividend, divisor) function is only ever called with two constant integers as arguments.

With that as the basic idea, the analysis can get a lot fancier; for example, analyses can see that you're only storing new Car() objects into the vehicle property of your Person records, and that new Car() will always have an int32 value in its mufflerCount attribute.

By considering everything [†] that the template could possibly do, we can still narrow down the possibilities in worthwhile cases. This less us instantiate less code and know more facts for use in optimization.

Trace recording (prime example: TraceMonkey)

When a loop is determined to be important (i.e. it hits a certain execution count) a recorder notes which bytecodes execute in the interpreter and what types flow into them. This avoids both bytecode dispatch overhead and type ambiguity, creating a very highly specialized template.

However, if the types ever stop matching up with the observed bytecode execution — say, myObject.quotient used to be an int32, but when we ran the trace a subsequent time it turned out to be a double — the execution of that trace cannot continue, because it's instantiated for the wrong types downstream. Because of this possibility, a bunch of checks for type compatibility have to be inserted in the template instantiation.

It's important to note that the generated template is more than just type dependent — it can also be value depdendent! Because a trace only records a single execution path and questions like "Do I execute the if body or the else body?" can be determined by the values held by a given type.

Some folks argue that single-execution recording tends to create overly instantiated templates for typical dynamic language code.

Profiling (prime example: IonMonkey)

If you think of trace recording as observing bytecode for a single run of execution, profiling is observing bytecode for n runs of execution [‡] and accumulating the notes. This eases the pain of compiling "flip floppy" code, because you don't have to see the primarily taken path the first time, you have n tries to get a feel for it.

This process lets you instantiate a template that's general enough to handle all the things you've seen over the n iterations you've observed.

While trace compilation culls out all but one path of control flow, profiling data can be used to cull out all unobserved (and presumably cold) control flow as well. This may increase the effectiveness and decrease the runtime of dataflow-analysis-based optimizations, because there's less unimportant code to consider. I'm not aware of existing profiling JITs that do this, however, which indicates to me that it may not be an overall win.

Further discussion

Dynamic language compilers try to use runtime feedback as well as things it can infer about the code to specialize the template in the best way possible and make generated machine code blazingly fast. We want to be able to mix, "things we definitely know" (from analysis) with "things we are pretty sure we know" (from runtime feedback) in order to get the maximum speedup possible.

One exciting thing about Just-In-Time is that this runtime feedback makes it at least theoretically possible to exceed the speed of a corresponding static program; however, there are signficant challenges involved. Whenever we start leaning too heavily on "things we are pretty sure we know," we need to be able to keep the world from falling apart when we're wrong.

It's actually a fairly emotional topic in compiler engineering: how do you hold the world together when everything you thought you knew is wrong?

Be strong. Until next time.

Footnotes

[*]

Of course, bytecodes that don't need to know about the types of its operands don't ask the questions. Note that GETARG0 takes no parameters, and so wouldn't need to ask any questions.

[†]

Nearly everything. The type inference algorithm actually makes some conservative simplifications to cut down on analysis time.

[‡]

To some approximation. As an alternative, you can also observe execution for some period of time t, which has some correlation to the number of runs of execution.