January 13, 2012

Lively assertions

Recently, "another" discussion about fatal assertions has cropped up in the Mozilla community. Luckily for me, I've missed all of the other discussions, so this is the one where I get to throw in my two bits.

Effectively, I only work on the JS engine, and the JS engine only has fatal assertions. This approach works for the JS team, and I can take an insider's guess as to why.

What's a fatal assertion?

In Mozilla, we have two relevant build modes: debug and no-debug.

A fatal assertion means that, when I write JS_ASSERT(someCondition), if someCondition doesn't hold, we call abort in debug build mode. As a result, the code which follows the assertion may legitimately assume that someCondition holds. You will never see something like this in the JS engine:

{
    JS_ASSERT(0 <= offset && offset < size);
    if (0 <= offset && offset < size) // Bad! Already enforced!
        offset_ = offset;
}

The interesting thing is that, in no-debug mode, we will not call abort. We eliminate the assertion condition test entirely. This means that, in production, the code which follows the assertion assumes that someCondition holds, and there's nothing checking that to be the case. [*]
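
To make that concrete, here's roughly the shape such a macro takes. This is a sketch, not the actual JS_ASSERT definition; the real macro lives in the SpiderMonkey headers and differs in detail:

#include <cstdio>
#include <cstdlib>

// Sketch of a fatal assertion macro: in debug builds a failed condition
// aborts; in no-debug builds the check is compiled away entirely.
#ifdef DEBUG
#  define FATAL_ASSERT(cond)                                               \
     do {                                                                  \
         if (!(cond)) {                                                    \
             std::fprintf(stderr, "Assertion failure: %s, at %s:%d\n",     \
                          #cond, __FILE__, __LINE__);                      \
             std::abort();                                                 \
         }                                                                 \
     } while (0)
#else
#  define FATAL_ASSERT(cond) ((void) 0)
#endif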

Exploding early and often

If a JS-engine hacker assumes someCondition during development, and it turns out that someCondition isn't the case, we'd like to know about it, and we'd like to know about it LOUDLY.

Our valiant security team runs fuzz testing against the JS engine continuously, and hitting any one of these fatal assertions causes an abort. When you know that there is input that causes an abort in debug mode, you have a few potential resolutions: either the code has a genuine bug that needs fixing, or the assertion itself is too strong and needs to be relaxed or removed.

But I think the real key to this whole process is simple: if things are exploding, a member of the bomb squad will show up and come to some resolution. Fatal assertions force action in a way that logs will not. You must (at least cursorily) investigate any one of these assertions as though it were in the most severe category, and some form of resolution must be timely in order to unblock fuzzers and other developers.

Everything that the hacker feels can and should be asserted is being asserted in a way that's impossible to ignore. Invariants present in the code base are reflected by the fatal assertions and, once they've proven themselves by running the regression/fuzzer/web gamut, can be depended upon — they certainly come to reinforce and strengthen each other over time.

Footnotes

[*]

We do have mechanisms that hackers can use for further checking, however. If crash reports indicate that some assertions may be suspect in production environments, we have a JS_OPT_ASSERT for doing diagnostics in our pre-release distribution channels. Since the most reliable information in a crash report tends to be the line number that you crashed on, fatal non-debug assertions are a very useful capability.

String representation in SpiderMonkey

I'm back from holiday break and I need to limber up my tech writing a bit. Reason 1337 of my ever-growing compendium, Nerdy Reasons to Love the Internet, is that there are always interesting discussions going on. [*] I came across Never create Ruby strings longer than 23 characters the other day, and despite the link-bait title, there's a nice discussion of string representation within MRI (a Ruby VM).

My recap will be somewhat abbreviated, since I've only given myself a chunk of the morning to write this, so feel free to ask for clarification / follow up in the comments.

Basic language overview

At the language level JavaScript strings are pretty easy to understand. They are immutable, same as in Python:

>>> foo = 'abc'
>>> foo[2] = 'd'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'str' object does not support item assignment
js> options('strict')
""
js> foo = 'abc'
"abc"
js> foo[2] = 'd'
typein:3: strict warning: foo[2] is read-only
"d"

But you can (without mutating any of the original values) compare them for equality, concat them, slice them, regexp replace/split/match on them, trim whitespace from them, slap them to chop vegetables, and so forth. (See the MDN docs for String.prototype methods.) In the VM, we need to make those operations fast, with an emphasis on the operations that the web uses heavily, which are ideally [†] the ones reflected in benchmarks.

Abstractly

In an abstract sense, a primitive string in SpiderMonkey is a GC cell (i.e. small header that is managed by the garbage collector) that has a length and refers to an array of UCS2 (uniformly 16-bit) characters. [‡]

Recall that, in many dynamic language implementations, type tagging is used in order to represent the actual type of a statically-unknown-typed value at runtime. This generally allows you to work on integers (and, in SpiderMonkey, doubles) without allocating any space on the heap. Primitive strings are very important to distinguish quickly, and they are subtly distinct from (non-primitive) objects, so they have their own type tag in our value representation, as you can see in the following VM function:

/*
 * Convert the given value to a string.  This method includes an inline
 * fast-path for the case where the value is already a string; if the value is
 * known not to be a string, use ToStringSlow instead.
 */
static JS_ALWAYS_INLINE JSString *
ToString(JSContext *cx, const js::Value &v)
{
    if (v.isString())
        return v.toString();
    return ToStringSlow(cx, v);
}

Aside

In JavaScript there's an annoying distinction between primitive strings and string objects that you may have seen:

js> foo = new String('abc')
(new String("abc"))
js> foo.substr(0, 2)
"ab"
js> foo[2]
"c"
js> foo.toString()
"abc"

For simplicity and because they're uninteresting, let's pretend those new String things don't exist.

Atoms

The simplest string form to describe is called an "atom", which is somewhat similar to an interned string in Python. When you write a literal string or identifier in your JavaScript code, SpiderMonkey's parser turns it into one of these atoms.

(function() {
    // Both 'someObject' and 'twenty' are atomized at parse time!
    return someObject['twenty'];
})()

Note that the user has no overt control over which strings get atomized (i.e. there is no intern builtin). Also, there are a bunch of "primordial" atoms that the engine creates when it starts up: things like the empty string, prototype, apply, and so on.

The interesting property of atoms is that any two atoms can be compared in O(1) time (via pointer comparison). Some work is required on the part of the runtime to guarantee that property.

To get an atom within the VM, you have to say, "Hey SpiderMonkey runtime, atomize these characters for me!" In the general case the runtime then does a classic "get or create" via a hash table lookup: it determines whether or not those characters have an existing atom and, if not, creates one. The atomized primitive string that you get back always has its characters contiguous in memory — a property which is interesting by contrast to...
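
In rough, self-contained C++, the get-or-create looks something like the following. This is illustrative only: the real atom table is tied into the GC and uses the engine's own hashing, but the shape of the lookup is the same.

#include <string>
#include <unordered_set>

// Minimal sketch of "get or create" atomization.
struct AtomTable {
    std::unordered_set<std::u16string> atoms;

    // Equal character sequences always come back as the same pointer, so
    // two atoms can be compared for equality with a pointer comparison.
    const std::u16string *atomize(const std::u16string &chars) {
        auto it = atoms.find(chars);            // "get"
        if (it == atoms.end())
            it = atoms.insert(chars).first;     // "create"
        return &*it;                            // stable for the table's life
    }
};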

Ropes

Let's say that you had immutable strings, like in JavaScript, and you had three large books already in string form: let's call them AoCP I, II, and III. Then, some jerk thinks it would be funny to take the first third of the first book, the second third of the second book, and the third third of the third book, and splice them together into a single string.

What's the simplest thing that could possibly work? Let's say that each book is 8 MiB long. You could allocate a new 8 MiB array of characters and memcpy the appropriate characters from each string into the resulting buffer, but now you've added 33% memory overhead and wasted quite a few cycles.
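
The flat-copy approach looks roughly like this. Purely illustrative; "third" is the length of each excerpt in characters:

#include <cstddef>
#include <cstring>

// The "simplest thing that could possibly work": allocate one fresh buffer
// and memcpy each piece into place, paying the full copy cost up front.
char16_t *
naive_splice(const char16_t *a, const char16_t *b, const char16_t *c,
             size_t third)
{
    char16_t *out = new char16_t[3 * third];
    std::memcpy(out,             a, third * sizeof(char16_t));
    std::memcpy(out + third,     b, third * sizeof(char16_t));
    std::memcpy(out + 2 * third, c, third * sizeof(char16_t));
    return out;
}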

A related issue is efficient appending and prepending. Let's say you have a program that does something like:

var resultString = '';

function onMoreText(text) {
    // If the text starts with a letter in the lower part of the alphabet,
    // put it at the beginning; otherwise, put it at the end.
    if (text[0] < 'l')
        resultString = text + resultString;
    else
        resultString = resultString + text;
};

If you did the naive "new string and memcpy" for all of the appends and prepends, you'd end up creating a lot of useless garbage inside the VM. The Python community has the conventional wisdom that you should build up a container (like a deque) and join on it, but it's difficult to hold the entire ecosystem of web programmers to such standards.

In the SpiderMonkey VM, the general solution to problems like these is to build up a tree-like data structure that represents the sequence of immutable substrings, and to collapse that data structure only when necessary. Say that you write this:

(function weirdJoin(a, b, c, d) {
    var lhs = a + b;
    var rhs = c + d;
    var result = lhs + rhs;
    return result;
})('I', 'love', 'SpiderMonkey', '!');

The concatenation is performed lazily by using a tree-like data structure (actually a DAG, since the same string cell can appear in the structure at multiple points) that we call a rope. Say that all of the arguments are different atoms — the resulting rope would look like:

[Figure: SpiderMonkey rope concatenation example.]

Since strings are immutable at the language level, cycles can never form. When the character array is requested within the engine, a binary tree traversal is performed to flatten the constituent strings' characters into a single, newly-allocated buffer. Note that, when the rope is not flattened, the characters which constitute the overall string are not in a single contiguous region of memory — they're spread across several buffers!
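
To make the shape concrete, here's a toy rope sketch. std::shared_ptr and std::u16string stand in for the GC cells and engine string types; the real JSRope differs considerably in detail:

#include <memory>
#include <string>

// Toy rope: a leaf holds characters; an interior node holds the lazy
// concatenation of two children.
struct Rope {
    std::u16string leafChars;           // used when left/right are null
    std::shared_ptr<Rope> left, right;  // used for a concatenation node

    static std::shared_ptr<Rope> concat(std::shared_ptr<Rope> lhs,
                                        std::shared_ptr<Rope> rhs) {
        auto node = std::make_shared<Rope>();
        node->left = std::move(lhs);    // O(1): no characters are copied
        node->right = std::move(rhs);
        return node;
    }

    // "Flattening": traverse the tree and copy every leaf's characters into
    // one newly allocated, contiguous buffer.
    void flattenInto(std::u16string &out) const {
        if (!left) { out += leafChars; return; }
        left->flattenInto(out);
        right->flattenInto(out);
    }
};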

Dependent strings

How about when you do superHugeString.substr(4096, 16384)? Naively, you need to copy the characters in that range into a new string.

However, in SpiderMonkey there's also a concept of dependent strings which simply reuse some of the buffer contents of an existing string's character array. In a very simple fashion, the derived string keeps the referred-to string alive in order to reuse the characters in the referred-to string's buffer.
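
A toy sketch of the idea, again with standard-library types standing in for GC-managed cells (the real JSDependentString keeps its base alive via the GC, not shared_ptr):

#include <cstddef>
#include <memory>
#include <string>

// A dependent string points into its base's buffer rather than owning
// characters of its own, and keeps the base alive while it exists.
struct DependentString {
    std::shared_ptr<const std::u16string> base;  // keeps base's chars alive
    size_t offset;
    size_t length;

    const char16_t *chars() const { return base->data() + offset; }
};

// superHugeString.substr(4096, 16384) can then be represented without
// copying 16384 characters: just { base, 4096, 16384 }.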

Fancier and fancier!

Really small strings are a fairly common case: they are used for things like array property indices and single characters of strings — recall that, in JavaScript, all object properties are named by strings, unlike in languages like Python, which use arbitrary hashables. To optimize for this case, we have strings with characters embedded into their GC cell header, avoiding a heap-allocated character buffer. [§] We also pre-initialize many of these atoms (strings shorter than three characters, and the integers up to 256) when the runtime starts up to bypass the typical hash table lookup overhead.
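
A minimal sketch of the layout idea follows. The names and the inline capacity are invented for illustration; the real cells differ:

#include <cstdint>

// Every string cell has the same fixed size.  An ordinary string stores a
// pointer to a heap buffer; a sufficiently short string reuses that same
// space to store its characters directly in the cell.
struct StringCell {
    uint32_t lengthAndFlags;
    union {
        const char16_t *heapChars;   // ordinary string: chars live elsewhere
        char16_t inlineChars[8];     // short string: chars live in the cell
    };
};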

I'm out of time, but I hope this gives some insight into the good number of tricks that are played to make common uses of JavaScript strings fast within the SpiderMonkey VM. For you curious types, there's lots more information in the code!

Footnotes

[*]

Of course, reason 1336 is that there are so many lame 1337 references that it's actually still funny.

[†]

Note the emphasis on ideally. Ideally, it should make you chuckle.

[‡]

Fun aside: the maximum length of a string in our engine is currently bounded to 28 bits.

[§]

Our garbage collector is not currently capable of creating variable-sized allocations — we only have fixed-size header classes.

SpiderMonkey bubbles OOM retvals

Handling Out-of-Memory conditions gracefully in a large-scale application is notoriously difficult. [*] You generally have to detect that you're OOMing at an arbitrary allocation site, then carefully not allocate any memory, but delegate to part of your application that can recover or report what happened to the user, all without allocating any memory.

Did I mention that, once you've OOM'd, you really shouldn't allocate any memory?

Recovery, so far as I know, means flushing out any expendable caches or trying to coax the operating system to page stuff out to disk.

Of course, malloc may only indicate this OOM-like scenario (by returning NULL) if you're "lucky". If you're unlucky, on overcommit systems you can get taken out by the OOM killer, dispensing its unique brand of indiscriminate justice. [†] Or, you can slow to a brain-numbing crawl as your memory is relegated to swap and the bully operating system chants, "Stop DoSing yourself! Stop DoSing yourself!"

In any case, a NULL return value from malloc must not be propagated down the "successful allocation" paths, in order to avoid potentially-exploitable NULL pointer dereference vulnerabilities.

SpiderMonkey

In SpiderMonkey we check for error conditions, including OOM conditions, everywhere. (C++ exceptions were not attempted in the Mozilla code base, due to performance issues with the Windows ABI.) As a result, most functions in SpiderMonkey are fallible. A typical signature looks something like:

bool
DoSomeStuff(JSContext *cx)
{
    MonkeyCage *cage = cx->new_<MonkeyCage>();
    if (!cage)
        return false;

    // ...put monkeys in the cage, or something...

    return true;
}

A bool is returned in order to indicate failure. cx is the execution context, which is a fancy way of saying, "An object that you thread through all your engine functions because it holds references to junk you're going to need."

One thing you're definitely going to need is allocation functionality. You can get at allocation functionality through helper methods like cx->malloc_() and cx->new_<>() — these do some accounting (how many outstanding bytes have been malloc'd on this context?) and, if you run out of memory, flush some GC stuff to see if there's free space to be had.

When an error occurs while you DoSomeStuff, you set "exception" details (such as a JavaScript exception object or an "I've hit an OOM" flag) on the context. Then, you return false to your caller. And, if it doesn't know how to handle the exception, that caller returns false to its caller.

This transitive "bubbling" of a false return value to callers unwinds the C++ stack — RAII objects with destructors release any resources they had acquired — and permits callers who understand the error to handle it appropriately. In this way, even out-of-memory errors are "catchable" by native code, in some sense.
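
Continuing the hypothetical DoSomeStuff example above, a caller that can't handle the failure just bubbles it. AutoLockZoo is an invented RAII guard, purely to illustrate the unwinding point:

// Hypothetical caller: it doesn't know how to recover, so it propagates the
// false return, and its RAII locals release their resources as the stack
// unwinds.
bool
DoLotsOfStuff(JSContext *cx)
{
    AutoLockZoo lock(cx);      // hypothetical RAII guard, released on unwind

    if (!DoSomeStuff(cx))
        return false;          // error/OOM details are already set on cx

    // ...more fallible work...
    return true;
}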

At the outer membrane of the engine is the SpiderMonkey API, called the JSAPI. The functions in the JSAPI reflect this same fallibility: most of the API functions also return a boolean value and take a context which can be used to report an error.

V8

By contrast, V8 is able to use this construct to avoid a similar OOM check-and-bubble guideline:

void* Malloced::New(size_t size) {
  void* result = malloc(size);
  if (result == NULL) {
    v8::internal::FatalProcessOutOfMemory("Malloced operator new");
  }
  return result;
}

FatalProcessOutOfMemory calls a fatal error callback and then calls abort(). It doesn't look like the callback is capable of doing recovery and continuing execution, but I'm new to the codebase, so I'm not totally sure.

With process isolation, calling abort() when you run out of memory is somewhat excusable. If we did this in Firefox, which does not have process-per-tab, one page which hit such an OOM would take out the whole program. In Chrome, however, an OOM handled in this way will take out the current process-isolated tab group. When you have lots of tabs open in Chrome, a sizable number of tabs may be muxed to a single process, in which case this might be considered bad user-level behavior, but it works well in the more common case (with a small number of tabs).

Ignoring opinions on whether aborting a tab group is somehow "worse" than alternatives, I have to say that not explicitly checking for OOMs is nice. Writing, maintaining, and validating the correctness of seldom-taken but abundantly-existent OOM paths is a tax on development.

"Infallible" allocation

Allocations can be bucketed into two simple categories: allocations of a small, well-understood, generally fixed size, and allocations whose size is determined by user-controlled content.

If you've attempted allocation recovery procedures but still fail on an allocation in the former category, life is tough. These should generally succeed, because they're well-understood known quantities: a call to malloc(sizeof(double)) failing means you're up against a pretty serious memory limit, so it's tough to do more than display a pre-canned, "Sorry, everything is hosed!" dialog. Unless your process is architected such that you can cleave off and deallocate a large amount of currently used (and otherwise unreferred-to) memory at the user's behest with zero memory overhead, you don't have very good options.

By contrast, the latter might not succeed simply because the user-controlled content is sufficiently weird, malformed, or malicious.

To avoid the tax I mentioned in the last section, the Mozilla Gecko engine (outside of SpiderMonkey) has been moving to a strategy called "infallible malloc" for the former category of allocations. If a well-understood allocation of a reasonable and generally fixed size fails, it will first attempt memory recovery procedures [‡] and, failing at that, will cause the process to abort. With this scheme, you avoid the development tax of checking for OOM conditions.
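
A rough sketch of that policy follows. This is not Gecko's actual implementation, and AttemptMemoryRecovery is a hypothetical hook standing in for whatever cache-flushing or GC-poking the recovery step does:

#include <cstddef>
#include <cstdlib>

void AttemptMemoryRecovery();   // hypothetical: flush caches, run GC, etc.

// "Infallible" allocation: try to allocate, attempt recovery if that fails,
// and abort the process rather than ever returning NULL.
void *
infallible_malloc(size_t size)
{
    void *p = std::malloc(size);
    if (!p) {
        AttemptMemoryRecovery();
        p = std::malloc(size);
        if (!p)
            std::abort();       // callers never see NULL, so they never check
    }
    return p;
}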

Back to SpiderMonkey

So, is there practical benefit to bubbling an OOM through the whole engine to the API caller?

Currently, yes. Ultimately, probably not.

At the moment, both of the categories of allocation I mentioned are using the same mechanism, so hitting a content-induced OOM (second category) in SpiderMonkey will bubble the issue up to Gecko, which will be able to report that error and proceed as normal. In the future, however, there's no essential reason to bubble the issue up to Gecko: we could just use infallible malloc from SpiderMonkey directly.

SpiderMonkey already supports specification of a custom allocator at compile time, so we would just need to incrementally identify known- and fixed-size allocations and port them to use infallible malloc. That way, instead of bubbling up a return code from SpiderMonkey to ultimately cause an abort() from infallible malloc in Gecko, we could cause the abort() from SpiderMonkey's infallible malloc call directly, and avoid the OOM development tax.

Footnotes

[*]

With apologies to Erik Corry for misunderstanding how V8 handled OOMs in my recent reading!

[†]

This could occur due to a legitimate dereference of a memory chunk that malloc cheerily gave you — it just never happened to be backed by physical memory! Similarly, on a system where multiple applications are running simultaneously, any one of them is a potential candidate for OOM killing, so your process may get OOM killed because a different process legitimately dereferenced its overcommitted memory.

[‡]

I've been informed that this is not currently enabled, but is under active development by the MemShrink team.

Reviews redux

"Whoa, Billy reviewed a one-meg patch to the hairiest part of the codebase in just two hours!" [*]

It's pretty easy to identify what's wrong with that sentence. The speed of a review is not an achievement. Billy could have literally just checked the, "yes, I reviewed it" button without looking at the patch.

... but an empty review looks pretty bad, especially as the size of the patch grows. So maybe Billy padded it out by identifying two hours worth of style nits and asking for a few comments here and there. In any case, the code quality is no more assured after the review than before it.

Conventional wisdom is that it's economically prudent to do good code reviews: finding defects early incurs the lowest cost, review provides 'peer pressure'-style motivation toward quality improvement, and review creates knowledge redundancy that mitigates the bus effect. In research literature on code review, effectiveness is typically measured as "defects found per KLoC". [†] However, this dimension ignores the element of "time per review": I'm going to argue that the time it takes to give a good review varies with the complexity and size of the modifications.

Now, one can argue that, if Billy does anything more than ignorantly checking the little "I've reviewed this" box, he has the potential to add value. After all, code isn't going to be defect-free when it comes out of review, so we're just talking about a difference in degree. If we assume that truly obscure or systematic bugs won't jump out from a diff, what additional value is Billy really providing by taking a long time?

This is where it gets tricky. I think the reason that folks can have trouble deciding how long reviews should take is that we don't know what a review really entails. When I request that somebody review my patch, what will they try to suss out? What kind of code quality (in terms of functional correctness and safety) is actually being assured at the component level, across all reviewed code?

If you can't say that your reviews ensure some generally understood level of code quality (i.e. certain issues have definitively been considered), it's hard to say that you're using reviews as an effective tool.

Aside: even with clear expectations for the code review process, each party has to exercise some discipline and avoid the temptation to lean on the other party. For mental framing purposes, it's a defect-finding game in which you're adversaries: the developer wants to post a patch with as few defects as possible and the reviewer wants to find as many defects as they possibly can within a reasonable window of time.

A few best practices

From the research I've read on code review, here are two simple things that are supposed to increase defect-finding effectiveness:

Scan, then dig.

Do a preliminary pass to get the gist of how it's structured and what it's doing. Note down anything that looks fishy at a glance. Once you finish your scan, then do another pass that digs into all the corner cases you can think of and inspects each line thoroughly.

Keep checklists.

One checklist for self-reviews and one checklist for reviews of everybody else's stuff. I've seen it recommended that you scan through the code once for every checklist item to do a truly thorough review.

The self-review checklist is important because you tend to repeat the same mistakes until you've learned them cold. When you introduce a defect and it gets caught, figure out where it fits into your list and make a mental and/or physical note of the example, or add it as a new category.

Having a communal checklist can also be helpful for identifying group pain points. "Everybody screws up GC-rooting JSString-derived chars sometimes," is easily codified in a communal checklist document that the whole team can reference. In addition, this document helps newcomers avoid potential pitfalls and points out areas of the code that could generally be more usable / less error prone.

Here's another nice summary of more effective practices.

I'm personally of the opinion that, if you find something you think is defective, you should try to write a test that demonstrates it. The beneficial outcomes of this are that the defect is demonstrated concretely (or you discover that it isn't actually a defect), and that the test can live on as a regression test once the fix lands.

I think in an ideal situation there are also linter tools in place to avoid style nits altogether: aside from nits masquerading as legitimate review comments, automatically enforced stylistic consistency is nice.

Footnotes

[*]

Just in case you were wondering, Billy is not an actual person. I think I started using Billy as my hypothetical example person's name after I saw this fairly amusing video.

[†]

In the literature almost every substantive comment is grouped under the term "defect". This includes things like design decisions and suggested factorings. In the same sense that finding a behavioral error early has benefit, finding these issues early helps improve the overall quality of the product going forward.

Chemistry and compatibility

There's a spectrum for the working compatibility between two people.

On the far left of the spectrum, there's negativity. You hate the other person's guts, and can't work with them at all. There's some personality conflict (which could simply be, "That person is an asshole") or some impasse that would require psychotherapy to bridge.

On the far right of the spectrum, there's chemistry. Effectively, you want to have their technological babies. You finish each other's... that's right, sandwiches. Or sentences. Or parser combinator libraries. When you stumble with a task or concept, that person is there to pick you up with a how's-it-going or whiteboard marker, and that's a two-way street. You work together like the badass components of an emergently-more-badass machine. Bio-digital jazz, man.

And smack dab in the middle, there's plain ol' compatible. This is like the "friend zone" of the working world. It's fine, and you can go on that way indefinitely, getting things done at a reasonable clip, but it probably doesn't get the creative juices flowing. You're scheduled to meet at a waypoint instead of bushwhacking away at the thicket together.

It takes time, effort, and luck to find people that you have working chemistry with — they're understandably rare. The effort has to start somewhere, though. Maybe it's a good exercise to imagine a person that you're just working-compatible with: if you bare your technological soul to them, might you get something going on?