January 13, 2012

Lively assertions

Recently, "another" discussion about fatal assertions has cropped up in the Mozilla community. Luckily for me, I've missed all of the other discussions, so this is the one where I get to throw in my two bits.

Effectively, I only work on the JS engine, and the JS engine only has fatal assertions. This approach works for the JS team, and I can take an insider's guess as to why.

What's a fatal assertion?

In Mozilla, we have two relevant build modes: debug and no-debug.

A fatal assertion means that, when I write JS_ASSERT(someCondition), if someCondition doesn't hold, we call abort in debug build mode. As a result, the code which follows the assertion may legitimately assume that someCondition holds. You will never see something like this in the JS engine:

    JS_ASSERT(0 <= offset && offset < size);
    if (0 <= offset && offset < size) // Bad! Already enforced!
        offset_ = offset;

The interesting thing is that, in no-debug mode, we will not call abort. We eliminate the assertion condition test entirely. This means that, in production, the code which follows the assertion assumes that someCondition holds, and there's nothing checking that to be the case. [*]

Exploding early and often

If a JS-engine hacker assumes someCondition during development, and it turns out that someCondition isn't the case, we'd like to know about it, and we'd like to know about it LOUDLY.

Our valiant security team runs fuzz testing against the JS engine continuously, and hitting any one of these fatal assertions causes an abort. When you know that there is input that causes an abort in debug mode, you have a few potential resolutions:

But I think the real key to this whole process is simple: if things are exploding, a member of the bomb squad will show up and come to some resolution. Fatal assertions force action in a way that logs will not. You must (at least cursorily) investigate any one of these assertions as though it were in the most severe category, and some form of resolution must be timely in order to unblock fuzzers and other developers.

Everything that the hacker feels can and should be asserted is being asserted in a way that's impossible to ignore. Invariants present in the code base are reflected by the fatal assertions and, once they've proven themselves by running the regression/fuzzer/web gamut, can be depended upon — they certainly come to reinforce and strengthen each other over time.



We do have mechanisms that hackers can use for further checking, however. If crash reports indicate that some assertions may be suspect in production environments, we have a JS_OPT_ASSERT for doing diagnostics in our pre-release distribution channels. Since the most reliable information in a crash report tends to be the line number that you crashed on, fatal non-debug assertions are a very useful capability.

Reviews redux

"Whoa, Billy reviewed a one-meg patch to the hairiest part of the codebase in just two hours!" [*]

It's pretty easy to identify what's wrong with that sentence. The speed of a review is not an achievement. Billy could have literally just checked the, "yes, I reviewed it" button without looking at the patch.

... but an empty review looks pretty bad, especially as the size of the patch grows. So maybe Billy padded it out by identifying two hours worth of style nits and asking for a few comments here and there. In any case, the code quality is no more assured after the review than before it.

Conventional wisdom is that it's economically prudent to do good code reviews: finding defects early incurs the lowest cost, review has a 'peer pressure' based motivation towards quality improvement, and review creates knowledge redundancy that mitigates the bus effect. In research literature on code review, effectiveness is typically measured as "defects found per KLoC". [†] However, this dimension ignores the element of "time per review": I'm going to argue that time to give a good review varies in the complexity and size of the modifications.

Now, one can argue that, if Billy does anything more than ignorantly checking the little "I've reviewed this" box, he has the potential to add value. After all, code isn't going to be defect-free when it comes out of review, so we're just talking about a difference in degree. If we assume that truly obscure or systematic bugs won't jump out from a diff, what additional value is Billy really providing by taking a long time?

This is where it gets tricky. I think the reason that folks can have trouble deciding how long reviews should take is that we don't know what a review really entails. When I request that somebody review my patch, what will they try to suss out? What kind of code quality (in terms of functional correctness and safety) is actually being assured at the component level, across all reviewed code?

If you can't say that your reviews ensure some generally understood level of code quality (i.e. certain issues have definitively been considered), it's hard to say that you're using reviews as an effective tool.

Aside: even with clear expectations for the code review process, each party has to exercise some discipline and avoid the temptation to lean on the other party. For mental framing purposes, it's a defect-finding game in which you're adversaries: the developer wants to post a patch with as few defects as possible and the reviewer wants to find as many defects as they possibly can within a reasonable window of time.

A few best practices

From the research I've read on code review, these are two simple things that are supposed to increase defect-finding effectiveness:

Scan, then dig.

Do a preliminary pass to get the gist of how it's structured and what it's doing. Note down anything that looks fishy at a glance. Once you finish your scan, then do another pass that digs into all the corner cases you can think of and inspects each line thoroughly.

Keep checklists.

One checklist for self-reviews and one checklist for reviews of everybody else's stuff. I've seen it recommended that you scan through the code once for every checklist item to do a truly thorough review.

The self-review checklist is important because you tend to repeat the same mistakes until you've learned them cold. When you make a defect and it gets caught, figure out where it fits into your list and make a mental and/or physical note of the example, or add it as a new category.

Having a communal checklist can also be helpful for identifying group pain points. "Everybody screws up GC-rooting JSString-derived chars sometimes," is easily codified in a communal checklist document that the whole team can reference. In addition, this document helps newcomers avoid potential pitfalls and points out areas of the code that could generally be more usable / less error prone.

Here's another nice summary of more effective practices.

I'm personally of the opinion that, if you find something that you think is defective, you try to write a test to demonstrate it. The beneficial outcomes of this are:

I think in an ideal situation there are also linter tools in place to avoid style nits altogether: aside from nits masquerading as legitimate review comments, automatically enforced stylistic consistency is nice.



Just in case you were wondering, Billy is not an actual person. I think I started using Billy as my hypothetical example person's name after I saw this fairly amusing video.


In the literature almost every substantive comment is grouped under the term "defect". This includes things like design decisions and suggested factorings. In the same sense that finding a behavioral error early has benefit, finding these issues early helps improve the overall quality of the product going forward.

Too smart, doesn't get quite so many things done?

We care about our craft. We're totally smart and get things done. No question.

But "smart and gets things done" has to have some kind of spectrum associated with it, right? There's at least a "smart" dimension and a "gets things done" dimension.

An easy question to ask is, "Am I overthinking?" (This is especially easy to ask if you're overthinking.)

We often quibble about how to get things done better [*] in terms of practicalities, but it often feels like people who ignore the long tail of practicalities achieve greatness with the least effort.

If you had to pick one, would it be better to over-think or to over-do?

(My advice: don't think about it too much.)



In some asymptotic sense of better.

The spectrum of hack

The spectrum of hack.

At my first official performance review as a software engineer, my manager described this spectrum and told me that I was too far on the right. I'll admit that I was a bit shocked. He was an epic software engineer before he became a manager, so it's not like he didn't know what he was talking about...

He pointed out that a lot of tasks don't require that level of perfectionism and that you can get a lot done by letting yourself come over to the hack side.

I believed the powers gained by going over to the hack side were... unnatural. I rejected the advice at first. Over time, however, the concept has stuck with me, like one of those resource-leeching brain earwigs. What do they call those again? Ah right, good ideas.

Since then, I've embraced the idea and I've worked to become more versatile on the spectrum. It hurts at times — in some ways you're dropkicking the craftsman inside yourself right in the face. But, I think I've formulated a theory:

Programming mastery is the ability to oscillate wildly across the spectrum without skipping a beat. I imagine that a master has an instantaneous comprehension of what's appropriate and required to get the job done, but can write code that makes your eyes spring a joy-leak when the opportunity arises. Think "mind like water".

Being perfectly comfortable with all parts of the spectrum simultaneously:

That's a goal worth striving for. Introspection is tough, but we shouldn't leave mastery to the monks that happen to have Z80s in their monasteries.

Remove the self-selection bias from Q&A sessions

I've been in quite a few painful Q&A sessions. I think we can do better.

When somebody volunteers to ask a question, their question is not necessarily of interest to anybody in the audience other than themselves. Despite that possibility, publicly answering the question necessarily takes up the time of every person in the audience. Remember the kid in class who asked 90% of the questions — questions that nobody else ever cared about?

There's a simple way to fix this.

Step 1: At the end of your presentation, ask for a show of hands for those people interested in having a Q&A session. If very few people raise their hands, you should be worried about the quality your presentation. However, in the less gloomy scenario where a good number of people raise their hands, you have a representative sample for the audience population that might be interested in the answer to any given question. (The other people should probably leave, but suggesting that would be rude.)

Step 2: After each person volunteers to ask a question, give the audience a quick poll by saying, "Raise your hand if you're interested in the answer to this question." If very few hands go up, reassure the person that their question is a good one [*] and tell them you'd love to stick around and chat afterwards.

That's it. Each several-minute irrelevant Q&A iteration is now reduced to a few seconds and you have additional feedback as to the topics the audience is interested in — the added audience involvement can't hurt either.


Real-time feedback software that the audience interacts with during your presentation can definitely do better. This is just a low-tech proposal to raise the bar a bit.

As my friend pointed out in discussing this, there's no simple way to determine the critical mass for answering a question — it's possible that any given question will only be interesting to a fraction of the audience. It's also possible that you, the presenter, know that the answer to a question is particularly interesting and should be answered without soliciting audience feedback. I believe in your intuition!



Even if it's not — otherwise, question-asking potentially becomes public humiliation, which is very undesirable.