December 6, 2010

Successfully observing heap data in minidump stacks

If my bedroom's system of organization is to be believed, the heap is a great place to put things.

In all seriousness, though, sometimes you can't diagnose a crash without heap data. It can also be difficult to differentiate bogus information that the debugger presents to you from the real stuff.

Regular expressions (regexps) are particularly heapy: the source of the regexp, the mini-program that the regexp boils down to, the string that the regexp operates on, and the result of the match all reside on the heap. Since our minidumps only capture stack data, it's difficult to glean relevant information when things go wrong with regexps.

Unintuition

So, if we want to see relevant heap data, we have to get it onto the stack. What does your gut tell you the solution is? Make a buffer on the stack and memcpy some data into it!

Ah, but the compiler hates your gut(s). It optimizes away both the stack data and the memcpy, because it can prove that the program doesn't observe any of the stack values.

Well, I suppose the compiler didn't really hate your guts — it actually has no idea that you wanted to observe that data. It only understands the semantics of the language that it's compiling, and those semantics state that the stack buffer was unobservable.
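To make the gut-feeling version concrete, here is a minimal sketch of it. The function name and the use of std::string as the heap data are my own stand-ins for illustration, not code from the bug in question. Nothing ever reads buf after the copy, so an optimizing compiler is permitted to drop both the buffer and the memcpy:

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical naive attempt: copy some heap data into a stack buffer and
// hope that it shows up in the minidump.
std::size_t naiveSample(const std::string &heapStr) {
    char buf[128];
    std::size_t n = heapStr.size() < sizeof(buf) ? heapStr.size() : sizeof(buf);

    // Dead store: no later code reads buf, so as far as the language is
    // concerned, removing this copy (and buf itself) changes nothing.
    std::memcpy(buf, heapStr.data(), n);

    // The function's result does not depend on buf in any way.
    return heapStr.size();
}
```

Whether a given compiler actually removes the copy depends on the optimization level and the surrounding code; the point is only that it is allowed to.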

So how do we tell the compiler not to optimize away our stack buffer? Here are some approaches that people have told me do not work at the point of the crash:

So what do we know actually works?

In a recent bug I had success doing the following:

struct JSContext {
    // ...
    volatile jschar *sampleBuf;
};

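void doBadThings(JSContext *heapContext, JSString *str) {
    /*
     * Declared at function scope so that the buffer is still alive at the
     * point of the crash, not just inside the if-block below.
     */
    jschar buf[128];

    /*
     * We've witnessed a weird crashy address in the bug reports, so when
     * we see that, we want to take a sample of the string that's
     * crashing.
     */
    if (aboutToCrossWeirdCrashyBoundary(str)) {
        heapContext->sampleBuf = buf;
        /* memcpy counts bytes, so scale the character count by sizeof(jschar). */
        memcpy(buf, str->chars(), JS_MIN(128, str->length()) * sizeof(jschar));
    }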

    // ... Point of crash!
}

And for values that need to be reliably observed through debug information at the point of the crash (i.e. not have their contents somehow affected by optimization):

struct JSContext {
    // ...
    volatile size_t *strLen;
};

void doOtherBadThings(JSContext *heapContext, JSString *str) {
    volatile size_t len = str->length();
    heapContext->strLen = &len;
    /*
     * We'll be able to observe the correct value of len within this
     * frame.
     */

    // ... Point of crash!
}

Note that the only real difference between these two examples is that I didn't mark the stacked buffer itself as volatile and that still worked out. More experimentation is needed! Unfortunately, figuring out what works for diagnostics has largely been trial and error.
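For anyone who wants to experiment, here is a self-contained sketch that combines the two ideas above. Everything in it is a stand-in that I made up for illustration: Context plays the role of JSContext, std::u16string plays the role of JSString, and readSample simulates "someone following the pointers at the point of the crash". It shows the shape of the trick and is not a guarantee about what any particular compiler will preserve.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for a long-lived, heap-allocated context object.
struct Context {
    volatile char16_t *sampleBuf = nullptr;     // points into a live stack frame
    volatile std::size_t *sampleLen = nullptr;  // likewise
};

const std::size_t kSampleChars = 128;

// Hypothetical "point of crash": reads the sample back through the context,
// the way someone inspecting the dump would follow the pointers.
std::u16string readSample(const Context *ctx) {
    std::u16string out;
    const std::size_t n = *ctx->sampleLen;
    for (std::size_t i = 0; i < n; i++) {
        char16_t c = ctx->sampleBuf[i];  // volatile read, copied to a plain local
        out.push_back(c);
    }
    return out;
}

std::u16string doRiskyThing(Context *ctx, const std::u16string &heapStr) {
    assert(ctx != nullptr);

    // Both locals live at function scope, so they outlive every use below.
    char16_t buf[kSampleChars];
    volatile std::size_t len = std::min(heapStr.size(), kSampleChars);

    // memcpy counts bytes, so scale the character count.
    std::memcpy(buf, heapStr.data(), len * sizeof(char16_t));

    // Publish the addresses through volatile pointers in the context, so the
    // compiler cannot prove that the stack data is unobservable.
    ctx->sampleBuf = buf;
    ctx->sampleLen = &len;

    std::u16string seen = readSample(ctx);  // ... point of crash would be here

    // Don't leave pointers to a dead frame behind.
    ctx->sampleBuf = nullptr;
    ctx->sampleLen = nullptr;
    return seen;
}
```

Calling doRiskyThing with a short string hands back that same string via the stack sample, and longer strings are truncated to kSampleChars characters.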

(I also wonder if we can do much fancier things with breakpad — this is a quick-and-dirty solution that I knew was likely to work. Nobody that I've talked to so far knows how we'd go about registering breakpad hooks, so that's another thing to look into!)

Alternative approach

As dmandelin pointed out to me, you can also think of crashes as black boxes that take in a build/URL and produce a line number. If you can detect that you're about to crash, then you can switch on a value of interest (or use an if-ladder with arbitrary conditions) and intentionally crash early on different arms of the switch. Each arm produces its own line number in the crash report, which can help narrow down the source of the problem.

if (weAreAboutToCrash()) {
    if (this == NULL && aboutToBeSwallowedByTheSun)
        JS_CRASH("Bad this pointer.");
    else if (this->count == 666 && !circleOfProtectionBlack)
        JS_CRASH("Possessed.");
    else
        JS_CRASH("Unknown crash reason");
}

Footnotes

[*]

It's confusing that this wouldn't work because, as lw points out, you might be able to mmap bits of the stack to MMIO space or something crazy that the compiler would then incorrectly optimize. Reads and writes to volatile storage are supposed to be one of the observable properties of a C/C++ program, right alongside I/O.