ARM chars are unsigned by default
[Latest from the "I can't believe I'm writing a blog entry about this"
department, but the context and surrounding discussion is interesting. --Ed]
If you're like me, or one of the other thousands of concerned parents who has borne C code into this cruel, topsy-turvy, and oftentimes undefined world, you read the C standard aloud to your programs each night. It's comforting to know that K&R are out there, somewhere, watching over them, as visions of Duff's Devices dance in their wee little heads.
The shocking truth
In all probability, you're one of today's lucky bunch finding out that the
signedness of the char datatype in C is implementation-defined. The implication
being, when you write char, the compiler is implicitly (but consistently)
giving it either the signed or unsigned modifier. From the spec: [*]
The three types char, signed char, and unsigned char are collectively called
the character types. The implementation shall define char to have the same range,
representation, and behavior as either signed char or unsigned char.
Irrespective of the choice made, char is a separate type from the
other two and is not compatible with either.
—ISO 9899:1999, section "6.2.5 Types"
Why is char distinct from the explicitly-signed variants to begin with? A
great discussion of historical portability questions is given here:
Fast forward [to 1993] and you'll find no single "load character from
memory and sign extend" in the ARM instruction set. That's why, for
performance reasons, every compiler I'm aware of makes the default char
type signed on x86, but unsigned on ARM. (A workaround for the GNU GCC
compiler is the -fsigned-char parameter, which forces all chars to
become signed.)
—Portability and the ARM Processor, Trevor Harmon, 2003
It's worth noting, though, that in modern times the ISA has both LDRB (Load
Register Byte) and LDRSB (Load Register Signed Byte) instructions available,
so a byte load with sign extension can be done in a single instruction.
So what does this mean in practice? Conventional wisdom is that you use
unsigned values when you're bit bashing (although you have to be extra careful
bit-bashing types smaller than int due to promotion rules) and signed values
when you're doing math, [‡] but now we have this third type, the
implicit-signedness char. What's the conventional wisdom on that?
Signedness-un-decorated char is for ASCII text
If you find yourself writing:
char some_char = NUMERIC_VALUE;
You should probably reconsider. In a case like that, where you're clearly
doing something numeric, spring for a signed char so the effect of arithmetic
expressions is consistent across platforms. The more typical usage, though, is
still character data, for which an undecorated char is the idiomatic choice.
For numeric uses, also consider adopting a fixed-width or minimum-width
datatype from <stdint.h>. You really don't want to hold the additional
complexity of char signedness in your head, as integer promotion rules are
already quite tricky.
Examples to consider
Some of the following mistakes will trigger compiler warnings, but it's worth
knowing what to look for in the warning spew (and which compiler options to
consider changing) when you're cross-compiling for ARM.
Example of badness: testing the high bit
Let's say you wanted to see if the high bit were set on a char. If you assume signed chars, this easy-to-write comparison seems legit:
But if your char type is unsigned that test will never pass.
Example of badness: comparison to negative numeric literals
You could also make the classic mistake:
char c = getchar(); // Should actually be placed in an int!
while (c != EOF)
With an 8-bit unsigned char datatype and a 32-bit int datatype, the test
c == EOF can never succeed, so this loop never terminates. Here's the breakdown:
When getchar() returns -1 (as an int) to represent EOF, you truncate that
value to 0xFF when storing it into the 8-bit unsigned char. Then, when you
compare against EOF, that unsigned value is promoted to a signed int via zero
extension (preserving the value of the original unsigned char), and you end up
comparing 0x000000FF (255 in decimal) against 0xFFFFFFFF (-1 in decimal).
Since every value in the unsigned char range promotes to something in
[0, 255], I hope it's clear that this test will never pass. [§]
To make the example a little more obvious, we can replace the call to
getchar() and the EOF constant with a numeric -1 literal, and the same thing
happens:
char c = -1;
assert(c == -1); // This assertion fails. Yikes.
That last snippet can be tested by compiling with GCC under -fsigned-char and
then -funsigned-char, if you'd like to see the difference in action.
Picky monkeys PIC ARM
Alongside our ferocious fixing, one of our late-game performance initiatives was to get all of our polymorphic inline caches (AKA PICs) enabled on ARM devices. It was low risk and of high benefit to our Firefox for Mobile browser, whose badass-yet-cute codename is Fennec.
Jacob Bramley and I took on this ARM support task in bug 588021, obviously building on excellent prior inline cache work from fellow team members David Anderson, Dave Mandelin, Sean Stangl, and Bill McCloskey.
tl;dr: Firefox for Mobile fast on ARM. Pretty graphs.
Melts in your mouth, not in your ARM
To recap, JägerMonkey (JM) is also known as the "method compiler": it takes a method's bytecode as input and orders up the corresponding blob of machine code with some helpful information on the side. Its primary sub-components are the register tracker, which helps the compiler transform the stack-based bytecode and reuse already-allocated machine registers intelligently, and the MacroAssembler, which is the machine-code-emitting component we imported from Webkit's Nitro engine.
The MacroAssembler is the secret sauce for JägerMonkey's platform independence. It's an elegantly-designed component that can be used to emit machine code for multiple target architectures: all of x86, x86-64, and ARM assembly are supported through the same C++ interface! This abstraction is the reason that we only need one implementation of the compiler for all three architectures, which has been a clear win in terms of cross-platform feature additions and maintainability.
"So", you ask, "if you've got this great MacroAssembler-thingy-thing, why didn't all the inline caches work on all the platforms to begin with?" Or, alternatively, "If all the compiler code is shared among all the platforms, why didn't all the inline caches crash on ARM?"
The answer is that some platform-specifics had crept into our compiler code!
ARM'd and ifdef-dangerous
As explained in the entry on inline caches, an inline cache is a chunk of self-modifying machine code. A machine code "template" is emitted that is later tweaked to reflect the cached result of a common value. If you're frequently accessing the nostrilCount property of Nose objects, inline caches make that fast by embedding a shortcut for that access into the machine code itself.
In the machine code "template" that we use for inline caches, we need to know where certain constants, like object type and object-property location, live as offsets into the machine code so that we can change them later, during a process called repatching. However, when our compiler says, "If this value is not 0xdeadbeef, go do something else," we wind up with different encodings on each platform.
As you may have guessed, machine-code offsets are different for each platform, which made it easier for other subtle platform-specifics to creep into the compiler as well.
To answer the question raised earlier, the MacroAssembler interface wasn't heavily relied on for the early inline cache implementations. Inline caches were first implemented for x86, and although x86 is a variable-width instruction set, all of the instruction sequences emitted from the compiler had a known instruction width and format. [*] This permitted us to use known-constant-offset values for the x86 platform inline caches. These known-constant-offsets never changed and so didn't require any space or access time overhead in side-structures. They seemed like the clear solution when x86 was the only platform to get up-and-running.
Then x86-64 (AKA x64) came along, flaunting its large register set and colorful plumage. On x64, the instruction sequence did not have a known width and format! Depending on whether the extended register set is used, things like mov instructions may require a special REX prefix byte in the instruction stream (highlighted in blue above). This led to more ifdefs — on x64 a bunch more values have to be saved in order to know where to patch our inline caches!
As a result, getting inline caches working on ARM was largely a JägerMonkey refactoring effort. Early on, we had used conditional compilation (preprocessor flags) to get inline caches running on a platform-by-platform basis, which was clearly the right decision for rapid iteration, but we decided that it was time to pay down some of our technical debt.
Paying down the debt: not quite an ARM and a leg
The MacroAssembler deals with raw machine values — you can tell it dull-sounding machine-level things like, "Move this 17 bit sign-extended immediate into the EAX register."
On the other hand, we have our own awesome-sounding value representation in the SpiderMonkey engine: on both 32-bit and 64-bit platforms every "JS value" is a 64-bit wide piece of data that contains both the type of the data and the data itself. [†] Because the compiler is manipulating these VM values all the time, when we started the JägerMonkey compiler it was only natural to put the MacroAssembler in a delicious candy coating that also knew how to deal with these VM values.
The NunboxAssembler, pictured in red, [‡] is a specialized assembler with routines to deal with our "nunbox" value representation. [§] The idea of the refactoring was to candy-coat a peer of the MacroAssembler, the Repatcher, with routines that knew how to patch common inline cache constructs that the NunboxAssembler was emitting.
With the inline cache Repatcher in place, we were once again able to move all the platform-specific code out of the compiler and into a single, isolated part of the code base, hidden behind a common interface.
Routines like NunboxAssembler::emitTypeGuard, which knows how to emit a type guard regardless of the platform, are paired with routines like ICRepatcher::patchTypeGuard(newType), which knows how to patch a type guard regardless of platform. Similarly, NunboxAssembler::loadObjectProperty has a ICRepatcher::patchObjectPropertyLoad. The constructs that are generated by the NunboxAssembler are properly patched by the corresponding ICRepatcher method on a miss. It's all quite zen.
On real devices running the Fennec betas, we've seen marked improvements since Beta 3. [¶] Most notably, we've leapfrogged the stock Android 2.2 browser on the V8-V5 benchmark on both the Galaxy S and the Nexus One. Pretty graphs courtesy of Mark Finkle.
ARMn't you glad I didn't say banana?
Since I've run out of remotely-acceptable ARM malapropisms, these topics will be left to further discussion. Feel free to comment on anything that deserves further clarification!
Why does the JägerMonkey ARM back-end emit fixed-width ARMv7 machine code, instead of Thumb2?
How are JägerMonkey exceptions implemented on ARM? (It's slightly different from x86/x64.)
What are the current development platform limitations?
How does the compiler prevent a constant pool from being dumped into the code stream?