November 14, 2012

ARM chars are unsigned by default

[Latest from the "I can't believe I'm writing a blog entry about this" department, but the context and surrounding discussion are interesting. --Ed]

If you're like me, or one of the other thousands of concerned parents who have borne C code into this cruel, topsy-turvy, and oftentimes undefined world, you read the C standard aloud to your programs each night. It's comforting to know that K&R are out there, somewhere, watching over them, as visions of Duff's Devices dance in their wee little heads.

The shocking truth

In all probability, you're one of today's lucky bunch who find out that the signedness of the char datatype in C is implementation-defined. The implication being that, when you write char, the compiler implicitly (but consistently) treats it as if it carried either the signed or the unsigned modifier. From the spec: [*]

The three types char, signed char, and unsigned char are collectively called the character types. The implementation shall define char to have the same range, representation, and behavior as either signed char or unsigned char.

...

Irrespective of the choice made, char is a separate type from the other two and is not compatible with either.

—ISO 9899:1999, section "6.2.5 Types"

Why is char distinct from the explicitly-signed variants to begin with? A great discussion of historical portability questions is given here:

Fast forward [to 1993] and you'll find no single "load character from memory and sign extend" in the ARM instruction set. That's why, for performance reasons, every compiler I'm aware of makes the default char type signed on x86, but unsigned on ARM. (A workaround for the GNU GCC compiler is the -fsigned-char parameter, which forces all chars to become signed.)

Portability and the ARM Processor, Trevor Harmon, 2003

It's worth noting, though, that in modern times the ISA has both LDRB (Load Register Byte), which zero-extends the loaded byte, and LDRSB (Load Register Signed Byte), which sign-extends it, so either flavor of extension happens as part of a single load instruction. [†]

So what does this mean in practice? Conventional wisdom is that you use unsigned values when you're bit bashing (although you have to be extra careful bit-bashing types smaller than int due to promotion rules) and signed values when you're doing math, [‡] but now we have this third type, the implicit-signedness char. What's the conventional wisdom on that?

Signedness-un-decorated char is for ASCII text

If you find yourself writing:

char some_char = NUMERIC_VALUE;

You should probably reconsider. In that case, when you're clearly doing something numeric, spring for a signed char so the effect of arithmetic expressions across platforms is more clear. But the more typical usage is still good:

char some_char = 'a';

For numeric uses, also consider adopting a fixed-width or minimum-width datatype from <stdint.h>. You really don't want to hold the additional complexity of char signedness in your head, as integer promotion rules are already quite tricky.
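As a rough sketch of that advice, here is a small numeric routine written against <stdint.h> types instead of plain char. The helper name sum_samples is made up for illustration. Note that int8_t is an optional exact-width type; int_least8_t is the one the standard guarantees.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: add up small signed samples.
   int8_t is always signed, so the result does not depend on whether
   the target's plain char is signed or unsigned. */
static int32_t sum_samples(const int8_t *samples, size_t count)
{
    int32_t total = 0;
    for (size_t i = 0; i < count; i++) {
        total += samples[i];  /* each sample is sign-extended when promoted to int */
    }
    return total;
}
```

Because the element type spells out its signedness, the same source should give the same sum on x86 and on ARM.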

Examples to consider

Some of the following mistakes will trigger warnings, but you should realize there's something to be aware of in the warning spew (or a compiler option to consider changing) when you're cross-compiling for ARM.

Example of badness: testing the high bit

Let's say you wanted to see if the high bit were set on a char. If you assume signed chars, this easy-to-write comparison seems legit:

if (some_char < 0)

But if your char type is unsigned that test will never pass.
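One way to write that test so it doesn't depend on the signedness of char is to convert to unsigned char and mask the top bit explicitly. This is only a sketch, and high_bit_set is a name invented for the example.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: true when the most significant bit of c is set.
   Converting to unsigned char first gives the same answer whether plain
   char is signed or unsigned on the target. */
static bool high_bit_set(char c)
{
    const unsigned char mask = (unsigned char)(1u << (CHAR_BIT - 1));
    return ((unsigned char)c & mask) != 0;
}
```

Using CHAR_BIT rather than a hard-coded 0x80 keeps the mask right on the rare platform where a char is wider than 8 bits.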

Example of badness: comparison to negative numeric literals

You could also make the classic mistake:

char c = getchar(); // Should actually be placed in an int!
while (c != EOF)

With an 8-bit unsigned char datatype and a 32-bit int datatype, c would never compare equal to EOF, so this loop would never terminate. Here's the breakdown:

When getchar() returns ((signed int) -1) to represent EOF, you'll truncate that value into 0xFFu (because chars are an unsigned 8-bit datatype). Then, when you compare against EOF, you'll promote that unsigned value to a signed integer without sign extension (preserving the bit pattern of the original, unsigned char value), and get a comparison between 0xFF (255 in decimal) and 0xFFFFFFFF (-1 in decimal). For all the values in the unsigned char range, I hope it's clear that c will never compare equal to EOF. [§]
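For contrast, here is a minimal sketch of the fix that the comment in the buggy snippet hints at: keep the return value in an int until it has been compared against EOF. The function name count_bytes is hypothetical, and it reads from an arbitrary FILE * via getc() rather than from stdin so that it is easy to exercise.

```c
#include <stdio.h>

/* Hypothetical helper: count the bytes remaining in a stream.
   The value returned by getc() stays in an int, so EOF (-1) remains
   distinguishable from a data byte of 0xFF on every platform. */
static long count_bytes(FILE *stream)
{
    long count = 0;
    int c;  /* int, not char */
    while ((c = getc(stream)) != EOF) {
        count++;
    }
    return count;
}
```

The same int-typed loop also sidesteps the signed char version of the bug, where a legitimate 0xFF byte would be mistaken for EOF.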

To make the example a little more obvious we can replace the call to getchar() and the EOF with a numeric -1 literal and the same thing will happen.

char c = -1;
assert(c == -1); // This assertion fails when char is unsigned. Yikes.

That last snippet can be tested by compiling with GCC, once with -fsigned-char and once with -funsigned-char, if you'd like to see the difference in action.
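If you'd rather not juggle compiler flags, a rough equivalent is to spell out the two possible meanings of char explicitly. The function names below are invented for the example, and the sketch assumes int is wider than char, as it is on the platforms discussed here.

```c
#include <stdbool.h>

/* Mirrors `char c = -1; c == -1` on a target where plain char is unsigned. */
static bool unsigned_char_matches_minus_one(void)
{
    unsigned char c = (unsigned char)-1;  /* wraps to UCHAR_MAX (0xFF for 8-bit chars) */
    return c == -1;                       /* c promotes to a positive int, never -1 */
}

/* Mirrors the same snippet on a target where plain char is signed. */
static bool signed_char_matches_minus_one(void)
{
    signed char c = -1;
    return c == -1;                       /* c is sign-extended to int -1 */
}
```

A compiler may warn that the first comparison is always false, which is exactly the sort of warning worth noticing when cross-compiling.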

Footnotes

[*]

The spec goes on to say that you can figure out the underlying signedness by checking whether CHAR_MIN from <limits.h> is 0 or SCHAR_MIN. In C++ you could do the <limits>-based std::numeric_limits<char>::is_signed dance.
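A minimal sketch of the <limits.h> check in C; the helper name plain_char_is_signed is made up for illustration.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: true when plain char behaves like signed char.
   CHAR_MIN is 0 when char is unsigned and equals SCHAR_MIN otherwise. */
static bool plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}
```

Since CHAR_MIN is a constant expression, the same condition also works in an #if directive when the decision has to be made at compile time.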

[†]

Although the same encodings exist in Thumb-sub-ISA, the ARM-sub-ISA encoding for LDRSB lacks a shift capability on the load output as a result of this historical artifact.

[‡]

Although sometimes the tradeoffs can be more subtle. Scott Meyers discusses more issues quite well, per usual.

[§]

Notably, if you make the same mistake in the signed char case you can breathe easier, because you'll sign extend for the comparison, making the test passable.

These tablets are for consumption

I got lucky and came up with a witty title for this one despite sleep deprivation. I could probably go on some sarcastic diatribe about how we happily pay half a thousand dollars for a magazine-consolidating bathroom reading device while people with TB lack necessary medical supplies; but, surprisingly, my goal is not to torture you, dear reader. Mostly because you've got it going on.

In reality, I just wanted to confess to the world that I get it now. Work recently lent me an Asus Transformer tablet (sans fancy keyboard dock thing I've heard about) in order to debug a JS problem in OS X cross compiles. So, I took the plunge, trying to figure out what people actually use these things for in their daily lives.

For me, the answer was pretty simple: streamlined content consumption.

I quickly learned that I can't create anything of value on a tablet in its natural habitat — at least, until the demand for "world's funniest pot-roast-fisted input device typing error videos" goes mainstream.

At first, I found this infuriating. Most of my typical computing time is spent creating things — things of questionable value though they may be. But then, a docile sense of calm and well being washed over me, like that inexplicable clump of undissolved Koolaid powder licked off the lips of a siren or a wildly misfired tranquilizer dart.

I don't have to try to produce things all the time. I can chill.

Reading books in the book reader, catching up on bug mail, knocking down a few cool and refreshing feed reader entries on one of California's patronizingly delectably prodigiously warm October days.

Sure, all the cross country runners care about now is training, but if you entice them to run in a giant hamster ball, how much more likely are they to stop and smell the roses?

(Presumably the aforementioned hamster ball has large air holes that you could potentially smell flowers through.)

Thoughts on desktop Linux incompatibilities with iPhone and Android

Linux users want music-player/phone integration. Linux users want to sync all of their data — contacts, emails, calendars, bookmarks, documents, ebooks, music, photos, videos — at the touch of a button. Linux users want 3G data rates. Linux users want a state of the art, coordinated mobile platform.

If FLOSS developers are so prone to scratching their own itches, why doesn't there exist such a thing?

Because large scale mobile device companies box us out.

The iPhone Platform

I believe that Linux users who purchase their iPhone with the intent of jailbreaking it to fake compatibility are doing the Linux community a great disservice. They are purchasing a device which is made with the intent of not working with their computers. There's no more mass storage device. There's no longer a known iTunesDB format. The iPhone goes so far to obscure our intended usage that the community-recommended method of gaining functionality was to use an arbitrary code execution exploit. This is what we're driven to do. Do you want to support this behavior with your $200-500?

From a technological standpoint, our historical success at reverse engineering is very cool. It demonstrates the community's technical prowess through our ability to overcome artificial barriers. Despite the coolness factor, however, we cannot and should not rely on our ability to kluge around obstacles in our path. Why? Because it doesn't allow us to make any definitive progress. It constantly puts us several steps behind the capabilities of a "properly" functioning device, both due to the difficulty of finding a solution and the misdirection of creative energy. One can't reasonably expect to build a working, Linux-compatible platform on top of a series of hacks that could potentially break with any minor release.

Even more insulting is the message that alternative solutions that work within the system are unwelcome. In my mind, the rallying cry of the Linux community should be "iPhone != iTunes". Ideally, the community could write an iTunes replacement application that played Ogg Vorbis and FLAC files. Let's enumerate some problems that this would solve for FLOSS developers and enthusiasts:

  1. We wouldn't have to reverse engineer the new iTunesDB format (or anything having to do with iTunes).

  2. We wouldn't have to reverse engineer the new iPhone USB protocol.

  3. We would be starting a platform with a solid base that we could build upon. We would no longer be at the mercy of a development shop that clearly doesn't care about our demographic.

  4. We could have it connect to a small socket server on our local machines and automatically sync music over WiFi.

  5. We could play Ogg Vorbis files, for God's sake!

We could write a whole suite of totally legitimate applications for the iPhone to perform compatible iPhone-native-application-like functionality, all within the artificial constraints of the iPhone! There's nothing stopping us — except for the distribution mechanism. If Apple is at all amenable to our cause, the rejection of competitive apps will have to stop. Again: we should not have to void our warranties to use our product in legitimate ways on our competitive computing platforms.

Sadly, even if iTunes-store enlightenment came to fruition, we'd still be screwed. Platform restrictions disallow several key abilities. Case in point, we could not background our iTunes-replacement music player while we browsed the web (or did anything else, for that matter). We find ourselves at the mercy of the exposed API and Human Interface restrictions. Although this is unfortunate, it's decidedly better than founding a platform on our ability to hack around the poor design decisions of others.

The Android Platform

I'm much less well informed about the Android platform and the upcoming HTC Dream mobile device. Nobody is well informed at this point — almost exactly one month from the expected release date — much to the chagrin of potential customers. There are early indications that Linux desktop compatibility will not be supported natively on this platform either. As a Linux user, I can only cross my fingers and hope that Android will be as open as Google makes it out to be, while keeping a close watch on the potentially hazardous centralized distribution model.

Food For Thought

Since this article is supposed to contain my "thoughts on" the subject, I feel I should also share this little tidbit that keeps rattling around in my head. I'm not drawing any conclusions, just providing the reader with another, incomplete step in my thought process.

Monopoly law exists, in part, to disallow certain practices that are thought to be detrimental to "consumer welfare". From Wikipedia (emphasis added):

Competition law does not make merely having a monopoly illegal, but rather abusing the power that a monopoly may confer, for instance through exclusionary practices.

Update: September 20, 2008

An application named MailWrangler was also barred from the Apple Store for vaguely duplicating the functionality of Mail.app. From Angelo DiNardi's article:

Normally to check multiple Gmail accounts in mobile Safari you would have to log in and out of all of the accounts, typing the username and password for each. Using just the Apple Mail application you aren’t able to see threaded views, your google contacts, archive (quickly), star, etc without going through the hassles that are present when using Gmail’s IMAP on the iPhone.

This is another case of barring an application that offers features for a smaller demographic. I personally can't see why Apple is so "afraid" — let third party apps spring up for specialized features, so long as they don't violate the device's terms of use. If you feel like incorporating those features into Mail.app somewhere down the road, the other applications will die out naturally.

I feel sincere sympathy for Angelo; however, on the desktop Linux side we're at an even greater disadvantage — for us, there isn't even similar functionality available on the iPhone platform. To just sync our music, we have to void our warranties. The only thing we can possibly do without voiding our warranties is write an app with similar functionality to the iTunes music player and acquire it through the Apple Store. Forbidding us from doing this makes legitimate desktop Linux use impossible — for what advantage?

IDE cable termination

I never gave much thought as to how IDE cables are terminated. Recently, I broke an exceptionally small IDE cable that lives in my hard drive enclosure — I can never figure out how to pull IDEs out by the head, and so I always end up yanking on the cable, often detrimentally. :)

In breaking the head of the cable, I found out that this IDE (and I assume this holds for all IDEs) is "vampire tapped", reminding me of 10BASE5 Ethernet technology. Effectively, all 40 of the insulated wire sheaths are pierced by sharp spikes in the terminator. I'm not sure if this vampire tap method also holds for the three-head IDEs (board/master/slave) — I'll have to dismantle one of those in the future. It might be fun to look into IDE arbitration protocol at some point to figure out how those three-head IDE cables work properly. Are they a single bus with three vampire taps, or two separate buses with the middle device acting as an arbiter?

At any rate, it's real hard to get one of these terminators situated right after you knock it out of place. There are holes from the previous termination that you have to place just right. So far as my external enclosure is concerned, it looks like I've gotta find a new cable. :/

Hoarding hard drives

Cleaning out the basement, among a bunch of other junk, I found 6 hard drives (which I thought was a large number of hard drives). For some reason I thought it'd be fun to enumerate them...

  1. IBM Deskstar 75GXP, 46.1GB, 7200rpm

  2. Maxtor DiamondMax VL 30, 23.0GB, 5400rpm

  3. IBM Deskstar 40GV, 20.4GB, 5400rpm

  4. Maxtor DiamondMax 6800, 10.1GB, 5400rpm

  5. Maxtor DiamondMax 2160, 8.4GB, 5400rpm

  6. Western Digital Caviar AC22500, 2.5GB, 5400rpm

The average size of a hard drive in my basement is 18.42GB!