December 13, 2012

Systems programming at my alma mater

Bryan also asked me this at NodeConf last year, where I was chatting with him about the then-in-development IonMonkey:

An old e-mail to the Cornell CS faculty: https://gist.github.com/4278516  Have things changed in the decade since?

I remembered my talk with Bryan when I went to recruit there last year and asked the same interview question that he references — except with the pointer uninitialized so candidates would have to enumerate the possibilities — to see what evidence I could collect. My thoughts on the issue haven't really changed since that chat, so I'll just repeat them here.

(And, although I do not speak for my employer, for any programmers back in Ithaca who think systems programming and stuff like Birman's class is cool beans, my team is hiring both full time and interns in the valley, and I would be delighted if you decided to apply.)

My overarching thought: bring the passion

Many of the people I'm proudest to have hired onto my teams out of undergrad are simply "in love" with systems programming, the way a skilled artisan cares about their craft. They work on personal projects and steer their trajectory towards systems work somewhat independently of the curriculum.

Passion, along with follow-through and the ability to work well with others, seems to be key in the people I've thumbs-up'd over the years. Of course I always want people who do well in the more systems-oriented parts of the curriculum and sit in a solid part of the current-ability curve, but I always have an eye out for the passionately interested ones.

So, I tend to wonder: if an org has a "can systems program" distribution among the candidates, can you predict the existence of the outliers at the career fair from the position of the fat part of that curve?

Anecdotally, two other systems hackers on the JavaScript engine and I came from the same undergrad program, modulo a few years, although we took radically different paths to get to the team. They are among the best and most passionate systems programmers I've ever known, which also pushes me to think passionate interest may be a high-order bit.

Regardless, it's obviously in systems companies' best interest to try to get the most bang per buck on recruiting trips, so you can see how Bryan's point of order is relevant.

My biased take-away from my time there

I graduated less than a decade ago, so I have my own point of reference. From my time there, I got the feeling that the mentality was:

This didn't come from any kind of authority; it's just my attempt to put into words the "this is how things are done around here" understanding I had at the time. All of those rules seemed reasonable in context, though I didn't think I wanted to head down the path they alluded to. And they were, in the end, just rules of thumb: we still had things like a Linux farm used by some courses.

I feel that the "horrible for teaching" problem extends to other important real-world systems considerations as well: I learned MIPS and Alpha [*], presumably due to their clean RISC heritage, but golly do I ever wish I had been taught more about the specifics of x86 systems. And POSIX systems. [†]

Of course that kind of thing — picking a "real-world" ISA or compute platform — can be a tricky play for a curriculum: what do you do about the to-be SUN folks? Perhaps you've taught them all this x86-specific nonsense when they only care about SPARC. How many of the "there-be-dragons" lessons from x86 would cross-apply?

There's a balance between trade and fundamentals, and I feel I was often reminded that I was there to cultivate excellent fundamentals which could later be applied appropriately to the trends of industry and academia.

But seriously, it's just writing C...

For my graduating class, CS undergrad didn't really require writing C. The closest you were forced to get was translating C constructs (like loops and function calls) to MIPS and filling in blanks in existing programs. You note the bijection-looking relationship between C and assembly and can pretty much move on.

I tried to steer my coursework to hit as much interesting systems-level programming as possible. To summarize a path to learning a workable amount of systems programming in my school of yore, in hopes it will translate to something helpful existing today:

I'm not a good alum, in that I've failed to keep up with the goings-on, but if I had a recommendation based on personal experience, it'd be to do stuff like that. Unfortunately, I've also been at companies where the most basic interview question is "how does a vtable actually work" or probes the nuances of C++ exceptions, so for some jobs you may want to take an advanced C++ class as well.
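
(Aside for anyone who hasn't met that vtable question: the usual implementation strategy is a hidden pointer in each object to a per-class table of function pointers, and a virtual call is an indirect call through that table. Here's a minimal C sketch of that shape; the Shape/area names are made up for illustration, and real compiler layouts differ in the details.)

#include <stdio.h>

struct Shape;

struct ShapeVTable {
    double (*area)(const struct Shape *self);
};

struct Shape {
    const struct ShapeVTable *vtable;  /* the hidden "vptr" */
    double width, height;
};

static double rect_area(const struct Shape *self)
{
    return self->width * self->height;
}

static const struct ShapeVTable rect_vtable = { rect_area };

int main(void)
{
    struct Shape r = { &rect_vtable, 3.0, 4.0 };
    /* The moral equivalent of a virtual call: load the vptr, index the
       table, call through the function pointer. */
    printf("area: %f\n", r.vtable->area(&r));
    return 0;
}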

Understanding a NULL pointer deref isn't writing C

Eh, it kind of is. On my recruiting trip, if people didn't get my uninitialized pointer dereference question, I would ask them questions about MMUs if they had taken the computer organization class. Some knew how an MMU worked (of course, some more roughly than others), but didn't realize that OSes had a policy of keeping the null page mapping invalid.

So if you understand an MMU, why don't you know what's going to happen in the NULL pointer deref? Because you've never actually written a C program and screwed it up. Or you haven't written enough assembly with pointer manipulation. If you've actually written a Java program and screwed it up, you might say NullPointerException, but then you remember there are no exceptions in C, so you have to quickly come up with an answer that fits and say zero.

I think another example might help to illustrate the disconnect: the difference between protected mode and user mode is well understood among people who complete an operating systems course, but the conventions associated with them (something like "tell me about init"), or what a "traditional" physical memory space actually looks like, seem to be out of scope without outside interest.

This kind of interview scenario is usually time-to-fluency sensitive: wrapping your head around modern C and sane manual memory management isn't trivial, so it does require some time and experience. Plus, when you're working regularly with footguns, team members want a basic level of trust in coding capability. It's not that you think the person can't do the job; it's just not the right timing if you need to find somebody who can hit the ground running. Bryan also mentions this in his email.

Thankfully for those of us concerned with the placement of the fat part of the distribution, it sounds like Professor Sirer is saying it's been moving even more in the right direction in the time since I've departed. And, for the big reveal, I did find good systems candidates on my trip, and at the same time avoided freezing to death despite going soft in California all these years.

Brain teaser

I'll round this entry off with a little brain teaser for you systems-minded folks: I contend that the following might not segfault.

// ...

int main() {
    mysterious_function();
    A *a = NULL;
    printf("%d\n", a->integer_member);
    return EXIT_SUCCESS;
}

How many reasons can you enumerate as to why? What if we eliminate the call to the mysterious function?

Footnotes

[*]

In an advanced course we had an Alpha 21264 that I came to love deeply.

[†]

I'm hoping there's more emphasis on POSIX these days with the mobile growth and Linux/OS X dominance in that space.

ARM chars are unsigned by default

[Latest from the "I can't believe I'm writing a blog entry about this" department, but the context and surrounding discussion is interesting. --Ed]

If you're like me, or one of the other thousands of concerned parents who has borne C code into this cruel, topsy-turvy, and oftentimes undefined world, you read the C standard aloud to your programs each night. It's comforting to know that K&R are out there, somewhere, watching over them, as visions of Duff's Devices dance in their wee little heads.

The shocking truth

In all probability, you're one of today's lucky bunch who find out that the signedness of the char datatype in C is implementation-defined. The implication being, when you write char, the compiler is implicitly (but consistently) treating it as either signed or unsigned. From the spec: [*]

The three types char, signed char, and unsigned char are collectively called the character types. The implementation shall define char to have the same range, representation, and behavior as either signed char or unsigned char.

...

Irrespective of the choice made, char is a separate type from the other two and is not compatible with either.

—ISO 9899:1999, section "6.2.5 Types"

Why is char distinct from the explicitly-signed variants to begin with? A great discussion of historical portability questions is given here:

Fast forward [to 1993] and you'll find no single "load character from memory and sign extend" in the ARM instruction set. That's why, for performance reasons, every compiler I'm aware of makes the default char type signed on x86, but unsigned on ARM. (A workaround for the GNU GCC compiler is the -fsigned-char parameter, which forces all chars to become signed.)

Portability and the ARM Processor, Trevor Harmon, 2003

It's worth noting, though, that the modern ISA has both LDRB (Load Register Byte) and LDRSB (Load Register Signed Byte), so a sign-extending byte load is available as a single instruction. [†]

So what does this mean in practice? Conventional wisdom is that you use unsigned values when you're bit bashing (although you have to be extra careful bit-bashing types smaller than int due to promotion rules) and signed values when you're doing math, [‡] but now we have this third type, the implicit-signedness char. What's the conventional wisdom on that?

Signedness-un-decorated char is for ASCII text

If you find yourself writing:

char some_char = NUMERIC_VALUE;

You should probably reconsider. When you're clearly doing something numeric, spring for a signed char so the effect of arithmetic expressions is consistent across platforms. The more typical usage is still good:

char some_char = 'a';

For numeric uses, also consider adopting a fixed-width or minimum-width datatype from <stdint.h>. You really don't want to hold the additional complexity of char signedness in your head, as integer promotion rules are already quite tricky.
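
For instance, something like this keeps the intent obvious (a minimal sketch; the variable names are just illustrative):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    char letter = 'a';      /* text: plain char is fine */
    int8_t delta = -3;      /* small signed arithmetic: say so explicitly */
    uint8_t flags = 0x80;   /* bit bashing: explicitly unsigned */

    printf("%c %d %u\n", letter, delta, (unsigned) flags);
    return 0;
}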

Examples to consider

Some of the following mistakes will trigger warnings, but you should realize there's something to be aware of in the warning spew (or a compiler option to consider changing) when you're cross-compiling for ARM.

Example of badness: testing the high bit

Let's say you wanted to see if the high bit were set on a char. If you assume signed chars, this easy-to-write comparison seems legit:

if (some_char < 0)

But if your char type is unsigned that test will never pass.
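
One way to write the test so that it doesn't depend on the implementation's signedness choice (a small sketch, reusing the hypothetical some_char from above):

if (((unsigned char) some_char & 0x80) != 0) {
    /* high bit is set, whether char is signed or unsigned */
}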

Example of badness: comparison to negative numeric literals

You could also make the classic mistake:

char c = getchar(); // Should actually be placed in an int!
while (c != EOF)

With an 8-bit unsigned char datatype and a 32-bit int datatype, the test against EOF never succeeds, so the loop never terminates. Here's the breakdown:

When getchar() returns ((int) -1) to represent EOF, you truncate that value to 0xFF when storing it into the unsigned 8-bit char. Then, when you compare against EOF, that unsigned char value is promoted to a signed int by zero extension (preserving its value), and you get a comparison between 0x000000FF (255 in decimal) and 0xFFFFFFFF (-1 in decimal). For all the values in the unsigned char range, I hope it's clear that this test will never pass. [§]

To make the example a little more obvious we can replace the call to getchar() and the EOF with a numeric -1 literal and the same thing will happen.

char c = -1;
assert(c == -1); // This assertion fails when char is unsigned. Yikes.

That last snippet can be tested by compiling in GCC with -fsigned-char and -funsigned-char if you'd like to see the difference in action.
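
For example, a self-contained version of that snippet might look like the following (the message strings are arbitrary); build it once with gcc -fsigned-char and once with gcc -funsigned-char and watch the assertion's behavior change:

#include <assert.h>
#include <stdio.h>

int main(void)
{
    char c = -1;
    if (c == -1)
        printf("char is behaving as signed here\n");
    else
        printf("char is behaving as unsigned here: c is %d\n", (int) c);
    assert(c == -1);  /* trips only when char is unsigned */
    return 0;
}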

Footnotes

[*]

The spec goes on to say that you can figure out the underlying signedness by checking whether CHAR_MIN from <limits.h> is 0 or SCHAR_MIN. In C++ you could do the <limits>-based std::numeric_limits<char>::is_signed dance.
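
A quick way to do that check at runtime (a tiny sketch):

#include <limits.h>
#include <stdio.h>

int main(void)
{
    printf("char is %s\n", CHAR_MIN == 0 ? "unsigned" : "signed");
    return 0;
}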

[†]

Although the same encodings exist in the Thumb sub-ISA, the ARM sub-ISA encoding for LDRSB lacks a shift capability on the register offset as a result of this historical artifact.

[‡]

Although sometimes the tradeoffs can be more subtle. Scott Meyers discusses the issues quite well, per usual.

[§]

Notably, if you make the same mistake in the signed char case you can breathe easier, because you'll sign extend for the comparison, making the test passable.

Simple, selfish, and unscientific shootout

Disclaimer

I've caught some flak over publishing my "selfish" (read: empirical testing that yields results which are only relevant to me) multi-language-engine-and-standard-library "shootout" (read: I wrote the same basic functionality across multiple languages, somewhat like on the shootout.alioth.debian.org site, the Computer Language Benchmarks Game). I value the concept and process of learning in the open, but it may require more time and consideration of clarity than I had given in this entry. Taking it down would apparently be a breach of etiquette, so please read the following TL;DR as a primer.

TL;DR: I encourage you to personally try writing small utilities against a variety of language engines when you have the opportunity. Consider how much tweaking of the original code you have to do in order to obtain a well-performing implementation. Weigh the development effort and your natural proficiency against the performance, clarity, and safety of the resulting program. Gather evidence and be eager to test your cost assumptions. Commit to learning about sources of overhead and unforeseen characteristics of your libraries. You may be surprised which engines give the best bang per time spent.

It has also been suggested to me that all native languages are within ~3x of one another on generated code performance, and the rest of the difference is generally attributable to the library or algorithm, so that's an interesting rule of thumb to keep in mind.

If you'd like to see how to write a small utility against a variety of language engines, you can check out the Github repo.

Introduction

We tend to throw around "orders of magnitude" when it comes to "programming language speeds", even though we know that the concept of a programming language having a speed for arbitrary programs makes little sense. But, when I'm coding up something small, I find myself pondering a very concrete question: which available language engine (language implementation and libraries) could I reasonably write this little program against that would give the best speed over development effort?

I'm not looking to optimize all the buttery nooks and crannies of this program, nor do I want to drill into potential deficiencies in the I/O subsystem: I just want to make a painless little utility that doesn't require me to go on a lunch break.

XKCD knows what I'm talking about.

Unscientific

I was writing a very simple, single-threaded program to generate about a billion uniformly random int32s in a text file, and I decided I would do a selfish little shootout: write the same program in a set of "viable" languages (remember, this is all about me :-), unscientifically use time(1) on the programs a few times, consider how painful it was to write, and see what the runtimes come out to be.

For 100 million integers on my CentOS Bloomfield box, these were the runtimes for my initial, naive implementations and their lightly tweaked counterparts:

Impl    Naive Runtime    Naive Ratio    Tweaked Runtime    Tweaked Ratio    Engine
.cpp    ~0m 11s          (baseline)     ~0m 15s            (baseline)       GCC 4.4.6 -O3
.java   ~0m 18s          ~1.5x          ~0m 19s            ~1.25x           JDK 1.7.0.04
.go     ~1m 5s           ~6x            ~0m 23s            ~1.5x            go1.0.1
.rs     ~1m 7s           ~6x            ~0m 23s            ~1.5x            rustc -O3 0.2 (trunk)
.ml     ~0m 37s          ~3.3x          ~0m 35s            ~2.5x            ocamlopt 3.11.2
.py     ~1m 6s           ~6x            ~0m 51s            ~3.5x            PyPy 1.9.1 (nightly)
.lua    ~1m 36s          ~9x            ~0m 27s (FFI)      ~1.8x            LuaJIT 2.0.0-beta10 (trunk)
.rb     ~1m 50s          ~10x           -                  -                ruby 2.0.0 (trunk)

Like all developers, I have varied levels of expertise across languages and their standard libraries; but, as I said, this is a selfish shootout, so my competence in a given language is considered part of the baseline. You'll see in the comments that many readers identified performance bugs in these code samples.

There are also caveats for the random numbers I was generating in OCaml (due to tag bit stealing).

For a billion integers the naive C++0x version took 1m 42s and the naive Java version took 2m 18s (about 1.35x slower). I didn't want to spend the time running the slower implementations on an input an order of magnitude larger.

As a result — with perpetual intent to improve my abilities in all engines I work with, willful ignorance of the reasoning, acknowledgement that I need to perform more experiments like this to draw a more reasonable conclusion over time, and malice aforethought — I'll hereby declare myself guilty of leaning a bit more towards writing things like this in C++ when I want better runtimes in the giga range for little IO-and-compute programs.

Show me the code!

I threw the code up on github, but the versions that I wrote naively (before optimization suggestions) are duplicated here for convenience.

C++0x:

#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <ctime>
#include <random>

int
main(int argc, char **argv)
{
    if (2 != argc) {
        fprintf(stderr, "Usage: %s <elem_count>\n", argv[0]);
        return EXIT_FAILURE;
    }

    long long count = atoll(argv[1]);

    std::mt19937 rng;
    rng.seed(time(NULL));

    std::uniform_int<int32_t> dist;

    FILE *file = fopen("vec_gen.out", "w");
    if (NULL == file) {
        perror("could not open vector file for writing");
        return EXIT_FAILURE;
    }

    for (long long i = 0; i < count; ++i) {
        int32_t r = dist(rng);
        fprintf(file, "%d\n", r);
    }

    fclose(file);
    return EXIT_SUCCESS;
}

Java:

import java.io.BufferedWriter;
import java.io.FileWriter;
import java.util.Random;
import java.io.IOException;

class VecGen
{
    static final int INT_MAX = 2147483647;

    public static void main(String args[]) {
        if (args.length != 1) {
            System.err.println("Usage: VecGen <elem_count>");
            System.exit(-1);
        }

        int count = Integer.parseInt(args[0]);

        try {
            FileWriter fw = new FileWriter("vec_gen.out");
            BufferedWriter bw = new BufferedWriter(fw);

            Random rng = new Random();

            for (int i = 0; i < count; ++i) {
                int r = rng.nextInt(INT_MAX);
                bw.write(r + "\n");
            }

            bw.close();
        } catch (IOException e) {
            System.err.println("Received I/O exception: " + e);
            System.exit(-2);
        }
    }
};

Python:

import random
import sys

def main():
    if len(sys.argv) != 2:
        print >> sys.stderr, "Usage: %s <elem_count>" % sys.argv[0]
        exit(-1)

    count = int(sys.argv[1])
    random.seed()

    with open('vec_gen.out', 'w') as file:
        for i in xrange(count):
            r = random.getrandbits(31)
            print >> file, r

if __name__ == '__main__':
    main()

OCaml:

if (Array.length Sys.argv) <> 2 then (
  let msg = "Usage: " ^ (Array.get Sys.argv 0) ^ " <elem_count>\n" in
  prerr_string msg;
  exit (-1)
) else (
  let count = int_of_string (Array.get Sys.argv 1) in
  let file = open_out "vec_gen.out" in
  let rec write_rand_line n =
    if n = 0 then ()
    else
      let r = Random.bits () in
      output_string file ((string_of_int r) ^ "\n");
      write_rand_line (n - 1)
  in
  write_rand_line count;
  exit 0
)

Lua:

function main(args)
    if #args ~= 1 then
        io.stderr:write("Usage: " .. args[0] .. " <elem_count>\n")
        os.exit(-1)
    end

    local count = tonumber(args[1])

    math.randomseed(os.time())

    local upper = math.floor(2^31 - 1)

    io.output(io.open("vec_gen.out", "w"))
    for i = 1,count do
        local r = math.random(0, upper)
        io.write(r)
        io.write("\n")
    end

    io.close()
end

main(arg)

Rust:

import io::writer_util;

fn main(args: [str]) {
    if vec::len(args) != 2u {
        let usage = #fmt("Usage: %s <elem_count>\n", args[0]);
        io::stderr().write_str(usage);
        os::set_exit_status(-1);
        ret;
    }

    let count = option::get(int::from_str(args[1]));
    let rng = rand::seeded_rng(rand::seed());

    let fw = result::get(io::buffered_file_writer("vec_gen.out"));
    let mut i = 0;
    while i < count {
        let r = rng.next() & (0x7fffffffu as u32);
        fw.write_line(int::to_str(r as int, 10u));
        i += 1;
    }
}

Go:

package main

import (
    "bufio"
    "fmt"
    "log"
    "math/rand"
    "os"
    "strconv"
)

func main() {
    if len(os.Args) != 2 {
        fmt.Fprintf(os.Stderr, "usage: %s <elem_count>\n", os.Args[0])
        os.Exit(-1)
    }

    count, err := strconv.Atoi(os.Args[1])
    if err != nil { panic(err) }

    file, err := os.Create("vec_gen.out")
    if err != nil { panic(err) }
    defer file.Close()

    bw := bufio.NewWriter(file)
    defer func() {
        err := bw.Flush()
        if err != nil { log.Fatal(err) }
    }()

    for i := 0; i < count; i++ {
        r := rand.Int31()
        fmt.Fprintf(bw, "%d\n", r)
    }
}

Ruby:

def main()
    if ARGV.length != 1 then
        warn "Usage: #{$0} <elem_count>"
        exit -1
    end

    count = ARGV[0].to_i
    file = File.open "vec_gen.out", "w"
    upper = 2147483647

    for i in 1..count do
        r = rand(upper)
        file.write r
        file.write "\n"
    end
end

main()

Updates

2012-06-03 2100

Reflect Makefile switch to the OCaml native compiler; I was previously using the bytecode compiler.

2012-06-04 1500

Add Ruby, because I seem to remember enough of it.

2012-06-04 1930

Update Rust numbers per Graydon's comment. The code under test remained unchanged for the entry's inline results table.

Using C89 in 2012 isn't crazy

The first group I worked with in industry wrote the compiler in C and made fun of C++ on a regular basis. The second group I worked with in industry wrote the compiler in C++ and made fun of C on occasion. Like most systems programmers I've met, they were a lovable but snarky bunch!

In any case, I've seen life on both sides of the fence, and there's really simple reasoning that dictates what you choose from C89, GNU C, C99, C++98, or C++11 in the year 2012 AD: you target the most modern dialect that every toolchain you care about, yours and your customers', can actually compile.

If this sounds simple, you're lucky!

Life gets a little bit more interesting when the match is fuzzy: you could make a strategic gamble and (at least initially) ignore parts of your "maximal" target market to gain some productivity. If you're under the gun, that may be the right way to go.

But then again, keeping your options open is also important. The wider the target market, the more people you can give an immediate "yes" to. I have to imagine that phone calls like this can be important:

[A sunny afternoon in a South Bay office park. Just outside, a white Prius merges three lanes without activating a blinker. Suddenly, the phone rings.]

Nice to hear from you, Bigbucks McWindfall! What's that? You say you want my code to run as an exokernel on an in-house embedded platform with an in-house C89 toolchain? No problem! We'll send a guy to your office to compile our product and run tests tomorrow morning.

Suffice it to say that there are legitimate considerations. Consider that GCC isn't everywhere (though I love how prevalent it is these days!) and it certainly doesn't generate the best code on every platform for every workload. Consider that MSVC can only compile C89 as "real" C (as opposed to a C++ subset). Consider that the folks out there who have custom toolchains probably have them because they can afford them.

There are benefits to taking a dependency on a lowest common denominator.
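
Concretely, staying inside that common denominator mostly costs you small stylistic things: declarations at the top of a block, /* */ comments only, and no C99-only headers or features. A contrived sketch of the flavor:

/* C89-friendly: declarations before statements, block comments only. */
static long sum_of_squares(const long *values, unsigned long count)
{
    unsigned long i;
    long total;

    total = 0;
    for (i = 0; i < count; i++) {
        total += values[i] * values[i];
    }
    return total;
}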

Accomplish your new year's resolution of being more badass

I know what you're going through — I've done it all before.

The market is teeming with products that purport to help you meet your badassery quota.

First you do the shakes. Then, you go with the bars that say they're infused with the good stuff, but just seem to leave a slightly corrugated taste in your mouth. Finally, you're droppin' hard-earned dinero on a facility that you don't need or a professional badassery trainer whose appointments you desperately wish you could cancel.

But I'm here to tell you that you don't need to shell out cash-money to become more badass, my friends. Not anymore, thanks to the beauty of open source, the ES6 plans wiki page, and our delightful SpiderMonkey technical staff who are standing by to receive your calls for mentorship.

Allow me to explain.

Badass begets badass

You may have seen Badass JavaScript: self described as, "A showcase of awesome JavaScript code that pushes the boundaries of what's possible on the web." Check out their badass year in review if you haven't. (Some of the stuff that the interwebs has done with JavaScript even has @NeckbeardHacker envious.)

It probably won't surprise you, but do you know what those folks love? JavaScript evolution. There's nothing quite so refreshing as a fresh-squeezed JS language feature that removes that irritating itching-burning sensation. Sets that do what you mean! String repetition that you can use without copy/pasting from your last project! Inverse hyperbolic sine that you can use for... your... math! (Again, all of this is in the ES6 plans.)

I, for example, have wanted String.prototype.startsWith very badly, to the point that I've started washing people's window panes against their will as they exit highway 101. Around here, odds are that a programmer sees my sign and implements the thing just to stop me from bothering them again. (A little tactic that I call SpiderGuerilla warfare.)

Me, holding my SpiderGuerilla sign.

So what are you waiting for?

I know, you're probably already quite beefcake, but here's my three step plan:

  1. Watch the SpiderMonkey hacking intro.

  2. Pick out a bug from the ES6 plans.

  3. Come talk to great people on irc.mozilla.org in channel #jsapi (for example, cdleary, jorendorff, luke, or whoever else is in there) or comment in the bug — tell them that you're on a quest to become even more badass, describe a bug that you're interested in taking, and give a quick note on what you've done with the engine so far — for example, walking through the video in step 1! We'll find you a mentor who will get you started on the right track.

Don't miss out on this exclusive offer — SpiderMonkey contribution is not sold in stores.

In fact, if you act now, we'll throw in an IonMonkey shirt (or another Firefox shirt of equivalent awesomeness) and publish a blurb about your feature in Mozilla hacks. Of course, you can also have yourself added to about:credits, providing that's what you are into.

IonMonkey shirt.

This one-of-a-kind offer is too ridonk to pass up. Just listen to this testimonial from one of our badass contributors:

I started contributing to SpiderMonkey and now I can write a JIT compiler from scratch in a matter of days. BEEFCAKE!

@evilpies [Liberally paraphrased]

See you in the tubes!