June 3, 2012

Simple, selfish, and unscientific shootout

Disclaimer

I've caught some flak over publishing my "selfish" (read: empirical testing that yields results which are only relevant to me) multi-language-engine-and-standard-library "shootout" (read: I wrote the same basic functionality across multiple languages, somewhat like on the shootout.alioth.debian.org site, the Computer Language Benchmarks Game). I value the concept and process of learning in the open, but it may require more time and consideration of clarity than I had given in this entry. Taking it down would apparently be a breach of etiquette, so please read the following TL;DR as a primer.

TL;DR: I encourage you to personally try writing small utilities against a variety of language engines when you have the opportunity. Consider how much tweaking of the original code you have to do in order to obtain a well-performing implementation. Weigh the development effort and your natural proficiency against the performance, clarity, and safety of the resulting program. Gather evidence and be eager to test your cost assumptions. Commit to learning about sources of overhead and unforeseen characteristics of your libraries. You may be surprised which engines give the best bang per time spent.

It has also been suggested to me that all native languages are within ~3x of one another on generated code performance, and the rest of the difference is generally attributable to the library or algorithm, so that's an interesting rule of thumb to keep in mind.

If you'd like to see how to write a small utility against a variety of language engines, you can check out the Github repo.

Introduction

We tend to throw around "orders of magnitude" when it comes to "programming language speeds", even though we know that the concept of a programming language having a speed for arbitrary programs makes little sense. But, when I'm coding up something small, I find myself pondering a very concrete question: which available language engine (language implementation and libraries) could I reasonably write this little program against that would give the best speed over development effort?

I'm not looking to optimize all the buttery nooks and crannies of this program, nor do I want to drill into potential deficiencies in the I/O subsystem: I just want to make a painless little utility that doesn't require me to go on a lunch break.

XKCD knows what I'm talking about:

Unscientific

I was writing a very simple, single-threaded program to generate about a billion uniformly random int32s in a text file, and I decided I would do a selfish little shootout: write the same program in a set of "viable" languages (remember, this is all about me :-), unscientifically use time(1) on the programs a few times, consider how painful it was to write, and see what the runtimes come out to be.

For 100 million integers on my CentOS Bloomfield box, these were the runtimes for my initial, naive implementations and their lightly tweaked counterparts:

Impl

Naive Runtime

Naive Ratio

Tweaked Runtime

Tweaked Ratio

Engine

.cpp

~0m 11s

~0m 15s

GCC 4.4.6 -O3

.java

~0m 18s

~1.5x

~0m 19s

~1.25x

JDK 1.7.0.04

.go

~1m 5s

~6x

~0m 23s

~1.5x

go1.0.1

.rs

~1m 7s

~6x

~0m 23s

~1.5x

rustc -O3 0.2 (trunk)

.ml

~0m 37s

~3.3x

~0m 35s

~2.5x

ocamlopt 3.11.2

.py

~1m 6s

~6x

~0m 51s

~3.5x

PyPy 1.9.1 (nightly)

.lua

~1m 36s

~9x

~0m 27s (FFI)

~1.8x

LuaJIT 2.0.0-beta10 (trunk)

.rb

~1m 50s

~10x

ruby 2.0.0 (trunk)

Like all developers, I have varied levels of expertise across languages and their standard libraries; but, as I said, this is a selfish shootout, so my competence in a given language is considered part of the baseline. You'll see in the comments that many readers identified performance bugs in these code samples.

There are also caveats for the random numbers I was generating in OCaml (due to tag bit stealing).

For a billion integers the naive C++0x version took 1m 42s and the naive Java version took 2m 18s (1.35x slower). I didn't want to spend the time to slow down the others by an order of magnitude.

As a result — with perpetual intent to improve my abilities in all engines I work with, willful ignorance of the reasoning, acknowledgement that I need to perform more experiments like this to draw a more reasonable conclusion over time, and malice aforethought — I'll hereby declare myself guilty of leaning a bit more towards writing things like this in C++ when I want better runtimes in the giga range for little IO-and-compute programs.

Show me the code!

I threw the code up on github, but the versions that I wrote naively (before optimization suggestions) are duplicated here for convenience.

C++0x:

#include <cstdlib>
#include <cstdio>
#include <random>

int
main(int argc, char **argv)
{
    if (2 != argc) {
        fprintf(stderr, "Usage: %s <elem_count>\n", argv[0]);
        return EXIT_FAILURE;
    }

    long long count = atoll(argv[1]);

    std::mt19937 rng;
    rng.seed(time(NULL));

    std::uniform_int<int32_t> dist;

    FILE *file = fopen("vec_gen.out", "w");
    if (NULL == file) {
        perror("could not open vector file for writing");
        return EXIT_FAILURE;
    }

    for (long long i = 0; i < count; ++i) {
        int32_t r = dist(rng);
        fprintf(file, "%d\n", r);
    }

    fclose(file);
    return EXIT_SUCCESS;
}

Java:

import java.io.BufferedWriter;
import java.io.FileWriter;
import java.util.Random;
import java.io.IOException;

class VecGen
{
    static final int INT_MAX = 2147483647;

    public static void main(String args[]) {
        if (args.length != 1) {
            System.err.println("Usage: VecGen <elem_count>");
            System.exit(-1);
        }

        int count = Integer.parseInt(args[0]);

        try {
            FileWriter fw = new FileWriter("vec_gen.out");
            BufferedWriter bw = new BufferedWriter(fw);

            Random rng = new Random();

            for (int i = 0; i < count; ++i) {
                int r = rng.nextInt(INT_MAX);
                bw.write(r + "\n");
            }

            bw.close();
        } catch (IOException e) {
            System.err.println("Received I/O exception: " + e);
            System.exit(-2);
        }
    }
};

Python:

import random
import sys

def main():
    if len(sys.argv) != 2:
        print >> sys.stderr, "Usage: %s <elem_count>" % sys.argv[0]
        exit(-1)

    count = int(sys.argv[1])
    random.seed()

    with open('vec_gen.out', 'w') as file:
        for i in xrange(count):
            r = random.getrandbits(31)
            print >> file, r

if __name__ == '__main__':
    main()

OCaml:

if (Array.length Sys.argv) <> 2 then (
  let msg = "Usage: " ^ (Array.get Sys.argv 0) ^ " <elem_count>\n" in
  prerr_string msg;
  exit (-1)
) else (
  let count = int_of_string (Array.get Sys.argv 1) in
  let file = open_out "vec_gen.out" in
  let rec write_rand_line n =
    if n = 0 then ()
    else
      let r = Random.bits () in
      output_string file ((string_of_int r) ^ "\n");
      write_rand_line (n - 1)
  in
  write_rand_line count;
  exit 0
)

Lua:

function main(args)
    if #args ~= 1 then
        io.stderr:write("Usage: " .. args[0] .. " <elem_count>\n")
        os.exit(-1)
    end

    local count = tonumber(args[1])

    math.randomseed(os.time())

    local upper = math.floor(2^31 - 1)

    io.output(io.open("vec_gen.out", "w"))
    for i = 1,count do
        local r = math.random(0, upper)
        io.write(r)
        io.write("\n")
    end

    io.close()
end

main(arg)

Rust:

import io::writer_util;

fn main(args: [str]) {
    if vec::len(args) != 2u {
        let usage = #fmt("Usage: %s <elem_count>\n", args[0]);
        io::stderr().write_str(usage);
        os::set_exit_status(-1);
        ret;
    }

    let count = option::get(int::from_str(args[1]));
    let rng = rand::seeded_rng(rand::seed());

    let fw = result::get(io::buffered_file_writer("vec_gen.out"));
    let mut i = 0;
    while i < count {
        let r = rng.next() & (0x7fffffffu as u32);
        fw.write_line(int::to_str(r as int, 10u));
        i += 1;
    }
}

Go:

package main

import (
    "bufio"
    "fmt"
    "log"
    "math/rand"
    "os"
    "strconv"
)

func main() {
    if len(os.Args) != 2 {
        fmt.Fprintf(os.Stderr, "usage: %s <elem_count>\n", os.Args[0])
        os.Exit(-1)
    }

    count, err := strconv.Atoi(os.Args[1])
    if err != nil { panic(err) }

    file, err := os.Create("vec_gen.out")
    if err != nil { panic(err) }
    defer file.Close()

    bw := bufio.NewWriter(file)
    defer func() {
        err := bw.Flush()
        if err != nil { log.Fatal(err) }
    }()

    for i := 0; i < count; i++ {
        r := rand.Int31()
        fmt.Fprintf(bw, "%d\n", r)
    }
}

Ruby:

def main()
    if ARGV.length != 1 then
        warn "Usage: #{$0} <elem_count>"
        exit -1
    end

    count = ARGV[0].to_i
    file = File.open "vec_gen.out", "w"
    upper = 2147483647

    for i in 1..count do
        r = rand(upper)
        file.write r
        file.write "\n"
    end
end

main()

Updates

2012-06-03 2100

Reflect Makefile switch to ocaml native compiler, I was using the bytecode compiler.

2012-06-04 1500

Add Ruby, because I seem to remember enough of it.

2012-06-04 1930

Update Rust numbers per Graydon's comment. The code under test remained unchanged for the entry's inline results table.