April 4, 2008

SML is improving, though benchmarks are getting worse

In the following, jenkinsInt is a function that uses no references — it takes an integer, calculates the Jenkins One-at-a-time hash code on a number of bytes corresponding to its width, and returns that value as an integer. Pretty simple, all in all.

Standard ML of New Jersey v110.54 [built: Sat Sep 24 16:19:34 2005]
val jenkinsInt = fn : int -> int
- jenkinsInt(4);
jenkinsInt: 4
Segmentation fault (core dumped)

Segfault!? Clearly this is a failure of the runtime, so I turned to version <tt>.62</tt>, which is known to be slower:

Standard ML of New Jersey v110.62 [built: Wed May  9 18:06:38 2007]
val jenkinsInt = fn : int -> int
- jenkinsInt(4);
jenkinsInt: 4
uncaught exception Overflow [overflow]
raised at: file stdin

At least we get a nice error message in the newer version. It was a lot easier to fix the error once I knew what was really happening.

For kicks, here's my (fixed) implementation. I don't guarantee its correctness — I wrote it hastily for a class project.

structure Hashes = struct
  val () = if Word.wordSize <> Option.valOf(Int.precision)
           then print("WARNING: unexpected int size\n")
           else ()

  val maxIntInverse = if Int.maxInt = NONE then ~1
                      else ~1 * Option.valOf(Int.maxInt)

  (* I'm not sure how much optimization the compiler does, so I'm precomputing
   * these values -- they're quite frequently used if a lot of hashing is being
   * done. *)
  val zero = Word.fromInt(0)
  val three = Word.fromInt(3)
  val six = Word.fromInt(6)
  val eight = Word.fromInt(8)
  val ten = Word.fromInt(10)
  val eleven = Word.fromInt(11)
  val fifteen = Word.fromInt(15)
  val sixteen = Word.fromInt(16)
  val twenty_four = Word.fromInt(24)
  val two_hundred_fifty_five = Word.fromInt(255)

  fun jenkinsByte(key:Word.word, hash:Word.word):Word.word =
      val hash = hash + key
      val hash = hash + Word.<<(hash, ten)
      val hash = Word.xorb(hash, Word.>>(hash, six))

  fun jenkinsChar(key:char, hash:Word.word):Word.word =
    let val key = Word.fromInt(Char.ord(key)) in
      jenkinsByte(key, hash)

  fun jenkinsFinish(hash:Word.word):int =
      val hash = hash + Word.<<(hash, three)
      val hash = Word.xorb(hash, Word.>>(hash, eleven))
      val hash = hash + Word.<<(hash, fifteen)
      val result = Word.toIntX(hash)
      val result = if result < maxIntInverse then maxIntInverse else result

  fun jenkinsChars(cs:char list):int =
      val hash = zero
      val hash = List.foldl(jenkinsChar)(hash)(cs)
      val result = Word.toIntX(hash)
      val result = if result < maxIntInverse then maxIntInverse else result
      val absResult = Int.abs(result)

  fun jenkinsInt(i:int):int =
      val w = Word.fromInt(i)
      val b1 = Word.andb(w, two_hundred_fifty_five)
      val b2 = Word.>>(w, eight)
      val b3 = Word.>>(w, sixteen)
      val b4 = Word.>>(w, twenty_four)
      val hash = List.foldl(jenkinsByte)(zero)([b1, b2, b3, b4])

  fun jenkinsString(s:string):int = jenkinsChars(String.explode(s))