'C' category

Newer entries »

Bit twiddling: Simple O(1) membership test

Disclaimer

Bit twiddling is fun. Plus, it has several advantages:

  • It gives you a greater understanding of the nuances of the (C) language. As we all know, learning C is important. If you're a computer engineer and have weak C-fu, then it is time, grasshopper.

  • Bit twiddling is important to know if you're ever going to write software in the field of computer engineering. Without a good understanding of how to slice-and-dice primitive data types, your simulations will be significantly more prone to error. Trust me, you don't want to use gdb for hours only to find that your shift value was off by one. Not that I've ever done that.

  • It makes you feel smart, like when you're the only person who knows the secret to your party trick. I don't know what that's like personally, of course, because I spend all my party time bit bashing, but I hear it's similar.

You have to understand, though, that clever tricks without appropriate documentation will make people want to break your face. [*] Always bit bash responsibly: appoint a designated code-reader to make sure you're clear enough, and leave your keys at the door.

The Problem

Let's say you wanted to know whether a number was a valid PCI Express link width in terms of number of lanes. We know that valid widths are x1, x2, x4, x8, x12, x16, or x32, and want to construct a function of the following form:

#include <stdbool.h>
#include <stdint.h>
#include <assert.h>

/**
 * :return: Whether the lane count is valid.
 */
bool is_valid_link_width(uint8_t lane_count);

/**
 * Unit test for ``is_valid_link_width``.
 */
int main(int argc, char **argv) {
    assert(!is_valid_link_width(0));
    assert(is_valid_link_width(1));
    assert(is_valid_link_width(2));
    assert(!is_valid_link_width(3));
    assert(is_valid_link_width(32));
    assert(!is_valid_link_width(33));
    assert(!is_valid_link_width(33));
    assert(!is_valid_link_width(0xff));
    return 0;
}

Note that the uint8_t has a width of exactly 8 bits. [†]

How would you write it?

Less Interesting Solution

If you were thinking switch statement, that will work. You could use a switch statement with intentional fall-throughs and hope that the compiler optimizes a branch table for you. (For values this small and dense it probably will, as mentioned in the referenced article.) If the compiler doesn't write the branch table for you, but instead generates the equivalent of a big if/else if ladder, your solution doesn't satisfy the O(1) constraint: in that case, the worst case control flow hits every rung of the ladder (the else if guards), making it O(n).

bool is_valid_link_width(uint8_t lane_count) {
    switch (lane_count) {
    case 1:
    case 2:
    case 4:
    case 8:
    case 12:
    case 16:
    case 32:
        return true;
    }
    return false;
}

An implementation that I like better, which doesn't put as much faith in the compiler, is as follows:

bool is_valid_link_width(uint8_t lane_count) {
    return 0x100011116ULL & (1ULL << lane_count);
}

How cool is that?

The Neat Trick

The clever insight here is that we can encode all of our target "true" values in binary form, like so:

       32                      16    12    8     4    1
0b__0001__0000__0000__0000__0001__0001__0001__0001__0110

Now, if we were to take a 1 value and move it over a number of binary slots equal to the lane count, it will line up with a 1 value in this long binary number we've constructed. Take the bitwise-AND of those two values, and we wind up with:

  • A non-zero (true) value if they line up, OR

  • A zero value if they don't

This is exactly what we were looking for.

This long binary number we've created must be converted from binary into a hexadecimal value, so that we can represent it as an integer literal in our C program. Encoding each binary 4-tuple into hex from right to left, we get the value 0x100011116.

There's an issue with this value, however. Unless we specify a suffix for our integer literal, the compiler is allowed to truncate the value to its native word size, [‡] which would cause serious problems. For x86 systems with 16 bit words, our value could be truncated to 0x1116, which would only allow lane sizes of 1, 2, 4, 8, and 12 — the allowed values of 16 and 32 would be cut off!

To solve this, as you can see in the function definition, we add the ULL integer suffix, which explicitly marks the integer literal as an unsigned long long. (The long long integer data type was added to the C language in the C99 standard.) This data type is required to be at least 64 bits wide, so it can definitely hold our 33 relevant bits (32 plus the zero bit which is there for the 1ULL << 0 case). The long data type is too small to hold this value, as long can potentially be only 32 bits wide per the C standard (and it is 32 bits wide on most modern machines).

Readability Counts

Note that there's a more readable version of the same trick in the following:

bool is_valid_link_width(uint8_t lane_count) {
    const uint64_t set =
        1ULL << 1
        | 1ULL << 2
        | 1ULL << 4
        | 1ULL << 8
        | 1ULL << 12
        | 1ULL << 16
        | 1ULL << 32;
    return set & (1ULL << lane_count);
}

Here we make the construction of the big integer more explicit and make the code less prone to our errors in encoding the literal binary value into hex. Any compiler worth its salt will fold out the const calculation at compile time, so no overhead will be incurred for writing it this way.

I demonstrated the other way of doing it first to: a) blow your mind a little bit, and b) demonstrate an idiom you might see in other people's (overly) clever code. Now there's a chance you can recognize and decipher it without adequate documentation. Huzzah!

Footnotes

[*]

A great rift in the universe known as "Other People's Perl" has been the cause of 80% of the computer-engineering face breakage since 1987. Don't let this tragedy happen to you or your beloved programmers.

[†]

I like using fixed-width integer types, especially in describing problems, because they helpfully constrain the possibilities on the input domain. This is even more important for newcomers who are just wrapping their heads around bit twiddling.

[‡]

I can't find standards documents/discussions to support this claim, but it's definitely what I was taught. Can anybody provide evidence to confirm/deny?

procfs and preload

Two of the cool utilities that I've checked out lately have centered around /proc. /proc is a virtual filesystem mountpoint — the filesystem entities are generated on the fly by the kernel. The filesystem entities provide information about the kernel state and, consequently, the currently running processes. [*]

The utilities are preload and powertop. Both are written in C, though I think that either of them could be written more clearly in Python.

preload

Preload's premise is fascinating. Each shared library that a running process is using via MMIO can be queried via /proc/[pid]/maps, which contains entries of the form:

[vm_start_addr]-[vm_end_addr] [perms] [file_offset] [device_major_id]:[device_minor_id] [inode_num] [file_path]

Preload uses a Markov chain to decide which shared library pages to "pre-load" into the page cache by reading and analyzing these maps over time. Preload's primary goal was to reduce login times by pre-emptively warming up a cold page cache, which it was successful in doing. The catch is that running preload was shown to decrease performance once the cache was warmed up, indicating that it may have just gotten in the way of the native Linux page cache prefetch algorithm. [†]

There are a few other things in /proc that preload uses, like /proc/meminfo, but querying the maps is the meat and potatoes. I was thinking of porting it to Python so that I could understand the structure of the program better, but the fact that the daemon caused a performance decrease over a warm cache turned me off the idea.

References

Footnotes

[*]

A cool side note — all files in /proc have a file size of 0 except kcore and self.

[†]

The page_cache_readahead() function in the Linux kernel.

Two postfix operations in a single statement in GCC

#include <stdio.h>

int z = 11;

int main()
{
    printf("%d\n", ((z++) * (z++)));
    printf("%d\n", z);
    return 0;
}
$ gcc -o postfix_test.o postfix_test.c; ./postfix_test.o
121
12

Surprised? I sure was. It looks like gcc interprets two postfix operations in a single statement as a single postfix increment request. I guess this makes sense if you consider the postfix operator to mean, "Wait for this statement to complete, then have the variable increment." Assuming this specification, the second time that you postfix-increment the compiler says, "Yeah, I’m already going to have the variable increment when the statement completes — no need to tell me again."

On the other hand, prefix increment does work twice in the same statement. Maybe this is a decision that’s left up to the compiler? It’s not specified in K&R as far as I can see, but I haven’t checked any of the ANSI specifications.

Updates

2007/09/26 Here’s what Java has to say!

class DoublePostfixTester
{
    public static void main(String[] args)
    {
        int z = 11;
        System.out.println(((z++) * (z++)));
        System.out.println(z);
    }
}
$ javac DoublePostfixTester.java
$ java DoublePostfixTester
132
13

Which is what I would have expected in the first place. Bravo, Java — we’re more alike than I thought.

Matching _t types in your .vimrc

Background

I find myself constantly reproducing my .vimrc file. It's most frequently because I'm migrating from system to system; however, I sometimes just lose it during a reformat (or forget to rsync with the -a flag).

One part of Vim that I'm not fond of is its regex. It takes the one thing I like about Perl (the ecumenical regex syntax) and throws it out the window. As a result, I usually write hackish regexes to highlight my type_t cTypes on the fly, which never highlight quite what I want them to.

Evolution

One example is a regex I found doing a Google search for "vim match _t", which, admittedly, doesn't return much. The most relevant hit suggests the following:

syntax match cType /[^ (]*_t[ )]/ " very wrong

This suggestion is pretty bad — it doesn't match cow_t in any of the following, as examples:

typedef struct Cow cow_t;
cow_t* my_cow;
cow_t my_cow;

At first I thought the correct regex was the following, which matches all of the above:

syntax match cType /\w\+_t\W\{-}/ " also wrong

It's annoying Vim regex doesn't have the standard operators (like +) without the escape, and that there's that awkward match on the last atom (W, or non-word character) to drop it with a special funky-looking-dealie. I believe the Perl equivalent is the equally unintuitive ?? postfix, but it has the clear advantage of being the de-facto standard.

The above faultily matches on things like the following, however: cow_tip(); This indicates that we need to match on the previous portion in all cases, except where there's a word character following. For this, we use the following, correct, construct:

syntax match cType /\w\+_t\w\@!/ " CORRECT!

I couldn't have figured it out without this handy reference, as well as the more extensive Vim documentation for fixing my original error by using clown-hat looking constructs.

Efficiency/Readability Fix (July 30, 2008)

My friend Trevor Caira pointed out the existence of the \zs and \ze atoms. These resize the match to the specified start and end (respectively), without using clown-hat trickery. <@:)

This makes the regex look a great deal more straightforward:

syntax match cType /\w\+_t\ze\W/