February 13, 2008

State of the X-Window

I just watched a presentation on X.org's history, architectural overview, and recent developments, entitled State of the X-Window. If you use an operating system that runs X, I recommend you check it out.

The thing that I found most personally relevant is that there is currently a bug which prevents framebuffer re-allocation. As a result, the framebuffer size is fixed until the X-server is restarted!

This has always been a pain for me, since I can hotplug my external monitor into my laptop flawlessly, but can't switch out of clone to dual head without restarting X (since that would require a framebuffer size increase). The server developer promises that it'll be fixed in the next few months — I'll certainly be excited to have that working!

Python's generators sure are handy

While rewriting some older code today, I ran across a good example of the clarity inherent in Python's generator expressions. Some time ago, I had written this weirdo construct:

for regex in date_regexes:
    match = regex.search(line)
    if match:
        break
else:
    return
# ... do stuff with the match

The syntax highlighting makes the problem fairly obvious: there's way too much syntax!

First of all, I used the semi-obscure "for-else" construct. For those of you who don't read the Python grammar for fun (it lives in the language reference under the for statement), a quick definition may be useful:

So long as the for loop isn't (prematurely) terminated by a break statement, the code in the else suite gets evaluated. Put the other way around: if the loop does hit a break, the else suite is skipped. Applied to my snippet, that means the early return only happens when none of the regexes matched.
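
In its stripped-down form (with a contrived list of odd numbers), the construct looks like this:

for n in [1, 3, 5]:
    if n % 2 == 0:
        break
else:
    print("no even number found")   # reached only because the loop never hit break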

That's way too much stuff to think about. Generators come to the rescue!

def first(iterable):
    """:return: The first item in the iterable that evaluates
    as True.
    """
    for item in iterable:
        if item:
            return item
    return None

match = first(regex.search(line) for regex in date_regexes)
if not match:
    return
# ... do stuff with the match

At a glance, this is much shorter and more comprehensible. We pass a generator expression to the first function, which performs a kind of short-circuit evaluation — as soon as a match is found, we stop running regexes (which can be expensive). This is a pretty rockin' solution, so far as I can tell.
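
To make that laziness concrete, here's a throwaway experiment (the line, the regexes, and the counted_search wrapper are all made up for illustration; first is the helper defined above) that counts how many searches actually execute:

import re

line = "logged at 2008-02-13 10:32:00"
date_regexes = [
    re.compile(r"\d{2}/\d{2}/\d{4}"),    # no match in this line
    re.compile(r"\d{4}-\d{2}-\d{2}"),    # matches, so evaluation stops here
    re.compile(r"\d{2}\.\d{2}\.\d{4}"),  # never even attempted
]

attempted = []

def counted_search(regex, line):
    attempted.append(regex.pattern)
    return regex.search(line)

match = first(counted_search(regex, line) for regex in date_regexes)
print(match.group(0))    # 2008-02-13
print(len(attempted))    # 2 -- the third regex never ran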

Prior to generator expressions, to do something similar to this we'd have to use a list comprehension, like so:

match = first([regex.search(line) for regex in date_regexes])
if not match:
    return
# ... do stuff with the match

We dislike this because the list comprehension will run all of the regexes, even if an earlier one already found a match. What we really want is the short-circuit evaluation provided by the generator expression and the first function, as shown above. Huzzah!

Edit

Originally I thought that the any built-in returned the first object which evaluated to a boolean True, but it actually returns the boolean True if any of the objects evaluate to True. I've edited to reflect my mistake.
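
To spell out the difference with a throwaway list (reusing the first function from above):

values = [0, "", "2008-02-13", 42]

print(any(values))     # True -- any() collapses everything down to a boolean
print(first(values))   # 2008-02-13 -- first() hands back the object itself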

IDE cable termination

I never gave much thought as to how IDE cables are terminated. Recently, I broke an exceptionally small IDE cable that lives in my hard drive enclosure. I can never figure out how to pull IDE cables out by the connector, so I always end up yanking on the cable itself, often detrimentally. :)

In breaking the head of the cable, I found out that this IDE (and I assume this holds for all IDEs) is "vampire tapped", reminding me of 10BASE5 Ethernet technology. Effectively, all 40 of the insulated wire sheaths are pierced by sharp spikes in the terminator. I'm not sure if this vampire tap method also holds for the three-head IDEs (board/master/slave) — I'll have to dismantle one of those in the future. It might be fun to look into IDE arbitration protocol at some point to figure out how those three-head IDE cables work properly. Are they a single bus with three vampire taps, or two separate buses with the middle device acting as an arbiter?

At any rate, it's really hard to get one of these terminators seated properly once you've knocked it out of place: the spikes have to line up exactly with the holes left by the previous termination. So far as my external enclosure is concerned, it looks like I've gotta find a new cable. :/

Hoarding hard drives

Cleaning out the basement, among a bunch of other junk, I found 6 hard drives (which I thought was a large number of hard drives). For some reason I thought it'd be fun to enumerate them...

  1. IBM Deskstar 75GXP, 46.1GB, 7200rpm

  2. Maxtor DiamondMax VL 30, 23.0GB, 5400rpm

  3. IBM Deskstar 40GV, 20.4GB, 5400rpm

  4. Maxtor DiamondMax 6800, 10.1GB, 5400rpm

  5. Maxtor DiamondMax 2160, 8.4GB, 5400rpm

  6. Western Digital Caviar AC22500, 2.5GB, 5400rpm

The average size of a hard drive in my basement is 18.42GB!
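
If you don't trust my arithmetic, the check is a quick one in Python:

sizes_gb = [46.1, 23.0, 20.4, 10.1, 8.4, 2.5]
print(round(sum(sizes_gb) / len(sizes_gb), 2))   # 18.42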

procfs and preload

Two of the cool utilities that I've checked out lately have centered around /proc. /proc is a virtual filesystem mountpoint — the filesystem entities are generated on the fly by the kernel. The filesystem entities provide information about the kernel state and, consequently, the currently running processes. [*]

The utilities are preload and powertop. Both are written in C, though I think that either of them could be written more clearly in Python.

preload

Preload's premise is fascinating. Each shared library that a running process has mapped in (via mmap) shows up in /proc/[pid]/maps, which contains entries of the form:

[vm_start_addr]-[vm_end_addr] [perms] [file_offset] [device_major_id]:[device_minor_id] [inode_num] [file_path]
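
As a minimal sketch of what reading those maps looks like (this is my own toy code, not preload's; it assumes a Linux /proc, and mapped_files is a name I made up):

def mapped_files(pid="self"):
    """Return the file-backed mappings of a process; shared libraries
    show up here alongside the executable and any mmap'd data files."""
    paths = set()
    with open("/proc/%s/maps" % pid) as maps_file:
        for entry in maps_file:
            fields = entry.split()
            # fields: addr_range, perms, offset, dev, inode, and (optionally) a path
            if len(fields) >= 6 and fields[5].startswith("/"):
                paths.add(fields[5])
    return sorted(paths)

for path in mapped_files():
    print(path)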

Preload uses a Markov chain to decide which shared library pages to "pre-load" into the page cache by reading and analyzing these maps over time. Preload's primary goal was to reduce login times by pre-emptively warming up a cold page cache, which it was successful in doing. The catch is that running preload was shown to decrease performance once the cache was warmed up, indicating that it may have just gotten in the way of the native Linux page cache prefetch algorithm. [†]
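
I haven't read preload's model closely, so the following is only a toy transition-counting predictor in the spirit of the Markov-chain idea, with invented program names; it is not preload's actual algorithm:

from collections import defaultdict

class ToyMarkovPredictor(object):
    """Count "B started while A was running" transitions, then rank
    candidates to warm up by how often they followed what's running now."""

    def __init__(self):
        self.transitions = defaultdict(lambda: defaultdict(int))
        self.previously_running = set()

    def observe(self, running_now):
        newly_started = running_now - self.previously_running
        for prog in self.previously_running:
            for newcomer in newly_started:
                self.transitions[prog][newcomer] += 1
        self.previously_running = set(running_now)

    def candidates(self, running_now):
        scores = defaultdict(int)
        for prog in running_now:
            for nxt, count in self.transitions[prog].items():
                if nxt not in running_now:
                    scores[nxt] += count
        return sorted(scores, key=scores.get, reverse=True)

predictor = ToyMarkovPredictor()
predictor.observe(set(["xterm"]))
predictor.observe(set(["xterm", "firefox"]))
predictor.observe(set(["xterm"]))
predictor.observe(set(["xterm", "firefox"]))
print(predictor.candidates(set(["xterm"])))   # ['firefox']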

There are a few other things in /proc that preload uses, like /proc/meminfo, but querying the maps is the meat and potatoes. I was thinking of porting it to Python so that I could understand the structure of the program better, but the fact that the daemon caused a performance decrease over a warm cache turned me off the idea.

Footnotes

[*] A cool side note — all files in /proc have a file size of 0 except kcore and self.

[†] The page_cache_readahead() function in the Linux kernel.