August 28, 2007

Matching _t types in your .vimrc

Background

I find myself constantly reproducing my .vimrc file. It's most frequently because I'm migrating from system to system; however, I sometimes just lose it during a reformat (or forget to rsync with the -a flag).

One part of Vim that I'm not fond of is its regex. It takes the one thing I like about Perl (the ecumenical regex syntax) and throws it out the window. As a result, I usually write hackish regexes to highlight my type_t cTypes on the fly, which never highlight quite what I want them to.

Evolution

One example is a regex I found doing a Google search for "vim match _t", which, admittedly, doesn't return much. The most relevant hit suggests the following:

syntax match cType /[^ (]*_t[ )]/ " very wrong

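This suggestion is pretty bad: it only matches when the type is followed by a space or a closing parenthesis. That means it misses cow_t in the first two lines below, and in the third it swallows the trailing space along with the name: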

typedef struct Cow cow_t;
cow_t* my_cow;
cow_t my_cow;

At first I thought the correct regex was the following, which matches all of the above:

syntax match cType /\w\+_t\W\{-}/ " also wrong

It's annoying that Vim regex doesn't have the standard operators (like +) without the escape, and that dropping the awkward match on the last atom (\W, a non-word character) takes a special funky-looking-dealie, \{-}. I believe the closest Perl equivalent is the equally unintuitive *? postfix (non-greedy zero-or-more), but it has the clear advantage of being the de-facto standard.

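The above wrongly matches inside things like the following, however:

cow_tip();

The trouble is that \W\{-} is perfectly happy to match nothing at all, so the pattern never actually checks what comes after the _t. This indicates that we need to match on the previous portion in all cases, except where there's a word character following. For this, we use the following, correct, construct: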

syntax match cType /\w\+_t\w\@!/ " CORRECT!
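
To see the difference without squinting at highlighting, the two patterns can be fed to Vim's built-in matchstr() function. This is only a throwaway sketch (the sample strings are made up for the check); type the :echo lines at the command line or :source them from a scratch file.

```vim
" Non-greedy \W\{-} can match nothing, so the _t inside cow_tip still hits.
echo string(matchstr('cow_tip();', '\w\+_t\W\{-}'))
" shows 'cow_t'

" The negative lookahead \w\@! rejects a _t followed by a word character.
echo string(matchstr('cow_tip();', '\w\+_t\w\@!'))
" shows ''

" Real types still match, whatever punctuation follows them.
echo string(matchstr('cow_t* my_cow;', '\w\+_t\w\@!'))
" shows 'cow_t'
echo string(matchstr('typedef struct Cow cow_t;', '\w\+_t\w\@!'))
" shows 'cow_t'
```

The string() wrapper is only there so that an empty result shows up as '' instead of a blank line.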

I couldn't have figured it out without this handy reference and the more extensive Vim documentation, which showed me how to fix my original error with the clown-hat-looking \@! construct.
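
One thing to watch when the rule lives in a .vimrc: as far as I can tell, syntax items belong to a buffer and are cleared when Vim loads a filetype's syntax file, so a bare syntax match line may be gone by the time a C file is on screen. A minimal sketch of one way around that, assuming syntax on appears earlier in the file (the group name CustomCTypes is invented for this example):

```vim
" Re-apply the rule every time C or C++ syntax is (re)loaded.
" Assumes this block comes after 'syntax on' in the .vimrc.
augroup CustomCTypes
  " Clear the group first so re-sourcing the .vimrc does not stack duplicates.
  autocmd!
  autocmd Syntax c,cpp syntax match cType /\w\+_t\w\@!/
augroup END
```

Dropping the same syntax match line into ~/.vim/after/syntax/c.vim should have a similar effect, at the cost of one more file to lose in a reformat.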

Efficiency/Readability Fix (July 30, 2008)

My friend Trevor Caira pointed out the existence of the \zs and \ze atoms. These resize the match to the specified start and end (respectively), without using clown-hat trickery. <@:)

This makes the regex look a great deal more straightforward:

syntax match cType /\w\+_t\ze\W/
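
One difference from the \w\@! version is worth checking before switching over. The sketch below again uses matchstr() on made-up sample strings; the point is that \W has to consume an actual character, while the lookahead is satisfied when nothing follows at all.

```vim
" \ze ends the match before the \W, so the semicolon is not part of it.
echo string(matchstr('cow_t;', '\w\+_t\ze\W'))
" shows 'cow_t'

" With nothing after the name there is no character for \W to match.
echo string(matchstr('cow_t', '\w\+_t\ze\W'))
" shows ''

" The lookahead version still matches in that case.
echo string(matchstr('cow_t', '\w\+_t\w\@!'))
" shows 'cow_t'
```

In a buffer the same thing happens when a type name is the last thing on a line, since \W does not match the line break. That is rare in C, but it can come up in a declaration split across lines.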