October 12, 2011

JS regexps implemented in JS

If you heart JS and want to learn more about how regular expressions work, I've got yet another fun project for you to work on.

Back when I was working more heavily on the regular expression engine, some ECMAScript spec correctness questions came up. As it turns out, the regular expression part of the specification is written in scary-sounding-but-really-not-that-hard-to-understand continuation passing style (and it really seems to make the most sense that way).

I tried to work through the first bug on paper, but I forgot to carry the one or something, and I got the wrong result. dmandelin quickly whipped up a program modeled on the spec to resolve that one example conclusively. Seeing how easy that was, I followed his lead and started working on a little library to resolve these questions in ways that save more trees.

I haven't worked on it much lately (according to this neat GitHub activity doohickey), but I put cdlre out on github today, and I'd be happy to review pull requests. The regular expression specification for matching (ECMAScript revision 5 section 15.10) is easy to translate into running code, and I left a lot of features unimplemented, just for you!

I'll just copy-pasta the rest from the project README:

Potential applications:

  • Regression testing the specification against host implementations.

  • Use in understanding why regular expressions succeed/fail to match.

  • Use in a metacircular interpreter (like Narcissus).

  • Use as a staging ground for regular expression optimizations and/or a regular expression compiler. (Such a compiler could target eval as a backend or a JIT code execution foreign function.)


  • Be capable of visualizing (or at least dumping out) the ECMAScript standard steps taken in matching a regular expression.

  • Be capable of enabling/disabling the de-facto quirks from various browsers which are not yet part of the standard.

  • Be capable of running a thorough regression suite against the host regular expression engine (presumably with a set of permitted quirk options).

  • Keep the JS code a direct translation from the spec where possible and practical.

I'm sure that hooking in a comprehensible visualization would be a helpful tool for web developers who want to harness the Indiana Jones-like power of regular expressions.

Go-go gadget community?