July 29, 2010

Coding style as a feature of language design

roc recently posted a thought-provoking entry titled, "Coding Style as a Failure of Language Design", in which he states:

Languages already make rules about syntax that are somewhat arbitrary. Projects imposing additional syntax restrictions indicate that the language did not constrain the syntax enough; if the language syntax was sufficiently constrained, projects would not feel the need to do it. Syntax would be uniform within and across projects, and developers would not need to learn multiple variants of the same language.

I totally agree with roc's point that there is overhead in learning-and-conforming-to local style guidelines. I also agree that this overhead is unnecessary and that language implementers should find ways to eliminate it; however, I think that imposing additional arbitrary constraints on the syntax is heading in the wrong direction.

Your language's execution engine [*] already has a method of normalizing crazy styles: it forms an abstract syntax tree. Before the abstract syntax tree (AST) is mutated [†] it is in perfect correspondence with the original source text, modulo the infinite number of possible formatting preferences. This is the necessary set of constraints on the syntax that can actually result in your program being executed as it is written. [‡]

So, why don't we just lug that thing around instead of the source text itself?

The dream

The feature that languages should offer is a mux/demux service: mux an infinite number of formatting preferences into an AST (via a traditional parser); demux the AST into source text via an AST-decompiler, parameterized by an arbitrarily large set of formatting options. Language implementations could ship with a pair of standalone binaries. Seriously, the reference language implementation should understand its own formatting parameters at least as well as Eclipse does. [§]

Once you have the demux tool, you run it on your AST files as a post-checkout hook in your revision control system for instant style personalization. If the engine accepts the AST directly as input, you would only need to demux the files you planned to work on — if the engine accepted an AST directly as input in lieu of source text, this could even be an optimization.

Different execution engines are likely to use different ASTs, but there should be little problem with composability: checked-in AST goes through standalone demux with an arbitrary set of preferences, then through the alternate compiler's mux. So long as the engines have the same language grammar for the source text, everybody's happy, and you don't have to waste time writing silly AST-to-AST-prime transforms.

In this model, linters are just composable AST observers/transforms that have no ordering dependencies. You could even offer a service for simple grammatical extensions without going so far as language level support. Want a block-end delimiter in the Python code you look at? [¶] Why not, just use a transform to rip it out before it leaves the front-end of the execution engine.

Reality

Of course, the set of languages we know and love has some overlap with the set of languages that totally suck to parse, whether due to preprocessors or context sensitivity or the desire to parse poems, but I would bet good money that there are solutions for such languages. In any case, the symmetric difference between those two sets could get with it, and new languages would be kind to follow suit. It would certainly be an interesting post-FF4 experiment for SpiderMonkey, as we've got a plan on file to clean up the parser interfaces for an intriguing ECMAScript strawman proposal anywho.

Footnotes

[*]

Interpreter, compiler, translator, whatever.

[†]

To do constant folding or what have you.

[‡]

Oh yeah, and comments. We would have to keep those around too. They're easy enough to throw away during the first pass over the AST.

[§]

Even more ideal, you'd move all of that formatting and autocompletion code out of IDEs into a language service API.

[¶]

Presumably because you despise all that is good and righteous in the world? ;-)