CPP counter macrology

Posted by cdleary on 2020-03-28

I can calculate the motion of heavenly bodies, but not the madness of people.

Isaac Newton (purportedly)

Limited though they may be, C macros as implemented in the C Preprocessor are still useful in these heady, modern, chainsaw-juggling C++ days.

One aspect that keeps C macros useful in this modern era is the ability to (un-hygienically) generate new symbols in the source text. This can be useful for generating new "left hand side" names for temporaries used in macro expansions that cannot be easily placed in a new block scope; e.g. for ASSIGN_OR_RETURN style macros. [*]

[*] The difficulty in a C++ assign-or-return macro is that we cannot necessarily declare a variable of the left hand side's type without initializing it. So the macro stores the result in a temporary, returns early if that result is an error, and otherwise moves the value out of the temporary into the LHS definition.

This is accomplished with the built-in object-like macro __COUNTER__, which the GCC documentation lists as a "common predefined GNU extension". Each time __COUNTER__ is expanded it yields the current count, starting from 0, and the preprocessor then bumps the count for the translation unit so the next expansion gets a new value.

For example, we can simply make a text file called experiment.txt with the contents:

__COUNTER__
__COUNTER__

Then we run the C preprocessor on it (accessible on my system as the cpp binary or the clang-cpp-9 binary): [†]

[†] The -P flag suppresses the linemarker lines (such as # 1 "experiment.txt") that cpp would otherwise emit.

$ cpp -P experiment.txt
0
1
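The same behavior can be checked from inside a compiled program. The following is a minimal sketch, assuming a compiler that provides __COUNTER__ (GCC, Clang, and MSVC all do); the names kFirst and kSecond are made up for illustration. It only asserts that successive expansions differ by one, because the absolute values depend on how many times __COUNTER__ was already expanded earlier in the translation unit.

```cpp
// Each expansion of __COUNTER__ yields the next integer for this
// translation unit, so two consecutive expansions differ by exactly one.
constexpr int kFirst = __COUNTER__;
constexpr int kSecond = __COUNTER__;

static_assert(kSecond == kFirst + 1,
              "__COUNTER__ should advance by one per expansion");
```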

Seems simple enough, right? But the really fun thing about __COUNTER__ is that you get to remember/rederive how the C preprocessor rules work every time you try to use it to actually make a symbol generator! Let's observe...

Attempting symbol generation

If you've used C macros before you probably recall the "paste tokens together" syntax: the double-hash! We can use it directly in a function-like macro: [‡]

#define PUT_A_BIRD_ON_IT(__name) __name##bird

PUT_A_BIRD_ON_IT(larry)

[‡] Note: I prefix all my function-like-macro parameters with double underscores so that it's very clear something macro-related went wrong if they appear in the source text. Also, it doesn't matter whether you put spaces around the ## operator, but IMO leaving them out is visually evocative of how the result will look.

Put that in a text file and run cpp on it and you get:

$ cpp -P experiment.txt
larrybird

"Awesome!" you think, "I can use this to put birds on so many things! And counters too!"

But we see that, with counters, things get trickier somehow...

#define PUT_A_COUNTER_ON_IT(__name) __name##__COUNTER__

PUT_A_COUNTER_ON_IT(larry)

Running cpp on this gives us:

larry__COUNTER__

The difference is that bird (from earlier) is an identifier with no macro definition, whereas __COUNTER__ is an identifier with a macro definition, and that definition is not being expanded the way we'd like it to!

So how do we define a macro that pastes a counter value onto our identifier so that we can generate new symbols?

The tricky bit: Argument Prescan/Pre-Expansion

The GNU CPP documentation gives a big clue about what we're dealing with in its section on "argument prescan":

Macro arguments are completely macro-expanded before they are substituted into a macro body, unless they are stringized or pasted with other tokens.

After substitution, the entire macro body, including the substituted arguments, is scanned again for macros to be expanded. The result is that the arguments are scanned twice to expand macro calls in them.

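The first sentence tells us something interesting: a macro argument that is pasted (or stringized) in the macro body is substituted as written, without being expanded first, whereas any other argument is fully expanded before substitution. Let's try a few variations: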

// Just place tokens side by side.
#define NO_PASTE(__x) __x __COUNTER__
// Paste the token with __COUNTER__
#define YES_PASTE(__x) __x##__COUNTER__
// Just assume the identifier is larry and paste it with __COUNTER__
#define NO_ARGS() larry##__COUNTER__
// Just concatenate together whatever the arguments are.
#define CONCAT(__x, __y) __x##__y

NO_PASTE(larry)
YES_PASTE(larry)
NO_ARGS()
CONCAT(larry, __COUNTER__)

$ cpp -P experiment.txt
larry 0
larry__COUNTER__
larry__COUNTER__
larry__COUNTER__

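With our attempts to paste, we find several ways to not have __COUNTER__ expand. In YES_PASTE and NO_ARGS the __COUNTER__ token sits in the macro body itself, where ## glues it to its neighbor before the rescan ever sees it; in CONCAT it arrives as an argument that is pasted, so it skips the prescan. In the one example where we do not paste, and just place the tokens next to each other, the preprocessor expands __COUNTER__ happily, but the result is two separate tokens (larry 0) rather than the single new symbol we need!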

The same documentation goes on to describe a case where knowing about argument prescan is actually useful:

If an argument is stringized or concatenated, the prescan does not occur. If you want to expand a macro, then stringize or concatenate its expansion, you can do that by causing one macro to call another macro that does the stringizing or concatenation.

So the documentation implies this behavior can be overcome by adding a new layer of indirection, like so:

#define _CONCAT(__x, __y)  __x##__y
#define CONCAT(__x, __y) _CONCAT(__x, __y)

CONCAT(larry, __COUNTER__)

$ cpp -P experiment.txt
larry0
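The quoted documentation applies equally to stringizing with #. As a minimal sketch (the macro names STRINGIZE_INNER, STRINGIZE, and ANSWER are hypothetical, chosen for this example), the same two-level pattern decides whether a macro argument is spelled out by name or by value:

```cpp
#include <string>

// Stringizes its argument exactly as written; no prescan happens because
// the parameter is an operand of #.
#define STRINGIZE_INNER(x) #x
// The parameter is not an operand of # here, so the argument is fully
// expanded before being handed to STRINGIZE_INNER.
#define STRINGIZE(x) STRINGIZE_INNER(x)

#define ANSWER 42

const std::string kByName = STRINGIZE_INNER(ANSWER);  // "ANSWER"
const std::string kByValue = STRINGIZE(ANSWER);       // "42"
```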

And if we want a macro to introduce the __COUNTER__ automatically for the macro user, as we often do for generating new symbols, we have to add yet another layer of indirection to preserve this behavior.

#define _CONCAT(__x, __y) __x##__y
#define CONCAT(__x, __y) _CONCAT(__x, __y)
#define CONCAT_COUNTER(__x) CONCAT(__x, __COUNTER__)

CONCAT_COUNTER(larry)
CONCAT_COUNTER(larry)

$ cpp -P experiment.txt
larry0
larry1
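To see why this matters in real code, here is a minimal sketch of a macro that declares a uniquely named local object on each use. Everything in it is hypothetical and only illustrates the technique: ScopeTracker, TRACK_SCOPE, UNIQUE_NAME, and the CONCAT_INNER spelling (used in place of _CONCAT because names starting with an underscore and a capital letter are reserved for the implementation). It assumes a compiler that provides __COUNTER__.

```cpp
#define CONCAT_INNER(x, y) x##y
#define CONCAT(x, y) CONCAT_INNER(x, y)
#define UNIQUE_NAME(prefix) CONCAT(prefix, __COUNTER__)

// Hypothetical RAII helper: bumps a count for as long as it is alive.
class ScopeTracker {
 public:
  explicit ScopeTracker(int* live) : live_(live) { ++*live_; }
  ~ScopeTracker() { --*live_; }
  ScopeTracker(const ScopeTracker&) = delete;
  ScopeTracker& operator=(const ScopeTracker&) = delete;

 private:
  int* live_;
};

// Declares a tracker whose variable name differs on every expansion.
#define TRACK_SCOPE(live) ScopeTracker UNIQUE_NAME(scope_tracker_)(&(live))

// Two trackers in one scope: with a fixed variable name the second
// declaration would be a redefinition error.
int count_live_trackers() {
  int live = 0;
  TRACK_SCOPE(live);
  TRACK_SCOPE(live);
  return live;  // both trackers are still alive at this point
}
```

Calling count_live_trackers() returns 2, since the return value is read before either tracker is destroyed.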

Attempting to use the leaf-most _CONCAT directly from CONCAT_COUNTER will not do what we want: __COUNTER__ then arrives at _CONCAT as an argument that is pasted, so it is never expanded.

#define _CONCAT(__x, __y) __x##__y
#define CONCAT_COUNTER(__x) _CONCAT(__x, __COUNTER__)

CONCAT_COUNTER(larry)

$ cpp -P experiment.txt
larry__COUNTER__
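One way to inspect what a macro expands to without running cpp by hand is to stringize the expansion. The sketch below does that for both variants. The names are hypothetical (STRINGIZE, COUNTER_DIRECT, COUNTER_VIA_WRAPPER, and CONCAT_INNER standing in for _CONCAT), and the digits in the wrapped result depend on earlier uses of __COUNTER__ in the translation unit, so no particular number is assumed.

```cpp
#include <string>

#define STRINGIZE_INNER(x) #x
#define STRINGIZE(x) STRINGIZE_INNER(x)

#define CONCAT_INNER(x, y) x##y
#define CONCAT(x, y) CONCAT_INNER(x, y)

// Pastes directly: __COUNTER__ reaches ## unexpanded.
#define COUNTER_DIRECT(x) CONCAT_INNER(x, __COUNTER__)
// Goes through the wrapper: __COUNTER__ is expanded before the paste.
#define COUNTER_VIA_WRAPPER(x) CONCAT(x, __COUNTER__)

// STRINGIZE fully expands its argument first, then turns it into a string.
const std::string kDirect = STRINGIZE(COUNTER_DIRECT(larry));
const std::string kViaWrapper = STRINGIZE(COUNTER_VIA_WRAPPER(larry));
// kDirect is "larry__COUNTER__"; kViaWrapper is "larry" followed by digits.
```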

In the C specification / clang source

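The rescanning half of this behavior is specified more formally in section 6.10.3.4 "Rescanning and further replacement" (under 6.10.3 "Macro replacement") of the C standard (draft link), while the rule that parameters next to # or ## are substituted without pre-expansion lives in 6.10.3.1 "Argument substitution". From 6.10.3.4: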

After all parameters in the replacement list have been substituted and # and ## processing has taken place, all placemarker preprocessing tokens are removed. Then, the resulting preprocessing token sequence is rescanned, along with all subsequent preprocessing tokens of the source file, for more macro names to replace.

I also found this handled in Clang in TokenLexer::ExpandFunctionArguments, with the clause cited and everything! The source documentation is very nice, and I found the Clang source easier to read and instrument than the GCC macro.c source.

// If it is not the LHS/RHS of a ## operator, we must pre-expand the
// argument and substitute the expanded tokens into the result.  This is
// C99 6.10.3.1p1.
if (!PasteBefore && !PasteAfter) {
  ... // [early return]
}

// Okay, we have a token that is either the LHS or RHS of a paste (##)
// argument.  It gets substituted as its non-pre-expanded tokens.
const Token *ArgToks = ActualArgs->getUnexpArgument(ArgNo);
...

This code structure and its comments make it clear how argument tokens are treated differently based on their "pastedness".

Conclusion

In conclusion, the preprocessor treats a macro argument differently depending on whether the corresponding parameter is pasted in the body of the macro definition, which is why:

#define PASTE(__x, __y) __x##__y
PASTE(larry, __COUNTER__)

Does not give you what you want. The preprocessor sees that the parameter __y is pasted, so it substitutes the argument without expanding it via its macro definition, while...

#define PASTE(__x, __y) __x##__y
#define PASTE_WRAPPER(__x, __y) PASTE(__x, __y)
PASTE_WRAPPER(larry, __COUNTER__)

... does! The argument __COUNTER__ is given as parameter __y of PASTE_WRAPPER, and __y is not pasted in the body of that macro, so it expands via its macro definition before PASTE ever sees it.
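Coming back to the motivating use case from the introduction, here is a minimal sketch of an assign-or-return macro built on the wrapper trick. It is an illustration under stated assumptions, not any particular library's macro: std::optional (C++17) stands in for whatever status or result type a real codebase would use, and all names (ASSIGN_OR_RETURN_IMPL, CONCAT_INNER, parse_digit, add_digits) are hypothetical. The temporary's name is generated once and handed to a helper macro, because writing __COUNTER__ at each of the three uses would produce three different names.

```cpp
#include <optional>
#include <utility>

#define CONCAT_INNER(x, y) x##y
#define CONCAT(x, y) CONCAT_INNER(x, y)

// The temporary's name appears three times, so it is passed in as one
// argument rather than being regenerated at each use.
#define ASSIGN_OR_RETURN_IMPL(tmp, lhs, expr) \
  auto tmp = (expr);                          \
  if (!tmp.has_value()) return std::nullopt;  \
  lhs = std::move(*tmp)

// Generates a fresh temporary name per use and forwards it to the helper.
#define ASSIGN_OR_RETURN(lhs, expr) \
  ASSIGN_OR_RETURN_IMPL(CONCAT(assign_or_return_tmp_, __COUNTER__), lhs, expr)

// Hypothetical fallible operation: converts one decimal digit.
std::optional<int> parse_digit(char c) {
  if (c < '0' || c > '9') return std::nullopt;
  return c - '0';
}

// Two uses in the same scope get two distinct temporaries.
std::optional<int> add_digits(char a, char b) {
  ASSIGN_OR_RETURN(int x, parse_digit(a));
  ASSIGN_OR_RETURN(int y, parse_digit(b));
  return x + y;
}
```

With these definitions add_digits('2', '3') holds 5, and add_digits('x', '3') is empty. The macro body is deliberately not wrapped in its own block, since the declaration of the left hand side has to land in the caller's scope; that is exactly why the temporary needs a generated name.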

tags: cpp, macros