Two postfix operations redux: sequence points
Get ready for some serious language lawyering.
I was going back and converting my old entries to reStructuredText when I found an entry in which I was wrong! (Shocking, I know.)
C
Stupid old me didn’t know about sequence points back in 2007: the effects of the ++ operator in the C expression i++ * i++ are in an indeterminate state of side-effect completion until one of the language-defined sequence points is encountered (i.e. a semicolon or function invocation).
From the C99 standard 6.5.4.2 item 2 regarding the postfix increment and decrement operators:
The result of the postfix ++ operator is the value of the operand. After the result is obtained, the value of the operand is incremented. The side effect of updating the stored value of the operand shall occur between the previous and the next sequence point.
Therefore, the compiler is totally at liberty to interpret that expression as:
mov lhs_result, i ; Copy the values of the postincrement evaluation. mov rhs_result, i ; (Which is the original value of i.) mul result, lhs_result, rhs_result add i, lhs_result, 1 add i, rhs_result, 1 ; Second increment clobbers with the same value!
This results in the same result as the GCC compilation in the referenced entry: i is 12 and the result is 121.
As I mentioned before, the reason this can occur is that nothing in the syntax forces the first postincrement to be evaluated before the second one. To give an analogy to concurrency constructs: you have a kind of compile-time "race condition" in your syntax between the two postincrements that could be solved with a sequence point "barrier". [*]
In this assembly, those adds can float anywhere they like after their corresponding mov instruction and can operate directly on i instead of the temporary if they’d prefer. Here’s an possible sequence that results in a value of 132 and i as 13.
mov lhs_result, i ; Gets the original 11. inc i ; Increment in-place after the start value is copied. mov rhs_result, i ; Gets the new value 12. inc i ; Increment occurs in-place again, making 13. mul result, lhs_result, rhs_result
Even if you know what you’re doing, mixing two postfix operations, or any side effect, using the less obvious sequence points (like function invocation) is dangerous and easy to get wrong. Clearly it is not a best practice. [†]
Java
The postincrement operation appears to have sequence-point-like semantics in the Java language through experimentation, and it does! From the Java language specification (page 416):
The Java programming language also guarantees that every operand of an operator (except the conditional operators &&, ||, and ? :) appears to be fully evaluated before any part of the operation itself is performed.
Which combines with the definition of the postfix increment expression (page 485):
A postfix expression followed by a ++ operator is a postfix increment expression.
As well as left-to-right expression evaluation (page 415):
The left-hand operand of a binary operator appears to be fully evaluated before any part of the right-hand operand is evaluated.
To a definitive conclusion that i++ * i++ will always result in 132 == 11 * 12 and i == 13 when i == 11 to start.
Python
Python has no increment operators specifically so you don’t have to deal with this kind of nonsense.
>>> count = 0 >>> count++ File "<stdin>", line 1 count++ ^ SyntaxError: invalid syntax
Annoyingly for newbies, though, it looks like ++count is a valid expression that happens to look like preincrement.
>>> count = 0 >>> ++count 0 >>> --count 0
They’re actually two unary positive and negative operators, respectively. Just one of the hazards of a context free grammar, I suppose.
Footnotes
| [*] | I threw this in because the ordeal reminds me of the classic bank account concurrency problem. If it’s more confusing than descriptive, please ignore it. :-) |
| [†] |
Since function invocation defines sequence points, I thought this code sequence guaranteed those results: #include <stdio.h> int identity(int value) { return value; } int main() { int i = 11; printf("%d\n", identity(i++) * identity(i++)); printf("%d\n", i); return 0; } As Dan points out, the order of evaluation is totally unspecified — the left hand and right hand subexpression can potentially be evaluated concurrently. |
Tags: Java, Language Lawyering, Sequence Points
January 10th, 2010 at 16:54
Are you sure the C compiler isn’t allowed to interpret identity(i++) * identity(i++) as (tmp1 = i, tmp2 = i, i = tmp1 + 1, i = tmp2 + 1, identity(tmp1) * identity(tmp2)) and print 121, 12?
gcc seems to warn on this code when invoked with -Wsequence-point:
January 10th, 2010 at 17:12
@Michał: That’s actually a very interesting question that I don’t know the answer to (I see the same warning). Section 6.5 (Expressions) item 3 states, “[T]he order of evaluation of subexpressions and the order in which side effects take place are both unspecified,” which implies to me an that there is a definitive ordering of subexpression evaluation, it’s just unspecified which order it is. That could be a misinterpretation, though.
I guess we’d have to read through GCC to figure out why it’s giving that particular warning. This stuff is just too hokey to resolve from precise wording in the specifications. :-)
January 11th, 2010 at 03:43
I believe ‘undefined’ and ‘unspecified’ mean exactly that; the result of the expression may vary depending on the compiler used, and possibly what it’s targeting too.
For example, see this comparison of compiling your test program in Clang, Apple GCC & LLVM GCC.
January 13th, 2010 at 18:39
@Dan: Your evidence has bested my finest lawyering. :-)
Apparently the subexpressions can be evaluated concurrently. I’ll update the entry now to fix the mistake.
January 14th, 2010 at 01:33
Actually, a similar issue once bit me. What would you expect the code at the bottom of this post to print?
The results:
As part of a compiler course, I once wrote a VM in C, and — for brevity — I had function call sites as arguments of another function call. Later, I wanted to try — for fun — to compile the code with the IBM XLC compiler. The unexpected evaluation ordering function arguments caused the VM to crash. (This was in 2003, I believe.)
The lesson I learned from that is that you shouldn’t make assumptions about when a C compiler decides to do what. Surprisingly large parts of the C language are unspecified, which in effect means implementation-defined; that is, you get what you, at the discretion of the compiler.
I never engaged much in language lawyering, for some reason, and it never really appealed to me. My best guess is that it’s because I tend to distrust my own conclusions :-)
January 14th, 2010 at 01:44
I may have remembered wrong, considering the following results:
As you can see, the evaluation order of function arguments of GCC varies depending on the architecture targeted. Rather than when trying XLC, it might have been when I tried to run the code on my Mac at home that I ran into this. Nonetheless, the fact that the evaluation order is changed by simple retargeting of the compiler underscores exactly what ‘unspecified’ and ‘undefined’ mean for portability…
January 14th, 2010 at 16:19
@Dan: Yeah, I don’t see language lawyering as a good thing, it’s just a funny concept that engineers will hang on every word of a spec, as I have just done. Nice examples of code that doesn’t port! The intuition that program order is strictly top-to-bottom and left-to-right is certainly hard to overcome at times.