CPython Block-Stack Example

Posted by cdleary on 2020-01-12

Chop your own wood and it will warm you twice.

Henry David Thoreau via Henry Ford

Allow me to tell you a tale: of the stack that's not the stack.

A mysterious place where nulls are pushed and popped in lieu of PyObjects. A wondrous place, where tracebacks are traversed for display, rolling downhill collecting file and line numbers like fresh wet snow. Where complex control flow is bred into existence through the tender embrace of a few status codes.

A place called... the block stack.

Or like... don't. That's ok too. This stuff is pretty dense.

Basic idea and example

The block stack is used to implement interesting lexical constructs in CPython:

  • Exception-related constructs: try-except-else-finally
  • Loop control: break / continue / {for,while}-else
  • Frame control: return / yield
  • Context manager constructs: with

It's called "the block stack" because, in reality, a CPython frame has two stacks:

  • A value stack, which contains the intermediate results for bytecodes in Python expressions to communicate values to one another.
  • A block stack, which tracks control transfer information that is independent of the value stack.

It makes some intuitive sense that, because the "unexceptional" path in the VM evaluates everything in a stack-like fashion, there might be some side-structure to indicate what happens in the "exceptional" circumstances where normal evaluation is interrupted. [*]

[*]In JIT compiler IRs they are sometimes called "exception edges" for this reason.

For example, consider the following code snippet, where we blow up in the middle of an arithmetic expression:

# A type that explodes when you try to multiply it.
class Exploder:
    def __mul__(self, other): raise TypeError('boom')

# A function that prints the result of a simple math expression.
def f(a, b, c):
    print(a + b * c)

# A main routine that wraps up f so we can observe an expected error.
def main(a, b, c):
    try:
        f(a, b, c)
    except TypeError:
        pass
    else:
        raise AssertionError

# Invoke it all!
main(1, Exploder(), 3)

Note that the sub-expression b * c in function f is the one that explodes. But, because Python is running a stack machine, a will already have been pushed onto the stack, hoping that when b * c resolves successfully we'll then happily perform a binary add operation with the two top stack values.

But alas, it's not meant to be! As we can determine with a little mental simulation of the following bytecode sequence, when the BINARY_MULTIPLY blows up, both print and a will be abandoned on the stack.

>>> dis.dis(f)
  5           0 LOAD_GLOBAL              0 (print)
              2 LOAD_FAST                0 (a)
              4 LOAD_FAST                1 (b)
              6 LOAD_FAST                2 (c)
              8 BINARY_MULTIPLY
             10 BINARY_ADD
             12 CALL_FUNCTION            1
             14 POP_TOP
             16 LOAD_CONST               0 (None)
             18 RETURN_VALUE

So, in the end, since nothing in the code for this frame said it was able to deal with an exception (e.g. via try or with syntax), the exception blows up the whole frame, and proceeds to see if it can blow up the caller frame!

In the disassembly of main, unlike in f, we do have constructs that are capable of handling the exception. Here's an annotated version of the main function's bytecode, as we've already seen the disassembly of f above. (No need to understand all the inline comments on a first read; we'll trace through the execution of this bytecode in the section below.)

>>> def main(a, b, c):
...     try:
...         f(a, b, c)
...     except TypeError:
...         pass
...     else:
...         raise AssertionError
>>> dis.dis(main)  # [cdl] With my annotations inline!
# [cdl] Do the invocation under the "try" by pushing an "except" handler.
 11           0 SETUP_EXCEPT            16 (to 18)

 12           2 LOAD_GLOBAL              0 (f)
              4 LOAD_FAST                0 (a)
              6 LOAD_FAST                1 (b)
              8 LOAD_FAST                2 (c)
             10 CALL_FUNCTION            3
             12 POP_TOP    # [cdl] Discard return value.
             # [cdl] "try" is done w/o exception: discard except block, jump to "else".
             14 POP_BLOCK
             16 JUMP_FORWARD            20 (to 38)

# [cdl] Except clause / block @ PC 18.
 13     >>   18 DUP_TOP
             20 LOAD_GLOBAL              1 (TypeError)
             22 COMPARE_OP              10 (exception match)
             24 POP_JUMP_IF_FALSE       36  # [cdl] Jump to re-raise on failed match.
             # [cdl] We don't use the exception data (we just pass) so pop it off.
             26 POP_TOP
             28 POP_TOP
             30 POP_TOP

 14          32 POP_EXCEPT  # [cdl] Restores old exception (there is none).
             34 JUMP_FORWARD             6 (to 42)  # [cdl] Jump to return.
        >>   36 END_FINALLY  # [cdl] Re-raiser.

# [cdl] Path where we assert-fail if we didn't see the type error @ PC 38.
 16     >>   38 LOAD_GLOBAL              2 (AssertionError)
             40 RAISE_VARARGS            1
# [cdl] Our implicit return of None @ PC 42.
        >>   42 LOAD_CONST               0 (None)
             44 RETURN_VALUE

Observing stack values on trace

Staring at the disassembly and thinking hard is great and all, but let's look at a trace to firm up our understanding (using a little ctypes-based tracer that I hacked up to dump frame state).

Tips on how to read the trace:

  • The "stack" shown beneath each bytecode is the stack it is operating upon, i.e. where it would grab any operands from.
  • CPython generally emits bytecodes in an order you'd expect, according to a walk of the AST's statements, so it's easy to pick a line number in the original code to start following the bytecode sequence.
  • If the bytecode is from a new line the file/lineno is mentioned above the printed bytecode.

trace: the try

We'll start looking at the trace of the try: the SETUP_EXCEPT bytecode executes and notes where we want to "trap" [†] the PC to if an exception occurs.

[†]"Trapping" or "interrupt vectoring" is just systems programmer-y jargon for "grabbing the PC from where it naturally was and putting it somewhere special".

Observe that the SETUP_EXCEPT bytecode doesn't mention what kind of exception to trap -- the testing to see if we can handle the particular exception subtype is done in the handler code we trap to.

In fact, the SETUP_EXCEPT bytecode execution does nothing except push an entry onto the block stack for later observation! As we'll see, only if something goes wrong does the block stack entry represent "doing something".

Here's the trace entry -- it notes when we start a new line, as well as the PC and instruction that's about to execute, and shows the stack it's going to operate upon:

type_error_from_udt_mul.py:11
  0 Instruction(opname='SETUP_EXCEPT', opcode=121, arg=16, argval=18, argrepr='to 18', offset=0, starts_line=11, is_jump_target=False)
          stack (0): empty

Now that this SETUP_EXCEPT bytecode is done executing, we'll see there's a single record on the block stack in the trace entries below!

trace: the invocation of f

Next, main calls f with all its arguments. Observe that the CALL_FUNCTION bytecode argument is 3, which means it's invoking the function with the three arguments on the top of the stack.

This invocation is not terribly exciting to observe for our block-stack-understanding purposes, but we can see that when we call f there is still our SETUP_EXCEPT entry on the block stack for this (main) frame.

type_error_from_udt_mul.py:12
  2 Instruction(opname='LOAD_GLOBAL', opcode=116, arg=0, argval='f', argrepr='f', offset=2, starts_line=12, is_jump_target=False)
          stack (0): empty
         f_iblock: 1
          blockstack 0: type: SETUP_EXCEPT handler: 18 level: 0
  4 Instruction(opname='LOAD_FAST', opcode=124, arg=0, argval='a', argrepr='a', offset=4, starts_line=None, is_jump_target=False)
          stack (1):
           TOS0: <class 'function'> :: <function f>
         f_iblock: 1
          blockstack 0: type: SETUP_EXCEPT handler: 18 level: 0
  6 Instruction(opname='LOAD_FAST', opcode=124, arg=1, argval='b', argrepr='b', offset=6, starts_line=None, is_jump_target=False)
          stack (2):
           TOS0: <class 'int'> :: 1
           TOS1: <class 'function'> :: <function f>
         f_iblock: 1
          blockstack 0: type: SETUP_EXCEPT handler: 18 level: 0
  8 Instruction(opname='LOAD_FAST', opcode=124, arg=2, argval='c', argrepr='c', offset=8, starts_line=None, is_jump_target=False)
          stack (3):
           TOS0: <class '__main__.Exploder'> :: <__main__.Exploder object>
           TOS1: <class 'int'> :: 1
           TOS2: <class 'function'> :: <function f>
         f_iblock: 1
          blockstack 0: type: SETUP_EXCEPT handler: 18 level: 0
 10 Instruction(opname='CALL_FUNCTION', opcode=131, arg=3, argval=3, argrepr='', offset=10, starts_line=None, is_jump_target=False)
          stack (4):
           TOS0: <class 'int'> :: 3
           TOS1: <class '__main__.Exploder'> :: <__main__.Exploder object>
           TOS2: <class 'int'> :: 1
           TOS3: <class 'function'> :: <function f>
         f_iblock: 1
          blockstack 0: type: SETUP_EXCEPT handler: 18 level: 0

Great, we've called the function with our three arguments, and with the block stack appropriately set up to handle a potential exception!

trace: the control transfer to the custom __mul__

Now, within f, we execute the bytecode sequence that we disassembled earlier, repeated here:

>>> dis.dis(f)
  5           0 LOAD_GLOBAL              0 (print)
              2 LOAD_FAST                0 (a)
              4 LOAD_FAST                1 (b)
              6 LOAD_FAST                2 (c)
              8 BINARY_MULTIPLY
             10 BINARY_ADD
             12 CALL_FUNCTION            1
             14 POP_TOP
             16 LOAD_CONST               0 (None)
             18 RETURN_VALUE

We expect this invocation to blow up on the BINARY_MULTIPLY, but first we observe that BINARY_MULTIPLY control transfers to the custom __mul__ on the user-defined type:

type_error_from_udt_mul.py:7
  0 Instruction(opname='LOAD_GLOBAL', opcode=116, arg=0, argval='print', argrepr='print', offset=0, starts_line=7, is_jump_target=False)
          stack (0): empty
  2 Instruction(opname='LOAD_FAST', opcode=124, arg=0, argval='a', argrepr='a', offset=2, starts_line=None, is_jump_target=False)
          stack (1):
           TOS0: <class 'builtin_function_or_method'> :: <built-in function print>
  4 Instruction(opname='LOAD_FAST', opcode=124, arg=1, argval='b', argrepr='b', offset=4, starts_line=None, is_jump_target=False)
          stack (2):
           TOS0: <class 'int'> :: 1
           TOS1: <class 'builtin_function_or_method'> :: <built-in function print>
  6 Instruction(opname='LOAD_FAST', opcode=124, arg=2, argval='c', argrepr='c', offset=6, starts_line=None, is_jump_target=False)
          stack (3):
           TOS0: <class '__main__.Exploder'> :: <__main__.Exploder object>
           TOS1: <class 'int'> :: 1
           TOS2: <class 'builtin_function_or_method'> :: <built-in function print>
  8 Instruction(opname='BINARY_MULTIPLY', opcode=20, arg=None, argval=None, argrepr='', offset=8, starts_line=None, is_jump_target=False)
          stack (4):
           TOS0: <class 'int'> :: 3
           TOS1: <class '__main__.Exploder'> :: <__main__.Exploder object>
           TOS2: <class 'int'> :: 1
           TOS3: <class 'builtin_function_or_method'> :: <built-in function print>

This is where the "dunder" magic happens -- though we don't see an explicit CALL_FUNCTION-style opcode, internally the definition of BINARY_MULTIPLY knows the user-defined type has customized the notion of multiplication, and implicitly creates a new interpreter frame to invoke that function.
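
To make that implicit dispatch concrete, here's a minimal sketch (the class name is made up for illustration) showing that the multiply operator routes through the type's __mul__, roughly as if we had called the dunder ourselves:

# A hypothetical type whose __mul__ just records what it was asked to do.
class Recorder:
    def __mul__(self, other):
        return ('mul', other)

r = Recorder()
# BINARY_MULTIPLY dispatches through the left operand's type slot; semantically
# this is roughly equivalent to invoking the dunder directly.
assert r * 3 == type(r).__mul__(r, 3) == ('mul', 3)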

Because of this implicit invocation, we now dive into the user-defined multiply, and witness the RAISE_VARARGS bytecode that puts us into our exception state:

type_error_from_udt_mul.py:3
  0 Instruction(opname='LOAD_GLOBAL', opcode=116, arg=0, argval='TypeError', argrepr='TypeError', offset=0, starts_line=3, is_jump_target=False)
          stack (0): empty
  2 Instruction(opname='LOAD_CONST', opcode=100, arg=1, argval='boom', argrepr="'boom'", offset=2, starts_line=None, is_jump_target=False)
          stack (1):
           TOS0: <class 'type'> :: <class 'TypeError'>
  4 Instruction(opname='CALL_FUNCTION', opcode=131, arg=1, argval=1, argrepr='', offset=4, starts_line=None, is_jump_target=False)
          stack (2):
           TOS0: <class 'str'> :: 'boom'
           TOS1: <class 'type'> :: <class 'TypeError'>
  6 Instruction(opname='RAISE_VARARGS', opcode=130, arg=1, argval=1, argrepr='', offset=6, starts_line=None, is_jump_target=False)
          stack (1):
           TOS0: <class 'TypeError'> :: TypeError('boom')

trace: the control transfer back to main

Now we see a discontinuity in the trace: the control transfers directly from the raise statement in Exploder.__mul__ to main -- specifically, the place in main that the SETUP_EXCEPT bytecode at the very beginning told us to trap to in the event of an exception: PC 18.

The bytecodes that we trap to generally execute the "exception testing clause" that's given by the except (FooException, BarException) as e syntax (called an except_clause in the CPython grammar). You can see the disassembly of this sequence of bytecodes in main:

13     >>   18 DUP_TOP
            20 LOAD_GLOBAL              1 (TypeError)
            22 COMPARE_OP              10 (exception match)
            24 POP_JUMP_IF_FALSE       68
            26 POP_TOP
            28 STORE_FAST               3 (e)
            30 POP_TOP
            32 SETUP_FINALLY           22 (to 56)

Translated to prose: see if it's a type error, if not jump down to the else clause, otherwise bind the exception as e and fall through to this except suite of statements that handles a TypeError.

trace: we find the SETUP_EXCEPT is munged!

Before we even execute that sequence of bytecodes, however, we notice something interesting...

Looking at the trace, we see that, upon trapping to PC 18, the SETUP_EXCEPT block stack entry has now been munged into an EXCEPT_HANDLER opcode instead, and we have a peculiar new configuration of things on the stack:

type_error_from_udt_mul.py:13
 18 Instruction(opname='DUP_TOP', opcode=4, arg=None, argval=None, argrepr='', offset=18, starts_line=13, is_jump_target=True)
          stack (6):
           TOS0: <class 'type'> :: <class 'TypeError'>
           TOS1: <class 'TypeError'> :: TypeError('boom')
           TOS2: <class 'traceback'> :: <traceback object>
           TOS3: <class 'NoneType'> :: None
           TOS4: <null>
           TOS5: <null>
         f_iblock: 1
          blockstack 0: type: EXCEPT_HANDLER handler: -1 level: 0

So how did that happen? What are these six values, and how did they get there?

The unwinding procedure

"The answer's not in the box, it's in the band." -- Antitrust

As we noted earlier, the block-stack pushing bytecodes don't really do anything themselves -- what they do is drop records that the unwinding procedure might need to pick up and take action on under exceptional circumstances.

"The functionality's not in the bytecode, it's in the unwinder."

—this blog trying to be cool like the cinematic masterpiece that is Antitrust

Actually, it's more precisely "a combination of the unwinder and how the bytecode generator chooses to translate the AST into the bytecode sequence", but it's harder to make that sound catchy.

Remember back: after our raise, CPython "unwound" the current frame (the f invocation) looking for a block stack entry capable of dealing with the exception.

Because f did not have a way of dealing with the exception, the exception was then set against the current Python thread state, and an error code was returned to the caller; i.e. main's function call bytecode. This procedure generally allows an exception to traverse multiple Python stack frames looking for appropriate handler(s).
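
As a quick illustration of that multi-frame traversal (the function names are made up), an exception raised two frames down finds its handler in the outermost caller:

def innermost():
    raise RuntimeError('deep')

def middle():
    innermost()  # No handler here: the error indicator just propagates outward.

def outermost():
    try:
        middle()
    except RuntimeError as e:
        print('caught two frames up:', e)

outermost()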

(Note that, because of finally blocks and re-raises, the exception is not necessarily quashed when an appropriate handler is found.)

Once this unwinding procedure finds a SETUP_EXCEPT or SETUP_FINALLY handler (as it did find a SETUP_EXCEPT in main), it marks that block as "exception handling work in progress", switching the block entry opcode to EXCEPT_HANDLER, then dumps some exception state onto the stack for the handler code to observe, and then jumps to the handler.

It then becomes the trapped-to handler bytecode's responsibility to figure out whether it is appropriate to handle the particular exception that was raised (e.g. is it a ValueError clause for a ValueError-subclass exception?), and if not, continue down the chain of except-clauses that may ultimately result in re-raising the exception.

When except-clauses are present, the exception is re-raised in practice by jumping to an END_FINALLY bytecode at the end of the chain of the except-clause tests.
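
Here's a behavioral sketch of that re-raise path (the function name is made up): when none of the except clauses match, the END_FINALLY terminating the chain re-raises, and the unwind continues into the caller.

def no_matching_clause():
    try:
        raise KeyError('k')
    except ValueError:
        return 'handled ValueError'
    except TypeError:
        return 'handled TypeError'
    # Neither clause matches, so the END_FINALLY at the end of the
    # except-clause chain re-raises the KeyError out of this frame.

try:
    no_matching_clause()
except KeyError as e:
    print('re-raised into the caller:', e)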

Back to the trace

Given this context, we can now make some sense of what we saw on the stack: when the unwinding process found an appropriate handler, the CPython interpreter unwinding procedure placed the relevant data onto the stack for the handler bytecode to observe, and then jumped to the handler.

  • The interpreter pushed the traceback, value, and exception type for the exception formerly being handled (so we can deal with potentially-nested exception handling).

    In this case, since we're not nested in another exception handler, there is nothing to describe, so it pushed (nullptr, nullptr, None). Fun that this is a way you actually get null pointers on the value stack!

  • Then the interpreter pushed the traceback, value, and exception type for the exception currently being raised. The exception handlers will perform tests on this to see if they're appropriate to handle the exception (e.g. should run the statements contained within except-clauses).

Generally, with the values set up like this on the stack, the exception handler can process the top three values to see if it wants to quash the exception, and delegate to an END_FINALLY to either restore the exception formerly being handled or re-raise the exception if the newly raised exception was unhandled after all.

Aside: Something that wasn't obvious to me is that traceback objects aren't sequences of filenames / line numbers; they actually refer to frames (which may be dead / done executing), and external routines walk that chain of frames, e.g. to print tracebacks as text.
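
A quick way to see that for yourself, using nothing but the standard sys module (the function names are made up):

import sys

def inner():
    raise ValueError('boom')

def outer():
    inner()

try:
    outer()
except ValueError:
    tb = sys.exc_info()[2]
    # Each traceback node refers to a frame (tb_frame) plus the line number
    # at the time (tb_lineno), and chains to the next, more deeply nested
    # node via tb_next; the text rendering routines walk this chain.
    while tb is not None:
        print(tb.tb_frame.f_code.co_name, tb.tb_lineno)
        tb = tb.tb_next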

Variations

From this basic understanding we can play with syntactic features in the example to see how CPython implements various things.

Bare except

Simpler than the original -- with a bare "except" clause we don't need to do an exception match at all:

>>> def main(a, b, c):
...     try:
...         f(a, b, c)
...     except:
...         pass
...     else:
...         raise AssertionError
>>> dis.dis(main)  # [cdl] With my annotations inline!
  2           0 SETUP_EXCEPT            16 (to 18)

  3           2 LOAD_GLOBAL              0 (f)
              4 LOAD_FAST                0 (a)
              6 LOAD_FAST                1 (b)
              8 LOAD_FAST                2 (c)
             10 CALL_FUNCTION            3
             12 POP_TOP
             14 POP_BLOCK
             16 JUMP_FORWARD            12 (to 30)

# [cdl] No test! The END_FINALLY here seems like dead code.
  4     >>   18 POP_TOP
             20 POP_TOP
             22 POP_TOP

  5          24 POP_EXCEPT
             26 JUMP_FORWARD             2 (to 30)
             28 END_FINALLY
        >>   30 LOAD_CONST               0 (None)
             32 RETURN_VALUE

In this example, we see the AST-to-bytecode emitter doesn't play too many optimization tricks -- even if the END_FINALLY is apparently dead (unreachable), it'll still be emitted. Similarly for the emission of SETUP_LOOP blocks even without a continue or break keyword being present in the loop.

Additionally, some of the bytecode ordering decisions seem arbitrary so far as I can tell; e.g. why put a SETUP_LOOP bytecode above iterable-creation instead of below it, when the only thing that observes a SETUP_LOOP on the block stack is a continue or break statement? May relate to debugging tools, but it's hard for me to intuit the reason.

Binding the exception value

If we bind the exception to a name, we can see CPython insert a little finally block to ensure the exception binding doesn't leak out beyond the scope.

>>> def main(a, b, c):
...     try:
...         f(a, b, c)
...     except TypeError as e:
...         pass
...     else:
...         raise AssertionError
>>> dis.dis(main)  # [cdl] With my annotations inline!
 11           0 SETUP_EXCEPT            16 (to 18)

 12           2 LOAD_GLOBAL              0 (f)
              4 LOAD_FAST                0 (a)
              6 LOAD_FAST                1 (b)
              8 LOAD_FAST                2 (c)
             10 CALL_FUNCTION            3
             12 POP_TOP
             14 POP_BLOCK
             16 JUMP_FORWARD            34 (to 52)

 13     >>   18 DUP_TOP
             20 LOAD_GLOBAL              1 (TypeError)
             22 COMPARE_OP              10 (exception match)
             24 POP_JUMP_IF_FALSE       50
             26 POP_TOP  # [cdl] Don't need the type anymore.
# [cdl] Bind the exception value as 'e'.
             28 STORE_FAST               3 (e)
             30 POP_TOP  # [cdl] Don't need the traceback.
             # [cdl] Ensure we remove the binding for "e" regardless of
             # what the except block does so it doesn't leak out into the
             # enclosing scope.
             32 SETUP_FINALLY            4 (to 38)

# [cdl] Turns out our except clause doesn't do anything.
 14          34 POP_BLOCK
# [cdl] This none on the stack signals to END_FINALLY that nothing needs to be
# raised, the exception is quashed.
             36 LOAD_CONST               0 (None)

# [cdl] Clobber the exception by storing None to the slot, then deleting the
# slot (so subsequent LOAD_FAST references cause UnboundLocalErrors).
        >>   38 LOAD_CONST               0 (None)
             40 STORE_FAST               3 (e)
             42 DELETE_FAST              3 (e)
# [cdl] Artificially-introduced-finally block is now done (for unbinding "e").
             44 END_FINALLY
# [cdl] We know we got into an except handler so get rid of the EXCEPT_HANDLER on
# the block stack.
             46 POP_EXCEPT
# [cdl] Hop over the else clause.
             48 JUMP_FORWARD             6 (to 56)
        >>   50 END_FINALLY

# [cdl] Else clause is an assertion error.
 16     >>   52 LOAD_GLOBAL              2 (AssertionError)
             54 RAISE_VARARGS            1
# [cdl] Implicit return of None.
        >>   56 LOAD_CONST               0 (None)
             58 RETURN_VALUE

I can't intuit why exactly it's better to delete the exception name from locals than to have it leak out into the function scope, especially given that the other locals bound in the exception suite do leak out into function scope. (Perhaps it's just more resource-release friendly? Would be interesting to see the historical perspective that led to that decision if somebody has a reference.)
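
Whatever the rationale, the effect is easy to observe from pure Python (the function name is made up):

def observe_binding():
    try:
        raise TypeError('boom')
    except TypeError as e:
        pass
    # The compiler's implicit finally stored None into 'e' and deleted the
    # slot, so referencing it now raises UnboundLocalError.
    try:
        return e
    except UnboundLocalError:
        return 'e was unbound'

print(observe_binding())  # -> e was unbound

(For what it's worth, the Python 3 language reference motivates the deletion by noting that an exception with its traceback attached forms a reference cycle with the frame, which would otherwise keep all of the frame's locals alive until the next garbage collection pass.)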

try/except/finally

We can see that when we introduce a syntactic finally block into the mix, the bytecode emitter simply wraps a SETUP_FINALLY/END_FINALLY pair around the sequence we saw earlier (our handler block happens to be empty so it traps directly to an END_FINALLY).

>>> def main(a, b, c):
...     try:
...         f(a, b, c)
...     except TypeError:
...         pass
...     finally:
...         pass
>>> dis.dis(main)  # [cdl] With my annotations inline!
# [cdl] At the very beginning we set up a finally block.
 11           0 SETUP_FINALLY           42 (to 44)
# [cdl] Then we set up our except block which gets first dibs to look at an
# exception if one occurs (because it's newer on the block stack).
              2 SETUP_EXCEPT            16 (to 20)

 12           4 LOAD_GLOBAL              0 (f)
              6 LOAD_FAST                0 (a)
              8 LOAD_FAST                1 (b)
             10 LOAD_FAST                2 (c)
             12 CALL_FUNCTION            3
             14 POP_TOP
             16 POP_BLOCK
             18 JUMP_FORWARD            20 (to 40)

 13     >>   20 DUP_TOP
             22 LOAD_GLOBAL              1 (TypeError)
             24 COMPARE_OP              10 (exception match)
             26 POP_JUMP_IF_FALSE       38
             28 POP_TOP
             30 POP_TOP
             32 POP_TOP

 14          34 POP_EXCEPT
             36 JUMP_FORWARD             2 (to 40)
        >>   38 END_FINALLY
        >>   40 POP_BLOCK
              # [cdl] Running END_FINALLY with None on the stack indicates
              # everything is ok.
             42 LOAD_CONST               0 (None)

 16     >>   44 END_FINALLY
             46 LOAD_CONST               0 (None)
             48 RETURN_VALUE

We see that on the "happy path" into an END_FINALLY, the bytecode emitter pushes a None to tell the END_FINALLY that all is well and no exception needs to be re-raised.

Other Constructs

Loop control

Loop breaks and continues are also "exceptional control transfers" that are handled via the unwinder: the BREAK_LOOP and CONTINUE_LOOP opcodes just set the "why" flag and go to the unwinder sequence. The CPython source notes that these might be considered "pseudo exceptions".

A clear reason one might think of these as "pseudo exceptions" is that a continue, for example, can cause you to exit lexical blocks, like the try block in the snippet below:

accum = 0
for i in range(4):
    try:
        if i % 2 == 0:
            continue
    finally:
        accum += i

assert accum == 0+1+2+3, accum

Even though this code continues to the next iteration, the finally clause must trigger, so the block stack unwinder must kick in. The continue nested within a lexical block actually causes the CONTINUE_LOOP bytecode to be emitted, which acts like a pseudo-exception-raise -- it unwinds the block stack until the nearest SETUP_LOOP on the block stack is encountered.
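
Here's a sketch you can run to see that for yourself, assuming the Python 3.7 compilation scheme used throughout this post (the function names are made up):

import dis

def plain_continue(xs):
    for x in xs:
        if x:
            continue    # Nothing to unwind: just a jump back to the loop head.

def guarded_continue(xs):
    for x in xs:
        try:
            if x:
                continue    # Has to unwind through the finally first.
        finally:
            pass

# On 3.7, the first should compile the continue to a plain JUMP_ABSOLUTE back
# to FOR_ITER, while the second should emit CONTINUE_LOOP, the
# pseudo-exceptional transfer that goes through the unwinder.
dis.dis(plain_continue)
dis.dis(guarded_continue)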

But wait, given that we have to execute that finally block, how do we "remember" to continue on to the head of the loop when we're done? How does the "continuation" from the end of the finally block to the loop header occur?

As it turns out, the unwinder has a way to signal to the finally block, which is terminated by the END_FINALLY bytecode, that when it's done it should proceed with the "continue" procedure: it pushes the "we're continuing" status code onto the stack before jumping into the finally block, along with the target bytecode PC we were continuing to. Eventually the unwind process will hit the SETUP_LOOP that the continue is nested within when all the finally blocks are taken care of.

The same goes for a return that has to unwind through a finally block, but instead of a target program counter being pushed onto the stack, it's the return value that we should keep trying to return through the remainder of the unwind process. The "we're returning" status code is placed on the stack in this case, which discriminates between e.g. continuing a loop and returning a value.
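
A behavioral sketch of the return case (names made up): the finally suite runs first, and then the pending return value continues on its way out of the frame.

events = []

def returns_through_finally():
    try:
        return 'the value'    # The unwinder stashes this value and a "we're returning" status...
    finally:
        events.append('finally ran first')    # ...and runs the finally suite before the return completes.

assert returns_through_finally() == 'the value'
assert events == ['finally ran first']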

with blocks

with blocks add a few new bytecodes on top of what we've already established, but no new block-stack entry kinds. That seems to be because they're essentially a fancy kind of potentially-exception-quashing finally block.

There is a SETUP_WITH that uses a SETUP_FINALLY block stack entry (since with is finally-like), and then the combination of WITH_CLEANUP_START and WITH_CLEANUP_FINISH is used to potentially quash a raised exception via the context manager's __exit__ function.

I'll leave a detailed example of with block execution for another day.
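
Still, the quashing half of that contract is observable from pure Python: if __exit__ returns a truthy value, the cleanup sequence swallows the in-flight exception. A minimal sketch (the class name is made up):

class Quasher:
    def __enter__(self):
        return self
    def __exit__(self, exc_type, exc_value, tb):
        # Returning a truthy value tells the WITH_CLEANUP_* bytecodes to
        # quash the exception raised inside the with block.
        return True

with Quasher():
    raise ValueError('swallowed by __exit__')

print('still running')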

yield

Yielding leaves the frame state (block stack and value stack) intact, simply popping the yielded value and returning it as the result of the frame's execution so far, waiting for later resumption.

This implies that a finally block guarding a yield does not execute until after the frame is resumed from its yield, as we can see in the following:

>>> def my_generator():
...     try:
...         yield 42
...     finally:
...         print('whoo')
>>> g = my_generator()
>>> next(g)
42
>>> next(g)
whoo
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration

Recapping exceptions, case analysis

Just to recap what we've seen of exception handling with some case analysis:

  1. You try-except for an exception to potentially happen, and it does not happen.
  2. You try-except for an exception to potentially happen, and it does happen.
  3. You try-except for an exception to potentially happen, but a different one happens.

In (1) we execute a SETUP_EXCEPT bytecode, which pushes a block onto the block stack, but then we get to the end of the try scope, and we POP_BLOCK to lop it back off the block stack.

Both (2) and (3) follow a common procedure: our SETUP_EXCEPT is transmogrified into an EXCEPT_HANDLER via the unwinding procedure as we pop down to the expected value stack level for the handler (in case there was extra expression cruft on the stack). We then push the thread's exc_info (the exception formerly being handled) onto the stack, make the curexc (the exception currently being raised) the new exc_info, push it as well, and jump to the handler PC noted in the block stack entry. This is where (2) differs from (3) -- the code in the handler decides whether or not the exception is a match, or whether it should jump to another handler. If no handler in the except-clause chain claims responsibility, we hit the END_FINALLY, which re-raises the exception, and the unwind procedure continues "upwards".

Some takeaways

  • A frame activation either raises one exception or returns one value -- exclusive-or, never both.

  • Can we enter a frame under an exception? Sure.

    def f():
        try:
            # Trigger an EXCEPT_HANDLER so we can see both the exception being
            # raised (from the caller) and the exception being handled (from
            # right here) on the stack.
            raise TypeError
        except TypeError as e:
            print(e)
    
    def main():
        try:
            raise ValueError
        finally:
            f()  # Invoke f under an exception state.
    
    main()
    
  • The above leads to the more general property: exception handling is stacky. You can run code in a handler or a finally block that causes more exceptions to happen.

  • Stacky exception handling can be paused/resumed within generators.

    >>> def my_generator():
    ...     try:
    ...         raise ValueError
    ...     except:
    ...         yield 42
    ...         raise
    ...
    >>> g = my_generator()
    >>> next(g)
    42
    >>> sys.exc_info()
    (None, None, None)
    >>> next(g)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "<stdin>", line 3, in my_generator
    ValueError
    

    (Notably, the previous sequence doesn't seem to work as expected on Python 2.7!) There is specific code that puts generator-specific exception state onto and off of the exception state stack.

  • You can re-raise stuff that didn't come from your frame, CPython traverses to get the top-most exception:

    >>> def reraiser():
    ...     raise
    ...
    >>> try:
    ...   raise ValueError
    ... except:
    ...   reraiser()
    ...
    Traceback (most recent call last):
      File "<stdin>", line 4, in <module>
      File "<stdin>", line 2, in <module>
    ValueError
    

    A somewhat surprising implication of this is that it's easy to grab hold of a value set by code arbitrarily older than you on the stack (if you're running under a handler).

  • When can the stack level be non-zero for a SETUP_EXCEPT block stack record? Sometimes bytecodes persistently push stuff onto the stack across statements, waiting for another bookend to come along.
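
    A minimal case, assuming the compilation scheme we've been looking at (the function and parameter names are made up): the for-loop's iterator lives on the value stack for the duration of the loop, so a SETUP_EXCEPT executed inside the loop body records a stack level of 1 rather than 0.

    def try_inside_loop(fns):
        for fn in fns:      # GET_ITER/FOR_ITER keep the iterator on the value stack...
            try:            # ...so this SETUP_EXCEPT records a non-zero level.
                fn()
            except TypeError:
                pass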

Understanding the nuances of the CPython block stack has been fun (also because I had forgotten most of what I once knew about the SpiderMonkey VM's block stack)! Hope this prose has offered some helpful examples / potential guidance for others who end up exploring it as well.

Appendix: CPython Notes

Primarily I've been reading the Python 3.7 sources, and all the world tends to unfurl from ceval.c!

label: fast_block_end

Control is transferred in the interpreter loop to fast_block_end (source) when we don't follow the normal "stacky" evaluation flow; e.g. RAISE_VARARGS bytecode transfers control to this label, RETURN_VALUE does as well, {BREAK,CONTINUE}_LOOP, END_FINALLY, and if we flag that an exception has occurred in something the interpreter loop has called (e.g. if you go to load a value but find it is an unbound local).

Many bytecodes call some Python API function, check whether the result of it is null, and if so transfer control to the error label that falls through to the fast_block_end handler with the why status code set to WHY_EXCEPTION.

If we pop the whole block stack for this frame as part of the unwinding procedure, then we have an unhandled exception: we then assert that PyErr_Occurred is true so that we know the caller will be able to see the error, and we return nullptr so that they know to go to their exception handling case.

As mentioned earlier, an opcode like SETUP_EXCEPT doesn't do anything besides put itself onto the block stack -- fast_block_end does the work of observing that a SETUP_EXCEPT is present on the block stack and doing the right corresponding thing once an exception occurs. An implication of this is that the SETUP_{EXCEPT,FINALLY} and fast_block_end are defined symbiotically, and the bytecode emitter must navigate their mutual contract.

When fast_block_end unwinds an exception through a SETUP_{EXCEPT,FINALLY} block, it pushes the previous exception state, and then the currently-raised exception that we will attempt to handle.

bytecode: SETUP_FINALLY

Just like the SETUP_{EXCEPT,LOOP} bytecodes, the SETUP_FINALLY bytecode (source) simply pushes an entry onto the block stack with the same opcode. The key to its functionality is "what happens to it when we're unwinding and find such an entry".

If we unwind the block stack and encounter a SETUP_FINALLY block, we need to execute the "finally" code. That's the idea of "finally": it's supposed to run unconditionally, regardless of whether exceptions are going on or not.

The way the interpreter loop communicates to the finally handler is by:

  • Optionally pushing the "return value" if we're returning (or continuing, which can also take a value similar to a return).
  • Pushing the "why" code that caused us to transfer control to the finally handler.
  • Jumping to the finally handler.

The finally handler will then do its thing... unlike an except block, it doesn't require knowledge about values from the outside world, so it can do expression evaluation with its stack values sitting on top of the "why" code that we pushed earlier. When the finally block is ultimately complete, we hit the END_FINALLY bookend that will observe the reason we entered the finally block and keep it going (now that we've executed everything in the finally-handler bytecode sequence).
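
One visible consequence of that handoff (a sketch, with a made-up function name): if the finally suite itself returns, control never reaches END_FINALLY, so the pending "we're returning" state is discarded and the finally's result wins.

def finally_wins():
    try:
        return 'from try'
    finally:
        # This return supersedes the pending return pushed by the unwinder;
        # the END_FINALLY bookend is never reached.
        return 'from finally'

assert finally_wins() == 'from finally'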

bytecode: POP_EXCEPT

The POP_EXCEPT bytecode (source) pops the block stack (the top entry must be an EXCEPT_HANDLER) and unwinds the except handler block, restoring the previous exception state; i.e. we made it to the end of an except clause's bytecode sequence, so the current exception is quashed and the previous exception state should be restored for potential handling "outside" this construct.

macro: UNWIND_EXCEPT_HANDLER(b)

The UNWIND_EXCEPT_HANDLER macro (source) pops the value stack to the block b's level + 3, then restores the remaining three values into exc_info (exception currently being handled), while dropping refs on the previous exc_info.

bytecode: END_FINALLY

The END_FINALLY bytecode (source) pops a "status" off the stack: this is either a WHY_* enum value integer, an exception class, or None. Those values change the behavior of the END_FINALLY drastically.

  • None is the common/easy case: we simply dispatch the next bytecode.
  • The other values generally mean we'll take the asynchronous control transfer path (fast_block_end).
    • If we see an exception class as the status, the END_FINALLY is likely being used as the end of an except-clause chain: we pop the value and traceback (since we've already popped the exc_type), and restore it as the current exception, set the exception status, and start the unwind procedure.

END_FINALLY can be used as the jump target for the end of an except-clause chain -- the exception class will be at the top of the stack after the except-clause has tested-and-failed, and so it will do its "re-raise" behavior.

This was mentioned above, but is an observation that stands out particularly with END_FINALLY: the Python bytecode emitter emits bytecodes in a stylized fashion that works with the bytecode/unwinder contract. The fact that END_FINALLY works under a few somewhat distinct circumstances is an implementation choice of the CPython VM. It would definitely be interesting to find artifacts explaining why an END_FINALLY that does these few different things was chosen over the alternatives!