December 26, 2010

A generator protocol trap

You should be cautious that the functions you call from generators don't accidentally raise StopIteration exceptions.


Generators are generally a little tricky in a behind-the-scenes sort of way. Performing a return from a generator is different from a function's return — a generator's return statement actually raises a StopIteration instance.

>>> def foo():
...     yield 1
...     return
>>> i = foo()
>>> next(i)
>>> try:
...     next(i)
... except StopIteration as e:
...     print(id(e), e)


There's a snag that I run into at times: when you're writing a generator and that generator calls into other functions, be aware that those callees may accidentally raise StopIteration exceptions themselves.

def insidious(whoops=True):
    if whoops:
        raise StopIteration
        return 2

def generator():
    yield 1
    yield insidious()
    yield 3

if __name__ == '__main__':
    print([i for i in generator()])

This program happily prints out [1].

If you substitute StopIteration with ValueError, you get a traceback, as you'd probably expect. The leakage of a StopIteration exception, however, propagates up to the code that moves the generator along, [*] which sees an uncaught StopIteration exception and terminates the loop.


The same trap exists in the SpiderMonkey (Firefox) JavaScript dialect:

function insidious() {
    throw StopIteration;

function generator() {
    yield 1;
    yield insidious();
    yield 3;

print(uneval([i for (i in generator())]));

Which prints [1].

VM guts

As a side note for those interested in the guts of virtual machines, the primordial StopIteration class is known to the virtual machine, regardless of user hackery with the builtins:

>>> def simple_generator():
...     yield 1
...     raise StopIteration
...     yield 3
>>> print([i for i in simple_generator()])
>>> class FakeStopIteration(Exception): pass
>>> __builtins__.StopIteration = FakeStopIteration
>>> print([i for i in simple_generator()])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 1, in <listcomp>
  File "<stdin>", line 3, in simple_generator


You may look at this issue and think:

The fundamental problem is that uncaught exceptions are raised over any number of function invocations. Generators should have been designed such that you have to stop the generator from the generator invocation.

In an alternate design, a special value (like a StopIteration singleton) might be yield'd from the generator to indicate that it's done.

One issue with that alternative approach is that you're introducing a special-value-check into the hot path within the virtual machine — i.e. you'd be slowing down the common process of yielding iteration items. Using an exception only requires the VM to extend the normal exception handling machinery a bit and adds no additional overhead to the hot path. I think the measurable significance of this overhead is questionable.

Another issue is that it hurts the lambda abstraction — namely, the ability to factor your generator function into smaller helper functions that also have the ability to terminate the generator. In the absence of a language-understood exception, the programmer has to invent a new way for the helper functions to communicate to the generator that they'd like the generator to terminate.



The FOR_ITER bytecode in CPython VM for-loops.