Python's generators sure are handy
While rewriting some older code today, I ran across a good example of the clarity inherent in Python's generator expressions. Some time ago, I had written this weirdo construct:
    for regex in date_regexes:
        match = regex.search(line)
        if match:
            break
    else:
        return
    # ... do stuff with the match
The syntax highlighting makes the problem fairly obvious: there's way too much syntax!
First of all, I used the semi-obscure "for-else" construct. For those of you who don't read the Python BNF grammar for fun (as in: the for statement), the definition may be useful:
So long as the for loop isn't (prematurely) terminated by a break statement, the code in the else suite runs once the loop finishes. To restate (from the other direction): the code in the else suite doesn't run if the for loop is terminated with a break statement. From this definition we can deduce that if a match was found, I did not want to return early.
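For anyone who'd rather see that rule in action than parse it, here's a minimal sketch. The function name find_first_even and the sample lists are made up for illustration; they aren't part of the code I was rewriting.

```python
def find_first_even(numbers):
    """Describe the first even number in `numbers`, using for-else."""
    for n in numbers:
        if n % 2 == 0:
            break  # leaving via break skips the else suite
    else:
        # Reached only when the loop finished without hitting break
        # (which includes the case where `numbers` is empty).
        return "no even number"
    return f"found {n}"


print(find_first_even([1, 3, 4, 5]))  # found 4
print(find_first_even([1, 3, 5]))     # no even number
print(find_first_even([]))            # no even number
```

Note that the empty list also lands in the else suite, since a loop that never runs can't have been broken out of.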
That's way too much stuff to think about. Generators come to the rescue!
    def first(iterable):
        """
        :return: The first item in the iterable that evaluates as True.
        """
        for item in iterable:
            if item:
                return item
        return None

    match = first(regex.search(line) for regex in regexes)
    if not match:
        return
    # ... do stuff with the match
At a glance, this is much shorter and more comprehensible. We pass a generator expression to the first function, which performs a kind of short-circuit evaluation — as soon as a match is found, we stop running regexes (which can be expensive). This is a pretty rockin' solution, so far as I can tell.
Prior to generator expressions, to do something similar to this we'd have to use a list comprehension, like so:
    match = first([regex.search(line) for regex in regexes])
    if not match:
        return
    # ... do stuff with the match
We dislike this because the list comprehension will run all of the regexes, even if one already found a match. What we really want is the short-circuit evaluation provided by generator expressions and the first function, as shown above. Huzzah!
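If you want to convince yourself of the difference, here's a rough sketch that counts how many regexes actually get run in each version. The traced_search wrapper, the calls list, and the three date patterns are assumptions I made up for the demo; only first comes from the code above.

```python
import re

calls = []  # records which patterns were actually searched


def traced_search(regex, line):
    """Hypothetical wrapper: log the pattern, then search as usual."""
    calls.append(regex.pattern)
    return regex.search(line)


def first(iterable):
    """Same helper as above: the first truthy item, else None."""
    for item in iterable:
        if item:
            return item
    return None


# Made-up sample patterns; the first one matches the sample line.
regexes = [re.compile(p) for p in (r"\d{4}-\d{2}-\d{2}", r"\d{2}/\d{2}/\d{4}", r"\d{8}")]
line = "released 2009-03-14"

# List comprehension: every regex runs before first() sees anything.
first([traced_search(r, line) for r in regexes])
list_calls = len(calls)

calls.clear()

# Generator expression: first() pulls results one at a time and stops early.
first(traced_search(r, line) for r in regexes)
gen_calls = len(calls)

print(list_calls, gen_calls)  # 3 1
```

The list version pays for all three searches; the generator version stops after the first hit.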
Originally I thought that the any built-in returned the first object which evaluated to a boolean True, but it actually returns the boolean True if any of the objects evaluate to True. I've edited the post to correct my mistake.
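To make the distinction concrete, here's a small sketch comparing any with a way to get the first-truthy-item behavior out of built-ins alone, using next with a default. The two patterns and the sample strings are made up for illustration.

```python
import re

# Made-up sample patterns: the first won't match the line, the second will.
regexes = [re.compile(r"[a-z]+@"), re.compile(r"\d+")]
line = "order 66"

# any() collapses everything to a bool, so the match object is lost.
found = any(regex.search(line) for regex in regexes)
print(found)  # True

# next() over a filtered generator returns the first truthy item itself,
# or the default (None here) when nothing matches.
match = next((m for m in (regex.search(line) for regex in regexes) if m), None)
print(match.group())  # 66

no_match = next((m for m in (regex.search("no digits") for regex in regexes) if m), None)
print(no_match)  # None
```

Both forms are lazy and stop at the first hit. Whether the next one-liner reads better than a named first helper is a matter of taste.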