Posts Tagged ‘Refactoring’

Idiomatic Python refactoring: for-else, “in” (contains) operator

Wednesday, October 1st, 2008

I was perusing the App Engine SDK and I came across this snippet:

if self.choices:
  match = False
  for choice in self.choices:
    if choice == value:
      match = True
  if not match:
    raise BadValueError('Property %s is %r; must be one of %r' %
                        (self.name, value, self.choices))

Since I don’t work with many other Python programmers, I always have trouble figuring out what interesting tidbits would be useful to post in, say, a blog entry. I don’t have a good understanding of the popular knowledge level, but I figure that I can’t go too wrong refactoring code written by Google engineers (who I naively assume are all as cool as Steve Yegge). [*]

The for-else statement

Let’s forget about self for now [†] and refactor to use an obscure (but useful) Python feature, the for-else construct. for-else removes the need for the boolean-flag idiom from the original code, which is common in lower-level languages. [‡]

if choices:
    for choice in choices:
        if choice == value:
            break
    else:
        raise BadValueError

The for-else statement looks a little strange when you first encounter it, but I’ve come to love it. The else suite is evaluated if you don’t break out of the for loop. In this case, if we didn’t break out of the for loop, then we never found a choice equal to value.

We also gain some efficiency over the original by using the break statement as soon as we find a match: there’s no need to keep looking if you’ve already found a result! This can save you from iterating over all len(choices) items if you find it’s a valid choice in the first iteration.
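To make the for-else refactoring concrete, here is a minimal, self-contained sketch. BadValueError is defined locally as a stand-in for the SDK's exception, and validate is a hypothetical free function I made up for the demo, not the SDK's actual method.

```python
class BadValueError(Exception):
    """Local stand-in for the SDK's exception of the same name."""


def validate(value, choices):
    """Raise BadValueError unless value is one of choices (hypothetical helper)."""
    if choices:
        for choice in choices:
            if choice == value:
                break  # found a match; the else suite is skipped
        else:
            # Only reached when the loop finished without hitting break.
            raise BadValueError('%r must be one of %r' % (value, choices))
    return value


print(validate('b', ['a', 'b', 'c']))

try:
    validate('z', ['a', 'b', 'c'])
except BadValueError as exc:
    print('rejected:', exc)
```

Note that an empty choices list skips the check entirely, just like the original snippet.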

in (contains) operator

Here is an even more readable and Python-like refactoring that uses the in operator: [§]

if choices and value not in choices:
    raise BadValueError

The in operator works on any container or iterable object and does the same job as the code above: it looks for any item within choices such that item == value. For a list or tuple it scans from the front, so if it finds a match early, it won’t keep looking. This is the same behavior as our early break statement from the first refactoring.

Just like the original code with the for loop, the in operator raises a TypeError if choices is not iterable. The in operator is effectively a drop-in replacement for the (more verbose) for loop when it comes to membership testing.
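Here is a small sketch of those claims about in. The noisy generator is a hypothetical helper that records what it hands out, so we can see where the search stops; nothing here comes from the SDK.

```python
def noisy(items, seen):
    """Yield items one by one, recording each in seen as it is produced."""
    for item in items:
        seen.append(item)
        yield item


seen = []
found = 'b' in noisy(['a', 'b', 'c'], seen)
print(found, seen)  # 'c' is never produced: the search stopped at 'b'

# Membership is decided by equality, so 1.0 matches the choice 1.
print(1.0 in [1, 2, 3])

# A truthy non-iterable fails the same way in both spellings.
choices = 5
try:
    3 not in choices
except TypeError as exc:
    in_error = exc

try:
    for choice in choices:
        pass
except TypeError as exc:
    loop_error = exc

print(type(in_error).__name__, type(loop_error).__name__)
```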

Footnotes

[*] You should read his blog if you don’t already.
[†] For the language lawyers: we’re forgetting about the fact that this code was intended to be executed in a bound instance method. ;)
[‡] For example, C. For more information on programming languages and their "heights", see this Wikipedia entry.
[§] Yeah, yeah… technically it’s the not in operator.

Python’s generators sure are handy

Wednesday, January 23rd, 2008

While rewriting some older code today, I ran across a good example of the clarity inherent in Python’s generator expressions. Some time ago, I had written this weirdo construct:

for regex in regexes:
    match = regex.search(line)
    if match:
        break
else:
    return
# ... do stuff with the match

The syntax highlighting makes the problem fairly obvious: there’s way too much syntax!

First of all, I used the semi-obscure "for-else" construct. For those of you who don’t read the Python BNF grammar for fun (as in: the for statement), the definition may be useful:

So long as the for loop isn’t (prematurely) terminated by a break statement, the code in the else suite gets evaluated. To restate (from the other direction): the code in the else suite doesn’t get evaluated if the for loop is terminated with a break statement. From this definition we can deduce that if a match was found, I did not want to return early.
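If you’d rather see the rule than read it, this tiny sketch walks the three cases: a loop that breaks, a loop that runs to completion, and a loop over nothing at all. describe is a hypothetical helper written just for this demo.

```python
def describe(items, target):
    """Report which path a for-else loop takes (hypothetical demo helper)."""
    for item in items:
        if item == target:
            result = 'break: else skipped'
            break
    else:
        # Runs whenever the loop ends without break, including zero iterations.
        result = 'no break: else ran'
    return result


print(describe([1, 2, 3], 2))
print(describe([1, 2, 3], 9))
print(describe([], 9))
```

The empty-iterable case is the one that tends to surprise people: the loop body never runs, there is no break, and so the else suite runs.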

That’s way too much stuff to think about. Generators come to the rescue!

def first(iterable):
    """:return: The first item in the iterable that evaluates
    as True.
    """
    for item in iterable:
        if item:
            return item
    return None
 
match = first(regex.search(line) for regex in regexes)
if not match:
    return
# ... do stuff with the match

At a glance, this is much shorter and more comprehensible. We pass a generator expression to the first function, which performs a kind of short-circuit evaluation — as soon as a match is found, we stop running regexes (which can be expensive). This is a pretty rockin’ solution, so far as I can tell.

Prior to generator expressions, to do something similar to this we’d have to use a list comprehension, like so:

match = first([regex.search(line) for regex in regexes])
if not match:
    return
# ... do stuff with the match

We dislike this because the list comprehension will run all of the regexes, even if one already found a match. What we really want is the short-circuit evaluation provided by generator expressions and the first function, as shown above. Huzzah!
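To check the difference rather than take it on faith, this sketch counts how many regexes actually run in each version. The two date patterns, the sample line, and the search wrapper are all invented for the demo; only first comes from the post.

```python
import re


def first(iterable):
    """Return the first item that evaluates as True, or None."""
    for item in iterable:
        if item:
            return item
    return None


calls = []


def search(regex, line):
    """Wrap regex.search so we can record which patterns were tried."""
    calls.append(regex.pattern)
    return regex.search(line)


# Hypothetical patterns and input, made up for this example.
regexes = [re.compile(r'\d{4}-\d{2}-\d{2}'), re.compile(r'\d{2}/\d{2}/\d{4}')]
line = 'released 2008-01-23'

# Generator expression: items are produced only as first asks for them.
match = first(search(regex, line) for regex in regexes)
lazy_calls = list(calls)

# List comprehension: every regex runs before first sees anything.
calls.clear()
eager_match = first([search(regex, line) for regex in regexes])
eager_calls = list(calls)

print(match.group(0), len(lazy_calls), len(eager_calls))
```

Both versions return the same match; the generator version just does less work to get there.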

Edit

Originally I thought that the any built-in returned the first object which evaluated to a boolean True, but it actually returns the boolean True if any of the objects evaluate to True. I’ve edited to reflect my mistake.
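A quick sketch of the distinction, using plain values instead of regex matches. The last line shows one possible alternative to the first helper, the built-in next with a default (available since Python 2.6); that is my own suggestion here, not something the post relies on.

```python
items = [0, '', 'spam', 'eggs']

# any() answers "is anything truthy?" with a plain boolean.
print(any(items))

# next() with a filtered generator hands back the first truthy item itself,
# or the default (None here) if there isn't one.
print(next((item for item in items if item), None))
print(next((item for item in [0, ''] if item), None))
```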