Learning Python by example: list comprehensions

My friend, who is starting to learn Python 2.x, asked me what this snippet did:

def collapse(seq):
    # Preserve order.
    uniq = []
    [uniq.append(item) for item in seq if not uniq.count(item)]
    return uniq

This is not a snippet that should be emulated (i.e. it’s bad); however, it makes me happy: there are so many things that can be informatively corrected!

What is a list comprehension?

A list comprehension is a special brackety syntax to perform a transform operation with an optional filter clause that always produces a new sequence (list) object as a result. To break it down visually, you perform:

new_range = [i * i          for i in range(5)   if i % 2 == 0]

Which corresponds to:

*result*  = [*transform*    *iteration*         *filter*     ]

The filter piece answers the question, "should this item be transformed?" If the answer is yes, then the transform piece is evaluated and becomes an element in the result. The iteration [*] order is preserved in the result.

Go ahead and figure out what you expect new_range to be in the prior example. You can double check me in the Python shell, but I think it comes out to be:

>>> new_range = [i * i for i in range(5) if i % 2 == 0]
>>> print new_range
[0, 4, 16]

If it still isn’t clicking, we can try to make the example less noisy by getting rid of the transform and filter — can you tell what this will produce?

>>> new_range = [i for i in range(5)]

So what’s wrong with that first snippet?

As we observed in the previous section, a list comprehension always produces a result list, where the elements of the result list are the transformed elements of the iteration. That means, if there’s no filter piece, there are exactly as many result elements as there were iteration elements.

Weird thing number one about the snippet — the list comprehension result is unused. It’s created, mind you — list comprehension always create a value, even if you don’t care what it is — but it just goes off to oblivion. (In technical terms, it becomes garbage.) When you don’t need the result, just use a for loop! This is better:

def colapse(seq):
    """Preserve order."""
    uniq = []
    for item in seq:
        if not uniq.count(item):
            uniq.append(item)
    return uniq

It’s two more lines, but it’s less weird looking and wasteful. "Better for everybody who reads and runs your code," means you should do it.

Moral of the story: a list comprehension isn’t just, "shorthand for a loop." It’s shorthand for a transform from an input sequence to an output sequence with an optional filter. If it gets too complex or weird looking, just make a loop. It’s not that hard and readers of your code will thank you.

Weird thing number two: the transform, list.append(item), produces None as its output value, because the return value from list.append is always None. Therefore, the result, even though it isn’t kept anywhere, is a list of None values of the same length as seq (notice that there’s no filter clause).

Weird thing number three: list.count(item) iterates over every element in the list looking for things that == to item. If you think through the case where you call collapse on an entirely unique sequence, you can tell that the collapse algorithm is O(n2). In fact, it’s even worse than it may seem at first glance, because count will keep going all the way to the end of uniq, even if it finds item in the first index of uniq. What the original author really wanted was item not in uniq, which bails out early if it finds item in uniq.

Also worth mentioning for the computer-sciency folk playing along at home: if all elements of the sequence are comparable, you can bring that down to O(n * log n) by using a "shadow" sorted sequence and bisecting to test for membership. If the sequence is hashable you can bring it down to O(n), perhaps by using the set datatype if you are in Python >= 2.3. Note that the common cases of strings, numbers, and tuples (any built-in immutable datatype, for that matter) are hashable.

From Python history

It’s interesting to note that Python Enhancement Proposal (PEP) #270 considered putting a uniq function into the language distribution, but withdrew it with the following statement:

Removing duplicate elements from a list is a common task, but there are only two reasons I can see for making it a built-in. The first is if it could be done much faster, which isn’t the case. The second is if it makes it significantly easier to write code. The introduction of sets.py eliminates this situation since creating a sequence without duplicates is just a matter of choosing a different data structure: a set instead of a list.

Remember that sets can only contain hashable elements (same policy as dictionary keys) and are therefore not suitable for all uniq-ifying tasks, as mentioned in the last paragraph of the previous section.

Footnotes

[*] "Iteration" is just a fancy word for "step through the sequence, element by element, and give that element a name." In our case we’re giving the name i.

Tags: ,

4 Responses to “Learning Python by example: list comprehensions”

  1. niall Says:

    I think its fair to say that list comprehensions were a straightforward port, in the 70s by Burstall and Darlington and included in a host of functional languages (Haskell, Miranda, KRC, and the other usual suspects), of the usual mathematical set building notation. {function | variable \in inputset, predicate}.

    Other data structures can be “comprehended”, sets, bags, etc. But Wadler later noted that the notation applies to any monad because there’s a correspondence between the 3 components of the monad and the forms of qualifier. That is, unit with the empty qualifier, map with the generators, and and join with the composition of the filters.

    I recall the Kleisli database work, in bioinformatics, used monad comprehension ideas to build a practical query language by restricting structural recursion to homomorphisms on the commutative idempotent monoid of sets. They rebuilt a language around this restriction (CPL). Later various DSLs have used similar monadic comprehension ideas in Haskell to build embedded query languages for databases.

    Then Erik Meijer started applying similar ideas in Visual Basic and C# (after his earlier research languages at MS) which brings us to LINQ.. which is essentially monad comprehensions sufficiently disguised as warm fuzzy things to get into a mainstream language. Interesting to watch C# arguably become more VB like to express various things – I seem to recall Erik arguing that VB was the best mainstream language for the integration of some of his ideas due to the mix of static and dynamic typing. Later of course other languages have added local type propagation to get a similar feel.. I argue with calling it type inference really since it makes people think that the incredibly weak form of it the see in C#/C++ (new meaning of auto) is really comparable to the immensely more powerful forms found in other languages.

  2. Michael Foord Says:

    Your alternative for the single line, easily readable, list comprehension is four lines that are less efficient because the loop happens in the interpreter rather than in C.

    In fact in Python 3 ‘map’ has been removed as a builtin because it can be completely replaced by list comprehensions – including where you use the “transform” for its side effect instead of building a list (which is usually how I use map).

    Using list comprehensions instead of a for loop isn’t prima-facie “bad” or less readable, even when you don’t need the result. Simple test – if you did need the result would the comprehension be easily understood? If the answer is yes then removing the assignment on the left hand side doesn’t magically make it less readable…

    Sure – sometimes unpacking a comprehension into a loop can be a win for readability, but this can be true whether you are keeping the list or not.

  3. nes Says:

    one way of looking at it is that list comprehensions in python are just syntactic sugar, this:

    mylist=[expression for elem in sourcelist]

    is equivalent to this:

    mylist=[]; for elem in sourcelist: mylist.append(expression)

    So your example:

    def collapse(seq):
        # Preserve order.
        uniq = []
        [uniq.append(item) for item in seq if not uniq.count(item)]
        return uniq

    Is equivalent of this:

    def collapse(seq):
        # Preserve order.
        uniq = []
        tmp=[]
        for item in seq:
            if not uniq.count(item):
                tmp.append(uniq.append(item))
        return uniq
  4. VaporWarning» Blog Archive » Efficiency of list comprehensions Says:

    [...] VaporWarning __author__ = 'Chris Leary'; class VaporWarning(UserWarning): raise NotImplementedError « Learning Python by example: list comprehensions [...]

Leave a Reply