Efficiency of list comprehensions

I’m psyched about the awesome comments on my previous entry, Python by example: list comprehensions. Originally this entry was just a response to those comments, but people who stumbled across this entry on the interwebz found the response format too confusing, so I’ve restructured it for posterity.

Efficiency of the more common usage

Let’s look at the efficiency of list comprehensions in the more common usage, where the comprehension’s list result is actually relevant (or, in compiler-speak, live-out).

Using the following program, you can see the time spent in each implementation and the corresponding bytecode sequence:

import dis
import inspect
import timeit
 
 
programs = dict(
    loop="""
result = []
for i in range(20):
    result.append(i * 2)
""",
   loop_faster="""
result = []
add = result.append
for i in range(20):
    add(i * 2)
""",
    comprehension='result = [i * 2 for i in range(20)]',
)
 
 
for name, text in programs.iteritems():
    print name, timeit.Timer(stmt=text).timeit()
    code = compile(text, '<string>', 'exec')
    dis.disassemble(code)
loop 11.1495118141
  2           0 BUILD_LIST               0
              3 STORE_NAME               0 (result)
 
  3           6 SETUP_LOOP              37 (to 46)
              9 LOAD_NAME                1 (range)
             12 LOAD_CONST               0 (20)
             15 CALL_FUNCTION            1
             18 GET_ITER
        >>   19 FOR_ITER                23 (to 45)
             22 STORE_NAME               2 (i)
 
  4          25 LOAD_NAME                0 (result)
             28 LOAD_ATTR                3 (append)
             31 LOAD_NAME                2 (i)
             34 LOAD_CONST               1 (2)
             37 BINARY_MULTIPLY
             38 CALL_FUNCTION            1
             41 POP_TOP
             42 JUMP_ABSOLUTE           19
        >>   45 POP_BLOCK
        >>   46 LOAD_CONST               2 (None)
             49 RETURN_VALUE
loop_faster 8.36096310616
  2           0 BUILD_LIST               0
              3 STORE_NAME               0 (result)
 
  3           6 LOAD_NAME                0 (result)
              9 LOAD_ATTR                1 (append)
             12 STORE_NAME               2 (add)
 
  4          15 SETUP_LOOP              34 (to 52)
             18 LOAD_NAME                3 (range)
             21 LOAD_CONST               0 (20)
             24 CALL_FUNCTION            1
             27 GET_ITER
        >>   28 FOR_ITER                20 (to 51)
             31 STORE_NAME               4 (i)
 
  5          34 LOAD_NAME                2 (add)
             37 LOAD_NAME                4 (i)
             40 LOAD_CONST               1 (2)
             43 BINARY_MULTIPLY
             44 CALL_FUNCTION            1
             47 POP_TOP
             48 JUMP_ABSOLUTE           28
        >>   51 POP_BLOCK
        >>   52 LOAD_CONST               2 (None)
             55 RETURN_VALUE
comprehension 7.08145213127
  1           0 BUILD_LIST               0
              3 DUP_TOP
              4 STORE_NAME               0 (_[1])
              7 LOAD_NAME                1 (range)
             10 LOAD_CONST               0 (20)
             13 CALL_FUNCTION            1
             16 GET_ITER
        >>   17 FOR_ITER                17 (to 37)
             20 STORE_NAME               2 (i)
             23 LOAD_NAME                0 (_[1])
             26 LOAD_NAME                2 (i)
             29 LOAD_CONST               1 (2)
             32 BINARY_MULTIPLY
             33 LIST_APPEND
             34 JUMP_ABSOLUTE           17
        >>   37 DELETE_NAME              0 (_[1])
             40 STORE_NAME               3 (result)
             43 LOAD_CONST               2 (None)
             46 RETURN_VALUE

List comprehensions perform better here because you don’t need to load the append attribute off of the list (loop program, bytecode 28) and call it as a function (loop program, bytecode 38). Instead, in a comprehension, a specialized LIST_APPEND bytecode is generated for a fast append onto the result list (comprehension program, bytecode 33).

In the loop_faster program, you avoid the overhead of the append attribute lookup by hoisting it out of the loop and placing the result in a fastlocal (bytecode 9-12), so it loops more quickly; however, the comprehension uses a specialized LIST_APPEND bytecode instead of incurring the overhead of a function call, so it still trumps.

Using list comprehensions for side effects

I want to address a point that was brought up in the previous entry as to the efficiency of for loops versus list comprehensions when used purely for side effects, but I’ll discuss the subjective bit first, since that’s the least sciency part.

Readability

Simple test – if you did need the result would the comprehension be easily understood? If the answer is yes then removing the assignment on the left hand side doesn’t magically make it less readable…

Michael Foord

First of all, thanks to Michael for his excellent and thought provoking comment!

My response is that removing the use of the result does indeed make it less readable, precisely because you’re using a result-producing control flow construct where the result is not needed. I suppose I’m positing that it’s inherently confusing to do that with your syntax: there’s a looping form that doesn’t produce a result, so that should be used instead. It’s expressing your semantic intention via syntax.

For advanced Pythonistas it’s easy for figure out what’s going on at a glance, but comprehension-as-loop definitely has a "there’s more than one way to do it" smell about it, which also makes it less amenable to people learning the language.

With a viable comprehension-as-loop option, every time a user goes to write a loop that doesn’t require a result they now ask themselves, "Can I fit this into the list comprehension form?" Those mental branches are, to me, what "one way to do it" is designed to avoid. When I read Perl code, I take "mental exceptions" all the time because the author didn’t use the construct that I would have used in the same situation. Minimizing that is a good thing, so I maintain that "no result needed" should automatically imply a loop construct.

Efficiency

Consider two functions, comprehension and loop:

def loop():
    accum = []
    for i in range(20):
        accum.append(i)
    return accum
 
 
def comprehension():
    accum = []
    [accum.append(i) for i in range(20)]
    return accum

N.B. This example is comparing the efficiency of a list comprehension where the result of the comprehension is ignored to a for loop that produces no result, as is discussed in the referenced entry, Python by example: list comprehensions.

Michael Foord comments:

Your alternative for the single line, easily readable, list comprehension is four lines that are less efficient because the loop happens in the interpreter rather than in C.

However, the disassembly, obtained via dis.dis(func) looks like the following for the loop:

2           0 BUILD_LIST               0
            3 STORE_FAST               0 (accum)
 
3           6 SETUP_LOOP              33 (to 42)
            9 LOAD_GLOBAL              0 (range)
           12 LOAD_CONST               1 (20)
           15 CALL_FUNCTION            1
           18 GET_ITER
      >>   19 FOR_ITER                19 (to 41)
           22 STORE_FAST               1 (i)
 
4          25 LOAD_FAST                0 (accum)
           28 LOAD_ATTR                1 (append)
           31 LOAD_FAST                1 (i)
           34 CALL_FUNCTION            1
           37 POP_TOP
           38 JUMP_ABSOLUTE           19
      >>   41 POP_BLOCK
 
5     >>   42 LOAD_FAST                0 (accum)
           45 RETURN_VALUE

And it looks like the following for the comprehension:

2           0 BUILD_LIST               0
            3 STORE_FAST               0 (accum)
 
3           6 BUILD_LIST               0
            9 DUP_TOP
           10 STORE_FAST               1 (_[1])
           13 LOAD_GLOBAL              0 (range)
           16 LOAD_CONST               1 (20)
           19 CALL_FUNCTION            1
           22 GET_ITER
      >>   23 FOR_ITER                22 (to 48)
           26 STORE_FAST               2 (i)
           29 LOAD_FAST                1 (_[1])
           32 LOAD_FAST                0 (accum)
           35 LOAD_ATTR                1 (append)
           38 LOAD_FAST                2 (i)
           41 CALL_FUNCTION            1
           44 LIST_APPEND
           45 JUMP_ABSOLUTE           23
      >>   48 DELETE_FAST              1 (_[1])
           51 POP_TOP
 
4          52 LOAD_FAST                0 (accum)
           55 RETURN_VALUE

By looking at the bytecode instructions, we see that the list comprehension is, at a language level, actually just "syntactic sugar" for the for loop, as mentioned by nes — they both lower down into the same control flow construct at a virtual machine level, at least in CPython.

The primary difference between the two disassemblies is that a superfluous list comprehension result is stored into fastlocal 1, which is loaded (bytecode 29) and appended to (bytecode 44) each iteration, creating some additional overhead — it’s simply deleted in bytecode 48. Unless the POP_BLOCK operation (bytecode 41) of the loop disassembly is very expensive (I haven’t looked into its implementation), the comprehension disassembly is guaranteed to be less efficient.

Because of this, I believe that Michael was mistaken in referring to an overhead that results from use of a for loop versus a list comprehension for CPython. It would be interesting to perform a survey of the list comprehension optimization techniques used in various Python implementations, but optimization seems difficult outside of something like a special Cython construct, because LOAD_GLOBAL range could potentially be changed from the builtin range function. Various issues of this kind are discussed in the (very interesting) paper The effect of unrolling and inlining for Python bytecode optimizations.

Tags: ,

25 Responses to “Efficiency of list comprehensions”

  1. Ian Says:

    I recall years ago being told that list comprehensions were faster than for loops for the reason Michael mentioned, despite building a list and throwing it away. Perhaps this has always been incorrect, but I wonder if somewhere around Python 2.3 something changed. This was pretty universally accepted truth years ago…

  2. Ian Says:
    >>> s="""\
    ... for x in range(200): x*2
    ... """
    >>> t = timeit.Timer(stmt=s)
    >>> t.timeit(number=10000)
    0.18480706214904785
    >>> s="""\
    ... [x*2 for x in range(200)]
    ... """
    >>> t =  timeit.Timer(stmt=s)
    >>> t.timeit(number=10000)
    0.25453495979309082
  3. Michael Foord Says:

    Ok, surprised that list comprehensions aren’t more efficient – but after playing with timeit that’s correct. However I still disagree that using a list comprehension as a looping construct is unreadable… :-)

  4. cdleary Says:

    @Michael Foord: Just a nit: not unreadable, less readable, enough for me to care. :-)

  5. André Roberge Says:

    I think you are missing the point of a list comprehension. Contrary to a for loop, it does NOT have to use append – which is, in some ways, the whole point of using a list comprehension. Here’s how your comprehension() function should have been written:

    >>> def comprehension():
    ...     accum = [i for i in range(20)]
    ...     return accum

    And the resulting byte code is:

      2           0 BUILD_LIST               0
                  3 DUP_TOP             
                  4 STORE_FAST               0 (_[1])
                  7 LOAD_GLOBAL              0 (range)
                 10 LOAD_CONST               1 (20)
                 13 CALL_FUNCTION            1
                 16 GET_ITER            
            >>   17 FOR_ITER                13 (to 33)
                 20 STORE_FAST               1 (i)
                 23 LOAD_FAST                0 (_[1])
                 26 LOAD_FAST                1 (i)
                 29 LIST_APPEND         
                 30 JUMP_ABSOLUTE           17
            >>   33 DELETE_FAST              0 (_[1])
                 36 STORE_FAST               2 (accum)
     
      3          39 LOAD_FAST                2 (accum)
                 42 RETURN_VALUE

    This has fewer bytecode instructions than your for loop example and your comprehension example.

    The example given by Ian in a comment is misleading I believe: the for loop does not create a list. The purpose of a list comprehension is to create a list; if a list is not needed, why create one using a list comprehension?

  6. bc Says:

    Memory allocation is the bottleneck when creating a big list. Thus, in the example above, the for-loop gets an unfair advantage by not having to allocate space for it’s entire result. When I compare

    a=[]
    for i in xrange(200):
        a.append(i*2)

    to

    a = [i*2 for i in xrange(200)]

    I find the list comprehension is ~2x faster than the standard for loop.

  7. cdleary Says:

    @André Roberge: Good point, I didn’t provide enough background: the crux of the discussion in the referenced entry, Learning Python by example: list comprehensions, is whether ignoring the result of a list comprehension is still as efficient as a for loop. I had to provide exactly equivalent constructs to test the efficiency: one which used a for loop construct, and another which used a list comprehension and threw the result away. Thanks for commenting and helping that get cleared up!

  8. Doug Napoleone Says:

    It all comes down to if you are trying to build an actual result.
    The timeit comment above I view as a false example, as it is doing work, but then just throwing it out. You never do that. You are always trying to produce SOME result.

    This is where generators come into play. It is much easier to prototype with list comprehension and then turn it into a generator.

    For example, say you wish to sum the result of a bunch of calls. You could do that in a forloop:

    y=0
    for x in xrange(200): y += x**2

    Or you can do more wasteful list comprehension:

    res = [x**2 for x in xrange(200)]
    sum(res)

    Or you can do something even better:

    sum(x**2 for xrange(200)).

    (xrange being a proxy for some function doing work and returning an iterable)

    As for functions yielding temporary results for this type of building.. use yield in a generator function.

    Often if you really care about the nth level of optimization, you need to sacrifice some readability.

  9. Fredrik Says:

    Hear hear!
    I was skeptical when list comprehensions was introduced.
    The first time I saw it in code it was the first time in I could not figure out what was going on in Python.
    As I had not read about them beforehand and they did not have a keyword or function name I could use to search or locate the documentation for it.
    When I finally found the docs I found it odd that Python had become a “bidirectional language” where you could not read all code syntax left to right but had to jump around to understand it.

    And as it was the new and cool feature I remember people going nuts with it and nest them (The Horror!) and tried to use them whenever they could.

    Since then I’ve leaned to live with it and use them as it is indeed more compact and the addition of generator expressions made the same syntax even more useful.

    So for me it’s now a convenient syntax that I use for very simple loops as a map replacement. And if they would be removed I would probably miss them quite a bit.

    But the introduction of LC also marked the beginning of the end of Python as the “pseudo code” language I could warmly recommend to beginners.

  10. cdleary Says:

    @Doug Napoleone: You say:

    The timeit comment above I view as a false example, as it is doing work, but then just throwing it out. You never do that. You are always trying to produce SOME result.

    That is actually what we’re discussing — whether using a list comprehension but producing no result is a good way to use list comprehensions. In the entry referenced in the first paragraph I argue that it’s not, but @(Michael Foord) believes that it is. That is what leads to the test cases given in the entry and @Ian’s comment.

  11. Doug Napoleone Says:

    @cdleary

    Ah! sorry, yea, that is stupid. Of course it will be slower. List comprehension is not a shorthand for a loop construct. It is shorthand for list construction (which may use multiple loop constructs to do the work).

    I think you have convinced Mr. Foord otherwise ;-)

  12. niall Says:

    > result-producing control flow construct

    I dislike thinking of list comprehensions as control flow; they’re more expression oriented way of constructing a list than control flow. By training and inclination, I prefer pure expression oriented languages and that spills over to any time I use Python. Over-specifying the evaluation order and explicitly looping feels wrong – premature sequentialization considered harmful ;-)

    As you mentioned earlier, list comprehensions can be thought of as syntactic sugar for maps and filters. They can be extended to zips as well via zip list comprehensions – where you have multiple independent qualifier lists. The canonical example being [(x,y) | x<-listx | y < listy] to zip together listx and listy. A more fun example would be generating a donut of points via random selection.

    (Pardon my foreign accent).

    randomDots d = drop 1 [(x, y) | x <- rand1 | y <- rand2]
      where 
        rand1 = d : [(g*22695477 + 1)  `mod`   2^32   | g <- g1]
        rand2 = d : [(g*1103515245 + 12345) `mod` 2^32 | g <- g2]
     
    donut = [point | point@(x,y) <- randomDots 42, let b= x^2+y^2, b 10000]

    But I think there’s some other things consider besides such bonbons.

    1) I may be wrong, but I dislike that I can’t write this (where the for loop is a one-liner)

    def collapse(seq):
        """Preserve order."""
        uniq = []
        for item in seq: if not uniq.count(item): uniq.append(item)
        return uniq

    whereas I can write

    def collapse(seq):
        """Preserve order."""
        uniq = []
        [uniq.append(item) for item in seq if not uniq.count(item)]
        return uniq

    That’s an annoying restriction to me, especially if I add the a filter when revisiting a comprehension.

    2) With regards efficiency, if there was measurable overhead, such as creating and destroying a large list, I’d consider it a defect of the implementation that the unneeded list value creation is not elided. (Ingrained impurity also minimizes any hope of parallel execution here as well of course).

    3) Are we over specifying? Eg. do we really care about the ordering? If not, then I’d prefer to write
    “collapse = toList . fromList” where I don’t have to explicitly name the arguments and can just say that collapsing means convert the list to a set and back again. If I do care about the ordering, and I have an ordinal type, then I can still use a set (again, pardon the foreign accent), and walk thru the values

    collapse = go S.empty
      where go _ [] = []
            go s (x:xs) | S.member x s = go s xs
                        | otherwise    = x : go (S.insert x s) xs

    whereas if I want ordering but only want to assume I have an equality comparison, giving up some performance, I can say

    collapse (x:xs) = x : collapse (==)  (filter (\y -> not (eq x y)) xs)

    I prefer to deal with these expressions rather than reconstruct the expression being generated by mentally simulating control flow.

  13. plonstic Says:

    +1

    """
    side effect  for loop
    list  list comprehension
    intermediates values  generator comprehension
     
    Use function (or lambda) to improve readability
    """

    That’s my rule :)

  14. masklinn Says:

    @niall

    1) I may be wrong, but I dislike that I can’t write this (where the for loop is a one-liner)

    You can write the loop that way, but it’s generally not good style as it’s less readable and less marked as a block (though it’s one) than the regular indenting. It’s absolutely possible (and sometimes done when there is only one statement in a block’s body).

  15. jwithers Says:

    @Fredrik: Precisely! This trope that will not die that python is a psudeocode language and great beginners was true at one point, but between decorators, list comps, generator comps and such syntatic sugar, we have lost that. It isn’t that I can’t code using those constructs just fine and do or even find them somewhat handy, it is just that this fiction we maintain in the community that this is somehow a better beginners language and easier to understand than others is laughable as we keep piling on language constructs that don’t really have a lot of purpose other than to be cool. Still love me some python, but am not a big fan of the constant language construct additions for the sake of language construct additions.

  16. cdleary Says:

    @masklinn: Actually I don’t believe it’s possible due to the newline restrictions in the grammar rules. Taken from the Full Grammar docs:

    for_stmt: 'for' exprlist 'in' testlist ':' suite ['else' ':' suite]
    suite: simple_stmt | NEWLINE INDENT stmt+ DEDENT
    simple_stmt: small_stmt (';' small_stmt)* [';'] NEWLINE
    small_stmt: (expr_stmt | print_stmt  | del_stmt | pass_stmt | flow_stmt |
                 import_stmt | global_stmt | exec_stmt | assert_stmt)

    The only thing that doesn’t require a newline is a simple_stmt, and that precludes a conditional (though you can use a conditional expression, there’s no else suite here). People who dislike one liners generally consider this to be a good thing.

  17. Federico Says:

    An example of a more “typical” list comprehension use:

    >>> s="""
    ... a = []
    ... for x in xrange(200):
    ...     a.append(x*2)
    ... """
    >>> q="""
    ... a = [ x*2 for x in xrange(200)]
    ... """
    >>> timeit.Timer(stmt=s).timeit(number=10000)
    0.55010199546813965
    >>> timeit.Timer(stmt=q).timeit(number=10000)
    0.29913878440856934
  18. cdleary Says:

    @Federico: Another excellent point! The reason for the more typical use of list comprehensions being better (where the result is actually used) is that you don’t need to load the append attribute off of the list and call it as a function. Instead, in a comprehension, a specialized LIST_APPEND bytecode is generated for a fast append onto the result list. When you throw the result away, this benefit no longer exists, as it’s still more overhead than not doing any appending at all.

  19. Brian Kennedy Says:

    On the topic of the use of list comprehensions with side effects whose result isn’t used, I’d agree that this is bad form. List comprehensions are intended to be and most commonly used as an expression. That is distinct from a statement in that it has no side effects and it returns a value. Its similar to seeing the following python code

    foo = SomeObject()
    foo.bar
    foo.bar

    It might be correct and actually do something for SomeObject, but its terribly confusing because its using a syntax meant for expressions to accomplish some state change outside of the expression.

    Seeing

    [foo.bar() for foo in foos]

    in python is a lot like how I felt when I found out that

    5;

    was a valid C statement.

  20. nes Says:

    I agree with you Chris. List comprehensions are functional constructs and should follow that paradigm. In other words the “expression” in “[expression for x in lst]” should have no side effects. If the expression has no side effects and the comprehension is not assigned to anything it is a no-op and an error. If the expression has side effects then use the traditional procedural syntax instead. Doing otherwise you are twisting the way those constructs were meant to be used (and I perfectly understand the thrill of being twisted once in a while :-)).

  21. Michael Foord Says:

    Not at all. In Python 3 map is removed (where side effects are *obviously* fine) because *list comprehensions make map obsolete*.

  22. cdleary Says:

    @Michael Foord: I don’t see the justification for “where side effects are *obviously* fine” — could you elaborate? Thanks!

  23. Michael Foord Says:

    @cdleary
    I *assume* you would have no problem with code like this?

        def do_something(arg):
            ...
        map(do_something, some_iterable)

    It’s a nice straightforward construct. As always in Python you convey the intent of the code through the naming of the “do_something” function. If the name makes it clear that it has side-effects (i.e. does something) then a nice pythonic equivalent is:

        [do_something(arg) for arg in iterable]

    In Python 2 I more often use the map version for this kind of mapping. map is gone in Python 3 because the list comprehension equivalent fully replaces it. Interesting that for efficiency a for loop is faster, even if more verbose. :-)

  24. cdleary Says:

    @Michael Foord: Good point, we should discuss Python 3! I actually don’t use that construct because map creates a list in the size of the transform, but clearly I’m biased here. :-)

    The docs for map in Python 3 indicate that it now makes an iterable (ala itertools.imap), which doesn’t have the same memory overhead, but also doesn’t eagerly evaluate, so you could use something to exhaust the iterator, like:

    all(map(do_something, iterable))

    The 2to3 translation of the construct you mentioned retains the memory overhead:

    list(map(do_something, iterable))

    But you can do similar with a generator expression to avoid the memory overhead:

    all(do_something(item) for item in iterable)

    To me, they seem hacky when compared to the memory efficient:

    for item in iterable:
        do_something(item)

    But I suppose we’ve beaten that horse to death by now. :-)

  25. cdleary Says:

    Actually, I’m wrong: all is not a valid way to exhaust the iterator because it’ll short circuit once it comes across a result that’s falsy. If the do_something function produces Nones all the time, that’s right away. :-/

    I suppose you could just use an itertoolsy recipe like:

    def exhaust(iterable):
        for item in iterable:
            pass

    At which point you’re kind of out-of-lining the for loop that I would otherwise be putting inline.

Leave a Reply