为什么`for`循环计算真值的速度如此之快?

2022-01-09 00:00:00 python python-3.x performance sum for-loop

问题描述

我最近在姐妹网站上回答了一个

为什么手动 for 循环如此之快?它几乎比使用 sum 快两倍.而且由于内置的 sum 应该比手动对列表求和快大约五倍(根据

解决方案

sum 相当快，但 sum 并不是导致速度变慢的原因.三个主要因素导致放缓:

使用生成器表达式会导致不断暂停和恢复生成器的开销.
您的生成器版本无条件添加，而不是仅在数字为偶数时添加.当数字为奇数时，这会更昂贵.
添加布尔值而不是整数会阻止 sum 使用其整数快速路径.

<小时>

与列表推导相比，生成器有两个主要优势:它们占用的内存少得多，并且如果不需要所有元素，它们可以提前终止.它们不是旨在在需要所有元素的情况下提供时间优势.每个元素暂停和恢复一次生成器非常昂贵.

如果我们用列表推导替换 genexp:

在 [66]: def f1(x):.....:返回总和(c in '02468' for c in str(x)).....:在 [67] 中:定义 f2(x):.....:返回总和([c in '02468' for c in str(x)]).....:在 [68] 中:x = int('1234567890'*50)在 [69] 中:%timeit f1(x)10000 次循环，5 次中的最佳:每个循环 52.2 µs在 [70] 中:%timeit f2(x)10000 次循环，5 次中的最佳:每个循环 40.5 µs

我们看到了立竿见影的加速，但代价是在列表上浪费了大量内存.

<小时>

如果您查看您的 genexp 版本:

def count_even_digits_spyr03_sum(n):return sum(c in "02468" for c in str(n))

你会发现它没有if.它只是将布尔值扔到 sum 中.相反，您的循环:

def count_even_digits_spyr03_for(n):计数 = 0对于 str(n) 中的 c:如果 c 在02468"中:计数 += 1返回计数

仅当数字为偶数时才添加任何内容.

如果我们将之前定义的 f2 更改为也包含 if，我们会看到另一个加速:

在 [71]: def f3(x):.....: return sum([True for c in str(x) if c in '02468']).....:在 [72] 中:%timeit f3(x)10000 次循环，最好的 5 次:每个循环 34.9 µs

f1，与您的原始代码相同，耗时 52.2 µs，而 f2，仅更改列表理解，耗时 40.5 µs.

<小时>

在 f3 中使用 True 而不是 1 可能看起来很尴尬.这是因为将其更改为 1 会激活最终加速.sum 有一个快速路径用于整数，但快速路径仅对类型为 int 的对象激活.bool 不算数.这是检查项目是否为 int 类型的行:

if (PyLong_CheckExact(item)) {

一旦我们做出最后的改变，将 True 改为 1:

在 [73]: def f4(x):.....:返回总和([1 for c in str(x) if c in '02468']).....:在 [74] 中:%timeit f4(x)10000 次循环，5 次中的最佳:每个循环 33.3 µs

我们看到了最后一个小幅加速.

<小时>

那么，毕竟，我们是否击败了显式循环?

在 [75]: def explicit_loop(x):.....:计数 = 0....:对于 str(x) 中的 c:.....:如果'02468'中的c:....:计数 += 1.....:返回计数.....:在 [76] 中:%timeit 显式循环(x)10000 次循环，5 次中的最佳:每个循环 32.7 µs

不.我们大致收支平衡，但我们没有打败它.剩下的大问题是列表.构建它很昂贵，并且 sum 必须通过列表迭代器来检索元素，这有其自身的成本(尽管我认为这部分非常便宜).不幸的是，只要我们通过 test-digits-and-call-sum 方法，我们就没有任何好的方法来摆脱列表.显式循环获胜.

我们还能走得更远吗?好吧，到目前为止，我们一直在尝试使 sum 更接近显式循环，但是如果我们被这个愚蠢的列表所困扰，我们可能会偏离显式循环并调用 len 而不是 sum:

def f5(x):return len([1 for c in str(x) if c in '02468'])

单独测试数字也不是我们可以尝试打破循环的唯一方法.进一步偏离显式循环，我们还可以尝试 str.count.str.count 直接在 C 中迭代字符串的缓冲区，避免了很多包装对象和间接.我们需要调用它 5 次，对字符串进行 5 次传递，但仍然有回报:

def f6(x):s = str(x)return sum(s.count(c) for c in '02468')

不幸的是，这就是我用来计时的网站因为使用太多资源而让我陷入tarpit"的时候，所以我不得不切换网站.以下时序无法与上述时序直接比较:

>>>导入时间>>>定义 f(x):... return sum([1 for c in str(x) if c in '02468'])...>>>定义 g(x):... return len([1 for c in str(x) if c in '02468'])...>>>定义 h(x):... s = str(x)... return sum(s.count(c) for c in '02468')...>>>x = int('1234567890'*50)>>>timeit.timeit(lambda: f(x), number=10000)0.331528635986615>>>timeit.timeit(lambda: g(x), number=10000)0.30292080697836354>>>timeit.timeit(lambda: h(x), number=10000)0.15950968803372234>>>def 显式循环(x):...计数 = 0... 对于 str(x) 中的 c:... 如果 c 在 '02468' 中:...计数 += 1...返回计数...>>>timeit.timeit(lambda:explicit_loop(x), number=10000)0.3305045129964128

I recently answered a question on a sister site which asked for a function that counts all even digits of a number. One of the other answers contained two functions (which turned out to be the fastest, so far):

def count_even_digits_spyr03_for(n):
    count = 0
    for c in str(n):
        if c in "02468":
            count += 1
    return count

def count_even_digits_spyr03_sum(n):
    return sum(c in "02468" for c in str(n))

In addition I looked at using a list comprehension and list.count:

def count_even_digits_spyr03_list(n):
    return [c in "02468" for c in str(n)].count(True)

The first two functions are essentially the same, except that the first one uses an explicit counting loop, while the second one uses the built-in sum. I would have expected the second one to be faster (based on e.g. this answer), and it is what I would have recommended turning the former into if asked for a review. But, it turns out it is the other way around. Testing it with some random numbers with increasing number of digits (so the chance that any single digit is even is about 50%) I get the following timings:

Why is the manual for loop so much faster? It's almost a factor two faster than using sum. And since the built-in sum should be about five times faster than manually summing a list (as per the linked answer), it means that it is actually ten times faster! Is the saving from only having to add one to the counter for half the values, because the other half gets discarded, enough to explain this difference?

Using an if as a filter like so:

def count_even_digits_spyr03_sum2(n):
    return sum(1 for c in str(n) if c in "02468")

Improves the timing only to the same level as the list comprehension.

When extending the timings to larger numbers, and normalizing to the for loop timing, they asymptotically converge for very large numbers (>10k digits), probably due to the time str(n) takes:

解决方案

sum is quite fast, but sum isn't the cause of the slowdown. Three primary factors contribute to the slowdown:

The use of a generator expression causes overhead for constantly pausing and resuming the generator.
Your generator version adds unconditionally instead of only when the digit is even. This is more expensive when the digit is odd.
Adding booleans instead of ints prevents sum from using its integer fast path.

Generators offer two primary advantages over list comprehensions: they take a lot less memory, and they can terminate early if not all elements are needed. They are not designed to offer a time advantage in the case where all elements are needed. Suspending and resuming a generator once per element is pretty expensive.

If we replace the genexp with a list comprehension:

In [66]: def f1(x):
   ....:     return sum(c in '02468' for c in str(x))
   ....: 
In [67]: def f2(x):
   ....:     return sum([c in '02468' for c in str(x)])
   ....: 
In [68]: x = int('1234567890'*50)
In [69]: %timeit f1(x)
10000 loops, best of 5: 52.2 µs per loop
In [70]: %timeit f2(x)
10000 loops, best of 5: 40.5 µs per loop

we see an immediate speedup, at the cost of wasting a bunch of memory on a list.

If you look at your genexp version:

def count_even_digits_spyr03_sum(n):
    return sum(c in "02468" for c in str(n))

you'll see it has no if. It just throws booleans into sum. In constrast, your loop:

def count_even_digits_spyr03_for(n):
    count = 0
    for c in str(n):
        if c in "02468":
            count += 1
    return count

only adds anything if the digit is even.

If we change the f2 defined earlier to also incorporate an if, we see another speedup:

In [71]: def f3(x):
   ....:     return sum([True for c in str(x) if c in '02468'])
   ....: 
In [72]: %timeit f3(x)
10000 loops, best of 5: 34.9 µs per loop

f1, identical to your original code, took 52.2 µs, and f2, with just the list comprehension change, took 40.5 µs.

It probably looked pretty awkward using True instead of 1 in f3. That's because changing it to 1 activates one final speedup. sum has a fast path for integers, but the fast path only activates for objects whose type is exactly int. bool doesn't count. This is the line that checks that items are of type int:

if (PyLong_CheckExact(item)) {

Once we make the final change, changing True to 1:

In [73]: def f4(x):
   ....:     return sum([1 for c in str(x) if c in '02468'])
   ....: 
In [74]: %timeit f4(x)
10000 loops, best of 5: 33.3 µs per loop

we see one last small speedup.

So after all that, do we beat the explicit loop?

In [75]: def explicit_loop(x):
   ....:     count = 0
   ....:     for c in str(x):
   ....:         if c in '02468':
   ....:             count += 1
   ....:     return count
   ....: 
In [76]: %timeit explicit_loop(x)
10000 loops, best of 5: 32.7 µs per loop

Nope. We've roughly broken even, but we're not beating it. The big remaining problem is the list. Building it is expensive, and sum has to go through the list iterator to retrieve elements, which has its own cost (though I think that part is pretty cheap). Unfortunately, as long as we're going through the test-digits-and-call-sum approach, we don't have any good way to get rid of the list. The explicit loop wins.

Can we go further anyway? Well, we've been trying to bring the sum closer to the explicit loop so far, but if we're stuck with this dumb list, we could diverge from the explicit loop and just call len instead of sum:

def f5(x):
    return len([1 for c in str(x) if c in '02468'])

Testing digits individually isn't the only way we can try to beat the loop, too. Diverging even further from the explicit loop, we can also try str.count. str.count iterates over a string's buffer directly in C, avoiding a lot of wrapper objects and indirection. We need to call it 5 times, making 5 passes over the string, but it still pays off:

def f6(x):
    s = str(x)
    return sum(s.count(c) for c in '02468')

Unfortunately, this is the point when the site I was using for timing stuck me in the "tarpit" for using too many resources, so I had to switch sites. The following timings are not directly comparable with the timings above:

>>> import timeit
>>> def f(x):
...     return sum([1 for c in str(x) if c in '02468'])
... 
>>> def g(x):
...     return len([1 for c in str(x) if c in '02468'])
... 
>>> def h(x):
...     s = str(x)
...     return sum(s.count(c) for c in '02468')
... 
>>> x = int('1234567890'*50)
>>> timeit.timeit(lambda: f(x), number=10000)
0.331528635986615
>>> timeit.timeit(lambda: g(x), number=10000)
0.30292080697836354
>>> timeit.timeit(lambda: h(x), number=10000)
0.15950968803372234
>>> def explicit_loop(x):
...     count = 0
...     for c in str(x):
...         if c in '02468':
...             count += 1
...     return count
... 
>>> timeit.timeit(lambda: explicit_loop(x), number=10000)
0.3305045129964128

相关文章