列表上的总和可能更快

2022-01-09 00:00:00 python list performance sum

问题描述

这是对 问题

首先,您会注意到您不能对字符串列表执行 sum 来连接它们,python 告诉您改用 str.join,并且这是个好建议,因为无论您如何在字符串上使用 +,性能都很差.

So first, you'll notice that you cannot perform a sum on a list of strings to concatenate them, python tells you to use str.join instead, and that's good advice because no matter how you use + on strings, the performance is bad.

不能使用 sum"的限制不适用于 list,但 itertools.chain.from_iterable 是首选方式执行此类列表展平.

The "cannot use sum" restriction doesn't apply to list, and though, itertools.chain.from_iterable is the preferred way to perform such list flattening.

但是 sum(x,[])x 是一个列表的列表时肯定是错误的.

But sum(x,[]) when x is a list of lists is definitively bad.

但它应该保持这种状态吗?

But should it stay that way?

我比较了 3 种方法

import time
import itertools

a = [list(range(1,1000)) for _ in range(1000)]

start=time.time()
sum(a,[])
print(time.time()-start)

start=time.time()
list(itertools.chain.from_iterable(a))
print(time.time()-start)


start=time.time()
z=[]
for s in a:
    z += s
print(time.time()-start)

结果:

  • sum() 在列表列表中:10.46647310256958.好的,我们知道了.
  • itertools.chain:0.07705187797546387
  • 使用就地加法的自定义累积总和:0.057044029235839844(如您所见,可能比 itertools.chain 更快)
  • sum() on the list of lists: 10.46647310256958. Okay, we knew.
  • itertools.chain: 0.07705187797546387
  • custom accumulated sum using in-place addition: 0.057044029235839844 (can be faster than itertools.chain as you see)

所以 sum 远远落后,因为它执行 result = result + b 而不是 result += b

So sum is way behind because it performs result = result + b instead of result += b

所以现在我的问题是:

为什么 sum 不能在可用的情况下使用这种累积方法?

Why can't sum use this accumulative approach when available?

(这对于已经存在的应用程序来说是透明的,并且可以使用内置的 sum 来有效地展平列表)

(That would be transparent for already existing applications and would make possible the use of the sum built-in to flatten lists efficiently)


解决方案

我们可以尝试让 sum() 更智能,但 Alex Martelli 和 Guido van Rossum 想让它专注于算术求和.

We could try to make sum() smarter, but Alex Martelli and Guido van Rossum wanted to keep it focused on arithmetic summations.

FWIW,您应该使用这个简单的代码获得合理的性能:

FWIW, you should get reasonable performance with this simple code:

result = []
for seq in mylists:
    result += seq

对于您的其他问题,为什么在可用时不能 sum 使用这种累积方法?",请参阅 Python/bltinmodule.c 中有关 builtin_sum() 的评论:

For your other question, "why can't sum use this accumulative approach when available?", see this comment for builtin_sum() in Python/bltinmodule.c:

    /* It's tempting to use PyNumber_InPlaceAdd instead of
       PyNumber_Add here, to avoid quadratic running time
       when doing 'sum(list_of_lists, [])'.  However, this
       would produce a change in behaviour: a snippet like

         empty = []
         sum([[x] for x in range(10)], empty)

       would change the value of empty. */

相关文章