列表上的总和可能更快
问题描述
这是对 问题
首先,您会注意到您不能对字符串列表执行 sum
来连接它们,python 告诉您改用 str.join
,并且这是个好建议,因为无论您如何在字符串上使用 +
,性能都很差.
So first, you'll notice that you cannot perform a sum
on a list of strings to concatenate them, python tells you to use str.join
instead, and that's good advice because no matter how you use +
on strings, the performance is bad.
不能使用 sum
"的限制不适用于 list
,但 itertools.chain.from_iterable
是首选方式执行此类列表展平.
The "cannot use sum
" restriction doesn't apply to list
, and though, itertools.chain.from_iterable
is the preferred way to perform such list flattening.
但是 sum(x,[])
当 x
是一个列表的列表时肯定是错误的.
But sum(x,[])
when x
is a list of lists is definitively bad.
但它应该保持这种状态吗?
But should it stay that way?
我比较了 3 种方法
import time
import itertools
a = [list(range(1,1000)) for _ in range(1000)]
start=time.time()
sum(a,[])
print(time.time()-start)
start=time.time()
list(itertools.chain.from_iterable(a))
print(time.time()-start)
start=time.time()
z=[]
for s in a:
z += s
print(time.time()-start)
结果:
sum()
在列表列表中:10.46647310256958.好的,我们知道了.itertools.chain
:0.07705187797546387- 使用就地加法的自定义累积总和:0.057044029235839844(如您所见,可能比
itertools.chain
更快)
sum()
on the list of lists: 10.46647310256958. Okay, we knew.itertools.chain
: 0.07705187797546387- custom accumulated sum using in-place addition: 0.057044029235839844 (can be faster than
itertools.chain
as you see)
所以 sum
远远落后,因为它执行 result = result + b
而不是 result += b
So sum
is way behind because it performs result = result + b
instead of result += b
所以现在我的问题是:
为什么 sum
不能在可用的情况下使用这种累积方法?
Why can't sum
use this accumulative approach when available?
(这对于已经存在的应用程序来说是透明的,并且可以使用内置的 sum
来有效地展平列表)
(That would be transparent for already existing applications and would make possible the use of the sum
built-in to flatten lists efficiently)
解决方案
我们可以尝试让 sum() 更智能,但 Alex Martelli 和 Guido van Rossum 想让它专注于算术求和.
We could try to make sum() smarter, but Alex Martelli and Guido van Rossum wanted to keep it focused on arithmetic summations.
FWIW,您应该使用这个简单的代码获得合理的性能:
FWIW, you should get reasonable performance with this simple code:
result = []
for seq in mylists:
result += seq
对于您的其他问题,为什么在可用时不能 sum 使用这种累积方法?",请参阅 Python/bltinmodule.c 中有关 builtin_sum() 的评论:
For your other question, "why can't sum use this accumulative approach when available?", see this comment for builtin_sum() in Python/bltinmodule.c:
/* It's tempting to use PyNumber_InPlaceAdd instead of
PyNumber_Add here, to avoid quadratic running time
when doing 'sum(list_of_lists, [])'. However, this
would produce a change in behaviour: a snippet like
empty = []
sum([[x] for x in range(10)], empty)
would change the value of empty. */
相关文章