迭代列表的奇怪速度差异
问题描述
我创建了两个重复两个不同值的长列表.在第一个列表中,值交替出现,在第二个列表中,一个值出现在另一个值之前:
I create two long lists repeating two different values. In the first list the values alternate, in the second list one value's occurrences come before the other's:
a1 = [object(), object()] * 10**6
a2 = a1[::2] + a1[1::2]
然后我迭代它们,对它们什么都不做:
Then I iterate them, doing nothing with them:
for _ in a1: pass
for _ in a2: pass
两者中哪一个迭代得更快?取决于我如何测量!我用每种计时方法进行了 50 场比赛:
Which of the two gets iterated faster? Depends on how I measure! I did 50 races with each timing method:
timing method: timeit.timeit
list a1 won 50 times
list a2 won 0 times
timing method: timeit.default_timer
list a1 won 0 times
list a2 won 50 times
为什么这两种计时方法给我的结果完全相反?为什么这两个列表之间存在速度差异?我希望两次更像 25-25,而不是 50-0 和 0-50.
Why do the two timing methods give me completely opposite results? And why are there speed differences between the two lists at all? I'd expect something more like 25-25 twice, instead of 50-0 and 0-50.
以前类似问题的原因以及我认为他们在这里不负责任的原因:
Reasons from previous similar questions and why I don't think they're responsible here:
- 分支预测:我没有任何可以产生影响的分支.
- 缓存未命中:不应该发生,因为我只有两个值(而且我什至什么都不做和他们一起).
- 垃圾回收:此处不涉及.
- 无论如何,这些都不能解释计时方法之间的相反速度差异.
- branch prediction: I don't have any branching that could make a difference.
- cache misses: Shouldn't happen, as I only have two values (and I don't even do anything with them).
- garbage collection: Not involved here.
- None of these would explain the opposite speed differences between the timing methods anyway.
我使用的是 Python 3.10,在弱 Windows 笔记本电脑和强 Linux Google Compute Engine 实例上的结果相同.
I'm using Python 3.10, same results on a weak Windows laptop and a strong Linux Google Compute Engine instance.
完整代码:
from timeit import timeit, default_timer
a1 = [object(), object()] * 10**6
a2 = a1[::2] + a1[1::2]
for method in 'timeit', 'default_timer':
wins1 = wins2 = 0
for _ in range(50):
time1 = time2 = float('inf')
for _ in range(5):
if method == 'timeit':
t1 = timeit('for _ in a1: pass', 'from __main__ import a1', number=1)
t2 = timeit('for _ in a2: pass', 'from __main__ import a2', number=1)
else:
start = default_timer();
for _ in a1: pass
end = default_timer()
t1 = end - start
start = default_timer();
for _ in a2: pass
end = default_timer()
t2 = end - start
time1 = min(time1, t1)
time2 = min(time2, t2)
wins1 += time1 < time2
wins2 += time2 < time1
print(f'timing method: timeit.{method}')
print(f' list a1 won {wins1} times')
print(f' list a2 won {wins2} times')
print()
解决方案
不是答案",而是线索:添加
Not "an answer", but a clue: add
def f(a):
for _ in a: pass
然后在default_timer
分支中替换
for _ in a1: pass
使用 f(a1)
,对于 a2
的迭代也类似.
with f(a1)
, and similarly for the iteration over a2
.
那时我看到了两件事:
- 输出变化更统一
timing method: timeit.timeit
list a1 won 50 times
list a2 won 0 times
timing method: timeit.default_timer
list a1 won 50 times
list a2 won 0 times
- 不再是
timeit
跑得比default_timer
快的情况,
- It's no longer the case that
timeit
runs faster thandefault_timer
,
#2 非常明显,可能很难注意到原因 ;-) 在原来的情况下,for 循环目标 _
是一个全局模块,因此在每次迭代时更新它需要一个 dict 查找和存储.但是 timeit
会根据传递给它的内容构建一个函数,因此在 timeit
运行的代码中,_
是一个局部变量,而一个更快的 STORE_FAST
索引存储就足够了.
#2 is so obvious it may be hard to notice why ;-) In the original, the for-loop target _
is a module global, so updating it on each iteration requires a dict lookup and store. But timeit
builds a function out of the stuff passed to it, so in the code timeit
runs, _
is a local variable instead, and a faster STORE_FAST
indexed store suffices.
函数 f()
被引入以便举重"在这两种情况下都在局部变量上完成.
Function f()
was introduced so that the "heavy lifting" is done in both cases on local variables.
这里计时的代码做的很少,以至于 dict 查找和 index-a-vector-with-an-int 操作之间的差异可能非常显着
The code being timed here does so very little that the difference between a dict lookup and an index-a-vector-with-an-int operation can be highly significant
相关文章