Is it faster to iterate over a small list in an any() statement?
Problem description
Consider the following operation in the limit of low-length iterables:
d = (3, slice(None, None, None), slice(None, None, None))
In [215]: %timeit any([type(i) == slice for i in d])
1000000 loops, best of 3: 695 ns per loop
In [214]: %timeit any(type(i) == slice for i in d)
1000000 loops, best of 3: 929 ns per loop
Setting as a list is 25% faster than using a generator expression?
Why is this the case, given that setting up the list is an extra operation?
Note: In both runs I obtained the warning: The slowest run took 6.42 times longer than the fastest. This could mean that an intermediate result is being cached.
In this particular test, the list() construction is faster up to a length of 4, from which point the generator pulls ahead.
The red line shows where this crossover occurs and the black line shows where the two are equal in performance.
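A hint at where the crossover comes from (an editor's sketch, assuming CPython; not part of the original benchmark): disassembling the two expressions shows the extra machinery behind the generator expression, which any() must resume once per element, whereas the list comprehension runs its whole loop before any() starts:

import dis

# Compare the bytecode of the two forms; <listcomp> and <genexp> are
# just labels for the compiled snippets, not real file names.
dis.dis(compile("any([type(i) == slice for i in d])", "<listcomp>", "eval"))
print("-" * 40)
dis.dis(compile("any(type(i) == slice for i in d)", "<genexp>", "eval"))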
The code takes about 1 min to run on my MacBook Pro by utilising all the cores:
import timeit, pylab, multiprocessing
import numpy as np

rng = range(1, 16)                                   # list lengths to test
max_series = [3, slice(None, None, None)] * rng[-1]  # alternating item types
series = [max_series[:n] for n in rng]               # one input per length
number, reps = 1000000, 5

def func_l(d):
    # time the list-comprehension form for one input length
    res = timeit.repeat("any([type(i) == slice for i in {}])".format(d),
                        repeat=reps, number=number)
    print "done List, len:{}".format(len(d))
    return res

def func_g(d):
    # time the generator-expression form for one input length
    res = timeit.repeat("any(type(i) == slice for i in {})".format(d),
                        repeat=reps, number=number)
    print "done Generator, len:{}".format(len(d))
    return res

p = multiprocessing.Pool(processes=min(16, rng[-1]))  # optimize for 16 processors
# Pool.map returns results in input order; appending to a shared Manager
# list from the workers would not guarantee that ordering.
l = p.map(func_l, series)  # list-comprehension timings
g = p.map(func_g, series)  # generator-expression timings

ratio = np.asarray(g).mean(axis=1) / np.asarray(l).mean(axis=1)
pylab.plot(rng, ratio, label='av. generator time / av. list time')
pylab.title("{} iterations, averaged over {} runs".format(number, reps))
pylab.xlabel("length of iterable")
pylab.ylabel("Time Ratio (Higher is worse)")
pylab.legend()
lt_zero = np.argmax(ratio < 1.)  # first length at which the generator wins
pylab.axhline(y=1, color='k')
pylab.axvline(x=lt_zero + 1, color='r')
pylab.ion(); pylab.show()
Solution
The catch is the size of the iterable you are applying any to. Repeat the same process on a larger dataset:
In [2]: d = ([3] * 1000) + [slice(None, None, None), slice(None, None, None)]*1000
In [3]: %timeit any([type(i) == slice for i in d])
1000 loops, best of 3: 736 µs per loop
In [4]: %timeit any(type(i) == slice for i in d)
1000 loops, best of 3: 285 µs per loop
Then, using a list (which loads all the items into memory before any() even starts) becomes much slower, and the generator expression plays out better: it can stop as soon as the first match is found, while the list comprehension always evaluates the full input.
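To make the short-circuiting visible (a sketch added for illustration, not from the original answer), put the matching item first: the generator is resumed exactly once, while the list comprehension still evaluates type(i) == slice for every element before any() even sees the list:

import timeit

# The match is the very first element, so any() can return immediately.
setup = "d = [slice(None, None, None)] + [3] * 2000"

# List form: builds the full 2001-element list of booleans first.
print(timeit.timeit("any([type(i) == slice for i in d])", setup=setup, number=1000))

# Generator form: yields one True and is never resumed again.
print(timeit.timeit("any(type(i) == slice for i in d)", setup=setup, number=1000))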