为什么我的埃拉托色尼筛法处理整数比处理布尔值更快?

2022-01-19 00:00:00 python python-2.7 performance boolean cpython

问题描述

我写了一个简单的埃拉托色尼筛法，它使用一个列表，如果不是素数，则将它们变成零，如下所示:

def eSieve(n): #其中 m 是直到 n 的所有整数的定长列表'''创建一个小于或等于n的素数列表'''m = [1]*(n+1)对于 xrange(2,int((n)**0.5)+1) 中的 i:如果 m[i]:对于 xrange(i*i,n+1,i) 中的 j:m[j]=0return [i for i in xrange(2,n) if m[i]]

我用 %timeit 测试了它的运行速度，得到:

#n: t#10**1:7 微秒#10**2:26.6 微秒#10**3:234 微秒#10**4:2.46 毫秒#10**5:26.4 毫秒#10**6:292 毫秒#10**7:3.27 秒

我假设，如果我将 [1] 和 0 更改为布尔值，它会运行得更快......但它却相反:

#n: t#10**1:7.31 微秒#10**2:29.5 微秒#10**3:297 微秒#10**4:2.99 毫秒#10**5:29.9 毫秒#10**6:331 毫秒#10**7:3.7 秒

为什么布尔值变慢了?

解决方案

发生这种情况是因为 True 和 False 在 Python 2 中被视为全局变量. 0 和 1 文字只是常量，通过快速数组引用查找，而全局变量是全局命名空间中的 dictionary 查找(通过内置ins 命名空间):

>>>导入dis>>>定义 foo():... a = 真... b = 1...>>>dis.dis(foo)2 0 LOAD_GLOBAL 0(真)3 STORE_FAST 0 (一)3 6 负载常数 1 (1)9 STORE_FAST 1 (b)12 LOAD_CONST 0(无)15 返回值

True 值使用 LOAD_GLOBAL 字节码查找，而 1 文字值使用 LOAD_CONST 复制到堆栈.

如果你使 True 和 False locals 你可以再次使它们同样快:

def eSieve(n, True=True, False=False):m = [真]*(n+1)对于 xrange(2,int((n)**0.5)+1) 中的 i:如果 m[i]:对于 xrange(i*i,n+1,i) 中的 j:m[j]=假return [i for i in xrange(2,n) if m[i]]

将 True 和 False 指定为 for 参数的默认值，为函数提供了作为局部变量的名称，并具有完全相同的值；再次使用简化版本:

>>>def bar(真=真，假=假):... 真 == 假...>>>dis.dis(bar)2 0 LOAD_FAST 0(真)3 LOAD_FAST 1(假)6 比较_OP 2 (==)9 POP_TOP10 LOAD_CONST 0(无)13 返回值

注意 LOAD_FAST 操作码，现在索引就像 LOAD_CONST 字节码一样；CPython 函数中的局部变量与字节码常量一样存储在数组中.

随着这一变化，使用布尔值胜出，尽管差距很小；我的时间:

# n 个整数 globals locals# 10**1 4.31 微秒 4.2 微秒 4.2 微秒# 10**2 17.1 µs 17.3 µs 16.5 µs# 10**3 147 微秒 158 微秒 144 微秒# 10**4 1.5 毫秒 1.66 毫秒 1.48 毫秒# 10**5 16.4 毫秒 18.2 毫秒 15.9 毫秒# 10**6 190 毫秒 215 毫秒 189 毫秒# 10**7 2.21 秒 2.47 秒 2.18 秒

区别并不大，因为 Python 布尔值只是一个 int 子类.

请注意，在 Python 3 中，True 和 False 已成为关键字，无法再分配，因此可以将它们视为整数文字.p>

I wrote a simple Sieve of Eratosthenes, which uses a list of ones and turns them into zeros if not prime, like so:

def eSieve(n): #Where m is fixed-length list of all integers up to n
    '''Creates a list of primes less than or equal to n'''
    m = [1]*(n+1)
    for i in xrange(2,int((n)**0.5)+1):
        if m[i]:
            for j in xrange(i*i,n+1,i):
                m[j]=0
    return [i for i in xrange(2,n) if m[i]]

I tested the speed it ran with %timeit and got:

#n: t
#10**1: 7 μs
#10**2: 26.6 μs
#10**3: 234 μs
#10**4: 2.46 ms
#10**5: 26.4 ms
#10**6: 292 ms
#10**7: 3.27 s

I assumed, if I changed [1] and 0 to booleans, it would run faster... but it does the opposite:

#n: t
#10**1: 7.31 μs
#10**2: 29.5 μs
#10**3: 297 μs
#10**4: 2.99 ms
#10**5: 29.9 ms
#10**6: 331 ms
#10**7: 3.7 s

Why are the booleans slower?

解决方案

This happens because True and False are looked up as globals in Python 2. The 0 and 1 literals are just constants, looked up by a quick array reference, while globals are dictionary lookups in the global namespace (falling through to the built-ins namespace):

>>> import dis
>>> def foo():
...     a = True
...     b = 1
... 
>>> dis.dis(foo)
  2           0 LOAD_GLOBAL              0 (True)
              3 STORE_FAST               0 (a)

  3           6 LOAD_CONST               1 (1)
              9 STORE_FAST               1 (b)
             12 LOAD_CONST               0 (None)
             15 RETURN_VALUE

The True value is looked up with the LOAD_GLOBAL bytecode, while the 1 literal value is copied to the stack with LOAD_CONST.

If you make True and False locals you can make them just as fast again:

def eSieve(n, True=True, False=False):
    m = [True]*(n+1)
    for i in xrange(2,int((n)**0.5)+1):
        if m[i]:
            for j in xrange(i*i,n+1,i):
                m[j]=False
    return [i for i in xrange(2,n) if m[i]]

Assigning True and False as default values to for arguments gives the function those names as locals, with the exact same values; again using a simplified version:

>>> def bar(True=True, False=False):
...     True == False
... 
>>> dis.dis(bar)
  2           0 LOAD_FAST                0 (True)
              3 LOAD_FAST                1 (False)
              6 COMPARE_OP               2 (==)
              9 POP_TOP             
             10 LOAD_CONST               0 (None)
             13 RETURN_VALUE

Note the LOAD_FAST opcodes, now with indices just like the LOAD_CONST bytecodes; locals in a CPython function are stored in an array just like bytecode constants.

With that change, using booleans wins out, albeit by a small margin; my timings:

# n      integers  globals  locals
# 10**1  4.31 µs   4.2 µs   4.2 µs
# 10**2  17.1 µs   17.3 µs  16.5 µs
# 10**3  147 µs    158 µs   144 µs
# 10**4  1.5 ms    1.66 ms  1.48 ms
# 10**5  16.4 ms   18.2 ms  15.9 ms
# 10**6  190 ms    215 ms   189 ms   
# 10**7  2.21 s    2.47 s   2.18 s

The difference isn't really that much because Python booleans are just an int subclass.

Note that in Python 3, True and False have become keywords and can no longer be assigned to, making it possible to treat them just like integer literals.

相关文章