为什么“10000000000000000 在范围内(1000000000000001)"Python 3 这么快?
问题描述
据我了解 range()
函数,其实就是 Python 3 中的一种对象类型,动态生成其内容,类似于生成器.
It is my understanding that the range()
function, which is actually an object type in Python 3, generates its contents on the fly, similar to a generator.
在这种情况下,我预计以下行会花费过多的时间,因为为了确定 1 万亿是否在范围内,必须生成 1 万亿值:
This being the case, I would have expected the following line to take an inordinate amount of time because, in order to determine whether 1 quadrillion is in the range, a quadrillion values would have to be generated:
1_000_000_000_000_000 in range(1_000_000_000_000_001)
此外:似乎无论我添加多少个零,计算或多或少都需要相同的时间(基本上是瞬时的).
Furthermore: it seems that no matter how many zeroes I add on, the calculation more or less takes the same amount of time (basically instantaneous).
我也尝试过这样的事情,但计算仍然几乎是即时的:
I have also tried things like this, but the calculation is still almost instant:
# count by tens
1_000_000_000_000_000_000_000 in range(0,1_000_000_000_000_000_000_001,10)
如果我尝试实现自己的范围函数,结果就不那么好了!
If I try to implement my own range function, the result is not so nice!
def my_crappy_range(N):
i = 0
while i < N:
yield i
i += 1
return
range()
对象在后台做了什么让它如此之快?
What is the range()
object doing under the hood that makes it so fast?
Martijn Pieters 的答案 因其完整性而被选中,但另请参阅 abarnert's first answer 很好地讨论了 range
在 Python 3 中成为一个成熟的 sequence 意味着什么,以及关于 Python 实现中 __contains__
函数优化的潜在不一致的一些信息/警告.abarnert 的其他答案 更详细地介绍了一些细节,并为那些对 Python 3 优化背后的历史感兴趣的人提供了链接(并且缺乏xrange
在 Python 2 中的优化).回答 通过戳 和 通过 wim 提供相关的C源代码和解释给有兴趣的人.
Martijn Pieters's answer was chosen for its completeness, but also see abarnert's first answer for a good discussion of what it means for range
to be a full-fledged sequence in Python 3, and some information/warning regarding potential inconsistency for __contains__
function optimization across Python implementations. abarnert's other answer goes into some more detail and provides links for those interested in the history behind the optimization in Python 3 (and lack of optimization of xrange
in Python 2). Answers by poke and by wim provide the relevant C source code and explanations for those who are interested.
解决方案
Python 3 range()
对象不会立即产生数字;它是一个智能的序列对象,它产生数字按需.它只包含您的开始、停止和步长值,然后当您迭代对象时,每次迭代都会计算下一个整数.
The Python 3 range()
object doesn't produce numbers immediately; it is a smart sequence object that produces numbers on demand. All it contains is your start, stop and step values, then as you iterate over the object the next integer is calculated each iteration.
该对象还实现了 object.__contains__
钩子,如果您的号码在其范围内,则计算.计算是(接近)恒定时间操作*.永远不需要扫描范围内所有可能的整数.
The object also implements the object.__contains__
hook, and calculates if your number is part of its range. Calculating is a (near) constant time operation *. There is never a need to scan through all possible integers in the range.
来自 range()
对象文档一个>:
From the range()
object documentation:
range
类型优于常规 list
或 tuple
类型的优点是范围对象将始终采用相同(小)量内存,无论它代表的范围大小(因为它只存储 start
、stop
和 step
值,计算单个项目和根据需要添加子范围).
The advantage of the
range
type over a regularlist
ortuple
is that a range object will always take the same (small) amount of memory, no matter the size of the range it represents (as it only stores thestart
,stop
andstep
values, calculating individual items and subranges as needed).
因此,您的 range()
对象至少可以:
So at a minimum, your range()
object would do:
class my_range:
def __init__(self, start, stop=None, step=1, /):
if stop is None:
start, stop = 0, start
self.start, self.stop, self.step = start, stop, step
if step < 0:
lo, hi, step = stop, start, -step
else:
lo, hi = start, stop
self.length = 0 if lo > hi else ((hi - lo - 1) // step) + 1
def __iter__(self):
current = self.start
if self.step < 0:
while current > self.stop:
yield current
current += self.step
else:
while current < self.stop:
yield current
current += self.step
def __len__(self):
return self.length
def __getitem__(self, i):
if i < 0:
i += self.length
if 0 <= i < self.length:
return self.start + i * self.step
raise IndexError('my_range object index out of range')
def __contains__(self, num):
if self.step < 0:
if not (self.stop < num <= self.start):
return False
else:
if not (self.start <= num < self.stop):
return False
return (num - self.start) % self.step == 0
这仍然缺少一些真正的 range()
支持的东西(例如 .index()
或 .count()
方法、散列、相等性测试或切片),但应该给你一个想法.
This is still missing several things that a real range()
supports (such as the .index()
or .count()
methods, hashing, equality testing, or slicing), but should give you an idea.
我还简化了 __contains__
实现,只关注整数测试;如果你给一个真正的 range()
对象一个非整数值(包括 int
的子类),就会启动一个慢速扫描来查看是否有匹配,就像如果您对所有包含值的列表使用包含测试.这样做是为了继续支持其他恰好支持整数相等测试但预计也不支持整数算术的数字类型.请参阅实现遏制测试的原始 Python 问题.
I also simplified the __contains__
implementation to only focus on integer tests; if you give a real range()
object a non-integer value (including subclasses of int
), a slow scan is initiated to see if there is a match, just as if you use a containment test against a list of all the contained values. This was done to continue to support other numeric types that just happen to support equality testing with integers but are not expected to support integer arithmetic as well. See the original Python issue that implemented the containment test.
* 接近恒定时间,因为 Python 整数是无界的,因此数学运算也会随着 N 的增长而增长,这使得它成为 O(log N) 运算.由于这一切都是在优化的 C 代码中执行的,并且 Python 将整数值存储在 30 位块中,因此由于此处涉及的整数的大小,您会在看到任何性能影响之前耗尽内存.
* Near constant time because Python integers are unbounded and so math operations also grow in time as N grows, making this a O(log N) operation. Since it’s all executed in optimised C code and Python stores integer values in 30-bit chunks, you’d run out of memory before you saw any performance impact due to the size of the integers involved here.
相关文章