Python for loop and iterator behavior
Question
I wanted to understand a bit more about iterators, so please correct me if I'm wrong.
An iterator is an object which has a pointer to the next object and is read as a buffer or stream (i.e. a linked list). They're particularly efficient because all they do is tell you what is next by reference instead of using indexing.
However, I still don't understand why the following behavior is happening:
In [1]: iter = (i for i in range(5))

In [2]: for _ in iter:
   ....:     print _
   ....:
0
1
2
3
4

In [3]: for _ in iter:
   ....:     print _
   ....:

In [4]:
After the first loop through the iterator (In [2]), it's as if it was consumed and left empty, so the second loop (In [3]) prints nothing.
However, I never assigned a new value to the iter variable.
What is really happening under the hood of the for loop?
Solution
Your suspicion is correct: the iterator has been consumed.
In actuality, your iterator is a generator, which is an object that can be iterated through only once.
type((i for i in range(5)))  # says it's type generator

def another_generator():
    yield 1  # yield makes this a generator function: calling it returns a generator

type(another_generator())  # also a generator
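The "only once" behavior can be reproduced in a few lines. This is a minimal sketch in Python 3 syntax (the question's session uses the Python 2 print statement); the variable names numbers and stored are illustrative and not from the original post.

```python
# A generator expression produces its values once; a list can be traversed repeatedly.
numbers = (i for i in range(5))   # generator object (named `numbers`, not `iter`)

first_pass = list(numbers)        # consumes every value
second_pass = list(numbers)       # nothing left to produce

print(first_pass)   # [0, 1, 2, 3, 4]
print(second_pass)  # []

# To loop twice, build a fresh generator or keep the values in a list.
stored = list(range(5))
print([x for x in stored])  # [0, 1, 2, 3, 4]
print([x for x in stored])  # [0, 1, 2, 3, 4]

# Once exhausted, a generator raises StopIteration on every further next() call.
try:
    next(numbers)
except StopIteration:
    print("exhausted")
```

If the values are needed more than once, storing them in a list trades memory for the ability to iterate again.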
The reason they are efficient has nothing to do with telling you what is next "by reference." They are efficient because they only generate the next item upon request; the items are not all generated at once. In fact, you can have an infinite generator:
def my_gen():
    while True:
        yield 1  # again: yield means it is a generator, not a regular function

for _ in my_gen(): print(_)  # hit Ctrl+C to stop this infinite loop!
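To see that items really are produced only on request, one option is to take a bounded slice of an infinite generator with itertools.islice. The squares generator below is a hypothetical example, not part of the original answer; the print call inside it is only there to show when the work happens.

```python
from itertools import islice

def squares():
    """Hypothetical infinite generator: yields 0, 1, 4, 9, ... forever."""
    n = 0
    while True:
        print("computing square of", n)  # shows when work actually happens
        yield n * n
        n += 1

gen = squares()                      # nothing is computed yet
first_three = list(islice(gen, 3))   # computes exactly three items
print(first_three)                   # [0, 1, 4]

fourth = next(gen)                   # the generator resumes where it left off
print(fourth)                        # 9
```

Only four "computing" lines are printed in total, even though the generator itself never terminates.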
Some other corrections to help improve your understanding:
- The generator is not a pointer, and does not behave like the pointers you might be familiar with from other languages.
- One of the differences from other languages: as said above, each result of the generator is generated on the fly. The next result is not produced until it is requested.
- The keyword combination for ... in accepts an iterable object as its second argument.
- The iterable object can be a generator, as in your example case, but it can also be any other iterable object, such as a list, a dict, a str object (string), or a user-defined type that provides the required functionality.
- The iter function is applied to the object to get an iterator (by the way: don't use iter as a variable name in Python, as you have done; it is not a keyword, but the name shadows the built-in iter function). Actually, to be more precise, the object's __iter__ method is called (which is, for the most part, all the iter function does anyway; __iter__ is one of Python's so-called "magic methods").
- If the call to __iter__ is successful, the function next() is applied to the resulting iterator over and over again, in a loop, and the first variable supplied to for ... in is assigned the result of the next() function. (Remember: the iterator could be a generator, a container object's iterator, or any other iterator object.) Actually, to be more precise: it calls the iterator object's __next__ method, which is another "magic method".
- The for loop ends when next() raises the StopIteration exception (which usually happens when the iterator does not have another object to yield when next() is called).
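The steps in the list above can be walked through by hand with the built-in iter() and next() functions. This is a small sketch with illustrative names (letters, it, gen); it also hints at why a list can be looped over repeatedly while a generator cannot.

```python
letters = ["a", "b"]

it = iter(letters)        # calls letters.__iter__() and returns a list iterator
print(next(it))           # a
print(next(it))           # b
try:
    next(it)
except StopIteration:
    print("StopIteration: this is the signal that ends a for loop")

# A list hands out a fresh iterator every time, so it can be looped over repeatedly.
print(iter(letters) is iter(letters))   # False

# A generator is its own iterator, so every loop shares the same, possibly exhausted, state.
gen = (c for c in letters)
print(iter(gen) is gen)                 # True
```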
You can "manually" implement a for loop in Python this way (probably not perfect, but close enough):
try:
    temp = iterable.__iter__()  # ask the iterable for an iterator
except AttributeError:
    raise TypeError("'{}' object is not iterable".format(type(iterable).__name__))
else:
    while True:
        try:
            _ = temp.__next__()  # fetch the next item
        except StopIteration:
            break  # the iterator is exhausted: leave the loop
        except AttributeError:
            raise TypeError("iter() returned non-iterator of type '{}'".format(type(temp).__name__))
        # this is the "body" of the for loop
        continue
There is pretty much no difference between the above and your example code.
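For experimenting, the same idea can be wrapped in a function that uses the built-in iter() and next() instead of calling the magic methods directly. The helper name manual_for and its body callback are assumptions made for this sketch; it approximates the for statement and is not a description of how the interpreter implements it.

```python
def manual_for(iterable, body):
    """Rough equivalent of `for item in iterable: body(item)` (hypothetical helper)."""
    iterator = iter(iterable)        # raises TypeError if the object is not iterable
    while True:
        try:
            item = next(iterator)
        except StopIteration:
            break                    # normal end of the loop
        body(item)

seen = []
gen = (i for i in range(3))
manual_for(gen, seen.append)   # first pass consumes the generator
manual_for(gen, seen.append)   # second pass: StopIteration right away, body never runs
print(seen)                    # [0, 1, 2]
```

The second call mirrors the second loop in the question: the generator is asked for its next item, raises StopIteration immediately, and the body is never executed.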
Actually, the more interesting part of a for loop is not the for, but the in. Using in by itself produces a different effect than for ... in, but it is very useful to understand what in does with its arguments, since for ... in implements very similar behavior.
When used by itself, the in keyword first calls the object's __contains__ method, which is yet another "magic method" (note that this step is skipped when using for ... in). Using in by itself on a container, you can do things like this:
1 in [1, 2, 3] # True
'He' in 'Hello' # True
3 in range(10) # True
'eH' in 'Hello'[::-1] # True
If the iterable object is NOT a container (i.e. it doesn't have a __contains__ method), the in operator then tries to call the object's __iter__ method. As was said previously: the __iter__ method returns what is known in Python as an iterator. Basically, an iterator is an object that you can use the built-in generic function next() on [1]. A generator is just one type of iterator.
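One consequence of this fallback is that a membership test on a generator consumes items, just as a loop would. This is a short sketch with illustrative variable names, not code from the original answer.

```python
gen = (i for i in range(5))

found = 2 in gen          # no __contains__, so Python iterates until it finds 2
print(found)              # True

remaining = list(gen)
print(remaining)          # [3, 4]: 0, 1 and 2 were consumed by the membership test

missing = 10 in gen       # the generator is now exhausted, so nothing can match
print(missing)            # False
```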
If you wish to create your own object type to iterate over (i.e., you can use for ... in, or just in, on it), it's useful to know about the yield keyword, which is used in generators (as mentioned above).
class MyIterable():
    def __iter__(self):
        yield 1

m = MyIterable()
for _ in m: print(_)  # 1
1 in m  # True
The presence of yield turns a function or method into a generator instead of a regular function/method. You don't need the __next__ method if you use a generator (it brings __next__ along with it automatically).
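As a rough comparison, here is the same countdown written twice: once as an iterable whose __iter__ is a generator method, and once as a hand-written iterator that defines both __iter__ and __next__. Both class names are hypothetical. The first can be looped over repeatedly because each call to __iter__ creates a new generator; the second is exhausted after one pass, like the generator in the question.

```python
class Countdown:
    """Hypothetical iterable: __iter__ is a generator method, so each call starts over."""
    def __init__(self, start):
        self.start = start

    def __iter__(self):
        n = self.start
        while n > 0:
            yield n
            n -= 1


class CountdownIterator:
    """The same idea written out by hand as an iterator (both __iter__ and __next__)."""
    def __init__(self, start):
        self.n = start

    def __iter__(self):
        return self            # an iterator returns itself

    def __next__(self):
        if self.n <= 0:
            raise StopIteration
        value = self.n
        self.n -= 1
        return value


c = Countdown(3)
print(list(c), list(c))        # [3, 2, 1] [3, 2, 1]

it = CountdownIterator(3)
print(list(it), list(it))      # [3, 2, 1] []
```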
If you wish to create your own container object type (i.e., you can use in on it by itself, but NOT for ... in), you just need the __contains__ method.
class MyUselessContainer():
    def __contains__(self, obj):
        return True

m = MyUselessContainer()
1 in m  # True
'Foo' in m  # True
TypeError in m  # True
None in m  # True
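The "but NOT for ... in" part can be checked directly: an object that defines only __contains__ supports membership tests, but trying to loop over it raises TypeError. The OnlyContains class below is a hypothetical example written for this sketch.

```python
class OnlyContains:
    """Hypothetical container defining __contains__ but not __iter__."""
    def __contains__(self, obj):
        return obj == 42

box = OnlyContains()
print(42 in box)      # True
print(7 in box)       # False

message = None
try:
    for _ in box:     # no __iter__ (and no __getitem__), so this fails
        pass
except TypeError as exc:
    message = str(exc)
print(message)        # 'OnlyContains' object is not iterable
```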
[1] Note that, to be an iterator, an object must implement the iterator protocol. This only means that both the __next__ and __iter__ methods must be correctly implemented (generators come with this functionality "for free", so you don't need to worry about it when using them). Also note that the __next__ method is actually named next (no underscores) in Python 2.
[2] See this answer for the different ways to create iterable classes.