Python for loop and iterator behavior

2022-01-10 00:00:00 python iterator

Question

I wanted to understand a bit more about iterators, so please correct me if I'm wrong.

An iterator is an object which has a pointer to the next object and is read as a buffer or stream (i.e. a linked list). They're particularly efficient because all they do is tell you what is next by reference instead of using indexing.

However, I still don't understand why the following behavior is happening:

In [1]: iter = (i for i in range(5))

In [2]: for _ in iter:
   ....:     print(_)
   ....:     
0
1
2
3
4

In [3]: for _ in iter:
   ....:     print(_)
   ....:     

In [4]: 

After a first loop through the iterator (In [2]) it's as if it was consumed and left empty, so the second loop (In [3]) prints nothing.

However, I never assigned a new value to the iter variable.

What is really happening under the hood of the for loop?


Answer

Your suspicion is correct: the iterator has been consumed.

More precisely, your iterator is a generator: an object that can be iterated through only once.

type((i for i in range(5))) # <class 'generator'>

def another_generator():
    yield 1 # the yield expression makes this a generator function, not a regular function

type(another_generator()) # also <class 'generator'>: calling the function returns a generator
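
The one-shot behavior can be seen directly in a small sketch. The names squares, first_pass, second_pass and fresh_pass are illustrative only and do not come from the question; the point is simply that a second pass over the same generator object yields nothing, while a newly created generator starts over.

```python
def squares(n):
    """Generator function: lazily yields 0, 1, 4, ... for n items."""
    for i in range(n):
        yield i * i

gen = squares(4)
first_pass = list(gen)    # consumes the generator
second_pass = list(gen)   # the same object is already exhausted

print(first_pass)   # [0, 1, 4, 9]
print(second_pass)  # []

# To iterate again, create a fresh generator (or store the values in a list).
fresh_pass = list(squares(4))
print(fresh_pass)   # [0, 1, 4, 9]
```

If the values are needed more than once, storing them in a list trades memory for the ability to loop repeatedly.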

The reason they are efficient has nothing to do with telling you what is next "by reference." They are efficient because they only generate the next item upon request; the items are not all generated at once. In fact, you can have an infinite generator:

def my_gen():
    while True:
        yield 1 # again: yield makes this a generator function, not a regular function

for _ in my_gen(): print(_) # press Ctrl+C to stop this infinite loop!
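
Because items are produced only on request, an infinite generator can still be used safely as long as the consumer stops asking. A minimal sketch, assuming a hypothetical generator function named naturals and using itertools.islice from the standard library to take a finite slice:

```python
from itertools import islice

def naturals():
    """Hypothetical infinite generator: yields 0, 1, 2, ... forever."""
    n = 0
    while True:
        yield n
        n += 1

# islice stops requesting items after five, so the loop inside
# naturals() never runs further than needed.
first_five = list(islice(naturals(), 5))
print(first_five)  # [0, 1, 2, 3, 4]
```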

Some other corrections to help improve your understanding:

  • The generator is not a pointer, and does not behave like the pointers you might be familiar with from other languages.
  • One of the differences from other languages: as said above, each result of the generator is generated on the fly. The next result is not produced until it is requested.
  • The for ... in statement takes an iterable object after in.
  • The iterable object can be a generator, as in your example case, but it can also be any other iterable object, such as a list, or dict, or a str object (string), or a user-defined type that provides the required functionality.
  • The iter function is applied to the object to get an iterator (by the way: don't use iter as a variable name in Python, as you have done; it is not a keyword, but it is a built-in function, and the assignment shadows it). Actually, to be more precise, the object's __iter__ method is called (which is, for the most part, all the iter function does anyway; __iter__ is one of Python's so-called "magic methods").
  • If the call to __iter__ is successful, the function next() is applied to the resulting iterator over and over again, in a loop, and the variable named between for and in is assigned the result of each next() call. (Remember: the iterator could be a generator, or a container object's iterator, or any other iterator.) Actually, to be more precise: it calls the iterator object's __next__ method, which is another "magic method".
  • The for loop ends when next() raises the StopIteration exception (which happens when the iterator has no further object to yield when next() is called).
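
The points above also explain why the second loop in the question printed nothing. A short sketch, with illustrative variable names that are not from the original post: calling iter() on a list returns a new, independent iterator each time, whereas calling iter() on a generator returns the generator itself, so every loop shares the same already-consumed state.

```python
numbers = [1, 2, 3]
gen = (n for n in numbers)

# A list hands out a new, independent iterator on every iter() call.
list_iter_a = iter(numbers)
list_iter_b = iter(numbers)
print(list_iter_a is list_iter_b)  # False

# A generator is its own iterator: iter() returns the same object.
print(iter(gen) is gen)  # True

print(next(list_iter_a))  # 1
print(next(list_iter_a))  # 2
print(next(list_iter_b))  # 1 (unaffected by list_iter_a)

# Once an iterator is exhausted, next() raises StopIteration.
for _ in gen:
    pass
try:
    next(gen)
    exhausted = False
except StopIteration:
    exhausted = True
print(exhausted)  # True
```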

You can "manually" implement a for loop in Python this way (probably not perfect, but close enough):

iterable = range(3)  # any iterable object; chosen here only so the snippet runs

try:
    temp = iterable.__iter__()
except AttributeError:
    raise TypeError("'{}' object is not iterable".format(type(iterable).__name__))
else:
    while True:
        try:
            _ = temp.__next__()
        except StopIteration:
            break
        except AttributeError:
            raise TypeError("iter() returned non-iterator of type '{}'".format(type(temp).__name__))
        # this is the "body" of the for loop
        print(_)

There is pretty much no difference between the above and your example code.

Actually, the more interesting part of a for loop is not the for, but the in. Used by itself, in has a different effect than it does in for in, but it is very useful to understand what in does with its arguments, since for in implements very similar behavior.

  • When used by itself, the in keyword first calls the object's __contains__ method, which is yet another "magic method" (note that this step is skipped when using for in). Using in by itself on a container, you can do things like this:

1 in [1, 2, 3] # True
'He' in 'Hello' # True
3 in range(10) # True
'eH' in 'Hello'[::-1] # True

  • If the iterable object is NOT a container (i.e. it doesn't have a __contains__ method), in next tries to call the object's __iter__ method. As was said previously: the __iter__ method returns what is known in Python as an iterator. Basically, an iterator is an object that you can use the built-in generic function next() on (see note 1 below). A generator is just one type of iterator.

    If you wish to create your own object type to iterate over (i.e., you can use for in, or just in, on it), it's useful to know about the yield keyword, which is used in generators (as mentioned above).

    class MyIterable():
        def __iter__(self):
            yield 1
    
    m = MyIterable()
    for _ in m: print(_) # 1
    1 in m # True    
    

    The presence of yield turns a function or method into a generator function instead of a regular function/method: calling it returns a generator. You don't need to write a __next__ method if you use a generator (it brings __next__ along with it automatically).

    If you wish to create your own container object type (i.e., you can use in on it by itself, but NOT for in), you just need the __contains__ method.

    class MyUselessContainer():
        def __contains__(self, obj):
            return True
    
    m = MyUselessContainer()
    1 in m # True
    'Foo' in m # True
    TypeError in m # True
    None in m # True
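
Returning to the fallback described earlier: when an object has no __contains__ method, in searches by iterating. For a generator, that search consumes items as a side effect. A brief sketch, with illustrative variable names that are not from the original post:

```python
gen = (i for i in range(5))

found = 2 in gen          # iterates until it finds 2, consuming 0, 1 and 2
remaining = list(gen)     # only the unconsumed items are left
found_again = 2 in gen    # the generator is exhausted, so nothing matches

print(found)        # True
print(remaining)    # [3, 4]
print(found_again)  # False
```

This is the same consumption effect the question ran into with two consecutive for loops.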
    

    1 Note that, to be an iterator, an object must implement the iterator protocol. This only means that both the __next__ and __iter__ methods must be correctly implemented (generators come with this functionality "for free", so you don't need to worry about it when using them). Also note that the __next__ method is actually named next (no underscores) in Python 2.

    2 See this answer for the different ways to create iterable classes.
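
As a sketch of the iterator protocol described in note 1, here is one way to write an iterator class by hand in Python 3, without yield. The class name Countdown and its attribute current are hypothetical and chosen only for illustration.

```python
class Countdown:
    """Hypothetical hand-written iterator: counts down from start to 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        # An iterator returns itself from __iter__.
        return self

    def __next__(self):
        # Signal exhaustion by raising StopIteration.
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value

c = Countdown(3)
first_pass = list(c)
second_pass = list(c)   # like a generator, the iterator is now exhausted

print(first_pass)   # [3, 2, 1]
print(second_pass)  # []
```

Because __iter__ returns self, this object behaves like a generator: it can be looped over only once.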
