Python: `yield from`, or return a generator?

2022-01-19 00:00:00 python generator function return

Problem description

I wrote this simple piece of code:

def mymap(func, *seq):
  return (func(*args) for args in zip(*seq))

Should I use the 'return' statement as above to return a generator, or use a 'yield from' instruction like this:

def mymap(func, *seq):
  yield from (func(*args) for args in zip(*seq))

And beyond the technical difference between 'return' and 'yield from', which is the better approach in the general case?
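For reference, both variants produce the same values when iterated; the difference only shows in *when* the function body runs. A quick sanity check (the `mymap_return`/`mymap_yield` names are just labels for the two variants from the question):

```python
def mymap_return(func, *seq):
    # factory: builds and returns a generator immediately
    return (func(*args) for args in zip(*seq))

def mymap_yield(func, *seq):
    # generator function: body runs lazily, on first next()
    yield from (func(*args) for args in zip(*seq))

print(list(mymap_return(pow, [2, 3], [3, 2])))  # [8, 9]
print(list(mymap_yield(pow, [2, 3], [3, 2])))   # [8, 9]
```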


Solution

The difference is that your first mymap is just a usual function, in this case a factory which returns a generator. Everything inside the body gets executed as soon as you call the function.

def gen_factory(func, seq):
    """Generator factory returning a generator."""
    # do stuff ... immediately when factory gets called
    print("build generator & return")
    return (func(*args) for args in seq)

The second mymap is also a factory, but it's also a generator itself, yielding from a self-built sub-generator inside. Because it is a generator itself, execution of the body does not start until the first invocation of next(generator).

def gen_generator(func, seq):
    """Generator yielding from sub-generator inside."""
    # do stuff ... first time when 'next' gets called
    print("build generator & yield")
    yield from (func(*args) for args in seq)

I think the following example will make it clearer. We define data packages which shall be processed with functions, bundled up in jobs we pass to the generators.

def add(a, b):
    return a + b

def sqrt(a):
    return a ** 0.5

data1 = [*zip(range(1, 5))]  # [(1,), (2,), (3,), (4,)]
data2 = [(2, 1), (3, 1), (4, 1), (5, 1)]

job1 = (sqrt, data1)
job2 = (add, data2)

Now we run the following code inside an interactive shell like IPython to see the different behavior. gen_factory immediately prints out, while gen_generator only does so after next() is called.

gen_fac = gen_factory(*job1)
# build generator & return <-- printed immediately
next(gen_fac)  # start
# Out: 1.0
[*gen_fac]  # deplete rest of generator
# Out: [1.4142135623730951, 1.7320508075688772, 2.0]

gen_gen = gen_generator(*job1)
next(gen_gen)  # start
# build generator & yield <-- printed with first next()
# Out: 1.0
[*gen_gen]  # deplete rest of generator
# Out: [1.4142135623730951, 1.7320508075688772, 2.0]
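One practical consequence of this eager-vs-lazy difference (a common illustration, not taken from the answer itself): code before the first yield, such as argument validation, runs at call time in the factory version but is deferred to the first next() in the generator version. The `factory`/`generator` names below are hypothetical:

```python
def factory(seq):
    if not seq:
        raise ValueError("empty input")  # raised at call time
    return (x * 2 for x in seq)

def generator(seq):
    if not seq:
        raise ValueError("empty input")  # raised only on first next()
    yield from (x * 2 for x in seq)

try:
    factory([])            # fails immediately
except ValueError:
    print("factory failed at call time")

gen = generator([])        # no error yet -- body has not started
try:
    next(gen)              # now the validation runs
except ValueError:
    print("generator failed on first next()")
```

This is why generator functions that need early validation are often split into a plain wrapper that checks arguments and an inner generator that does the yielding.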

To give you a more reasonable use case for a construct like gen_generator, we'll extend it a little and make a coroutine out of it by assigning the yield to variables, so we can inject jobs into the running generator with send().

Additionally, we create a helper function which runs all tasks inside a job and asks us for a new one upon completion.

def gen_coroutine():
    """Generator coroutine yielding from sub-generator inside."""
    # do stuff... first time when 'next' gets called
    print("receive job, build generator & yield, loop")
    while True:
        try:
            func, seq = yield "send me work ... or I quit with next next()"
        except TypeError:
            return "no job left"
        else:
            yield from (func(*args) for args in seq)


def do_job(gen, job):
    """Run all tasks in job."""
    print(gen.send(job))
    while True:
        result = next(gen)
        print(result)
        if result == "send me work ... or I quit with next next()":
            break

Now we run gen_coroutine with our helper function do_job and two jobs.

gen_co = gen_coroutine()
next(gen_co)  # start
# receive job, build generator & yield, loop  <-- printed with first next()
# Out: 'send me work ... or I quit with next next()'
do_job(gen_co, job1)  # prints out all results from job
# 1.0
# 1.4142135623730951
# 1.7320508075688772
# 2.0
# send me work ... or I quit with next next()
do_job(gen_co, job2)  # send another job into generator
# 3
# 4
# 5
# 6
# send me work ... or I quit with next next()
next(gen_co)
# Traceback ...
# StopIteration: no job left
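Where does that final StopIteration come from? A bare next() sends None into the coroutine, unpacking None into `func, seq` raises TypeError inside the try block, and the `return "no job left"` turns into the StopIteration value. A minimal sketch of the same mechanism (`co` is a hypothetical stripped-down version of gen_coroutine):

```python
def co():
    while True:
        try:
            # next() sends None; unpacking None raises TypeError
            func, seq = yield "ready"
        except TypeError:
            return "no job left"   # becomes StopIteration's value

c = co()
print(next(c))                     # ready  (prime: run to first yield)
try:
    next(c)                        # sends None -> TypeError -> return
except StopIteration as e:
    print(e.value)                 # no job left
```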

To come back to your question of which version is the better approach in general: IMO, something like gen_factory only makes sense if you need the same thing done for multiple generators you are going to create, or in cases where the construction process for your generators is complicated enough to justify using a factory instead of building individual generators in place with a generator comprehension.

The description above for the gen_generator function (second mymap) states "it is a generator itself". That is a bit vague and technically not really correct, but facilitates reasoning about the differences of the functions in this tricky setup where gen_factory also returns a generator, namely that one built by the generator comprehension inside.

In fact any function (not only those from this question with generator comprehensions inside!) with a yield inside, upon invocation, just returns a generator object which gets constructed out of the function body.

type(gen_coroutine)  # function
gen_co = gen_coroutine(); type(gen_co)  # generator

So the whole action we observed above for gen_generator and gen_coroutine takes place within the generator objects that these functions (with yield inside) have spit out beforehand.
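This distinction can also be checked programmatically: the standard library's inspect module tells a generator function (one containing yield) apart from a plain factory that merely returns a generator. A sketch, assuming the gen_factory and gen_generator definitions from above:

```python
import inspect

def gen_factory(func, seq):
    """Ordinary function returning a generator built by a comprehension."""
    return (func(*args) for args in seq)

def gen_generator(func, seq):
    """Generator function: contains yield, so calling it yields a generator."""
    yield from (func(*args) for args in seq)

print(inspect.isgeneratorfunction(gen_factory))    # False: ordinary function
print(inspect.isgeneratorfunction(gen_generator))  # True: body contains yield
print(inspect.isgenerator(gen_factory(abs, [(-1,)])))  # True: its return value is a generator
```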
