What does the delayed() function do (when used with joblib in Python)?
Question
I've read through the documentation, but I don't understand what is meant by:
"The delayed function is a simple trick to be able to create a tuple (function, args, kwargs) with a function-call syntax."
I'm using it to iterate over the list I want to operate on (allImages) as follows:
from joblib import Parallel, delayed

def joblib_loop():
    return Parallel(n_jobs=8)(delayed(getHog)(i) for i in allImages)
This returns my HOG features, as I want (and with the speed gain from using all 8 of my cores), but I'm not sure what it is actually doing.
My Python knowledge is alright at best, and it's very possible that I'm missing something basic. Any pointers in the right direction would be most appreciated.
Answer
Perhaps things become clearer if we look at what would happen if we instead simply wrote
Parallel(n_jobs=8)(getHog(i) for i in allImages)
which, in this context, could be expressed more naturally as:
- create a Parallel instance with n_jobs=8
- create the list [getHog(i) for i in allImages]
- pass that list to the Parallel instance
What's the problem? By the time the list gets passed to the Parallel object, all getHog(i) calls have already returned, so there's nothing left to execute in parallel! All the work was already done in the main thread, sequentially.
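The eager evaluation described above can be observed without joblib at all. The following is a minimal sketch: get_hog and fake_parallel are hypothetical stand-ins (not joblib APIs) that only record when the calls happen.

```python
calls = []

def get_hog(image):
    # Hypothetical stand-in for getHog: records that it ran.
    calls.append(image)
    return image * 2

def fake_parallel(tasks):
    # Hypothetical stand-in for a Parallel instance: checks how many
    # get_hog calls had already run when it received its argument.
    done_on_entry = len(calls)
    return done_on_entry, list(tasks)

all_images = [1, 2, 3]

# The list comprehension runs every get_hog call before fake_parallel starts.
already_done, results = fake_parallel([get_hog(i) for i in all_images])
print(already_done)  # 3
print(results)       # [2, 4, 6]
```

All three calls have finished before the receiving function runs a single line, so it only ever sees results, never work it could distribute.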
What we actually want is to tell Python which functions we want to call with which arguments, without actually calling them. In other words, we want to delay the execution.
This is what delayed conveniently allows us to do, with clear syntax. If we want to tell Python that we'd like to call foo(2, g=3) sometime later, we can simply write delayed(foo)(2, g=3). What is returned is the tuple (foo, (2,), {'g': 3}), containing:
- a reference to the function we want to call, e.g. foo
- all positional arguments ("args" for short), i.e. those passed without a keyword, e.g. 2
- all keyword arguments ("kwargs" for short), e.g. g=3
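To make the tuple concrete, here is a minimal sketch of the idea behind delayed. The name my_delayed is hypothetical and this is not joblib's actual source; it only illustrates how a call can be captured as (function, args, kwargs) instead of being executed. Note that the positional arguments are captured as a tuple.

```python
import functools

def my_delayed(function):
    """Hypothetical re-implementation of the idea behind delayed."""
    @functools.wraps(function)
    def capture(*args, **kwargs):
        # Do not call `function`; just record what the call would be.
        return function, args, kwargs
    return capture

def foo(x, g=0):
    return x + g

task = my_delayed(foo)(2, g=3)
print(task[1], task[2])  # (2,) {'g': 3}

# Later, the captured call can be replayed by unpacking the tuple.
func, args, kwargs = task
print(func(*args, **kwargs))  # 5
```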
So, by writing Parallel(n_jobs=8)(delayed(getHog)(i) for i in allImages), instead of the above sequence, the following now happens:
- A Parallel instance with n_jobs=8 gets created
- The list [delayed(getHog)(i) for i in allImages] gets created, evaluating to [(getHog, (img1,), {}), (getHog, (img2,), {}), ...]
- That list is passed to the Parallel instance
- The Parallel instance creates 8 workers (threads or processes, depending on the backend) and distributes the tuples from the list to them
Finally, each of those workers starts executing the tuples, i.e., it calls the first element with the second and third elements unpacked as arguments, tup[0](*tup[1], **tup[2]), turning the tuple back into the call we actually intended to make, e.g. getHog(img2).
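The whole sequence can be sketched with the standard library alone. This is an illustration under stated assumptions, not how joblib is implemented: ThreadPoolExecutor stands in for the Parallel instance, and delayed_sketch, run_task, and get_hog are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor

def delayed_sketch(function):
    # Hypothetical helper: capture the call instead of making it.
    def capture(*args, **kwargs):
        return function, args, kwargs
    return capture

def run_task(tup):
    # Turn the tuple back into the call it describes.
    return tup[0](*tup[1], **tup[2])

def get_hog(image):
    # Hypothetical stand-in for getHog.
    return sum(image)

all_images = [[1, 2], [3, 4], [5, 6]]

# No get_hog call happens here; only tuples describing the calls are built.
tasks = [delayed_sketch(get_hog)(i) for i in all_images]

with ThreadPoolExecutor(max_workers=8) as pool:
    # The workers execute the captured calls; map keeps the input order.
    features = list(pool.map(run_task, tasks))

print(features)  # [3, 7, 11]
```

The key point is that the pool receives descriptions of work rather than finished results, which is what leaves something to distribute.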