Multiprocessing pool "apply_async" only seems to call the function once
Problem description
I've been following the docs to try to understand multiprocessing pools. I came up with this:
import time
from multiprocessing import Pool
def f(a):
    print 'f(' + str(a) + ')'
    return True
t = time.time()
pool = Pool(processes=10)
result = pool.apply_async(f, (1,))
print result.get()
pool.close()
print ' [i] Time elapsed ' + str(time.time() - t)
I'm trying to use 10 processes to evaluate the function f(a). I've put a print statement in f.
This is the output I'm getting:
$ python pooltest.py
f(1)
True
[i] Time elapsed 0.0270888805389
It appears to me that the function f is only getting evaluated once.
I'm likely not using the right method, but the end result I'm looking for is to run f with 10 processes simultaneously, and get the result returned by each one of those processes. So I would end up with a list of 10 results (which may or may not be identical).
The docs on multiprocessing are quite confusing, and it's not trivial to figure out which approach I should be taking. It seems to me that f should be run 10 times in the example I provided above.
Answer

apply_async isn't meant to launch multiple processes; it's just meant to call the function with the given arguments in one of the pool's processes. You'll need to make 10 calls if you want the function to be called 10 times.
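For instance, a minimal sketch of getting f evaluated 10 times (reusing f and Pool from the question's code):

pool = Pool(processes=10)
# one apply_async call per desired evaluation of f
async_results = [pool.apply_async(f, (1,)) for _ in range(10)]
pool.close()
pool.join()
# each get() blocks until that particular call's result is ready
print([r.get() for r in async_results])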
First, note the docs on apply() (emphasis added):
apply(func[, args[, kwds]])
Call func with arguments args and keyword arguments kwds. It blocks until the result is ready. Given this blocks, apply_async() is better suited for performing work in parallel. Additionally, *func is only executed in one of the workers of the pool*.
Now, in the docs for apply_async():
apply_async(func[, args[, kwds[, callback[, error_callback]]]])
A variant of the apply() method which returns a result object.
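To make the distinction concrete, a sketch (again reusing pool and f from the question):

# apply() blocks until the call completes, then returns the value directly
value = pool.apply(f, (1,))

# apply_async() returns an AsyncResult immediately; get() blocks for the value
result = pool.apply_async(f, (1,))
value = result.get()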
The difference between the two is just that apply_async returns immediately. You can use map() to call a function multiple times, though if you're calling with the same inputs, then it's a little redundant to create a list of the same argument repeated just to have a sequence of the right length.
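For example, a sketch of the map() route with that throwaway list of identical arguments (reusing pool and f):

# pool.map fans the iterable out across the worker processes
results = pool.map(f, [1] * 10)  # calls f(1) ten times and returns a list of 10 results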
However, if you're calling different functions with the same input, then you're really just calling a higher-order function, and you could do it with map() or map_async() like this:
pool.map(lambda f: f(1), functions)
except that lambda functions aren't picklable, so you'd need to use a function defined at module level (see How to let Pool.map take a lambda function).
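A sketch of such a module-level replacement for the lambda (call_one is a name made up here):

def call_one(f):
    # picklable stand-in for the unpicklable lambda f: f(1)
    return f(1)

results = pool.map(call_one, functions)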
You might be tempted to use the builtin apply() (not the multiprocessing one), although it's deprecated; but it doesn't quite fit either, because pool.map passes each element of the iterable to the function as a single argument, so in pool.map(apply, [(f, 1) for f in functions]) each apply would receive the whole (f, 1) tuple rather than f and 1 separately.
It's easy enough to write your own wrapper that unpacks the tuple, too:

def apply_(args):
    # pool.map hands each item over as a single argument, so unpack it here
    f = args[0]
    return f(*args[1:])

pool.map(apply_, [(f, 1) for f in functions])
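Putting it together, a self-contained sketch (square and negate are illustrative stand-ins for the functions list):

from multiprocessing import Pool

def square(x):
    return x * x

def negate(x):
    return -x

def apply_(args):
    # unpack the (function, argument, ...) tuple that pool.map passes as one item
    return args[0](*args[1:])

if __name__ == '__main__':
    pool = Pool(processes=2)
    functions = [square, negate]
    print(pool.map(apply_, [(f, 1) for f in functions]))  # prints [1, -1]
    pool.close()
    pool.join()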