Python multiprocessing pool, join; not waiting before continuing?
Problem description
(1) I'm trying to use pool.map followed by pool.join(), but Python doesn't seem to be waiting for pool.map to finish before going on past the pool.join(). Here's a simple example of what I've tried:
from multiprocessing import Pool

foo = {1: []}

def f(x):
    foo[1].append(x)  # append x to the list stored under key 1
    print(foo)

def main():
    pool = Pool()
    pool.map(f, range(100))
    pool.close()
    pool.join()
    print(foo)

if __name__ == '__main__':
    main()
The printed output is just {1: []}, as if Python just ignored the join command and ran the final print before it had a chance to run f. The intended result is that foo is {1: [0, 1, ..., 99]}, and using the ordinary built-in Python map gives this result. Why is the pooled version printing {1: []}, and how can I change my code to make it print the intended result?
(2) Ideally I'd also like to define foo as a local variable in main() and pass it to f, but doing this by making foo the first argument of f and using
pool.map(functools.partial(f, foo), range(100))
produces the same output. (And possibly each process now has its own copy of foo?) Again, though, it works with the normal map.
Solution
This is not the correct way to use map.
- Using a global variable that way is absolutely wrong. Processes do not (normally) share memory, so each worker process that runs f has its own copy of foo. To share a variable between different processes, you should use a Manager.
- Functions passed to map are usually expected to return a value.
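A minimal Python 3 sketch (not part of the original answer) makes the first point concrete: the worker processes append to their own copies of the module-level foo, so the parent's dictionary stays empty, while the values that f returns do travel back through map. The pool size of 2, the range of 10 and the helper name run_demo are arbitrary choices for illustration.

```python
from multiprocessing import Pool

foo = {1: []}  # module-level global, as in the question


def f(x):
    # Runs inside a worker process, so this appends to that worker's
    # copy of foo, never to the parent's dictionary.
    foo[1].append(x)
    return x  # return values, by contrast, are sent back to the parent


def run_demo():
    # run_demo is a hypothetical helper used only for this illustration.
    with Pool(processes=2) as pool:
        # map blocks until every result has arrived
        results = pool.map(f, range(10))
    return foo, results


if __name__ == "__main__":
    parent_foo, results = run_demo()
    print(parent_foo)  # {1: []}
    print(results)     # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

So join() did wait; the appends simply happened in other processes' memory.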
I suggest you read the multiprocessing documentation.
However, here is a dummy example of how you could implement it:
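from multiprocessing import Pool

foo = {1: []}

def f(x):
    return x  # send the result back to the parent instead of mutating a global

def main():
    pool = Pool()
    foo[1] = pool.map(f, range(100))  # map collects the return values, in input order
    pool.close()
    pool.join()
    print(foo)

if __name__ == '__main__':
    main()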
You may also do something like pool.map(functools.partial(f, foo), range(100)), where foo is a proxy object created by a Manager (for example, manager.list()).
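The Manager approach mentioned above might look like the following Python 3 sketch. It assumes that a list proxy from manager.list() is an acceptable stand-in for the question's foo = {1: [...]}; the function and variable names are illustrative only. Because the workers append concurrently, the final order of the list is not guaranteed.

```python
import functools
from multiprocessing import Manager, Pool


def f(shared, x):
    # shared is a list proxy: append() is forwarded to the manager process,
    # so appends from every worker end up in the same list.
    shared.append(x)


def main():
    with Manager() as manager:
        foo = manager.list()  # proxy to a list held by the manager process
        with Pool() as pool:
            # partial binds the proxy as f's first argument, as in the question
            pool.map(functools.partial(f, foo), range(100))
        result = list(foo)  # copy the contents out before the manager shuts down
    return result


if __name__ == "__main__":
    result = main()
    # Append order depends on scheduling, so compare a sorted copy.
    print(sorted(result) == list(range(100)))  # True
```

One caveat if you keep the question's dictionary shape: with foo = manager.dict({1: []}), the call foo[1].append(x) appends to a local copy of the inner list, and that change is not propagated back to the manager. Sharing a list proxy directly, as here, avoids the problem; returning values from f, as in the dummy example, avoids shared state altogether.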