Retrieve the exit codes of processes launched with multiprocessing.Pool.map
Problem description
I'm using the Python multiprocessing module to parallelize some computationally heavy tasks. The obvious choice is to use a Pool of workers and then call its map method.
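For reference, here is a minimal sketch of the setup described above. `heavy` is a hypothetical stand-in for the real computationally heavy task.

```python
import multiprocessing


def heavy(x):
    """Hypothetical stand-in for a computationally heavy task."""
    return sum(i * i for i in range(x))


if __name__ == "__main__":
    # With no argument, Pool() starts one worker process per available CPU.
    with multiprocessing.Pool() as pool:
        # map() blocks until every item has been processed and returns the
        # results in the same order as the input iterable.
        results = pool.map(heavy, [10, 100, 1000])
    print(results)
```

The `if __name__ == "__main__":` guard is needed on platforms that start workers by re-importing the main module (the spawn and forkserver start methods).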
However, processes can fail. For instance, they may be silently killed by the oom-killer. Therefore, I would like to be able to retrieve the exit codes of the processes launched with map.
Additionally, for logging purposes, I would like to know the PID of the process that executes each value in the iterable.
Solution
If you're using multiprocessing.Pool.map, you're generally not interested in the exit codes of the sub-processes in the pool. What you care about are the values they return from their work items. Under normal conditions, the processes in a Pool won't exit until you close/join the pool. The exception is when you pass maxtasksperchild, which makes each worker exit and be replaced after that many tasks. So there are no exit codes to retrieve until all work is complete and the Pool is about to be destroyed. Because of this, there is no public API to get the exit codes of those sub-processes.
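multiprocessing.Process does expose exitcode publicly. One alternative, not the approach this answer takes, is to start one Process per item yourself instead of using a Pool. This is a minimal sketch under some assumptions:

- One process per item is affordable for your workload.
- The system is POSIX. The hypothetical `work` function simulates the oom-killer by sending itself SIGKILL.
- `run_with_exit_codes` is also a hypothetical helper.

A Process discards its target's return value, so in real code you would need a Queue or similar to collect results.

```python
import multiprocessing
import os
import signal


def work(x):
    """Hypothetical task; x == 3 simulates an oom-kill (POSIX only)."""
    if x == 3:
        os.kill(os.getpid(), signal.SIGKILL)  # stand-in for the oom-killer
    return x * x  # note: Process discards the target's return value


def run_with_exit_codes(items):
    """Run work(x) in its own Process per item; return {x: (pid, exitcode)}."""
    procs = {x: multiprocessing.Process(target=work, args=(x,)) for x in items}
    for p in procs.values():
        p.start()
    for p in procs.values():
        p.join()  # after join(), exitcode is 0 on success or -N if killed by signal N
    return {x: (p.pid, p.exitcode) for x, p in procs.items()}


if __name__ == "__main__":
    for x, (pid, code) in run_with_exit_codes(range(5)).items():
        print(f"item={x} pid={pid} exitcode={code}")
```

The exit codes have three possible meanings:

- A process killed by a signal reports a negative exit code (-9 for SIGKILL).
- An uncaught exception in the target gives exit code 1.
- A clean run gives 0.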
Now, you're worried about exceptional conditions, where something out-of-band kills one of the sub-processes while it's doing work. If you hit an issue like this, you're likely to run into some strange behavior. In my tests, I killed a process in a Pool while it was working on part of a map call, and map never returned. The task that process had been running never completed. Python did, however, immediately launch a new process to replace the one I killed.
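For comparison only, since this is a different standard-library tool from the Pool discussed here: in Python 3, concurrent.futures.ProcessPoolExecutor notices when a worker dies abruptly. Instead of hanging, it fails the outstanding work with BrokenProcessPool, and the broken executor cannot be reused afterwards. The sketch below assumes POSIX, and `task` and `map_or_none` are hypothetical names.

```python
import concurrent.futures
import os
import signal
from concurrent.futures.process import BrokenProcessPool


def task(x):
    """Hypothetical task; x == 3 simulates an oom-kill (POSIX only)."""
    if x == 3:
        os.kill(os.getpid(), signal.SIGKILL)
    return x * x


def map_or_none(items):
    """Return the mapped results, or None if a worker process died abruptly."""
    try:
        with concurrent.futures.ProcessPoolExecutor(max_workers=2) as executor:
            return list(executor.map(task, items))
    except BrokenProcessPool:
        # Raised for unfinished results once the executor detects a dead worker.
        return None


if __name__ == "__main__":
    print(map_or_none([0, 1, 2]))
    print(map_or_none(range(5)))
```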
That said, you can get the PID of each process in your pool by accessing the multiprocessing.Process objects inside the pool directly, using the private _pool attribute:
```python
import multiprocessing

pool = multiprocessing.Pool()
for proc in pool._pool:  # _pool is private and may change between Python versions
    print(proc.pid)
```
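The _pool list tells you which workers exist, but not which worker handled which item. For the per-item logging asked about in the question, one public-API option is to have the task return os.getpid() alongside its result. `heavy` and `heavy_with_pid` are hypothetical names in this sketch.

```python
import multiprocessing
import os


def heavy(x):
    """Hypothetical stand-in for the real computation."""
    return x * x


def heavy_with_pid(x):
    """Wrap the task so each result records the PID of the worker that ran it."""
    return os.getpid(), heavy(x)


if __name__ == "__main__":
    items = [1, 2, 3, 4]
    with multiprocessing.Pool(2) as pool:
        for item, (pid, result) in zip(items, pool.map(heavy_with_pid, items)):
            print(f"item={item} handled by pid={pid} -> {result}")
```

This does not reveal exit codes, but it needs no private attributes, and the PID travels with each result.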
So, one thing you could do is try to detect when a process has died unexpectedly (assuming you don't get stuck in a blocking call as a result). You can do this by snapshotting the list of processes in the pool before calling map_async and then checking those processes while the call is in progress:
```python
before = pool._pool[:]  # Make a copy of the list of Process objects in our pool
result = pool.map_async(func, iterable)  # Use map_async so we don't get stuck.
while not result.ready():  # Wait for the call to complete
    # exitcode is None while a process runs, 0 after a clean exit, -N if killed by signal N.
    if any(proc.exitcode for proc in before):  # Abort if an original process died abnormally.
        print("One of our processes has exited. Something probably went horribly wrong.")
        break
    result.wait(timeout=1)
else:  # We'll enter this block if we don't reach `break` above.
    print(result.get())  # Actually fetch the result list here.
```
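The snippet above assumes that pool, func and iterable already exist. Below is a self-contained sketch of the same idea, wrapped in a hypothetical helper `map_detecting_deaths` that returns the PIDs and exit codes of crashed workers instead of printing. It still relies on the private _pool attribute. It also assumes POSIX, since the hypothetical `work` function simulates the oom-killer by sending itself SIGKILL.

```python
import multiprocessing
import os
import signal


def work(x):
    """Hypothetical task; x == 3 simulates an oom-kill (POSIX only)."""
    if x == 3:
        os.kill(os.getpid(), signal.SIGKILL)
    return x * x


def map_detecting_deaths(pool, func, iterable, poll_seconds=0.5):
    """Return (results, dead); dead lists (pid, exitcode) of crashed workers.

    Relies on the private Pool._pool attribute, which may change between versions.
    """
    before = pool._pool[:]  # snapshot: the pool removes dead workers from _pool
    result = pool.map_async(func, iterable)
    while not result.ready():
        # None (still running) and 0 (clean exit) are both falsy here.
        dead = [(p.pid, p.exitcode) for p in before if p.exitcode]
        if dead:
            return None, dead
        result.wait(timeout=poll_seconds)
    return result.get(), []


if __name__ == "__main__":
    with multiprocessing.Pool(2) as pool:  # leaving the block calls terminate()
        print(map_detecting_deaths(pool, work, range(5)))
```

Using the Pool as a context manager matters here. Leaving the with block calls terminate(), which shuts the pool down even though the lost task never finishes.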
We have to make a copy of the list because when a process in the Pool dies, Python immediately replaces it with a new process and removes the dead one from the list.
This worked for me in my tests, but because it relies on a private attribute of the Pool object (_pool), it's risky to use in production code. I would also suggest that worrying too much about this scenario may be overkill, since it's very unlikely to occur and it complicates the implementation significantly.