Getting a unique ID for a worker in a Python multiprocessing pool
Problem description
Is there a way to assign each worker in a Python multiprocessing pool a unique ID, so that a job run by a particular worker can tell which worker is running it? According to the docs, a Process has a name, but:
"The name is a string used for identification purposes only. It has no semantics. Multiple processes may be given the same name."
For my particular use case, I want to run a batch of jobs on a group of four GPUs and need to set the device number of the GPU each job should run on. Because the jobs are of non-uniform length, I want to be sure that a job never starts on a GPU before the previous job on that GPU has finished, which rules out pre-assigning a device ID to each unit of work ahead of time.
Solution
It seems like what you want is simple: multiprocessing.current_process(). For example:
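import multiprocessing

def f(x):
    # Print the Process object of the worker handling this task.
    print(multiprocessing.current_process())
    return x * x

if __name__ == '__main__':
    # The guard keeps child processes from re-creating the pool on import.
    with multiprocessing.Pool() as p:
        print(p.map(f, range(6)))
Output (captured under Python 2; Python 3 prefixes worker names with the start method, e.g. ForkPoolWorker-1, and formats the repr differently):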
$ python foo.py
<Process(PoolWorker-1, started daemon)>
<Process(PoolWorker-2, started daemon)>
<Process(PoolWorker-3, started daemon)>
<Process(PoolWorker-1, started daemon)>
<Process(PoolWorker-2, started daemon)>
<Process(PoolWorker-4, started daemon)>
[0, 1, 4, 9, 16, 25]
This returns the Process object itself, so the process can serve as its own identity. You could also call id on it for a numeric ID. In CPython this is the memory address of the Process object, which is unique only among objects alive in the same interpreter, so values computed in different worker processes are not guaranteed to differ. Finally, you can use the ident or pid attribute of the process, but that is only set once the process has started.
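As a minimal sketch of reading these identifiers from inside a task, assuming Python 3, each task below reports the name and pid of the worker that ran it. The function name describe_worker and the pool size of 4 are illustrative choices, not part of the original answer.

```python
import multiprocessing


def describe_worker(x):
    """Return the task result plus the name and pid of the process that ran it."""
    proc = multiprocessing.current_process()
    # proc.pid is already set here because the worker process is running.
    return x * x, proc.name, proc.pid


def main():
    # Pool size of 4 is an arbitrary choice for this sketch.
    with multiprocessing.Pool(processes=4) as pool:
        for result, name, pid in pool.map(describe_worker, range(6)):
            print(result, name, pid)


if __name__ == "__main__":
    main()
```

Which worker handles which task is up to the pool's scheduling, so the name and pid printed for a given input can change from run to run.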
Furthermore, looking over the source, it seems very likely that autogenerated names (the first value in the Process repr strings above) are unique. multiprocessing maintains an itertools.count object for every process, which is used to generate an _identity tuple for any child processes it spawns. So the top-level process produces child processes with single-value IDs, those spawn processes with two-value IDs, and so on. Then, if no name is passed to the Process constructor, it autogenerates the name from the _identity, using ':'.join(...). Pool then alters the name of the process using replace, leaving the autogenerated ID the same.
The upshot of all this is that although two Processes may have the same name, because you may assign the same name when you create them, the names are unique if you don't touch the name parameter. Also, you could in theory use _identity as a unique identifier, but I gather that variable was made private for a reason!
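For the four-GPU use case in the question, one possible sketch, assuming Python 3, is to parse the trailing counter out of the autogenerated worker name and map it to a device number. The helpers worker_index and device_for_current_worker and the constant NUM_GPUS are hypothetical names introduced here. The modulo mapping only avoids collisions while the pool has exactly NUM_GPUS workers and never replaces one (for example via maxtasksperchild), since a replacement worker gets a fresh counter value.

```python
import multiprocessing

NUM_GPUS = 4  # assumption: four devices, as in the question


def worker_index(name):
    """Extract the trailing counter from an autogenerated pool worker name.

    'PoolWorker-3' and 'ForkPoolWorker-3' both give 3. Names without a plain
    integer suffix (custom names, 'MainProcess', 'Process-1:2') raise ValueError.
    """
    _, sep, suffix = name.rpartition("-")
    if not sep or not suffix.isdigit():
        raise ValueError("not an autogenerated pool worker name: %r" % name)
    return int(suffix)


def device_for_current_worker(num_gpus=NUM_GPUS):
    """Map the current pool worker to a device number in range(num_gpus)."""
    name = multiprocessing.current_process().name
    # Worker counters start at 1, so shift to a zero-based device number.
    return (worker_index(name) - 1) % num_gpus


def job(x):
    device = device_for_current_worker()
    # A real job would select the GPU here; that call is framework-specific
    # and deliberately left out of this sketch.
    return x * x, device


def main():
    # One worker per GPU: each worker runs one job at a time, so a device
    # is not handed a new job until its previous job has finished.
    with multiprocessing.Pool(processes=NUM_GPUS) as pool:
        print(pool.map(job, range(6)))


if __name__ == "__main__":
    main()
```

Parsing the public name attribute avoids depending on the private _identity attribute, at the cost of relying on the autogenerated name format.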
An example of the above:
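import multiprocessing

def f(x):
    # A Process created (but never started) inside a worker gets a two-value _identity.
    created = multiprocessing.Process()
    current = multiprocessing.current_process()
    print('running:', current.name, current._identity)
    print('created:', created.name, created._identity)
    return x * x

if __name__ == '__main__':
    with multiprocessing.Pool() as p:
        print(p.map(f, range(6)))
Output (captured under Python 2; under Python 3 the worker names carry a start-method prefix such as ForkPoolWorker-1):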
$ python foo.py
running: PoolWorker-1 (1,)
created: Process-1:1 (1, 1)
running: PoolWorker-2 (2,)
created: Process-2:1 (2, 1)
running: PoolWorker-3 (3,)
created: Process-3:1 (3, 1)
running: PoolWorker-1 (1,)
created: Process-1:2 (1, 2)
running: PoolWorker-2 (2,)
created: Process-2:2 (2, 2)
running: PoolWorker-4 (4,)
created: Process-4:1 (4, 1)
[0, 1, 4, 9, 16, 25]