multiprocessing.Pool with maxtasksperchild produces equal PIDs

2022-01-12 00:00:00 python python-3.x multiprocessing pid

Question

I need to run a function several times, each time in a process that is completely isolated from all other memory. I would like to use multiprocessing for that, since I need to serialize a complex output coming from the function. I set the start method to 'spawn' and use a pool with maxtasksperchild=1. I would expect to get a different process for each task, and therefore to see a different PID each time:

import multiprocessing
import time
import os

def f(x):
    print("PID: %d" % os.getpid())
    time.sleep(x)
    complex_obj = 5  # more complex actually
    return complex_obj

if __name__ == '__main__':
    multiprocessing.set_start_method('spawn')
    pool = multiprocessing.Pool(4, maxtasksperchild=1)
    pool.map(f, [5]*30)
    pool.close()

But the output I get is:

$ python untitled1.py 
PID: 30010
PID: 30009
PID: 30012
PID: 30011
PID: 30010
PID: 30009
PID: 30012
PID: 30011
PID: 30018
PID: 30017
PID: 30019
PID: 30020
PID: 30018
PID: 30019
PID: 30017
PID: 30020
...

So the processes are not being respawned after every task. Is there an automatic way of getting a new PID each time (i.e. without starting a new pool for each set of processes)?


Solution

You also need to specify chunksize=1 in the call to pool.map. Otherwise, multiple items from your iterable get bundled together into a single "task" from the worker processes' point of view, and maxtasksperchild counts those bundles rather than individual items:

import multiprocessing
import time
import os

def f(x):
    print("PID: %d" % os.getpid())
    time.sleep(x)
    complex_obj = 5  # more complex actually
    return complex_obj

if __name__ == '__main__':
    multiprocessing.set_start_method('spawn')
    pool = multiprocessing.Pool(4, maxtasksperchild=1)
    pool.map(f, [5]*30, chunksize=1)
    pool.close()

The output no longer has repeated PIDs:

PID: 4912
PID: 4913
PID: 4914
PID: 4915
PID: 4938
PID: 4937
PID: 4940
PID: 4939
PID: 4966
PID: 4965
PID: 4970
PID: 4971
PID: 4991
PID: 4990
PID: 4992
PID: 4993
PID: 5013
PID: 5014
PID: 5012
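
To see why each PID appeared exactly twice in the question, it may help to reproduce how pool.map picks a chunk size when none is given. The sketch below mirrors the heuristic found in CPython's multiprocessing/pool.py (items divided by four times the number of workers, rounded up). That heuristic is an implementation detail rather than a documented guarantee, and the helper names default_chunksize and split_into_chunks are hypothetical, introduced only for illustration.

```python
def default_chunksize(n_items, n_workers):
    """Approximate the chunk size pool.map uses when chunksize is None.

    Assumption: this mirrors CPython's internal heuristic and may differ
    in other implementations or future versions.
    """
    if n_items == 0:
        return 0
    chunksize, extra = divmod(n_items, n_workers * 4)
    if extra:
        chunksize += 1  # round up so every item lands in some chunk
    return chunksize


def split_into_chunks(items, size):
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


if __name__ == '__main__':
    items = [5] * 30
    size = default_chunksize(len(items), 4)
    # Each chunk counts as one task for maxtasksperchild.
    print("default chunksize:", size)
    print("tasks with default chunksize:", len(split_into_chunks(items, size)))
    print("tasks with chunksize=1:", len(split_into_chunks(items, 1)))
```

Under that assumption, 30 items on 4 workers are sent in chunks of 2, so a worker limited to one task still runs f twice before it is replaced. With chunksize=1, every item is its own task and every call gets a fresh process.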
