Multiprocessing returns "too many open files", but using `with...as` fixes it. Why?

Problem description

I was using this answer in order to run parallel commands with multiprocessing in Python on a Linux box.

My code did something like:

import multiprocessing
import logging

logger = logging.getLogger(__name__)

def cycle(offset):
    # Do stuff
    pass

def run():
    for nprocess in process_per_cycle:
        logger.info("Start cycle with %d processes", nprocess)
        offsets = list(range(nprocess))
        pool = multiprocessing.Pool(nprocess)
        pool.map(cycle, offsets)

But I was getting this error: OSError: [Errno 24] Too many open files

So the code was opening too many file descriptors, i.e. it was starting too many processes and not terminating them.

I fixed it by replacing the last two lines with these:

    with multiprocessing.Pool(nprocess) as pool:
        pool.map(cycle, offsets)
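
Putting it together, a runnable sketch of the fixed version (the body of `cycle` and the `process_per_cycle` values here are placeholders for the real workload):

```python
import logging
import multiprocessing

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

def cycle(offset):
    return offset  # placeholder for the real work

def run(process_per_cycle):
    for nprocess in process_per_cycle:
        logger.info("Start cycle with %d processes", nprocess)
        offsets = list(range(nprocess))
        # Each pool is terminated when the `with` block exits,
        # so its workers and pipes do not accumulate across cycles.
        with multiprocessing.Pool(nprocess) as pool:
            pool.map(cycle, offsets)

if __name__ == "__main__":
    run([2, 3])
```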

But I do not know exactly why those lines fixed it.

What is happening underneath that `with`?


Solution

You're creating new processes inside a loop, and then forgetting to close them once you're done with them. As a result, there comes a point where you have too many open processes. This is a bad idea.

You could fix this by using a context manager, which automatically calls pool.terminate, or by calling pool.terminate manually. Alternatively, why not create a pool outside the loop just once, and then send tasks to the processes inside it?
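Roughly speaking, the `with` form expands to a try/finally: `Pool.__exit__` calls `terminate()`, so the worker processes and their pipes (the open file descriptors) are released even if `map` raises. A minimal sketch of that expansion (the doubling in `cycle` is just a stand-in for real work):

```python
import multiprocessing

def cycle(offset):
    return offset * 2  # stand-in for the real work

def run_once(nprocess, offsets):
    # What `with multiprocessing.Pool(nprocess) as pool:` roughly expands to:
    pool = multiprocessing.Pool(nprocess)
    try:
        return pool.map(cycle, offsets)
    finally:
        # Pool.__exit__ calls terminate(), which stops the workers
        # and closes their pipes, so descriptors cannot leak.
        pool.terminate()

if __name__ == "__main__":
    print(run_once(2, [0, 1, 2]))
```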

pool = multiprocessing.Pool(max(process_per_cycle)) # initialise your pool once
for nprocess in process_per_cycle:
    ...
    pool.map(cycle, offsets) # delegate work inside your loop

pool.close() # shut down the pool
pool.join()  # wait for the worker processes to exit
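
A runnable sketch of that single-pool pattern (again, the body of `cycle` and the cycle sizes are placeholders):

```python
import multiprocessing

def cycle(offset):
    return offset + 1  # stand-in for the real per-offset work

def run(process_per_cycle):
    results = []
    # One pool for the whole run: the workers (and their file
    # descriptors) are created once and reused across cycles.
    pool = multiprocessing.Pool(max(process_per_cycle))
    for nprocess in process_per_cycle:
        offsets = list(range(nprocess))
        results.append(pool.map(cycle, offsets))
    pool.close()  # no more tasks will be submitted
    pool.join()   # wait for the workers to exit cleanly
    return results

if __name__ == "__main__":
    print(run([2, 3]))
```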

For more information, you could peruse the multiprocessing.Pool documentation.
