Why doesn't importing a module in '__main__' allow multiprocessing to use the module?

2022-01-12 00:00:00 python python-2.7 multiprocessing arcpy

Question

I've already solved my problem by moving the import to the top-level declarations, but it left me wondering: why can't I use a module that was imported in '__main__' in functions that are the targets of multiprocessing?

For example:

import os
import multiprocessing as mp

def run(in_file, out_dir, out_q):
    # arcpy is only imported under the __main__ guard below
    arcpy.RasterToPolygon_conversion(in_file, out_dir, "NO_SIMPLIFY", "Value")
    status = str("Done with "+os.path.basename(in_file))
    out_q.put(status, block=False)

if __name__ == '__main__':
    raw_input("Program may hang, press Enter to import ArcPy...")
    import arcpy

    q = mp.Queue()
    _file = "path/to/file"
    _dir = "path/to/dir"
    # There are actually lots of files in a loop to build
    # processes but I just do one for context here
    p = mp.Process(target=run, args=(_file, _dir, q))
    p.start()

    # I do stuff with the Queue below to report status to the user

When you run this in IDLE it doesn't error at all; it just keeps doing a Queue check (which is fine, so that's not the problem). The problem is that when you run it in the CMD terminal (either the OS shell or the Python one), it produces the error that arcpy is not defined!

Just a curious topic.


Answer

The situation differs between Unix-like systems and Windows. On Unix-like systems, multiprocessing uses fork to create child processes that share a copy-on-write view of the parent's memory space. The child sees the imports from the parent, including anything the parent imported under the if __name__ == "__main__": guard.
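The inheritance described above can be observed with a bare os.fork call, without multiprocessing at all. The following is only a minimal sketch for Unix-like systems: the function name child_sees_late_import and the name late_module are made up for illustration, and json stands in for arcpy.

```python
import os

def child_sees_late_import():
    """Fork after a late import; return True if the child can see the module."""
    global late_module
    import json as late_module      # stand-in for arcpy, imported after start-up

    read_fd, write_fd = os.pipe()
    pid = os.fork()                 # Unix only: the child gets a copy of this process
    if pid == 0:
        # Child: report whether the parent's late import is in its globals.
        try:
            os.close(read_fd)
            os.write(write_fd, b"1" if "late_module" in globals() else b"0")
        finally:
            os._exit(0)             # leave immediately, skipping the parent's cleanup
    # Parent: read the child's one-byte answer, then reap the child.
    os.close(write_fd)
    answer = os.read(read_fd, 1)
    os.close(read_fd)
    os.waitpid(pid, 0)
    return answer == b"1"

if hasattr(os, "fork"):
    print(child_sees_late_import())
```

On Windows os.fork does not exist, so the hasattr check simply skips the call there.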

On Windows there is no fork, so a new process has to be executed. But simply rerunning the parent process doesn't work: it would run the whole program again. Instead, multiprocessing runs its own Python program that imports the parent's main script and then pickles/unpickles a view of the parent's object space that is, hopefully, sufficient for the child process.
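One reason the child has to import the main script is that pickle stores a module-level function as a reference (module name plus function name), not as code. A small sketch of that behaviour, using json.dumps as a stand-in for the question's run function:

```python
import json
import pickle

# Pickling a module-level function records where to find it, not its code.
payload = pickle.dumps(json.dumps)
print(payload)

# Unpickling imports the named module and looks the attribute up again,
# so the result is the very same function object.
restored = pickle.loads(payload)
print(restored is json.dumps)
```

The printed bytes contain the names json and dumps. For a target defined in the main script, the receiving process can resolve such a reference only after it has loaded that script itself.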

That program is the __main__ for the child process, and the __main__ block of the parent script doesn't run. The main script is just imported like any other module. The reason is simple: running the parent's __main__ would run the full parent program again, which mp must avoid.

Here is a test to show what is going on: a main module called testmp.py and a second module, test2.py, that is imported by the first.

testmp.py

import os
import multiprocessing as mp

print("importing test2")
import test2

def worker():
    print('worker pid: {}, module name: {}, file name: {}'.format(os.getpid(), 
        __name__, __file__))

if __name__ == "__main__":
    print('main pid: {}, module name: {}, file name: {}'.format(os.getpid(), 
        __name__, __file__))
    print("running process")
    proc = mp.Process(target=worker)
    proc.start()
    proc.join()

test2.py

import os

print('test2 pid: {}, module name: {}, file name: {}'.format(os.getpid(),
        __name__, __file__))

When run on Linux, test2 is imported once and the worker runs in the main module.

importing test2
test2 pid: 17840, module name: test2, file name: /media/td/USB20FD/tmp/test2.py
main pid: 17840, module name: __main__, file name: testmp.py
running process
worker pid: 17841, module name: __main__, file name: testmp.py

Under Windows, notice that "importing test2" is printed twice: testmp.py was run two times. But "main pid" was only printed once: its __main__ block wasn't run. That's because multiprocessing changed the module name to __mp_main__ during import.

E:\tmp>py testmp.py
importing test2
test2 pid: 7536, module name: test2, file name: E:\tmp\test2.py
main pid: 7536, module name: __main__, file name: testmp.py
running process
importing test2
test2 pid: 7544, module name: test2, file name: E:\tmp\test2.py
worker pid: 7544, module name: __mp_main__, file name: E:\tmp\testmp.py
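The effect of the rename on the question's code can be imitated without starting any processes. The sketch below (written for Python 3) uses runpy.run_path to execute two hypothetical main scripts under both module names. It only mimics the rename and is not multiprocessing's own code path. The script contents and the helper module_is_visible are made up for illustration, and json stands in for arcpy.

```python
import os
import runpy
import tempfile

# Hypothetical main scripts; 'json' stands in for arcpy from the question.
GUARDED_IMPORT = (
    "if __name__ == '__main__':\n"
    "    import json\n"
)
TOP_LEVEL_IMPORT = (
    "import json\n"
    "if __name__ == '__main__':\n"
    "    pass\n"
)

def module_is_visible(source, run_name):
    """Run source under run_name and report whether it ends up defining 'json'."""
    fd, path = tempfile.mkstemp(suffix=".py")
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(source)
        namespace = runpy.run_path(path, run_name=run_name)
    finally:
        os.remove(path)
    return "json" in namespace

for label, source in [("guarded", GUARDED_IMPORT), ("top-level", TOP_LEVEL_IMPORT)]:
    for run_name in ("__main__", "__mp_main__"):
        print(label, run_name, module_is_visible(source, run_name))
```

Only the guarded import run as __mp_main__ reports False, which matches the "arcpy is not defined" error in the question. Moving the import to the top of the script, as the asker did, or importing inside the worker function makes the module available under either name.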
