强制使用 if __name__=="__main__"在 Windows 中同时使用多处理
问题描述
在 windows 上的 python 中使用多处理时,希望保护程序的入口点.文档说确保新的 Python 解释器可以安全地导入主模块,而不会导致意外的副作用(例如启动新进程)".谁能解释一下这到底是什么意思?
While using multiprocessing in python on windows, it is expected to protect the entry point of the program. The documentation says "Make sure that the main module can be safely imported by a new Python interpreter without causing unintended side effects (such a starting a new process)". Can anyone explain what exactly does this mean ?
解决方案
扩展一下你已经得到的好答案,如果你了解 Linux 系统做什么,它会有所帮助.它们使用 fork()
生成新进程,这有两个好 后果:
Expanding a bit on the good answer you already got, it helps if you understand what Linux-y systems do. They spawn new processes using fork()
, which has two good consequences:
- 主程序中存在的所有数据结构对子进程都是可见的.他们实际上是处理数据的副本.
- 子进程在主程序中紧跟
fork()
的指令处开始执行 - 因此任何已在模块中执行的模块级代码都不会再次执行.
- All data structures existing in the main program are visible to the child processes. They actually work on copies of the data.
- The child processes start executing at the instruction immediately following the
fork()
in the main program - so any module-level code already executed in the module will not be executed again.
fork()
在 Windows 中是不可能的,因此在 Windows 上,每个模块都由每个子进程重新导入.所以:
fork()
isn't possible in Windows, so on Windows each module is imported anew by each child process. So:
- 在 Windows 上,no 存在于主程序中的数据结构对子进程是可见的;并且,
- 所有模块级代码在每个子进程中执行.
- On Windows, no data structures existing in the main program are visible to the child processes; and,
- All module-level code is executed in each child process.
所以你需要考虑一下想要只在主程序中执行的代码.最明显的例子是,您希望创建子进程的代码仅在主程序中运行——因此应该受到 __name__ == '__main__'
的保护.举一个更微妙的例子,考虑构建一个巨大列表的代码,您打算将其传递给工作进程以进行爬网.您可能也想保护它,因为在这种情况下,让每个工作进程浪费 RAM 和时间来构建自己的巨大列表的无用副本是没有意义的.
So you need to think a bit about which code you want executed only in the main program. The most obvious example is that you want code that creates child processes to run only in the main program - so that should be protected by __name__ == '__main__'
. For a subtler example, consider code that builds a gigantic list, which you intend to pass out to worker processes to crawl over. You probably want to protect that too, because there's no point in this case to make each worker process waste RAM and time building their own useless copies of the gigantic list.
请注意,即使在 Linux 系统上适当地使用 __name__ == "__main__"
也是一个好主意,因为它使预期的工作分工更清晰.并行程序可能会令人困惑——每一点都有帮助 ;-)
Note that it's a Good Idea to use __name__ == "__main__"
appropriately even on Linux-y systems, because it makes the intended division of work clearer. Parallel programs can be confusing - every little bit helps ;-)
相关文章