Keep a unified count during multiprocessing?

2022-01-12 00:00:00 python multithreading multiprocessing

Problem description

I have a Python program that runs a Monte Carlo simulation to find answers to probability questions. I am using multiprocessing; here it is in pseudocode:

import multiprocessing

def runmycode(result_queue):
    print("Requested...")
    iterations = 0  # local to each process, so the counts are never combined
    while True:
        iterations += 1
        if "result found (for example)":  # placeholder condition
            result_queue.put("result!")
            break
    print("Done")

processs = []
result_queue = multiprocessing.Queue()
for n in range(4):  # start 4 processes
    process = multiprocessing.Process(target=runmycode, args=[result_queue])
    process.start()
    processs.append(process)

print("Waiting for result...")
result = result_queue.get()  # wait
for process in processs:  # then kill them all off
    process.terminate()
print("Got result:", result)

I'd like to extend this so that I can keep a unified count of the number of iterations that have been run. For example, if thread 1 has run 100 times and thread 2 has run 100 times, I want to print a total of 200 iterations to the console. I am referring to the iterations variable in the worker function. How can I make sure that ALL threads are adding to the same variable? I thought that using a global version of iterations would work, but it does not.
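For illustration, the behaviour described above can be reproduced with a short sketch. It assumes a module-level global named iterations and a hypothetical helper called count_locally (neither comes from the original program). Each process increments its own copy of the global and reports what it sees:

```python
import multiprocessing

iterations = 0  # ordinary module-level global (assumed name, for illustration)


def count_locally(n, report_queue):
    """Hypothetical worker: bump this process's copy of the global n times."""
    global iterations
    for _ in range(n):
        iterations += 1
    report_queue.put(iterations)  # report what this process sees


if __name__ == "__main__":
    report_queue = multiprocessing.Queue()
    processes = [
        multiprocessing.Process(target=count_locally, args=(100, report_queue))
        for _ in range(2)
    ]
    for process in processes:
        process.start()
    seen = [report_queue.get() for _ in processes]  # one report per child
    for process in processes:
        process.join()
    print("Each child counted:", seen)
    print("Parent's global:", iterations)
```

Run as a script, each child reports 100 while the parent's global is still 0, because every process works on its own copy of the variable.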

Solution

Normal global variables are not shared between processes the way they are shared between threads: each process has its own memory, so each one gets its own copy of the global. You need to use a process-aware data structure. For your use case, a multiprocessing.Value should work fine:

import multiprocessing


def runmycode(result_queue, iterations):
    print("Requested...")
    while True:  # Loops until a result is found; I assume you want a real condition here
        with iterations.get_lock():  # Need a lock because incrementing isn't atomic
            iterations.value += 1
        if "result found (for example)":  # placeholder condition
            result_queue.put("result!")
            break
    print("Done")


if __name__ == "__main__":
    processs = []
    result_queue = multiprocessing.Queue()
    iterations = multiprocessing.Value('i', 0)  # shared C int, initial value 0
    for n in range(4):  # start 4 processes
        process = multiprocessing.Process(target=runmycode, args=(result_queue, iterations))
        process.start()
        processs.append(process)
    print("Waiting for result...")
    result = result_queue.get()  # wait
    for process in processs:  # then kill them all off
        process.terminate()
    print("Got result: {}".format(result))
    print("Total iterations {}".format(iterations.value))

A few notes:

I passed the Value to the children explicitly to keep the code compatible with Windows. There, child processes are started fresh rather than forked, so they cannot share read/write global variables with the parent.

I protected the increment with a lock, because it's not an atomic operation and is susceptible to race conditions.

I added an if __name__ == "__main__": guard, again to help with Windows compatibility, and as a general best practice.
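The question asks for the combined count to be shown on the console while the simulation runs, whereas the code above prints it once at the end. One possible way to sketch that is to let the parent wait on the queue with a timeout and print iterations.value whenever the wait times out. The simulate worker, the multiprocessing.Event named stop_event, and the made-up success test (the shared total reaching target) are all assumptions for illustration, not part of the original program:

```python
import multiprocessing
import queue


def simulate(result_queue, iterations, stop_event, target):
    """Hypothetical stand-in for the Monte Carlo loop."""
    while not stop_event.is_set():
        with iterations.get_lock():  # += on a shared Value is not atomic
            iterations.value += 1
            found = iterations.value == target  # placeholder success test
        if found:
            result_queue.put("result!")
            return


if __name__ == "__main__":
    result_queue = multiprocessing.Queue()
    iterations = multiprocessing.Value("i", 0)
    stop_event = multiprocessing.Event()
    processes = [
        multiprocessing.Process(
            target=simulate,
            args=(result_queue, iterations, stop_event, 300_000),
        )
        for _ in range(4)
    ]
    for process in processes:
        process.start()

    result = None
    while result is None:
        try:
            result = result_queue.get(timeout=0.25)  # wait briefly for a result
        except queue.Empty:
            print("Iterations so far:", iterations.value)  # running total

    stop_event.set()  # ask the remaining workers to stop
    for process in processes:
        process.join()
    print("Got result:", result)
    print("Total iterations:", iterations.value)
```

The Event replaces terminate() in this sketch because the Python documentation warns that terminating a process while it holds a lock can leave other processes deadlocked. Here each worker finishes its current iteration and exits on its own, so the final read of iterations.value cannot block on a lock held by a killed process.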
