Python multiprocessing.Pool() does not use 100% of each CPU
Problem description
I am working on multiprocessing in Python. For example, consider the example given in the Python multiprocessing documentation (I have changed 100 to 1000000 in the example, just to consume more time). When I run it, I do see that Pool() is using all 4 processes, but I don't see each CPU reaching 100%. How can I achieve 100% usage of each CPU?
from multiprocessing import Pool

def f(x):
    return x*x

if __name__ == '__main__':
    pool = Pool(processes=4)
    result = pool.map(f, range(10000000))
Solution
It is because multiprocessing requires interprocess communication between the main process and the worker processes behind the scenes, and that communication overhead takes more (wall-clock) time than the "actual" computation (x * x) in your case.
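As a rough way to see that overhead, here is a minimal timing sketch (the timing harness is my addition, not part of the original answer) comparing the trivial kernel run serially and through a Pool:

from multiprocessing import Pool
import time

def f(x):
    return x * x

if __name__ == '__main__':
    n = 1000000
    start = time.perf_counter()
    serial = [f(x) for x in range(n)]  # no IPC at all
    print('serial:', time.perf_counter() - start)

    with Pool(processes=4) as pool:
        start = time.perf_counter()
        # every argument and result is pickled and sent between processes
        parallel = pool.map(f, range(n))
        print('pool:  ', time.perf_counter() - start)

On a typical machine the Pool version spends most of its wall-clock time on that pickling and transfer rather than on the multiplication itself.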
Try a "heavier" computation kernel instead, like:
import math
from functools import reduce  # reduce is a builtin in Python 2

def f(x):
    return reduce(lambda a, b: math.log(a + b), range(10**5), x)
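For completeness, a sketch of driving this heavier kernel through the same Pool setup as the question (the input range of 1 to 1,000 is an illustrative choice; it starts at 1 because the first reduction step would otherwise hit math.log(0)):

from multiprocessing import Pool
import math
from functools import reduce

def f(x):
    return reduce(lambda a, b: math.log(a + b), range(10**5), x)

if __name__ == '__main__':
    with Pool(processes=4) as pool:
        # each task now performs ~10**5 log() calls, so the IPC cost is a
        # small fraction of the total work and all four CPUs stay busy
        result = pool.map(f, range(1, 1001))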
Update (clarification)

I pointed out that the low CPU usage the OP observed was due to the IPC overhead inherent in multiprocessing, but the OP need not worry about it too much, because the original computation kernel was far too "light" to be used as a benchmark. In other words, multiprocessing performs at its worst with such an overly "light" kernel. If the OP implements real-world logic on top of multiprocessing (which, I'm sure, will be somewhat "heavier" than x * x), the OP will achieve decent efficiency, I assure you. My argument is backed up by the experiment with the "heavy" kernel I presented.
@FilipMalczak, I hope my clarification makes sense to you.
By the way, there are some ways to improve the efficiency of x * x while using multiprocessing. For example, we can combine 1,000 jobs into one before we submit them to Pool, unless we are required to solve each job in real time (e.g., if you implement a REST API server, you shouldn't do it this way).
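A minimal sketch of that batching idea, assuming the 4-process Pool from the question (the chunk size of 1,000 and the helper name f_batch are illustrative choices, not part of the original answer):

from multiprocessing import Pool

def f_batch(chunk):
    # one IPC round-trip now carries 1,000 inputs and 1,000 results
    return [x * x for x in chunk]

if __name__ == '__main__':
    n = 10000000
    # split the work into chunks of 1,000 before handing it to the pool
    chunks = [range(i, min(i + 1000, n)) for i in range(0, n, 1000)]
    with Pool(processes=4) as pool:
        batched = pool.map(f_batch, chunks)
        # flatten back into a single result list if needed
        result = [y for batch in batched for y in batch]

Pool.map also accepts a chunksize argument that batches the IPC transparently, which is usually simpler than chunking by hand.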