How to parallelize a summation loop using multiprocessing in Python

2022-01-12 00:00:00 python multiprocessing

Question

I am having difficulty understanding how to use Python's multiprocessing module.

I have a sum from 1 to n, where n = 10^10. That is too large to fit into a list, and fitting the data into a list seems to be what many online multiprocessing examples rely on.

Is there a way to "split up" the range into segments of a certain size and then perform the sum for each segment?

For example:

def sum_nums(low,high):
    result = 0
    for i in range(low,high+1):
        result += i
    return result

I want to compute sum_nums(1,10**10) by breaking it up into many pieces: sum_nums(1,1000) + sum_nums(1001,2000) + sum_nums(2001,3000) + ... and so on. I know there is a closed form, n(n+1)/2, but pretend we don't know that.

This is what I have tried:

import multiprocessing

def sum_nums(low,high):
    result = 0
    for i in range(low,high+1):
        result += i
    return result

if __name__ == "__main__":
    n = 1000 
    procs = 2 

    sizeSegment = n/procs

    jobs = []
    for i in range(0, procs):
        process = multiprocessing.Process(target=sum_nums, args=(i*sizeSegment+1, (i+1)*sizeSegment))
        jobs.append(process)

    for j in jobs:
        j.start()
    for j in jobs:
        j.join()

    #where is the result?


Answer

First, the best way to get around the memory issue is to use an iterator/generator instead of a list:

def sum_nums(low, high):
    result = 0
    for i in xrange(low, high+1):
        result += i
    return result

In Python 3, range() is already lazy (it produces values on demand instead of building a list), so xrange is only needed in Python 2.
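To illustrate the point about laziness, here is a minimal Python 3 sketch. It assumes a 64-bit CPython build (so that len() can report a length of 10**10), and the variable name big is only for illustration. It shows that a range describing ten billion integers stays small in memory, and that the loop-based sum_nums works unchanged with range.

```python
import sys

# In Python 3, range() returns a lazy sequence object rather than a list,
# so describing ten billion integers does not allocate ten billion items.
big = range(1, 10**10 + 1)

print(len(big))            # how many values the range describes
print(sys.getsizeof(big))  # size of the range object itself, in bytes


def sum_nums(low, high):
    """Sum the integers low..high inclusive, one at a time."""
    result = 0
    for i in range(low, high + 1):
        result += i
    return result


print(sum_nums(1, 1000))   # 500500
```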

Multiprocessing comes in when you want to split the work across different processes or CPU cores. If you don't need to control the individual workers, then the easiest method is to use a process pool. This lets you map a function over the pool and collect the output. Alternatively, you can use apply_async to submit jobs to the pool one at a time; each call returns a deferred result that you retrieve with .get():

import multiprocessing
from multiprocessing import Pool
from time import time

def sum_nums(low, high):
    result = 0
    for i in xrange(low, high+1):
        result += i
    return result

# map passes a single argument to the function, so unpack the
# (low, high) tuple here (this form works in Python 2 and Python 3)
def sn(bounds):
    low, high = bounds
    return sum_nums(low, high)

if __name__ == '__main__': 
    #t = time()
    # takes forever   
    #print sum_nums(1,10**10)
    #print '{} s'.format(time() -t)
    p = Pool(4)

    n = int(1e8)
    r = range(0,10**10+1,n)
    results = []

    # using apply_async
    t = time()
    for arg in zip([x+1 for x in r],r[1:]):
        results.append(p.apply_async(sum_nums, arg))

    # wait for results
    print sum(res.get() for res in results)
    print '{} s'.format(time() -t)

    # using process pool
    t = time()
    print sum(p.map(sn, zip([x+1 for x in r], r[1:])))
    print '{} s'.format(time() -t)

On my machine, calling sum_nums(1, 10**10) directly takes almost 9 minutes, but using Pool(8) with n=int(1e8) reduces this to just over a minute.
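The listing above is written for Python 2 (xrange, print statements). For Python 3, the same idea could be sketched roughly as follows. The helper names make_segments and parallel_sum are hypothetical, not part of the original answer, and the sketch uses Pool.starmap (available since Python 3.3) in place of the sn wrapper. The demo uses a small n so that it finishes quickly.

```python
from multiprocessing import Pool


def sum_nums(low, high):
    """Sum the integers low..high inclusive with an explicit loop."""
    result = 0
    for i in range(low, high + 1):
        result += i
    return result


def make_segments(n, segment_size):
    """Yield (low, high) pairs covering 1..n with no gaps or overlap."""
    for low in range(1, n + 1, segment_size):
        yield low, min(low + segment_size - 1, n)


def parallel_sum(n, segment_size, workers):
    """Hypothetical helper: sum 1..n by handing segments to a process pool."""
    with Pool(workers) as pool:
        # starmap unpacks each (low, high) tuple into sum_nums(low, high)
        partial_sums = pool.starmap(sum_nums, make_segments(n, segment_size))
    return sum(partial_sums)


if __name__ == "__main__":
    # Small n for a quick demo; raise it toward 10**10 for the real problem.
    print(parallel_sum(10**6, 10**5, 4))  # 500000500000
```

Larger segments mean fewer tasks and less inter-process overhead, while smaller segments balance the load more evenly across workers; the best segment size depends on the machine.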
