Python multiprocessing with a single function

2022-01-12 00:00:00 python function multithreading multiprocessing

Question

I have a simulation that is currently running, but the ETA is about 40 hours, so I'm trying to speed it up with multiprocessing.

It iterates over 3 values of one variable (L) and over 99 values of a second variable (a). For each pair of values it runs a complex simulation and returns 9 different standard deviations. So, even though I haven't coded it that way yet, it is essentially a function that takes two values as inputs (L, a) and returns 9 values.

Here is the essence of the code I have:

    STD_1 = []
    STD_2 = []
    # etc.
    for L in range(0, 6, 2):
        for a in range(1, 100):
            ### simulation code ###
            STD_1.append(value_1)
            STD_2.append(value_2)
            # etc.

Here is what I can modify it to:

    master_list = []

    def simulate(a, L):
        ### simulation code ###
        return (a, L, STD_1, STD_2)  # etc.

    for L in range(0, 6, 2):
        for a in range(1, 100):
            master_list.append(simulate(a, L))

Since each of the simulations is independent, it seems like an ideal place to implement some sort of multithreading or multiprocessing.

How exactly would I go about coding this?

Also, will everything be returned to the master list in order, or could it be out of order if multiple processes are working?

EDIT 2: This is my code, but it doesn't run correctly. It asks if I want to kill the program right after I run it.

    import multiprocessing

    data = []
    for L in range(0, 6, 2):
        for a in range(1, 100):
            data.append((L, a))
    print(data)

    def simulation(arg):
        # unpack the tuple
        a = arg[1]
        L = arg[0]
        STD_1 = a**2
        STD_2 = a**3
        STD_3 = a**4
        # simulation code #
        return (STD_1, STD_2, STD_3)

    print("1")
    p = multiprocessing.Pool()
    print("2")
    results = p.map(simulation, data)

EDIT 3: Also, what are the limitations of multiprocessing? I've heard that it doesn't work on OS X. Is this correct?

Solution

1. Wrap the data for each iteration up into a tuple.
2. Make a list, data, of those tuples.
3. Write a function f that processes one tuple and returns one result.
4. Create a p = multiprocessing.Pool() object.
5. Call results = p.map(f, data).

This runs f in separate worker processes, by default as many at a time as your machine has cores.

Edit 1: Example:

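    from multiprocessing import Pool

    data = [('bla', 1, 3, 7), ('spam', 12, 4, 8), ('eggs', 17, 1, 3)]

    def f(t):
        # Unpack one work item and return one result.
        name, a, b, c = t
        return (name, a + b + c)

    # The guard keeps worker processes from re-running this part on import.
    if __name__ == '__main__':
        p = Pool()
        results = p.map(f, data)
        print(results)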
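Applied to the (L, a) grid from the question, the same steps might look like the sketch below. The names simulate, build_grid and run_parallel are hypothetical, and the body of simulate only reuses the stand-in arithmetic from the question's EDIT 2 in place of the real simulation. On the ordering question: Pool.map returns results in the same order as the input list, whichever worker finishes first, and returning (L, a) with each result makes every row self-describing anyway.

```python
from multiprocessing import Pool


def simulate(params):
    """Hypothetical stand-in for the real simulation of one (L, a) pair."""
    L, a = params
    # Placeholder values taken from the question's EDIT 2; the real code
    # would compute the nine standard deviations here.
    std_1 = a ** 2
    std_2 = a ** 3
    std_3 = a ** 4
    return (L, a, std_1, std_2, std_3)


def build_grid():
    """All (L, a) pairs, in the same order as the original nested loops."""
    return [(L, a) for L in range(0, 6, 2) for a in range(1, 100)]


def run_parallel(processes=None):
    """Map simulate over the grid; processes=None lets Pool pick the count."""
    with Pool(processes=processes) as pool:
        # map blocks until all items are done and keeps the input order.
        return pool.map(simulate, build_grid())


if __name__ == "__main__":
    master_list = run_parallel()
    print(len(master_list), master_list[0], master_list[-1])
```

If the individual simulations take very different amounts of time, Pool.imap_unordered is an alternative that yields results as they complete, at the cost of losing the input order.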

Multiprocessing should work fine on UNIX-like platforms such as OS X. Only platforms that lack os.fork (mainly MS Windows) need special attention, and even there it still works. See the multiprocessing documentation.
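The "special attention" mostly comes down to the main-module guard. Where workers are started with the spawn method instead of fork (the default on Windows, and on macOS from Python 3.8), each worker imports the main module, so a Pool created at module top level would be created again inside every worker. This may be what goes wrong in the question's EDIT 2, though the question does not say which platform is in use. A minimal sketch, with cube and run as hypothetical names:

```python
import multiprocessing


def cube(x):
    """Worker function; it must live at module top level so it can be pickled."""
    return x ** 3


def run(values):
    """Create the pool inside a function, never at import time."""
    with multiprocessing.Pool(processes=2) as pool:
        return pool.map(cube, values)


if __name__ == "__main__":
    # Only the parent process enters this block; workers that import the
    # module skip it, so no extra pools are created.
    print("start method:", multiprocessing.get_start_method())
    print(run([1, 2, 3]))
```

Keeping everything that starts processes behind the guard is harmless on fork-based platforms, so the same script can run unchanged everywhere.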
