What's a clever way to split a huge nested loop across 8 (or more) processes using Python?

2022-01-12 00:00:00 python multiprocessing

Problem description

This time I'm facing a "design" problem. Using Python, I have implemented a mathematical algorithm that takes 5 parameters. To find the best combination of these 5 parameters, I used a 5-level nested loop to enumerate all possible combinations within a given range. It takes much longer to finish than I expected, so I think it's time to use multithreading...
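
A first step that makes any parallel version easier is to flatten the five nested loops into a single stream of parameter tuples, since a flat iterable is what a process pool consumes. The sketch below does that with `itertools.product`; the ranges `P0` to `P4` are hypothetical placeholders (the same ones used in the jug answer), so substitute the real ones.

```python
import itertools
import math

# Hypothetical parameter ranges; replace them with the real ones.
P0 = [1, 2, 3]
P1 = range(10)
P2 = range(10, 20)
P3 = [True, False]
P4 = range(100)


def all_combinations():
    """Yield every (p0, p1, p2, p3, p4) tuple in the same order a
    5-level nested loop over P0..P4 would visit them."""
    return itertools.product(P0, P1, P2, P3, P4)


def combination_count():
    """Total number of combinations: the product of the range lengths."""
    return math.prod(len(r) for r in (P0, P1, P2, P3, P4))


if __name__ == "__main__":
    print(combination_count(), "combinations to evaluate")
```

Knowing the total count up front also tells you how much work each of, say, 8 processes will get.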

The innermost loop does two things: a calculation and saving its result. In the current code, each result is appended to a list, and the list is written to a file at the end of the program.
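
For reference, here is a minimal sketch of the serial structure just described: compute, append to a list, and write everything once at the end. `evaluate` and its formula are hypothetical stand-ins for the real algorithm, and CSV is only an assumed output format.

```python
import csv
import itertools


def evaluate(p0, p1, p2, p3, p4):
    """Hypothetical stand-in for the real five-parameter algorithm."""
    return p0 * p1 - p2 + (p4 if p3 else -p4)


def run_serial(path, grid):
    """Evaluate every combination, keep results in a list, and write
    the whole list to a CSV file once at the end."""
    results = []
    for params in itertools.product(*grid):
        results.append((*params, evaluate(*params)))
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["p0", "p1", "p2", "p3", "p4", "score"])
        writer.writerows(results)
    return results


if __name__ == "__main__":
    grid = ([1, 2, 3], range(10), range(10, 20), [True, False], range(100))
    rows = run_serial("results.csv", grid)
    print(len(rows), "rows written")
```

Because each row carries its own parameters, the rows can later be produced in any order, which is what makes this easy to parallelize.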

Since I don't have much experience with multithreading in any language, let alone Python, I would like some hints on how to structure this. Namely: how should the calculations be assigned to threads dynamically, how should the threads save their results, and how can all the results later be combined into one file? I'd also like the number of threads to be adjustable.
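
One common way to get exactly this structure uses processes rather than threads, because CPU-bound pure-Python code does not run in parallel across CPython threads. The sketch below (not the answerer's method) uses a `multiprocessing.Pool` whose size is the adjustable worker count, `imap_unordered` to hand out combinations dynamically in chunks, and the parent process as the only writer of the single output file. `evaluate` and its formula are hypothetical.

```python
import csv
import itertools
import multiprocessing as mp


def evaluate(params):
    """Hypothetical stand-in for the real algorithm. Takes one tuple so
    it can be passed directly to Pool.imap_unordered."""
    p0, p1, p2, p3, p4 = params
    return (*params, p0 * p1 - p2 + (p4 if p3 else -p4))


def run_parallel(path, grid, processes=None, chunksize=256):
    """Spread all combinations over `processes` workers (None means
    os.cpu_count()) and stream every result into one CSV file.

    imap_unordered gives out chunks as workers become free, so faster
    workers simply take more chunks. Only this parent process writes
    the file, so no locking is needed."""
    combos = itertools.product(*grid)
    count = 0
    with mp.Pool(processes=processes) as pool, open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["p0", "p1", "p2", "p3", "p4", "score"])
        for row in pool.imap_unordered(evaluate, combos, chunksize=chunksize):
            writer.writerow(row)
            count += 1
    return count


if __name__ == "__main__":
    grid = ([1, 2, 3], range(10), range(10, 20), [True, False], range(100))
    n = run_parallel("results.csv", grid, processes=8)
    print(n, "rows written")
```

The `if __name__ == "__main__":` guard matters: on platforms that start workers by re-importing the script, omitting it makes each worker try to create its own pool. A larger `chunksize` reduces inter-process overhead when each evaluation is cheap.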

Any illustration with code would be very helpful.

Thank you very much for your time; I appreciate it.

Update (day 2): thanks for all the helpful answers. I now know that what I need is multiprocessing rather than multithreading. I kept confusing the two concepts because I assumed that if a program is multithreaded, the OS will automatically run it on multiple processors when they are available. I'll find time tonight to get some hands-on experience with multiprocessing.
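
The distinction in the update matters in CPython: the global interpreter lock lets only one thread at a time execute Python bytecode, so threads do not speed up a CPU-bound loop, whereas separate processes each run their own interpreter. Since the stated goal is the single best combination, the sketch below uses `concurrent.futures.ProcessPoolExecutor` and reduces all results to one maximum instead of storing them. The objective `score` is hypothetical; `max_workers` is the adjustable process count.

```python
import itertools
from concurrent.futures import ProcessPoolExecutor


def score(params):
    """Hypothetical objective; replace with the real algorithm.
    Returns (score, params) so max() compares on the score first."""
    p0, p1, p2, p3, p4 = params
    return (p0 * p1 - p2 + (p4 if p3 else -p4), params)


def best_combination(grid, max_workers=None, chunksize=256):
    """Evaluate every combination in worker processes and return the
    (score, params) pair with the highest score."""
    combos = itertools.product(*grid)
    with ProcessPoolExecutor(max_workers=max_workers) as ex:
        return max(ex.map(score, combos, chunksize=chunksize))


if __name__ == "__main__":
    grid = ([1, 2, 3], range(10), range(10, 20), [True, False], range(100))
    print(best_combination(grid, max_workers=8))
```

Note that `Executor.map` submits all chunks up front, which is fine for tens of thousands of combinations; for far larger grids, the streaming `Pool.imap_unordered` approach keeps memory lower.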


Solution

You can try jug, a library I wrote for very similar problems. Your code would then look something like this:

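from jug import TaskGenerator

# TaskGenerator makes each call return a jug Task instead of running immediately.
@TaskGenerator
def evaluate(p0, p1, p2, p3, p4):
    # Placeholder: replace with your real five-parameter algorithm.
    return p0 * p1 - p2 + (p4 if p3 else -p4)

results = []
for p0 in [1, 2, 3]:
    for p1 in range(10):
        for p2 in range(10, 20):
            for p3 in [True, False]:
                for p4 in range(100):
                    results.append(evaluate(p0, p1, p2, p3, p4))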

You can then run as many processes as you like (for example, by starting `jug execute` on the script in several terminals), even across a network if you have access to a computer cluster.
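
If you would rather not add a dependency, or want to run fully independent jobs on several machines, a simple alternative (not how jug schedules work) is static sharding: process `k` of `n` handles every `n`-th combination starting at index `k`, writes its own part file, and a final step concatenates the parts. The striding scheme, the `part-<k>.csv` file names, and the `evaluate` formula are all hypothetical.

```python
import csv
import itertools
import os
import sys

# Hypothetical parameter ranges; replace with the real ones.
GRID = ([1, 2, 3], range(10), range(10, 20), [True, False], range(100))


def evaluate(p0, p1, p2, p3, p4):
    """Hypothetical stand-in for the real algorithm."""
    return p0 * p1 - p2 + (p4 if p3 else -p4)


def run_shard(k, n, outdir, grid=GRID):
    """Handle only combinations at positions k, k + n, k + 2n, ... and
    write them to outdir/part-<k>.csv. Shards never overlap, so the n
    processes need no coordination while running."""
    path = os.path.join(outdir, f"part-{k}.csv")
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        combos = itertools.product(*grid)
        for params in itertools.islice(combos, k, None, n):
            writer.writerow([*params, evaluate(*params)])
    return path


def merge(outdir, n, dest):
    """Concatenate all n part files into one CSV with a single header."""
    with open(dest, "w", newline="") as out:
        csv.writer(out).writerow(["p0", "p1", "p2", "p3", "p4", "score"])
        for k in range(n):
            with open(os.path.join(outdir, f"part-{k}.csv"), newline="") as part:
                out.writelines(part)


if __name__ == "__main__":
    # Usage: python shard.py K N OUTDIR, once for each K in 0..N-1.
    if len(sys.argv) == 4:
        run_shard(int(sys.argv[1]), int(sys.argv[2]), sys.argv[3])
```

For 8 local workers you would start `python shard.py K 8 out` for K = 0 through 7 (the script name and the existing `out` directory are assumptions), wait for all of them, then call `merge("out", 8, "results.csv")`. Striding spreads cheap and expensive regions of the grid across shards, but unlike jug or a pool it cannot rebalance if one shard turns out slower.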

相关文章