Python multiprocessing with pathos

2022-01-12 00:00:00 python multiprocessing pathos pool

Problem description

I am trying to use Python's pathos to farm computations out to separate processes so that they run faster on a multicore processor. My code is organized like this:

class:
    def foo(self, name):
        ...
        setattr(self, name, something)
        ...
    def boo(self):
        for name in list:
            self.foo(name)

As I had pickling problems with multiprocessing.Pool, I decided to try pathos. As suggested in previous topics, I tried:

import pathos.multiprocessing

but it resulted in an error: No module multiprocessing, which I can't find in the latest pathos version.

Then I tried modifying the boo method:

def boo(self):
    import pathos
    pathos.pp_map.pp_map(self.foo, list)

Now no error is thrown, but foo does not work: the instance of my class has no new attributes. Please help me, because after a day spent on this I have no idea where to go next.


Solution

I'm the pathos author. I'm not sure what you want to do from your code above. However, I can maybe shed some light. Here's some similar code:

>>> from pathos.multiprocessing import ProcessingPool
>>> class Bar:
...   def foo(self, name):
...     return len(str(name))
...   def boo(self, things):
...     for thing in things:
...       self.sum += self.foo(thing)
...     return self.sum
...   sum = 0
... 
>>> b = Bar()
>>> results = ProcessingPool().map(b.boo, [[12,3,456],[8,9,10],['a','b','cde']])
>>> results
[6, 4, 5]
>>> b.sum
0

So what happens above is that the boo method of the Bar instance b is passed to a new Python process and then evaluated for each of the nested lists. You can see that the results are correct: len("12")+len("3")+len("456") is 6, and so on.

However, you can also see that when you look at b.sum, it's mysteriously still 0. Why is b.sum still zero? Well, what multiprocessing (and thus also pathos.multiprocessing) does is make a COPY of whatever you pass through the map to the other Python process. The copied instance is then called (in parallel), and whatever the invoked method returns is sent back. Note that you have to RETURN results, or print them, or log them, or write them to a file, or otherwise get them out. They can't go back to the original instance as you might expect, because it's not the original instance that is sent over to the other processes. Copies of the instance are created and then disposed of; each of them had its sum attribute increased, but the original b.sum is untouched.
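The copy behaviour described above can be imitated without starting any processes. The following is only a minimal sketch, not pathos code: copy.deepcopy stands in for the serialization round trip a pool performs, and the names Worker, compute, and run_on_copy are hypothetical. It suggests one way to handle the setattr case from the question: have the method return its value, then apply the returned values to the original instance in the calling process.

```python
import copy


class Worker:
    """Hypothetical stand-in for a class whose method sets attributes."""

    def compute(self, name):
        # Set an attribute on whichever instance runs this method,
        # and also return the value so the caller can use it.
        value = len(str(name))
        setattr(self, str(name), value)
        return name, value


def run_on_copy(instance, name):
    # Assumption: deepcopy imitates the copy a process pool makes when it
    # ships a bound method to another process. The copy is discarded
    # afterwards, so only the return value survives.
    clone = copy.deepcopy(instance)
    return clone.compute(name)


original = Worker()
results = [run_on_copy(original, name) for name in ["alpha", "be", "c"]]

# The copies were mutated, not the original.
missing_before = not hasattr(original, "alpha")

# Apply the returned results to the original instance in the caller.
for name, value in results:
    setattr(original, name, value)
```

With a real pool, the list comprehension would presumably be replaced by a map call such as the ProcessingPool().map shown earlier; the final loop that applies the results would stay in the parent process.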

There are, however, plans within pathos to make something like the above work as you might expect, where the original object IS updated, but it doesn't work like that yet.

If you are installing with pip, note that the latest released version of pathos is several years old, and may not install correctly, or may not install all of the submodules. A new pathos release is pending, but until then, it's better to get the latest version of the code from GitHub and install from there. The trunk is, for the most part, stable even while under development. I think your issue may have been that not all packages were installed, due to a "new" pip versus "old" pathos incompatibility in the install. If pathos.multiprocessing is missing, this is the most likely culprit.
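One way to check whether a submodule actually got installed is to ask the import system to locate it. This is a small sketch using only the standard library; the helper name has_module is hypothetical, and note that looking up a dotted name imports its parent package as a side effect.

```python
import importlib.util


def has_module(dotted_name):
    """Return True if the import system can locate the named module."""
    try:
        return importlib.util.find_spec(dotted_name) is not None
    except ImportError:
        # Raised when a parent package (for example pathos itself)
        # is absent or fails to import.
        return False


for name in ("pathos", "pathos.multiprocessing"):
    print(name, "found" if has_module(name) else "missing")
```

If pathos is reported as found but pathos.multiprocessing as missing, that would be consistent with the incomplete install described above.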

Get pathos from GitHub here: https://github.com/uqfoundation/pathos
