Can I create a shared multiarray or list-of-lists object in Python for multiprocessing?

Problem description


I need to make a shared object of a multidimensional array or a list of lists so that it is available to other processes. Is there a way to create it? From what I have seen, it seems it is not possible. I have tried:

>>> from multiprocessing import Process, Value, Array
>>> arr = Array('i', range(10))
>>> arr[:]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> arr[2] = [12, 43]
TypeError: an integer is required

I heard a numpy array can be a multiarray and a shared object. If the above is not possible, can someone tell me how to make a numpy array a shared object?

Solution

To make a numpy array a shared object (full example):

import ctypes as c
import numpy as np
import multiprocessing as mp

n, m = 2, 3
mp_arr = mp.Array(c.c_double, n*m) # shared, can be used from multiple processes
# then in each new process create a new numpy array using:
arr = np.frombuffer(mp_arr.get_obj()) # mp_arr and arr share the same memory
# make it two-dimensional
b = arr.reshape((n,m)) # b and arr share the same memory
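To round out the snippet above, here is a minimal end-to-end sketch of the same technique with worker processes actually writing into the shared array; the helper name `fill_row` and the fill values are illustrative assumptions, not part of the original answer.

```python
import ctypes as c
import multiprocessing as mp
import numpy as np

n, m = 2, 3

def fill_row(shared, i):
    # Re-wrap the shared buffer as a 2-D numpy view inside the worker.
    a = np.frombuffer(shared.get_obj()).reshape((n, m))
    with shared.get_lock():  # serialize access to the shared buffer
        a[i, :] = i          # writes go straight to the shared memory

if __name__ == '__main__':
    mp_arr = mp.Array(c.c_double, n * m)
    procs = [mp.Process(target=fill_row, args=(mp_arr, i)) for i in range(n)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    # The parent sees the workers' writes, since the memory is shared.
    result = np.frombuffer(mp_arr.get_obj()).reshape((n, m))
    print(result)  # row i is filled with the value i
```

Note that `np.frombuffer` defaults to `float64`, which matches `c_double`; for other ctypes you would pass an explicit `dtype`.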

If you don't need a shared object (as in "shares the same memory"), and an object that can merely be used from multiple processes is enough, then you can use multiprocessing.Manager:

from multiprocessing import Process, Manager

def f(L):
    row = L[0] # take the 1st row
    row.append(10) # change it
    L[0] = row # NOTE: important: copy the row back, otherwise the
               # parent process won't see the changes

if __name__ == '__main__':
    manager = Manager()

    lst = manager.list()
    lst.append([1])
    lst.append([2, 3])
    print(lst) # before: [[1], [2, 3]]

    p = Process(target=f, args=(lst,))
    p.start()
    p.join()

    print(lst) # after: [[1, 10], [2, 3]]
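The NOTE in the code above is worth demonstrating. A managed list proxy returns a *copy* of a nested element on access, so in-place mutation of that copy is lost; only reassignment through the proxy propagates. A minimal sketch (the function names are illustrative):

```python
from multiprocessing import Process, Manager

def mutate_in_place(L):
    L[0].append(10)  # lost: appends to a local copy of the row

def copy_back(L):
    row = L[0]
    row.append(10)
    L[0] = row       # propagated: reassignment goes through the proxy

if __name__ == '__main__':
    manager = Manager()
    lst = manager.list([[1]])

    p = Process(target=mutate_in_place, args=(lst,))
    p.start(); p.join()
    print(lst[0])  # still [1]

    p = Process(target=copy_back, args=(lst,))
    p.start(); p.join()
    print(lst[0])  # now [1, 10]
```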

From the docs:

Server process managers are more flexible than using shared memory objects because they can be made to support arbitrary object types. Also, a single manager can be shared by processes on different computers over a network. They are, however, slower than using shared memory.
