Granting access to shared memory after child processes have already started

2022-01-12 00:00:00 python multiprocessing ipc shared-memory

Question

How do I give child processes access to data in shared memory if the data only becomes available after the child processes have been spawned (using multiprocessing.Process)?

I am aware of multiprocessing.sharedctypes.RawArray, but I can't figure out how to give my child processes access to a RawArray that is created after the processes have already started.

The data is generated by the parent process, and the amount of data is not known in advance.

If it weren't for the GIL, I would use threads instead, which would make this task a little simpler. Using a non-CPython implementation is not an option.

Looking under the hood of multiprocessing.sharedctypes, it looks like shared ctypes objects are allocated using mmapped memory.

So this question really boils down to: can a child process access anonymously mapped memory if mmap() was called by the parent after the child process was spawned?

That's somewhat in the vein of what's being asked in this question, except that in my case the caller of mmap() is the parent process and not the child process.

I created my own version of RawArray that uses shm_open() under the hood. The resulting shared ctypes array can be shared with any process as long as the identifier (tag) matches.

See this answer for details and an example.
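That answer is not reproduced here. As a rough sketch of the same idea, and not the asker's actual implementation, the code below lays a ctypes array over a named shared memory segment. It assumes Python 3.8+ and substitutes the standard library's multiprocessing.shared_memory for a hand-written shm_open() wrapper; the function name shared_array_round_trip is hypothetical, and the second handle stands in for another process that was given the segment name.

```python
import ctypes
from multiprocessing import shared_memory


def shared_array_round_trip(values):
    """Write doubles through one ctypes array, read them back through another.

    Both arrays are backed by the same named segment. In a real program the
    second handle would live in a different process that received the name.
    """
    n = len(values)
    array_type = ctypes.c_double * n

    # Owner side: the size is chosen only now, once the data is known.
    owner = shared_memory.SharedMemory(create=True, size=ctypes.sizeof(array_type))
    try:
        arr = array_type.from_buffer(owner.buf)
        arr[:] = values

        # Peer side: attach using nothing but the identifier.
        peer = shared_memory.SharedMemory(name=owner.name)
        try:
            view = array_type.from_buffer(peer.buf)
            result = list(view)
            del view  # drop the buffer export so close() can succeed
        finally:
            peer.close()

        del arr  # same reason: release the export before closing
    finally:
        owner.close()
        owner.unlink()  # remove the segment once nobody needs it
    return result
```

The explicit del statements matter: a ctypes array created with from_buffer keeps the underlying buffer exported, and closing the segment while an export is alive raises BufferError.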


Answer

Your problem sounds like a perfect fit for the posix_ipc or sysv_ipc modules, which expose either the POSIX or SysV APIs for shared memory, semaphores, and message queues. The feature matrix there includes excellent advice for picking among the modules their author provides.

The problem with anonymous mmap(2) areas is that you cannot easily share them with other processes. If they were file-backed, it would be easy, but if you don't actually need the file for anything else, it feels silly. You could use the CLONE_VM flag to the clone(2) system call if this were in C, but I wouldn't want to try using it with a language interpreter that probably makes assumptions about memory safety. (It would be a little dangerous even in C, as maintenance programmers five years from now might also be shocked by the CLONE_VM behavior.)
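For comparison, here is a minimal sketch of the file-backed route, assuming a Unix-like system where the fork start method is available. The helper names child and share_after_start are made up for illustration. The child is started before any data exists; the parent later creates a temporary file, maps it, fills it, and sends the path and length over a pipe.

```python
import mmap
import multiprocessing as mp
import os
import tempfile


def child(conn):
    """Wait for a (path, size) message, map that file, and echo its contents."""
    path, size = conn.recv()
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), size) as view:
            conn.send(bytes(view[:size]))
    conn.close()


def share_after_start(payload: bytes) -> bytes:
    """Start the child first, then create the mapping and hand it over."""
    ctx = mp.get_context("fork")  # assumption: fork is available on this OS
    parent_conn, child_conn = ctx.Pipe()
    proc = ctx.Process(target=child, args=(child_conn,))
    proc.start()  # the child is already running; no data exists yet

    fd, path = tempfile.mkstemp()
    try:
        os.ftruncate(fd, len(payload))  # size decided only now
        with mmap.mmap(fd, len(payload)) as view:  # shared mapping by default on Unix
            view[:] = payload
            parent_conn.send((path, len(payload)))
            echoed = parent_conn.recv()
    finally:
        os.close(fd)
        os.unlink(path)
        proc.join()
    return echoed
```

This works, but it leaves a real file on disk for the lifetime of the mapping, which is the part that feels silly when the file serves no other purpose.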

But SysV and the newer POSIX shared memory APIs let even unrelated processes attach to and detach from shared memory by identifier. All you need to do is pass the identifier from the process that creates a mapping to the processes that consume it; data you then manipulate within the mapping is visible to all of them simultaneously, with no additional parsing overhead. The shm_open(3) function returns an int that is used as a file descriptor in later calls to ftruncate(2) and then mmap(2), so other processes can use the shared memory segment without a file being created in the filesystem, and the memory persists even after every process using it has exited, until it is removed with shm_unlink(3). (A little strange for Unix, perhaps, but it is flexible.)
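The attach-by-identifier pattern might look roughly like the sketch below. It does not use posix_ipc or sysv_ipc; it assumes Python 3.8+ and uses the standard library's multiprocessing.shared_memory, which wraps POSIX shared memory on Unix-like systems. The function names publish, consume and round_trip are hypothetical. Here both sides run in one process to keep the example short; in practice the name and length would be sent to an already-running child over a Pipe or Queue.

```python
from multiprocessing import shared_memory


def publish(payload: bytes) -> shared_memory.SharedMemory:
    """Producer side: create a named segment sized to the data and fill it."""
    shm = shared_memory.SharedMemory(create=True, size=len(payload))
    shm.buf[:len(payload)] = payload
    return shm


def consume(name: str, length: int) -> bytes:
    """Consumer side: attach by name, copy the bytes out, then detach."""
    shm = shared_memory.SharedMemory(name=name)
    try:
        # The segment may be rounded up to a page size, so slice explicitly.
        return bytes(shm.buf[:length])
    finally:
        shm.close()


def round_trip(payload: bytes) -> bytes:
    """Publish the payload, read it back by name, then remove the segment."""
    producer = publish(payload)
    try:
        return consume(producer.name, len(payload))
    finally:
        producer.close()
        producer.unlink()  # the segment would otherwise outlive the process
```

Only the short name string and the length cross the process boundary; the payload itself is never pickled or copied through the pipe.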
