Python memory allocation error using subprocess.Popen

2022-01-18 00:00:00 python subprocess memory-management

Problem description

I am doing some bioinformatics work. I have a Python script that at one point calls a program to run an expensive process (sequence alignment, which uses a lot of computational power and memory). I call it using subprocess.Popen. When I run it on a test case, it completes fine. However, when I run it on the full file, where it has to do this multiple times for different sets of inputs, it dies. Subprocess throws:

OSError: [Errno 12] Cannot allocate memory

I found a few links here, here, and here to similar problems, but I'm not sure that they apply in my case.

By default, the sequence aligner will try to request 51000M of memory. It doesn't always use that much, but it might. With the full input loaded and processed, that much is not available. However, capping the amount it requests (or will attempt to use) at a lower amount that might be available at run time still gives me the same error. I've also tried running with shell=True, with the same result.

This has been bugging me for a few days now. Thanks for any help.

Extended traceback:

File "..../python2.6/subprocess.py", line 1037, in _execute_child
    self.pid=os.fork()
OSError: [Errno 12] Cannot allocate memory

That is where the error is thrown.

Edit2: Running Python 2.6.4 on 64-bit Ubuntu 10.4


Solution

I feel really sorry for the OP. Six years later, no one has mentioned that this is a very common problem on Unix and actually has nothing to do with Python or bioinformatics. A call to os.fork() temporarily doubles the memory of the parent process (the parent's memory must be available to the child) before throwing it all away to do an exec(). Although this memory is not always actually copied, the system must have enough memory to allow for it to be copied. So if your parent process is using more than half of the system memory and you spawn a subprocess for something as trivial as "wc -l", you are going to run into a memory error.
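To check whether a script is anywhere near the "more than half of the system memory" situation described above, a rough diagnostic along these lines might help. It is only a sketch: the function name peak_memory_fraction is made up for illustration, it assumes Linux (where ru_maxrss is reported in kilobytes and SC_PHYS_PAGES is available), and it measures peak resident memory, which is only a proxy for what the kernel actually checks at fork time.

```python
import os
import resource
import sys


def peak_memory_fraction():
    """Return this process's peak resident memory as a fraction of physical RAM."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux but in bytes on macOS.
    peak_bytes = peak if sys.platform == "darwin" else peak * 1024
    total_bytes = os.sysconf("SC_PHYS_PAGES") * os.sysconf("SC_PAGE_SIZE")
    return peak_bytes / total_bytes


if __name__ == "__main__":
    fraction = peak_memory_fraction()
    print("peak resident memory: %.1f%% of physical RAM" % (100 * fraction))
    if fraction > 0.5:
        print("warning: a fork() from this process may fail with ENOMEM")
```

Calling this just before the subprocess.Popen call would show how large the parent has grown by the time it tries to fork.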

The solution is to use posix_spawn, or to create all your subprocesses at the beginning of the script, while memory consumption is low, and then use them later on, after the parent process has done its memory-intensive work.
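Both suggestions can be sketched in Python. This is an illustration under assumptions, not the OP's code: os.posix_spawn only exists in Python 3.8+ on Unix (it is not available in the Python 2.6 from the question), and the names spawn_and_wait, HELPER_SOURCE, start_helper and run_via_helper are hypothetical. The helper is a small process started early that launches commands on request, so the fork happens in a process with a small memory footprint.

```python
import json
import os
import subprocess
import sys


def spawn_and_wait(argv):
    """Run argv with os.posix_spawn and return its exit code."""
    # argv[0] must be a full path; os.posix_spawnp would search PATH instead.
    pid = os.posix_spawn(argv[0], argv, os.environ)
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return -os.WTERMSIG(status)  # negative signal number if killed by a signal


# Hypothetical helper: reads one JSON argv per line, runs it, replies with the exit code.
HELPER_SOURCE = """
import json, subprocess, sys
for line in sys.stdin:
    # Keep the command away from the request/reply pipes.
    code = subprocess.call(json.loads(line),
                           stdin=subprocess.DEVNULL, stdout=sys.stderr)
    sys.stdout.write(json.dumps(code) + "\\n")
    sys.stdout.flush()
"""


def start_helper():
    """Start the helper early, while the parent process is still small."""
    return subprocess.Popen(
        [sys.executable, "-c", HELPER_SOURCE],
        stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True,
    )


def run_via_helper(helper, argv):
    """Ask the helper to run argv and return the command's exit code."""
    helper.stdin.write(json.dumps(argv) + "\n")
    helper.stdin.flush()
    return json.loads(helper.stdout.readline())
```

In use, start_helper() would be called at the top of the script, before the large input is loaded, and run_via_helper(helper, [...]) would replace each later subprocess.Popen call. The helper still forks, but it forks from a small process rather than from the memory-heavy parent.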

A Google search using the keywords "os.fork" and "memory" will show several Stack Overflow posts on the topic that can further explain what's going on :)
