Using subprocess.Popen to handle large output

2022-01-18 00:00:00 python subprocess

Question

I have some Python code that executes an external app. It works fine when the app produces a small amount of output, but hangs when there is a lot. My code looks like:

p = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
errcode = p.wait()
retval = p.stdout.read()
errmess = p.stderr.read()
if errcode:
    log.error('cmd failed <%s>: %s' % (errcode,errmess))

There are comments in the docs that seem to indicate the potential issue. Under wait, there is:

Warning: This will deadlock if the child process generates enough output to a stdout or stderr pipe such that it blocks waiting for the OS pipe buffer to accept more data. Use communicate() to avoid that.

though under communicate, I see:

Note: The data read is buffered in memory, so do not use this method if the data size is large or unlimited.

So it is unclear to me whether I should use either of these if I have a large amount of data. The docs don't indicate what method I should use in that case.

I do need the return value from the exec, and I parse and use both stdout and stderr.

So what is an equivalent method in Python to exec an external app that is going to have large output?


Answer

You're doing blocking reads to two files; the first needs to complete before the second starts. If the application writes a lot to stderr, and nothing to stdout, then your process will sit waiting for data on stdout that isn't coming, while the program you're running sits there waiting for the stuff it wrote to stderr to be read (which it never will be, since you're waiting for stdout). In your code the hang can start even earlier: p.wait() blocks until the child exits, but the child cannot exit while it is blocked writing to a full pipe that nobody is reading yet.

There are a few ways you can fix this.

The simplest is to not intercept stderr; leave stderr=None. Errors will be output to stderr directly. You can't intercept them and display them as part of your own message. For command-line tools, this is often OK. For other apps, it can be a problem.
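
A minimal sketch of this first option, assuming Python 3. The helper name run_stdout_only and the child command are made up for illustration. The point is that stdout is drained to EOF before wait() is called, so the child never stalls on a full pipe:

```python
import subprocess
import sys

def run_stdout_only(cmd):
    """Run cmd and capture stdout only; stderr goes to the parent's stderr."""
    # stderr=None (the default) lets the child inherit our stderr,
    # so there is only one pipe that has to be drained.
    p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=None)
    chunks = []
    # Read until EOF *before* waiting for the child to exit.
    for chunk in iter(lambda: p.stdout.read(65536), b""):
        chunks.append(chunk)  # could also be parsed here instead of stored
    p.stdout.close()
    errcode = p.wait()
    return errcode, b"".join(chunks)

# Hypothetical child process that writes about 1 MB to stdout.
child = [sys.executable, "-c", "import sys; sys.stdout.write('x' * 1000000)"]
errcode, retval = run_stdout_only(child)
```

If the output really is too large to hold in memory, each chunk can be processed inside the loop rather than appended to a list.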

Another simple approach is to redirect stderr to stdout, so you only have one incoming file: set stderr=subprocess.STDOUT. This means you can't distinguish regular output from error output. This may or may not be acceptable, depending on how the application writes output.
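
Roughly, the merged-stream variant could look like the sketch below. The helper name run_merged and the child script are assumptions for illustration only. Because both streams arrive on one pipe, a plain loop over that pipe is enough, and nothing has to be kept in memory beyond the current line:

```python
import subprocess
import sys

def run_merged(cmd):
    """Run cmd with stderr folded into stdout; return (exit code, bytes seen)."""
    p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    total_bytes = 0
    # Iterate line by line as the data arrives; parse each line here.
    for line in p.stdout:
        total_bytes += len(line)
    p.stdout.close()
    return p.wait(), total_bytes

# Hypothetical child: writes 2 MB to each stream, then exits with status 3.
child_code = (
    "import sys\n"
    "for i in range(20000):\n"
    "    sys.stdout.write('o' * 99 + '\\n')\n"
    "    sys.stderr.write('e' * 99 + '\\n')\n"
    "sys.exit(3)\n"
)
errcode, total_bytes = run_merged([sys.executable, "-c", child_code])
```

The relative order of stdout and stderr lines in the merged stream depends on how the child buffers each stream, so it should not be relied on.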

The complete and complicated way of handling this is select (http://docs.python.org/library/select.html). This lets you read in a non-blocking way: you get data whenever data appears on either stdout or stderr. I'd only recommend this if it's really necessary. On Windows, select only accepts sockets, so this approach does not work for pipes there.
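
For reference, a select-based reader might look something like this. It is a sketch for POSIX systems only, and run_with_select and the child command are hypothetical names. It reads the raw file descriptors with os.read so that no data ends up stuck in Python's own file buffers:

```python
import os
import select
import subprocess
import sys

def run_with_select(cmd):
    """Drain stdout and stderr concurrently with select (POSIX only)."""
    p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out_fd, err_fd = p.stdout.fileno(), p.stderr.fileno()
    buffers = {out_fd: [], err_fd: []}
    open_fds = [out_fd, err_fd]
    while open_fds:
        # Block until at least one pipe has data (or has reached EOF).
        readable, _, _ = select.select(open_fds, [], [])
        for fd in readable:
            data = os.read(fd, 65536)
            if data:
                buffers[fd].append(data)
            else:
                open_fds.remove(fd)  # empty read means EOF on this pipe
    p.stdout.close()
    p.stderr.close()
    errcode = p.wait()
    return errcode, b"".join(buffers[out_fd]), b"".join(buffers[err_fd])

# Hypothetical child reproducing the problem case: a lot on stderr,
# nothing on stdout, and a non-zero exit status.
child = [sys.executable, "-c",
         "import sys; sys.stderr.write('e' * 1000000); sys.exit(2)"]
errcode, retval, errmess = run_with_select(child)
```

This keeps stdout and stderr separate and still returns the exit code, which is what the question asks for. As written it accumulates everything in memory; for truly unbounded output, each chunk would need to be parsed or written to disk as it arrives.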
