Python: subprocess.call, stdout to file, stderr to file, display stderr on screen in real time

2022-01-18 00:00:00 python subprocess stderr

Problem description

I have a command line tool (actually, several) that I am writing a wrapper for in Python.

The tool is generally used like this:

 $ path_to_tool -option1 -option2 > file_out

The user gets the output written to file_out, and is also able to see various status messages of the tool as it is running.

I want to replicate this behavior, while also logging stderr (the status messages) to a file.

What I have is this:

from subprocess import call
call(['path_to_tool','-option1','option2'], stdout = file_out, stderr = log_file)

This works fine EXCEPT that stderr is not written to the screen. I can add code to print the contents of the log_file to the screen of course, but then the user will see it after everything is done rather than while it is happening.

To recap, desired behavior is:

  1. Use call() or subprocess()
  2. Direct stdout to a file
  3. Direct stderr to a file, while also writing stderr to the screen in real time, as if the tool had been called directly from the command line.

I have a feeling I'm either missing something really simple, or this is much more complicated than I thought...thanks for any help!

This only needs to work on Linux.


Solution

You can do this with subprocess, but it's not trivial. If you look at the Frequently Used Arguments in the docs, you'll see that you can pass PIPE as the stderr argument, which creates a new pipe, passes one side of the pipe to the child process, and makes the other side available to use as the stderr attribute.*

So, you will need to service that pipe, writing to the screen and to the file. In general, getting the details right for this is very tricky.** In your case, there's only one pipe, and you're planning on servicing it synchronously, so it's not that bad.

import subprocess
import sys

proc = subprocess.Popen(['path_to_tool', '-option1', 'option2'],
                        stdout=file_out, stderr=subprocess.PIPE,
                        universal_newlines=True)  # text mode, so lines are str rather than bytes
for line in proc.stderr:
    sys.stdout.write(line)   # echo to the screen as it arrives
    log_file.write(line)     # and also append it to the log file
proc.wait()

(Note that there are some issues using for line in proc.stderr:—basically, if what you're reading turns out not to be line-buffered for any reason, you can sit around waiting for a newline even though there's actually half a line worth of data to process. You can read chunks at a time with, say, read(128), or even read(1), to get the data more smoothly if necessary. If you need to actually get every byte as soon as it arrives, and can't afford the cost of read(1), you'll need to put the pipe in non-blocking mode and read asynchronously.)
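
If you do want the chunked reads described above, a rough sketch (not part of the original answer) might look like this; it uses read1(), which returns whatever is currently available up to the requested size, and assumes Python 3 with file_out and log_file opened in binary mode:

import subprocess
import sys

proc = subprocess.Popen(['path_to_tool', '-option1', 'option2'],
                        stdout=file_out, stderr=subprocess.PIPE)
while True:
    chunk = proc.stderr.read1(128)  # up to 128 bytes, without waiting for a full line
    if not chunk:                   # an empty read means the pipe hit EOF
        break
    sys.stderr.buffer.write(chunk)  # raw bytes go to the underlying binary stream
    sys.stderr.flush()
    log_file.write(chunk)
proc.wait()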

But if you're on Unix, it might be simpler to use the tee command to do it for you.

For a quick&dirty solution, you can use the shell to pipe through it. Something like this:

# This needs bash for the process substitution; a plain POSIX /bin/sh won't parse it.
subprocess.call('path_to_tool -option1 option2 2> >(tee log_file >&2)',
                shell=True, executable='/bin/bash', stdout=file_out)

But I don't want to debug shell piping; let's do it in Python, as shown in the docs:

tool = subprocess.Popen(['path_to_tool', '-option1', 'option2'],
                        stdout=file_out, stderr=subprocess.PIPE)
# Feed the tool's stderr into tee, which writes it both to log_file and to the screen.
tee = subprocess.Popen(['tee', 'log_file'], stdin=tool.stderr)
tool.stderr.close()  # so the tool gets SIGPIPE if tee exits first
tee.communicate()    # wait for tee (and thus the tool's stderr) to finish

Finally, there are a dozen or more higher-level wrappers around subprocesses and/or the shell on PyPI—sh, shell, shell_command, shellout, iterpipes, sarge, cmd_utils, commandwrapper, etc. Search for "shell", "subprocess", "process", "command line", etc. and find one you like that makes the problem trivial.

What if you need to gather both stderr and stdout?

The easy way to do it is to just redirect one to the other, as Sven Marnach suggests in a comment. Just change the Popen parameters like this:

tool = subprocess.Popen(['path_to_tool', '-option1', 'option2'],
                        stdout=subprocess.PIPE, stderr=subprocess.STDOUT)

And then everywhere you used tool.stderr, use tool.stdout instead—e.g., for the last example:

tee = subprocess.Popen(['tee', 'log_file'], stdin=tool.stdout)
tool.stdout.close()
tee.communicate()

But this has some tradeoffs. Most obviously, mixing the two streams together means you can't log stdout to file_out and stderr to log_file, or copy stdout to your stdout and stderr to your stderr. But it also means the ordering can be non-deterministic—if the subprocess always writes two lines to stderr before writing anything to stdout, you might end up getting a bunch of stdout between those two lines once you mix the streams. And it means they have to share stdout's buffering mode, so if you were relying on the fact that linux/glibc guarantees stderr to be line-buffered (unless the subprocess explicitly changes it), that may no longer be true.

If you need to handle the two processes separately, it gets more difficult. Earlier, I said that servicing the pipe on the fly is easy as long as you only have one pipe and can service it synchronously. If you have two pipes, that's obviously no longer true. Imagine you're waiting on tool.stdout.read(), and new data comes in from tool.stderr. If there's too much data, it can cause the pipe to overflow and the subprocess to block. But even if that doesn't happen, you obviously won't be able to read and log the stderr data until something comes in from stdout.

If you use the pipe-through-tee solution, that avoids the initial problem… but only by creating a new problem that's just as bad. You have two tee instances, and while you're calling communicate on one, the other one is sitting around waiting forever.

So, either way, you need some kind of asynchronous mechanism. You can do this with threads, a select reactor, something like gevent, etc.

Here's a quick and dirty example:

import subprocess
import sys
import threading

proc = subprocess.Popen(['path_to_tool', '-option1', 'option2'],
                        stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                        universal_newlines=True)

def tee_pipe(pipe, f1, f2):
    # Copy each line from the pipe to both destinations.
    for line in pipe:
        f1.write(line)
        f2.write(line)

t1 = threading.Thread(target=tee_pipe, args=(proc.stdout, file_out, sys.stdout))
t2 = threading.Thread(target=tee_pipe, args=(proc.stderr, log_file, sys.stderr))
t3 = threading.Thread(target=proc.wait)  # reap the child when it exits
t1.start(); t2.start(); t3.start()
t1.join(); t2.join(); t3.join()

However, there are some edge cases where that won't work. (The problem is the order in which SIGCHLD and SIGPIPE/EPIPE/EOF arrive. I don't think any of that will affect us here, since we're not sending any input… but don't trust me on that without thinking it through and/or testing.) The subprocess.communicate function from 3.3+ gets all the fiddly details right. But you may find it a lot simpler to use one of the async-subprocess wrapper implementations you can find on PyPI and ActiveState, or even the subprocess stuff from a full-fledged async framework like Twisted.
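
Purely as an illustration, and not something from the original answer: the same tee logic written against the standard library's asyncio subprocess support (Python 3.7+), assuming file_out and log_file are open binary-mode file objects:

import asyncio
import sys

async def tee_stream(stream, f1, f2):
    # Copy each line from the pipe to both destinations as it arrives.
    while True:
        line = await stream.readline()
        if not line:
            break
        f1.write(line)
        f2.write(line)
        f2.flush()

async def main():
    proc = await asyncio.create_subprocess_exec(
        'path_to_tool', '-option1', 'option2',
        stdout=asyncio.subprocess.PIPE, stderr=asyncio.subprocess.PIPE)
    await asyncio.gather(
        tee_stream(proc.stdout, file_out, sys.stdout.buffer),
        tee_stream(proc.stderr, log_file, sys.stderr.buffer),
    )
    await proc.wait()

asyncio.run(main())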

* The docs don't really explain what pipes are, almost as if they expect you to be an old Unix C hand… But some of the examples, especially in the Replacing Older Functions with the subprocess Module section, show how they're used, and it's pretty simple.
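
For reference, the examples in that section are shell-pipeline replacements along these lines (the equivalent of dmesg | grep hda):

from subprocess import Popen, PIPE

p1 = Popen(["dmesg"], stdout=PIPE)
p2 = Popen(["grep", "hda"], stdin=p1.stdout, stdout=PIPE)
p1.stdout.close()  # allow p1 to receive SIGPIPE if p2 exits first
output = p2.communicate()[0]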

** The hard part is sequencing two or more pipes properly. If you wait on one pipe, the other may overflow and block, preventing your wait on the other one from ever finishing. The only easy way to get around this is to create a thread to service each pipe. (On most *nix platforms, you can use a select or poll reactor instead, but making that cross-platform is amazingly difficult.) The source to the subprocess module, especially communicate and its helpers, shows how to do it. (I linked to 3.3, because in earlier versions, communicate itself gets some important things wrong…) This is why, whenever possible, you want to use communicate if you need more than one pipe. In your case, you can't use communicate, but fortunately you don't need more than one pipe.
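
As a rough, single-threaded illustration of the select/poll-reactor idea mentioned here (again, a sketch rather than code from the original answer), the standard selectors module can service both pipes at once; this assumes Linux and binary-mode file_out and log_file:

import os
import selectors
import subprocess
import sys

proc = subprocess.Popen(['path_to_tool', '-option1', 'option2'],
                        stdout=subprocess.PIPE, stderr=subprocess.PIPE)

sel = selectors.DefaultSelector()
# Associate each pipe with the pair of destinations it should be copied to.
sel.register(proc.stdout, selectors.EVENT_READ, (file_out, sys.stdout.buffer))
sel.register(proc.stderr, selectors.EVENT_READ, (log_file, sys.stderr.buffer))

while sel.get_map():
    for key, _ in sel.select():
        data = os.read(key.fileobj.fileno(), 4096)  # grab whatever is available
        if not data:                                # EOF on this pipe
            sel.unregister(key.fileobj)
            continue
        for dest in key.data:
            dest.write(data)
            dest.flush()
proc.wait()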
