Run a command and get its stdout and stderr separately, in near-real time, as it would behave in a terminal

2022-01-18 · tags: python, subprocess, pexpect, tty, pty

Problem description


I am trying to find a way in Python to run other programs in such a way that:

  1. The stdout and stderr of the program being run can be logged separately.
  2. The stdout and stderr of the program being run can be viewed in near-real time, such that if the child process hangs, the user can see. (i.e. we do not wait for execution to complete before printing the stdout/stderr to the user)
  3. Bonus criteria: The program being run does not know it is being run via python, and thus will not do unexpected things (like chunk its output instead of printing it in real-time, or exit because it demands a terminal to view its output). This small criteria pretty much means we will need to use a pty I think.

Here is what i've got so far... Method 1:

def method1(command):
    ## subprocess.communicate() will give us the stdout and stderr separately,
    ## but we will have to wait until the end of command execution to print anything.
    ## This means if the child process hangs, we will never know....
    proc=subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=True, executable='/bin/bash')
    stdout, stderr = proc.communicate() # record both, but no way to print stdout/stderr in real-time
    print ' ######### REAL-TIME ######### '
    ########         Not Possible
    print ' ########## RESULTS ########## '
    print 'STDOUT:'
    print stdout
    print 'STDERR:'
    print stderr

Method 2:

def method2(command):
    ## Using pexpect to run our command in a pty, we can see the child's stdout in real-time,
    ## however we cannot see the stderr from "curl google.com", presumably because it is not connected to a pty?
    ## Furthermore, I do not know how to log it beyond writing out to a file (p.logfile). I need the stdout and stderr
    ## as strings, not files on disk! On the upside, pexpect would give a lot of extra functionality (if it worked!)
    proc = pexpect.spawn('/bin/bash', ['-c', command])
    print ' ######### REAL-TIME ######### '
    proc.interact()
    print ' ########## RESULTS ########## '
    ########         Not Possible

Method 3:

def method3(command):
    ## This method is very much like method1, and would work exactly as desired
    ## if only proc.xxx.read(1) wouldn't block waiting for something. Which it does. So this is useless.
    proc=subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=True, executable='/bin/bash')
    print ' ######### REAL-TIME ######### '
    out,err,outbuf,errbuf = '','','',''
    firstToSpeak = None
    while proc.poll() is None:
            stdout = proc.stdout.read(1) # blocks
            stderr = proc.stderr.read(1) # also blocks
            if firstToSpeak is None:
                if stdout != '': firstToSpeak = 'stdout'; outbuf,errbuf = stdout,stderr
                elif stderr != '': firstToSpeak = 'stderr'; outbuf,errbuf = stdout,stderr
            else:
                if (stdout != '') or (stderr != ''): outbuf += stdout; errbuf += stderr
                else:
                    out += outbuf; err += errbuf;
                    if firstToSpeak == 'stdout': sys.stdout.write(outbuf+errbuf);sys.stdout.flush()
                    else: sys.stdout.write(errbuf+outbuf);sys.stdout.flush()
                    firstToSpeak = None
    print ''
    print ' ########## RESULTS ########## '
    print 'STDOUT:'
    print out
    print 'STDERR:'
    print err

To try these methods out, you will need to import sys, subprocess, pexpect

pexpect is pure Python and can be installed with

sudo pip install pexpect

I think the solution will involve python's pty module - which is somewhat of a black art; I cannot find anyone who knows how to use it. Perhaps SO knows :) As a heads-up, I recommend you use 'curl www.google.com' as a test command, because it prints its status to stderr for some reason :D


UPDATE-1:
OK so the pty library is not fit for human consumption. The docs, essentially, are the source code. Any presented solution that is blocking and not async is not going to work here. The Threads/Queue method by Padraic Cunningham works great, although adding pty support is not possible - and it's 'dirty' (to quote Freenode's #python). It seems like the only solution fit for production-standard code is using the Twisted framework, which even supports pty as a boolean switch to run processes exactly as if they were invoked from the shell. But adding Twisted into a project requires a total rewrite of all the code. This is a total bummer :/
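
For reference, here is a minimal sketch of that Threads/Queue pattern (a reconstruction, not Padraic Cunningham's exact code; the command and the drain helper are my own placeholders):

#!/usr/bin/env python3
from queue import Queue
from subprocess import Popen, PIPE
from threading import Thread

def drain(pipe, tag, queue):
    # One blocking reader thread per pipe; readline only stalls this
    # thread, so the main thread stays responsive.
    with pipe:
        for line in iter(pipe.readline, b''):
            queue.put((tag, line))
    queue.put(None)  # sentinel: this stream reached EOF

p = Popen(['curl', 'www.google.com'], stdout=PIPE, stderr=PIPE)
q = Queue()
for pipe, tag in [(p.stdout, 'stdout'), (p.stderr, 'stderr')]:
    Thread(target=drain, args=(pipe, tag, q), daemon=True).start()

logs = {'stdout': [], 'stderr': []}
for _ in range(2):  # one sentinel per stream
    for tag, line in iter(q.get, None):
        print(tag, line.decode(), end='')  # near-real time, line by line
        logs[tag].append(line)
p.wait()

This satisfies criteria 1 and 2 but not 3: the child still sees pipes, so it may block-buffer its output, which is exactly the problem the answer below addresses.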

UPDATE-2:

Two answers were provided, one of which addresses the first two criteria and will work well where you just need both the stdout and stderr using Threads and Queue. The other answer uses select, a non-blocking method for reading file descriptors, and pty, a method to "trick" the spawned process into believing it is running in a real terminal just as if it was run from Bash directly - but may or may not have side-effects. I wish I could accept both answers, because the "correct" method really depends on the situation and why you are subprocessing in the first place, but alas, I could only accept one.

Solution

The stdout and stderr of the program being run can be logged separately.

You can't use pexpect because both stdout and stderr go to the same pty and there is no way to separate them after that.
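
For example, in this quick illustration (not from the original answer; the bash one-liner is arbitrary) the child writes one marker to each stream, and pexpect hands back a single interleaved byte string:

import pexpect

# Both streams arrive on the child's single pty, interleaved and unlabeled.
child = pexpect.spawn('/bin/bash', ['-c', 'echo to-stdout; echo to-stderr >&2'])
child.expect(pexpect.EOF)
print(child.before)  # b'to-stdout\r\nto-stderr\r\n' -- origin information is lost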

The stdout and stderr of the program being run can be viewed in near-real time, such that if the child process hangs, the user can see. (i.e. we do not wait for execution to complete before printing the stdout/stderr to the user)

If the output of a subprocess is not a tty then it is likely that it uses block buffering, and therefore if it doesn't produce much output it won't be "real time"; e.g., if the buffer is 4K then your parent Python process won't see anything until the child process prints 4K chars and the buffer overflows, or until the buffer is flushed explicitly (inside the subprocess). This buffer lives inside the child process and there is no standard way to manage it from outside. [The original answer includes a picture showing the stdio buffers and the pipe buffer for a command1 | command2 shell pipeline.]
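
You can observe this buffering directly. In the sketch below (a stand-in child program, not from the original answer) the child prints one line per second, yet the parent receives all three lines together after about three seconds, when the child exits and its stdio buffer is flushed:

#!/usr/bin/env python3
import sys
import time
from subprocess import Popen, PIPE

child = r'''import time
for i in range(3):
    print('tick', i)  # stdout is a pipe here, so this is block-buffered
    time.sleep(1)
'''

with Popen([sys.executable, '-c', child], stdout=PIPE) as p:
    start = time.monotonic()
    for line in p.stdout:
        # all three lines arrive at once, ~3s in, not one per second
        print(round(time.monotonic() - start, 1), line.decode(), end='')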

The program being run does not know it is being run via python, and thus will not do unexpected things (like chunk its output instead of printing it in real-time, or exit because it demands a terminal to view its output).

It seems you meant the opposite: it is likely that your child process chunks its output instead of flushing each output line as soon as possible if the output is redirected to a pipe (when you use stdout=PIPE in Python). That means the default threading or asyncio solutions won't work as-is in your case.

There are several options to work around it:

  • the command may accept a command-line argument such as grep --line-buffered or python -u to disable block buffering, for example:
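
    A hypothetical invocation (python3 and child.py are placeholders for your actual interpreter and command):

    from subprocess import Popen, PIPE

    # -u makes the child's stdout/stderr unbuffered, so lines reach the
    # pipes as soon as they are printed; then use any reader shown below.
    p = Popen(['python3', '-u', 'child.py'], stdout=PIPE, stderr=PIPE)
    # or: p = Popen(['grep', '--line-buffered', 'pattern', 'log.txt'], stdout=PIPE, stderr=PIPE)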

  • stdbuf works for some programs, i.e., you could run ['stdbuf', '-oL', '-eL'] + command using the threading or asyncio solutions mentioned above; you should get stdout and stderr separately, and lines should appear in near-real time:

    #!/usr/bin/env python3
    import os
    import sys
    from select import select
    from subprocess import Popen, PIPE
    
    with Popen(['stdbuf', '-oL', '-e0', 'curl', 'www.google.com'],
               stdout=PIPE, stderr=PIPE) as p:
        readable = {
            p.stdout.fileno(): sys.stdout.buffer, # log separately
            p.stderr.fileno(): sys.stderr.buffer,
        }
        while readable:
            for fd in select(readable, [], [])[0]:
                data = os.read(fd, 1024) # read available
                if not data: # EOF
                    del readable[fd]
                else: 
                    readable[fd].write(data)
                    readable[fd].flush()
    

  • finally, you could try a pty + select solution with two ptys:

    #!/usr/bin/env python3
    import errno
    import os
    import pty
    import sys
    from select import select
    from subprocess import Popen
    
    masters, slaves = zip(pty.openpty(), pty.openpty())
    with Popen([sys.executable, '-c', r'''import sys, time
    print('stdout', 1) # no explicit flush
    time.sleep(.5)
    print('stderr', 2, file=sys.stderr)
    time.sleep(.5)
    print('stdout', 3)
    time.sleep(.5)
    print('stderr', 4, file=sys.stderr)
    '''],
               stdin=slaves[0], stdout=slaves[0], stderr=slaves[1]):
        for fd in slaves:
            os.close(fd) # no input
        readable = {
            masters[0]: sys.stdout.buffer, # log separately
            masters[1]: sys.stderr.buffer,
        }
        while readable:
            for fd in select(readable, [], [])[0]:
                try:
                    data = os.read(fd, 1024) # read available
                except OSError as e:
                    if e.errno != errno.EIO:
                        raise #XXX cleanup
                    del readable[fd] # EIO means EOF on some systems
                else:
                    if not data: # EOF
                        del readable[fd]
                    else:
                        readable[fd].write(data)
                        readable[fd].flush()
    for fd in masters:
        os.close(fd)
    

    I don't know what the side-effects are of using different ptys for stdout and stderr. You could try whether a single pty is enough in your case, e.g., set stderr=PIPE and use p.stderr.fileno() instead of masters[1]; a sketch of this variant follows below. A comment in the sh source suggests that there are issues if stderr is not in {STDOUT, pipe}.
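
    Here is a minimal sketch of that single-pty variant (my own adaptation of the code above; the inline child program is just a stand-in, and side-effects are untested):

    #!/usr/bin/env python3
    import errno
    import os
    import pty
    import sys
    from select import select
    from subprocess import Popen, PIPE

    master, slave = pty.openpty()  # one pty: the child's stdout sees a tty
    with Popen([sys.executable, '-c',
                "import sys; print('out'); print('err', file=sys.stderr)"],
               stdin=slave, stdout=slave, stderr=PIPE) as p:
        os.close(slave)  # parent no longer needs the slave end
        readable = {
            master: sys.stdout.buffer,             # pty-backed stdout
            p.stderr.fileno(): sys.stderr.buffer,  # plain pipe, still separate
        }
        while readable:
            for fd in select(readable, [], [])[0]:
                try:
                    data = os.read(fd, 1024)  # read available
                except OSError as e:
                    if e.errno != errno.EIO:  # EIO means EOF on some systems
                        raise
                    data = b''
                if not data:  # EOF
                    del readable[fd]
                else:
                    readable[fd].write(data)
                    readable[fd].flush()
    os.close(master)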
