Catching stdout in real time from a subprocess

2022-01-18 00:00:00 python subprocess stdout

Question

I want to subprocess.Popen() rsync.exe in Windows, and print the stdout in Python.

My code works, but it doesn't catch the progress until a file transfer is done! I want to print the progress for each file in real time.

Using Python 3.1 now since I heard it should be better at handling IO.

import subprocess, time, os, sys

cmd = "rsync.exe -vaz -P source/ dest/"
p, line = True, 'start'


p = subprocess.Popen(cmd,
                     shell=True,
                     bufsize=64,
                     stdin=subprocess.PIPE,
                     stderr=subprocess.PIPE,
                     stdout=subprocess.PIPE)

for line in p.stdout:
    print(">>> " + str(line.rstrip()))
    p.stdout.flush()


Solution

Some rules of thumb for subprocess:

  • Never use shell=True. It needlessly invokes an extra shell process to call your program.
  • When calling processes, arguments are passed around as lists: sys.argv in Python is a list, and so is argv in C. So pass a list to Popen to call subprocesses, not a string.
  • Don't redirect stderr to a PIPE when you're not reading it.
  • Don't redirect stdin when you're not writing to it.
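A minimal sketch of the list-versus-string rule: if a command already exists as a string, shlex.split() can turn it into the list Popen expects, and a throwaway Python child (a stand-in assumed here, not part of the original answer) shows that each list element arrives as exactly one argument. Note that shlex.split() follows POSIX shell rules, so Windows paths containing backslashes would be mangled; writing the list by hand avoids that.

```python
import shlex
import subprocess
import sys

# The command from the question, written as one string.
cmd_string = "rsync.exe -vaz -P source/ dest/"

# shlex.split applies POSIX shell-style splitting, giving the list form
# that Popen expects when shell=True is not used.
cmd_list = shlex.split(cmd_string)
print(cmd_list)  # ['rsync.exe', '-vaz', '-P', 'source/', 'dest/']

# Each list element reaches the child as one argument, with no shell
# involved; here a Python child simply echoes its arguments back.
child = [sys.executable, "-c", "import sys; print(sys.argv[1:])", "a b", "c"]
result = subprocess.run(child, stdout=subprocess.PIPE, universal_newlines=True)
print(result.stdout.strip())  # ['a b', 'c']
```

The "a b" element stays a single argument even though it contains a space, which is exactly what a shell would have split apart.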

Example:

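import subprocess
import sys

cmd = ["rsync.exe", "-vaz", "-P", "source/", "dest/"]

# Merge stderr into stdout so both arrive on the one pipe being read.
p = subprocess.Popen(cmd,
                     stdout=subprocess.PIPE,
                     stderr=subprocess.STDOUT)

# iter() calls readline() until it returns b'' (EOF), so each line is
# handled as soon as it reaches the pipe instead of after the child exits.
for line in iter(p.stdout.readline, b''):
    # Lines are bytes in Python 3; decode before joining with a str.
    print(">>> " + line.decode("utf-8", "replace").rstrip())

p.stdout.close()
p.wait()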
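Because readline() only returns at a newline, a progress display that redraws itself with carriage returns will still show up only once the whole line is finished. rsync's progress output generally works this way, but treat that as an assumption to verify against your rsync build. A minimal sketch under that assumption reads raw chunks from the pipe and splits on both \r and \n; the helper name stream_lines is hypothetical, and a Python child stands in for rsync so the example runs anywhere.

```python
import os
import re
import subprocess
import sys

def stream_lines(cmd):
    """Hypothetical helper: yield output pieces ended by \\r or \\n as they arrive."""
    p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    fd = p.stdout.fileno()
    pending = b""
    while True:
        # os.read returns as soon as some bytes are available (b"" at EOF),
        # unlike readline(), which waits for a full "\n"-terminated line.
        chunk = os.read(fd, 4096)
        if not chunk:
            break
        pending += chunk
        # Split on either terminator; the last piece may still be incomplete.
        parts = re.split(rb"[\r\n]", pending)
        pending = parts.pop()
        for part in parts:
            if part:  # skip the empty piece between "\r" and "\n"
                yield part.decode("utf-8", "replace")
    if pending:
        yield pending.decode("utf-8", "replace")
    p.stdout.close()
    p.wait()

# Stand-in child that redraws a progress line with "\r", then ends with "\n".
child_code = (
    "import sys, time\n"
    "for pct in (10, 50, 100):\n"
    "    sys.stdout.write('progress %d%%\\r' % pct)\n"
    "    sys.stdout.flush()\n"
    "    time.sleep(0.1)\n"
    "sys.stdout.write('done\\n')\n"
)
pieces = []
for piece in stream_lines([sys.executable, "-c", child_code]):
    print(">>> " + piece)
    pieces.append(piece)
```

This only helps if the child actually flushes each update; it does nothing about buffering inside the child.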

That said, rsync probably buffers its output when it detects that it is connected to a pipe instead of a terminal. This is the default behavior of the standard C library: stdout is line-buffered when it is a terminal but fully buffered when it is a pipe, so a program must explicitly flush stdout to deliver output in real time.
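To see the effect from the child's side, a minimal sketch using a Python child in place of rsync (an assumption for illustration): under a PIPE, the child's isatty() check returns False, which is what makes buffered I/O layers choose full buffering. For a Python child specifically, the -u flag turns stdout buffering off, so lines reach the parent without explicit flush() calls; rsync has no such switch here, so this only illustrates the mechanism.

```python
import subprocess
import sys

# The child reports whether its stdout is a terminal; under a PIPE it is not.
check = subprocess.run(
    [sys.executable, "-c", "import sys; print(sys.stdout.isatty())"],
    stdout=subprocess.PIPE,
    universal_newlines=True,
)
print(check.stdout.strip())  # False

# With -u the child's stdout is unbuffered, so the parent can read each
# line as soon as it is printed, even though the child never calls flush().
child_code = "import time\nfor i in range(3):\n    print('tick', i)\n    time.sleep(0.2)\n"
p = subprocess.Popen(
    [sys.executable, "-u", "-c", child_code],
    stdout=subprocess.PIPE,
    universal_newlines=True,
)
ticks = [line.rstrip() for line in p.stdout]
p.wait()
print(ticks)  # ['tick 0', 'tick 1', 'tick 2']
```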

To test for that, try running this instead:

cmd = [sys.executable, 'test_out.py']

and create a test_out.py file with the contents:

import sys
import time

print("Hello")
sys.stdout.flush()  # push "Hello" through the pipe now, not at exit
time.sleep(10)
print("World")

Executing that subprocess should give you "Hello" and then wait 10 seconds before giving "World". If that happens with the Python child above but not with rsync, then rsync itself is buffering its output, and you are out of luck with a plain pipe.
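The same test can also be run from a single script, without a separate test_out.py, by passing the child's code with -c and timing when each line arrives. A sketch, with the 10-second sleep shortened to 2 seconds (an arbitrary choice) so it finishes quickly:

```python
import subprocess
import sys
import time

# Same logic as test_out.py, with a shorter sleep.
child_code = (
    "import sys, time\n"
    "print('Hello')\n"
    "sys.stdout.flush()\n"
    "time.sleep(2)\n"
    "print('World')\n"
)

start = time.monotonic()
p = subprocess.Popen([sys.executable, "-c", child_code],
                     stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
arrivals = []
for line in iter(p.stdout.readline, b''):
    # Record how long after launch each line reached the parent.
    arrivals.append((line.decode("utf-8", "replace").rstrip(),
                     time.monotonic() - start))
p.stdout.close()
p.wait()

for text, elapsed in arrivals:
    print("%-6s after %.1f s" % (text, elapsed))
```

If reading works in real time, "Hello" should show up almost immediately and "World" about 2 seconds later; if both arrive together, the output was buffered somewhere.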

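One solution is to connect the child directly to a pty (pseudo-terminal), using something like pexpect, so that it believes it is writing to a terminal. Note that pexpect's pty-based spawn is Unix-only and is not available on Windows.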
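pexpect is one option; Python's standard library also has a pty module. A minimal sketch, swapping pexpect for stdlib pty: hand the child the slave end of a pseudo-terminal as its stdout, so its isatty() check returns True and a typical C program line-buffers its output. This is Unix-only (pty is not available on Windows, which is where the question runs), and the helper name run_with_pty is hypothetical.

```python
import os
import pty
import subprocess
import sys

def run_with_pty(cmd):
    """Hypothetical helper: run cmd with a pseudo-terminal as stdout/stderr,
    echoing output as it arrives and returning all of it (Unix only)."""
    master, slave = pty.openpty()
    p = subprocess.Popen(cmd, stdout=slave, stderr=slave)
    os.close(slave)  # only the child keeps the slave end open
    collected = []
    while True:
        try:
            data = os.read(master, 1024)
        except OSError:  # Linux raises EIO once the child closes the terminal
            break
        if not data:     # other systems may report EOF as b""
            break
        collected.append(data)
        sys.stdout.write(data.decode("utf-8", "replace"))
        sys.stdout.flush()
    os.close(master)
    p.wait()
    return b"".join(collected)

# The child now sees a terminal instead of a pipe.
out = run_with_pty([sys.executable, "-c", "import sys; print(sys.stdout.isatty())"])
```

The terminal layer normally translates "\n" into "\r\n", so output read this way may carry extra carriage returns that need stripping.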
