CompletedProcess from subprocess.run() doesn't return a string

2022-01-18 00:00:00 python subprocess python-3.5

Question

According to the Python 3.5 docs, subprocess.run() returns a CompletedProcess object with a stdout member that contains "A bytes sequence, or a string if run() was called with universal_newlines=True." I'm only seeing a byte sequence and not a string, which I was assuming (hoping) would be equivalent to a text line. For example,

import pprint
import subprocess

my_data = ""
line_count = 0

proc = subprocess.run(
         args = [ 'cat', 'input.txt' ],
         universal_newlines = True,
         stdout = subprocess.PIPE)

for text_line in proc.stdout:
    my_data += text_line
    line_count += 1

word_file = open('output.txt', 'w')
pprint.pprint(my_data, word_file)
pprint.pprint(line_count, word_file)

Note: subprocess.run() is new in Python 3.5, so this code won't run on earlier versions.

Do I need to create my own line buffering logic, or is there a way to get Python to do that for me?


Solution

proc.stdout is already a string in your case; run print(type(proc.stdout)) to make sure. It contains all of the subprocess's output: subprocess.run() does not return until the child process has exited.
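As a quick check of that claim, here is a minimal sketch that runs the same child twice, once with universal_newlines=True and once without, and compares the type of stdout. The child command (a Python one-liner launched via sys.executable) is an assumption chosen so the example runs without cat or input.txt.

```python
import subprocess
import sys

# Hypothetical child process: a Python one-liner that prints two lines.
cmd = [sys.executable, "-c", "print('first'); print('second')"]

# Text mode: stdout is decoded and returned as a str.
as_text = subprocess.run(cmd, stdout=subprocess.PIPE, universal_newlines=True)

# Default (binary) mode: stdout is returned as bytes.
as_bytes = subprocess.run(cmd, stdout=subprocess.PIPE)

print(type(as_text.stdout))
print(type(as_bytes.stdout))
```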

for text_line in proc.stdout: is incorrect: iterating over a string (for char in text_string) yields characters (Unicode code points) in Python, not lines. To get lines, call:

lines = proc.stdout.splitlines()
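To make the difference concrete, the sketch below iterates over the captured string directly and then splits it into lines. The child command is again a hypothetical Python one-liner standing in for cat input.txt.

```python
import subprocess
import sys

# Hypothetical child process that prints two short lines.
cmd = [sys.executable, "-c", "print('ab'); print('cd')"]
proc = subprocess.run(cmd, stdout=subprocess.PIPE, universal_newlines=True)

# Iterating over a str yields one character at a time, newlines included.
chars = list(proc.stdout)

# splitlines() yields whole lines with the line terminators removed.
lines = proc.stdout.splitlines()

print(chars)
print(lines)
```

This is why the loop in the question counts characters rather than lines.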

The result may differ from .split('\n') if there are Unicode newlines in the string.
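A small illustration of that difference, using a made-up sample string that mixes "\n", "\r\n", and the Unicode line separator U+2028:

```python
# Hypothetical sample text with three different kinds of line boundary.
text = "one\ntwo\r\nthree\u2028four\n"

# splitlines() recognizes all of them and drops the trailing empty piece.
by_splitlines = text.splitlines()

# Splitting on "\n" only sees "\n": it leaves "\r" and U+2028 in place
# and produces an empty string after the final newline.
by_split = text.split("\n")

print(by_splitlines)
print(by_split)
```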

If you want to read the output line by line (to avoid running out of memory for long-running processes):

from subprocess import Popen, PIPE

with Popen(command, stdout=PIPE, universal_newlines=True) as process:
    for line in process.stdout:
        do_something_with(line)

Note: process.stdout is a file-like object in this case. Popen() does not wait for the process to finish; it returns immediately, as soon as the child process is started. Here process is a subprocess.Popen instance, not a CompletedProcess.
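For a version that can be run as-is, here is a sketch of the same pattern with the placeholders filled in. The command (a Python one-liner) and the seen list are assumptions standing in for the real command and for do_something_with().

```python
import sys
from subprocess import Popen, PIPE

# Hypothetical child process that prints three lines.
command = [sys.executable, "-c", "for i in range(3): print('line', i)"]

seen = []
with Popen(command, stdout=PIPE, universal_newlines=True) as process:
    # process.stdout is a text-mode file object; iterating it yields lines
    # as they become available, each still ending in "\n".
    for line in process.stdout:
        seen.append(line.rstrip("\n"))

# Leaving the with block closes the pipe and waits for the child to exit.
print(seen, process.returncode)
```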

If all you need is to count the number of lines (terminated by b'\n') in the output, like wc -l:

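from functools import partial
from subprocess import Popen, PIPE

with Popen(command, stdout=PIPE) as process:
    # Read raw bytes in 8 KiB chunks until EOF (an empty bytes object).
    read_chunk = partial(process.stdout.read, 1 << 13)
    line_count = sum(chunk.count(b'\n') for chunk in iter(read_chunk, b''))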
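The chunked count can also be wrapped in a small helper. This is only a sketch: count_lines and its chunk_size parameter are hypothetical names, and the child command is a Python one-liner chosen so the example is self-contained.

```python
import sys
from functools import partial
from subprocess import Popen, PIPE


def count_lines(command, chunk_size=1 << 13):
    # Count newline bytes in the child's stdout, one chunk at a time,
    # so the full output is never held in memory.
    with Popen(command, stdout=PIPE) as process:
        read_chunk = partial(process.stdout.read, chunk_size)
        # iter(callable, sentinel) keeps calling read_chunk until it returns b"" (EOF).
        return sum(chunk.count(b"\n") for chunk in iter(read_chunk, b""))


# Hypothetical child process that prints 1000 lines.
command = [sys.executable, "-c", "for i in range(1000): print(i)"]
line_count = count_lines(command)
print(line_count)
```

Because only b"\n" is counted, a final line without a trailing newline is not included, which matches the behavior of wc -l.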

See also: Why is reading lines from stdin much slower in C++ than Python?
