In Python, how do I write a string to a file on a remote machine?

2022-01-19 python rsync file ssh network-programming

Problem Description

On Machine1, I have a Python2.7 script that computes a big (up to 10MB) binary string in RAM that I'd like to write to a disk file on Machine2, which is a remote machine. What is the best way to do this?

Constraints:

  • Both machines are Ubuntu 13.04. The connection between them is fast -- they are on the same network.

  • The destination directory might not yet exist on Machine2, so it might need to be created.

  • If it's easy, I would like to avoid writing the string from RAM to a temporary disk file on Machine1. Does that eliminate solutions that might use a system call to rsync?

  • Because the string is binary, it might contain bytes that could be interpreted as a newline. This would seem to rule out solutions that might use a system call to the echo command on Machine2.

  • I would like this to be as lightweight on Machine2 as possible. Thus, I would like to avoid running services like ftp on Machine2 or engaging in other configuration activities there. Plus, I don't understand security that well, and so would like to avoid opening additional ports unless truly necessary.

  • I have ssh keys set up on Machine1 and Machine2, and would like to use them for authentication.

  • Machine1 is running multiple threads, and so it is possible that more than one thread could attempt to write to the same file on Machine2 at overlapping times. I do not mind the inefficiency caused by having the file written twice (or more) in this case, but the resulting datafile on Machine2 should not be corrupted by simultaneous writes. Maybe an OS lock on Machine2 is needed?

  • I'm rooting for an rsync solution, since it is a self-contained entity that I understand reasonably well, and requires no configuration on Machine2.


Solution

You open a new SSH process to Machine2 using subprocess.Popen and then you write your data to its STDIN.

import subprocess

# Create the destination directory on Machine2 (if needed), then stream
# everything arriving on stdin into the target file.
cmd = ['ssh', 'user@machine2',
       'mkdir -p output/dir; cat - > output/dir/file.dat']

p = subprocess.Popen(cmd, stdin=subprocess.PIPE)

# 10 * 1024 * 1024 = 10485760 bytes; the embedded null byte demonstrates
# that arbitrary binary data survives the trip intact.
your_inmem_data = 'foobarbaz\x00' * 1024 * 1024

# Feed the data to ssh's stdin in chunks.
for chunk_ix in range(0, len(your_inmem_data), 1024):
    chunk = your_inmem_data[chunk_ix:chunk_ix + 1024]
    p.stdin.write(chunk)

# Close stdin so the remote cat sees EOF, then wait for ssh to exit.
p.stdin.close()
p.wait()

I've just verified that it works as advertised and copies all of the 10485760 dummy bytes.
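
As for the concurrent-writers constraint: the snippet above streams straight into the final path, so two overlapping transfers could interleave their writes. One remedy (a sketch of mine, not part of the original answer) is to have each connection write to its own temporary file and mv it over the final name once the transfer succeeds; within one filesystem, mv is a rename, which is atomic, so file.dat is never observed half-written. The $$ expands to the remote shell's PID, giving each writer a distinct temporary name.

import subprocess

# Hypothetical atomic-rename variant: stream into a unique temp file,
# then rename it over the final path only after cat succeeded ('&&').
cmd = ['ssh', 'user@machine2',
       'mkdir -p output/dir && '
       'cat - > output/dir/.file.dat.$$ && '
       'mv output/dir/.file.dat.$$ output/dir/file.dat']

p = subprocess.Popen(cmd, stdin=subprocess.PIPE)
p.stdin.write('foobarbaz\x00' * 1024 * 1024)  # same dummy payload as above
p.stdin.close()
p.wait()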

P.S. A potentially cleaner/more elegant solution would be to have the Python program write its output to sys.stdout instead and do the piping to ssh externally:

$ python process.py | ssh <the same ssh command>
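
In that setup, process.py only needs to emit the bytes on its standard output and leave the transport to the shell. A minimal sketch (the payload below is a stand-in for whatever the real script computes):

# process.py -- emit the binary payload on stdout; ssh does the transport.
import sys

data = 'foobarbaz\x00' * 1024 * 1024  # stand-in for the real computation
sys.stdout.write(data)
sys.stdout.flush()

With the same remote command as above, the full pipeline would be:

$ python process.py | ssh user@machine2 'mkdir -p output/dir; cat - > output/dir/file.dat'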
