Streaming data of unknown size from client to server over HTTP in Python

2022-01-09 00:00:00 python http upload

Question

Unfortunately, my previous question was closed as an "exact copy" of another question, which it definitely is not, so I am asking again.

It is not a duplicate of "Python: HTTP Post a large file with streaming".

That question deals with streaming one big file; I want to send arbitrary chunks of a file, one after another, over the same HTTP connection. Say I have a 20 MB file: I want to open an HTTP connection, send 1 MB, send another 1 MB, and so on until the file is complete. Everything goes over the same connection, so the server sees one 20 MB body arrive over that connection.

Mmapping a file is something I ALSO intend to do, but that does not work when the data is read from stdin. It is primarily for that second case that I am looking for this part-by-part feeding of data.

Honestly, I wonder whether it can be done at all. If it cannot, I would like to know, and the issue can then be closed. But if it can be done, how?


Answer

From the client's perspective, it's easy. You can use httplib's low-level interface (putrequest, putheader, endheaders, and send) to send whatever you want to the server in chunks of any size. In Python 3 the same module is named http.client.

But you also need to indicate where your file ends.

If you know the total size of the file in advance, you can simply include the Content-Length header, and the server will stop reading your request body after that many bytes. The code may then look like this.

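try:
    import http.client as httplib  # Python 3
except ImportError:
    import httplib  # Python 2
import os.path

path = '/path/to/file'
total_size = os.path.getsize(path)

conn = httplib.HTTPConnection('example.org')
conn.connect()
conn.putrequest('POST', '/upload/')
conn.putheader('Content-Type', 'application/octet-stream')
# The server reads exactly this many bytes as the request body.
conn.putheader('Content-Length', str(total_size))
conn.endheaders()

# Binary mode, so the bytes sent match the size reported above.
with open(path, 'rb') as infile:
    while True:
        chunk = infile.read(1024)
        if not chunk:
            break
        conn.send(chunk)

resp = conn.getresponse()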

If you don't know the total size in advance, the theoretical answer is the chunked transfer encoding. The problem is that, while it is widely used for responses, it seems less popular for requests, although it is just as well defined there. Stock HTTP servers may not be able to handle it out of the box. But if the server is under your control too, you could try manually parsing the chunks from the request body and reassembling them into the original file.
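To make the chunked option concrete, here is a minimal sketch of both sides, assuming Python 3. The helper names (encode_chunk, send_chunked, decode_chunked), the '/upload/' path, and the 1 MB default chunk size are made up for illustration. The framing itself is the standard one: each chunk is its size in hexadecimal, CRLF, the data, CRLF, and a zero-size chunk ends the body.

```python
import io

LAST_CHUNK = b"0\r\n\r\n"  # zero-size chunk with no trailers: end of body


def encode_chunk(data):
    """Frame one non-empty piece of data as an HTTP/1.1 chunk."""
    return b"%x\r\n" % len(data) + data + b"\r\n"


def send_chunked(conn, stream, url="/upload/", chunk_size=1024 * 1024):
    """Client side: POST a binary stream of unknown length.

    `conn` is an http.client.HTTPConnection; `stream` is any binary
    file-like object, e.g. sys.stdin.buffer.
    """
    conn.putrequest("POST", url)
    conn.putheader("Content-Type", "application/octet-stream")
    conn.putheader("Transfer-Encoding", "chunked")
    conn.endheaders()
    while True:
        data = stream.read(chunk_size)
        if not data:
            break  # never frame an empty chunk: it would end the body early
        conn.send(encode_chunk(data))
    conn.send(LAST_CHUNK)
    return conn.getresponse()


def decode_chunked(body):
    """Server side: reassemble the payload from a chunked request body.

    `body` is a binary file-like object positioned at the first chunk.
    Chunk extensions and trailers are skipped, not interpreted.
    """
    out = bytearray()
    while True:
        size_line = body.readline()
        size = int(size_line.split(b";", 1)[0].strip(), 16)
        if size == 0:
            # Consume optional trailer lines up to the final blank line.
            while body.readline() not in (b"\r\n", b"\n", b""):
                pass
            return bytes(out)
        out += body.read(size)
        body.readline()  # CRLF that terminates the chunk data
```

For the stdin case from the question, a call might look like send_chunked(conn, sys.stdin.buffer). As far as I know, http.client in Python 3.6 and later can also apply this encoding itself when request() is given a file-like or iterable body and neither Content-Length nor Transfer-Encoding is set, so the manual framing is mainly useful for seeing what goes over the wire.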

Another option is to send each chunk as a separate request (with Content-Length) over the same connection. But you still need to implement custom logic on the server. Moreover, you need to persist state between requests.
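A rough sketch of the client half of that approach, assuming Python 3, could look like the following. The X-Upload-Id, X-Upload-Offset, and X-Upload-Final headers are invented for this example, as are the function name and the '/upload/' path; they stand in for whatever state the custom server logic needs in order to stitch the parts back together.

```python
import io
import uuid


def upload_in_parts(conn, stream, url="/upload/", chunk_size=1024 * 1024):
    """POST each piece of `stream` as its own request on one connection.

    `conn` is an http.client.HTTPConnection. The X-Upload-* headers are
    hypothetical; the server must be written to understand them.
    Returns the total number of bytes sent.
    """
    upload_id = uuid.uuid4().hex  # lets the server group the parts
    offset = 0
    while True:
        chunk = stream.read(chunk_size)
        headers = {
            "Content-Type": "application/octet-stream",
            "X-Upload-Id": upload_id,
            "X-Upload-Offset": str(offset),
            # An empty final request tells the server the upload is done.
            "X-Upload-Final": "0" if chunk else "1",
        }
        # A bytes body lets http.client set Content-Length for each part.
        conn.request("POST", url, body=chunk, headers=headers)
        resp = conn.getresponse()
        resp.read()  # drain the response so the connection can be reused
        if resp.status >= 400:
            raise RuntimeError("upload failed with HTTP %d" % resp.status)
        if not chunk:
            return offset
        offset += len(chunk)
```

The total size is never needed up front: each part carries its own Content-Length, and the empty final request marks the end. The connection is only reused if the server keeps it alive between requests.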

Added 2012-12-27. There is an nginx module that converts chunked requests into regular ones. It may be helpful so long as you don't need true streaming (that is, starting to handle the request before the client is done sending it).
