HTTP file upload from the server side with PHP or Apache

2021-12-21 00:00:00 http file-upload node.js php mod-rewrite

When uploading a big file (>100 MB) to the server, PHP always accepts the entire POST body from the browser first. We cannot hook into the upload process.

For example, checking the value of "token" before the entire body has been sent to the server is IMPOSSIBLE in my PHP code:

<form enctype="multipart/form-data" action="upload.php?token=XXXXXX" method="POST">
    <input type="hidden" name="MAX_FILE_SIZE" value="3000000" />
    Send this file: <input name="userfile" type="file" />
    <input type="submit" value="Send File" />
</form>

So I tried to use mod_rewrite like this:

RewriteEngine On
RewriteMap mymap prg:/tmp/map.php
RewriteCond %{QUERY_STRING} ^token=(.*)$ [NC]
RewriteRule ^/upload/fake.php$ ${mymap:%1} [L]

map.php

#!/usr/bin/php
<?php
define("REAL_TARGET", "/upload/real.php\n");
define("FORBIDDEN", "/upload/forbidden.html\n");

$handle = fopen("php://stdin", "r");
while ($token = trim(fgets($handle))) {
    file_put_contents("/tmp/map.log", $token . "\n", FILE_APPEND);
    if (check_token($token)) {
        echo REAL_TARGET;
    } else {
        echo FORBIDDEN;
    }
}

function check_token($token) { // do your own security check
    return substr($token, 0, 4) === 'alix';
}

But... it fails again. mod_rewrite seems to act too late in this situation, and the data is still transferred in full.

Then I tried Node.js, like this (code snippet):

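// Snippet from the question, written against a very early Node.js API
// (res.sendHeader/sendBody/finish were later replaced by writeHead/write/end)
var stream = new multipart.Stream(req);
stream.addListener('part', function(part) {
    sys.print(req.uri.params.token + "\n");
    if (req.uri.params.token != "xxxx") { // check token
      res.sendHeader(200, {'Content-Type': 'text/plain'});
      res.sendBody('Incorrect token!');
      res.finish();
      sys.puts("\n=> Block");
      return false;
    }
    // ... rest of the snippet
});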

The result... it failed again.

So please help me find the correct way to solve this issue, or tell me there is no way.

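Related questions:

PHP (with Apache or Nginx): Check HTTP header before POST request finishes?

Can someone tell me how to make this script check the password before it starts the upload process instead of after the file is uploaded?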

Recommended answer

First of all, you can try this code yourself using the GitHub repo I created for this. Just clone the repository and run node header.

(Spoiler: if you're reading this under time pressure to get something working and you're not in the mood to learn ( :( ), there is a simpler solution at the end.)

This is a great question. What you are asking for is very possible, and no client-side code is needed. It just takes a deeper understanding of how the HTTP protocol works, and it shows how Node.js rocks :)

This becomes easy if we go one level deeper, to the underlying TCP protocol, and process the HTTP requests ourselves for this specific case. Node.js lets you do this easily with the built-in net module.

First, let's look at how HTTP requests work.

After the request line (for example GET /resource HTTP/1.1), an HTTP request consists of a header section made of key: value pairs separated by CRLF (\r\n). We know the header section has ended when we reach a double CRLF (that is, \r\n\r\n).
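To make the layout concrete, here is a minimal JavaScript sketch. It uses plain strings for readability, whereas a real server works on Buffers. serializeRequest and bodyOffset are hypothetical helper names, not Node.js APIs. The sample values come from the GET example that follows.

```javascript
// Lay out a request the way it travels over the wire: request line, header
// lines, an empty line, then the body, all separated by CRLF ("\r\n").
function serializeRequest(requestLine, headers, body) {
  const headerLines = Object.entries(headers).map(([name, value]) => `${name}: ${value}`);
  return [requestLine, ...headerLines].join("\r\n") + "\r\n\r\n" + body;
}

// Offset of the first body byte, or -1 while the header section is incomplete.
function bodyOffset(raw) {
  const end = raw.indexOf("\r\n\r\n");
  return end === -1 ? -1 : end + 4;
}

const raw = serializeRequest(
  "GET /resource HTTP/1.1",
  { "Cache-Control": "no-cache", "User-Agent": "Mozilla/5.0" },
  "Hello=World&stuff=other"
);
console.log(JSON.stringify(raw.slice(0, bodyOffset(raw))));
// "GET /resource HTTP/1.1\r\nCache-Control: no-cache\r\nUser-Agent: Mozilla/5.0\r\n\r\n"
console.log(raw.slice(bodyOffset(raw))); // Hello=World&stuff=other
```

Everything before bodyOffset is the part a server can inspect before it has to accept any of the body.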

A typical HTTP GET request might look something like this:

GET /resource HTTP/1.1  
Cache-Control: no-cache  
User-Agent: Mozilla/5.0 

Hello=World&stuff=other

The top part, before the empty line, is the header section, and the bottom part is the body of the request. Your request will look a bit different in the body section, since it is encoded as multipart/form-data, but the headers will remain similar. Let's explore how this applies to us.
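One detail matters for the question. The form posts to upload.php?token=XXXXXX, so the token travels in the query string of the request line, the first line of the header section, and not in a separate header. The sketch below shows which values are available before a single body byte arrives: the token, the declared Content-Length and the multipart boundary. inspectUploadHead is a hypothetical helper, and it assumes the header section has already been cut off at the blank line.

```javascript
// Hypothetical helper: headText is the request line plus header lines,
// joined by "\r\n", without the trailing empty line.
function inspectUploadHead(headText) {
  const [requestLine, ...headerLines] = headText.split("\r\n");
  const [method, target] = requestLine.split(" ");
  // The request target is a path; URL needs some base to parse it.
  const url = new URL(target, "http://placeholder.invalid");

  const headers = {};
  for (const line of headerLines) {
    const colon = line.indexOf(":");
    if (colon > 0) {
      headers[line.slice(0, colon).trim().toLowerCase()] = line.slice(colon + 1).trim();
    }
  }

  // multipart/form-data; boundary=... (the boundary may be quoted)
  const boundary = /boundary=("?)([^";]+)\1/i.exec(headers["content-type"] || "");
  return {
    method,
    token: url.searchParams.get("token"), // null if absent
    contentLength: Number(headers["content-length"]), // NaN if absent
    boundary: boundary ? boundary[2] : null,
  };
}

console.log(
  inspectUploadHead(
    "POST /upload.php?token=alix123 HTTP/1.1\r\n" +
      "Host: example.com\r\n" +
      "Content-Type: multipart/form-data; boundary=----XyZ\r\n" +
      "Content-Length: 104857600"
  )
);
```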

We can listen to the raw request over TCP and read the packets we get until we see the double CRLF we talked about. Then we check the short header section, which we already have, for whatever validation we need. After that, we can either end the request if validation did not pass (for example, by simply ending the TCP connection) or pass it through. This way we only need to read the headers, which are much smaller, and never have to receive the full request body.

One easy way to embed this into an existing application is to let this TCP server proxy requests to the actual HTTP server for this specific use case.
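Here is a minimal sketch of such a gate. It assumes the real HTTP server (Apache/PHP or anything else) listens on upstreamHost:upstreamPort, and that isAllowed is a check you write yourself. createGate, isAllowed and the 16 KB header cap are illustrative choices, not part of any library. The gate collects packets until \r\n\r\n shows up, since the terminator can be split across packets. It then either rejects the request with a 403 and closes the connection, or replays the bytes it has read to the real server and pipes the rest of the connection through.

```javascript
const net = require("net");

const TERMINATOR = Buffer.from("\r\n\r\n");
const MAX_HEAD_BYTES = 16 * 1024; // hypothetical cap on the header section
const FORBIDDEN =
  "HTTP/1.1 403 Forbidden\r\nContent-Length: 0\r\nConnection: close\r\n\r\n";

// isAllowed(headText) is your own check; it receives the request line and
// headers as one string. upstreamHost/upstreamPort point at the real server.
function createGate({ isAllowed, upstreamHost, upstreamPort }) {
  return net.createServer((socket) => {
    let buffered = Buffer.alloc(0);
    socket.on("error", () => socket.destroy()); // e.g. client resets

    function onData(chunk) {
      buffered = Buffer.concat([buffered, chunk]);
      const end = buffered.indexOf(TERMINATOR);
      if (end === -1) {
        if (buffered.length > MAX_HEAD_BYTES) socket.destroy();
        return; // header section not complete yet
      }
      socket.removeListener("data", onData); // headers are in: decide once
      const headText = buffered.subarray(0, end).toString("latin1");

      if (!isAllowed(headText)) {
        // Answer and close without reading the rest of the body.
        socket.write(FORBIDDEN, () => socket.destroy());
        return;
      }

      // Replay what we already read (headers plus any body bytes) to the real
      // server, then splice the two connections together in both directions.
      const upstream = net.connect(upstreamPort, upstreamHost);
      upstream.on("error", () => socket.destroy());
      socket.on("close", () => upstream.destroy());
      upstream.write(buffered);
      socket.pipe(upstream);
      upstream.pipe(socket);
    }

    socket.on("data", onData);
  });
}
```

For example, createGate({ isAllowed: (head) => /token=alix/.test(head.split("\r\n")[0]), upstreamHost: "127.0.0.1", upstreamPort: 8081 }).listen(8080) would sit in front of a server on port 8081. This sketch only inspects the first request on each connection, so later keep-alive requests on the same connection are piped through unchecked. A client that is still sending its body when the gate closes the connection may also report a connection reset instead of the 403.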

This solution is as bare-bones as it gets. It is just a suggestion.

Here is the workflow:

  1. We require the net module in Node.js, which allows us to create TCP servers.

  2. Create a TCP server using the net module that will listen for data: var tcpServer = net.createServer(function (socket) {... . Don't forget to tell it to listen on the correct port.

    • Inside that callback, listen for data events with socket.on("data", function (data) { ..., which fires whenever a chunk of data arrives.
    • Read the data from the buffer passed to the 'data' event and add it to what has been received so far, since the headers may span several packets.
    • Check for a double CRLF. According to the HTTP protocol, this means the request HEADER section has ended.
    • Assuming the validation is based on a header (the token, in your words), check it after parsing just the headers (that is, once we have the double CRLF). This also works for checking the Content-Length header.
    • If you notice that the headers don't check out, call socket.end(), which will close the connection.
    • A method for reading the headers:

      // Parses header lines (request line already removed) into an object
      // keyed by lower-cased header name.
      function readHeaders(headers) {
          var parsedHeaders = {};
          var previous = "";
          headers.forEach(function (val) {
              // check if the next line is actually continuing a header from the previous line
              if (isContinuation(val)) {
                  if (previous !== "") {
                      parsedHeaders[previous] = (parsedHeaders[previous] || "") + " " + val.trim();
                      return;
                  } else {
                      throw new Error("continuation, but no previous header");
                  }
              }
      
              // parse a header that looks like: "name: SP value"
              var index = val.indexOf(":");
      
              if (index === -1) {
                  throw new Error("bad header structure: " + val);
              }
      
              var head = val.slice(0, index).toLowerCase();
              var value = val.slice(index + 1).trim();
      
              previous = head;
              // header values are not URI-encoded, so they are stored as-is
              parsedHeaders[head] = value !== "" ? value : null;
          });
          return parsedHeaders;
      }
      

      A method for checking for a double CRLF in a buffer you get on a data event. If the sequence exists, it returns its location in an object:

      function checkForCRLF(data) {
          if (!Buffer.isBuffer(data)) {
              data = Buffer.from(data, "utf8");
          }
          for (var i = 0; i < data.length - 1; i++) {
              if (data[i] === 13) { // \r
                  if (data[i + 1] === 10) { // \n
                      // \r\n followed by \r\n: end of the header section
                      if (i + 3 < data.length && data[i + 2] === 13 && data[i + 3] === 10) {
                          return { loc: i, after: i + 4 };
                      }
                  }
              } else if (data[i] === 10) { // \n
                  // also tolerate bare \n\n line endings
                  if (data[i + 1] === 10) { // \n
                      return { loc: i, after: i + 2 };
                  }
              }
          }
          return { loc: -1, after: -1337 }; // not found (yet)
      }
      

      And this small utility method:

      function isContinuation(str) {
          return str.charAt(0) === " " || str.charAt(0) === "\t";
      }
      

      Implementation

      var net = require("net"); // Node's net module for a raw TCP server (Node has an equivalent tls module if you'd like to use HTTPS)
      
      // Create the server
      var server = net.createServer(function (socket) { // Create a TCP server
          var req = []; // buffers so far, in case the headers don't arrive in a single packet
          var headersDone = false;
          socket.on("data", function (data) {
              if (headersDone) {
                  return; // body bytes: hand them to whatever logic you pass control to
              }
              req.push(data); // add the new buffer
              var all = Buffer.concat(req); // search everything received so far, not just this packet
              var check = checkForCRLF(all);
              if (check.loc !== -1) { // This means we got to the end of the headers!
                  headersDone = true;
                  // get data up to and including the blank line (\r\n\r\n)
                  var dataUpToHeaders = all.toString("latin1", 0, check.after);
                  // split by line
                  var headerList = dataUpToHeaders.trim().split("\r\n");
                  // remove the request line itself, e.g. "POST /upload.php?token=XXXXXX HTTP/1.1";
                  // a token sent in the query string (as in the question) lives in this line
                  var requestLine = headerList.shift();
                  console.log("Got headers!");
                  // Read the headers
                  var headerObject = readHeaders(headerList);
                  // Get the header with your token (readHeaders lower-cases the names)
                  console.log(headerObject["your-header-name"]);
      
                  // Now perform all checks you need for it
                  /*
                  if (!yourHeaderValueValid) {
                      socket.end();
                  } else {
                      // continue reading the request body, and pass control to whatever logic you want!
                  }
                  */
              }
          });
      }).listen(8080); // listen on port 8080 for the sake of the example
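The same header check can turn away oversized uploads, which is the >100 MB case from the question. This is a minimal sketch that assumes the lower-cased header object that readHeaders returns. checkDeclaredSize and MAX_UPLOAD_BYTES are hypothetical names. In the TCP handler you would call socket.end() whenever the result is anything other than "ok". A client can also omit Content-Length and use Transfer-Encoding: chunked instead, in which case the size is not known until the body has been read. The sketch reports that case as unknown rather than guessing.

```javascript
// Hypothetical limit, roughly the "big file" size from the question.
const MAX_UPLOAD_BYTES = 100 * 1024 * 1024;

// headerObject: header names lower-cased, values as strings (or null),
// which is the shape readHeaders() returns.
function checkDeclaredSize(headerObject, limit) {
  const declared = headerObject["content-length"];
  if (declared == null || !/^\d+$/.test(declared)) {
    // No usable Content-Length (e.g. Transfer-Encoding: chunked): size unknown up front.
    return "unknown";
  }
  return Number(declared) > limit ? "too large" : "ok";
}

console.log(checkDeclaredSize({ "content-length": "3000000" }, MAX_UPLOAD_BYTES)); // ok
console.log(checkDeclaredSize({ "content-length": "209715200" }, MAX_UPLOAD_BYTES)); // too large
console.log(checkDeclaredSize({ "transfer-encoding": "chunked" }, MAX_UPLOAD_BYTES)); // unknown
```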
      

      If you have any questions, feel free to ask :)

      But what's the fun in that? If you skipped here initially, you wouldn't learn how HTTP works :)

      Node.js has a built-in http module. Since request bodies in Node.js arrive in chunks, especially for long requests, you can implement the same thing without a deeper understanding of the protocol.

      This time, let's use the http module to create an HTTP server:

      var http = require("http");
      
      var server = http.createServer(function (req, res) { // create an HTTP server
          // The parameters are request/response objects.
          // Check whether the method is POST and the headers contain your value.
          // The connection is established, but the body hasn't been read yet;
          // more information on how this works is in the solution above.
          // Node lower-cases incoming header names, so look up "yourheader", not "YourHeader".
          var specialRequest = req.method === "POST" && req.headers["yourheader"] === "YourTokenValue";
          if (specialRequest) { // detect requests for special treatment
              // same as the TCP solution: handle the body chunk by chunk
              req.on("data", function (chunkOfBody) {
                  // handle a chunk of the message body
              });
              req.on("end", function () {
                  res.end("OK"); // reply once the whole body has arrived
              });
          } else {
              // Reply right away; "Connection: close" makes Node close the socket
              // after the response instead of keeping it alive.
              res.writeHead(403, { "Connection": "close" });
              res.end();
              // req.destroy(); // destroys the socket immediately, without a clean response
          }
      }).listen(8080);
      

      This relies on the fact that, by default, the request handler in the Node.js http module is called as soon as the headers have been parsed, before anything else is done with the request (see the server module and the parser module in the Node.js source).
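A small experiment can confirm this ordering. The sketch below sends the headers of a POST over a raw socket, holds back the 5-byte body for 100 ms, and records what the http server sees. demoHeadersFirst is a hypothetical name. The request event is recorded before the body chunk, and that is exactly the point where a token check can run.

```javascript
const http = require("http");
const net = require("net");

// Records the order in which things happen for one POST whose body is sent late.
function demoHeadersFirst(done) {
  const events = [];
  const server = http.createServer((req, res) => {
    events.push("request event (headers parsed)");
    req.on("data", () => events.push("body chunk"));
    req.on("end", () => res.end("ok"));
  });
  server.listen(0, "127.0.0.1", () => {
    const client = net.connect(server.address().port, "127.0.0.1", () => {
      client.write(
        "POST /upload HTTP/1.1\r\nHost: x\r\nContent-Length: 5\r\nConnection: close\r\n\r\n"
      );
      // Hold the body back for a moment; the request event fires anyway.
      setTimeout(() => client.write("hello"), 100);
    });
    client.resume(); // discard the response bytes
    client.on("close", () => {
      server.close();
      done(events);
    });
  });
}

demoHeadersFirst((events) => console.log(events));
```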

      User igorw suggested a somewhat cleaner solution using the 100 Continue status (triggered by an Expect: 100-continue request header), assuming the browsers you're targeting support it. 100 Continue is a status code designed to do exactly what you're attempting:

      The purpose of the 100 (Continue) status (see section 10.1.1) is to allow a client that is sending a request message with a request body to determine if the origin server is willing to accept the request (based on the request headers) before the client sends the request body. In some cases, it might either be inappropriate or highly inefficient for the client to send the body if the server will reject the message without looking at the body. (RFC 2616, section 8.2.3)

      Here it is:

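      var http = require("http");
      
      // Pipe the accepted body to stdout (a stand-in for real upload handling).
      function handle(req, rep) {
          req.pipe(process.stdout);
          req.on("end", function () {
              rep.end();
              console.log("");
          });
      }
      
      // Same header check for both paths; "x-foo" stands in for your token header.
      function accepted(req) {
          return Boolean(req.headers["x-foo"]);
      }
      
      var server = new http.Server();
      
      // Emitted instead of 'request' when the client sent "Expect: 100-continue":
      // the body has not been sent yet, so rejecting here saves the upload.
      server.on("checkContinue", function (req, rep) {
          if (!accepted(req)) {
              console.log("did not have foo");
              rep.writeHead(400);
              rep.end();
              return;
          }
          rep.writeContinue(); // tells the client to start sending the body
          handle(req, rep);
      });
      
      // Clients that don't send Expect: 100-continue land here (their body may
      // already be on the way); without this listener they would never get a reply.
      server.on("request", function (req, rep) {
          if (!accepted(req)) {
              rep.writeHead(400, { "Connection": "close" });
              rep.end();
              return;
          }
          handle(req, rep);
      });
      
      server.listen(8080);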
      

      You can see sample input/output here. This requires the client to send the request with the appropriate header, Expect: 100-continue.
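On the client side, Node's http.ClientRequest emits a 'continue' event when the server answers 100 Continue. Below is a minimal sketch that assumes a Node.js client rather than a browser; uploadWithExpect is a hypothetical helper. It sends the headers with Expect: 100-continue and writes the body only after 'continue'. If the final response (for example 400 or 403) arrives first, it gives up without sending the body.

```javascript
const http = require("http");

// Hypothetical helper: announce a POST with "Expect: 100-continue" and send
// the body only if the server answers 100 Continue.
function uploadWithExpect({ host, port, path, headers, body }, callback) {
  let done = false;
  const finish = (err, result) => {
    if (!done) {
      done = true;
      callback(err, result);
    }
  };
  let sentBody = false;
  const req = http.request({
    host,
    port,
    path,
    method: "POST",
    agent: false, // one connection per upload
    headers: {
      ...headers,
      "Content-Length": Buffer.byteLength(body),
      Expect: "100-continue", // Node sends the headers right away when this is set
    },
  });
  req.on("continue", () => {
    sentBody = true;
    req.end(body); // the server agreed, so send the body now
  });
  req.on("response", (res) => {
    res.resume(); // discard the response body
    res.on("end", () => {
      if (!sentBody) req.destroy(); // rejected: the body was never sent
      finish(null, { status: res.statusCode, sentBody });
    });
  });
  req.on("error", (err) => finish(err));
}
```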
