What is the best way to handle: large downloads via PHP + slow client connections = script timeout before the file is fully downloaded?

2022-01-24 | tags: connection, download, timeout, php, tracking

My client wanted a way to offer downloads to users, but only after they fill out a registration form (basically name and email). An email is sent to the user with the links for the downloadable content. The links contain a registration hash unique to the package, file, and user, and they actually go to a PHP page that logs each download and pushes the file out by writing it to stdout (along with the appropriate headers). This solution has inherent flaws, but this is how they wanted to do it.

It needs to be said that I pushed them hard to 1.) limit the sizes of the downloadable files and 2.) think about using a CDN (they have international customers but are hosted in the US on 2 mirrored servers and a load balancer that uses sticky IPs).

Anyway, it "works for me", but some of their international customers are on really slow connections (d/l rates of ~60 kB/sec) and some of these files are pretty big (150 MB). Since a PHP script is serving these files, it is bound by the script timeout setting. At first I set this to 300 seconds (5 minutes), but that was not enough time for some of the beta users. So then I tried calculating the script timeout based on the size of the file divided by a 100 kB/sec connection rate, but some of these users are even slower than that.
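For concreteness, the serving script presumably looks something like the following sketch, where `lookup_file_for_hash()` and `log_download()` are hypothetical stand-ins for the real validation and logging code:

```php
<?php
// Gatekeeper script: validate the registration hash, log the download,
// then stream the file to the client with download headers.
set_time_limit(300); // the script-timeout setting at issue

$hash = isset($_GET['h']) ? $_GET['h'] : '';
$file = lookup_file_for_hash($hash); // hypothetical: local path, or null if invalid
if ($file === null || !is_file($file)) {
    header('HTTP/1.0 404 Not Found');
    exit;
}

log_download($hash); // hypothetical: record this download attempt

header('Content-Type: application/octet-stream');
header('Content-Length: ' . filesize($file));
header('Content-Disposition: attachment; filename="' . basename($file) . '"');

// Stream in chunks so a 150 MB file is never held in memory. The script
// keeps running (and counting against the timeout) for the entire
// transfer, which is exactly the problem on a 60 kB/sec connection.
$fp = fopen($file, 'rb');
while (!feof($fp)) {
    echo fread($fp, 8192);
    flush();
}
fclose($fp);
```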

Now the client just wants to raise the timeout value. I don't want to remove the timeout altogether in case the script somehow gets into an infinite loop. I also don't want to keep pushing out the timeout arbitrarily to cover some catch-all, lowest-common-denominator connection rate (most people are downloading much faster than 100 kB/sec). And I also want to be able to tell the client at some point, "Look, these files are too big to process this way. You are affecting the performance of the rest of the website with these 40-plus-minute connections. We either need to rethink how they are delivered or use much smaller files."

I have a couple of solutions in mind, which are as follows:

  1. CDN - move the files to a CDN service such as Amazon's or Google's. We can still log the download attempts via the PHP file, but then redirect the browser to the real file. One drawback with this is that a user could bypass the script and download directly from the CDN once they have the URL (which could be gleaned by watching the HTTP headers). This isn't bad, but it's not desired.
  2. Expand the server farm - Expand the server farm from 2 to 4+ servers and remove the sticky-IP rule from the load balancer. Downside: these are Windows servers, so they are expensive. There is no reason they couldn't be Linux boxes, but setting up all new boxes could take more time than the client would allow.
  3. Set up 2 new servers strictly for serving these downloads - Basically the same benefits and drawbacks as #2, except that we could at least isolate the rest of the website from (and fine-tune the new servers to) this particular process. We could also pretty easily make these Linux boxes.
  4. Detect the user's connection rate - I had in mind a way to detect the current speed of the user by using AJAX on the download landing page to time how long it takes to download a static file of a known size, then sending that info to the server and calculating the timeout based on it (see the sketch after this list). It's not ideal, but it's better than estimating the connection speed too high or too low. I'm not sure how I would get the speed info back to the server, though, since we currently use a redirect header that is sent from the server.
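For #4, a session variable would sidestep the redirect-header issue, since the download script can read the measurement directly rather than it having to ride along on the redirect. A rough server-side sketch, assuming a hypothetical speed-report endpoint, session key, and clamping bounds (all names and numbers are illustrative):

```php
<?php
// speed-report.php (hypothetical endpoint): the landing page times an
// AJAX fetch of a static file of known size, computes bytes/sec, and
// POSTs the result here before the user follows the download link.
session_start();

$reported = isset($_POST['bytes_per_sec']) ? (int) $_POST['bytes_per_sec'] : 0;

// Clamp to a sane range so a bogus or failed measurement can't disable
// the timeout entirely or make it absurdly short.
$_SESSION['dl_rate'] = max(10 * 1024, min(10 * 1024 * 1024, $reported));

// --- Later, in the download script ---
// Size the timeout to this user's measured rate instead of a guess
// ($file is the local path, as in the serving script above):
$rate = isset($_SESSION['dl_rate']) ? $_SESSION['dl_rate'] : 100 * 1024;
$headroom = 1.5; // 50% margin for rate variance during the transfer
set_time_limit((int) ceil(filesize($file) / $rate * $headroom));
```

The clamping matters: without it, a spoofed or failed measurement could effectively remove the timeout, defeating the infinite-loop safeguard.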

Chances are #1-3 will be declined or at least pushed off. So is #4 a good way to go about this, or is there something else I haven't considered?

(Feel free to challenge the original solution.)

Accepted answer

Use X-SENDFILE. Most web servers support it either natively or through a plugin (Apache).

Using this header, you can simply specify a local file path and exit the PHP script. The web server sees the header and serves that file instead.
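A minimal sketch of what the serving script reduces to with Apache's mod_xsendfile enabled (the path, filename, and logging call are illustrative; lighttpd supports X-Sendfile natively, and nginx's equivalent mechanism is X-Accel-Redirect):

```php
<?php
// PHP now only authenticates and logs; the web server performs the long
// transfer, so the PHP script timeout no longer covers the download itself.
log_download($hash); // hypothetical logging call, as before

header('Content-Type: application/octet-stream');
header('Content-Disposition: attachment; filename="package.zip"');
header('X-Sendfile: /var/files/package.zip'); // local filesystem path, not a URL
exit;

// nginx note: send X-Accel-Redirect pointing at a location marked
// "internal" instead of X-Sendfile.
```

This keeps the per-download logging and hash check while removing the timeout problem entirely: the script exits in milliseconds regardless of the client's connection speed.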
