PHP/Curl:HEAD Request 在某些网站上需要很长时间

2022-01-11 00:00:00 http-headers performance header curl php

我有一个简单的代码,它对 URL 进行头部请求,然后打印响应头.我注意到在某些网站上,这可能需要很长时间才能完成.

I have simple code that does a head request for a URL and then prints the response headers. I've noticed that on some sites, this can take a long time to complete.

例如,请求 http://www.arstechnica.com 大约需要两分钟.我使用另一个执行相同基本任务的网站尝试了相同的请求,它立即返回.所以一定是我设置不正确导致了这个延迟.

For example, requesting http://www.arstechnica.com takes about two minutes. I've tried the same request using another web site that does the same basic task, and it comes back immediately. So there must be something I have set incorrectly that's causing this delay.

这是我的代码:

$ch = curl_init();
curl_setopt ($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt ($ch, CURLOPT_URL, $url);
curl_setopt ($ch, CURLOPT_CONNECTTIMEOUT, 20);
curl_setopt ($ch, CURLOPT_USERAGENT, $_SERVER['HTTP_USER_AGENT']);

// Only calling the head
curl_setopt($ch, CURLOPT_HEADER, true); // header will be at output
curl_setopt($ch, CURLOPT_CUSTOMREQUEST, 'HEAD'); // HTTP request is 'HEAD'

$content = curl_exec ($ch);
curl_close ($ch);

这里有一个指向具有相同功能的网站的链接:http://www.seoconsultants.com/tools/headers.asp

Here's a link to the web site that does the same function: http://www.seoconsultants.com/tools/headers.asp

上面的代码,至少在我的服务器上,需要两分钟才能检索 www.arstechnica.com,但上面链接中的服务会立即返回它.

The code above, at least on my server, takes two minutes to retrieve www.arstechnica.com, but the service at the link above returns it right away.

我错过了什么?

推荐答案

试着简化一下:

print htmlentities(file_get_contents("http://www.arstechnica.com"));

上面的输出立即在我的网络服务器上.如果您没有,那么您的网络托管服务商很可能有某种设置来限制此类请求.

The above outputs instantly on my webserver. If it doesn't on yours, there's a good chance your web host has some kind of setting in place to throttle these kind of requests.

编辑:

由于上述情况会立即发生,请尝试设置 此 curl 设置 在您的原始代码上:

Since the above happens instantly for you, try setting this curl setting on your original code:

curl_setopt ($ch, CURLOPT_FOLLOWLOCATION, true);

使用您发布的工具,我注意到 http://www.arstechnica.com 为发送给它的任何请求发送了 301 标头.cURL 可能会得到这个并且没有遵循为其指定的新位置,从而导致您的脚本挂起.

Using the tool you posted, I noticed that http://www.arstechnica.com has a 301 header sent for any request sent to it. It is possible that cURL is getting this and not following the new Location specified to it, thus causing your script to hang.

第二次编辑:

奇怪的是,尝试与上面相同的代码也导致我的网络服务器挂起.我替换了这段代码:

Curiously enough, trying the same code you have above was making my webserver hang too. I replaced this code:

curl_setopt($ch, CURLOPT_CUSTOMREQUEST, 'HEAD'); // HTTP request is 'HEAD'

有了这个:

curl_setopt($ch, CURLOPT_NOBODY, true);

手册建议您采用哪种方式头请求.它使它立即工作.

Which is the way the manual recommends you do a HEAD request. It made it work instantly.

相关文章