file_get_contents - Special characters in URL - Special case
I can't get file_get_contents() to return the page in this particular case, where the URL contains an 'Ö' character.
$url = "https://se.timeedit.net/web/liu/db1/schema/s/s.html?tab=3&object=CM_949A11_1534_1603_DAG_DST_50_ÖVRIGT_1_1&type=subgroup&startdate=20150101&enddate=20300501";
print file_get_contents($url);
How do I make file_get_contents() work as expected on this URL?
I have tried the following solutions, none of which worked:
1.
print rawurlencode(utf8_encode($url));
2.
print mb_convert_encoding($url, 'HTML-ENTITIES', "UTF-8");
3.
$url = urlencode($url);
print file_get_contents($url);
4.
$content = file_get_contents($url);
print mb_convert_encoding($content, 'UTF-8', mb_detect_encoding($content, 'UTF-8, ISO-8859-1', true));
These were found in the following questions:
file_get_contents - Special characters in URL
PHP get url with special characters without urlencode:ing!
file_get_contents() Breaks Up UTF-8 Characters
UPDATE: As you can see, a page is actually returned in my example, but it is not the expected page, i.e. the one you get when you type the URL into the browser.
Recommended answer
URLs cannot contain "Ö"! Start from this basic premise. Any character outside a narrowly defined subset of ASCII must be URL-encoded to be represented within a URL. The right way to do that is to urlencode or rawurlencode (depending on which format the server expects) the individual segment of the URL, not the URL as a whole.
For example:
$url = sprintf('https://se.timeedit.net/web/liu/db1/schema/s/s.html?tab=3&object=%s&type=subgroup&startdate=20150101&enddate=20300501',
rawurlencode('CM_949A11_1534_1603_DAG_DST_50_ÖVRIGT_1_1'));
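The same idea can be sketched with http_build_query(), which percent-encodes each query value separately so the URL as a whole is never encoded. The helper name buildScheduleUrl is hypothetical, and the sketch assumes the server expects UTF-8; the Ö is written as explicit bytes so the result does not depend on how the source file is saved.

```php
<?php
// Hypothetical helper (not part of the original answer): builds the schedule
// URL and lets http_build_query() percent-encode each query value on its own.
function buildScheduleUrl(string $object): string
{
    $base = 'https://se.timeedit.net/web/liu/db1/schema/s/s.html';
    $params = [
        'tab'       => '3',
        'object'    => $object,
        'type'      => 'subgroup',
        'startdate' => '20150101',
        'enddate'   => '20300501',
    ];
    // PHP_QUERY_RFC3986 encodes values the way rawurlencode() does.
    // '&' is passed explicitly so the arg_separator.output ini setting is ignored.
    return $base . '?' . http_build_query($params, '', '&', PHP_QUERY_RFC3986);
}

// "\xC3\x96" is Ö as UTF-8 bytes (assumption: the server expects UTF-8).
$url = buildScheduleUrl("CM_949A11_1534_1603_DAG_DST_50_\xC3\x96VRIGT_1_1");
echo $url, "\n";
// The encoded URL can then be fetched with: $html = file_get_contents($url);
```

Only the value of object changes here; the separators (?, &, =) stay literal, which is exactly what encoding the whole URL with urlencode() gets wrong.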
You will still need to use the correct encoding for the string! Ö in ISO-8859-1 would be URL-encoded to %D6, while in UTF-8 it would be encoded to %C3%96. Which one is correct depends on what the server expects.
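To see the two encodings side by side, here is a minimal sketch. It assumes the mbstring extension is available, and encodeSegment is a hypothetical helper name. The input is given as explicit UTF-8 bytes so the result does not depend on the encoding of the source file.

```php
<?php
// Hypothetical helper: converts a UTF-8 string to the charset the server
// expects, then percent-encodes it as a single URL segment.
function encodeSegment(string $utf8, string $targetCharset): string
{
    $bytes = mb_convert_encoding($utf8, $targetCharset, 'UTF-8');
    return rawurlencode($bytes);
}

$o = "\xC3\x96"; // Ö as UTF-8 bytes

echo encodeSegment($o, 'UTF-8'), "\n";      // %C3%96
echo encodeSegment($o, 'ISO-8859-1'), "\n"; // %D6
```

If the page returned is still not the one the browser shows, trying the other charset for the object value is a reasonable next step, since the server decides which byte sequence it recognizes.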