How do I load XML in PHP when the document does not indicate its encoding?

I'm trying to load an XML source from a remote location, so I have no control over the formatting. Unfortunately, the XML file I'm trying to load does not declare an encoding:

<ROOT xmlns:sql="urn:schemas-microsoft-com:xml-sql"> <NODE> </NODE> </ROOT>

When I try the following:

$doc = new DOMDocument( );
$doc->load(URI);

I get:

Input is not proper UTF-8, indicate encoding ! Bytes: 0xA3 0x38 0x2C 0x38

I've looked at ways to suppress this, but with no luck. How should I load this so that I can use it with DOMDocument?

Recommended answer

You have to convert the document to UTF-8. The easiest way is to use utf8_encode(), which treats its input as ISO-8859-1.

DOMDocument example:

$doc = new DOMDocument();
$content = utf8_encode(file_get_contents($url));
$doc->loadXML($content);

SimpleXML example:

$xmlInput = simplexml_load_string(utf8_encode(file_get_contents($url_or_file)));
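
Note that utf8_encode() is deprecated as of PHP 8.2. The same conversion can be sketched with mb_convert_encoding(), assuming the mbstring extension is available and the feed really is ISO-8859-1. The sample bytes below are made up to mimic the 0xA3 byte from the error message; they are not the actual remote document.

```php
<?php
// Hypothetical sample: ISO-8859-1 bytes with no XML declaration.
// "\xA3" is the pound sign in ISO-8859-1 and is invalid as a lone byte in UTF-8.
$raw = "<ROOT><NODE>\xA38,80</NODE></ROOT>";

// Convert from the assumed source encoding to UTF-8 before parsing.
$utf8 = mb_convert_encoding($raw, 'UTF-8', 'ISO-8859-1');

$doc = new DOMDocument();
$ok = $doc->loadXML($utf8);

// The text node now holds the UTF-8 form of the pound sign.
$text = $doc->getElementsByTagName('NODE')->item(0)->textContent;
```

Passing the source encoding explicitly makes the assumption visible in the code, which utf8_encode() hides.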


If you don't know the current encoding, use mb_detect_encoding(), for example:

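$content = file_get_contents($url_or_file);
// Detect on the raw bytes; the candidate list is an assumption, adjust it for your source.
$encoding = mb_detect_encoding($content, ['UTF-8', 'ISO-8859-1'], true);
$doc = new DOMDocument();
// Prepend a well-formed XML declaration so the parser knows the encoding.
$res = $doc->loadXML("<?xml version=\"1.0\" encoding=\"$encoding\"?>" . $content);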

Notes:

  • If the encoding cannot be detected (the function returns FALSE), you can try forcing the encoding with utf8_encode().
  • If you are loading HTML via $doc->loadHTML instead, you can still prepend the XML header.
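
The detection and fallback steps described in these notes could be combined roughly as follows. This is a sketch under assumptions, not a definitive implementation: loadXmlDetectingEncoding() is a hypothetical helper name, the candidate list is a guess you should adapt to your data, and the ISO-8859-1 fallback mirrors what utf8_encode() assumes.

```php
<?php
// Hypothetical helper: detect the encoding of raw XML bytes, normalise to UTF-8, then parse.
function loadXmlDetectingEncoding(string $raw, array $candidates = ['UTF-8', 'ISO-8859-1']): DOMDocument
{
    // Strict mode rejects any candidate in which the bytes are not valid.
    $encoding = mb_detect_encoding($raw, $candidates, true);
    if ($encoding === false) {
        // Assumed fallback when no candidate matches (same assumption as utf8_encode()).
        $encoding = 'ISO-8859-1';
    }

    // Convert to UTF-8 so the document needs no encoding declaration.
    $utf8 = mb_convert_encoding($raw, 'UTF-8', $encoding);

    $doc = new DOMDocument();
    if (!$doc->loadXML($utf8)) {
        throw new RuntimeException('XML could not be parsed');
    }
    return $doc;
}

// Usage with made-up ISO-8859-1 sample bytes:
$doc = loadXmlDetectingEncoding("<ROOT><NODE>\xA38,80</NODE></ROOT>");
```

Converting to UTF-8 up front, rather than prepending a declaration, avoids problems when the remote document already carries its own XML declaration.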

If you know the encoding, use iconv() to convert it:

$xml = iconv('ISO-8859-1', 'UTF-8', $xmlInput);
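
Since iconv() returns false when the conversion fails, it may be worth checking the result before handing it to the parser. A minimal sketch, assuming the source is ISO-8859-1 and using made-up sample bytes in place of the remote document:

```php
<?php
// Hypothetical input: ISO-8859-1 bytes standing in for the remote XML.
$xmlInput = "<ROOT><NODE>\xA38,80</NODE></ROOT>";

// iconv() returns false if the input cannot be converted.
$xml = iconv('ISO-8859-1', 'UTF-8', $xmlInput);
if ($xml === false) {
    throw new RuntimeException('Conversion to UTF-8 failed');
}

$doc = new DOMDocument();
$loaded = $doc->loadXML($xml);
```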
