What does LIBXML_NOENT do (and why isn't it called LIBXML_ENT)?

2022-01-10 00:00:00 xml xml-parsing libxml2 php

In PHP, one can pass optional arguments to various XML parsers, one of them being LIBXML_NOENT. The documentation has this to say about it:

LIBXML_NOENT (integer)
Substitute entities

Substitute entities isn't very informative (what entities? when are they substituted?). But I think it's fair to assume that NOENT is short for NO_ENTITIES or NO_EXTERNAL_ENTITIES, so to me it seems to be a fair assumption that this flag disables the parsing of (external) entities.

But that is not the case:

$xml = '<!DOCTYPE root [<!ENTITY c PUBLIC "bar" "/etc/passwd">]>
<test>&c;</test>';
$dom = new DOMDocument();
$dom->loadXML($xml, LIBXML_NOENT);
echo $dom->textContent;

The result is that the content of /etc/passwd is echoed. Without the LIBXML_NOENT argument this is not the case.

For non-external entities, the flag doesn't seem to have any effect. Example:

$xml = '<!DOCTYPE root [<!ENTITY c "TEST">]>
<test>&c;</test>';
$dom = new DOMDocument();
$dom->loadXML($xml);
echo $dom->textContent;

The result of this code is "TEST", with and without LIBXML_NOENT.

The flag doesn't seem to have any effect on pre-defined entities such as &lt;.

So my questions are:

  • What exactly does the LIBXML_NOENT flag do?
  • Why is it called LIBXML_NOENT? What is it short for, and wouldn't LIBXML_ENT or LIBXML_PARSE_EXTERNAL_ENTITIES be a better fit?
  • Is there a flag that actually prevents the parsing of all entities?

Recommended answer

Q: What exactly does the LIBXML_NOENT flag do?

The flag enables the substitution of XML entity references, whether the entities are internal or external.

Q: Why is it called LIBXML_NOENT? What is it short for, and wouldn't LIBXML_ENT or LIBXML_PARSE_EXTERNAL_ENTITIES be a better fit?

The name is indeed misleading. I think that NOENT simply means that the node tree of the parsed document won't contain any entity nodes, so the parser will substitute entities. Without NOENT, the parser creates DOMEntityReference nodes for entity references.
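
To make the difference in the node tree visible, here is a minimal sketch that parses the same document twice and inspects the first child of the root element. The variable names ($withoutFlag, $withFlag, $refNode, $textNode) are only illustrative, and the behavior described in the comments is what the explanation above leads me to expect, not something checked against every PHP or libxml2 version.

```php
<?php
// One internal entity "c", referenced once inside <test>.
$xml = '<!DOCTYPE test [<!ENTITY c "TEST">]>
<test>&c;</test>';

// Parse without the flag: the reference should stay in the tree as its own node.
$withoutFlag = new DOMDocument();
$withoutFlag->loadXML($xml);
$refNode = $withoutFlag->documentElement->firstChild;

// Parse with LIBXML_NOENT: the reference should be replaced by the entity's text.
$withFlag = new DOMDocument();
$withFlag->loadXML($xml, LIBXML_NOENT);
$textNode = $withFlag->documentElement->firstChild;

// Print the class of each node so the two trees can be compared.
echo get_class($refNode), "\n";
echo get_class($textNode), "\n";
```

If the explanation holds, the first line names the entity reference class and the second names a plain text node, which is the sense in which the substituted tree has "no entities".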

Q: Is there a flag that actually prevents the parsing of all entities?

LIBXML_NOENT enables the substitution of all entity references. If you don't want entities to be expanded, simply omit the flag. For example:

$xml = '<!DOCTYPE test [<!ENTITY c "TEST">]>
<test>&c;</test>';
$dom = new DOMDocument();
$dom->loadXML($xml);
echo $dom->saveXML();

prints

<?xml version="1.0"?>
<!DOCTYPE test [
<!ENTITY c "TEST">
]>
<test>&c;</test>

It seems that textContent replaces entities on its own which might be a peculiarity of the PHP bindings. Without LIBXML_NOENT, it leads to different behavior for internal and external entities because the latter won't be loaded.
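
The textContent observation can be checked side by side with the serialized markup. This is a rough sketch using only the internal entity from the question; $root, $text and $markup are illustrative names, and the comments describe the behavior reported above, not a guarantee.

```php
<?php
// Internal entity only, parsed WITHOUT LIBXML_NOENT.
$xml = '<!DOCTYPE test [<!ENTITY c "TEST">]>
<test>&c;</test>';

$dom = new DOMDocument();
$dom->loadXML($xml);
$root = $dom->documentElement;

// Reading textContent reportedly yields the entity's replacement text.
$text = $root->textContent;

// Serializing the element keeps the reference as it was parsed.
$markup = $dom->saveXML($root);

echo $text, "\n";
echo $markup, "\n";
```

So the tree itself still holds the reference, and the expansion only shows up when the text is read through textContent.') { throw new Exception('expected serialized markup to keep the entity reference'); }
</test>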
