Is the at-sign (@) a valid HTML/XML tag character?

2022-01-18 00:00:00 xml character tags html specifications

I'm doing some HTML stripping using regular expressions (yes, I know, never parse HTML with regexes, but I'm just stripping it, and I also unfortunately cannot use any external libraries). I'm using a regex from the Regular Expressions Cookbook, and it has worked great, except I just ran into this problem:

In the string Bob Saget <bobs@aol.com>, my regex is matching the email as a tag.

So my question is, is the @ sign a valid XML or HTML tag character? (I'm not asking whether or not it is valid within an attribute; I know that it is) If it is not, I will be able to successfully exclude it in my regex.

I'm not sure where to look this up. I looked here and I think that says that in XML, the at-sign is not allowed in a tag; however, I would appreciate some concrete proof.

Recommended answer

Looking at the XML specification:

A tag consists of:

'<' Name (S Attribute)* S? '>'

A Name consists of:

NameStartChar (NameChar)*

A NameStartChar consists of:

":" | [A-Z] | "_" | [a-z] | [#xC0-#xD6] | [#xD8-#xF6] | [#xF8-#x2FF] | [#x370-#x37D] | [#x37F-#x1FFF] | [#x200C-#x200D] | [#x2070-#x218F] | [#x2C00-#x2FEF] | [#x3001-#xD7FF] | [#xF900-#xFDCF] | [#xFDF0-#xFFFD] | [#x10000-#xEFFFF]

A NameChar consists of:

NameStartChar | "-" | "." | [0-9] | #xB7 | [#x0300-#x036F] | [#x203F-#x2040]

The @ sign is U+0040, which falls in none of the ranges above.

So the @ sign is not valid in a NameChar or a NameStartChar, and thus not valid in a Name.
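To make the check concrete, here is a minimal Python sketch (Python 3.7+ assumed) that encodes the two productions quoted above as code point ranges and then builds a tag-matching regex from them. The helper names and `TAG_RE` are hypothetical, and the pattern is a deliberately simplified illustration rather than the Cookbook regex: it follows the XML Name rules only and does not handle a `>` inside an attribute value.

```python
import re

# Code point ranges copied from the NameStartChar production quoted above.
NAME_START_RANGES = [
    (0x3A, 0x3A),  # ":"
    (0x41, 0x5A),  # [A-Z]
    (0x5F, 0x5F),  # "_"
    (0x61, 0x7A),  # [a-z]
    (0xC0, 0xD6), (0xD8, 0xF6), (0xF8, 0x2FF),
    (0x370, 0x37D), (0x37F, 0x1FFF), (0x200C, 0x200D),
    (0x2070, 0x218F), (0x2C00, 0x2FEF), (0x3001, 0xD7FF),
    (0xF900, 0xFDCF), (0xFDF0, 0xFFFD), (0x10000, 0xEFFFF),
]

# Extra code points that NameChar allows on top of NameStartChar.
NAME_EXTRA_RANGES = [
    (0x2D, 0x2D),  # "-"
    (0x2E, 0x2E),  # "."
    (0x30, 0x39),  # [0-9]
    (0xB7, 0xB7),
    (0x300, 0x36F),
    (0x203F, 0x2040),
]


def _in_ranges(ch, ranges):
    """Return True if the single character ch falls in any (lo, hi) range."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in ranges)


def is_name_start_char(ch):
    return _in_ranges(ch, NAME_START_RANGES)


def is_name_char(ch):
    return is_name_start_char(ch) or _in_ranges(ch, NAME_EXTRA_RANGES)


def _char_class(ranges):
    """Render code point ranges as the inside of a regex character class."""
    parts = []
    for lo, hi in ranges:
        if lo == hi:
            parts.append(re.escape(chr(lo)))
        else:
            parts.append(f"{re.escape(chr(lo))}-{re.escape(chr(hi))}")
    return "".join(parts)


_START = _char_class(NAME_START_RANGES)
_REST = _char_class(NAME_START_RANGES + NAME_EXTRA_RANGES)

# Simplified tag pattern (an assumption, not the Cookbook regex):
# "<" or "</", a valid Name, then optionally whitespace plus anything
# up to the closing ">", with an optional "/" for self-closing tags.
TAG_RE = re.compile(rf"</?[{_START}][{_REST}]*(?:\s[^<>]*)?/?>")

if __name__ == "__main__":
    print(is_name_char("@"))
    print(TAG_RE.sub("", "Bob Saget <bobs@aol.com>"))
    print(TAG_RE.sub("", '<a href="mailto:x@y.z">mail</a>'))
```

Because the name must be followed by whitespace, `/`, or `>`, the `@` in `<bobs@aol.com>` stops the match and the email address is left in place, while an `@` inside an attribute value is still allowed.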
