Parsing XML with namespaces using ElementTree in Python

2022-01-10 00:00:00 python python-2.7 elementtree xml xml-parsing

Problem description

I have an XML document; a small part of it looks like this:

    <?xml version="1.0" ?>
    <i:insert xmlns:i="urn:com:xml:insert" xmlns="urn:com:xml:data">
        <data>
            <image imageId="1"></image>
            <content>Content</content>
        </data>
    </i:insert>

When I parse it using ElementTree and save it to a file, I see the following:

    <ns0:insert xmlns:ns0="urn:com:xml:insert" xmlns:ns1="urn:com:xml:data">
        <ns1:data>
            <ns1:image imageId="1"></ns1:image>
            <ns1:content>Content</ns1:content>
        </ns1:data>
    </ns0:insert>

Why does it change the prefixes and put them everywhere? With minidom I don't have this problem. Is this configurable? The documentation for ElementTree is very poor. The problem is that I can't find any node after such parsing. For example, I can't find image, whether I search for {namespace}image or just image. Why is that? Any suggestions are much appreciated.

What I have already tried:

    import xml.etree.ElementTree as ET

    tree = ET.parse('test.xml')
    root = tree.getroot()
    for a in root.findall('ns1:image'):
        print(a.attrib)

This returns an error, and the other one returns nothing:

    for a in root.findall('{urn:com:xml:data}image'):
        print(a.attrib)

I also tried to define a namespace mapping like this and use it:

    namespaces = {'ns1': 'urn:com:xml:data'}
    for a in root.findall('ns1:image', namespaces):
        print(a.attrib)

It returns nothing. What am I doing wrong?

Solution

This snippet from your question,

    for a in root.findall('{urn:com:xml:data}image'):
        print(a.attrib)

does not output anything because it only looks for direct {urn:com:xml:data}image children of the root of the tree. In your document, image is a child of data, not of the root.

This slightly modified code,

    for a in root.findall('.//{urn:com:xml:data}image'):
        print(a.attrib)

will print {'imageId': '1'} because it uses .//, which selects matching subelements on all levels.

Reference: https://docs.python.org/2/library/xml.etree.elementtree.html#supported-xpath-syntax

It is a bit annoying that ElementTree does not simply retain the original namespace prefixes by default, but keep in mind that the prefixes are not what matters anyway; elements are identified by their namespace URI. The register_namespace() function can be used to set the desired prefix when serializing the XML. The function has no effect on parsing or searching.
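Both points in the answer can be tried together in a short, self-contained sketch. It assumes Python 3 and embeds the sample document from the question as a string instead of reading test.xml. The prefix d and the variable names are illustrative choices, not part of the original post. Registering an empty prefix to get a default namespace on output is a commonly used approach, shown here as one possible option rather than the only way.

```python
import xml.etree.ElementTree as ET

# Sample document from the question, embedded so the sketch is self-contained.
XML_TEXT = """<?xml version="1.0" ?>
<i:insert xmlns:i="urn:com:xml:insert" xmlns="urn:com:xml:data">
  <data>
    <image imageId="1"></image>
    <content>Content</content>
  </data>
</i:insert>"""

root = ET.fromstring(XML_TEXT)

# The prefix used for searching is arbitrary; only the URI has to match.
ns = {"d": "urn:com:xml:data"}

# Direct children of the root only: image is nested under data, so no match.
direct = root.findall("d:image", ns)

# './/' searches descendants at every level, so image is found.
nested = root.findall(".//d:image", ns)
attribs = [el.attrib for el in nested]
print(attribs)

# register_namespace() affects serialization only, not parsing or searching.
# Note that it changes a module-wide registry.
ET.register_namespace("i", "urn:com:xml:insert")
ET.register_namespace("", "urn:com:xml:data")  # empty prefix: default namespace

serialized = ET.tostring(root, encoding="unicode")
print(serialized)
```

On Python 2.7, dropping the encoding="unicode" argument from ET.tostring() should give the equivalent result as a byte string.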


	