How to get node content from JDOM
I'm writing a Java application using import org.jdom.*;
My XML is valid, but sometimes it contains HTML tags. For example, something like this:
<program-title>Anatomy &amp; Physiology</program-title>
<overview>
<content>
For more info click <a href="page.html">here</a>
<p>Learn more about the human body. Choose from a variety of Physiology (A&amp;P) designed for complementary therapies.&#160; Online studies options are available.</p>
</content>
</overview>
<key-information>
<category>Health &amp; Human Services</category>
So my problem is with the <p> tags inside the overview.content node.
I was hoping that this code would work:
Element overview = sds.getChild("overview");
Element content = overview.getChild("content");
System.out.println(content.getText());
But it returns blank.
How do I return all the text (nested tags and all) from the overview.content node?
Thanks
Recommended answer
content.getText() returns only the element's immediate text (the text nodes that are direct children of the element), so it is only useful for leaf elements that contain nothing but text.
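If you only need the text with the markup stripped, rather than the nested tags themselves, JDOM's Element.getValue() returns the concatenated text of all descendants. The sketch below contrasts the two calls on a trimmed-down version of the question's <content> element. It assumes JDOM 1.x (org.jdom) is on the classpath; the class name TextVsValue and the helper parseRoot are illustrative names, not part of JDOM.

```java
import java.io.IOException;
import java.io.StringReader;

import org.jdom.Element;
import org.jdom.JDOMException;
import org.jdom.input.SAXBuilder;

public class TextVsValue {

    // Illustrative helper: parse an XML string and return its root element.
    static Element parseRoot(String xml) {
        try {
            return new SAXBuilder().build(new StringReader(xml)).getRootElement();
        } catch (JDOMException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Element content = parseRoot(
                "<content>For more info click <a href=\"page.html\">here</a>"
                + "<p>Learn more about the human body.</p></content>");

        // getText(): only the Text nodes that are direct children of <content>;
        // the text inside <a> and <p> is ignored.
        System.out.println("[" + content.getText() + "]");
        // prints [For more info click ]

        // getValue(): text of every descendant in document order, with all
        // markup stripped (the XPath string value of the element).
        System.out.println("[" + content.getValue() + "]");
        // prints [For more info click hereLearn more about the human body.]
    }
}
```

Neither call gives back the <a> or <p> tags, which is why the tags have to be serialized with XMLOutputter when you want them in the result.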
The trick is to use org.jdom.output.XMLOutputter (with the compact format, Format.getCompactFormat()):
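import java.io.StringWriter;

import org.jdom.Document;
import org.jdom.Element;
import org.jdom.input.SAXBuilder;
import org.jdom.output.Format;
import org.jdom.output.XMLOutputter;

public class ContentDump {
    public static void main(String[] args) throws Exception {
        SAXBuilder builder = new SAXBuilder();
        String xmlFileName = "a.xml";
        Document doc = builder.build(xmlFileName);
        Element root = doc.getRootElement();
        Element overview = root.getChild("overview");
        Element content = overview.getChild("content");

        // Serialize the children of <content> (text and nested tags),
        // without the <content> start and end tags themselves.
        XMLOutputter outp = new XMLOutputter();
        outp.setFormat(Format.getCompactFormat());
        //outp.setFormat(Format.getRawFormat());
        //outp.setFormat(Format.getPrettyFormat());
        // To change a single option, modify a Format and pass it back in
        // (getFormat() returns a copy, so editing that copy has no effect):
        //Format f = Format.getCompactFormat();
        //f.setTextMode(Format.TextMode.PRESERVE);
        //outp.setFormat(f);

        StringWriter sw = new StringWriter();
        outp.output(content.getContent(), sw);
        System.out.println(sw.toString());
    }
}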
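If you need this in more than one place, the same idea fits in a small helper. XMLOutputter.outputString(List) returns the serialized content directly, so no StringWriter is needed. This is a sketch assuming JDOM 1.x; innerXml and parseRoot are hypothetical helper names, not JDOM API.

```java
import java.io.IOException;
import java.io.StringReader;

import org.jdom.Element;
import org.jdom.JDOMException;
import org.jdom.input.SAXBuilder;
import org.jdom.output.Format;
import org.jdom.output.XMLOutputter;

public class InnerXml {

    // Hypothetical helper: serialize everything inside an element (text and
    // nested tags) without the element's own start and end tags.
    static String innerXml(Element element) {
        XMLOutputter outputter = new XMLOutputter(Format.getCompactFormat());
        // outputString(List) serializes a list of content nodes to a String.
        return outputter.outputString(element.getContent());
    }

    // Hypothetical helper: parse an XML string and return its root element.
    static Element parseRoot(String xml) {
        try {
            return new SAXBuilder().build(new StringReader(xml)).getRootElement();
        } catch (JDOMException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Element content = parseRoot(
                "<content>For more info click <a href=\"page.html\">here</a>"
                + "<p>Learn more about the human body.</p></content>");
        System.out.println(innerXml(content));
    }
}
```

In the question's code this becomes a one-liner: String html = innerXml(overview.getChild("content"));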
Output
For more info click<a href="page.html">here</a><p>Learn more about the human body. Choose from a variety of Physiology (A&P) designed for complementary therapies.&#160; Online studies options are available.</p>
Explore the other formatting options and adapt the code above to your needs.
From the Javadoc of org.jdom.output.Format: "Class to encapsulate XMLOutputter format options. Typical users can use the standard format configurations obtained by getRawFormat() (no whitespace changes), getPrettyFormat() (whitespace beautification), and getCompactFormat() (whitespace normalization)."
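To see what the standard configurations do, you can render the same element with each of them. This sketch assumes JDOM 1.x; render and parseRoot are hypothetical helpers. The custom format starts from getCompactFormat() and switches the text mode to Format.TextMode.TRIM, which trims the ends of each text run but, unlike the compact default (NORMALIZE), keeps internal runs of spaces.

```java
import java.io.IOException;
import java.io.StringReader;

import org.jdom.Element;
import org.jdom.JDOMException;
import org.jdom.input.SAXBuilder;
import org.jdom.output.Format;
import org.jdom.output.XMLOutputter;

public class FormatComparison {

    // Hypothetical helper: parse an XML string and return its root element.
    static Element parseRoot(String xml) {
        try {
            return new SAXBuilder().build(new StringReader(xml)).getRootElement();
        } catch (JDOMException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Hypothetical helper: serialize the children of an element with a given Format.
    static String render(Element element, Format format) {
        return new XMLOutputter(format).outputString(element.getContent());
    }

    public static void main(String[] args) {
        Element content = parseRoot(
                "<content>  click <a href=\"page.html\">here</a>  <p>Learn   more.</p>  </content>");

        System.out.println("raw:     [" + render(content, Format.getRawFormat()) + "]");
        System.out.println("compact: [" + render(content, Format.getCompactFormat()) + "]");
        System.out.println("pretty:  [" + render(content, Format.getPrettyFormat()) + "]");

        // Each get*Format() call returns a fresh Format, so adjusting one
        // option here does not affect the standard configurations.
        Format custom = Format.getCompactFormat();
        custom.setTextMode(Format.TextMode.TRIM);
        System.out.println("custom:  [" + render(content, custom) + "]");
    }
}
```

Comparing the raw and compact lines shows where the whitespace in the answer's output went: compact normalization removes whitespace-only text between elements and collapses "Learn   more." to "Learn more.".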