find() doesn't work after replaceWith() (using BeautifulSoup)

2022-01-20 00:00:00 python beautifulsoup find

Question

Consider the following Python session:

>>> from BeautifulSoup import BeautifulSoup
>>> s = BeautifulSoup("<p>This <i>is</i> a <i>test</i>.</p>"); myi = s.find("i")
>>> myi.replaceWith(BeautifulSoup("was"))
>>> s.find("i")
>>> s = BeautifulSoup("<p>This <i>is</i> a <i>test</i>.</p>"); myi = s.find("i")
>>> myi.replaceWith("was")
>>> s.find("i")
<i>test</i>

Note the missing output of s.find("i") after line 4!

What is the reason for this? Is there a workaround?

Actually, the example does not demonstrate the real use case, which is:

myi.replaceWith(BeautifulSoup("wa<b>s</b>"))

Whenever the inserted part itself contains nontrivial HTML, I don't see how this syntax could be replaced with something else. Simply using

myi.replaceWith("wa<b>s</b>")

will replace the HTML special characters with entities.


Solution

Simpler answer: after your call to replaceWith, regenerate and clean up s by calling s = BeautifulSoup(s.renderContents()). Re-parsing the rendered markup rebuilds the tree from scratch, so find works again.
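The same workaround can be sketched with the newer bs4 package. This is only an illustration under assumptions: the question uses the old BeautifulSoup 3 module, whereas the sketch below assumes BeautifulSoup 4 is installed, uses its replace_with method and the "html.parser" backend, and wraps the steps in a helper named replace_and_reparse, which is hypothetical and not part of either library.

```python
# Assumption: bs4 (BeautifulSoup 4) is installed; the question itself uses BeautifulSoup 3.
from bs4 import BeautifulSoup


def replace_and_reparse(html, fragment):
    """Replace the first <i> tag with the parsed `fragment`, then re-parse the result.

    Hypothetical helper, written only to illustrate the workaround.
    """
    soup = BeautifulSoup(html, "html.parser")
    first_i = soup.find("i")
    # Parsing the fragment keeps its tags as tags instead of escaping them to entities.
    first_i.replace_with(BeautifulSoup(fragment, "html.parser"))
    # Serialize and parse again so that the returned tree is built from scratch.
    return BeautifulSoup(str(soup), "html.parser")


s = replace_and_reparse("<p>This <i>is</i> a <i>test</i>.</p>", "wa<b>s</b>")
print(s.find("i"))
```

Re-parsing costs one extra serialization and parse, which is usually negligible for small documents; any references to tags obtained before the re-parse point into the old tree and should be looked up again.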
