Removing formatting tags from the string body of an email
How do you remove all formatting tags from the result of calling:
GmailApp.getInboxThreads()[0].getMessages()[0].getBody()
so that the only text left is the part a person can read?
The formatting can be discarded; the body text only needs to be parsed, but tags such as:
"&amp;"
<br>
and possibly others, need to be removed.
Recommended answer
I am not sure what you mean by .getBody() - is it supposed to return a DOM body element?
However, the simplest way to remove HTML tags is probably to let the browser parse the HTML and then ask it for the text content:
var myHTMLContent = "hello & world <br />!";
var tempDiv = document.createElement('div');
tempDiv.innerHTML = myHTMLContent;
// retrieve the cleaned content:
var textContent = tempDiv.innerText;
With the above example, the textContent variable will contain the text
"hello & world
!"
(Note the line break caused by the <br /> tag.)
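One caveat: GmailApp belongs to Google Apps Script, which runs on Google's servers, so `document` is probably not available there and the browser approach may not apply directly. If only the readable text is needed, the message object's `getPlainBody()` method may already be enough. Where the HTML string has to be cleaned by hand, here is a minimal DOM-free sketch. The helper name `stripHtml` and its small entity table are assumptions made for illustration, not part of the original answer. Regex-based stripping is not a real HTML parser; for instance, it keeps the contents of `<script>` and `<style>` elements and only decodes the entities listed.

```javascript
// Hypothetical helper (not from the original answer): removes tags and decodes
// a few common entities without relying on a browser DOM.
function stripHtml(html) {
  // Assumed minimal table; extend it with any other named entities you expect.
  var named = { amp: '&', lt: '<', gt: '>', quot: '"', apos: "'", nbsp: ' ' };
  return html
    .replace(/<br\s*\/?>/gi, '\n')   // keep <br> as a line break
    .replace(/<[^>]*>/g, '')         // drop every remaining tag
    .replace(/&(#x[0-9a-f]+|#\d+|[a-z]+);/gi, function (match, code) {
      if (code.charAt(0) !== '#') {
        // Named entity: decode it if known, otherwise leave it untouched.
        return named.hasOwnProperty(code) ? named[code] : match;
      }
      // Numeric entity, decimal (&#38;) or hexadecimal (&#x26;).
      var isHex = code.charAt(1).toLowerCase() === 'x';
      var point = parseInt(code.slice(isHex ? 2 : 1), isHex ? 16 : 10);
      return point <= 0x10FFFF ? String.fromCodePoint(point) : match;
    });
}

var cleaned = stripHtml('hello &amp; world <br />!');
```

Tags are stripped before entities are decoded, so an escaped sequence such as `&lt;b&gt;` survives as the literal text `<b>` instead of being mistaken for a tag.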