Remove formatting tags from the string body of an email

2022-01-18 00:00:00 google-apps-script tags javascript html

How do you remove all formatting tags when calling:

GmailApp.getInboxThreads()[0].getMessages()[0].getBody()

so that only the human-readable text remains?

Formatting can be destroyed; the body text only needs to be parsed, but tags such as:

"&" 
<br>

and possibly others, need to be removed.

Recommended answer

I am not sure what you mean by .getBody() - is this supposed to return a DOM body element?

However, the simplest solution for removing HTML tags is probably to let the browser render the HTML and ask it for the text content:

var myHTMLContent = "hello &amp; world <br />!";
var tempDiv = document.createElement('div');
tempDiv.innerHTML = myHTMLContent;

// retrieve the cleaned content:
var textContent = tempDiv.innerText;

With the above example, the textContent variable will contain the text

"hello & world
!"

(Note the line break due to the <br /> tag.)
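The approach above needs a browser `document`. The question is tagged google-apps-script, where a DOM may not be available, so here is a minimal sketch of the same idea using regular expressions. The helper name `stripHtml` and its small entity table are assumptions for illustration, not part of the original answer. It is a rough approximation, not a full HTML parser: it will not handle `<script>` or `<style>` contents, comments, or most named entities.

```javascript
// Hypothetical helper (not from the original answer): removes tags with
// regular expressions instead of a DOM. A sketch, not a complete HTML parser.
function stripHtml(html) {
  // Only a handful of common named entities are covered here (assumption).
  var entities = { amp: '&', lt: '<', gt: '>', quot: '"', apos: "'", nbsp: ' ' };
  return html
    .replace(/<br\s*\/?>/gi, '\n')   // keep <br> as a line break
    .replace(/<[^>]*>/g, '')         // drop every remaining tag
    // Decode entities in a single pass, after the tags are gone, so that
    // text such as "&lt;b&gt;" is not mistaken for a tag.
    .replace(/&(#\d+|[a-z]+);/gi, function (match, name) {
      if (name.charAt(0) === '#') {
        return String.fromCharCode(parseInt(name.slice(1), 10));
      }
      var key = name.toLowerCase();
      return entities.hasOwnProperty(key) ? entities[key] : match;
    });
}

var cleaned = stripHtml("hello &amp; world <br />!");
// cleaned is "hello & world \n!"
```

In Apps Script, the string returned by getBody() could be passed to such a helper. `GmailMessage` also offers getPlainBody(), which returns the body without HTML formatting and may make manual stripping unnecessary.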
