Converting Word documents to usable HTML in PHP

2021-12-31 00:00:00 ms-word php

I have a set of Word documents which I want to publish using a PHP tool I've written. I copy and paste the Word documents into a text box and then save them into MySQL using the PHP program. The problem I have arises from all the non-standard characters that Word documents contain, like curly quotes and ellipses ("..."). What I do at the moment is manually search and replace these kinds of things (and also foreign symbols such as e-acute) with either plain text or HTML entities (&eacute; etc.). Is there a function in PHP I can call that will take the output of a Word document and convert everything that should be an entity into an entity, and convert other symbols that don't display properly in Firefox into symbols that do?

Thanks!

Recommended answer

A better solution would be to ensure that your database is set up to support UTF-8 characters. The additional characters available in the extended set should cover all the "non-standard" characters that you're talking about.
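As a rough illustration of the UTF-8 approach, the sketch below opens a MySQL connection with the `utf8mb4` character set through PDO and stores the pasted text unchanged. The host, database name, credentials, the `documents` table, and the helper names `utf8Dsn` and `saveDocument` are all hypothetical, and it assumes the form page itself is served as UTF-8.

```php
<?php
// Build a PDO DSN that asks MySQL to use utf8mb4 for the connection.
// Host and database name are placeholders supplied by the caller.
function utf8Dsn(string $host, string $db): string
{
    return "mysql:host={$host};dbname={$db};charset=utf8mb4";
}

// Store the pasted text as-is. Assumes a hypothetical table `documents`
// whose `body` column uses a utf8mb4 character set.
function saveDocument(PDO $pdo, string $body): void
{
    $stmt = $pdo->prepare('INSERT INTO documents (body) VALUES (:body)');
    $stmt->execute([':body' => $body]);
}

// Usage (needs a running MySQL server, so it is left commented out):
// $pdo = new PDO(utf8Dsn('localhost', 'publishing'), 'user', 'password');
// saveDocument($pdo, $_POST['body']);
```

With the connection, the column, and the page all agreeing on UTF-8, curly quotes, ellipses, and accented letters can be stored and echoed back without any search-and-replace step.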

Otherwise, if you really must convert these characters into HTML entities, use htmlentities().
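A minimal sketch of the `htmlentities()` route, assuming the pasted text arrives as UTF-8. The sample string is made up for illustration, and the flags and encoding are passed explicitly rather than relying on defaults, which differ between PHP versions.

```php
<?php
// Hypothetical text pasted from Word: curly double quotes,
// an e-acute, and a single-character ellipsis.
$pasted = "\u{201C}Caf\u{00E9}\u{201D}\u{2026}";

// ENT_QUOTES also converts straight single and double quotes;
// ENT_HTML401 selects the HTML 4.01 named-entity table;
// 'UTF-8' states the encoding of the input string.
$html = htmlentities($pasted, ENT_QUOTES | ENT_HTML401, 'UTF-8');

echo $html, PHP_EOL; // prints: &ldquo;Caf&eacute;&rdquo;&hellip;
```

Note that `htmlentities()` also escapes `<`, `>`, and `&`, so it suits plain text; running it over text that already contains HTML markup would escape the tags as well.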
