Full-text search of static HTML files on CD-ROM via JavaScript

I will be delivering a set of static HTML pages on CD-ROM; these pages need to be fully viewable with no Internet access whatsoever.

I'd like to provide a full-text search (Lucene-like) for the content of those pages, which should "just work" from the CD-ROM with no software installation on the client machine.

A search engine implemented in JavaScript would be the perfect solution, but I have trouble finding one that looks solid / current / popular.

I did find these:

+ jsFind
+ js-search

But both projects seem rather inactive.

Another solution, besides a dedicated search engine in JavaScript, would be the ability to access local Lucene indices from JavaScript: the indices themselves would be built with Lucene and copied to the CD-ROM along with the HTML files.

Edit: I built it myself (see below).

Recommended answer

Actually, I built it myself.

The existing solutions (that I could find) were unconvincing.

I wanted to be able to search a very long tree (ul/li/ul...) that is displayed as one page; it contains 5000+ items.

It sounds a little weird to display such a long tree on one page, but in fact with collapse / expand it's much more intuitive than separate pages, and since we're offline, download times are not a problem (parsing times are, though Chrome is amazing ;-)

The "search" function provided with modern browsers (FF and Chrome anyway) has two big problems: it only searches visible items on the page, and it can't search for non-consecutive words.

I want to be able to search collapsed items (not visible on the screen); I want to find "one two three" when searching "one three" (just like with Google / Lucene); and I want to open just the branches of the tree containing found items.

So, what I did was:

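  1. Create an inverted index of words <-> IDs of the items in the list (via XSLT); the document contains about 4500 unique words
  2. Convert this index into a bunch of JavaScript arrays (one word = one array, containing the IDs)
  3. When searching, intersect the arrays that correspond to the search terms
  4. Step 3 returns an array of IDs, which I can then open / highlight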
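
The steps above can be sketched roughly as follows. This is a minimal illustration, not the original code: the index in the answer was generated offline with XSLT, whereas this sketch builds it in JavaScript from a hypothetical `items` map. The names `items`, `buildIndex`, and `search` are assumptions made for the example, and the tokenizer only handles ASCII words.

```javascript
// Hypothetical sample data: item ID -> item text.
// In the original setup the items come from the ul/li tree.
const items = {
  1: "one two three",
  2: "one four",
  3: "three five one",
  4: "two six",
};

// Steps 1-2: build an inverted index, word -> sorted array of item IDs.
// (Assumption: built at runtime here; the answer pre-generated it with XSLT.)
function buildIndex(items) {
  const index = Object.create(null);
  for (const [id, text] of Object.entries(items)) {
    // De-duplicate words so each ID appears once per word.
    const words = new Set(text.toLowerCase().split(/\W+/).filter(Boolean));
    for (const word of words) {
      if (!index[word]) index[word] = [];
      index[word].push(Number(id));
    }
  }
  for (const word in index) index[word].sort((a, b) => a - b);
  return index;
}

// Step 3: intersect the ID arrays of all search terms.
function search(index, query) {
  const terms = query.toLowerCase().split(/\W+/).filter(Boolean);
  if (terms.length === 0) return [];
  let result = index[terms[0]] || [];
  for (const term of terms.slice(1)) {
    const ids = new Set(index[term] || []);
    result = result.filter((id) => ids.has(id));
  }
  return result;
}

const index = buildIndex(items);
// Step 4 would open / highlight the items with these IDs.
console.log(search(index, "one three"));
```

Because each term is looked up independently and the results are intersected, the words do not need to be adjacent in the item text, which is what makes "one three" match "one two three".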

It does exactly what I needed and it's really fast. Better yet, since it searches from an independent "index" (arrays of IDs), it can search even when the list is not loaded in the browser!
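
Since the search yields only IDs, working out which branches to expand can also be done without touching the page. The sketch below is one possible way to do that, assuming a hypothetical `parentOf` map (item ID -> parent item ID) that would have to be generated alongside the index; neither that map nor the `branchesToOpen` function is described in the answer itself.

```javascript
// Hypothetical tree shape: item ID -> parent item ID (null for top-level items).
const parentOf = { 1: null, 2: 1, 3: 2, 4: 1, 5: null, 6: 5 };

// Collect every ancestor of the found items; these are the branches to expand.
function branchesToOpen(foundIds, parentOf) {
  const open = new Set();
  for (const id of foundIds) {
    let p = parentOf[id];
    // Stop early when an ancestor is already marked: its own ancestors are too.
    while (p != null && !open.has(p)) {
      open.add(p);
      p = parentOf[p];
    }
  }
  return [...open].sort((a, b) => a - b);
}

console.log(branchesToOpen([3, 6], parentOf));
```

Only the ancestors of matching items are returned, so all other branches can stay collapsed.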
