SQL Server 2008 Full-Text Search (FTS) versus Lucene.NET
I know there have been questions in the past about SQL 2005 versus Lucene.NET, but since 2008 came out with a lot of changes to full-text search, I was wondering if anyone can give me pros/cons (or link to an article).
Recommended answer
I built a medium-size knowledge base (maybe 2GB of indexed text) on top of SQL Server 2005's FTS in 2006, and have now moved it to 2008's iFTS. Both have worked well for me, but the move from 2005 to 2008 was actually an improvement.
My situation was NOT like StackOverflow's, in the sense that I was indexing data that was only refreshed nightly. However, I was trying to join search results from multiple CONTAINSTABLE statements back to each other and to relational tables.
In 2005's FTS, this meant each CONTAINSTABLE would have to execute its search on the index, return the full results, and then have the DB engine join those results to the relational tables (this was all transparent to me, but it was happening and it made the queries expensive). 2008's iFTS improved this situation because the database integration allows the multiple CONTAINSTABLE results to become part of the query plan, which made a lot of searches more efficient.
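For anyone who hasn't written this kind of query, here is a rough sketch of what I mean by joining multiple CONTAINSTABLE results to each other and to relational tables. The schema is made up for illustration: `dbo.Articles`, `dbo.Categories`, their columns, and the search terms are all assumptions, and it presumes a full-text index on `dbo.Articles` covering `Title` and `Body` with `ArticleId` as the key column.

```sql
-- Hypothetical schema (not from my actual knowledge base):
--   dbo.Articles(ArticleId INT PRIMARY KEY, Title, Body, CategoryId)
--   dbo.Categories(CategoryId INT PRIMARY KEY, Name)
-- Assumes a full-text index on dbo.Articles (Title, Body) keyed on ArticleId.
SELECT a.ArticleId,
       a.Title,
       c.Name   AS Category,
       t.[RANK] AS TitleRank,
       b.[RANK] AS BodyRank
FROM dbo.Articles AS a
-- First full-text search: prefix match on the Title column.
JOIN CONTAINSTABLE(dbo.Articles, Title, N'"index*"') AS t
    ON t.[KEY] = a.ArticleId
-- Second full-text search: both words must appear in the Body column.
JOIN CONTAINSTABLE(dbo.Articles, Body, N'"rebuild" AND "catalog"') AS b
    ON b.[KEY] = a.ArticleId
-- Ordinary relational join alongside the two full-text result sets.
JOIN dbo.Categories AS c
    ON c.CategoryId = a.CategoryId
ORDER BY t.[RANK] + b.[RANK] DESC;
```

Each CONTAINSTABLE call returns a `KEY` and a `RANK` column, so both result sets join back on the full-text key. Summing the two ranks is just one simple way to order the combined results, not a recommendation.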
I think that both 2005's and 2008's FTS engines, as well as Lucene.NET, have architectural tradeoffs that are going to align better or worse with a given project's circumstances; I just got lucky that the upgrade worked in my favor. I can completely see why 2008's iFTS wouldn't work in the same configuration as 2005's, given the highly OLTP nature of a use case like StackOverflow.com. However, I would not discount the possibility that the 2008 iFTS could be isolated from the heavy insert transaction load... but it also sounds like that could be as much work as moving to Lucene.NET... and the cool factor of Lucene.NET is hard to ignore ;)
Anyway, for me, the ease and efficiency of SQL 2008's iFTS in the majority of situations probably edges out Lucene's 'cool' factor (though it is easy to use, I've never used it in a production system, so I'm reserving comment on that). I would be interested in knowing how much more efficient Lucene is (has turned out to be? is it implemented now?) in StackOverflow or similar situations.