PHP MySQL full-text search: Lucene, Sphinx, or?
This is admittedly similar to (but not a duplicate of) "Comparison of full text search engine - Lucene, Sphinx, Postgresql, MySQL?". However, what I am looking for are specific, supported recommendations drawn from experience with more than one of the available systems (there seems to be a lot of "I've used Lucene, but not Sphinx", and vice versa).
The setup: standard LAMP (MySQL 5.0, PHP 5).
MySQL: tables use the InnoDB engine for foreign key constraints.
We are looking at indexing data, not pages. The data to be indexed may be in multiple languages (UTF-8 charset).
A number of the comparisons I've come across (like http://blog.evanweaver.com/articles/2008/03/17/rails-search-benchmarks/) are either not entirely applicable (Ferret is a Lucene port, but not the same as Zend_Search_Lucene) or are pushing their own systems/implementations (not exactly unbiased).
Some others I've come across (such as http://whatstheplot.com/blog/tag/lucene/ and http://pagetracer.com/2008/02/15/sphinx-and-lucene-search-engines-first-impressions/) report very different performance results for the two systems.
Also, Xapian is all but ignored in much of what I've read. Might it be worth considering as well?
So... I'm hoping that some of you here on SO have some experience with this question and could offer some recommendations or point me in the right direction.
Recommended answer
One advantage of Sphinx is that you can "interpose" it between your clients and the MySQL server, and it will only "interfere" with queries specifically addressed to it, transparently bouncing the others off MySQL (see e.g. this article). Whether that's an advantage in your use case, you're best placed to say!
Sorry, no real-life experience with Xapian or Lucene. Still, from reading about how to deploy them, it sounds (to me!) as if they might be worth it only if you have identified substantial advantages. Otherwise, Sphinx's "easy as pie" deployment, as a "proxy" between your clients and your MySQL server, feels like a big, substantial win to me!