PHP and MySQL full-text search: Lucene, Sphinx, or something else?

2022-01-15 00:00:00 search lucene php mysql sphinx

This is admittedly similar to (but not a duplicate of) "Comparison of full text search engine - Lucene, Sphinx, Postgresql, MySQL?". However, what I am looking for are specific, well-supported recommendations based on experience with more than one of the available systems (there seems to be a lot of "I've used Lucene, but not Sphinx", and vice versa).

The setup: standard LAMP (MySQL 5.0, PHP 5).

MySQL: tables use the InnoDB engine for foreign key constraints.

We are looking at indexing data, not pages. The data to be indexed may be in multiple languages (UTF-8 charset).

A number of the comparisons I've come across (like http://blog.evanweaver.com/articles/2008/03/17/rails-search-benchmarks/) are either not entirely applicable (Ferret is a Lucene port, but not the same as Zend_Search_Lucene) or are pushing their own systems/implementations (not exactly unbiased).

Some others I've come across (such as http://whatstheplot.com/blog/tag/lucene/ and http://pagetracer.com/2008/02/15/sphinx-and-lucene-search-engines-first-impressions/) report very different performance results for the two systems.

Also, Xapian is all but ignored in much of what I've read. Might it be worth considering as well?

So... I'm hoping that some of you here on SO have some experience with this question and could help with some recommendations or point me in the right direction.

Recommended answer

One advantage of Sphinx is that you can "interpose" it between your clients and the MySQL server, and it will only "interfere" with queries specifically addressing it, transparently bouncing the others off MySQL (see e.g. this article). Whether that's an advantage in your use case, you're best placed to say!

Sorry, I have no real-life experience with Xapian or Lucene. Still, from reading about how to deploy them, it sounds (to me!) as if they might be worth it only if you identified substantial advantages. Otherwise, Sphinx's "easy as pie" deployment, as a "proxy" between your clients and your MySQL server, feels like a big, substantial win to me!
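To picture the "only interferes with queries addressed to it" idea, here is a minimal Python sketch of the routing decision. It is an illustration only, not Sphinx's actual mechanism or API: the function `route_query`, the backend labels, and the index name `articles_idx` are all hypothetical, and the regex-based `FROM` detection is a deliberate simplification.

```python
import re

# Hypothetical backend labels, used only for this illustration.
SEARCH_BACKEND = "sphinx"
DEFAULT_BACKEND = "mysql"

# Simplified: grab the first identifier after FROM (optionally backquoted).
_FROM_RE = re.compile(r"\bFROM\s+`?([A-Za-z_][A-Za-z0-9_]*)`?", re.IGNORECASE)


def route_query(sql, search_indexes):
    """Return the backend label that should handle `sql`.

    `search_indexes` is a set of lowercase names that the search
    engine owns; every other query falls through to the database.
    """
    match = _FROM_RE.search(sql)
    if match and match.group(1).lower() in search_indexes:
        return SEARCH_BACKEND
    return DEFAULT_BACKEND


if __name__ == "__main__":
    indexes = {"articles_idx"}  # assumed name of a full-text index
    print(route_query("SELECT id FROM articles_idx WHERE MATCH('lucene')", indexes))
    print(route_query("SELECT * FROM users WHERE id = 42", indexes))
```

The first query names a search index and is routed to the search backend; the second does not, so it passes through to MySQL untouched. A real deployment would rely on whatever integration the search engine itself provides rather than on string matching like this.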
