使用 InnoDB 进行全文搜索
我正在开发一个大容量的 Web 应用程序,其中的一部分是讨论帖子的 MySQL 数据库,该数据库需要平稳地增长到 2000 万行以上.
I'm developing a high-volume web application, where part of it is a MySQL database of discussion posts that will need to grow to 20M+ rows, smoothly.
我最初计划对表使用 MyISAM(对于内置的 全文搜索功能),但一想到整个表格由于一次写入操作而被锁定,我就想闭嘴.行级锁更有意义(更不用说 InnoDB 在处理大表时的其他速度优势).因此,出于这个原因,我非常确定要使用 InnoDB.
I was originally planning on using MyISAM for the tables (for the built-in fulltext search capabilities), but the thought of the entire table being locked due to a single write operation makes me shutter. Row-level locks make so much more sense (not to mention InnoDB's other speed advantages when dealing with huge tables). So, for this reason, I'm pretty determined to use InnoDB.
问题是... InnoDB 没有内置全文搜索功能.
The problem is... InnoDB doesn't have built-in fulltext search capabilities.
我应该使用第三方搜索系统吗?喜欢 Lucene(c++)/狮身人面像?你们中的任何一个数据库忍者有什么建议/指导吗?LinkedIn 的 zoie(基于 Lucene)目前看起来是最好的选择del>... 围绕实时功能构建(这对我的应用程序非常重要.)我有点犹豫要不要提交但没有一些洞察力...
Should I go with a third-party search system? Like Lucene(c++) / Sphinx? Do any of you database ninjas have any suggestions/guidance? LinkedIn's zoie (based off Lucene) looks like the best option at the moment... having been built around realtime capabilities (which is pretty critical for my application.) I'm a little hesitant to commit yet without some insight...
(仅供参考:将在 EC2 上使用高内存设备,使用 PHP 为前端提供服务)
(FYI: going to be on EC2 with high-memory rigs, using PHP to serve the frontend)
推荐答案
我可以保证 MyISAM 全文是一个糟糕的选择 - 即使撇开 MyISAM 表的各种问题,我已经看到全文内容从rails 并开始自我损坏并定期使 MySQL 崩溃.
I can vouch for MyISAM fulltext being a bad option - even leaving aside the various problems with MyISAM tables in general, I've seen the fulltext stuff go off the rails and start corrupting itself and crashing MySQL regularly.
专用搜索引擎绝对是这里最灵活的选择 - 将帖子数据存储在 MySQL/innodb 中,然后将文本导出到您的搜索引擎.您可以非常轻松地设置定期完整索引构建/发布,并在您觉得需要并愿意花时间时添加实时索引更新.
A dedicated search engine is definitely going to be the most flexible option here - store the post data in MySQL/innodb, and then export the text to your search engine. You can set up a periodic full index build/publish pretty easily, and add real-time index updates if you feel the need and want to spend the time.
Lucene 和 Sphinx 是不错的选择,Xapian 也是不错的选择,它既美观又轻巧.如果您走 Lucene 路线,请不要假设 Clucene 会更好,即使您不想与 Java 搏斗,尽管我真的没有资格讨论两者的优缺点.
Lucene and Sphinx are good options, as is Xapian, which is nice and lightweight. If you go the Lucene route don't assume that Clucene will better, even if you'd prefer not to wrestle with Java, although I'm not really qualified to discuss the pros and cons of either.
相关文章