Is MongoDB a valid alternative to a relational DB + Lucene?
On a new project I need to make heavy use of Lucene to implement a searcher. This searcher will be a very important (and large) part of the project. Is it valid or convenient to replace a relational database + Lucene with MongoDB?
Edit: OK, let me clarify: I'm not asking about risk; I can pay that price on this project. My point is: is MongoDB oriented to this kind of thing? Can I build a full search engine with the same performance I can get from Lucene? A friend pointed me to MongoDB as an alternative, but I can't tell whether Lucene's performance comes from its document orientation (in which case I would see it in MongoDB too) or whether, on the other hand, the inverted index and optimizations are completely independent of document orientation.
Recommended answer
Technically you can do full-text search with MongoDB, but you're missing out on a lot that a full-text search provider has to offer. I love MongoDB, but I'd couple it with a full-text search provider (such as Lucene or Sphinx) if time to implementation is at all a concern. I think MongoDB's convenient ability to index word arrays is better left to tagging and tag-based search than to full-text search.
Search (Information Retrieval) isn't just about grabbing any documents that match. If you want your search results to have any relevance at all, you're going to need something along the lines of TF-IDF, phrase matching (words in a sequence score higher), or any number of other IR techniques to improve search precision. If you use MongoDB you'll need to implement it all from scratch.
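To give a rough idea of what "from scratch" means for just the TF-IDF part, here is a minimal sketch in plain Python. It is not how Lucene or MongoDB work internally; the function names (`tokenize`, `build_index`, `search`), the whitespace tokenizer, and the smoothed IDF formula `log(1 + N/df)` are all assumptions chosen for illustration, and it leaves out phrase matching, stemming, stop words, and length normalization entirely.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Naive tokenizer (assumption): lowercase and split on whitespace.
    return text.lower().split()


def build_index(docs):
    """Build an inverted index: term -> {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(tokenize(text)).items():
            index[term][doc_id] = tf
    return index


def search(query, index, n_docs):
    """Return (doc_id, score) pairs ranked by summed TF-IDF, best first."""
    scores = defaultdict(float)
    for term in tokenize(query):
        postings = index.get(term)
        if not postings:
            continue  # term appears in no document
        # Smoothed IDF (one common variant): rarer terms weigh more.
        idf = math.log(1 + n_docs / len(postings))
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    docs = {
        1: "mongodb stores documents",
        2: "lucene builds an inverted index for full text search",
        3: "full text search needs relevance ranking",
    }
    index = build_index(docs)
    for doc_id, score in search("lucene search", index, len(docs)):
        print(doc_id, round(score, 3))
```

In this toy corpus, document 2 ranks above document 3 for the query "lucene search" because it matches both terms and "lucene" is the rarer one. Note that the inverted index here is just a mapping from terms to postings, so it could be persisted in almost any store; the ranking logic is what you would have to write and tune yourself.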
If you really want to implement it all from scratch but not bother with the raw storage side of things, MongoDB is pretty close to the best DB store that you could implement it on top of (I can't think of many others), but that still doesn't make it a great option.