Comparison of Distributed Search Systems: ElasticSearch and SenseiDB

2022-04-27 00:00:00 Tags: index, support, distributed, drawbacks, flexible

I collected several articles from the web that introduce and compare these two systems.

1) senseidb VS. Solr VS. elasticsearch (***Incomplete***)

A comparison of SenseiDB, Solr and ElasticSearch written by Wang Fuqiang, now a technical expert in Alibaba's Platform Technology Department.

SenseiDB

 

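Features

 * Mainly addresses the problem of high-speed index updates; underneath, it is backed by Zoie's "2-swapping-in-memory-index + 1-on-disk-index" index structure;

 * A schema must be defined;

 * Multiple data sources can be connected through Gateways;

 * Data can be queried through BQL, a REST API, or even various language bindings;

 * Supports batch updates of the data index through Hadoop MapReduce jobs;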
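The "2-swapping-in-memory-index + 1-on-disk-index" idea can be pictured with a toy model. The sketch below is a simplified illustration under stated assumptions, not Zoie's actual implementation: the class name `ToyZoieIndex`, the flush threshold, and the use of plain dicts in place of Lucene indexes are all hypothetical. New writes go to an active in-memory index; when it fills up, the two in-memory indexes swap roles and the full one is merged into the on-disk index, while searches read all three layers so fresh updates are visible immediately.

```python
from typing import Dict, List


class ToyZoieIndex:
    """Hypothetical toy model of a 2-swapping-in-memory + 1-on-disk index.

    Plain dicts (doc id -> text) stand in for real Lucene indexes.
    """

    def __init__(self, flush_threshold: int = 3) -> None:
        self.flush_threshold = flush_threshold
        self.active: Dict[int, str] = {}   # in-memory index receiving new writes
        self.standby: Dict[int, str] = {}  # in-memory index being flushed to disk
        self.disk: Dict[int, str] = {}     # stand-in for the on-disk index

    def update(self, doc_id: int, text: str) -> None:
        self.active[doc_id] = text
        if len(self.active) >= self.flush_threshold:
            self._swap_and_flush()

    def _swap_and_flush(self) -> None:
        # Swap roles: the full buffer becomes read-only, the empty one takes writes.
        self.active, self.standby = self.standby, self.active
        # Merge the read-only buffer into the disk index. This toy does it
        # synchronously; a real system would merge in the background.
        self.disk.update(self.standby)
        self.standby.clear()

    def search(self, term: str) -> List[int]:
        # Layer the indexes so the newest version of each document wins.
        merged: Dict[int, str] = {}
        for layer in (self.disk, self.standby, self.active):
            merged.update(layer)
        return sorted(d for d, text in merged.items() if term in text.split())
```

For example, with `flush_threshold=2`, after indexing "red car", "blue car" and "red bike" the first two documents sit in the disk layer and the third in the active buffer, yet `search("red")` still finds both red documents.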

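Pros

 * High-speed index updates

 * Multiple data source integration

 * Flexible access interfaces

 * Integration with the Hadoop ecosystem

 * Good distributed scalability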

Cons

 * static schema

 * versioning has to be maintained on the application side

Why not just use Solr?

   An excerpt from an interview with John Wang:

   Sensei leverages Lucene.

   We weren’t able to leverage Solr because of the following requirements:

   * High update requirement, 10’s of thousands updates per second in to the system

   * Real distributed solution, current Solr’s distributed story has a SPOF at the master, and Solr Cloud is not yet completed.

   * Complex faceting support. Not just your standard terms based faceting. We needed to facet on social graph, dynamic time ranges and many other interesting faceting scenarios. Faceting behavior also needs to be highly customizable, which is not available via Solr.
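For reference, "standard terms based faceting" means counting how many matching documents carry each distinct value of a field. The minimal sketch below shows only that baseline, which is what SenseiDB needed to go beyond; the function name `terms_facet` and the sample fields are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def terms_facet(docs: Iterable[Dict[str, str]], field: str) -> List[Tuple[str, int]]:
    """Count documents per distinct value of `field`, most frequent first.

    Documents that lack the field are skipped.
    """
    counts = Counter(doc[field] for doc in docs if field in doc)
    return counts.most_common()
```

Facets on a social graph or on dynamic time ranges replace the simple per-value counter with a custom function that computes the bucket (for example, "within 2 hops" or "last 24 hours") for each document, which is the kind of customization the quote says Solr did not offer.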

ElasticSearch

 

Features

 * Schema-Free | Schemaless

 * feeds the index engine with JSON-formatted documents

 * Query by Lucene based query string or JSON based query DSL over HTTP or Native Java;

 * shards and replicas, load balancing and routing

 * cloud integration

 * multiple search *

 * multiple data sources integration with River

 * many more…
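The "shards and replicas ... routing" item refers to how each document is assigned to a shard. ElasticSearch's documented rule is roughly shard = hash(_routing) % number_of_primary_shards, where _routing defaults to the document id and the real hash is Murmur3. The sketch below substitutes `zlib.crc32` for Murmur3 purely for illustration, so its shard numbers will not match a real cluster; the function name `route_to_shard` is hypothetical.

```python
import zlib


def route_to_shard(routing_key: str, num_primary_shards: int) -> int:
    """Pick a primary shard for a document.

    Illustrative only: crc32 stands in for the Murmur3 hash ElasticSearch uses.
    """
    if num_primary_shards <= 0:
        raise ValueError("num_primary_shards must be positive")
    return zlib.crc32(routing_key.encode("utf-8")) % num_primary_shards
```

Because the result depends on the number of primary shards, the same key always lands on the same shard only while that number stays fixed, which is why it is chosen when an index is created. Replicas are copies of each primary shard and can serve reads, which is where load balancing comes in.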

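Pros

 * Many flexible features (see the feature list above)

 * The author has many years of experience in the search field

 * It has essentially all of SenseiDB's pros as well

Cons

 * Insufficient documentation

 * No major commercial organization backing it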

2) ElasticSearch, Sphinx, Lucene, Solr, Xapian. Which fits for which usage?

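A description of ElasticSearch by its author, kimchy, on Stack Overflow.

kimchy is also the author of Compass.

ElasticSearch has an advanced distributed model, speaks JSON natively, is interacted with through a JSON DSL, and offers a rich feature set.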
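As a rough illustration of interacting through the JSON DSL, the snippet below builds a search request body that combines a full-text `match` query with an exact `term` filter inside a `bool` query. These are standard query DSL constructs, but the exact syntax has changed between ElasticSearch versions (older releases used a `filtered` query instead of a `bool` filter clause); the field names `title` and `status` and the function name are hypothetical. It only serializes the body; sending it, for example as a POST to an index's `_search` endpoint, is left to an HTTP client.

```python
import json


def build_search_body(query_text: str, status: str, size: int = 10) -> str:
    """Return a query DSL body: full-text match on `title`, exact filter on `status`."""
    body = {
        "query": {
            "bool": {
                # Scored full-text clause.
                "must": [{"match": {"title": query_text}}],
                # Non-scoring exact-value clause.
                "filter": [{"term": {"status": status}}],
            }
        },
        "size": size,  # maximum number of hits to return
    }
    return json.dumps(body)
```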

3) Solr vs. ElasticSearch

Again, comments on ElasticSearch from Stack Overflow.

Pros

 * ElasticSearch is distributed. No separate project required. Replicas are near real-time too, which is called "Push replication".

 * ElasticSearch fully supports the near real-time search of Apache Lucene.

 * Handling multitenancy is not a special configuration, where with Solr a more advanced setup is necessary.

 * ElasticSearch introduces the concept of the Gateway, which makes full backups easier.

Cons

 * Only one main developer [this isn't true anymore according to the current elasticsearch GitHub organization, besides having a pretty active committer base in the first place]

 * No autowarming feature

4) Realtime Search: Solr vs Elasticsearch

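A performance comparison of Solr and ElasticSearch in a real-time search scenario.

 * In that article's real-world test, ElasticSearch showed a 50-fold improvement in search performance over Solr.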

5) SenseiDB Performance Benchmark

A performance comparison between SenseiDB and MySQL.

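Summary

 * Both have good distributed scalability, and both support real-time search.

 * ElasticSearch was developed by kimchy, the author of Compass, with the goal of providing a search system for cloud computing platforms, and it has many flexible features. Its main drawbacks are the lack of backing from a large company and a short development history, so it is not yet mature.

 * SenseiDB was open-sourced by LinkedIn, with the goals of high-speed index updates, multiple data source integration, and flexible access interfaces. Its drawbacks are that a static schema must be defined and that its development history is also short.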
