为日志而生的存储引擎

2020-06-01 00:00:00 专区 订阅 付费 希望 建立一个

详细设计文档:Documentation · Issue #1 · esdb/lstore


A log storage engine, that supports

  • Multi-dimensional search (like Elasticsearch, but optimized for append-only log)
  • Conditional log append

It is trivial to achieve both goals by storing the log into a relational database, or a key-value store, or a search engine. However, generic storage system is not cost efficient to store log of immutable entries. If we only support "append-only" data, a great deal of optimization is possible. The goal of lstore is to be more than 10x cost efficient than traditional storage systems (still working on that).

Conditional log append is a crucial feature to make lstore reusable in a wide range of scenarios. Any kind of optimistic locking relies on some sort of conditional log append. It is like the lock cmpxchg instruction for the distributed world.

A high level summary of the storage format is "bloomfilter tree of log blocks". Majorities of storage systems built on top of sorting. They works like disk backed tree. However, lstore is built on top of hashing. It works like disk backed hashmap. The architecture is similar with bitcask. But unlike bitcask, lstore is a low level append-only log store, instead of mutable key-value store.

List of optimizations:

  • Bloom filter tree, which enables multi-dimensional search without sorting
  • Min-max index (TODO: to support range query)
  • Off-heap memory management: map file system cache directly as go object
  • Columnar data block, to scan less data
  • Columnar data block enables linear searching with SIMD instruction
  • Columnar data block enables searchable compression (search without decompression first)
    • Dictionary Encoding (TODO)
    • Sparse Column (TODO)
    • FOR integer compression (TODO)

相关文章