B 树与哈希表

在MySQL中,索引类型是b-tree,访问b-tree中的元素是在对数分摊时间O(log(n)).

In MySQL, an index type is a b-tree, and access an element in a b-tree is in logarithmic amortized time O(log(n)).

另一方面,访问哈希表中的元素是O(1).

On the other hand, accessing an element in a hash table is in O(1).

为什么不使用哈希表代替 b 树来访问数据库中的数据?

Why is a hash table not used instead of a b-tree in order to access data inside a database?

推荐答案

您只能通过哈希表中的主键访问元素.这比使用树算法(O(1) 而不是 log(n))要快,但您不能选择范围(xy 之间的所有内容).树算法在 Log(n) 中支持这一点,而哈希索引可以导致全表扫描 O(n).此外,哈希索引的常量开销通常更大(这不是 theta 符号中的因素,但它仍然存在).此外,树算法通常更易于维护、随数据增长、规模等.

You can only access elements by their primary key in a hashtable. This is faster than with a tree algorithm (O(1) instead of log(n)), but you cannot select ranges (everything in between x and y). Tree algorithms support this in Log(n) whereas hash indexes can result in a full table scan O(n). Also the constant overhead of hash indexes is usually bigger (which is no factor in theta notation, but it still exists). Also tree algorithms are usually easier to maintain, grow with data, scale, etc.

哈希索引使用预定义的哈希大小,因此您最终会得到一些桶";对象存储的位置.这些对象再次循环以真正在此分区中找到正确的对象.

Hash indexes work with pre-defined hash sizes, so you end up with some "buckets" where the objects are stored in. These objects are looped over again to really find the right one inside this partition.

因此,如果您的尺寸较小,则小元素的开销很大,大尺寸会导致进一步扫描.

So if you have small sizes you have a lot of overhead for small elements, big sizes result in further scanning.

今天的哈希表算法通常可以扩展,但扩展效率很低.

Todays hash tables algorithms usually scale, but scaling can be inefficient.

确实有可扩展的散列算法.不要问我这是如何工作的——这对我来说也是个谜.AFAIK 它们是从可扩展复制演变而来的,在这种复制中,重新散列并不容易.

There are indeed scalable hashing algorithms. Don't ask me how that works - its a mystery to me too. AFAIK they evolved from scalable replication where re-hashing is not easy.

它被称为 RUSH - Replication Under Scalable Hashing,这些算法因此被称为 RUSH 算法.

Its called RUSH - Replication Under Scalable Hashing, and those algorithms are thus called RUSH algorithms.

但是,与散列大小相比,您的索引可能会超过可容忍的大小,并且您的整个索引需要重新构建.通常这不是问题,但对于巨大的数据库来说,这可能需要几天时间.

However there may be a point where your index exceeds a tolerable size compared to your hash sizes and your entire index needs to be re-built. Usually this is not a problem, but for huge-huge-huge databases, this can take days.

树算法的权衡很小,它们几乎适用于所有用例,因此是默认值.

The trade off for tree algorithms is small and they are suitable for almost every use case and thus are default.

但是,如果您有一个非常精确的用例,并且您确切地知道需要什么并且只需要什么,则可以利用散列索引.

However if you have a very precise use case and you know exactly what and only what is going to be needed, you can take advantage of hashing indexes.

相关文章