A First Look at CrateDB (Part 2): PARTITION, SHARDING AND REPLICATION

2022-02-25 00:00:00 Tags: partitioning, partitioned tables, routing, sharding, primary key

Partitioning
Official documentation: PARTITIONED TABLES

A partitioned table is a virtual table that can be created by naming one or more columns by which it is split into separate internal tables, called partitions.

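A partitioned table is a virtual table that is split into separate internal tables (partitions) according to the values of one or more columns.

If the table has a primary key, every partition column must be part of that primary key; a column outside the primary key cannot be used as a partition column.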
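As a minimal sketch of that rule, the table below declares a composite primary key that contains the partition column. The table name `parted_table` and its columns are illustrative assumptions, not taken from the original post.

```sql
-- Hypothetical table: "day" is the partition column, so it has to be
-- part of the primary key as well.
CREATE TABLE parted_table (
    id BIGINT,
    title TEXT,
    day TIMESTAMP WITH TIME ZONE,
    PRIMARY KEY (id, day)
) PARTITIONED BY (day);
```

No `CLUSTERED INTO` clause is given, so the number of shards per partition is left to CrateDB.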

(The shard count, num_shards, is set by the system here.)

The current partitions of a table can be inspected through information_schema.table_partitions. Since the table holds no data yet, it has 0 partitions.
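A query along the following lines lists the partitions. It assumes a partitioned table named `parted_table` (a hypothetical name); on an empty table it returns no rows.

```sql
-- One row per existing partition; "values" is quoted because it is
-- a reserved word.
SELECT partition_ident, "values", number_of_shards
FROM information_schema.table_partitions
WHERE table_name = 'parted_table'
ORDER BY partition_ident;
```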

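The table is partitioned by the day column. There are now three partitions, and each partition has 6 shards (the shard count was chosen by the database automatically; how to set it is covered below).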
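To reproduce a state like this, one option is to insert rows with three distinct values in the partition column. The sketch assumes a hypothetical table `parted_table (id, title, day)` partitioned by `day`; the data values are made up for illustration.

```sql
-- Each distinct "day" value creates its own partition on insert.
INSERT INTO parted_table (id, title, day) VALUES
    (1, 'first',  '2020-03-01'),
    (2, 'second', '2020-03-02'),
    (3, 'third',  '2020-03-03');

-- Count the partitions that now exist for the table.
SELECT count(*) AS partitions
FROM information_schema.table_partitions
WHERE table_name = 'parted_table';
```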

Limitations

The value of a partition column cannot be updated; attempting to do so raises an exception.
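The limitation can be seen with a statement like the first one below. The second half is only one possible workaround, not something the original post describes. Table and values are hypothetical (`parted_table` partitioned by `day`).

```sql
-- Rejected by CrateDB: "day" is a partition column.
UPDATE parted_table SET day = '2020-03-05' WHERE id = 1;

-- Possible workaround (an assumption): write the row into the new
-- partition, then delete it from the old one.
INSERT INTO parted_table (id, title, day) VALUES (1, 'first', '2020-03-05');
DELETE FROM parted_table WHERE id = 1 AND day = '2020-03-01';
```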

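Querying
It is best to include the partition column in the WHERE clause, so that the database can filter out partitions and run the query more efficiently.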

UPDATE, DELETE and SELECT queries are all optimized to only affect as few partitions as possible based on the partitions referenced in the WHERE clause.
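For illustration, both statements below restrict the partition column, so only the matching partition should need to be touched. `parted_table` and its `day` column are assumed names.

```sql
-- Reads only the partition for 2020-03-01.
SELECT id, title
FROM parted_table
WHERE day = '2020-03-01';

-- Affects only the partition for 2020-03-02.
DELETE FROM parted_table
WHERE day = '2020-03-02';
```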

Sharding
Official documentation: SHARDING

Shards are then distributed across the cluster. As nodes are added to the cluster, CrateDB will move shards around to achieve maximum possible distribution.

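Once a table is sharded, its shards are distributed across the cluster. When nodes are added dynamically, CrateDB rebalances the shards across the nodes to achieve the widest possible distribution. Users do not need to think about sharding when querying: table-level sharding is transparent to them.

Basic syntax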
CLUSTERED INTO <number> SHARDS
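In a full statement the clause goes at the end of `CREATE TABLE`. The table and column names below are placeholders.

```sql
-- Create a table whose data is split into 10 shards.
CREATE TABLE my_table5 (
    first_column INTEGER
) CLUSTERED INTO 10 SHARDS;
```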

Note: the number of shards can be changed at runtime, but the table is read-only until the change has completed.
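A hedged sketch of such a change is shown below. The table name is hypothetical. The exact preconditions depend on the CrateDB version: writes may have to be blocked explicitly first, and the new shard count has to be compatible with the table's routing shards. Treat the sequence as an assumption to check against the ALTER TABLE reference for your version.

```sql
-- Block writes before resizing (required in several CrateDB versions).
ALTER TABLE my_table5 SET ("blocks.write" = true);

-- Request the new number of shards.
ALTER TABLE my_table5 SET (number_of_shards = 20);

-- Allow writes again once the operation has finished.
ALTER TABLE my_table5 SET ("blocks.write" = false);
```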

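Note: for a partitioned table, changing the shard count this way applies only to new partitions; partitions that already exist are not affected.

To change a partition that already exists, that partition has to be specified explicitly.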
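Addressing a single partition is done with the `PARTITION` clause of `ALTER TABLE`. The sketch assumes a hypothetical table `parted_table` partitioned by `day` with an existing partition for 2020-03-01. As with whole tables, blocking writes and a compatible target shard count may be required depending on the version.

```sql
-- Block writes on the one partition that is being resized.
ALTER TABLE parted_table PARTITION (day = '2020-03-01')
    SET ("blocks.write" = true);

-- Change the shard count of that partition only.
ALTER TABLE parted_table PARTITION (day = '2020-03-01')
    SET (number_of_shards = 12);

-- Re-enable writes on the partition afterwards.
ALTER TABLE parted_table PARTITION (day = '2020-03-01')
    SET ("blocks.write" = false);
```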

Routing syntax
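The routing column is named with `CLUSTERED BY`, optionally combined with `INTO <number> SHARDS`. Rows with the same value in the routing column are stored in the same shard. Table and column names below are placeholders.

```sql
-- Route rows by first_column and spread them over 10 shards.
CREATE TABLE my_table6 (
    first_column INTEGER,
    second_column TEXT
) CLUSTERED BY (first_column) INTO 10 SHARDS;
```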

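If a primary key is defined, the routing column can be omitted; CrateDB routes by the primary key by default.

If a routing column is set explicitly and a primary key exists, the routing column must be one of the primary key columns.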
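Both cases can be sketched as follows. The tables are written in the style of the examples in the official documentation, but the names here are illustrative assumptions.

```sql
-- Case 1: primary key only, no CLUSTERED BY.
-- Routing falls back to the primary key.
CREATE TABLE my_table7 (
    first_column INTEGER PRIMARY KEY,
    second_column TEXT
);

-- Case 2: explicit routing column together with a composite primary key.
-- first_column is allowed because it is one of the primary key columns;
-- third_column would be rejected.
CREATE TABLE my_table8 (
    first_column INTEGER PRIMARY KEY,
    second_column TEXT PRIMARY KEY,
    third_column TEXT
) CLUSTERED BY (first_column);
```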

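Choosing the number of shards
Official documentation: SHARDING GUIDE (best practice guide for CrateDB)

The right number of shards depends on the data, the queries and the hardware ("the type of data you're processing, the types of queries you're running, and the type of hardware you're using").

According to the official documentation, a reasonable choice is a number of shards equal to the number of CPUs available to the cluster, or slightly more (a little over-allocation). However, performance degrades once most nodes hold more shards per table than they have CPUs available.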
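To compare shards per table with the CPUs on each node, the system tables can be queried roughly as below. This is a sketch for inspection only; how to interpret the numbers follows the guideline above.

```sql
-- CPUs available on each node.
SELECT name, os_info['available_processors'] AS cpus
FROM sys.nodes
ORDER BY name;

-- Shards (primaries and replicas) per table on each node.
SELECT node['name'] AS node_name, table_name, count(*) AS shards
FROM sys.shards
GROUP BY node['name'], table_name
ORDER BY node_name, table_name;
```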

The lazy approach:

Let CrateDB decide (make a best guess), which it does on the assumption that each node has two CPUs.
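In practice that means leaving out the `CLUSTERED INTO` clause and then checking what was chosen. The table name is a placeholder, and `num_shards` is just an alias for the `number_of_shards` column.

```sql
-- No CLUSTERED INTO clause: CrateDB picks the shard count itself.
CREATE TABLE my_auto_table (
    first_column INTEGER,
    second_column TEXT
);

-- Inspect the shard count that was chosen.
SELECT table_name, number_of_shards AS num_shards
FROM information_schema.tables
WHERE table_name = 'my_auto_table';
```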

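Replication

What is replication

Official documentation

Replication means that N additional copies (replicas) are stored for each primary shard. Its purpose is to improve the read performance of the database and to provide high availability.

In information_schema.tables, the num_reps column of the my_table9 table from above shows 0-1, which is CrateDB's default setting.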
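A query of roughly this shape shows the setting. The underlying column in `information_schema.tables` is `number_of_replicas`; it is aliased to `num_reps` here to match the name used in the text.

```sql
-- Replica configuration of my_table9.
SELECT table_name, number_of_replicas AS num_reps
FROM information_schema.tables
WHERE table_name = 'my_table9';
```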

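Basic syntax

The number of replicas can be set when a table is created. When it is set to 0, the console shows only a single primary copy of each shard.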
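The replica count is passed in the `WITH` clause. The table name below is a placeholder, and the follow-up `ALTER TABLE` is included only to show that the value can be changed later and may be a range.

```sql
-- Create a table without any replicas.
CREATE TABLE my_table10 (
    first_column INTEGER,
    second_column TEXT
) WITH (number_of_replicas = 0);

-- Later: switch to a range, here from 0 up to one replica per node.
ALTER TABLE my_table10 SET (number_of_replicas = '0-all');
```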

Source: https://blog.csdn.net/gxf1027/article/details/104833538
