A First Look at CrateDB (Part 2): Partitioning, Sharding and Replication
Partitioning
Official documentation: PARTITIONED TABLES
A partitioned table is a virtual table that can be created by naming one or more columns by which it is split into separate internal tables, called partitions.
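In other words, a partitioned table is a virtual table that is split into separate internal tables (partitions).
If the table has a primary key, every partition column must be part of that primary key; a non-primary-key column cannot be used for partitioning.
The table is partitioned by the day column and currently has three partitions, each with 6 shards (the shard count here was chosen automatically by the database; how to set it explicitly is covered below).
Limitations
Queries
It is best to filter on the partition column when querying, so that the database can skip irrelevant partitions and answer the query more efficiently.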
UPDATE, DELETE and SELECT queries are all optimized to only affect as few partitions as possible based on the partitions referenced in the WHERE clause.
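To make the effect of partition pruning concrete, here is a minimal Python sketch. It is a toy model, not CrateDB's implementation: the `PartitionedTable` class, the sample dates, and the `value` column are all hypothetical and exist only to show why a filter on the partition column reduces the number of partitions that have to be scanned.

```python
from collections import defaultdict


class PartitionedTable:
    """Toy model of a table partitioned by a single column (hypothetical)."""

    def __init__(self, partition_column):
        self.partition_column = partition_column
        # One internal "table" (a plain list of rows) per partition value.
        self.partitions = defaultdict(list)

    def insert(self, row):
        # A row lands in the partition named by its partition-column value.
        self.partitions[row[self.partition_column]].append(row)

    def select(self, predicate, partition_value=None):
        """Return (matching rows, number of partitions scanned)."""
        if partition_value is None:
            # No filter on the partition column: every partition is scanned.
            candidates = list(self.partitions.values())
        else:
            # Filter on the partition column: only that partition is scanned.
            candidates = [self.partitions.get(partition_value, [])]
        rows = [r for part in candidates for r in part if predicate(r)]
        return rows, len(candidates)


# Hypothetical sample data: three days, so three partitions.
table = PartitionedTable("day")
for day in ("2020-03-10", "2020-03-11", "2020-03-12"):
    for i in range(3):
        table.insert({"day": day, "value": i})

rows_all, scanned_all = table.select(lambda r: r["value"] == 1)
rows_one, scanned_one = table.select(
    lambda r: r["value"] == 1, partition_value="2020-03-11"
)
print(scanned_all, scanned_one)  # prints "3 1"
```

The query without a partition filter has to look into all three partitions, while the query that names a day touches exactly one.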
Sharding
Official documentation: SHARDING
Shards are then distributed across the cluster. As nodes are added to the cluster, CrateDB will move shards around to achieve maximum possible distribution.
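After a table is sharded, its shards are spread across the cluster. When nodes are added at runtime, CrateDB rebalances the shards across nodes to reach the widest possible distribution. Users do not need to think about sharding when querying: at the table level, sharding is transparent to the user.
Basic syntax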
CLUSTERED INTO <number> SHARDS
Note: the number of shards can be changed at runtime, but the table stays read-only until the change has finished.
Routing syntax
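As far as I understand the official sharding documentation, a row is routed with the formula shard number = hash(routing column) % total primary shards, so rows with the same routing value always end up in the same shard. The Python sketch below only illustrates that formula. The `shard_for` helper and the sample keys are hypothetical, and `zlib.crc32` is a stand-in hash chosen for the example; it is not the hash CrateDB uses.

```python
import zlib


def shard_for(routing_value, total_primary_shards):
    """Map a routing value to a shard number in [0, total_primary_shards)."""
    # zlib.crc32 is an illustrative stand-in, NOT CrateDB's real hash function.
    digest = zlib.crc32(str(routing_value).encode("utf-8"))
    return digest % total_primary_shards


NUM_SHARDS = 6  # assumed shard count for the example
placement = {user: shard_for(user, NUM_SHARDS) for user in ("alice", "bob", "carol")}
print(placement)

# The same routing value is always mapped to the same shard.
print(shard_for("alice", NUM_SHARDS) == shard_for("alice", NUM_SHARDS))
```

Because the shard number depends on the total number of primary shards, changing that number changes where existing keys would be routed, which is one way to see why resharding is not a trivial operation.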
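Choosing the number of shards
Official documentation: SHARDING GUIDE (best practice guide for CrateDB)
The right number of shards depends on the type of data you are processing, the types of queries you are running, and the type of hardware you are using.
According to the official documentation, a reasonable choice is a shard count equal to, or slightly above (a little over-allocation), the number of CPUs available to the cluster.
However, performance degrades once most nodes hold more shards per table than they have CPUs available.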
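The rule of thumb above can be turned into a small check. This is only a sketch under stated assumptions: the `classify_allocation` function, its labels, and the `slack` threshold of 1.5 are my own illustrative choices and do not come from the CrateDB documentation, which only says that slight over-allocation is fine and heavy over-allocation hurts. Replica shards are ignored for simplicity.

```python
def classify_allocation(total_shards, nodes, cpus_per_node, slack=1.5):
    """Classify a table's shard count against the cluster's CPU count.

    `slack` is a hypothetical tolerance for "a little over-allocation";
    it is not a value taken from the CrateDB documentation.
    """
    total_cpus = nodes * cpus_per_node
    if total_shards < total_cpus:
        # Fewer shards than CPUs: some CPUs cannot work on this table.
        return "under-allocated"
    if total_shards <= total_cpus * slack:
        # Equal to or slightly above the CPU count.
        return "reasonable"
    # Far more shards per table than CPUs: expect degraded performance.
    return "over-allocated"


# Assumed cluster: 3 nodes with 4 CPUs each, i.e. 12 CPUs in total.
for shards in (6, 12, 14, 48):
    print(shards, classify_allocation(shards, nodes=3, cpus_per_node=4))
```

In practice the threshold should come from measurements on your own data and queries, since the guide ties the optimum to data, query, and hardware type.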
Replication
What is replication?
Querying information_schema.tables for the my_table9 table above shows 0-1 in the num_reps column, which is CrateDB's default replication setting.
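A value such as 0-1 is a range: the minimum and maximum number of replicas, with the actual number depending on how many nodes the cluster has. The Python sketch below is a simplified model of that behaviour, not CrateDB's code. The helper names `parse_replicas` and `effective_replicas` are hypothetical, and the model assumes that a replica is never placed on the same node as its primary.

```python
def parse_replicas(setting):
    """Parse a fixed value like '1' or a range like '0-1' into (min, max).

    A maximum of None stands for the open-ended form '0-all'.
    """
    text = str(setting)
    if "-" in text:
        low, high = text.split("-", 1)
        maximum = None if high == "all" else int(high)
        return int(low), maximum
    return int(text), int(text)


def effective_replicas(setting, nodes):
    """Target replica count for a cluster of `nodes` nodes (simplified model)."""
    low, high = parse_replicas(setting)
    # Assumption: a replica never shares a node with its primary shard.
    possible = max(nodes - 1, 0)
    if high is not None:
        possible = min(possible, high)
    # The configured minimum is still required, even if it cannot be placed yet.
    return max(possible, low)


print(effective_replicas("0-1", nodes=1))    # single node: no replica
print(effective_replicas("0-1", nodes=3))    # several nodes: one replica
print(effective_replicas("0-all", nodes=4))  # one replica per additional node
```

Under this model the default 0-1 means a single-node cluster keeps no replicas, and one replica per shard appears as soon as a second node joins.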
Basic syntax
Source: https://blog.csdn.net/gxf1027/article/details/104833538