citus 之六 多表join

2022-05-07 00:00:00 专区 节点 分片 关联 广播

os: ubuntu 16.04
postgresql: 9.6.8
citus: postgresql-9.6-citus 8.0.0

ip规划如下:

192.168.0.92 pgsql1 --coordinator 节点

192.168.0.90 pgsql2 --worker 节点
192.168.0.88 pgsql3 --worker 节点

distributed table:分片表,rows会分片在 worker节点中。主要用于大量数据的事实表。
reference table:广播表,每个 worker 节点都保存一模一样的数据。主要用于维度表。

准备数据
citusdb=# select * from pg_dist_node;
nodeid | groupid | nodename | nodeport | noderack | hasmetadata | isactive | noderole | nodecluster
--------+---------+--------------+----------+----------+-------------+----------+----------+-------------
1 | 1 | 192.168.0.90 | 5432 | default | f | t | primary | default
2 | 2 | 192.168.0.88 | 5432 | default | f | t | primary | default
(2 rows)

Time: 0.744 ms


准备分片表

citusdb=# create table shard_t0(c0 varchar(100),c1 varchar(100),c2 varchar(100),c3 varchar(100));
select create_distributed_table('shard_t0','c0');
insert into shard_t0(c0,c1,c2,c3)
select id::varchar,(id*2)::varchar,md5((id)::varchar),md5(md5((id)::varchar))
from generate_series(1,1000000) as id;

citusdb=#


citusdb=# create table shard_t1(c0 varchar(100),c1 varchar(100),c2 varchar(100),c3 varchar(100));
select create_distributed_table('shard_t1','c0');
insert into shard_t1(c0,c1,c2,c3)
select id::varchar,(id*2)::varchar,md5((id)::varchar),md5(md5((id)::varchar))
from generate_series(1,1000000) as id;

citusdb=#


citusdb=# create table shard_t2(c0 varchar(100),c1 varchar(100),c2 varchar(100),c3 varchar(100));
select create_distributed_table('shard_t2','c1');
insert into shard_t2(c0,c1,c2,c3)
select id::varchar,(id*2)::varchar,md5((id)::varchar),md5(md5((id)::varchar))
from generate_series(1,1000000) as id;

citusdb=#


准备广播表

citusdb=# create table ref_t0(c0 varchar(100),c1 varchar(100),c2 varchar(100),c3 varchar(100));
select create_reference_table('ref_t0');
insert into ref_t0(c0,c1,c2,c3)
select id::varchar,(id*2)::varchar,md5((id)::varchar),md5(md5((id)::varchar))
from generate_series(1,500000) as id;

citusdb=#


citusdb=# create table ref_t1(c0 varchar(100),c1 varchar(100),c2 varchar(100),c3 varchar(100));
select create_reference_table('ref_t1');
insert into ref_t1(c0,c1,c2,c3)
select id::varchar,(id*2)::varchar,md5((id)::varchar),md5(md5((id)::varchar))
from generate_series(1,500000) as id;

citusdb=#


查看表

citusdb=# \d
List of relations
Schema | Name | Type | Owner
--------+--------------------+-------+---------
public | pg_stat_statements | view | cituser
public | ref_t0 | table | cituser
public | ref_t1 | table | cituser
public | shard_t0 | table | cituser
public | shard_t1 | table | cituser
public | shard_t2 | table | cituser
(6 rows)


分片表与分片表关联
相同分片列

citusdb=# select count(t0.*) from shard_t0 t0,shard_t1 t1 where t0.c0=t1.c0;
count
---------
1000000
(1 row)

一个表为分片列,另外一个表为非分片列

citusdb=# select count(t0.*) from shard_t0 t0,shard_t1 t1 where t0.c0=t1.c1;
ERROR: the query contains a join that requires repartitioning
HINT: Set citus.enable_repartition_joins to on to enable repartitioning

citusdb=# show citus.enable_repartition_joins;
citus.enable_repartition_joins
--------------------------------
off
(1 row)

citusdb=# set citus.enable_repartition_joins = on;
SET
citusdb=# set citus.task_executor_type='task-tracker';
SET
citusdb=# select count(t0.*) from shard_t0 t0,shard_t1 t1 where t0.c0=t1.c1;
ERROR: failed to execute task 3

为什么会出现 ERROR: failed to execute task ?

关联列均为为非分片列

citusdb=# select count(t0.*) from shard_t0 t0,shard_t1 t1 where t0.c1=t1.c1;
ERROR: the query contains a join that requires repartitioning
HINT: Set citus.enable_repartition_joins to on to enable repartitioning

citusdb=# show citus.enable_repartition_joins;
citus.enable_repartition_joins
--------------------------------
off
(1 row)

citusdb=# set citus.enable_repartition_joins = on;
SET
citusdb=# set citus.task_executor_type='task-tracker';
SET
citusdb=# select count(t0.*) from shard_t0 t0,shard_t1 t1 where t0.c1=t1.c1;
ERROR: failed to execute task 3



和前一个错误一样,是不是 citus 8.0 的硬限制或者bug ,感觉不应该啊。
这个有空再要好好研究下。

分片表与广播表关联
分片表为分片列

citusdb=# select count(t0.*) from shard_t0 t0,ref_t0 t1 where t0.c0=t1.c0;
count
--------
500000
(1 row)

分片表为非分片列

citusdb=# select count(t0.*) from shard_t0 t0,ref_t0 t1 where t0.c1=t1.c0;
count
--------
250000
(1 row)

顺利

广播表与广播表关联
citusdb=# select count(t0.*) from ref_t0 t0,ref_t1 t1 where t0.c0=t1.c0;
count
--------
500000
(1 row)

————————————————
版权声明:本文为CSDN博主「数据库人生」的原创文章,遵循CC 4.0 BY-SA版权协议,转载请附上原文出处链接及本声明。
原文链接:https://blog.csdn.net/ctypyb2002/article/details/84068774

相关文章