Greenplum多级分区表

2023-03-03 00:00:00 拆分 专区 都是 分区 分区表

现实生产环境中,一些大表免不了需要进行拆分,那么这里我们介绍一下Greenplum对大表的处理方法
一般我们表都是堆表,按行存储的,Greenplum支持列存储,并且支持数据压缩


如果当你的表大到一个度,那么就需要进行逻辑上的拆分,将一张大表拆分为多张小表,这样可以提升相关表操作的效率,一般我们的行存储在返回多列的效率上比较高,所以一般OLTP都是使用行存储,列存储更适合用于统计的相关操作,所以都是用在OLAP系统下

创建一个范围分区表

CREATE TABLE sales (id int, date date, amt decimal(10,2))
DISTRIBUTED BY (id)
PARTITION BY RANGE (date)
( PARTITION Jan16 START (date '2016-01-01') INCLUSIVE ,
PARTITION Feb16 START (date '2016-02-01') INCLUSIVE ,
PARTITION Mar16 START (date '2016-03-01') INCLUSIVE ,
PARTITION Apr16 START (date '2016-04-01') INCLUSIVE ,
PARTITION May16 START (date '2016-05-01') INCLUSIVE ,
PARTITION Jun16 START (date '2016-06-01') INCLUSIVE ,
PARTITION Jul16 START (date '2016-07-01') INCLUSIVE ,
PARTITION Aug16 START (date '2016-08-01') INCLUSIVE ,
PARTITION Sep16 START (date '2016-09-01') INCLUSIVE ,
PARTITION Oct16 START (date '2016-10-01') INCLUSIVE ,
PARTITION Nov-16 START (date '2016-11-01') INCLUSIVE ,
PARTITION Dec16 START (date '2016-12-01') INCLUSIVE
END (date '2017-01-01') EXCLUSIVE );

insert into sales values (1,'2016-01-31',1.02);
select * from sales_1_prt_jan16;
id | date | amt
----+------------+------
1 | 2016-01-31 | 1.02

insert into sales values (1,'2016-03-02',1.02);

select * from sales_1_prt_mar16;
id | date | amt
----+------------+------
1 | 2016-03-02 | 1.02

这里没有定义end的值,那么结束值就是下一个开始值,但是不包含这个值

创建一个list分区
CREATE TABLE rank (id int, rank int, year int, gender
char(1), count int )
DISTRIBUTED BY (id)
PARTITION BY LIST (gender)
( PARTITION girls VALUES ('F'),
PARTITION boys VALUES ('M'),
DEFAULT PARTITION other );

list之前一篇文章详细介绍过,不再细讲


创建Multi-level分区表
CREATE TABLE sales (trans_id int, date date, amount
decimal(9,2), region text)
DISTRIBUTED BY (trans_id)
PARTITION BY RANGE (date)
SUBPARTITION BY LIST (region)
SUBPARTITION TEMPLATE
( SUBPARTITION usa VALUES ('usa'),
SUBPARTITION asia VALUES ('asia'),
SUBPARTITION europe VALUES ('europe'),
DEFAULT SUBPARTITION other_regions)
(START (date '2011-01-01') INCLUSIVE
END (date '2012-01-01') EXCLUSIVE
EVERY (INTERVAL '1 month'),
DEFAULT PARTITION outlying_dates );

这里先按date范围分区,间隔按月,然后又按region进行list分区
warehouse=# \dt+ sales_1_prt_3* --经过查证,后面插入22年2月的数据都是落在这个范围分区
List of relations
Schema | Name | Type | Owner | Storage | Description
--------+-----------------------------------+-------+---------+---------+-------------
public | sales_1_prt_3 | table | gpadmin | heap |
public | sales_1_prt_3_2_prt_asia | table | gpadmin | heap |
public | sales_1_prt_3_2_prt_europe | table | gpadmin | heap |
public | sales_1_prt_3_2_prt_other_regions | table | gpadmin | heap |
public | sales_1_prt_3_2_prt_usa | table | gpadmin | heap |

插入2011年不同region举例
warehouse=# insert into sales VALUES (1,'2011-02-22',3.96,'hank');
INSERT 0 1
warehouse=# select * from sales;
trans_id | date | amount | region
----------+------------+--------+--------
1 | 2011-02-22 | 3.96 | hank
(1 row)

warehouse=# insert into sales VALUES (1,'2011-02-22',3.96,'usa');
INSERT 0 1

warehouse=# select * from sales_1_prt_3;
trans_id | date | amount | region
----------+------------+--------+--------
1 | 2011-02-22 | 3.96 | usa
1 | 2011-02-22 | 3.96 | hank
(2 rows)

warehouse=# select * from sales_1_prt_3_2_prt_usa;
trans_id | date | amount | region
----------+------------+--------+--------
1 | 2011-02-22 | 3.96 | usa

warehouse=# select * from sales_1_prt_3_2_prt_other_regions;
trans_id | date | amount | region
----------+------------+--------+--------
1 | 2011-02-22 | 3.96 | hank

大于2011年的数据举例:
warehouse=# insert into sales VALUES (3,'2012-03-31',3.96,'asia');
warehouse=# select * from sales_1_prt_outlying_dates;
trans_id | date | amount | region
----------+------------+--------+--------
3 | 2012-03-31 | 3.96 | asia
(1 row)

warehouse=# select * from sales_1_prt_outlying_dates_2_prt_asia;
trans_id | date | amount | region
----------+------------+--------+--------
3 | 2012-03-31 | 3.96 | asia


创建three_level分区表
CREATE TABLE p3_sales (id int, year int, month int, day int,
region text)
DISTRIBUTED BY (id)
PARTITION BY RANGE (year)
SUBPARTITION BY RANGE (month)
SUBPARTITION TEMPLATE (
START (1) END (13) EVERY (1),
DEFAULT SUBPARTITION other_months )
SUBPARTITION BY LIST (region)
SUBPARTITION TEMPLATE (
SUBPARTITION usa VALUES ('usa'),
SUBPARTITION europe VALUES ('europe'),
SUBPARTITION asia VALUES ('asia'),
DEFAULT SUBPARTITION other_regions )
( START (2002) END (2012) EVERY (1),
DEFAULT PARTITION outlying_years );



本文来源:https://blog.csdn.net/dazuiba008/article/details/79993710

相关文章