Greenplum多级分区表

2023-03-03 00:00:00 拆分专区都是分区分区表

现实生产环境中，一些大表免不了需要进行拆分，那么这里我们介绍一下Greenplum对大表的处理方法
一般我们表都是堆表，按行存储的，Greenplum支持列存储，并且支持数据压缩

如果当你的表大到一个度，那么就需要进行逻辑上的拆分，将一张大表拆分为多张小表，这样可以提升相关表操作的效率，一般我们的行存储在返回多列的效率上比较高，所以一般OLTP都是使用行存储，列存储更适合用于统计的相关操作，所以都是用在OLAP系统下

创建一个范围分区表
 
CREATE TABLE sales (id int, date date, amt decimal(10,2))
DISTRIBUTED BY (id)
PARTITION BY RANGE (date)
( PARTITION Jan16 START (date '2016-01-01') INCLUSIVE , 
  PARTITION Feb16 START (date '2016-02-01') INCLUSIVE ,
  PARTITION Mar16 START (date '2016-03-01') INCLUSIVE ,
  PARTITION Apr16 START (date '2016-04-01') INCLUSIVE ,
  PARTITION May16 START (date '2016-05-01') INCLUSIVE ,
  PARTITION Jun16 START (date '2016-06-01') INCLUSIVE ,
  PARTITION Jul16 START (date '2016-07-01') INCLUSIVE ,
  PARTITION Aug16 START (date '2016-08-01') INCLUSIVE ,
  PARTITION Sep16 START (date '2016-09-01') INCLUSIVE ,
  PARTITION Oct16 START (date '2016-10-01') INCLUSIVE ,
  PARTITION Nov-16 START (date '2016-11-01') INCLUSIVE ,
  PARTITION Dec16 START (date '2016-12-01') INCLUSIVE 
                  END (date '2017-01-01') EXCLUSIVE );
 
insert into sales values (1,'2016-01-31',1.02);
select * from sales_1_prt_jan16;
 id |    date    | amt  
----+------------+------
  1 | 2016-01-31 | 1.02
 
insert into sales values (1,'2016-03-02',1.02);
 
select * from sales_1_prt_mar16;
 id |    date    | amt  
----+------------+------
  1 | 2016-03-02 | 1.02  
 
这里没有定义end的值，那么结束值就是下一个开始值，但是不包含这个值
 
创建一个list分区
CREATE TABLE rank (id int, rank int, year int, gender 
char(1), count int ) 
DISTRIBUTED BY (id)
PARTITION BY LIST (gender)
( PARTITION girls VALUES ('F'), 
  PARTITION boys VALUES ('M'), 
  DEFAULT PARTITION other );
 
list之前一篇文章详细介绍过,不再细讲
 
 
创建Multi-level分区表
CREATE TABLE sales (trans_id int, date date, amount 
decimal(9,2), region text) 
DISTRIBUTED BY (trans_id)
PARTITION BY RANGE (date)
SUBPARTITION BY LIST (region)
SUBPARTITION TEMPLATE
( SUBPARTITION usa VALUES ('usa'), 
  SUBPARTITION asia VALUES ('asia'), 
  SUBPARTITION europe VALUES ('europe'), 
  DEFAULT SUBPARTITION other_regions)
  (START (date '2011-01-01') INCLUSIVE
   END (date '2012-01-01') EXCLUSIVE
   EVERY (INTERVAL '1 month'), 
   DEFAULT PARTITION outlying_dates );
 
这里先按date范围分区，间隔按月，然后又按region进行list分区
warehouse=# \dt+ sales_1_prt_3*  --经过查证，后面插入22年2月的数据都是落在这个范围分区
                                  List of relations
 Schema |               Name                | Type  |  Owner  | Storage | Description 
--------+-----------------------------------+-------+---------+---------+-------------
 public | sales_1_prt_3                     | table | gpadmin | heap    | 
 public | sales_1_prt_3_2_prt_asia          | table | gpadmin | heap    | 
 public | sales_1_prt_3_2_prt_europe        | table | gpadmin | heap    | 
 public | sales_1_prt_3_2_prt_other_regions | table | gpadmin | heap    | 
 public | sales_1_prt_3_2_prt_usa           | table | gpadmin | heap    | 
 
插入2011年不同region举例
warehouse=#  insert into sales VALUES (1,'2011-02-22',3.96,'hank');
INSERT 0 1
warehouse=# select * from sales;
 trans_id |    date    | amount | region 
----------+------------+--------+--------
        1 | 2011-02-22 |   3.96 | hank
(1 row)
 
warehouse=#  insert into sales VALUES (1,'2011-02-22',3.96,'usa');
INSERT 0 1
 
warehouse=# select * from sales_1_prt_3;
 trans_id |    date    | amount | region 
----------+------------+--------+--------
        1 | 2011-02-22 |   3.96 | usa
        1 | 2011-02-22 |   3.96 | hank
(2 rows)
 
warehouse=# select * from sales_1_prt_3_2_prt_usa;
 trans_id |    date    | amount | region 
----------+------------+--------+--------
        1 | 2011-02-22 |   3.96 | usa
 
warehouse=# select * from sales_1_prt_3_2_prt_other_regions;
 trans_id |    date    | amount | region 
----------+------------+--------+--------
        1 | 2011-02-22 |   3.96 | hank
 
大于2011年的数据举例：
warehouse=# insert into sales VALUES (3,'2012-03-31',3.96,'asia');
warehouse=# select * from sales_1_prt_outlying_dates;
 trans_id |    date    | amount | region 
----------+------------+--------+--------
        3 | 2012-03-31 |   3.96 | asia
(1 row)
 
warehouse=# select * from sales_1_prt_outlying_dates_2_prt_asia;
 trans_id |    date    | amount | region 
----------+------------+--------+--------
        3 | 2012-03-31 |   3.96 | asia
 
 
创建three_level分区表
CREATE TABLE p3_sales (id int, year int, month int, day int, 
region text)
DISTRIBUTED BY (id)
PARTITION BY RANGE (year)
    SUBPARTITION BY RANGE (month)
       SUBPARTITION TEMPLATE (
        START (1) END (13) EVERY (1), 
        DEFAULT SUBPARTITION other_months )
           SUBPARTITION BY LIST (region)
             SUBPARTITION TEMPLATE (
               SUBPARTITION usa VALUES ('usa'),
               SUBPARTITION europe VALUES ('europe'),
               SUBPARTITION asia VALUES ('asia'),
               DEFAULT SUBPARTITION other_regions )
( START (2002) END (2012) EVERY (1), 
  DEFAULT PARTITION outlying_years );

本文来源：https://blog.csdn.net/dazuiba008/article/details/79993710

相关文章