PolarDB-X 存储引擎组件 GalaxyEngine 开源技术解读
PolarDB-X 是由阿里巴巴自主研发的云原生分布式数据库,融合了分布式 SQL 引擎 GalaxySQL 和分布式存储引擎 GalaxyEngine,其中 GalaxyEngine 是新一代面向分布式场景的 MySQL 发行版本,作为官方 MySQL 版本的一个分支,除了吸收和跟进官方版本的持续改进以外,尤其在分布式场景下,实现了 Lizard 分布式事务和全局一致性解决方案、 Galaxy X-Protocol 交互协议 pipeline request、 X-Engine 存储引擎、 Galaxy X-Paxos Cluster 保证数据零丢失并持续可用,以及共享的 RDS MySQL 内核企业级功能等,GalaxyEngine 遵循 GPLv2 License,希望通过开源,回馈社区,共建 MySQL 生态繁荣。
GalaxyEngine 开源的技术细节:
1. Lizard 分布式事务和全局一致性解决方案
1.1 问题背景
1.2 Lizard SCN 单机事务系统
MySQL InnoDB 事务系统
1.从 Active Trx 拷贝一个 Read View;
2.检索行记录,根据 TID 到 Read View 中进行二分查找,判断可见性。
Lizard SCN 事务系统
Delayed Record Cleanout
Flashback Query
SELECT ... FROM tablename
AS OF [SCN | TIMESTAMP] expr;
mysql> SELECT * FROM tab AS OF TIMESTAMP '2020-12-17 16:40:40';
+----+---------+---------------------+
| id | version | gmt_modify |
+----+---------+---------------------+
| 1 | 1 | 2020-12-17 16:40:38 |
| 2 | 1 | 2020-12-17 16:40:39 |
+----+---------+---------------------+
mysql> SELECT * FROM tab AS OF TIMESTAMP '2020-12-17 16:40:55';
+----+---------+---------------------+
| id | version | gmt_modify |
+----+---------+---------------------+
| 1 | 2 | 2020-12-17 16:40:54 |
| 2 | 2 | 2020-12-17 16:40:54 |
+----+---------+---------------------+
Lizard SCN 性能表现
1.3 Lizard GCN 分布式事务系统
两阶段提交
SET SESSION innodb_commit_seq = [GCN];
XA START xid
......
XA COMMIT xid;
XA START xid;
......
XA COMMIT xid $GCN;
全局一致性
SET SESSION innodb_snapshot_seq = [GCN]
SELECT ... FROM tablename
AS OF GCN expr;
TSO 解决方案
CREATE SEQUENCE [IF NOT EXISTS] schema.sequence_name
[START WITH <constant>]
[MINVALUE <constant>]
[MAXVALUE <constant>]
[INCREMENT BY <constant>]
[CACHE <constant> | NOCACHE]
[CYCLE | NOCYCLE]
;
1. SELECT [nextval | currval]
FROM seq;
2. SELECT [nextval(seq) | currval(seq)];
3. SELECT [seq.currval | seq.nextval]
FROM dual;
CREATE SEQUENCE [IF NOT EXISTS] schema.sequence_name
[CACHE <constant> | NOCACHE]
TIMESTEAMP;
XA 事务完整性
2. Galaxy X-protocol 交互协议
消息格式
消息处理模型
3. X-Engine 存储引擎
X-Engine 整体架构
1. 利用LSM-tree先天优势,提升了系统写性能天花板。 2. 优化LSM-tree的compaction操作,以降低对系统性能的冲击,使得系统性能表现趋于平稳。
3. 利用持久化数据层只读特点,发挥压缩优势降低成本。 4. 利用天然分层结构,结合硬件能力使用冷热分层结构,降低综合成本。
5. 利用精细化访问机制和缓存技术,弥补LSM-tree读性能短板。
X-Engine 使用方法
建表语法
CREATE TABLE t1 (
c1 int PRIMARY KEY,
c2 int
) ENGINE = xengine;
X-Engine 性能水准
4. RDS MySQL 企业级功能
4.1 Purge Large Table Asynchronously
mysql> select * from information_schema.innodb_purge_files;
+--------+---------------------+--------------------+---------------+-------------------------+--------------+
| log_id | start_time | original_path | original_size | temporary_path | current_size |
+--------+---------------------+--------------------+---------------+-------------------------+--------------+
| | 2021-05-14 14:40:01 | ./file_purge/t.ibd | 146800640 | ./#FP_210514 14:40:01_9 | 79691776 |
+--------+---------------------+--------------------+---------------+-------------------------+--------------+
1 row in set (.20 sec)
4.2 Recycle Bin
Recycle Bin 机制介绍
回收机制:
如果是与表无关的对象,根据操作语句决定是否保留,不做回收。 如果是表的附属对象,可能会修改表数据的,做删除处理,例如 Trigger 和 Foreign key。但 Column statistics 不做清理,随表进入回收站。
清理机制:
权限机制:
命名机制:
Recycle Bin 相关参数
Recycle Bin 接口设计
-- Purge Table
dbms_recycle.purge_table('<TABLE>');
-- Restore table
dbms_recycle.restore_table('<RECYCLE_TABLE>','<DEST_DB>','<DEST_TABLE>');
-- Show tables
mysql> call dbms_recycle.show_tables();
+-----------------+---------------+---------------+--------------+---------------------+---------------------+
| SCHEMA | TABLE | ORIGIN_SCHEMA | ORIGIN_TABLE | RECYCLED_TIME | PURGE_TIME |
+-----------------+---------------+---------------+--------------+---------------------+---------------------+
| __recycle_bin__ | __innodb_1063 | product_db | t1 | 2019-08-08 11:01:46 | 2019-08-15 11:01:46 |
| __recycle_bin__ | __innodb_1064 | product_db | t2 | 2019-08-08 11:01:46 | 2019-08-15 11:01:46 |
| __recycle_bin__ | __innodb_1065 | product_db | parent | 2019-08-08 11:01:46 | 2019-08-15 11:01:46 |
| __recycle_bin__ | __innodb_1066 | product_db | child | 2019-08-08 11:01:46 | 2019-08-15 11:01:46 |
+-----------------+---------------+---------------+--------------+---------------------+---------------------+
4 rows in set (0.00 sec)
4.3 Returning
DBMS_TRANS.returning(<Field_list>,<Statement>);
mysql> call dbms_trans.returning("*", "insert into t(id) values(NULL),(NULL)");
+----+------+---------------------+
| id | col1 | col2 |
+----+------+---------------------+
| 1 | 1 | 2019-09-03 10:39:05 |
| 2 | 1 | 2019-09-03 10:39:05 |
+----+------+---------------------+
2 rows in set (.01 sec)
mysql> call dbms_trans.returning("id, col1, col2", "update t set col1 = 2 where id >2");
+----+------+---------------------+
| id | col1 | col2 |
+----+------+---------------------+
| 3 | 2 | 2019-09-03 10:41:06 |
| 4 | 2 | 2019-09-03 10:41:06 |
+----+------+---------------------+
2 rows in set (.01 sec)
mysql> call dbms_trans.returning("id, col1, col2", "delete from t where id < 3");
+----+------+---------------------+
| id | col1 | col2 |
+----+------+---------------------+
| 1 | 1 | 2019-09-03 10:40:55 |
| 2 | 1 | 2019-09-03 10:40:55 |
+----+------+---------------------+
2 rows in set (.00 sec)
4.4 Statement Concurrency Control
功能设计
SQL command: SQL命令类型,例如SELECT、UPDATE、INSERT、DELETE等。 Object: SQL命令操作的对象,例如TABLE、VIEW等。
keywords: SQL命令的关键字。
接口设计
-- Add Rule
dbms_ccl.add_ccl_rule('<Type>','<Schema_name>','<Table_name>',<Concurrency_count>,'<Keywords>');
-- Delete Rule
dbms_ccl.del_ccl_rule(<Id>);
-- Show Rule
dbms_ccl.show_ccl_rule();
-- Flush Rule
dbms_ccl.flush_ccl_rule();
4.5 Statement Outline
功能设计
Optimizer Hint 根据作用域和 hint 对象,分为 Global level hint、Table/Index level hint、Join order hint等。 Index Hint 根据 Index Hint 的类型和范围进行分类。
outline
表中。接口设计
-- Add optimizer Outline
dbms_outln.add_optimizer_outline('<Schema_name>','<Digest>','<query_block>','<hint>','<query>');
-- Add Index Outline
dbms_outln.add_index_outline('<Schema_name>','<Digest>',<Position>,'<Type>','<Hint>','<Scope>','<Query>');
-- Delete Outline
dbms_outln.del_outline(<Id>);
-- Show Outline
mysql> call dbms_outln.show_outline();
+------+------------+------------------------------------------------------------------+-----------+-------+------+-------------------------------------------------------+------+----------+-------------------------------------------------------------------------------------+
| ID | SCHEMA | DIGEST | TYPE | SCOPE | POS | HINT | HIT | OVERFLOW | DIGEST_TEXT |
+------+------------+------------------------------------------------------------------+-----------+-------+------+-------------------------------------------------------+------+----------+-------------------------------------------------------------------------------------+
| 33 | outline_db | 36bebc61fce7e32b93926aec3fdd790dad5d895107e2d8d3848d1c60b74bcde6 | OPTIMIZER | | 1 | /*+ SET_VAR(foreign_key_checks=OFF) */ | 1 | | SELECT * FROM `t1` WHERE `id` = ? |
| 32 | outline_db | 36bebc61fce7e32b93926aec3fdd790dad5d895107e2d8d3848d1c60b74bcde6 | OPTIMIZER | | 1 | /*+ MAX_EXECUTION_TIME(1000) */ | 2 | | SELECT * FROM `t1` WHERE `id` = ? |
| 34 | outline_db | d4dcef634a4a664518e5fb8a21c6ce9b79fccb44b773e86431eb67840975b649 | OPTIMIZER | | 1 | /*+ BNL(t1,t2) */ | 1 | | SELECT `t1` . `id` , `t2` . `id` FROM `t1` , `t2` |
| 35 | outline_db | 5a726a609b6fbfb76bb8f9d2a24af913a2b9d07f015f2ee1f6f2d12dfad72e6f | OPTIMIZER | | 2 | /*+ QB_NAME(subq1) */ | 2 | | SELECT * FROM `t1` WHERE `t1` . `col1` IN ( SELECT `col1` FROM `t2` ) |
| 36 | outline_db | 5a726a609b6fbfb76bb8f9d2a24af913a2b9d07f015f2ee1f6f2d12dfad72e6f | OPTIMIZER | | 1 | /*+ SEMIJOIN(@subq1 MATERIALIZATION, DUPSWEEDOUT) */ | 2 | | SELECT * FROM `t1` WHERE `t1` . `col1` IN ( SELECT `col1` FROM `t2` ) |
| 30 | outline_db | b4369611be7ab2d27c85897632576a04bc08f50b928a1d735b62d0a140628c4c | USE INDEX | | 1 | ind_1 | 3 | | SELECT * FROM `t1` WHERE `t1` . `col1` = ? AND `t1` . `col2` = ? |
| 31 | outline_db | 33c71541754093f78a1f2108795cfb45f8b15ec5d6bff76884f4461fb7f33419 | USE INDEX | | 2 | ind_2 | 1 | | SELECT * FROM `t1` , `t2` WHERE `t1` . `col1` = `t2` . `col1` AND `t2` . `col2` = ? |
+------+------------+------------------------------------------------------------------+-----------+-------+------+-------------------------------------------------------+------+----------+-------------------------------------------------------------------------------------+
7 rows in set (0.00 sec)
4.6 Performance Insight
Performance Insight 介绍
Object statisticsObject statistics查询表和索引的统计信息,包括如下两个表:
TABLE_STATISTICS:记录读取和修改的行。 INDEX_STATISTICS:记录索引的读取行。
Performance pointPerformance point 提供实例的详细性能信息,方便您更快更准确地量化SQL的开销。Performance point包括如下三个维度:
CPU:包括执行任务的总时间(Elapsed time)、CPU执行任务的时间(CPU time)等。 LOCK:包括服务器MDL锁时间、存储事务锁时间、互斥冲突(仅调试模式)、读写锁冲突等。 IO:数据文件读写时间、日志文件写入时间、逻辑读取、物理读取、物理异步读取等。
Object statistics 使用方法
确认参数 OPT_TABLESTAT 和 OPT_INDEXSTAT 的值为 ON。示例如下:
mysql> show variables like "opt_%_stat";
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| opt_indexstat | ON |
| opt_tablestat | ON |
+---------------+-------+
mysql> select * from TABLE_STATISTICS limit 10;
+--------------+--------------+-----------+--------------+------------------------+---------------+--------------+--------------+
| TABLE_SCHEMA | TABLE_NAME | ROWS_READ | ROWS_CHANGED | ROWS_CHANGED_X_INDEXES | ROWS_INSERTED | ROWS_DELETED | ROWS_UPDATED |
+--------------+--------------+-----------+--------------+------------------------+---------------+--------------+--------------+
| mysql | db | 2 | 0 | | 0 | | 0 |
| mysql | engine_cost | 2 | | 0 | | 0 | |
| mysql | proxies_priv | 1 | 0 | | 0 | | 0 |
| mysql | server_cost | 6 | | 0 | | 0 | |
| mysql | tables_priv | 2 | 0 | | 0 | | 0 |
| mysql | user | 7 | | 0 | | 0 | |
| test | sbtest1 | 1686 | 142 | 184 | 112 | 12 | 18 |
| test | sbtest1 | 1806 | 125 | 150 | 105 | 5 | 15 |
| test | sbtest100 | 1623 | 141 | 182 | 110 | 10 | 21 |
| test | sbtest11 | 1254 | 136 | 172 | 110 | 10 | 16 |
+--------------+--------------+-----------+--------------+------------------------+---------------+--------------+--------------+
mysql> select * from INDEX_STATISTICS limit 10;
+--------------+--------------+------------+-----------+
| TABLE_SCHEMA | TABLE_NAME | INDEX_NAME | ROWS_READ |
+--------------+--------------+------------+-----------+
| mysql | db | PRIMARY | 2 |
| mysql | engine_cost | PRIMARY | 2 |
| mysql | proxies_priv | PRIMARY | 1 |
| mysql | server_cost | PRIMARY | 6 |
| mysql | tables_priv | PRIMARY | 2 |
| mysql | user | PRIMARY | 7 |
| test | sbtest1 | PRIMARY | 2500 |
| test | sbtest10 | PRIMARY | 3007 |
| test | sbtest10 | PRIMARY | 2642 |
| test | sbtest11 | PRIMARY | 2091 |
+--------------+--------------+------------+-----------+
Performance point 使用方法
mysql> show variables like "%performance_point%";
+---------------------------------------+-------+
| Variable_name | Value |
+---------------------------------------+-------+
| performance_point_dbug_enabled | OFF |
| performance_point_enabled | ON |
| performance_point_iostat_interval | 2 |
| performance_point_iostat_volume_size | 10000 |
| performance_point_lock_rwlock_enabled | ON |
+---------------------------------------+-------+
mysql> select * from events_statements_summary_by_digest_supplement limit 10;
+--------------------+----------------------------------+-------------------------------------------+--------------+
| SCHEMA_NAME | DIGEST | DIGEST_TEXT | ELAPSED_TIME | ......
+--------------------+----------------------------------+-------------------------------------------+--------------+
| NULL | 6b787dd1f9c6f6c5033120760a1a82de | SELECT @@`version_comment` LIMIT ? | 932 |
| NULL | 2fb4341654df6995113d998c52e5abc9 | SHOW SCHEMAS | 2363 |
| NULL | 8a93e76a7846384621567fb4daa1bf95 | SHOW VARIABLES LIKE ? | 17933 |
| NULL | dd148234ac7a20cb5aee7720fb44b7ea | SELECT SCHEMA ( ) | 1006 |
| information_schema | 2fb4341654df6995113d998c52e5abc9 | SHOW SCHEMAS | 2156 |
| information_schema | 74af182f3a2bd265678d3dadb53e08da | SHOW TABLES | 3161 |
| information_schema | d3a66515192fcb100aaef6f8b6e45603 | SELECT * FROM `TABLE_STATISTICS` LIMIT ? | 2081 |
| information_schema | b3726b7c4c4db4b309de2dbc45ff52af | SELECT * FROM `INDEX_STATISTICS` LIMIT ? | 2384 |
| information_schema | dd148234ac7a20cb5aee7720fb44b7ea | SELECT SCHEMA ( ) | 129 |
| test | 2fb4341654df6995113d998c52e5abc9 | SHOW SCHEMAS | 342 |
+--------------------+----------------------------------+-------------------------------------------+--------------+
相关文章