MySQL处理海量数据

2022-06-10 00:00:00 partitioning mysql

目前,我的应用程序每小时大约生成4000万条记录,我已经为每小时创建了一个分区,这样我就可以更容易地在需要时删除该分区,还可以使用该分区来聚合数据。

如果没有发生任何查询,我的聚合过程将运行得很好,但一旦启动这些查询,聚合代码就需要一个多小时才能完成。

在MySQL中,是否有基于对数据库发生的查询而冻结且不影响的进程?

回复@Rick

  • 内存:32 GB
  • Innodb_Buffer_Pool_Size:20 GB
  • 固态硬盘:是
  • 读取类型:它是 GROUP BY和UPDATE混合覆盖主键

我不想每隔5分钟进行一次聚合,因为这也会生成大量记录,应用程序无法实现,我实际上每小时保存5个分区并运行最旧的分区,我的应用程序至少需要5小时的非聚合数据。

对于我的应用程序,不需要ACID类型的特征,因此将默认隔离更改为Read-Unmitted,并将自动提交更改为0,这提高了聚合代码的运行速度,但插入受到了影响,这需要2秒以上的时间。

此处更新聚合查询的配置文件信息

+----------+-----+---------------------------+-----------+------------+------------+-------------------+---------------------+--------------+---------------+---------------+-------------------+-------------------+-------------------+-------+-----------------------+----------------------+-------------+
| QUERY_ID | SEQ | STATE                     | DURATION  | CPU_USER   | CPU_SYSTEM | CONTEXT_VOLUNTARY | CONTEXT_INVOLUNTARY | BLOCK_OPS_IN | BLOCK_OPS_OUT | MESSAGES_SENT | MESSAGES_RECEIVED | PAGE_FAULTS_MAJOR | PAGE_FAULTS_MINOR | SWAPS | SOURCE_FUNCTION       | SOURCE_FILE          | SOURCE_LINE |
+----------+-----+---------------------------+-----------+------------+------------+-------------------+---------------------+--------------+---------------+---------------+-------------------+-------------------+-------------------+-------+-----------------------+----------------------+-------------+
|    50754 |   2 | continuing inside routine |  0.000015 |   0.000197 |   0.000036 |                 1 |                   0 |            0 |             0 |             0 |                 0 |                 0 |                 0 |     0 | NULL                  | NULL                 |        NULL |
|    50754 |   3 | checking permissions      |  0.000007 |   0.000005 |   0.000001 |                 0 |                   0 |            0 |             0 |             0 |                 0 |                 0 |                 0 |     0 | check_access          | sql_authorization.cc |         809 |
|    50754 |   4 | checking permissions      |  0.000006 |   0.000006 |   0.000000 |                 0 |                   0 |            0 |             0 |             0 |                 0 |                 0 |                 0 |     0 | check_access          | sql_authorization.cc |         809 |
|    50754 |   5 | Opening tables            |  0.000017 |   0.000013 |   0.000003 |                 0 |                   0 |            0 |             0 |             0 |                 0 |                 0 |                 0 |     0 | open_tables           | sql_base.cc          |        5815 |
|    50754 |   6 | init                      |  0.000260 |   0.000400 |   0.000073 |                 1 |                   0 |            0 |             0 |             0 |                 0 |                 0 |                 0 |     0 | handle_query          | sql_select.cc        |         128 |
|    50754 |   7 | System lock               |  0.000011 |   0.000009 |   0.000001 |                 0 |                   0 |            0 |             0 |             0 |                 0 |                 0 |                 0 |     0 | mysql_lock_tables     | lock.cc              |         330 |
|    50754 |   8 | optimizing                |  0.000115 |   0.000098 |   0.000017 |                 0 |                   0 |            0 |             0 |             0 |                 0 |                 0 |                 0 |     0 | optimize              | sql_optimizer.cc     |         158 |
|    50754 |   9 | statistics                |  0.001624 |   0.003051 |   0.000552 |                 3 |                   0 |            0 |             0 |             0 |                 0 |                 0 |                 0 |     0 | optimize              | sql_optimizer.cc     |         374 |
|    50754 |  10 | preparing                 |  0.000158 |   0.000134 |   0.000024 |                 0 |                   0 |            0 |             0 |             0 |                 0 |                 0 |                 0 |     0 | optimize              | sql_optimizer.cc     |         482 |
|    50754 |  11 | Sorting result            |  0.000009 |   0.000007 |   0.000001 |                 0 |                   0 |            0 |             0 |             0 |                 0 |                 0 |                 0 |     0 | make_tmp_tables_info  | sql_select.cc        |        3849 |
|    50754 |  12 | executing                 |  0.000006 |   0.000005 |   0.000001 |                 0 |                   0 |            0 |             0 |             0 |                 0 |                 0 |                 0 |     0 | exec                  | sql_executor.cc      |         126 |
|    50754 |  13 | Sending data              | 40.298694 | 144.161765 |  12.297466 |            596361 |              261826 |       265128 |       2899384 |             0 |                 0 |                 0 |               328 |     0 | exec                  | sql_executor.cc      |         202 |
|    50754 |  14 | end                       |  0.000031 |   0.000024 |   0.000005 |                 0 |                   0 |            0 |             0 |             0 |                 0 |                 0 |                 0 |     0 | handle_query          | sql_select.cc        |         206 |
|    50754 |  15 | query end                 |  0.000016 |   0.000013 |   0.000003 |                 0 |                   0 |            0 |             0 |             0 |                 0 |                 0 |                 0 |     0 | mysql_execute_command | sql_parse.cc         |        4959 |
|    50754 |  16 | closing tables            |  0.000055 |   0.000048 |   0.000007 |                 0 |                   0 |            0 |             0 |             0 |                 0 |                 0 |                 0 |     0 | mysql_execute_command | sql_parse.cc         |        5018 |
|    50754 |  17 | query end                 |  0.000007 |   0.000005 |   0.000002 |                 0 |                   0 |            0 |             0 |             0 |                 0 |                 0 |                 0 |     0 | mysql_execute_command | sql_parse.cc         |        4959 |
|    50754 |  18 | closing tables            |  0.000012 |   0.000009 |   0.000002 |                 0 |                   0 |            0 |             0 |             0 |                 0 |                 0 |                 0 |     0 | mysql_execute_command | sql_parse.cc         |        5018 |
+----------+-----+---------------------------+-----------+------------+------------+-------------------+---------------------+--------------+---------------+---------------+-------------------+-------------------+-------------------+-------+-----------------------+----------------------+-------------+

聚合代码如下所示,每次占用约100个客户端密钥,每小时可用客户端密钥约为100K

insert into DB.NETWORK_USAGE_FINAL(clientKey,k1,k2,k3,k4,k5,createdAt)
            select clientKey, sum(k1) as k1, sum(k2) as k2, sum(k3) as k3 ,
            k4,  k5 , "',startTime,'" from DB.NETWORK_USAGE_F1 partition (',partitionKey,') 
            where clientKey in (',selectedClientKey,')
            group by clientKey,k4,k5

[mysqld_safe]
socket      = /var/run/mysqld/mysqld.sock
nice        = 0

[mysqld]
#
# * Basic Settings
#
innodb_buffer_pool_size=20G
innodb_buffer_pool_instances=20
max_connections=100
query_cache_size=0
query_cache_type=0
query_cache_limit=2M
innodb_log_file_size=3G
innodb_read_io_threads = 8
innodb_write_io_threads = 8
innodb_io_capacity = 2000
sql_mode=STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION
back_log = 1000
tmp_table_size = 1G
max_heap_table_size = 1G
join_buffer_size=1G
sort_buffer_size=512M
innodb_lru_scan_depth=100
table_open_cache=4000
max_allowed_packet=1G
innodb_file_per_table=1
character-set-server = utf8
collation-server = utf8_unicode_ci
event_scheduler = ON
transaction_isolation = READ-COMMITTED

user        = mysql
pid-file    = /var/run/mysqld/mysqld.pid
socket      = /var/run/mysqld/mysqld.sock
port        = 3306
basedir     = /usr
datadir     = /var/lib/mysql
tmpdir      = /tmp
lc-messages-dir = /usr/share/mysql
skip-external-locking


key_buffer_size     = 1G
thread_stack        = 128M
thread_cache_size       = 8

myisam-recover-options  = BACKUP

log_error = /var/log/mysql/error.log

expire_logs_days    = 10
max_binlog_size   = 100M

显示CREATE TABLE

CREATE TABLE `NETWORK_USAGE_F1` (
  `id` char(15) NOT NULL,
  `clientKey` int(11) NOT NULL,
  `k4` int(11) NOT NULL,
  `k5` char(50) NOT NULL,
  `createdAt` datetime NOT NULL,
  `partitionKey` int(11) NOT NULL,
  `k1` bigint(20) NOT NULL,
  `k2` bigint(20) NOT NULL,
  `k3` int(11) NOT NULL,
  PRIMARY KEY (`id`,`partitionKey`),
  KEY `key2` (`clientKey`,`k4`,`k5`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4
/*!50100 PARTITION BY RANGE (partitionKey)
 */

@回复里克更新:

  • 聚合代码一次在100个客户端键(限制)上运行,一个小时内将存在大约100K个唯一的客户端键,但数据库中一个小时/分区的总行数约为4000万行(因此,每个客户端键大约有400行)
  • 使用的ID仅为15个字符长度
  • 目前我有5个分区,分区键格式为YYYYMMMDDHH
  • 不使用MyISAM

解决方案

让我们看看聚合代码。和SHOW CREATE TABLE

可尝试的其他内容:

  • 改为每5分钟汇总一次。
  • 不断地聚合";。也就是说,执行自上次执行聚合以来的所有行。
  • 将行收集到"临时"表中。然后进行聚合,最后将行复制到主表中。以下是有关乒乓球技术的讨论:http://mysql.rjweb.org/doc.php/staging_table

每秒11K行是很多,但也不是不可能。

其他问题:

  • 您有多少内存?
  • innodb_buffer_pool_size的设置是什么?
  • 您有固态硬盘吗?
  • 同时进行哪些类型的读取?(从大桌子上看书?是否正在读取汇总表?)

(re my.cnf)

我建议您将这些限制为内存的1%(除非您有确定的较大原因):

tmp_table_size = 1G
max_heap_table_size = 1G
join_buffer_size=1G
sort_buffer_size=512M
max_allowed_packet=1G
key_buffer_size     = 1G    -- for MyISAM only

我希望您没有使用MyISAM。如果不是将其降低到50M。

thread_stack        = 128M

糟糕!保留默认设置!

(架构)

  • 使用VARCHAR,而不是CHAR,除非字符串确实是固定长度的。
  • 由于您的桌子将很大,请使用实际最小的INT大小。
  • 有多少个PARTITIONs?哪个版本的MySQL?(分区太多本身可能会拖累性能。)
  • partitionkey设置为什么?

(其他)

  • 是拼写错误,还是我被它的内容搞糊涂了?
  • 此外,这似乎与每小时4,000万条记录冲突。

相关文章