Handling a huge volume of data in MySQL
Currently my application generates about 40 million records per hour. I have created one partition per hour, so that I can easily drop a partition when it is no longer needed, and I also use the partitions to aggregate the data.

My aggregation process runs fine as long as no other queries are hitting the database, but once those queries start, the aggregation code takes more than an hour to finish.

Is there any way in MySQL to run a process that is isolated from, and unaffected by, the queries hitting the database?
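For context, the per-hour rotation described above is typically done with partition DDL rather than DELETE. A minimal sketch, assuming RANGE partitioning on an hourly key; the partition names and boundary values here are illustrative, not from the post:

```sql
-- Hypothetical hourly rotation; partition names/values are assumptions.
-- Add the next hour's partition ahead of time:
ALTER TABLE NETWORK_USAGE_F1
    ADD PARTITION (PARTITION p2024010106 VALUES LESS THAN (2024010107));

-- Dropping an hourly partition discards its ~40M rows almost instantly,
-- far cheaper than a row-by-row DELETE:
ALTER TABLE NETWORK_USAGE_F1 DROP PARTITION p2024010100;
```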
Reply to @Rick:
- RAM: 32 GB
- innodb_buffer_pool_size: 20 GB
- SSD: yes
- Type of reads: a mix of GROUP BY and UPDATEs covered by the primary key
I don't want to aggregate every 5 minutes, because that would also generate a large number of records and is not feasible for the application. In practice I keep 5 hourly partitions and run the aggregation on the oldest one each hour; my application needs at least 5 hours of un-aggregated data.

My application does not need ACID guarantees, so I changed the default isolation level to READ-UNCOMMITTED and set autocommit to 0. That sped up the aggregation code, but the inserts suffered and now take more than 2 seconds.
Update: here is the profiling information for the aggregation query.
+----------+-----+---------------------------+-----------+------------+------------+-------------------+---------------------+--------------+---------------+---------------+-------------------+-------------------+-------------------+-------+-----------------------+----------------------+-------------+
| QUERY_ID | SEQ | STATE | DURATION | CPU_USER | CPU_SYSTEM | CONTEXT_VOLUNTARY | CONTEXT_INVOLUNTARY | BLOCK_OPS_IN | BLOCK_OPS_OUT | MESSAGES_SENT | MESSAGES_RECEIVED | PAGE_FAULTS_MAJOR | PAGE_FAULTS_MINOR | SWAPS | SOURCE_FUNCTION | SOURCE_FILE | SOURCE_LINE |
+----------+-----+---------------------------+-----------+------------+------------+-------------------+---------------------+--------------+---------------+---------------+-------------------+-------------------+-------------------+-------+-----------------------+----------------------+-------------+
| 50754 | 2 | continuing inside routine | 0.000015 | 0.000197 | 0.000036 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | NULL | NULL | NULL |
| 50754 | 3 | checking permissions | 0.000007 | 0.000005 | 0.000001 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | check_access | sql_authorization.cc | 809 |
| 50754 | 4 | checking permissions | 0.000006 | 0.000006 | 0.000000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | check_access | sql_authorization.cc | 809 |
| 50754 | 5 | Opening tables | 0.000017 | 0.000013 | 0.000003 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | open_tables | sql_base.cc | 5815 |
| 50754 | 6 | init | 0.000260 | 0.000400 | 0.000073 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | handle_query | sql_select.cc | 128 |
| 50754 | 7 | System lock | 0.000011 | 0.000009 | 0.000001 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | mysql_lock_tables | lock.cc | 330 |
| 50754 | 8 | optimizing | 0.000115 | 0.000098 | 0.000017 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | optimize | sql_optimizer.cc | 158 |
| 50754 | 9 | statistics | 0.001624 | 0.003051 | 0.000552 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | optimize | sql_optimizer.cc | 374 |
| 50754 | 10 | preparing | 0.000158 | 0.000134 | 0.000024 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | optimize | sql_optimizer.cc | 482 |
| 50754 | 11 | Sorting result | 0.000009 | 0.000007 | 0.000001 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | make_tmp_tables_info | sql_select.cc | 3849 |
| 50754 | 12 | executing | 0.000006 | 0.000005 | 0.000001 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | exec | sql_executor.cc | 126 |
| 50754 | 13 | Sending data | 40.298694 | 144.161765 | 12.297466 | 596361 | 261826 | 265128 | 2899384 | 0 | 0 | 0 | 328 | 0 | exec | sql_executor.cc | 202 |
| 50754 | 14 | end | 0.000031 | 0.000024 | 0.000005 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | handle_query | sql_select.cc | 206 |
| 50754 | 15 | query end | 0.000016 | 0.000013 | 0.000003 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | mysql_execute_command | sql_parse.cc | 4959 |
| 50754 | 16 | closing tables | 0.000055 | 0.000048 | 0.000007 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | mysql_execute_command | sql_parse.cc | 5018 |
| 50754 | 17 | query end | 0.000007 | 0.000005 | 0.000002 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | mysql_execute_command | sql_parse.cc | 4959 |
| 50754 | 18 | closing tables | 0.000012 | 0.000009 | 0.000002 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | mysql_execute_command | sql_parse.cc | 5018 |
+----------+-----+---------------------------+-----------+------------+------------+-------------------+---------------------+--------------+---------------+---------------+-------------------+-------------------+-------------------+-------+-----------------------+----------------------+-------------+
The aggregation code is shown below. Each run takes about 100 client keys; there are roughly 100K client keys available per hour.
insert into DB.NETWORK_USAGE_FINAL(clientKey,k1,k2,k3,k4,k5,createdAt)
select clientKey, sum(k1) as k1, sum(k2) as k2, sum(k3) as k3 ,
k4, k5 , "',startTime,'" from DB.NETWORK_USAGE_F1 partition (',partitionKey,')
where clientKey in (',selectedClientKey,')
group by clientKey,k4,k5
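The quoted fragments such as "',startTime,'" suggest the statement above is built by string concatenation (e.g. inside a stored procedure) and then executed dynamically. A hedged sketch of how that assembly might look; the variable names @startTime, @partitionKey and @selectedClientKey are assumptions:

```sql
-- Hypothetical assembly of the dynamic statement above.
SET @sql = CONCAT(
    'INSERT INTO DB.NETWORK_USAGE_FINAL(clientKey,k1,k2,k3,k4,k5,createdAt) ',
    'SELECT clientKey, SUM(k1), SUM(k2), SUM(k3), k4, k5, "', @startTime, '" ',
    'FROM DB.NETWORK_USAGE_F1 PARTITION (', @partitionKey, ') ',
    'WHERE clientKey IN (', @selectedClientKey, ') ',
    'GROUP BY clientKey, k4, k5');
PREPARE stmt FROM @sql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;
```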
[mysqld_safe]
socket = /var/run/mysqld/mysqld.sock
nice = 0
[mysqld]
#
# * Basic Settings
#
innodb_buffer_pool_size=20G
innodb_buffer_pool_instances=20
max_connections=100
query_cache_size=0
query_cache_type=0
query_cache_limit=2M
innodb_log_file_size=3G
innodb_read_io_threads = 8
innodb_write_io_threads = 8
innodb_io_capacity = 2000
sql_mode=STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION
back_log = 1000
tmp_table_size = 1G
max_heap_table_size = 1G
join_buffer_size=1G
sort_buffer_size=512M
innodb_lru_scan_depth=100
table_open_cache=4000
max_allowed_packet=1G
innodb_file_per_table=1
character-set-server = utf8
collation-server = utf8_unicode_ci
event_scheduler = ON
transaction_isolation = READ-COMMITTED
user = mysql
pid-file = /var/run/mysqld/mysqld.pid
socket = /var/run/mysqld/mysqld.sock
port = 3306
basedir = /usr
datadir = /var/lib/mysql
tmpdir = /tmp
lc-messages-dir = /usr/share/mysql
skip-external-locking
key_buffer_size = 1G
thread_stack = 128M
thread_cache_size = 8
myisam-recover-options = BACKUP
log_error = /var/log/mysql/error.log
expire_logs_days = 10
max_binlog_size = 100M
SHOW CREATE TABLE:
CREATE TABLE `NETWORK_USAGE_F1` (
`id` char(15) NOT NULL,
`clientKey` int(11) NOT NULL,
`k4` int(11) NOT NULL,
`k5` char(50) NOT NULL,
`createdAt` datetime NOT NULL,
`partitionKey` int(11) NOT NULL,
`k1` bigint(20) NOT NULL,
`k2` bigint(20) NOT NULL,
`k3` int(11) NOT NULL,
PRIMARY KEY (`id`,`partitionKey`),
KEY `key2` (`clientKey`,`k4`,`k5`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4
/*!50100 PARTITION BY RANGE (partitionKey)
*/
Update for @Rick:
- The aggregation code runs on 100 client keys at a time (a limit); there are about 100K unique client keys in one hour, but the total number of rows per hour/partition is about 40 million (so roughly 400 rows per client key).
- The IDs used are only 15 characters long.
- I currently keep 5 partitions; the partition key format is YYYYMMMDDHH.
- MyISAM is not used.
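The 100-keys-per-run batching described above can be sketched as follows; the batch size and key count come from the post, while the key values themselves are placeholders:

```python
def batches(keys, size=100):
    """Yield successive chunks of `size` client keys."""
    for i in range(0, len(keys), size):
        yield keys[i:i + size]

# ~100K unique client keys per hour, processed 100 at a time,
# gives about 1,000 aggregation statements per partition.
client_keys = list(range(100_000))               # placeholder keys
groups = list(batches(client_keys))
selected = ",".join(str(k) for k in groups[0])   # IN (...) list for one run
```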
Solution
Let's see the aggregation code and the SHOW CREATE TABLE.
Other things to try:
- Aggregate every 5 minutes instead.
- Aggregate "continually"; that is, process all rows since the last time the aggregation ran.
- Collect rows into a "staging" table, aggregate from there, and finally copy the rows to the main table. Here is a discussion of the ping-pong technique: http://mysql.rjweb.org/doc.php/staging_table

11K rows per second is a lot, but not impossible.
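A hedged sketch of the flip-flop staging idea from that link: new rows always land in a small staging table, which is atomically swapped out so the aggregation reads a quiesced copy. The table names here are assumptions:

```sql
-- Hypothetical flip-flop swap; Staging/StagingSpare/StagingProcess names
-- are made up. Writers always INSERT into Staging; the RENAME is atomic.
RENAME TABLE Staging TO StagingProcess, StagingSpare TO Staging;

-- Aggregate from the now-quiet copy into the summary table, e.g.:
-- INSERT INTO Summary ... SELECT ... FROM StagingProcess GROUP BY ...;

-- Then empty it and put it back in reserve for the next swap:
TRUNCATE TABLE StagingProcess;
RENAME TABLE StagingProcess TO StagingSpare;
```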
Other questions:
- How much RAM do you have?
- What is the setting of innodb_buffer_pool_size?
- Do you have SSDs?
- What kinds of reads are happening at the same time? (Reads from the big table? Reads from the summary table?)
(re my.cnf)

I suggest limiting these to 1% of RAM (unless you have a specific, well-understood reason for larger values):

tmp_table_size = 1G
max_heap_table_size = 1G
join_buffer_size=1G
sort_buffer_size=512M
max_allowed_packet=1G
key_buffer_size = 1G  -- for MyISAM only

I hope you are not using MyISAM. If you are not, lower key_buffer_size to 50M.

thread_stack = 128M

Ouch! Keep the default for that one!
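The reason these matter is that join_buffer_size and sort_buffer_size are allocated per session (sometimes several times per query), so a few concurrent queries at 1G each can exhaust RAM. A quick way to inspect and lower them without a restart; the 256M value is only an illustration, not a tuned recommendation:

```sql
-- Inspect the current values:
SELECT @@tmp_table_size, @@max_heap_table_size,
       @@join_buffer_size, @@sort_buffer_size, @@key_buffer_size;

-- Lower the per-session buffers globally (new sessions pick this up):
SET GLOBAL join_buffer_size = 256 * 1024 * 1024;  -- illustrative value
SET GLOBAL sort_buffer_size = 2 * 1024 * 1024;
```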
(Schema)
- Use VARCHAR, not CHAR, unless the strings really are fixed-length.
- Since your tables will be huge, use the smallest practical INT sizes.
- How many PARTITIONs? Which version of MySQL? (Having too many partitions can itself be a drag on performance.)
- What is partitionKey set to?
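Applied to the table shown earlier, those shrink suggestions might look like this sketch; the chosen types are assumptions about the actual data ranges and must be verified against the real data first:

```sql
-- Hypothetical tightening of NETWORK_USAGE_F1; confirm each column's
-- real value range before shrinking it.
ALTER TABLE NETWORK_USAGE_F1
    MODIFY k5 VARCHAR(50) NOT NULL,          -- CHAR(50) always pads to full width
    MODIFY clientKey INT UNSIGNED NOT NULL,  -- or MEDIUMINT UNSIGNED if < 16M keys
    MODIFY k3 MEDIUMINT UNSIGNED NOT NULL;   -- 3 bytes, if values fit
```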
(Other)
- Is that a typo, or am I just puzzled by what it contains?
- Also, that seems to conflict with 40 million records per hour.