Handling very large data in MySQL
Sorry for the long post!
I have a database containing ~30 tables (InnoDB engine). Only two of these tables, namely "transaction" and "shift", are quite large (the first has 1.5 million rows and shift has 23k rows). Everything currently works fine and I don't have a problem with the current database size.
However, we will have a similar database (same datatypes, design, ...) but much larger, e.g., the "transaction" table will have about 1 billion records (about 2.3 million transactions per day), and we are thinking about how we should deal with such a volume of data in MySQL (it is both read- and write-intensive). I have read a lot of related posts to see whether MySQL (and more specifically the InnoDB engine) can perform well with billions of records, but I still have some questions. Some of the related posts I've read are the following:
- Can MySQL reasonably perform queries on billions of rows?
- Is InnoDB (MySQL 5.5.8) the right choice for multi-billion rows?
- Best data store for billions of rows
- How big can a MySQL database get before performance starts to degrade
- Why MySQL could be slow with large tables?
- Can Mysql handle tables which will hold about 300 million records?
What I've understood so far to improve the performance for very large tables:
- (for InnoDB tables, which is my case) increasing innodb_buffer_pool_size (e.g., up to 80% of RAM); I also found some other MySQL performance tuning settings in the Percona blog
- having proper indexes on the table (using EXPLAIN on queries)
- partitioning the table
- MySQL sharding or clustering
Here are my questions/confusions:
About partitioning, I have doubts about whether we should use it. On one hand, many people suggest it to improve performance when the table is very large. On the other hand, I've read many posts saying it does not improve query performance and does not make queries run faster (e.g., here and here). Also, I read in the MySQL Reference Manual that InnoDB foreign keys and MySQL partitioning are not compatible (we have foreign keys).
Regarding indexes, right now they perform well, but as far as I understand, for very large tables indexing is more restrictive (as Kevin Bedell mentioned in his answer here). Also, indexes speed up reads while slowing down writes (insert/update). So, for the new similar project where we will have this large DB, should we first insert/load all the data and then create the indexes (to speed up the inserts)?
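For illustration, a load-then-index sequence might look like the sketch below. The table name, file path, and reduced column list are all hypothetical; the idea is to bulk-load with only the primary key in place and add the secondary indexes in a single table rebuild afterwards:

```sql
-- 1. Create the table with only the PRIMARY KEY; secondary indexes omitted.
CREATE TABLE transaction_bulk (
  id INT NOT NULL AUTO_INCREMENT,
  fuel_terminal_id INT NOT NULL,
  start_fuel_time INT NOT NULL,
  -- ... remaining columns ...
  PRIMARY KEY (id)
) ENGINE=InnoDB;

-- 2. Bulk-load the data (file path is illustrative).
LOAD DATA INFILE '/tmp/transactions.csv'
INTO TABLE transaction_bulk
FIELDS TERMINATED BY ',';

-- 3. Add the secondary indexes in one ALTER so the table is rebuilt once.
ALTER TABLE transaction_bulk
  ADD INDEX start_fuel_time_idx (start_fuel_time),
  ADD INDEX fuel_terminal_idx (fuel_terminal_id);
```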
If we cannot use partitioning for our big table (the "transaction" table), what is an alternative option to improve performance (besides MySQL variable settings such as innodb_buffer_pool_size)? Should we use MySQL Cluster? (We also have lots of joins.)
EDIT
This is the SHOW CREATE TABLE statement for our largest table, named "transaction":
CREATE TABLE `transaction` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`terminal_transaction_id` int(11) NOT NULL,
`fuel_terminal_id` int(11) NOT NULL,
`fuel_terminal_serial` int(11) NOT NULL,
`xboard_id` int(11) NOT NULL,
`gas_station_id` int(11) NOT NULL,
`operator_id` text NOT NULL,
`shift_id` int(11) NOT NULL,
`xboard_total_counter` int(11) NOT NULL,
`fuel_type` int(11) NOT NULL,
`start_fuel_time` int(11) NOT NULL,
`end_fuel_time` int(11) DEFAULT NULL,
`preset_amount` int(11) NOT NULL,
`actual_amount` int(11) DEFAULT NULL,
`fuel_cost` int(11) DEFAULT NULL,
`payment_cost` int(11) DEFAULT NULL,
`purchase_type` int(11) NOT NULL,
`payment_ref_id` text,
`unit_fuel_price` int(11) NOT NULL,
`fuel_status_id` int(11) DEFAULT NULL,
`fuel_mode_id` int(11) NOT NULL,
`payment_result` int(11) NOT NULL,
`card_pan` text,
`state` int(11) DEFAULT NULL,
`totalizer` int(11) NOT NULL DEFAULT '0',
`shift_start_time` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `terminal_transaction_id` (`terminal_transaction_id`,`fuel_terminal_id`,`start_fuel_time`) USING BTREE,
KEY `start_fuel_time_idx` (`start_fuel_time`),
KEY `fuel_terminal_idx` (`fuel_terminal_id`),
KEY `xboard_idx` (`xboard_id`),
KEY `gas_station_id` (`gas_station_id`) USING BTREE,
KEY `purchase_type` (`purchase_type`) USING BTREE,
KEY `shift_start_time` (`shift_start_time`) USING BTREE,
KEY `fuel_type` (`fuel_type`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=1665335 DEFAULT CHARSET=utf8 ROW_FORMAT=COMPACT
Thanks for your time,
Solution
Can MySQL reasonably perform queries on billions of rows? -- MySQL can 'handle' billions of rows. "Reasonably" depends on the queries; let's see them.
Is InnoDB (MySQL 5.5.8) the right choice for multi-billion rows? -- 5.7 has some improvements, but 5.5 is pretty good, in spite of being nearly 6 years old and on the verge of no longer being supported.
Best data store for billions of rows -- If you mean 'Engine', then InnoDB.
How big can a MySQL database get before the performance starts to degrade -- Again, that depends on the queries. I can show you a 1K-row table that will melt down; I have worked with billion-row tables that hum along.
Why MySQL could be slow with large tables? -- range scans lead to I/O, which is the slow part.
Can Mysql handle tables which will hold about 300 million records? -- again, yes. The limit is somewhere around a trillion rows.
(for InnoDB tables, which is my case) increasing innodb_buffer_pool_size (e.g., up to 80% of RAM). Also, I found some other MySQL performance tuning settings in the Percona blog -- yes
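As a sketch: the buffer pool can be inspected and, on MySQL 5.7.5 or later, resized online; on older versions the value goes in my.cnf and requires a restart. The 24 GB figure below is an assumption for a server with roughly 32 GB of RAM:

```sql
-- Current size in bytes.
SELECT @@innodb_buffer_pool_size;

-- Online resize (MySQL 5.7.5+); the value assumes ~32 GB of RAM on the server.
SET GLOBAL innodb_buffer_pool_size = 25769803776;  -- 24 GB
```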
having proper indexes on the table (using EXPLAIN on queries) -- well, let's see them. There are a lot of mistakes that can be made in this critical area.
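To make this concrete, here is a hypothetical query against the posted table, checked with EXPLAIN. The composite index suggested in the comment does not exist in the posted schema; it is only an example of what EXPLAIN might reveal is missing:

```sql
EXPLAIN
SELECT COUNT(*), SUM(fuel_cost)
FROM transaction
WHERE gas_station_id = 42
  AND start_fuel_time >= UNIX_TIMESTAMP('2016-01-01');
-- Inspect the "key" and "rows" columns of the output: ideally a composite
-- index such as (gas_station_id, start_fuel_time) would be used, rather
-- than the single-column gas_station_id index plus a large row scan.
```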
partitioning the table -- "Partitioning is not a panacea!" I harp on that in my blog
MySQL Sharding -- Currently this is DIY
MySQL clustering -- Currently the best answer is some Galera-based option (PXC, MariaDB 10, DIY w/Oracle). Oracle's "Group Replication" is a viable contender.
Partitioning does not support FOREIGN KEY or "global" UNIQUE. UUIDs, at the scale you are talking about, will not just slow down the system, but actually kill it. Type 1 UUIDs may be a workaround.
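If partitioning were pursued anyway (after dropping the foreign keys), a RANGE scheme on the unix-timestamp column might be sketched as below. Note the two constraints it illustrates: the partitioning column must appear in every unique key, so the primary key is widened first, and the PARTITION BY clause must stand alone in its own ALTER. Partition names and cutoffs are illustrative:

```sql
-- Widen the PK so the partitioning column is part of every unique key.
-- (The existing UNIQUE key already contains start_fuel_time.)
ALTER TABLE transaction
  DROP PRIMARY KEY,
  ADD PRIMARY KEY (id, start_fuel_time);

-- Monthly RANGE partitions on the unix-timestamp column.
ALTER TABLE transaction
  PARTITION BY RANGE (start_fuel_time) (
    PARTITION p2016_01 VALUES LESS THAN (1454284800),  -- 2016-02-01 00:00 UTC
    PARTITION p2016_02 VALUES LESS THAN (1456790400),  -- 2016-03-01 00:00 UTC
    PARTITION pmax     VALUES LESS THAN MAXVALUE
  );
```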
Insert and index-build speed -- There are too many variations to give a single answer. Let's see your tentative CREATE TABLE and how you intend to feed the data in.
Lots of joins -- "Normalize, but don't over-normalize." In particular, do not normalize datetimes or floats or other "continuous" values.
Do build summary tables
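A summary table for this schema might look like the following sketch. The table name, grain (per day, station, and fuel type), and chosen aggregates are assumptions, not part of the original design; reports then read the small summary table instead of scanning a billion detail rows:

```sql
-- Daily per-station summary (names and grain are illustrative).
CREATE TABLE transaction_daily_summary (
  summary_date    DATE NOT NULL,
  gas_station_id  INT NOT NULL,
  fuel_type       INT NOT NULL,
  tx_count        INT UNSIGNED NOT NULL,
  total_fuel_cost BIGINT NOT NULL,
  PRIMARY KEY (summary_date, gas_station_id, fuel_type)
) ENGINE=InnoDB;

-- Populated once per day from the detail table, e.g. for yesterday:
INSERT INTO transaction_daily_summary
SELECT DATE(FROM_UNIXTIME(start_fuel_time)),
       gas_station_id, fuel_type,
       COUNT(*), COALESCE(SUM(fuel_cost), 0)
FROM transaction
WHERE start_fuel_time >= UNIX_TIMESTAMP(CURDATE() - INTERVAL 1 DAY)
  AND start_fuel_time <  UNIX_TIMESTAMP(CURDATE())
GROUP BY 1, 2, 3;
```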
2.3 million transactions per day -- If that is 2.3M inserts (30/sec), then there is not much of a performance problem. If more complex, then RAID, SSD, batching, etc., may be necessary.
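Batching here means multi-row INSERTs wrapped in one transaction, so the server pays one round trip and one commit per ~100 rows instead of per row. The column subset and values below are illustrative:

```sql
-- One round trip and one commit for the whole batch instead of per row
-- (only three of the table's columns shown; values are made up).
START TRANSACTION;
INSERT INTO transaction (terminal_transaction_id, fuel_terminal_id, start_fuel_time)
VALUES (1001, 7, 1451606400),
       (1002, 7, 1451606455),
       (1003, 8, 1451606501);
COMMIT;
```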
deal with such volume of data -- If most activity is with the "recent" rows, then the buffer_pool will nicely 'cache' the activity, thereby avoiding I/O. If the activity is "random", then MySQL (or anyone else) will have I/O issues.
Shrinking the datatypes helps in a table like yours. I doubt you need 4 bytes to specify fuel_type. There are multiple 1-byte approaches.
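For example, low-cardinality code columns can be narrowed like this. Which columns actually fit in 1 or 2 bytes is an assumption about the data's ranges and would need to be verified first:

```sql
-- Illustrative narrowing; assumes these codes never exceed the new ranges.
ALTER TABLE transaction
  MODIFY fuel_type     TINYINT UNSIGNED NOT NULL,   -- 1 byte, 0..255
  MODIFY purchase_type TINYINT UNSIGNED NOT NULL,   -- 1 byte instead of 4
  MODIFY fuel_mode_id  TINYINT UNSIGNED NOT NULL;   -- 1 byte instead of 4
```

Across a billion rows, each column shrunk from 4 bytes to 1 saves roughly 3 GB in the clustered index alone, plus the same again in every secondary index that carries the column.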