索引布尔列与日期时间列的查询性能
如果索引设置在 datetime
类型列,而不是 boolean
类型列(并且在该列上完成查询),查询性能是否有显着差异?
Is there a notable difference in query performance, if the index is set on datetime
type column, instead of boolean
type column (and querying is done on that column)?
在我目前的设计中,我有 2 列:
In my current design I got 2 columns:
is_active
TINYINT(1),索引deleted_at
日期时间
is_active
TINYINT(1), indexeddeleted_at
DATETIME
查询是SELECT * FROM table WHERE is_active = 1;
如果我在 deleted_at
列上建立索引,然后运行类似 SELECT * FROM table WHERE deleted_at is null;
之类的查询,会不会更慢?p>
Would it be any slower, if I made an index on deleted_at
column instead, and ran queries like this SELECT * FROM table WHERE deleted_at is null;
?
推荐答案
这是一个包含 10M 行的 MariaDB (10.0.19) 基准测试(使用 序列插件):
Here is a MariaDB (10.0.19) benchmark with 10M rows (using the sequence plugin):
drop table if exists test;
CREATE TABLE `test` (
`id` MEDIUMINT UNSIGNED NOT NULL,
`is_active` TINYINT UNSIGNED NOT NULL,
`deleted_at` TIMESTAMP NULL,
PRIMARY KEY (`id`),
INDEX `is_active` (`is_active`),
INDEX `deleted_at` (`deleted_at`)
) ENGINE=InnoDB
select seq id
, rand(1)<0.5 as is_active
, case when rand(1)<0.5
then null
else '2017-03-18' - interval floor(rand(2)*1000000) second
end as deleted_at
from seq_1_to_10000000;
为了测量我在执行查询后使用 set profiling=1
并运行 show profile
的时间.从分析结果中,我取 Sending data
的值,因为其他所有内容都小于 1 毫秒.
To measure the time I use set profiling=1
and run show profile
after executing a query. From the profiling result I take the value of Sending data
since everything else is altogether less than one msec.
TINYINT 索引:
SELECT COUNT(*) FROM test WHERE is_active = 1;
运行时间:~ 738 毫秒
TIMESTAMP索引:
SELECT COUNT(*) FROM test WHERE deleted_at is null;
运行时间:~ 748 毫秒
索引大小:
select database_name, table_name, index_name, stat_value*@@innodb_page_size
from mysql.innodb_index_stats
where database_name = 'tmp'
and table_name = 'test'
and stat_name = 'size'
结果:
database_name | table_name | index_name | stat_value*@@innodb_page_size
-----------------------------------------------------------------------
tmp | test | PRIMARY | 275513344
tmp | test | deleted_at | 170639360
tmp | test | is_active | 97107968
请注意,虽然 TIMESTAMP(4 字节)是 TYNYINT(1 字节)的 4 倍,但索引大小甚至没有两倍大.但是,如果它不适合内存,则索引大小可能会很大.因此,当我将 innodb_buffer_pool_size
从 1G
更改为 50M
时,我得到以下数字:
Note that while TIMESTAMP (4 Bytes) is 4 times as long as TYNYINT (1 Byte), the index size is not even twice as large. But the index size can be significant if it doesn't fit into memory. So when i change innodb_buffer_pool_size
from 1G
to 50M
i get the following numbers:
- TINYINT:~ 960 毫秒
- 时间戳:~ 1500 毫秒
为了更直接地解决这个问题,我对数据做了一些更改:
To address the question more directly I did some changes to the data:
- 我使用 DATETIME 而不是 TIMESTAMP
- 由于条目通常很少被删除,我使用
rand(1)<0.99
(1% 已删除)而不是rand(1)<0.5
(50% 已删除)) - 表格大小从 10M 变为 1M 行.
SELECT COUNT(*)
改为SELECT *
- Instead of TIMESTAMP I use DATETIME
- Since entries are usually rarely deleted I use
rand(1)<0.99
(1% deleted) instead ofrand(1)<0.5
(50% deleted) - Table size changed from 10M to 1M rows.
SELECT COUNT(*)
changed toSELECT *
索引大小:
index_name | stat_value*@@innodb_page_size
------------------------------------------
PRIMARY | 25739264
deleted_at | 12075008
is_active | 11026432
由于 99% 的 deleted_at
值为 NULL,因此索引大小没有显着差异,尽管非空 DATETIME 需要 8 个字节 (MariaDB).
Since 99% of deleted_at
values are NULL there is no significant difference in index size, though a non empty DATETIME requires 8 Bytes (MariaDB).
SELECT * FROM test WHERE is_active = 1; -- 782 msec
SELECT * FROM test WHERE deleted_at is null; -- 829 msec
删除两个索引两个查询在大约 350 毫秒内执行.删除 is_active
列后,deleted_at is null
查询将在 280 毫秒内执行.
Dropping both indexes both queries execute in about 350 msec. And dropping the is_active
column the deleted_at is null
query executes in 280 msec.
请注意,这仍然不是一个现实的场景.您不太可能希望从 1M 行中选择 990K 行并将其交付给用户.表格中可能还会有更多列(可能包括文本).但它表明,您可能不需要 is_active
列(如果它不添加其他信息),并且任何索引在最好的情况下对于选择未删除的条目都是无用的.
Note that this is still not a realistic scenario. You will unlikely want to select 990K rows out of 1M and deliver it to the user. You will probably also have more columns (maybe including text) in the table. But it shows, that you probably don't need the is_active
column (if it doesn't add additional information), and that any index is in best case useless for selecting non deleted entries.
然而,索引对于选择已删除的行很有用:
However an index can be usefull to select deleted rows:
SELECT * FROM test WHERE is_active = 0;
使用索引在 10 毫秒内执行,不使用索引在 170 毫秒内执行.
Executes in 10 msec with index and in 170 msec without index.
SELECT * FROM test WHERE deleted_at is not null;
使用索引在 11 毫秒内执行,不使用索引在 167 毫秒内执行.
Executes in 11 msec with index and in 167 msec without index.
删除 is_active
列,它在 4 毫秒内执行索引,在 150 毫秒内执行索引.
Dropping the is_active
column it executes in 4 msec with index and in 150 msec without index.
因此,如果这种情况以某种方式适合您的数据,那么结论将是:删除 is_active
列,如果您很少选择已删除,则不要在 deleted_at
列上创建索引条目.或者根据您的需要调整基准并做出自己的结论.
So if this scenario somehow fits your data the conclusion would be: Drop the is_active
column and don't create an index on deleted_at
column if you are rarely selecting deleted entries. Or adjust the benchmark to your needs and make your own conclusion.
相关文章