MySQL明明有索引，为什么不用？

2022-04-02 00:00:00 索引成本的是扫描代价

微信公众号：DBA随笔

一个经典的MySQL索引问题

今天分享一个线上的经典MySQL索引问题。

背景介绍

今天在线上运维过程中，遇到了一个MySQL的经典索引问题。线上的表结构不方便展示，我模拟了一个表结构用于说明问题：

CREATE TABLE `test_index` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(10) DEFAULT NULL,
  `age` int(11) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `idx_age` (`age`)
) ENGINE=InnoDB AUTO_INCREMENT=100001 DEFAULT CHARSET=utf8mb4

上面的表结构中，有两个索引：一个是id，也是自增主键，另外一个是idx_age，它是一个普通二级索引。

表里面插入了从1~10w的数据，形如下面这样：

+----+------+------+
| id | name | age  |
+----+------+------+
|  1 | 1    |    1 |
|  2 | 2    |    2 |
|  3 | 3    |    3 |
|  4 | 4    |    4 |
|  5 | 5    |    5 |
|  6 | 6    |    6 |
|  7 | 7    |    7 |
|  8 | 8    |    8 |
|  9 | 9    |    9 |
| 10 | 10   |   10 |
+----+------+------+
10 rows in set (.01 sec)

下面我们看两个查询的SQL：

SQL 1：

select * from test_index where age>0 and age<20000 order by age ;

SQL 2：

select * from test_index where age>0 and age<20000 order by age limit ？;

其中，？代表一个具体的数字。

那么对于这两个SQL，MySQL会采用何种执行计划呢？

执行计划分析

执行计划分析如下：

MySQL  >explain select * from test_index where  age> and age<20000 order by age ;
+----+-------------+------------+------------+------+---------------+------+---------+------+--------+----------+-----------------------------+
| id | select_type | table      | partitions | type | possible_keys | key  | key_len | ref  | rows   | filtered | Extra                       |
+----+-------------+------------+------------+------+---------------+------+---------+------+--------+----------+-----------------------------+
|  1 | SIMPLE      | test_index | NULL       | ALL  | idx_age       | NULL | NULL    | NULL | 100139 |    37.47 | Using where; Using filesort |
+----+-------------+------------+------------+------+---------------+------+---------+------+--------+----------+-----------------------------+
1 row in set, 1 warning (.00 sec)

MySQL >explain select * from test_index where  age> and age<20000 order by age limit 100,200;
+----+-------------+------------+------------+-------+---------------+---------+---------+------+-------+----------+-----------------------+
| id | select_type | table      | partitions | type  | possible_keys | key     | key_len | ref  | rows  | filtered | Extra                 |
+----+-------------+------------+------------+-------+---------------+---------+---------+------+-------+----------+-----------------------+
|  1 | SIMPLE      | test_index | NULL       | range | idx_age       | idx_age | 5       | NULL | 37524 |   100.00 | Using index condition |
+----+-------------+------------+------------+-------+---------------+---------+---------+------+-------+----------+-----------------------+
1 row in set, 1 warning (.00 sec)

可以看到：

个SQL，关键字标红的是：all，using filesort

第二个SQL，关键字表用的是：range，idx_age

我们表的age字段上有普通索引，本身就是根据age字段进行排序的。但是我们的SQL语句中，除了要访问age，还需要访问id列和name列，所以，在age这个索引上，不能获取所有的数据。

那么，正常情况下，这个SQL的可能的执行方法有以下两种：

方法1、扫描age字段，先找到符合条件的age值，再根据age字段的索引，获得对应的id列值，然后"回表"，去聚集索引(也就是主键id）上去查找对应的name列的值

方法2、直接在聚集索引id上面查找满足age字段的值，然后再利用文件排序。

从实际的情况来看，

SQL 1：

select * from test_index where age>0 and age<20000 order by age ;

采用的是上述方法2

而SQL2：

select * from test_index where age>0 and age<20000 order by age limit 100,200;

采用的是上述方法1

age列上已经创建了索引，查询也是按照age来的，而且还有排序操作。为什么第2个SQL的限制条件更多，但是却能用到索引，个SQL的限制条件更少，却用不到索引？

真实情况是什么样子的？

我们再来看下面的对比：

MySQL >explain select * from test_index where  age> and age<20000 order by age limit 100,200;
+----+-------------+------------+------------+-------+---------------+---------+---------+------+-------+----------+-----------------------+
| id | select_type | table      | partitions | type  | possible_keys | key     | key_len | ref  | rows  | filtered | Extra                 |
+----+-------------+------------+------------+-------+---------------+---------+---------+------+-------+----------+-----------------------+
|  1 | SIMPLE      | test_index | NULL       | range | idx_age       | idx_age | 5       | NULL | 37524 |   100.00 | Using index condition |
+----+-------------+------------+------------+-------+---------------+---------+---------+------+-------+----------+-----------------------+
1 row in set, 1 warning (.00 sec)

MySQL >explain select * from test_index where  age> and age<20000 order by age limit 100,20000;
+----+-------------+------------+------------+------+---------------+------+---------+------+--------+----------+-----------------------------+
| id | select_type | table      | partitions | type | possible_keys | key  | key_len | ref  | rows   | filtered | Extra                       |
+----+-------------+------------+------------+------+---------------+------+---------+------+--------+----------+-----------------------------+
|  1 | SIMPLE      | test_index | NULL       | ALL  | idx_age       | NULL | NULL    | NULL | 100139 |    37.47 | Using where; Using filesort |
+----+-------------+------------+------------+------+---------------+------+---------+------+--------+----------+-----------------------------+
1 row in set, 1 warning (.00 sec)

上面的这两个SQL语句：

SQL 3：

explain select * from test_index where age>0 and age<20000 order by age limit 100,200;

SQL 4：

explain select * from test_index where age>0 and age<20000 order by age limit 100,20000;

仅仅是limit 后面的值不一样，但是执行计划却完全不相同。

扫描行数100的，采用的方法1，先索引查询，再回表；

扫描行数19900的，采用的方法2，直接查聚集索引。

为什么会有这种差异呢？

MySQL优化器的逻辑是关键。

在MySQL存储引擎中，磁盘和内存通过数据页来交互，

MySQL中，采用的是基于成本的优化。通常我们说的成本，指代的是CPU成本和IO成本。

在MySQL优化器代码中有这样的设定：

1、读取一个数据页的成本是1（也就是IO成本）；

2、从这个数据页中找到一个数据记录的成本可以大概表示成0.2（也就是CPU成本）；

3、读取一个范围的数据记录相当于读取一个数据页的成本，也是1。

4、每条记录每次回表操作都相当于访问一个页面

当然，实际中，比这个计算方法要复杂。

MySQL中，决定一个查询究竟用哪个索引的过程，可以简单模拟成下面这样：

1、根据搜索条件，找到所有可能的索引，并逐一计算走每条索引的代价

2、计算全表扫描的代价

3、对比各种执行方案，选出成本低的一个

全表扫描的代价=

IO代价

+CPU代价

所有的页面*1+

所有的记录数*0.2

索引扫描的代价=

二级索引IO代价

+二级索引CPU代价

+回表访问IO代价

+回表访问CPU代价

1个数据页*1

+ 所有满足条件的记录数*0.2

+ 所有满足条件的记录数*1 （每次回表都相当于访问一个页面）

+ 所有满足条件的记录数*0.2

上述案例中，我们累计有10w记录，总的页面数大约是225个，计算方法如下（先查看表的基本状况，然后利用data_length计算页面数）：

MySQL >show table status like 'test_index'\G
*************************** 1. row ***************************
           Name: test_index
         Engine: InnoDB
        Version: 10
     Row_format: Dynamic
           Rows: 100139
 Avg_row_length: 36
    Data_length: 3686400
Max_data_length: 
   Index_length: 1589248
      Data_free: 4194304
 Auto_increment: 100001
    Create_time: 2022-01-10 21:50:53
    Update_time: 2022-01-10 21:57:01
     Check_time: NULL
      Collation: utf8mb4_general_ci
       Checksum: NULL
 Create_options: 
        Comment: 
1 row in set (.00 sec)

MySQL 5724 myadmin_common@127...1 :test 11:06:07 >select 3686400/16/1024;
+-----------------+
| 3686400/16/1024 |
+-----------------+
|    225.00000000 |
+-----------------+
1 row in set (.00 sec)

而扫描的age范围是[0,20000], 也就是大概1/5的所有记录。

我们简单计算下这个数值：

如果扫描行数比较多，例如有2w个，则：

全表扫描代价=225*1+100000*0.2=20225

索引扫描代价=1+4000+20000*1 + 4000=28001

这里全表扫描，代价更低。

如果扫描行数很小，例如只有100个，则：

全表扫描代价=225*1+100000*0.2=20225

索引扫描代价=1+100*0.2+100*1 + 100*0.2=141

这里索引扫描，代价更低。

看到这里，想必上述的问题能够回答了，对于：

SQL 1：

select * from test_index where age>0 and age<20000 order by age ;

相当于使用了limit 100000，采用的是全表扫描

而SQL2：

select * from test_index where age>0 and age<20000 order by age limit 100,200;

只扫描了100行，因此采用索引扫描的方法

总结：

当然，真正的查询代价计算方法，比上述描述更为复杂，这里是想通过这样一个特殊案例和计算方法，告诉大家：

1、不同的扫描行数，会导致不同的扫描代价。

2、不能认为限制条件少的语句更应该使用索引。

3、不能认为索引创建了就一定会用到而要通过。

4、explain是个好工具，可以对SQL语句真正的执行计划加以分析，从而给出正确的优化方案。

相关文章