MySQL varchar index length

2021-11-20 00:00:00 indexing mysql varchar

I have a table like this:

CREATE TABLE `products` (
  `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
  `name` varchar(512) NOT NULL,
  `description` text,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=38 DEFAULT CHARSET=utf8;

And another one like this:

CREATE TABLE `product_variants` (
  `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
  `product_id` int(11) unsigned NOT NULL,
  `product_code` varchar(255) NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `product_code` (`product_code`),
  KEY `product_variant_product_fk` (`product_id`),
  CONSTRAINT `product_variant_product_fk` FOREIGN KEY (`product_id`) REFERENCES `products` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=1037 DEFAULT CHARSET=utf8;

And an SQL statement like this:

SELECT p.id AS id, p.name AS name, p.description AS description, pv.id AS product_variant_id, pv.product_code AS product_code
FROM products p
INNER JOIN product_variants pv ON pv.product_id = p.id
ORDER BY p.name ASC
LIMIT 300 OFFSET 0;

If I EXPLAIN it, I get this:

+----+-------------+-------+------+----------------------------+----------------------------+---------+---------+--------+----------------+
| id | select_type | table | type | possible_keys              | key                        | key_len | ref     | rows   | Extra          |
+----+-------------+-------+------+----------------------------+----------------------------+---------+---------+--------+----------------+
|  1 | SIMPLE      | p     | ALL  | PRIMARY                    | NULL                       | NULL    | NULL    | 993658 | Using filesort |
|  1 | SIMPLE      | pv    | ref  | product_variant_product_fk | product_variant_product_fk | 4       | db.p.id |      1 |                |
+----+-------------+-------+------+----------------------------+----------------------------+---------+---------+--------+----------------+
2 rows in set (0.00 sec)

For a million rows, this is pretty slow. I've tried adding an index on products.name with:

ALTER TABLE products ADD INDEX `product_name_idx` (name(512));

This gives:

mysql> show indexes from products;
+----------+------------+------------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table    | Non_unique | Key_name         | Seq_in_index | Column_name     | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+----------+------------+------------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| products |          0 | PRIMARY          |            1 | id              | A         |      993658 |     NULL | NULL   |      | BTREE      |         |               |
| products |          1 | product_manf_fk  |            1 | manufacturer_id | A         |          18 |     NULL | NULL   | YES  | BTREE      |         |               |
| products |          1 | product_name_idx |            1 | name            | A         |         201 |      255 | NULL   |      | BTREE      |         |               |
+----------+------------+------------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
3 rows in set (0.00 sec)

I think that the Sub_part column shows the prefix that has been indexed (in bytes), as described on this page.

When I re-explain the query, I get:

+----+-------------+-------+------+----------------------------+----------------------------+---------+---------+--------+----------------+
| id | select_type | table | type | possible_keys              | key                        | key_len | ref     | rows   | Extra          |
+----+-------------+-------+------+----------------------------+----------------------------+---------+---------+--------+----------------+
|  1 | SIMPLE      | p     | ALL  | PRIMARY                    | NULL                       | NULL    | NULL    | 993658 | Using filesort |
|  1 | SIMPLE      | pv    | ref  | product_variant_product_fk | product_variant_product_fk | 4       | db.p.id |      1 |                |
+----+-------------+-------+------+----------------------------+----------------------------+---------+---------+--------+----------------+
2 rows in set (0.00 sec)

This looks like the new index is not being used. As described on this page, indexes are not used for sorting if they are prefix indexes. In fact, if I truncate the data with:

alter table products modify `name` varchar(255) not null;

EXPLAIN gives:

+----+-------------+-------+-------+----------------------------+----------------------------+---------+----------------------------------------------+------+-------+
| id | select_type | table | type  | possible_keys              | key                        | key_len | ref                                          | rows | Extra |
+----+-------------+-------+-------+----------------------------+----------------------------+---------+----------------------------------------------+------+-------+
|  1 | SIMPLE      | p     | index | PRIMARY                    | product_name_idx           | 767     | NULL                                         |  300 |       |
|  1 | SIMPLE      | pv    | ref   | product_variant_product_fk | product_variant_product_fk | 4       | oh_2c98c233_69fe_4f06_ad0d_fe6f85a5beac.p.id |    1 |       |
+----+-------------+-------+-------+----------------------------+----------------------------+---------+----------------------------------------------+------+-------+

This, I think, backs that up. However, it says on this page that InnoDB tables can have up to 767 bytes of index. If the length is in bytes, why does it refuse to have more than 255? If it's in characters, how is it deciding the length of each UTF-8 character? Is it just assuming 3?

Also, I am using this version of MySQL:

mysql> select version();
+------------+
| version()  |
+------------+
| 5.5.27-log |
+------------+
1 row in set (0.00 sec)

Recommended answer

Sept 2021 edit: I've been using MySQL 8.0 for a couple of years now, so here's some updated info.

The MySQL manual now has a very informative page regarding conversion between utf8mb3 (currently also known as utf8) and utf8mb4. utf8mb3 is deprecated and will be removed eventually; and when it is removed, its current alias, utf8, will refer to utf8mb4 instead.
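
Before converting, it can help to check where a schema currently stands. The following is a sketch that assumes the question's `products` and `product_variants` tables and a server with utf8mb4 (5.5.3 or later); `utf8mb4_unicode_ci` is just one possible collation, not a requirement.

```sql
-- Current row format and default collation of the question's two tables
SELECT TABLE_NAME, ROW_FORMAT, TABLE_COLLATION
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME IN ('products', 'product_variants');

-- Convert every character column of products from utf8 (utf8mb3) to utf8mb4.
-- Assumes DYNAMIC rows (the 8.0 default): under COMPACT, an existing
-- 255-character index on name would need 255 * 4 = 1020 bytes, over 767.
ALTER TABLE products CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
```

Note that CONVERT TO CHARACTER SET may widen TEXT columns (for example TEXT to MEDIUMTEXT) so that they can still hold as many characters as before.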

With the deprecated utf8mb3, you can index up to 255 characters, while with utf8mb4 you can index up to 191, when using the COMPACT or REDUNDANT row format.
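
A minimal sketch of the COMPACT case with utf8mb4, using a hypothetical `products_mb4` table: the prefix stops at 191 characters because 191 × 4 = 764 bytes fits under 767, while 192 × 4 = 768 does not.

```sql
-- Hypothetical table: utf8mb4 (up to 4 bytes per character) with COMPACT rows,
-- so each indexed column prefix is limited to 767 bytes.
CREATE TABLE products_mb4 (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT,
  name VARCHAR(512) NOT NULL,
  PRIMARY KEY (id),
  KEY product_name_idx (name(191))  -- 191 * 4 = 764 bytes, under the 767-byte limit
) ENGINE=InnoDB ROW_FORMAT=COMPACT DEFAULT CHARSET=utf8mb4;
```

As the question found, a prefix index like this one still cannot be used to satisfy ORDER BY name.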

With COMPRESSED or DYNAMIC row format, index key prefixes can be up to 3072 bytes. With them, you can index up to 1024 characters for utf8mb3, and 768 characters for utf8mb4.
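
With DYNAMIC rows, the question's original VARCHAR(512) column can be indexed in full even as utf8mb4 (512 × 4 = 2048 bytes), and a full-column index, unlike a prefix index, is eligible for ORDER BY. A sketch assuming MySQL 5.7.7 or later (where large prefixes are on by default) and a hypothetical `products_dyn` table; whether the optimizer actually picks the index still depends on its cost estimates.

```sql
CREATE TABLE products_dyn (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT,
  name VARCHAR(512) NOT NULL,
  description TEXT,
  PRIMARY KEY (id),
  KEY product_name_idx (name)  -- whole column: 512 * 4 = 2048 bytes <= 3072
) ENGINE=InnoDB ROW_FORMAT=DYNAMIC DEFAULT CHARSET=utf8mb4;

-- Check whether the sort can now be read from product_name_idx
-- instead of showing "Using filesort".
EXPLAIN SELECT id, name FROM products_dyn ORDER BY name LIMIT 300;
```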

Below is my previous answer, which explains some of the logic behind the number of characters you can index versus the number of bytes.

I must revise my answer due to my research. I originally posted this (quoting myself):

I believe the answer is that you cannot know how many characters will be in the index because you cannot know how many bytes your characters will be (unless you do something to exclude multi-byte characters).

I'm not sure, but it might still be correct, just not in quite the way I was thinking.

Here is the correct answer:

MySQL assumes 3 bytes per utf8 character. 255 characters is the maximum index size you can specify per column, because 256 × 3 = 768, which breaks the 767-byte limit.
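
The 3-byte figure comes from the character set's maximum length, which MySQL exposes as MAXLEN. A sketch that reads it from INFORMATION_SCHEMA.CHARACTER_SETS and derives the longest prefix that fits each limit; on 8.0 the 3-byte set may be listed as `utf8mb3` rather than `utf8`, so both names are included. The arithmetic should give 255 and 1024 characters for the 3-byte set, and 191 and 768 for utf8mb4.

```sql
SELECT CHARACTER_SET_NAME,
       MAXLEN,                                  -- worst-case bytes per character
       FLOOR(767 / MAXLEN)  AS max_chars_767,   -- COMPACT / REDUNDANT limit
       FLOOR(3072 / MAXLEN) AS max_chars_3072   -- DYNAMIC / COMPRESSED limit
FROM INFORMATION_SCHEMA.CHARACTER_SETS
WHERE CHARACTER_SET_NAME IN ('utf8', 'utf8mb3', 'utf8mb4');
```

The same reservation explains the key_len of 767 in the question's last EXPLAIN: 255 × 3 = 765 bytes of key data plus 2 bytes for the VARCHAR length.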

If you don't specify an index size, MySQL chooses the maximum size (i.e. 255 per column). A UNIQUE constraint cannot be put on a utf8 column whose length is greater than 255, because a unique index must contain the entire cell value. But a regular index can be used; it will just index the first 255 characters (or the first 767 bytes?). And that is where there is still some mystery for me.
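
The UNIQUE versus regular index distinction can be sketched on a hypothetical `codes` table that uses utf8 with COMPACT rows (the 767-byte case). The full-column unique index should be rejected with error 1071 ("Specified key was too long; max key length is 767 bytes"), while a regular index on an explicit 255-character prefix fits.

```sql
CREATE TABLE codes (
  code VARCHAR(256) NOT NULL
) ENGINE=InnoDB ROW_FORMAT=COMPACT DEFAULT CHARSET=utf8;

-- Unique index on the whole column: 256 * 3 = 768 bytes > 767, so this fails.
ALTER TABLE codes ADD UNIQUE KEY code_uq (code);

-- Regular index on an explicit 255-character prefix: 255 * 3 = 765 bytes.
ALTER TABLE codes ADD KEY code_prefix_idx (code(255));
```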

The mystery: I can see why MySQL assumes 3 bytes per character for safety, because otherwise the UNIQUE constraint could be broken. But the docs seem to suggest that the index is actually sized in bytes, not characters. So, suppose you put a 255-char (765-byte) index on a varchar(256) column. If the characters you store are all 1-byte ASCII characters, like A-Z, a-z, 0-9, then you can fit the entire column into the 767-byte index. And it seems like that is what would actually happen.
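
The bytes-versus-characters distinction is easy to see with CHAR_LENGTH() (characters) and LENGTH() (bytes). The literals below are written as hex with a `_utf8mb4` introducer so the result does not depend on the client's character set: 'Zoë' is 3 characters in 4 bytes, and the emoji U+1F600 is 1 character in 4 bytes.

```sql
-- 'Zoë' = 5A 6F C3 AB: three characters, four bytes
SELECT CHAR_LENGTH(_utf8mb4 X'5A6FC3AB') AS chars,  -- 3
       LENGTH(_utf8mb4 X'5A6FC3AB')      AS bytes;  -- 4

-- U+1F600 (grinning face) = F0 9F 98 80: one character, four bytes
SELECT CHAR_LENGTH(_utf8mb4 X'F09F9880') AS chars,  -- 1
       LENGTH(_utf8mb4 X'F09F9880')      AS bytes;  -- 4
```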

Below is some more information from my original answer about characters, bytes, etc.

According to Wikipedia, a UTF-8 character can be 1, 2, 3, or 4 bytes long. But, according to this MySQL documentation, the maximum character size for MySQL's utf8 is 3 bytes, and so any column index over 255 characters might hit that byte limit. But as I understand it, it might not. If most of your characters are in the ASCII range, then your average character size will be closer to 1 byte. If your average character size is, for example, 1.3 bytes (mostly 1 byte, but a significant number of 2-3 byte characters), then you could specify an index of 767 / 1.3.

So, if you are storing mostly 1-byte characters, your actual character limit would be more like: 767 / 1.3 = 590. But it turns out that is not the way it works. 255 characters is the limit.
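
The average-bytes estimate can be measured on real data. A sketch against the question's `products.name` column; since MySQL does not use this average when enforcing the limit, the number is only useful for understanding how much room the stored data actually needs.

```sql
-- Average bytes per character actually stored in products.name
SELECT SUM(LENGTH(name)) / SUM(CHAR_LENGTH(name)) AS avg_bytes_per_char
FROM products;

-- The estimate from the text: 767 bytes at an assumed 1.3 bytes per character
SELECT FLOOR(767 / 1.3) AS estimated_chars;  -- 590
```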

As stated in this MySQL documentation:

Prefix limits are measured in bytes, whereas the prefix length in CREATE INDEX statements is interpreted as number of characters for nonbinary data types (CHAR, VARCHAR, TEXT). Take this into account when specifying a prefix length for a column that uses a multi-byte character set.
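
To illustrate the quoted rule, here is a hypothetical `prefix_demo` table with utf8 and COMPACT rows: the prefix on the nonbinary VARCHAR column counts characters, so 255 is the most that fits (765 bytes), while the prefix on a VARBINARY column counts bytes and can go all the way to 767.

```sql
CREATE TABLE prefix_demo (
  txt VARCHAR(300)   NOT NULL,  -- nonbinary: prefix length is in characters
  bin VARBINARY(900) NOT NULL,  -- binary: prefix length is in bytes
  KEY txt_idx (txt(255)),       -- 255 chars * 3 bytes = 765 bytes worst case
  KEY bin_idx (bin(767))        -- exactly 767 bytes
) ENGINE=InnoDB ROW_FORMAT=COMPACT DEFAULT CHARSET=utf8;
```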

It seems that MySQL is advising people to do a calculation/guesstimation like the one I just did in order to determine the key size for a varchar column. But in fact you cannot specify an index larger than 255 characters for utf8 columns under the 767-byte limit.

Finally, if you refer back to my second link again, there is also this:

When the innodb_large_prefix configuration option is enabled, this length limit is raised to 3072 bytes, for InnoDB tables that use the DYNAMIC and COMPRESSED row formats.

So it seems like you can get much larger indexes if you want to, with a bit of tweaking: enable that option and make sure the row format is DYNAMIC or COMPRESSED. You can probably specify an index of 1023 or 1024 characters in that case.
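
On 5.5 or 5.6, like the questioner's 5.5.27, the tweaking involves a few server settings before the table can be switched to DYNAMIC. A sketch assuming an account allowed to set global variables; from 5.7.7 these values are already the defaults, and `innodb_file_format` and `innodb_large_prefix` no longer exist in 8.0. To survive a restart, the same settings would also need to go in the server's option file.

```sql
-- MySQL 5.5.14+ / 5.6 only
SET GLOBAL innodb_file_format = 'Barracuda';
SET GLOBAL innodb_file_per_table = ON;
SET GLOBAL innodb_large_prefix = ON;

-- Rebuild the table with a row format that allows 3072-byte prefixes
ALTER TABLE products ROW_FORMAT=DYNAMIC;

-- The question's original VARCHAR(512) utf8 column fits whole: 512 * 3 = 1536 bytes
ALTER TABLE products ADD INDEX product_name_full_idx (name);
```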

I just tried to create a composite index on a varchar(511) column with a tinyint(1) column and got the error message saying the max index size was 767 bytes. This makes me believe that MySQL assumes utf8 character set columns will contain 3 bytes per character (the maximum), and allows you to use 255 chars max. But perhaps that is only with composite indexes. I will update my answer as I find out more. But for now I'm leaving this as an edit.
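
For the composite case, InnoDB applies the 767-byte cap to each column's prefix (the combined key of a multi-column index may be up to 3072 bytes), so one way that should work is to give the long column an explicit prefix. A sketch on a hypothetical `composite_demo` table, utf8 with COMPACT rows:

```sql
CREATE TABLE composite_demo (
  label VARCHAR(511) NOT NULL,
  flag  TINYINT(1)   NOT NULL,
  -- Indexing label in full needs 511 * 3 = 1533 bytes, over the 767-byte cap.
  -- A 255-character prefix needs at most 765 bytes.
  KEY label_flag_idx (label(255), flag)
) ENGINE=InnoDB ROW_FORMAT=COMPACT DEFAULT CHARSET=utf8;
```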
