MySQL indexes - what are the best practices?
I've been using indexes on my MySQL databases for a while now but never properly learnt about them. Generally I put an index on any field that I will be searching or selecting on in a WHERE clause, but sometimes it doesn't seem so black and white.
What are the best practices for MySQL indexes?
Example situations/dilemmas:
If a table has six columns and all of them are searchable, should I index all of them or none of them?
What are the negative performance impacts of indexing?
If I have a VARCHAR(2500) column which is searchable from parts of my site, should I index it?
Recommended answer
You should definitely spend some time reading up on indexing. There's a lot written about it, and it's important to understand what's going on.
Broadly speaking, an index imposes an ordering on the rows of a table.
For simplicity's sake, imagine a table is just a big CSV file. Whenever a row is inserted, it's inserted at the end. So the "natural" ordering of the table is just the order in which rows were inserted.
Imagine you've got that CSV file loaded up in a very rudimentary spreadsheet application. All this spreadsheet does is display the data and number the rows in sequential order.
Now imagine that you need to find all the rows that have some value "M" in the third column. Given what you have available, you have only one option. You scan the table checking the value of the third column for each row. If you've got a lot of rows, this method (a "table scan") can take a long time!
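To make the table scan concrete, here is a minimal Python sketch of it. The CSV contents and the function name `table_scan` are made up for illustration; this is a toy model of the idea, not how MySQL is implemented.

```python
import csv
import io

# Hypothetical sample data standing in for the "big CSV file".
CSV_TEXT = """1,alice,M
2,bob,F
3,carol,M
4,dave,X
"""

def table_scan(rows, column, value):
    """Return the 1-based row numbers whose `column` equals `value`.

    Every row is inspected, so the work grows linearly with the table size.
    """
    matches = []
    for row_number, row in enumerate(rows, start=1):
        if row[column] == value:
            matches.append(row_number)
    return matches

rows = list(csv.reader(io.StringIO(CSV_TEXT)))
# Column index 2 is the third column.
print(table_scan(rows, 2, "M"))
```

With four rows this is instant, but the loop always touches every row, whether or not it matches.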
Now imagine that in addition to this table, you've got an index. This particular index is the index of values in the third column. The index lists all of the values from the third column, in some meaningful order (say, alphabetically) and for each of them, provides a list of row numbers where that value appears.
Now you have a good strategy for finding all the rows where the value of the third column is "M". For instance, you can perform a binary search! Whereas the table scan requires you to look at N rows (where N is the number of rows), the binary search only requires that you look at roughly log2(N) index entries to find the first match, even in the very worst case. Wow, that's sure a lot easier!
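As a rough sketch of that lookup, the index can be modelled as a sorted list of (value, row number) pairs and searched with Python's `bisect` module. The sample values and the names `build_index` and `index_lookup` are assumptions for illustration; real engines typically use B-tree structures rather than a flat sorted list.

```python
from bisect import bisect_left

# Hypothetical third-column values, in insertion order (row 1 first).
third_column = ["M", "F", "M", "X", "A", "M"]

def build_index(values):
    """Return a sorted list of (value, row_number) pairs."""
    return sorted((value, row_number)
                  for row_number, value in enumerate(values, start=1))

def index_lookup(index, value):
    """Binary-search to the first entry for `value`, then read its row numbers."""
    # (value, 0) sorts before every real entry for `value`, since rows start at 1.
    position = bisect_left(index, (value, 0))
    row_numbers = []
    while position < len(index) and index[position][0] == value:
        row_numbers.append(index[position][1])
        position += 1
    return row_numbers

index = build_index(third_column)
print(index_lookup(index, "M"))
```

The binary search finds the start of the "M" entries in a logarithmic number of steps; after that, only the matching entries are read.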
Of course, if you have this index, and you're adding rows to the table (at the end, since that's how our conceptual table works), you need to update the index each and every time. So you do a little more work while you're writing new rows, but you save a ton of time when you're searching for something.
So, in general, indexing creates a tradeoff between read efficiency and write efficiency. With no indexes, inserts can be very fast -- the database engine just adds a row to the table. As you add indexes, the engine must update each index while performing the insert.
On the other hand, reads become a lot faster.
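The write-side cost can be sketched the same way. The `Table` class below is hypothetical and only counts how many index updates each insert triggers; it says nothing about actual MySQL timings.

```python
from bisect import insort

class Table:
    """A toy table: rows are appended, and every index is kept sorted on insert."""

    def __init__(self, indexed_columns):
        self.rows = []
        # One sorted list of (value, row_number) pairs per indexed column.
        self.indexes = {column: [] for column in indexed_columns}
        self.index_updates = 0

    def insert(self, row):
        self.rows.append(row)  # the cheap part: append at the end
        row_number = len(self.rows)
        for column, index in self.indexes.items():
            # The extra work: each index must be updated for every new row.
            insort(index, (row[column], row_number))
            self.index_updates += 1

unindexed = Table(indexed_columns=[])
indexed = Table(indexed_columns=[0, 1, 2])
for row in [("x", 3, "M"), ("y", 1, "F"), ("z", 2, "M")]:
    unindexed.insert(row)
    indexed.insert(row)

print(unindexed.index_updates, indexed.index_updates)
```

Three inserts into a table with three indexes cause nine index updates, versus none for the unindexed table. That is the tradeoff in miniature: indexing all six columns of a six-column table means six index updates per inserted row.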
Hopefully that covers your first two questions (as others have answered -- you need to find the right balance).
Your third scenario is a little more complicated. If you're using LIKE, the index will typically help with your read speed only up to the first "%". In other words, if you're SELECTing WHERE column LIKE 'foo%bar%', the database will use the index to find all the rows where column starts with "foo", and then need to scan that intermediate rowset to find the subset that contains "bar". SELECT ... WHERE column LIKE '%bar%' can't use the index this way: the index is sorted by the leading characters of each value, so a pattern that begins with a wildcard gives the engine no starting point to search from.
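Here is a small Python sketch of why the leading part of the pattern matters. The sample strings and function names are invented for illustration, and the sorted list only approximates how a real index behaves.

```python
from bisect import bisect_left

# Hypothetical indexed column values, kept sorted as an index would be.
index = sorted(["apple", "foobar", "foo-bar-baz", "food", "fox", "zebra", "foxbar"])

def prefix_range(index, prefix):
    """Binary-search to the first entry >= prefix, then read the contiguous matches."""
    start = bisect_left(index, prefix)
    end = start
    while end < len(index) and index[end].startswith(prefix):
        end += 1
    return index[start:end]

def like_foo_bar(index):
    """Emulate LIKE 'foo%bar%': seek on the 'foo' prefix, then filter the candidates."""
    candidates = prefix_range(index, "foo")
    return [value for value in candidates if "bar" in value[len("foo"):]]

def like_contains_bar(index):
    """Emulate LIKE '%bar%': there is no prefix to seek on, so check every entry."""
    return [value for value in index if "bar" in value]

print(like_foo_bar(index))
print(like_contains_bar(index))
```

All values starting with "foo" sit next to each other in sorted order, so the first query only inspects that slice. Values containing "bar" are scattered ("foobar" and "foxbar" are separated by "food" and "fox"), so the second query has to walk the whole list.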
Finally, you need to start thinking about indexes on more than one column. The concept is the same, and behaves similarly to the LIKE stuff: essentially, if you have an index on (a,b,c), the engine will continue using the index from left to right as best it can. So a search on column a might use the (a,b,c) index, as would one on (a,b). However, the engine would need to do a full table scan if you were searching WHERE b=5 AND c=1, because that condition skips the leading column a.
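The left-to-right behaviour falls out of how tuples sort. In this sketch the index on (a,b,c) is modelled as a sorted list of tuples; the data and the names `seek_prefix` and `scan_b_c` are hypothetical, and the upper-bound trick assumes integer column values.

```python
from bisect import bisect_left

# Hypothetical rows as (a, b, c) tuples; sorting them mimics an index on (a, b, c).
index = sorted([(1, 5, 1), (1, 5, 2), (1, 7, 1), (2, 5, 1), (2, 6, 3), (3, 5, 1)])

def seek_prefix(index, prefix):
    """Binary-search the contiguous run whose leading columns equal `prefix`.

    Assumes integer values, so "prefix with its last value + 1" is a valid upper bound.
    """
    low = bisect_left(index, prefix)
    high = bisect_left(index, prefix[:-1] + (prefix[-1] + 1,))
    return index[low:high]

def scan_b_c(index, b, c):
    """WHERE b = ? AND c = ?: no leading column to seek on, so check every entry."""
    return [entry for entry in index if entry[1] == b and entry[2] == c]

print(seek_prefix(index, (1,)))    # search on a
print(seek_prefix(index, (1, 5)))  # search on (a, b)
print(scan_b_c(index, 5, 1))       # search on (b, c) only
```

Entries with a=1, or with a=1 and b=5, form one contiguous block that a binary search can jump to. Entries with b=5 and c=1 are spread across the whole list, one under each value of a, so there is no single block to seek into.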
Hopefully this helps shed a little light, but I must reiterate that you're best off spending a few hours digging around for good articles that explain these things in depth. It's also a good idea to read your particular database server's documentation. The way indices are implemented and used by query planners can vary pretty widely.