如何提高 Django 管理员搜索相关字段(MySQL)中的查询性能
在 Django 中我有这个:
In Django I have this:
models.py
class Book(models.Model):
isbn = models.CharField(max_length=16, db_index=True)
title = models.CharField(max_length=255, db_index=True)
... other fields ...
class Author(models.Model):
first_name = models.CharField(max_length=128, db_index=True)
last_name = models.CharField(max_length=128, db_index=True)
books = models.ManyToManyField(Book, blank=True)
... other fields ...
admin.py
class AuthorAdmin(admin.ModelAdmin):
search_fields = ('first_name', 'last_name', 'books__isbn', 'books__title')
...
我的问题是,当我从 Author admin list page 使用 2 个或更多短期术语进行搜索时,MySQL 开始花费大量时间(对于 3 个术语查询,至少需要 8 秒).我有大约 5000 位作者和 2500 本书.这里的short很重要.如果我搜索a b c",那么 3 个非常短的术语,我没有足够的耐心等待结果(我至少等了 2 分钟).相反,如果我搜索所有蜜蜂线索",我会在 2 秒内得到结果.所以这个问题看起来真的是相关领域的短期问题.
My problem is when I do a search from the Author admin list page with 2 or more short terms, MySQL start to take a lot of time (at least 8 sec. for a 3 terms query). I have around 5000 Authors and 2500 Books. The short here is very important. If I search for 'a b c', so 3 really short terms, I'm not enough patient to wait for the result (I waited at least 2 min.). Instead if I search for 'all bee clue' I got the result in 2 sec. So the problem look the be really with short terms on related fields.
此搜索产生的 SQL 查询有很多 JOIN、LIKE、AND 和 OR,但没有子查询.
The SQL query resulting from this search have a lot of JOIN, LIKE, AND and OR but no subquery.
我正在使用 MySQL 5.1,但我尝试使用 5.5 没有更多成功.
I'm using MySQL 5.1 but I tried with 5.5 with no more success.
我还尝试将 innodb_buffer_pool_size
增加到一个非常大的值.没有任何改变.
I also tried to increase the innodb_buffer_pool_size
to a really large value. That change nothing.
我现在唯一可以提高性能的想法是非规范化为 isbn
和 title
字段(即直接将它们复制到作者中)但我必须添加一堆机制来使这些字段与 Book 中的真实字段保持同步.
The only idea I have right now to improve the performance is to denormalize to isbn
and title
field (ie copy them directly into Authors) but I will have to add a bunch of mechanics to keep these fields in sync with the real ones in Book.
关于如何改进此查询的任何建议?
Any suggestions on how to improve this query?
推荐答案
经过大量调查,我发现问题出在如何为管理员搜索字段构建搜索查询(在 ChangeList
类).在多词搜索(用空格分隔的词)中,每个词都通过链接一个新的 filter()
添加到 QuerySet.当search_fields
中有一个或多个相关字段时,创建的SQL查询会有很多JOIN
和很多JOIN
一个接一个地串起来对于每个相关领域(请参阅我的 相关问题 获取一些示例和更多信息).JOIN
链的存在使得每个术语将仅在数据过滤器的子集中通过先例术语进行搜索,并且最重要的是,相关字段只需要一个术语(而不是需要有所有条款)进行匹配.请参阅跨越多值关系 在 Django 文档中了解有关此主题的更多信息.我很确定这是管理员搜索字段大部分时间想要的行为.
After a lot of investigations I found that the problem come from how the search query is built for the admin search field (in the ChangeList
class). In a multi-terms search (words separated by space) each term is added to the QuerySet by chaining a new filter()
. When there's one or more related fields in the search_fields
, the created SQL query will have a lot of JOIN
chained one after the other with many JOIN
for each related field (see my related question for some examples and more info). This chain of JOIN
is there so that each term will be search only in the subset of data filter by the precedent term AND, most important, that a related field need to only have one term (vs needing to have ALL terms) to make a match. See Spanning multi-valued relationships in the Django docs for more info on this subject. I'm pretty sure it's the behavior wanted most of the time for the admin search field.
此查询(涉及相关字段)的缺点是性能变化(执行查询的时间)可能非常大.这取决于很多因素:搜索词的数量、搜索的词、字段搜索的类型(VARCHAR 等)、字段搜索的数量、表中的数据、表的大小等.使用正确的组合很容易有一个几乎永远需要的查询(一个需要超过 10 分钟的查询.对我来说是一个在这个搜索字段的上下文中需要永远的查询).
The drawback of this query (with related fields involved) is that the variation in performance (time to perform the query) can be really large. It depends on a lot of factors: number of searched terms, terms searched, kind of field search (VARCHAR, etc.), number of field search, data in the tables, size of the tables, etc. With the right combination it's easy to have a query that will take mostly forever (a query that take more then 10 min. for me is a query that take forever in the context of this search field).
之所以需要这么长时间,是因为数据库需要为每个词条创建一个临时表,并且大部分时间都对其进行扫描以搜索下一个词条.所以,这加起来真的很快.
The reason why it can take so long is that the database need to create a temporary table for each term and scan it mostly entirely to search for the next term. So, this adds up really quickly.
提高性能的一个可能的改变是ANDed在同一个filter()
中的所有术语.这样,他们将只有一个 JOIN
相关字段(如果是多对多,则为 2 个)而不是更多.这个查询会快很多,而且性能变化很小.缺点是相关字段必须包含所有要匹配的术语,因此在许多情况下您可以获得较少的匹配项.
A possible change to do to improve the performance is to ANDed all terms in the same filter()
. This way their will be only one JOIN
by related field (or 2 if it's a many to many) instead of many more. This query will be a lot faster and with really small performance variation. The drawback is that related fields will have to have ALL the terms to match, so, you can get less matches in many cases.
正如 trinchet 所要求的,这是更改搜索行为所需的内容(对于 Django 1.7).您需要覆盖您想要进行此类搜索的管理类的 get_search_results()
.您需要将所有方法代码从基类 (ModelAdmin
) 复制到您自己的类中.然后你需要改变这些行:
As asked by trinchet here’s what’s needed to do the change of search behavior (for Django 1.7). You need to override the get_search_results()
of the admin classes where you want this kind of search. You need to copy all the method code from the base class (ModelAdmin
) to your own class. Then you need to change those lines:
for bit in search_term.split():
or_queries = [models.Q(**{orm_lookup: bit})
for orm_lookup in orm_lookups]
queryset = queryset.filter(reduce(operator.or_, or_queries))
至于:
and_queries = []
for bit in search_term.split():
or_queries = [models.Q(**{orm_lookup: bit})
for orm_lookup in orm_lookups]
and_queries.append(Q(reduce(operator.or_, or_queries)))
queryset = queryset.filter(reduce(operator.and_, and_queries))
此代码未经测试.我的原始代码是针对 Django 1.4 的,我只是在此处将其改编为 1.7.
This code is not tested. My original code was for Django 1.4 and I just adapt it for 1.7 here.
相关文章