How can I optimize MySQL's ORDER BY RAND() function?

2021-11-20 00:00:00 performance random mysql

I'd like to optimize my queries, so I looked into mysql-slow.log.

Most of my slow queries contain ORDER BY RAND(). I cannot find a real solution to this problem. There is a possible solution at MySQLPerformanceBlog, but I don't think it is enough. On poorly optimized (or frequently updated, user-managed) tables it doesn't work, or I need to run two or more queries before I can select my PHP-generated random row.

Is there any solution for this issue?

A dummy example:

SELECT  accomodation.ac_id,
        accomodation.ac_status,
        accomodation.ac_name,
        accomodation.ac_status,
        accomodation.ac_images
FROM    accomodation, accomodation_category
WHERE   accomodation.ac_status != 'draft'
        AND accomodation.ac_category = accomodation_category.acat_id
        AND accomodation_category.acat_slug != 'vendeglatohely'
        AND ac_images != 'b:0;'
ORDER BY
        RAND()
LIMIT 1

Recommended answer

Try this:

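SELECT  *
FROM    (
        -- Initialise the counters: @cnt = row count + 1, @lim = rows wanted
        SELECT  @cnt := COUNT(*) + 1,
                @lim := 10
        FROM    t_random
        ) vars
STRAIGHT_JOIN
        (
        SELECT  r.*,
                @lim := @lim - 1        -- one fewer row still needed
        FROM    t_random r
        WHERE   (@cnt := @cnt - 1)      -- rows left, including this one
                -- keep the row with probability @lim / @cnt
                -- (the constant seed makes the sequence repeatable)
                AND RAND(20090301) < @lim / @cnt
        ) i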

This is especially efficient on MyISAM (since the COUNT(*) is instant), but even on InnoDB it's 10 times more efficient than ORDER BY RAND().

The main idea here is that we don't sort; instead, we keep two variables and calculate the running probability of a row being selected at the current step.
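
The running-probability idea can also be sketched outside SQL. The following is a minimal Python illustration, assuming an in-memory list stands in for the table; the function name sample_rows and its parameters are hypothetical and not part of the answer. Each row is kept with probability (rows still needed) / (rows still left), which mirrors the @lim / @cnt test in the query.

```python
import random


def sample_rows(rows, k, rng=random):
    """Pick up to k rows in a single pass, without sorting.

    Hypothetical helper for illustration only: `remaining` plays the
    role of @cnt and `needed` plays the role of @lim.
    """
    remaining = len(rows)        # rows not yet examined, including the current one
    needed = min(k, remaining)   # rows still to be picked
    picked = []
    for row in rows:
        # Keep this row with probability needed / remaining.
        if rng.random() < needed / remaining:
            picked.append(row)
            needed -= 1
        remaining -= 1
    return picked


sample = sample_rows(list(range(1, 101)), 10)
```

Because the probability reaches 1 when the rows left equal the rows still needed, and 0 once enough rows are picked, the pass always returns exactly the requested number of rows (or all rows if the table is smaller), in their original order.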

See this article in my blog for more detail:

  • Selecting random rows

Update:

If you need to select only a single random record, try this:

SELECT  aco.*
FROM    (
        SELECT  minid + FLOOR((maxid - minid + 1) * RAND()) AS randid
        FROM    (
                SELECT  MAX(ac_id) AS maxid, MIN(ac_id) AS minid
                FROM    accomodation
                ) q
        ) q2
JOIN    accomodation aco
ON      aco.ac_id =
        COALESCE
        (
        (
        SELECT  accomodation.ac_id
        FROM    accomodation
        WHERE   ac_id > randid
                AND ac_status != 'draft'
                AND ac_images != 'b:0;'
                AND NOT EXISTS
                (
                SELECT  NULL
                FROM    accomodation_category
                WHERE   acat_id = ac_category
                        AND acat_slug = 'vendeglatohely'
                )
        ORDER BY
                ac_id
        LIMIT   1
        ),
        (
        SELECT  accomodation.ac_id
        FROM    accomodation
        WHERE   ac_status != 'draft'
                AND ac_images != 'b:0;'
                AND NOT EXISTS
                (
                SELECT  NULL
                FROM    accomodation_category
                WHERE   acat_id = ac_category
                        AND acat_slug = 'vendeglatohely'
                )
        ORDER BY
                ac_id
        LIMIT   1
        )
        )

This assumes your ac_id values are distributed more or less evenly.
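
A small simulation shows why the even-distribution assumption matters. This is a sketch in Python rather than SQL, using a made-up list of ids and a hypothetical helper pick_id that loosely mimics the query: it draws a random point between the smallest and largest id, takes the first id greater than that point, and falls back to the smallest id when none exists.

```python
import bisect
import random
from collections import Counter


def pick_id(ids, rng):
    """Return the first id greater than a random point, else the smallest id.

    Hypothetical helper for illustration; `ids` must be sorted ascending.
    """
    point = rng.randint(ids[0], ids[-1])    # random point in [min, max]
    pos = bisect.bisect_right(ids, point)   # index of the first id > point
    return ids[pos] if pos < len(ids) else ids[0]


# Made-up ids: 1..5, then a large gap, then 100.
ids = [1, 2, 3, 4, 5, 100]
rng = random.Random(0)
counts = Counter(pick_id(ids, rng) for _ in range(10_000))
print(counts.most_common())
```

With these ids, 100 is the answer for 95 of the 100 possible random points, so it dominates the counts. An id that follows a large gap is therefore picked far more often than its neighbours.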
