将 `rand()` 与 `have` 一起使用
我有一个包含记录列表的表.每次迭代,都必须从特定偏移量开始随机选取一组.每行都有机会被挑选(例如,新的或不经常挑选的行会被挑选更多).
I have a table that contains a list of records. Each iteration, a random set of these must be picked, starting from a specific offset. Each row has a chance to be picked (so e.g. new or not often picked rows are picked more).
但是,something 不起作用,导致返回的行不满足使用别名 rand()
的条件.
However, something doesn't work, causing rows to be returned that do not satisfy a condition using an aliased rand()
.
我正在尝试使用以下查询:
I'm attempting to use the following query:
select
id,
probability,
rand() rolledChance
from records
where id > :offset
having rolledChance < probability;
其中 :offset
是一个准备好的语句参数,并且是该用户在最后一次迭代中最后扫描的 id.
Where :offset
is a prepared statement parameter, and is the last scanned id in the last iteration for this user.
在这样创建的表上(这是表的相关子集):
On a table created like this (which is the relevant subset of the table):
CREATE TABLE records (id INT, probability FLOAT);
其中probability 是records
表中介于0 和1 之间的值.但是,这将返回条件不满足的行.我通过以下查询检查了这一点:
Where probability is a value between 0 and 1 on the table records
. However, this returns rows where the condition does not satisfy. I checked this with the following query:
select
*,
x.rolledChance < x.probability shouldPick
from
(select
id,
probability,
rand() rolledChance
from records
having rolledChance < probability
) x;
返回的几行是:
id probability rolledChance shouldPick
12 0.546358 0.015139976530466207 1
26 0.877424 0.9730734508233829 0
46 0.954425 0.35213605347288407 1
当我如下改变第二个查询的用途时,它按预期工作,并且只返回 rolledChance
实际上低于 probability
的行:
When I repurpose the second query as follows, it works as expected, and only returns rows where rolledChance
is actually lower than probability
:
select
*,
x.rolledChance < x.probability shouldPick
from
(select id, probability, rand() rolledChance from records) x
where rolledChance < probability;
那么我错过了什么?probability
和 rolledChance
的使用是否与我在比较中的想法不同?rand()
每次在同一个查询中使用别名时都会评估吗?
So what am I missing? Are the probability
and rolledChance
used differently than I thought in the comparison? Is the rand()
evaluated every time the alias is used in the same query?
版本输出:mysql Ver 15.1 Distrib 10.0.28-MariaDB,用于使用 readline 5.2 的 debian-linux-gnu (x86_64)
,在 Debian Jessie 上运行.
Version output: mysql Ver 15.1 Distrib 10.0.28-MariaDB, for debian-linux-gnu (x86_64) using readline 5.2
, running on Debian Jessie.
推荐答案
我认为问题是 HAVING 是在 GROUP BY 之后应用的,但仍然在 SELECT 阶段之前.我意识到这很令人困惑,因为 HAVING 子句引用了 SELECT 语句中的列,但我认为它基本上只是执行 SELECT 语句中的任何内容两次 - 一次用于拥有,然后再次用于 SELECT.
I think the problem is that HAVING is applied after GROUP BY, but still before the SELECT phase. I realise it's confusing because the HAVING clause references a column from the SELECT statement, but I think it basically just executes whatever is in the SELECT statement twice - once for the having, and then again for the SELECT.
例如,请参阅此答案.
注意,这特别令人困惑,因为如果您在 HAVING 子句中引用未出现在 SELECT 语句中的列名,则会引发错误.
Note, it's especially confusing because if you refer to a column name that doesn't appear in the SELECT statement in a HAVING clause it'll throw an error.
例如,这个小提琴
但是按照上面的那个小提琴,它仍然可以让你根据一个没有出现在输出中的函数的结果进行实际过滤.长话短说,HAVING 子句仍在做你想做的事,但你不能同时过滤一个随机值并使用这种方法同时显示它.如果你需要这样做,你需要先使用子查询来固定值,然后外部查询才能过滤并显示在它上面.
But as per that fiddle above, it'll still let you actually filter based on the result of a function that doesn't appear in the output. Long story short, the HAVING clause is still doing what you want, but you can't both filter on a random value and display it at the same time using that approach. If you need to do that, you need to use a subquery to fix the value first, then the outer query can filter and display on it.
另外,为了清楚起见,可能值得在 having 子句中使用 RAND(),而不是 SQL 部分.虽然我知道这个问题是在问为什么它这样做而不是试图专门解决问题.
Also, to make it clear, it's probably worth just using RAND() in the having clause, not the SQL part. Though I get that this question is asking why it's doing this rather than trying to solve the problem specifically.
相关文章