How to randomly select one row taking weight into account?
I have a table that looks like this:
id: primary key
content: varchar
weight: int
What I want to do is randomly select one row from this table, taking the weight into account. For example, if I have 3 rows:
id, content, weight
1, "some content", 60
2, "other content", 40
3, "something", 100
The weights sum to 200, so the first row has a 30% chance of being selected, the second row a 20% chance, and the third row a 50% chance.
Is there a way to do that? If I have to execute 2 or 3 queries, that's not a problem.
Recommended answer
I think the simplest approach is actually weighted reservoir sampling:
SELECT
id,
-LOG(RAND()) / weight AS priority
FROM
your_table
ORDER BY priority
LIMIT 1;
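As a sanity check, the same key can be simulated outside the database. This is only a sketch: the names ROWS, pick_weighted and estimate_frequencies are made up for illustration, Python's random module stands in for RAND(), and all weights are assumed to be strictly positive.

```python
import math
import random
from collections import Counter

# Rows from the question as (id, weight) pairs.
ROWS = [(1, 60), (2, 40), (3, 100)]


def pick_weighted(rows, rng):
    """Return the id whose key -log(U) / weight is smallest."""
    best_id, best_key = None, math.inf
    for row_id, weight in rows:
        # rng.random() is in [0, 1); shift to (0, 1] so log(u) is defined.
        u = 1.0 - rng.random()
        key = -math.log(u) / weight
        if key < best_key:
            best_id, best_key = row_id, key
    return best_id


def estimate_frequencies(rows, trials=100_000, seed=0):
    """Repeat the selection many times and return the observed share per id."""
    rng = random.Random(seed)
    counts = Counter(pick_weighted(rows, rng) for _ in range(trials))
    return {row_id: counts[row_id] / trials for row_id, _ in rows}


if __name__ == "__main__":
    print(estimate_frequencies(ROWS))
```

With enough trials the observed shares should land close to 0.3, 0.2 and 0.5 for ids 1, 2 and 3.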
It's a great method that lets you choose M out of N elements, where each element's probability of being chosen is proportional to its weight. It works just as well when you only want one element. The method is described in this article. Note that the article picks the largest values of POW(RAND(), 1/weight), which is equivalent to picking the smallest values of -LOG(RAND()) / weight, because the logarithm preserves ordering.
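For the single-row case, the equivalence and the resulting probabilities can be written out as follows. This is a short sketch that assumes the random draws U_i are independent and uniform on (0, 1) and that every weight w_i is strictly positive; the symbols U_i, w_i and X_i are introduced here for illustration.

```latex
% U_i: independent uniform draws on (0, 1); w_i > 0: the row weights.
\begin{align*}
\arg\max_i \, U_i^{1/w_i}
  &= \arg\max_i \, \frac{\log U_i}{w_i}
   = \arg\min_i \, \frac{-\log U_i}{w_i}, \\
X_i = \frac{-\log U_i}{w_i}
  &\;\Rightarrow\; P(X_i > x) = P\!\left(U_i < e^{-w_i x}\right) = e^{-w_i x}, \\
P\!\left(X_i = \min_j X_j\right)
  &= \frac{w_i}{\sum_j w_j}.
\end{align*}
```

Each key X_i is exponentially distributed with rate w_i, and the smallest of several independent exponentials belongs to index i with probability w_i divided by the sum of all rates. For the example rows this gives 60/200 = 0.3, 40/200 = 0.2 and 100/200 = 0.5.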