MySQL 8: calculating an average partitioned by date

2021-09-25 00:00:00 window-functions mysql aggregate-functions mysql-8.0

I've set up a fiddle here: https://www.db-fiddle.com/f/snDGExYZgoYASvWkDGHKDC/2

Also:

Schema:

CREATE TABLE `scores` (
  `id` bigint unsigned NOT NULL AUTO_INCREMENT,
  `shift_id` int unsigned NOT NULL,
  `employee_name` varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL,
  `score` double(8,2) unsigned NOT NULL,
  `created_at` timestamp NOT NULL,
  PRIMARY KEY (`id`)
);

INSERT INTO scores(shift_id, employee_name, score, created_at) VALUES
  (1, "John", 6.72, "2020-04-01 00:00:00"),
  (1, "Bob", 15.71, "2020-04-01 00:00:00"),
  (1, "Bob", 54.02, "2020-04-01 08:00:00"),
  (1, "John", 23.55, "2020-04-01 13:00:00"),
  (2, "John", 9.13, "2020-04-02 00:00:00"),
  (2, "Bob", 44.76, "2020-04-02 00:00:00"),
  (2, "Bob", 33.40, "2020-04-02 08:00:00"),
  (2, "James", 20, "2020-04-02 00:00:00"),
  (3, "John", 20, "2020-04-02 00:00:00"),
  (3, "Bob", 20, "2020-04-02 00:00:00"),
  (3, "Bob", 30, "2020-04-02 08:00:00"),
  (3, "James", 10, "2020-04-02 00:00:00");

Query 1:

-- This doesn't work
SELECT
  employee_name,
  DATE_FORMAT(created_at, '%Y-%m-%d') AS `date`,
  ANY_VALUE(AVG(score) OVER(PARTITION BY(ANY_VALUE(created_at)))) AS `average_score`
FROM scores
GROUP BY employee_name, date;

Query 2:

SELECT
  employee_name,
  DATE_FORMAT(created_at, '%Y-%m-%d') AS `date`,
  ANY_VALUE(AVG(score)) AS `average_score`
FROM scores
GROUP BY employee_name, date;

Query 3:

-- This works but scales very poorly with millions of rows
SELECT
  t1.employee_name,
  ANY_VALUE(DATE_FORMAT(t1.created_at, '%Y-%m-%d')) AS `date`,
  ANY_VALUE(SUM(t1.score) / (
    SELECT SUM(t2.score)
    FROM scores t2
    WHERE date(t2.created_at) = date(t1.created_at)
  ) * 100) AS `average_score`
FROM scores t1
GROUP BY t1.employee_name, date;

The third query executes correctly but in my testing has been very slow when scaling to millions of rows. I think this is because it is a correlated subquery and runs millions of times.

The first two queries are my attempts to use MySQL 8 window functions to partition the average calculation. However, they give unexpected results. The average_score values for a given day should add up to 100, as they do in the third query.

Does anyone know of a more efficient way to calculate this?

It's also worth noting that in reality, there will also be a WHERE IN on the queries to filter by specific shift_ids. The number of shift_ids given could be in the hundreds of thousands, up to a million.

One other thing being considered is Elasticsearch. Would it help calculate these more quickly?

Recommended answer

You can use window functions. The trick is to take a window sum of the total score per employee for each day, like so:

select
  employee_name,
  date(created_at) created_date,
  100 * sum(score) / sum(sum(score)) over(partition by date(created_at)) monthly_score
from scores
group by employee_name, date(created_at)

In your DB Fiddle, this yields:

| employee_name | created_date | monthly_score |
| ------------- | ------------ | ------------- |
| John          | 2020-04-01   | 30.27         |
| Bob           | 2020-04-01   | 69.73         |
| John          | 2020-04-02   | 15.55342      |
| Bob           | 2020-04-02   | 68.42864      |
| James         | 2020-04-02   | 16.01794      |
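
To see what the nested `sum(sum(score)) over(...)` computes, the same two-step arithmetic can be reproduced outside the database. The sketch below is only an illustration in plain Python, not part of the original answer: the function name `daily_share` and the dictionary layout are made up for this example, and the rows are copied from the question's INSERT statement (the `shift_id` column is left out because the query does not use it). The inner `sum(score)` corresponds to the per-employee, per-day total, and the outer window sum corresponds to the per-day total shared by every employee on that date.

```python
from collections import defaultdict

# Rows from the question's INSERT: (employee_name, score, created_at).
rows = [
    ("John", 6.72, "2020-04-01 00:00:00"),
    ("Bob", 15.71, "2020-04-01 00:00:00"),
    ("Bob", 54.02, "2020-04-01 08:00:00"),
    ("John", 23.55, "2020-04-01 13:00:00"),
    ("John", 9.13, "2020-04-02 00:00:00"),
    ("Bob", 44.76, "2020-04-02 00:00:00"),
    ("Bob", 33.40, "2020-04-02 08:00:00"),
    ("James", 20, "2020-04-02 00:00:00"),
    ("John", 20, "2020-04-02 00:00:00"),
    ("Bob", 20, "2020-04-02 00:00:00"),
    ("Bob", 30, "2020-04-02 08:00:00"),
    ("James", 10, "2020-04-02 00:00:00"),
]


def daily_share(rows):
    """Return {(employee_name, day): percentage of that day's total score}."""
    per_employee_day = defaultdict(float)  # like GROUP BY employee_name, date(created_at)
    per_day = defaultdict(float)           # like the window PARTITION BY date(created_at)
    for name, score, created_at in rows:
        day = created_at[:10]              # keep only the YYYY-MM-DD part
        per_employee_day[(name, day)] += score
        per_day[day] += score
    return {
        (name, day): 100 * total / per_day[day]
        for (name, day), total in per_employee_day.items()
    }


shares = daily_share(rows)
for (name, day), pct in sorted(shares.items(), key=lambda kv: (kv[0][1], kv[0][0])):
    print(day, name, round(pct, 5))
```

Each day's total is accumulated once and reused for every employee on that date, which is the work the window function saves compared with re-running the correlated subquery of Query 3 for each group.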
