MySQL选择以NULL分隔的数据系列之间的日期范围

2022-01-21 00:00:00 dataset mysql date-range

这是解决 更复杂的问题.

有一个数据表:

+------------+------+----------+-----------+
| date       | data | data_max | data_diff |
+------------+------+----------+-----------+
| 2017-01-02 |    2 |        2 |      NULL |
| 2017-01-03 |    4 |        4 |      NULL |
| 2017-01-04 |    1 |        4 |        -3 |
| 2017-01-05 |    3 |        4 |        -1 |
| 2017-01-06 |    1 |        4 |        -3 |
| 2017-01-07 |    4 |        4 |      NULL |
| 2017-01-08 |    5 |        5 |      NULL |
| 2017-01-09 |   -2 |        5 |        -7 |
| 2017-01-10 |    0 |        5 |        -5 |
| 2017-01-11 |   -5 |        5 |       -10 |
| 2017-01-12 |    6 |        6 |      NULL |
| 2017-01-13 |    4 |        6 |        -2 |
+------------+------+----------+-----------+

我想为每个数据子集分别计算 data_diff 的最小值和最大值.每个数据子集都以 NULL 开头(但最后一个可能不以 NULL 结尾,而是以数据结尾)我还需要每个数据子集的开始和结束日期,以便稍后用于计算最小值、最大值.我想获取日期范围:

I want to calculate Min and Max values of data_diff but separately for each data subset. Each subset of data starts with NULL (but the last one may not end with NULL but with the data) I need also start and end date of each data subset that I can later use for calculating Min, Max values. I would like to get date ranges:

+----------------+--------------+
| diff_date_from | diff_date_to |
+----------------+--------------+
| 2017-01-04     | 2017-01-06   |
| 2017-01-09     | 2017-01-11   |
| 2017-01-13     | 2017-01-13   |
+----------------+--------------+

如果您想获取示例数据,请在此处查询:

If you would like to get the example data here's a query:

CREATE TABLE IF NOT EXISTS `test`
(
   `date_time` DATETIME UNIQUE NOT NULL,
   `data` INT NOT NULL
)
ENGINE InnoDB;

INSERT INTO `test` VALUES
('2017-01-02', 2),
('2017-01-03', 4),
('2017-01-04', 1),
('2017-01-05', 3),
('2017-01-06', 1),
('2017-01-07', 4),
('2017-01-08', 5),
('2017-01-09', -2),
('2017-01-10', 0),
('2017-01-11', -5),
('2017-01-12', 6),
('2017-01-13', 4)

;

SELECT 
    DATE(`date_time`) AS `date`,
    `data`,
    `data_max`,
    IF(`data` < `data_max`,  - (`data_max` - `data`), NULL) 
    AS `data_diff`

FROM
    (
    SELECT 
        `date_time`,
        `data`,
        MAX(`data`) OVER (ORDER BY `date_time` ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS `data_max`
    FROM
        `test`
    ) t
;

是否可以编写一个提供上述日期范围的查询?还是必须有一个程序或某种技巧?

Is it possible to write a single query that will provide date ranges as above? Or there must be a procedure or some sort of trick applied?

也许带有 OVER 的窗口函数会有所帮助,但我不知道如何在非 NULL 的当前行和从 NULL 前面的行开始的前行之间指定其窗口边界.这完全可行吗?

Maybe a window function with OVER could help but I'm not aware how to specify its window boundary between current row that is not NULL and preceding rows starting from a row preceded by NULL. Is this feasible at all?

RANGE 运算符用于设置窗口边界 文档看起来很有希望:

There is RANGE operator for setting window boundary Documentation that looks promising:

PRECEDING:对于 ROWS,边界是当前行之前的 expr 行.对于 RANGE,边界是值等于当前行值减去 expr 的行;如果当前行值为 NULL,则界限为行的同行.

PRECEDING: For ROWS, the bound is expr rows before the current row. For RANGE, the bound is the rows with values equal to the current row value minus expr; if the current row value is NULL, the bound is the peers of the row.

还有一部分:

在前 10 和后 10 之间按 X ASC 范围排序

ORDER BY X ASC RANGE BETWEEN 10 PRECEDING AND 10 FOLLOWING

帧从 NULL 开始并在 NULL 停止,因此只包含行值为 NULL.

The frame starts at NULL and stops at NULL, thus includes only rows with value NULL.

但我不明白只包含带有null 的行.也许对于日期范围 2017-01-022017-01-03 但对于 2017-01-032017-01-07怎么会?

But I don't get the point of inlcuding only rows with null. Perhaps for the date range 2017-01-02 to 2017-01-03 but for 2017-01-03 to 2017-01-07 how come?

推荐答案

我一直在研究您的更复杂的问题"(仍在寻找答案),但这里有一个解决方案.鉴于您使用的是窗口函数,我假设您使用的是 MySQL 8,因此也可以使用 CTE:

I've been looking at your "more complex problem" (still working on an answer) but here is a solution for this problem. Given that you are using window functions I'm assuming you're using MySQL 8 and so can use CTEs as well:

WITH cte AS (SELECT DATE(`date_time`) AS `date`,
                    `data`,
                    MAX(`data`) OVER (ORDER BY `date_time` ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS `data_max`
             FROM `test`),
cte2 AS (SELECT `date`,
                `data`,
                `data_max`,
                CASE WHEN `data` < `data_max` THEN `data` - `data_max` END AS `data_diff`
         FROM cte)
SELECT `data_max`, 
       MIN(CASE WHEN `data_diff` IS NOT NULL THEN date END) AS diff_date_from,
       MAX(CASE WHEN `data_diff` IS NOT NULL THEN date END) AS diff_date_to
FROM cte2
GROUP BY `data_max`
HAVING diff_date_from IS NOT NULL

输出:

data_max    diff_date_from  diff_date_to
4           2017-01-04      2017-01-06
5           2017-01-09      2017-01-11
6           2017-01-13      2017-01-13

关于 dbfiddle 的演示

相关文章