将连续的日期有效期间隔连接在一起
我有一系列记录,其中包含一些具有时间有效性的信息(产品类型).
I have a series of records containing some information (product type) with temporal validity.
如果分组信息(产品类型)保持不变,我想将相邻的有效间隔融合在一起.我不能使用带有 MIN
和 MAX
的简单 GROUP BY
,因为某些产品类型(在示例中为 A
) 可以离开"和回来".
I would like to meld together adjacent validity intervals, provided that the grouping information (the product type) stays the same. I cannot use a simple GROUP BY
with MIN
and MAX
, because some product types (A
, in the example) can "go away" and "come back".
使用 Oracle 11g.
Using Oracle 11g.
MySQL 的一个类似问题是:我如何在 MySQL 中做一个连续的组?
A similar question for MySQL is: How can I do a contiguous group by in MySQL?
输入数据:
| PRODUCT | START_DATE | END_DATE |
|---------|----------------------------------|----------------------------------|
| A | July, 01 2013 00:00:00+0000 | July, 31 2013 00:00:00+0000 |
| A | August, 01 2013 00:00:00+0000 | August, 31 2013 00:00:00+0000 |
| A | September, 01 2013 00:00:00+0000 | September, 30 2013 00:00:00+0000 |
| B | October, 01 2013 00:00:00+0000 | October, 31 2013 00:00:00+0000 |
| B | November, 01 2013 00:00:00+0000 | November, 30 2013 00:00:00+0000 |
| A | December, 01 2013 00:00:00+0000 | December, 31 2013 00:00:00+0000 |
| A | January, 01 2014 00:00:00+0000 | January, 31 2014 00:00:00+0000 |
| A | February, 01 2014 00:00:00+0000 | February, 28 2014 00:00:00+0000 |
| A | March, 01 2014 00:00:00+0000 | March, 31 2014 00:00:00+0000 |
预期结果:
| PRODUCT | START_DATE | END_DATE |
|---------|---------------------------------|----------------------------------|
| A | July, 01 2013 00:00:00+0000 | September, 30 2013 00:00:00+0000 |
| B | October, 01 2013 00:00:00+0000 | November, 30 2013 00:00:00+0000 |
| A | December, 01 2013 00:00:00+0000 | March, 31 2014 00:00:00+0000 |
查看完整的 SQL Fiddle.
推荐答案
这是一个间隙和孤岛问题.有多种方法可以接近它;这使用 lead
和 lag
分析函数:
This is a gaps-and-islands problem. There are various ways to approach it; this uses lead
and lag
analytic functions:
select distinct product,
case when start_date is null then lag(start_date)
over (partition by product order by rn) else start_date end as start_date,
case when end_date is null then lead(end_date)
over (partition by product order by rn) else end_date end as end_date
from (
select product, start_date, end_date, rn
from (
select t.product,
case when lag(end_date)
over (partition by product order by start_date) is null
or lag(end_date)
over (partition by product order by start_date) != start_date - 1
then start_date end as start_date,
case when lead(start_date)
over (partition by product order by start_date) is null
or lead(start_date)
over (partition by product order by start_date) != end_date + 1
then end_date end as end_date,
row_number() over (partition by product order by start_date) as rn
from t
)
where start_date is not null or end_date is not null
)
order by start_date, product;
PRODUCT START_DATE END_DATE
------- ---------- ---------
A 01-JUL-13 30-SEP-13
B 01-OCT-13 30-NOV-13
A 01-DEC-13 31-MAR-14
SQL 小提琴
最里面的查询查看产品的前后记录,如果记录不连续,则仅保留开始和/或结束时间:
The innermost query looks at the preceding and following records for the product, and only retains the start and/or end time if the records are not contiguous:
select t.product,
case when lag(end_date)
over (partition by product order by start_date) is null
or lag(end_date)
over (partition by product order by start_date) != start_date - 1
then start_date end as start_date,
case when lead(start_date)
over (partition by product order by start_date) is null
or lead(start_date)
over (partition by product order by start_date) != end_date + 1
then end_date end as end_date
from t;
PRODUCT START_DATE END_DATE
------- ---------- ---------
A 01-JUL-13
A
A 30-SEP-13
A 01-DEC-13
A
A
A 31-MAR-14
B 01-OCT-13
B 30-NOV-13
选择的下一级别删除那些处于中期的日期,其中两个日期都被内部查询清空,这给出:
The next level of select removes those which are mid-period, where both dates were blanked by the inner query, which gives:
PRODUCT START_DATE END_DATE
------- ---------- ---------
A 01-JUL-13
A 30-SEP-13
A 01-DEC-13
A 31-MAR-14
B 01-OCT-13
B 30-NOV-13
外部查询然后折叠那些相邻的对;我使用了创建重复项然后用 distinct
消除它们的简单方法,但是您可以通过其他方式来实现,例如将两个值放入一对行中,并将两个值保留在另一个中null,然后用另一层选择消除那些,但我认为这里可以区分.
The outer query then collapses those adjacent pairs; I've used the easy route of creating duplicates and then eliminating them with distinct
, but you can do it other ways, like putting both values into one of the pairs of rows and leaving both values in the other null, and then eliminating those with another layer of select, but I think distinct is OK here.
如果您的实际用例有时间,而不仅仅是日期,那么您需要调整内部查询中的比较;而不是 +/- 1,可能是 1 秒的间隔,或者 1/86400(如果您愿意),但取决于您的值的精度.
If your real-world use case has times, not just dates, then you'll need to adjust the comparison in the inner query; rather than +/- 1, an interval of 1 second perhaps, or 1/86400 if you prefer, but depends on the precision of your values.
相关文章