将 GROUP BY 与 FIRST_VALUE 和 LAST_VALUE 一起使用

我正在处理一些当前以 1 分钟间隔存储的数据,如下所示:

I'm working with some data that is currently stored in 1 minute intervals that looks like this:

CREATE TABLE #MinuteData
    (
      [Id] INT ,
      [MinuteBar] DATETIME ,
      [Open] NUMERIC(12, 6) ,
      [High] NUMERIC(12, 6) ,
      [Low] NUMERIC(12, 6) ,
      [Close] NUMERIC(12, 6)
    );

INSERT  INTO #MinuteData
        ( [Id], [MinuteBar], [Open], [High], [Low], [Close] )
VALUES  ( 1, '2015-01-01 17:00:00', 1.557870, 1.557880, 1.557870, 1.557880 ),
        ( 2, '2015-01-01 17:01:00', 1.557900, 1.557900, 1.557880, 1.557880 ),
        ( 3, '2015-01-01 17:02:00', 1.557960, 1.558070, 1.557960, 1.558040 ),
        ( 4, '2015-01-01 17:03:00', 1.558080, 1.558100, 1.558040, 1.558050 ),
        ( 5, '2015-01-01 17:04:00', 1.558050, 1.558100, 1.558020, 1.558030 ),
        ( 6, '2015-01-01 17:05:00', 1.558580, 1.558710, 1.557870, 1.557950 ),
        ( 7, '2015-01-01 17:06:00', 1.557910, 1.558120, 1.557910, 1.557990 ),
        ( 8, '2015-01-01 17:07:00', 1.557940, 1.558250, 1.557940, 1.558170 ),
        ( 9, '2015-01-01 17:08:00', 1.558140, 1.558200, 1.558080, 1.558120 ),
        ( 10, '2015-01-01 17:09:00', 1.558110, 1.558140, 1.557970, 1.557970 );

SELECT  *
FROM    #MinuteData;

DROP TABLE #MinuteData;

这些值跟踪货币汇率,因此对于每分钟间隔(柱),有一分钟开始时的 Open 价格和分钟结束时的 Close 价格.HighLow 值代表每一分钟内的最高和最低速率.

The values track currency exchange rates, so for each minute interval (bar), there is the Open price as the minute started and a Close price for the minute end. The High and Low values represent the highest and lowest rate during each individual minute.

期望输出

我希望以 5 分钟为间隔重新格式化这些数据,以产生以下输出:

I'm looking to reformat this data in to 5 minute intervals to produce the following output:

MinuteBar                Open       Close       Low         High
2015-01-01 17:00:00.000  1.557870   1.558030    1.557870    1.558100
2015-01-01 17:05:00.000  1.558580   1.557970    1.557870    1.558710

这需要来自 5 的第一分钟的 Open 值,来自 5 的最后一分钟的 Close 值.HighLow 值表示 5 分钟内的最高 high 和最低 low 速率.

This takes the Open value from the first minute of the 5, the Close value from the last minute of the 5. The High and Low values represent the highest high and lowest low rates across the 5 minute period.

当前解决方案

我有一个解决方案可以做到这一点(如下),但感觉不雅,因为它依赖于 id 值和自连接.此外,我打算在更大的数据集上运行它,所以如果可能的话,我希望以更有效的方式进行:

I have a solution that does this (below), but it feels inelegant as it relies on id values and self joins. Also, I intend to run it on much larger datasets so I was looking to do it in a more efficient manner if possible:

-- Create a column to allow grouping in 5 minute Intervals
SELECT  Id, MinuteBar, [Open], High, Low, [Close], 
DATEDIFF(MINUTE, '2015-01-01T00:00:00', MinuteBar)/5 AS Interval
INTO    #5MinuteData
FROM    #MinuteData
ORDER BY minutebar

-- Group by inteval and aggregate prior to self join
SELECT  Interval ,
        MIN(MinuteBar) AS MinuteBar ,
        MIN(Id) AS OpenId ,
        MAX(Id) AS CloseId ,
        MIN(Low) AS Low ,
        MAX(High) AS High
INTO    #DataMinMax
FROM    #5MinuteData
GROUP BY Interval;

-- Self join to get the Open and Close values
SELECT  t1.Interval ,
        t1.MinuteBar ,
        tOpen.[Open] ,
        tClose.[Close] ,
        t1.Low ,
        t1.High
FROM    #DataMinMax t1
        INNER JOIN #5MinuteData tOpen ON tOpen.Id = OpenId
        INNER JOIN #5MinuteData tClose ON tClose.Id = CloseId;

DROP TABLE #DataMinMax
DROP TABLE #5MinuteData

返工尝试

我一直在考虑使用 FIRST_VALUE 和 LAST_VALUE,因为它似乎是我所追求的,但我不能完全让它与我正在做的分组一起工作.可能有比我正在尝试做的更好的解决方案,所以我愿意接受建议.目前我正在尝试这样做:

Instead of the above queries, I've been looking at using FIRST_VALUE and LAST_VALUE, as it seems to be what I'm after, but I can't quite get it working with the grouping that I'm doing. There might be a better solution than what I'm trying to do, so I'm open to suggestions. Currently I'm trying to do this:

SELECT  MIN(MinuteBar) MinuteBar5 ,
        FIRST_VALUE([Open]) OVER (ORDER BY MinuteBar) AS Opening,
        MAX(High) AS High ,
        MIN(Low) AS Low ,
        LAST_VALUE([Close]) OVER (ORDER BY MinuteBar) AS Closing ,
        DATEDIFF(MINUTE, '2015-01-01 00:00:00', MinuteBar) / 5 AS Interval
FROM    #MinuteData
GROUP BY DATEDIFF(MINUTE, '2015-01-01 00:00:00', MinuteBar) / 5

这给了我以下错误,它与 FIRST_VALUELAST_VALUE 相关,因为如果我删除这些行,查询将运行:

This gives me the below error, which is related to the FIRST_VALUE and LAST_VALUE as the query runs if I remove those lines:

列#MinuteData.MinuteBar"在选择列表中无效,因为它不包含在聚合函数或 GROUP BY 子句中.

Column '#MinuteData.MinuteBar' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.

推荐答案

SELECT 
    MIN(MinuteBar) AS MinuteBar5,
    Opening,
    MAX(High) AS High,
    MIN(Low) AS Low,
    Closing,
    Interval
FROM 
(
    SELECT FIRST_VALUE([Open]) OVER (PARTITION BY DATEDIFF(MINUTE, '2015-01-01 00:00:00', MinuteBar) / 5 ORDER BY MinuteBar) AS Opening,
           FIRST_VALUE([Close]) OVER (PARTITION BY DATEDIFF(MINUTE, '2015-01-01 00:00:00', MinuteBar) / 5 ORDER BY MinuteBar DESC) AS Closing,
           DATEDIFF(MINUTE, '2015-01-01 00:00:00', MinuteBar) / 5 AS Interval,
           *
    FROM #MinuteData
) AS T
GROUP BY Interval, Opening, Closing

与您当前的解决方案接近的解决方案.有两个地方你做错了.

A solution close to your current one. There are two places you did wrong.

  1. FIRST_VALUE 和 LAST_VALUE 是分析函数,它们作用于窗口或分区,而不是组.您可以单独运行嵌套查询并查看其结果.

  1. FIRST_VALUE AND LAST_VALUE are Analytic Functions, which work on a window or partition, instead of a group. You can run the nested query alone and see its result.

LAST_VALUE 是当前窗口的最后一个值,在您的查询中没有指定,默认窗口是从当前分区的第一行到当前行的行.您可以按降序使用 FIRST_VALUE 或指定一个窗口

LAST_VALUE is the last value of current window, which is not specified in your query, and a default window is rows from the first row of current partition to current row. You can either use FIRST_VALUE with descending order or specify a window

LAST_VALUE([Close]) OVER (PARTITION BY DATEDIFF(MINUTE, '2015-01-01 00:00:00', MinuteBar) / 5 
            ORDER BY MinuteBar 
            ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS Closing,

相关文章