将 GROUP BY 与 FIRST_VALUE 和 LAST_VALUE 一起使用
我正在处理一些当前以 1 分钟间隔存储的数据,如下所示:
I'm working with some data that is currently stored in 1 minute intervals that looks like this:
CREATE TABLE #MinuteData
(
[Id] INT ,
[MinuteBar] DATETIME ,
[Open] NUMERIC(12, 6) ,
[High] NUMERIC(12, 6) ,
[Low] NUMERIC(12, 6) ,
[Close] NUMERIC(12, 6)
);
INSERT INTO #MinuteData
( [Id], [MinuteBar], [Open], [High], [Low], [Close] )
VALUES ( 1, '2015-01-01 17:00:00', 1.557870, 1.557880, 1.557870, 1.557880 ),
( 2, '2015-01-01 17:01:00', 1.557900, 1.557900, 1.557880, 1.557880 ),
( 3, '2015-01-01 17:02:00', 1.557960, 1.558070, 1.557960, 1.558040 ),
( 4, '2015-01-01 17:03:00', 1.558080, 1.558100, 1.558040, 1.558050 ),
( 5, '2015-01-01 17:04:00', 1.558050, 1.558100, 1.558020, 1.558030 ),
( 6, '2015-01-01 17:05:00', 1.558580, 1.558710, 1.557870, 1.557950 ),
( 7, '2015-01-01 17:06:00', 1.557910, 1.558120, 1.557910, 1.557990 ),
( 8, '2015-01-01 17:07:00', 1.557940, 1.558250, 1.557940, 1.558170 ),
( 9, '2015-01-01 17:08:00', 1.558140, 1.558200, 1.558080, 1.558120 ),
( 10, '2015-01-01 17:09:00', 1.558110, 1.558140, 1.557970, 1.557970 );
SELECT *
FROM #MinuteData;
DROP TABLE #MinuteData;
这些值跟踪货币汇率,因此对于每分钟间隔(柱),有一分钟开始时的 Open
价格和分钟结束时的 Close
价格.High
和 Low
值代表每一分钟内的最高和最低速率.
The values track currency exchange rates, so for each minute interval (bar), there is the Open
price as the minute started and a Close
price for the minute end. The High
and Low
values represent the highest and lowest rate during each individual minute.
期望输出
我希望以 5 分钟为间隔重新格式化这些数据,以产生以下输出:
I'm looking to reformat this data in to 5 minute intervals to produce the following output:
MinuteBar Open Close Low High
2015-01-01 17:00:00.000 1.557870 1.558030 1.557870 1.558100
2015-01-01 17:05:00.000 1.558580 1.557970 1.557870 1.558710
这需要来自 5 的第一分钟的 Open
值,来自 5 的最后一分钟的 Close
值.High
和 Low
值表示 5 分钟内的最高 high
和最低 low
速率.
This takes the Open
value from the first minute of the 5, the Close
value from the last minute of the 5. The High
and Low
values represent the highest high
and lowest low
rates across the 5 minute period.
当前解决方案
我有一个解决方案可以做到这一点(如下),但感觉不雅,因为它依赖于 id
值和自连接.此外,我打算在更大的数据集上运行它,所以如果可能的话,我希望以更有效的方式进行:
I have a solution that does this (below), but it feels inelegant as it relies on id
values and self joins. Also, I intend to run it on much larger datasets so I was looking to do it in a more efficient manner if possible:
-- Create a column to allow grouping in 5 minute Intervals
SELECT Id, MinuteBar, [Open], High, Low, [Close],
DATEDIFF(MINUTE, '2015-01-01T00:00:00', MinuteBar)/5 AS Interval
INTO #5MinuteData
FROM #MinuteData
ORDER BY minutebar
-- Group by inteval and aggregate prior to self join
SELECT Interval ,
MIN(MinuteBar) AS MinuteBar ,
MIN(Id) AS OpenId ,
MAX(Id) AS CloseId ,
MIN(Low) AS Low ,
MAX(High) AS High
INTO #DataMinMax
FROM #5MinuteData
GROUP BY Interval;
-- Self join to get the Open and Close values
SELECT t1.Interval ,
t1.MinuteBar ,
tOpen.[Open] ,
tClose.[Close] ,
t1.Low ,
t1.High
FROM #DataMinMax t1
INNER JOIN #5MinuteData tOpen ON tOpen.Id = OpenId
INNER JOIN #5MinuteData tClose ON tClose.Id = CloseId;
DROP TABLE #DataMinMax
DROP TABLE #5MinuteData
返工尝试
我一直在考虑使用 FIRST_VALUE 和 LAST_VALUE,因为它似乎是我所追求的,但我不能完全让它与我正在做的分组一起工作.可能有比我正在尝试做的更好的解决方案,所以我愿意接受建议.目前我正在尝试这样做:
Instead of the above queries, I've been looking at using FIRST_VALUE and LAST_VALUE, as it seems to be what I'm after, but I can't quite get it working with the grouping that I'm doing. There might be a better solution than what I'm trying to do, so I'm open to suggestions. Currently I'm trying to do this:
SELECT MIN(MinuteBar) MinuteBar5 ,
FIRST_VALUE([Open]) OVER (ORDER BY MinuteBar) AS Opening,
MAX(High) AS High ,
MIN(Low) AS Low ,
LAST_VALUE([Close]) OVER (ORDER BY MinuteBar) AS Closing ,
DATEDIFF(MINUTE, '2015-01-01 00:00:00', MinuteBar) / 5 AS Interval
FROM #MinuteData
GROUP BY DATEDIFF(MINUTE, '2015-01-01 00:00:00', MinuteBar) / 5
这给了我以下错误,它与 FIRST_VALUE
和 LAST_VALUE
相关,因为如果我删除这些行,查询将运行:
This gives me the below error, which is related to the FIRST_VALUE
and LAST_VALUE
as the query runs if I remove those lines:
列#MinuteData.MinuteBar"在选择列表中无效,因为它不包含在聚合函数或 GROUP BY 子句中.
Column '#MinuteData.MinuteBar' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
推荐答案
SELECT
MIN(MinuteBar) AS MinuteBar5,
Opening,
MAX(High) AS High,
MIN(Low) AS Low,
Closing,
Interval
FROM
(
SELECT FIRST_VALUE([Open]) OVER (PARTITION BY DATEDIFF(MINUTE, '2015-01-01 00:00:00', MinuteBar) / 5 ORDER BY MinuteBar) AS Opening,
FIRST_VALUE([Close]) OVER (PARTITION BY DATEDIFF(MINUTE, '2015-01-01 00:00:00', MinuteBar) / 5 ORDER BY MinuteBar DESC) AS Closing,
DATEDIFF(MINUTE, '2015-01-01 00:00:00', MinuteBar) / 5 AS Interval,
*
FROM #MinuteData
) AS T
GROUP BY Interval, Opening, Closing
与您当前的解决方案接近的解决方案.有两个地方你做错了.
A solution close to your current one. There are two places you did wrong.
FIRST_VALUE 和 LAST_VALUE 是分析函数,它们作用于窗口或分区,而不是组.您可以单独运行嵌套查询并查看其结果.
FIRST_VALUE AND LAST_VALUE are Analytic Functions, which work on a window or partition, instead of a group. You can run the nested query alone and see its result.
LAST_VALUE 是当前窗口的最后一个值,在您的查询中没有指定,默认窗口是从当前分区的第一行到当前行的行.您可以按降序使用 FIRST_VALUE 或指定一个窗口
LAST_VALUE is the last value of current window, which is not specified in your query, and a default window is rows from the first row of current partition to current row. You can either use FIRST_VALUE with descending order or specify a window
LAST_VALUE([Close]) OVER (PARTITION BY DATEDIFF(MINUTE, '2015-01-01 00:00:00', MinuteBar) / 5
ORDER BY MinuteBar
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS Closing,
相关文章