移动中位数,T-SQL 中的模式

2021-09-25 00:00:00 window-functions tsql sql-server

我使用的是 SQL Server 2012,我知道计算移动平均线非常简单.但我需要的是获取像这样定义的窗口框架的模式和中位数(当前行之前的窗口为 2;月份唯一):

I am using SQL Server 2012 and I know it is quite simple to calculate moving averages. But what I need is to get the mode and the median for a defined window frame like so (with a window of 2 preceding to current row; month unique):

MONTH | CODE | MEDIAN | MODE
   1      0        0      0
   2      3        1.5    0
   3      2        2      0
   4      2        2      2
   5      2        2      2
   6      5        2      2
   7      3        3      2

如果有多个值符合模式,则选择第一个.

If several values qualify as mode, than pick the first.

推荐答案

我彻底评论了我的代码.阅读我对我的模式计算的评论,让我知道它需要调整.总的来说,这是一个相对简单的查询.它只是有很多丑陋的子查询,并且有很多评论.看看:

I commented my code thoroughly. Read my comments on my Mode calculations and let me know it needs tweaking. Overall, it's a relatively simple query. It just has a lot of ugly subqueries and it has a lot of comments. Check it out:

DECLARE @Table TABLE ([Month] INT,[Code] INT);
INSERT INTO @Table
    VALUES  (1,0),
            (2,3),
            (3,2),
            (4,2), --Try commenting this out to test my special mode thingymajig
            (5,2),
            (6,5),
            (7,3);

WITH CTE
AS
(
SELECT  ROW_NUMBER() OVER (ORDER BY [Month]) row_num,
        [Month],
        CAST(Code AS FLOAT) Code
FROM @Table
)

SELECT [Month],
        Code,
        ISNULL((
                SELECT  CASE
                            --When there is only one previous value at row_num = 2, find Mean of first two codes
                            WHEN A.row_num = 2 THEN (LAG(B.code,1) OVER (ORDER BY [Code]) + B.Code)/2.0
                            --Else find middle code value of current and previous two rows
                            ELSE B.Code
                        END
                FROM CTE B 
                --How subquery relates to outer query
                WHERE B.row_num BETWEEN A.row_num - 2 AND A.row_num 
                ORDER BY B.[Code] 
                --Order by code and offset by 1 so don't select the lowest value, but fetch the one above the lowest value
                OFFSET 1 ROW FETCH NEXT 1 ROW ONLY),
        0) AS Median,
        --I did mode a little different
            --Instead of Avg(D.Code) you could list the values because with mode, 
                --If there's a tie with more than one of each number, you have multiple modes
                --Instead of doing that, I simply return the mean of the tied modes
                    --When there's one, it doesn't change anything.
                        --If you were to delete the month 4, then your number of Codes 2 and number of Codes 3 would be the same in the last row.
                        --Proper mode would be 2,3. I instead average them out to be 2.5.
        ISNULL((
                SELECT AVG(D.Code)
                FROM (
                    SELECT  C.Code,
                            COUNT(*) cnt,
                            DENSE_RANK() OVER (ORDER BY COUNT(*) DESC) dnse_rank
                    FROM CTE C
                    WHERE C.row_num <= A.row_num
                    GROUP BY C.Code
                    HAVING COUNT(*) > 1) D
                WHERE D.dnse_rank = 1),
        0) AS Mode
FROM CTE A

结果:

Month       Code                   Median                 Mode
----------- ---------------------- ---------------------- ----------------------
1           0                      0                      0
2           3                      1.5                    0
3           2                      2                      0
4           2                      2                      2
5           2                      2                      2
6           5                      2                      2
7           3                      3                      2

相关文章