Oracle SQL - 确定连续值范围

2021-12-05 00:00:00 sql oracle

这是我的桌子:

ID  Name      Department
1   Michael   Marketing
2   Alex      Marketing
3   Tom       Marketing
4   John      Sales
5   Brad      Marketing
6   Leo       Marketing
7   Kevin     Production

我正在尝试查找 ID 范围,其中 Department = 'Marketing':

I am trying to find ID ranges where Department = 'Marketing':

Range   From   To
Range1  1      3    
Range2  5      6

任何帮助将不胜感激.

推荐答案

使用一种叫做 Tabibitosan.

此技术的作用是将每个组的行的位置与整个行集进行比较,以确定同一组中的行是否彼此相邻.

What this technique does is compare the positions of each group's rows to the overall set of rows, in order to work out if rows in the same group are next to each other or not.

例如,使用您的示例数据,这看起来像:

E.g., with your example data, this looks like:

WITH your_table AS (SELECT 1 ID, 'Michael' NAME, 'Marketing' department FROM dual UNION ALL
                    SELECT 2 ID, 'Alex' NAME, 'Marketing' department FROM dual UNION ALL
                    SELECT 3 ID, 'Tom' NAME, 'Marketing' department FROM dual UNION ALL
                    SELECT 4 ID, 'John' NAME, 'Sales' department FROM dual UNION ALL
                    SELECT 5 ID, 'Brad' NAME, 'Marketing' department FROM dual UNION ALL
                    SELECT 6 ID, 'Leo' NAME, 'Marketing' department FROM dual UNION ALL
                    SELECT 7 ID, 'Kevin' NAME, 'Production' department FROM dual)
-- end of mimicking your table with data in it. See the SQL below:
SELECT ID,
       NAME,
       department,
       row_number() OVER (ORDER BY ID) overall_rn,
       row_number() OVER (PARTITION BY department ORDER BY ID) department_rn,
       row_number() OVER (ORDER BY ID) - row_number() OVER (PARTITION BY department ORDER BY ID) grp
FROM   your_table;

        ID NAME    DEPARTMENT OVERALL_RN DEPARTMENT_RN        GRP
---------- ------- ---------- ---------- ------------- ----------
         1 Michael Marketing           1             1          0
         2 Alex    Marketing           2             2          0
         3 Tom     Marketing           3             3          0
         4 John    Sales               4             1          3
         5 Brad    Marketing           5             4          1
         6 Leo     Marketing           6             5          1
         7 Kevin   Production          7             1          6

在这里,我给整个数据集的所有行都分配了一个按 id 升序排列的行号(overall_rn 列),并且我给每个部门的行分配了一行编号(department_rn 列),再次按 id 升序.

Here, I've given all the rows across the entire set of data a row number in ascending id order (the overall_rn column), and I've given the rows in each department a row number (the department_rn column), again in ascending id order.

既然我已经这样做了,我们可以从另一个中减去一个(grp 列).

Now that I've done that, we can subtract one from the other (the grp column).

请注意 grp 列中的数字对于彼此相邻的 deparment 行是如何保持不变的,但每次出现间隙时都会发生变化.

Notice how the number in the grp column remains the same for deparment rows that are next to each other, but it changes each time there's a gap.

例如对于营销部门,第 1-3 行彼此相邻且 grp = 0,但第 4 行营销实际上位于整体结果集的第 5 行,因此它现在具有不同的 grp 编号.由于第 5 个营销行位于整个集合的第 6 行,因此它与第 4 个营销行具有相同的 grp 编号,因此我们知道它们彼此相邻.

E.g. for the Marketing department, rows 1-3 are next to each other and have grp = 0, but the 4th Marketing row is actually on the 5th row of the overall results set, so it now has a different grp number. Since the 5th marketing row is on the 6th row of the overall set, it has the same grp number as the 4th marketing row, so we know they're next to each other.

一旦我们获得了 grp 信息,就很容易对部门和我们的新 grp 列进行聚合查询分组,使用 min 和 max 来查找开始和结束 ID:

Once we have that grp information, it's a simple matter of doing an aggregate query grouping on both the department and our new grp column, using min and max to find the start and end ids:

WITH your_table AS (SELECT 1 ID, 'Michael' NAME, 'Marketing' department FROM dual UNION ALL
                    SELECT 2 ID, 'Alex' NAME, 'Marketing' department FROM dual UNION ALL
                    SELECT 3 ID, 'Tom' NAME, 'Marketing' department FROM dual UNION ALL
                    SELECT 4 ID, 'John' NAME, 'Sales' department FROM dual UNION ALL
                    SELECT 5 ID, 'Brad' NAME, 'Marketing' department FROM dual UNION ALL
                    SELECT 6 ID, 'Leo' NAME, 'Marketing' department FROM dual UNION ALL
                    SELECT 7 ID, 'Kevin' NAME, 'Production' department FROM dual)
-- end of mimicking your table with data in it. See the SQL below:
SELECT department,
       MIN(ID) start_id,
       MAX(ID) end_id
FROM   (SELECT ID,
               NAME,
               department,
               row_number() OVER (ORDER BY ID) - row_number() OVER (PARTITION BY department ORDER BY ID) grp
        FROM   your_table)
GROUP BY department, grp;

DEPARTMENT   START_ID     END_ID
---------- ---------- ----------
Marketing           1          3
Marketing           5          6
Sales               4          4
Production          7          7

注意,我假设 id 列中的间隙并不重要(即,如果 id = 6 没有行(因此 Leo 和 Kevin 的 id 分别为 7 和 8),那么 Leo 和 Brad 仍会出现在同一组中,开始 id = 5,结束 id = 7.

N.B., I've assumed that gaps in the id columns aren't important (i.e. if there was no row for id = 6 (so Leo and Kevin's ids were 7 and 8 respectively), then Leo and Brad would still appear in the same group, with a start id = 5 and end id = 7.

如果 id 列中的间隙算作表示一个新组,那么您可以只使用 id 来标记整个行集(即无需计算整体_rn;只需使用 id 列).

If gaps in the id columns count as indicating a new group, then you could just use the id to label the overall set of rows (i.e. no need to caluclate the overall_rn; just use the id column instead).

这意味着您的查询将变成:

That means your query would become:

WITH your_table AS (SELECT 1 ID, 'Michael' NAME, 'Marketing' department FROM dual UNION ALL
                    SELECT 2 ID, 'Alex' NAME, 'Marketing' department FROM dual UNION ALL
                    SELECT 3 ID, 'Tom' NAME, 'Marketing' department FROM dual UNION ALL
                    SELECT 4 ID, 'John' NAME, 'Sales' department FROM dual UNION ALL
                    SELECT 5 ID, 'Brad' NAME, 'Marketing' department FROM dual UNION ALL
                    SELECT 7 ID, 'Leo' NAME, 'Marketing' department FROM dual UNION ALL
                    SELECT 8 ID, 'Kevin' NAME, 'Production' department FROM dual)
-- end of mimicking your table with data in it. See the SQL below:
SELECT department,
       MIN(ID) start_id,
       MAX(ID) end_id
FROM   (SELECT ID,
               NAME,
               department,
               ID - row_number() OVER (PARTITION BY department ORDER BY ID) grp
        FROM   your_table)
GROUP BY department, grp;

DEPARTMENT   START_ID     END_ID
---------- ---------- ----------
Marketing           1          3
Sales               4          4
Marketing           5          5
Marketing           7          7
Production          8          8

相关文章