Oracle 'Partition By' and 'Row_Number' keywords

I have a SQL query written by someone else and I'm trying to figure out what it does. Can someone please explain what the Partition By and Row_Number keywords do here, give a simple example of them in action, and explain why one would want to use them?

An example of partition by:

(SELECT cdt.*,
        ROW_NUMBER ()
        OVER (PARTITION BY cdt.country_code, cdt.account, cdt.currency
              ORDER BY cdt.country_code, cdt.account, cdt.currency)
           seq_no
   FROM CUSTOMER_DETAILS cdt);

I've seen some examples online, but they go into a bit too much depth.

Thanks in advance!

Solution

PARTITION BY segregates rows into sets, which lets a function (ROW_NUMBER(), COUNT(), SUM(), etc.) work on each related set independently.

In your query, a related set consists of the rows that share the same cdt.country_code, cdt.account, cdt.currency values. When you partition on those columns and apply ROW_NUMBER, the rows within each such combination receive sequential numbers starting from 1.
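To make "working on each set independently" concrete, here is a minimal sketch that runs the idea in SQLite through Python, simply because that is easy to try without an Oracle instance. The amount column and the sample rows are made up for illustration, and it assumes a Python build whose bundled SQLite is 3.25 or newer (the first version with window functions).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: "amount" is not in the original query, it is only
# here so that SUM() has something to add up.
conn.execute(
    "create table customer_details("
    "country_code text, account text, currency text, amount integer)"
)
conn.executemany(
    "insert into customer_details values (?, ?, ?, ?)",
    [
        ("US", "A1", "USD", 10),
        ("US", "A1", "USD", 20),
        ("US", "A2", "USD", 5),
        ("PH", "B1", "PHP", 7),
    ],
)

# Each function restarts for every (country_code, account) set.
rows = conn.execute(
    """
    select country_code, account, amount,
           row_number() over (partition by country_code, account
                              order by amount)                  as seq_no,
           count(*)     over (partition by country_code, account) as rows_in_set,
           sum(amount)  over (partition by country_code, account) as set_total
    from customer_details
    order by country_code, account, amount
    """
).fetchall()

for row in rows:
    print(row)
```

The two US/A1 rows are numbered 1 and 2 and both report a set total of 30, while the other accounts each start again at 1 with their own totals.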

But that query is funny: if you partition by something unique and put a row_number on it, every row just gets the same number, 1. Ordering by the partition columns is similar: it's like doing an ORDER BY on something that is guaranteed to hold a single value, so nothing gets sorted. As an example, think of a GUID as a unique combination of cdt.country_code, cdt.account, cdt.currency.

newid() produces a GUID (in SQL Server), so what would you expect from this expression?

select
   hi,ho,
   row_number() over(partition by newid() order by hi,ho)
from tbl;

...Right, no rows are actually grouped together: every row ends up in a partition of its own, so every row_number is 1.

Basically, you should partition on non-unique columns. ROW_NUMBER only produces numbers beyond 1 when the PARTITION BY combination is non-unique; otherwise all row_numbers will be 1.
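The newid() case can be reproduced in a rough way with any unique column. The sketch below uses SQLite via Python and a made-up id primary key as a stand-in for the GUID (SQLite has no newid(), so that substitution is an assumption of the example), and contrasts it with a partition on a non-unique column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "id" is a hypothetical unique key standing in for newid().
conn.execute("create table tbl(id integer primary key, hi text, ho text)")
conn.executemany(
    "insert into tbl(hi, ho) values (?, ?)",
    [("A", "X"), ("A", "Y"), ("B", "W"), ("B", "W")],
)

# Partitioning on a unique value: every row is alone in its partition.
unique_nrs = [
    nr
    for (nr,) in conn.execute(
        "select row_number() over (partition by id order by hi, ho) from tbl"
    )
]

# Partitioning on a non-unique column: numbers restart per hi value.
grouped = conn.execute(
    """
    select hi, ho, row_number() over (partition by hi order by ho) as nr
    from tbl
    order by hi, ho, nr
    """
).fetchall()

print(unique_nrs)  # every entry is 1
print(grouped)
```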

For example, say this is your data:

create table tbl(hi varchar, ho varchar);

insert into tbl values
('A','X'),
('A','Y'),
('A','Z'),
('B','W'),
('B','W'),
('C','L'),
('C','L');

Then this is analogous to your query:

select
   hi,ho,
   row_number() over(partition by hi,ho order by hi,ho)
from tbl;

What will be the output of that?

HI  HO  COLUMN_2
A   X   1
A   Y   1
A   Z   1
B   W   1
B   W   2
C   L   1
C   L   2

You see the combination of HI and HO? Each of the first three rows has a unique combination, hence they are all set to 1. The B rows share the same W, hence they get different ROW_NUMBERs; likewise with the HI C rows.

Now, why is the ORDER BY needed there? If the previous developer merely wanted to put a row_number on identical data (e.g. HI B, where all rows are B-W, B-W), he could just do this:

select
   hi,ho,
   row_number() over(partition by hi,ho)
from tbl;

But alas, Oracle (and SQL Server too) requires an ORDER BY in the OVER clause of ROW_NUMBER(), whereas in PostgreSQL that ORDER BY is optional, so this runs as written: http://www.sqlfiddle.com/#!1/27821/1

select
   hi,ho,
   row_number() over(partition by hi,ho)
from tbl;

The ORDER BY in your query looks a bit redundant, but that is not the previous developer's fault: some databases just don't allow ROW_NUMBER() without an ORDER BY, and he might not have been able to find a good candidate column to sort on. If the PARTITION BY columns and the ORDER BY columns are the same, you would just remove the ORDER BY, but since some databases don't allow that, you can do this instead:

SELECT cdt.*,
        ROW_NUMBER ()
        OVER (PARTITION BY cdt.country_code, cdt.account, cdt.currency
              ORDER BY newid())
           seq_no
   FROM CUSTOMER_DETAILS cdt

Can't find a good column for sorting identical data? You might as well sort randomly; the rows in each partition have the same values anyway. You can use a GUID, for example (newid() in SQL Server; newid() does not exist in Oracle, so there you would need a substitute such as DBMS_RANDOM.VALUE). That gives the same numbering as the previous developer's query. It's unfortunate that some databases don't allow ROW_NUMBER() without an ORDER BY.
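If you want to convince yourself that the ordering really doesn't matter when the partition columns are all you have, here is a small sketch using SQLite through Python. SQLite is only a stand-in here: like PostgreSQL it accepts ROW_NUMBER() without an ORDER BY, and random() plays the role of newid(). The helper name numbers_per_partition is made up for this example.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.execute("create table tbl(hi text, ho text)")
conn.executemany(
    "insert into tbl values (?, ?)",
    [("A", "X"), ("A", "Y"), ("A", "Z"),
     ("B", "W"), ("B", "W"),
     ("C", "L"), ("C", "L")],
)


def numbers_per_partition(order_clause):
    """Return {(hi, ho): sorted row numbers} for a given ORDER BY clause."""
    sql = (
        "select hi, ho, "
        f"row_number() over (partition by hi, ho {order_clause}) "
        "from tbl"
    )
    seen = defaultdict(list)
    for hi, ho, nr in conn.execute(sql):
        seen[(hi, ho)].append(nr)
    return {key: sorted(nrs) for key, nrs in seen.items()}


same_cols = numbers_per_partition("order by hi, ho")        # the original style
random_order = numbers_per_partition("order by random()")  # stand-in for newid()
no_order = numbers_per_partition("")                        # not valid in Oracle

print(same_cols)
print(same_cols == random_order == no_order)
```

All three variants hand out the same numbers per combination: 1 for the unique rows, and 1 and 2 for the duplicated B-W and C-L rows.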

Though really, it eludes me: I cannot find a good reason to put a number on identical combinations (B-W, B-W in the example above). It gives the impression that the database holds redundant data. Somehow it reminded me of this question: How to get one unique record from the same list of records from table? No Unique constraint in the table

Seeing a PARTITION BY with the same combination of columns as the ORDER BY really looks arcane; you cannot easily infer the code's intent.
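If the table really does hold redundant rows, one guess at what such numbering could be for is picking a single row out of each group of duplicates. This is only a sketch of that idea, not something the original query is known to do; it runs in SQLite via Python and reuses the hi/ho sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table tbl(hi text, ho text)")
conn.executemany(
    "insert into tbl values (?, ?)",
    [("A", "X"), ("B", "W"), ("B", "W"), ("C", "L"), ("C", "L")],
)

# Number the rows inside each (hi, ho) group, then keep only number 1.
kept = conn.execute(
    """
    select hi, ho
    from (select hi, ho,
                 row_number() over (partition by hi, ho
                                    order by hi, ho) as nr
          from tbl)
    where nr = 1
    order by hi, ho
    """
).fetchall()

print(kept)  # one row per distinct (hi, ho) combination
```

For plain duplicates like these, SELECT DISTINCT would give the same result with less ceremony, which is part of why the intent of the original query is hard to guess.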

Live test: http://www.sqlfiddle.com/#!3/27821/6


But as dbaseman has also noticed, it's useless to partition and order on the same columns.

You have a set of data like this:

create table tbl(hi varchar, ho varchar);

insert into tbl values
('A','X'),
('A','X'),
('A','X'),
('B','Y'),
('B','Y'),
('C','Z'),
('C','Z');

Then you PARTITION BY hi,ho and also ORDER BY hi,ho. There's no sense in numbering identical data :-) http://www.sqlfiddle.com/#!3/29ab8/3

select
   hi,ho,
   row_number() over(partition by hi,ho order by hi,ho) as nr
from tbl;

Output:

HI  HO  NR
A   X   1
A   X   2
A   X   3
B   Y   1
B   Y   2
C   Z   1
C   Z   2

See? Why would you need to put row numbers on identical combinations? What will you analyze on a triple A,X, on a double B,Y, on a double C,Z? :-)


You just need to PARTITION BY the non-unique column(s), then ORDER BY a column that distinguishes the rows within each partition. An example will make it clearer:

create table tbl(hi varchar, ho varchar);

insert into tbl values
('A','D'),
('A','E'),
('A','F'),
('B','F'),
('B','E'),
('C','E'),
('C','D');

select
   hi,ho,
   row_number() over(partition by hi order by ho) as nr
from tbl;

PARTITION BY hi operates on a non-unique column; then, within each partition, you order on the column that is unique there (ho): ORDER BY ho

Output:

HI  HO  NR
A   D   1
A   E   2
A   F   3
B   E   1
B   F   2
C   D   1
C   E   2

That result set makes more sense.

Live test: http://www.sqlfiddle.com/#!3/d0b44/1

And this is similar to your query, with the same columns in both PARTITION BY and ORDER BY:

select
   hi,ho,
   row_number() over(partition by hi,ho order by hi,ho) as nr
from tbl;

And this is the output:

HI  HO  NR
A   D   1
A   E   1
A   F   1
B   E   1
B   F   1
C   D   1
C   E   1

See? It makes no sense.

Live test: http://www.sqlfiddle.com/#!3/d0b44/3


Finally, this might be the right query:

SELECT cdt.*,
     ROW_NUMBER ()
     OVER (PARTITION BY cdt.country_code, cdt.account -- removed: cdt.currency
           ORDER BY 
               -- removed: cdt.country_code, cdt.account, 
               cdt.currency) -- keep
        seq_no
FROM CUSTOMER_DETAILS cdt
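To see what that rewritten query would hand back, here is a sketch with made-up rows, run in SQLite through Python rather than Oracle. The table layout follows the query; the sample data and the final ORDER BY (added only to make the printed order predictable) are assumptions of the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table customer_details("
    "country_code text, account text, currency text)"
)
conn.executemany(
    "insert into customer_details values (?, ?, ?)",
    [
        ("US", "A1", "USD"),
        ("US", "A1", "EUR"),
        ("US", "A1", "JPY"),
        ("PH", "B1", "PHP"),
        ("PH", "B1", "USD"),
    ],
)

# Partition on country_code + account, number the currencies inside each.
rows = conn.execute(
    """
    select cdt.*,
           row_number() over (partition by cdt.country_code, cdt.account
                              order by cdt.currency) as seq_no
    from customer_details cdt
    order by cdt.country_code, cdt.account, seq_no
    """
).fetchall()

for row in rows:
    print(row)
```

Each account now gets its currencies numbered alphabetically (EUR, JPY, USD become 1, 2, 3 for US/A1), which is a numbering you can actually do something with, unlike a column of 1s.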
