编写 SQL 以识别分组中的多个子分组

2021-09-10 00:00:00 sql tsql sql-server

我有一个程序可以汇总一个表中的非规范化数据并将其移动到另一个表中，由于数据错误，我们经常在插入时遇到重复的键冲突.我想为用户创建一个报告，以帮助他们确定错误的原因.

I have a program that summarizes non-normalized data in one table and moves it to another and we frequently get a duplicate key violation on the insert due to bad data. I want to create a report for the users to help them identify the cause of the error.

例如，考虑以下人为设计的简单 SQL，它汇总了 Companies 表中的数据并将其插入 CompanySum 中，该 CompanySum 的主键为 State/Zone.为了使 INSERT 不会失败，对于每个唯一的主键 State/Zone 组合，公司/代码的不同组合不能超过一个.如果有，我们希望插入失败，以便可以更正数据.

For example, consider the following contrived simple SQL which summarizes data in the table Companies and inserts it into CompanySum, which has a primary key of State/Zone. In order for the INSERT not to fail, there cannot be more than one distinct combinations of Company/Code for every unique primary key State/Zone combination. If there is, we want the insert to fail so that the data can be corrected.

INSERT INTO CompanySum ( [State] ,[Zone] ,[Company] ,[Code] ,[Revenue] ) SELECT --Keys of target [State] ,[Zone] --We are expecting to have one distinct combination of these fields per key grouping ,[Company] ,[Code] --Aggregate ,SUM([Revenue]) FROM COMPANIES GROUP BY [State] ,[Zone] ,[Company] ,[Code]

我想创建一份报告来帮助用户轻松识别和更正数据，以便在一个州/地区内只有一个不同的公司/代码组合.对于每个不同的州/地区值，我想确定州/地区内不同的公司/代码组合.如果一个州/地区中有多个公司/代码组合，我希望该州/地区中的所有记录都显示在输出中.例如，这里是示例输入和所需的输出:

I would like to create a report to help the users easily identify and correct the data so that there is only one distinct Company/Code combination within a State/Zone. For each distinct State/Zone value, I would like to identify the distinct Company/Code combinations within the State/Zone. If there are more than one Company/Code combinations within a State/Zone, I would like all of the records in the State/Zone to be displayed in the output. For example, here is the sample input and desired output:

Data: RecordNumber State Zone Company Code Revenue ------------ ----- ---- ------- ---- -------- 1 CT B State of CT 65453 10 2 CT B State of CT 65453 3 3 CT B Travelers 33443 20 4 CT C Cigna 45678 24 5 CT C Cigna 45678 234 6 MI A GM 48089 100 7 MI A GM 54555 200 8 MI B Chrysler 43434 44 Desired Output: RecordNumber State Zone Company Code Revenue ------------ ----- ---- ------- ---- -------- 1 CT B State of CT 65453 10 2 CT B State of CT 65453 3 3 CT B Travelers 33443 20 6 MI A GM 48089 100 7 MI A GM 54555 200

这是创建此测试场景所需的 DDL 和 DML

Here is the DDL and DML needed to create this test scenario

CREATE TABLE [dbo].[Companies]( [RecordNumber] [int] NULL, [State] [char](2) NOT NULL, [Zone] [varchar](30) NOT NULL, [Company] [varchar](30) NOT NULL, [Code] [varchar](30) NOT NULL, [Revenue] [numeric](9, 1) NULL ) ON [PRIMARY] CREATE TABLE [dbo].[CompanySum]( [State] [char](2) NOT NULL, [Zone] [varchar](30) NOT NULL, [Company] [varchar](30) NOT NULL, [Code] [varchar](30) NOT NULL, [Revenue] [numeric](9, 1) NULL, CONSTRAINT [PK_CompanySum] PRIMARY KEY CLUSTERED ( [State] ASC, [Zone] ASC )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY] ) ON [PRIMARY] DELETE FROM [dbo].[Companies] GO INSERT [dbo].[Companies] ([RecordNumber], [State], [Zone], [Company], [Code], [Revenue]) VALUES (1, N'CT', N'B', N'State of CT', N'65453', CAST(10.0 AS Numeric(9, 1))) GO INSERT [dbo].[Companies] ([RecordNumber], [State], [Zone], [Company], [Code], [Revenue]) VALUES (2, N'CT', N'B', N'State of CT', N'65453', CAST(3.0 AS Numeric(9, 1))) GO INSERT [dbo].[Companies] ([RecordNumber], [State], [Zone], [Company], [Code], [Revenue]) VALUES (3, N'CT', N'B', N'Travelers', N'33443', CAST(20.0 AS Numeric(9, 1))) GO INSERT [dbo].[Companies] ([RecordNumber], [State], [Zone], [Company], [Code], [Revenue]) VALUES (4, N'CT', N'C', N'Cigna', N'45678', CAST(24.0 AS Numeric(9, 1))) INSERT [dbo].[Companies] ([RecordNumber], [State], [Zone], [Company], [Code], [Revenue]) VALUES (5, N'CT', N'C', N'Cigna', N'45678', CAST(234.0 AS Numeric(9, 1))) GO INSERT [dbo].[Companies] ([RecordNumber], [State], [Zone], [Company], [Code], [Revenue]) VALUES (6, N'MI', N'A', N'GM', N'48089', CAST(100.0 AS Numeric(9, 1))) GO INSERT [dbo].[Companies] ([RecordNumber], [State], [Zone], [Company], [Code], [Revenue]) VALUES (7, N'MI', N'A', N'GM', N'54555', CAST(200.0 AS Numeric(9, 1))) GO INSERT [dbo].[Companies] ([RecordNumber], [State], [Zone], [Company], [Code], [Revenue]) VALUES (8, N'MI', N'B', N'Chrysler', N'43434', CAST(44.0 AS Numeric(9, 1))) GO

这是对我之前一篇文章的更好重构SQL 返回一组关键列中非关键列的唯一组合，我试图帮助澄清问题并提供一个读者可以使用的简单工作示例.

This is a hopefully better re-construction of a previous post of mine SQL to return unique combinations of non key columns within a set of key columns where I am trying to help clarify the question and provide a simple working example that readers can use.

请看这个 SQL 小提琴:

Please see this SQL Fiddle:

http://sqlfiddle.com/#!18/d0141/1

推荐答案

这是解决方案吗?

小提琴:http://sqlfiddle.com/#!18/12e9a0/9

select c.* from Companies c inner join ( select State, Zone from Companies group by State, Zone having count(distinct Company + Code) > 1 ) as dup_state_zone on( c.State = dup_state_zone.State and c.Zone = dup_state_zone.Zone )

已编辑 - 修复了 have 子句，有一点作弊...

Edited - Fix the having clause, with a little cheat...

相关文章