Concatenate one field after GROUP BY

2021-12-27 00:00:00 sql group-by sql-server scalability sql-server-group-concat

This question has been asked many times on SO, but none of the answers is satisfactory for my situation.

Question 1
Question 2
Question 3
Question 4

I am dealing with a DataObjectVersions table that contains multiple versions for around 1.2 million unique objects (and increasing). I need to concatenate changes from a specific field for each unique object.

Right now I am using the XML Path solution presented in Q3, but running such a query on this table is a total performance disaster. SQL Server started to return data after 19 minutes. Knowing that this data will then be joined twice, you can imagine the impact.

I am looking for the most efficient, scalability-aware way to concatenate the values of the same field from different rows grouped by another field (which is, of course, not a key). To be more precise, this is used within a view in a data warehouse.

EDIT:

I tried to simplify the description, but here is a complete overview. I have multiple tables with the following columns:

   [ID]
   [CreatedTime]
   [CreatedBy]
   [DeletedTime]
   [DeletedBy]
   [ResourceId]
   [AccountId]
   [Type]

A view returns the union of all records from all tables, with the same columns (this is what I described as the versions table in my question). [ResourceId] and [AccountId] together form the unique composite identifier of an object (group membership, system account, etc.; specifically, a resource assignment). [Type] identifies the different levels (such as Read/Write/Execute in the case of a file assignment).

All other fields contain the same values (in different tables) for different unique objects. I need to get the objects and concatenate the values of the [Type] column. All the rows are processed afterward, and the ([ResourceId],[AccountId]) combination must be unique (which is not the case when different types exist).
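
To make the requirement concrete, here is a small sketch with made-up data. The table variable @Assignments, the int keys and the sample values are hypothetical stand-ins for the view (the real keys are uniqueidentifier), and only the columns that matter for the concatenation are included.

```sql
-- Hypothetical sample rows standing in for vwAllAssignments.
DECLARE @Assignments TABLE
(
    ResourceId int          NOT NULL,  -- uniqueidentifier in the real tables
    AccountId  int          NOT NULL,  -- uniqueidentifier in the real tables
    [Type]     nvarchar(20) NOT NULL
);

INSERT INTO @Assignments (ResourceId, AccountId, [Type])
VALUES (1, 10, N'Read'),
       (1, 10, N'Write'),
       (2, 10, N'Read');

-- Input: three rows, two of which share (ResourceId, AccountId) = (1, 10).
SELECT ResourceId, AccountId, [Type]
FROM @Assignments;

-- Desired result: one row per (ResourceId, AccountId), for example
--   1, 10, 'Read,Write'
--   2, 10, 'Read'
```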

EDIT 2:

I am using this function:

CREATE FUNCTION [dbo].[GetUniqueType]
(
    @ResourceId as uniqueidentifier,
    @Account as uniqueidentifier
)
RETURNS nvarchar(100)
AS
BEGIN
    RETURN STUFF((SELECT ',' + raType.Type
                  FROM vwAllAssignments raType
                  WHERE raType.AccountId = @Account
                    AND raType.ResourceId = @ResourceId
                    AND raType.DeletedBy IS NULL
                  FOR XML PATH('')), 1, 1, '')
END

GO

vwAllAssignments is the view returning the union of the rows of all the tables.

Finally, I am selecting:

SELECT [CreatedTime]
      ,[DeletedTime]
      ,[DeletedBy]
      ,[ResourceId]
      ,[AccountId]
      ,dbo.GetUniqueType([ResourceId],[AccountId]) AS [Type]
FROM vwAllAssignments
GROUP BY [ResourceId], [AccountId], [CreatedTime], [DeletedTime], [DeletedBy]

Solution

Try this:

SELECT [CreatedTime]
      ,[DeletedTime]
      ,[DeletedBy]
      ,[ResourceId]
      ,[AccountId]
      ,STUFF((select ',' + raType.Type 
              from vwAllAssignments raType 
              where raType.AccountId = vwAllAssignments.AccountId and 
                    raType.ResourceId = vwAllAssignments.ResourceId and 
                    raType.DeletedBy is null 
              for xml path('')), 1,1,'') AS [Type]
FROM vwAllAssignments
GROUP BY [ResourceId], [AccountId], [CreatedTime], [DeletedTime], [DeletedBy]
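
A possible refinement of the query above, offered as a sketch rather than a measured improvement: collapse the view to one row per group in a CTE first, then run the concatenation subquery against those rows. The CTE name Groups and the ORDER BY on [Type] are my additions. The TYPE directive with .value() is the usual way to keep FOR XML PATH from entity-encoding characters such as & or <. Whether the plan actually gets cheaper depends on the optimizer and on the indexes available on the underlying tables.

```sql
-- Sketch: one row per group first, then one concatenation per group.
WITH Groups AS
(
    SELECT [ResourceId], [AccountId], [CreatedTime], [DeletedTime], [DeletedBy]
    FROM vwAllAssignments
    GROUP BY [ResourceId], [AccountId], [CreatedTime], [DeletedTime], [DeletedBy]
)
SELECT g.[CreatedTime]
      ,g.[DeletedTime]
      ,g.[DeletedBy]
      ,g.[ResourceId]
      ,g.[AccountId]
      ,STUFF((SELECT ',' + raType.[Type]
              FROM vwAllAssignments raType
              WHERE raType.AccountId = g.AccountId
                AND raType.ResourceId = g.ResourceId
                AND raType.DeletedBy IS NULL
              ORDER BY raType.[Type]            -- makes the order of the list deterministic
              FOR XML PATH(''), TYPE            -- return xml instead of an encoded string
             ).value('.', 'nvarchar(max)'), 1, 1, '') AS [Type]
FROM Groups g;
```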

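An index like this should help. Since vwAllAssignments is a UNION view, which SQL Server cannot index directly, create the equivalent index on each underlying table rather than on the view itself: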

create index IX_vwAllAssignments on vwAllAssignments(AccountId, ResourceId, DeletedBy) include(Type)
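
If the server is SQL Server 2017 or later (an assumption, since the question does not state the version), STRING_AGG is an alternative to the XML PATH technique. The sketch below is not the query from the answer: it aggregates the types once per (ResourceId, AccountId) in a CTE, here named Types, and joins the result back so that the output has the same shape as before.

```sql
-- Sketch assuming SQL Server 2017+, where STRING_AGG is available.
WITH Types AS
(
    SELECT [ResourceId]
          ,[AccountId]
          ,STRING_AGG([Type], ',') WITHIN GROUP (ORDER BY [Type]) AS [Type]
    FROM vwAllAssignments
    WHERE [DeletedBy] IS NULL                  -- same filter as the XML PATH subquery
    GROUP BY [ResourceId], [AccountId]
)
SELECT a.[CreatedTime]
      ,a.[DeletedTime]
      ,a.[DeletedBy]
      ,a.[ResourceId]
      ,a.[AccountId]
      ,t.[Type]                                -- NULL when no non-deleted row exists
FROM vwAllAssignments a
LEFT JOIN Types t
       ON t.[ResourceId] = a.[ResourceId]
      AND t.[AccountId]  = a.[AccountId]
GROUP BY a.[ResourceId], a.[AccountId], a.[CreatedTime], a.[DeletedTime], a.[DeletedBy], t.[Type];
```

The LEFT JOIN keeps groups that have only deleted rows, which matches the NULL that STUFF returns for an empty subquery in the original query.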
