sql选择具有匹配子集的记录

2021-09-10 00:00:00 sql tsql sql-server

有两组员工:经理和咕噜声.
对于每位经理,都有一个表 manager_meetings,其中包含每位经理参加的会议的列表.一个类似的表 grunt_meetings 包含每个 grunt 参加的会议的列表.

There are two sets of employees: managers and grunts.
For each manager, there's a table manager_meetings that holds a list of which meetings each manager attended. A similar table grunt_meetings holds a list of which meetings each grunt attended.

所以:

manager_meetings                          grunt_meetings
managerID      meetingID                  gruntID        meetingID
1              a                          4              a
1              b                          4              b
1              c                          4              c
2              a                          4              d
2              b                          5              a 
3              c                          5              b
3              d                          5              c
3              e                          6              a 
                                          6              c 
                                          7              b 
                                          7              a

当经理和咕噜声知道完全相同的信息时,所有者不喜欢它.这让他的头很痛.他想确定这种情况,因此他可以将经理降级为咕噜声,或将咕噜声提升为经理,或者让他们一起打高尔夫球.店主喜欢打高尔夫球.

The owner doesn't like it when a manager and a grunt know exactly the same information. It makes his head hurt. He wants to identify this situation, so he can demote the manager to a grunt, or promote the grunt to a manager, or take them both golfing. The owner likes to golf.

任务是列出 manager 和 grunt 参加完全相同会议的每个组合.如果经理参加的会议多于咕噜声,则无法匹配.如果咕噜声参加的会议比经理多,那就没有对手了.

The task is to list every combination of manager and grunt where both attended exactly the same meetings. If the manager attended more meeting than the grunt, no match. If the grunt attended more meetings than the manager, no match.

这里的预期结果是:

ManagerID            GruntID
2                    7
1                    5 

...因为 manager 2 和 grunt 7 都参加了 (a,b),而 manager 1 和 grunt 5 都参加了 (a,b,c).

...because manager 2 and grunt 7 both attended (a,b), while manager 1 and grunt 5 both attended (a,b,c).

通过将子查询中的会议子集转换为 XML,并将每个 grunt 的 XML 列表与每个经理的 XML 进行比较,我可以用笨拙的方式解决它.但这太可怕了,而且我还必须向所有者解释什么是 XML.而且我不喜欢打高尔夫球.

I can solve it in a clunky way, by pivoting up the subset of meetings in a subquery into XML, and comparing each grunt's XML list to each manager's XML. But that's horrible, and also I have to explain to the owner what XML is. And I don't like golfing.

有没有更好的方法来做"WHERE {subset1} = {subset2}"?感觉就像我错过了某种巧妙的连接.

Is there some better way to do "WHERE {subset1} = {subset2}"? It feels like I'm missing some clever kind of join.

SQL 小提琴

推荐答案

这是一个有效的版本:

select m.mId, g.gId, count(*) --select m.mid, g.gid, mm.meetingid, gm.meetingid as gmm
from manager m cross join
     grunt g left outer join
     (select mm.*, count(*) over (partition by mm.mid) as cnt
      from manager_meeting mm
     ) mm
     on mm.mid = m.mId full outer join
     (select gm.*, count(*) over (partition by gm.gid) as cnt
      from grunt_meeting gm
     ) gm
     on gm.gid = g.gid and gm.meetingid = mm.meetingid 
group by m.mId, g.gId, mm.cnt, gm.cnt
having count(*) = mm.cnt and mm.cnt = gm.cnt;

字符串比较方法更短,可能更容易理解,而且可能更快.

The string comparison method is shorter, perhaps easier to understand, and probably faster.

对于获得完全匹配的特定情况,可以简化查询:

For your particular case of getting exact matches, the query can be simplified:

select mm.mId, gm.gId
from (select mm.*, count(*) over (partition by mm.mid) as cnt
      from manager_meeting mm
     ) mm join
     (select gm.*, count(*) over (partition by gm.gid) as cnt
      from grunt_meeting gm
     ) gm
     on gm.meetingid = mm.meetingid and
        mm.cnt = gm.cnt
group by mm.mId, gm.gId
having count(*) = max(mm.cnt);

在性能和清晰度方面,这可能比字符串版本更具竞争力.

This might be more competitive with the string version, both in terms of performance and clarity.

它计算咕噜声和经理之间的匹配次数.然后检查这是否是每个人的所有会议.

It counts the number of matches between a grunt and a manager. It then checks that this is all the meetings for each.

相关文章