SQL Server 中最优雅的生成排列的方法

2021-12-10 00:00:00 sql permutation tsql sql-server

给定下表:

Index | Element
---------------
  1   |    A
  2   |    B
  3   |    C
  4   |    D

我们希望使用元素生成所有可能的排列(无重复).最终结果(跳过一些行)将如下所示:

We want to generate all the possible permutations (without repetitions) using the elements. the final result (skipping some rows) will look like this:

  Results
----------
   ABCD
   ABDC
   ACBD
   ACDB
   ADAC
   ADCA

   ...

   DABC
   DACB
   DBCA
   DBAC
   DCAB
   DCBA

  (24 Rows)

你会怎么做?

推荐答案

在发表了一些可能是尖刻的评论后,这个问题整晚都在我的脑海中徘徊,我最终想出了以下基于集合的方法.我相信它绝对符合优雅"的条件,但我也认为它符合有点愚蠢"的条件.你打电话.

After making some perhaps snarky comments, this problem stuck in my brain all evening, and I eventually came up with the following set-based approach. I believe it definitely qualifies as "elegant", but then I also think it qualifies as "kinda dumb". You make the call.

首先,设置一些表:

--  For testing purposes
DROP TABLE Source
DROP TABLE Numbers
DROP TABLE Results


--  Add as many rows as need be processed--though note that you get N! (number of rows, factorial) results,
--  and that gets big fast. The Identity column must start at 1, or the algorithm will have to be adjusted.
--  Element could be more than char(1), though the algorithm would have to be adjusted again, and each element
--  must be the same length.
CREATE TABLE Source
 (
   SourceId  int      not null  identity(1,1)
  ,Element   char(1)  not null
 )

INSERT Source (Element) values ('A')
INSERT Source (Element) values ('B')
INSERT Source (Element) values ('C')
INSERT Source (Element) values ('D')
--INSERT Source (Element) values ('E')
--INSERT Source (Element) values ('F')


--  This is a standard Tally table (or "table of numbers")
--  It only needs to be as long as there are elements in table Source
CREATE TABLE Numbers (Number int not null)
INSERT Numbers (Number) values (1)
INSERT Numbers (Number) values (2)
INSERT Numbers (Number) values (3)
INSERT Numbers (Number) values (4)
INSERT Numbers (Number) values (5)
INSERT Numbers (Number) values (6)
INSERT Numbers (Number) values (7)
INSERT Numbers (Number) values (8)
INSERT Numbers (Number) values (9)
INSERT Numbers (Number) values (10)


--  Results are iteratively built here. This could be a temp table. An index on "Length" might make runs
--  faster for large sets.  Combo must be at least as long as there are characters to be permuted.
CREATE TABLE Results
 (
   Combo   varchar(10)  not null
  ,Length  int          not null
 )

这是例程:

SET NOCOUNT on

DECLARE
  @Loop     int
 ,@MaxLoop  int


--  How many elements there are to process
SELECT @MaxLoop = max(SourceId)
 from Source


--  Initialize first value
TRUNCATE TABLE Results
INSERT Results (Combo, Length)
 select Element, 1
  from Source
  where SourceId = 1

SET @Loop = 2

--  Iterate to add each element after the first
WHILE @Loop <= @MaxLoop
 BEGIN

    --  See comments below. Note that the "distinct" remove duplicates, if a given value
    --  is to be included more than once
    INSERT Results (Combo, Length)
     select distinct
        left(re.Combo, @Loop - nm.Number)
        + so.Element
        + right(re.Combo, nm.Number - 1)
       ,@Loop
      from Results re
       inner join Numbers nm
        on nm.Number <= @Loop
       inner join Source so
        on so.SourceId = @Loop
      where re.Length = @Loop - 1

    --  For performance, add this in if sets will be large
    --DELETE Results
    -- where Length <> @Loop

    SET @Loop = @Loop + 1
 END

--  Show results
SELECT *
 from Results
 where Length = @MaxLoop
 order by Combo

一般的想法是:当向任何字符串(例如A")添加新元素(例如B")时,要捕获所有排列,您将添加 B到所有可能的位置 (Ba, aB),产生一组新的字符串.然后迭代:向字符串中的每个位置添加一个新元素(C)(AB 变为 Cab, aCb, abC),对于所有字符串 (Cba, bCa, baC),您就有了一组排列.迭代每个结果集下一个字符,直到用完字符...或资源.10 个元素是 360 万个排列,上述算法大约为 48MB,14 个(唯一)元素将达到 870 亿个排列和 1.163 TB.

The general idea is: when adding a new element (say "B") to any string (say, "A"), to catch all permutations you would add B to all possible positions (Ba, aB), resulting in a new set of strings. Then iterate: Add a new element (C) to each position in a string (AB becomes Cab, aCb, abC), for all strings (Cba, bCa, baC), and you have the set of permutations. Iterate over each result set with the next character until you run out of characters... or resources. 10 elements is 3.6 million permutations, roughly 48MB with the above algorithm, and 14 (unique) elements would hit 87 billion permutations and 1.163 terabytes.

我确信它最终会被嵌入到 CTE 中,但最终这将是一个美化的循环.逻辑这样更清晰,我不禁认为CTE执行计划将是一场噩梦.

I'm sure it could eventually be wedged into a CTE, but in the end all that would be is a glorified loop. The logic is clearer this way, and I can't help but think the CTE execution plan would be a nightmare.

相关文章