Count for each join - optimization

2021-11-20 00:00:00 join optimization mysql

RESULTS: I used three methods:

  1. Three sub queries, 1 join in each (mine)
  2. Three sub queries, no join, filtering with where (SlimsGhost)
  3. Triple join (Solarflare)

I gathered some statistics with "explain" and "profiling", which show the work each query has to do. The following results were not surprising.

Relative results:

  1. 100%
  2. 79%
  3. 1715%

ORIGINAL POST

The idea is to join 4 tables, using the same PK each time, and then count how many rows each join would give separately.

The obvious answer is to do each join separately, with subqueries.

But is it possible to do it with one query? Would it be more efficient?

select "LES CIGARES DU PHARAON" as "Titre",
          (select count( payalb.idPays)
          from album alb
                     left join pays_album payalb using ( idAlb )
          where alb.titreAlb = "LES CIGARES DU PHARAON") as "Pays",
          (select count( peralb.idPers)
          from album alb
                     left join pers_album peralb using ( idAlb )
          where alb.titreAlb = "LES CIGARES DU PHARAON") as "Personnages",
          (select count( juralb.idJur)
          from album alb
                     left join juron_album juralb using ( idAlb )
          where alb.titreAlb = "LES CIGARES DU PHARAON") as "Jurons"
; 
+------------------------+------+-------------+--------+
| Titre                  | Pays | Personnages | Jurons |
+------------------------+------+-------------+--------+
| LES CIGARES DU PHARAON |    3 |          13 |     50 |
+------------------------+------+-------------+--------+

table album rows : 22

table pays_album rows : 45

table personnage_album rows : 100

table juron_album rows : 1704

Here is what I tried :

select alb.titreAlb as "Titre",
         sum(case when alb.idAlb=payalb.idAlb then 1 else 0 end) "Pays",
         sum(case when alb.idAlb=peralb.idAlb then 1 else 0 end) "Personnages",
         sum(case when alb.idAlb=juralb.idAlb then 1 else 0 end) "Jurons"
from album alb
          left join pays_album payalb using ( idAlb )
          left join pers_album peralb using ( idAlb )
          left join juron_album juralb using ( idAlb )
where alb.titreAlb = "LES CIGARES DU PHARAON"
group by alb.titreAlb
;
+------------------------+------+-------------+--------+
| Titre                  | Pays | Personnages | Jurons |
+------------------------+------+-------------+--------+
| LES CIGARES DU PHARAON | 1950 |        1950 |   1950 |
+------------------------+------+-------------+--------+

But each column ends up counting the total number of rows of the full joined table (1950 = 3 * 13 * 50).
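The multiplication can be reproduced on toy data. The sketch below is only an illustration under stated assumptions, not the original setup: it uses Python's built-in sqlite3 module instead of MySQL, a simplified schema in which each link table holds just idAlb plus one id column, made-up rows (2 countries, 3 characters and 4 curses for a single album), and ON instead of USING for the join conditions. The helper name build_toy_db is hypothetical.

```python
import sqlite3

TITLE = "LES CIGARES DU PHARAON"

def build_toy_db():
    """Hypothetical helper: one album linked to 2, 3 and 4 made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        create table album       (idAlb integer primary key, titreAlb text);
        create table pays_album  (idAlb integer, idPays integer);
        create table pers_album  (idAlb integer, idPers integer);
        create table juron_album (idAlb integer, idJur  integer);
    """)
    conn.execute("insert into album values (1, ?)", (TITLE,))
    for table, n in (("pays_album", 2), ("pers_album", 3), ("juron_album", 4)):
        conn.executemany(f"insert into {table} values (1, ?)",
                         [(i,) for i in range(1, n + 1)])
    return conn

# Same shape as the attempted query: one sum(case ...) per joined table.
TRIPLE_JOIN = """
select alb.titreAlb,
       sum(case when alb.idAlb = payalb.idAlb then 1 else 0 end),
       sum(case when alb.idAlb = peralb.idAlb then 1 else 0 end),
       sum(case when alb.idAlb = juralb.idAlb then 1 else 0 end)
from album alb
left join pays_album  payalb on payalb.idAlb = alb.idAlb
left join pers_album  peralb on peralb.idAlb = alb.idAlb
left join juron_album juralb on juralb.idAlb = alb.idAlb
where alb.titreAlb = ?
group by alb.titreAlb
"""

conn = build_toy_db()
row = conn.execute(TRIPLE_JOIN, (TITLE,)).fetchone()
print(row)  # every counter equals 2 * 3 * 4 = 24
```

Every joined row satisfies all three case conditions, so each sum is simply the size of the joined result, which is the product of the per-table counts.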

schema : https://github.com/LittleNooby/gbd2015-2016/blob/master/tintin_schema.png

tables content : https://github.com/LittleNooby/gbd2015-2016/blob/master/tintin_description

If you want to play with it:

db_init : https://github.com/LittleNooby/gbd2015-2016/blob/master/tintin_ok.mysql

SOLUTION

For optimization purposes, a good rule of thumb is to join less, not more. In fact, you should try to join as few rows as you can with as few rows as you can. With every additional join you multiply the cost instead of adding to it, because MySQL will basically just generate one big multiplied matrix. A lot of that gets optimized away by indexes and other things, though.

But to answer your question: it is actually possible to count with only one big join, assuming the tables have unique keys and idAlb is a unique key for album. Then, and only then, you can do it in a way similar to your code:

select alb.titreAlb as "Titre",
       count(distinct payalb.idAlb, payalb.PrimaryKeyFields) "Pays",
       count(distinct peralb.idAlb, peralb.PrimaryKeyFields) "Personnages",
       count(distinct juralb.idAlb, juralb.PrimaryKeyFields) "Jurons"
from album alb
left join pays_album payalb using ( idAlb )
left join pers_album peralb using ( idAlb )
left join juron_album juralb using ( idAlb )
where alb.titreAlb = "LES CIGARES DU PHARAON"
group by alb.titreAlb

where PrimaryKeyFields stands for the primary key fields of the joined tables (you have to look them up).

Distinct removes the effect the other joins have on the count. Unfortunately, in general, it does not remove the effect the joins have on the cost.
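A minimal sketch of both halves of that statement, under loudly stated assumptions: Python's built-in sqlite3 stands in for MySQL, the schema and rows are made up (one album with 2, 3 and 4 linked rows), and each link table is assumed to have a single id column that identifies a row within an album. SQLite's count(distinct ...) accepts only one expression, so the multi-column form shown for MySQL above is replaced by a single column here. The helper name build_toy_db is hypothetical.

```python
import sqlite3

TITLE = "LES CIGARES DU PHARAON"

def build_toy_db():
    """Hypothetical helper: one album linked to 2, 3 and 4 made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        create table album       (idAlb integer primary key, titreAlb text);
        create table pays_album  (idAlb integer, idPays integer);
        create table pers_album  (idAlb integer, idPers integer);
        create table juron_album (idAlb integer, idJur  integer);
    """)
    conn.execute("insert into album values (1, ?)", (TITLE,))
    for table, n in (("pays_album", 2), ("pers_album", 3), ("juron_album", 4)):
        conn.executemany(f"insert into {table} values (1, ?)",
                         [(i,) for i in range(1, n + 1)])
    return conn

# count(distinct ...) repairs the counters; count(*) shows how many joined
# rows the aggregation still has to go through.
DISTINCT_JOIN = """
select alb.titreAlb,
       count(distinct payalb.idPays),
       count(distinct peralb.idPers),
       count(distinct juralb.idJur),
       count(*)
from album alb
left join pays_album  payalb on payalb.idAlb = alb.idAlb
left join pers_album  peralb on peralb.idAlb = alb.idAlb
left join juron_album juralb on juralb.idAlb = alb.idAlb
where alb.titreAlb = ?
group by alb.titreAlb
"""

conn = build_toy_db()
title, pays, personnages, jurons, joined_rows = conn.execute(
    DISTINCT_JOIN, (TITLE,)).fetchone()
print(pays, personnages, jurons)  # correct per-table counts
print(joined_rows)                # still the full product of the three counts
```

The counters come out right, but the join still produces one row per combination, which is the cost that distinct cannot remove.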

However, if you have indexes that cover all (idAlb + PrimaryKeyFields) fields of your tables, it might even be as fast as the original solution (because the distinct can then be evaluated without sorting) and will come close to what you had in mind (just walking through every table/index once). But in a normal or worst-case scenario, it should perform worse than a reasonable solution (like SlimsGhost's), because it is doubtful that MySQL will find the optimal strategy. But play around with it and check the explains (and post the findings); maybe MySQL will do something crazy.
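SlimsGhost's query is not reproduced in this post, so the following is only a guess at what "three sub queries, no join, filtering with where" could look like: each counter is a scalar subquery that filters its link table by the album's idAlb. As before, this is a sketch that assumes Python's built-in sqlite3 in place of MySQL, a simplified made-up schema and made-up rows; build_toy_db is a hypothetical helper.

```python
import math
import sqlite3

TITLE = "LES CIGARES DU PHARAON"

def build_toy_db():
    """Hypothetical helper: one album linked to 2, 3 and 4 made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        create table album       (idAlb integer primary key, titreAlb text);
        create table pays_album  (idAlb integer, idPays integer);
        create table pers_album  (idAlb integer, idPers integer);
        create table juron_album (idAlb integer, idJur  integer);
    """)
    conn.execute("insert into album values (1, ?)", (TITLE,))
    for table, n in (("pays_album", 2), ("pers_album", 3), ("juron_album", 4)):
        conn.executemany(f"insert into {table} values (1, ?)",
                         [(i,) for i in range(1, n + 1)])
    return conn

# One scalar subquery per link table, filtered with where; no join at all.
SUBQUERIES = """
select alb.titreAlb,
       (select count(*) from pays_album  p where p.idAlb = alb.idAlb),
       (select count(*) from pers_album  p where p.idAlb = alb.idAlb),
       (select count(*) from juron_album p where p.idAlb = alb.idAlb)
from album alb
where alb.titreAlb = ?
"""

conn = build_toy_db()
row = conn.execute(SUBQUERIES, (TITLE,)).fetchone()
counts = row[1:]
print(counts)             # per-table counts, computed independently
print(sum(counts))        # matching rows touched by the three subqueries
print(math.prod(counts))  # rows a triple join would produce instead
```

Each subquery only touches the rows of its own table, so the work grows with the sum of the counts rather than their product.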
