SQL查询从多个表返回数据

2021-11-20 00:00:00 sql select mysql

我想知道以下内容:

  • 如何从我的数据库中的多个表中获取数据?
  • 有哪些方法可以做到这一点?
  • 什么是联接和联合?它们之间有何不同?
  • 与其他的相比,我应该什么时候使用它们?

我打算在我的(例如 - PHP)应用程序中使用它,但不想对数据库运行多个查询,我有哪些选项可以在单个查询中从多个表中获取数据?

注意:我写这篇文章是因为我希望能够链接到关于我在 PHP 队列中经常遇到的众多问题的写得很好的指南,所以当我发布一个回答.

答案涵盖以下内容:

  1. 第 1 部分 - 联接和工会
  2. 第 2 部分 - 子查询
  3. 第 3 部分 - 技巧和高效的代码
  4. 第 4 部分 - From 子句中的子查询
  5. 第 5 部分 - 约翰的诡计混杂

解决方案

第 1 部分 - 联接和联合

这个答案涵盖:

  1. 第 1 部分
    • 使用内连接连接两个或多个表(请参阅维基百科条目了解更多信息)
    • 如何使用联合查询
    • 左外连接和右外连接(这个 stackOverflow 答案非常适合描述连接类型)
    • 交叉查询(以及如果您的数据库不支持它们,如何重现它们) - 这是 SQL-Server 的一个功能 (查看信息)和 莉>
  2. 第 2 部分
    • 子查询 - 它们是什么,可以在哪里使用以及需要注意什么
    • 笛卡尔加入 AKA - 哦,痛苦!

有多种方法可以从数据库中的多个表中检索数据.在这个答案中,我将使用 ANSI-92 连接语法.这可能与其他一些使用旧的 ANSI-89 语法的教程不同(如果你习惯了 89,可能看起来不那么直观 - 但我只能说尝试一下),因为它是当查询开始变得更复杂时,更容易理解.为什么要使用它?有性能提升吗?简短的回答是否定的,但是一旦你习惯了它就更容易阅读了.使用这种语法阅读其他人编写的查询会更容易.

我还将使用一个小型车场的概念,它有一个数据库来跟踪它有哪些可用的汽车.所有者已聘请您作为他的 IT 计算机人员,并希望您能够毫不犹豫地将他要求的数据交给他.

我制作了一些最终表将使用的查找表.这将为我们提供一个合理的模型来工作.首先,我将针对具有以下结构的示例数据库运行查询.我会试着想想在开始时犯的常见错误,并解释它们出了什么问题——当然也会展示如何纠正它们.

第一个表格只是一个颜色列表,以便我们知道车场里有什么颜色.

mysql>创建表颜色(id int(3) not null auto_increment 主键,->颜色变量(15),油漆变量(10));查询正常,0 行受影响(0.01 秒)mysql>从颜色显示列;+-------+-------------+------+-----+---------+-----------------+|领域 |类型 |空 |钥匙 |默认 |额外 |+-------+-------------+------+-----+---------+-----------------+|身份证 |整数(3) |否 |PRI |空 |自动增量||颜色 |varchar(15) |是 ||空 |||油漆|varchar(10) |是 ||空 ||+-------+-------------+------+-----+---------+-----------------+3 行(0.01 秒)mysql>插入颜色(颜色,油漆)值('红色','金属'),->('绿色', '光泽'), ('蓝色', '金属'),->('白色' '光泽'), ('黑色' '光泽');查询正常,5 行受影响(0.00 秒)记录:5 重复:0 警告:0mysql>从颜色中选择*;+----+-------+----------+|身份证 |颜色 |油漆|+----+-------+----------+|1 |红色 |金属 ||2 |绿色 |光泽 ||3 |蓝色 |金属 ||4 |白色 |光泽 ||5 |黑色 |光泽 |+----+-------+----------+5 行(0.00 秒)

品牌表标识了caryard 可能销售的不同品牌汽车.

mysql>创建表品牌(id int(3) not null auto_increment 主键,->品牌 varchar(15));查询正常,0 行受影响(0.01 秒)mysql>显示品牌栏目;+-------+-------------+------+-----+---------+-----------------+|领域 |类型 |空 |钥匙 |默认 |额外 |+-------+-------------+------+-----+---------+-----------------+|身份证 |整数(3) |否 |PRI |空 |自动增量||品牌 |varchar(15) |是 ||空 ||+-------+-------------+------+-----+---------+-----------------+2 行(0.01 秒)mysql>插入品牌 (brand) 值 ('Ford'), ('Toyota'),->('Nissan'), ('Smart'), ('BMW');查询正常,5 行受影响(0.00 秒)记录:5 重复:0 警告:0mysql>从品牌中选择*;+----+--------+|身份证 |品牌 |+----+--------+|1 |福特 ||2 |丰田 ||3 |日产 ||4 |智能 ||5 |宝马 |+----+--------+5 行(0.00 秒)

模型表将涵盖不同类型的汽车,使用不同的汽车类型而不是实际的汽车模型会更简单.

mysql>创建表模型(id int(3) not null auto_increment 主键,->模型 varchar(15));查询正常,0 行受影响(0.01 秒)mysql>显示模型中的列;+-------+-------------+------+-----+---------+-----------------+|领域 |类型 |空 |钥匙 |默认 |额外 |+-------+-------------+------+-----+---------+-----------------+|身份证 |整数(3) |否 |PRI |空 |自动增量||模型 |varchar(15) |是 ||空 ||+-------+-------------+------+-----+---------+-----------------+2 行(0.00 秒)mysql>插入模型 (model) 值 ('Sports'), ('Sedan'), ('4WD'), ('Luxury');查询正常,4 行受影响(0.00 秒)记录:4 重复:0 警告:0mysql>从模型中选择 *;+----+--------+|身份证 |模型 |+----+--------+|1 |体育 ||2 |轿车 ||3 |四驱 ||4 |豪华 |+----+--------+4 行(0.00 秒)

最后,将所有这些其他表捆绑在一起,将所有东西捆绑在一起的表.ID 字段实际上是用于识别汽车的唯一批号.

mysql>创建表汽车(id int(3) not null auto_increment 主键,->颜色 int(3), 品牌 int(3), 型号 int(3));查询正常,0 行受影响(0.01 秒)mysql>显示来自汽车的列;+-------+--------+------+-----+---------+----------------+|领域 |类型 |空 |钥匙 |默认 |额外 |+-------+--------+------+-----+---------+----------------+|身份证 |整数(3) |否 |PRI |空 |自动增量||颜色 |整数(3) |是 ||空 |||品牌 |整数(3) |是 ||空 |||模型 |整数(3) |是 ||空 ||+-------+--------+------+-----+---------+----------------+4 行(0.00 秒)mysql>插入汽车(颜色,品牌,型号)值(1,2,1),(3,1,2),(5,3,1),->(4,4,2), (2,2,3), (3,5,4), (4,1,3), (2,2,1), (5,2,3), (4),5,1);查询正常,10 行受影响(0.00 秒)记录:10 重复:0 警告:0mysql>从汽车中选择 *;+----+-------+-------+-------+|身份证 |颜色 |品牌 |模型 |+----+-------+-------+-------+|1 |1 |2 |1 ||2 |3 |1 |2 ||3 |5 |3 |1 ||4 |4 |4 |2 ||5 |2 |2 |3 ||6 |3 |5 |4 ||7 |4 |1 |3 ||8 |2 |2 |1 ||9 |5 |2 |3 ||10 |4 |5 |1 |+----+-------+-------+-------+10 行(0.00 秒)

这将为我们提供足够的数据(我希望)来涵盖以下不同类型连接的示例,并提供足够的数据使它们值得.

因此,老板想知道他拥有的所有跑车的 ID.

这是一个简单的两表连接.我们有一个表来标识模型和包含可用库存的表.可以看到,cars表的model列中的数据与cars表的models列相关> 我们有桌子.现在,我们知道模型表的 1 用于 Sports 的 ID,所以让我们编写连接.

选择ID,模型从汽车加入模型型号=ID

所以这个查询看起来不错吧?我们已经确定了两个表并包含了我们需要的信息,并使用了一个连接来正确识别要连接的列.

ERROR 1052 (23000): 字段列表中的ID"列不明确

哦,不!我们的第一个查询出错了!是的,它是一个李子.您会看到,查询确实获得了正确的列,但其中一些列存在于两个表中,因此数据库对我们的实际含义和位置感到困惑.有两种解决方案可以解决这个问题.第一个很好很简单,我们可以使用 tableName.columnName 来告诉数据库我们的意思,就像这样:

选择汽车.ID,模型.model从汽车加入模型在cars.model=models.ID+----+--------+|身份证 |模型 |+----+--------+|1 |体育 ||3 |体育 ||8 |体育 ||10 |体育 ||2 |轿车 ||4 |轿车 ||5 |四驱 ||7 |四驱 ||9 |四驱 ||6 |豪华 |+----+--------+10 行(0.00 秒)

另一个可能更常用,称为表别名.这个例子中的表格有漂亮而简短的名字,但是输入类似 KPI_DAILY_SALES_BY_DEPARTMENT 的东西可能会很快变老,所以一个简单的方法是给表格起这样的昵称:

选择援助,b.模型从汽车加入模型 ba.model=b.ID

现在,回到请求.正如你所看到的,我们有我们需要的信息,但我们也有没有被要求的信息,所以我们需要在语句中包含一个 where 子句,以只获取被要求的跑车.由于我更喜欢​​表别名方法而不是一遍又一遍地使用表名,所以从现在开始我将坚持使用它.

显然,我们需要在查询中添加一个 where 子句.我们可以通过 ID=1model='Sports' 来识别跑车.由于 ID 已编入索引和主键(而且它的键入次数较少),因此让我们在查询中使用它.

选择援助,b.模型从汽车加入模型 ba.model=b.ID在哪里b.ID=1+----+--------+|身份证 |模型 |+----+--------+|1 |体育 ||3 |体育 ||8 |体育 ||10 |体育 |+----+--------+4 行(0.00 秒)

宾果游戏!老板很高兴.当然,作为老板,从不满足于他的要求,他查看信息,然后说我也想要颜色.

好的,所以我们已经编写了查询的很大一部分,但是我们需要使用第三个表,即颜色.现在,我们的主信息表 cars 存储了汽车颜色 ID,这会链接回颜色 ID 列.因此,以与原始类似的方式,我们可以加入第三个表:

选择援助,b.模型从汽车加入模型 ba.model=b.ID加入颜色 ca.color=c.ID在哪里b.ID=1+----+--------+|身份证 |模型 |+----+--------+|1 |体育 ||3 |体育 ||8 |体育 ||10 |体育 |+----+--------+4 行(0.00 秒)

该死,虽然表已正确连接并且相关列已链接,但我们忘记从我们刚刚链接的新表中提取实际信息.

选择援助,b.模型,c.颜色从汽车加入模型 ba.model=b.ID加入颜色 ca.color=c.ID在哪里b.ID=1+----+--------+-------+|身份证 |模型 |颜色 |+----+--------+-------+|1 |体育 |红色 ||8 |体育 |绿色 ||10 |体育 |白色 ||3 |体育 |黑色 |+----+--------+-------+4 行(0.00 秒)

是的,那是我们暂时离开的老板.现在,更详细地解释其中的一些内容.如您所见,我们语句中的 from 子句链接了我们的主表(我经常使用包含信息的表,而不是查找表或维度表.查询将与所有表一样正常工作切换了,但是当我们在几个月后回到这个查询来阅读它时就没那么有意义了,所以通常最好尝试编写一个很好且易于理解的查询 - 直观地布局它,使用 nice缩进,以便一切都尽可能清楚.如果您继续教导他人,请尝试将这些特征灌输到他们的查询中 - 特别是如果您要对它们进行故障排除时.

完全有可能以这种方式链接越来越多的表格.

选择援助,b.模型,c.颜色从汽车加入模型 ba.model=b.ID加入颜色 ca.color=c.ID加盟品牌da.brand=d.ID在哪里b.ID=1

虽然我忘记在 join 语句中包含一个我们可能想要连接多个列的表,但这里有一个示例.如果 models 表具有品牌特定的型号,因此也有一个名为 brand 的列,该列链接回 上的 brands 表ID 字段,可以这样做:

选择援助,b.模型,c.颜色从汽车加入模型 ba.model=b.ID加入颜色 ca.color=c.ID加盟品牌da.brand=d.IDb.brand=d.ID在哪里b.ID=1

您可以看到,上面的查询不仅将连接表链接到主 cars 表,还指定了已连接表之间的连接.如果没有这样做,则结果称为笛卡尔连接 - 这是 dba 代表不好.笛卡尔连接是一种返回行的连接,因为信息没有告诉数据库如何限制结果,因此查询返回所有符合条件的行.

所以,举一个笛卡尔连接的例子,让我们运行以下查询:

选择援助,b.模型从汽车加入模型 b+----+--------+|身份证 |模型 |+----+--------+|1 |体育 ||1 |轿车 ||1 |四驱 ||1 |豪华 ||2 |体育 ||2 |轿车 ||2 |四驱 ||2 |豪华 ||3 |体育 ||3 |轿车 ||3 |四驱 ||3 |豪华 ||4 |体育 ||4 |轿车 ||4 |四驱 ||4 |豪华 ||5 |体育 ||5 |轿车 ||5 |四驱 ||5 |豪华 ||6 |体育 ||6 |轿车 ||6 |四驱 ||6 |豪华 ||7 |体育 ||7 |轿车 ||7 |四驱 ||7 |豪华 ||8 |体育 ||8 |轿车 ||8 |四驱 ||8 |豪华 ||9 |体育 ||9 |轿车 ||9 |四驱 ||9 |豪华 ||10 |体育 ||10 |轿车 ||10 |四驱 ||10 |豪华 |+----+--------+40 行(0.00 秒)

天哪,太丑了.然而,就数据库而言,它正是所要求的.在查询中,我们要求 carsIDmodelsmodel.但是,因为我们没有指定如何连接这些表,数据库已经将第一个表中的每一行与每一行进行了匹配.第二张桌子.

好的,所以老板回来了,他又想了解更多信息.我想要相同的列表,但还包括四驱车.

然而,这给了我们一个很好的借口来看看两种不同的方法来实现这一点.我们可以在 where 子句中添加另一个条件,如下所示:

选择援助,b.模型,c.颜色从汽车加入模型 ba.model=b.ID加入颜色 ca.color=c.ID加盟品牌da.brand=d.ID在哪里b.ID=1或 b.ID=3

虽然上面的方法工作得很好,但让我们换个角度看,这是展示 union 查询如何工作的一个很好的借口.

我们知道以下将返回所有跑车:

选择援助,b.模型,c.颜色从汽车加入模型 ba.model=b.ID加入颜色 ca.color=c.ID加盟品牌da.brand=d.ID在哪里b.ID=1

以下将返回所有 4WD:

选择援助,b.模型,c.颜色从汽车加入模型 ba.model=b.ID加入颜色 ca.color=c.ID加盟品牌da.brand=d.ID在哪里b.ID=3

因此,通过在它们之间添加 union all 子句,第二个查询的结果将附加到第一个查询的结果中.

选择援助,b.模型,c.颜色从汽车加入模型 ba.model=b.ID加入颜色 ca.color=c.ID加盟品牌da.brand=d.ID在哪里b.ID=1联合所有选择援助,b.模型,c.颜色从汽车加入模型 ba.model=b.ID加入颜色 ca.color=c.ID加盟品牌da.brand=d.ID在哪里b.ID=3+----+--------+-------+|身份证 |模型 |颜色 |+----+--------+-------+|1 |体育 |红色 ||8 |体育 |绿色 ||10 |体育 |白色 ||3 |体育 |黑色 ||5 |四驱 |绿色 ||7 |四驱 |白色 ||9 |四驱 |黑色 |+----+--------+-------+7 行(0.00 秒)

如您所见,首先返回第一个查询的结果,然后是第二个查询的结果.

在这个例子中,简单地使用第一个查询当然会容易得多,但是 union 查询对于特定情况可能非常有用.它们是从不容易连接在一起的表中返回特定结果的好方法 - 或者就此而言完全不相关的表.但是,有一些规则需要遵循.

  • 第一个查询中的列类型必须与下面所有其他查询中的列类型相匹配.
  • 第一个查询中的列名称将用于标识整个结果集.
  • 每个查询中的列数必须相同.

现在,您可能想知道 区别在于使用unionunion all.union 查询将删除重复项,而 union all 不会.这确实意味着在使用 union 而不是 union all 时,性能会受到很小的影响,但结果可能是值得的 - 我不会在此推测这类事情不过.

关于这个注释,这里可能需要注意一些额外的注释.

  • 如果我们想对结果进行排序,我们可以使用 order by 但您不能再使用别名.在上面的查询中,附加一个 order by a.ID 会导致错误 - 就结果而言,该列被称为 ID 而不是 a.ID - 即使在两个查询中使用了相同的别名.
  • 我们只能有一个order by语句,而且必须是最后一个语句.

对于接下来的示例,我将向我们的表中添加一些额外的行.

我已将 Holden 添加到品牌表中.我还在 cars 中添加了一行,其 color 值为 12 - 在颜色表中没有引用.

好的,老板又回来了,大声喊叫 - *我想统计我们携带的每个品牌以及其中的汽车数量!` - 典型的,我们只是讨论了一个有趣的部分,然后老板想要更多的工作.

Rightyo,所以我们需要做的第一件事就是获得可能品牌的完整列表.

选择一个品牌从品牌+--------+|品牌 |+--------+|福特 ||丰田 ||日产 ||智能 ||宝马 ||霍尔顿 |+--------+6 行(0.00 秒)

现在,当我们将其加入到我们的汽车表中时,我们得到以下结果:

选择一个品牌从品牌加入汽车 ba.ID=b.brand通过...分组一个品牌+--------+|品牌 |+--------+|宝马 ||福特 ||日产 ||智能 ||丰田 |+--------+5 行(0.00 秒)

这当然是一个问题 - 我们没有看到任何提及我添加的可爱的 Holden 品牌.

这是因为连接在两个表中查找匹配的行.由于Holden 类型的汽车中没有数据,因此不会返回.这是我们可以使用 outer 连接的地方.这将返回所有一个表中的结果,无论它们在另一个表中是否匹配:

选择一个品牌从品牌左外连接车 ba.ID=b.brand通过...分组一个品牌+--------+|品牌 |+--------+|宝马 ||福特 ||霍尔顿 ||日产 ||智能 ||丰田 |+--------+6 行(0.00 秒)

既然我们有了这个,我们可以添加一个可爱的聚合函数来计算并让老板暂时摆脱困境.

选择一个品牌,count(b.id) 作为 countOfBrand从品牌左外连接车 ba.ID=b.brand通过...分组一个品牌+--------+--------------+|品牌 |countOfBrand |+--------+--------------+|宝马 |2 ||福特 |2 ||霍尔顿 |0 ||日产 |1 ||智能 |1 ||丰田 |5 |+--------+--------------+6 行(0.00 秒)

这样,老板就躲开了.

现在,为了更详细地解释这一点,外连接可以是 leftright 类型.Left 或 Right 定义完全包含哪个表.left outer join 将包含左侧表中的所有行,而(你猜对了)right external join 将包含右侧表中的所有结果进入结果.

某些数据库将允许完全外连接,这将从两个表中带回结果(无论是否匹配),但并非所有数据库都支持这种方式.

现在,我可能认为此时此刻,您想知道是否可以在查询中合并连接类型 - 答案是肯定的,您绝对可以.

选择b.品牌,c.颜色,count(a.id) 作为 countOfBrand从汽车右外连接品牌 bb.ID=a.brand加入颜色 ca.color=c.ID通过...分组一个品牌,c.颜色+--------+-------+--------------+|品牌 |颜色 |countOfBrand |+--------+-------+--------------+|福特 |蓝色 |1 ||福特 |白色 |1 ||丰田 |黑色 |1 ||丰田 |绿色 |2 ||丰田 |红色 |1 ||日产 |黑色 |1 ||智能 |白色 |1 ||宝马 |蓝色 |1 ||宝马 |白色 |1 |+--------+-------+--------------+9 行(0.00 秒)

那么,为什么这不是预期的结果?这是因为虽然我们选择了从汽车到品牌的外部连接,但在连接到颜色中并没有指定 - 所以特定的连接只会带回两个表中匹配的结果.

以下查询可用于获得我们预期的结果:

选择一个品牌,c.颜色,count(b.id) 作为 countOfBrand从品牌左外连接车 ba.ID=b.brand左外连接颜色 c在 b.color=c.ID 上通过...分组一个品牌,c.颜色+--------+-------+--------------+|品牌 |颜色 |countOfBrand |+--------+-------+--------------+|宝马 |蓝色 |1 ||宝马 |白色 |1 ||福特 |蓝色 |1 ||福特 |白色 |1 ||霍尔顿 |空 |0 ||日产 |黑色 |1 ||智能 |白色 |1 ||丰田 |空 |1 ||丰田 |黑色 |1 ||丰田 |绿色 |2 ||丰田 |红色 |1 |+--------+-------+--------------+11 行(0.00 秒)

如我们所见,我们在查询中有两个外部联接,结果按预期通过.

现在,你问的那些其他类型的连接怎么样?交叉路口呢?

好吧,并非所有数据库都支持 intersection,但几乎所有数据库都允许您通过连接(或至少结构良好的 where 语句)创建交集.

Intersection 是一种连接类型,有点类似于上述的 union - 但区别在于它仅返回相同的数据行(并且我确实意味着相同)在联合加入的各种单独查询之间.只会返回在各方面都相同的行.

一个简单的例子是这样的:

选择*从颜色在哪里ID>2相交选择*从颜色在哪里id<4

虽然普通的 union 查询将返回表的所有行(第一个查询返回 ID>2 上的任何内容,第二个返回任何具有 ID<;4) 这将导致一个完整的集合,交叉查询将只返回匹配 id=3 的行,因为它满足两个条件.

现在,如果您的数据库不支持 intersect 查询,则可以使用以下查询轻松完成上述操作:

选择援助,a.颜色,a.油漆从颜色 a加入颜色 ba.ID=b.ID在哪里a.ID>2b.ID<4+----+-------+----------+|身份证 |颜色 |油漆|+----+-------+----------+|3 |蓝色 |金属 |+----+-------+----------+1 行(0.00 秒)

如果您希望使用本身不支持交集查询的数据库在两个不同的表之间执行交集,则需要在表的每一列上创建一个连接.>

I would like to know the following:

  • how to get data from multiple tables in my database?
  • what types of methods are there to do this?
  • what are joins and unions and how are they different from one another?
  • When should I use each one compared to the others?

I am planning to use this in my (for example - PHP) application, but don't want to run multiple queries against the database, what options do I have to get data from multiple tables in a single query?

Note: I am writing this as I would like to be able to link to a well written guide on the numerous questions that I constantly come across in the PHP queue, so I can link to this for further detail when I post an answer.

The answers cover off the following:

  1. Part 1 - Joins and Unions
  2. Part 2 - Subqueries
  3. Part 3 - Tricks and Efficient Code
  4. Part 4 - Subqueries in the From Clause
  5. Part 5 - Mixed Bag of John's Tricks

解决方案

Part 1 - Joins and Unions

This answer covers:

  1. Part 1
    • Joining two or more tables using an inner join (See the wikipedia entry for additional info)
    • How to use a union query
    • Left and Right Outer Joins (this stackOverflow answer is excellent to describe types of joins)
    • Intersect queries (and how to reproduce them if your database doesn't support them) - this is a function of SQL-Server (see info) and part of the reason I wrote this whole thing in the first place.
  2. Part 2
    • Subqueries - what they are, where they can be used and what to watch out for
    • Cartesian joins AKA - Oh, the misery!

There are a number of ways to retrieve data from multiple tables in a database. In this answer, I will be using ANSI-92 join syntax. This may be different to a number of other tutorials out there which use the older ANSI-89 syntax (and if you are used to 89, may seem much less intuitive - but all I can say is to try it) as it is much easier to understand when the queries start getting more complex. Why use it? Is there a performance gain? The short answer is no, but it is easier to read once you get used to it. It is easier to read queries written by other folks using this syntax.

I am also going to use the concept of a small caryard which has a database to keep track of what cars it has available. The owner has hired you as his IT Computer guy and expects you to be able to drop him the data that he asks for at the drop of a hat.

I have made a number of lookup tables that will be used by the final table. This will give us a reasonable model to work from. To start off, I will be running my queries against an example database that has the following structure. I will try to think of common mistakes that are made when starting out and explain what goes wrong with them - as well as of course showing how to correct them.

The first table is simply a color listing so that we know what colors we have in the car yard.

mysql> create table colors(id int(3) not null auto_increment primary key, 
    -> color varchar(15), paint varchar(10));
Query OK, 0 rows affected (0.01 sec)

mysql> show columns from colors;
+-------+-------------+------+-----+---------+----------------+
| Field | Type        | Null | Key | Default | Extra          |
+-------+-------------+------+-----+---------+----------------+
| id    | int(3)      | NO   | PRI | NULL    | auto_increment |
| color | varchar(15) | YES  |     | NULL    |                |
| paint | varchar(10) | YES  |     | NULL    |                |
+-------+-------------+------+-----+---------+----------------+
3 rows in set (0.01 sec)

mysql> insert into colors (color, paint) values ('Red', 'Metallic'), 
    -> ('Green', 'Gloss'), ('Blue', 'Metallic'), 
    -> ('White' 'Gloss'), ('Black' 'Gloss');
Query OK, 5 rows affected (0.00 sec)
Records: 5  Duplicates: 0  Warnings: 0

mysql> select * from colors;
+----+-------+----------+
| id | color | paint    |
+----+-------+----------+
|  1 | Red   | Metallic |
|  2 | Green | Gloss    |
|  3 | Blue  | Metallic |
|  4 | White | Gloss    |
|  5 | Black | Gloss    |
+----+-------+----------+
5 rows in set (0.00 sec)

The brands table identifies the different brands of the cars out caryard could possibly sell.

mysql> create table brands (id int(3) not null auto_increment primary key, 
    -> brand varchar(15));
Query OK, 0 rows affected (0.01 sec)

mysql> show columns from brands;
+-------+-------------+------+-----+---------+----------------+
| Field | Type        | Null | Key | Default | Extra          |
+-------+-------------+------+-----+---------+----------------+
| id    | int(3)      | NO   | PRI | NULL    | auto_increment |
| brand | varchar(15) | YES  |     | NULL    |                |
+-------+-------------+------+-----+---------+----------------+
2 rows in set (0.01 sec)

mysql> insert into brands (brand) values ('Ford'), ('Toyota'), 
    -> ('Nissan'), ('Smart'), ('BMW');
Query OK, 5 rows affected (0.00 sec)
Records: 5  Duplicates: 0  Warnings: 0

mysql> select * from brands;
+----+--------+
| id | brand  |
+----+--------+
|  1 | Ford   |
|  2 | Toyota |
|  3 | Nissan |
|  4 | Smart  |
|  5 | BMW    |
+----+--------+
5 rows in set (0.00 sec)

The model table will cover off different types of cars, it is going to be simpler for this to use different car types rather than actual car models.

mysql> create table models (id int(3) not null auto_increment primary key, 
    -> model varchar(15));
Query OK, 0 rows affected (0.01 sec)

mysql> show columns from models;
+-------+-------------+------+-----+---------+----------------+
| Field | Type        | Null | Key | Default | Extra          |
+-------+-------------+------+-----+---------+----------------+
| id    | int(3)      | NO   | PRI | NULL    | auto_increment |
| model | varchar(15) | YES  |     | NULL    |                |
+-------+-------------+------+-----+---------+----------------+
2 rows in set (0.00 sec)

mysql> insert into models (model) values ('Sports'), ('Sedan'), ('4WD'), ('Luxury');
Query OK, 4 rows affected (0.00 sec)
Records: 4  Duplicates: 0  Warnings: 0

mysql> select * from models;
+----+--------+
| id | model  |
+----+--------+
|  1 | Sports |
|  2 | Sedan  |
|  3 | 4WD    |
|  4 | Luxury |
+----+--------+
4 rows in set (0.00 sec)

And finally, to tie up all these other tables, the table that ties everything together. The ID field is actually the unique lot number used to identify cars.

mysql> create table cars (id int(3) not null auto_increment primary key, 
    -> color int(3), brand int(3), model int(3));
Query OK, 0 rows affected (0.01 sec)

mysql> show columns from cars;
+-------+--------+------+-----+---------+----------------+
| Field | Type   | Null | Key | Default | Extra          |
+-------+--------+------+-----+---------+----------------+
| id    | int(3) | NO   | PRI | NULL    | auto_increment |
| color | int(3) | YES  |     | NULL    |                |
| brand | int(3) | YES  |     | NULL    |                |
| model | int(3) | YES  |     | NULL    |                |
+-------+--------+------+-----+---------+----------------+
4 rows in set (0.00 sec)

mysql> insert into cars (color, brand, model) values (1,2,1), (3,1,2), (5,3,1), 
    -> (4,4,2), (2,2,3), (3,5,4), (4,1,3), (2,2,1), (5,2,3), (4,5,1);
Query OK, 10 rows affected (0.00 sec)
Records: 10  Duplicates: 0  Warnings: 0

mysql> select * from cars;
+----+-------+-------+-------+
| id | color | brand | model |
+----+-------+-------+-------+
|  1 |     1 |     2 |     1 |
|  2 |     3 |     1 |     2 |
|  3 |     5 |     3 |     1 |
|  4 |     4 |     4 |     2 |
|  5 |     2 |     2 |     3 |
|  6 |     3 |     5 |     4 |
|  7 |     4 |     1 |     3 |
|  8 |     2 |     2 |     1 |
|  9 |     5 |     2 |     3 |
| 10 |     4 |     5 |     1 |
+----+-------+-------+-------+
10 rows in set (0.00 sec)

This will give us enough data (I hope) to cover off the examples below of different types of joins and also give enough data to make them worthwhile.

So getting into the grit of it, the boss wants to know The IDs of all the sports cars he has.

This is a simple two table join. We have a table that identifies the model and the table with the available stock in it. As you can see, the data in the model column of the cars table relates to the models column of the cars table we have. Now, we know that the models table has an ID of 1 for Sports so lets write the join.

select
    ID,
    model
from
    cars
        join models
            on model=ID

So this query looks good right? We have identified the two tables and contain the information we need and use a join that correctly identifies what columns to join on.

ERROR 1052 (23000): Column 'ID' in field list is ambiguous

Oh noes! An error in our first query! Yes, and it is a plum. You see, the query has indeed got the right columns, but some of them exist in both tables, so the database gets confused about what actual column we mean and where. There are two solutions to solve this. The first is nice and simple, we can use tableName.columnName to tell the database exactly what we mean, like this:

select
    cars.ID,
    models.model
from
    cars
        join models
            on cars.model=models.ID

+----+--------+
| ID | model  |
+----+--------+
|  1 | Sports |
|  3 | Sports |
|  8 | Sports |
| 10 | Sports |
|  2 | Sedan  |
|  4 | Sedan  |
|  5 | 4WD    |
|  7 | 4WD    |
|  9 | 4WD    |
|  6 | Luxury |
+----+--------+
10 rows in set (0.00 sec)

The other is probably more often used and is called table aliasing. The tables in this example have nice and short simple names, but typing out something like KPI_DAILY_SALES_BY_DEPARTMENT would probably get old quickly, so a simple way is to nickname the table like this:

select
    a.ID,
    b.model
from
    cars a
        join models b
            on a.model=b.ID

Now, back to the request. As you can see we have the information we need, but we also have information that wasn't asked for, so we need to include a where clause in the statement to only get the Sports cars as was asked. As I prefer the table alias method rather than using the table names over and over, I will stick to it from this point onwards.

Clearly, we need to add a where clause to our query. We can identify Sports cars either by ID=1 or model='Sports'. As the ID is indexed and the primary key (and it happens to be less typing), lets use that in our query.

select
    a.ID,
    b.model
from
    cars a
        join models b
            on a.model=b.ID
where
    b.ID=1

+----+--------+
| ID | model  |
+----+--------+
|  1 | Sports |
|  3 | Sports |
|  8 | Sports |
| 10 | Sports |
+----+--------+
4 rows in set (0.00 sec)

Bingo! The boss is happy. Of course, being a boss and never being happy with what he asked for, he looks at the information, then says I want the colors as well.

Okay, so we have a good part of our query already written, but we need to use a third table which is colors. Now, our main information table cars stores the car color ID and this links back to the colors ID column. So, in a similar manner to the original, we can join a third table:

select
    a.ID,
    b.model
from
    cars a
        join models b
            on a.model=b.ID
        join colors c
            on a.color=c.ID
where
    b.ID=1

+----+--------+
| ID | model  |
+----+--------+
|  1 | Sports |
|  3 | Sports |
|  8 | Sports |
| 10 | Sports |
+----+--------+
4 rows in set (0.00 sec)

Damn, although the table was correctly joined and the related columns were linked, we forgot to pull in the actual information from the new table that we just linked.

select
    a.ID,
    b.model,
    c.color
from
    cars a
        join models b
            on a.model=b.ID
        join colors c
            on a.color=c.ID
where
    b.ID=1

+----+--------+-------+
| ID | model  | color |
+----+--------+-------+
|  1 | Sports | Red   |
|  8 | Sports | Green |
| 10 | Sports | White |
|  3 | Sports | Black |
+----+--------+-------+
4 rows in set (0.00 sec)

Right, that's the boss off our back for a moment. Now, to explain some of this in a little more detail. As you can see, the from clause in our statement links our main table (I often use a table that contains information rather than a lookup or dimension table. The query would work just as well with the tables all switched around, but make less sense when we come back to this query to read it in a few months time, so it is often best to try to write a query that will be nice and easy to understand - lay it out intuitively, use nice indenting so that everything is as clear as it can be. If you go on to teach others, try to instill these characteristics in their queries - especially if you will be troubleshooting them.

It is entirely possible to keep linking more and more tables in this manner.

select
    a.ID,
    b.model,
    c.color
from
    cars a
        join models b
            on a.model=b.ID
        join colors c
            on a.color=c.ID
        join brands d
            on a.brand=d.ID
where
    b.ID=1

While I forgot to include a table where we might want to join more than one column in the join statement, here is an example. If the models table had brand-specific models and therefore also had a column called brand which linked back to the brands table on the ID field, it could be done as this:

select
    a.ID,
    b.model,
    c.color
from
    cars a
        join models b
            on a.model=b.ID
        join colors c
            on a.color=c.ID
        join brands d
            on a.brand=d.ID
            and b.brand=d.ID
where
    b.ID=1

You can see, the query above not only links the joined tables to the main cars table, but also specifies joins between the already joined tables. If this wasn't done, the result is called a cartesian join - which is dba speak for bad. A cartesian join is one where rows are returned because the information doesn't tell the database how to limit the results, so the query returns all the rows that fit the criteria.

So, to give an example of a cartesian join, lets run the following query:

select
    a.ID,
    b.model
from
    cars a
        join models b

+----+--------+
| ID | model  |
+----+--------+
|  1 | Sports |
|  1 | Sedan  |
|  1 | 4WD    |
|  1 | Luxury |
|  2 | Sports |
|  2 | Sedan  |
|  2 | 4WD    |
|  2 | Luxury |
|  3 | Sports |
|  3 | Sedan  |
|  3 | 4WD    |
|  3 | Luxury |
|  4 | Sports |
|  4 | Sedan  |
|  4 | 4WD    |
|  4 | Luxury |
|  5 | Sports |
|  5 | Sedan  |
|  5 | 4WD    |
|  5 | Luxury |
|  6 | Sports |
|  6 | Sedan  |
|  6 | 4WD    |
|  6 | Luxury |
|  7 | Sports |
|  7 | Sedan  |
|  7 | 4WD    |
|  7 | Luxury |
|  8 | Sports |
|  8 | Sedan  |
|  8 | 4WD    |
|  8 | Luxury |
|  9 | Sports |
|  9 | Sedan  |
|  9 | 4WD    |
|  9 | Luxury |
| 10 | Sports |
| 10 | Sedan  |
| 10 | 4WD    |
| 10 | Luxury |
+----+--------+
40 rows in set (0.00 sec)

Good god, that's ugly. However, as far as the database is concerned, it is exactly what was asked for. In the query, we asked for for the ID from cars and the model from models. However, because we didn't specify how to join the tables, the database has matched every row from the first table with every row from the second table.

Okay, so the boss is back, and he wants more information again. I want the same list, but also include 4WDs in it.

This however, gives us a great excuse to look at two different ways to accomplish this. We could add another condition to the where clause like this:

select
    a.ID,
    b.model,
    c.color
from
    cars a
        join models b
            on a.model=b.ID
        join colors c
            on a.color=c.ID
        join brands d
            on a.brand=d.ID
where
    b.ID=1
    or b.ID=3

While the above will work perfectly well, lets look at it differently, this is a great excuse to show how a union query will work.

We know that the following will return all the Sports cars:

select
    a.ID,
    b.model,
    c.color
from
    cars a
        join models b
            on a.model=b.ID
        join colors c
            on a.color=c.ID
        join brands d
            on a.brand=d.ID
where
    b.ID=1

And the following would return all the 4WDs:

select
    a.ID,
    b.model,
    c.color
from
    cars a
        join models b
            on a.model=b.ID
        join colors c
            on a.color=c.ID
        join brands d
            on a.brand=d.ID
where
    b.ID=3

So by adding a union all clause between them, the results of the second query will be appended to the results of the first query.

select
    a.ID,
    b.model,
    c.color
from
    cars a
        join models b
            on a.model=b.ID
        join colors c
            on a.color=c.ID
        join brands d
            on a.brand=d.ID
where
    b.ID=1
union all
select
    a.ID,
    b.model,
    c.color
from
    cars a
        join models b
            on a.model=b.ID
        join colors c
            on a.color=c.ID
        join brands d
            on a.brand=d.ID
where
    b.ID=3

+----+--------+-------+
| ID | model  | color |
+----+--------+-------+
|  1 | Sports | Red   |
|  8 | Sports | Green |
| 10 | Sports | White |
|  3 | Sports | Black |
|  5 | 4WD    | Green |
|  7 | 4WD    | White |
|  9 | 4WD    | Black |
+----+--------+-------+
7 rows in set (0.00 sec)

As you can see, the results of the first query are returned first, followed by the results of the second query.

In this example, it would of course have been much easier to simply use the first query, but union queries can be great for specific cases. They are a great way to return specific results from tables from tables that aren't easily joined together - or for that matter completely unrelated tables. There are a few rules to follow however.

  • The column types from the first query must match the column types from every other query below.
  • The names of the columns from the first query will be used to identify the entire set of results.
  • The number of columns in each query must be the same.

Now, you might be wondering what the difference is between using union and union all. A union query will remove duplicates, while a union all will not. This does mean that there is a small performance hit when using union over union all but the results may be worth it - I won't speculate on that sort of thing in this though.

On this note, it might be worth noting some additional notes here.

  • If we wanted to order the results, we can use an order by but you can't use the alias anymore. In the query above, appending an order by a.ID would result in an error - as far as the results are concerned, the column is called ID rather than a.ID - even though the same alias has been used in both queries.
  • We can only have one order by statement, and it must be as the last statement.

For the next examples, I am adding a few extra rows to our tables.

I have added Holden to the brands table. I have also added a row into cars that has the color value of 12 - which has no reference in the colors table.

Okay, the boss is back again, barking requests out - *I want a count of each brand we carry and the number of cars in it!` - Typical, we just get to an interesting section of our discussion and the boss wants more work.

Rightyo, so the first thing we need to do is get a complete listing of possible brands.

select
    a.brand
from
    brands a

+--------+
| brand  |
+--------+
| Ford   |
| Toyota |
| Nissan |
| Smart  |
| BMW    |
| Holden |
+--------+
6 rows in set (0.00 sec)

Now, when we join this to our cars table we get the following result:

select
    a.brand
from
    brands a
        join cars b
            on a.ID=b.brand
group by
    a.brand

+--------+
| brand  |
+--------+
| BMW    |
| Ford   |
| Nissan |
| Smart  |
| Toyota |
+--------+
5 rows in set (0.00 sec)

Which is of course a problem - we aren't seeing any mention of the lovely Holden brand I added.

This is because a join looks for matching rows in both tables. As there is no data in cars that is of type Holden it isn't returned. This is where we can use an outer join. This will return all the results from one table whether they are matched in the other table or not:

select
    a.brand
from
    brands a
        left outer join cars b
            on a.ID=b.brand
group by
    a.brand

+--------+
| brand  |
+--------+
| BMW    |
| Ford   |
| Holden |
| Nissan |
| Smart  |
| Toyota |
+--------+
6 rows in set (0.00 sec)

Now that we have that, we can add a lovely aggregate function to get a count and get the boss off our backs for a moment.

select
    a.brand,
    count(b.id) as countOfBrand
from
    brands a
        left outer join cars b
            on a.ID=b.brand
group by
    a.brand

+--------+--------------+
| brand  | countOfBrand |
+--------+--------------+
| BMW    |            2 |
| Ford   |            2 |
| Holden |            0 |
| Nissan |            1 |
| Smart  |            1 |
| Toyota |            5 |
+--------+--------------+
6 rows in set (0.00 sec)

And with that, away the boss skulks.

Now, to explain this in some more detail, outer joins can be of the left or right type. The Left or Right defines which table is fully included. A left outer join will include all the rows from the table on the left, while (you guessed it) a right outer join brings all the results from the table on the right into the results.

Some databases will allow a full outer join which will bring back results (whether matched or not) from both tables, but this isn't supported in all databases.

Now, I probably figure at this point in time, you are wondering whether or not you can merge join types in a query - and the answer is yes, you absolutely can.

select
    b.brand,
    c.color,
    count(a.id) as countOfBrand
from
    cars a
        right outer join brands b
            on b.ID=a.brand
        join colors c
            on a.color=c.ID
group by
    a.brand,
    c.color

+--------+-------+--------------+
| brand  | color | countOfBrand |
+--------+-------+--------------+
| Ford   | Blue  |            1 |
| Ford   | White |            1 |
| Toyota | Black |            1 |
| Toyota | Green |            2 |
| Toyota | Red   |            1 |
| Nissan | Black |            1 |
| Smart  | White |            1 |
| BMW    | Blue  |            1 |
| BMW    | White |            1 |
+--------+-------+--------------+
9 rows in set (0.00 sec)

So, why is that not the results that were expected? It is because although we have selected the outer join from cars to brands, it wasn't specified in the join to colors - so that particular join will only bring back results that match in both tables.

Here is the query that would work to get the results that we expected:

select
    a.brand,
    c.color,
    count(b.id) as countOfBrand
from
    brands a
        left outer join cars b
            on a.ID=b.brand
        left outer join colors c
            on b.color=c.ID
group by
    a.brand,
    c.color

+--------+-------+--------------+
| brand  | color | countOfBrand |
+--------+-------+--------------+
| BMW    | Blue  |            1 |
| BMW    | White |            1 |
| Ford   | Blue  |            1 |
| Ford   | White |            1 |
| Holden | NULL  |            0 |
| Nissan | Black |            1 |
| Smart  | White |            1 |
| Toyota | NULL  |            1 |
| Toyota | Black |            1 |
| Toyota | Green |            2 |
| Toyota | Red   |            1 |
+--------+-------+--------------+
11 rows in set (0.00 sec)

As we can see, we have two outer joins in the query and the results are coming through as expected.

Now, how about those other types of joins you ask? What about Intersections?

Well, not all databases support the intersection but pretty much all databases will allow you to create an intersection through a join (or a well structured where statement at the least).

An Intersection is a type of join somewhat similar to a union as described above - but the difference is that it only returns rows of data that are identical (and I do mean identical) between the various individual queries joined by the union. Only rows that are identical in every regard will be returned.

A simple example would be as such:

select
    *
from
    colors
where
    ID>2
intersect
select
    *
from
    colors
where
    id<4

While a normal union query would return all the rows of the table (the first query returning anything over ID>2 and the second anything having ID<4) which would result in a full set, an intersect query would only return the row matching id=3 as it meets both criteria.

Now, if your database doesn't support an intersect query, the above can be easily accomlished with the following query:

select
    a.ID,
    a.color,
    a.paint
from
    colors a
        join colors b
            on a.ID=b.ID
where
    a.ID>2
    and b.ID<4

+----+-------+----------+
| ID | color | paint    |
+----+-------+----------+
|  3 | Blue  | Metallic |
+----+-------+----------+
1 row in set (0.00 sec)

If you wish to perform an intersection across two different tables using a database that doesn't inherently support an intersection query, you will need to create a join on every column of the tables.

相关文章