Does this (normalised) database structure allow me to search by tags as I intend?

2021-12-17 00:00:00 join mysql relational-database database-design

I am trying to set up a normalised MySQL database containing the following three tables. The first table contains a list of items which can be described by various tags. The third table contains the various tags used to describe the items in the first table. The middle table relates the other two tables to each other. In each table, id is an auto-incrementing primary key (and each is used as a foreign key in the middle table).

+---------------+---------------------+---------------+
|   Table 1     |      Table 2        |   Table 3     |
+---------------+---------------------+---------------+
|id        item |id   item_id   tag_id|id          tag|
+---------------+---------------------+---------------+
| 1      spaniel| 1         1        4| 1         bird|
| 2        tabby| 2         1       23| 4          pet|
| 3      chicken| 3         1       41|23          dog|
| 4     goldfish| 4         2        4|24          cat|
|               | 5         2       24|25      reptile|
|               | 6         3        1|38         fish|
|               | 7         3       40|40    delicious|
|               | 8         4        4|41        cheap|
|               | 9         4       38|42    expensive|
|               |10         4       41|               |
|               |                     |               |
+---------------+---------------------+---------------+

I want to run a query for one or more tags against the three tables to return the items that match ALL of the tags.

So for example, querying for "pet" would return the items (1) spaniel, (2) tabby and (4) goldfish, because all of them are tagged "pet". Querying for "cheap" and "pet" together would return (1) spaniel and (4) goldfish because they are both tagged "cheap" and "pet". Tabby would not be returned as it is only tagged "pet" but not "cheap" (in my world tabby cats are expensive :P)

Querying for "cheap", "pet" and "dog" would only return (1) spaniel, since it is the only one matching all three tags.
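The AND-across-tags behaviour described above is set containment: an item qualifies when its set of tag IDs includes every requested tag ID. The following minimal Python sketch checks that against the rows of Table 2. It only illustrates the intended semantics; the names ITEM_TAG, ITEMS, TAGS and items_with_all_tags are illustrative and not part of the schema.

```python
# Rows of the middle table above, as (item_id, tag_id) pairs.
ITEM_TAG = [(1, 4), (1, 23), (1, 41), (2, 4), (2, 24),
            (3, 1), (3, 40), (4, 4), (4, 38), (4, 41)]
ITEMS = {1: "spaniel", 2: "tabby", 3: "chicken", 4: "goldfish"}
TAGS = {"bird": 1, "pet": 4, "dog": 23, "cat": 24, "reptile": 25,
        "fish": 38, "delicious": 40, "cheap": 41, "expensive": 42}


def items_with_all_tags(tag_names):
    """Return the names of items that carry every tag in tag_names."""
    # An unknown tag name maps to None, which no item carries, so the result is empty.
    wanted = {TAGS.get(name) for name in tag_names}
    tags_by_item = {}
    for item_id, tag_id in ITEM_TAG:
        tags_by_item.setdefault(item_id, set()).add(tag_id)
    # wanted <= tags is the subset test: all requested tags are present.
    return sorted(ITEMS[item_id] for item_id, tags in tags_by_item.items()
                  if wanted <= tags)


print(items_with_all_tags(["pet"]))                  # ['goldfish', 'spaniel', 'tabby']
print(items_with_all_tags(["cheap", "pet"]))         # ['goldfish', 'spaniel']
print(items_with_all_tags(["cheap", "pet", "dog"]))  # ['spaniel']
```

With an empty tag list every tagged item qualifies, because the empty set is a subset of any set; a real query needs an explicit decision for that case.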

Anyway, this is the desired behaviour. I have two questions.

Is this the best way to set up my tables for my intended purposes? I am still new to the idea of normalising databases, and am picking this up as I go along. Any input on efficiency, or on whether this is an appropriate layout for my database at all, would be much appreciated.

Provided the above setup is workable, how could I structure a single MySQL query to achieve my intended purpose?* (That is, for a series of tags, return ONLY the item(s) that match ALL the specified tags.) I have tried a variety of JOINs/UNIONs, but none of them give me the desired effect (they usually return ALL the items that match ANY of the tags). I've spent some time looking through the MySQL manual online, but I feel like I'm missing something conceptually.

*I say single query since of course I could just run a series of simple WHERE/JOIN queries, one for each tag and then combine/sort the returned items in PHP or something after the fact but it seems a foolish and inefficient way of doing it. I feel like there is a way I should be able to do this with a single MySQL query, given the appropriate setup.

Solution

Your schema is looking fairly good. There’s no need for the ID column in your join table—just create a primary key from the ID columns of the other tables (although see Marjan Venema's comment and Should I use composite primary keys or not? for alternative views on this). The following examples show how you can create the tables, add some data, and perform the queries that you requested.
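One practical reason to key the join table on the pair itself is that it makes a duplicate link impossible, whereas a surrogate id column happily accepts the same (item_id, tag_id) pair twice. The following minimal sketch shows the difference. It uses SQLite through Python's built-in sqlite3 module as a stand-in so that it runs without a MySQL server (an assumption; InnoDB would report a duplicate primary key as ERROR 1062). The table name item_tag_surrogate is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Join table keyed by a surrogate id, as in the question: nothing stops duplicate pairs.
conn.execute("""CREATE TABLE item_tag_surrogate (
    id INTEGER PRIMARY KEY,
    item_id INTEGER NOT NULL,
    tag_id INTEGER NOT NULL)""")
# Join table keyed by the (item_id, tag_id) pair itself.
conn.execute("""CREATE TABLE item_tag (
    item_id INTEGER NOT NULL,
    tag_id INTEGER NOT NULL,
    PRIMARY KEY (item_id, tag_id))""")

# The same link inserted twice is silently accepted by the surrogate-key table.
conn.execute("INSERT INTO item_tag_surrogate (item_id, tag_id) VALUES (1, 2)")
conn.execute("INSERT INTO item_tag_surrogate (item_id, tag_id) VALUES (1, 2)")

conn.execute("INSERT INTO item_tag (item_id, tag_id) VALUES (1, 2)")
try:
    conn.execute("INSERT INTO item_tag (item_id, tag_id) VALUES (1, 2)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True  # the composite primary key refuses the repeat

surrogate_rows = conn.execute("SELECT COUNT(*) FROM item_tag_surrogate").fetchone()[0]
composite_rows = conn.execute("SELECT COUNT(*) FROM item_tag").fetchone()[0]
print(surrogate_rows, composite_rows, duplicate_rejected)  # 2 1 True
```

A duplicated link would also be counted twice by any query that counts matching rows per item, so the composite key protects queries as well as data.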

Create the tables, complete with foreign key constraints. In short, foreign key constraints help to ensure database integrity. In this example, they prevent rows from being inserted into the join table (item_tag) unless the referenced rows exist in the item and tag tables, and the ON DELETE CASCADE clauses remove an item's or tag's links when that item or tag is deleted:

CREATE  TABLE IF NOT EXISTS `item` (
  `id` INT UNSIGNED NOT NULL AUTO_INCREMENT ,
  `item` VARCHAR(255) NOT NULL ,
  PRIMARY KEY (`id`) )
ENGINE = InnoDB;

CREATE  TABLE IF NOT EXISTS `tag` (
  `id` INT UNSIGNED NOT NULL AUTO_INCREMENT ,
  `tag` VARCHAR(255) NOT NULL ,
  PRIMARY KEY (`id`) )
ENGINE = InnoDB;

CREATE  TABLE IF NOT EXISTS `item_tag` (
  `item_id` INT UNSIGNED NOT NULL ,
  `tag_id` INT UNSIGNED NOT NULL ,
  PRIMARY KEY (`item_id`, `tag_id`) ,
  INDEX `fk_item_tag_item` (`item_id` ASC) ,
  INDEX `fk_item_tag_tag` (`tag_id` ASC) ,
  CONSTRAINT `fk_item_tag_item`
    FOREIGN KEY (`item_id` )
    REFERENCES `item` (`id` )
    ON DELETE CASCADE
    ON UPDATE CASCADE,
  CONSTRAINT `fk_item_tag_tag`
    FOREIGN KEY (`tag_id` )
    REFERENCES `tag` (`id` )
    ON DELETE CASCADE
    ON UPDATE CASCADE)
ENGINE = InnoDB;
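To see the constraints at work, here is a minimal sketch using SQLite through Python's built-in sqlite3 module. This is an assumption made so that it runs without a MySQL server: the DDL is translated to SQLite syntax, and unlike InnoDB, which enforces foreign keys by default, SQLite only enforces them after PRAGMA foreign_keys = ON. The sketch shows an orphan link being rejected and ON DELETE CASCADE removing an item's links.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is on for the connection.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE item (id INTEGER PRIMARY KEY, item TEXT NOT NULL);
    CREATE TABLE tag  (id INTEGER PRIMARY KEY, tag  TEXT NOT NULL);
    CREATE TABLE item_tag (
        item_id INTEGER NOT NULL
            REFERENCES item (id) ON DELETE CASCADE ON UPDATE CASCADE,
        tag_id  INTEGER NOT NULL
            REFERENCES tag (id) ON DELETE CASCADE ON UPDATE CASCADE,
        PRIMARY KEY (item_id, tag_id)
    );
    INSERT INTO item (item) VALUES ('spaniel');
    INSERT INTO tag (tag) VALUES ('pet');
    INSERT INTO item_tag (item_id, tag_id) VALUES (1, 1);
""")

# Linking a non-existent item (id 99) violates the foreign key on item_id.
try:
    conn.execute("INSERT INTO item_tag (item_id, tag_id) VALUES (99, 1)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# Deleting the item cascades to its rows in item_tag.
conn.execute("DELETE FROM item WHERE id = 1")
links_left = conn.execute("SELECT COUNT(*) FROM item_tag").fetchone()[0]

print(orphan_rejected, links_left)  # True 0
```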

Insert some test data:

INSERT INTO item (item) VALUES
('spaniel'),
('tabby'),
('chicken'),
('goldfish');

INSERT INTO tag (tag) VALUES
('bird'),
('pet'),
('dog'),
('cat'),
('reptile'),
('fish'),
('delicious'),
('cheap'),
('expensive');

INSERT INTO item_tag (item_id, tag_id) VALUES
(1,2),
(1,3),
(1,8),
(2,2),
(2,4),
(3,1),
(3,7),
(4,2),
(4,6),
(4,8);
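The item_tag rows above hard-code IDs, which relies on the auto-increment counters starting at 1 on freshly created tables. A sketch of linking by name instead uses INSERT ... SELECT, which both MySQL and SQLite support, to look the two IDs up in the same statement. SQLite through Python's sqlite3 stands in for MySQL here so that the example runs on its own, and tag_item is a hypothetical helper, not part of the original answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item (id INTEGER PRIMARY KEY, item TEXT NOT NULL);
    CREATE TABLE tag  (id INTEGER PRIMARY KEY, tag  TEXT NOT NULL);
    CREATE TABLE item_tag (item_id INTEGER NOT NULL, tag_id INTEGER NOT NULL,
                           PRIMARY KEY (item_id, tag_id));
    INSERT INTO item (item) VALUES ('spaniel'), ('tabby');
    INSERT INTO tag (tag) VALUES ('pet'), ('dog'), ('cat');
""")


def tag_item(conn, item_name, tag_name):
    """Hypothetical helper: link item_name to tag_name, looking both IDs up by name.

    Returns the number of rows inserted: 1 on success, 0 if either name is unknown.
    """
    cur = conn.execute(
        "INSERT INTO item_tag (item_id, tag_id) "
        "SELECT item.id, tag.id FROM item, tag "
        "WHERE item.item = ? AND tag.tag = ?",
        (item_name, tag_name))
    return cur.rowcount


print(tag_item(conn, "spaniel", "pet"))  # 1
print(tag_item(conn, "spaniel", "dog"))  # 1
print(tag_item(conn, "tabby", "cat"))    # 1
print(tag_item(conn, "tabby", "house"))  # 0, no such tag
```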

Select all items and all tags:

SELECT item.id, item.item, tag.tag
FROM item
JOIN item_tag ON item_tag.item_id = item.id
JOIN tag ON item_tag.tag_id = tag.id;

+----+----------+-----------+
| id | item     | tag       |
+----+----------+-----------+
|  1 | spaniel  | pet       |
|  1 | spaniel  | dog       |
|  1 | spaniel  | cheap     |
|  2 | tabby    | pet       |
|  2 | tabby    | cat       |
|  3 | chicken  | bird      |
|  3 | chicken  | delicious |
|  4 | goldfish | pet       |
|  4 | goldfish | fish      |
|  4 | goldfish | cheap     |
+----+----------+-----------+

Select items with a specific tag:

SELECT item.id, item.item, tag.tag
FROM item
JOIN item_tag ON item_tag.item_id = item.id
JOIN tag ON item_tag.tag_id = tag.id
WHERE tag = 'pet';

+----+----------+-----+
| id | item     | tag |
+----+----------+-----+
|  1 | spaniel  | pet |
|  2 | tabby    | pet |
|  4 | goldfish | pet |
+----+----------+-----+

Select items that have at least one of several tags. Note that this returns items tagged cheap OR pet:

SELECT item.id, item.item, tag.tag
FROM item
JOIN item_tag ON item_tag.item_id = item.id
JOIN tag ON item_tag.tag_id = tag.id
WHERE tag IN ('cheap', 'pet');

+----+----------+-------+
| id | item     | tag   |
+----+----------+-------+
|  1 | spaniel  | pet   |
|  1 | spaniel  | cheap |
|  2 | tabby    | pet   |
|  4 | goldfish | pet   |
|  4 | goldfish | cheap |
+----+----------+-------+

The above query produces an answer that you might not want, as the following query highlights. Here there is no house tag at all, so no item can be tagged both cheap and house, yet the query still returns some rows:

SELECT item.id, item.item, tag.tag
FROM item
JOIN item_tag ON item_tag.item_id = item.id
JOIN tag ON item_tag.tag_id = tag.id
WHERE tag IN ('cheap', 'house');

+----+----------+-------+
| id | item     | tag   |
+----+----------+-------+
|  1 | spaniel  | cheap |
|  4 | goldfish | cheap |
+----+----------+-------+
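IN behaves like a chain of ORs, so each returned row only proves that one of the requested tags matched. Counting the matching rows per item makes the gap visible: an item carries every requested tag exactly when its count equals the number of tags asked for, which is the idea that GROUP BY and HAVING turn into a filter. A minimal sketch, with SQLite through Python's sqlite3 standing in for MySQL and a hypothetical helper named match_counts:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item (id INTEGER PRIMARY KEY, item TEXT NOT NULL);
    CREATE TABLE tag  (id INTEGER PRIMARY KEY, tag  TEXT NOT NULL);
    CREATE TABLE item_tag (item_id INTEGER NOT NULL, tag_id INTEGER NOT NULL,
                           PRIMARY KEY (item_id, tag_id));
    INSERT INTO item (item) VALUES ('spaniel'), ('tabby'), ('chicken'), ('goldfish');
    INSERT INTO tag (tag) VALUES ('bird'), ('pet'), ('dog'), ('cat'), ('reptile'),
                                 ('fish'), ('delicious'), ('cheap'), ('expensive');
    INSERT INTO item_tag (item_id, tag_id) VALUES
        (1,2),(1,3),(1,8),(2,2),(2,4),(3,1),(3,7),(4,2),(4,6),(4,8);
""")


def match_counts(conn, tags):
    """Hypothetical helper: (item, number of requested tags it carries) per item."""
    placeholders = ", ".join("?" for _ in tags)
    sql = (
        "SELECT item.item, COUNT(*) FROM item "
        "JOIN item_tag ON item_tag.item_id = item.id "
        "JOIN tag ON item_tag.tag_id = tag.id "
        f"WHERE tag.tag IN ({placeholders}) "
        "GROUP BY item.id, item.item ORDER BY item.id"
    )
    return conn.execute(sql, list(tags)).fetchall()


print(match_counts(conn, ["cheap", "pet"]))    # [('spaniel', 2), ('tabby', 1), ('goldfish', 2)]
print(match_counts(conn, ["cheap", "house"]))  # [('spaniel', 1), ('goldfish', 1)]
```

For cheap and house, no item reaches a count of 2, which is why the rows returned by the plain IN query are misleading.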

You can fix that by adding GROUP BY and HAVING:

SELECT item.id, item.item, tag.tag
FROM item
JOIN item_tag ON item_tag.item_id = item.id
JOIN tag ON item_tag.tag_id = tag.id
WHERE tag IN ('cheap', 'house')
GROUP BY item.id HAVING COUNT(*) = 2;

Empty set (0.00 sec)

GROUP BY collapses all rows with the same id (or whatever column you specify) into a single row, effectively removing duplicates. HAVING COUNT(*) = 2 then keeps only the groups that contain two rows, that is, items that matched both of the requested tags. Note that this number must equal the number of tags in the IN clause, and that the approach relies on each (item_id, tag_id) pair appearing only once in item_tag, which the composite primary key guarantees. Here’s an example that produces something:

SELECT item.id, item.item, tag.tag
FROM item
JOIN item_tag ON item_tag.item_id = item.id
JOIN tag ON item_tag.tag_id = tag.id
WHERE tag IN ('cheap', 'pet')
GROUP BY item.id HAVING COUNT(*) = 2;

+----+----------+-----+
| id | item     | tag |
+----+----------+-----+
|  1 | spaniel  | pet |
|  4 | goldfish | pet |
+----+----------+-----+
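In application code the tag list usually varies, so the IN list and the HAVING number have to be built together. A minimal sketch, assuming Python's sqlite3 as a stand-in for a MySQL driver (PHP's PDO accepts the same ? placeholders): it deduplicates the requested tags so that the count cannot drift from the IN list, binds every value as a parameter rather than pasting it into the SQL, and counts distinct tag names so that a duplicated link or tag row cannot inflate the count. The helper names make_sample_db and items_with_all_tags are hypothetical.

```python
import sqlite3


def make_sample_db():
    """Load the answer's sample data into an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE item (id INTEGER PRIMARY KEY, item TEXT NOT NULL UNIQUE);
        CREATE TABLE tag  (id INTEGER PRIMARY KEY, tag  TEXT NOT NULL UNIQUE);
        CREATE TABLE item_tag (item_id INTEGER NOT NULL, tag_id INTEGER NOT NULL,
                               PRIMARY KEY (item_id, tag_id));
        INSERT INTO item (item) VALUES ('spaniel'), ('tabby'), ('chicken'), ('goldfish');
        INSERT INTO tag (tag) VALUES ('bird'), ('pet'), ('dog'), ('cat'), ('reptile'),
                                     ('fish'), ('delicious'), ('cheap'), ('expensive');
        INSERT INTO item_tag (item_id, tag_id) VALUES
            (1,2),(1,3),(1,8),(2,2),(2,4),(3,1),(3,7),(4,2),(4,6),(4,8);
    """)
    return conn


def items_with_all_tags(conn, tags):
    """Hypothetical helper: (id, item) rows for items carrying every tag in tags."""
    wanted = sorted(set(tags))  # dedupe, so len(wanted) matches the IN list
    if not wanted:
        return []  # MySQL rejects an empty IN (); treat "no tags" as "no results"
    placeholders = ", ".join("?" for _ in wanted)
    sql = (
        "SELECT item.id, item.item FROM item "
        "JOIN item_tag ON item_tag.item_id = item.id "
        "JOIN tag ON item_tag.tag_id = tag.id "
        f"WHERE tag.tag IN ({placeholders}) "
        "GROUP BY item.id, item.item "
        "HAVING COUNT(DISTINCT tag.tag) = ? "
        "ORDER BY item.id"
    )
    return conn.execute(sql, [*wanted, len(wanted)]).fetchall()


conn = make_sample_db()
print(items_with_all_tags(conn, ["pet"]))                  # [(1, 'spaniel'), (2, 'tabby'), (4, 'goldfish')]
print(items_with_all_tags(conn, ["cheap", "pet"]))         # [(1, 'spaniel'), (4, 'goldfish')]
print(items_with_all_tags(conn, ["cheap", "pet", "dog"]))  # [(1, 'spaniel')]
print(items_with_all_tags(conn, ["cheap", "house"]))       # []
```

Returning no rows for an empty tag list is a design choice made for this sketch; returning every item would be equally defensible.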

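Note that in the previous example, the items have been grouped together so that you don’t get duplicates. In this case there’s no need for the tag column: each row shows only an arbitrary one of the item's matching tags, which just confuses the results, and you already know which tags matched because you requested them. You can therefore simplify things a little by removing the tag column from the query. On MySQL 5.7.5 and later this is also the form that works by default: the ONLY_FULL_GROUP_BY SQL mode is enabled there, and it rejects the earlier GROUP BY queries that select tag.tag, because that column is neither aggregated nor determined by item.id: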

SELECT item.id, item.item
FROM item
JOIN item_tag ON item_tag.item_id = item.id
JOIN tag ON item_tag.tag_id = tag.id
WHERE tag IN ('cheap', 'pet')
GROUP BY item.id HAVING COUNT(*) = 2;

+----+----------+
| id | item     |
+----+----------+
|  1 | spaniel  |
|  4 | goldfish |
+----+----------+

You could go a step further, and use GROUP_CONCAT to provide a list of matching tags. This might be handy where you want a list of items that have one or more of the specified tags, but not necessarily all of them:

SELECT item.id, item.item, GROUP_CONCAT(tag.tag) AS tags
FROM item
JOIN item_tag ON item_tag.item_id = item.id
JOIN tag ON item_tag.tag_id = tag.id
WHERE tag IN ('cheap', 'pet', 'bird', 'cat')
GROUP BY item.id;

+----+----------+-----------+
| id | item     | tags      |
+----+----------+-----------+
|  1 | spaniel  | pet,cheap |
|  2 | tabby    | pet,cat   |
|  3 | chicken  | bird      |
|  4 | goldfish | pet,cheap |
+----+----------+-----------+
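For the "any of these tags" case, a natural refinement is to rank items by how many of the requested tags they carry. The following sketch does that, using SQLite through Python's sqlite3 as a stand-in for MySQL. SQLite's GROUP_CONCAT does not guarantee an order, so the sketch sorts each tag list in Python; in MySQL, GROUP_CONCAT(tag.tag ORDER BY tag.tag) can do this in SQL. The helper names make_sample_db and items_with_any_tags are hypothetical.

```python
import sqlite3


def make_sample_db():
    """Load the answer's sample data into an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE item (id INTEGER PRIMARY KEY, item TEXT NOT NULL UNIQUE);
        CREATE TABLE tag  (id INTEGER PRIMARY KEY, tag  TEXT NOT NULL UNIQUE);
        CREATE TABLE item_tag (item_id INTEGER NOT NULL, tag_id INTEGER NOT NULL,
                               PRIMARY KEY (item_id, tag_id));
        INSERT INTO item (item) VALUES ('spaniel'), ('tabby'), ('chicken'), ('goldfish');
        INSERT INTO tag (tag) VALUES ('bird'), ('pet'), ('dog'), ('cat'), ('reptile'),
                                     ('fish'), ('delicious'), ('cheap'), ('expensive');
        INSERT INTO item_tag (item_id, tag_id) VALUES
            (1,2),(1,3),(1,8),(2,2),(2,4),(3,1),(3,7),(4,2),(4,6),(4,8);
    """)
    return conn


def items_with_any_tags(conn, tags):
    """Hypothetical helper: (item, matched tags, match count), best matches first."""
    wanted = sorted(set(tags))
    if not wanted:
        return []
    placeholders = ", ".join("?" for _ in wanted)
    rows = conn.execute(
        "SELECT item.item, GROUP_CONCAT(tag.tag) AS tags, COUNT(*) AS matches "
        "FROM item "
        "JOIN item_tag ON item_tag.item_id = item.id "
        "JOIN tag ON item_tag.tag_id = tag.id "
        f"WHERE tag.tag IN ({placeholders}) "
        "GROUP BY item.id, item.item "
        "ORDER BY matches DESC, item.id",
        wanted).fetchall()
    # Split the comma-separated list (assumes tag names contain no commas).
    return [(name, sorted(tag_list.split(",")), n) for name, tag_list, n in rows]


conn = make_sample_db()
for row in items_with_any_tags(conn, ["cheap", "pet", "bird", "cat"]):
    print(row)
# ('spaniel', ['cheap', 'pet'], 2)
# ('tabby', ['cat', 'pet'], 2)
# ('goldfish', ['cheap', 'pet'], 2)
# ('chicken', ['bird'], 1)
```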

One problem with the above schema design is that it is possible to enter duplicate items and tags. That is, you could insert bird into the tag table as many times as you like, and this is not good. One way to fix that is to add a UNIQUE INDEX to the item and tag columns. This has the added benefit of helping to speed up queries which rely on these columns. The updated CREATE TABLE commands now look like this:

CREATE  TABLE IF NOT EXISTS `item` (
  `id` INT UNSIGNED NOT NULL AUTO_INCREMENT ,
  `item` VARCHAR(255) NOT NULL ,
  UNIQUE INDEX `item` (`item`) ,
  PRIMARY KEY (`id`) )
ENGINE = InnoDB;

CREATE  TABLE IF NOT EXISTS `tag` (
  `id` INT UNSIGNED NOT NULL AUTO_INCREMENT ,
  `tag` VARCHAR(255) NOT NULL ,
  UNIQUE INDEX `tag` (`tag`) ,
  PRIMARY KEY (`id`) )
ENGINE = InnoDB;

Now if you try to insert a duplicate value, MySQL will prevent you from doing so:

INSERT INTO tag (tag) VALUES ('bird');
ERROR 1062 (23000): Duplicate entry 'bird' for key 'tag'
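With the unique index in place, a common follow-up is a "get or create" helper that inserts a tag only if it is new and returns its id either way. Below is a sketch in SQLite syntax via Python's sqlite3, which is an assumption made so that it runs without a server. SQLite spells the skip-on-duplicate insert INSERT OR IGNORE, whereas MySQL offers INSERT IGNORE (which also downgrades some other errors to warnings) or INSERT ... ON DUPLICATE KEY UPDATE. get_or_create_tag is a hypothetical helper. One difference worth checking: MySQL's default collations are case-insensitive, so the unique index there also treats 'Bird' and 'bird' as duplicates, while SQLite's default comparison is case-sensitive.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tag (id INTEGER PRIMARY KEY, tag TEXT NOT NULL UNIQUE)")
conn.execute("INSERT INTO tag (tag) VALUES ('bird')")

# A plain second insert of the same value fails, like the MySQL session above.
try:
    conn.execute("INSERT INTO tag (tag) VALUES ('bird')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True


def get_or_create_tag(conn, name):
    """Hypothetical helper: return the id of tag `name`, inserting it if missing."""
    # SQLite syntax; the unique index turns the insert into a no-op for existing names.
    conn.execute("INSERT OR IGNORE INTO tag (tag) VALUES (?)", (name,))
    return conn.execute("SELECT id FROM tag WHERE tag = ?", (name,)).fetchone()[0]


print(duplicate_rejected)               # True
print(get_or_create_tag(conn, "bird"))  # 1 (existing row)
print(get_or_create_tag(conn, "pet"))   # 2 (new row)
print(get_or_create_tag(conn, "pet"))   # 2 (no second row)
```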
