SQL 中的分层标记

2022-01-18 00:00:00 sql database tags mysql normalizing

我有一个 PHP Web 应用程序，它使用 MySQL 数据库进行对象标记，其中我使用了接受的标记结构作为这个 SO 问题.

I have a PHP web application which uses a MySQL database for object tagging, in which I've used the tag structure accepted as the answer to this SO question.

我想实现一个标签层次结构，其中每个标签都可以有一个唯一的父标签.对父标签 T 的搜索将匹配 T 的所有后代(即 T、父标签为 T(T 的子代)、T 的孙子等).

I'd like to implement a tag hierarchy, where each tag can have a unique parent tag. Searches for a parent tag T would then match all descendants of T (i.e. T, tags whos parent is T (children of T), grandchildren of T, etc.).

执行此操作的最简单方法似乎是将 ParentID 字段添加到标签表中，该字段包含标签的父标签的 ID，或者如果标签没有父标签，则包含一些幻数.然而，搜索后代需要对数据库进行重复的完整搜索以找到每个世代"中的标签，我想避免这种情况.

The easiest way of doing this seems to be to add a ParentID field to the tag table, which contains the ID of a tag's parent tag, or some magic number if the tag has no parent. Searching for descendants, however, then requires repeated full searches of the database to find the tags in each 'generation', which I'd like to avoid.

一种(可能)更快但不那么规范的方法是创建一个包含每个标签的所有子代，甚至每个标签的所有后代的表.然而，这会带来数据库中数据不一致的风险(例如，标签是多个父级的子级).

A (presumably) faster, but less normalised way of doing this would be to have a table containing all the children of each tag, or even all the descendants of each tag. This however runs the risk of inconsistent data in the database (e.g. a tag being the child of more than one parent).

有没有一种好方法可以让查询快速找到后代，同时尽可能保持数据标准化?

Is there a good way to make queries to find descendants fast, while keeping the data as normalised as possible?

推荐答案

阿里的答案有一个链接到 Joe Celko 为 Smarties 编写的 SQL 中的树和层次结构，这证实了我的怀疑 - 没有一个简单的数据库结构可以提供世界上最好的.最适合我的目的似乎是这本书中详述的频繁插入树"，它类似于阿里链接的嵌套集模型"，但具有非连续索引.这允许 O(1) 插入(a la 非结构化 BASIC 行编号)，并在需要时偶尔进行索引重组.

Ali's answer has a link to Joe Celko's Trees and Hierarchies in SQL for Smarties, which confirms my suspicion - there isn't a simple database structure that offers the best of all worlds. The best for my purpose seems to be the "Frequent Insertion Tree" detailed in this book, which is like the "Nested Set Model" of Ali's link, but with non-consecutive indexing. This allows O(1) insertion (a la unstructured BASIC line numbering), with occasional index reorganisation as and when needed.

相关文章