Storing undirected graph edges efficiently in MySQL

2021-12-24 00:00:00 graph mysql

I want to store the edges of an undirected graph (for example, friendships). To store and retrieve all friends of node a, one can use either of the following:

Create two rows per edge and query on one column:

+--------------------------+
| id | from_node | to_node |
+--------------------------+
| 1  |  a        |  b      |
| 2  |  b        |  a      |
+--------------------------+
SELECT * FROM `x` WHERE from_node = 'a'

Create one row per edge and query both columns with OR:

+--------------------------+
| id | node_a    | node_b  |
+--------------------------+
| 1  |  a        |  b      |
+--------------------------+
SELECT * FROM `y` WHERE node_a = 'a' OR node_b = 'a'

Which makes for more efficient lookups?

  • Table x with 2n rows, indices on from_node and to_node, lookup on one column
  • Table y with n rows, indices on node_a and node_b, lookup on both columns using OR

Recommended answer

If you optimise everything, X will be fastest, assuming the data is read from disk and you are querying for the friends of a single person. That is because you can arrange the data on disk so that rows are ordered to match one index, namely the one you are querying. For a single person you then need only one disk seek. Y requires lookups on two indices, so retrieving the friends of even a single person may take multiple seeks (and disk access time usually dominates simple queries).

See clustered indices on Wikipedia (and in the MySQL manual).
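As a rough sketch of what "ordered to match one index" could look like for X in MySQL: InnoDB stores rows in primary-key order, so a primary key that starts with from_node keeps all of a node's edges next to each other. The integer column types and the choice to drop the surrogate id column are assumptions made for illustration, not part of the original question.

```sql
-- Hypothetical layout for table x; column types are assumed.
CREATE TABLE x (
  from_node INT UNSIGNED NOT NULL,
  to_node   INT UNSIGNED NOT NULL,
  -- InnoDB clusters rows by primary key, so all rows sharing a
  -- from_node are stored adjacently on disk.
  PRIMARY KEY (from_node, to_node)
) ENGINE=InnoDB;

-- Each undirected edge is written in both directions; one statement
-- inserts both rows together.
INSERT INTO x (from_node, to_node) VALUES (1, 2), (2, 1);

-- Friends of node 1: a single range scan over the clustered index.
SELECT to_node FROM x WHERE from_node = 1;
```

With this layout a separate index on to_node is only needed if you also query by that column; since every edge is stored in both directions, the friends lookup never does.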

If you are lucky enough to know that the data will always be in memory, then both will likely be "fast enough" (and even if the data is on disk they may be fast enough; I am not saying X is the best design, only that it can be made the most efficient).
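If the one-row-per-edge layout Y is preferred for other reasons, a sketch along these lines makes the two index lookups explicit. The column types, the index name idx_node_b, and the convention of storing the smaller id in node_a are all assumptions for illustration. The UNION ALL form is an alternative to the OR query from the question, which MySQL may or may not resolve with an index merge.

```sql
-- Hypothetical layout for table y; column types are assumed.
-- Convention (assumed): the smaller id always goes in node_a,
-- so each edge has exactly one possible row.
CREATE TABLE y (
  node_a INT UNSIGNED NOT NULL,
  node_b INT UNSIGNED NOT NULL,
  PRIMARY KEY (node_a, node_b),
  KEY idx_node_b (node_b, node_a)
) ENGINE=InnoDB;

INSERT INTO y (node_a, node_b) VALUES (1, 2);

-- Friends of node 1: one lookup per index, combined.
-- Assumes no self-loops, otherwise a node would be listed twice.
SELECT node_b AS friend FROM y WHERE node_a = 1
UNION ALL
SELECT node_a AS friend FROM y WHERE node_b = 1;
```

Each branch reads from a different index, which is the "multiple seeks" cost described above.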
