自然键与代理键 innodb 外键

一个问题:

我有两张桌子:

Product
id INT
name VARCHAR(64)
something TEXT
else INT
entirely BOOL

Ingredient
id INT
name VARCHAR(64)
description TEXT

现在我也有一个链接表

Products_Ingredients
product_id INT
ingredient_id INT

对于我的多对多关系.

现在,产品和成分都将拥有独特的名称.所以我可以使用名称作为自然键……但这会是一个好主意吗?

Now both products and ingredients will have unique names. So I can use names as natural keys... however will that be a good idea?

假设我有一个产品:Paint Thinner Supreme成分:丁基硝基四环素

Say I have a product: Paint Thinner Supreme with ingredient: Butylonitrotetrocycline

将这些名称用作链接表中的复合键是个好主意吗?

Will that be a good idea to use those names as composite key in the link table?

就我了解使用自然键代替代理的想法而言,我有点无法停止认为使用简单整数作为主键(和外键)会快得多.MySQL 服务器消化这些不同的键的方式会有所不同吗?

As much as I understand idea behind using natural keys over the surrogates, I kinda can't stop thinking that using simple integers as primary keys (and foreign ones) will be much faster. Will there be a difference in a way in which MySQL server digests those different keys?

你有什么看法?

推荐答案

何时可以衡量,意见并不重要.

Opinions don't matter when you can measure.

我使用自然键和代理在 PostgreSQL 上实现了这个.对于 100,000 个随机选择的产品(1053462 行),我总共使用了 300,000 种产品、180 种成分,并填充了两个产品成分"表,每种产品包含 3 到 17 种成分.

I implemented this on PostgreSQL using both natural keys and surrogates. I used 300,000 total products, 180 ingredients, and populated two "product ingredient" tables with 3 to 17 ingredients per product, for 100,000 randomly selected products (1053462 rows).

使用在 0.067 毫秒内返回的自然键选择单个产品的所有成分.使用代理,0.199 毫秒.

Selecting all the ingredients for a single product using natural keys returned in 0.067 ms. Using surrogates, 0.199ms.

使用在 0.145 毫秒内返回的自然键返回单个产品的所有非 id 列.使用代理,0.222 毫秒

Returning all the non-id columns for a single product using natural keys returned in 0.145 ms. Using surrogates, 0.222 ms

所以自然键在这个数据集上的速度大约快 2 到 3 倍.

So natural keys were about 2 to 3 times faster on this data set.

自然键不需要任何连接即可返回此数据.代理键需要两个连接.

Natural keys don't require any joins to return this data. Surrogate keys require two joins.

实际性能差异取决于表格的宽度、行数、页面大小和名称长度等.代理键会开始优于自然键,但很少有人尝试衡量这一点.

The actual performance difference depends on the width of your tables, number of rows, page size, and length of names, and things like that. There will be a point where surrogate keys start outperforming natural keys, but few people try to measure that.

当我为雇主的操作数据库设计数据库时,我构建了一个测试平台,其中包含围绕自然键设计的表和围绕 ID 编号设计的表.这两种模式都有超过 1300 万行计算机生成的样本数据.在少数情况下,对 id 号模式的查询比自然键模式高 50%.(因此,使用 id 号花费 20 秒的复杂查询使用自然键花费 30 秒.)但是 80% 的测试查询针对自然键模式具有更快的 SELECT 性能.有时它的速度快得惊人——相差 30 比 1.

When I was designing the database for my employer's operational database, I built a testbed with tables designed around natural keys and with tables designed around id numbers. Both those schemas have more than 13 million rows of computer-generated sample data. In a few cases, queries on the id number schema outperformed the natural key schema by 50%. (So a complex query that took 20 seconds with id numbers took 30 seconds with natural keys.) But 80% of the test queries had faster SELECT performance against the natural key schema. And sometimes it was staggeringly faster--a difference of 30 to 1.

我们预计自然键在未来几年内的性能将优于我们数据库中的代理.(除非我们将某些表移到 SSD 上,在这种情况下,自然键可能永远优于代理.)

We expect natural keys to outperform surrogates in our database for years to come. (Unless we move certain tables over to an SSD, in which case natural keys will probably outperform surrogates forever.)

相关文章