具有 NULL 的唯一键

2022-01-06 00:00:00 null database mysql relational-model

这个问题需要一些假设背景.让我们考虑一个 employee 表,它包含 namedate_of_birthtitlesalary 列, 使用 MySQL 作为 RDBMS.因为如果任何给定的人的姓名和出生日期与另一个人相同,那么根据定义,他们就是同一个人(除非我们有两个名叫亚伯拉罕·林肯的人出生于 1809 年 2 月 12 日的惊人巧合),我们将放置一个namedate_of_birth 上的唯一键,意思是不要存储同一个人两次".现在考虑这个数据:

This question requires some hypothetical background. Let's consider an employee table that has columns name, date_of_birth, title, salary, using MySQL as the RDBMS. Since if any given person has the same name and birth date as another person, they are, by definition, the same person (barring amazing coincidences where we have two people named Abraham Lincoln born on February 12, 1809), we'll put a unique key on name and date_of_birth that means "don't store the same person twice." Now consider this data:

id name        date_of_birth title          salary
 1 John Smith  1960-10-02    President      500,000
 2 Jane Doe    1982-05-05    Accountant      80,000
 3 Jim Johnson NULL          Office Manager  40,000
 4 Tim Smith   1899-04-11    Janitor         95,000

如果我现在尝试运行以下语句,它应该并且将会失败:

If I now try to run the following statement, it should and will fail:

INSERT INTO employee (name, date_of_birth, title, salary)
VALUES ('Tim Smith', '1899-04-11', 'Janitor', '95,000')

如果我尝试这个,它会成功:

If I try this one, it will succeed:

INSERT INTO employee (name, title, salary)
VALUES ('Jim Johnson', 'Office Manager', '40,000')

现在我的数据将如下所示:

And now my data will look like this:

id name        date_of_birth title          salary
 1 John Smith  1960-10-02    President      500,000
 2 Jane Doe    1982-05-05    Accountant      80,000
 3 Jim Johnson NULL          Office Manager  40,000
 4 Tim Smith   1899-04-11    Janitor         95,000
 5 Jim Johnson NULL          Office Manager  40,000

这不是我想要的,但我不能说我完全不同意发生的事情.如果我们谈论数学集合,

This is not what I want but I can't say I entirely disagree with what happened. If we talk in terms of mathematical sets,

{'Tim Smith', '1899-04-11'} = {'Tim Smith', '1899-04-11'} <-- TRUE
{'Tim Smith', '1899-04-11'} = {'Jane Doe', '1982-05-05'} <-- FALSE
{'Tim Smith', '1899-04-11'} = {'Jim Johnson', NULL} <-- UNKNOWN
{'Jim Johnson', NULL} = {'Jim Johnson', NULL} <-- UNKNOWN

我的猜测是 MySQL 说,由于我不知道出生日期为 NULL 的 Jim Johnson 不在此表中,我将加他."

My guess is that MySQL says, "Since I don't know that Jim Johnson with a NULL birth date isn't already in this table, I'll add him."

我的问题是:即使 date_of_birth 并不总是已知,我如何防止重复? 到目前为止,我想出的最好方法是移动 date_of_birth 到不同的表.然而,这样做的问题是,我可能最终会遇到两个具有相同姓名、职位和薪水、不同出生日期的收银员,并且无法在没有重复的情况下存储它们.

My question is: How can I prevent duplicates even though date_of_birth is not always known? The best I've come up with so far is to move date_of_birth to a different table. The problem with that, however, is that I might end up with, say, two cashiers with the same name, title and salary, different birth dates and no way to store them both without having duplicates.

推荐答案

唯一键的一个基本属性是它必须是唯一的.使该键成为 Nullable 的一部分会破坏此属性.

A fundamental property of a unique key is that it must be unique. Making part of that key Nullable destroys this property.

您的问题有两种可能的解决方案:

There are two possible solutions to your problem:

  • 一种错误的方法是使用一些魔法日期来表示未知.这只是让你过去DBMS问题",但没有解决逻辑意义上的问题.预计日期未知的两个约翰史密斯"条目会出现问题出生的.这些人是同一个人还是独一无二的个体?如果你知道它们是不同的,那么你又回到了同样的老问题——您的唯一键不是唯一的.甚至不要考虑分配整个范围的魔法日期代表未知"——这才是真正的地狱之路.

  • One way, the wrong way, would be to use some magic date to represent unknown. This just gets you past the DBMS "problem" but does not solve the problem in a logical sense. Expect problems with two "John Smith" entries having unknown dates of birth. Are these guys one and the same or are they unique individuals? If you know they are different then you are back to the same old problem - your Unique Key just isn't unique. Don't even think about assigning a whole range of magic dates to represent "unknown" - this is truly the road to hell.

更好的方法是创建一个 EmployeeId 属性作为代理键.这只是一个您分配给您知道是唯一的个人的任意标识符.这标识符通常只是一个整数值.然后创建一个 Employee 表来关联 EmployeeId(唯一的,不可为空的关键)你认为是依赖属性的,在这种情况下姓名和出生日期(其中任何一项都可以为空).在任何地方使用 EmployeeId 代理键以前使用姓名/出生日期.这会向您的系统添加一个新表,但是以稳健的方式解决未知值的问题.

A better way is to create an EmployeeId attribute as a surrogate key. This is just an arbitrary identifier that you assign to individuals that you know are unique. This identifier is often just an integer value. Then create an Employee table to relate the EmployeeId (unique, non-nullable key) to what you believe are the dependant attributers, in this case Name and Date of Birth (any of which may be nullable). Use the EmployeeId surrogate key everywhere that you previously used the Name/Date-of-Birth. This adds a new table to your system but solves the problem of unknown values in a robust manner.

相关文章