SQL join 以下外键:静态检查 LHS 是否保留键

通常您按照外键连接两个表,以便始终可以找到 RHS 表中的行.添加连接不会影响受查询影响的行数.例如

Often you join two tables following their foreign key, so that the row in the RHS table will always be found. Adding the join does not affect the number of rows affected by the query. For example

create table a (x int not null primary key)
create table b (x int not null primary key, y int not null)
alter table a add foreign key (x) references b (x)

现在,假设你在这两个表中设置了一些数据,你可以从a中获取一定数量的行:

Now, assuming you set up some data in these two tables, you can get a certain number of rows from a:

select x from a

在外键之后添加到 b 的连接不会改变这一点:

Adding a join to b following the foreign key does not change this:

select a.x from a join b on a.x = b.x

但是,通常情况下的连接并非如此,它可能会过滤掉一些行或(通过笛卡尔积)添加更多行:

However, that is not true of joins in general, which may filter out some rows or (by Cartesian product) add more:

select a.x from a join b on a.x = b.x and b.y != 42 -- probably gives fewer rows

select a.x from a join b on a.x != b.y -- probably gives more rows

在阅读 SQL 代码时,没有明显的方法来判断 join 是否是保留键的类型,它可能会增加额外的列但不会改变返回的行数,或者它是否有其他效果.随着时间的推移,我制定了一个我主要坚持的编码约定:

When reading SQL code there is no obvious way to tell whether a join is the key-preserving kind, which may add extra columns but does not change the number of rows returned, or whether it has other effects. Over time I have developed a coding convention which I mostly stick to:

  • 如果是保留密钥的连接,请使用 join
  • 如果要过滤行,将过滤条件放在where子句中
  • 如果想要更多的行,有时笛卡尔积的cross join 是最清晰的方法
  • if a key-preserving join, use join
  • if wanting to filter rows, put the filter condition in the where clause
  • if wanting more rows, sometimes cross join for Cartesian product is the clearest way

这些通常只是样式问题,因为您通常可以将谓词放入 join 子句或 where 子句中,例如.

These are usually just style issues, since you can often put a predicate into either the join clause or the where clause, for example.

我的问题

在编译查询时,是否有某种方法可以让数据库服务器静态检查这些保留键的连接?我知道查询优化器已经知道外键上的连接将始终在外键指向的表中找到一行.但为了人类读者的利益,我想在我的 SQL 代码中标记它.例如,假设新语法 fkjoin 用于外键后的连接.那么下面的SQL片段是否会报错:

Is there some way to have these key-preserving joins statically checked by the database server when the query is compiled? I understand that the query optimizer already knows that a join on a foreign key will always find exactly one row in the table pointed to by the foreign key. But I would like to tag it in my SQL code for the benefit of human readers. For example, suppose the new syntax fkjoin is used for a join following a foreign key. Then the following SQL fragments will give errors or not:

a fkjoin b on a.x = b.x -- OK

a fkjoin b on a.x = b.x and b.y = 42 -- "Error, join can fail due to extra predicate"

a fkjoin b on a.x = b.y -- "Error, no foreign key from a.x to b.y"

这对我在编写 SQL 以及稍后返回阅读时非常有用.我理解并接受更改数据库中的外键会改变在这个方案下什么 SQL 是合法的 - 对我来说,这是一个理想的结果,因为如果一个必要的 FK 不再存在,那么查询的键保留语义是不存在的更长时间的保证,我想了解一下.

This would be a useful check for me when writing the SQL, and also when returning to read it later. I understand and accept that changing the foreign keys in the database would change what SQL is legal under this scheme - to me, that is a desired outcome, since if a necessary FK ceases to exist then the key-preserving semantics of the query are no longer guaranteed, and I'd like to find out about it.

可能有一些外部 SQL 静态检查器工具可以完成这项工作,并且可以使用特殊的注释语法而不是新关键字.检查器工具需要访问数据库模式以查看存在哪些外键,但它不需要实际执行查询.

Potentially, there could be some external SQL static checker tool that does the work, and special comment syntax could be used rather than a new keyword. The checker tool would need access to the database schema to see what foreign keys exist, but it would not need to actually execute the query.

有什么可以做我想要的吗?我正在使用 MSSQL 2008 R2.(学究的微软 SQL Server)

Is there something that does what I want? I am using MSSQL 2008 R2. (Microsoft SQL Server for the pedantic)

推荐答案

我意识到您有兴趣指出特定列上的特定连接是在 FK 上,还是限制,或者可能是其他情况,或以上都没有.(并且不清楚您所说的连接成功"或失败"是什么意思,或其相关性.)而专注于这些信息,如下所述,就是忽略了对更重要和基本的事情的关注.

I realize that you are interested in indicating whether a particular join on particular columns is on a FK, or is a restriction, or perhaps is of some other case, or none of the preceding. (And it's not clear what you mean by "success" or "failure" of a join, or its relevance.) Whereas focusing on that information, as explained below, is to miss focusing on more important and fundamental things.

基表具有含义"或谓词(表达式)",它是 DBA 给出的填充(命名)空白语句.语句的空格的名称是表的列.填空以提出关于世界的真实命题的行列在表中.填补空白以对世界做出错误主张的行被排除在外.即一个表包含满足其语句的行.在不了解基表的语句、观察世界并将适当的行放入表中的情况下,您无法将基表设置为某个值.你不能从基表中了解世界,除非知道它的陈述并将当前行命题视为真,缺席行命题为假.即你需要它的语句才能使用数据库.

A base table has a "meaning" or "predicate (expression)" that is a fill-in-the-(named-)blanks statement given by the DBA. The names of the blanks of the statement are the columns of the table. Rows that fill in the blanks to make a true proposition about the world go in the table. Rows that fill in the blanks to make a false proposition about the world are left out. Ie a table holds the rows that satisfy its statement. You cannot set a base table to a certain value without knowing its statement, observing the world and putting the appropriate rows into the table. You cannot know about the world from base tables except by knowing its statement and taking present-row propositions to be true and absent-row propositions to be false. Ie you need its statement to use the database.

请注意,表声明的典型语法看起来像是其语句的简写:

Notice that the typical syntax for a table declaration looks like a shorthand for its statement:

-- employee [eid] is named [name] and lives at [address] in ...
EMPLOYEE(eid,name,address,...)

您可以通过在其他语句之间/周围放置逻辑运算符 AND、OR、AND NOT、EXISTS name、AND condition 等来制作更大的语句.如果通过转换将语句转换为关系/SQL 表达式

You can make bigger statements by putting logic operators AND, OR, AND NOT, EXISTS name, AND condition, etc between/around other statements. If you translate a statement to a relation/SQL expression by converting

  • 表对其名称的声明
  • AND 到 JOIN
  • OR 到 UNION
  • AND NOT EXCEPT/MINUS
  • EXISTS C,... [...] 到 SELECT 除 C,... FROM ...
  • AND condition 到 ON/WHERE condition
  • 暗示 SUBSETOF
  • IFF 到 =
  • a table's statement to its name
  • AND to JOIN
  • OR to UNION
  • AND NOT to EXCEPT/MINUS
  • EXISTS C,... [...] to SELECT all columns but C,... FROM ...
  • AND condition to ON/WHERE condition
  • IMPLIES to SUBSETOF
  • IFF to =

然后你会得到一个关系表达式,它计算使语句为真的行.(UNION & EXCEPT/MINUS 的参数需要相同的列.)所以就像每个表都保存满足其语句的行一样,查询表达式保存满足其语句的行.你不能从查询结果中了解世界,除非知道它的陈述并将其当前行命题视为真,将缺席行命题视为假.即您需要它的语句来编写或解释查询.(请注意,无论约束条件如何,这都是正确的.)

then you get a relation expression that calculates the rows that make the statement true. (Arguments of UNION & EXCEPT/MINUS need the same columns.) So just as every table holds the rows satisfying its statement, a query expression holds the rows that satisfy its statement. You cannot know about the world from a query result except by knowing its statement and taking its present-row propositions to be true and absent-row propositions to be false. Ie you need its statement to compose or interpret a query. (Observe that this is true regardless of what constraints hold.)

这是关系模型的基础:表表达式计算满足相应语句的行.(就SQL的不同而言,它实际上是不合逻辑的.)

This is the foundation of the relational model: table expressions calculate rows satisfying corresponding statements. (To the extent that SQL differs, it is literally illogical.)

例如:如果表 T 包含使语句 T(...,T.Ci,...) 为真的行,并且表 U 包含使语句 T(...,T.Ci,...) 为真的行make statement U(...,U.Cj,...) true then table T JOIN U 包含 make statement T(...,T.Ci,...) AND 的行U(...,U.Cj,...) 真.这就是对使用数据库很重要的 JOIN 语义.你总是可以连接,连接总是有意义的,它总是其操作数意义的与.是否有任何表碰巧对其他表具有 FK,对于推理更新或查询并不是特别有帮助.(当您犯错误时,DBMS 会使用约束.)

Eg: If table T holds the rows that make statement T(...,T.Ci,...) true and table U holds the rows that make statement U(...,U.Cj,...) true then table T JOIN U holds the rows that make statement T(...,T.Ci,...) AND U(...,U.Cj,...) true. That is the semantics of JOIN that is important to using a database. You can always join, and a join always has a meaning, and it is always the AND of its operands' meanings. Whether any tables happen to have FKs to others just isn't particularly helpful for reasoning about updates or queries. (The DBMS uses constraints for when you make mistakes.)

约束表达式仅对应于一个命题,即关于世界的始终为真的语句,同时对应于一个关于基表的命题.例如对于C UNIQUE NOT NULL in U,下面三个表达式是等价的:

A constraint expression just corresponds to a proposition aka always-true statement about the world and simultaneusly to one about base tables. Eg for C UNIQUE NOT NULL in U, the following three expressions are equivalent to each other:

  • 外键 T (C) 引用 U (C)
  •    EXISTS C 以外的列 T(...,C,...)
    IMPLIES EXISTS C 以外的列 U(...,C,...)
  • (SELECT C FROM T) SUBSETOF (SELECT C FROM U)

确实,这意味着 SELECT C FROM T JOIN U ON TC = UC = SELECT C FROM U,即 FK 上的连接返回相同的数字行.但那又怎样?连接的含义仍然是其参数的相同功能.

It is true that this implies that SELECT C FROM T JOIN U ON T.C = U.C = SELECT C FROM U, ie a join on a FK returns the same number of rows. But so what? The join's meaning is still the same function of its arguments'.

特定列集上的特定联接是否涉及外键与理解查询的含义无关.

Whether a particular join on a particular column set involves a foreign key is just not germane to understanding the meaning of a query.

相关文章