比较所有列的 MariaDB/MySQL 的替代方案

2022-01-15 00:00:00 function mariadb mysql stored-procedures except

我知道 MariaDB 和 MySQL 不支持 EXCEPT.我想找到这样的替代方案:

I know MariaDB and MySQL don't support EXCEPT. I would like to find an alternative to something like this:

SELECT * FROM table EXCEPT SELECT * FROM backup_table

其中 table 和 backup_table 具有相同的架构.

Where the table and backup_table have the same schema.

我看到的所有帖子都建议我使用WHERE column IN (...)"比较单个列.我的问题是我需要为每个表比较两个表之间的所有列.我希望将其编写为遍历所有表的过程或函数，以查找数据库中的任何更改.基本上，我想找出所有表中已更新或插入的所有记录.

All the posts I've seen suggests that I compare a single column using "WHERE column IN (...)". The problem in my case is that I need to compare all the columns between the two tables for every table. I'm hoping to write this as procedure or function looping through all the tables, looking for any changes in the database. Basically, I want to find out all the records that have been updated or inserted in all my tables.

推荐答案

如果我面临这个任务，我会使用反连接模式.这是一个外连接，用于返回当前表中的所有行以及备份表中的匹配"行.然后在 WHERE 子句中，我们排除所有完全匹配的行.返回不匹配的行.

If I was faced with that task, I'd use an anti-join pattern. That's an outer join, to return all rows from the current table, along with "matching" rows from the backup table. Then in the WHERE clause, we exclude all rows that had an exact match. Returning rows that don't match.

SELECT t.* FROM mytable t LEFT JOIN backup_mytable s ON s.id <=> t.id AND s.col_two <=> t.col_two AND s.col_three <=> t.col_three AND ... WHERE s.id IS NULL

这假定列 id 保证为非 NULL.PRIMARY KEY 列(或作为表的 PRIMARY KEY 一部分的任何列，或具有 NOT NULL 约束的任何列都可以使用.)

This assumes that the column id is guaranteed to be non-NULL. The PRIMARY KEY column (or any column that is part of the PRIMARY KEY of the table, or any column that has a NOT NULL constraint would serve.)

此查询仅返回与备份表中的行不匹配的行.它不指示它的行是否不存在，或者列的值是否已更改.

This query only returns the rows that don't match a row in backup table. It doesn't indicate whether its row that doesn't exist, or whether a value of a column was changed.

要获取原始表中与备份表中的行不匹配的行，只需交换表名即可.

And to get rows in the original table that don't match rows in the backup table, just swap the table names.

对于所有列都定义为 NOT NULL 的表的特殊情况，我们可以在连接谓词上采用快捷方式.

For the special case of a table with all columns defined as NOT NULL, we could take a shortcut on the join predicates.

FROM mytable t NATURAL LEFT JOIN backup_mytable s WHERE s.id IS NULL

这相当于一个 LEFT JOIN 与一个 USING 子句的所有列在两个表中命名相同.

That's equivalent to a LEFT JOIN with a USING clause of all columns that are named the same in both tables.

FROM mytable t LEFT JOIN backup_mytable s USING (id, col_two, col_three, ...) WHERE s.id IS NULL

这相当于在每一列上指定一个相等比较(如果两个表具有相同的列)

That's equivalent to specifying an equality comparison on every column (if both tables have the same columns)

FROM mytable t LEFT JOIN backup_mytable s ON s.id = t.id AND s.col_two = t.col_two AND s.col_three = t.col_three

任何列中出现的任何 NULL 值都会与相等比较发生冲突，并返回 NULL.

Any occurrences of NULL values in any of the columns are going to screw with the equality comparison, and return NULL.

这就是为什么第一个查询使用空安全比较 <=>(宇宙飞船)运算符.NULL <=>NULL 将返回 TRUE，其中 NULL = NULL 将返回 NULL.

And that's why the first query uses the null-safe comparison <=> (spaceship) operator. NULL <=> NULL will return TRUE, where NULL = NULL will return NULL.

对于第一个查询模式，我会使用 SQL 来帮助我生成所需的 SQL，而不是繁琐地输入每一列的所有比较.

For that first query pattern, rather than tediously typing out all of those comparisons of every column, I would use SQL to help me generate the SQL I need.

SELECT CONCAT(' AND s.`',c.column_name,'` <=> t.`',c.column_name,'`') AS `-- stmt` FROM information_schema.columns c WHERE c.table_schema = 'mydatabase' AND c.table_name = 'mytable' ORDER BY c.ordinal_position

我会获取该查询返回的行，并将其粘贴到

I'd take the rows returned by that query, and paste that in

SELECT t.* FROM ... t JOIN ... s ON 1=1 -- paste here -- WHERE s.id IS NULL ORDER BY t.id

<小时>
如果我需要只匹配 id 列的查询，并且需要识别哪些列发生了变化，我会在 SELECT 列表中使用表达式.例如:

If I needed query that matched on just the id column, and needed to identify which columns had changed, I'd use expressions in the SELECT list. For example:

SELECT s.`id` <=> t.`id` AS `match_id` , s.`col_one` <=> t.`col_one` AS `match_col_one` , s.`col_three` <=> t.`col_three` AS `match_col_three` FROM mytable t JOIN backup_mytable s ON s.id = t.id HAVING NOT match_col_one

这里在 HAVING 子句中引用 SELECT 列表中的列别名，以排除具有相同 col_one 值的行；返回 col_one 不同的行.

Here referencing the column alias in the SELECT list in a HAVING clause, to exclude rows that have the same value of col_one; returning rows where col_one is different.

再次，我将针对 information_schema.columns 使用 SQL 来帮助加快查询编写过程.

Again, I would use SQL against information_schema.columns to help speed up the query writing process.

相关文章