SQL:选择同一表中行不符合条件的事务

2022-01-23 00:00:00 sql subquery mysql

我有一张交易表:

Transactions
------------
id | account | type | date_time             | amount
----------------------------------------------------
 1 | 001     | 'R'  | '2012-01-01 10:01:00' | 1000
 2 | 003     | 'R'  | '2012-01-02 12:53:10' | 1500
 3 | 003     | 'A'  | '2012-01-03 13:10:01' | -1500
 4 | 002     | 'R'  | '2012-01-03 17:56:00' | 2000
 5 | 001     | 'R'  | '2012-01-04 12:30:01' | 1000
 6 | 002     | 'A'  | '2012-01-04 13:23:01' | -2000
 7 | 003     | 'R'  | '2012-01-04 15:13:10' | 3000
 8 | 003     | 'R'  | '2012-01-05 12:12:00' | 1250
 9 | 003     | 'A'  | '2012-01-06 17:24:01' | -1250

并且我希望选择所有特定类型('R'),但不是那些立即(按 date_time 字段的顺序)为同一帐户提交的另一个类型的交易('A')...

and I wish to select all of certain type ('R'), but not those that immediatly (in order of the date_time field) have another transaction of another type ('A') for the same account filed...

因此,在前面的示例中,查询应该抛出以下行:

So, the query should throw the following rows, given the previous example:

id | account |type  | date                  | amount
----------------------------------------------------
 1 | 001     | 'R'  | '2012-01-01 10:01:00' | 1000
 5 | 001     | 'R'  | '2012-01-04 12:30:01' | 1000
 7 | 003     | 'R'  | '2012-01-04 15:13:10' | 3000

(如您所见,第 2 行未显示,因为第 3 行取消"它...第 4 行也被第 6 行取消";第 7 行确实出现(即使帐户 003 属于已取消第 2 行,这次在第 7 行,它没有被任何A"行取消);并且第 8 行不会出现(对于 003 帐户也是如此,因为现在这个被 9 取消了,它也不会取消 7,只是上一个:8...

(As you can see, row 2 isn't displayed because row 3 'cancels' it... also row 4 is 'cancelled' by row 6'; Row 7 do appears (even though the account 003 belongs to cancelled row #2, this time in row 7 it's not cancelled by any 'A' row); And row 8 won't appear (it's too for 003 account since now this one is cancelled by 9, which doesn't cancels 7 too, just the previouse one: 8...

我在 Where 子句中尝试过联接、子查询,但我真的不确定我必须如何进行查询...

I have tried Joins, subqueries in Where clauses but I'm really not sure how do I must make my query...

我尝试过的:

尝试加入:

   SELECT trans.type as type,
          trans.amount as amount,
          trans.date_time as dt,
          trans.account as acct,
     FROM Transactions trans
INNER JOIN ( SELECT t.type AS type, t.acct AS acct, t.date_time AS date_time
               FROM Transactions t
              WHERE t.date_time > trans.date_time
           ORDER BY t.date_time DESC
          ) AS nextTrans
       ON nextTrans.acct = trans.acct
    WHERE trans.type IN ('R')
      AND nextTrans.type NOT IN ('A')
 ORDER BY DATE(trans.date_time) ASC

这会引发错误,因为我无法将外部值引入 MySQL 中的 JOIN.

This throws an error, since I can't introduce external values to the JOIN in MySQL.

在 where 中尝试子查询:

Trying subquery in where:

   SELECT trans.type as type,
          trans.amount as amount,
          trans.date_time as dt,
          trans.account as acct,
     FROM Transactions trans
    WHERE trans.type IN ('R')
      AND trans.datetime <
          ( SELECT t.date_time AS date_time
               FROM Transactions t
              WHERE t.account = trans.account
           ORDER BY t.date_time DESC
          ) AS nextTrans
       ON nextTrans.acct = trans.acct

 ORDER BY DATE(trans.date_time) ASC

这是错误的,我可以将外部值引入 MySQL 中的 WHERE,但我无法找到正确过滤所需内容的方法...

This is wrong, I can get to introduce external values to the WHERE in MySQL but I cannot manage to find the way to filter correctly for what I need...

重要

我设法找到了一个解决方案,但现在需要认真优化.这里是:

I managed to achieve a solution, but it now needs serious optimization. Here it is:

SELECT *
  FROM (SELECT t1.*, tFlagged.id AS cancId, tFlagged.type AS cancFlag
          FROM transactions t1
     LEFT JOIN (SELECT t2.*
                  FROM transactions t2
              ORDER BY t2.date_time ASC ) tFlagged
            ON (t1.account=tFlagged.account
                  AND
                t1.date_time < tFlagged.date_time)
         WHERE t1.type = 'R'
      GROUP BY t1.id) tCanc
 WHERE tCanc.cancFlag IS NULL
    OR tCanc.cancFlag <> 'A'

我自己加入了表格,只是考虑了相同的帐户和很棒的 date_time.加入按 date_time 排序.按 id 分组我设法只获得了加入的第一个结果,这恰好是同一帐户的下一个交易.

I joined the table with itself, just considering same account and great date_time. The Join goes ordered by date_time. Grouping by id I managed to get only the first result of the join, which happens to be the next transaction for the same account.

然后在外部选择中,我过滤掉那些具有A"的,因为这意味着下一个事务实际上是取消它.换句话说,如果同一个账户没有下一笔交易,或者下一笔交易是'R',那么它不会被取消,它必须显示在结果中......

Then on the outer select, I filter out those that have an 'A', since that means that the next transaction was effectively a cancelation for it. In other words, if there is no next transaction for the same account or if the next transaction is an 'R', then it is not cancelled and it must be shown in the result...

我知道了:

+----+---------+------+---------------------+--------+--------+----------+
| id | account | type | date_time           | amount | cancId | cancFlag |
+----+---------+------+---------------------+--------+--------+----------+
|  1 | 001     |   R  | 2012-01-01 10:01:00 |   1000 |      5 | R        |
|  5 | 001     |   R  | 2012-01-04 12:30:01 |   1000 |   NULL | NULL     |
|  7 | 003     |   R  | 2012-01-04 15:13:10 |   3000 |      8 | R        |
+----+---------+------+---------------------+--------+--------+----------+

它将同一帐户的每笔交易及时与下一笔交易相关联,然后过滤掉那些已取消的交易......成功!!

It relates each transaction with the next one in time for the same account and then filters out those that have been cancelled... Success!!

正如我所说,现在的问题是优化.我的真实数据有很多行(作为一个通过时间保存事务的表,预计会有),并且对于现在约 10,000 行的表,我在 1 分 44 秒内得到了这个查询的肯定结果.我想这就是加入的问题......(对于那些知道这里的协议的人,我该怎么办?在这里发起一个新问题并将其作为解决方案发布?或者只是在这里等待更多答案?)

As I said, the problem now is optimization. My real data has a lot of rows (as a table holding transactions through time is expected to have), and for a table of ~10,000 rows right now, I got a positive result with this query in 1min.44sec. I suppose that's the thing with joins... (For those who know the protocol in here, what should I do? launch a new question here and post this as a solution to this one? Or just wait for more answers here?)

推荐答案

这里有一个基于嵌套子查询的解决方案.首先,我添加了几行以捕获更多案例.例如,事务 10 不应被事务 12 取消,因为事务 11 介于两者之间.

Here is a solution based on nested subqueries. First, I added a few rows to catch a few more cases. Transaction 10, for example, should not be cancelled by transaction 12, because transaction 11 comes in between.

> select * from transactions order by date_time;
+----+---------+------+---------------------+--------+
| id | account | type | date_time           | amount |
+----+---------+------+---------------------+--------+
|  1 |       1 | R    | 2012-01-01 10:01:00 |   1000 |
|  2 |       3 | R    | 2012-01-02 12:53:10 |   1500 |
|  3 |       3 | A    | 2012-01-03 13:10:01 |  -1500 |
|  4 |       2 | R    | 2012-01-03 17:56:00 |   2000 |
|  5 |       1 | R    | 2012-01-04 12:30:01 |   1000 |
|  6 |       2 | A    | 2012-01-04 13:23:01 |  -2000 |
|  7 |       3 | R    | 2012-01-04 15:13:10 |   3000 |
|  8 |       3 | R    | 2012-01-05 12:12:00 |   1250 |
|  9 |       3 | A    | 2012-01-06 17:24:01 |  -1250 |
| 10 |       3 | R    | 2012-01-07 00:00:00 |   1250 |
| 11 |       3 | R    | 2012-01-07 05:00:00 |   4000 |
| 12 |       3 | A    | 2012-01-08 00:00:00 |  -1250 |
| 14 |       2 | R    | 2012-01-09 00:00:00 |   2000 |
| 13 |       3 | A    | 2012-01-10 00:00:00 |  -1500 |
| 15 |       2 | A    | 2012-01-11 04:00:00 |  -2000 |
| 16 |       2 | R    | 2012-01-12 00:00:00 |   5000 |
+----+---------+------+---------------------+--------+
16 rows in set (0.00 sec)

首先,为每笔交易创建一个查询,以获取同一账户中最近一次交易的日期":

First, create a query to grab, for each transaction, "the date of the most recent transaction before that one in the same account":

SELECT t2.*,
       MAX(t1.date_time) AS prev_date
FROM transactions t1
JOIN transactions t2
ON (t1.account = t2.account
   AND t2.date_time > t1.date_time)
GROUP BY t2.account,t2.date_time
ORDER BY t2.date_time;

+----+---------+------+---------------------+--------+---------------------+
| id | account | type | date_time           | amount | prev_date           |
+----+---------+------+---------------------+--------+---------------------+
|  3 |       3 | A    | 2012-01-03 13:10:01 |  -1500 | 2012-01-02 12:53:10 |
|  5 |       1 | R    | 2012-01-04 12:30:01 |   1000 | 2012-01-01 10:01:00 |
|  6 |       2 | A    | 2012-01-04 13:23:01 |  -2000 | 2012-01-03 17:56:00 |
|  7 |       3 | R    | 2012-01-04 15:13:10 |   3000 | 2012-01-03 13:10:01 |
|  8 |       3 | R    | 2012-01-05 12:12:00 |   1250 | 2012-01-04 15:13:10 |
|  9 |       3 | A    | 2012-01-06 17:24:01 |  -1250 | 2012-01-05 12:12:00 |
| 10 |       3 | R    | 2012-01-07 00:00:00 |   1250 | 2012-01-06 17:24:01 |
| 11 |       3 | R    | 2012-01-07 05:00:00 |   4000 | 2012-01-07 00:00:00 |
| 12 |       3 | A    | 2012-01-08 00:00:00 |  -1250 | 2012-01-07 05:00:00 |
| 14 |       2 | R    | 2012-01-09 00:00:00 |   2000 | 2012-01-04 13:23:01 |
| 13 |       3 | A    | 2012-01-10 00:00:00 |  -1500 | 2012-01-08 00:00:00 |
| 15 |       2 | A    | 2012-01-11 04:00:00 |  -2000 | 2012-01-09 00:00:00 |
| 16 |       2 | R    | 2012-01-12 00:00:00 |   5000 | 2012-01-11 04:00:00 |
+----+---------+------+---------------------+--------+---------------------+
13 rows in set (0.00 sec)

将其用作子查询以获取同一行中的每个事务及其前身.使用一些过滤来提取我们感兴趣的交易 - 即A"交易,其前身是R"交易,它们完全取消了 -

Use that as a subquery to get each transaction and its predecessor on the same row. Use some filtering to pull out the transactions we're interested in - namely, 'A' transactions whose predecessors are 'R' transactions that they exactly cancel out -

SELECT
  t3.*,transactions.*
FROM
  transactions
  JOIN
  (SELECT t2.*,
          MAX(t1.date_time) AS prev_date
   FROM transactions t1
   JOIN transactions t2
   ON (t1.account = t2.account
      AND t2.date_time > t1.date_time)
   GROUP BY t2.account,t2.date_time) t3
  ON t3.account = transactions.account
     AND t3.prev_date = transactions.date_time
     AND t3.type='A'
     AND transactions.type='R'
     AND t3.amount + transactions.amount = 0
  ORDER BY t3.date_time;


+----+---------+------+---------------------+--------+---------------------+----+---------+------+---------------------+--------+
| id | account | type | date_time           | amount | prev_date           | id | account | type | date_time           | amount |
+----+---------+------+---------------------+--------+---------------------+----+---------+------+---------------------+--------+
|  3 |       3 | A    | 2012-01-03 13:10:01 |  -1500 | 2012-01-02 12:53:10 |  2 |       3 | R    | 2012-01-02 12:53:10 |   1500 |
|  6 |       2 | A    | 2012-01-04 13:23:01 |  -2000 | 2012-01-03 17:56:00 |  4 |       2 | R    | 2012-01-03 17:56:00 |   2000 |
|  9 |       3 | A    | 2012-01-06 17:24:01 |  -1250 | 2012-01-05 12:12:00 |  8 |       3 | R    | 2012-01-05 12:12:00 |   1250 |
| 15 |       2 | A    | 2012-01-11 04:00:00 |  -2000 | 2012-01-09 00:00:00 | 14 |       2 | R    | 2012-01-09 00:00:00 |   2000 |
+----+---------+------+---------------------+--------+---------------------+----+---------+------+---------------------+--------+
4 rows in set (0.00 sec)

从上面的结果可以看出,我们快到了 - 我们已经识别出不需要的交易.使用 LEFT JOIN 我们可以将它们从整个事务集中过滤掉:

From the result above it's apparent we're almost there - we've identified the unwanted transactions. Using LEFT JOIN we can filter these out of the whole transaction set:

SELECT
  transactions.*
FROM
  transactions
LEFT JOIN
  (SELECT
     transactions.id
   FROM
     transactions
     JOIN
     (SELECT t2.*,
             MAX(t1.date_time) AS prev_date
      FROM transactions t1
      JOIN transactions t2
      ON (t1.account = t2.account
         AND t2.date_time > t1.date_time)
      GROUP BY t2.account,t2.date_time) t3
     ON t3.account = transactions.account
        AND t3.prev_date = transactions.date_time
        AND t3.type='A'
        AND transactions.type='R'
        AND t3.amount + transactions.amount = 0) t4
  USING(id)
  WHERE t4.id IS NULL
    AND transactions.type = 'R'
  ORDER BY transactions.date_time;

+----+---------+------+---------------------+--------+
| id | account | type | date_time           | amount |
+----+---------+------+---------------------+--------+
|  1 |       1 | R    | 2012-01-01 10:01:00 |   1000 |
|  5 |       1 | R    | 2012-01-04 12:30:01 |   1000 |
|  7 |       3 | R    | 2012-01-04 15:13:10 |   3000 |
| 10 |       3 | R    | 2012-01-07 00:00:00 |   1250 |
| 11 |       3 | R    | 2012-01-07 05:00:00 |   4000 |
| 16 |       2 | R    | 2012-01-12 00:00:00 |   5000 |
+----+---------+------+---------------------+--------+

相关文章