MySQL GROUP BY DateTime +/- 3 seconds

2021-12-26 00:00:00 sql group-by mysql datetime-generation

Suppose I have a table with 3 columns:

id (PK, int)
Timestamp (datetime)
Title (text)

I have the following records:

1, 2010-01-01 15:00:00, Some Title
2, 2010-01-01 15:00:02, Some Title
3, 2010-01-02 15:00:00, Some Title

I need to GROUP BY records that are within 3 seconds of each other. For this table, rows 1 and 2 would be grouped together.

There is a similar question here: Mysql DateTime group by 15 mins

I also found this: http://www.artfulsoftware.com/infotree/queries.php#106

I don't know how to convert these methods into something that will work for seconds. The trouble with the method on the SO question is that it seems to me that it would only work for records falling within a bin of time that starts at a known point. For instance, if I were to get FLOOR() to work with seconds, at an interval of 5 seconds, a time of 15:00:04 would be grouped with 15:00:01, but not grouped with 15:00:06.
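
To make the binning problem concrete, here is a minimal Python sketch. The helper `fixed_bin` is hypothetical (it is not from the linked question); it assumes 5-second bins aligned to the Unix epoch, roughly what `FLOOR(UNIX_TIMESTAMP(ts) / 5)` would compute, and treats the naive datetimes as UTC.

```python
from datetime import datetime, timezone

def fixed_bin(ts: datetime, width_s: int = 5) -> int:
    """Index of the fixed-width bin holding ts, with bins aligned to the Unix epoch."""
    # Assumption: naive datetimes are interpreted as UTC.
    return int(ts.replace(tzinfo=timezone.utc).timestamp()) // width_s

t01 = datetime(2010, 1, 1, 15, 0, 1)
t04 = datetime(2010, 1, 1, 15, 0, 4)
t06 = datetime(2010, 1, 1, 15, 0, 6)

print(fixed_bin(t01) == fixed_bin(t04))  # True: both fall in the 15:00:00-15:00:04 bin
print(fixed_bin(t04) == fixed_bin(t06))  # False: 15:00:06 falls in the next bin
```

So 15:00:04 and 15:00:06 end up in different groups even though they are only 2 seconds apart.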

Does this make sense? Please let me know if further clarification is needed.

For the set of numbers {1, 2, 3, 4, 5, 6, 7, 50, 51, 60}, it seems it might be best to group them as {1, 2, 3, 4, 5, 6, 7}, {50, 51}, {60}, so that each row's group depends on whether the row is within 3 seconds of the previous one. I know this changes things a bit; I'm sorry for being wishy-washy on this.
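
The chaining rule described here can be sketched in a few lines of Python. The function name `chain_groups` and the `max_gap` parameter are made up for illustration; the sketch assumes a value joins the current group when it is at most 3 above the previous value.

```python
def chain_groups(values, max_gap=3):
    """Split values into chains where each value is within max_gap of the previous one."""
    groups = []
    for v in sorted(values):
        if groups and v - groups[-1][-1] <= max_gap:
            groups[-1].append(v)   # close enough to the previous value: extend the chain
        else:
            groups.append([v])     # gap too large (or first value): start a new chain
    return groups

print(chain_groups([1, 2, 3, 4, 5, 6, 7, 50, 51, 60]))
# → [[1, 2, 3, 4, 5, 6, 7], [50, 51], [60]]
```

Note that a chain can span much more than 3 seconds in total, as long as no single gap exceeds 3.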

I am trying to fuzzy-match logs from different servers. Server #1 may log an item, "Item #1", and Server #2 will log that same item, "Item #1", within a few seconds of server #1. I need to do some aggregate functions on both log lines. Unfortunately, I only have title to go on, due to the nature of the server software.

Recommended answer

I'm using Tom H.'s excellent idea but doing it a little differently here:

Instead of finding all the rows that are the beginnings of chains, we can find all times that are the beginnings of chains, then go back and find the rows that match the times.

Query #1 here should tell you which times are the beginnings of chains by finding the times that have no earlier time within the preceding 3 seconds:

-- "Table" is a placeholder; substitute your real table name
SELECT DISTINCT a.Timestamp
FROM Table a
LEFT JOIN Table b
  ON b.Timestamp >= a.Timestamp - INTERVAL 3 SECOND
 AND b.Timestamp < a.Timestamp
WHERE b.Timestamp IS NULL
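
For checking the logic outside the database, the same anti-join idea can be sketched in Python. The function `chain_starts` is a hypothetical helper, not part of the answer; it keeps a timestamp only if no other timestamp lies in the 3 seconds before it.

```python
from datetime import datetime, timedelta

def chain_starts(timestamps, window=timedelta(seconds=3)):
    """Return the timestamps that have no earlier timestamp within `window` before them."""
    distinct = sorted(set(timestamps))
    return [
        a for a in distinct
        # Mirrors: b.Timestamp >= a.Timestamp - 3s AND b.Timestamp < a.Timestamp
        if not any(a - window <= b < a for b in distinct)
    ]

sample = [
    datetime(2010, 1, 1, 15, 0, 0),
    datetime(2010, 1, 1, 15, 0, 2),
    datetime(2010, 1, 2, 15, 0, 0),
]
starts = chain_starts(sample)
print(starts)  # the first and third timestamps; 15:00:02 is inside the first chain
```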

And then for each row, we can find the largest chain-starting timestamp that is less than or equal to the row's timestamp with Query #2:

SELECT Table.id, MAX(StartOfChains.Timestamp) AS ChainStartTime
FROM Table
JOIN ([query #1]) StartOfChains
  ON Table.Timestamp >= StartOfChains.Timestamp
GROUP BY Table.id
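
As a rough illustration of what Query #2 computes, this Python sketch maps each row id to the latest chain start at or before its timestamp. The helper `assign_chain_start` and the sample `rows` and `starts` values are assumptions made for the example; the chain starts are written out by hand here rather than computed.

```python
from bisect import bisect_right
from datetime import datetime

def assign_chain_start(rows, starts):
    """Map each row id to the largest chain start that is <= the row's timestamp."""
    ordered = sorted(starts)
    assigned = {}
    for row_id, ts in rows.items():
        pos = bisect_right(ordered, ts)    # number of chain starts <= ts
        if pos:                            # rows before every start are skipped, like an inner join
            assigned[row_id] = ordered[pos - 1]
    return assigned

rows = {
    1: datetime(2010, 1, 1, 15, 0, 0),
    2: datetime(2010, 1, 1, 15, 0, 2),
    3: datetime(2010, 1, 2, 15, 0, 0),
}
starts = [datetime(2010, 1, 1, 15, 0, 0), datetime(2010, 1, 2, 15, 0, 0)]
assigned = assign_chain_start(rows, starts)
print(assigned[1] == assigned[2])  # True: rows 1 and 2 share a chain start
```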

Once we have that, we can GROUP BY it as you wanted.

SELECT COUNT(*) -- or whatever aggregate you need
FROM Table
JOIN ([query #2]) GroupingQuery
  ON Table.id = GroupingQuery.id
GROUP BY GroupingQuery.ChainStartTime
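
To see the three queries working together, here is a small end-to-end sketch that nests them and runs them against an in-memory SQLite database from Python. This is an adaptation, not the MySQL original: the table name `log` and column name `ts` are made up, timestamps are stored as integer epoch seconds, and the MySQL expression `- INTERVAL 3 SECOND` is replaced by plain `- 3`, since SQLite has no INTERVAL syntax.

```python
import sqlite3
from datetime import datetime, timezone

def epoch(text):
    """Parse 'YYYY-MM-DD HH:MM:SS' as UTC and return integer epoch seconds."""
    parsed = datetime.strptime(text, "%Y-%m-%d %H:%M:%S")
    return int(parsed.replace(tzinfo=timezone.utc).timestamp())

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE log (id INTEGER PRIMARY KEY, ts INTEGER, title TEXT)")
conn.executemany("INSERT INTO log VALUES (?, ?, ?)", [
    (1, epoch("2010-01-01 15:00:00"), "Some Title"),
    (2, epoch("2010-01-01 15:00:02"), "Some Title"),
    (3, epoch("2010-01-02 15:00:00"), "Some Title"),
])

QUERY = """
SELECT g.chain_start, COUNT(*) AS n
FROM log
JOIN (
    -- query #2: latest chain start at or before each row
    SELECT log.id AS id, MAX(s.ts) AS chain_start
    FROM log
    JOIN (
        -- query #1: times with no earlier time in the preceding 3 seconds
        SELECT DISTINCT a.ts AS ts
        FROM log a
        LEFT JOIN log b ON b.ts >= a.ts - 3 AND b.ts < a.ts
        WHERE b.ts IS NULL
    ) s ON log.ts >= s.ts
    GROUP BY log.id
) g ON log.id = g.id
GROUP BY g.chain_start
ORDER BY g.chain_start
"""

counts = [n for _, n in conn.execute(QUERY)]
print(counts)  # → [2, 1]
```

Rows 1 and 2 land in one group and row 3 in another, which matches the grouping asked for in the question.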

I'm not entirely sure this is distinct enough from Tom H's answer to be posted separately, but it sounded like you were having trouble with implementation, and I was thinking about it, so I thought I'd post again. Good luck!
