Checking for X consecutive days, given timestamps in a database
Could anybody give me an idea or hint on how to check for X consecutive days in a MySQL table that stores logins (user id, timestamp)?
Stack Overflow does it (e.g. badges like Enthusiast, awarded if you log in for 30 consecutive days or so). Which functions would you have to use, or what is the general idea behind it?
Something like SELECT 1 FROM login_dates WHERE ...?
Recommended answer
You can accomplish this using a shifted self-outer-join in conjunction with a variable. See this solution:
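SELECT IF(COUNT(1) > 0, 1, 0) AS has_consec
FROM
(
    -- keep only the sets that contain at least 30 rows (days)
    SELECT *
    FROM
    (
        -- label each login with the number of the consecutive set it belongs to;
        -- the counter is bumped whenever the previous day has no login
        SELECT IF(b.login_date IS NULL, @val:=@val+1, @val) AS consec_set
        FROM tbl a
        CROSS JOIN (SELECT @val:=0) var_init
        LEFT JOIN tbl b ON
            a.user_id = b.user_id AND
            a.login_date = b.login_date + INTERVAL 1 DAY
        WHERE a.user_id = 1
    ) labeled
    GROUP BY labeled.consec_set
    HAVING COUNT(1) >= 30
) qualifying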
This returns either 1 or 0, depending on whether the user has logged in on 30 or more consecutive days at ANY time in the past.
Most of the work in this query happens in the innermost subselect. Let's take a closer look so we can better understand how it works:
Using the following example data set:
CREATE TABLE tbl (
user_id INT,
login_date DATE
);
INSERT INTO tbl VALUES
(1, '2012-04-01'), (2, '2012-04-02'),
(1, '2012-04-25'), (2, '2012-04-03'),
(1, '2012-05-03'), (2, '2012-04-04'),
(1, '2012-05-04'), (2, '2012-05-04'),
(1, '2012-05-05'), (2, '2012-05-06'),
(1, '2012-05-06'), (2, '2012-05-08'),
(1, '2012-05-07'), (2, '2012-05-09'),
(1, '2012-05-09'), (2, '2012-05-11'),
(1, '2012-05-10'), (2, '2012-05-17'),
(1, '2012-05-11'), (2, '2012-05-18'),
(1, '2012-05-12'), (2, '2012-05-19'),
(1, '2012-05-16'), (2, '2012-05-20'),
(1, '2012-05-19'), (2, '2012-05-21'),
(1, '2012-05-20'), (2, '2012-05-22'),
(1, '2012-05-21'), (2, '2012-05-25'),
(1, '2012-05-22'), (2, '2012-05-26'),
(1, '2012-05-25'), (2, '2012-05-27'),
(2, '2012-05-28'),
(2, '2012-05-29'),
(2, '2012-05-30'),
(2, '2012-05-31'),
(2, '2012-06-01'),
(2, '2012-06-02');
This query:
SELECT a.*, b.*, IF(b.login_date IS NULL, @val:=@val+1, @val) AS consec_set
FROM tbl a
CROSS JOIN (SELECT @val:=0) var_init
LEFT JOIN tbl b ON
    a.user_id = b.user_id AND
    a.login_date = b.login_date + INTERVAL 1 DAY
WHERE a.user_id = 1
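will produce one row for each of user 1's logins, with the b columns set to NULL wherever no login exists for the preceding day, plus a running consec_set value.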
What we are doing here is shifting the joined table by +1 day. For each day that is not consecutive with the prior day, the LEFT JOIN generates a NULL value.
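To make the shifted join concrete, here is a minimal Python sketch of the same lookup. It is not part of the original answer; the names `logins`, `login_set` and `previous_day_login` are made up for illustration, and the dates are a handful of user 1's rows from the sample data.

```python
from datetime import date, timedelta

# A few of user 1's login dates from the sample data.
logins = [
    date(2012, 5, 3), date(2012, 5, 4), date(2012, 5, 5),
    date(2012, 5, 9), date(2012, 5, 10),
]
login_set = set(logins)


def previous_day_login(day):
    """Emulate the shifted LEFT JOIN: the previous day's login date, or None."""
    prev = day - timedelta(days=1)
    return prev if prev in login_set else None


# One (a.login_date, b.login_date) pair per login, like the joined rows.
shifted = [(day, previous_day_login(day)) for day in logins]
for day, prev in shifted:
    print(day, prev)
```

In this sketch only 2012-05-03 and 2012-05-09 are paired with None, which corresponds to the rows where the LEFT JOIN finds no match and a new run of days begins.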
Now that we know where the non-consecutive days are, we can use a variable to tell the sets of consecutive days apart by checking whether the shifted table's row is NULL. If it is NULL, the day is not consecutive with the previous one, so the variable is incremented. If it is NOT NULL, the variable is left unchanged.
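The labeling step can be sketched in Python as follows. This is an illustration under assumptions, not the answer's code: `label_consecutive_sets` is a hypothetical helper, and it sorts the dates explicitly, whereas the SQL above has no ORDER BY and so depends on the rows being read in date order.

```python
from datetime import date, timedelta


def label_consecutive_sets(login_dates):
    """Return (date, consec_set) pairs; the counter grows at every gap."""
    seen = set(login_dates)  # assumption: at most one row per day matters
    val = 0                  # plays the role of @val
    labeled = []
    for day in sorted(seen):
        # No login on the previous day is the "b.login_date IS NULL" case.
        if day - timedelta(days=1) not in seen:
            val += 1
        labeled.append((day, val))
    return labeled


# User 1's logins from 2012-05-03 to 2012-05-16 in the sample data.
may_logins = [date(2012, 5, d) for d in (3, 4, 5, 6, 7, 9, 10, 11, 12, 16)]
for day, consec_set in label_consecutive_sets(may_logins):
    print(day, consec_set)
```

Every date inside an unbroken run carries the same consec_set number, which is what makes the later GROUP BY possible.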
After we've differentiated each set of consecutive days with the incrementing variable, it's just a matter of grouping by each "set" (as defined in the consec_set column) and using HAVING to filter out any set that has fewer than the specified number of consecutive days (30 in your example).
Finally, we wrap THAT query and simply count the number of sets that had 30 or more consecutive days. If there are one or more such sets, return 1; otherwise return 0.
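Putting the grouping, the HAVING filter and the final count together, a rough Python equivalent might look like this. The function name `has_consecutive_days` and its `required` parameter are assumptions made for the sketch; the threshold is a parameter so that it can be tried on the small sample data, where no run reaches 30 days.

```python
from collections import Counter
from datetime import date, timedelta


def has_consecutive_days(login_dates, required):
    """Return 1 if any run of consecutive login days has >= required days, else 0."""
    seen = set(login_dates)  # assumption: duplicate days count once
    val = 0
    set_sizes = Counter()    # consec_set -> number of days (GROUP BY ... COUNT)
    for day in sorted(seen):
        if day - timedelta(days=1) not in seen:
            val += 1         # gap found, start a new set
        set_sizes[val] += 1
    # HAVING COUNT(1) >= required, then IF(COUNT(1) > 0, 1, 0)
    qualifying = [s for s, size in set_sizes.items() if size >= required]
    return 1 if qualifying else 0


# User 1's May logins from the sample data: the longest run is 5 days.
may_logins = [date(2012, 5, d) for d in (3, 4, 5, 6, 7, 9, 10, 11, 12, 16)]
print(has_consecutive_days(may_logins, 5))
print(has_consecutive_days(may_logins, 30))
```

Unlike the SQL, this version collapses duplicate logins on the same day before counting, which is a deliberate simplification for a table that stores one row per login.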