T-SQL time series pattern data mining
Take a SQL table with the following 4 fields:
Id, TimeStamp, Item, UserId
I would like to determine the most common sequences of Item for a UserId in a session. A session would simply be defined by a time threshold (i.e. if there are no entries for X minutes, any future entries would be grouped into a new session).
Ideally, the sequence of Items could have a sort of fuzzy grouping, where sequences that differ in only one or two places are still counted as the same and grouped together.
Does anyone know how I might tackle this problem in SQL?
Update:
To clarify, let's assume the Items are grocery store aisles. I have a month of people going to the grocery store. The basic question is which aisles people use and in what order. Do they most often go 1,2,3 or 1,2,1,3,4?
(Right now I am curious about paths of users on our sites, but you know, grocery store is more visual).
Update 2:
Here is a simple case:
CREATE Table #StoreActivity
(
id int,
CreationDate datetime ,
Isle int,
UserId int
)
Insert INTO #StoreActivity
Values
(1, CAST('12-1-2011 03:10:01' AS Datetime), 1, 2222),
(2, CAST('12-1-2011 03:10:07' AS Datetime), 1, 1111),
(3, CAST('12-1-2011 03:10:12' AS Datetime), 2, 2222),
(4, CAST('12-1-2011 04:10:01' AS Datetime), 1, 2222),
(5, CAST('12-1-2011 04:10:23' AS Datetime), 2, 2222)
Select * from #StoreActivity
DROP Table #StoreActivity
/* So with the above data, we have 2 sequences if we declare a session or visit dead if there is no activity for a minute : `1,2` (With a count of 2), and `1` (with a count of 1)*/
Recommended answer
WITH q AS
(
SELECT *,
ROW_NUMBER() OVER (PARTITION BY UserId ORDER BY TimeStamp, Id) AS rn,
ROW_NUMBER() OVER (PARTITION BY UserId, Item ORDER BY TimeStamp, Id) AS rnd
FROM mytable
)
SELECT *,
rnd - rn AS sequence
FROM q
The sequence column will be shared among all records in a sequence for a given UserId (that is, a run of consecutive rows with the same Item). You can group on it, together with UserId and Item, or do whatever you like.
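The query above groups runs of the same Item; it does not apply the time threshold described in the question. One possible way to add that is sketched below. This is not the answer author's method. It assumes SQL Server 2017 or later (LAG needs 2012+, STRING_AGG needs 2017+), runs against the #StoreActivity temp table from the question before it is dropped, and uses hypothetical names (@GapSeconds, IsNewSession, SessionNo, IslePath) chosen only for illustration. Fuzzy matching of near-identical paths is not covered.

```sql
-- Sketch only: assumes SQL Server 2017+ and the #StoreActivity table from the question.
DECLARE @GapSeconds int = 60;  -- hypothetical threshold: 1 minute of inactivity ends a session

WITH flagged AS
(
    -- Flag a row as a session start when the user has no earlier row,
    -- or the previous row is more than @GapSeconds older.
    SELECT  id, CreationDate, Isle, UserId,
            CASE
                WHEN LAG(CreationDate) OVER (PARTITION BY UserId ORDER BY CreationDate, id) IS NULL
                  OR DATEDIFF(SECOND,
                              LAG(CreationDate) OVER (PARTITION BY UserId ORDER BY CreationDate, id),
                              CreationDate) > @GapSeconds
                THEN 1 ELSE 0
            END AS IsNewSession
    FROM    #StoreActivity
),
sessions AS
(
    -- A running total of the start flags yields a per-user session number.
    SELECT  *,
            SUM(IsNewSession) OVER (PARTITION BY UserId ORDER BY CreationDate, id
                                    ROWS UNBOUNDED PRECEDING) AS SessionNo
    FROM    flagged
),
paths AS
(
    -- Collapse each session into an ordered, comma-separated path.
    SELECT  UserId, SessionNo,
            STRING_AGG(CAST(Isle AS varchar(10)), ',')
                WITHIN GROUP (ORDER BY CreationDate, id) AS IslePath
    FROM    sessions
    GROUP BY UserId, SessionNo
)
-- Count how often each path occurs across all users and sessions.
SELECT  IslePath, COUNT(*) AS Occurrences
FROM    paths
GROUP BY IslePath
ORDER BY Occurrences DESC, IslePath;
```

On the sample data this should give the result the question expects: the path 1,2 twice (both sessions of user 2222) and the path 1 once (user 1111). On versions without STRING_AGG, the path could be built with a FOR XML PATH('') subquery instead.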