Grouping a pandas DataFrame by 10-minute intervals
Problem description
Given the following pandas DataFrame:
timestamp
0 2018-10-05 23:07:02
1 2018-10-05 23:07:13
2 2018-10-05 23:07:23
3 2018-10-05 23:07:36
4 2018-10-05 23:08:02
5 2018-10-05 23:09:16
6 2018-10-05 23:09:21
7 2018-10-05 23:09:39
8 2018-10-05 23:09:47
9 2018-10-05 23:10:01
10 2018-10-05 23:10:11
11 2018-10-05 23:10:23
12 2018-10-05 23:10:59
13 2018-10-05 23:11:03
14 2018-10-08 03:35:32
15 2018-10-08 03:35:58
16 2018-10-08 03:37:16
17 2018-10-08 03:38:04
18 2018-10-08 03:38:30
19 2018-10-08 03:38:36
20 2018-10-08 03:38:42
21 2018-10-08 03:38:52
22 2018-10-08 03:38:57
23 2018-10-08 03:39:10
24 2018-10-08 03:39:27
25 2018-10-08 03:40:47
26 2018-10-08 03:40:54
27 2018-10-08 03:41:02
28 2018-10-08 03:41:12
29 2018-10-08 03:41:32
How can I label each row with the 10-minute period it belongs to? For example:
timestamp 10min_period
0 2018-10-05 23:07:02 period_1
1 2018-10-05 23:07:13 period_1
2 2018-10-05 23:07:23 period_1
3 2018-10-05 23:07:36 period_1
4 2018-10-05 23:08:02 period_1
5 2018-10-05 23:09:16 period_1
6 2018-10-05 23:09:21 period_1
7 2018-10-05 23:09:39 period_1
8 2018-10-05 23:09:47 period_1
9 2018-10-05 23:10:01 period_1
10 2018-10-05 23:10:11 period_1
11 2018-10-05 23:10:23 period_1
12 2018-10-05 23:10:59 period_1
13 2018-10-05 23:11:03 period_1
14 2018-10-08 03:35:32 period_2
15 2018-10-08 03:35:58 period_2
16 2018-10-08 03:37:16 period_2
17 2018-10-08 03:38:04 period_2
18 2018-10-08 03:38:30 period_2
19 2018-10-08 03:38:36 period_2
20 2018-10-08 03:38:42 period_2
21 2018-10-08 03:38:52 period_2
22 2018-10-08 03:38:57 period_2
23 2018-10-08 03:39:10 period_2
24 2018-10-08 03:39:27 period_2
25 2018-10-08 03:40:47 period_2
26 2018-10-08 04:40:54 period_3
27 2018-10-08 04:41:02 period_3
28 2018-10-08 04:41:12 period_3
29 2018-10-08 04:41:32 period_3
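As you can see in the expected output above, each period_n label is created by counting off a 10-minute period; when the datetime series exceeds the 10-minute threshold, a new label is created. I tried using dt.floor('10Min'), but it does not work, because it does not keep track of where the 10-minute count starts and where it ends. I also tried: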
a = df['timestamp'].offsets.DateOffset(minutes=10)
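However, that does not work either. Any idea how to split my df into 10-minute periods? This question differs from other questions because I do not specify when to start counting. That is, I start counting from the first datetime row and count 10-minute periods from that instance.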
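To illustrate why dt.floor does not produce the labels wanted here: it snaps every timestamp down to a clock-aligned 10-minute boundary (23:00, 23:10, ...) instead of counting from the first row. The following is only a minimal sketch, using four timestamps taken from the data above:

```python
import pandas as pd

# Four timestamps copied from the sample data in the question
ts = pd.Series(pd.to_datetime([
    "2018-10-05 23:07:02",
    "2018-10-05 23:09:47",
    "2018-10-05 23:10:01",
    "2018-10-05 23:11:03",
]))

# floor() rounds each value down to the nearest clock-aligned 10-minute mark
floored = ts.dt.floor("10min")
print(floored)
```

All four rows lie within about four minutes of each other, yet they are split across the 23:00 and 23:10 bins, which is why this approach cannot reproduce the expected period_1 label.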
Update:
After converting the column to datetime objects, I also tried:
df['timestamp'].groupby(pd.TimeGrouper(freq='10Min'))
However, I got:
TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'RangeIndex'
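The error appears because a frequency-based grouper works on a datetime-like index by default, while df here has a plain RangeIndex. One possible way around it, sketched below, is pd.Grouper with its key argument pointing at the column (this assumes a pandas version where pd.Grouper is available; pd.TimeGrouper was deprecated in its favour). Note that the resulting bins are still aligned to the clock, not to the first row, so this alone does not yield the expected labels:

```python
import pandas as pd

# A few timestamps copied from the sample data in the question
df = pd.DataFrame({"timestamp": pd.to_datetime([
    "2018-10-05 23:07:02",
    "2018-10-05 23:09:47",
    "2018-10-05 23:10:01",
    "2018-10-08 03:35:32",
])})

# key= tells Grouper to bin on a column instead of the index
counts = df.groupby(pd.Grouper(key="timestamp", freq="10min")).size()

# Drop the empty bins that lie between the two bursts of activity
counts = counts[counts > 0]
print(counts)
```

Calling df.set_index('timestamp') before grouping would be another way to satisfy the DatetimeIndex requirement.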
Solution
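import pandas as pd

# Make sure the column holds datetimes rather than strings
df['timestamp'] = pd.to_datetime(df['timestamp'])
# Time elapsed since the previous row (NaT for the first row)
diffs = df['timestamp'] - df['timestamp'].shift()
# True wherever the gap to the previous row exceeds 10 minutes
laps = diffs > pd.Timedelta('10 min')
periods = laps.cumsum().apply(lambda x: 'period_{}'.format(x+1))
df['10min_period'] = periods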
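Note that the code above starts a new period whenever the gap between two consecutive rows exceeds 10 minutes. The question could also be read as "start a new period once a row is more than 10 minutes past the first row of the current period". The sketch below contrasts the two readings; the function names label_by_gap and label_by_window are hypothetical helpers, the middle three timestamps are made up for illustration, and the second function assumes the timestamps are sorted in ascending order:

```python
import pandas as pd

def label_by_gap(ts, gap="10min"):
    """New period whenever the gap to the previous row exceeds `gap`."""
    ts = pd.Series(pd.to_datetime(ts))
    # diff() is NaT for the first row, and NaT > Timedelta evaluates to False
    new_period = ts.diff() > pd.Timedelta(gap)
    return "period_" + (new_period.cumsum() + 1).astype(str)

def label_by_window(ts, width="10min"):
    """New period once a row is more than `width` past the period's first row.

    Assumes the timestamps are sorted in ascending order.
    """
    ts = pd.Series(pd.to_datetime(ts))
    width = pd.Timedelta(width)
    labels, period, start = [], 0, None
    for t in ts:
        if start is None or t - start > width:
            period += 1   # open a new period anchored at this row
            start = t
        labels.append("period_{}".format(period))
    return pd.Series(labels, index=ts.index)

# Illustrative data: first and last values come from the question,
# the three in between are invented to show where the readings differ.
sample = [
    "2018-10-05 23:07:02",
    "2018-10-05 23:11:03",
    "2018-10-05 23:16:00",
    "2018-10-05 23:20:00",
    "2018-10-08 03:35:32",
]
by_gap = label_by_gap(sample)
by_window = label_by_window(sample)
print(pd.DataFrame({"timestamp": sample, "by_gap": by_gap, "by_window": by_window}))
```

With densely spaced rows the gap-based version can keep a single period running for much longer than 10 minutes, whereas the window-based version caps each period at 10 minutes from its first row. Which one is appropriate depends on what "period" is meant to capture.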