How to group by and get the maximum and minimum timestamp in a pandas DataFrame

2022-01-22 00:00:00 python pandas pivot

Problem description

I want to group a dataset and return the maximum and minimum timestamp for each group. Here is my data:

    id  timestamp
    1   2017-09-17 10:09:01
    2   2017-10-02 01:13:15
    1   2017-09-17 10:53:07
    1   2017-09-17 10:52:18
    2   2017-09-12 21:59:40

This is the output I want:

    id  max                  min
    1   2017-09-17 10:53:07  2017-09-17 10:09:01
    2   2017-10-02 01:13:15  2017-09-12 21:59:40

Here is what I did. The code does not seem efficient, and I hope there is a better way to do this in pandas:

    # Keep the latest and the earliest row per id
    data1 = df.sort_values('timestamp').drop_duplicates(['id'], keep='last')
    data2 = df.sort_values('timestamp').drop_duplicates(['id'], keep='first')
    data1['max'] = data1['timestamp']
    data2['min'] = data2['timestamp']
    # Join both results and drop the duplicated timestamp columns
    data = data1.merge(data2, on='id', how='left')
    data = data.drop(['timestamp_x', 'timestamp_y'], axis=1)

It seems pandas has some kind of pivot for this.

Solution

I think you need agg:

    df = df.groupby('id')['timestamp'].agg(['min','max']).reset_index()
    print (df)
       id                  min                  max
    0   1  2017-09-17 10:09:01  2017-09-17 10:53:07
    1   2  2017-09-12 21:59:40  2017-10-02 01:13:15
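For a self-contained run, the sketch below rebuilds the sample data from the question and applies the same aggregation. It assumes the timestamp column arrives as strings and is parsed with pd.to_datetime first (the question does not state its dtype), and the helper name min_max_by_id is hypothetical. Passing ['max', 'min'] instead of ['min', 'max'] gives the column order requested in the question.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id": [1, 2, 1, 1, 2],
    "timestamp": [
        "2017-09-17 10:09:01",
        "2017-10-02 01:13:15",
        "2017-09-17 10:53:07",
        "2017-09-17 10:52:18",
        "2017-09-12 21:59:40",
    ],
})

# Assumption: the column holds strings, so parse it to datetimes first.
df["timestamp"] = pd.to_datetime(df["timestamp"])


def min_max_by_id(frame: pd.DataFrame) -> pd.DataFrame:
    """Return one row per id with the latest and earliest timestamp."""
    return (
        frame.groupby("id")["timestamp"]
        .agg(["max", "min"])   # column order follows the list order
        .reset_index()         # turn the group key back into a column
    )


result = min_max_by_id(df)
print(result)
```

If the column is already datetime64, the pd.to_datetime step can be skipped.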

Or modify your solution slightly (it should be faster, since it sorts only once and avoids the merge):

    import pandas as pd

    # Sort once, then take the last and first row per id
    data = df.sort_values('timestamp')
    data1 = data.drop_duplicates(['id'], keep='last').set_index('id')
    data2 = data.drop_duplicates(['id'], keep='first').set_index('id')
    # Put both timestamp columns side by side, labelled max and min
    df = pd.concat([data1['timestamp'], data2['timestamp']], keys=('max','min'), axis=1)
    print (df)
                       max                 min
    id
    1  2017-09-17 10:53:07 2017-09-17 10:09:01
    2  2017-10-02 01:13:15 2017-09-12 21:59:40
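To check that the agg approach and the drop_duplicates approach agree, a minimal sketch could compute both on the question's sample data and compare them. The variable names by_agg, by_concat and same are illustrative only, and the timestamps are assumed to be parsed with pd.to_datetime beforehand.

```python
import pandas as pd

# Sample data copied from the question, with timestamps parsed up front.
df = pd.DataFrame({
    "id": [1, 2, 1, 1, 2],
    "timestamp": pd.to_datetime([
        "2017-09-17 10:09:01",
        "2017-10-02 01:13:15",
        "2017-09-17 10:53:07",
        "2017-09-17 10:52:18",
        "2017-09-12 21:59:40",
    ]),
})

# Approach 1: groupby + agg, keeping id as the index.
by_agg = df.groupby("id")["timestamp"].agg(["max", "min"])

# Approach 2: sort once, then take the last and first row per id.
data = df.sort_values("timestamp")
latest = data.drop_duplicates(["id"], keep="last").set_index("id")
earliest = data.drop_duplicates(["id"], keep="first").set_index("id")
by_concat = pd.concat(
    [latest["timestamp"], earliest["timestamp"]],
    keys=("max", "min"),
    axis=1,
)

# Sort both by id so the comparison does not depend on row order.
same = by_agg.sort_index().equals(by_concat.sort_index())
print(same)
```

Which variant is actually faster depends on the data size and the number of groups, so timing both on real data is worthwhile before choosing.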
