Python: group a list by date

2022-01-13 00:00:00 python list itertools timestamp date

Problem description

Say I have a list that looks like this:

[(datetime.datetime(2013, 8, 8, 1, 20, 15), 2060), (datetime.datetime(2013, 8, 9, 1, 6, 14), 2055), (datetime.datetime(2013, 8, 9, 1, 21, 1), 2050), (datetime.datetime(2013, 8, 10, 1, 5, 49), 2050), (datetime.datetime(2013, 8, 10, 1, 19, 51), 2050), (datetime.datetime(2013, 8, 11, 2, 4, 53), 2050), (datetime.datetime(2013, 8, 12, 0, 29, 45), 2050), (datetime.datetime(2013, 8, 12, 0, 44, 13), 2050), (datetime.datetime(2013, 8, 13, 0, 34, 13), 2050), (datetime.datetime(2013, 8, 13, 0, 47, 29), 2050), (datetime.datetime(2013, 8, 14, 1, 30, 39), 2050), (datetime.datetime(2013, 8, 14, 1, 33, 51), 2050), (datetime.datetime(2013, 8, 15, 0, 41, 1), 2050), (datetime.datetime(2013, 8, 15, 0, 54, 45), 2050), (datetime.datetime(2013, 8, 16, 0, 29, 57), 1950), (datetime.datetime(2013, 8, 16, 0, 43, 11), 1950), (datetime.datetime(2013, 8, 17, 0, 27, 4), 1950), (datetime.datetime(2013, 8, 17, 0, 42, 30), 1950), (datetime.datetime(2013, 8, 18, 0, 26, 26), 1950), (datetime.datetime(2013, 8, 18, 0, 43, 11), 1950), (datetime.datetime(2013, 8, 19, 0, 41, 49), 1950), (datetime.datetime(2013, 8, 20, 1, 10, 23), 1950), (datetime.datetime(2013, 8, 20, 1, 23, 44), 1950), (datetime.datetime(2013, 8, 21, 0, 47, 25), 1950), (datetime.datetime(2013, 8, 21, 1, 0, 12), 1950), (datetime.datetime(2013, 8, 22, 0, 45, 21), 1950), (datetime.datetime(2013, 8, 22, 1, 4, 33), 1950), (datetime.datetime(2013, 8, 23, 0, 51, 27), 1950), (datetime.datetime(2013, 8, 23, 1, 6, 36), 1950), (datetime.datetime(2013, 8, 24, 0, 41, 3), 1950), (datetime.datetime(2013, 8, 24, 0, 53, 14), 1950), (datetime.datetime(2013, 8, 25, 0, 29, 24), 1950), (datetime.datetime(2013, 8, 25, 0, 42, 40), 1950), (datetime.datetime(2013, 8, 26, 0, 28, 13), 1950), (datetime.datetime(2013, 8, 26, 0, 43, 30), 1950), (datetime.datetime(2013, 8, 27, 0, 30, 1), 1950), (datetime.datetime(2013, 8, 27, 0, 43, 43), 1950), (datetime.datetime(2013, 8, 28, 0, 33, 19), 1950), (datetime.datetime(2013, 8, 28, 0, 49, 11), 1950), 
(datetime.datetime(2013, 8, 29, 0, 26, 49), 1950), (datetime.datetime(2013, 8, 29, 0, 41, 21), 1950), (datetime.datetime(2013, 8, 30, 0, 26, 13), 1950), (datetime.datetime(2013, 8, 30, 0, 42, 9), 1950), (datetime.datetime(2013, 8, 31, 0, 23, 40), 1950), (datetime.datetime(2013, 8, 31, 0, 39, 49), 1950), (datetime.datetime(2013, 9, 1, 0, 22, 2), 1950), (datetime.datetime(2013, 9, 1, 0, 38, 16), 1950), (datetime.datetime(2013, 9, 2, 0, 21, 2), 1950), (datetime.datetime(2013, 9, 2, 0, 36, 19), 1950), (datetime.datetime(2013, 9, 3, 0, 22, 16), 1950), (datetime.datetime(2013, 9, 3, 0, 39, 2), 1900)]

Clearly, this is a list of tuples, and the first element of each tuple is a timestamp. The timestamps are already parsed into datetime objects, generated by:

datetime.strptime(record[0], timeFormat)
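The question does not show timeFormat or the raw strings, so the following is only a sketch with a hypothetical format string and a hypothetical raw record. It shows that strptime returns a datetime.datetime, and that its .date() method drops the time of day, which yields the calendar day to group on.

```python
from datetime import date, datetime

# Hypothetical format string: the question does not show the real timeFormat.
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"

# Hypothetical raw record: (timestamp string, monitoring value)
raw_record = ("2013-08-09 01:06:14", 2055)

# strptime parses the string into a datetime.datetime
parsed = datetime.strptime(raw_record[0], TIME_FORMAT)

# .date() drops the time of day, leaving the calendar day to group on
day = parsed.date()

print(parsed)  # 2013-08-09 01:06:14
print(day)     # 2013-08-09
```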

The second element is the monitored value. However, there may be multiple records per day. For example, there are two records on datetime.datetime(2013, 8, 9, ...), with the values 2055 and 2050. What I actually want is the maximum for each day, so in this case 2055 would be the only record for (2013, 8, 9).

I am wondering whether there is a handy way to do this in Python, something like this MySQL query:

select 
    date(timestamp), 
    max(value)
from table 
group by date(timestamp)

The MySQL statement is only there to show the idea; I definitely want a Python solution.


Solution

Use itertools.groupby:

>>> import datetime
>>> import itertools
>>> records = [(datetime.datetime(2013, 8, 8, 1, 20, 15), 2060), ....]
>>> # group consecutive records by calendar day, keep each day's maximum value
>>> [(day, max(value for _, value in grp))
...  for day, grp in itertools.groupby(records, key=lambda rec: rec[0].date())]
[(datetime.date(2013, 8, 8), 2060),
 (datetime.date(2013, 8, 9), 2055),
 (datetime.date(2013, 8, 10), 2050),
 ...
]
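A self-contained version of the session above, using only the first few records from the question. The sorted() call is an addition not in the original answer: it leaves already-ordered input unchanged, and it guarantees that records from the same day are adjacent, which groupby relies on. The helper name day_of is illustrative.

```python
import datetime
import itertools

# First few records from the question (the full list works the same way)
records = [
    (datetime.datetime(2013, 8, 8, 1, 20, 15), 2060),
    (datetime.datetime(2013, 8, 9, 1, 6, 14), 2055),
    (datetime.datetime(2013, 8, 9, 1, 21, 1), 2050),
    (datetime.datetime(2013, 8, 10, 1, 5, 49), 2050),
    (datetime.datetime(2013, 8, 10, 1, 19, 51), 2050),
]


def day_of(record):
    """Grouping key (illustrative helper): the calendar day of the timestamp."""
    return record[0].date()


# groupby only merges consecutive items with equal keys, so sort by timestamp first
ordered = sorted(records, key=lambda record: record[0])

# For each day, take the maximum monitoring value within that day's group
daily_max = [
    (day, max(value for _, value in group))
    for day, group in itertools.groupby(ordered, key=day_of)
]

print(daily_max)
```

Each group is an iterator over that day's records, so it must be consumed (here by max) before groupby advances to the next day.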

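NOTE: this assumes the records are already sorted by timestamp. itertools.groupby only groups consecutive items that share the same key, so if the records are not sorted, sort them by date first (for example, records.sort(key=lambda rec: rec[0])).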
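If sorting is undesirable, a dictionary keyed by day gives the same per-day maximum in a single pass regardless of input order, which is close in spirit to the MySQL GROUP BY in the question. This is an alternative technique, not part of the original answer; a minimal sketch, with a few records from the question deliberately shuffled to show that order does not matter:

```python
import datetime

# Records from the question, deliberately out of order
records = [
    (datetime.datetime(2013, 8, 10, 1, 5, 49), 2050),
    (datetime.datetime(2013, 8, 9, 1, 21, 1), 2050),
    (datetime.datetime(2013, 8, 8, 1, 20, 15), 2060),
    (datetime.datetime(2013, 8, 9, 1, 6, 14), 2055),
]

# Map each calendar day to the largest value seen so far for that day
daily_max = {}
for timestamp, value in records:
    day = timestamp.date()  # the "GROUP BY date(timestamp)" key
    daily_max[day] = max(daily_max.get(day, value), value)

# Dicts keep insertion order, so sort to list the days chronologically
result = sorted(daily_max.items())
print(result)
```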
