Pandas, groupby and sum of specific months

2022-01-09 00:00:00 python pandas sum

Problem description

I have a DataFrame:

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 982 entries, 2009-10-30 00:00:00 to 2012-12-16 00:00:00
Data columns (total 4 columns):
rain        981  non-null values
temp_max    982  non-null values
temp_min    982  non-null values
temp        982  non-null values
dtypes: float64(4)

For summing per year/month I use:

mdata = data.groupby([lambda x: x.year, lambda x: x.month]).agg([sum])
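For reference, the same per-year/month sum can also be written with vectorized keys taken from the DatetimeIndex instead of lambdas. This is a minimal sketch on synthetic random data (not the question's data); only the column names and the date range mirror the question.

```python
import numpy as np
import pandas as pd

# Synthetic daily data: random values, column names and date range copied from the question
rng = np.random.default_rng(0)
idx = pd.date_range("2009-10-30", "2012-12-16", freq="D")
data = pd.DataFrame(
    rng.random((len(idx), 4)),
    index=idx,
    columns=["rain", "temp_max", "temp_min", "temp"],
)

# The question's approach: one lambda per grouping level, called on each index value
by_lambda = data.groupby([lambda x: x.year, lambda x: x.month]).sum()

# Vectorized keys: pass the year and month arrays of the DatetimeIndex directly
by_arrays = data.groupby([data.index.year, data.index.month]).sum()

print(by_arrays.head())
```

Both produce one row per (year, month); the array version avoids calling Python code for every row.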

But I need seasonal analysis (summer, winter, etc.), so how can I compute the sum of specific months, such as [1, 2, 3], for each year?


Solution

Yes. One solution that seems neat to me is to use a seasons dictionary and then group the data with a function. Any function passed as a group key is called once per index value, and the return values are used as the group names.
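To make "called once per index value" concrete, here is a small sketch: grouping by a function gives the same result as grouping by the list of labels obtained by applying that function to each index value yourself. `month_parity` is a hypothetical key function used only for illustration.

```python
import datetime

import pandas as pd

# A few dates in different months (illustrative values only)
dates = [datetime.date(2021, m, 15) for m in (1, 2, 5, 8, 11, 12)]
df = pd.DataFrame({"value": [1, 2, 3, 4, 5, 6]}, index=dates)

def month_parity(d):
    # Receives one index value (a datetime.date); the return value is the group name
    return "odd" if d.month % 2 else "even"

# Grouping by the function...
by_function = df.groupby(month_parity).sum()

# ...matches grouping by labels computed element by element
labels = [month_parity(d) for d in df.index]
by_labels = df.groupby(labels).sum()

print(by_function)
```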

import datetime

import numpy as np
import pandas as pd

# Create a year's worth of daily data indexed by datetime.date objects
base = datetime.date.today() - datetime.timedelta(days=365)
Datelist = [base + datetime.timedelta(days=x) for x in range(365)]
DF = pd.DataFrame(np.random.rand(365), index=Datelist)

# Create a seasons dictionary that maps month numbers to seasons
# (this month-to-season split is the answer's own choice; adjust it to your definition)
SeasonDict = {11: 'Winter', 12: 'Winter', 1: 'Winter',
              2: 'Spring', 3: 'Spring', 4: 'Spring',
              5: 'Summer', 6: 'Summer', 7: 'Summer',
              8: 'Autumn', 9: 'Autumn', 10: 'Autumn'}

# Group key function: called once per index value, its return value becomes the group name
def GroupFunc(x):
    return SeasonDict[x.month]

# Group with the function and sum each season
Grouped = DF.groupby(GroupFunc)
print(Grouped.sum())

The function takes each index value, looks up its month in the seasons dictionary, and returns the season for that month. That value then becomes the group name.
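Because the lookup runs Python code once per row, a vectorized alternative may be worth considering when the index is a DatetimeIndex (rather than the `datetime.date` objects used above): map the month numbers through the dictionary in a single call. A sketch, assuming a DatetimeIndex and constant illustrative data so that each season's sum equals its number of days; `season_dict` is a local copy of the mapping above.

```python
import numpy as np
import pandas as pd

# Same month -> season mapping as in the answer
season_dict = {11: "Winter", 12: "Winter", 1: "Winter",
               2: "Spring", 3: "Spring", 4: "Spring",
               5: "Summer", 6: "Summer", 7: "Summer",
               8: "Autumn", 9: "Autumn", 10: "Autumn"}

# One calendar year of ones, so each seasonal sum is a day count
idx = pd.date_range("2021-01-01", "2021-12-31", freq="D")
df = pd.DataFrame({"x": np.ones(len(idx))}, index=idx)

# Per-value lookup, as in the answer
by_function = df.groupby(lambda d: season_dict[d.month]).sum()

# Vectorized: map the DatetimeIndex month numbers through the dict once
by_mapped = df.groupby(df.index.month.map(season_dict)).sum()

print(by_mapped)
```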

Alternatively, you can use a lambda as in your example (more compact, though I thought the named function above would be easier to understand):

DF.groupby(lambda x: SeasonDict[x.month]).sum()
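Note that this groupby pools all years together, whereas the question asks for sums for each year. One way to get one row per year and season is to group by two keys. A minimal sketch, assuming a DatetimeIndex and a hypothetical convention that November and December count toward the following year's winter, so that the Nov/Dec/Jan winter of the mapping above is not split across two calendar years.

```python
import numpy as np
import pandas as pd

season_dict = {11: "Winter", 12: "Winter", 1: "Winter",
               2: "Spring", 3: "Spring", 4: "Spring",
               5: "Summer", 6: "Summer", 7: "Summer",
               8: "Autumn", 9: "Autumn", 10: "Autumn"}

# Two years of ones, so each sum is a day count
idx = pd.date_range("2010-01-01", "2011-12-31", freq="D")
df = pd.DataFrame({"x": np.ones(len(idx))}, index=idx)

months = df.index.month
# Assumption: Nov and Dec are assigned to the next year's winter
season_year = np.where(months >= 11, df.index.year + 1, df.index.year)
season = months.map(season_dict)

# One row per (season year, season)
per_year = df.groupby([season_year, season]).sum()
print(per_year)
```

With this convention the first and last winters are partial (only January 2010, and only Nov/Dec 2011), so you may want to drop incomplete seasons before comparing years.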

ADDITIONAL CODE AS PER COMMENTS: it seems to me that you would be better off slicing the data, so you could do the following:

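# Add a column holding each row's season
# (avoids chained assignment such as DF.Season[row] = ..., which may not write back to DF)
DF['Season'] = [SeasonDict[d.month] for d in DF.index]
# Boolean slicing: keep only the winter rows
DFWinter = DF[DF['Season'] == 'Winter']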
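For the specific months in the question ([1, 2, 3]), the Boolean mask can also be built straight from the index, without a helper column, and then grouped by year. A sketch, assuming a DatetimeIndex and constant illustrative data so each yearly sum is a day count:

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2010-01-01", "2012-12-31", freq="D")
data = pd.DataFrame({"rain": np.ones(len(idx))}, index=idx)

# Boolean mask: True for rows in January, February or March
mask = data.index.month.isin([1, 2, 3])
q1 = data[mask]

# One total per year, over the selected months only
q1_per_year = q1.groupby(q1.index.year).sum()
print(q1_per_year)
```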

Now you have a new DataFrame containing only the winter data, to work with as you like. The difference is that groupby operations apply the same operation to every group at once, whereas it sounds like you want to investigate different parts of your dataset in different ways. For that it's better to slice, in this case with Boolean indexing.
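Both approaches select the same rows; groupby simply keeps every season available at once. A sketch comparing a Boolean slice with `GroupBy.get_group` on illustrative data, assuming a DatetimeIndex:

```python
import numpy as np
import pandas as pd

season_dict = {11: "Winter", 12: "Winter", 1: "Winter",
               2: "Spring", 3: "Spring", 4: "Spring",
               5: "Summer", 6: "Summer", 7: "Summer",
               8: "Autumn", 9: "Autumn", 10: "Autumn"}

idx = pd.date_range("2021-01-01", "2021-12-31", freq="D")
df = pd.DataFrame({"x": np.arange(len(idx), dtype=float)}, index=idx)
seasons = df.index.month.map(season_dict)

# Slicing: a Boolean mask selects the winter rows only
winter_sliced = df[seasons == "Winter"]

# groupby: the same rows form one group, alongside all the other seasons
grouped = df.groupby(seasons)
winter_group = grouped.get_group("Winter")

print(grouped.sum())
```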
