按时间值进行透视排序 - Pandas
问题描述
我想 pivot
一个 df
并显示基于时间值的值,而不是列值.
I want to pivot
a df
and display values based off time values, not column values.
df = pd.DataFrame({
'Place' : ['John','Alan','Cory','Jim','John','Alan','Cory','Jim'],
'Number' : ['2','3','5','5','3','4','6','6'],
'Code' : ['1','2','3','4','1','2','3','4'],
'Time' : ['1904-01-01 08:00:00','1904-01-01 09:00:00','1904-01-02 01:00:00','1904-01-02 02:00:00','1904-01-01 08:10:00','1904-01-01 09:10:00','1904-01-02 01:10:00','1904-01-02 02:10:00'],
})
df = df.pivot_table(index = 'Number', columns = 'Place', values = 'Time', aggfunc = 'first').fillna('')
输出:
Place Alan Cory Jim John
Number
2 1904-01-01 08:00:00
3 1904-01-01 09:00:00 1904-01-01 08:10:00
4 1904-01-01 09:10:00
5 1904-01-02 01:00:00 1904-01-02 02:00:00
6 1904-01-02 01:10:00 1904-01-02 02:10:00
预期输出:
Place John Alan Cory Jim
Number
2 1904-01-01 08:00:00
3 1904-01-01 08:10:00 1904-01-01 09:00:00
4 1904-01-01 09:10:00
5 1904-01-02 01:00:00 1904-01-02 02:00:00
6 1904-01-02 01:10:00 1904-01-02 02:10:00
注意:我只添加了一个虚拟日期来区分午夜之后的时间.一旦 df
被适当地排序,我最终会放弃日期并留下时间.
Note: I've only added a dummy dates to differentiate for times after midnight. I will eventually drop the dates and just leave the times once the df
is appropriately sorted.
解决方案
不幸的是 pivot_table
默认对列名进行排序,并且没有参数以避免它.所以可能的解决方案是 DataFrame.reindex
列 Place
的原始唯一值:
Unfortunately pivot_table
sorting columns names by default and no paramater for avoid it. So possible solution is DataFrame.reindex
by original unique values of column Place
:
#if necessary convert to datetimes and sorting
df['Time'] = pd.to_datetime(df['Time'])
df = df.sort_values('Time')
df1 = df.pivot_table(index='Number',columns='Place',values='Time',aggfunc='first').fillna('')
df1 = df1.reindex(columns=df['Place'].unique())
print (df1)
Place John Alan Cory
Number
2 1904-01-01 08:00:00
3 1904-01-01 08:10:00 1904-01-01 09:00:00
4 1904-01-01 09:10:00
5 1904-01-02 01:00:00
6 1904-01-02 01:10:00
Place Jim
Number
2
3
4
5 1904-01-02 02:00:00
6 1904-01-02 02:10:00
相关文章