Sort a pivot by time values - Pandas

2022-01-22 00:00:00 pandas pivot

Problem description

I want to pivot a df and have the result ordered by the time values, not by the column labels.

import pandas as pd

df = pd.DataFrame({
    'Place' : ['John','Alan','Cory','Jim','John','Alan','Cory','Jim'],
    'Number' : ['2','3','5','5','3','4','6','6'],
    'Code' : ['1','2','3','4','1','2','3','4'],
    'Time' : ['1904-01-01 08:00:00','1904-01-01 09:00:00','1904-01-02 01:00:00','1904-01-02 02:00:00','1904-01-01 08:10:00','1904-01-01 09:10:00','1904-01-02 01:10:00','1904-01-02 02:10:00'],
    })

# One row per Number, one column per Place, first Time seen in each cell
df = df.pivot_table(index = 'Number', columns = 'Place', values = 'Time', aggfunc = 'first').fillna('')

Output:

Place                  Alan                 Cory                  Jim                 John
Number                                                                                    
2                                                                      1904-01-01 08:00:00
3       1904-01-01 09:00:00                                            1904-01-01 08:10:00
4       1904-01-01 09:10:00                                                               
5                            1904-01-02 01:00:00  1904-01-02 02:00:00                     
6                            1904-01-02 01:10:00  1904-01-02 02:10:00 

Expected output:

Place                  John                 Alan                 Cory                  Jim
Number                                                                                    
2       1904-01-01 08:00:00                                                               
3       1904-01-01 08:10:00  1904-01-01 09:00:00                                          
4                            1904-01-01 09:10:00                                          
5                                                 1904-01-02 01:00:00  1904-01-02 02:00:00
6                                                 1904-01-02 01:10:00  1904-01-02 02:10:00             

Note: I've only added dummy dates to differentiate times after midnight. Once the df is appropriately sorted, I will drop the dates and leave just the times.


Solution

Unfortunately, pivot_table sorts the column names by default, and there is no parameter to avoid it. So a possible solution is to call DataFrame.reindex with the original unique values of the Place column:

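# If necessary, convert to datetimes and sort by time,
# so that unique() below returns the places in chronological order
df['Time'] = pd.to_datetime(df['Time'])
df = df.sort_values('Time')
df1 = df.pivot_table(index='Number',columns='Place',values='Time',aggfunc='first').fillna('')

# Restore the column order in which each Place first appears in the sorted df
df1 = df1.reindex(columns=df['Place'].unique())
print(df1)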
Place                  John                 Alan                 Cory  
Number                                                                  
2       1904-01-01 08:00:00                                             
3       1904-01-01 08:10:00  1904-01-01 09:00:00                        
4                            1904-01-01 09:10:00                        
5                                                 1904-01-02 01:00:00   
6                                                 1904-01-02 01:10:00   

Place                   Jim  
Number                       
2                            
3                            
4                            
5       1904-01-02 02:00:00  
6       1904-01-02 02:10:00  
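
The question also mentions dropping the dummy dates once the columns are in order. The following is a minimal sketch of one way that could be done, not part of the original answer: it orders the columns by each Place's earliest Time (a variation on the unique() approach above) and then formats the cells as time-of-day strings. The names order, wide and times_only are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'Place': ['John', 'Alan', 'Cory', 'Jim', 'John', 'Alan', 'Cory', 'Jim'],
    'Number': ['2', '3', '5', '5', '3', '4', '6', '6'],
    'Time': ['1904-01-01 08:00:00', '1904-01-01 09:00:00',
             '1904-01-02 01:00:00', '1904-01-02 02:00:00',
             '1904-01-01 08:10:00', '1904-01-01 09:10:00',
             '1904-01-02 01:10:00', '1904-01-02 02:10:00'],
})
df['Time'] = pd.to_datetime(df['Time'])

# Column order: places sorted by their earliest time (assumed to be the desired order)
order = df.groupby('Place')['Time'].min().sort_values().index

# Pivot without fillna('') so the columns stay datetime-typed for the .dt accessor
wide = df.pivot_table(index='Number', columns='Place', values='Time', aggfunc='first')
wide = wide.reindex(columns=order)

# Drop the dummy dates: keep only HH:MM:SS, then blank out the missing cells
times_only = wide.apply(lambda col: col.dt.strftime('%H:%M:%S')).fillna('')
print(times_only)
```

Formatting is done last because the strings '01:00:00' and '08:00:00' would no longer sort in the intended order once the dates that distinguish the after-midnight times are gone.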
