How do I combine year, month, and day columns into a single datetime column?

2022-01-13 00:00:00 python pandas datetime timestamp date

Question

I have the following dataframe df:

        id  lat        lon      year    month   day         
0       381 53.30660   -0.54649 2004    1       2       
1       381 53.30660   -0.54649 2004    1       3            
2       381 53.30660   -0.54649 2004    1       4   

I want to create a new column df['Date'] where the year, month, and day columns are combined according to the format yyyy-m-d.

Following this post, I did:

`df['Date']=pd.to_datetime(df['year']*10000000000
                           +df['month']*100000000
                           +df['day']*1000000,
                           format='%Y-%m-%d%')`

The result is not what I expected: it starts from 1970 instead of 2004, and it also contains a time component, which I did not specify:

        id  lat        lon      year    month   day  Date           
0       381 53.30660   -0.54649 2004    1       2    1970-01-01 05:34:00.102    
1       381 53.30660   -0.54649 2004    1       3    1970-01-01 05:34:00.103         
2       381 53.30660   -0.54649 2004    1       4    1970-01-01 05:34:00.104

The dates should be in the 2004-1-2 format, so what am I doing wrong?


Solution

There is a simpler way:

In [250]: df['Date']=pd.to_datetime(df[['year','month','day']])

In [251]: df
Out[251]:
    id      lat      lon  year  month  day       Date
0  381  53.3066 -0.54649  2004      1    2 2004-01-02
1  381  53.3066 -0.54649  2004      1    3 2004-01-03
2  381  53.3066 -0.54649  2004      1    4 2004-01-04
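
As for why the original attempt landed in 1970: when `pd.to_datetime` receives plain integers and no `unit`, it reads them as nanoseconds since 1970-01-01, so the large packed integer becomes a few hours after the epoch. The sketch below reproduces that on a cut-down copy of the frame (only the three date columns) and then shows one working integer-based variant. The `YYYYMMDD` packing is an assumption about what the referenced post intended, and the names `packed`, `as_epoch_ns`, `compact`, and `parsed` are illustrative. The `format` argument from the question is left out here, since it appears to have had no effect on the integer input.

```python
import pandas as pd

# Cut-down copy of the question's frame: only the date parts are needed.
df = pd.DataFrame({"year": [2004, 2004, 2004],
                   "month": [1, 1, 1],
                   "day": [2, 3, 4]})

# Arithmetic from the question: 2004-1-2 becomes the integer 20040102000000.
packed = df["year"] * 10000000000 + df["month"] * 100000000 + df["day"] * 1000000

# With no unit given, integers are read as nanoseconds since 1970-01-01,
# so 20040102000000 ns is about 5 h 34 min after the epoch.
as_epoch_ns = pd.to_datetime(packed)
print(as_epoch_ns)

# Assumed intent: pack the date as YYYYMMDD, turn it into text, then parse
# that text with an explicit format.
compact = df["year"] * 10000 + df["month"] * 100 + df["day"]
parsed = pd.to_datetime(compact.astype(str), format="%Y%m%d")
print(parsed)
```

Passing the three columns directly, as in the answer above, avoids the packing step entirely and is probably the less error-prone choice.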

From the docs:

Assembling a datetime from multiple columns of a DataFrame. The keys can be common abbreviations like [year, month, day, minute, second, ms, us, ns] or plurals of the same.
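
To illustrate the quoted passage, here is a minimal sketch that uses the plural column names and adds an hour column. The frame `parts` and its values are made up for the example. It also shows one way to get the unpadded yyyy-m-d text the question asked for: a datetime column has no display format of its own, so the text is built from the date components as a separate string column.

```python
import pandas as pd

# Hypothetical frame using plural keys; "hours" is optional extra precision.
parts = pd.DataFrame({"years": [2004, 2004],
                      "months": [1, 1],
                      "days": [2, 3],
                      "hours": [9, 18]})

# The column names tell pandas which component each column holds.
stamps = pd.to_datetime(parts)
print(stamps)

# Unpadded yyyy-m-d text, built from the components rather than strftime
# (unpadded strftime codes differ between platforms).
text = (stamps.dt.year.astype(str) + "-"
        + stamps.dt.month.astype(str) + "-"
        + stamps.dt.day.astype(str))
print(text)
```

Keeping the datetime column for computation and deriving the text only for display lets date arithmetic and sorting keep working.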
