How do I combine year, month, and day columns into a single datetime column?
Problem description
I have the following DataFrame `df`:
id lat lon year month day
0 381 53.30660 -0.54649 2004 1 2
1 381 53.30660 -0.54649 2004 1 3
2 381 53.30660 -0.54649 2004 1 4
and I want to create a new column `df['Date']` in which the `year`, `month`, and `day` columns are combined in the format `yyyy-m-d`.
Following this post, I did:
`df['Date']=pd.to_datetime(df['year']*10000000000
+df['month']*100000000
+df['day']*1000000,
format='%Y-%m-%d%')`
The result is not what I expected: it starts from 1970 instead of 2004, and it also contains a time component that I did not specify:
id lat lon year month day Date
0 381 53.30660 -0.54649 2004 1 2 1970-01-01 05:34:00.102
1 381 53.30660 -0.54649 2004 1 3 1970-01-01 05:34:00.103
2 381 53.30660 -0.54649 2004 1 4 1970-01-01 05:34:00.104
The dates should be in the `2004-1-2` format, so what am I doing wrong?
Solution
There is a simpler way:
In [250]: df['Date']=pd.to_datetime(df[['year','month','day']])
In [251]: df
Out[251]:
id lat lon year month day Date
0 381 53.3066 -0.54649 2004 1 2 2004-01-02
1 381 53.3066 -0.54649 2004 1 3 2004-01-03
2 381 53.3066 -0.54649 2004 1 4 2004-01-04
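As for why the original attempt landed in 1970: `pd.to_datetime` reads plain integers as nanoseconds since the Unix epoch by default, so a value like 20040102000000 is only about five and a half hours after 1970-01-01. The sketch below reproduces that on a small stand-in frame and, as one possible alternative, builds a `YYYYMMDD` integer and parses it as a string. The names `as_epoch`, `as_int`, and `parsed` are illustrative only, and the stray `%` from the original `format` string is left out here.

```python
import pandas as pd

# Small stand-in for the frame in the question (date parts only).
df = pd.DataFrame({"year": [2004, 2004, 2004],
                   "month": [1, 1, 1],
                   "day": [2, 3, 4]})

# Plain integers are interpreted as nanoseconds since 1970-01-01,
# so this yields timestamps a few hours into 1 January 1970.
as_epoch = pd.to_datetime(df["year"] * 10000000000
                          + df["month"] * 100000000
                          + df["day"] * 1000000)

# One alternative: build YYYYMMDD, convert to strings, and parse
# with an explicit format.
as_int = df["year"] * 10000 + df["month"] * 100 + df["day"]
parsed = pd.to_datetime(as_int.astype(str), format="%Y%m%d")

print(as_epoch)
print(parsed)
```

Passing the three columns directly, as in the answer above, avoids the integer arithmetic altogether.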
From the docs:
Assembling a datetime from multiple columns of a DataFrame. The keys can be common abbreviations like [`year`, `month`, `day`, `minute`, `second`, `ms`, `us`, `ns`] or plurals of the same.
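To illustrate the "plurals" part of that passage, here is a minimal sketch using plural column names plus a `minutes` column. The frame `parts` and its values are made up for this example and do not come from the question.

```python
import pandas as pd

# Hypothetical frame whose columns use the plural key names.
parts = pd.DataFrame({"years": [2004, 2004],
                      "months": [1, 1],
                      "days": [2, 3],
                      "minutes": [30, 45]})

# to_datetime assembles one timestamp per row from the recognised keys.
stamps = pd.to_datetime(parts)
print(stamps)
```

Because every column passed in is treated as a date part, a frame with unrelated columns (such as `id`, `lat`, and `lon` in the question) should be subset first, as the answer does with `df[['year','month','day']]`.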