Convert pandas timezone-aware DateTimeIndex to naive timestamp, but in certain timezone

2022-01-16 00:00:00 python pandas datetime timezone

Problem description

You can use the function tz_localize to make a Timestamp or DateTimeIndex timezone aware, but how can you do the opposite: how can you convert a timezone aware Timestamp to a naive one, while preserving its timezone?

An example:

In [82]: t = pd.date_range(start="2013-05-18 12:00:00", periods=10, freq='s', tz="Europe/Brussels")

In [83]: t
Out[83]: 
<class 'pandas.tseries.index.DatetimeIndex'>
[2013-05-18 12:00:00, ..., 2013-05-18 12:00:09]
Length: 10, Freq: S, Timezone: Europe/Brussels
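
The question notes that tz_localize attaches a timezone. As a minimal sketch, assuming a reasonably recent pandas release (the <class ...> output above comes from a much older one), the same aware index can be built by localizing a naive range:

```python
import pandas as pd

# Naive range of 10 one-second steps; no timezone attached yet.
naive = pd.date_range(start="2013-05-18 12:00:00", periods=10, freq="s")

# tz_localize interprets the naive wall-clock times as Europe/Brussels times.
t = naive.tz_localize("Europe/Brussels")

print(t.tz)    # Europe/Brussels
print(t[0])    # 2013-05-18 12:00:00+02:00
```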

I could remove the timezone by setting it to None, but then the result is converted to UTC (12 o'clock became 10):

In [86]: t.tz = None

In [87]: t
Out[87]: 
<class 'pandas.tseries.index.DatetimeIndex'>
[2013-05-18 10:00:00, ..., 2013-05-18 10:00:09]
Length: 10, Freq: S, Timezone: None

Is there another way I can convert a DateTimeIndex to timezone naive, but while preserving the timezone it was set in?

Some context on the reason I am asking this: I want to work with timezone naive timeseries (to avoid the extra hassle with timezones, and I do not need them for the case I am working on).
But for some reason, I have to deal with a timezone-aware timeseries in my local timezone (Europe/Brussels). As all my other data are timezone naive (but represented in my local timezone), I want to convert this timeseries to naive to further work with it, but it also has to be represented in my local timezone (so just remove the timezone info, without converting the user-visible time to UTC).

I know the time is actually stored internally as UTC and only converted to another timezone when you represent it, so there has to be some kind of conversion when I want to "delocalize" it. For example, with the Python datetime module you can "remove" the timezone like this:

In [119]: d = pd.Timestamp("2013-05-18 12:00:00", tz="Europe/Brussels")

In [120]: d
Out[120]: <Timestamp: 2013-05-18 12:00:00+0200 CEST, tz=Europe/Brussels>

In [121]: d.replace(tzinfo=None)
Out[121]: <Timestamp: 2013-05-18 12:00:00> 
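
The claim about UTC storage can be checked directly. A minimal sketch, assuming recent pandas, where Timestamp.value exposes the stored integer (nanoseconds since the Unix epoch, in UTC):

```python
import pandas as pd

d = pd.Timestamp("2013-05-18 12:00:00", tz="Europe/Brussels")

# 12:00 CEST and 10:00 UTC are the same instant, so the stored integers match.
utc_equivalent = pd.Timestamp("2013-05-18 10:00:00", tz="UTC")
print(d.value == utc_equivalent.value)   # True

# replace(tzinfo=None) keeps the wall-clock fields (12:00) and drops the zone,
# so the stored integer shifts by the +02:00 offset.
naive = d.replace(tzinfo=None)
print(naive)                  # 2013-05-18 12:00:00
print(naive.value - d.value)  # 7200000000000 (two hours in nanoseconds)
```

This is the "conversion" the question anticipates: delocalizing keeps what the clock shows and changes the underlying number.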

So, based on this, I could do the following, but I suppose this will not be very efficient when working with a larger timeseries:

In [124]: t
Out[124]: 
<class 'pandas.tseries.index.DatetimeIndex'>
[2013-05-18 12:00:00, ..., 2013-05-18 12:00:09]
Length: 10, Freq: S, Timezone: Europe/Brussels

In [125]: pd.DatetimeIndex([i.replace(tzinfo=None) for i in t])
Out[125]: 
<class 'pandas.tseries.index.DatetimeIndex'>
[2013-05-18 12:00:00, ..., 2013-05-18 12:00:09]
Length: 10, Freq: None, Timezone: None


Solution

To answer my own question, this functionality has been added to pandas in the meantime. Starting from pandas 0.15.0, you can use tz_localize(None) to remove the timezone resulting in local time.
See the whatsnew entry: http://pandas.pydata.org/pandas-docs/stable/whatsnew.html#timezone-handling-improvements

So with my example from above:

In [4]: t = pd.date_range(start="2013-05-18 12:00:00", periods=2, freq='H',
                          tz= "Europe/Brussels")

In [5]: t
Out[5]: DatetimeIndex(['2013-05-18 12:00:00+02:00', '2013-05-18 13:00:00+02:00'],
                       dtype='datetime64[ns, Europe/Brussels]', freq='H')

Using tz_localize(None) removes the timezone information, resulting in naive local time:

In [6]: t.tz_localize(None)
Out[6]: DatetimeIndex(['2013-05-18 12:00:00', '2013-05-18 13:00:00'], 
                      dtype='datetime64[ns]', freq='H')
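
The example works on a DatetimeIndex. A sketch, assuming recent pandas, of the same call on the other containers a timeseries usually lives in: tz-aware values in a Series go through the .dt accessor, and a tz-aware index on a DataFrame can be handled with DataFrame.tz_localize. The names s and df are illustrative only.

```python
import pandas as pd

t = pd.date_range(start="2013-05-18 12:00:00", periods=2, freq="h",
                  tz="Europe/Brussels")

# Series whose *values* are tz-aware: use the .dt accessor.
s = pd.Series(t)
s_naive = s.dt.tz_localize(None)

# DataFrame whose *index* is tz-aware: DataFrame.tz_localize acts on the index.
df = pd.DataFrame({"x": [1.0, 2.0]}, index=t)
df_naive = df.tz_localize(None)

print(s_naive.tolist())
print(df_naive.index)
```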

Further, you can also use tz_convert(None) to remove the timezone information while converting to UTC, which yields naive UTC time:

In [7]: t.tz_convert(None)
Out[7]: DatetimeIndex(['2013-05-18 10:00:00', '2013-05-18 11:00:00'], 
                      dtype='datetime64[ns]', freq='H')
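
To make the difference between the two calls concrete, the sketch below (assuming recent pandas; the January date is an addition, not part of the original example) subtracts the two naive results. The gap is the UTC offset in effect at each instant, so it follows daylight saving time:

```python
import pandas as pd

# One winter and one summer timestamp, both at 12:00 Brussels wall-clock time.
t = pd.DatetimeIndex(["2013-01-18 12:00:00", "2013-05-18 12:00:00"]).tz_localize(
    "Europe/Brussels"
)

local_naive = t.tz_localize(None)  # keep the wall-clock time, drop the zone
utc_naive = t.tz_convert(None)     # convert to UTC, then drop the zone

# +01:00 in January (CET), +02:00 in May (CEST).
offsets = local_naive - utc_naive
print(offsets)

# tz_convert(None) behaves like converting to UTC and then stripping the zone.
print((utc_naive == t.tz_convert("UTC").tz_localize(None)).all())  # True
```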

This is much faster than the datetime.replace solution:

In [31]: t = pd.date_range(start="2013-05-18 12:00:00", periods=10000, freq='H',
                           tz="Europe/Brussels")

In [32]: %timeit t.tz_localize(None)
1000 loops, best of 3: 233 µs per loop

In [33]: %timeit pd.DatetimeIndex([i.replace(tzinfo=None) for i in t])
10 loops, best of 3: 99.7 ms per loop
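
The %timeit magic above only works in IPython. A plain-script sketch of the same comparison, assuming recent pandas and using the standard timeit module; the function names are hypothetical, and timings depend on machine and pandas version, so none are claimed here. Note that 10000 hourly steps cross DST transitions: at the October change the naive result contains a repeated hour, and both approaches produce it identically.

```python
import timeit

import pandas as pd

t = pd.date_range(start="2013-05-18 12:00:00", periods=10000, freq="h",
                  tz="Europe/Brussels")


def vectorized():
    # Single vectorized call on the whole index.
    return t.tz_localize(None)


def per_element():
    # Python-level loop: one Timestamp.replace call per element.
    return pd.DatetimeIndex([ts.replace(tzinfo=None) for ts in t])


# Sanity check: both approaches give the same naive local times.
print("same result:", (vectorized() == per_element()).all())

for fn in (vectorized, per_element):
    # Best of 3 repeats, each running the function 5 times.
    seconds = min(timeit.repeat(fn, number=5, repeat=3)) / 5
    print(f"{fn.__name__}: {seconds * 1e3:.3f} ms per call")
```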
