Finding the DataFrame row closest to a given time in Pandas
Problem description
I have a Pandas DataFrame which is indexed by a DatetimeIndex:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 53732 entries, 1993-01-07 12:23:58 to 2012-12-02 20:06:23
Data columns:
Date(dd-mm-yy)_Time(hh-mm-ss) 53732 non-null values
Julian_Day 53732 non-null values
AOT_870 53732 non-null values
440-870Angstrom 53732 non-null values
440-675Angstrom 53732 non-null values
500-870Angstrom 53732 non-null values
Last_Processing_Date(dd/mm/yyyy) 53732 non-null values
Solar_Zenith_Angle 53732 non-null values
time 53732 non-null values
dtypes: datetime64[ns](2), float64(6), object(1)
I want to find the row that is closest to a certain time:
image_time = dateutil.parser.parse('2009-07-28 13:39:02')
and find out how close it is. So far, I have tried various approaches based on subtracting the time I want from all of the times and finding the smallest absolute value, but none of them quite works.
For example:
aeronet.index - image_time
gives an error, which I think is because +/- on a DatetimeIndex shifts things, so I tried putting the index into another column and then working on that:
aeronet['time'] = aeronet.index
aeronet.time - image_time
This seems to work, but to do what I want, I need the absolute time difference, not the relative difference. However, just running abs or np.abs on it gives an error:
abs(aeronet.time - image_time)
C:\Python27\lib\site-packages\pandas\core\series.pyc in __repr__(self)
1061 Yields Bytestring in Py2, Unicode String in py3.
1062 """
-> 1063 return str(self)
1064
1065 def _tidy_repr(self, max_vals=20):
C:\Python27\lib\site-packages\pandas\core\series.pyc in __str__(self)
1021 if py3compat.PY3:
1022 return self.__unicode__()
-> 1023 return self.__bytes__()
1024
1025 def __bytes__(self):
C:\Python27\lib\site-packages\pandas\core\series.pyc in __bytes__(self)
1031 """
1032 encoding = com.get_option("display.encoding")
-> 1033 return self.__unicode__().encode(encoding, 'replace')
1034
1035 def __unicode__(self):
C:\Python27\lib\site-packages\pandas\core\series.pyc in __unicode__(self)
1044 else get_option("display.max_rows"))
1045 if len(self.index) > (max_rows or 1000):
-> 1046 result = self._tidy_repr(min(30, max_rows - 4))
1047 elif len(self.index) > 0:
1048 result = self._get_repr(print_header=True,
C:\Python27\lib\site-packages\pandas\core\series.pyc in _tidy_repr(self, max_vals)
1069 """
1070 num = max_vals // 2
-> 1071 head = self[:num]._get_repr(print_header=True, length=False,
1072 name=False)
1073 tail = self[-(max_vals - num):]._get_repr(print_header=False,
AttributeError: 'numpy.ndarray' object has no attribute '_get_repr'
Am I approaching this the right way? If so, how should I get abs to work, so that I can then select the minimum absolute time difference and thus get the closest time? If not, what is the best way to do this with a Pandas time series?
Solution
I think you can try DatetimeIndex.asof to find the most recent label up to and including the input. Then use the returned datetime to select the appropriate row.
If you only need values for a particular column, Series.asof exists and combines the two steps above into one.
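As a minimal sketch of the asof lookup described here, assuming a sorted DatetimeIndex: the three timestamps and AOT_870 values below are made up for illustration and only stand in for the real aeronet frame. Note that asof only looks backwards, so the label it returns is the latest one at or before image_time, which is not necessarily the nearest one overall.

```python
import pandas as pd

# Hypothetical stand-in for the `aeronet` frame (values invented for illustration).
index = pd.to_datetime([
    "2009-07-28 12:00:00",
    "2009-07-28 13:30:00",
    "2009-07-28 13:45:00",
])
aeronet = pd.DataFrame({"AOT_870": [0.10, 0.20, 0.30]}, index=index)
image_time = pd.Timestamp("2009-07-28 13:39:02")

# Step 1: most recent index label at or before image_time.
label = aeronet.index.asof(image_time)

# Step 2: use that label to select the row.
row = aeronet.loc[label]

# Series.asof combines both steps for a single column.
value = aeronet["AOT_870"].asof(image_time)

print(label, value)
```

In this toy data the 13:45:00 row is closer in absolute terms, but asof returns the 13:30:00 label because it never looks past the input time.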
This assumes you want the closest datetime. If you don't care about the date and just want the same time every day, use DataFrame.at_time.
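For the same-time-every-day case, a small sketch of at_time might look like the following. The frame and its values are hypothetical; at_time keeps only rows whose time of day matches exactly, regardless of the date.

```python
import pandas as pd

# Hypothetical data: two days, two of the rows share the same time of day.
index = pd.to_datetime([
    "2009-07-27 13:39:02",
    "2009-07-28 09:00:00",
    "2009-07-28 13:39:02",
])
df = pd.DataFrame({"AOT_870": [0.10, 0.20, 0.30]}, index=index)

# Select every row recorded at exactly 13:39:02, on any date.
same_time = df.at_time("13:39:02")

print(same_time)
```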
False alarm, I had an older version locally. The latest on master should work with np.abs.
In [10]: np.abs(df.time - image_time)
Out[10]:
0 27 days, 13:39:02
1 26 days, 13:39:02
2 25 days, 13:39:02
3 24 days, 13:39:02
4 23 days, 13:39:02
5 22 days, 13:39:02
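Once the absolute differences can be computed, picking the closest row and reporting how close it is takes one more step. The following is a sketch under assumptions: the data is invented, and it uses the Series.abs method, which should be equivalent to the np.abs call above on a timedelta column.

```python
import pandas as pd

# Hypothetical stand-in for the `aeronet` frame (values invented for illustration).
index = pd.to_datetime([
    "2009-07-28 12:00:00",
    "2009-07-28 13:30:00",
    "2009-07-28 13:45:00",
])
aeronet = pd.DataFrame({"AOT_870": [0.10, 0.20, 0.30]}, index=index)
image_time = pd.Timestamp("2009-07-28 13:39:02")

# Copy the index into a column, as in the question, and take absolute differences.
aeronet["time"] = aeronet.index
abs_diff = (aeronet["time"] - image_time).abs()

# idxmin returns the index label of the smallest absolute difference.
closest_label = abs_diff.idxmin()
closest_row = aeronet.loc[closest_label]
how_close = abs_diff.loc[closest_label]

print(closest_label, how_close)
```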
Also, just to clarify:
aeronet.index - image_time doesn't work because subtraction on an Index is a set difference (back in the day, Index used to be constrained to be unique).
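The set-difference behaviour describes the pandas version discussed in this thread. As a hedged alternative that avoids index subtraction altogether, recent pandas versions let you ask the index for the nearest position directly. This sketch assumes a sorted, unique DatetimeIndex and a pandas version where Index.get_indexer accepts method="nearest"; the data is again hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for the `aeronet` frame (values invented for illustration).
index = pd.to_datetime([
    "2009-07-28 12:00:00",
    "2009-07-28 13:30:00",
    "2009-07-28 13:45:00",
])
aeronet = pd.DataFrame({"AOT_870": [0.10, 0.20, 0.30]}, index=index)
image_time = pd.Timestamp("2009-07-28 13:39:02")

# Integer position of the index label nearest to image_time.
pos = aeronet.index.get_indexer([image_time], method="nearest")[0]

nearest_row = aeronet.iloc[pos]
# Subtracting two Timestamps gives a Timedelta; abs() makes it unsigned.
gap = abs(aeronet.index[pos] - image_time)

print(aeronet.index[pos], gap)
```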