KDB+ like asof 加入 pandas 中的时间序列数据?

2022-01-11 00:00:00 python pandas join time-series kdb

问题描述

kdb+ 有一个 aj 函数，通常用于沿时间连接表列.

kdb+ has an aj function that is usually used to join tables along time columns.

这是一个示例，其中我有交易和报价表，并且我获得了每笔交易的现行报价.

Here is an example where I have trade and quote tables and I get the prevailing quote for every trade.

q)5# t time sym price size ----------------------------- 09:30:00.439 NVDA 13.42 60511 09:30:00.439 NVDA 13.42 60511 09:30:02.332 NVDA 13.42 100 09:30:02.332 NVDA 13.42 100 09:30:02.333 NVDA 13.41 100 q)5# q time sym bid ask bsize asize ----------------------------------------- 09:30:00.026 NVDA 13.34 13.44 3 16 09:30:00.043 NVDA 13.34 13.44 3 17 09:30:00.121 NVDA 13.36 13.65 1 10 09:30:00.386 NVDA 13.36 13.52 21 1 09:30:00.440 NVDA 13.4 13.44 15 17 q)5# aj[`time; t; q] time sym price size bid ask bsize asize ----------------------------------------------------- 09:30:00.439 NVDA 13.42 60511 13.36 13.52 21 1 09:30:00.439 NVDA 13.42 60511 13.36 13.52 21 1 09:30:02.332 NVDA 13.42 100 13.34 13.61 1 1 09:30:02.332 NVDA 13.42 100 13.34 13.61 1 1 09:30:02.333 NVDA 13.41 100 13.34 13.51 1 1

如何使用 pandas 执行相同的操作?我正在使用索引为 datetime64 的交易和报价数据框.

How can I do the same operation using pandas? I am working with trade and quote dataframes where the index is datetime64.

In [55]: quotes.head() Out[55]: bid ask bsize asize 2012-09-06 09:30:00.026000 13.34 13.44 3 16 2012-09-06 09:30:00.043000 13.34 13.44 3 17 2012-09-06 09:30:00.121000 13.36 13.65 1 10 2012-09-06 09:30:00.386000 13.36 13.52 21 1 2012-09-06 09:30:00.440000 13.40 13.44 15 17 In [56]: trades.head() Out[56]: price size 2012-09-06 09:30:00.439000 13.42 60511 2012-09-06 09:30:00.439000 13.42 60511 2012-09-06 09:30:02.332000 13.42 100 2012-09-06 09:30:02.332000 13.42 100 2012-09-06 09:30:02.333000 13.41 100

我看到 pandas 有一个 asof 函数，但它没有在 DataFrame 上定义，只在 Series 对象上.我想可以循环遍历每个系列并将它们一一对齐，但我想知道是否有更好的方法?

I see that pandas has an asof function but that is not defined on the DataFrame, only on the Series object. I guess one could loop through each of the Series and align them one by one, but I am wondering if there is a better way?

解决方案

正如你在问题中提到的，遍历每一列应该适合你:

As you mentioned in the question, looping through each column should work for you:

df1.apply(lambda x: x.asof(df2.index))

我们可能会创建一个更快的 NaN-naive 版本的 DataFrame.asof 来一次性完成所有列.但就目前而言，我认为这是最直接的方法.

We could potentially create a faster NaN-naive version of DataFrame.asof to do all the columns in one shot. But for now, I think this is the most straightforward way.

相关文章