Creating a sliding window of NaN-padded elements from a 1D NumPy array

2022-01-11 00:00:00 numpy scipy time-series performance vectorization

Problem description

I have a time series x[0], x[1], ... x[n-1], stored as a one-dimensional NumPy array. I would like to convert it to the following matrix:

NaN,  ... ,  NaN,    x[0]
NaN,  ... ,  x[0],   x[1]
.
.
NaN,  x[0], ... , x[n-3], x[n-2]
x[0], x[1], ... , x[n-2], x[n-1]

I would like to use this matrix to speed up time-series calculations. Is there a function in NumPy or SciPy that does this? (I don't want to use a Python for loop.)

Solution

One approach uses np.lib.stride_tricks.as_strided:

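import numpy as np

def nanpad_sliding2D(a):
    L = a.size
    # Prepend L-1 NaNs so that every row of the result has L elements
    a_ext = np.concatenate((np.full(a.size - 1, np.nan), a))
    n = a_ext.strides[0]
    strided = np.lib.stride_tricks.as_strided
    # Row i starts one element after row i-1, so both strides equal n
    return strided(a_ext, shape=(L, L), strides=(n, n))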

Sample run:

In [41]: a
Out[41]: array([48, 82, 96, 34, 93, 25, 51, 26])

In [42]: nanpad_sliding2D(a)
Out[42]:
array([[ nan,  nan,  nan,  nan,  nan,  nan,  nan,  48.],
       [ nan,  nan,  nan,  nan,  nan,  nan,  48.,  82.],
       [ nan,  nan,  nan,  nan,  nan,  48.,  82.,  96.],
       [ nan,  nan,  nan,  nan,  48.,  82.,  96.,  34.],
       [ nan,  nan,  nan,  48.,  82.,  96.,  34.,  93.],
       [ nan,  nan,  48.,  82.,  96.,  34.,  93.,  25.],
       [ nan,  48.,  82.,  96.,  34.,  93.,  25.,  51.],
       [ 48.,  82.,  96.,  34.,  93.,  25.,  51.,  26.]])
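The same padded-window layout can also be sketched with np.lib.stride_tricks.sliding_window_view, which computes the shape and strides itself and returns a read-only view. This is not part of the original answer: it assumes NumPy 1.20 or newer, and the function name nanpad_sliding2D_safe is made up for illustration.

```python
import numpy as np

def nanpad_sliding2D_safe(a):
    # Hypothetical helper name; assumes NumPy >= 1.20 for sliding_window_view.
    L = a.size
    # Same NaN padding as in the as_strided version
    a_ext = np.concatenate((np.full(L - 1, np.nan), a))
    # Windows of length L over a padded array of length 2L-1 give L rows
    return np.lib.stride_tricks.sliding_window_view(a_ext, L)

a = np.array([48, 82, 96, 34, 93, 25, 51, 26])
out = nanpad_sliding2D_safe(a)
print(out.shape)
print(out[0])
print(out[-1])
```

Because the result is read-only by default, an accidental in-place write raises an error instead of silently changing several overlapping rows at once.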

Memory efficiency with strides

As @Eric mentioned in the comments, this strides-based approach is memory efficient, because the output is simply a view into the NaN-padded 1D array. Let's test this:

In [158]: a   # Sample 1D input
Out[158]: array([37, 95, 87, 10, 35])

In [159]: L = a.size   # Run the posted approach
     ...: a_ext = np.concatenate(( np.full(a.size-1,np.nan) ,a))
     ...: n = a_ext.strides[0]
     ...: strided = np.lib.stride_tricks.as_strided
     ...: out = strided(a_ext, shape=(L,L), strides=(n,n))
     ...:

In [160]: np.may_share_memory(a_ext,out)   # O/p might be a view into extended version
Out[160]: True

Let's confirm that the output really is a view by assigning values into a_ext and then checking out.

Initial values of a_ext and out:

In [161]: a_ext
Out[161]: array([ nan,  nan,  nan,  nan,  37.,  95.,  87.,  10.,  35.])

In [162]: out
Out[162]:
array([[ nan,  nan,  nan,  nan,  37.],
       [ nan,  nan,  nan,  37.,  95.],
       [ nan,  nan,  37.,  95.,  87.],
       [ nan,  37.,  95.,  87.,  10.],
       [ 37.,  95.,  87.,  10.,  35.]])

Modify a_ext:

In [163]: a_ext[:] = 100

Check the new out:

In [164]: out
Out[164]:
array([[ 100.,  100.,  100.,  100.,  100.],
       [ 100.,  100.,  100.,  100.,  100.],
       [ 100.,  100.,  100.,  100.,  100.],
       [ 100.,  100.,  100.,  100.,  100.],
       [ 100.,  100.,  100.,  100.,  100.]])

This confirms that it is a view.

Finally, let's check the memory requirements:

In [131]: a_ext.nbytes
Out[131]: 72

In [132]: out.nbytes
Out[132]: 200

So although out.nbytes reports 200 bytes, the output needs no memory beyond the 72 bytes of the extended array, because it is only a view into that array. The nbytes attribute is computed from the shape and item size, so it counts the overlapping elements once per row.
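The view-versus-copy distinction can be checked directly. The sketch below is an illustration rather than part of the original answer: it reuses the 5-element sample, passes writeable=False to as_strided as a precaution against writing through overlapping rows, and uses np.shares_memory for an exact overlap check. The variable names shares, copy and copy_shares are made up here.

```python
import numpy as np

a = np.array([37, 95, 87, 10, 35])
L = a.size
a_ext = np.concatenate((np.full(L - 1, np.nan), a))
n = a_ext.strides[0]

# writeable=False makes the overlapping view read-only
out = np.lib.stride_tricks.as_strided(
    a_ext, shape=(L, L), strides=(n, n), writeable=False
)

# Exact check (may_share_memory only reports that overlap is possible)
shares = np.shares_memory(a_ext, out)

# An explicit copy materializes all L*L elements in new memory
copy = out.copy()
copy_shares = np.shares_memory(a_ext, copy)

print(a_ext.nbytes, out.nbytes, copy.nbytes)
print(shares, copy_shares)
```

Only the copy actually allocates the full L x L block. Operations that return a new array, such as out * 2, allocate a full-size result as well, so the memory saving applies to the view itself and to reductions over it.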

Alternatively, using SciPy's toeplitz:

from scipy.linalg import toeplitz

out = toeplitz(a, np.full(a.size, np.nan))[:, ::-1]
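As an illustration of the kind of time-series calculation the question has in mind, the padded matrix can feed a NaN-aware reduction along its rows. The sketch below is an assumed use case, not something from the original answer: it computes an expanding mean with np.nanmean on the strided matrix and compares it with a cumulative-sum reference. The names windows, expanding_mean and reference are made up for the example.

```python
import numpy as np

def nanpad_sliding2D(a):
    # Strided construction from the answer above
    L = a.size
    a_ext = np.concatenate((np.full(L - 1, np.nan), a))
    n = a_ext.strides[0]
    return np.lib.stride_tricks.as_strided(a_ext, shape=(L, L), strides=(n, n))

a = np.array([48, 82, 96, 34, 93, 25, 51, 26])
windows = nanpad_sliding2D(a)

# Row i holds x[0..i] preceded by NaNs, so nanmean gives the mean of x[0..i]
expanding_mean = np.nanmean(windows, axis=1)

# Reference computed without the matrix, for comparison
reference = np.cumsum(a) / np.arange(1, a.size + 1)

print(np.allclose(expanding_mean, reference))
```

For a simple mean the cumulative-sum form is cheaper, since the matrix route does O(n^2) work. The matrix is more useful for per-window statistics that have no cumulative shortcut.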
