Autocorrelation to estimate periodicity with numpy

2022-01-11 00:00:00 python numpy time-series correlation

Problem description

I have a large set of time series (> 500), and I'd like to select only the ones that are periodic. I did a bit of literature research and found out that I should look at autocorrelation. Using numpy, I calculate the autocorrelation as:

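```python
import numpy as np

def autocorr(x):
    norm = x - np.mean(x)
    result = np.correlate(norm, norm, mode='full')
    # keep lags 0 .. n-1; integer division is required for indexing in Python 3
    acorr = result[result.size // 2:]
    # divide lag k by the variance times the number of overlapping samples (n - k)
    acorr /= (x.var() * np.arange(x.size, 0, -1))
    return acorr
```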

This returns a set of coefficients (r?) that, when plotted, should tell me whether the time series is periodic or not.

I generated two toy examples:

```python
# random signal
s1 = np.random.randint(5, size=80)
# periodic signal
s2 = np.array([5, 2, 3, 1] * 20)
```

When I generate the autocorrelation plots I obtain:

The second autocorrelation vector clearly indicates some periodicity:

Autocorr1 = [1, 0.28, -0.06, 0.19, -0.22, -0.13, 0.07 ..]
Autocorr2 = [1, -0.50, -0.49, 1, -0.50, -0.49, 1 ..]

My question is: how can I automatically determine, from the autocorrelation vector, whether a time series is periodic? Is there a way to summarise the values into a single coefficient, e.g. 1 = perfect periodicity, 0 = no periodicity at all? I tried calculating the mean, but it is not meaningful. Should I look at the number of 1s?

Solution

I would use mode='same' instead of mode='full', because with mode='full' we also get covariances for extreme shifts, where only one array element overlaps with the other copy and the rest is zero padding. Those are not going to be interesting. With mode='same', at least half of the shifted array overlaps the original one.
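To see which lags each mode keeps, here is a minimal sketch comparing the two outputs on the periodic toy signal. It assumes NumPy's behaviour for two equal-length inputs, where 'same' returns the n central values of the 'full' result, so zero lag sits at index n//2 of the 'same' output (the indexing the answer's code relies on). The variable names are illustrative only.

```python
import numpy as np

# periodic toy signal from the question
x = np.array([5, 2, 3, 1] * 20, dtype=float)
n = x.size
norm = x - x.mean()

full = np.correlate(norm, norm, mode='full')   # lags -(n-1) .. n-1
same = np.correlate(norm, norm, mode='same')   # the n central lags

print(full.size, same.size)  # 159 80

# zero lag is at index n-1 in 'full' and at index n//2 in 'same'
start = n - 1 - n // 2
print(np.allclose(same, full[start:start + n]))

# largest positive lag kept by 'same' and the overlap it still has
max_lag = n - n // 2 - 1
print(max_lag, n - max_lag)  # 39 41
```

Even the largest lag kept by 'same' still compares 41 of the 80 samples, whereas 'full' goes all the way down to a single overlapping sample.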

Also, to get the true correlation coefficient (r), you need to divide by the size of the overlap, not by the size of the original x (in my code, these are np.arange(n-1, n//2, -1)). Then each of the outputs will be between -1 and 1.
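A minimal sketch of why the divisor matters, using the periodic toy signal: at lag 4 the signal matches itself exactly, but dividing by the full length n shrinks r in proportion to the missing overlap. The helper name `acf` and its `by_overlap` flag are hypothetical, chosen only for this illustration.

```python
import numpy as np

def acf(x, by_overlap=True):
    """Autocorrelation for lags 0 .. n-1 (hypothetical helper).

    by_overlap=True divides lag k by its overlap n - k;
    by_overlap=False divides every lag by n.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    norm = x - x.mean()
    full = np.correlate(norm, norm, mode='full')
    sums = full[n - 1:]                          # lags 0 .. n-1
    counts = np.arange(n, 0, -1) if by_overlap else np.full(n, n)
    return sums / (x.var() * counts)

s2 = np.array([5, 2, 3, 1] * 20)
r_overlap = acf(s2)[4]
r_length = acf(s2, by_overlap=False)[4]
print(r_overlap, r_length)  # 1.0 0.95
```

With the overlap divisor, the perfect repeat at lag 4 scores exactly 1; dividing by n instead gives 76/80 = 0.95.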

A glance at the Durbin–Watson statistic, which is approximately 2(1-r), suggests that people consider values below 1 a significant indication of autocorrelation; this corresponds to r > 0.5, so that is the threshold I use below. For a statistically sound treatment of the significance of autocorrelation, refer to the statistics literature; a starting point would be to have a model for your time series.
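For reference, the Durbin–Watson statistic of a demeaned series e is d = sum((e[t] - e[t-1])**2) / sum(e[t]**2). The sketch below computes d next to the lag-1 coefficient r = sum(e[t] * e[t-1]) / sum(e[t]**2), so the approximation d ≈ 2(1-r) can be checked numerically. It is only an illustration on a plain demeaned series, whereas the statistic is normally applied to regression residuals; the helper names are hypothetical, and statsmodels offers `statsmodels.stats.stattools.durbin_watson` if you prefer a library function.

```python
import numpy as np

def durbin_watson(e):
    """d = sum((e[t] - e[t-1])**2) / sum(e[t]**2)."""
    e = np.asarray(e, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

def lag1_r(e):
    """Lag-1 autocorrelation r = sum(e[t] * e[t-1]) / sum(e[t]**2)."""
    e = np.asarray(e, dtype=float)
    return np.sum(e[1:] * e[:-1]) / np.sum(e ** 2)

s2 = np.array([5, 2, 3, 1] * 20)
e = s2 - s2.mean()          # demeaned periodic toy signal
d = durbin_watson(e)
r = lag1_r(e)
print(round(d, 3), round(2 * (1 - r), 3))  # 3.337 3.384
```

Here d is well above 2 because neighbouring samples of s2 are negatively correlated; by the same approximation, d < 1 corresponds to r > 0.5.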

```python
import numpy as np

def autocorr(x):
    n = x.size
    norm = (x - np.mean(x))
    result = np.correlate(norm, norm, mode='same')
    # lags 1 .. n - n//2 - 1, each divided by its overlap size n - lag
    acorr = result[n//2 + 1:] / (x.var() * np.arange(n-1, n//2, -1))
    # strongest correlation (positive or negative) and its lag
    lag = np.abs(acorr).argmax() + 1
    r = acorr[lag-1]
    if np.abs(r) > 0.5:
        print('Appears to be autocorrelated with r = {}, lag = {}'.format(r, lag))
    else:
        print('Appears to be not autocorrelated')
    return r, lag
```

Output for your two toy examples:

Appears to be not autocorrelated
Appears to be autocorrelated with r = 1.0, lag = 4
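Since the question involves more than 500 series, one way to apply the same rule is to wrap it in a function that returns values instead of printing, and loop over the collection. This is a sketch under assumptions: the series are 1-D NumPy arrays, the cutoff is the 0.5 threshold used above, constant series (zero variance) are reported as not autocorrelated, and the names `autocorr_peak` and `screen` are hypothetical.

```python
import numpy as np

def autocorr_peak(x):
    """Return (r, lag) of the strongest autocorrelation, using the same
    'same'-mode, overlap-normalised estimate as the answer's autocorr."""
    x = np.asarray(x, dtype=float)
    n = x.size
    var = x.var()
    if var == 0:                      # constant series: nothing to correlate
        return 0.0, 0
    norm = x - x.mean()
    result = np.correlate(norm, norm, mode='same')
    acorr = result[n // 2 + 1:] / (var * np.arange(n - 1, n // 2, -1))
    lag = int(np.abs(acorr).argmax()) + 1
    return float(acorr[lag - 1]), lag

def screen(series, threshold=0.5):
    """Return (index, r, lag) for series whose peak |r| exceeds threshold."""
    selected = []
    for i, x in enumerate(series):
        r, lag = autocorr_peak(x)
        if abs(r) > threshold:
            selected.append((i, r, lag))
    return selected

rng = np.random.default_rng(0)
series = [
    rng.integers(0, 5, size=80),              # random, like s1
    np.array([5, 2, 3, 1] * 20),              # periodic, like s2
    np.sin(2 * np.pi * np.arange(80) / 10),   # sine with period 10
    np.full(80, 3.0),                         # constant
]
for i, r, lag in screen(series):
    print(i, round(r, 2), lag)
```

Because the rule uses the absolute value, a strong negative peak, such as the one a sine shows at half its period, also counts as autocorrelated; if only positive peaks should indicate periodicity, compare r itself against the threshold instead.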