Calculating the variance of large numbers

2022-01-07 00:00:00 statistics math largenumber variance c++

I haven't really used variance calculations much, and I don't quite know what to expect. Actually, I'm not very good at math at all.

I have an array of 1,000,000 random numeric values in the range 0-10000.

The array could grow even larger, so I use a 64-bit int for the sum.

I have tried to find code showing how to calculate the variance, but I don't know whether my output is correct.

The mean is 4692 and the median is 4533. I get a variance of 1483780.469308 using the following code:

// size is the element count, in this case 1000000
// value_sum is __int64

double p2 = pow( (double)(value_sum - (value_sum/size)), (double)2.0 );
double variance = sqrt( (double)(p2 / (size-1)) );

Am I getting a reasonable value?

Is there something wrong with the calculation?

Recommended answer

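Note: It doesn't look like you're calculating the variance. Your code never looks at the individual elements: it squares a single number computed from the total sum, and the final sqrt would turn a variance into a standard deviation anyway.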

Variance is calculated by subtracting the mean from every element, squaring each of these differences, and averaging the squares.
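Written out as a formula, with N standing for size and x_i for MyArray[i], and checked by hand on a small made-up data set:

```latex
\mu = \frac{1}{N}\sum_{i=1}^{N} x_i ,
\qquad
\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} \left(x_i - \mu\right)^2

% Example: x = (2, 4, 4, 4, 5, 5, 7, 9), N = 8
\mu = \frac{40}{8} = 5 ,
\qquad
\sigma^2 = \frac{9 + 1 + 1 + 1 + 0 + 0 + 4 + 16}{8} = \frac{32}{8} = 4
```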

So what you need to do is:

// Get mean
double mean = static_cast<double>(value_sum)/size;

// Calculate variance
double variance = 0;
for(int i = 0;i<size;++i) 
{
  variance += (MyArray[i]-mean)*(MyArray[i]-mean)/size;
}

// Display
cout<<variance;

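Note that dividing by size gives the population variance: the variance of exactly these values, with every element weighted equally by 1/size. No assumption about a uniform distribution is involved. Applied to a sample drawn from a larger population, the same formula is called the uncorrected sample variance.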

Also, after some digging around, I found that this is not an unbiased estimator of the variance of the underlying distribution. Wolfram Alpha has more to say about this, but as an example, MATLAB's var function returns the "bias-corrected sample variance" by default.

The bias-corrected variance can be obtained by dividing each term by size-1 instead of size:

//Please check that size > 1
variance += (MyArray[i]-mean)*(MyArray[i]-mean)/(size-1); 

Also note that the mean is still computed the same way, by dividing by size.

相关文章