Why is iterating a 2D array in row-major order faster than column-major?

Here is a simple C++ program that compares iterating over a 2D array in row-major versus column-major order.

#include <iostream>
#include <ctime>

using namespace std;

const int d = 10000;

int** A = new int* [d];

int main(int argc, const char * argv[]) {
    for(int i = 0; i < d; ++i)
        A[i] = new int [d];
    
    clock_t ColMajor = clock();
    
    for(int b = 0; b < d; ++b)
        for(int a = 0; a < d; ++a)
            A[a][b]++;
    
    double col = static_cast<double>(clock() - ColMajor) / CLOCKS_PER_SEC;
    
    clock_t RowMajor = clock();
    for(int a = 0; a < d; ++a)
        for(int b = 0; b < d; ++b)
            A[a][b]++;
    
    double row = static_cast<double>(clock() - RowMajor) / CLOCKS_PER_SEC;
    
    cout << "Row Major : " << row << "\n";
    cout << "Column Major : " << col << "\n";

    return 0;
}

Results for different values of d:

d = 10^3:

Row Major : 0.002431

Column Major : 0.017186

d = 10^4:

Row Major : 0.237995

Column Major : 2.04471

d = 10^5:

Row Major : 53.9561

Column Major : 444.339

Now the question is: why is row-major faster than column-major?

Answer

It obviously depends on the machine you're on, but very generally speaking:

  1. Your computer stores parts of your program's memory in a cache that has a much smaller latency than main memory (even when compensating for cache hit time).

C arrays are stored contiguously in row-major order. This means that if you access element x, then element x+1 is stored in main memory at the location directly following where x is stored.

It's typical for your computer cache to "pre-emptively" fill cache with memory addresses that haven't been used yet, but that are locally close to memory that your program has used already. Think of your computer as saying: "well, you wanted memory at address X so I am going to assume that you will shortly want memory at X+1, therefore I will pre-emptively grab that for you and place it in your cache".

When you enumerate your array in row-major order, you're enumerating it in the same contiguous order in which it's stored in memory, and your machine has already taken the liberty of pre-loading those addresses into the cache for you because it guessed that you would want them. Therefore you achieve a higher rate of cache hits. When you enumerate the array in a non-contiguous manner, your machine likely won't predict the memory access pattern you're applying, so it can't pre-emptively pull those addresses into the cache for you. You incur fewer cache hits, so main memory has to be accessed more frequently, which is slower than your cache.

Also, this might be better suited for https://cs.stackexchange.com/ because the way your system cache behaves is implemented in hardware, and spatial locality questions seem better suited there.
