Structure of Arrays vs Array of Structures

2022-01-10 00:00:00 arrays cuda struct c++

From some comments I have read here, it seems that Structure of Arrays (SoA) is preferred over Array of Structures (AoS) for parallel implementations like CUDA. If that is true, can anyone explain why? Thanks in advance!

Recommended answer

The choice of AoS versus SoA for optimum performance usually depends on the access pattern. This is not limited to CUDA, however: similar considerations apply to any architecture where performance can be significantly affected by the memory access pattern, e.g. where you have caches or where performance is better with contiguous memory access (e.g. coalesced memory accesses in CUDA).

E.g. for RGB pixels versus separate RGB planes:

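#include <stdint.h>

// N is the number of pixels, assumed to be defined elsewhere.

// Array of Structures: one 3-byte element per pixel.
struct {
    uint8_t r, g, b;
} AoS[N];

// Structure of Arrays: one contiguous plane per channel.
struct {
    uint8_t r[N];
    uint8_t g[N];
    uint8_t b[N];
} SoA;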

If you are going to be accessing the R/G/B components of each pixel together, then AoS usually makes sense, since the successive reads of the R, G and B components will be contiguous and usually contained within the same cache line. For CUDA this can also mean memory read/write coalescing.
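As a rough illustration of that per-pixel access pattern, here is a minimal C++ sketch that converts an AoS image to grayscale. The names `Pixel`, `luma` and `to_gray` are hypothetical, and the integer weights are just an approximation of the common 0.299/0.587/0.114 luma weights; none of this comes from the original answer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical AoS pixel type mirroring the struct in the example above.
struct Pixel {
    std::uint8_t r, g, b;
};

// Integer luma approximation: (77*R + 150*G + 29*B) / 256.
// The weights are illustrative and sum to 256, so white maps to 255.
inline std::uint8_t luma(const Pixel& p) {
    return static_cast<std::uint8_t>((77 * p.r + 150 * p.g + 29 * p.b) >> 8);
}

// Every iteration needs all three channels of one pixel. In AoS they are
// three adjacent bytes, so one pixel's data is read from one small region.
void to_gray(const std::vector<Pixel>& image, std::vector<std::uint8_t>& gray) {
    assert(gray.size() == image.size());
    for (std::size_t i = 0; i < image.size(); ++i) {
        gray[i] = luma(image[i]);
    }
}
```

With an SoA layout the same loop would have to read from three separate planes for every output value.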

However, if you are going to process color planes separately then SoA might be preferred, e.g. if you want to scale all R values by some scale factor, then SoA means that all R components will be contiguous.
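The scaling case might look like the following C++ sketch. `PlanarImage` and `scale_red` are hypothetical names, and clamping to 255 is an assumption about how overflow should be handled.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical SoA image: one contiguous plane per channel.
struct PlanarImage {
    std::vector<std::uint8_t> r, g, b;
};

// Scale every R value by `factor`, clamping the result to 255.
// Only the r plane is traversed; the g and b planes are never loaded.
void scale_red(PlanarImage& img, float factor) {
    assert(factor >= 0.0f);
    for (std::uint8_t& v : img.r) {
        const float scaled = static_cast<float>(v) * factor;
        v = static_cast<std::uint8_t>(std::min(scaled, 255.0f));
    }
}
```

Because the loop walks a single contiguous byte array, every byte fetched is one that gets used. With AoS, two of every three bytes brought in would be G and B values the loop does not need.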

One further consideration is padding/alignment. For the RGB example above, each element in an AoS layout is 3 bytes in size, so elements start at multiples of 3 bytes, which may not be convenient for CUDA, SIMD, et al. In some cases this may even require padding within the struct to make alignment more convenient (e.g. adding a dummy uint8_t element to ensure 4 byte alignment). In the SoA case, however, the planes are byte aligned, which can be more convenient for certain algorithms/architectures.
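The effect of the dummy element can be checked at compile time. The sketch below is only illustrative: `Rgb`, `RgbPadded` and `element_offset` are hypothetical names, and the `alignas(4)` is an addition that is not in the original text. A dummy byte alone makes the size 4 but leaves the required alignment at 1.

```cpp
#include <cstddef>
#include <cstdint>

// Layout from the example: three 1-byte members.
struct Rgb {
    std::uint8_t r, g, b;
};

// Hypothetical padded variant: a dummy byte rounds the size up to 4, and
// alignas(4) (an assumption, not in the original) makes each element
// start on a 4-byte boundary.
struct alignas(4) RgbPadded {
    std::uint8_t r, g, b;
    std::uint8_t unused;  // padding only, never read
};

// Struct layout is implementation-defined; these hold on mainstream ABIs.
static_assert(sizeof(Rgb) == 3, "expected a 3-byte element");
static_assert(sizeof(RgbPadded) == 4, "expected a 4-byte element");
static_assert(alignof(RgbPadded) == 4, "expected 4-byte alignment");

// Byte offset of element i within an array of T.
template <typename T>
constexpr std::size_t element_offset(std::size_t i) {
    return i * sizeof(T);
}
```

For example, element 5 starts at byte 15 in an `Rgb` array but at byte 20 in an `RgbPadded` array. The cost is one extra byte of storage per pixel.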

For most image processing type applications the AoS scenario is much more common, but for other applications, or for specific image processing tasks, this may not always be the case. When there is no obvious choice I would recommend AoS as the default.

See also this answer for a more general discussion of AoS vs SoA.
