How is a vector's data aligned?

2021-12-21 00:00:00 vector c++ sse allocator memory-alignment

If I want to process the data in a std::vector with SSE, I need 16-byte alignment. How can I achieve that? Do I need to write my own allocator, or does the default allocator already align to 16-byte boundaries?

Recommended answer

The C++ standard requires allocation functions (malloc() and operator new()) to return memory suitably aligned for any standard type. Because these functions do not receive the alignment requirement as an argument, in practice every allocation gets the same alignment: that of the standard type with the largest alignment requirement, which is often long double and/or long long (see the Boost max_align union).
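As a rough illustration of that guarantee, the sketch below checks the addresses returned by malloc() and plain operator new against alignof(std::max_align_t), the standard name for the most strictly aligned fundamental type. The helper names is_aligned and check_default_alignment are made up for this example. On many x86-64 platforms the value is 16, which happens to cover SSE but not AVX; that is platform-specific and should not be relied on.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>

// Hypothetical helper: true when p is a multiple of `alignment` bytes.
inline bool is_aligned(const void* p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

// Checks that malloc() and plain operator new both return storage
// aligned for std::max_align_t. Nothing stronger is promised.
inline void check_default_alignment() {
    void* a = std::malloc(64);
    void* b = ::operator new(64);
    assert(a != nullptr);
    assert(is_aligned(a, alignof(std::max_align_t)));
    assert(is_aligned(b, alignof(std::max_align_t)));
    ::operator delete(b);
    std::free(a);
}
```

Calling check_default_alignment() passing on one machine says nothing about 32-byte alignment, which is why AVX data needs one of the approaches described in the rest of the answer.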

Vector instructions such as SSE and AVX have stronger alignment requirements (16-byte alignment for 128-bit access and 32-byte alignment for 256-bit access) than the standard C++ allocation functions provide. posix_memalign() or memalign() can be used to satisfy allocations with these stronger requirements.
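For the original question (a std::vector of plain floats before C++17), one possible approach is a small custom allocator built on posix_memalign(). The sketch below is only an illustration under assumptions: AlignedAllocator, SseFloatVector, starts_on_16_bytes and demo_aligned_vector are hypothetical names, and posix_memalign() is available on POSIX systems only (Windows would need a different call).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <stdlib.h>  // posix_memalign, free (POSIX only)
#include <vector>

// Hypothetical minimal allocator: every buffer starts on an Alignment-byte boundary.
template <class T, std::size_t Alignment>
struct AlignedAllocator {
    static_assert((Alignment & (Alignment - 1)) == 0 && Alignment % sizeof(void*) == 0,
                  "posix_memalign needs a power of two that is a multiple of sizeof(void*)");
    using value_type = T;

    // Explicit rebind is required because of the non-type template parameter.
    template <class U>
    struct rebind { using other = AlignedAllocator<U, Alignment>; };

    AlignedAllocator() noexcept = default;
    template <class U>
    AlignedAllocator(const AlignedAllocator<U, Alignment>&) noexcept {}

    T* allocate(std::size_t n) {
        if (n > static_cast<std::size_t>(-1) / sizeof(T)) throw std::bad_alloc();
        void* p = nullptr;
        if (posix_memalign(&p, Alignment, n * sizeof(T)) != 0) throw std::bad_alloc();
        return static_cast<T*>(p);
    }
    void deallocate(T* p, std::size_t) noexcept { free(p); }
};

// All instances are interchangeable: the allocator holds no state.
template <class T, class U, std::size_t A>
bool operator==(const AlignedAllocator<T, A>&, const AlignedAllocator<U, A>&) noexcept { return true; }
template <class T, class U, std::size_t A>
bool operator!=(const AlignedAllocator<T, A>&, const AlignedAllocator<U, A>&) noexcept { return false; }

using SseFloatVector = std::vector<float, AlignedAllocator<float, 16>>;

// True when the vector's buffer starts on a 16-byte boundary.
inline bool starts_on_16_bytes(const SseFloatVector& v) {
    return reinterpret_cast<std::uintptr_t>(v.data()) % 16 == 0;
}

// Usage sketch: the buffer stays aligned across reallocation.
inline void demo_aligned_vector() {
    SseFloatVector v(100, 1.0f);
    assert(starts_on_16_bytes(v));
    v.resize(10000);
    assert(starts_on_16_bytes(v));
}
```

Only the start of the buffer is aligned this way; a vector whose size is not a multiple of four floats still needs separate handling for the tail elements.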

In C++17, the allocation functions accept an additional argument of type std::align_val_t.

You can use it like this:

#include <immintrin.h>
#include <memory>
#include <new>

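int main() {
    // Aligned array new (C++17): the storage is aligned to alignof(__m256i).
    // unique_ptr's delete[] then selects the matching aligned operator delete[].
    std::unique_ptr<__m256i[]> arr{new(std::align_val_t{alignof(__m256i)}) __m256i[32]};
}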

Moreover, in C++17 the standard allocators have been updated to respect the type's alignment, so you can simply do:

#include <immintrin.h>
#include <vector>

int main() {
    // C++17: std::allocator honors alignof(__m256i), so the buffer is suitably aligned.
    std::vector<__m256i> arr2(32);
}

Or, with no heap allocation involved (supported since C++11):

#include <immintrin.h>
#include <array>

int main() {
    // The elements are stored inline (no heap allocation), aligned to alignof(__m256i).
    std::array<__m256i, 32> arr3;
}
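
To confirm the C++17 std::vector behavior on a particular toolchain, a quick check along the following lines may help. It is a sketch, not part of the original answer: Block is a hypothetical stand-in for __m256i (so the code does not depend on <immintrin.h>), and buffer_is_aligned and demo_overaligned_vector are made-up names. It assumes compilation as C++17 or later; earlier standards make no such promise.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for __m256i: 32 bytes of data with 32-byte alignment.
struct alignas(32) Block {
    float lanes[8];
};

// True when the vector's buffer starts on an alignof(Block) boundary.
inline bool buffer_is_aligned(const std::vector<Block>& v) {
    return reinterpret_cast<std::uintptr_t>(v.data()) % alignof(Block) == 0;
}

// Usage sketch (C++17 or later): the default allocator keeps the
// over-aligned element type aligned, including after the vector grows.
inline void demo_overaligned_vector() {
    std::vector<Block> v(32);
    assert(buffer_is_aligned(v));
    v.resize(4096);
    assert(buffer_is_aligned(v));
}
```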
