Wrapper for `__m256` produces a segmentation fault with its constructor - Windows 64 + MinGW + AVX issue

2022-01-23 00:00:00 mingw-w64 g++ c++ avx windows64

I have a union like this:

#include <immintrin.h>

union bareVec8f {
    __m256 m256;      // AVX vector of 8 floats
    float floats[8];
    int ints[8];
    inline bareVec8f() {
    }
    inline bareVec8f(__m256 vec) {
        this->m256 = vec;
    }
    inline bareVec8f &operator=(__m256 m256) {
        this->m256 = m256;
        return *this;
    }

    inline operator __m256 &() {
        return m256;
    }
};

The `__m256` needs to be aligned on a 32-byte boundary to be used with AVX intrinsics, and it should be aligned automatically, even within the union.

When I do this:

bareVec8f test = _mm256_set1_ps(1.0f);

I get a segmentation fault. This code should work because of the constructor I wrote. However, when I do this:

bareVec8f test;
test.m256 = _mm256_set1_ps(8.f);

I don't get a segmentation fault.

Since that works fine, the union is probably aligned properly, and the segmentation fault seems to be caused by the constructor.

I'm using the 64-bit gcc compiler for Windows.

---------------------------------
EDIT: Matt managed to produce the simplest example of the error that seems to be happening here:

#include <immintrin.h>

void foo(__m256 x) {}

int main() {
    __m256 r = _mm256_set1_ps(0.0f);
    foo(r);
}

I'm compiling with -std=c++11 -mavx

Recommended answer

This is a bug in g++ for Windows: it does not perform 32-byte stack alignment when it should. See GCC Bug 49001 and Bug 54412.

On this SO thread, someone wrote a Python script that post-processes the assembly output by g++ to fix the problem, so that would be one option.

Otherwise, to avoid this in your union, you could change the functions that take __m256 by value so that they take it by reference instead. This shouldn't carry any performance penalty unless optimization is low or off.
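The by-reference suggestion can be sketched as follows. This is a minimal sketch that has not been verified against an affected MinGW build. The helper `first_lane_after_construct` is a hypothetical name added for illustration, and the `#pragma GCC target("avx")` line is a GCC-specific stand-in for the `-mavx` flag from the question.

```cpp
#include <immintrin.h>

// GCC-specific stand-in for -mavx so the sketch builds without extra flags.
// Drop this line if you already compile with -mavx.
#pragma GCC target("avx")

// Same layout as the union in the question, but every member function that
// took __m256 by value now takes it by const reference.
union bareVec8f {
    __m256 m256;      // AVX vector of 8 floats
    float floats[8];
    int ints[8];

    bareVec8f() {}
    bareVec8f(const __m256 &vec) { m256 = vec; }
    bareVec8f &operator=(const __m256 &vec) {
        m256 = vec;
        return *this;
    }
    operator __m256 &() { return m256; }
};

// Hypothetical helper: constructs the wrapper through the by-reference
// constructor, then copies the lanes out with an unaligned store and
// returns lane 0.
float first_lane_after_construct(float value) {
    bareVec8f test = _mm256_set1_ps(value);
    float out[8];
    _mm256_storeu_ps(out, test.m256);
    return out[0];
}
```

The failing line from the question, `bareVec8f test = _mm256_set1_ps(1.0f);`, compiles unchanged against this version; only the parameter types differ.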

In case you are unaware: reading a union member other than the one most recently written (union aliasing) is undefined behaviour in C++. For example, it is not permitted to write m256 and then read floats or ints. So perhaps there is a different solution to your problem.
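One way to read individual lanes without going through the union is to copy them out explicitly. This is a minimal sketch, not necessarily what the answer had in mind. The helpers `to_floats`, `to_bits`, `lane_as_float` and `lane_as_bits` are hypothetical names, and the `#pragma GCC target("avx")` line is again a GCC-specific stand-in for `-mavx`.

```cpp
#include <immintrin.h>
#include <cstdint>
#include <cstring>

// GCC-specific stand-in for -mavx so the sketch builds without extra flags.
#pragma GCC target("avx")

// Hypothetical helper: copies the eight float lanes into a plain array.
// The unaligned store places no alignment requirement on `out`.
inline void to_floats(const __m256 &v, float (&out)[8]) {
    _mm256_storeu_ps(out, v);
}

// Hypothetical helper: copies the raw 32-bit pattern of each lane.
// std::memcpy is a well-defined way to reinterpret the bytes.
inline void to_bits(const __m256 &v, std::int32_t (&out)[8]) {
    static_assert(sizeof(out) == sizeof(v), "array must cover the whole vector");
    std::memcpy(out, &v, sizeof(v));
}

// Hypothetical usage: one lane (0..7) of a broadcast vector, read as a float.
float lane_as_float(float value, int lane) {
    float out[8];
    to_floats(_mm256_set1_ps(value), out);
    return out[lane];
}

// Hypothetical usage: the same lane read as its raw bit pattern.
std::int32_t lane_as_bits(float value, int lane) {
    std::int32_t out[8];
    to_bits(_mm256_set1_ps(value), out);
    return out[lane];
}
```

With helpers like these the wrapper could hold a single `__m256` member instead of a union, at the cost of an explicit copy whenever individual lanes are needed.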
