SEGFAULT in -O3 mode?

2022-01-12 00:00:00 gcc optimization segmentation-fault c++

I have reduced my problem to the following short program.

It causes a SEGFAULT in -O3 mode only (-O2 works fine). According to gdb, it happens at the *f = 0 line.

#include <iostream>

void func1(int s, int t)
{
        char* buffer = new char[s + t*sizeof(float)];
        if (!buffer)
        {
            std::cout << "new failed\n";
            return;
        }
        float* f = (float*)(buffer + s);
        for (int i = 0; i < t; ++i)
        {
            *f = 0;
            //std::cout << i << std::endl; // if uncomment this line everything will work fine
            ++f;
        }
        delete [] buffer;
        std::cout << "done\n";
}

int main()
{
        int s = 31, t = 12423138;
        std::cout << s << " " << t << std::endl;
        func1(s, t);
        return 0;
}

Please let me know what I am doing wrong.

Recommended answer

The source of the SEGFAULT was not solely a violation of the strict aliasing rule, since the problem persisted even with the -fno-strict-aliasing flag.

It was indeed caused by accessing unaligned memory, but it is not as simple as that. Modern processors generally allow unaligned memory access, and nowadays there is not even much overhead. I did some benchmarking and did not observe a big difference between aligned and unaligned reads on my Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz. There are also some very similar (and more or less recent) results on the web.
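To make the misalignment concrete: in the program from the question, the float region starts at buffer + s with s = 31, which is not a multiple of alignof(float). The following is a minimal sketch of how one could check this; the helper names is_aligned_for and align_up are hypothetical and not part of the original code.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: true if p is suitably aligned for an object of type T.
template <typename T>
bool is_aligned_for(const void* p)
{
    return reinterpret_cast<std::uintptr_t>(p) % alignof(T) == 0;
}

// Hypothetical helper: rounds a byte offset up to the next multiple of alignof(T).
template <typename T>
std::size_t align_up(std::size_t offset)
{
    return (offset + alignof(T) - 1) / alignof(T) * alignof(T);
}
```

With these helpers, is_aligned_for<float>(buffer + 31) would report false for a buffer obtained from new char[], while buffer + align_up<float>(31) lands on a float boundary.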

My problem was that -O3 enables the -ftree-vectorize flag, so my for loop was vectorized (as I could see using the -ftree-vectorizer-verbose flag). And (AFAIU) the vector instructions generated for the loop assume a properly aligned float pointer and do not tolerate unaligned addresses, so there was a runtime exception.

This article helped me a lot in understanding the theory, though it seems that unaligned memory access today is not as harmful as it used to be, even if it is still tricky.
