Optimization of raw new[]/delete[] vs std::vector

2021-12-21 00:00:00 compiler-optimization vector c++ c++14

Let's mess around with very basic dynamically allocated memory. We take a vector of 3 elements, set its elements and return the sum of the vector.

In the first test case I used a raw pointer with new[]/delete[]. In the second I used std::vector:

#include <vector>   

int main()
{
  //int *v = new int[3];        // (1)
  auto v = std::vector<int>(3); // (2)


  for (int i = 0; i < 3; ++i)
    v[i] = i + 1;

  int s = 0;
  for (int i = 0; i < 3; ++i)
    s += v[i];

  //delete[] v;                 // (1)
  return s;
}

Assembly of (1) (new[]/delete[])

main:                                   # @main
        mov     eax, 6
        ret

Assembly of (2) (std::vector)

main:                                   # @main
        push    rax
        mov     edi, 12
        call    operator new(unsigned long)
        mov     qword ptr [rax], 0
        movabs  rcx, 8589934593
        mov     qword ptr [rax], rcx
        mov     dword ptr [rax + 8], 3
        test    rax, rax
        je      .LBB0_2
        mov     rdi, rax
        call    operator delete(void*)
.LBB0_2:                                # %std::vector<int, std::allocator<int> >::~vector() [clone .exit]
        mov     eax, 6
        pop     rdx
        ret

Both outputs taken from https://gcc.godbolt.org/ with -std=c++14 -O3

In both versions the returned value is computed at compile time so we see just mov eax, 6; ret.
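
As a rough illustration of why the constant 6 can appear directly in the output, the same arithmetic can be forced into a compile-time constant with a C++14 constexpr function. This is only a sketch of the idea, not a description of what the optimizer does internally; the name sum_first_n and the fixed buffer size of 8 are assumptions made for the example.

```cpp
#include <cassert>

// Hypothetical helper: fills a small buffer with 1..n and sums it,
// mirroring the two loops in main(). Assumes 0 <= n <= 8.
constexpr int sum_first_n(int n)
{
  int values[8] = {};            // fixed-size stand-in for the 3-element buffer
  for (int i = 0; i < n; ++i)
    values[i] = i + 1;

  int s = 0;
  for (int i = 0; i < n; ++i)
    s += values[i];
  return s;
}

// The whole computation is evaluated by the compiler, with no run-time loop.
static_assert(sum_first_n(3) == 6, "sum of 1, 2, 3 is known at compile time");
```

An optimizer is not required to fold the loops in the original program this way, but at -O3 both listings above show that it did.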

With the raw new[]/delete[] the dynamic allocation was completely removed. With std::vector however, the memory is allocated, set and freed.

This happens even with an unused variable, auto v = std::vector<int>(3): new is called, the memory is set, and then delete is called.

I realize this is most likely a near impossible answer to give, but maybe someone has some insights and some interesting answers might pop out.

What are the contributing factors that don't allow compiler optimizations to remove the memory allocation in the std::vector case, like in the raw memory allocation case?

Recommended answer

When using a pointer to a dynamically allocated array (directly using new[] and delete[]), the compiler optimized away the calls to operator new and operator delete even though they have observable side effects. This optimization is allowed by the C++ standard section 5.3.4 paragraph 10:

An implementation is allowed to omit a call to a replaceable global allocation function (18.6.1.1, 18.6.1.2). When it does so, the storage is instead provided by the implementation or...

I'll show the rest of the sentence, which is crucial, at the end.

This optimization is relatively new because it was first allowed in C++14 (proposal N3664). Clang has supported it since 3.4. The latest version of gcc, namely 5.3.0, doesn't take advantage of this relaxation of the as-if rule. It produces the following code:

main:
        sub     rsp, 8
        mov     edi, 12
        call    operator new[](unsigned long)
        mov     DWORD PTR [rax], 1
        mov     DWORD PTR [rax+4], 2
        mov     rdi, rax
        mov     DWORD PTR [rax+8], 3
        call    operator delete[](void*)
        mov     eax, 6
        add     rsp, 8
        ret

MSVC 2013 also doesn't support this optimization. It produces the following code:

main:
  sub         rsp,28h  
  mov         ecx,0Ch  
  call        operator new[] ()  
  mov         rcx,rax  
  mov         dword ptr [rax],1  
  mov         dword ptr [rax+4],2  
  mov         dword ptr [rax+8],3  
  call        operator delete[] ()  
  mov         eax,6  
  add         rsp,28h  
  ret 

I currently don't have access to MSVC 2015 Update 1 and therefore I don't know whether it supports this optimization or not.

Finally, here is the assembly code generated by icc 13.0.1:

main:
        push      rbp                                          
        mov       rbp, rsp                                   
        and       rsp, -128                                    
        sub       rsp, 128                                     
        mov       edi, 3                                       
        call      __intel_new_proc_init                         
        stmxcsr   DWORD PTR [rsp]                               
        mov       edi, 12                                 
        or        DWORD PTR [rsp], 32832                       
        ldmxcsr   DWORD PTR [rsp]                               
        call      operator new[](unsigned long)
        mov       rdi, rax                                      
        mov       DWORD PTR [rax], 1                            
        mov       DWORD PTR [4+rax], 2                          
        mov       DWORD PTR [8+rax], 3                         
        call      operator delete[](void*)
        mov       eax, 6    
        mov       rsp, rbp                           
        pop       rbp                                   
        ret                                          

Clearly, it doesn't support this optimization. I don't have access to the latest version of icc, namely 16.0.

All of these code snippets have been produced with optimizations enabled.

When using std::vector, none of these compilers optimized away the allocation. When a compiler doesn't perform an optimization, it's either because it cannot for some reason or because the optimization is not yet supported.

What are the contributing factors that don't allow compiler optimizations to remove the memory allocation in the std::vector case, like in the raw memory allocation case?

The compiler didn't perform the optimization because it's not allowed to. To see this, let's look at the rest of the sentence in paragraph 10 of 5.3.4:

An implementation is allowed to omit a call to a replaceable global allocation function (18.6.1.1, 18.6.1.2). When it does so, the storage is instead provided by the implementation or provided by extending the allocation of another new-expression.

What this is saying is that you can omit a call to a replaceable global allocation function only if it originated from a new-expression. A new-expression is defined in paragraph 1 of the same section.

The following expression:

new int[3]

is a new-expression and therefore the compiler is allowed to optimize away the associated allocation function call.

On the other hand, the following expression:

::operator new(12)

is NOT a new-expression (see 5.3.4 paragraph 1). This is just a function call expression. In other words, it is treated as a typical function call. This call cannot be optimized away because the function is imported from another shared library (even if you linked the runtime statically, the function itself calls another imported function).
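
One way to observe the difference between the two forms is to replace the global allocation function with a version that counts its calls. The following is a minimal sketch under stated assumptions: the counter g_new_calls and the two helper functions are hypothetical names introduced for illustration, and the count reported for the new-expression depends on the compiler and optimization level.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

static std::size_t g_new_calls = 0;   // hypothetical counter, for illustration only

// Replacement for the replaceable global allocation function.
void* operator new(std::size_t size)
{
  ++g_new_calls;
  if (void* p = std::malloc(size ? size : 1))
    return p;
  throw std::bad_alloc();
}

// Matching replacements so that memory from malloc is released with free.
void operator delete(void* p) noexcept { std::free(p); }
void operator delete(void* p, std::size_t) noexcept { std::free(p); }

// Allocation through a new-expression: a C++14 compiler may omit the call,
// so this returns 0 or 1 depending on the compiler and flags.
std::size_t calls_for_new_expression()
{
  const std::size_t before = g_new_calls;
  int* v = new int[3];
  v[0] = 1;
  delete[] v;
  return g_new_calls - before;
}

// Allocation through a plain function call: the call is an ordinary function
// call with a visible side effect (the counter), so it is counted once.
std::size_t calls_for_direct_call()
{
  const std::size_t before = g_new_calls;
  void* p = ::operator new(12);
  ::operator delete(p);
  return g_new_calls - before;
}
```

Printing both results at different optimization levels shows whether a given compiler uses the permission from paragraph 10. The example relies on the default operator new[] forwarding to the replaced operator new, which is its specified default behavior.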

The default allocator used by std::vector allocates memory using ::operator new and therefore the compiler is not allowed to optimize it away.
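
To make that concrete, here is a minimal sketch of an allocator that obtains storage the same way, through a direct call to ::operator new rather than through a new-expression. PlainNewAllocator, g_allocate_calls and sum_with_plain_new_allocator are hypothetical names for this example; this is not the standard library's actual implementation of std::allocator.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <vector>

static std::size_t g_allocate_calls = 0;   // hypothetical counter, for illustration only

// Minimal C++11-style allocator that, like the default allocator described
// above, gets raw storage from a function call to ::operator new.
template <class T>
struct PlainNewAllocator
{
  using value_type = T;

  PlainNewAllocator() = default;
  template <class U>
  PlainNewAllocator(const PlainNewAllocator<U>&) noexcept {}

  T* allocate(std::size_t n)
  {
    ++g_allocate_calls;
    // A function call expression, not a new-expression.
    return static_cast<T*>(::operator new(n * sizeof(T)));
  }

  void deallocate(T* p, std::size_t) noexcept { ::operator delete(p); }
};

template <class T, class U>
bool operator==(const PlainNewAllocator<T>&, const PlainNewAllocator<U>&) noexcept { return true; }
template <class T, class U>
bool operator!=(const PlainNewAllocator<T>&, const PlainNewAllocator<U>&) noexcept { return false; }

// Same computation as the original test, using the sketch allocator.
int sum_with_plain_new_allocator()
{
  std::vector<int, PlainNewAllocator<int>> v(3);
  for (int i = 0; i < 3; ++i)
    v[i] = i + 1;

  int s = 0;
  for (int i = 0; i < 3; ++i)
    s += v[i];
  return s;
}
```

The vector never contains a new-expression that allocates storage; it only reaches the allocation function through allocate(), which is the situation the paragraph above describes.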

Let's test this. Here's the code:

int main()
{
  int *v =  (int*)::operator new(12);

  for (int i = 0; i < 3; ++i)
    v[i] = i + 1;

  int s = 0;
  for (int i = 0; i < 3; ++i)
    s += v[i];

  delete v; // kept as in the original test; strictly, ::operator delete(v) is the matching deallocation call
  return s;
}

By compiling using Clang 3.7, we get the following assembly code:

main:                                   # @main
        push    rax
        mov     edi, 12
        call    operator new(unsigned long)
        movabs  rcx, 8589934593
        mov     qword ptr [rax], rcx
        mov     dword ptr [rax + 8], 3
        test    rax, rax
        je      .LBB0_2
        mov     rdi, rax
        call    operator delete(void*)
.LBB0_2:
        mov     eax, 6
        pop     rdx
        ret

This is exactly the same as the assembly code generated when using std::vector, except for mov qword ptr [rax], 0, which comes from the constructor of std::vector (the compiler should have removed it but failed to do so because of a flaw in its optimization algorithms).
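
A side note on reading both listings: the immediate in movabs rcx, 8589934593 is the first two elements packed into a single 64-bit value, so one qword store to [rax] writes v[0] = 1 and v[1] = 2 on little-endian x86-64. The arithmetic can be checked with a small sketch; pack_pair is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: places `low` in bits 0..31 and `high` in bits 32..63,
// which is how two adjacent 32-bit ints appear in one little-endian qword.
constexpr std::uint64_t pack_pair(std::uint32_t low, std::uint32_t high)
{
  return (static_cast<std::uint64_t>(high) << 32) | low;
}

// 2 * 2^32 + 1 = 8589934593 = 0x0000000200000001
static_assert(pack_pair(1, 2) == 8589934593ULL, "matches the movabs immediate");
```

The third element does not fit in the same qword, which is why it gets its own mov dword ptr [rax + 8], 3.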
