Allocating a large memory block in C++

2021-12-24 00:00:00 memory-management c++

I am trying to allocate a large memory block for a 3D matrix of floating-point values in C++. Its dimensions are 44100x22000x2. This should take exactly 44100x22000x2x4 bytes of memory, which is about 7.7 GB. I am compiling my code using g++ on a 64-bit x86 machine running Ubuntu. When I watch the process in htop, I see the memory usage grow to 32 GB, and then the process is promptly killed. Did I make a mistake in my memory calculation?

Here is my code:

#include <iostream>

using namespace std;
int main(int argc, char* argv[]) {
  int N = 22000;
  int M = 44100;
  float*** a = new float**[N];
  for (int m = 0; m<N; m+=1) {
    cout<<((float)m/(float)N)<<endl;
    a[m] = new float*[M - 1];
    for (int n = 0; n<M - 1; n+=1) {
      a[m][n] = new float[2];
    }
  }
}

My calculation was incorrect, and I was allocating closer to 38 GB. I have now fixed the code to allocate 15 GB.

#include <iostream>

using namespace std;
int main(int argc, char* argv[]) {
  unsigned long  N = 22000;
  unsigned long  M = 44100;
  unsigned long blk_dim = N*(M-1)*2;
  float* blk = new float[blk_dim];
  unsigned long b = (unsigned long) blk;

  float*** a = new float**[N];
  for (int m = 0; m<N; m+=1) {
    unsigned long offset1 = m*(M - 1)*2*sizeof(float);
    a[m] = new float*[M - 1];
    for (int n = 0; n<M - 1; n+=1) {
      unsigned long offset2 = n*2*sizeof(float);
      a[m][n] = (float*)(offset1 + offset2 + b);
    }
  }
}

Recommended answer

You forgot one dimension, as well as the overhead of allocating memory. The code shown allocates memory very inefficiently in the third dimension, resulting in far too much overhead.

float*** a = new float**[N];

This allocates roughly 22000 * sizeof(float**) bytes, which is about 176 KB. Negligible.

a[m] = new float*[M - 1];

A single allocation here is 44099 * sizeof(float*) bytes, but you make 22000 of them. That is 22000 * 44099 * sizeof(float*), or roughly 7.7 GB of additional memory. This is where you stopped counting, but your code isn't done yet. It has a long way to go.

a[m][n] = new float[2];

This is a single allocation of 8 bytes, but it is performed 22000 * 44099 times. That's another 7.7 GB flushed down the drain. You're now at roughly 15 GB of application-requested memory that needs to be allocated.

But each allocation does not come free, and new float[2] requires more than 8 bytes. Each individually allocated block must be tracked internally by your C++ library so that it can be recycled by delete. The most simplistic linked-list-based implementation of heap allocation requires one forward pointer, one backward pointer, and a count of how many bytes are in the allocated block. Assuming nothing needs to be padded for alignment purposes, this is at least 24 bytes of overhead per allocation on a 64-bit platform.
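To make the 24-byte figure concrete, here is a minimal sketch of what such a bookkeeping header might look like. The BlockHeader struct and the constant names are hypothetical, invented for illustration; real allocators such as the one in glibc lay out their metadata differently. The sizes in the comments assume a typical 64-bit platform with 8-byte pointers and 4-byte floats.

```cpp
#include <cstddef>

// Hypothetical header for a naive doubly linked heap allocator.
// This is an illustration only, not the layout of any real C++ runtime.
struct BlockHeader {
    BlockHeader* next;  // forward pointer
    BlockHeader* prev;  // backward pointer
    std::size_t  size;  // number of bytes in the allocated block
};

// Payload requested by `new float[2]` (8 bytes with 4-byte floats).
constexpr std::size_t kPayloadBytes = 2 * sizeof(float);

// Footprint of one such allocation under this naive scheme:
// header plus payload, ignoring any alignment padding.
constexpr std::size_t kFootprintBytes = sizeof(BlockHeader) + kPayloadBytes;
```

Under those assumptions the header alone is three times the size of the 8 bytes of data it tracks.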

Now count the allocations: 22000 * 44099 for the third dimension, 22000 for the second dimension, and one for the first dimension. If I count on my fingers, this requires (22000 * 44099 + 22000 + 1) * 24 bytes, or roughly another 23 GB (about 22 GiB) of memory, just to cover the overhead of the simplest, most basic memory allocation scheme.

If I did my math right, we're now up to about 38 GB of RAM needed with the simplest possible heap allocation tracking. Your C++ implementation is likely to use somewhat more sophisticated heap allocation logic, with larger overhead.
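The tally above can be reproduced as compile-time arithmetic. This is only a sketch of the estimate: the constant and function names are made up for illustration, and the 8-byte pointer, 4-byte float, and 24-byte per-allocation overhead are the assumptions stated in the text, not measured values.

```cpp
#include <cstdint>

constexpr std::uint64_t kN        = 22000;      // first dimension
constexpr std::uint64_t kM        = 44100 - 1;  // second dimension as allocated (M - 1)
constexpr std::uint64_t kPtr      = 8;          // assumed pointer size, 64-bit platform
constexpr std::uint64_t kFloat    = 4;          // assumed sizeof(float)
constexpr std::uint64_t kOverhead = 24;         // assumed bookkeeping bytes per allocation

// new float**[N]: one array of N pointers.
constexpr std::uint64_t level1_bytes() { return kN * kPtr; }

// new float*[M - 1], executed N times.
constexpr std::uint64_t level2_bytes() { return kN * kM * kPtr; }

// new float[2], executed N * (M - 1) times.
constexpr std::uint64_t level3_bytes() { return kN * kM * 2 * kFloat; }

// Total number of separate new[] calls across all three levels.
constexpr std::uint64_t allocation_count() { return kN * kM + kN + 1; }

// Bookkeeping cost under the naive 24-bytes-per-block model.
constexpr std::uint64_t overhead_bytes() { return allocation_count() * kOverhead; }

// Requested memory plus bookkeeping.
constexpr std::uint64_t total_bytes() {
    return level1_bytes() + level2_bytes() + level3_bytes() + overhead_bytes();
}
```

With these inputs the requested memory is about 15.5 GB and the bookkeeping adds about 23.3 GB, for a total of roughly 38.8 GB (decimal gigabytes).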

Get rid of the new float[2]. Compute your matrix's size and new a single 7.7 GB chunk, then calculate where the rest of your pointers should point. Also, allocate a single chunk of memory for the second dimension of your matrix, and compute the pointers for the first dimension.

Your allocation code should execute exactly three new statements: one for the first-dimension pointers, one for the second-dimension pointers, and one more for the huge chunk of data that makes up your third dimension.
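One possible way to arrange those three allocations is sketched below. The Matrix3D struct and the allocate_matrix and free_matrix functions are hypothetical names introduced for this example. The sketch keeps raw new[] to match the question and is not exception-safe: if a later new throws, the earlier blocks leak.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical holder for the three blocks; the names are illustrative only.
struct Matrix3D {
    float*** rows;  // n pointers, each into `cols`
    float**  cols;  // n * m pointers, each into `data`
    float*   data;  // n * m * k floats in one contiguous block
};

// Allocates an n x m x k matrix with exactly three new[] expressions.
// The float data is left uninitialized, as in the original code.
Matrix3D allocate_matrix(std::size_t n, std::size_t m, std::size_t k) {
    assert(n > 0 && m > 0 && k > 0);
    Matrix3D mat;
    mat.rows = new float**[n];       // first-dimension pointers
    mat.cols = new float*[n * m];    // second-dimension pointers
    mat.data = new float[n * m * k]; // the actual data

    for (std::size_t i = 0; i < n; ++i) {
        // Row i owns the slice of m pointers starting at cols[i * m].
        mat.rows[i] = mat.cols + i * m;
        for (std::size_t j = 0; j < m; ++j) {
            // Element (i, j) owns k consecutive floats in the data block.
            mat.rows[i][j] = mat.data + (i * m + j) * k;
        }
    }
    return mat;
}

// Releases the three blocks and clears the pointers.
void free_matrix(Matrix3D& mat) {
    delete[] mat.data;
    delete[] mat.cols;
    delete[] mat.rows;
    mat.data = nullptr;
    mat.cols = nullptr;
    mat.rows = nullptr;
}
```

After `Matrix3D mat = allocate_matrix(22000, 44099, 2);` the usual `mat.rows[i][j][c]` indexing works, and the bookkeeping overhead shrinks to three blocks. The second-dimension pointer table is still about 7.7 GB at these dimensions, so the total stays near 15 GB. Indexing the data block directly as `data[(i * m + j) * k + c]` would avoid the pointer tables altogether, at the cost of the triple-bracket syntax.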
