如果 32 位整数溢出,我们可以使用 40 位结构而不是 64 位长的结构吗?

2021-12-24 00:00:00 c memory-management c++ integer-overflow

如果,比如说,一个 32 位整数溢出,而不是将 int 升级为 long,如果我们需要一个 40 位类型,我们可以使用一些 40 位类型吗?范围只在 240 之内,所以我们为每个整数节省 24 (64-40) 位?

If, say, a 32-bit integer is overflowing, instead of upgrading int to long, can we make use of some 40-bit type if we need a range only within 240, so that we save 24 (64-40) bits for every integer?

如果是,怎么做?

我必须处理数十亿,空间是一个更大的限制.

I have to deal with billions and space is a bigger constraint.

推荐答案

是的,但是...

这当然可能,但通常是荒谬的(对于任何不使用数十亿个这些数字的程序):

Yes, but...

It is certainly possible, but it is usually nonsensical (for any program that doesn't use billions of these numbers):

#include <stdint.h> // don't want to rely on something like long long
struct bad_idea
{
    uint64_t var : 40;
};

在这里,var 确实会有 40 位的宽度,但代价是生成很多 效率较低的代码(事实证明,很多"是非常错误的――- 测得的开销仅为 1-2%,请参见下面的时间安排),通常无济于事.除非您需要将另一个 24 位值(或 8 位和 16 位值)打包到相同的结构中,否则对齐将失去您可能获得的任何东西.

Here, var will indeed have a width of 40 bits at the expense of much less efficient code generated (it turns out that "much" is very much wrong -- the measured overhead is a mere 1-2%, see timings below), and usually to no avail. Unless you have need for another 24-bit value (or an 8 and 16 bit value) which you wish to pack into the same structure, alignment will forfeit anything that you may gain.

在任何情况下,除非你有数十亿个,否则内存消耗的有效差异不会很明显(但管理位域所需的额外代码会很明显!).

In any case, unless you have billions of these, the effective difference in memory consumption will not be noticeable (but the extra code needed to manage the bit field will be noticeable!).

注意:
同时,该问题已更新以反映确实需要数十亿 个数字,因此这可能是可行的做法,假设您采取措施不会因结构对齐而失去收益和填充,即通过在剩余的 24 位中存储其他内容或将 40 位值存储在每个 8 个或其倍数的结构中).
节省三个字节十亿次是值得的,因为它需要明显更少的内存页面,从而导致更少的缓存和 TLB 未命中,尤其是页面错误(单个页面错误加权数千万条指令).

Note:
The question has in the mean time been updated to reflect that indeed billions of numbers are needed, so this may be a viable thing to do, presumed that you take measures not to lose the gains due to structure alignment and padding, i.e. either by storing something else in the remaining 24 bits or by storing your 40-bit values in structures of 8 each or multiples thereof).
Saving three bytes a billion times is worthwhile as it will require noticeably fewer memory pages and thus cause fewer cache and TLB misses, and above all page faults (a single page fault weighting tens of millions instructions).

虽然上面的代码片段没有使用剩余的 24 位(它只是演示了使用 40 位"部分),但需要类似于以下内容才能真正使该方法在保留内存的意义上有用 -- 假设您确实有其他有用的"数据可以放入漏洞中:

While the above snippet does not make use of the remaining 24 bits (it merely demonstrates the "use 40 bits" part), something akin to the following will be necessary to really make the approach useful in a sense of preserving memory -- presumed that you indeed have other "useful" data to put in the holes:

struct using_gaps
{
    uint64_t var           : 40;
    uint64_t useful_uint16 : 16;
    uint64_t char_or_bool  : 8;  
};

结构大小和对齐方式将等于 64 位整数,因此如果您制作例如十亿个这样的结构的数组(即使不使用特定于编译器的扩展).如果您不使用 8 位值,您还可以使用 48 位和 16 位值(提供更大的溢出余量).
或者,您可以以牺牲可用性为代价,将 8 个 40 位值放入一个结构中(40 和 64 的最小公倍数为 320 = 8*40).当然,那么访问结构数组中元素的代码将复杂(尽管可以实现一个 operator[] 来恢复线性数组功能并隐藏结构复杂度).

Structure size and alignment will be equal to a 64 bit integer, so nothing is wasted if you make e.g. an array of a billion such structures (even without using compiler-specific extensions). If you don't have use for an 8-bit value, you could also use an 48-bit and a 16-bit value (giving a bigger overflow margin).
Alternatively you could, at the expense of usability, put 8 40-bit values into a structure (least common multiple of 40 and 64 being 320 = 8*40). Of course then your code which accesses elements in the array of structures will become much more complicated (though one could probably implement an operator[] that restores the linear array functionality and hides the structure complexity).

更新:
编写了一个快速测试套件,只是为了看看位域(以及使用位域引用的运算符重载)会有什么开销.在 gcc.godbolt.org,测试我的 Win7 输出-64 机器是:

Update:
Wrote a quick test suite, just to see what overhead the bitfields (and operator overloading with bitfield refs) would have. Posted code (due to length) at gcc.godbolt.org, test output from my Win7-64 machine is:

Running test for array size = 1048576
what       alloc   seq(w)  seq(r)  rand(w)  rand(r)  free
-----------------------------------------------------------
uint32_t    0      2       1       35       35       1
uint64_t    0      3       3       35       35       1
bad40_t     0      5       3       35       35       1
packed40_t  0      7       4       48       49       1


Running test for array size = 16777216
what        alloc  seq(w)  seq(r)  rand(w)  rand(r)  free
-----------------------------------------------------------
uint32_t    0      38      14      560      555      8
uint64_t    0      81      22      565      554      17
bad40_t     0      85      25      565      561      16
packed40_t  0      151     75      765      774      16


Running test for array size = 134217728
what        alloc  seq(w)  seq(r)  rand(w)  rand(r)  free
-----------------------------------------------------------
uint32_t    0      312     100     4480     4441     65
uint64_t    0      648     172     4482     4490     130
bad40_t     0      682     193     4573     4492     130
packed40_t  0      1164    552     6181     6176     130

可以看到的是,位域的额外开销可以忽略不计,但是当以缓存友好的方式线性访问数据时,使用位域引用作为一种便利的操作符重载是相当激烈的(大约增加了 3 倍).另一方面,在随机访问中,它几乎无关紧要.

What one can see is that the extra overhead of bitfields is neglegible, but the operator overloading with bitfield reference as a convenience thing is rather drastic (about 3x increase) when accessing data linearly in a cache-friendly manner. On the other hand, on random access it barely even matters.

这些时间表明,简单地使用 64 位整数会更好,因为它们总体上仍然比位域更快(尽管涉及更多内存),但当然他们没有考虑更大数据集的页面错误成本.一旦物理内存用完(我没有测试过),它看起来可能会大不相同.

These timings suggest that simply using 64-bit integers would be better since they are still faster overall than bitfields (despite touching more memory), but of course they do not take into account the cost of page faults with much bigger datasets. It might look very different once you run out of physical RAM (I didn't test that).

相关文章