Comparing a 32-bit float and a 32-bit integer without casting to double, when either value could be too large to fit the other type exactly

2022-01-17 00:00:00 floating-point precision arm c++

I have a 32-bit floating-point number f (known to be positive) that I need to convert to a 32-bit unsigned integer. Its magnitude might be too large to fit. Furthermore, there is downstream computation that requires some headroom. I can compute the maximum acceptable value m as a 32-bit integer. How do I efficiently determine, in C++11 on a constrained 32-bit machine (ARM M4F), whether f <= m holds mathematically? Note that the types of the two values don't match. The following three approaches each have their issues:

static_cast<uint32_t>(f) <= m: I think this triggers undefined behaviour if f doesn't fit the 32 bit integer

f <= static_cast<float>(m): if m is too large to be converted exactly, the converted value could be larger than m such that the subsequent comparison will produce the wrong result in certain edge cases

static_cast<double>(f) <= static_cast<double>(m): is mathematically correct, but requires casting to, and working with double, which I'd like to avoid for efficiency reasons
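To make the edge case in the second approach concrete, here is a minimal sketch that puts the second and third approaches side by side. The function names `le_via_float` and `le_via_double` are made up for illustration, and the sketch assumes the default round-to-nearest mode, under which `UINT32_MAX` (4294967295) converts to the float 4294967296.0f.

```cpp
#include <cstdint>

// Approach 2 from the question: convert m to float, then compare.
// The conversion may round m upward, which can make the result wrong.
bool le_via_float(float f, std::uint32_t m) {
    return f <= static_cast<float>(m);
}

// Approach 3 from the question: compare in double, which represents
// every float and every 32-bit integer exactly.
bool le_via_double(float f, std::uint32_t m) {
    return static_cast<double>(f) <= static_cast<double>(m);
}
```

With f = 4294967296.0f and m = 4294967295, `le_via_float` reports true even though f is larger than m, while `le_via_double` reports false.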

Surely there must be a way to convert an integer to a float directly with specified rounding direction, i.e. guaranteeing the result not to exceed the input in magnitude. I'd prefer a C++11 standard solution, but in the worst case platform intrinsics could qualify as well.

Recommended answer

I think your best bet is to be a bit platform specific. 2^32 can be represented exactly in floating point. Check if f is too large to fit at all, and then convert to unsigned and check against m.

    const float unsigned_limit = 4294967296.0f;  // 2^32, exactly representable as a float
    bool ok = false;
    if (f < unsigned_limit) {
        // f fits in 32 bits, so the conversion is well defined
        const auto uf = static_cast<unsigned int>(f);
        if (uf <= m) {
            ok = true;
        }
    }
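The check above can be wrapped in a function, as in the sketch below. One thing worth noting: the conversion truncates toward zero, so the check accepts any f whose truncated value is at most m, for example f = 3.5 with m = 3. That is usually what is wanted when f is about to be converted anyway. If the strict mathematical test f <= m is needed, a variant along the lines of `exact_le` might do. Both function names are hypothetical, and both assume f is non-negative, as stated in the question.

```cpp
#include <cstdint>

// The answer's check: true when f, truncated toward zero, is at most m.
// Assumes f is non-negative (a precondition taken from the question).
bool truncated_le(float f, std::uint32_t m) {
    if (!(f < 4294967296.0f)) {  // 2^32; written this way so NaN is rejected too
        return false;
    }
    return static_cast<std::uint32_t>(f) <= m;
}

// Hypothetical stricter variant: true only when f <= m as real numbers.
bool exact_le(float f, std::uint32_t m) {
    if (!(f < 4294967296.0f)) {
        return false;
    }
    const std::uint32_t uf = static_cast<std::uint32_t>(f);
    if (uf != m) {
        return uf < m;
    }
    // uf == m: f <= m holds only if truncation discarded no fraction.
    // Converting uf back is exact here, because uf is either below 2^24
    // or equal to f, which is itself a float.
    return static_cast<float>(uf) == f;
}
```

For f = 3.5f and m = 3, `truncated_le` returns true and `exact_le` returns false. The two agree whenever f has no fractional part.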

I'm not fond of needing two comparisons, but the intent is clear.

If f is usually significantly less than m (or usually significantly greater), one can test against float(m)*0.99f (respectively float(m)*1.01f), and then do the exact comparison in the unusual case. That is probably only worth doing if profiling shows that the performance gain is worth the extra complexity.
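A rough sketch of that fast-path idea follows. The function name `le_with_fast_path` is made up, and the slow path here falls back on the double comparison from the question, which is only one possible choice of exact comparison. The 1% margins are far wider than the rounding error of converting m to float (about 2^-24 relative), so the two float tests should never give a wrong answer; they only decline to decide when f is close to m.

```cpp
#include <cstdint>

// Sketch: decide cheaply in float when f is far from m, and fall back
// to an exact comparison only for close calls.
bool le_with_fast_path(float f, std::uint32_t m) {
    const float mf = static_cast<float>(m);  // may differ from m by rounding
    if (f <= mf * 0.99f) {
        return true;   // clearly below m, even allowing for rounding
    }
    if (f > mf * 1.01f) {
        return false;  // clearly above m, even allowing for rounding
    }
    // Close call: exact comparison (assumption: the double comparison
    // from the question is acceptable on this rare path).
    return static_cast<double>(f) <= static_cast<double>(m);
}
```

Whether the extra branches pay off depends on how often f lands near m, which is why profiling first is the sensible order.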
