Converting from signed char to unsigned char and back again?

2022-01-25 00:00:00 c c++ java-native-interface

I'm working with JNI and have an array of type jbyte, where jbyte is represented as a signed char, i.e. ranging from -128 to 127. The jbytes represent image pixels. For image processing, we usually want pixel components to range from 0 to 255. I therefore want to convert each jbyte value to the range 0 to 255 (i.e. the same range as unsigned char), do some calculations on the value, and then store the result as a jbyte again.

How can I do these conversions safely?

I managed to get this code to work, where a pixel value is incremented by 30 but clamped to 255, but I don't understand whether it's safe or portable:

 #define CLAMP255(v) ((v) > 255 ? 255 : ((v) < 0 ? 0 : (v)))

 jbyte pixel = ...
 pixel = CLAMP255((unsigned char)pixel + 30);

I'm interested to know how to do this in both C and C++.

Recommended answer

This is one of the reasons why C++ introduced the new cast style, which includes static_cast and reinterpret_cast.

Converting from signed to unsigned can mean two different things. First, you might want the unsigned variable to hold the value of the signed variable modulo the maximum value of the unsigned type plus one. That is, if your signed char has the value -128, then UCHAR_MAX+1 (256) is added to give 128, and if it has the value -1, then UCHAR_MAX+1 is added to give 255. This is what static_cast does. Second, you might want the bits in the memory referenced by a variable to be interpreted as an unsigned byte, regardless of the signed integer representation used on the system: the bit pattern 0b10000000 should evaluate to 128 and the bit pattern 0b11111111 to 255. This is accomplished with reinterpret_cast.
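As a rough illustration of the value-based conversion described above, the sketch below checks the arithmetic at compile time. The helper names `to_unsigned_by_value` and `wrap_by_hand` are made up for this example, and it assumes an 8-bit char so that UCHAR_MAX is 255.

```cpp
#include <climits>

// Value conversion: the result is the input reduced modulo UCHAR_MAX + 1.
// (Hypothetical helper name; assumes an 8-bit char, so UCHAR_MAX == 255.)
constexpr unsigned char to_unsigned_by_value(signed char v) {
    return static_cast<unsigned char>(v);
}

// The same arithmetic spelled out by hand, for comparison.
constexpr int wrap_by_hand(signed char v) {
    return v < 0 ? v + (UCHAR_MAX + 1) : v;
}

static_assert(to_unsigned_by_value(-128) == 128, "-128 + 256 == 128");
static_assert(to_unsigned_by_value(-1) == 255, "-1 + 256 == 255");
static_assert(wrap_by_hand(-100) == 156, "-100 + 256 == 156");
```

These results follow from the conversion rule itself, so they should hold whatever signed representation the machine uses.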

Now, for the two's complement representation the two happen to be exactly the same thing, since -128 is represented as 0b10000000, -1 is represented as 0b11111111, and likewise for everything in between. However, other computers (usually older architectures) may use a different signed representation such as sign-and-magnitude or ones' complement. In ones' complement the bit pattern 0b10000000 would not be -128 but -127, so a static_cast to unsigned char would make this 129, while a reinterpret_cast would make it 128. Additionally, in ones' complement the bit pattern 0b11111111 would not be -1 but -0 (yes, this value exists in ones' complement), and would be converted to 0 with a static_cast but to 255 with a reinterpret_cast. Note that in ones' complement no signed char converts to the unsigned value 128 under static_cast, since the signed range is only -127 to 127 because of the -0 value.

I have to say that the vast majority of computers use two's complement, which makes the whole issue moot for just about anywhere your code will ever run. You will likely only ever see anything other than two's complement on very old architectures; think 1960s timeframe.

The syntax boils down to:

signed char x = -100;
unsigned char y;

y = (unsigned char)x;                    // C static
y = *(unsigned char*)(&x);               // C reinterpret
y = static_cast<unsigned char>(x);       // C++ static
y = reinterpret_cast<unsigned char&>(x); // C++ reinterpret
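The question also asks about going back to a signed value. Before C++20, a plain cast from unsigned char to signed char is implementation-defined for values above 127, so one cautious option is to copy the byte instead of converting the value. The following is only a sketch of that idea; the helper names `bits_to_unsigned` and `bits_to_signed` are invented for this example.

```cpp
#include <cstring>

// Reinterpret the byte of a signed char as an unsigned char.
// (Hypothetical helper; copies the bit pattern rather than converting the value.)
inline unsigned char bits_to_unsigned(signed char s) {
    unsigned char u;
    std::memcpy(&u, &s, 1);
    return u;
}

// Reinterpret the byte of an unsigned char as a signed char.
inline signed char bits_to_signed(unsigned char u) {
    signed char s;
    std::memcpy(&s, &u, 1);
    return s;
}
```

On a two's complement machine `bits_to_unsigned(-100)` gives 156 and `bits_to_signed(156)` gives -100 again, so the round trip preserves the stored byte.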

To do this in a nice C++ way with arrays:

jbyte memory_buffer[nr_pixels];
unsigned char* pixels = reinterpret_cast<unsigned char*>(memory_buffer);

Or the C way:

unsigned char* pixels = (unsigned char*)memory_buffer;
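Putting the pieces together, the brightening step from the question might look roughly like the sketch below, which views the buffer through an unsigned char pointer and clamps in int arithmetic. The function name `brighten` and the `jbyte` alias are assumptions made for this example; in real JNI code the typedef would come from `<jni.h>`.

```cpp
#include <cstddef>

using jbyte = signed char;  // stand-in for the typedef in <jni.h> (assumption)

// Adds `delta` to every pixel and clamps the result to [0, 255], in place.
// (Hypothetical helper, not part of JNI.)
inline void brighten(jbyte* buffer, std::size_t nr_pixels, int delta) {
    // Accessing any object through unsigned char* is permitted by the aliasing rules.
    unsigned char* pixels = reinterpret_cast<unsigned char*>(buffer);
    for (std::size_t i = 0; i < nr_pixels; ++i) {
        int v = pixels[i] + delta;  // unsigned char is promoted to int here
        if (v > 255) v = 255;
        if (v < 0) v = 0;
        pixels[i] = static_cast<unsigned char>(v);
    }
}
```

Because the results are written back through the same unsigned char pointer, no conversion from unsigned to signed is needed; the jbyte array simply holds the new bit patterns.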
