为什么“STRING".getBytes() 的工作方式因操作系统而异

2022-01-09 00:00:00 binary linux unix java ebcdic

I am running the code below and I am getting different outcome from "some_string".getBytes() depending if I am in Windows or Unix. The issue happens with any string (I tried a very simple ABC and same problem.

See the differences below printed in console.

The code below is well-tested using Java 7. If you copy it entirely it will run.

Additionally, see the difference in Hexadecimal in the two images below. The first two images shows the file created in Windows. You can see the hexadecimal values with ANSI and EBCDIC respectively. The third image, the black one, is from Unix. You can see the hexadecimal (-c option) and the character readable in which I believe it is EBCDIC.

So, my straight question is: why does such code work different since I am just using Java 7 in both case? Should I check any especific property in somewhere? Maybe, Java in Windows get certain default format and in Unix it get another. If so, which property must I check or settup?

Unix Console:

$ ./java -cp /usr/test.jar test.mainframe.read.test.TestGetBytes
H = 76 - L
< wasn't found

Windows Console:

H = 60 - <
H1 = 69 - E
H2 = 79 - O
H3 = 77 - M
H4 = 62 - >
End of Message found

The entire code:

package test.mainframe.read.test;

import java.util.ArrayList;

public class TestGetBytes {

       public static void main(String[] args) {
              try {
                     ArrayList ipmMessage = new ArrayList();
                     ipmMessage.add(newLine());

                     //Windows Path
                     writeMessage("C:/temp/test_bytes.ipm", ipmMessage);
                     reformatFile("C:/temp/test_bytes.ipm");
                     //Unix Path
                     //writeMessage("/usr/temp/test_bytes.ipm", ipmMessage);
                     //reformatFile("/usr/temp/test_bytes.ipm");
              } catch (Exception e) {

                     System.out.println(e.getMessage());
              }
       }

       public static byte[] newLine() {
              return "<EOM>".getBytes();
       }

       public static void writeMessage(String fileName, ArrayList ipmMessage)
                     throws java.io.FileNotFoundException, java.io.IOException {

              java.io.DataOutputStream dos = new java.io.DataOutputStream(
                           new java.io.FileOutputStream(fileName, true));
              for (int i = 0; i < ipmMessage.size(); i++) {
                     try {
                           int[] intValues = (int[]) ipmMessage.get(i);
                           for (int j = 0; j < intValues.length; j++) {
                                  dos.write(intValues[j]);
                           }
                     } catch (ClassCastException e) {
                           byte[] byteValues = (byte[]) ipmMessage.get(i);
                           dos.write(byteValues);
                     }
              }
              dos.flush();
              dos.close();

       }

       // reformat to U1014
       public static void reformatFile(String filename)
                     throws java.io.FileNotFoundException, java.io.IOException {
              java.io.FileInputStream fis = new java.io.FileInputStream(filename);
              java.io.DataInputStream br = new java.io.DataInputStream(fis);

              int h = br.read();
              System.out.println("H = " + h + " - " + (char)h);

              if ((char) h == '<') {// Check for <EOM>

                     int h1 = br.read();
                     System.out.println("H1 = " + h1 + " - " + (char)h1);
                     int h2 = br.read();
                     System.out.println("H2 = " + h2 + " - " + (char)h2);
                     int h3 = br.read();
                     System.out.println("H3 = " + h3 + " - " + (char)h3);
                     int h4 = br.read();
                     System.out.println("H4 = " + h4 + " - " + (char)h4);
                     if ((char) h1 == 'E' && (char) h2 == 'O' && (char) h3 == 'M'
                                  && (char) h4 == '>') {
                           System.out.println("End of Message found");
                     }
                     else{
                           System.out.println("EOM not found but < was found");
                     }
              }
              else{
                     System.out.println("< wasn't found");
              }
       }
}

解决方案

You are not specifying a charset when calling getBytes(), so it uses the default charset of the underlying platform (or of Java itself if specified when Java is started). This is stated in the String documentation:

public byte[] getBytes()

Encodes this String into a sequence of bytes using the platform's default charset, storing the result into a new byte array.

getBytes() has an overloaded version that lets you specify a charset in your code.

public byte[] getBytes(Charset charset)

Encodes this String into a sequence of bytes using the given charset, storing the result into a new byte array.

相关文章