Read/write binary file with strings?

How can I write/read a string from a binary file?

I've tried using writeUTF / readUTF (DataOutputStream/DataInputStream) but it was too much of a hassle.

Thanks.

Recommended answer

Forget about FileWriter, DataOutputStream for a moment.

  • For binary data, use the OutputStream and InputStream classes. They handle byte[].
  • For text data, use the Reader and Writer classes. They handle String, which can store all kinds of text because it uses Unicode internally.

The crossover between text and binary data is made by specifying an encoding; if none is given, the platform default encoding is used.

  • new OutputStreamWriter(outputStream, encoding)
  • string.getBytes(encoding)
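
The two calls above can be sketched as follows. This is a minimal illustration, not the only way to do it: the class name EncodingCrossover, the helper names, and the choice of UTF-8 are assumptions made for the example.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class: both routes turn text into bytes with an explicit encoding.
class EncodingCrossover {

    // Route 1: wrap a binary OutputStream in a Writer that encodes as UTF-8.
    static byte[] viaWriter(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
            writer.write(text);
        } catch (IOException e) {
            // Not expected for an in-memory stream; rethrown unchecked to keep the sketch short.
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // Route 2: encode the String directly.
    static byte[] viaGetBytes(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String text = "caf\u00e9"; // "café" with a precomposed é
        byte[] a = viaWriter(text);
        byte[] b = viaGetBytes(text);
        System.out.println(Arrays.equals(a, b)); // true
        System.out.println(text.length() + " chars, " + a.length + " bytes"); // 4 chars, 5 bytes
    }
}
```

Passing the encoding explicitly keeps the result independent of the machine the code runs on.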

So if you want to avoid byte[] and use String, you must abuse a single-byte encoding that maps all 256 byte values to characters, in whatever order. So not "UTF-8", but for example "ISO-8859-1". Be careful with "windows-1252" (also named "Cp1252"): in Java it leaves a few byte values undefined, so it does not round-trip every byte.
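
A quick way to check whether an encoding is safe for this trick is to push all 256 byte values through a String and back. The sketch below assumes ISO-8859-1 as the single-byte encoding; the class and method names are made up for the example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class: checks whether bytes -> String -> bytes is lossless.
class ByteStringRoundTrip {

    // Every possible byte value, 0x00 through 0xFF.
    static byte[] allByteValues() {
        byte[] bytes = new byte[256];
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] = (byte) i;
        }
        return bytes;
    }

    // True if decoding and re-encoding with the given charset returns the same bytes.
    static boolean roundTrips(byte[] data, Charset charset) {
        String asText = new String(data, charset);
        return Arrays.equals(data, asText.getBytes(charset));
    }

    public static void main(String[] args) {
        byte[] data = allByteValues();
        System.out.println(roundTrips(data, StandardCharsets.ISO_8859_1)); // true
        System.out.println(roundTrips(data, StandardCharsets.UTF_8));      // false
    }
}
```

UTF-8 fails because many byte sequences are not valid UTF-8; the decoder replaces them with U+FFFD and the original bytes are lost.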

But a conversion still takes place internally, and in rare cases that causes problems. For instance, é can be represented in Unicode as one code point (U+00E9) or as two: e followed by the combining acute accent (U+0301). java.text.Normalizer converts between these forms.
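
The two spellings of é can be compared like this. It is only a small sketch; the class name NormalizationDemo and the helper toNfc are invented for the example, while Normalizer.normalize and Normalizer.Form.NFC are the standard JDK API.

```java
import java.text.Normalizer;

// Hypothetical demo class: the same visible character in two Unicode forms.
class NormalizationDemo {

    // Compose base letters and combining marks into precomposed characters where possible.
    static String toNfc(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        String composed = "\u00e9";    // é as a single code point
        String decomposed = "e\u0301"; // e followed by the combining acute accent

        System.out.println(composed.equals(decomposed));        // false
        System.out.println(toNfc(decomposed).equals(composed)); // true
        System.out.println(composed.length() + " vs " + decomposed.length()); // 1 vs 2
    }
}
```

Normalizing both sides to the same form before comparing or storing text avoids surprises of this kind.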

One case where this has already caused problems is file names across operating systems: macOS uses a different Unicode normalization than Windows, so version control systems need to take special care.

So in principle it is better to use the more cumbersome byte arrays, ByteArrayInputStream, or java.nio buffers. Also keep in mind that a Java char is 16 bits, so a String is not a sequence of bytes.
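
As one possible way to follow that advice for the original question, a string can be stored in a binary file as a length prefix followed by its encoded bytes. The record layout (4-byte big-endian length, then UTF-8), the class name LengthPrefixedString, and the temp file are all assumptions of this sketch, not something the answer prescribes.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo class: one string stored as [int length][UTF-8 bytes].
class LengthPrefixedString {

    // Encode the text and put its byte length in front of it.
    static byte[] encode(String text) {
        byte[] payload = text.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buffer = ByteBuffer.allocate(Integer.BYTES + payload.length);
        buffer.putInt(payload.length);
        buffer.put(payload);
        return buffer.array();
    }

    // Read the length, then decode exactly that many bytes.
    static String decode(byte[] record) {
        ByteBuffer buffer = ByteBuffer.wrap(record);
        byte[] payload = new byte[buffer.getInt()];
        buffer.get(payload);
        return new String(payload, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        String text = "h\u00e9llo"; // 5 chars, but 6 bytes in UTF-8
        Path file = Files.createTempFile("string-demo", ".bin");
        try {
            Files.write(file, encode(text));
            String restored = decode(Files.readAllBytes(file));
            System.out.println(restored.equals(text)); // true
        } finally {
            Files.delete(file);
        }
    }
}
```

The prefix counts bytes, not chars, which matters because the two differ as soon as the text contains non-ASCII characters.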
