"Proper" way to store binary data with C++/STL

2022-01-07 00:00:00 binary-data c++ stl

In general, what is the best way to store binary data in C++? As far as I can tell, the options pretty much boil down to using strings or vector<char>s. (I'll omit char*s and malloc()s, since I'm asking specifically about C++.)

Usually I just use a string; however, I'm not sure whether there are overheads I'm missing, or conversions the STL does internally that could mess with the integrity of binary data. Does anyone have any pointers (har) on this? Suggestions or preferences one way or another?


A vector of char is nice because the memory is contiguous. You can therefore use it with a lot of C APIs, such as Berkeley sockets or file APIs. You can do the following, for example:

  std::vector<char> vect;
  // ... fill vect with the bytes to send ...
  send(sock, vect.data(), vect.size(), 0);  // last argument is the flags word



You can essentially treat it just like any other dynamically allocated char buffer. You can scan up and down looking for magic numbers or patterns. You can parse it partially in place. When receiving from a socket, you can very easily resize it to append more data.
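To make the append-and-scan idea concrete, here is a minimal sketch. The helper names `append_chunk` and `find_magic` are made up for illustration, and `std::memcpy` stands in for the write that `recv()` would do into the newly grown tail of the buffer.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper: grow `buf` by `len` bytes and copy `chunk` into the
// new tail. With a real socket, recv() would write to buf.data() + old_size
// where this sketch calls std::memcpy.
void append_chunk(std::vector<char>& buf, const char* chunk, std::size_t len) {
    assert(chunk != nullptr || len == 0);
    if (len == 0) return;
    const std::size_t old_size = buf.size();
    buf.resize(old_size + len);  // may reallocate, so re-fetch data() afterwards
    std::memcpy(buf.data() + old_size, chunk, len);
}

// Hypothetical helper: offset of the first occurrence of `magic` in `buf`,
// or buf.size() if the byte sequence is not present.
std::size_t find_magic(const std::vector<char>& buf,
                       const std::vector<char>& magic) {
    auto it = std::search(buf.begin(), buf.end(), magic.begin(), magic.end());
    return static_cast<std::size_t>(it - buf.begin());
}
```

Note that embedded zero bytes are carried along like any other byte, because the length is always passed explicitly rather than inferred from a terminator.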

The downside is that resizing is not terribly efficient, since growing past the current capacity reallocates and copies the whole buffer (resize or preallocate prudently). Deletion from the front of the array is also very inefficient, because every remaining element has to be shifted down. If you need to, say, pop just one or two chars at a time off the front of the data structure very frequently, copying to a deque before this processing may be an option. This costs you a copy, and deque memory isn't contiguous, so you can't just pass a pointer to a C API.
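As a rough sketch of both workarounds, the block below preallocates with `reserve()` and then copies into a deque for cheap front removal. The function names `make_pattern`, `to_deque`, and `pop_front_bytes` are invented for this example, and the byte pattern is just filler data.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical helper: build an n-byte buffer. Reserving up front means the
// push_back calls below never have to reallocate and copy.
std::vector<char> make_pattern(std::size_t n) {
    std::vector<char> buf;
    buf.reserve(n);
    assert(buf.capacity() >= n);
    for (std::size_t i = 0; i < n; ++i) {
        buf.push_back(static_cast<char>(i & 0x7f));  // filler bytes 0..127
    }
    return buf;
}

// Hypothetical helper: one O(n) copy, after which pop_front is cheap.
// The deque's storage is not contiguous, so it cannot go straight to a C API.
std::deque<char> to_deque(const std::vector<char>& buf) {
    return std::deque<char>(buf.begin(), buf.end());
}

// Hypothetical helper: remove up to n bytes from the front and return them.
std::vector<char> pop_front_bytes(std::deque<char>& dq, std::size_t n) {
    std::vector<char> out;
    for (std::size_t i = 0; i < n && !dq.empty(); ++i) {
        out.push_back(dq.front());
        dq.pop_front();
    }
    return out;
}
```

Whether the extra copy pays off depends on how many front removals follow it; for a handful of pops, erasing from the vector directly may still be cheaper.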


Bottom line: learn about the data structures and their tradeoffs before diving in. That said, a vector of char is what I typically see used in general practice.
