"Proper" way to store binary data with C++/STL

2022-01-07 00:00:00 binary-data c++ stl

In general, what is the best way of storing binary data in C++? The options, as far as I can tell, pretty much boil down to using strings or vector<char>s. (I'll omit the possibility of char*s and malloc()s since I'm referring specifically to C++).

Usually I just use a string; however, I'm not sure if there are overheads I'm missing, or conversions that the STL does internally that could mess with the sanity of binary data. Does anyone have any pointers (har) on this? Suggestions or preferences one way or another?

Recommended answer

A vector of char is nice because the memory is contiguous. Therefore you can use it with a lot of C APIs, such as Berkeley sockets or file APIs. You can do the following, for example:

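  std::vector<char> vect;
  ...
  // send() takes a flags word as its fourth argument; 0 means no flags.
  // vect.data() (C++11) matches &vect[0] but is also safe when vect is empty.
  send(sock, vect.data(), vect.size(), 0);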

and it will work fine.
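The same idea carries over to the C file APIs mentioned above. The following is a minimal sketch, assuming the standard `std::fwrite` and `std::fread` calls; the helper names `write_all` and `read_up_to` are hypothetical and only serve the illustration.

```cpp
#include <cstddef>
#include <cstdio>
#include <vector>

// Hypothetical helper: write the whole buffer to a C FILE* and report
// whether every byte was written.
bool write_all(std::FILE* fp, const std::vector<char>& buf) {
    if (buf.empty()) return true;  // nothing to write
    return std::fwrite(buf.data(), 1, buf.size(), fp) == buf.size();
}

// Hypothetical helper: read up to n bytes from a C FILE* directly into
// the vector's contiguous storage.
std::vector<char> read_up_to(std::FILE* fp, std::size_t n) {
    if (n == 0) return {};
    std::vector<char> buf(n);
    std::size_t got = std::fread(buf.data(), 1, buf.size(), fp);
    buf.resize(got);  // keep only the bytes actually read
    return buf;
}
```

Because the vector stores raw bytes, embedded zero bytes and values above 0x7F pass through unchanged; no text conversion is involved at this level, provided the file is opened in binary mode.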

You can essentially treat it just like any other dynamically allocated char buffer. You can scan up and down looking for magic numbers or patterns. You can parse it partially in place. For receiving from a socket, you can very easily resize it to append more data.
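As a rough sketch of the appending and scanning described above, the snippet below grows a buffer chunk by chunk and then searches it for a byte pattern with `std::search`. The function names `append_chunk` and `find_magic` are made up for this example, and the chunk stands in for whatever a real `recv()` call would have filled in.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: append a freshly received chunk to the end of the buffer.
// In real code `chunk` would be the bytes a recv() call just produced.
void append_chunk(std::vector<char>& buf, const char* chunk, std::size_t len) {
    buf.insert(buf.end(), chunk, chunk + len);
}

// Hypothetical helper: return the offset of the first occurrence of `magic`
// in `buf`, or buf.size() if the pattern does not occur.
std::size_t find_magic(const std::vector<char>& buf,
                       const std::vector<char>& magic) {
    auto it = std::search(buf.begin(), buf.end(), magic.begin(), magic.end());
    return static_cast<std::size_t>(it - buf.begin());
}
```

A pattern that straddles two chunks is still found, because the search runs over the accumulated buffer rather than over each chunk separately.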

The downside is that resizing is not terribly efficient, since growing past the current capacity reallocates and copies the contents (resize or preallocate prudently), and deletion from the front of the array is also very inefficient, because every remaining element has to shift down. If you need to, say, pop just one or two chars at a time off the front of the data structure very frequently, copying to a deque before this processing may be an option. This costs you a copy, and deque memory isn't contiguous, so you can't just pass a pointer to a C API.
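The deque option can be sketched as follows. This is only an illustration under the assumption that the data has already been fully received into a vector; `to_deque` and `pop_front_n` are hypothetical names, not part of any library.

```cpp
#include <algorithm>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical helper: copy the contiguous buffer into a deque so that
// removal from the front becomes cheap. This is the one-time copy cost.
std::deque<char> to_deque(const std::vector<char>& buf) {
    return std::deque<char>(buf.begin(), buf.end());
}

// Hypothetical helper: pop up to n bytes off the front of the deque.
// Each pop_front is constant time, unlike erasing from the front of a vector.
std::vector<char> pop_front_n(std::deque<char>& dq, std::size_t n) {
    std::vector<char> out;
    out.reserve(std::min(n, dq.size()));  // preallocate once instead of growing
    while (n-- > 0 && !dq.empty()) {
        out.push_back(dq.front());
        dq.pop_front();
    }
    return out;
}
```

If the bytes later need to go back to a C API, they have to be copied into contiguous storage again, as `pop_front_n` does here by returning a vector.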

Bottom line: learn about the data structures and their tradeoffs before diving in. That said, a vector of char is typically what I see used in general practice.
