What is the idiomatic C++17 standard approach to reading binary files?
Normally I would just use C-style file IO, but I'm trying a modern C++ approach, including the C++17-specific features std::byte and std::filesystem.
Reading an entire file into memory, traditional method:
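#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>   /* stat() and struct stat (POSIX, not ISO C) */

/* Returns a malloc'd buffer with the file contents, or NULL on failure.
   The caller must free() it and has to learn the size some other way. */
char *readFileData(const char *path)
{
    struct stat fs;
    if (stat(path, &fs) != 0)
        return NULL;
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return NULL;
    /* Request at least one byte so an empty file does not look like a failure. */
    char *buf = (char *)malloc(fs.st_size > 0 ? (size_t)fs.st_size : 1);
    if (buf != NULL && fs.st_size > 0 && fread(buf, (size_t)fs.st_size, 1, f) != 1) {
        free(buf);
        buf = NULL;
    }
    fclose(f);
    return buf;
}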
Reading an entire file into memory, modern approach:
#include <cstddef>      // std::byte
#include <filesystem>
#include <fstream>
#include <memory>       // std::make_unique
#include <string>
using namespace std;
using namespace std::filesystem;
auto readFileData(string path)
{
    auto fileSize = file_size(path);
    auto buf = make_unique<byte[]>(fileSize);
    basic_ifstream<byte> ifs(path, ios::binary);
    ifs.read(buf.get(), fileSize);
    return buf;
}
Does this look about right? Can this be improved?
Recommended answer
Personally I prefer std::vector<std::byte> to std::string unless you are reading an actual text document. The problem with make_unique<byte[]>(fileSize) is that you immediately lose the size of the data and have to carry it in a separate variable. It is sometimes assumed to be a tiny fraction faster than a std::vector<std::byte> because it skips zero-initialization, but make_unique<byte[]>(n) value-initializes its elements too, so the two do the same work. Either way, any difference will probably always be overshadowed by the time taken reading off the disk.
So for a binary file I use something like this:
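#include <cerrno>
#include <cstddef>
#include <cstring>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

std::vector<std::byte> load_file(std::string const& filepath)
{
    // Open at the end so the first tellg() reports the end position.
    std::ifstream ifs(filepath, std::ios::binary|std::ios::ate);

    if(!ifs)
        throw std::runtime_error(filepath + ": " + std::strerror(errno));

    auto end = ifs.tellg();
    ifs.seekg(0, std::ios::beg);

    // Size is the distance between the end and the real start position.
    auto size = std::size_t(end - ifs.tellg());

    if(size == 0) // avoid undefined behavior
        return {};

    std::vector<std::byte> buffer(size);

    if(!ifs.read(reinterpret_cast<char*>(buffer.data()), static_cast<std::streamsize>(buffer.size())))
        throw std::runtime_error(filepath + ": " + std::strerror(errno));

    return buffer;
}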
This is the fastest method I know of. It also avoids a common mistake in determining the size of the data in the file: the value of ifs.tellg() after opening the file at the end is not necessarily the same as the file size, and ifs.seekg(0) is not theoretically the correct way to locate the start of the file (even though it works in practice in most places). Taking the difference between the end position and the position after seeking to std::ios::beg sidesteps both issues.
The error message from std::strerror(errno) is guaranteed to work on POSIX systems (that should include Microsoft, but I'm not sure).
Obviously you can use std::filesystem::path const& filepath in place of std::string if you want.
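As a rough sketch of that variant, the function below takes a std::filesystem::path and, as one possible design, asks std::filesystem::file_size for the size instead of seeking. The name load_file_fs and the error message texts are made up for illustration; they are not part of the answer above.

```cpp
#include <cstddef>
#include <filesystem>
#include <fstream>
#include <stdexcept>
#include <vector>

// Hypothetical variant of load_file: takes a std::filesystem::path and
// queries the filesystem for the size rather than using tellg()/seekg().
std::vector<std::byte> load_file_fs(std::filesystem::path const& filepath)
{
    // file_size() throws std::filesystem::filesystem_error on failure,
    // for example when the path does not exist.
    auto const size =
        static_cast<std::size_t>(std::filesystem::file_size(filepath));

    // Since C++17, std::ifstream can be constructed directly from a path.
    std::ifstream ifs(filepath, std::ios::binary);
    if(!ifs)
        throw std::runtime_error(filepath.string() + ": cannot open file");

    std::vector<std::byte> buffer(size);

    // Skip the read entirely for an empty file.
    if(size != 0 && !ifs.read(reinterpret_cast<char*>(buffer.data()),
                              static_cast<std::streamsize>(size)))
        throw std::runtime_error(filepath.string() + ": read failed");

    return buffer;
}
```

One trade-off to be aware of: the size is queried before the file is opened, so if another process changes the file in between, the read can come up short and this sketch reports that as an error.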
Also, especially for pre-C++17, you can use std::vector<unsigned char> or std::vector<char> if you don't have or don't want to use std::byte.
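A minimal sketch of such a pre-C++17 version follows, written as a template so the caller can pick unsigned char or char. The name load_file_as and the Byte template parameter are hypothetical; the body mirrors the seek-based approach above and should compile as C++11.

```cpp
#include <cerrno>
#include <cstddef>
#include <cstring>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: Byte is expected to be char or unsigned char.
template<typename Byte = unsigned char>
std::vector<Byte> load_file_as(std::string const& filepath)
{
    static_assert(sizeof(Byte) == 1, "Byte must be a one-byte type");

    // Open at the end so the first tellg() reports the end position.
    std::ifstream ifs(filepath.c_str(), std::ios::binary | std::ios::ate);
    if(!ifs)
        throw std::runtime_error(filepath + ": " + std::strerror(errno));

    std::ifstream::pos_type const end = ifs.tellg();
    ifs.seekg(0, std::ios::beg);
    std::size_t const size = static_cast<std::size_t>(end - ifs.tellg());

    std::vector<Byte> buffer(size);

    // Only read when there is something to read.
    if(size != 0 && !ifs.read(reinterpret_cast<char*>(buffer.data()),
                              static_cast<std::streamsize>(size)))
        throw std::runtime_error(filepath + ": " + std::strerror(errno));

    return buffer;
}
```

Calling load_file_as<>(name) yields a std::vector<unsigned char>, and load_file_as<char>(name) yields a std::vector<char>.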