Is std::ifstream significantly slower than FILE?
I've been informed that my library is slower than it should be, on the order of 30+ times too slow parsing a particular file (a text file, 326 KB in size). The user suggested that it may be that I'm using std::ifstream (presumably instead of FILE).
I'd rather not blindly rewrite, so I thought I'd check here first, since my guess is that the bottleneck is elsewhere. I'm reading character by character, so the only functions I'm using are get(), peek(), and tellg()/seekg().
Update:
I profiled and got confusing output: gprof didn't appear to think it took that long. I rewrote the program to read the entire file into a buffer first, and it sped up by about 100x. I think the problem may have been the tellg()/seekg() calls taking a long time, but gprof may have been unable to see that for some reason. In any case, ifstream does not appear to buffer the entire file, even at this size.
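A minimal version of that read-everything-first approach might look like the sketch below; the file name is hypothetical and error handling is kept to a minimum. Once the data is in memory, position bookkeeping becomes plain index arithmetic instead of tellg()/seekg() calls on the stream.

```cpp
#include <cstddef>
#include <fstream>
#include <iostream>
#include <iterator>
#include <string>

int main() {
    std::ifstream in("input.txt", std::ios::binary);   // hypothetical input file
    if (!in) return 1;

    // Slurp the entire file into a std::string in one go.
    std::string buffer((std::istreambuf_iterator<char>(in)),
                       std::istreambuf_iterator<char>());

    // Parse from memory; here we just count lines as a stand-in for real parsing.
    std::size_t lines = 0;
    for (std::size_t i = 0; i < buffer.size(); ++i)
        if (buffer[i] == '\n') ++lines;

    std::cout << buffer.size() << " bytes, " << lines << " lines\n";
}
```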
Answer
I don't think that would make a difference. Especially if you're reading character by character, the overhead of I/O is likely to completely dominate everything else. Why do you read single bytes at a time? Do you realize how extremely inefficient that is?
For a 326 KB file, the fastest solution will most likely be to just read it into memory at once.
The difference between std::ifstream and the C equivalents is basically a virtual function call or two. It may make a difference if executed a few tens of millions of times per second; otherwise, not really. File I/O is generally so slow that the API used to access it doesn't really matter. What matters far more is the read/write pattern: lots of seeks are bad, sequential reads/writes are good.
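To illustrate that point, the sketch below reads a file sequentially in large chunks with both interfaces. The file name and the 64 KB chunk size are arbitrary choices; the point is only that either API is fast when the per-call overhead is amortized over big sequential reads.

```cpp
#include <cstddef>
#include <cstdio>
#include <fstream>
#include <iostream>
#include <vector>

int main() {
    const std::size_t kChunk = 64 * 1024;     // arbitrary chunk size
    std::vector<char> buf(kChunk);

    // C++ streams: read in large sequential chunks, not one byte at a time.
    std::ifstream in("input.txt", std::ios::binary);
    std::size_t total_cpp = 0;
    while (in) {
        in.read(buf.data(), static_cast<std::streamsize>(buf.size()));
        std::size_t got = static_cast<std::size_t>(in.gcount());
        if (got == 0) break;
        total_cpp += got;
    }

    // C stdio: the same sequential pattern with FILE*.
    std::size_t total_c = 0;
    if (std::FILE* f = std::fopen("input.txt", "rb")) {
        std::size_t n;
        while ((n = std::fread(buf.data(), 1, buf.size(), f)) > 0)
            total_c += n;
        std::fclose(f);
    }

    std::cout << total_cpp << " bytes via ifstream, "
              << total_c << " bytes via FILE*\n";
}
```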