Why is reading lines from stdin much slower in C++ than Python?
I wanted to compare reading lines of string input from stdin using Python and C++ and was shocked to see my C++ code run an order of magnitude slower than the equivalent Python code. Since my C++ is rusty and I'm not yet an expert Pythonista, please tell me if I'm doing something wrong or if I'm misunderstanding something.
(TLDR answer: include the statement cin.sync_with_stdio(false) or just use fgets instead.
TLDR results: scroll all the way down to the bottom of my question and look at the table.)
C++ code:
```cpp
#include <iostream>
#include <string>
#include <time.h>

using namespace std;

int main() {
    string input_line;
    long line_count = 0;
    time_t start = time(NULL);
    int sec;
    int lps;

    // Keep reading while the stream is good. A getline call that reaches
    // end of input is not counted (this also skips a last line that has
    // no trailing newline).
    while (cin) {
        getline(cin, input_line);
        if (!cin.eof())
            line_count++;
    }

    sec = (int) (time(NULL) - start);
    cerr << "Read " << line_count << " lines in " << sec << " seconds.";
    if (sec > 0) {
        lps = line_count / sec;
        cerr << " LPS: " << lps << endl;
    } else
        cerr << endl;
    return 0;
}
// Compiled with:
// g++ -O3 -o readline_test_cpp foo.cpp
```
Python equivalent:
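```python
#!/usr/bin/env python
import time
import sys

count = 0
start_time = time.time()

# Count lines read from standard input.
for line in sys.stdin:
    count += 1

# Whole seconds elapsed; guard against division by zero on very fast runs.
delta_sec = int(time.time() - start_time)
if delta_sec > 0:
    lines_per_sec = int(round(count / delta_sec))
    print("Read {0} lines in {1} seconds. LPS: {2}".format(count, delta_sec,
                                                        lines_per_sec))
else:
    print("Read {0} lines in {1} seconds.".format(count, delta_sec))
```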
Here are my results:
```
$ cat test_lines | ./readline_test_cpp
Read 5570000 lines in 9 seconds. LPS: 618889
$ cat test_lines | ./readline_test.py
Read 5570000 lines in 1 seconds. LPS: 5570000
```
I should note that I tried this under both Mac OS X v10.6.8 (Snow Leopard) and Linux 2.6.32 (Red Hat Linux 6.2). The former is a MacBook Pro and the latter is a very beefy server, not that this is very pertinent.
```
$ for i in {1..5}; do echo "Test run $i at `date`"; echo -n "CPP:"; cat test_lines | ./readline_test_cpp ; echo -n "Python:"; cat test_lines | ./readline_test.py ; done
Test run 1 at Mon Feb 20 21:29:28 EST 2012
CPP: Read 5570001 lines in 9 seconds. LPS: 618889
Python:Read 5570000 lines in 1 seconds. LPS: 5570000
Test run 2 at Mon Feb 20 21:29:39 EST 2012
CPP: Read 5570001 lines in 9 seconds. LPS: 618889
Python:Read 5570000 lines in 1 seconds. LPS: 5570000
Test run 3 at Mon Feb 20 21:29:50 EST 2012
CPP: Read 5570001 lines in 9 seconds. LPS: 618889
Python:Read 5570000 lines in 1 seconds. LPS: 5570000
Test run 4 at Mon Feb 20 21:30:01 EST 2012
CPP: Read 5570001 lines in 9 seconds. LPS: 618889
Python:Read 5570000 lines in 1 seconds. LPS: 5570000
Test run 5 at Mon Feb 20 21:30:11 EST 2012
CPP: Read 5570001 lines in 10 seconds. LPS: 557000
Python:Read 5570000 lines in 1 seconds. LPS: 5570000
```
Tiny benchmark addendum and recap
For completeness, I thought I'd update the read speed for the same file on the same box with the original (synced) C++ code. Again, this is for a 100M line file on a fast disk. Here's the comparison, with several solutions/approaches:
Implementation | Lines per second |
---|---|
python (default) | 3,571,428 |
cin (default/naive) | 819,672 |
cin (no sync) | 12,500,000 |
fgets | 14,285,714 |
wc (not a fair comparison) | 54,644,808 |
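The question reports an fgets figure but does not show the code behind it. A minimal fgets-based line counter might look like the sketch below. The function name count_lines_fgets and the 1024-byte buffer are illustrative assumptions, not the code that produced the number in the table.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>

// Counts lines by reading with std::fgets into a fixed-size buffer.
// A line longer than the buffer arrives in several chunks, so a line is
// counted only when a chunk ends in '\n'. A final line without a trailing
// newline is counted once at the end.
long count_lines_fgets(std::FILE* in) {
    assert(in != nullptr);
    char buf[1024];  // assumed size; any size works, larger means fewer calls
    long count = 0;
    bool partial = false;  // true while inside a line not yet terminated
    while (std::fgets(buf, sizeof buf, in) != nullptr) {
        std::size_t len = std::strlen(buf);
        if (len > 0 && buf[len - 1] == '\n') {
            ++count;
            partial = false;
        } else {
            partial = true;
        }
    }
    if (partial) {
        ++count;  // last line had no '\n'
    }
    return count;
}
```

In the original program, the getline loop could be replaced by something like `long line_count = count_lines_fgets(stdin);`. The check for a trailing '\n' keeps a line longer than the buffer from being counted more than once.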
Recommended answer
tl;dr: Because of different default settings in C++ requiring more system calls.
By default, cin is synchronized with stdio, which causes it to avoid any input buffering. If you add this to the top of your main, you should see much better performance:
```cpp
std::ios_base::sync_with_stdio(false);
```
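Applied to the question's program, the change could look like the sketch below. It also replaces the while (cin) / eof() loop with the usual while (getline(...)) idiom. The names count_lines, count_lines_in and run_benchmark are hypothetical and used only for this illustration.

```cpp
#include <iostream>
#include <sstream>
#include <string>

// Idiomatic line counter. std::getline returns the stream, which converts
// to false once a read fails (EOF or error), so no separate eof() check is
// needed. A final line without a trailing '\n' is still counted.
long count_lines(std::istream& in) {
    std::string line;
    long count = 0;
    while (std::getline(in, line)) {
        ++count;
    }
    return count;
}

// Convenience wrapper for in-memory text, e.g. for quick tests.
long count_lines_in(const std::string& text) {
    std::istringstream in(text);
    return count_lines(in);
}

// Hypothetical replacement for the body of the question's main().
// sync_with_stdio(false) is called before any I/O on the standard streams.
int run_benchmark() {
    std::ios_base::sync_with_stdio(false);
    long line_count = count_lines(std::cin);
    std::cerr << "Read " << line_count << " lines.\n";
    return 0;
}
```

In a complete program, main would just `return run_benchmark();`. What matters is that sync_with_stdio(false) runs before anything is read from or written to the standard streams. If it is called after I/O has already happened, the effect is implementation-defined.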
Normally, when an input stream is buffered, the stream is read in larger chunks instead of one character at a time. This reduces the number of system calls, which are typically relatively expensive. However, since the FILE*-based stdio and iostreams often have separate implementations and therefore separate buffers, this could lead to a problem if both were used together. For example:
```cpp
int myvalue1;
cin >> myvalue1;
int myvalue2;
scanf("%d", &myvalue2);
```
If cin read more input than it actually needed, then the second integer value wouldn't be available to the scanf function, which has its own independent buffer. This would lead to unexpected results.
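The sketch below makes this failure mode concrete. It models an unsynchronized stream with a hypothetical block_buf stream buffer that fills a 4096-byte buffer using std::fread. This is a simplified stand-in for what a real library might do, not how any particular implementation of cin works.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <istream>
#include <streambuf>

// Hypothetical stream buffer that reads its FILE* in 4096-byte blocks with
// std::fread, the way an unsynchronized iostream implementation might.
// The FILE*'s read position therefore runs ahead of what the C++ stream
// has actually handed out to the program.
class block_buf : public std::streambuf {
public:
    explicit block_buf(std::FILE* f) : file_(f) { assert(f != nullptr); }

protected:
    // Called when the get area is exhausted: refill it with one large read.
    int_type underflow() override {
        if (gptr() < egptr()) return traits_type::to_int_type(*gptr());
        std::size_t n = std::fread(buf_, 1, sizeof buf_, file_);
        if (n == 0) return traits_type::eof();
        setg(buf_, buf_, buf_ + n);
        return traits_type::to_int_type(*gptr());
    }

private:
    std::FILE* file_;
    char buf_[4096];
};
```

Suppose "1\n2\n" is read through an istream built on block_buf. Then `in >> a` yields 1, but the single fread has already pulled "2\n" out of the FILE*. A following fscanf on the same FILE* finds nothing left and returns EOF, which is the same situation as scanf in the example above.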
To avoid this, streams are synchronized with stdio by default. One common way to achieve this is to have cin read each character one at a time as needed using stdio functions. Unfortunately, this introduces a lot of overhead. For small amounts of input this isn't a big problem, but when you are reading millions of lines, the performance penalty is significant.
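One way to picture the synchronized mode is as a stream buffer with no buffer of its own, which forwards every character request to stdio. The stdio_sync_buf class below is a hypothetical, simplified illustration of that idea, not library code. GCC's libstdc++, for instance, backs synchronized cin with a stdio-based stream buffer along these lines.

```cpp
#include <cassert>
#include <cstdio>
#include <istream>
#include <streambuf>

// Hypothetical unbuffered stream buffer that forwards every character
// request to C stdio. Because it never reads ahead, C++ and C code sharing
// the same FILE* always agree on the read position, at the cost of one or
// more stdio calls per character.
class stdio_sync_buf : public std::streambuf {
public:
    explicit stdio_sync_buf(std::FILE* f) : file_(f) { assert(f != nullptr); }

protected:
    // Peek at the next character: read it, then push it back.
    int_type underflow() override {
        int c = std::getc(file_);
        if (c == EOF) return traits_type::eof();
        std::ungetc(c, file_);
        return c;
    }

    // Consume the next character.
    int_type uflow() override {
        int c = std::getc(file_);
        return c == EOF ? traits_type::eof() : c;
    }

private:
    std::FILE* file_;
};
```

Because stdio_sync_buf never reads ahead, a line extracted through the istream leaves the FILE* positioned exactly after that line, so C and C++ code can share the stream safely. The price is a getc call for every character consumed plus a getc/ungetc pair for every peek. Over millions of lines, that adds up to the kind of overhead described above.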
Fortunately, the library designers decided that you should also be able to disable this feature to get improved performance if you know what you are doing, so they provided the sync_with_stdio method.