Why is reading lines from stdin much slower in C++ than Python?

2022-01-30 00:00:00 python benchmarking iostream c++ getline

I wanted to compare reading lines of string input from stdin using Python and C++ and was shocked to see my C++ code run an order of magnitude slower than the equivalent Python code. Since my C++ is rusty and I'm not yet an expert Pythonista, please tell me if I'm doing something wrong or if I'm misunderstanding something.

(TLDR answer: include the statement cin.sync_with_stdio(false), or just use fgets instead.

TLDR results: scroll all the way down to the bottom of my question and look at the table.)

C++ code:

    #include <iostream>
    #include <string>
    #include <time.h>

    using namespace std;

    int main() {
        string input_line;
        long line_count = 0;
        time_t start = time(NULL);
        int sec;
        int lps;

        // Read until the stream fails; skip the count on the read that hits EOF.
        while (cin) {
            getline(cin, input_line);
            if (!cin.eof())
                line_count++;
        };

        sec = (int) time(NULL) - start;
        cerr << "Read " << line_count << " lines in " << sec << " seconds.";
        if (sec > 0) {
            lps = line_count / sec;
            cerr << " LPS: " << lps << endl;
        } else
            cerr << endl;
        return 0;
    }

    // Compiled with:
    // g++ -O3 -o readline_test_cpp foo.cpp

Python equivalent:

    #!/usr/bin/env python
    import time
    import sys

    count = 0
    start = time.time()

    for line in sys.stdin:
        count += 1

    delta_sec = int(time.time() - start)
    # Only report a rate when at least one whole second has elapsed.
    if delta_sec > 0:
        lines_per_sec = int(round(count/delta_sec))
        print("Read {0} lines in {1} seconds. LPS: {2}".format(count, delta_sec,
           lines_per_sec))

Here are my results:

    $ cat test_lines | ./readline_test_cpp
    Read 5570000 lines in 9 seconds. LPS: 618889

    $ cat test_lines | ./readline_test.py
    Read 5570000 lines in 1 seconds. LPS: 5570000

I should note that I tried this under both Mac OS X v10.6.8 (Snow Leopard) and Linux 2.6.32 (Red Hat Linux 6.2). The former is a MacBook Pro, and the latter is a very beefy server, not that this is too pertinent.

    $ for i in {1..5}; do echo "Test run $i at `date`"; echo -n "CPP:"; cat test_lines | ./readline_test_cpp ; echo -n "Python:"; cat test_lines | ./readline_test.py ; done

    Test run 1 at Mon Feb 20 21:29:28 EST 2012
    CPP: Read 5570001 lines in 9 seconds. LPS: 618889
    Python:Read 5570000 lines in 1 seconds. LPS: 5570000
    Test run 2 at Mon Feb 20 21:29:39 EST 2012
    CPP: Read 5570001 lines in 9 seconds. LPS: 618889
    Python:Read 5570000 lines in 1 seconds. LPS: 5570000
    Test run 3 at Mon Feb 20 21:29:50 EST 2012
    CPP: Read 5570001 lines in 9 seconds. LPS: 618889
    Python:Read 5570000 lines in 1 seconds. LPS: 5570000
    Test run 4 at Mon Feb 20 21:30:01 EST 2012
    CPP: Read 5570001 lines in 9 seconds. LPS: 618889
    Python:Read 5570000 lines in 1 seconds. LPS: 5570000
    Test run 5 at Mon Feb 20 21:30:11 EST 2012
    CPP: Read 5570001 lines in 10 seconds. LPS: 557000
    Python:Read 5570000 lines in 1 seconds. LPS: 5570000

Tiny benchmark addendum and recap

For completeness, I thought I'd update the read speed for the same file on the same box with the original (synced) C++ code. Again, this is for a 100M line file on a fast disk. Here's the comparison, with several solutions/approaches:

    Implementation              Lines per second
    python (default)            3,571,428
    cin (default/naive)         819,672
    cin (no sync)               12,500,000
    fgets                       14,285,714
    wc (not fair comparison)    54,644,808

Recommended answer

tl;dr: Because of different default settings in C++ requiring more system calls.

By default, cin is synchronized with stdio, which causes it to avoid any input buffering. If you add this to the top of your main, you should see much better performance:

std::ios_base::sync_with_stdio(false);

Normally, when an input stream is buffered, instead of reading one character at a time, the stream will be read in larger chunks. This reduces the number of system calls, which are typically relatively expensive. However, since the FILE* based stdio and iostreams often have separate implementations and therefore separate buffers, this could lead to a problem if both were used together. For example:

    int myvalue1;
    cin >> myvalue1;
    int myvalue2;
    scanf("%d", &myvalue2);

If more input was read by cin than it actually needed, then the second integer value wouldn't be available for the scanf function, which has its own independent buffer. This would lead to unexpected results.

To avoid this, by default, streams are synchronized with stdio. One common way to achieve this is to have cin read each character one at a time as needed using stdio functions. Unfortunately, this introduces a lot of overhead. For small amounts of input, this isn't a big problem, but when you are reading millions of lines, the performance penalty is significant.

Fortunately, the library designers decided that you should also be able to disable this feature to get improved performance if you knew what you were doing, so they provided the sync_with_stdio method.
