在 C++ 中寻找搜索和替换的圣杯
最近我正在寻找一种方法来替换字符串中的标记,这基本上是查找和替换(但至少有一种额外的方法可以解决这个问题)并且看起来像是相当平庸的任务.我带来了几个可能的实现,但从性能的角度来看,它们都不令人满意.最好的成就是每次迭代约 50us.这种情况很理想,字符串的大小从未增长过,最初我省略了不区分大小写的要求
这是
您可以点击图例显示/隐藏特定系列.
结论
有两个突出的结果
- MFC模仿功能和
- 我自己的手动替换
这些函数至少比其他函数快一个数量级,那么有什么区别呢?
所有这些慢"函数都是用 C++ 编写的,而 fast 是用 C 编写的(不是纯 C,当输出大小增加时,我懒得处理输出缓冲区的 malloc/realloc).好吧,我想很明显,有时别无选择,只能求助于纯 C.出于安全原因和缺乏类型安全性,我个人反对使用 C.此外,编写高质量的 C 代码只需要更多的专业知识和注意力.
暂时不会标记为答案,等待对此结论的评论.
我要感谢所有积极参与讨论、提出想法并指出我示例中的不一致之处的人.
2019 年更新:
只是为了保留代码:https://github.com/kreuzerkrieg/string_search_replace
Nonius 结果:https://kreuzerkrieg.github.io/string_search_replace/
在 Ubuntu 18.04 上使用 gcc-9 运行
Recently I was looking for a way to replace tokens in string which is essentially find and replace (but there is at least one additional approach to the problem) and looks like quite banal task. I've came with couple of possible implementations but none of them was satisfying from performance point of view. Best achievement was ~50us per iteration. The case was ideal, the string was never growing in size and initially I've omitted the requirement to be case insensitive
Here is the code at Coliru
Results from my machine:
Boost.Spirit symbols result: 3421?=3421
100000 cycles took 6060ms.
Boyer-Moore results:3421?=3421
100000 cycles took 5959ms.
Boyer Moore Hospool result:3421?=3421
100000 cycles took 5008ms.
Knuth Morris Pratt result: 3421?=3421
100000 cycles took 12451ms.
Naive STL search and replace result: 3421?=3421
100000 cycles took 5532ms.
Boost replace_all result:3421?=3421
100000 cycles took 4860ms.
So the question, what takes so long in such a simple task? One can say, ok, simple task, go ahead and implement it better. But the reality is that 15 years old MFC naive implementation does the task order of magnitude faster:
CString FillTokenParams(const CString& input, const std::unordered_map<std::string, std::string>& tokens)
{
CString tmpInput = input;
for(const auto& token : tokens)
{
int pos = 0;
while(pos != -1)
{
pos = tmpInput.Find(token.first.c_str(), pos);
if(pos != -1)
{
int tokenLength = token.first.size();
tmpInput.Delete(pos, tokenLength);
tmpInput.Insert(pos, token.second.c_str());
pos += 1;
}
}
}
return tmpInput;
}
result:
MFC naive search and replace result:3421?=3421
100000 cycles took 516ms.
How come this clumsy code outperforms modern C++??? why other implementations were so slow? Am I missing something fundamental?
EDIT001: I've invested in this question, the code been profiled and triple checked. You can be unsatisfied with this and that, but std::string::replace is not what taking time. In any STL implementation search is what taking most of the time, boost spirit wastes time on allocation of tst (test node in evaluation tree I guess). I dont expect anyone pointing on a line in a function with "this is your problem" and poof, the problem is gone. The question is how did MFC manages to do the same work 10 times faster.
EDIT002: Just drilled down into MFC implementation of Find and wrote a function which mimics the MFC implementation
namespace mfc
{
std::string::size_type Find(const std::string& input, const std::string& subString, std::string::size_type start)
{
if(subString.empty())
{
return std::string::npos;
}
if(start < 0 || start > input.size())
{
return std::string::npos;
}
auto found = strstr(input.c_str() + start, subString.c_str());
return ((found == nullptr) ? std::string::npos : std::string::size_type(found - input.c_str()));
}
}
std::string MFCMimicking(const std::string& input, const std::unordered_map<std::string, std::string>& tokens)
{
auto tmpInput = input;
for(const auto& token : tokens)
{
auto pos = 0;
while(pos != std::string::npos)
{
pos = mfc::Find(tmpInput, token.first, pos);
if(pos != std::string::npos)
{
auto tokenLength = token.first.size();
tmpInput.replace(pos, tokenLength, token.second.c_str());
pos += 1;
}
}
}
return tmpInput;
}
Results:
MFC mimicking expand result:3421?=3421
100000 cycles took 411ms.
Meaning 4us. per call, go beat that C strstr
EDIT003: Compiling and running with -Ox
MFC mimicking expand result:3421?=3421
100000 cycles took 660ms.
MFC naive search and replace result:3421?=3421
100000 cycles took 856ms.
Manual expand result:3421?=3421
100000 cycles took 1995ms.
Boyer-Moore results:3421?=3421
100000 cycles took 6911ms.
Boyer Moore Hospool result:3421?=3421
100000 cycles took 5670ms.
Knuth Morris Pratt result: 3421?=3421
100000 cycles took 13825ms.
Naive STL search and replace result: 3421?=3421
100000 cycles took 9531ms.
Boost replace_all result:3421?=3421
100000 cycles took 8996ms.
running with -O2 (as in original measurements) but 10k cycles
MFC mimicking expand result:3421?=3421
10000 cycles took 104ms.
MFC naive search and replace result:3421?=3421
10000 cycles took 105ms.
Manual expand result:3421?=3421
10000 cycles took 356ms.
Boyer-Moore results:3421?=3421
10000 cycles took 1355ms.
Boyer Moore Hospool result:3421?=3421
10000 cycles took 1101ms.
Knuth Morris Pratt result: 3421?=3421
10000 cycles took 1973ms.
Naive STL search and replace result: 3421?=3421
10000 cycles took 923ms.
Boost replace_all result:3421?=3421
10000 cycles took 880ms.
解决方案
Ok, it will be a long story. Just to remind you questions asked.
- Why the search and replace using C++ (various approaches) so slow?
- Why the MFC search and replace so fast?
Surprisingly, both questions have the same answer. Because of C++ overhead. Yep. Our shiny modern C++ has an overhead which mostly we dismiss for the flexibility and elegance.
However, when it comes to sub-microsecond resolutions (not that C++ is not capable of doing things in nanosecond resolutions) the overhead becomes more prominent.
Let me showcase with the same code I've posted in the question, but it is more coherent with things done in each function.
Live On Coliru
It uses aforementioned Nonius (thanks to @sehe) and interactive results are here.
You can click the legend to show/hide particular series.
Conclusions
There are two outstanding results
- MFC mimicking function and
- my own manual replace
These functions at least one order of magnitude are faster than the rest, so what is the difference?
All these "slow" functions are written in C++ when the fast is written in C (not pure C, I was too lazy to deal with malloc/realloc of output buffers when the output grows in size). Well, I guess it is clear that sometimes there is no choice but resort to pure C. I personally against using C for security reasons and lack of type safety. In addition it simply requires more expertise and attention to write high quality C code.
I will not mark it as an answer for a while, waiting for comments on this conclusion.
I want to thank all those who actively participated in the discussion, raised ideas and pointed out inconsistencies in my example.
2019 update:
Just to preserve the code: https://github.com/kreuzerkrieg/string_search_replace
Nonius results: https://kreuzerkrieg.github.io/string_search_replace/
Ran with gcc-9 on Ubuntu 18.04
相关文章