What is the performance cost of having a virtual method in a C++ class?

2021-12-08 00:00:00 performance c++ virtual-functions

Having at least one virtual method in a C++ class (or in any of its parent classes) means that the class has a virtual table and every instance carries a virtual pointer.

So the memory cost is quite clear. The most important part is the per-instance cost, especially if the instances are small. For example, if they are just meant to contain an integer, a virtual pointer in every instance might double the size of the instances. As for the memory used by the virtual tables themselves, I guess it is usually negligible compared to the space used by the actual method code.

This brings me to my question: is there a measurable performance cost (i.e. a speed impact) to making a method virtual? Every call involves a lookup in the virtual table at runtime, so if the method is very short and is called very frequently, there might be a measurable performance hit. I guess it depends on the platform, but has anyone run some benchmarks?

The reason I am asking is that I came across a bug that turned out to be caused by a programmer forgetting to declare a method virtual. This is not the first time I have seen this kind of mistake. And I thought: why do we add the virtual keyword when it is needed, instead of removing it when we are absolutely sure it is not needed? If the performance cost is low, I think I will simply recommend the following to my team: make every method virtual by default, including the destructor, in every class, and only remove the keyword when you need to. Does that sound crazy to you?

Recommended answer

I ran some timings on a 3 GHz in-order PowerPC processor. On that architecture, a virtual function call takes 7 nanoseconds longer than a direct (non-virtual) function call.

So the cost is not really worth worrying about unless the function is something like a trivial Get()/Set() accessor, for which anything other than inlining is kind of wasteful. A 7 ns overhead on a function that inlines to 0.5 ns is severe; a 7 ns overhead on a function that takes 500 ms to execute is meaningless.

The big cost of virtual functions isn't really the lookup of a function pointer in the vtable (that's usually just a single cycle), but that the indirect jump usually cannot be branch-predicted. This can cause a large pipeline bubble, because the processor cannot fetch any instructions until the indirect jump (the call through the function pointer) has retired and a new instruction pointer has been computed. So the cost of a virtual function call is much bigger than it might seem from looking at the assembly... but still only 7 nanoseconds.

Andrew, Not Sure, and others also raise the very good point that a virtual function call may cause an instruction cache miss: if you jump to a code address that is not in cache, the whole program comes to a dead halt while the instructions are fetched from main memory. This is always a significant stall: on Xenon, about 650 cycles (by my tests).

However, this isn't a problem specific to virtual functions, because even a direct function call will cause a miss if you jump to instructions that aren't in cache. What matters is whether the function has been run recently (making it more likely to be in cache), and whether your architecture can predict static (non-virtual) branches and fetch those instructions into cache ahead of time. My PPC does not, but maybe Intel's most recent hardware does.

My timings control for the influence of icache misses on execution (deliberately, since I was trying to examine the CPU pipeline in isolation), so they do not include that cost.
