Preventing compiler optimizations while benchmarking

I recently came across this brilliant CppCon 2015 talk: Chandler Carruth, "Tuning C++: Benchmarks, and CPUs, and Compilers! Oh My!"

One of the techniques mentioned for preventing the compiler from optimizing code away is to use the functions below.

static void escape(void *p) {
  asm volatile("" : : "g"(p) : "memory");
}

static void clobber() {
  asm volatile("" : : : "memory");
}

void benchmark()
{
  std::vector<int> v;
  v.reserve(1);
  escape(v.data());
  v.push_back(10);
  clobber();
}

I'm trying to understand this. My questions are as follows.

1) What is the advantage of escape over clobber?

2) From the example above, it looks like clobber() prevents the previous statement (push_back) from being optimized away. If that's the case, why is the snippet below not correct?

void benchmark()
{
  std::vector<int> v;
  v.reserve(1);
  v.push_back(10);
  clobber();
}

If this wasn't confusing enough, folly (Facebook's C++ library) has an even stranger implementation.

Relevant snippet:

template <class T>
void doNotOptimizeAway(T&& datum) {
  asm volatile("" : "+r" (datum));
}

My understanding is that the above snippet informs the compiler that the assembly block will write to datum. But if the compiler finds there is no consumer of this datum, it can still optimize out the entity producing datum, right?

I assume this is not common knowledge, and any help is appreciated!

Recommended answer

tl;dr doNotOptimizeAway creates an artificial "use".

A little bit of terminology here: a "def" ("definition") is a statement that assigns a value to a variable; a "use" is a statement that uses the value of a variable to perform some operation.
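To make the terminology concrete, here is a minimal sketch. The function names live_def and dead_def are made up for illustration and do not come from the talk or from folly.

```cpp
// Hypothetical illustration of "def" and "use"; the names are assumptions.
int live_def(int a) {
  int x = a * 2;  // def of x
  return x + 1;   // use of x, so the def above is live
}

int dead_def(int a) {
  int x = a * 2;  // def of x with no later use: a dead def
  int y = a + 1;  // def of y
  return y;       // use of y, so the def of y is live
}
```

An optimizing compiler would be expected to drop the computation of x in dead_def, since no path from that def reaches a use of x. It may also warn about the unused variable.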

If, from the point immediately after a def, no path to the program exit encounters a use of the variable, that def is called dead, and the Dead Code Elimination (DCE) pass will remove it. This in turn may cause other defs to become dead (if the removed def was itself a use by virtue of having variable operands), and so on.

Imagine the program after the Scalar Replacement of Aggregates (SRA) pass, which turns the local std::vector into two variables, len and ptr. At some point the program assigns a value to ptr; that statement is a def.

Now, the original program didn't do anything with the vector; in other words, there weren't any uses of either len or ptr. Hence all of their defs are dead, and DCE can remove them, effectively removing all the code and making the benchmark worthless.

Adding doNotOptimizeAway(ptr) creates an artificial use, which prevents DCE from removing the defs. (As a side note, I see no point in the "+" in the "+r" constraint; "g" should have been enough.)
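As a rough sketch of the idea, assuming GCC or Clang (GNU extended asm), an input-only "g" operand is enough to make the empty asm statement count as a use. The names artificial_use, work and benchmark_work are hypothetical and are not folly's API.

```cpp
// Sketch for GCC/Clang only; artificial_use is a hypothetical helper.
// The empty asm statement takes `value` as an input operand, so the
// compiler has to treat the statement as a use of `value`.
template <class T>
inline void artificial_use(T const& value) {
  asm volatile("" : : "g"(value));
}

// Some computation whose result a benchmark would otherwise discard.
int work(int n) {
  int total = 0;
  for (int i = 1; i <= n; ++i) {
    total += i;  // defs of total
  }
  return total;
}

void benchmark_work() {
  int result = work(1000);  // def of result
  artificial_use(result);   // artificial use keeps the def live
}
```

Note that this only guarantees the value is considered needed. The compiler is still free to compute it in a cheaper way, for example by constant-folding work(1000) at compile time.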

A similar line of reasoning applies to memory loads and stores: a store (a def) is dead iff no path to the end of the program contains a load (a use) from the stored-to location. As tracking arbitrary memory locations is a lot harder than tracking individual pseudo-register variables, the compiler reasons conservatively: it treats a store as dead only if no path to the end of the program could possibly encounter a use of that store.

One such case is a store to a region of memory that is guaranteed not to be aliased: after that memory is deallocated, there cannot possibly be a use of that store that does not trigger undefined behaviour. IOW, there are no such uses.

Thus a compiler could eliminate the v.push_back call. But here comes escape: it causes v.data() to be considered arbitrarily aliased, as @Leon described above.

The purpose of clobber() in the example is to create an artificial use of all of the aliased memory. We have a store (from push_back), and the store is to a location that is globally aliased (due to escape(v.data())). Hence clobber() could potentially contain a use of that store (IOW, the store's side effect could be observable), and therefore the compiler is not allowed to remove the store.
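Putting the two helpers together, a timing loop might look like the sketch below. It assumes GCC or Clang; the function name time_push_back, the iteration count parameter and the use of std::chrono are assumptions added for illustration, not part of the talk.

```cpp
#include <chrono>
#include <vector>

// Helpers as given in the question (GNU extended asm, GCC/Clang only).
static void escape(void* p) { asm volatile("" : : "g"(p) : "memory"); }
static void clobber() { asm volatile("" : : : "memory"); }

// Hypothetical harness: returns the elapsed time in nanoseconds.
long long time_push_back(int iterations) {
  auto start = std::chrono::steady_clock::now();
  for (int i = 0; i < iterations; ++i) {
    std::vector<int> v;
    v.reserve(1);
    escape(v.data());  // the buffer is now treated as globally aliased
    v.push_back(10);   // store into the escaped buffer
    clobber();         // potential use of all aliased memory
  }
  auto stop = std::chrono::steady_clock::now();
  return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start)
      .count();
}
```

Calling time_push_back(1000000) and dividing the result by the iteration count gives a rough per-iteration figure. The measurement includes allocation and deallocation, so it is not the cost of push_back alone.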

Some simpler examples:

Example I:

void f() {
  int v[1];
  v[0] = 42;
}

This generates no code.

Example II:

extern void g();

void f() {
  int v[1];
  v[0] = 42;
  g();
}

This generates just a call to g() and no memory store. The function g cannot possibly access v, because v is not aliased.

Example III:

void clobber() {
  __asm__ __volatile__ ("" : : : "memory");
}

void f() {
  int v[1];
  v[0] = 42;
  clobber();
}

As in the previous example, no store is generated, because v is not aliased and the call to clobber is inlined to nothing.

Example IV:

template<typename T>
void use(T &&t) {
  __asm__ __volatile__ ("" :: "g" (t));
}

void f() {
  int v[1];
  use(v);
  v[0] = 42;
}

This time v escapes (i.e. it can potentially be accessed from other activation frames). However, the store is still removed, since after it there are no potential uses of that memory (without UB).

Example V:

template<typename T>
void use(T &&t) {
  __asm__ __volatile__ ("" :: "g" (t));
}

extern void g();

void f() {
  int v[1];
  use(v);
  v[0] = 42;
  g(); // same with clobber()
}

And finally we get the store, because v escapes and the compiler must conservatively assume that the call to g may access the stored value.

(For experiments: https://godbolt.org/g/rFviMI)
