What is more efficient? Using pow to square or just multiplying it by itself?

2021-12-20 00:00:00 optimization c c++

Which of these two methods is more efficient in C? And how about:

pow(x,3)

vs.

x*x*x // etc?

Recommended answer

UPDATE 2021

I've modified the benchmark code as follows:

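std::chrono is used for timing measurements instead of Boost
C++11 <random> is used instead of rand()
Repeated operations that could get hoisted out of the loop are avoided: the base parameter changes on every iteration.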

I get the following results with GCC 10 -O2 (in seconds):

    exp   c++ pow     c pow      x*x*x...
    2     0.204243    1.39962    0.0902527
    3     1.36162     1.38291    0.107679
    4     1.37717     1.38197    0.106103
    5     1.3815      1.39139    0.117097

GCC 10 -O3 is almost identical to GCC 10 -O2.

With GCC 10 -O2 -ffast-math:

    exp   c++ pow     c pow      x*x*x...
    2     0.203625    1.4056     0.0913414
    3     0.11094     1.39938    0.108027
    4     0.201593    1.38618    0.101585
    5     0.102141    1.38212    0.10662

With GCC 10 -O3 -ffast-math:

    exp   c++ pow      c pow      x*x*x...
    2     0.0451995    1.175      0.0450497
    3     0.0470842    1.20226    0.051399
    4     0.0475239    1.18033    0.0473844
    5     0.0522424    1.16817    0.0522291

With Clang 12 -O2:

    exp   c++ pow     c pow       x*x*x...
    2     0.106242    0.105435    0.105533
    3     1.45909     1.4425      0.102235
    4     1.45629     1.44262     0.108861
    5     1.45837     1.44483     0.1116

Clang 12 -O3 is almost identical to Clang 12 -O2.

With Clang 12 -O2 -ffast-math:

    exp   c++ pow      c pow        x*x*x...
    2     0.0233731    0.0232457    0.0231076
    3     0.0271074    0.0266663    0.0278415
    4     0.026897     0.0270698    0.0268115
    5     0.0312481    0.0296402    0.029811

Clang 12 -O3 -ffast-math is almost identical to Clang 12 -O2 -ffast-math.

Machine is Intel Core i7-7700K on Linux 5.4.0-73-generic x86_64.

Conclusions:

With GCC 10 (no -ffast-math), x*x*x... is always faster

With GCC 10 -O2 -ffast-math, std::pow is as fast as x*x*x... for odd exponents

With GCC 10 -O3 -ffast-math, std::pow is as fast as x*x*x... for all test cases, and is around twice as fast as -O2.

With GCC 10, C's pow(double, double) is always much slower

With Clang 12 (no -ffast-math), x*x*x... is faster for exponents greater than 2

With Clang 12 -ffast-math, all methods produce similar results

With Clang 12, pow(double, double) is as fast as std::pow for integral exponents

Writing benchmarks without having the compiler outsmart you is hard.
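When the exponent is a small non-negative integer that is not fixed in the source, one way to stay on the multiplication path is a small helper based on exponentiation by squaring. The following is only a sketch: the name `ipow` is hypothetical and does not come from the benchmark above, it assumes C++14 or later (for the loop inside a `constexpr` function), and it was not part of the measurements, so its speed relative to `std::pow` would need to be benchmarked separately.

```cpp
// Hypothetical helper, not part of the benchmark in this answer.
// Computes x raised to the non-negative integer power n using
// exponentiation by squaring (repeated multiplication only, no pow call).
constexpr double ipow(double x, unsigned n)
{
    double result = 1.0;
    while (n != 0) {
        if (n & 1u) {
            result *= x;   // this bit of the exponent is set: fold in the current power
        }
        x *= x;            // x, x^2, x^4, x^8, ...
        n >>= 1;
    }
    return result;
}

// Usable at compile time as well as at run time.
static_assert(ipow(2.0, 10) == 1024.0, "2^10 should be exactly 1024");
```

Because the multiplications happen in a different order than in `x*x*x*x*x`, the last bits of the result may differ from the hand-written product for inputs that are not exactly representable.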

I'll eventually get around to installing a more recent version of GCC on my machine and will update my results when I do so.

Here's the updated benchmark code:

    #include <cmath>
    #include <chrono>
    #include <iostream>
    #include <random>

    using Moment = std::chrono::high_resolution_clock::time_point;
    using FloatSecs = std::chrono::duration<double>;

    inline Moment now()
    {
        return std::chrono::high_resolution_clock::now();
    }

    // Defines test<num>(b, loops), which times `loops` evaluations of
    // `expression` while the base b changes on every iteration.
    #define TEST(num, expression) \
    double test##num(double b, long loops) \
    { \
        double x = 0.0; \
        auto startTime = now(); \
        for (long i=0; i<loops; ++i) \
        { \
            x += expression; \
            b += 1.0; \
        } \
        auto elapsed = now() - startTime; \
        auto seconds = std::chrono::duration_cast<FloatSecs>(elapsed); \
        std::cout << seconds.count() << "\t"; \
        return x; \
    }

    TEST(2, b*b)
    TEST(3, b*b*b)
    TEST(4, b*b*b*b)
    TEST(5, b*b*b*b*b)

    // std::pow with an integer exponent known at compile time.
    template <int exponent>
    double testCppPow(double base, long loops)
    {
        double x = 0.0;
        auto startTime = now();
        for (long i=0; i<loops; ++i)
        {
            x += std::pow(base, exponent);
            base += 1.0;
        }
        auto elapsed = now() - startTime;
        auto seconds = std::chrono::duration_cast<FloatSecs>(elapsed);
        std::cout << seconds.count() << "\t";
        return x;
    }

    // C's pow(double, double) with the exponent passed at run time.
    double testCPow(double base, double exponent, long loops)
    {
        double x = 0.0;
        auto startTime = now();
        for (long i=0; i<loops; ++i)
        {
            x += ::pow(base, exponent);
            base += 1.0;
        }
        auto elapsed = now() - startTime;
        auto seconds = std::chrono::duration_cast<FloatSecs>(elapsed);
        std::cout << seconds.count() << "\t";
        return x;
    }

    int main()
    {
        using std::cout;
        long loops = 100000000l;
        double x = 0;
        std::random_device rd;
        std::default_random_engine re(rd());
        std::uniform_real_distribution<double> dist(1.1, 1.2);
        cout << "exp\tc++ pow\tc pow\tx*x*x...";

        cout << "\n2\t";
        double b = dist(re);
        x += testCppPow<2>(b, loops);
        x += testCPow(b, 2.0, loops);
        x += test2(b, loops);

        cout << "\n3\t";
        b = dist(re);
        x += testCppPow<3>(b, loops);
        x += testCPow(b, 3.0, loops);
        x += test3(b, loops);

        cout << "\n4\t";
        b = dist(re);
        x += testCppPow<4>(b, loops);
        x += testCPow(b, 4.0, loops);
        x += test4(b, loops);

        cout << "\n5\t";
        b = dist(re);
        x += testCppPow<5>(b, loops);
        x += testCPow(b, 5.0, loops);
        x += test5(b, loops);

        // Print the accumulated sum so the computations cannot be discarded.
        std::cout << "\n" << x << "\n";
    }

OLD ANSWER, 2010

I tested the performance difference between x*x*... vs pow(x,i) for small i using this code:

    #include <cstdlib>
    #include <cmath>
    #include <iostream>
    #include <boost/date_time/posix_time/posix_time.hpp>

    inline boost::posix_time::ptime now()
    {
        return boost::posix_time::microsec_clock::local_time();
    }

    // Defines test<num>(b, loops), which evaluates `expression` ten times
    // per loop iteration and prints the elapsed time.
    #define TEST(num, expression) \
    double test##num(double b, long loops) \
    { \
        double x = 0.0; \
        boost::posix_time::ptime startTime = now(); \
        for (long i=0; i<loops; ++i) \
        { \
            x += expression; \
            x += expression; \
            x += expression; \
            x += expression; \
            x += expression; \
            x += expression; \
            x += expression; \
            x += expression; \
            x += expression; \
            x += expression; \
        } \
        boost::posix_time::time_duration elapsed = now() - startTime; \
        std::cout << elapsed << " "; \
        return x; \
    }

    TEST(1, b)
    TEST(2, b*b)
    TEST(3, b*b*b)
    TEST(4, b*b*b*b)
    TEST(5, b*b*b*b*b)

    template <int exponent>
    double testpow(double base, long loops)
    {
        double x = 0.0;
        boost::posix_time::ptime startTime = now();
        for (long i=0; i<loops; ++i)
        {
            x += std::pow(base, exponent);
            x += std::pow(base, exponent);
            x += std::pow(base, exponent);
            x += std::pow(base, exponent);
            x += std::pow(base, exponent);
            x += std::pow(base, exponent);
            x += std::pow(base, exponent);
            x += std::pow(base, exponent);
            x += std::pow(base, exponent);
            x += std::pow(base, exponent);
        }
        boost::posix_time::time_duration elapsed = now() - startTime;
        std::cout << elapsed << " ";
        return x;
    }

    int main()
    {
        using std::cout;
        long loops = 100000000l;
        double x = 0.0;
        cout << "1 ";
        x += testpow<1>(rand(), loops);
        x += test1(rand(), loops);

        cout << "\n2 ";
        x += testpow<2>(rand(), loops);
        x += test2(rand(), loops);

        cout << "\n3 ";
        x += testpow<3>(rand(), loops);
        x += test3(rand(), loops);

        cout << "\n4 ";
        x += testpow<4>(rand(), loops);
        x += test4(rand(), loops);

        cout << "\n5 ";
        x += testpow<5>(rand(), loops);
        x += test5(rand(), loops);

        // Print the accumulated sum so the computations cannot be discarded.
        cout << "\n" << x << "\n";
    }

Results are:

    1 00:00:01.126008 00:00:01.128338
    2 00:00:01.125832 00:00:01.127227
    3 00:00:01.125563 00:00:01.126590
    4 00:00:01.126289 00:00:01.126086
    5 00:00:01.126570 00:00:01.125930
    2.45829e+54

Note that I accumulate the result of every pow calculation to make sure the compiler doesn't optimize it away.
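The accumulation pattern can be isolated into a small reusable helper. This is a minimal sketch under stated assumptions, not the code used for the measurements: `timeLoop` is a hypothetical name, it uses `std::chrono::steady_clock` rather than the Boost timer above, and it also changes the base on every iteration (as the 2021 update does) so the call cannot be treated as loop-invariant.

```cpp
#include <chrono>
#include <utility>

// Hypothetical helper: calls f(base) `loops` times and returns
// {accumulated sum, elapsed seconds}.
template <typename F>
std::pair<double, double> timeLoop(F f, double base, long loops)
{
    using Clock = std::chrono::steady_clock;

    double sum = 0.0;
    const auto start = Clock::now();
    for (long i = 0; i < loops; ++i) {
        sum += f(base);  // accumulate so every call contributes to the result
        base += 1.0;     // vary the input so the call is not loop-invariant
    }
    const std::chrono::duration<double> elapsed = Clock::now() - start;

    return {sum, elapsed.count()};
}
```

A call might look like `auto r = timeLoop([](double b) { return b * b * b; }, 1.1, 100000000L);`. The caller still has to use `r.first` in some observable way, for example by printing it, otherwise the compiler remains free to remove the work.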

If I use the std::pow(double, double) version, and loops = 1000000l, I get:

    1 00:00:00.011339 00:00:00.011262
    2 00:00:00.011259 00:00:00.011254
    3 00:00:00.975658 00:00:00.011254
    4 00:00:00.976427 00:00:00.011254
    5 00:00:00.973029 00:00:00.011254
    2.45829e+52

This is on an Intel Core Duo running 64-bit Ubuntu 9.10. Compiled using gcc 4.4.1 with -O2 optimization.

So in C, yes, x*x*x will be faster than pow(x, 3), because there is no pow(double, int) overload. In C++, it will be roughly the same. (Assuming the methodology in my testing is correct.)

This is in response to the comment made by An Markm:

Even if a using namespace std directive was issued, if the second parameter to pow is an int, then the std::pow(double, int) overload from <cmath> will be called instead of ::pow(double, double) from <math.h>.

This test code confirms that behavior:

    #include <iostream>

    namespace foo
    {
        // Stands in for std::pow(double, int).
        double bar(double x, int i)
        {
            std::cout << "foo::bar\n";
            return x*i;
        }
    }

    // Stands in for ::pow(double, double).
    double bar(double x, double y)
    {
        std::cout << "::bar\n";
        return x*y;
    }

    using namespace foo;

    int main()
    {
        double a = bar(1.2, 3);  // Prints "foo::bar"
        std::cout << a << "\n";
        return 0;
    }
