在 C++11 中计算编译时(constexpr)中的斐波那契数(递归方法)
我在编译时编写了斐波那契数计算程序(constexpr)使用 C++11 支持的模板元编程技术的问题.目的这是为了计算模板元编程方法与旧的传统方法之间的运行时间差异.
I wrote the program Fibonacci number calculation in compile time (constexpr) problem using the template metaprogramming techniques supported in C++11. The purpose of this is to calculate the difference in the run-time between the template metaprogramming approach and the old conventional approach.
// Template Metaprograming Approach
template<int N>
constexpr int fibonacci() {return fibonacci<N-1>() + fibonacci<N-2>(); }
template<>
constexpr int fibonacci<1>() { return 1; }
template<>
constexpr int fibonacci<0>() { return 0; }
// Conventional Approach
int fibonacci(int N) {
if ( N == 0 ) return 0;
else if ( N == 1 ) return 1;
else
return (fibonacci(N-1) + fibonacci(N-2));
}
我在我的 GNU/Linux 系统上为 N = 40 运行了两个程序,并测量了时间和发现传统解决方案(1.15 秒)比基于模板的解决方案(0.55 秒)慢大约两倍.这是一项重大改进,因为这两种方法都基于递归.
I ran both programs for N = 40 on my GNU/Linux system and measured the time and found that that conventional solution (1.15 second) is around two times slower than the template-based solution (0.55 second). This is a significant improvement as both approaches are based on the recursion.
为了更深入地理解它,我在 g++ 中编译了程序(-fdump-tree-all 标志),发现编译器实际上生成了 40 个不同的函数(如 fibonacci<40>、fibonacci<39>...斐波那契<0>).
To understand it more I compiled the program (-fdump-tree-all flag) in g++ and found that compiler actually generated the 40 different functions (like fibonacci<40>, fibonacci<39>...fibonacci<0>).
constexpr int fibonacci() [with int N = 40] () {
int D.29948, D.29949, D.29950;
D.29949 = fibonacci<39> ();
D.29950 = fibonacci<38> ();
D.29948 = D.29949 + D.29950;
return D.29948;
}
constexpr int fibonacci() [with int N = 39] () {
int D.29952, D.29953, D.29954;
D.29953 = fibonacci<38> ();
D.29954 = fibonacci<37> ();
D.29952 = D.29953 + D.29954;
return D.29952;
}
...
...
...
constexpr int fibonacci() [with int N = 0] () {
int D.29962;
D.29962 = 0;
return D.29962;
}
我也在GDB中调试了程序,发现上面的功能都是执行次数与传统递归方法相同.如果程序的两个版本都以相同的次数(递归)执行该函数,那么模板元编程技术如何实现这一点?我还想知道您对基于模板元编程的方法与其他版本相比如何以及为什么要花费一半时间的看法?这个程序能不能比现在的程序更快?
I also debugged the program in GDB and found that all the above functions are executed an equal number of times as with the conventional recursive approach. If both versions of the program are executing the function an equal number of times (recursive), then how is this achieved by template metaprogramming techniques? I would also like to know your opinion about how and why a template metaprogramming based approach is taking half time compared to the other version? Can this program be made faster than the current one?
基本上,我的目的是尽可能多地了解内部发生的事情.
Basically my intention here is to understand what's going on internally as much as possible.
我的机器是带有 GCC 4.8.1 的 GNU/Linux,我对两个程序都使用了优化 -o3
.
My machine is GNU/Linux with GCC 4.8.1, and I used the optimization -o3
for both programs.
推荐答案
试试这个:
template<size_t N>
struct fibonacci : integral_constant<size_t, fibonacci<N-1>{} + fibonacci<N-2>{}> {};
template<> struct fibonacci<1> : integral_constant<size_t,1> {};
template<> struct fibonacci<0> : integral_constant<size_t,0> {};
使用 clang 和 -Os
,编译时间大约为 0.5 秒,零 时间运行 N=40
.您的传统"方法大约在 0.4 秒内编译并在 0.8 秒内运行.只是为了检查,结果是102334155
对吗?
With clang and -Os
, this compiles in roughly 0.5s and runs in zero time for N=40
. Your "conventional" approach compiles in roughly 0.4s and runs in 0.8s. Just for checking, the result is 102334155
right?
当我尝试您自己的 constexpr
解决方案时,编译器运行了几分钟,然后我停止了它,因为显然内存已满(计算机开始冻结).编译器正在尝试计算最终结果,而您的实现在编译时使用效率极低.
When I tried your own constexpr
solution the compiler run for a couple of minutes and then I stopped it because apparently memory was full (computer started freezing). The compiler was trying to compute the final result and your implementation is extremely inefficient to be used at compile time.
使用此解决方案,在实例化 N
时会重复使用 N-2
、N-1
处的模板实例化.所以 fibonacci<40>
实际上在编译时作为一个值被知道,在运行时没有什么可做的.这是一种动态编程方法,当然,如果您在 N<计算之前将所有值存储在
0
到 N-1
处,当然您可以在运行时执行相同的操作/代码>.
With this solution, template instantiations at N-2
, N-1
are re-used when instantiating N
. So fibonacci<40>
is actually known at compile time as a value, and there is nothing to do at run-time. This is a dynamic programming approach and of course you can do the same at run time if you store all values at 0
through N-1
before computing at N
.
使用您的解决方案,编译器可以在编译时评估fibonacci
,但不需要.在您的情况下,所有或部分计算都留给运行时.就我而言,所有计算都是在编译时尝试的,因此永无止境.
With your solution, the compiler can evaluate fibonacci<N>()
at compile time but is not required to. In your case, all or part of computation is left for run time. In my case, all computation is attempted at compile time, hence never ending.
相关文章