带有递归算法的 SIGSEGV

2022-01-12 00:00:00 segmentation-fault gdb c++

我实现了一个递归算法，它运行了大约 1730 次递归，然后因神秘的 SIGSEGV 而崩溃.我尝试了我的 gdb 并得到以下输出:

I implemented a recursive algorithm which runs for about 1730 recursions and then crashes with a mysterious SIGSEGV. I tried my gdb and got the following output:

Program received signal SIGSEGV, Segmentation fault. 0x000000000040b7da in Town::get_cur_capacity (this=0x61efe0) at ./solver/Darstellung.cpp:54 54 return left_over_capacity; (gdb) print 0x61efe0 $1 = 6418400 (gdb) print *0x61efe0 Cannot access memory at address 0x61efe0 (gdb) print this $2 = (const Town * const) 0x61efe0 (gdb)

调试器怎么可能知道它应该是一个 const Town 指针，但不能访问内存给我一个转储?我很确定这个方法没有错误，因为它在崩溃前被使用了几千次，就像程序中的所有其他函数一样.有没有可能这是与操作系统相关的问题?我正在使用 Linux Ubuntu 64 位.

How can it be that the debugger does know that it should be a const Town pointer but is not able to access the memory to give me a dump? I am quite sure that there is no error within this method as it is used several thousand times before the crash, like every other function within the Program. Is there any chance that this is a OS related problem? I am using Linux Ubuntu 64-Bit.

我的简化算法:

bool solveproblem(ptr_to_model) { if( way_one(ptr_to_model) ) return true; if(way_two(ptr_to_model) ) return true; if( check_if_solved) return true; return false; } bool way_one(ptr_to_model) { for(go_through_current_problem_configuration) { if(check_stuff) { ptr_to_model->execute_partial_solution(...); //adds another problem configuration to the stack within the model if(solveproblem(ptr_to_model)) return true; ptr_to_model->redo_last_step(); } } return false; } bool way_two(...) { /*basicly the same as way one*/ } bool check_if_solve(...) { if(problem_solved) return true; else return false; }

该模型与名称相似，它代表了算法通过在其堆栈上推送一个新层"所完成的所有步骤，这是一个由旧问题提出的修改(跳跃简化)问题，考虑到部分算法评估的解决方案.希望我把它缩小到足够和可以理解的范围内.

The model is similiar to the name, it represents all steps the algorithm made through the time by pushing a new "layer" on its stack which is a modified (hopfully simplified) problem made out of the old one, considering the partial solution evaluated by the algorithm. Hope i narrowed it down enough and understandable.

推荐答案

如果您的递归深度达到 1700 级，那么您越过堆栈并损坏可能很容易导致此类崩溃的调用参数也就不足为奇了.

If you're 1700 levels deep in recursion it's not unbelievable that you overran your stack and corrupted a call parameter which could easily lead to this sort of crash.

如果您使用 g++，请尝试添加 -fstack-protector-all 以查看它是否有助于您获得更好的诊断.

If you use g++ try adding -fstack-protector-all to see if it helps you get better diagnostics.

另一个指标是，如果您在 gdb 中的回溯变得循环或不通向任何地方:这是一个强有力的指标，堆栈已损坏.

Another indicator is if your backtrace inside gdb becomes circular or doesn't lead anywhere: This is a strong indicator the stack has become corrupted.

对于评论的回应，没有一种万无一失的方法来确定某些事情是堆栈溢出还是更正常"的堆损坏.显然，如果 valgrind 可用，它始终是解决内存错误的可靠选择.您可以在 shell 中使用 ulimit 或(我相信)setrlimit 以编程方式配置堆栈限制.请注意，存在硬性上限，并且通常最好将递归更改为减少堆栈滥用而不是增加堆栈大小.

And in response to the comment, there isn't a sure-fire way to determine if something is a stack overflow or a "more normal" heap corruption. Obviously valgrind is always a solid option for memory errors if it's available. You can use ulimit in your shell or (I believe) setrlimit programmatically to configure the stack limit. Note that there are hard upper bound limits and that it's often better to change your recursion to be less stack-abusive rather than increasing the stack size.

相关文章