How to get a Lua stack trace from a core file using gdb

2021-12-30 00:00:00 lua gdb coredump c++

I have a C++ application (for OS X) that uses Lua as a scripting language. I'm running a large number of these applications (hundreds), and they can run for a very long time (days or weeks).

Sometimes one crashes, and when it does it leaves me a lovely core file.

I can open this core file in gdb and find where the application crashed. I can walk the call stack and find an instance of a lua_State variable. My problem is that I'd like to see what the Lua call stack looked like at that point.

Keep in mind that since this is a core file, I can't call C functions from the debugger, which rules out several of the usual ways of debugging Lua scripts.

I'd like to avoid adding manual traces through debug hooks, as I'm worried about the performance penalty and the added complexity.

How can I traverse the Lua internal structures to get at the call stack information?

Recommended answer

I've created a GDB script to do what is described on the web page linked to by macs. It's not beautiful, and should probably be wrapped properly into a function, but here it is for the curious.

NOTE: The web page seems to be wrong about the filename for Lua functions. When the chunk comes from luaL_dofile(), the filename starts with an @ symbol. When it comes from lua_dostring(), the $filename variable is set to the whole of the string passed to lua_dostring(), whereas the user is probably only interested in one or two lines of context from it. I wasn't sure how to fix that up.
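One possible way to tame the $filename value is to post-process the script's output outside gdb. The sketch below is plain C++ with no Lua dependency, loosely modelled on how Lua 5.1 shortens chunk names in its own error messages. The helper name short_source and the 60-character cap are assumptions made for illustration, not part of Lua or of the script.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: shorten a Lua chunk name for display in a stack trace.
//   "@a.lua"       -> "a.lua"                     (chunk loaded from a file)
//   "=stdin"       -> "stdin"                     (caller-supplied label)
//   anything else  -> [string "first line..."]    (chunk loaded from a string)
std::string short_source(const std::string& source, std::size_t max_len = 60)
{
    if (!source.empty() && (source[0] == '@' || source[0] == '=')) {
        return source.substr(1);
    }
    // The chunk is script text: keep only its first line.
    std::size_t end = source.find_first_of("\r\n");
    bool truncated = (end != std::string::npos);
    std::string first = source.substr(0, end);
    // Cap very long single-line scripts as well.
    if (first.size() > max_len) {
        first.resize(max_len);
        truncated = true;
    }
    return "[string \"" + first + (truncated ? "..." : "") + "\"]";
}
```

Feeding each $filename printed by the script through a filter like this would keep file-backed frames readable and collapse string chunks to a single line of context.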

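# Walk the CallInfo array from the base frame up to the current frame.
# (L->base_ci and L->ci are Lua 5.1 internals.)
set $p = L->base_ci
while ($p <= L->ci)
  if ($p->func->value.gc->cl.c.isC == 1)
    printf "0x%x   C FUNCTION", $p
    output $p->func->value.gc->cl.c.f
    printf "\n"
  else
    # tt == 6 is LUA_TFUNCTION
    if ($p->func->tt == 6)
      set $proto = $p->func->value.gc->cl.l.p
      # The characters of the source name are stored right after the TString header.
      set $filename = (char*)(&($proto->source->tsv) + 1)
      # savedpc points one past the instruction being executed, hence the -1.
      set $lineno = $proto->lineinfo[$p->savedpc - $proto->code - 1]
      printf "0x%x LUA FUNCTION : %d %s\n", $p, $lineno, $filename
    else
      printf "0x%x LUA BASE\n", $p
    end
  end
  set $p = $p + 1
end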
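The line-number lookup in the script is the least obvious step, so here is a minimal sketch of the same arithmetic in C++. MockProto is a simplified stand-in for Lua 5.1's Proto that models only the fields the lookup touches, and current_line is a hypothetical name. The bounds and null checks are additions that the gdb script does not perform.

```cpp
#include <cstddef>
#include <cstdint>

// Simplified stand-ins for the Lua 5.1 structures (assumption: only the
// fields used by the lookup are modelled here).
typedef std::uint32_t Instruction;

struct MockProto {
    const Instruction* code;      // bytecode array
    const int*         lineinfo;  // lineinfo[i] is the source line of code[i]
    int                sizecode;  // number of instructions in code
};

// savedpc points at the *next* instruction to run, so the instruction being
// executed sits one slot earlier; that is where the "- 1" comes from.
int current_line(const MockProto& p, const Instruction* savedpc)
{
    if (p.lineinfo == nullptr) {
        return -1;  // debug information was stripped
    }
    std::ptrdiff_t pc = (savedpc - p.code) - 1;
    if (pc < 0 || pc >= p.sizecode) {
        return -1;  // savedpc does not belong to this function
    }
    return p.lineinfo[pc];
}
```

A return value of -1 marks a frame whose line cannot be determined; the gdb script, by contrast, would read whatever memory the index happens to land on.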

This outputs something like:

0x1002b0 LUA BASE
0x1002c8 LUA FUNCTION : 4 @a.lua
0x1002e0 LUA FUNCTION : 3 @b.lua
0x100310   C FUNCTION(lua_CFunction) 0x1fda <crash_function(lua_State*)>

when I debug the crash from this code:

// This is a file designed to crash horribly when run.
// It should generate a core, and it should crash inside some lua functions

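// The Lua headers are plain C; extern "C" is needed when linking against a Lua library built as C.
extern "C" {
#include "lua.h"
#include "lualib.h"
#include "lauxlib.h"
}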

#include <iostream>
#include <signal.h>

int crash_function(lua_State * L)
{
  raise( SIGABRT ); //This should dump core!
  return 0;
}



int main()
{
  lua_State * L = luaL_newstate();
  lua_pushcfunction(L, crash_function);
  lua_setfield(L, LUA_GLOBALSINDEX, "C");

  luaopen_base(L);
  if( 1 == luaL_dofile(L, "a.lua" ))
  {
    std::cout<<"ERROR: "<<lua_tostring(L,-1)<<std::endl;
    return 1;
  }
  if( 1 == luaL_dofile(L, "b.lua" ))
  {
    std::cout<<"ERROR: "<<lua_tostring(L,-1)<<std::endl;
    return 1;
  }

  lua_getfield(L, LUA_GLOBALSINDEX, "A");
  lua_pcall(L, 0, 0, 0); // errfunc is an int stack index; 0 means no error handler
}

with a.lua:

-- a.lua
-- just calls B, which calls C which should crash
function A()
  B()
end

and b.lua:

-- b.lua
function B()
  C()
end
