Find function symbols from a corrupted stack
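During a job interview I was asked: what do you do when bt in gdb shows nothing but ??? I had run into this before (it happens when the stack has been overwritten), but all I could say was that I didn't know. After joining the company, I asked the colleague who had interviewed me and learned the trick: use the x command to dump memory from the address in the rsp register toward higher addresses and look for the calling functions further up. This helps narrow down where the problem is.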
Step by step
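For example, take the following code. In foo, memset writes from low addresses toward high addresses; it runs past the end of the array and zeroes the stack memory above it.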
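#include <string.h>

void foo(){
    char array[] = "abcdefg";   // 8 bytes, including the terminating '\0'
    memset(array, 0, 50);       // writes 50 bytes: overruns array and zeroes higher stack addresses
}

void bar(){
    char array[] = "abcdefg";
    foo();
}

int main(){
    bar();
    return 0;
}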
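Given the stack layout, the saved RBP and the return address lie at higher addresses than array, so the overflow zeroes them as well. When foo finishes, the address of its caller is gone: RIP would be restored to 0, and execution cannot continue.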
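To see this layout from inside a running program, here is a minimal sketch, assuming x86-64 with GCC or Clang. __builtin_frame_address and __builtin_return_address are compiler builtins; probe_frame and struct frame_layout are hypothetical names for illustration. It checks that a local array lies below the saved RBP slot, and that the word just above that slot holds the return address, which is exactly what an upward overflow from array runs into.

```c
#include <stdint.h>

/* Snapshot of one stack frame, taken from inside the function itself.
   Assumes x86-64 with a frame pointer (forced by __builtin_frame_address). */
struct frame_layout {
    uintptr_t array_addr;     /* address of the local array */
    uintptr_t frame_addr;     /* __builtin_frame_address(0): slot holding the saved RBP */
    uintptr_t ret_slot;       /* the word stored just above the saved RBP */
    uintptr_t return_address; /* __builtin_return_address(0), for comparison */
};

/* Hypothetical probe; noinline keeps it a real frame entered by a real call. */
__attribute__((noinline)) struct frame_layout probe_frame(void) {
    char array[] = "abcdefg";
    /* Tell the compiler the array is read, so it must live in stack memory. */
    __asm__ volatile("" : : "r"(array) : "memory");

    const uintptr_t *fp = (const uintptr_t *)__builtin_frame_address(0);
    struct frame_layout l;
    l.array_addr = (uintptr_t)array;
    l.frame_addr = (uintptr_t)fp;
    l.ret_slot = fp[1];       /* [rbp + 8] on x86-64 with a frame pointer */
    l.return_address = (uintptr_t)__builtin_return_address(0);
    return l;
}
```

After calling probe_frame(), l.frame_addr + 8 - l.array_addr is the number of bytes a write starting at array covers before it reaches the return address slot, which is why a large enough memset in foo destroys the caller's address.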
[Figure: stack frame layout (label: previous frame base)]
bt
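In gdb, bt prints the following. Ignoring the libc frames (the overwritten stack canary was detected, so __stack_chk_fail aborted the program), only foo is shown, because the rsp register itself is not overwritten. Unwinding stops at foo's zeroed return address, which appears as frame #7 at 0x0.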
(gdb) bt
#0 0x0000720946963a2c in ?? () from /usr/lib/libc.so.6
#1 0x00007209469091a0 in raise () from /usr/lib/libc.so.6
#2 0x00007209468f05fe in abort () from /usr/lib/libc.so.6
#3 0x00007209468f1697 in ?? () from /usr/lib/libc.so.6
#4 0x00007209469f0b80 in __fortify_fail () from /usr/lib/libc.so.6
#5 0x00007209469f1f54 in __stack_chk_fail () from /usr/lib/libc.so.6
#6 0x000062ef5ac461a1 in foo () at array.c:9
#7 0x0000000000000000 in ?? ()
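With x, however, main shows up, even though the call from bar does not. Print more words and __libc_start_main appears as well. The useful entries are the ones gdb annotates as <symbol+offset>, such as <foo+88> and <main+42>: they point into code and are candidate return addresses.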
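The annotation gdb adds with x/ag is what makes return addresses stand out among pointers, integers and the zeroed region. The sketch below imitates it in plain C, assuming a hypothetical hard-coded symbol table: the start addresses of foo and main are back-computed from the dump below (0x62ef5ac461a1 is foo+88, 0x62ef5ac4620e is main+42), while the function sizes are guesses. sym_lookup and annotate_words are illustrative names; a real tool would read the table from the binary's ELF symbols.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* One entry of a (hypothetical) symbol table. */
struct sym {
    const char *name;
    uint64_t start; /* first byte of the function */
    uint64_t size;  /* length in bytes (guessed here) */
};

/* Starts back-computed from the dump: foo+88 = 0x62ef5ac461a1 and
   main+42 = 0x62ef5ac4620e. Sizes are assumptions; bar is not listed. */
static const struct sym symtab[] = {
    {"foo", 0x62ef5ac46149, 0x60},
    {"main", 0x62ef5ac461e4, 0x40},
};

/* Return the function containing addr and store addr's offset in *off,
   or NULL if addr is not inside any known function. */
const struct sym *sym_lookup(uint64_t addr, uint64_t *off) {
    for (size_t i = 0; i < sizeof symtab / sizeof symtab[0]; i++) {
        if (addr >= symtab[i].start && addr - symtab[i].start < symtab[i].size) {
            *off = addr - symtab[i].start;
            return &symtab[i];
        }
    }
    return NULL;
}

/* Print n stack words starting at address base, one per line, appending
   <name+offset> to words that land inside a known function (in the spirit
   of gdb's x/ag). Returns the number of annotated words. */
size_t annotate_words(const uint64_t *words, size_t n, uint64_t base) {
    size_t hits = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t off;
        const struct sym *s = sym_lookup(words[i], &off);
        printf("%#llx: %#llx", (unsigned long long)(base + 8 * i),
               (unsigned long long)words[i]);
        if (s != NULL) {
            printf(" <%s+%llu>", s->name, (unsigned long long)off);
            hits++;
        }
        putchar('\n');
    }
    return hits;
}
```

Fed the twelve words from 0x7ffd8485e760 through 0x7ffd8485e7b8, it annotates exactly two of them, foo+88 and main+42.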
(gdb) x/150ag $rsp
// ...
0x7ffd8485e750: 0x7ffd8485e760 0x7209469f1f54
0x7ffd8485e760: 0x7ffd8485e790 0x62ef5ac461a1 <foo+88>
0x7ffd8485e770: 0x800 0x0
0x7ffd8485e780: 0x0 0x0
0x7ffd8485e790: 0x0 0x0
0x7ffd8485e7a0: 0x0 0x0
0x7ffd8485e7b0: 0x7ffd84850000 0x62ef5ac4620e <main+42>
0x7ffd8485e7c0: 0x67666564636261 0xb7d3eb533165fa00
0x7ffd8485e7d0: 0x7ffd8485e880 0x7209468f26c1
0x7ffd8485e7e0: 0x7ffd8485e8c0 0x7ffd8485e908
0x7ffd8485e7f0: 0x146af5000 0x62ef5ac461e4 <main>
0x7ffd8485e800: 0x7ffd8485e840 0x720946b136c6
0x7ffd8485e810: 0x0 0x670c08cb34d52e6e
0x7ffd8485e820: 0x7ffd8485e908 0x1
0x7ffd8485e830: 0x720946b2d000 <_rtld_global> 0x62ef5ac48dd8
0x7ffd8485e840: 0x670c08cb2a152e6e 0x7ce58cdea93f2e6e
0x7ffd8485e850: 0x7ffd00000000 0x0
0x7ffd8485e860: 0x0 0x0
0x7ffd8485e870: 0x7ffd8485e908 0xb7d3eb533165fa00
0x7ffd8485e880: 0x7ffd8485e8e0 0x7209468f27f9 <__libc_start_main+137>
Example 2
# gdb transcoder core.3000171.transcoder
Core was generated by `/opt/transcoder 20'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x00007f804e18396f in __memmove_avx_unaligned_erms () from /lib64/libc.so.6
[Current thread is 1 (Thread 0x7f8042a48700 (LWP 3000176))]
Missing separate debuginfos, use: yum debuginfo-install glibc-2.28-151.el8.x86_64 xz-libs-5.2.4-3.el8.x86_64 zlib-1.2.11-17.el8.x86_64
(gdb) bt
#0 0x00007f804e18396f in __memmove_avx_unaligned_erms () from /lib64/libc.so.6
#1 0x0000007f803c000b in ?? ()
#2 0x0d00007f8042a454 in ?? ()
#3 0xf000000001004779 in ?? ()
#4 0x0000000000000000 in ?? ()
(gdb)
Dump the stack with x from $rsp
(gdb) x/150ag $rsp
0x7f8042a453e8: 0x7f803c000b 0xd00007f8042a454
0x7f8042a453f8: 0xf000000001004779 0x0
0x7f8042a45408: 0x0 0x0
0x7f8042a45418: 0x0 0x0
0x7f8042a45428: 0x0 0x0
0x7f8042a45438: 0x0 0x0
0x7f8042a45448: 0x0 0x0
0x7f8042a45458: 0x0 0x0
0x7f8042a45468: 0x0 0x0
0x7f8042a45478: 0x0 0x0
0x7f8042a45488: 0x0 0x0
0x7f8042a45498: 0x0 0x0
0x7f8042a454a8: 0x0 0x9b
0x7f8042a454b8: 0x0 0x9b
0x7f8042a454c8: 0x0 0x0
0x7f8042a454d8: 0x20e43b0 0x76190e3c
0x7f8042a454e8: 0x1427dcf <TranscodeChannel+47> 0xa000000007
This turns up 0x1427dcf <TranscodeChannel+47> at 0x7f8042a454e8, but addr2line cannot map it to a source line:
# addr2line -e ./transcoder 0x1427dcf
??:?
Older (caller) frames live at higher addresses, so start further up the stack:
(gdb) x/150ag $rsp + 100
0x7f8042a456e8: 0x7f8042a47fc0 0x7f8042a45760
0x7f8042a456f8: 0x4707a3 <encode+334> 0x8
0x7f8042a45708: 0x1427dcf <stTranscodeChannel+47> 0x1429058 <stTranscodeChannel+4792>
0x7f8042a45718: 0x100000000 0xa0
0x7f8042a45728: 0x7f8042a46790 0x700000000
0x7f8042a45738: 0x0 0xa0
0x7f8042a45748: 0x7ffccf087a5f 0x0
0x7f8042a45758: 0x0 0x7f8042a47710
0x7f8042a45768: 0x470fe1 <msg_proc+1182> 0x0
0x7f8042a45778: 0x7f8042a47734 0x700000000
0x7f8042a45788: 0x7f8042a47744 0x0
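Not every code address on the stack is a return address: 0x1427dcf and 0x1429058 sit side by side at 0x7f8042a45708 and may well be stored function pointers rather than links in the call chain. A first-pass filter keeps only the words that fall inside the executable's code range and drops stack pointers, small integers and zeros. This is a minimal sketch, assuming a hypothetical text range of [0x400000, 0x2000000) for this non-PIE binary (in practice the bounds could be read from gdb's info files output); filter_code_pointers is an illustrative name.

```c
#include <stddef.h>
#include <stdint.h>

/* Copy to out[] every word that lies in [text_lo, text_hi), i.e. that could
   point into the executable's code. Order is preserved. Returns the count. */
size_t filter_code_pointers(const uint64_t *words, size_t n,
                            uint64_t text_lo, uint64_t text_hi,
                            uint64_t *out) {
    size_t k = 0;
    for (size_t i = 0; i < n; i++) {
        if (words[i] >= text_lo && words[i] < text_hi)
            out[k++] = words[i];
    }
    return k;
}
```

On the 22 words dumped above it keeps four: 0x4707a3, 0x1427dcf, 0x1429058 and 0x470fe1. The range test alone cannot tell which of them are real return addresses, which is why each candidate still needs checking.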
This turns up 0x4707a3 <encode+334> at 0x7f8042a456f8, and addr2line resolves it:
# addr2line -e ./transcoder 0x4707a3
/code/main/channel.c:975
# try some more
# addr2line -e ./transcoder 0x1427dcf
??:?
# addr2line -e ./transcoder 0x470fe1
/code/main/main/channel.c:1103
After cross-validating by other means, these results turned out to be reliable.
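The post does not say which other means were used. One common cross-check, offered here as an assumption rather than the author's method, is that a genuine return address must immediately follow a call instruction, because call pushes the address of the next instruction. The sketch below tests a few common x86-64 call encodings in a byte buffer; preceded_by_call is a hypothetical helper, and the bytes would come from the binary (for example by disassembling the code before the candidate in gdb). It is only a heuristic: it recognizes a handful of encodings and can be fooled by an 0xE8 byte that happens to sit five bytes back.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Heuristic, x86-64 only: does the instruction that ends right before
   code[ret_off] look like a call? code[] holds machine code and ret_off is
   the candidate return address minus the address of code[0].
   Recognizes only a few common encodings. */
bool preceded_by_call(const uint8_t *code, size_t ret_off) {
    /* call rel32: E8 + 4-byte displacement (5 bytes) */
    if (ret_off >= 5 && code[ret_off - 5] == 0xE8)
        return true;
    /* call [rip + disp32]: FF 15 + 4-byte displacement (6 bytes) */
    if (ret_off >= 6 && code[ret_off - 6] == 0xFF && code[ret_off - 5] == 0x15)
        return true;
    /* call reg: FF D0..D7 (2 bytes); with a REX prefix (e.g. 41 FF D0 for
       call r8) the last two bytes are the same, so this covers it too */
    if (ret_off >= 2 && code[ret_off - 2] == 0xFF && (code[ret_off - 1] & 0xF8) == 0xD0)
        return true;
    return false;
}
```

For example, the bytes 48 89 C7 FF D0 (mov rdi, rax; call rax) make offset 5 a plausible return address, while a run of NOPs does not.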