
Debugging Application Crash in Linux

Recently we had an issue with the rsyslog daemon: it was crashing every now and then, and the only practical way to debug such an application (especially if it’s not something developed by your team) is to use a core dump. It’s not the kind of task that you do every day, and it took me a while to search for and remember how I did it last time.

Enable Core Dump

The first step is to enable core dumps. It’s pretty simple: follow this guide and you can enable them in no time.

A core dump will only be generated for processes that were started after the above changes, so don’t forget to restart the daemon or application you have a problem with.
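One way to confirm that a restarted process really picked up the new limit is to look at its entry under /proc. The helper below is only a sketch and is not taken from the guide: the name core_soft_limit is made up for illustration, and it assumes the usual Linux layout of a /proc/<pid>/limits file.

```shell
# Hypothetical helper: print the soft "Max core file size" limit
# from a /proc/<pid>/limits file given as the first argument.
# "0" means no core file will be written; "unlimited" means no size cap.
core_soft_limit() {
    # Columns: Max core file size <soft> <hard> <units>, so the soft limit is field 5.
    awk '/^Max core file size/ { print $5 }' "$1"
}
```

For the daemon in this post it could be called as `core_soft_limit /proc/"$(pgrep -o rsyslogd)"/limits`; if it prints 0, the process was probably started before the change.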

Wait for the next crash!

Make coffee, enjoy life, and be prepared for the next crash :)

We use Zenoss to monitor important services such as apache and rsyslog. If a process is not running, we get an SMS right away.

Install debugging packages

Make sure you have the gdb package installed.

yum install gdb

GDB, the GNU Debugger, is the standard debugger for the GNU operating system.

If you are using CentOS, you might see messages like these during debugging:

Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libc.so.6

To debug properly and see the full stack trace, you need to install yum-utils, which provides the debuginfo-install command. Then install the debugging packages for the application with debuginfo-install. It installs the debuginfo packages, which contain the debugging symbols needed to debug the rsyslog daemon.

yum install yum-utils
debuginfo-install  rsyslog-5.8.10-8.el6.x86_64
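If gdb still prints "no debugging symbols found" afterwards, it helps to list exactly which files are missing symbols. This is a minimal sketch that assumes the message format shown above; missing_symbols is a made-up helper name, not part of gdb or yum-utils.

```shell
# Hypothetical helper: read gdb's startup output on stdin and print the
# path of every file reported with "(no debugging symbols found)".
missing_symbols() {
    sed -n 's/^Reading symbols from \(.*\)\.\.\.(no debugging symbols found)\.\.\.done\.$/\1/p'
}
```

Each path it prints can be passed to `rpm -qf` to find the package that owns the file, and that package name can then be given to debuginfo-install.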

Read the Core Dump

What you get out of the core dump is the stack trace of the application and the exact line of code that caused the failure; making sense of it requires some programming skills. I’m going to explain how I debugged and read the stack trace for rsyslog, but you can follow the same steps for any other application.

  • Run gdb on the core dump

Run the following command to start gdb on that specific core dump.

gdb /sbin/rsyslogd /tmp/core-rs-action-que-6-0-0-28309-1402524331
GNU gdb (GDB) Red Hat Enterprise Linux (7.2-60.el6_4.1)
...

Core was generated by `/sbin/rsyslogd -i /var/run/syslogd.pid -c 5'.
Program terminated with signal 6, Aborted.
#0  0x00007fea37ff0925 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
64  return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
(gdb)

Based on my configuration, the core dump was saved as /tmp/core-rs-action-que-6-0-0-28309-1402524331.
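The file name itself carries useful information. It looks like it was produced by a kernel.core_pattern along the lines of /tmp/core-%e-%s-%u-%g-%p-%t (executable name, signal, uid, gid, pid, timestamp), but that pattern is an assumption inferred from the name, not something stated in the configuration. Under that assumption, a small sketch like the following (parse_core_name is a made-up helper) splits the name into its fields:

```shell
# Hypothetical helper: split a core file name into its fields.
# ASSUMES the pattern core-%e-%s-%u-%g-%p-%t; adjust for other patterns.
parse_core_name() {
    name=${1##*/}                       # drop the directory part
    # Peel fields off from the right, because %e may itself contain hyphens.
    ts=${name##*-};  name=${name%-*}    # %t: time of dump, seconds since the epoch
    pid=${name##*-}; name=${name%-*}    # %p: process ID
    gid=${name##*-}; name=${name%-*}    # %g: group ID
    uid=${name##*-}; name=${name%-*}    # %u: user ID
    sig=${name##*-}; name=${name%-*}    # %s: signal number
    exe=${name#core-}                   # %e: whatever remains after "core-"
    echo "exe=$exe signal=$sig uid=$uid gid=$gid pid=$pid time=$ts"
}
```

For the file above, the signal field comes out as 6, which agrees with the "Program terminated with signal 6, Aborted." line that gdb prints.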

  • Now the gdb prompt is ready for a command.

If you are new to gdb, do yourself a favour and check out this gdb crash course.

The first command that I usually run, especially in this situation, is where. It prints the stack trace and the line of code that was running when the crash happened.

(gdb) where
#0  0x00007fea37ff0925 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1  0x00007fea37ff2105 in abort () at abort.c:92
#2  0x00007fea38fece17 in sigsegvHdlr (signum=6) at debug.c:830
#3  <signal handler called>
#4  0x00007fea37ff0925 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#5  0x00007fea37ff2105 in abort () at abort.c:92
#6  0x00007fea3802e837 in __libc_message (do_abort=2, fmt=0x7fea38116ac0 "*** glibc detected *** %s: %s: 0x%s ***\n")
    at ../sysdeps/unix/sysv/linux/libc_fatal.c:198
#7  0x00007fea38034166 in malloc_printerr (action=3, str=0x7fea38114bdc "invalid fastbin entry (free)", ptr=<value optimized out>)
    at malloc.c:6332
#8  0x00007fea38ffadf9 in qDelLinkedList (pThis=<value optimized out>) at queue.c:586
#9  0x00007fea38ffb95c in DoDeleteBatchFromQStore (pThis=0x7fea398d9da0, nElem=1) at queue.c:1340
#10 0x00007fea38ffe40d in DeleteBatchFromQStore (pThis=0x7fea398d9da0, pWti=<value optimized out>) at queue.c:1368
#11 DeleteProcessedBatch (pThis=0x7fea398d9da0, pWti=<value optimized out>) at queue.c:1428
#12 DequeueConsumableElements (pThis=0x7fea398d9da0, pWti=<value optimized out>) at queue.c:1457
#13 DequeueConsumable (pThis=0x7fea398d9da0, pWti=<value optimized out>) at queue.c:1505
#14 0x00007fea38ffe603 in DequeueForConsumer (pThis=<value optimized out>, pWti=<value optimized out>) at queue.c:1642
#15 ConsumerReg (pThis=<value optimized out>, pWti=<value optimized out>) at queue.c:1696
#16 0x00007fea38ff7126 in wtiWorker (pThis=0x7fea398da220) at wti.c:313
#17 0x00007fea38ff6c1a in wtpWorker (arg=0x7fea398da220) at wtp.c:387
#18 0x00007fea3897b9d1 in start_thread (arg=0x7fea17fff700) at pthread_create.c:301
#19 0x00007fea380a6b6d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
(gdb)

What I can read from this output is that the program crashed in queue.c, in the function qDelLinkedList at line 586. My gut feeling was that it had something to do with memory allocation, and frames #6 and #7 support that: glibc aborted the process with the message "invalid fastbin entry (free)". To dig deeper, I had to find the source code that matched the version of my application. We found out that the application crashed while calling free() on a variable (free is a C library function, not a system call, which is why the trace passes through glibc). That looked like a dead end to me! Fortunately, there was a newer libc package available for CentOS; we upgraded it, and so far everything works smoothly.
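When a daemon crashes more than once, it can be handy to collect the stack trace from each core dump without opening an interactive session. The following is a minimal sketch using gdb's -batch and -ex options; core_backtrace is a made-up helper name, and printing the trace of every thread (rather than only the crashing one) is my own choice here.

```shell
# Hypothetical helper: print the stack trace of every thread in a core dump.
# $1 = path to the executable, $2 = path to the core file.
core_backtrace() {
    [ -r "$1" ] || { echo "cannot read executable: $1" >&2; return 1; }
    [ -r "$2" ] || { echo "cannot read core file: $2" >&2; return 1; }
    # -batch makes gdb exit after running the commands given with -ex.
    gdb -batch -ex "thread apply all where" "$1" "$2"
}
```

For example, `core_backtrace /sbin/rsyslogd /tmp/core-rs-action-que-6-0-0-28309-1402524331 > backtrace.txt` would save the trace to a file that can be attached to a bug report.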

These steps are just a starting point for debugging a crash. I wish you a wonderful journey!