Use gdb to find where program stuck - c

My program is not working correctly.
It looks like it is stuck in an infinite loop or in a bad mutex lock/unlock, but I have no idea where the bug is.
I tried using gdb for debugging.
I can't use gdb's backtrace command because I haven't set a breakpoint.
And I can't set one because I have no idea where the error is.
Does gdb have a way to get a backtrace "on the fly"?

I can't use gdb backtrace command because I don't designate breakpoint.
Yes, you can.
All you need is for the inferior (the program being debugged) to be stopped somewhere.
When you first attach to the program, GDB will stop all threads, and you can examine where they are. Later, you can hit Ctrl-C, and again look at all threads. A useful command is thread apply all where.

Get your program's process ID from 'ps -ef', then use pstack to see exactly which function it is hung in. pstack prints an execution stack trace.
Example output:
$ pstack PROCESS_PID
#0 0x00000038cfaa664e in waitpid () from /lib64/libc.so.6
#1 0x000000000043ed42 in ?? ()
#2 0x000000000043ffbf in wait_for ()
#3 0x0000000000430bc9 in execute_command_internal ()
#4 0x0000000000430dbe in execute_command ()
#5 0x000000000041d526 in reader_loop ()
#6 0x000000000041ccde in main ()

Related

How to fix GDB not finding file: "../sysdeps/unix/sysv/linux/raise.c:50"

We're learning to use GDB in my Computer Architecture class. We do most of our work over SSH on a Raspberry Pi. When I run GDB on some code he gave us to debug, though, it ends with an error message saying it can't find raise.c.
I've tried:
installing libc6, libc6-dbg (says they're already up-to-date)
apt-get source glibc (gives me: "You must put some 'source' URIs in your sources.list")
https://stackoverflow.com/a/48287761/12015458 (apt source returns same thing as the apt-get source above, the "find $PWD" command the user gave returns nothing)
I've tried looking for it manually where I was told it might be (/lib/libc doesn't exist for me)
This is the code he gave us to try debugging on GDB:
#include <stdio.h>
main()
{
    int x,y;
    y=54389;
    for (x=10; x>=0; x--)
        y=y/x;
    printf("%d\n",y);
}
However, whenever I run the code in GDB I get the following error:
Program received signal SIGFPE, Arithmetic exception.
__GI_raise (sig=8) at ../sysdeps/unix/sysv/linux/raise.c:50
50 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
I asked him about it and he didn't really have any ideas on how to fix it.
It does not really matter that the source for raise() is not found. It would only show you the line where the exception is finally raised, but not the place where the error is triggered.
Run the erroneous program again in GDB. When the exception is raised, investigate the call stack and the stack frames with GDB's commands. This is the point of your task, so I won't give you more than this hint.
If you're clever you can see the error in the given source just by looking at it. ;-)
If GDB does not know any symbols, you need to compile with the -g option to get debugger support.
EDIT
Now, on a Windows system, this is my log (please excuse the colouring, I didn't find a language selector for pure text):
D:\tmp\StackOverflow\so_027 > type crash1.c
#include <stdio.h>
main()
{
int x,y;
y=54389;
for (x=10; x>=0; x--)
y=y/x;
printf("%d\n",y);
}
D:\tmp\StackOverflow\so_027 > gcc crash1.c -g -o crash1.out
crash1.c:2:1: warning: return type defaults to 'int' [-Wimplicit-int]
main()
^~~~
D:\tmp\StackOverflow\so_027 > dir
[...cut...]
04.09.2019 08:33 144 crash1.c
04.09.2019 08:40 54.716 crash1.out
D:\tmp\StackOverflow\so_027 > gdb crash1.out
GNU gdb (GDB) 8.1
[...cut...]
This GDB was configured as "x86_64-w64-mingw32".
[...cut...]
Reading symbols from crash1.out...done.
(gdb) run
Starting program: D:\tmp\StackOverflow\so_027\crash1.out
[New Thread 4520.0x28b8]
[New Thread 4520.0x33f0]
Thread 1 received signal SIGFPE, Arithmetic exception.
0x0000000000401571 in main () at crash1.c:7
7 y=y/x;
(gdb) backtrace
#0 0x0000000000401571 in main () at crash1.c:7
(gdb) help stack
Examining the stack.
The stack is made up of stack frames. Gdb assigns numbers to stack frames
counting from zero for the innermost (currently executing) frame.
At any time gdb identifies one frame as the "selected" frame.
Variable lookups are done with respect to the selected frame.
When the program being debugged stops, gdb selects the innermost frame.
The commands below can be used to select other frames by number or address.
List of commands:
backtrace -- Print backtrace of all stack frames
bt -- Print backtrace of all stack frames
down -- Select and print stack frame called by this one
frame -- Select and print a stack frame
return -- Make selected stack frame return to its caller
select-frame -- Select a stack frame without printing anything
up -- Select and print stack frame that called this one
Type "help" followed by command name for full documentation.
Type "apropos word" to search for commands related to "word".
Command name abbreviations are allowed if unambiguous.
(gdb) next
Thread 1 received signal SIGFPE, Arithmetic exception.
0x0000000000401571 in main () at crash1.c:7
7 y=y/x;
(gdb) next
[Inferior 1 (process 4520) exited with code 030000000224]
(gdb) next
The program is not being run.
(gdb) quit
D:\tmp\StackOverflow\so_027 >
Well, here GDB directly marks the erroneous source line. That differs from your environment, since you are using a Raspberry Pi. Still, the log shows some GDB commands you can try.
Concerning your video:
It is clear that inside raise() you can't access x. That's why GDB moans about it.
If an exception is raised, the program is usually about to quit, so there is no value in stepping forward.
Instead, as shown in my log, use GDB commands to investigate the stack frames. I think this is what the exercise is meant to teach you.
BTW, do you know that you should be able to copy the screen content? This will make reading so much easier for us.
From a practical standpoint the other answer is correct, but if you do want the libc sources:
apt-get source is the right way to get the sources of libc, but yes, you do need to have source repositories configured in /etc/apt/sources.list.
If you're using Ubuntu, see the deb-src lines in https://help.ubuntu.com/community/Repositories/CommandLine
For Debian, see https://wiki.debian.org/SourcesList#Example_sources.list
Then apt-get source should work. Remember to tell GDB where those sources are using the "directory" command.

Debugging functions in __libc_start_main

I'm writing a library that hooks some CUDA functions to add some functionality. The "constructor" hooks the CUDA functions and sets up a message queue and shared memory to communicate with other hooked CUDA binaries. When launching several hooked CUDA binaries (with Python's subprocess.Popen('<path-to-binary>', shell=True)), some processes hang. So I used gdb -p <pid> to attach to one of the suspended processes, hoping to figure out what's going wrong. Here's the result:
Attaching to process 7445
Reading symbols from /bin/dash...(no debugging symbols found)...done.
Reading symbols from /lib/x86_64-linux-gnu/libc.so.6...Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/libc-2.27.so...done.
done.
Reading symbols from /lib64/ld-linux-x86-64.so.2...Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/ld-2.27.so...done.
done.
0x00007f9cefe8b76a in wait4 () at ../sysdeps/unix/syscall-template.S:78
78 ../sysdeps/unix/syscall-template.S: No such file or directory.
(gdb) bt
#0 0x00007f9cefe8b76a in wait4 () at ../sysdeps/unix/syscall-template.S:78
#1 0x000055fff93be8a0 in ?? ()
#2 0x000055fff93c009d in ?? ()
#3 0x000055fff93ba6d8 in ?? ()
#4 0x000055fff93b949e in ?? ()
#5 0x000055fff93b9eda in ?? ()
#6 0x000055fff93b7944 in ?? ()
#7 0x00007f9cefdc8b97 in __libc_start_main (main=0x55fff93b7850, argc=3, argv=0x7ffca7c7beb8, init=<optimized out>,
fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffca7c7bea8) at ../csu/libc-start.c:310
#8 0x000055fff93b7a4a in ?? ()
I've added the -g flag, but it seems that the program hangs in wait4 before entering main.
Thanks for any insights on:
How can I load these debug symbols to get rid of ??
Where is ../csu/libc-start.c:310 located?
What else can I do to locate the bug?
System Info: gcc 6.5.0, Ubuntu 18.04 with 4.15.0-54-generic.
How can I load these debug symbols to get rid of ??
You appear to need the debug symbols for /bin/dash, which are probably going to be in a package called dash-dbg or dash-dbgsym or something like that.
Also, I suspect your stack trace would make more sense if you compiled your library with -fno-optimize-sibling-calls.
Where is ../csu/libc-start.c:310 located?
See this answer.
What else can I do to locate the bug?
You said that you are writing a library that uses __attribute__((constructor)), but you showed a stack trace for /bin/dash (which I presume is DASH and not a program you wrote) that does not appear to involve symbols from your library. I infer from this that your library is loaded with LD_PRELOAD into programs that are not expecting it to be there.
Both of those things -- LD_PRELOAD and __attribute__((constructor)) -- break the normal expectations of both whatever unsuspecting program is involved, and the C library. You should only do those things if you have no other choice, and you should try to do as little as possible within the injected code. (In particular, I do not think any design that involves spawning processes from a constructor function will be workable, period.) If you tell us about your larger goals we may be able to suggest alternative means that are less troublesome.
EDIT:
subprocess.Popen('<path-to-binary>', shell=True)
With shell=True, Python doesn't invoke the program directly, it runs a command of the form /bin/sh -c 'string passed to Popen'. In many cases this will naturally produce a /bin/dash process sleeping (not hung) in a wait syscall for the entire lifetime of the actual binary. Unless you actually need to evaluate some shell code before running the program, try the default shell=False instead and see if that makes your problem go away. (If you do need to evaluate shell code, try Popen('<shell code>; exec <binary>', shell=True).)

Stopping gdb while loop when receiving signal

I'm trying to find a segmentation fault in my program that doesn't happen all the time. I'm trying to run my program in a loop in gdb until the segmentation fault happens.
My problem is that gdb continues the while loop after receiving the segfault and doesn't give me the gdb prompt.
When I run gdb I use:
set $i=0
while($i<100)
set $i = $i+1
r
end
Does anybody know how to make gdb stop at the first segfault instead of running 100 times?
Thanks!
The gdb documentation is huge and it's difficult to find what you want but I could make that happen, and just by tweaking your script slightly.
When the program exits, gdb sets $_exitcode to its exit code.
If a segfault occurs, the value isn't changed. So my idea was to set it to some sentinel value (I chose 244) before each run. If $_exitcode is still 244 after the run command, the program did not exit normally, so the script leaves the loop (maybe there's another way to do it).
Warning: hack ahead (but it works)
set $i=0
while($i<100)
set $i = $i+1
set $_exitcode = 244
r
if $_exitcode==244
set $i = 200
end
end
I tested that with an interactive program. Type n for normal execution, and y to trigger a segfault (it is not guaranteed to crash, but there's a good chance that it will).
#include <stdio.h>
#include <stdlib.h>
int main()
{
    printf("want segfault?\n");
    char c = getchar();
    if (c=='y')
    {
        printf("%s", 'a'); // this is broken on purpose, to trigger segfault
    }
    return 0;
}
Testing in a gdb session:
(gdb) source gdbloop.txt
[New Thread 6216.0x1d2c]
want segfault?
n
[Inferior 1 (process 6216) exited normally]
[New Thread 7008.0x1264]
want segfault?
n
[Inferior 1 (process 7008) exited normally]
[New Thread 8000.0x2754]
want segfault?
y
Breakpoint 1, 0x76b2d193 in wtoi () from C:\windows\syswow64\msvcrt.dll
(gdb)
So I get the prompt back when a segfault is triggered.
You can script GDB interaction using expect.
But the solution from this answer should really be all you need here.
break on exit didn't work for me
It's possible that your program calls _exit instead of exit, so you may need to set a breakpoint there.
It's also possible that your program executes direct SYS_exit system call without going through either exit or _exit.
On Linux, you can catch this with:
catch syscall exit
catch syscall exit_group
At least one of the four variants (a breakpoint on exit, a breakpoint on _exit, or one of the two syscall catchpoints) should fire (just run a program by hand). Once you know which variant actually fires, attach commands to the corresponding breakpoint, and use the solution above.

gdb can't insert a breakpoint when attach to a process

I'm trying to attach gdb to a program started by socat like this:
socat TCP-LISTEN:5678,reuseaddr,fork EXEC:./test
In another terminal,
sudo gdb
attach `pidof socat`
br *0x080487D4
When I execute the continue command in gdb, it shows an error like this:
Warning:
Cannot insert breakpoint 1.
Cannot access memory at address 0x80487d4
Command aborted.
0x080487D4 is in the .text section of the test program. gdb's follow-fork-mode is set to child. I have searched online but still can't solve it.
I debugged the program successfully this way a month ago and don't know why it doesn't work now. Debugging the program directly with gdb works fine, like this:
gdb -q ./test
However, the way above doesn't meet my needs.
Through debugging, I think gdb expects that address to be a valid address in socat rather than in the test program. So how can I set breakpoints in the test program? Without breakpoints in the test program, it runs straight to the end when I execute the continue command. Setting breakpoints in socat itself is useless.
Any advice? Thanks in advance.
I have figured out how to set breakpoints in test program.
When the test program is started through socat, socat does not fork a test process until a socket connection arrives, so trying to set breakpoints in the test program right away fails.
I use a tool (pwntools, in my case) to connect to it and keep the connection suspended, then attach gdb to the forked test process. After that, I can debug normally.
Any better ideas? Thanks in advance.

No Debug Symbols cross compile ARM on BusyBox

I'm trying to debug a C program that runs on an ARM926EJ-S rev 5 (v5l). The software was cross-compiled (and statically linked) with the standard arm-linux-gnueabi compiler (installed via Synaptic). I run Ubuntu 13.04 64-bit. The device runs BusyBox v1.18.2. I successfully compiled gdbserver (with host=arm-linux-gnueabi) and gdb (with target=arm-linux-gnueabi) and can start my program on the embedded device via the locally running gdb...
My problem now is that I don't get a proper backtrace.
gdb output:
Remote debugging using 192.168.21.127:2345
0x0000a79c in ?? ()
(gdb) run
The "remote" target does not support "run". Try "help target" or "continue".
(gdb) continue
Continuing.
Cannot access memory at address 0x0
Program received signal SIGINT, Interrupt.
0x00026628 in ?? ()
(gdb) backtrace
#0 0x00026628 in ?? ()
#1 0x00036204 in ?? ()
#2 0x00036204 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb)
I tried compiling the software with -g, -g3 -gdwarf-2, -ggdb, and -ggdb3, without any difference.
Does anybody have an idea what I am missing here?
Is this maybe a problem with BusyBox, or do I need additional libraries on my host system?
I also tried the function backtrace_symbols from execinfo.h, with nearly the same output...
Thanks in advance for any reply.
Another way to debug is to run gdb on the board itself, following the steps below.
1) Start gdb and attach it to your process using the attach <pid> command.
2) Continue your process using the c command in gdb.
Whenever the process receives a SIGINT or SIGSEGV, inspect its stack using the bt command in gdb.

Resources