As mentioned in the title, gdb behaves weirdly when I try to set a breakpoint at linux socket functions such as send etc. I've read through similar threads where it's been suggested to use the debug argument, but I can't set it as I'm just messing around with different Linux programs/video games and I've noticed the same behaviour - for the most part, send can't be backtraced. Only the familiar "< memory address > in ??" messages are shown, and the addresses themselves don't point to anything (can't be retrieved). At the same time, the message buffer in send (or sendto etc) is stuck at one value and not updating (while all the other values such as len are, in real time). I suppose these are simply limitations of gdb, but I'd appreciate it if someone more knowledgeable could shed some light on the issue.
EDIT:
First I'll list my steps:
As an example, I'm trying to backtrace the sendto in openarena (free linux quake-based game). Openarena in particular is not ELF readable, but I get the same results with other ELF-readable files. Because it isn't ELF-readable, I can only attach to a running process. So I type gdb /usr/games/openarena -p < process name > , though I'm pretty sure the binary path is redundant in this case
(it still says "0x7ffc8a81ebe0s": not in executable format: file format not recognized, but I'm able to list functions and everything anyway) As a side note, attaching produces this bug:
((( https://forum.manjaro.org/t/critical-bug-gdb-broken-with-last-stable-update/53155
"Error while reading shared library symbols for /lib/x86_64-linux-gnu/libpthread.so.0:"
However, in my case, I'm still able to attach, but the program eventually crashes after complaining about not being able to find a thread. This also often happens when joining a server from a lobby, but this is a side note, as I've tested programs by running them directly from gdb as well which doesn't produce the error, but still led to this weird socket behaviour. )))
So after attaching to the process, I type source script, the script containing:
break sendto
commands 1
backtrace -raw-frame-arguments on
continue
end
I then resume the program and it's firing backtraces in realtime. This is sample output after joining a server:
Thread 1 "ioquake3" hit Breakpoint 1, __libc_sendto (fd=43, buf=0x7ffefc5028f0, len=32, flags=0, addr=..., addrlen=16) at ../sysdeps/unix/sysv/linux/sendto.c:25
25 in ../sysdeps/unix/sysv/linux/sendto.c
#0 __libc_sendto (fd=43, buf=0x7ffefc5028f0, len=32, flags=0, addr=..., addrlen=16) at ../sysdeps/unix/sysv/linux/sendto.c:25
#1 0x00005642f88bc5fc in ?? ()
#2 0x00005642f88baf94 in ?? ()
#3 0x00005642f888b1ee in ?? ()
#4 0x00005642f88779cf in ?? ()
#5 0x00005642f8886572 in ?? ()
#6 0x00005642f88a58bf in ?? ()
#7 0x00005642f886e3f5 in main ()
Thread 1 "ioquake3" hit Breakpoint 1, __libc_sendto (fd=43, buf=0x7ffefc5028f0, len=34, flags=0, addr=..., addrlen=16) at ../sysdeps/unix/sysv/linux/sendto.c:25
25 in ../sysdeps/unix/sysv/linux/sendto.c
#0 __libc_sendto (fd=43, buf=0x7ffefc5028f0, len=34, flags=0, addr=..., addrlen=16) at ../sysdeps/unix/sysv/linux/sendto.c:25
#1 0x00005642f88bc5fc in ?? ()
#2 0x00005642f88baf94 in ?? ()
#3 0x00005642f888b1ee in ?? ()
#4 0x00005642f88779cf in ?? ()
#5 0x00005642f8886572 in ?? ()
#6 0x00005642f88a58bf in ?? ()
#7 0x00005642f886e3f5 in main ()
As you can see, there is nothing between main and the send, the socket buffer is stuck at the same message, while len is updating correctly. I can perform any kind of actions, jump, shoot, and the output still stays the same. As I mentioned, I get pretty much the same output with other applications. There's some main function/loop, then nothing and then just the send function.
As for my system specs, I'm on Kubuntu 21.04,
GDB version is: GNU gdb (Ubuntu 10.1-2ubuntu2) 10.1.90.20210411-git
Glibc: Ubuntu GLIBC 2.33-0ubuntu5
I've migrated recently from an earlier LTS release, the upgrade might not have been entirely clean, I suppose...
As you can see, there is nothing between main and the send, the socket buffer is stuck at the same message, while len is updating correctly.
A few points:
There are no function names (which isn't the same as "nothing"). That is expected IF there is no symbol table. GDB uses symbol table(s) from loaded binaries to translate addresses into function names.
If the binary is fully stripped, or if the code generated into memory directly, or if the binary is decompressed or decrypted into memory, then you would need to teach GDB where it can get the symbol table from (if the symbol table exists at all, which isn't a given).
The socket buffer being "stuck" is not necessarily unexpected either: the program is very likely to be doing repeated sendto calls using the same stack buffer. Like this:
while (!error) {
char buf[4096];
int n = copy_to(buf); // fill buf[] with data
if (sendto(fd, buf, n, ...) != n) // handle error
}
Update:
I still don't quite understand why I can't see buffer values change.
You are not looking at the buffer contents, you are looking at the buffer address (i.e. &buf[0] given the code above).
If you want to look at the buffer contents, you need to print / examine it. E.g. to examine the first 8 bytes being sent, add this to your breakpoint command: x/8cx buf. But also note that it is common to have a fixed prefix on all the packets being sent, and it's not guaranteed that the 8 leading bytes will change on every packet either.
Related
We're learning to use GDB in my Computer Architecture class. To do this we do most of our work by using SSH to connect to a raspberry pi. When running GDB on some code he gave us to debug though it ends with an error message on how it can't find raise.c
I've tried:
installing libc6, libc6-dbg (says they're already up-to-date)
apt-get source glibc (gives me: "You must put some 'source' URIs in your sources.list")
https://stackoverflow.com/a/48287761/12015458 (apt source returns same thing as the apt-get source above, the "find $PWD" command the user gave returns nothing)
I've tried looking for it manually where told it may be? (/lib/libc doesn't exist for me)
This is the code he gave us to try debugging on GDB:
#include <stdio.h>
main()
{
int x,y;
y=54389;
for (x=10; x>=0; x--)
y=y/x;
printf("%d\n",y);
}
However, whenever I run the code in GDB I get the following error:
Program received signal SIGFPE, Arithmetic exception.
__GI_raise (sig=8) at ../sysdeps/unix/sysv/linux/raise.c:50
50 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
I asked him about it and he didn't really have any ideas on how to fix it.
It does not really matter that the source for raise() is not found. It would only show you the line where the exception is finally raised, but not the place where the error is triggered.
Run the erroneous program again in GDB. And when the exception is raised, investigate the call stack and the stackframes with GBDs commands. This is the point in your task, so I won't give you more than this hint.
If you're clever you can see the error in the given source just by looking at it. ;-)
When GDB does not know any symbol, you need to compile with the option -g to get debugger support.
EDIT
Now on a Windows system this is my log (please excuse the colouring, I didn't found a language selector for pure text):
D:\tmp\StackOverflow\so_027 > type crash1.c
#include <stdio.h>
main()
{
int x,y;
y=54389;
for (x=10; x>=0; x--)
y=y/x;
printf("%d\n",y);
}
D:\tmp\StackOverflow\so_027 > gcc crash1.c -g -o crash1.out
crash1.c:2:1: warning: return type defaults to 'int' [-Wimplicit-int]
main()
^~~~
D:\tmp\StackOverflow\so_027 > dir
[...cut...]
04.09.2019 08:33 144 crash1.c
04.09.2019 08:40 54.716 crash1.out
D:\tmp\StackOverflow\so_027 > gdb crash1.out
GNU gdb (GDB) 8.1
[...cut...]
This GDB was configured as "x86_64-w64-mingw32".
[...cut...]
Reading symbols from crash1.out...done.
(gdb) run
Starting program: D:\tmp\StackOverflow\so_027\crash1.out
[New Thread 4520.0x28b8]
[New Thread 4520.0x33f0]
Thread 1 received signal SIGFPE, Arithmetic exception.
0x0000000000401571 in main () at crash1.c:7
7 y=y/x;
(gdb) backtrace
#0 0x0000000000401571 in main () at crash1.c:7
(gdb) help stack
Examining the stack.
The stack is made up of stack frames. Gdb assigns numbers to stack frames
counting from zero for the innermost (currently executing) frame.
At any time gdb identifies one frame as the "selected" frame.
Variable lookups are done with respect to the selected frame.
When the program being debugged stops, gdb selects the innermost frame.
The commands below can be used to select other frames by number or address.
List of commands:
backtrace -- Print backtrace of all stack frames
bt -- Print backtrace of all stack frames
down -- Select and print stack frame called by this one
frame -- Select and print a stack frame
return -- Make selected stack frame return to its caller
select-frame -- Select a stack frame without printing anything
up -- Select and print stack frame that called this one
Type "help" followed by command name for full documentation.
Type "apropos word" to search for commands related to "word".
Command name abbreviations are allowed if unambiguous.
(gdb) next
Thread 1 received signal SIGFPE, Arithmetic exception.
0x0000000000401571 in main () at crash1.c:7
7 y=y/x;
(gdb) next
[Inferior 1 (process 4520) exited with code 030000000224]
(gdb) next
The program is not being run.
(gdb) quit
D:\tmp\StackOverflow\so_027 >
Well, it marks directly the erroneous source line. That is different to your environment as you use a Raspi. However, it shows you some GDB commands to try.
Concerning your video:
It is clear that inside raise() you can't access x. That's why GDB moans about it.
If an exception is raised usually the program is about to quit. So there is no value in stepping forward.
Instead, as shown in my log, use GDB commands to investigate the stack frames. I think this is the issue you are about to learn.
BTW, do you know that you should be able to copy the screen content? This will make reading so much easier for us.
From a practical standpoint the other answer is correct, but if you do want the libc sources:
apt-get source is the right way to get the sources of libc, but yes, you do need to have source repositories configured in /etc/apt/sources.list.
If you're using Ubuntu, see the deb-src lines in https://help.ubuntu.com/community/Repositories/CommandLine
For debian, see https://wiki.debian.org/SourcesList#Example_sources.list
Then apt-get source should work. Remember to tell GDB where those sources are using the "directory" command.
I'm writing a library that hooks some CUDA functions to add some functionality. The "constructor" hooks the CUDA functions and set up message queue and shared memory to communicate with other hooked CUDA binaries. When launching several hooked CUDA binaries (by python subprocess.Popen('<path-to-binary>', shell=True)) some processes hangs. So I used gdb -p <pid> to attach one suspended process, hoping to figure out what's going wrong. Here's the result:
Attaching to process 7445
Reading symbols from /bin/dash...(no debugging symbols found)...done.
Reading symbols from /lib/x86_64-linux-gnu/libc.so.6...Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/libc-2.27.so...done.
done.
Reading symbols from /lib64/ld-linux-x86-64.so.2...Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/ld-2.27.so...done.
done.
0x00007f9cefe8b76a in wait4 () at ../sysdeps/unix/syscall-template.S:78
78 ../sysdeps/unix/syscall-template.S: No such file or directory.
(gdb) bt
#0 0x00007f9cefe8b76a in wait4 () at ../sysdeps/unix/syscall-template.S:78
#1 0x000055fff93be8a0 in ?? ()
#2 0x000055fff93c009d in ?? ()
#3 0x000055fff93ba6d8 in ?? ()
#4 0x000055fff93b949e in ?? ()
#5 0x000055fff93b9eda in ?? ()
#6 0x000055fff93b7944 in ?? ()
#7 0x00007f9cefdc8b97 in __libc_start_main (main=0x55fff93b7850, argc=3, argv=0x7ffca7c7beb8, init=<optimized out>,
fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffca7c7bea8) at ../csu/libc-start.c:310
#8 0x000055fff93b7a4a in ?? ()
I've added -g flag but it seems that the program hangs on wait4 before entering main.
Thanks for any insights on:
How can I load these debug symbols to get rid of ??
Where is ../csu/libc-start.c:310 located?
What else can I do to locate the bug?
System Info: gcc 6.5.0, Ubuntu 18.04 with 4.15.0-54-generic.
How can I load these debug symbols to get rid of ??
You appear to need the debug symbols for /bin/dash, which are probably going to be in a package called dash-dbg or dash-dbgsym or something like that.
Also, I suspect your stack trace would make more sense if you compiled your library with -fno-optimize-sibling-calls.
Where is ../csu/libc-start.c:310 located?
See this answer.
What else can I do to locate the bug?
You said that you are writing a library that uses __attribute__((constructor)), but you showed a stack trace for /bin/dash (which I presume is DASH and not a program you wrote) that does not appear to involve symbols from your library. I infer from this, that your library is loaded with LD_PRELOAD into programs that are not expecting it to be there.
Both of those things -- LD_PRELOAD and __attribute__((constructor)) -- break the normal expectations of both whatever unsuspecting program is involved, and the C library. You should only do those things if you have no other choice, and you should try to do as little as possible within the injected code. (In particular, I do not think any design that involves spawning processes from a constructor function will be workable, period.) If you tell us about your larger goals we may be able to suggest alternative means that are less troublesome.
EDIT:
subprocess.Popen('<path-to-binary>', shell=True)
With shell=True, Python doesn't invoke the program directly, it runs a command of the form /bin/sh -c 'string passed to Popen'. In many cases this will naturally produce a /bin/dash process sleeping (not hung) in a wait syscall for the entire lifetime of the actual binary. Unless you actually need to evaluate some shell code before running the program, try the default shell=False instead and see if that makes your problem go away. (If you do need to evaluate shell code, try Popen('<shell code>; exec <binary>', shell=True).)
My program is not working correctly.
It looks like it is stuck in an infinite loop or a bad mutex lock/unlock. But, I have no idea where the bug is.
I tried using gdb for debugging.
I can't use gdb backtrace command because I don't designate breakpoint.
And I can't designate it because I don't have any idea where the error is.
Does gdb have instrument for backtrace "on the fly"?
I can't use gdb backtrace command because I don't designate breakpoint.
Yes, you can.
All you need is for the inferior (being debugged) program to be stopped somewhere.
When you first attach to the program, GDB will stop all threads, and you can examine where they are. Later, you can hit Ctrl-C, and again look at all threads. A useful command is thread apply all where.
Get the process ID from 'ps -ef' of your program. Use pstack to know exactly which function it's hung in. It will print out an execution stack trace.
Example output:
$ pstack PROCESS_PID
\#0 0x00000038cfaa664e in waitpid () from /lib64/libc.so.6
\#1 0x000000000043ed42 in ?? ()
\#2 0x000000000043ffbf in wait_for ()
\#3 0x0000000000430bc9 in execute_command_internal ()
\#4 0x0000000000430dbe in execute_command ()
\#5 0x000000000041d526 in reader_loop ()
\#6 0x000000000041ccde in main ()
My program loads a dynamic library, but after it tries to load it (it doesn't seem to, or at least something's amiss with the loading. A free() throws an error, and I commented out that line.)
I get the following in gdb.
Program received signal SIGSEGV, Segmentation fault.
__strlen_ia32 () at ../sysdeps/i386/i686/multiarch/../../i586/strlen.S:99
99 ../sysdeps/i386/i686/multiarch/../../i586/strlen.S: No such file or directory.
in ../sysdeps/i386/i686/multiarch/../../i586/strlen.S
How would I go about addressing this?
EDIT1:
The above issue was due to me not having an xml file where it should have been.
Here's the first error that I covered up to get to the initial error I showed.
(gdb) s
__dlopen (file=0xbfffd03c "/usr/lib/libvisual-0.5/actor/actor_AVS.so", mode=1)
at dlopen.c:76
76 dlopen.c: No such file or directory.
in dlopen.c
(gdb) bt
#0 __dlopen (file=0xbfffd03c "/usr/lib/libvisual-0.5/actor/actor_AVS.so",
mode=1) at dlopen.c:76
#1 0xb7f8680d in visual_plugin_get_references (
pluginpath=0xbfffd03c "/usr/lib/libvisual-0.5/actor/actor_AVS.so",
count=0xbfffd020) at lv_plugin.c:834
#2 0xb7f86168 in plugin_add_dir_to_list (list=0x804e428,
dir=0x804e288 "/usr/lib/libvisual-0.5/actor") at lv_plugin.c:609
#3 0xb7f86b2b in visual_plugin_get_list (paths=0x804e3d8,
ignore_non_existing=1) at lv_plugin.c:943
#4 0xb7f9c5db in visual_init (argc=0xbffff170, argv=0xbffff174)
at lv_libvisual.c:370
#5 0x080494b7 in main (argc=2, argv=0xbffff204) at client.c:32
(gdb) quit
A debugging session is active.
Inferior 1 [process 3704] will be killed.
Quit anyway? (y or n) y
starlon#lyrical:client$ ls /usr/lib/libvisual-0.5/actor/actor_AVS.so
/usr/lib/libvisual-0.5/actor/actor_AVS.so
starlon#lyrical:client$
The file exists. Not sure what's up. Not sure what code to provide either.
Edit2: More info on the file. Permissions are ok.
816K -rwxr-xr-x 1 root root 814K 2011-11-08 15:06 /usr/lib/libvisual-0.5/actor/actor_AVS.so
You didn't tell what dynamic library it is.
If it is a free dynamic library -or a library whose source is accessible to you- you can compile it and use it with debugging enabled.
Several Linux distributions -notably Debian & Ubuntu- provide debugging variant of many libraries (e.g. GLibc, GTK, Qt, etc...), so you don't need to rebuild them. For example, Debian has libgtk-3-0 package (the binary libraries mostly), libgtk-3-dev the development files for it (headers, etc...) and libgtk-3-0-dbg (the debugging variant of the library). You need to set LD_LIBRARY_PATH appropriately to use it (since it is in /usr/lib/debug/usr/lib/libgdk-3.so.0.200.1).
Sometimes, using the debugging variants of system libraries help you to find bugs in your own code. (Of course, you also need to compile with -g -Wall your own code)
Turned out this was due to a faulty hard drive. Looks like I need a new one.
I am analyzing this core dump
Program received signal SIGABRT, Aborted.
0xb7fff424 in __kernel_vsyscall ()
(gdb) where
#0 0xb7fff424 in __kernel_vsyscall ()
#1 0x0050cd71 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#2 0x0050e64a in abort () at abort.c:92
#3 0x08083b3b in ?? ()
#4 0x08095461 in ?? ()
#5 0x0808bdea in ?? ()
#6 0x0808c4e2 in ?? ()
#7 0x080b683b in ?? ()
#8 0x0805d845 in ?? ()
#9 0x08083eb6 in ?? ()
#10 0x08061402 in ?? ()
#11 0x004f8cc6 in __libc_start_main (main=0x805f390, argc=15, ubp_av=0xbfffef64, init=0x825e220, fini=0x825e210,
rtld_fini=0x4cb220 <_dl_fini>, stack_end=0xbfffef5c) at libc-start.c:226
#12 0x0804e5d1 in ?? ()
I'm not able to know which function ?? maps to OR for instance #10 0x08061402 in ?? ()
falls in which address range ...
Please help me debug this.
Your program has no debugging symbols. Recompile it with -g. Make sure you haven't stripped your executable, e.g. by passing -s to the linker.
Even though #user794080 didn't say so, it appears exceedingly likely that his program is a 32-bit linux executable.
There are two possible reasons (I can think of) for symbols from main executable (and all symbols in the stack trace in the range [0x08040000,0x08100000) are from the main executable) not to show up.
The main executable has in fact been stripped (this is the same as
ninjalj's answer), and often happens when '-s' is passed into the linker, perhaps inadvertently.
The executable has been compiled with a new(er) GCC, but is being debugged by an old(er) GDB, which chokes on some newer dwarf construct (there should be a warning from GDB about that).
To know what libraries are mapped into the application, record a pid of you program, stopped in gdb and run in other console
cat /proc/$pid/maps
wher $pid is the pid of stopped process. Format of the maps file is described at http://linux.die.net/man/5/proc - starting from "/proc/[number]/maps
A file containing the currently mapped memory regions and their access permissions."
Also, if your OS don't use a ASLR (address space layout randomization) or it is disabled for your program, you can use
ldd ./program
to list linked libraries and their memory ranges. But if ASLR is turned on, you will be not able to get real memory mapping ranges info, as it will change for each run of program. But even then you will know, what libraries are linked in dynamically and install a debuginfo for them.
The stack might be corrupted. The "??" can happen if the return address on the stack has been overwritten by, for example, a buffer overflow.