gdb: always stop at 0xffffe410 in __kernel_vsyscall ()

I'm using gdb to attach to a running process, but it always stops in __kernel_vsyscall. It looks like it stopped inside my msgrcv() system call. I have to keep typing "cont", and I don't know when it will leave the kernel and return to the application. How can I make it continue? My procedure is shown below.
How did I get this situation?
How to make it continue?
Thanks!
gdb
(gdb) attach PID
...
Reading symbols from /lib/ld-linux.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/ld-linux.so.2
0xffffe410 in __kernel_vsyscall ()
(gdb) bt
#0 0xffffe410 in __kernel_vsyscall ()
#1 0x009ed573 in msgrcv () from /lib/libc.so.6
#2 0xf7f3a487 in _UX_wgetmsg (mode=0, msgp=0xffbb4178, pmaxtime=0xffbb4164,
pdata=0xf7f7a860, ux_type=0) at ../../../ux/com_ux/libux/com/UXipc.c:2550
#3 0xf7f3ad05 in UX_wgetmsg_v2 (mode=0, msgp=0xffbb4178, maxtime=10000,
ux_type=0) at ../../../ux/com_ux/libux/com/UXipc.c:2237
#4 0x0804bb9b in main (argc=1, argv=0xffbb5394)
at /path/to/my_application:243

How did I get this situation?
That situation is completely normal when you attach to a process that is blocked in a system call (waiting for a message, or for a read to complete).
How to make it continue?
You type continue (at which point the application would again block, waiting for a message). If you want to debug some part of the application, set breakpoints before continuing.
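As a rough illustration of the "set breakpoints before continuing" advice, the sketch below builds a non-interactive gdb command line that attaches to a process, sets one breakpoint, continues, and prints a backtrace once the breakpoint is hit. The helper names gdb_attach_command and run_gdb are made up for this example; the only gdb features assumed are the -p, -batch, and -ex command-line options and the break, continue, and bt commands.

```python
import subprocess

def gdb_attach_command(pid, location):
    """Build a gdb argv that attaches to pid, breaks at location, and continues.

    `location` is anything gdb's `break` command accepts, such as a function
    name or FILE:LINE. This is a hypothetical convenience wrapper, not part
    of gdb itself.
    """
    return [
        "gdb", "-p", str(pid), "-batch",
        "-ex", "break " + location,   # set the breakpoint first ...
        "-ex", "continue",            # ... then let the process run until it is hit
        "-ex", "bt",                  # show where the process stopped
    ]

def run_gdb(pid, location):
    """Run gdb with the command line above and return its text output."""
    result = subprocess.run(gdb_attach_command(pid, location),
                            capture_output=True, text=True)
    return result.stdout
```

For example, run_gdb(pid, "_UX_wgetmsg") would wait until the application next enters _UX_wgetmsg, which for a process blocked in msgrcv() means waiting until a message arrives and the current call returns.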

Related

Debugging functions in __libc_start_main

I'm writing a library that hooks some CUDA functions to add some functionality. The "constructor" hooks the CUDA functions and sets up a message queue and shared memory to communicate with other hooked CUDA binaries. When I launch several hooked CUDA binaries (via Python's subprocess.Popen('<path-to-binary>', shell=True)), some processes hang. So I used gdb -p <pid> to attach to one of the hung processes, hoping to figure out what's going wrong. Here's the result:
Attaching to process 7445
Reading symbols from /bin/dash...(no debugging symbols found)...done.
Reading symbols from /lib/x86_64-linux-gnu/libc.so.6...Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/libc-2.27.so...done.
done.
Reading symbols from /lib64/ld-linux-x86-64.so.2...Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/ld-2.27.so...done.
done.
0x00007f9cefe8b76a in wait4 () at ../sysdeps/unix/syscall-template.S:78
78 ../sysdeps/unix/syscall-template.S: No such file or directory.
(gdb) bt
#0 0x00007f9cefe8b76a in wait4 () at ../sysdeps/unix/syscall-template.S:78
#1 0x000055fff93be8a0 in ?? ()
#2 0x000055fff93c009d in ?? ()
#3 0x000055fff93ba6d8 in ?? ()
#4 0x000055fff93b949e in ?? ()
#5 0x000055fff93b9eda in ?? ()
#6 0x000055fff93b7944 in ?? ()
#7 0x00007f9cefdc8b97 in __libc_start_main (main=0x55fff93b7850, argc=3, argv=0x7ffca7c7beb8, init=<optimized out>,
fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffca7c7bea8) at ../csu/libc-start.c:310
#8 0x000055fff93b7a4a in ?? ()
I've added the -g flag, but it seems that the program hangs in wait4 before entering main.
Thanks for any insights on:
How can I load these debug symbols to get rid of ??
Where is ../csu/libc-start.c:310 located?
What else can I do to locate the bug?
System Info: gcc 6.5.0, Ubuntu 18.04 with 4.15.0-54-generic.
How can I load these debug symbols to get rid of ??
You appear to need the debug symbols for /bin/dash, which are probably going to be in a package called dash-dbg or dash-dbgsym or something like that.
Also, I suspect your stack trace would make more sense if you compiled your library with -fno-optimize-sibling-calls.
Where is ../csu/libc-start.c:310 located?
See this answer.
What else can I do to locate the bug?
You said that you are writing a library that uses __attribute__((constructor)), but you showed a stack trace for /bin/dash (which I presume is DASH and not a program you wrote) that does not appear to involve symbols from your library. I infer from this, that your library is loaded with LD_PRELOAD into programs that are not expecting it to be there.
Both of those things -- LD_PRELOAD and __attribute__((constructor)) -- break the normal expectations of both whatever unsuspecting program is involved, and the C library. You should only do those things if you have no other choice, and you should try to do as little as possible within the injected code. (In particular, I do not think any design that involves spawning processes from a constructor function will be workable, period.) If you tell us about your larger goals we may be able to suggest alternative means that are less troublesome.
EDIT:
subprocess.Popen('<path-to-binary>', shell=True)
With shell=True, Python doesn't invoke the program directly, it runs a command of the form /bin/sh -c 'string passed to Popen'. In many cases this will naturally produce a /bin/dash process sleeping (not hung) in a wait syscall for the entire lifetime of the actual binary. Unless you actually need to evaluate some shell code before running the program, try the default shell=False instead and see if that makes your problem go away. (If you do need to evaluate shell code, try Popen('<shell code>; exec <binary>', shell=True).)
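The two alternatives suggested above can be sketched as follows. This is a minimal POSIX-only example, not the asker's actual launcher: the function names are invented, and a small Python child that prints its own pid stands in for '<path-to-binary>'. In both variants the pid that Popen reports belongs to the program itself rather than to an intermediate /bin/sh, so gdb -p attaches to the right process.

```python
import shlex
import subprocess
import sys

def run_direct(argv):
    """Start argv with the default shell=False; proc.pid is the program itself."""
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE, text=True)
    out, _ = proc.communicate()
    return proc.pid, out

def run_via_shell_exec(argv):
    """Start argv through /bin/sh -c, using exec so the shell replaces itself."""
    # shlex.join quotes each argument so the shell sees it as a single word.
    command = "exec " + shlex.join(argv)
    proc = subprocess.Popen(command, shell=True,
                            stdout=subprocess.PIPE, text=True)
    out, _ = proc.communicate()
    return proc.pid, out

# Stand-in for '<path-to-binary>': a child that prints its own pid.
child = [sys.executable, "-c", "import os; print(os.getpid())"]
```

Calling run_direct(child) or run_via_shell_exec(child) returns the pid Popen reported together with the pid the child printed, which makes it easy to check that no shell is left sitting in a wait syscall in between.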

How to gdb a core file with seg fault in centos 6.5?

My program is multi-threaded. I got a core file, and when I try to debug it I get this:
Program terminated with signal 11, Segmentation fault.
#0 memcpy () at ../sysdeps/x86_64/memcpy.S:91
91 movl %ecx, (%rdi)
Missing separate debuginfos, use: debuginfo-install libssh2-1.8.0-2.0.cf.rhel6.x86_64
(gdb) bt
#0 memcpy () at ../sysdeps/x86_64/memcpy.S:91
#1 0x00007f981b342feb in ?? ()
#2 0x00000000025f1ef0 in ?? ()
#3 0x00000000025edef0 in ?? ()
#4 0x00007fff4b65a810 in ?? ()
#5 0x0000000000000001 in ?? ()
#6 0x00000000025cb800 in ?? ()
#7 0x00000000025ccea0 in ?? ()
#8 0x0000000000000000 in ?? ()
Why are the backtrace entries "??"? Can I identify which thread caused the segfault, and where?
Thank you.
In order to run gdb and make the best use of it, firstly, you need to compile your source with the -g or -ggdb3 option of gcc in the following way:
gcc -ggdb3 sample.c -o sample
After this you will get an executable or binary file which you can execute. Upon execution the program will generate a segfault and a coredump will be created. You can use this core file in the following way with gdb to obtain the backtrace:
gdb ./sample /path/to/core/file
You can even launch your program using gdb without actually executing it separately and generating the core file explicitly. If you want to do this, execute the following command:
gdb ./sample
The "??" entries are where symbol translation failed. Stack walking – which produces the stack trace – can also fail. In that case you'll likely see a single valid frame, then a small number of bogus addresses. If symbols or stacks are too badly broken to make sense of the stack trace, then there are usually ways to fix it: installing debug info packages (giving gdb more symbols, and letting it do DWARF-based stack walks), or recompiling the software from source with frame pointers and debugging information (-fno-omit-frame-pointer -g).
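To make the "??" pattern concrete, here is a small sketch that scans backtrace text and lists the frames gdb could not symbolize. The function name unresolved_frames is invented for this example, and the sample text is the backtrace quoted in the question.

```python
import re

# Matches frames such as "#1 0x00007f981b342feb in ?? ()".
_UNRESOLVED = re.compile(r"^#(\d+)\s+(0x[0-9a-fA-F]+) in \?\? \(\)")

def unresolved_frames(backtrace):
    """Return (frame_number, address) for each frame gdb printed as ??."""
    frames = []
    for line in backtrace.splitlines():
        match = _UNRESOLVED.match(line.strip())
        if match:
            frames.append((int(match.group(1)), int(match.group(2), 16)))
    return frames

# The backtrace from the question above.
BT = """\
#0 memcpy () at ../sysdeps/x86_64/memcpy.S:91
#1 0x00007f981b342feb in ?? ()
#2 0x00000000025f1ef0 in ?? ()
#3 0x00000000025edef0 in ?? ()
#4 0x00007fff4b65a810 in ?? ()
#5 0x0000000000000001 in ?? ()
#6 0x00000000025cb800 in ?? ()
#7 0x00000000025ccea0 in ?? ()
#8 0x0000000000000000 in ?? ()
"""
```

Here unresolved_frames(BT) reports every frame except #0. Values such as 0x1 and 0x0 are unlikely to be real return addresses, which fits the description above of a failed stack walk: one valid frame followed by a few bogus addresses.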

How to use GDB to debug QEMU with SMP (symmetric multiple processors)?

I am in a graduate operating systems class, and we are emulating our kernel using QEMU and debugging it using gdb. Debugging has been straightforward enough, up until now. How can I connect gdb to the other CPUs I have running in QEMU?
Our makefile allows us to start qemu with either "make qemu-nox" or "make qemu-nox-gdb" in one terminal, and if we used the latter, then to connect to it with gdb using just "gdb" in another terminal (in the same directory). Thus, I'm not quite sure how to connect to the same QEMU, again, but to a different processor (I'm running with a total of 4 right now).
Each qemu CPU is visible as a separate thread within gdb. To inspect the state of another CPU, use the thread command to switch CPUs.
(gdb) info thread
Id Target Id Frame
* 1 Thread 1 (CPU#0 [running]) 0x80105163 in stosl (addr=0x89c3e000, data=16843009, cnt=1024) at x86.h:44
2 Thread 2 (CPU#1 [halted ]) halt () at x86.h:127
3 Thread 3 (CPU#2 [halted ]) halt () at x86.h:127
4 Thread 4 (CPU#3 [halted ]) halt () at x86.h:127
(gdb) where
#0 0x80105163 in stosl (addr=0x89c3e000, data=16843009, cnt=1024) at x86.h:44
#1 0x801051bf in memset (dst=0x89c3e000, c=1, n=4096) at string.c:8
#2 0x80102b5a in kfree (v=0x89c3e000 "\001\001\001\001") at kalloc.c:63
#3 0x80102af4 in freerange (vstart=0x80400000, vend=0x8e000000) at kalloc.c:47
#4 0x80102ac1 in kinit2 (vstart=0x80400000, vend=0x8e000000) at kalloc.c:38
#5 0x8010386a in main () at main.c:37
(gdb) thread 3
[Switching to thread 3 (Thread 3)]
#0 halt () at x86.h:127
127 }
(gdb) where
#0 halt () at x86.h:127
#1 0x80104aeb in scheduler () at proc.c:288
#2 0x801038f6 in mpmain () at main.c:59
#3 0x801038b0 in mpenter () at main.c:50
#4 0x0000705a in ?? ()

Weird SEGFAULT while loading DLL under gdb

I have a small C program that loads a custom DLL and uses a couple of functions. I can run the program from the console and it works as intended. (I'm compiling with MinGW on Windows XP)
But if I run it from gdb, when it gets to loading the DLL, I get:
56 ldll = LoadLibrary("gsp810.dll");
(gdb) n
Program received signal SIGSEGV, Segmentation fault.
0x7c929af2 in ntdll!RtlpWaitForCriticalSection () from C:\WINDOWS\system32\ntdll.dll
The weird thing is, if I make a backtrace at this point, I get a strange stack of Windows functions, which doesn't even contain my own program's stack (see below). However, if I keep running, it'll eventually return to my main() function and everything seems to be back to normal. The program works as expected and the functions from the DLL can be called.
(gdb) backtrace
#0 0x7c929af2 in ntdll!RtlpWaitForCriticalSection () from C:\WINDOWS\system32\ntdll.dll
#1 0x7c911046 in ntdll!RtlEnterCriticalSection () from C:\WINDOWS\system32\ntdll.dll
#2 0x00e161a0 in ?? ()
#3 0x77da6cf8 in RegCloseKey () from C:\WINDOWS\system32\advapi32.dll
#4 0x77da78e4 in RegOpenKeyExA () from C:\WINDOWS\system32\advapi32.dll
#5 0x77f44fcd in SHLWAPI!PathMakeSystemFolderW () from C:\WINDOWS\system32\shlwapi.dll
#6 0x77f452e8 in SHLWAPI!PathMakeSystemFolderW () from C:\WINDOWS\system32\shlwapi.dll
#7 0x77f45252 in SHLWAPI!PathMakeSystemFolderW () from C:\WINDOWS\system32\shlwapi.dll
#8 0x7c91118a in ntdll!LdrInitializeThunk () from C:\WINDOWS\system32\ntdll.dll
#9 0x77f40000 in ?? ()
#10 0x7c92b5d2 in ntdll!LdrFindResourceDirectory_U () from C:\WINDOWS\system32\ntdll.dll
#11 0x7c9262db in ntdll!RtlValidateUnicodeString () from C:\WINDOWS\system32\ntdll.dll
#12 0x7c92643d in ntdll!LdrLoadDll () from C:\WINDOWS\system32\ntdll.dll
#13 0x00000000 in ?? ()
Is this SEGFAULT normal, or does it indicate an underlying problem with the DLL?
EDIT: Ok, looks like the problem is in the DLL itself. What I don't understand is the backtrace gdb is showing, as it does not contain the functions in my application. Then, at a certain point, it somehow "switches" to my stack, and the program keeps running as if nothing had happened.
Is it possible that Windows is somehow "handling" the segmentation fault, and then returns control to the application?

gdb : address range mappings

I am analyzing this core dump
Program received signal SIGABRT, Aborted.
0xb7fff424 in __kernel_vsyscall ()
(gdb) where
#0 0xb7fff424 in __kernel_vsyscall ()
#1 0x0050cd71 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#2 0x0050e64a in abort () at abort.c:92
#3 0x08083b3b in ?? ()
#4 0x08095461 in ?? ()
#5 0x0808bdea in ?? ()
#6 0x0808c4e2 in ?? ()
#7 0x080b683b in ?? ()
#8 0x0805d845 in ?? ()
#9 0x08083eb6 in ?? ()
#10 0x08061402 in ?? ()
#11 0x004f8cc6 in __libc_start_main (main=0x805f390, argc=15, ubp_av=0xbfffef64, init=0x825e220, fini=0x825e210,
rtld_fini=0x4cb220 <_dl_fini>, stack_end=0xbfffef5c) at libc-start.c:226
#12 0x0804e5d1 in ?? ()
I can't tell which function each ?? maps to, or which address range a frame such as #10 0x08061402 in ?? () falls in.
Please help me debug this.
Your program has no debugging symbols. Recompile it with -g. Make sure you haven't stripped your executable, e.g. by passing -s to the linker.
Even though @user794080 didn't say so, it appears exceedingly likely that his program is a 32-bit Linux executable.
There are two possible reasons (that I can think of) for symbols from the main executable not to show up. (All the unresolved frames in the stack trace, in the range [0x08040000,0x08100000), are from the main executable.)
The main executable has in fact been stripped (this is the same as ninjalj's answer), which often happens when '-s' is passed to the linker, perhaps inadvertently.
The executable has been compiled with a new(er) GCC, but is being debugged by an old(er) GDB, which chokes on some newer DWARF construct (there should be a warning from GDB about that).
To find out which libraries are mapped into the application, note the pid of your program while it is stopped in gdb, and run this in another console:
cat /proc/$pid/maps
where $pid is the pid of the stopped process. The format of the maps file is described at http://linux.die.net/man/5/proc, starting from "/proc/[number]/maps: A file containing the currently mapped memory regions and their access permissions."
Also, if your OS doesn't use ASLR (address space layout randomization), or it is disabled for your program, you can use
ldd ./program
to list the linked libraries and their load addresses. But if ASLR is turned on, you will not be able to get the real memory mapping ranges this way, as they change on each run of the program. Even then, you will know which libraries are linked in dynamically and can install debuginfo for them.
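Looking up which mapping an address falls into can also be scripted. The sketch below parses text in the /proc/$pid/maps format and returns the region containing a given address. The function names are invented for this example, and the SAMPLE text is made up (it is not taken from the asker's process); real input would come from reading /proc/$pid/maps.

```python
def parse_maps(maps_text):
    """Parse /proc/<pid>/maps text into (start, end, perms, path) tuples."""
    regions = []
    for line in maps_text.splitlines():
        # Fields: address range, perms, offset, dev, inode, optional pathname.
        fields = line.split(None, 5)
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        start, end = (int(x, 16) for x in fields[0].split("-"))
        path = fields[5].strip() if len(fields) > 5 else ""
        regions.append((start, end, fields[1], path))
    return regions

def find_mapping(regions, address):
    """Return the region whose [start, end) range contains address, or None."""
    for region in regions:
        if region[0] <= address < region[1]:
            return region
    return None

# Made-up sample in the maps format, for illustration only.
SAMPLE = """\
08048000-08260000 r-xp 00000000 08:01 131090 /home/user/program
004e2000-00672000 r-xp 00000000 08:01 262200 /lib/libc.so.6
bffdf000-c0000000 rw-p 00000000 00:00 0 [stack]
"""
```

With the sample above, find_mapping(parse_maps(SAMPLE), 0x08061402) returns the entry for the main executable. For a live process, pass the contents of open("/proc/%d/maps" % pid).read() to parse_maps instead.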
The stack might be corrupted. The "??" can happen if the return address on the stack has been overwritten by, for example, a buffer overflow.
