Weird SEGFAULT while loading DLL under gdb - c

I have a small C program that loads a custom DLL and uses a couple of functions. I can run the program from the console and it works as intended. (I'm compiling with MinGW on Windows XP)
But if I run it from gdb, when it gets to loading the DLL, I get:
56 ldll = LoadLibrary("gsp810.dll");
(gdb) n
Program received signal SIGSEGV, Segmentation fault.
0x7c929af2 in ntdll!RtlpWaitForCriticalSection () from C:\WINDOWS\system32\ntdll.dll
The weird thing is, if I make a backtrace at this point, I get a strange stack of Windows functions, which doesn't even contain my own program's stack (see below). However, if I keep running, it'll eventually return to my main() function and everything seems to be back to normal. The program works as expected and the functions from the DLL can be called.
(gdb) backtrace
#0 0x7c929af2 in ntdll!RtlpWaitForCriticalSection () from C:\WINDOWS\system32\ntdll.dll
#1 0x7c911046 in ntdll!RtlEnterCriticalSection () from C:\WINDOWS\system32\ntdll.dll
#2 0x00e161a0 in ?? ()
#3 0x77da6cf8 in RegCloseKey () from C:\WINDOWS\system32\advapi32.dll
#4 0x77da78e4 in RegOpenKeyExA () from C:\WINDOWS\system32\advapi32.dll
#5 0x77f44fcd in SHLWAPI!PathMakeSystemFolderW () from C:\WINDOWS\system32\shlwapi.dll
#6 0x77f452e8 in SHLWAPI!PathMakeSystemFolderW () from C:\WINDOWS\system32\shlwapi.dll
#7 0x77f45252 in SHLWAPI!PathMakeSystemFolderW () from C:\WINDOWS\system32\shlwapi.dll
#8 0x7c91118a in ntdll!LdrInitializeThunk () from C:\WINDOWS\system32\ntdll.dll
#9 0x77f40000 in ?? ()
#10 0x7c92b5d2 in ntdll!LdrFindResourceDirectory_U () from C:\WINDOWS\system32\ntdll.dll
#11 0x7c9262db in ntdll!RtlValidateUnicodeString () from C:\WINDOWS\system32\ntdll.dll
#12 0x7c92643d in ntdll!LdrLoadDll () from C:\WINDOWS\system32\ntdll.dll
#13 0x00000000 in ?? ()
Is this SEGFAULT normal, or is it indicating an underlying problem with the DLL?
EDIT: Ok, looks like the problem is in the DLL itself. What I don't understand is the backtrace gdb is showing, as it does not contain the functions in my application. Then, at a certain point, it somehow "switches" to my stack, and the program keeps running as if nothing had happened.
Is it possible that Windows is somehow "handling" the segmentation fault, and then returning control to the application?

Related

Macro substitution on a dynamically linked shared object

I have a compiled source code executable which has redefined malloc() with a custom function CustMalloc() using macro substitution.
As seen from the below backtrace, the compiled source code executable is supposed to be dynamically linked with libMRegAccess.so and libusb-1.0.so.0 shared objects. Both these shared objects were independently compiled and they have no means of knowing the above macro substitution during their compilation.
I presume bsd-asprintf.c is a Linux source file. At run time, asprintf() and vasprintf() are called from the libusb-1.0.so.0 shared object, and they in turn try to call malloc().
I'm unable to understand why the custom function CustMalloc() is getting called instead of the actual malloc().
FYI, the semaphore that is required in this backtrace is not yet created and hence the crash. The expectation is that the CustMalloc() should not be invoked in this code flow as the call is being made from an independently built shared object.
Program received signal SIGSEGV, Segmentation fault.
[Switching to LWP 1269]
0x0000007fb7f896cc in __new_sem_wait_fast () from /lib//libpthread.so.0
(gdb) bt
#0 0x0000007fb7f896cc in __new_sem_wait_fast () from /lib//libpthread.so.0
#1 0x0000007fb7f898fc in sem_wait@@GLIBC_2.17 () from /lib//libpthread.so.0
#2 0x0000000001b09000 in SemTake (SemId=0x0) at <compiled_source_code.c>
#3 0x0000000000d6cffc in ContextLock () at <compiled_source_code.c>
#4 0x0000000000d993e4 in CustMalloc (size=128) at <compiled_source_code.c>
#5 0x0000000001c88a2c in vasprintf (str=0x7fb5eaf5f8, fmt=0x7fb7e8a640 "usb%s", ap=...) at bsd-asprintf.c:61
#6 0x0000000001c88c50 in asprintf (str=0x7fb5eaf5f8, fmt=0x7fb7e8a640 "usb%s") at bsd-asprintf.c:120
#7 0x0000007fb7e853cc in linux_enumerate_device () from /usr/lib/libusb-1.0.so.0
#8 0x0000007fb7e854c4 in sysfs_scan_device () from /usr/lib/libusb-1.0.so.0
#9 0x0000007fb7e85b80 in op_init () from /usr/lib/libusb-1.0.so.0
#10 0x0000007fb7e7dd1c in libusb_init () from /usr/lib/libusb-1.0.so.0
#11 0x0000007fb7ea65fc in cyusb_open(unsigned short, unsigned short) () from /usr/lib/libMRegAccess.so
#12 0x0000007fb7ea33f4 in InitDefaultUSBConn () from /usr/lib/libMRegAccess.so
#13 0x0000007fb7ea58e0 in openDefaultUSBDriver () from /usr/lib/libMRegAccess.so
#14 0x00000000010ddd94 in InitDrv () at <compiled_source_code.c>
#15 ... at <compiled_source_code.c>
#16 ... at <compiled_source_code.c>
#17 ... at <compiled_source_code.c>
#18 ... at <compiled_source_code.c>
#19 0x0000007fb7f80fd0 in start_thread () from /lib//libpthread.so.0
#20 0x0000007fb7d8cf60 in ?? () from /lib//libc.so.6
I'm unable to understand why the custom function CustMalloc() is getting called instead of the actual malloc()
This appears to be happening because you compiled and linked bsd-asprintf.c (with your macro redefinition) into your main executable.
You can tell that asprintf and CustMalloc are part of your binary, because their addresses are very different from other library routines (such as linux_enumerate_device or sem_wait).
If you want to know where asprintf is defined (which archive library or object file it comes from), relink your executable with -Wl,-y,asprintf flag, and the linker will tell you.

Memory failure in "?? ()" using GDB

I'm trying to trace my segmentation fault using gdb and I'm unable to find the exact line where the fault is happening.
(gdb) backtrace
#0 0x00110402 in __kernel_vsyscall ()
#1 0x007a5690 in raise () from /lib/libc.so.6
#2 0x007a6f91 in abort () from /lib/libc.so.6
#3 0x007dd9eb in __libc_message () from /lib/libc.so.6
#4 0x007e59aa in _int_free () from /lib/libc.so.6
#5 0x007e90f0 in free () from /lib/libc.so.6
#6 0x080dc4e7 in CRYPTO_free ()
#7 0x08c36668 in ?? ()
#8 0x08c44bac in ?? ()
#9 0x08100168 in BN_free ()
#10 0x00000009 in ?? ()
#11 0x08c44ba8 in ?? ()
#12 0x08108c07 in BN_MONT_CTX_free ()
#13 0xffffffff in ?? ()
#14 0x08c36630 in ?? ()
#15 0x08112697 in RSA_eay_finish ()
#16 0x08c4c110 in ?? ()
#17 0x08c36630 in ?? ()
#18 0x081150af in RSA_free ()
#19 0xffffffff in ?? ()
#20 0x00000009 in ?? ()
#21 0x0821870d in ?? ()
#22 0x000000dd in ?? ()
#23 0x08c4c110 in ?? ()
#24 0x08c35e98 in ?? ()
#25 0x08136893 in EVP_PKEY_free ()
#26 0xffffffff in ?? ()
#27 0x0000000a in ?? ()
#28 0x08226017 in ?? ()
#29 0x00000189 in ?? ()
#30 0x007e90f0 in free () from /lib/libc.so.6
#31 0x00000000 in ?? ()
(gdb)
How do I get rid of the ?? () and get a more precise solution? Thank you.
First, getting the complete stack trace here will likely not help you: any crash inside free implementation is due to heap corruption. Here we have heap corruption that GLIBC has already detected and told you about on the console.
Knowing where the corrupted block is being freed usually doesn't help to find where the block was corrupted; use specialized tools like Valgrind or AddressSanitizer for that.
Second, you are not getting file/line info because the crash is happening inside libc.so.6, and you have not installed debuginfo symbols for it. How to install debuginfo depends on your Linux distribution, which you have not told us about.
Last, the reason you have an "apparently corrupt" stack with addresses that don't correspond to any symbols is likely that the calls are coming from hand-coded assembly code (from libopenssl.a), which doesn't use frame pointers and doesn't have correct unwind descriptors. GDB needs one or the other to produce correct stack trace.
Compile your project with the -g -O0 flags. Without -g, gcc emits no debugging information, so gdb cannot show source files and line numbers. If you want to debug a third-party library, configure it with --with-debug or whatever debug option it provides.
Yeah it looks like your stack is corrupted. The way I would approach this is to run the program under a memory profiler like valgrind. Watch out for double free, writing arrays out-of-bounds, and conditional jumps.

gdb giving a function name followed by a number instead of file and line number

I have a segmentation fault in my program, and I'm using gdb to identify where it's happening. However, I am not able to see a clear line number where the error is occurring.
Below is a screenshot of my output.
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 20065168 (LWP 4645)]
0x007e537f in _int_free () from /lib/libc.so.6
(gdb) backtrace
#0 0x007e537f in _int_free () from /lib/libc.so.6
#1 0x007e90f0 in free () from /lib/libc.so.6
#2 0x080d9e67 in CRYPTO_free ()
#3 0xbfd15f7c in ?? ()
#4 0xbfd16108 in ?? ()
#5 0x08070b3e in function_random.19532 ()
#6 0x00000001 in ?? ()
#7 0x00000000 in ?? ()
(gdb)
frame 5 is the piece of code that I have written, but I don't quite understand what it means.
Can someone please explain?
Most likely, in your case, debug symbols are not present in the binary. That is why gdb is not able to read the debugging info and display it.
Recompile your code with debugging enabled.
Example: for gcc, use the -g option.

SIGSEGV and mono crash issue while running .NET binary using mono

I have Ubuntu 12.04 on my PC with the mono-complete package ("Mono JIT compiler version 2.10.8.1 (Debian 2.10.8.1-1ubuntu2.2)").
When I run a .NET binary with mono, I get a SIGSEGV signal and mono then crashes.
I also got some gdb debug messages at the command prompt, which are shown below.
Thread 2 (Thread 0xb28ffb40 (LWP 20460)) :
#0 0xb7796424 in __kernel_vsyscall ()
#1 0xb77329db in read () from /lib/i386-linux-gnu/libpthread.so.0
#2 0x080e18e7 in read (__nbytes=1024, __buf=0xb2e0867c, __fd=<optimized out>) at /usr/include/i386-linux-gnu/bits/unistd.h:45
#3 mono_handle_native_sigsegv (signal=11, ctx=0xb2e08bcc) at mini-exceptions.c:2208
#4 0x081209fc in mono_arch_handle_altstack_exception (sigctx=0xb2e08bcc, fault_addr=0x0, stack_ovf=0) at exceptions-x86.c:1223
#5 0x0806094d in mono_sigsegv_signal_handler (_dummy=11, info=0xb2e08b4c, context=0xb2e08bcc) at mini.c:5909
#6 <signal handler called>
#7 0xb48881dc in ?? ()
#8 0xb2bcba6b in bulk_interrupt_read_thread (arguments=0xb4888108) at testusb.c:1596
#9 0xb772bd4c in start_thread () from /lib/i386-linux-gnu/libpthread.so.0
#10 0xb766adde in clone () from /lib/i386-linux-gnu/libc.so.6
Thread 1 (Thread 0xb757a700 (LWP 20449)) :
#0 0xb7796424 in __kernel_vsyscall ()
#1 0xb765c690 in poll () from /lib/i386-linux-gnu/libc.so.6
#2 0xb2c2c984 in ?? ()
#3 0xb2c2bdb0 in ?? ()
#4 0xb2c2e2d4 in ?? ()
#5 0xb2c4e770 in ?? ()
#6 0xb2c4b86c in ?? ()
#7 0xb2c4b527 in ?? ()
#8 0xb2e14518 in ?? ()
#9 0xb2e139a8 in ?? ()
#10 0xb2e13648 in ?? ()
#11 0xb58e3f84 in ?? ()
#12 0xb58e403e in ?? ()
#13 0x08064c2c in mono_jit_runtime_invoke (method="GTechUtility.Program:Main ()", obj=0x0, params=0xbfab491c, exc=0x0) at mini.c:5791
#14 0x081a422f in mono_runtime_invoke (method="GTechUtility.Program:Main ()", obj=0x0, params=0xbfab491c, exc=0x0) at object.c:2755
#15 0x081a7025 in mono_runtime_exec_main (method="GTechUtility.Program:Main ()", args=0x3be00, exc=0x0) at object.c:3938
#16 0x080bb80b in main_thread_handler (user_data=<synthetic pointer>) at driver.c:1003
#17 mono_main (argc=2, argv=0xbfab4ae4) at driver.c:1855
#18 0x0805998f in mono_main_with_options (argv=0xbfab4ae4, argc=2) at main.c:66
#19 main (argc=2, argv=0xbfab4ae4) at main.c:97
=================================================================
Got a SIGSEGV while executing native code.
This usually indicates a fatal error in the mono runtime or one of the native libraries used by your
application.
=================================================================
Aborted (core dumped)
Please let me know if anyone has an idea about this issue.
Try this: Disable increasing amounts of your own/application code until the error goes away. Then refine, with smaller steps, to see which part/line of your code is causing this.
If in the end, there is absolutely no own/application code left, play with the configuration, version, compiler options, of the libraries you're using.
Sorry I can't give you a detailed answer, but I HTH. Good luck!

gdb : address range mappings

I am analyzing this core dump
Program received signal SIGABRT, Aborted.
0xb7fff424 in __kernel_vsyscall ()
(gdb) where
#0 0xb7fff424 in __kernel_vsyscall ()
#1 0x0050cd71 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#2 0x0050e64a in abort () at abort.c:92
#3 0x08083b3b in ?? ()
#4 0x08095461 in ?? ()
#5 0x0808bdea in ?? ()
#6 0x0808c4e2 in ?? ()
#7 0x080b683b in ?? ()
#8 0x0805d845 in ?? ()
#9 0x08083eb6 in ?? ()
#10 0x08061402 in ?? ()
#11 0x004f8cc6 in __libc_start_main (main=0x805f390, argc=15, ubp_av=0xbfffef64, init=0x825e220, fini=0x825e210,
rtld_fini=0x4cb220 <_dl_fini>, stack_end=0xbfffef5c) at libc-start.c:226
#12 0x0804e5d1 in ?? ()
I can't tell which function each ?? maps to, or, for instance, which address range #10 0x08061402 in ?? () falls in.
Please help me debug this.
Your program has no debugging symbols. Recompile it with -g. Make sure you haven't stripped your executable, e.g. by passing -s to the linker.
Even though @user794080 didn't say so, it appears exceedingly likely that his program is a 32-bit Linux executable.
There are two possible reasons (that I can think of) for symbols from the main executable not to show up (all symbols in the stack trace in the range [0x08040000,0x08100000) are from the main executable):
1. The main executable has in fact been stripped (this is the same as ninjalj's answer), which often happens when '-s' is passed to the linker, perhaps inadvertently.
2. The executable has been compiled with a new(er) GCC, but is being debugged by an old(er) GDB, which chokes on some newer DWARF construct (there should be a warning from GDB about that).
To find out which libraries are mapped into the application, note the pid of your program while it is stopped in gdb, and run this in another console:
cat /proc/$pid/maps
where $pid is the pid of the stopped process. The format of the maps file is described at http://linux.die.net/man/5/proc, starting from "/proc/[number]/maps
A file containing the currently mapped memory regions and their access permissions."
Also, if your OS doesn't use ASLR (address space layout randomization), or it is disabled for your program, you can use
ldd ./program
to list the linked libraries and their load addresses. If ASLR is turned on, you will not be able to get the real memory mapping ranges this way, as they change on each run of the program. But even then you will know which libraries are linked in dynamically, and you can install debuginfo for them.
The stack might be corrupted. The "??" can happen if the return address on the stack has been overwritten by, for example, a buffer overflow.
