SIGSEGV and mono crash issue while running .NET binary using mono - c

I have Ubuntu 12.04 on my PC with the mono-complete package installed ("Mono JIT compiler version 2.10.8.1 (Debian 2.10.8.1-1ubuntu2.2)").
When I run a .NET binary under Mono, the process receives a SIGSEGV and Mono then crashes.
gdb also printed some debug messages on the command prompt, which I have included below.
Thread 2 (Thread 0xb28ffb40 (LWP 20460)) :
#0 0xb7796424 in __kernel_vsyscall ()
#1 0xb77329db in read () from /lib/i386-linux-gnu/libpthread.so.0
#2 0x080e18e7 in read (__nbytes=1024, __buf=0xb2e0867c, __fd=<optimized out>) at /usr/include/i386-linux-gnu/bits/unistd.h:45
#3 mono_handle_native_sigsegv (signal=11, ctx=0xb2e08bcc) at mini-exceptions.c:2208
#4 0x081209fc in mono_arch_handle_altstack_exception (sigctx=0xb2e08bcc, fault_addr=0x0, stack_ovf=0) at exceptions-x86.c:1223
#5 0x0806094d in mono_sigsegv_signal_handler (_dummy=11, info=0xb2e08b4c, context=0xb2e08bcc) at mini.c:5909
#6 <signal handler called>
#7 0xb48881dc in ?? ()
#8 0xb2bcba6b in bulk_interrupt_read_thread (arguments=0xb4888108) at testusb.c:1596
#9 0xb772bd4c in start_thread () from /lib/i386-linux-gnu/libpthread.so.0
#10 0xb766adde in clone () from /lib/i386-linux-gnu/libc.so.6
Thread 1 (Thread 0xb757a700 (LWP 20449)) :
#0 0xb7796424 in __kernel_vsyscall ()
#1 0xb765c690 in poll () from /lib/i386-linux-gnu/libc.so.6
#2 0xb2c2c984 in ?? ()
#3 0xb2c2bdb0 in ?? ()
#4 0xb2c2e2d4 in ?? ()
#5 0xb2c4e770 in ?? ()
#6 0xb2c4b86c in ?? ()
#7 0xb2c4b527 in ?? ()
#8 0xb2e14518 in ?? ()
#9 0xb2e139a8 in ?? ()
#10 0xb2e13648 in ?? ()
#11 0xb58e3f84 in ?? ()
#12 0xb58e403e in ?? ()
#13 0x08064c2c in mono_jit_runtime_invoke (method="GTechUtility.Program:Main ()", obj=0x0, params=0xbfab491c, exc=0x0) at mini.c:5791
#14 0x081a422f in mono_runtime_invoke (method="GTechUtility.Program:Main ()", obj=0x0, params=0xbfab491c, exc=0x0) at object.c:2755
#15 0x081a7025 in mono_runtime_exec_main (method="GTechUtility.Program:Main ()", args=0x3be00, exc=0x0) at object.c:3938
#16 0x080bb80b in main_thread_handler (user_data=<synthetic pointer>) at driver.c:1003
#17 mono_main (argc=2, argv=0xbfab4ae4) at driver.c:1855
#18 0x0805998f in mono_main_with_options (argv=0xbfab4ae4, argc=2) at main.c:66
#19 main (argc=2, argv=0xbfab4ae4) at main.c:97
=================================================================
Got a SIGSEGV while executing native code.
This usually indicates a fatal error in the mono runtime or one of the native libraries used by your
application.
=================================================================
Aborted (core dumped)
Please let me know if anyone has an idea what causes this issue.

Try this: disable progressively larger parts of your own application code until the error goes away. Then re-enable it in smaller steps to narrow down which part or line of your code is causing the crash.
If the crash still happens with none of your own application code left, experiment with the configuration, versions, and compiler options of the libraries you're using.
Sorry I can't give you a more detailed answer, but I hope this helps. Good luck!

Related

MinGW cross-compiled application, atexit / mingw_onexit crashes on Windows 10

I have a C application that I cross-compile for Windows from Fedora Linux:
$ x86_64-w64-mingw32-gcc --version
x86_64-w64-mingw32-gcc (GCC) 6.2.0 20160822 (Fedora MinGW 6.2.0-1.fc24)
Copyright (C) 2016 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
The application uses atexit to register a teardown routine:
https://github.com/jdolan/Objectively/blob/master/Sources/Objectively/Class.c#L74
On some Windows systems, the registration of the atexit handler causes a crash:
Starting program: C:\Users\jay\Desktop\Quetoo\bin\quetoo.exe
[New Thread 2900.0xa48]
[New Thread 2900.0x9a4]
[New Thread 2900.0x310]
[New Thread 2900.0xcfc]
warning: HEAP[quetoo.exe]:
warning: Invalid address specified to RtlSizeHeap( 0000000002DD0000, 000000000010ED30 )
Program received signal SIGTRAP, Trace/breakpoint trap.
0x00007ff8c7cdfa56 in ntdll!RtlpNtMakeTemporaryKey () from C:\Windows\SYSTEM32\ntdll.dll
(gdb) bt
#0 0x00007ff8c7cdfa56 in ntdll!RtlpNtMakeTemporaryKey () from C:\Windows\SYSTEM32\ntdll.dll
#1 0x00007ff8c7cb1075 in ntdll!memset () from C:\Windows\SYSTEM32\ntdll.dll
#2 0x00007ff8c7cdf7c0 in ntdll!RtlpNtMakeTemporaryKey () from C:\Windows\SYSTEM32\ntdll.dll
#3 0x00007ff8c799a620 in msvcrt!.dllonexit () from C:\Windows\system32\msvcrt.dll
#4 0x000000006c28c91d in mingw_onexit () from C:\Users\jay\Desktop\Quetoo\bin\libObjectively-0.dll
#5 0x000000006c28c979 in atexit () from C:\Users\jay\Desktop\Quetoo\bin\libObjectively-0.dll
#6 0x000000006c28ca09 in __do_global_ctors () from C:\Users\jay\Desktop\Quetoo\bin\libObjectively-0.dll
#7 0x000000006c28136a in __DllMainCRTStartup (hDllHandle=0x6c280000, dwReason=1, lpreserved=0x25afb00)
at ../crt/crtdll.c:200
#8 0x00007ff8c7c04fc8 in ntdll!RtlActivateActivationContextUnsafeFast () from C:\Windows\SYSTEM32\ntdll.dll
#9 0x00007ff8c7c61d7a in ntdll!RtlAreBitsSet () from C:\Windows\SYSTEM32\ntdll.dll
#10 0x00007ff8c7c61bbf in ntdll!RtlAreBitsSet () from C:\Windows\SYSTEM32\ntdll.dll
#11 0x00007ff8c7c61bdd in ntdll!RtlAreBitsSet () from C:\Windows\SYSTEM32\ntdll.dll
#12 0x00007ff8c7c7fd6d in ntdll!EtwEventProviderEnabled () from C:\Windows\SYSTEM32\ntdll.dll
#13 0x00007ff8c7cb14f3 in ntdll!memset () from C:\Windows\SYSTEM32\ntdll.dll
#14 0x00007ff8c7c66a0e in ntdll!LdrInitializeThunk () from C:\Windows\SYSTEM32\ntdll.dll
#15 0x0000000000000000 in ?? ()
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
(gdb)
This particular system is a fresh install of Windows 10. I tried running the application as an Administrator, with the same result.
Is there a different version of the msvcrt I should provide? Kinda stumped here.
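For readers unfamiliar with the pattern, here is a minimal sketch of registering a teardown routine with atexit. The names teardown and register_teardown are hypothetical and are not taken from the Objectively source; the sketch only shows the standard C call and does not reproduce or explain the crash above.

```c
#include <stdio.h>
#include <stdlib.h>

/* Counts how often the (hypothetical) teardown routine has run. */
static int teardown_calls = 0;

/* Hypothetical teardown routine; the C runtime calls it on normal exit. */
static void teardown(void)
{
    teardown_calls++;
    fputs("teardown: releasing library state\n", stderr);
}

/* Register the teardown routine at most once.
 * Returns 0 on success and -1 if atexit refuses the registration. */
int register_teardown(void)
{
    static int registered = 0;

    if (registered)
        return 0;
    if (atexit(teardown) != 0)
        return -1;
    registered = 1;
    return 0;
}
```

Calling register_teardown() early in start-up and checking its return value at least makes a failed registration visible instead of silent.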

Memory failure in "?? ()" using GDB

I'm trying to trace my segmentation fault using gdb and I'm unable to find the exact line where the fault is happening.
(gdb) backtrace
#0 0x00110402 in __kernel_vsyscall ()
#1 0x007a5690 in raise () from /lib/libc.so.6
#2 0x007a6f91 in abort () from /lib/libc.so.6
#3 0x007dd9eb in __libc_message () from /lib/libc.so.6
#4 0x007e59aa in _int_free () from /lib/libc.so.6
#5 0x007e90f0 in free () from /lib/libc.so.6
#6 0x080dc4e7 in CRYPTO_free ()
#7 0x08c36668 in ?? ()
#8 0x08c44bac in ?? ()
#9 0x08100168 in BN_free ()
#10 0x00000009 in ?? ()
#11 0x08c44ba8 in ?? ()
#12 0x08108c07 in BN_MONT_CTX_free ()
#13 0xffffffff in ?? ()
#14 0x08c36630 in ?? ()
#15 0x08112697 in RSA_eay_finish ()
#16 0x08c4c110 in ?? ()
#17 0x08c36630 in ?? ()
#18 0x081150af in RSA_free ()
#19 0xffffffff in ?? ()
#20 0x00000009 in ?? ()
#21 0x0821870d in ?? ()
#22 0x000000dd in ?? ()
#23 0x08c4c110 in ?? ()
#24 0x08c35e98 in ?? ()
#25 0x08136893 in EVP_PKEY_free ()
#26 0xffffffff in ?? ()
#27 0x0000000a in ?? ()
#28 0x08226017 in ?? ()
#29 0x00000189 in ?? ()
#30 0x007e90f0 in free () from /lib/libc.so.6
#31 0x00000000 in ?? ()
(gdb)
How do I get rid of the ?? () entries and get a more precise backtrace? Thank you.
First, getting the complete stack trace here will likely not help you: any crash inside the free implementation is due to heap corruption. Here we have heap corruption that glibc has already detected and reported on the console.
Knowing where the corrupted block is being freed usually doesn't help to find where the block was corrupted; use specialized tools like Valgrind or AddressSanitizer for that.
Second, you are not getting file/line info because the crash is happening inside libc.so.6, and you have not installed debuginfo symbols for it. How to install debuginfo depends on your Linux distribution, which you have not told us about.
Last, the reason you have an "apparently corrupt" stack with addresses that don't correspond to any symbols is likely that the calls are coming from hand-coded assembly (from libopenssl.a), which doesn't use frame pointers and doesn't have correct unwind descriptors. GDB needs one or the other to produce a correct stack trace.
Compile your project with the -g -O0 flags. Without -g, gcc does not emit debugging information, so gdb has no file and line data to show. If you want to debug a third-party library, configure it with --with-debug or whatever debug option its build provides.
Yes, it looks like your stack is corrupted. I would approach this by running the program under a memory checker such as Valgrind. Watch out for double frees, out-of-bounds array writes, and conditional jumps that depend on uninitialised values.
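To illustrate why the crash shows up in free rather than at the faulty write, here is a minimal sketch of a guarded allocation. The helper names guard_alloc, guard_intact and guard_free are hypothetical and not part of glibc or OpenSSL. The idea is only similar in spirit to the allocator's own consistency checks: the damage is done by an earlier write and is noticed later, when the block is released.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical guard layout: payload bytes followed by GUARD_LEN canary bytes. */
#define GUARD_LEN  8
#define GUARD_BYTE 0xA5

/* Allocate `size` usable bytes plus a trailing canary. */
void *guard_alloc(size_t size)
{
    unsigned char *p = malloc(size + GUARD_LEN);
    if (p == NULL)
        return NULL;
    memset(p + size, GUARD_BYTE, GUARD_LEN);
    return p;
}

/* Return 1 if the canary behind the payload is untouched, 0 otherwise. */
int guard_intact(const void *ptr, size_t size)
{
    const unsigned char *guard = (const unsigned char *)ptr + size;

    assert(ptr != NULL);
    for (size_t i = 0; i < GUARD_LEN; i++) {
        if (guard[i] != GUARD_BYTE)
            return 0;
    }
    return 1;
}

/* Check the canary, then release the block. The result tells the caller
 * whether an earlier write ran past the end of the payload. */
int guard_free(void *ptr, size_t size)
{
    int ok = guard_intact(ptr, size);
    free(ptr);
    return ok;
}
```

A check like this only says that some earlier write overran the block, not which one. Valgrind, or a build with gcc's -fsanitize=address option, reports the offending write itself, which is why the answers above recommend those tools.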

gdb giving a function name followed by a number instead of file and line number

I have a segmentation fault in my program, and I'm using gdb to identify where it's happening. However, I am not able to see a clear line number where the error is occurring.
Below is a screenshot of my output.
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 20065168 (LWP 4645)]
0x007e537f in _int_free () from /lib/libc.so.6
(gdb) backtrace
#0 0x007e537f in _int_free () from /lib/libc.so.6
#1 0x007e90f0 in free () from /lib/libc.so.6
#2 0x080d9e67 in CRYPTO_free ()
#3 0xbfd15f7c in ?? ()
#4 0xbfd16108 in ?? ()
#5 0x08070b3e in function_random.19532 ()
#6 0x00000001 in ?? ()
#7 0x00000000 in ?? ()
(gdb)
Frame 5 is code that I have written, but I don't quite understand what the output for it means.
Can someone please explain?
Most likely the binary contains no debug symbols, which is why gdb cannot read and display the debugging information.
Recompile your code with debugging enabled.
For example, with gcc, use the -g option.

alloc: invalid block - Are Tcl_IncrRefCount and Tcl_DecrRefCount thread safe for threaded Tcl / 1 interp per thread?

Our 32-bit server application statically embeds tcl 8.4.11. On Red Hat Linux 6.5 64-bit we're encountering crashes / core dumps. The failure looks like
alloc: invalid block: 0xf6f00f58: 88 f6 0
At the bottom of the question, I've documented two different core dumps we've seen.
We've isolated a potential root cause: a TCL object shared between two threads that concurrently run separate TCL interpreter instances. We think the crash happens because the same TCL object is passed to Tcl_IncrRefCount / Tcl_DecrRefCount from these concurrently executing interpreters.
Are Tcl_IncrRefCount / Tcl_DecrRefCount thread safe when TCL is compiled threaded?
Are TCL objects shared by TCL interpreter instances? Is there any way to disable TCL object sharing across interpreter instances?
Is the situation any better in TCL version 8.6.3?
(gdb) bt
#0 __kernel_vsyscall () at arch/x86/vdso/vdso32/sysenter.S:49
#1 0x001b7871 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#2 0x001b914a in abort () at abort.c:92
#3 0x080f611c in Tcl_PanicVA ()
#4 0x080f613b in Tcl_Panic ()
#5 0x0810133c in Ptr2Block ()
#6 0x08100e04 in TclpFree ()
#7 0x080b46a7 in Tcl_Free ()
#8 0x08100686 in FreeStringInternalRep ()
#9 0x080fdac1 in ResetObjResult ()
#10 0x080fd316 in Tcl_GetStringResult ()
#11 0x0808aaad in run_tcl_proc (pDevice=0x8e0ba08, pInterp=0x8d798c0, iNumArgs=2, objv=0x115434c, bIsCommand=0 '\000', pCommand=0x0)
#12 0x08093672 in Tcl_begin_next_state (pDevice=0x8e0ba08, iNextState=RunPoll, pCommand=0x0)
#13 0x08093759 in Tcl_port_thread (dummy=0x8d1cab8)
#14 0x008bcb39 in start_thread (arg=0x1154b70) at pthread_create.c:301
#15 0x0026fc2e in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:133
(gdb)
(gdb) bt
#0 __kernel_vsyscall () at arch/x86/vdso/vdso32/sysenter.S:49
#1 0x00395871 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#2 0x0039714a in abort () at abort.c:92
#3 0x080f611c in Tcl_PanicVA ()
#4 0x080f613b in Tcl_Panic ()
#5 0x0810133c in Ptr2Block ()
#6 0x08100e04 in TclpFree ()
#7 0x080b46a7 in Tcl_Free ()
#8 0x080d21b6 in TclExecuteByteCode ()
#9 0x080d1bc1 in TclCompEvalObj ()
#10 0x080fbd5c in TclObjInterpProc ()
#11 0x080b026a in TclEvalObjvInternal ()
#12 0x080d2716 in TclExecuteByteCode ()
#13 0x080d1bc1 in TclCompEvalObj ()
#14 0x080fbd5c in TclObjInterpProc ()
#15 0x080b026a in TclEvalObjvInternal ()
#16 0x080b0517 in Tcl_EvalObjv ()
#17 0x0808aa02 in run_tcl_proc (pDevice=0x94a2500, pInterp=0xac2bba0, iNumArgs=2, objv=0x11b034c, bIsCommand=0 '\000', pCommand=0x0)
#18 0x08093672 in Tcl_begin_next_state (pDevice=0x94a2500, iNextState=RunPoll, pCommand=0x0)
#19 0x08093759 in Tcl_port_thread (dummy=0x9365e98)
#20 0x00356b39 in start_thread (arg=0x11b0b70) at pthread_create.c:301
#21 0x0044dc2e in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:133
(gdb)
The calls Tcl_IncrRefCount (actually a simple macro) and Tcl_DecrRefCount (a complicated macro) are sort of thread safe, but only because each Tcl_Obj should only ever be accessed from the thread that created it; parallel calls to T_IRC and T_DRC are fine, so long as they're on different values. The plus side of this is that accesses don't need locking (and the memory manager for Tcl_Obj structures takes advantage of this).
Note that multi-threaded access is not a good plan at all unless you're very careful, since even reader operations like Tcl_GetIntFromObj can write to the underlying structure if a type transformation needs to be applied. These operations are not locked. Doing it at all needs very intimate knowledge of the current type of the value (not something that you're usually encouraged to think about in Tcl in the first place, though tcl::unsupported::representation can be helpful with probing this in 8.6) and some very careful interlocking between the threads so that one isn't writing while the other is peeking. "Don't do this at all", while not 100% accurate, is the approach least likely to lead to headaches.
You probably ought to read more about how you're supposed to do it. The ActiveState blog has a reasonable introduction.
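One way to follow the "don't share" advice is to hand the other thread a private copy of the value's bytes instead of the Tcl_Obj itself. The sketch below is plain C and makes no Tcl calls; ThreadMessage, message_create and message_destroy are hypothetical names. It assumes that in real code the sending thread would obtain the bytes with Tcl_GetStringFromObj and the receiving thread would build its own object with Tcl_NewStringObj, and it leaves out the queueing and locking between the threads.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical message type for handing a value to another thread.
 * It owns a private copy of the bytes, so no object is shared. */
typedef struct {
    char  *bytes;   /* NUL-terminated private copy */
    size_t length;  /* number of bytes, excluding the terminator */
} ThreadMessage;

/* Copy `length` bytes from the sender's buffer into a new message.
 * Returns NULL if memory cannot be allocated. */
ThreadMessage *message_create(const char *bytes, size_t length)
{
    assert(bytes != NULL);

    ThreadMessage *msg = malloc(sizeof *msg);
    if (msg == NULL)
        return NULL;

    msg->bytes = malloc(length + 1);
    if (msg->bytes == NULL) {
        free(msg);
        return NULL;
    }
    memcpy(msg->bytes, bytes, length);
    msg->bytes[length] = '\0';
    msg->length = length;
    return msg;
}

/* Release a message; meant to be called by the receiving side once it
 * has built its own value from the copied bytes. */
void message_destroy(ThreadMessage *msg)
{
    if (msg == NULL)
        return;
    free(msg->bytes);
    free(msg);
}
```

Because each side only ever touches its own object, no reference count is modified from two threads, at the cost of one copy per hand-off.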

Weird SEGFAULT while loading DLL under gdb

I have a small C program that loads a custom DLL and uses a couple of functions. I can run the program from the console and it works as intended. (I'm compiling with MinGW on Windows XP)
But if I run it from gdb, when it gets to loading the DLL, I get:
56 ldll = LoadLibrary("gsp810.dll");
(gdb) n
Program received signal SIGSEGV, Segmentation fault.
0x7c929af2 in ntdll!RtlpWaitForCriticalSection () from C:\WINDOWS\system32\ntdll.dll
The weird thing is, if I make a backtrace at this point, I get a strange stack of Windows functions, which doesn't even contain my own program's stack (see below). However, if I keep running, it'll eventually return to my main() function and everything seems to be back to normal. The program works as expected and the functions from the DLL can be called.
(gdb) backtrace
#0 0x7c929af2 in ntdll!RtlpWaitForCriticalSection () from C:\WINDOWS\system32\ntdll.dll
#1 0x7c911046 in ntdll!RtlEnterCriticalSection () from C:\WINDOWS\system32\ntdll.dll
#2 0x00e161a0 in ?? ()
#3 0x77da6cf8 in RegCloseKey () from C:\WINDOWS\system32\advapi32.dll
#4 0x77da78e4 in RegOpenKeyExA () from C:\WINDOWS\system32\advapi32.dll
#5 0x77f44fcd in SHLWAPI!PathMakeSystemFolderW () from C:\WINDOWS\system32\shlwapi.dll
#6 0x77f452e8 in SHLWAPI!PathMakeSystemFolderW () from C:\WINDOWS\system32\shlwapi.dll
#7 0x77f45252 in SHLWAPI!PathMakeSystemFolderW () from C:\WINDOWS\system32\shlwapi.dll
#8 0x7c91118a in ntdll!LdrInitializeThunk () from C:\WINDOWS\system32\ntdll.dll
#9 0x77f40000 in ?? ()
#10 0x7c92b5d2 in ntdll!LdrFindResourceDirectory_U () from C:\WINDOWS\system32\ntdll.dll
#11 0x7c9262db in ntdll!RtlValidateUnicodeString () from C:\WINDOWS\system32\ntdll.dll
#12 0x7c92643d in ntdll!LdrLoadDll () from C:\WINDOWS\system32\ntdll.dll
#13 0x00000000 in ?? ()
Is this SEGFAULT normal, or does it indicate an underlying problem with the DLL?
EDIT: OK, it looks like the problem is in the DLL itself. What I don't understand is the backtrace gdb is showing, as it does not contain the functions in my application. Then, at a certain point, it somehow "switches" to my stack, and the program keeps running as if nothing had happened.
Is it possible that Windows is somehow "handling" the segmentation fault and then returning control to the application?
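As a rough illustration of the idea in that last question, a process can install its own fault handler and carry on afterwards, and a debugger typically reports the fault before that handler gets to run. The sketch below is a POSIX analogy only, not the Windows mechanism, and it will not build with MinGW as written. The names on_segv and run_with_recovery are hypothetical, and the fault is simulated with raise(SIGSEGV) rather than a real invalid access.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <string.h>

/* Where execution resumes after the handler has run. */
static sigjmp_buf recovery_point;

/* Hypothetical handler: jump back to the recovery point instead of dying. */
static void on_segv(int signo)
{
    (void)signo;
    siglongjmp(recovery_point, 1);
}

/* Returns 1 if a SIGSEGV was raised and recovered from, 0 if no fault
 * occurred, and -1 if the handler could not be installed. */
int run_with_recovery(int trigger_fault)
{
    struct sigaction sa, old;
    int recovered = 0;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_segv;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, &old) != 0)
        return -1;

    if (sigsetjmp(recovery_point, 1) == 0) {
        if (trigger_fault)
            raise(SIGSEGV);   /* simulated fault */
    } else {
        recovered = 1;        /* reached via siglongjmp from the handler */
    }

    sigaction(SIGSEGV, &old, NULL);  /* restore the previous handler */
    return recovered;
}
```

Run under gdb, a program like this would normally stop at the signal first; continuing lets the handler run and the program proceeds, which resembles the behaviour described in the question.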
