RSA_sign() segfaults unpredictably - c

I am using RSA_sign() to create DKIM signatures. Occasionally, and quite unpredictably, the code just crashes.
I get a gdb backtrace like this:
Thread 39 (Thread 0x41401940 (LWP 31921)):
#0 0x0000003b9dacc3bb in BN_div () from /lib64/libcrypto.so.6
#1 0x0000003b9daceb40 in BN_mod_inverse () from /lib64/libcrypto.so.6
#2 0x0000003b9dacb609 in BN_BLINDING_create_param () from /lib64/libcrypto.so.6
#3 0x0000003b9dadc9f7 in RSA_setup_blinding () from /lib64/libcrypto.so.6
#4 0x0000003b9daee954 in ?? () from /lib64/libcrypto.so.6
#5 0x0000003b9daef56b in ?? () from /lib64/libcrypto.so.6
#6 0x0000003b9da6e965 in RSA_sign () from /lib64/libcrypto.so.6
#7 0x0000000000403e7f in dkim_create (headers=0x2aaaac001840, headerc=7,
......., v=0) at firm-dkim.c:145
firm-dkim.c is available here:
http://code.google.com/p/firm-dkim/source/browse/trunk/firm-dkim.c
How can I debug this further?
Thanks
Ram

OK, I think I found the error.
The code in firm-dkim.c does not allocate any memory for RSA *rsa_private (line 48), and this unallocated pointer is then used in RSA_sign() and RSA_free().
I think that must be causing the segfault. I have now allocated the memory and am running the daemon in production. Hopefully there will be no more segfaults.

Related

Why does ASan throw a segmentation fault here?

This is a follow-up to my previous question.
I am trying to build a legacy codebase with Address Sanitizer. As soon as I run the binary, I get a Segmentation Fault.
==2940167==ERROR: AddressSanitizer: global-buffer-overflow on address 0x000002aa79a0 at pc 0x000000d6aaf4 bp 0x7fffe7244140 sp 0x7fffe7244130
READ of size 4 at 0x000002aa79a0 thread T3
Thread 1 "block_hk.xcg." received signal SIGSEGV, Segmentation fault.
Here is the trace with gdb:
#0 0x00007ffff6f0dc26 in __sanitizer::internal_strlen(char const*) () from /lib64/libasan.so.5
#1 0x00007ffff6e57208 in printf_common(void*, char const*, __va_list_tag*) () from /lib64/libasan.so.5
#2 0x00007ffff6e59566 in vsnprintf () from /lib64/libasan.so.5
#3 0x00007ffff540ed5c in ?? () from /lib64/libdus.so
#4 0x00007ffff540f017 in ?? () from /lib64/libdus.so
#5 0x00007ffff540f860 in ?? () from /lib64/libdus.so
#6 0x00007ffff541a443 in logPrint () from /lib64/libdus.so
#7 0x00007ffff4761734 in mcsCreateSem () from /lib64/libcwl.so
#8 0x00007ffff4764b14 in mcsInitialize () from /lib64/libcwl.so
#9 0x00007ffff47678fc in cwlAppRunCPP () from /lib64/libcwl.so
#10 0x000000000112d719 in main (argc=1, argv=0x7ffffffe7478) at ./src/main.c:755

Fixing AddressSanitizer: strcpy-param-overlap with memmove?

I am poking around in an old & quite buggy C program. When compiled with gcc -fsanitize=address I got this error while running the program itself:
==635==ERROR: AddressSanitizer: strcpy-param-overlap: memory ranges [0x7f37e8cfd5b5,0x7f37e8cfd5b8) and [0x7f37e8cfd5b5, 0x7f37e8cfd5b8) overlap
#0 0x7f390c3a8552 in __interceptor_strcpy /build/gcc/src/gcc/libsanitizer/asan/asan_interceptors.cc:429
#1 0x56488e5c1a08 in backupExon src/BackupGenes.c:72
#2 0x56488e5c2df1 in backupGene src/BackupGenes.c:134
#3 0x56488e5c426e in BackupArrayD src/BackupGenes.c:227
#4 0x56488e5c0bb1 in main src/geneid.c:583
#5 0x7f390b6bfee2 in __libc_start_main (/usr/lib/libc.so.6+0x26ee2)
#6 0x56488e5bf46d in _start (/home/darked89/proj_soft/geneidc/crg_github/geneidc/bin/geneid+0x1c46d)
0x7f37e8cfd5b5 is located 3874229 bytes inside of 37337552-byte region [0x7f37e894b800,0x7f37eace71d0)
allocated by thread T0 here:
#0 0x7f390c41bce8 in __interceptor_calloc /build/gcc/src/gcc/libsanitizer/asan/asan_malloc_linux.cc:153
#1 0x56488e618728 in RequestMemoryDumpster src/RequestMemory.c:801
#2 0x56488e5bfcea in main src/geneid.c:305
#3 0x7f390b6bfee2 in __libc_start_main (/usr/lib/libc.so.6+0x26ee2)
The error was caused by this line:
/* backupExon src/BackupGenes.c:65 */
strcpy(d->dumpSites[d->ndumpSites].subtype, E->Acceptor->subtype);
I have replaced it with:
memmove(d->dumpSites[d->ndumpSites].subtype, E->Acceptor->subtype,
strlen(d->dumpSites[d->ndumpSites].subtype));
The error went away, and the program output for two different data inputs is identical to the results obtained before the change. More strcpy bugs of this kind remain further down in the source, so I need confirmation that this is the right way to fix them.
The issue & the rest of the code is here:
https://github.com/darked89/geneidc/issues/2
Assuming that E->Acceptor->subtype is at least as long as d->dumpSites[d->ndumpSites].subtype, there's no problem. You might want to check that first if you haven't already. You also need a +1 so that the string terminator (\0) is copied; thanks to @R.. for spotting it.
Your previous code made a different assumption: that d->dumpSites[d->ndumpSites].subtype was at least as long as E->Acceptor->subtype (basically the opposite).
The real equivalent would be:
memmove(
d->dumpSites[d->ndumpSites].subtype,
E->Acceptor->subtype,
strlen(E->Acceptor->subtype) + 1
);
This is the correct way to fix the code so that the source and destination are allowed to overlap.
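As a minimal sketch of the same idea in isolation: the helper below (copy_string_overlap_ok is a hypothetical name, not something from geneid) wraps the memmove call so the length is always taken from the source string plus its terminator. It assumes the caller has already made sure the destination has room for strlen(src) + 1 bytes.

```c
#include <string.h>

/* Hypothetical helper: copy the NUL-terminated string at src to dst.
 * Unlike strcpy, the two ranges are allowed to overlap.
 * Assumption: dst has room for at least strlen(src) + 1 bytes. */
char *copy_string_overlap_ok(char *dst, const char *src)
{
    /* The length comes from the source, and the +1 copies the '\0' too. */
    return memmove(dst, src, strlen(src) + 1);
}
```

Calling it with identical pointers, which is the case ASan reported above, leaves the string unchanged. Calling it with src pointing a few bytes into dst shifts the string left, which strcpy does not guarantee to handle.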

Segmentation fault in gst_mini_object_init

I'm trying to use Gstreamer in a C program.
I use udpsrc, so I have to set caps:
GstCaps *caps = gst_caps_new_empty_simple("application/x-rtp");
With this, I get a segmentation fault.
So I tried running it with G_DEBUG="fatal_warnings" gdb --args ./test_gst.
Here's the output:
Program received signal SIGSEGV, Segmentation fault.
0x76f010e4 in gst_mini_object_init (mini_object=0x28600, flags=0, type=0, copy_func=0x76ed6174 <_gst_caps_copy>, dispose_func=0x0, free_func=0x76ed5128 <_gst_caps_free>)
at gstminiobject.c:133
133 gstminiobject.c: No such file or directory.
(gdb) bt
#0 0x76f010e4 in gst_mini_object_init (mini_object=0x28600, flags=0, type=0, copy_func=0x76ed6174 <_gst_caps_copy>, dispose_func=0x0, free_func=0x76ed5128 <_gst_caps_free>)
at gstminiobject.c:133
#1 0x76ed57b4 in gst_caps_init (caps=0x28600) at gstcaps.c:209
#2 gst_caps_new_empty () at gstcaps.c:239
#3 0x76ed58f8 in gst_caps_new_empty_simple (media_type=0x110b4 "application/x-rtp") at gstcaps.c:282
#4 0x00010bbc in main ()
I don't know if this helps, but I'm working on a Raspberry Pi 3 (Raspbian).
I found a similar bug report about a segmentation fault in gst_mini_object_init(). According to this comment, you should call gst_init() before using GStreamer.
Did you call gst_init() before using the GStreamer API?

Modified stack in multi-threaded case

We're loading a symbol from a shared library via dlsym() under GNU/Linux and apparently hit some kind of race condition that results in a segmentation fault. The backtrace looks something like this:
(gdb) backtrace
#0 do_lookup_x at dl-lookup.c:366
#1 _dl_lookup_symbol_x at dl-lookup.c:829
#2 do_sym at dl-sym.c:168
#3 _dl_sym at dl-sym.c:273
#4 dlsym_doit at dlsym.c:50
#5 _dl_catch_error at dl-error.c:187
#6 _dlerror_run at dlerror.c:163
#7 __dlsym at dlsym.c:70
#8 ... (our code)
My local machine uses glibc-2.23.
I discovered that the library handle given to __dlsym() in frame #7 differs from the handle passed to _dlerror_run(). Things go wrong in the following lines of dlsym.c:
void *
__dlsym (void *handle, const char *name DL_CALLER_DECL)
{
# ifdef SHARED
  if (__glibc_unlikely (_dlfcn_hook != NULL))
    return _dlfcn_hook->dlsym (handle, name, DL_CALLER);
# endif
  struct dlsym_args args;
  args.who = DL_CALLER;
  args.handle = handle; /* <------------------ this isn't my handle! */
  args.name = name;
  /* Protect against concurrent loads and unloads. */
  __rtld_lock_lock_recursive (GL(dl_load_lock));
  void *result = (_dlerror_run (dlsym_doit, &args) ? NULL : args.sym);
  __rtld_lock_unlock_recursive (GL(dl_load_lock));
  return result;
}
GDB says
(gdb) frame 7
#7 __dlsym at dlsym.c:70
(gdb) p *(struct link_map *)args.handle
$36 = {l_addr= 140736951484536, l_name = 0x7fffe0000078 "\300\215\r\340\377\177", ...}
so this is obviously garbage. The same occurs in the higher frames, e.g. in frame #2:
(gdb) frame 2
#2 do_sym at dl-sym.c:168
(gdb) p handle
$38 = {l_addr= 140736951484536, l_name = 0x7fffe0000078 "\300\215\r\340\377\177", ...}
Unfortunately the parameter handle in frame #7 can't be displayed:
(gdb) p handle
$37 = <optimized out>
but surprisingly in frame #8 and further down in our code the handle was correct:
(gdb) frame 8
#8 ...
(gdb) p *(struct link_map *)libHandle
$38 = {l_addr = 140737160646656, l_name = 0x7fffd8005b60 "/path/to/libfoo.so", ...}
My conclusion is that the variable args must be modified during execution inside __dlsym(), but I can't see where or why.
I have to confess that there's a second aspect to this problem: it only occurs in a multi-threaded environment, and only sometimes. But as you can see, the implementation of __dlsym() has some countermeasures against race conditions, since it calls __rtld_lock_(un)lock_recursive() and the local variable args isn't shared across threads. Curiously enough, the problem persists even if I make frame #8 mutually exclusive among my threads.
Question 1: What are possible sources for the discrepancy in the library handle between frame #8 and frame #7?
Question 2: Does dlopen() yield different values for different threads? Or, to put it differently: is it possible to share the handles returned by dlopen() between different threads?
Update: I thank everybody who commented on this question and tried to answer it despite the lack of almost any viable information to do so. I found the solution to this problem. As foreseen by the commenters, it was totally unrelated to the stack traces and other information I provided. Hence, I consider this question closed and will flag it for deletion. So Long, and Thanks for All the Fish.
What are possible sources for the discrepancy in the library handle between frame #8 and frame #7?
The most likely cause is mismatch between ld-linux.so and libdl.so. As stated in this answer, ld-linux and libdl must come from the same build of GLIBC, or bad things will happen.
The mismatch can come from (A) trying to point to a different libc build via LD_LIBRARY_PATH, or (B) by static linking of libdl.a into the program.
The (gdb) info shared command should show you which libraries are currently loaded. If you see something other than the installed system ld-linux and libdl, then (A) is likely your problem.
For (B), you probably got (and ignored) a linker warning to the effect that your program will require at runtime the same libc version that you used to link it. Contrary to popular belief, fully-static binaries are less portable on Linux, not more.

Stack frame NULL in backtrace log

My application is receiving a segmentation fault. The backtrace log:
Program received signal SIGSEGV, Segmentation fault.
0x00000000004a5c03 in engine_unlocked_finish ()
(gdb) bt
#0 0x00000000004a5c03 in engine_unlocked_finish ()
#1 0x00000000004a5d71 in ENGINE_finish ()
#2 0x000000000046a537 in EVP_PKEY_free_it ()
#3 0x000000000046a91b in EVP_PKEY_free ()
#4 0x00000000004b231a in pubkey_cb ()
#5 0x0000000000470c97 in asn1_item_combine_free ()
#6 0x0000000000750f70 in X509_CINF_seq_tt ()
#7 0x00000000010f7d90 in ?? ()
#8 0x00000000010f7cf0 in ?? ()
#9 0x0000000000000000 in ?? ()
The stack frame at #9 is interesting. Its address is 0x0000000000000000. Does this mean the stack got corrupted even before getting to engine_unlocked_finish()?
The stackframe at #9 is interesting.
Not really. What's most likely happening is that X509_CINF_seq_tt is hand-coded assembly, and lacks correct unwind descriptors, so everything after it in the stack trace is just bogus.
In fact, looking at this source, X509_CINF_seq_tt is not even a function, so it's probably asn1_item_combine_free that starts the "bad unwind".
