Why does the same function behave differently after linking other object files?

I am tracking down a bug in an R extension which only occurs on Debian systems.
At runtime, the SSL_CTX_new function aborts with "stack smashing detected", which might indicate a memory error such as a segfault.
To understand the bug, I wrote a standalone test function:
#include <Rcpp.h>
#include <openssl/ssl.h>

RcppExport SEXP test() {
BEGIN_RCPP
    SSL_library_init();
    SSL_CTX_new(SSLv23_client_method());
END_RCPP
}
This function runs normally on its own.
However, after linking my existing project with the test function, it produces "stack smashing detected".
Why does the same function behave differently after linking other object files? Could anyone give me some hints? Thanks!
Here is my project: https://github.com/wush978/RMessenger. So far it only crashes on Debian.

R handles its own memory management. The Valgrind memory profiler / debugger has been used successfully before, and there are some posts on the web.
If I understand your posts correctly, then the SSL routine may be doing something that upsets R. You will have to debug that. What you have posted here does not constitute a reproducible bug report.
You may also find the feedback you could get on the rcpp-devel list helpful.

Related

C Runtime Library issue MD/MDd

We have a C library that we want to distribute, alongside C example code.
The library is of course built in release mode.
The example code project is in cmake so that it can be easily run on both Linux and Windows.
On Linux (debug and release) as well as on Windows (release), we have no issue.
However, on Windows (debug), we have an issue when leaving main: the program triggers an assertion:
Invalid address specified to RtlValidateHeap
Expression: _CrtIsValidHeapPointer(block)
Then when continuing the process, it raises the following exception:
Unhandled exception at [...] (ntdll.dll)
0xC000000D STATUS_INVALID_PARAMETER
As this seemed related to the runtime library, we tried changing it from MDd (multithreaded DLL debug) to MD (multithreaded DLL), and that solved the issue.
However, this seems like a workaround rather than a fix: the release library (built with MD) should be usable in a debug program using MDd, right?
As we understand it, conflicts between runtime libraries only appear when allocation is made in the caller and deallocation is made in the callee, or vice versa.
So we tracked all allocations to check them and everything seems ok.
We ran leak detections on the example code in both Linux (Valgrind) and Windows (CrtDbg) but they didn't find any leak, everything seems fine.
Is it right to expect a release library built with MD to run in an MDd program?
If not, it seems strange: libraries are always distributed in release, yet used in debug solutions while developing...
If yes, what can cause the issue?
It sounds more like heap corruption than a leak. It implies that something is writing to the heap past the end of its allocated memory. Finding it can be a pain in the butt.
First, check your example code. Strip it down to the bare minimum "hello world" and then build it back up until the problem happens again. If it is not the example code, check which library functions were called and code-review those.
As an aid, you can use the MS heap check functions. Place them at function entry and function exit, or maintain global versions that you regularly check. The following is an example:
#include <crtdbg.h>

void example(void)
{
    _CrtMemState memStateStart, memStateEnd, memStateDelta;

    // Make a checkpoint of the heap's state so we can later check the heap is still OK
    _CrtMemCheckpoint( &memStateStart );

    //
    // do your things
    //

    // Check the heap
    _CrtMemCheckpoint( &memStateEnd );
    _CrtSetReportMode( _CRT_WARN, _CRTDBG_MODE_WNDW );
    _CrtSetReportMode( _CRT_ERROR, _CRTDBG_MODE_WNDW );
    _CrtSetReportMode( _CRT_ASSERT, _CRTDBG_MODE_WNDW );
    if (_CrtMemDifference( &memStateDelta, &memStateStart, &memStateEnd ))
        _CrtMemDumpStatistics( &memStateDelta );
    _CrtDumpMemoryLeaks();
}
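On the ownership point raised in the question (allocation on one side of the library boundary, deallocation on the other): a common way to keep a release library usable from a debug program is to make sure every block is released by the same module that allocated it. The following is only a minimal sketch of that pattern. The type `lib_buffer` and the functions `lib_buffer_create` and `lib_buffer_destroy` are hypothetical names used for illustration; they are not taken from the library discussed above.

```c
#include <stdlib.h>

/* Hypothetical library type. In a real library the definition would live
   in the library's own source file, built against the library's CRT. */
typedef struct lib_buffer {
    size_t size;
    unsigned char *data;
} lib_buffer;

/* Allocation happens inside the library, on the library's heap. */
lib_buffer *lib_buffer_create(size_t size)
{
    lib_buffer *buf = malloc(sizeof *buf);
    if (buf == NULL)
        return NULL;

    /* calloc zero-fills; request at least one byte so size 0 is valid. */
    buf->data = calloc(size ? size : 1, 1);
    if (buf->data == NULL) {
        free(buf);
        return NULL;
    }
    buf->size = size;
    return buf;
}

/* Deallocation is also done by the library, so the caller never hands a
   library pointer to its own free(). Accepts NULL, like free() does. */
void lib_buffer_destroy(lib_buffer *buf)
{
    if (buf == NULL)
        return;
    free(buf->data);
    free(buf);
}
```

With this shape, the example program only ever calls `lib_buffer_create` and `lib_buffer_destroy`, so it does not matter which runtime its own `malloc` and `free` come from.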

Why might malloc'd memory from a shared library be inaccessible to the application?

I maintain a library written in C, which is being accessed by a user on Linux, directly from Python, using a module which loads the shared library and calls its functions. The module is very commonly used, as is this version of the shared library, by people doing a popular tutorial.
The user is getting a segmentation fault. Running his Python script under gdb, he sees that it is in the shared library, within a function that mallocs memory for a struct and returns the pointer. He is getting a pointer back, but when he attempts to use it in subsequent calls to the shared library, the segmentation fault occurs because the memory is inaccessible.
If he runs the Python script as root, the problem does not occur. Nor does it occur in an alternate Linux installation.
So to recap:
His Python code loads the shared library.
It then calls a function which returns a pointer to memory allocated within the shared library.
Then he calls another function in the shared library and passes in the pointer it returned to him, and the shared library chokes on its own pointer.
It only occurs when he runs it as a normal user on "4.0.7-2-ARCH x86_64 GNU/Linux". It does not occur on that OS when he switches to root and runs it.
It does not occur when he attempts to reproduce the problem on an Ubuntu machine.
What gives? Is this some Arch bug? Or are there programming nuances to this which can be cleared up?
You can read the minutiae here which includes enough detail to reproduce the problem, if the problem is not self-evident to users with more Linux programming experience than I.
Quick links to the shared library functions:
Source code for TCOD_map_new.
Source code for TCOD_map_set_properties.
Excerpt of his Python code for posterity and ease of access:
#!/usr/bin/env python2
import curses
import libtcodpy as libtcod

def main(stdscr):
    curses.start_color()
    curses.use_default_colors()
    map = libtcod.map_new(10, 10) # any numbers work
    libtcod.map_set_properties(map, 0, 0, True, True) # any in bounds integer coordinates fail
    stdscr.getch()

curses.wrapper(main)
I ran into the same problem. My solution was to allocate the string (with malloc()) in the caller, then pass it by reference to the callee, which fills in the content.
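The caller-allocates pattern described in this answer could look roughly like the sketch below. The function `fill_greeting` and its parameters are hypothetical and only illustrate the shape of such an API; they are not part of libtcod.

```c
#include <stdio.h>

/* Hypothetical callee: writes into a buffer that the caller owns.
   Returns 0 on success, -1 if an argument is missing or the buffer
   is too small to hold the whole string. */
int fill_greeting(char *out, size_t out_size, const char *name)
{
    if (out == NULL || out_size == 0 || name == NULL)
        return -1;

    int n = snprintf(out, out_size, "hello, %s", name);
    if (n < 0 || (size_t)n >= out_size)
        return -1;   /* encoding error, or the output was truncated */

    return 0;
}
```

The caller would allocate the buffer itself (for example `char *buf = malloc(32);`), call `fill_greeting(buf, 32, "world")`, and later release it with its own `free(buf)`, so the memory never changes owner.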

Link time running of code? (specifically, issues raised by Valgrind seemingly at linktime)

I suspect this is an issue with my understanding of how the linking of shared objects takes place on Linux.
Using Valgrind with OpenCL (which, from various other posts, I understand to be problematic in its own right), I'm encountering errors from a module that is part of the shared object but is never actually run.
Specifically, I have an OpenCL helper module that has a series of functions for performing OpenCL actions. I have removed all references to the functions within this module from the executing code. I would naively assume, therefore, that this module cannot raise any problems with Valgrind.
However, I am seeing issues that are raised through _dl_init.c (lots of them, showing how broken OpenCL is with Valgrind). This suggests to me that code within the OpenCL runtime is being executed at link time.
Can someone clarify (or point me to suitable clarification material) how _dl_init.c is involved in the linker process?
Is it universally true that the .so files execute some initialisation code, or is it a library option?
Is this something that is easily accessible to library writers, or does it involve nefarious hacks?
Shared objects (.so files) are permitted to have code that is executed as soon as the library is loaded, regardless of the use of any of the code in the library.
This is used, for example, to perform static initialization of C++ objects.
If you don't want to have valgrind complaining about things that are being done in the library behind your back, then you can run valgrind so that it generates output that can be used as a suppression file by passing in --gen-suppressions=all. You use the suppression output in a suppression file of your own when running against the library and it should mask out these issues.
Cases when it appears:
If you're using C++ code and have globally scoped objects, then their constructors are called when the library is loaded
If you add the GCC-specific __attribute__((constructor)) to a function definition, that function is called when the library is loaded
e.g. (C++) at a global scope:
#include <iostream>

class demo {
public:  // the constructor must be accessible for the global object below
    demo() { std::cout << "Hello World" << std::endl; }
    ~demo() {}
};

demo ademo;
e.g. (C)
#include <stdio.h>

static void __attribute__((constructor)) my_init() {
    printf("Hello World\n");
}
Correspondingly, the destructors of global objects and functions marked __attribute__((destructor)) run when the library is unloaded.
The ELF spec defines the presence of an .init and .fini section, which are, for libraries, the mechanism that is used to run constructor-type and destructor-type code when the library is loaded/unloaded. For a standard executable, the same mechanism applies, plus the startup code that gets you to main with the appropriate parameters.
You can explicitly change these entry points, but that's a little bit dangerous and can lead to crashes and unknown bugs. It makes more sense to hook into the mentioned supported mechanisms.
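To make the load/unload hooks concrete, here is a small sketch that pairs a constructor with a destructor using the GCC/Clang attributes mentioned above. The names `demo_init`, `demo_fini`, `demo_is_ready` and the `library_ready` flag are made up for this illustration.

```c
#include <stdio.h>

/* Set by the constructor so later code can tell that it has already run. */
static int library_ready = 0;

/* GCC/Clang extension: runs when the object is loaded, before main()
   (or during dlopen() for a library loaded at runtime). */
static void __attribute__((constructor)) demo_init(void)
{
    library_ready = 1;
    fprintf(stderr, "demo_init: loaded\n");
}

/* Runs when the object is unloaded (dlclose) or at normal process exit. */
static void __attribute__((destructor)) demo_fini(void)
{
    fprintf(stderr, "demo_fini: unloading\n");
}

/* Hypothetical accessor: reports whether the constructor has run. */
int demo_is_ready(void)
{
    return library_ready;
}
```

Because `demo_init` runs at load time, `demo_is_ready()` already returns 1 the first time any other code calls it, even if nothing in the program ever calls `demo_init` by name. This is the same effect that makes an unused module show up in Valgrind output.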

Can I make valgrind ignore glibc libraries?

Is it possible to tell valgrind to ignore some set of libraries?
Specifically, the glibc libraries.
Actual Problem:
I have some code that runs fine in normal execution. No leaks etc.
When I try to run it through valgrind, I get core dumps and the program restarts or stops.
The core dump usually points to glibc functions (usually fseek, mutex functions, etc.).
I understand that there might be some issue with incompatible glibc / valgrind version.
I tried various valgrind releases and glibc versions but no luck.
Any suggestions?
This probably doesn't answer your question, but will provide you the specifics of how to suppress certain errors (which others have alluded to but have not described in detail):
First, run valgrind as follows:
valgrind --gen-suppressions=all --log-file=valgrind.out ./a.out
Now the output file valgrind.out will contain some automatically-generated suppression blocks like the following:
{
stupid sendmsg bug: http://sourceware.org/bugzilla/show_bug.cgi?id=14687
Memcheck:Param
sendmsg(mmsg[0].msg_hdr)
fun:sendmmsg
obj:/usr/lib/libresolv-2.17.so
fun:__libc_res_nquery
obj:/usr/lib/libresolv-2.17.so
fun:__libc_res_nsearch
fun:_nss_dns_gethostbyname4_r
fun:gaih_inet
fun:getaddrinfo
fun:get_socket_fd
fun:main
}
Where "stupid sendmsg bug" and the link are the name that I added to refer to this block. Now, save that block to sendmsg.supp and tell valgrind about that file on the next run:
valgrind --log-file=valgrind.out --suppressions=sendmsg.supp ./a.out
And valgrind will graciously ignore that stupid upstream bug.
As noted by unwind, valgrind has an elaborate mechanism for controlling which procedures are instrumented and how. But both valgrind and glibc are complicated beasts, and you really, really, really don't want to do this. The easy way to get a glibc and valgrind that are mutually compatible is to get both from the Linux distro of your choice. Things should "just work", and if they don't, you have somebody to complain to.
Yes, look into Valgrind's suppression system.
You probably want to ask about this on the Valgrind user's mailing list (which is extremely helpful). You can suppress output from certain calls, however, suppressing the noise is all you are doing. The calls are still going through Valgrind.
To accomplish what you need, you (ideally) match Valgrind appropriately with glibc or use the macros in valgrind/valgrind.h to work around them. Using those, yes, you can tell Valgrind not to touch certain things. I'm not sure which calls are breaking everything; however, you can also (selectively) not run bits of code in your own program if it is run under Valgrind. See the RUNNING_ON_VALGRIND macro in valgrind/valgrind.h.
The other thing that comes to mind is to make sure that Valgrind was compiled correctly to deal with threads. Keep in mind that, if not properly configured, atomic operations under Valgrind could cause your program to crash during racy operations where it otherwise might not.
If you have been swapping versions of valgrind and glibc, there's a chance you found a match, but incorrectly configured valgrind at build time.
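As an illustration of the RUNNING_ON_VALGRIND approach mentioned above, the sketch below skips a section of code when the process runs under Valgrind. `fragile_section` and `run_unless_valgrind` are hypothetical names standing in for whatever code misbehaves. The `__has_include` fallback is an assumption added here so the file still builds on machines without the Valgrind headers; it is not something the answer itself suggests.

```c
#include <stdio.h>

/* Use the real header when it is installed; otherwise fall back to a stub
   so the code still builds. __has_include is supported by GCC and Clang. */
#if defined(__has_include)
#  if __has_include(<valgrind/valgrind.h>)
#    include <valgrind/valgrind.h>
#  endif
#endif
#ifndef RUNNING_ON_VALGRIND
#  define RUNNING_ON_VALGRIND 0
#endif

/* Hypothetical stand-in for the code that misbehaves under Valgrind. */
static int fragile_section(void)
{
    return 42;
}

/* Returns the result of the fragile section, or -1 when it was skipped
   because the process is running under Valgrind. */
int run_unless_valgrind(void)
{
    if (RUNNING_ON_VALGRIND) {
        fprintf(stderr, "under valgrind: skipping fragile section\n");
        return -1;
    }
    return fragile_section();
}
```

Note that this changes what is being tested: the skipped code is simply not exercised under Valgrind, so it is better suited to isolating a problem than to hiding one permanently.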

Implementing traceback on i386

I am currently porting our code from an alpha (Tru64) to an i386 processor (Linux) in C.
Everything has gone pretty smoothly up until I looked into porting our exception handling routine. Currently we have a parent process which spawns lots of subprocesses, and when one of these subprocesses dies with an unhandled fatal error, I have routines to catch it.
I am currently struggling to find the best method of implementing a traceback routine which can list the function addresses in the error log. Currently my routine just prints the signal which caused the exception and the exception qualifier code.
Any help would be gratefully received. Ideally I would write error handling for all processors, but at this stage I only really care about i386 and x86_64.
Thanks
Mark
The glibc functions backtrace() and backtrace_symbols(), from execinfo.h, might be of use.
You might look at http://tlug.up.ac.za/wiki/index.php/Obtaining_a_stack_trace_in_C_upon_SIGSEGV. It covers the functionality you need. However you must link against libgdb and libdl, compile with -rdynamic (includes more symbols in the executable), and forgo the use of some optimizations.
There are two GNU (non-POSIX) functions that can help you: backtrace() and backtrace_symbols(). The first returns an array of function addresses, and the second resolves addresses to names. Unfortunately, the names of static functions cannot be resolved.
To get it working, you need to compile your binary with the -rdynamic flag.
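A rough sketch of how these functions might be wired into a fatal-signal handler in each child process follows. It uses backtrace_symbols_fd(), the glibc sibling of backtrace_symbols() that writes straight to a file descriptor without calling malloc(). The names `dump_backtrace`, `fatal_signal_handler` and `install_fatal_handlers` are hypothetical, and the choice of signals is an assumption. Treat it as a starting point rather than a hardened implementation, since backtrace() is not formally async-signal-safe.

```c
#include <execinfo.h>
#include <signal.h>
#include <unistd.h>

#define MAX_FRAMES 64

/* Writes the current call stack to fd and returns the number of frames.
   backtrace_symbols_fd() is used because it does not allocate memory. */
int dump_backtrace(int fd)
{
    void *frames[MAX_FRAMES];
    int count = backtrace(frames, MAX_FRAMES);
    backtrace_symbols_fd(frames, count, fd);
    return count;
}

/* Handler for fatal signals: log the stack, then restore the default
   action and re-raise so the parent still sees the original signal. */
static void fatal_signal_handler(int signo)
{
    static const char msg[] = "fatal signal, backtrace follows:\n";
    ssize_t ignored = write(STDERR_FILENO, msg, sizeof msg - 1);
    (void)ignored;

    dump_backtrace(STDERR_FILENO);
    signal(signo, SIG_DFL);
    raise(signo);
}

/* Hypothetical setup function; call it early in each child process.
   Returns 0 on success, -1 if a handler could not be installed. */
int install_fatal_handlers(void)
{
    /* Warm-up call: the first backtrace() may load libgcc and allocate,
       which is better done here than inside a signal handler. */
    void *warmup[1];
    (void)backtrace(warmup, 1);

    if (signal(SIGSEGV, fatal_signal_handler) == SIG_ERR) return -1;
    if (signal(SIGFPE, fatal_signal_handler) == SIG_ERR) return -1;
    if (signal(SIGILL, fatal_signal_handler) == SIG_ERR) return -1;
    return 0;
}
```

Built with -rdynamic, the output lines contain function names for exported symbols; without it, or for static functions, only raw addresses appear, which can still be resolved offline with a tool such as addr2line.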
Unfortunately, there isn't a "best" method since the layout of the stack can vary depending on the CPU, the OS and the compiler used to compile your code. But this article may help.
Note that you must implement this in the child process; the parent process just gets a signal that something is wrong; you don't get a copy of the child stack.
In a comment, you state that you are using gcc. This could be useful: http://gcc.gnu.org/onlinedocs/gcc-4.4.3/gcc/Return-Address.html#Return-Address
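The builtin documented on that page, __builtin_return_address, can be tried with a sketch like the one below. The helper names `current_return_address` and `log_call_site` are made up for illustration. Only level 0 is used here, because the GCC documentation warns that non-zero levels may be unreliable on some targets.

```c
#include <stdio.h>

/* GCC/Clang builtin: level 0 is the address this function returns to.
   noinline keeps the function a real call so that address is meaningful. */
__attribute__((noinline)) void *current_return_address(void)
{
    return __builtin_return_address(0);
}

/* Hypothetical logging helper: prints the address it was called from. */
__attribute__((noinline)) void log_call_site(const char *what)
{
    fprintf(stderr, "%s called from %p\n", what,
            __builtin_return_address(0));
}
```

This yields a single address per call rather than a full stack, so it is mostly useful for tagging log entries with their call site.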
If you're fine with only getting proper backtraces when running through valgrind, then this might be an option for you:
VALGRIND_PRINTF_BACKTRACE(format, ...):
It will give you the backtrace for all functions, including static ones.

Resources