Overriding malloc with LD_PRELOAD when library functions also call malloc - c

I'm creating a small, fast app that detects memory leaks in other apps. I use LD_PRELOAD to override the default malloc and record every call the target program makes to malloc. The problem is that some C library functions use malloc too. Moreover, some library functions don't free the memory they allocate. Let me demonstrate:
MyApp.cpp
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <dlfcn.h>
static void * (* LT_MALLOC)(size_t) = 0;
static void (* LT_FREE)(void *) = 0;
static void init_malloc ()
{
char *error;
dlerror (); /* clear any stale error before calling dlsym */
*(void **) (&LT_MALLOC) = dlsym (RTLD_NEXT, "malloc");
if ((error = dlerror ()) != NULL)
{
fprintf (stderr, "%s\n", error);
_exit(1);
}
}
static void init_free ()
{
char *error;
dlerror (); /* clear any stale error before calling dlsym */
*(void **) (&LT_FREE) = dlsym (RTLD_NEXT, "free");
if ((error = dlerror ()) != NULL)
{
fprintf (stderr, "%s\n", error);
_exit(1);
}
}
extern "C"
void * malloc (size_t size) throw ()
{
if (LT_MALLOC == 0)
init_malloc ();
printf ("malloc(%zu) ", size);
void *p = LT_MALLOC(size);
printf ("p = %p\n", p);
return p;
}
extern "C"
void free (void *p) throw ()
{
if (LT_FREE == 0)
init_free ();
printf ("free(%p)\n", p);
LT_FREE(p);
}
And test.c. (Assume that the source code is not available and I only have the compiled program.)
#include <stdlib.h>
#include <stdio.h>
#include <time.h>
int main (void)
{
printf ("test start\n");
int *a = (int *)malloc(3 * sizeof (int));
if (a)
printf ("Allocated 3 int - %p\n", a);
time_t t = time(NULL);
a[0] = 1;
printf ("time - %s", ctime(&t));
printf ("a[0] = %d\n", a[0]);
free (a);
printf ("CALL FREE(a)\n");
printf ("test done\n");
return 0;
}
$ g++ -g -fPIC -c MyApp.cpp -o MyApp.o
$ g++ -g -shared MyApp.o -o MyApp.so -ldl
$ gcc -g test.c -o test
$ export LD_PRELOAD=./MyApp.so
Run the program ./test and see:
malloc(72704) p = 0x8aa040
test start
malloc(12) p = 0x6a0c50
Allocated 3 int - 0x6a0c50
free((nil))
malloc(15) p = 0x6a0c70
malloc(552) p = 0x6a0c90
free((nil))
malloc(1014) p = 0x6a0ec0
free(0x6a0c90)
malloc(20) p = 0x6a0c90
malloc(20) p = 0x6a0cb0
malloc(20) p = 0x6a0cd0
malloc(21) p = 0x6a0cf0
malloc(20) p = 0x6a0d10
malloc(20) p = 0x6a0d30
malloc(20) p = 0x6a0d50
malloc(20) p = 0x6a0d70
malloc(21) p = 0x6a0d90
free((nil))
time - Mon Mar 14 18:30:14 2016
a[0] = 1
free(0x6a0c50)
CALL FREE(a)
test done
But I want to see:
test start
malloc(12) p = 0x6a0c50
Allocated 3 int - 0x6a0c50
time - Mon Mar 14 18:30:14 2016
a[0] = 1
free(0x6a0c50)
CALL FREE(a)
test done
I want my app to ignore the malloc calls made inside library functions, but I have no idea how to do it.
Can malloc find out who called it, a library function or my own code? Or can library functions be made to call the default malloc instead of the overridden one? Or is there another way?
P.S. I am sorry for my bad English.

I have MANY times used the technique of compiling my code with a macro that replaces malloc with my own version. I thought I'd written an answer to demonstrate that, but apparently not. Something like this:
In "debugmalloc.h" or some such:
#if DEBUG_MALLOC
#define malloc(x) myTrackingMalloc(x, __FILE__, __LINE__)
#define free(x) myTrackingFree(x, __FILE__, __LINE__)
extern void* myTrackingMalloc(size_t size, const char* file, int line);
extern void myTrackingFree(void* ptr, const char* file, int line);
#endif
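Then in a source file, e.g. "debugmalloc.c" (do not include "debugmalloc.h" here, or the malloc and free calls below would expand to the tracking versions and recurse forever):
#if DEBUG_MALLOC
#include <stdlib.h>
void* myTrackingMalloc(size_t size, const char* file, int line)
{
    void *p = malloc(size);
    /* ... whatever extra stuff you need ... */
    return p;
}
void myTrackingFree(void* ptr, const char* file, int line)
{
    /* ... some extra code here ... */
    free(ptr);
}
#endif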
[I allocated some extra bytes by enlarging size, stored the tracking data there, and used an offset to return the actual payload pointer.]
This is relatively easy to implement and doesn't have the drawbacks of injecting with LD_PRELOAD. In particular, you don't need to distinguish between your code and libraries. And of course, you don't need to implement your own malloc.
It does have the slight drawback that it doesn't cover new, delete, strdup, or other library functions that allocate memory themselves. You can implement a global operator new and delete set [don't forget the array versions!], but replacing strdup and every other function that may allocate memory is quite a bit of work, and at that point valgrind is probably going to be quicker, even if it runs quite slowly.
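The "extra bytes plus offset" idea from the bracketed note can be sketched as follows. This is a minimal illustration, not the answerer's actual code: the header layout, the names tracking_malloc and tracking_free, and the two counters are all assumptions made for the example.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical bookkeeping header stored in front of every payload.
 * The union keeps the header size a multiple of the strictest alignment,
 * so the payload that follows it stays correctly aligned. */
typedef union alloc_header {
    struct {
        size_t size;        /* payload size requested by the caller */
        const char *file;   /* __FILE__ of the call site */
        int line;           /* __LINE__ of the call site */
    } info;
    max_align_t align;
} alloc_header;

static size_t live_blocks;  /* allocations not yet freed */
static size_t live_bytes;   /* payload bytes not yet freed */

void *tracking_malloc(size_t size, const char *file, int line)
{
    alloc_header *h;

    if (size > SIZE_MAX - sizeof *h)
        return NULL;                /* request too large to add a header */
    h = malloc(sizeof *h + size);
    if (h == NULL)
        return NULL;
    h->info.size = size;
    h->info.file = file;
    h->info.line = line;
    live_blocks++;
    live_bytes += size;
    return h + 1;                   /* payload starts right after the header */
}

void tracking_free(void *ptr, const char *file, int line)
{
    alloc_header *h;

    (void)file;
    (void)line;
    if (ptr == NULL)
        return;
    h = (alloc_header *)ptr - 1;    /* step back to the header */
    live_blocks--;
    live_bytes -= h->info.size;
    free(h);
}

size_t tracking_live_blocks(void) { return live_blocks; }
size_t tracking_live_bytes(void) { return live_bytes; }
```

If tracking_live_blocks() is nonzero at exit, something was leaked. A real tool would probably also link the headers into a list so that the file and line of each leaked block can be reported.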

Maybe the Dynamic Loader API can help you. There you find the function dladdr() which returns the module and the name of the symbol for a given address, if available.
Please see the man page of dlopen for a detailed explanation.
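A rough sketch of how dladdr() might be used for this, assuming glibc and a GCC-compatible compiler (for __builtin_return_address and the noinline attribute). The helper names module_of and calling_module are made up for the example. An interposed malloc could pass its own return address to module_of and log only when the result is the path of the program under test.

```c
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stddef.h>

/* Returns the path of the object (executable or shared library) that
 * contains addr, or NULL if the dynamic loader cannot tell. */
const char *module_of(const void *addr)
{
    Dl_info info;

    if (dladdr(addr, &info) == 0)
        return NULL;
    return info.dli_fname;
}

/* Returns the path of the object whose code called this function.
 * noinline keeps the return address pointing into the real caller. */
__attribute__((noinline))
const char *calling_module(void)
{
    return module_of(__builtin_return_address(0));
}
```

On older glibc versions this needs -ldl at link time. Note that dladdr() may itself allocate or take loader locks on some systems, so calling it from inside malloc should be treated with care.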

In the code posted, you do not track calloc, realloc and strdup. It would be more consistent to track all malloc related allocation functions.
You could also track fopen and fclose to disable malloc and free tracking during the execution of those. This would also allow you to track missing fclose() calls.
You can do the same for other C library functions that use malloc. It is unlikely that printf uses malloc, but you could track vfprintf to filter out any such calls. It is difficult to wrap variadic functions, but the printf family all call vfprintf or some close cousin, depending on your C library's implementation. Be careful to disable vfprintf tracking when you call printf yourself!

Related

Library interpositioning

I have been trying to intercept calls to malloc and free, following our textbook (CSAPP book).
I have followed their exact code, and nearly the same code that I found online and I keep getting a segmentation fault. I heard our professor saying something about printf that mallocs and frees memory so I think that this happens because I am intercepting a malloc and since I am using a printf function inside the intercepting function, it will call itself recursively.
However I can't seem to find a solution to solving this problem? Our professor demonstrated that intercepting worked ( he didn't show us the code) and prints our information every time a malloc occurs, so I do know that it's possible.
Can anyone suggest a working method??
Here is the code that I used and get nothing:
mymalloc.c
#ifdef RUNTIME
// Run-time interposition of malloc and free based on
// dynamic linker's (ld-linux.so) LD_PRELOAD mechanism
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <dlfcn.h>
void *malloc(size_t size) {
    static void *(*mallocp)(size_t size) = NULL;
    char *error;
    void *ptr;
    // get address of libc malloc
    if (!mallocp) {
        mallocp = dlsym(RTLD_NEXT, "malloc");
        if ((error = dlerror()) != NULL) {
            fputs(error, stderr);
            exit(EXIT_FAILURE);
        }
    }
    ptr = mallocp(size);
    printf("malloc(%d) = %p\n", (int)size, ptr);
    return ptr;
}
#endif
test.c
#include <stdio.h>
#include <stdlib.h>
int main(){
printf("main\n");
int* a = malloc(sizeof(int)*5);
a[0] = 1;
printf("end\n");
}
The result i'm getting:
$ gcc -o test test.c
$ gcc -DRUNTIME -shared -fPIC mymalloc.c -o mymalloc.so
$ LD_PRELOAD=./mymalloc.so ./test
Segmentation Fault
This is the other code that I tried, which also gave a segmentation fault (from https://gist.github.com/iamben/4124829):
libint.c
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <dlfcn.h>
void* malloc(size_t size)
{
static void* (*rmalloc)(size_t) = NULL;
void* p = NULL;
// resolve next malloc
if(!rmalloc) rmalloc = dlsym(RTLD_NEXT, "malloc");
// do actual malloc
p = rmalloc(size);
// show statistic
fprintf(stderr, "[MEM | malloc] Allocated: %lu bytes\n", size);
return p;
}
str.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define STR_LEN 128
int main(int argc, const char *argv[])
{
char *c;
char *str1 = "Hello ";
char *str2 = "World";
//allocate an empty string
c = malloc(STR_LEN * sizeof(char));
c[0] = 0x0;
//and concatenate str{1,2}
strcat(c, str1);
strcat(c, str2);
printf("New str: %s\n", c);
return 0;
}
The makefile from the git repo didn't work so I manually compiled the files and got:
$ gcc -shared -fPIC libint.c -o libint.so
$ gcc -o str str.c
$ LD_PRELOAD=./libint.so ./str
Segmentation fault
I have been doing this for hours and I still get the same incorrect result, despite the fact that I copied textbook code. I would really appreciate any help!!
One way to deal with this is to turn off the printf when your routine is called recursively:
static char ACallIsInProgress = 0;
if (!ACallIsInProgress)
{
ACallIsInProgress = 1;
printf("malloc(%d) = %p\n", (int)size, ptr);
ACallIsInProgress = 0;
}
return ptr;
With this, if printf calls malloc, your routine will merely call the actual malloc (via mallocp) and return without causing another printf. You will miss printing information about a call to malloc that the printf does, but that is generally tolerable when interposing is being used to study the general program, not the C library.
If you need to support multithreading, some additional work might be needed.
The printf implementation might allocate a buffer only once, the first time it is used. In that case, you can initialize a flag that turns off the printf similar to the above, call printf once in the main routine (maybe be sure it includes a nice formatting task that causes printf to allocate a buffer, not a plain string), and then set the flag to turn on the printf call and leave it set for the rest of the program.
Another option is for your malloc routine not to use printf at all but to cache data in a buffer to be written later by some other routine or to write raw data to a file using write, with that data interpreted and formatted by a separate program later. Or the raw data could be written by a pipe to a program that formats and prints it and that is not using your interposed malloc.
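The last option (avoiding printf entirely) could look roughly like the sketch below. It formats a record into a stack buffer by hand and emits it with a single write call, so neither stdio nor the heap is involved. The function names and the record format are assumptions chosen to mirror the printf output used in the question.

```c
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Writes the digits of v in the given base (10 or 16) to out and
 * returns how many characters were produced. */
static size_t put_number(char *out, uintmax_t v, unsigned base)
{
    char tmp[sizeof(uintmax_t) * 8];
    size_t n = 0, i;

    do {
        tmp[n++] = "0123456789abcdef"[v % base];
        v /= base;
    } while (v != 0);
    for (i = 0; i < n; i++)         /* digits were produced in reverse */
        out[i] = tmp[n - 1 - i];
    return n;
}

/* Formats "malloc(<size>) = 0x<addr>\n" into buf, which must hold at
 * least 64 bytes. Returns the number of bytes written (no terminator). */
size_t format_alloc_record(char *buf, size_t size, const void *ptr)
{
    static const char head[] = "malloc(";
    static const char mid[] = ") = 0x";
    size_t n = 0, i;

    for (i = 0; i < sizeof head - 1; i++)
        buf[n++] = head[i];
    n += put_number(buf + n, size, 10);
    for (i = 0; i < sizeof mid - 1; i++)
        buf[n++] = mid[i];
    n += put_number(buf + n, (uintptr_t)ptr, 16);
    buf[n++] = '\n';
    return n;
}

/* Emits one record on fd with a single write(2) call. */
void log_alloc(int fd, size_t size, const void *ptr)
{
    char buf[64];
    size_t len = format_alloc_record(buf, size, ptr);
    ssize_t ignored = write(fd, buf, len);

    (void)ignored;                  /* nothing sensible to do on failure */
}
```

Inside an interposed malloc, log_alloc(2, size, ptr) would then replace the printf call. Because it never calls back into malloc, no recursion guard is needed for the logging itself.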

How do I make dynamic allocations for a program run fail

I have a C program which uses malloc (it could also have been C++ with new). I would like to test my program and simulate an "out of memory" scenario.
I would strongly prefer running my program from within a bash or sh shell environment without modifying the core code.
How do I make dynamic memory allocations fail for a program run?
Seems like it could be possible using ulimit but I can't seem to find the right parameters:
$ ulimit -d 50
$ ./program_which_heap_allocates
./program_which_heap_allocates: error while loading shared libraries: libc.so.6: cannot map zero-fill pages
$ ulimit -d 51
bash: ulimit: data seg size: cannot modify limit: Operation not permitted
I'm having trouble running the program in such a way that dynamic linking can occur (such as stdlib) but not the allocations from my program.
If you are under Linux and using glibc then there are Hooks for Malloc. The hooks allow you to catch calls to malloc and make them randomly fail.
Your test suite could use an environment variable to tell the code to insert the malloc hook and which call of malloc to fail. E.g. if you set FOOBAR_FAIL_MALLOC=10 then your malloc hook would count down and let the 10th use of malloc return 0.
FOOBAR_FAIL_MALLOC=0 could simply report the numbers of mallocs in a testcase. You would then run the test once with FOOBAR_FAIL_MALLOC=0 and capture the number of mallocs involved. Then repeat for FOOBAR_FAIL_MALLOC=1 to N to test every single malloc.
This covers every case unless the program performs further mallocs after one has failed. Then you need something more complex to specify which mallocs should fail.
You could also just make the hook fail randomly. Given enough runs every malloc call would fail at some point.
Note: a C++ new should also hit the malloc hook.
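The countdown scheme described above might be sketched like this. It is written as a plain wrapper function rather than a glibc hook, partly because the hook variables were removed in glibc 2.34; the wrapper could be routed in through a macro or an LD_PRELOAD interposer. The names failing_malloc, failing_malloc_arm and failing_malloc_calls are hypothetical, while FOOBAR_FAIL_MALLOC is the variable named in the answer.

```c
#include <stdlib.h>

/* Countdown state: -1 means "not initialised yet", 0 means "never fail". */
static long fail_countdown = -1;
static unsigned long call_count;    /* total calls, for the N=0 report run */

/* Arms the countdown directly; nth <= 0 disables failure injection. */
void failing_malloc_arm(long nth)
{
    fail_countdown = nth > 0 ? nth : 0;
}

/* Number of allocation requests seen so far. */
unsigned long failing_malloc_calls(void)
{
    return call_count;
}

/* Behaves like malloc, except that the Nth call returns NULL, where N
 * comes from FOOBAR_FAIL_MALLOC unless failing_malloc_arm was used. */
void *failing_malloc(size_t size)
{
    if (fail_countdown < 0) {
        const char *s = getenv("FOOBAR_FAIL_MALLOC");

        failing_malloc_arm(s ? strtol(s, NULL, 10) : 0);
    }
    call_count++;
    if (fail_countdown > 0 && --fail_countdown == 0)
        return NULL;                /* simulated out-of-memory */
    return malloc(size);
}
```

A test driver would first run with FOOBAR_FAIL_MALLOC=0, read failing_malloc_calls() at exit to learn how many allocations occur, and then rerun once for each value from 1 to that count.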
You can have your test program include the .c under test and use a #define to override calls to malloc.
For example:
prog.c:
#include <stdio.h>
#include <stdlib.h>
void *foo(int x)
{
return malloc(x);
}
test.c:
#include <stdio.h>
#include <stdlib.h>
static char buf[100];
static int malloc_fail;
void *test_malloc(size_t n)
{
if (malloc_fail) {
return NULL;
} else {
return buf;
}
}
#define malloc(x) test_malloc(x)
#include "prog.c"
#undef malloc
int main()
{
void *p;
malloc_fail=0;
p = foo(5);
printf("buf=%p, p=%p\n", (void *)buf, p); // prints same value both times
malloc_fail=1;
p = foo(4);
if (p) {
printf("buf=%p, p=%p\n", (void *)buf, p);
} else {
printf("p is NULL\n"); // this prints
}
return 0;
}

mmap load shared object and get function pointer

I want to dynamically load a library without using the functions from dlfcn.h. I have a folder full of .so files compiled with:
gcc -Wall -shared -fPIC -o filename.so filename.c
And all of them have an entry function named:
void __start(int size, char** cmd);
(I know, probably not the best name)
Then I call open on the .so file, read the ELF header as an Elf64_Ehdr, load the file into memory with mmap (I know I should use mprotect, but I want to make it work first and then add the security), and finally add the ELF header's entry address to the pointer returned by mmap (plus an offset of 0x100).
The test code is the following:
#include <elf.h>
#include <stdio.h>
#include <unistd.h>
#include <string.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/mman.h>
typedef void (*ptr_t) (int, char**);
void* alloc_ex_memory(int fd) {
struct stat s;
fstat(fd, &s);
void * ptr = mmap(0, s.st_size, PROT_READ | PROT_WRITE | PROT_EXEC,
MAP_PRIVATE | MAP_ANONYMOUS, fd, 0);
if (ptr == (void *)-1) {
perror("mmap");
return NULL;
}
return ptr;
}
void* load_ex_file(const char* elfFile) {
Elf64_Ehdr header;
void * ptr;
FILE* file = fopen(elfFile, "rb");
if(file) {
fread(&header, 1, sizeof(header), file);
if (memcmp(header.e_ident, ELFMAG, SELFMAG) == 0) {
ptr = alloc_ex_memory(fileno(file));
printf("PTR AT -> %p\n", ptr);
printf("Entry at -> %lx\n", header.e_entry+256);
ptr = (header.e_entry + 256);
} else {
ptr = NULL;
}
fclose(file);
return ptr;
}
return NULL;
}
int main(int argc, char** argv) {
ptr_t func = load_ex_file(argv[1]);
printf("POINTER AT: %p\n", func);
getchar();
func(argc, argv);
return 0;
}
Let me explain a bit further:
When I run objdump -d filename.so, it shows that _start is at 0x6c0. The ELF header's entry point is 0x5c0 (I compensate by adding 256 decimal).
Also, pmap shows an executable area being created at, let's say, 0x7fdf94d0c000,
so the function pointer that I get and call is 0x7fdf94d0c6c0. This should be the entry point and callable via a function pointer. But what I get is, you guessed it: a segfault.
One last thing I would like to point out is that I have the same example running with dlopen, dlsym, and dlclose, but I am required to use the mmap trick. I have seen my professor implement it successfully, but I cannot figure out how. (Perhaps there is a simpler way that I'm missing.)
I based the code on this jit example
Thank you in advance!
Edit: This is the code of the .c that i compile into .so:
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
void __start(int size, char** cmd) {
if (size==2) {
if (!strcmp(cmd[1], "-l")) {
printf("******.********.*****#udc.es\n");
return;
} else if (!strcmp(cmd[1], "-n")) {
printf("****** ******** *****\n");
return;
}
} else if (size==1) {
printf("****** ******** ***** (******.********.*****#udc.es)\n");
return;
}
printf("Wrong command syntax: autores [-l | -n]\n");
}
"its required that i use the mmap trick. I have seen my professor implement it successfully but i cannot figure out how."
For this to work, your __start must be completely stand-alone, and not call any other library (you violated that requirement by calling strcmp and printf).
The moment you call an unresolved symbol, you ask the dynamic loader to resolve that symbol, and the loader needs all kinds of info for that. That info is normally set up during dlopen call. Since you bypassed dlopen, it is not at all surprising that your program crashes.
Your first step should be to change your code such that __start doesn't do anything at all (is empty), and verify that you can then call __start. That would confirm that you are computing its address correctly.
Second, add a crash to that code:
void __start(int size, char **cmd)
{
int *p = NULL;
*p = 42; // should crash here
}
and verify that you are observing the crash now. That will confirm that your code is getting called.
Third, remove the crashing code and use only direct system calls (e.g. write(1, "Hello\n", 6) (but don't call write -- you must implement it directly in your library)) to implement whatever you need.
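For the third step, a write implemented without any library call might look like the sketch below. It assumes Linux on x86-64 or AArch64 and a GCC-compatible compiler with inline assembly; the name raw_write is made up for the example, and the syscall numbers (1 on x86-64, 64 on AArch64) are those of the Linux write system call.

```c
/* Issues the Linux write system call directly, without going through
 * libc. Returns the number of bytes written, or a negative errno value. */
static long raw_write(int fd, const void *buf, unsigned long len)
{
    long ret;
#if defined(__x86_64__)
    __asm__ volatile ("syscall"
                      : "=a"(ret)
                      : "0"(1L),            /* rax = SYS_write */
                        "D"((long)fd),      /* rdi = fd */
                        "S"(buf),           /* rsi = buffer */
                        "d"(len)            /* rdx = length */
                      : "rcx", "r11", "memory");
#elif defined(__aarch64__)
    register long x8 __asm__("x8") = 64;    /* SYS_write */
    register long x0 __asm__("x0") = fd;
    register const void *x1 __asm__("x1") = buf;
    register unsigned long x2 __asm__("x2") = len;
    __asm__ volatile ("svc 0"
                      : "+r"(x0)
                      : "r"(x8), "r"(x1), "r"(x2)
                      : "memory");
    ret = x0;
#else
#error "raw_write: this sketch only covers x86-64 and AArch64 Linux"
#endif
    return ret;
}
```

Inside __start, a call such as raw_write(1, "Hello\n", 6) then needs no symbol resolution by the dynamic loader. String literals may still be reached through relocations in some builds, so checking the disassembly of the resulting .so remains worthwhile.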

Find program's code address at runtime?

When I use gdb to debug a program written in C, the disassemble command shows the instructions and their addresses in the code segment. Is it possible to get those memory addresses at runtime? I am using Ubuntu. Thank you.
[edit] To be more specific, I will demonstrate it with following example.
#include <stdio.h>
int main(int argc,char *argv[]){
myfunction();
exit(0);
}
Now I would like to get the address of myfunction() in the code segment while my program is running.
Above answer is vastly overcomplicated. If the function reference is static, as it is above, the address is simply the value of the symbol name in pointer context:
void* myfunction_address = myfunction;
If you are grabbing the function dynamically out of a shared library, then the value returned from dlsym() (POSIX) or GetProcAddress() (windows) is likewise the address of the function.
Note that the above code is likely to generate a warning with some compilers, as ISO C technically forbids assignment between code and data pointers (some architectures put them in physically distinct address spaces).
And some pedants will point out that the address returned isn't really guaranteed to be the memory address of the function, it's just a unique value that can be compared for equality with other function pointers and acts, when called, to transfer control to the function whose pointer it holds. Obviously all known compilers implement this with a branch target address.
And finally, note that the "address" of a function is a little ambiguous. If the function was loaded dynamically or is an extern reference to an exported symbol, what you really get is generally a pointer to some fixup code in the "PLT" (a Unix/ELF term, though the PE/COFF mechanism on windows is similar) that then jumps to the function.
If you know the function name before program runs, simply use
void * addr = myfunction;
If the function name is only known at run time, I once wrote a function that looks up the symbol address dynamically using the bfd library. Here is the x86_64 code; in the example you get the address via find_symbol("a.out", "myfunction").
#include <bfd.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
/* Returns the value of symname as recorded in filename, or -1 on failure. */
long find_symbol(char *filename, char *symname)
{
    bfd *ibfd;
    asymbol **symtab;
    long nsize, nsyms, i;
    long result = -1;
    symbol_info syminfo;
    char **matching;
    bfd_init();
    ibfd = bfd_openr(filename, NULL);
    if (ibfd == NULL) {
        printf("bfd_openr error\n");
        return -1;
    }
    if (!bfd_check_format_matches(ibfd, bfd_object, &matching)) {
        printf("format_matches\n");
        bfd_close(ibfd);
        return -1;
    }
    nsize = bfd_get_symtab_upper_bound(ibfd);
    if (nsize <= 0 || (symtab = malloc(nsize)) == NULL) {
        bfd_close(ibfd);
        return -1;
    }
    nsyms = bfd_canonicalize_symtab(ibfd, symtab);
    for (i = 0; i < nsyms; i++) {
        if (strcmp(symtab[i]->name, symname) == 0) {
            bfd_symbol_info(symtab[i], &syminfo);
            result = (long) syminfo.value;
            break;
        }
    }
    if (result == -1)
        printf("cannot find symbol\n");
    free(symtab);
    bfd_close(ibfd);
    return result;
}
To get a backtrace, use execinfo.h as documented in the GNU libc manual.
For example:
#include <execinfo.h>
#include <stdio.h>
#include <unistd.h>
void trace_pom()
{
const int sz = 15;
void *buf[sz];
// get at most sz entries
int n = backtrace(buf, sz);
// output them right to stderr
backtrace_symbols_fd(buf, n, fileno(stderr));
// but if you want to output the strings yourself
// you may use char ** backtrace_symbols (void *const *buffer, int size)
write(fileno(stderr), "\n", 1);
}
void TransferFunds(int n);
void DepositMoney(int n)
{
if (n <= 0)
trace_pom();
else TransferFunds(n-1);
}
void TransferFunds(int n)
{
DepositMoney(n);
}
int main()
{
DepositMoney(3);
return 0;
}
Compiled with:
gcc a.c -o a -g -Wall -Werror -rdynamic
According to the mentioned website:
Currently, the function name and offset can only be obtained on systems that use the ELF
binary format for programs and libraries. On other systems, only the hexadecimal return
address will be present. Also, you may need to pass additional flags to the linker to
make the function names available to the program. (For example, on systems using GNU
ld, you must pass -rdynamic.)
Output
./a(trace_pom+0xc9)[0x80487fd]
./a(DepositMoney+0x11)[0x8048862]
./a(TransferFunds+0x11)[0x8048885]
./a(DepositMoney+0x21)[0x8048872]
./a(TransferFunds+0x11)[0x8048885]
./a(DepositMoney+0x21)[0x8048872]
./a(TransferFunds+0x11)[0x8048885]
./a(DepositMoney+0x21)[0x8048872]
./a(main+0x1d)[0x80488a4]
/lib/i686/cmov/libc.so.6(__libc_start_main+0xe5)[0xb7e16775]
./a[0x80486a1]
Regarding a comment on another answer (getting the address of an instruction), you can use this very ugly trick:
#include <setjmp.h>
void function() {
printf("in function\n");
printf("%d\n",__LINE__);
printf("exiting function\n");
}
int main() {
jmp_buf env;
int i;
printf("in main\n");
printf("%d\n",__LINE__);
printf("calling function\n");
setjmp(env);
for (i=0; i < 18; ++i) {
printf("%p\n",env[i]);
}
function();
printf("in main again\n");
printf("%d\n",__LINE__);
}
It should be env[12] (the eip), but be careful as it looks machine dependent, so triple check my word. This is the output
in main
13
calling function
0xbfff037f
0x0
0x1f80
0x1dcb
0x4
0x8fe2f50c
0x0
0x0
0xbffff2a8
0xbffff240
0x1f
0x292
0x1e09
0x17
0x8fe0001f
0x1f
0x0
0x37
in function
4
exiting function
in main again
37
have fun!

How would a loaded library function call a symbol in the main application?

When a shared library is loaded via dlopen(), is there a way for it to call functions in the main program?
Code of dlo.c (the lib):
#include <stdio.h>
// function is defined in main program
void callb(void);
void test(void) {
printf("here, in lib\n");
callb();
}
Compile with
gcc -shared -olibdlo.so dlo.c
Here the code of the main program (copied from dlopen manpage, and adjusted):
#include <stdio.h>
#include <stdlib.h>
#include <dlfcn.h>
void callb(void) {
printf("here, i'm back\n");
}
int
main(int argc, char **argv)
{
void *handle;
void (*test)(void);
char *error;
handle = dlopen("libdlo.so", RTLD_LAZY);
if (!handle) {
fprintf(stderr, "%s\n", dlerror());
exit(EXIT_FAILURE);
}
dlerror(); /* Clear any existing error */
*(void **) (&test) = dlsym(handle, "test");
if ((error = dlerror()) != NULL) {
fprintf(stderr, "%s\n", error);
exit(EXIT_FAILURE);
}
(*test)();
dlclose(handle);
exit(EXIT_SUCCESS);
}
Build with
gcc -rdynamic main.c -ldl
Output:
[js@HOST2 dlopen]$ LD_LIBRARY_PATH=. ./a.out
here, in lib
here, i'm back
[js@HOST2 dlopen]$
The -rdynamic option puts all symbols in the dynamic symbol table (which is mapped into memory), not only the names of the used symbols. Read further about it here. Of course, you can also provide function pointers (or a struct of function pointers) that define the interface between the library and your main program. That is actually the method I would probably choose. I have heard from other people that the equivalent of -rdynamic is not so easy to achieve on Windows, and function pointers also make for cleaner communication between the library and the main program (you have precise control over what can be called and what cannot), but they require more housekeeping.
Yes. If you provide your library with a pointer to that function, the library will be able to call the function in the main program.
Here is an example, haven't compiled it so beware ;)
/* in main app */
/* define your function */
int do_it( char arg1, char arg2);
int do_it( char arg1, char arg2){
/* do it! */
return 1;
}
/* some where else in main app (init maybe?) provide the pointer */
LIB_set_do_it(&do_it);
/** END MAIN CODE ***/
/* in LIBRARY */
int (*LIB_do_it_ptr)(char, char) = NULL;
void LIB_set_do_it( int (*do_it_ptr)(char, char) ){
LIB_do_it_ptr = do_it_ptr;
}
int LIB_do_it(){
char arg1, arg2;
/* do something to the args
...
... */
return LIB_do_it_ptr( arg1, arg2);
}
The dlopen() function, as discussed by @litb, is primarily provided on systems using ELF format object files. It is rather powerful and will let you control whether symbols referenced by the loaded library can be satisfied from the main program, and generally does let them be satisfied. Not all shared library loading systems are as flexible, so be aware of this when porting your code.
The callback mechanism outlined by @hhafez works now that the kinks in that code are straightened out.
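A compact version of that callback mechanism, collapsed into one translation unit so it can be compiled and tried directly. In a real project the library half would live in the shared object and the application half in the main program. All names here (lib_set_do_it, lib_do_it, app_do_it) are placeholders for the example.

```c
#include <stddef.h>

/* ---- library side ---- */

typedef int (*lib_callback)(char, char);

static lib_callback lib_do_it_ptr = NULL;

/* Called by the application to hand the library its callback. */
void lib_set_do_it(lib_callback cb)
{
    lib_do_it_ptr = cb;
}

/* Invokes the registered callback; returns -1 if none was registered. */
int lib_do_it(char arg1, char arg2)
{
    if (lib_do_it_ptr == NULL)
        return -1;
    return lib_do_it_ptr(arg1, arg2);
}

/* ---- application side ---- */

/* The function the library is allowed to call back into. */
int app_do_it(char arg1, char arg2)
{
    return arg1 + arg2;
}
```

The application calls lib_set_do_it(app_do_it) once during start-up, for example right after dlopen and dlsym have resolved lib_set_do_it. Unlike the -rdynamic approach, the library then depends only on the function pointer type, not on any symbol name in the main program.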

Resources