Library interpositioning - c

I have been trying to intercept calls to malloc and free, following our textbook (the CSAPP book).
I followed their exact code, and nearly identical code that I found online, and I keep getting a segmentation fault. I heard our professor say that printf itself mallocs and frees memory, so I think this happens because I am intercepting malloc and, since I call printf inside the intercepting function, it ends up calling itself recursively.
However, I can't seem to find a solution to this problem. Our professor demonstrated that intercepting works (he didn't show us the code) and prints information every time a malloc occurs, so I know it's possible.
Can anyone suggest a working method?
Here is the code that I used and get nothing:
mymalloc.c
#ifdef RUNTIME
// Run-time interposition of malloc and free based on
// dynamic linker's (ld-linux.so) LD_PRELOAD mechanism
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <dlfcn.h>
void *malloc(size_t size) {
    static void *(*mallocp)(size_t size) = NULL;
    char *error;
    void *ptr;
    // get address of libc malloc
    if (!mallocp) {
        mallocp = dlsym(RTLD_NEXT, "malloc");
        if ((error = dlerror()) != NULL) {
            fputs(error, stderr);
            exit(EXIT_FAILURE);
        }
    }
    ptr = mallocp(size);
    printf("malloc(%d) = %p\n", (int)size, ptr);
    return ptr;
}
#endif
test.c
#include <stdio.h>
#include <stdlib.h>
int main(){
printf("main\n");
int* a = malloc(sizeof(int)*5);
a[0] = 1;
printf("end\n");
}
The result I'm getting:
$ gcc -o test test.c
$ gcc -DRUNTIME -shared -fPIC mymalloc.c -o mymalloc.so
$ LD_PRELOAD=./mymalloc.so ./test
Segmentation Fault
This is the code that I tried and got segmentation fault (from https://gist.github.com/iamben/4124829):
libint.c
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <dlfcn.h>
void* malloc(size_t size)
{
static void* (*rmalloc)(size_t) = NULL;
void* p = NULL;
// resolve next malloc
if(!rmalloc) rmalloc = dlsym(RTLD_NEXT, "malloc");
// do actual malloc
p = rmalloc(size);
// show statistic
fprintf(stderr, "[MEM | malloc] Allocated: %lu bytes\n", size);
return p;
}
str.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define STR_LEN 128
int main(int argc, const char *argv[])
{
char *c;
char *str1 = "Hello ";
char *str2 = "World";
//allocate an empty string
c = malloc(STR_LEN * sizeof(char));
c[0] = 0x0;
//and concatenate str{1,2}
strcat(c, str1);
strcat(c, str2);
printf("New str: %s\n", c);
return 0;
}
The makefile from the git repo didn't work so I manually compiled the files and got:
$ gcc -shared -fPIC libint.c -o libint.so
$ gcc -o str str.c
$ LD_PRELOAD=./libint.so ./str
Segmentation fault
I have been doing this for hours and I still get the same incorrect result, despite the fact that I copied textbook code. I would really appreciate any help!!

One way to deal with this is to turn off the printf when your routine is called recursively:
static char ACallIsInProgress = 0;
if (!ACallIsInProgress)
{
ACallIsInProgress = 1;
printf("malloc(%d) = %p\n", (int)size, ptr);
ACallIsInProgress = 0;
}
return ptr;
With this, if printf calls malloc, your routine will merely call the actual malloc (via mallocp) and return without causing another printf. You will miss printing information about a call to malloc that the printf does, but that is generally tolerable when interposing is being used to study the general program, not the C library.
If you need to support multithreading, some additional work is needed: a single shared flag is not thread-safe, so at a minimum it should be made per-thread.
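For the multithreaded case, one possible approach (my assumption, not something taken from the textbook) is to keep the flag in thread-local storage, so that one thread's printf in progress neither races with nor silences another thread. A minimal sketch, where hook_enter and hook_leave are hypothetical helper names:

```c
#include <stdbool.h>

/* One flag per thread (C11 _Thread_local), so threads do not race on it. */
static _Thread_local bool call_in_progress = false;

/* Returns true if the caller may log, or false if this thread is already
   inside the logging code (that is, the current call is a recursive one). */
bool hook_enter(void)
{
    if (call_in_progress)
        return false;
    call_in_progress = true;
    return true;
}

/* Marks the end of the logging section for the current thread. */
void hook_leave(void)
{
    call_in_progress = false;
}
```

Inside the interposed malloc, the printf would then be wrapped as: if (hook_enter()) { printf(...); hook_leave(); }. Whether thread-local storage is itself safe to touch that early depends on the platform, so treat this as a starting point only.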
The printf implementation might allocate a buffer only once, the first time it is used. In that case, you can initialize a flag that turns off the printf similar to the above, call printf once in the main routine (maybe be sure it includes a nice formatting task that causes printf to allocate a buffer, not a plain string), and then set the flag to turn on the printf call and leave it set for the rest of the program.
Another option is for your malloc routine not to use printf at all but to cache data in a buffer to be written later by some other routine or to write raw data to a file using write, with that data interpreted and formatted by a separate program later. Or the raw data could be written by a pipe to a program that formats and prints it and that is not using your interposed malloc.
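The write-based option could look roughly like the sketch below. It formats the record by hand into a stack buffer and emits it with write, so no stdio function (and therefore no hidden malloc) is involved. The names format_malloc_record and log_malloc are made up for this illustration, and the output goes to stderr as an assumption; you would call log_malloc(size, ptr) from the interposed malloc in place of printf.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

/* Copy a NUL-terminated string into buf at pos, never writing past cap. */
static size_t put_str(char *buf, size_t pos, size_t cap, const char *s)
{
    size_t len = strlen(s);
    if (len > cap - pos)
        len = cap - pos;
    memcpy(buf + pos, s, len);
    return pos + len;
}

/* Append v in the given base (10 or 16) using only a small stack buffer. */
static size_t put_num(char *buf, size_t pos, size_t cap, uintmax_t v, unsigned base)
{
    char tmp[sizeof(uintmax_t) * 3];   /* enough digits for any base >= 10 */
    size_t n = 0;
    do {
        tmp[n++] = "0123456789abcdef"[v % base];
        v /= base;
    } while (v != 0);
    while (n > 0 && pos < cap)
        buf[pos++] = tmp[--n];
    return pos;
}

/* Build "malloc(SIZE) = 0xADDR\n" in buf; returns the number of bytes used. */
size_t format_malloc_record(char *buf, size_t cap, size_t size, const void *ptr)
{
    size_t pos = 0;
    pos = put_str(buf, pos, cap, "malloc(");
    pos = put_num(buf, pos, cap, size, 10);
    pos = put_str(buf, pos, cap, ") = 0x");
    pos = put_num(buf, pos, cap, (uintptr_t)ptr, 16);
    pos = put_str(buf, pos, cap, "\n");
    return pos;
}

/* Emit one record on stderr with write(2), bypassing stdio entirely. */
void log_malloc(size_t size, const void *ptr)
{
    char line[64];
    size_t len = format_malloc_record(line, sizeof line, size, ptr);
    ssize_t rc = write(STDERR_FILENO, line, len);
    (void)rc;   /* nothing sensible to do if logging itself fails */
}
```

Because nothing here allocates, the recursion problem does not arise in the first place, at the cost of doing the formatting yourself.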

Related

Why replacing malloc causes segment error (__GI__IO_file_overflow)

I tried to mimic the way jemalloc replaces ptmalloc by replacing malloc myself, and the replacement immediately resulted in a segmentation fault.
code1.c:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <unistd.h>
int main(int argc, char *argv[])
{
void *ptr = malloc(10);
printf("%p\n", ptr);
return EXIT_SUCCESS;
}
code2.c:
#include <stdlib.h>
#include <stdint.h>
void *malloc(size_t size)
{
return (void *)10;
}
Compile instructions
gcc -c code2.c
ar r libcode2.a code2.o
gcc code1.c -L. -lcode2 -g
gdb
Breakpoint 1, main (argc=1, argv=0x7fffffffe318) at code1.c
17 void *ptr = malloc(10);
(gdb) s
18 printf("%p\n", ptr);
(gdb) p ptr
$1 = (void *) 0xa
(gdb) n
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7e46658 in __GI__IO_file_overflow () from /lib64/libc.so.6
"the replacement resulted in a direct segment error"
If you replace malloc with a non-functional variant, you had better not call any libc functions that may use malloc in their implementation.
Here you called printf, which itself uses malloc internally; the crash in __GI__IO_file_overflow is most likely stdio trying to use the bogus pointer as its output buffer. Use the GDB where command to see where the crash happened.
I am actually surprised your program made it as far as reaching main(). I expected it to crash much earlier (thousands of instructions are executed long before main is reached).
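For contrast, here is about the smallest replacement that is still functional: a bump allocator that hands out chunks of a static arena. This is only a sketch under my own assumptions (the arena size and the names bump_malloc and bump_free are invented here). A real replacement would be named malloc and free, and would normally have to provide calloc and realloc as well.

```c
#include <stddef.h>
#include <stdalign.h>

#define ARENA_SIZE (1u << 16)   /* 64 KiB, an arbitrary size for the sketch */

static alignas(max_align_t) unsigned char arena[ARENA_SIZE];
static size_t arena_used = 0;

/* Hand out successive aligned chunks of a static arena; NULL when it is full. */
void *bump_malloc(size_t size)
{
    const size_t align = alignof(max_align_t);
    if (size == 0)
        size = 1;
    if (size > ARENA_SIZE)      /* also avoids overflow when rounding below */
        return NULL;
    size_t rounded = (size + align - 1) / align * align;
    if (rounded > ARENA_SIZE - arena_used)
        return NULL;
    void *p = arena + arena_used;
    arena_used += rounded;
    return p;
}

/* A bump allocator never reuses memory, so freeing is a no-op. */
void bump_free(void *ptr)
{
    (void)ptr;
}
```

The point is simply that whatever malloc returns must be real, writable, suitably aligned memory, because libc will use it.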

Variable is unexpectedly overwritten in for loop with crypt()

I'm trying to build a C program that will bruteforce a hash given in argument. Here is the code:
#include <unistd.h>
#include <stdio.h>
#include <crypt.h>
#include <string.h>
const char setting[] = "$6$QSX8hjVa$";
const char values[] = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
int main(int argc, char *argv[])
{
char *hashToCrack = crypt(argv[1], setting);
printf("%s\n", hashToCrack);
for (int i = 0; i < strlen(values); i++)
{
printf("trying %c ...\n", values[i]);
char *try = crypt(&values[i], setting);
if (strcmp(hashToCrack, try) == 0)
{
printf("calc: %s\n", try);
printf("init: %s\n", hashToCrack);
printf("Found!\n");
}
}
return 0;
}
For convenience, I just give in argument a string that will be the one to crack. It is encrypted at the beginning of the main function (stored in hashToCrack). For now, I just work with one char. I compile the program this way: gcc main.c -o main -lcrypt -Wall.
The problem - When I launch this program, I have "Found!" in every iteration in the for loop. It seems that hashToCrack and try are the same. However, I never overwrite hashToCrack, so it should never change.
There is probably something I don't understand with pointers, but I can't find it.
Any idea ? :D
The crypt function returns a pointer to a static data buffer. So when you call it again, the string pointed to by hashToCrack changes.
You need to copy the results of the first call to crypt into a separate buffer.
char *hashToCrack = strdup(crypt(argv[1], setting));
Don't forget to call free on this buffer when you're done with it.
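To see the effect in isolation, here is a small sketch that does not need libcrypt. fake_hash is a hypothetical stand-in for crypt (it just reuses one static buffer, which is the behaviour that matters here), and copy_string does what strdup does, spelled out with malloc and memcpy.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for crypt(): every call reuses one static buffer. */
const char *fake_hash(const char *key)
{
    static char result[32];
    snprintf(result, sizeof result, "hash-of-%s", key);
    return result;
}

/* Heap copy of a string (what strdup does); the caller must free() it. */
char *copy_string(const char *s)
{
    size_t len = strlen(s) + 1;
    char *copy = malloc(len);
    if (copy != NULL)
        memcpy(copy, s, len);
    return copy;
}
```

If you keep the pointer from the first fake_hash call and then call fake_hash again, both pointers are equal and both "strings" show the second result, so strcmp always reports a match. A copy made with copy_string before the second call keeps the first result.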

mmap load shared object and get function pointer

I want to dynamically load a library without using functions from dlfcn.h. I have a folder full of .so files compiled with:
gcc -Wall -shared -fPIC -o filename.so filename.c
And all of them have an entry function named:
void __start(int size, char** cmd);
(I know, probably not the best name)
Then I call open on the .so, read the ELF header as an Elf64_Ehdr, load the file into memory with mmap (I know I should use mprotect, but I want to make it work first and add the security later), and finally add the ELF header's entry address to the pointer returned by mmap (plus an offset of 0x100).
The test code is the following:
#include <elf.h>
#include <stdio.h>
#include <unistd.h>
#include <string.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/mman.h>
typedef void (*ptr_t) (int, char**);
void* alloc_ex_memory(int fd) {
struct stat s;
fstat(fd, &s);
void * ptr = mmap(0, s.st_size, PROT_READ | PROT_WRITE | PROT_EXEC,
MAP_PRIVATE | MAP_ANONYMOUS, fd, 0);
if (ptr == (void *)-1) {
perror("mmap");
return NULL;
}
return ptr;
}
void* load_ex_file(const char* elfFile) {
Elf64_Ehdr header;
void * ptr;
FILE* file = fopen(elfFile, "rb");
if(file) {
fread(&header, 1, sizeof(header), file);
if (memcmp(header.e_ident, ELFMAG, SELFMAG) == 0) {
ptr = alloc_ex_memory(fileno(file));
printf("PTR AT -> %p\n", ptr);
printf("Entry at -> %lx\n", header.e_entry+256);
ptr = (header.e_entry + 256);
} else {
ptr = NULL;
}
fclose(file);
return ptr;
}
return NULL;
}
int main(int argc, char** argv) {
ptr_t func = load_ex_file(argv[1]);
printf("POINTER AT: %p\n", func);
getchar();
func(argc, argv);
return 0;
}
Let me explain a bit further:
When I run objdump -d filename.so, I see that the _start is at 0x6c0. The ELF header entry point is 0x5c0 (I compensate by adding 256 in decimal).
Also, pmap shows an executable area being created at, let's say, 0x7fdf94d0c000,
so the function pointer that I get and call is 0x7fdf94d0c6c0. This should be the entry point and callable via a function pointer. But what I get is, you guessed it: a segfault.
The last thing I would like to point out is that I have the same example running with dlopen, dlsym, and dlclose, but it's required that I use the mmap trick. I have seen my professor implement it successfully, but I cannot figure out how (perhaps there is a simpler way that I'm missing).
I based the code on this jit example
Thank you in advance!
Edit: This is the code of the .c that i compile into .so:
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
void __start(int size, char** cmd) {
if (size==2) {
if (!strcmp(cmd[1], "-l")) {
printf("******.********.*****#udc.es\n");
return;
} else if (!strcmp(cmd[1], "-n")) {
printf("****** ******** *****\n");
return;
}
} else if (size==1) {
printf("****** ******** ***** (******.********.*****#udc.es)\n");
return;
}
printf("Wrong command syntax: autores [-l | -n]\n");
}
"its required that i use the mmap trick. I have seen my professor implement it successfully but i cannot figure out how."
For this to work, your __start must be completely stand-alone, and not call any other library (you violated that requirement by calling strcmp and printf).
The moment you call an unresolved symbol, you ask the dynamic loader to resolve that symbol, and the loader needs all kinds of info for that. That info is normally set up during dlopen call. Since you bypassed dlopen, it is not at all surprising that your program crashes.
Your first step should be to change your code such that __start doesn't do anything at all (is empty), and verify that you can then call __start. That would confirm that you are computing its address correctly.
Second, add a crash to that code:
void __start(int size, char **cmd)
{
int *p = NULL;
*p = 42; // should crash here
}
and verify that you are observing the crash now. That will confirm that your code is getting called.
Third, remove the crashing code and use only direct system calls to implement whatever you need, e.g. write(1, "Hello\n", 6). But don't call the libc write function: you must implement the system call directly in your library.
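As an illustration of that third step, a direct write system call might be sketched as follows. This assumes x86-64 Linux and GCC-style inline assembly; other platforms use different instructions and syscall numbers, so the fallback branch just reports failure. raw_write and standalone_entry are names invented for this sketch, with standalone_entry standing in for your __start.

```c
#include <stddef.h>

/* Issue write(2) directly with the x86-64 Linux syscall instruction.
   Returns the byte count, or a negative errno value on failure. */
long raw_write(int fd, const void *buf, size_t len)
{
#if defined(__linux__) && defined(__x86_64__)
    long ret;
    __asm__ volatile ("syscall"
                      : "=a"(ret)
                      : "a"(1L),          /* syscall number of write on x86-64 Linux */
                        "D"((long)fd), "S"(buf), "d"(len)
                      : "rcx", "r11", "memory");
    return ret;
#else
    (void)fd; (void)buf; (void)len;
    return -1;   /* other platforms need their own syscall sequence */
#endif
}

/* Hypothetical stand-in for the library's entry function: no libc calls. */
void standalone_entry(int size, char **cmd)
{
    static const char msg[] = "Hello\n";
    (void)size;
    (void)cmd;
    raw_write(1, msg, sizeof msg - 1);
}
```

Note that strcmp and printf in the original library would have to be replaced by hand-written equivalents built on top of something like this.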

Overriding malloc using the LD_PRELOAD and calls malloc in library functions

I'm creating a small, fast app that detects memory leaks in other apps. I use LD_PRELOAD to override the default malloc and record every call to malloc made by the program. The problem is that some C library functions use malloc too. Moreover, there are library functions that don't free the memory they allocate. Let me demonstrate:
MyApp.cpp
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <dlfcn.h>
static void * (* LT_MALLOC)(size_t) = 0;
static void (* LT_FREE)(void *) = 0;
static void init_malloc ()
{
char *error;
*(void **) (&LT_MALLOC) = dlsym (RTLD_NEXT, "malloc");
dlerror ();
if ((error = dlerror ()) != NULL)
{
fprintf (stderr, "%s\n", error);
_exit(1);
}
}
static void init_free ()
{
char *error;
*(void **) (&LT_FREE) = dlsym (RTLD_NEXT, "free");
dlerror ();
if ((error = dlerror ()) != NULL)
{
fprintf (stderr, "%s\n", error);
_exit(1);
}
}
extern "C"
void * malloc (size_t size) throw ()
{
if (LT_MALLOC == 0)
init_malloc ();
printf ("malloc(%ld) ", size);
void *p = LT_MALLOC(size);
printf ("p = %p\n", p);
return p;
}
extern "C"
void free (void *p) throw ()
{
if (LT_FREE == 0)
init_free ();
printf ("free(%p)\n", p);
LT_FREE(p);
}
And test.c. (Assume that a source code is not available initially, and I have a program only.)
#include <stdlib.h>
#include <stdio.h>
#include <time.h>
int main (void)
{
printf ("test start\n");
int *a = (int *)malloc(3 * sizeof (int));
if (a)
printf ("Allocated 3 int - %p\n", a);
time_t t = time(NULL);
a[0] = 1;
printf ("time - %s", ctime(&t));
printf ("a[0] = %d\n", a[0]);
free (a);
printf ("CALL FREE(a)\n");
printf ("test done\n");
return 0;
}
$ g++ -g -fPIC -c MyApp.cpp -o MyApp.o
$ g++ -g -shared MyApp.o -o MyApp.so -ldl
$ gcc -g test.c -o test
$ export LD_PRELOAD=./MyApp.so
Run the program ./test and see:
malloc(72704) p = 0x8aa040
test start
malloc(12) p = 0x6a0c50
Allocated 3 int - 0x6a0c50
free((nil))
malloc(15) p = 0x6a0c70
malloc(552) p = 0x6a0c90
free((nil))
malloc(1014) p = 0x6a0ec0
free(0x6a0c90)
malloc(20) p = 0x6a0c90
malloc(20) p = 0x6a0cb0
malloc(20) p = 0x6a0cd0
malloc(21) p = 0x6a0cf0
malloc(20) p = 0x6a0d10
malloc(20) p = 0x6a0d30
malloc(20) p = 0x6a0d50
malloc(20) p = 0x6a0d70
malloc(21) p = 0x6a0d90
free((nil))
time - Mon Mar 14 18:30:14 2016
a[0] = 1
free(0x6a0c50)
CALL FREE(a)
test done
But I want to see:
test start
malloc(12) p = 0x6a0c50
Allocated 3 int - 0x6a0c50
time - Mon Mar 14 18:30:14 2016
a[0] = 1
free(0x6a0c50)
CALL FREE(a)
test done
I want my app to ignore calls to malloc made inside library functions, but I have no idea how to do it.
Can malloc find out what called it: a library function or my own code? Or can we make library functions call the default, non-overridden malloc? Or something else?
P.S. I am sorry for my bad English.
I have MANY times used the technique of compiling my code with a macro that replaces malloc with my own version. I thought I'd written an answer to demonstrate that, but apparently not. Something like this:
In "debugmalloc.h" or some such:
#if DEBUG_MALLOC
#define malloc(x) myTrackingMalloc(x, __FILE__, __LINE__)
#define free(x) myTrackingFree(x, __FILE__, __LINE__)
extern void* myTrackingMalloc(size_t size, const char* file, int line);
extern void myTrackingFree(void* ptr, const char* file, int line);
#endif
Then in a source file, e.g "debugmalloc.c":
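#if DEBUG_MALLOC
#include <stdlib.h>
/* Do not include "debugmalloc.h" in this file: its macros would turn the
   calls below into recursive calls to the tracking functions. */
void* myTrackingMalloc(size_t size, const char* file, int line)
{
    void *p = malloc(size);
    /* ... whatever extra stuff you need ... */
    return p;
}
void myTrackingFree(void* ptr, const char* file, int line)
{
    /* ... some extra code here ... */
    free(ptr);
}
#endif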
[I allocated some extra bytes by increasing size, kept my bookkeeping data in them, and then used an offset to return the actual payload pointer]
This is relatively easy to implement, and doesn't have the drawbacks of injecting using LD_PRELOAD. In particular, you don't need to distinguish between your code and libraries. And of course, you don't need to implement your own malloc.
It does have the slight drawback that it doesn't cover new, delete, strdup, or other library functions that themselves allocate memory. You can implement a global operator new and delete set [don't forget the array versions!], but replacing strdup and any other function that may allocate memory is quite a bit of work, and at that point valgrind is probably going to be quicker, even if it runs quite slowly.
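The "extra bytes plus offset" idea mentioned in brackets above could be sketched like this. It is not the code I actually used, just a minimal reconstruction under assumptions: the names tracked_malloc, tracked_free and tracked_outstanding are invented, and the only bookkeeping kept is the requested size.

```c
#include <stddef.h>
#include <stdlib.h>

/* Header stored in front of every payload; the union pads it so that the
   payload that follows stays suitably aligned for any type. */
typedef union {
    size_t size;
    max_align_t align;
} alloc_header;

static size_t bytes_outstanding = 0;

void *tracked_malloc(size_t size)
{
    if (size > (size_t)-1 - sizeof(alloc_header))
        return NULL;                        /* size + header would overflow */
    alloc_header *h = malloc(sizeof(alloc_header) + size);
    if (h == NULL)
        return NULL;
    h->size = size;
    bytes_outstanding += size;
    return h + 1;                           /* payload starts after the header */
}

void tracked_free(void *ptr)
{
    if (ptr == NULL)
        return;
    alloc_header *h = (alloc_header *)ptr - 1;   /* step back to the header */
    bytes_outstanding -= h->size;
    free(h);
}

/* Bytes allocated through tracked_malloc and not yet freed. */
size_t tracked_outstanding(void)
{
    return bytes_outstanding;
}
```

A non-zero tracked_outstanding() at exit points at a leak in code that went through the macros. Pointers from tracked_malloc must only ever be released with tracked_free, since plain free would be handed an address in the middle of the real block.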
Maybe the Dynamic Loader API can help you. There you find the function dladdr() which returns the module and the name of the symbol for a given address, if available.
Please see the man page of dlopen for a detailed explanation.
In the code posted, you do not track calloc, realloc and strdup. It would be more consistent to track all malloc related allocation functions.
You could also track fopen and fclose to disable malloc and free tracking during the execution of those. This would also allow you to track missing fclose() calls.
You can do the same for other C library functions that use malloc. printf itself seldom needs malloc for simple formats (although stdio may allocate a stream buffer the first time a stream is used), but you could track vfprintf to filter out any such calls. It is difficult to wrap variadic functions, but the printf family all call vfprintf or some close cousin, depending on your C library's implementation. Be careful to disable vfprintf tracking when you call printf yourself!

Find program's code address at runtime?

When I use gdb to debug a program written in C, the disassemble command shows the code and its addresses in the code segment. Is it possible to know those memory addresses at runtime? I am using Ubuntu. Thank you.
[edit] To be more specific, I will demonstrate it with following example.
#include <stdio.h>
int main(int argc,char *argv[]){
myfunction();
exit(0);
}
Now I would like to get the address of myfunction() in the code segment when I run my program.
Above answer is vastly overcomplicated. If the function reference is static, as it is above, the address is simply the value of the symbol name in pointer context:
void* myfunction_address = myfunction;
If you are grabbing the function dynamically out of a shared library, then the value returned from dlsym() (POSIX) or GetProcAddress() (Windows) is likewise the address of the function.
Note that the above code is likely to generate a warning with some compilers, as ISO C technically forbids assignment between code and data pointers (some architectures put them in physically distinct address spaces).
And some pedants will point out that the address returned isn't really guaranteed to be the memory address of the function, it's just a unique value that can be compared for equality with other function pointers and acts, when called, to transfer control to the function whose pointer it holds. Obviously all known compilers implement this with a branch target address.
And finally, note that the "address" of a function is a little ambiguous. If the function was loaded dynamically or is an extern reference to an exported symbol, what you really get is generally a pointer to some fixup code in the "PLT" (a Unix/ELF term, though the PE/COFF mechanism on Windows is similar) that then jumps to the function.
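If the warning about mixing code and data pointers bothers you, one way around it is to convert the function pointer to an integer instead of to void*. A small sketch follows; function_address and print_function_address are names made up here, and the integer you get is implementation-defined, although on common flat-memory platforms it is the code address.

```c
#include <stdint.h>
#include <stdio.h>

/* Placeholder for the function whose address we want. */
void myfunction(void)
{
}

/* Convert a function pointer to an integer; the result is
   implementation-defined, but is the code address on common platforms. */
uintptr_t function_address(void (*fn)(void))
{
    return (uintptr_t)fn;
}

/* Print the address without passing a function pointer to %p. */
void print_function_address(const char *name, void (*fn)(void))
{
    printf("%s is at 0x%jx\n", name, (uintmax_t)function_address(fn));
}
```

Calling print_function_address("myfunction", myfunction) from main prints a value you can compare against what gdb's disassemble shows for the running process.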
If you know the function name before program runs, simply use
void * addr = myfunction;
If the function name is given at run-time, I once wrote a function to find the symbol address dynamically using the bfd library. Here is the x86_64 code; you can get the address via find_symbol("a.out", "myfunction") in the example.
/* Some binutils versions refuse to include bfd.h unless PACKAGE is defined. */
#include <bfd.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
/* Returns the value of symname in filename, or -1 on failure. */
long find_symbol(char *filename, char *symname)
{
    bfd *ibfd;
    asymbol **symtab;
    long nsize, nsyms, i;
    long result = -1;
    symbol_info syminfo;
    char **matching;
    bfd_init();
    ibfd = bfd_openr(filename, NULL);
    if (ibfd == NULL) {
        printf("bfd_openr error\n");
        return -1;
    }
    if (!bfd_check_format_matches(ibfd, bfd_object, &matching)) {
        printf("format_matches\n");
        bfd_close(ibfd);
        return -1;
    }
    nsize = bfd_get_symtab_upper_bound(ibfd);
    if (nsize <= 0 || (symtab = malloc(nsize)) == NULL) {
        bfd_close(ibfd);
        return -1;
    }
    nsyms = bfd_canonicalize_symtab(ibfd, symtab);
    for (i = 0; i < nsyms; i++) {
        if (strcmp(symtab[i]->name, symname) == 0) {
            bfd_symbol_info(symtab[i], &syminfo);
            result = (long) syminfo.value;
            break;
        }
    }
    if (result == -1)
        printf("cannot find symbol\n");
    free(symtab);
    bfd_close(ibfd);
    return result;
}
To get a backtrace, use execinfo.h as documented in the GNU libc manual.
For example:
#include <execinfo.h>
#include <stdio.h>
#include <unistd.h>
void trace_pom()
{
const int sz = 15;
void *buf[sz];
// get at most sz entries
int n = backtrace(buf, sz);
// output them right to stderr
backtrace_symbols_fd(buf, n, fileno(stderr));
// but if you want to output the strings yourself
// you may use char ** backtrace_symbols (void *const *buffer, int size)
write(fileno(stderr), "\n", 1);
}
void TransferFunds(int n);
void DepositMoney(int n)
{
if (n <= 0)
trace_pom();
else TransferFunds(n-1);
}
void TransferFunds(int n)
{
DepositMoney(n);
}
int main()
{
DepositMoney(3);
return 0;
}
Compiled with:
gcc a.c -o a -g -Wall -Werror -rdynamic
According to the mentioned website:
Currently, the function name and offset can only be obtained on systems that use the ELF binary format for programs and libraries. On other systems, only the hexadecimal return address will be present. Also, you may need to pass additional flags to the linker to make the function names available to the program. (For example, on systems using GNU ld, you must pass -rdynamic.)
Output
./a(trace_pom+0xc9)[0x80487fd]
./a(DepositMoney+0x11)[0x8048862]
./a(TransferFunds+0x11)[0x8048885]
./a(DepositMoney+0x21)[0x8048872]
./a(TransferFunds+0x11)[0x8048885]
./a(DepositMoney+0x21)[0x8048872]
./a(TransferFunds+0x11)[0x8048885]
./a(DepositMoney+0x21)[0x8048872]
./a(main+0x1d)[0x80488a4]
/lib/i686/cmov/libc.so.6(__libc_start_main+0xe5)[0xb7e16775]
./a[0x80486a1]
Regarding a comment on one of the answers (getting the address of an instruction), you can use this very ugly trick:
#include <setjmp.h>
void function() {
printf("in function\n");
printf("%d\n",__LINE__);
printf("exiting function\n");
}
int main() {
jmp_buf env;
int i;
printf("in main\n");
printf("%d\n",__LINE__);
printf("calling function\n");
setjmp(env);
for (i=0; i < 18; ++i) {
printf("%p\n",env[i]);
}
function();
printf("in main again\n");
printf("%d\n",__LINE__);
}
It should be env[12] (the eip), but be careful as it looks machine dependent, so triple check my word. This is the output
in main
13
calling function
0xbfff037f
0x0
0x1f80
0x1dcb
0x4
0x8fe2f50c
0x0
0x0
0xbffff2a8
0xbffff240
0x1f
0x292
0x1e09
0x17
0x8fe0001f
0x1f
0x0
0x37
in function
4
exiting function
in main again
37
have fun!
