Segmentation fault after defining my own malloc? - c

Why does this result in a segmentation fault?
#include <stddef.h>
#include <stdio.h>
void *malloc(size_t s) {
}
int main() {
printf("lol");
}
This is how I compile:
gcc -o l lol.c
My guess is that printf() calls malloc().

Per the C language specification, providing your own definitions of standard library functions as functions with external linkage (which is the default for functions) produces undefined behavior. Those names are reserved for such use (C17 7.1.3). You have observed one of many possible manifestations of such behavior.
You have at least four alternatives:
Just use the standard library's implementation.
Define your function with a different name. For example, my_malloc(). You will then need to use that name to call it, though you could disguise that by use of a macro.
Declare your function static (giving it internal linkage). Then it can have the same name as a standard library function, but only functions defined in the same translation unit (roughly: source file) will be able to call it via that name.
Engage implementation-specific provisions of your particular C implementation (see next).
Some C implementations make implementation-specific provision for programs to provide their own versions of at least some library functions. Glibc is one of these. However, such provisions are subject to significant limits.
First and foremost, you can expect that the implementation will require your replacement functions to provide the same binary interface and to correctly implement the behavior specified by the language. (Your function does not do the latter.)
Second, where the function is part of a set of related ones, as malloc is, you may find that the implementation requires you to replace the whole set. Indeed, Glibc docs say that "replacing malloc" involves providing replacements for all these functions: malloc, free, calloc, realloc. Your program does not do this, either. The Glibc docs recommend providing replacements for several other functions as well, with the suggestion that failure to do so, while not in itself compromising any Glibc functions, is likely to break some programs: aligned_alloc, cfree,* malloc_usable_size, memalign, posix_memalign, pvalloc, valloc. These latter are not relevant to your particular example, however.
*Required only by very old programs.
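As a rough illustration of the second alternative, here is a minimal sketch of a differently named allocator. The names my_malloc, my_free, ALLOC, RELEASE and the fixed-size arena are hypothetical and not taken from the question. The macros get their own names because, once <stdlib.h> is included, the library function names are also reserved as macro names.

```c
#include <stdalign.h>
#include <stddef.h>

/* Hypothetical bump allocator: hands out slices of a fixed static arena. */
#define ARENA_SIZE 4096u

static alignas(max_align_t) unsigned char arena[ARENA_SIZE];
static size_t arena_used;

void *my_malloc(size_t size)
{
    const size_t align = alignof(max_align_t);

    if (size == 0 || size > ARENA_SIZE)
        return NULL;

    /* Round up so every returned pointer is suitably aligned. */
    size_t rounded = (size + align - 1) / align * align;
    if (rounded > ARENA_SIZE - arena_used)
        return NULL;

    void *p = &arena[arena_used];
    arena_used += rounded;
    return p;
}

/* A bump allocator never reuses memory, so releasing is a no-op. */
void my_free(void *ptr)
{
    (void)ptr;
}

/* Call sites use these macros instead of the reserved library names. */
#define ALLOC(n)   my_malloc(n)
#define RELEASE(p) my_free(p)
```

Because nothing here is named malloc, the C library keeps using its own allocator and printf() is unaffected.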

Something in the standard library is calling malloc(), expecting it to return a usable memory address, and writing something to that address.
On Unixy (or at least Linuxy) platforms, when you define a library function in the main program, it overrides the one in any other library, even when a library calls it, even when the same library that defines it calls it.

To verify that your malloc() overrides the one in the C library and is called by printf(), you can print a trace to the terminal (file descriptor 1) with the write() system call:
#include <stdio.h>
#include <unistd.h>
void *malloc(size_t s) {
write(1, "I am called\n", sizeof("I am called\n") - 1);
};
int main() {
printf("lol");
}
After compilation, the program displays:
$ gcc -g overm.c -o overm
$ ./overm
I am called
Segmentation fault (core dumped)
The analysis of the generated core dump file with a debugger like gdb shows the state of the call stack at the time of the crash:
$ gdb overm core
[...]
Reading symbols from overm...
[New LWP 8494]
Core was generated by `./overm'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x00007fd4bf7d6dc8 in _IO_new_file_overflow (f=0x7fd4bf9336a0 <_IO_2_1_stdout_>, ch=108) at fileops.c:781
781 fileops.c: No such file or directory.
(gdb) where
#0 0x00007fd4bf7d6dc8 in _IO_new_file_overflow (f=0x7fd4bf9336a0 <_IO_2_1_stdout_>, ch=108) at fileops.c:781
#1 0x00007fd4bf7d8024 in __GI__IO_default_xsputn (n=<optimized out>, data=<optimized out>, f=<optimized out>) at libioP.h:948
#2 __GI__IO_default_xsputn (f=f@entry=0x7fd4bf9336a0 <_IO_2_1_stdout_>, data=<optimized out>, n=n@entry=3) at genops.c:370
#3 0x00007fd4bf7d56fa in _IO_new_file_xsputn (n=3, data=<optimized out>, f=<optimized out>) at fileops.c:1265
#4 _IO_new_file_xsputn (f=0x7fd4bf9336a0 <_IO_2_1_stdout_>, data=<optimized out>, n=3) at fileops.c:1197
#5 0x00007fd4bf7bc972 in __vfprintf_internal (s=0x7fd4bf9336a0 <_IO_2_1_stdout_>, format=0x5556655db011 "lol", ap=ap@entry=0x7fff657394a0,
mode_flags=mode_flags@entry=0) at ../libio/libioP.h:948
#6 0x00007fd4bf7a7d3f in __printf (format=<optimized out>) at printf.c:33
#7 0x00005556655da1ab in main () at overm.c:8
The backtrace shows that the crash occurred inside the printf() call made at line 8 of overm.c (frame #7); printf.c:33 in frame #6 is a line in glibc's own source, not in the program. Let's set a breakpoint on the printf() function and rerun the program from the debugger:
(gdb) br printf
Breakpoint 3 at 0x7ffff7e1ac90: file printf.c, line 28.
(gdb) run
Breakpoint 3, __printf (format=0x555555556011 "lol") at printf.c:28
28 printf.c: No such file or directory.
(gdb)
Once we are stopped at the printf() call, add another breakpoint on malloc() (we didn't do that earlier because internal code calls malloc() several times during program startup) and continue execution:
(gdb) br malloc
(gdb) continue
Continuing.
Breakpoint 4, malloc (s=140737354002065) at overm.c:4
4 void *malloc(size_t s) {
(gdb) where
#0 malloc (s=140737354002065) at overm.c:4
#1 0x00007ffff7e3ad04 in __GI__IO_file_doallocate (fp=0x7ffff7fa66a0 <_IO_2_1_stdout_>) at filedoalloc.c:101
#2 0x00007ffff7e4aed0 in __GI__IO_doallocbuf (fp=fp@entry=0x7ffff7fa66a0 <_IO_2_1_stdout_>) at libioP.h:948
#3 0x00007ffff7e49f30 in _IO_new_file_overflow (f=0x7ffff7fa66a0 <_IO_2_1_stdout_>, ch=-1) at fileops.c:745
#4 0x00007ffff7e486b5 in _IO_new_file_xsputn (n=3, data=<optimized out>, f=<optimized out>) at libioP.h:948
#5 _IO_new_file_xsputn (f=0x7ffff7fa66a0 <_IO_2_1_stdout_>, data=<optimized out>, n=3) at fileops.c:1197
#6 0x00007ffff7e2f972 in __vfprintf_internal (s=0x7ffff7fa66a0 <_IO_2_1_stdout_>, format=0x555555556011 "lol", ap=ap@entry=0x7fffffffdd90,
mode_flags=mode_flags@entry=0) at ../libio/libioP.h:948
#7 0x00007ffff7e1ad3f in __printf (format=<optimized out>) at printf.c:33
#8 0x00005555555551ab in main () at overm.c:8
(gdb)
The stack displayed when we are stopped in malloc() shows that the call originates from the printf() call at line 8 of overm.c (frame #8), by way of printf.c:33 inside glibc.

Related

"Segmentation fault (core dumped)" for: "No such file or directory" for libioP.h, printf-parse.h, vfprintf-internal.c, etc

Sample errors in the core dump files:
1289 vfprintf-internal.c: No such file or directory.
111 printf-parse.h: No such file or directory.
948 libioP.h: No such file or directory.
948 libioP.h: No such file or directory.
I'm working on a fast_malloc() implementation, but getting segmentation faults for unknown reasons once I override malloc() and free() with my own implementations, but NOT before that (meaning, if I call fast_malloc() it's fine, but if I want to be able to call malloc() to get my implementation, it seems to be broken).
Why the segfault?
Sample output, before ANYTHING can be printed, including the print statement at the start of main(), and some debug prints inside my fast_malloc():
Segmentation fault (core dumped)
I have turned on core dumps as I explain here.
So, gdb path/to/my/executable core shows some of the following core file info. Note that each run may result in a different statement for what file is missing in "No such file or directory."
One run:
Reading symbols from build/fast_malloc_unit_tests...
warning: core file may not match specified executable file.
[New LWP 1257155]
Core was generated by `build/fast_malloc_unit_tests'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x00007fd50fc7ba01 in __vfprintf_internal (s=0x7fd50fdee6a0 <_IO_2_1_stdout_>,
format=0x5622fd1b8010 "DEBUG: %s():\n", ap=ap@entry=0x7ffec28300a0,
mode_flags=mode_flags@entry=0) at vfprintf-internal.c:1289
1289 vfprintf-internal.c: No such file or directory.
(gdb) bt
#0 0x00007fd50fc7ba01 in __vfprintf_internal (s=0x7fd50fdee6a0 <_IO_2_1_stdout_>,
format=0x5622fd1b8010 "DEBUG: %s():\n", ap=ap@entry=0x7ffec28300a0,
mode_flags=mode_flags@entry=0) at vfprintf-internal.c:1289
#1 0x00007fd50fc66ebf in __printf (format=<optimized out>) at printf.c:33
#2 0x00005622fd1b53eb in fast_malloc (num_bytes=1024) at src/fast_malloc.c:225
#3 0x00005622fd1b5b66 in malloc (num_bytes=1024) at src/fast_malloc.c:496
#4 0x00007fd50fc86e84 in __GI__IO_file_doallocate (fp=0x7fd50fdee6a0 <_IO_2_1_stdout_>)
at filedoalloc.c:101
#5 0x00007fd50fc97050 in __GI__IO_doallocbuf (fp=fp@entry=0x7fd50fdee6a0 <_IO_2_1_stdout_>)
at libioP.h:948
#6 0x00007fd50fc960b0 in _IO_new_file_overflow (f=0x7fd50fdee6a0 <_IO_2_1_stdout_>, ch=-1)
at fileops.c:745
#7 0x00007fd50fc94835 in _IO_new_file_xsputn (n=7, data=<optimized out>, f=<optimized out>)
at libioP.h:948
#8 _IO_new_file_xsputn (f=0x7fd50fdee6a0 <_IO_2_1_stdout_>, data=<optimized out>, n=7)
at fileops.c:1197
#9 0x00007fd50fc7baf2 in __vfprintf_internal (s=0x7fd50fdee6a0 <_IO_2_1_stdout_>,
format=0x5622fd1b8010 "DEBUG: %s():\n", ap=ap@entry=0x7ffec28308e0,
mode_flags=mode_flags@entry=0) at ../libio/libioP.h:948
#10 0x00007fd50fc66ebf in __printf (format=<optimized out>) at printf.c:33
#11 0x00005622fd1b53eb in fast_malloc (num_bytes=1024) at src/fast_malloc.c:225
#12 0x00005622fd1b5b66 in malloc (num_bytes=1024) at src/fast_malloc.c:496
--Type <RET> for more, q to quit, c to continue without paging--q
Quit
(gdb) q
Another one:
Reading symbols from build/fast_malloc_unit_tests...
warning: core file may not match specified executable file.
[New LWP 1257787]
Core was generated by `build/fast_malloc_unit_tests'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x00007f20b0bbba80 in __find_specmb (
format=0x5644c516d108 "DEBUG: block_map_i = %zu (num_bytes requested to allocate = %zu; smallest user block size large enough = %zu)\n") at printf-parse.h:111
111 printf-parse.h: No such file or directory.
(gdb) bt
#0 0x00007f20b0bbba80 in __find_specmb (
format=0x5644c516d108 "DEBUG: block_map_i = %zu (num_bytes requested to allocate = %zu; smallest user block size large enough = %zu)\n") at printf-parse.h:111
#1 __vfprintf_internal (s=0x7f20b0d2e6a0 <_IO_2_1_stdout_>,
format=0x5644c516d108 "DEBUG: block_map_i = %zu (num_bytes requested to allocate = %zu; smallest user block size large enough = %zu)\n", ap=ap@entry=0x7ffe7f6ea580, mode_flags=mode_flags@entry=0)
at vfprintf-internal.c:1365
#2 0x00007f20b0ba6ebf in __printf (format=<optimized out>) at printf.c:33
#3 0x00005644c516a47d in fast_malloc (num_bytes=1024) at src/fast_malloc.c:244
#4 0x00005644c516ab4e in malloc (num_bytes=1024) at src/fast_malloc.c:496
#5 0x00007f20b0bc6e84 in __GI__IO_file_doallocate (fp=0x7f20b0d2e6a0 <_IO_2_1_stdout_>)
at filedoalloc.c:101
#6 0x00007f20b0bd7050 in __GI__IO_doallocbuf (fp=fp@entry=0x7f20b0d2e6a0 <_IO_2_1_stdout_>)
at libioP.h:948
#7 0x00007f20b0bd60b0 in _IO_new_file_overflow (f=0x7f20b0d2e6a0 <_IO_2_1_stdout_>, ch=-1)
at fileops.c:745
#8 0x00007f20b0bd4835 in _IO_new_file_xsputn (n=23, data=<optimized out>, f=<optimized out>)
at libioP.h:948
#9 _IO_new_file_xsputn (f=0x7f20b0d2e6a0 <_IO_2_1_stdout_>, data=<optimized out>, n=23)
at fileops.c:1197
#10 0x00007f20b0bbbaf2 in __vfprintf_internal (s=0x7f20b0d2e6a0 <_IO_2_1_stdout_>,
format=0x5644c516d108 "DEBUG: block_map_i = %zu (num_bytes requested to allocate = %zu; smallest--Type <RET> for more, q to quit, c to continue without paging--q
Quit
(gdb) q
another:
Reading symbols from build/fast_malloc_unit_tests...
warning: core file may not match specified executable file.
[New LWP 1258037]
Core was generated by `build/fast_malloc_unit_tests'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x00007f901ef65e4d in __GI__IO_file_doallocate (fp=0x7f901f0cd6a0 <_IO_2_1_stdout_>)
at libioP.h:948
948 libioP.h: No such file or directory.
(gdb) q
another
Reading symbols from build/fast_malloc_unit_tests...
warning: core file may not match specified executable file.
[New LWP 1258336]
Core was generated by `build/fast_malloc_unit_tests'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x00007f5e4b551a80 in __find_specmb (
format=0x562fac6d7108 "DEBUG: block_map_i = %zu (num_bytes requested to allocate = %zu; smallest user block size large enough = %zu)\n") at printf-parse.h:111
111 printf-parse.h: No such file or directory.
(gdb) q
My gcc build options at the moment:
-Wall -Wextra -Werror -O0 -ggdb -std=c11 -save-temps=obj -DDEBUG
Possibly related to this DEBUG_PRINTF() macro I have, which I call inside fast_malloc().
#ifdef DEBUG
/// Debug printf function.
/// See: https://stackoverflow.com/a/1941336/4561887
#define DEBUG_PRINTF(...) printf("DEBUG: "__VA_ARGS__)
#else
#define DEBUG_PRINTF(...) \
do \
{ \
} while (0)
#endif
Why is malloc() getting called before the program starts anyway? I don't call it anywhere. But, notice you can see malloc() getting called with 1024 bytes as visible in the stack traces in runs 1 and 2 (though it happens every run, those are the ones I have pasted enough you can see it in).
My malloc() and free() overrides look like this:
inline void* malloc(size_t num_bytes)
{
return fast_malloc(num_bytes);
}
inline void free(void* ptr)
{
fast_free(ptr);
}
Is my single-threaded program where malloc() is mysteriously getting called without me calling it somehow multi-threaded at startup? Does some weird program initialization stuff take place? My fast_malloc() implementation is currently NOT thread safe, so if Linux is doing some weird multi-threaded malloc() calls during some kind of program initialization or something, that could be the cause of the corruption, as again, fast_malloc(), which overrides malloc(), is NOT yet threadsafe.
It seems to be related to printing inside malloc(). Is printing inside malloc() forbidden?
Here is the bottom (first call is at very bottom) of a recent stack trace from a core dump:
#127471 0x00005626d43dca28 in malloc (num_bytes=1024) at src/fast_malloc.c:494
#127472 0x00007faa222a7e84 in __GI__IO_file_doallocate (fp=0x7faa2240f6a0 <_IO_2_1_stdout_>) at filedoalloc.c:101
#127473 0x00007faa222b8050 in __GI__IO_doallocbuf (fp=fp@entry=0x7faa2240f6a0 <_IO_2_1_stdout_>) at libioP.h:948
#127474 0x00007faa222b70b0 in _IO_new_file_overflow (f=0x7faa2240f6a0 <_IO_2_1_stdout_>, ch=-1) at fileops.c:745
#127475 0x00007faa222b5835 in _IO_new_file_xsputn (n=13, data=<optimized out>, f=<optimized out>) at libioP.h:948
#127476 _IO_new_file_xsputn (f=0x7faa2240f6a0 <_IO_2_1_stdout_>, data=<optimized out>, n=13) at fileops.c:1197
#127477 0x00007faa222aa678 in __GI__IO_puts (str=0x5626d43df227 '=' <repeats 13 times>) at libioP.h:948
#127478 0x00005626d43dca28 in malloc (num_bytes=1024) at src/fast_malloc.c:494
#127479 0x00007faa222a7e84 in __GI__IO_file_doallocate (fp=0x7faa2240f6a0 <_IO_2_1_stdout_>) at filedoalloc.c:101
#127480 0x00007faa222b8050 in __GI__IO_doallocbuf (fp=fp@entry=0x7faa2240f6a0 <_IO_2_1_stdout_>) at libioP.h:948
#127481 0x00007faa222b70b0 in _IO_new_file_overflow (f=0x7faa2240f6a0 <_IO_2_1_stdout_>, ch=-1) at fileops.c:745
#127482 0x00007faa222b5835 in _IO_new_file_xsputn (n=13, data=<optimized out>, f=<optimized out>) at libioP.h:948
#127483 _IO_new_file_xsputn (f=0x7faa2240f6a0 <_IO_2_1_stdout_>, data=<optimized out>, n=13) at fileops.c:1197
#127484 0x00007faa222aa678 in __GI__IO_puts (str=0x5626d43df227 '=' <repeats 13 times>) at libioP.h:948
#127485 0x00005626d43dca28 in malloc (num_bytes=1024) at src/fast_malloc.c:494
#127486 0x00007faa222a7e84 in __GI__IO_file_doallocate (fp=0x7faa2240f6a0 <_IO_2_1_stdout_>) at filedoalloc.c:101
#127487 0x00007faa222b8050 in __GI__IO_doallocbuf (fp=fp@entry=0x7faa2240f6a0 <_IO_2_1_stdout_>) at libioP.h:948
#127488 0x00007faa222b70b0 in _IO_new_file_overflow (f=0x7faa2240f6a0 <_IO_2_1_stdout_>, ch=-1) at fileops.c:745
#127489 0x00007faa222b5835 in _IO_new_file_xsputn (n=49, data=<optimized out>, f=<optimized out>) at libioP.h:948
#127490 _IO_new_file_xsputn (f=0x7faa2240f6a0 <_IO_2_1_stdout_>, data=<optimized out>, n=49) at fileops.c:1197
#127491 0x00007faa222aa678 in __GI__IO_puts (str=0x5626d43df238 "Running UNIT tests for the \"fast_malloc\" module.\n") at libioP.h:948
#127492 0x00005626d43dca98 in main () at src/fast_malloc_unit_tests.c:35
(gdb)
What are __GI__IO_puts and _IO_new_file_xsputn and those other function calls as you move up? Are they calls in other threads? Are they calling malloc() behind-the-scenes? It appears __GI__IO_file_doallocate is...
You are calling printf within your malloc implementation. That is not going to end well.
In the stack trace, you can clearly see that printf itself calls malloc.
If your malloc is not prepared to be called while in the middle of manipulating its data structures, it will crash (possibly that's what's happening here).
Alternatively, you can also end up with infinite recursion, when malloc calls printf, which calls malloc, which calls printf, etc.
TL;DR: when implementing something as low level as malloc, you must stick to either low-level functions which don't themselves allocate anything, or to direct system calls.
Why is malloc() getting called before the program starts anyway?
Because low-level functions in e.g. dynamic loader need to allocate memory during their own initialization.
Your malloc must work very early in the process lifetime; long before main.
Is printing inside malloc() forbidden?
Everything that might allocate memory is forbidden.
In practice, you need to call only async-signal safe routines, because non-async-signal safe ones may allocate, if not now then in the future.
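A minimal sketch of what an allocation-free debug print could look like, assuming a POSIX system. The helper names fmt_size and debug_trace are hypothetical. Only write(), strlen() and memcpy() are called, and the number is formatted by hand into a stack buffer, so nothing here goes through stdio or the allocator.

```c
#include <stddef.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: write `value` as decimal digits into buf without
 * using stdio. Returns the number of characters stored (no terminator). */
static size_t fmt_size(char *buf, size_t buf_len, size_t value)
{
    char tmp[32]; /* a 64-bit value needs at most 20 digits */
    size_t n = 0;

    /* Produce digits in reverse order. */
    do {
        tmp[n++] = (char)('0' + value % 10);
        value /= 10;
    } while (value != 0 && n < sizeof tmp);

    /* Copy them back in the right order, truncating if buf is too small. */
    size_t out = 0;
    while (n > 0 && out < buf_len)
        buf[out++] = tmp[--n];
    return out;
}

/* Hypothetical helper: emit "<label><value>\n" on stderr via write(). */
static void debug_trace(const char *label, size_t value)
{
    char line[96];
    size_t len = strlen(label);

    if (len > 60)
        len = 60; /* leave room for the digits and the newline */
    memcpy(line, label, len);
    len += fmt_size(line + len, sizeof line - len - 1, value);
    line[len++] = '\n';

    ssize_t rc = write(STDERR_FILENO, line, len);
    (void)rc; /* nothing sensible to do if the trace itself fails */
}
```

Inside an allocator, a call such as debug_trace("malloc bytes=", num_bytes) could then stand in for the printf()-based debug macro.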
To follow up and answer my own question: @Employed Russian's answer appears to be correct.
To be more specific, I have two main problems:
Infinite recursion between malloc() and printf().
Data corruption by freeing and reusing memory the system thinks it has exclusive access to.
The 1st problem: infinite recursion
I call printf() to do some debug prints inside my fast_malloc() implementation. So long as I do NOT override malloc() with my fast_malloc(), this is fine (so long as I protect the print with a mutex to make it multi-threaded-safe). BUT, once I do override malloc() with my fast_malloc(), this is NOT fine, because printf() calls malloc() to create a buffer into which it can place formatted string data. So, once malloc() becomes overridden by fast_malloc(), we end up with infinite recursion: prior to main() even being run, the system calls malloc() to prepare some things. This calls printf(), which calls malloc(), which calls printf()...forever until stack overflow...all before it has even entered my main() function.
So, I see zero of my prints, and main() doesn't even get entered. You can see from the last stack trace I posted in my question that I had 127492 stack frames on my stack at the time of the crash...at which point the stack overflowed. Sanity check: for a stack size of ~7.4 MB, that equates to about 7400000/127492 = ~58 bytes per stack frame, which seems reasonable.
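The recursion described above can be reproduced, and cut short, in a small self-contained model. This is only a sketch: traced_alloc, logger_that_allocates and the tiny pool are hypothetical stand-ins (not the real fast_malloc() or printf()), the guard flag is not thread-safe, and alignment is ignored for brevity.

```c
#include <stddef.h>

static unsigned char pool[1024];
static size_t pool_used;
static int in_alloc;  /* reentrancy guard; a real allocator would need a
                         thread-local or otherwise thread-safe flag */
static int log_calls; /* counts how often the logger ran */

static void *traced_alloc(size_t n);

/* Stand-in for printf(): like stdio, it allocates a buffer on first use. */
static void logger_that_allocates(void)
{
    static void *buffer;

    log_calls++;
    if (buffer == NULL)
        buffer = traced_alloc(64);
}

static void *traced_alloc(size_t n)
{
    /* Log only on the outermost call. Without this guard, the first call
     * would log, the logger would allocate, that allocation would log
     * again, and so on until the stack overflows. */
    if (!in_alloc) {
        in_alloc = 1;
        logger_that_allocates();
        in_alloc = 0;
    }

    if (n > sizeof pool - pool_used)
        return NULL;
    void *p = &pool[pool_used]; /* alignment ignored in this sketch */
    pool_used += n;
    return p;
}
```

The guard only stops the recursion; the nested allocation made by the logger is still served by the same allocator, so the allocator's state must be consistent at the point where it logs.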
The 2nd problem: I'm freeing and reusing memory that the system (glibc) thinks it has safely acquired and still controls
The code I'm running is my fast_malloc_unit_tests.c program, which, among other things, re-initializes the memory pools I'm using under-the-hood many times. Each time it does this, it considers prior-allocated memory to be freed, and it reallocates it when needed. BUT, printf() and other system calls run prior to main() even being entered have already called malloc() and think they still own this memory. So, we end up with me mistakenly reusing the memory they are using, causing data corruption and crashes.
After disabling all prints inside my malloc() implementation, thereby removing the infinite recursion problem, I was able to see this behavior. In this case, the code did enter my main() function, I did see up to a few dozen of my prints before the crash, and there were only 2 calls (stack frames) on my stack at the time of the crash (rather than 127492 frames). They were:
#0 0x000055555555589d in fast_malloc_print_stats () at src/fast_malloc.c:464
#1 0x0000555555556228 in main () at src/fast_malloc_unit_tests.c:129
Full output:
Program received signal SIGSEGV, Segmentation fault.
0x000055555555589d in fast_malloc_print_stats () at src/fast_malloc.c:464
464 block = block->next_free_block;
(gdb) bt
#0 0x000055555555589d in fast_malloc_print_stats () at src/fast_malloc.c:464
#1 0x0000555555556228 in main () at src/fast_malloc_unit_tests.c:129
where fast_malloc.c line 464 contains:
while (block != NULL)
{
free_block_cnt_walked++;
block = block->next_free_block; <==== line 464
}
which as far as I can tell has nothing wrong whatsoever, as it's a simple copy and block was already guaranteed NOT to be NULL, so calling block->next_free_block couldn't possibly be dereferencing a NULL ptr. I think the segmentation fault must therefore be due to corrupted memory because that memory is being double-used, so the block ptr probably is a corrupted address which is outside the valid bounds for us to read--hence the seg fault.
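One way to turn this kind of silent corruption into a reportable error is to check every link before following it. The sketch below assumes a pool of equally sized blocks; block_t, pool, block_in_pool and count_free_blocks are hypothetical names, and only the next_free_block field comes from the code above.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct block {
    struct block *next_free_block;
} block_t;

#define POOL_BLOCKS 8
static block_t pool[POOL_BLOCKS];

/* True if b points at the start of one of the blocks in the pool. */
static int block_in_pool(const block_t *b)
{
    uintptr_t addr = (uintptr_t)b;
    uintptr_t lo = (uintptr_t)&pool[0];
    uintptr_t hi = (uintptr_t)&pool[POOL_BLOCKS];

    return addr >= lo && addr < hi && (addr - lo) % sizeof(block_t) == 0;
}

/* Walks the free list. Returns the number of blocks walked, or -1 if a
 * link points outside the pool or the list is longer than the pool
 * (which indicates a cycle). */
static long count_free_blocks(const block_t *head)
{
    long walked = 0;

    for (const block_t *b = head; b != NULL; b = b->next_free_block) {
        /* Validate b before the loop increment dereferences it. */
        if (!block_in_pool(b) || walked >= POOL_BLOCKS)
            return -1;
        walked++;
    }
    return walked;
}
```

A check like this does not fix the double use of memory, but it reports the first bad pointer instead of crashing on it.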
That's it (I think). Now I've got to go do proper fixes and continue work on this. Big thanks goes out to @Employed Russian!
See also:
[my answer: a safe_printf() function which never calls malloc(), thereby solving the infinite recursion problem!] Which print calls in C do NOT ever call malloc() under the hood?

If I dereference an illegal address in C I get SIGSEGV with valid gdb backtrace. But if I pass it to snprintf backtrace is trashed

In my x86-64 Linux program I deliberately do:
char *ptr = 0x3e8;
int x = *(int *)ptr;
When I run it in gdb the process crashes due to SIGSEGV and prints a valid backtrace. If I do instead:
char s[16];
snprintf(s, 16, "%s\n", ptr);
The process still crashes but the backtrace is trash:
(gdb) bt
#0 0x00007ffff5da15c7 in ?? ()
#1 0x00007ffff5c704d3 in ?? ()
#2 0x0000000000000000 in ?? ()
My example may look contrived but my production code is crashing in snprintf() in exactly the same way. I've compiled with -g -O0.
The process still crashes but the backtrace is trash
When I build this test:
#include <stdio.h>
int main()
{
char *ptr = (char *)0x3e8;
char s[16];
snprintf(s, 16, "%s\n", ptr);
return 0;
}
Using gcc (Debian 9.3.0-3) 9.3.0 and GNU C Library (Debian GLIBC 2.30-4) stable release version 2.30, with libc6-dbg installed, I get:
Program received signal SIGSEGV, Segmentation fault.
__strlen_avx2 () at ../sysdeps/x86_64/multiarch/strlen-avx2.S:96
96 ../sysdeps/x86_64/multiarch/strlen-avx2.S: No such file or directory.
(gdb) bt
#0 __strlen_avx2 () at ../sysdeps/x86_64/multiarch/strlen-avx2.S:96
#1 0x00007ffff7e48756 in __vfprintf_internal (s=s@entry=0x7fffffffd8b0, format=format@entry=0x555555556004 "%s\n", ap=ap@entry=0x7fffffffda30, mode_flags=mode_flags@entry=0) at vfprintf-internal.c:1688
#2 0x00007ffff7e5a1f6 in __vsnprintf_internal (string=0x7fffffffdb10 "", maxlen=<optimized out>, format=0x555555556004 "%s\n", args=args@entry=0x7fffffffda30, mode_flags=mode_flags@entry=0) at vsnprintf.c:114
#3 0x00007ffff7e335a2 in __GI___snprintf (s=<optimized out>, maxlen=<optimized out>, format=<optimized out>) at snprintf.c:31
#4 0x0000555555555169 in main () at t.c:7
I suspect that you'll get a similar result from this test case on a standard x86 Ubuntu 18.04, in which case you are not telling us the whole story, and an MCVE would help a lot to get you the real answer.

LD_PRELOAD with possible static shared library functions

My objective is to hook the open function that dlopen on linux uses. For some reason, this code is not hooking dlopen->open, but it does hook my version of open main.c->open. Is dlopen not using my symbols somehow?
Compilation process is as follows:
gcc main.c -ldl -ggdb
gcc fake-open.c -o libexample.so -fPIC -shared
export LD_PRELOAD="$PWD/libexample.so"
When I run the program, everything works. Ensuring the LD_PRELOAD variable is set.. etc.
Here is the problem, when I try to hook the open function directly or indirectly called by dlopen, somehow this "version" of open is not being resolved/redirected/hooked by my version.
[main.c]
#include <dlfcn.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
int main()
{
puts("calling open");
int fd = open("/tmp/test.so", O_RDONLY|O_CLOEXEC);
puts("calling dlopen");
int *handle = dlopen("/tmp/test.so", RTLD_LAZY);
}
[fake-open.c]
#define _GNU_SOURCE
#include <stdio.h>
#include <dlfcn.h>
#include <sys/types.h>
#include <sys/stat.h>
//#include <fcntl.h>
int open(const char *pathname, int flags)
{
puts("from hooked..");
return 1;
}
Console Output:
calling open
from hooked..
calling dlopen
I know for a fact dlopen is somehow calling open due to strace.
write(1, "calling open\n", 13calling open
) = 13
write(1, "from hooked..\n", 14from hooked..
) = 14
write(1, "calling dlopen\n", 15calling dlopen
) = 15
brk(0) = 0x804b000
brk(0x806c000) = 0x806c000
open("/tmp/test.so", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\2\0\3\0\1\0\0\0`\205\4\0104\0\0\0"..., 512) = 512
But, for some reason, when dlopen calls open, it is not using my version of open. This has to be some kind of linking or run-time symbol resolution problem, or perhaps dlopen is using a static version of open and doesn't need to resolve any symbols at run or load time?
First, contrary to @usr's answer, dlopen does open the library.
We can confirm this by running a simple test under GDB:
// main.c
#include <dlfcn.h>
int main()
{
void *h = dlopen("./foo.so", RTLD_LAZY);
return 0;
}
// foo.c; compile with "gcc -fPIC -shared -o foo.so foo.c"
int foo() { return 0; }
Let's compile and run this:
gdb -q ./a.out
(gdb) start
Temporary breakpoint 1 at 0x400605: file main.c, line 4.
Starting program: /tmp/a.out
Temporary breakpoint 1, main () at main.c:4
4 void *h = dlopen("./foo.so", RTLD_LAZY);
(gdb) catch syscall open
Catchpoint 2 (syscall 'open' [2])
(gdb) c
Continuing.
Catchpoint 2 (call to syscall open), 0x00007ffff7df3497 in open64 () at ../sysdeps/unix/syscall-template.S:81
81 ../sysdeps/unix/syscall-template.S: No such file or directory.
(gdb) bt
#0 0x00007ffff7df3497 in open64 () at ../sysdeps/unix/syscall-template.S:81
#1 0x00007ffff7ddf5bd in open_verify (name=0x602010 "./foo.so", fbp=0x7fffffffd568, loader=<optimized out>, whatcode=<optimized out>, found_other_class=0x7fffffffd550, free_name=<optimized out>) at dl-load.c:1930
#2 0x00007ffff7de2d6f in _dl_map_object (loader=loader@entry=0x7ffff7ffe1c8, name=name@entry=0x4006a4 "./foo.so", type=type@entry=2, trace_mode=trace_mode@entry=0, mode=mode@entry=-1879048191, nsid=0) at dl-load.c:2543
#3 0x00007ffff7deea14 in dl_open_worker (a=a@entry=0x7fffffffdae8) at dl-open.c:235
#4 0x00007ffff7de9fc4 in _dl_catch_error (objname=objname@entry=0x7fffffffdad8, errstring=errstring@entry=0x7fffffffdae0, mallocedp=mallocedp@entry=0x7fffffffdad0, operate=operate@entry=0x7ffff7dee960 <dl_open_worker>, args=args@entry=0x7fffffffdae8) at dl-error.c:187
#5 0x00007ffff7dee37b in _dl_open (file=0x4006a4 "./foo.so", mode=-2147483647, caller_dlopen=<optimized out>, nsid=-2, argc=1, argv=0x7fffffffde28, env=0x7fffffffde38) at dl-open.c:661
#6 0x00007ffff7bd702b in dlopen_doit (a=a@entry=0x7fffffffdd00) at dlopen.c:66
#7 0x00007ffff7de9fc4 in _dl_catch_error (objname=0x7ffff7dd9110 <last_result+16>, errstring=0x7ffff7dd9118 <last_result+24>, mallocedp=0x7ffff7dd9108 <last_result+8>, operate=0x7ffff7bd6fd0 <dlopen_doit>, args=0x7fffffffdd00) at dl-error.c:187
#8 0x00007ffff7bd762d in _dlerror_run (operate=operate@entry=0x7ffff7bd6fd0 <dlopen_doit>, args=args@entry=0x7fffffffdd00) at dlerror.c:163
#9 0x00007ffff7bd70c1 in __dlopen (file=<optimized out>, mode=<optimized out>) at dlopen.c:87
#10 0x0000000000400614 in main () at main.c:4
This tells you that on a 64-bit system, dlopen calls open64 instead of open, so your interposer wouldn't work (you'd need to interpose open64 instead).
But you are on a 32-bit system (as evidenced by the 0x806c000 etc. addresses printed by strace), and there the stack trace looks like this:
#0 0xf7ff3774 in open () at ../sysdeps/unix/syscall-template.S:81
#1 0xf7fe1211 in open_verify (name=0x804b008 "./foo.so", fbp=fbp@entry=0xffffc93c, loader=0xf7ffd938, whatcode=whatcode@entry=0, found_other_class=found_other_class@entry=0xffffc933, free_name=free_name@entry=true) at dl-load.c:1930
#2 0xf7fe4114 in _dl_map_object (loader=loader@entry=0xf7ffd938, name=name@entry=0x8048590 "./foo.so", type=type@entry=2, trace_mode=trace_mode@entry=0, mode=mode@entry=-1879048191, nsid=0) at dl-load.c:2543
#3 0xf7feec14 in dl_open_worker (a=0xffffccdc) at dl-open.c:235
#4 0xf7feac06 in _dl_catch_error (objname=objname@entry=0xffffccd4, errstring=errstring@entry=0xffffccd8, mallocedp=mallocedp@entry=0xffffccd3, operate=operate@entry=0xf7feeb50 <dl_open_worker>, args=args@entry=0xffffccdc) at dl-error.c:187
#5 0xf7fee644 in _dl_open (file=0x8048590 "./foo.so", mode=-2147483647, caller_dlopen=0x80484ea <main+29>, nsid=<optimized out>, argc=1, argv=0xffffcf74, env=0xffffcf7c) at dl-open.c:661
#6 0xf7fafcbc in dlopen_doit (a=0xffffce90) at dlopen.c:66
#7 0xf7feac06 in _dl_catch_error (objname=0xf7fb3070 <last_result+12>, errstring=0xf7fb3074 <last_result+16>, mallocedp=0xf7fb306c <last_result+8>, operate=0xf7fafc30 <dlopen_doit>, args=0xffffce90) at dl-error.c:187
#8 0xf7fb037c in _dlerror_run (operate=operate@entry=0xf7fafc30 <dlopen_doit>, args=args@entry=0xffffce90) at dlerror.c:163
#9 0xf7fafd71 in __dlopen (file=0x8048590 "./foo.so", mode=1) at dlopen.c:87
#10 0x080484ea in main () at main.c:4
So why isn't open_verify's call to open redirected to your open interposer?
First, let's look at the actual call instruction in frame 1:
(gdb) fr 1
#1 0xf7fe1211 in open_verify (name=0x804b008 "./foo.so", fbp=fbp@entry=0xffffc93c, loader=0xf7ffd938, whatcode=whatcode@entry=0, found_other_class=found_other_class@entry=0xffffc933, free_name=free_name@entry=true) at dl-load.c:1930
1930 dl-load.c: No such file or directory.
(gdb) x/i $pc-5
0xf7fe120c <open_verify+60>: call 0xf7ff3760 <open>
Compare this to the call instruction in frame 10:
(gdb) fr 10
#10 0x080484ea in main () at main.c:4
4 void *h = dlopen("./foo.so", RTLD_LAZY);
(gdb) x/i $pc-5
0x80484e5 <main+24>: call 0x80483c0 <dlopen@plt>
Notice anything different?
That's right: the call from main goes through the procedure linkage table (PLT), which the dynamic loader (ld-linux.so.2) resolves to the appropriate definition.
But the call to open in frame 1 does not go through PLT (and thus is not interposable).
Why is that? Because that call must work before there is any other definition of open available -- it is used while the libc.so.6 (which normally supplies the definition of open) is itself being loaded (by the dynamic loader).
For this reason, the dynamic loader must be entirely self-contained (in fact it contains a statically linked in copy of a subset of libc).
My objective is to hook the open function that dlopen on linux uses.
For the reason above, this objective can't be achieved via LD_PRELOAD. You'll need to use some other hooking mechanism, such as patching the loader executable code at runtime.
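For calls that do go through the PLT (such as the open call in the question's main.c), an interposer can still log the call and forward it to the kernel. The following is a minimal sketch, assuming Linux with glibc: hook_calls is a hypothetical counter added for illustration, and the forwarding uses the openat system call directly rather than any particular libc entry point. It will not see the loader's internal call shown in frame 1.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdarg.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

static int hook_calls; /* hypothetical counter, for illustration only */

/* Same variadic shape as the declaration in <fcntl.h>: the third
 * argument (mode) is only present when a file may be created. */
int open(const char *pathname, int flags, ...)
{
    mode_t mode = 0;

    if ((flags & O_CREAT) || (flags & O_TMPFILE) == O_TMPFILE) {
        va_list ap;
        va_start(ap, flags);
        mode = va_arg(ap, mode_t); /* mode_t is unsigned int on Linux */
        va_end(ap);
    }

    hook_calls++;

    /* Log with write() rather than stdio, which may allocate. */
    static const char msg[] = "open() interposed\n";
    ssize_t rc = write(STDERR_FILENO, msg, sizeof msg - 1);
    (void)rc;

    /* Forward straight to the kernel; errno is set by syscall(). */
    return (int)syscall(SYS_openat, AT_FDCWD, pathname, flags, mode);
}
```

Built with -fPIC -shared and loaded through LD_PRELOAD as in the question, this should intercept the program's own open calls; an open64 wrapper would follow the same pattern.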

use of malloc in contiki programs

Consider the following contiki program.
#include<stdio.h>
#include"contiki.h"
#include <stdlib.h>
static char *mem;
static int x;
/*---------------------------------------------------------------------------*/
PROCESS(test, "test");
AUTOSTART_PROCESSES(&test);
/*---------------------------------------------------------------------------*/
PROCESS_THREAD(test, ev, data)
{
PROCESS_BEGIN();
printf("before malloc\n");
mem=(char*)malloc(10);
for(x=0;x<10;x++)
mem[x]=x+1;
printf("after malloc\n");
PROCESS_END();
}
When this program is compiled for native/z1/wismote/cooja it executes perfectly fine and both printf statements are executed, but when it is compiled for the mbxxx target and executed on hardware, only the first printf statement is executed and the code gets stuck in malloc. Any guess at the reason behind this behaviour? I am also attaching the GDB trace here.
(gdb) mon reset init
target state: halted
target halted due to debug-request, current mode: Thread
xPSR: 0x01000000 pc: 0x08000efc msp: 0x20000500
(gdb) b test.c:16
Breakpoint 1 at 0x8000ec8: file test.c, line 16.
(gdb) b test.c:17
Breakpoint 2 at 0x8000ece: file test.c, line 17.
(gdb) b test.c:18
Breakpoint 3 at 0x8000ed8: file test.c, line 18.
(gdb) load
Loading section .isr_vector, size 0x84 lma 0x8000000
Loading section .text, size 0xc5c4 lma 0x8000084
Loading section .data, size 0x660 lma 0x800c648
Start address 0x8000084, load size 52392
Transfer rate: 15 KB/sec, 8732 bytes/write.
(gdb) c
Continuing.
Note: automatically using hardware breakpoints for read-only addresses.
Breakpoint 1, process_thread_test (process_pt=0x2000050c <test+12>, ev=129 '\201', data=0x0) at test.c:16
16 printf("before malloc\n");
(gdb) c
Continuing.
Breakpoint 2, process_thread_test (process_pt=0x2000050c <test+12>, ev=<optimized out>,
data=<optimized out>) at test.c:17
17 mem=(char*)malloc(10);
(gdb) c
Continuing.
^C
Program received signal SIGINT, Interrupt.
Default_Handler () at ../../cpu/stm32w108/hal/micro/cortexm3/stm32w108/crt-stm32w108.c:87
87 {
(gdb) bt
#0 Default_Handler () at ../../cpu/stm32w108/hal/micro/cortexm3/stm32w108/crt-stm32w108.c:87
#1 <signal handler called>
#2 0x08000440 in _malloc_r ()
#3 0x08000ed4 in process_thread_test (process_pt=0x2000050c <certificate_check+12>, ev=<optimized out>,
data=<optimized out>) at test.c:17
#4 0x0800272c in call_process (p=0x20000500 <test>, ev=<optimized out>, data=<optimized out>)
at ../../core/sys/process.c:190
#5 0x080028e6 in process_post_synch (p=<optimized out>, ev=ev@entry=129 '\201', data=<optimized out>)
at ../../core/sys/process.c:366
#6 0x0800291a in process_start (p=<optimized out>, arg=arg@entry=0x0) at ../../core/sys/process.c:120
#7 0x08002964 in autostart_start (processes=<optimized out>) at ../../core/sys/autostart.c:57
#8 0x08001134 in main () at ../../platform/mbxxx/./contiki-main.c:210
(gdb)
Ahhh... Finally figured out the problem. It occurred because the stm32w108 port was not configured to use dynamic memory.
All that was needed was to open the following file:
contiki-2.7/cpu/stm32w108/hal/micro/cortexm3/stm32w108/crt-stm32w108.c and add #define USE_HEAP at the top of the file or before the _sbrk implementation! The heap size can also be modified here, not from the linker script, although the stack size
A side note: it is a really bad idea to use dynamic memory allocation in embedded systems, so avoid it! It's filthy, trust me! Eventually I will also remove any dynamic memory allocation references! :)
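To illustrate why malloc depends on that heap configuration: on newlib-based toolchains, malloc obtains its memory by calling an sbrk-style hook, so without a working heap behind that hook the first allocation has nothing to draw from. The following is only a minimal sketch of such a hook over a statically reserved array. The names sketch_sbrk, heap_area and HEAP_SIZE are hypothetical, and this is not the code in crt-stm32w108.c.

```c
#include <stddef.h>
#include <errno.h>

#define HEAP_SIZE 1024u  /* hypothetical heap size in bytes */

/* Statically reserved heap; the allocator carves its blocks out of this. */
static unsigned char heap_area[HEAP_SIZE];
static size_t heap_used = 0;

/* Move the end of the heap by incr bytes.
 * Returns the previous end of the heap, or (void *)-1 with errno set
 * to ENOMEM when the request does not fit. */
void *sketch_sbrk(ptrdiff_t incr)
{
    size_t old = heap_used;

    if (incr < 0) {
        /* Shrinking: refuse to release more than is currently in use. */
        if ((size_t)(-incr) > heap_used) {
            errno = ENOMEM;
            return (void *)-1;
        }
        heap_used -= (size_t)(-incr);
    } else {
        /* Growing: refuse to go past the end of the reserved array. */
        if ((size_t)incr > HEAP_SIZE - heap_used) {
            errno = ENOMEM;
            return (void *)-1;
        }
        heap_used += (size_t)incr;
    }
    return &heap_area[old];
}
```

With a hook of this shape the heap size is fixed by the array definition in the C file rather than by the linker script, which matches the observation above that the heap size is changed in that file.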

Why do I get a SIGABRT here?

I have this code segment in which I am opening/closing a file a number of times (in a loop):
for(i=1;i<max;i++)
{
/* other code */
plot_file=fopen("all_fitness.out","w");
for (j=0;j<pop_size;j++)
fprintf(plot_file, "%lf %lf\n",oldpop[i].xreal[0],oldpop[i].obj);
fclose(plot_file);
/*other code*/
}
I get a SIGABRT here, with the following backtrace:
#0 0x001fc422 in __kernel_vsyscall ()
#1 0x002274d1 in *__GI_raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#2 0x0022a932 in *__GI_abort () at abort.c:92
#3 0x0025dee5 in __libc_message (do_abort=2, fmt=0x321578 "*** glibc detected *** %s: %s: 0x%s ***\n")
at ../sysdeps/unix/sysv/linux/libc_fatal.c:189
#4 0x00267ff1 in malloc_printerr (action=<value optimized out>, str=0x6 <Address 0x6 out of bounds>, ptr=0x8055a60) at malloc.c:6217
#5 0x002696f2 in _int_free (av=<value optimized out>, p=<value optimized out>) at malloc.c:4750
#6 0x0026c7cd in *__GI___libc_free (mem=0x8055a60) at malloc.c:3716
#7 0x0025850a in _IO_new_fclose (fp=0x8055a60) at iofclose.c:88
#8 0x0804b9c0 in main () at ga.c:1100
Line 1100 is the line where I call fclose() in the code segment above. What is the reason for this behavior? Any pointers are appreciated.
(I am on Linux and using gcc)
When you call fclose(), glibc releases some dynamically allocated structures; internally there is a free() call. malloc() and free() rely on rather complex, dynamically built structures. Apparently, glibc found that the structures were in an incoherent state, to the point that safe memory release cannot be done. glibc decided that the problem was serious enough to warrant an immediate abort.
This means that you have a bug somewhere in your code, possibly quite far from the snippet you show: a buffer overflow or a similar out-of-place memory write that damages the memory allocation structures.
You may want to try Valgrind or Electric Fence to sort such problems out.
I don't know if it's causing your particular problem, but you should always check the FILE * pointer returned by fopen() in case it's NULL.
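As a rough sketch of that last point, the open/write/close sequence can be wrapped in a helper that checks every step. The helper name write_pairs and its parameters are hypothetical and are not taken from the code in the question. Note that this only guards against a failed fopen(); it does not fix heap corruption caused elsewhere.

```c
#include <stdio.h>

/* Hypothetical helper: write n (x, obj) pairs to path, one pair per line.
 * Returns 0 on success, -1 if the file cannot be opened, written or closed. */
int write_pairs(const char *path, const double *x, const double *obj, size_t n)
{
    FILE *fp = fopen(path, "w");
    if (fp == NULL) {
        perror(path);   /* report why fopen failed */
        return -1;
    }

    for (size_t j = 0; j < n; j++) {
        if (fprintf(fp, "%f %f\n", x[j], obj[j]) < 0) {
            fclose(fp); /* still release the stream on a write error */
            return -1;
        }
    }

    /* fclose can fail too, for example when buffered data cannot be flushed */
    return fclose(fp) == 0 ? 0 : -1;
}
```

A caller would then test the return value, for example `if (write_pairs("all_fitness.out", xs, objs, pop_size) != 0) { /* handle the error */ }`, where xs and objs are assumed to be arrays of pop_size doubles.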

Resources