GLIBC: debugging memory leaks: how to interpret output of mtrace()

I’m trying to debug a memory leak problem. I’m using mtrace() to get a malloc/free/realloc trace. I’ve run my program and now have a huge log file. So far so good. But I have problems interpreting the file. Look at these lines:
# /usr/java/ibm-java2-x86_64-50/jre/bin/libj9prt23.so:[0x2b270a384a34] + 0x1502570 0x68
# /usr/java/ibm-java2-x86_64-50/jre/bin/libj9prt23.so:[0x2b270a384a34] + 0x1502620 0x30
# /usr/java/ibm-java2-x86_64-50/jre/bin/libj9prt23.so:[0x2b270a384a34] + 0x2aaab43a1700 0xa80
# /usr/java/ibm-java2-x86_64-50/jre/bin/libj9prt23.so:[0x2b270a384a34] + 0x1501460 0xa64
The strange thing about this is that one call (same return address) is responsible for 4 allocations.
Even stranger:
# /usr/java/ibm-java2-x86_64-50/jre/bin/libj9prt23.so:[0x2b270a384a34] + 0x2aaab43a1700 0xa2c
…
# /usr/java/ibm-java2-x86_64-50/jre/bin/libj9prt23.so:[0x2b270a384a34] + 0x2aaab43a1700 0xa80
Between those two lines the block 0x2aaab43a1700 is never being freed.
Does anyone know how to explain this? How could one call result in 4 allocations? And how could malloc return an address which was already allocated previously?
edit 2008/09/30:
The script to analyze the mtrace() output provided by GLIBC (mtrace.pl) isn't of any help here. It will just say: Alloc 0x2aaab43a1700 duplicate. But how could this happen?

You're looking at the direct output of mtrace, which is extremely confusing and counterintuitive. Luckily, there is a Perl script (called mtrace, found in glibc-utils) that makes this output much easier to parse.
Compile your program with debugging information (-g), run it with MALLOC_TRACE set, and then run the mtrace script on the trace file:
$ gcc -g -o test test.c
$ MALLOC_TRACE=mtrace.out ./test
$ mtrace test mtrace.out
Memory not freed:
-----------------
Address Size Caller
0x094d9378 0x400 at test.c:6
The output should be a lot easier to digest.
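For the commands above to produce a report, the traced program itself has to call mtrace(). A minimal sketch, assuming glibc; the function name leaky_section() is hypothetical. mtrace() only starts tracing when the MALLOC_TRACE environment variable names a writable file, and on glibc 2.34 and later the tracing code lives in libc_malloc_debug.so, which has to be preloaded (LD_PRELOAD) for anything to be written.

```c
#include <mcheck.h>   /* glibc: mtrace(), muntrace() */
#include <stdlib.h>

/* Hypothetical demo: one block is leaked on purpose, one is freed. */
void *leaky_section(void)
{
    mtrace();                     /* no-op unless MALLOC_TRACE names a writable file */
    void *kept = malloc(0x400);   /* no matching free: reported as "not freed" */
    void *tmp = malloc(0x100);
    free(tmp);                    /* allocation and free cancel out in the report */
    muntrace();                   /* stop writing trace records */
    return kept;
}
```

With tracing active, the mtrace script should list only the 0x400-byte block under "Memory not freed"; the 0x100-byte block has both an allocation and a free record.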

The function that is allocating the memory is being called more than once. The caller address points to the code that did the allocation, and that code is simply being run more than once.
Here is an example in C:
#include <mcheck.h>
#include <stdlib.h>
void *allocate (void)
{
return (malloc(1000));
}
int main()
{
mtrace();
allocate();
allocate();
return 0;
}
The output from mtrace is:
Memory not freed:
-----------------
Address Size Caller
0x0000000000601460 0x3e8 at 0x4004f6
0x0000000000601850 0x3e8 at 0x4004f6
Note how the caller address is identical? This is why the mtrace analysing script reports them as coming from the same caller: the same bug is being hit more than once, resulting in several memory leaks.
Compiling with debug flags (-g) is helpful if you can:
Memory not freed:
-----------------
Address Size Caller
0x0000000000601460 0x3e8 at /home/andrjohn/development/playground/test.c:6
0x0000000000601850 0x3e8 at /home/andrjohn/development/playground/test.c:6
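The "Alloc 0x2aaab43a1700 duplicate" message from the question comes from bookkeeping of this kind: the log is replayed while keeping a set of live addresses, and a "+" record for an address that is still live is flagged. A simplified sketch of that replay as a hypothetical C helper (replay_line() and struct replay are illustrative names); it skips realloc records ("<" / ">"), uses a fixed-size table, and accepts lines with or without a caller prefix.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define MAX_LIVE 1024   /* arbitrary limit for the sketch */

struct live_block { unsigned long long addr, size; };

struct replay {
    struct live_block live[MAX_LIVE];
    size_t count;            /* number of live blocks */
    size_t duplicates;       /* "+" for an address that is still live */
    size_t unmatched_frees;  /* "-" for an address that is not live */
};

/* Returns the text after "op " if the line starts with it or contains " op ". */
static const char *after_op(const char *line, char op)
{
    char pat[4] = { ' ', op, ' ', '\0' };
    if (line[0] == op && line[1] == ' ')
        return line + 2;
    const char *p = strstr(line, pat);
    return p != NULL ? p + 3 : NULL;
}

static long find_live(const struct replay *r, unsigned long long addr)
{
    for (size_t i = 0; i < r->count; i++)
        if (r->live[i].addr == addr)
            return (long)i;
    return -1;
}

/* Feeds one trace line; the caller prefix before "+" / "-" is ignored. */
void replay_line(struct replay *r, const char *line)
{
    unsigned long long addr, size;
    const char *p;
    if ((p = after_op(line, '+')) != NULL &&
        sscanf(p, "%llx %llx", &addr, &size) == 2) {
        if (find_live(r, addr) >= 0) {
            r->duplicates++;          /* the script's "duplicate" case */
            return;
        }
        if (r->count < MAX_LIVE)
            r->live[r->count++] = (struct live_block){ addr, size };
    } else if ((p = after_op(line, '-')) != NULL &&
               sscanf(p, "%llx", &addr) == 1) {
        long i = find_live(r, addr);
        if (i < 0) {
            r->unmatched_frees++;
            return;
        }
        r->live[i] = r->live[--r->count];   /* swap-remove */
    }
}
```

Fed the two "+ 0x2aaab43a1700" lines from the question with no "-" record in between, this counts one duplicate. That is all the message says: the log contains no free record for that block between the two allocations; it does not say why.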

One possible explanation is that the same function allocates buffers of different sizes; strdup is one such example.
For the second question, it is possible that the runtime is allocating some "static" scratch area which is not intended to be freed until the process terminates. At that point, the OS will clean up after the process anyway.
Think about it this way: in Java, there are no destructors, and no guarantees that finalization will be ever called for any object.
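To illustrate the strdup point, a minimal sketch with a hypothetical helper copy_name(): every block it returns is allocated from the same place inside strdup(), so an mtrace log should show one caller address with several different sizes.

```c
#define _POSIX_C_SOURCE 200809L   /* declares strdup() under strict -std= modes */
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: one call site, different allocation sizes. */
char *copy_name(const char *name)
{
    return strdup(name);   /* mallocs strlen(name) + 1 bytes inside libc */
}
```

Calling copy_name("ab") and copy_name("abcdef") requests 3 and 7 bytes respectively, through the same malloc() call inside strdup().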

Try running your app under valgrind. It might give you a better view of what is actually being leaked.

Related

How to get the peak dynamically allocated memory usage of a C/C++ program in Linux system

I would like to know the peak dynamically allocated memory usage (net of memory that has already been released) after running a C/C++ program.
1 (Initial question):
Given a C/C++ program a.out. Can any tool report the peak dynamically allocated memory usage in a way like:
>$ peak-memory ./a.out
Peak dynamically allocated memory size: 12345678 Bytes
2 (Alternative question):
Can I insert a code snippet in the source program such that every time it is executed, it reports the current heap memory usage at that point? Something like:
int main(){
int *a = (int*) malloc(12);
// some code...
print_heap_usage();
// other code...
}
My research:
I know I can use wrapper functions such as my_malloc and my_free that track the allocated and released memory. But that is not a practical option for other people's source code, where too many lines would need to be modified. Worse, such wrappers can't handle code that uses new and delete.
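For code you control, the wrapper approach could look like the following sketch. The names my_malloc/my_free come from the question; my_current_bytes() and my_peak_bytes() are hypothetical additions. Each block carries a hidden size header so the free path can subtract it; the counters are not thread-safe, and this still does not cover new/delete.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hidden header in front of each block; the union with max_align_t keeps
   the user pointer as well aligned as malloc's own result. */
typedef union { size_t size; max_align_t align; } header_t;

static size_t current_bytes;   /* bytes requested and not yet freed */
static size_t peak_bytes;      /* highest value current_bytes has reached */

void *my_malloc(size_t n)
{
    if (n > SIZE_MAX - sizeof(header_t))
        return NULL;                      /* size + header would overflow */
    header_t *h = malloc(sizeof *h + n);
    if (h == NULL)
        return NULL;
    h->size = n;
    current_bytes += n;
    if (current_bytes > peak_bytes)
        peak_bytes = current_bytes;
    return h + 1;                         /* user memory follows the header */
}

void my_free(void *p)
{
    if (p == NULL)
        return;
    header_t *h = (header_t *)p - 1;
    current_bytes -= h->size;
    free(h);
}

size_t my_current_bytes(void) { return current_bytes; }
size_t my_peak_bytes(void) { return peak_bytes; }
```

Allocating 100 and 50 bytes, freeing the 100, then allocating 10 leaves a current total of 60 and a peak of 150.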
There is one related question, Get peak amount of memory used by a C program, but it's only about Windows and doesn't specifically address heap memory usage. Some answers suggested using getrusage. However, as far as I can tell, it can't report heap usage.
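Either use the statistics described in "Statistics for Memory Allocation with malloc" (mallinfo) or call malloc_stats.
Note: check the CONFORMING TO sections of their man pages; neither is part of ISO C or POSIX.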
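A sketch of the print_heap_usage() from the question, built on these glibc statistics. It assumes glibc 2.33 or later, which added mallinfo2() (the older mallinfo() uses int fields that can overflow on large heaps). It reports current usage only; a peak would have to be tracked by sampling it at interesting points.

```c
#include <malloc.h>   /* glibc: mallinfo2(), malloc_stats() */
#include <stdio.h>
#include <stdlib.h>

/* Bytes currently held by live allocations: uordblks covers in-use chunks
   in the malloc arenas, hblkhd covers large blocks served by mmap().
   Chunk overhead is included, so expect a bit more than the requested sizes. */
size_t heap_in_use(void)
{
    struct mallinfo2 mi = mallinfo2();
    return mi.uordblks + mi.hblkhd;
}

/* Hypothetical implementation of the print_heap_usage() from the question. */
void print_heap_usage(void)
{
    printf("heap in use: %zu bytes\n", heap_in_use());
    malloc_stats();   /* glibc: per-arena summary, written to stderr */
}
```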
Another option, from the terminal (note that both report system-wide memory, not the usage of a single process):
cat /proc/meminfo
free (from the procps package; it reads /proc/meminfo, see https://gitlab.com/procps-ng/procps/-/blob/master/proc/sysinfo.c line 698 in function meminfo)

How to create a dynamically allocated memory fault detectable by Valgrind?

I need to use a real C program for exemplifying memory safety concepts. The idea is to inject or delete some statements in a program that uses malloc in order to create a memory problem. The modified version of the program must lead to a memory fault at runtime. The problem should be detectable by Valgrind, and thus be related to dynamically allocated memory (not stack memory). It should also have a pre-made test case or test input to trigger the problem.
I don't understand how to create a dynamically allocated memory fault.
Can you present an example and explain a modification to a program that causes a memory fault when the program is executed with a given input?
I'll give you a few examples.
#include <stdlib.h>
int main(void)
{
int *pi1 = malloc(10*sizeof(int));
if (pi1[5]) // ERROR here, see 4.2.2 in the manual
;
free(pi1);
int *pi2 = malloc(10*sizeof(int));
free(pi2);
if (pi2[5]) // ERROR here, a variation of 4.2.1 in the manual
;
int *pi3 = (int*)0x500000000000U;
if (*pi3) // ERROR and probably a crash, see 4.2.1 in the manual
;
}
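Along the lines of "inject or delete some statements": deleting the "+ 1" from a string copy's malloc() size gives a one-byte heap overflow for any non-empty input, which Memcheck reports as an invalid write. A sketch with a hypothetical copy_string() whose inject_bug flag selects the faulty size; only the flag-off path is well-defined C.

```c
#include <stdlib.h>
#include <string.h>

/* inject_bug != 0 reproduces the "deleted statement": the + 1 for the
   terminating NUL is missing, so memcpy() writes one byte past the block
   and Memcheck reports an invalid write of size 1. */
char *copy_string(const char *s, int inject_bug)
{
    size_t len = strlen(s);
    char *d = malloc(inject_bug ? len : len + 1);
    if (d == NULL)
        return NULL;
    memcpy(d, s, len + 1);   /* copies the string and its NUL */
    return d;
}
```

The test input can be any non-empty string, for example a command-line argument passed to copy_string(argv[1], 1).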
Clearly these are trivial examples. In more involved real world problems you should be aware that the 'uninitialized' nature of memory is transitive. Valgrind does not emit error messages until the use of the uninitialized memory has an effect on the behaviour of the software.
For example, you could have
1. Structure s1 allocated with malloc.
2. All fields of s1 get initialized except f1.
3. s1 gets copied into s2. No error emitted.
4. s2 gets copied into s3. No error emitted.
5. A read is done on s3.f1. Now Valgrind emits an error. It will give the stack here and the allocation stack of step 1.
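The five steps above as a small C sketch (struct record and transitive_uninit() are hypothetical names). Run under valgrind --track-origins=yes, Memcheck should stay quiet during the copies and report a conditional jump depending on an uninitialised value at the branch on s3.f1, with the malloc() of step 1 as the origin.

```c
#include <stdlib.h>

struct record { int f0; int f1; int f2; };   /* hypothetical layout */

int transitive_uninit(void)
{
    struct record *s1 = malloc(sizeof *s1);   /* step 1: heap allocation */
    if (s1 == NULL)
        return -1;
    s1->f0 = 1;                  /* step 2: every field except f1 */
    s1->f2 = 2;
    struct record s2 = *s1;      /* step 3: copy, no error yet */
    struct record s3 = s2;       /* step 4: copy, no error yet */
    free(s1);
    int result = s3.f0 + s3.f2;
    if (s3.f1 > 0)               /* step 5: the branch depends on f1 */
        result += 100;
    return result;
}
```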

Understanding printf in C

I'm trying to understand how printf works in C for a simple case. I wrote the following program:
#include <stdio.h>
int main(int argc, char const *argv[])
{
printf("Test %s\n", argv[1]);
return 0;
}
Running objdump on the binary I noticed the Test %s\n resides in .rodata
objdump -sj .rodata bin
bin: file format elf64-x86-64
Contents of section .rodata:
08e0 01000200 54657374 2025730a 00 ....Test %s..
So formatted print seems to perform additional pattern copying from rodata to somewhere else.
After compiling and running it with strace ./bin rr, I noticed a brk syscall before the actual write. So running it in gdb with
(gdb) catch syscall brk
(gdb) catch syscall write
shows that in my case the current break is 0x555555756000, but it is then set to 0x555555777000. When the write occurs, the formatted string
x/s $rsi
0x555555756260: "Test rr\n"
resides between the "old" and "new" break. After the write occurs, the program exits.
QUESTION: Why are so many pages allocated, and why doesn't the break return to the previous value after the write syscall? Is there any reason to use brk instead of mmap for such formatting?
brk() (and its companion sbrk()) is a kind of specialized mmap() for manipulating the heap size. It exists for historical reasons; the libc could also use mmap() or mremap() directly.
The heap is expanded as additional memory is allocated, for example with malloc(), which happens internally in the libc, e.g. to have enough space to build the actual string from the format string and the parameters, or for many other internal things (e.g. the output buffers when using buffered I/O with the f* function family).
If some parts of the heap are no longer used, they are often not returned automatically, for two main reasons: the heap may be fragmented, and/or the unused part of the heap does not exceed the threshold that justifies the operation, because it might be needed again soon.
As a side note: the format string itself is certainly not copied from the ro-section to the heap, this would be completely useless. But the result string is (usually) built on the heap.
Why do we allocate so many pages?
Using a system call is costly, so the library asks for more than you need at this moment, because it is highly probable that you will want more very soon. Managing memory in user mode is less costly. It is a matter of granularity.
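The numbers in the question show this granularity: the break moved from 0x555555756000 to 0x555555777000, i.e. by 0x21000 = 135168 bytes, or 33 pages of 4 KiB, to format an 8-byte string. A sketch for observing the effect directly, assuming glibc on Linux (break_growth() is a hypothetical helper; sbrk(0) merely reads the current break):

```c
#define _DEFAULT_SOURCE   /* declares sbrk() under strict -std= modes */
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/* Returns how many bytes the program break moved during one malloc(n).
   The result is 0 when the request fits into heap space obtained earlier. */
long break_growth(size_t n, void **out)
{
    intptr_t before = (intptr_t)sbrk(0);   /* sbrk(0) only reads the break */
    *out = malloc(n);
    intptr_t after = (intptr_t)sbrk(0);
    return (long)(after - before);
}
```

In a fresh process, the first small malloc() typically moves the break by far more than n, and later small requests often report 0 because they are served from that reserve.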
and why didn't the break return to the previous one after the write syscall occurred?
Again, why free it if the probability that you will ask for more soon is high?
Is there any reason to use brk instead of mmap for such formatting?
It is a matter of choice, this depends on implementation.
Aside: your question is more about "memory allocation policy" than "understanding printf" (printf is only the context).

post-mortem memory leak search (and analysis) with gdb

My overall goal is to figure out from a post-mortem core file why a specific process is consuming a lot of memory. Is there a summary that I can get somehow? Obviously valgrind is out of the question, because I can't access the process live.
First of all, getting output similar to /proc/<pid>/maps would help, but
maintenance info sections
(as described here: GDB: Listing all mapped memory regions for a crashed process) in gdb didn't show me heap memory consumption.
info proc map
is an option, as I can access a machine with the exact same code, but as far as I have seen it is not correct. My process was using 700 MB, but the maps shown only accounted for some 10 MB. And I didn't see the .so files there that are visible in
maintenance print statistics
Do you know any other command which might be useful?
I can always instrument the code, but that's not easy. And reaching all the allocated data through pointers is like finding a needle in a haystack.
Do you have any ideas?
Postmortem debugging of this sort in gdb is a bit of an art more than a science.
The most important tool for it, in my opinion, is the ability to write scripts that run inside gdb. The manual explains how. The reason I find this so useful is that it lets you do things like walk data structures and print out information about them.
Another possibility for you here is to instrument your version of malloc -- write a new malloc function that saves statistics about what is being allocated so that you can then look at those post mortem. You can, of course, call the original malloc to do the actual memory allocation work.
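One way to do this on glibc is to define malloc() and free() in the program itself and forward to glibc's own entry points __libc_malloc() and __libc_free(). This is a glibc-specific sketch, not portable. The counters live in a global (g_alloc_stats, a hypothetical name), so they end up in the core file and can be read post mortem with print g_alloc_stats in gdb. The counters are not thread-safe, and calloc()/realloc() are not counted.

```c
#include <stddef.h>
#include <stdlib.h>

/* glibc-specific: the allocator's own entry points, exported by libc.so. */
extern void *__libc_malloc(size_t n);
extern void __libc_free(void *p);

struct alloc_stats {
    size_t malloc_calls;
    size_t bytes_requested;
    size_t free_calls;
};

/* Global, so it is part of the core image: "print g_alloc_stats" in gdb. */
struct alloc_stats g_alloc_stats;

/* In a dynamically linked program these definitions interpose libc's
   malloc()/free(); the real work is still done by glibc. */
void *malloc(size_t n)
{
    g_alloc_stats.malloc_calls++;
    g_alloc_stats.bytes_requested += n;
    return __libc_malloc(n);
}

void free(void *p)
{
    if (p != NULL)
        g_alloc_stats.free_calls++;
    __libc_free(p);
}
```

A large gap between malloc_calls and free_calls in the core points towards a leak; recording sizes per call site would narrow it further.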
I'm sorry that I can't give you an obvious and simple answer that will simply yield an immediate fix for you here -- without tools like valgrind this is a very hard job.
If it's Linux, you don't have to collect statistics in your own malloc. Use the utility called 'memusage'.
For a sample program (sample_mem.c) like the one below:
#include<stdio.h>
#include<stdlib.h>
#include<time.h>
int main(void)
{
int i=1000;
char *buff=NULL;
srand(time(NULL));
while(i--)
{
buff = malloc(rand() % 64);
free(buff);
}
return 0;
}
the output of memusage will be:
$ memusage ./sample_mem
Memory usage summary: heap total: 31434, heap peak: 63, stack peak: 80
total calls total memory failed calls
malloc| 1000 31434 0
realloc| 0 0 0 (nomove:0, dec:0, free:0)
calloc| 0 0 0
free| 1000 31434
Histogram for block sizes:
0-15 253 25% ==================================================
16-31 253 25% ==================================================
32-47 247 24% ================================================
48-63 247 24% ================================================
But if you're writing a malloc wrapper, you can make your program dump core after a given number of malloc calls so that you can get a clue.
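A sketch of that idea with hypothetical names (counting_malloc(), set_abort_after(), counted_calls()): the wrapper calls abort() on the N-th allocation, so that, with core dumps enabled (for example ulimit -c unlimited in the shell), you get a core file of the heap at a reproducible point.

```c
#include <stdlib.h>

static unsigned long malloc_calls;   /* allocations seen so far */
static unsigned long abort_after;    /* 0 disables the trap */

void set_abort_after(unsigned long n) { abort_after = n; }
unsigned long counted_calls(void) { return malloc_calls; }

void *counting_malloc(size_t n)
{
    malloc_calls++;
    if (abort_after != 0 && malloc_calls == abort_after)
        abort();   /* SIGABRT: default action terminates with a core dump */
    return malloc(n);
}
```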
You might be able to use a simple tool like log-malloc.c which compiles into a shared library which is LD_PRELOADed before your application and logs all the malloc-type functions to a file. At least it might help narrow down the search in your dump.

How to find place of buffer overflow and memory corruptions?

valgrind can't find anything useful. I'm confused.
Symptoms:
my data is corrupted by a malloc() call
the return address of my function is overwritten with something wrong
PS: the code does NOT segfault
Currently I have made some progress by replacing all my malloc() calls with mmap()+mprotect()
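The mmap()+mprotect() approach is the same trick electric fence uses: each block is placed so that it ends right at an inaccessible page, so an overrun faults at the offending instruction instead of silently corrupting data. A Linux sketch with hypothetical names guarded_alloc()/guarded_free(); it rounds sizes up to the alignment of max_align_t, so overruns smaller than that padding are not caught, and every block costs at least two pages.

```c
#define _DEFAULT_SOURCE   /* MAP_ANONYMOUS under strict -std= modes */
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Size rounded up so the returned pointer keeps malloc-like alignment. */
static size_t rounded_size(size_t n)
{
    size_t a = _Alignof(max_align_t);
    return (n + a - 1) / a * a;
}

static size_t data_pages(size_t rounded, size_t page)
{
    size_t pages = (rounded + page - 1) / page;
    return pages == 0 ? 1 : pages;
}

/* The block ends exactly where a PROT_NONE guard page begins, so the first
   access past the (rounded) end raises SIGSEGV. */
void *guarded_alloc(size_t n)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t r = rounded_size(n);
    size_t pages = data_pages(r, page);
    size_t total = (pages + 1) * page;
    unsigned char *base = mmap(NULL, total, PROT_READ | PROT_WRITE,
                               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return NULL;
    unsigned char *guard = base + pages * page;
    if (mprotect(guard, page, PROT_NONE) != 0) {
        munmap(base, total);
        return NULL;
    }
    return guard - r;
}

/* The caller must pass the same n that was given to guarded_alloc(). */
void guarded_free(void *p, size_t n)
{
    if (p == NULL)
        return;
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t r = rounded_size(n);
    size_t pages = data_pages(r, page);
    unsigned char *guard = (unsigned char *)p + r;
    munmap(guard - pages * page, (pages + 1) * page);
}
```

Running the program under gdb then stops at the exact instruction that writes past the end of a block.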
You might be overwriting the stack, or you might be overwriting the heap.
You can try adding the flag -fstack-protector-all to your GCC command line options to ask for some stack-smashing reporting to be built into the program. This might cause it to fail sooner.
Another possibility is to look at the address reported in dmesg output and see if you can't track down the function/memory that is being smashed:
[68303.941351] broken[13301]: segfault at 7f0061616161 ip 000000000040053d sp 00007fffd4ad3980 error 4 in broken[400000+1000]
readelf -s will dump the symbol table, we can look for the function that is triggering the problem:
$ readelf -s broken | grep 4005
40: 00000000004005e0 0 FUNC LOCAL DEFAULT 13 __do_global_ctors_aux
47: 0000000000400540 2 FUNC GLOBAL DEFAULT 13 __libc_csu_fini
57: 0000000000400550 137 FUNC GLOBAL DEFAULT 13 __libc_csu_init
63: 0000000000400515 42 FUNC GLOBAL DEFAULT 13 main
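The lookup is simple arithmetic: a FUNC symbol covers [Value, Value + Size), with Size printed in decimal by readelf -s. The faulting ip 0x40053d lies inside main, which spans 0x400515 up to 0x40053f exclusive (0x400515 + 42). A sketch of that check (struct sym and symbol_for() are hypothetical names):

```c
#include <stddef.h>
#include <stdint.h>

struct sym {
    uint64_t value;     /* readelf "Value" column */
    uint64_t size;      /* readelf "Size" column */
    const char *name;
};

/* Returns the name of the symbol whose [value, value + size) range contains
   addr, or NULL. Zero-sized symbols never match. */
const char *symbol_for(const struct sym *syms, size_t n, uint64_t addr)
{
    for (size_t i = 0; i < n; i++)
        if (addr >= syms[i].value && addr - syms[i].value < syms[i].size)
            return syms[i].name;
    return NULL;
}
```

Fed the four FUNC entries above, symbol_for() returns "main" for 0x40053d and NULL for 0x40053f.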
The main routine is the one executing when the bad pointer is used:
#include <string.h>
void f(const char *s) {
char buf[4];
strcpy(buf, s);
return;
}
int main(int argc, char* argv[]) {
f("aaaa");
f("aaaaaaaaaaaaaaaaaaaa");
return 0;
}
When main tries to return to the C library to quit, it uses a bad pointer stored in the stack frame. So look at the functions called by main, and (it's pretty easy in this trivial case) f is obviously the bugger that scribbled all over the stack frame.
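Note that even the first call, f("aaaa"), overflows: strcpy() copies four characters plus the terminating NUL, five bytes, into a four-byte buffer. One way to fix f() is to bound the copy by the destination size. A sketch with a hypothetical f_checked() that takes the buffer and its size and uses snprintf(), which truncates, NUL-terminates, and returns the length it would have needed:

```c
#include <stdio.h>

/* Copies s into buf without writing past size bytes.
   Returns 1 if s fit completely, 0 if it was truncated (or on error). */
int f_checked(const char *s, char *buf, size_t size)
{
    if (size == 0)
        return 0;
    int needed = snprintf(buf, size, "%s", s);
    return needed >= 0 && (size_t)needed < size;
}
```

With char buf[4], "aaa" fits, while both "aaaa" and the 20-character string are truncated to "aaa" instead of overwriting the stack frame.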
If you're overwriting the heap, then perhaps you could try electric fence. The downsides are pretty steep (vast memory use) but it might be just what you need to find the problem.
Valgrind memcheck isn't very good at detecting buffer overruns: it checks heap blocks, but not overruns of stack or global arrays. But you could try a patch that might.
Fix all dangling pointers and all buffer overflows.
Use pointers only where they are really needed.
See the following link: What C/C++ tools can check for buffer overflows?
You could also try the trial version of IBM Rational Purify - a pretty good tool to detect buffer overflows, memory leaks and any other memory corruption errors. Follow this link to download http://www-01.ibm.com/software/awdtools/purify/unix/
What environment are you developing on?
If you're developing on Windows, try this article http://msdn.microsoft.com/en-us/library/cc500347.aspx
Can't help you on Linux. But you say you aren't using any string functions, which suggests your application might be fairly portable. Does it fail under Windows?
If it does, our CheckPointer tool may be able to find the problem. It performs a much more careful check of how your program uses pointers than Valgrind can do, because it can see the structure and declarations in your code, and it understands the various kinds of storage uses (stack frames vs. heap). Valgrind only sees the machine instructions, and can't tell when stack frames go out of scope.
