I've been testing a C shared library for memory leaks. I got the output below, and I'd like to make sure my understanding of the output is correct.
I'm fairly well-acquainted with valgrind, but I'm used to the output having just one line below the "heap allocation" section, so I'd like to make sure I get this right. I tried to find more info in the valgrind manuals and here and other forums, but couldn't find anything.
Anyway, I ran with these parameters:
valgrind --leak-check=full --log-fd=1 --keep-debuginfo=yes --track-origins=yes
And got this output:
==28303== Conditional jump or move depends on uninitialised value(s)
==28303== at 0x82C80C0: ??? (in /opt/mqm/lib64/libmqe_r.so)
==28303== by 0x830AA05: ??? (in /opt/mqm/lib64/libmqe_r.so)
==28303== by 0x8301896: ??? (in /opt/mqm/lib64/libmqe_r.so)
==28303== by 0xB4E7E9A: func1 (in /opt/MyDir/MyExit.so_r)
==28303== by 0xB4E9FFD: func2 (in /opt/MyDir/MyExit.so_r)
==28303== by 0xB4EC62E: func3 (in /opt/MyDir/MyExit.so_r)
==28303== by 0xB4ECB7C: func4 (in /opt/MyDir/MyExit.so_r)
==28303== by 0x83AD1B7: ??? (in /opt/mqm/lib64/libmqe_r.so)
==28303== by 0x831DA90: ??? (in /opt/mqm/lib64/libmqe_r.so)
==28303== by 0x8301A1C: ??? (in /opt/mqm/lib64/libmqe_r.so)
==28303== by 0x838DBCA: ??? (in /opt/mqm/lib64/libmqe_r.so)
==28303== by 0x606E8A: ??? (in /opt/mqm/bin/dmpmqmsg)
==28303== Uninitialised value was created by a heap allocation
==28303== at 0x6C29F73: malloc (vg_replace_malloc.c:309)
==28303== by 0x8200401: ??? (in /opt/mqm/lib64/libmqe_r.so)
==28303== by 0x848331B: ??? (in /opt/mqm/lib64/libmqe_r.so)
==28303== by 0x839DC40: ??? (in /opt/mqm/lib64/libmqe_r.so)
==28303== by 0x83167C4: ??? (in /opt/mqm/lib64/libmqe_r.so)
==28303== by 0x83197D1: ??? (in /opt/mqm/lib64/libmqe_r.so)
==28303== by 0x830092D: ??? (in /opt/mqm/lib64/libmqe_r.so)
==28303== by 0x83833B4: ??? (in /opt/mqm/lib64/libmqe_r.so)
==28303== by 0x838EE0C: ??? (in /opt/mqm/lib64/libmqe_r.so)
==28303== by 0x60316B: ??? (in /opt/mqm/bin/dmpmqmsg)
==28303== by 0x7680554: (below main) (in /usr/lib64/libc-2.17.so)
(changed func names for security reasons)
Here's my understanding:
Memory was first allocated in "libc-2.17.so" (or was it in "dmpmqmsg"?), and was not initialized.
After that, the call sequence was as follows:
libc-2.17.so >> dmpmqmsg >> libmqe_r.so >> MyExit.so_r (func4) >> ... >> MyExit.so_r (func1) >> libmqe_r.so
Finally the "conditional jump" which valgrind notified about was in libmqe_r.so at 0x82C80C0, probably an if(pmem != NULL) somewhere in this library
Is this the right interpretation?
In case it's relevant - I'm running on RedHat linux. My code is an MQ Exit which I compiled as a shared library.
First, looking at the "heap allocation" callstack. At the top is malloc (which has been intercepted by memcheck). Then you have a series of 8 calls in libmqe_r.so, all without debug info. Then there's the call from the guest executable, dmpmqmsg. The last line in the callstack, in libc, is the startup function that calls main.
Next, the actual error. Without debug info, it will be difficult to be certain. It looks like your func1 is calling a chain of 3 functions in libmqe_r.so, and passing in some uninitialized heap memory. It's also possible that your code is innocent and the topmost function is accessing some uninitialized static or global object.
I have almost no experience with MQ. There may be a package related to MQ that contains debugging symbols. Installing that would probably help.
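To illustrate how the two stacks in a report like this relate, here is a minimal sketch in C. All names in it (msg_header, header_new_uninit, header_new_zeroed, header_is_urgent) are hypothetical and have nothing to do with MQ's real API. It only assumes the general pattern described above: one function allocates a block with malloc and leaves part of it unwritten, and another function later branches on the unwritten part.

```c
#include <stdlib.h>

/* Hypothetical structure standing in for whatever the library allocates. */
struct msg_header {
    int length;
    int flags;   /* never written by header_new_uninit() */
};

/* Corresponds to the "heap allocation" stack: malloc() returns memory
 * with indeterminate contents, and only one field is filled in. */
struct msg_header *header_new_uninit(int length)
{
    struct msg_header *h = malloc(sizeof *h);
    if (h != NULL)
        h->length = length;
    return h;
}

/* Same allocation, but calloc() zero-fills the block, so every field
 * has a defined value. */
struct msg_header *header_new_zeroed(int length)
{
    struct msg_header *h = calloc(1, sizeof *h);
    if (h != NULL)
        h->length = length;
    return h;
}

/* Corresponds to the "Conditional jump" stack: the branch on h->flags
 * is where memcheck complains if flags was never initialised. */
int header_is_urgent(const struct msg_header *h)
{
    if (h->flags & 1)
        return 1;
    return 0;
}
```

Passing the result of header_new_uninit() to header_is_urgent() under memcheck with --track-origins=yes should give a report of the same shape: the first stack ends at the branch, the second at the malloc call. Whichever code owns the allocation is the code that has to initialise it.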
This is my first time working with valgrind, and I am wondering whether these errors are something serious that I should worry about, or whether I can just ignore them. My program is just a simple SDL2 2D space game, and I have no clue where these memory leaks could come from.
==9173== Conditional jump or move depends on uninitialised value(s)
==9173== at 0xA0E1343: ??? (in /usr/lib/x86_64-linux-gnu/libLLVM-10.so.1)
==9173== by 0xA0215E7: llvm::MachineFunctionPass::runOnFunction(llvm::Function&) (in /usr/lib/x86_64-linux-gnu/libLLVM-10.so.1)
==9173== by 0x9E8BD75: llvm::FPPassManager::runOnFunction(llvm::Function&) (in /usr/lib/x86_64-linux-gnu/libLLVM-10.so.1)
==9173== by 0x9E8BFF2: llvm::FPPassManager::runOnModule(llvm::Module&) (in /usr/lib/x86_64-linux-gnu/libLLVM-10.so.1)
==9173== by 0x9E8C49F: llvm::legacy::PassManagerImpl::run(llvm::Module&) (in /usr/lib/x86_64-linux-gnu/libLLVM-10.so.1)
==9173== by 0xAFD7B34: llvm::MCJIT::emitObject(llvm::Module*) (in /usr/lib/x86_64-linux-gnu/libLLVM-10.so.1)
==9173== by 0xAFD7F1D: llvm::MCJIT::generateCodeForModule(llvm::Module*) (in /usr/lib/x86_64-linux-gnu/libLLVM-10.so.1)
==9173== by 0xAFD86AD: llvm::MCJIT::finalizeObject() (in /usr/lib/x86_64-linux-gnu/libLLVM-10.so.1)
==9173== by 0xAF9C87F: LLVMGetPointerToGlobal (in /usr/lib/x86_64-linux-gnu/libLLVM-10.so.1)
==9173== by 0x84B0041: ??? (in /usr/lib/x86_64-linux-gnu/dri/swrast_dri.so)
==9173== by 0x84A49EF: ??? (in /usr/lib/x86_64-linux-gnu/dri/swrast_dri.so)
==9173== by 0x8490937: ??? (in /usr/lib/x86_64-linux-gnu/dri/swrast_dri.so)
And here it mentions some memory leaks. But I have checked my code for leaks so many times that I think they must be in the SDL library.
17 bytes in 1 blocks are definitely lost in loss record 10 of 1,977
==9173== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==9173== by 0x4EC85A6: _XlcDefaultMapModifiers (in /usr/lib/x86_64-linux-gnu/libX11.so.6.3.0)
==9173== by 0x4EC897A: XSetLocaleModifiers (in /usr/lib/x86_64-linux-gnu/libX11.so.6.3.0)
==9173== by 0x4923824: ??? (in /home/coder/Desktop/game/libSDL2-2.0.so.0)
==9173== by 0x492A45A: ??? (in /home/coder/Desktop/game/libSDL2-2.0.so.0)
==9173== by 0x48FCF6A: ??? (in /home/coder/Desktop/game/libSDL2-2.0.so.0)
==9173== by 0x486C8E6: ??? (in /home/coder/Desktop/game/libSDL2-2.0.so.0)
==9173== by 0x10972A: main (projekt.c:115)
==9173==
==9173== 112 (56 direct, 56 indirect) bytes in 1 blocks are definitely lost in loss record 1,922 of 1,977
==9173== at 0x483DD99: calloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==9173== by 0x880D07E: ???
==9173== by 0x8488C3B: ???
==9173== by 0x84737A5: ???
==9173== by 0x847386C: ???
==9173== by 0x8474479: ???
==9173== by 0x8437A33: ???
==9173== by 0x843A35C: ???
==9173== by 0x84352BC: ???
==9173== by 0x83F8357: ???
==9173== by 0x841C33D: ???
==9173== by 0x8419C76: ???
==9173==
==9173== LEAK SUMMARY:
==9173== definitely lost: 73 bytes in 2 blocks
==9173== indirectly lost: 56 bytes in 1 blocks
==9173== possibly lost: 0 bytes in 0 blocks
==9173== still reachable: 330,333 bytes in 2,678 blocks
==9173== suppressed: 0 bytes in 0 blocks
Could someone explain to me what those errors mean?
This kind of library will always have some leaks, unfortunately. You can check this post for further details, or look for more answers about SDL, OpenGL, or whichever graphics library you use, but long story short, it will almost always happen.
All the leaks you should focus on are the ones which are traced back to the code you wrote yourself.
I recommend launching valgrind --leak-check=full --show-reachable=yes instead of just valgrind, it will display your errors more precisely.
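For comparison, here is a minimal sketch of what a leak in your own code looks like and how the "direct" and "indirect" byte counts in a loss record arise. The node type and both functions are hypothetical. If the only pointer to a parent block is dropped without freeing it, memcheck reports the parent as definitely lost (direct) and anything reachable only through it as indirectly lost.

```c
#include <stdlib.h>

/* Hypothetical two-level structure: a parent block that owns a child. */
struct node {
    struct node *child;
    int value;
};

/* Allocates one node. The caller owns the result and, through it,
 * the child. Returns NULL if malloc fails (the child is untouched). */
struct node *node_new(int value, struct node *child)
{
    struct node *n = malloc(sizeof *n);
    if (n != NULL) {
        n->value = value;
        n->child = child;
    }
    return n;
}

/* Frees the whole chain and returns how many blocks were released.
 * Skipping this call for a chain of two nodes would leave one block
 * definitely lost and one block indirectly lost. */
int node_free_all(struct node *n)
{
    int freed = 0;
    while (n != NULL) {
        struct node *next = n->child;
        free(n);
        n = next;
        freed++;
    }
    return freed;
}
```

A loss record whose stack passes through functions like these in your own source files is one to fix. A record whose stack never leaves libX11 or the graphics driver is usually outside your control.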
Here is my code:
#include <stdlib.h>
#include <stdio.h>
int main() {
int* temp = malloc(sizeof(int));
*temp = 10;
free(temp);
return 0;
}
but when I try to use valgrind on it, I get a ton of errors like
Conditional jump or move depends on uninitialised value(s)
at 0x41C540: ??? (in /home/path)
by 0x4711EA: ??? (in /home/path)
by 0x448DE4: ??? (in /home/path)
by 0x44A995: ??? (in /home/path)
by 0x40249B: ??? (in /home/path)
by 0x401B6D: ??? (in /home/path)
by 0x1FFF000657: ???
and I can't understand why I get them.
Adding --track-origins=yes doesn't change the errors.
I also tried using calloc instead of malloc, but the errors remain.
Can someone help me?
To give more precise stack traces containing line numbers and file name, valgrind uses the debug info.
You should compile with the -g option.
Note, however, that I do not see how the small program you have shown could produce these errors. So, clearly, they are coming from system libraries, and errors of that kind should normally not be shown. Most likely your valgrind installation is unusual and/or the setup of your system libraries is not typical.
Here is the valgrind output from a project:
==2433== Invalid free() / delete / delete[] / realloc()
==2433== at 0x402B06C: free (in /usr/lib/valgrind/vgpreload_memcheck-x86-linux.so)
==2433== by 0x43F345B: av_freep (mem.c:172)
==2433== by 0x5A6F4D2: (below main) (libc-start.c:226)
==2433== Address 0xb3fd830 is 48 bytes inside a block of size 111,634 alloc'd
==2433== at 0x402BE68: malloc (in /usr/lib/valgrind/vgpreload_memcheck-x86-linux.so)
==2433== by 0x80BB6B5: _talloc_realloc (talloc.c:997)
The line starting with Address is indented by one space more than the line starting with Invalid. Does that mean one leads on to the other? Or are they separate?
If they are separate, where does the by 0x5A6F4D2: (below main) (libc-start.c:226) come from? I get the feeling (below main) has something to do with it, but I can't find libc-start.c anywhere on my hard drive.
Yes, it is providing you with additional details on the invalid free. The first four lines describe the invalid call (free in this case) and the call stack at the time of the free. The following three lines provide additional data. In this case, valgrind recognizes that the address passed to free is contained within an allocated region, and it provides the offset, size of the block, and call stack of that allocation.
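As a rough illustration of the "48 bytes inside a block of size 111,634" wording, the sketch below derives a pointer into the middle of a malloc'd block. The function names are hypothetical, and the sizes are simply copied from the report. The invalid call itself is left as a comment because executing it is undefined behaviour.

```c
#include <stddef.h>
#include <stdlib.h>

/* Returns how far p points into the block that starts at base. */
ptrdiff_t offset_in_block(const void *base, const void *p)
{
    return (const char *)p - (const char *)base;
}

/* Allocates a block, derives an interior pointer, and releases the
 * block correctly. Returns the interior offset, or -1 if malloc fails. */
ptrdiff_t demo_interior_pointer(void)
{
    char *block = malloc(111634);
    if (block == NULL)
        return -1;

    char *inside = block + 48;
    ptrdiff_t off = offset_in_block(block, inside);

    /* free(inside); would be the invalid free: "inside" was not
     * returned by malloc, it merely points into the block. */
    free(block);   /* only the original pointer may be freed */
    return off;
}
```

In a case like this, the second stack in the report (the one under Address) is the malloc call that created the block, and the first stack is the free call that received the interior pointer.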
According to valgrind.org, the hierarchy should be flat, as shown below:
==3016== Invalid write of size 1
==3016== at 0x80484DA: main (in /jfs/article/sample2)
==3016== by 0x40271507: __libc_start_main (../sysdeps/generic/libc-start.c:129)
==3016== by 0x80483B1: free@@GLIBC_2.0 (in /jfs/article/sample2)
==3016== Address 0x40CA0224 is 0 bytes after a block of size 512 alloc'd
==3016== at 0x400483E4: malloc (vg_clientfuncs.c:100)
==3016== by 0x80484AA: main (in /jfs/article/sample2)
==3016== by 0x40271507: __libc_start_main (../sysdeps/generic/libc-start.c:129)
==3016== by 0x80483B1: free@@GLIBC_2.0 (in /jfs/article/sample2)
I would treat the indentation of Address in your output as the above, as it may be a version-specific change to make the output more readable.
Before my program even starts I am receiving uninitialized value messages that reference function calls that are not in my program. I am confused as to why I am receiving these messages and how I can clean them up?
==24266== Conditional jump or move depends on uninitialised value(s)
==24266== at 0x809098A: __linkin_atfork (in /home/mbarry/workspace/datapup/src/plugin)
==24266== by 0x80919EB: _dl_non_dynamic_init (in /home/mbarry/workspace/datapup/src/plugin)
==24266== by 0x80921B1: __libc_init_first (in /home/mbarry/workspace/datapup/src/plugin)
==24266== by 0x805F60B: (below main) (in /home/mbarry/workspace/datapup/src/plugin)
==24266== Uninitialised value was created
==24266== at 0x8091662: _dl_sysinfo_int80 (in /home/mbarry/workspace/datapup/src/plugin)
==24266== by 0x80BE31F: brk (in /home/mbarry/workspace/datapup/src/plugin)
==24266== by 0x808DE99: sbrk (in /home/mbarry/workspace/datapup/src/plugin)
==24266== by 0x805F96B: __libc_setup_tls (in /home/mbarry/workspace/datapup/src/plugin)
==24266== by 0x805FB66: __pthread_initialize_minimal (in /home/mbarry/workspace/datapup/src/plugin)
==24266== by 0x805F5A3: (below main) (in /home/mbarry/workspace/datapup/src/plugin)
The cause turned out to be incorrect use of the -D_THREAD_SAFE -D_REENTRANT -static flags in my gcc makefile.
I wrote a C-based application that appears to run fine, except on very large datasets as input.
With large input, I get a segmentation fault at the end steps of the binary's functionality.
I ran the binary (with the test input) with valgrind:
valgrind --tool=memcheck --leak-check=yes /foo/bar/baz inputDataset > outputAnalysis
This job normally takes a few hours, but with valgrind it took seven days.
Unfortunately, at this point, I don't know how to read the results I am getting from this run.
I get a lot of these warnings:
...
==4074== Conditional jump or move depends on uninitialised value(s)
==4074== at 0x435900: ??? (in /foo/bar/baz)
==4074== by 0x439CC5: ??? (in /foo/bar/baz)
==4074== by 0x400BF2: ??? (in /foo/bar/baz)
==4074== by 0x402086: ??? (in /foo/bar/baz)
==4074== by 0x402A0F: ??? (in /foo/bar/baz)
==4074== by 0x41684F: ??? (in /foo/bar/baz)
==4074== by 0x4001B8: ??? (in /foo/bar/baz)
==4074== by 0x7FEFFFF57: ???
==4074== Uninitialised value was created
==4074== at 0x461D3A: ??? (in /foo/bar/baz)
==4074== by 0x43F926: ??? (in /foo/bar/baz)
==4074== by 0x416B9B: ??? (in /foo/bar/baz)
==4074== by 0x416725: ??? (in /foo/bar/baz)
==4074== by 0x4001B8: ??? (in /foo/bar/baz)
==4074== by 0x7FEFFFF57: ???
...
There are no parts of code hinted at, no names of variables, etc. What can I do with this information?
At the end, I finally get the following error, but, as with smaller datasets that do not crash, valgrind finds no leaks:
...
==4074== Process terminating with default action of signal 11 (SIGSEGV)
==4074== Access not within mapped region at address 0x7158E7F7
==4074== at 0x7158E7F7: ???
==4074== by 0x4020B8: ??? (in /foo/bar/baz)
==4074== by 0x6322203A22656D6E: ???
==4074== by 0x306C675F6E557267: ???
==4074== by 0x202C22373232302F: ???
==4074== by 0x6D616E656C696621: ???
==4074== by 0x72686322203A2264: ???
==4074== by 0x3030306C675F6E54: ???
==4074== by 0x346469702E373231: ???
==4074== by 0x646469662E34372F: ???
==4074== by 0x722E64616568656B: ???
==4074== by 0x63656D6F6C756764: ???
==4074== If you believe this happened as a result of a stack
==4074== overflow in your program's main thread (unlikely but
==4074== possible), you can try to increase the size of the
==4074== main thread stack using the --main-stacksize= flag.
==4074== The main thread stack size used in this run was 10485760.
==4074==
==4074== HEAP SUMMARY:
==4074== in use at exit: 0 bytes in 0 blocks
==4074== total heap usage: 0 allocs, 0 frees, 0 bytes allocated
==4074==
==4074== All heap blocks were freed -- no leaks are possible
==4074==
==4074== For counts of detected and suppressed errors, rerun with: -v
==4074== ERROR SUMMARY: 1603141870 errors from 86 contexts (suppressed: 0 from 0)
Segmentation fault
Everything I allocate space for gets an equivalent free statement, after which I set pointers to NULL.
At this point, how can I best debug this application, to determine what else is causing the segmentation fault?
22 Dec 2011 - Edit
I compiled a debug-version of my binary, called debug-binary, using the following compilation flags:
-D_FILE_OFFSET_BITS=64 -D_LARGEFILE64_SOURCE=1 -DUSE_ZLIB -g -O0 -Wformat -Wall -pedantic -std=gnu99
When I run it with valgrind, I don't get much more information:
valgrind -v --tool=memcheck --leak-check=yes --error-limit=no --track-origins=yes debug-binary input > output
Here's a snippet of output:
==25116== 2 errors in context 14 of 14:
==25116== Invalid read of size 4
==25116== at 0x4045E8: ??? (in /foo/bar/debug-binary)
==25116== by 0x40682F: ??? (in /foo/bar/debug-binary)
==25116== by 0x404F0C: ??? (in /foo/bar/debug-binary)
==25116== by 0x401FA4: ??? (in /foo/bar/debug-binary)
==25116== by 0x402016: ??? (in /foo/bar/debug-binary)
==25116== by 0x403B27: ??? (in /foo/bar/debug-binary)
==25116== by 0x40295E: ??? (in /foo/bar/debug-binary)
==25116== by 0x31A021D993: (below main) (in /lib64/libc-2.5.so)
==25116== Address 0x539f188 is 24 bytes inside a block of size 48 free'd
==25116== at 0x4A05D21: free (vg_replace_malloc.c:325)
==25116== by 0x401F6B: ??? (in /foo/bar/debug-binary)
==25116== by 0x402016: ??? (in /foo/bar/debug-binary)
==25116== by 0x403B27: ??? (in /foo/bar/debug-binary)
==25116== by 0x40295E: ??? (in /foo/bar/debug-binary)
==25116== by 0x31A021D993: (below main) (in /lib64/libc-2.5.so)
Is this an issue with my binary, or with a system library (libc) that my application is dependent upon?
I also don't know what to do about interpreting the ??? entries. Is there another compilation flag I need to get valgrind to provide more information?
Valgrind basically says there are no notable heap management issues. The program is segfaulting from a less complex programming fault.
If it were me, I would
compile it with gcc -g,
enable core dump files (ulimit -c unlimited),
run the program normally,
and let it fault
use gdb to examine the core file and look at what it was doing when it faulted:
gdb (programfile) (corefile)
bt
I don't believe valgrind is able to find all errors where you've overrun a value on the stack (but not overrun the stack itself). So, you may want to try gcc's -fstack-protector-all option.
You should also try mudflap, with -fmudflap (single-threaded) or -fmudflapth (multi-threaded). Note that mudflap was removed in GCC 4.9; on newer compilers, -fsanitize=address (AddressSanitizer) fills a similar role.
Both mudflap and stack protector should be much faster than valgrind.
In addition, it looks like you don't have debug symbols, which makes reading backtraces difficult. Add -ggdb.
You probably also want to enable core-file generation (try ulimit -c unlimited). This way, you can try to debug the process post-crash by using gdb program core.
As @wallyk indicates, your segfault may actually be something fairly easy to find. For example, maybe you're dereferencing NULL, and gdb can point you to the exact line (or at least close to it, unless you compile with -O0). This would make sense if, for example, you're simply running out of memory on your larger datasets, so malloc returns NULL, and you forgot to check for that somewhere.
Lastly, if nothing else makes sense, there is always the possibility of hardware issues. But those would be expected to be fairly random, e.g., different values getting corrupted on different runs. If you try a different machine and it happens there too, it's extremely unlikely to be a hardware issue.
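The unchecked-malloc scenario mentioned above can be guarded against with a small wrapper. The following is only a sketch, and alloc_records and make_sequence are hypothetical names. It also rejects a byte count that would overflow size_t, which is another way very large inputs can lead to an undersized buffer.

```c
#include <stdint.h>
#include <stdlib.h>

/* Allocates room for count records of record_size bytes each.
 * Returns NULL if the byte count would overflow size_t or if
 * malloc itself fails. */
void *alloc_records(size_t count, size_t record_size)
{
    if (record_size != 0 && count > SIZE_MAX / record_size)
        return NULL;
    return malloc(count * record_size);
}

/* Builds an array holding 0, 1, ..., count-1. The NULL check is the
 * step that is easy to forget: without it, the loop below would
 * write through a null pointer on large inputs. */
int *make_sequence(size_t count)
{
    int *values = alloc_records(count, sizeof *values);
    if (values == NULL)
        return NULL;
    for (size_t i = 0; i < count; i++)
        values[i] = (int)i;
    return values;
}
```

Callers still have to test the return value of make_sequence and report the failure instead of carrying on.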
The "Conditional jump or move depends on uninitialised value" is a serious bug you need to fix. It indicates that the behaviour of your program is affected by the contents of an uninitialised variable (including an uninitialised memory region returned by malloc()).
To get readable backtraces from valgrind you need to compile with -g.
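The "Invalid read of size 4" context in the edited question reports an address "24 bytes inside a block of size 48 free'd", which is how memcheck describes a read through a pointer to memory that has already been freed. Setting a pointer to NULL after free only helps for that one variable; any other copy of the pointer still refers to the freed block. The sketch below uses a hypothetical record type and helper names to show the pattern of clearing the caller's pointer and checking it before use.

```c
#include <stdlib.h>

/* Hypothetical record type, used only for illustration. */
struct record {
    char name[24];
    int  score;
};

/* Frees *slot and clears the caller's pointer. Copies of the pointer
 * held elsewhere are NOT cleared and must not be used afterwards. */
void record_release(struct record **slot)
{
    if (slot != NULL && *slot != NULL) {
        free(*slot);
        *slot = NULL;
    }
}

/* Reads the score only if the pointer is non-NULL; otherwise returns
 * the fallback value. */
int record_score_or(const struct record *r, int fallback)
{
    return r != NULL ? r->score : fallback;
}
```

In a report like the one above, the second stack (under Address) shows where the block was freed, and the first shows the later read, so comparing the two is usually the quickest way to find the stale copy.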