I have encountered a weird issue. A process (written in C) is leaking memory, but I am not able to locate why. Memory usage increases continuously while the process handles traffic, and at some point the OS (Linux) kills it with an 'out of memory' error.
I tried to debug this using valgrind with the following flags:
--show-reachable=yes --leak-check=full --show-leak-kinds=all --track-origins=yes --verbose --track-fds=yes --num-callers=20 --log-file=/tmp/valgrind-out.txt
The output file is as follows:
==5564== LEAK SUMMARY:
==5564== definitely lost: 0 bytes in 0 blocks
==5564== indirectly lost: 0 bytes in 0 blocks
==5564== possibly lost: 646,916 bytes in 1,156 blocks
==5564== still reachable: 4,742,112 bytes in 2,191 blocks
==5564== suppressed: 0 bytes in 0 blocks
Definitely lost is shown as 0, and there is no indication of where it is leaking. I have gone through the still-reachable segment, and the entries all seem fine.
I won't be able to post the code, as it is a huge codebase with 100k+ lines. Basically, it sends packets over a TCP socket as a client; the server is a simple Python script that replies with a response. My code works as expected. This leak is the only trouble.
Any suggestion on debugging this issue?
Related
Should I worry about handling the case where a user sends SIGINT in the middle of using my program?
The program in question performs heap allocations and frees, so I am worried that such a situation would cause a memory leak. When I send SIGINT while the program is running, Valgrind states:
==30173== Process terminating with default action of signal 2 (SIGINT)
==30173== at 0x4ACC142: read (read.c:26)
==30173== by 0x4A4ED1E: _IO_file_underflow@@GLIBC_2.2.5 (fileops.c:517)
==30173== by 0x4A41897: getdelim (iogetdelim.c:73)
==30173== by 0x109566: main (main.c:55)
==30173==
==30173== HEAP SUMMARY:
==30173== in use at exit: 1,000 bytes in 1 blocks
==30173== total heap usage: 3 allocs, 2 frees, 3,048 bytes allocated
==30173==
==30173== LEAK SUMMARY:
==30173== definitely lost: 0 bytes in 0 blocks
==30173== indirectly lost: 0 bytes in 0 blocks
==30173== possibly lost: 0 bytes in 0 blocks
==30173== still reachable: 1,000 bytes in 1 blocks
==30173== suppressed: 0 bytes in 0 blocks
==30173== Rerun with --leak-check=full to see details of leaked memory
==30173==
==30173== For lists of detected and suppressed errors, rerun with: -s
==30173== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
The answer is OS-dependent. Most modern operating systems will clean up memory allocated by your process once it is killed (Windows, Linux, *nix in general, and more). This is usually just part of the OS memory isolation and protection system, where each process gets its own virtual memory mapping and the physical pages corresponding to that mapping are allocated / freed by way of reference counting (a killed / exited process will decrement the reference counts to its mapped physical pages and free them if they reach zero).
If you plan on running your process on obscure embedded systems with no such guarantees with respect to memory management, then perhaps you might need to worry about such a thing. Otherwise, if memory management is your only concern, then it's a non-issue.
If you want to account for other things which should happen on exit (e.g. saving state), then you will certainly need to trap SIGINT, likely along with other signals as well.
I wrote a linked list in C today at work on a Linux machine and everything checked out in Valgrind. Then I ran the same test (a handful of pushes and then deleting the list) at home on OS X and got a crazy amount of allocs.
==4344== HEAP SUMMARY:
==4344== in use at exit: 26,262 bytes in 187 blocks
==4344== total heap usage: 267 allocs, 80 frees, 32,374 bytes allocated
==4344==
==4344== LEAK SUMMARY:
==4344== definitely lost: 0 bytes in 0 blocks
==4344== indirectly lost: 0 bytes in 0 blocks
==4344== possibly lost: 0 bytes in 0 blocks
==4344== still reachable: 0 bytes in 0 blocks
==4344== suppressed: 26,262 bytes in 187 blocks
==4344==
==4344== For counts of detected and suppressed errors, rerun with: -v
==4344== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
I know the code is fine and doesn't have any leaks. So I just commented out the list test and compiled with only printf("test\n"); in main, and it showed 263 allocs with 76 frees (I had 4 intentional allocs in the list test). Why am I getting so many allocs on OS X? Is this just something the OS does? I don't understand why I'd have 263 allocs when I just did a printf...
OS X has a very bad architecture. Because libdl, libdyld, libm, libc, and some other libraries are "packed" into libSystem, all of them are initialized when the library is loaded. Most of the allocations come from dyld. Dyld is written in C and C++, which is why the C++ part may push up the number of allocs.
This is an Apple thing, not an OS X thing specifically. I have written an alternate C library, and it does not have many of these unneeded allocs.
Also, some allocs are caused by opening FILE *s. Note that three streams (stdin, stdout, and stderr) are initialized at startup.
Valgrind support on OS X is currently being actively worked on. Your best approach is to ensure you are using a SVN trunk build, and update frequently.
The errors Valgrind is reporting to you are present within the OS X system libraries. They are not the fault of your program, but because even simple programs pull in these system libraries, Valgrind continues to pick them up. Suppressions within Valgrind trunk are continually updated to catch these issues, allowing you to focus on the real problems that may be present in your code.
The following commands will allow you to use Valgrind trunk, if you're not already:
svn co svn://svn.valgrind.org/valgrind/trunk valgrind
cd valgrind
./autogen.sh
./configure
make -j4
sudo make install
Full disclosure: I'm one of the Valgrind developers who contributed patches to support OS X 10.11
The Problem
I've written a PHP extension (PHP 5.3) which appears to work fine for simple tests, but the moment I start making multiple calls to it I start seeing the error:
zend_mm_heap corrupted
normally through a console or the Apache error log. I also sometimes see the error
[Thu Jun 19 16:12:31.934289 2014] [:error] [pid 560] [client 127.0.0.1:35410] PHP Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 139678164955264 bytes) in Unknown on line 0
What I've tried to do
I've tried to find the exact spot where the issue occurs, but it appears to occur between the destructor being called for the PHP class that calls the extension and the first line of the constructor running. (Note: I have mainly used PHPUnit to diagnose this; if I run it in a browser it will usually work once, then throw the error to the log on the next attempt with 'The connection was reset' in my browser window, so no output.)
I've tried adding debug lines with memory_get_usage and installing the memprof extension, but the output fails to show any serious memory issue, and I've never seen memory usage greater than 8 MB.
I've looked at other Stack Overflow posts about changing PHP settings to deal with the zend_mm_heap corrupted issue and about disabling/enabling garbage collection, without any degree of success.
What I'm looking for
I realise that there is not enough information here to know for certain what is causing what I presume to be a memory leak. What I want to know is: what are possible and probable causes of my issue, and how can I go about diagnosing where the problem is?
Note:
I have tried building my extension with --enable-debug, but it is reported as an unrecognised argument.
Edit: Valgrind
I have run over it with valgrind and got the following output:
--24803-- REDIR: 0x4ebde30 (__GI_strncmp) redirected to 0x4c2dd20 (__GI_strncmp)
--24803-- REDIR: 0x4ec1820 (__GI_stpcpy) redirected to 0x4c2f860 (__GI_stpcpy)
Segmentation fault (core dumped)
==24803==
==24803== HEAP SUMMARY:
==24803== in use at exit: 2,401 bytes in 72 blocks
==24803== total heap usage: 73 allocs, 1 frees, 2,417 bytes allocated
==24803==
==24803== Searching for pointers to 72 not-freed blocks
==24803== Checked 92,624 bytes
==24803==
==24803== LEAK SUMMARY:
==24803== definitely lost: 0 bytes in 0 blocks
==24803== indirectly lost: 0 bytes in 0 blocks
==24803== possibly lost: 0 bytes in 0 blocks
==24803== still reachable: 2,401 bytes in 72 blocks
==24803== suppressed: 0 bytes in 0 blocks
==24803== Reachable blocks (those to which a pointer was found) are not shown.
==24803== To see them, rerun with: --leak-check=full --show-reachable=yes
==24803==
==24803== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 2 from 2)
--24803--
--24803-- used_suppression: 2 dl-hack3-cond-1
==24803==
==24803== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 2 from 2)
This suggests to me that perhaps the issue isn't a memory leak, but I am not certain.
It appears to me that your program has heap memory corruption. This is a bit difficult to pin down by looking at a code snippet or a faulty call stack. You may want to run your program under a dynamic tool (Valgrind, WinDbg/PageHeap) to track down the actual source of the error.
$ valgrind --tool=memcheck --db-attach=yes ./a.out
This way Valgrind will attach your program to the debugger when the first memory error is detected, so that you can do live debugging (GDB). This should be the best way to understand and resolve your problem. (Note that newer Valgrind releases removed --db-attach in favour of the --vgdb / --vgdb-error mechanism.)
Allowed memory size of 134217728 bytes exhausted (tried to allocate
139678164955264 bytes) in Unknown on line 0
It looks like a signed-to-unsigned conversion is happening somewhere in your program. Allocators normally take a size parameter of an unsigned type, so a negative value gets interpreted as a very large size, and under that scenario the allocation fails.
In my program, even if I do all the obvious housekeeping, such as calling cairo_destroy(), cairo_surface_destroy(), and so on, Valgrind always finds memory leaks; the leaks are in cairo's dependencies (freetype, pixman, ...). How do I clean up after cairo so that Valgrind won't detect any leaks, or are the leaks normal?
Sample output
==1861== HEAP SUMMARY:
==1861== in use at exit: 1,996,663 bytes in 532 blocks
==1861== total heap usage: 21,915 allocs, 21,383 frees, 95,411,698 bytes allocated
==1861==
==1861== LEAK SUMMARY:
==1861== definitely lost: 0 bytes in 0 blocks
==1861== indirectly lost: 0 bytes in 0 blocks
==1861== possibly lost: 0 bytes in 0 blocks
==1861== still reachable: 1,996,663 bytes in 532 blocks
==1861== suppressed: 0 bytes in 0 blocks
==1861== Reachable blocks (those to which a pointer was found) are not shown.
==1861== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==1861==
==1861== For counts of detected and suppressed errors, rerun with: -v
==1861== Use --track-origins=yes to see where uninitialised values come from
==1861== ERROR SUMMARY: 1961 errors from 7 contexts (suppressed: 1 from 1)
UPDATE:
This question says the "leaks" are normal. Does there exist a way to do the cleanup so that Valgrind becomes happy?
For cairo there is cairo_debug_reset_static_data().
While writing this as a comment, I looked into pixman's source and the implementation of _pixman_choose_implementation(), and apparently you cannot "clean up" pixman.
I have no clue about freetype.
Edit:
For fontconfig (related to freetype, so possibly interesting here), there is FcFini().
I'm using valgrind to check about the memory usage of my C application. After the first tests valgrind reports:
"still reachable: 2,248 bytes in 1 blocks".
I checked the code but was not able to find the problem by sight, so I started commenting out sections of the code to narrow it down.
I was shocked to find that even when all I had left was
int main(void)
{
}
I STILL get the message, the only difference being the number of bytes.
I'm really puzzled with this...
Here is the complete message:
Running with options: valgrind --leak-check=full --show-reachable=yes
==2557== HEAP SUMMARY:
==2557== in use at exit: 2,248 bytes in 1 blocks
==2557== total heap usage: 362 allocs, 361 frees, 14,579 bytes allocated
==2557==
==2557== 2,248 bytes in 1 blocks are still reachable in loss record 1 of 1
==2557== at 0x4006171: calloc (vg_replace_malloc.c:593)
==2557== by 0x4D72250B: monstartup (in /usr/lib/libc-2.15.so)
==2557== by 0x8048650: __gmon_start__ (in /home/elias/Documents/SL_HTTP/Endosos/bin/Debug/Endosos)
==2557== by 0x4005421: ??? (in /usr/local/lib/valgrind/vgpreload_memcheck-x86-linux.so)
==2557==
==2557== LEAK SUMMARY:
==2557== definitely lost: 0 bytes in 0 blocks
==2557== indirectly lost: 0 bytes in 0 blocks
==2557== possibly lost: 0 bytes in 0 blocks
==2557== still reachable: 2,248 bytes in 1 blocks
==2557== suppressed: 0 bytes in 0 blocks
==2557==
==2557== For counts of detected and suppressed errors, rerun with: -v
==2557== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Profiling timer expired
I'm compiling with gcc 4.7.2 on Fedora 17.
Any advice will be appreciated. Thanks.
This is perfectly fine and safe to ignore. In this case it is memory that appears to have been allocated by profiling support (you're probably compiling with profiling enabled, or linking to some library that does).
Your environment does a number of things to set itself up before calling main, and those things can allocate memory. Since that memory will be used until the program exits, the runtime doesn't bother to free it at exit, because doing so takes time for no benefit. Most of that memory is reported as "still reachable" by Valgrind and can be safely ignored.
Thanks to all.
You were right.
I'm using Code::Blocks 12.11, and by default it had -pg enabled in the compiler settings.