I know this issue has been addressed in a few questions on Stack Overflow. Experts have pointed out that "still reachable" blocks are not a memory leak; freeing them is entirely the programmer's choice. But the problem in my case is that I am still not able to identify the pointer(s) that must be freed.
Valgrind clearly shows that there is no loss of memory, but when I keep my program, integrated with the rest of the Zabbix code, running for 3 days, I notice that free memory drops from 2.75 GB to 2.05 GB (my machine has 4 GB of RAM).
==13630== LEAK SUMMARY:
==13630== definitely lost: 0 bytes in 0 blocks
==13630== indirectly lost: 0 bytes in 0 blocks
==13630== possibly lost: 0 bytes in 0 blocks
==13630== still reachable: 15,072 bytes in 526 blocks
==13630== suppressed: 0 bytes in 0 blocks
I want to show you the whole code, which is why I am not pasting it here. Please click here to have a look at the code, which will build in Eclipse CDT.
Purpose of the code: I have rewritten this code to let the Zabbix server fetch a system value from the Zabbix agent installed on a remote machine. The code creates a file in the directory "/ZB_RQ" on the remote host containing the request "vm.memory.size[available]", and the Zabbix agent writes the proper value back into the same file, prefixed with "#" to distinguish the response from the request. For testing I have used localhost "127.0.0.1" as the remote host. The program logs in as the user "zabbix" with the password "bullet123", as you can see in the code itself. The whole operation is carried out using the libssh2 API.
Before you get started: please create the directory "/ZB_RQ" and a user "zabbix" with the password "bullet123", and install libssh2-devel on your Linux machine.
How to get the program completely working: when you run it under Valgrind, you will find (after the function "sendViaSFTP" has executed) that a file has been created in the directory "/ZB_RQ". The program waits 3 seconds for the value to come back in the same file; if no response is found, it creates a new file with the same request. So, within those 3 seconds, write a sample response into the file (say "#test") from another terminal. That way you can follow the whole execution.
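For reference, here is a rough local-filesystem sketch of the request/response cycle just described (the real program does the same thing over SFTP with libssh2; the file name, the helper, and the retry loop here are my own illustration, not the actual code):

#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Returns 1 when the agent has answered (line starts with '#'), 0 otherwise. */
static int got_response(const char *path, char *value, size_t len)
{
    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return 0;
    int answered = (fgets(value, (int)len, fp) != NULL && value[0] == '#');
    fclose(fp);
    return answered;
}

int main(void)
{
    const char *req_file = "/ZB_RQ/zbx_request";   /* hypothetical file name */
    char value[256];

    for (;;) {
        FILE *fp = fopen(req_file, "w");            /* write the request */
        if (fp == NULL)
            return 1;
        fputs("vm.memory.size[available]\n", fp);
        fclose(fp);

        sleep(3);                                   /* give the agent 3 seconds */
        if (got_response(req_file, value, sizeof value)) {
            printf("agent answered: %s", value + 1);  /* strip the leading '#' */
            break;
        }
        /* no answer yet: issue the request again */
    }
    return 0;
}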
The moment you kill the execution (Ctrl+C), Valgrind shows the result above, together with a very long list of "reachable blocks".
I have made sure to free every libssh2 object, but I still cannot figure out why memory keeps dropping. Is this happening because the "reachable blocks" keep piling up?
Even if this is not going to consume all the memory, please help me get rid of the "reachable blocks".
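As to getting rid of the reachable blocks themselves: a very common source is library-global state that is initialised once and never torn down before exit. I cannot tell from the report alone whether that is your case, but if the blocks come from libssh2's own internals, pairing the global init with the global shutdown usually makes them disappear. A minimal sketch, assuming the program does not already call libssh2_exit():

#include <libssh2.h>

int main(void)
{
    if (libssh2_init(0) != 0)   /* global init: allocates internal state */
        return 1;

    /* ... open the session, do the SFTP transfer, then release the handles
     *     with libssh2_sftp_shutdown() and libssh2_session_free() ... */

    libssh2_exit();             /* releases the global state again; without
                                   this call those allocations are reported
                                   as "still reachable" at exit */
    return 0;
}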
Related
How can I understand the discrepancy between the 21 MiB of private memory reported for my C application by ps_mem and the much smaller numbers reported by Valgrind for current allocations?
The system monitor shows the readings below.
I've also used ps_mem to check the memory in use.
Dumping /proc/2101/smaps gives the output below; 4432 is the pid of my application.
I am running Valgrind over a large code-base, with "--time-stamp=yes"
I need to find out the ACTUAL (relative) TIMESTAMPS at which each leaked block was allocated.
Problem: the Valgrind report contains the timestamps at which the leak summary is generated.
Steps:
- Run the codebase for 24 hours under Valgrind [options "--tool=memcheck --leak-check=full --time-stamp=yes"].
- Terminate the process with "kill -15" after 24 hours; the leak summary is generated.
- Timestamps in the Valgrind report = time of leak report generation [not the actual time at which the memory was allocated].
Is there any option through which I can get the ACTUAL TIMESTAMPS at which the leaked memory was allocated?
Thanks
No, there isn't, because leaks are not detected in real time - there isn't really any way to do that. Instead they are detected by scanning memory when the program finishes, to see what blocks are still reachable - anything which has been allocated but is not reachable is a leak.
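What gets reasonably close is asking Memcheck for an intermediate leak scan at points of your own choosing, via the client-request macros in <valgrind/memcheck.h>. Each scan is printed at the moment you trigger it, so with "--time-stamp=yes" the report at least brackets the window in which a block was leaked, even though it still cannot tell you the exact allocation time. A sketch (the function name is mine; call it wherever suits your code, e.g. once per hour):

/* The macros are no-ops when the program is not running under Valgrind. */
#include <valgrind/memcheck.h>

void leak_checkpoint(void)
{
    /* Report only the loss records that have grown since the previous scan;
     * VALGRIND_DO_LEAK_CHECK does a full scan instead. */
    VALGRIND_DO_ADDED_LEAK_CHECK;
}

Alternatively, running with "--vgdb=yes" lets you issue "monitor leak_check" from gdb at any moment while the program is still running.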
I have a strange problem freeing allocated memory in my MPI program.
Here is a code sample that produces the error for me:
void *out, *in;
int cnt = 2501; // if cnt <= 2500 it works perfectly; if cnt > 2500 it crashes at free!

if ((out = malloc(cnt * sizeof(double))) == NULL)
    MPI_Abort(MPI_COMM_WORLD, MPI_ERR_OP);
if ((in = malloc(cnt * sizeof(double))) == NULL)
    MPI_Abort(MPI_COMM_WORLD, MPI_ERR_OP);

// test data generation
// usage of MPI_Send and MPI_Reduce_local
// a lot of memcpy, pointer synonyms assigned to in and out, data written into in and out

free(in);  // crashes here with "munmap_chunk(): invalid pointer"
free(out); // and here (if the line above is commented out) with "double free or corruption (!prev)"
I ran it using valgrind:
mpirun -np 2 valgrind --leak-check=full --show-reachable=yes ./foo
and got the following:
==6248== Warning: ignored attempt to set SIGRT32 handler in sigaction();
==6248== the SIGRT32 signal is used internally by Valgrind
cr_libinit.c:183 cri_init: sigaction() failed: Invalid argument
==6248== HEAP SUMMARY:
==6248== in use at exit: 0 bytes in 0 blocks
==6248== total heap usage: 1 allocs, 1 frees, 25 bytes allocated
==6248==
==6248== All heap blocks were freed -- no leaks are possible
==6248==
=====================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= EXIT CODE: 134
Any ideas about how to track this error down? Note that it only appears if cnt>2500!
If you're using GNU glibc, you can set the environment variable MALLOC_CHECK_ to 2 before running your program to enable extra checking on memory-allocation calls; details here.
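For what it's worth, "munmap_chunk(): invalid pointer" and "double free or corruption" on a pointer you never touched almost always mean that some earlier write ran past the end of one of the buffers and trampled glibc's heap bookkeeping; the crashing free() is merely where the damage is first noticed. A minimal sketch of that class of bug (the element counts and the memcpy are made up for illustration, not taken from your code):

#include <stdlib.h>
#include <string.h>

int main(void)
{
    int cnt = 2501;
    double *out = malloc(cnt * sizeof(double));
    double *in  = malloc(cnt * sizeof(double));
    double *tmp = malloc((cnt + 8) * sizeof(double));
    if (out == NULL || in == NULL || tmp == NULL)
        return 1;

    /* BUG: copies cnt + 8 elements into a buffer sized for cnt elements,
     * overwriting the allocator's metadata just past the end of 'in'. */
    memcpy(in, tmp, (cnt + 8) * sizeof(double));

    free(in);    /* one of these frees aborts with a corruption message, */
    free(tmp);   /* even though the pointers passed to free() are the    */
    free(out);   /* very ones malloc() returned                          */
    return 0;
}

Run under Valgrind, the memcpy itself is flagged as an invalid write, which points at the real culprit instead of the free(); the MALLOC_CHECK_=2 setting above adds further consistency checks of this kind to every allocation call.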
The message you had above,
Warning: ignored attempt to set SIGRT32 handler in sigaction(); the
SIGRT32 signal is used internally by Valgrind cr_libinit.c:XXX
cri_init: sigaction() failed: Invalid argument
has to do with your MPI implementation's use of the BLCR checkpointing libraries (mine are mpich-3.1.4-9 and blcr-0.8.5).
When I had no BLCR support (run "mpiexec -info" and look at the "Checkpointing libraries available" line), Valgrind worked perfectly during my test phase.
When I recompiled my MPI with BLCR support (for checkpointing experiments), Valgrind stopped working entirely.
The bug is a bad one, because the two programs obviously use the same signal to interrupt your running program, and they cannot both do so. (In our case BLCR, via MPI, grabbed it first, and Valgrind was left hanging in the air.)
I will try running two different installations of MPI on the same machine (one with BLCR support and one without), and I hope to alternate happily between Valgrind and checkpointing.
UPDATE:
Even when checkpointing itself worked, it was not possible to run any parallel executables under mpiexec (programs that had worked before).
The mpiexec command itself crashed whenever I was USING the compiled-in BLCR checkpointing library.
SOLUTION:
I recompiled MPI (mpich-3.1.4-9) WITHOUT BLCR support (I took BLCR out completely) and installed the DMTCP checkpointing solution (dmtcp-2.4.4), which not only works transparently but is also faster than BLCR (you will find benchmarks in the literature).
Now everything runs as intended :) and the checkpointed jobs are handled properly too. In the future I will run heavier tests of DMTCP (using local files with heavy/active I/O from the parallel program).
PS: I also found out that mpich removed BLCR completely from their distribution (July 2016).
I have an application I have been trying to make memory-leak free. I have done solid testing on Linux using TotalView's MemoryScape and found no leaks. I have now ported the application to Solaris (SPARC), and there is a leak I am trying to find...
I have used libumem on Solaris, and it seems to me that it also picks up NO leaks...
Here is my startup command:
LD_PRELOAD=libumem.so UMEM_DEBUG=audit ./link_outbound config.ini
Then I immediately checked prstat on Solaris to see what the startup memory usage was:
PID USERNAME SIZE RSS STATE PRI NICE TIME CPU PROCESS/NLWP
9471 root 44M 25M sleep 59 0 0:00:00 1.1% link_outbou/3
Then I started to send thousands of messages to the application... and over time the prstat figures grew:
PID USERNAME SIZE RSS STATE PRI NICE TIME CPU PROCESS/NLWP
9471 root 48M 29M sleep 59 0 0:00:36 3.5% link_outbou/3
And just before I eventually stopped it:
PID USERNAME SIZE RSS STATE PRI NICE TIME CPU PROCESS/NLWP
9471 root 48M 48M sleep 59 0 0:01:05 5.3% link_outbou/3
Now the interesting part: when I use libumem on this application while it is showing 48 MB of memory, I get the following:
pgrep link
9471
# gcore 9471
gcore: core.9471 dumped
# mdb core.9471
Loading modules: [ libumem.so.1 libc.so.1 ld.so.1 ]
> ::findleaks
BYTES LEAKED VMEM_SEG CALLER
131072 7 ffffffff79f00000 MMAP
57344 1 ffffffff7d672000 MMAP
24576 1 ffffffff7acf0000 MMAP
458752 1 ffffffff7ac80000 MMAP
24576 1 ffffffff7a320000 MMAP
131072 1 ffffffff7a300000 MMAP
24576 1 ffffffff79f20000 MMAP
------------------------------------------------------------------------
Total 7 oversized leaks, 851968 bytes
CACHE LEAKED BUFCTL CALLER
----------------------------------------------------------------------
Total 0 buffers, 0 bytes
>
The "7 oversized leaks, 851968 bytes" never changes whether I send 10 messages through the application or 10,000... it is always "7 oversized leaks, 851968 bytes". Does that mean the application is not leaking according to libumem?
What is so frustrating is that on Linux the memory stays constant and never changes, yet on Solaris I see this slow but steady growth.
Any idea what this means? Am I using libumem correctly? What could be causing prstat to show memory growth here?
Any help on this would be greatly appreciated....thanks a million.
If the SIZE column doesn't grow, you're not leaking.
RSS (resident set size) is how much of that memory you are actively using, it's normal that that value changes over time. If you were leaking, SIZE would grow over time (and RSS could stay constant, or even shrink).
Check out this page. The preferred options are UMEM_DEBUG=default, UMEM_LOGGING=transaction, LD_PRELOAD=libumem.so.1; those are the options I use for debugging Solaris memory-leak problems, and they work fine for me.
Based on my experience with Red Hat EL 5 and Solaris SunOS 5.9/5.10, a Linux process's memory footprint doesn't grow gradually; instead it seems to grab a large chunk of memory when it needs more and keep using it for a long time (purely an observation; I haven't researched its allocation mechanism). So you should send a lot more data (10K messages is not much).
You can also try the dtrace tool to investigate memory problems on Solaris.
Jack
I have created 20 threads to read/write a shared file, and I have synchronized the threads.
My program works fine, but when I run it with Valgrind it gives me errors like this:
LEAK SUMMARY:
definitely lost: 0 bytes in 0 blocks.
possibly lost: 624 bytes in 5 blocks.
still reachable: 1,424 bytes in 5 blocks.
suppressed: 0 bytes in 0 blocks.
Reachable blocks (those to which a pointer was found) are not shown.
Also, when I press Ctrl+C, it gives the same errors.
I have not even malloc'ed anything, but Valgrind still complains.
Any suggestion would be appreciated.
You can run valgrind --leak-check=full ./prog_name to make sure these reachable blocks are not something you can destroy in your program. Many times, initializing a library such as libcurl without closing or destroying it will cause leaks. If it's not something you have control over, you can write a suppression file; http://valgrind.org/docs/manual/mc-manual.html section 4.4 has some information and a link to some examples.
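The libcurl case mentioned above is a handy illustration of the pattern: many libraries allocate global state on first use, and unless the matching teardown call is made before exit, those allocations are reported as "still reachable". A sketch using libcurl purely as the example (the same idea applies to whatever library your threads happen to use):

#include <curl/curl.h>

int main(void)
{
    curl_global_init(CURL_GLOBAL_DEFAULT);   /* allocates library-wide state */

    /* ... create easy handles with curl_easy_init(), use them, and release
     *     each one with curl_easy_cleanup() ... */

    curl_global_cleanup();                   /* without this call the global
                                                state shows up as "still
                                                reachable" at program exit */
    return 0;
}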
Still reachable blocks are probably caused by your standard library not freeing memory used in pools for standard containers (see this FAQ): that is a performance optimisation for program exit, since the memory is immediately going to be returned to the operating system anyway.
"Possibly lost" blocks are probably caused by the same thing.
The Valgrind Manual page for memcheck has a good explanation about the different sorts of leaks detected.