Premise: I'm using Eclipse.
In my quest to debug a multithreaded application, I first ran Valgrind memcheck, which gave me a bunch of errors, but I couldn't identify which lines of code these errors originated from.
I then created a profile to run Valgrind on the Debug build, and it gave me an error,
"Invalid read of size 1", that pointed to a line in the source code, which allowed me to fix it. Now Valgrind runs on the Debug build report no errors, but if I run Valgrind on the Release build I get errors that I cannot pinpoint:
==5083== 16 bytes in 1 blocks are definitely lost in loss record 2 of 4
==5083== at 0x4C29F90: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==5083== by 0x400F67: main (in /home/crysis/workspace/ReliableUPDserver/Release/ReliableUPDserver)
==5083==
==5083== 16 bytes in 1 blocks are definitely lost in loss record 3 of 4
==5083== at 0x4C29F90: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==5083== by 0x400FA1: main (in /home/crysis/workspace/ReliableUPDserver/Release/ReliableUPDserver)
==5083==
==5083== 512 bytes in 1 blocks are possibly lost in loss record 4 of 4
==5083== at 0x4C2C080: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==5083== by 0x400F3E: main (in /home/crysis/workspace/ReliableUPDserver/Release/ReliableUPDserver)
==5083==
How come these errors appear only with the Release build? What can I do to get more information?
Also, my multithreaded program hangs somewhere; is this the right way to try to find out where the problem is?
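One hedged suggestion: Valgrind can only name source lines when the binary carries debug information, which Release configurations typically omit (optimisation can also inline or reorder code, which is one plausible reason the two builds report differently). A minimal sketch of rebuilding the Release binary with symbols, assuming a plain gcc command line (the source file name and the pthread flag are assumptions; in Eclipse the equivalent switches live under the project's compiler settings):

gcc -O2 -g -o ReliableUPDserver main.c -lpthread
valgrind --leak-check=full ./ReliableUPDserver

Keeping the optimisation level while adding -g preserves the Release behaviour but lets Valgrind resolve the malloc call sites inside main.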
Related
I have a C program that takes various command line arguments, e.g.
./Coupled arg1 arg2
And when I run this with valgrind as
valgrind ./Coupled arg1 arg2
I get no memory leaks. But when I use a bash script, called run, of the form
arg1=thing1
arg2=thing2
./Coupled $arg1 $arg2
and then run
valgrind ./run
I get a lot of still-reachable memory leakage. I have read that still-reachable memory leakage isn't a huge problem, but I would quite like to know why this is happening. When running Valgrind with the --leak-check=full --show-leak-kinds=all flags, an example bit of output is (the full Valgrind output is many pages long):
==4518== 1 bytes in 1 blocks are still reachable in loss record 1 of 269
==4518== at 0x4C29BE3: malloc (vg_replace_malloc.c:299)
==4518== by 0x46A3DA: xmalloc (in /usr/bin/bash)
==4518== by 0x437219: make_variable_value (in /usr/bin/bash)
==4518== by 0x438230: ??? (in /usr/bin/bash)
==4518== by 0x43A35E: initialize_shell_variables (in /usr/bin/bash)
==4518== by 0x41DD92: ??? (in /usr/bin/bash)
==4518== by 0x41C482: main (in /usr/bin/bash)
valgrind ./run will debug the shell and not your program.
Take a look at the output and see how it mentions, for example,
==4518== by 0x41C482: main (in /usr/bin/bash)
[Emphasis mine]
If you want to debug your program, you need to run valgrind in the script:
arg1=thing1
arg2=thing2
valgrind ./Coupled $arg1 $arg2
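Alternatively, if you would rather keep invoking the wrapper script, Valgrind can follow the processes the shell spawns; a sketch using its --trace-children flag:

valgrind --trace-children=yes ./run

This still mixes in bash's own still-reachable blocks, so moving the valgrind call inside the script, as above, gives the cleanest output.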
I'm trying to understand some differences I've noticed when compiling a simple C program with gcc on Ubuntu (Canonical Ubuntu) and on Alpine (in a Docker container).
The program is the following:
#include <stdio.h>

int main(void)
{
    printf("test\n");
    return 0;
}
The command used to compile is the same for each terminal (Ubuntu and Alpine).
Valgrind detects no error on Ubuntu and 1 error on Alpine:
==311== Invalid free() / delete / delete[] / realloc()
==311== at 0x4C939EA: free (vg_replace_malloc.c:530)
==311== by 0x4057B69: ??? (in /lib/ld-musl-x86_64.so.1)
==311== Address 0x4e9b180 is in a rw- mapped file
/usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so segment
==311==
test
==311==
==311== HEAP SUMMARY:
==311== in use at exit: 404 bytes in 1 blocks
==311== total heap usage: 1 allocs, 1 frees, 404 bytes allocated
==311==
==311== LEAK SUMMARY:
==311== definitely lost: 0 bytes in 0 blocks
==311== indirectly lost: 0 bytes in 0 blocks
==311== possibly lost: 0 bytes in 0 blocks
==311== still reachable: 404 bytes in 1 blocks
==311== suppressed: 0 bytes in 0 blocks
==311== Rerun with --leak-check=full to see details of leaked memory
==311==
==311== For counts of detected and suppressed errors, rerun with: -v
==311== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
What is the explanation for that?
Valgrind calls special glibc functions (such as __libc_freeres) to deallocate memory on process exit, while normally glibc just lets the kernel do that. musl presumably doesn't provide that hook because it considers it unnecessary bloat.
Valgrind also has suppression files to deal with false positives or useless reports from system libraries. Some porting work is required to create them, and it looks like Alpine hasn't done that yet, or the files have become obsolete due to further musl development.
Sometimes suppression files require debuginfo symbols to match, and Valgrind couldn't find them in the run you quoted, so installing those is another thing to try.
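If you want to hide this particular report yourself, here is a minimal sketch of a suppression entry for the musl frame above (the entry name is made up, and the exact frames are best confirmed with a --gen-suppressions=all run):

{
   musl-ldso-invalid-free
   Memcheck:Free
   fun:free
   obj:/lib/ld-musl-x86_64.so.1
}

Saved as, say, musl.supp, it is applied with valgrind --suppressions=musl.supp ./a.out.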
I wrote a linked list in C today at work on a Linux machine, and everything checked out in Valgrind. Then I ran the same test (a handful of pushes and then deleting the list) at home on OS X and got a crazy number of allocs.
==4344== HEAP SUMMARY:
==4344== in use at exit: 26,262 bytes in 187 blocks
==4344== total heap usage: 267 allocs, 80 frees, 32,374 bytes allocated
==4344==
==4344== LEAK SUMMARY:
==4344== definitely lost: 0 bytes in 0 blocks
==4344== indirectly lost: 0 bytes in 0 blocks
==4344== possibly lost: 0 bytes in 0 blocks
==4344== still reachable: 0 bytes in 0 blocks
==4344== suppressed: 26,262 bytes in 187 blocks
==4344==
==4344== For counts of detected and suppressed errors, rerun with: -v
==4344== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
I know the code is fine and doesn't have any leaks. So I commented out the list test and compiled with only printf("test\n"); in main, and it showed 263 allocs with 76 frees (I had 4 intentional allocs in the list test). Why am I getting so many allocs on OS X? Is this just something the OS does? I don't understand why I'd have 263 allocs when all I did was a printf...
The culprit is how OS X packages its system libraries: libdl, libdyld, libm, libc and some others are "packed" into libSystem, so all of them are initialized when that one library is loaded. Most of the allocations come from dyld. dyld is written in C and C++, which is why the C++ part may push up the number of allocs.
This is an Apple thing, not an OS X thing; I have written an alternate C library, and it does not have many of these "not-needed" allocs.
Allocs are also caused by opening FILE *s; note that three streams (stdin, stdout and stderr) are initialized at startup.
Valgrind support on OS X is currently being actively worked on. Your best approach is to ensure you are using a SVN trunk build, and update frequently.
The errors Valgrind is reporting to you come from the OS X system libraries. They are not the fault of your program, but because even simple programs link against these system libraries, Valgrind keeps picking them up. The suppressions in Valgrind trunk are continually updated to catch these issues, allowing you to focus on the real problems that may be present in your code.
The following commands will allow you to use Valgrind trunk, if you're not already:
svn co svn://svn.valgrind.org/valgrind/trunk valgrind
cd valgrind
./autogen.sh
./configure
make -j4
sudo make install
Full disclosure: I'm one of the Valgrind developers who contributed patches to support OS X 10.11
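Until the trunk suppressions catch up with your OS version, you can also generate local ones; a sketch using Valgrind's --gen-suppressions flag (./myprogram is a placeholder, and the printed blocks have to be copied into a .supp file by hand):

valgrind --gen-suppressions=all ./myprogram
valgrind --suppressions=local.supp ./myprogram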
The Problem
I've written a PHP extension (for PHP 5.3) which appears to work fine in simple tests, but the moment I start making multiple calls to it I start seeing the error:
zend_mm_heap corrupted
This normally appears in the console or the Apache error log. I also sometimes see the error:
[Thu Jun 19 16:12:31.934289 2014] [:error] [pid 560] [client 127.0.0.1:35410] PHP Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 139678164955264 bytes) in Unknown on line 0
What I've tried to do
I've tried to find the exact spot where the issue occurs: it appears to occur after the destructor is called for the PHP class that calls the extension, but before the first line of the constructor runs on the next call. (Note: I have mainly used PHPUnit to diagnose this; if I run it in a browser it will usually work once, then throw the error to the log on the next attempt with 'The connection was reset' in my browser window, so there is no output.)
I've tried adding debug lines with memory_get_usage and installing the memprof extension, but none of the output shows any serious memory issues, and I've never seen memory usage greater than 8 MB.
I've looked at other Stack Overflow posts about changing PHP settings to deal with the zend_mm_heap corrupted issue, and about disabling/enabling garbage collection, without any degree of success.
What I'm looking for
I realise that there is not enough information here to know what is causing what I presume to be a memory leak. So what I want to know is: what are the possible and probable causes of my issue, and how can I go about diagnosing it to find where the problem lies?
Note:
I have tried building my extension with --enable-debug, but it comes back as an unrecognised argument.
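For what it's worth, --enable-debug is an option of PHP's own configure script rather than an extension's; a hedged sketch of getting debug symbols into an extension build through the standard phpize workflow (the CFLAGS are ordinary autoconf usage, nothing extension-specific):

phpize
./configure CFLAGS="-g -O0"
make
sudo make install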
Edit: Valgrind
I have run over it with valgrind and got the following output:
--24803-- REDIR: 0x4ebde30 (__GI_strncmp) redirected to 0x4c2dd20 (__GI_strncmp)
--24803-- REDIR: 0x4ec1820 (__GI_stpcpy) redirected to 0x4c2f860 (__GI_stpcpy)
Segmentation fault (core dumped)
==24803==
==24803== HEAP SUMMARY:
==24803== in use at exit: 2,401 bytes in 72 blocks
==24803== total heap usage: 73 allocs, 1 frees, 2,417 bytes allocated
==24803==
==24803== Searching for pointers to 72 not-freed blocks
==24803== Checked 92,624 bytes
==24803==
==24803== LEAK SUMMARY:
==24803== definitely lost: 0 bytes in 0 blocks
==24803== indirectly lost: 0 bytes in 0 blocks
==24803== possibly lost: 0 bytes in 0 blocks
==24803== still reachable: 2,401 bytes in 72 blocks
==24803== suppressed: 0 bytes in 0 blocks
==24803== Reachable blocks (those to which a pointer was found) are not shown.
==24803== To see them, rerun with: --leak-check=full --show-reachable=yes
==24803==
==24803== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 2 from 2)
--24803--
--24803-- used_suppression: 2 dl-hack3-cond-1
==24803==
==24803== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 2 from 2)
This suggests to me that perhaps the issue isn't a memory leak, but I am not certain of this.
It appears to me that your program has heap memory corruption. This is a bit difficult to pin down by looking at your code snippet or the faulty call stack. You may want to run your program under some dynamic tool (Valgrind, WinDbg/PageHeap) to track down the actual source of the error.
$ valgrind --tool=memcheck --db-attach=yes ./a.out
This way Valgrind will attach your program to the debugger when the first memory error is detected, so that you can do live debugging (GDB). This should be the best possible way to understand and resolve your problem.
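Two caveats, both hedged: newer Valgrind releases dropped --db-attach in favour of an embedded gdbserver, and Zend's own allocator hides most allocations from Valgrind, so it is commonly disabled via the USE_ZEND_ALLOC environment variable when chasing zend_mm_heap corruption. A sketch (test.php is a placeholder):

USE_ZEND_ALLOC=0 valgrind --vgdb=yes --vgdb-error=0 php test.php
# then, in a second terminal:
gdb php
(gdb) target remote | vgdb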
Allowed memory size of 134217728 bytes exhausted (tried to allocate
139678164955264 bytes) in Unknown on line 0
It looks like a signed-to-unsigned conversion is being executed somewhere in your program. Allocators normally take a size parameter of an unsigned type, so a negative value gets interpreted as a very large size, and under that scenario the allocation fails.
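A minimal C illustration of that failure mode (the numbers are illustrative, not taken from the extension):

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int len = -16;               /* a negative length from some earlier bug */
    size_t n = (size_t)len;      /* wraps to 18446744073709551600 on 64-bit */
    printf("requesting %zu bytes\n", n);
    void *p = malloc(n);         /* fails: no allocator can satisfy this */
    if (p == NULL)
        printf("allocation failed, as with the huge request in the log\n");
    free(p);                     /* free(NULL) is a no-op */
    return 0;
}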
I have created 20 threads to read/write a shared file. I have synchronized the threads.
My program works fine, but when I run it with Valgrind it gives me errors like this:
LEAK SUMMARY:
   definitely lost: 0 bytes in 0 blocks.
   possibly lost: 624 bytes in 5 blocks.
   still reachable: 1,424 bytes in 5 blocks.
   suppressed: 0 bytes in 0 blocks.
Reachable blocks (those to which a pointer was found) are not shown.
Also, when I press Ctrl+C, it gives the same errors.
I have not even malloc'd anything, but Valgrind still complains.
Any suggestion would be appreciated.
You can run valgrind --leak-check=full ./prog_name to make sure these reachable blocks are not something you can destroy in your program. Many times initializing a library such as libcurl without closing or destroying it will cause leaks. If it's not something you have control over, you can write a suppression file. http://valgrind.org/docs/manual/mc-manual.html section 4.4 has some info and a link to some examples
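As a concrete illustration of that libcurl point, here is a minimal C sketch (assuming libcurl is the library in play; the same pattern applies to any library with global init/teardown):

#include <curl/curl.h>

int main(void)
{
    /* Allocates global state inside libcurl. */
    curl_global_init(CURL_GLOBAL_DEFAULT);

    /* ... perform transfers ... */

    /* Without this call, Valgrind reports libcurl's global
       state as "still reachable" at exit. */
    curl_global_cleanup();
    return 0;
}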
Still-reachable blocks are probably caused by your standard library not freeing memory used in pools for standard containers (see this FAQ): that is a performance optimisation for program exit, since the memory is immediately going to be returned to the operating system anyway.
"Possibly lost" blocks are probably caused by the same thing.
The Valgrind Manual page for memcheck has a good explanation about the different sorts of leaks detected.