Valgrind reports errors for a very simple C program - c

I'm learning C from Learn C The Hard Way. I'm on exercise 6, and while I can make it work, valgrind reports a lot of errors.
Here's the stripped down minimal program from a file ex6.c:
#include <stdio.h>
int main(int argc, char *argv[])
{
    char initial = 'A';
    float power = 2.345f;
    printf("Character is %c.\n", initial);
    printf("You have %f levels of power.\n", power);
    return 0;
}
Content of Makefile is just CFLAGS=-Wall -g.
I compile the program with $ make ex6 (there are no compiler warnings or errors). Executing with $ ./ex6 produces the expected output.
When I run the program with $ valgrind ./ex6 I get errors which I can't solve. Here's the full output:
==69691== Memcheck, a memory error detector
==69691== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
==69691== Using Valgrind-3.11.0.SVN and LibVEX; rerun with -h for copyright info
==69691== Command: ./ex6
==69691==
--69691-- UNKNOWN mach_msg unhandled MACH_SEND_TRAILER option
--69691-- UNKNOWN mach_msg unhandled MACH_SEND_TRAILER option (repeated 2 times)
--69691-- UNKNOWN mach_msg unhandled MACH_SEND_TRAILER option (repeated 4 times)
==69691== Conditional jump or move depends on uninitialised value(s)
==69691== at 0x1003FBC3F: _platform_memchr$VARIANT$Haswell (in /usr/lib/system/libsystem_platform.dylib)
==69691== by 0x1001EFBB6: __sfvwrite (in /usr/lib/system/libsystem_c.dylib)
==69691== by 0x1001FA005: __vfprintf (in /usr/lib/system/libsystem_c.dylib)
==69691== by 0x10021F9CE: __v2printf (in /usr/lib/system/libsystem_c.dylib)
==69691== by 0x10021FCA0: __xvprintf (in /usr/lib/system/libsystem_c.dylib)
==69691== by 0x1001F5B91: vfprintf_l (in /usr/lib/system/libsystem_c.dylib)
==69691== by 0x1001F39F7: printf (in /usr/lib/system/libsystem_c.dylib)
==69691== by 0x100000F1B: main (ex6.c:8)
==69691==
Character is A.
==69691== Invalid read of size 32
==69691== at 0x1003FBC1D: _platform_memchr$VARIANT$Haswell (in /usr/lib/system/libsystem_platform.dylib)
==69691== by 0x1001EFBB6: __sfvwrite (in /usr/lib/system/libsystem_c.dylib)
==69691== by 0x1001FA005: __vfprintf (in /usr/lib/system/libsystem_c.dylib)
==69691== by 0x10021F9CE: __v2printf (in /usr/lib/system/libsystem_c.dylib)
==69691== by 0x10021FCA0: __xvprintf (in /usr/lib/system/libsystem_c.dylib)
==69691== by 0x1001F5B91: vfprintf_l (in /usr/lib/system/libsystem_c.dylib)
==69691== by 0x1001F39F7: printf (in /usr/lib/system/libsystem_c.dylib)
==69691== by 0x100000F31: main (ex6.c:9)
==69691== Address 0x100809680 is 32 bytes before a block of size 32 in arena "client"
==69691==
You have 2.345000 levels of power.
==69691==
==69691== HEAP SUMMARY:
==69691== in use at exit: 39,365 bytes in 429 blocks
==69691== total heap usage: 510 allocs, 81 frees, 45,509 bytes allocated
==69691==
==69691== LEAK SUMMARY:
==69691== definitely lost: 16 bytes in 1 blocks
==69691== indirectly lost: 0 bytes in 0 blocks
==69691== possibly lost: 13,090 bytes in 117 blocks
==69691== still reachable: 26,259 bytes in 311 blocks
==69691== suppressed: 0 bytes in 0 blocks
==69691== Rerun with --leak-check=full to see details of leaked memory
==69691==
==69691== For counts of detected and suppressed errors, rerun with: -v
==69691== Use --track-origins=yes to see where uninitialised values come from
==69691== ERROR SUMMARY: 5 errors from 2 contexts (suppressed: 0 from 0)
I'm on OS X Yosemite. Valgrind is installed via brew with this command: $ brew install valgrind --HEAD.
So, does anyone know what's the issue here? How do I fix the valgrind errors?

If the programme you are running through Valgrind is exactly the one you posted in your question, it clearly doesn't have any memory leaks. In fact, you don't even use malloc/free yourself!
It looks to me like these are spurious errors / false positives that Valgrind detects on OS X (only!), similar to what happened to me some time ago.
If you have access to a different operating system, e.g. a Linux machine, try to analyze the programme using Valgrind on that system.
EDIT: I haven't tried this myself, since I don't have access to a Mac right now, but you should try what M Oehm suggested: use a suppressions file, as mentioned in this other SO question.

This issue is fixed for Darwin 14.3.0 (Mac OS X 10.10.2) using Valgrind r14960 with VEX r3124 for Xcode6.2 and Valgrind r15088 for Xcode 6.3.
If you are using MacPorts (at the time of writing), sudo port install valgrind-devel will give you Valgrind r14960 with VEX r3093.
Here's my build script to install Valgrind r14960 with VEX r3124:
#! /usr/bin/env bash
mkdir -p buildvalgrind
cd buildvalgrind
svn co svn://svn.valgrind.org/valgrind/trunk/@14960 valgrind
cd valgrind
./autogen.sh
./configure --prefix=/usr/local
make && sudo make install
# check that we have our valgrind installed
/usr/local/bin/valgrind --version
(reference: http://calvinx.com/2015/04/10/valgrind-on-mac-os-x-10-10-yosemite/)
My macports-installed valgrind is located at /opt/local/bin/valgrind.
If I now run
/opt/local/bin/valgrind --leak-check=yes --suppressions=`pwd`/objc.supp ./ex6
I will get exactly the same errors you described above. (Using my objc.supp file here https://gist.github.com/calvinchengx/0b1d45f67be9fdca205b)
But if I run
/usr/local/bin/valgrind --leak-check=yes --suppressions=`pwd`/objc.supp ./ex6
Everything works as expected and I do not get the system level memory leak errors showing up.

Judging from this topic, I assume that valgrind is not guaranteed to give correct results on your platform. If you can, try this code on another platform.
The culprit is either in valgrind itself or in your system's implementation of printf, both of which would be impractical for you to fix.
Rerun with --leak-check=full to see details of leaked memory. This should give you some more information about the leak you are experiencing. If nothing helps, you can create a suppression file to stop the errors from being displayed.

Related

PulseAudio-related leaks in SDL2 program under Valgrind's Memcheck?

I'm currently on Kubuntu and I'm writing a program with SDL 2.
My goal is to do ray casting.
My code seems fine: gdb reports no problems and the program exits normally, but valgrind reports one error:
==1894== Memcheck, a memory error detector
==1894== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==1894== Using Valgrind-3.15.0 and LibVEX; rerun with -h for copyright info
==1894== Command: ./ray
==1894==
==1894== Conditional jump or move depends on uninitialised value(s)
==1894== at 0x50B8565: pa_shm_cleanup (in /usr/lib/x86_64-linux-gnu/pulseaudio/libpulsecommon-13.99.so)
==1894== by 0x50B87A1: pa_shm_create_rw (in /usr/lib/x86_64-linux-gnu/pulseaudio/libpulsecommon-13.99.so)
==1894== by 0x50A84B6: pa_mempool_new (in /usr/lib/x86_64-linux-gnu/pulseaudio/libpulsecommon-13.99.so)
==1894== by 0x4E149B1: pa_context_new_with_proplist (in /usr/lib/x86_64-linux-gnu/libpulse.so.0.21.2)
==1894== by 0x493ED5E: ??? (in /usr/lib/x86_64-linux-gnu/libSDL2-2.0.so.0.10.0)
==1894== by 0x493F65A: ??? (in /usr/lib/x86_64-linux-gnu/libSDL2-2.0.so.0.10.0)
==1894== by 0x4891D9B: ??? (in /usr/lib/x86_64-linux-gnu/libSDL2-2.0.so.0.10.0)
==1894== by 0x488D906: ??? (in /usr/lib/x86_64-linux-gnu/libSDL2-2.0.so.0.10.0)
==1894== by 0x10941D: main (main.c:9)
==1894==
==1894==
==1894== HEAP SUMMARY:
==1894== in use at exit: 349,601 bytes in 2,981 blocks
==1894== total heap usage: 220,203 allocs, 217,222 frees, 32,111,232 bytes allocated
==1894==
==1894== LEAK SUMMARY:
==1894== definitely lost: 377 bytes in 3 blocks
==1894== indirectly lost: 3,071 bytes in 24 blocks
==1894== possibly lost: 0 bytes in 0 blocks
==1894== still reachable: 346,153 bytes in 2,954 blocks
==1894== suppressed: 0 bytes in 0 blocks
==1894== Rerun with --leak-check=full to see details of leaked memory
==1894==
==1894== Use --track-origins=yes to see where uninitialised values come from
==1894== For lists of detected and suppressed errors, rerun with: -s
==1894== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 2 from 2)
If I understand correctly, my code is fine and the problem is in a PulseAudio library?
To test this, I wrote a main function containing only SDL_Init(SDL_INIT_EVERYTHING) and SDL_Quit(), and valgrind reported the same thing. So the error comes from SDL's use of a PulseAudio library.
Can someone help me track down and remove this error?
The problem is likely in the SDL2 or PulseAudio library. Exact test code and the compilation command would confirm that you are not doing something wrong, but this is unlikely to be a bug of yours, and I would ignore it.
Valgrind supports error suppression lists to remove these annoyances. The question "How do you tell Valgrind to completely suppress a particular .so file?" might help you.
Also, avoid using SDL_INIT_EVERYTHING; you likely only need SDL_INIT_VIDEO or SDL_INIT_VIDEO|SDL_INIT_TIMER, depending on what you do. Check out the SDL_Init() documentation.

Valgrind gives possibly lost memory with g_test_trap_subprocess ()

I am currently working on unit tests with glib-Testing for a C library I am writing. Part of these tests check that code fails on expected occasions (I am used to these sort of tests from Python where you would assert a certain Exception was raised). I am using the recipe in the manual for glib-Testing for g_test_trap_subprocess () (see minimal example below) which works fine from the unit-testing point of view and gives the correct tests.
My problem is when I run valgrind on the following minimal example (test_glib.c):
#include <glib.h>
void test_possibly_lost(){
    if (g_test_subprocess()){
        g_assert(1 > 2);
    }
    g_test_trap_subprocess(NULL, 0, 0);
    g_test_trap_assert_failed();
}
int main(int argc, char **argv){
    g_test_init(&argc, &argv, NULL);
    g_test_add_func("/set1/test", test_possibly_lost);
    return g_test_run();
}
compiled with
gcc `pkg-config --libs --cflags glib-2.0` test_glib.c
The output of valgrind --leak-check=full ./a.out is then
==15260== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==15260== Using Valgrind-3.14.0 and LibVEX; rerun with -h for copyright info
==15260== Command: ./a.out
==15260==
/set1/test: OK
==15260==
==15260== HEAP SUMMARY:
==15260== in use at exit: 24,711 bytes in 40 blocks
==15260== total heap usage: 2,507 allocs, 2,467 frees, 235,121 bytes allocated
==15260==
==15260== 272 bytes in 1 blocks are possibly lost in loss record 36 of 40
==15260== at 0x483AB65: calloc (vg_replace_malloc.c:752)
==15260== by 0x4012AC1: allocate_dtv (in /usr/lib/ld-2.29.so)
==15260== by 0x4013431: _dl_allocate_tls (in /usr/lib/ld-2.29.so)
==15260== by 0x4BD51AD: pthread_create@@GLIBC_2.2.5 (in /usr/lib/libpthread-2.29.so)
==15260== by 0x48BE42A: ??? (in /usr/lib/libglib-2.0.so.0.6000.6)
==15260== by 0x48BE658: g_thread_new (in /usr/lib/libglib-2.0.so.0.6000.6)
==15260== by 0x48DCBF0: ??? (in /usr/lib/libglib-2.0.so.0.6000.6)
==15260== by 0x48DCC43: ??? (in /usr/lib/libglib-2.0.so.0.6000.6)
==15260== by 0x48DCD11: g_child_watch_source_new (in /usr/lib/libglib-2.0.so.0.6000.6)
==15260== by 0x48B7DF4: ??? (in /usr/lib/libglib-2.0.so.0.6000.6)
==15260== by 0x48BEA93: g_test_trap_subprocess (in /usr/lib/libglib-2.0.so.0.6000.6)
==15260== by 0x1091DD: test_possibly_lost (in /dir/to/aout/a.out)
==15260==
==15260== LEAK SUMMARY:
==15260== definitely lost: 0 bytes in 0 blocks
==15260== indirectly lost: 0 bytes in 0 blocks
==15260== possibly lost: 272 bytes in 1 blocks
==15260== still reachable: 24,439 bytes in 39 blocks
==15260== suppressed: 0 bytes in 0 blocks
==15260== Reachable blocks (those to which a pointer was found) are not shown.
==15260== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==15260==
==15260== For counts of detected and suppressed errors, rerun with: -v
==15260== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
The possibly lost memory bothers me. Coincidentally, my own code also possibly loses 272 bytes, so I think this might be a problem with the way I use glib rather than with my own structs. Personally, I would treat possibly lost memory as definitely lost, and I would like to get rid of it.
So my question is whether there is a free that I could cleverly insert to free the memory, a different recipe to check for failed asserts or are these lost 272 bytes just something I will have to live with?
That's a somewhat odd stack trace for the allocation. g_test_trap_subprocess() is supposed to run the specified test(s) in a subprocess, but it is creating a thread. These are not mutually exclusive -- a subprocess may well also be forked -- but mixing threads with forking is a tricky, finicky business.
In any event, the trace seems to indicate that the problem arises from glib starting a thread that is not properly terminated and cleaned up before your program exits. Since the issue is with an internal thread, the best solution would involve calling an appropriate shutdown function. I don't see such a function documented either specifically for g_test or more generally among the GLib utility functions, nor do I see any documentation of a need to call such a function, so I'm going to attribute the issue to a minor flaw in Glib.
Unless you can find a glib-based solution that I missed, your best alternative is probably to accept that what you're seeing is a glib quirk, and to write a Valgrind suppression file that you can then use to instruct Valgrind not to report on it. Note that although you can write such a file by hand, using the information provided in the leak report, the easiest way to get one is to run Valgrind with the --gen-suppressions=yes option. However you get it, you can instruct valgrind to use it on subsequent runs by using a --suppressions=/path/to/file.supp option on the Valgrind command line.
Do consult the Valgrind manual (linked above) for details on suppression files, including format, how to create and modify them, and how to use them.
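As a rough illustration of the hand-written route, a suppression entry for the leak record in the question might look like the sketch below. The entry name glib_child_watch_thread_tls is made up for this example, the function names are copied from the stack trace in the question, and each "..." line is Valgrind's wildcard for any number of intervening frames. Treat it as a starting point only; the entry that --gen-suppressions prints on your own system is the one to trust.

```
{
   glib_child_watch_thread_tls
   Memcheck:Leak
   match-leak-kinds: possible
   fun:calloc
   ...
   fun:pthread_create*
   ...
   fun:g_thread_new
   ...
   fun:g_child_watch_source_new
   ...
   fun:g_test_trap_subprocess
}
```

Ending the entry at g_test_trap_subprocess keeps it narrow: a "possibly lost" block allocated by a thread that your own code creates would still be reported.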

How to track down why code behaves differently with optimization flags

I have some complex C code that encodes and decodes bit arrays with error correction. The code was based on this, and patched to meet my needs (so it really doesn't function the same as the linked code, except in its core).
With optimizations off, I get what I expect on the output. With full optimizations (i.e., -O3 passed to gcc), the code behaves differently.
I'm trying to figure out what I should be looking for to track this down. Is there something obvious that the optimizations do that I can look for, before I start adding printfs everywhere to see at which line(s) the output differs between the two builds?
A clue, I think, is that when I run the code through valgrind, without optimizations, I get no errors or warnings:
==5112== Memcheck, a memory error detector
==5112== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==5112== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==5112== Command: ./a.out
==5112==
!!!!! decode ret = 0 (my output)
Nid decoded = 0010100100111010101110101001001110111110110000100110101000101011 (my output)
==5112==
==5112== HEAP SUMMARY:
==5112== in use at exit: 0 bytes in 0 blocks
==5112== total heap usage: 0 allocs, 0 frees, 0 bytes allocated
==5112==
==5112== All heap blocks were freed -- no leaks are possible
==5112==
==5112== For counts of detected and suppressed errors, rerun with: -v
==5112== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
With optimizations enabled and --track-origins=yes, valgrind reports the following:
==5506== Memcheck, a memory error detector
==5506== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==5506== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==5506== Command: ./a.out
==5506==
==5506== Conditional jump or move depends on uninitialised value(s)
==5506== at 0x400DC1: bch_decode (in /home/directory/bch/a.out)
==5506== by 0x4005CD: main (in /home/directory/bch/a.out)
==5506== Uninitialised value was created by a stack allocation
==5506== at 0x40094B: bch_decode (in /home/directory/bch/a.out)
==5506==
==5506== Use of uninitialised value of size 8
==5506== at 0x400DC3: bch_decode (in /home/directory/bch/a.out)
==5506== by 0x4005CD: main (in /home/directory/bch/a.out)
==5506== Uninitialised value was created by a stack allocation
==5506== at 0x40094B: bch_decode (in /home/directory/bch/a.out)
==5506==
!!!!! decode ret = -1
Nid decoded = 0010100100111010101110101001001110111110110000100010101000101011
==5506==
==5506== HEAP SUMMARY:
==5506== in use at exit: 0 bytes in 0 blocks
==5506== total heap usage: 0 allocs, 0 frees, 0 bytes allocated
==5506==
==5506== All heap blocks were freed -- no leaks are possible
==5506==
==5506== For counts of detected and suppressed errors, rerun with: -v
==5506== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 0 from 0)
However, with optimizations on, I don't think Valgrind can tell me specific line numbers, or variables in which this happens. Even if I were to step through the code with GDB, it will be hard to nail this down because of the optimizations.
Is there a way I can sort this out, maybe by /emulating/ the behavior of optimizations while keeping the correct line numbers and variable names in the debug information (unlikely, from everything I've read so far)?
Any help to move this forward even a little bit would be greatly appreciated.
valgrind tells you that you have uninitialized variables:
==5506== Conditional jump or move depends on uninitialised value(s)
==5506== at 0x400DC1: bch_decode (in /home/directory/bch/a.out)
==5506== by 0x4005CD: main (in /home/directory/bch/a.out)
==5506== Uninitialised value was created by a stack allocation
==5506== at 0x40094B: bch_decode (in /home/directory/bch/a.out)
==5506==
==5506== Use of uninitialised value of size 8
==5506== at 0x400DC3: bch_decode (in /home/directory/bch/a.out)
==5506== by 0x4005CD: main (in /home/directory/bch/a.out)
==5506== Uninitialised value was created by a stack allocation
==5506== at 0x40094B: bch_decode (in /home/directory/bch/a.out)
Recompile and re-link your code with -g flag to see line numbers in valgrind output.
Step 1 - use any static analysis tools available, starting with the maximum warning level available on your compiler of choice. Sometimes using a different compiler (e.g. llvm) may produce new warnings the first one didn't give you.
Valgrind says you have uninitialized variable trouble, so "gcc -Wall" will probably nail it.
Step 2 - try valgrind on code built -g; maybe you'll get line numbers. (Or maybe it'll stop replicating, because of interaction of -g with -O.)
Step 3a - intense code review, looking for undefined behaviour. How well do you understand "sequence points" in C?
If there is any multithreading, signal handling, etc. then look for ordering issues. (Unlikely with the stated purpose of the code, but a common source of optimizer-related breakage.)
Step 3b - examine the assembly code generated in both instances. Consider stepping code with the debugger at the assembly level rather than the source code level.
Step 3c - enable/disable individual optimizations, and see which one breaks things. Then use your understanding of what that optimization does to guide the code review.
Step 3d - simplify your code. The oversight may appear as a result of this exercise. And if not, you can start calling individual parts of it, and checking which bits are misbehaving. (Odds are once you are looking at the right lines, you'll see what's wrong.)
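To make the "uninitialised value created by a stack allocation" diagnosis concrete, here is a minimal sketch of one common way it arises and how to fix it. Everything in it (toy_decode, MAX_ERRORS, err_loc) is hypothetical and far simpler than the BCH code in the question; it only shows the pattern of a local table that is partly filled and then read in full. At -O0 stale stack contents often happen to be zero, which can hide such a bug until the optimizer rearranges the stack and registers.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ERRORS 4  /* hypothetical capacity of the error-location table */

/*
 * Toy "decoder" (hypothetical, not the BCH code from the question):
 * finds the positions where bits[] differs from ref[], flips them back,
 * and returns how many bits were corrected, or -1 if there are more
 * than MAX_ERRORS differences.
 */
int toy_decode(unsigned char *bits, const unsigned char *ref, size_t nbits)
{
    /* Fix 1: give every slot a defined value. Without "= {0}" the
     * array holds whatever was left on the stack or in registers. */
    unsigned int err_loc[MAX_ERRORS] = {0};
    int nerr = 0;

    assert(bits != NULL && ref != NULL);

    for (size_t i = 0; i < nbits; i++) {
        if (bits[i] != ref[i]) {
            if (nerr == MAX_ERRORS)
                return -1;
            err_loc[nerr++] = (unsigned int)i;
        }
    }

    /* Fix 2: visit only the nerr slots that were actually written.
     * A loop up to MAX_ERRORS would index with slots never assigned. */
    for (int k = 0; k < nerr; k++)
        bits[err_loc[k]] ^= 1u;

    return nerr;
}
```

Calling toy_decode on an 8-element received array and a reference array that differ in two positions corrects both positions and returns 2. When reviewing the real code, the equivalent question is which local arrays or structs in bch_decode are read before every element has been written.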

Valgrind reports "Conditional jump or move depends on uninitialised value(s)" on every program

I'm working through "Learn C the Hard Way" and using valgrind to debug my programs, but it keeps giving me the same error, even on programs I know for a fact are correct. I'm running Ubuntu on a VMware virtual machine, which I thought might be the problem, but it works fine on another Windows computer using the same setup. I originally built valgrind from scratch per the book, but then removed it and used sudo apt-get install to see if it made a difference; I still get the same error.
Here's the error on a simple hello world program and I get the exact same results on every C program I run.
bizarro@ubuntu:~/Dropbox/Programming/C/TheHardWay/Exercises$ valgrind ./ex1
==8625== Memcheck, a memory error detector
==8625== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
==8625== Using Valgrind-3.9.0 and LibVEX; rerun with -h for copyright info
==8625== Command: ./ex1
==8625==
==8625== Conditional jump or move depends on uninitialised value(s)
==8625== at 0x4019B04: index (strchr.S:77)
==8625== by 0x4007DED: expand_dynamic_string_token (dl-load.c:425)
==8625== by 0x4008D71: _dl_map_object (dl-load.c:2538)
==8625== by 0x40014BD: map_doit (rtld.c:627)
==8625== by 0x400FFF3: _dl_catch_error (dl-error.c:187)
==8625== by 0x4000B2E: do_preload (rtld.c:816)
==8625== by 0x400446C: dl_main (rtld.c:1633)
==8625== by 0x4017564: _dl_sysdep_start (dl-sysdep.c:249)
==8625== by 0x4004CF7: _dl_start (rtld.c:332)
==8625== by 0x40012D7: ??? (in /lib/x86_64-linux-gnu/ld-2.19.so)
==8625==
Hello world.
This is a the print function
It prints things and needs a semi colon at the end
Which im not used to. Python has made me sloppy
I miss Python already
==8625==
==8625== HEAP SUMMARY:
==8625== in use at exit: 0 bytes in 0 blocks
==8625== total heap usage: 0 allocs, 0 frees, 0 bytes allocated
==8625==
==8625== All heap blocks were freed -- no leaks are possible
==8625==
==8625== For counts of detected and suppressed errors, rerun with: -v
==8625== Use --track-origins=yes to see where uninitialised values come from
==8625== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
Edit:
Here is the code, but like I said, it gives this message independent of the code.
#include <stdio.h>
int main(int argc, char *argv[])
{
    puts("Hello world.");
    puts("This is a the print function");
    puts("It prints things and needs a semi colon at the end");
    puts("Which im not used to. Python has made me sloppy");
    puts("I miss Python already");
    return 0;
}
It looks as though Valgrind is identifying a potential issue within the C runtime. This isn't something you need to worry about, so I'd recommend that you create a suppressions file to ignore this particular warning.

Tips on debugging segmentation faults when no leaks are found

I wrote a C-based application that appears to run fine, except on very large datasets as input.
With large input, I get a segmentation fault at the end steps of the binary's functionality.
I ran the binary (with the test input) with valgrind:
valgrind --tool=memcheck --leak-check=yes /foo/bar/baz inputDataset > outputAnalysis
This job normally takes a few hours, but with valgrind it took seven days.
Unfortunately, at this point, I don't know how to read the results I am getting from this run.
I get a lot of these warnings:
...
==4074== Conditional jump or move depends on uninitialised value(s)
==4074== at 0x435900: ??? (in /foo/bar/baz)
==4074== by 0x439CC5: ??? (in /foo/bar/baz)
==4074== by 0x400BF2: ??? (in /foo/bar/baz)
==4074== by 0x402086: ??? (in /foo/bar/baz)
==4074== by 0x402A0F: ??? (in /foo/bar/baz)
==4074== by 0x41684F: ??? (in /foo/bar/baz)
==4074== by 0x4001B8: ??? (in /foo/bar/baz)
==4074== by 0x7FEFFFF57: ???
==4074== Uninitialised value was created
==4074== at 0x461D3A: ??? (in /foo/bar/baz)
==4074== by 0x43F926: ??? (in /foo/bar/baz)
==4074== by 0x416B9B: ??? (in /foo/bar/baz)
==4074== by 0x416725: ??? (in /foo/bar/baz)
==4074== by 0x4001B8: ??? (in /foo/bar/baz)
==4074== by 0x7FEFFFF57: ???
...
There are no parts of code hinted at, no names of variables, etc. What can I do with this information?
At the end, I finally get the following error, but — as with smaller datasets that do not crash — valgrind finds no leaks:
...
==4074== Process terminating with default action of signal 11 (SIGSEGV)
==4074== Access not within mapped region at address 0x7158E7F7
==4074== at 0x7158E7F7: ???
==4074== by 0x4020B8: ??? (in /foo/bar/baz)
==4074== by 0x6322203A22656D6E: ???
==4074== by 0x306C675F6E557267: ???
==4074== by 0x202C22373232302F: ???
==4074== by 0x6D616E656C696621: ???
==4074== by 0x72686322203A2264: ???
==4074== by 0x3030306C675F6E54: ???
==4074== by 0x346469702E373231: ???
==4074== by 0x646469662E34372F: ???
==4074== by 0x722E64616568656B: ???
==4074== by 0x63656D6F6C756764: ???
==4074== If you believe this happened as a result of a stack
==4074== overflow in your program's main thread (unlikely but
==4074== possible), you can try to increase the size of the
==4074== main thread stack using the --main-stacksize= flag.
==4074== The main thread stack size used in this run was 10485760.
==4074==
==4074== HEAP SUMMARY:
==4074== in use at exit: 0 bytes in 0 blocks
==4074== total heap usage: 0 allocs, 0 frees, 0 bytes allocated
==4074==
==4074== All heap blocks were freed -- no leaks are possible
==4074==
==4074== For counts of detected and suppressed errors, rerun with: -v
==4074== ERROR SUMMARY: 1603141870 errors from 86 contexts (suppressed: 0 from 0)
Segmentation fault
Everything I allocate space for gets an equivalent free statement, after which I set pointers to NULL.
At this point, how can I best debug this application, to determine what else is causing the segmentation fault?
22 Dec 2011 - Edit
I compiled a debug-version of my binary, called debug-binary, using the following compilation flags:
-D_FILE_OFFSET_BITS=64 -D_LARGEFILE64_SOURCE=1 -DUSE_ZLIB -g -O0 -Wformat -Wall -pedantic -std=gnu99
When I run it with valgrind, I don't get much more information:
valgrind -v --tool=memcheck --leak-check=yes --error-limit=no --track-origins=yes debug-binary input > output
Here's a snippet of output:
==25116== 2 errors in context 14 of 14:
==25116== Invalid read of size 4
==25116== at 0x4045E8: ??? (in /foo/bar/debug-binary)
==25116== by 0x40682F: ??? (in /foo/bar/debug-binary)
==25116== by 0x404F0C: ??? (in /foo/bar/debug-binary)
==25116== by 0x401FA4: ??? (in /foo/bar/debug-binary)
==25116== by 0x402016: ??? (in /foo/bar/debug-binary)
==25116== by 0x403B27: ??? (in /foo/bar/debug-binary)
==25116== by 0x40295E: ??? (in /foo/bar/debug-binary)
==25116== by 0x31A021D993: (below main) (in /lib64/libc-2.5.so)
==25116== Address 0x539f188 is 24 bytes inside a block of size 48 free'd
==25116== at 0x4A05D21: free (vg_replace_malloc.c:325)
==25116== by 0x401F6B: ??? (in /foo/bar/debug-binary)
==25116== by 0x402016: ??? (in /foo/bar/debug-binary)
==25116== by 0x403B27: ??? (in /foo/bar/debug-binary)
==25116== by 0x40295E: ??? (in /foo/bar/debug-binary)
==25116== by 0x31A021D993: (below main) (in /lib64/libc-2.5.so)
Is this an issue with my binary, or with a system library (libc) that my application is dependent upon?
I also don't know what to do about interpreting the ??? entries. Is there another compilation flag I need to get valgrind to provide more information?
Valgrind basically says there are no notable heap management issues. The program is segfaulting from a less complex programming fault.
If it were me, I would
compile it with gcc -g,
enable core dump files (ulimit -c unlimited),
run the program normally and let it fault,
use gdb to examine the core file and look at what it was doing when it faulted:
gdb (programfile) (corefile)
bt
I don't believe valgrind is able to find all errors where you've overrun a value on the stack (but not overrun the stack itself). So, you may want to try gcc's -fstack-protector-all option.
You should also try mudflap, with -fmudflap (single-threaded) or -fmudflapth (multi-threaded).
Both mudflap and stack protector should be much faster than valgrind.
In addition, it looks like you don't have debug symbols, which makes reading backtraces difficult. Add -ggdb.
You probably also want to enable core-file generation (try ulimit -c unlimited). This way, you can try to debug the process post-crash by using gdb program core.
As @wallyk indicates, your segfault may actually be something fairly easy to find. For example, maybe you're dereferencing NULL, and gdb can point you to the exact line (or at least close to it; with -O0 it will be exact). This would make sense if, for example, you're just running out of memory on your larger datasets, so malloc returns NULL, and you forgot to check for that somewhere.
Lastly, if nothing else makes sense, there is always the possibility of hardware issues. But those would be expected to be fairly random, e.g., different values getting corrupted on different runs. If you try a different machine and it happens there too, it's extremely unlikely to be a hardware issue.
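The unchecked-malloc scenario above is easy to rule out with a small wrapper. The sketch below is only an illustration: checked_alloc and its parameters are hypothetical names, not part of the original program. It refuses requests whose byte count would overflow and reports the failing allocation instead of letting a NULL pointer be dereferenced much later.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not from the original program): allocate room for
 * `count` elements of `elem_size` bytes each. On failure it reports what
 * was being allocated and returns NULL, so the caller can bail out at the
 * point of failure rather than crash somewhere unrelated.
 */
void *checked_alloc(size_t count, size_t elem_size, const char *what)
{
    assert(what != NULL);

    /* Reject requests whose total byte count would overflow size_t. */
    if (elem_size != 0 && count > SIZE_MAX / elem_size) {
        fprintf(stderr, "%s: %zu x %zu bytes overflows size_t\n",
                what, count, elem_size);
        return NULL;
    }

    size_t bytes = count * elem_size;
    if (bytes == 0)
        bytes = 1;  /* malloc(0) may return NULL; avoid a false alarm */

    void *p = malloc(bytes);
    if (p == NULL)
        fprintf(stderr, "%s: out of memory (%zu bytes requested)\n",
                what, bytes);
    return p;
}
```

A call site would then read something like `int *v = checked_alloc(n, sizeof *v, "score table"); if (v == NULL) return -1;`. If the large-dataset run prints one of these messages just before it would otherwise have crashed, the cause is found without waiting days for valgrind.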
The "Conditional jump or move depends on uninitialised value" is a serious bug you need to fix. It indicates that the behaviour of your program is affected by the contents of an uninitialised variable (including an uninitialised memory region returned by malloc()).
To get readable backtraces from valgrind you need to compile with -g.
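For the malloc() case mentioned above, a minimal sketch of the usual remedy follows. The struct record type and both function names are hypothetical, since the original program's data structures are not shown. The point is only that memory obtained from calloc() starts out zeroed, so a later branch on a field never reads an indeterminate value.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical record type; the original program's structs are not shown. */
struct record {
    int id;
    int is_valid;  /* tested in a branch by count_valid() */
};

/*
 * Returns n zero-filled records, or NULL on failure. With malloc() the
 * is_valid fields would be indeterminate until written, and branching on
 * them is what Memcheck reports as "Conditional jump or move depends on
 * uninitialised value(s)".
 */
struct record *records_new(size_t n)
{
    return calloc(n, sizeof(struct record));
}

/* Counts the records whose is_valid flag is non-zero. */
size_t count_valid(const struct record *r, size_t n)
{
    size_t valid = 0;

    assert(r != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (r[i].is_valid)
            valid++;
    }
    return valid;
}
```

Zero-filling is a blunt fix: it makes behaviour deterministic, but if zero is not a meaningful default for a field, the real bug is a code path that forgot to set it, and --track-origins=yes on a -g build is the way to find that path.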

Resources