How to detect where my app crashed in Linux - C

Hi, I'm currently working on a Linux project written in C.
The app has several processes that share a block of shared memory. After the app has been running for several hours, one process crashes without leaving any trace, so it's very difficult to know what the problem was or where to start reviewing the code.
It could be a memory overflow or pointer misuse, but I don't know exactly.
Do you have any tools or methods to detect the problem?
Any advice would be much appreciated. Thanks.

Before you start the program, enable core dumps:
ulimit -c unlimited
(and make sure the working directory of the process is writeable by the process)
After the process crashes, it should leave behind a core file, which you can then examine with gdb:
gdb /some/bin/executable core
Alternatively, you can run the process under gdb when you start it - gdb will wake up when the process crashes.
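For example, a possible session (the binary path and arguments here are just placeholders):
gdb --args /some/bin/executable --your-args
(gdb) run
When the process crashes, gdb stops at the faulting instruction and you can inspect it:
(gdb) backtrace
(gdb) info locals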

If you use Emacs, you can also run gdb with gdb-many-windows, which gives you a richer debugging layout that lets you examine things like the stack, locals, and breakpoints, much like the Visual Studio IDE.
Here is a useful link
http://emacs-fu.blogspot.com/2009/02/fancy-debugging-with-gdb.html

Valgrind is where you need to go next. Chances are that you have a memory misuse problem which is benign -- until it isn't. Run the programs under valgrind and see what it says.
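For instance (the binary name is a placeholder):
valgrind --leak-check=full --track-origins=yes ./myapp
--track-origins=yes makes Memcheck report where an uninitialised value originally came from, at the cost of a slower run.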

I agree with bmargulies -- Valgrind is absolutely the best tool out there to automatically detect incorrect memory usage. Almost all Linux distributions should have it, so just emerge valgrind or apt-get install valgrind or whatever your distro uses.
However, Valgrind is hardly the least cryptic thing in existence, and it usually only helps you tell where the program eventually ended up accessing memory incorrectly -- if you stored an incorrect array index in a variable and then accessed it later, then you will still have to figure that out. Especially when paired with a powerful debugger like GDB, however (the backtrace or bt command is your friend), Valgrind is an incredibly useful tool.
Just remember to compile with the -g flag (if you are using GCC, at least), or Valgrind and GDB will not be able to tell you where in the source the memory abuse occurred.
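A hedged sketch of the compile-and-check cycle (file and binary names are placeholders):
gcc -g -O0 -o myapp myapp.c
valgrind ./myapp
-O0 keeps the optimizer from rearranging your code, so the line numbers Valgrind and GDB report match the source you are reading.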

Related

GDB prevents errors

I've run into this problem on different C projects when using gdb.
If I run my program without it, it crashes consistently at a given event, probably because of an invalid memory read. I've tried debugging it with gdb, but when I do, the crash never seems to occur!
Any idea why this could happen?
I'm using the MinGW toolchain on Windows.
Yes, it sounds like a race condition, heap corruption, or something else that is usually responsible for Heisenbugs. The problem is that your code is most likely incorrect somewhere, but the environment under a debugger differs just enough (timing, memory layout, signal handling) that the debugged application's misbehaviour no longer shows, so problems tend to disappear under the debugger. Race conditions often won't appear in the first place: debuggers serialize thread execution to some degree, and they always make the code run slower, which may already be enough to make a race go away.
Try Valgrind on the application. Since you are using MinGW, chances are that your application will compile in an environment where Valgrind can run (even though it doesn't run directly on Windows). I've been using Valgrind for about three years now and it has solved a lot of mysteries quickly. The first thing I do when I get a crash report for the code I work on (which runs on AIX, Solaris, the BSDs, Linux and Windows) is a test run under Valgrind on x64 and x86 Linux respectively.
Valgrind, and in your particular case its default tool Memcheck, runs your code on a synthetic CPU and tracks every memory access. Whenever you allocate memory it marks all bytes of that block as uninitialized ("tainted") until you actually initialize them explicitly. The tainted status of memory bytes is inherited when you memcpy uninitialized memory around, and leads to a report from Valgrind as soon as an uninitialized byte is used to make a decision (if, for, while ...). It also keeps track of orphaned memory blocks and reports leaks at the end of the run. And that's not all: more tools are part of the Valgrind family and test various aspects of your code, including race conditions between threads (Helgrind, DRD).
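As a small made-up illustration (not from the question), Memcheck flags the if-statement below with a message along the lines of "Conditional jump or move depends on uninitialised value(s)", because buf is allocated but never written:
#include <stdlib.h>
#include <stdio.h>

int main(void)
{
    char *buf = malloc(16);    /* contents are undefined */
    if (buf && buf[0] == 'x')  /* decision based on an uninitialised byte */
        puts("found x");
    free(buf);
    return 0;
}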
Assuming Linux now: make sure that you have all the debug symbols of your supporting libraries installed. Usually those come in the *-debug version of packages or in *-devel. Also, make sure to turn off optimization in your code and include debug symbols. For GCC that's -ggdb -g3 -O0.
Another hint: I've had pointer aliasing cause some grief. Although Valgrind was able to help me track it down, I actually had to take the last step and verify the generated code in its disassembly. It turned out that at -O3 the GCC optimizer got ahead of itself and turned a byte-copying loop into a sequence of instructions copying 8 bytes at once, but it assumed alignment, and that assumption was wrong. Ever since, we've resorted to building at -O2, which, as you will see in this Gentoo Wiki article, is not the worst idea. To quote the relevant part:
-O3: This is the highest level of optimization possible, and also the riskiest. It will take a longer time to compile your code with this option, and in fact it should not be used system-wide with gcc 4.x. The behavior of gcc has changed significantly since version 3.x. In 3.x, -O3 has been shown to lead to marginally faster execution times over -O2, but this is no longer the case with gcc 4.x. Compiling all your packages with -O3 will result in larger binaries that require more memory, and will significantly increase the odds of compilation failure or unexpected program behavior (including errors). The downsides outweigh the benefits; remember the principle of diminishing returns. Using -O3 is not recommended for gcc 4.x.
Since you are using GCC in MinGW, I reckon this could well apply to your case as well.
Any idea why this could happen?
There are several usual reasons:
Your application has multiple threads and a race condition, and running under GDB affects timing in such a way that the crash no longer happens.
Your application has a bug that is affected by memory layout (often reading of uninitialized memory), and the layout changes when running under GDB.
One way to approach this is to let the application trap whatever unhandled exception it is being killed by, print a message, and spin forever. Once in that state, you should be able to attach GDB to the process, and debug from there.
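A minimal sketch of that trap-and-spin idea (the handler body and message are illustrative, not taken from the answer):
#include <signal.h>
#include <string.h>
#include <unistd.h>

static void crash_handler(int sig)
{
    /* write() is async-signal-safe; printf() is not */
    const char msg[] = "fatal signal caught, spinning so a debugger can attach\n";
    (void)sig;
    (void)write(STDERR_FILENO, msg, sizeof(msg) - 1);
    for (;;)
        pause();               /* now attach with: gdb -p <pid> */
}

int main(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = crash_handler;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, NULL);
    sigaction(SIGBUS, &sa, NULL);

    /* ... the rest of the application runs as usual ... */
    return 0;
}
Once it is spinning, gdb -p shows the handler frame on top and the faulting frame right below it.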
Although it's a bit late: one can read this question's answer to set up the system to catch a core dump without using gdb, and then load the core file with
gdb <path_to_executable_file> <path_to_core_file>
and then issue
thread apply all bt
in gdb.
This will show stack traces for all threads that were running when the application crashed, and one may be able to locate the last function and the corresponding thread that caused the illegal access.
Your application is probably receiving signals and gdb might not pass them on depending on its configuration. You can check this with the info signals or info handle command. It might also help to post a stack trace of the crashed process. The crashed process should generate a core file (if it hasn't been disabled) which can be analyzed with gdb.
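For example, inside gdb:
(gdb) info signals SIGSEGV
(gdb) handle SIGSEGV stop print pass
The stop/print/pass keywords control whether gdb stops, announces the signal, and forwards it to the program.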

What kind of bug hangs Linux?

I'm trying to debug a simple cross-platform commandline program (a C parser, itself written in C) and running into something strange.
On Windows, when I run it on a small dataset (the source code of glib) it completes successfully, and when I run it on a large dataset (the source code of the Linux kernel) it exits with an out-of-memory error. I'm not sure whether the latter is a bug in my code or just a consequence of not yet having optimized memory consumption, so I've been trying to run it on Linux to get some feedback from Valgrind.
On Linux (Ubuntu 11.04 x64 in VirtualBox), when I run my program on a small dataset it completes successfully, but when I run it on a large dataset, Linux locks up so hard that I have to reset the entire virtual machine (the mouse pointer still moves, but other than that it's completely unresponsive; the Windows Task Manager says the VirtualBox process is using 100% of a CPU core but not allocating memory).
I wouldn't have expected a bug in my code to crash Linux unless I was writing something like a device driver, and when I try simple test cases that allocate too much memory, go into an infinite loop or both, Linux can handle them just fine. What kind of bug should I be looking for, or what am I missing?
On Linux (Ubuntu 11.04 x64 in VirtualBox)
Probably you haven't reserved enough memory for your virtual machine.
This is most likely an infinite loop (easily done in a parser), which could easily take up 100% CPU or 100% RAM.
Attach a debugger!
e.g. gdb
http://www.gnu.org/s/gdb/
gdb is available as a standard package on Ubuntu and most other distributions (e.g. apt-get install gdb).
Here's a how-to: http://www.unknownroad.com/rtfm/gdbtut/gdbtoc.html
EDIT: just saw you already tried gdb. So, try running strace on it, it might give you a hint.
Further to that, try adding log messages to see how far the program gets (primitive, but it'll work eventually!)
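A possible strace invocation (binary name and input path are placeholders):
strace -f -tt -o trace.log ./myparser /path/to/linux-source
-f follows child processes, -tt adds timestamps, and -o writes the trace to a file; the last lines of trace.log usually show what the program was doing when it stopped making progress.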

Cannot reproduce segfault in gdb

I'm getting segfaults when I run my project. Every time I run the program in gdb, the segfaults disappear. This behavior is not random: each time I run it in my shell it segfaults, each time I run it in gdb, the segfaults disappear. (I did recompile using -g).
So before I start adding printfs frantically everywhere in my code, I would like to know a few things:
Is this behavior common?
What's the best way to approach the issue?
I don't know if tests can be scripted since my application is interactive and crashes on a particular user input.
I didn't paste my code here because it'd be way too long. But if anyone is interested in helping out, here it is:
https://github.com/rahmu/Agros
The easiest way to figure it out is to capture core dumps:
$ ulimit -c unlimited
Then run your program. When it crashes, it will generate a core file.
Then use gdb:
$ gdb ./program core
And gdb will load and you can run a backtrace to see exactly what operation elicited the segfault.
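Typical commands once the core is loaded (frame numbers and variable names are of course your own):
(gdb) bt full
(gdb) frame 2
(gdb) info locals
(gdb) print *some_pointer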
Does it produce a core dump? If so, load the core dump in the debugger. Otherwise, change the code to get it to produce a core dump.
My guess is that it's a concurrency problem causing a reference to be freed out from under a method call that assumes the pointer it holds will stay valid. The reason gdb is probably masking this is that gdb changes how threads are scheduled and adds a significant performance hit, which can hide exactly this kind of timing-dependent condition. As mentioned by Ed, just make your application produce a core dump, and you can open the core in GDB and check the stack.
Is this behavior common?
Yes. Undefined behaviour is the source of most of these problems, and by definition it is undefined. Recompiling with -g may certainly affect the results. Recompiling at all may change the results, if the compiler uses some pseudo-random genetic algorithm to optimize stuff or something like that.
What's the best way to approach the issue?
An ounce of prevention is worth a ton of cure; learn the common causes of undefined behaviour and pick up good habits to avoid writing them. Once you've found that there is a problem, static analysis of the code is often a good idea; go through and reason to yourself and prove that indices will stay in bounds, data will fit its arrays, invalid pointers won't be dereferenced etc.
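One concrete prevention habit, assuming a reasonably recent GCC or Clang, is to build with extra warnings and the sanitizers; AddressSanitizer in particular reports invalid reads and writes at the point they happen rather than wherever the program eventually crashes:
gcc -Wall -Wextra -g -O0 -fsanitize=address,undefined -o myprog myprog.c
./myprog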

Memory allocation

I'm experimenting with the C language at the moment, but I'm having some trouble with memory allocation. After a while I have to restart my computer because its memory fills up. Is there a way to have the compiler tell me which arrays do not get deallocated after the program has run?
Thanks for any answers.
You can use Valgrind to do that.
http://tldp.org/HOWTO/Valgrind-HOWTO/
http://valgrind.org/
Run it on your compiled program with --leak-check=yes.
You didn't tell us anything about your compiler, OS, platform... so the rest could only be wild guesses.
This sounds very much like you have dead or runaway processes, or something similar, eating your memory in the background. On Linux you have top (press M inside top to sort by memory) to inspect the processes running on your system and how much memory, CPU time, etc. they consume. Do that to see what is happening on your machine, and don't reboot it blindly without knowing the reason.
There are equivalent tools on all other operating systems that let you inspect the current state of processes.
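For example, on Linux a quick way to list the biggest memory consumers (assuming the usual procps tools) is:
ps aux --sort=-rss | head -n 10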
There are tools that can tell you about memory leaks; compilers, I'm afraid, are not much use for that purpose.
You can also use DevPartner or Valgrind to analyse your memory leaks if you suspect them. But if your system has to be restarted because of memory issues: how long do you run the application before you perform a restart? And how did you determine that this is a memory-related issue?
You'd better check your source code first: if you are on Linux, run 'splint' on your source and it will report a lot; try to fix those warnings and errors. Once that is done, recompile your source and run 'valgrind' on the executable.
You can find the documentation for splint on its official website, and likewise for valgrind.
splint: www.splint.org
valgrind: valgrind.org
Good luck~~~

Finding where memory was last freed?

Very general:
Is there an easy way to tell which line of code last freed a block of memory when an access violation occurs?
Less general:
My understanding of profilers is that they override the allocation and deallocation processes. If this is true, might they happen to store the line of code that last freed a section of memory so that when it later crashes because of an access violation, you know what freed it last?
Specifics:
Windows, ANSI C, using Visual Studio
Yes!
Install the Windows Debugging Tools and use Application Verifier.
File -> Add Application, select your .exe
Under Basics, select Memory and Heaps.
Run the debug build of your program under ntsd (ntsd yourprogram.exe).
Reproduce the bug.
Now when you make the crash happen, you will get additional information in the debugger from AppVerifier. Use !avrf (it may take minutes to run) and it will try to give you as much useful information as possible.
You can also use the dps command on the memory address to get all the stored stack info (allocation, deallocation, etc.).
You can also use the !heap command on the memory address:
0:004> !heap -p -a 0x0C46CFE0
Which will dump information as well.
Further Reading:
Advanced Windows Debugging, Hewardt and Pravat
Debugging with PageHeap
Short answer: no.
What you need is a debug malloc. I don't keep up with Windows any longer but there are several about, including this free one.
Update
Looks like Visual Studio C has a built-in version. See here
When the application is linked with a debug version of the C run-time libraries, malloc resolves to _malloc_dbg. For more information about how the heap is managed during the debugging process, see The CRT Debug Heap.
... and see here for _malloc_dbg.
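A minimal sketch of how that looks in a Visual Studio debug build (the leak is deliberate; the macros and flags are the documented CRT ones):
#define _CRTDBG_MAP_ALLOC      /* map malloc/free to their _dbg variants */
#include <stdlib.h>
#include <crtdbg.h>

int main(void)
{
    char *leaked;

    /* report any allocations still live when the CRT shuts down */
    _CrtSetDbgFlag(_CRTDBG_ALLOC_MEM_DF | _CRTDBG_LEAK_CHECK_DF);

    leaked = malloc(64);       /* deliberately never freed */
    (void)leaked;
    return 0;                  /* leak report, with file and line, goes to the debug output */
}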
No, not unless you provide your own allocators (e.g. by overloading new/delete) to store this information.
What profilers do is highly dependent on what they're profiling. I'm not aware of any profiler that tracks what you're looking for.
Perhaps if you provided more details on your situation people could suggest an alternative means of diagnosing the problem you're encountering.
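If you do go the route of providing your own allocators, a rough sketch in plain C of recording where each pointer was last freed might look like this (the table size, names and linear scan are arbitrary illustration choices):
#include <stdio.h>
#include <stdlib.h>

#define FREE_LOG_SIZE 1024     /* arbitrary ring-buffer size */

struct free_record {
    void       *ptr;
    const char *file;
    int         line;
};

static struct free_record free_log[FREE_LOG_SIZE];
static size_t free_log_next;

void debug_free(void *p, const char *file, int line)
{
    struct free_record *r = &free_log[free_log_next++ % FREE_LOG_SIZE];
    r->ptr  = p;
    r->file = file;
    r->line = line;
    free(p);
}

/* call this from a crash handler or the debugger to ask "who freed this last?" */
void report_last_free(void *p)
{
    size_t i;
    for (i = 0; i < FREE_LOG_SIZE; i++)
        if (free_log[i].ptr == p && free_log[i].file != NULL)
            printf("%p last freed at %s:%d\n", p, free_log[i].file, free_log[i].line);
}

/* route every free() in the project through the wrapper */
#define free(p) debug_free((p), __FILE__, __LINE__)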
