I have a large program in C that compiles fine, but when I run it (./a.out) I get: Bus error 10!
I used gdb to trace the memory error, but the strange thing is that the program completes normally inside gdb. Can this behaviour somehow be explained, and how am I supposed to debug my code now?
On some operating systems gdb loads the program differently than a normal execution does. I know that on macOS gdb disables address space layout randomization (ASLR), which changes how shared libraries are relocated. On some operating systems gdb also loads more sections than a normal program execution, or loads those sections with wider permissions (non-executable memory might be executable under gdb, or read-only memory might become writeable).
Your best bet is to catch a core dump of the problem and continue debugging from there. Valgrind is also good at catching this type of bug.
Related
Currently I'm trying to run qemu-system-arm for armv7 architecture, do some initial setup for paging and then enable MMU.
I run qemu with gdb stub and connect to it then with gdb.
I must have screwed something up with the translation tables/registers/etc. The thing is, the minute I set the MMU-enable bit in the control register, gdb can't fetch data from memory anymore: after an ni command that executes the MMU-enabling instruction, it doesn't fetch the next instruction and I can't access memory.
Is there any way to see what happens inside QEMU's MMU: where it takes the translation tables from, what it calculates, and so on?
Or should I just recompile it with my additional debug output?
No, there's no way to trace this without modifying QEMU's sources yourself
So I did. For ARM architecture, the relevant code is found in target-arm/helper.c - get_phys_addr* functions.
No, there's no way to trace this without modifying QEMU's sources yourself to add debugging output. This is a specific case of a more general tendency, which is that QEMU's design and approach is largely "run correct code quickly", not to provide detailed introspection into the behaviour of possibly buggy guests. Sometimes, as in this case, there's a nice easy location to add your own debug printing; sometimes, as in the fairly common desire to print all the memory accesses made by the guest, there is nowhere in the C code where tracing can be put to catch all accesses.
When I used QEMU to debug VM issues in an operating system kernel that I had built with a colleague, I ended up attaching GDB to QEMU itself, rather than to the guest running inside QEMU.
You can place breakpoints on the MMU table walking function and step through it.
I have a program with concurrency, and when I run it normally, everything works fine. However, when using valgrind, I run into (probably deadlock) issues, most likely because valgrind causes the program to run much slower. I tried debugging the program using gdb, but again I couldn't reproduce the error. I was wondering if there was some way to make gdb run slower so that I could potentially reproduce and find the bug. The program is run on a remote server.
The program is huge with loads of concurrency, so pure code analysis isn't very realistic at this point.
when using valgrind, I run into (probably deadlock) issues,
There is a way to analyze your problem when it happens while running your program under valgrind: use valgrind's gdbserver feature. Run your program under valgrind, wait for the deadlock, then attach gdb and investigate it. From the valgrind documentation:
A program running under Valgrind is not executed directly by the CPU.
Instead it runs on a synthetic CPU provided by Valgrind. This is why a
debugger cannot debug your program when it runs on Valgrind.
This section describes how GDB can interact with the Valgrind
gdbserver to provide a fully debuggable program under Valgrind.
So you need to run your program under valgrind in this way:
valgrind --vgdb=yes --vgdb-error=0 prog
And as soon as you get a deadlock, attach to it under gdb according to instructions here: http://valgrind.org/docs/manual/manual-core-adv.html#manual-core-adv.gdbserver.
If you do indeed have a deadlock, you will then want to run thread apply all backtrace in gdb.
You could try the helgrind valgrind tool. It is a thread-error detector and can help you discover inconsistent lock ordering (a common source of deadlocks).
Another way to do this is to place sleep calls inside the critical sections of your source code to perturb the timing. But it's a little dirty.
Another way to run gdb and valgrind simultaneously, suggested by valgrind at startup, though I didn't notice it at first:
==16218== TO DEBUG THIS PROCESS USING GDB: start GDB like this
==16218== /path/to/gdb ./ha3_qdqx.exe
==16218== and then give GDB the following command
==16218== target remote | /path/to/valgrind/bin/vgdb --pid=16218
==16218== --pid is optional if only one valgrind process is running
I'm new to embedded programming but I have to debug a quite complex application running on an embedded platform. I use GDB through a JTAG interface.
My program crashes at some point in an unexpected way. I suppose this happens due to some memory-related issue. Does GDB allow me to inspect the memory after the system has crashed and become completely unresponsive?
It depends on your setup a bit. In particular, since you're using JTAG, you may be able to set your debugger up to halt the processor when it detects an exception (for example accessing protected memory illegally and so forth). If not, you can replace your exception handlers with infinite loops. Then you can manually unroll the exception to see what the processor was doing that caused the crash. Normally, you'll still have access to memory in that situation and you can either use GDB to look around directly, or just dump everything to a file so you can look around later.
It depends on what has crashed. If the system is only unresponsive (in some infinite loop, deadlock or similar), then it will normally respond to GDB and you will be able to see a backtrace (call stack), etc.
If the system/bus/cpu has actually crashed (on lower level), then it probably will not respond. In this case you can try setting breakpoints at suspicious places/variables and observe what is happening. Also simulator (ISS, RTL - if applicable) could come handy, to compare behavior with HW.
I got this problem on different C projects when using gdb.
If I run my program without it, it crashes consistently at a given event, probably because of an invalid read of memory. I try debugging it with gdb, but when I do so, the crash never seems to occur!
Any idea why this could happen?
I'm using mingw toolchain on Windows.
Yes, it sounds like a race condition, heap corruption or something else that is typically responsible for Heisenbugs. The problem is that your code is likely incorrect somewhere, but the debugger has to behave even if the debugged application does funny things, so problems tend to disappear under it. Race conditions in particular often won't appear in the first place, because some debuggers can only handle one thread at a time, and all debuggers make the code run slower, which may already be enough to make a race go away.
Try Valgrind on the application. Since you are using MinGW, chances are your application will also compile in an environment where Valgrind can run (it doesn't run directly on Windows). I've been using Valgrind for about three years now and it has quickly solved a lot of mysteries. Whenever I get a crash report for the code I work on (which runs on AIX, Solaris, the BSDs, Linux and Windows), the first thing I do is a test run of the code under Valgrind on x64 and x86 Linux respectively.
Valgrind, and in your particular case its default tool Memcheck, emulates the code as it runs. Whenever you allocate memory, it marks all bytes in that block as "tainted" until you actually initialize them explicitly. The tainted status of memory bytes is inherited when you memcpy uninitialized memory around, and leads to a report from Valgrind as soon as an uninitialized byte is used to make a decision (if, for, while ...). Memcheck also keeps track of orphaned memory blocks and reports leaks at the end of the run. And that's not all: more tools in the Valgrind family test other aspects of your code, including race conditions between threads (Helgrind, DRD).
Assuming Linux now: make sure that you have all the debug symbols of your supporting libraries installed. Usually those come in the *-debug version of packages or in *-devel. Also, make sure to turn off optimization in your code and include debug symbols. For GCC that's -ggdb -g3 -O0.
Another hint: I've had pointer aliasing cause some grief. Although Valgrind was able to help me track it down, I actually had to do the last step and verify the generated code in its disassembly. It turned out that at -O3 the GCC optimizer got ahead of itself and turned a loop copying bytes into a sequence of instructions copying 8 bytes at once, but assumed alignment. That assumption about alignment was wrong. Ever since, we've resorted to building at -O2 - which, as you will see in this Gentoo Wiki article, is not the worst idea. To quote the relevant part:
-O3: This is the highest level of optimization possible, and also the riskiest. It will take a longer time to compile your code with this option, and in fact it should not be used system-wide with gcc 4.x. The behavior of gcc has changed significantly since version 3.x. In 3.x, -O3 has been shown to lead to marginally faster execution times over -O2, but this is no longer the case with gcc 4.x. Compiling all your packages with -O3 will result in larger binaries that require more memory, and will significantly increase the odds of compilation failure or unexpected program behavior (including errors). The downsides outweigh the benefits; remember the principle of diminishing returns. Using -O3 is not recommended for gcc 4.x.
Since you are using GCC in MinGW, I reckon this could well apply to your case as well.
Any idea why this could happen?
There are several usual reasons:
Your application has multiple threads, has a race condition, and running under GDB affects timing in such a way that the crash no longer happens
Your application has a bug that is affected by memory layout (often reading of uninitialized memory), and the layout changes when running under GDB.
One way to approach this is to have the application trap whatever unhandled exception is killing it, print a message, and spin forever. Once it is in that state, you should be able to attach GDB to the process and debug from there.
Although it's a bit late: one can read this question's answer in order to set up a system that catches a core dump without using gdb. You can then load the core file using
gdb <path_to_core_file> <path_to_executable_file>
and then issue
thread apply all bt
in gdb.
This will show stack traces for all threads that were running when the application crashed, and one may be able to locate the last function and the corresponding thread that caused the illegal access.
Your application is probably receiving signals, and gdb might not pass them on, depending on its configuration. You can check this with the info signals or info handle commands. It might also help to post a stack trace of the crashed process. The crashed process should generate a core file (if that hasn't been disabled), which can be analyzed with gdb.
Hi, I'm currently working on a project in Linux written in C.
The app has several processes that share a block of shared memory. When the app runs for several hours, one process collapses without leaving any footprints, so it's very difficult to know what the problem was or where to start reviewing the code.
Well, it could be a memory overflow or pointer misuse, but I don't know exactly.
Do you have any tools or methods to detect the problem?
It will be much appreciated if this gets resolved. Thanks for your advice.
Before you start the program, enable core dumps:
ulimit -c unlimited
(and make sure the working directory of the process is writeable by the process)
After the process crashes, it should leave behind a core file, which you can then examine with gdb:
gdb /some/bin/executable core
Alternatively, you can run the process under gdb when you start it - gdb will wake up when the process crashes.
You could also run gdb with gdb-many-windows if you are using Emacs, which gives you better debugging options and lets you examine things like the stack, etc. This is much like the Visual Studio IDE.
Here is a useful link:
http://emacs-fu.blogspot.com/2009/02/fancy-debugging-with-gdb.html
Valgrind is where you need to go next. Chances are that you have a memory-misuse problem which is benign -- until it isn't. Run the program under valgrind and see what it says.
I agree with bmargulies -- Valgrind is absolutely the best tool out there to automatically detect incorrect memory usage. Almost all Linux distributions should have it, so just emerge valgrind or apt-get install valgrind or whatever your distro uses.
However, Valgrind is hardly the least cryptic tool in existence, and it usually only tells you where the program eventually ended up accessing memory incorrectly -- if you stored an incorrect array index in a variable and accessed it later, you will still have to figure that part out yourself. Especially when paired with a powerful debugger like GDB (the backtrace or bt command is your friend), however, Valgrind is an incredibly useful tool.
Just remember to compile with the -g flag (if you are using GCC, at least), or Valgrind and GDB will not be able to tell you where in the source the memory abuse occurred.