gdb not giving reason for core dump - c

Typically when analyzing a core dump in gdb it'll print the reason why it was generated:
Core was generated by `executable'.
Program terminated with signal 11, Segmentation fault.
However, I've encountered a situation where gdb doesn't give a reason:
Core was generated by `executable'.
I'm wondering what could cause a core dump where gdb doesn't give the reason for its generation.

It turns out that this core file was generated using gcore, so there wasn't actually a problem with the executable. /facepalm
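For reference, such a core can be produced on demand from a running process; a minimal sketch, assuming the process id is 1234 and the binary is named executable:
$ gcore -o snapshot 1234
$ gdb ./executable snapshot.1234
gcore attaches to the process and dumps its image without killing it, so there is no fatal signal for gdb to report after the "Core was generated by" line.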

Related

Coredump GDB "Backtrace stopped: frame did not save the PC"

While trying to analyse the backtrace of a coredump (process dumped by a SIGABRT from assert) in GDB I get the following output:
(gdb) bt
#0 0x76d6bc54 in raise () from ./lib/libc.so.1
#1 0x76d63bb8 in abort () from ./lib/libc.so.1
Backtrace stopped: frame did not save the PC
(gdb) thread apply all bt
The binary is compiled with "-g", as are all linked libraries except the ones from the toolchain (e.g. libc, which doesn't even have symbols), and I can't determine how those were built.
Is this stack corruption, or is it a consequence of libc being compiled with something like "-fomit-frame-pointer"?
As a general question: if an uncaught exception happens in a runtime-linked library and that library wasn't built for debugging, what happens? Can the coredump still contain useful information?
Thanks
I think the culprit was the libc the application was loading. It was probably compiled with options that made the coredump useless. What I did was create a custom toolchain (I used one built from buildroot) and compile and run the application with that toolchain. I was then able to read the coredump successfully.
One way to improve a backtrace which includes functions from shared objects without debug symbols is to install debug symbols where gdb can see them. The details of how to do that depend on your environment. One example is that if libc.so.6 is provided by the libc6 package on a Debian system, installing the libc6-dbg package places a number of symbol tables underneath /usr/lib/debug/.build-id (the libc6 package, in addition to libc.so.6, provides a number of other stripped shared objects). If you're using a debugger environment for a non-native core (as suggested by the leading . in ./lib/libc.so.1) you might extract such a package rather than installing it (on a Debian system dpkg -x is one way to do that).
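As a rough sketch of that non-native workflow (the package file name and the paths are placeholders, not values taken from the question):
$ dpkg -x libc6-dbg_<version>_<arch>.deb extracted-dbg
$ gdb
(gdb) set sysroot .
(gdb) set debug-file-directory ./extracted-dbg/usr/lib/debug
(gdb) file ./executable
(gdb) core-file ./core
(gdb) bt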
Aside from the issue of debug symbols, in some cases you can improve a backtrace by ensuring that the shared objects (stripped or otherwise) seen by gdb correctly correspond to the shared objects which were in use by the process which dumped the core. One way to check that is to compare build IDs, which are (on a typical Linux system) reported by the file command. This only helps if you can reliably determine which shared objects were in use at the time of the core dump, and it presumes your shared objects were built in such a way that they include build IDs.
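For example, a build ID check might look like this (the path and the truncated output are illustrative only):
$ file ./lib/libc.so.1
./lib/libc.so.1: ELF 32-bit LSB shared object, ARM, ... BuildID[sha1]=..., stripped
The same BuildID should show up for the copy of the library that was mapped into the crashed process; a mismatch means gdb is reading code the process never ran.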
In some situations the build IDs of the executable and all of the relevant shared objects can be reliably extracted from the core file itself. On a Linux system, this requires the presence of a file note in the core, and the presence of the first page of the executable and each shared object. Recent Linux kernels configured with typical defaults include all of those.
https://github.com/wackrat/structer provides python code which extracts build IDs from a core file which satisfies its assumptions. Depending on how big the core file is it might be preferable to use a 64 bit system, even if the core itself came from a 32 bit system.
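If elfutils is available, eu-unstrip can report similar information directly from such a core; it prints one line per module with its address range and build ID:
$ eu-unstrip -n --core=./core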
If it turns out that gdb is using the correct shared objects for this core (or if there is no viable way to confirm or refute that), another possibility is to disassemble code in the two stack frames reported by gdb. If the shared objects gdb sees are not the right ones for this core, that disassembly is likely to be mysterious, because gdb relies on the contents of the shared objects it uses to line up with the contents of that location at the time the core file was dumped (readonly segments are typically excluded from the core file, with the exception of that first page which provides each build ID). In my experience gdb can typically provide a coherent backtrace without debug symbols even without a frame pointer, but if the wrong shared object is used, gdb might be basing its backtrace on instructions which do not correspond to the correct contents of that location.
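Concretely, that inspection can be done with something like the following (the addresses will differ; these are just the commands):
(gdb) frame 0
(gdb) x/16i $pc - 20
(gdb) frame 1
(gdb) x/16i $pc - 20
(gdb) info symbol $pc
If the instructions leading up to $pc don't look like a plausible tail of raise() or abort(), that's a hint that the shared objects gdb loaded don't match the ones the process was using.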

Why is a segmentation fault occurring in the kernel?

I am learning OS kernel development and am still at a very beginner level. I have written a bit of code for the 80386 processor and am testing it on qemu, using gdb as a debugger (remote debugging).
Now a strange error is occurring: when I run the code in qemu it runs fine, but when I run it and connect it to gdb, gdb reports a segmentation fault at a particular line.
My question is: how can a segmentation fault occur in the OS kernel when I am currently running in real mode and haven't even enabled memory protection? Also, if there is a mechanism by which the segmentation fault is generated, why does the kernel run fine in qemu?
The segfault is raised by the hardware, not the OS. So yes, you can still get segfaults, but segfaults are some of the easier bugs to fix.
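As an aside, when remote-debugging 16-bit real-mode code it helps to make sure gdb knows what it is looking at, since its default assumptions (32-bit protected mode) don't match real mode and can confuse both breakpoints and the reported stop reason. A typical session might look like this, with the image name as a placeholder and 0x7c00 being the usual BIOS load address for a boot sector:
$ qemu-system-i386 -s -S -drive format=raw,file=kernel.img
$ gdb
(gdb) set architecture i8086
(gdb) target remote localhost:1234
(gdb) break *0x7c00
(gdb) continue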

How to simulate slowdown in gdb?

I have a program with concurrency, and when I run it normally, everything works fine. However, when using valgrind, I run into (probably deadlock) issues, most likely because valgrind causes the program to run much slower. I tried debugging the program using gdb, but again I couldn't reproduce the error. I was wondering if there was some way to make gdb run slower so that I could potentially reproduce and find the bug. The program is run on a remote server.
The program is huge with loads of concurrency, so pure code analysis isn't very realistic at this point.
when using valgrind, I run into (probably deadlock) issues,
There is a way to analyze your problem when it happens while running your program under valgrind. Use the gdbserver feature in valgrind to analyze this deadlock: run your program under valgrind, reproduce the deadlock, then attach gdb and investigate it. This is from the valgrind documentation:
A program running under Valgrind is not executed directly by the CPU.
Instead it runs on a synthetic CPU provided by Valgrind. This is why a
debugger cannot debug your program when it runs on Valgrind.
This section describes how GDB can interact with the Valgrind
gdbserver to provide a fully debuggable program under Valgrind.
So you need to run your program under valgrind in this way:
valgrind --vgdb=yes --vgdb-error=0 prog
And as soon as you get a deadlock, attach to it under gdb according to instructions here: http://valgrind.org/docs/manual/manual-core-adv.html#manual-core-adv.gdbserver.
If you indeed have a deadlock, you will then want to run thread apply all backtrace in gdb.
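In concrete terms, the gdb side of that looks roughly like this (prog standing in for your binary):
$ gdb ./prog
(gdb) target remote | vgdb
(gdb) thread apply all backtrace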
You could try to use the helgrind valgrind plugin. It is a thread-error detector and can help you to discover inconsistent lock ordering (source of deadlocks).
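For example (prog again standing in for your binary):
$ valgrind --tool=helgrind ./prog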
Another way is to place sleep calls inside your critical sections in the source code, but that's a little dirty.
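A minimal sketch of that idea in C (the lock name and the 5 ms delay are arbitrary):
#include <pthread.h>
#include <unistd.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

void critical_section(void)
{
    pthread_mutex_lock(&lock);
    usleep(5000);   /* widen the race window so the hang reproduces without valgrind */
    /* ... the original critical-section code ... */
    pthread_mutex_unlock(&lock);
}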
Another way to run gdb and valgrind simultaneously, suggested by valgrind at startup, though I didn't notice it at first:
==16218== TO DEBUG THIS PROCESS USING GDB: start GDB like this
==16218== /path/to/gdb ./ha3_qdqx.exe
==16218== and then give GDB the following command
==16218== target remote | /path/to/valgrind/bin/vgdb --pid=16218
==16218== --pid is optional if only one valgrind process is running

Bus error disappears in gdb

I have a large program in C that compiles fine, but when I run it (./a.out) I get: Bus error 10!
I used the gdb debugger to trace the memory error, but the strange thing is that the program completes normally inside gdb. Can this behaviour somehow be explained, and how am I going to debug my code now?
On some operating systems gdb will load the program differently than a normal run. I know that on MacOS gdb disables some address space layout randomization, which changes how relocation of shared libraries is done. On some operating systems gdb will load more sections than a normal program execution, or will load those sections with wider permissions (non-executable memory might be executable under gdb, or read-only memory might become writeable).
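On Linux at least, one of those differences can be flipped back; it may be worth re-enabling address space randomization inside gdb and seeing whether the bus error reappears:
(gdb) set disable-randomization off
(gdb) run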
Your best bet is to catch a core dump of the problem and continue debugging from there. Valgrind is also good at catching this type of bug.

Cannot reproduce segfault in gdb

I'm getting segfaults when I run my project. Every time I run the program in gdb, the segfaults disappear. This behavior is not random: each time I run it in my shell it segfaults, and each time I run it in gdb the segfaults disappear. (I did recompile using -g.)
So before I start adding printfs frantically everywhere in my code, I would like to know a few things:
Is this behavior common?
What's the best way to approach the issue?
I don't know if tests can be scripted since my application is interactive and crashes on a particular user input.
I didn't paste my code here because it'd be way too long. But if anyone is interested in helping out, here it is:
https://github.com/rahmu/Agros
The easiest way to figure it out is to capture core dumps:
$ ulimit -c unlimited
Then run your program. It will generate a core file.
Then use gdb:
$ gdb ./program core
And gdb will load the core file, and you can run a backtrace (bt) to see exactly what operation elicited the segfault.
Does it do a core dump? If so, load up the core dump in the debugger. Otherwise, change the code to get it to do a core dump.
My guess is that it's a concurrency problem: a reference is being freed out from under a method call that assumes the pointer it holds will stay valid. The reason gdb is probably masking this is that running under gdb changes thread scheduling and adds overhead, which alters the timing that triggers the bug. As mentioned by Ed, just make your application core dump; you can then open up the core in GDB and check the stack.
Is this behavior common?
Yes. Undefined behaviour is the source of most of these problems, and by definition it is undefined. Recompiling with -g may certainly affect the results. Recompiling at all may change the results, if the compiler uses some pseudo-random genetic algorithm to optimize stuff or something like that.
What's the best way to approach the issue?
An ounce of prevention is worth a ton of cure; learn the common causes of undefined behaviour and pick up good habits to avoid writing them. Once you've found that there is a problem, static analysis of the code is often a good idea; go through and reason to yourself and prove that indices will stay in bounds, data will fit its arrays, invalid pointers won't be dereferenced etc.
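As one illustration of the kind of defect to look for, here is a classic out-of-bounds write, which may crash outside a debugger yet appear to work inside one, since the environment and stack layout differ:
#include <string.h>

void greet(const char *name)
{
    char buf[8];
    strcpy(buf, name);   /* no bounds check: a long name overwrites adjacent stack memory */
    /* ... */
}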
