Fatal error disappears when running under gdb (C)

I have a program that hits a fatal error on one test case. From the log and the stack trace of the fatal error I can locate the problem: it is a read through a null pointer.
But when I attach gdb to it and set a breakpoint near the suspicious code, the null pointer simply cannot be observed! The program runs smoothly without any error.
This is a single-process, single-threaded program, and I have not experienced this kind of thing before. Can anyone give me some comments? Thanks.
Appended: I also tried calling pause() just before the code that triggers the fatal error, expecting the program to sleep there so that I could attach gdb on the fly. Sadly, no fatal error occurred.

It's only guesswork without looking at the code, but debuggers sometimes do this:
They initialize certain things for you.
They change the timing of operations.
I don't have a quote about GDB, but I do have one about valgrind (granted, the two do wildly different things):
My program crashes normally, but doesn't under Valgrind, or vice versa. What's happening?
When a program runs under Valgrind, its environment is slightly different to when it runs natively. For example, the memory layout is different, and the way that threads are scheduled is different.
Same would go for GDB.
Most of the time this doesn't make any difference, but it can, particularly if your program is buggy.
So the true problem is likely in your program.

Several things could be happening. The timing of the application can change. In a multithreaded application you might, for example, set the ready flag first and only then copy the data into the buffer; without a debugger attached, the other thread might access the buffer before it is filled or before some pointer is set.
It is also possible that the application has anti-debug functionality, so that the piece of code is never reached when running inside a debugger.
One way to analyze it is with a core dump. Run ulimit -c unlimited, then start the application. Once the core is dumped, you can load it into gdb with gdb ./application ./core. You can find a useful write-up here: http://www.ffnn.nl/pages/articles/linux/gdb-gnu-debugger-intro.php
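If changing the shell's limit is inconvenient, the program can raise its own core-file limit at startup. The following is only a sketch, assuming a POSIX system; enable_core_dumps is a name made up for this example, and where the core file ends up still depends on the system's configuration.

```c
#include <sys/resource.h>

/* Raise the soft core-file size limit to the hard limit. This matches
 * `ulimit -c unlimited` when the hard limit is itself unlimited.
 * Returns 0 on success, -1 on failure (errno is set by the failing call).
 * enable_core_dumps is a hypothetical helper name, not a library function. */
int enable_core_dumps(void)
{
    struct rlimit lim;

    if (getrlimit(RLIMIT_CORE, &lim) != 0)
        return -1;

    lim.rlim_cur = lim.rlim_max;   /* the soft limit may go up to the hard limit */
    return setrlimit(RLIMIT_CORE, &lim);
}
```

Calling it first thing in main() means the crash still happens without a debugger attached, and the resulting core can be inspected afterwards with gdb ./application ./core as described above.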

If it is an invalid read on a pointer, then unpredictable behaviour is possible. Since you already know what is causing the fault, you should get rid of it asap. In general, expect the unexpected when dealing with faulty pointer operations.

Related

Program does not segfault in gdb [duplicate]

I have a multithreaded C program, which consistently generates a segmentation fault at a specific point in the program. When I run it with gdb, no fault is shown. Can you think of any reason why the fault might occur only when not using the debugger? It's pretty annoying not being able to use it to find the problem!
Classic Heisenbug. From Wikipedia:
Time can also be a factor in heisenbugs. Executing a program under control of a debugger can change the execution timing of the program as compared to normal execution. Time-sensitive bugs such as race conditions may not reproduce when the program is slowed down by single-stepping source lines in the debugger. This is particularly true when the behavior involves interaction with an entity not under the control of a debugger, such as when debugging network packet processing between two machines and only one is under debugger control.
The debugger may be changing timing, and hiding a race condition.
On Linux, GDB also disables address space randomization, and your crash may be specific to address space layout. Try (gdb) set disable-randomization off.
Finally, ulimit -c unlimited and post-mortem debugging (already suggested by Robie) may work.
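To check whether address space layout really differs between the native run and the gdb run, one option is to print a few addresses at startup and compare them. This is just an illustrative sketch; print_layout and the marker variables are names invented for this example.

```c
#include <stdio.h>
#include <stdlib.h>

static int global_marker;   /* lives in the data/BSS area of the program */

/* Print one address from the data area, the stack, and the heap so that
 * two runs can be compared side by side.
 * Returns 0 on success, -1 if the heap allocation failed.
 * print_layout is a hypothetical helper name. */
int print_layout(FILE *out)
{
    int stack_marker = 0;
    void *heap_marker = malloc(16);

    if (heap_marker == NULL)
        return -1;

    fprintf(out, "data : %p\n", (void *)&global_marker);
    fprintf(out, "stack: %p\n", (void *)&stack_marker);
    fprintf(out, "heap : %p\n", heap_marker);

    free(heap_marker);
    return 0;
}
```

With randomization enabled the stack and heap addresses usually change from one native run to the next, while under gdb with its default settings they tend to stay the same. If the crash only shows up for some layouts, that points towards a stray pointer or an overflow rather than timing.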
Perhaps under gdb memory is mapped at a location where your overflow or underflow does not trample on anything that causes a crash. Or it could be a race condition that is no longer getting tripped. Although it sounds counterintuitive, you should be happy your program was nice enough to crash on you.
Some suggestions:
Try a static code analyzer such as the free cppcheck.
Try a malloc() debugger like libefence.
Try running it through valgrind.
By debugging it you are changing the environment that it is running in. It sounds like you are dealing with some sort of race condition, and by debugging it things are scheduled slightly differently so you don't encounter the issue. That, or things are being stored in a slightly different way so it doesn't occur. Are you able to put some debugging output in the code to assist in figuring out the problem? That may have less of an impact and allow you to find your issue.
I have totally had this problem before! It was a race condition, and when I was stepping through the code with a debugger the thread I was in was slow enough to not trigger the race condition. Pretty awful.
If you're using gcc, try the -Wall option to enable most of the common warnings (despite the name it does not enable all of them; -Wextra adds more). If you use an IDE like Eclipse, it would do that automatically.

C malloc "can't allocate region" error, but can't repro with GDB?

How can I debug a C application that does not crash when attached with gdb and run inside of gdb?
It crashes consistently when run standalone - even the same debug build!
A few of us are getting this error with a C program written for BSD/Linux, and we are compiling on macOS with OpenSSL.
app(37457,0x7000017c7000) malloc: *** mach_vm_map(size=13835058055282167808) failed (error code=3)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
ERROR: malloc(buf->length + 1) failed!
I know, not helpful.
Recompiling the application with -g -rdynamic gives the same error. Ok, so now we know it isn't because of a release build as it continues to fail.
It works when running within a gdb debugging session though!!
$ sudo gdb app
(gdb) b malloc_error_break
Function "malloc_error_break" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 (malloc_error_break) pending.
(gdb) run -threads 8
Starting program: ~/code/app/app -threads 8
[New Thread 0x1903 of process 45436]
warning: unhandled dyld version (15)
And it runs for hours. CTRL-C, and run ./app -threads 8 and it crashes after a second or two (a few million iterations).
Obviously there's an issue within one of the threads. But those workers for the threads are pretty big (a few hundred lines of code). Nothing stands out.
Note that the threads iterate over loops of about 20 million per second.
macOS 10.12.3
Homebrew w/GNU gcc and openssl (linking to crypto)
P.S. I'm not too familiar with C, especially any type of debugging. Be kind and expressive/verbose in answers. :)
One debugging technique that is sometimes overlooked is to include debug prints in the code. Of course it has its disadvantages, but it also has advantages. One thing you must keep in mind in the face of abnormal termination is to make sure the printouts actually get printed. Often it's enough to print to stderr (but if that doesn't do the trick, you may need to fflush the stream explicitly).
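A debug print along those lines could look like the sketch below. debug_log is a name invented for this example; it simply wraps vfprintf on stderr and flushes afterwards.

```c
#include <stdarg.h>
#include <stdio.h>

/* Print a printf-style message to stderr and flush right away, so the
 * text is not left sitting in a buffer if the process dies afterwards.
 * Returns the number of characters written, or a negative value on error.
 * debug_log is a hypothetical helper name. */
int debug_log(const char *fmt, ...)
{
    va_list args;
    int written;

    va_start(args, fmt);
    written = vfprintf(stderr, fmt, args);
    va_end(args);

    fflush(stderr);   /* stderr is normally unbuffered; this covers the case where it is not */
    return written;
}
```

A call such as debug_log("length=%zu\n", buf->length) just before the failing malloc would show whether the length is already garbage at that point (buf->length here is taken from the error message in the question).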
Another trick is to stop the program before the error occurs. This requires you to know when the program is about to crash, preferably as close as possible. You do this by using raise:
raise(SIGSTOP);
This does not terminate the program, it just suspends execution. Now you can attach with gdb using the command gdb <program-name> <pid> (use ps to find the pid of the process). Now in gdb you have to tell it to ignore SIGSTOP:
(gdb) handle SIGSTOP ignore
Then you can set breakpoints. You can also step out of the raise function using the finish command (it may have to be issued several times to return to your code).
This technique lets the program behave normally up to the point where you decide to stop it. Hopefully, running the final part under gdb does not alter the behaviour enough to hide the bug.
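As a rough sketch of the raise(SIGSTOP) approach, the stop can be wrapped in a small helper that also prints the pid to attach to. stop_for_debugger and its enabled parameter are made up for this example; the caller decides when stopping is wanted (for instance from a command-line switch).

```c
#include <signal.h>
#include <stdio.h>
#include <unistd.h>

/* Suspend the process so a debugger can be attached, but only when the
 * caller asks for it. Returns 1 if the process was stopped and later
 * resumed, 0 if nothing was done.
 * stop_for_debugger is a hypothetical helper name. */
int stop_for_debugger(int enabled)
{
    if (!enabled)
        return 0;

    fprintf(stderr, "pid %ld stopped, attach with: gdb <program-name> %ld\n",
            (long)getpid(), (long)getpid());
    fflush(stderr);

    raise(SIGSTOP);   /* execution continues here once the process is resumed */
    return 1;
}
```

Place the call as close as possible before the suspected crash site, e.g. stop_for_debugger(1) just before the allocation that fails, and pass 0 in normal runs so the program is not suspended.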
A third option is to use valgrind. Errors of this kind usually involve underlying faults that valgrind will pick up, such as out-of-range accesses and uses of uninitialised variables.
Many memory managers initialise memory to a known bad value to expose problems like this. For example, Microsoft's CRT uses a range of values: 0xCD means uninitialised, 0xDD means already freed, etc.
After each use of malloc, try memset'ing the memory to 0xCD (or some other constant value). This will let you identify uninitialised memory more easily in the debugger. Don't use 0x00, as this is a 'normal' value and will be harder to spot if it's wrong (it will also probably 'fix' your problem).
Something like:
void *memory = malloc(sizeof(my_object));
memset(memory, 0xCD, sizeof(my_object));
If you know the size of the blocks, you could do something similar before free (this is sometimes harder unless you know the size of your objects, or track it in some way):
memset(memory, 0xDD, sizeof(my_object));
free(memory);
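One way to "track it in some way" is to store the size in a small header in front of each block. The wrappers below are only a sketch under that assumption; debug_malloc, debug_free and block_header are names invented here, and the fill values follow the 0xCD/0xDD convention mentioned above. It assumes a C11 compiler for max_align_t.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Header stored in front of every block so debug_free() knows the size.
 * The union pads the header so the user pointer stays suitably aligned. */
typedef union {
    size_t size;
    max_align_t align;
} block_header;

/* Allocate `size` bytes and fill them with 0xCD ("uninitialised").
 * Returns NULL on failure, like malloc. Hypothetical helper name. */
void *debug_malloc(size_t size)
{
    block_header *hdr;

    if (size > SIZE_MAX - sizeof *hdr)   /* header + size would overflow */
        return NULL;

    hdr = malloc(sizeof *hdr + size);
    if (hdr == NULL)
        return NULL;

    hdr->size = size;
    memset(hdr + 1, 0xCD, size);
    return hdr + 1;                      /* user data starts after the header */
}

/* Fill the block with 0xDD ("already freed"), then release it.
 * Only valid for pointers returned by debug_malloc(). */
void debug_free(void *ptr)
{
    block_header *hdr;

    if (ptr == NULL)
        return;

    hdr = (block_header *)ptr - 1;
    memset(ptr, 0xDD, hdr->size);
    free(hdr);
}
```

Pointers from debug_malloc must not be passed to plain free (and vice versa), since the real allocation starts at the header. An optimizing compiler may also drop the fill before free because nothing reads it afterwards, so this is mostly useful in unoptimized debug builds.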

Best way to print information when debugging a race condition

I am debugging an application to fix a segmentation fault that I suspect to be caused by a race condition.
I'd like to put some print statements in the code, but I know from experience that adding calls to printf is not recommended, since this could change the behavior of the threads and in some cases hide the bug.
Looking at other options, I have seen that with gdb it is possible to use break points to print something and then automatically continue the execution:
break foo
commands
silent
printf "Called foo: x is %d\n",x
cont
end
Is this any better than putting a printf in my code?
I know that gdb also has tracepoints, but they only work with gdbserver, and this is an additional level of complication that I would prefer to avoid at the moment.
Additional information: the application is written in C and it runs on Linux.
Is this any better than putting a printf in my code?
No, it's much worse. Every breakpoint that is hit in GDB triggers the following chain of events:
context switch from running thread to GDB
GDB stops all other threads (assuming default all-stop mode)
GDB evaluates breakpoint commands
GDB resumes all threads (this is itself a complicated multi-step process, which I would not go into here).
This is at least an order of magnitude more expensive and disruptive than a simple printf call, and is very likely to hide whatever race you were trying to debug.
The bottom line is that GDB is in general completely unsuitable for debugging data races.
I second the ThreadSanitizer recommendation by Christopher Ian Stern.
The only problem with this bug is that I am doing the debug on a production machine where I cannot install other SW.
First, ThreadSanitizer instruments your existing program. It has a runtime library, but that could be statically linked into your binary. There is nothing that you need to install on your production machine.
Second, ThreadSanitizer detects data races even when they do not cause visible behavior changes. It may well turn out that you don't need to run on your production machine at all: simply running your tests (you do have tests, right?) on your development machine may prove to be sufficient.
Since you are on Linux, I would recommend ThreadSanitizer. That means using a recent version of gcc or clang and passing -fsanitize=thread to the build. This isn't a printf replacement, but it should tell you explicitly about any race conditions in your code. Even after you solve this problem, if you are working with multithreaded code you will want to have this tool available. Alternatively, or in addition, I have had good results with Valgrind's (http://valgrind.org) data race detector, but I would start with ThreadSanitizer.
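To get a feel for what ThreadSanitizer reports, a tiny test case with a deliberate race can help. The sketch below is not from the question's code; run_workers, worker and the counter are invented for illustration. With with_lock set to 0 the increment is unsynchronised, which is the kind of access ThreadSanitizer should flag; with 1 the mutex protects it.

```c
#include <pthread.h>
#include <stddef.h>

#define ITERATIONS 100000

static long counter;
static int use_lock;
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

/* Each worker increments the shared counter ITERATIONS times. */
static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < ITERATIONS; i++) {
        if (use_lock)
            pthread_mutex_lock(&counter_lock);
        counter++;   /* a data race when use_lock == 0 */
        if (use_lock)
            pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

/* Run two workers and return the final counter value,
 * or -1 if a thread could not be created. Hypothetical helper name. */
long run_workers(int with_lock)
{
    pthread_t a, b;

    counter = 0;
    use_lock = with_lock;   /* set before the threads exist, so no race here */

    if (pthread_create(&a, NULL, worker, NULL) != 0)
        return -1;
    if (pthread_create(&b, NULL, worker, NULL) != 0) {
        pthread_join(a, NULL);
        return -1;
    }
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return counter;
}
```

Built with something like gcc -g -fsanitize=thread -pthread, a call to run_workers(0) should produce a data race report pointing at the counter++ line even if the final value happens to look right, while run_workers(1) should run clean.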

Ptrace mprotect debugging trouble

I'm having trouble with a research project.
What I am trying to do is use ptrace to watch the execution of a target process.
With the help of ptrace I inject an mprotect syscall into the target's code segment (similar to a breakpoint) and set the stack protection to PROT_NONE.
After that I restore the original instructions and let the target continue.
When I get an invalid-permission segfault, I inject the syscall again to unprotect the stack, then execute the instruction that caused the segfault, and then protect the stack again.
(This does indeed work for simple programs.)
My problem now is that with this setup the target crashes (pretty) randomly in library function calls, no matter whether I use dynamic or static linking.
By crashing I mean that it either tries to access memory which for some reason is not mapped, or it just hangs in the function __lll_lock_wait_private (that was following a malloc call).
Let me emphasise again that the crashes don't always happen and don't always happen at the same positions.
It sounds like a synchronisation problem, but as far as I can tell (meaning I looked into /proc/pid/task/) there is only one thread running.
So do you have any clue what could be the reason for this?
Please tell me your suggestions even if you are not sure; I am running out of ideas here ...
It's also possible the non-determinism is created by address space randomization.
You may want to disable that to try and make the problem more deterministic.
EDIT:
Given that turning address space randomization off 'fixes' the problem, the underlying problem might be one of these:
Somewhere the code treats 0 as invalid when it should be valid, or vice versa. (This is what I had.)
Addresses from one run are being used in a different run.

