This is a mad-hack, but I am trying to deliberately cause a segfault at a particular point in execution, so valgrind will give me a stack trace.
If there is a better way to do this please tell me, but I would still be curious to know how to deliberately cause a segfault, and why my attempt didn't work.
This is my failed attempt:
long* ptr = (long *)0xF0000000;
ptr = 10;
I thought valgrind should at least pick up on that as an invalid write, even if it's not a segmentation violation. Valgrind says nothing about it.
Any ideas why?
EDIT
Answer accepted, but I still have some up-votes for any suggestions for a more sane way to get a stack trace...
Just call abort(). It's not a segfault but it should generate a core dump.
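For example, a minimal sketch (assuming core dumps are enabled, e.g. with ulimit -c unlimited):
#include <stdlib.h>

int main(void)
{
    /* abort() raises SIGABRT; with core dumps enabled this leaves a core
       file you can load into gdb to get a backtrace */
    abort();
}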
Are you missing a * as in *ptr = 10? What you have won't compile.
If it does, somehow, that of course won't cause a seg-fault, since you're just assigning a number. Dereferencing might.
Assuming dereferencing null on your OS results in a segfault, the following should do the trick:
inline void seg_fault(void)
{
volatile int *p = reinterpret_cast<volatile int*>(0);
*p = 0x1337D00D;
}
Sorry for mentioning the obvious but why not use gdb with a breakpoint and then use backtrace?
(gdb) b somewhere
(gdb) r
(gdb) bt
As mentioned in other answers you can just call abort() if you want to abnormally terminate your program entirely, or kill(getpid(), SIGSEGV) if it has to be a segmentation fault. This will generate a core file that you can use with gdb to get a stack trace or debug even if you are not running under valgrind.
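For illustration, a small sketch of the signal-based variant (the helper name is made up here; raise(SIGSEGV) would work just as well):
#include <signal.h>
#include <unistd.h>

/* Hypothetical helper: deliver SIGSEGV to ourselves at a chosen point,
   producing a core file if core dumps are enabled */
static void force_segv(void)
{
    kill(getpid(), SIGSEGV);
}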
Using a valgrind client request you can also have valgrind dump a stack trace with your own custom message and then continue executing. The client request does nothing when the program is not running under valgrind.
#include <valgrind/valgrind.h>
...
VALGRIND_PRINTF_BACKTRACE("Encountered the foobar problem, x=%d, y=%d\n", x, y);
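A self-contained sketch of how such a call might sit in ordinary code (the function and the condition are made up for illustration; outside valgrind the macro is a no-op):
#include <valgrind/valgrind.h>

/* Illustrative only: dump a stack trace under valgrind when a suspicious
   state is detected, then keep running normally */
static void check_state(int x, int y)
{
    if (x > y)
        VALGRIND_PRINTF_BACKTRACE("Encountered the foobar problem, x=%d, y=%d\n", x, y);
}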
Wouldn't it be better to send SIGSEGV (signal 11) to the process to force a core dump?
Are you on x86? If so, then there is actually an opcode in the CPU that means "invoke whatever debugger may be attached". This is opcode 0xCC, more commonly known as int 3. The easiest way to trigger it is with inline assembly:
inline void debugger_halt(void)
{
#if defined(_MSC_VER)
    __asm int 3;              /* MSVC inline assembly (32-bit only) */
#elif defined(__GNUC__)
    asm volatile("int $3");   /* GCC/clang, AT&T syntax */
#else
#error Well, you'll have to figure out how to do inline assembly in your compiler
#endif
}
MSVC also supports __debugbreak() which is a hardware break in unmanaged code and an MSIL "break" in managed code.
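A minimal sketch of that intrinsic, assuming a Visual C++ build (it is declared in <intrin.h>):
#if defined(_MSC_VER)
#include <intrin.h>

void debugger_halt_msvc(void)
{
    __debugbreak();   /* breaks into the attached debugger */
}
#endif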
I'm writing my own test-runner for my current project. One feature (that's probably quite common with test-runners) is that every testcase is executed in a child process, so the test-runner can properly detect and report a crashing testcase.
I want to also test the test-runner itself, therefore one testcase has to force a crash. I know "crashing" is not covered by the C standard and just might happen as a result of undefined behavior. So this question is more about the behavior of real-world implementations.
My first attempt was to just dereference a null-pointer:
int c = *((int *)0);
This worked in a debug build on GNU/Linux and Windows, but failed to crash in a release build because the unused variable c was optimized out, so I added
printf("%d", c); // to prevent optimizing away the crash
and thought I was settled. However, trying my code with clang instead of gcc revealed a surprise during compilation:
[CC] obj/x86_64-pc-linux-gnu/release/src/test/test/test_s.o
src/test/test/test.c:34:13: warning: indirection of non-volatile null pointer
will be deleted, not trap [-Wnull-dereference]
int c = *((int *)0);
^~~~~~~~~~~
src/test/test/test.c:34:13: note: consider using __builtin_trap() or qualifying
pointer with 'volatile'
1 warning generated.
And indeed, the clang-compiled testcase didn't crash.
So, I followed the advice of the warning and now my testcase looks like this:
PT_TESTMETHOD(test_expected_crash)
{
PT_Test_expectCrash();
// crash intentionally
int *volatile nptr = 0;
int c = *nptr;
printf("%d", c); // to prevent optimizing away the crash
}
This solved my immediate problem, the testcase "works" (aka crashes) with both gcc and clang.
I guess because dereferencing the null pointer is undefined behavior, clang is free to compile my first code into something that doesn't crash. The volatile qualifier removes the compiler's ability to be sure at compile time that this really will dereference null.
Now my questions are:
Does this final code guarantee the null dereference actually happens at runtime?
Is dereferencing null indeed a fairly portable way for crashing on most platforms?
I wouldn't rely on that method as being robust if I were you.
Can't you use abort(), which is part of the C standard and is guaranteed to cause an abnormal program termination event?
The answer refering to abort() was great, I really didn't think of that and it's indeed a perfectly portable way of forcing an abnormal program termination.
Trying it with my code, I found that msvcrt (Microsoft's C runtime) implements abort() in a particularly chatty way; it outputs the following to stderr:
This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.
That's not so nice; at the very least it unnecessarily clutters the output of a complete test run. So I had a look at __builtin_trap(), which is also referenced in clang's warning. It turns out this gives me exactly what I was looking for:
LLVM code generator translates __builtin_trap() to a trap instruction if it is supported by the target ISA. Otherwise, the builtin is translated into a call to abort.
It's also available in gcc starting with version 4.2.4:
This function causes the program to exit abnormally. GCC implements this function by using a target-dependent mechanism (such as intentionally executing an illegal instruction) or by calling abort.
As this does something similar to a real crash, I prefer it over a simple abort(). For the fallback, it's still an option to try your own illegal operation like the null pointer dereference, but to also add a call to abort() in case the program somehow makes it there without crashing.
So, all in all, the solution looks like this, testing for a minimum GCC version and using the much more handy __has_builtin() macro provided by clang:
#undef HAVE_BUILTIN_TRAP
/* check __has_builtin first: clang also defines __GNUC__, but with a version
   number too old to pass the GCC check below */
#ifdef __has_builtin
# if __has_builtin(__builtin_trap)
#  define HAVE_BUILTIN_TRAP
# endif
#elif defined(__GNUC__)
# define GCC_VERSION (__GNUC__ * 10000 \
        + __GNUC_MINOR__ * 100 + __GNUC_PATCHLEVEL__)
# if GCC_VERSION > 40203
#  define HAVE_BUILTIN_TRAP
# endif
#endif
#ifdef HAVE_BUILTIN_TRAP
# define crashMe() __builtin_trap()
#else
# include <stdio.h>
# include <stdlib.h> /* for abort() */
# define crashMe() do { \
int *volatile iptr = 0; \
int i = *iptr; \
printf("%d", i); \
abort(); } while (0)
#endif
// [...]
PT_TESTMETHOD(test_expected_crash)
{
PT_Test_expectCrash();
// crash intentionally
crashMe();
}
You can write to memory instead of reading from it.
*((int *)0) = 0;
No, dereferencing a NULL pointer is not a portable way of crashing a program. It is undefined behavior, which means just that, you have no guarantees what will happen.
As it happens, under any of the three main OSes used on desktop computers today, namely MacOS, Linux and Windows NT (*), dereferencing a NULL pointer will for the most part immediately crash your program.
That said: "The worst possible result of undefined behavior is for it to do what you were expecting."
I purposely put a star beside Windows NT, because under Windows 95/98/ME, I can craft a program that has the following source:
int main()
{
int *pointer = NULL;
int i = *pointer;
return 0;
}
that will run without crashing. Compile it as a TINY mode .COM file under 16-bit DOS, and you'll be just fine.
Ditto running the same source with just about any C compiler under CP/M.
Ditto running that on some embedded systems. I've not tested it on an Arduino, but I would not want to bet either way on the outcome. I do know for certain that were a C compiler available for the 8051 systems I cut my teeth on, that program would run fine on those.
The program below should work. It might cause some collateral damage, though.
#include <string.h>
void crashme( char *str)
{
char *omg;
for(omg=strtok(str, "" ); omg ; omg=strtok(NULL, "") ) {
strcat(omg , "wtf");
}
*omg =0; // always NUL-terminate a NULL string !!!
}
int main(void)
{
char buff[20];
// crashme( "WTF" ); // works!
// crashme( NULL ); // works, too
crashme( buff ); // Maybe a bit too slow ...
return 0;
}
Can I detect a possible segmentation fault at compile-time?
I understand the circumstances of a segmentation fault. But I am curious if GCC as a compiler has some flags to check for the basic scenarios resulting in segmentation faults.
This would help enormously to take precautions before releasing a library.
Can I detect a possible segmentation fault at compile time?
Sometimes, but no, you can't flawlessly detect these scenarios at compile time. Consider the general case in this C code:
volatile extern int mem[];
void foo (int access)
{
mem[access];
}
A compiler would be too noisy if it were to warn about this access at compile time; the code is valid C, and a warning is, in general, inappropriate. Static analysis can't do anything with this code unless you have a mechanism for whole-program or link-time analysis.
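To illustrate (the file split and the array size are made up for this sketch), the out-of-bounds call only becomes apparent when the definition of mem and a caller of foo() are analyzed together with foo() itself, which ordinary per-file compilation never does:
/* elsewhere.c -- the actual definition, invisible while foo() is compiled */
volatile int mem[16];

/* caller.c -- passes an index that is out of bounds for mem[16]; neither
   translation unit on its own gives the compiler enough information to warn */
extern void foo(int access);

int main(void)
{
    foo(1000);   /* compiles cleanly, yet may well segfault at run time */
    return 0;
}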
An additional optimization flag in GCC 4.8 which can sometimes catch a few out-of-bounds accesses in loops is -faggressive-loop-optimizations. This found a number of issues in the SPEC benchmark suite last year (http://blog.regehr.org/archives/918).
I understand the circumstances of a segmentation fault. But I am curious if GCC as a compiler has some flags to check for the basic scenarios resulting in segmentation faults.
GCC 4.8 comes with an address sanitizer which can help catch some of these run-time-only issues (out-of-bounds and use-after-free bugs). You can use it with -fsanitize=address.
http://gcc.gnu.org/onlinedocs/gcc-4.8.2/gcc/Debugging-Options.html#Debugging-Options
GCC 4.9 (which will be released within the next few months) comes with an undefined behaviour sanitizer and more aggressive optimization of NULL pointer paths, which might help you catch some more issues. When it comes, it will be available with -fsanitize=undefined
http://gcc.gnu.org/onlinedocs/gcc/Debugging-Options.html#Debugging-Options
Note however that neither of these are "compile-time" solutions, they both rely on instrumenting the binary and performing run-time checks.
Yes, there are ways of detecting some faults that may cause runtime errors such as segmentation faults. Those ways are called warnings. Many warning messages point to places where you have undefined behavior, and undefined behavior is often the leading cause of runtime crashes.
When I build, I always use the -Wall, -Wextra and -pedantic flags.
Other than that, there is really no good way of detecting all places that may cause segmentation faults (or other runtime errors), except strict coding guidelines, code reviews and plenty of testing.
gcc -Wall -Werror, as mentioned by Joachim Pileborg, is a very good idea. You could also try another compiler; some report more memory issues. I don't think you can do much more at compile time.
At run time, I highly recommend Valgrind, which is an amazing tool for detecting memory issues (don't forget to compile with the -g option).
Can I detect a possible segmentation fault at compile-time?
Yes, it is possible. Unfortunately, it is very limited what the compiler can do. Here is a buggy code example and the output from gcc and clang:
#include <stdlib.h>
int main() {
int a[4];
int x, y;
a[5]=1;
if(x)
y = 5;
x = a[y];
int* p = malloc(3*sizeof(int));
p[5] = 0;
free(p);
free(p);
}
For this buggy code, gcc -Wall -Wextra corrupt.c gives
corrupt.c: In function ‘main’:
corrupt.c:13:1: warning: control reaches end of non-void function [-Wreturn-type]
corrupt.c:6:7: warning: ‘x’ is used uninitialized in this function [-Wuninitialized]
clang catches more:
corrupt.c:5:5: warning: array index 5 is past the end of the array (which contains 4 elements) [-Warray-bounds]
a[5]=1;
^ ~
corrupt.c:3:5: note: array 'a' declared here
int a[4];
^
corrupt.c:6:8: warning: variable 'y' is used uninitialized whenever 'if' condition is false [-Wsometimes-uninitialized]
if(x)
^
corrupt.c:8:11: note: uninitialized use occurs here
x = a[y];
^
corrupt.c:6:5: note: remove the 'if' if its condition is always true
if(x)
^~~~~
corrupt.c:4:13: note: initialize the variable 'y' to silence this warning
int x, y;
^
= 0
corrupt.c:6:8: warning: variable 'x' is uninitialized when used here [-Wuninitialized]
if(x)
^
corrupt.c:4:10: note: initialize the variable 'x' to silence this warning
int x, y;
^
= 0
3 warnings generated.
I believe the above code example gives you insight what to expect. (Even though I tried, I could not get the static analyzer in clang to work.)
This would help enormously to take precautions before releasing a library.
As you can see above, it won't be an enormous help, unfortunately. I can only confirm that instrumentation is currently the best way to debug your code. Here is another code example:
#include <stdlib.h>
int main() {
int* p = malloc(3*sizeof(int));
p[5] = 0; /* line 4 */
free(p);
p[1]=42; /* line 6 */
free(p); /* line 7 */
}
Compiled as clang -O0 -fsanitize=address -g -Weverything memsen.c. (GCC 4.8 also has an address sanitizer but I only have gcc 4.7.2.) The output:
==3476==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x60200000f004 at pc 0x4887a7 bp 0x7fff9544be30 sp 0x7fff9544be28
WRITE of size 4 at 0x60200000f004 thread T0
#0 0x4887a6 in main /home/ali/tmp/memsen.c:4
[...]
Awesome, we know what went wrong (heap-buffer-overflow) and where (in main /home/ali/tmp/memsen.c:4). Now, I comment out line 4 and get:
==3481==ERROR: AddressSanitizer: heap-use-after-free on address 0x60200000eff4 at pc 0x4887d7 bp 0x7fff27a00d50 sp 0x7fff27a00d48
WRITE of size 4 at 0x60200000eff4 thread T0
#0 0x4887d6 in main /home/ali/tmp/memsen.c:6
[...]
Again, we see what went wrong and where. Finally, I comment out line 6.
==3486==ERROR: AddressSanitizer: attempting double-free on 0x60200000eff0 in thread T0:
#0 0x46dba1 in free /home/ali/llvm/llvm/projects/compiler-rt/lib/asan/asan_malloc_linux.cc:65
#1 0x48878c in main /home/ali/tmp/memsen.c:7
[...]
Also caught the problem.
If your code has tests, or at least you can run your code with different inputs on your machine before releasing the library, you could probably track down a significant portion of the bugs. Unfortunately, it is not a compile-time solution and you probably don't want to release instrumented code (code compiled with -fsanitize=* flag). So if the user runs your code with an input that triggers a bug, the program will still crash with a segmentation fault.
I included some sample ASM code in a small program to do a test.
My program is:
#include <stdio.h>
static inline
unsigned char inb (int port) {
unsigned char data;
asm volatile("inb %w1,%0" : "=a" (data) : "d" (port));
return data;
}
int main()
{
printf("hello world %d\n", inb(22));
return 0;
}
When I run the program, it crashes with a segmentation fault when executing the ASM code.
Could someone tell me what's wrong with this small program? Thanks a lot.
You need to use ioperm before you're allowed to use port I/O. Also, note the kernel already provides inb and outb functions.
Use ioperm(2) or alternatively iopl(2) to tell the kernel to allow the
user space application to access the I/O ports in question. Failure
to do this will cause the application to receive a segmentation fault.
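For example, a minimal Linux-only sketch using the glibc wrappers from <sys/io.h> (port 22 is taken from the question; the program must run as root or with CAP_SYS_RAWIO):
#include <stdio.h>
#include <stdlib.h>
#include <sys/io.h>   /* ioperm(), inb() */

int main(void)
{
    if (ioperm(22, 1, 1) != 0) {   /* request access to 1 port starting at port 22 */
        perror("ioperm");
        return EXIT_FAILURE;
    }
    printf("hello world %d\n", inb(22));
    ioperm(22, 1, 0);              /* drop the access again */
    return 0;
}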
If your OS is Windows or Linux, most likely your program is terminated because the OS doesn't allow regular applications to access I/O ports.
Your syntax is absolutely correct. Just find and use a valid or unused port on your system.
EDIT: Clarifying.
fout is a FILE*. (I thought this was irrelevant since that line clearly compiles.)
there is A LOT of code above these last few lines; I guess I could dump them all but I imagine you're not overly interested in debugging my stuff. I'm more interested, generally, in what could possibly occur that would segfault at return 0 but not before.
Warning: My C is terrible.
I've got a C program which, from the likes of it, just wants to segfault. I'll spare you the other, irrelevant details, but here's the big picture:
My code:
//...other code
printf("finished \n");
fclose(fout);
printf("after fclose \n");
return 0;
The output:
finished
after fclose
Segmentation fault
I'm compiling with GCC, -std=c99.
My question:
How the heck is this even possible? What should I be looking at, that may be causing this (seemingly random) segfault? Any ideas?
Much thanks!
Whatever the return is going back to is causing the fault. If this code snippet is in main(), then the code has inflicted damage to the stack, most likely by exceeding the bounds of a variable. For example
int main ()
{
int a [3];
int j;
for (j = 0; j < 10; ++j)
a [j] = 0;
return 0;
}
This sort of thing could cause any of a number of inexplicable symptoms, including a segfault.
Since it's probably a stack-corruption-related problem, you could also use a memory debugger such as valgrind to locate the source of the corruption.
Just compile using gcc -g and then run valgrind yourprog args.
Does "Hello world!" program seg fault? If so then you have a hardware problem. If not then you have at least one problem in the code you're not showing us!
Compile your program with the debug flag gcc -g and run your code in gdb. You can't always trust the console to output "Segmentation fault" exactly when problematic code is executed or relative to other output. In many cases this will be misleading -- you will find debugging tools such as gdb extremely useful.
I'm trying to debug a simple stop-and-copy garbage collector (written in C) using GDB. The GC works by handling SIGBUS. I've set a breakpoint at the top of my SIGBUS signal handler. I've told GDB to pass SIGBUS to my program. However, it doesn't appear to work.
The following program (explained inline) shows the essence of my problem:
#include <stdio.h>
#include <sys/mman.h>
#include <assert.h>
#include <signal.h>
#define HEAP_SIZE 4096
unsigned long int *heap;
void gc(int n) {
signal(SIGBUS, SIG_DFL); // just for debugging
printf("GC TIME\n");
}
int main () {
// Allocate twice the required heap size (two semi-spaces)
heap = mmap(NULL, HEAP_SIZE * 2, PROT_READ | PROT_WRITE, MAP_ANON | MAP_SHARED,
-1, 0);
assert (heap != MAP_FAILED);
// 2nd semi-space is unreadable. Using "bump-pointer allocation", a SIGBUS
// tells us we are out of space and need to GC.
void *guard = mmap(heap + HEAP_SIZE, HEAP_SIZE, PROT_NONE, MAP_ANON |
MAP_SHARED | MAP_FIXED, -1, 0);
assert (guard != MAP_FAILED);
signal(SIGBUS, gc);
heap[HEAP_SIZE] = 90; // pretend we are out of heap space
return 0;
}
I compile and run the program on Mac OS X 10.6 and get the output I expect:
$ gcc debug.c
$ ./a.out
GC TIME
Bus error
I want to run and debug this program using GDB. In particular, I want to set a breakpoint at the gc function (really, the gc signal handler). Naturally, I need to tell GDB to not halt on SIGBUS as well:
$ gdb ./a.out
GNU gdb 6.3.50-20050815 (Apple version gdb-1346) (Fri Sep 18 20:40:51 UTC 2009)
... snip ...
(gdb) handle SIGSEGV SIGBUS nostop noprint
Signal Stop Print Pass to program Description
SIGBUS No No Yes Bus error
SIGSEGV No No Yes Segmentation fault
(gdb) break gc
Breakpoint 1 at 0x100000d6f
However, we never reach the breakpoint:
(gdb) run
Starting program: /snip/a.out
Reading symbols for shared libraries +. done
Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_PROTECTION_FAILURE at address: 0x0000000100029000
0x0000000100000e83 in main ()
(gdb)
Apparently, the signal handler is not invoked (GC TIME is not printed). Furthermore, we're still in main(), at the faulting mov:
0x0000000100000e83 <main+247>: movq $0x5a,(%rax)
Any ideas?
Thanks.
Internally, bad memory accesses result in the Mach exception EXC_BAD_ACCESS being sent to the program. Normally, this is translated into a SIGBUS UNIX signal. However, gdb intercepts Mach exceptions directly, before the signal translation. The solution is to give gdb the command set dont-handle-bad-access 1 before running your program. Then the normal mechanism is used, and breakpoints inside your signal handler are honored.
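Putting it together, a session sketch built from the commands already shown in the question plus this setting (Apple's gdb only; the handle line mirrors the one used above):
(gdb) set dont-handle-bad-access 1
(gdb) handle SIGBUS nostop noprint pass
(gdb) break gc
(gdb) run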
The same code (modified to handle SIGSEGV too) works as expected in GDB on Linux; it may be a bug in OS X or GDB's port to that platform.
Googling finds broken OS X behavior just like yours all the way back on 10.1, with a sort-of workaround (set inferior-bind-exception-port off before running the program).
(There's a similar bug on Windows.)
How about putting a for( ;; ); after the printf(), running the program normally, and then connecting to the process with gdb after GC TIME prints out?
Why do you expect to get a SIGBUS? SIGBUS usually means alignment error, on an architecture where some datatypes have alignment requirements. It looks like you are just trying to access memory outside your allocated area, and I would expect that you get SIGSEGV instead of SIGBUS then.
Edit:
It seems that the Mac OS X name for what I think of as SIGSEGV is SIGBUS. Therefore, ignore this answer.
If it's any help, the breakpoint works as expected (i.e., it works) when I try the program on a Linux system, with SIGBUS replaced by SIGSEGV.
Edit 2:
Can you try to catch SIGSEGV in your program too? It seems that the type of signal may vary depending on where the memory is mapped in Mac OS X (I just read the discussion here and here), and just perhaps a different signal could be thrown when you run in the debugger?