I'm trying to debug a simple stop-and-copy garbage collector (written in C) using GDB. The GC works by handling SIGBUS. I've set a breakpoint at the top of my SIGBUS signal handler. I've told GDB to pass SIGBUS to my program. However, it doesn't appear to work.
The following program (explained inline) shows the essence of my problem:
#include <stdio.h>
#include <sys/mman.h>
#include <assert.h>
#include <signal.h>
#define HEAP_SIZE 4096
unsigned long int *heap;
void gc(int n) {
signal(SIGBUS, SIG_DFL); // just for debugging
printf("GC TIME\n");
}
int main () {
// Allocate twice the required heap size (two semi-spaces)
heap = mmap(NULL, HEAP_SIZE * 2, PROT_READ | PROT_WRITE, MAP_ANON | MAP_SHARED,
-1, 0);
assert (heap != MAP_FAILED);
// 2nd semi-space is unreadable. Using "bump-pointer allocation", a SIGBUS
// tells us we are out of space and need to GC.
void *guard = mmap(heap + HEAP_SIZE, HEAP_SIZE, PROT_NONE, MAP_ANON |
MAP_SHARED | MAP_FIXED, -1, 0);
assert (guard != MAP_FAILED);
signal(SIGBUS, gc);
heap[HEAP_SIZE] = 90; // pretend we are out of heap space
return 0;
}
I compile and run the program on Mac OS X 10.6 and get the output I expect:
$ gcc debug.c
$ ./a.out
GC TIME
Bus error
I want to run and debug this program using GDB. In particular, I want to set a breakpoint at the gc function (really, the gc signal handler). Naturally, I need to tell GDB to not halt on SIGBUS as well:
$ gdb ./a.out
GNU gdb 6.3.50-20050815 (Apple version gdb-1346) (Fri Sep 18 20:40:51 UTC 2009)
... snip ...
(gdb) handle SIGSEGV SIGBUS nostop noprint
Signal Stop Print Pass to program Description
SIGBUS No No Yes Bus error
SIGSEGV No No Yes Segmentation fault
(gdb) break gc
Breakpoint 1 at 0x100000d6f
However, we never reach the breakpoint:
(gdb) run
Starting program: /snip/a.out
Reading symbols for shared libraries +. done
Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_PROTECTION_FAILURE at address: 0x0000000100029000
0x0000000100000e83 in main ()
(gdb)
Apparently, the signal handler is not invoked (GC TIME is not printed). Furthermore, we're still in main(), at the faulting mov:
0x0000000100000e83 <main+247>: movq $0x5a,(%rax)
Any ideas?
Thanks.
Internally, bad memory accesses result in the Mach exception EXC_BAD_ACCESS being sent to the program. Normally, this is translated into a SIGBUS UNIX signal. However, gdb intercepts Mach exceptions directly, before the signal translation. The solution is to give gdb the command set dont-handle-bad-access 1 before running your program. Then the normal mechanism is used, and breakpoints inside your signal handler are honored.
The same code (modified to handle SIGSEGV too) works as expected in GDB on Linux; it may be a bug in OS X or GDB's port to that platform.
Googling finds broken OS X behavior just like yours all the way back on 10.1, with a sort-of workaround (set inferior-bind-exception-port off before running the program).
(There's a similar bug on Windows.)
How about putting a for( ;; ); after the printf(), running the program normally, and then connecting to the process with gdb after GC TIME prints out?
Why do you expect to get a SIGBUS? SIGBUS usually means alignment error, on an architecture where some datatypes have alignment requirements. It looks like you are just trying to access memory outside your allocated area, and I would expect that you get SIGSEGV instead of SIGBUS then.
Edit:
It seems that the Mac OS X name for what I think of as SIGSEGV is SIGBUS. Therefore, ignore this answer.
If it's any help, the breakpoint works as expected (i.e., it works) when I try the program on a Linux system, with SIGBUS replaced by SIGSEGV.
Edit 2:
Can you try to catch SIGSEGV in your program too? It seems that the type of signal may vary depending on where the memory is mapped in Mac OS X (I just read the discussion here and here), and just perhaps a different signal could be thrown when you run in the debugger?
Related
I have the simple .c file that is supposed to fail(I know where it fails, I put the bug in there intentionally):
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char *argv[]) {
if (argc != 2) {
fprintf(stderr, "Usage: %s argument\n", argv[0]);
return EXIT_FAILURE;
}
char *hello = "hello";
*hello = 'H';
fprintf(stdout, "%s, %s!\n", hello, argv[1]);
return EXIT_SUCCESS;
}
Part 1) I save it as "hello.c" and compile without debugging flags by gcc hello.c -o hello.
Then, I wish to go line by line through the main function. I try to use gdb as follows:
Run gdb ./hello
Set breakpoint by break main
Run run 123
s -> fails
Here is the outcome:
(gdb) info func
All defined functions:
Non-debugging symbols:
0x0000000000000568 _init
0x0000000000000590 fprintf#plt
0x00000000000005a0 __cxa_finalize#plt
0x00000000000005b0 _start
0x00000000000005e0 deregister_tm_clones
0x0000000000000620 register_tm_clones
0x0000000000000670 __do_global_dtors_aux
0x00000000000006b0 frame_dummy
0x00000000000006ba main
0x0000000000000740 __libc_csu_init
0x00000000000007b0 __libc_csu_fini
0x00000000000007b4 _fini
(gdb) break main
Breakpoint 1 at 0x6be
(gdb) r
Starting program: /mnt/c/Users/User/Documents/Debugging/hello
Breakpoint 1, 0x00000000080006be in main ()
(gdb) s
Single stepping until exit from function main,
which has no line number information.
__fprintf (stream=0x7fffff3ec680 <_IO_2_1_stderr_>, format=0x80007c4 "Usage: %s argument\n") at fprintf.c:27
27 fprintf.c: No such file or directory.
Why does this happen? Why does it fail like this by trying to find a file fprintf? Thought the preprocessed headers should deal with the required implementation code.
Part 2) When I however compile with the -g, it works for some reason. But running the program in gdb does not yield a segmentation fault as expected :/ Why?
Again, see:
$ ./hello 123
Segmentation fault (core dumped)
but
(gdb) run 123
Starting program: /mnt/c/Users/NichlasDesktop/Documents/uni/Compsys/Exercises/Debugging/exercise_code/hello 123
Hello, 123!
[Inferior 1 (process 632) exited normally]
Part 1)
s -> fails
Without debugging information, gdb is unable to map instructions to source code - file paths and line numbers. The s/step command tells gdb to execute instructions that correspond to the current statement in the source language. gdb has no way of telling what that is, so it continues till wherever in code the source position ("SPOS") information is available. That happens to be in libc where fprintf is defined. However the source code for fprintf is not available in your environment, hence that message.
You could step through the individual instructions using the si/stepi command, this does not need debugging information to be present.
You did not show what happens if you continue execution from that point, I suspect it would eventually hit the segmentation fault.
Why does this happen? Why does it fail like this by trying to find a file fprintf? Thought the preprocessed headers should deal with the required implementation code.
That is not what headers are. They don't contain source code.
Part 2) When I however compile with the -g, it works for some reason. But running the program in gdb does not yield a segmentation fault as expected :/ Why?
gdb or any other debugger would arrange for the executable text segment to be mapped private instead of shared as in the usual run case (mmap with MAP_PRIVATE instead of MAP_SHARED). This is because the debugger needs the text segment to be writable so that instructions can be overwritten to insert breakpoints etc.
The "hello" you put in there is a constant string literal that is stored in the read-only text segment of the executable. This is why writing to it causes a segmentation fault during the normal run - the text segment is mapped shared. Inside the debugger however the text segment is mapped private, so you are able to write to it.
Have a bit of trouble tracking down the cause of this.
Here is the code bit:
#include <time.h>
time_t now;
struct tm *mytime;
char yyyy[5];
char mm[3];
char dd[3];
char mname[10];
if(time(&now)!=(time_t)(-1))
{
mytime=localtime(&now);
strftime(yyyy, sizeof(yyyy), "%Y", mytime);
strftime(mm, sizeof(mm), "%m", mytime);
strftime(dd, sizeof(dd), "%d", mytime);
strftime(mname, sizeof(mname), "%B", mytime);
}
It crashes on localtime line:
Segmentation fault (core dumped)
Any ideas?
The sample code runs fine for me. Post your full code? Or cut down your example to minimal possible which still reproduces problem. And run gdb on your core 'gdb -c a.core a.out' and get a backtrace(bt).
One gotcha with localtime is the returned pointer is a pointer to a static global var and subsequent calls to localtime update the var. Tripped me up once long long ago.
struct tm *localtime(const time_t *time)
The return value is a pointer to a static broken-down time structure, which might be overwritten by subsequent calls to any of the date and time functions. (But no other library function overwrites the contents of this object.)
From:
http://www.chemie.fu-berlin.de/chemnet/use/info/libc/libc_17.html
On searching for core files:
See also core dumped - but core file is not in current directory?
Make sure system can write core file.
* for me on one sample ununtu system ulimit -c showed 0 *
ulimit -c unlimited
Check what pattern used and change the pattern to a simple one or different location.
cat /proc/sys/kernel/core_pattern
#sysctl -w kernel.core_pattern=core
Search some common locations and look in /var/log/messages:
ls /var/crash /var/cache/abrt /var/spool/abrt/ /tmp/*core*
tail /var/log/messages
On ubuntu examine the apport service config and init/rcfiles:
find /etc/ |grep appo
This may or may not be something close to your problem?
Still don't quite have enough info to be sure what you are doing.
I'd be thinking localtime is not the problem but somewhere in the printfs there are problems . . .
How are you compiling your project?
g++ or gcc?
Can you print a string?? (as opposed to string ptr)
What is a string in your context?
1 problem here:
fprintf(stdout,"# Single electron capture cross sections, %s state-selective\n",res);
If string is typedef a char[] then passing a pointer to it in func args would be better?
Compiling your sample making some assumptions and adding debug (-g3):
gcc -g3 stackoverflow_localtime_crash.c -o socrash
./socrash
# Single electron capture cross sections, RESSTRING state-selective
# Magic number=20032014, 20 March 2014,
# Single electron capture cross sections, RESSTRING state-selective
# ^0+ + -> ^-1+ + ^+
# Method=MCLZ
# et al. 2014, to be submitted
# ----------------------------------------------------------------------------
# Energy Cross sections (10^-16 cm^2)
# (eV/u) LABELSTRING �H���u�j
* lots more JUNK follows
res string itself cannot be passed in on stack and printed *
Segmentation fault (core dumped)
gdb -c core socrash
Core was generated by `./socrash'.
Program terminated with signal 11, Segmentation fault.
[New process 10298]
#0 0xb770e713 in strlen () from /lib/tls/i686/cmov/libc.so.6
(gdb) bt
#0 0xb770e713 in strlen () from /lib/tls/i686/cmov/libc.so.6
#1 0xb76da6d9 in vfprintf () from /lib/tls/i686/cmov/libc.so.6
#2 0xb76e0b9f in fprintf () from /lib/tls/i686/cmov/libc.so.6
#3 0x08048a42 in prtcsheader (cs=0xbf9e907c, labels=0xbf9e90bc, res=0xbf9e90a8 "RESSTRING") at stackoverflow_localtime_crash.c:66
#4 0x08048b36 in main (argc=Cannot access memory at address 0x0
) at stackoverflow_localtime_crash.c:80
(gdb) list
64 fprintf(stdout,"%-15s","# (eV/u)");
65 for(i=0;labels[i]!=NULL;i++)
66 fprintf(stdout,"\t%-14s",labels[i]);
67 fprintf(stdout,"\t%-14s","Total");
68
69 return 0;
(gdb)
* see line 66 there is also a problem *
might well be the way I have have declared string :-P
66 fprintf(stdout,"\t%-14s",labels[i]);
* If you can find a core file yourself then it would be a big help!!! *
If you can't find core then maybe you could use ltrace.
Run strace to get trace of all system calls a process does.
Run ltrace to get trace of all lib calls a process does.
Try:
strace ls
ltrace ls
On my crashing example:
ltrace ./socrash 1>stdout.log 2>socrash_ltrace.log
less socrash_ltrace.log
I'm trying to run some assembly code saved in a buffer on OS X, but I keep getting a segmentation fault. The code looks like this:
int main()
{
unsigned char buff[] = "\x66\x6a\7f\x66\xb8\x01\x00\x00\x00\x66\x83\xec\x04\xcd\x80";
( void (*)()buff )(); /* same as calling return 127 */
return 0; /* program should never reach here */
}
The code in buff is generated by nasm and it works, it causes the program to return 127. When running through a c program like so though, I get a segmentation fault. Is there a different way to do this in OS X?
First, this will not compile, because you are missing the parentheses necessary to make void (*)() a cast. The line should be ((void (*)())buff)();.
Second, if you compile without optimization, buff is likely constructed on the stack, and execution will fail because Mac OS X marks the stack as not executable.
Third, if you compile with optimization, buff is likely prepared in some data segment, and you may be able to execute it. But the instructions you have are inappropriate for the Mac OS X platform, and you get a normal access exception. You could step through the instructions in the debugger to figure out what is wrong.
The behavior of converting an object pointer to a function pointer and calling the function is not defined by the C standard. You should not rely on it to work.
Among the errors in the assembly code:
It moves one to the %ax register, which is the low two bytes of the %rax register. This leaves the high six bytes uncontrolled. Then it attempts to use %rax as an address. This fails because the value in the %rax register is not pointing at accessible memory.
It attempts to execute the instruction int $0x80. This is some Microsoft Windows, DOS, or Linux service call. On Mac OS X, it is an illegal instruction.
The stack is non executable by default -- you need to mark a page as executable with mprotect(2) in order to make it executable. Making the stack executable is highly not recommended, so if you want to run code generated at runtime, you should allocate memory on the heap instead.
For example:
#include <sys/mman.h>
#include <unistd.h>
...
// Error checking omitted for expository purposes
// Allocate 1 page of read-write memory
size_t page_size = getpagesize();
void *mem = mmap(NULL, page_size,
PROT_READ | PROT_WRITE,
MAP_ANON | MAP_PRIVATE,
-1, 0);
// Copy the shell code into the memory
char shellcode[] = "...";
memcpy(mem, shellcode, sizeof(shellcode));
// Change memory to executable and non-writable
mprotect(mem, page_size, PROT_READ | PROT_EXEC);
// Run the code
((void (*)())mem)();
// Free the memory
munmap(mem, page_size);
Why does the following program not crash when it is executed, but crash with a segfault in GDB? Compiled with GCC 4.5.2 on a 32-bit x86 (Athlon 64, if it should matter).
#include <stdio.h>
#include <string.h>
int modify(void)
{
__asm__("mov $0x41414141, %edx"); // Stray value.
__asm__("mov $0xbffff2d4, %eax"); // Addr. of ret pointer for function().
__asm__("mov %edx, (%eax)");
}
int function(void)
{
modify();
return 0;
}
int main(int argc, char **argv)
{
function();
return 0;
}
The mov $0xbffff2d4, %eax was determined using GDB to find the address where the return pointer was stored for the "function" function. This will probably be different on a different system. ASLR was disabled for this.
When I execute the program, nothing happens. There is no report of a crash in dmesg either. However when I execute the same program in GDB:
Program received signal SIGSEGV, Segmentation fault.
0x41414141 in ?? ()
=> 0x41414141: Cannot access memory at address 0x41414141
This is what I expect should happen when I execute the program normally as well. I do indeed get segfaults as usual when other programs crash, and I can easily write a small program that crashes with a nice segfault. But why does this particular program not crash with a segfault?
Even with full ASLR disabled, you may still get randomized stack and heap. You can turn that off globally using the norandmaps kernel boot parameter or at runtime by setting /proc/sys/kernel/randomize_va_space to zero. It's also part of the process personality.
In GDB, you can tweak this using the disable-randomization setting:
(gdb) help set disable-randomization
Set disabling of debuggee's virtual address space randomization.
When this mode is on (which is the default), randomization of the virtual
address space is disabled. Standalone programs run with the randomization
enabled by default on some platforms.
As a small test program to illustrate this, you can print the address of a local variable, such as:
#include <stdio.h>
int main(int argc, char **argv)
{
printf("%p\n", &argc);
return 0;
}
This is a mad-hack, but I am trying to deliberately cause a segfault at a particular point in execution, so valgrind will give me a stack trace.
If there is a better way to do this please tell me, but I would still be curious to know how to deliberaly cause a segfault, and why my attempt didn't work.
This is my failed attempt:
long* ptr = (long *)0xF0000000;
ptr = 10;
I thought valgrind should at least pick up on that as a invalid write, even if it's not a segmentation violation. Valgrind says nothing about it.
Any ideas why?
EDIT
Answer accepted, but I still have some up-votes for any suggestions for a more sane way to get a stack trace...
Just call abort(). It's not a segfault but it should generate a core dump.
Are you missing a * as in *ptr = 10? What you have won't compile.
If it does, somehow, that of course won't cause a seg-fault, since you're just assigning a number. Dereferencing might.
Assuming dereferencing null on your OS results in a segfault, the following should do the trick:
inline void seg_fault(void)
{
volatile int *p = reinterpret_cast<volatile int*>(0);
*p = 0x1337D00D;
}
Sorry for mentioning the obvious but why not use gdb with a breakbpoint and then use backtrace?
(gdb) b somewhere
(gdb) r
(gdb) bt
As mentioned in other answers you can just call abort() if you want to abnormally terminate your program entirely, or kill(getpid(), SIGSEGV) if it has to be a segmentation fault. This will generate a core file that you can use with gdb to get a stack trace or debug even if you are not running under valgrind.
Using a valgrind client request you can also have valgrind dump a stack trace with your own custom message and then continue executing. The client request does nothing when the program is not running under valgrind.
#include <valgrind/valgrind.h>
...
VALGRIND_PRINTF_BACKTRACE("Encountered the foobar problem, x=%d, y=%d\n", x, y);
Wouldn't it be better to send sig SEGV (11) to the process to force a core dump?
Are you on x86? If so then there is actually an opcode in the CPU that means "invoke whatever debugger may be attached". This is opcode CC, or more commonly called int 3. Easiest way to trigger it is with inline assembly:
inline void debugger_halt(void)
{
#ifdef MSVC
__asm int 3;
#elif defined(GCC)
asm("int 3");
#else
#pragma error Well, you'll have to figure out how to do inline assembly
in your compiler
#endif
}
MSVC also supports __debugbreak() which is a hardware break in unmanaged code and an MSIL "break" in managed code.