I've done a lot of programming but not much in C, and I need advice on debugging. I have a static variable (file scope) that is being clobbered after about 10-100 seconds of execution of a multithreaded program (using pthreads on OS X 10.4). My code looks something like this:
static float some_values[SIZE];
static int * addr;
addr points to valid memory address for a while, and then gets clobbered with some value (sometimes 0, sometimes nonzero), thereby causing a segfault when dereferenced. Poking around with gdb I have verified that addr is being layed out in memory immediately after some_values as one would expect, so my first guess would be that I have used an out-of-bounds index to write to some_values. However, this is a tiny file, so it is easy to check this is not the problem.
The obvious debugging technique would be to set a watchpoint on the variable addr. But doing so seems to create erratic and inexplicable behavior in gdb. The watchpoint gets triggered at the first assignment to addr; then after I continue execution, I immediately get a nonsensical segfault in another thread...supposedly a segfault on accessing the address of a static variable in a different part of the program! But then gdb lets me read from and write to that memory address interactively.
Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_PROTECTION_FAILURE at address: 0x001d5bd0
0x0000678d in receive (arg=0x0) at mainloop.c:39
39 sample_buf_cleared ++;
(gdb) p &sample_buf_cleared
$17 = (int *) 0x1d5bd0
(gdb) p sample_buf_cleared
$18 = 1
(gdb) set sample_buf_cleared = 2
(gdb)
gdb is obviously confused. Does anyone know why? Or does anyone have any suggestions for debugging this bug without using watchpoints?
You could put an array of uint's between some_values and addr and determine if you are overruning some_values or if the corruption affects more addresses then you first thought. I would initialize padding to DEADBEEF or some other obvious pattern that is easy to distinguish and unlikely to occur in the program. If a value in the padding changes then cast it to float and see if the number makes sense as a float.
static float some_values[SIZE];
static unsigned int padding[1024];
static int * addr;
Run the program multiple times. In each run disable a different thread and see when the problems goes away.
Set the programs process affinity to a single core and then try the watchpoint. You may have better luck if you don't have two threads simultaneously modifying the value. NOTE: This solution does not preclude that from happening. It may make it easier to catch in a debugger.
static variables and multi-threading generally do not mix.
Without seeing your code (you should include your threaded code), my guess is that you have two threads concurrently writing to addr variable. It doesn't work.
You either need to:
create separate instances of addr for each thread; or
provide some sort of synchronisation around addr to stop two threads changing the value at the same time.
Try using valgrind; I haven't tried valgrind on OS X, and I don't understand your problem, but "try valgrind" is the first thing I think of when you say "clobbered".
One thing you could try would be to create a separate thread whose only purpose is to watch the value of addr, and to break when it changes. For example:
static int * volatile addr; // volatile here is important, and must be after the *
void *addr_thread_proc(void *arg)
{
while(1)
{
int *old_value = addr;
while(addr == old_value) /* spin */;
__asm__("int3"); // break the debugger, or raise SIGTRAP if no debugger
}
}
...
pthread_t spin_thread;
pthread_create(&spin_thread, NULL, &addr_thread_proc, NULL);
Then, whenever the value of addr changes, the int3 instruction will run, which will break the debugger, stopping all threads.
gdb often acts weird with multithreaded programs. Another solution (if you can afford it) would be to put printf()s all over the place to try and catch the moment where your value gets clobbered. Not very elegant, but sometimes effective.
I have not done any debugging on OSX, but I have seen the same behavior in GDB on Linux: program crashes, yet GDB can read and write the memory which program just tried to read/write unsuccessfully.
This doesn't necessarily mean GDB is confused; rather the kernel allowed GDB to read/write memory via ptrace() which the inferior process is not allowed to read or write. IOW, it was a (recently fixed) kernel bug.
Still, it sounds like GDB watchpoints aren't working for you for whatever reason.
One technique you could use is to mmap space for some_values rather than statically allocating space for them, arrange for the array to end on a page boundary, and arrange for the next page to be non-accessible (via mprotect).
If any code tries to access past the end of some_values, it will get an exception (effectively you are setting a non-writable "watch point" just past some_values).
Related
This question already has answers here:
How can I prevent (not react to) a segmentation fault?
(3 answers)
Closed 2 years ago.
Can I tell if a pointer is in the rodata section of an executable?
As in, editing that pointer's data would cause a runtime system trap.
Example (using a C character pointer):
void foo(char const * const string) {
if ( in_rodata( string ) ) {
puts("It's in rodata!");
} else {
puts("That ain't in rodata");
}
}
Now I was thinking that, maybe, I could simply compare the pointer to the rodata section.
Something along the lines of:
if ( string > start_of_rodata && string < end_of_rodata ) {
// it's in rodata!
}
Is this a feasible plan/idea?
Does anyone have an idea as to how I could do this?
(Is there any system information that one might need in order to answer this?)
I am executing the program on a Linux platform.
I doubt that it could possibly be portable
If you don't want to mess with linker scripts or using platform-specific memory map query APIs, a proxy approach is fairly portable on platforms with memory protection, if you're willing to just know whether the location is writable, read-only, or neither. The general idea is to do a test read and a test write. If the first succeeds but the second one fails, it's likely .rodata or code segment. This doesn't tell you "it's rodata for sure" - it may be a code segment, or some other read-only page, such as as read-only file memory mapping that has copy-on-write disabled. But that depends on what you had in mind for this test - what was the ultimate purpose.
Another caveat is: For this to be even remotely safe, you must suspend all other threads in the process when you do this test, as there's a chance you may corrupt some state that code executing on another thread may happen to refer to. Doing this from inside a running process may have hard-to-debug corner cases that will stop lurking and show themselves during a customer demo. So, on platforms that support this, it's always preferable to spawn another process that will suspend the first process in its entirety (all threads), probe it, write the result to the process's address space (to some result variable), resume the process and terminate itself. On some platforms, it's not possible to modify a process's address space from outside, and instead you need to suspend the process mostly or completely, inject a probe thread, suspend the remaining other threads, let the probe do its job, write an answer to some agreed-upon variable, terminate, then resume everything else from the safety of an external process.
For simplicity's sake, the below will assume that it's all done from inside the process. Even though "fully capable" self-contained examples that work cross-process would not be very long, writing this stuff is a bit tedious especially if you want it short, elegant and at least mostly correct - I imagine a really full day's worth of work. So, instead, I'll do some rough sketches and let you fill in the blanks (ha).
Windows
Structured exceptions get thrown e.g. due to protection faults or divide by zero. To perform the test, attempt a read from the address in question. If that succeeds, you know it's at least a mapped page (otherwise it'll throw an exception you can catch). Then try writing there - if that fails, then it was read-only. The code is almost boring:
static const int foo;
static int bar;
#if _WIN32
typedef struct ThreadState ThreadState;
ThreadState *suspend_other_threads(void) { ... }
void resume_other_threads(ThreadState *) { ... }
int check_if_maybe_rodata(void *p) {
__try {
(void) *(volatile char *)p;
} __finally {
return false;
}
volatile LONG result = 0;
ThreadState *state = suspend_other_threads();
__try {
InterlockedExchange(&result, 1);
LONG saved = *(volatile LONG*)p;
InterlockedExchange((volatile LONG *)p, saved);
InterlockedExchange(&result, 0); // we succeeded writing there
} __finally {}
resume_other_threads(state);
return result;
}
int main() {
assert(check_if_maybe_rodata(&foo));
assert(!check_if_maybe_rodata(&bar));
}
#endif
Suspending the threads requires traversing the thread list, and suspending each thread that's not the current thread. The list of all suspended threads has to be created and saved, so that later the same list can be traversed to resume all the threads.
There are surely caveats, and WoW64 threads have their own API for suspension and resumption, but it's probably something that would, in controlled circumstances, work OK.
Unix
The idea is to leverage the kernel to check the pointer for us "at arms length" so that no signal is thrown. Handling POSIX signals that result from memory protection faults requires patching the code that caused the fault, inevitably forcing you to modify the protection status of the code's memory. Not so great. Instead, pass a pointer to a syscall you know should succeed in all normal circumstances to read from the pointed-to-address - e.g. open /dev/zero, and write to that file from a buffer pointed-to by the pointer. If that fails with EFAULT, it is due to buf [being] outside your accessible address space. If you can't even read from that address, it's not .rodata for sure.
Then do the converse: from an open /dev/zero, attempt a read to the address you are testing. If the read succeeds, then it wasn't read-only data. If the read fails with EFAULT that most likely means that the area in question was read-only since reading from it succeeded, but writing to it didn't.
In all cases, it'd be most preferable to use native platform APIs to test the mapping status of the page on which the address you try to access resides, or even better - to walk the sections list of the mapped executable (ELF on Linux, PE on Windows), and see exactly what went where. It's not somehow guaranteed that on all systems with memory protection the .rodata section or its equivalent will be mapped read only, thus the executable's image as-mapped into the running process is the ultimate authority. That still does not guarantee that the section is currently mapped read-only. An mprotect or a similar call could have changed it, or parts of it, to be writable, even modified them, and then perhaps changed them back to read-only. You'd then have to either checksum the section if the executable's format provides such data, or mmap the same binary somewhere else in memory and compare the sections.
But I smell a faint smell of an XY problem: what is it that you're actually trying to do? I mean, surely you don't just want to check if an address is in .rodata out of curiosity's sake. You must have some use for that information, and it is this application that would ultimately decide whether even doing this .rodata check should be on the radar. It may be, it may be not. Based on your question alone, it's a solid "who knows?"
Suppose I have following program:
#include <signal.h>
#include <stddef.h>
#include <stdlib.h>
static void myHandler(int sig){
abort();
}
int main(void){
signal(SIGSEGV,myHandler);
char* ptr=NULL;
*ptr='a';
return 0;
}
As you can see, I register a signalhandler and some lines further, I dereference a null pointer ==> SIGSEGV is triggered.
But how is it triggered?
If I run it using strace (Output stripped):
//Set signal handler (In glibc signal simply wraps a call to sigaction)
rt_sigaction(SIGSEGV, {sa_handler=0x563b125e1060, sa_mask=[SEGV], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7ffbe4fe0d30}, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=0}, 8) = 0
//SIGSEGV is raised
--- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=NULL} ---
rt_sigprocmask(SIG_UNBLOCK, [ABRT], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, ~[RTMIN RT_1], [SEGV], 8) = 0
But something is missing, how does a signal go from the CPU to the program?
My understanding:
[Dereferences null pointer] -> [CPU raises an exception] -> [??? (How does it go from the CPU to the kernel?) ] -> [The kernel is notified, and sends the signal to the process] -> [??? (How does the process know, that a signal is raised?)] -> [The matching signal handler is called].
What happens at these two places marked with ????
A NULL pointer in most (but not all) C implementations is address 0. Normally this address is not in a valid (mapped) page.
Any access to a virtual page that's not mapped by the HW page tables results in a page-fault exception. e.g. on x86, #PF.
This invokes the OS's page-fault exception handler to resolve the situation. On x86-64 for example, the CPU pushes exception-return info on the kernel stack and loads a CS:RIP from the IDT (Interrupt Descriptor Table) entry that corresponds to that exception number. Just like any other exception triggered by user-space, e.g. integer divide by zero (#DE), or a General Protection fault #GP (trying to run a privileged instruction in user-space, or a misaligned SIMD instruction that required alignment, or many other possible things).
The page-fault handler can find out what address user-space tried to access. e.g. on x86, there's a control register (CR2) that holds the linear (virtual) address that caused the fault. The OS can get a copy of that into a general-purpose register with mov rax, cr2.
Other ISAs have other mechanisms for the OS to tell the CPU where its page-fault handler is, and for that handler to find out what address user-space was trying to access. But it's pretty universal for systems with virtual memory to have essentially equivalent mechanisms.
The access is not yet known to be invalid. There are several reasons why an OS might not have bothered to "wire" a process's allocated memory into the hardware page tables. This is what paging is all about: letting the OS correct the situation, like copy-on-write, lazy allocation, or bringing a page back in from swap space.
Page faults come in three categories: (copied from my answer on another question). Wikipedia's page-fault article says similar things.
valid (the process logically has the memory mapped, but the OS was lazy or playing tricks like copy-on-write):
hard: the page needs to be paged in from disk, either from swap space or from a disk file (e.g. a memory mapped file, like a page of an executable or shared library). Usually the OS will schedule another task while waiting for I/O: this is the key difference between hard (major) and soft (minor).
soft: No disk access required, just for example allocating + zeroing a new physical page to back a virtual page that user-space just tried to write. Or copy-on-write of a writeable page that multiple processes had mapped, but where changes by one shouldn't be visible to the other (like mmap(MAP_PRIVATE)). This turns a shared page into a private dirty page.
invalid: There wasn't even a logical mapping for that page. A POSIX OS like Linux will deliver SIGSEGV signal to the offending process/thread.
So only after the OS consults its own data structures to see which virtual addresses a process is supposed to own can it be sure that the memory access was invalid.
Deciding whether a page fault is invalid or not is completely up to software. As I wrote on Why page faults are usually handled by the OS, not hardware? - if the HW could figure everything out, it wouldn't need to trap to the OS.
Fun fact: on Linux it's possible to configure the system so virtual address 0 is (or can be) valid. Setting mmap_min_addr = 0 allows processes to mmap there. e.g. WINE needs this for emulating a 16-bit Windows memory layout.
Since that wouldn't change the internal object-representation of a NULL pointer to be other than 0, doing that would mean that NULL dereference would no longer fault. That makes debugging harder, which is why the default for mmap_min_addr is 64k.
On a simpler system without virtual memory, the OS might still be able to configure an MMU to trap on memory access to certain regions of address space. The OS's trap handler doesn't have to check anything, it knows any access that triggered it was invalid. (Unless it's also emulating something for some regions of address space...)
Delivering a signal to user-space
This part is pure software. Delivering SIGSEGV is no different than delivering SIGALRM or SIGTERM sent by another process.
Of course, a user-space process that just returns from a SIGSEGV handler without fixing the problem will make the main thread re-run the same faulting instruction again. (The OS would return to the instruction that raised the page-fault exception.)
This is why the default action for SIGSEGV is to terminate, and why it doesn't make sense to set the behaviour to "ignore".
Typically what happens is that when the CPU’s Memory Management Unit finds that the virtual address the program is trying to access is not in any of the mappings to physical memory, it raises an interrupt. The OS will have set up an Interrupt Service Routine just in case this happens. That routine will do whatever is necessary inside the OS to signal the process with SEGV. In return from the ISR the offending instruction has not been completed.
What happens then depends on whether there’s a handler installed or not for SEGV. The language’s runtime may have installed one that raises it as an exception. Almost always the process is terminated, as it is beyond recovery. Something like valgrind would do something useful with the signal, eg telling you exactly where in the code the program had got to.
Where it gets interesting is when you look at the memory allocation strategies used by C runtime libraries like glibc. A NULL pointer dereference is a bit of an obvious one, but what about accessing beyond the end of an array? Often, calls to malloc() or new will result in the library asking for more memory than has been asked for. The bet is that it can use that memory to satisfy further requests for memory without troubling the OS - which is nice and fast. However, the CPU’s MMU has no idea that that’s happened. So if you do access beyond the end of the array, you’re still accessing memory that the MMU can see is mapped to your process, but in reality you’re beginning to trample where one shouldn’t. Some very defensive OSes don’t do this, specifically so that the MMU does catch out of bounds accesses.
This leads to interesting results. I’ve come across software that builds and runs just fine on Linux which, compiled for FreeBSD, starts throwing SEGVs. GNURadio is one such piece of software (it was a complex flow graph). Which is interesting because it makes heavy use of boost / c++11 smart pointers specifically to help avoid memory misuse. I’ve not yet been able to identify where the fault is to submit a bug report for that one...
I'm building very small test program and I wanted to have the program access the same memory address every time(I know its not a good practice) to simulate some behaviors. How can I just pick a memory address to hard code in the program an try it out? Is there a way to see unused blocks of memory addresses and just block them ?
I totally understand that this might create unwanted conditions/situation.
You can use ampersand operator (&) to point a pointer to a specific memory address. However, your program must be able to able to legally access that address which is decided by what address range your OS has assigned to your program otherwise you will a segmentation fault.
Sample code:
void * p1 = (void *)0x28ff44;
Or if you want it as a char pointer:
char * p2 = (char *)0x28ff44;
PS
You can find out the address allocated to your program and take one of the addresses from it into your program. For a single run, your program will access the same memory location but for another run, it will be different one assigned to your process but same for that run.
You can refer here to check how you can read memory address assigned to your process. You can take input at runtime to provide your process id to get the filepath.
Work around
Since you mentioned it is small test program, you can also save yourself your efforts by just disabling randomization of memory addresses by disabling ASLR for your testing your program, you just disable ASLR in linux using
echo 0 > /proc/sys/kernel/randomize_va_space
and then run your program, declare and initialize a variable, print its address and then hardcode that address in your program. Bingo!! Everytime that address will be used untill you enable ASLR again.
However it is not secure to turn off ASLR and after testing you should enable ASLR again by
echo 1 > /proc/sys/kernel/randomize_va_space
I have this code that gives me a segmentation fault. My understanding of the clone function is that the parent process has to allocate space for the child process and clone calls a function that runs in that stack space. Am I misunderstanding something or does my code just not make sense?
char *stack;
char *stackTop;
stack = malloc(STACK_SIZE);
if (stack == NULL)
fprintf(stderr, "malloc");
stackTop = stack + STACK_SIZE;
myClone(childFunc, stackTop, CLONE_FILES, NULL);
int myClone(int (*fn)(void *), void *child_stack,int flags, void *arg){
int* space = memcpy(child_stack, fn, sizeof(fn));
typedef int func(void);
func* f = (func*)&space;
f();
}
There are two main reasons why this wouldn't work.
Memory protection: the relevant memory pages must be executable. Data pages, you got from malloc are not. "Normal" memory-management functions can't do this. On the other hand, the existing code pages are not writable, so you can't move one piece of code onto another. This is a fundamental memory-protection mechanism. You have to either go back to DOS or to use some advanced "debugging" interface.
Position-independent code: all memory addresses in your code must be either relative ones, or be fixuped manually. It may be too tricky to do this in C.
The clone() function is a system call. It cannot be replicated by C code running within your process.
There's a fundamental misunderstanding there. So you're getting a segmentation fault, that tells me you're trying to run this code in user space (in a process created by the operating system).
An address space is an abstraction available to the operating system. It typically uses hardware support (that of an MMU [memory management unit]) which provides means to use virtual addresses. These are addresses that are, when accessed, automatically translated to the real physical addresses according to some data structures that only the OS can manage.
I don't think it makes much sense to go into great detail here, you have enough key words to google for. The essence is: There is no way you can create an address space from user space code. That functionality is reserved to the OS and to do it, clone() on linux issues a syscall, invoking the OS.
edit: concerning the stack, providing a stack means to reserve space for it (by mapping an appropriate amount of pages to the address space) and setting the necessary processor registers when context is switched to the process (e.g. esp/ebp on i386). This, too, is something only the operating system can do.
I'm doing a bit reverse engineering practice and I got stuck at this problem. The general idea is that having a process P1 call a function, f1(). At the beginning of f1() I let it sleep so our evil process P2 can kick in. In P2, I overwrite the return address on f1()'s stack to our evil function, fevil(). But when f1() wakes up, it crashes before jump to fevil().
More detail:
I'm using a kind of OS without any memory protection. Every process can read/write the
whole memory range.
The whole thing runs on x86 architecture.
The way I do it is locate the return address on the call stack of f1(), let's say 0xffeecc, and do *((int*) 0xffeecc) = fevil;
I'm using gcc and all sort of standard C stuff.
The OS is single thread, and these two processes are the only two running, in additional to main process.
So the question is why the whole thing crashes, and if it's the correct way to jump to a function by the address of the function.
I can provide more details upon request. Thank you.
Actually your compiler might implement some memory protection : like canaries against buffer overflow/stack-smashing.
It inserts magic words before the return address and check its integrity before jumping at it.
You may have overwritten this marker.