What's the way to simulate success after SIGSEGV? (C)

I'm capturing a SIGSEGV on a read/write to a known block of memory. The block is mmapped and under my control, so it can be manipulated. I'd like to simulate the read/write succeeding, actually process the data, and continue the application. I've got two possible solutions, but both seem too complicated. I'm hoping there's a better way to achieve this:
Borrow the trick from debuggers and:
mmap the area and protect
wait for SIGSEGV
get the read/write size from instruction type
for reads, put the required data in memory and remove protection
single-step the app
for writes, read what was written and process
in the single-step TRAP handler, protect the page again and continue the app
Do some crazy processing on the instruction itself and:
mmap the area and protect
wait for SIGSEGV
get the instruction under eip and simulate its effects
return after the instruction
The app is not running under the root account, in case that matters.
I'm assuming x86_64 and don't really care about other platforms at the moment.

Related

Linux: Emulating memory via signal handler

I basically want to emulate memory by catching SIGSEGV for specific locations. These locations will be zero-permission-mapped using mmap(). Performance doesn't matter too much, as this is just an experiment. I have figured out how to determine the memory location accessed in the handler, but I am stuck trying to figure out whether a read or a write happened, and how to simulate a successful read with fake data or a successful write, intercepting the data written.
Can you give me any tips, or other approaches (maybe something that hasn't got anything to do with signals at all) to this problem?
I wish there were more to find about this on the internet; guess nobody had this kind of stupid idea before lol
Thanks
I am not sure I understand what you mean by simulate. There are many more unknowns - the width of the access, side effects, etc. In any case, even if you found all the missing pieces, what are you going to do next?
If you simply simulate the access and return from the handler, the CPU will re-run the faulting instruction, and guess what? - it will immediately segfault again.
Now opening a can of worms.
You may try to
find what the instruction pointer in ucontext points to
decode the instruction it points to and simulate it the way you want (tricky)
keeping in mind that the instruction may have side effects (like setting/clearing flags, or modifying registers)
figure out what the next instruction would be (doubleplustricky)
doctor ucontext appropriately
and return from the handler.
I am sure there will be too many corner cases.
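A minimal sketch of just the first two steps on Linux/x86-64 with glibc follows; the instruction decoding and emulation is deliberately left out (that is the hard part), and fprintf is used only for brevity even though it is not async-signal-safe:
#define _GNU_SOURCE
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <ucontext.h>

static void segv_handler(int sig, siginfo_t *info, void *ctx)
{
    (void)sig;
    ucontext_t *uc = ctx;
    void *fault_addr = info->si_addr;                 /* address that faulted */
    unsigned char *rip = (unsigned char *)uc->uc_mcontext.gregs[REG_RIP];

    /* not async-signal-safe; fine for a throwaway demo only */
    fprintf(stderr, "fault at %p, first opcode byte %02x\n", fault_addr, rip[0]);

    /* To really "simulate" the access you would decode the instruction here,
     * apply its effects to uc->uc_mcontext.gregs, and advance REG_RIP past it.
     * Returning without doing that just re-runs the faulting instruction. */
    _exit(1);
}

int main(void)
{
    struct sigaction sa = {0};
    sa.sa_sigaction = segv_handler;
    sa.sa_flags = SA_SIGINFO;
    sigaction(SIGSEGV, &sa, NULL);

    *(volatile int *)0 = 42;   /* deliberately fault */
    return 0;
}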
Now opening another can of worms
Another, more debugger-like approach is to implement a single-step handler (a rough sketch follows the lists below). In a segfault handler
back up the protected area, and prepare it for a read
remove the protection
enable a single-step flag in ucontext
get ready to handle the single-step exception
return
and in a single-step handler
compare the protected area with the backup. If they differ, it was a write access; do whatever is deemed necessary
restore the protection
return
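A rough sketch of that flow on Linux/x86-64, assuming a single 4 KiB page mapped elsewhere as PROT_NONE and both handlers installed with sigaction() and SA_SIGINFO; setting the trap flag (bit 8 of EFLAGS) in the saved context makes the kernel deliver SIGTRAP right after the retried instruction:
#define _GNU_SOURCE
#include <signal.h>
#include <string.h>
#include <sys/mman.h>
#include <ucontext.h>

#define TF_BIT    0x100           /* x86 trap flag in EFLAGS */
#define PAGE_SIZE 4096

static void *page;                /* the protected page, mmap'ed PROT_NONE elsewhere */
static char backup[PAGE_SIZE];    /* copy used to detect and extract writes */

static void on_segv(int sig, siginfo_t *info, void *ctx)
{
    (void)sig; (void)info;
    ucontext_t *uc = ctx;
    memcpy(backup, page, PAGE_SIZE);                     /* snapshot before the access */
    mprotect(page, PAGE_SIZE, PROT_READ | PROT_WRITE);   /* let the access go through */
    uc->uc_mcontext.gregs[REG_EFL] |= TF_BIT;            /* single-step the retried insn */
}

static void on_trap(int sig, siginfo_t *info, void *ctx)
{
    (void)sig; (void)info;
    ucontext_t *uc = ctx;
    if (memcmp(backup, page, PAGE_SIZE) != 0) {
        /* it was a write: the new data is now in `page`; process it here */
    }
    mprotect(page, PAGE_SIZE, PROT_NONE);                /* re-arm the trap */
    uc->uc_mcontext.gregs[REG_EFL] &= ~TF_BIT;           /* stop single-stepping */
}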

What happens at the CPU level if you dereference a null pointer?

Suppose I have following program:
#include <signal.h>
#include <stddef.h>
#include <stdlib.h>
static void myHandler(int sig){
    abort();
}

int main(void){
    signal(SIGSEGV, myHandler);
    char* ptr = NULL;
    *ptr = 'a';
    return 0;
}
As you can see, I register a signal handler and, a few lines further down, dereference a null pointer ==> SIGSEGV is triggered.
But how is it triggered?
If I run it using strace (Output stripped):
//Set signal handler (In glibc signal simply wraps a call to sigaction)
rt_sigaction(SIGSEGV, {sa_handler=0x563b125e1060, sa_mask=[SEGV], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7ffbe4fe0d30}, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=0}, 8) = 0
//SIGSEGV is raised
--- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=NULL} ---
rt_sigprocmask(SIG_UNBLOCK, [ABRT], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, ~[RTMIN RT_1], [SEGV], 8) = 0
But something is missing, how does a signal go from the CPU to the program?
My understanding:
[Dereferences null pointer] -> [CPU raises an exception] -> [??? (How does it go from the CPU to the kernel?) ] -> [The kernel is notified, and sends the signal to the process] -> [??? (How does the process know, that a signal is raised?)] -> [The matching signal handler is called].
What happens at these two places marked with ????
A NULL pointer in most (but not all) C implementations is address 0. Normally this address is not in a valid (mapped) page.
Any access to a virtual page that's not mapped by the HW page tables results in a page-fault exception. e.g. on x86, #PF.
This invokes the OS's page-fault exception handler to resolve the situation. On x86-64 for example, the CPU pushes exception-return info on the kernel stack and loads a CS:RIP from the IDT (Interrupt Descriptor Table) entry that corresponds to that exception number. Just like any other exception triggered by user-space, e.g. integer divide by zero (#DE), or a General Protection fault #GP (trying to run a privileged instruction in user-space, or a misaligned SIMD instruction that required alignment, or many other possible things).
The page-fault handler can find out what address user-space tried to access. e.g. on x86, there's a control register (CR2) that holds the linear (virtual) address that caused the fault. The OS can get a copy of that into a general-purpose register with mov rax, cr2.
Other ISAs have other mechanisms for the OS to tell the CPU where its page-fault handler is, and for that handler to find out what address user-space was trying to access. But it's pretty universal for systems with virtual memory to have essentially equivalent mechanisms.
The access is not yet known to be invalid. There are several reasons why an OS might not have bothered to "wire" a process's allocated memory into the hardware page tables. This is what paging is all about: letting the OS correct the situation, like copy-on-write, lazy allocation, or bringing a page back in from swap space.
Page faults come in three categories: (copied from my answer on another question). Wikipedia's page-fault article says similar things.
valid (the process logically has the memory mapped, but the OS was lazy or playing tricks like copy-on-write):
hard: the page needs to be paged in from disk, either from swap space or from a disk file (e.g. a memory mapped file, like a page of an executable or shared library). Usually the OS will schedule another task while waiting for I/O: this is the key difference between hard (major) and soft (minor).
soft: No disk access required, just for example allocating + zeroing a new physical page to back a virtual page that user-space just tried to write. Or copy-on-write of a writeable page that multiple processes had mapped, but where changes by one shouldn't be visible to the other (like mmap(MAP_PRIVATE)). This turns a shared page into a private dirty page.
invalid: There wasn't even a logical mapping for that page. A POSIX OS like Linux will deliver SIGSEGV signal to the offending process/thread.
So only after the OS consults its own data structures to see which virtual addresses a process is supposed to own can it be sure that the memory access was invalid.
Deciding whether a page fault is invalid or not is completely up to software. As I wrote on Why page faults are usually handled by the OS, not hardware? - if the HW could figure everything out, it wouldn't need to trap to the OS.
Fun fact: on Linux it's possible to configure the system so virtual address 0 is (or can be) valid. Setting mmap_min_addr = 0 allows processes to mmap there. e.g. WINE needs this for emulating a 16-bit Windows memory layout.
Since that wouldn't change the internal object-representation of a NULL pointer to be other than 0, doing that would mean that NULL dereference would no longer fault. That makes debugging harder, which is why the default for mmap_min_addr is 64k.
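A hedged illustration of that configuration (it only works if vm.mmap_min_addr has been lowered to 0; the NULL dereference is still undefined behaviour as far as the C language is concerned, this just shows the kernel-side effect):
#include <stdio.h>
#include <sys/mman.h>

int main(void)
{
    /* Fails with EPERM under the default mmap_min_addr of 64k */
    void *p = mmap((void *)0, 4096, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
    if (p == MAP_FAILED) {
        perror("mmap of page 0");
        return 1;
    }
    volatile char *ptr = NULL;
    *ptr = 'a';                  /* no longer faults once page 0 is mapped */
    printf("read back: %c\n", *ptr);
    return 0;
}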
On a simpler system without virtual memory, the OS might still be able to configure an MMU to trap on memory access to certain regions of address space. The OS's trap handler doesn't have to check anything, it knows any access that triggered it was invalid. (Unless it's also emulating something for some regions of address space...)
Delivering a signal to user-space
This part is pure software. Delivering SIGSEGV is no different than delivering SIGALRM or SIGTERM sent by another process.
Of course, a user-space process that just returns from a SIGSEGV handler without fixing the problem will make the main thread re-run the same faulting instruction again. (The OS would return to the instruction that raised the page-fault exception.)
This is why the default action for SIGSEGV is to terminate, and why it doesn't make sense to set the behaviour to "ignore".
Typically what happens is that when the CPU's Memory Management Unit finds that the virtual address the program is trying to access is not in any of the mappings to physical memory, it raises an interrupt. The OS will have set up an Interrupt Service Routine in case this happens. That routine does whatever is necessary inside the OS to signal the process with SEGV. On return from the ISR, the offending instruction has not been completed.
What happens then depends on whether or not there's a handler installed for SEGV. The language's runtime may have installed one that raises it as an exception. Almost always the process is terminated, as it is beyond recovery. Something like valgrind would do something more useful with the signal, e.g. telling you exactly where in the code the program had got to.
Where it gets interesting is when you look at the memory allocation strategies used by C runtime libraries like glibc. A NULL pointer dereference is a bit of an obvious one, but what about accessing beyond the end of an array? Often, calls to malloc() or new will result in the library asking the OS for more memory than was actually requested. The bet is that it can use that extra memory to satisfy further requests without troubling the OS - which is nice and fast. However, the CPU's MMU has no idea that that's happened. So if you do access beyond the end of the array, you're still accessing memory that the MMU can see is mapped to your process, but in reality you're starting to trample where you shouldn't. Some very defensive OSes don't do this, specifically so that the MMU does catch out-of-bounds accesses.
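A small illustration of that over-allocation point (undefined behaviour either way; the write past the end usually does not raise SIGSEGV because the allocator's chunk is still mapped, which is exactly the kind of bug valgrind or AddressSanitizer is for):
#include <stdlib.h>
#include <string.h>

int main(void)
{
    char *buf = malloc(16);
    if (!buf) return 1;
    memset(buf, 'x', 64);   /* 48 bytes past the end: still mapped, so typically no
                               SIGSEGV, just silent corruption of neighbouring data
                               and allocator metadata */
    free(buf);              /* glibc may abort here instead, having noticed the damage */
    return 0;
}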
This leads to interesting results. I’ve come across software that builds and runs just fine on Linux which, compiled for FreeBSD, starts throwing SEGVs. GNURadio is one such piece of software (it was a complex flow graph). Which is interesting because it makes heavy use of boost / c++11 smart pointers specifically to help avoid memory misuse. I’ve not yet been able to identify where the fault is to submit a bug report for that one...

Protecting heap data when a program is halted

Suppose I have a program that decrypts a file and stores the decrypted contents on the heap. I want to protect this information from other (non-root) processes running on the same system, so before I call free() to release the heap allocation, I'm using memset() to overwrite the data and make it unavailable to the next process that uses the same physical memory. (I understand this isn't a concern on some systems, but would prefer to err on the side of safety.)
However, I'm not sure what to do in cases where the program doesn't terminate normally, either through a forced termination (SIGINT, SIGTERM, etc.) or due to an error condition (SIGSEGV, SIGBUS, etc.). Should I just trap as many signals as possible to clear the heap before exiting, or is there a more orderly way of doing things?
An operating system that leaks contents of memory between processes (especially with different privileges) would be so broken from a security point of view that you doing it yourself won't change anything. Especially since on most operating systems the memory pages that you write to can at any point be taken away from you, swapped out and given to someone else. So I can safely say that you don't need to worry about normal termination unless you're on an operating system so specialized that it doesn't have anyone to leak the memory to. Also, there are certain ways to kill your process without you having any ability to catch the killing signal, so you couldn't handle all the cases anyway.
When it comes to abnormal termination (SIGSEGV, etc.) your best bet is to either disable dumping cores or at least make sure that your core dumps are only readable by you. That should be the main worry, the physical memory won't leak, but your core dumps could be readable by someone else.
That being said, it's still a very good practice to wipe secrets from memory as soon as you don't need them anymore. Not because they can leak to others through normal operation, because they can't, but because they can leak out through bugs. You might have an exploitable bug, maybe you get a stray pointer you'll write to a log, maybe you'll leave your key on the stack and then forget to initialize your data, etc. So your main worry shouldn't be to wipe out secrets from memory before exit, but to actually identify the point in your code where you don't need a secret anymore and wipe it right then and there.
Unfortunately, the memset you mention is not enough. Many compilers today are smart enough to recognize that some of your calls to memset are dead stores and optimize them away (like a memset of a stack buffer just before leaving a function, or just before free). See this issue in LibreSSL for a discussion about it, and this implementation of explicit_bzero for the best currently known attempt to work around it on clang and gcc.
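For illustration, a common portable fallback when explicit_bzero() (glibc >= 2.25, OpenBSD, FreeBSD) isn't available is to call memset through a volatile function pointer, which the compiler cannot prove still points at memset and therefore cannot elide; secure_wipe here is a hypothetical helper name:
#include <stddef.h>
#include <string.h>

/* Calling through a volatile pointer forces the call to survive optimization */
static void *(*const volatile memset_ptr)(void *, int, size_t) = memset;

static void secure_wipe(void *buf, size_t len)
{
    memset_ptr(buf, 0, len);
}
So instead of a bare memset(secret, 0, len) right before free(secret), you would call secure_wipe(secret, len) (or explicit_bzero(secret, len) where it exists) at the point where the secret is last needed.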

Sharing memory across multiple computers?

I'd like to share certain memory areas across multiple computers in a C/C++ project. When something on computer B accesses a memory area that currently lives on computer A, it has to be locked on A and sent to B. I'm fine if it's Linux-only.
Thanks in advance :D
You cannot do this for a simple C/C++ project.
Common computer hardware does not have the physical properties that support this directly: Memory on one system cannot be read by another system.
In order to make it appear to C/C++ programs on different machines that they are sharing memory, you have to write software that provides this function. Typically, you would need to do something like this:
Allocate some pages in the virtual memory address space (of each process).
Mark those pages read-only.
Set a handler to receive the exception that occurs when the process attempts to write to the read-only memory. (This handler might be in the operating system, as some sort of kernel extension, or it might be a signal handler in your process.)
When the exception is received, determine what the process was attempting to write to memory. Write that to the page (perhaps by writing it through a separate mapping in virtual memory to the same physical memory, with this extra mapping marked writeable).
Send a message by network communications to the other machine telling it that memory has changed.
Resume execution in the process after the instruction that wrote to memory.
Additionally, you need to determine what to do about memory coherence: If two processes write to the same address in memory at nearly the same time, what happens? If process A writes to location X and then reads location Y while, at nearly the same time, process B writes to location Y and reads X, what do they see? Is it okay if the two processes see data that cannot possibly be the result of a single time sequence of writes to memory?
On top of all that, this is hugely expensive in time: Stores to memory that require exception handling and network operations take many thousands, likely hundreds of thousands, times as long as normal stores to memory. Your processes will execute excruciatingly slowly whenever they write to this shared memory.
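A very rough sketch of just the write-detection piece on Linux (mprotect plus a SIGSEGV handler); send_update_to_peer() is a hypothetical placeholder, and a real DSM system would also need to capture what was written (e.g. with the single-step trick above) before shipping it:
#define _GNU_SOURCE
#include <signal.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

#define REGION_SIZE (1 << 20)
static char *region;                       /* the "shared" area */

static void on_write_fault(int sig, siginfo_t *info, void *ctx)
{
    (void)sig; (void)ctx;
    long pagesize = sysconf(_SC_PAGESIZE);
    uintptr_t page = (uintptr_t)info->si_addr & ~(uintptr_t)(pagesize - 1);
    /* Record the dirty page and let the write proceed; later the page would be
     * re-protected and its contents pushed to the other machine, e.g. via a
     * hypothetical send_update_to_peer((void *)page, pagesize). */
    mprotect((void *)page, pagesize, PROT_READ | PROT_WRITE);
}

int main(void)
{
    region = mmap(NULL, REGION_SIZE, PROT_READ,
                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (region == MAP_FAILED)
        return 1;

    struct sigaction sa = {0};
    sa.sa_sigaction = on_write_fault;
    sa.sa_flags = SA_SIGINFO;
    sigaction(SIGSEGV, &sa, NULL);

    region[100] = 'x';    /* faults once, the handler unprotects, the write succeeds */
    return 0;
}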
There are software solutions, as noted in the comments. These use the paging hardware in the processors on a node to detect access, and use your local network fabric to disseminate the changes to the memory. One hardware alternative is reflective memory - you can read more about it here:
https://en.wikipedia.org/wiki/Reflective_memory
http://www.ecrin.com/embedded/downloads/reflectiveMemory.pdf
The old page below is now broken:
http://www.dolphinics.com/solutions/embedded-system-reflective-memory.html
Reflective memory provides low latency (about one microsecond per hop) in either a ring or tree configuration.

C write/read detection on memory block

I'd like to ask if someone has any idea how to detect a write to an allocated memory address.
At first I used mprotect along with sigaction to force a segmentation fault when a write/read operation was made.
Two negative factors with this approach, among several:
it is difficult to get past the segmentation fault
the memory address passed to mprotect must be aligned to a page boundary, i.e. it is not possible to handle an address obtained from a simple malloc.
To clarify the problematic:
I'm building an app in C for a cluster environment. At some point I allocate memory that I call a buffer on the local host and assign some data to it. This buffer is sent to a remote node, which follows the same procedure. At some point the buffer will be written/read on the remote node, but I don't know when (DMA will be used to write/read the buffer), and the local host must be notified about the buffer modification. Like I said above, I already tried some mechanisms, but none of them handles this well. For now I just want some ideas.
Every idea is welcome here.
Thanks
You could use hardware breakpoints. The downsides are that this is hardware-specific and only a limited number of breakpoints can be set. Also, most of the time such facilities are not task-specific, so if you run multiple instances of the program they'll share the available 'slots'.
The x86 architecture has debug registers which can be used to set hardware memory breakpoints (see: http://en.wikipedia.org/wiki/X86_debug_register).
If you want to test this you could use GDB to set hardware breakpoints. You can use the 'watch' command of GDB to place a hardware memory breakpoint on a variable.
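If you want to set such a watchpoint from within the program itself rather than from GDB, on Linux the debug registers can be reached through perf_event_open(); a minimal counting-mode sketch (assuming kernel breakpoint support and a permissive perf_event_paranoid setting):
#include <linux/hw_breakpoint.h>
#include <linux/perf_event.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

static long watched;   /* the variable we want write-detection on */

int main(void)
{
    struct perf_event_attr attr;
    memset(&attr, 0, sizeof attr);
    attr.type           = PERF_TYPE_BREAKPOINT;
    attr.size           = sizeof attr;
    attr.bp_type        = HW_BREAKPOINT_W;        /* fire on writes only */
    attr.bp_addr        = (uintptr_t)&watched;
    attr.bp_len         = HW_BREAKPOINT_LEN_8;
    attr.exclude_kernel = 1;
    attr.exclude_hv     = 1;

    int fd = syscall(SYS_perf_event_open, &attr,
                     0 /* this thread */, -1 /* any cpu */, -1, 0);
    if (fd < 0) { perror("perf_event_open"); return 1; }

    watched = 42;                                 /* should register as one hit */

    uint64_t hits = 0;
    read(fd, &hits, sizeof hits);
    printf("write watchpoint fired %llu time(s)\n", (unsigned long long)hits);
    close(fd);
    return 0;
}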
Note that debug registers and mprotect() are just methods to get the job you're asking about done; I don't think they are sound engineering practices for memory management (which is probably what you're trying to do here). Maybe you can explain a bit more about what you're trying to do at a higher level: http://catb.org/esr/faqs/smart-questions.html#goal

Resources