I have been working with the RISC-V ISA over the weekend. The instruction
000000001111 | 00000 | 000 | 01010 | 0010011 | ADDI
(RISC-V ISA specification, page 116) should assign the value 15 to the a0 register.
I added the instruction as
__asm__ volatile(".byte 0x13, 0x05, 0xF0, 0x00");
(little-endian byte order already taken into account) in main in my C file. It compiles, but when I run it in gdb I get a SIGSEGV.
As far as I could read, the error is related to an invalid memory address.
A Segmentation Fault occurs when a program tries to access a memory location that it is not allowed to access, or when it tries to access a memory location in a way that is not allowed (for example, when it tries to write to a read-only memory location or to overwrite part of the operating system).
On Unix-like operating systems, a signal called SIGSEGV is sent to a process accessing an invalid memory address.
The error occurs only when this instruction is present. Similar instructions such as LW fail, too.
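The hand encoding above can be double-checked mechanically. The following is a minimal sketch, assuming the standard RISC-V I-type layout (imm[11:0], rs1, funct3, rd, opcode); the helper names `encode_itype` and `to_little_endian` are made up for illustration. It only checks the bit pattern and says nothing about whether the machine running the program decodes those bytes as RISC-V.

```c
#include <stdint.h>

/* Pack the five I-type fields into one 32-bit instruction word. */
uint32_t encode_itype(int32_t imm, unsigned rs1, unsigned funct3,
                      unsigned rd, unsigned opcode)
{
    return (((uint32_t)imm & 0xFFFu) << 20) |  /* imm[11:0] -> bits 31..20 */
           ((rs1 & 0x1Fu) << 15) |             /* rs1       -> bits 19..15 */
           ((funct3 & 0x7u) << 12) |           /* funct3    -> bits 14..12 */
           ((rd & 0x1Fu) << 7) |               /* rd        -> bits 11..7  */
           (opcode & 0x7Fu);                   /* opcode    -> bits 6..0   */
}

/* Split the word into the byte order a .byte directive needs on a
 * little-endian target: least significant byte first. */
void to_little_endian(uint32_t word, uint8_t out[4])
{
    for (int i = 0; i < 4; i++)
        out[i] = (uint8_t)(word >> (8 * i));
}
```

For ADDI a0, x0, 15 the fields are imm = 15, rs1 = 0, funct3 = 0, rd = 10 and opcode = 0x13, which gives the word 0x00F00513 and the bytes 0x13, 0x05, 0xF0, 0x00 used in the question.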
I'm new to VxWorks and working with an ELF binary for VxWorks. System calls appear to trap into the kernel by calling the address _func_syscallTrapHandle, which is 0x1234. Since the program must transition into the kernel, am I correct in assuming that the goal of this is to segfault by accessing low memory in order to enter the kernel? If so, does the segfault ISR check the contents of rax and, when it's 0x1234, perform the system call logic? Why isn't the syscall instruction used instead?
You are describing the system call trap mechanism in vxsim. Because VxWorks in this case runs as a normal process inside Linux or Windows, it cannot use the syscall instruction.
An ELF binary for real hardware behaves differently.
Suppose I have following program:
#include <signal.h>
#include <stddef.h>
#include <stdlib.h>

static void myHandler(int sig){
    abort();
}

int main(void){
    signal(SIGSEGV,myHandler);
    char* ptr=NULL;
    *ptr='a';
    return 0;
}
As you can see, I register a signal handler and, a few lines later, dereference a null pointer, so SIGSEGV is triggered.
But how is it triggered?
If I run it using strace (Output stripped):
//Set signal handler (In glibc signal simply wraps a call to sigaction)
rt_sigaction(SIGSEGV, {sa_handler=0x563b125e1060, sa_mask=[SEGV], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7ffbe4fe0d30}, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=0}, 8) = 0
//SIGSEGV is raised
--- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=NULL} ---
rt_sigprocmask(SIG_UNBLOCK, [ABRT], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, ~[RTMIN RT_1], [SEGV], 8) = 0
But something is missing, how does a signal go from the CPU to the program?
My understanding:
[Dereferences null pointer] -> [CPU raises an exception] -> [??? (How does it go from the CPU to the kernel?) ] -> [The kernel is notified, and sends the signal to the process] -> [??? (How does the process know, that a signal is raised?)] -> [The matching signal handler is called].
What happens at the two places marked with ???
A NULL pointer in most (but not all) C implementations is address 0. Normally this address is not in a valid (mapped) page.
Any access to a virtual page that's not mapped by the HW page tables results in a page-fault exception. e.g. on x86, #PF.
This invokes the OS's page-fault exception handler to resolve the situation. On x86-64 for example, the CPU pushes exception-return info on the kernel stack and loads a CS:RIP from the IDT (Interrupt Descriptor Table) entry that corresponds to that exception number. Just like any other exception triggered by user-space, e.g. integer divide by zero (#DE), or a General Protection fault #GP (trying to run a privileged instruction in user-space, or a misaligned SIMD instruction that required alignment, or many other possible things).
The page-fault handler can find out what address user-space tried to access. e.g. on x86, there's a control register (CR2) that holds the linear (virtual) address that caused the fault. The OS can get a copy of that into a general-purpose register with mov rax, cr2.
Other ISAs have other mechanisms for the OS to tell the CPU where its page-fault handler is, and for that handler to find out what address user-space was trying to access. But it's pretty universal for systems with virtual memory to have essentially equivalent mechanisms.
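From user-space, the address the kernel read from CR2 (or its equivalent) shows up as si_addr, as in the strace output in the question. Here is a minimal sketch, assuming a POSIX system with sigaction and SA_SIGINFO; the helper name `probe_read` is hypothetical. It reads one byte, and if that faults it records the signal number and the reported address, then unwinds with siglongjmp.

```c
#define _DEFAULT_SOURCE
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>

static sigjmp_buf probe_env;
static volatile sig_atomic_t probe_signal;
static void *volatile probe_fault_addr;

/* Runs in signal context: record what the kernel reported, then unwind. */
static void probe_handler(int sig, siginfo_t *info, void *context)
{
    (void)context;
    probe_signal = sig;
    probe_fault_addr = info->si_addr;  /* faulting address chosen by the kernel */
    siglongjmp(probe_env, 1);
}

/* Hypothetical helper: try to read one byte from addr.
 * Returns 0 if the read worked, otherwise the signal number;
 * *fault_addr receives the si_addr reported with the signal. */
int probe_read(const void *addr, void **fault_addr)
{
    struct sigaction sa, old_segv, old_bus;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = probe_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, &old_segv);
    sigaction(SIGBUS, &sa, &old_bus);

    probe_signal = 0;
    probe_fault_addr = NULL;
    if (sigsetjmp(probe_env, 1) == 0) {
        /* volatile forces the load to really happen */
        (void)*(const volatile char *)addr;
    }

    /* Put the previous handlers back before returning. */
    sigaction(SIGSEGV, &old_segv, NULL);
    sigaction(SIGBUS, &old_bus, NULL);
    if (fault_addr)
        *fault_addr = probe_fault_addr;
    return probe_signal;
}
```

On a typical Linux setup, probe_read(NULL, &where) should report SIGSEGV with where equal to NULL, while probing the address of an ordinary variable returns 0.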
The access is not yet known to be invalid. There are several reasons why an OS might not have bothered to "wire" a process's allocated memory into the hardware page tables. This is what paging is all about: letting the OS correct the situation, like copy-on-write, lazy allocation, or bringing a page back in from swap space.
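Lazy allocation is easy to observe from user-space. The sketch below assumes Linux (anonymous mmap and the ru_minflt counter from getrusage); the function name is made up for illustration. It maps fresh anonymous memory, touches one byte per page, and returns how many minor page faults the process accumulated while doing so.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <unistd.h>

/* Returns the number of minor faults taken while first-touching
 * `pages` freshly mapped anonymous pages, or -1 on error. */
long minor_faults_from_touching(size_t pages)
{
    long page = sysconf(_SC_PAGESIZE);
    size_t len = pages * (size_t)page;

    /* The kernel only records the mapping here; no physical
     * memory has to be assigned yet. */
    void *raw = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (raw == MAP_FAILED)
        return -1;
    volatile char *mem = raw;

    struct rusage before, after;
    getrusage(RUSAGE_SELF, &before);
    for (size_t i = 0; i < pages; i++)
        mem[i * (size_t)page] = 1;   /* first write to each page */
    getrusage(RUSAGE_SELF, &after);

    munmap(raw, len);
    return after.ru_minflt - before.ru_minflt;
}
```

The exact count depends on the kernel and its configuration (transparent huge pages, for example, can cover many pages with one fault), so treat the number as indicative only.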
Page faults come in three categories: (copied from my answer on another question). Wikipedia's page-fault article says similar things.
valid (the process logically has the memory mapped, but the OS was lazy or playing tricks like copy-on-write):
    hard: the page needs to be paged in from disk, either from swap space or from a disk file (e.g. a memory mapped file, like a page of an executable or shared library). Usually the OS will schedule another task while waiting for I/O: this is the key difference between hard (major) and soft (minor).
    soft: No disk access required, just for example allocating + zeroing a new physical page to back a virtual page that user-space just tried to write. Or copy-on-write of a writeable page that multiple processes had mapped, but where changes by one shouldn't be visible to the other (like mmap(MAP_PRIVATE)). This turns a shared page into a private dirty page.
invalid: There wasn't even a logical mapping for that page. A POSIX OS like Linux will deliver a SIGSEGV signal to the offending process/thread.
So only after the OS consults its own data structures to see which virtual addresses a process is supposed to own can it be sure that the memory access was invalid.
Deciding whether a page fault is invalid or not is completely up to software. As I wrote on Why page faults are usually handled by the OS, not hardware? - if the HW could figure everything out, it wouldn't need to trap to the OS.
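On Linux, a process can look at the kernel's view of its own address space through /proc/self/maps, which lists the logical mappings the page-fault handler checks against. A minimal sketch, assuming Linux with /proc mounted (the helper name `address_is_mapped` is hypothetical):

```c
#include <stdint.h>
#include <stdio.h>

/* Returns 1 if addr lies inside some mapping listed in
 * /proc/self/maps, 0 if not, -1 if the file cannot be read. */
int address_is_mapped(const void *addr)
{
    FILE *maps = fopen("/proc/self/maps", "r");
    if (!maps)
        return -1;

    uintptr_t target = (uintptr_t)addr;
    unsigned long start, end;
    int found = 0;

    /* Each line starts with "start-end" in hexadecimal. */
    while (fscanf(maps, "%lx-%lx", &start, &end) == 2) {
        if (target >= start && target < end) {
            found = 1;
            break;
        }
        /* Skip the rest of the line (permissions, offset, path). */
        int c;
        while ((c = fgetc(maps)) != '\n' && c != EOF)
            ;
    }
    fclose(maps);
    return found;
}
```

With the default configuration, the address of a local variable falls inside the stack mapping, while address 0 is not listed anywhere, which is why a NULL dereference ends up in the "invalid" category.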
Fun fact: on Linux it's possible to configure the system so virtual address 0 is (or can be) valid. Setting mmap_min_addr = 0 allows processes to mmap there. e.g. WINE needs this for emulating a 16-bit Windows memory layout.
Since that wouldn't change the internal object-representation of a NULL pointer to be other than 0, doing that would mean that NULL dereference would no longer fault. That makes debugging harder, which is why the default for mmap_min_addr is 64k.
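The current setting can be read from /proc/sys/vm/mmap_min_addr. A small sketch, assuming Linux with /proc mounted; the function name is made up for illustration:

```c
#include <stdio.h>

/* Returns the lowest address that mmap may hand out,
 * or -1 if the sysctl file cannot be read. */
long read_mmap_min_addr(void)
{
    FILE *f = fopen("/proc/sys/vm/mmap_min_addr", "r");
    if (!f)
        return -1;

    long value = -1;
    if (fscanf(f, "%ld", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}
```

A result of 0 would mean that mapping page 0 is allowed, in which case a NULL dereference is no longer guaranteed to fault.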
On a simpler system without virtual memory, the OS might still be able to configure an MMU to trap on memory access to certain regions of address space. The OS's trap handler doesn't have to check anything, it knows any access that triggered it was invalid. (Unless it's also emulating something for some regions of address space...)
Delivering a signal to user-space
This part is pure software. Delivering SIGSEGV is no different than delivering SIGALRM or SIGTERM sent by another process.
Of course, a user-space process that just returns from a SIGSEGV handler without fixing the problem will make the main thread re-run the same faulting instruction again. (The OS would return to the instruction that raised the page-fault exception.)
This is why the default action for SIGSEGV is to terminate, and why it doesn't make sense to set the behaviour to "ignore".
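Returning from the handler only makes sense if the handler fixed the cause. The following sketch, assuming Linux, shows that case: a page is mapped with PROT_NONE, the first write faults, the handler makes the page writable, and on return the same store instruction runs again and succeeds. The names are hypothetical, and note that mprotect is not on the POSIX list of async-signal-safe functions, although calling it from a handler like this is a common technique on Linux.

```c
#define _DEFAULT_SOURCE
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static volatile sig_atomic_t fault_count;
static void *guarded_page;
static size_t guarded_len;

/* Fix the mapping, then return so the faulting store is retried. */
static void fix_mapping(int sig, siginfo_t *info, void *context)
{
    (void)sig;
    (void)context;
    uintptr_t addr = (uintptr_t)info->si_addr;
    uintptr_t base = (uintptr_t)guarded_page;
    if (addr < base || addr >= base + guarded_len)
        _exit(1);                       /* not our page: give up */
    fault_count++;
    if (mprotect(guarded_page, guarded_len, PROT_READ | PROT_WRITE) != 0)
        _exit(1);
}

/* Returns how many faults were needed for one store, or -1 on error. */
int retry_after_fix(void)
{
    guarded_len = (size_t)sysconf(_SC_PAGESIZE);
    guarded_page = mmap(NULL, guarded_len, PROT_NONE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (guarded_page == MAP_FAILED)
        return -1;

    struct sigaction sa, old;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = fix_mapping;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, &old) != 0)
        return -1;

    fault_count = 0;
    volatile char *p = guarded_page;
    p[0] = 'a';              /* faults once; retried after the handler returns */
    int stored = (p[0] == 'a');

    sigaction(SIGSEGV, &old, NULL);
    munmap(guarded_page, guarded_len);
    return stored ? (int)fault_count : -1;
}
```

If the handler returned without calling mprotect, the store would fault again immediately and the process would loop in the handler, which is the situation described above.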
Typically what happens is that when the CPU's Memory Management Unit finds that the virtual address the program is trying to access is not in any of the mappings to physical memory, it raises an interrupt. The OS will have set up an Interrupt Service Routine just in case this happens. That routine will do whatever is necessary inside the OS to signal the process with SEGV. On return from the ISR, the offending instruction has not been completed.
What happens then depends on whether there's a handler installed for SEGV. The language's runtime may have installed one that raises it as an exception. Almost always the process is terminated, as it is beyond recovery. Something like valgrind would do something useful with the signal, e.g. telling you exactly where in the code the program had got to.
Where it gets interesting is when you look at the memory allocation strategies used by C runtime libraries like glibc. A NULL pointer dereference is a bit of an obvious one, but what about accessing beyond the end of an array? Often, calls to malloc() or new will result in the library asking for more memory than has been asked for. The bet is that it can use that memory to satisfy further requests for memory without troubling the OS - which is nice and fast. However, the CPU’s MMU has no idea that that’s happened. So if you do access beyond the end of the array, you’re still accessing memory that the MMU can see is mapped to your process, but in reality you’re beginning to trample where one shouldn’t. Some very defensive OSes don’t do this, specifically so that the MMU does catch out of bounds accesses.
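One visible symptom of this is that the allocator often hands back a block larger than requested. The sketch below assumes glibc (malloc_usable_size is declared in <malloc.h> there and is not part of standard C); the wrapper name is made up for illustration.

```c
#include <malloc.h>   /* malloc_usable_size: glibc-specific, not standard C */
#include <stdlib.h>

/* Returns how many bytes the allocator actually made usable for a
 * request of `requested` bytes, or 0 if the allocation failed. */
size_t usable_bytes_for(size_t requested)
{
    void *p = malloc(requested);
    if (!p)
        return 0;
    size_t usable = malloc_usable_size(p);
    free(p);
    return usable;
}
```

The slack inside the block, and the allocator's bookkeeping and neighbouring blocks after it, all live in pages that are mapped to the process, so writing a little past the end of an array is still a bug but usually does not raise SIGSEGV.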
This leads to interesting results. I’ve come across software that builds and runs just fine on Linux which, compiled for FreeBSD, starts throwing SEGVs. GNURadio is one such piece of software (it was a complex flow graph). Which is interesting because it makes heavy use of boost / c++11 smart pointers specifically to help avoid memory misuse. I’ve not yet been able to identify where the fault is to submit a bug report for that one...
Honestly, I am really confused with this particular virtual memory related concept.
Q1) When a page fault occurs, does the processor first finish executing the current instruction and then push the IP register contents (the address of the next instruction) onto the stack? Or does it abort the instruction being executed and push the contents of the instruction pointer register onto the stack?
Q2) If the second case is true, how does it resume the aborted instruction? When it resumes, the stack contains the instruction pointer value, which is the address of the next instruction, so it would never re-run the instruction where the page fault occurred.
What I think
I think the second case sounds wrong. The confusion arose while I was reading Operating System Principles by Silberschatz and Galvin. In it they write
when a page fault occurs, we will have to bring in the desired page, correct page table and restart the instruction.
But the instruction pointer always points to the address of the next instruction so it means, according to what this book is trying to convey, we are decrementing the value of IP just to restart the execution of the instruction where the page fault occurred?
In the Intel System Programming guide, chapter 6.5, it says
Faults — A fault is an exception that can generally be corrected and that, once corrected, allows the program
to be restarted with no loss of continuity. When a fault is reported, the processor restores the machine state to
the state prior to the beginning of execution of the faulting instruction. The return address (saved contents of
the CS and EIP registers) for the fault handler points to the faulting instruction, rather than to the instruction
following the faulting instruction.
A page fault is classified as a fault (no surprises there), so when a page fault happens you're in the state "before it ever happened". Well, not really, because you're in the fault handler (so EIP and ESP are definitely different, and CR2 contains the faulting address), but when you return it'll be the state before the fault ever happened, only with the changes made by the handler (so, put the page there, or kill the process).
How can Instruction Pointer register recover from a bad read or bad jump?
The kernel calls init code that in turn calls main(). If main() causes a stack overflow or whatever and RIP/EIP/IP fills with junk, how can the OS recover the CPU register?
The CPU has only one instruction pointer, right? So recovering from an overflow seems impossible from my point of view.
Yes, if the IP gets trashed and that causes a fault, only the bad value is known. It's unclear what you mean by "recovering from overflow". Of course the fault handler of the OS has a well defined address and the cpu goes there so IP will be well defined from then on. The OS may decide to terminate the process or if the program has installed a signal/exception handler the OS will make sure that is called. This handler can then load IP with an appropriate value.
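A user-space handler "loading IP with an appropriate value" can be sketched with sigsetjmp and siglongjmp. The example below assumes a POSIX system where address 16 is not mapped (true with the default Linux mmap_min_addr); calling through such a pointer is undefined behaviour in C and is done here only to trash the instruction pointer on purpose. The names are hypothetical.

```c
#define _DEFAULT_SOURCE
#include <setjmp.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>

static sigjmp_buf safe_point;
static volatile sig_atomic_t seen_signal;

/* Runs after the kernel has taken the fault and delivered the signal;
 * siglongjmp puts the instruction pointer back at a known-good place. */
static void back_to_safe_point(int sig)
{
    seen_signal = sig;
    siglongjmp(safe_point, 1);
}

/* Jumps to a junk address and returns the signal that resulted,
 * or 0 if, unexpectedly, nothing was raised. */
int jump_to_junk_and_recover(void)
{
    struct sigaction sa, old_segv, old_bus, old_ill;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = back_to_safe_point;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, &old_segv);
    sigaction(SIGBUS, &sa, &old_bus);
    sigaction(SIGILL, &sa, &old_ill);

    seen_signal = 0;
    if (sigsetjmp(safe_point, 1) == 0) {
        /* Assumption: nothing is mapped at address 16. */
        void (*volatile junk)(void) = (void (*)(void))(uintptr_t)16;
        junk();   /* instruction fetch from the junk address faults */
    }

    sigaction(SIGSEGV, &old_segv, NULL);
    sigaction(SIGBUS, &old_bus, NULL);
    sigaction(SIGILL, &old_ill, NULL);
    return seen_signal;
}
```

The old, trashed IP value is never repaired; the program simply continues from the state saved by sigsetjmp.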
When you trash the IP in the usermode, eventually a hardware fault occurs, be it a page fault, illegal opcode or something like that. Then the processor switches to supervisor/kernel mode and starts running a fault handler by setting the instruction pointer to a well-defined value.
The kernel code will then inspect the address at which the exception happened and/or the type of the exception. If it finds that the cause was one of these faults, the kernel will usually terminate the malfunctioning user-mode process.
If the IP gets loaded with an address from which it cannot execute, it triggers an EXCEPTION. A CPU usually recognizes a number of different types of exceptions and they are identified by a different number.
When the exception occurs, it causes the CPU to switch to kernel mode. That in turn causes the CPU to load the IP with the address of a handler defined to handle the specific type of exception and to load a kernel mode stack.
On x86, for example, exceptions are classified as faults, traps and aborts. After a fault, the instruction that caused it can be restarted. A trap is reported after the trapping instruction has completed, so execution continues with the next instruction. An abort is a severe error from which the program cannot be resumed. What happens at this point depends upon the type of exception.
If it's a page fault, the handler will try to load the page into memory.
For most other exceptions, the handler will try to find a user mode handler for the specific type of exception. See the signal function in Unix.
I used to think that x86-64 supports unaligned memory access and that an invalid memory access always causes a segmentation fault (except, perhaps, with SIMD instructions like movdqa or movaps). Nevertheless, I recently observed a bus error with a normal mov instruction. Here is a reproducer:
void test(void *a)
{
    asm("mov %0, %%rbp\n\t"
        "mov 0(%%rbp), %%rdx\n\t"
        : : "r"(a) : "rbp", "rdx");
}

int main()
{
    test((void *)0x706a2e3630332d69);
    return 0;
}
(must be compiled with frame pointer omission, e.g. gcc -O test.c && ./a.out).
The mov 0(%rbp), %rdx instruction and the address 0x706a2e3630332d69 were copied from a coredump of the buggy program. Changing the address to 0 causes a segfault, but just aligning it to 0x706a2e3630332d60 still gives a bus error (my guess is that it is related to the fact that the address space is 48-bit on x86-64).
The question is: which addresses cause bus error (SIGBUS)? Is it determined by architecture or configured by OS kernel (i.e. in page table, control registers or something similar)?
SIGBUS is in a sad state. There's no consensus between operating systems on what it should mean, and when it is generated varies wildly between operating systems, CPU architectures, configuration and the phase of the moon. Unless you work with a very specific configuration you should just treat it as "just like SIGSEGV, but different".
I suspect that originally it was supposed to mean "you tried a memory access that could not possibly be successful no matter what the kernel does", so in other words the exact bit pattern you have in the address can never be a valid memory access. Most commonly this would mean unaligned access on strict alignment architectures. Then some systems started using it for accesses to virtual address space that doesn't exist (like in your example, the address you have can't exist). Then by accident some systems made it also mean that userland tried to touch kernel memory (since at least technically it's virtual address space that doesn't exist from the point of view of userland). Then it became just random.
Other than that I've seen SIGBUS from:
access to non-existent physical address from mmap:ed hardware.
exec of non-exec mapping
access to perfectly valid mapping, but overcommitted memory couldn't be faulted in at this moment (I've seen SIGSEGV, SIGKILL and SIGBUS here, at least one operating system does this differently depending on which architecture you're on).
memory management deadlocks (and other "something went horribly wrong, but we don't know what" memory management errors).
stack red zone access
hardware errors (ECC memory, pci bus parity errors, etc.)
access to mmap:ed file where the file contents don't exist (past the end of the file or a hole).
access to mmap:ed file where the file contents should exist, but don't (I/O errors).
access to normal memory that got swapped out and swap in couldn't be performed (I/O error).
Generally, a SIGBUS can be sent on an unaligned memory access, e.g. when writing a 64-bit integer to an address that is not 8-byte aligned. However, on recent systems either the hardware itself handles it correctly (albeit a bit slower than an aligned access), or the OS emulates the access in an exception handler (with 2 or more separate memory accesses).
In this case, the problem is that the address lies outside the permissible virtual address space. Although a pointer is 64 bits wide, current 64-bit Intel processors with 4-level paging implement only 48 bits of virtual address, and bits 47 through 63 must all be equal (canonical form). That leaves a lower half, 0x0-0x00007fffffffffff (0 to 2^47-1), which Linux gives to its processes, and an upper half, 0xffff800000000000-0xffffffffffffffff, which is used by the kernel.
This means that here the kernel sends a SIGBUS because the address itself is invalid (it is not canonical, so it can never be mapped), as opposed to a SIGSEGV, which means that an access to a well-formed address failed (missing page entry, wrong access rights, etc.).
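Whether a given 64-bit value is a well-formed address can be checked with a few lines of arithmetic. This is a sketch under the assumption of 48-bit virtual addresses in canonical form, meaning bits 47 through 63 must all be equal (CPUs with 5-level paging use 57 bits instead); the function name is made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if addr is canonical for 48-bit virtual addressing:
 * bits 63..47 are either all zero or all one. */
bool is_canonical_48(uint64_t addr)
{
    uint64_t upper = addr >> 47;            /* the 17 bits 63..47 */
    return upper == 0 || upper == 0x1FFFF;
}
```

Both addresses from the question, 0x706a2e3630332d69 and the aligned 0x706a2e3630332d60, fail this check, whereas 0 passes it, which fits the observation that alignment made no difference.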
The only situation where POSIX specifically requires generation of a SIGBUS is when you create a file-backed mmap region that extends beyond the end of the backing file by more than a whole page, and then access addresses sufficiently far past the end. (The exact words are "References within the address range starting at pa and continuing for len bytes to whole pages following the end of an object shall result in delivery of a SIGBUS signal.", from the specification of mmap.)
In all other circumstances, whether you get a SIGSEGV or a SIGBUS for an invalid memory access, or no signal at all, is left completely up to the implementation.
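The POSIX-mandated case can be reproduced with a short sketch. It assumes a POSIX system with a writable /tmp; the function name and the temporary file template are made up for illustration. A 1-byte file is mapped over two pages, and the second page, which lies wholly past the end of the file, is read.

```c
#define _DEFAULT_SOURCE
#include <setjmp.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf eof_env;
static volatile sig_atomic_t eof_signal;

static void eof_handler(int sig)
{
    eof_signal = sig;
    siglongjmp(eof_env, 1);
}

/* Returns the signal raised by reading a mapped page that lies
 * entirely past the end of the file, 0 if none, -1 on setup error. */
int signal_for_read_past_eof(void)
{
    char path[] = "/tmp/sigbus-demo-XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                        /* file disappears once closed */
    if (ftruncate(fd, 1) != 0) {         /* file is one byte long */
        close(fd);
        return -1;
    }

    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t len = 2 * page;               /* mapping is two pages long */
    char *map = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);
    if (map == MAP_FAILED)
        return -1;

    struct sigaction sa, old_bus, old_segv;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = eof_handler;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGBUS, &sa, &old_bus);
    sigaction(SIGSEGV, &sa, &old_segv);

    eof_signal = 0;
    if (sigsetjmp(eof_env, 1) == 0) {
        /* First byte of the second page: no file content behind it. */
        (void)*(volatile char *)(map + page);
    }

    sigaction(SIGBUS, &old_bus, NULL);
    sigaction(SIGSEGV, &old_segv, NULL);
    munmap(map, len);
    return eof_signal;
}
```

Reading map[0], or any other byte in the first page, would not fault, since that page is partly backed by the file.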