I'm new to VxWorks and working with an ELF binary for VxWorks. System calls appear to trap into the kernel by calling the address _func_syscallTrapHandle, which is 0x1234. Since the program must transition into the kernel, am I correct in assuming that the goal is to segfault by accessing low memory and thereby enter the kernel? If so, does the segfault ISR check the contents of rax and, when it's 0x1234, perform the system call logic? Why isn't the syscall instruction used instead?
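You are describing the system call trap mechanism of vxsim, the VxWorks simulator. In this case VxWorks runs as a normal process inside Linux or Windows, so it cannot use the syscall instruction: that would trap into the host operating system's kernel, not into the simulated VxWorks kernel.
An ELF binary built for real hardware behaves differently.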
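To illustrate why a simulator can route system calls through an ordinary function call instead of a trap instruction, here is a minimal, hypothetical sketch. The names `syscall_trap_hook`, `sim_kernel_trap`, `sim_kernel_init` and `user_syscall` are invented for illustration and are not VxWorks APIs. The `_func_` prefix of `_func_syscallTrapHandle` suggests a function-pointer hook, but that is an assumption here. The idea is only that the simulated kernel lives in the same host process, stores its entry point in a global pointer, and user code calls through it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical signature of the simulated kernel's system-call entry. */
typedef long (*syscall_trap_fn)(long nr, long a1, long a2);

/* Hypothetical hook, similar in spirit to _func_syscallTrapHandle:
   a global pointer that the simulated kernel fills in when it boots. */
syscall_trap_fn syscall_trap_hook = NULL;

/* Hypothetical simulated-kernel handler with two toy services. */
long sim_kernel_trap(long nr, long a1, long a2)
{
    switch (nr) {
    case 1:  return 42;        /* toy "getpid" */
    case 2:  return a1 + a2;   /* toy service that uses its arguments */
    default: return -1;        /* unknown system call */
    }
}

/* "Boot" the simulated kernel: install its entry point in the hook. */
void sim_kernel_init(void)
{
    syscall_trap_hook = sim_kernel_trap;
}

/* User-side stub: no trap instruction, just a call through the hook.
   Executing a real `syscall` here would reach the host OS instead. */
long user_syscall(long nr, long a1, long a2)
{
    if (syscall_trap_hook == NULL)
        return -1;             /* simulated kernel not initialised yet */
    return syscall_trap_hook(nr, a1, a2);
}
```

Because everything runs in one host process, no fault and no privilege change is involved; the "transition into the kernel" is just a jump to the simulator's code.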
I know that a process switches between user mode and kernel mode while it runs. What confuses me is whether every line of code might need the kernel. Below is an example; could I get an explanation of the kernel's role in executing the following lines? Does the following actually require kernel mode?
if (a < 0)
    a++;
I am confused about whether every line of code might need the kernel.
Most code in user-space is executed without the kernel being involved. The kernel becomes involved (and the CPU switches from user-space to kernel) when:
a) The user-space code explicitly asks the kernel to do something (calls a system call).
b) There's an IRQ (from a device) that interrupts user-space code.
c) The kernel is providing some functionality that user-space code is unaware of. The most common reason is virtual memory management, but debugging and profiling are other reasons.
d) Asynchronous notifications (e.g. something causing a switch to kernel so that kernel can redirect the program to a suitable signal handler).
e) The user-space code does something illegal (crashes).
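Case (d) can be observed from user space with a POSIX signal. This is a minimal sketch assuming a Linux/Unix system; `on_sigusr1`, `got_signal` and `deliver_sigusr1_to_self` are illustrative names. `raise()` enters the kernel to send SIGUSR1 to the calling process, and the kernel delivers it by diverting execution into the handler; POSIX guarantees that `raise()` does not return until the handler has finished.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Set by the handler; volatile sig_atomic_t is the type that is
   safe to write from a signal handler. */
static volatile sig_atomic_t got_signal = 0;

static void on_sigusr1(int signo)
{
    (void)signo;
    got_signal = 1;
}

/* Install the handler, then send SIGUSR1 to ourselves.
   Returns 1 if the handler ran, -1 on error. */
int deliver_sigusr1_to_self(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigusr1;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return -1;

    got_signal = 0;
    if (raise(SIGUSR1) != 0)   /* kernel redirects us into on_sigusr1 */
        return -1;
    return got_signal;
}
```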
Does the following actually require kernel mode?
That code (if (a < 0) a++;) probably won't require the kernel's assistance, but it might. For example, if the variable a is in memory that was previously sent to swap space, then any attempt to access a causes a page fault, which asks the kernel to fetch that data back from swap space. Similarly, if the executable file was memory-mapped but not loaded yet (a common optimization to improve program startup time), then attempting to execute any instruction (regardless of what the instruction is) could fault and ask the kernel to fetch the code from the executable file on disk.
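Forcing a swap-in is hard to do reliably, so this sketch uses a closely related case that is easy to observe on Linux: anonymous memory from `mmap` is allocated lazily, so the first ordinary store to each page faults into the kernel even though the code makes no system call in the loop. It assumes Linux semantics for lazy allocation and for the `ru_minflt` counter from `getrusage`; `faults_from_touching` is an illustrative name.

```c
#define _DEFAULT_SOURCE            /* for MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <unistd.h>

/* Number of minor page faults this process has taken so far. */
static long minor_faults(void)
{
    struct rusage ru;
    getrusage(RUSAGE_SELF, &ru);
    return ru.ru_minflt;
}

/* Map `pages` fresh anonymous pages, then only do plain stores into them.
   Returns how many minor faults the loop caused, or -1 on error. */
long faults_from_touching(size_t pages)
{
    long page = sysconf(_SC_PAGESIZE);
    size_t len = pages * (size_t)page;
    volatile char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    long before = minor_faults();
    for (size_t i = 0; i < len; i += (size_t)page)
        p[i] = 1;                  /* ordinary user-space store */
    long after = minor_faults();

    munmap((void *)p, len);
    return after - before;
}
```

A call like `faults_from_touching(64)` should report roughly one fault per page, though the exact number depends on the kernel.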
Short answer:
It depends on what you are trying to do. Depending on the environment and how it's compiled, the code above shouldn't need the kernel. The CPU executes machine code directly, only trapping to the kernel on instructions like syscall, on faults such as a page fault, or on an interrupt.
The ISA is designed so that a kernel can set up the page tables in a way that stops user-space from taking over the machine, even though the CPU fetches and executes its machine code directly. This is how user-space code can run at full speed when it's just operating on its own data, doing pure computation rather than hardware access.
Long answer:
Comparing a value and incrementing it shouldn't require the kernel. On x86-64 the code could be represented like this (in NASM syntax):
; a is in RAX, perhaps a return value from some earlier function
cmp rax, 0 ; if (a<0) implemented as
jnl no_increase ; a jump over the inc if a is Not Less-than 0
inc rax
no_increase:
Actual compilers do it branchlessly, with various tricks as you can see on the Godbolt compiler explorer.
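The branch-free idea can be expressed directly in C. This is a small illustration, not a claim about the exact instructions any particular compiler emits (check Godbolt for that); both function names are illustrative.

```c
#include <assert.h>
#include <limits.h>

/* Direct translation of the question's code, with a branch. */
long inc_if_negative_branchy(long a)
{
    if (a < 0)
        a++;
    return a;
}

/* Branch-free variant: in C, (a < 0) evaluates to 0 or 1, so it can be
   added directly. For a signed value that is just the sign bit, which a
   compiler can extract with a shift instead of a compare-and-jump. */
long inc_if_negative_branchless(long a)
{
    return a + (a < 0);
}
```

Either way, this is plain arithmetic on a register; neither version involves the kernel.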
There aren't any system calls here, so this code can run on any x86-64 machine, but on its own it wouldn't do anything meaningful.
What requires the kernel are system calls. Strictly speaking, producing output doesn't require a system call: in theory you could find the memory location that corresponds to video memory and manipulate pixels directly to draw on the screen. In userland this isn't possible because of virtual memory, since a process can only touch memory that the kernel has mapped for it.
A userspace application needs a kernel to exist; without a kernel there would be no userspace at all :) Note also that not every kernel provides a userspace.
So only doing something like:
write(STDOUT_FILENO, "windows sucks linux rocks\n", 26);
would obviously require a kernel.
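Reading or writing an arbitrary memory location directly, for example 0xB8000 (the VGA text-mode buffer) to manipulate video memory, doesn't need a kernel, as long as the code runs with direct access to physical memory (e.g., on bare metal).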
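Here is a sketch of what such a kernel-less write might look like. In VGA colour text mode each screen cell is a 16-bit value: the low byte is the character and the high byte is the colour attribute. The buffer is passed as a parameter (an illustrative design choice) so the same logic can be tested against ordinary memory; on bare metal you would pass `(volatile uint16_t *)0xB8000`. Doing that from a normal Linux process would simply segfault. `vga_puts` is a hypothetical name and does no bounds checking against the 80x25 screen.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One text-mode cell: attribute in the high byte, character in the low. */
static uint16_t vga_cell(char c, uint8_t attr)
{
    return (uint16_t)(((uint16_t)attr << 8) | (uint8_t)c);
}

/* Write a string starting at the top-left cell of `buf`.
   Returns the number of cells written. */
size_t vga_puts(volatile uint16_t *buf, const char *s, uint8_t attr)
{
    size_t i = 0;
    for (; s[i] != '\0'; i++)
        buf[i] = vga_cell(s[i], attr);
    return i;
}
```

For example, `vga_puts((volatile uint16_t *)0xB8000, "Hi", 0x0F)` would print "Hi" in white on black at the top-left corner, without any system call.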
TL;DR: The example code you provided needs a kernel only in order to run in userspace; it could also be used on a system where userspace and kernel don't exist at all and work perfectly fine (e.g., microcontrollers).
In simpler words: it doesn't require the kernel to work, since it doesn't make any system calls. But to run as a meaningful program on a modern operating system it would at least need an exit system call to terminate with an exit code. A freestanding program (one without the C runtime, whose startup code normally makes that call after main returns) would otherwise run off the end of its code and typically die with a segmentation fault, even though you never did any dynamic allocation.
The code we write obviously runs in user mode. Kernel mode only comes into the picture when your code performs a system call,
and since the if() doesn't call any system function, it won't enter kernel mode.
I just could not wrap my head around the idea of debuggers and probing tools.
How is it technically possible to insert debugging printk statements into a running kernel module or userspace application using kprobes and uprobes? What terminology describes the behavior of kprobes and uprobes in terms of memory? How is it possible to stretch the address space while the program is running?
There are usually single-byte instructions that cause a breakpoint (a software interrupt; on x86 this is int3, opcode 0xCC), and there are also debug registers in the processor.
With these it is possible to insert a trap that jumps to the kernel trap handler anywhere in memory without extending any "memory space": you just set the debug registers, or replace the instruction at the desired breakpoint with the trap instruction.
Within the trap handler the kernel knows the exact address where the trap occurred and can therefore inspect the state of the registers and so forth. For a trap caused by a single-byte instruction, you'd replace the trap instruction with the original one, possibly use a processor feature to single-step it (on x86, the trap flag), and then put the trap instruction back.
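The "replace one byte, remember the original" bookkeeping can be sketched on a plain byte array standing in for a process's code. This is a simulation under stated assumptions: a real debugger writes into another process with `ptrace(PTRACE_POKETEXT, ...)`, and kprobes patch kernel text with their own machinery. `struct breakpoint`, `bp_arm` and `bp_disarm` are hypothetical names. Note that the code never grows; one byte is overwritten in place.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define INT3 0xCC   /* x86 one-byte breakpoint instruction */

/* Hypothetical bookkeeping for one software breakpoint. */
struct breakpoint {
    size_t  offset;   /* position in the code buffer */
    uint8_t saved;    /* original byte that INT3 overwrote */
    int     armed;
};

/* Arm: remember the original byte and overwrite it with INT3. */
void bp_arm(uint8_t *code, struct breakpoint *bp, size_t offset)
{
    bp->offset = offset;
    bp->saved = code[offset];
    code[offset] = INT3;
    bp->armed = 1;
}

/* Disarm: restore the original byte, as the trap handler would before
   single-stepping the original instruction. */
void bp_disarm(uint8_t *code, struct breakpoint *bp)
{
    if (bp->armed) {
        code[bp->offset] = bp->saved;
        bp->armed = 0;
    }
}
```

After single-stepping the restored instruction, a debugger would call `bp_arm` again at the same offset to keep the breakpoint active.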
I'm trying to understand the internals of the Linux kernel by reading Robert Love's Linux Kernel Development.
On page 74 he says this about the easiest way to pass arguments to a syscall:
Somehow, user-space must relay the parameters to the kernel during the
trap. The easiest way to do this is via the same means that the syscall
number is passed: The parameters are stored in registers. On x86-32,
the registers ebx, ecx, edx, esi, and edi contain, in order, the first
five arguments.
Now this is bothering me for a number of reasons:
All syscalls are defined with the asmlinkage option, which implies that the arguments are always found on the stack and not in registers. So what is all this business with the registers?
Perhaps the values are copied onto the kernel stack before the syscall is performed. I have no idea why that would be efficient, but it is a possibility.
(This answer is for 32-bit x86 Linux to match your question; things are slightly different for 64-bit x86 and other architectures.)
The parameters are passed from userspace in registers as Love says.
When userspace invokes a system call with int $0x80, the kernel syscall entry code gets control. This is written in assembly language and can be seen here, for instance. One of the things this code does is to take the parameters from the registers and push them onto the stack, and then call the appropriate kernel sys_XXX() function (which is written in C). So those functions do indeed expect their arguments on the stack.
It wouldn't work as well to try to pass parameters from userspace to the kernel on the stack. When the system call is made, the CPU switches to a separate kernel stack, so the parameters would have to be copied from the userspace stack to the kernel stack, and this is somewhat complicated. And it would have to be done even for very simple system calls that just take a few numeric arguments and wouldn't otherwise need to access userspace memory at all (think about close() for instance).
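The division of labour described above (the entry stub moves register values out, then ordinary C functions take them as normal arguments) can be modelled in plain C. This is a toy simulation, not kernel code: `struct user_regs32`, `toy_add`, `toy_sum5` and `fake_syscall_entry` are hypothetical, and the real entry code is assembly that pushes the registers onto the kernel stack where an asmlinkage function finds them.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical snapshot of user registers at "int $0x80" time, following
   the x86-32 convention: eax = syscall number, ebx..edi = arguments 1-5. */
struct user_regs32 {
    long eax, ebx, ecx, edx, esi, edi;
};

/* Ordinary C "system call" implementations: they take C arguments and
   know nothing about registers. */
static long toy_add(long a, long b, long c, long d, long e)
{
    (void)c; (void)d; (void)e;
    return a + b;
}

static long toy_sum5(long a, long b, long c, long d, long e)
{
    return a + b + c + d + e;
}

typedef long (*syscall_fn)(long, long, long, long, long);
static const syscall_fn syscall_table[] = { toy_add, toy_sum5 };

/* Stand-in for the assembly entry stub: copy the argument registers out
   and call the C handler selected by eax. */
long fake_syscall_entry(const struct user_regs32 *r)
{
    size_t n = sizeof syscall_table / sizeof syscall_table[0];
    if (r->eax < 0 || (size_t)r->eax >= n)
        return -ENOSYS;          /* Linux reports unknown syscalls this way */
    return syscall_table[r->eax](r->ebx, r->ecx, r->edx, r->esi, r->edi);
}
```

Nothing here touches the user stack: the arguments only ever live in the saved registers, which is why simple calls like close() stay cheap.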
ptrace can get the registers and memory data at syscall entry and exit. But if the Linux syscall handler changes some memory, including somewhere on the stack, how can I find out which memory has been changed?
You cannot in general; but strace, for example (which itself uses ptrace), knows the semantics of most (all?) syscalls and can show you the memory that was changed.
For example, if the syscall number is 0 on x86-64, strace knows that the read() syscall was invoked and that the kernel will write to the address given in the second parameter. The number of bytes written there equals the return value of the syscall.
Those memory locations can then be read with PTRACE_PEEK* and displayed to you.
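The per-syscall knowledge strace relies on can be sketched as a small lookup: given the syscall number, its raw arguments and its return value as ptrace reports them at syscall exit, decide which user memory the kernel wrote. `written_range` is a hypothetical helper that models only read() and pread64(); a real tool encodes this for every syscall it understands, and anything else falls into the "cannot tell" case.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/syscall.h>

/* Returns 1 and fills *addr/*len when a written range can be reported,
   0 when the call failed or its semantics are not modelled here. */
int written_range(long nr, const unsigned long args[6], long ret,
                  unsigned long *addr, size_t *len)
{
    switch (nr) {
    case SYS_read:           /* read(fd, buf, count) */
    case SYS_pread64:        /* pread64(fd, buf, count, offset) */
        if (ret < 0)
            return 0;        /* failed call: nothing written */
        *addr = args[1];     /* second argument is the buffer */
        *len = (size_t)ret;  /* return value = bytes written there */
        return 1;
    default:
        return 0;            /* not modelled: cannot tell */
    }
}
```

With the range known, the tracer would then fetch those bytes word by word with PTRACE_PEEKDATA.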
However, for custom syscalls with unknown or less strict semantics (for example, a proposed syscall write_to_random_memory_location()), you cannot determine memory changes with ptrace() in general (neither from the kernel nor from userspace).
Depending on what you need to achieve, a general solution can only be to utilize some sort of virtualization (for example, what valgrind does in userspace) and simulate/watch all memory accesses.
I am developing a simple operating system just to understand its internals better. While developing a boot loader and a simple kernel that runs in 16-bit real mode, I came across the unfamiliar term "system call" and the familiar term "interrupt".
I have been googling the terms since then, but the concepts are still unclear to me. As far as I understand, system calls are used by application programs running in the least privileged mode to request a service from the kernel running in a higher privileged mode (ring 0).
I am still unclear on how system calls are implemented.
Say I write a simple C program that prints a word, and compile it. I am left with an executable file that contains a system call to print the given word on the screen. My questions about this scenario are as follows:
Question 1:
As soon as the program is executed, the system call informs the kernel of the request. What exactly happens here in terms of low-level programming?
Question 2:
Can an Interrupt be a System Call or vice versa?
If it seems that I have not understood the concepts clearly, please explain the concept of a system call.
Thank you.
On most systems, interrupts and system calls (and exception handlers) are implemented in the same way.
As soon as the program is executed, the system call informs the kernel of the request. What exactly happens here in terms of low-level programming?
Usually, system calls are wrappers around assembly language routines. The sequence of events is:
Call to System Routine
System Routine unpacks parameters and loads them into registers.
System Routine forces an exception (identified by a number) by executing a change mode instruction (to some mode higher than user mode).
The CPU handles the exception by dispatching to an exception handler in the system dispatch table.
The handler performs the system service.
The handler executes a return from exception or interrupt instruction, returning the process to user mode (or whatever mode was called from) and to the system service routine.
The system service routine unpacks the return values from registers and updates the parameters.
Return to the calling function.
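The steps above can be sketched for x86-64 Linux, where the wrapper loads the system call number into rax and the arguments into rdi, rsi and rdx, and then executes the syscall instruction as its "change mode" instruction. This is a minimal sketch, not glibc's actual implementation; `raw_syscall3` and `my_write` are hypothetical names. On other platforms it falls back to libc's generic `syscall()` function and converts its -1/errno result to the same negative-errno convention.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Steps 2-3: load number and arguments into registers, then switch mode.
   x86-64 Linux convention: rax = number, rdi/rsi/rdx = args 1-3; the
   result comes back in rax, and the instruction clobbers rcx and r11. */
static long raw_syscall3(long nr, long a1, long a2, long a3)
{
#if defined(__x86_64__) && defined(__linux__)
    long ret;
    __asm__ volatile ("syscall"
                      : "=a"(ret)
                      : "a"(nr), "D"(a1), "S"(a2), "d"(a3)
                      : "rcx", "r11", "memory");
    return ret;                      /* negative errno on failure */
#else
    long r = syscall(nr, a1, a2, a3);
    return r == -1 ? -(long)errno : r;
#endif
}

/* Steps 1 and 7-8: a hypothetical write() wrapper around the trap. */
long my_write(int fd, const void *buf, size_t n)
{
    return raw_syscall3(SYS_write, fd, (long)buf, (long)n);
}
```

Calling `my_write(STDOUT_FILENO, "hi\n", 3)` prints "hi" and returns 3; the handler and return-from-exception steps happen inside the kernel, between the `syscall` instruction and the next line of C.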
Can an Interrupt be a System Call or vice versa?
No. They are dispatched in the same way.
Presumably an operating system could map system calls and interrupts to the same handler but that would be screwy.
System Calls are like function calls to the operating system, that perform operations that cannot or should not be handled manually by the programs and fall in the task scope of the operating system, e.g. file manipulation, writing to screen etc.
The x86 handles interrupts through a kind of callback mechanism. Every kind of external interrupt is given an interrupt number. The operating system sets up a table (the interrupt vector table in real mode and the interrupt descriptor table in protected mode) that stores pointers to the functions that handle the corresponding interrupts. For example, if the key-press interrupt were assigned to vector 21h, then upon receiving the interrupt from the interrupt controller the CPU saves the current flags, code segment and instruction pointer (and, on a privilege change in protected mode, the stack pointer), examines entry 21h in the interrupt table, and reads out the address where the interrupt handler is located. It then executes the handler and afterwards resumes normal execution.
However, this mechanism of calling a handler from the interrupt table can be triggered not only by real hardware interrupts but also by internal exceptions (such as divide by zero or an undefined opcode). The exceptions are assigned to interrupt numbers that are, hopefully, different from the ones used by hardware interrupts.
Finally any interrupt can also be triggered directly by the currently executed program using the "int n" instruction.
This last feature is often used for system calls. The reason is that the user program only needs to know the interrupt number, which is usually standardized (DOS mainly uses 21h, Linux mainly 80h), and the operating system can place the interrupt handler wherever it likes and store its address in the corresponding interrupt table entry.
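The common dispatch path for hardware interrupts, exceptions and `int n` can be modelled as a table of function pointers indexed by vector number. This is a hypothetical simulation that leaves out everything the CPU really does (saving state, privilege checks, descriptor formats); `vector_table`, `os_install_handlers` and `dispatch` are illustrative names. Vector 0 is the real x86 divide-error exception, 21h is the example vector from the text above, and 80h mirrors Linux's `int 0x80`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical interrupt table: 256 handler pointers indexed by vector. */
typedef void (*isr_t)(unsigned vector);
static isr_t vector_table[256];

/* Counters so the example can show which handler ran. */
static unsigned last_vector;
static unsigned syscalls_handled, irqs_handled, faults_handled;

static void syscall_handler(unsigned v)      { last_vector = v; syscalls_handled++; }
static void keyboard_handler(unsigned v)     { last_vector = v; irqs_handled++; }
static void divide_error_handler(unsigned v) { last_vector = v; faults_handled++; }

/* The OS chooses where its handlers live and records their addresses. */
void os_install_handlers(void)
{
    vector_table[0x00] = divide_error_handler; /* CPU exception: divide error */
    vector_table[0x21] = keyboard_handler;     /* example hardware IRQ vector */
    vector_table[0x80] = syscall_handler;      /* software "int 0x80" */
}

/* Same path for every source: look up the vector, call what is there. */
int dispatch(unsigned vector)
{
    if (vector > 255 || vector_table[vector] == NULL)
        return -1;                             /* no handler installed */
    vector_table[vector](vector);
    return 0;
}
```

The user program only needs the number 0x80; where `syscall_handler` actually lives is entirely the operating system's choice.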
Keep in mind that there are other ways to implement system calls. For example, in protected mode the x86 provides call gates, which are special descriptors that cause a system call when you use them as the target of a far call. Newer processors provide dedicated system call instructions (sysenter/syscall) that are faster than interrupts.