Allocate executable ram in c on linux - c

I want to make a simple just-in-time compiler with c on Linux.
How can I allocate memory such that I can write out raw x86 code to it and execute it as any other function?

See mprotect(). Once you have filled a (n-)page-sized memory region (allocated with mmap()) with code, change its permissions to disallow writes and allow execution.

In addition to using mprotect correctly to provide first write and then execute permission, on some OS/hardware operations you may need to flush the I-cache. At this moment (mid-2010), all recent x86 processors have separate level 1 caches for instructions and data, and somebody has to make sure that if you write new instructions into memory (which will update the D-cache), you don't then try to execute stale bits from the I-cache. Exactly how to flush the I-cache from userspace will depend on both your hardware and the OS. My advice would be to read Intel's documentation on "self-modifying code" for their IA-32 multiprocessors. This should be enough to get you through.

Related

What are the exact programming instructions that are in user space?

I know that a process switches between user mode and kernel mode for running. I am confused that for every line of code, we should possibly need the kernel. Below is the example, could I get explanation of the kernels role in execution of the following coding lines. Does the following actually require kernel mode.
if(a < 0)
a++
I am confused that for every line of code, we should possibly need the kernel.
Most code in user-space is executed without the kernel being involved. The kernel becomes involved (and the CPU switches from user-space to kernel) when:
a) The user-space code explicitly asks the kernel to do something (calls a system call).
b) There's an IRQ (from a device) that interrupts user-space code.
c) The kernel is providing some functionality that user-space code is unaware of. The most common reason is virtual memory management; but debugging and profiling are other reasons.
d) Asynchronous notifications (e.g. something causing a switch to kernel so that kernel can redirect the program to a suitable signal handler).
e) The user-space code does something illegal (crashes).
Does the following actually require kernel mode.
That code (if(a < 0) a++;) probably won't require kernel's assistance; but it possibly might. For example, if the variable a is in memory that was previously sent to swap space, then any attempt to access a is a request for the kernel to fetch that data from swap space. In a similar way, if the executable file was memory mapped but not loaded yet (a common optimization to improve program startup time), then attempting to execute any instruction (regardless of what the instruction is) could ask the kernel to fetch the code from the executable file on disk.
Short answer:
It depends on what you are trying to do, following code depending on which enviroment and how its compiled it shouldn't need to use the kernel. The CPU executes machine code directly, only trapping to the kernel on instructions like syscall, or on faults like page-fault or an interrupt.
The ISA is designed so that a kernel can set up the page tables in a way that stops user-space from taking over the machine, even though the CPU is fetching bytes of its machine code directly. This is how user-space code can run just as efficiently when it's just operating on its own data, doing pure computation not hardware access.
Long answer:
Comparing something and increasing value of something shouldn't require use of a kernel, On x86 (64 bit) architecture following could be represented like this (in NASM syntax):
; a is in RAX, perhaps a return value from some earlier function
cmp rax, 0 ; if (a<0) implemented as
jnl no_increase ; a jump over the inc if a is Not Less-than 0
inc rax
no_increase:
Actual compilers do it branchlessly, with various tricks as you can see on the Godbolt compiler explorer.
Clearly there aren't any syscalls so this piece of code can be ran on any x86 device but it wouldn't be meaningful
What requires kernels are the system calls now sys calls aren't required to have a device that can output something in theory you can output something by finding a memory location that let's say corresponds to video memory and you can manipulate pixels to output something in the screen but for userland this isn't possible due virtual memory.
A userspace application needs a kernel to exist if a kernel did not exist then userspace wouldn't exist :) and please note not every kernel let's a userspace.
So only doing something like:
write(open(stdout, _O_RDWR), "windows sucks linux rocks", 24);
would obviously require a kernel.
Writing / reading to arbitary memory location for example: 0xB8000 to manipulate video memory doesn't need a kernel.
TL:DR; For example code you provided it needs a kernel to be in userspace but can be written in a system where userspace and kernel doesn't exist at all and work perfectly fine (eg: microcontrollers)
In simpler words: It doesn't require a kernel to be work since it doesn't use any system calls, but for meaningful operation in a modern operating system it would atleast require a exit syscall to exit with a code otherwise you will see Segmentation fault even though there isn't dynamic allocation done by you.
Whatever the code we write is but obvious in the realm of user mode.. Kernel mode is only going to be in picture when you write any code that performs any system call..
and since the if() is not calling any system function it's not going to be in kernel mode.

copy_from_user and segmentation

I was reading a paragraph from the "The Linux Kernel Module Programming Guide" and I have a couple of doubts related to the following paragraph.
The reason for copy_from_user or get_user is that Linux memory (on
Intel architecture, it may be different under some other processors)
is segmented. This means that a pointer, by itself, does not reference
a unique location in memory, only a location in a memory segment, and
you need to know which memory segment it is to be able to use it.
There is one memory segment for the kernel, and one for each of the
processes.
However it is my understanding that Linux uses paging instead of segmentation and that virtual addresses at and above 0xc0000000 have the kernel mapping in.
Do we use copy_from_user in order to accommodate older kernels?
Do the current linux kernels use segmentation in any way at all? If so how?
If (1) is not true, are there any other advantages to using copy_from_user?
Yeah. I don't like that explanation either. The details are essentially correct in a technical sense (see also Why does Linux on x86 use different segments for user processes and the kernel?) but as you say, linux typically maps the memory so that kernel code could access it directly, so I don't think it's a good explanation for why copy_from_user, etc. actually exist.
IMO, the primary reason for using copy_from_user / copy_to_user (and friends) is simply that there are a number of things to be checked (dangers to be guarded against), and it makes sense to put all of those checks in one place. You wouldn't want every place that needs to copy data in and out from user-space to have to re-implement all those checks. Especially when the details may vary from one architecture to the next.
For example, it's possible that a user-space page is actually not present when you need to copy to or from that memory and hence it's important that the call be made from a context that can accommodate a page fault (and hence being put to sleep).
Also, user-space data pointers need to be checked carefully to ensure that they actually point to user-space and that they point to data regions, and that the copy length doesn't wrap beyond the end of the valid regions, and so forth.
Finally, it's possible that user-space actually doesn't share the same page mappings with the kernel. There used to be a linux patch for 32-bit x86 that made the complete 4G of virtual address space available to user-space processes. In that case, kernel code could not make the assumption that a user-space pointer was directly accessible, and those functions might need to map individual user-space pages one at a time in order to access them. (See 4GB/4GB Kernel VM Split)

Reading/writing in Linux kernel space

I want to add functions in the Linux kernel to write and read data. But I don't know how/where to store it so other programs can read/overwrite/delete it.
Program A calls uf_obj_add(param, param, param) it stores information in memory.
Program B does the same.
Program C calls uf_obj_get(param) the kernel checks if operation is allowed and if it is, it returns data.
Do I just need to malloc() memory or is it more difficult ?
And how uf_obj_get() can access memory where uf_obj_add() writes ?
Where to store memory location information so both functions can access the same data ?
As pointed out by commentators to your question, achieving this in userspace would probably be much safer. However, if you insist on achieving this by modifying kernel code, one way you can go is implementing a new device driver, which has functions such as read and write that you may implement according to your needs, in order to have your processes access some memory space. Your processes can then work, as you described, by reading from and writing onto the same space more or less as if they are reading from/writing to a regular file.
I would recommend reading quite a bit of materials before diving into kernel code, though. A good resource on device drivers is Linux Device Drivers. Even though a significant portion of its information may not be up-to-date, you may find here a version of the source code used in the book ported to linux 3.x. You may find what you are looking for under the directory scull.
Again, as pointed out by commentators to your question, I do not think you should jump right into updating the execution of the kernel space. However, for educational purposes scull may serve as a good starting point to read kernel code and see how to achieve results similar to what you described.

How does a system call translate to CPU instructions?

Let's say there is a simple program like:
#include<stdio.h>
void main()
{
int x;
printf("Cool");
fd = open("/tmp/cool.txt", O_READONLY)
}
The open is a system call here. I suppose when the shell runs it, it makes some hundred other system calls to implement it? How about a declaration like int x - at some point should it have some additional system calls in the backdrop to get the memory from the computer?
I am not sure what is the boundary between a system call and a normal stuff ... everything, in the end, needs the operating system's help right?!
Or is it like the C generates an executable (code) which can be run on the processor and need no OS assistance is needed until a system call is reached - at which point it has to do something to load the OS instructions etc ...
A bit vague :) Please clarify.
I'm not answering the questions in order, so I'm prefixing my answers with the questions. I've taken the liberty of editing them a bit. You didn't specify the processor architecture, but I'm assuming you want to know about x86, so the processor-level details will pertain to x86. Other architectures can behave differently (memory management, how system calls are made, etc.). I'm also using Linux for examples.
Does the c compiler generate executable code that can be run straight on the processor without need for OS assistance until a system call is reached, at which point it has to do something to load the OS instructions?
Yes, that is correct. The compiler generates native machine code that can be run straight on the processor. The executable files that you get from the compiler, however, contain both the code and other needed data, for example, instructions on where to load the code in the memory. On Linux the ELF format is typically used for executables.
If the process is completely loaded into memory and has sufficient stack space, it will not need further OS assistance before it wants to make a system call. When you make a system call, it is just an instruction in the machine code that calls the OS. The program itself does not need to "load the OS instructions" in any way. The processor handles transferring execution to the OS code.
With Linux on the x86 architecture, one way for the machine code to make a system call is to use the software interrupt vector 128 to transfer execution to the operating system. In x86 assembly (Intel syntax), that is expressed as int 0x80. Linux will then perform tasks based on the values that the calling program placed into processor registers before making the system call: the system call number is found in the eax processor register and the system call parameters are found in other processor registers. After the OS is done, it will return a result in the eax register, and has possibly modified buffers pointed to by the system call parameters etc. Note however, that this is not the only way to make a system call.
However, if the process is not entirely in memory, and execution moves to a part of the code that is not in memory at the moment, the processor causes a page fault, which moves execution to the operating system, which then loads the required part of the process into memory and transfers execution back to the process, which can then continue execution normally, without even noticing that anything happened.
I'm not entirely sure on the next point, so take it with a grain of salt. The Wikipedia article on stack overflow (the computer error, not this site :) seems to indicate that stacks are usually of fixed size, so int x; should not cause the OS to run, unless that part of the stack is not in the memory (see previous paragraph). If you had a system with dynamic stack size (if it is even possible, but as far as I can see, it is), int x; could also cause a page fault when the stack space is used up, prompting the operating system to allocate more stack space for the process.
Page faults cause the execution to move to the operating system, but are not system calls in the usual sense of the word. System calls are explicit calls to the OS when you want it to perform some work for you. Page faults and other such events are implicit. Hardware interrupts continuously transfer the execution from your process to the OS so that it can react to them. After that it transfers the execution back to your process, or some other process.
On a multitasking OS, you can run many programs at once even if you have only one processor/core. This is accomplished by running only one program at a time, but switching between programs quickly. The hardware timer interrupt makes sure that control is transferred back to the OS in a timely fashion, so that one process can't hog the CPU all for itself. When control is passed to the OS and it has done what it needs to, it may always start a different process from the one that was interrupted. The OS handles all this totally transparently, so you don't have to think about it, and your process won't notice it. From the viewpoint of your process, it is executing continuously.
In short: Your program executes system calls only when you explicitly ask it to. The operating system may also swap parts of your process in and out of the memory when it wants to, and generally does things related and unrelated to your process in the background, but you don't normally need to think about that at all. (You can reduce the amount of page faults, though, by keeping your program as small as possible, and things like that)
In this case open() is an explicit system call, but I suppose when the shell runs it, it makes some hundred other system calls to implement it.
No, the shell has got nothing to do with an open() call in your c program. Your program makes that one system call, and shell doesn't come into the picture at all.
The shell will only affect your program when it starts it. When you start your program with the shell, the shell does a fork system call to fork off a second process, which then does an execve system call to replace itself with your program. After that, your program is in control. Before the control gets to your main() function though, it executes some initialization code, that was put there by the compiler. If you want to see what system calls a process makes, on Linux you can use strace to view them. Just say strace ls, for example, to see what system calls ls makes during its execution. If you compile a c program with just a main() function that returns immediately, you can see with strace what system calls the initialization code makes.
How does the process get its memory from the computer etc.? It has to involve some system calls again right? I am not sure what is the boundary between a system call and normal stuff. Everything in the end needs the OS help, right?
Yep, system calls. When your program is loaded into memory with the execve system call, it takes care of getting enough memory for your process. When you need more memory and call malloc(), it will make a brk system call to grow the data segment of your process if it has run out of internally cached memory to give you.
Not everything needs explicit help from the OS. If you have enough memory, have all your input in memory, and you write your output data to memory, you won't need the OS at all. That is, as long as you only do calculations on data you already have in memory, don't need more memory, and don't need to communicate with the outside world, you don't need the OS. On the other hand, a program that does not communicate with the outside world at all is a pretty useless one, because it can't get any input, and cannot give any output. Even if you calculate the millionth decimal of pi, it doesn't matter at all if you don't output it to the user.
This answer got quite big, so in case I missed something or didn't explain something clearly enough, please leave me a comment and I'll try to elaborate. If anyone spots any mistakes, be sure to point them out also.

Is there a way to disable the mmu for a specific block memory?

If we can access some block memory without mmu, while accessing other memory with mmu, a good performance gain can be achieved. I have read the intelx86_64 manual, and only to find that mission seems impossible...Or perhaps can we disble the mmu to work when accessing the specifice memory?
Can someone tell the answer to me ? Thanks!
Short answer: no, you cannot.
Long answer: you can write a kernel module that switches the CPU to 32-bit mode (if in 64-bit mode) and disables paging, while remaining in protected mode. During that time, you would be able to run only pure computations, i.e., no input/output (including networking) would be possible. (Presuming that you want to be able to restore the OS kernel and other running applications to their original state, which is essentially a must if you want to be able to save the results of your computations.)

Resources