Why is executing callback functions in kernel mode bad? - c

Why is calling callback functions in kernel-space from user-space considered 'bad' vs calling callback functions in user-space from user-space?

Allowing the user to execute code in kernel mode would be an enormous security risk. That is to say, if a user space program is executing in kernel mode, there is no security: the game is completely lost and the user has full access to everyone and everything.
Consider that if you're executing in kernel mode, virtual memory lookups are no longer protected by privilege level. In x86, when in kernel mode, you have a privilege level of 0; meaning you can access anything in physical memory. So, if a process' callback were executing in kernel space, it would be able to do anything it wanted to literally anything on the machine.
Want to erase everybody's page tables? K. Want to instead see what's in those page tables? You got it. Want to zero out kernel memory and cause the entire system to crash? Lolz good idea. Want to hack another process on the machine so that it logs its I/O traffic? Seems legit.
Don't let the user run code in kernel space.

There are systems that support such callbacks. E.g.:
http://h41379.www4.hpe.com/doc/732final/4527/4527pro_001.html#index_x_190
In such a case, your process must have the appropriate privilege to execute in kernel mode.
Allowing code to execute in kernel mode opens the system up to crashes and security holes.

Related

Why do system calls exist

I have been reading about system calls and how they work in Linux. I still have more reading to do but one thing that nothing I have read has answered is, WHY do we need system calls?
I understand that system calls are requests from user space program for the kernel to do something, but my question is basically: Why can't the user space program do the thing itself? Why doesn't Glibc do the actual operation instead of just being a wrapper for a system call?
For example, if I call fopen() in my program, why does glibc call the open system call? Why doesn't glibc just do the operation itself?
I understand that it would mean that glibc developers would have a lot more work and that they would have to have an intimate knowledge of Linux, but isn't glibc already very closely related to Linux kernel?
Also, I understand the system call functions are run in ring 0 in the CPU...but what's really the point of that? If I execute a program, I am giving it express permission to run, so what security is added by separating what code can be run in different contexts since you are giving it all permission anyway?
Why doesn't glibc just do the operation itself?
Well that is more less the ways things went in good old MS/DOS systems: no separation between kernel code and user code, and user code could happily directly access the hardware.
This just has 2 major problems:
It works (rather) fine on a single user and not multi tasking system, but as soon as multiple programs can simultaneously run in a system, you have to synchronize hardware accesses and memory usage => those are the parts dedicated to the kernel
There is no protection of the system from a poorly coded program. In a modern OS, an erroneous program can crash, but the system itself should survive. In MS/DOS a program crash usually ended in a system reboot.
For those reasons, all modern OS (except maybe some lightweight embedded ones) use isolation between different user processes and the kernel. And that just mean that you need a way to allow a user mode process to require a privileged action (reading or writing a physical disk is) from the kernel: that is exactly what system calls are made for.
Why doesn't glibc just do the operation itself?
Short answer: Because it can't.
Long answer:
A program running in Linux can run in two modes : UserLand or KernelLand.
The Kernel Land has every rights and can do everything, including talking with hardware, or providing userspace callbacks. For instance, when you call fopen(), the kernel does all the dirty talking with your filesystem (ext4 for instance), the caching, everything down to talking with the SATA Controller to access data on the hard-drive.
GLibc could do that using the device exposed by the kernel in /dev, but that would mean recoding from scratch all the filesystems layers, the sockets, the firewalling...
The kernel just provides easy usable API for programmers to have elevated rights and communicate with the devices. That's how Linux (and most modern OS) is made.
What security is added by separating what code can be run in different contexts since you are giving it all permission anyway?
The permissions are managed by the kernel. If you don't have syscall, you don't have permissions. Or should the program you run check their own permission? Once again, it would be reinventing the wheel every time.
If the code generated by a C implementation were the only thing that were going to be running on the target system (as it would be for many freestanding implementations, and for a very small number of hosted implementations) and if implementation knew precisely what hardware it would be running upon (true of some freestanding implementations, but seldom true for hosted ones), its runtime library might be able to perform operations like "fopen" by directly communicating with the storage hardware. It is rare, however, for either condition to apply, much less both of them.
If multiple programs will be using storage device, it will generally be necessary that they either coordinate their actions somehow or else that sequences of operations performed by different programs do not overlap, and that every program "forget" anything it thinks it knows about the state of storage any time another program might have written to it.
Otherwise, suppose a disk contains a single file and program #1 uses "fopen" to open it for reading. Each directory sector holds 8 entries, so the program would read the first directory sector and observe that slot #0 identifies the file of interest while #1-#7 are blank.
Now suppose program #2 uses "fopen" to create a file for writing. It would read the directory sector, observe that slots #1-#7 are blank, and rewrite the directory sector with information about the new file in slot #1.
Finally, suppose program #1 wants to write a file. If it doesn't know about program #2, it might reasonably believe it knows what the directory contains (it had read it earlier, and has no reason to believe it's changed), place information about the new file in slot #1, and replace the directory sector on disk with its new version, obliterating the entry written by program #2.
Having both programs route their operations through an operating system ensures that when program #2 wants to create its file, it can exploit the fact that it had just read the directory for program #1 (and thus doesn't need to reread it). More importantly, when program #1 goes to write a file, the operating system will know that the directory contains the file written by program #2, and will thus ensure that the new file gets placed in slot #2.
Contrary to what other answers say, even microcomputer C implementations running on platforms like MS-DOS essentially always relied upon the OS for file I/O. Some would include their own console I/O routines because the ones in MS-DOS were about four times as slow as they should have been, but the need for coordination when using file I/O meant that very few programs would try to do it themselves.
1- You don't wanna deal with low-level hardware communications. At least most people don't. Each of them has hundreds of commands.
2- Make a simple mistake and your CPU/RAM or I/O device might be useless forever.
3- When you are part of a network, you can share resources. The system calls and kernel keeps your co-worker from damaging your hard disk.
Another consideration is that the OS kernel needs to provide an abstraction for the myriad different types of hardware via a uniform API - without which you'd invariably be making device specific calls in your program.
While the previously-idle disk spins up for two seconds, or the networked disk gets connected for thirty seconds, what is the library going to do?
The full answer to your question is very broad but let me take a simple example based upon your question about fopen.
Let us say that we have a large system that has hundred or thousands of users. One of those users is, say the HR department with files containing confidential information about employees.
If that disk could be accessed at will in user mode, then any person on the system could open any file on the system, including those with confidential information.
In other words operating systems managed SHARED resources. These include disk, CPU, and memory. If these could be controlled in user mode, there would be no way to ensure that these were shared equitably.

Are processes "sandboxed" by hardware?

Can a process access all of the RAM or does the CPU give the process a specific part which the kernel decides, and the process (running in user space) can't change? In other words - is a process sandboxed by hardware, or can it do anything, but is monitored by the OS?
EDIT
I'm told in the comments that this is too broad, so let's assume x86/x64. I'll also add that the question arose while reading what I understood to say that processes can access all RAM - which seems to conflict with what I've read about security in OSs.
If you count MS-DOS as an "operating system", then processes can do anything (and aren't monitored). Even Windows95 doesn't have real memory protection, and a buggy process can crash the machine by scribbling over the wrong memory.
If you only count modern OSes with privilege separation (Unix/Linux, Windows NT and derivates), then processes are sandboxed.
AFAIK, there aren't really systems where there's monitoring of any kind other than "fault if you try to do something". The kernel sets the boundaries, and the user-space process gets a fault if it tries to go outside them.
If you're imagining that maybe the kernel looks at what an unprivileged process does, and adapts accordingly, then no, that's not what happens.
See
https://en.wikipedia.org/wiki/Memory_protection: Usually achieved by giving each process its own virtual address space (virtual memory). This is hardware-supported: every address your code uses is translated to a physical address by a fast translation cache (TLB), which caches the translation tables set up by the OS (aka page tables).
A process can't directly modify its own page tables: it has to ask the kernel to map more physical memory into its address space (e.g. as part of malloc()). So the kernel has a chance to verify that the request is ok before doing it.
Also, a process can ask the kernel to copy data to/from files (or other things) into its memory space. (write/read system calls).
https://en.wikipedia.org/wiki/User_space: normal processes run in user-mode, which is a mode provided by the hardware where privileged instructions will trap to the kernel.

Force a C program to run in kernel mode for a while

I wanted to test out something when the process is in kernel mode. A period of 10-15 seconds completely in kernel mode should be enough.
Is there any way to force a program written in C to run in kernel mode for a while? Would a single read system call on a huge buffer do?
There is no way to force running program to run in kernel mode.
In kernel mode there are more privileges available. It means that
program could execute special instructions that it cannot execute in
user mode.
Obviously, there are no way, AFAIK, to force program in user mode to
switch to kernel mode, because it compromises security: imagine,
that every process on will could switch to user mode, malicious
code can cause disaster by altering page tables, changing CPU settings, turning off interrupts and so on.
The only valid way of "switching" to kernel mode is executing system call.
That will cause trap to the kernel and after some kernel code is executed
control will return back to user code. However, I think that is not what you
asked about.

Printing to stdout without the involvement of functions

How can one print some arbitrary text on-screen without the call to any function ?
Another way of asking this question is how are i/o functions implemented ?
I tried searching in google, but unluckily, no result was found similarly to as if it was some sort of a top secret thing.
For study purposes.
I personally don't feel even close to a real programmer without knowing this.
Well, in the end, loosely speaking, the low level software in the computer sets a special memory location or uses a special instruction that changes the voltage on some pins on the CPU, and the hardware responds to these changes.
But user-level processes don't have access to those instructions or memory locations. Messing with stuff that drives the hardware is the responsibility of "device drivers" that execute in the kernel. They use these special memory locations or instructions, and each one is given responsibility over a particular hardware device.
User-level processes communicate with device drivers via system calls as mentioned in the comments. A system call is not quite like a normal function call -- you don't just call the code. After setting up a "request" for what it wants to do, the user-level process pokes the kernel, usually by using a software interrupt instruction. The kernel wakes up, looks at what you request, and then decides itself what code to execute. The kernel code runs at a higher privilege level and will call directly into the device drivers that access the hardware.
This is how the kernel keeps processes safe from each other.
To actually get from stdout to the screen is a lengthy process:
the standard library ends up making a system call that writes to a "pipe" that is attached to stdout. This is where it leaves your process.
The other end of the pipe is being read by the console. The console is a user-level process, so it has to do a system call to do the reading.
The console decides what to display, and how to make it visible to you. There will be a bunch more layers, but eventually there will be system calls into drivers that control the graphics hardware. They will mess with the bits that turn into pixels on the screen, making your text visible.

Why does "read" have to be a system call run in "Kernel Mode"?

As I understood, the UNIX function read() will cause an interrupt(TRAP) and invoke the system call read. I also remembered that it has to switch to "Kernel Mode" before invoking the system call read and the switching is expensive..
I was wondering that why the read operation has to be delegated to system call in "Kernel Mode", instead of being done in "User Mode" completely.
For example, if there could be a service in "User Mode" which manages the access permissions of files, the read operation can just request this service, not disturbing the Kernel..
And for the disk driver, it is said in this link that
Device drivers can run in either user or kernel mode
Does anyone have ideas about this? Why does read have to be in Kernel Mode?
Is not the way Operating Systems are designed. The definition of OS is to handle the computers' hardware and to bring resources to their users. Operating Sysmtes also have the concept of user mode and kernel mode (as you said).
By having these concepts, OS define an specific line to what a user might do and what not. Letting them manage hardware is definitely something OS don't want users to do.
read usually involves a hardware access. Accessing hardware is cumbersome and error prone and can leave the computer in an unusable state. Operating System uses drivers to control the computer's hardware.
Issuing a read (assuming a hard disk IO) generally makes a driver to send a set of commands to the disk controller, read it's output, pass it to main memory, etc. This are dangerous operations that shouldn't be trusted to User Mode.
If there would be a service in user mode to handle this. Context switch still would be needed to be done, because the service would be running as another process.
Sure thing it can be done an Operating System that allows this. But modern operating systems aren't design to fulfill this behavior.
There are other approaches to building operating systems that relies on microkernels. A microkernel just do the minimum to get the pc started and leave everything else to other modules. Meaning that if a module crashes, the system still up. That's the case of specific drivers, filesystems, etc. I don't know if microkernels let these run in user space though.
Hope this helps!
First of all: it's no longer true that calling kernel is very expensive. It used to be when causing an exception/trap/fault/interrupt were the only way to switch from user mode to kernel mode in x86 systems, but that all changed with the addition of the systenter/sysexit machine code instructions, which perform a more lightweight transition.
Even if it is/were expensive, in terms of time consumed, system calls that deal with character and block device drivers should run in kernel mode because dealing with hardware devices involves reading and writting to hardware registers, which could be memory mapped or accessed thru I/O ports.
These registers must be protected from any access from userspace process. Not doing so may lead to any process to not to use the established API for reading a file, and directly use the hardware registers to read and write to the device. In the case of a disk with file, this would allow the userprocess to bypass the filesystem entirely, and hence, all the security and permission system.
So, if we need to protect these hardware registers so no user process can use them, code that does use them cannot run at the same priviledge level as any other user process. Hence, they run in another (more priviledged) mode, which is what is called "kernel mode".
Think on what would happen if you configure a Linux system so /dev/sda (usually the main harddisk in which the root filesystem lives) is read/write to anybody and everybody:
# chmod 666 /dev/sda
Having done this is more or less the equivalent of exposing the hard disk device to any user process. You can effectively write a program that could open, read, and write files stored within this device, but at the same time, you can write a program that open, read and write ANY files within the partition, no matter which permissions files have.
That said, there are cases in which a system runs only trusted applications. This kind of system doesn't need the level of protection that is present in a general purpose system, and hence it can benefit from the increased speed that comes by not depending on layers of APIs to isolate the process from the hardware. The most widely known example would be a videoconsole system. I recall that Windows CE used to run all its programs and device drivers at the same privilege too.

Resources