Why does "read" have to be a system call run in "Kernel Mode"? - c

As I understood, the UNIX function read() will cause an interrupt(TRAP) and invoke the system call read. I also remembered that it has to switch to "Kernel Mode" before invoking the system call read and the switching is expensive..
I was wondering that why the read operation has to be delegated to system call in "Kernel Mode", instead of being done in "User Mode" completely.
For example, if there could be a service in "User Mode" which manages the access permissions of files, the read operation can just request this service, not disturbing the Kernel..
And for the disk driver, it is said in this link that
Device drivers can run in either user or kernel mode
Does anyone have ideas about this? Why does read have to be in Kernel Mode?

Is not the way Operating Systems are designed. The definition of OS is to handle the computers' hardware and to bring resources to their users. Operating Sysmtes also have the concept of user mode and kernel mode (as you said).
By having these concepts, OS define an specific line to what a user might do and what not. Letting them manage hardware is definitely something OS don't want users to do.
read usually involves a hardware access. Accessing hardware is cumbersome and error prone and can leave the computer in an unusable state. Operating System uses drivers to control the computer's hardware.
Issuing a read (assuming a hard disk IO) generally makes a driver to send a set of commands to the disk controller, read it's output, pass it to main memory, etc. This are dangerous operations that shouldn't be trusted to User Mode.
If there would be a service in user mode to handle this. Context switch still would be needed to be done, because the service would be running as another process.
Sure thing it can be done an Operating System that allows this. But modern operating systems aren't design to fulfill this behavior.
There are other approaches to building operating systems that relies on microkernels. A microkernel just do the minimum to get the pc started and leave everything else to other modules. Meaning that if a module crashes, the system still up. That's the case of specific drivers, filesystems, etc. I don't know if microkernels let these run in user space though.
Hope this helps!

First of all: it's no longer true that calling kernel is very expensive. It used to be when causing an exception/trap/fault/interrupt were the only way to switch from user mode to kernel mode in x86 systems, but that all changed with the addition of the systenter/sysexit machine code instructions, which perform a more lightweight transition.
Even if it is/were expensive, in terms of time consumed, system calls that deal with character and block device drivers should run in kernel mode because dealing with hardware devices involves reading and writting to hardware registers, which could be memory mapped or accessed thru I/O ports.
These registers must be protected from any access from userspace process. Not doing so may lead to any process to not to use the established API for reading a file, and directly use the hardware registers to read and write to the device. In the case of a disk with file, this would allow the userprocess to bypass the filesystem entirely, and hence, all the security and permission system.
So, if we need to protect these hardware registers so no user process can use them, code that does use them cannot run at the same priviledge level as any other user process. Hence, they run in another (more priviledged) mode, which is what is called "kernel mode".
Think on what would happen if you configure a Linux system so /dev/sda (usually the main harddisk in which the root filesystem lives) is read/write to anybody and everybody:
# chmod 666 /dev/sda
Having done this is more or less the equivalent of exposing the hard disk device to any user process. You can effectively write a program that could open, read, and write files stored within this device, but at the same time, you can write a program that open, read and write ANY files within the partition, no matter which permissions files have.
That said, there are cases in which a system runs only trusted applications. This kind of system doesn't need the level of protection that is present in a general purpose system, and hence it can benefit from the increased speed that comes by not depending on layers of APIs to isolate the process from the hardware. The most widely known example would be a videoconsole system. I recall that Windows CE used to run all its programs and device drivers at the same privilege too.

Related

What code is user mode code and what code is kernel mode code?

I am having a question for myself while taking operating system course.
If I type in any C code to my text editor or though IDE and execute it with the compiler
it translates the code into machine code.
Then I would guess if I run the program, the OS will allocate a memory address to the code which is done by kernel code.
And if there was IO interrupt typed into my code, the kernel code executes.
And so...which bit is the user mode code then?
In the ordinary course of events, any code you write is 'user mode code'. Kernel mode code is executed only when you execute a system call and the control jumps from your user code to the operating system.
Obviously, if you're writing kernel code, or loadable kernel modules, then things are different — that code will be kernel mode code. But most people most of the time are only writing user mode code.
Kernel mode versus user mode actually reflects how the processor is running.
With modern operating systems, code only runs (with the processor) in kernel mode if it is trusted by the operating system, and all other code runs in user mode.
The functional difference, under modern operating systems, is that kernel mode code runs in a single (virtual) address space that represents all system resources, so all functions in kernel mode can affect each other directly. For example, all actions by a kernel mode driver can directly affect functioning of the operating system itself and of any other kernel mode driver. (The specifics implementation details vary somewhat between operating system types, for example between windows, linux, BSD, etc but the basic principles are the same)
Which means that, if you are writing code that will execute within the internal workings of the operating system or within a kernel mode driver, then it might be said to be kernel mode code. Otherwise, it will be user-mode code. Code that attempts some action that can only be executed in kernel mode will be prevented from doing so by the processor itself, unless the processor itself is in kernel mode. The operating system itself mediates when the processor enters kernel mode, which is why code need to be recognised by the operating system (or installed, in the case of kernel mode drivers) in order to do things that can only be done in kernel mode. User mode code can't arbitrarily escalate the processor to kernel mode, without the help of some code that is already recognised by the operating system.
Practically, modern operating systems also supply a set of functions (e.g. in an API) that can be called from user mode. A lot of those functions are, themselves, executed solely in user mode. Some, however, result in the the processor being switched into kernel mode to perform some specific actions, and then the processor is switched back to user mode by the time control returns to the caller. Which code, within the OS itself, executes in user mode or kernel mode depends both on the design of the operating system and administrative settings (e.g. only suitably privileged users (aka administrators) can, for example, install kernel mode drivers).

Why do system calls exist

I have been reading about system calls and how they work in Linux. I still have more reading to do but one thing that nothing I have read has answered is, WHY do we need system calls?
I understand that system calls are requests from user space program for the kernel to do something, but my question is basically: Why can't the user space program do the thing itself? Why doesn't Glibc do the actual operation instead of just being a wrapper for a system call?
For example, if I call fopen() in my program, why does glibc call the open system call? Why doesn't glibc just do the operation itself?
I understand that it would mean that glibc developers would have a lot more work and that they would have to have an intimate knowledge of Linux, but isn't glibc already very closely related to Linux kernel?
Also, I understand the system call functions are run in ring 0 in the CPU...but what's really the point of that? If I execute a program, I am giving it express permission to run, so what security is added by separating what code can be run in different contexts since you are giving it all permission anyway?
Why doesn't glibc just do the operation itself?
Well that is more less the ways things went in good old MS/DOS systems: no separation between kernel code and user code, and user code could happily directly access the hardware.
This just has 2 major problems:
It works (rather) fine on a single user and not multi tasking system, but as soon as multiple programs can simultaneously run in a system, you have to synchronize hardware accesses and memory usage => those are the parts dedicated to the kernel
There is no protection of the system from a poorly coded program. In a modern OS, an erroneous program can crash, but the system itself should survive. In MS/DOS a program crash usually ended in a system reboot.
For those reasons, all modern OS (except maybe some lightweight embedded ones) use isolation between different user processes and the kernel. And that just mean that you need a way to allow a user mode process to require a privileged action (reading or writing a physical disk is) from the kernel: that is exactly what system calls are made for.
Why doesn't glibc just do the operation itself?
Short answer: Because it can't.
Long answer:
A program running in Linux can run in two modes : UserLand or KernelLand.
The Kernel Land has every rights and can do everything, including talking with hardware, or providing userspace callbacks. For instance, when you call fopen(), the kernel does all the dirty talking with your filesystem (ext4 for instance), the caching, everything down to talking with the SATA Controller to access data on the hard-drive.
GLibc could do that using the device exposed by the kernel in /dev, but that would mean recoding from scratch all the filesystems layers, the sockets, the firewalling...
The kernel just provides easy usable API for programmers to have elevated rights and communicate with the devices. That's how Linux (and most modern OS) is made.
What security is added by separating what code can be run in different contexts since you are giving it all permission anyway?
The permissions are managed by the kernel. If you don't have syscall, you don't have permissions. Or should the program you run check their own permission? Once again, it would be reinventing the wheel every time.
If the code generated by a C implementation were the only thing that were going to be running on the target system (as it would be for many freestanding implementations, and for a very small number of hosted implementations) and if implementation knew precisely what hardware it would be running upon (true of some freestanding implementations, but seldom true for hosted ones), its runtime library might be able to perform operations like "fopen" by directly communicating with the storage hardware. It is rare, however, for either condition to apply, much less both of them.
If multiple programs will be using storage device, it will generally be necessary that they either coordinate their actions somehow or else that sequences of operations performed by different programs do not overlap, and that every program "forget" anything it thinks it knows about the state of storage any time another program might have written to it.
Otherwise, suppose a disk contains a single file and program #1 uses "fopen" to open it for reading. Each directory sector holds 8 entries, so the program would read the first directory sector and observe that slot #0 identifies the file of interest while #1-#7 are blank.
Now suppose program #2 uses "fopen" to create a file for writing. It would read the directory sector, observe that slots #1-#7 are blank, and rewrite the directory sector with information about the new file in slot #1.
Finally, suppose program #1 wants to write a file. If it doesn't know about program #2, it might reasonably believe it knows what the directory contains (it had read it earlier, and has no reason to believe it's changed), place information about the new file in slot #1, and replace the directory sector on disk with its new version, obliterating the entry written by program #2.
Having both programs route their operations through an operating system ensures that when program #2 wants to create its file, it can exploit the fact that it had just read the directory for program #1 (and thus doesn't need to reread it). More importantly, when program #1 goes to write a file, the operating system will know that the directory contains the file written by program #2, and will thus ensure that the new file gets placed in slot #2.
Contrary to what other answers say, even microcomputer C implementations running on platforms like MS-DOS essentially always relied upon the OS for file I/O. Some would include their own console I/O routines because the ones in MS-DOS were about four times as slow as they should have been, but the need for coordination when using file I/O meant that very few programs would try to do it themselves.
1- You don't wanna deal with low-level hardware communications. At least most people don't. Each of them has hundreds of commands.
2- Make a simple mistake and your CPU/RAM or I/O device might be useless forever.
3- When you are part of a network, you can share resources. The system calls and kernel keeps your co-worker from damaging your hard disk.
Another consideration is that the OS kernel needs to provide an abstraction for the myriad different types of hardware via a uniform API - without which you'd invariably be making device specific calls in your program.
While the previously-idle disk spins up for two seconds, or the networked disk gets connected for thirty seconds, what is the library going to do?
The full answer to your question is very broad but let me take a simple example based upon your question about fopen.
Let us say that we have a large system that has hundred or thousands of users. One of those users is, say the HR department with files containing confidential information about employees.
If that disk could be accessed at will in user mode, then any person on the system could open any file on the system, including those with confidential information.
In other words operating systems managed SHARED resources. These include disk, CPU, and memory. If these could be controlled in user mode, there would be no way to ensure that these were shared equitably.

passing variables to a process from linux kernel

I want to make a program that will gather information about the keystrokes of a user (keycode, press and release times) and will use them as a biometric for authenticating the user continuously. My approach is to gather the keystrokes using a kernel module (because you can't just kill a kernel module), than the kernel module will send the information to another process that will analyze the data gathered by the kernel module, it will save it to a database and will return an answer to the kernel (the user is authenticated or not) and the kernel will lock the computer if the user is not authenticated. the whole module will not be distributed.
my questions are:
1. How can I call a process from the kernel and also send him the data?
2. how can I return a message to the kernel from the process?
#basile-starynkevitch 's answer and his arguments notwithstanding there is an approach you can take that is perfectly correct and technically allowed by the linux kernel.
Register a keyboard notifier call back function using the kernel call register_keyboard_notifier() in your kernel module. As a matter of fact it's designed for exactly this!
Your notifier call back function will look something like:
int keysniffer_callback(struct notifier_block *notifier_block,
unsigned long scancode,
void *param)
{
// do something with the scancode
return NOTIFY_OK;
}
See https://www.kernel.org/doc/Documentation/input/notifier.txt for starters.
I want to make a program that will gather information about the keystrokes of a user
That should go in practice into your display server, which you did not mention (Xorg, Wayland, MIR, ...?). Details matter a big lot!
My approach is to gather the keystrokes using a kernel module
I strongly believe this is a wrong approach, you don't need any kernel module.
I want to make a program that gathers data about the user keystrokes
Then use ordinary Unix machinery. The keyboard is some character device (and you could have several keyboards, or none, or some virtual one...) and you could read(2) from it. If you want to code a keylogger, please tell that explicitly.
(be aware that a keylogger or any other cyberspying activity can be illegal when used without consent and without permission; in most countries, that could send you to jail: in France, Article 323-1 du Code Pénal punishes that by at least 2 years of jail; and most other countries have similar laws)
the kernel module will send the information to another process [....] it will save it to a database
This is practically very difficult to get (and you look confused). Databases are in user-land (e.g. some RDBMS like PostGresSQL, or some library accessing files like sqlite). Notice that a kernel driver cannot (easily and reliably) even access to files.
All application programs (and most daemons & servers) on Linux are started with execve(2) (e.g. by some unix shell process, or by some daemon, etc...), and I see no reason for you to make an exception. However, some programs (mostly init, but also a few others, e.g. /sbin/hotplug) are started by the kernel, but this is exceptional (and should be avoided, and you don't need that).
How can I call a process from the kernel
You should not do that. I see no reason for your program to avoid being started by execve from some other process (perhaps your init, e.g. systemd).
and also send him the data?
Your process, as all other processes, is interacting with the kernel thru system calls (listed in syscalls(2)). So your application program could use read(2), write(2), poll(2) etc.. Be aware of netlink(7).
how can I return a message to the kernel from the process?
You don't. Use system calls, initiated by application code.
the kernel will lock the computer if the user is not authenticated.
This does not have any sense. Screen locking is a GUI artefact (so is not done by kernel code, but by ad-hoc daemon processes). Of course some processes do continue to run when locking is enabled. And many processes are daemons or servers which don't belong to "the" user (and continue to run when "the computer is locked"). At heart, Linux & POSIX is a multi-user and multi-tasking operating system. Even on a desktop Linux system used by a single physical person, you have dozens of users (i.e. uid-s many of them specialized to a particular feature, look into your /etc/passwd file, see passwd(5)) and more than a hundred processes (each having its pid), use top(1) or ps(1) as ps auxw to list them.
I believe you have the wrong approach. Take first several days or weeks to understand more about Linux from the application point of view. So read some book about Linux programming, e.g. ALP or something newer. Read also something like: Operating Systems: Three Easy Pieces
Be aware that in practice, most Linux systems having a desktop environment are using some display server. So the (physical) keyboard is handled by the X11 or Wayland server. You need to read more about your display server (with X11, things like EWMH).
Hence, you need to be much more specific. You are likely to need to interact with the display server, not the kernel directly.
At last, a rule of thumb is to avoid bloating your kernel with extra and useless driver code. You very probably can do your thing in userland entirely.
So, spend a week or more reading about OSes & Linux before coding a single line of code. Avoid kernel modules, they will bite you, and you probably don't need them (but you might need to hack your display server or simply your window manager; of course details are different with X11 and with Wayland). Read also about multiseat Linux systems.
At last, most Linux distributions are made of free software, whose source code you can study. So take time to look into the source code of relevant software for your (ill-defined) goals. Use also strace(1) to understand the system calls dynamically done by commands and processes.

Why is executing callback functions in kernel mode bad?

Why is calling callback functions in kernel-space from user-space considered 'bad' vs calling callback functions in user-space from user-space?
Allowing the user to execute code in kernel mode would be an enormous security risk. That is to say, if a user space program is executing in kernel mode, there is no security: the game is completely lost and the user has full access to everyone and everything.
Consider that if you're executing in kernel mode, virtual memory lookups are no longer protected by privilege level. In x86, when in kernel mode, you have a privilege level of 0; meaning you can access anything in physical memory. So, if a process' callback were executing in kernel space, it would be able to do anything it wanted to literally anything on the machine.
Want to erase everybody's page tables? K. Want to instead see what's in those page tables? You got it. Want to zero out kernel memory and cause the entire system to crash? Lolz good idea. Want to hack another process on the machine so that it logs its I/O traffic? Seems legit.
Don't let the user run code in kernel space.
There are systems that support such callbacks. E.g.:
http://h41379.www4.hpe.com/doc/732final/4527/4527pro_001.html#index_x_190
In such a case, your process must have the appropriate privilege to execute in kernel mode.
Allowing code to execute in kernel mode opens the system up to crashes and security holes.

Printing to stdout without the involvement of functions

How can one print some arbitrary text on-screen without the call to any function ?
Another way of asking this question is how are i/o functions implemented ?
I tried searching in google, but unluckily, no result was found similarly to as if it was some sort of a top secret thing.
For study purposes.
I personally don't feel even close to a real programmer without knowing this.
Well, in the end, loosely speaking, the low level software in the computer sets a special memory location or uses a special instruction that changes the voltage on some pins on the CPU, and the hardware responds to these changes.
But user-level processes don't have access to those instructions or memory locations. Messing with stuff that drives the hardware is the responsibility of "device drivers" that execute in the kernel. They use these special memory locations or instructions, and each one is given responsibility over a particular hardware device.
User-level processes communicate with device drivers via system calls as mentioned in the comments. A system call is not quite like a normal function call -- you don't just call the code. After setting up a "request" for what it wants to do, the user-level process pokes the kernel, usually by using a software interrupt instruction. The kernel wakes up, looks at what you request, and then decides itself what code to execute. The kernel code runs at a higher privilege level and will call directly into the device drivers that access the hardware.
This is how the kernel keeps processes safe from each other.
To actually get from stdout to the screen is a lengthy process:
the standard library ends up making a system call that writes to a "pipe" that is attached to stdout. This is where it leaves your process.
The other end of the pipe is being read by the console. The console is a user-level process, so it has to do a system call to do the reading.
The console decides what to display, and how to make it visible to you. There will be a bunch more layers, but eventually there will be system calls into drivers that control the graphics hardware. They will mess with the bits that turn into pixels on the screen, making your text visible.

Resources