Implementation of system calls / traps within Linux kernel source - c

I'm currently learning about operating systems and the use of traps to facilitate system calls within the Linux kernel. I've located the table of the traps in traps.c and the implementation of many of the traps within entry.S.
However, I'm instructed to find an implementation of two system calls in the Linux kernel which utilize traps to implement a system call. Although I can find the definition of the traps themselves, I'm not sure what a "call" to one of these traps within the kernel would look like. Therefore, I'm struggling to find an example of this behavior.
Before anyone asks, yes, this is homework.
As a note, I'm using Github to browse the kernel source, since kernel.org is down:
https://github.com/torvalds/linux/

For the x86 architecture, the SYSCALL_VECTOR (0x80) interrupt is used only for 32-bit kernels. You can see the interrupt vector layout in arch/x86/include/asm/irq_vectors.h. The trap_init() function from traps.c is the one that sets the trap handler defined in entry_32.S:
set_system_trap_gate(SYSCALL_VECTOR, &system_call);
For the 64-bit kernels, the new SYSENTER (Intel) or SYSCALL (AMD) instructions are used for performance reasons. The syscall_init() function from arch/x86/kernel/cpu/common.c sets up the "handler" defined in entry_64.S and bearing the same name (system_call).
For the user-space perspective you might want to take a look at this page (a bit outdated for the function/file names).

I'm instructed to find an implementation of two system calls in the Linux kernel which utilize traps to implement a system call
Every system call utilizes a trap (interrupt 0x80 if I recall correctly) so the "kernel" bit will be turned on in PSW, and privileged operations will be available to the processor.
As you mentioned the system calls are specified in entry.S under sys_call_table: and they all start with the "sys" prefix.
You can find the system call function headers in include/linux/syscalls.h, which you can browse here:
http://lxr.linux.no/#linux+v3.0.4/include/linux/syscalls.h
Use lxr (as the comment above has already mentioned) in general in order to browse the source code.
Anyhow, the functions are implemented using SYSCALL_DEFINE1 or other versions of the macro, see
http://lxr.linux.no/#linux+v3.0.4/kernel/sys.c

If you're looking for an actual system call, not an implementation of a system call, maybe you want to check some C libraries. Why would a kernel include a system call? (I'm not talking about a system call implementation, I'm talking about e.g. an actual chdir call for example. There is a chdir system call, which is a request for changing the directory and there is a chdir system call implementation which actually changes it and must be somewhere in the kernel). Ok, maybe some kernels do include some syscalls too but that's another story :)
Anyway, if I get your question right, you're not looking for an implementation but an actual call. GNU libc is too complicated for me, but you can try browsing the dietlibc sources. Some examples:
chdir.S
syscalls.h

Related

Are OS libraries written in assembly or in C

I ask this, because I am getting very conflicting definitions of System calls.
On one hand, I have seen the definition that they are an API the OS provides that a user program can call. Since this API is a high level interface, it has to be implemented in a high level language like C.
On the other hand, I have seen that the actual OS syscalls are machine instructions, for which you have to set certain registers to call (according to some compliance standard set by the OS). But this looks nothing like the UNIX APIs like open(), write() and read(), so what is going on here.
I have also read that these high level interfaces are implemented in the C libraries which do the actual assembly code syscalls. In that case, why do we say the OS provides this interface when it is actually provided by the C language. What if I want to perform a UNIX syscall directly to the OS without having to use C?
There are two open functions - one, the syscall open exposed by the operating system (e.g. Linux), and two, the C-library function open, exposed by the C standard library (e.g. glibc).
You can see two different man pages for these functions - run man 2 open to see the man page regarding the syscall, and man 3 open to see the man page regarding the C standard function.
Functions you mentioned like open, write, and read can be confusing - because they exist both as syscalls and as C standard functions. But they are separate entities entirely - in fact, glibc's open function doesn't even use the open syscall - it uses the openat syscall.
On Windows, where the syscall open doesn't even exist - the C standard library function open does still exist, and uses WinAPI's CreateFile behind the scenes.
What if I want to perform a UNIX syscall directly to the OS without having to use C?
This is possible - indeed, glibc has to do it to implement C standard library functions. But it's tricky, and involves implementing wrappers for the syscalls and sometimes even handcrafting assembly.
If you want to see things for yourself, you can look at how glibc implements open:
int
__libc_open (const char *file, int oflag, ...)
{
  int mode = 0;
  if (__OPEN_NEEDS_MODE (oflag))
    {
      va_list arg;
      va_start (arg, oflag);
      mode = va_arg (arg, int);
      va_end (arg);
    }
  return SYSCALL_CANCEL (openat, AT_FDCWD, file, oflag, mode);
}
...
weak_alias (__libc_open, open)
Notice that the function ends with a call to the macro SYSCALL_CANCEL, which ends up invoking the openat syscall exposed by the OS.
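To make the same idea concrete without glibc's internal macros, here is a minimal sketch that reaches the openat syscall through the generic syscall(2) wrapper on Linux. The helper name open_via_openat is made up for illustration, and this skips the cancellation handling that SYSCALL_CANCEL adds, so treat it as an approximation of what the library does rather than its actual implementation.

```c
#define _GNU_SOURCE
#include <fcntl.h>        /* AT_FDCWD, O_* flags */
#include <sys/syscall.h>  /* SYS_openat */
#include <sys/types.h>    /* mode_t */
#include <unistd.h>       /* syscall() */

/* Hypothetical helper: open a path relative to the current directory by
 * invoking the openat syscall directly. Returns a file descriptor, or -1
 * with errno set (syscall() performs that translation for us). */
static int open_via_openat(const char *path, int flags, mode_t mode)
{
    return (int)syscall(SYS_openat, AT_FDCWD, path, flags, mode);
}
```

Calling open_via_openat("/dev/null", O_WRONLY, 0) should behave like open("/dev/null", O_WRONLY) on a typical Linux system.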
Are OS libraries written in assembly or in C
That is a question that can not really be answered as it depends. Technically there are no limitations on the implementation (i.e. it can be written in any language, though C is probably the most common followed by assembly).
The important part here is the ABI. This defines how OS calls can be made.
You can make system calls in assembly (if you know the ABI you can manually write all the code to comply). The C compiler knows the ABI and will automatically generate all the code required to make a call.
Most languages, though, allow you to make system calls: they either know the ABI or have a wrapper API that translates a call in that language to the appropriate ABI for the OS.
I ask this, because I am getting very conflicting definitions of System calls.
The definitions will depend on the context. You will have to give examples of what the definitions are AND in what context they are being used.
On one hand, I have seen the definition that they are an API the OS provides that a user program can call.
Sure this is one way to look at it.
More strictly I would say the OS provides a set of interfaces that can be used to perform privileged tasks. Now those interfaces can be exposed via an API provided by a particular environment that makes them easier to use.
Since this API is a high level interface, it has to be implemented in a high level language like C.
Sort of true.
That an environment exposes an API does not mean that it needs a high level language (and C is not a high level language, it is one step above assembly, it is considered a low level language). And just because it is exposed by the language does not mean it is implemented in that language.
On the other hand, I have seen that the actual OS syscalls are machine instructions, for which you have to set certain registers to call (according to some compliance standard set by the OS).
OK. Here we have moved from System Calls to syscalls. We should be very careful on how we use these terms to make sure we are not conflating different terms.
I would (and this is a bit abstract still) think about the computer as several levels of abstraction:
          Hardware
------    --------------
          syscalls
OS        --------------
          System Calls (read/write etc..)
------    --------------
          Language Interface (read/write etc..)
You can poke the hardware directly if you want (if you know how), but it is better if you can make syscalls (if you know how), but it is better to use the OS System Calls which use a well defined ABI, but it is better still to use the language interface (what you would call the API) to call the underlying System Calls.
But this looks nothing like the UNIX APIs like open(), write() and read(), so what is going on here.
Here the UNIX OS provides the open/close/read interface.
The C library provides a very thin API wrapper interface above the OS System Calls. The C compiler will then generate the correct instructions to call the System Calls using the correct ABI, which in turn will call the next layer down in the OS to use the syscalls.
I have also read that these high level interfaces are implemented in the C libraries which do the actual assembly code syscalls.
The high level interface can be written in any language. But the C one is so easy to use that most other languages don't bother doing it themselves but simply call via the C interface.
It's VERRRY rare to ever directly write something in assembly. By writing in C you can compile it for many different CPU architectures whereas by writing in assembly you are basically stuck with one specific architecture. Most operating systems are written in C. We say the OS provides the interface because you are interacting with the operating system which happens to be written in C.

Why was a readdir function added to POSIX library interface when there is a readdir kernel function?

I was surprised to discover the man pages having entries for two conflicting variants of readdir.
In READDIR(2), it specifically states you do not want to use it:
This is not the function you are interested in. Look at readdir(3) for the POSIX conforming C library interface. This page documents the bare kernel system call interface, which is superseded by getdents(2).
I understand a function may become deprecated when another function comes along and does its job better, but I am not familiar with other cases of a userspace function coming in and replacing a kernel function of the same name. Is there a known reason it was chosen to go this route rather than coming up with a new function name (as the man page mentions getdents did when superseding readdir)?
The programming interface, POSIX, is stable. You don't just go replacing functions in it unnecessarily because you want to implement the backend more efficiently. The Linux syscall readdir never implemented the readdir function because it has the wrong signature; it was an old, inefficient backend for implementing the readdir function. When a better backend came along, it was obsolete.
You have it completely backwards: it's the library function readdir(3) which predates Linux and its readdir(2) system call, and not the reverse.
Naming the syscall that way was certainly a poor decision, and probably has a story behind it, but it's pretty much irrelevant now, as nobody is using it.
On Unix, directories used to be simple files formatted in a special way, and the system call interface through which they were read was just read(2) [1]. Later systems introduced system calls like getdirentries (4.4BSD) and getdents (SVR3), but they weren't willing or able to standardize on an interface, so we're still stuck with the high level and broken [2] readdir(3) library function as the only standard interface for reading a directory.
[1] On some systems like BSD you can still cat a directory, at least when using the default filesystem (FFS).
[2] it's broken because it's not signal safe, and it returns NULL for both error and EOF, which means that the only way it could be safely used is by first setting errno to 0, and checking both its return value and errno afterwards. Yuck.
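The errno dance described in footnote [2] can be sketched as follows. This is only an illustration of the pattern; the function name count_dir_entries is made up, and it ignores the signal-safety concern entirely.

```c
#include <dirent.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical helper: count the entries of a directory (including "."
 * and ".." where the filesystem reports them). Returns -1 with errno set
 * if the directory cannot be opened or a read fails partway through. */
static long count_dir_entries(const char *path)
{
    DIR *dir = opendir(path);
    if (dir == NULL)
        return -1;

    long count = 0;
    int saved_errno;
    for (;;) {
        errno = 0;                          /* clear before every call */
        struct dirent *entry = readdir(dir);
        if (entry == NULL) {
            saved_errno = errno;            /* 0: end of directory; else: error */
            break;
        }
        count++;
    }

    closedir(dir);
    if (saved_errno != 0) {
        errno = saved_errno;                /* closedir may have changed it */
        return -1;
    }
    return count;
}
```

Without the errno = 0 line, a stale errno left over from an earlier call could make a normal end of directory look like a failure.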

Why does system() exist?

Many papers and such mention that calls to 'system()' are unsafe and unportable. I do not dispute their arguments.
I have noticed, though, that many Unix utilities have a C library equivalent. If not, the source is available for a wide variety of these tools.
While many papers and such recommend against goto, there are those who can make an argument for its use, and there are simple reasons why it's in C at all.
So, why do we need system()? How much existing code relies on it that can't easily be changed?
Sarcastic answer: Because if it didn't exist people would ask why that functionality didn't exist...
Better answer:
Much of the system functionality is not part of the 'C' standard but is part of, say, the Linux spec, and Windows most likely has some equivalent. So if you're writing an app that will only be used in Linux environments then using these functions is not an issue, and as such is actually useful. If you're writing an application that can run on both Linux and Windows (or others) these calls become problematic because they may not be portable between systems. The key (imo) is that you are simply aware of the issues/concerns and program accordingly (e.g. use appropriate #ifdef's to protect the code etc...)
The closest thing to an official "why" answer you're likely to find is the C89 Rationale. 4.10.4.5 The system function reads:
The system function allows a program to suspend its execution temporarily in order to run another program to completion.
Information may be passed to the called program in three ways: through command-line argument strings, through the environment, and (most portably) through data files. Before calling the system function, the calling program should close all such data files.
Information may be returned from the called program in two ways: through the implementation-defined return value (in many implementations, the termination status code which is the argument to the exit function is returned by the implementation to the caller as the value returned by the system function), and (most portably) through data files.
If the environment is interactive, information may also be exchanged with users of interactive devices.
Some implementations offer built-in programs called "commands" (for example, date) which may provide useful information to an application program via the system function. The Standard does not attempt to characterize such commands, and their use is not portable.
On the other hand, the use of the system function is portable, provided the implementation supports the capability. The Standard permits the application to ascertain this by calling the system function with a null pointer argument. Whether more levels of nesting are supported can also be ascertained this way; assuming more than one such level is obviously dangerous.
Aside from that, I would say mainly for historical reasons. In the early days of Unix and C, system was a convenient library function that fulfilled a need that several interactive programs had: as mentioned above, "suspend[ing] its execution temporarily in order to run another program". It's not well-designed or suitable for any serious tasks (the POSIX requirements for it make it fundamentally non-thread-safe, it doesn't allow asynchronous events to be handled by the calling program while the other program is running, etc.) and its use is error-prone (safe construction of a command string is difficult) and non-portable (because the particular form of command strings is implementation-defined, though POSIX defines this for POSIX-conforming implementations).
If C were being designed today, it almost certainly would not include system, and would either leave this type of functionality entirely to the implementation and its library extensions, or would specify something more akin to posix_spawn and related interfaces.
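The null-pointer probe that the Rationale mentions looks roughly like this in practice. It is a small sketch, and the helper name run_if_shell_available is made up; the return value of system() for a real command remains implementation-defined, as the quoted text says.

```c
#include <stdlib.h>

/* Hypothetical helper: run a command only if the implementation reports
 * that a command processor exists. Returns -1 when there is none,
 * otherwise whatever system() returns for the command. */
static int run_if_shell_available(const char *command)
{
    if (system(NULL) == 0)   /* zero means: no command processor */
        return -1;
    return system(command);
}
```

On a POSIX system the non-probe return value is a wait status, so a caller would normally decode it with WIFEXITED and WEXITSTATUS rather than compare it directly.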
Many interactive applications offer a way for users to execute shell commands. For instance, in vi you can do:
:!ls
and it will execute the ls command. system() is a function they can use to do this, rather than having to write their own fork() and exec() code.
Also, fork() and exec() aren't portable between operating systems; using system() makes code that executes shell commands more portable.
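For comparison, this is roughly the fork() and exec() code a POSIX program would otherwise write by hand. It is a simplified sketch with a made-up name (run_program); unlike system(), it does not go through a shell and does not adjust signal handling while waiting.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv[0] (looked up in PATH) with the given
 * NULL-terminated argument vector and wait for it to finish. Returns the
 * child's exit status, or -1 if it could not be started or waited for,
 * or if it did not exit normally. */
static int run_program(char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127);              /* only reached if exec failed */
    }

    int status;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)      /* retry only when interrupted by a signal */
            return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Passing an argument vector such as {"sh", "-c", "ls", NULL} gets close to what system("ls") does on a POSIX system.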

Are system calls directly sent to the kernel?

I have a couple of assumptions, most likely some of them will be incorrect. Please correct me where they are wrong.
We could categorize the functions in a program written in C as follows:
Functions that are sent to dynamically loaded libraries:
These are sent to the library that translates them in to multiple standard C-functions
The library passes them on to libc where they are translated into multiple system calls.
Libc passes those on to the kernel where they are executed and the returns are sent back to libc.
Libc will collect the returns, group them by C function and use them to create 1 return for each C function. These returns will be sent back to the dynamically loaded library.
This library will collect all returns and use them to create 1 return that is sent back to the original program.
Functions that are either defined in the code or part of statically compiled libraries: Everything is the same as the category above but:
The program already does the translation into standard C functions where they are known or into functions calling dynamically loaded libraries in the other case.
The standard C functions are sent to libc, the others to the dynamically loaded libraries (where they will be handled as above).
The creation of 1 final return based on the returns from both types of functions will be done by the program
Functions that are standard C functions: They will just be sent to libc which will communicate with the kernel in the same way as above and 1 return will be sent to the program
Functions that are system calls: They are NOT sent directly to the kernel but have to pass to libc although it doesn't do any extra work.
Security checks (permissions, writing to unallocated mem, ...) are always done by the kernel, although libc and libraries above might also check it first.
All POSIX-compliant systems follow these rules
It might not be the same on Linux and on some other POSIX system (like FreeBSD).
On Linux, the ABI defines how a system call is done. Read about Linux kernel interfaces. The system calls are listed in syscalls(2) (but see also /usr/include/asm*/unistd.h ...). Read also vdso(7). The assembler HowTo explains more details, but for 32-bit i686 only.
Most Linux libc implementations are free software, so you can study their source code. IMHO the source code of musl-libc is very readable.
To simplify a tiny bit, most system calls (e.g. write(2)) are small C functions in the libc which:
call the kernel using a machine instruction such as SYSENTER, or SYSCALL on x86-64 (and take care of passing the system call number and its arguments with the kernel convention, which is not the usual C ABI). What the kernel considers as a system call is only that machine instruction (and conventions about it).
handle the failure case, by passing it to errno(3) and returning -1.
(On Linux, the kernel signals failure by returning a small negative value, the negated error code in the range -4095 to -1; the carry-flag convention is used by some BSD kernels instead.)
handle the success case, by returning a result.
You could invoke system calls without libc, with some assembler code. This is unusual, but has been done (e.g. in BusyBox or in Bones).
So the libc code for write is doing some tiny extra work (passing arguments, handling failure & errno and success cases).
A few system calls (probably getpid & clock_gettime) avoid the overhead of the SYSENTER machine instruction (and user-mode -> kernel-mode switch) thanks to vDSO.
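The failure and success handling that a libc wrapper performs can be sketched in isolation. The function below assumes the Linux convention that a raw kernel return value between -4095 and -1 is a negated error code; the names translate_raw_result and MAX_KERNEL_ERRNO are made up for illustration and are not taken from any libc.

```c
#include <errno.h>

/* Assumption: Linux reserves raw return values -4095..-1 for errors. */
#define MAX_KERNEL_ERRNO 4095L

/* Hypothetical helper: turn a raw value returned by the kernel into the
 * C-level convention of "result, or -1 with errno set". */
static long translate_raw_result(long raw)
{
    if (raw < 0 && raw >= -MAX_KERNEL_ERRNO) {
        errno = (int)-raw;   /* failure: publish the error code */
        return -1;
    }
    return raw;              /* success: pass the result through */
}
```

A real wrapper would obtain the raw value from the system call instruction itself, which needs architecture-specific assembly and is left out here.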
No, you can't categorize things like that. When you program in C (but that makes no difference in almost all other languages), there are only functions, and whatever the real status of these is, you call them exactly the same way. This is defined by the ABI (how to pass parameters, get returned values, etc) and enforced by the compiler/linker. Of course some functions are just stubs. For example stubs to shared library functions (stubs may be needed to load the library, dynamically link to the real function, etc) or system calls (this is more technical and differs from kernel to kernel). But from the viewpoint of your program everything is the same (this is why it is hard to understand the difference between fread and read at the beginning: you call them the same way, they do almost the same job, what's the difference?).
POSIX doesn't say a single word about kernels... It just lists the C (and formerly Ada) API of a set of functions with minimal semantics (plus some commands, tools, etc). Implementation of these is totally free.

How to include math.h #include <math.h> on kernel source file?

I am trying to include math.h in my Linux kernel module. If I use,
#include '/usr/include/math.h'
It gives me these errors:
error: features.h: No such file or directory
error: bits/huge_val.h: No such file or directory
error: bits/mathdef.h: No such file or directory
error: bits/mathcalls.h: No such file or directory
Why is this?
You cannot use the C library in a kernel module, this is even more true for the math library part.
You can't include a userspace C module in kernel space. Also are you sure that you want to be doing this? This thread may help http://kerneltrap.org/node/16570. You can do math functions inside the kernel, just search around on http://lxr.linux.no/ for the function you need.
Standard libraries are not available in the kernel. This includes libc, libm, etc. Although some of the functions in those libraries are implemented in kernel space, some are not. Without knowing what you're trying to call, it's impossible to say for sure whether or not you should be doing what you're trying to do in kernel space.
I should further note that the kernel does NOT have access to the FPU. This is to save time when switching tasks (since saving the FPU registers would add unnecessary overhead when performing context switches). You can get access to the FPU from kernel space if you really want it, but you need to be really careful not to trash the user space's FPU registers when doing so.
Edit: This summarizes the caveat about the FPU much better than I did.
Floating point operations are not supported in the kernel. This is because when switching from kernel context to user context, registers must be saved. If the kernel made use of floating point, then the floating point registers would have to be saved as well, which would cause bad performance for each context switch. So because floating point is very rarely needed, especially in the kernel, it is not supported.
If you really have to:
maybe you could compile your own kernel with floating point support
you could block context switch within your floating point operations
the best would be to use fixed-point arithmetic.
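As a rough idea of what fixed-point arithmetic means here, the sketch below stores values as 32-bit integers scaled by 2^16 (often called Q16.16). All names are made up for illustration, it uses only integer operations, and it makes no attempt to detect overflow.

```c
#include <stdint.h>

/* Hypothetical Q16.16 type: the real value is the integer divided by 65536. */
typedef int32_t fix16_t;
#define FIX16_ONE ((fix16_t)65536)

/* Convert a whole number to fixed point (no overflow check). */
static fix16_t fix16_from_int(int32_t value)
{
    return (fix16_t)(value * FIX16_ONE);
}

/* Convert back, discarding the fractional part (truncates toward zero). */
static int32_t fix16_to_int(fix16_t value)
{
    return value / FIX16_ONE;
}

/* Multiply two fixed-point numbers: widen to 64 bits so the intermediate
 * product fits, then divide out the extra scale factor. */
static fix16_t fix16_mul(fix16_t a, fix16_t b)
{
    return (fix16_t)(((int64_t)a * (int64_t)b) / FIX16_ONE);
}
```

For example, fix16_mul(fix16_from_int(3), FIX16_ONE / 2) represents 3 times 0.5. Functions such as pow or tan would still have to be built on top of this, for instance with lookup tables or polynomial approximations.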
AFAIK kernel space is separated from user space, and so should the source code. /usr/include is for general programming.
This suggests that doing floating point math in the kernel is not as simple as it is in user-space code. Another instance suggesting that this is hard.
Still looking for a more definitive answer.
Well, you cannot. You can rewrite the functions you need in your module; it's dirty but it should work...
Thanks a lot for your comments
To use math functions
Is it possible to make a plain C application and pass variables to it from the kernel source file, so that the C application computes the values and sends the results back?
Kernel source file (kernel space) ---> C Application (user space)
|
<---|
Kernel source file
So we may include the header file in the kernel source code. In case of any event, it passes the values to a C application (user space).
Details:
I am trying to modify my HID joystick events (absolute x, y) so that the joystick only moves to the improved location, which will be generated by my application using some math functions (pow, tan, etc).
So I used hid-input.c to get the raw events and modify them. These will then be used by the input subsystem through the hid kernel module.
Looking for your comments
Regards.
You cannot (often, without lots of kernel know-how to lock and preserve these registers while not impacting other critical sections) use floating point registers in the kernel, and besides it is of course inappropriate to do "processing" in the kernel. Many others have mentioned this. Performance will be terrible. Thus, math.h is not provided for kernel modules. We accept this and move on...
However, as I am also a victim of crazy requirements and completely insane designs forced on us by others, this is a legitimate question. After reducing the usage of the math.h API to minimize the performance impact, you can use floating point emulation (soft-float) via correct compiler settings to implement your required functions without using floating point registers. Kernel code should already compile with these soft-float settings.
In order to implement math.h functionality, you can look at glibc or uClibc and perhaps others. Both of these libraries have generic "C" implementations of libm which implement math.h without the use of special intrinsics or platform specific types and should therefore compile just fine in the kernel.
uClibc: The above link takes you directly to the libm section of uClibc.
glibc: After "git"-ing glibc, you'll find what you're looking for in glibc/sysdeps/ieee754/flt-32.
glibc may be more difficult to understand because it is more sophisticated and has more inter-dependencies within itself, but uClibc only provides (at the moment) C89 math.h. If you want single precision (read: faster) or complex math functionality as in C99+, you'll have to look at glibc.
Maybe try using double quotes (") instead of single quotes?
In experts' view, it's NOT a good approach to communicate data between kernel space and user space. Either work fully in kernel space OR only in user space.
But one solution can be to implement read() and write() in a kernel module to send the information between user space and kernel space.

Resources