how do you do system call interrupts in C? - c

I learned from download.savannah.gnu.org/.../ProgrammingGroundUp-1-0-booksize.pdf
that programs interrupt the kernel, and that is how things are done. What I want to know is how you do that in C (if it's possible)

There is no platform-independent way (obviously)! On 32-bit x86 Linux, system calls are traditionally made by placing the system-call number in the eax register and executing int 80h in assembler, which causes a switch to kernel mode. The kernel then executes the relevant code based on what it sees in eax.

User processes usually request kernel services by calling system call wrapper functions from the standard C library. You can also invoke a system call manually, by number, with syscall(2).

The user program's interaction with the kernel is going to be very platform-specific, so it usually happens behind the scenes in the various library routines. So one just calls printf, write, select, or other library routines, which allow the programmer to write code without worrying about the details of the kernel interface, device drivers, and so forth.
And the way it usually works is that when one of those library routines needs the kernel to do something on its behalf, it performs a low-level system call that yields its control of the CPU to the kernel. It's the user program, not the kernel, that is the one being interrupted.

If you're using glibc (which you probably are if you are using gcc and Linux) then there is a syscall function declared in unistd.h that you can use (the SYS_* call numbers come from sys/syscall.h). It has different implementations for different architectures and operating systems, but the implementation is done in assembly (could be inline assembly). syscall has a man page, so:
man syscall
will give you some info.
If you are just curious about how all of this works then you should know that this has changed in Linux on x86 in recent years. Originally interrupt 0x80 was used by Linux as the normal system call entry point on x86. This worked well enough, but as processors got more advanced pipelining (starting an instruction before previous instructions have completed) interrupts have slowed down (relative to execution of regular code which has sped up, though some tests have shown that it has slowed down more than that). The reason for this is that even when the int instruction is used to trigger an interrupt it works mostly the same as hardware triggered interrupts, which occur unpredictably, which causes them not to play nice with the pipelining of instructions (pipelining works better when code paths are predictable).
To help with this, newer x86 processors have instructions specifically intended for making system calls, but Intel and AMD introduced different instructions for this (sysenter and syscall, respectively). Additionally, the Intel sysenter instruction clobbers a general-purpose register that Linux has used on x86_32 to pass a parameter to the kernel. This means that programs have to know which of 3 possible system call mechanisms to use, as well as possibly different ways of passing arguments to the kernel. To get around all of this, newer kernels map a special page of memory into programs (this page is called vsyscall, and if you cat /proc/self/maps you will see an entry for it) that contains code for the system call mechanism that the kernel has determined should be used on the system, and newer versions of glibc can implement their system call entry using the code in this page.
The point of all of this is that this isn't as simple as it used to be, but if you are just playing around on an x86_32 then you should be able to use the int 80h instruction because that will be supported on systems that can use one of the other mechanisms for backwards compatibility.

In C, you don't really do it directly, but you'll end up doing this indirectly any time you use library functions that end up invoking system calls. File access, network access, etc, are typical examples of this.
Those functions will all end up "trapping" to the kernel, which will handle the request.

Related

__rdtsc/__rdtscp for ARM Mac M1/M2?

I want to insert some time measurement into my code. On x64 I use __rdtscp. Is there something similar for the mac m1/m2? Specifically something that isn't a system call and high resolution.
Just use clock_gettime(CLOCK_MONOTONIC,...)
It is a VDSO function. That means that the kernel injects code into the userspace program that "does the right thing", so the userspace program can access the time stamp counter without doing a syscall.
On x86, it will [usually] invoke rdtsc [or an HPET], and adjust the counter value to represent nanoseconds.
On arm, the TSC is a control register, accessible only in kernel mode. But, higher end arm arches allow this to be mapped for R/O access by userspace. The kernel enables the mapping. Then, the VDSO snippet will know how to access the values via the mapping.
Calls to clock_gettime are fast. So fast that it's not worth trying to access the counter register directly.
Also, it's not terribly meaningful to access the counter directly, because we still have to convert it to some standard unit (e.g. nanoseconds). The VDSO snippet will do this.
UPDATE:
Is it a VDSO call on macOS, too? – fuz
My direct experience with arm was on an nVidia Jetson [under linux].
But, AFAIK, macOS provides [has to provide] clock_gettime.
On older kernels, it may have to issue a syscall equivalent.
But, since the architecture provides the means to do the direct access for userspace to a given OS/kernel, there is every reason to believe the VDSO method is available under macOS as well. In fact, it does: https://www.unix.com/man-page/osx/7/vdso/
The way to see the specific mechanism is to build a program that uses clock_gettime and [using gdb] single step it a bit. Then, it is possible to have gdb disassemble the clock_gettime code.
We have to use gdb [vs. objdump and/or readelf] for the disassembly because the snippet is loaded/injected by the kernel dynamically, so it's not easily accessible with static analysis.
Further, the injected code can be processor model specific. The kernel probes the CPU arch and its features during boot. It crafts the snippet based on the features it finds.
Using gdb is how I examined clock_gettime [about 3 years ago for a commercial product], to verify that it would access the H/W without a syscall and that it provided the correct nanosecond values. In that particular case, I also looked at the arch specific sections in the kernel source code.

What's the relationship between VDSO(7) and SYSCALL(2)?

From this post, I learned
syscall is the default way of entering kernel mode on x86-64.
In practice, recent kernels are implementing a VDSO
Then I looked up the manual, at http://man7.org/linux/man-pages/man2/syscall.2.html :
The first table lists the instruction used to transition to kernel mode (which might not be the fastest or best way to transition to the kernel, so you might have to refer to vdso(7)), the register used to indicate the system call number, the register used to return the system call result, and the register used to signal an error.....
But I lack some essential knowledge to understand the statements.
Is it true that VDSO(7) is the implementation of syscall(2), or that syscall(2) invokes VDSO(7) to complete a system call?
If it is not true, what's the relationship between VDSO(7) and SYSCALL(2)?
VDSO(7) is not the implementation of syscall(2).
Without VDSO(7), a user-space application has to make a genuine system call, and in that case a switch into kernel mode occurs.
With VDSO(7), certain calls can be serviced without that switch.
The kernel automatically maps the vDSO into the address space of every user-space application.
Read more carefully the man pages syscalls(2), vdso(7) and the wikipages on system calls and VDSO. Read also the operating system wikipage and Operating Systems: Three Easy Pieces (freely downloadable).
System calls are fundamental, they are the only way a user-space application can interact with the operating system kernel and use services provided by it. So every program uses some system calls (unless it crashes and is terminated by some signal(7)). System calls require a user to kernel transition (e.g. thru a SYSCALL or SYSENTER machine instruction on x86) which is somehow "costly" (e.g. could take a microsecond).
VDSO is only a clever optimization (to avoid the cost of a genuine system call, for very few functions like clock_gettime(2) which also still exist as genuine system calls), a bit like some shared library magically provided by the kernel without any real file. Some programs (e.g. statically linked ones, or those not using libc like BONES or probably busybox) don't use it.
You can avoid VDSO (or not use it), and earlier kernels did not have it. But you cannot avoid doing system calls, and programs usually do a lot of them.
Play also with strace(1) to understand the (many) system calls done by an application or a running process.

cannot find pthread.c in linux folder

I have downloaded the kernel source, and it resides in a folder called Linux-2.6.32.28 in which I can find /Kernel/Kthread.c
I found Kthread.c but I cannot find pthread.c in Linux-2.6.32.28
I found Kthread.c in Linux-3.13/Kernel and Linux-4.7.2/Kernel
locate pthread.c finds file pthread.c in Computer/usr folder that comes when I install Ubuntu but pthread.c is not available in downloaded folders Linux-2.6.32.28, Linux-3.13, Linux-4.7.2
MORE: There are two sets of function calls. 1. System Calls 2. Library Calls.
For a computer to do any task it has to use hardware resources. So how do library calls differ from system calls?
System calls always use the kernel, which means hardware.
Do library calls mean no usage of the kernel or hardware?
I know that library calls sometimes may resolve to system call.
What I want to know is that, if every set of function calls uses hardware then to what degree system calls will use hardware resources when compared with library calls and vice-versa.
Whether a function call is System or Library, at least hardware resource like RAM has to be utilized. Right?
Read first pthreads(7). It explains that pthreads are implemented in the C standard library as nptl(7).
The C standard library is the cornerstone of Linux systems, and you might have several variants of it; however, most Linux distributions have only one libc, often the GNU glibc, which contains the NPTL. You might use another C standard library (such as musl-libc or dietlibc). With care, you can have several C standard libraries co-existing on your system.
The C standard library (and every user-space program) is using system calls to interact with the kernel. They are listed in syscalls(2). BTW most C standard library implementations on Linux are free software, so you can study (or even improve) their source code if you want to. You often use a system call thru the small wrapper C function (e.g. write(2)) for it in your C standard library, and even more often thru some higher-level function (e.g. fprintf(3)) provided by your C standard library.
Pthreads are implemented (in the NPTL layer of glibc) using low-level stuff like clone(2) and futex(7) and a bit of assembler code. But you generally won't use these directly unless you are implementing a thread library (like NPTL).
Most programs are using the libc and linking with it (and also with crt0) as a shared library, which is /lib/x86_64-linux-gnu/libc.so.6 on my Debian/Sid/x86-64. However, you might (but you usually don't) invoke system calls directly thru some assembler code (e.g. using a SYSCALL or SYSENTER machine instruction). See also this.
The question was edited to also ask
What I want to know is that, if every set of function calls uses hardware then to what degree system calls will use hardware resources
Please read a lot more about operating systems. So read carefully Operating Systems: Three Easy pieces (a freely downloadable textbook) and read about Instruction Set Architecture and Computer Architecture. Study several of them, e.g. x86-64 and RISC-V (or ARM, PowerPC, etc...). Read about CPU modes and virtual memory.
You'll find out that the OS manages physical resources (including RAM and cores of processors). Each process has its own virtual address space. So from a user-space point of view, a process doesn't directly use hardware (e.g. it runs in virtual address space, not in RAM), it runs in some virtual machine (provided by the OS kernel) defined by the system calls and the ISA (the unprivileged machine instructions).
Whether a function call is System or Library, at least hardware resource like RAM has to be utilized. Right?
Wrong, from a user-space point of view. All hardware resources are (by definition) managed by the operating system (which provides abstractions thru system calls). An ordinary application executable program uses the abstractions and software resources (files, processes, file descriptors, sockets, memory mappings, virtual address space, etc etc...) provided by the OS.
(so several books are required to really answer your questions; I gave some references, please follow them and read a lot more; we can't explain everything here)
Regarding your second cluster of questions: Everything a computer does is ultimately done by hardware. But we can make some distinctions between different kinds of hardware within the computer, and different kinds of programs interacting with them in different ways.
A modern computer can be divided into three major components: central processing unit (CPU), main memory (random access memory, RAM), and "peripherals" such as disks, network transceivers, displays, graphics cards, keyboards, mice, etc. Things that the CPU and RAM can do by themselves without involving peripherals in any way are called computation. Any operation involving at least one peripheral is called input/output (I/O). All programs do some of both, but some programs mostly do computation, and other programs mostly do I/O.
A modern operating system is also divided into two major components: the kernel, which is a self-contained program that is given special abilities that no other program has (such as the ability to communicate with peripherals and the ability to control the allocation of RAM), and is responsible for supervising execution of all the other programs; and the user space, which is an unbounded set of programs that have no such special abilities.
User space programs can do computation by themselves, but to do I/O they must, essentially, ask the kernel to do it for them. A system call is a request from a user-space program to the kernel. Many system calls are requests to perform I/O, but the kernel can also be requested to do other things, such as provide general information, set up communication channels among programs, allocate RAM, etc. A library is a component of a user space program. It is, essentially, a collection of "functions" that someone else has written for you, that you can use as if you wrote them yourself. There are many libraries, and most of them don't do anything out of the ordinary. For instance, zlib is a library that (provides functions that) compress and uncompress data.
However, "the C library" is special because (on all modern operating systems, except Windows) it is responsible for interacting directly with the kernel. Nearly all programs, and nearly all libraries, will not make system calls themselves; they will instead ask the C library to do it for them. Because of this, it can often be confusing trying to tell whether a particular C library function does all of its work itself or whether it "wraps" one or more system calls. The pthreads functions in particular tend to be a complicated tangle of both. If you want to learn how operating systems are put together at their lower levels, I recommend you start with something simpler, such as how the stdio.h "buffered I/O" routines (fopen, fclose, fread, fwrite, puts, etc) ultimately call the unistd.h/fcntl.h "raw I/O" routines (open, close, read, write, etc) and how the latter set of functions are just wrappers around system calls.
(Assigning the task of wrapping system calls to the runtime library for the C programming language is a historical accident and we probably wouldn't do it that way again if we were going to start over from scratch.)
pthread stands for POSIX threads. It is a library that lets applications create threads in the OS. Kthread in the kernel source code is for kernel threads in Linux.
POSIX is a standard defined by IEEE to maintain compatibility between different operating systems.
So the pthread source code cannot be found in the Linux kernel source code.
You may refer to this link for pthread source code: http://www.gnu.org/software/hurd/libpthread.html

How are trap handlers, exception handlers, and interrupt handlers different from system calls?

Considering Linux environment, what is the difference between them?
How is a system call different from a normal function call?
According to wikipedia, a TRAP is an exception. Exceptions are defined differently depending on who you talk to. In a generic form, interrupts are exceptions. Exceptions could be a page fault (code or data), an alignment, an undefined instruction, divide by zero, etc.
Generally, they are all very similar. They will switch context to the OS to handle the issue, resulting in register saves (a user-space to OS context switch) and a possible process context switch depending on the request or circumstance. When you transition to the OS, different MMU protections (the CPU view of memory) are in effect and a different stack is used. In most cases, the instruction that caused the fault is the one that was executing when the switch happens.
The interrupt is different in that any user-space instruction could be interrupted. For most others, there are only specific classes of instructions that may cause a fault. This has ramification for compilers and libraries that need to do things atomically (to the thread, process or to the system globally). More details really depend on the CPU in use.
Difference between library and system calls
Considering Linux environment, what is the difference between them?
This is almost unanswerable in a definite way. Linux versions, CPU versions and your definition of what these are would influence the answer. However, I think the above is a good conceptual guide.
How is a system call different from a normal function call?
A normal function call doesn't transition to 'kernel space'. Many access permissions change when entering kernel space. Usually this has some physical hard wiring into the CPU. However the Linux 'mm' and 'io' layers are most definitely different and code may be required to make it so. It can also depend on what the 'system call' does. In some cases, Linux has been optimized so the system call isn't needed (from one version to the next). See for example the vdso man page. In other cases, the C libraries or other mechanism might avoid the system call; for instance DNS name caching, etc.

Dependency of Run time library on operating system

I was going through this tutorial about how to write a minimalist kernel. I read this in between :
The Run-Time Library
A major part of writing code for your OS is rewriting the run-time library, also known as libc. This is because
the RTL is the most OS-dependent part of the compiler package: the C
RTL provides enough functionality to allow you to write portable
programs, but the inner workings of the RTL are dependent on the OS in
use. In fact, compiler vendors often use different RTLs for the same
OS: Microsoft Visual C++ provides different libraries for the various
combinations of debug/multi-threaded/DLL, and the older MS-DOS
compilers offered run-time libraries for up to 6 different memory
models.
I am kind of confused by this part. Suppose I write my kernel in C and, against the advice, use the built-in printf() function to print something. In the end my code will be translated to machine code. When it is executed, the processor will run it directly. Why does the author say:
inner workings of the RTL are dependent on the OS in use ?
There are two separate issues:
What will printf() do when run inside of your kernel? Most likely it will crash or do nothing, since the RTL of the C compiler you use to develop your kernel is probably assuming some runtime environment with console, operating system, etc. Even if you're using a freestanding implementation of C/C++, the runtime will likely take over serial ports or whatnot to perform the output. You don't want that, probably, since your kernel's drivers will control the I/O. So you need to reimplement the underlying file I/O from the RTL.
What will printf() do when run in a user process that runs on top of your kernel? If the kernel protects access to hardware resources, it can't do anything. The underlying file I/O code from the RTL has to be aware of how to communicate with the kernel to open whatever passes for standard input/output "files" and to perform data exchange.
You need to be aware of whether you're using a free-standing or hosted implementation of the C/C++ compiler + RTL, and all of the implications. For kernel development, you'll be using a free-standing implementation. For userspace development, you'll want a hosted implementation, perhaps a cross-compiler, but the runtime library must be written as for a hosted implementation. Note that in both cases you can use the same compiler, you just need to point it to appropriate header files and libraries. On Linux, for example, kernel and userspace development can be done using the very same gcc compiler, with different headers and libraries.
The processor has no clue what a console is, or what a kernel is. Some code has to actually access the hardware. When you take printf() from a hosted C/C++ implementation, that implementation, somewhere deep in its guts, will invoke a system call for the particular platform it was meant to run on. That system call is meant to write to some abstraction that wraps the "console". On the other side of this system call is kernel code that will push this data to some hardware. It may not even be hardware directly, it may well be userspace of another process.
For example, whenever you run things in a GUI-based terminal on a Unix machine (KDE's Konsole, X11 xterm, OS X Terminal, etc.), the userland process invoking printf() has very, very far to go before anything hits hardware. Namely (even this is simplified!):
printf() writes data to a buffer
The buffer is flushed to (written to) a file handle. The write() library function is called.
The write() library function invokes a system call that transfers the control over to the kernel.
The kernel code copies the data from the userspace pages, since those can vanish at any time, to a kernel-side non-paged buffer.
The kernel code invokes the write handler for a given file handle - a file handle, in many kernels, is implemented as class with virtual methods.
The file handle happens to be a pseudo-terminal (pty) slave. The write method passes the data to the pty master.
The pty master fills up the read buffer of given pseudo-terminal, and wakes up the process waiting on the related file handle.
The process implementing the GUI terminal wakes up and read()s the file handle. This goes through the library to a syscall.
The kernel invokes the read handler for the pty master file handle.
The read handler copies its buffered data to the userspace.
The syscall returns.
The terminal process takes the data, parses it for control codes, and updates its internal data structure representing the emulated screen. It also queues an update event in the event queue. Control returns to the event loop of the GUI library/framework. This is done through an event since those events are usually coalesced. If there's a lot of data available, it will be all processed to update the screen data structure before anything gets repainted.
The event dispatcher dispatches the update/repaint event to the "screen" widget/window.
The event handler code in the widget/window uses the internal data structure to "paint" somewhere. Usually it'd be on a bitmap backing store.
The GUI library/framework code signals the operating system's graphics driver that new data is available on the backing store.
Again, through a syscall, the control is passed over to the kernel. The graphics driver running in the kernel will do the necessary magic on the graphics hardware to pass the backing bitmap to the screen. It may be an explicit memory copy, or a simple queuing of a texture copy with the graphics hardware.
printf() is a high-level function that can be independent of the OS. It is however just part of the puzzle, it has dependencies itself. It needs to be able to write to stdout, which will result in low-level OS-dependent system calls, like open() to open the stdout stream and write() to send printf output there. Different OSes have different system calls so there's always an adaption layer, and there will be one in yours.
So sure, you can make printf() work in your kernel. Actually seeing the output of calls to printf() is going to be the real problem to solve. Nothing like a terminal window in kernel mode.
