What does sys_schedule() do in Minix 3.1.8?

I'm modifying a scheduler in Minix 3.1.8 and wondering what the sys_schedule() system call actually does on the CPU. Could someone explain?
sys_schedule.c
PUBLIC int sys_schedule(endpoint_t proc_ep, unsigned priority, unsigned quantum)
{
  message m;

  m.SCHEDULING_ENDPOINT = proc_ep;
  m.SCHEDULING_PRIORITY = priority;
  m.SCHEDULING_QUANTUM = quantum;
  return(_kernel_call(SYS_SCHEDULE, &m));
}
com.h
#define KERNEL_CALL 0x600 /* base for kernel calls to SYSTEM */
# define SYS_SCHEDULE (KERNEL_CALL + 3) /* sys_schedule() */
kernel_call.c
PUBLIC int _kernel_call(int syscallnr, message *msgptr)
{
  msgptr->m_type = syscallnr;
  _do_kernel_call(msgptr);
  return(msgptr->m_type);
}
ipc.h
_PROTOTYPE( int _do_kernel_call, (message *m_ptr) );
_ipc.S
ENTRY(_do_kernel_call)
	/* pass the message pointer to kernel in the %eax register */
	movl	4(%esp), %eax
	int	$KERVEC
	ret

Any system call in MINIX switches to the SYSTEM task (which is what the code you showed does, at least in part). The SYSTEM task has a table that maps the SYS_XXX tags to do_xxx() subroutines. These subroutines are usually contained in small source files in the system/ folder.
There we quickly find do_schedule.c. In 3.1.8 that file is pretty straightforward (and this is clearly explained in the book IIRC), but to summarize: it checks its arguments and stores the new scheduling parameters in the process table entry of the target process; those new values may then change which process is picked when the SYSTEM task has finished its job and is about to return to user mode.

Minix supports userspace scheduling, which means that a userspace process is responsible for making scheduling decisions for one or more processes. The scheduler processes are invoked by the kernel when there is a need to make such decisions.
The purpose of the sys_schedule system call is to enable this userspace scheduling design. A scheduler can invoke sys_schedule to tell the kernel how to schedule a given process. See the documentation page on userspace scheduling.
I've written a description of the SYS_SCHEDULE system call, which you can refer to until the official documentation gets updated.
Parameters:
proc_ep: The endpoint of the process to be rescheduled.
priority: The priority to assign to the process.
quantum: The amount of time to run the process. When the process runs out of quantum, the scheduler associated with the process is informed by the kernel and can invoke sys_schedule to reschedule the process, thereby putting it in a runnable state again. If the process is not associated with a userspace scheduler, the Minix process manager (PM) automatically renews its quantum.
Return value:
EINVAL: proc_ep contains an incorrect process number.
EPERM: The process that performed the system call is not the scheduler associated with proc_ep, so it doesn't have permission to reschedule the process specified by proc_ep.
EINVAL: Invalid priority or quantum.
OK: Call succeeded. In this case, the process has been added to the queue associated with the specified priority. The kernel scheduler schedules processes in round-robin fashion, starting with those in the highest-priority queue.
Note that _do_kernel_call doesn't implement sys_schedule; rather, it executes int $KERVEC, an x86 instruction that performs a user-to-kernel transition and invokes the interrupt handler associated with interrupt number $KERVEC. The interrupt handler then invokes the actual implementation of the system call specified by msgptr->m_type. (See #AntoineL's answer.)
sys_schedule is also used by the PM, which runs in userspace, to balance the priority queues periodically and to automatically renew the quantum of processes that don't have schedulers.

Related

System call: does Read function change process?

(Image: a diagram showing process A entering the kernel via a system call, after which the kernel switches to process B and later back to A.)
I learned that when a system call function is called, the process changes. But what is process B if I call the read function without calling fork()? Isn't there only one process?
On x86-64, there is one specific instruction to do system calls: syscall (https://www.felixcloutier.com/x86/syscall.html). When you call read() in C, it compiles down to placing the proper syscall number in a register, along with the arguments you provide, followed by one syscall instruction. When syscall is executed, it jumps to the address stored in the IA32_LSTAR register. After that, it is in kernel mode executing the kernel's syscall handler.
At that point, it is still in the context of process A. Within its handler, the kernel realizes that you want to read from disk. It will thus start a DMA operation by writing some registers of the hard-disk controller. From there, process A is waiting for IO. There is no point in leaving the core idle so the kernel calls the scheduler and it will probably decide to switch the context of the core to another process B.
When the DMA IO operation is done, the hard-disk controller triggers an interrupt. The kernel thus puts process A back into the ready queue and calls the scheduler which will probably have the effect of switching the context of the core back to process A.
The image you provide isn't very clear so I can understand the confusion. Overall, on most architectures it will work similarly to what is stated above.
The image is somewhat misleading. What actually happens is, the read system call needs to wait for IO. There is nothing else that can be done in the context of process (or thread) A.
So the kernel needs to find something else for the CPU to do. Usually there is some other process or processes which have something to do (i.e. are not waiting for a system call to return). It could also be another thread of process A that is given time to execute (from the kernel's point of view, threads and processes aren't really much different, actually). Several processes may get to execute while process A waits for its system call to complete, too.
And if there is nothing for any other process or thread to do, then the kernel will just be idle and let the CPU sleep for a bit, basically saving power (especially important on a laptop).
So the image in the question shows just one possible situation.

In Linux kernel mode, how can I detect a process?

I need to create a user-mode process. This process has to be detected in kernel mode so it can be placed in a FIFO queue (SCHED_FIFO) in the Linux kernel.
From my investigation, if I use the function void scheduler_tick(void), which is located in core.c (I think scheduler_tick is called by the system on each clock tick of the CPU), I can capture the process.
My question is whether this is correct, or if there is a better way.
Scheduler_tick code: http://lxr.free-electrons.com/ident?i=scheduler_tick
The work is based on a multilevel queue into which a series of different processes will be introduced (payment processes, cancellation processes, reservation processes, and event processes). These processes have different priorities in the system.
Therefore, when I create a process, for example a payment process, I need to detect it, because I need to know its type and hence its priority.
Hence the idea of using the scheduler_tick function to detect the process.
I don't know if I explained this well...
Thank you very much.
Creating a user process is not a kernel concern.
All the user processes that are created are forked from the init process or from its children.
You don't need to do that in the kernel. Actually, you have to keep that outside of the kernel.
What you need to do is either use chrt in your init scripts or use sched_setscheduler from your init program or daemon monitor.

Return control from thread to scheduler (context switching)

I am writing a simple threading library using context switching for a college project. I am having trouble returning from the thread execution.
Originally I switched to a newly created thread like this:
int done = 0;

getcontext(&parent_context);
if (!done) {
    done = 1;
    setcontext(&(thread->context));
}
return thread->tid;
Where thread->context.uc_link is &parent_context. It works, but I need to call a scheduler upon thread creation instead of just switching to its context. So I set thread->context.uc_link to NULL instead of &parent_context and replaced the above code with
schedule(thread);
scheduler();
return thread->tid;
Where schedule enqueues the thread and scheduler gets the first thread in the queue and calls the dispatcher, which is just a call to setcontext. The thing is, I need the thread to return control to the scheduler. The first thing that occurred to me was doing:
static void
scheduler()
{
    int dispatched = 0;
    ucontext_t ret;

    // Get the first thread in the queue, then
    thread->context.uc_link = &ret;
    getcontext(&ret);
    if (!dispatched) {
        dispatched = 1;
        setcontext(&(thread->context));
    }
    // Remove the dispatched thread from the queue
}
Which is not working - the thread doesn't return control to the scheduler and the program finishes after the thread terminates its execution. I assume this is because I didn't call makecontext after changing uc_link. However, in order to call makecontext I would have to pass to the scheduler the thread's function pointer and arguments, which is not desirable, as I cannot modify the thread data structure to store that (project rules). The threading "libraries" using context switching I found online do a setcontext call inside the thread's function:
http://www.evanjones.ca/software/threading.html
http://nitish712.blogspot.com.br/2012/10/thread-library-using-context-switching.html
This is not desirable either, as the user should not need to do the context switching himself. How can I make the thread return control to the scheduler? One hack I can think of is using a static variable ucontext_t return_context and using it as every thread's uc_link. So before I call the scheduler I do getcontext(&return_context), and the scheduler becomes:
// Get the first thread in the queue, then
// Remove the thread from the queue
setcontext(&(thread->context));
This seems to work, but this way no two threads can execute at the same time. It is not an issue for this project, but it seems wrong. Another issue is that every function calling the scheduler acts sort of as a dispatcher:
int done = 0;

getcontext(&return_context);
if (!done) {
    done = 1;
    scheduler();
}
fprintf(stderr, "Thread returned\n");
Is this the way to go?
What the code needs to do depends on the environment where the code is running.
If running under an OS like Linux, macOS, or Windows, the user does not really have much control.
If running on a 'barebones' platform, then a choice needs to be made:
pre-emptive context switching or permissive (cooperative) context switching.
In pre-emptive context switching, an interrupt handler triggers the scheduler (which triggers the dispatcher, which actually causes a 'thread' to resume execution). The scheduler saves the current context and decides what to run next, which depends on how long it has been since each thread last ran, the priority of the available contexts, and whether a specific context is 'blocked', as in waiting for some event.
In permissive context switching, the thread calls the scheduler, thereby giving up the CPU to the scheduler, which will resume whichever thread is not blocked, has the highest priority, and has been delayed the longest.
The scheduler, in a time-sensitive system, will also check that a thread that needs to run periodically has completed its prior execution before re-starting that thread. If the thread has not completed its prior execution, then an error is asserted, which usually involves logging the error and re-starting the system.
So, just what environment is your code expected to run in?

Userspace process preempts kernel thread?

Currently I am reading "Understanding the Linux kernel, 3rd edition" and on p.22 I can read:
In the simplest case, the CPU executes a kernel control path sequentially from the
first instruction to the last. When one of the following events occurs, however, the
CPU interleaves the kernel control paths:
A process executing in User Mode invokes a system call, and the corresponding
kernel control path verifies that the request cannot be satisfied immediately; it
then invokes the scheduler to select a new process to run. As a result, a process
switch occurs. The first kernel control path is left unfinished, and the CPU
resumes the execution of some other kernel control path. In this case, the two
control paths are executed on behalf of two different processes.
Can the kernel control path be interrupted by a user space process doing a system call?
I thought the priority was pretty much:
interrupts
kernel threads
user space processes
I have checked the errata and could not find anything about this.
You are right about the priority list, but what (I think) the book is trying to say is:
When a (user) process makes a system call, the kernel starts executing on its behalf.
If the system call can be completed (the kernel control path does not run into a roadblock), then it will usually return directly to the calling process - think of a getpid() call.
On the other hand, if the system call cannot be completed (for example, because the disk system must read a block into the kernel buffer pool before its data can be returned to the calling process), then the scheduler is used to select a new process to run - preempting the (kernel thread of control that was running on behalf of the) user process.
In due course, the original system call will be able to continue, and the original (kernel thread of control that was running on behalf of the) user process will be able to continue and eventually complete, returning control to the user space process running in user space and not in the kernel.
So "No": it is not the case that the 'kernel path can be interrupted from a user space process doing a system call'.
The kernel path can be interrupted while it is executing a system call on behalf of a user space process because: an interrupt occurs, or the kernel path must wait for a resource to become available, or ...

How do unix signals work?

How do signals work in unix? I went through W.R. Stevens but was unable to understand. Please help me.
The explanation below is not exact, and several aspects of how this works differ between different systems (and maybe even between the same OS on different hardware in some respects), but I think it is generally good enough to satisfy your curiosity and let you use signals. Most people start using signals in programming without even this level of understanding, but before I got comfortable using them I wanted to understand them.
signal delivery
The OS kernel has a data structure called a process control block for each running process which holds data about that process. It can be looked up by the process id (PID) and includes a table of signal actions and pending signals.
When a signal is sent to a process the OS kernel will look up that process's process control block and examine the signal action table to locate the action for the particular signal being sent. If the signal action value is SIG_IGN then the new signal is forgotten about by the kernel. If the signal action value is SIG_DFL then the kernel looks up the default signal handling action for that signal in another table and performs that action. If the value is anything else, it is assumed to be the address of a function within the process to which the signal is being sent, and that function should be called. The values for SIG_IGN and SIG_DFL are numbers cast to function pointers whose values are not valid addresses within a process's address space (such as 0 and 1, which are both in page 0, which is never mapped into a process).
If a signal handling function was registered by the process (the signal action value was neither SIG_IGN nor SIG_DFL) then an entry in the pending signal table is made for that signal and the process is marked as ready to RUN (it may have been waiting on something, like data becoming available for a call to read, waiting for a signal, or several other things).
Now the next time that the process is run the OS kernel will first add some data to the stack and changes the instruction pointer for that process so that it looks almost like the process itself has just called the signal handler. This is not entirely correct and actually deviates enough from what actually happens that I'll talk about it more in a little bit.
The signal handler function can do whatever it does (it is part of the process that it was called on behalf of, so it was written with knowledge about what that program should do with that signal). When the signal handler returns then the regular code for the process begins executing again. (again, not accurate, but more on that next)
Ok, the above should have given you a pretty good idea of how signals are delivered to a process. I think that this pretty good idea version is needed before you can grasp the full idea, which includes some more complicated stuff.
Very often the OS kernel needs to know when a signal handler returns. This is because signal handlers take an argument (which may require stack space), because you can block the same signal from being delivered twice during the execution of the signal handler, and/or because system calls can be restarted after a signal is delivered. To accomplish this, a little bit more than stack and instruction pointer changes is needed.
What has to happen is that the kernel needs to make the process tell it that it has finished executing the signal handler function. This may be done by mapping a section of RAM into the process's address space which contains code to make this system call, and making the return address for the signal handler function (the top value on the stack when the function starts running) be the address of this code. I think that this is how it is done in Linux (at least in newer versions). Another way to accomplish this (I don't know if this is done, but it could be) would be to make the return address for the signal handler function be an invalid address (such as NULL), which would cause an interrupt on most systems and give the OS kernel control again. It doesn't matter a whole lot how this happens, but the kernel has to get control again to fix up the stack and know that the signal handler has completed.
While looking into another question I learned that the Linux kernel does map a page into the process for this, but that the actual system call for registering signal handlers (what sigaction calls) takes an sa_restorer parameter, which is an address that should be used as the return address from the signal handler, and the kernel just makes sure that it is put there. The code at this address issues the "I'm done" system call (sigreturn), and the kernel then knows that the signal handler has finished.
signal generation
I'm mostly assuming that you know how signals are generated in the first place. The OS can generate them on behalf of a process due to something happening, like a timer expiring, a child process dying, accessing memory that it should not be accessing, or issuing an instruction that it should not (either an instruction that does not exist or one that is privileged), or many other things. The timer case is functionally a little different from the others because it may occur when the process is not running, and so is more like the signals sent with the kill system call. The non-timer-related signals sent on behalf of the current process are generated when an interrupt occurs because the current process is doing something wrong. The interrupt gives the kernel control (just like a system call does), and the kernel generates the signal to be delivered to the current process.
Some issues that are not addressed in all of the above statements are multi core, running in kernel space while receiving a signal, sleeping in kernel space while receiving a signal, system call restarting and signal handler latency.
Here are a couple of issues to consider:
What if the kernel knows that a signal needs to be delivered to process X which is running on CPU_X, but the kernel learns about it while running on CPU_Y (CPU_X!=CPU_Y). So the kernel needs to stop the process from running on a different core.
What if the process is running in kernel space while receiving a signal? Every time a process makes a system call it enters kernel space and tinkers with data structures and memory allocations in kernel space. Does all of this hacking take place in kernel space too?
What if the process is sleeping in kernel space waiting for some other event? (read, write, signal, poll, mutex are just some options).
Answers:
If the process is running on another CPU, the kernel, via cross-CPU communication, will deliver an interrupt and a message to the other CPU. The other CPU will, in hardware, save state and jump into the kernel, which then delivers the signal on that CPU. This is all part of trying not to execute the process's signal handler on a different CPU, which would break cache locality.
If the process is running in kernel space it is not interrupted. Instead it is recorded that this process has received a signal. When the process exits kernel space (at the end of each system call), the kernel will setup the trampoline to execute the signal handler.
If the process, while running in kernel space, after having received a signal, reaches a sleep function, then that sleep function (and this is common to all sleep functions within the kernel) will check whether the process has a signal pending. If so, it will not put the process to sleep; instead it will cancel all that has been done on the way down into the kernel and will exit to user space, setting up a trampoline to execute the signal handler and then restart the system call. You can control which signals you want to interrupt system calls and which you do not using the siginterrupt(2) call. You can decide whether you want system calls to be restartable for a certain signal when you register the signal using sigaction(2) with the SA_RESTART flag. If a system call is issued and is cut off by a signal and is not restarted automatically, you will get an EINTR (interrupted) return value and you must handle that value. You can also look at the restart_syscall(2) system call for more details.
If the process is already sleeping/waiting in kernel space (actually all sleeping/waiting is always in kernel space), it is woken from the sleep, the kernel code cleans up after itself and jumps to the signal handler on return to user space, after which the system call is automatically restarted if the user so desired (very similar to the previous explanation of what happens if the process is running in kernel space).
A few notes about why all of this is so complex:
You cannot just stop a process running in kernel space, since the kernel code allocates memory, modifies data structures, and more. If you just take control away, you will corrupt the kernel state and cause a machine hang. The kernel code must be notified in a controlled way that it must stop running, return to user space, and allow user space to handle the signal. This is done via the return value of all (well, almost all) sleeping functions in the kernel. And kernel programmers are expected to treat those return values with respect and act accordingly.
Signals are asynchronous. This means that they should be delivered as soon as possible. Imagine a process that has only one thread, went to sleep for an hour, and is delivered a signal. The sleep is inside the kernel. So you expect the kernel code to wake up, clean up after itself, return to user space, and execute the signal handler, possibly restarting the system call after the signal handler finishes. You certainly do not expect that process to execute the signal handler only an hour later. Then you expect the sleep to resume. Great trouble is taken by the user space and kernel people to allow just that.
All in all, signals are like interrupt handlers, but for user space. This is a good analogy but not a perfect one. While interrupts are generated by hardware, only some signals originate from hardware events; most are just software (a signal about a child process dying, a signal from another process using the kill(2) syscall, and more).
So what is the latency of signal handling?
If some other process is running when you get a signal, then it is up to the kernel scheduler to decide whether to let the other process finish its time slice and only then deliver the signal. On a regular Linux/Unix system this means that you could be delayed by one or more time slices before you get the signal (which means milliseconds, which are equivalent to eternity).
When you get a signal, if your process is high-priority or other processes already got their time slice you will get the signal quite fast. If you are running in user space you will get it "immediately", if you are running in kernel space you will shortly reach a sleep function or return from kernel in which case when you return to user space your signal handler will be called. That is usually a short time since not a lot of time is spent in the kernel.
If you are sleeping in the kernel, and nothing else is above your priority or needs to run, the kernel thread handling your system call is woken up, cleans up after all the stuff it did on the way down into the kernel, goes back to user space and executes your signal handler. This doesn't take too long (we're talking microseconds here).
If you are running a real-time version of Linux and your process has the highest real-time priority, then you will get the signal very soon after it is triggered. We're talking 50 microseconds or even better (it depends on other factors that I cannot go into).
Think of the signal facility as interrupts, implemented by the OS (instead of in hardware).
As your program merrily traverses its locus of execution rooted in main(), these interrupts can occur, cause the program to be dispatched to a vector (handler), run the code there, and then return to the location where it got interrupted.
These interrupts (signals) can originate from a variety of sources e.g. hardware errors like accessing bad or misaligned addresses, death of a child process, user generated signals using the kill command, or from other processes using the kill system call. The way you consume signals is by designating handlers for them, which are dispatched by the OS when the signals occur. Note that some of these signals cannot be handled, and result in the process simply dying.
But those that can be handled, can be quite useful. You can use them for inter process communication i.e. one process sends a signal to another process, which handles it, and in the handler does something useful. Many daemons will do useful things like reread the configuration file if you send them the right signal.
Signals are nothing but an interruption in the execution of a process. A process can signal itself, or it can cause a signal to be sent to another process. For example, a parent can send a signal to its child in order to terminate it, etc.
Check the following links to understand more:
https://unix.stackexchange.com/questions/80044/how-signals-work-internally
http://www.linuxjournal.com/article/3985
http://www.linuxprogrammingblog.com/all-about-linux-signals?page=show
