Context:
Linux, 64-bit.
Intel Core 2 Duo.
Question:
Where does the Linux kernel "communicate" with the CPU?
I read the source code for the scheduler but could not understand how they communicate and how the kernel tells the CPU that something needs to be processed.
I understand that there are run queues, but isn't there something that enables the kernel to interrupt the CPU via the bus?
Update
This expands my initial question a bit: how can we tell the CPU where the task queues are?
Because the CPU has to poll something, and I guess we tell it at some point. I missed that point in the kernel code.
I will try to write a simplified explanation of how it works; tell me if anything is unclear.
A CPU only does one thing: execute instructions. It will start at a predefined address and execute. That's all. Sometimes you can have an interrupt, which will temporarily make the CPU jump to another instruction.
A kernel is a program (= a sequence of instructions) that makes it easy to execute other programs. The kernel will do its business to set up what it needs. This often includes building a list of processes to run. The definition of "process" is totally up to the kernel because, as you know, the CPU only does one thing.
Now, when the kernel runs (being executed by the CPU), it might decide that one process needs to be executed. To do so, the kernel will simply jump to the process's program. How this is done doesn't matter, but in most OSes the kernel will map a periodic interrupt (the CPU will periodically jump) to a function that decides which process to execute and jumps to it. It isn't required, but it is convenient because programs will be forcefully "interrupted" periodically so others can also be executed.
To sum up, the CPU doesn't "know" anything. The kernel runs, and will jump to other processes' code to make them run. Only the kernel "knows".
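To make the "the kernel simply jumps to the process" idea concrete, here is a minimal user-space analogy (not kernel code, and all names are invented): the "kernel" is just a loop over a run queue of function pointers, and "running a process" is nothing more than jumping to its code.

#include <stdio.h>

/* Hypothetical "processes": each is just a block of code the "kernel" can jump to. */
static void task_a(void) { printf("task A runs for a while\n"); }
static void task_b(void) { printf("task B runs for a while\n"); }

/* The "run queue": a list the kernel knows about. The CPU itself knows nothing of it. */
static void (*run_queue[])(void) = { task_a, task_b };

int main(void) {
    /* The "kernel": it decides which entry to run and simply jumps to (calls) it. */
    for (int tick = 0; tick < 4; tick++) {
        int next = tick % 2;   /* trivial round-robin decision */
        run_queue[next]();     /* "jump" to the chosen process */
    }
    return 0;
}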
The Linux kernel is a program. It doesn't "talk" to the CPU as such; the CPU has a special register, the program counter (PC), which points to the instruction the CPU is currently executing; when the kernel runs, the PC points into the kernel's code.
The kernel itself contains many services. One of them manages the task queues. Each entry in a task queue contains information about a task. One piece of such information is the CPU core on which the task is running. When the kernel decides that the service should do some work, it will call its functions. The functions are made up of instructions which the CPU interprets. Most of them change the state of the CPU (like advancing the PC, changing register values, setting flags, enabling/disabling CPU cores, ...).
This means the CPU isn't polling anything. Depending on the scheduler, different strategies are used to process the task queue. The simplest one is timer based: the kernel installs a timer interrupt (i.e. it writes the address of an interrupt handler somewhere and configures the timer to cause an interrupt every few milliseconds).
The handler then looks at the task queue and decides what to do, depending on its strategy.
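As a hedged user-space sketch of that timer-based strategy: a periodic timer delivers a signal (standing in for the hardware timer interrupt), the handler only records that a reschedule is needed, and the main loop then consults the "task queue" and switches. The queue and the 100 ms period are invented for illustration.

#include <signal.h>
#include <stdio.h>
#include <sys/time.h>

static volatile sig_atomic_t need_resched = 0;

/* Stand-in for the timer interrupt handler: it only flags that the scheduler should run. */
static void timer_handler(int sig) { (void)sig; need_resched = 1; }

int main(void) {
    struct sigaction sa;
    sa.sa_handler = timer_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sigaction(SIGALRM, &sa, NULL);

    /* "Configure the timer to cause an interrupt every few milliseconds" (here: 100 ms). */
    struct itimerval it = { .it_interval = { 0, 100000 }, .it_value = { 0, 100000 } };
    setitimer(ITIMER_REAL, &it, NULL);

    const char *task_queue[] = { "task A", "task B", "task C" };  /* hypothetical queue */
    int current = 0;
    int switches = 0;
    while (switches < 6) {
        if (need_resched) {                     /* the "handler looked at the task queue" step */
            need_resched = 0;
            current = (current + 1) % 3;        /* simple strategy: round robin */
            printf("switching to %s\n", task_queue[current]);
            switches++;
        }
        /* ... the "current task" would be executing here ... */
    }
    return 0;
}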
Related
I am always told to keep ISRs very short. I know it has to do with timing and efficiency, and with how ISRs take time away from other processes, but I am unclear on exactly why this is.
Normally, ISRs come into play when a hardware device needs to interact with the CPU. The device sends an interrupt signal that makes the CPU leave whatever it was doing to service the interrupt. That is what the ISR must take care of.
Now, this depends on many factors, the hardware environment and the nature of the interrupt probably being the most relevant ones, but it usually happens that, in order to properly service an interrupt, ISRs run with interrupts disabled so they cannot be interrupted. This means that the CPU cannot be shared among other processes while it is running ISR code, because the system timer interrupt that is used to run the scheduler (the part of the kernel that takes care of creating the illusion that the CPU can do several tasks at the same time) won't work.
So, if your ISR takes too much time to perform a certain operation with the device, your system will be affected as a whole, because the percentage of time the CPU is available for the rest of the processes will be lower than usual. This was very noticeable on old systems with PIO hard disks, which interrupt the CPU for every disk sector they want to transfer, and the ISR must do the actual transfer. If there is a lot of disk traffic, you may notice things like your mouse moving jerkily (because the interrupt that the mouse device sends to the CPU is not attended to promptly).
OSes like Linux allow ISRs to defer time-consuming operations with hardware devices to tasklets: a sort of kernel thread that can share CPU time with other processes, while keeping the atomic nature of hardware device operations (the OS ensures that there won't be more than one instance of the tasklet function, for the specific tasklet associated with the ISR, running in the system at the same time). The PIO transfer from disk to kernel buffers is an example of such an operation.
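To make the mechanism concrete, here is a hedged skeleton of how a driver defers work from its ISR to a tasklet using the classic (pre-5.9) tasklet API; the IRQ number and device name are hypothetical, and newer kernels favor threaded IRQs or workqueues over tasklets.

#include <linux/interrupt.h>

/* Bottom half: runs later, outside the ISR, and may take its time. */
static void my_tasklet_fn(unsigned long data)
{
        /* e.g. copy the sector the controller signalled into a kernel buffer */
}
DECLARE_TASKLET(my_tasklet, my_tasklet_fn, 0);

/* Top half (the ISR): acknowledge the device and get out as fast as possible. */
static irqreturn_t my_isr(int irq, void *dev_id)
{
        /* ack/clear the device's interrupt condition here */
        tasklet_schedule(&my_tasklet);   /* defer the heavy work */
        return IRQ_HANDLED;
}

/* Somewhere in the driver's init path (MY_IRQ and my_dev are hypothetical): */
/* request_irq(MY_IRQ, my_isr, 0, "mydev", my_dev); */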
Some precisions w.r.t. the accepted answer.
Interrupts are not necessarily disabled when running an interrupt handler, and that is not necessarily the reason why the kernel processes all interrupts before returning to threads.
There is the concept of interrupt priorities. An interrupt of higher priority will preempt a running ISR: if the timer interrupt is of higher priority than the running ISR, it will run. However, a kernel will not handle context switches at this time, but rather defer them until all queued/pending ISRs have run.
Also, on some processors (e.g. ARM Cortex-M3), handling an interrupt is a mode of operation in the processor itself. The processor cannot go back to running threads until it gets out of interrupt mode. Once that happens, all interrupts have been fully serviced: you cannot go back to running an ISR.
But the main reason why all ISRs must finish before going back to threads is that kernels do not have the concept of a thread-like running context for ISRs. An ISR thus cannot pend: it must run to completion. An ISR is thus hogging the CPU, except for higher-priority interrupts, until it finishes its purpose.
Usually, the main thread has lower priority than the ISRs. Depending on the scheduler, often the main code will be executed after all pending ISRs have been run.
Having a lot of computation-intensive code in one or many ISRs is generally not advisable, since it may cause delays or even CPU starvation of lower-priority ISRs or threads, which may be detrimental if time-critical code needs to be executed.
However, when action needs to be taken immediately at an interrupt event, the fastest way is to execute code from the associated ISR (and possibly assign it a high priority).
If you plan on using several interrupt sources that execute time-consuming code, the way to go is by using an RTOS to allow safe and efficient interleaving of several threads to service each of the interrupts.
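As a user-space analogy of the same pattern (keep the handler short, let a thread do the heavy lifting), here is a hedged sketch in which a signal handler stands in for the ISR: it only posts a semaphore, which is async-signal-safe, and a worker thread performs the time-consuming part.

#include <pthread.h>
#include <semaphore.h>
#include <signal.h>
#include <stdio.h>
#include <unistd.h>

static sem_t work_pending;

/* "ISR": keep it minimal, just record that work is pending. sem_post() is async-signal-safe. */
static void handler(int sig) { (void)sig; sem_post(&work_pending); }

/* "Deferred context": free to run long computations without blocking the handler. */
static void *worker(void *arg) {
    (void)arg;
    for (;;) {
        while (sem_wait(&work_pending) == -1)
            ;   /* retry if interrupted by the signal itself */
        printf("doing the time-consuming part outside the handler\n");
    }
    return NULL;
}

int main(void) {
    pthread_t t;
    sem_init(&work_pending, 0, 0);
    pthread_create(&t, NULL, worker, NULL);
    signal(SIGINT, handler);   /* stand-in for the hardware interrupt source */
    for (;;)
        pause();               /* press Ctrl-C to "raise the interrupt" */
}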
I would like to run a long-term task on a dedicated core, and would like that task to be minimally interrupted/preempted. I can see two solutions. Which one is better, or is there another solution?
1) Set affinity and isolate core using isolcpus
2) Make the thread real time using SCHED_FIFO and set the priority high
- if this is the better choice, how high should the priority be? Can I set it to 99?
What I am concerned about is being preempted by kernel threads, IPIs ...
Regarding the first solution you mentioned: adding the parameter isolcpus=[CPU no.] during boot will instruct the Linux scheduler not to run any task on that CPU unless requested by the user via CPU affinity. However, this CPU may still receive interrupts; that can also be avoided by setting the IRQ affinity, so that the isolated CPU doesn't receive any interrupts. Finally, in the code of your task you set the affinity to the isolated CPU and you are good to go.
But even if you follow these steps, kernel tasks are still executed on the isolated CPU core if you are not using a real-time kernel from the PREEMPT_RT project, so it might not be possible to completely isolate a CPU core unless you are using an RT kernel.
Refer - http://elinux.org/CPU_Shielding_capability
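To illustrate the last step (pinning the task to the isolated core from within your own code), here is a minimal sketch using sched_setaffinity(); the core number 2 is just an assumed value matching a hypothetical isolcpus=2 boot parameter.

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

int main(void) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(2, &set);                                   /* assumed: core 2 was reserved with isolcpus=2 */
    if (sched_setaffinity(0, sizeof(set), &set) != 0) { /* 0 = the calling thread */
        perror("sched_setaffinity");
        return 1;
    }
    /* ... long-running work now runs only on the isolated core ... */
    return 0;
}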
The second solution, using the SCHED_FIFO scheduling policy with a high priority value, will still not prevent kernel threads, timer tick interrupts, IPIs, etc. from preempting your task. This is because the scheduling policies and priorities are there for the kernel to schedule all the other user-space processes and threads, and do not apply to kernel threads or processes.
So setting a high priority for your task does not mean you will get 100% of a CPU dedicated to it. Also, the alternative of manually setting the CPU mask of your task to a CPUSET in the system can cause problems and suboptimal load balancer performance. Your task will still get interrupted from time to time by Linux code, including other tasks such as the timer tick interrupt and the scheduler code, IPIs from other CPUs, and things like work queue kernel threads, although the interruption should be quite minimal if you don't have much activity going on on your other cores.
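For completeness, here is roughly how the second solution would be set up from code; a minimal sketch only, and as noted above, even the maximum priority of 99 does not stop interrupts or certain kernel activity from running on that CPU.

#include <sched.h>
#include <stdio.h>

int main(void) {
    /* 99 is the highest SCHED_FIFO priority Linux exposes to user space. */
    struct sched_param sp = { .sched_priority = 99 };
    if (sched_setscheduler(0, SCHED_FIFO, &sp) != 0) {  /* needs root or CAP_SYS_NICE */
        perror("sched_setscheduler");
        return 1;
    }
    /* ... the task now preempts all normal (SCHED_OTHER) tasks on its CPU ... */
    return 0;
}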
But the cleanest way to achieve this should come from a kernel tweak, which I found at this link: http://www.linuxjournal.com/article/6799?page=0,2. Though I haven't tried this personally, I think it's worth taking a look at that article as well before you decide on the method you will use.
Below are three threading models that I came across.
Based on the three architectures below, it is new to me that there also exists something called a kernel thread, apart from the user threads introduced as part of POSIX.1c.
This is the 1-1 model.
This is the N-1 model.
This is the hybrid model.
I have been through many questions on SO about kernel threads. This looks like the most relevant link for clarification.
At the process level: for every user process that is loaded by the Linux loader (say), the kernel does not allocate a corresponding kernel process for executing the machine instructions that the user process has come up with. A user process only requests kernel-mode execution when it requires a facility from the kernel [like malloc()/fork()]. Scheduling of the user process is done by the OS scheduler, which assigns it a CPU core.
For example, a user process does not require kernel execution mode to execute an instruction like:
a = a + 2; // a is a local variable in a user-level C function
My question:
1)
So, what is the purpose of a kernel-level thread? Why does the OS need to maintain a kernel thread (additionally) for a corresponding user thread of a user-level process? Does a user-mode programmer have any control over choosing any of the above three threading models for a given user process through programming?
Once I understand the answer to the first question, a relevant follow-up is:
2)
Is it the kernel thread that actually gets scheduled by the OS scheduler, and not the user thread?
I think the use of the term kernel thread is a bit misleading in these figures. I know the figures from a book about operating system design, and if I remember correctly, they refer to the way work is scheduled by the operating system.
In the figures, each process has at least one kernel thread assigned that is scheduled by the kernel.
The N-1 model shows multiple user-land threads that are not known to the kernel at all, because the latter schedules only the process (or, as it is called in the figure, a single kernel thread). So for the kernel, each process is a kernel thread. When the process is assigned a slice of processor time, it itself runs multiple threads by scheduling them at its own discretion.
In the 1-1 model, the kernel is aware of the user-land threads and each thread is considered for processor time assignment by the scheduler. So instead of scheduling a whole process, the kernel switches between threads inside of processes.
The hybrid model combines both principles, where lightweight processes are actually threads known to the kernel and which are scheduled for execution by it. Additionally, they implement threads the kernel is not aware of and assign processor time in user-land.
And now, to confuse things completely, there is actually such a thing as a real kernel thread in Linux. But as far as I understand the concept, these threads are used for kernel-space operations only, e.g. when kernel modules need to do things in parallel.
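To see the 1:1 model in practice on Linux (where NPTL implements pthreads this way), here is a small sketch: each pthread created below is backed by its own kernel-schedulable task, so each prints a distinct TID while sharing one PID. gettid is obtained via syscall() to avoid depending on a recent glibc.

#define _GNU_SOURCE
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/syscall.h>

/* Each user thread prints its kernel task id (TID); the TIDs differ, the PID is shared. */
static void *worker(void *arg) {
    (void)arg;
    printf("pid=%d tid=%ld\n", (int)getpid(), (long)syscall(SYS_gettid));
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("main: pid=%d tid=%ld\n", (int)getpid(), (long)syscall(SYS_gettid));
    return 0;   /* build with: gcc -pthread */
}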
So, what is the purpose of a kernel-level thread?
To provide a vehicle for the assignment of a set of resources provided by the OS. The set always includes CPU code execution on a core. Others may include disk, NIC, keyboard, mouse, or timers, as may be requested by syscalls from the thread. The kernel manages access to those resources as they become available and arbitrates between resource conflicts, e.g. a request for keyboard input when none is available will remove CPU execution from the thread until keyboard input becomes available.
Why do we need a kernel thread (additionally) for a corresponding user thread of a user-level process?
Without a kernel-level thread, the user thread would not be able to obtain execution - it would be dead code/stack. Note that with Linux, the concept of threads/processes can get somewhat muddied, but nevertheless, the fundamental unit of execution is a thread. A process is a higher-level construct whose code must be run by at least one thread, (eg. the one raised by the OS loader to run code at the process entry point when it is first loaded).
Does a user-mode programmer have any control over choosing any of the above three threading models for a given user process through programming?
No, not without a syscall, which means leaving user mode.
Is it the kernel thread that actually gets scheduled by the OS scheduler, and not the user thread?
Yes. It is the only thing that is given execution when it can use it, has execution removed when it cannot, and is subject to preemptive removal of the CPU if the OS scheduler requires it for something else.
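As a small illustration of the point above about a request for keyboard input removing CPU execution from the thread: the read() below enters the kernel, and if no input is available the kernel puts the thread to sleep and gives the core to something else until data arrives.

#include <stdio.h>
#include <unistd.h>

int main(void) {
    char buf[64];
    /* read() is a syscall: the thread enters the kernel, and if nothing has been
     * typed yet it is removed from the CPU and slept until input becomes available. */
    ssize_t n = read(STDIN_FILENO, buf, sizeof(buf));
    if (n > 0)
        printf("got %zd bytes after being woken up\n", n);
    return 0;
}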
What do we mean by a thread running in user mode and running in kernel mode? Is this related to a thread executing instructions from user mode versus executing instructions from kernel mode? Kindly elaborate.
Also, is it possible that a thread executing in user mode is put into a suspended state and then starts executing in kernel mode? If yes, how is that possible? Until now I was only aware that if a thread is suspended it is suspended completely, i.e. a context switch takes place so the CPU can schedule another thread.
What do we mean by a thread running in user mode and running in kernel mode?
There is no way to know what a person means by a phrase without context. If I had to guess, I'd say they are talking about whether the thread is scheduled by a user-space scheduler or a kernel scheduler. But it's also possible they are actually asking whether the thread is running user code or kernel code.
Is this related to a thread executing instructions from user mode versus executing instructions from kernel mode? Kindly elaborate.
It could be. It also might not be. There's no way to know what a person means by a phrase without context.
Also, is it possible that a thread executing in user mode is put into a suspended state and then starts executing in kernel mode? If yes, how is that possible?
For implementations where the kernel schedules threads, the scheduler is running in kernel space. The code that actually suspends the thread typically runs in kernel space too because it has to add the thread to the various kernel scheduler data structures. So the thread that resumes the thread will run in kernel space too. At a higher level view, the same thread of execution can "become" the kernel scheduler, choose a user-space thread to execute, and then "become" that thread.
Until now I was only aware that if a thread is suspended it is suspended completely, i.e. a context switch takes place so the CPU can schedule another thread.
Right, and that's kernel code. So the same core is running user space code, then it's running kernel code, then it's running the user space code of another thread.
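One way to observe a single thread alternating between user code and kernel code is to compare its user and system CPU time with getrusage(); the loop counts below are arbitrary, chosen only so both numbers become visible.

#include <stdio.h>
#include <sys/resource.h>
#include <unistd.h>
#include <fcntl.h>

static void report(const char *label) {
    struct rusage ru;
    getrusage(RUSAGE_SELF, &ru);
    printf("%s: user=%ld.%06lds sys=%ld.%06lds\n", label,
           (long)ru.ru_utime.tv_sec, (long)ru.ru_utime.tv_usec,
           (long)ru.ru_stime.tv_sec, (long)ru.ru_stime.tv_usec);
}

int main(void) {
    volatile unsigned long x = 0;
    for (unsigned long i = 0; i < 200000000UL; i++)   /* pure user-mode work */
        x += i;
    report("after user-mode loop");

    char buf[1];
    int fd = open("/dev/zero", O_RDONLY);
    for (int i = 0; i < 2000000; i++)                 /* each read() runs kernel code */
        read(fd, buf, 1);
    close(fd);
    report("after syscall loop");
    return 0;
}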
Modern operating systems have hardware support for separating user code from kernel code. On the x86 architecture you can set up memory pages that are not accessible to normal user code and that will trigger a page fault, so that the OS can survive faulty programs.
Code running in kernel mode has higher privileges, but also more responsibilities, as not everything is as easily accessible as from user space. If user code gets stuck, the OS can clean it up. If kernel-mode code hangs, it might not be that easy, depending on how high the privilege level is.
Is there any way in Linux to assign one CPU core to a particular process, such that no other processes or interrupt handlers are scheduled on that core?
I have read about process affinity in Linux (binding processes to CPUs using the taskset utility), but that does not solve my problem: it only binds the given process to that core, while other processes may still be scheduled on the same core, which is what I want to avoid.
Should we change the kernel code for scheduling?
Yes there is. In fact, there are two separate ways to do it :-)
Right now, the best way to accomplish what you want is to do the following:
Add the parameter isolcpus=[cpu_number] to the Linux kernel command line from the boot loader during boot. This will instruct the Linux scheduler not to run any regular tasks on that CPU unless specifically requested using cpu affinity.
Use IRQ affinity to set other CPUs to handle all interrupts so that your isolated CPU will not receive any interrupts (a sketch of setting this from code follows these steps).
Use CPU affinity to fix your specific task to the isolated CPU.
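Here is the sketch referred to in step 2: IRQ affinity can be changed by writing a CPU bitmask to /proc/irq/<irq>/smp_affinity. The IRQ number and mask below are assumptions (mask 0x3 means CPUs 0 and 1, keeping a hypothetically isolated CPU 2 interrupt-free); in practice you would walk the entries listed in /proc/interrupts, and some IRQs cannot be moved.

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>

#define IRQ_NUM 30   /* hypothetical IRQ; pick real ones from /proc/interrupts */

int main(void) {
    char path[64];
    const char *mask = "3";   /* hex bitmask: CPUs 0 and 1 only, isolated CPU 2 stays clear */
    snprintf(path, sizeof(path), "/proc/irq/%d/smp_affinity", IRQ_NUM);
    int fd = open(path, O_WRONLY);   /* requires root */
    if (fd < 0) { perror("open"); return 1; }
    if (write(fd, mask, strlen(mask)) < 0)
        perror("write (this IRQ may not be movable)");
    close(fd);
    return 0;
}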
This will give you the best that Linux can provide with regard to CPU isolation without out-of-tree and in-development patches.
Your task will still get interrupted from time to time by Linux code, including other tasks - such as the timer tick interrupt and the scheduler code, IPIs from other CPUs and stuff like work queue kernel threads, although the interruption should be quite minimal.
For an (almost) complete list of interruption sources, check out my page at https://github.com/gby/linux/wiki
The alternative method is to use cpusets which is way more elegant and dynamic but suffers from some weaknesses at this point in time (no migration of timers for example) which makes me recommend the old, crude but effective isolcpus parameter.
Note that work is currently being done by the Linux community to address all these issues and more to give even better isolation.
There is a Red Hat article talking about it. It works by modifying the isolcpus boot parameter.
And there is an old article written by Robert Love, with a solution in it:
All of a process' children receive the same CPU affinity mask as their parent. Then, all we need to do is have init bind itself to one processor. All other processes, by nature of init being the root of the process tree and thus the superparent of all processes, are then likewise bound to the one processor.
Dedicate a Whole CPU Core to a Particular Program
While taskset allows a particular program to be assigned to certain CPUs, that does not mean that no other programs or processes will be scheduled on those CPUs. If you want to prevent this and dedicate a whole CPU core to a particular program, you can use "isolcpus" kernel parameter, which allows you to reserve the CPU core during boot.
Add the "isolcpus=" kernel parameter to the boot loader during boot or to the GRUB configuration file. Then the Linux scheduler will not schedule any regular process on the reserved CPU core(s), unless specifically requested with taskset. For example, to reserve CPU cores 0 and 1, add the "isolcpus=0,1" kernel parameter. Upon boot, then use taskset to safely assign the reserved CPU cores to your program.
Source(s)
http://xmodulo.com/2013/10/run-program-process-specific-cpu-cores-linux.html
http://www.linuxtopia.org/online_books/linux_kernel/kernel_configuration/re46.html
Even if you follow the steps in gby's answer, kernel tasks are still executed on the isolated CPU core. Work is underway in the Linux PREEMPT_RT real-time project to improve this. So if you are not using a bleeding-edge real-time kernel with PREEMPT_RT, it might not be possible to completely isolate a CPU core.
As per the documentation:
The Linux scheduler will honor the given CPU affinity and the process will not run on any other CPUs.
There is no mention that a specific processor will be given to a process exclusively.