I am confused about how an interrupt handler (ISR) is run. The Wikipedia article http://en.wikipedia.org/wiki/Context_switch describes interrupt handling in 2 steps:
1) context switching
When an interrupt occurs, the hardware automatically switches a part of the
context (at least enough to allow the handler to return to the interrupted code).
The handler may save additional context, depending on details of the particular
hardware and software designs.
2) running the handler
The kernel does not spawn or schedule a special process to handle interrupts,
but instead the handler executes in the (often partial) context established at
the beginning of interrupt handling. Once interrupt servicing is complete, the
context in effect before the interrupt occurred is restored so that the
interrupted process can resume execution in its proper state.
Let's say the interrupt handler is the upper half of a kernel-space device driver (I assume interrupts for a user-space device driver follow the same logic).
When an interrupt occurs:
1) The current kernel process is suspended. But what is the context situation here? Based on the Wiki description, the kernel does not spawn a new process to run the ISR, and the handler runs in the context established at the beginning of interrupt handling, which sounds very much like another function call within the interrupted process. So does the interrupt handler use the interrupted process's stack (context) to run? Or does the kernel allocate some other memory space/resource to run it?
2) Since the ISR is not a 'process' that the scheduler can put to sleep, does it have to run to completion no matter what? Is it not even limited by a time slice? What if the ISR hangs, how does the system deal with it?
Sorry if the question is fundamental. I have not delved into the subject long enough.
Thanks,
so is interrupt handler using the interrupted process's stack(context) to run? Or kernel would allocate some other memory space/resource to run it?
It depends on the CPU and on the kernel. Some CPUs execute ISRs using the current stack. Others automatically switch to a special ISR stack or to a kernel stack. The kernel may switch the stack as well, if needed.
since here ISR is not a 'process' type that can be put to sleep by scheduler. It has to be finished no matter what?
Yes, or you risk hanging your computer. You see, interrupts interrupt processes and threads. In fact, most CPUs have no concept of a thread or a process, and to them it doesn't matter what gets interrupted/preempted (it can even be another ISR!); the interrupted code is just not going to execute again until the ISR finishes.
Not even limited by any time-slice bound? What if ISR hang, how does the system deal with it?
It hangs, especially if it's a single-CPU system. It may report an error and then hang/reboot. In fact, in Windows (since Vista?) hung or too slowly executing deferred procedure calls (DPCs), which aren't ISRs but are somewhat like them (they execute between ISRs and threads in terms of priority/preemption), can cause a "bugcheck". The OS monitors the execution of DPCs, and it can do that concurrently on multiple CPUs.
Anyway, it's not a normal situation and typically there's no way out of it other than a system reset. Look up watchdog timers. They help to discover such bad hangs and perform a reset. Many electronic devices have them.
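To make the watchdog idea concrete, here is a minimal software model in C. It is only a sketch: a real watchdog is a hardware peripheral with its own clock, and the names used here (struct watchdog, wdt_init, wdt_kick, wdt_tick, wdt_simulate) are made up for illustration, not taken from any particular chip or OS. Healthy code "kicks" the watchdog periodically; if an ISR hangs and starves that code, the countdown reaches zero and a reset is forced.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical software model of a hardware watchdog timer. */
struct watchdog {
    uint32_t timeout_ticks;  /* reload value written on every kick */
    uint32_t remaining;      /* ticks left before a reset is forced */
    bool     reset_fired;    /* set once the watchdog has expired */
};

void wdt_init(struct watchdog *w, uint32_t timeout_ticks)
{
    w->timeout_ticks = timeout_ticks;
    w->remaining = timeout_ticks;
    w->reset_fired = false;
}

/* Called by healthy code (e.g. the main loop) to prove it still runs. */
void wdt_kick(struct watchdog *w)
{
    w->remaining = w->timeout_ticks;
}

/* Models one tick of the watchdog's independent clock. */
void wdt_tick(struct watchdog *w)
{
    if (w->reset_fired)
        return;
    if (w->remaining > 0)
        w->remaining--;
    if (w->remaining == 0)
        w->reset_fired = true;   /* real hardware would reset the CPU here */
}

/*
 * Simulates `ticks` ticks of time. The main loop kicks the watchdog on
 * every tick until tick number `hang_at`, where an ISR is assumed to
 * hang and starve it. Returns true if the watchdog forced a reset.
 */
bool wdt_simulate(uint32_t timeout_ticks, uint32_t ticks, uint32_t hang_at)
{
    struct watchdog w;
    wdt_init(&w, timeout_ticks);
    for (uint32_t t = 0; t < ticks; t++) {
        if (t < hang_at)
            wdt_kick(&w);        /* main loop still gets CPU time */
        wdt_tick(&w);
    }
    return w.reset_fired;
}
```

With a timeout of 5 ticks, wdt_simulate(5, 100, 100) never fires because the kick always arrives in time, while wdt_simulate(5, 100, 10) fires a few ticks after the simulated hang.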
Think of an interrupt handler as a function running in its own high-priority thread. When a device raises an interrupt, any other activity with lower priority is suspended and the ISR is executed. This is like a thread context switch.
When an ISR hangs (for example, in an endless loop), the whole computer hangs, assuming that we are talking about an ISR in a PC driver. No activity with a priority lower than the ISR's is allowed to run, so the computer looks dead. However, it still responds to commands from a hardware remote debugger, if one is attached.
I'm wondering whether I understand the concept of an RTOS, and more specifically the scheduling process, correctly.
I think I understand what happens on a timer interrupt (I omitted the interrupt enable/disable commands for better readability here):
1. program runs...
2. A timer tick occurs that triggers a Timer Interrupt
3. The Timer ISR is called
The timer ISR looks like this:
3.1. Kernel saves context (registers etc.)
3.2. Kernel checks if there is a higher priority task
3.3. If so, the Kernel performs the context switch
3.4. Return from Interrupt
4. Program runs with another task executing
But what does the process look like when an interrupt comes from, let's say, an I/O pin?
1. program runs
2. an interrupt is triggered because data is available
3. a general ISR is called?
3.1. Kernel saves context
3.2. Kernel has to call the user-defined ISR, because the kernel doesn't know what to do now
3.2.1 User ISR runs and does whatever it should do (maybe change the priority of a task that should run now, because the data is now available)
3.2.2 Return from user ISR
3.3. Kernel checks if there is a higher priority task available
3.4. If so the Kernel performs a context switch
3.5. Return from Interrupt
4. program runs with the different task
In this case the kernel must implement a general ISR, so that all interrupts are mapped to this ISR. For example (as far as I know) the ATmega168p microcontroller has 26 interrupt vectors. So there should be a processor-specific file that maps all the interrupts to a general ISR. The kernel ISR determines what caused the interrupt and calls the specific user ISR (which handles the actual interrupt).
Did I misunderstand something?
Thank you for your help
There is a clear distinction between the OS tick interrupt and the OS scheduler, and you have conflated the two. When the OS tick ISR runs, the tick count is incremented. If that increment causes a timer or delay to expire, that is a scheduling event, and a scheduling event causes the scheduler to run on exit from the interrupt context.
Different RTOS may have subtle differences, but in general in any ISR, if a scheduling event occurred, the scheduler runs immediately before exiting the interrupt context, setting up the threading context for whatever thread is due to run by the scheduling policy (normally highest priority ready thread).
Scheduling events include:
OS timer expiry
Task delay expiry
Timeslice expiry (for round-robin scheduling).
Semaphore give
Message queue post
Task event flag set
These last three can occur in any ISR (so long as they are "try semantics" non-blocking/zero timeout), the first three as a result of the tick ISR. So the scheduler will run on exit from the interrupt context when any interrupt has caused at least one scheduling event (there may have been nested or multiple simultaneous interrupts).
Scheduling events may also occur in the task context, including on any potentially blocking action, such as:
Semaphore give
Semaphore take
Message queue receive
Message queue post
Task event flag set
Task event flag wait
Task delay start
Timer wait
Explicit "yield"
The scheduler runs also when a thread triggers a scheduling event, so context switches do not only occur as the result of an interrupt.
To summarise, with respect to your question specifically: the tick or any other interrupt does not directly cause the scheduler to run. An interrupt, any interrupt, can perform an action that makes the scheduler due to run. Unlike the thread context, where such an action causes the scheduler to run immediately, in the interrupt context the scheduler is deferred until all pending interrupts have been serviced, and it runs on exit from the interrupt context.
For details of a specific RTOS implementation of context switching see §§3.05, 3.06 and 3.10 of MicroC/OS-II: The Real Time Kernel (the kernel and the book were specifically developed to teach such principles, so it is a useful resource and the principles apply to other RTOS kernels). In particular Listings 3.18 to 3.20 and Figure 3.10 and the associated explanation.
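As a rough illustration of the deferral described above, here is a small single-threaded C model. It is a sketch, not code from MicroC/OS-II or any other RTOS: all names (vector_table, kernel_isr_entry, sched_event_from_isr, and the example handlers) are hypothetical, and a nested interrupt is simulated by one handler calling the common entry point again. The wrapper counts nesting depth, a user ISR only records that a scheduling event happened, and the scheduler runs once when the outermost interrupt exits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_VECTORS 4

typedef void (*user_isr_t)(void);

static user_isr_t vector_table[NUM_VECTORS]; /* user-registered handlers */
static int  isr_nesting;      /* current interrupt nesting depth */
static bool resched_pending;  /* a scheduling event occurred in an ISR */
static int  scheduler_runs;   /* how many times the scheduler has run */

/* Called from an ISR by e.g. a semaphore give: only records the event. */
void sched_event_from_isr(void) { resched_pending = true; }

/* Stand-in for the real scheduler (would pick the highest-priority ready task). */
static void scheduler(void) { scheduler_runs++; }

int scheduler_run_count(void) { return scheduler_runs; }

void register_isr(int vector, user_isr_t handler)
{
    if (vector >= 0 && vector < NUM_VECTORS)
        vector_table[vector] = handler;
}

/* Common wrapper the kernel places around every user ISR. */
void kernel_isr_entry(int vector)
{
    isr_nesting++;                          /* context save would go here */
    if (vector >= 0 && vector < NUM_VECTORS && vector_table[vector] != NULL)
        vector_table[vector]();             /* run the user ISR */
    isr_nesting--;

    /* Only the outermost interrupt exit may invoke the scheduler. */
    if (isr_nesting == 0 && resched_pending) {
        resched_pending = false;
        scheduler();
    }
}

/* Example user ISRs (hypothetical). */
void uart_isr(void)  { sched_event_from_isr(); } /* e.g. posts a semaphore */
void gpio_isr(void)  { kernel_isr_entry(1); }    /* gets "interrupted" by vector 1 */
void quiet_isr(void) { }                         /* causes no scheduling event */
```

Registering gpio_isr on vector 0 and uart_isr on vector 1 and then calling kernel_isr_entry(0) raises the scheduling event at nesting depth 2, yet the scheduler runs exactly once, after the outer handler has returned. An interrupt that causes no scheduling event (quiet_isr) never runs the scheduler at all.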
I saw this piece of code on disk read in Linux 0.11 kernel:
static inline void lock_buffer(struct buffer_head * bh)
{
    cli();
    while (bh->b_lock)
        sleep_on(&bh->b_wait);
    bh->b_lock=1;
    sti();
}
IIUC, cli() disables maskable interrupts (not all of them, as explained here: https://c9x.me/x86/html/file_module_x86_id_31.html; non-maskable interrupts and exceptions still get through), which means it changes the default behavior.
And sleep_on will call schedule, which will pass the control flow to another process.
However, what confuses me is that we then switch to another process with some interrupts blocked, which seems error-prone because the other process expects the default behavior. So is this code correct (and if so, why?), or is it simply wrong and liable to cause unexpected behavior?
I presume that the interrupt handler of the disk drive will be the one to wakeup(&bh->b_wait), which could lead to a missed wakeup if interrupts were not disabled in the process waiting for this block.
Remember that condition variables (sleep_on, wakeup) have no memory: sleep_on suspends the caller until the next wakeup call, and a wakeup issued just before sleep_on is simply lost.
From the moment it tests bh->b_lock, the caller is racing with the interrupt handler; thus cli (or, more typically on Unix, splbio()) blocks the interrupt handler, preventing the race.
Since the kernel saves the interrupt state (mask, priority, ...) with the process state, when sleep_on causes a reschedule, it is most likely that interrupts will be re-enabled, or at least they eventually will be. The disk interrupt will eventually run, waking up this process.
When this process is rescheduled, its saved interrupt state (disabled) will be restored, so that the test & assignment of b_lock will also prevent interference from the disk interrupt handler.
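The missed-wakeup race can be shown with a small deterministic model in C. This is not kernel code: it is a single-threaded sketch with made-up names (struct wait_queue, wq_sleep_on, wq_wakeup, disk_irq, waiter_stuck) that replays the two possible orderings by hand, assuming the disk interrupt becomes pending right after the waiter has tested the lock.

```c
#include <assert.h>
#include <stdbool.h>

/* A wait queue with no memory: it only knows who is asleep right now. */
struct wait_queue { int sleepers; };

struct buffer {
    bool locked;
    struct wait_queue wait;
};

/* Releases whoever is asleep; with no sleepers, nothing is recorded. */
void wq_wakeup(struct wait_queue *q)   { q->sleepers = 0; }
void wq_sleep_on(struct wait_queue *q) { q->sleepers++; }

/* Model of the disk interrupt handler: unlock the buffer, wake waiters. */
void disk_irq(struct buffer *b)
{
    b->locked = false;
    wq_wakeup(&b->wait);
}

/* Returns true if the waiter ends up asleep with no wakeup left to come. */
bool waiter_stuck(bool irqs_disabled)
{
    struct buffer b = { .locked = true, .wait = { 0 } };
    bool irq_pending = true;      /* the disk has finished its transfer */

    bool saw_locked = b.locked;   /* step 1: test b_lock */

    /* Window between test and sleep: the IRQ can only be delivered
     * here if interrupts are enabled. */
    if (!irqs_disabled && irq_pending) {
        disk_irq(&b);
        irq_pending = false;
    }

    if (saw_locked)
        wq_sleep_on(&b.wait);     /* step 2: sleep on a possibly stale test */

    /* After the switch to another process, interrupts are enabled
     * again, so a still-pending IRQ is delivered now. */
    if (irq_pending) {
        disk_irq(&b);
        irq_pending = false;
    }

    return b.wait.sleepers > 0;
}
```

waiter_stuck(false) returns true: the wakeup fires inside the window and is lost, so the waiter sleeps forever. waiter_stuck(true) returns false: with interrupts disabled across the test and the sleep, the wakeup can only arrive once the waiter is already on the queue.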
Thought about this again. I think this is the intended behavior. It means that before the disk read finishes (unlock_buffer being called), all the following executions will be in uninterruptible mode (interrupt blocked). When the buffer is unlocked and the head of queue is woken up,
    while (bh->b_lock)
        sleep_on(&bh->b_wait);
    bh->b_lock=1;
    sti();
will be executed and because we are in uninterruptible mode, it will execute to sti() without switching to other process. So other processes waiting on the same signal will sleep again (bh->b_lock is 1) when scheduled and only 1 process continues its execution.
I have been trying to understand how context switching works in Linux Kernel. It appears to me that there is a situation (explained later) which results in no invocation of IRET instruction after the interrupt (I am sure that there is something that I am missing!). I am assuming that invocation of IRET after the interrupt is extremely necessary, since you can't get the same interrupt until you invoke IRET. I am only worried about uni-processor kernel running on x86 arch.
The situation that I think might result in the described behavior is as follows:
Process A running in kernel mode calls schedule() voluntarily (for example while trying to acquire an already locked mutex).
schedule() decides to perform a context switch to process B and hence calls context_switch()
context_switch() switches virtual memory from A to B by calling switch_mm()
context_switch() runs macro switch_to() to switch stacks and actually change the running process from A to B. Note that process A is now stuck inside switch_to() and the stack of process A looks like (stack growing downwards):
...
[mutex_lock()]
[schedule()]
[context_switch()] (Stack Top)
Process B starts running. At some later time, it receives a timer interrupt and the timer interrupt handler decides that process B needs a reschedule.
On return from timer interrupt (but before invoking IRET) preempt_schedule_irq() is invoked.
preempt_schedule_irq() calls schedule().
schedule() decides to context switch to process A and calls context_switch().
context_switch() calls switch_mm() to switch the virtual memory.
context_switch() calls switch_to() to switch stacks. At this point, stack of process B looks like following:
...
[IRET return frame]
[ret_from_interrupt()]
[preempt_schedule_irq()]
[schedule()]
[context_switch()] (Stack top)
Now process A is running with its stack resumed. Since the context_switch() call in A was not made as a result of a timer interrupt, process A does not execute IRET and simply continues executing mutex_lock(). This scenario might block the timer interrupt forever.
What am I missing here?
Economical-with-the-truth time; a non-Linux-specific explanation/example:
Thread A does not have to call IRET - the kernel code calls IRET to return execution to thread A, after all, that's one way it may have lost it in the first place - a hardware interrupt from some peripheral device.
Typically, when thread A lost execution earlier on due to some other hardware interrupt or syscall, thread A's stack pointer is saved in the kernel TCB, pointing to an IRET return frame on the stack of A, before switching to the kernel stack for all the internal scheduler etc. gubbins. If an exact IRET frame does not exist because of the particular syscall mechanism used, one is assembled. When the kernel needs to resume A, the kernel reloads the hardware SP with thread A's stored SP and IRETs to user space. Job done: A resumes running with interrupts etc. enabled.
The kernel has then lost control. When it's entered again by the next hardware interrupt/driver or syscall, it can set its internal SP to the top of its own private stack, since it keeps no state data on it between invocations.
That's just one way in which it can be made to work:) Obviously, the exact mechanism/s are ABI/architecture dependent.
I don't know about Linux, but in many operating systems the context switch is usually performed by a dispatcher, not an interrupt handler. If an interrupt doesn't result in a pending context switch, it just returns. If an interrupt-triggered context switch is needed, the current state is saved and the interrupt exits via the dispatcher (the dispatcher does the IRET). This gets more complicated if nested interrupts are allowed, since the initial interrupt is the one that goes to the dispatcher, regardless of which nested interrupt handler(s) triggered a context switch condition. An interrupt handler needs to check the saved state to see whether it is a nested interrupt. If it is not, it can disable interrupts, so that no nested interrupt occurs while it checks for a pending context switch and, if one is pending, exits via the dispatcher. If it is a nested interrupt, it only has to set a context switch flag if needed and rely on the initial interrupt to do the check and the context switch.
Usually, there's no need for an interrupt to save a thread's state in a kernel TCB unless a context switch is going to occur.
The dispatcher also handles the cases where context switches are triggered by non-interrupt conditions, such as mutexes, semaphores, ...
I'm looking at some Linux kernel module code that starts and stops timers using add_timer and del_timer.
Sometimes, the implementation goes on to delete the timer "object" (the struct timer_list) right after calling del_timer.
I'd like to find out whether this is safe. Note that this is a uniprocessor implementation, with SMP disabled (which would mandate the use of del_timer_sync instead).
The del_timer_sync implementation checks if the timer is being handled anywhere right now, but del_timer does not. On a UP system, is it possible to have the timer being handled without del_timer knowing, i.e. the timer has been removed from the pending timers list and is being handled?
UP makes things quite a bit simpler, but I think the answer is still "it depends."
If you are doing del_timer in process context, then on UP I think you are safe in assuming the timer is not running anywhere after that returns: the timers are removed from the pending lists and run from the timer interrupt, and if that interrupt starts, it will run to completion before allowing the process context code to continue.
However, if you are in interrupt context, then your interrupt might have interrupted the timer interrupt, and so the timer might be in the middle of being run.
I have to develop an application that tries to emulate the execution flow of an embedded target. This target has 2 priority levels, the higher one preempting the lower one. The low priority level is managed by a round-robin scheduler which gives 1 ms of execution to each thread in turn.
My goal is to write a library that provides thread_create, thread_start, and all the system calls that are available on my target, and uses POSIX functions to reproduce the behavior natively on a standard PC.
Thus, when a high priority thread executes, low priority threads should be suspended whatever they are doing at that very moment. It is the responsibility of the low priority thread's implementation to ensure that it won't be perturbed.
I know it is usually unsafe to suspend a thread, which explains why I didn't find any "suspend(pid)" function.
I basically imagine two solutions to the problem :
-find a way to suspend the low priority threads when a high priority thread starts (and resume them when there is no more high priority activity)
-periodically call a very small "suspend_if_necessary" function everywhere in my low-priority code, and whenever a high priority thread must start, wait for all low-priority threads to call that function and be suspended, execute the single high priority thread, then resume them all.
Even if it is not so clean, I quite like the second solution, but I still have one problem: how do I call the function everywhere without changing all my code?
I wonder if there is an easy way of doing that, somewhat like a debugger does: add a hook call at every executed line that checks a flag and runs some specific code when that flag changes?
I'd be very happy if there is an easy solution to that problem, since I really need to reproduce the target's execution flow faithfully...
Thanks in advance,
Goulou.
Unfortunately, it's not really possible to implement what you want with true threads: even if the high prio thread is restarted, it can take arbitrarily long before the high prio thread is scheduled back in and goes to suspend all the low priority threads. Moreover, there is no reliable way to determine whether the high priority thread is blocked or not using only POSIX threads; you could try tracking things manually, but this runs the risk of both false positives (the thread's blocked on something, but the low prio threads think it's running and suspend themselves) and false negatives (you miss a resumed annotation, or there's lag between when the thread's actually resumed and when it marks itself as running).
If you want to implement a thread priority system with pure POSIX, one option is to not use threads, but rather use setcontext for cooperative multitasking. This would allow you to swap between threads at a user level. However you must explicitly yield the CPU in this case. It also doesn't help with blocking syscalls, which would then block all threads in your app; but since you're writing an emulator this might not be an issue.
You may also be able to swap threads using setcontext within a signal handler; I've not tested this case myself, but it could be worth a try scheduling using setcontext in a SIGALRM handler.
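For the cooperative variant, a minimal sketch could look like the following. It uses swapcontext from the same <ucontext.h> family as setcontext (it saves the current context and activates another in one call). The task bodies, stack size, and the round-robin loop in run_round_robin are illustrative assumptions, not your target's scheduler; each "thread" records a letter and then yields explicitly.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <ucontext.h>

#define NUM_TASKS  2
#define STACK_SIZE (64 * 1024)

static ucontext_t main_ctx;                 /* the "scheduler" context */
static ucontext_t task_ctx[NUM_TASKS];      /* one context per task */
static char stacks[NUM_TASKS][STACK_SIZE];  /* private stack per task */
static char trace[32];                      /* order in which tasks ran */
static size_t trace_len;

static void record(char c)
{
    if (trace_len < sizeof trace - 1)
        trace[trace_len++] = c;
}

/* Task body: do one step of "work", then yield back to the scheduler. */
static void task(int id)
{
    for (int step = 0; step < 2; step++) {
        record((char)('A' + id));
        swapcontext(&task_ctx[id], &main_ctx);  /* explicit yield */
    }
    /* Returning here resumes uc_link, i.e. main_ctx. */
}

/* Runs both tasks round-robin and returns the recorded execution order. */
const char *run_round_robin(void)
{
    trace_len = 0;
    memset(trace, 0, sizeof trace);

    for (int i = 0; i < NUM_TASKS; i++) {
        getcontext(&task_ctx[i]);
        task_ctx[i].uc_stack.ss_sp = stacks[i];
        task_ctx[i].uc_stack.ss_size = STACK_SIZE;
        task_ctx[i].uc_link = &main_ctx;  /* where to go when task() returns */
        makecontext(&task_ctx[i], (void (*)(void))task, 1, i);
    }

    /* Two rounds for the two yields, a third lets each task return. */
    for (int round = 0; round < 3; round++)
        for (int i = 0; i < NUM_TASKS; i++)
            swapcontext(&main_ctx, &task_ctx[i]);

    return trace;
}
```

Calling run_round_robin() yields the order "ABAB": each task runs until its explicit yield, which is exactly the limitation mentioned above. A task that never yields, or that blocks in a syscall, stalls everything.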
To suspend a thread, put it to sleep. If you want to be able to wake it on command, have it sleep in sigwait, which blocks the thread until it receives one of the specified signals (these must be blocked in the thread's signal mask beforehand). You can send a signal to a specific thread with pthread_kill (a crazy name, but it really just sends a signal to a thread). This is a very fast way to sleep and wake threads: 40x faster than condition variables, and very easy.