I'm maintaining a driver that shares a resource between the ISR (i.e., interrupt context) and the read() syscall. In both cases, spin_lock_irqsave() is used, since (obviously) the resource can be accessed from interrupt context.
However, I was wondering if using spin_lock_irqsave() is necessary in the interrupt context. Namely, the Unreliable Guide to Locking (see here: https://kernel.readthedocs.io/en/sphinx-samples/kernel-locking.html) states:
Note that the spin_lock_irqsave() will turn off interrupts if they are on, otherwise does nothing (if we are already in an interrupt handler), hence these functions are safe to call from any context.
As a result, is it common practice to use "normal" spin_lock() in the interrupt handler (since the particular interrupt is already disabled) and then call spin_lock_irqsave() in the user context? Alternatively, is the better practice to just use spin_lock_irqsave() everywhere? I'm leaning towards the latter, for two reasons:
As soon as someone sees that a lock is acquired with spin_lock_irqsave(), it's obvious that the lock is intended to be shared with the interrupt context.
As someone maintaining the code, you don't have to track which context each function might be called from. Said differently, spin_lock_irqsave() works in any context, so you don't have to guarantee that a function is only ever called from a particular one.
With the above said, I'm wondering what the convention/best practice is for code that resides in kernel space. Is it better to use spin_lock_irqsave() everywhere the lock is acquired, even if you can guarantee that the lock is being acquired from the interrupt context?
See Unreliable Guide To Locking in kernel documentation. There's a table of minimum requirements for locking to synchronize between different contexts which, roughly speaking, can be summarized as:
If one of the competitors is a process: use a primitive strong enough to disable the other. For example, if the other competitor is a softirq, then you need at least spin_lock_bh(), which disables softirqs while locking.
Else, if one of them is a hard irq: a hard irq can arrive at any moment unless you disable it beforehand, so use spin_lock_irq() or spin_lock_irqsave(), depending on whether the other competitor is a hard irq or not.
Otherwise, use spin_lock().
(Of course, all of this assumes your kernel isn't configured with PREEMPT_RT.)
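For the concrete case in the question (hard irq competing with process context), that boils down to something like the following sketch. This is hypothetical driver code, not from any real driver:

#include <linux/spinlock.h>
#include <linux/interrupt.h>
#include <linux/fs.h>

static DEFINE_SPINLOCK(dev_lock);       /* protects the shared resource */

/* Hard-irq context: interrupts are already off on this CPU, so plain
 * spin_lock() is the documented minimum here. */
static irqreturn_t dev_isr(int irq, void *dev_id)
{
	spin_lock(&dev_lock);
	/* ... touch the shared resource ... */
	spin_unlock(&dev_lock);
	return IRQ_HANDLED;
}

/* Process context (the read() path): must disable interrupts while
 * holding the lock, or the ISR could interrupt us and deadlock trying
 * to take dev_lock on the same CPU. */
static ssize_t dev_read(struct file *file, char __user *buf,
			size_t count, loff_t *ppos)
{
	unsigned long flags;

	spin_lock_irqsave(&dev_lock, flags);
	/* ... touch the shared resource ... */
	spin_unlock_irqrestore(&dev_lock, flags);
	return 0;
}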
Consider the following kernel code
local_irq_disable();
__update_rq_clock(rq);
spin_lock(&rq->lock);
rq is a pointer to a per-processor struct (i.e., not subject to SMP concurrency). Since rq will never be accessed anywhere else after calling local_irq_disable (because rq is used by only a single processor, and disabling local interrupts means no interrupt handlers will run on that CPU), what is the point of sandwiching __update_rq_clock between those two calls? In other words, what difference does it make compared to the following, which disables interrupts and takes the lock in a single call, given that rq is safe inside __update_rq_clock in both cases, locked or not?
spin_lock_irqsave(&rq->lock, flags);
__update_rq_clock(rq);
First and foremost: the two examples you show have different semantics: local_irq_disable does not save the old state of IRQs. In other words, when the corresponding local_irq_enable function gets called, it will forcibly re-enable IRQs (whether they were already disabled or not). On the other hand, spin_lock_irqsave does save the old IRQ state, so it can later be restored through spin_unlock_irqrestore. For this reason, the two pieces of code you show are very different, and it doesn't make much sense to compare them.
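A fragment to illustrate why the save matters (sketch only):

/* with the non-saving variants: */
local_irq_disable();
/* ... critical section ... */
local_irq_enable();          /* IRQs are now ON, even if they were off before */

/* with the saving variants: */
unsigned long flags;
local_irq_save(flags);       /* remembers whether IRQs were on or off */
/* ... critical section ... */
local_irq_restore(flags);    /* puts them back exactly as they were found */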
Now, coming to the real problem:
Since rq will never be accessed anywhere else after calling local_irq_disable (because rq is used by only a single processor, and disabling local interrupts means no interrupt handlers will run on that CPU)
This is not always true. There isn't a "magic barrier" which stops CPUs from accessing another CPU's per-CPU data. It is still possible, and in such case extra care must be taken by means of a proper locking mechanism.
While per-CPU variables are usually meant to provide fast access to an object for a single CPU, and therefore can have the advantage of not requiring locking, there is nothing other than convention that keeps processors from digging around in other processors' per-CPU data (quote).
Runqueues are a great example of this: since the scheduler often needs to migrate tasks from one runqueue to another, it certainly will need to access two runqueues at the same time at some point. Indeed, this is probably one of the reasons why struct rq has a .lock field.
In fact, doing an rq clock update without holding rq->lock seems to be considered a bug in recent kernel code, as you can see from this lockdep assertion in update_rq_clock():
void update_rq_clock(struct rq *rq)
{
s64 delta;
lockdep_assert_held(&rq->lock);
// ...
It feels like the statements in your first code snippet should be re-ordered to lock first and then update, but the code is quite old (v2.6.25), and the call to __update_rq_clock() seems to be deliberately made before acquiring the lock. It's hard to tell why, but maybe the old runqueue semantics did not require locking in order to update .clock/.prev_clock_raw, and so the locking was done afterwards just to minimize the size of the critical section.
I stumbled upon this embedded-systems question: can we call a function inside an ISR?
Working on an ARM Cortex-M4, I have called functions from ISRs many times without any fault.
I assume the behavior will be the same for other microcontrollers as well, or am I wrong?
Note: please ignore the fact that calling a function in an ISR would increase my ISR time, in turn increasing the interrupt latency.
Generally, there is nothing stopping you from calling a function from an ISR. There are however some things to consider.
First of all, you should keep ISRs as short as possible. Even the function call overhead might be considered too much in some cases. So if you call functions from inside an ISR, it might be wise to inline those functions.
You must also ensure that the called function is either re-entrant or that it isn't called by other parts of the code except the ISR. If a non re-entrant function is called by the main program and your ISR both, then you'll get severe but subtle "race condition" bugs. (Just as you will if the main program and the ISR modify the same shared variable non-atomically, without semaphore guards.)
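A minimal sketch of the kind of non-reentrant function meant here; the static buffer is the hazard (the function and names are made up for illustration):

#include <stdio.h>

static char buf[32];

char *format_reading(int value)
{
    /* If the ISR preempts main() mid-sprintf and calls this function
     * itself, it overwrites buf while main() is still filling it. */
    sprintf(buf, "reading=%d", value);
    return buf;
}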
And finally, designing a system with interrupts where you don't know if there are other interrupts in the system is completely unprofessional. You must always consider the program's interrupt situation as a whole when designing the individual interrupts. Otherwise the program will have non-existent real-time performance, and no programmer involved in the project will actually know what the program is doing. And from the point where nobody knows what they are doing, bugs are guaranteed to follow.
Some RTOSes enforce a policy about which of their macros or functions can and can't be called from an ISR context, typically those that may block on some shared resource. For example:
http://www.freertos.org/a00122.html
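FreeRTOS, for instance, splits its API into blocking calls and never-blocking "FromISR" variants. A sketch of the convention, where the queue handle q and the read_hw_byte() helper are assumed to exist elsewhere:

#include <stdint.h>
#include "FreeRTOS.h"
#include "queue.h"

extern QueueHandle_t q;                 /* created at startup */
extern uint8_t read_hw_byte(void);      /* hypothetical hardware read */

void my_isr(void)
{
    BaseType_t woken = pdFALSE;
    uint8_t byte = read_hw_byte();

    /* xQueueSend() may block and is forbidden here; the FromISR
     * variant simply fails if the queue is full instead. */
    xQueueSendFromISR(q, &byte, &woken);
    portYIELD_FROM_ISR(woken);          /* request a context switch if needed;
                                           some ports spell this differently */
}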
I'm looking for hints on using a dynamic memory handler safely in a multi-threaded system. Details of the issue:
written in C, it will run on a Cortex-M3 processor, with an RTOS (CooCox OS),
the TLSF memory allocator will be used (other allocators might be used if I find them better suited, provided they are free and open-source).
The solution I'm looking for is a way to make the memory allocator safe to use from both OS tasks and interrupts.
So far I have thought of 2 possible approaches, both of which have a few details that are still unclear to me:
Disable and enable interrupts when calling allocator functions. Problem: if I'm not mistaken, I can't toggle interrupts in normal (unprivileged) mode, only in privileged mode (so, if I'm not mistaken, only from interrupts), yet I would need to do this from ordinary runtime code as well, to prevent interrupts and task switching during memory-handler operations.
Call the allocator from an SWI. This one is still very unclear to me. First, is an SWI the same as an FIQ (and if so, is it true that FIQ code needs to be written in assembly, since the allocator is written in C)? I also still have a few doubts about calling an FIQ from an IRQ (that scenario would happen, though not often), but most likely this part will not cause issues.
So any ideas on possible solutions for this situation?
Regarding your suggestions 1 and 2:
On Cortex-M3 you can enable and disable interrupts at any time in privileged-level code through the CMSIS intrinsics __disable_irq()/__enable_irq(). Privileged level is not restricted to handler mode; thread-mode code can run at privileged level too (and in many small RTOSes that is the default).
SWI and FIQ are concepts from legacy ARM architectures. They do not exist in Cortex-M3.
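A sketch of the save/restore pattern with the CMSIS intrinsics named in the first point (the intrinsics come in via your device header, e.g. core_cm3.h on older CMSIS versions):

#include <stdint.h>

void with_irqs_off(void (*critical)(void))
{
    uint32_t primask = __get_PRIMASK();  /* remember the current masking state */
    __disable_irq();
    critical();                          /* e.g. the allocator call */
    __set_PRIMASK(primask);              /* restore, rather than blindly re-enable */
}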
You would not ideally want to perform memory allocation in an interrupt handler - even if the allocator is deterministic, it may still take significant amount of time; I can think of few reasons you would want to do that.
The best approach is to modify the TLSF code to use an RTOS mutex around each of the calls with external linkage. Other libraries I have used already have stubs in place that normally do nothing, but which you can override with your own implementation to map them to any RTOS.
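The stub approach could look roughly like this; note that os_mutex_t/os_mutex_lock/os_mutex_unlock are placeholders for the real CoOS primitives, and the tlsf_* prototypes may differ in your TLSF version (all names here are assumptions):

#include <stddef.h>

extern void *tlsf_malloc(size_t size);   /* allocator entry points */
extern void  tlsf_free(void *ptr);

static os_mutex_t alloc_mutex;           /* created once at startup */

void *safe_malloc(size_t size)
{
    void *p;
    os_mutex_lock(&alloc_mutex);         /* serialise every allocator entry point */
    p = tlsf_malloc(size);
    os_mutex_unlock(&alloc_mutex);
    return p;
}

void safe_free(void *p)
{
    os_mutex_lock(&alloc_mutex);
    tlsf_free(p);
    os_mutex_unlock(&alloc_mutex);
}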
Now you cannot of course use a mutex in an ISR, but as I said, you should probably not allocate memory there either. If you really must perform allocation in an interrupt handler, then enabling/disabling interrupts is your only option, but you are then undermining all the real-time deterministic behaviour that an RTOS provides. A better solution is to have your ISR do no more than signal an event flag or semaphore to a thread-context handler. This lets you use all the RTOS services and scheduling, and the context-switch time from the ISR to a high-priority thread will be insignificant compared to the memory allocation time.
Another possibility would be to not use this allocator at all, but instead use a fixed-block allocator built on RTOS queues. You pre-allocate blocks of memory (statically or dynamically) and post a pointer to the start of each block onto a queue; then to allocate you simply receive a pointer from the queue, and to free you post it back. If memory is exhausted (the queue is empty), you can baulk or block on the queue (though do not block in an ISR). You can create multiple queues for different-sized blocks and use the one appropriate to your needs (ensuring you post back to the same queue, of course!).
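A sketch of that fixed-block scheme, written against the FreeRTOS queue API purely for illustration (the same idea maps onto any RTOS queue; the sizes are arbitrary):

#include <stdint.h>
#include "FreeRTOS.h"
#include "queue.h"

#define BLOCK_SIZE  64
#define BLOCK_COUNT 16

static uint8_t pool[BLOCK_COUNT][BLOCK_SIZE];
static QueueHandle_t free_q;

void pool_init(void)
{
    free_q = xQueueCreate(BLOCK_COUNT, sizeof(void *));
    for (int i = 0; i < BLOCK_COUNT; i++) {
        void *p = pool[i];
        xQueueSend(free_q, &p, 0);       /* queue starts full of free blocks */
    }
}

void *block_alloc(TickType_t timeout)    /* pass 0 to baulk instead of block */
{
    void *p = NULL;
    return (xQueueReceive(free_q, &p, timeout) == pdTRUE) ? p : NULL;
}

void block_free(void *p)
{
    xQueueSend(free_q, &p, 0);           /* cannot overflow: only our own blocks return */
}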
Say I have multiple threads and all threads call the same function at approximately the same time.
Is there a calling convention which would only allow one instance of the function at any time? What I mean is that the function called by the second thread would only start after the function called by the first thread had returned.
Or are these calling conventions compiler specific? I don't have a whole lot of experience using them.
(Skip to the bottom if you don't care about the threading mumbo-jumbo)
As mentioned before, this is not a "calling convention" but a general problem of computing: concurrency. And the particular case where two or more threads can enter a shared section at the same time, with the outcome depending on their timing, is called a race condition (a term that also extends to/from electronics and other areas).
The hard thing about threading is that computing is such a deterministic affair, but threading adds a degree of uncertainty, which varies per platform/OS.
A single-threaded program is guaranteed to do all its tasks in the same order, always; but once you have multiple threads, the order depends on how fast each one completes its task, on other applications wanting to use the CPU, and on the underlying hardware, all of which affect the results.
There's no sure-fire way to do threading; rather, there are techniques, tools and libraries to deal with individual cases.
Locking in
The most well-known technique is using semaphores (or locks), and the best-known kind is the mutex, which allows only one thread at a time into a shared section by having a sort of "flag" that is raised once a thread has entered.
if (locked == NO)
{
    /* another thread can be scheduled right here, pass the same
       check, and enter the section alongside us */
    locked = YES;
    // Do ya' thing
    locked = NO;
}
The code above, although it looks like it could work, does not guard against the case where both threads pass the if () before either sets the variable (which threads can easily do). So there is hardware support for this kind of operation, guaranteeing that only one thread can win: the test-and-set operation, which checks the flag and, if it is clear, sets it in a single atomic step (on x86, for example, a LOCK-prefixed BTS instruction).
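In portable C, C11's atomic_flag exposes exactly this operation; a minimal spinlock sketch:

#include <stdatomic.h>

static atomic_flag locked = ATOMIC_FLAG_INIT;

void do_ya_thing(void)
{
    while (atomic_flag_test_and_set(&locked))
        ;                          /* spin until we're the thread that set it */
    /* Do ya' thing */
    atomic_flag_clear(&locked);
}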
In the same vein of locks and semaphores, there's also the read-write lock, which allows multiple readers but only one writer, especially useful for things with low volatility. And there are many other variations, some that limit the number of threads to some X, and so on.
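With pthreads, a read-write lock looks like this (sketch):

#include <pthread.h>

static pthread_rwlock_t rw = PTHREAD_RWLOCK_INITIALIZER;
static int shared_value;

int read_value(void)               /* many readers may run concurrently */
{
    pthread_rwlock_rdlock(&rw);
    int v = shared_value;
    pthread_rwlock_unlock(&rw);
    return v;
}

void write_value(int v)            /* a writer excludes everyone else */
{
    pthread_rwlock_wrlock(&rw);
    shared_value = v;
    pthread_rwlock_unlock(&rw);
}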
But overall, locks are lame, since they are basically forcing serialisation of multi-threading, where threads actually need to get stuck trying to get a lock (or just testing it and leaving). Kinda defeats the purpose of having multiple threads, doesn't it?
The best solution in terms of threading is to minimise the amount of shared space that threads need to use, possibly eliminating it completely. Maybe use rwlocks when volatility is low, try to have "try and leave" kinds of threads that test the lock and simply move on if it's unavailable, etc.
As my OS teacher once said (in Zen-like fashion): "The best kind of locking is the one you can avoid".
Thread Pools
Now, threading is hard, no way around it; that's why there are patterns to deal with this kind of problem, and the Thread Pool Pattern is a popular one, at least on iOS since the introduction of Grand Central Dispatch (GCD).
Instead of having a bunch of threads running amok and getting enqueued all over the place, let's have a set of threads waiting for tasks in a "pool", with queues of things to do; ideally, tasks that don't overlap each other.
Now, the thread-pool pattern doesn't solve the problems discussed before, but it changes the paradigm to make them easier to deal with, mentally. Instead of having to think about "threads that need to execute such and such", you switch the focus to "tasks that need to be executed", and the matter of which thread is doing it becomes irrelevant.
Again, pools won't solve all your problems, but it will make them easier to understand. And easier to understand may lead to better solutions.
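A bare-bones sketch of the pattern in pthreads (fixed-size ring, no shutdown handling, and it assumes the queue never fills; all names are made up):

#include <pthread.h>

#define QCAP 64

typedef struct { void (*fn)(void *); void *arg; } task_t;

static task_t queue[QCAP];
static int head, tail, count;
static pthread_mutex_t qlock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  qcond = PTHREAD_COND_INITIALIZER;

void pool_submit(void (*fn)(void *), void *arg)
{
    pthread_mutex_lock(&qlock);
    queue[tail] = (task_t){ fn, arg };
    tail = (tail + 1) % QCAP;
    count++;
    pthread_cond_signal(&qcond);       /* wake one idle worker */
    pthread_mutex_unlock(&qlock);
}

static void *worker(void *unused)      /* start N of these at startup */
{
    (void)unused;
    for (;;) {
        pthread_mutex_lock(&qlock);
        while (count == 0)
            pthread_cond_wait(&qcond, &qlock);  /* sleep until work arrives */
        task_t t = queue[head];
        head = (head + 1) % QCAP;
        count--;
        pthread_mutex_unlock(&qlock);
        t.fn(t.arg);                   /* run the task outside the lock */
    }
    return NULL;
}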
All the theoretical things mentioned above are already implemented at POSIX level (semaphore.h, pthread.h, etc.; pthreads has a very nice set of r/w locking functions), so try reading about them.
(Edit: I thought this thread was about Obj-C, not plain C, edited out all the Foundation and GCD stuff)
A calling convention defines how the stack and registers are used to implement function calls. Because each thread has its own stack and registers, synchronising threads and calling conventions are separate things.
To prevent multiple threads from executing the same code at the same time, you need a mutex. In your example of a function, you'd typically put the mutex lock and unlock inside the function's code, around the statements you don't want your threads to be executing at the same time.
In general terms: Plain code, including function calls, does not know about threads, the operating system does. By using a mutex you tap into the system that manages the running of threads. More details are just a Google search away.
Note that C11, the new C standard revision, does include multi-threading support. But this does not change the general concept; it simply means that you can use C library functions instead of operating system specific ones.
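A sketch of the mutex-inside-the-function advice, using the C11 <threads.h> API just mentioned (pthread_mutex_t would look almost identical):

#include <threads.h>

static mtx_t lock;                 /* call mtx_init(&lock, mtx_plain) once at startup */

void only_one_at_a_time(void)
{
    mtx_lock(&lock);               /* a second caller blocks here until the first returns */
    /* ... statements that must not run concurrently ... */
    mtx_unlock(&lock);
}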
I'm writing an app that has many independent threads. While I'm doing quite low-level, dangerous stuff there, threads may fail (SIGSEGV, SIGBUS, SIGFPE), but they should not kill the whole process. Is there a proper way to do this?
Currently I intercept the aforementioned signals, and in their signal handler I call pthread_exit(NULL). It seems to work, but since pthread_exit is not an async-signal-safe function, I'm a bit concerned about this solution.
I know that splitting this app into multiple processes would solve the problem, but in this case it's not a feasible option.
EDIT: I'm aware of all the Bad Things™ that can happen (I'm experienced in low-level system and kernel programming) due to ignoring SIGSEGV/SIGBUS/SIGFPE, so please try to answer my particular question instead of giving me lessons about reliability.
The PROPER way to do this is to let the whole process die, and start another one. You don't explain WHY this isn't appropriate, but in essence, that's the only way that is completely safe against various nasty corner cases (which may or may not apply in your situation).
I'm not aware of any method that is 100% safe that doesn't involve letting the whole process die. (Note also that sometimes just the act of continuing from this sort of error is "undefined behaviour" - it doesn't mean that you are definitely going to fall over, just that it MAY be a problem.)
It's of course possible that someone knows of some clever trick that works, but I'm pretty certain that the only 100% guaranteed method is to kill the entire process.
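For completeness, a sketch of what "let it die and start another one" can look like with a plain fork()/waitpid() supervisor. The question rules out multiple processes, so this is only to illustrate the recommendation, not a drop-in fix:

#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

void supervise(void (*worker)(void))
{
    for (;;) {
        pid_t pid = fork();
        if (pid == 0) {            /* child: do the dangerous work */
            worker();
            _exit(0);
        }
        int status;
        waitpid(pid, &status, 0);
        if (WIFEXITED(status))     /* clean exit: stop respawning */
            break;
        /* killed by SIGSEGV/SIGBUS/SIGFPE: loop and start a fresh child */
    }
}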
Low-latency code design involves a careful "be aware of the system you run on" type of coding and deployment. That means, for example, that standard IPC mechanisms (say, using SysV msgsnd/msgget to pass messages between processes, or pthread_cond_wait/pthread_cond_signal on the PThreads side) as well as ordinary locking primitives (adaptive mutexes) are to be considered rather slow ... because they involve something that takes thousands of CPU cycles ... namely, context switches.
Instead, use "hot-hot" handoff mechanisms such as the disruptor pattern - both producers as well as consumers spin in tight loops permanently polling a single or at worst a small number of atomically-updated memory locations that say where the next item-to-be-processed is found and/or to mark a processed item complete. Bind all producers / consumers to separate CPU cores so that they will never context switch.
In this type of usecase, whether you use separate threads (and get the memory sharing implicitly by virtue of all threads sharing the same address space) or separate processes (and get the memory sharing explicitly by using shared memory for the data-to-be-processed as well as the queue mgmt "metadata") makes very little difference because TLBs and data caches are "always hot" (you never context switch).
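A minimal single-producer/single-consumer sketch of such a "hot-hot" handoff in C11 atomics (no padding against false sharing, which real code would add; both sides are assumed pinned to their own cores):

#include <stdatomic.h>
#include <stdint.h>

#define RING 1024                     /* power of two */

static uint64_t slots[RING];
static atomic_ulong head, tail;       /* producer advances head, consumer tail */

void produce(uint64_t item)
{
    unsigned long h = atomic_load_explicit(&head, memory_order_relaxed);
    while (h - atomic_load_explicit(&tail, memory_order_acquire) == RING)
        ;                             /* ring full: spin, never syscall */
    slots[h % RING] = item;
    atomic_store_explicit(&head, h + 1, memory_order_release);
}

uint64_t consume(void)
{
    unsigned long t = atomic_load_explicit(&tail, memory_order_relaxed);
    while (atomic_load_explicit(&head, memory_order_acquire) == t)
        ;                             /* ring empty: spin */
    uint64_t item = slots[t % RING];
    atomic_store_explicit(&tail, t + 1, memory_order_release);
    return item;
}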
If your "processors" are unstable and/or have no guaranteed completion time, you need to add a "reaper" mechanism anyway to deal with failed / timed out messages, but such garbage collection mechanisms necessarily introduce jitter (latency spikes). That's because you need a system call to determine whether a specific thread or process has exited, and system call latency is a few micros even in best case.
From my point of view, you're trying to mix oil and water here; you're required to use library code not specifically written for use in low-latency deployments / library code not under your control, combined with the requirement to do message dispatch with nanosec latencies. There is no way to make e.g. pthread_cond_signal() give you nsec latency because it must do a system call to wake the target up, and that takes longer.
If your "handler code" relies on the "rich" environment, and a huge amount of "state" is shared between these and the main program ... it sounds a bit like saying "I need to make a steam-driven airplane break the sound barrier"...