Why can't a FreeRTOS software timer callback use a blocking API?

Quoting the documentation (emphasis theirs):
Timer callback functions execute in the context of the timer service task. It is therefore essential that timer callback functions never attempt to block. For example, a timer callback function must not call vTaskDelay(), vTaskDelayUntil(), or specify a non zero block time when accessing a queue or a semaphore.
The FreeRTOS reference book elaborates a little more, but again without a clear explanation:
It is ok to call functions such as xQueueReceive(), but
only if the function’s xTicksToWait parameter (which specifies the function’s block time) is set
to 0. It is not ok to call functions such as vTaskDelay(), as calling vTaskDelay() will always
place the calling task into the Blocked state.
My question is: why is this such a problem? I'm waiting for a semaphore, that's set by an interrupt, in the timer callback, and it has worked fine so far. (It's used for sending long packets using a USB bulk endpoint.)
Is the only issue possibly delaying other waiting timers?

The statement:
Timer callback functions execute in the context of the timer service task.
is the key. If your callback blocks, you are blocking the timer service task; if that were allowed to happen, it would delay other timer actions, and the RTOS scheduling guarantees could not be fulfilled.
The timer service task will perform the timer actions for all timers that have expired in that tick in a loop. If your timer callback were to perform a delay or blocking action, that would delay all timer actions not yet invoked but which were scheduled for the same tick, and would delay all actions in subsequent ticks if they came due while the service task were blocked.
If the action to be performed requires blocking (or even just takes a significant amount of time), the correct approach is to have your callback signal an independent task to perform the action. Timer callbacks should be treated like interrupt service routines: run to completion as quickly and as deterministically as possible, without blocking. In fact, some RTOS invoke timer callbacks in the interrupt context rather than in a special service task, so it is a good guide to follow regardless of which RTOS you are using.

Related

How a context switch works in a RTOS, need clarity

I'm wondering whether I understand the concept of an RTOS, and more specifically the scheduling process, correctly.
So, I think I understand the process of a timer interrupt (I omitted the interrupt enable/disable commands for better readability here):
1. program runs...
2. A timer tick occurs that triggers a Timer Interrupt
3. The Timer ISR is called
The timer ISR looks like this:
3.1. Kernel saves context (registers etc.)
3.2. Kernel checks if there is a higher priority task
3.3. If so, the Kernel performs the context switch
3.4. Return from Interrupt
4. Program runs with another task executing
But how does the process look when an interrupt occurs from, let's say, an I/O pin?
1. program runs
2. an interrupt is triggered because data is available
3. a general ISR is called?
3.1. Kernel saves context
3.2. The kernel has to call the user-defined ISR, because the kernel doesn't know what to do now
3.2.1. User ISR runs and does whatever it should do (maybe change the priority of a task that should run now, because the data is now available)
3.2.2. Return from user ISR
3.3. Kernel checks if there is a higher priority task available
3.4. If so the Kernel performs a context switch
3.5. Return from Interrupt
4. program runs with the different task
In this case the kernel must implement a general ISR, so that all interrupts are mapped to this ISR. For example (as far as I know) the ATmega168p microcontroller has 26 interrupt vectors. So there should be a processor-specific file that maps all the interrupts to a general ISR. The kernel ISR determines what caused the interrupt and calls the specific user ISR (that handles the actual interrupt).
Did I misunderstand something?
Thank you for your help
There is a clear distinction between the OS tick interrupt and the OS scheduler - you have however conflated the two. When the OS tick ISR occurs, the tick count is incremented; if that increment causes a timer or delay expiry, that is a scheduling event, and scheduling events cause the scheduler to run on exit from the interrupt context.
Different RTOS may have subtle differences, but in general in any ISR, if a scheduling event occurred, the scheduler runs immediately before exiting the interrupt context, setting up the threading context for whatever thread is due to run by the scheduling policy (normally highest priority ready thread).
Scheduling events include:
OS timer expiry
Task delay expiry
Timeslice expiry (for round-robin scheduling).
Semaphore give
Message queue post
Task event flag set
These last three can occur in any ISR (so long as they are "try semantics" non-blocking/zero timeout), the first three as a result of the tick ISR. So the scheduler will run on exit from the interrupt context when any interrupt has caused at least one scheduling event (there may have been nested or multiple simultaneous interrupts).
Scheduling events may also occur in the task context, including on any potentially blocking action such as:
Semaphore give
Semaphore take
Message queue receive
Message queue post
Task event flag set
Task event flag wait
Task delay start
Timer wait
Explicit "yield"
The scheduler also runs when a thread triggers a scheduling event, so context switches do not occur only as the result of an interrupt.
To summarise, and with respect to your question specifically: the tick or any other interrupt does not directly cause the scheduler to run. An interrupt, any interrupt, can perform an action that makes the scheduler due to run. Unlike the thread context, where such an action causes the scheduler to run immediately, in the interrupt context the scheduler is deferred until all pending interrupts have been serviced, and runs on exit from the interrupt context.
For details of a specific RTOS implementation of context switching see §§3.05, 3.06 and 3.10 of MicroC/OS-II: The Real Time Kernel (the kernel and the book were specifically developed to teach such principles, so it is a useful resource and the principles apply to other RTOS kernels). In particular Listings 3.18 to 3.20 and Figure 3.10 and the associated explanation.

Can ASIO timer `cancel()` call a spurious "success"?

The ASIO documentation for basic_deadline_timer::cancel() has the following remarks section:
If the timer has already expired when cancel() is called, then the handlers for asynchronous wait operations will:
have already been invoked; or
have been queued for invocation in the near future.
These handlers can no longer be cancelled, and therefore are passed an error code that indicates the successful completion of the wait operation.
The emphasis has been added by me. Normally when you call cancel() on a timer, the callback is run with an error code of "operation cancelled by user". But this says there is a small chance it could actually be called with a success error code. I think it is trying to say that the following could happen:
Thread A calls async_wait(myTimerHandler) on a timer, where myTimerHandler() is a user callback function.
Thread B calls io_context::post(cancelMyTimer) where cancelMyTimer() is a user callback function. This is now queued up to be called in thread A.
The timer deadline expires, so ASIO queues up the timer callback handler, with a success error code. It doesn't call it yet, but it is queued up to be called in thread A.
ASIO gets round to calling cancelMyTimer() in thread A, which calls cancel() on the timer. But the timer has already fired, and ASIO doesn't check whether the handler is still queued rather than executed, so the cancel does nothing.
ASIO now calls myTimerHandler, and doesn't check that cancel() was called in the meantime, and so it still passes success as the error code.
Bear in mind this example only has a single thread calling io_context::run(), deadline_timer::async_wait or deadline_timer::cancel(). The only thing that happened in another thread was a call to post(), which happened in an attempt to avoid any race conditions. Is this sequence of events possible? Or is it referring to some multithreading scenario (that seems unlikely given that timers are not thread safe)?
Context: If you have a timer that you wish to repeat periodically, then the obvious thing to do is check the error code in the callback, and set the timer again if the code is success. If the above race is possible, then it would be necessary to have a separate variable saying whether you cancelled the timer, which you update in addition to calling cancel().
You don't even need a second thread to run into a situation where basic_waitable_timer::cancel() is invoked too late (because the timer's (completion) handler is already queued).
It's sufficient that your program executes some other asynchronous operations concurrently to the not yet resumed basic_waitable_timer::async_wait(). If you then only rely on basic_waitable_timer::cancel() for cancellation then the cancel() call from another asynchronous (completion) handler races with an already scheduled async_wait() handler:
If the timer has already expired when cancel() is called, then the handlers for asynchronous wait operations will:
have already been invoked; or
have been queued for invocation in the near future.
These handlers can no longer be cancelled, and therefore are passed an error code that indicates the successful completion of the wait operation.
(basic_waitable_timer::cancel(), emphasis mine, i.e. the race condition is due to the second case)
A real-world example that is single-threaded (i.e. the program doesn't explicitly start any threads and only invokes io_server.run() once) and contains the described race:
void Fetch_Timer::resume()
{
  timer_.expires_from_now(std::chrono::seconds(1));
  timer_.async_wait([this](const boost::system::error_code &ec)
      {
        BOOST_LOG_FUNCTION();
        if (ec) {
          if (ec.value() == boost::asio::error::operation_aborted)
            return;
          THROW_ERROR(ec);
        } else {
          print();
          resume();
        }
      });
}
void Fetch_Timer::stop()
{
  print();
  timer_.cancel();
}
(Source: imapdl/copy/fetch_timer.cc)
In this example, the obvious fix (i.e. also querying a boolean flag) doesn't even need to use any synchronization primitives (such as atomics), because the program is single-threaded. That means it executes (asynchronous) operations concurrently but not in parallel.
(FWIW, in the above example, the bug manifested itself only every 2 years or so, even under daily usage)
Everything you stated is correct. So in your situation you would need a separate variable to indicate that you don't want to continue the loop. I normally use an atomic_bool, and I don't bother posting a cancel routine; I just set the bool and call cancel() from whatever thread I am on.
UPDATE:
The source for my answer is mainly experience in using ASIO for years and for understanding the asio codebase enough to fix problems and extend parts of it when required.
Yes, the documentation says that it's not thread-safe between shared instances of the deadline_timer, but the documentation is not the best (what documentation is...). If you look at the source for how cancel() works, we can see:
Boost Asio version 1.69: boost\asio\detail\impl\win_iocp_io_context.hpp
template <typename Time_Traits>
std::size_t win_iocp_io_context::cancel_timer(timer_queue<Time_Traits>& queue,
    typename timer_queue<Time_Traits>::per_timer_data& timer,
    std::size_t max_cancelled)
{
  // If the service has been shut down we silently ignore the cancellation.
  if (::InterlockedExchangeAdd(&shutdown_, 0) != 0)
    return 0;
  mutex::scoped_lock lock(dispatch_mutex_);
  op_queue<win_iocp_operation> ops;
  std::size_t n = queue.cancel_timer(timer, ops, max_cancelled);
  post_deferred_completions(ops);
  return n;
}
You can see that the cancel operation is guarded by a mutex lock, so cancel() is thread-safe.
Calling most of the other operations on a deadline timer is not (in regard to calling them at the same time from multiple threads).
Also, I think you are correct about restarting timers in quick succession. I don't normally have a use case for stopping and starting timers in that fashion, so I've never needed to do that.

What will happen if codes inside a NodeMCU timer execute over the timer interval I set?

With NodeMCU, we can easily create timer function in esp8266 chip.
However, I wonder what will happen if codes inside a timer execute over the timer interval I set?
Please see the code below.
If I set a timer with 2 seconds interval, and "Something to do" inside this timer executes over 2 seconds, then what will happen?
tmr.alarm(0, 2000, 1, function ()
    -- Something to do
end)
a) Will "Something to do" be terminated once the interval reaches 2 seconds?
b) Or "Something to do" will continue execute until finish, and the next "Something to do" will be delayed?
c) Or the each round of this timer will wait for "Something to do" to finish regardless of the 2-seconds-interval? (the interval is automatically expanded)
d) or else?
The function does not behave as you might think. Usually when you are required to provide a callback function, the callback is executed when an event occurs; which is what happens here. The callback function is executed after the timer expires.
The documentation for tmr.alarm() says that the function is a combination of tmr.register() and tmr.start(). The tmr.register() documentation says
Configures a timer and registers the callback function to call on expiry.
So, your answer is that "Something to do" will run, until it's finished, 2 seconds after the tmr.alarm() function is called.
tmr.alarm() (and tmr.register() which it is based upon) can take a mode parameter. I'll describe their behavior and how each is affected by the execution time of the callback function.
tmr.ALARM_SINGLE: Run the callback function only once n seconds after the call to tmr.alarm() finishes. Completely independent of the execution time of the callback function.
tmr.ALARM_AUTO: Run the callback function repeatedly every n seconds after the call to tmr.alarm() finishes. It is important to note that the next interval starts as soon as the previous finishes, regardless of the execution time of the callback. So if the callback takes 0.5s to finish executing and the timer is 2s, then the next call will occur 1.5s after the callback function finishes.
tmr.ALARM_SEMI: Run the callback function n seconds after the call to tmr.alarm() finishes. Unlike tmr.ALARM_AUTO the next interval is not automatically started, but will only start after you call tmr.start(); which you should probably do within the callback function. This means you can set the timer to be dependent on the execution time of the callback function. If the timer is 2s and you restart the timer at the end of your callback function, then the next time the callback will run will be 2s from then.
As you might be able to tell, you don't want the execution time of the callback function to be greater than the timer period; otherwise callbacks will keep stacking on top of each other, never finishing. The callbacks should be simple and quick to execute, perhaps scheduling additional work as another task.
I believe there's a misunderstanding of what type of firmware NodeMCU is or what kind of programming model it requires.
The NodeMCU programming model is similar to that of Node.js, only in Lua. It is asynchronous and event-driven. Many functions, therefore, have parameters for callback functions.
Source: NodeMCU README, "Programming Model"
The Lua libraries provide a set of functions for declaring application functions (written in Lua) as callbacks (which are stored in the Lua registry) to associate application tasks with specific hardware and timer events. These are non-preemptive at an application level. The Lua libraries work in concert with the SDK to queue pending events and invoke any registered Lua callback routines, which then run to completion uninterrupted.
Source: NodeMCU Lua developer FAQ
For full explanations see the "event tasking system" chapter at https://nodemcu.readthedocs.io/en/latest/en/lua-developer-faq/#so-how-does-the-sdk-event-tasking-system-work-in-lua
You're saying
and "Something to do" inside this timer executes over 2 seconds
but the truth is that it will never run for 2s. In fact, any task that runs for more than a few milliseconds uninterrupted may cause the Wifi and TCP stacks to fail. If you write code that violates this principle then the watchdog might reset your device any time. Events that your code triggers are simply added to a queue and executed in sequential order.
Thus, the correct answer is b) in most cases.

Can del_timer return while its handler is running?

I'm looking at some Linux kernel module code that starts and stops timers using add_timer and del_timer.
Sometimes, the implementation goes on to delete the timer "object" (the struct timer_list) right after calling del_timer.
I'd like to find out is if this is safe. Note that this is a uniprocessor implementation, with SMP disabled (which would mandate the use of del_timer_sync instead).
The del_timer_sync implementation checks if the timer is being handled anywhere right now, but del_timer does not. On a UP system, is it possible to have the timer being handled without del_timer knowing, i.e. the timer has been removed from the pending timers list and is being handled?
UP makes things quite a bit simpler, but I think the answer is still "it depends."
If you are doing del_timer in process context, then on UP I think you are safe in assuming the timer is not running anywhere after that returns: the timers are removed from the pending lists and run from the timer interrupt, and if that interrupt starts, it will run to completion before allowing the process context code to continue.
However, if you are in interrupt context, then your interrupt might have interrupted the timer interrupt, and so the timer might be in the middle of being run.

timer_create how to stop recursive thread function invocation after first timer expiry?

I have created a timer using the simple "timer_create". The timer is created using SIGEV_THREAD. That is when the timer expires, there is a call to the timer thread function.
The way timer_create works is: suppose expiry = 3 seconds and the timer interval is 1 ns; then the timer keeps ticking every 1 ns until expiry is reached. Once the timer expires, from that instant it keeps hitting the timer thread function after every 1 ns (timer interval), and keeps creating one thread per hit until the timer is deleted.
I don't want this to happen; I want the timer, once it expires, to hit the thread function only once.
How can I achieve this? Can we put any option in timer_create? If not, is there another timer API?
Thanks a lot in advance
I think this is an implementation flaw in the glibc implementation of POSIX timers. There is certainly no way the timer_getoverrun function, which is critical for realtime usage, can work in the glibc implementation, since it returns from the kernel the overrun count for the "current" expiration, but when multiple expiration events are running in parallel, "current" makes no sense. There are also serious issues with resource exhaustion and dropped expiration events which make the implementation unusable for realtime purposes. For example, in nptl/sysdeps/unix/sysv/linux/timer_routines.c:
struct thread_start_data *td = malloc (sizeof (*td));
/* There is not much we can do if the allocation fails. */
...
In the Linux man page for sigevent, you see for SIGEV_THREAD:
Among the implementation possibilities here are that each timer notification could result in the creation of a new thread, or that a single thread is created to receive all notifications.
The latter is the only choice that could provide correct realtime semantics, but for some reason, glibc did not take this choice.
Here is a possible workaround:
Choose a realtime signal, block that signal before creating any threads, and set up your timer to use that signal with SIGEV_SIGNAL. Now, create a thread for handling your timer(s), loop on sigwaitinfo, and call your handler function each time it returns. This is actually one possible implementation (and the most correct implementation) of SIGEV_THREAD, which glibc should be using.
Another possibility: there is exactly one synchronization-related, non-syscall-invoking, async-signal-safe function in POSIX: sem_post. Thus it may be possible to make a signal handler (as opposed to getting the signal from sigwaitinfo) synchronize with another thread for the purpose of delivering timer events. But I haven't worked out the details, and it seems like it may be difficult or impossible still.
Just set timer interval to 0 and expiry to whatever you want. Your timer will expire once (and thread created and run) and then stay disarmed.
