I'm integrating FreeRTOS (CMSIS-RTOS v2) on my STM32F303VCx and have run into a problem when using Event Flags to block a task until another task approves an operation.
If the task executes the following code, all other tasks get minimal runtime (understandably because OS is constantly checking evt_flg):
for(;;)
{
flag = osEventFlagsWait (evt_flg, EventOccured, osFlagsWaitAny, 0);
if (flag == EventOccured)
{
/* Task main route */
osEventFlagsClear (evt_flg,EventOccured);
}
}
But if I set the timeout to osWaitForever: osEventFlagsWait (evt_flg, EventOccured, osFlagsWaitAny, osWaitForever), the whole program goes into HardFault.
What's the best solution for such behavior? I need the task to wait for a flag without preventing the other tasks, such as terminal input read, from running.
The task code the question provides is constantly busy, polling the RTOS event.
This is a design antipattern: it is virtually always better to have the task block until the event source has fired. The only exception where a call to osEventFlagsWait() with a zero timeout could make more sense is if you have to monitor several different event/data sources for which there is no common RTOS API to wait on (and even then, this is only an "emergency exit"). Hence, osWaitForever should be used.
Next, the reason for the HardFault should be sought. From this task code alone, I don't see a reason for it; the HardFault source is likely somewhere else. Once you have narrowed down the area the HardFault comes from, that could be worth a new question (unless it is already fixed by then). Good luck!
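A minimal sketch of how the return value of the blocking call might be checked. In CMSIS-RTOS2, osEventFlagsWait() returns either the flags that were set or an error code with the most significant bit set, and it clears the flags it waited for unless osFlagsNoClear is passed, so the separate osEventFlagsClear() call is usually unnecessary. The FLAGS_* constants below only mirror the cmsis_os2.h values so that the helper compiles on its own, and flags_signalled() is a hypothetical name, not part of the API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mirrors of the cmsis_os2.h values, redefined here only so the sketch is
   self-contained. In a real project, include cmsis_os2.h instead. */
#define FLAGS_ERROR          0x80000000U  /* osFlagsError: MSB marks an error code */
#define FLAGS_ERROR_TIMEOUT  0xFFFFFFFEU  /* osFlagsErrorTimeout */

/* Hypothetical helper: true if the value returned by osEventFlagsWait()
   is not an error code and contains at least one of the flags in mask. */
bool flags_signalled(uint32_t wait_result, uint32_t mask)
{
    if ((wait_result & FLAGS_ERROR) != 0U) {
        return false;               /* timeout, bad parameter, ... */
    }
    return (wait_result & mask) != 0U;
}

/* Intended use inside the task (shown as a comment, not compiled here):
 *
 *   for (;;) {
 *       uint32_t r = osEventFlagsWait(evt_flg, EventOccured,
 *                                     osFlagsWaitAny, osWaitForever);
 *       if (flags_signalled(r, EventOccured)) {
 *           // task main routine; the flag was already cleared by the wait
 *       }
 *   }
 */
```

With osWaitForever the task consumes no CPU time while it waits, so lower-priority tasks such as the terminal reader get to run.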
I have a wireless device and can send commands from my phone to the device. A command executes a bunch of steps to complete an action. At the moment, this action function is blocking, i.e. the user needs to wait until the call completes, with no option to quit. If for any reason the call doesn't complete, the user is stuck on that screen. A simple pseudocode version looks like this:
do_action()
{
int result = 0;
result |= step_a();
result |= step_b();
result |= step_c();
result |= step_d();
return result;
}
How can I make this process "interruptible", i.e. use a signal/flag to tell this function call that the user has terminated and that this action needs to be terminated/cleaned up? Is there a way I can "time bound" this function, i.e. exit the action if it is not completed within the expected time? How can I implement such a feature? One of the issues is that some of the steps, such as the step_a and step_b functions, are black-box functions, i.e. implemented by the manufacturer; they are blocking and I have no way to modify their interface.
The easiest option is to see if the API has a way to pass a timeout to individual calls (or ask the manufacturer to implement one). If this is possible, then you could structure your code to take one step, check if you should quit (timeout reached or user interrupted), then take another step.
Should you be stuck with blocking vendor code you cannot modify, you will need to put something in place that can terminate an in-progress "action". Exactly what this interruption can be will depend on what environment is running on your device.
If your device is running a more feature-complete operating system, then you could either add a second thread to your process that monitors the first (look up pthreads), or you could execute your action in one process and have a separate monitor process that kills the first if it takes too long or if the user cancels the action (look up the "fork" and "kill" system calls).
If your code is running on a more bare-bones environment then your options are more limited. One way to do this is to manually set up a hardware timer and an interrupt handler to check state. The specifics of how to do this will depend entirely on what hardware you are using.
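The "take one step, then check whether to quit" structure from the first option could be sketched roughly as below. All names here (step_fn, now_fn, cancel_requested, the ACTION_* codes, and the fake clock and step used as stand-ins) are hypothetical, and the sketch assumes the step return values are non-negative status bits so they cannot collide with the negative ACTION_* codes. It can only react between steps; it cannot interrupt a vendor call that is already blocking.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef int (*step_fn)(void);        /* one step of the action */
typedef uint32_t (*now_fn)(void);    /* returns a monotonic tick count */

enum { ACTION_OK = 0, ACTION_CANCELLED = -1, ACTION_TIMED_OUT = -2 };

/* Set from the UI/command handler when the user aborts (assumed mechanism). */
volatile bool cancel_requested = false;

/* Run the steps in order, checking for cancellation and for the time
   budget before each one. Returns the OR of the step results, or one of
   the negative ACTION_* codes if the action was abandoned. */
int do_action(const step_fn *steps, size_t n, now_fn now, uint32_t budget_ticks)
{
    uint32_t start = now();
    int result = 0;

    for (size_t i = 0; i < n; i++) {
        if (cancel_requested) {
            return ACTION_CANCELLED;          /* caller performs the cleanup */
        }
        if ((uint32_t)(now() - start) >= budget_ticks) {
            return ACTION_TIMED_OUT;
        }
        result |= steps[i]();
    }
    return result;
}

/* Stand-ins for demonstration only: a fake clock and a step that "takes"
   10 ticks and reports success. */
uint32_t fake_ticks = 0;
uint32_t fake_now(void) { return fake_ticks; }
int step_ok(void) { fake_ticks += 10; return 0; }
```

If a single black-box step can hang forever, this pattern is not enough on its own, and one of the monitor approaches described above is still needed.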
So I have just discovered that libuv is a fairly small library as far as C libraries go (compare to FFmpeg). I have spent the past 6 hours reading through the source code to get a feel for the event loop at a deeper level. But still not seeing where the "nonblockingness" is implemented. Where some event interrupt signal or whatnot is being invoked in the codebase.
I have been using Node.js for over 8 years, so I am familiar with how to use an async non-blocking event loop, but I never actually looked into the implementation.
My question is twofold:
Where exactly is the "looping" occurring within libuv?
What are the key steps in each iteration of the loop that make it non-blocking and async.
So we start with a hello world example. All that is required is this:
#include <stdio.h>
#include <stdlib.h>
#include <uv.h>
int main() {
uv_loop_t *loop = malloc(sizeof(uv_loop_t));
uv_loop_init(loop); // initialize datastructures.
uv_run(loop, UV_RUN_DEFAULT); // infinite loop as long as queue is full?
uv_loop_close(loop);
free(loop);
return 0;
}
The key function which I have been exploring is uv_run. The uv_loop_init function essentially initializes data structures, so there is not too much fanciness there, I don't think. But the real magic seems to happen with uv_run, somewhere. A high-level set of code snippets from the libuv repo is in this gist, showing what the uv_run function calls.
Essentially it seems to boil down to this:
while (NOT_STOPPED) {
uv__update_time(loop)
uv__run_timers(loop)
uv__run_pending(loop)
uv__run_idle(loop)
uv__run_prepare(loop)
uv__io_poll(loop, timeout)
uv__run_check(loop)
uv__run_closing_handles(loop)
// ... cleanup
}
Those functions are in the gist.
uv__run_timers: runs timer callbacks? loops with for (;;) {.
uv__run_pending: runs regular callbacks? loops through queue with while (!QUEUE_EMPTY(&pq)) {.
uv__run_idle: no source code
uv__run_prepare: no source code
uv__io_poll: does io polling? (can't quite tell what this means tho). Has 2 loops: while (!QUEUE_EMPTY(&loop->watcher_queue)) {, and for (;;) {,
And then we're done. And the program exits, because there is no "work" to be done.
So I think I have answered the first part of my question after all this digging, and the looping is specifically in these 3 functions:
uv__run_timers
uv__run_pending
uv__io_poll
But not having implemented anything with kqueue or multithreading and having dealt relatively little with file descriptors, I am not quite following the code. This will probably help out others along the path to learning this too.
So the second part of the question is what are the key steps in these 3 functions that implement the nonblockingness? Assuming this is where all the looping exists.
Not being a C expert, does for (;;) { "block" the event loop? Or can that run indefinitely and somehow other parts of the code are jumped to from OS system events or something like that?
So uv__io_poll calls poll(...) in that endless loop. I don't think that is non-blocking, is that correct? That seems to be all it mainly does.
Looking into kqueue.c there is also a uv__io_poll, so I assume the poll implementation is a fallback and kqueue on Mac is used, which is non-blocking?
So is that it? Is it just looping in uv__io_poll and each iteration you can add to the queue, and as long as there's stuff in the queue it will run? I still don't see how it's non-blocking and async.
Can one outline similar to this how it is async and non-blocking, and which parts of the code to take a look at? Basically, I would like to see where the "free processor idleness" exists in libuv. Where is the processor ever free in the call to our initial uv_run? If it is free, how does it get reinvoked, like an event handler? (Like a browser event handler from the mouse, an interrupt). I feel like I'm looking for an interrupt but not seeing one.
I ask this because I want to implement an MVP event loop in C, but just don't understand how nonblockingness actually is implemented. Where the rubber meets the road.
I think that trying to understand libuv is getting in your way of understanding how reactors (event loops) are implemented in C, and it is this that you need to understand, as opposed to the exact implementation details behind libuv.
(Note that when I say "in C", what I really mean is "at or near the system call interface, where userland meets the kernel".)
All of the different backends (select, poll, epoll, etc) are, more-or-less, variations on the same theme. They block the current process or thread until there is work to be done, like servicing a timer, reading from a socket, writing to a socket, or handling a socket error.
When the current process is blocked, it literally is not getting any CPU cycles assigned to it by the OS scheduler.
Part of the issue behind understanding this stuff IMO is the poor terminology: async, sync in JS-land, which don't really describe what these things are. Really, in C, we're talking about non-blocking vs blocking I/O.
When we read from a blocking file descriptor, the process (or thread) is blocked -- prevented from running -- until the kernel has something for it to read; when we write to a blocking file descriptor, the process is blocked until the kernel accepts the entire buffer.
In non-blocking I/O, it's exactly the same, except the kernel won't stop the process from running when there is nothing to do: instead, when you read or write, it tells you how much you read or wrote (or if there was an error).
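A small POSIX sketch of that difference: once O_NONBLOCK is set on a descriptor, read() returns immediately with -1 and errno set to EAGAIN (or EWOULDBLOCK) instead of putting the process to sleep. The helper names set_nonblocking(), try_read() and the TRY_AGAIN code are made up for illustration.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

enum { TRY_AGAIN = -2 };   /* hypothetical "nothing to read right now" code */

/* Switch a descriptor to non-blocking mode. Returns 0 on success, -1 on error. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1) {
        return -1;
    }
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/* Try to read without waiting. Returns the number of bytes read
   (0 means end of file), TRY_AGAIN when no data is available yet,
   or -1 on a real error. */
long try_read(int fd, void *buf, size_t cap)
{
    ssize_t n = read(fd, buf, cap);
    if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
        return TRY_AGAIN;
    }
    return (long)n;
}
```

Calling try_read() in a tight loop would burn CPU, which is exactly the problem that select() and its relatives solve.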
The select system call (and friends) prevent the C developer from having to try and read from a non-blocking file descriptor over and over again -- select() is, in effect, a blocking system call that unblocks when any of the descriptors or timers you are watching are ready. This lets the developer build a loop around select, servicing any events it reports, like an expired timeout or a file descriptor that can be read. This is the event loop.
So, at its very core, what happens at the C-end of a JS event loop is roughly this algorithm:
while (true) {
    select(open fds, timeout);
    if (did_the_timeout_expire())
        run_js_timers();
    for (each error fd)
        run_js_error_handler(fdJSObjects[fd]);
    for (each read-ready fd)
        emit_data_events(fdJSObjects[fd], read_as_much_as_I_can(fd));
    for (each write-ready fd) {
        if (!pendingData(fd))
            continue;
        write_as_much_as_I_can(fd);
        pendingData(fd) = whatever_was_leftover_that_couldnt_write;
    }
}
FWIW - I have actually written an event loop for v8 based around select(): it really is this simple.
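For readers who want something compilable, here is a sketch of a single iteration of such a loop for one descriptor, using the real POSIX select() call. The function name loop_once() and the LOOP_TIMEOUT code are assumptions made for this example; a real loop would watch many descriptors and dispatch to callbacks.

```c
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

enum { LOOP_TIMEOUT = -2 };   /* hypothetical code: select() timed out */

/* One iteration of a toy event loop. The process sleeps inside select()
   until fd becomes readable or timeout_ms elapses; no CPU is used while
   it waits. Returns the number of bytes read (0 means end of file),
   LOOP_TIMEOUT, or -1 on error. */
long loop_once(int fd, int timeout_ms, char *buf, size_t cap)
{
    fd_set readable;
    struct timeval tv;
    int ready;

    FD_ZERO(&readable);
    FD_SET(fd, &readable);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    ready = select(fd + 1, &readable, NULL, NULL, &tv);
    if (ready < 0) {
        return -1;
    }
    if (ready == 0) {
        return LOOP_TIMEOUT;     /* a real loop would run expired timers here */
    }
    return (long)read(fd, buf, cap);   /* will not block: fd is readable */
}
```

Wrapping loop_once() in while (true) and dispatching on its result gives the shape of the pseudocode above.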
It's important also to remember that JS always runs to completion. So, when you call a JS function (via the v8 api) from C, your C program doesn't do anything until the JS code returns.
NodeJS uses some optimizations like handling pending writes in separate pthreads, but these all happen in "C space" and you shouldn't think/worry about them when trying to understand this pattern, because they're not relevant.
You might also be fooled into thinking that JS isn't run to completion when dealing with things like async functions -- but it absolutely is, 100% of the time -- if you're not up to speed on this, do some reading with respect to the event loop and the micro task queue. Async functions are basically a syntax trick, and their "completion" involves returning a Promise.
I just took a dive into libuv's source code, and found at first that it seems like it does a lot of setup, and not much actual event handling.
Nonetheless, a look into src/unix/kqueue.c reveals some of the inner mechanics of event handling:
int uv__io_check_fd(uv_loop_t* loop, int fd) {
struct kevent ev;
int rc;
rc = 0;
EV_SET(&ev, fd, EVFILT_READ, EV_ADD, 0, 0, 0);
if (kevent(loop->backend_fd, &ev, 1, NULL, 0, NULL))
rc = UV__ERR(errno);
EV_SET(&ev, fd, EVFILT_READ, EV_DELETE, 0, 0, 0);
if (rc == 0)
if (kevent(loop->backend_fd, &ev, 1, NULL, 0, NULL))
abort();
return rc;
}
The file descriptor polling is done here, "setting" the event with EV_SET (similar to how you use FD_SET before checking with select()), and the handling is done via the kevent handler.
This is specific to the kqueue style events (mainly used on BSD-likes a la MacOS), and there are many other implementations for different Unices, but they all use the same function name to do nonblocking IO checks. See here for another implementation using epoll.
To answer your questions:
1) Where exactly is the "looping" occurring within libuv?
The QUEUE data structure is used for storing and processing events. This queue is filled by the platform- and IO- specific event types you register to listen for. Internally, it uses a clever linked-list using only an array of two void * pointers (see here):
typedef void *QUEUE[2];
I'm not going to get into the details of this list, all you need to know is it implements a queue-like structure for adding and popping elements.
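For the curious, the idea behind such a two-pointer queue can be sketched as an intrusive, circular, doubly-linked list. This is not libuv's code: libuv uses the bare void *[2] array with macros, while the sketch below uses a named struct for readability, and every name in it (qnode, q_init, Q_DATA, struct request, ...) is invented for illustration.

```c
#include <stddef.h>

/* Illustrative node: two pointers, like the two slots of void *QUEUE[2]. */
struct qnode {
    struct qnode *next;
    struct qnode *prev;
};

/* An empty queue is a head node that points at itself. */
void q_init(struct qnode *h) { h->next = h; h->prev = h; }

int q_empty(const struct qnode *h) { return h->next == h; }

/* Link n in just before the head, i.e. at the tail of the queue. */
void q_insert_tail(struct qnode *h, struct qnode *n)
{
    n->next = h;
    n->prev = h->prev;
    h->prev->next = n;
    h->prev = n;
}

/* Unlink n from whatever queue it is on. */
void q_remove(struct qnode *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
}

/* Recover the enclosing struct from a pointer to its embedded node. */
#define Q_DATA(ptr, type, field) \
    ((type *)((char *)(ptr) - offsetof(type, field)))

/* Example payload: the node lives inside the object, so queueing it
   needs no extra allocation. */
struct request {
    int id;
    struct qnode node;
};
```

Because the links are embedded in the queued object itself, adding and removing entries never allocates memory, which is one reason this layout is popular in event loops.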
Once you have file descriptors in the queue that are generating data, the asynchronous I/O code mentioned earlier will pick it up. The backend_fd within the uv_loop_t structure is the generator of data for each type of I/O.
2) What are the key steps in each iteration of the loop that make it non-blocking and async?
libuv is essentially a wrapper (with a nice API) around the real workhorses here, namely kqueue, epoll, select, etc. To answer this question completely, you'd need a fair bit of background in kernel-level file descriptor implementation, and I'm not sure if that's what you want based on the question.
The short answer is that the underlying operating systems all have built-in facilities for non-blocking (and therefore async) I/O. How each system works is a little outside the scope of this answer, I think, but I'll leave some reading for the curious:
https://www.quora.com/Network-Programming-How-is-select-implemented?share=1
The first thing to keep in mind is that work must be added to libuv's queues using its API; one cannot just load up libuv, start its main loop, and then code up some I/O and get async I/O.
The queues maintained by libuv are managed by looping. The infinite loop in uv__run_timers isn't actually infinite; notice that the first check verifies that a soonest-expiring timer exists (presumably, if the list is empty, this is NULL), and if not, breaks the loop and the function returns. The next check breaks the loop if the current (soonest-expiring) timer hasn't expired. If neither of those conditions breaks the loop, the code continues: it restarts the timer, calls its timeout handler, and then loops again to check more timers. Most times when this code runs, it's going to break the loop and exit, allowing the other loops to run.
What makes all this non-blocking is the caller/user following the guidelines and API of libuv: adding your work to queues, and allowing libuv to perform its work on those queues. Processing-intensive work may block these loops and other work from running, so it's important to break your work into chunks.
By the way, the source code for uv__run_idle, uv__run_check and uv__run_prepare is defined in src/unix/loop-watcher.c.
I'm working on a project for an automotive system where we use the MPC5748 MCU. The application uses an RTOS based on AUTOSAR OS, and this MPC target supports two types of watchdog, software and hardware (they have used the software WDT).
My mission is to fit an algorithm into this application. The development of the algorithm is done; the problem is that the task in which the algorithm runs is a 1ms task, and the algorithm needs much more time than is allotted to this function.
I'm a newbie to the embedded world. By the way, in the algorithm's main function the program resets itself, and this seems to be caused by the watchdog timing out.
My questions are:
Can I disable the watchdog timer for this specific function (it must not stay disabled; this is just for testing purposes)? Is it possible to use a longer timeout for the watchdog in that specific function?
Must I develop another task with a big delay in order to run the algorithm? The problem is that the algorithm needs to be synchronised with the 1ms task since we are receiving CAN commands.
Can I add a sleep(<1ms) in the desired function in order to wait a little bit without affecting other tasks?
What are other options to try?
NB: This is a general problem with the watchdog timer, and any useful information would be very helpful to me. Sorry that I can't share the code.
Can I disable the watchdog timer for this specific function (it must not stay disabled; this is just for testing purposes)? Is it possible to use a longer timeout for the watchdog in that specific function?
Let's forget that one - it is a really bad idea. If it is possible to defeat the watchdog, then it is possible to do it by error, and then the whole point of the watchdog is defeated. Apart from that, it's an XY question - a question about your proposed solution to a different problem - you should ask about the problem directly.
Must I develop another task with a big delay in order to run the algorithm? The problem is that the algorithm needs to be synchronised with the 1ms task since we are receiving CAN commands.
Yes you need another task, but you should not add a "big delay" and it is probably unnecessary and certainly a bad design. If the 1ms task needs the result of the algorithm then, the algorithm should run in a service task triggered by the 1ms task and run asynchronously to the 1ms task, the service task then makes the results available to the 1ms task when available (by shared memory or message passing perhaps). Alternatively if the result is not specifically needed by the 1ms task, the service task could take the necessary action independently of the 1ms task.
There are many options, but essentially it seems that your task partitioning is inappropriate; your CAN Rx task should be responsible for receiving CAN messages only, and any action required in response to CAN messages deferred to one or more other tasks, perhaps fed from a message queue.
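One way the hand-off between the CAN Rx task and a service task might look is a small single-producer/single-consumer ring buffer, sketched below. This is an illustration under stated assumptions, not AUTOSAR API: the names (can_msg, can_queue, can_q_push, can_q_pop) and the queue length are invented, and the sketch assumes both tasks run on the same core with exactly one producer and one consumer. On a real AUTOSAR system, OS events or the communication services provided by the stack would more likely be used to wake the service task.

```c
#include <stdbool.h>
#include <stdint.h>

#define CAN_Q_LEN 8u   /* assumed size; must be a power of two */

struct can_msg {
    uint32_t id;
    uint8_t  len;
    uint8_t  data[8];
};

struct can_queue {
    struct can_msg slots[CAN_Q_LEN];
    volatile uint32_t head;   /* written only by the producer (1ms task) */
    volatile uint32_t tail;   /* written only by the consumer (service task) */
};

/* Called from the 1ms task: copy the message in and return at once.
   Returns false if the queue is full (the message is dropped). */
bool can_q_push(struct can_queue *q, const struct can_msg *m)
{
    if (q->head - q->tail == CAN_Q_LEN) {
        return false;
    }
    q->slots[q->head % CAN_Q_LEN] = *m;
    q->head = q->head + 1u;
    return true;
}

/* Called from the service task: fetch the oldest message, if any. */
bool can_q_pop(struct can_queue *q, struct can_msg *out)
{
    if (q->head == q->tail) {
        return false;
    }
    *out = q->slots[q->tail % CAN_Q_LEN];
    q->tail = q->tail + 1u;
    return true;
}
```

The 1ms task then only costs a copy per message, and the long-running algorithm consumes messages at its own pace in the lower-priority service task.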
What are other options to try ?
Software design should not be a matter of trial and error - get the design right, implement the design. However you might consider whether 1ms is appropriate; is it possible that the period can be extended to encompass the worst case execution time without causing a failure to meet deadlines in general? If the answer is "no" then the algorithm does not belong in this task.
I don't think you can disable or delay the watchdog timer, and even if you could, that's not a good option to go for.
The problem, I think, is that the task you are calling has a period of 1ms, which is very little time to read CAN messages and then act on them. I think the minimum task period should be 5ms, and the optimal period 10ms.
Can I disable the watchdog timer for this specific function (it must not stay disabled; this is just for testing purposes)? Is it possible to use a longer timeout for the watchdog in that specific function?
You should never disable the watchdog anywhere in your code.
It might not even be possible: on the MPC5x families you typically set up the watchdog once, and then for safety reasons all watchdog registers become read-only.
Must I develop another task with a big delay in order to run the algorithm? The problem is that the algorithm needs to be synchronised with the 1ms task since we are receiving CAN commands.
Ideally you should only service the watchdog from one single location in the program. Your CAN peripheral will be FlexCAN, which has a lot of available "mailboxes" for CAN messages. In most cases you shouldn't need to poll it; instead, a flag will be set when the desired message arrives.
So it isn't obvious to me why you would need a delay to wait for them. Simply do:
void the_task (void)
{
wdog_refresh();
... // do other things
if(can_message_available)
{
// do something with the message
}
... // do other things
}
rather than
// BAD:
while(!can_message_available)
; // do nothing
Even if you need to use the CAN as FIFO and poll it repeatedly, you would still use the same approach. You'd just have to ensure that the task runs often enough that there will never be an overflow in the FIFO buffer.
I'm currently working on an embedded project using an ARM Cortex M3 microcontroller with FreeRTOS as system OS. The code was written by a former colleague and sadly the project has some weird bugs which I have to find and fix as soon as possible.
Short description: The device is integrated into vehicles and sends some "special" data using an integrated modem to a remote server.
The main problem: Since the device is integrated into a vehicle, the power supply of the device can be lost at any time. Therefore the device stores some parts of the "special" data to two reserved flash pages. This code module is laid out as an eeprom emulation on two flash pages(for wear leveling and data transfer from one flash page to another).
The eeprom emulation works with so called "virtual addresses", where you can write data blocks of any size to the currently active/valid flash page and read it back by using those virtual addresses.
The former colleague implemented the eeprom emulation as multitasking module, where you can read/write to the flash pages from every task in the application. At first sight everything seems fine.
But my project manager told me that the device always loses some of the "special" data at moments when the supply voltage in the vehicle drops to a few volts and the device tries to save the data to flash.
Normally the power supply is about 10-18 volts, but if it goes down to under 7 volts, the device receives an interrupt called powerwarn and it triggers a task called powerfail task.
The powerfail task has the highest priority of all tasks and executes some callbacks where e.g. the modem is turned off and also where the "special" data is stored in the flash page.
I tried to understand the code and debugged for days/weeks and now I'm quite sure that I found the problem:
Within those callbacks which the powerfail task executes (called powerfail callbacks), there are RTOS calls where other tasks get suspended. But unfortunately those suspended tasks could also have an unfinished EEPROM_WriteBlock() call just before the powerwarn interrupt is received.
Therefore the powerfail task executes the callbacks, and in one of the callbacks there is an EE_WriteBlock() call where the task can't take the mutex in EE_WriteBlock() since another task (which was suspended) has taken it already --> Deadlock!
This is the routine to write data to flash:
uint16_t
EE_WriteBlock (EE_TypeDef *EE, uint16_t VirtAddress, const void *Data, uint16_t Size)
{
.
.
xSemaphoreTakeRecursive(EE->rw_mutex, portMAX_DELAY);
/* Write the variable virtual address and value in the EEPROM */
.
.
.
xSemaphoreGiveRecursive(EE->rw_mutex);
return Status;
}
This is the RTOS specific code when 'xSemaphoreTakeRecursive()' is called:
portBASE_TYPE xQueueTakeMutexRecursive( xQueueHandle pxMutex, portTickType xBlockTime )
{
portBASE_TYPE xReturn;
/* Comments regarding mutual exclusion as per those within
xQueueGiveMutexRecursive(). */
traceTAKE_MUTEX_RECURSIVE( pxMutex );
if( pxMutex->pxMutexHolder == xTaskGetCurrentTaskHandle() )
{
( pxMutex->uxRecursiveCallCount )++;
xReturn = pdPASS;
}
else
{
xReturn = xQueueGenericReceive( pxMutex, NULL, xBlockTime, pdFALSE );
/* pdPASS will only be returned if we successfully obtained the mutex,
we may have blocked to reach here. */
if( xReturn == pdPASS )
{
( pxMutex->uxRecursiveCallCount )++;
}
else
{
traceTAKE_MUTEX_RECURSIVE_FAILED( pxMutex );
}
}
return xReturn;
}
My project manager is happy that I've found the bug but he also forces me to create a fix as quickly as possible, but what I really want is a rewrite of the code.
Maybe one of you might think, just avoid the suspension of the other tasks and you are done, but that is not a possible solution, since this could trigger another bug.
Does anybody have a quick solution/idea how I could fix this deadlock problem?
Maybe I could use xTaskGetCurrentTaskHandle() in EE_WriteBlock() to determine who has the ownership of the mutex and then give it if the task is not running anymore.
Thx
Writing flash, on many systems, requires interrupts to be disabled for the duration of the write, so I'm not sure how powerFail can be made to run while a write is in progress, but anyway:
Don't control access to the reserved flash pages directly with a mutex - use a blocking producer-consumer queue instead.
Delegate all those writes to one 'flashWriter' thread by queueing requests to it. If the threads requesting the writes require synchronous access, include an event or semaphore in the request struct that the requesting thread waits on after pushing its request. The flashWriter can signal it when done (or after loading the struct with an error indication).
There are variations on a theme - if all the write requesting threads need only synchronous access, maybe they can keep their own static request struct with their own semaphore and just queue up a pointer to it.
Use a producer-consumer queue class that allows a high-priority push at the head of the queue and, when powerfail runs, push a 'stopWriting' request at the front of the queue. The flashWriter will then complete any write operation in progress, pop the stopWriting request and so be instructed to suspend itself, (or you could use a 'stop' volatile boolean that the flashWriter checks every time before attempting to pop the queue).
That should prevent deadlock by removing the hard mutex lock from the flash write requests pushed in the other threads. It won't matter if other threads continue to queue up write requests - they will never be executed.
Edit: I've just had two more coffees and, thinking about this, the 'flashWriter' thread could easily become the 'FlashWriterAndPowerFail' thread:
You could arrange for your producer-consumer queue to return a pop() result of null if a volatile 'stop' boolean is set, no matter whether there are entries on the queue or not. In the 'FWAPF' thread, do a null-check after every pop() return and do the powerFail actions upon null, or the flashWrite actions if not.
When the powerFail interrupt occurs, set the stop bool and signal the 'count' semaphore in the queue to ensure that the FWAPF thread is made running if it's currently blocked on the queue.
That way, you don't need a separate 'powerFail' thread and stack - one thread can do the flashWrite and powerFail while still ensuring that there are no mutex deadlocks.
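The queue behaviour described above (normal push at the back, high-priority push at the head, and pop() returning null once 'stop' is set) could be sketched as follows. This is a single-threaded illustration only: every name is hypothetical, the capacity is assumed, and the locking and the counting semaphore that a real FreeRTOS version would need are deliberately left out.

```c
#include <stdbool.h>
#include <stddef.h>

#define WQ_CAP 8u   /* assumed capacity */

struct write_req {
    unsigned    virt_addr;
    const void *data;
    unsigned    size;
};

struct write_queue {
    struct write_req *items[WQ_CAP];
    unsigned head;          /* index of the oldest entry */
    unsigned count;         /* number of queued entries */
    volatile bool stop;     /* set by the power-fail path */
};

/* Normal request: append at the back. Returns false if the queue is full. */
bool wq_push_back(struct write_queue *q, struct write_req *r)
{
    if (q->count == WQ_CAP) {
        return false;
    }
    q->items[(q->head + q->count) % WQ_CAP] = r;
    q->count++;
    return true;
}

/* High-priority request: insert at the head so it is popped next. */
bool wq_push_front(struct write_queue *q, struct write_req *r)
{
    if (q->count == WQ_CAP) {
        return false;
    }
    q->head = (q->head + WQ_CAP - 1u) % WQ_CAP;
    q->items[q->head] = r;
    q->count++;
    return true;
}

/* Returns NULL when stop is set, even if entries remain, so the caller
   can switch to its power-fail actions. Also returns NULL when empty;
   a real version would block on a semaphore instead. */
struct write_req *wq_pop(struct write_queue *q)
{
    struct write_req *r;

    if (q->stop || q->count == 0u) {
        return NULL;
    }
    r = q->items[q->head];
    q->head = (q->head + 1u) % WQ_CAP;
    q->count--;
    return r;
}
```

Since only the writer thread ever touches the flash, no mutex is held by a task that might be suspended, which is what removes the deadlock.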
I'm using an stm32f103 with GCC and have a task which can be described with the following pseudocode:
void http_server() {
    transmit(data, len);
    event = waitfor(data_sent_event | disconnect_event | send_timeout_event);
}
void tcp_interrupt() {
    if (int_reg & DATA_SENT) {
        emit(data_sent_event);
    }
}
void main() {
    run_task(http_server);
}
I know that all embedded OSes offer such functionality, but they are too huge for this single task. I don't need preemption, mutexes, queues and other features. I just need to wait for flags in secondary tasks and raise these flags in interrupts.
I hope someone knows a good tutorial on this topic or has a piece of code implementing context switching and wait.
You will probably need to use an interrupt driven finite state machine.
There are a number of IP stacks that are independent of an operating system, or even of interrupts. lwIP (lightweight IP) comes to mind; I used it indirectly, as it was provided by Xilinx. The FreeDOS folks may have had one, and the Crynwr packet drivers certainly come to mind, on top of which stacks were no doubt built.
As for the perhaps simpler question: your code is sitting in a foreground task in the waitfor() function, which appears to want to be an infinite loop waiting for some global variables to change. An interrupt comes along and calls the interrupt handler, which, with a lot of stack work (to know it is a TCP interrupt), calls tcp_interrupt, which modifies the flags; the interrupt finishes, and now waitfor sees the global flag change. The context switch is the interrupt, which is built into the processor; there is no need for an operating system or anything fancy, just a global variable or two and the ISR. The context switch and flags/events are a freebie compared to the TCP/IP stack. UDP is significantly easier; do you really need TCP, by the way?
If you want more than one of these waitfor() calls active, you basically don't want to have the one foreground task sitting in a single waitfor(). Then I would do one of two things. The first is to have the foreground task poll: instead of a waitfor(something), change it to an if(checkfor(something)) { then do something }.
Or set up your system so that the interrupt handler, which in your pseudocode is already very complicated in order to know this is TCP packet data, examines the TCP header more deeply and knows to call the http_server() thing for port 80 events, and other functions for other events that you might have had a waitfor for. So in this case, instead of a multitasking series of functions that are waitfor()ing, create a single list of the events and look for them in the ISR. Use a timer interrupt and globals for the timeouts (reset a counter when a packet arrives, bump the counter on a timer interrupt, and if the counter reaches N then a timeout has occurred: call the timeout task handler function).
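The flag-and-counter scheme from the last few paragraphs could look roughly like this. It is a sketch with invented names (EV_*, emit, checkfor, data_sent_isr, timer_tick_isr) and an assumed timeout of 5 ticks. The two *_isr functions stand in for whatever the real interrupt handlers are called, and on real hardware the read-modify-write in checkfor() would need interrupts masked around it.

```c
#include <stdint.h>

enum {
    EV_DATA_SENT    = 1 << 0,
    EV_DISCONNECT   = 1 << 1,
    EV_SEND_TIMEOUT = 1 << 2
};

#define TIMEOUT_TICKS 5u   /* assumed: timer ticks without activity */

static volatile uint32_t pending_events;
static volatile uint32_t ticks_since_activity;

/* Raise one or more events; meant to be called from interrupt context. */
void emit(uint32_t ev)
{
    pending_events |= ev;
}

/* Non-blocking check from the foreground loop: returns the events in
   mask that are pending and clears them.
   NOTE: mask interrupts around these lines on real hardware. */
uint32_t checkfor(uint32_t mask)
{
    uint32_t hit = pending_events & mask;
    pending_events &= ~hit;
    return hit;
}

/* Stand-in for the "data sent" part of the TCP interrupt handler. */
void data_sent_isr(void)
{
    ticks_since_activity = 0u;      /* activity resets the timeout counter */
    emit(EV_DATA_SENT);
}

/* Stand-in for a periodic timer interrupt. */
void timer_tick_isr(void)
{
    ticks_since_activity = ticks_since_activity + 1u;
    if (ticks_since_activity == TIMEOUT_TICKS) {
        emit(EV_SEND_TIMEOUT);
    }
}
```

The foreground loop then becomes a series of if (checkfor(...)) { ... } tests, one per former waitfor(), so no context switching code is needed at all.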