I'm currently working on an embedded project using an ARM Cortex-M3 microcontroller with FreeRTOS as the operating system. The code was written by a former colleague, and sadly the project has some weird bugs which I have to find and fix as soon as possible.
Short description: The device is integrated into vehicles and sends some "special" data using an integrated modem to a remote server.
The main problem: since the device is integrated into a vehicle, its power supply can be lost at any time. Therefore the device stores some parts of the "special" data in two reserved flash pages. This code module is laid out as an EEPROM emulation on two flash pages (for wear leveling and for transferring data from one page to the other).
The EEPROM emulation works with so-called "virtual addresses": you can write data blocks of any size to the currently active/valid flash page and read them back using those virtual addresses.
The former colleague implemented the EEPROM emulation as a multitasking module, where you can read/write to the flash pages from every task in the application. At first sight everything seems fine.
But my project manager told me that the device always loses some of the "special" data at moments when the supply voltage in the vehicle drops to a few volts and the device tries to save the data to flash.
Normally the supply voltage is about 10-18 volts, but if it drops below 7 volts, the device receives an interrupt called powerwarn, which triggers the powerfail task.
The powerfail task has the highest priority of all tasks and executes some callbacks in which, for example, the modem is turned off and the "special" data is stored in the flash page.
I tried to understand the code and debugged for days/weeks and now I'm quite sure that I found the problem:
Within those callbacks which the powerfail task executes (called powerfail callbacks), there are RTOS calls that suspend other tasks. Unfortunately, one of those suspended tasks could have an unfinished EE_WriteBlock() call from just before the powerwarn interrupt was received.
The powerfail task then executes the callbacks, and in one of them there is an EE_WriteBlock() call where the powerfail task can't take the mutex in EE_WriteBlock(), since another task (which was suspended) has already taken it --> deadlock!
This is the routine to write data to flash:
uint16_t
EE_WriteBlock (EE_TypeDef *EE, uint16_t VirtAddress, const void *Data, uint16_t Size)
{
    /* ... */
    xSemaphoreTakeRecursive(EE->rw_mutex, portMAX_DELAY);

    /* Write the variable virtual address and value in the EEPROM */
    /* ... */

    xSemaphoreGiveRecursive(EE->rw_mutex);
    return Status;
}
This is the RTOS specific code when 'xSemaphoreTakeRecursive()' is called:
portBASE_TYPE xQueueTakeMutexRecursive( xQueueHandle pxMutex, portTickType xBlockTime )
{
    portBASE_TYPE xReturn;

    /* Comments regarding mutual exclusion as per those within
    xQueueGiveMutexRecursive(). */

    traceTAKE_MUTEX_RECURSIVE( pxMutex );

    if( pxMutex->pxMutexHolder == xTaskGetCurrentTaskHandle() )
    {
        ( pxMutex->uxRecursiveCallCount )++;
        xReturn = pdPASS;
    }
    else
    {
        xReturn = xQueueGenericReceive( pxMutex, NULL, xBlockTime, pdFALSE );

        /* pdPASS will only be returned if we successfully obtained the mutex,
        we may have blocked to reach here. */
        if( xReturn == pdPASS )
        {
            ( pxMutex->uxRecursiveCallCount )++;
        }
        else
        {
            traceTAKE_MUTEX_RECURSIVE_FAILED( pxMutex );
        }
    }

    return xReturn;
}
My project manager is happy that I've found the bug, but he is also pushing me to create a fix as quickly as possible, whereas what I really want is a rewrite of the code.
You might think the obvious answer is to just avoid suspending the other tasks, but that is not a possible solution, since it could trigger another bug.
Does anybody have a quick solution/idea how I could fix this deadlock problem?
Maybe I could use xTaskGetCurrentTaskHandle() in EE_WriteBlock() to determine which task owns the mutex and then release it if that task is no longer running.
Thx
Writing flash on many systems requires interrupts to be disabled for the duration of the write, so I'm not sure how powerFail can run while a write is in progress, but anyway:
Don't control access to the reserved flash pages directly with a mutex - use a blocking producer-consumer queue instead.
Delegate all those writes to one 'flashWriter' thread by queueing requests to it. If the threads requesting the writes require synchronous access, include an event or semaphore in the request struct that the requesting thread waits on after pushing its request. The flashWriter can signal it when done (or after loading the struct with an error indication).
There are variations on a theme - if all the write requesting threads need only synchronous access, maybe they can keep their own static request struct with their own semaphore and just queue up a pointer to it.
Use a producer-consumer queue class that allows a high-priority push at the head of the queue and, when powerfail runs, push a 'stopWriting' request at the front of the queue. The flashWriter will then complete any write operation in progress, pop the stopWriting request and so be instructed to suspend itself, (or you could use a 'stop' volatile boolean that the flashWriter checks every time before attempting to pop the queue).
That should prevent deadlock by removing the hard mutex lock from the flash write requests pushed in the other threads. It won't matter if other threads continue to queue up write requests - they will never be executed.
Edit: I've just had two more coffees and, thinking about this, the 'flashWriter' thread could easily become the 'FlashWriterAndPowerFail' thread:
You could arrange for your producer-consumer queue to return a pop() result of null if a volatile 'stop' boolean is set, no matter whether there are entries on the queue or not. In the 'FWAPF' thread, do a null check after every pop() return and perform the powerFail actions upon null, or the flashWrite actions if not.
When the powerFail interrupt occurs, set the stop bool and signal the 'count' semaphore in the queue to ensure that the FWAPF thread is made running if it's currently blocked on the queue.
That way, you don't need a separate 'powerFail' thread and stack - one thread can do the flashWrite and powerFail while still ensuring that there are no mutex deadlocks.
Related
I'm integrating FreeRTOS (cmsis_v2) on my STM32F303VCx and ran into a problem when using event flags to block a task while it waits for operation approval from another task.
If the task executes the following code, all other tasks get minimal runtime (understandably, because the task is constantly polling evt_flg):
for (;;)
{
    flag = osEventFlagsWait(evt_flg, EventOccured, osFlagsWaitAny, 0);
    if (flag == EventOccured)
    {
        /* Task main route */
        osEventFlagsClear(evt_flg, EventOccured);
    }
}
But if I set the timeout to osWaitForever, osEventFlagsWait(evt_flg, EventOccured, osFlagsWaitAny, osWaitForever), the whole program goes into HardFault.
What's the best solution for this behavior? I need the task to wait for a flag without blocking other ones, such as the terminal input read, from running.
The task code the question provides is constantly busy, polling the RTOS event.
This is a design antipattern; it is virtually always better to have the task block until the event source has fired. The only exception where a call to osEventFlagsWait() with a zero timeout could make sense is if you have to monitor several different event/data sources for which there is no common RTOS API to wait on (and even then, this is only an "emergency exit"). Hence, osWaitForever should be used.
Next, the reason for the HardFault should be sought. In this task code alone I don't see a reason for it; the HardFault source is likely somewhere else. Once you have narrowed down the area the HardFault can come from, that could be worth a new question (or the problem may already be fixed by then). Good luck!
I have the following C problem:
I have a hardware module that controls the SPI bus (as a master); let's call it SPI_control. It has private (static) read and write functions and "public" Init() and WriteRead() functions (for those who don't know, SPI is full duplex, i.e. a write always reads data on the bus). Now I need to make this accessible to higher-level modules that implement certain protocols. Let's call the upper modules TDM and AC. They run in two separate threads, and one must not be interrupted by the other (when one is in the middle of a transaction, it first needs to complete).
So one possibility I thought of is to put a SPI_ENG module between the upper modules and SPI_control, which controls the data flow and knows what can be interrupted and what can't; it would then forward data accordingly to SPI_control. But how can the independent tasks AC and TDM talk to SPI_control? Can I have them write to and read from some kind of semaphore-protected queue? How should this be done?
It's not exactly clear what you are trying to do, but a general solution is that your two processes (AC and TDM) write data into their own separate output queues. A third process acts as a scheduler, reads alternately from these queues and writes on to the hardware (SPI_control). This may be what you are looking for, since the queues will also act as elasticity buffers to handle bursty transactions.
This way you will not have to worry about AC getting preempted by TDM, and there should be no need for mutexes to synchronize access to SPI_control.
Queues in the kernel are implemented using kernel semaphores; a queue is an array of memory guarded by a kernel semaphore.
What I would do is create a control message queue for the scheduler task, so the system now has three queues: two data output queues for the AC and TDM processes, and one control queue for the scheduler task. During system startup, the scheduler task starts before AC and TDM and pends on its control queue. The AC and TDM processes send a "data available" message to the scheduler task over the control queue whenever their queue goes non-empty (msgQNumMsgs()). On receiving this message, the scheduler task reads from the indicated queue until it is empty and then pends on the control queue again. The last time I worked on vxWorks (2004), it had a flat memory model in which all global variables were accessible to all tasks. Is this still the case? If yes, you can use global variables to pass queue IDs between tasks.
I would simply use a Mutex on each SPI operation:
SPI_Read()
{
    MutexGet(&spiMutex);
    ...
    MutexPut(&spiMutex);
}

SPI_Write()
{
    MutexGet(&spiMutex);
    ...
    MutexPut(&spiMutex);
}
Make sure that you initialize the mutex with priority inheritance enabled, so that a low-priority task holding it is temporarily boosted and unbounded priority inversion is avoided.
void print_task(void)
{
    for (;;)
    {
        taskLock();
        printf("this is task %d\n", taskIdSelf());
        taskUnlock();
        taskDelay(0);
    }
}

void print_test(void)
{
    taskSpawn("t1", 100, 0, 0x10000, (FUNCPTR)print_task, 0,0,0,0,0,0,0,0,0,0);
    taskSpawn("t2", 100, 0, 0x10000, (FUNCPTR)print_task, 0,0,0,0,0,0,0,0,0,0);
}
The above code prints:
this is task this is task126738208 126672144 this is task this is
task 126712667214438208
this is task this is task 1266721441 26738208 this is task 126672144
this is task
What is the right way to print a string from multiple tasks?
The problem lies in taskLock(). Try a semaphore or mutex instead.
The main idea for printing in a multitasking environment is to use a dedicated task that does the printing.
Normally in vxWorks there is a log task that gets the log messages from all tasks in the system and prints to the terminal from one task only.
The main problem with the vxWorks logger mechanism is that the logger task uses a very high priority and can change your system timing.
Therefore, you should create your own low-priority task that gets messages from the other tasks (using a message queue, shared memory protected by a mutex, …).
In that case there are two great benefits:
First, all system printout is printed from one single task.
Second, and most important, the real-time tasks in the system do not lose time in the printf() function.
As you know, printf is a very slow function that uses system calls and will certainly change the timing of your tasks, depending on the debug information you add.
Regarding taskLock:
taskLock is a command to the kernel; it means the currently running task stays on the CPU while remaining READY (the scheduler will not switch it out).
As the example code shows, taskUnlock() takes no arguments. The basic reason is to enable the kernel and interrupts to perform a task unlock in the system.
There are many system calls that perform a task unlock (and sometimes interrupt service routines do it as well).
Rather than invent a home-brew solution, just use logMsg(). It is the canonical safe and sane way to print stuff. Internally, it pushes your message onto a message queue; a separate task then pulls messages off the queue and prints them. By using logMsg(), you gain the ability to print from ISRs, avoid interleaved prints from multiple tasks printing simultaneously, and so on.
For example:
printf("this is task %d\n", taskIdSelf());
becomes
logMsg("this is task %d\n", taskIdSelf(), 0,0,0,0,0,0);
Can I set up the priority of a workqueue?
I am modifying the SPI kernel module "spidev" so it can communicate faster with my hardware.
The external hardware is a CAN controller with a very small buffer, so I must read any incoming data quickly to avoid losing data.
I have configured a GPIO interrupt to inform me of the new data, but I cannot read the SPI hardware in the interrupt handler.
My interrupt handler basically sets up a workqueue that will read the SPI data.
It works fine when there is only one active process in the kernel.
As soon as I start any other process (even the process viewer top) at the same time, I start losing data in bunches, i.e., I might receive 1000 packets of data with no problems and then lose 15 packets in a row, and so on.
I suspect that the cause of my problem is that when the other process (top, in this case) has control of the CPU, the interrupt handler runs, but the work in the workqueue doesn't run until the scheduler is invoked again.
I tried to increase the priority of my process with no success.
I wonder if there is a way to tell the kernel to execute the work in the workqueue immediately after the interrupt handling function.
Suggestions are welcome.
As an alternative you could consider using a tasklet, which the kernel will execute sooner, but be aware that you are unable to sleep in tasklets.
A good IBM article on deferring work in the kernel:
http://www.ibm.com/developerworks/linux/library/l-tasklets/
http://www.makelinux.net/ldd3/chp-7-sect-5
From the second link: "A tasklet is run at the next timer tick as long as the CPU is busy running a process, but it is run immediately when the CPU is otherwise idle. The kernel provides a set of ksoftirqd kernel threads, one per CPU, just to run 'soft interrupt' handlers, such as the tasklet_action function."
I'm trying to implement a UDP-based server that maintains two sockets, one for controlling(ctrl_sock) and the other for data transmission(data_sock). The thing is, ctrl_sock is always uplink and data_sock is downlink. That is, clients will request data transmission/stop via the ctrl_sock and data will be sent to them via data_sock.
Now the problem is, since the model is connectionless, the server has to maintain a list of registered clients' information (I call it peers_context) so that it can "blindly" push data to them until they ask it to stop. During this blind transmission, clients may send control messages to the server via the ctrl_sock asynchronously. This information, besides the initial Request and Stop, can also include, for example, preferences for file parts. Therefore, the peers_context has to be updated asynchronously. However, the transmission over the data_sock relies on this peers_context structure, which raises a synchronization problem between ctrl_sock and data_sock. My question is: what can I do to safely maintain these two sockets and the peers_context structure such that the asynchronous updates of peers_context won't cause havoc? By the way, updates to peers_context wouldn't be very frequent; that is why I want to avoid the request-reply model.
My initial idea for the implementation is to maintain ctrl_sock in the main thread (listener thread) and the transmission over data_sock in another thread (worker thread). However, I found it difficult to synchronize in this case. For example, if I use a mutex in peers_context, whenever the worker thread locks peers_context, the listener thread no longer has access to it when it needs to modify peers_context, because the worker thread works endlessly. On the other hand, if the listener thread holds the lock and writes to peers_context, the worker thread fails to read peers_context and terminates. Can anybody give me some suggestions?
By the way, the implementation is done in a Linux environment in C. Only the listener thread needs to modify peers_context, and only occasionally; the worker thread only needs to read. Thanks sincerely!
If there is strong contention for your peers_context then you need to shorten your critical sections. You talked about using a mutex. I assume you've already considered changing to a reader-writer lock and rejected it because you don't want the constant readers to starve a writer. How about this?
Make a very small structure that is an indirect reference to a peers_context like this:
struct peers_context_handle {
pthread_mutex_t ref_lock;
struct peers_context *the_actual_context;
pthread_mutex_t write_lock;
};
Packet senders (readers) and control request processors (writers) always access the peers_context through this indirection.
Assumption: the packet senders never modify the peers_context, nor do they ever free it.
Packet senders briefly lock the handle, obtain the current version of the peers_context and unlock it:
pthread_mutex_lock(&(handle->ref_lock));
peers_context = handle->the_actual_context;
pthread_mutex_unlock(&(handle->ref_lock));
(In practice, you can even do away with the lock if you introduce memory barriers, because a pointer dereference is atomic on all platforms that Linux supports, but I wouldn't recommend it since you would have to start delving into memory barriers and other low-level stuff, and neither C nor POSIX guarantees that it will work anyway.)
Request processors don't update the peers_context, they make a copy and completely replace it. That's how they keep their critical section small. They do use write_lock to serialize updates, but updates are infrequent so that's not a problem.
pthread_mutex_lock(&(handle->write_lock));
/* Short CS to get the old version */
pthread_mutex_lock(&(handle->ref_lock));
old_peers_context = handle->the_actual_context;
pthread_mutex_unlock(&(handle->ref_lock));
new_peers_context = allocate_new_structure();
*new_peers_context = *old_peers_context;
/* Now make the changes that are requested */
new_peers_context->foo = 42;
new_peers_context->bar = 52;
/* Short CS to replace the context */
pthread_mutex_lock(&(handle->ref_lock));
handle->the_actual_context = new_peers_context;
pthread_mutex_unlock(&(handle->ref_lock));
pthread_mutex_unlock(&(handle->write_lock));
magic(old_peers_context);
What's the catch? It's the magic in the last line of code. You have to free the old copy of the peers_context to avoid a memory leak but you can't do it because there might be packet senders still using that copy.
The solution is similar to RCU, as used inside the Linux kernel. You have to wait for all of the packet sender threads to have entered a quiescent state. I'm leaving the implementation of this as an exercise for you :-) but here are the guidelines:
The magic() function adds old_peers_context to a to-be-freed queue (which has to be protected by a mutex).
One dedicated thread frees this list in a loop:
It locks the to-be-freed list
It obtains a pointer to the list
It replaces the list with a new empty list
It unlocks the to-be-freed list
It clears a mark associated with each worker thread
It waits for all marks to be set again
It frees each item in its previously obtained copy of the to-be-freed list
Meanwhile, each worker thread sets its own mark at an idle point in its event loop (i.e. a point when it is not busy sending any packets or holding any peer_contexts).