I have following C problem:
I have a hardware module that controls the SPI bus (as a master), let's call it SPI_control, it's got private (static) read & write and "public" Init() and WriteRead() functions (for those who don't know, SPI is full duplex i.e. a Write always reads data on the bus). Now I need to make this accessible to higher levekl modules that incorporate certain protocols. Let's cal the upper modules TDM and AC. They run in two separate threads and one might not be interrupted by the other (when it's in the nmiddle of a transaction, it first needs to complete).
So one possibility I thought of, is to incorporate a SPI_ENG inbween the modules and the SPI_control which controls data flow and know what can be interrupted and what can't - it would then forward data accordingly to spi_control. But hwo can the independent tasks AC & **TDM talk to spi_control, can I have them to write to and read from some kind ok Semaphore queue? How is should this be done?
Its not exactly clear what you are trying to do, but a general solution is that your two processes (AC and TDM) can write data in their own separate output queues. A third process can act as a scheduler and read alternatively from these queue and write on to the HW (SPI_control). This may be what you are looking for since the queues will also act as elasticity buffers to handle bursty transactions.
This ways you will not have to worry about AC getting preempted TDM, there should be no need for Mutex's to synchronize the accesses to SPI_Control.
Queues in kernel are implemented using kernel semaphores. A Queue is an array of memory guarded by an kernel semaphore.
What I would do is create a control msg queue for the Scheduler tasks. So now system will have 3 queue. 2 data output queues for AC, TDM process and one Control queue for Scheduler task. During system startup scheduler task will start before AC and TDM and pend on its control queue. The AC and TDM process should send "data available" msg to scheduler task over the control queue whenever their queue goes non empty (msgQNumMsgs()). On receiving this msg, the scheduler task should start reading from the specific queue until it is empty and again pend on the control queue. The last time I worked on vxworks(2004), it had a flat memory model, in which all the global variables were accessible to all tasks. Is it this the case? If yes then you can use global variable to pass queue Id's between tasks.
I would simply use a Mutex on each SPI operation:
SPI_Read()
{
MutexGet(&spiMutex);
...
MutexPut(&spiMutex);
}
SPI_Write()
{
MutexGet(&spiMutex);
...
MutexPut(&spiMutex);
}
Make sure that you initialize the Mutex with priority-inheritance enabled, so that it can perform priority-inversion whenever needed.
Related
Serial port open to the public and a thread always works with the port.
One or more high priority thread are created at run-time witch without Conflict with the main thread should work with the port and destroy when completed.
how can i schedule these threads and manage access to serial port?
thanks.
In case you are creating many threads and you always want only one thread to work with the serial port (one thread at a time), you can manage it's access through the use of semaphores (so that they do not collide).
However the scheduling algorithm which you want to use purely depends on your need. When you are creating more than one thread I am sure you must be using pthread_create API which has more flexibility to set your attributes (such as priority) in it's second parameter. Please use that parameter to set you priority levels. You can schedule them by taking their priorities into consideration or you can even use a time slice technique.
When analyzing your question it looks like you are working on some development board. In case it is an RTOS code, you can try implementing the preemption mechanism along with semaphores.
We are using embedded C for the VxWorks real time operating system.
Currently, all of our UDP connections are started with TaskSpawn().
This routine creates and activates a new task with a specified
priority and options and returns a system-assigned ID.
We specify the task size, a priority, and pass in an entry point.
These are continuous connections, and thus every entry point contains an infinite loop where we delay before the next iteration.
Then I discovered period().
period spawns a task to call a function periodically.
Period sounds like what we should be using instead, but I can't find any information on when you would prefer this function over TaskSpawn. Period also doesn't allow specifying the task size or the priority, so how is it decided? Is the task size dynamic? What will the priority be?
There are also watchdogs.
Any task may create a watchdog timer and use it to run a specified
routine in the context of the system-clock ISR, after a specified
delay.
Again, this seems to be in line with the goal of processing data at a particular rate. Which do I choose when a task must continuously execute code at the same rate (i.e. in real time)?
What are the differences between these 3 methods?
Here is a little clarification:
taskSpawn(..) creates a task with which you're free to do anything with you like.
Watchdogs shall only be used to monitor time constraints. Remember that the callback of the watchdog is executed within the context of the system clock ISR which has many limitations (e.g. free stack size, never use blocking function calls in an ISR, ...). Additionally executing "a lot of code" in the system clock ISR slows down your entire system.
period(..) is intended to be a helper for the VxWorks shell and not to be used by a program.
With that being said your only option is to use taskSpawn(..) unless you're doing some very simple stuff in which case period(..) might be ok to use.
If you need to do things cyclically in a specific time frame you might look at timers or taskDelay(..) in combination with sysClkRateSet(..).
Another option is to create two tasks. One that is setting a semaphore after a specific time intervall and the other "worker" tasks waits for this semaphore to do something. With that approach you separate "timing" from "action" which proved to be benefitial according to my experience. You also might want to monitor excution time of the "worker" task by using a watchdog.
I'm currently working on an embedded project using an ARM Cortex M3 microcontroller with FreeRTOS as system OS. The code was written by a former colleague and sadly the project has some weird bugs which I have to find and fix as soon as possible.
Short description: The device is integrated into vehicles and sends some "special" data using an integrated modem to a remote server.
The main problem: Since the device is integrated into a vehicle, the power supply of the device can be lost at any time. Therefore the device stores some parts of the "special" data to two reserved flash pages. This code module is laid out as an eeprom emulation on two flash pages(for wear leveling and data transfer from one flash page to another).
The eeprom emulation works with so called "virtual addresses", where you can write data blocks of any size to the currently active/valid flash page and read it back by using those virtual addresses.
The former colleague implemented the eeprom emulation as multitasking module, where you can read/write to the flash pages from every task in the application. At first sight everything seems fine.
But my project manager told me, that the device always loses some of the "special" data at moments, where the power supply level in the vehicle goes down to some volts and the device tries to save the data to flash.
Normally the power supply is about 10-18 volts, but if it goes down to under 7 volts, the device receives an interrupt called powerwarn and it triggers a task called powerfail task.
The powerfail task has the highest priority of all tasks and executes some callbacks where e.g. the modem is turned off and also where the "special" data is stored in the flash page.
I tried to understand the code and debugged for days/weeks and now I'm quite sure that I found the problem:
Within those callbacks which the powerfail task executes (called powerfail callbacks), there are RTOS calls,
where other tasks get suspended. But unfortunately those supended task could also have a unfinished EEPROM_WriteBlock() call just before the powerwarn interrupt is received.
Therefore the powerfail task executes the callbacks and in one of the callbacks there is a EE_WriteBlock() call where the task can't take the mutex in EE_WriteBlock() since another task (which was suspended) has taken it already --> Deadlock!
This is the routine to write data to flash:
uint16_t
EE_WriteBlock (EE_TypeDef *EE, uint16_t VirtAddress, const void *Data, uint16_t Size)
{
.
.
xSemaphoreTakeRecursive(EE->rw_mutex, portMAX_DELAY);
/* Write the variable virtual address and value in the EEPROM */
.
.
.
xSemaphoreGiveRecursive(EE->rw_mutex);
return Status;
}
This is the RTOS specific code when 'xSemaphoreTakeRecursive()' is called:
portBASE_TYPE xQueueTakeMutexRecursive( xQueueHandle pxMutex, portTickType xBlockTime )
{
portBASE_TYPE xReturn;
/* Comments regarding mutual exclusion as per those within
xQueueGiveMutexRecursive(). */
traceTAKE_MUTEX_RECURSIVE( pxMutex );
if( pxMutex->pxMutexHolder == xTaskGetCurrentTaskHandle() )
{
( pxMutex->uxRecursiveCallCount )++;
xReturn = pdPASS;
}
else
{
xReturn = xQueueGenericReceive( pxMutex, NULL, xBlockTime, pdFALSE );
/* pdPASS will only be returned if we successfully obtained the mutex,
we may have blocked to reach here. */
if( xReturn == pdPASS )
{
( pxMutex->uxRecursiveCallCount )++;
}
else
{
traceTAKE_MUTEX_RECURSIVE_FAILED( pxMutex );
}
}
return xReturn;
}
My project manager is happy that I've found the bug but he also forces me to create a fix as quickly as possible, but what I really want is a rewrite of the code.
Maybe one of you might think, just avoid the suspension of the other tasks and you are done, but that is not a possible solution, since this could trigger another bug.
Does anybody have a quick solution/idea how I could fix this deadlock problem?
Maybe I could use xTaskGetCurrentTaskHandle() in EE_WriteBlock() to determine who has the ownership of the mutex and then give it if the task is not running anymore.
Thx
Writing flash, on many systems, requires interrupts to be disabled for the duration of the write so I'm not sure how powerFail can be made running while a write is in progress, but anyway:
Don't control access to the reserved flash pages directly with a mutex - use a blocking producer-consumer queue instead.
Delegate all those writes to one 'flashWriter' thread by queueing requests to it. If the threads requesting the writes require synchronous access, include an event or semaphore in the request struct that the requesting thread waits on after pushing its request. The flashWriter can signal it when done, (or after loading the struct with an error indication:).
There are variations on a theme - if all the write requesting threads need only synchronous access, maybe they can keep their own static request struct with their own semaphore and just queue up a pointer to it.
Use a producer-consumer queue class that allows a high-priority push at the head of the queue and, when powerfail runs, push a 'stopWriting' request at the front of the queue. The flashWriter will then complete any write operation in progress, pop the stopWriting request and so be instructed to suspend itself, (or you could use a 'stop' volatile boolean that the flashWriter checks every time before attempting to pop the queue).
That should prevent deadlock by removing the hard mutex lock from the flash write requests pushed in the other threads. It won't matter if other threads continue to queue up write requests - they will never be executed.
Edit: I've just had two more coffees and, thinking about this, the 'flashWriter' thread could easily become the 'FlashWriterAndPowerFail' thread:
You could arrange for your producer-consumer queue to return a pop() result of null if a volatile 'stop' boolean is set, no matter whether there were entries on the queue or no. In the 'FWAPF' thread, do a null-check after every pop() return and do the powerFail actions upon null or flashWrite actions if not.
When the powerFail interrupt occurs, set the stop bool and signal the 'count' semaphore in the queue to ensure that the FWAPF thread is made running if it's currently blocked on the queue.
That way, you don't need a separate 'powerFail' thread and stack - one thread can do the flashWrite and powerFail while still ensuring that there are no mutex deadlocks.
void print_task(void)
{
for(;;)
{
taskLock();
printf("this is task %d\n", taskIdSelf());
taskUnlock();
taskDelay(0);
}
}
void print_test(void)
{
taskSpawn("t1", 100,0,0x10000, (FUNCPTR)print_task, 0,0,0,0,0,0,0,0,0,0);
taskSpawn("t2", 100,0,0x10000, (FUNCPTR)print_task, 0,0,0,0,0,0,0,0,0,0);
}
the above code show:
this is task this is task126738208 126672144 this is task this is
task 126712667214438208
this is task this is task 1266721441 26738208 this is task 126672144
this is task
what is the right way to print a string in multitask?
The problem lies in taskLock();
Try semaphore or mutex instead.
The main idea to print in multi-threaded environment is using dedicated task that printout.
Normally in vxWorks there is a log task that gets the log messages from all tasks in the system and print to terminal from one task only.
The main problem in vxWorks logger mechanism is that the logger task use very high priority and can change your system timing.
Therefore, you should create your own low priority task that get messages from other tasks (using message queue, shared memory protected by mutex, …).
In that case there are 2 great benefits:
The first one, all system printout will be printed from one single task.
The second, and most important benefit, the real-time tasks in the system should not loss time using printf() function.
As you know, printf is very slow function that use system calls and for sure change the timing of your tasks according the debug information you add.
taskLock,
taskLock use as a command to the kernel, it mean to leave the current running task in the CPU as READY.
As you wrote in the example code taskUnlock() function doesn't have arguments. The basic reason is to enable the kernel & interrupts to perform taskUnlock in the system.
There are many system calls that perform task unlock (and sometimes interrupts service routing do it also)
Rather than invent a home-brew solution, just use logMsg(). It is the canonical safe & sane way to print stuff. Internally, it pushes your message onto a message queue. Then a separate task pulls stuff off the queue and prints it. By using logMsg(), you gain ability to print from ISR's, don't have interleaved prints from multiple tasks printing simultaneously, and so on.
For example:
printf("this is task %d\n", taskIdSelf());
becomes
logMsg("this is task %d\n", taskIdSelf(), 0,0,0,0,0,0);
I have a very simple code, a data decomposition problem in which in a loop each process sends two large messages to the ranks before and after itself at each cycle. I run this code in a cluster of SMP nodes (AMD Magny cores, 32 core per node, 8 cores per socket). It's a while I'm in the process of optimizing this code. I have used pgprof and tau for profiling and it looks to me that the bottleneck is the communication. I have tried to overlap the communication with the computations in my code however it looks that the actual communication starts when the computations finish :(
I use persistent communication in ready mode (MPI_Rsend_init) and in between the MPI_Start_all and MPI_Wait_all bulk of the computation is done. The code looks like this:
void main(int argc, char *argv[])
{
some definitions;
some initializations;
MPI_Init(&argc, &argv);
MPI_Rsend_init( channel to the rank before );
MPI_Rsend_init( channel to the rank after );
MPI_Recv_init( channel to the rank before );
MPI_Recv_init( channel to the rank after );
for (timestep=0; temstep<Time; timestep++)
{
prepare data for send;
MPI_Start_all();
do computations;
MPI_Wait_all();
do work on the received data;
}
MPI_Finalize();
}
Unfortunately the actual data transfer does not start until the computations are done, I don't understand why. The network uses QDR InfiniBand Interconnect and mvapich2. each message size is 23MB (totally 46 MB message is sent). I tried to change the message passing to eager mode, since the memory in the system is large enough. I use the following flags in my job script:
MV2_SMP_EAGERSIZE=46M
MV2_CPU_BINDING_LEVEL=socket
MV2_CPU_BINDING_POLICY=bunch
Which gives me an improvement of about 8%, probably because of better placement of the ranks inside the SMP nodes however still the problem with communication remains. My question is why can't I effectively overlap the communications with the computations? Is there any flag that I should use and I'm missing it? I know something is wrong, but whatever I have done has not been enough.
By the order of ranks inside the SMP nodes the actual message sizes between the nodes is also 46MB (2x23MB) and the ranks are in a loop. Can you please help me? To see the flags that other users use I have checked /etc/mvapich2.conf however it is empty.
Is there any other method that I should use? do you think one sided communication gives better performance? I feel there is a flag or something that I'm not aware of.
Thanks alot.
There is something called progression of operations in MPI. The standard allows for non-blocking operations to only be progressed to completion once the proper testing/waiting call was made:
A nonblocking send start call initiates the send operation, but does not complete it. The send start call can return before the message was copied out of the send buffer. A separate send complete call is needed to complete the communication, i.e., to verify that the data has been copied out of the send buffer. With suitable hardware, the transfer of data out of the sender memory may proceed concurrently with computations done at the sender after the send was initiated and before it completed. Similarly, a nonblocking receive start call initiates the receive operation, but does not complete it. The call can return before a message is stored into the receive buffer. A separate receive complete call is needed to complete the receive operation and verify that the data has been received into the receive buffer. With suitable hardware, the transfer of data into the receiver memory may proceed concurrently with computations done after the receive was initiated and before it completed.
(words in bold are also bolded in the standard text; emphasis added by me)
Although this text comes from the section about non-blocking communication (§3.7 of MPI-3.0; the text is exactly the same in MPI-2.2), it also applies to persistent communication requests.
I haven't used MVAPICH2, but I am able to speak about how things are implemented in Open MPI. Whenever a non-blocking operation is initiated or a persistent communication request is started, the operation is added to a queue of pending operations and is then progressed in one of the two possible ways:
if Open MPI was compiled without an asynchronous progression thread, outstanding operations are progressed on each call to a send/receive or to some of the wait/test operations;
if Open MPI was compiled with an asynchronous progression thread, operations are progressed in the background even if no further communication calls are made.
The default behaviour is not to enable the asynchronous progression thread as doing so increases the latency of the operations somehow.
The MVAPICH site is unreachable at the moment from here, but earlier I saw a mention of asynchronous progress in the features list. Probably that's where you should start from - search for ways to enable it.
Also note that MV2_SMP_EAGERSIZE controls the shared memory protocol eager message size and does not affect the InfiniBand protocol, i.e. it can only improve the communication between processes that reside on the same cluster node.
By the way, there is no guarantee that the receive operations would be started before the ready send operations in the neighbouring ranks, so they might not function as expected as the ordering in time is very important there.
For MPICH, you can set MPICH_ASYNC_PROGRESS=1 environment variable when runing mpiexec/mpirun. This will spawn a background process which does "asynchronous progress" stuff.
MPICH_ASYNC_PROGRESS - Initiates a spare thread to provide
asynchronous progress. This improves progress semantics for
all MPI operations including point-to-point, collective,
one-sided operations and I/O. Setting this variable would
increase the thread-safety level to
MPI_THREAD_MULTIPLE. While this improves the progress
semantics, it might cause a small amount of performance
overhead for regular MPI operations.
from MPICH Environment Variables
I have tested on my cluster with MPICH-3.1.4, it worked! I believe MVAPICH will also work.