Task Switching in Arm

I am reading the Arm Architecture Reference Manual and I have some theoretical questions.
Well at first I am confused whether by context switching we mean task switching?
Secondly, from my experience with the Intel 80386 architecture, I remember there were Task Descriptors and some other mechanisms that automatically saved the state of the task. Here in arm how is it done? Is it done, let's say "manually", by saving registers in stack?
And that ASID (Address Space ID) is linked to the previous that I asked somehow?

Well at first I am confused whether by context switching we mean task switching?
Yes, task switching is exactly the same as context switching.
here in arm how is it done? Is it done, let's say "manually", by saving registers in stack?
Yes, we save the task context on the stack, usually the privileged-mode (IRQ/SVC) stack, copy the context into the task control block (TCB), then restore the context from the TCB of the task that is going to run next. Here is some pseudocode:
irq_handler:
    sub lr, lr, #4      // on IRQ entry LR is 4 bytes past the return address
    push {lr}
    // push cpu context
    // copy the context to task's tcb
    // get tcb of another task which is going to run
    // copy the tcb context back to stack
    // pop cpu context
    pop {pc}
And that ASID (Address Space ID) is linked to the previous that I asked somehow?
Don't know this

If you have two threads, each with its own stack (an array holding the saved register values), and an ISR that saves the state of the running thread and switches to the other one, then that is a context switch. The simplest example is an operating system with two threads (one producer, one consumer), where the switch can look similar to the code here.
/*
 * threadswitch - change thread
 *
 * The thread stack-pointer is supplied as a parameter.
 * The old thread's stack-pointer value is saved to the array
 * os_thread_info_array, and a new thread is selected from the array.
 * The stack pointer of the new thread is returned.
 */
unsigned int os_internal_threadswitch( unsigned int old_sp )
{
    unsigned int new_sp;
    os_number_of_thread_switches += 1; /* Increase thread-switch counter. */
    /* Print line 1 of an informational message. */
    printf( "\nPerforming thread-switch number %d. The system has been running for %d ticks.\n",
            os_number_of_thread_switches,
            os_get_internal_globaltime() );
    /* Save the stack pointer of the old thread. */
    os_thread_info_array[ os_currently_running_thread ].thread_sp = old_sp;
    /* Print part 1 of a message saying which threads are involved this time. */
    printf( "Switching from thread-ID %d ",
            os_thread_info_array[ os_currently_running_thread ].thread_id );
    /* Perform the scheduling decision (round-robin). */
    os_currently_running_thread += 1;
    if( os_currently_running_thread >= os_current_thread_count )
    {
        os_currently_running_thread = 0;
    }
    /* Print part 2 of the informational message. */
    printf( "to thread-ID %d.\n",
            os_thread_info_array[ os_currently_running_thread ].thread_id );
    /* Get the stack pointer of the new thread. */
    new_sp = os_thread_info_array[ os_currently_running_thread ].thread_sp;
    /* Return. */
    return( new_sp );
}

The "context" generally refers to the current state of the CPU, i.e. the contents of the registers. Each "task" (a.k.a. thread) has its own Task Control Block structure (TCB) that stores the pertinent information that the OS knows about the task, such as priority, entry point, name, stack size, etc. Generally, the current CPU context is saved on the stack of the running task whenever that task is swapped out (the TCB has a pointer to the task stack). The stack pointer is then saved to a known location (usually in the TCB), and the CPU context is restored with info from the TCB and stack of the next task to run. After the switch, the stack pointer is pointing to the stack of the newly running task, and a return goes back to the next instruction after the last call made by that task. That is a context switch.
I don't know why people are indicating that a context switch would be in an ISR. A context switch usually occurs during a system call that causes the running task to block, such as a sleep call or a semaphore get call, though it may also occur when the system tick ISR runs and wakes up a higher-priority task, or determines that the current task's timeslice has expired and another task of equal priority is ready to run. The context switch is just a function called by the OS scheduler, which is called from various other system functions, and it does not make sense for it to be an ISR, though I suppose it could be implemented as a "software" interrupt call, and perhaps that was what they had in mind.
The point is that context switches don't only occur as a result of an interrupt, which is the impression I got from other responses. In fact, they occur far more often when a system call is made by a task.
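The save-and-restore step described in these answers can be modeled in plain C. This is only a sketch under stated assumptions: the `cpu_context` and `tcb` types and the `context_switch` function are hypothetical names, the register file is a simplified 32-bit ARM layout, and the whole context is copied into the TCB rather than pushed on the task stack. A real kernel does this in assembly, since C cannot read or write the live registers directly.

```c
#include <stdint.h>

/* Hypothetical, simplified register file of a 32-bit ARM core. */
typedef struct {
    uint32_t r[13];   /* r0-r12 */
    uint32_t sp;      /* stack pointer */
    uint32_t lr;      /* link register */
    uint32_t pc;      /* program counter */
    uint32_t cpsr;    /* status register */
} cpu_context;

/* Hypothetical task control block: bookkeeping plus the saved context. */
typedef struct {
    int         priority;
    cpu_context saved;
} tcb;

/* Save the running task's registers into its TCB, then load the
 * registers of the task that is going to run next. */
void context_switch(cpu_context *cpu, tcb *outgoing, const tcb *incoming)
{
    outgoing->saved = *cpu;
    *cpu = incoming->saved;
}
```

Keeping the context on the task's own stack and storing only the stack pointer in the TCB follows the same pattern; only the place where the registers are parked changes.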

Here is code that does exactly what you ask for - https://github.com/DISTORTEC/distortos/blob/master/source/architecture/ARM/ARMv6-M-ARMv7-M/ARMv6-M-ARMv7-M-PendSV_Handler.cpp . On exception entry some registers are saved automatically, so you just save the remaining ones, switch the stack pointer and do the opposite: unstack the "remaining" registers and exit the exception. This gets a bit harder if you also need to save FPU registers, because these don't need to be saved every time (they are unused if the thread doesn't do any FPU calculations).
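To make the split between "saved automatically" and "remaining" registers concrete, here is a sketch of the stack layout on an ARMv6-M/ARMv7-M core without FPU state. The struct names are hypothetical, and the order of the software-saved registers is an assumption (it depends on how the handler pushes them); the hardware frame order is the one the core uses on exception entry.

```c
#include <stddef.h>
#include <stdint.h>

/* Pushed by the core itself on exception entry (lowest address first). */
typedef struct {
    uint32_t r0, r1, r2, r3, r12, lr, pc, xpsr;
} hw_stacked_frame;

/* Pushed by the handler in software; this order is an assumption. */
typedef struct {
    uint32_t r4, r5, r6, r7, r8, r9, r10, r11;
} sw_saved_frame;

/* What a suspended thread's stack looks like from its saved stack
 * pointer upwards once the handler has pushed the remaining registers. */
typedef struct {
    sw_saved_frame   sw;
    hw_stacked_frame hw;
} thread_frame;
```

A new thread can be started by building one such frame on its stack by hand, with `pc` set to the entry function, and letting the normal exception return path unstack it.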


How to update the nice priority of a process from within the Linux kernel?

Every process is first created by a fork system call, which ends up calling _do_fork() in file /kernel/fork.c at line 2416. Inside _do_fork() the function copy_process() is called, which is defined in /kernel/fork.c at line 1841. This function then calls sched_fork() at line 2068 which according to the commentary inside the source code “performs a scheduler related setup of the newly created process and assigns it to a CPU”. The function sched_fork() is defined in file /kernel/sched/core.c at line 2999.
I wish to overwrite the nice priority of regular processes that have an odd process ID from within the Linux kernel itself (Linux kernel v5.3.18 to be specific). I am not interested in doing it from userspace. This answer suggests this should be theoretically as simple as setting the static_prio of a task_struct. So I reasoned that setting both the dynamic and static priorities in sched_fork would be a good choice, where the dynamic priority is (first?) assigned to a value (please click the previous link to see the source code of sched_fork).
For example, I tried to replace the line p->prio = current->normal_prio; with the following code:
if(fair_policy(p->policy))
    p->prio = p->static_prio = p->normal_prio = 117;
After compiling and installing, when I call getpriority() from userspace, it does indeed return the nice priority -3 (equivalent to 117), but behavior-wise it is still acting as if its priority is the default value 0 (i.e. it's still assigned the same CPU runtime as the default priority).
So the question is, why is it that modifying p->static_prio and p->prio doesn't have any influence on the CPU timeslice that p receives?
Here follows my reasoning:
Since getpriority returns -3, we at least know that 1) the static priority is indeed modified to 117, and 2) the process does get enqueued to the CFS runqueue (since the process runs successfully).
I suspect that the load weight was calculated some place earlier, based on the 'old' static priority. For example, I notice that later on in sched_fork there's a call to init_entity_runnable_average(&p->se);, that is defined as follows:
/* Give new sched_entity start runnable values to heavy its load in infant time */
void init_entity_runnable_average(struct sched_entity *se)
{
    struct sched_avg *sa = &se->avg;
    memset(sa, 0, sizeof(*sa));
    /*
     * Tasks are initialized with full load to be seen as heavy tasks until
     * they get a chance to stabilize to their real load level.
     * Group entities are initialized with zero load to reflect the fact that
     * nothing has been attached to the task group yet.
     */
    if (entity_is_task(se))
        sa->load_avg = scale_load_down(se->load.weight);
    /* when this task enqueue'ed, it will contribute to its cfs_rq's load_avg */
}
So I think the se->load.weight may be based on the static_priority prior to it being overwritten. But if that is the case, then where does se->load.weight first get initialized when it is forked? I can't seem to find it.
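As background for the suspicion above, here is a user-space sketch of how a static priority relates to a nice value and to a CFS load weight. It is not kernel code: the function names are hypothetical, and the table is reproduced from memory of the kernel's sched_prio_to_weight array, so check it against your own tree. To my understanding the kernel performs this lookup in set_load_weight(), indexed by static_prio - MAX_RT_PRIO.

```c
#define MAX_RT_PRIO  100   /* first non-realtime priority */
#define DEFAULT_PRIO 120   /* static priority of nice 0 */

/* Load weight per nice level, nice -20 first, nice 19 last.
 * Assumption: copied from memory, verify against kernel/sched/core.c. */
static const int prio_to_weight[40] = {
    88761, 71755, 56483, 46273, 36291,
    29154, 23254, 18705, 14949, 11916,
     9548,  7620,  6100,  4904,  3906,
     3121,  2501,  1991,  1586,  1277,
     1024,   820,   655,   526,   423,
      335,   272,   215,   172,   137,
      110,    87,    70,    56,    45,
       36,    29,    23,    18,    15,
};

/* Nice value that corresponds to a static priority. */
int static_prio_to_nice(int static_prio)
{
    return static_prio - DEFAULT_PRIO;
}

/* Load weight for a static priority, or -1 outside the 100..139 range. */
int static_prio_to_weight(int static_prio)
{
    int idx = static_prio - MAX_RT_PRIO;
    if (idx < 0 || idx >= 40)
        return -1;
    return prio_to_weight[idx];
}
```

If the weight is indeed only derived from static_prio at specific points, then a task whose static_prio is changed afterwards keeps the old weight until the lookup is repeated, which would match the behavior described in the question.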
Edit:
I modified the question to incorporate the issues that @MarcoBonelli addressed.

How to create an uninterruptible sleep in C?

I'm looking to create a state of uninterruptible sleep for a program I'm writing. Any tips or ideas about how to create this state would be helpful.
So far I've looked into the wait_event() function defined in wait.h, but was having little luck implementing it. When trying to initialize my wait queue the compiler complained
warning: parameter names (without types) in function declaration
static DECLARE_WAIT_QUEUE_HEAD(wq);
Has anyone had any experience with the wait_event() function or creating an uninterruptible sleep?
The functions that you're looking at in include/linux/wait.h are internal to the Linux kernel. They are not available to userspace.
Generally speaking, uninterruptible sleep states are considered undesirable. Under normal circumstances, they cannot be triggered by user applications except by accident (e.g, by attempting to read from a storage device that is not responding correctly, or by causing the system to swap).
You can make sleep "signal-aware".
sleep can be interrupted by a signal, in which case the pause stops and sleep returns the amount of time still left. The application can choose to handle the signal and, if needed, resume sleeping for the time left.
Actually, you should use the synchronization objects provided by the operating system you're working on, or simply check the return value of the sleep function. If it returns a value greater than zero, your sleep was interrupted. In that case, call sleep again with the returned value, which is the time still left (probably in a loop, in case further interrupts occur during that interval).
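The retry loop described above can be sketched as follows, assuming a POSIX system where sleep() returns the number of seconds left when a signal handler interrupts it. The name `sleep_full` is hypothetical; it returns how many times the sleep had to be resumed.

```c
#include <unistd.h>

/* Sleep for the full duration, resuming after every interruption.
 * Returns the number of times sleep() was cut short by a signal. */
unsigned int sleep_full(unsigned int seconds)
{
    unsigned int interruptions = 0;
    unsigned int remaining = seconds;

    while (remaining > 0) {
        remaining = sleep(remaining);   /* 0 means the time elapsed */
        if (remaining > 0)
            interruptions++;
    }
    return interruptions;
}
```

This is not an uninterruptible sleep in the kernel sense: signals are still delivered and handled, the caller just does not return early because of them.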
On the other hand, if you really want a real-uninterruptible custom sleep function, I may suggest something like the following:
void uninterruptible_sleep(long time, long factor)
{
    volatile long i, j; // volatile so the compiler does not optimize the empty loops away
    __asm__("cli"); // disable interrupts
    for(i=0; i<time; ++i)
        for(j=0; j<factor; ++j)
            ; // custom timer loop
    __asm__("sti"); // re-enable interrupts
}
cli and sti are x86 assembly instructions that clear and set the IF (interrupt flag) of the CPU, which disables (cli) or enables (sti) maskable interrupts. They are privileged instructions, so this only works in code that runs with sufficient privilege (for example in kernel mode), not in an ordinary user process. On a multiprocessor system, additional synchronization precautions are needed, because these instructions affect only the processor that executes them. Moreover, a function like the one above is very system (CPU) dependent: the inner loop has to be calibrated against the CPU frequency (instructions executed per second) to measure an exact time interval. So if you really want to get rid of every possible interrupt, you may use a function like the one above. But be careful: if your program deadlocks while interrupts are disabled, you will need to restart your system.
(The inline assembly syntax I have written is for gcc compiler)

Deadlock of powerfail sequence during write to flash page

I'm currently working on an embedded project using an ARM Cortex M3 microcontroller with FreeRTOS as system OS. The code was written by a former colleague and sadly the project has some weird bugs which I have to find and fix as soon as possible.
Short description: The device is integrated into vehicles and sends some "special" data using an integrated modem to a remote server.
The main problem: Since the device is integrated into a vehicle, the power supply of the device can be lost at any time. Therefore the device stores some parts of the "special" data to two reserved flash pages. This code module is laid out as an eeprom emulation on two flash pages(for wear leveling and data transfer from one flash page to another).
The eeprom emulation works with so called "virtual addresses", where you can write data blocks of any size to the currently active/valid flash page and read it back by using those virtual addresses.
The former colleague implemented the eeprom emulation as multitasking module, where you can read/write to the flash pages from every task in the application. At first sight everything seems fine.
But my project manager told me that the device always loses some of the "special" data at moments when the power supply level in the vehicle drops to a few volts and the device tries to save the data to flash.
Normally the power supply is about 10-18 volts, but if it goes down to under 7 volts, the device receives an interrupt called powerwarn and it triggers a task called powerfail task.
The powerfail task has the highest priority of all tasks and executes some callbacks where e.g. the modem is turned off and also where the "special" data is stored in the flash page.
I tried to understand the code and debugged for days/weeks and now I'm quite sure that I found the problem:
Within the callbacks that the powerfail task executes (called powerfail callbacks), there are RTOS calls that suspend other tasks. Unfortunately, one of those suspended tasks may have been in the middle of an unfinished EE_WriteBlock() call just before the powerwarn interrupt was received.
So the powerfail task executes the callbacks, and one of the callbacks calls EE_WriteBlock(), where the task can't take the mutex because another (now suspended) task already holds it --> Deadlock!
This is the routine to write data to flash:
uint16_t
EE_WriteBlock (EE_TypeDef *EE, uint16_t VirtAddress, const void *Data, uint16_t Size)
{
    .
    .
    xSemaphoreTakeRecursive(EE->rw_mutex, portMAX_DELAY);
    /* Write the variable virtual address and value in the EEPROM */
    .
    .
    .
    xSemaphoreGiveRecursive(EE->rw_mutex);
    return Status;
}
This is the RTOS specific code when 'xSemaphoreTakeRecursive()' is called:
portBASE_TYPE xQueueTakeMutexRecursive( xQueueHandle pxMutex, portTickType xBlockTime )
{
    portBASE_TYPE xReturn;
    /* Comments regarding mutual exclusion as per those within
    xQueueGiveMutexRecursive(). */
    traceTAKE_MUTEX_RECURSIVE( pxMutex );
    if( pxMutex->pxMutexHolder == xTaskGetCurrentTaskHandle() )
    {
        ( pxMutex->uxRecursiveCallCount )++;
        xReturn = pdPASS;
    }
    else
    {
        xReturn = xQueueGenericReceive( pxMutex, NULL, xBlockTime, pdFALSE );
        /* pdPASS will only be returned if we successfully obtained the mutex,
        we may have blocked to reach here. */
        if( xReturn == pdPASS )
        {
            ( pxMutex->uxRecursiveCallCount )++;
        }
        else
        {
            traceTAKE_MUTEX_RECURSIVE_FAILED( pxMutex );
        }
    }
    return xReturn;
}
My project manager is happy that I've found the bug but he also forces me to create a fix as quickly as possible, but what I really want is a rewrite of the code.
Maybe one of you might think, just avoid the suspension of the other tasks and you are done, but that is not a possible solution, since this could trigger another bug.
Does anybody have a quick solution/idea how I could fix this deadlock problem?
Maybe I could use xTaskGetCurrentTaskHandle() in EE_WriteBlock() to determine who has the ownership of the mutex and then give it if the task is not running anymore.
Thx
Writing flash, on many systems, requires interrupts to be disabled for the duration of the write, so I'm not sure how powerFail can run while a write is in progress. But anyway:
Don't control access to the reserved flash pages directly with a mutex - use a blocking producer-consumer queue instead.
Delegate all those writes to one 'flashWriter' thread by queueing requests to it. If the threads requesting the writes require synchronous access, include an event or semaphore in the request struct that the requesting thread waits on after pushing its request. The flashWriter can signal it when done, (or after loading the struct with an error indication:).
There are variations on a theme - if all the write requesting threads need only synchronous access, maybe they can keep their own static request struct with their own semaphore and just queue up a pointer to it.
Use a producer-consumer queue class that allows a high-priority push at the head of the queue and, when powerfail runs, push a 'stopWriting' request at the front of the queue. The flashWriter will then complete any write operation in progress, pop the stopWriting request and so be instructed to suspend itself, (or you could use a 'stop' volatile boolean that the flashWriter checks every time before attempting to pop the queue).
That should prevent deadlock by removing the hard mutex lock from the flash write requests pushed in the other threads. It won't matter if other threads continue to queue up write requests - they will never be executed.
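The queue with a high-priority push at the head can be sketched as a small ring buffer. Everything here is hypothetical (the `flash_request` fields, the function names, the fixed capacity), and the sketch is single-threaded: it shows only the ordering, not the blocking or locking. On FreeRTOS the same ordering should be available from the stock queue API via xQueueSendToBack() and xQueueSendToFront().

```c
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_CAPACITY 8

typedef enum { REQ_WRITE, REQ_STOP_WRITING } request_kind;

/* Hypothetical request record handed to the flashWriter thread. */
typedef struct {
    request_kind   kind;
    unsigned short virt_address;
} flash_request;

typedef struct {
    flash_request slots[QUEUE_CAPACITY];
    size_t head;    /* index of the next request to pop */
    size_t count;   /* number of queued requests */
} request_queue;

/* Normal producers append at the tail. */
bool queue_push_back(request_queue *q, flash_request r)
{
    if (q->count == QUEUE_CAPACITY)
        return false;
    q->slots[(q->head + q->count) % QUEUE_CAPACITY] = r;
    q->count++;
    return true;
}

/* The powerfail path jumps the queue by inserting at the head. */
bool queue_push_front(request_queue *q, flash_request r)
{
    if (q->count == QUEUE_CAPACITY)
        return false;
    q->head = (q->head + QUEUE_CAPACITY - 1) % QUEUE_CAPACITY;
    q->slots[q->head] = r;
    q->count++;
    return true;
}

bool queue_pop(request_queue *q, flash_request *out)
{
    if (q->count == 0)
        return false;
    *out = q->slots[q->head];
    q->head = (q->head + 1) % QUEUE_CAPACITY;
    q->count--;
    return true;
}

/* flashWriter loop body: handle requests until the queue is empty or a
 * stop request is seen. Returns the number of writes performed. */
size_t flash_writer_drain(request_queue *q)
{
    size_t writes = 0;
    flash_request r;

    while (queue_pop(q, &r)) {
        if (r.kind == REQ_STOP_WRITING)
            break;
        writes++;   /* a real writer would program the flash page here */
    }
    return writes;
}
```

Because the stop request overtakes the pending writes, those writes stay in the queue untouched, which is the "they will never be executed" behavior described above.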
Edit: I've just had two more coffees and, thinking about this, the 'flashWriter' thread could easily become the 'FlashWriterAndPowerFail' thread:
You could arrange for your producer-consumer queue to return a pop() result of null if a volatile 'stop' boolean is set, no matter whether there were entries on the queue or not. In the 'FWAPF' thread, do a null-check after every pop() return and do the powerFail actions upon null or flashWrite actions if not.
When the powerFail interrupt occurs, set the stop bool and signal the 'count' semaphore in the queue to ensure that the FWAPF thread is made running if it's currently blocked on the queue.
That way, you don't need a separate 'powerFail' thread and stack - one thread can do the flashWrite and powerFail while still ensuring that there are no mutex deadlocks.

How does software recognize an interrupt has occured?

As we know, embedded C is used for task management, memory management, ISRs, file systems and so on.
I would like to know: if some task or process is running and an interrupt occurs at the same time, how does the software, process, or system come to know that the interrupt has occurred, pause the current task, and start serving the ISR?
Suppose I write the code below:
// Dummy Code
void main()
{
    for(;;)
        printf("\n forever");
}
// Dummy code for ISR for understanding
void ISR()
{
    printf("\n Interrupt occurred");
}
In the code above, if an external interrupt occurs, how does main() come to know that the interrupt occurred, so that it starts serving the ISR first?
main doesn't know. You have to execute some system-dependent function in your setup code (maybe in main) that registers the interrupt handler with the hardware interrupt routine/vector, etc.
Whether that interrupt code can execute a C function directly varies quite a lot; runtime conventions for interrupt procedures don't always follow runtime conventions for application code. Usually there's some indirection involved in getting a signal from the interrupt routine to your C code.
Your query: I understood your answer. But I wanted to know, when an interrupt occurs, how does the current task's execution get stopped/paused and the ISR start executing?
Well, Rashmi, to answer your query, read below.
When the microcontroller detects an interrupt, it stops execution of the program after finishing the current instruction. It then pushes the PC (program counter) onto the stack and loads the PC with the vector location of that interrupt, so program flow is directed to the interrupt service routine. On completion of the ISR, the microcontroller pops the stored program counter from the stack and loads it back into the PC, so program execution resumes from the location where it was stopped.
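The push-PC, load-vector, pop-PC sequence can be modeled with a toy CPU in C. This is an illustration only: `toy_cpu` and both functions are hypothetical, real hardware does these steps without any software involvement, and many cores save more than just the program counter.

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_WORDS 16
#define NUM_VECTORS 4

/* Toy model of the state that matters for interrupt entry and exit. */
typedef struct {
    uint32_t pc;                      /* program counter */
    uint32_t stack[STACK_WORDS];
    size_t   sp;                      /* number of words on the stack */
    uint32_t vectors[NUM_VECTORS];    /* ISR entry address per interrupt */
} toy_cpu;

/* What the hardware does when interrupt `irq` is accepted.
 * Returns 0 on success, -1 for a bad vector or a full stack. */
int enter_interrupt(toy_cpu *cpu, unsigned irq)
{
    if (irq >= NUM_VECTORS || cpu->sp == STACK_WORDS)
        return -1;
    cpu->stack[cpu->sp++] = cpu->pc;   /* remember where to resume */
    cpu->pc = cpu->vectors[irq];       /* jump to the ISR */
    return 0;
}

/* What the return-from-interrupt instruction does. */
int return_from_interrupt(toy_cpu *cpu)
{
    if (cpu->sp == 0)
        return -1;
    cpu->pc = cpu->stack[--cpu->sp];   /* resume the interrupted code */
    return 0;
}
```

The interrupted program never calls either function, which is why main() cannot "know" that an interrupt happened unless the ISR leaves something behind for it.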
Does that answer your query?
It depends on your target.
For example, the ATMEL mega family uses a pre-processor directive to register the ISR with an interrupt vector. When an interrupt occurs, the corresponding interrupt flag is raised in the relevant status register. If the global interrupt flag is raised, the program counter is stored on the stack before the ISR is called. This all happens in hardware and the main function knows nothing about it.
In order to allow main to know if an interrupt has occurred, you need to implement a shared data resource between the interrupt routine and your main function, and all the rules from RTOS programming apply here. This means that, as the ISR may be executed at any time, it is not safe to read from a shared resource from main without disabling interrupts first.
On an ATMEL target this could look like:
volatile int shared;
int main() {
    char status_register;
    int buffer;
    while(1) {
        status_register = SREG;
        CLI();
        buffer = shared;
        SREG = status_register;
        // perform some action on the shared resource here.
    }
    return 0;
}
void ISR(void) {
    // update shared resource here.
}
Please note that the ISR is not added to the vector table here. Check your compiler documentation for instructions on how to do that.
Also, an important thing to remember is that ISRs should be very short and very fast to execute.
On most embedded systems the hardware has some specific memory address that the instruction pointer will move to when a hardware condition indicates an interrupt is required.
When the instruction pointer is at this specific location it will then begin to execute the code there.
On a lot of systems the programmer will place only the address of the ISR at this location, so that when the interrupt occurs and the instruction pointer moves to the specific location, it will then jump to the ISR.
Try doing a Google search on "interrupt vectoring".
Interrupt handling is transparent to the running program. The processor branches automatically to a previously configured address, depending on the event, and this address is the corresponding ISR function. When returning from the interrupt, a special instruction resumes the interrupted program.
Actually, most of the time you won't ever want an interrupted program to know it has been interrupted. If you need such information, the program should call a driver function instead.
Interrupts are a hardware thing, not a software thing. When the interrupt signal hits the processor, the processor (generally) completes the current instruction, in some way, shape or form preserves the state (so it can get back to where it was), and in some way, shape or form starts executing the interrupt service routine. The ISR is generally not C code; at least the entry point is usually special, as the processor does not conform to the compiler's calling convention. The ISR might call C code, but then you end up with the mistakes that you made, such as calling printf, which should not be in an ISR. Once you are in C, it is hard to keep from writing general C code in an ISR rather than the typical get-in-and-get-out type of thing.
Ideally your application layer code should never know the interrupt happened; there should be no (hardware-based) residuals affecting your program. You may choose to leave something for the application to see, like a counter or other shared data, which you need to mark as volatile so the application and ISR can share it. It is not uncommon to have the ISR simply flag that an interrupt happened, while the application polls that flag/counter/variable and the handling happens primarily in the application, not the ISR. This way the application can make whatever system calls it wants. So long as the overall bandwidth or performance is met, this can and does work as a solution.
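The flag-and-poll pattern can be tried out on a hosted system by letting a C signal stand in for the hardware interrupt. That substitution is an assumption made only to keep the sketch runnable on a PC; on a microcontroller the handler would be a real ISR registered in the vector table. All function names are hypothetical.

```c
#include <signal.h>

/* Shared between the handler and the main loop. */
static volatile sig_atomic_t interrupt_pending = 0;

/* Stand-in for the ISR: get in, set the flag, get out. */
static void on_interrupt(int signo)
{
    (void)signo;
    interrupt_pending = 1;
}

/* Register the handler; returns 0 on success, -1 on failure. */
int install_handler(void)
{
    return signal(SIGINT, on_interrupt) == SIG_ERR ? -1 : 0;
}

/* Called from the main loop. Returns 1 if an event was handled. */
int poll_and_handle(void)
{
    if (!interrupt_pending)
        return 0;
    interrupt_pending = 0;
    /* The slow work (printf, system calls, ...) belongs here, in
     * application context, not in the handler. */
    return 1;
}
```

A main loop would call poll_and_handle() on every iteration; raise(SIGINT) or Ctrl-C then plays the role of the external interrupt.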
To be specific, software doesn't recognize the interrupt; that is the job of the microprocessor (INTC) or microcontroller.
An interrupt routine call is just like a normal function call for main(); the only difference is that main doesn't know when that routine will be called.
Every interrupt has a specific priority and vector address. Once the interrupt is received (either software or hardware), then depending on the interrupt priority and mask values, program flow is diverted to the specific vector location associated with that interrupt.
hope it helps.

exit function when interrupted

I have an interrupt function called, interrupt_Foo() {...} which turns on a flag when 1 second has elapsed, and a user-defined function foo_calling() {...} which calls another function foo_called() {...}. I want to stop the process in foo_called() when 1 second has elapsed.
The code snippet below may elaborate further my need:
void interrupt interrupt_foo() {
    ...
    if(1 second has elapsed) {
        flag1s = 1;
    } else {
        flag1s = 0;
    }
}
void foo_calling() {
    // need something here to stop the process of foo_called()
    ...
    (*fptr_called)(); // ptr to function which points to foo_called
    ...
}
void foo_called() {
    // or something here to stop the process of this function
    ...
    // long code
    ...
}
This is a real-time operating system, so polling the 1 second flag inside foo_called() at some point in the code is undesirable. Please help.
If you are willing to write non-portable code, and test the heck out of it before deploying it, and if the processor supports it, there may be a solution.
When the interrupt handler is called, the return address must be stored somewhere. If that is a location your code can query, like a fixed offset down the stack, then you can compare that address to the range occupied by your function to determine whether foo_called is executing. You can get the address of the function by storing a dummy address, compiling, parsing the map file, then updating the address and recompiling.
Then, if your processor supports it, you can replace the return address with the address of the last instruction(s) of foo_called (make sure you include the stack cleanup and register restoration code). Then exit the interrupt as normal, and the interrupt handling logic will return to the end of your interrupted function.
If the return address is not stored on the stack but in an unwritable register, you may still be able to force-quit your function, if the executable code is in writable memory. Just save the instruction at the interrupt's return address, then overwrite it with a jump instruction that jumps to the function end. In the caller code, add a detector that restores the overwritten instruction.
I would expect that your RTOS has some kind of timer signal/interrupt that you can use to notify you when one second has passed. For instance if it is a realtime UNIX/Linux then you would set a signal handler for SIGALRM for one second. On a RT variant of Linux this signal will have more granularity and better guarantees than on a non-RT variant. But it is still a good idea to set the signal for slightly less than a second and busy-wait (loop) until you reach one second.
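On a POSIX-like system, the SIGALRM idea can be combined with sigsetjmp/siglongjmp so that the long function is abandoned without polling a flag. This is a sketch under assumptions, not something the question's RTOS necessarily offers: it needs POSIX signals, the `iterations` parameter is a hypothetical stand-in for the "long code", and jumping out of a handler is only safe if the interrupted code holds no locks or half-updated state.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

static sigjmp_buf abort_point;

/* Runs when the alarm fires: jump back into foo_calling(). */
static void on_alarm(int signo)
{
    (void)signo;
    siglongjmp(abort_point, 1);
}

/* Stand-in for the long-running function; no polling inside. */
static void foo_called(unsigned long iterations)
{
    volatile unsigned long work = 0;
    while (work < iterations)
        work++;
}

/* Returns 0 if foo_called() finished in time, 1 if it was aborted
 * after `seconds`, and -1 if the handler could not be installed. */
int foo_calling(unsigned int seconds, unsigned long iterations)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGALRM, &sa, NULL) != 0)
        return -1;

    if (sigsetjmp(abort_point, 1) != 0)
        return 1;              /* arrived here from the handler */

    alarm(seconds);            /* arm the one-shot timer */
    foo_called(iterations);
    alarm(0);                  /* finished early: cancel the timer */
    return 0;
}
```

Passing 1 as the second argument of sigsetjmp makes siglongjmp restore the signal mask, so SIGALRM is not left blocked after the jump.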
