ARM A7 Linux raw interrupt handling possible?


I'd like to write an open-source core driver for controlling stepper motors in Linux, specifically for 3D printers.
The basic idea is that the driver reserves pins on one IO port, and then manipulates those pins at once. It receives a buffer full of "toggle this, toggle that" values, and then emits those to the port, using a hardware timer.
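To make the idea concrete, here is a minimal user-space sketch of the per-tick work. It assumes the patterns are precomputed and that only the bits in a mask belong to the driver. The names step_ring and step_ring_next() are hypothetical. In the real driver the current port value would come from readl(PA_VDAT) and the result would go back through writel().

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical ring of precomputed pin patterns for the reserved pins of
 * one port: filled from user space, drained by the timer ISR. */
struct step_ring {
    const uint32_t *patterns; /* one word per timer tick */
    size_t len;               /* number of entries in patterns (> 0) */
    size_t head;              /* index of the next entry to emit */
};

/* Return the next value to write to the port: pins outside step_mask keep
 * their current level, pins inside it take the next pattern. */
uint32_t step_ring_next(struct step_ring *r, uint32_t port, uint32_t step_mask)
{
    uint32_t pattern = r->patterns[r->head];

    r->head = (r->head + 1u) % r->len; /* wrap around at the end */
    return (port & ~step_mask) | (pattern & step_mask);
}
```

If every pin of the port were reserved for the driver, the read could be dropped and the ISR would reduce to a single store.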
Now the question is: Is there any way to handle a specific hardware interrupt as fast as possible?
The chip in question is an Allwinner H3, and I am using the TMR1 resource of said chip (IRQ 51). I can use it just fine, and it works as an interrupt as well:
static irqreturn_t stepCore_timer_interrupt(int irq, void *dev_id)
{
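writel(TMR1_IRQ_PEND, TMR_IRQ_ST_VREG); /* acknowledge the TMR1 interrupt */
icnt++;
porta_state = readl(PA_VDAT);           /* current port A levels */
porta_state &= porta_mask;              /* keep only the bits selected by porta_mask */
if(icnt & 0x00000001)
{
porta_state |= 0x00000001;              /* drive PA0 high on odd ticks */
}
writel(porta_state, PA_VDAT);           /* write all port A pins at once */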
return IRQ_HANDLED;
}
static struct irqaction stepCore_timer_irq = {
.name = "stepCore_timer",
.flags = IRQF_DISABLED | IRQF_NOBALANCING | IRQF_PERCPU,
.handler = stepCore_timer_interrupt,
.dev_id = NULL,
};
static void stepCore_timer_interrupt_setup(void)
{
int ret;
u32 val;
writel( 24000000, TMR1_INTV_VALUE_VREG );
writel( ( TMR1_MODE_CONTINUOUS | TMR1_CLK_PRES_1 | TMR1_CLK_SRC_OSC24M ), TMR1_CTRL_VREG );
ret = setup_irq(SUNXI_IRQ_TIMER1, &stepCore_timer_irq);
if (ret)
printk("%s: ERROR: failed to install irq %d\n", __func__, SUNXI_IRQ_TIMER1);
else
printk("%s: irq %d installed\n", __func__, SUNXI_IRQ_TIMER1);
ret = irq_set_affinity_hint(SUNXI_IRQ_TIMER1, cpumask_of(3));
if (ret)
printk("%s: ERROR: failed to set irq affinity for irq %d\n", __func__, SUNXI_IRQ_TIMER1);
else
printk("%s: set irq affinity for irq %d\n", __func__, SUNXI_IRQ_TIMER1);
/* Enable timer1 interrupt */
val = readl(TMR_IRQ_EN_VREG);
writel(val | TMR1_IRQ_EN, TMR_IRQ_EN_VREG);
}
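With TMR1_CLK_SRC_OSC24M and a prescaler of 1, the interval value of 24000000 used above should correspond to roughly one interrupt per second. A hypothetical helper for deriving the interval from a desired step rate might look like this; it assumes that one period lasts "interval" ticks of the prescaled clock, which is worth checking against the H3 manual for a possible off-by-one.

```c
#include <stdint.h>

/* Hypothetical helper: interval value for a periodic timer that should fire
 * rate_hz times per second. Assumes the counter is clocked at
 * clk_hz / prescaler and that one period lasts `interval` ticks.
 * Returns 0 for invalid input or a rate above the tick rate. */
uint32_t tmr_interval_for_rate(uint32_t clk_hz, uint32_t prescaler,
                               uint32_t rate_hz)
{
    uint32_t tick_hz;

    if (prescaler == 0u || rate_hz == 0u)
        return 0u;
    tick_hz = clk_hz / prescaler;
    if (rate_hz > tick_hz)
        return 0u;
    return tick_hz / rate_hz; /* integer division: rounds the period down */
}
```

For example, a 20 kHz step rate from the 24 MHz oscillator with prescaler 1 would need an interval of 1200.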
TMR1 is otherwise unused (in fact, I had to add it myself) and so far it works. However, there is quite some latency in handling this rather simple IRQ routine. Since I want to produce code that is usable for a 3D printer, I would very much like a more "stable" timer interrupt.
So, my question is: Is there any way to have a very short IRQ routine in Linux that has the highest possible priority? Or one that doesn't care about the Linux scheduler at all, and just "does its thing"? Basically a raw IRQ handler, ignoring what Linux thinks it should be?
The core it runs on is dedicated to just that task anyway. The handler will be as short as possible: fetch a u32 from an array, write that to the port, done.
Preferably I would like to have something that ignores the rest of Linux altogether. Yes, I know that that isn't the way to do it. But this is meant for a rather special case, so I have no qualms about adapting the regular kernel sources to suit those needs.
Oh, that reminds me, the kernel is 3.4.112 with the suitable preempt-rt patches.
Any help is greatly appreciated.
Greetings,
Chris

Here is a general solution to this issue. You can write a kernel module that replaces the existing low-level interrupt handling routine with your own routine, in which you handle your IRQ of interest and redirect all other IRQs to the existing kernel interrupt handling routine. This is possible on x86, where the low-level CPU instructions sidt and lidt let you read and load the address of the interrupt descriptor table (IDT). I believe it should be possible on ARM too.
Linux also offers CPU isolation through the isolcpus boot parameter. It takes a CPU out of the scheduler domain, i.e. no task is scheduled on that particular CPU unless you explicitly place one there (using taskset). After you take a CPU out of the scheduler domain, you can affine your interrupt to that isolated CPU via /proc/irq/IRQ_NUMBER/smp_affinity. All of your interrupts will then be handled by the isolated CPU, which is 100% dedicated to that interrupt, and with your own IRQ routine you have full control over the interrupt handling.
Hopefully this will help!
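A minimal sketch of the affinity step, assuming the kernel was booted with isolcpus=3 and that you know the Linux IRQ number of the timer (it may differ from the hardware number 51, so check /proc/interrupts). The helper names format_cpu_mask() and pin_irq_to_cpu() are hypothetical, and writing the procfs file requires root.

```c
#include <stdio.h>

/* Format the hex CPU bitmask expected by /proc/irq/<n>/smp_affinity for a
 * single CPU (CPU 3 -> "8"). Returns the snprintf() result. */
int format_cpu_mask(char *buf, size_t len, unsigned cpu)
{
    return snprintf(buf, len, "%lx", 1UL << cpu);
}

/* Pin `irq` to `cpu` by writing the mask to procfs (needs root).
 * Returns 0 on success, -1 on error. */
int pin_irq_to_cpu(unsigned irq, unsigned cpu)
{
    char path[64];
    char mask[32];
    FILE *f;

    snprintf(path, sizeof path, "/proc/irq/%u/smp_affinity", irq);
    format_cpu_mask(mask, sizeof mask, cpu);
    f = fopen(path, "w");
    if (f == NULL)
        return -1;
    if (fputs(mask, f) == EOF) {
        fclose(f);
        return -1;
    }
    return fclose(f) == 0 ? 0 : -1;
}
```

Calling pin_irq_to_cpu(irq, 3) is equivalent to running "echo 8 > /proc/irq/<irq>/smp_affinity" from a root shell.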

Have you thought about using FIQ for that? We have a blog post about it:
http://free-electrons.com/blog/fiq-handlers-in-the-arm-linux-kernel/

Related

How can I determine if execution takes place in thread mode or if an exception is active? (ARMv7-A architecture)

I am using FreeRTOS on an ARM Cortex-A9 CPU and I'm desperately trying to find out whether it is possible to determine if the processor is executing a normal thread or an interrupt service routine. It implements the ARMv7-A architecture.
I found a promising reference hinting at the ICSR register (VECTACTIVE bits), but it only exists in the Cortex-M family. Is there a comparable register in the A family as well? I tried to read out the processor mode in the current program status register (CPSR), but when read during an ISR I saw that the mode bits indicate supervisor mode rather than IRQ or FIQ mode.
It looks a lot like there is no way to determine which state the processor is in, but I wanted to ask anyway, maybe I missed something...
The processor has a PL390 Generic Interrupt Controller. Maybe it is possible to determine whether an interrupt has been triggered by reading some of its registers?
If anybody can give me a clue I would be very grateful!
Edit1:
The IRQ handler of FreeRTOS switches the processor to supervisor mode:
And subsequently switches back to system mode:
Can I just check whether the processor is in supervisor mode and assume that this means that execution takes place in an ISR, or are there other situations where the kernel may switch to supervisor mode without being in an ISR?
Edit2:
On request, I'll add an overall background description of what I want to achieve in the first place by solving the problem of knowing the current execution context.
I'm writing a set of libraries for the CortexA9 and FreeRTOS that will access periphery. Amongst others I want to implement a library for the available HW timer from the processor's periphery.
In order to secure access to the HW and to avoid multiple tasks trying to access the HW resource simultaneously, I added mutex semaphores to the timer library implementation. The first thing the lib function does when called is try to take the mutex. If that fails, the function returns an error; otherwise it continues its execution.
Let's focus on the function that starts the timer:
static ret_val_e TmrStart(tmr_ctrl_t * pCtrl)
{
ret_val_e retVal = RET_ERR_DEF;
BaseType_t retVal_os = pdFAIL;
XTtcPs * pHwTmrInstance;
//Check status of driver
if(pCtrl == NULL)
{
return RET_ERR_TMR_CTRL_REF;
}else if(!pCtrl->bInitialized )
{
return RET_ERR_TMR_UNINITIALIZED;
}else
{
retVal_os = xSemaphoreTake(pCtrl->osSemMux_Tmr, INSTANCE_BUSY_ACCESS_DELAY_TICKS);
if(retVal_os != pdPASS)
{
return RET_ERR_OS_SEM_MUX;
}
}
//Only dereference pCtrl after the NULL check above
pHwTmrInstance = (XTtcPs *) pCtrl->pHwTmrInstance;
//This function starts the timer
XTtcPs_Start(pHwTmrInstance);
(...)
Sometimes it can be helpful to start the timer directly inside an ISR. The problem is that, while the rest of the function would support it, the xSemaphoreTake() call MUST be changed to xSemaphoreTakeFromISR(); moreover, no wait ticks are supported when called from an ISR, in order to avoid a blocking ISR.
To get code that is suitable for both execution modes (thread mode and IRQ mode), we would need to change the function to first check the execution state and, based on that, invoke either xSemaphoreTake() or xSemaphoreTakeFromISR() before proceeding to access the HW.
That's the context of my question. As mentioned in the comments, I do not want to implement this by adding a parameter that the user must supply on every call to tell the function whether it has been called from a thread or an ISR, as I want to keep the API as slim as possible.
I could take the FreeRTOS approach and implement a copy of the TmrStart() function named TmrStartFromISR() which contains the ISR-specific calls to FreeRTOS's system resources. But I would rather avoid that too, as duplicating all my functions makes the code harder to maintain overall.
So determining the execution state by reading out some processor registers would be the only way that I can think of. But apparently the A9 does not supply this information easily unfortunately, unlike the M3 for example.
Another approach that just came to my mind could be to set a global variable in the FreeRTOS assembler code that handles exceptions. It could be set in portSAVE_CONTEXT and reset in portRESTORE_CONTEXT.
The downside of this solution is that the library then would not work with the official A9 port of FreeRTOS which does not sound good either. Moreover you could get problems with race conditions if the variable is changed right after it has been checked by the lib function, but I guess this would also be a problem when reading the state from a processor registers directly... Probably one would need to enclose this check in a critical section that prevents interrupts for a short period of time.
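A minimal sketch of the global-variable idea, using a nesting counter instead of a simple flag so that nested interrupts are handled. All names are hypothetical: the port's assembly (or the IRQ dispatcher) would have to call the enter/leave hooks, or increment and decrement the variable directly, around the C handler. As long as increments and decrements stay balanced, any interrupt that preempts the checking code restores the counter before it returns, so the value a function reads still describes that function's own context; this removes much of the race-condition concern.

```c
#include <stdint.h>

/* Hypothetical nesting counter maintained by the IRQ entry/exit path. */
static volatile uint32_t isr_nesting = 0u;

/* Call right after the context has been saved on IRQ entry. */
void isr_context_enter(void) { isr_nesting++; }

/* Call right before the context is restored on IRQ exit. */
void isr_context_leave(void) { isr_nesting--; }

/* Non-zero while executing inside an ISR, at any nesting depth. */
int in_isr_context(void) { return isr_nesting != 0u; }
```

TmrStart() could then call xSemaphoreTakeFromISR() when in_isr_context() is non-zero and xSemaphoreTake() otherwise, without changing its signature.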
If somebody sees some other solutions that I did not think of please do not hesitate to bring them up.
Also please feel free to discuss the solutions I brought up so far.
I'd just like to find the best way to do it.
Thanks!

On a Cortex-A processor, when an interrupt handler is triggered, the processor enters IRQ mode, with interrupts disabled. This is reflected in the state field of CPSR. IRQ mode is not suitable to receive nested interrupts, because if a second interrupt happened, the return address for the first interrupt would be overwritten. So, if an interrupt handler ever needs to re-enable interrupts, it must switch to supervisor mode first.
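For reference, the mode field is CPSR bits [4:0]. Below is a small decoding helper (hypothetical name) plus a GCC/Clang inline-assembly read for ARMv7-A targets, intended to be called from privileged code. As the answer goes on to explain, seeing Supervisor mode here does not by itself tell you that you are outside an ISR.

```c
#include <stdint.h>

/* Decode the ARMv7-A CPSR mode field M[4:0]. */
const char *cpsr_mode_name(uint32_t cpsr)
{
    switch (cpsr & 0x1Fu) {
    case 0x10u: return "User";
    case 0x11u: return "FIQ";
    case 0x12u: return "IRQ";
    case 0x13u: return "Supervisor";
    case 0x16u: return "Monitor";
    case 0x17u: return "Abort";
    case 0x1Au: return "Hyp";
    case 0x1Bu: return "Undefined";
    case 0x1Fu: return "System";
    default:    return "Reserved";
    }
}

#if defined(__arm__)
/* Read CPSR on an ARMv7-A target (only built for ARM). */
static inline uint32_t read_cpsr(void)
{
    uint32_t v;
    __asm__ volatile ("mrs %0, cpsr" : "=r"(v));
    return v;
}
#endif
```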
Generally, one of the first things that an operating system's interrupt handler does is to switch to supervisor mode. By the time the code reaches a particular driver, the processor is in supervisor mode. So the behavior you're observing is perfectly normal.
A FreeRTOS interrupt handler is a C function. It runs with interrupts enabled, in supervisor mode. If you want to know whether your code is running in the context of an interrupt handler, never call the interrupt handler function directly from task code, and when it calls auxiliary functions that care, pass them a parameter that indicates who the caller is.
void code_that_wants_to_know_who_called_it(int context) {
if (context != 0) {
// called from an interrupt handler
} else {
// called from outside an interrupt handler
}
}
void my_handler1(void) {
code_that_wants_to_know_who_called_it(1);
}
void my_handler2(void) {
code_that_wants_to_know_who_called_it(1);
}
int main(void) {
// Install_Interrupt, EVENT1 and EVENT2 stand for your port's API
Install_Interrupt(EVENT1, my_handler1);
Install_Interrupt(EVENT2, my_handler2);
code_that_wants_to_know_who_called_it(0);
}

Why does the Linux kernel not stop at the first handler for a shared IRQ that returns IRQ_HANDLED?

I'm sure there's a good reason for this, but I can't see what it is. Inside __handle_irq_event_percpu the kernel loops over all the handlers registered for a particular IRQ line and calls each of them. What I don't understand is why this loop isn't exited as soon as a handler returns IRQ_HANDLED. It seems like a simple performance improvement, so there must be something I don't understand.
Does anyone know why?

In the Linux source tree, __handle_irq_event_percpu() is in kernel/irq/handle.c:
irqreturn_t __handle_irq_event_percpu(struct irq_desc *desc, unsigned int *flags)
{
irqreturn_t retval = IRQ_NONE;
unsigned int irq = desc->irq_data.irq;
struct irqaction *action;
record_irq_time(desc);
for_each_action_of_desc(desc, action) {
irqreturn_t res;
trace_irq_handler_entry(irq, action);
res = action->handler(irq, action->dev_id);
trace_irq_handler_exit(irq, action, res);
if (WARN_ONCE(!irqs_disabled(),"irq %u handler %pS enabled interrupts\n",
irq, action->handler))
local_irq_disable();
switch (res) {
case IRQ_WAKE_THREAD:
/*
* Catch drivers which return WAKE_THREAD but
* did not set up a thread function
*/
if (unlikely(!action->thread_fn)) {
warn_no_thread(irq, action);
break;
}
__irq_wake_thread(desc, action);
/* Fall through - to add to randomness */
case IRQ_HANDLED:
*flags |= action->flags;
break;
default:
break;
}
retval |= res;
}
return retval;
}
The for_each_action_of_desc(desc, action) macro travels in the action list of the IRQ descriptor:
#define for_each_action_of_desc(desc, act) \
for (act = desc->action; act; act = act->next)
[...]
struct irq_desc {
struct irq_common_data irq_common_data;
struct irq_data irq_data;
unsigned int __percpu *kstat_irqs;
irq_flow_handler_t handle_irq;
struct irqaction *action; /* IRQ action list */
[...]
struct irqaction {
irq_handler_t handler;
void *dev_id;
void __percpu *percpu_dev_id;
struct irqaction *next;
irq_handler_t thread_fn;
struct task_struct *thread;
struct irqaction *secondary;
unsigned int irq;
unsigned int flags;
unsigned long thread_flags;
unsigned long thread_mask;
const char *name;
struct proc_dir_entry *dir;
} ____cacheline_internodealigned_in_smp;
The action list has multiple entries when the interrupt line is shared by several devices, and several of those devices may raise an interrupt at the same time. Hence, the handler of every device sharing the line has to be called to check whether there is something to do.
N.B.:
This answer argues the point in more detail.
This blog article depicts the steps of interrupt handling in the Linux kernel.

Inside __handle_irq_event_percpu the kernel loops over all the handlers registered for a particular IRQ line and calls it. What I don't understand is why this loop isn't exited when the first handler returning IRQ_HANDLED is reached? It seems like a simple performance improvement, so there must be something I don't understand.
There are 2 cases to consider - shared edge triggered IRQs and shared level triggered IRQs.
Shared Edge Triggered IRQs
In this case, 2 or more devices can send an IRQ at the same time or at similar times. If this happens and the "for each driver" loop is exited when the first handler returns IRQ_HANDLED then other devices can/will become stuck in a "waiting for IRQ handler's attention" state (most likely causing devices to lock up permanently). To avoid that, for edge triggered IRQs, the kernel's "for each driver" loop must notify all drivers (and can't stop as soon as one returns IRQ_HANDLED).
Note that shared edge triggered IRQs are rare. For 80x86 PCs it's possible when there are more than 2 serial port controllers (which can be solved by using the same driver for all serial port controllers and dealing with the problem in the driver and not in the kernel's IRQ management code), but apart from that shared edge triggered IRQs simply don't exist (on 80x86 PCs).
Shared Level Triggered IRQs
In this case, 2 or more devices can send an IRQ at the same time or at similar times; but if this happens and the "for each driver" loop is exited when the first handler returns IRQ_HANDLED, the other IRQs (from other devices) are not lost. Instead, the interrupt controller will see "level is still being triggered by at least one device" and will re-issue the IRQ (and keep sending more IRQs until all devices are satisfied).
For shared level triggered IRQs, it's a performance compromise (that has nothing to do with "correctness"). More specifically:
If it's very likely that multiple devices will want attention at the same or similar time; then you can improve performance by continuing the loop (when a driver returns IRQ_HANDLED) because it's likely that this will avoid the cost of the interrupt controller re-issuing the IRQ.
If it's very unlikely that multiple devices will want attention at the same or similar time; then you can improve performance by stopping the loop as soon as a driver returns IRQ_HANDLED, because it's likely that this will avoid the cost of executing unnecessary device drivers' interrupt handlers.
Note that this depends on the order in which device drivers' IRQ handlers are called. To understand this, imagine there are 2 devices sharing an IRQ line and almost all IRQs come from the first device. If the first device's driver's IRQ handler is called first and returns IRQ_HANDLED, then it'd be unlikely that the second device also sent an IRQ at the same time; but if the second device's driver's IRQ handler is called first and returns IRQ_HANDLED, then it'd be likely that the first device also sent an IRQ at the same time.
In other words; if the kernel sorted the list of device drivers in order of "chance the device sent an IRQ"; then it becomes more likely that stopping the loop as soon as a driver returns IRQ_HANDLED will improve performance (and it becomes more likely that the first driver called will return IRQ_HANDLED sooner).
However tracking statistics and "being smarter" (determining how to optimize performance dynamically based on those statistics) would also add a little overhead, and (at least in theory, especially if device drivers' interrupt handlers are extremely fast anyway) this could cost more performance than you'd gain.
Essentially, it'd take a lot of work (research, benchmarking) to quantify and maximize the potential benefits, and it's a lot easier to not bother (and always call all device drivers' interrupt handlers, even when that is worse).
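The correctness difference between the edge and level cases can be shown with a small user-space toy model (not kernel code; all names are hypothetical). Each handler claims the interrupt only if its own device is pending, mirroring the IRQ_HANDLED/IRQ_NONE convention. With an edge-triggered line, a single dispatch pass that stops at the first IRQ_HANDLED leaves the second device unserviced; with a level-triggered line, the controller re-raises the IRQ and the second device is served on the next pass, at the cost of an extra pass.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy device: only a "wants service" flag. */
struct toy_dev { bool pending; };

/* Mirrors a shared-IRQ handler: claim (true, like IRQ_HANDLED) only if our
 * device is pending, otherwise return false (like IRQ_NONE). */
static bool toy_handler(struct toy_dev *d)
{
    if (!d->pending)
        return false;     /* not ours */
    d->pending = false;   /* acknowledge the device */
    return true;
}

/* One pass over the action list, optionally stopping at the first handler
 * that claims the interrupt. Returns the number of handlers called. */
size_t toy_dispatch(struct toy_dev *devs, size_t n, bool stop_at_first)
{
    size_t calls = 0;

    for (size_t i = 0; i < n; i++) {
        calls++;
        if (toy_handler(&devs[i]) && stop_at_first)
            break;
    }
    return calls;
}

static bool any_pending(const struct toy_dev *devs, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (devs[i].pending)
            return true;
    return false;
}

/* Edge-triggered: one edge gives exactly one dispatch pass.
 * Level-triggered: the line stays asserted, so the IRQ is re-raised until
 * no device is pending. Returns the number of passes. */
size_t toy_deliver(struct toy_dev *devs, size_t n, bool level, bool stop_at_first)
{
    size_t passes = 0;

    do {
        toy_dispatch(devs, n, stop_at_first);
        passes++;
    } while (level && any_pending(devs, n));
    return passes;
}
```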

STM32 DMA from timer count to memory

I'm using an STM32H743. I have an external clock signal coming in on a GPIO pin, and I want to very accurately measure elapsed time between each rising (or falling) edge in the external clock signal. So I set things up so that TIM4 is triggered by the external clock, and TIM5 is triggered by the internal oscillator.
I wrote an IRQ so that whenever TIM4 triggers, an interrupt runs that captures TIM5's value. It seems to work OK, but I'm wondering if I can do it through DMA to avoid all the context switching and free up the CPU. Basically I want to set up a DMA so that each TIM4 event initiates a DMA transfer that copies the TIM5 counter value to a circular buffer somewhere.
I've searched through forums and the DMA documentation but I'm hazy on whether a timer register can be a valid DMA source. I was thinking maybe I could do something like this:
hDma->PAR = (uint32_t) &htim5.Instance->CNT;
hDma->M0AR = (uint32_t) myBufferPtr;
hDma->NDTR = myBufferSize;
hDma->CR |= (uint32_t)DMA_SxCR_EN;
But I'm not sure if this can work.
Short version: Can I use the timer's CNT register as a DMA transfer source? Would it be a peripheral-to-memory transfer? Or a memory-to-memory transfer? Are there other flags I need to make this work? Or is it not possible? Or is there another STM32 feature that would make it easier to count time between pulses?

Disclaimer
I must confess that my long practical experience with STM32 has so far been with mainstream controller families like STM32F0, STM32F3, STM32F4 and STM32L4.
Therefore I'm answering based on what those controllers would offer you in your situation.
The STM32H7 series is much more powerful; among other things, it offers several additional DMA technologies like DMA2D, MDMA and lots of other stuff that I'm not sure about.
But I think a simplified answer might also help you for now, so I'm daring to write it.
Can I use the timer's CNT register as a DMA transfer source? Would it be a peripheral-to-memory transfer? Or a memory-to-memory transfer? Are there other flags I need to make this work? Or is it not possible?
I would expect this to work.
I don't see a reason not to read the TIMx_CNT register in a DMA transfer.
The CNT register is definitely a peripheral address so you have to configure it as a peripheral-to-memory transfer.
I believe that the peripheral/memory distinction refers to the bus from which the DMA controller fetches the data (or the bus to which it delivers them) in the bus matrix implemented in every STM32.
Or is there another STM32 feature that would make it easier to count time between pulses?
Yes, there is:
Many of the TIM peripherals (not all are the same) offer you a feature called "Input Capture" that connects the channel (sub-)peripheral of the TIM instance to the input and has the main part of the (same!) TIM peripheral do the internal clocking.
A prerequisite for this is that the pin you'd like to measure has a TIMx_CHy alternate function, not "only" a TIMx_ETR one.
The TIM peripherals offer a wide range of configuration options, and they are a complicated mess until you get used to them.
As an introduction and a good overview, I recommend two application notes from ST:
AN4013 Application note. "STM32 cross-series timer overview", Rev.8
Which timers you have on your µC, and which features are offered by which one.
AN4776 Application note. "General-purpose timer cookbook for STM32 microcontrollers", Rev.3
How to use the timers you have. Check out section 2.6, input capture is on page 27.
Looking up those two, I found a third one you might want to check out for better precision, related to HRTIM timers:
AN4539 Application note. "HRTIM cookbook", Rev.4
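Whichever way the capture values are collected, the time between two edges is the difference of consecutive captures. A minimal sketch, assuming a free-running up-counter that wraps at 2^16 or 2^32 (auto-reload left at its maximum) and at most one wrap between two captures; the helper names are hypothetical.

```c
#include <stdint.h>

/* Ticks between two captures of a free-running up-counter that is `bits`
 * wide (16 or 32 on STM32 timers). Unsigned subtraction modulo 2^bits
 * handles a single counter wrap between the two captures. */
uint32_t capture_delta(uint32_t prev, uint32_t cur, unsigned bits)
{
    uint32_t mask = (bits >= 32u) ? 0xFFFFFFFFu : ((1u << bits) - 1u);
    return (cur - prev) & mask;
}

/* Convert ticks to nanoseconds for a timer clocked at tick_hz (> 0). */
uint64_t ticks_to_ns(uint32_t ticks, uint32_t tick_hz)
{
    return ((uint64_t)ticks * 1000000000u) / tick_hz;
}
```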

It is easily done using the STM32CubeIDE configurator:
1. Configure the timer, enable an input capture channel and enable DMA for it (mode circular, peripheral to memory, data width word/word). Enable interrupts.
2. Prepare a buffer for storing the captured counter values.
3. Start input capture in DMA mode before the main loop.
For high-speed operation you may copy data from timerCaptureBuffer to timerCaptureBufferSafe inside the half-complete and complete capture callbacks, for example with a DMA memory-to-memory transfer, to minimize the time spent in the HAL_TIM_IC_CaptureHalfCpltCallback and HAL_TIM_IC_CaptureCallback interrupts. Process adjacent captured values stored in timerCaptureBufferSafe after the memory-to-memory DMA callback signals that the data is ready. You may use signaling flags so that timerCaptureBufferSafe is not overwritten while it is being processed.
Here is an example:
#define TIM_BUFFER_SIZE 128
uint32_t timerCaptureBuffer[TIM_BUFFER_SIZE];
uint32_t timerCaptureBufferSafe[TIM_BUFFER_SIZE];
// ...
HAL_DMA_RegisterCallback(&hdma_memtomem_dma2_stream2,
HAL_DMA_XFER_CPLT_CB_ID,
myDMA_Callback22);
// ...
HAL_TIM_IC_Start_DMA(&htim2, TIM_CHANNEL_1, (uint32_t *)timerCaptureBuffer, TIM_BUFFER_SIZE);
// ...
void HAL_TIM_IC_CaptureHalfCpltCallback(TIM_HandleTypeDef *htim)
{
HAL_DMA_Start_IT(&hdma_memtomem_dma2_stream2,
(uint32_t)&timerCaptureBuffer[0],
(uint32_t)&timerCaptureBufferSafe[0],
TIM_BUFFER_SIZE / 2); // half the buffer, counted in 32-bit words
// ...
}
void HAL_TIM_IC_CaptureCallback(TIM_HandleTypeDef *htim)
{
HAL_DMA_Start_IT(&hdma_memtomem_dma2_stream2,
(uint32_t)&timerCaptureBuffer[TIM_BUFFER_SIZE/2],
(uint32_t)&timerCaptureBufferSafe[TIM_BUFFER_SIZE/2],
TIM_BUFFER_SIZE / 2); // half the buffer, counted in 32-bit words
// ...
}
void myDMA_Callback22(DMA_HandleTypeDef *_hdma)
{
//...
}
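The processing step for adjacent values in timerCaptureBufferSafe can be sketched in portable C. The function name and its carry-over arguments are hypothetical; the point is to keep the last capture of one half so that the period spanning the two halves is not lost. It assumes a free-running 32-bit counter.

```c
#include <stddef.h>
#include <stdint.h>

/* Turn one block of consecutive 32-bit capture values into periods.
 * *last carries the final capture of the previous block; set *has_last = 0
 * before the very first block. out must have room for n entries.
 * Returns the number of periods written. */
size_t captures_to_periods(const uint32_t *cap, size_t n,
                           uint32_t *last, int *has_last, uint32_t *out)
{
    size_t k = 0;

    for (size_t i = 0; i < n; i++) {
        if (*has_last)
            out[k++] = cap[i] - *last; /* modulo-2^32 difference */
        *last = cap[i];
        *has_last = 1;
    }
    return k;
}
```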

Using embOS functions within USB ISR for LPC1788

I'm developing software for an NXP LPC1788 microcontroller, and I'm using the embOS RTOS. Whenever a message is received over USB, I want to use the OS_PutMailCond() function to store the USB message in a mailbox which a handler function is waiting on. In other words, I want to make message handling interrupt-driven.
The embOS user manual can be found here. Page 145 describes the OS_PutMailCond() function.
Whenever a USB message is received, it triggers the USB interrupt service routine on the LPC, but to let embOS know that it's an ISR I have to place OS_EnterInterrupt() and OS_LeaveInterrupt() at the start and end of the ISR respectively. This is necessary if I want to call embOS functions within it, including OS_PutMailCond().
The problem is that if I put OS_EnterInterrupt()/OS_LeaveInterrupt() anywhere within the USB ISR, the USB stops functioning properly and Windows informs me that the device has malfunctioned.
I have no idea why this is the case. We've tried something similar for handling messages over CAN, as shown below, and it works fine.
void CAN_IRQHandler(void)
{
OS_EnterInterrupt();
...
if (MBfieldCANframeInitialised)
OS_PutMailCond (&MBfieldCANframe, &recMessage);
OS_LeaveInterrupt();
}
OS_EnterInterrupt() and OS_LeaveInterrupt() are described on pages 252 and 253 of the linked manual. From the additional information section of the former:
If OS_EnterInterrupt() is used, it should be the first function to be
called in the interrupt handler. It must be used with
OS_LeaveInterrupt() as the last function called. The use of this
function has the following effects, it:
disables task switches
keeps interrupts in internal routines disabled
EDIT
I've investigated further and found out that using OS_EnterInterrupt() and OS_LeaveInterrupt() within the USB ISR (and other ISRs, like the one for the GPIO when a rising or falling edge is detected on a pin) causes an OS error. The error value is 166, which means "OS-function called from ISR with high priority".
I'll update if I find out anything else.

Problem solved. It turns out the guy that made this work for the CAN ISR changed the code of one of the embOS source files to set the CAN ISR priority level from 0 to 29 (higher level = lower priority). I did the same thing for the USB ISR:
void OS_InitHW(void) {
OS_IncDI();
//
// We assume, the PLL and core clock was already set by the SystemInit() function
// which was called from the startup code
// Therefore, we don't have to initailize any hardware here,
// we just ensure that the system clock variable is updated and then
// set the periodic system timer tick for embOS.
//
SystemCoreClockUpdate(); // Update the system clock variable (might not have been set before)
if (SysTick_Config (OS_PCLK_TIMER / OS_TICK_FREQ)) { // Setup SysTick Timer for 1 msec interrupts
while (1); // Handle Error
}
//
// Initialize NVIC vector base address. Might be necessary for RAM targets or application not running from 0
//
NVIC_VTOR = (OS_U32)&__Vectors;
//
// Set the interrupt priority for the system timer to 2nd lowest level to ensure the timer can preempt PendSV handler
//
NVIC_SetPriority(SysTick_IRQn, (1u << __NVIC_PRIO_BITS) - 2u);
NVIC_SetPriority(CANActivity_IRQn, (1u << __NVIC_PRIO_BITS) - 3u);
NVIC_SetPriority(CAN_IRQn, (1u << __NVIC_PRIO_BITS) - 3u);
NVIC_SetPriority(USB_IRQn, (1u << __NVIC_PRIO_BITS) - 3u);
OS_COM_INIT();
OS_DecRI();
}
I found this in the embOS documentation:
Why can a high priority ISR not use the OS API ?
embOS disables low priority interrupts when embOS data structures are modified. During this time high priority ISR are enabled. If they would call an embOS function, which also modifies embOS data, the embOS data structures would be corrupted.
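To see what the priority values above mean at the register level: the LPC1788 implements 5 priority bits, so only the top 5 bits of each 8-bit NVIC priority field are used and priority 29 is stored as 29 << 3. The sketch below assumes, as the quoted documentation suggests, that embOS masks "low priority" interrupts with a priority threshold (BASEPRI on Cortex-M) while it modifies its data. The threshold in the test is only an example; take the real value from your embOS port. The function names are hypothetical.

```c
#include <stdint.h>

/* Value that ends up in the 8-bit NVIC priority field: only the top
 * prio_bits bits are implemented (1..8), so the priority is shifted left. */
uint8_t nvic_ipr_value(uint32_t prio, unsigned prio_bits)
{
    return (uint8_t)((prio << (8u - prio_bits)) & 0xFFu);
}

/* Lower numbers are more urgent. Interrupts at or below the OS threshold in
 * urgency (field value >= threshold) are masked during OS critical sections,
 * so only those may call the OS API. os_threshold is port-specific. */
int isr_may_call_os(uint32_t prio, unsigned prio_bits, uint8_t os_threshold)
{
    return nvic_ipr_value(prio, prio_bits) >= os_threshold;
}
```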

ARM Cortex-M3 example for interrupt pending

With an ARM Cortex-M3, such as an NXP LPC1788, why would someone use the Interrupt Set-Pending Register(s) or Interrupt Clear-Pending Registers?
Can someone provide a simple, canonical example of using these registers?

The only use case I can think of is triggering a low-priority software exception from a high-priority IRQ handler, like the GPIO interrupt handler.
Normally you would use PendSV for that, but when you need more than one such deferred task or priority level you can use any unused peripheral exception vector. This could be useful in programs that use the Sleep-on-Exit feature, where the µC only runs in exception handlers.
// Example for LPC17xx
// The handler name must match the Ethernet entry in your startup file's vector table
void ETHERNET_Handler (void)
{
// toggle LED on P0.4
LPC_GPIO0->FIOPIN ^= (1<<4);
}
int main(void)
{
// make P0.4 an output
LPC_GPIO0->FIODIR |= (1<<4);
// set Ethernet IRQ to lowest priority
NVIC_SetPriority(ENET_IRQn,31);
NVIC_EnableIRQ(ENET_IRQn);
NVIC_SetPendingIRQ(ENET_IRQn); // trigger Ethernet IRQ Handler
// ...
while (1);
}
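The pending registers are write-1-to-set (ISPR) and write-1-to-clear (ICPR), with 32 interrupts per register. The CMSIS function NVIC_SetPendingIRQ() used above, and its counterpart NVIC_ClearPendingIRQ(), boil down to a single store. The sketch below uses hypothetical names and takes the register array as a parameter so that it can run off-target; on the device you would pass NVIC->ISPR or NVIC->ICPR. One common use of the clear-pending register is discarding an interrupt that became pending before its handler was set up, just before enabling it.

```c
#include <stdint.h>

/* Writing 0 bits has no effect on these registers, so a plain store of a
 * single bit is enough; no read-modify-write is needed. */
void nvic_set_pending(volatile uint32_t *ispr, unsigned irq)
{
    ispr[irq >> 5] = 1u << (irq & 31u);
}

void nvic_clear_pending(volatile uint32_t *icpr, unsigned irq)
{
    icpr[irq >> 5] = 1u << (irq & 31u);
}
```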
