Can an ARM interrupt occur in mid-instruction? - arm

This question will be short and sweet.
I know an instruction can occur between instruction but can an interrupt happen during an instruction? Can a load multiple instruction be interrupted before it's loaded all the values into the registers?
mov r0, r1
< interrupt can happen here
ldm r0, {r1-r4} < can an interrupt happen **during** a load multiple instruction?

The load multiple instructions are explicitly not atomic. See section A3.5.3 of the ARM V7C architecture reference manual.
LDM, LDC, LDC2, LDRD, STM, STC, STC2, STRD, PUSH, POP, RFE, SRS, VLDM,
VLDR, VSTM, and VSTR instructions are executed as a sequence of
word-aligned word accesses. Each 32-bit word access is guaranteed to
be single-copy atomic. The architecture does not require subsequences
of two or more word accesses from the sequence to be single-copy
atomic.
If you read on, you'll find out that the LDM/STM instructions can be aborted by an interrupt (and restarted from the beginning on interrupt return). LDM and STM instructions can always be interrupted by a data abort, so they're non atomic in that sense. Otherwise, the ARMv7-A architecture does its best to help you out. For interrupts, they can only be interrupted if low interrupt latency is enabled, AND normal memory is being accessed. So at the very least, you won't get repeated accesses to device memory. You don't want to do anything that expects atomic read/writes of normal memory though.
On v7-M, LDM and STM can be interrupted at any time (see section B1.5.10 of the ARMv7-M Architecture Reference Manual). It's implementation defined whether or not the instruction is restarted from the beginning of the list of loads/stores, or whether it's restarted from where it left off. As the ARM says:
The ARMv7-M architecture supports continuation of, or restarting from
the beginning, an abandoned LDM or STM instruction as outlined below.
Where an LDM or STM is abandoned and restarted (ICI bits are not
supported), the instructions should not be used with volatile memory.
In other words, don't rely on LDM or STM being atomic if you're trying to write portable code.

Related

DSB on ARM Cortex M4 processors

I have read the ARM documentation and it appears that they say in some places that the Cortex M4 can reorder memory writes, while in other places it indicates that M4 will not.
Specifically I am wondering if the DBM instruction is needed like:
volatile int flag=0;
char buffer[10];
void foo(char c)
{
__ASM volatile ("dbm" : : : "memory");
__disable_irq(); //disable IRQ as we use flag in ISR
buffer[0]=c;
flag=1;
__ASM volatile ("dbm" : : : "memory");
__enable_irq();
}
Uh, it depends on what your flag is, and it also varies from chip to chip.
In case that flag is stored in memory:
DSB is not needed here. An interrupt handler that would access flag would have to load it from memory first. Even if your previous write is still in progress the CPU will make sure that the load following the store will happen in the correct order.
If your flag is stored in peripheral memory:
Now it gets interesting. Lets assume flag is in some hardware peripheral. A write to it may make an interrupt pending or acknowledge an interrupt (aka clear a pending interrupt). Contrary to the memory example above this effect happens without the CPU having to read the flag first. So the automatic ordering of stores and loads won't help you. Also writes to flag may take effect with a surprisingly long delay due to different clock domains between the CPU and the peripheral.
So the following szenario can happen:
you write flag=1 to clear an handled interrupt.
you enable interrupts by calling __enable_irq()
interrupts get enabled, write to flag=1 is still pending.
wheee, an interrupt is pending and the CPU jumps to the interrupt handler.
flag=1 takes effect. You're now in an interrupt handler without anything to do.
Executing a DSB in front of __enable_irq() will prevent this problem because whatever is triggered by flag=1 will be in effect before __enable_irq() executes.
If you think that this case is purely academic: Nope, it's real.
Just think about a real-time clock. These usually runs at 32khz. If you write into it's peripheral space from a CPU running at 64Mhz it can take a whopping 2000 cycles before the write takes effect. Now for real-time clocks the data-sheet usually shows specific sequences that make sure you don't run into this problem.
The same thing can however happen with slow peripherals.
My personal anecdote happened when implementing power-saving late in a project. Everything was working fine. Then we reduced the peripheral clock speed of I²C and SPI peripherals to the lowest possible speed we could get away with. This can save lots of power and extend battery live. What we found out was that suddenly interrupts started to do unexpected things. They seem to fire twice each time wrecking havoc. Putting a DSB at the end of each affected interrupt handler fixed this because - you can guess - the lower clock speed caused us to leave the interrupt handlers before clearing the interrupt source was in effect due to the slow peripheral clock.
This section of the Cortex M4 generic device user guide enumerates the factors which can affect reordering.
the processor can reorder some memory accesses to improve efficiency, providing this does not affect the behavior of the instruction sequence.
the processor has multiple bus interfaces
memory or devices in the memory map have different wait states
some memory accesses are buffered or speculative.
You should also bear in mind that both DSB and ISB are often required (in that order), and that C does not make any guarantees about the ordering (except in-thread volatile accesses).
You will often observe that the short pipeline and instruction sequences can combine in such a way that the race conditions seem unreachable with a specific compiled image, but this isn't something you can rely on. Either the timing conditions might be rare (but possible), or subsequent code changes might change the resulting instruction sequence.

ARM Bootloader: Disable MMU and Caches

According to some tutorials, we will disable MMU and I/D-Caches at the beginning of bootlaoder. If I understand correctly, it aims to use the physical address directly in the program, so please correct me if I'm wrong. Thank you!
Secondly, we do this to disable MMU and Caches:
mrc P15, 0, R0, C1, C0, 0
bic R0, R0, #0x00002300 # clear bits 13, 9:8
bic R0, R0, #0x00000087 # clear bits 7, 2:0
orr R0, R0, #0x00000002 # set bit 2 (A) Align
orr R0, R0, #0x00001000 # set bit 12 (I) I-Cache
mcr P15, 0, R0, C1, C0, 0
D-Cache, MMU and Data Address Alignment Fault Checking have been disabled by clear bits 2:0, but why we enable bit 2 immediately in the following instrument? To make sure this manipulation is valid?
Last question is why D-cache is disabled but I-caches is able? To speed up instrument process?
Last question is why D-cache is disabled but I-caches is able? To speed up instrument process?
The MMU has settings to determine which memory regions are cacheable or not. If you do not have the mmu on but you have the data cache on (if possible) then you cannot safely talk to peripherals. if you read the uart status register for example that goes through the cache just like any other data operation, whatever that status is stays in the cache for subsequent reads until such time as that cache line is evicted and you get one more shot at the actual register. Lets say for example you have some code that polls the uart status register waiting for a character in the rx buffer. If that first read shows there is no character, that status goes in the cache, you will remain in the loop forever since you will never get to talk to the status register again you will simply get the cached copy of the register. if there was a character in there then that status also gets cached, you read the rx register, and perhaps do something, if when you come back again if the status has not been evicted from the data cache then you get the stale status which shows there is a character, you rx buffer read may or may not also be cached so you may get the stale value in the cache, you may get a stale value or whatever the peripheral does when you read and there is no new value or you might get a new value, but what you dont get in these situations is proper access to the peripheral. When the mmu is on, you use the mmu to mark the address space used by that peripheral as non-(data)-cacheable, and you dont have this problem. With the mmu off you need the data cache off for arm systems.
Leaving the I-cache on is okay because instruction fetches only read instructions...Well for a bare metal application that is okay, it helps for example if you are using a flash that has a potential for read disturb (spi or i2c flashes). The problem is this application is a bootloader, so you must take some extra care. For example your bootloader has some code at address 0x8000 that it runs through at least once, then you choose to use it as a bootloader, the bootloader might be at say address 0x10000000 allowing you to load a new program at 0x8000, this load uses data accesses so it does not go through the instruction cache. So there is a potential that the instruction cache has some or all of the code from the last time you were in the 0x8000 area, and when you branch to the bootloaded code at 0x8000 you will get either the old program from cache or a nasty mixture of old program and new program for the parts that are cached and not cached. So if your bootloader allows for the i-cache to be on, you need to invalidate the cache before branching to bootloaded code.
Lastly, if you or anyone using this bootloader wants to use jtag, then you have that same problem but worse, data cycles that do not go through the i-cache are used to write the new program to ram, when you tell the jtag debugger to then run the new program you will get 1) only the new program, 2) a mixture of the new program and old program fragments from cache 3) the old program from cache.
So d-cache is bad without an mmu because of things that are not in ram, peripherals, etc. The i-cache is a use at your own risk kind of thing which you can mitigate except for the times that jtag is used for debugging.
If you have concerns or have confirmed read-disturb in your (external) flash, then I recommend turn on the i-cache, use a tight loop to copy your application to ram, branch to the ram copy and run there, turn off the i-cache (or use at your own risk) and dont touch the flash again, certainly not heavy read accesses to small areas. A tight uart polling loop like you might have for a command line parser, is a really good place to get hit with read-disturb.
You did not specified on which ARM you are working. Capabilities may vary from one ARM to an other (there is a huge gap between an ARM9 and an ARM Cortex A15).
In the given code, bit 2 is cleared and then set, but it does not matter, as those changes are done in R0. There is no change in the ARM behavior until the write in CP15 register (done by the instruction mcr P15, 0, R0, C1, C0, 0).
Concerning d-cache/i-cache enabling, it is only a matter of choice, there is no requirement. On the products I work on, the bootloader enables L1 I-cache, D-cache, L2 cache, and MMU (and it disables all that stuff before jumping on Linux). Be sure to follow ARM documentations about cache invalidation and memory barriers (according to your actual ARM Core) if you use cache and MMU in your bootloader.

ARM modes: User and System

Can you explain how the ARM mode get changed in case of a system call handling?
I heard ARM mode change can happen only in privileged mode, but in case of a system call handling while the ARM is in user mode (which is a non-privileged mode), how does the ARM mode change?
Can anybody explain the whole action flow for the user mode case, and also more generally the system call handling (especially how the ARM mode change)?
Thanks in advance.
In the case of system calls on ARM, normally the system call causes a SWI instruction to be executed. Anytime the processor executes a SWI (software interrupt) instruction, it goes into SVC mode, which is privileged, and jumps to the SWI exception handler. The SWI handler then looks at the cause of the interrupt (embedded in the instruction) and then does whatever the OS programmer decided it should do. The other exceptions - reset, undefined instruction, prefetch abort, data abort, interrupt, and fast interrupt - all also cause the processor to enter privileged modes.
How file handling works is entirely up to whoever wrote your operating system - there's nothing ARM specific about that at all.
You need to get a copy of the ARM ARM (Architectural Reference Manual).
http://infocenter.arm.com -> ARM Architecture -> Reference Manuals -> ARMv5 Architectural Reference Manual then download the pdf.
It used to be a single ARM ARM for the ARM world but there are too many cores and starting to diverge so they split off the old one as ARMv5 ARM and made new Architectural Reference Manuals for each of the major ARM processor families.
In the Programmers Model chapter it talks about the modes, it says that you can change freely among the modes other than user. ARM startup code will often go through a series of mode changes so that the stack pointers, etc can be configured. Then as needed go back to System mode or User mode.
In that same chapter look at the Exceptions section, this describes the exceptions and what mode the processor switches to for each exception.
The Software interrupt exception which happens when an SWI instruction is executed, is a way to implement system calls. The processor is put in Supervisor mode and if in thumb mode switches to arm mode.
There needs to be code to support that exception handler of course. You need to verify with the operating system, if any, you are running, what is supported and what the calling convention is, etc.
Not all ARM processors work this way. The Cortex-M (ARMv7-M) does not have the same modes and same exception table, etc. As with any time you are using an ARM (at this level) you need to get the ARM ARM for the family you are using and you need to get the TRM (Techincal Reference Manual) for the core(s) you are using, ideally the exact revision, even if ARM marks the TRM as having been replaced by a newer version the chip manufacturer has purchased and uses a specific rev of the core and there can be enough differences between revs that you will want the correct manual.
When an SVC instruction is encountered by the PC, the following behaviour takes place:
The current status (CPSR) is saved (to the supervisor SPSR)
The mode is switched to supervisor mode
Normal (IRQ) interrupts are disabled
ARM mode is entered (if not already in use)
The address of the following instruction (the return address) is saved into the link register (R14) - It's worth noting that this is
the link register that belongs to the supervisor mode
The PC is altered to jump to address 0x00000008
An exception vector (just a branch instruction) should be at the address 0x0000008, which will branch the program to another area of code used to determine which supervisor call has been made.
Determining which supervisor call has been made is usually accomplished by loading the SVC instruction into a register (by offsetting the LR by one word - since the LR is still pointing to the instruction next to the supervisor call), bit clearing the last 8 bits and using the value in the remaining 24 bits of the register to calculate an offset in a jump table, to branch to the corresponding SVC code.
When the supervisor call code wishes to return to the user application, the processor needs to context switch back into user mode and return to the address contained within the LR (which is only available in supervisor mode, since certain registers are banked for both modes). This problem is overcome using the MOVS instruction, as illustrated below:
(consider this to also be your explanation on how to change mode)
MRS R0, CPSR ; load CPSR into R0
BIC R0, R0, #&1F ; clear mode field
ORR R0, R0, #&10 ; user mode code
MSR SPSR, R0 ; store modified CPSR into SPSR
MOVS PC, LR ; context switch and branch
The MRS and MSR instructions are used to transfer content between an ARM register and the CPSR or SPSR.
The MOVS instruction is a special instruction, which operates as a standard MOV instruction, but also sets the CPSR equal to the SPSR upon branching. This allows the processor to branch back (since we're moving the LR into the PC) and change mode to the mode specified by the SPSR.
I quote from the ARM documentation available here:
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0471c/BABDCIEH.html
When an exception is generated, the processor performs the following
actions:
Copies the CPSR into the appropriate SPSR. This saves the current mode, interrupt mask, and condition flags.
Switches state automatically if the current state does not match the instruction set used in the exception vector table.
Changes the appropriate CPSR mode bits to:
Change to the appropriate mode, and map in the appropriate banked out registers for that mode.
Disable interrupts. IRQs are disabled when any exception occurs. FIQs are disabled when an FIQ occurs and on reset.
Sets the appropriate LR to the return address.
Sets the PC to the vector address for the exception.
where, CPSR refers to Current Program Status Register and SPSR to Saved Program Status register used to restore the state of the process that was interrupted. Thus, as seen in point 3, the processor circuitry is designed in a way that the hardware itself changes the mode when user mode executes a Supervisor call instruction.
"I heard ARM mode change can happen only in privileged mode". You are partly right here. By partly I mean the control field of the CPSR register can be manually modified (manually modified means through code) in the privileged modes only not in the unprivileged mode (i.e. user mode). When a system call happens in the user mode it happens because of SWI instruction. An SWI instruction has inbuilt mechanism to change the mode to supervisor mode.
So to conclude , there are two ways to change the mode:
1) Explicitly through code. Only allowed in a privileged mode.
2) Implicitly through IRQ, FIQ, SWI, RESET, undefined instruction encountered, data abort, prefetch abort. This is allowed in all the modes.

Does anyone know how to enable ARM FIQ?

Does anyone know how to enable ARM FIQ?
Other than enabling or disabling the IRQ/FIQ while you're in supervisor mode, there's no special setup you should have to do on the ARM to use it, unless the system (that the ARM chip is running in) has disabled it in hardware (based on your comment, this is not the case since you're seeing the FIQ input pin driven correctly).
For those unaware of the acronyms, FIQ is simply the last interrupt vector in the list which means it's not limited to a branch instruction as the other interrupts are. That means it can execute faster than the other IRQ handlers.
Normal IRQs are limited to a branch instruction since they have to ensure their code fits into a single word. FIQ, because it won't overwrite any other IRQ vectors, can just run the code directly without a branch instruction (hence the "fast").
The FIQ input line is just a way for external elements to kick the ARM chip into FIQ mode and start executing the correct exception. There's nothing on the ARM itself which prevents that from happening except the CPSR.
To enable FIQ in supervisor mode:
MRS r1, cpsr ; get the cpsr.
BIC r1, r1, #0x40 ; enable FIQ (ORR to disable).
MSR cpsr_c, r1 ; copy it back, control field bit update.
A similar thing can be done for normal IRQs, but using #0x80 instead of #0x40.
The FIQ can be closed off to you by the chip manufacturer using trustzone extensions.
Trustzone creates a Secure world and a normal world. The secure world has its own supervisor, user and memory space. The idea is for secure operations to be routed so they never leave the chip and cannot be traced even if you scan the pins on the bus. I think in OMAP it is used for some cryptography operations.
On Reset the core starts in secure mode. It sets up the secure monitor (gateway between secure and non-secure world) and at this time FIQ can be setup to be routed to the monitor. I think it is the SCR.FIQ bit that may be set and then all FIQs ignore the value of CPSR.F and go to monitor mode. Check out the ARM ARM but if I remember correctly if this is happening there is no way for you to know from nonsecure OS code. Then the monitor will reset the Normal world registers and doing an exception return with PC set to the reset exception vector.
The core will take an interrupt to monitor mode, do its thing and return.
Sorry I can't answer you in the comments, I don't have enough reputation, you could always fix that ;), but I hope you see this

What is the difference between FIQ and IRQ interrupt system?

I want to know the difference between FIQ and IRQ interrupt system in
any microprocessor, e.g: ARM926EJ.
ARM calls FIQ the fast interrupt, with the implication that IRQ is normal priority. In any real system, there will be many more sources of interrupts than just two devices and there will therefore be some external hardware interrupt controller which allows masking, prioritization etc. of these multiple sources and which drives the interrupt request lines to the processor.
To some extent, this makes the distinction between the two interrupt modes redundant and many systems do not use nFIQ at all, or use it in a way analogous to the non-maskable (NMI) interrupt found on other processors (although FIQ is software maskable on most ARM processors).
So why does ARM call FIQ "fast"?
FIQ mode has its own dedicated banked registers, r8-r14. R14 is the link register which holds the return address(+4) from the FIQ. But if your FIQ handler is able to be written such that it only uses r8-r13, it can take advantage of these banked registers in two ways:
One is that it does not incur the overhead of pushing and popping any registers that are used by the interrupt service routine (ISR). This can save a significant number of cycles on both entry and exit to the ISR.
Also, the handler can rely on values persisting in registers from one call to the next, so that for example r8 may be used as a pointer to a hardware device and the handler can rely on the same value being in r8 the next time it is called.
FIQ location at the end of the exception vector table (0x1C) means that if the FIQ handler code is placed directly at the end of the vector table, no branch is required - the code can execute directly from 0x1C. This saves a few cycles on entry to the ISR.
FIQ has higher priority than IRQ. This means that when the core takes an FIQ exception, it automatically masks out IRQs. An IRQ cannot interrupt the FIQ handler. The opposite is not true - the IRQ does not mask FIQs and so the FIQ handler (if used) can interrupt the IRQ. Additionally, if both IRQ and FIQ requests occur at the same time, the core will deal with the FIQ first.
So why do many systems not use FIQ?
FIQ handler code typically cannot be written in C - it needs to be written directly in assembly language. If you care sufficiently about ISR performance to want to use FIQ, you probably wouldn't want to leave a few cycles on the table by coding in C in any case, but more importantly the C compiler will not produce code that follows the restriction on using only registers r8-r13. Code produced by a C compiler compliant with ARM's ATPCS procedure call standard will instead use registers r0-r3 for scratch values and will not produce the correct cpsr restoring return code at the end of the function.
All of the interrupt controller hardware is typically on the IRQ pin. Using FIQ only makes sense if you have a single highest priority interrupt source connected to the nFIQ input and many systems do not have a single permanently highest priority source. There is no value connecting multiple sources to the FIQ and then having software prioritize between them as this removes nearly all the advantages the FIQ has over IRQ.
FIQ or fast interrupt is often referred to as Soft DMA in some ARM references.Features of the FIQ are,
Separate mode with banked register including stack, link register and R8-R12.
Separate FIQ enable/disable bit.
Tail of vector table (which is always in cache and mapped by MMU).
The last feature also gives a slight advantage over an IRQ which must branch.
A speed demo in 'C'
Some have quoted the difficulty of coding in assembler to handle the FIQ. gcc has annotations to code a FIQ handler. Here is an example,
void __attribute__ ((interrupt ("FIQ"))) fiq_handler(void)
{
/* registers set previously by FIQ setup. */
register volatile char *src asm ("r8"); /* A source buffer to transfer. */
register char *uart asm ("r9"); /* pointer to uart tx register. */
register int size asm ("r10"); /* Size of buffer remaining. */
if(size--) {
*uart = *src++;
}
}
This translates to the following almost good assembler,
00000000 <fiq_handler>:
0: e35a0000 cmp sl, #0
4: e52d3004 push {r3} ; use r11, r12, etc as scratch.
8: 15d83000 ldrbne r3, [r8]
c: 15c93000 strbne r3, [r9]
10: e49d3004 pop {r3} ; same thing.
14: e25ef004 subs pc, lr, #4
The assembler routine at 0x1c might look like,
tst r10, #0 ; counter zero?
ldrbne r11, [r8] ; get character.
subne r10, #1 ; decrement count
strbne r11, [r9] ; write to uart
subs pc, lr, #4 ; return from FIQ.
A real UART probably has a ready bit, but the code to make a high speed soft DMA with the FIQ would only be 10-20 instructions. The main code needs to poll the FIQ r10 to determine when the buffer is finished. Main (non-interrupt code) may transfer and setup the banked FIQ registers by using the msr instruction to switch to FIQ mode and transfer non-banked R0-R7 to the banked R8-R13 registers.
Typically RTOS interrupt latency will be 500-1000 instructions. For Linux, it maybe 2000-10000 instructions. Real DMA is always preferable, however, for high frequency simple interrupts (like a buffer transfer), the FIQ can provide a solution.
As the FIQ is about speed, you shouldn't consider it if you aren't secure in coding in assembler (or willing to dedicate the time). Assembler written by an infinitely running programmer will be faster than a compiler. Having GCC assist can help a novice.
Latency
As the FIQ has a separate mask bit it is almost ubiquitously enabled. On earlier ARM CPUs (such as the ARM926EJ), some atomic operations had to be implemented by masking interrupts. Still even with the most advanced Cortex CPUs, there are occasions where an OS will mask interrupts. Often the service time is not critical for an interrupt, but the time between signalling and servicing. Here, the FIQ also has an advantage.
Weakness
The FIQ is not scalable. In order to use multiple FIQ sources, the banked registers must be shared among interrupt routines. Also, code must be added to determine what caused the interrupt/FIQ. The FIQ is generally a one trick pony.
If your interrupt is highly complex (network driver, USB, etc), then the FIQ probably makes little sense. This is basically the same statement as multiplexing the interrupts. The banked registers give 6 free variables to use which never load from memory. Register are faster than memory. Registers are faster than L2-cache. Registers are faster than L1-cache. Registers are fast. If you can not write a routine that runs with 6 variables, then the FIQ is not suitable. Note: You can double duty some register with shifts and rotates which are free on the ARM, if you use 16 bit values.
Obviously the FIQ is more complex. OS developers want to support multiple interrupt sources. Customer requirements for a FIQ will vary and often they realize they should just let the customer roll their own. Usually support for a FIQ is limited as any support is likely to detract from the main benefit, SPEED.
Summary
Don't bash my friend the FIQ. It is a system programers one trick against stupid hardware. It is not for everyone, but it has its place. When all other attempts to reduce latency and increase ISR service frequency has failed, the FIQ can be your only choice (or a better hardware team).
It also possible to use as a panic interrupt in some safety critical applications.
A feature of modern ARM CPUs (and some others).
From the patent:
A method of performing a fast
interrupt in a digital data processor
having the capability of handling more
than one interrupt is provided. When a
fast interrupt request is received a
flag is set and the program counter
and condition code registers are
stored on a stack. At the end of the
interrupt servicing routine the return
from interrupt instructions retrieves
the condition code register which
contains the status of the digital
data processor and checks to see
whether the flag has been set or not.
If the flag is set it indicates that a
fast interrupt was serviced and
therefore only the program counter is
unstacked.
In other words, an FIQ is just a higher priority interrupt request, that is prioritized by disabling IRQ and other FIQ handlers during request servicing. Therefore, no other interrupts can occur during the processing of the active FIQ interrupt.
Chaos has already answered well, but an additional point not covered so far is that FIQ is at the end of the vector table and so it's common/traditional to just start the routine right there, whereas the IRQ vector is usually just that. (ie a jump to somewhere else). Avoiding that extra branch immediately after a full stash and context switch is a slight speed gain.
another reason is in case of FIQ, lesser number of register is needed to push in the stack, FIQ mode has R8 to R14_fiq registers
FIQ is higher priority, and can be introduced while another IRQ is being handled. The most critical resource(s) are handled by FIQ's, the rest are handled by IRQ's.
I believe this is what you are looking for:
http://newsgroups.derkeiler.com/Archive/Comp/comp.sys.arm/2005-09/msg00084.html
Essentially, FIQ will be of the highest priority with multiple, lower priority IRQ sources.
FIQs are higher priority, no doubt, remaining points i am not sure..... FIQs will support high speed data transfer (or) channel processing, where high speed data processes is required we use FIQs and generally IRQs are used normal interrupt handlling.
No any magic about FIQ. FIQ just can interrupt any other IRQ which is being served,this is why it is called 'fast'. The system reacts faster on these interrupts but the rest is the same.
It Depends how we design interrupt handlers, as FIQ is at last it may not need one branch instruction, also it has unique set of r8-r14 registers so next time we come back to FIQ interrupt we do not need to push/pop up the stack. Ofcourse it saves some cycles, but again it is not wise to have more handlers serving one FIQ and yes FIQ is having more priority but it is not any reason to say it handles the interrupt faster, both IRQ/FIQ run at same CPU frequency, So they must be running at same speed.
This may be wrong. All I know is that FIQ stands for Fast Interrupt Request and that IRQ stands for Interrupt Request. Judging from these names, I will guess that a FIQ will be handled(thrown?) faster than an IRQ. It probably has something to do with the design of the processor where an FIQ will interrupt the process faster than an IRQ. I apologize if I'm wrong, but I normally do higher level programming, I'm just guessing right now.

Resources