fiq & irq handler -- arm

fiq & irq handler -- arm - arm

I am new to arm & have some doubs related to IRQ & FIQ. Please try to clarify these.
How many number of FIQ & IRQ channel arm have ?
And what number of handlers can we write for each channel ?
Also if we can register multiple handler for single interrupt channel how arm comes to know which handler to run.

The distinction between IRQ and FIQ goes right the way back to early days of ARM when it was designed by Acorn. It was always the case that the IRQ line was attached to an interrupt controller that multiplexed a large number of interrupt sources together. This is precisely what happens in all modern ARMs
The rationale behind the FIQ was to provide an extremely low latency response with maximum priority (it can safely pre-empt the IRQ handler). The comparatively large number of shadow registers facilitate writing handlers that store the handler's state in CPU registers and not hitting the stack.
The shadow registers are almost of the opposite set to those commonly used by APCS for function call, so writing handlers in C, would cause a push and eventual pop of up to 8 non-shadowed registers. Having any kind of interrupt demultiplexing wipes out any performance advantage that FIQ might have given.
All of this means that there is only really any benefit in using FIQ for very specialised applications where really hard-real time interrupt response is required for one interrupting device, and you're willing to write your handler in assembler. You'll also be left with working out how to synchronise with the rest of the system - some of which would rely on disabling IRQ to keep data synchronised.

traditionally the arm has one interrupt line which you can send to one of two handlers FIQ or IRQ. FIQ has a larger bank of FIQ mode only registers so you have fewer that you need to store on the stack. From there you read the vendor specific registers if any to determine the source of the interrupt and then branch into separate handlers.
More recently there have bend arm architectures with many interrupts 128, 256 each with a separate handler. So generically asking about arm is not as varied but about like asking something generic about x86.
All of this information is easily available in the ARM architectural reference manuals for the different architectures and the pinouts to the core (what the vendor builds its chip around) is documented in the technical reference manuals for the various cores (also very easy to obtain). infocenter.arm.com has the architecture and technical reference manuals as well as amba/axi (the data bus that the vendor connects to). Your question is completely answered in those documents.

The ARM processor directly supports only ONE IRQ and ONE FIQ. ARM supports multiple interrupts through a peripheral called Interrupt Controller. ARM standard interrupt controllers are called GIC (Generic Interrupt Controller).
The GIC has a number of inputs for peripherals to connect their interrupt lines and two output lines that connect to IRQ and FIQ. Basically it acts as a MUX. A GIC driver will setup configurations such as interrupt priority, type (IRQ/FIQ), masking etc.
In traditional ARM systems there is one entry each for IRQ and FIQ in the Exception Vectors. Depending on which line the interrupt fired, IRQ or FIQ handler is called. The interrupt handler queries the GIC (GIC CPU interface registers, to be specific) to get the interrupt number. Based on this interrupt number, corresponding device handler is invoked.
Number of interrupts depends on the specific GIC implementation. So you would have to check the manual for the interrupt controller in your system to get those specifics.
Note: The interrupt handling is slightly different depending on which specific ARM core you are coding for.

Actually the question is a bit tricky. You must specify in the question to which architecture in ARM you work. ARM v7-A and ARM v7-R Architecture Reference Manual (ARM ARM) specifies one FIQ and one IRQ, as many already answered. But ARMv7-M (used in Cortex-M processors) integrates a interrupt controller in the processor, and thus offers one NMI (instead of FIQ) and up to 240 IRQ lines.
For more information: ARMv7 A and ARMv7-R Architecure reference manual: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0406c/index.html
ARMv7-M Architecture Reference Manual: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0403e.b/index.html
As an example, Cortex M4 specs sheet: http://www.arm.com/products/processors/cortex-m/cortex-m4-processor.php

Related

ARM Cortex M3 - Add a new interrupt to the end of the vector table?

I am doing some bare metal C development on an ARM Cortex M3 SoC, and I wanted to check and see if it is possible to add a new user-defined interrupt handler to the NVIC. I am adding my own IRQ with the plan of triggering it via software, either via NVIC_SetPendingIRQ() or via the NVIC->STIR register. Neither seem to work.
I have added my interrupt vector name to the end of the vector list in the CMSIS startup assembler file, and added the corresponding enum to the system header, and while debugging and executing the function call NVIC_EnableIRQ(), it doesn't correctly update the NVIC->ISER (Interrupt Set Enable Register). So I guess, the question is, can you even add your own interrupt? There are 256 total interrupts than can be used in the ARM Cortex M3, and I just followed how the others were added so I figured it wouldn't be an issue.
Thank you.

The datasheet for your SoC should say how many interrupts are supported by the NVIC. While 240 is the maximum possible number for a Cortex-M3 device in general, the actual number on your chip is defined by the implementation and it makes sense to have that number be as small as possible to reduce costs.
In general, there is no way to add interrupts in software, but you might be able to use the SVCall interrupt, which is designed to be triggered by software. Or you could find some other interrupt you aren't using in your system, and which is not being activated by hardware, and try to use that for your purposes.
References:
Nested Vectored Interrupt Controller in the Cortex-M3 Devices Generic User Guide

The SVC instruction invokes the SVCall handler with an 8-bit service number available to the handler which can be used to invoke a handler from a look-up table (essentially a secondary vector table for software interrupts).
An example of that can be found at https://developer.arm.com/documentation/ka004005/latest, except there it uses a switch rather than a look-up table - to the same effect.

How could we sleep when we are executing a syscall that execute in interrupt mode

When I am executing a system call to do write or something else, the ISR corresponded to the exception is executing in interrupt mode (on cortex-m3 the IPSR register is having a non-zero value, 0xb). And what I have learned is that when we execute a code in an interrupt mode we can not sleep, we can not use functions that might block ...
My question is that: is there any kind of a mechanism with which the ISR could still executing in interrupt mode and in the same time it could use functions that might block, or is there any kind of trick is implemented.

Caveat: This is more of a comment than an answer but is too big to fit in a comment or series of comments.
TL;DR: Needing to sleep or execute a blocking operation from an ISR is a fundamental misdesign. This seems like an XY problem: https://meta.stackexchange.com/questions/66377/what-is-the-xy-problem
Doing sleep [even in application code] is generally a code smell. Why do you feel you need to sleep [vs. some event/completion driven mechanism]?
Please edit your question and add clarification [i.e. don't just add comments]
When I am executing a system call to do write or something else
What is your application doing? A write to what device? What "something else"?
What is the architecture/board, kernel/distro? (e.g. Raspberry Pi running Raspian? nvidia Jetson? Beaglebone? Xilinx FPGA with petalinux?)
What is the H/W device and where did the device driver come from? Did you write the device driver yourself or is it a standard one that comes with the kernel/distro? If you wrote it, please post it in your question.
Is the device configured properly? (e.g.) Are the DTB entries correct?
Is the device a block device, such as a disk controller? Or, is it a character device, such as a UART? Does the device transfer data via DMA? Or, does it transfer data by reading/writing to/from an IO port?
What do you mean by "exception"? Generally, exception is an abnormal condition (e.g. segfault, bus error, etc.). Please describe the exact context/scenario for which this occurs.
Generally, an ISR does little things. (e.g.) Grab [and save] status from the device. Clear/rearm the interrupt in the interrupt controller. Start the next queued transfer request. Wake up the sleeping base level task (usually the task that executed the syscall [waiting on a completion event in kernel mode]).
More elaborate actions are generally deferred and handled in the interrupt's "bottom half" handler and/or tasklet. Or, the base level is woken up and it handles the remaining processing.
What kernel subsystems are involved? Are you using platform drivers? Are you interfacing from within the DMA device driver framework? Are message buses involved (e.g. I2C, SPI, etc.)?
Interrupt and device handling in the linux kernel is somewhat different than what one might do in a "bare metal" system or RTOS (e.g. FreeRTOS). So, if you're coming from those environments, you'll need to think about restructuring your driver code [and/or application code].
What are your requirements for device throughput and latency?
You may wish to consult a good book on linux device driver design. And, you may wish to consult the kernel's Documentation subdirectory.
If you're able to provide more information, I may be able to help you further.
UPDATE:
A system call is not really in the same class as a hardware interrupt as far as the kernel is concerned, even if the CPU hardware uses the same sort of exception vector mechanisms for handling both hardware and software interrupts. The kernel treats the system call as a transition from user mode to kernel mode. – Ian Abbott
This is a succinct/great explanation. The "mode" or "context" has little to do with how we got/get there from a H/W mechanism.
The CPU doesn't really "understand" interrupt mode [as defined by the kernel]. It understands "supervisor" vs "user" privilege level [sometimes called "mode"].
When executing at user privilege level, an interrupt/exception will notice the transition from "user" level to "supvervisor" level. It may have a special register that specifies the address of the [initial] supervisor stack pointer. Atomically, it swaps in the value, pushing the user SP onto the new kernel stack.
If the interrupt is interrupting a CPU that is already at supervisor level, the existing [supervisor] SP will be used unchanged.
Note that x86 has privilege "ring" levels. User mode is ring 3 and the highest [most privileged] level is ring 0. For arm, some arches can have a "hypervisor" privilege level [which is higher privilege than "supervisor" privilege].
The setup of the mode/context is handled in arch/arm/kernel/entry-*.S code.
An svc is a synchronous interrupt [generated by a special CPU instruction]. The resulting context is the context of the currently executing thread. It is analogous to "call function in kernel mode". The resulting context is "kernel thread mode". At that point, it's not terribly useful to think of it as an "interrupt" anymore.
In fact, on some arches, the syscall instruction/mechanism doesn't use the interrupt vector table. It may have a fixed address or use a "call gate" mechanism (e.g. x86).
Each thread has its own stack which is different than the initial/interrupt stack.
Thus, once the entry code has established the context/mode, it is not executing in "interrupt mode". So, the full range of kernel functions is available to it.
An interrupt from a H/W device is asynchronous [may occur at any time the CPU's internal interrupt enable flag is set]. It may interrupt a userspace application [executing in application mode] OR kernel thread mode OR an existing ISR executing in interrupt mode [from another interrupt]. The resulting ISR is executing in "interrupt mode" or "ISR mode".
Because the ISR can interrupt a kernel thread, it may not do certain things. For example, if the CPU were in [kernel] thread mode, and it was in the middle of a kmalloc call [GFP_KERNEL], the ISR would see partial state and any action that tried to adjust the kernel's heap would result in corruption of the heap.
This is a design choice by linux for speed.
Kernel ISRs may be classified as "fast interrupts". The ISR executes with the CPU interrupt enable [IE] flag cleared. No normal H/W interrupt may interrupt the ISR.
So, if another H/W device asserts its interrupt request line [in the external interrupt controller], that request will be "pending". That is, the request line has been asserted but the CPU has not acknowledged it [and the CPU has not jumped via the interrupt table].
The request will remain pending until the CPU allows further interrupts by asserting IE. Or, the CPU may clear the pending interrupt without taking action by clearing the pending interrupt in the interrupt controller.
For a "slow" interrupt ISR, the interrupt entry code will clear the interrupt in the external interrupt controller. It will then rearm interrupts by setting IE and call the ISR. This ISR can be interrupted by other [higher priority] interrupts. The result is a "stacked" interrupt.
I have been searching all over the places, I come to the conclusion is that the interrupts have a higher priority than some of exceptions in the Linux kernel.
An exception [synchronous interrupt] can be interrupted if the IE flag is enabled. An svc is treated differently but after the entry code is executed, the IE flag is set, so the actual syscall code [executing in kernel thread mode] can be interrupted by a H/W interrupt.
Or, in limited circumstances, the kernel code can generate an exception (e.g. a page fault caused by a kernel action [which is usually deemed fatal]).
but I am still looking on how exactly the context switching happen when executing an exception and letting the processor to execute in a thread mode while the SVCall exception is pending (was preempted and have not returned yet)... I think when I understand that, it would be more clear to me.
I think you have to be very careful with the terminology. In particular, when combining terms from disparate sources. Although user mode, kernel thread mode, or interrupt mode can be considered a context [in the dictionary sense of the word], context switching usually means that the current thread is suspended, the scheduler selects a new thread to run and resumes it. That is separate from the user-to-kernel transition.
And if there is any recommended resources about that for ARM-Cortex-M3/4, it would be nice
Here is something: https://interrupt.memfault.com/blog/arm-cortex-m-exceptions-and-nvic But, be very careful in applying the terminology therein. What it considers "pending" only exists in the kernel during the entry code. What is more relevant is what the kernel does to set up mode/context and the terms are not equivalent.
So, from the kernel's standpoint, it's probably better to not consider an svc as "pending".

Entering sleep mode on arm cortex m4

I'm trying to put a cortex m4 processor to sleep for a little less than a second. I want to be able to tell it to sleep, then a second later, or when a button is pressed, pick up right where I left off. I've looked in the reference manual and VLPS mode looks like it would fit my needs. I don't know how to begin to enter that mode or how to program the NVIC.
More Info:
I am doing this in C, on the bare metal.

You can download and inspect the code that implements this demo. Although the demo is for an RTOS the code used to place the CPU into a sleep mode is the same whether an RTOS is being used or the application is running on bare metal.
There are generic things you can do to place a Cortex-M3 core into a low power state (see the WFI instruction). To get extreme low power then you have to do chip specific things as well. The above linked code performs some chip specific pre-sleep processing (turn of peripherals, set the chips own sleep mode, etc.) before calling WFI, then does some chip specific things when it returns from the WFI instruction.

You don't need a RTOS in order to wake up from sleep a Cortex M4, what you need is to use and interrupt (ISR) you should refer to the manufacturer manual, you may wake up with a timer(ISR) or a button(GPIO) depending of the sleep-hibernation modes of your particular chip. Here is a ARM document more in depth about it.
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0553a/BABGGICD.html

What is the irq latency due to the operating system?

How can I estimate the irq latency on ARM processor?
What is the definition for irq latency?

Interrupt Request (irq) latency is the time that takes for interrupt request to travel from source of the interrupt to the point when it will be serviced.
Because there are different interrupts coming from different sources via different paths, obviously their latency is depending on the type of the interrupt. You can find table with very good explanations about latency (both value and causes) for particular interrupts on ARM site
You can find more information about it in ARM9E-S Core Technical Reference Manual:
4.3 Maximum interrupt latency
If the sampled signal is asserted at the same time as a multicycle instruction has started
its second or later cycle of execution, the interrupt exception entry does not start until
the instruction has completed.
The longest LDM instruction is one that loads all of the registers, including the PC.
Counting the first Execute cycle as 1, the LDM takes 16 cycles.
• The last word to be transferred by the LDM is transferred in cycle 17, and the abort
status for the transfer is returned in this cycle.
• If a Data Abort happens, the processor detects this in cycle 18 and prepares for
the Data Abort exception entry in cycle 19.
• Cycles 20 and 21 are the Fetch and Decode stages of the Data Abort entry
respectively.
• During cycle 22, the processor prepares for FIQ entry, issuing Fetch and Decode
cycles in cycles 23 and 24.
• Therefore, the first instruction in the FIQ routine enters the Execute stage of the
pipeline in stage 25, giving a worst-case latency of 24 cycles.
and
Minimum interrupt latency
The minimum latency for FIQ or IRQ is the shortest time the request can be sampled
by the input register (one cycle), plus the exception entry time (three cycles). The first
interrupt instruction enters the Execute pipeline stage four cycles after the interrupt is
asserted

There are three parts to interrupt latency:
The interrupt controller picking up the interrupt itself. Modern processors tend to do this quite quickly, but there is still some time between the device signalling it's pin and the interrupt controller picking it up - even if it's only 1ns, it's time [or whatever the method of signalling interrupts are].
The time until the processor starts executing the interrupt code itself.
The time until the actual code supposed to deal with the interrupt is running - that is, after the processor has figured out which interrupt, and what portion of driver-code or similar should deal with the interrupt.
Normally, the operating system won't have any influence over 1.
The operating system certainly influences 2. For example, an operating system will sometimes disable interrupts [to avoid an interrupt interfering with some critical operation, such as for example modifying something to do with interrupt handling, or when scheduling a new task, or even when executing in an interrupt handler. Some operating systems may disable interrupts for several milliseconds, where a good realtime OS will not have interrupts disabled for more than microseconds at the most.
And of course, the time it takes from the first instruction in the interrupt handler runs, until the actual driver code or similar is running can be quite a few instructions, and the operating system is responsible for all of them.
For real time behaviour, it's often the "worst case" that matters, where in non-real time OS's, the overall execution time is much more important, so if it's quicker to not enable interrupts for a few hundred instructions, because it saves several instructions of "enable interrupts, then disable interrupts", a Linux or Windows type OS may well choose to do so.

Mats and Nemanja give some good information on interrupt latency. There are two is one more issue I would add, to the three given by Mats.
Other simultaneous/near simultaneous interrupts.
OS latency added due to masking interrupts. Edit: This is in Mats answer, just not explained as much.
If a single core is processing interrupts, then when multiple interrupts occur at the same time, usually there is some resolution priority. However, interrupts are often disabled in the interrupt handler unless priority interrupt handling is enabled. So for example, a slow NAND flash IRQ is signaled and running and then an Ethernet interrupt occurs, it may be delayed until the NAND flash IRQ finishes. Of course, if you have priorty interrupts and you are concerned about the NAND flash interrupt, then things can actually be worse, if the Ethernet is given priority.
The second issue is when mainline code clears/sets the interrupt flag. Typically this is done with something like,
mrs r9, cpsr
biceq r9, r9, #PSR_I_BIT
Check arch/arm/include/asm/irqflags.h in the Linux source for many macros used by main line code. A typical sequence is like this,
lock interrupts;
manipulate some flag in struct;
unlock interrupts;
A very large interrupt latency can be introduced if that struct results in a page fault. The interrupts will be masked for the duration of the page fault handler.
The Cortex-A9 has lots of lock free instructions that can prevent this by never masking interrupts; because of better assembler instructions than swp/swpb. This second issue is much like the IRQ latency due to ldm/stm type instructions (these are just the longest instructions to run).
Finally, a lot of the technical discussions will assume zero-wait state RAM. It is likely that the cache will need to be filled and if you know your memory data rate (maybe 2-4 machine cycles), then the worst case code path would multiply by this.
Whether you have SMP interrupt handling, priority interrupts, and lock free main line depends on your kernel configuration and version; these are issues for the OS. Other issues are intrinsic to the CPU/SOC interrupt controller, and to the interrupt code itself.

What is the difference between FIQ and IRQ interrupt system?

I want to know the difference between FIQ and IRQ interrupt system in
any microprocessor, e.g: ARM926EJ.

ARM calls FIQ the fast interrupt, with the implication that IRQ is normal priority. In any real system, there will be many more sources of interrupts than just two devices and there will therefore be some external hardware interrupt controller which allows masking, prioritization etc. of these multiple sources and which drives the interrupt request lines to the processor.
To some extent, this makes the distinction between the two interrupt modes redundant and many systems do not use nFIQ at all, or use it in a way analogous to the non-maskable (NMI) interrupt found on other processors (although FIQ is software maskable on most ARM processors).
So why does ARM call FIQ "fast"?
FIQ mode has its own dedicated banked registers, r8-r14. R14 is the link register which holds the return address(+4) from the FIQ. But if your FIQ handler is able to be written such that it only uses r8-r13, it can take advantage of these banked registers in two ways:
One is that it does not incur the overhead of pushing and popping any registers that are used by the interrupt service routine (ISR). This can save a significant number of cycles on both entry and exit to the ISR.
Also, the handler can rely on values persisting in registers from one call to the next, so that for example r8 may be used as a pointer to a hardware device and the handler can rely on the same value being in r8 the next time it is called.
FIQ location at the end of the exception vector table (0x1C) means that if the FIQ handler code is placed directly at the end of the vector table, no branch is required - the code can execute directly from 0x1C. This saves a few cycles on entry to the ISR.
FIQ has higher priority than IRQ. This means that when the core takes an FIQ exception, it automatically masks out IRQs. An IRQ cannot interrupt the FIQ handler. The opposite is not true - the IRQ does not mask FIQs and so the FIQ handler (if used) can interrupt the IRQ. Additionally, if both IRQ and FIQ requests occur at the same time, the core will deal with the FIQ first.
So why do many systems not use FIQ?
FIQ handler code typically cannot be written in C - it needs to be written directly in assembly language. If you care sufficiently about ISR performance to want to use FIQ, you probably wouldn't want to leave a few cycles on the table by coding in C in any case, but more importantly the C compiler will not produce code that follows the restriction on using only registers r8-r13. Code produced by a C compiler compliant with ARM's ATPCS procedure call standard will instead use registers r0-r3 for scratch values and will not produce the correct cpsr restoring return code at the end of the function.
All of the interrupt controller hardware is typically on the IRQ pin. Using FIQ only makes sense if you have a single highest priority interrupt source connected to the nFIQ input and many systems do not have a single permanently highest priority source. There is no value connecting multiple sources to the FIQ and then having software prioritize between them as this removes nearly all the advantages the FIQ has over IRQ.

FIQ or fast interrupt is often referred to as Soft DMA in some ARM references.Features of the FIQ are,
Separate mode with banked register including stack, link register and R8-R12.
Separate FIQ enable/disable bit.
Tail of vector table (which is always in cache and mapped by MMU).
The last feature also gives a slight advantage over an IRQ which must branch.
A speed demo in 'C'
Some have quoted the difficulty of coding in assembler to handle the FIQ. gcc has annotations to code a FIQ handler. Here is an example,
void __attribute__ ((interrupt ("FIQ"))) fiq_handler(void)
{
/* registers set previously by FIQ setup. */
register volatile char *src asm ("r8"); /* A source buffer to transfer. */
register char *uart asm ("r9"); /* pointer to uart tx register. */
register int size asm ("r10"); /* Size of buffer remaining. */
if(size--) {
*uart = *src++;
}
}
This translates to the following almost good assembler,
00000000 <fiq_handler>:
0: e35a0000 cmp sl, #0
4: e52d3004 push {r3} ; use r11, r12, etc as scratch.
8: 15d83000 ldrbne r3, [r8]
c: 15c93000 strbne r3, [r9]
10: e49d3004 pop {r3} ; same thing.
14: e25ef004 subs pc, lr, #4
The assembler routine at 0x1c might look like,
tst r10, #0 ; counter zero?
ldrbne r11, [r8] ; get character.
subne r10, #1 ; decrement count
strbne r11, [r9] ; write to uart
subs pc, lr, #4 ; return from FIQ.
A real UART probably has a ready bit, but the code to make a high speed soft DMA with the FIQ would only be 10-20 instructions. The main code needs to poll the FIQ r10 to determine when the buffer is finished. Main (non-interrupt code) may transfer and setup the banked FIQ registers by using the msr instruction to switch to FIQ mode and transfer non-banked R0-R7 to the banked R8-R13 registers.
Typically RTOS interrupt latency will be 500-1000 instructions. For Linux, it maybe 2000-10000 instructions. Real DMA is always preferable, however, for high frequency simple interrupts (like a buffer transfer), the FIQ can provide a solution.
As the FIQ is about speed, you shouldn't consider it if you aren't secure in coding in assembler (or willing to dedicate the time). Assembler written by an infinitely running programmer will be faster than a compiler. Having GCC assist can help a novice.
Latency
As the FIQ has a separate mask bit it is almost ubiquitously enabled. On earlier ARM CPUs (such as the ARM926EJ), some atomic operations had to be implemented by masking interrupts. Still even with the most advanced Cortex CPUs, there are occasions where an OS will mask interrupts. Often the service time is not critical for an interrupt, but the time between signalling and servicing. Here, the FIQ also has an advantage.
Weakness
The FIQ is not scalable. In order to use multiple FIQ sources, the banked registers must be shared among interrupt routines. Also, code must be added to determine what caused the interrupt/FIQ. The FIQ is generally a one trick pony.
If your interrupt is highly complex (network driver, USB, etc), then the FIQ probably makes little sense. This is basically the same statement as multiplexing the interrupts. The banked registers give 6 free variables to use which never load from memory. Register are faster than memory. Registers are faster than L2-cache. Registers are faster than L1-cache. Registers are fast. If you can not write a routine that runs with 6 variables, then the FIQ is not suitable. Note: You can double duty some register with shifts and rotates which are free on the ARM, if you use 16 bit values.
Obviously the FIQ is more complex. OS developers want to support multiple interrupt sources. Customer requirements for a FIQ will vary and often they realize they should just let the customer roll their own. Usually support for a FIQ is limited as any support is likely to detract from the main benefit, SPEED.
Summary
Don't bash my friend the FIQ. It is a system programers one trick against stupid hardware. It is not for everyone, but it has its place. When all other attempts to reduce latency and increase ISR service frequency has failed, the FIQ can be your only choice (or a better hardware team).
It also possible to use as a panic interrupt in some safety critical applications.

A feature of modern ARM CPUs (and some others).
From the patent:
A method of performing a fast
interrupt in a digital data processor
having the capability of handling more
than one interrupt is provided. When a
fast interrupt request is received a
flag is set and the program counter
and condition code registers are
stored on a stack. At the end of the
interrupt servicing routine the return
from interrupt instructions retrieves
the condition code register which
contains the status of the digital
data processor and checks to see
whether the flag has been set or not.
If the flag is set it indicates that a
fast interrupt was serviced and
therefore only the program counter is
unstacked.
In other words, an FIQ is just a higher priority interrupt request, that is prioritized by disabling IRQ and other FIQ handlers during request servicing. Therefore, no other interrupts can occur during the processing of the active FIQ interrupt.

Chaos has already answered well, but an additional point not covered so far is that FIQ is at the end of the vector table and so it's common/traditional to just start the routine right there, whereas the IRQ vector is usually just that. (ie a jump to somewhere else). Avoiding that extra branch immediately after a full stash and context switch is a slight speed gain.

another reason is in case of FIQ, lesser number of register is needed to push in the stack, FIQ mode has R8 to R14_fiq registers

FIQ is higher priority, and can be introduced while another IRQ is being handled. The most critical resource(s) are handled by FIQ's, the rest are handled by IRQ's.

I believe this is what you are looking for:
http://newsgroups.derkeiler.com/Archive/Comp/comp.sys.arm/2005-09/msg00084.html
Essentially, FIQ will be of the highest priority with multiple, lower priority IRQ sources.

FIQs are higher priority, no doubt, remaining points i am not sure..... FIQs will support high speed data transfer (or) channel processing, where high speed data processes is required we use FIQs and generally IRQs are used normal interrupt handlling.

No any magic about FIQ. FIQ just can interrupt any other IRQ which is being served,this is why it is called 'fast'. The system reacts faster on these interrupts but the rest is the same.

It Depends how we design interrupt handlers, as FIQ is at last it may not need one branch instruction, also it has unique set of r8-r14 registers so next time we come back to FIQ interrupt we do not need to push/pop up the stack. Ofcourse it saves some cycles, but again it is not wise to have more handlers serving one FIQ and yes FIQ is having more priority but it is not any reason to say it handles the interrupt faster, both IRQ/FIQ run at same CPU frequency, So they must be running at same speed.

This may be wrong. All I know is that FIQ stands for Fast Interrupt Request and that IRQ stands for Interrupt Request. Judging from these names, I will guess that a FIQ will be handled(thrown?) faster than an IRQ. It probably has something to do with the design of the processor where an FIQ will interrupt the process faster than an IRQ. I apologize if I'm wrong, but I normally do higher level programming, I'm just guessing right now.

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight