Programming embedded without interrupts - arm

I never would have believed I could be in this position in 2017, but I have a target system (LPC2138) which absolutely refuses to handle interrupts, despite lots of trying on my part. For various reasons I do need to work with it, so it’s just a matter of getting on with it. The application is ‘interrupt friendly’, with several asynchronous I/O streams (SPI, UART) plus timer signals. The one thing in my favour is that the processor is very fast in comparison to my real time requirements, so I have plenty of spare grunt available.
The approach I’m stuck with is to do the whole thing in one big polled loop, that will include 3 FIFOs to handle I/O. On a quick glance it seems viable, does anyone have any comments based on experience?
The interrupt problem is not trivial: 100% platform-compatible hello-world snippets straight off the web fail to work; when they run, the system crashes in a scrambled state. If there does happen to be a clear, expert fix somewhere that someone knows about, a pointer would be greatly appreciated.

Without knowing your application and your target platform well, I can't give you a definitive answer!
But, you asked for comments based on experience. Here goes :-)
You mention real-time requirements. Working without interrupts can actually help with that! When we did projects with hard real-time requirements, we did not use interrupts. Let's say we were handling the incoming data stream and had exactly 20 us to handle a word or we would miss the next one, and right in the middle of processing one word we were interrupted. Bang! Lost the next one. So we did a lot of polling. In a different application the design decisions could be different, using interrupts to handle time-critical work at the expense of non-real-time code, or some such. The philosophy in the shop I was working in then was very anti-interrupt.
Polling may "waste" some resources (not really waste, since you have to do it :-) ). But you mention that your processor is fast enough. If you can meet your speed requirements, enjoy polling. It is easier to handle than interrupts, in a lot of ways. Your program has more control over what will happen when.
FIFOs are good. They soften your real-time requirements.
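To make the polled-loop-plus-FIFOs idea concrete, here is a minimal single-producer/single-consumer ring-buffer sketch in C; the names and the 64-byte size are my own choices, not from the question. With one producer and one consumer (e.g. the poll loop fills it from the UART, the application drains it) and no interrupts at all, no locking is needed:

```c
#include <stdint.h>

#define FIFO_SIZE 64u                /* must be a power of two */

typedef struct {
    volatile uint32_t head;          /* next slot to write */
    volatile uint32_t tail;          /* next slot to read  */
    uint8_t buf[FIFO_SIZE];
} fifo_t;

/* returns 0 if full, 1 on success */
static int fifo_put(fifo_t *f, uint8_t c)
{
    if (f->head - f->tail == FIFO_SIZE) return 0;
    f->buf[f->head & (FIFO_SIZE - 1u)] = c;
    f->head++;
    return 1;
}

/* returns 0 if empty, 1 on success */
static int fifo_get(fifo_t *f, uint8_t *c)
{
    if (f->head == f->tail) return 0;
    *c = f->buf[f->tail & (FIFO_SIZE - 1u)];
    f->tail++;
    return 1;
}
```

The power-of-two size makes the index wrap a cheap mask instead of a modulo, and letting head/tail run free (comparing their difference) distinguishes full from empty without wasting a slot.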
I don't know your hw design at all, but if you have a logic chip there and a flexible hw engineer (OK, that's an oxymoron :-) ), you can route some of your inputs through the hw and use hw logic to handle some of your simple requirements, presenting an interface that makes writing the program easier (for example, some kind of FIFO optimized for your specific needs, or a register to poll that gives you several pieces of information in one go, etc.)
So, go for it! You'll learn a whole new approach that even has some advantages.

dont have a clue where you are stuck, we need more info, but maybe a small skeleton will confirm you are at least doing these few things. have you done an arm7 before or is this the first time? or are you well versed in the arm7/arm world but just cant get the interrupts to work?
start.s
.globl _start
_start:
ldr pc,reset_handler
ldr pc,undefined_handler
ldr pc,swi_handler
ldr pc,prefetch_handler
ldr pc,data_handler
ldr pc,unused_handler
ldr pc,irq_handler
ldr pc,fiq_handler
reset_handler: .word reset
undefined_handler: .word hang
swi_handler: .word hang
prefetch_handler: .word hang
data_handler: .word hang
unused_handler: .word hang
irq_handler: .word irq
fiq_handler: .word hang
reset:
;# (PSR_IRQ_MODE|PSR_FIQ_DIS|PSR_IRQ_DIS)
mov r0,#0xD2
msr cpsr_c,r0
ldr sp,=0x40002000
;# (PSR_FIQ_MODE|PSR_FIQ_DIS|PSR_IRQ_DIS)
mov r0,#0xD1
msr cpsr_c,r0
ldr sp,=0x40003000
;# (PSR_SVC_MODE|PSR_FIQ_DIS|PSR_IRQ_DIS)
mov r0,#0xD3
msr cpsr_c,r0
ldr sp,=0x40004000
bl notmain
hang: b hang
.globl PUT32
PUT32:
str r1,[r0]
bx lr
.globl GET32
GET32:
ldr r0,[r0]
bx lr
.globl dummy
dummy:
bx lr
.globl enable_irq
enable_irq:
mrs r0,cpsr
bic r0,r0,#0x80
msr cpsr_c,r0
bx lr
irq:
push {r0,r1,r2,r3,r4,r5,r6,r7,r8,r9,r10,r11,r12,lr}
bl c_irq_handler
pop {r0,r1,r2,r3,r4,r5,r6,r7,r8,r9,r10,r11,r12,lr}
subs pc,lr,#4
yes that is too many registers...only need the volatile (caller-saved) ones, the compiler covers the others.
notmain.c
void c_irq_handler ( void )
{
}
void dummy ( unsigned int );
void notmain ( void )
{
unsigned int ra;
for(ra=0;ra<100;ra++) dummy(ra);
}
flash.ld
MEMORY
{
rom : ORIGIN = 0x00000000, LENGTH = 0x1000
ram : ORIGIN = 0x40000000, LENGTH = 0x1000
}
SECTIONS
{
.text : { *(.text*) } > rom
.bss : { *(.bss*) } > ram
}
guess I didnt need .bss
build
arm-none-eabi-as start.s -o start.o
arm-none-eabi-gcc -O2 -c notmain.c -o notmain.o
arm-none-eabi-ld -T flash.ld start.o notmain.o -o so.elf
arm-none-eabi-objdump -D so.elf > so.list
arm-none-eabi-objcopy so.elf -O binary so.bin
always examine your so.list
they (NXP) have an interesting way to determine if you have valid code in flash or if they should dump into their bootloader:
Criterion for valid user code: The reserved ARM interrupt vector
location (0x0000 0014) should contain the 2’s complement of the
check-sum of the remaining interrupt vectors. This causes the checksum
of all of the vectors together to be 0.
I didnt do that yet, can do that by hand or have a program come along after and do it programmatically.
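Doing it programmatically might look like the following host-side sketch (the function name and the array-of-eight-words interface are my own; the rule itself is straight from the quoted criterion, where slot 5 is offset 0x14):

```c
#include <stdint.h>

/* Value for the reserved vector slot (offset 0x14) such that the 32-bit
   sum of all eight vector words is zero; slot 5 itself is skipped.
   Feed it the first eight words of the image and patch word 5 with
   the result. */
uint32_t lpc_vector_checksum(const uint32_t vectors[8])
{
    uint32_t sum = 0;
    for (int i = 0; i < 8; i++)
        if (i != 5)            /* slot 5 = offset 0x14, the reserved word */
            sum += vectors[i];
    return 0u - sum;           /* 2's complement of the sum */
}
```

A post-build tool would read the first 32 bytes of so.bin, patch word 5 with this value, and write the file back; the 32-bit sum of all eight words then wraps to zero, which is what the bootloader checks.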
change it to this
ldr pc,reset_handler
ldr pc,undefined_handler
ldr pc,swi_handler
ldr pc,prefetch_handler
ldr pc,data_handler
.word 0xb8a06f58 #ldr pc,unused_handler
ldr pc,irq_handler
ldr pc,fiq_handler
and that should work fine.
If this is completely useless or too elementary then let me know, will delete this answer no problem.
notmain.c
void PUT32 ( unsigned int, unsigned int );
unsigned int GET32 ( unsigned int );
void enable_irq ( void );
#define T0IR 0xE0004000
#define T0TCR 0xE0004004
#define T0PC 0xE0004010
#define T0MCR 0xE0004014
#define T0TC 0xE0004008
#define T0MCR0 0xE0004018
void c_irq_handler ( void )
{
PUT32(T0IR,1);
}
void notmain ( void )
{
PUT32(T0TCR,2);
PUT32(T0TCR,0);
PUT32(T0TC,0);
PUT32(T0MCR0,0x100000);
PUT32(T0MCR,0x1); //3);
PUT32(T0TCR,1);
while(1)
{
if(GET32(T0IR)&1) break;
}
PUT32(T0IR,1);
PUT32(T0TCR,2);
PUT32(T0TCR,0);
PUT32(T0TC,0);
PUT32(T0MCR0,0x100000);
PUT32(T0MCR,0x1); //3);
PUT32(T0IR,1);
enable_irq();
PUT32(T0TCR,1);
while(1) continue;
}
just banged this out from the manual, didnt check if there was a clock enable for the timer, etc. Personally I get the gpio up first, blink an led with a big counter loop
for(ra=0;ra<0x20000;ra++) dummy(ra);
then use a timer in a polling mode (just get it started free-running) to blink based on time; from this you can figure out the clock speed. that leads into the uart: get the uart up, have a trivial routine for printing data
void hexstring ( unsigned int d )
{
//unsigned int ra;
unsigned int rb;
unsigned int rc;
rb=32;
while(1)
{
rb-=4;
rc=(d>>rb)&0xF;
if(rc>9) rc+=0x37; else rc+=0x30;
uart_send(rc);
if(rb==0) break;
}
uart_send(0x0D);
uart_send(0x0A);
}
octal is even easier but man is it hard to think in octal...
THEN finally the interrupt stuff. take code like the above, and I would poll and print the various registers in the timer, would see if that interrupt register does fire in a polling mode first (second, third, fourth...) without enabling interrupts to the processor (dont do that for a while).
Once I can see it at the peripheral level (some dont work that way, but assume this one does), move out a layer. In this case there is a VIC, and I assume this register
VICRawIntr 0xFFFFF008
should also be asserted if the timer interrupt is asserted and has fired. confirm it is (bit 4 it appears) confirm it goes away when the interrupt is cleared in the peripheral.
VICIntSelect resets to zero which is irq, that is what we want dont need to touch it for now.
I assume set bit 4 in VICIntEnable
then do the polling thing again printing out to see what is going on.
Now I would expect to see the interrupt show in VICIRQStatus (still completely polling do not enable the irq to the processor yet) and go away when the peripheral clears, or figure out how to clear it if the peripheral interrupt being cleared does not make it this far.
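The layer-by-layer check can be sketched as a small pure function; the addresses are from the LPC213x manual, the decode-from-raw-values interface is my own so the logic can be exercised off-target:

```c
#include <stdint.h>

#define T0IR       0xE0004000u   /* timer 0 interrupt register */
#define VICRAWINTR 0xFFFFF008u   /* VIC raw interrupt status   */

/* Decode the two layers from raw register values: bit 0 of the result
   means pending at the peripheral (T0IR MR0), bit 1 means visible at
   the VIC (timer 0 appears to be VIC source 4). Both should assert
   together after the match fires, and both should drop when T0IR is
   written to clear. */
uint32_t timer0_irq_layers(uint32_t t0ir, uint32_t vicrawintr)
{
    uint32_t layers = 0;
    if (t0ir & 1u)               layers |= 1u;
    if (vicrawintr & (1u << 4))  layers |= 2u;
    return layers;
}
```

On target you would call something like hexstring(timer0_irq_layers(GET32(T0IR), GET32(VICRAWINTR))); inside the poll loop and watch the value go 0, 3, 0 as the match fires and is cleared.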
NOW it is time to enable the irq to the processor, and I personally would shove a byte into the uart to see it pop out, or flick an led on or something.
in theory we just clear the peripheral and safely return to the application.
I follow the same procedure no matter what processor it is mcu or full sized, etc. interrupts can be nightmares and the more code you write without testing the more likely to fail. sometimes one line of code per test is required. YMMV.
again if this is completely useless sorry, will delete. I think I have one/some 2148s but not a 2138, and am not going to order one just to write/test working code. been using arm since the ARM7TDMI was out and becoming popular, to the present armv8s which are far more painful. the pi-zero and such are quite fun as they are old school like this arm7...
I figure the timer would be the easier one to get a real interrupt from. another might be a gpio pin, although I think that is more work. I would tie a jumper between pins, make one an output and the other an input with interrupt edge detection (if this chip has it), use the output gpio to change the state of the input in a clean manner, and go through the whole polling process of watching the interrupt hit each layer in turn, confirming you can remove the interrupt each time at each layer. then bang on the edge of the core, and then finally let it into the core.

Related

Why do I need an infinite loop in STM32 programming?

I'm programming an STM32F4 in C (gcc); it's an ARM Cortex-M4. I see all examples finish their main() function with an infinite loop, even when the rest of the program will be executed from interrupts. If I try to remove the loop from my program, the interrupts stop being fired too.
Why can't I just remove this loop and exit the main thread?
here is the assembly (I guess it's thumb, but I can't read that, even with the doc):
LoopFillZerobss:
ldr r3, = _ebss
cmp r2, r3
bcc FillZerobss
/* Call the clock system initialization function.*/
bl SystemInit
/* Call the application's entry point.*/
bl main
bx lr
.size Reset_Handler, .-Reset_Handler
Take a look at the setup code that runs before main in your project. It might be some slim assembly code or something more complicated, but in general it's pretty close to the bare minimum amount of processor setup required to initialize a stack and get the C runtime going.
If you were to return from main, what is your processor supposed to do? Reset? Hang? There's no one good answer, so you'll have to look at the runtime support code being linked with your program to see what its designers decided. In your case, it sounds like they didn't make any allowances for main to return, so the processor just crashes/takes an exception and your program stops working.
Edit: It seems like what you're actually looking for is a way to enter a low power state during the idle loop. That's certainly possible - since your processor is an ARM Cortex-M4, there's a simple instruction to do just that:
while (1)
{
asm("wfi");
}
If you're using CMSIS (and it looks like you are, given your use of SystemInit), the assembly is probably already done for you:
while(1)
{
__WFI();
}
More details at this link.
you are not running on an operating system. main() is just a function like any other; it returns to where it was called from. With a bare metal system like this, though, there is no operating system to return to. So if your software is all interrupt driven and main() is simply for setup, then you need to keep the processor in a controlled state, either in an infinite loop or in a low power mode. you can do that at the end of main() or whatever your setup function is called, or in the assembly that calls main:
bl main
b . ;# an infinite loop
or if you want wfi in there:
bl main
xyz:
wfi
b xyz
i guess you haven't defined an interrupt handler which your program needs.

Enabling Interrupts in U-boot for ARM cortex A-9

I am trying to configure a GPIO interrupt in u-boot. This is to test the interrupt response time without any OS intervention (bare metal). I was able to configure the pin muxing and was also successful in setting up the interrupt with the GPIO pin.
My question is regarding the registering of the interrupt service routine. I see that the interrupt vector table for my platform is at address 0xFFFF0000 (I read the System Control Register to find this out). The interrupt ID for the GPIO was 56, and with that I just calculated the address where my interrupt service routine should reside and tried writing that address with the pointer to my ISR routine. Is this the right way of doing it? Or do I have to take care of all the other things like context saving etc. by myself?
Note : I am using an ARM Cortex A-9.
Edit :
Based on the answers i went through the code i have the following questions. The definition of
do_irq for my architecture (armv7) is not doing much, and CONFIG_USE_IRQ does not work for me since functions like arch_interrupt_init are not defined for me. So I conclude interrupts are not supported for my architecture. Now, if I have to define this on my own, what functions do I need to implement to get this working? Since this is a very small part of my project, I want to see if implementing this is feasible. I just want to know if this requires a few lines of code or some real effort to implement the interrupt support.
The ARM vectors all interrupts to address 0xFFFF0018 (or 0x00000018). This is typically an unconditional branch. Then the code will inspect some interrupt controller hardware to determine the number 56. Typically, there is a routine to set the handler for the interrupt number, so you don't manually patch the code; this table is dependent on how the u-boot interrupt handling is implemented.
In my u-boot source, the interrupt table looks like this,
.globl _start
_start:
b reset
ldr pc, _undefined_instruction
ldr pc, _software_interrupt
ldr pc, _prefetch_abort
ldr pc, _data_abort
ldr pc, _not_used
ldr pc, _irq
ldr pc, _fiq
...
_irq:
.word irq
So _irq is a label used to install a routine for interrupt handling; some assembler in the same file then calls do_irq(), gated on CONFIG_USE_IRQ. Part of the API is in *lib_arm/interrupts.c*. Some CPUs are set up to handle irqs, such as cpu/arm720t/interrupts.c for an S3C4510B. Here you can see that the code gets a register from the interrupt controller and then branches through a table.
So by default u-boot doesn't seem to have support for interrupts. This is not surprising as a boot loader is usually polling based for simplicity and speed.
Note: My u-boot is based on 2009.01-rc3.

How does the ARM linker know where the exception table stops?

I have come across something very similar to the following when looking for basic, bare-metal Cortex-M3 programming information (referred to in a great answer right here, that I'll attempt to locate later).
/* vectors.s */
.cpu cortex-m3
.thumb
.word 0x20002000 /* stack top address */
.word _start /* 1 Reset */
.word hang /* 2 NMI */
.word hang /* 3 HardFault */
.word hang /* 4 MemManage */
.word hang /* 5 BusFault */
.word hang /* 6 UsageFault */
.word hang /* 7 RESERVED */
.word hang /* 8 RESERVED */
.word hang /* 9 RESERVED*/
.word hang /* 10 RESERVED */
.word hang /* 11 SVCall */
.word hang /* 12 Debug Monitor */
.word hang /* 13 RESERVED */
.word hang /* 14 PendSV */
.word hang /* 15 SysTick */
.word hang /* 16 External Interrupt(0) */
.word hang /* 17 External Interrupt(1) */
.word hang /* 18 External Interrupt(2) */
.word hang /* 19 ... */
.thumb_func
.global _start
_start:
bl notmain
b hang
.thumb_func
hang: b .
This makes sense to me, I understand what it does, but what does not make sense to me is how the linker (or CPU, I'm not sure which...) knows where the exception table ends and actual code begins. It seems to me from the Cortex-M3 documentation that there can be an arbitrary number of external interrupts in the table.
How does this work, and what should I have read to learn?
That is my code, and it is the programmer, not the linker, that needs to know what is going on. The cortex-m's have different vector tables and they can be quite lengthy, many many individual interrupt vectors. The above was just a quicky from one flavor one day. If you will never use anything more than the reset vector, do you need to burn more memory locations creating an entire table? probably not. You might want to make one deep enough to cover undefined exceptions and aborts and such, so you can trap them with a controlled hang. If you only need reset and one interrupt but that interrupt is pretty deep in the table, you can make a huge table to cover both or, if brave enough, put some code between them to recover the lost space.
To learn/read more you need to go to the TRM, technical reference manual, for the specific cortex-m core: cortex-m0, cortex-m3 or cortex-m4, as well as possibly the ARM ARM for the architecture (the cortex-m3 and cortex-m4 are armv7-m; the cortex-m0 is armv6-m). ARM ARM means ARM Architectural Reference Manual, which talks generically about the whole family but doesnt cover core specific details to any great length; you usually need an ARM ARM and the right TRM when programming for an ARM (if you need to get into anything low level like asm or specific registers). All can be found at the arm website infocenter.arm.com; along the left look for Architecture or look for Cortex-M series processors.
These cores are new enough they probably dont have more than the original rev number. For some older cores it is best to get the TRM for the specific core. Take the ARM11 Mpcore for example if the vendor used rev 1.0 (r1p0) of the core despite the obsolete markings on the manual you want to at least try to use the rev 1.0 manual over the rev 2.0 manual if there are differences. The vendors dont always tell you which rev they bought/used so this can be a problem when sorting out how to program the thing or what errata applies to you and what doesnt. (Linux is FULL of mistakes related to ARM because of not understanding this, errata applied wrong in general or applied to the wrong core, instructions used on the wrong core or at the wrong time, etc).
I probably wrote that handler for a cortex-m3 when i first got one and then cut and pasted here and there not bothering to fix/change it.
The core (CPU, logic) definitely knows how many interrupts are supported, and knows the complete make up of the vector table. There might be control signals on the edge of the core that might change these kinds of things, and the vendor may have tied those one way or made them programmable, etc. No matter what the logic for that core definitely knows what the vector table looks like it is up to the programmer to make the code match the hardware as needed. This will all be described in the TRM for that core.
EDIT
Hardcoded in the logic will be an address or an offset for each of these events or interrupts. So when the interrupt or event happens it performs a memory read from that address.
This is just a bank of flash memory, you feed it an address flick a read strobe and some data comes out. Those bits dont mean anything until you put them in a context. You can cause a memory cycle at address 0x10 if you create some code that does a load instruction at that address, or you can branch to that address and a fetch cycle reads that location hoping to find an instruction, and if an event has that hardcoded address and a read happens at that address it is hoping to find an address for a handler. but it is just a memory. Just like a file on a file system, its just bytes on the disk. If it is a jpeg then a particular byte might be a pixel, if the file on the disk is a program then a byte at an offset might be an instruction. its just a byte.
These addresses are generated directly from the logic. These days logic is written using programming languages (usually verilog or vhdl or some higher level language that produces verilog or vhdl as an output). No different than if you were to write a program in your favorite language you might choose to literally hardcode some address
x = (unsigned int *)0x1234;
or you might choose to use a structure or an array or some sort of other programming style, but once compiled it still ends up producing some fixed address or offset:
unsigned int vector_table[256];
...
handler_address = vector_table[interrupt_base+interrupt_number];
...
So as a programmer, at this low level, you have to know if and when the hardware is going to read one of these addresses and why. if you never use interrupts because you never enable any interrupts then those memory locations that might normally hold interrupt handler addresses, are now memory locations you can use for anything you want. if as you see in many, almost all, of my examples I only ever need the reset vector, the rest I dont use. I might accidentally hit an undefined instruction handler or data abort if I accidentally perform an unaligned access, but I dont worry about it enough to place a handler there, usually. I am going to crash/hang anyway so I will sort that problem out when I get there, cross that bridge when I get to it. So I am usually content with the bare minimum for a cortex-m the stack address and reset address:
.cpu cortex-m3
.thumb
.word 0x20002000 /* stack top address */
.word _start /* 1 Reset */
.thumb_func
.global _start
_start:
bl notmain
b hang
.thumb_func
hang: b .
And yes, absolutely I have placed a two word instruction bl notmain at the memory location for an NMI. if an NMI were to occur, then the cpu would read that location, assume it was an address for a handler, try to fetch the instruction at that address, which will have who knows what. It might know before even fetching that it is an invalid address and cause a data abort, which in the above case would be yet another address to somewhere, and it might turn into an infinite loop of data aborts. Or if by chance one of those instructions happens to appear like an address in our program then basically it will jump right into the middle of the program, which quite often will end up in some sort of crash. what is a crash? really? the cpu doing its job reading bytes from memory and interpreting them as instructions if those instructions tell it to do unaligned accesses or those bytes are not valid instructions or cause you to read invalid memory address you go back into the vector table. Or you might be so lucky that the messed up addresses being written or read by jumping in the middle of the code space cause a flash bank to be erased, or a byte to spit out a uart, or a gpio input port to be changed into an output port (if really lucky you make it try to drive against a ground or something and you melt down the chip).
If I were to start seeing weird things (and hopefully not hear or smell or see the chip melting down) I might toss in a few dozen entries in the vector table that point to a hang, or point to a handler that spits something out a port or turns on a gpio/led. if my weirdness now becomes this led coming on or the uart output in the handler then I have to sort out what event is happening, or if I think about the last few changes to my application I may realize I had an unaligned access or a branch into the weeds, etc.
It goes back to that saying, "The computer doesnt do what you want it to do, it does what you told it to do". You put the bytes there and it interpreted them for what they were, if you didnt put the right bytes in the right place (vector addresses to the right place, instructions that do the right thing, and the right data in the right place) it can/will crash.
The answer is it doesn't. It's the responsibility of the programmer to put a sufficient number of vectors to trap all of the ones generated by your CPU. If you screw it up, the CPU could try to load something after the vector table (like an instruction from your _start code), treat it like a vector and try to jump into it (which would probably cause a data abort).

Setting up Interrupt Vector Table, ARMv6

I'm trying to use usermode and SVC in my ARMv6 bare metal application, but for this I need to set up the SVC entry of the ARMv6 interrupt vector table to branch to my interrupt handler. But, I can't find a good example on how to do this (ie: what memory address exactly I need to set, and to what). I have done similar things in the past, but always with a more comprehensive bootloader (RedBoot) that set up some of this for me. Any help would be appreciated.
I am testing my application using:
qemu-system-arm -M versatilepb -cpu arm1176
Are you talking about the SWI interrupt? Or one of the others (FIQ, IRQ)? In either case I think I know what the problem is: qemu is for running linux, so your binary is not loaded at address 0x00000 and your entry points are not used by qemu for handling exceptions.
I have an example that uses qemu and implements a solution. Go to the qemu directory of http://github.com/dwelch67/yagbat. The qemu example is not really related to the gba thing in the yagbat repo, the gba is a 32 bit ARM thing so it was easy to borrow code from so I stuck it there.
The example was specifically written for your question as I tried to figure out how to use qemu in this manner. It appears that the address 0x00000000 space is simulated as ram, so you can re-write the qemu exception table and have the exceptions call code in the 0x10000 address space that your binary loads.
A quick and dirty solution is to make the entry point of the binary (that qemu loads to 0x10000) resemble a vector table at address 0x00000. The ldr pc instruction is relative to the program counter, the disassembly might show that it is loading an address at 0x10000 but it is really relative to the pc and the disassembler used the pc assuming the linked address being used.
.globl _start
_start:
ldr pc,start_vector_add
ldr pc,undef_vector_add
ldr pc,swi_vector_add
start_vector_add: .word start_vector
undef_vector_add: .word undef_vector
swi_vector_add: .word swi_vector
Then, before you cause any interrupts (in the example I use the swi instruction to cause an swi interrupt), you copy enough of the code from 0x10000 to 0x00000 to include the exception table and the list of addresses that it loads into the pc. By linking your program at 0x10000, those addresses are in the 0x10000 range. When the interrupt occurs, the exception handler that you have now modified will load the 0x10000-based address into the pc and your handler in the 0x10000 range will get called.
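The copy step itself is just a word copy of the front of the image; a sketch (the function and the word count are illustrative; on target src would be the 0x10000 image and dst would be 0x00000):

```c
#include <stdint.h>

/* Copy the first nwords of the loaded image (the ldr pc,... instructions
   plus the address words they load) down to the real vector area. Because
   ldr pc,[pc,#off] is pc-relative, the copied instructions still find
   their address words, and those words hold 0x10000-range addresses,
   which is exactly where the handlers live. */
void copy_vectors(uint32_t *dst, const uint32_t *src, unsigned nwords)
{
    for (unsigned i = 0; i < nwords; i++)
        dst[i] = src[i];
}
```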
Using this command line to run the binary in my example
qemu-system-arm -M versatilepb -m 128M -kernel hello_world.bin
and then ctrl-alt-3 (not F3 but 3) will switch to the serial console and you can see the output, and close that window to close out of qemu and stop the simulation.

What is the difference between FIQ and IRQ interrupt system?

I want to know the difference between FIQ and IRQ interrupt system in
any microprocessor, e.g: ARM926EJ.
ARM calls FIQ the fast interrupt, with the implication that IRQ is normal priority. In any real system, there will be many more sources of interrupts than just two devices and there will therefore be some external hardware interrupt controller which allows masking, prioritization etc. of these multiple sources and which drives the interrupt request lines to the processor.
To some extent, this makes the distinction between the two interrupt modes redundant and many systems do not use nFIQ at all, or use it in a way analogous to the non-maskable (NMI) interrupt found on other processors (although FIQ is software maskable on most ARM processors).
So why does ARM call FIQ "fast"?
FIQ mode has its own dedicated banked registers, r8-r14. R14 is the link register which holds the return address(+4) from the FIQ. But if your FIQ handler is able to be written such that it only uses r8-r13, it can take advantage of these banked registers in two ways:
One is that it does not incur the overhead of pushing and popping any registers that are used by the interrupt service routine (ISR). This can save a significant number of cycles on both entry and exit to the ISR.
Also, the handler can rely on values persisting in registers from one call to the next, so that for example r8 may be used as a pointer to a hardware device and the handler can rely on the same value being in r8 the next time it is called.
FIQ location at the end of the exception vector table (0x1C) means that if the FIQ handler code is placed directly at the end of the vector table, no branch is required - the code can execute directly from 0x1C. This saves a few cycles on entry to the ISR.
FIQ has higher priority than IRQ. This means that when the core takes an FIQ exception, it automatically masks out IRQs. An IRQ cannot interrupt the FIQ handler. The opposite is not true - the IRQ does not mask FIQs and so the FIQ handler (if used) can interrupt the IRQ. Additionally, if both IRQ and FIQ requests occur at the same time, the core will deal with the FIQ first.
So why do many systems not use FIQ?
FIQ handler code typically cannot be written in C - it needs to be written directly in assembly language. If you care sufficiently about ISR performance to want to use FIQ, you probably wouldn't want to leave a few cycles on the table by coding in C in any case, but more importantly the C compiler will not produce code that follows the restriction on using only registers r8-r13. Code produced by a C compiler compliant with ARM's ATPCS procedure call standard will instead use registers r0-r3 for scratch values and will not produce the correct cpsr restoring return code at the end of the function.
All of the interrupt controller hardware is typically on the IRQ pin. Using FIQ only makes sense if you have a single highest priority interrupt source connected to the nFIQ input and many systems do not have a single permanently highest priority source. There is no value connecting multiple sources to the FIQ and then having software prioritize between them as this removes nearly all the advantages the FIQ has over IRQ.
FIQ or fast interrupt is often referred to as Soft DMA in some ARM references. Features of the FIQ are,
Separate mode with banked register including stack, link register and R8-R12.
Separate FIQ enable/disable bit.
Tail of vector table (which is always in cache and mapped by MMU).
The last feature also gives a slight advantage over an IRQ which must branch.
A speed demo in 'C'
Some have quoted the difficulty of coding in assembler to handle the FIQ. gcc has annotations to code a FIQ handler. Here is an example,
void __attribute__ ((interrupt ("FIQ"))) fiq_handler(void)
{
/* registers set previously by FIQ setup. */
register volatile char *src asm ("r8"); /* A source buffer to transfer. */
register char *uart asm ("r9"); /* pointer to uart tx register. */
register int size asm ("r10"); /* Size of buffer remaining. */
if(size--) {
*uart = *src++;
}
}
This translates to the following almost good assembler,
00000000 <fiq_handler>:
0: e35a0000 cmp sl, #0
4: e52d3004 push {r3} ; use r11, r12, etc as scratch.
8: 15d83000 ldrbne r3, [r8]
c: 15c93000 strbne r3, [r9]
10: e49d3004 pop {r3} ; same thing.
14: e25ef004 subs pc, lr, #4
The assembler routine at 0x1c might look like,
cmp r10, #0 ; counter zero?
ldrbne r11, [r8] ; get character.
subne r10, #1 ; decrement count
strbne r11, [r9] ; write to uart
subs pc, lr, #4 ; return from FIQ.
A real UART probably has a ready bit, but the code to make a high speed soft DMA with the FIQ would only be 10-20 instructions. The main code needs to poll the FIQ r10 to determine when the buffer is finished. Main (non-interrupt code) may transfer and setup the banked FIQ registers by using the msr instruction to switch to FIQ mode and transfer non-banked R0-R7 to the banked R8-R13 registers.
Typically RTOS interrupt latency will be 500-1000 instructions. For Linux, it maybe 2000-10000 instructions. Real DMA is always preferable, however, for high frequency simple interrupts (like a buffer transfer), the FIQ can provide a solution.
As the FIQ is about speed, you shouldn't consider it if you aren't confident coding in assembler (or willing to dedicate the time). Assembler written by a sufficiently dedicated programmer will beat compiler output, and having GCC assist can help a novice.
Latency
As the FIQ has a separate mask bit, it can remain enabled almost everywhere. On earlier ARM CPUs (such as the ARM926EJ), some atomic operations had to be implemented by masking interrupts. Even on the most advanced Cortex CPUs, there are still occasions where an OS will mask interrupts. Often what is critical for an interrupt is not the service time itself but the time between signalling and servicing; here, too, the FIQ has an advantage.
Weakness
The FIQ is not scalable. In order to use multiple FIQ sources, the banked registers must be shared among interrupt routines. Also, code must be added to determine what caused the interrupt/FIQ. The FIQ is generally a one trick pony.
If your interrupt is highly complex (network driver, USB, etc.), then the FIQ probably makes little sense; this is basically the same statement as multiplexing the interrupts. The banked registers give six free variables to use which never load from memory. Registers are faster than memory. Registers are faster than L2 cache. Registers are faster than L1 cache. Registers are fast. If you cannot write a routine that runs with six variables, then the FIQ is not suitable. Note: you can double-duty some registers with shifts and rotates (which are free on the ARM) if you use 16-bit values.
Obviously the FIQ is more complex. OS developers want to support multiple interrupt sources, while customer requirements for a FIQ vary; often vendors realize they should just let the customer roll their own. Usually support for a FIQ is limited, as any generic support is likely to detract from its main benefit: SPEED.
Summary
Don't bash my friend the FIQ. It is a system programmer's one trick against stupid hardware. It is not for everyone, but it has its place. When all other attempts to reduce latency and increase ISR service frequency have failed, the FIQ can be your only choice (or a better hardware team).
It is also possible to use it as a panic interrupt in some safety-critical applications.
A feature of modern ARM CPUs (and some others).
From the patent:
A method of performing a fast interrupt in a digital data processor having the capability of handling more than one interrupt is provided. When a fast interrupt request is received a flag is set and the program counter and condition code registers are stored on a stack. At the end of the interrupt servicing routine the return from interrupt instruction retrieves the condition code register which contains the status of the digital data processor and checks to see whether the flag has been set or not. If the flag is set it indicates that a fast interrupt was serviced and therefore only the program counter is unstacked.
In other words, an FIQ is just a higher-priority interrupt request, which is prioritized by disabling the IRQ and other FIQ handlers during request servicing. Therefore, no other interrupts can occur during the processing of the active FIQ interrupt.
Chaos has already answered well, but an additional point not covered so far is that the FIQ vector is at the end of the vector table, so it's common/traditional to just start the routine right there, whereas the IRQ vector is usually just that: a vector, i.e. a jump to somewhere else. Avoiding that extra branch immediately after a full register stash and context switch is a slight speed gain.
Another reason is that in the case of the FIQ, fewer registers need to be pushed onto the stack, since FIQ mode has its own banked R8 to R14_fiq registers.
FIQ is higher priority, and can be raised while another IRQ is being handled. The most critical resource(s) are handled by FIQs; the rest are handled by IRQs.
I believe this is what you are looking for:
http://newsgroups.derkeiler.com/Archive/Comp/comp.sys.arm/2005-09/msg00084.html
Essentially, FIQ will be of the highest priority with multiple, lower priority IRQ sources.
FIQs are higher priority, no doubt; about the remaining points I am not sure. FIQs support high-speed data transfer or channel processing: where high-speed data processing is required we use FIQs, while IRQs are used for normal interrupt handling.
There is no magic about the FIQ. The FIQ can simply interrupt any other IRQ that is being served, which is why it is called 'fast'. The system reacts faster to these interrupts, but the rest is the same.
It depends on how we design the interrupt handlers. As the FIQ vector is last, it may not need a branch instruction, and it has a unique set of r8-r14 registers, so the next time we come back to the FIQ we do not need to push/pop the stack. Of course this saves some cycles, but it is not wise to have many handlers serving one FIQ. And yes, the FIQ has higher priority, but that is no reason to say it handles the interrupt faster; both IRQ and FIQ run at the same CPU frequency, so they must run at the same speed.
This may be wrong. All I know is that FIQ stands for Fast Interrupt Request and IRQ stands for Interrupt Request. Judging from these names, I would guess that a FIQ is handled (thrown?) faster than an IRQ. It probably has something to do with the design of the processor, where an FIQ interrupts the process faster than an IRQ. I apologize if I'm wrong; I normally do higher-level programming, and I'm just guessing right now.