I am trying to understand NAPI-enabled network drivers and have some doubts about them.
In layman's terms: whenever a network packet arrives at the interface, the CPU is notified and the appropriate Ethernet driver code (the interrupt handler) runs. The driver then copies the packet from the Ethernet device's memory into DMA buffers, and finally the packet is pushed up to the upper layers.
Is the above true for a NAPI-disabled Ethernet driver?
Now, for a NAPI-enabled Ethernet driver, whenever a packet initially arrives at the interface, the CPU is notified and the appropriate Ethernet driver code (the interrupt handler) runs. Inside the interrupt handler we check whether the interrupt signals a received packet:
if (statusword & SNULL_RX_INTER) {
        snull_rx_ints(dev, 0);   /* disable further RX interrupts */
        netif_rx_schedule(dev);
}
What does it mean to disable further interrupts?
Does it mean packets are still captured by the device and kept in device memory, but the CPU is not notified about the availability of these packets?
Also, what does it mean that the CPU is pooling the device? Is it that the CPU runs the snull_poll() method every few seconds, copies however many packets are in device memory into the DMA buffers, and pushes them to the upper layers?
It would be a great help if someone could give me a clear picture of this.
Now, for a NAPI-enabled Ethernet driver, whenever a packet initially arrives at the interface, the CPU is notified and the appropriate Ethernet driver code (the interrupt handler) runs. Inside the interrupt handler we check whether the interrupt signals a received packet.
What does it mean to disable further interrupts?
Normally a driver would clear the condition causing the interrupt. The NAPI driver, however, may also disable the receive interrupt when the ISR is done.
The assumption is that the arrival of one Ethernet frame may be the start of a burst or flood of frames. So instead of exiting interrupt mode and likely immediately reentering interrupt mode, why not test (i.e. poll) if more frames have already arrived?
Does it mean packets are still captured by the device
Yes.
Each arriving frame is stored by the Ethernet controller in a frame buffer.
and kept in device memory
It's not typically "device memory".
It is typically a set of buffers (e.g. ring buffer) allocated in main memory assigned to the Ethernet controller.
but the CPU is not notified about the availability of these packets?
Since the receive interrupt has been disabled, the NAPI driver is not notified of this event.
But since the driver is busy processing the previous frame, the interrupt request could not be serviced immediately anyway.
Also, what does it mean that the CPU is pooling the device?
Presumably you are actually asking about "polling"?
Polling simply means that the program (i.e. the driver) interrogates (i.e. reads and tests) status bit(s) for the condition it is waiting for.
If the condition is met, then it will process the event in a manner similar to an interrupt for that event.
If the condition is not met, then it may loop (in the generic case). But the NAPI driver, when the poll indicates that no more frames have arrived, will assume that the packet burst or flood is over, and will resume interrupt mode.
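To make the poll/interrupt hand-off concrete, here is a minimal sketch of a poll routine in the style of the LDD3 snull example your snippet comes from. The helper names (snull_priv, snull_dequeue_buf, snull_rx, snull_release_buffer) are assumptions modeled on that example, and the netif_rx_complete()/dev->quota interface shown is the old pre-napi_struct API, so treat it as an illustration rather than code to copy into a modern kernel.

static int snull_poll(struct net_device *dev, int *budget)
{
        int npackets = 0, quota = min(dev->quota, *budget);
        struct snull_priv *priv = netdev_priv(dev);
        struct snull_packet *pkt;

        /* Drain frames the hardware has already buffered, up to the quota. */
        while (npackets < quota && priv->rx_queue) {
                pkt = snull_dequeue_buf(dev);       /* next received frame          */
                snull_rx(dev, pkt);                 /* hand it to the upper layers  */
                snull_release_buffer(pkt);
                npackets++;
        }
        *budget -= npackets;
        dev->quota -= npackets;

        if (!priv->rx_queue) {                      /* no more frames: burst is over  */
                netif_rx_complete(dev);             /* leave poll mode...             */
                snull_rx_ints(dev, 1);              /* ...and re-enable RX interrupts */
                return 0;
        }
        return 1;                                   /* more work: stay in poll mode   */
}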
Is it that the CPU runs the snull_poll() method every few seconds, copies however many packets are in device memory into the DMA buffers, and pushes them to the upper layers?
The NAPI driver would not delay or suspend itself for a "few seconds" before polling.
The assumption is that Ethernet frames could be flooding the port, so the poll would be performed as soon as processing on the current frame is complete.
A possible bug in a NAPI driver is called "rotting packet".
When the driver transitions from the poll mode back to interrupt mode, a frame could arrive during this transition and be undetected by the driver.
Not until another frame arrives (and generates an interrupt) would the previous frame be "found" and processed by the NAPI driver.
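Modern NAPI drivers typically close that window by re-checking the hardware right after completing the poll. A minimal sketch using the current napi_struct API, assuming hypothetical helpers my_process_rx(), my_enable_rx_irq() and my_rx_pending():

static int my_poll(struct napi_struct *napi, int budget)
{
        int work = my_process_rx(napi, budget);     /* hypothetical RX processing loop */

        if (work < budget && napi_complete_done(napi, work)) {
                my_enable_rx_irq(napi);             /* back to interrupt mode               */
                if (my_rx_pending(napi))            /* did a frame slip in during the gap?  */
                        napi_schedule(napi);        /* poll again instead of letting it rot */
        }
        return work;
}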
BTW
You consistently write statements or questions similar to "the CPU does ..." or "notified to CPU".
The CPU is always (when not sleeping or powered off) executing machine instructions.
You should be concerned about to which logical entity (i.e. which program or source code module) those instructions belong.
You're asking software questions, so the fact that an interrupt causes a known, certain sequence by the CPU is a given and need not be mentioned.
ADDENDUM
I am just trying to understand drivers/net/ethernet/smsc/smsc911x.c in Linux source code.
The SMSC LAN911x Ethernet chips are more sophisticated than what I'm used to and have been describing above. Besides the MAC, these chips also have an integrated PHY, and have TX and RX FIFOs instead of using buffer ring or lists in main memory.
As per your suggestion I have started reading the SMSC LAN9118 datasheet and am trying to map it to the smsc911x_irqhandler function, where the interrupt status (INT_STS) and interrupt enable (INT_EN) registers are read, but I don't understand how the
if (likely(intsts & inten & INT_STS_RSFL_))
condition at line 1627 is checked.
INT_STS is defined in the header file as
#define INT_STS 0x58
and the table in Section 5.3, System Control and Status Registers, in the datasheet lists the register at (relative) address 0x58 as
58h INT_STS Interrupt Status
So the smsc911x device driver uses the exact same register name as the HW datasheet.
This 32-bit register is read in the ISR using this register offset:
u32 intsts = smsc911x_reg_read(pdata, INT_STS);
So the 32 bits of the interrupt status (in the variable intsts) are bitwise ANDed with the 32 bits of the interrupt mask (in the variable inten).
This produces the interrupt status bits that the driver is actually interested in. This may also be good defensive programming in case the HW sets status bits anyway for interrupt conditions that have not been enabled (in the INT_EN register).
Then that if statement does another bitwise AND to extract the one bit (INT_STS_RSFL_) that is being checked.
5.3.3 INT_STS—Interrupt Status Register
RX Status FIFO Level Interrupt (RSFL).
Generated when the RX Status FIFO reaches the programmed level
The likely() operator is for compiler optimization to utilize branch prediction capabilities in the CPU. The driver's author is directing the compiler to optimize the code for a true result of the enclosed logic expression (e.g. the ANDing of three integers, which would indicate an interrupt condition that needs servicing).
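For reference, the handling of that condition in the irq handler is roughly the following (simplified and from memory of the mainline driver, so check the actual source in your kernel tree): it masks the RSFL enable bit in INT_EN and schedules the NAPI poll, which is the same disable-then-poll pattern discussed for snull above.

if (likely(intsts & inten & INT_STS_RSFL_)) {
        if (likely(napi_schedule_prep(&pdata->napi))) {
                /* Disable further RX (RSFL) interrupts from the chip */
                u32 temp = smsc911x_reg_read(pdata, INT_EN);
                temp &= ~INT_EN_RSFL_EN_;
                smsc911x_reg_write(pdata, INT_EN, temp);
                /* Hand the remaining RX work to the NAPI poll routine */
                __napi_schedule(&pdata->napi);
        }
        serviced = IRQ_HANDLED;
}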
Also, on receiving a packet on the interface, which bit is set in which register?
My take on reading the LAN9118 datasheet is that there really is no interrupt specifically for the receipt of a frame.
Instead the host can be notified when the RX FIFO exceeds a threshold.
5.3.6 FIFO_INT—FIFO Level Interrupts
RX Status Level.
The value in this field sets the level, in number of DWORDs, at which the RX Status FIFO Level interrupt (RSFL) will be generated.
When the RX Status FIFO used space is greater than this value an RX Status FIFO Level interrupt (RSFL) will be generated.
The smsc911x driver apparently uses this threshold at its default value of zero.
Each entry in the RX Status FIFO occupies a DWORD. The default value of this threshold is 0x00 (i.e. interrupt on "first" frame). If this threshold is more than zero, then there is the possibility of "rotting packets".
Related
I'm trying to send RT data (4 bytes per sample) over lwip, sampled at 100 kHz on 10 channels.
I've understood that lwip has a timer which loops every 250 ms and that it cannot be changed.
So I'm saving the RT data to RAM at 100 kHz and, every 250 ms, sending the sampled data over TCP.
The problem is that I cannot go over 65535 bytes every 250 ms, because I get MEM_ERR.
I already increased the buffer up to 65535, but when I try to increase it further I get several errors during compilation.
So my doubt is: can lwip manage a buffer bigger than 16 bits can index (i.e. larger than 65535 bytes)?
Thanks,
Marco
Better to focus on throughput.
You neglected to show any code, describe which Xilinx board/system you're using, or which OS you're using (e.g. FreeRTOS, linux, etc.).
Your RT data is: 4 bytes * 10 channels * 100 kHz --> 4,000,000 bytes / second.
From your lwip description, you have 65535-byte packets * 4 packets/sec --> roughly 262,000 bytes / second.
This is too slow. It is also much slower than what a typical TCP / Ethernet link can process, so I think your understanding of the maximum rate is off a bit.
You probably can't increase the packet size.
But, you probably can increase the number of packets sent.
You say that the 250ms interval for lwip can't be changed. I believe that it can.
From: https://www.xilinx.com/support/documentation/application_notes/xapp1026.pdf we have the section: Creating an lwIP Application Using the RAW API:
Set up a timer to interrupt at a constant interval. Usually, the interval is around 250 ms. In the timer interrupt, update necessary flags to invoke the lwIP TCP APIs tcp_fasttmr and tcp_slowtmr from the main application loop explained previously
The "usually" seems to me to imply that it's a default and not a maximum.
But, you may not need to increase the timer rate as I don't think it dictates the packet rate, just the servicing/completion rate [in software].
A few guesses ...
Once a packet is queued to the NIC, normally, other packets may be queued asynchronously. Modern NIC hardware often has a hardware queue. That is, the NIC H/W supports multiple pending requests. It can service those at line speed without CPU intervention.
The 250ms may just be a timer interrupt that retires packet descriptors of packets completed by the NIC hardware.
That is, more than one packet can be processed/completed per interrupt. If that were not the case, then only 4 packets / second could be sent and that would be ridiculously low.
Generating an interrupt from the NIC for each packet incurs an overhead. My guess is that interrupts from the NIC are disabled. And, the NIC is run in a "polled" mode. The polling occurs in the timer ISR.
The timer interrupt will occur 4 times per second. But, will process any packets it sees that are completed. So, the ISR overhead is only 4 interrupts / second.
This increases throughput because the ISR overhead is reduced.
UPDATE:
Thanks for the reply. Indeed, it is 4 bytes * 10 channels * 100 kHz --> 4,000,000 bytes / second, and I agree that we are quite far from 100 Mbit/s.
Caveat: I don't know lwip, so most of what I'm saying is based on my experience with other network stacks (e.g. linux), but it appears that lwip should be similar.
Of course, lwip will provide a way to achieve full line speed.
Regarding changing the 250 ms timer period: to achieve what I want it would have to be lowered by more than 10 times, which seems too much and could compromise the stability of the protocol.
When you say that, did you actually try that?
And, again, you didn't post your code or describe your target system, so it's difficult to provide help.
Issues could be because of the capabilities [or lack thereof] of your target system and its NIC.
Or, because of the way your code is structured, you're not making use of the features that can make it fast.
So basically your suggestion is to enable the interrupt on each message? In this case I can send the remaining data in the ACK callback, if I understood correctly. – Marco Martines
No.
The mode for interrupt on every packet is useful for low data rates where the use of the link is sparse/sporadic.
If you have an interrupt for every packet, the overhead of entering/exiting the ISR (the ISR prolog/epilog) code will become significant [and possibly untenable] at higher data rates.
That's why the timer based callback is there. To accumulate finished request blocks and [quickly] loop on the queue/chain and complete/reclaim them periodically. If you wish to understand the concept, look at NAPI: https://en.wikipedia.org/wiki/New_API
Loosely, on most systems, when you do a send, a request block is created with all info related to the given buffer. This block is then queued to the transmit queue. If the transmitter is idle, it is started. For subsequent blocks, the block is appended to the queue. The driver [or NIC card] will, after completing a previous request, immediately start a new/fresh one from the front of the queue.
This allows you to queue multiple/many blocks quickly [and return immediately]. They will be sent in order at line speed.
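To make the "queue multiple blocks quickly" idea concrete, here is a minimal sketch using the lwip raw API; send_samples(), sample_buf and sample_len are hypothetical names, and a real application would also register a tcp_sent() callback to continue when the send buffer drains. Data is handed to lwip in chunks bounded by the free send-buffer space rather than as one huge 64 KiB write:

#include "lwip/tcp.h"

static void send_samples(struct tcp_pcb *pcb, const u8_t *sample_buf, u32_t sample_len)
{
        u32_t offset = 0;

        while (offset < sample_len) {
                u32_t remaining = sample_len - offset;
                u16_t chunk = tcp_sndbuf(pcb);          /* free space in the TCP send buffer   */

                if (chunk == 0)
                        break;                          /* resume from the tcp_sent() callback */
                if (chunk > remaining)
                        chunk = (u16_t)remaining;

                if (tcp_write(pcb, sample_buf + offset, chunk, TCP_WRITE_FLAG_COPY) != ERR_OK)
                        break;                          /* queue full: retry later             */

                offset += chunk;
        }
        tcp_output(pcb);                                /* ask lwip to start transmitting now  */
}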
What actually happens depends on system H/W, NIC controller, and OS and what lwip modes you use.
When I execute a system call such as write, the ISR corresponding to the exception executes in interrupt mode (on a Cortex-M3 the IPSR register has a non-zero value, 0xb). What I have learned is that when we execute code in interrupt mode we cannot sleep and cannot use functions that might block.
My question is: is there any mechanism by which the ISR could keep executing in interrupt mode and at the same time use functions that might block, or is there some kind of trick that is implemented?
Caveat: This is more of a comment than an answer but is too big to fit in a comment or series of comments.
TL;DR: Needing to sleep or execute a blocking operation from an ISR is a fundamental misdesign. This seems like an XY problem: https://meta.stackexchange.com/questions/66377/what-is-the-xy-problem
Doing sleep [even in application code] is generally a code smell. Why do you feel you need to sleep [vs. some event/completion driven mechanism]?
Please edit your question and add clarification [i.e. don't just add comments]
When I execute a system call such as write, or something else
What is your application doing? A write to what device? What "something else"?
What is the architecture/board, kernel/distro? (e.g. Raspberry Pi running Raspian? nvidia Jetson? Beaglebone? Xilinx FPGA with petalinux?)
What is the H/W device and where did the device driver come from? Did you write the device driver yourself or is it a standard one that comes with the kernel/distro? If you wrote it, please post it in your question.
Is the device configured properly? (e.g.) Are the DTB entries correct?
Is the device a block device, such as a disk controller? Or, is it a character device, such as a UART? Does the device transfer data via DMA? Or, does it transfer data by reading/writing to/from an IO port?
What do you mean by "exception"? Generally, exception is an abnormal condition (e.g. segfault, bus error, etc.). Please describe the exact context/scenario for which this occurs.
Generally, an ISR does little things. (e.g.) Grab [and save] status from the device. Clear/rearm the interrupt in the interrupt controller. Start the next queued transfer request. Wake up the sleeping base level task (usually the task that executed the syscall [waiting on a completion event in kernel mode]).
More elaborate actions are generally deferred and handled in the interrupt's "bottom half" handler and/or tasklet. Or, the base level is woken up and it handles the remaining processing.
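A minimal sketch of that split, with hypothetical my_dev, my_read_status(), my_ack_irq() and my_process() helpers; the key point is that the slow or blocking work runs in process context (here a workqueue item), not in the ISR:

#include <linux/interrupt.h>
#include <linux/workqueue.h>

struct my_dev {
        u32 status;
        struct work_struct work;        /* set up with INIT_WORK(&dev->work, my_work_fn) */
};

/* Hypothetical device-specific helpers. */
u32 my_read_status(struct my_dev *dev);
void my_ack_irq(struct my_dev *dev);
void my_process(struct my_dev *dev);

static irqreturn_t my_isr(int irq, void *data)
{
        struct my_dev *dev = data;

        dev->status = my_read_status(dev);  /* grab and save device status      */
        my_ack_irq(dev);                    /* clear/rearm the interrupt source */
        schedule_work(&dev->work);          /* defer the slow, sleepable work   */
        return IRQ_HANDLED;
}

static void my_work_fn(struct work_struct *w)
{
        struct my_dev *dev = container_of(w, struct my_dev, work);

        /* Process context: sleeping and blocking calls are allowed here. */
        my_process(dev);
}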
What kernel subsystems are involved? Are you using platform drivers? Are you interfacing from within the DMA device driver framework? Are message buses involved (e.g. I2C, SPI, etc.)?
Interrupt and device handling in the linux kernel is somewhat different than what one might do in a "bare metal" system or RTOS (e.g. FreeRTOS). So, if you're coming from those environments, you'll need to think about restructuring your driver code [and/or application code].
What are your requirements for device throughput and latency?
You may wish to consult a good book on linux device driver design. And, you may wish to consult the kernel's Documentation subdirectory.
If you're able to provide more information, I may be able to help you further.
UPDATE:
A system call is not really in the same class as a hardware interrupt as far as the kernel is concerned, even if the CPU hardware uses the same sort of exception vector mechanisms for handling both hardware and software interrupts. The kernel treats the system call as a transition from user mode to kernel mode. – Ian Abbott
This is a succinct/great explanation. The "mode" or "context" has little to do with how we got/get there from a H/W mechanism.
The CPU doesn't really "understand" interrupt mode [as defined by the kernel]. It understands "supervisor" vs "user" privilege level [sometimes called "mode"].
When executing at user privilege level, an interrupt/exception causes a transition from "user" level to "supervisor" level. The CPU may have a special register that specifies the address of the [initial] supervisor stack pointer. Atomically, it swaps in that value, pushing the user SP onto the new kernel stack.
If the interrupt is interrupting a CPU that is already at supervisor level, the existing [supervisor] SP will be used unchanged.
Note that x86 has privilege "ring" levels. User mode is ring 3 and the highest [most privileged] level is ring 0. For arm, some arches can have a "hypervisor" privilege level [which is higher privilege than "supervisor" privilege].
The setup of the mode/context is handled in arch/arm/kernel/entry-*.S code.
An svc is a synchronous interrupt [generated by a special CPU instruction]. The resulting context is the context of the currently executing thread. It is analogous to "call function in kernel mode". The resulting context is "kernel thread mode". At that point, it's not terribly useful to think of it as an "interrupt" anymore.
In fact, on some arches, the syscall instruction/mechanism doesn't use the interrupt vector table. It may have a fixed address or use a "call gate" mechanism (e.g. x86).
Each thread has its own stack which is different than the initial/interrupt stack.
Thus, once the entry code has established the context/mode, it is not executing in "interrupt mode". So, the full range of kernel functions is available to it.
An interrupt from a H/W device is asynchronous [may occur at any time the CPU's internal interrupt enable flag is set]. It may interrupt a userspace application [executing in application mode] OR kernel thread mode OR an existing ISR executing in interrupt mode [from another interrupt]. The resulting ISR is executing in "interrupt mode" or "ISR mode".
Because the ISR can interrupt a kernel thread, it may not do certain things. For example, if the CPU were in [kernel] thread mode, and it was in the middle of a kmalloc call [GFP_KERNEL], the ISR would see partial state and any action that tried to adjust the kernel's heap would result in corruption of the heap.
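This is also why kernel allocation calls take a context flag. A minimal sketch of the usual rule of thumb, with a hypothetical my_alloc() wrapper:

#include <linux/interrupt.h>
#include <linux/slab.h>

/* Hypothetical helper: GFP_KERNEL may sleep, so it must never be used
 * from interrupt/atomic context; fall back to GFP_ATOMIC there. */
static void *my_alloc(size_t len)
{
        gfp_t flags = in_interrupt() ? GFP_ATOMIC : GFP_KERNEL;

        return kmalloc(len, flags);
}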
This restriction is a design choice Linux makes for speed.
Kernel ISRs may be classified as "fast interrupts". The ISR executes with the CPU interrupt enable [IE] flag cleared. No normal H/W interrupt may interrupt the ISR.
So, if another H/W device asserts its interrupt request line [in the external interrupt controller], that request will be "pending". That is, the request line has been asserted but the CPU has not acknowledged it [and the CPU has not jumped via the interrupt table].
The request will remain pending until the CPU allows further interrupts by asserting IE. Or, the CPU may clear the pending interrupt without taking action by clearing the pending interrupt in the interrupt controller.
For a "slow" interrupt ISR, the interrupt entry code will clear the interrupt in the external interrupt controller. It will then rearm interrupts by setting IE and call the ISR. This ISR can be interrupted by other [higher priority] interrupts. The result is a "stacked" interrupt.
I have been searching all over the place, and the conclusion I have come to is that interrupts have a higher priority than some exceptions in the Linux kernel.
An exception [synchronous interrupt] can be interrupted if the IE flag is enabled. An svc is treated differently but after the entry code is executed, the IE flag is set, so the actual syscall code [executing in kernel thread mode] can be interrupted by a H/W interrupt.
Or, in limited circumstances, the kernel code can generate an exception (e.g. a page fault caused by a kernel action [which is usually deemed fatal]).
but I am still looking at how exactly the context switch happens when executing an exception and letting the processor execute in thread mode while the SVCall exception is pending (was preempted and has not returned yet)... I think when I understand that, it will be clearer to me.
I think you have to be very careful with the terminology. In particular, when combining terms from disparate sources. Although user mode, kernel thread mode, or interrupt mode can be considered a context [in the dictionary sense of the word], context switching usually means that the current thread is suspended, the scheduler selects a new thread to run and resumes it. That is separate from the user-to-kernel transition.
And if there is any recommended resources about that for ARM-Cortex-M3/4, it would be nice
Here is something: https://interrupt.memfault.com/blog/arm-cortex-m-exceptions-and-nvic But, be very careful in applying the terminology therein. What it considers "pending" only exists in the kernel during the entry code. What is more relevant is what the kernel does to set up mode/context and the terms are not equivalent.
So, from the kernel's standpoint, it's probably better to not consider an svc as "pending".
I'm using an ARM Cortex-M4 MCU. If I have an interrupt handler for a GPIO at priority 2 and an SPI driver at priority 3 (i.e., lower priority than the GPIO's), and I call a (blocking) SPI read from within the GPIO's interrupt handler, will the SPI function work?
The answer to your question depends on how the SPI driver blocks while handling the transfer, as @Notlikethat said.
If your SPI driver is a polling driver, then it will most likely work. In this case, your GPIO interrupt would spin on flags within the SPI peripheral, waiting for each part of the transfer to complete.
If your SPI driver is interrupt driven, then it will not work. Since you are executing a priority 2 interrupt (GPIO), the priority 3 interrupt (SPI) will not execute until the GPIO interrupt finishes. Depending on how your SPI driver is written, this may entirely hang your system, or it may result in a timeout.
If your SPI driver is DMA driven, then the answer is not so clear and depends on how the driver works. It is possible in this case, that your transaction would complete, but if the function has blocked waiting for a DMA interrupt, it may never arrive depending on its priority.
In any of the above cases, it would generally be considered a bad idea to do something like that inside an interrupt. If you have an RTOS, you could use a high-priority task that waits on a semaphore to execute the SPI transaction, or, if the OS supports it, use deferred interrupt processing.
If you aren't running an RTOS, consider whether you can signal a lower-priority interrupt (e.g. use PendSV at the lowest priority) or monitor a flag from within the main process. Using a lower-priority interrupt, you can still preempt the main process (if that's what is needed), but all your other interrupts can continue executing. If you can monitor a flag in your main process, that would also allow your interrupts to continue, but if you are time constrained this may not be possible (again, depending on how your application is structured).
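Without an RTOS, a minimal sketch of the "signal a lower-priority interrupt" idea could look like the following, using the standard CMSIS PendSV mechanism. GPIO_IRQHandler, spi_read_blocking() and the device header are assumptions for illustration, and PendSV must be configured as the lowest priority beforehand (e.g. NVIC_SetPriority(PendSV_IRQn, 0xFF)):

#include "stm32f4xx.h"               /* or whichever CMSIS device header your MCU uses */

volatile uint32_t gpio_event_pending;

void GPIO_IRQHandler(void)           /* hypothetical vector name; priority 2, keep it short */
{
        /* ...clear the GPIO interrupt flag here... */
        gpio_event_pending = 1;
        SCB->ICSR = SCB_ICSR_PENDSVSET_Msk;     /* pend PendSV (lowest priority) */
}

void PendSV_Handler(void)            /* runs only when no higher-priority IRQ is active */
{
        if (gpio_event_pending) {
                gpio_event_pending = 0;
                spi_read_blocking();            /* hypothetical; the SPI IRQ (priority 3) can preempt this */
        }
}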
I am currently programming an ATmega32U4. I have implemented serial communication using a built-in interrupt that executes every time a byte is received on the Rx pin. The byte on the Rx pin is placed in a one-byte buffer, which is overwritten when another byte is received on the Rx pin. This is a built-in Atmel library.
ISR(USART1_RX_vect, ISR_BLOCK)
{
        RingBuffer_Insert(&usart_rx_buffer, UDR1);
}
My code executes an interrupt when a byte is received on the Rx pin. When a byte is received, it is entered into my ring buffer usart_rx_buffer, where it is later decoded.
If another interrupt is executing and this causes the one-byte buffer to be overwritten before the UART interrupt can run, that byte is lost.
The result of this is that other interrupts cannot take longer to execute than one byte time at the current baud rate, otherwise serial bytes are lost. Is there any way to avoid this problem?
One way to solve this problem would be to use the attribute ISR_NOBLOCK in all interrupts that take longer than the baud rate, causing the interrupt enable flag to be activated by the compiler as early as possible within the ISR and allowing the USART1_RX_vect to be executed inside other interrupts. However, "care should be taken to avoid stack overflows, or to avoid infinitely entering the ISR for those cases where the AVR hardware does not clear the respective interrupt flag before entering the ISR".
I've experienced this same problem and so far this was the best solution I could think of. I didn't use it nor tested it, though.
Edit: keep in mind that all other interrupts could also be executed inside interrupts declared with the attribute ISR_NOBLOCK, not just the interrupt you want. So you would basically allow all interrupts to be nested inside all interrupts, except USART1_RX_vect (and those declared with ISR_BLOCK). This is the main problem with this solution (besides the stack overflow problem).
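For illustration, a slow ISR declared this way might look like the following sketch (TIMER1_COMPA_vect and do_slow_processing() are just placeholder examples):

#include <avr/interrupt.h>

/* Global interrupts are re-enabled as early as possible inside this ISR,
 * so USART1_RX_vect (and any other interrupt) can nest inside it. */
ISR(TIMER1_COMPA_vect, ISR_NOBLOCK)
{
        do_slow_processing();       /* placeholder for the long-running work */
}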
The result of this is that other interrupts cannot take longer to execute than one byte time at the current baud rate, otherwise serial bytes are lost. Is there any way to avoid this problem?
All your observations are correct. While allowing nested interrupts as suggested in Nuno's answer could work, it is normally something you would/should want to avoid. Allowing nested interrupts everywhere makes code pretty unpredictable.
I would first try to optimize the execution time of the interrupts that are blocking your UART receive ISR. Take a look at the interrupt priorities. If several interrupts are pending, they will be executed according to this priority. This can result in "starvation" of lower level interrupts, if there is "always" a higher level interrupt pending.
What is your baud rate? Even at 115200 bit/s you can execute about 700 instructions (assuming 8 MHz) per byte received. ISRs should be as short as possible. If there is one single ISR that is taking long and you can't optimize it for whatever reason, you could consider allowing nested interrupts in just this single ISR (this is only feasible if its execution is not critical).
If you use a high baud rate, consider reducing it. 9600 baud is often enough, but may require asynchronous sending to prevent blocking code.
I am working on an LPC2468 and using UART0 of the controller for communication with a SIM300 GPRS module. Sometimes, if I send a command to read the signal strength of the SIM, the input I receive is not correct. After looking into the problem I found that sometimes, while the UART is receiving information, the timer interrupt fires and the software goes into the timer block; during that time some bytes sent by the module get missed. To prevent this I want to configure UART0 as the FIQ, i.e. the interrupt with the highest priority. Can I configure UART0 as FIQ? If yes, how?
From the LPC2468 datasheet:
The ARM processor core has two interrupt inputs called Interrupt ReQuest (IRQ) and Fast Interrupt ReQuest (FIQ). The VIC takes 32 interrupt request inputs which can be programmed as FIQ or vectored IRQ types. The programmable assignment scheme means that priorities of interrupts from the various peripherals can be dynamically assigned and adjusted.
So you need to find out where the programmable registers of the interrupt controller are and change the interrupt type of the UART to FIQ.
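As a sketch of what that could look like on the LPC24xx VIC (the channel number is an assumption based on other LPC2000 parts, so verify it and the register names against the LPC2468 user manual; you also still need to provide an FIQ handler):

#define UART0_VIC_CHANNEL   6                        /* verify against the LPC2468 user manual */

void uart0_as_fiq(void)
{
        VICIntSelect |= (1u << UART0_VIC_CHANNEL);   /* 1 = FIQ, 0 = vectored IRQ */
        VICIntEnable  = (1u << UART0_VIC_CHANNEL);   /* enable the UART0 request  */
}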
If you have simulation support, then see this to know how to change interrupt types and priorities.