STM32 USB CDC Operation

I have created a project using STM32CubeMX which includes a USB device driver configured as a virtual COM port. I have it working and can receive data via the CDC_Receive_FS callback. My question is: how is this callback called? Is it done at interrupt level, or is there some other mechanism? In particular, if I want to copy the data from the callback buffer into a queue which will be read by my main code, do I need some protection for concurrency (e.g. disabling interrupts)?
Thanks.

It is called from an ISR (Interrupt Service Routine). Most likely it is called from OTG_HS_IRQHandler, with several levels of function calls in between.
Here is a copy of my call stack at a breakpoint inside the callback:
CDC_Receive_HS() at usbd_cdc_if.c:456 0x801c758
USBD_CDC_DataOut() at usbd_cdc.c:699 0x8031592
USBD_LL_DataOutStage() at usbd_core.c:331 0x80318aa
HAL_PCD_DataOutStageCallback() at usbd_conf.c:249 0x801e486
HAL_PCD_IRQHandler() at stm32f7xx_hal_pcd.c:359 0x802d264
OTG_HS_IRQHandler() at stm32f7xx_it.c:288 0x801ab74
You most likely do NOT need to disable other interrupts just to copy this data to another buffer. I believe the buffer the callback receives is only touched by the USB receive path, so copy the data out of it into a separate buffer. That new buffer will need concurrency protection when it is accessed outside of this interrupt.
If you are using FreeRTOS, I recommend using its queue type ("xQueue") as the buffer, since it is thread safe. Use xQueueSendToBackFromISR inside interrupts and xQueueSendToBack outside of interrupts, as in the sketch below.
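A minimal sketch of that approach, assuming a hypothetical byte queue named rxQueue and the CDC_Receive_FS shape generated by CubeMX (the SetRxBuffer/ReceivePacket calls are the standard generated code that re-arms the endpoint):

#include "FreeRTOS.h"
#include "queue.h"
#include "usbd_cdc_if.h"

extern USBD_HandleTypeDef hUsbDeviceFS; /* provided by the CubeMX USB code */
QueueHandle_t rxQueue;                  /* e.g. xQueueCreate(256, sizeof(uint8_t)) in main() */

static int8_t CDC_Receive_FS(uint8_t *Buf, uint32_t *Len)
{
    BaseType_t woken = pdFALSE;

    /* This runs in interrupt context, so only the ...FromISR variant is legal. */
    for (uint32_t i = 0; i < *Len; i++) {
        /* Silently drops bytes when the queue is full; a real driver should
         * count or otherwise handle overflow. */
        xQueueSendToBackFromISR(rxQueue, &Buf[i], &woken);
    }

    /* Re-arm the OUT endpoint for the next packet. */
    USBD_CDC_SetRxBuffer(&hUsbDeviceFS, &Buf[0]);
    USBD_CDC_ReceivePacket(&hUsbDeviceFS);

    portYIELD_FROM_ISR(woken);          /* switch now if a task was unblocked */
    return USBD_OK;
}

Note that the ...FromISR API is only safe if the USB interrupt priority is within the range allowed by configMAX_SYSCALL_INTERRUPT_PRIORITY.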

Related

tx_semaphore inside an Interrupt

I want to read out RS232 data periodically, and I have created an interrupt for this purpose. However, my RS232 functions need semaphores, and I found out that I cannot execute a TX (ThreadX) function in the interrupt. What do I have to do to make my TX function work inside the interrupt?
If your RTOS provides a way to do it, then use that. If not, then here are some other options (a sketch of the first one follows this list):
- Disable the specific interrupt from the background program during variable access.
- In case interrupts aren't interruptible on your MCU, you could implement a "poor man's mutex" as described here: https://electronics.stackexchange.com/questions/409545/using-volatile-in-embedded-c-development/409570#409570
- Use inline assembler and ensure reads/writes are done in a single instruction.
- There's also a very bad idea/last resort, and that is to toggle the global interrupt mask.
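A minimal sketch of the first option on a Cortex-M part with CMSIS; the IRQ number and the shared variable are hypothetical:

#include "stm32f7xx.h"            /* assumed device header; pulls in CMSIS */

volatile uint8_t rx_byte;         /* written only by the UART ISR */

/* Background-program accessor: blocks only the one ISR that writes
 * rx_byte, leaving every other interrupt running. */
uint8_t read_rx_byte(void)
{
    NVIC_DisableIRQ(USART3_IRQn); /* hypothetical IRQ number */
    uint8_t b = rx_byte;
    NVIC_EnableIRQ(USART3_IRQn);
    return b;
}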
First, make sure you are calling _tx_thread_context_save and _tx_thread_context_restore at the beginning and end of your ISR, respectively. See here for more information: https://learn.microsoft.com/en-us/azure/rtos/threadx/chapter3#isr-template
Second, you cannot create a semaphore in an interrupt, so make sure you create it elsewhere.
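A minimal ISR sketch following the ThreadX template linked above; the handler name, timer source, and semaphore are assumptions:

#include "tx_api.h"

/* Declared in the ThreadX port files; bracket every ISR with them. */
VOID _tx_thread_context_save(VOID);
VOID _tx_thread_context_restore(VOID);

extern TX_SEMAPHORE rs232_semaphore;  /* created in tx_application_define(), not here */

void TIM2_IRQHandler(void)            /* hypothetical periodic timer interrupt */
{
    _tx_thread_context_save();        /* tell ThreadX an ISR is running */

    /* Only ISR-safe calls are allowed here: tx_semaphore_put() is fine,
     * but a waiting tx_semaphore_get() or tx_semaphore_create() is not. */
    tx_semaphore_put(&rs232_semaphore);

    _tx_thread_context_restore();     /* may perform a preemptive thread switch */
}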

Avoiding Race Condition with event queue in event driven embedded system

I am trying to program an STM32 using an event-driven architecture. For example, I am going to toggle a pin when a timer interrupt occurs, transfer some data to external flash when the ADC DMA buffer-full interrupt occurs, and so on.
There will be multiple interrupt sources, each with the same priority, which disables nesting.
I will use the interrupts to set a flag to signal my main loop that an interrupt occurred, and process the data inside main. There will be no processing/instructions inside the ISRs.
What bothers me is that accessing a variable (the flags in this case) from both main and the ISRs may cause a race-condition bug in the long run.
So I want to use a circular event queue instead of flags.
Only the ISRs will be able to write to the event queue buffer and increment the "head" index.
Only main will be able to read the event queue (and execute instructions according to the event) and increment the "tail" index.
Since ISR nesting is disabled, each ISR will access a different element of the event queue array, and main will only react when there is a new event in the queue. So race conditions are avoided, right? Or am I missing something?
Please correct me if I am doing something wrong.
Thank you.
If the interrupt only sets a variable and nothing gets done until main context is ready to do it then there is really no reason to have an interrupt at all.
For example: if you get a DMA-complete hardware interrupt and set a variable, then all you have achieved is to copy one bit of information from a hardware register to a variable. You could have much simpler code with identical performance, and less potential for error, by not enabling the interrupt at all and polling the hardware flag directly instead of polling a variable.
Only enable the interrupt if you are actually going to do something in interrupt context that cannot wait, for example: reading a UART received data register so that the next character received doesn't overflow the buffer.
If, after the interrupt has done the thing that cannot wait, it then needs to communicate something to main context, then you need to have shared data. This means you need some way of preventing race conditions. The simplest way is atomic access with only one side writing to each data item; a sketch of that follows. If that is not sufficient, then the old-fashioned way is to turn off interrupts while main context is accessing the shared data. There are more complicated ways using the LDREX/STREX instructions, but you should only explore those once you are sure that the simple way isn't good enough for your application.
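A minimal single-producer/single-consumer sketch of the event queue described in the question; names and sizes are illustrative. The ISRs only ever write head and main only ever writes tail, both of which are single-word (and therefore atomic) on a Cortex-M, and with nesting disabled the ISRs collectively act as a single producer:

#include <stdint.h>

#define QUEUE_LEN 16u                    /* power of two keeps the wrap cheap */

typedef struct { uint8_t type; uint32_t data; } event_t;

static event_t queue[QUEUE_LEN];
static volatile uint32_t head;           /* written only by ISRs */
static volatile uint32_t tail;           /* written only by main */

/* Called from interrupt context. Returns 0 if the queue is full. */
int event_push(event_t e)
{
    uint32_t next = (head + 1u) % QUEUE_LEN;
    if (next == tail)
        return 0;                        /* full: drop or count the event */
    queue[head] = e;
    head = next;                         /* publish only after the slot is written */
    return 1;
}

/* Called from main context. Returns 0 if the queue is empty. */
int event_pop(event_t *e)
{
    if (tail == head)
        return 0;                        /* empty */
    *e = queue[tail];
    tail = (tail + 1u) % QUEUE_LEN;
    return 1;
}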

How to exploit interrupts for data transfer over SPI peripheral

I have been implementing a device driver for the SPI peripheral of an MCU in C. I would like to use the interrupt mechanism for reception and also for transmission.
For the reception part, I think I can implement this by exposing a SpiRegisterCallback function in the SPI driver interface. This function enables the client to register a function which will be invoked as soon as a data byte is received (i.e. when the reception-buffer-full interrupt fires).
For the transmission part, I would like to use some SpiTransmit function which receives a pointer to the data bytes to be transmitted and the number of bytes to transmit. As far as implementation goes, I am going to define some internal callback function in the SPI driver. This internal callback will be registered for the transmission-buffer-empty interrupt, and in it the passed data bytes will be placed one by one into the transmission buffer. I am not sure whether this approach is appropriate. Can anybody give me advice on how to implement an SPI peripheral driver which uses interrupts for data transmission? Thanks in advance for any suggestions.
SPI is often very real-time critical, and introducing a callback through function pointers means needless overhead code. The actual copying of data from SPI to RAM must be done internally by your driver; that is all the ISR should be doing. Some general guidance can be found here.
So your ISR should fill up a buffer, then swap buffer pointers (no slow memcpy!) in a protected way, so that the caller always has one buffer with valid data and the ISR always has one working buffer to fill. Let the caller poll a flag rather than invoking a callback from inside an ISR. I like to use triple buffering if I can spare the RAM: one buffer for the ISR, one buffer for the caller, and one spare that the ISR can swap with without disrupting the caller. A sketch of the double-buffered variant follows.
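A minimal double-buffered sketch of that swap, assuming a hypothetical SPI_RX_DATA register macro and fixed-size frames; the third spare buffer described above would remove the remaining risk of the ISR swapping again while the caller is still reading:

#include <stdint.h>
#include <stddef.h>

#define FRAME_LEN 64u

static uint8_t buf_a[FRAME_LEN], buf_b[FRAME_LEN];
static uint8_t *isr_buf = buf_a;       /* currently being filled by the ISR */
static uint8_t *ready_buf = buf_b;     /* last completed frame, for the caller */
static volatile size_t ready_len;      /* the polled flag: 0 means nothing new */
static size_t isr_pos;

/* RX-buffer-full ISR: store one byte, swap pointers on a complete frame. */
void spi_rx_isr(void)
{
    isr_buf[isr_pos++] = SPI_RX_DATA;  /* hypothetical data-register macro */
    if (isr_pos == FRAME_LEN) {
        uint8_t *tmp = ready_buf;      /* pointer swap instead of memcpy */
        ready_buf = isr_buf;
        isr_buf = tmp;
        isr_pos = 0;
        ready_len = FRAME_LEN;         /* single-word write publishes the frame */
    }
}

/* Caller polls this; returns the frame length, or 0 if nothing new. */
size_t spi_poll(const uint8_t **data)
{
    size_t n = ready_len;
    if (n != 0) {
        *data = ready_buf;
        ready_len = 0;
    }
    return n;
}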
This is all rather intricate to code, and most programmers get it wrong. DMA is superior to interrupts here, so you should really be considering DMA instead. It is also something to consider when picking an MCU.
A request for "any suggestions" does not really make this a great question because multiple answers may be acceptable, and few will be comprehensive. It invites comments rather then answers. However I will indulge:
First, this is not by any definition an exploit. To "exploit" implies making use of something for a purpose it was not intended - that is not the correct term in this case, you are not "exploiting" the interrupt mechanism, you are simply using it.
At high clock rates, in some cases the interrupt latency and context-switch time involved in processing the interrupts may be less efficient than a simple busy-wait. If the transfers are more than two or three bytes at a time, you should in any case consider using DMA if available, so the interrupt will be the DMA interrupt for a complete transfer rather than for a single character. For applications such as SD card or EEPROM interfacing, DMA will have a significant performance impact and free up the CPU to do other useful work concurrently. A driver that uses a busy-wait for single byte/word transfers and DMA for block transfers may be optimal. This is particularly true if you are using an RTOS and the ISR triggers a task context to process the data: the context-switch overhead may be nearly as much as, or more than, a busy-wait for a single byte. If your SPI clock is > 1 MHz, for example, you will wait 8 µs for a byte transfer, and your ISR and callbacks could easily take longer than that, in which case the interrupt is not worthwhile.
So my advice here is to only consider interrupts for SPI if you are using a slow clock and can get other useful work done whilst waiting for the interrupt.
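For scale, a minimal busy-wait byte transfer might look like the sketch below (register and flag names come from the STM32 CMSIS headers; the byte-wide DR access matches the F7's FIFO-based SPI and is an assumption for other parts):

#include "stm32f7xx.h"   /* assumed device header */

/* Polled single-byte transfer: at >1 MHz SPI clock this completes in a
 * few microseconds, often faster than ISR entry/exit plus a callback. */
uint8_t spi_transfer_byte(SPI_TypeDef *spi, uint8_t out)
{
    while (!(spi->SR & SPI_SR_TXE)) { }          /* wait: TX buffer empty */
    *(volatile uint8_t *)&spi->DR = out;         /* byte access to the FIFO */
    while (!(spi->SR & SPI_SR_RXNE)) { }         /* wait: byte received */
    return *(volatile uint8_t *)&spi->DR;
}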
A problem with allowing callbacks in interrupts is that it allows the callback provider to do things that are ill-advised or illegal in an interrupt context, and you lose the ability to control the processing time of the interrupt. It is fine perhaps if the callback is intended for use by someone writing a device driver, since they should be aware of what they are doing; but here this code is the device driver.

Where, in the e1000 linux code, can I zeroize rx/tx network packets?

I need to know where I can zeroize the received/transmitted network packets in the e1000 Linux driver. I need this to pass a compliance requirement, but I am not able to find the place in the e1000 code where the zeroization of the network packet buffer should happen (or if it already does the zeroization somewhere, that would be great).
I saw that it does ring zeroization when the interface goes up or down, in the file Intel_LAN_15.0.0_Linux_Source_A00/Source/base_driver/e1000e-2.4.14/src/netdev.c, in the e1000_clean_rx_ring() and e1000_clean_tx_ring() functions:
/* Zero out the descriptor ring */
memset(rx_ring->desc, 0, rx_ring->size);
But I am not able to find where it should be done for each packet that the system receives/sends.
So, does anybody know the place in the code where the buffer zeroization for the tx/rx packets should happen? I bet that it will introduce some overhead, but I have to do it anyway.
We're using the Intel EF multi-port network card (https://www-ssl.intel.com/content/www/us/en/network-adapters/gigabit-network-adapters/gigabit-et-et2-ef-multi-port-server-adapters-brief.html?) with kernel 3.4.107 (linux-image-3.4.107-0304107-generic_3.4.107-0304107.201504210712_amd64.deb).
EDIT: #skgrrwasme correctly pointed out that the e1000_clean_tx_ring and e1000_clean_rx_ring functions seem to do the zeroization work, but as it is done only when the hardware goes down, it is not valid for our compliance need.
So, it seems that the functions doing the work for each packet are e1000_clean_rx_irq and e1000_clean_tx_irq, but those functions don't zeroize data; they only free memory, without a memset() with 0 to overwrite it (and that is what is required). Since it is enough to zeroize data on rx or tx, what I think could be done is inside e1000_clean_tx_irq(), which calls e1000_unmap_and_free_tx_resource(); but in fact that only frees the buffer, it does not zeroize it:
if (buffer_info->skb) {
        dev_kfree_skb_any(buffer_info->skb);
        buffer_info->skb = NULL;
}
So what I think is that we could write the memset inside dev_kfree_skb_any(). That function calls one of two functions:
void dev_kfree_skb_any(struct sk_buff *skb)
{
        if (in_irq() || irqs_disabled())
                dev_kfree_skb_irq(skb);
        else
                dev_kfree_skb(skb);
}
So, something easy would be a call to skb_recycle_check(skb), which does a:
memset(skb, 0, offsetof(struct sk_buff, tail));
Does this make sense? I think that with this the memory will be overwritten with zeroes and the work will be done, but I'm not sure...
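For comparison, here is a hedged sketch (an assumption, not the stock driver code) of wiping the packet payload itself inside e1000_unmap_and_free_tx_resource(). Note that the memset above clears the sk_buff control structure up to its tail field, whereas the packet bytes live in the buffer at skb->data, and skb_headlen() covers only the linear part (paged fragments would need separate handling):

if (buffer_info->skb) {
        /* hypothetical compliance wipe of the linear payload */
        memset(buffer_info->skb->data, 0, skb_headlen(buffer_info->skb));
        dev_kfree_skb_any(buffer_info->skb);
        buffer_info->skb = NULL;
}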
TL;DR
As far as I can tell, the transmit and receive buffers are both already cleaned by the driver. I don't think you need to do anything.
Longer Answer
I don't think you have to worry about it. The transmit and receive buffer cleaning functions, e1000_clean_tx_irq and e1000_clean_rx_irq, seem to be called in any interrupt configuration, and for both transmit and receive. Interrupts can be triggered with any of the following interrupt signaling methods: legacy, MSI, or MSI-X. Ring buffer cleaning happens in any interrupt mode, but the modes call the cleaning functions in different locations.
Since you have two types of transfers (transmit and receive) and three different types of interrupt invocations (legacy, MSI, and MSI-X), you have a total of six scenarios where you need to make sure things are cleaned. Fortunately, five of the six situations handle the packets by scheduling a job for NAPI: transmit and receive for legacy and MSI interrupts, and receive for MSI-X. Part of NAPI handling those packets is calling the e1000_clean function as a callback. If you look at the code, you'll see that it calls the buffer cleaning functions for both TX and RX.
The outlier is the MSI-X TX handler. However, it seems to directly call the TX buffer cleaning function, rather than having NAPI handle it.
Here are the relevant interrupt handlers that weren't specifically listed above:
- Legacy (both RX and TX)
- MSI (both RX and TX)
- MSI-X RX
Notes
All of my function references will open a file in the e1000e driver called netdev.c. They will open a window in the Linux Cross Reference database.
This post discusses the e1000e driver, but some of the function names are "e1000...". I think a lot of the e1000 code was reused in the newer e1000e driver, so some of the names carried over. Just know that it isn't a typo.
The e1000_clean_tx_ring and e1000_clean_rx_ring functions that you referred to appear to only be called when the driver is trying to free resources or the hardware is down, never during actual packet handling. The two I referenced above do seem to be, though. I'm not sure exactly what the difference between them is, but they appear to get the job done.

protocol handler using dev_add_pack consumes cpu

I wrote a kernel module and used dev_add_pack to get all the incoming packets.
According to the given filter rules, if a packet matches, I forward it to user space.
When I load this kernel module and send UDP traffic using SIPp, the ksoftirqd process appears and starts to consume CPU (I am checking this with the top command).
Is there any way to reduce the CPU usage?
I guess you use the ETH_P_ALL type to register your packet_type structure with the protocol stack. I think your packet_type->func is the bottleneck: either it consumes a lot of CPU itself, or it breaks the existing protocol-stack model and triggers other existing packet_type functions that consume CPU. So the only way to save CPU is to optimize your packet_type->func. If your function is too complicated, you should consider splitting it into several parts: use the simple part as the packet_type->func which runs in ksoftirqd context, and move the complicated parts to another kernel thread context (you can create a new thread in your kernel module if needed).
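For reference, a minimal sketch of the kind of registration being discussed; the handler body and names are illustrative, and everything inside my_pkt_func runs in softirq (ksoftirqd) context, so it must stay cheap:

#include <linux/netdevice.h>
#include <linux/skbuff.h>
#include <linux/if_ether.h>

static int my_pkt_func(struct sk_buff *skb, struct net_device *dev,
                       struct packet_type *pt, struct net_device *orig_dev)
{
        /* Keep only cheap filtering here; defer heavy work (e.g. the
         * copy to user space) to a kernel thread or workqueue. */
        kfree_skb(skb);          /* we own this reference and must drop it */
        return 0;
}

static struct packet_type my_pt = {
        .type = htons(ETH_P_ALL),   /* tap: sees every incoming packet */
        .func = my_pkt_func,
};

/* module init: dev_add_pack(&my_pt);   module exit: dev_remove_pack(&my_pt); */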
