Linux UART imx8 how to quickly detect frame end? - c

I have an imx8 module running Linux on my PCB and i would like some tips or pointers on how to modify the UART driver to allow me to be able to detect the end of frame very quickly (less than 2ms) from my user space C application. The UART frame does not have any specific ending character or frame length. The standard VTIME of 100ms is much too long
I am reading from a Sim card, i have no control over the data, no control over the size or content of the data. I just need to detect the end of frame very quickly. The frame could be 3 bytes or 500. The SIM card reacts to data that it receives, typically I send it a couple of bytes and then it will respond a couple of ms later with an uninterrupted string of bytes of unknown length. I am using an iMX8MP
I thought about using the IDLE interrupt to detect the frame end. Turn it on when any byte is received and off once the idle interrupt fires. How can I propagate this signal back to user space? Or is there an existing method to do this?

Waiting for an "idle" is a poor way to do this.
Use termios to set raw mode with VTIME of 0 and VMIN of 1. This will allow the userspace app to get control as soon as a single byte arrives. See:
How to read serial with interrupt serial?
How do I use termios.h to configure a serial port to pass raw bytes?
How to open a tty device in noncanonical mode on Linux using .NET Core
But, you need a "protocol" of sorts, so you can know how much to read to get a complete packet. You prefix all data with a struct that has (e.g.) A type and a payload length. Then, you send "payload length" bytes. The receiver gets/reads that fixed length struct and then reads the payload which is "payload length" bytes long. This struct is always sent (in both directions).
See my answer: thread function doesn't terminate until Enter is pressed for a working example.
What you have/need is similar to doing socket programming using a stream socket except that the lower level is the UART rather than an actual socket.
My example code uses sockets, but if you change the low level to open your uart in raw mode (as above), it will be very similar.
UPDATE:
How quickly after the frame finished would i have the data at the application level? When I try to read my random length frames currently reading in 512 byte chunks, it will sometimes read all the frame in one go, other times it reads the frame broken up into chunks. –
Engo
In my link, in the last code block, there is an xrecv function. It shows how to read partial data that comes in chunks.
That is what you'll need to do.
Things missing from your post:
You didn't post which imx8 board/configuration you have. And, which SIM card you have (the protocols are card specific).
And, you didn't post your other code [or any code] that drives the device and illustrates the problem.
How much time must pass without receiving a byte before the [uart] device is "idle"? That is, (e.g.) the device sends 100 bytes and is then finished. How many byte times does one wait before considering the device to be "idle"?
What speed is the UART running at?
A thorough description of the device, its capabilities, and how you intend to use it.
A uart device doesn't have an "idle" interrupt. From some imx8 docs, the DMA device may have an "idle" interrupt and the uart can be driven by the DMA controller.
But, I looked at some of the linux kernel imx8 device drivers, and, AFAICT, the idle interrupt isn't supported.
I need to read everything in one go and get this data within a few hundred microseconds.
Based on the scheduling granularity, it may not be possible to guarantee that a process runs in a given amount of time.
It is possible to help this a bit. You can change the process to use the R/T scheduler (e.g. SCHED_FIFO). Also, you can use sched_setaffinity to lock the process to a given CPU core. There is a corresponding call to lock IRQ interrupts to a given CPU core.
I assume that the SIM card acts like a [passive] device (like a disk). That is, you send it a command, and it sends back a response or does a transfer.
Based on what command you give it, you should know how many bytes it will send back. Or, it should tell you how many optional bytes it will send (similar to the struct in my link).
The method you've described (e.g.) wait for idle, then "race" to get/process the data [for which you don't know the length] is fraught with problems.
Even if you could get it to work, it will be unreliable. At some point, system activity will be just high enough to delay wakeup of your process and you'll miss the window.
If you're reading data, why must you process the data within a fixed period of time (e.g. 100 us)? What happens if you don't? Does the device catch fire?
Without more specific information, there are probably other ways to do this.
I've programmed such systems before that relied on data races. They were unreliable. Either missing data. Or, for some motor control applications, device lockup. The remedy was to redesign things so that there was some positive/definitive way to communicate that was tolerant of delays.
Otherwise, I think you've "fallen in love" with "idle interrupt" idea, making this an XY problem: https://meta.stackexchange.com/questions/66377/what-is-the-xy-problem

Related

lwip to send data bigger than 64kb

i'm trying to send over lwip a RT data (4 bytes) sampled at 100kHz for 10 channels.
I've understood that lwip has a timer which loops every 250ms and it cannot be changed.
In this case i'm saving the RT over RAM at 100kHz and every 250ms sending the sampled data over TCP.
The problem is that i cannot go over 65535 bytes every 250ms because i get the MEM_ERR.
i already increased the buffer up to 65535 but when i try to increase it more i get several error during compiling.
So my doubt is: can lwip manage buffer bigger than 16bits?
Thanks,
Marco
Better to focus on throughput.
You neglected to show any code, describe which Xilinx board/system you're using, or which OS you're using (e.g. FreeRTOS, linux, etc.).
Your RT data is: 4 bytes * 10 channels * 100kHz --> 400,000 bytes / second.
From your lwip description, you have 65536 byte packets * 4 packets/sec --> 256,000 bytes / second.
This is too slow. And, it much slower than what a typical TCP / Ethernet link can process, so I think your understanding of the maximum rate is off a bit.
You probably can't increase the packet size.
But, you probably can increase the number of packets sent.
You say that the 250ms interval for lwip can't be changed. I believe that it can.
From: https://www.xilinx.com/support/documentation/application_notes/xapp1026.pdf we have the section: Creating an lwIP Application Using the RAW API:
Set up a timer to interrupt at a constant interval. Usually, the interval is around 250 ms. In the timer interrupt, update necessary flags to invoke the lwIP TCP APIs tcp_fasttmr and tcp_slowtmr from the main application loop explained previously
The "usually" seems to me to imply that it's a default and not a maximum.
But, you may not need to increase the timer rate as I don't think it dictates the packet rate, just the servicing/completion rate [in software].
A few guesses ...
Once a packet is queued to the NIC, normally, other packets may be queued asynchronously. Modern NIC hardware often has a hardware queue. That is, the NIC H/W supports multiple pending requests. It can service those at line speed without CPU intervention.
The 250ms may just be a timer interrupt that retires packet descriptors of packets completed by the NIC hardware.
That is, more than one packet can be processed/completed per interrupt. If that were not the case, then only 4 packets / second could be sent and that would be ridiculously low.
Generating an interrupt from the NIC for each packet incurs an overhead. My guess is that interrupts from the NIC are disabled. And, the NIC is run in a "polled" mode. The polling occurs in the timer ISR.
The timer interrupt will occur 4 times per second. But, will process any packets it sees that are completed. So, the ISR overhead is only 4 interrupts / second.
This increases throughput because the ISR overhead is reduced.
UPDATE:
Thanks for the reply, indeed is 4 bytes * 10 channels * 100kHz --> 4,000,000 bytes / second but I agree that we are quite far from the 100Mbit/s.
Caveat: I don't know lwip, so most of what I'm saying is based on my experience with other network stacks (e.g. linux), but it appears that lwip should be similar.
Of course, lwip will provide a way to achieve full line speed.
Regarding the changing of the 250ms timer period, to achieve what i want it should be lowered more than 10times which seems it is too much and it can compromise the stability of the protocol.
When you say that, did you actually try that?
And, again, you didn't post your code or describe your target system, so it's difficult to provide help.
Issues could be because of the capabilities [or lack thereof] of your target system and its NIC.
Or, because of the way your code is structured, you're not making use of the features that can make it fast.
So basically your suggestion is to enable the interrupt on each message? In this case i can send the remaining data in the ACK callback if I understood correctly. – Marco Martines
No.
The mode for interrupt on every packet is useful for low data rates where the use of the link is sparse/sporadic.
If you have an interrupt for every packet, the overhead of entering/exiting the ISR (the ISR prolog/epilog) code will become significant [and possibly untenable] at higher data rates.
That's why the timer based callback is there. To accumulate finished request blocks and [quickly] loop on the queue/chain and complete/reclaim them periodically. If you wish to understand the concept, look at NAPI: https://en.wikipedia.org/wiki/New_API
Loosely, on most systems, when you do a send, a request block is created with all info related to the given buffer. This block is then queued to the transmit queue. If the transmitter is idle, it is started. For subsequent blocks, the block is appended to the queue. The driver [or NIC card] will, after completing a previous request, immediately start a new/fresh one from the front of the queue.
This allows you to queue multiple/many blocks quickly [and return immediately]. They will be sent in order at line speed.
What actually happens depends on system H/W, NIC controller, and OS and what lwip modes you use.

How to design a test case to validate throttling capacity of a packet decoder?

I am implementing a packet decoder on a micro-controller. The packets are of 32-bytes each, received through a UART, every 10 milliseconds. The UART ISR (Interrupt Service Routine) keeps the received bytes in a ring buffer, and a thread scheduled every 7.5ms decodes the packets from ring buffer. There are instrumentation routines implemented to report the number of times ring buffer was full, error count after decoding, dropped bytes count. The micro-controller can send these packets back to PC running my test case through a different UART.
How do I design a test case to check if the system is meeting my performance requirements. These are the test cases which I should take care of --
The transmitter clock may run slightly faster (Sending a packet every 8ms, rather than the nominal 10ms).
The channel may introduce errors to data bits. There are checksum fields included in packet to cope up with that. How to simulate the channel errors?
The test case should be maintainable and extendable.
I already have a simulator through which I tested the decoder (implemented in micro-controller) for functional correctness. This simulator sends packets at programmable intervals, and the value of data fields can be changed through a UI. How can this simulator be modified to do this?
Are there standard practices/test cases to handle such throttling tests? Are there some edge cases I am missing? I need to make sure that the ring buffer has enough space to handle the higher rates of packets sent by the receiver.

Speed up / modify tcdrain() function

I'll skirt round the long and tedious story of how we got where we are, but the situation is this:
We are using half-duplex RS485 serial comms and (by necessity) driving the TX/RX flag "manually" via GPIO pin toggling. In order to make this work we're using tcdrain() to wait until the Tx buffer is empty before flipping back to Rx mode.
The problem is that tcdrain() seems to wait (block) for quite a while after the last character has been transmitted, which causes us a bit of a bottleneck.
I've seen suggestions that the default tcdrain() code just multiplies the baud rate by the (maximum) size of the serial buffer, sleep()s for that time period and then returns.- and I could easily believe that.
So, can anyone suggest ways to either:
Speed up tcdrain() perhaps by shortening the serial buffer
Modify tcdrain() (or related code/parameters) to actually wait for the last character to be sent by the hardware, or wait for a period more closely related to the buffer contents
I've grepped our (embedded) kernel (2.6.x) code and can't see any references other than a single header file (termios.h).
Edit to add: As per this post, if for example we could reduce the serial Tx buffer to 1 byte using an IOCTL I assume the write() call would/could block while chars were written, then return, which would allow us to avoid relying on tcdrain() and just use a very short usleep() before toggling the Tx/Rx pin. I will experiment when I get a moment, in the meantime any suggestions/examples welcome.

Implement UART frame controller

I'm programming on a STM32 board and I'm confused on how to use my peripherals : polling, interrupt, DMA, DMA interrupt...
Actually, I coded an UART module which send basics data and it works in polling, interrupt and DMA mode.
But I'd like to be able to send and receive specific frames with variable lengths, for example:
[ START | LGTH | CMD_ID | DATA(LGTH) | CRC ]
I also have sensors and I'd like to interact received DATA in these UART frames with sensors.
So, what I don't understand is:
how to program the UART module to work in "frame" mode? (buffer? circular DMA? interrupt? where, when..)
when I'm able to send or receive frame with my UART, what is the best way to interact with sensors? (inside a timer interrupt? in a state machine ? with extern variable? ...)
Here is my Libraries tree
In future, the idea is to carry this application in freertos
Thank you!
Absolutelly in DMA when it is available.
You have one big (good solution is cyclic) buffer and you just write data from one side. If DMA does not already work, you start the DMA with your buffer.
If DMA works, you just write your data to buffer and you wait DMA transfer complete interrupt.
Later in this interrupt you increase read pointer of buffer (as you sent some data already) and check if any data available to send over DMA. Set memory address to DMA and number of bytes in buffer to send.
Again, when DMA TC IRQ happens, do process again.
There is no support for FRAME, but only in plain bytes. It means you have to "invent" your own frame protocol and use it in app.
Later, when you want to send that FRAME over UART, you have to:
Write start byte to buffer
Write other header bytes
Write actual data
Write stop bytes/CRC/whatever
Check if DMA does not work, if it does not, start it.
Normally, I use this frame concept:
[START, ADDRESS, CMD, LEN, DATA, CRC, STOP]
START: Start byte indicating start of frame
ADDRESS: Address of device when multiple devices are in use on bus
CMD: Command ID
LEN: 2 bytes for data length
DATA: Actual data in bytes of variable length
CRC: 2 bytes for CRC including: address, cmd, len, data
STOP: Stop byte indicating end of frame
This is how I do it in every project where necessary. This does not use CPU to send data, just sets DMA and starts transmission.
From app perspective, you just have to create send_send(data, len) function which will create frame and put it to buffer for transmission.
Buffer size must be big enough to fit your requirements:
How much data at particular time (is it continues or a lot of data at small time)
UART baudrate
For specific question, ask and maybe I can provide some code examples from my libraries as reference.
In this case, where you need to implement that protocol, I would probably use plain interrupts and, in the handler, use a byte-by-byte state-machine to parse the incoming bytes into a frame buffer.
Only when a complete, valid frame has been received is it necessary to signal some semaphore/event and requuest a scheduler run, otherwise, you can handle any protocol error as you require - maybe tx some 'error-repeat' message and reset the state-machine to await the nexx start-of-frame buyte.
If you use DMA for this, then the variable frame-length is going to be awkward and you STILL have to iterate the received data to validate you protocol:(
DMA doesn't sound like a good fit for this, to me...
EDIT: if no preemptive multitasker, then forget about all that semaphore gunge above:) Still, it's easier to check a boolean 'validFrameRx' flag than parse DMA block data.

Data structure for storing serial port data in firmware

I am sending data from a linux application through serial port to an embedded device.
In the current implementation a byte circular buffer is used in the firmware. (Nothing but an array with a read and write pointer)
As the bytes come in, it is written to the circular bufffer.
Now the PC application appears to be sending the data too fast for the firmware to handle. Bytes are missed resulting in the firmware returning WRONG_INPUT too mant times.
I think baud rate (115200) is not the issue. A more efficient data structure at the firmware side might help. Any suggestions on choice of data structure?
A circular buffer is the best answer. It is the easiest way to model a hardware FIFO in pure software.
The real issue is likely to be either the way you are collecting bytes from the UART to put in the buffer, or overflow of that buffer.
At 115200 baud with the usual 1 start bit, 1 stop bit and 8 data bits, you can see as many as 11520 bytes per second arrive at that port. That gives you an average of just about 86.8 µs per byte to work with. In a PC, that will seem like a lot of time, but in a small microprocessor, it might not be all that many total instructions or in some cases very many I/O register accesses. If you overfill your buffer because bytes are arriving on average faster than you can consume them, then you will have errors.
Some general advice:
Don't do polled I/O.
Do use a Rx Ready interrupt.
Enable the receive FIFO, if available.
Empty the FIFO completely in the interrupt handler.
Make the ring buffer large enough.
Consider flow control.
Sizing your ring buffer large enough to hold a complete message is important. If your protocol has known limits on the message size, then you can use the higher levels of your protocol to do flow control and survive without the pains of getting XON/XOFF flow to work right in all of the edge cases, or RTS/CTS to work as expected in both ends of the wire which can be nearly as hairy.
If you can't make the ring buffer that large, then you will need some kind of flow control.
There is nothing better than a circular buffer.
You could use a slower baud rate or speed up the application in the firmware so that it can handle data coming at full speed.
If the output of the PC is in bursts it may help to make the buffer big enough to handle one burst.
The last option is to implement some form of flow control.
What do you mean by embedded device ? I think most of current DSP and processor can easily handle this kind of load. The problem is not with the circular buffer, but how do you collect bytes from the serial port.
Does your UART have a hardware fifo ? If yes, then you should enable it. If you have an interrupt per byte, you can quickly get into trouble, especially if you are working with an OS or with virtual memory, where the IRQ cost can be quit high.
If your receiving firmware is very simple (no multitasking), and you don't have an hardware fifo, polled mode can be a better solution than interrupt driven, because then your processor is doing only UART data reception, and you have no interrupt overhead.
Another problem might be with the transfer protocol. For example if you have long packet of data that you have to checksum, and you do the whole checksum at the end of the packet, then all the processing time of the packet is at the end of it, and that is why you may miss the beginning of the next packet.
So circular buffer is fine and you have to way to improve :
- The way you interact with the hardware
- The protocol (packet length, acknoledgment etc ...)
Before trying to solve the problem, first you need to establish what the problem really is. Otherwise you might waste time trying to fix something that isn't actually broken.
Without knowing more about your set-up it's hard to give more specific advice. But you should investigate further to establish what exactly the hardware and software is currently doing when the bytes come in, and then what is the weak point where they're going missing.
A circular buffer with Interrupt driven IO will work on the smallest and slowest of embedded targets.
First try it at the lowest baud rate and only then try at high speeds.
Using a circular buffer in conjunction with IRQ is an excellent suggestion. If your processor generates an interrupt each time a byte is received take that byte and store it in the buffer. How you decide to empty that buffer depends on if you are processing a stream of data or data packets. If you are processing a stream simply have your background process remove the bytes from the buffer and process them first-in-first-out. If you are processing packets then just keep filing the buffer until you have a complete packet. I've used the packet method successfully many times in the past. I would implement some type of flow control as well to signal to the PC if something went wrong like a full buffer or if packet-processing time is long to indicate to the PC when it is ready for the next packet.
You could implement something like IP datagram which contains data length, id, and checksum.
Edit:
Then you could hard-code some fixed length for the packets, for example 1024 byte or whatever that makes sense for the device. PC side would then check if the queue is full at the device every time it writes in a packet. Firmware side would run checksum to see if all data is valid, and read up till the data length.

Resources