Speed up / modify tcdrain() function

Speed up / modify tcdrain() function - c

I'll skirt round the long and tedious story of how we got where we are, but the situation is this:
We are using half-duplex RS485 serial comms and (by necessity) driving the TX/RX flag "manually" via GPIO pin toggling. In order to make this work we're using tcdrain() to wait until the Tx buffer is empty before flipping back to Rx mode.
The problem is that tcdrain() seems to wait (block) for quite a while after the last character has been transmitted, which causes us a bit of a bottleneck.
I've seen suggestions that the default tcdrain() code just multiplies the baud rate by the (maximum) size of the serial buffer, sleep()s for that time period and then returns.- and I could easily believe that.
So, can anyone suggest ways to either:
Speed up tcdrain() perhaps by shortening the serial buffer
Modify tcdrain() (or related code/parameters) to actually wait for the last character to be sent by the hardware, or wait for a period more closely related to the buffer contents
I've grepped our (embedded) kernel (2.6.x) code and can't see any references other than a single header file (termios.h).
Edit to add: As per this post, if for example we could reduce the serial Tx buffer to 1 byte using an IOCTL I assume the write() call would/could block while chars were written, then return, which would allow us to avoid relying on tcdrain() and just use a very short usleep() before toggling the Tx/Rx pin. I will experiment when I get a moment, in the meantime any suggestions/examples welcome.

Related

Linux UART imx8 how to quickly detect frame end?

I have an imx8 module running Linux on my PCB and i would like some tips or pointers on how to modify the UART driver to allow me to be able to detect the end of frame very quickly (less than 2ms) from my user space C application. The UART frame does not have any specific ending character or frame length. The standard VTIME of 100ms is much too long
I am reading from a Sim card, i have no control over the data, no control over the size or content of the data. I just need to detect the end of frame very quickly. The frame could be 3 bytes or 500. The SIM card reacts to data that it receives, typically I send it a couple of bytes and then it will respond a couple of ms later with an uninterrupted string of bytes of unknown length. I am using an iMX8MP
I thought about using the IDLE interrupt to detect the frame end. Turn it on when any byte is received and off once the idle interrupt fires. How can I propagate this signal back to user space? Or is there an existing method to do this?

Waiting for an "idle" is a poor way to do this.
Use termios to set raw mode with VTIME of 0 and VMIN of 1. This will allow the userspace app to get control as soon as a single byte arrives. See:
How to read serial with interrupt serial?
How do I use termios.h to configure a serial port to pass raw bytes?
How to open a tty device in noncanonical mode on Linux using .NET Core
But, you need a "protocol" of sorts, so you can know how much to read to get a complete packet. You prefix all data with a struct that has (e.g.) A type and a payload length. Then, you send "payload length" bytes. The receiver gets/reads that fixed length struct and then reads the payload which is "payload length" bytes long. This struct is always sent (in both directions).
See my answer: thread function doesn't terminate until Enter is pressed for a working example.
What you have/need is similar to doing socket programming using a stream socket except that the lower level is the UART rather than an actual socket.
My example code uses sockets, but if you change the low level to open your uart in raw mode (as above), it will be very similar.
UPDATE:
How quickly after the frame finished would i have the data at the application level? When I try to read my random length frames currently reading in 512 byte chunks, it will sometimes read all the frame in one go, other times it reads the frame broken up into chunks. –
Engo
In my link, in the last code block, there is an xrecv function. It shows how to read partial data that comes in chunks.
That is what you'll need to do.
Things missing from your post:
You didn't post which imx8 board/configuration you have. And, which SIM card you have (the protocols are card specific).
And, you didn't post your other code [or any code] that drives the device and illustrates the problem.
How much time must pass without receiving a byte before the [uart] device is "idle"? That is, (e.g.) the device sends 100 bytes and is then finished. How many byte times does one wait before considering the device to be "idle"?
What speed is the UART running at?
A thorough description of the device, its capabilities, and how you intend to use it.
A uart device doesn't have an "idle" interrupt. From some imx8 docs, the DMA device may have an "idle" interrupt and the uart can be driven by the DMA controller.
But, I looked at some of the linux kernel imx8 device drivers, and, AFAICT, the idle interrupt isn't supported.
I need to read everything in one go and get this data within a few hundred microseconds.
Based on the scheduling granularity, it may not be possible to guarantee that a process runs in a given amount of time.
It is possible to help this a bit. You can change the process to use the R/T scheduler (e.g. SCHED_FIFO). Also, you can use sched_setaffinity to lock the process to a given CPU core. There is a corresponding call to lock IRQ interrupts to a given CPU core.
I assume that the SIM card acts like a [passive] device (like a disk). That is, you send it a command, and it sends back a response or does a transfer.
Based on what command you give it, you should know how many bytes it will send back. Or, it should tell you how many optional bytes it will send (similar to the struct in my link).
The method you've described (e.g.) wait for idle, then "race" to get/process the data [for which you don't know the length] is fraught with problems.
Even if you could get it to work, it will be unreliable. At some point, system activity will be just high enough to delay wakeup of your process and you'll miss the window.
If you're reading data, why must you process the data within a fixed period of time (e.g. 100 us)? What happens if you don't? Does the device catch fire?
Without more specific information, there are probably other ways to do this.
I've programmed such systems before that relied on data races. They were unreliable. Either missing data. Or, for some motor control applications, device lockup. The remedy was to redesign things so that there was some positive/definitive way to communicate that was tolerant of delays.
Otherwise, I think you've "fallen in love" with "idle interrupt" idea, making this an XY problem: https://meta.stackexchange.com/questions/66377/what-is-the-xy-problem

How to handle asynchronous input and synchronous output?

I am currently working on a project where I have USART input and SAI(Serial Audio Interface, similar to SPI) output on an STM32 system.
I created a circular buffer which act as pinpong buffer(double buffer) structure. The input samples which received from USART are stored in this buffer where the head pointer points. When SAI peripheral requests new data, data is pulled from this buffer's tail pointer.
At the start of my code I wait until half the buffer is filled then activate SAI. SAI outputs at constant rate which is 40kHz. Input samples are received from external device's USART at approximately at same rate 40kHz.
Ideally, I expect the difference between my head and tail pointer to be constant.
I also implemented a protection mechanism which makes the Tail pointer wait and output the last sample from SAI until half of the buffer to fill when two pointers are pointing at same location.
The code works at start. The problem is when some time passes like approximately 2 minutes we see the head and tail pointers are pointing at the same location which creates discontinuity in our samples. Which means the one pointer is slow or fast than expected. I am sure of SAI protocol outputting 40kHz constantly (I checked it with scope). However, I am not so sure about accuracy of USARTs timing. I cannot modify the external USART device's code and I cannot change the output rate 40kHz it must be this value.
Is there a another way (maybe other than ping pong buffer method) to handle asynchronous input and synchronous outputs?

If what you are saying is you are receiving (continuous) serial data from some external device, and then you are forwarding it out some interface of your own at some rate...based on your clock. Even if the data is the same format and the clocks are "the same", then a buffer overflow is expected somewhere.
Same thing happens with ethernet or any other source if 1) continuous at line rate 2) source for the input and source for the output are a different clock 3) there are guaranteed to be differences in the reference clocks, if the input source clock is a little faster then so long as the stream stays continuous and at line rate, then you will overflow eventually.
The clocks change with temperature and voltage so the delta can change.
Possible to even reduce the percentage of the data you output from the input and still overflow if the input is continuous. Depends on if your output is also at line rate or of you have margin and the margin can overcome the difference in the clocks.
Also remember uarts hardly run at that exact rate, the use clock dividers and get close, you can have two computers using uarts at the same rate and the delta can be relatively large and the overflow can happen very soon. For uart to work the clock has to only be good enough to get through one character so can be several percent off if not more than that, even if the oscillator is very good and no plls and both sides use the same reference clock (but not the same uart, clocking system, etc).
If you increase your output rate or reduce the data being output so that it is not at line rate then the problem may go away or may take hours or days before it happens...
if I have misunderstood the problem, forgive me, I will delete this answer.

Relaying UART1 with UART2 within an ISR() (PIC24H)

I am programming a microcontroller of the PIC24H family and using xc16 compiler.
I am relaying U1RX-data to U2TX within main(), but when I try that in an ISR it does not work.
I am sending commands to the U1RX and the ISR() is down below. At U2RX, there are databytes coming in constantly and I want to relay 500 of them with the U1TX. The results of this is that U1TX is relaying the first 4 databytes from U2RX but then re-sending the 4th byte over and over again.
When I copy the for loop below into my main() it all works properly. In the ISR(), its like that U2RX's corresponding FIFObuffer is not clearing when read so the buffer overflows and stops reading further incoming data to U2RX. I would really appreciate if someone could show me how to approach the problem here. The variables tmp and command are globally declared.
void __attribute__((__interrupt__, auto_psv, shadow)) _U1RXInterrupt(void)
{
command = U1RXREG;
if(command=='d'){
for(i=0;i<500;i++){
while(U2STAbits.URXDA==0);
tmp=U2RXREG;
while(U1STAbits.UTXBF==1); //
U1TXREG=tmp;
}
}
}
Edit: I added the first line in the ISR().

Trying to draw an answer from the various comments.
If the main() has nothing else to do, and there are no other interrupts, you might be able to "get away with" patching all 500 chars from one UART to another under interrupt, once the first interrupt has ocurred, and perhaps it would be a useful exercise to get that working.
But that's not how you should use an interrupt. If you have other tasks in main(), and equal or lower priority interrupts, the relatively huge time that this interrupt will take (500 chars at 9600 baud = half a second) will make the processor what is known as "interrupt-bound", that is, the other processes are frozen out.
As your project gains complexity, you won't want to restrict main() to this task, and there is no need to for it be involved at all, after setting up the UARTs and IRQs. After that it can calculate π ad infinitum if you want.
I am a bit perplexed as to your sequence of operations. A command 'd' is received from U1 which tells you to patch 500 chars from U2 to U1.
I suggest one way to tackle this (and there are many) seeing as you really want to use interrupts, is to wait until the command is received from U1 - in main(). You then configure, and enable, interrupts for RXD on U2.
Then the job of the ISR will be to receive data from U2 and transmit it thru U1. If both UARTS have the same clock and the same baud rate, there should not be a synchronisation problem, since a UART is typically buffered internally: once it begins to transmit, the TXD register is available to hold another character, so any stagnation in the ISR should be minimal.
I can't write the actual code for you, since it would be supposed to work, but here is some very pseudo code, and I don't have a PIC handy (or wish to research its operational details).
ISR
has been invoked because U2 has a char RXD
you *might* need to check RXD status as a required sequence to clear the interrupt
read the RXD register, which also might clear the interrupt status
if not, specifically clear the interrupt status
while (U1 TXD busy);
write char to U1
if (chars received == 500)
disable U2 RXD interrupt
return from interrupt

ISR's must be kept lean and mean and the code made hyper-efficient if there is any hope of keeping up with the buffer on a UART. Experiment with the BAUD rate just to find the point at which your code can keep up, to help discover the right heuristic and see how far away you are from achieving your goal.
Success could depend on how fast your micro controller is, as well, and how many tasks it is running. If the microcontroller has a built in UART theoretically you should be able to manage keeping the FIFO from overflowing. On the other hand, if you paired up a UART with an insufficiently-powered micro controller, you might not be able to optimize your way out of the problem.
Besides the suggestion to offload the lower-priority work to the main thread and keep the ISR fast (that someone made in the comments), you will want to carefully look at the timing of all of the lines of code and try every trick in the book to get them to run faster. One expensive instruction can ruin your whole day, so get real creative in finding ways to save time.
EDIT: Another thing to consider - look at the assembly language your C compiler creates. A good compiler should let you inline assembly language instructions to allow you to hyper-optimize for your particular case. Generally in an ISR it would just be a small number of instructions that you have to find and implement.
EDIT 2: A PIC 24 series should be fast enough if you code it right and select a fast oscillator or crystal and run the chip at a good clock rate. Also consider the divisor the UART might be using to achieve its rate vs. the PIC clock rate. It is conceivable (to me) that an even division that could be accomplished internally via shifting would be better than one where math was required.

Init a serial communication with c library open() causes TX to send one bit on RPi

I'm trying to set up a serial communication between the RPI and an FPGA. However, there is an issue when using the standard C library open() to init the serial interface: I'm using a scope to monitor what is sent and received via the RX and TX lines. A call to open causes the TX line of the RPI to go low for the length of one bit. I do not see this behavior with other computers/linux PCs. The point is, the FPGA assumes a valid transmission, since he thinks it's a start bit, but it's not.
I checked with minicom installed on the RPI. Same thing. Starting minicom causes the TX line sending one bit. Once minicom has started, the communication runs as expected and all bytes have the correct frame size. Is there any way to suppress the TX line going low upon the open call to init the serial communication? Is this an expected behavior?

This is a super far-fetched hunch, but this code seems a bit suspicious, from the pl011_startup() function in the PL011 serial port driver:
/*
* Provoke TX FIFO interrupt into asserting.
*/
It seems as if it's twiddling the TX line when starting up the port, which would explain the pulse you're seeing. More investigation would surely be needed before concluding this is what happens, of course.
So, I guess my "answer" boils down to: that sounds weird, perhaps it's something with the driver?
Of course, one way of working around this is to apply some care in the FPGA end, assuming you have more control over it. "Proper" framing would take care of this, and make it clear that the spurious send can be discarded.
UPDATE: I meant that if "proper" messages were to be always framed by some sequence of bytes, the FPGA might be able to discard invalid ("unframed") data anyway, and thus become immune to the random pulse. For instance, messages could be defined to always start with SOH (start of header) or SOT (start of text) symbols (bytes with the values 0x01 and 0x02, respectively).

Data structure for storing serial port data in firmware

I am sending data from a linux application through serial port to an embedded device.
In the current implementation a byte circular buffer is used in the firmware. (Nothing but an array with a read and write pointer)
As the bytes come in, it is written to the circular bufffer.
Now the PC application appears to be sending the data too fast for the firmware to handle. Bytes are missed resulting in the firmware returning WRONG_INPUT too mant times.
I think baud rate (115200) is not the issue. A more efficient data structure at the firmware side might help. Any suggestions on choice of data structure?

A circular buffer is the best answer. It is the easiest way to model a hardware FIFO in pure software.
The real issue is likely to be either the way you are collecting bytes from the UART to put in the buffer, or overflow of that buffer.
At 115200 baud with the usual 1 start bit, 1 stop bit and 8 data bits, you can see as many as 11520 bytes per second arrive at that port. That gives you an average of just about 86.8 µs per byte to work with. In a PC, that will seem like a lot of time, but in a small microprocessor, it might not be all that many total instructions or in some cases very many I/O register accesses. If you overfill your buffer because bytes are arriving on average faster than you can consume them, then you will have errors.
Some general advice:
Don't do polled I/O.
Do use a Rx Ready interrupt.
Enable the receive FIFO, if available.
Empty the FIFO completely in the interrupt handler.
Make the ring buffer large enough.
Consider flow control.
Sizing your ring buffer large enough to hold a complete message is important. If your protocol has known limits on the message size, then you can use the higher levels of your protocol to do flow control and survive without the pains of getting XON/XOFF flow to work right in all of the edge cases, or RTS/CTS to work as expected in both ends of the wire which can be nearly as hairy.
If you can't make the ring buffer that large, then you will need some kind of flow control.

There is nothing better than a circular buffer.
You could use a slower baud rate or speed up the application in the firmware so that it can handle data coming at full speed.
If the output of the PC is in bursts it may help to make the buffer big enough to handle one burst.
The last option is to implement some form of flow control.

What do you mean by embedded device ? I think most of current DSP and processor can easily handle this kind of load. The problem is not with the circular buffer, but how do you collect bytes from the serial port.
Does your UART have a hardware fifo ? If yes, then you should enable it. If you have an interrupt per byte, you can quickly get into trouble, especially if you are working with an OS or with virtual memory, where the IRQ cost can be quit high.
If your receiving firmware is very simple (no multitasking), and you don't have an hardware fifo, polled mode can be a better solution than interrupt driven, because then your processor is doing only UART data reception, and you have no interrupt overhead.
Another problem might be with the transfer protocol. For example if you have long packet of data that you have to checksum, and you do the whole checksum at the end of the packet, then all the processing time of the packet is at the end of it, and that is why you may miss the beginning of the next packet.
So circular buffer is fine and you have to way to improve :
- The way you interact with the hardware
- The protocol (packet length, acknoledgment etc ...)

Before trying to solve the problem, first you need to establish what the problem really is. Otherwise you might waste time trying to fix something that isn't actually broken.
Without knowing more about your set-up it's hard to give more specific advice. But you should investigate further to establish what exactly the hardware and software is currently doing when the bytes come in, and then what is the weak point where they're going missing.

A circular buffer with Interrupt driven IO will work on the smallest and slowest of embedded targets.
First try it at the lowest baud rate and only then try at high speeds.

Using a circular buffer in conjunction with IRQ is an excellent suggestion. If your processor generates an interrupt each time a byte is received take that byte and store it in the buffer. How you decide to empty that buffer depends on if you are processing a stream of data or data packets. If you are processing a stream simply have your background process remove the bytes from the buffer and process them first-in-first-out. If you are processing packets then just keep filing the buffer until you have a complete packet. I've used the packet method successfully many times in the past. I would implement some type of flow control as well to signal to the PC if something went wrong like a full buffer or if packet-processing time is long to indicate to the PC when it is ready for the next packet.

You could implement something like IP datagram which contains data length, id, and checksum.
Edit:
Then you could hard-code some fixed length for the packets, for example 1024 byte or whatever that makes sense for the device. PC side would then check if the queue is full at the device every time it writes in a packet. Firmware side would run checksum to see if all data is valid, and read up till the data length.

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight