I'm writing a C program that sends the output of a bash shell over a TCP connection. To make my program more responsive, I used setsockopt() to enable TCP_NODELAY, which disables Nagle's buffering algorithm. This worked great, except that rarely there is a lag in large messages: if the message is more than around 500 bytes (probably 512), the first 500 bytes go through quickly, then there's a 1-2 second delay before the rest is received all at once. This only happens once every 10-15 times a large message is received. On the server side, the message is being written to the socket one byte at a time, and all of the bytes are available, so this behavior is unexpected to me.
My best guess is that there's a 512-byte buffer somewhere in the socket that's causing a block? I did some timing tests to see where the lag is, and I'm pretty sure the lag is occurring in the socket itself. All of the data on the server side is written without blocking, but the client receives the end of the message after a delay. However, I used getsockopt() to check the socket's receive and send buffer sizes, and they are well over 512 bytes - about 66000 and 130000 respectively. On the client side, I'm using Express.js to receive the data in a handler (app.on('data', function(){})). But I read that this Express function does not buffer data?
Would anyone have a guess why this is happening? Thanks!
Since TCP_NODELAY means sending every piece of data as a packet as soon as possible, without combining data together, it sounds like you are sending tons of packets. Since you are writing one byte at a time, the stack could be sending packets with just one byte of payload and a much bigger frame.

This works fine most of the time, but as soon as the first packet drops for whatever reason, the receiver needs to go into error-correction mode on the TCP socket and ask for retransmission of the dropped packet. That incurs at least one round-trip of latency, and perhaps several. It sounds like you are getting lucky for the first several hundred packets (500 bytes' worth) and then typically hitting your first packet drop and slowing way down due to error correction.

One simple solution might be to write in larger chunks, say 10 bytes at a time instead of 1 byte, so that the chance of hitting a dropped packet is much lower. Then you would expect to see this problem only for messages around 5000 bytes or so, and roughly as often as you do now.

In general, setting TCP_NODELAY makes things go faster at first, but you wind up hitting the first dropped packet sooner, simply because TCP_NODELAY does not decrease the number of packets you send per amount of data. It increases (or at best leaves unchanged) the number of packets, which means your chance of hitting a dropped packet within a given amount of data goes up. There is a tradeoff here between interactive feel and the first hiccup: by avoiding TCP_NODELAY you increase the typical amount of data that gets sent before the first retransmission stall is hit.
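If it helps, here is a minimal sketch of the "larger chunks" idea, assuming a connected TCP socket; the function name, the error handling, and the exact chunk size (the 10-byte figure suggested above) are illustrative, not a drop-in fix:

    #include <stddef.h>
    #include <unistd.h>
    #include <netinet/in.h>
    #include <netinet/tcp.h>
    #include <sys/socket.h>

    /* Sketch: enable TCP_NODELAY, then write in small chunks instead of
       single bytes.  Real code should also handle EINTR/EAGAIN. */
    static int send_chunked(int sock, const char *msg, size_t len)
    {
        int one = 1;
        if (setsockopt(sock, IPPROTO_TCP, TCP_NODELAY, &one, sizeof(one)) < 0)
            return -1;

        const size_t chunk = 10;   /* the 10-byte figure from the answer; tune to taste */
        size_t off = 0;
        while (off < len) {
            size_t n = len - off < chunk ? len - off : chunk;
            ssize_t written = write(sock, msg + off, n);
            if (written < 0)
                return -1;
            off += (size_t)written;
        }
        return 0;
    }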
Get a network capture using tcpdump or Wireshark. Review the packet transmission timeline; this will help distinguish network problems from software implementation issues. If you see retransmissions you may have a network issue; if you see slow ACKs you might find it better NOT to use TCP_NODELAY, since delayed ACKs can stall a "no delay" connection.
Related
Clients sending a sufficiently large amount of data over a sufficiently slow internet connection are causing me to busy-wait in a classic non-blocking client-server setup in C with sockets.
In detail, the busy-waiting is caused by this procedure:
1. I install EPOLLIN for the client (to monitor for incoming data).
2. The client sends data.
3. epoll_wait() signals me that there is data to be read (EPOLLIN).
4. The coroutine is resumed and the data is consumed, but more data is needed to finish this client, so the read returns EWOULDBLOCK and we go BACK TO 1.
The above procedure repeats for minutes (due to the slow internet connection and the large amount of data). It's basically just useless hopping around that does nothing meaningful other than consume CPU time, and it kind of defeats the purpose of epoll_wait().
So I want to avoid this busy-waiting with some mechanism that accumulates the data in the receive buffer until either a minimum size has been reached or a maximum timeout has passed since the first byte arrived, and only then has epoll_wait() wake me up with EPOLLIN for this client.
I first looked into tcp(7), hoping for something like TCP_CORK but for the receive buffer, but I could not find anything.
Then I looked into unix(7) and tried to implement it myself via SIOCINQ right after step 3. The problem is that I end up busy-waiting again, because step 3 returns immediately as long as data is available for reading. Alternatively, I could deregister the client right after step 3, but that would block this specific client until epoll_wait() returns for a different client.
Is this a stalemate, or is there any solution to the above problem of accumulating data in the receive buffer up to a minimum size or maximum time without busy-waiting?
#ezgoing and I chatted at length about this, and I'm convinced this is a non-problem (as #user207421 noted as well).
When I first read the question, I thought perhaps they were worried about tiny amounts (say, 16 bytes at a time), and that would have been worth investigating, but once it turned out that it's 4 KiB at a time, it's so routine that it's not worth looking into.
Interestingly, the serial I/O module does support this, with a mode that wakes up only after so many characters are available or so much time has passed, but no such thing with the network module.
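For comparison, that serial-port mode (assuming the POSIX termios interface is what is meant here) looks roughly like this; the 64-byte and 0.5 s figures are made-up examples and error checks are omitted:

    #include <termios.h>
    #include <unistd.h>

    /* Sketch: non-canonical serial reads that return only once 64 bytes have
       accumulated, or once 0.5 s passes with no new byte after the first one. */
    void configure_serial_wakeup(int fd)
    {
        struct termios tio;
        tcgetattr(fd, &tio);
        tio.c_lflag &= ~(ICANON | ECHO);  /* byte-at-a-time (non-canonical) mode */
        tio.c_cc[VMIN]  = 64;             /* minimum bytes before read() returns */
        tio.c_cc[VTIME] = 5;              /* inter-byte timeout, in tenths of a second */
        tcsetattr(fd, TCSANOW, &tio);
    }

There is no equivalent knob for TCP sockets in that interface; this is only meant to show what the serial I/O behaviour referred to above looks like.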
The only time this would be worth addressing is if there is actual evidence that it's impacting the application's responsiveness in a meaningful way, not a hypothetical concern for packet rates.
I'm writing a program with a client and a server, and I've almost got it working.
At the moment I can run the server on a port, and the client on the same port with the IP address and the name of the .wav file that I want to read.
Now what I'd like to do is to make a timeout between each sendto() so that the client receives the packets and reads them correctly. Without that, the client receives many packets at once and loses many of them.
So could someone tell me how this works in UDP, and how to do that?
making a timeout between each sendto()
I believe that you are asking how to put a small delay between each sendto(). If you open a raw wav file and just send the bytes, there is a good chance that the data will reach the client much faster than it can play it. If you want to stream data at the same rate as it is played, send data in chunks, then let the client request the next chunk.
If that is not an option, you can send a chunk of data (e.g. 20 ms worth), let the thread sleep for a little less than 20 ms, then send the next chunk. Sleeps are kind of a hack; some sort of audio callback on the server would be best. The bottom line is that your client buffer has to be big enough to absorb the amount of data your server is sending.
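A rough sketch of that chunk-and-sleep approach, assuming an already-created UDP socket and destination address; the chunk size and the 18 ms pause are illustrative numbers, not tuned values:

    #include <stdio.h>
    #include <time.h>
    #include <sys/socket.h>

    /* Sketch: send the wav file in fixed-size chunks, sleeping slightly less
       than the chunk's play time between sendto() calls.  1764 bytes is
       roughly 20 ms of 44.1 kHz, 16-bit mono audio. */
    #define CHUNK_BYTES 1764

    void stream_file(int sock, const struct sockaddr *dest, socklen_t destlen, FILE *wav)
    {
        char buf[CHUNK_BYTES];
        size_t n;
        struct timespec pause = { 0, 18 * 1000 * 1000 };  /* 18 ms, a bit under 20 ms */

        while ((n = fread(buf, 1, sizeof buf, wav)) > 0) {
            sendto(sock, buf, n, 0, dest, destlen);       /* real code should check the result */
            nanosleep(&pause, NULL);
        }
    }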
without that the client receives many packets at once and loses many of them
I believe that you are asking how to deal with the variation in packet inter-arrival times, as well as packet loss and out-of-order packets. It sounds like you were just sending packets at a faster rate than your client could handle. You might need a larger buffer on the client.
In any case, with UDP/IP you have the following scenarios:
lost packets
packets arriving out of order
packets arriving in bursts (packets will not arrive exactly X ms apart)
To deal with this, you minimally have to have what is known as a dejitter buffer. This is a buffer that collects packets as they arrive and inserts them, typically, into a ring buffer. The buffer has to be large enough to hold the packets your server is sending, since your client is potentially consuming packets from the buffer slower than the server is sending them (or vice versa). In order to get packets in the right order and deal with losses, you have to detect them. You can detect losses and out-of-order arrivals by simply numbering each packet that is sent. As packets arrive, you put them into the correct location in the buffer. If a packet is lost, you need to handle that with some sort of loss concealment (playing silence, estimating the lost packet, etc.), which is beyond the scope of this question.
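To illustrate just the sequence-numbering part, here is a hypothetical packet layout and ring-buffer insertion; the struct fields, names, and 64-slot depth are made up for the example:

    #include <stdint.h>

    /* Hypothetical on-the-wire packet: every packet carries a sequence number
       so the receiver can reorder it and detect losses. */
    struct audio_packet {
        uint32_t seq;           /* monotonically increasing sequence number */
        uint8_t  payload[160];  /* e.g. 20 ms of 8 kHz, 8-bit audio */
    };

    #define RING_SLOTS 64       /* dejitter depth: how many packets we can hold */

    struct dejitter {
        struct audio_packet slot[RING_SLOTS];
        int filled[RING_SLOTS]; /* 1 if the slot holds a valid, not-yet-played packet */
    };

    /* Put an arriving packet at the position its sequence number dictates.
       A slot left unfilled marks a lost or late packet, which the playout
       side can detect and conceal (silence, interpolation, ...). */
    void dejitter_insert(struct dejitter *dj, const struct audio_packet *pkt)
    {
        unsigned idx = pkt->seq % RING_SLOTS;
        dj->slot[idx] = *pkt;
        dj->filled[idx] = 1;
    }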
The RTP protocol is designed for streaming and is an application protocol that works over UDP.
Since you're using UDP, which is connectionless, you don't really have a way to control the flow of packets unless you implement some kind of acknowledgement mechanism... at which point you might as well be using TCP because it already has that built in.
Although I don't have much experience in network programming, this looks a bit more complicated than it might seem at first glance. So UDP is connectionless. That speeds things up a lot, but there is a price to pay -- off the top of my head, packets can get lost or arrive out of order.
Those are situations you need to handle on the client end. Your client needs to be designed so that it accepts packets as they arrive at an arbitrary rate, skips over those that fail to arrive within a certain time (this matters for live streaming; for buffered playback it doesn't), and takes order into consideration, which means that each packet needs to contain information about its place relative to previous packets.
Basically I set up a test to see which method is the fastest way to get data from another computer on my network, for a server with only a few clients (10 at most, 1 at minimum).
I tried two methods, both done in a thread-per-client fashion, and looped the read 10000 times. I timed the loop from the creation of the threads to the joining of the threads right after. Both methods used standard read(2)/write(2) calls and SOCK_STREAM/AF_INET sockets:
In one, my client polled for data and read (non-blocking) whenever data was available, while my server sent data immediately whenever it got a connection. My thread returned once it had read the correct number of bytes (which happened every time).
In the other, my client sent a message to the server on connect and my server sent a message to my client on a read (both sides blocked here to make this more turn-based and synchronous). My thread returned after my client read.
I was pretty sure polling would be faster. I made a histogram of times to complete threads and, as expected, polling was faster by a slight margin, but two things were not expected about the read/write method. Firstly, the read/write method gave me two distinct time spikes, i.e. some event would sometimes occur that slowed the read/write down by about .01 microseconds. I ran this test on a switch initially and thought this might be packet collisions, but then I ran the server and client on the same computer and still got these two different time spikes. Anyone know what event may be occurring?
The other thing: my read function sometimes returned too many bytes, and some of the bytes were garbage. I know streams don't guarantee you'll get all the data correctly, but why would the read function return extra garbage bytes?
Seems you are confusing the purpose of these two alternatives:
The connection-per-thread approach does not need polling (unless your protocol allows an arbitrary sequence of messages in either direction, which would be very confusing to implement). Blocking reads and writes will always be faster here, since you skip the extra system call to select(2)/poll(2)/epoll(7).
The polling approach lets you multiplex I/O on many sockets/files in a single-threaded or fixed-number-of-threads setup. This is how web servers like nginx handle thousands of client connections with very few threads. The idea is that waiting on any given file descriptor does not block the others - you wait on all of them at once.
So I would say you are comparing apples and goblins :) Take a look here:
High Performance Server Architecture
The C10K problem
libevent
As for the spikes: check whether TCP goes into retransmission mode, i.e. one of the sides is not reading fast enough to drain its receive buffer; play with the SO_RCVBUF and SO_SNDBUF socket options.
Too many bytes is definitely wrong - it looks like API misuse. Check whether you are comparing signed and unsigned numbers, and compile with a high warning level.
Edit:
It looks like you are dealing with two separate issues - data corruption and data transfer performance. I would strongly recommend focusing on the first one before tackling the second. Reduce the test to a minimum and try to figure out what you are doing wrong with the sockets, i.e. where that garbage data comes from. Do you check the return values of the read(2) and write(2) calls? Do you share buffers between threads? Paste the reduced code sample into the question (or provide a link to it) if you are really stuck.
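On the return-value point: read(2) can return fewer bytes than you asked for, and anything in the buffer beyond the returned count is whatever happened to be there before. A sketch of a loop that reads an exact number of bytes (names are illustrative):

    #include <errno.h>
    #include <unistd.h>

    /* Read exactly 'want' bytes into 'buf'; return the count on success,
       0 on EOF, -1 on error.  Only bytes up to what read() reports are
       valid - anything past that in the buffer is leftover garbage. */
    ssize_t read_exact(int fd, char *buf, size_t want)
    {
        size_t got = 0;
        while (got < want) {
            ssize_t n = read(fd, buf + got, want - got);
            if (n < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            if (n == 0)          /* peer closed the connection */
                return 0;
            got += (size_t)n;
        }
        return (ssize_t)got;
    }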
Hope this helps.
I know streams don't guarantee you'll get all the data correctly, but why would the read function return extra garbage bytes?
Actually, streams do guarantee you will get all the data correctly, and in order. Datagrams (UDP) are what you were thinking of, SOCK_DGRAM, which is not what you are using. Within AF_INET, SOCK_STREAM means TCP and TCP means reliable.
We have a client/server communication system over UDP set up in Windows. The problem we are facing is that when the throughput grows, packets get dropped. We suspect that this is due to the UDP receive buffer being continuously polled, causing the buffer to be blocked and incoming packets to be dropped. Is it possible that reading this buffer will cause incoming packets to be dropped? If so, what are the options to correct this? The system is written in C. Please let me know if this is too vague and I can try to provide more info. Thanks!
The default socket buffer size in Windows sockets is 8k, or 8192 bytes. Use the setsockopt Windows function to increase the size of the buffer (refer to the SO_RCVBUF option).
But beyond that, increasing the size of your receive buffer will only delay the time until packets get dropped again if you are not reading the packets fast enough.
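For reference, raising the receive buffer looks roughly like this; the 1 MiB value is an arbitrary example, and on Windows the option value is passed as a char pointer:

    #include <sys/socket.h>   /* on Windows: winsock2.h, and cast &size to (const char *) */

    /* Sketch: ask the stack for a larger receive buffer. */
    int grow_rcvbuf(int sock)
    {
        int size = 1024 * 1024;
        return setsockopt(sock, SOL_SOCKET, SO_RCVBUF, &size, sizeof(size));
    }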
Typically, you want two threads for this kind of situation.
The first thread exists solely to service the socket. In other words, the thread's sole purpose is to read a packet from the socket, add it to some kind of properly-synchronized shared data structure, signal that a packet has been received, and then read the next packet.
The second thread exists to process the received packets. It sits idle until the first thread signals a packet has been received. It then pulls the packet from the properly-synchronized shared data structure and processes it. It then waits to be signaled again.
As a test, try short-circuiting the full processing of your packets and just write a message to the console (or a file) each time a packet has been received. If you can successfully do this without dropping packets, then breaking your functionality into a "receiving" thread and a "processing" thread will help.
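A compressed sketch of that two-thread split using pthreads; the fixed-size ring queue, the names, and the omitted overflow/error handling are all simplifications for illustration:

    #include <pthread.h>
    #include <sys/types.h>
    #include <sys/socket.h>

    #define QUEUE_SLOTS 256
    #define MAX_DGRAM   1500

    struct packet {
        char    data[MAX_DGRAM];
        ssize_t len;
    };

    static struct packet queue[QUEUE_SLOTS];
    static int head, tail;                    /* producer writes head, consumer reads tail */
    static pthread_mutex_t lock  = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  avail = PTHREAD_COND_INITIALIZER;

    /* Receiving thread: do nothing but drain the socket into the queue. */
    void *receiver(void *arg)
    {
        int sock = *(int *)arg;
        for (;;) {
            struct packet p;
            p.len = recvfrom(sock, p.data, sizeof p.data, 0, NULL, NULL);
            if (p.len <= 0)
                continue;                     /* sketch: errors ignored */
            pthread_mutex_lock(&lock);
            queue[head] = p;                  /* sketch: no overflow check */
            head = (head + 1) % QUEUE_SLOTS;
            pthread_cond_signal(&avail);
            pthread_mutex_unlock(&lock);
        }
        return NULL;
    }

    /* Processing thread: wait for the signal, then do the slow work off the socket path. */
    void *processor(void *arg)
    {
        (void)arg;
        for (;;) {
            pthread_mutex_lock(&lock);
            while (tail == head)
                pthread_cond_wait(&avail, &lock);
            struct packet p = queue[tail];
            tail = (tail + 1) % QUEUE_SLOTS;
            pthread_mutex_unlock(&lock);
            /* ... process p.data / p.len here ... */
        }
        return NULL;
    }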
Yes, the stack is allowed to drop packets — silently, even — when its buffers get too full. This is part of the nature of UDP, one of the bits of reliability you give up when you switch from TCP. You can either reinvent TCP — poorly — by adding retry logic, ACK packets, and such, or you can switch to something in-between like SCTP.
There are ways to increase the stack's buffer size, but that's largely missing the point. If you aren't reading fast enough to keep buffer space available already, making the buffers larger is only going to put off the time it takes you to run out of buffer space. The proper solution is to make larger buffers within your own code, and move data from the stack's buffers into your program's buffer ASAP, where it can wait to be processed for arbitrarily long times.
Is it possible that reading this buffer will cause incoming packets to be dropped?
Packets can be dropped if they're arriving faster than you read them.
If so, what are the options to correct this?
One option is to change the network protocol: use TCP, or implement some acknowledgement + 'flow control' using UDP.
Otherwise you need to see why you're not reading fast/often enough.
If the CPU is 100% utilized then you need to do less work per packet or get a faster CPU (or use multithreading and more CPUs, if you aren't already).
If the CPU is not 100%, then perhaps what's happening is:
You read a packet
You do some work, which takes x msec of real-time, some of which is spent blocked on some other I/O (so the CPU isn't busy, but it's not being used to read another packet)
During those x msec, a flood of packets arrive and some are dropped
A cure for this would be to change the threading.
Another possibility is to do several simultaneous reads from the socket (each of your reads provides a buffer into which a UDP packet can be received).
Another possibility is to see whether there's a (O/S-specific) configuration option to increase the number of received UDP packets which the network stack is willing to buffer until you try to read them.
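On the "several simultaneous reads" point, one Linux-specific way to hand the kernel several receive buffers in a single call is recvmmsg(2); on Windows the rough analogue would be multiple overlapped WSARecvFrom calls, which are not shown here. A sketch under that assumption (names and sizes are illustrative):

    #define _GNU_SOURCE
    #include <sys/socket.h>
    #include <sys/uio.h>

    #define NBUFS     8
    #define DGRAM_MAX 1500

    /* Sketch (Linux): pull a burst of datagrams in one system call.
       Returns how many datagrams were received; msgs[i].msg_len holds
       the size of each one. */
    int read_burst(int sock, char bufs[NBUFS][DGRAM_MAX])
    {
        struct mmsghdr msgs[NBUFS];
        struct iovec   iov[NBUFS];

        for (int i = 0; i < NBUFS; i++) {
            iov[i].iov_base = bufs[i];
            iov[i].iov_len  = DGRAM_MAX;
            msgs[i].msg_hdr = (struct msghdr){ .msg_iov = &iov[i], .msg_iovlen = 1 };
        }
        return recvmmsg(sock, msgs, NBUFS, 0, NULL);
    }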
First step: increase the receive buffer size; Windows pretty much grants all reasonable size requests.
If that doesn't help, your consuming code probably has some fairly slow areas. I would use threading, e.g. with pthreads, and a producer/consumer pattern: put each incoming datagram in a queue on one thread and consume from it on another, so your receive calls don't block and the buffer doesn't fill up.
Third step: modify your application-level protocol to allow for batched packets, and batch packets at the sender to reduce the UDP header overhead of sending lots of small packets.
Fourth step: check your network gear; switches, etc. can give you detailed output about their traffic statistics, buffer overflows, and so on - if that is an issue, get faster switches or possibly swap out a faulty one.
... just FYI, I'm running UDP multicast traffic on our backend continuously at an average of ~30 Mbit/s with peaks at 70 Mbit/s, and my drop rate is practically nil.
Not sure about this, but on Windows it's not possible for polling the socket to cause a packet to drop. Windows collects the packets separately from your polling, and it shouldn't cause any drops.
I am assuming you're using select() to poll the socket? As far as I know, that can't cause a drop.
The packets could be lost due to an increase in unrelated network traffic anywhere along the route, or full receive buffers. To mitigate this, you could increase the receive buffer size in Winsock.
Essentially, UDP is an unreliable protocol in the sense that packet delivery is not guaranteed and no error is returned to the sender on delivery failure. If you are worried about packet loss, it would be best to implement acknowledgment packets into your communication protocol, or to port it to a more reliable protocol like TCP. There really aren't any other truly reliable ways to prevent UDP packet loss.
In standard TCP implementations (say, on BSD), does anybody know whether it's possible to find out how many bytes have been ACKed by the remote host? Calling write() on a socket returns the number of bytes written, but I believe this actually means the number of bytes that fit into the TCP send buffer (not the number of bytes written to the network, or the number of bytes ACKed). Or maybe I'm wrong...
thanks!
When you have NODELAY=false (which is the default) and you call send() with fewer bytes than the TCP window, the bytes are not really sent immediately, so you're right. The OS will wait a little to see whether you call send() again, in order to transmit the combined data in a single packet and avoid wasting a TCP header.
When NODELAY=true the data is transmitted when you call send(), so you can (theoretically) count on the returned value. But this is not recommended, due to the added network inefficiency.
All in all, if you don't need absolute precision, you can use the value returned by send() even when NODELAY=true. The value will not reflect immediate reality, but a few milliseconds later it will (but also check for lost connections, since the last block of data you sent could have been lost). Once the connection is gracefully terminated, you can trust that all the data was transmitted. If it wasn't, you'll know before then - either because the connection was abruptly dropped or because you received a data-retention-related error (or some other error).
I don't know of any way to get this, and it's probably not useful to you anyway.
I assume you want to know how much data was received by the host so that, after a lost connection and a reconnection, you can resume sending from that point. But the ACKed data has only been ACKed by the OS! It doesn't indicate what data has been received by your program on the other side; depending on the size of the TCP receive buffer there, your program could be hundreds of KB behind. If you want to know how much data has been received and 'used' by the program there, get it to send application-level ACKs.
I think you're wrong, although it's one of those places where I'd want to look at the specific implementation before I would bet serious money on it. Consider, though, the case of a TCP connection that is dropped immediately after the original handshake. If the number of bytes returned were just the number buffered, it would be possible to have apparently written a number of bytes but have them remain undelivered; that would violate TCP's guarantee-of-delivery property.
Note, though, that this is only true of TCP; not all protocols within IP provide the same guarantee.
You might have some luck for TCP by using ioctl(fd, TIOCOUTQ, &intval); to get the outgoing queue into intval. This will be the total length kept in the queue, including "written by the app" but not yet sent. It's still the best approximation I can think of at the moment.
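On Linux the same ioctl is also exposed under the name SIOCOUTQ in <linux/sockios.h>; a minimal sketch of querying it (error handling kept minimal):

    #include <sys/ioctl.h>
    #include <linux/sockios.h>   /* SIOCOUTQ; TIOCOUTQ from <sys/ioctl.h> also works here */

    /* Sketch (Linux): report how many bytes are still sitting in the socket's
       send queue.  total_bytes_written - this value gives a rough idea of how
       much has actually left the queue. */
    int bytes_in_send_queue(int sock)
    {
        int outq = 0;
        if (ioctl(sock, SIOCOUTQ, &outq) < 0)
            return -1;
        return outq;
    }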