I'm working on a project with two clients, one for sending and the other for receiving UDP datagrams between two machines wired directly to each other.
Each datagram is 1024 bytes in size and is sent using Winsock (blocking).
They are both running on very fast (separate) machines, each with 16 GB of RAM, 8 CPUs, and RAID 0 drives.
I'm looking for tips to maximize my throughput, ideally at the Winsock level, but any other tips would be great as well.
Currently I'm getting 250-400 Mbit/s transfer speed; I'm looking for more.
Thanks.
Since I don't know what your applications do besides sending and receiving, it's difficult to know what else might be limiting throughput, but here are a few things to try. I'm assuming that you're using IPv4, and I'm not a Windows programmer.
Maximize the packet size that you are sending when you are using a reliable connection. For 100 Mbit/s Ethernet the maximum frame is 1518 bytes; Ethernet uses 18 of that, IPv4 uses 20-64 (usually 20, though), and UDP uses 8 bytes. That means that typically you should be able to send 1472 bytes of UDP payload per packet.
If you are using gigabit Ethernet equipment that supports it, your frame size increases to 9000 bytes (jumbo frames), so sending something closer to that size should speed things up.
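To make that concrete, here is a minimal sketch of a sender that fills each datagram up to the 1472-byte payload that fits a standard 1500-byte MTU. It uses plain BSD-style sockets rather than Winsock-specific calls, and the destination address, port, and loop count are placeholders, not part of the original question:

```c
/* Minimal sketch: send UDP datagrams sized to fill a standard Ethernet frame.
 * 1472 = 1500 (MTU) - 20 (IPv4 header) - 8 (UDP header); with jumbo frames
 * the same idea applies with a larger payload. DEST_IP/DEST_PORT are
 * placeholders for your receiver's address. */
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <unistd.h>

#define DEST_IP   "192.168.0.2"   /* hypothetical receiver address */
#define DEST_PORT 5000
#define PAYLOAD   1472            /* fits in one 1500-byte MTU frame */

int main(void)
{
    int s = socket(AF_INET, SOCK_DGRAM, 0);
    if (s < 0) { perror("socket"); return 1; }

    struct sockaddr_in dst;
    memset(&dst, 0, sizeof dst);
    dst.sin_family = AF_INET;
    dst.sin_port = htons(DEST_PORT);
    inet_pton(AF_INET, DEST_IP, &dst.sin_addr);

    char buf[PAYLOAD] = {0};
    for (int i = 0; i < 100000; i++) {
        if (sendto(s, buf, sizeof buf, 0,
                   (struct sockaddr *)&dst, sizeof dst) < 0) {
            perror("sendto");
            break;
        }
    }
    close(s);
    return 0;
}
```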
If you are sending any acknowledgments from your listener to your sender then try to make sure that they are sent rarely and can acknowledge more than just one packet at a time. Try to keep the listener from having to say much, and try to keep the sender from having to wait on the listener for permission to keep sending.
On the computer that the sender application lives on, consider setting up a static ARP entry for the computer that the receiver lives on. Without one, there may be a pause every few seconds while a new ARP request is made to make sure the ARP cache is up to date. Some ARP implementations issue this request well before the entry expires, which reduces the impact, but some do not.
Turn off as many users of the network as possible. If you are using an Ethernet switch, concentrate on anything that introduces traffic to or from the computers and network devices your applications use (this includes broadcast messages, like many ARP requests). If it's a hub, you may want to quiet down the entire network. Windows tends to send out a constant stream of chatter to the network which in many cases isn't useful.
There may be limits on how much of the network bandwidth one application or user can have, or on how much network bandwidth the OS will let itself use. These can probably be changed in the registry if they exist.
It is not uncommon for network interface chips to not actually support the maximum bandwidth of the network all the time. There are chips which may miss packets because they are busy handling a previous packet, as well as some which just can't send packets as close together as the Ethernet specification would allow. Additionally, the rest of the system might not be able to keep up even if the NIC can.
Some things to look at:
Connected UDP sockets shortcut several operations in the kernel, so they are faster (see Stevens' UNP book for details).
Socket send and receive buffers: play with the SO_SNDBUF and SO_RCVBUF socket options to absorb spikes and reduce packet drops (see the sketch after this list).
See if you can bump up the link MTU and use jumbo frames.
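Here is a rough sketch of the first two ideas in plain BSD-style C (on Winsock the calls are essentially the same apart from WSAStartup and the char* cast in setsockopt); the 4 MiB buffer size and the address handling are illustrative assumptions, not tuned values:

```c
/* Sketch: a connected UDP socket with enlarged send/receive buffers.
 * The buffer sizes and address are illustrative only; the kernel may
 * clamp or adjust the requested values. */
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

int make_udp_socket(const char *ip, unsigned short port)
{
    int s = socket(AF_INET, SOCK_DGRAM, 0);
    if (s < 0) { perror("socket"); return -1; }

    int bufsize = 4 * 1024 * 1024;  /* 4 MiB, adjust to taste */
    setsockopt(s, SOL_SOCKET, SO_SNDBUF, &bufsize, sizeof bufsize);
    setsockopt(s, SOL_SOCKET, SO_RCVBUF, &bufsize, sizeof bufsize);

    struct sockaddr_in peer;
    memset(&peer, 0, sizeof peer);
    peer.sin_family = AF_INET;
    peer.sin_port = htons(port);
    inet_pton(AF_INET, ip, &peer.sin_addr);

    /* connect() on a UDP socket fixes the peer address, so the kernel can
     * skip per-packet destination work and you can use send()/recv(). */
    if (connect(s, (struct sockaddr *)&peer, sizeof peer) < 0) {
        perror("connect");
        return -1;
    }
    return s;
}
```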
Use a 1 Gbps network and upgrade your network hardware...
Test the packet limit of your hardware with an already proven piece of code such as iperf:
http://www.noc.ucf.edu/Tools/Iperf/
I'm linking a Windows build; it might be a good idea to boot off a Linux LiveCD and try a Linux build to compare IP stacks.
More likely your NIC isn't performing well; try an Intel Gigabit Server Adapter:
http://www.intel.com/network/connectivity/products/server_adapters.htm
For TCP connections it has been shown that using multiple parallel connections will better utilize the data connection. I'm not sure if that applies to UDP, but it might help with some of the latency issues of packet processing.
So you might want to try multiple threads of blocking calls.
As well as Nikolai's suggestion of send and receive buffers, if you can, switch to overlapped I/O and keep many receives pending; this also helps to minimise the number of datagrams that are dropped by the stack due to lack of buffer space.
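As a rough illustration only, here is a sketch of keeping several overlapped WSARecvFrom calls pending on a UDP socket so the stack always has application buffers to complete into. The number of pending receives, the event-based completion, and the omission of WSAStartup/cleanup and full error handling are all simplifications:

```c
/* Sketch: keeping several overlapped receives pending on a Winsock UDP
 * socket so incoming datagrams land directly in application buffers. */
#include <winsock2.h>
#include <stdio.h>
#include <string.h>

#define NUM_PENDING 16
#define DGRAM_SIZE  1024

struct pending_recv {
    WSAOVERLAPPED      ov;
    WSABUF             wsabuf;
    char               buf[DGRAM_SIZE];
    struct sockaddr_in from;
    int                fromlen;
    DWORD              flags;
};

static void post_recv(SOCKET s, struct pending_recv *p)
{
    DWORD bytes = 0;
    p->wsabuf.buf = p->buf;
    p->wsabuf.len = DGRAM_SIZE;
    p->fromlen = sizeof p->from;
    p->flags = 0;
    /* Completion is signalled through p->ov (event or completion port). */
    if (WSARecvFrom(s, &p->wsabuf, 1, &bytes, &p->flags,
                    (struct sockaddr *)&p->from, &p->fromlen,
                    &p->ov, NULL) == SOCKET_ERROR &&
        WSAGetLastError() != WSA_IO_PENDING) {
        fprintf(stderr, "WSARecvFrom failed: %d\n", WSAGetLastError());
    }
}

/* At startup, post all receives; then wait on the events (or a completion
 * port), process each completed datagram, and immediately re-post it. */
void start_receiving(SOCKET s, struct pending_recv *slots /* NUM_PENDING */)
{
    for (int i = 0; i < NUM_PENDING; i++) {
        memset(&slots[i].ov, 0, sizeof slots[i].ov);
        slots[i].ov.hEvent = WSACreateEvent();
        post_recv(s, &slots[i]);
    }
}
```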
If you're looking for reliable data transfer, consider UDT.
Consider the prototypical multiplayer game server.
Clients connecting to the server are allowed to download maps and scripts. It is straightforward to create a TCP connection to accomplish this.
However, the server must continue to be responsive to the rest of the clients via UDP. If TCP download connections are allowed to saturate available bandwidth, UDP traffic will suffer severely from packet loss.
What might be the best way to deal with this issue? It definitely seems like a good idea to "throttle" the TCP upload connection somehow by keeping track of time and calling send() on a regular interval. That way, if UDP packet loss starts to occur more frequently, the TCP connections can be throttled further. Will the OS still tend to bunch the data together rather than sending it off in a steady stream? How often would I want to be calling send()? I imagine doing it too often would cause the data to be buffered together first, rendering the method ineffective, and doing it too infrequently would provide insufficient (and inefficient use of) bandwidth. Similar considerations exist with regard to how much data to send each time.
It sounds a lot like you're solving a problem the wrong way:
If you're worried about losing UDP packets, you should consider not using UDP.
If you're worried about sharing bandwidth between two functions, you should consider having separate pipes (bandwidth) for them.
Traffic shaping (which is what this sounds like) is typically addressed in the OS. You should look in that direction before making strange changes to your application.
If you haven't already gotten the application working and experienced this problem, you are probably prematurely optimizing.
To avoid saturating the bandwidth, you need to apply some sort of rate limiting. TCP actually already does this, but it might not be effective in some cases. For example, it has no idea whether you consider the TCP or UDP traffic to be the more important.
To implement any form of rate limiting involving UDP, you will first need to calculate UDP loss rate. UDP packets will need to have sequence numbers, and then the client has to count how many unique packets it actually got, and send this information back to the server. This gives you the packet loss rate. The server should monitor this, and if packet loss jumps after a file transfer is started, start lowering the transfer rate until the packet loss becomes acceptable. (You will probably need to do this for UDP anyway, since UDP has no congestion control.)
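A sketch of what that bookkeeping might look like; the wire format (a 4-byte sequence number in front of each datagram) and the report interval are just one possible choice, not something prescribed by the answer above:

```c
/* Sketch: measuring UDP loss with sequence numbers. The sender stamps each
 * datagram with a 32-bit sequence number; the receiver counts unique packets
 * and periodically reports the loss rate back so the sender can slow down. */
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

/* Sender side: prepend a sequence number to each datagram. */
size_t build_packet(uint32_t seq, const char *data, size_t len, char *out)
{
    uint32_t net_seq = htonl(seq);
    memcpy(out, &net_seq, 4);
    memcpy(out + 4, data, len);
    return len + 4;
}

/* Receiver side: track how many datagrams arrived vs. how many were sent. */
struct loss_tracker {
    uint32_t highest_seq;   /* highest sequence number seen so far */
    uint32_t received;      /* unique datagrams actually received */
};

void on_packet(struct loss_tracker *t, const char *pkt)
{
    uint32_t seq;
    memcpy(&seq, pkt, 4);
    seq = ntohl(seq);
    if (seq > t->highest_seq)
        t->highest_seq = seq;
    t->received++;
}

/* Loss rate = (expected - received) / expected, reported back to the sender,
 * which lowers its send rate while the rate stays above some threshold. */
double loss_rate(const struct loss_tracker *t)
{
    uint32_t expected = t->highest_seq + 1;   /* sequence numbers start at 0 */
    if (expected == 0) return 0.0;
    return (double)(expected - t->received) / (double)expected;
}
```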
Note that while I mention "server" above, it could really be done in either direction, or both, depending on who needs to send what. Imagine a game with player-created maps that transfers those maps over peer-to-peer connections.
While lowering the transfer rate can be as simple as calling your send function less frequently, attempting to control TCP this way will no doubt conflict with the existing rate control TCP has. As suggested in another answer, you might consider looking into more comprehensive ways to control TCP.
In this particular case, I doubt it would be an issue, unless you really need to send lots of UDP information while the clients are transferring files.
I would expect most games to just show a loading screen or a lobby while this is happening. Neither should require much UDP traffic unless your game has its own VoIP.
Here is an excellent article series that explains some of the possible uses of both TCP and UDP, specifically in the context of network games. TCP vs. UDP
In a later article from the series, he even explains a way to make UDP 'almost' as reliable as TCP (with code examples).
And as always: measure your results. You have no way of knowing if your code is making the connections faster or slower unless you measure.
"# If you're worried about losing UDP packets, you should consider not using UDP."
Right on. UDP means no guarantee of packet delivery, especially over the internet. Check TCP's speed; it is quite acceptable on modern internet connections for most users playing games.
The implicit question is: If Linux blocks the send() call when the socket's send buffer is full, why should there be any lost packets?
More details:
I wrote a little utility in C to send UDP packets as fast as possible to a unicast address and port. I send a UDP payload of 1450 bytes each time, and the first bytes are a counter which increments by 1 for every packet. I run it on Fedora 20 inside VirtualBox on a desktop PC with a 1 Gb NIC (= quite slow).
Then I wrote a little utility to read UDP packets from a given port which checks the packet's counter against its own counter and prints a message if they are different (i.e. one or more packets have been lost). I run it on a Fedora 20 dual-Xeon server with a 1 Gb Ethernet NIC (= super fast). It does show many lost packets.
Both machines are on a local network. I don't know exactly the number of hops between them, but I don't think there are more than 2 routers between them.
Things I tried:
Add a delay after each send(). If I set a delay of 1ms, then no packets are lost any more. A delay of 100us will start losing packets.
Increase the receiving socket buffer size to 4MiB using setsockopt(). That does not make any difference...
Please enlighten me!
For UDP the SO_SNDBUF socket option only limits the size of the datagram you can send. There is no explicit send-buffer throttling as with TCP. There is, of course, in-kernel queuing of frames to the network card.
In other words, send(2) might drop your datagram without returning an error (check out description of ENOBUFS at the bottom of the manual page).
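When send(2) does report an error, ENOBUFS is worth handling explicitly; here is a sketch of one way to back off when the local queues are full (the 1 ms sleep is an arbitrary choice, and none of this prevents silent drops further down the path):

```c
/* Sketch: treating ENOBUFS from send(2) as a signal to back off briefly
 * instead of losing the datagram. This only covers the local-queue-full
 * case; the packet can still be dropped elsewhere on the path. */
#include <errno.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/socket.h>

ssize_t send_with_backoff(int sock, const void *buf, size_t len)
{
    for (;;) {
        ssize_t n = send(sock, buf, len, 0);
        if (n >= 0)
            return n;                      /* accepted by the kernel */
        if (errno == ENOBUFS || errno == EAGAIN) {
            usleep(1000);                  /* local queues full: wait 1 ms */
            continue;
        }
        perror("send");                    /* some other, real error */
        return -1;
    }
}
```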
Then the packet might be dropped pretty much anywhere on the path:
the sending network card does not have free hardware resources to service the request, so the frame is discarded;
an intermediate routing device has no available buffer space or implements some congestion-avoidance algorithm and drops the packet;
the receiving network card cannot accept Ethernet frames at the given rate, so some frames are just ignored;
the reader application does not have enough socket receive buffer space to accommodate traffic spikes, so the kernel drops datagrams.
From what you said though, it sounds very probable that the VM is not able to send the packets at a high rate. Sniff the wire with tcpdump(1) or wireshark(1) as close to the source as possible, and check your sequence numbers - that would tell you if it's the sender that is to blame.
Even if send() blocks when the send buffer is full (provided that you didn't set SOCK_NONBLOCK on the socket to put it in non-blocking mode) the receiver must still be fast enough to handle all incoming packets. If the receiver or any intermediate system is slower than the sender, packets will get lost when using UDP. Note that slower does not only apply to the speed of the network interface but to the whole network stack plus the userspace application.
In your case it is quite possible that the receiver is receiving all packets but can't handle them fast enough in userspace. You can check that by recording and analyzing your traffic via tcpdump or Wireshark.
If you don't want to lose packets, then switch to TCP.
Either of the two routers you mentioned might drop packets if there is an overload,
and the receiving PC might drop or miss packets as well under certain circumstances, such as overload.
As one of the above posters said, UDP is a simple datagram protocol that does not guarantee delivery, whether because of the local machine, equipment on the network, etc. That is the reason why many developers will recommend switching to TCP if you want reliability. However, if you really want to stick with UDP, and there are many valid reasons to do that, you will need to find a library that helps you guarantee delivery. Look at SS7 projects, especially in telephony APIs where UDP is used to transmit voice, data, and signalling information. For your particular app, may I suggest the ENet UDP library: http://enet.bespin.org/
Hardware:
derived from Sequoia-Platform (AMCC)
Using AMCC PowerPC 440EPx and
Marvell 88E1111 Ethernet PHY, 256 M DDR2 RAM
Linux version 2.6.24.2
I transmit data via a UDP socket at roughly 60 MB per second in a Linux application (C language). Sometimes my PC test program notices a lost packet, because all packets are numbered (GigE Vision Stream Channel protocol). I know that UDP is unreliable, but because I have clean lab conditions and it is always the same (last) packet that is lost, I think it must be a systematic error somewhere in my code.
I have been trying to find the cause of the missing packet for over a week, but I can't find it.
The symptoms:
Using jumbo frames: packet size 8 KB
It is always the same (last) packet that is lost
The error is rare (after some hours and thousands of transferred images)
The error rate is higher after connecting or reconnecting the device to the NIC (after auto-negotiation)
I tried:
Using another NIC
Checking my code: the return values and error handling of all functions
Logging the outgoing packets on my device
Viewing the packets with Wireshark and checking them against the packets logged on the device
How can I solve the problem?
I know it is difficult because there are so many possible causes of failure.
Questions:
Are there any known bugs in the Linux 2.6.24 Ethernet driver stack (especially after auto-negotiation) which were fixed in later versions?
Should I set special options on my transfer socket? (sock = socket(AF_INET, SOCK_DGRAM, 0);)
Should I recreate the socket after auto-negotiation?
Should I enable any diagnostic messages in the Linux kernel to find out what is going wrong?
Are there any other recommendations?
I have seen similar problems on an application I once developed where one side of the connection was Windows and the other side was an embedded real-time OS (not Linux), and the only thing in between was a Cat5 Ethernet cable. Indeed, I found that a certain flurry of UDP messages would almost always cause one of the messages to be lost, and it was always the same message. Very strange, and after a lot of time with Wireshark and other network tools I finally decided that it could only be the fact that UDP was unreliable.
My advice is to switch to TCP and build a small message framer:
http://blog.chrisd.info/tcp-message-framing/
I find TCP to be very reliable, and it can also be very fast if the traffic is "stream-like" (meaning mostly unidirectional). Additionally, building a message framer on top of TCP is much easier than building TCP on top of UDP.
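For reference, here is a sketch of simple length-prefixed framing in C terms. The 4-byte big-endian header and the maximum message size are illustrative choices (not taken from the linked article), and real code would also loop on partial sends:

```c
/* Sketch: length-prefixed message framing over a TCP stream. Each message
 * is a 4-byte big-endian length followed by the payload. */
#include <stdint.h>
#include <arpa/inet.h>
#include <sys/socket.h>

#define MAX_MSG 65536

/* Read exactly n bytes, since TCP may return data in arbitrary chunks. */
static int read_exact(int sock, void *buf, size_t n)
{
    char *p = buf;
    while (n > 0) {
        ssize_t r = recv(sock, p, n, 0);
        if (r <= 0) return -1;      /* error or connection closed */
        p += r;
        n -= (size_t)r;
    }
    return 0;
}

int send_message(int sock, const void *msg, uint32_t len)
{
    uint32_t hdr = htonl(len);
    if (send(sock, &hdr, sizeof hdr, 0) != sizeof hdr) return -1;
    if (send(sock, msg, len, 0) != (ssize_t)len) return -1;
    return 0;
}

/* Returns the message length, or -1 on error. */
int recv_message(int sock, char *out /* MAX_MSG bytes */)
{
    uint32_t hdr;
    if (read_exact(sock, &hdr, sizeof hdr) < 0) return -1;
    uint32_t len = ntohl(hdr);
    if (len > MAX_MSG) return -1;   /* refuse absurd lengths */
    if (read_exact(sock, out, len) < 0) return -1;
    return (int)len;
}
```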
It might be possible that it's the camera's fault - surprisingly, even on the camera side, GigE Vision can be more complicated and less deterministic than competing technologies like Camera Link. In particular, I've seen my own strange problems and have been told by manufacturers that many of the cameras have some known buffering issues, particularly when running at their higher resolutions/framerates.
Check your camera's firmware with the vendor and see if there's an update to address this.
Alternatively, perhaps you have some delay between the recvmsg for the last packet and the previous recvmsg, such as processing the frame data before receiving the end-of-frame GVSP packet?
Additional recommendation: make sure no switch or other networking equipment is in the middle between the system and the camera - use a direct Cat-6e cable.
I'm trying to learn UDP and make a simple file-transfer server and client.
I know TCP would potentially be better, because it has some reliability built in. However I would like to implement some basic reliability code myself.
I've decided to try and identify when packets are lost, and resend them.
What I've implemented is a system where the server will send the client a certain file in 10 byte chunks. After it sends each chunk, it waits for an acknowledgement. If it doesn't receive one in a few seconds time, it sends the chunk again.
My question is: how can a file transfer like this be done quickly? If you send a file and let's say there's a 25% chance a packet could be lost, then a lot of time will be spent waiting for ACKs.
Is there some way around this? Or is it accepted that with high packet loss, it will take a very long time? What's an accepted timeout value for the acknowledgement?
Thanks!
There are many questions in your post; I will try to address some. The main thing is to benchmark and find the bottleneck. What is the slowest operation?
I can tell you now that the bottleneck in your approach is waiting for an ACK after each chunk. Instead of acknowledging chunks, you want to acknowledge sequences. The second biggest problem is the ridiculously small chunk size. At that size there's more overhead than actual data: an IPv4 header is 20 bytes and a UDP header is 8, so each 10-byte chunk carries 28 bytes of headers.
In conclusion:
What I've implemented is a system where the server will send the client a certain file in 10 byte chunks.
You might want to try chunks of a few hundred bytes.
After it sends each chunk, it waits for an acknowledgement.
Send more chunks before requiring an acknowledgement, and label them. There is more than one way (a sketch follows the list below):
Instead of acknowledging chunks, acknowledge data: "I've received 5000 bytes" (TCP, traditional)
Acknowledge multiple chunks in one message. "I've received chunks 1, 5, 7, 9" (TCP with SACK)
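Here is a sketch of the windowed approach, with numbered chunks and one cumulative acknowledgement per window. The chunk size, window size, and wire format are arbitrary choices for illustration, and retransmission timers are omitted:

```c
/* Sketch: send a window of numbered chunks before waiting for one
 * cumulative acknowledgement, instead of stop-and-wait per chunk. */
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>
#include <sys/socket.h>

#define CHUNK_SIZE  512
#define WINDOW_SIZE 32

struct chunk_hdr {
    uint32_t seq;        /* chunk number, network byte order */
};

/* Send up to WINDOW_SIZE chunks starting at 'base', then wait for an ACK
 * that carries the highest chunk number received in order. */
uint32_t send_window(int sock, const char *file, size_t filelen, uint32_t base)
{
    char pkt[sizeof(struct chunk_hdr) + CHUNK_SIZE];

    for (uint32_t i = 0; i < WINDOW_SIZE; i++) {
        size_t off = (size_t)(base + i) * CHUNK_SIZE;
        if (off >= filelen) break;
        size_t n = filelen - off < CHUNK_SIZE ? filelen - off : CHUNK_SIZE;

        struct chunk_hdr hdr = { htonl(base + i) };
        memcpy(pkt, &hdr, sizeof hdr);
        memcpy(pkt + sizeof hdr, file + off, n);
        send(sock, pkt, sizeof hdr + n, 0);   /* connected UDP socket */
    }

    /* Block for one cumulative ACK: "I have everything up to chunk X". */
    uint32_t ack = 0;
    if (recv(sock, &ack, sizeof ack, 0) == sizeof ack)
        return ntohl(ack) + 1;   /* next window starts after the ACKed chunk */
    return base;                 /* no ACK: resend the same window */
}
```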
What you've implemented is stop-and-wait ARQ. In a high-latency network, it will inevitably be slower than some other, more complex options, because it waits for a full round trip on each transmission.
For other possibilities, see Sliding Window and follow links to other variants. What you've got is basically a degenerate form of sliding window with window-size 1.
As other answers have noted, this will involve adding sequence numbers to your packets, sending additional packets while waiting for acknowledgement, and retransmitting on a more complex pattern.
If you do this, you are essentially reinventing TCP, which uses these tactics to supply a reliable connection.
You want some kind of packet numbering, so that the client can detect a lost packet by the missing number in the sequence of received packets. Then the client can request a resend of the packets it knows it is missing.
Example:
Server sends packet 1,2,3,4,5 to client. Client receives 1,4,5, so it knows 2 and 3 were lost. So client acks 1,4 and 5 and requests resend of 2 and 3.
Then you still need to work out how to handle acks / requests for resends, etc. In any case, assigning a sequence of consecutive numbers to the packets so that packet loss can be detected by "gaps" in the sequence is a decent approach to this problem.
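A sketch of the receiver-side bookkeeping for detecting such gaps; the bitmap size and the immediate resend requests are simplifications (a real implementation would tolerate some reordering before declaring a packet lost and would avoid re-requesting the same chunk repeatedly):

```c
/* Sketch: receiver-side gap detection. Track which chunk numbers have been
 * seen; when a packet arrives beyond the next expected number, the numbers
 * in between are candidates for a resend request. */
#include <stdint.h>
#include <stdio.h>

#define MAX_CHUNKS 65536

struct gap_detector {
    uint8_t  seen[MAX_CHUNKS];  /* 1 if chunk already received */
    uint32_t next_expected;     /* lowest chunk number not yet received */
};

/* Call for every received chunk number; prints the gaps it discovers. */
void on_chunk(struct gap_detector *g, uint32_t seq)
{
    if (seq >= MAX_CHUNKS || g->seen[seq])
        return;                          /* out of range or duplicate */
    g->seen[seq] = 1;

    if (seq > g->next_expected) {
        for (uint32_t m = g->next_expected; m < seq; m++)
            if (!g->seen[m])
                printf("chunk %u missing, request resend\n", m);
    }
    /* Advance past everything received contiguously so far. */
    while (g->next_expected < MAX_CHUNKS && g->seen[g->next_expected])
        g->next_expected++;
}
```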
Your question exactly describes one of the problems that TCP tries to answer. TCP's answer is particularly elegant and parsimonious, imo, so reading an English-language description of TCP might reward you.
Just to give you a ballpark idea of UDP in the real world: SNMP is a network-management protocol that is meant to operate over UDP. SNMP requests (around 1500 payload bytes) sent by a manager to a managed node are never explicitly acknowledged, and it works pretty well. Twenty-five percent packet loss is a huge number -- real-life packet loss is an order of magnitude smaller, at worst -- and in that broken an environment SNMP would hardly work at all. Certainly a human being operating the network management system -- the NMS -- would be on the phone to network hardware support very quickly.
When we use SNMP, we generally understand that a good value for timeout is three or four seconds, meaning that the SNMP agent in the managed network node will probably have completed its work in that time.
HTH
Have a look at the TFTP protocol. It is a UDP-based file transfer protocol with built-in ack/resend provisions.
We've implemented an audio-video collaboration application on top of Silverlight, and are trying to tune it. One of the issues we're experiencing is an increase in stream latency whenever a packet is dropped: we have to wait for the packet loss to be detected, requested, and then for the lost packet to be resent. Of course, this plays hell with the consistency of our audio stream. (We'd switch over to UDP if we could, but Silverlight doesn't support that in-browser. We've also disabled the Nagle algorithm, so in general, as soon as we submit a byte[] array to be transmitted, it's transmitted, and in a single packet. I'm aware that TCP packet size != amount of data submitted, but with the Nagle algorithm disabled, it's close. And we have an adaptive jitter buffer, so we can deal with lost packets, but a lost packet over TCP/IP massively increases the amount of audio we need to buffer, and hence latency.)
So we're trying to optimize how we send our packets, to see if there's any way to reduce the impact of dropped packets. We currently have several competing solutions that we're thinking about implementing:
(1) We could try to make our packets larger. Currently, we send a mix of large (~1024 byte video) packets and small (~70 byte audio) packets over the same TCP stream. But we could multiplex the audio and video data together, i.e., by attaching some of our video data to our audio packets whenever there's room. This would make the individual packets somewhat larger, but would cut down on the total number of packets.
(2) We could split the audio and video into two separate TCP streams. This means that if the video stream stalled because of a lost packet, the audio stream wouldn't stall, and vice versa. Of course, it would slightly increase the overhead, and wouldn't cut down on the overall number of packets sent.
(3) We could inverse multiplex the audio into multiple separate TCP streams, and then reassemble them on the far side. This would effectively allow us to "fake" a single UDP style of packet delivery. If we had 8 audio streams, and one of them stalled because of a lost packet, the other streams would still be able to deliver their data on time, and all we'd have to do is deal with 1/8 of the audio packets being unavailable until the stalled stream caught back up. That's not ideal, of course, but it might result in a better experience than to have the entire stream stall, and not being able to play any packets until the lost packet is retransmitted.
Any thoughts on any of these possibilities? Any other suggestions? Or do we just need to code up all three, and then test them?
If you re-enabled the Nagle algorithm you would (i) let TCP send out maximally-sized buffers according to the path MTU rather than your own determination; (ii) accomplish your suggestion (1) of piggybacking audio and video packets; and (iii) reduce the total number of packets. The steady-state performance of a saturated TCP connection with and without the Nagle algorithm is the same so you don't lose anything except during initial window filling.
You should also run the largest socket send buffer you can possibly afford: at least 128k, or double or quadruple that if possible; and you should also use as large a socket receive buffer as possible, although a socket receive buffer > 64k has to be set before connecting the socket so the other end can be told about window scaling during the TCP handshake.
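In BSD-socket terms (Silverlight's API differs, so treat this purely as an illustration of the ordering, with illustrative sizes), that means setting the receive buffer before connect():

```c
/* Sketch: a receive buffer larger than 64 KB must be set *before* connect()
 * so that window scaling can be negotiated during the TCP handshake; the
 * send buffer can be raised at any time. */
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

int connect_with_big_buffers(const char *ip, unsigned short port)
{
    int s = socket(AF_INET, SOCK_STREAM, 0);
    if (s < 0) return -1;

    int rcvbuf = 256 * 1024;   /* > 64 KB, so it must precede connect() */
    int sndbuf = 256 * 1024;
    setsockopt(s, SOL_SOCKET, SO_RCVBUF, &rcvbuf, sizeof rcvbuf);
    setsockopt(s, SOL_SOCKET, SO_SNDBUF, &sndbuf, sizeof sndbuf);

    struct sockaddr_in peer;
    memset(&peer, 0, sizeof peer);
    peer.sin_family = AF_INET;
    peer.sin_port = htons(port);
    inet_pton(AF_INET, ip, &peer.sin_addr);

    if (connect(s, (struct sockaddr *)&peer, sizeof peer) < 0)
        return -1;
    return s;
}
```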
Is this application to be used over the Internet? Is the reason for the lost packets due to Internet quality? If so, beyond developing the application to be as fault tolerant as possible, you may also want to make sure the Internet circuits are of acceptable quality. Good Internet circuits today should not have any more than 0.1% packet loss. You can test Internet circuits and ISPs using our Packet Loss tool. It's free to use so help yourself.
How have you determined that it is packet loss that is causing the stalls?
I don't think separating the streams will help much, you'll just have more problems trying to keep the audio / video in sync.
Either way, no matter what tweaks you use, you will be limited by TCP/IP requiring the packet to be retransmitted. I guess the biggest thing I would look into is whether the TCP stacks on your server and clients have some of the more advanced options enabled. I'm specifically referring to selective acknowledgements and fast retransmissions (any modern OS should have these by default). Fast retransmission will have the client ask for a missing packet very quickly when it's detected missing, and selective acknowledgements will have the server retransmit only the missing portions of the stream.
Ultimately though, it sounds as if you're not using a large enough jitter buffer if you're unable to tolerate a single lost packet. It's also possible your application isn't all that consistent in the timing used to send data to the TCP stack. I'd get some packet captures, try to get a good idea of what's going on in the network, and see what you can do from there.
I second #Kevin Nisbet on the buffer (unfortunately). If you're using TCP instead of UDP, the buffer needs to be as large as it takes for the server to get notified about the missing bytes and for them to reach the client.
Since TCP delivers data to the application as an ordered stream, when a packet gets lost the stack cannot deliver any additional bytes to the app until the ack reporting the missing bytes is sent to the server and processed, and the missing bytes arrive at the client.
Meanwhile, the only thing keeping your app running is the buffer. Do you know how long the round trip takes, including processing?
Without selective ack, anything received after that lost byte is useless and needs to be retransmitted. The client will ack the last byte received, and the server needs to retransmit everything from that point on.
With Selective Ack at least the server only needs to send the missing chunk, but the stack needs to wait for the chunk to arrive nonetheless. It can't give the data it has received so far to the app and then fill in the missing parts. That's what UDP does :)
Maybe you should write to MS...
Coming from the network side, I cringe (a bit) about sending multiple copies of the same content. Is bandwidth plentiful in your application? Maybe some sort of redundancy (FEC or similar) is better than duplicating your content.
Besides, if packet loss could be happening I don't think it would be wise to shove more traffic on the network. Is your computer running half-duplex? :-D