When during the socket lifetime should I set the TCP_QUICKACK option?

I know why I should use it, but I'm not sure where to put the setsockopt in my socket code.
It is clear to me that it can be modified by the inner mechanisms of the socket api, but when exactly should I set the TCP_QUICKACK option with setsockopt?
Should I set it at socket creation and then again after (or before?) each receive and send? Or only after receives?
Should I check that the option is already set?

When should I set the TCP_QUICKACK option?
The IETF offers TCP Tuning for HTTP, draft-stenberg-httpbis-tcp-03. Section 4.4 of the document explains:
Delayed ACK [RFC1122] is a mechanism enabled in most TCP stacks that
causes the stack to delay sending acknowledgement packets in response
to data. The ACK is delayed up until a certain threshold, or until
the peer has some data to send, in which case the ACK will be sent
along with that data. Depending on the traffic flow and TCP stack
this delay can be as long as 500ms.
This interacts poorly with peers that have Nagle's Algorithm enabled.
Because Nagle's Algorithm delays sending until either one MSS of data
is provided or until an ACK is received for all sent data, delaying
ACKs can force Nagle's Algorithm to buffer packets when it doesn't
need to (that is, when the other peer has already processed the
outstanding data).
Delayed ACKs can be useful in situations where it is reasonable to
assume that a data packet will almost immediately (within 500ms) cause
data to be sent in the other direction. In general in both HTTP/1.1
and HTTP/2 this is unlikely: therefore, disabling Delayed ACKs can
provide an improvement in latency.
However, the TLS handshake is a clear exception to this case. For the
duration of the TLS handshake it is likely to be useful to keep
Delayed ACKs enabled.
Additionally, for low-latency servers that can guarantee responses to
requests within 500ms, on long-running connections (such as HTTP/2),
and when requests are small enough to fit within a small packet,
leaving delayed ACKs turned on may provide minor performance benefits.
Effective use of switching off delayed ACKs requires extensive
profiling.
Later in the document it offers the following:
On recent Linux kernels (since Linux 2.4.4), Delayed ACKs can be
disabled like this:
int one = 1;
setsockopt(fd, IPPROTO_TCP, TCP_QUICKACK, &one, sizeof(one));
Unlike disabling Nagle’s Algorithm, disabling Delayed ACKs on Linux is
not a one-time operation: processing within the TCP stack can cause
Delayed ACKs to be re-enabled. As a result, to use TCP_QUICKACK
effectively requires setting and unsetting the socket option during
the life of the connection.
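In practice, that means re-arming the option around your reads rather than setting it once at socket creation. Here is a minimal Linux-only sketch, assuming fd is an already-connected TCP socket (the buffer size and error handling are illustrative, not from the draft):

#include <stdio.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

/* Re-set TCP_QUICKACK after every recv(), because normal TCP processing
   inside the kernel can silently fall back to delayed ACKs. */
static void recv_with_quickack(int fd)
{
    char buf[4096];
    ssize_t n;
    while ((n = recv(fd, buf, sizeof(buf), 0)) > 0) {
        int one = 1;
        if (setsockopt(fd, IPPROTO_TCP, TCP_QUICKACK, &one, sizeof(one)) < 0)
            perror("setsockopt(TCP_QUICKACK)");
        /* ... handle n bytes of buf here ... */
    }
}

A common approach is simply to re-set it after every receive, as above; checking first whether it is still set buys you nothing, since setting it again is cheap.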

Related

How to cancel transmission of messages of a particular protocol?

I want to cancel the transmission of messages that use the DATA protocol. How can I use ClearReq to cancel the transmission of messages that use the DATA protocol, but not other messages that use a different protocol?
ClearReq is supported by some agents (e.g. PHY) to stop any ongoing transmissions/receptions at the next safe opportunity. However, if the transmission was due to a higher-level protocol (e.g. a reliable DatagramReq), that protocol may initiate a re-transmission down the road.
DatagramCancelReq is supported by many agents that implement the DATAGRAM service. When supported, this request cancels a specific previous DatagramReq (if the id of the request is given), or all ongoing datagram transmissions by that agent (if no id is specified).

In linux, why do I lose UDP packets if I call send() as fast as possible?

The implicit question is: If Linux blocks the send() call when the socket's send buffer is full, why should there be any lost packets?
More details:
I wrote a little utility in C to send UDP packets as fast as possible to a unicast address and port. I send a UDP payload of 1450 bytes each time, and the first bytes are a counter which increments by 1 for every packet. I run it on Fedora 20 inside VirtualBox on a desktop PC with a 1 Gb NIC (= quite slow).
Then I wrote a little utility to read UDP packets from a given port, which checks the packet's counter against its own counter and prints a message if they are different (i.e. 1 or more packets have been lost). I run it on a Fedora 20 dual-Xeon server with a 1 Gb Ethernet NIC (= super fast). It does show many lost packets.
Both machines are on a local network. I don't know exactly the number of hops between them, but I don't think there are more than 2 routers between them.
Things I tried:
Add a delay after each send(). With a delay of 1 ms, no packets are lost any more; with a delay of 100 µs, packets start being lost.
Increase the receiving socket buffer size to 4 MiB using setsockopt(). That does not make any difference...
Please enlighten me!
For UDP, the SO_SNDBUF socket option only limits the size of the datagram you can send. There is no explicit throttling via the send socket buffer as with TCP. There is, of course, in-kernel queuing of frames to the network card.
In other words, send(2) might drop your datagram without returning an error (check the description of ENOBUFS at the bottom of the manual page).
Then the packet might be dropped pretty much anywhere on the path:
the sending network card does not have free hardware resources to service the request, so the frame is discarded;
an intermediate routing device has no available buffer space or implements some congestion-avoidance algorithm, and drops the packet;
the receiving network card cannot accept Ethernet frames at the given rate, so some frames are simply ignored;
the reader application does not have enough socket receive buffer space to accommodate traffic spikes, so the kernel drops datagrams.
From what you said though, it sounds very probable that the VM is not able to send the packets at a high rate. Sniff the wire with tcpdump(1) or wireshark(1) as close to the source as possible, and check your sequence numbers - that would tell you if it's the sender that is to blame.
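If you want to confirm whether the sender's stack is at least reporting local drops, it is cheap to check the return value of send(). A sketch under the assumption that fd is a connected UDP socket (note that a successful return still does not prove the datagram made it past the local queue):

#include <errno.h>
#include <stdio.h>
#include <sys/socket.h>

/* Returns 0 if the kernel accepted the datagram, -1 otherwise.
   ENOBUFS means the local stack dropped it; depending on the setup the
   kernel may also drop silently, so this check is necessary but not
   sufficient evidence of delivery. */
static int send_datagram(int fd, const void *payload, size_t len)
{
    if (send(fd, payload, len, 0) < 0) {
        if (errno == ENOBUFS)
            fprintf(stderr, "local queue full, datagram dropped\n");
        else
            perror("send");
        return -1;
    }
    return 0;
}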
Even if send() blocks when the send buffer is full (provided that you didn't set SOCK_NONBLOCK on the socket to put it in non-blocking mode) the receiver must still be fast enough to handle all incoming packets. If the receiver or any intermediate system is slower than the sender, packets will get lost when using UDP. Note that slower does not only apply to the speed of the network interface but to the whole network stack plus the userspace application.
In your case it is quite possible that the receiver is receiving all packets but can't handle them fast enough in userspace. You can check that by recording and analyzing your traffic via tcpdump or wireshark.
If you don't want to lose packets, then switch to TCP.
Either of the two routers you mentioned might drop packets if there is an overload, and the receiving PC might drop or miss packets as well under certain circumstances, such as overload.
As one of the other answers said, UDP is a simple datagram protocol that does not guarantee delivery; packets can be lost because of the local machine, equipment on the network, etc. That is the reason why many developers will recommend switching to TCP if you want reliability. However, if you really want to stick with UDP, and there are many valid reasons to do that, you will need to find a library that helps you guarantee delivery. Look for SS7 projects, especially in telephony APIs where UDP is used to transmit voice, data, and signalling information. For your single-purpose app, may I suggest the ENet UDP library: http://enet.bespin.org/

Two TCP/IP socket send() requests were actually handled in one TCP Message

I had two send()s in my C program, and looking at Wireshark I realized they were sent out as one TCP/IP segment. I am assuming this is some sort of TCP/IP optimization that determined they were small enough to be sent out together. However, I am rebuilding an old program from scratch, and I am building my tool based on its TCP/IP traffic: MTU limitations, internal protocol design, etc. So if the old tool sends out two separate messages, I need to send out two separate messages.
So does anyone know what specifically it is doing in the background (besides simple optimization), and whether there is a flag or something that needs to be enabled/disabled so that I get a 1-to-1 ratio of C send()s and TCP/IP transmissions? For now, all I can do to keep them separated is to put a sleep(1) after every send().
Thanks.
You can set TCP_NODELAY in setsockopt to disable Nagle's algorithm, to prevent your OS from combining small packets. However, it's important for you to realize that TCP is a stream-oriented protocol, and individual "packets" are not intended to be meaningfully separated. Routers along the way are free to combine or split TCP packets (though this is uncommon, due to the extra processing required), and the receiving OS will not necessarily read exactly one sent packet per recv(). If you want to delineate packets of information in TCP, you'll need to use a header structure to report how many of the following bytes belong to that packet.
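A sketch of both points follows: disabling Nagle with TCP_NODELAY, and framing each application message with a 4-byte big-endian length prefix so the receiver can recover message boundaries no matter how the stream is segmented. The framing convention is an illustrative assumption, not necessarily what the old tool used:

#include <arpa/inet.h>      /* htonl */
#include <netinet/in.h>
#include <netinet/tcp.h>    /* TCP_NODELAY */
#include <stdint.h>
#include <sys/socket.h>
#include <sys/uio.h>        /* writev */

/* Ask the kernel not to coalesce small writes (disables Nagle's algorithm). */
static void disable_nagle(int fd)
{
    int one = 1;
    setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &one, sizeof(one));
}

/* Send one message as a 4-byte big-endian length followed by the payload,
   using a single writev() so header and body leave in one call.
   The receiver reads 4 bytes, then exactly that many payload bytes. */
static ssize_t send_framed(int fd, const void *msg, uint32_t len)
{
    uint32_t be_len = htonl(len);
    struct iovec iov[2] = {
        { .iov_base = &be_len,     .iov_len = sizeof(be_len) },
        { .iov_base = (void *)msg, .iov_len = len            },
    };
    return writev(fd, iov, 2);
}

Even with TCP_NODELAY set, treat a 1-to-1 send()-to-segment mapping as a best effort, not a guarantee; the length-prefix framing is what actually preserves the old protocol's message boundaries.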

How to use SO_KEEPALIVE option properly to detect that the client at the other end is down?

I was trying to learn the usage of option SO_KEEPALIVE in socket programming in C language under Linux environment.
I created a server socket and used my browser to connect to it. It was successful and I was able to read the GET request, but I got stuck on the usage of SO_KEEPALIVE.
I checked this link keepalive_description#tldg.org but I could not find any example which shows how to use it.
As soon as I detect the client's request in accept(), I set the SO_KEEPALIVE option to 1 on the client socket. Now I don't know how to check whether the client is down, how to change the time interval between the probes sent, etc.
I mean, how will I be notified that the client is down (without reading from or writing to the client; I thought I would get some signal when the probes are not answered by the client), and how should I program it after setting the SO_KEEPALIVE option?
Also, suppose the probes are sent every 3 seconds and the client goes down in between; I will not get to know that the client is down, and I may get SIGPIPE.
Most importantly, I want to know how to use SO_KEEPALIVE in the code.
To modify the number of probes or the probe intervals, you write values to the /proc filesystem, like this:
echo 600 > /proc/sys/net/ipv4/tcp_keepalive_time
echo 60 > /proc/sys/net/ipv4/tcp_keepalive_intvl
echo 20 > /proc/sys/net/ipv4/tcp_keepalive_probes
Note that these values are global for all keepalive-enabled sockets on the system. You can also override these settings on a per-socket basis with setsockopt(); see section 4.2 of the document you linked.
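For the per-socket override, Linux exposes TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT. A sketch reusing the same values as the /proc example above (they are illustrative, not recommendations):

#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Enable keepalive on one socket and override the global defaults for
   that socket only (these three TCP_* options are Linux-specific). */
static void enable_keepalive(int fd)
{
    int on = 1, idle = 600, intvl = 60, cnt = 20;

    setsockopt(fd, SOL_SOCKET,  SO_KEEPALIVE,  &on,    sizeof(on));
    setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE,  &idle,  sizeof(idle));  /* seconds idle before first probe */
    setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl, sizeof(intvl)); /* seconds between probes */
    setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT,   &cnt,   sizeof(cnt));   /* unanswered probes before the connection is dropped */
}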
You can't "check" the status of the socket from userspace with keepalive. Instead, the kernel is simply more aggressive about forcing the remote end to acknowledge packets, and determining if the socket has gone bad. When you attempt to write to the socket, you will get a SIGPIPE if keepalive has determined remote end is down.
You'll get the same result if you enable SO_KEEPALIVE as if you don't enable SO_KEEPALIVE: typically you'll find the socket ready and get an error when you read from it.
You can set the keepalive timeout on a per-socket basis under Linux (this may be a Linux-specific feature). I'd recommend this rather than changing the system-wide setting. See the man page for tcp for more info.
Finally, if your client is a web browser, it's quite likely that it will close the socket fairly quickly anyway; most of them will only hold keepalive (HTTP 1.1) connections open for a relatively short time (30 s, 1 min, etc.). Of course, if the client machine has disappeared or the network is down (which is what SO_KEEPALIVE is really useful for detecting), then it won't be able to actively close the socket.
As already discussed, SO_KEEPALIVE makes the kernel more aggressive about continually verifying the connection even when you're not doing anything, but does not change or enhance the way the information is delivered to you. You'll find out when you try to actually do something (for example "write"), and you'll find out right away since the kernel is now just reporting the status of a previously set flag, rather than having to wait a few seconds (or much longer in some cases) for network activity to fail. The exact same code logic you had for handling the "other side went away unexpectedly" condition will still be used; what changes is the timing (not the method).
Virtually every "practical" sockets program in some way provides non-blocking access to the sockets during the data phase (maybe with select()/poll(), maybe with fcntl()/O_NONBLOCK/EINPROGRESS&EWOULDBLOCK, or, if your kernel supports it, maybe with MSG_DONTWAIT). Assuming this is already done for other reasons, it's trivial (sometimes requiring no code at all) to also find out right away about a connection dropping. But if the data phase does not already somehow provide non-blocking access to the sockets, you won't find out about the connection dropping until the next time you try to do something.
(A TCP socket connection without some sort of non-blocking behaviour during the data phase is notoriously fragile, as if the wrong packet encounters a network problem it's very easy for the program to then "hang" indefinitely, and there's not a whole lot you can do about it.)
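As an illustration of that point, here is a minimal poll()-based sketch; it assumes fd is a connected TCP socket with SO_KEEPALIVE already enabled, and trims error handling for brevity:

#include <poll.h>
#include <stdio.h>
#include <sys/socket.h>

/* Wait for activity on the socket.  A keepalive failure surfaces here as
   POLLERR/POLLHUP, or as a recv() that returns 0 or -1 (e.g. ETIMEDOUT). */
static void wait_and_check(int fd)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };

    if (poll(&pfd, 1, -1) > 0) {
        if (pfd.revents & (POLLERR | POLLHUP)) {
            fprintf(stderr, "connection is gone\n");
            return;
        }
        char buf[4096];
        ssize_t n = recv(fd, buf, sizeof(buf), 0);
        if (n == 0)
            fprintf(stderr, "peer closed the connection\n");
        else if (n < 0)
            perror("recv");
        /* else: n bytes of normal data are in buf */
    }
}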
Short answer: add
int flags = 1;
if (setsockopt(sfd, SOL_SOCKET, SO_KEEPALIVE, (void *)&flags, sizeof(flags))) {
    perror("ERROR: setsockopt(), SO_KEEPALIVE");
    exit(1);
}
on the server side, and read() will be unblocked when the client is down.
A full explanation can be found here.

Maximizing performance on udp

I'm working on a project with two clients, one for sending and the other for receiving UDP datagrams, between two machines wired directly to each other.
Each datagram is 1024 bytes in size and is sent using Winsock (blocking).
They are both running on separate, very fast machines with 16 GB RAM, 8 CPUs, and RAID 0 drives.
I'm looking for tips to maximize my throughput. Tips should be at the Winsock level, but other tips would be great as well.
Currently I'm getting 250-400 Mbit/s transfer speed; I'm looking for more.
Thanks.
Since I don't know what else besides sending and receiving that your applications do it's difficult to know what else might be limiting it, but here's a few things to try. I'm assuming that you're using IPv4, and I'm not a Windows programmer.
Maximize the packet size that you are sending when you are using a reliable connection. For 100 Mb/s Ethernet the maximum frame is 1518 bytes; Ethernet uses 18 of that, IPv4 uses 20-64 (usually 20, though), and UDP uses 8 bytes. That means that typically you should be able to send 1472 bytes of UDP payload per packet.
If you are using gigabit Ethernet equipment that supports it, your packet size increases to 9000 bytes (jumbo frames), so sending something closer to that size should speed things up.
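For reference, the payload budget worked out above, under the usual assumptions of no VLAN tag and no IP options, comes out to constants like these:

/* 1518-byte max frame - 18 (Ethernet header + FCS) - 20 (IPv4) - 8 (UDP) */
#define UDP_PAYLOAD_MAX_STANDARD 1472
/* 9000-byte jumbo MTU - 20 (IPv4) - 8 (UDP) */
#define UDP_PAYLOAD_MAX_JUMBO    8972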
If you are sending any acknowledgments from your listener to your sender then try to make sure that they are sent rarely and can acknowledge more than just one packet at a time. Try to keep the listener from having to say much, and try to keep the sender from having to wait on the listener for permission to keep sending.
On the computer that the sender application lives on consider setting up a static ARP entry for the computer that the receiver lives on. Without this every few seconds there may be a pause while a new ARP request is made to make sure that the ARP cache is up to date. Some ARP implementations may do this request well before the ARP entry expires, which would decrease the impact, but some do not.
Turn off as many users of the network as possible. If you are using an Ethernet switch, concentrate on quieting the things that introduce traffic to or from the computers and network devices that your applications run on or use (this includes broadcast messages, like many ARP requests). If it's a hub, you may want to quiet down the entire network. Windows tends to send out a constant stream of junk to networks, which in many cases isn't useful.
There may be limits set on how much of the network bandwidth one application or user can have, or there may be limits on how much network bandwidth the OS will let itself use. These can probably be changed in the registry if they exist.
It is not uncommon for network interface chips to not actually support the maximum bandwidth of the network all the time. There are chips which may miss packets because they are busy handling a previous packet, as well as some which just can't send packets as close together as the Ethernet specification would allow. Additionally, the rest of the system might not be able to keep up even if the NIC can.
Some things to look at:
Connected UDP sockets shortcut several operations in the kernel, so they are faster (see Stevens' UNP book for details, and the sketch after this list).
Socket send and receive buffers: play with the SO_SNDBUF and SO_RCVBUF socket options to balance out spikes and packet drops.
See if you can bump up the link MTU and use jumbo frames.
Use a 1 Gbps network and upgrade your network hardware...
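Here is a sketch of the first two items in POSIX-style C (Winsock differs mainly in setup); the 4 MiB figure is just an example, and the kernel may clamp it to its own limits:

#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Create a UDP socket, enlarge both socket buffers to absorb bursts, and
   connect() it so later send()/recv() calls skip per-call route and
   address handling (the "connected UDP socket" speedup mentioned above). */
static int make_fast_udp_socket(const char *ip, unsigned short port)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    int bufsz = 4 * 1024 * 1024;   /* may be clamped by the OS */
    setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &bufsz, sizeof(bufsz));
    setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bufsz, sizeof(bufsz));

    struct sockaddr_in peer;
    memset(&peer, 0, sizeof(peer));
    peer.sin_family = AF_INET;
    peer.sin_port   = htons(port);
    inet_pton(AF_INET, ip, &peer.sin_addr);

    if (connect(fd, (struct sockaddr *)&peer, sizeof(peer)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}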
Test the packet limit of your hardware with an already proven piece of code such as iperf:
http://www.noc.ucf.edu/Tools/Iperf/
I'm linking a Windows build; it might be a good idea to boot off a Linux LiveCD and try a Linux build to compare IP stacks.
More likely your NIC isn't performing well; try an Intel Gigabit Server Adapter:
http://www.intel.com/network/connectivity/products/server_adapters.htm
For TCP connections it has been shown that using multiple parallel connections will better utilize the data connection. I'm not sure if that applies to UDP, but it might help with some of the latency issues of packet processing.
So you might want to try multiple threads of blocking calls.
As well as Nikolai's suggestion of send and receive buffers, if you can, switch to overlapped I/O and have many recvs pending; this also helps to minimise the number of datagrams that are dropped by the stack due to lack of buffer space.
If you're looking for reliable data transfer, consider UDT.
