tcp - getting num bytes acked - c

In standard tcp implementations (say, on bsd), does anybody know if it's possible to find out how many bytes have been ack-ed by the remote host? Calling write() on a socket returns the number of bytes written, but I believe this actually means the number of bytes that could fit into the tcp buffer (not the number of bytes written to the network, or the number of bytes acked). Or maybe I'm wrong...
thanks!

When NODELAY=false (which is the default) and you call send() with fewer bytes than fit in a full TCP segment, the bytes are not necessarily sent immediately, so you're right. The OS will wait a little to see if you call send() again, so that it can transmit the combined data in one packet and avoid wasting a TCP header.
When NODELAY=true, the data is transmitted when you call send(), so you can (theoretically) count on the returned value. But this is not recommended due to the added network inefficiency.
All in all, if you don't need absolute precision, you can use the value returned by send() even when NODELAY=false. The value will not reflect immediate reality, but some milliseconds later it will (but also check for lost connections, since the last data block you sent could have been lost). Once the connection is gracefully terminated, you can trust that all the data was transmitted. If it wasn't, you'll know beforehand, either because the connection was abruptly dropped or because you received a data retention related error (or any other).
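For reference, the NODELAY setting discussed above is the TCP_NODELAY socket option. A minimal sketch of toggling it and reading it back on a POSIX system follows; the helper names set_nodelay and get_nodelay are made up for illustration.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical helper: enable (on = 1) or disable (on = 0) TCP_NODELAY.
 * Returns 0 on success, -1 on error (errno is set by setsockopt). */
int set_nodelay(int fd, int on)
{
    return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &on, sizeof on);
}

/* Hypothetical helper: returns 1 if TCP_NODELAY is set (Nagle disabled),
 * 0 if it is not set, -1 on error. */
int get_nodelay(int fd)
{
    int on = 0;
    socklen_t len = sizeof on;

    if (getsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &on, &len) < 0)
        return -1;
    return on != 0;
}
```

Either way, the return value of send() or write() still only says how much the kernel accepted, not how much the peer has acknowledged.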

I don't know of any way to get this, and it's probably not useful to you anyway.
I assume you want to know how much data was received by the host so that, after a lost connection and a re-connection, you can start sending from there again. But the ACK'd data has only been ACK'd by the OS! It doesn't indicate what data has been received by your program on the other side; depending on the size of the TCP receive buffer there, your program could be hundreds of KB behind. If you want to know how much data has been received and 'used' by the program there, then get it to send application-level ACKs.

I think you're wrong, although it's one of those places where I'd want to look at the specific implementation before I would bet serious money on it. Consider, though, the case of a TCP connection where the connection is dropped immediately after the original handshake. If the number of bytes returned were just the number of bytes buffered, it would be possible to apparently have written a number of bytes, but have them remain undelivered; that would violate TCP's guarantee-of-delivery property.
Note, though, that this is only true of TCP; not all protocols within IP provide the same guarantee.

You might have some luck for TCP by using ioctl(fd, TIOCOUTQ, &intval); to get the outgoing queue into intval. This will be the total length kept in the queue, including "written by the app" but not yet sent. It's still the best approximation I can think of at the moment.
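A minimal sketch of that ioctl call is below. It assumes a platform where TIOCOUTQ is accepted on sockets (Linux documents the same request for TCP under the name SIOCOUTQ); other systems may reject it or count the bytes differently. The function name unsent_bytes is made up for illustration.

```c
#include <assert.h>
#include <sys/ioctl.h>
#include <sys/socket.h>

/* Hypothetical helper: ask the kernel how many bytes are still sitting in
 * the socket's send queue. Returns the byte count, or -1 if the ioctl
 * fails (for example because the platform does not support it). */
int unsent_bytes(int fd)
{
    int pending = 0;

    if (ioctl(fd, TIOCOUTQ, &pending) < 0)
        return -1;
    return pending;
}
```

Polling this until it reaches 0 only says the local queue has drained; it still does not say that the remote application has read anything.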

Related

Strange lag when sending large message using tcp_nodelay

I'm writing a c program that sends the output of a bash shell over a tcp connection. To make my program more responsive, I used setsockopt() to enable TCP_NODELAY, which disables Nagle's buffering algorithm. This worked great, except rarely there is a lag in large messages. As in, if the message is more than around 500 bytes (probably 512). The first 500 bytes will go through (quickly in small messages), then there'll be a 1-2 second delay before the rest is received all at once. This only happens once every 10-15 times a large message is received. On the server side, the message is being written to the socket one byte at a time, and all of the bytes are available, so this behavior is unexpected to me.
My best guess is that there's a 512 byte buffer somewhere in the socket that's causing a block? I did some time tests to see where the lag is, and I'm pretty sure it's the socket itself where the lag is occurring. All of the data on the server side is written without blocking, but the client receives the end of the message after a lag. However I used getsockopt() to find the socket's receive and send buffers, and they are well over 512 bytes - 66000 and 130000 respectively. On the client side, I'm using express js to receive the data in a handler (app.on('data', function(){})). But I read that this express function does not buffer data?
Would anyone have a guess why this is happening? Thanks!
Since TCP_NODELAY means sending every piece of data as a packet as soon as possible without combining data, it sounds like you are sending tons of packets. Since you are writing one byte at a time, it could send packets with just one byte of payload and a much bigger frame. This would work fine most of the time, but as soon as a packet drops for whatever reason, the connection has to fall back on TCP retransmission to recover the dropped packet. That would incur at least one round-trip latency and perhaps several. It sounds like you are getting lucky for the first several hundred packets (500 bytes' worth) and then typically hitting your first packet drop and slowing way down while it is recovered.
One simple solution might be to write in larger chunks, say 10 bytes at a time instead of 1 byte, so that the chance of hitting a dropped packet is much lower. Then you would expect to see this problem as often as you do now only for messages of around 5000 bytes or so.
In general, setting TCP_NODELAY will make things go faster at first, but you will hit the first dropped packet sooner, simply because TCP_NODELAY never decreases the number of packets you send per amount of data. It either increases that number or leaves it the same, which means your chance of hitting a dropped packet within a certain amount of data goes up. There is a tradeoff here between interactive feel and first hiccup. By avoiding TCP_NODELAY, you increase the amount of data that is typically sent before the first retransmission is needed.
Get a network capture using tcpdump or Wireshark and review the packet transmission timeline; this will help distinguish network problems from software implementation issues. If you see retransmissions, you may have a network issue. If you see slow ACKs, you might find it better NOT to use 'No Delay', since ACK delay can stall a 'No Delay' connection.
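One way to write in larger chunks without changing the code that produces the bytes is to collect them in a small user-space buffer and hand the whole buffer to write() at once. The sketch below assumes a blocking descriptor; struct outbuf, outbuf_putc, outbuf_flush and the 4096-byte capacity are all made up for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

#define OUTBUF_CAP 4096          /* arbitrary capacity chosen for this sketch */

struct outbuf {
    int fd;                      /* descriptor to write to (e.g. the socket) */
    size_t len;                  /* number of bytes currently buffered */
    char data[OUTBUF_CAP];
};

/* Write exactly n bytes, retrying after short writes and EINTR. */
static int write_all(int fd, const char *p, size_t n)
{
    while (n > 0) {
        ssize_t w = write(fd, p, n);
        if (w < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        p += w;
        n -= (size_t)w;
    }
    return 0;
}

/* Send everything buffered so far. Returns 0 on success, -1 on error. */
int outbuf_flush(struct outbuf *b)
{
    if (write_all(b->fd, b->data, b->len) < 0)
        return -1;
    b->len = 0;
    return 0;
}

/* Queue one byte; the buffer is flushed automatically when it is full. */
int outbuf_putc(struct outbuf *b, char c)
{
    if (b->len == OUTBUF_CAP && outbuf_flush(b) < 0)
        return -1;
    b->data[b->len++] = c;
    return 0;
}
```

Calling outbuf_flush() once at the end of each shell output burst keeps the interactive feel while giving the kernel whole chunks instead of single bytes.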

Reading all available bytes via socket using blocking I/O

When reading from a socket using read(2) and blocking I/O, when do I know that the other side (the client) has no more data to send? (by "no more data to send" I mean that, as an example, the client is waiting for a response). At first, I thought that this point is reached when less than count bytes are returned by read (as in read(fd, *buf, count)).
But what if the client sends the data fragmented? Reading until read returns 0 would be a solution, but as far as I know 0 is only returned when the client closes the connection - otherwise, read would just block until the connection is closed. I thought of using non-blocking I/O and a timeout for select(2), but this does not seem to be a tidy solution to me.
Are there any known best practices?
The concept of "the other side has no more data to send", without either a timeout or some semantics in the transmitted data, is quite pointless. Normally, code on the client/server will be able to process data faster than the network can transmit it. So if there's no data in the receive buffer when you're trying to read() it, this just means the network has not yet transmitted everything, but you have no way to tell if the next packet will arrive within a millisecond, a second, or a day. You'd probably consider the first case as "there is more data to send", the third as "no more data to send", and the second depends on your application.
If the other side doesn't close the connection, you probably don't know when it's ready to send the next data packet either.
So unless you have specific semantics and knowledge about what the client sends, using select() and non-blocking I/O is the best you can do.
In specific cases, there might be other ways, for example if you know the client will send an XML tag, some data, and a closing tag every n seconds. In that case you could start reading n seconds after the last packet you received, then just read on until you receive the closing tag. But as I said, this isn't a general approach, since it requires semantics on the channel.
TCP is a byte-stream protocol, not a message protocol. If you want messages you really have to implement them yourself, e.g. with a length-word prefix, lines, XML, etc. You can guess with the FIONREAD option of ioctl(), but guessing is all it is, as you can't know whether the client has paused in the middle of transmission of the message, or whether the network has done so for some reason.
The protocol needs to give you a way to know when the client has finished sending a message.
Common approaches are to send the length of each message before it, or to send a special terminator after each message (similar to the NUL character at the end of strings in C).
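As an illustration of the length-prefix approach, here is a minimal sketch using blocking I/O. The 4-byte big-endian prefix and the names send_msg and recv_msg are choices made for this example, not part of any standard.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <unistd.h>

/* Write exactly n bytes, retrying after short writes and EINTR. */
static int write_full(int fd, const void *buf, size_t n)
{
    const char *p = buf;
    while (n > 0) {
        ssize_t w = write(fd, p, n);
        if (w < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        p += w;
        n -= (size_t)w;
    }
    return 0;
}

/* Read exactly n bytes. Returns -1 on error or if the peer closes early. */
static int read_full(int fd, void *buf, size_t n)
{
    char *p = buf;
    while (n > 0) {
        ssize_t r = read(fd, p, n);
        if (r < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (r == 0)
            return -1;           /* end of stream before the message ended */
        p += r;
        n -= (size_t)r;
    }
    return 0;
}

/* Send one message: a 4-byte big-endian length, then the payload. */
int send_msg(int fd, const void *payload, uint32_t len)
{
    uint32_t be = htonl(len);

    if (write_full(fd, &be, sizeof be) < 0)
        return -1;
    return write_full(fd, payload, len);
}

/* Receive one message into buf (capacity cap). Returns the payload length,
 * or -1 on error, early close, or a message larger than cap. */
long recv_msg(int fd, void *buf, uint32_t cap)
{
    uint32_t be;

    if (read_full(fd, &be, sizeof be) < 0)
        return -1;
    uint32_t len = ntohl(be);
    if (len > cap)
        return -1;
    if (read_full(fd, buf, len) < 0)
        return -1;
    return (long)len;
}
```

With this in place, the reader knows the other side has finished a message as soon as recv_msg() returns, regardless of how the bytes were split up on the network.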

Efficient way of validating UDP communication protocol having single Server and multiple Clients

I have developed a single-Server/multiple-Clients UDP application, where the Server can handle x clients at a time. The Server has x threads, each dedicated to one Client.
The code works perfectly fine. Now I want to check my application for all possible scenarios, i.e. validate my application. For this purpose, I need to design a test bed.
Initial Design:
The test bed I initially designed has following functionalities:
The Server GUI has a button on it. When the button is clicked, each thread in the Server reads a text file, picks up a few bytes of it, and sends that chunk to its respective client. The thread then picks the next chunk of bytes from the text file, sends it to the client, and so on until EOF is found.
The Client on the other side keeps receiving these chunks of bytes, creates a text file, and stores the chunks in it.
When EOF from the Server is received, the Client starts sending the completely received text file back to the Server over its socket.
When the file is completely received back (echoed), the Server compares the two text files, the sent file and the echoed one. If both files are the same, the communication process has occurred without any fault and the communication protocol is validated.
The above mentioned validation technique (sending the text file, receiving the echoed file and then comparing both) checks the following things:
The number of bytes sent = number of bytes received.
No data is corrupted.
The data is received in proper order.
If any of the above mentioned three conditions is not fulfilled, that means that there is some error in communication.
Now I have been asked to make changes to this test bed and add more functionality to it. Can the procedure that I am using actually check the three conditions above in all scenarios?
Are there other conditions that must be checked besides these three?
What other methods of checking the communication protocol are there, besides the one I designed (sending a text file, getting it echoed, and comparing)?
I have to either add more functionality to this test bed to make the validation system more efficient, or completely replace it with some better option.
Please help me with your suggestions.
Thanks in advance :)
The first two of your conditions are guaranteed by UDP. Picking "a few bytes", i.e. anything up to the maximum datagram payload (65,507 bytes over IPv4, which isn't really a "few" bytes), will result in a single datagram being sent, and anything larger than that will fail. You will not want to max out the largest possible datagram size, though, as it will incur IP fragmentation (staying below 1280 bytes is a good idea).
You will be able to receive exactly the amount you sent or nothing at all, never more or less. UDP does not guarantee that any datagram that is sent out arrives (it cannot guarantee that, since IP does not), but it does guarantee that the entire datagram arrives as-is -- or nothing. Never anything in between.
It further guarantees that the data inside the datagram matches its checksum (the underlying protocols including IP/ethernet/ATM further do their own checksumming) and thus arrives in the same binary representation as it was sent. In other words, data arrives in order (inside the datagram) and is not corrupted.
It is of course possible in theory that a bit error passes all 3 layers of checksums, but this is extremely unlikely and will not happen in practice. Unless you need to guard against someone maliciously tampering with packets, you do not need to worry. The kinds of bit errors that happen accidentally are reliably picked up by the checksums used in the protocols.
If, on the other hand, you do need to guard against malicious modification of your data, you must add a MAC (or a checksum and encrypt the entire packet -- adding a checksum alone is useless).
To ensure that data spanning several datagrams arrives in order, you must add sequence numbers to your packets (in the same manner TCP does). And with that, you can as well use TCP, which is likely more efficient and less error-prone. One of the main reasons why one would want to use UDP is normally because in-order delivery and reliability are not needed, or sometimes reliability is needed, but not in-order delivery.
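To make the sequence-number idea concrete, here is a minimal sketch of the receiver-side check. It assumes the sender puts a 32-bit counter at the front of every datagram; the type and function names are made up, and the policy for gaps (skip ahead and report) is just one possible choice.

```c
#include <assert.h>
#include <stdint.h>

enum seq_status {
    SEQ_IN_ORDER,   /* exactly the datagram we expected */
    SEQ_GAP,        /* newer than expected: something was lost or is late */
    SEQ_OLD         /* older than expected: duplicate or reordered datagram */
};

struct seq_tracker {
    uint32_t next;  /* sequence number we expect to see next */
};

/* Classify an arriving datagram by the sequence number from its header. */
enum seq_status seq_check(struct seq_tracker *t, uint32_t seq)
{
    if (seq == t->next) {
        t->next++;
        return SEQ_IN_ORDER;
    }
    /* Interpreting the difference as signed keeps the comparison working
     * when the 32-bit counter wraps around (the conversion is technically
     * implementation-defined, but behaves this way on common compilers). */
    if ((int32_t)(seq - t->next) > 0) {
        t->next = seq + 1;
        return SEQ_GAP;
    }
    return SEQ_OLD;
}
```

A test bed can count SEQ_GAP and SEQ_OLD results per client to measure loss and reordering directly, instead of inferring them from a file comparison.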
In-order delivery is the main cause of TCP's latency during packet loss (in absence of packet loss, TCP is exactly as "fast" as UDP), so if this is needed, there is no sane reason not to use TCP in the first place. It is a protocol that has been fine-tuned and worked reliably for literally billions of people for 4 decades.
Also, using one socket and one thread per client is possibly not the best approach. The disk won't read any faster, and the network card won't send any faster either. UDP doesn't need a socket per client either. When using TCP, you'll have no other choice but to use one socket per client, but still multiplexing using a readiness notification system will give you much better performance and fewer opportunities for threading errors.
Also, sending back a checksum such as one of the SHA family (or a MAC, if it needs to be secure) may be more efficient than echoing back the whole lot of data. The likelihood that the checksum matches and the data accidentally doesn't is negligible.
Entire revision control systems that manage millions of lines of code for millions of people (such as git) rely on the fact that this just doesn't happen to identify files (well, it does happen of course, you just won't live to see it).
I have a question here: why UDP and not TCP, especially when you are worried about packet order and data corruption? In my view (I may be wrong), UDP is good only when the data is time-sensitive, like a video stream.
Secondly, yes, there are other methods of checking the integrity of transmitted data. The simplest may be comparing an MD5 or SHA-1 checksum.
Can the procedure that I am using actually check the three conditions above in all scenarios?
Yes.
What other methods of checking the communication protocol are there, besides the one I designed (sending a text file, getting it echoed, and comparing)?
It doesn't have to be a file, but it has to be something you can check once you get the response. You could just generate some random data and hold on to it until you get the response.
You'd have to tell us what you really want to test. If you are trying to make sure that UDP doesn't give you bad data or out of order data, you're using the wrong protocol. You're not testing anything by seeing if you get the exact data in the exact order you send it over UDP except for the networking infrastructure you have in place.
You say you want to test your application for "all possible scenarios", but that doesn't even mean anything. You're testing to see if a behavior that is part of the UDP specification exists and trying to see that it doesn't? Well, it does. Even if you never see it.

How long does a UDP packet stay at a socket?

If data is sent to the client but the client is busy executing something else, how long will the data be available to read using recvfrom()?
Also, what happens if a second packet is sent before the first one is read, is the first one lost and the next one sitting there waiting to be read?
(windows - udp)
If data is sent to the client but the client is busy executing something else, how long will the data be available to read using recvfrom()?
Forever, or not at all, or until you close the socket or read as much as a single byte.
The reason for that is:
UDP delivers datagrams, or it doesn't. This sounds like nonsense, but it is exactly what it is.
A single UDP datagram relates to either exactly one or several "fragments", which are IP packets (further encapsulated in some "on the wire" protocol, but that doesn't matter). The network stack collects all fragments for a datagram. If the checksum on any of the fragments is not good, or any other thing that makes the network stack unhappy, the complete datagram is discarded, and you get nothing, not even an error. You simply don't know anything happened.
If all goes well, a complete datagram is placed into the receive buffer. Never anything less, and never anything more. If you try to recvfrom later, that is what you'll get.
The receive buffer is obviously necessarily large enough to hold at least one max-size datagram (65535 bytes), but since usually datagrams will not be maximum size, but rather something below 1280 bytes (or 1500 if you will), it can usually hold quite a few of them (on most platforms, the buffer defaults to something around 128-256k, and is configurable).
If there is not enough room left in the buffer, the datagram is discarded, and you get nothing (well, you do still get the ones that are already in the buffer). Again, you don't even know something happened.
Each time you call recvfrom, a complete datagram is removed from the buffer (important detail!), and you get up to the number of bytes that you requested. This means that if you naively try to read a few bytes and then a few bytes again, it just won't work. The first read will discard the rest of the datagram, and the subsequent ones will read the first bytes of some future datagrams (and possibly block)!
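The discard behaviour can be seen in a small experiment. The sketch below uses POSIX sockets on the loopback interface and sends two datagrams to its own socket; the function names are made up. Note that on Windows (which the question is about), recvfrom reports a too-small buffer as a WSAEMSGSIZE error instead of returning the truncated length, but the rest of the datagram is dropped there as well.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Create a UDP socket bound to 127.0.0.1 on a port chosen by the OS and
 * store the resulting address in *addr. Returns the descriptor or -1. */
int udp_bind_loopback(struct sockaddr_in *addr)
{
    socklen_t len = sizeof *addr;
    int fd = socket(AF_INET, SOCK_DGRAM, 0);

    if (fd < 0)
        return -1;
    memset(addr, 0, sizeof *addr);
    addr->sin_family = AF_INET;
    addr->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr->sin_port = 0;
    if (bind(fd, (struct sockaddr *)addr, len) < 0 ||
        getsockname(fd, (struct sockaddr *)addr, &len) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Send "ABCDEFGH" and "xyz" as two datagrams, then read each of them with
 * a 4-byte buffer. *n1 and *n2 receive the recvfrom return values.
 * Returns 0 if both datagrams were sent, -1 otherwise. */
int truncation_demo(char out1[4], ssize_t *n1, char out2[4], ssize_t *n2)
{
    struct sockaddr_in addr;
    int rc = -1;
    int fd = udp_bind_loopback(&addr);

    if (fd < 0)
        return -1;
    if (sendto(fd, "ABCDEFGH", 8, 0, (struct sockaddr *)&addr, sizeof addr) == 8 &&
        sendto(fd, "xyz", 3, 0, (struct sockaddr *)&addr, sizeof addr) == 3) {
        /* The first call consumes the whole 8-byte datagram even though
         * only 4 bytes fit; "EFGH" is gone for good. */
        *n1 = recvfrom(fd, out1, 4, 0, NULL, NULL);
        /* The second call therefore starts at the next datagram. */
        *n2 = recvfrom(fd, out2, 4, 0, NULL, NULL);
        rc = 0;
    }
    close(fd);
    return rc;
}
```

The practical consequence is to always pass recvfrom a buffer at least as large as the biggest datagram the protocol can send.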
This is very different from how TCP works. There you can actually read a few bytes and a few bytes again, and it will just work, because the network layer simulates a data stream. You don't need to care how it works, because the network stack makes sure it works.
Also, what happens if a second packet is sent before the first one is read, is the first one lost and the next one sitting there waiting to be read?
You probably meant to say "received" rather than "sent". Send and receive have different buffers, so that would not matter at all. About receiving another packet while one is still in the buffer, see the above explanation. If the buffer can hold the second datagram, it will store it, otherwise it silently goes * poof *.
This does not affect any datagrams already in the buffer.
Normally, the data will be buffered until it's read. I suppose if you wait long enough that the driver completely runs out of space, it'll have to do something, but assuming your code works halfway reasonably, that shouldn't be a problem.
A typical network driver will be able to buffer a number of packets without losing any.
If data is sent to the client but the client is busy executing something else, how long will the data be available to read using recvfrom()?
This depends on the OS. On Windows, I believe the default for each UDP socket is 8192 bytes; this can be raised with setsockopt() (see the Winsock documentation). So, as long as the buffer isn't full, the data will stay there until the socket is closed or it is read.
Also, what happens if a second packet is sent before the first one is read, is the first one lost and the next one sitting there waiting to be read?
If the buffer has room, they are both stored; if not, one of them gets discarded. I believe it's the newest one, but I'm not 100% sure.
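Raising the receive buffer with setsockopt() might look like the sketch below. It is written against the POSIX sockets API; Winsock uses the same SO_RCVBUF option but expects the value pointer cast to char *. The helper name set_rcvbuf is made up, and the OS is free to round, double, or cap the requested size, which is why the value is read back.

```c
#include <assert.h>
#include <sys/socket.h>

/* Hypothetical helper: request a receive buffer of `bytes` bytes and
 * return the size the OS reports afterwards, or -1 on error. */
int set_rcvbuf(int fd, int bytes)
{
    int actual = 0;
    socklen_t len = sizeof actual;

    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof bytes) < 0)
        return -1;
    if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &actual, &len) < 0)
        return -1;
    return actual;
}
```

A larger buffer only delays the point at which datagrams are dropped; a client that stays busy for long enough will still lose data.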

can one call of recv() receive data from 2 consecutive send() calls?

I have a client which sends data to a server with 2 consecutive send calls:
send(_sockfd,msg,150,0);
send(_sockfd,msg,150,0);
and the server calls recv once the first send has arrived (let's say I'm using select):
recv(_sockfd,buf,700,0);
Note that the buffer I'm receiving into is much bigger.
My question is: is there any chance that buf will contain both msgs? Or do I need 2 recv() calls to get both msgs?
thank you!
TCP is a stream oriented protocol. Not message / record / chunk oriented. That is, all that is guaranteed is that if you send a stream, the bytes will get to the other side in the order you sent them. There is no provision made by RFC 793 or any other document about the number of segments / packets involved.
This is in stark contrast with UDP. As @R.. correctly said, in UDP an entire message is sent in one operation (notice the change in terminology: message). Try to send a giant message (several times larger than the MTU) with TCP? It's okay, it will split it for you.
When running on local networks or on localhost you will certainly notice that (generally) one send == one recv. Don't assume that. There are factors that change it dramatically. Among these:
Nagle
Underlying MTU
Memory usage (possibly)
Timers
Many others
Of course, not having a correspondence between a send and a recv is a nuisance, and you can't always rely on UDP. That is one of the reasons for SCTP. SCTP is a really interesting protocol, and it is message-oriented.
Back to TCP, this is a common nuisance. An equally common solution is this:
Establish that all packets begin with a fixed-length sequence (say 32 bytes)
These 32 bytes contain (possibly among other things) the size of the message that follows
When you read any amount of data from the socket, add the data to a buffer specific for that connection. When 32 bytes are reached, read the length you still need to read until you get the message.
It is really important to notice how there are really no messages on the wire, only bytes. Once you understand it you will have made a giant leap towards writing network applications.
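The per-connection buffer described in the steps above can be sketched as follows. To keep the example short it assumes a 4-byte big-endian length as the whole header, where the answer suggests something like 32 bytes, and a fixed 4096-byte buffer; struct conn_buf, conn_feed and conn_next are made-up names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define HDR_LEN 4       /* assumed header: 4-byte big-endian payload length */
#define BUF_CAP 4096    /* arbitrary per-connection buffer size */

struct conn_buf {
    unsigned char data[BUF_CAP];
    size_t len;         /* bytes received but not yet consumed */
};

/* Append whatever one recv() call returned. Returns -1 if it does not fit. */
int conn_feed(struct conn_buf *c, const void *bytes, size_t n)
{
    if (n > BUF_CAP - c->len)
        return -1;
    memcpy(c->data + c->len, bytes, n);
    c->len += n;
    return 0;
}

/* Extract the next complete message into out (capacity cap).
 * Returns the payload length, -1 if the message is not complete yet,
 * or -2 if the announced length does not fit into out. */
long conn_next(struct conn_buf *c, void *out, size_t cap)
{
    uint32_t len;
    size_t used;

    if (c->len < HDR_LEN)
        return -1;
    len = ((uint32_t)c->data[0] << 24) | ((uint32_t)c->data[1] << 16) |
          ((uint32_t)c->data[2] << 8) | (uint32_t)c->data[3];
    if (len > cap)
        return -2;
    if (c->len - HDR_LEN < len)
        return -1;
    memcpy(out, c->data + HDR_LEN, len);
    /* Keep any bytes that belong to the following message. */
    used = HDR_LEN + (size_t)len;
    memmove(c->data, c->data + used, c->len - used);
    c->len -= used;
    return (long)len;
}
```

After every recv(), call conn_feed() with the bytes received and then conn_next() in a loop until it returns -1; this handles one recv() containing two messages just as well as one message spread over several recv() calls.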
The answer depends on the socket type, but in general, yes it's possible. For TCP it's the norm. For UDP I believe it cannot happen, but I'm not an expert on network protocols/programming.
Yes, it can and often does. There is no way of matching up send and receive calls when using TCP/IP. Your program logic should test the return values of both send and recv calls in a loop, which terminates when everything has been sent or received.

Resources