TCP fragmentation - unread bytes

A slight variation on this SO question.
Say the receiver expects packets to be at most 100 bytes.
Say at time X there are actually 100 bytes available in the buffer, but for reasons the receiver only determines it needs to read 75 of those.
What happens with data not read from a socket?
Example:
Using flag MSG_PEEK (see here) the receiver determines that there is a full valid reply of 75 bytes in the buffer. The remaining 25 bytes must be the start of a next packet.
The receiver elects to remove only 75 bytes (i.e. ::recv() without the MSG_PEEK flag) from the buffer, leaving 25 bytes unread/unmoved in the buffer.

there are actually 100 bytes available in the buffer, but for reasons the receiver only determines it needs to read 75 of those.
I guess receiver refers to the application reading from the TCP socket. The remaining 25 bytes simply stay in the socket buffer to be read at some later time. If the socket is closed before that the data is lost.
Using the MSG_PEEK flag, the read data isn't removed from the buffer at all, so it still contains all 100 bytes after reading.
At the application level, you receive a continuous data stream from a TCP socket. Whether and how the data was segmented or even fragmented for transport doesn't matter and isn't visible to the application. You can read the data in chunks of any size, regardless of how the source application wrote it.
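A minimal sketch of the peek-then-consume pattern from the question, assuming a connected stream socket. The helper name take_reply and its parameters are made up for illustration; only recv() and MSG_PEEK come from the discussion above.

```c
#include <sys/types.h>
#include <sys/socket.h>

/* Peek at the pending bytes, then remove only the first reply_len of them.
 * Returns the number of bytes removed, or -1 on error or if fewer than
 * reply_len bytes are pending. take_reply is a hypothetical helper. */
ssize_t take_reply(int fd, unsigned char *buf, size_t cap, size_t reply_len)
{
    /* MSG_PEEK copies data out but leaves it in the socket buffer. */
    ssize_t pending = recv(fd, buf, cap, MSG_PEEK);
    if (pending < 0 || (size_t)pending < reply_len)
        return -1;

    /* Without MSG_PEEK the bytes are removed; the rest stays queued. */
    return recv(fd, buf, reply_len, 0);
}
```

With 100 bytes pending, take_reply(fd, buf, 100, 75) should leave the last 25 bytes in the socket buffer, where the next recv() call picks them up.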
Say the receiver expects packets to be at most 100 bytes.
If you are trying to refer to TCP's Maximum Segment Size (MSS): every IPv4 host must be able to accept datagrams of at least 576 bytes, so the default MSS is 536 bytes (576 minus 20 bytes of IP header and 20 bytes of TCP header).

Related

How can I clear UDP buffer without recvfrom?

I have an embedded Linux project. It receives data via UDP into a static char array of 20000 bytes. I want to ignore UDP messages that exceed this size. But when a bigger datagram arrives, it stays in the UDP buffer forever, since it is never read with recvfrom. Is there any way to clear this oversized data from the UDP buffer?

One cannot discard the data from the socket buffer without reading. But one can read these large datagrams even when having a smaller buffer - it will simply discard anything which does not fit into the given buffer. To find out if the datagram was too large use the MSG_TRUNC flag so that it will provide the original length of the packet. If this indicates an oversized packet just discard it and continue with the next packet.
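As a rough sketch of that approach, assuming Linux (where passing MSG_TRUNC to recv() on a datagram socket makes it return the real datagram length): the function name recv_or_drop and the -2 return code are invented for this example.

```c
#include <sys/types.h>
#include <sys/socket.h>

/* Receive one datagram into buf (capacity cap).
 * Returns its length if it fit, -2 if it was oversized, -1 on error.
 * Assumes Linux semantics for MSG_TRUNC on datagram sockets. */
ssize_t recv_or_drop(int fd, void *buf, size_t cap)
{
    ssize_t real_len = recv(fd, buf, cap, MSG_TRUNC);
    if (real_len < 0)
        return -1;
    if ((size_t)real_len > cap)
        return -2;   /* the whole datagram is already gone from the queue */
    return real_len;
}
```

A caller would simply loop: on -2 it ignores the contents of buf and calls recv_or_drop() again for the next datagram.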

Sending integer atomically over a socket in c

I have a simple client which accepts a single uint32_t from the server through a socket. Using the solution that appeared here many times (e.g. transfer integer over a socket in C) seems to work, but:
When calling "read" on files I know that the system is not guaranteed to read the entire content of the message at once, and it therefore returns the number of bytes read. Couldn't the same happen when accepting 4 bytes over a network socket?
If this can't happen, why is that? and if it can, how is it possible to make sure the send is "atomic", or is it necessary to piece back the bytes myself?

Depending on the socket type, different protocols can be used. SOCK_STREAM (which corresponds to TCP on network sockets) is a stream-oriented protocol, so packets may be re-combined by the sender, the receiver or any equipment in the middle.
But SOCK_DGRAM (UDP) or SOCK_SEQPACKET actually send packets that cannot be changed. In that case 4 bytes in the same packet are guaranteed to be available in the same read operation, except if the receive buffer is too small. From man recv:
If a message is too long to fit in the supplied buffer, excess
bytes may be discarded depending on the type of socket the message is
received from
So if you want to have atomic blocks, use a packet protocol and not a stream one, and make sure to have large enough receive buffers.

When calling "read" on files I know that the system is not guaranteed
to read the entire content of the message at once
That is wrong: if the requested number of bytes is available, they are read.
The POSIX read manual says: The value returned may be less than nbyte if the number of bytes left
in the file is less than nbyte
This is at least correct for regular files; for pipes and the like it is a different story.
Couldn't the same happen when accepting 4 bytes over a network socket?
(I suppose you are talking about TCP sockets.) That may happen with sockets, because the underlying protocol may transport your bytes in any suitable manner (read about TCP segmentation, for example); the only thing ensured is that bytes are received in the same order in which they were sent. So, to read a given number of bytes, you may have to issue several reads. This is usually done by looping over read until the needed bytes have been received.

If the underlying protocol is TCP/IP, which is stream-oriented (there are no "packets" or "messages", just two byte streams), then yes.
You need to take care to manage the amount of read data so that you can know where each "message" (in your case, a single integer) begins and ends.
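The read loop described in these answers could look roughly like this. It is a sketch under two assumptions that are not in the question: the sender transmits the integer in network byte order (htonl), and the socket is blocking. The names recv_exact and recv_u32 are illustrative.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Read exactly len bytes. Returns 0 on success, -1 on error or early EOF. */
int recv_exact(int fd, void *buf, size_t len)
{
    unsigned char *p = buf;
    while (len > 0) {
        ssize_t n = recv(fd, p, len, 0);
        if (n == 0)
            return -1;          /* peer closed before all bytes arrived */
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            return -1;
        }
        p += n;                 /* keep what we got, ask for the rest */
        len -= (size_t)n;
    }
    return 0;
}

/* Receive one uint32_t, assumed to be sent in network byte order. */
int recv_u32(int fd, uint32_t *out)
{
    uint32_t wire;
    if (recv_exact(fd, &wire, sizeof wire) != 0)
        return -1;
    *out = ntohl(wire);
    return 0;
}
```

Even for only 4 bytes the loop matters: a single recv() on a stream socket is allowed to return 1, 2 or 3 of them.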

Reading data on serial port - byte by byte

How can I read data on a serial port byte by byte?
I have a source which sends out packets of varying size. I am reading the data in raw (non-canonical) mode. When I set VMIN, I am able to get a packet of that size or slightly larger.
For example: if the received packet size is 46 bytes and I set VMIN to 1, I receive the data in 2 chunks (meaning 2 read calls were needed to get the complete data, one fetching the first 32 bytes and the next fetching the remaining 14 bytes).
If I set VMIN to 46, the complete packet is fetched.
But the problem here is the varying packet size. If the data packet is larger (say 70 bytes), it will mess up the buffer and the following reads, as it reads 60+ bytes in the first read and the rest in the next read.
So I am thinking of reading the data byte by byte and determining the end of the packet.
Does anyone know if this is doable? Or any suggestion on how to read the complete data packet in one read operation?
UART setting:
Baud: 115200
No parity.
1 stop bit.
8N1.
No flow control.
Thanks in advance.

A good approach for processing serial data is to read chunks of data from the port into a buffer and then pull byte by byte from the buffer.
Serial port reading is affected by the timeout settings and incoming data flow, so the number of bytes per read is not guaranteed to be consistent. For example, if you knew that packets were always going to be 46 bytes, then you might think to set VMIN to 46 and expect to get 46 bytes per read. However, if the sending source sends multiple packets without delays between them, then you might get all of one packet and part of another. If the sending source were to delay during the transmission of a packet for longer than the receiving port's timeout, then you would get fewer than VMIN bytes.
Be sure to code for the possibility of lost data. For example, let's say that packets start with a start marker and end with an end marker. You start pulling data from the buffer and the first byte is the start marker, but 49 bytes later you encounter another start marker, meaning a new packet, without having seen the end marker of the previous packet. There should of course also be a CRC for the packet, or at least a checksum.
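One way to sketch the byte-by-byte parser with resynchronisation described above. The marker values 0x02 and 0x03, the 256-byte limit and all names are assumptions made for this example; it also assumes the markers never occur inside the payload, and it leaves out the CRC check.

```c
#include <stddef.h>
#include <stdint.h>

#define FRAME_START 0x02   /* hypothetical start marker */
#define FRAME_END   0x03   /* hypothetical end marker */
#define FRAME_MAX   256    /* hypothetical maximum payload size */

typedef struct {
    uint8_t data[FRAME_MAX];
    size_t  len;
    int     in_frame;      /* nonzero once a start marker has been seen */
} frame_parser;

/* Feed one byte pulled from the receive buffer. Returns 1 when a complete
 * packet is available in p->data[0..p->len), 0 otherwise. */
int frame_feed(frame_parser *p, uint8_t byte)
{
    if (byte == FRAME_START) {   /* new packet: drop anything unfinished */
        p->len = 0;
        p->in_frame = 1;
        return 0;
    }
    if (!p->in_frame)
        return 0;                /* noise between packets */
    if (byte == FRAME_END) {
        p->in_frame = 0;
        return 1;
    }
    if (p->len == FRAME_MAX) {   /* overlong packet: wait for next start */
        p->in_frame = 0;
        return 0;
    }
    p->data[p->len++] = byte;
    return 0;
}
```

The parser state lives outside the read loop, so it does not matter how the serial driver splits the packet across read() calls.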

Since you are reading data that is structured into packets of variable size, you should add a 2 byte header for each packet and set it to the packet size.
In the reader you would read 2 bytes first and then decide how many bytes to read to receive the whole packet.
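A small sketch of the reader side of such a header, working on bytes that have already been collected in a buffer. The big-endian byte order of the size field and the name packet_extract are assumptions, since the answer does not specify them.

```c
#include <stddef.h>
#include <stdint.h>

/* Try to extract one packet from buf[0..len). Assumed layout:
 * 2-byte big-endian payload size, followed by the payload.
 * Returns the total bytes consumed (header + payload) and sets *payload
 * and *payload_len, or returns 0 if a whole packet is not buffered yet. */
size_t packet_extract(const uint8_t *buf, size_t len,
                      const uint8_t **payload, size_t *payload_len)
{
    if (len < 2)
        return 0;                       /* header itself is incomplete */
    size_t size = ((size_t)buf[0] << 8) | buf[1];
    if (len < 2 + size)
        return 0;                       /* payload is incomplete */
    *payload = buf + 2;
    *payload_len = size;
    return 2 + size;
}
```

When the function returns 0, the caller keeps the buffered bytes, appends the result of the next read(), and tries again.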

Isn't recv() in C socket programming blocking?

In Receiver, I have
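recvfd = accept(sockfd, &other_side, &len);
while (1)
{
    /* recv returns the number of bytes read; 0 means the peer closed, -1 an error */
    ssize_t n = recv(recvfd, buf, MAX_BYTES - 1, 0);
    if (n <= 0)
        break;
    buf[n] = '\0'; /* terminate right after the bytes actually received */
    printf("\n Number %d contents :%s\n", counter, buf);
    counter++;
}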
In Sender , I have
send(sockfd,mesg,(size_t)length,0);
send(sockfd,mesg,(size_t)length,0);
send(sockfd,mesg,(size_t)length,0);
MAX_BYTES is 1024 and length of mesg is 15. Currently, It calls recv only one time. I want recv function to be called three times for each corresponding send. How do I achieve it?

In short: yes, it is blocking. But not in the way you think.
recv() blocks until any data is readable. But you don't know the size in advance.
In your scenario, you could do the following:
1. Call select() and put the socket you want to read from into the READ FD set.
2. When select() returns with a positive number, your socket has data ready to be read.
3. Then check whether you can receive length bytes from the socket: recv(recvfd, buf, MAX_BYTES-1, MSG_PEEK). See man recv(2) for the MSG_PEEK param or look at MSDN, they have it as well.
4. Now you know how much data is available:
   - if there's less than length available, return and do nothing;
   - if there's at least length available, read length and return (if there's more than length available, we'll continue with step 2, since a new READ event will be signalled by select()).

To send discrete messages over a byte stream protocol, you have to encode messages into some kind of framing language. The network can chop up the protocol into arbitrarily sized packets, and so the receives do not correlate with your messages in any way. The receiver has to implement a state machine which recognizes frames.
A simple framing protocol is to have some length field (say two octets: 16 bits, for a maximum frame length of 65535 bytes). The length field is followed by exactly that many bytes.
You must not even assume that the length field itself is received all at once. You might ask for two bytes, but recv could return just one. This is unlikely for the very first message received from the socket, because network (or local IPC pipe, for that matter) segments are rarely just one byte long. But somewhere in the middle of the stream, it is possible that the first byte of the 16 bit length field could land on the last position of one network frame.
An easy way to deal with this is to use a buffered I/O library instead of raw operating system file handles. In a POSIX environment, you can take an open socket handle, and use the fdopen function to associate it with a FILE * stream. Then you can use functions like getc and fread to simplify the input handling (somewhat).
If in-band framing is not acceptable, then you have to use a protocol which supports framing, namely datagram type sockets. The main disadvantage of this is that the principal datagram-based protocol used over IP is UDP, and UDP is unreliable. This brings in a lot of complexity in your application to deal with out of order and missing frames. The size of the frames is also restricted by the maximum IP datagram size which is about 64 kilobytes, including all the protocol headers.
Large UDP datagrams get fragmented, which, if there is unreliability in the network, adds up to greater unreliability: if any IP fragment is lost, the entire packet is lost. All of it must be retransmitted; there is no way to just get a repetition of the fragment that was lost. The TCP protocol performs "path MTU discovery" to adjust its segment size so that IP fragmentation is avoided, and TCP has selective retransmission to recover missing segments.

I bet you've created a TCP socket using SOCK_STREAM, which would cause the three messages to be read into your buffer during the first recv call. If you want to read the messages one-by-one, create a UDP socket using SOCK_DGRAM, or develop some type of message format which allows you to parse your messages when they arrive in a stream (assuming your messages will not always be fixed length).

First send the length of the message as a field of fixed size (for example, always 4 bytes), then make recv() loop until length bytes have been received.
Note the fact (as already mentioned in other answers) that the size and number of chunks received do not necessarily need to be the same as sent. Only the sum of all bytes received shall be the same as the sum of all bytes sent.
Read the man pages for recv and send. Especially read the sections on what those functions RETURN.
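For the sending side, a sketch could look like this. It assumes a 4-byte length field in network byte order, which is one possible choice of fixed format; send_all and send_message are illustrative names.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Call send() until all len bytes are written. 0 on success, -1 on error. */
int send_all(int fd, const void *buf, size_t len)
{
    const unsigned char *p = buf;
    while (len > 0) {
        ssize_t n = send(fd, p, len, 0);   /* may accept fewer than len */
        if (n < 0) {
            if (errno == EINTR)
                continue;                  /* interrupted: retry */
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Send a 4-byte length (network byte order), then the message body. */
int send_message(int fd, const void *msg, uint32_t len)
{
    uint32_t wire_len = htonl(len);
    if (send_all(fd, &wire_len, sizeof wire_len) != 0)
        return -1;
    return send_all(fd, msg, len);
}
```

The receiver then reads 4 bytes in a loop, converts them with ntohl(), and loops again until that many body bytes have arrived.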

recv will block until at least some data is available, or the socket is closed; it does not wait for the entire buffer to be filled unless you pass MSG_WAITALL.
If you want to read at most length bytes and return, then you must only pass to recv a buffer of size length.
You can use select to determine if there are any bytes waiting to be read, use ioctl with FIONREAD to find out how many bytes are waiting, and then read only those bytes.
This can keep recv from blocking.
Edit:
After re-reading the docs, the following may be true: your three "messages" may be being read all-at-once since length + length + length < MAX_BYTES - 1.
Another possibility, if recv is never returning, is that you may need to flush your socket from the sender-side. The data may be waiting in a buffer to actually be sent to the receiver.

How large should my recv buffer be when calling recv in the socket library

I have a few questions about the socket library in C. Here is a snippet of code I'll refer to in my questions.
char recv_buffer[3000];
recv(socket, recv_buffer, 3000, 0);
How do I decide how big to make recv_buffer? I'm using 3000, but it's arbitrary.
what happens if recv() receives a packet bigger than my buffer?
how can I know if I have received the entire message without calling recv again and have it wait forever when there is nothing to be received?
is there a way I can make a buffer not have a fixed amount of space, so that I can keep adding to it without fear of running out of space? maybe using strcat to concatenate the latest recv() response to the buffer?
I know it's a lot of questions in one, but I would greatly appreciate any responses.

The answers to these questions vary depending on whether you are using a stream socket (SOCK_STREAM) or a datagram socket (SOCK_DGRAM) - within TCP/IP, the former corresponds to TCP and the latter to UDP.
How do you know how big to make the buffer passed to recv()?
SOCK_STREAM: It doesn't really matter too much. If your protocol is a transactional / interactive one just pick a size that can hold the largest individual message / command you would reasonably expect (3000 is likely fine). If your protocol is transferring bulk data, then larger buffers can be more efficient - a good rule of thumb is around the same as the kernel receive buffer size of the socket (often something around 256kB).
SOCK_DGRAM: Use a buffer large enough to hold the biggest packet that your application-level protocol ever sends. If you're using UDP, then in general your application-level protocol shouldn't be sending packets larger than about 1400 bytes, because they'll certainly need to be fragmented and reassembled.
What happens if recv gets a packet larger than the buffer?
SOCK_STREAM: The question doesn't really make sense as put, because stream sockets don't have a concept of packets - they're just a continuous stream of bytes. If there's more bytes available to read than your buffer has room for, then they'll be queued by the OS and available for your next call to recv.
SOCK_DGRAM: The excess bytes are discarded.
How can I know if I have received the entire message?
SOCK_STREAM: You need to build some way of determining the end-of-message into your application-level protocol. Commonly this is either a length prefix (starting each message with the length of the message) or an end-of-message delimiter (which might just be a newline in a text-based protocol, for example). A third, lesser-used, option is to mandate a fixed size for each message. Combinations of these options are also possible - for example, a fixed-size header that includes a length value.
SOCK_DGRAM: A single recv call always returns a single datagram.
Is there a way I can make a buffer not have a fixed amount of space, so that I can keep adding to it without fear of running out of space?
No. However, you can try to resize the buffer using realloc() (if it was originally allocated with malloc() or calloc(), that is).

For streaming protocols such as TCP, you can pretty much set your buffer to any size. That said, common values that are powers of 2 such as 4096 or 8192 are recommended.
If there is more data than your buffer can hold, it will simply be saved in the kernel for your next call to recv.
Yes, you can keep growing your buffer. To do a recv into the middle of the buffer, starting at offset idx, you would do:
recv(socket, recv_buffer + idx, recv_buffer_size - idx, 0);
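Putting the offset trick together with realloc(), a growable receive buffer might be sketched like this. The dynbuf type, the doubling policy and the 4096-byte starting size are assumptions for illustration.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/socket.h>

typedef struct {
    char  *data;
    size_t len;   /* bytes stored so far */
    size_t cap;   /* bytes allocated */
} dynbuf;

/* Append whatever one recv() call delivers, doubling the allocation when
 * the buffer is full. Returns recv()'s result: >0 bytes added, 0 on EOF,
 * -1 on error (including allocation failure). */
ssize_t dynbuf_recv(dynbuf *b, int fd)
{
    if (b->len == b->cap) {
        size_t new_cap = b->cap ? b->cap * 2 : 4096;
        char *p = realloc(b->data, new_cap);
        if (p == NULL)
            return -1;          /* the old block is still valid */
        b->data = p;
        b->cap = new_cap;
    }
    /* Receive into the free space after the bytes already stored. */
    ssize_t n = recv(fd, b->data + b->len, b->cap - b->len, 0);
    if (n > 0)
        b->len += (size_t)n;
    return n;
}
```

Start with a zero-initialised dynbuf and free(b.data) when done. Tracking the length explicitly also avoids strcat, which would misbehave on received data that contains zero bytes or lacks a terminator.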

If you have a SOCK_STREAM socket, recv just gets "up to the first 3000 bytes" from the stream. There is no clear guidance on how big to make the buffer: the only time you know how big a stream is, is when it's all done;-).
If you have a SOCK_DGRAM socket, and the datagram is larger than the buffer, recv fills the buffer with the first part of the datagram and the rest is discarded. On Windows (Winsock), recv additionally reports an error (WSAEMSGSIZE); on POSIX systems recv returns the number of bytes stored, and truncation is only reported through the MSG_TRUNC flag of recvmsg. Either way, if the protocol is UDP, the rest of the datagram is lost -- part of why UDP is called an unreliable protocol (I know that there are reliable datagram protocols but they aren't very popular -- I couldn't name one in the TCP/IP family, despite knowing the latter pretty well;-).
To grow a buffer dynamically, allocate it initially with malloc and use realloc as needed. But that won't help you with recv from a UDP source, alas.

For SOCK_STREAM socket, the buffer size does not really matter, because you are just pulling some of the waiting bytes and you can retrieve more in a next call. Just pick whatever buffer size you can afford.
For SOCK_DGRAM socket, you will get the fitting part of the waiting message and the rest will be discarded. You can get the waiting datagram size with the following ioctl:
#include <sys/ioctl.h>
int size;
ioctl(sockfd, FIONREAD, &size);
Alternatively you can use MSG_PEEK and MSG_TRUNC flags of the recv() call to obtain the waiting datagram size.
ssize_t size = recv(sockfd, buf, len, MSG_PEEK | MSG_TRUNC);
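MSG_PEEK makes recv look at the waiting message without removing it from the queue, and MSG_TRUNC (on Linux) makes recv return the real, not truncated, size of the datagram even if it is larger than len. The buffer itself is never overflowed: at most len bytes are copied into it.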
Then you can just malloc(size) the real buffer and recv() datagram.
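The peek-then-allocate sequence might look like the sketch below. It assumes Linux behaviour for MSG_TRUNC on datagram sockets; recv_datagram_alloc is a hypothetical helper name.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Peek at the next datagram's real size, allocate exactly that much, then
 * receive it. Returns a malloc'd buffer (caller frees) and sets *len, or
 * returns NULL on error. Assumes Linux semantics for MSG_TRUNC. */
void *recv_datagram_alloc(int fd, size_t *len)
{
    char probe;   /* tiny buffer: we only want the returned size */
    ssize_t size = recv(fd, &probe, sizeof probe, MSG_PEEK | MSG_TRUNC);
    if (size < 0)
        return NULL;

    void *buf = malloc(size > 0 ? (size_t)size : 1);   /* avoid malloc(0) */
    if (buf == NULL)
        return NULL;

    /* The datagram is still queued because of MSG_PEEK; now take it. */
    ssize_t got = recv(fd, buf, (size_t)size, 0);
    if (got < 0) {
        free(buf);
        return NULL;
    }
    *len = (size_t)got;
    return buf;
}
```

This costs two system calls per datagram, so for high packet rates a single fixed buffer sized for the largest expected datagram may be the simpler choice.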

There is no absolute answer to your question, because technology is always bound to be implementation-specific. I am assuming you are communicating in UDP because incoming buffer size does not bring problem to TCP communication.
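According to RFC 768, the packet size (UDP header included) can range from 8 to 65,515 bytes over IPv4 (65,535 minus the 20-byte IP header). Subtracting the 8-byte UDP header leaves 65,507 bytes of payload, so the fail-proof size for the incoming buffer is 65,507 bytes (~64 KB).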
However, not all large packets can be properly routed by network devices, refer to existing discussion for more information:
What is the optimal size of a UDP packet for maximum throughput?
What is the largest Safe UDP Packet Size on the Internet

16 KB is about right; if you're using gigabit Ethernet with jumbo frames, each packet could be about 9 KB in size.
