Good day.
Recently I've started studying some low-level network programming, as well as networking protocols, on Linux. For this purpose I decided to create a small networking library.
Now I'm wondering about a few questions; I'll ask one of them here.
As you know, there are at least two widely used protocols built on top of IP: TCP and UDP. Their implementations in the OS differ because of TCP's connection-oriented nature.
According to man 7 udp, each receive operation on a UDP socket returns exactly one packet. That is reasonable, since different datagrams may come from different sources.
A TCP connection, on the other hand, may be treated as a continuous byte stream.
Now, about the problem itself.
Say, I have an API for TCP connection socket and for UDP socket like:
void tcp_connection_recv(endpoint_t *ep, buffer_t *b);
void udp_recv(endpoint_t *ep, buffer_t *b);
The endpoint_t type describes the endpoint (remote for a TCP connection, local for UDP). The buffer_t type describes some kind of vector- or array-based buffer.
It is quite possible that the buffer is already allocated by the user, and for UDP I'm not sure it would be right to leave its size unchanged. So, to share code between the TCP and UDP operations, I think I would need to allocate as much buffer space as needed to hold all the received data.
Also, to avoid resizing the user's buffer, each socket could be mapped to its own buffer (a userspace buffer, but hidden from the user). Then, on the user's request, data would be copied from that inner buffer into the user's buffer, or read from the socket if not enough is buffered.
Any suggestions or opinions?
If you want to create such an API, it will depend on the service you want to provide. It will be different for TCP than for UDP, since TCP is stream oriented.
For TCP, instead of having tcp_connection_recv reallocate the buffer when the one passed by the user is not big enough, you can fill the whole buffer and return, perhaps with an output parameter indicating that there is more data waiting to be read. Basically, you can rely on the receive buffer the TCP connection already provides in the kernel; there is no need to create another one.
For UDP, you can ask the user for a number indicating the maximum datagram size they expect. When you read from a UDP socket with recvfrom, if you read less data than what arrived in the datagram, the rest of the datagram's data is lost. You can read first with the MSG_PEEK flag in order to find out how much data is available.
In general I wouldn't manage the buffer on behalf of the application, since the application (really, the application-layer protocol) is what knows how it expects to receive the data.
When will a TCP packet be fragmented at the application layer? When a TCP packet is sent from an application, will the recipient at the application layer ever receive it as two or more packets? If so, what conditions cause the packet to be divided? It seems like a packet won't be fragmented until it reaches the Ethernet (link-layer) limit of 1500 bytes. But that fragmentation will be transparent to the recipient at the application layer, since the network layer will reassemble the fragments before sending the packet up to the next layer, right?
It will be split when it hits a network device with a lower MTU than the packet's size. Most Ethernet devices use 1500, but it can often be smaller: 1492 if the Ethernet link is going over PPPoE (DSL) because of the extra encapsulation overhead, and even lower if another layer is added, as with Windows Internet Connection Sharing. And dial-up is normally 576!
In general, though, you should remember that TCP is not a packet protocol. It uses packets at the lowest level to transmit over IP, but as far as the interface of any TCP stack is concerned, it is a stream protocol and has no requirement to provide a 1:1 relationship to the physical packets sent or received (for example, most stacks will hold messages until a certain period of time has expired, or until there are enough messages to maximize the size of the IP packet for the given MTU).
As an example, if you send two "packets" (call your send function twice), the receiving program might only receive one "packet" (the receiving TCP stack might combine them). If you are implementing a message-type protocol over TCP, you should include a header at the beginning of each message (or some other header/footer mechanism) so that the receiving side can split the TCP stream back into individual messages, whether a message arrives in two parts or several messages arrive as one chunk.
Fragmentation should be transparent to a TCP application. Keep in mind that TCP is a stream protocol: you get a stream of data, not packets! If you are building your application based on the idea of complete data packets then you will have problems unless you add an abstraction layer to assemble whole packets from the stream and then pass the packets up to the application.
The question makes an assumption that is not true -- TCP does not deliver packets to its endpoints, rather, it sends a stream of bytes (octets). If an application writes two strings into TCP, it may be delivered as one string on the other end; likewise, one string may be delivered as two (or more) strings on the other end.
RFC 793, Section 1.5:
"The TCP is able to transfer a
continuous stream of octets in each
direction between its users by
packaging some number of octets into
segments for transmission through the
internet system."
The key words being continuous stream of octets (bytes).
RFC 793, Section 2.8:
"There is no necessary relationship
between push functions and segment
boundaries. The data in any particular
segment may be the result of a single
SEND call, in whole or part, or of
multiple SEND calls."
The entirety of section 2.8 is relevant.
At the application layer there are any number of reasons why the whole 1500 bytes may not show up in one read. Various factors inside the operating system and its TCP stack may cause the application to get some bytes in one read call and some in the next. Yes, the TCP stack has to re-assemble the packet before sending it up, but that doesn't mean your app is going to get it all in one shot (it will LIKELY get it in one read, but it's not GUARANTEED).
TCP tries to guarantee in-order delivery of bytes, with error checking, automatic re-sends, etc happening behind your back. Think of it as a pipe at the app layer and don't get too bogged down in how the stack actually sends it over the network.
This page is a good source of information about some of the issues that others have brought up, namely the need for data encapsulation on an application-protocol-by-application-protocol basis. It's not quite authoritative in the sense you describe, but it has examples and is sourced to some pretty big names in network programming.
If a packet exceeds the MTU of a network device, it will be broken up into multiple packets. (Note that most equipment is set to 1500 bytes, but this is not a necessity.)
The reconstruction of the packet should be entirely transparent to the applications.
Different network segments can have different MTU values. In that case fragmentation can occur. For more information see TCP Maximum segment size
This (de)fragmentation happens in the TCP layer. In the application layer there are no more packets. TCP presents a contiguous data stream to the application.
A the "application layer" a TCP packet (well, segment really; TCP at its own layer doesn't know from packets) is never fragmented, since it doesn't exist. The application layer is where you see the data as a stream of bytes, delivered reliably and in order.
If you're thinking about it otherwise, you're probably approaching something in the wrong way. However, this is not to say that there might not be a layer above this, say, a sequence of messages delivered over this reliable, in-order bytestream.
Correct - and the most informative way to see this is with Wireshark, an invaluable tool. Take the time to figure it out; it has saved me several times and gives a good reality check.
If a 3000-byte packet enters an Ethernet network with the default Ethernet MTU of 1500, it will be fragmented into two packets, each at most 1500 bytes in length. That is the only case I can think of.
Wireshark is your best bet for checking this. I have been using it for a while and am totally impressed
My program contains a thread that waits for UDP messages, and when a message is received it runs some functions before it goes back to listening. I am worried about missing a message, so my question is something along the lines of: how long is it possible to read a message after it has been sent? For example, if the message was sent while the thread was running the functions, could it still be read if the functions are short enough? I am looking for guidelines here, but an answer in microseconds would also be appreciated.
When your computer receives a UDP packet (and there is at least one program listening on the UDP port specified in that packet), the network stack will add that packet's data to a fixed-size buffer that is associated with that socket and kept in the kernel's memory space. The packet's data will stay in that buffer until your program calls recv() to retrieve it.
The gotcha is that if your computer receives the UDP packet and there isn't enough free space left inside the buffer to fit the new UDP packet's data, the computer will simply throw the UDP packet away -- it's allowed to do that, since UDP doesn't make any guarantees that a packet will arrive.
So the amount of time your program has to call recv() before packets start getting thrown away will depend on the size of the socket's in-kernel packet buffer, the size of the packets, and the rate at which the packets are being received.
Note that you can ask the kernel to make its receive-buffer size larger by calling something like this:
int bufSize = 64*1024; // Dear kernel: I'd like the buffer to be 64kB please! (SO_RCVBUF takes an int)
setsockopt(mySock, SOL_SOCKET, SO_RCVBUF, &bufSize, sizeof(bufSize));
… and that might help you avoid dropped packets. If that's not sufficient, you'll need to either make sure your program goes back to recv() quickly, or possibly do your network I/O in a separate thread that doesn't get held off by processing.
I have a question regarding send() on TCP sockets.
Is there a difference between:
char *text="Hello world";
char buffer[150];
for(i=0;i<10;i++)
send(fd_client, text, strlen(text) );
and
char *text="Hello world";
char buffer[150];
buffer[0]='\0';
for(i=0;i<10;i++)
strcat(buffer, text);
send(fd_client, buffer, strlen(buffer) );
Is there a difference for the receiver side using recv?
Are both going to be one TCP packet?
Even if TCP_NODELAY is set?
There's really no way to know; it depends on the implementation of TCP. If it were a UDP socket, they would definitely have different results: several packets in the first case and one packet in the second.
TCP is free to split up packets as it sees fit; it emulates a stream and abstracts its packet mechanics away from the user. This is by design.
TCP is a stream-based protocol. When you call send, it puts some data into the OS TCP layer's buffer, and the OS sends it out periodically. If you call send quickly enough, it may queue several arrays in the OS TCP layer before the previous ones have been sent; the OS then sends whatever it has, merged into one big array.
Sending is, by the way, done with segmentation by the OS TCP layer, and there is also Nagle's algorithm, which holds back small amounts of data until the OS buffer has enough to fill one segment.
So yes, there is a difference.
TCP is a stream-based protocol; you can't rely on a single send turning into a single receive with the same amount of data.
Data might get merged together, and you have to keep that in mind at all times.
By the way, based on your examples: with the single big send, the client will tend to receive all the bytes together or nothing, and if that one big segment is dropped somewhere along the way, the sender's OS will automatically resend it. The drop chance is higher for bigger packets, so resending big segments can waste some traffic, but whether that matters depends on the actual packet-loss rate and might not apply to your case at all.
With the ten separate sends, you might receive everything together, each part separately, or some parts merged. You never know, so you should implement your network reading so that you know how many bytes you expect to receive and read exactly that many. That way, even if some bytes are left unread, they will be picked up on the next read.
I have an application that sends data point to point from a sender to the receiver over an link that can operate in simplex (one way transmission) or duplex modes (two way). In simplex mode, the application sends data using UDP, and in duplex it uses TCP. Since a write on TCP socket may block, we are using Non Blocking IO (ioctl with FIONBIO - O_NONBLOCK and fcntl are not supported on this distribution) and the select() system call to determine when data can be written. NIO is used so that we can abort out of send early after a timeout if needed should network conditions deteriorate. I'd like to use the same basic code to do the sending but instead change between TCP/UDP at a higher abstraction. This works great for TCP.
However, I am concerned about how non-blocking IO works for a UDP socket. I may be reading the man pages incorrectly, but since write() may return indicating fewer bytes sent than requested, does that mean that the client will receive fewer bytes in its datagram? To send a given buffer of data, multiple writes may then be needed, which could well happen since I am using non-blocking IO. I am concerned that this will translate into multiple UDP datagrams received by the client.
I am fairly new to socket programming, so please forgive me if I have some misconceptions here. Thank you.
Assuming a correct (not broken) UDP implementation, then each send/sendmsg/sendto will correspond to exactly one whole datagram sent and each recv/recvmsg/recvfrom will correspond to exactly one whole datagram received.
If a UDP message cannot be transmitted in its entirety, you should receive an EMSGSIZE error. A sent message might still fail due to size at some point in the network, in which case it will simply not arrive. But it will not be delivered in pieces (unless the IP stack is severely buggy).
A good rule of thumb is to keep your UDP payload size to at most 1400 bytes. That is very approximate and leaves a lot of room for various forms of tunneling so as to avoid fragmentation.
Assume Linux and UDP is used.
The manpage of recvfrom says:
The receive calls normally return any data available, up to the requested amount, rather than waiting for receipt of the full amount requested.
If this is the case, then it is entirely possible to get back partial application-level protocol data from the socket, even if the desired MAX_SIZE is set.
Should a subsequent call to recvfrom be made?
On the other hand, it's also possible to have more data than I want, such as two UDP packets in the socket's buffer. If recvfrom() is called in this case, would it return both of them (assuming both fit within MAX_SIZE)?
I suppose there should be some application-protocol-level size info at the start of each UDP message so that things don't get mixed up.
I think the man page you want is this one. It states that the extra data will be discarded. If there are two packets, the recvfrom call will only retrieve data from the first one.
Well..I got a better answer after searching the web:
Don't be afraid of using a big buffer and specifying a big datagram size when reading... recv() will only read ONE datagram even if there are many of them in the receive buffer and they all fit into your buffer... remember, UDP is datagram oriented; all operations are on those packets, not on bytes...
A different scenario would be faced if you used TCP sockets. TCP doesn't have any boundary "concept", so you just read as many bytes as you want, and recv() will return a number of bytes equal to MIN(bytes_in_buffer, bytes_solicited_by_your_call).
REF: http://www.developerweb.net/forum/archive/index.php/t-3396.html