When will a TCP packet be fragmented at the application layer? When a TCP packet is sent from an application, will the recipient at the application layer ever receive the packet in two or more packets? If so, what conditions cause the packet to be divided. It seems like a packet won't be fragmented until it reaches the Ethernet (at the network layer) limit of 1500 bytes. But, that fragmentation will be transparent to the recipient at the application layer since the network layer will reassemble the fragments before sending the packet up to the next layer, right?
It will be split when it hits a network device with a lower MTU than the packet's size. Most ethernet devices are 1500, but it can often be smaller, like 1492 if that ethernet is going over PPPoE (DSL) because of the extra routing information, even lower if a second layer is added like Windows Internet Connection Sharing. And dialup is normally 576!
In general though you should remember that TCP is not a packet protocol. It uses packets at the lowest level to transmit over IP, but as far as the interface for any TCP stack is concerned, it is a stream protocol and has no requirement to provide you with a 1:1 relationship to the physical packets sent or received (for example most stacks will hold messages until a certain period of time has expired, or there are enough messages to maximize the size of the IP packet for the given MTU)
As an example if you sent two "packets" (call your send function twice), the receiving program might only receive 1 "packet" (the receiving TCP stack might combine them together). If you are implimenting a message type protocol over TCP, you should include a header at the beginning of each message (or some other header/footer mechansim) so that the receiving side can split the TCP stream back into individual messages, either when a message is received in two parts, or when several messages are received as a chunk.
Fragmentation should be transparent to a TCP application. Keep in mind that TCP is a stream protocol: you get a stream of data, not packets! If you are building your application based on the idea of complete data packets then you will have problems unless you add an abstraction layer to assemble whole packets from the stream and then pass the packets up to the application.
The question makes an assumption that is not true -- TCP does not deliver packets to its endpoints, rather, it sends a stream of bytes (octets). If an application writes two strings into TCP, it may be delivered as one string on the other end; likewise, one string may be delivered as two (or more) strings on the other end.
RFC 793, Section 1.5:
"The TCP is able to transfer a
continuous stream of octets in each
direction between its users by
packaging some number of octets into
segments for transmission through the
internet system."
The key words being continuous stream of octets (bytes).
RFC 793, Section 2.8:
"There is no necessary relationship
between push functions and segment
boundaries. The data in any particular
segment may be the result of a single
SEND call, in whole or part, or of
multiple SEND calls."
The entirety of section 2.8 is relevant.
At the application layer there are any number of reasons why the whole 1500 bytes may not show up one read. Various factors in the internal operating system and TCP stack may cause the application to get some bytes in one read call, and some in the next. Yes, the TCP stack has to re-assemble the packet before sending it up, but that doesn't mean your app is going to get it all in one shot (it is LIKELY will get it in one read, but it's not GUARANTEED to get it in one read).
TCP tries to guarantee in-order delivery of bytes, with error checking, automatic re-sends, etc happening behind your back. Think of it as a pipe at the app layer and don't get too bogged down in how the stack actually sends it over the network.
This page is a good source of information about some of the issues that others have brought up, namely the need for data encapsulation on an application protocol by application protocol basis Not quite authoritative in the sense you describe but it has examples and is sourced to some pretty big names in network programming.
If a packet exceeds the maximum MTU of a network device it will be broken up into multiple packets. (Note most equipment is set to 1500 bytes, but this is not a necessity.)
The reconstruction of the packet should be entirely transparent to the applications.
Different network segments can have different MTU values. In that case fragmentation can occur. For more information see TCP Maximum segment size
This (de)fragmentation happens in the TCP layer. In the application layer there are no more packets. TCP presents a contiguous data stream to the application.
A the "application layer" a TCP packet (well, segment really; TCP at its own layer doesn't know from packets) is never fragmented, since it doesn't exist. The application layer is where you see the data as a stream of bytes, delivered reliably and in order.
If you're thinking about it otherwise, you're probably approaching something in the wrong way. However, this is not to say that there might not be a layer above this, say, a sequence of messages delivered over this reliable, in-order bytestream.
Correct - the most informative way to see this is using Wireshark, an invaluable tool. Take the time to figure it out - has saved me several times, and gives a good reality check
If a 3000 byte packet enters an Ethernet network with a default MTU size of 1500 (for ethernet), it will be fragmented into two packets of each 1500 bytes in length. That is the only time I can think of.
Wireshark is your best bet for checking this. I have been using it for a while and am totally impressed
I'm getting trouble with packet segmentation. I've already read from many sources about GSO, which is a generalized way for segmenting a packet with size greater than the Ethernet MTU (1500 B). However, I have not found an answer for doubts that I have in mind.
If we add a new set of bytes (ex. a new header by the name 'NH') between L2 and L3 layer, the kernel must be able to pass through NH and adjust sk_buff pointer to the beginning of the L3 to offload the packet according to the 'policy' of the L3 protocol type (ex. IPv4 fragmentation). My thoughts were to modify skb_network_protocol() function. This function, if I'm not wrong, enables skb_mac_gso_segment() to properly call GSO function for different types of L3 protocol. However, I'm not being able to segment my packets properly.
I have a kernel module that forwards packets through the network (OVS, Open vSwitch). On the tests which I've been running (h1 --ping-- h2), the host generates large ICMP packets and then sends packets which are less or equal than MTU size. Those packets are received by the first switch which attaches the new header NH, so if a packet had 1500B, it becomes 1500B + NH length. Here is the problem, the switch has already received a fragmented packet from the host, and the switch adds more bytes in the packet (kind of VLAN does).
Therefore, at first, I tried to ping large packets, but it didn't work. In OVS, before calling dev_queue_xmit(), a packet can be segmented by calling skb_gso_segment(). However, the packet needs to go through a condition checked by netif_needs_gso(). But I'm not sure if I have to use skb_gso_segment() to properly segment the packet.
I also noticed that, for the needs_gso_segment() function be true, skb_shinfo(skb)->gso_size have to be true. However, gso_size has always zero value for all the received packets. So, I made a test by attributing a random value to gso_size (ex. 1448B). Now, on my tests, I was able to ping from h1 to h2, but the first 2 packets were lost. On another test, TCP had a extremely poor performance. And since then, I've been getting a kernel warning: "[ 5212.694418] [c1642e50] ? skb_warn_bad_offload+0xd0/0xd8
"
For small packets (< MTU) I got no trouble and ping works fine. TCP works fine, but for small window size.
Someone has any idea for what's happening? Should I always use GSO when I get large packets? Is it possible to fragment a fragmented IPv4 packets?
As the new header lies between L2 and L3, I guess the enlargement of a IPv4 packet due to the additional header, is similar to what happens with VLAN. How VLAN can handle the segmentation problem?
Thanks in advance,
I want to do my own very simple implementation of VPN in C on Linux. For that purpose I'm going to capture IP packets, modify them and send forward. The modification consists of encryption, authentication and other stuff like in IPSec. My question is should I process somehow the size of packets or this will be handled automatically? I know it's maximum size is 65535 - 20 (for header) but accoring to MTU it is lesser. I think its because encrypted payload "incapsulated into UDP" for NAT-T is much bigger then just "normal payload" of the IP packet.
Well, I found that there actually 2 ways to handle that problem:
1) We can send big packets by settings DF flag to tell we want fragment out packets. But in this case packet can be lost, because not all the devices/etc support packet fragmentation
2) We can automatically calculate our maximum MTU between hosts, split them and send. On another side we put all this packets together and restore them. This can be done by implementing our own "system" for this purpose.
More about IP packets fragmentation and reassembly you can read here
I'm maintaining some network driver and I've got some problems with lost of data. The effect is that when I send for example ICMP or UDP ping using ping or nping some of the udp/icmp packets are lost.
I'm sure that on ping/nping side of the transmission the ping reply is received by my driver and the kernel (tcpdump shows incoming udp or icmp packets as a reply).
But application ping/nping shows sometimes that for example 80% packets are lost. I suspect that those packets are lost somewhere between kernel and user space.
I know that for UDP there is procedure udp_rcv() for maintenance of UDP packets, but I don't know which procedure is next in the path of delivering of the packet to user space application.
Linux kernel is in version 3.3.8.
My question is - how to trace the whole path of transition of the packet from my driver to user space socket buffer?
udp_rcv() is a callback that is passed to struct net_protocol as a .handler.
You may either look at usage of this handler field in the structure, or you can also see if some error occurs. There is a callback err_handler. Maybe packet loss happens here and error handler will be called.
P. S. Remember that UDP does not guarantee 100% transmit success, and one lost packet out of 100 might be expected behavior. (:
I am working on a program where I'm reading from a Tap. The only issue is, I have no clue how to detect the end of one transmission to the tap and the start of another.
Does reading from the tap act the same way as a SOCK_STREAM ?
Tun/tap tries to look like a regular ethernet controller, but the tap device itself is accessed just like any other file descriptor.
Since it pretends to be an ethernet controller, you have to know in advance how big the ethernet frame itself was that was transmitted - this comes either from the software bridge that the tap device was attached to or the "length" field in the raw ethernet frame.
This, of course can only be the maximum of the MTU size of the tap device, which typically defaults to 1500 bytes.
So, before you do a read() on the file descriptor for the tap device, you've gotta figure out how big the ethernet frame actually is.