Why does increasing the packet size make traceroute not map the final destination? - icmp

I'm struggling to understand why increasing the packet size prevents the final destination from responding.
I did
traceroute /domain_name/ 1000
and
traceroute -I /domain_name/ 1000

Probably because some link on the path has an MTU (maximum transmission unit) of less than 1000 bytes. The default traceroute packet size is 40 bytes IIRC, so it's not terribly surprising that at least one link in your path fails to forward your packets when the size jumps from 40 to 1000. See here for more information: https://www.cisco.com/c/en/us/support/docs/ip/generic-routing-encapsulation-gre/25885-pmtud-ipfrag.html
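If you want to see what MTU the kernel currently thinks applies toward a destination, a minimal Linux-specific sketch is shown below (the 192.0.2.1 address and the port are placeholders; IP_MTU is only valid on a connected socket):

/* Sketch: ask the kernel for its current path-MTU estimate toward a host (Linux-specific). */
#include <stdio.h>
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    int pmtu = IP_PMTUDISC_DO;               /* set DF so the path MTU matters */
    setsockopt(fd, IPPROTO_IP, IP_MTU_DISCOVER, &pmtu, sizeof(pmtu));

    struct sockaddr_in dst;
    memset(&dst, 0, sizeof(dst));
    dst.sin_family = AF_INET;
    dst.sin_port   = htons(33434);                     /* traceroute-style placeholder port */
    inet_pton(AF_INET, "192.0.2.1", &dst.sin_addr);    /* placeholder address */
    connect(fd, (struct sockaddr *)&dst, sizeof(dst)); /* IP_MTU needs a connected socket */

    int mtu = 0;
    socklen_t len = sizeof(mtu);
    getsockopt(fd, IPPROTO_IP, IP_MTU, &mtu, &len);
    /* Initially this reflects the route/interface MTU; it shrinks if the kernel
     * later receives ICMP "fragmentation needed" for a smaller link on the path. */
    printf("current path MTU estimate: %d bytes\n", mtu);

    close(fd);
    return 0;
}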

Related

Packet reassembly at the network layer (libpcap)

Environment
As per my understanding, the network layer is responsible for reassembling fragmented datagrams and then supplies the reassembled data to the transport layer above.
I have collected packet traces using libpcap and I want to reassemble fragmented packets at layer 3 on my own.
This link says that I need the fragment flag, fragment offset, identification number and a buffer for reassembling the fragments.
Question
When the first fragment arrives, how do I know what size the buffer should be for complete reassembly of the datagram?
Thanks.
The IP header only gives you the size of the fragment. So you need to reserve a buffer the size of the largest possible IP packet, i.e. 65535 bytes. Only once you get the last fragment can you determine the length of the complete packet.
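As a minimal sketch of that approach (my own illustration, assuming Linux's struct iphdr from <netinet/ip.h>): reserve a worst-case 65535-byte buffer, copy each fragment at fragment-offset x 8, and fix the total length when the fragment with the MF (More Fragments) flag clear arrives.

/* Sketch: copy one IPv4 fragment into a worst-case sized reassembly buffer.
 * 'ip' points at the fragment's IPv4 header inside a captured frame. */
#include <stdint.h>
#include <string.h>
#include <netinet/ip.h>   /* struct iphdr (Linux) */
#include <arpa/inet.h>

struct reasm {
    uint8_t data[65535];   /* largest possible IPv4 datagram */
    size_t  total_len;     /* known only once the last fragment has arrived */
    int     have_last;
};

void add_fragment(struct reasm *r, const struct iphdr *ip)
{
    size_t   hlen   = ip->ihl * 4;
    size_t   paylen = ntohs(ip->tot_len) - hlen;
    uint16_t fo     = ntohs(ip->frag_off);
    size_t   offset = (size_t)(fo & 0x1FFF) * 8;   /* offset is in 8-byte units */
    int      more   = (fo & 0x2000) != 0;          /* MF flag */

    if (offset + paylen > sizeof(r->data))
        return;                                    /* malformed fragment */

    memcpy(r->data + offset, (const uint8_t *)ip + hlen, paylen);

    if (!more) {                                   /* last fragment fixes the length */
        r->have_last = 1;
        r->total_len = offset + paylen;
    }
    /* A real implementation would also key fragments by (src, dst, protocol, id)
     * and track which byte ranges have arrived before declaring the datagram complete. */
}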

Packet encapsulation for my own simple VPN

I want to do my own very simple implementation of a VPN in C on Linux. For that purpose I'm going to capture IP packets, modify them and send them on. The modification consists of encryption, authentication and other stuff like in IPsec. My question is: should I handle the size of the packets myself, or will this be handled automatically? I know the maximum size is 65535 - 20 (for the header), but according to the MTU it is less. I think that's because the encrypted payload encapsulated into UDP for NAT-T is much bigger than just the normal payload of the IP packet.
Well, I found that there are actually two ways to handle this problem:
1) We can send big packets with the DF (Don't Fragment) flag cleared, signalling that our packets may be fragmented along the way. But in this case packets can be lost, because not all devices on the path handle fragmentation well.
2) We can work out the maximum MTU of the path between the hosts ourselves, split the packets accordingly and send the pieces. On the other side we put all these packets back together and restore the original data. This can be done by implementing our own scheme for this purpose (sketched below).
You can read more about IP packet fragmentation and reassembly here.
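As a rough illustration of option 2 (my own sketch; the chunk header, the overhead constant and send_encrypted() are invented placeholders, and path_mtu would come from your own path-MTU discovery):

/* Sketch: split a captured packet into tunnel-sized chunks (option 2 above). */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define TUNNEL_OVERHEAD 80        /* assumed outer IP + UDP + crypto/auth bytes */

struct chunk_hdr {                /* hypothetical per-chunk header */
    uint32_t packet_id;           /* which original packet this chunk belongs to */
    uint16_t index;               /* chunk number within that packet */
    uint16_t last;                /* 1 if this is the final chunk */
};

/* Stand-in for "encrypt, add the outer UDP/IP headers, sendto()". */
static void send_encrypted(const struct chunk_hdr *h, const uint8_t *p, size_t n)
{
    (void)p;
    printf("chunk %u of packet %u: %zu bytes%s\n",
           h->index, h->packet_id, n, h->last ? " (last)" : "");
}

void tunnel_send(uint32_t id, const uint8_t *pkt, size_t len, size_t path_mtu)
{
    size_t   max_payload = path_mtu - TUNNEL_OVERHEAD - sizeof(struct chunk_hdr);
    uint16_t index = 0;

    while (len > 0) {
        size_t n = len < max_payload ? len : max_payload;
        struct chunk_hdr h = { id, index++, (n == len) };
        send_encrypted(&h, pkt, n);
        pkt += n;
        len -= n;
    }
}

On the receiving side you would collect chunks with the same packet_id until the one marked last arrives, then decrypt and reinject the restored packet.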

What is the maximum buffer length allowed by the sendto function in C

I am implementing a simple network stack using UDP sockets and I wish to send about 1 MB of string data from the client to the server. However, I am not aware whether there is a limit on the length that can be passed to the UDP sendto() API in C. If there is a limit and sendto() won't handle packetization beyond it, I would have to manually split the string into smaller blocks and send them.
Is there a limit on the length of the buffer? Or does the sendto() API handle packetization by itself?
Any insight is appreciated.
There's no API limit on sendto -- it can handle any size the underlying protocol can.
There IS a packet size limit for UDP -- 64K; if you exceed this, the sendto call will fail with an EMSGSIZE error code. There are also packet size limits for IP which differ between IPv4 and IPv6. Finally, the low-level transport has an MTU size which may or may not be an issue. IP packets can be fragmented into multiple lower-level packets and automatically reassembled, unless you've disabled fragmentation with a setsockopt call (on Linux, IP_MTU_DISCOVER with IP_PMTUDISC_DO, which sets the DF bit).
The easiest way to deal with all this complexity is to make your code flexible -- detect EMSGSIZE errors from sendto and switch to using smaller messages when you get one (see the sketch below). This also works well if you want to do path MTU discovery, which will generally accept larger messages at first, but will cut down the maximum message size when you send one that ends up exceeding the path MTU.
If you just want to avoid worrying about it, a send of 1452 bytes or less is likely to always be fine (that's the 1500 byte normal ethernet payload max minus 40 for a normal IPv6 header and 8 for a UDP header), unless you're using a VPN (in which case you need to worry about encapsulation overhead).
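A minimal sketch of the "detect EMSGSIZE and shrink" idea described above (fd is assumed to be an already-created UDP socket, and the starting chunk size is the IPv4 maximum):

/* Sketch: send a large buffer over UDP, shrinking the datagram size
 * whenever sendto() reports EMSGSIZE. */
#include <errno.h>
#include <sys/types.h>
#include <sys/socket.h>

ssize_t send_all(int fd, const struct sockaddr *dst, socklen_t dlen,
                 const char *buf, size_t len)
{
    size_t chunk = 65507;                  /* max UDP payload over IPv4 */
    size_t sent  = 0;

    while (sent < len) {
        size_t  n = (len - sent) < chunk ? (len - sent) : chunk;
        ssize_t r = sendto(fd, buf + sent, n, 0, dst, dlen);

        if (r < 0 && errno == EMSGSIZE && chunk > 512) {
            chunk /= 2;                    /* too big for this path: try smaller */
            continue;
        }
        if (r < 0)
            return -1;                     /* some other error */
        sent += (size_t)r;
    }
    return (ssize_t)sent;
}

Keep in mind that each successful sendto() is still one independent datagram: the receiver has to put the pieces back together and cope with loss and reordering itself.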

UDP sendto performance over loopback

Background
I have a very high throughput / low latency network app (goal is << 5 usec per packet) and I wanted to add some monitoring/metrics to it. I have heard about the statsd craze and it seems a simple way to collect metrics and feed them into our time series database. Sending a metric is done via a small UDP packet written to a daemon (typically running on the same server).
I wanted to characterize the effects of sending ~5-10 UDP packets in my data path to understand how much latency it would add, and was surprised at how bad it is. I know this is a very obscure micro-benchmark, but I just wanted to get a rough idea of where it lands.
The question I have
I am trying to understand why it takes so long (relatively speaking) to send a UDP packet to localhost versus a remote host. Are there any tweaks I can make to reduce the latency of sending a UDP packet? I am thinking the solution for me is to push metric collection to an auxiliary core, or actually run the statsd daemon on a separate host.
My setup/benchmarks
CentOS 6.5 with some beefy server hardware.
The client test program I have been using is available here: https://gist.github.com/rishid/9178261
Compiled with gcc 4.7.3: gcc -O3 -std=gnu99 -mtune=native udp_send_bm.c -lrt -o udp_send_bm
The receiver side is running nc -ulk 127.0.0.1 12000 > /dev/null (the IP changes per interface).
I have run this micro-benchmark with the following devices.
Some benchmark results:
loopback
Packet Size 500 // Time per sendto() 2159 nanosec // Total time 2.159518
integrated 1 Gb mobo controller
Packet Size 500 // Time per sendto() 397 nanosec // Total time 0.397234
intel ixgbe 10 Gb
Packet Size 500 // Time per sendto() 449 nanosec // Total time 0.449355
solarflare 10 Gb with userspace stack (onload)
Packet Size 500 // Time per sendto() 317 nanosec // Total time 0.317229
Writing to loopback will not be an efficient way to communicate between processes for profiling. Generally the buffer will be copied multiple times before it's processed, and you run the risk of dropping packets since you're using UDP. You're also making additional calls into the operating system, so you add to the risk of context switching (~2 us).
goal is << 5 usec per packet
Is this a hard real-time requirement, or a soft requirement? Generally, when you're handling things in microseconds, profiling should be zero overhead. You're using Solarflare, so I think you're serious. The best way I know to do this is to tap into the physical line and sniff the traffic for metrics. A number of products do this.
I/O to disk or the network is very slow if you incorporate it in a very tight (real-time) processing loop. A solution might be to offload the I/O to a separate, lower-priority task. Let the real-time loop pass the messages to the I/O task through a (preferably lock-free) queue, as sketched below.
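A minimal sketch of that idea (a single-producer/single-consumer ring buffer feeding a background sender thread; the queue depth, message size and the 127.0.0.1:8125 statsd address are made up):

/* Sketch: keep the hot loop away from sendto() by queueing metrics to a
 * background thread.  One producer (hot path), one consumer (sender thread). */
#include <stdatomic.h>
#include <pthread.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

#define QCAP   1024               /* queue depth (power of two) */
#define MSGMAX 128                /* max metric message size */

struct slot { char msg[MSGMAX]; size_t len; };

static struct slot ring[QCAP];
static _Atomic size_t head, tail;    /* head: next write, tail: next read */

/* Called from the latency-sensitive loop: copy the metric and return immediately. */
int metric_enqueue(const char *msg, size_t len)
{
    size_t h = atomic_load_explicit(&head, memory_order_relaxed);
    size_t t = atomic_load_explicit(&tail, memory_order_acquire);
    if (h - t == QCAP || len > MSGMAX)
        return -1;                               /* queue full: drop the metric */
    memcpy(ring[h % QCAP].msg, msg, len);
    ring[h % QCAP].len = len;
    atomic_store_explicit(&head, h + 1, memory_order_release);
    return 0;
}

/* Background thread: the only place that touches the socket.
 * Start it once with: pthread_t tid; pthread_create(&tid, NULL, metric_sender, NULL); */
void *metric_sender(void *arg)
{
    (void)arg;
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    struct sockaddr_in dst = { .sin_family = AF_INET, .sin_port = htons(8125) };
    inet_pton(AF_INET, "127.0.0.1", &dst.sin_addr);   /* placeholder statsd address */

    for (;;) {
        size_t t = atomic_load_explicit(&tail, memory_order_relaxed);
        if (t == atomic_load_explicit(&head, memory_order_acquire)) {
            usleep(100);                          /* nothing queued: back off */
            continue;
        }
        sendto(fd, ring[t % QCAP].msg, ring[t % QCAP].len, 0,
               (struct sockaddr *)&dst, sizeof(dst));
        atomic_store_explicit(&tail, t + 1, memory_order_release);
    }
    return NULL;
}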

libpcap format - packet header - incl_len / orig_len

The libpcap packet header structure has 2 length fields:
typedef struct pcaprec_hdr_s {
    guint32 ts_sec;   /* timestamp seconds */
    guint32 ts_usec;  /* timestamp microseconds */
    guint32 incl_len; /* number of octets of packet saved in file */
    guint32 orig_len; /* actual length of packet */
} pcaprec_hdr_t;
incl_len: the number of bytes of packet data actually captured and saved in the file. This value should never become larger than orig_len or the snaplen value of the global header.
orig_len: the length of the packet as it appeared on the network when it was captured. If incl_len and orig_len differ, the actually saved packet size was limited by snaplen.
Can anyone tell me what the difference between the 2 length fields is? We are saving the packet in its entirety, so how can the 2 differ?
Reading through the documentation at the Wireshark wiki ( http://wiki.wireshark.org/Development/LibpcapFileFormat ) and studying an example pcap file, it looks like incl_len and orig_len are usually the same quantity. The only time they will differ is if the length of the packet exceeded the size of snaplen, which is specified in the global header for the file.
I'm just guessing here, but I imagine that snaplen specifies the size of the static buffer used for capturing. In the event that a packet was too large for the capture buffer, this is the format's method for signaling that fact. snaplen is documented to "usually" be 65535, which is large enough for most packets. But the documentation stipulates that the size might be limited by the user.
Can anyone tell me what the difference between the 2 length fields is? We are saving the packet in its entirety, so how can the 2 differ?
If you're saving the entire packet, the 2 shouldn't differ.
However, if, for example, you run tcpdump or TShark or dumpcap or a capture-from-the-command-line Wireshark and specify a small value with the "-s n" flag, or specify a small value in the "Limit each packet to [n] bytes" option in the Wireshark GUI, then libpcap/WinPcap will be passed that value and will only supply the first n bytes of each packet to the program, and the entire packet won't be saved.
A limited "snapshot length" means you don't see all the packet data, so some analysis might not be possible. But it also means that less memory is needed in the OS to buffer packets (so fewer packets might be dropped), less CPU bandwidth is needed to copy packet data to the application, less disk bandwidth is needed to save packets to disk if the application is saving them (which might also reduce the number of packets dropped), and less disk space is needed for the saved packets.
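For illustration, here is a small sketch that walks a classic pcap file using the record-header layout quoted above (with standard fixed-width types instead of guint32) and reports records where incl_len is smaller than orig_len, i.e. packets truncated by the snaplen:

/* Sketch: scan a classic pcap file and report records truncated by snaplen. */
#include <stdio.h>
#include <stdint.h>

struct pcap_rec_hdr {
    uint32_t ts_sec;     /* timestamp seconds */
    uint32_t ts_usec;    /* timestamp microseconds */
    uint32_t incl_len;   /* octets of the packet saved in the file */
    uint32_t orig_len;   /* actual length of the packet on the wire */
};

int main(int argc, char **argv)
{
    if (argc != 2) { fprintf(stderr, "usage: %s file.pcap\n", argv[0]); return 1; }

    FILE *f = fopen(argv[1], "rb");
    if (!f) { perror("fopen"); return 1; }

    unsigned char ghdr[24];                  /* global header: magic, version, snaplen, ... */
    if (fread(ghdr, 1, sizeof(ghdr), f) != sizeof(ghdr)) { fclose(f); return 1; }
    /* Assumes the file's byte order matches the host's; a real reader would check
     * the magic number (0xa1b2c3d4 vs. 0xd4c3b2a1) and byte-swap if needed. */

    struct pcap_rec_hdr rh;
    unsigned long n = 0;
    while (fread(&rh, sizeof(rh), 1, f) == 1) {
        n++;
        if (rh.incl_len < rh.orig_len)
            printf("record %lu: truncated, saved %u of %u bytes\n",
                   n, rh.incl_len, rh.orig_len);
        fseek(f, (long)rh.incl_len, SEEK_CUR);   /* skip the captured packet data */
    }
    fclose(f);
    return 0;
}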
