How to Find TCP Retransmissions while sniffing packets in C

I've written a simple source file that reads pcap files using the libpcap library in C. I can parse the packets one by one and analyze them up to a point. I want to be able to deduce whether a TCP packet I parsed is a TCP retransmission or not. After searching the web extensively, I've concluded that in order to do so I need to track the traffic behaviour, which also means analyzing previously received packets.
What I actually want to achieve is to do, on a basic level, what the tcp.analysis.retransmission filter does in Wireshark.
This is an MRE that reads a pcap file and analyzes the TCP packets sent over IPv4. The function find_retransmissions is where the packet is analyzed.
#include <pcap.h>
#include <stdio.h>
#include <netinet/ip.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <stdlib.h>
#include <net/ethernet.h>
#include <string.h>
#include <arpa/inet.h> // inet_ntoa, ntohs
void process_packet(u_char *,const struct pcap_pkthdr * , const u_char *);
void find_retransmissions(const u_char * , int );
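int main()
{
pcap_t *handle;
char errbuff[PCAP_ERRBUF_SIZE];
handle = pcap_open_offline("smallFlows.pcap", errbuff);
if (handle == NULL) // pcap_open_offline returns NULL on failure
{
fprintf(stderr, "pcap_open_offline: %s\n", errbuff);
return 1;
}
pcap_loop(handle, -1, process_packet, NULL);
pcap_close(handle);
return 0;
}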
void process_packet(u_char *args,
const struct pcap_pkthdr * header,
const u_char *buffer)
{
int size = header->len;
struct ethhdr *eth = (struct ethhdr *)buffer;
if(ntohs(eth->h_proto) == ETH_P_IP) //Check if IPv4 (EtherType 0x0800, stored in network byte order)
{
struct iphdr *iph = (struct iphdr*)(buffer +sizeof(struct ethhdr));
if(iph->protocol == 6) //Check if TCP
{
find_retransmissions(buffer,size);
}
}
}
void find_retransmissions(const u_char * Buffer, int Size)
{
static struct iphdr previous_packets[20000];
static struct tcphdr previous_tcp[20000];
static int index = 0;
static int retransmissions = 0;
int retransmission = 0;
struct sockaddr_in source,dest;
unsigned short iphdrlen;
// IP header
struct iphdr *iph = (struct iphdr *)(Buffer + sizeof(struct ethhdr));
previous_packets[index] = *iph;
iphdrlen =iph->ihl*4;
memset(&source, 0, sizeof(source));
source.sin_addr.s_addr = iph->saddr;
memset(&dest, 0, sizeof(dest));
dest.sin_addr.s_addr = iph->daddr;
// TCP header
struct tcphdr *tcph=(struct tcphdr*)(Buffer
+ iphdrlen
+ sizeof(struct ethhdr));
previous_tcp[index]=*tcph;
index++;
int header_size = sizeof(struct ethhdr) + iphdrlen + tcph->doff*4;
unsigned int segmentlength;
segmentlength = Size - header_size;
/* First check if a same TCP packet has been received */
for(int i=0;i<index-1;i++)
{
// Check if packet has been resent
unsigned short temphdrlen;
temphdrlen = previous_packets[i].ihl*4;
// First check IP header
if ((previous_packets[i].saddr == iph->saddr) // Same source IP address
&& (previous_packets[i].daddr == iph->daddr) // Same destination Ip address
&& (previous_packets[i].protocol == iph->protocol) //Same protocol
&& (temphdrlen == iphdrlen)) // Same header length
{
// Then check TCP header
if((previous_tcp[i].source == tcph->source) // Same source port
&& (previous_tcp[i].dest == tcph->dest) // Same destination port
&& (previous_tcp[i].th_seq == tcph->th_seq) // Same sequence number
&& (previous_tcp[i].th_ack==tcph->th_ack) // Same acknowledge number
&& (previous_tcp[i].th_win == tcph->th_win) // Same window
&& (previous_tcp[i].th_flags == tcph->th_flags) // Same flags
&& (tcph->syn==1 || tcph->fin==1 ||segmentlength>0)) // SYN or FIN is set,
{ // or the segment length is > 0
// At this point the packets are almost identical
// Now Check previous communication to check for retransmission
for(int z=index-1;z>=0;z--)
{
// Find packets going to the reverse direction
if ((previous_packets[z].daddr == iph->saddr) // Swapped IP source addresses
&& (previous_packets[z].saddr ==iph->daddr) // Same for IP dest addreses
&& (previous_packets[z].protocol == iph->protocol)) // Same protocol
{
if((previous_tcp[z].dest==tcph->source) // Swapped ports
&& (previous_tcp[z].source==tcph->dest)
&& (previous_tcp[z].th_seq-1 != tcph->th_ack) // Not Keepalive
&& (tcph->syn==1 // Either SYN is set
|| tcph->fin==1 // Either FIN is set
|| (segmentlength>0)) // Either segmentlength >0
&& (previous_tcp[z].th_seq>tcph->th_seq) // Next sequence number is
// bigger than the expected
&& (previous_tcp[z].ack != 1)) // Last seen ACK is set
{
retransmission = 1;
retransmissions++;
break;
}
}
}
}
}
}
if (retransmission == 1)
{
printf("Retransmission: True\n");
printf("\n\n******************IPv4 TCP Packet*************************\n");
printf(" |-IP Version : %d\n",(unsigned int)iph->version);
printf(" |-Source IP : %s\n" , inet_ntoa(source.sin_addr) );
printf(" |-Destination IP : %s\n" , inet_ntoa(dest.sin_addr) );
printf(" |-Source Port : %u\n", ntohs(tcph->source));
printf(" |-Destination Port : %u\n", ntohs(tcph->dest));
printf(" |-Protocol : %d\n",(unsigned int)iph->protocol);
printf(" |-IP Header Length : %d DWORDS or %d Bytes\n",
(unsigned int)iph->ihl,((unsigned int)(iph->ihl))*4);
printf(" |-Payload Length : %d Bytes\n",Size - header_size);
}
printf("Total Retransmissions: %d\n",retransmissions);
}
This approach is based on the Wireshark wiki paragraph about retransmission. I have clicked through every page Google has to offer on how to approach this analysis, but this was the only thing I was able to find.
The results I get are only partly correct: some retransmissions go unnoticed, I get a lot of DUP-ACK packets, and some normal traffic gets through as well (checked with Wireshark). I use the smallFlows.pcap file found here, and I believe the results should match the tcp.analysis.retransmission && not tcp.analysis.spurious_retransmission filter in Wireshark, which amounts to 88 retransmissions for this pcap.
Running this code yields 45 and I can't understand why.
Sorry for the messy if statements, I tried my best to clean them up.

To detect a retransmission you have to keep track of the next expected sequence number. If a packet's sequence number is lower than the next expected one, the packet is a retransmission (see the TCP Analysis chapter of the Wireshark docs,
https://www.wireshark.org/docs/wsug_html_chunked/ChAdvTCPAnalysis.html ):
TCP Retransmission
Set when all of the following are true:
This is not a keepalive packet.
In the forward direction, the segment length is greater than zero or the SYN or FIN flag is set.
The next expected sequence number is greater than the current sequence number
Besides TCP Retransmission there are also TCP Spurious Retransmission and TCP Fast Retransmission.
Basically, a retransmission is only necessary if a packet is lost.
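The three quoted conditions can be sketched as a small piece of per-direction state. This is a minimal sketch under stated assumptions, not Wireshark's implementation: struct flow_dir, seq_before and is_retransmission are hypothetical names, the caller is assumed to keep one flow_dir per (source IP, source port, destination IP, destination port) and to pass sequence numbers already converted with ntohl(), and the keepalive test is a simplified heuristic (at most one byte, sent at the next expected sequence number minus one).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-direction state, one instance per
 * (source IP, source port, destination IP, destination port). */
struct flow_dir {
    bool     seen;      /* has any segment been recorded yet? */
    uint32_t next_seq;  /* next expected sequence number, host byte order */
};

/* Wraparound-safe "a comes before b" for 32-bit TCP sequence numbers. */
static bool seq_before(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) < 0;
}

/* Returns true if the segment satisfies the three quoted conditions.
 * Otherwise it advances next_seq when the segment carries new data.
 * seq must already be in host byte order (ntohl applied by the caller). */
static bool is_retransmission(struct flow_dir *d, uint32_t seq,
                              uint32_t payload_len, bool syn, bool fin)
{
    /* SYN and FIN each consume one sequence number. */
    uint32_t seg_len = payload_len + (syn ? 1u : 0u) + (fin ? 1u : 0u);
    uint32_t end = seq + seg_len;

    if (!d->seen) {                 /* first segment: nothing to compare with */
        d->seen = true;
        d->next_seq = end;
        return false;
    }
    if (seg_len == 0)
        return false;               /* pure ACK: no data, no SYN, no FIN */

    /* Simplified keepalive heuristic (an assumption of this sketch). */
    if (!syn && !fin && payload_len <= 1 && seq == d->next_seq - 1)
        return false;

    if (seq_before(seq, d->next_seq))
        return true;                /* starts below the next expected seq */

    if (seq_before(d->next_seq, end))
        d->next_seq = end;          /* new data: move the expectation forward */
    return false;
}
```

Comparing through a signed 32-bit difference keeps the check correct when the sequence space wraps around, and it only works on values in host byte order, so raw th_seq fields need ntohl() first.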
[Figure: analyzing lost segment inconsistency; source of graphic: http://www.opentextbooks.org.hk/ditatopic/3578]
To detect this type of fault in Wireshark, the filter tcp.analysis.ack_lost_segment is used. Maybe try to implement this.
(https://serverfault.com/questions/626273/how-can-i-write-a-filter-to-get-tcp-sequence-number-inconsisten)
In wireshark several filters can be applied to capture all types of inconsistencies in sequence numbers i.e. tcp.analysis.retransmission, tcp.analysis.spurious_retransmission and tcp.analysis.fast_retransmission, for the general case of packet loss check for tcp.analysis.ack_lost_segment
https://superuser.com/questions/828294/how-can-i-get-the-actual-tcp-sequence-number-in-wireshark
By default Wireshark and TShark will keep track of all TCP sessions
and implement its own crude version of Sliding_Windows. This requires
some extra state information and memory to be kept by the dissector
but allows much better detection of interesting TCP events such as
retransmissions. This allows much better and more accurate
measurements of packet-loss and retransmissions than is available in
any other protocol analyzer. (But it is still not perfect)
This feature should not impact too much on the run-time memory
requirements of Wireshark but can be disabled if required.
When this feature is enabled the sliding window monitoring inside
Wireshark will detect and trigger display of interesting events for
TCP such as :
TCP Retransmission - Occurs when the sender retransmits a packet after the expiration of the acknowledgement.
TCP Fast Retransmission - Occurs when the sender retransmits a packet before the expiration of the acknowledgement timer. Senders
receive some packets which sequence number are bigger than the
acknowledged packets. Senders should Fast Retransmit upon receipt of 3
duplicate ACKs.
...
source : https://gitlab.com/wireshark/wireshark/-/wikis/TCP_Analyze_Sequence_Numbers

The concept of re-transmission is simple: data that was sent, was sent again.
In TCP, every transmitted byte has an identifier. If a TCP segment has 5 bytes in it (just a hypothetical example; in reality segments are bigger, of course), then the identifier of the first byte is the sequence number in the TCP header, +1 for the 2nd byte, ..., +4 for the 5th.
When the receiver wants to acknowledge a byte, it sends an ACK with that byte's sequence number +1. To acknowledge all 5 bytes in our example, it ACKs the 5th byte, which is seq_num + 4 + 1. In your case, you do the same calculation to get the next expected sequence number, seq_num + 4 + 1.
Then, to detect a re-transmission, you check whether the same source has sent a TCP segment with a sequence number that's lower than the expected seq_num + 4 + 1.
Say, instead of getting seq_num + 4 + 1 in the next transmitted TCP message, you got seq_num. This means that this segment is a re-transmission of the previous one.
But does it mean that this TCP segment, with the re-transmission, only contains re-transmissions? No. It can contain re-transmissions from previous segment, plus extra bytes for the next segment. This is why you need to count the total bytes in the segments to tell how many of the bytes are part of the re-transmissions, and how many are part of new transmission. As you see, TCP re-transmission is not binary per segment, but can overlap across segments. Because we are really re-transmitting bytes. We just store bytes in segments for reducing TCP header's overhead.
Now, what if you got seq_num + 2? This is a bit odd because it indicates that the previous segment was only partially re-transmitted: the new segment starts at the 3rd byte. If the segment has only 3 bytes, it is re-transmitting the 3rd, 4th and 5th bytes (i.e. only the previous segment's bytes). But if it has, say, 10 bytes, then its first 3 bytes are re-transmitted and the remaining 7 (the 6th through 12th bytes of the stream) are new bytes (not re-transmitted).
In my opinion you can say that a TCP packet is a re-transmission only when it's carrying bytes with identifiers that were sent before. But as said earlier, even that is not clear-cut, as a segment could contain some bytes sent earlier plus more that were never sent, making it a mixture of re-transmissions and new transmissions.
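The byte-level view above can be turned into a small calculation. The following is a sketch only: retransmitted_bytes is a hypothetical helper, next_seq is assumed to be the next expected sequence number the caller tracks for that direction, and all values are assumed to be in host byte order.

```c
#include <stdint.h>

/* Given the next expected sequence number (next_seq) and a segment that
 * starts at seq and carries len payload bytes, return how many of those
 * bytes had already been sent before. The remaining len minus that count
 * are new bytes. */
static uint32_t retransmitted_bytes(uint32_t next_seq, uint32_t seq, uint32_t len)
{
    /* Signed difference so the result stays correct across wraparound. */
    int32_t already_sent = (int32_t)(next_seq - seq);

    if (already_sent <= 0)
        return 0;                    /* starts at or after next_seq: all new */
    if ((uint32_t)already_sent >= len)
        return len;                  /* lies entirely within old data */
    return (uint32_t)already_sent;   /* mixture: old prefix, new tail */
}
```

With the 5-byte example (first segment at seq_num = 100, so next_seq = 105), a 10-byte segment starting at 102 gives 3 re-transmitted bytes and 7 new ones.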

Related

Ubuntu Socket programming : Packets are repackaged between TX and RX

I have 2 Ubuntu 14.04 PCs. One is used as a server and the other one is used as a client. The client sets up a TCP connection to the server, which sends some packets back. Here's the code on the server:
send(sd, pkt, pkt_len, MSG_NOSIGNAL);
The code on the client side is also very simple:
read(sd, buf, buf_size);
If the transmissions on the server are spaced out, I don't see any issue. However, if the server is doing rapid transmissions, then things look ugly. Here's an example where the server is sending 8 packets back-to-back.
The server code shows the size of these 8 packets are: 752 (bytes), 713, 713, 713, 396, 398, 396, 396
tcpdump on the server captures 4 TX packets: 752 (bytes), 1398, 1398, 929
tcpdump on the client captures 3 RX packets: 752 (bytes), 2796, 929
The client code shows it receives only 2 packets with 3548 bytes and 929 bytes, respectively.
So you can see all the bytes sent by the server are received by the client. However, packets are combined at various points in the transmission path. I guess this is due to TSO, GSO, GRO, etc. However, shouldn't these optimizations re-assemble the packets back to the correct form when the packets are delivered to the receiving application?
How do I get around this issue?
TCP is carefully designed to not only permit but implement exactly what you're seeing. It is a byte-stream protocol. If you want messages you have to implement them yourself via a superimposed application protocol.
How do I get around this issue?
So you're using TCP (a byte-stream-oriented transport mechanism) but you'd like it to have message-oriented behavior. You can't change the way TCP works (it is, by design, allowed to transport bytes in whatever-sized groups it chooses to, as long as the bytes are all received and they are received in the same order). But you can add a layer on top of TCP to simulate packet-oriented behavior.
For example, say you wanted to simulate the transmission of a 1000-byte "packet". Your sending program could first send out a fixed-size (let's say, 4-byte) header that would tell the receiver how many bytes the "packet" will contain:
size_t myPacketSize = 1000; // or whatever the size of your packet is
uint32_t bePacketSize = htonl(myPacketSize); // convert native-endian to big-endian for cross-platform compatibility
if (send(sd, &bePacketSize, sizeof(bePacketSize), 0) != sizeof(bePacketSize))
{
perror("send(header)");
}
.... then right after that you'd send out the packet's payload data:
if (send(sd, packetDataPtr, myPacketSize, 0) != myPacketSize)
{
perror("send(body)");
}
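One caveat about the two send() calls above: on a stream socket send() may accept fewer bytes than requested, so the sender usually loops as well. The following is a minimal sketch of that idea; send_all and send_framed are hypothetical helper names, and the framing is the same 4-byte big-endian length header described above.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Keep calling send() until all len bytes are accepted by the kernel.
 * Returns 0 on success, -1 on error (errno is left as set by send). */
static int send_all(int sd, const void *buf, size_t len)
{
    const char *p = buf;
    while (len > 0) {
        ssize_t n = send(sd, p, len, 0);
        if (n < 0) {
            if (errno == EINTR)
                continue;           /* interrupted by a signal: retry */
            return -1;
        }
        p += n;                     /* skip over what was already sent */
        len -= (size_t)n;
    }
    return 0;
}

/* Send one pseudo-packet: 4-byte big-endian length, then the payload. */
static int send_framed(int sd, const void *payload, uint32_t payload_len)
{
    uint32_t be_len = htonl(payload_len);
    if (send_all(sd, &be_len, sizeof(be_len)) != 0)
        return -1;
    return send_all(sd, payload, payload_len);
}
```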
The receiver would need to receive the header/size value, then allocate an array of that size and receive the payload data into it. Since this code has to handle the incoming data correctly no matter how many bytes are returned by each recv() call, it's a little more complex than the sending code:
void HandleReceivedPseudoPacket(const char * packetBytes, uint32_t packetSizeBytes)
{
// Your received-packet-handling code goes here
}
// Parses an incoming TCP stream of header+body data back into pseudo-packets for handling
void ReadPseudoPacketsFromTCPStreamForever(int sd)
{
uint32_t headerBuf; // we'll read each 4-byte header's bytes into here
uint32_t numValidHeaderBytes = 0; // how many bytes in (headerBuf) are currently valid
char * bodyBuf = NULL; // will be allocated as soon as we know how many bytes to allocate
uint32_t bodySize; // How many bytes (bodyBuf) points to
uint32_t numValidBodyBytes = 0; // how many bytes in (bodyBuf) are currently valid
while(1)
{
if (bodyBuf == NULL)
{
// We don't know the bodySize yet, so read in header bytes to find out
int32_t numBytesRead = recv(sd, ((char *)&headerBuf)+numValidHeaderBytes, sizeof(headerBuf)-numValidHeaderBytes, 0);
if (numBytesRead > 0)
{
numValidHeaderBytes += numBytesRead;
if (numValidHeaderBytes == sizeof(headerBuf))
{
// We've read the entire 4-byte header, so now we can allocate the body buffer
numValidBodyBytes = 0;
bodySize = ntohl(headerBuf); // convert from big-endian to the CPU's native-endian
bodyBuf = (char *) malloc(bodySize);
if (bodyBuf == NULL)
{
perror("malloc");
break;
}
}
}
else if (numBytesRead < 0)
{
perror("recv(header)");
break;
}
else
{
printf("TCP connection was closed while reading header bytes!\n");
break;
}
}
else
{
// If we got here, then we know the bodySize and now we need to read in the body bytes
int32_t numBytesRead = recv(sd, &bodyBuf[numValidBodyBytes], bodySize-numValidBodyBytes, 0);
if (numBytesRead > 0)
{
numValidBodyBytes += numBytesRead;
if (numValidBodyBytes == bodySize)
{
// At this point the pseudo-packet is fully received and ready to be handled
HandleReceivedPseudoPacket(bodyBuf, bodySize);
// Reset our state variables so we'll be ready to receive the next header
free(bodyBuf);
bodyBuf = NULL;
numValidHeaderBytes = 0;
}
}
else if (numBytesRead < 0)
{
perror("recv(body)");
break;
}
else
{
printf("TCP connection was closed while reading body bytes!\n");
break;
}
}
}
// Avoid memory leak if we exited the while loop in the middle of reading a psuedo-packet's body
if (bodyBuf) free(bodyBuf);
}

A few related questions regarding traceroutes in c:

According to Wikipedia, a traceroute program
Traceroute, by default, sends a sequence of User Datagram Protocol
(UDP) packets addressed to a destination host[...] The time-to-live
(TTL) value, also known as hop limit, is used in determining the
intermediate routers being traversed towards the destination. Routers
decrement packets' TTL value by 1 when routing and discard packets
whose TTL value has reached zero, returning the ICMP error message
ICMP Time Exceeded.[..]
I started writing a program (using an example UDP program as a guide) to adhere to this specification,
#include <sys/socket.h>
#include <assert.h>
#include <netinet/udp.h> //Provides declarations for udp header
#include <netinet/ip.h> //Provides declarations for ip header
#include <stdio.h>
#include <stdlib.h> //malloc, free, atoi
#include <string.h>
#include <arpa/inet.h>
#include <unistd.h>
//IP header followed by UDP header; parenthesized so the macro is safe inside expressions
#define DATAGRAM_LEN (sizeof(struct iphdr) + sizeof(struct udphdr))
unsigned short csum(unsigned short *ptr,int nbytes) {
register long sum;
unsigned short oddbyte;
register short answer;
sum=0;
while(nbytes>1) {
sum+=*ptr++;
nbytes-=2;
}
if(nbytes==1) {
oddbyte=0;
*((u_char*)&oddbyte)=*(u_char*)ptr;
sum+=oddbyte;
}
sum = (sum>>16)+(sum & 0xffff);
sum = sum + (sum>>16);
answer=(short)~sum;
return(answer);
}
char *new_packet(int ttl, struct sockaddr_in sin) {
static int id = 0;
char *datagram = calloc(1, DATAGRAM_LEN); //zeroed, so the checksum fields start at 0
if (datagram == NULL) return NULL;
struct iphdr *iph = (struct iphdr*) datagram;
struct udphdr *udph = (struct udphdr*)(datagram + sizeof (struct iphdr));
iph->ihl = 5;
iph->version = 4;
iph->tos = 0;
iph->tot_len = htons(DATAGRAM_LEN); //multi-byte header fields use network byte order
iph->id = htons(++id); //Id of this packet (16-bit field, so htons rather than htonl)
iph->frag_off = 0;
iph->ttl = ttl;
iph->protocol = IPPROTO_UDP;
iph->saddr = inet_addr("127.0.0.1");//Spoof the source ip address
iph->daddr = sin.sin_addr.s_addr;
iph->check = csum((unsigned short*)datagram, sizeof(struct iphdr)); //IP checksum covers the IP header only
udph->source = htons(6666);
udph->dest = htons(8622);
udph->len = htons(8); //udp header size
udph->check = 0; //0 means "no checksum" for UDP over IPv4; a real one needs the pseudo-header
return datagram;
}
int main(int argc, char **argv) {
int s, ttl, repeat;
struct sockaddr_in sin;
char *data;
printf("\n");
if (argc != 3) {
printf("usage: %s <host> <port>", argv[0]);
return __LINE__;
}
sin.sin_family = AF_INET;
sin.sin_addr.s_addr = inet_addr(argv[1]);
sin.sin_port = htons(atoi(argv[2]));
if ((s = socket(AF_PACKET, SOCK_RAW, 0)) < 0) {
printf("Failed to create socket.\n");
return __LINE__;
}
ttl = 1, repeat = 0;
while (ttl < 2) {
data = new_packet(ttl, sin);
if (write(s, data, DATAGRAM_LEN) != DATAGRAM_LEN) {
printf("Socket failed to send packet.\n");
return __LINE__;
}
read(s, data, DATAGRAM_LEN);
free(data);
if (++repeat > 2) {
repeat = 0;
ttl++;
}
}
return 0;
}
... however at this point I have a few questions.
Is read(s, data, ... reading whole packets at a time, or do I need to parse the data read from the socket; seeking markers particular to IP packets?
What is the best way to uniquely mark my packets as they return to my box as expired?
Should I set up a second socket with the IPPROTO_ICMP flag, or is it easier to write a filter; accepting everything?
Do any other common mistakes exist; or are any common obstacles foreseeable?
Here are some of my suggestions (assuming it's a Linux machine).
Reading packets: you might want to read whole 1500-byte packets (the entire Ethernet frame). Don't worry, smaller frames will still be read completely, with read returning the length of the data read.
The best way to add a marker is to carry some UDP payload; a simple unsigned int, increased on every packet sent, should be good enough. (I just ran tcpdump on traceroute: the ICMP error does return an entire IP frame, so you can look at the returned IP frame, parse the UDP payload and so on. Note that your DATAGRAM_LEN would change accordingly.) Of course you can use the ID field, but be careful: ID is mainly used for fragmentation. You should be okay, because with these packet sizes you won't approach the fragmentation limit on any intermediate router. Generally, though, it is not a good idea to 'steal' protocol fields that are meant for something else.
A cleaner way could be to use IPPROTO_ICMP on raw sockets (if manuals are installed on your machine, see man 7 raw and man 7 icmp). You would not want to receive a copy of every packet on your device and ignore those that are not ICMP.
If you are using type SOCK_RAW on AF_PACKET, you will have to attach a link-layer header manually, or you can use SOCK_DGRAM and check. Also see man 7 packet for a lot of subtleties.
Hope that helps, or are you looking for some actual code?
A common pitfall is that programming at this level needs very careful use of the proper include files. For instance, your program as-is won't compile on NetBSD, which is typically quite strict in following relevant standards.
Even when I add some includes, there is no struct iphdr but there is a struct udpiphdr instead.
So for now the rest of my answer is not based on trying your program in practice.
read(2) can be used to read single packets at a time. For packet-oriented protocols, such as UDP, you'll never get more data from it than a single packet.
However you can also use recvfrom(2), recv(2) or recvmsg(2) to receive the packets.
If fildes refers to a socket, read() shall be equivalent to recv()
with no flags set.
To identify the packets, I believe using the id field is typically done, as you have already. I am not sure what you mean with "mark my packets as they return to my box as expired", since your packets don't return to you. What you may get back are ICMP Time Exceeded messages. These usually arrive within a few seconds, if they arrive at all. Sometimes they are not sent, sometimes they may be blocked by misconfigured routers between you and their sender.
Note that this assumes that the IP ID you set up in your packet is respected by the network stack you're using. It is possible that it isn't, and that the stack replaces your chosen ID with a different one. Van Jacobson, the original author of the traceroute command as found in NetBSD, therefore uses a different method:
* The udp port usage may appear bizarre (well, ok, it is bizarre).
* The problem is that an icmp message only contains 8 bytes of
* data from the original datagram. 8 bytes is the size of a udp
* header so, if we want to associate replies with the original
* datagram, the necessary information must be encoded into the
* udp header (the ip id could be used but there's no way to
* interlock with the kernel's assignment of ip id's and, anyway,
* it would have taken a lot more kernel hacking to allow this
* code to set the ip id). So, to allow two or more users to
* use traceroute simultaneously, we use this task's pid as the
* source port (the high bit is set to move the port number out
* of the "likely" range). To keep track of which probe is being
* replied to (so times and/or hop counts don't get confused by a
* reply that was delayed in transit), we increment the destination
* port number before each probe.
Using a IPPROTO_ICMP socket for receiving the replies is more likely to be efficient than trying to receive all packets. It would also require fewer privileges to do so. Of course sending raw packets normally already requires root, but it could make a difference if a more fine-grained permission system is in use.
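The port-based matching described in the quoted comment can be sketched as follows. This is an illustration under assumptions, not the traceroute source: probe_source_port, probe_dest_port and match_probe are hypothetical names, BASE_PORT is assumed to be the traditional traceroute default of 33434, and all ports are in host byte order (apply ntohs() to the values read from the UDP header quoted in the ICMP message).

```c
#include <stdint.h>
#include <sys/types.h>

#define BASE_PORT 33434u  /* assumed starting destination port */

/* Source port: low 15 bits of the pid with the high bit set, so that
 * several users can run the tool at the same time. */
static uint16_t probe_source_port(pid_t pid)
{
    return (uint16_t)((pid & 0x7fff) | 0x8000);
}

/* Destination port: incremented for every probe sent. */
static uint16_t probe_dest_port(unsigned seq)
{
    return (uint16_t)(BASE_PORT + seq);
}

/* Given the ports found in the UDP header that an ICMP error quotes back,
 * return the probe number it answers, or -1 if it is not one of ours. */
static int match_probe(uint16_t src_port, uint16_t dst_port,
                       pid_t pid, unsigned probes_sent)
{
    if (src_port != probe_source_port(pid))
        return -1;                   /* belongs to another process */
    if (dst_port < BASE_PORT)
        return -1;
    unsigned seq = dst_port - BASE_PORT;
    if (seq >= probes_sent)
        return -1;                   /* not a probe we have sent yet */
    return (int)seq;
}
```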

UDP buffer overflow w/o filling the receive buffer?

If I send 1000 "Hello World!" UDP messages (12 bytes + 28 bytes of IP/UDP overhead), I observe that the receiving side only buffers 658 of them (always the same number; 658*40 = 26320 bytes). I test this by sending the UDP messages while the server is sleeping (after it has created the socket).
Curiously, the SO_RCVBUF option on the server is 42080 bytes, so I wonder why I cannot buffer all 1000 messages. Do you know where the remaining 15760 bytes are spent?
Below the server code (where distrib.h contains basic error handling wrappers of the socket and signal handling functions):
#include "distrib.h"
static int count;
static void sigint_handler(int s) {
printf("\n%d UDP messages received\n",count);
exit(0);
}
int main(int argc, char **argv)
{
struct addrinfo* serverinfo;
struct addrinfo hints;
struct sockaddr_storage sender;
socklen_t len;
int listenfd,n;
char buf[MAXLINE+1];
if (argc != 2) {
log_error("usage: %s <port>\n", argv[0]);
exit(1);
}
Signal(SIGINT,sigint_handler);
bzero(&hints,sizeof(hints));
hints.ai_family = AF_INET;
hints.ai_socktype = SOCK_DGRAM;
hints.ai_protocol = IPPROTO_UDP;
Getaddrinfo("127.0.0.1", argv[1], &hints, &serverinfo);
listenfd = Socket(serverinfo->ai_family, serverinfo->ai_socktype,
serverinfo->ai_protocol);
Bind(listenfd, serverinfo->ai_addr,serverinfo->ai_addrlen);
freeaddrinfo(serverinfo);
count =0;
sleep(20);
while(true) {
bzero(buf,sizeof(buf));
len = sizeof(sender);
n = Recvfrom(listenfd, buf, MAXLINE, 0, (struct sockaddr*)&sender,&len);
buf[n]='\0';
count++;
}
close(listenfd);
return 0;
}
It's more informative to do the reverse calculation -- your buffer is 42080 and it's buffering 658 packets before it starts dropping. Now 42080/658 = 63.95, so it looks like it is counting each packet as 64 bytes and dropping packets if the total size of the packets buffered so far is at or above the limit. Since it buffers entire packets, it actually ends up buffering slightly more than the limit.
Why 64 bytes instead of 40? Perhaps it's including some queuing overhead or perhaps it's rounding up to a multiple of some power of 2 for alignment, or perhaps some combination of both.
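The reverse calculation can be checked with a few lines of C. This is a sketch only: rcvbuf_bytes and datagrams_that_fit are hypothetical helpers, and the 64-byte charge per datagram is the hypothesis from the paragraph above, not a documented constant.

```c
#include <sys/socket.h>

/* Return the receive buffer size the kernel reports for fd, or -1 on error. */
static int rcvbuf_bytes(int fd)
{
    int size = 0;
    socklen_t len = sizeof(size);
    if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &size, &len) < 0)
        return -1;
    return size;
}

/* If a datagram is queued whenever the bytes charged so far are still below
 * the limit, the number queued is the limit divided by the per-datagram
 * charge, rounded up. */
static int datagrams_that_fit(int rcvbuf, int per_datagram)
{
    return (rcvbuf + per_datagram - 1) / per_datagram;
}
```

For the numbers in the question, datagrams_that_fit(42080, 64) gives 658, which matches the observed count, while a 40-byte charge would have allowed all 1000 messages.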
I don't have a complete answer, but I tested this on my Linux box and this is what I observed.
When I send one "Hello World!\n" with a terminating '\0', I get:
Client:
$./sendto
sent 14 bytes
Socket "Recv-Q" has 768 bytes (seems probable its in bytes, did not check ss sources):
$ ss -ul|grep 55555
UNCONN 768 0 127.0.0.1:55555 *:*
When I send 1000 packets I get:
$ ./sendto
sent 14000 bytes
Recv-Q:
$ ss -ul|grep 55555
UNCONN 213504 0 127.0.0.1:55555 *:*
Your server (after ctrl-c):
$ ./recvfrom 55555
^C
278 UDP messages received
Incidentally, 213504/768 = 278. With quick experimentation I could not figure out which setting to tune to increase the buffered amount. Also, I don't know why a received packet takes so much space in this queue. Lots of metadata, maybe? As on your OS X machine, the dropped packets show up in netstat -su.
EDIT: Additional observation with ss -ulm, which prints "socket memory usage" in more detail:
UNCONN 213504 0 127.0.0.1:55555 *:*
skmem:(r213504,rb212992,t0,tb212992,f3584,w0,o0,bl0)
The 213504 bytes buffered are 512 bytes above the rb value. Might not be a coincidence, but would require reading the kernel source to find out.
Did you check how much space one UDP datagram takes up on OS X?
EDIT 2:
This is still not a suitable answer for OS X, but on Linux I found that increasing the kernel memory allocated for receive buffers allowed me to buffer all of the 1000 packets sent.
A bit of overkill, but I used these (disclaimer: tweaking the buffer values randomly might seriously mess up your networking and kernel):
net.core.rmem_max=1048568
net.core.rmem_default=1048568

Getting wrong ip and port number from libpcap captured packet

My Ubuntu virtual machine's IP address is 192.168.1.110. Everything else looks fine. I don't know what is wrong with the code. Maybe I'm using the wrong packet header structure?
Below are my output and my code. Again, my host IP should be 192.168.1.110, and the port shown is definitely wrong.
sudo ./sniffall 0
84.72.137.105:38055 192.168.1.105:56652
192.168.1.105:56652 174.141.213.124:28073
84.72.137.105:38055 192.168.1.105:56652
192.168.1.105:56652 174.141.213.124:28073
84.72.137.105:38055 192.168.1.105:56652
#include <pcap.h>
#include <stdio.h>
#include <stdlib.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <netinet/ip.h>
#include <netinet/if_ether.h>
#include <netinet/ether.h>
#include <sys/socket.h>
#include <netinet/tcp.h>
void getPacket(u_char *args, const struct pcap_pkthdr *pkthdr, const u_char *packet){
struct ip *ip;
struct tcphdr *tcp;
ip = (struct ip*)(packet+sizeof(struct ether_header));
tcp = (struct tcphdr*)(packet+sizeof(struct ether_header)+sizeof(struct ip));
char* src = inet_ntoa(ip->ip_src);
printf("%s:%d ",src,tcp->source);
char* dst = inet_ntoa(ip->ip_dst);
printf(" %s:%d\n", dst, tcp->dest);
}
int main(int argc, char *argv[]){
char errbuf[PCAP_ERRBUF_SIZE], *device;
device = argv[1];
pcap_t *handle;
handle = pcap_open_live(device, BUFSIZ, 1, 1000, errbuf);
if(!handle){
device = pcap_lookupdev(errbuf);
handle = pcap_open_live(device, BUFSIZ, 1, 1000, errbuf);
if(!handle){
printf("Couldn't open device %s: %s\n", device, errbuf);
return 1; // no usable handle, so do not fall through to pcap_loop
}
}
pcap_loop(handle, 5, getPacket, NULL);
return 0;
}
Pcap is going to show some other traffic other than your system's if you're in promiscuous mode. Why you're seeing specific packets that aren't being sent or received from your system is going to be dependent a bit on your network configuration. Some ethernet switches will occasionally leak packets destined to other systems if they're unsure where they should go, etc.
You also need to convert between byte orders. On most common machines today, "network byte order" is not the same as your machine's byte order. To print out the port number, you need to do something like:
printf("%s:%d ",src,ntohs(tcp->source));
Also, you may want to try struct iphdr instead of struct ip. I've seen instances before where there were multiple definitions of a struct named ip in headers, but iphdr was always right for me.
Remember that you can always run tcpdump in another window to see what packets are actually coming in, it's possible that you're receiving more traffic than you are expecting.
First, after calling pcap_open_live(), call pcap_datalink() on handle and, if it doesn't return DLT_EN10MB, either exit or rewrite your program so that it can handle the value it returns. See the tcpdump.org link-layer header types page for a description of the supported values from pcap_datalink().
Second, do NOT assume that the packet is an IPv4 packet unless you have either installed a filter of "ip" or have checked the packet type (e.g., the type field in an Ethernet header) to make sure the packet is an IPv4 packet.
Third, do NOT assume that the header of an IPv4 packet is exactly sizeof(struct ip) bytes long. I assume sizeof(struct ip) will be 20, which is the minimum length of an IPv4 header, but the header may include options - check the "header length" field of the IPv4 header (which is in units of 4-byte words, so a value of 5 means "20 bytes") and use that as the length of the header (make sure it's at least 5 - if it's less than 5, the packet is not valid - and then multiply by 4 to get the length of the header).
Fourth, do NOT assume that the packet is a TCP packet unless you have either installed a filter of "ip and tcp" or just "tcp" (with the latter, you'll still have to check yourself to see whether it's an IPv4 packet) or have checked the "protocol" field of the IPv4 header to make sure it has a value of 6 (for TCP).
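The checks listed above can be put together in one parsing function. This is a minimal sketch, not a complete dissector: struct tcp_endpoints and parse_ipv4_tcp are hypothetical names, the link type is assumed to be DLT_EN10MB (a 14-byte Ethernet header without a VLAN tag), and fields are read by byte offset instead of through struct ip or struct iphdr to sidestep differences between header definitions.

```c
#include <arpa/inet.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct tcp_endpoints {
    uint32_t src_ip, dst_ip;      /* network byte order, as inet_ntoa expects */
    uint16_t src_port, dst_port;  /* host byte order, ready for printing */
};

/* Returns 0 and fills *out for an IPv4/TCP packet, -1 for anything else.
 * caplen is the number of captured bytes (pkthdr->caplen in a pcap callback). */
static int parse_ipv4_tcp(const unsigned char *pkt, size_t caplen,
                          struct tcp_endpoints *out)
{
    const size_t eth_len = 14;    /* assumes DLT_EN10MB */
    uint16_t tmp;

    if (caplen < eth_len + 20)
        return -1;                /* too short for Ethernet + minimal IPv4 */
    memcpy(&tmp, pkt + 12, 2);    /* EtherType field */
    if (ntohs(tmp) != 0x0800)
        return -1;                /* not IPv4 */

    const unsigned char *ip = pkt + eth_len;
    if ((ip[0] >> 4) != 4)
        return -1;                /* version field must be 4 */
    size_t ihl = (size_t)(ip[0] & 0x0f) * 4;  /* header length in bytes */
    if (ihl < 20)
        return -1;                /* fewer than 5 words is invalid */
    if (ip[9] != 6)
        return -1;                /* protocol 6 is TCP */
    if (caplen < eth_len + ihl + 20)
        return -1;                /* TCP header not fully captured */

    memcpy(&out->src_ip, ip + 12, 4);
    memcpy(&out->dst_ip, ip + 16, 4);

    const unsigned char *tcp = ip + ihl;  /* skips IP options, if any */
    memcpy(&tmp, tcp, 2);
    out->src_port = ntohs(tmp);
    memcpy(&tmp, tcp + 2, 2);
    out->dst_port = ntohs(tmp);
    return 0;
}
```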

Error parsing IP header

I have a small function that tries to print the fragment offset of an IP header.
void ParseIpHeader(unsigned char *packet, int len)
{
struct ethhdr *ethernet_header;
struct iphdr *ip_header;
/* First Check if the packet contains an IP header using
the Ethernet header */
ethernet_header = (struct ethhdr *)packet;
if(ntohs(ethernet_header->h_proto) == ETH_P_IP)
{
/* The IP header is after the Ethernet header */
if(len >= (sizeof(struct ethhdr) + sizeof(struct iphdr)))
{
ip_header = (struct iphdr*)(packet + sizeof(struct ethhdr));
/* print the Source and Destination IP address */
//printf("Dest IP address: %s\n", inet_ntoa(ip_header->daddr));
//printf("Source IP address: %s\n", inet_ntoa(ip_header->saddr));
printf("protocol %d\n", ip_header->protocol);
printf("Fragment off is %d\n", ntohs(ip_header->frag_off));
}
}
}
My packets are TCP (ip_header->protocol is always 6). The problem is that frag_off is always 16384. I am sending a lot of data, so why is frag_off always constant?
Thanks.
Fragment offset is shared with flags. You have the "DF" (don't fragment) bit set, which gives you 16384 (0x4000) for the entire 16-bit field, given a fragment offset of 0.
Take a look at RFC 791 (http://www.ietf.org/rfc/rfc791.txt), starting from page 10.
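To see the split concretely, the 16-bit field can be decoded like this. It is a small sketch: struct frag_info and decode_frag_off are hypothetical names, and the argument is assumed to be the field after ntohs().

```c
#include <stdbool.h>
#include <stdint.h>

struct frag_info {
    bool     dont_fragment;   /* DF flag, bit 0x4000 */
    bool     more_fragments;  /* MF flag, bit 0x2000 */
    unsigned offset_bytes;    /* the 13-bit offset counts 8-byte units */
};

/* frag_off_host is the frag_off field after ntohs(). */
static struct frag_info decode_frag_off(uint16_t frag_off_host)
{
    struct frag_info f;
    f.dont_fragment  = (frag_off_host & 0x4000) != 0;
    f.more_fragments = (frag_off_host & 0x2000) != 0;
    f.offset_bytes   = (unsigned)(frag_off_host & 0x1fff) * 8;
    return f;
}
```

For the value in the question, decode_frag_off(16384) reports DF set, MF clear and an offset of 0 bytes.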
EDIT:
The DF bit in the TCP segments that you are receiving is set by the remote side to perform Path MTU discovery; in a nutshell, to try to avoid fragmentation.
In this case the sending side learns the biggest MTU that the overall path can handle, and chops the TCP segments so that they do not exceed it after encapsulation into IP.
EDIT2:
Regarding the use of recvfrom() and TCP: TCP is a connection-oriented protocol, and all of the segmentation/fragmentation details are already handled for you (fragmentation is handled by the lower layer, IP), so you do not need to deal with them. Anything you write() on the sending side will eventually be read() on the other side, though possibly not in the same chunks: two 4K writes may sometimes result in a single 8K read and sometimes in two 4K reads, depending on how the network in between behaves with respect to reordering and losses.
IP fragmentation and reassembly are handled transparently by the operating system, so you do not need to worry about them, nor about out-of-order packets, etc. (the only effect on the application is decreased performance).
One good read I could recommend is UNIX Network Programming. Given Stevens's involvement with TCP, it's a good book no matter which OS you use.
EDIT3:
And if you are doing something to be a "man in the middle" (assuming you have good and legitimate reasons for doing so :-) - then you can assess the upcoming work by looking at the prior art: chaosreader (one-script approach that works on pcap files, but adaptable to something else), or LibNIDS - that does emulate the IP defragmentation and the TCP stream reassembly; and maybe just reuse them for your purposes.

Resources