I have a small function that tries to print the fragment offset of an IP header.
void ParseIpHeader(unsigned char *packet, int len)
{
    struct ethhdr *ethernet_header;
    struct iphdr *ip_header;
    /* First check if the packet contains an IP header using
       the Ethernet header */
    ethernet_header = (struct ethhdr *)packet;
    if(ntohs(ethernet_header->h_proto) == ETH_P_IP)
    {
        /* The IP header is after the Ethernet header */
        if(len >= (sizeof(struct ethhdr) + sizeof(struct iphdr)))
        {
            ip_header = (struct iphdr*)(packet + sizeof(struct ethhdr));
            /* print the Source and Destination IP address */
            //printf("Dest IP address: %s\n", inet_ntoa(ip_header->daddr));
            //printf("Source IP address: %s\n", inet_ntoa(ip_header->saddr));
            printf("protocol %d\n", ip_header->protocol);
            printf("Fragment off is %d\n", ntohs(ip_header->frag_off));
        }
    }
}
My packets are TCP (ip_header->protocol is always 6). The problem is that frag_off
is always 16384. I am sending a lot of data, so why is frag_off always constant?
Thanks.
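The fragment offset shares its 16-bit field with the flags: the top three bits are flags and the low 13 bits are the offset. You have the "DF" (don't fragment) bit set.
That bit has the value 0x4000 = 16384, which is what you get for the entire 16-bit field when the fragment offset is 0.
Take a look at RFC 791 (http://www.ietf.org/rfc/rfc791.txt), starting from page 10.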
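A minimal sketch of splitting the field into its parts, assuming the value has already been converted with ntohs(). The struct, macro and function names are made up for illustration; the masks follow the RFC 791 layout (DF is 0x4000, MF is 0x2000, and the low 13 bits hold the offset in units of 8 bytes).

```c
#include <stdint.h>

/* Layout of the 16-bit flags/fragment-offset field (RFC 791), host byte order. */
#define FRAG_DF_BIT      0x4000u  /* Don't Fragment */
#define FRAG_MF_BIT      0x2000u  /* More Fragments */
#define FRAG_OFFSET_MASK 0x1FFFu  /* fragment offset, in units of 8 bytes */

/* Hypothetical helper type: the decoded parts of the field. */
struct frag_info {
    int dont_fragment;
    int more_fragments;
    unsigned offset_bytes;
};

/* Takes the field in host byte order, i.e. ntohs(ip_header->frag_off). */
static struct frag_info decode_frag_off(uint16_t frag_off_host)
{
    struct frag_info info;
    info.dont_fragment  = (frag_off_host & FRAG_DF_BIT) != 0;
    info.more_fragments = (frag_off_host & FRAG_MF_BIT) != 0;
    info.offset_bytes   = (unsigned)(frag_off_host & FRAG_OFFSET_MASK) * 8u;
    return info;
}
```

With the value from the question, decode_frag_off(16384) reports DF set, MF clear and an offset of 0 bytes.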
EDIT:
The DF bit in the TCP segments that you are receiving is set by the remote side to perform Path MTU discovery, which in a nutshell tries to avoid fragmentation.
In this case the sending side learns the biggest MTU that the overall path can handle, and chops the TCP segments so that they do not exceed it after encapsulation into IP.
EDIT2:
Regarding the use of recvfrom() and TCP: TCP is a connection-oriented protocol, and all of the segmentation details are already handled by it (fragmentation is handled by the lower layer, IP), so you do not need to deal with them. Anything you write() on the sending side will eventually be read() on the other side, though possibly not in the same chunks: two 4K writes may sometimes result in a single 8K read and sometimes in two 4K reads, depending on the behaviour of the media in between concerning reordering/losses.
IP fragmentation and reassembly are handled transparently by the operating system, so you do not need to worry about them, and the same goes for out-of-order packets, etc. (the only effect the application will see is decreased performance).
One good read I could recommend is UNIX Network Programming. Given Stevens's involvement with TCP, it's a good book no matter which OS you use.
EDIT3:
And if you are doing something to be a "man in the middle" (assuming you have good and legitimate reasons for doing so :-) - then you can assess the upcoming work by looking at the prior art: chaosreader (one-script approach that works on pcap files, but adaptable to something else), or LibNIDS - that does emulate the IP defragmentation and the TCP stream reassembly; and maybe just reuse them for your purposes.
I've written a simple source file that can read pcap files using the libpcap library in C. I can parse the packets one by one and analyze them up to a point. I want to be able to deduce whether a TCP packet I parsed is a TCP retransmission or not. After searching the web extensively, I've concluded that in order to do so, I need to track the traffic behaviour, which also means analyzing previously received packets.
What I actually want to achieve is, to do on a basic level, what the tcp.analysis.retransmission filter does in wireshark.
This is an MRE that reads a pcap file and analyzes the TCP packets sent over IPv4. The function find_retransmissions is where the packet is analyzed.
#include <pcap.h>
#include <stdio.h>
#include <netinet/ip.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <stdlib.h>
#include <net/ethernet.h>
#include <string.h>
#include <arpa/inet.h> // inet_ntoa, ntohs
void process_packet(u_char *,const struct pcap_pkthdr * , const u_char *);
void find_retransmissions(const u_char * , int );
int main()
{
pcap_t *handle;
char errbuff[PCAP_ERRBUF_SIZE];
handle = pcap_open_offline("smallFlows.pcap", errbuff);
pcap_loop(handle, -1, process_packet, NULL);
}
void process_packet(u_char *args,
const struct pcap_pkthdr * header,
const u_char *buffer)
{
int size = header->len;
struct ethhdr *eth = (struct ethhdr *)buffer;
if(ntohs(eth->h_proto) == ETH_P_IP) //Check if IPv4 (byte-order independent)
{
struct iphdr *iph = (struct iphdr*)(buffer +sizeof(struct ethhdr));
if(iph->protocol == 6) //Check if TCP
{
find_retransmissions(buffer,size);
}
}
}
void find_retransmissions(const u_char * Buffer, int Size)
{
static struct iphdr previous_packets[20000];
static struct tcphdr previous_tcp[20000];
static int index = 0;
static int retransmissions = 0;
int retransmission = 0;
struct sockaddr_in source,dest;
unsigned short iphdrlen;
// IP header
struct iphdr *iph = (struct iphdr *)(Buffer + sizeof(struct ethhdr));
previous_packets[index] = *iph;
iphdrlen =iph->ihl*4;
memset(&source, 0, sizeof(source));
source.sin_addr.s_addr = iph->saddr;
memset(&dest, 0, sizeof(dest));
dest.sin_addr.s_addr = iph->daddr;
// TCP header
struct tcphdr *tcph=(struct tcphdr*)(Buffer
+ iphdrlen
+ sizeof(struct ethhdr));
previous_tcp[index]=*tcph;
index++;
int header_size = sizeof(struct ethhdr) + iphdrlen + tcph->doff*4;
unsigned int segmentlength;
segmentlength = Size - header_size;
/* First check if a same TCP packet has been received */
for(int i=0;i<index-1;i++)
{
// Check if packet has been resent
unsigned short temphdrlen;
temphdrlen = previous_packets[i].ihl*4;
// First check IP header
if ((previous_packets[i].saddr == iph->saddr) // Same source IP address
&& (previous_packets[i].daddr == iph->daddr) // Same destination Ip address
&& (previous_packets[i].protocol == iph->protocol) //Same protocol
&& (temphdrlen == iphdrlen)) // Same header length
{
// Then check TCP header
if((previous_tcp[i].source == tcph->source) // Same source port
&& (previous_tcp[i].dest == tcph->dest) // Same destination port
&& (previous_tcp[i].th_seq == tcph->th_seq) // Same sequence number
&& (previous_tcp[i].th_ack==tcph->th_ack) // Same acknowledge number
&& (previous_tcp[i].th_win == tcph->th_win) // Same window
&& (previous_tcp[i].th_flags == tcph->th_flags) // Same flags
&& (tcph->syn==1 || tcph->fin==1 ||segmentlength>0)) // Check if SYN or FIN are
{ // set or if tcp.segment 0
// At this point the packets are almost identical
// Now Check previous communication to check for retransmission
for(int z=index-1;z>=0;z--)
{
// Find packets going to the reverse direction
if ((previous_packets[z].daddr == iph->saddr) // Swapped IP source addresses
&& (previous_packets[z].saddr ==iph->daddr) // Same for IP dest addreses
&& (previous_packets[z].protocol == iph->protocol)) // Same protocol
{
if((previous_tcp[z].dest==tcph->source) // Swapped ports
&& (previous_tcp[z].source==tcph->dest)
&& (previous_tcp[z].th_seq-1 != tcph->th_ack) // Not Keepalive
&& (tcph->syn==1 // Either SYN is set
|| tcph->fin==1 // Either FIN is set
|| (segmentlength>0)) // Either segmentlength >0
&& (previous_tcp[z].th_seq>tcph->th_seq) // Next sequence number is
// bigger than the expected
&& (previous_tcp[z].ack != 1)) // Last seen ACK is set
{
retransmission = 1;
retransmissions++;
break;
}
}
}
}
}
}
if (retransmission == 1)
{
printf("Retransmission: True\n");
printf("\n\n******************IPv4 TCP Packet*************************\n");
printf(" |-IP Version : %d\n",(unsigned int)iph->version);
printf(" |-Source IP : %s\n" , inet_ntoa(source.sin_addr) );
printf(" |-Destination IP : %s\n" , inet_ntoa(dest.sin_addr) );
printf(" |-Source Port : %u\n", ntohs(tcph->source));
printf(" |-Destination Port : %u\n", ntohs(tcph->dest));
printf(" |-Protocol : %d\n",(unsigned int)iph->protocol);
printf(" |-IP Header Length : %d DWORDS or %d Bytes\n",
(unsigned int)iph->ihl,((unsigned int)(iph->ihl))*4);
printf(" |-Payload Length : %d Bytes\n",Size - header_size);
}
printf("Total Retransmissions: %d\n",retransmissions);
}
This approach is based on the wireshark wiki paragraph about Retransmission. I literally have clicked every page google has to offer on how to approach this analysis but this was the only thing I was able to find.
The results I get are only partly correct: some retransmissions go unnoticed, I get a lot of DUP-ACK packets, and some normal traffic gets through as well (checked with Wireshark). I use the smallFlows.pcap file found here, and I believe the results should match the tcp.analysis.retransmission && not tcp.analysis.spurious_retransmission filter in Wireshark, which amounts to 88 retransmissions for this pcap.
Running this code yields 45 and I can't understand why.
Sorry for the messy if statements, I tried my best to clean them up.
For detecting a retransmission you have to keep track of the next expected sequence number. If the sequence number of a packet is lower than the next expected one, the packet is a retransmitted one (see the TCP Analysis chapter of the Wireshark docs,
https://www.wireshark.org/docs/wsug_html_chunked/ChAdvTCPAnalysis.html )
TCP Retransmission
Set when all of the following are true:
This is not a keepalive packet.
In the forward direction, the segment length is greater than zero or the SYN or FIN flag is set.
The next expected sequence number is greater than the current sequence number
Besides TCP Retransmission, there are also TCP Spurious Retransmission and TCP Fast Retransmission.
Basically, a retransmission is only necessary if a packet is lost.
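A rough sketch of that rule for one direction of one connection, assuming the sequence number has already been converted with ntohl() and the payload length has been computed from the headers. The struct and function names are hypothetical, the keepalive test is a simplification, and this is not Wireshark's actual implementation. The comparison is done modulo 2^32 so that sequence-number wraparound does not produce false results.

```c
#include <stdint.h>

/* True if sequence number a comes before b, modulo 2^32. */
static int seq_before(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) < 0;
}

/* Hypothetical per-direction state: the next sequence number we expect. */
struct flow_state {
    int seen;          /* has any data/SYN/FIN been seen in this direction? */
    uint32_t next_seq; /* highest (seq + segment length) seen so far */
};

/* Returns 1 if the segment looks like a retransmission, 0 otherwise. */
static int is_retransmission(struct flow_state *fs, uint32_t seq,
                             uint32_t payload_len, int syn, int fin)
{
    /* SYN and FIN each consume one sequence number. */
    uint32_t seg_len = payload_len + (syn ? 1u : 0u) + (fin ? 1u : 0u);
    uint32_t end = seq + seg_len;
    int retrans;

    /* Simplified keepalive test: 0 or 1 bytes, one below the expected number. */
    if (fs->seen && payload_len <= 1 && !syn && !fin && seq + 1 == fs->next_seq)
        return 0;
    if (seg_len == 0)          /* pure ACK: carries nothing to retransmit */
        return 0;
    if (!fs->seen) {           /* first segment in this direction */
        fs->seen = 1;
        fs->next_seq = end;
        return 0;
    }
    /* The rule: next expected sequence number is greater than the current one. */
    retrans = seq_before(seq, fs->next_seq);
    if (seq_before(fs->next_seq, end))
        fs->next_seq = end;    /* the segment advanced the stream */
    return retrans;
}
```

One flow_state would be kept per (source address, source port, destination address, destination port) tuple, so that each direction of each connection is tracked separately.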
Analyzing lost segment inconsistency (graphic not reproduced here; source of graphic: http://www.opentextbooks.org.hk/ditatopic/3578)
For detecting this type of fault in wireshark the filter tcp.analysis.ack_lost_segment is used. Maybe try to implement this.
(https://serverfault.com/questions/626273/how-can-i-write-a-filter-to-get-tcp-sequence-number-inconsisten)
In wireshark several filters can be applied to capture all types of inconsistencies in sequence numbers i.e. tcp.analysis.retransmission, tcp.analysis.spurious_retransmission and tcp.analysis.fast_retransmission, for the general case of packet loss check for tcp.analysis.ack_lost_segment
https://superuser.com/questions/828294/how-can-i-get-the-actual-tcp-sequence-number-in-wireshark
By default Wireshark and TShark will keep track of all TCP sessions
and implement its own crude version of Sliding_Windows. This requires
some extra state information and memory to be kept by the dissector
but allows much better detection of interesting TCP events such as
retransmissions. This allows much better and more accurate
measurements of packet-loss and retransmissions than is available in
any other protocol analyzer. (But it is still not perfect)
This feature should not impact too much on the run-time memory
requirements of Wireshark but can be disabled if required.
When this feature is enabled the sliding window monitoring inside
Wireshark will detect and trigger display of interesting events for
TCP such as :
TCP Retransmission - Occurs when the sender retransmits a packet after the expiration of the acknowledgement.
TCP Fast Retransmission - Occurs when the sender retransmits a packet before the expiration of the acknowledgement timer. Senders
receive some packets which sequence number are bigger than the
acknowledged packets. Senders should Fast Retransmit upon receipt of 3
duplicate ACKs.
...
source : https://gitlab.com/wireshark/wireshark/-/wikis/TCP_Analyze_Sequence_Numbers
The concept of re-transmission is simple: data that was sent, was sent again.
In TCP, every transmitted byte has an identifier. If a TCP segment has 5 bytes in it (just a hypothetical example; in reality segments are of course bigger), then the identifier of the first byte is the sequence number in the TCP header, +1 for the 2nd byte, ..., +4 for the 5th.
When the receiver wants to acknowledge a byte, it sends an ACK with that byte's sequence number +1. If the receiver wants to acknowledge all 5 bytes in our example, it ACKs the 5th byte, which is seq_num + 4 + 1. In your case, you do the same calculation to get the next expected sequence number, seq_num + 4 + 1.
Then, to detect a re-transmission, check whether the same source has sent a TCP segment with a sequence number that's lower than the expected seq_num + 4 + 1.
Say that instead of getting seq_num + 4 + 1 in the next transmitted TCP message, you got seq_num. This means that this segment is a re-transmission of the previous one.
But does it mean that this TCP segment only contains re-transmitted data? No. It can contain re-transmitted bytes from the previous segment plus extra bytes for the next segment. This is why you need to count the total bytes in the segments to tell how many of the bytes are re-transmitted and how many are new. As you see, TCP re-transmission is not binary per segment, but can overlap across segments, because what is really re-transmitted is bytes. Bytes are only grouped into segments to reduce the overhead of the TCP header.
Now, what if you got seq_num + 2? This is a bit odd because it indicates that the previous segment was only partially re-transmitted: the re-transmission starts at the 3rd byte. If the segment has only 3 bytes, it is re-transmitting the 3rd, 4th and 5th bytes (i.e. only the previous segment's bytes). But if it has, say, 10 bytes, it carries the 3rd through 12th bytes, so the 6th through 12th bytes are new (not re-transmitted).
In my opinion you can say that a TCP packet is a re-transmission only when it's carrying bytes with identifiers that were sent before. But as said earlier, this might not be the whole picture, as a segment could contain some bytes sent earlier plus more never sent, making it a mixture of re-transmission and new transmission.
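To make the byte counting concrete, here is a small sketch that splits one segment into re-transmitted and new bytes, given the next expected sequence number for that direction. The names are made up for illustration, and the inputs are assumed to be in host byte order (after ntohl()). The subtraction is done modulo 2^32 so wraparound is tolerated.

```c
#include <stdint.h>

/* Hypothetical result type: how a segment's payload divides up. */
struct seg_split {
    uint32_t retransmitted; /* bytes whose identifiers were already sent */
    uint32_t fresh;         /* bytes never sent before */
};

/*
 * next_expected: one past the highest byte identifier sent so far.
 * seq, len:      sequence number and payload length of the new segment.
 */
static struct seg_split split_segment(uint32_t next_expected,
                                      uint32_t seq, uint32_t len)
{
    struct seg_split s = {0, len};
    /* How far the segment starts behind the expected number (<= 0: not behind). */
    int32_t behind = (int32_t)(next_expected - seq);

    if (behind > 0) {
        s.retransmitted = (uint32_t)behind < len ? (uint32_t)behind : len;
        s.fresh = len - s.retransmitted;
    }
    return s;
}
```

With a first segment of 5 bytes at sequence number 1000 (so the next expected number is 1005), a 5-byte segment at 1000 is entirely re-transmitted, while a 10-byte segment at 1002 carries 3 re-transmitted bytes and 7 new ones.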
I'm very new to C++, but I'm trying to learn some basics of TCP socket coding. Anyway, I've been able to send and receive messages, but I want to prefix my packets with the length of the packet (like I did in C# apps I made in the past) so when my window gets the FD_READ command, I have the following code to read just the first two bytes of the packet to use as a short int.
char lengthBuffer[2];
int rec = recv(sck, lengthBuffer, sizeof(lengthBuffer), 0);
short unsigned int toRec = lengthBuffer[1] << 8 | lengthBuffer[0];
What's confusing me is that after a packet comes in, the 'rec' variable, which says how many bytes were read, is one, not two. If I make the lengthBuffer three chars instead of two, it reads three bytes, but if it's four, it also reads three (only odd numbers). I can't tell if I'm making some really stupid mistake here, or fundamentally misunderstanding some part of the language or the API. I'm aware that recv doesn't guarantee any number of bytes will be read, but if it's just two, it shouldn't take multiple reads.
Because you cannot assume how much data will be available, you'll need to continuously read from the socket until you have the amount you want. Something like this should work:
size_t rec = 0; // bytes received so far (unsigned, to match sizeof)
do {
    int result = recv(sck, &lengthBuffer[rec], sizeof(lengthBuffer) - rec, 0);
    if (result == -1) {
        // Handle error ...
        break;
    }
    else if (result == 0) {
        // Handle disconnect ...
        break;
    }
    else {
        rec += result;
    }
}
while (rec < sizeof(lengthBuffer));
Streamed sockets:
Sockets are generally used in a streamed way: you'll receive all the data sent, but not necessarily all at once. You may just as well receive it in pieces.
Your approach of sending the length is hence valid: once you've received the length, you can then fill a buffer, if needed across successive reads, until you have everything you expected. So you have to loop on receives, and define a strategy for how to handle extra bytes received.
Datagram (packet-oriented) sockets:
If your application is really packet oriented, you may consider creating a datagram socket by passing the SOCK_DGRAM, or better the SOCK_SEQPACKET, socket type to socket() on Linux or Windows.
Risk with your binary size data:
Be aware that the way you send and receive your size data appears to be asymmetric. You hence run a major risk if the sending and receiving machines have CPUs/architectures that do not use the same endianness. You can find here some hints on how to make your code platform/endian-independent.
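One way to sidestep the endianness problem is to fix the byte order of the length prefix yourself and build it byte by byte. A minimal sketch, assuming a 2-byte big-endian prefix (the choice of big-endian and the function names are mine, not something from the question):

```c
#include <stdint.h>

/* Write a 16-bit length as two bytes, most significant byte first. */
static void put_u16_be(unsigned char out[2], uint16_t value)
{
    out[0] = (unsigned char)(value >> 8);
    out[1] = (unsigned char)(value & 0xFF);
}

/* Read it back. The bytes are unsigned char on purpose: with plain char,
 * a byte >= 0x80 may be sign-extended before the shift/or and corrupt
 * the result. */
static uint16_t get_u16_be(const unsigned char in[2])
{
    return (uint16_t)(((unsigned)in[0] << 8) | (unsigned)in[1]);
}
```

Both sides then use the same two functions, and the result no longer depends on the CPU of either machine.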
A TCP socket is stream based, not packet based (I assume you use TCP, as sending the length of the packet in the data does not make any sense with UDP). The number of bytes you receive at once does not have to match the amount that was sent. For example you may send 10 bytes, but the receiver may receive 1 + 2 + 1 + 6 or any other combination. Your code has to handle that: it must be able to receive data partially and react when it has enough data (that's why you send the data packet length, for example).
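A sketch of a helper that keeps calling recv() until a fixed number of bytes has arrived, which is what reading a length prefix and then the body boils down to. It assumes a blocking POSIX socket; the function name and return convention are my own. On Winsock the same loop applies, but recv() returns int and errors are read with WSAGetLastError() rather than errno.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Read exactly `want` bytes from a blocking stream socket.
 * Returns 1 on success, 0 if the peer closed the connection first,
 * and -1 on error (errno is left set by recv).
 */
static int recv_exact(int fd, void *buf, size_t want)
{
    unsigned char *p = buf;
    size_t got = 0;

    while (got < want) {
        ssize_t n = recv(fd, p + got, want - got, 0);
        if (n == 0)
            return 0;              /* orderly shutdown by the peer */
        if (n < 0) {
            if (errno == EINTR)
                continue;          /* interrupted by a signal: retry */
            return -1;
        }
        got += (size_t)n;          /* partial read: keep going */
    }
    return 1;
}
```

A receiver would call recv_exact(fd, header, 2), decode the length, and then call recv_exact(fd, body, length).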
According to Wikipedia, a traceroute program works as follows:
Traceroute, by default, sends a sequence of User Datagram Protocol
(UDP) packets addressed to a destination host[...] The time-to-live
(TTL) value, also known as hop limit, is used in determining the
intermediate routers being traversed towards the destination. Routers
decrement packets' TTL value by 1 when routing and discard packets
whose TTL value has reached zero, returning the ICMP error message
ICMP Time Exceeded.[..]
I started writing a program (using an example UDP program as a guide) to adhere to this specification,
#include <sys/socket.h>
#include <assert.h>
#include <netinet/udp.h> //Provides declarations for udp header
#include <netinet/ip.h> //Provides declarations for ip header
#include <stdio.h>
#include <string.h>
#include <arpa/inet.h>
#include <unistd.h>
#include <stdlib.h> //Provides malloc, free and atoi
#define DATAGRAM_LEN (sizeof(struct iphdr) + sizeof(struct udphdr))
unsigned short csum(unsigned short *ptr,int nbytes) {
register long sum;
unsigned short oddbyte;
register short answer;
sum=0;
while(nbytes>1) {
sum+=*ptr++;
nbytes-=2;
}
if(nbytes==1) {
oddbyte=0;
*((u_char*)&oddbyte)=*(u_char*)ptr;
sum+=oddbyte;
}
sum = (sum>>16)+(sum & 0xffff);
sum = sum + (sum>>16);
answer=(short)~sum;
return(answer);
}
char *new_packet(int ttl, struct sockaddr_in sin) {
    static int id = 0;
    char *datagram = calloc(1, DATAGRAM_LEN); //Zeroed, so both checksum fields start at 0
    struct iphdr *iph = (struct iphdr*) datagram;
    struct udphdr *udph = (struct udphdr*)(datagram + sizeof (struct iphdr));
    iph->ihl = 5;
    iph->version = 4;
    iph->tos = 0;
    iph->tot_len = htons(DATAGRAM_LEN); //Multi-byte header fields are in network byte order
    iph->id = htons(++id); //Id of this packet (a 16-bit field, hence htons)
    iph->frag_off = 0;
    iph->ttl = ttl;
    iph->protocol = IPPROTO_UDP;
    iph->saddr = inet_addr("127.0.0.1");//Spoof the source ip address
    iph->daddr = sin.sin_addr.s_addr;
    iph->check = csum((unsigned short*)datagram, sizeof(struct iphdr)); //Covers the IP header only
    udph->source = htons(6666);
    udph->dest = htons(8622);
    udph->len = htons(8); //udp header size
    udph->check = 0; //0 means "no checksum" for UDP over IPv4; a real one needs the pseudo-header
    return datagram;
}
int main(int argc, char **argv) {
int s, ttl, repeat;
struct sockaddr_in sin;
char *data;
printf("\n");
if (argc != 3) {
printf("usage: %s <host> <port>", argv[0]);
return __LINE__;
}
sin.sin_family = AF_INET;
sin.sin_addr.s_addr = inet_addr(argv[1]);
sin.sin_port = htons(atoi(argv[2]));
if ((s = socket(AF_PACKET, SOCK_RAW, 0)) < 0) {
printf("Failed to create socket.\n");
return __LINE__;
}
ttl = 1, repeat = 0;
while (ttl < 2) {
data = new_packet(ttl, sin);
if (write(s, data, DATAGRAM_LEN) != DATAGRAM_LEN) {
printf("Socket failed to send packet.\n");
return __LINE__;
}
read(s, data, DATAGRAM_LEN);
free(data);
if (++repeat > 2) {
repeat = 0;
ttl++;
}
}
return 0;
}
... however at this point I have a few questions.
Is read(s, data, ... reading whole packets at a time, or do I need to parse the data read from the socket; seeking markers particular to IP packets?
What is the best way to uniquely mark my packets as they return to my box as expired?
Should I set up a second socket with the IPPROTO_ICMP flag, or is it easier to write a filter; accepting everything?
Do any other common mistakes exist; or are any common obstacles foreseeable?
Here are some of my suggestions (based on assumption it's a Linux machine).
Reading packets
You might want to read whole 1500-byte packets (the entire Ethernet frame). Don't worry: smaller frames will still be read completely, with read returning the length of the data read.
The best way to add a marker is to carry some UDP payload (a simple unsigned int should be good enough) and increase it on every packet sent. (I just did a tcpdump on traceroute: the ICMP error does return an entire IP frame, so you can look at the returned IP frame, parse the UDP payload and so on. Note that your DATAGRAM_LEN would change accordingly.) Of course you can use the ID, but be careful: the ID is mainly used by fragmentation. You should be okay with that, because you'd not be approaching the fragmentation limit on any intermediate routers with these packet sizes. Generally, it is not a good idea to 'steal' protocol fields that are meant for something else for a custom purpose.
A cleaner way could be to actually use IPPROTO_ICMP on raw sockets (if manuals are installed on your machine, see man 7 raw and man 7 icmp). You would not want to receive a copy of every packet on your device and ignore those that are not ICMP.
If you are using type SOCK_RAW on AF_PACKET, you will have to manually attach a link layer header, or you can use SOCK_DGRAM and check. Also see man 7 packet for a lot of subtleties.
Hope that helps or are you looking at some actual code?
A common pitfall is that programming at this level needs very careful use of the proper include files. For instance, your program as-is won't compile on NetBSD, which is typically quite strict in following relevant standards.
Even when I add some includes, there is no struct iphdr but there is a struct udpiphdr instead.
So for now the rest of my answer is not based on trying your program in practice.
read(2) can be used to read single packets at a time. For packet-oriented protocols, such as UDP, you'll never get more data from it than a single packet.
However you can also use recvfrom(2), recv(2) or recvmsg(2) to receive the packets.
If fildes refers to a socket, read() shall be equivalent to recv()
with no flags set.
To identify the packets, I believe using the id field is typically done, as you have already. I am not sure what you mean with "mark my packets as they return to my box as expired", since your packets don't return to you. What you may get back are ICMP Time Exceeded messages. These usually arrive within a few seconds, if they arrive at all. Sometimes they are not sent, sometimes they may be blocked by misconfigured routers between you and their sender.
Note that this assumes that the IP ID you set up in your packet is respected by the network stack you're using. It is possible that it isn't, and that the stack replaces your chosen ID with a different one. Van Jacobson, the original author of the traceroute command as found in NetBSD, therefore used a different method:
* The udp port usage may appear bizarre (well, ok, it is bizarre).
* The problem is that an icmp message only contains 8 bytes of
* data from the original datagram. 8 bytes is the size of a udp
* header so, if we want to associate replies with the original
* datagram, the necessary information must be encoded into the
* udp header (the ip id could be used but there's no way to
* interlock with the kernel's assignment of ip id's and, anyway,
* it would have taken a lot more kernel hacking to allow this
* code to set the ip id). So, to allow two or more users to
* use traceroute simultaneously, we use this task's pid as the
* source port (the high bit is set to move the port number out
* of the "likely" range). To keep track of which probe is being
* replied to (so times and/or hop counts don't get confused by a
* reply that was delayed in transit), we increment the destination
* port number before each probe.
Using an IPPROTO_ICMP socket for receiving the replies is more likely to be efficient than trying to receive all packets. It would also require fewer privileges to do so. Of course sending raw packets normally already requires root, but it could make a difference if a more fine-grained permission system is in use.
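As an illustration of matching a reply to a probe via the UDP header, here is a sketch that pulls the original destination port out of an ICMP Time Exceeded message. It assumes the buffer starts at the outer IPv4 header, which is what a Linux raw IPPROTO_ICMP socket delivers as far as I know. The function name is hypothetical, and fields are read by byte offset so no platform-specific structs are needed.

```c
#include <stddef.h>

/*
 * pkt/len: a received packet starting at the outer IPv4 header.
 * Returns the destination port of the UDP probe quoted inside an
 * ICMP Time Exceeded message, or -1 if the packet is something else
 * or is too short.
 */
static int probe_port_from_icmp(const unsigned char *pkt, size_t len)
{
    const unsigned char *icmp, *inner, *udp;
    size_t outer_ihl, inner_ihl, remaining;

    if (len < 20 || (pkt[0] >> 4) != 4)
        return -1;
    outer_ihl = (size_t)(pkt[0] & 0x0F) * 4;      /* header length in bytes */
    if (outer_ihl < 20 || pkt[9] != 1)            /* protocol 1 = ICMP */
        return -1;
    if (len < outer_ihl + 8)
        return -1;
    icmp = pkt + outer_ihl;
    if (icmp[0] != 11)                            /* type 11 = Time Exceeded */
        return -1;

    /* After the 8-byte ICMP header comes the quoted original datagram:
     * its IP header plus at least the first 8 bytes (the UDP header). */
    inner = icmp + 8;
    remaining = len - outer_ihl - 8;
    if (remaining < 20)
        return -1;
    inner_ihl = (size_t)(inner[0] & 0x0F) * 4;
    if (inner_ihl < 20 || inner[9] != 17)         /* protocol 17 = UDP */
        return -1;
    if (remaining < inner_ihl + 8)
        return -1;
    udp = inner + inner_ihl;
    return (udp[2] << 8) | udp[3];                /* UDP destination port */
}
```

With the port scheme from the quoted comment, the returned port identifies which probe the reply belongs to, and the quoted source port (udp[0] and udp[1]) could be checked the same way to tell apart concurrent runs.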
I am using domain sockets (AF_UNIX) to communicate between two threads for inter process communication. This is chosen to work well with libev: I use it on the recv end of the domain socket. This works very well except that the data I am sending is constant 4864 bytes. I cannot afford to get this data fragmented. I always thought domain sockets won't fragment data, but as it turns out it does. When the communication is at its peak between the threads, I observe the following
Thread 1:
SEND = 4864 actual size = 4864
Thread 2:
READ = 3328 actual size = 4864
Thread 1:
SEND = 4864 actual size = 4864
Thread 2:
READ = 1536 actual size = 4864
As you can see, thread 2 received the data in fragments (3328 + 1536). This is really bad for my application. Is there any way we can make it not fragment the data? I understand that IP_DONTFRAG can only be set for the AF_INET family? Can someone suggest an alternative?
Update: sendto code
ssize_t
socket_domain_writer_dgram_send(int *domain_sd, domain_packet_t *pkt) {
    struct sockaddr_un remote;
    unsigned long len = 0;
    ssize_t ret = 0;
    memset(&remote, '\0', sizeof(struct sockaddr_un));
    remote.sun_family = AF_UNIX;
    /* Bound the copy by the destination size; the memset above keeps it NUL-terminated */
    strncpy(remote.sun_path, DOMAIN_SOCK_PATH, sizeof(remote.sun_path) - 1);
    len = strlen(remote.sun_path) + sizeof(remote.sun_family) + 1;
    ret = sendto(*domain_sd, pkt, sizeof(*pkt), 0, (struct sockaddr *)&remote, sizeof(struct sockaddr_un));
    if (ret == -1) {
        bps_log(BPS_LOGGER_RD, ASL_LEVEL_ERR, "Domain writer could not send packets", errno);
    }
    return ret;
}
SOCK_STREAM by definition doesn't preserve message boundaries. Try again with SOCK_DGRAM or SOCK_SEQPACKET:
http://man7.org/linux/man-pages/man7/unix.7.html
On the other hand, consider that you may be passing messages larger than your architecture page size. For example, for amd64, a memory page is 4K. If that's a problem for any reason it might make sense to split the packets in 2.
Note, however, that it's not a real problem for the packets to arrive fragmented. It's common to have a packet assembler at the receiving end of the socket. What's wrong with implementing it?
4864 + 3328 = 8192. My guess is that you're transmitting two 4864-byte packets back to back in some cases, and it's filling an 8 KB kernel buffer somewhere. IP_DONTFRAG isn't applicable because IP is not involved here — the "fragmentation" you're seeing is happening via a completely different mechanism.
If all the data you're transmitting consists of packets, you would do well to use a datagram socket (SOCK_DGRAM) instead of a stream. This should make the send() block when the kernel buffer doesn't have sufficient space to store an entire packet, rather than allowing a partial write through, and will make each recv() return exactly one packet, so you don't need to deal with framing.
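A small sketch of that behaviour, using socketpair() so it needs no socket path. Two 4864-byte messages are sent back to back and then read into a buffer big enough for both; with SOCK_DGRAM each recv() should still return exactly one message. The function name is made up. It also assumes the system allows local datagrams of this size: some systems cap them lower by default, in which case send() fails and the function returns -1.

```c
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

#define MSG_SIZE 4864

/*
 * Sends `count` fixed-size messages back to back over an AF_UNIX
 * datagram socket pair, then reads them, storing the size returned
 * by each recv() in sizes[]. Returns 0 on success, -1 on failure.
 */
static int dgram_roundtrip(int count, ssize_t *sizes)
{
    static unsigned char out[MSG_SIZE];
    static unsigned char in[2 * MSG_SIZE];  /* room for two messages */
    int sv[2];
    int rc = -1;
    int i;

    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) != 0)
        return -1;
    memset(out, 0xAB, sizeof out);
    for (i = 0; i < count; i++) {
        if (send(sv[0], out, sizeof out, 0) != (ssize_t)sizeof out)
            goto done;
    }
    for (i = 0; i < count; i++) {
        sizes[i] = recv(sv[1], in, sizeof in, 0);
        if (sizes[i] < 0)
            goto done;
    }
    rc = 0;
done:
    close(sv[0]);
    close(sv[1]);
    return rc;
}
```

Keep count small: the kernel only queues a limited number of unread datagrams before send() blocks.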
Compiler: Code::Blocks(GNU GCC)
Platform: Windows(x86)
Includes: winsock.h winsock2.h (ws2_32 is linked as well)
I am currently trying to write a program that will read a text file containing a list of IP addresses and ping each of them in turn. If a host responds to the ping, its IP address will be copied to a second file specified by the user. Unfortunately this is the first time I have used C's socket library and I cannot find a good tutorial about how to ping using C. From what I understand of the couple of tutorials I did find, I need to include an ICMP header, which is a struct containing the ICMP type, code, and checksum, in an IP datagram. But I have no idea how to go about doing so: should I declare the struct myself or is it declared in a header file? I am assuming that it is in a header, but the tutorials contradicted each other about exactly where it is declared. I tried including icmp.h and netinet/icmp.h but my compiler complained that they don't exist, so I created my own struct.
struct echo_request
{
char type; // Type
char code; // Code
short checksum; // Checksum
short id; // Identification
short seq; // Sequence
int time; // Time
char data[16]; // Data
};
I thought that I might be able to get away with it but I wasn't even able to compile my program because my compiler says that in_cksum()(checksum generator) is undefined.
To sum up my questions, what header files should I include, how do I create a ping packet, am I using the correct checksum generator function, should a ping be directed to port 80, and should the socket I use be RAW or DGRAM?
This is what I have so far, please note that I have purposely left out error checking.
int socket_descriptor = socket(AF_INET, SOCK_RAW, IPPROTO_ICMP);
struct sockaddr_in address; //Initialize address struct
memset(&address, 0, sizeof(address)); //Clear address struct
//Declare address
address.sin_family = AF_INET;
address.sin_addr.s_addr = inet_addr(ipaddress);
address.sin_port = htons(80);
//Bind socket to address
bind(socket_descriptor, (struct sockaddr *)&address, sizeof(address));
//Create packet
struct echo_request packet; //See above for declaration of struct
memset(packet.data, 1, 16);
packet.type = 8; //ECHO_REQUEST
packet.code = 0;
packet.time = gettime();
packet.checksum = 0;
packet.checksum = in_cksum(packet, sizeof(packet));
If you don't have to implement ping from scratch and you only want a Windows solution, I'd second Anton's suggestion of IcmpSendEcho. If you have to implement ping, look at how the POCO ICMP package is implemented. It is portable code and it runs fine on Windows.
In regards to the specific questions, here are the answers:
what header files should I include
#include <winsock2.h>
how do I create a ping packet
See ICMPv4PacketImpl::initPacket() for an example of IPv4 packet.
am I using the correct checksum generator function
Not for windows. See ICMPPacketImpl::checksum() for an example of checksum function.
should a ping be directed to port 80
No. There's no such thing as port when it comes to ICMP. See Does ICMP use a specific port?
should the socket I use be RAW or DGRAM
It should be RAW.
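On the checksum question: in_cksum() is not a library function, so it has to be written by hand. Below is a sketch of the standard Internet checksum (RFC 1071) that ICMP uses. It is not the POCO code, and the function name is my own. It works on raw bytes, so it does not depend on the host's endianness or on struct padding.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * One's-complement sum of 16-bit big-endian words, complemented.
 * The checksum field inside `data` must be zero when this is called.
 */
static uint16_t inet_checksum(const unsigned char *data, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)data[i] << 8) | data[i + 1];
    if (i < len)
        sum += (uint32_t)data[i] << 8;   /* odd trailing byte, zero padded */
    while (sum >> 16)
        sum = (sum & 0xFFFFu) + (sum >> 16);   /* fold the carries back in */
    return (uint16_t)~sum;
}
```

The result is stored high byte first: for an ICMP message in a byte buffer, buf[2] = cs >> 8 and buf[3] = cs & 0xFF. Running the function again over the finished message should then give 0, which is a handy self-check.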
It looks like you want a real solution, not just reimplementing PING for the sake of it.
I recommend using IP helper (ICMP.dll on pre-WinXP systems), specifically, IcmpSendEcho (or its enhanced versions, IcmpSendEcho2, IcmpSendEcho2Ex, for asynchronous operations).
There is a complete example of "pinging" a host on MSDN. It may be a good starting point.
Update: for GCC (mingw), link with -liphlpapi.