I am trying to use a 16-bit Fletcher checksum here. Basically, my program simulates traffic over the physical layer by "sending" and "receiving" packets between two virtual entities. I am printing out the packets at both sides and they do match, but I am getting a different checksum calculated at the receiving end.
Packet structure:
#define MESSAGE_LENGTH 20
struct pkt {
int seqnum;
int acknum;
int checksum;
char payload[MESSAGE_LENGTH];
};
This is the code I'm using to compute the checksum of each packet:
/*
* Computes the fletcher checksum of the input packet
*/
uint16_t calcChecksum(struct pkt *packet) {
/* the data for the checksum needs to be contiguous, so here I am making
a temporary block of memory to hold everything except the packet checksum */
size_t sizeint = sizeof(int);
size_t size = sizeof(struct pkt) - sizeint;
uint8_t *temp = malloc(size);
memcpy(temp, packet, sizeint * 2); // copy the seqnum and acknum
memcpy(temp + (2*sizeint), &packet->payload, MESSAGE_LENGTH); // copy data
// calculate checksum
uint16_t checksum = fletcher16((uint8_t const *) &temp, size);
free(temp);
return checksum;
}
/*
* This is a checksum algorithm that I shamelessly copied off a wikipedia page.
*/
uint16_t fletcher16( uint8_t const *data, size_t bytes ) {
uint16_t sum1 = 0xff, sum2 = 0xff;
size_t tlen;
while (bytes) {
tlen = bytes >= 20 ? 20 : bytes;
bytes -= tlen;
do {
sum2 += sum1 += *data++;
} while (--tlen);
sum1 = (sum1 & 0xff) + (sum1 >> 8);
sum2 = (sum2 & 0xff) + (sum2 >> 8);
}
/* Second reduction step to reduce sums to 8 bits */
sum1 = (sum1 & 0xff) + (sum1 >> 8);
sum2 = (sum2 & 0xff) + (sum2 >> 8);
return sum2 << 8 | sum1;
}
I don't know much about checksums and I copied that algorithm off a page I found, so if anyone can understand why the checksums for two otherwise identical packets are different I would greatly appreciate it. Thank you!
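The problem occurs because you are not passing the address of the data that temp points to, but rather the address of the variable temp itself, which is stored on the stack. So you end up checksumming the bytes of the pointer and whatever follows it on the stack, which can differ from call to call.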
You should change
uint16_t checksum = fletcher16((uint8_t const *) &temp, size);
to
uint16_t checksum = fletcher16((uint8_t const *) temp, size);
^ no & operator
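For illustration, here is a minimal sketch of the question's calcChecksum with that fix applied. It also replaces the malloc with a fixed-size local array and copies each field separately, so it does not rely on the struct having no padding. The name calcChecksumFixed is hypothetical; struct pkt and fletcher16 are repeated from the question so the block compiles on its own.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MESSAGE_LENGTH 20

struct pkt {
    int seqnum;
    int acknum;
    int checksum;
    char payload[MESSAGE_LENGTH];
};

/* Same Fletcher-16 routine as in the question. */
uint16_t fletcher16(uint8_t const *data, size_t bytes)
{
    uint16_t sum1 = 0xff, sum2 = 0xff;
    size_t tlen;

    while (bytes) {
        tlen = bytes >= 20 ? 20 : bytes;
        bytes -= tlen;
        do {
            sum2 += sum1 += *data++;
        } while (--tlen);
        sum1 = (sum1 & 0xff) + (sum1 >> 8);
        sum2 = (sum2 & 0xff) + (sum2 >> 8);
    }
    sum1 = (sum1 & 0xff) + (sum1 >> 8);
    sum2 = (sum2 & 0xff) + (sum2 >> 8);
    return sum2 << 8 | sum1;
}

/* Checksums seqnum, acknum and payload, skipping the checksum field.
 * Uses a local buffer instead of malloc and passes temp (not &temp). */
uint16_t calcChecksumFixed(const struct pkt *packet)
{
    uint8_t temp[2 * sizeof(int) + MESSAGE_LENGTH];

    memcpy(temp, &packet->seqnum, sizeof(int));
    memcpy(temp + sizeof(int), &packet->acknum, sizeof(int));
    memcpy(temp + 2 * sizeof(int), packet->payload, MESSAGE_LENGTH);
    return fletcher16(temp, sizeof temp);
}
```

With this version, two packets that differ only in their checksum field produce the same value on both sides.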
Related
I'm having a hard time figuring out which implementation of the 32-bit variation of the Fletcher checksum algorithm is correct. Wikipedia provides the following optimized implementation:
uint32_t fletcher32( uint16_t const *data, size_t words ) {
uint32_t sum1 = 0xffff, sum2 = 0xffff;
size_t tlen;
while (words) {
tlen = words >= 359 ? 359 : words;
words -= tlen;
do {
sum2 += sum1 += *data++;
} while (--tlen);
sum1 = (sum1 & 0xffff) + (sum1 >> 16);
sum2 = (sum2 & 0xffff) + (sum2 >> 16);
}
/* Second reduction step to reduce sums to 16 bits */
sum1 = (sum1 & 0xffff) + (sum1 >> 16);
sum2 = (sum2 & 0xffff) + (sum2 >> 16);
return sum2 << 16 | sum1;
}
In addition, I've adapted the non-optimized 16-bit example from the Wikipedia article to compute a 32-bit checksum:
uint32_t naive_fletcher32(uint16_t *data, int words) {
uint32_t sum1 = 0;
uint32_t sum2 = 0;
int index;
for( index = 0; index < words; ++index ) {
sum1 = (sum1 + data[index]) % 0xffff;
sum2 = (sum2 + sum1) % 0xffff;
}
return (sum2 << 16) | sum1;
}
Both these implementations yield the same results, e.g. 0x56502d2a for the string abcdef. To verify that this is indeed correct, I tried to find other implementations of the algorithm:
An online checksum/hash generator
C++ implementation in the srecord project
There's also a JavaScript implementation
All of these seem to agree that the checksum for abcdef is 0x8180255 instead of the value given by the implementation on Wikipedia. I've narrowed this down to how the implementations read the data buffer: all the above non-Wikipedia implementations operate one byte at a time, whereas the Wikipedia implementation computes the checksum using 16-bit words. If I modify the above "naive" Wikipedia implementation to operate per byte instead, it reads like this:
uint32_t naive_fletcher32_per_byte(uint8_t *data, int words) {
uint32_t sum1 = 0;
uint32_t sum2 = 0;
int index;
for( index = 0; index < words; ++index ) {
sum1 = (sum1 + data[index]) % 0xffff;
sum2 = (sum2 + sum1) % 0xffff;
}
return (sum2 << 16) | sum1;
}
The only thing that changes is the signature, really. So this modified naive implementation and the above mentioned implementations (except Wikipedia) agree that the checksum of abcdef is indeed 0x8180255.
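To make the byte-versus-word difference concrete, here is a small sketch (fletcher32_le_words is a hypothetical name) that does the word grouping explicitly: it pairs bytes into little-endian 16-bit words, zero-pads an odd trailing byte, and then applies the naive modulo loop. On a little-endian host this is what casting the string to uint16_t * does, so it reproduces 0x56502d2a for abcdef, while feeding the same bytes one at a time gives 0x8180255.

```c
#include <stddef.h>
#include <stdint.h>

/* Fletcher-32 over a byte string, grouping bytes into 16-bit words
 * explicitly: first byte = low half, second byte = high half (little-endian).
 * An odd trailing byte gets a zero high half (zero padding).
 * Uses the simple modulo-65535 form of the loop. */
uint32_t fletcher32_le_words(const uint8_t *data, size_t nbytes)
{
    uint32_t sum1 = 0, sum2 = 0;
    size_t i;

    for (i = 0; i < nbytes; i += 2) {
        uint32_t word = data[i];
        if (i + 1 < nbytes)
            word |= (uint32_t)data[i + 1] << 8;
        sum1 = (sum1 + word) % 0xffff;
        sum2 = (sum2 + sum1) % 0xffff;
    }
    return (sum2 << 16) | sum1;
}
```

Grouping the bytes big-endian instead would give yet another value for the same string, which is why the result of the word-based code depends on the host's byte order.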
My problem now is: which one is correct?
According to the standard, the right method is the one that Wikipedia provides, except for the name:
Note that the 8-bit Fletcher algorithm gives a 16-bit checksum and the 16-bit algorithm gives a 32-bit checksum.
In the standard quoted in the answer of HideFromKGB, the algorithm is trivial: the 8-bit version uses only 8 bit accumulators ("ints"), producing 8 bit results A and B, and the 16-bit version uses 16 bit "ints", producing 16 bit results A and B.
It should be noted that what Wikipedia calls the "32-bit Fletcher" is actually the "16-bit Fletcher". In the standard, the number of bits in the name refers to the number of bits in each D[i] and in each of A and B, but on Wikipedia it refers to the number of bits in the "stacked result", i.e. in B<<16 | A for the 32-bit result.
I did not implement this, but maybe this can explain the difference. I am inclined to say that your interpretation (implementation) is correct.
N.b.: also note that it is necessary to pad data with zeroes to the appropriate number of bytes.
These are test vectors, cross-checked against two different implementations, for the 16-bit and 32-bit checksums:
8-bit implementation (16-bit checksum)
"abcde" -> 51440 (0xC8F0)
"abcdef" -> 8279 (0x2057)
"abcdefgh" -> 1575 (0x0627)
16-bit implementation (32-bit checksum)
"abcde" -> 4031760169 (0xF04FC729)
"abcdef" -> 1448095018 (0x56502D2A)
"abcdefgh" -> 3957429649 (0xEBE19591)
RFC 1146, "TCP Alternate Checksum Options" (March 1990), describes the Fletcher checksum algorithm for use with TCP.
The 8-bit Fletcher algorithm which gives a 16-bit checksum and the 16-bit algorithm which gives a 32-bit checksum are discussed.
The 8-bit Fletcher Checksum Algorithm is calculated over a sequence
of data octets (call them D[1] through D[N]) by maintaining 2
unsigned 1's-complement 8-bit accumulators A and B whose contents are
initially zero, and performing the following loop where i ranges from
1 to N:
A := A + D[i]
B := B + A
The 16-bit Fletcher Checksum algorithm proceeds in precisely the same
manner as the 8-bit checksum algorithm, except that A, B and the
D[i] are 16-bit quantities. It is necessary (as it is with the
standard TCP checksum algorithm) to pad a datagram containing an odd
number of octets with a zero octet.
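A literal reading of the RFC text, with A and B kept as 8-bit ones'-complement accumulators (add, then fold the carry back in), might look like the sketch below. The names are hypothetical, and packing B into the high byte follows the Wikipedia code rather than anything stated in the excerpt above. It reproduces the 8-bit test vectors listed earlier; the visible difference from a modulo-255 implementation is that a ones'-complement accumulator can hold 0xFF ("negative zero") where a modulo-255 sum would hold 0.

```c
#include <stddef.h>
#include <stdint.h>

/* 8-bit ones'-complement addition: add, then fold the carry back in. */
static uint8_t oc_add8(uint8_t a, uint8_t b)
{
    unsigned s = (unsigned)a + b;            /* at most 0x1FE */
    return (uint8_t)((s & 0xff) + (s >> 8)); /* end-around carry, at most 0xFF */
}

/* Literal reading of the loop above: A := A + D[i]; B := B + A,
 * with A and B as 8-bit ones'-complement accumulators starting at zero.
 * Packs B into the high byte and A into the low byte (assumption). */
uint16_t fletcher8_rfc1146(const uint8_t *d, size_t n)
{
    uint8_t a = 0, b = 0;
    size_t i;

    for (i = 0; i < n; ++i) {
        a = oc_add8(a, d[i]);
        b = oc_add8(b, a);
    }
    return (uint16_t)((b << 8) | a);
}
```

For example, a single 0xFF byte gives 0xFFFF here, whereas the modulo-255 form gives 0x0000.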
That agrees with the Wikipedia algorithms. This simple test program confirms the quoted results:
#include <stdio.h>
#include <string.h>
#include <stdint.h> // for uint32_t

uint32_t fletcher32_1(const uint16_t *data, size_t len)
{
    uint32_t c0, c1;
    unsigned int i;

    for (c0 = c1 = 0; len >= 360; len -= 360) {
        for (i = 0; i < 360; ++i) {
            c0 = c0 + *data++;
            c1 = c1 + c0;
        }
        c0 = c0 % 65535;
        c1 = c1 % 65535;
    }
    for (i = 0; i < len; ++i) {
        c0 = c0 + *data++;
        c1 = c1 + c0;
    }
    c0 = c0 % 65535;
    c1 = c1 % 65535;
    return (c1 << 16 | c0);
}

uint32_t fletcher32_2(const uint16_t *data, size_t l)
{
    uint32_t sum1 = 0xffff, sum2 = 0xffff;

    while (l) {
        unsigned tlen = l > 359 ? 359 : l;
        l -= tlen;
        do {
            sum2 += sum1 += *data++;
        } while (--tlen);
        sum1 = (sum1 & 0xffff) + (sum1 >> 16);
        sum2 = (sum2 & 0xffff) + (sum2 >> 16);
    }
    /* Second reduction step to reduce sums to 16 bits */
    sum1 = (sum1 & 0xffff) + (sum1 >> 16);
    sum2 = (sum2 & 0xffff) + (sum2 >> 16);
    return (sum2 << 16) | sum1;
}

int main(void)
{
    const char *str1 = "abcde";
    const char *str2 = "abcdef";
    size_t len1 = (strlen(str1)+1) / 2; // '\0' will be used for padding
    size_t len2 = (strlen(str2)+1) / 2;

    /* The strings are reinterpreted as 16-bit words in host byte order;
       the output below assumes a little-endian machine. */
    uint32_t f1 = fletcher32_1((const uint16_t *)str1, len1);
    uint32_t f2 = fletcher32_2((const uint16_t *)str1, len1);
    printf("%u %X \n", f1, f1);
    printf("%u %X \n\n", f2, f2);

    f1 = fletcher32_1((const uint16_t *)str2, len2);
    f2 = fletcher32_2((const uint16_t *)str2, len2);
    printf("%u %X \n", f1, f1);
    printf("%u %X \n", f2, f2);
    return 0;
}
Output:
4031760169 F04FC729
4031760169 F04FC729
1448095018 56502D2A
1448095018 56502D2A
My answer focuses on the correctness of s = (s & 0xffff) + (s >> 16);.
This is obviously meant to replace the modulo operation. The expensive part of a modulo operation is the division it requires. The trick is to avoid the division by estimating floor(s / 65535). So instead of computing s - floor(s/65535)*65535, which is exactly s % 65535, we compute s - floor(s/65536)*65535. This is not equivalent to the modulo, but it leaves the remainder modulo 65535 unchanged and is good enough to quickly reduce the size of s.
Now we have
s - floor(s / 65536) * 65535
= s - (s >> 16) * 65535
= s - (s >> 16) * (65536 - 1)
= s - (s >> 16) * 65536 + (s >> 16)
= (s & 0xffff) + (s >> 16)
Since (s & 0xffff) + (s >> 16) is not equivalent to the modulo, this formula alone does not suffice. If s == 65535, then s % 65535 yields zero, but the formula yields 65535. So the optimized Wikipedia implementation posted here does not always produce the same result as the modulo-based version! To make it equivalent, the last 3 lines need to be changed to
/* Second reduction step to reduce sums to 16 bits */
sum1 = (sum1 & 0xffff) + (sum1 >> 16);
sum2 = (sum2 & 0xffff) + (sum2 >> 16);
if (sum1 >= 65535) { sum1 -= 65535; }
if (sum2 >= 65535) { sum2 -= 65535; }
return (sum2 << 16) | sum1;
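A small sketch of that reduction as a standalone helper (reduce_mod65535 is a hypothetical name). It applies the fold and at most one conditional subtraction, which matches s % 65535 for every s below 0xFFFFFFFF; at exactly 0xFFFFFFFF it returns 65535 instead of 0, which is the case the addendum below deals with.

```c
#include <stdint.h>

/* s mod 65535 without a division: fold once, then subtract 65535 at most once.
 * Correct for s < 0xFFFFFFFF; for s == 0xFFFFFFFF it returns 65535, not 0. */
uint32_t reduce_mod65535(uint32_t s)
{
    s = (s & 0xffff) + (s >> 16);   /* same residue mod 65535, at most 0x1FFFE */
    if (s >= 65535)
        s -= 65535;
    return s;
}
```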
Note that I can no longer find the optimized implementation on the Wikipedia page (February 2020).
Addendum:
Suppose s were equal to the maximum unsigned 32-bit value, 0xFFFF_FFFF. Then the formula (s & 0xffff) + (s >> 16); yields 0x1FFFE, which is exactly two times 65535. So the correction step if (s >= 65535) { s -= 65535; } does not work, since it subtracts 65535 at most once. We therefore want to keep sum1 and sum2 in the loops strictly smaller than 0xFFFF_FFFF; then the formula yields at most 2*65535-1 and the correction step works. The following simple Python program determines that sum2 would become too big at the 360th iteration, so processing at most 359 16-bit words at a time is exactly right.
s1 = 0x1FFFD
s2 = 0x1FFFD
for i in range(1, 1000):
    s1 += 0xFFFF
    s2 += s1
    if s2 >= 0xFFFFFFFF:
        print(i)
        break
I'm on an x64 machine. Here is how I'm calculating the checksum for ICMP:
unsigned short in_checksum(unsigned short *ptr, int n_bytes)
{
register long sum;
u_short odd_byte;
register u_short ret_checksum;
while (n_bytes > 1)
{
sum += *ptr++;
n_bytes -= 2;
}
if (n_bytes == 1)
{
odd_byte = 0;
*((u_char *) & odd_byte) = * (u_char *) ptr;
sum += odd_byte;
}
sum = (sum >> 16) + (sum & 0xffff);
sum += (sum >> 16);
ret_checksum = ~sum;
return ret_checksum;
}
When I sniff the sent packets with Wireshark, it always says the checksum is incorrect for every ICMP packet. What's going on?
You forgot to initialize
register long sum;
to 0. Passing the -W option (the older name of -Wextra) to gcc would have told you:
...: In function 'in_checksum':
...: warning: 'sum' may be used uninitialized in this function
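For comparison, a sketch of the same routine with the accumulator initialized. It also reads the words with memcpy to sidestep alignment and aliasing concerns, and uses a 32-bit unsigned accumulator, which assumes buffers of at most 128 KiB so the sum cannot overflow. in_checksum_fixed is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Internet checksum (RFC 1071), same steps as in_checksum() above
 * but with the accumulator initialized to zero.
 * Assumes n_bytes <= 131074 so the 32-bit sum cannot overflow. */
uint16_t in_checksum_fixed(const void *buf, size_t n_bytes)
{
    const uint8_t *p = buf;
    uint32_t sum = 0;                  /* the missing initialization */
    uint16_t word;

    while (n_bytes > 1) {
        memcpy(&word, p, sizeof word); /* host-order 16-bit word, no alignment needed */
        sum += word;
        p += 2;
        n_bytes -= 2;
    }
    if (n_bytes == 1) {                /* odd byte, padded with a zero byte */
        word = 0;
        memcpy(&word, p, 1);
        sum += word;
    }
    while (sum >> 16)                  /* fold carries back into 16 bits */
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}
```

Storing the result into the packet with memcpy gives the correct bytes on either byte order; for the RFC 1071 example bytes 00 01 f2 03 f4 f5 f6 f7, the stored checksum bytes are 22 0d.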
I am writing a small POSIX program and I need to compute the checksum of a TCP segment. I would like to use an existing function to avoid writing one myself.
Something like (pseudocode) :
char *data = ....
u16_integer = computeChecksum(data);
I searched the web but did not find a suitable answer. Any suggestions?
Here, it's taken more or less directly from the RFC:
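#include <stdint.h>

uint16_t ip_calc_csum(int len, uint16_t * ptr)
{
    uint32_t sum = 0;          /* unsigned, so carries accumulate safely */
    uint16_t answer = 0;
    uint16_t *w = ptr;
    int nleft = len;           /* length in bytes */

    while (nleft > 1) {        /* add up 16-bit words */
        sum += *w++;
        nleft -= 2;
    }
    if (nleft == 1) {          /* odd length: pad the last byte with a zero byte */
        uint16_t last = 0;
        *(uint8_t *)&last = *(uint8_t *)w;
        sum += last;
    }
    sum = (sum >> 16) + (sum & 0xFFFF);  /* fold the carries back in */
    sum += (sum >> 16);
    answer = ~sum;
    /* For TCP, ptr/len must cover the pseudo-header followed by the
       segment, with the segment's checksum field set to zero. */
    return (answer);
}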
Is this conversion from the original correct?
uint8_t fletcher8( uint8_t *data, uint8_t len )
{
uint8_t sum1 = 0xff, sum2 = 0xff;
while (len) {
unsigned tlen = len > 360 ? 360 : len;
len -= tlen;
do {
sum1 += *data++;
sum2 += sum1;
tlen -= sizeof( uint8_t );
} while (tlen);
sum1 = (sum1 & 0xff) + (sum1 >> 4);
sum2 = (sum2 & 0xff) + (sum2 >> 4);
}
/* Second reduction step to reduce sums to 4 bits */
sum1 = (sum1 & 0xff) + (sum1 >> 4);
sum2 = (sum2 & 0xff) + (sum2 >> 4);
return sum2 << 4 | sum1;
}
Original:
uint32_t fletcher32( uint16_t *data, size_t len )
{
uint32_t sum1 = 0xffff, sum2 = 0xffff;
while (len) {
unsigned tlen = len > 360 ? 360 : len;
len -= tlen;
do {
sum1 += *data++;
sum2 += sum1;
tlen -= sizeof( uint16_t );
} while (tlen);
sum1 = (sum1 & 0xffff) + (sum1 >> 16);
sum2 = (sum2 & 0xffff) + (sum2 >> 16);
}
/* Second reduction step to reduce sums to 16 bits */
sum1 = (sum1 & 0xffff) + (sum1 >> 16);
sum2 = (sum2 & 0xffff) + (sum2 >> 16);
return sum2 << 16 | sum1;
}
len will be 8.
data will have an input of data[] (1 - 8)
Actually I don't know what to do with the line: unsigned tlen = len > 360 ? 360 : len;
Maybe -> int8_t tlen = len > 255 ? 255 : len;
How should this tlen value be computed?
Actually I don't know what to do with the line: unsigned tlen = len > 360 ? 360 : len;
That line appears to come from an old version of this Wikipedia section. It has been changed to 359 by now, with the rationale explained on the talk page. The number only applies to summing 16 bit entities, as it is the largest number n satisfying
$n(n+5)/2 \times (2^{16}-1) < 2^{32}$
In other words, this is the largest number of data words you can add without performing a modulo reduction while still avoiding an overflow of the uint32_t. For 4-bit data words and an 8-bit accumulator, the corresponding value is 3, computed using
$n(n+5)/2 \times (2^{4}-1) < 2^{8}$
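The bound can be checked mechanically. This sketch (max_block_len is a hypothetical helper) searches for the largest n satisfying the inequality above for given data-word and accumulator widths, using 64-bit arithmetic.

```c
#include <stdint.h>

/* Largest n with n(n+5)/2 * (2^data_bits - 1) < 2^acc_bits (the bound above).
 * Assumes data_bits <= 31 and acc_bits <= 40 so the 64-bit arithmetic cannot overflow. */
unsigned max_block_len(unsigned data_bits, unsigned acc_bits)
{
    const uint64_t max_word = (UINT64_C(1) << data_bits) - 1;
    const uint64_t limit = UINT64_C(1) << acc_bits;
    unsigned n = 0;

    /* try n + 1; (n+1)(n+6) is always even, so the division is exact */
    while ((uint64_t)(n + 1) * (n + 6) / 2 * max_word < limit)
        n++;
    return n;
}
```

max_block_len(16, 32) returns 359 and max_block_len(4, 8) returns 3.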
So you have to modify that line if you change your data size. You could also change the code to use a larger data type to keep its sums, thus summing more blocks before the reduction. But in that case, you might require more than one reduction step inside the loop.
For example, if you were to use uint32_t for sum1 and sum2, then you could sum 23927 nibbles before danger of overflow, but after that you would require up to 7 reductions of the form sum1 = (sum1 & 0xf) + (sum1 >> 4) to boil this down to the range 1 through 0x1e, the way your original algorithm does it. It might be more efficient to write this as (sum1 - 1)%0xf + 1. In that case you might even change the range back from 1 through 15 to 0 through 14, initialize the sums to 0, and write the reduction as sum1 %= 0xf, unless you require compatibility with an implementation that uses the other range.
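Following that last suggestion (zero-initialized uint32_t sums, a plain % 0xf reduction, range 0 through 14), a nibble-based checksum might look like this. It is a sketch under the assumption that each data[i] holds one 4-bit value; fletcher8_nibbles is a hypothetical name, and its result may differ from an implementation using the 1 through 15 range whenever a sum is a multiple of 15.

```c
#include <stddef.h>
#include <stdint.h>

/* Fletcher-style checksum over 4-bit data words, giving 8 bits.
 * Assumes each data[i] holds one nibble; any higher bits are masked off.
 * Zero-initialized uint32_t sums, reduced with % 0xf (range 0..14). */
uint8_t fletcher8_nibbles(const uint8_t *data, size_t len)
{
    uint32_t sum1 = 0, sum2 = 0;

    while (len) {
        /* 23927 nibbles can be summed before a 32-bit sum could overflow */
        size_t tlen = len > 23927 ? 23927 : len;
        len -= tlen;
        do {
            sum1 += *data++ & 0xf;
            sum2 += sum1;
        } while (--tlen);
        sum1 %= 0xf;
        sum2 %= 0xf;
    }
    return (uint8_t)(sum2 << 4 | sum1);
}
```

For data = {1, 2, ..., 8} this gives sum1 = 36 % 15 = 6 and sum2 = 120 % 15 = 0, i.e. 0x06.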
I think you need 0xF masks throughout, not 0xFF. The 32-bit version uses a 16-bit mask, half of 32; your 8-bit version uses an 8-bit mask, which is not half of 8 (4 bits is).
uint8_t fletcher8( uint8_t *data, uint8_t len )
{
uint8_t sum1 = 0xf, sum2 = 0xf;
while (len) {
unsigned tlen = len > 360 ? 360 : len;
len -= tlen;
do {
sum1 += *data++;
sum2 += sum1;
tlen -= sizeof( uint8_t );
} while (tlen);
sum1 = (sum1 & 0xf) + (sum1 >> 4);
sum2 = (sum2 & 0xf) + (sum2 >> 4);
}
/* Second reduction step to reduce sums to 4 bits */
sum1 = (sum1 & 0xf) + (sum1 >> 4);
sum2 = (sum2 & 0xf) + (sum2 >> 4);
return sum2 << 4 | sum1;
}
Otherwise you are computing a different checksum, not Fletcher. sum1, for example, performs what I think is called a ones'-complement sum: a checksum (16-bit in the original, 4-bit in your case) where the carry bits are added back in. This kind of checksum, used in Internet protocols, makes it very easy to modify a packet without recomputing the checksum over the whole packet: you can add and subtract only the changes against the existing checksum.
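The incremental update mentioned above can be sketched as follows, using equation 3 of RFC 1624 (HC' = ~(~HC + ~m + m')) for the Internet-style complemented checksum. Function names are hypothetical, and the data is treated as an array of 16-bit words.

```c
#include <stddef.h>
#include <stdint.h>

/* Ones'-complement sum of 16-bit words, folded to 16 bits (not inverted). */
static uint16_t ones_sum(const uint16_t *w, size_t nwords)
{
    uint32_t sum = 0;
    while (nwords--)
        sum += *w++;
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/* Internet-style checksum: complement of the ones'-complement sum. */
uint16_t full_checksum(const uint16_t *w, size_t nwords)
{
    return (uint16_t)~ones_sum(w, nwords);
}

/* Incremental update after one word changes from old_w to new_w,
 * following RFC 1624, eqn. 3: HC' = ~(~HC + ~m + m'). */
uint16_t update_checksum(uint16_t hc, uint16_t old_w, uint16_t new_w)
{
    uint32_t sum = (uint16_t)~hc;
    sum += (uint16_t)~old_w;
    sum += new_w;
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}
```

update_checksum touches only the changed word, yet yields the same value as recomputing full_checksum over the modified data.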
The additional reduction step handles a corner case. If sum1 += *data produces 0x1F in a 4-bit scheme, adding the carry back in gives 0x01 + 0x0F = 0x10, and that new carry bit has to be added back in as well, so outside the loop 0x01 + 0x00 = 0x01. When the in-loop fold leaves no carry, the step outside the loop just adds zero. Depending on your architecture, something like if (sum1 & 0x10) sum1 = (sum1 & 0x0F) + 1; might execute faster than the shift-and-add, which may take more instructions.
What makes it more than just a checksum with the carry bits added back in is the last step, where the two sums are combined. If, for example, you only use the 32-bit Fletcher as a 16-bit checksum, you have wasted your time: the lower 16 bits of the result are just an ordinary checksum with the carry bit added back in, nothing special. sum2 is the interesting number, because it is an accumulation of sum1 checksums (sum1 accumulates the data; sum2 accumulates the running checksums).
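To see why sum2 matters, here is a sketch (hypothetical names) that returns both accumulators of a modulo-255 Fletcher-16 separately. Reordering the data leaves sum1 unchanged, because addition is commutative, while sum2 changes, because each byte is weighted by its position counted from the end of the data.

```c
#include <stddef.h>
#include <stdint.h>

struct fletcher_sums {
    uint8_t sum1;   /* plain running sum of the data, mod 255 */
    uint8_t sum2;   /* running sum of sum1 values, mod 255 */
};

/* Fletcher-16 (modulo-255 form), returning the two accumulators separately. */
struct fletcher_sums fletcher16_parts(const uint8_t *data, size_t n)
{
    struct fletcher_sums s;
    unsigned sum1 = 0, sum2 = 0;
    size_t i;

    for (i = 0; i < n; ++i) {
        sum1 = (sum1 + data[i]) % 255;
        sum2 = (sum2 + sum1) % 255;
    }
    s.sum1 = (uint8_t)sum1;
    s.sum2 = (uint8_t)sum2;
    return s;
}
```

For {1, 2, 3} and {3, 2, 1}, sum1 is 6 in both cases, but sum2 is 10 and 14 respectively.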
In the original version, sum1 and sum2 are 32-bit, which is why the bit shifting afterwards works. In your case you declare sum1 and sum2 as 8-bit, so that bit shifting makes no sense.
I am trying to build a TCP packet using the libnet library. I pass 0 as the checksum value to the libnet_build_tcp function so that the checksum is computed automatically. However, the checksum computation seems to ignore the pseudo-header, which results in a useless packet because of the checksum error. Does anyone know how to solve this?
As far as I can see in the code, as long as you haven't called libnet_toggle_checksum(l, ptag, 1), the 0 in the sum parameter of libnet_build_tcp() should make it autocompute a checksum for you by calling libnet_pblock_setflags(p, LIBNET_PBLOCK_DO_CHECKSUM).
I can't really tell you how to solve that, but since you are the one building the packet, maybe you could compute the checksum yourself?
The TCP pseudo-header is a 12-byte field containing the source and destination IP addresses (4 bytes each), a reserved byte that is always zero, a 1-byte protocol field (6 for TCP), and a 2-byte TCP segment length, which equals the size of the TCP header plus the payload length.
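Per RFC 793, those 12 bytes break down as 4 + 4 + 1 + 1 + 2. A sketch of filling them for IPv4 (build_tcp_pseudo_header is a hypothetical helper; the addresses are passed as 4-byte arrays already in network order):

```c
#include <stdint.h>
#include <string.h>

/* Fill a 12-byte IPv4 TCP pseudo-header in network byte order:
 * source address, destination address, zero byte, protocol, TCP length. */
void build_tcp_pseudo_header(uint8_t out[12],
                             const uint8_t src_ip[4],
                             const uint8_t dst_ip[4],
                             uint16_t tcp_len)
{
    memcpy(out, src_ip, 4);
    memcpy(out + 4, dst_ip, 4);
    out[8] = 0;                             /* reserved, always zero */
    out[9] = 6;                             /* protocol number: 6 = TCP */
    out[10] = (uint8_t)(tcp_len >> 8);      /* TCP header + payload length, */
    out[11] = (uint8_t)(tcp_len & 0xff);    /* big-endian */
}
```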
You could create something like this:
First create your variables:
int i;
int bytesRead;
int pseudoHeaderLength;
int payloadLength;
int tcpHdrLength;
int tcpSegmentLength;
int ipHdrLength;
int headerLengths;
int ethIPHdrLengths;
struct ethhdr *ethHdr;
struct iphdr *ipHdr;
struct tcphdr *tcpHdr;
u_int8_t *pkt_data;
u_int8_t pseudoHeader[12];
u_int8_t packetBuffer[1600];
u_int16_t newChecksum;
Then write the actual code, where b->len is the total size of the packet (the sum of all the headers plus the payload); you just have to memcpy your headers and data into pkt_data:
ethHdr = (struct ethhdr*)pkt_data;
ipHdr = (struct iphdr*)(pkt_data + ETH_HLEN);
ipHdrLength = ipHdr->ihl * 4;
ethIPHdrLengths = ETH_HLEN + ipHdrLength;
tcpHdr = (struct tcphdr*)(pkt_data + ethIPHdrLengths);
tcpHdrLength = tcpHdr->doff * 4;
headerLengths = ethIPHdrLengths + tcpHdrLength;
payloadLength = b->len - headerLengths;
tcpSegmentLength = tcpHdrLength + payloadLength;
pseudoHeaderLength = tcpSegmentLength + 12;

/* Pseudo-header: source IP (frame bytes 26-29), destination IP
   (frame bytes 30-33), zero, protocol, TCP length.
   These offsets assume a plain 14-byte Ethernet header. */
pseudoHeader[0] = pkt_data[26];
pseudoHeader[1] = pkt_data[27];
pseudoHeader[2] = pkt_data[28];
pseudoHeader[3] = pkt_data[29];
pseudoHeader[4] = pkt_data[30];
pseudoHeader[5] = pkt_data[31];
pseudoHeader[6] = pkt_data[32];
pseudoHeader[7] = pkt_data[33];
pseudoHeader[8] = 0x00;                              /* reserved */
pseudoHeader[9] = 0x06;                              /* protocol: TCP */
pseudoHeader[10] = (tcpSegmentLength >> 8) & 0xFF;
pseudoHeader[11] = tcpSegmentLength & 0xFF;

/* The checksum field must be zero while the checksum is computed. */
tcpHdr->check = 0;

bytesRead = 0;
for (i = 0; i < 12; i++) {
    packetBuffer[bytesRead] = pseudoHeader[i];
    bytesRead++;
}
for (i = ethIPHdrLengths; i < headerLengths; i++) {
    packetBuffer[bytesRead] = pkt_data[i];
    bytesRead++;
}
for (i = b->len - payloadLength; i < b->len; i++) {
    packetBuffer[bytesRead] = pkt_data[i];
    bytesRead++;
}
newChecksum = checksum((uint16_t *)packetBuffer, pseudoHeaderLength);

/* Store as-is: summing network-order bytes as host words yields a result
   that is already in the right byte order when written back (RFC 1071). */
tcpHdr->check = newChecksum;
Then use the checksum function from RFC 1071 (https://www.rfc-editor.org/rfc/rfc1071) to calculate the checksum over the buffer:
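uint16_t checksum(uint16_t *addr, int len)
{
    int count = len;
    register uint32_t sum = 0;
    uint16_t answer = 0;

    /* Sum up 16-bit words */
    while (count > 1) {
        sum += *(addr++);
        count -= 2;
    }
    /* Add the left-over byte, if any, padded with a zero byte.
       Copying it into the first byte of a zeroed word keeps this
       correct on both little- and big-endian hosts. */
    if (count > 0) {
        uint16_t last = 0;
        *(uint8_t *)&last = *(uint8_t *)addr;
        sum += last;
    }
    /* Fold the 32-bit sum to 16 bits */
    while (sum >> 16) {
        sum = (sum & 0xffff) + (sum >> 16);
    }
    answer = ~sum;
    return (answer);
}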
I use this in a realistic environment to calculate checksums at 5 million PPS.