How do you compute the XOR Remainder used in CRC?

I'm trying to remember how the math is worked out to compute the remainder of the XOR division used in Cyclic Redundancy Checks to verify the remainder bits of a network message.
I shouldn't have tossed that text book.
This is easily done in code, but how is it worked out by hand?
I know it looks something like a standard division algorithm, but I can't remember where to go from there to get the remainder.
___________
1010 | 101101000
Note: I did google it, but wasn't able to find a place where they mapped the steps in figuring the remainder.

1010 | 101101000
       1010
       0001        this result is 1011 XOR 1010 = 0001
          1010
          1010
          0000     thus no remainder.
Thus 101101000 is perfect and no error has occurred in transmission/reception
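As a rough sketch of the same hand calculation in C: line the divisor up under the leading 1 of whatever is left of the dividend, XOR, and repeat until what is left is shorter than the divisor. The names gf2_degree and gf2_mod are made up for this illustration, and the bit strings are assumed to fit in 64 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Position of the highest set bit (the "degree"), or -1 for zero. */
static int gf2_degree(uint64_t v) {
    int d = -1;
    while (v != 0) {
        v >>= 1;
        d++;
    }
    return d;
}

/* Remainder of XOR (carry-less) long division of dividend by divisor.
   Hypothetical helper; assumes both values fit in 64 bits. */
uint64_t gf2_mod(uint64_t dividend, uint64_t divisor) {
    assert(divisor != 0);
    int divisor_degree = gf2_degree(divisor);
    int d;
    while ((d = gf2_degree(dividend)) >= divisor_degree) {
        /* Align the divisor under the leading 1 and XOR it away. */
        dividend ^= divisor << (d - divisor_degree);
    }
    return dividend;
}
```

Under these assumptions, gf2_mod(0x168, 0xA), i.e. 101101000 divided by 1010, leaves 0, which matches the hand calculation.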

In my experience it's easier to convert it to a polynomial when calculating by hand, especially when there're a lot of zeroes.
1010 = 1*x^3 + 0*x^2 + 1*x^1 + 0*x^0 = x^3 + x = x3 + x
101101000 = x8 + x6 + x5 + x3
         -------------------
x3 + x ) x8 + x6 + x5 + x3
Then you divide the largest term in the dividend (x^8) by the first term in the divisor (x^3), resulting in x^5. You put that term on top and then multiply it by each term in the divisor. This yields the following for the first iteration:
         x5
         -------------------
x3 + x ) x8 + x6 + x5 + x3
         x8 + x6
Doing XOR for each term then yields the new dividend: x5 + x3:
         x5
         -------------------
x3 + x ) x8 + x6 + x5 + x3
         x8 + x6
         -------------------
                   x5 + x3
Follow the same pattern until the dividend's largest term is smaller than the divisor's largest term. After the calculations are complete it will look like this:
         x5 + x2
         -------------------
x3 + x ) x8 + x6 + x5 + x3
         x8 + x6
         -------------------
                   x5 + x3
                   x5 + x3
                   -------------------
                   0
The remainder in this case is 0, which indicates that most likely no errors have occurred during the transmission.
Note: I've shortened x^y as xy in the example above to reduce the clutter in the answer, since SO doesn't support math equation formatting.
Note2: Adding/subtracting a multiple of the divisor to/from the dividend will also give the remainder 0, since (P(x) + a*C(x)) / C(x) = P(x)/C(x) + a*C(x)/C(x) gives the same remainder as P(x)/C(x), because the remainder of a*C(x)/C(x) is 0.
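If you also want the term that goes "on top", here is a minimal sketch that records the quotient while dividing. Bit i of an integer stands for x^i, so x8 + x6 + x5 + x3 is 0x168 and x3 + x is 0xA. The struct and function names are invented for this example and are not from any library.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t quotient;
    uint32_t remainder;
} Gf2DivResult;

/* Degree of the polynomial, or -1 for the zero polynomial. */
static int poly_degree(uint32_t p) {
    int d = -1;
    while (p != 0) {
        p >>= 1;
        d++;
    }
    return d;
}

/* Polynomial long division with coefficients mod 2 (subtraction is XOR). */
Gf2DivResult gf2_divmod(uint32_t dividend, uint32_t divisor) {
    assert(divisor != 0);
    Gf2DivResult r = {0, dividend};
    int divisor_degree = poly_degree(divisor);
    int d;
    while ((d = poly_degree(r.remainder)) >= divisor_degree) {
        int shift = d - divisor_degree;       /* exponent of the new quotient term */
        r.quotient |= (uint32_t)1 << shift;   /* write it on top */
        r.remainder ^= divisor << shift;      /* XOR away (term * divisor) */
    }
    return r;
}
```

For the example above this should give the quotient 0x24 (x5 + x2) and remainder 0. XORing any shifted copy of the divisor into the dividend leaves the remainder unchanged, as Note2 says.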

It is long division by binary 11. There is an example on Wikipedia.

Let's assume that we want to divide 101110000 by 1001.
101110000
1001
--------- XOR the 1011 and 1001
0010
Now we will remove the zeros at the beginning of our XOR result, which is 0010, and slide numbers down from the top.
101110000
1001
---------
  1010
Continue by XORing 1001 with the result.
101110000
1001
---------
  1010
  1001
---------
  0011
--------- Remove zeros at the beginning
    1100
    1001
---------
    0101
--------- Remove zeros at the beginning
     1010
     1001
---------
     0011
Answer is 0011.

Related

Bitwise carry applications

Call me naive but in this area I always struggled. So I was just browsing through the code for adding two numbers without + operator and bumped into this code:
int Add(int x, int y)
{
    // Iterate till there is no carry
    while (y != 0)
    {
        // carry now contains common set bits of x and y
        int carry = x & y;
        // Sum of bits of x and y where at least one of the bits is not set
        x = x ^ y;
        // Carry is shifted by one so that adding it to x gives the
        // required sum
        y = carry << 1;
    }
    return x;
}
Now I understand, how he is calculating the carry but why y!=0 and how this code is achieving the result for adding two numbers?
Basics first. Exclusive or'ing two bits is the same as the bottom digit of their sum. And'ing two bits is the same as the top bit of their sum.
A | B | A&B | A^B | A+B
-----------------------
0 | 0 | 0 | 0 | 00
0 | 1 | 0 | 1 | 01
1 | 0 | 0 | 1 | 01
1 | 1 | 1 | 0 | 10
As you can see the exclusive-or result is the same as the last digit of the sum. You can also see that the first digit of the sum is only 1 when A is 1 and B is 1.
[If you have a circuit with two inputs and two outputs, one of which is the exclusive or of the inputs and the other is the and of the inputs, it is called a half adder - because there is no facility to also input a carry (from a previous digit).]
So, to sum two bits, you calculate the XOR to get the lowest digit of the result and the AND to get the highest digit of the result.
For each individual pair of bits in a pair of numbers, I can calculate the sum of those two bits by doing both an XOR and an AND. Using four bit numbers, for example 3 and 5
3 0011
5 0101
------
0110 3^5 = 6 (low bit)
0001 3&5 = 1 (high bit)
In order to treat the 3 and 5 as single numbers rather than collections of four bits, each of those high bits needs to be treated as a carry and added to the next low bit to the left. We can do this by shifting the 3&5 left 1 bit and adding to the 3^5 which we do by repeating the two operations
6 0110
1<<1 0010
----
0100 6^(1<<1) = 4
0010 6&(1<<1) = 2
Unfortunately, one of the additions resulted in another carry being generated. So we can just repeat the operation.
4 0100
2<<1 0100
----
0000 4^(2<<1) = 0
0100 4&(2<<1) = 4
We still have a carry, so round we go again.
0 0000
4<<1 1000
----
1000 4^(4<<1) = 8
0000 4&(4<<1) = 0
This time, all the carries are 0 so more iterations are not going to change anything. We've finished.
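To watch those rounds happen, here is a small sketch of the same loop that also counts how many times it goes round. It uses unsigned operands, which is my choice rather than the posted code's: left-shifting a negative int is undefined in C, while unsigned arithmetic simply wraps. The name add_without_plus and the rounds parameter are made up for the illustration, and the counter is bookkeeping only.

```c
#include <assert.h>
#include <stddef.h>

/* Adds x and y using only AND, XOR and shift; *rounds receives the
   number of loop iterations. Hypothetical helper for illustration. */
unsigned add_without_plus(unsigned x, unsigned y, int *rounds) {
    assert(rounds != NULL);
    int n = 0;
    while (y != 0) {
        unsigned carry = x & y;  /* positions where both bits are 1 */
        x = x ^ y;               /* per-bit sum, ignoring carries */
        y = carry << 1;          /* carries move one place to the left */
        n++;                     /* bookkeeping only, not part of the sum */
    }
    *rounds = n;
    return x;
}
```

For 3 and 5 this takes four rounds (6 and 2, then 4 and 4, then 0 and 8, then 8 and 0), the same sequence as the tables above.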
I will try to explain it with a simple 3-bit example (you can skip this example and go straight to the actual explanation, which starts at "Now to the way we achieve the same flow from the posted code").
Let's say we want to add x=0b011 and y=0b101. First we add the least significant bits: 1+1 = 0b10
carry: x10
x: 011
+
y: 101
-----
xx0
Then we add the second bits (and by the book we need to add also the carry from the previous stage but we can also skip it for later stage): 1+0 = 0b1
carry: 010
x: 011
+
y: 101
-----
x10
Do the same for the third bit: 0+1 = 0b1
carry: 010
x: 011
+
y: 101
-----
110
So now we have carry = 0b010 and some partial result 0b110.
Remember my comment earlier that we take care of carry at some later stage? So now is this "later stage". Now we add the carry to the partial result we got (note that it is the same if we added the carry for each bit separately at the earlier stages). LSB bits addition:
NEW carry: x00
carry: 010
+
part. res.: 110
-----
xx0
Second bits addition:
NEW carry: 100
carry: 010
+
part. res.: 110
-----
x00
Third bit addition:
NEW carry: 100
carry: 010
+
part. res.: 110
-----
new part. res. 100
Now carry = NEW carry, part. res. = new part. res. and we do the same iteration once again.
For LSB
NEW carry: x00
carry: 100
+
part. res.: 100
-----
xx0
For the second bits:
NEW carry: 000
carry: 100
+
part. res.: 100
-----
x00
Third bits:
NEW carry: 1000 --> 000 since we are working with 3 bits only
carry: 100
+
part. res.: 100
-----
000
Now NEW carry is 0 so we have finished the calculation. The final result is 0b000 (overflow).
I am sure I haven't discovered anything new so far. Now to the way we achieve the same flow from the posted code:
The partial result is the result without the carry, which means when x and y have different bits at the same position, the sum of these bits will be 1. If the bits are identical, the result will be 0 (1+1 => 0, carry 1 and 0+0 => 0, carry 0).
Thus partial result is x ^ y (see the properties of the XOR operation). In the posted code it is x = x ^ y;.
Now let's look at the carry. We will get carry from a single bit addition only if both bits are 1. So the bits which will set the carry bits to 1 are marked as 1 in the following expression: x & y (only the set bits at the same position will remain 1). But the carry should be added to the next (more significant) bit! Thus
carry = (x & y) << 1; // in the posted code it is y = carry << 1
And the iterations are repeated until the carry is 0 (like in our example).
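To reproduce the 3-bit walk-through exactly, the carry has to be cut off at the register width. A minimal sketch, assuming a hypothetical width parameter and mask (neither is in the posted code):

```c
#include <assert.h>
#include <stdint.h>

/* Adds x and y as width-bit numbers; a carry out of the top bit is dropped.
   Illustrative helper only. */
uint32_t add_nbits(uint32_t x, uint32_t y, unsigned width) {
    assert(width >= 1 && width <= 31);
    uint32_t mask = ((uint32_t)1 << width) - 1u;
    x &= mask;
    y &= mask;
    while (y != 0) {
        uint32_t carry = x & y;   /* bits that generate a carry */
        x ^= y;                   /* partial result */
        y = (carry << 1) & mask;  /* carry moves left, overflow is discarded */
    }
    return x;
}
```

With width 3, adding 0b011 and 0b101 goes through the partial results 110, 100 and 000, ending at 0 (overflow) as in the example. With width 4 the same inputs give 8.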

Matlab: how to interpolate non-monotonic data

I have a question regarding the interpolation of some vectors which are not monotonic.
The data vectors look as follows:
x = x1 = y =
20.0000 21.6000 32
21.8000 19.8000 132
22.2000 18.0000 193
21.4000 16.6000 351
20.2000 17.0000 576
20.6000 16.0000 649
20.3000 13.4000 686
19.4000 12.2000 806
16.9000 11.4000 1117
15.8000 11.2000 1252
15.6000 10.9000 1281
15.3000 9.7000 1379
14.8000 9.2000 1527
14.5000 8.7000 1577
12.4000 7.2000 1943
11.8000 5.0000 2047
10.4000 3.0000 2282
5.3000 2.1000 2840
3.5000 2.0000 3047
2.6000 1.8000 3140
(small part)
I would like to obtain 'y1' as an interpolation of these using:
y1 = interp1(x,y,x1);
but x and x1 are not monotonic.
y1 should be as long as y.
Do you have an idea of how to perform the interpolation?
Sort both x and y so that x is monotonic, then sort x1 and use it as presented.
See if the code below helps.
[new_x,indx]=sort(x);
new_y=y(indx);
new_x1=sort(x1);
%solves duplicate entries through the unique function (1) or average entries (2)
[temp_new_x,indx]=unique(new_x);
% (1) discard all repeated x values except the last one
new_y=new_y(indx);
new_x=temp_new_x;
% (2) average repeated entries
new_y = arrayfun(@(C) mean(new_y(C==new_x)),temp_new_x);
new_x=temp_new_x;
y1=interp1(new_x,new_y,new_x1);
Sort the x data and re-index the y data using the results, then interpolate:
[sortedX, IX] = sort(x);
y1 = interp1(sortedX, y(IX), x1);
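For readers outside MATLAB, the same idea (keep each y attached to its x while sorting, then interpolate linearly) can be sketched in C. This is only an illustration: Sample, sort_samples and interp_linear are invented names, the x values are assumed to be distinct, and returning NaN outside the data range is meant to mimic interp1's default behaviour for linear interpolation.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct {
    double x;
    double y;
} Sample;

static int compare_by_x(const void *a, const void *b) {
    double xa = ((const Sample *)a)->x;
    double xb = ((const Sample *)b)->x;
    return (xa > xb) - (xa < xb);
}

/* Sort the pairs by x so every y stays with its own x. */
void sort_samples(Sample *s, size_t n) {
    qsort(s, n, sizeof s[0], compare_by_x);
}

/* Linear interpolation at xq over samples sorted by strictly increasing x.
   Returns NAN when xq lies outside the sampled range. */
double interp_linear(const Sample *s, size_t n, double xq) {
    assert(n >= 2);
    if (xq < s[0].x || xq > s[n - 1].x) {
        return NAN;
    }
    size_t i = 0;
    while (i + 2 < n && xq > s[i + 1].x) {
        i++;  /* advance to the segment [s[i].x, s[i+1].x] containing xq */
    }
    double t = (xq - s[i].x) / (s[i + 1].x - s[i].x);
    return s[i].y + t * (s[i + 1].y - s[i].y);
}
```

Calling interp_linear once per element of x1 gives a y1 of the same length as x1, without having to sort x1 itself.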

How to find out the intersection of two coplanar lines in C

I have two 3D lines which lie on the same plane. line1 is defined by a point (x1, y1, z1) and its direction vector (a1, b1, c1) while line2 is defined by a point (x2, y2, z2) and its direction vector (a2, b2, c2). Then the parametric equations for both lines are
x = x1 + a1*t; x = x2 + a2*s;
y = y1 + b1*t; y = y2 + b2*s;
z = z1 + c1*t; z = z2 + c2*s;
If both direction vectors are nonzeros, we can find out the location of intersection node easily by equating the right-hand-side of the equations above and solving t and s from either two of the three. However, it's possible that a1 b1 c1 a2 b2 c2 are not all nonzero so that I can't solve those equations in the same way. My current thought is to deal with this issue case by case, like
case1: a1 = 0, others are nonzero
case2: a2 = 0, others are nonzero
case3: b1 = 0, others are nonzero
...
However, there are so many cases in total and the implementation would become messy. Are there any good ways to tackle this problem? Any references? Thanks a lot!
It is much more practical to see this as a vector equation. A dot . is a scalar product and A,n,B,m are vectors describing the lines. Point A is on the first line of direction n. Directions are normalized : n.n=1 and m.m=1. The point of intersection C is such that :
C=A+nt=B+ms
where t and s are scalar parameters to be computed.
Therefore (.n) :
A.n+ t=B.n+m.n s
t= (B-A).n+m.n s
And (.m):
A.m+n.m t=B.m+ s
A.m+n.m (B-A).n+(m.n)^2 s=B.m+ s
n.m(B-A).n+(A-B).m=(1-(m.n)^2).s
Since n.n=m.m=1 and n and m are not aligned, (m.n)^2<1 :
s=[n.m(B-A).n+(A-B).m]/[1-(m.n)^2]
t= (B-A).n+m.n s
You can solve this as a linear system:
| 1 0 0 -a1 0 | | x | | x1 |
| 0 1 0 -b1 0 | | y | | y1 |
| 0 0 1 -c1 0 | | z | = | z1 |
| 1 0 0 0 -a2 | | s | | x2 |
| 0 1 0 0 -b2 | | t | | y2 |
| 0 0 1 0 -c2 | | z2 |
x y z is the intersection point, and s t are the coefficients of the vectors. This solves the same equation that @francis wrote, with the advantage that it also obtains the solution that minimizes the error in case your data are not perfect.
This equation is usually expressed as Ax=b, and can be solved by doing x = A^(-1) * b, where A^(-1) is the pseudo-inverse of A. All the linear algebra libraries implement some function to solve systems like this, so don't worry.
It might be vital to remember that calculations are never exact, and small deviations in your constants and calculations can make your lines not exactly intersect.
Therefore, let's solve a more general problem - find the values of t and s for which the distance between the corresponding points in the lines is minimal. This is clearly a task for calculus, and it's easy (because linear functions are the easiest ones in calculus).
So the points are
[xyz1]+[abc1]*t
and
[xyz2]+[abc2]*s
(here [xyz1] is a 3-vector [x1, y1, z1] and so on)
The (square of) the distance between them:
([abc1]*t - [abc2]*s + [xyz1]-[xyz2])^2
(here ^2 is a scalar product of a 3-vector with itself)
Let's find a derivative of this with respect to t:
[abc1] * ([abc1]*t - [abc2]*s + [xyz1]-[xyz2]) (multiplied by 2, but this doesn't matter)
(here the first * is a scalar product, and the other *s are regular multiplications between a vector and a number)
The derivative should be equal to zero at the minimum point:
[abc1] * ([abc1]*t - [abc2]*s + [xyz1]-[xyz2]) = 0
Let's use the derivative with respect to s too - we want it to be zero too.
[abc1]*[abc1]*t - [abc1]*[abc2]*s = -[abc1]*([xyz1]-[xyz2])
-[abc2]*[abc1]*t + [abc2]*[abc2]*s = [abc2]*([xyz1]-[xyz2])
From here, let's find t and s.
Then, let's find the two points that correspond to these t and s. If all calculations were ideal, these points would coincide. However, at this point you are practically guaranteed to get some small deviations, so take any of these points as your result (intersection of the two lines).
It might be better to take the average of these points, to make the result symmetrical.
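A minimal C sketch of this approach: solve the two zero-derivative equations for t and s with Cramer's rule, then average the two closest points. The type and function names (Vec3, intersect_lines) and the 1e-12 relative tolerance for "nearly parallel" are assumptions made for the example, not something the method prescribes.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    double x, y, z;
} Vec3;

static double dot(Vec3 a, Vec3 b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

static Vec3 point_on_line(Vec3 p, Vec3 d, double k) {
    Vec3 r = {p.x + d.x * k, p.y + d.y * k, p.z + d.z * k};
    return r;
}

/* Lines are p1 + d1*t and p2 + d2*s. Writes the midpoint of the two closest
   points to *out and returns 1; returns 0 if the directions are (nearly)
   parallel or zero, where no unique answer exists. */
int intersect_lines(Vec3 p1, Vec3 d1, Vec3 p2, Vec3 d2, Vec3 *out) {
    assert(out != NULL);
    Vec3 w = {p1.x - p2.x, p1.y - p2.y, p1.z - p2.z};
    double a = dot(d1, d1), b = dot(d1, d2), c = dot(d2, d2);
    double det = a * c - b * b;            /* >= 0 by Cauchy-Schwarz */
    if (det <= 1e-12 * a * c) {
        return 0;
    }
    /*  a*t - b*s = -d1.w
       -b*t + c*s =  d2.w  */
    double t = (b * dot(d2, w) - c * dot(d1, w)) / det;
    double s = (a * dot(d2, w) - b * dot(d1, w)) / det;
    Vec3 q1 = point_on_line(p1, d1, t);
    Vec3 q2 = point_on_line(p2, d2, s);
    out->x = 0.5 * (q1.x + q2.x);
    out->y = 0.5 * (q1.y + q2.y);
    out->z = 0.5 * (q1.z + q2.z);
    return 1;
}
```

Nothing here divides by an individual component, so direction vectors with zero entries need no special cases. Only parallel (or zero-length) directions are rejected.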

Using __builtin_ia32_shufps to shift a vector by 32 bits?

This is the prototype for the function:
v4si __builtin_ia32_shufps (v4si, v4si, int)
On some websites I found they had put hex in the int field, and it looked like it separated high and low bits, but what I want is a logical 32-bit shift.
X3 X2 X1 X0 shifted by 32 bits to get X2 X1 X0 0
Another example using 2 v4si vectors:
X7 X6 X5 X4 | X3 X2 X1 X0, where each X is 32 bits, and what I want for a shift is the same as a logical shift, but with each vector element. So:
X7 X6 X5 X4 | X3 X2 X1 X0 << 2 = X5 X4 X3 X2 | X1 X0 0 0
Is shufps the right command to do this?
Looking at your example with two vectors I think what you're probably looking for is _mm_alignr_epi8 (PALIGNR). This works for any shift of a pair of vectors by an arbitrary number of bytes, so you would need to multiply the shift parameter by sizeof(int), e.g.
v = _mm_alignr_epi8(v0, v1, 2 * sizeof(int));
Note that this instruction is only available in SSSE3 and later, which means pretty much any Intel CPU since 2006 (Core 2).
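If you want something to check an intrinsic version against, here is a plain-C reference model of the lane shift the question describes. It is a scalar sketch, not SIMD code: the function name is made up, and lanes[0] is assumed to hold X0, the least significant element.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Shifts n 32-bit lanes "left" by k lanes: lane i receives lane i-k,
   and the k lowest lanes are filled with zero. Scalar reference only. */
void shift_lanes_left(uint32_t *lanes, size_t n, size_t k) {
    assert(lanes != NULL || n == 0);
    /* Walk from the top down so each source lane is read before it is overwritten. */
    for (size_t i = n; i-- > 0; ) {
        lanes[i] = (i >= k) ? lanes[i - k] : 0;
    }
}
```

With 8 lanes holding X7..X0 and k = 2 this produces X5 X4 X3 X2 | X1 X0 0 0, the layout asked for in the question. For a single vector, the SSE2 byte shift _mm_slli_si128(v, 4) gives the corresponding X2 X1 X0 0.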

How is a CRC32 checksum calculated?

Maybe I'm just not seeing it, but CRC32 seems either needlessly complicated, or insufficiently explained anywhere I could find on the web.
I understand that it is the remainder from a non-carry-based arithmetic division of the message value, divided by the (generator) polynomial, but the actual implementation of it escapes me.
I've read A Painless Guide To CRC Error Detection Algorithms, and I must say it was not painless. It goes over the theory rather well, but the author never gets to a simple "this is it." He does say what the parameters are for the standard CRC32 algorithm, but he neglects to lay out clearly how you get to it.
The part that gets me is when he says "this is it" and then adds on, "oh by the way, it can be reversed or started with different initial conditions," and doesn't give a clear answer of what the final way of calculating a CRC32 checksum given all of the changes he just added.
Is there a simpler explanation of how CRC32 is calculated?
I attempted to code in C how the table is formed:
for (i = 0; i < 256; i++)
{
    temp = i;
    for (j = 0; j < 8; j++)
    {
        if (temp & 1)
        {
            temp >>= 1;
            temp ^= 0xEDB88320;
        }
        else {temp >>= 1;}
    }
    testcrc[i] = temp;
}
but this seems to generate values inconsistent with values I have found elsewhere on the Internet. I could use the values I found online, but I want to understand how they were created.
Any help in clearing up these incredibly confusing numbers would be very appreciated.
The polynomial for CRC32 is:
x^32 + x^26 + x^23 + x^22 + x^16 + x^12 + x^11 + x^10 + x^8 + x^7 + x^5 + x^4 + x^2 + x + 1
Wikipedia
CRC calculation
Or in hex and binary:
0x 01 04 C1 1D B7
1 0000 0100 1100 0001 0001 1101 1011 0111
The highest term (x^32) is usually not explicitly written, so it can instead be represented in hex just as
0x 04 C1 1D B7
Feel free to count the 1s and 0s, but you'll find they match up with the polynomial, where 1 is bit 0 (or the first bit) and x is bit 1 (or the second bit).
Why this polynomial? Because there needs to be a standard given polynomial and the standard was set by IEEE 802.3. Also it is extremely difficult to find a polynomial that detects different bit errors effectively.
You can think of the CRC-32 as a series of "Binary Arithmetic with No Carries", or basically "XOR and shift operations". This is technically called Polynomial Arithmetic.
CRC primer, Chapter 5
To better understand it, think of this multiplication:
(x^3 + x^2 + x^0)(x^3 + x^1 + x^0)
= (x^6 + x^4 + x^3
+ x^5 + x^3 + x^2
+ x^3 + x^1 + x^0)
= x^6 + x^5 + x^4 + 3*x^3 + x^2 + x^1 + x^0
If we assume x is base 2 then we get:
x^7 + x^3 + x^2 + x^1 + x^0
CRC primer Chp.5
Why? Because 3x^3 is 11x^11 in binary (but we need only 1 or 0 per digit) so we carry over:
=1x^110 + 1x^101 + 1x^100 + 11x^11 + 1x^10 + 1x^1 + x^0
=1x^110 + 1x^101 + 1x^100 + 1x^100 + 1x^11 + 1x^10 + 1x^1 + x^0
=1x^110 + 1x^101 + 1x^101 + 1x^11 + 1x^10 + 1x^1 + x^0
=1x^110 + 1x^110 + 1x^11 + 1x^10 + 1x^1 + x^0
=1x^111 + 1x^11 + 1x^10 + 1x^1 + x^0
But mathematicians changed the rules so that it is mod 2. So basically any binary polynomial mod 2 is just addition without carry or XORs. So our original equation looks like:
=( 1x^110 + 1x^101 + 1x^100 + 11x^11 + 1x^10 + 1x^1 + x^0 ) MOD 2
=( 1x^110 + 1x^101 + 1x^100 + 1x^11 + 1x^10 + 1x^1 + x^0 )
= x^6 + x^5 + x^4 + x^3 + x^2 + x^1 + x^0 (the original product, with the coefficient 3 reduced mod 2 to 1)
I know this is a leap of faith but this is beyond my capability as a line-programmer. If you are a hard-core CS-student or engineer I challenge you to break this down. Everyone will benefit from this analysis.
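One way to see the difference between the two kinds of multiplication is to code the carry-less one: it is ordinary shift-and-add multiplication with each add replaced by XOR. A small sketch follows; the name gf2_mul and the 16-bit operand limit are choices made for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Multiplies two polynomials with coefficients mod 2. Bit i stands for x^i.
   Operands are limited to 16 bits so the product fits in 32 bits. */
uint32_t gf2_mul(uint32_t a, uint32_t b) {
    assert(a <= 0xFFFFu && b <= 0xFFFFu);
    uint32_t product = 0;
    for (int i = 0; i < 16; i++) {
        if ((b >> i) & 1u) {
            product ^= a << i;  /* "add" a * x^i without carrying */
        }
    }
    return product;
}
```

For the example above, 1101 times 1011 gives 1111111 (x^6 + x^5 + x^4 + x^3 + x^2 + x^1 + x^0) with this routine, whereas ordinary multiplication, 13 * 11 = 143, gives 10001111 (x^7 + x^3 + x^2 + x^1 + x^0).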
So to work out a full example:
Original message : 1101011011
Polynomial of (W)idth 4 : 10011
Message after appending W zeros : 11010110110000
Now we divide the augmented Message by the Poly using CRC arithmetic. This is the same division as before:
            1100001010 = Quotient (nobody cares about the quotient)
       _______________
10011 ) 11010110110000 = Augmented message (1101011011 + 0000)
=Poly   10011,,.,,....
        -----,,.,,....
         10011,.,,....
         10011,.,,....
         -----,.,,....
          00001.,,....
          00000.,,....
          -----.,,....
           00010,,....
           00000,,....
           -----,,....
            00101,....
            00000,....
            -----,....
             01011....
             00000....
             -----....
              10110...
              10011...
              -----...
               01010..
               00000..
               -----..
                10100.
                10011.
                -----.
                 01110
                 00000
                 -----
                  1110 = Remainder = THE CHECKSUM!!!!
The division yields a quotient, which we throw away, and a remainder, which is the calculated checksum. This ends the calculation. Usually, the checksum is then appended to the message and the result transmitted. In this case the transmission would be: 11010110111110.
CRC primer, Chapter 7
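The division above can be sketched in C as a shift register that takes the message one bit at a time, followed by W zero bits. This is an illustration under my own assumptions: the message is passed as a string of '0'/'1' characters, the polynomial is passed with its top bit included (10011 = 0x13), and the function name is made up.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* CRC of a bit string by plain "CRC division": feed the message bits and
   then `width` zero bits through a width-bit register. No reflection,
   no initial value, no final XOR. */
uint32_t crc_bit_string(const char *bits, uint32_t poly, unsigned width) {
    assert(bits != NULL && width >= 1 && width <= 31);
    uint32_t mask = ((uint32_t)1 << width) - 1u;
    uint32_t reg = 0;
    size_t n = strlen(bits);
    for (size_t i = 0; i < n + width; i++) {
        uint32_t in = (i < n && bits[i] == '1') ? 1u : 0u;
        uint32_t top = (reg >> (width - 1)) & 1u;  /* bit about to be shifted out */
        reg = ((reg << 1) | in) & mask;
        if (top) {
            reg ^= poly & mask;  /* subtract (XOR) the polynomial */
        }
    }
    return reg;
}
```

For the message 1101011011 and polynomial 10011 this should return 1110. Running the transmitted 11010110111110 through the same routine should return 0, which is how a receiver checks it.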
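For CRC32, do the same thing, only with the 33-bit CRC-32 polynomial (0x104C11DB7, usually written as the 32-bit value 0x04C11DB7 because the top bit is implied) as your divisor and your entire stream as your dividend. Throw out the quotient and keep the 32-bit remainder. Tack the remainder on the end of your message and you have a CRC32.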
Average guy review:
QUOTIENT
----------
DIVISOR ) DIVIDEND
= REMAINDER
1. Take the first 32 bits.
2. Shift bits.
3. If 32 bits are less than DIVISOR, go to step 2.
4. XOR 32 bits by DIVISOR. Go to step 2.
(Note that the stream has to be divisible by 32 bits or it should be padded. For example, an 8-bit ANSI stream would have to be padded. Also at the end of the stream, the division is halted.)
For IEEE802.3, CRC-32. Think of the entire message as a serial bit stream, and append 32 zeros to the end of the message. Next, you MUST reverse the bits of EVERY byte of the message and take the 1's complement of the first 32 bits. Now divide by the CRC-32 polynomial, 0x104C11DB7. Finally, you must 1's complement the 32-bit remainder of this division and bit-reverse each of the 4 bytes of the remainder. This becomes the 32-bit CRC that is appended to the end of the message.
The reason for this strange procedure is that the first Ethernet implementations would serialize the message one byte at a time and transmit the least significant bit of every byte first. The serial bit stream then went through a serial CRC-32 shift register computation, which was simply complemented and sent out on the wire after the message was completed. The reason for complementing the first 32 bits of the message is so that you don't get an all zero CRC even if the message was all zeros.
I published a tutorial on CRC-32 hashes, here:
CRC-32 hash tutorial - AutoHotkey Community
In this example from it, I demonstrate how to calculate the CRC-32 hash for the 'ANSI' (1 byte per character) string 'abc':
calculate the CRC-32 hash for the 'ANSI' string 'abc':
inputs:
dividend: binary for 'abc': 0b011000010110001001100011 = 0x616263
polynomial: 0b100000100110000010001110110110111 = 0x104C11DB7
start with the 3 bytes 'abc':
61 62 63 (as hex)
01100001 01100010 01100011 (as bin)
reverse the bits in each byte:
10000110 01000110 11000110
append 32 0 bits:
10000110010001101100011000000000000000000000000000000000
XOR (exclusive or) the first 4 bytes with 0xFFFFFFFF:
(i.e. flip the first 32 bits:)
01111001101110010011100111111111000000000000000000000000
next we will perform 'CRC division':
a simple description of 'CRC division':
we put a 33-bit box around the start of a binary number,
start of process:
if the first bit is 1, we XOR the number with the polynomial,
if the first bit is 0, we do nothing,
we then move the 33-bit box right by 1 bit,
if we have reached the end of the number,
then the 33-bit box contains the 'remainder',
otherwise we go back to 'start of process'
note: every time we perform a XOR, the number begins with a 1 bit,
and the polynomial always begins with a 1 bit,
1 XORed with 1 gives 0, so the resulting number will always begin with a 0 bit
'CRC division':
'divide' by the polynomial 0x104C11DB7:
01111001101110010011100111111111000000000000000000000000
100000100110000010001110110110111
---------------------------------
111000100010010111111010010010110
100000100110000010001110110110111
---------------------------------
110000001000101011101001001000010
100000100110000010001110110110111
---------------------------------
100001011101010011001111111101010
100000100110000010001110110110111
---------------------------------
111101101000100000100101110100000
100000100110000010001110110110111
---------------------------------
111010011101000101010110000101110
100000100110000010001110110110111
---------------------------------
110101110110001110110001100110010
100000100110000010001110110110111
---------------------------------
101010100000011001111110100001010
100000100110000010001110110110111
---------------------------------
101000011001101111000001011110100
100000100110000010001110110110111
---------------------------------
100011111110110100111110100001100
100000100110000010001110110110111
---------------------------------
110110001101101100000101110110000
100000100110000010001110110110111
---------------------------------
101101010111011100010110000001110
100000100110000010001110110110111
---------------------------------
110111000101111001100011011100100
100000100110000010001110110110111
---------------------------------
10111100011111011101101101010011
we obtain the 32-bit remainder:
0b10111100011111011101101101010011 = 0xBC7DDB53
note: the remainder is a 32-bit number, it may start with a 1 bit or a 0 bit
XOR the remainder with 0xFFFFFFFF:
(i.e. flip the 32 bits:)
0b01000011100000100010010010101100 = 0x438224AC
reverse bits:
bit-reverse the 4 bytes (32 bits), treating them as one entity:
(e.g. 'abcdefgh ijklmnop qrstuvwx yzABCDEF'
to 'FEDCBAzy xwvutsrq ponmlkji hgfedcba':)
0b00110101001001000100000111000010 = 0x352441C2
thus the CRC-32 hash for the 'ANSI' string 'abc' is: 0x352441C2
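The steps of this worked example can be sketched in C roughly as follows. It is an unoptimized, illustrative transcription of the bit reversal, padding, complementing and 'CRC division' described above, not a production routine. The helper names (reverse8, reverse32, crc32_by_division) are made up for the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static uint8_t reverse8(uint8_t b) {
    uint8_t r = 0;
    for (int i = 0; i < 8; i++) {
        r = (uint8_t)((r << 1) | (b & 1u));
        b >>= 1;
    }
    return r;
}

static uint32_t reverse32(uint32_t v) {
    uint32_t r = 0;
    for (int i = 0; i < 32; i++) {
        r = (r << 1) | (v & 1u);
        v >>= 1;
    }
    return r;
}

/* CRC-32 computed literally as a remainder, one bit at a time. */
uint32_t crc32_by_division(const uint8_t *msg, size_t len) {
    assert(msg != NULL || len == 0);
    uint32_t reg = 0;
    /* Message bytes (bit-reversed) followed by 32 zero bits of padding. */
    for (size_t i = 0; i < len + 4; i++) {
        uint8_t byte = (i < len) ? reverse8(msg[i]) : 0;
        if (i < 4) {
            byte ^= 0xFF;  /* flip the first 32 bits of the padded stream */
        }
        for (int bit = 7; bit >= 0; bit--) {
            uint32_t top = reg & 0x80000000u;       /* leading bit of the 33-bit box */
            reg = (reg << 1) | ((byte >> bit) & 1u);
            if (top) {
                reg ^= 0x04C11DB7u;  /* XOR with the polynomial, top bit implied */
            }
        }
    }
    /* Flip the remainder, then reverse all 32 bits as one entity. */
    return reverse32(reg ^ 0xFFFFFFFFu);
}
```

For the 3 bytes 'abc' this should return 0x352441C2, the value derived above.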
A CRC is pretty simple; you take a polynomial represented as bits and the data, and divide the polynomial into the data (or you represent the data as a polynomial and do the same thing). The remainder, which is between 0 and the polynomial is the CRC. Your code is a bit hard to understand, partly because it's incomplete: temp and testcrc are not declared, so it's unclear what's being indexed, and how much data is running through the algorithm.
The way to understand CRCs is to try to compute a few using a short piece of data (16 bits or so) with a short polynomial -- 4 bits, perhaps. If you practice this way, you'll really understand how you might go about coding it.
If you're doing it frequently, a CRC is quite slow to compute in software. Hardware computation is much more efficient, and requires just a few gates.
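To make the table loop from the question self-contained, here is a minimal sketch with the missing declarations filled in, assuming temp and the table entries are unsigned 32-bit values. The lookup loop and the names crc32_init_table and crc32_table_driven are additions for illustration. 0xEDB88320 is the bit-reversed form of 0x04C11DB7, so this is the reflected (least-significant-bit-first) formulation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static uint32_t crc_table[256];

/* Same loop as in the question, with explicit unsigned 32-bit types. */
void crc32_init_table(void) {
    for (uint32_t i = 0; i < 256; i++) {
        uint32_t temp = i;
        for (int j = 0; j < 8; j++) {
            if (temp & 1u) {
                temp = (temp >> 1) ^ 0xEDB88320u;
            } else {
                temp >>= 1;
            }
        }
        crc_table[i] = temp;
    }
}

/* One table lookup per input byte; crc32_init_table() must be called first. */
uint32_t crc32_table_driven(const uint8_t *data, size_t len) {
    assert(data != NULL || len == 0);
    uint32_t crc = 0xFFFFFFFFu;                    /* initial value */
    for (size_t i = 0; i < len; i++) {
        crc = (crc >> 8) ^ crc_table[(crc ^ data[i]) & 0xFFu];
    }
    return crc ^ 0xFFFFFFFFu;                      /* final complement */
}
```

With unsigned 32-bit types the loop should produce the commonly published table, starting 0x00000000, 0x77073096. If your values differ, signed or wider types for temp, or a comparison against a table for the non-reflected polynomial, are the first things I would check.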
In addition to the Wikipedia Cyclic redundancy check and Computation of CRC articles, I found a paper entitled Reversing CRC - Theory and Practice* to be a good reference.
There are essentially three approaches for computing a CRC: an algebraic approach, a bit-oriented approach, and a table-driven approach. In Reversing CRC - Theory and Practice*, each of these three algorithms/approaches is explained in theory accompanied in the APPENDIX by an implementation for the CRC32 in the C programming language.
* PDF Link
Reversing CRC – Theory and Practice.
HU Berlin Public Report
SAR-PR-2006-05
May 2006
Authors:
Martin Stigge, Henryk Plötz, Wolf Müller, Jens-Peter Redlich
Then there is always Rosetta Code, which shows crc32 implemented in dozens of computer languages. https://rosettacode.org/wiki/CRC-32 and has links to many explanations and implementations.
In order to reduce crc32 to taking the remainder you need to:
1. Reverse the bit order within each byte
2. XOR the first four bytes with 0xFF (this is to avoid errors on the leading 0s)
3. Add padding at the end (this is to make the last 4 bytes take part in the hash)
4. Compute the remainder
5. Reverse the bits again
6. XOR the result again.
In code this is:
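import "math/bits"

func CRC32(file []byte) uint32 {
	// Work on a padded copy so the caller's slice is left untouched.
	// The 4 zero bytes of padding let the last 4 message bytes take part in the division.
	buf := make([]byte, len(file)+4)
	for i, v := range file {
		buf[i] = bits.Reverse8(v)
	}
	// Complement the first 32 bits. Doing this after padding means
	// inputs shorter than 4 bytes work too.
	for i := 0; i < 4; i++ {
		buf[i] ^= 0xFF
	}
	newReminder := bits.Reverse32(reminderIEEE(buf))
	return newReminder ^ 0xFFFFFFFF
}
where reminderIEEE is the pure remainder on GF(2)[x]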
