Bit hacking and modulo operation - c

While reading this: http://graphics.stanford.edu/~seander/bithacks.html#ReverseByteWith64BitsDiv
I came to the phrase:
The last step, which involves modulus division by 2^10 - 1, has the
effect of merging together each set of 10 bits (from positions 0-9,
10-19, 20-29, ...) in the 64-bit value.
(it is about reversing the bits in a number)...
so I did some calculations:
reverted = (input * 0x0202020202ULL & 0x010884422010ULL) % 1023;
b = 74 : 01001010
b
* 0x0202020202 : 1000000010000000100000001000000010
= 9494949494 :01001010010010100100101001001010010010100
& 10884422010 :10000100010000100010000100010000000010000
= 84000010 : 10000100000000000000000000010000
% 1023 : 1111111111
= 82 : 01010010
Now, the only part which is somewhat unclear is the part where the big number modulo by 1023 (2^10 - 1) packs and gives me the inverted bits... I did not find any good doc about relationship between bit operations and the modulo operation (beside x % 2^n == x & (2^n - 1))) so maybe if someone would cast a light on this it would be very fruitful.

The modulo operation does not give you the inverted bits per se, it is just a binning operation.
First Line : word expansion
b * 0x0202020202 = 01001010 01001010 01001010 01001010 01001010 0
The multiplication operation has a convolution property, which means it replicate the input variable several times (5 here since it's a 8-bit word).
First Line : reversing bits
That's the most tricky part of the hack. You have to remember that we are working on a 8-bit word : b = abcdefgh, where [a-h] are either 1 or 0.
b * 0x0202020202 = abcdefghabcdefghabcdefghabcdefghabcdefgha
& 10884422010 = a0000f000b0000g000c0000h000d00000000e0000
Last Line : word binning
Modulo has a peculiar property : 10 ≡ 1 (mod 9) so 100 ≡ 10*10 ≡ 10*1 (mod 9) ≡ 1 (mod 9).
More generally, for a base b, b ≡ 1 (mod b - 1) so for all number a ≡ sum(a_k*b^k) ≡ sum (a_k) (mod b - 1).
In the example, base = 1024 (10 bits) so
b ≡ a0000f000b0000g000c0000h000d00000000e0000
≡ a*base^4 + 0000f000b0*base^3 + 000g000c00*base^2 + 00h000d000*base +00000e0000
≡ a + 0000f000b0 + 000g000c00 + 00h000d000 + 00000e0000 (mod b - 1)
≡ 000000000a
+ 0000f000b0
+ 000g000c00
+ 00h000d000
+ 00000e0000 (mod b - 1)
≡ 00hgfedcba (mod b - 1) since there is no carry (no overlap)

Related

Find number of array element that can be formed with add/subtract operation of two other number with third number

Given three numbers d, a, b and an array of integers. We can add/subtract a or b to d any number of times. We are supposed to find the count of numbers in the array which can be formed by applying these operations to d.
Example: if array is [14, 15, 63] and d = 4, a = 7 and b = 9;
Then output should be 2.
As
14 = 4 + (9 - 7) + (9 - 7) + (9 - 7) + (9 - 7) + (9 - 7)
63 = 4 + 9 + 9 + 9 + 9 + 9 + 7 + 7
But 15 cannot be obtains with any combination. thus output is 2.
Kindly suggest an algorithm for calculating this. Thanks in advance.
Just to add a little more detail to #user3386109's comment,
Given three numbers d, a, b and an array of integers. We can add/subtract a or b to d any number of times. We are supposed to find the count of numbers in the array which can be formed by applying these operations to d.
Let an element in array be x,
Now, say x = d + a*i + b*j, where i and j are any integers. If this needs to hold true, then x - d = a*i + b*j.
Lets see what the right hand term has to say,
From Bézout's identity
Bézout's identity — Let a and b be integers with greatest common divisor d. Then, there exist integers x and y such that ax + by = d. More generally, the integers of the form ax + by are exactly the multiples of d.
We see that a*i + b*j are exactly the multiples of GCD(a,b). So the difference x-d must be divisible by GCD(a,b) like #AshutoshTiwari pointed out.

Truth table for XOR function for non binary bases

I know that the XOR operator in base 2 for two bits A and B is (A+B)%2. In others words, it's addition modulo 2.
If I want to compute the truth table for XOR operation in a ternary system (base 3), is it same as addition modulo 3?
Eg: In a base 3 system, does 2 XOR 2 = 1 (since (2+2)%3 = 1)?
I read this link which indicated that 2 XOR 2 in a base 3 system is 2 and I can't understand the formula behind that?
In general, for any base 'x', is the XOR operation for that base - addition modulo x?
While I don't know that XOR has technically been defined in higher bases, the properties of XOR can be maintained in higher bases such that:
a ⊕ b ⊕ b = a
a ⊕ b ⊕ a = b
As the blog post shows, using (base - (a + b) % base) % base works. The part you were missing is the first instance of base. In the example of 2 ⊕ 2 in base 3, we get (3 - (2 + 2) % 3) % 3) which does give 2. This formula only works with single digit numbers. If you want to extend to multiple digits, you would use the same formula with each pair of digits just as standard XOR in binary does it bitwise.
For example, 185 ⊕ 42 in base 10 when run for each pair of digits (i.e. hundreds, tens, ones) gives us:
(10 - (1 + 0) % 10) % 10 => 9
(10 - (8 + 4) % 10) % 10 => 8
(10 - (5 + 2) % 10) % 10 => 3
Or 983 when put together. If you run 983 ⊕ 145, you'll find it comes out to 85.
Well, XOR stands for eXclusive OR and it is a logical operation. And this operation is only defined in binary logic. In your link author defines completely different operator which works the same as XOR for binary base. You may call it an "extension" of XOR operation for bases greater than 2. But as you mentioned in your question, there are multiple ways to do this extension. And each way would preserve some properties of "original" XOR and loose some other. For example, you can stick to observation that
a ⊕ b = (a + b) mod 2
In this case your "extended XOR" for base 3 would produce output of 1 for inputs 2, 2. But this XOR3 will no longer satisfy other equations that work for standard XOR, e.g. these ones:
a ⊕ b ⊕ b = a
a ⊕ b ⊕ a = b
If you choose to "save" those, you will get the operation from your link. You can also preserve some different property of XOR, say
a ⊕ a = 0
and get another operation that is different from the former two.
So the short answer is: the phrase "XOR function for non binary bases" doesn't make any sense. XOR operator is only defined in binary logic. If you want to extend it for non-binary bases or non-integer number or complex numbers or whatever, you can do it and define some extension function with whatever behavior and whatever "truth table" you want. But this extension won't be a XOR function anymore.

c bitwise operation to match description

I'm supposed to match the worded descriptions to the bitwise operations. W is one less than the total bits in a's and b's data structure. So if a is 32 bits long W is 31 Here are the worded descriptions:
1. One’s complement of a
2. a.
3. a&b.
4. a * 7.
5. a / 4 .
6. (a<0)?1:-1.
and here are the bitwise descriptions:
a. ̃( ̃a | (b ˆ (MIN_INT + MAX_INT)))
b. ((aˆb)& ̃b)|( ̃(aˆb)&b)
c. 1+(a<<3)+ ̃a
d. (a<<4)+(a<<2)+(a<<1)
e. ((a<0)?(a+3):a)>>2
f. a ˆ (MIN_INT + MAX_INT)
g. ̃((a|( ̃a+1))>>W)&1
h. ̃((a >> W) << 1)
i. a >> 2
I have a few of them solved namely:
a. ̃( ̃a | (b ˆ (MIN_INT + MAX_INT))) = a & b
b. ((aˆb)& ̃b)|( ̃(aˆb)&b) = a
c. 1+(a<<3)+ ̃a = 7 * a
d. (a<<4)+(a<<2)+(a<<1) = 16*a + 4*a + 2*a = 22*a
e. e. ((a<0)?(a+3):a)>>2 = (a<0)?(a/4 + 3/4) : a/4 = a/4 + ((a<0)?(3/4:0)
f. a ˆ (MIN_INT + MAX_INT) = ~a
i. a >> 2 = a/4
So basically all I need help with are g and h
g. ̃((a|( ̃a+1))>>W)&1
h. ̃((a >> W) << 1)
If you wouldn't mind could you also provide an explanation if you could?
I think this is what is going on with g:
g. ̃((a|( ̃a+1))>>W)&1 = ~((a|(two's complement of a) >>W)&1
= ~((a|sign of two's complement of a) &1 = ~(-a)&1
but this could be 1 or 0 so I don't think I did this right.
and for this one:
h. ̃((a >> W) << 1) = ~((sign of a) << 1) = ~((sign of a)*2)
and I don't know where to go from there...
Thank you for your help!!!
For g, consider that (a|~a) sets all bits to 1, so:
~((a|~a) >> W) & 1
~(all_ones >> W) & 1
~1 & 1
0
The only way adding 1 to ~a could possibly affect this result is if the addition flipped the most significant bit of ~a (due to the right shift by W). That can only happen if a is 0 or 2^W. In the latter case, we will get the same result as above because the top bit of (a|X) will always be set. However, when a is 0 ~a+1 (0's twos complement) is also 0 and the final result of the entire expression will instead be 1.
Therefore, g is 1 when a is zero, otherwise it is 0 (i.e. - g is equivalent to the C expression a == 0). That seemingly doesn't match any of your worded descriptions. Indeed, I don't see how any expression (X & 1) possibly matches any of your worded descriptions. None of your worded descriptions matches an expression that evaluates to only 0 or 1 (for all values of a, b).
For h, consider that if a is negative, then its top most bit is set. Because a is signed, right shifting it 31 positions drags the sign bit across all 32 bits of a. Then left shifting it one position sets the least significant bit to 0. Complementing that yields 1. If a is non-negative, then its top most bit is 0 and right shifting that 31 positions yields 0. Left shifting that 1 position still yields 0. Complementing that yields all bits set, which is the 2's complement rep of -1. Therefore, h is equivalent to (a < 0 ? 1 : -1) or #6 of your worded descriptions.

Fastest computation of sum x^5 + x^4 + x^3...+x^0 (Bitwise possible ?) with x=16

For a tree layout that takes benefit of cache line prefetching (reading _next_ cacheline is cheap), I need to solve the address calculation in a really fast way. I was able to boil down the problem to:
newIndex = nowIndex + 1 + (localChildIndex*X)
x would be for example: X = 45 + 44 + 43 + 42 +40.
Note: 4 is the branching factor. In reality it will be 16, so a power of 2. This should be useful to use bitwise stuff?
It would be very bad if it would need a loop to calculate X (performancewise) and stuff like division/multiplication. This appeals to be an interesting problem which I wasn’t able to come up with some nice way of computing it.
Since its part of a tree traversal, 2 modes would be possible: absolute calculation, independent of prior calculations AND incremental calculation which starts with a high X being kept in a variable and then some minimal stuff done to it with every deeper level of the tree.
I hope I was able to make clear what the math should do. Not sure if there is a way to do this fast & without loop - but maybe somebody can come up with a really smart solution. I would like to thank everybody for their help - StackOverflow have been a great teacher to me in the past and I hope to be able to give back more in the future, as my knowledge increases.
I'll answer this in increasing complexity and generality.
If x is fixed to 16 then just use a constant value 1118481. Hooray! (Name it, using magical numbers is bad practice)
If you have a few cases known at compile time use a few constants or even defines, for example:
#define X_2 63
#define X_4 1365
#define X_8 37449
#define X_16 1118481
...
If you have several cases known at execution time initialize and use a lookup table indexed with the exponent.
int _X[MAX_EXPONENT]; // note: give it a more meaningful name :)
Initialize it and then just access with the known exponent of 2^exp at execution time.
newIndex = nowIndex + 1 + (localChildIndex*_X[exp]);
How are these values precalculated, or how to calculate them efficiently on the fly:
The sum X = x^n + x^(n - 1) + ... + x^1 + x^0 is a geometric serie and its finite sum is:
X = x^n + x^(n - 1) + ... + x^1 + x^0 = (1-x^(n + 1))/(1-x)
About the bitwise operations, as Oli Charlesworth has stated if x is a power of 2 (in binary 0..010..0) x^n is also a power of 2, and the sum of different powers of two is equivalent to the OR operation. Thus we could make an expression like:
Let exp be the exponent so that x = 2^exp. (For 16, exp = 4). Then,
X = x^5 + ... + x^1 + x^0
X = (2^exp)^5 + ... + (2^exp)^1 + 1
X = 2^(exp*5) + ... + 2^(exp*1) + 1
now using bitwise, 2^n = 1<<n
X = 1<<(exp*5) | ... | 1<<exp | 1
In C:
int X;
int exp = 4; //for x == 16
X = 1 << (exp*5) | 1 << (exp*4) | 1 << (exp*3) | 1 << (exp*2) | 1 << (exp*1) | 1;
And finally, I can't resist to say: if your expression were more complex and you had to evaluate an arbitrary polynomial a_n*x^n + ... + a_1*x^1 + a_0 in x, instead of implementing the obvious loop, a faster way to evaluate the polynomial is using the Horner's rule.

How is a CRC32 checksum calculated?

Maybe I'm just not seeing it, but CRC32 seems either needlessly complicated, or insufficiently explained anywhere I could find on the web.
I understand that it is the remainder from a non-carry-based arithmetic division of the message value, divided by the (generator) polynomial, but the actual implementation of it escapes me.
I've read A Painless Guide To CRC Error Detection Algorithms, and I must say it was not painless. It goes over the theory rather well, but the author never gets to a simple "this is it." He does say what the parameters are for the standard CRC32 algorithm, but he neglects to lay out clearly how you get to it.
The part that gets me is when he says "this is it" and then adds on, "oh by the way, it can be reversed or started with different initial conditions," and doesn't give a clear answer of what the final way of calculating a CRC32 checksum given all of the changes he just added.
Is there a simpler explanation of how CRC32 is calculated?
I attempted to code in C how the table is formed:
for (i = 0; i < 256; i++)
{
temp = i;
for (j = 0; j < 8; j++)
{
if (temp & 1)
{
temp >>= 1;
temp ^= 0xEDB88320;
}
else {temp >>= 1;}
}
testcrc[i] = temp;
}
but this seems to generate values inconsistent with values I have found elsewhere on the Internet. I could use the values I found online, but I want to understand how they were created.
Any help in clearing up these incredibly confusing numbers would be very appreciated.
The polynomial for CRC32 is:
x32 + x26 + x23 + x22 + x16 + x12 + x11 + x10 + x8 + x7 + x5 + x4 + x2 + x + 1
Wikipedia
CRC calculation
Or in hex and binary:
0x 01 04 C1 1D B7
1 0000 0100 1100 0001 0001 1101 1011 0111
The highest term (x32) is usually not explicitly written, so it can instead be represented in hex just as
0x 04 C1 1D B7
Feel free to count the 1s and 0s, but you'll find they match up with the polynomial, where 1 is bit 0 (or the first bit) and x is bit 1 (or the second bit).
Why this polynomial? Because there needs to be a standard given polynomial and the standard was set by IEEE 802.3. Also it is extremely difficult to find a polynomial that detects different bit errors effectively.
You can think of the CRC-32 as a series of "Binary Arithmetic with No Carries", or basically "XOR and shift operations". This is technically called Polynomial Arithmetic.
CRC primer, Chapter 5
To better understand it, think of this multiplication:
(x^3 + x^2 + x^0)(x^3 + x^1 + x^0)
= (x^6 + x^4 + x^3
+ x^5 + x^3 + x^2
+ x^3 + x^1 + x^0)
= x^6 + x^5 + x^4 + 3*x^3 + x^2 + x^1 + x^0
If we assume x is base 2 then we get:
x^7 + x^3 + x^2 + x^1 + x^0
CRC primer Chp.5
Why? Because 3x^3 is 11x^11 (but we need only 1 or 0 pre digit) so we carry over:
=1x^110 + 1x^101 + 1x^100 + 11x^11 + 1x^10 + 1x^1 + x^0
=1x^110 + 1x^101 + 1x^100 + 1x^100 + 1x^11 + 1x^10 + 1x^1 + x^0
=1x^110 + 1x^101 + 1x^101 + 1x^11 + 1x^10 + 1x^1 + x^0
=1x^110 + 1x^110 + 1x^11 + 1x^10 + 1x^1 + x^0
=1x^111 + 1x^11 + 1x^10 + 1x^1 + x^0
But mathematicians changed the rules so that it is mod 2. So basically any binary polynomial mod 2 is just addition without carry or XORs. So our original equation looks like:
=( 1x^110 + 1x^101 + 1x^100 + 11x^11 + 1x^10 + 1x^1 + x^0 ) MOD 2
=( 1x^110 + 1x^101 + 1x^100 + 1x^11 + 1x^10 + 1x^1 + x^0 )
= x^6 + x^5 + x^4 + 3*x^3 + x^2 + x^1 + x^0 (or that original number we had)
I know this is a leap of faith but this is beyond my capability as a line-programmer. If you are a hard-core CS-student or engineer I challenge to break this down. Everyone will benefit from this analysis.
So to work out a full example:
Original message : 1101011011
Polynomial of (W)idth 4 : 10011
Message after appending W zeros : 11010110110000
Now we divide the augmented Message by the Poly using CRC arithmetic. This is the same division as before:
1100001010 = Quotient (nobody cares about the quotient)
_______________
10011 ) 11010110110000 = Augmented message (1101011011 + 0000)
=Poly 10011,,.,,....
-----,,.,,....
10011,.,,....
10011,.,,....
-----,.,,....
00001.,,....
00000.,,....
-----.,,....
00010,,....
00000,,....
-----,,....
00101,....
00000,....
-----,....
01011....
00000....
-----....
10110...
10011...
-----...
01010..
00000..
-----..
10100.
10011.
-----.
01110
00000
-----
1110 = Remainder = THE CHECKSUM!!!!
The division yields a quotient, which we throw away, and a remainder, which is the calculated checksum. This ends the calculation. Usually, the checksum is then appended to the message and the result transmitted. In this case the transmission would be: 11010110111110.
CRC primer, Chapter 7
Only use a 32-bit number as your divisor and use your entire stream as your dividend. Throw out the quotient and keep the remainder. Tack the remainder on the end of your message and you have a CRC32.
Average guy review:
QUOTIENT
----------
DIVISOR ) DIVIDEND
= REMAINDER
Take the first 32 bits.
Shift bits
If 32 bits are less than DIVISOR, go to step 2.
XOR 32 bits by DIVISOR. Go to step 2.
(Note that the stream has to be dividable by 32 bits or it should be padded. For example, an 8-bit ANSI stream would have to be padded. Also at the end of the stream, the division is halted.)
For IEEE802.3, CRC-32. Think of the entire message as a serial bit stream, append 32 zeros to the end of the message. Next, you MUST reverse the bits of EVERY byte of the message and do a 1's complement the first 32 bits. Now divide by the CRC-32 polynomial, 0x104C11DB7. Finally, you must 1's complement the 32-bit remainder of this division bit-reverse each of the 4 bytes of the remainder. This becomes the 32-bit CRC that is appended to the end of the message.
The reason for this strange procedure is that the first Ethernet implementations would serialize the message one byte at a time and transmit the least significant bit of every byte first. The serial bit stream then went through a serial CRC-32 shift register computation, which was simply complemented and sent out on the wire after the message was completed. The reason for complementing the first 32 bits of the message is so that you don't get an all zero CRC even if the message was all zeros.
I published a tutorial on CRC-32 hashes, here:
CRC-32 hash tutorial - AutoHotkey Community
In this example from it, I demonstrate how to calculate the CRC-32 hash for the 'ANSI' (1 byte per character) string 'abc':
calculate the CRC-32 hash for the 'ANSI' string 'abc':
inputs:
dividend: binary for 'abc': 0b011000010110001001100011 = 0x616263
polynomial: 0b100000100110000010001110110110111 = 0x104C11DB7
start with the 3 bytes 'abc':
61 62 63 (as hex)
01100001 01100010 01100011 (as bin)
reverse the bits in each byte:
10000110 01000110 11000110
append 32 0 bits:
10000110010001101100011000000000000000000000000000000000
XOR (exclusive or) the first 4 bytes with 0xFFFFFFFF:
(i.e. flip the first 32 bits:)
01111001101110010011100111111111000000000000000000000000
next we will perform 'CRC division':
a simple description of 'CRC division':
we put a 33-bit box around the start of a binary number,
start of process:
if the first bit is 1, we XOR the number with the polynomial,
if the first bit is 0, we do nothing,
we then move the 33-bit box right by 1 bit,
if we have reached the end of the number,
then the 33-bit box contains the 'remainder',
otherwise we go back to 'start of process'
note: every time we perform a XOR, the number begins with a 1 bit,
and the polynomial always begins with a 1 bit,
1 XORed with 1 gives 0, so the resulting number will always begin with a 0 bit
'CRC division':
'divide' by the polynomial 0x104C11DB7:
01111001101110010011100111111111000000000000000000000000
100000100110000010001110110110111
---------------------------------
111000100010010111111010010010110
100000100110000010001110110110111
---------------------------------
110000001000101011101001001000010
100000100110000010001110110110111
---------------------------------
100001011101010011001111111101010
100000100110000010001110110110111
---------------------------------
111101101000100000100101110100000
100000100110000010001110110110111
---------------------------------
111010011101000101010110000101110
100000100110000010001110110110111
---------------------------------
110101110110001110110001100110010
100000100110000010001110110110111
---------------------------------
101010100000011001111110100001010
100000100110000010001110110110111
---------------------------------
101000011001101111000001011110100
100000100110000010001110110110111
---------------------------------
100011111110110100111110100001100
100000100110000010001110110110111
---------------------------------
110110001101101100000101110110000
100000100110000010001110110110111
---------------------------------
101101010111011100010110000001110
100000100110000010001110110110111
---------------------------------
110111000101111001100011011100100
100000100110000010001110110110111
---------------------------------
10111100011111011101101101010011
we obtain the 32-bit remainder:
0b10111100011111011101101101010011 = 0xBC7DDB53
note: the remainder is a 32-bit number, it may start with a 1 bit or a 0 bit
XOR the remainder with 0xFFFFFFFF:
(i.e. flip the 32 bits:)
0b01000011100000100010010010101100 = 0x438224AC
reverse bits:
bit-reverse the 4 bytes (32 bits), treating them as one entity:
(e.g. 'abcdefgh ijklmnop qrstuvwx yzABCDEF'
to 'FEDCBAzy xwvutsrq ponmlkji hgfedcba':)
0b00110101001001000100000111000010 = 0x352441C2
thus the CRC-32 hash for the 'ANSI' string 'abc' is: 0x352441C2
A CRC is pretty simple; you take a polynomial represented as bits and the data, and divide the polynomial into the data (or you represent the data as a polynomial and do the same thing). The remainder, which is between 0 and the polynomial is the CRC. Your code is a bit hard to understand, partly because it's incomplete: temp and testcrc are not declared, so it's unclear what's being indexed, and how much data is running through the algorithm.
The way to understand CRCs is to try to compute a few using a short piece of data (16 bits or so) with a short polynomial -- 4 bits, perhaps. If you practice this way, you'll really understand how you might go about coding it.
If you're doing it frequently, a CRC is quite slow to compute in software. Hardware computation is much more efficient, and requires just a few gates.
In addition to the Wikipedia Cyclic redundancy check and Computation of CRC articles, I found a paper entitled Reversing CRC - Theory and Practice* to be a good reference.
There are essentially three approaches for computing a CRC: an algebraic approach, a bit-oriented approach, and a table-driven approach. In Reversing CRC - Theory and Practice*, each of these three algorithms/approaches is explained in theory accompanied in the APPENDIX by an implementation for the CRC32 in the C programming language.
* PDF Link
Reversing CRC – Theory and Practice.
HU Berlin Public Report
SAR-PR-2006-05
May 2006
Authors:
Martin Stigge, Henryk Plötz, Wolf Müller, Jens-Peter Redlich
Then there is always Rosetta Code, which shows crc32 implemented in dozens of computer languages. https://rosettacode.org/wiki/CRC-32 and has links to many explanations and implementations.
In order to reduce crc32 to taking the reminder you need to:
Invert bits on each byte
xor first four bytes with 0xFF (this is to avoid errors on the leading 0s)
Add padding at the end (this is to make the last 4 bytes take part in the hash)
Compute the reminder
Reverse the bits again
xor the result again.
In code this is:
func CRC32 (file []byte) uint32 {
for i , v := range(file) {
file[i] = bits.Reverse8(v)
}
for i := 0; i < 4; i++ {
file[i] ^= 0xFF
}
// Add padding
file = append(file, []byte{0, 0, 0, 0}...)
newReminder := bits.Reverse32(reminderIEEE(file))
return newReminder ^ 0xFFFFFFFF
}
where reminderIEEE is the pure reminder on GF(2)[x]

Resources