Most efficient way to find an intersection between two sets of numbers encoded with bitwise operations - c

Given two sets of numbers encoded with bitwise operations (6 bits per number):
a = {12,20,21,24,31}
b = {13,18,24,28,35}
Intersection -> a ∩ b = {24}
unsigned int a = 0;
a |= (12 | 20 << 6 | 21 << 12 | 24 << 18 | 31 << 24);
unsigned int b = 0;
b |= (13 | 18 << 6 | 24 << 12 | 28 << 18 | 35 << 24);
What is the fastest way to find out whether the sets have at least one number in common?
This is just an example; the common number can be in any position.

#include <stdint.h>
#include <limits.h>
typedef unsigned int SetType;
#define FieldWidth 6 // Number of bits per field.
#define NumberOfFields (sizeof(SetType) * CHAR_BIT / FieldWidth)
// Return non-zero iff some element is in both a and b.
int IsIntersectionNonEmpty(SetType a, SetType b)
{
// Create masks with a bit set for each element in an input set.
uint64_t A = 0, B = 0;
for (int i = 0; i < NumberOfFields; ++i)
{
A |= UINT64_C(1) << (a >> i*6 & 0x3f);
B |= UINT64_C(1) << (b >> i*6 & 0x3f);
/* ">> i*6" moves field i to the low bits.
"& 0x3f" isolates that six-bit field.
"UINT64_C(1) << …" generates a 1 bit in that position.
*/
}
/* Bitwise AND A and B to see if they have a bit in common, then
convert that to 1 or 0.
*/
return !! (A & B);
}

Maybe not the absolute fastest, but I'd XOR a with b, and see if the result has any six-bit all-zeros pattern in any of your 5 positions. Then shift one of them by 6 bits and repeat up to 4 more times if needed.

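Here's a somewhat faster version of my solution above; instead of shifting left and right, just rotate. Only the low 30 bits hold fields, so the rotation has to wrap at 30 bits rather than 32, and a match shows up as an all-zero six-bit field in the XOR:
// Rotate the low 30 bits (five 6-bit fields) of n left by d bits, 0 <= d < 30.
unsigned int leftRotate(unsigned int n, unsigned int d)
{
    n &= 0x3FFFFFFF;
    return ((n << d) | (n >> (30 - d))) & 0x3FFFFFFF;
}
// Return non-zero iff some element is in both a and b.
int IsIntersectionNonEmpty(unsigned int a, unsigned int b)
{
    for (int i = 0; i < 5; i++) {
        unsigned int matches = leftRotate(a, i*6) ^ b;
        for (int j = 0; j < 5; j++) {
            unsigned int testval = 0x3Fu << j*6;
            if ((matches & testval) == 0)
                return 1; // success: field j is identical in both words
        }
    }
    return 0;
}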
5 instructions in the outer loop, 3 in the inner * 5, so 20 total, times 5 loops, around 100 instructions total, but as soon as it finds a match it returns. So if there are frequent matches it'll likely be faster than the @eric-postpischil version, but with no matches it'll be slower. On the other hand, his solution is likely auto-vectorizable with a smart compiler.

Well, thanks to everyone, and especially to the guy who posted the EL code; I do not know why he withdrew it.
Here we go, as fast as light:
#define EL(x) (UINT64_C(1) << (x))
unsigned int a = 0;
a |= (12 | 20 << 6 | 21 << 12 | 24 << 18 | 31 << 24);
unsigned int b = 0;
b |= (13 | 18 << 6 | 24 << 12 | 28 << 18 | 35 << 24);
// The masks need one bit per possible 6-bit value (64 bits), so unsigned int is too narrow.
uint64_t aa = EL(a & 0x00000003F) | EL((a & 0x000000FC0) >> 6) | EL((a & 0x3F000) >> 12) | EL((a & 0xFC0000) >> 18) | EL((a & 0x3F000000) >> 24);
uint64_t bb = EL(b & 0x00000003F) | EL((b & 0x000000FC0) >> 6) | EL((b & 0x3F000) >> 12) | EL((b & 0xFC0000) >> 18) | EL((b & 0x3F000000) >> 24);
int anb = !! (aa & bb); // non-zero iff the sets intersect
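When comparing the variants in this thread, a brute-force reference is handy for checking their results. The sketch below is not taken from any of the answers; `field6` and `intersects_naive` are hypothetical names, and it assumes five 6-bit fields packed into the low 30 bits of a 32-bit word, as in the question.

```c
#include <stdint.h>

/* Hypothetical helper: extract 6-bit field i (0..4) from a packed set. */
static unsigned field6(uint32_t set, int i)
{
    return (set >> (6 * i)) & 0x3Fu;
}

/* Brute-force reference: compare every field of a with every field of b. */
static int intersects_naive(uint32_t a, uint32_t b)
{
    for (int i = 0; i < 5; i++)
        for (int j = 0; j < 5; j++)
            if (field6(a, i) == field6(b, j))
                return 1;
    return 0;
}
```

It does 25 comparisons in the worst case, so it is only meant as a correctness oracle for the faster versions, not as a contender.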

Related

How to do bit striping on pixel data?

I have 3 buffers containing R, G, B bit data running on a 32-bit processor.
I need to combine the three bytes in the following way:
R[0] = 0b r1r2r3r4r5r6r7r8
G[0] = 0b g1g2g3g4g5g6g7g8
B[0] = 0b b1b2b3b4b5b6b7b8
int32_t Out = 0b r1g1b1r2g2b2r3g3 b3r4g4b4r5g5b5r6 g6b6r7g7b7r8g8b8 xxxxxxxx
where xxxxxxxx is continuing on to each of the next bytes in the buffers.
I am looking for an optimal way to combine them. My approach is definitely not efficient.
Here is my approach
static void rgbcombineline(uint8_t line)
{
uint32_t i, bit;
uint8_t bitMask, rByte, gByte, bByte;
uint32_t ByteExp, rgbByte;
uint8_t *strPtr = (uint8_t*)&ByteExp;
for (i = 0; i < (LCDpixelsCol / 8); i++)
{
rByte = rDispbuff[line][i];
gByte = gDispbuff[line][i];
bByte = bDispbuff[line][i];
bitMask = 0b00000001;
ByteExp = 0;
for(bit = 0; bit < 8; bit++)
{
rgbByte = 0;
rgbByte |= ((rByte & bitMask) >> bit) << 2;
rgbByte |= ((gByte & bitMask) >> bit) << 1;
rgbByte |= ((bByte & bitMask) >> bit);
ByteExp |= (rgbByte << 3*bit);
bitMask <<= 1;
}
TempLinebuff[((i*3)+0) +2] = *(strPtr + 2);
TempLinebuff[((i*3)+1) +2] = *(strPtr + 1);
TempLinebuff[((i*3)+2) +2] = *(strPtr + 0);
}
}
If you can spare 1024 bytes, you can achieve your desired result with a single 256-element lookup table:
uint32_t lookup[256] = {
0, 1, 8, 9, 64, 65, ...
/* map abcdefgh to a00b00c00d00e00f00g00h */
};
uint32_t result = (lookup[rByte] << 2) | (lookup[gByte] << 1) | lookup[bByte];
This uses only 3 lookups, 2 shifts and 2 OR operations, which should provide an acceptable speedup.
If you have more space, you can use three lookup tables to eliminate the shifts too (although this may result in worse cache performance, so always profile to check!)
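The table contents are elided above, so here is one way they could be generated at run time instead of being hard-coded. This is a minimal sketch, not the answerer's code: `spread_table`, `init_spread_table` and `stripe_rgb` are hypothetical names, and the mapping assumed is the one stated in the comment (bit n of the index moves to bit 3n of the entry).

```c
#include <stdint.h>

static uint32_t spread_table[256];   /* hypothetical stand-in for lookup[] */

/* Fill the table: bit n of the index moves to bit 3*n of the entry. */
static void init_spread_table(void)
{
    for (unsigned v = 0; v < 256; v++) {
        uint32_t out = 0;
        for (unsigned n = 0; n < 8; n++)
            out |= (uint32_t)((v >> n) & 1u) << (3 * n);
        spread_table[v] = out;
    }
}

/* Combine three bytes into a 24-bit value, red in the top bit of each triple. */
static uint32_t stripe_rgb(uint8_t r, uint8_t g, uint8_t b)
{
    return (spread_table[r] << 2) | (spread_table[g] << 1) | spread_table[b];
}
```

Call `init_spread_table()` once before the first `stripe_rgb()`; the first six entries come out as 0, 1, 8, 9, 64, 65, matching the values listed above.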
You can use a multiplication by a "magical" constant to replicate the bits. Then use bit-shifts to extract the needed bits, and bit-wise masking to combine them. The "magical" constant is a 17-bit binary 10000000100000001. When multiplied by it, any 8-bit number is concatenated to itself 3 times.
r1r2r3r4r5r6r7r8 * M = r1r2r3r4r5r6r7r8r1r2r3r4r5r6r7r8r1r2r3r4r5r6r7r8
r1r2r3r4r5r6r7r8 * M shr 2 = 0 0 r1r2r3r4r5r6r7r8r1r2r3r4r5r6r7r8r1r2r3r4r5r6
r1r2r3r4r5r6r7r8 * M shr 4 = 0 0 0 0 r1r2r3r4r5r6r7r8r1r2r3r4r5r6r7r8r1r2r3r4
r1r2r3r4r5r6r7r8 * M shr 6 = 0 0 0 0 0 0 r1r2r3r4r5r6r7r8r1r2r3r4r5r6r7r8r1r2
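The bits that end up in the right places (marked in bold in the original) are r1 and r5 in the unshifted product, r2 and r6 after the shift by 2, r3 and r7 after the shift by 4, and r4 and r8 after the shift by 6.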
If you use this masking code
R * M & 0b100000000000100000000000 |
(R * M >> 2) & 0b000100000000000100000000 |
(R * M >> 4) & 0b000000100000000000100000 |
(R * M >> 6) & 0b000000000100000000000100
you will get the "red" bits combined in the right way:
r1 0 0 r2 0 0 r3 0 0 r4 0 0 r5 0 0 r6 0 0 r7 0 0 r8 0 0
Then combine the "blue" and "green" bits in a similar way.
A rough estimation of the number of operations:
Multiplications: 3
Bit shifts: 9
Bit-wise AND: 12
Bit-wise OR: 11
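The multiply-and-mask steps above can be written out as a small C function. This is a sketch under the assumption that r1 is the most significant bit of the byte; `spread_mul` and `stripe_mul` are hypothetical names, and the masks are the binary constants above converted to hexadecimal.

```c
#include <stdint.h>

/* Spread the 8 bits of v so that bit 7 lands on bit 23, bit 6 on bit 20, ...,
   bit 0 on bit 2 (the "red" positions of the 24-bit striped value). */
static uint32_t spread_mul(uint8_t v)
{
    const uint32_t M = 0x10101u;       /* binary 1 00000001 00000001 */
    uint32_t p = v * M;                /* v repeated three times, 24 bits */
    return ( p       & 0x800800u)      /* r1 and r5 */
         | ((p >> 2) & 0x100100u)      /* r2 and r6 */
         | ((p >> 4) & 0x020020u)      /* r3 and r7 */
         | ((p >> 6) & 0x004004u);     /* r4 and r8 */
}

/* Green and blue use the same spread, moved down by one and two bits. */
static uint32_t stripe_mul(uint8_t r, uint8_t g, uint8_t b)
{
    return spread_mul(r) | (spread_mul(g) >> 1) | (spread_mul(b) >> 2);
}
```

Shifting the spread result for green and blue, rather than using separate masks per channel, keeps one function for all three at the cost of two extra shifts.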
You can use a 64-entry table that contains the bit-striped values for 6 bits: fetch 2 bits each from r, g and b and look up the combined result. A lookup table of size 512 or 4096 can be more efficient.
/* Converts bits abcdefghijkl to aeibfjcgkdhl */
static const uint32_t bitStripLookUp[4096] = {
/* Hard coded values, can be generated with some script */
...
};
...
rByte = rDispbuff[line][i]; // rByte, gByte, bByte should be uint32_t
gByte = gDispbuff[line][i];
bByte = bDispbuff[line][i];
uMSB = ((rByte << 4) & 0x0F00) | (gByte & 0x00F0) | ((bByte >> 4) & 0x000F); // r7r6r5r4g7g6g5g4b7b6b5b4
uLSB = ((rByte << 8) & 0x0F00) | ((gByte << 4) & 0x00F0) | (bByte & 0x000F); // r3r2r1r0g3g2g1g0b3b2b1b0
stuffed_value = (bitStripLookUp[uMSB] << 12) | bitStripLookUp[uLSB];
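Since the 4096 entries are left out above, the following sketch fills an equivalent table at run time. It assumes the index layout used for uMSB and uLSB (red nibble in bits 11..8, green in bits 7..4, blue in bits 3..0) and an output order of r, g, b from the most significant bit of each triple; `bit_strip_table`, `init_bit_strip_table` and `stripe_rgb_nibbles` are hypothetical names.

```c
#include <stdint.h>

static uint32_t bit_strip_table[4096];  /* hypothetical runtime-filled table */

/* For nibble bit k: red goes to bit 3k+2, green to 3k+1, blue to 3k. */
static void init_bit_strip_table(void)
{
    for (uint32_t idx = 0; idx < 4096; idx++) {
        uint32_t out = 0;
        for (unsigned k = 0; k < 4; k++) {
            out |= ((idx >> (8 + k)) & 1u) << (3 * k + 2); /* red   */
            out |= ((idx >> (4 + k)) & 1u) << (3 * k + 1); /* green */
            out |= ((idx >> k) & 1u) << (3 * k);           /* blue  */
        }
        bit_strip_table[idx] = out;
    }
}

/* Two lookups per pixel group: one for the high nibbles, one for the low. */
static uint32_t stripe_rgb_nibbles(uint32_t r, uint32_t g, uint32_t b)
{
    uint32_t msb = ((r << 4) & 0x0F00u) | (g & 0x00F0u) | ((b >> 4) & 0x000Fu);
    uint32_t lsb = ((r << 8) & 0x0F00u) | ((g << 4) & 0x00F0u) | (b & 0x000Fu);
    return (bit_strip_table[msb] << 12) | bit_strip_table[lsb];
}
```

The same loop could be run offline to print the values if a `const` table in flash is preferred over 16 KiB of RAM.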
Interleaving with bitwise operators
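static inline unsigned interleave(unsigned n)
{
    // Move the 3-bit groups 9 bits apart: ab cde fgh -> ab 000000 cde 000000 fgh
    n = ((n << 12) | (n << 6) | n) & 0007007007; // 000000111 000000111 000000111
    // Spread each group so its bits land on every third position (bits 2, 5, 8, ...)
    n = ((n << 6) | (n << 4) | (n << 2)) & 0444444444; // 100100100 100100100 100100100
    return n;
}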
unsigned r = interleave(rByte);
unsigned g = interleave(gByte);
unsigned b = interleave(bByte);
unsigned rgb = r | (g >> 1) | (b >> 2);
TempLinebuff[((i*3)+0) +2] = rgb >> 16;
TempLinebuff[((i*3)+1) +2] = rgb >> 8;
TempLinebuff[((i*3)+2) +2] = rgb;
Lookup table solution
#define EXPANDBIT(x, n) ((((x) >> (n)) & 1u) << (3*(n))) /* bit n of x moves to bit 3n */
#define EXPAND8BIT(a) (EXPANDBIT(a, 0) | EXPANDBIT(a, 1) | EXPANDBIT(a, 2) | EXPANDBIT(a, 3) | \
EXPANDBIT(a, 4) | EXPANDBIT(a, 5) | EXPANDBIT(a, 6) | EXPANDBIT(a, 7))
#define EXPAND16(A) EXPAND8BIT(16*(A)+ 0), EXPAND8BIT(16*(A)+ 1), EXPAND8BIT(16*(A)+ 2), EXPAND8BIT(16*(A)+ 3), \
EXPAND8BIT(16*(A)+ 4), EXPAND8BIT(16*(A)+ 5), EXPAND8BIT(16*(A)+ 6), EXPAND8BIT(16*(A)+ 7), \
EXPAND8BIT(16*(A)+ 8), EXPAND8BIT(16*(A)+ 9), EXPAND8BIT(16*(A)+10), EXPAND8BIT(16*(A)+11), \
EXPAND8BIT(16*(A)+12), EXPAND8BIT(16*(A)+13), EXPAND8BIT(16*(A)+14), EXPAND8BIT(16*(A)+15)
const uint32_t LUT[256] = {
EXPAND16( 0), EXPAND16( 1), EXPAND16( 2), EXPAND16( 3),
EXPAND16( 4), EXPAND16( 5), EXPAND16( 6), EXPAND16( 7),
EXPAND16( 8), EXPAND16( 9), EXPAND16(10), EXPAND16(11),
EXPAND16(12), EXPAND16(13), EXPAND16(14), EXPAND16(15)
};
output = LUT[rByte] << 2 | LUT[gByte] << 1 | LUT[bByte]; /* red is the top bit of each triple */
The size of the lookup table may be increased if necessary.
On x86 with BMI2 there's hardware support in the form of the PDEP instruction, which can be accessed via the intrinsic _pdep_u32 (declared in <immintrin.h>). The solution then becomes much simpler:
output = _pdep_u32(rByte, 044444444U << 8)
| _pdep_u32(gByte, 022222222U << 8)
| _pdep_u32(bByte, 011111111U << 8);
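For readers without BMI2 hardware, the effect of PDEP can be emulated with a loop. The sketch below is a plain software model of the instruction's documented behaviour (the low bits of the source are deposited, in order, into the set-bit positions of the mask); `pdep32_soft` and `stripe_rgb_soft` are hypothetical names and the loop is far slower than the real instruction.

```c
#include <stdint.h>

/* Software model of PDEP: deposit the low bits of src, lowest first,
   into the positions of the set bits of mask. */
static uint32_t pdep32_soft(uint32_t src, uint32_t mask)
{
    uint32_t result = 0;
    for (uint32_t bit = 1; mask != 0; bit <<= 1) {
        uint32_t lowest = mask & (0u - mask);  /* isolate lowest set bit */
        if (src & bit)
            result |= lowest;
        mask &= mask - 1;                      /* clear that bit */
    }
    return result;
}

/* Same masks as the _pdep_u32 version: result occupies the top 24 bits. */
static uint32_t stripe_rgb_soft(uint8_t r, uint8_t g, uint8_t b)
{
    return pdep32_soft(r, 044444444u << 8)
         | pdep32_soft(g, 022222222u << 8)
         | pdep32_soft(b, 011111111u << 8);
}
```

The octal masks select every third bit, so each channel lands on its own position inside every 3-bit group.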
Another way is interleaving using multiplication and masking, with this packing technique. This is for architectures without a hardware bit-deposit instruction but with fast multipliers.
uint32_t expand8bits(uint8_t b)
{
    uint64_t MAGIC = 0x8040201008040201;
    uint64_t MASK = 0x8080808080808080;
    // htobe64 (from <endian.h>) acts as a byte swap here, which assumes a little-endian host
    uint64_t expanded8bits = htobe64((MAGIC*b) & MASK);
    uint64_t result = expanded8bits*0x2108421 & 0x9249000000009000;
    // gather all eight bits in the top 24 bits of the 64-bit value, then keep the high half
    return ((result | (result << 30)) & (044444444ULL << 40)) >> 32;
}
uint32_t stripeBits(uint8_t rByte, uint8_t gByte, uint8_t bByte)
{
return expand8bits(rByte) | (expand8bits(gByte) >> 1) | (expand8bits(bByte) >> 2);
}
The way it works is like this:
The first step expands the input bits from abcdefgh to a0000000 b0000000 c0000000 d0000000 e0000000 f0000000 g0000000 h0000000 and stores the result in expanded8bits.
Then we move those spaced-out bits close together by multiplying and masking in the next step. After that, result contains a00b00c00d00e00f00000000000000000000000000000000g00h000000000000 and is ready to be merged into a single value.
The magic number for bringing the bits closer is calculated like this (each partial product is shifted left by 0, 5, 10, 15, 20 and 25 bits, with the bits that overflow 64 bits dropped):
  a0000000b0000000c0000000d0000000e0000000f0000000g0000000h0000000
×                                       10000100001000010000100001 (0x2108421)
  ────────────────────────────────────────────────────────────────
  a0000000b0000000c0000000d0000000e0000000f0000000g0000000h0000000
  000b0000000c0000000d0000000e0000000f0000000g0000000h0000000
+ 000000c0000000d0000000e0000000f0000000g0000000h0000000
  0c0000000d0000000e0000000f0000000g0000000h0000000
  0000d0000000e0000000f0000000g0000000h0000000
  0000000e0000000f0000000g0000000h0000000
  ────────────────────────────────────────────────────────────────
  ac0bd0cebd0ce0dfce0df0egdf0eg0fheg0fh0g0fh0g00h0g00h0000h0000000
& 1001001001001001000000000000000000000000000000001001000000000000 (0x9249000000009000)
  ────────────────────────────────────────────────────────────────
  a00b00c00d00e00f00000000000000000000000000000000g00h000000000000
Alternatively expand8bits can be implemented using only 32-bit magic number multiplication like this, which may be simpler
uint32_t expand8bits(uint8_t b)
{
const uint8_t RMASK_1458 = 0b10011001;
const uint32_t MAGIC_1458 = 0b00000001000001010000010000000000U;
const uint32_t MAGIC_2367 = 0b00000000010100000101000000000000U;
const uint32_t MASK_BIT1458 = 0b10000000010010000000010000000000U;
const uint32_t MASK_BIT2367 = 0b00010010000000010010000000000000U;
return (((b & RMASK_1458) * MAGIC_1458) & MASK_BIT1458)
| (((b & ~RMASK_1458) * MAGIC_2367) & MASK_BIT2367);
}
Here we split the 8-bit number into two 4-bit parts, one with bits 1, 4, 5, 8 and the other with the remaining bits 2, 3, 6, 7. The magic numbers work like this:
                          a00de00h                              0bc00fg0
× 00000001000001010000010000000000    × 00000000010100000101000000000000
  ────────────────────────────────      ────────────────────────────────
  a00de00h                                0bc00fg0
+       a00de00h                      +     0bc00fg0
          a00de00h                                0bc00fg0
                a00de00h                            0bc00fg0
  ────────────────────────────────      ────────────────────────────────
  a00de0ahadedehah0de00h0000000000      000bcbcfgfgbcbcfgfg0000000000000
& 10000000010010000000010000000000    & 00010010000000010010000000000000
  ────────────────────────────────      ────────────────────────────────
  a00000000d00e00000000h0000000000      000b00c00000000f00g0000000000000
See
What's a fast way to space-out bits within a word?
How to create a byte out of 8 bool values (and vice versa)?
Portable efficient alternative to PDEP without using BMI2?

How to interleave 2 booleans using bitwise operators?

Suppose I have two 4-bit values, ABCD and abcd. How to interleave it, so it becomes AaBbCcDd, using bitwise operators? Example in pseudo-C:
nibble a = 0b1001;
nibble b = 0b1100;
char c = foo(a,b);
print_bits(c);
// output: 0b11010010
Note: 4 bits is just for illustration, I want to do this with two 32bit ints.
This is called the perfect shuffle operation, and it's discussed at length in the Bible Of Bit Bashing, Hacker's Delight by Henry Warren, section 7-2 "Shuffling Bits."
Assuming x is a 32-bit integer with a in its high-order 16 bits and b in its low-order 16 bits:
unsigned int x = (a << 16) | b; /* put a and b in place */
the following straightforward C-like code accomplishes the perfect shuffle:
x = (x & 0x0000FF00) << 8 | (x >> 8) & 0x0000FF00 | x & 0xFF0000FF;
x = (x & 0x00F000F0) << 4 | (x >> 4) & 0x00F000F0 | x & 0xF00FF00F;
x = (x & 0x0C0C0C0C) << 2 | (x >> 2) & 0x0C0C0C0C | x & 0xC3C3C3C3;
x = (x & 0x22222222) << 1 | (x >> 1) & 0x22222222 | x & 0x99999999;
He also gives an alternative form which is faster on some CPUs, and (I think) a little more clear and extensible:
unsigned int t; /* an intermediate, temporary variable */
t = (x ^ (x >> 8)) & 0x0000FF00; x = x ^ t ^ (t << 8);
t = (x ^ (x >> 4)) & 0x00F000F0; x = x ^ t ^ (t << 4);
t = (x ^ (x >> 2)) & 0x0C0C0C0C; x = x ^ t ^ (t << 2);
t = (x ^ (x >> 1)) & 0x22222222; x = x ^ t ^ (t << 1);
I see you have edited your question to ask for a 64-bit result from two 32-bit inputs. I'd have to think about how to extend Warren's technique. I think it wouldn't be too hard, but I'd have to give it some thought. If someone else wanted to start here and give a 64-bit version, I'd be happy to upvote them.
EDITED FOR 64 BITS
I extended the second solution to 64 bits in a straightforward way. First I doubled the length of each of the constants. Then I added a line at the beginning to swap adjacent double-bytes and intermix them. In the following 4 lines, which are pretty much the same as the 32-bit version, the first line swaps adjacent bytes and intermixes, the second line drops down to nibbles, the third line to double-bits, and the last line to single bits.
unsigned long long int t; /* an intermediate, temporary variable */
t = (x ^ (x >> 16)) & 0x00000000FFFF0000ull; x = x ^ t ^ (t << 16);
t = (x ^ (x >> 8)) & 0x0000FF000000FF00ull; x = x ^ t ^ (t << 8);
t = (x ^ (x >> 4)) & 0x00F000F000F000F0ull; x = x ^ t ^ (t << 4);
t = (x ^ (x >> 2)) & 0x0C0C0C0C0C0C0C0Cull; x = x ^ t ^ (t << 2);
t = (x ^ (x >> 1)) & 0x2222222222222222ull; x = x ^ t ^ (t << 1);
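Wrapped into a function, the five steps above could look like the sketch below. The name `shuffle64` is hypothetical, and the function assumes the same layout as the 32-bit version: a in the high half of x and b in the low half before shuffling.

```c
#include <stdint.h>

/* Interleave a and b so the result reads a31 b31 a30 b30 ... a0 b0. */
static uint64_t shuffle64(uint32_t a, uint32_t b)
{
    uint64_t x = ((uint64_t)a << 32) | b;  /* a in the high half, b in the low half */
    uint64_t t;                            /* an intermediate, temporary variable */
    t = (x ^ (x >> 16)) & 0x00000000FFFF0000ull; x = x ^ t ^ (t << 16);
    t = (x ^ (x >> 8))  & 0x0000FF000000FF00ull; x = x ^ t ^ (t << 8);
    t = (x ^ (x >> 4))  & 0x00F000F000F000F0ull; x = x ^ t ^ (t << 4);
    t = (x ^ (x >> 2))  & 0x0C0C0C0C0C0C0C0Cull; x = x ^ t ^ (t << 2);
    t = (x ^ (x >> 1))  & 0x2222222222222222ull; x = x ^ t ^ (t << 1);
    return x;
}
```

With the question's nibbles, `shuffle64(0x9, 0xC)` yields 0xD2, i.e. 0b11010010.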
From Stanford "Bit Twiddling Hacks" page:
https://graphics.stanford.edu/~seander/bithacks.html#InterleaveTableObvious
uint32_t x = /*...*/, y = /*...*/;
uint64_t z = 0;
for (unsigned i = 0; i < sizeof(x) * CHAR_BIT; i++) // unroll for more speed...
{
    // Widen to 64 bits before shifting, otherwise bits 16..31 are shifted out of a 32-bit value.
    z |= (uint64_t)(x & 1U << i) << i | (uint64_t)(y & 1U << i) << (i + 1);
}
Look at the page: it proposes different and faster algorithms to achieve the same result.
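One of the faster approaches on that page uses "binary magic numbers" to spread the bits without a loop. The page presents it for 16-bit inputs; the sketch below widens it to 32-bit inputs and a 64-bit result, which is my own extension, and `spread_even` and `interleave_magic` are hypothetical names.

```c
#include <stdint.h>

/* Spread the 32 bits of v into the even bit positions of a 64-bit word. */
static uint64_t spread_even(uint32_t v)
{
    uint64_t x = v;
    x = (x | (x << 16)) & 0x0000FFFF0000FFFFull;
    x = (x | (x << 8))  & 0x00FF00FF00FF00FFull;
    x = (x | (x << 4))  & 0x0F0F0F0F0F0F0F0Full;
    x = (x | (x << 2))  & 0x3333333333333333ull;
    x = (x | (x << 1))  & 0x5555555555555555ull;
    return x;
}

/* a goes to the odd bits, b to the even bits (AaBbCcDd in the question's notation). */
static uint64_t interleave_magic(uint32_t a, uint32_t b)
{
    return (spread_even(a) << 1) | spread_even(b);
}
```

Each step doubles the spacing between groups of bits, so five steps are enough for 32 input bits.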
Like so:
#include <limits.h>
typedef unsigned int half;
typedef unsigned long long full;
full mix_bits(half a,half b)
{
    full result = 0;
    for (unsigned i=0; i<sizeof(half)*CHAR_BIT; i++)
        // cast to full before shifting: the shift count goes up to 63
        result |= ((full)((a>>i)&1)<<(2*i+1))|((full)((b>>i)&1)<<(2*i+0));
    return result;
}
Here is a loop-based solution that is hopefully more readable than some of the others already here.
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
uint64_t interleave(uint32_t a, uint32_t b) {
uint64_t result = 0;
int i;
for (i = 0; i < 31; i++) {
result |= (a >> (31 - i)) & 1;
result <<= 1;
result |= (b >> (31 - i)) & 1;
result <<= 1;
}
// Skip the last left shift.
result |= (a >> (31 - i)) & 1;
result <<= 1;
result |= (b >> (31 - i)) & 1;
return result;
}
void printBits(uint64_t a) {
    int i;
    for (i = 0; i < 64; i++)
        printf("%d", (int)((a >> (63 - i)) & 1)); // uint64_t is not guaranteed to match %lu
    puts("");
}
int main(){
uint32_t a = 0x9;
uint32_t b = 0x6;
uint64_t c = interleave(a,b);
printBits(a);
printBits(b);
printBits(c);
}
I have used the two operations from the post "How do you set, clear, and toggle a single bit?": setting a bit at a particular index and checking the bit at a particular index.
The following code is implemented using these 2 operations only.
unsigned int a = 0x9; // 0b1001
unsigned int b = 0xC; // 0b1100
unsigned long long c = 0; // 64 bits, enough for two 32-bit inputs
int index; //To specify index of c
unsigned int bit;
int i;
//Set bits in c from left to right.
for(i=31;i>=0;i--)
{
    index=2*i+1; //We have to add the bit in c at this index
    //Check a
    bit=a&(1u<<i); //Checking whether the i-th bit is set in a
    if(bit)
        c|=1ULL<<index; //Setting bit in c at index
    index--;
    //Check b
    bit=b&(1u<<i); //Checking whether the i-th bit is set in b
    if(bit)
        c|=1ULL<<index; //Setting bit in c at index
}
printf("%llu",c);
Output: 210 which is 0b11010010

How can I swap every 2 bits in a binary number?

I'm working on this programming project and part of it is to write a function, using just bitwise operators, that switches every two bits. I've come up with a comb sort of algorithm that accomplishes this, but it only works for unsigned numbers. Any ideas how I can get it to work with signed numbers as well? I'm completely stumped on this one. Here's what I have so far:
// Mask 1 - For odd bits
int a1 = 0xAA; a1 <<= 24;
int a2 = 0xAA; a2 <<= 16;
int a3 = 0xAA; a3 <<= 8;
int a4 = 0xAA;
int mask1 = a1 | a2 | a3 | a4;
// Mask 2 - For even bits
int b1 = 0x55; b1 <<= 24;
int b2 = 0x55; b2 <<= 16;
int b3 = 0x55; b3 <<= 8;
int b4 = 0x55;
int mask2 = b1 | b2 | b3 | b4;
// Mask Results
int odd = x & mask1;
int even = x & mask2;
int newNum = (odd >> 1) | (even << 1);
return newNum;
The manual creation of the masks by or'ing variables together is because the only constants that can be used are between 0x00-0xFF.
The problem is that odd >> 1 will sign-extend negative numbers. Simply apply another AND to eliminate the duplicated bit.
int newNum = ((odd >> 1) & mask2) | (even << 1);
Minimizing the operators and noticing the sign extension problem gives:
int odd = 0x55;
odd |= odd << 8;
odd |= odd << 16;
int newnum = ((x & odd) << 1 ) // This is (sort of well defined)
| ((x >> 1) & odd); // this handles the sign extension without
// additional & -operations
One remark though: bit twiddling should be generally applied to unsigned integers only.
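When you right-shift a negative signed number, most compilers replicate the sign bit into the vacated positions (an arithmetic shift); strictly speaking, the result is implementation-defined in C. This is known as sign extension. Typically when you are dealing with bit shifting, you want to use unsigned numbers.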
Minimizing use of constants by working one byte at a time:
unsigned char* byte_p;
unsigned char byte;
int ii;
byte_p = (unsigned char *)&x; // view x as raw bytes
for(ii=0; ii<(int)sizeof x; ii++) {
    byte = *byte_p;
    *byte_p = ((byte & 0xAA)>>1) | ((byte & 0x55) << 1);
    byte_p++;
}
Minimizing operations and keeping constants between 0x00 and 0xFF:
unsigned int comb = (0xAA << 8) + 0xAA;
comb += comb<<16;
newNum = ((x & comb) >> 1) | ((x & (comb >> 1)) << 1);
10 operations.
Just saw the comments above and realized this implements (more or less) some of the suggestions that @akisuihkonen made. So consider this a tip of the hat!
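The advice to work on unsigned values can be sketched as a complete function. This version deliberately ignores the assignment's restriction on constants (it uses full 32-bit masks) and returns the unsigned bit pattern; `swap_adjacent_bits` is a hypothetical name.

```c
#include <stdint.h>

/* Swap each even-numbered bit with its odd-numbered neighbour.
   Working on the unsigned bit pattern avoids sign extension entirely. */
static uint32_t swap_adjacent_bits(int32_t x)
{
    uint32_t u = (uint32_t)x;                 /* well-defined conversion */
    return ((u & 0xAAAAAAAAu) >> 1)           /* odd bits move down  */
         | ((u & 0x55555555u) << 1);          /* even bits move up   */
}
```

Because `u` is unsigned, the right shift always fills with zeros, so negative inputs need no extra masking.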

C converting an int to a bitshift operator

I can only use these symbols:
! ~ & ^ | + << >>
Here is the table I need to achieve:
input | output
--------------
0 | 0
1 | 8
2 | 16
3 | 24
With the output I am going to left shift a 32 bit int over.
Ex.
int main()
{
int myInt = 0xFFFFFFFF;
myInt = (myInt << (myFunction(2)));
//OUTPUT = 0xFFFF0000
}
int myFunction(int input)
{
// Do some magic conversions here
}
Any ideas?
Well, if you want a function with f(0) = 0, f(1) = 8, f(3) = 24 and so on then you'll have to implement f(x) = x * 8. Since 8 is a perfect power of two the multiplication can be replaced by shifting. Thus:
int myFunction(int input)
{
return input << 3;
}
That's all.

Reverse bit pattern in C

I am converting a number to binary and have to use putchar to output each number.
The problem is that I am getting the order in reverse.
Is there any way to reverse a number's bit pattern before doing my own stuff to it?
As in int n has a specific bit pattern - how can I reverse this bit pattern?
There are many ways to do this, some very fast. I had to look it up.
Reverse bits in a byte
b = ((b * 0x0802LU & 0x22110LU) | (b * 0x8020LU & 0x88440LU)) * 0x10101LU >> 16;
Reverse an N-bit quantity in parallel in 5 * lg(N) operations:
unsigned int v; // 32-bit word to reverse bit order
// swap odd and even bits
v = ((v >> 1) & 0x55555555) | ((v & 0x55555555) << 1);
// swap consecutive pairs
v = ((v >> 2) & 0x33333333) | ((v & 0x33333333) << 2);
// swap nibbles ...
v = ((v >> 4) & 0x0F0F0F0F) | ((v & 0x0F0F0F0F) << 4);
// swap bytes
v = ((v >> 8) & 0x00FF00FF) | ((v & 0x00FF00FF) << 8);
// swap 2-byte long pairs
v = ( v >> 16 ) | ( v << 16);
Reverse bits in word by lookup table
static const unsigned char BitReverseTable256[256] =
{
# define R2(n) n, n + 2*64, n + 1*64, n + 3*64
# define R4(n) R2(n), R2(n + 2*16), R2(n + 1*16), R2(n + 3*16)
# define R6(n) R4(n), R4(n + 2*4 ), R4(n + 1*4 ), R4(n + 3*4 )
R6(0), R6(2), R6(1), R6(3)
};
unsigned int v; // reverse 32-bit value, 8 bits at time
unsigned int c; // c will get v reversed
// Option 1:
c = (BitReverseTable256[v & 0xff] << 24) |
(BitReverseTable256[(v >> 8) & 0xff] << 16) |
(BitReverseTable256[(v >> 16) & 0xff] << 8) |
(BitReverseTable256[(v >> 24) & 0xff]);
// Option 2:
unsigned char * p = (unsigned char *) &v;
unsigned char * q = (unsigned char *) &c;
q[3] = BitReverseTable256[p[0]];
q[2] = BitReverseTable256[p[1]];
q[1] = BitReverseTable256[p[2]];
q[0] = BitReverseTable256[p[3]];
Please look at http://graphics.stanford.edu/~seander/bithacks.html#ReverseParallel for more information and references.
Pop bits off your input and push them onto your output. Multiplying and dividing by 2 are the push and pop operations. In pseudo-code:
reverse_bits(x) {
total = 0
repeat n times {
total = total * 2
total += x % 2 // modulo operation
x = x / 2
}
return total
}
See modulo operation on Wikipedia if you haven't seen this operator.
Further points:
What would happen if you changed 2 to 4? Or to 10?
How does this affect the value of n? What is n?
How could you use bitwise operators (<<, >>, &) instead of divide and modulo? Would this make it faster?
Could we use a different algorithm to make it faster? Could lookup tables help?
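As one possible answer to the question about bitwise operators, the pseudo-code above translates to C roughly as follows. This is a sketch with base 2 fixed; `reverse_bits` takes the bit count n as a parameter, which is my assumption about what n stands for.

```c
#include <stdint.h>

/* Reverse the low n bits of x (1 <= n <= 32), one bit per iteration. */
static uint32_t reverse_bits(uint32_t x, unsigned n)
{
    uint32_t total = 0;
    while (n-- > 0) {
        total = (total << 1) | (x & 1u);  /* push: total * 2 + x % 2 */
        x >>= 1;                          /* pop:  x / 2 */
    }
    return total;
}
```

For example, `reverse_bits(0xB, 4)` turns 1011 into 1101 (0xD), and `reverse_bits(1, 32)` gives 0x80000000.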
Let me guess: you have a loop that prints the 0th bit (n&1), then shifts the number right. Instead, write a loop that prints the 31st bit (n&0x80000000) and shifts the number left. Before you do that loop, do another loop that shifts the number left until the 31st bit is 1; unless you do that, you'll get leading zeros.
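A sketch of that idea follows. Instead of shifting the number left, it walks a mask down from the top bit, which gives the same output order; `to_binary` and `put_binary` are hypothetical names, and the buffer variant exists only so the result can be checked without capturing output.

```c
#include <limits.h>
#include <stdio.h>

/* Fill buf with n in binary, most significant bit first, without leading
   zeros. buf needs sizeof(unsigned int) * CHAR_BIT + 1 bytes. Returns buf. */
static char *to_binary(unsigned int n, char *buf)
{
    unsigned int mask = 1u << (sizeof n * CHAR_BIT - 1);  /* top bit */
    char *p = buf;
    while (mask > 1 && !(n & mask))   /* skip leading zeros, keep the last digit */
        mask >>= 1;
    for (; mask != 0; mask >>= 1)
        *p++ = (n & mask) ? '1' : '0';
    *p = '\0';
    return buf;
}

/* Emit the digits one at a time with putchar, as the question requires. */
static void put_binary(unsigned int n)
{
    char buf[sizeof(unsigned int) * CHAR_BIT + 1];
    for (const char *p = to_binary(n, buf); *p; p++)
        putchar(*p);
}
```

Stopping the leading-zero skip at the lowest bit means an input of 0 still prints a single "0".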
Reversing is possible, too. Something like this:
unsigned int n = 12345; //Source
unsigned int m = 0; //Destination
int i;
for(i=0;i<32;i++)
{
    m <<= 1; //Shift first, so the last bit copied is not pushed out
    m |= n&1;
    n >>= 1;
}
I know: that is not exactly C, but I think that is an interesting answer:
int reverse(int i) {
int output;
__asm__(
"nextbit:"
"rcll $1, %%eax;"
"rcrl $1, %%ebx;"
"loop nextbit;"
: "=b" (output)
: "a" (i), "c" (sizeof(i)*8) );
return output;
}
The rcl opcode puts the shifted out bit in the carry flag, then rcr recovers that bit to another register in the reverse order.
My guess is that you have an integer and you're attempting to convert it to binary?
And the "answer" is ABCDEFG, but your "answer" is GFEDCBA?
If so, I'd double-check the endianness of the machine you're doing it on and of the machine the "answer" came from.
Here are functions I've used to reverse bits in a byte and reverse bytes in a quad.
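static inline unsigned char reverse(unsigned char b) {
    // Parenthesize each mask: << binds tighter than &, so b&1 << 7 would mean b & (1 << 7).
    return ((b & 0x01) << 7)
         | ((b & 0x02) << 5)
         | ((b & 0x04) << 3)
         | ((b & 0x08) << 1)
         | ((b & 0x10) >> 1)
         | ((b & 0x20) >> 3)
         | ((b & 0x40) >> 5)
         | ((b & 0x80) >> 7);
}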
inline unsigned long wreverse(unsigned long w) {
return ( ( w &0xFF) << 24)
| ( ((w>>8) &0xFF) << 16)
| ( ((w>>16)&0xFF) << 8)
| ( ((w>>24)&0xFF) );
}
