I have a byte I'm using for bitflags. I know that one and only one bit in the byte is set at any given time.
Ex: unsigned char b = 0x20; //(00100000) 6th bit from the right is set
I currently use the following loop to determine which bit is set:
int getSetBitLocation(unsigned char b) {
int i=0;
while( !((b >> i++) & 0x01) ) { ; }
return i;
}
How do I most efficiently determine the position of the set bit? Can I do this without iteration?
Can I do this without iteration?
It is indeed possible.
How do I most efficiently determine the position of the set bit?
You can try this algorithm. It splits the char in half to search for the top bit, shifting to the low half each time:
int getTopSetBit(unsigned char b) {
int res = 0;
if(b>15){
b = b >> 4;
res = res + 4;
}
if(b>3){
b = b >> 2;
res = res + 2;
}
//thanks @JasonD
return res + (b>>1);
}
It uses two comparisons (three for uint16s, four for uint32s...), and it might be faster than your loop. It is definitely not shorter.
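As a rough illustration of the "four for uint32s" remark, the same halving idea can be stretched to 32 bits. This is only a sketch: the name getTopSetBit32 and the use of uint32_t are assumptions made for the example, not part of the answer above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 32-bit variant of getTopSetBit: halve the search range
   four times, then read the last remaining bit directly. */
int getTopSetBit32(uint32_t w)
{
    int res = 0;
    assert(w != 0);                 /* the position is undefined for zero */
    if (w > 0xFFFFu) { w >>= 16; res += 16; }
    if (w > 0xFFu)   { w >>= 8;  res += 8;  }
    if (w > 0xFu)    { w >>= 4;  res += 4;  }
    if (w > 0x3u)    { w >>= 2;  res += 2;  }
    return res + (int)(w >> 1);     /* w is now 1, 2 or 3 */
}
```

Each extra doubling of the width costs one more comparison, which is where the log2 growth comes from.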
Based on the idea by Anton Kovalenko (hashed lookup) and the comment by 6502 (division is slow), I also suggest this implementation (8-bit => 3-bit hash using a de-Bruijn sequence)
int lookup[] = {7, 0, 5, 1, 6, 4, 3, 2};
int getBitPosition(unsigned char b) {
// return lookup[(b | (b>>1) | (b>>2) | (b>>4)) & 0x7];
return lookup[((b * 0x1D) >> 4) & 0x7];
}
or (larger LUT, but uses just three terms instead of four)
int lookup[] = {0xFF, 0, 1, 4, 2, 0xFF, 5, 0xFF, 7, 3, 0xFF, 0xFF, 6, 0xFF, 0xFF, 0xFF};
int getBitPosition(unsigned char b) {
return lookup[(b | (b>>3) | (b>>4)) & 0xF];
}
A lookup table is simple enough, and you can reduce its size if the set of values is sparse. Let's try with 11 elements instead of 128:
unsigned char expt2mod11_bits[11]={0xFF,0,1,0xFF,2,4,0xFF,7,3,6,5};
unsigned char pos = expt2mod11_bits[b%11];
assert(pos < 8);
assert(1<<pos == b);
Of course, it's not necessarily more effective, especially for 8 bits, but the same trick can be used for larger sizes, where full lookup table would be awfully big. Let's see:
unsigned int w;
....
unsigned char expt2mod19_bits[19]={0xFF,0,1,13,2,0xFF,14,6,3,8,0xFF,12,15,5,7,11,4,10,9};
unsigned char pos = expt2mod19_bits[w%19];
assert(pos < 16);
assert(1<<pos == w);
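The tables above work because every power of two below the chosen width leaves a different remainder modulo 11 (or 19). A small sketch of how such a table could be generated and checked follows; the function name build_pow2_mod_table and its return convention are assumptions made for this example.

```c
#include <assert.h>

/* Fill table[0..modulus-1] so that table[(1 << k) % modulus] == k for
   every k in [0, nbits). Unused slots are set to 0xFF.
   Returns 1 on success, or 0 if two powers of two share a remainder,
   in which case the modulus cannot be used for this trick. */
int build_pow2_mod_table(unsigned char *table, unsigned modulus, unsigned nbits)
{
    assert(modulus > 0 && nbits <= 32);
    for (unsigned i = 0; i < modulus; i++)
        table[i] = 0xFF;
    for (unsigned k = 0; k < nbits; k++) {
        unsigned long slot = (1UL << k) % modulus;
        if (table[slot] != 0xFF)
            return 0;               /* collision: remainder already taken */
        table[slot] = (unsigned char)k;
    }
    return 1;
}
```

Called with (11, 8) or (19, 16) it should reproduce the two arrays shown above, while a modulus such as 7 is rejected for 8 bits because 8 % 7 equals 1 % 7.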
This is quite a common problem for chess programs that use 64 bits to represent positions (i.e. one 64-bit number to store where all the white pawns are, another for where all the black ones are, and so on).
With this representation there is sometimes the need to find the index 0...63 of the first or last set bit and there are several possible approaches:
Just doing a loop like you did
Using a dichotomic search (i.e. if x & 0x00000000ffffffffULL is zero there's no need to check low 32 bits)
Using special instruction if available on the processor (e.g. bsf and bsr on x86)
Using lookup tables (of course not for the whole 64-bit value, but for 8 or 16 bits)
What is faster however really depends on your hardware and on real use cases.
For 8 bits only and a modern processor I think that probably a lookup table with 256 entries is the best choice...
But are you really sure this is the bottleneck of your algorithm?
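For the "special instruction" option, compilers usually expose the instruction through a builtin rather than requiring inline assembly. The sketch below assumes GCC or Clang, whose __builtin_ctzll counts trailing zero bits; the function name lowest_set_bit64 and the loop fallback are illustrative assumptions only.

```c
#include <assert.h>
#include <stdint.h>

/* Index (0..63) of the lowest set bit of a non-zero 64-bit value. */
int lowest_set_bit64(uint64_t x)
{
    assert(x != 0);                 /* the builtin is undefined for zero */
#if defined(__GNUC__) || defined(__clang__)
    return __builtin_ctzll(x);      /* GCC/Clang builtin, typically one instruction */
#else
    int pos = 0;                    /* portable fallback: plain loop */
    while ((x & 1u) == 0) { x >>= 1; pos++; }
    return pos;
#endif
}
```

With exactly one bit set, the lowest set bit is the only set bit, so the same call answers the original question for any width up to 64.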
unsigned getSetBitLocation(unsigned char b) {
unsigned pos=0;
pos = (b & 0xf0) ? 4 : 0; b |= b >>4;
pos += (b & 0xc) ? 2 : 0; b |= b >>2;
pos += (b & 0x2) ? 1 : 0;
return pos;
}
It would be hard to do it jump-free. Maybe with de Bruijn sequences?
Based on log2 calculation in Find the log base 2 of an N-bit integer in O(lg(N)) operations:
int getSetBitLocation(unsigned char c) {
// c is in {1, 2, 4, 8, 16, 32, 64, 128}, returned values are {0, 1, ..., 7}
return (((c & 0xAA) != 0) |
(((c & 0xCC) != 0) << 1) |
(((c & 0xF0) != 0) << 2));
}
Easiest thing is to create a lookup table. The simplest one will be sparse (having 256 elements) but it would technically avoid iteration.
This comment here technically avoids iteration, but who are we kidding, it is still doing the same number of checks: How to write log base(2) in c/c++
The closed form would be log2(b), or log2(b) + 1 for a 1-based position, but I'm not sure how efficient that is. Possibly the CPU has an instruction for taking base-2 logarithms?
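The sparse 256-element table mentioned above could look roughly like this. It is a sketch that assumes CHAR_BIT == 8, C99 designated initialisers, and 0-based positions; the names bit_pos_table and getSetBitLocationLUT are made up for the example.

```c
#include <assert.h>

/* Sparse 256-entry table: only the eight power-of-two slots are meaningful.
   Designated initialisers leave every other slot at 0. */
static const unsigned char bit_pos_table[256] = {
    [0x01] = 0, [0x02] = 1, [0x04] = 2, [0x08] = 3,
    [0x10] = 4, [0x20] = 5, [0x40] = 6, [0x80] = 7
};

int getSetBitLocationLUT(unsigned char b)
{
    assert(b != 0 && (b & (b - 1)) == 0);   /* exactly one bit must be set */
    return bit_pos_table[b];
}
```

Because unused slots read as 0, an invalid input is indistinguishable from 0x01 once assertions are disabled; storing position + 1 would be one way around that.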
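If you define
const unsigned char bytes[]={1,2,4,8,16,32,64,128};
and use
struct byte{
unsigned char data;
int pos;
};
/* take a pointer so that the caller's struct is actually updated */
void assign(struct byte *b,int i){
b->data=bytes[i];
b->pos=i;
}
you don't need to determine the position of the set bit, because it is stored alongside the flag.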
A lookup table is fast and easy when CHAR_BIT == 8, but on some systems, CHAR_BIT == 16 or 32 and a lookup table becomes insanely bulky. If you're considering a lookup table, I'd suggest wrapping it; make it a "lookup table function", instead, so that you can swap the logic when you need to optimise.
Using divide and conquer, by performing a binary search on a sorted array, involves comparisons based on log2 CHAR_BIT. That code is more complex, involving an initialisation of an array of unsigned char to use as a lookup table for a start. Once you have such an array initialised, you can use bsearch to search it, for example:
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
void uchar_bit_init(unsigned char *table) {
for (size_t x = 0; x < CHAR_BIT; x++) {
table[x] = 1U << x;
}
}
int uchar_compare(void const *x, void const *y) {
unsigned char const *X = x, *Y = y;
return (*X > *Y) - (*X < *Y);
}
size_t uchar_bit_lookup(unsigned char *table, unsigned char value) {
/* bsearch takes the key first, then the array, its element count and element size */
unsigned char *position = bsearch(&value, table, CHAR_BIT, 1, uchar_compare);
return position ? position - table + 1 : 0;
}
int main(void) {
unsigned char lookup[CHAR_BIT];
uchar_bit_init(lookup);
for (;;) {
int c = getchar();
if (c == EOF) { break; }
printf("Bit for %c found at %zu\n", c, uchar_bit_lookup(lookup, c));
}
}
P.S. This sounds like micro-optimisation. Get your solution done (abstracting the operations required into these functions), then worry about optimisations based on your profiling. Make sure your profiling targets the system that your solution will run on if you're going to focus on micro-optimisations, because the efficiency of micro-optimisations differs widely as hardware differs even slightly... It's usually a better idea to buy a faster PC ;)
I am trying to create a CRC-15 check in c and the output is never correct for each line of the file. I am trying to output the CRC for each line cumulatively next to each line. I use: #define POLYNOMIAL 0xA053 for the divisor and text for the dividend. I need to represent numbers as 32-bit unsigned integers. I have tried printing out the hex values to keep track and flipping different shifts around. However, I just can't seem to figure it out! I have a feeling it has something to do with the way I am padding things. Is there a flaw to my logic?
The CRC is to be represented as four hexadecimal digits preceded by four leading 0's. For example, it will look like 0000xxxx where the x's are the hexadecimal digits. The polynomial I use is 0xA053.
I thought about using a temp variable and do 4 16 bit chunks of code per line every XOR, however, I'm not quite sure how I could use shifts to accomplish this so I settled for a checksum of the letters on the line and then XORing that to try to calculate the CRC code.
I am testing my code using the following input and padding with . until the string is of length 504 because that is what the pad character needs to be via the requirements given:
"This is the lesson: never give in, never give in, never, never, never, never - in nothing, great or small, large or petty - never give in except to convictions of honor and good sense. Never yield to force; never yield to the apparently overwhelming might of the enemy."
The CRC of the first 64 char line ("This is the lesson: never give in, never give in, never, never,") is supposed to be 000015fa and I am getting bfe6ec00.
My logic:
In CRCCalculation I add each character to a 32-bit unsigned integer and after 64 (the length of one line) I send it into the XOR function.
If the top bit is not 1, I shift the number to the left by one, causing 0s to pad the right, and loop around again.
If the top bit is 1, I XOR the dividend with the divisor and then shift the dividend to the left by one.
After all calculations are done, I return the dividend shifted to the left by four (to add four zeros to the front) to the calculation function.
Add result to the running total of the result
Code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>
#include <ctype.h>
#define POLYNOMIAL 0xA053
void crcCalculation(char *text, int length)
{
int i;
uint32_t dividend = atoi(text);
uint32_t result;
uint32_t sumText = 0;
// Calculate CRC
printf("\nCRC 15 calculation progress:\n");
i = length;
// padding
if(i < 504)
{
for(; i!=504; i++)
{
// printf("i is %d\n", i);
text[i] = '.';
}
}
// Try calculating by first line of crc by summing the values then calcuating, then add in the next line
for (i = 0; i < 504; i++)
{
if(i%64 == 0 && i != 0)
{
result = XOR(POLYNOMIAL, sumText);
printf(" - %x\n",result);
}
sumText +=(uint32_t)text[i];
printf("%c", text[i]);
}
printf("\n\nCRC15 result : %x\n", result);
}
uint32_t XOR(uint32_t divisor, uint32_t dividend)
{
uint32_t divRemainder = dividend;
uint32_t currentBit;
// Note: 4 16 bit chunks
for(currentBit = 32; currentBit > 0; --currentBit)
{
// if topbit is 1
if(divRemainder & 0x80)
{
//divRemainder = (divRemainder << 1) ^ divisor;
divRemainder ^= divisor;
printf("%x %x\n", divRemainder, divisor);
}
// else
// divisor = divisor >> 1;
divRemainder = (divRemainder << 1);
}
//return divRemainder; , have tried shifting to right and left, want to add 4 zeros to front so >>
//return divRemainder >> 4;
return divRemainder >> 4;
}
The first issue I see is the top bit check, it should be:
if(divRemainder & 0x8000)
The question doesn't state if the CRC is bit reflected (xor data into low order bits of CRC, right shift for cycle) or not (xor data into high order bits of CRC, left shift for cycle), so I can't offer help for the rest of the code.
The question doesn't state the initial value of CRC (0x0000 or 0x7fff), or if the CRC is post complemented.
The logic for a conventional CRC is:
xor a byte of data into the CRC (upper or lower bits)
cycle the CRC 8 times (or do a table lookup)
After generating the CRC for an entire message, the CRC can be appended to the message. If a CRC is generated for a message with the appended CRC and there are no errors, the CRC will be zero (or a constant value if the CRC is post complemented).
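To make the two steps above concrete, here is a minimal bitwise sketch for a 15-bit CRC. Everything about its configuration is an assumption, since the question does not state it: data is fed most significant bit first (not reflected), the initial value is 0, there is no final complement, and the polynomial is passed without its x^15 term (if 0xA053 is meant to include that term, the low 15 bits are 0x2053). The function name crc15_msb_first is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-15, MSB first, initial value 0, no final XOR (all assumed).
   `poly` holds the low 15 bits of the polynomial. */
uint16_t crc15_msb_first(const unsigned char *data, size_t len, uint16_t poly)
{
    uint16_t crc = 0;
    assert(data != NULL || len == 0);
    for (size_t n = 0; n < len; n++) {
        /* xor the byte into the upper 8 bits of the 15-bit register */
        crc ^= (uint16_t)((data[n] & 0xFFu) << 7);
        /* cycle the register 8 times */
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 0x4000u)                  /* top bit of the 15 */
                crc = (uint16_t)((crc << 1) ^ poly);
            else
                crc = (uint16_t)(crc << 1);
            crc &= 0x7FFFu;                     /* keep only 15 bits */
        }
    }
    return crc;
}
```

Whether this reproduces the expected 000015fa depends on the unstated details (reflection, initial value, post-complement), so treat it as a starting point for experiments rather than a fix.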
Here is a typical CRC16, extracted from: <www8.cs.umu.se/~isak/snippets/crc-16.c>
#define POLY 0x8408
/*
// This is the CCITT CRC 16 polynomial X^16 + X^12 + X^5 + 1.
// This works out to be 0x1021, but the way the algorithm works
// lets us use 0x8408 (the reverse of the bit pattern). The high
// bit is always assumed to be set, thus we only use 16 bits to
// represent the 17 bit value.
*/
unsigned short crc16(char *data_p, unsigned short length)
{
unsigned char i;
unsigned int data;
unsigned int crc = 0xffff;
if (length == 0)
return (~crc);
do
{
for (i=0, data=(unsigned int)0xff & *data_p++;
i < 8;
i++, data >>= 1)
{
if ((crc & 0x0001) ^ (data & 0x0001))
crc = (crc >> 1) ^ POLY;
else crc >>= 1;
}
} while (--length);
crc = ~crc;
data = crc;
crc = (crc << 8) | (data >> 8 & 0xff);
return (crc);
}
Since you want to calculate a CRC15 rather than a CRC16, the logic will be more complex as you cannot work with whole bytes, so there will be a lot of bit shifting and ANDing to extract the desired 15 bits.
Note: the OP did not mention if the initial value of the CRC is 0x0000 or 0x7FFF, nor if the result is to be complemented, nor certain other criteria, so this posted code can only be a guide.
Given an integer typedef:
typedef unsigned int TYPE;
or
typedef unsigned long TYPE;
I have the following code to reverse the bits of an integer:
TYPE max_bit= (TYPE)-1;
void reverse_int_setup()
{
TYPE bits= (TYPE)max_bit;
while (bits <<= 1)
max_bit= bits;
}
TYPE reverse_int(TYPE arg)
{
TYPE bit_setter= 1, bit_tester= max_bit, result= 0;
for (result= 0; bit_tester; bit_tester>>= 1, bit_setter<<= 1)
if (arg & bit_tester)
result|= bit_setter;
return result;
}
One just needs first to run reverse_int_setup(), which stores an integer with the highest bit turned on, then any call to reverse_int(arg) returns arg with its bits reversed (to be used as a key to a binary tree, taken from an increasing counter, but that's more or less irrelevant).
Is there a platform-agnostic way to obtain at compile time the value that max_bit holds after the call to reverse_int_setup()? Otherwise, is there an algorithm you consider better/leaner than the one I have for reverse_int()?
Thanks.
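One possible answer to the compile-time part, offered as a sketch: for an unsigned TYPE, shifting the all-ones value right by one clears only the top bit, so complementing the result leaves only the top bit set. The macro name MAX_BIT is an assumption made for this example; reverse_int is the question's function with the global replaced by the macro.

```c
#include <assert.h>

typedef unsigned long TYPE;

/* Constant expression with only the highest bit of TYPE set.
   Relies on TYPE being an unsigned integer type. */
#define MAX_BIT ((TYPE)~((TYPE)-1 >> 1))

TYPE reverse_int(TYPE arg)
{
    TYPE bit_setter = 1, bit_tester = MAX_BIT, result = 0;
    assert((MAX_BIT & (MAX_BIT - 1)) == 0);     /* exactly one bit set */
    for (; bit_tester; bit_tester >>= 1, bit_setter <<= 1)
        if (arg & bit_tester)
            result |= bit_setter;
    return result;
}
```

Since MAX_BIT is a constant expression, no setup call is needed and it can also be used in static initialisers.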
#include<stdio.h>
#include<limits.h>
#define TYPE_BITS (sizeof(TYPE)*CHAR_BIT)
typedef unsigned long TYPE;
TYPE reverser(TYPE n)
{
TYPE nrev = 0, i, bit1, bit2;
int count;
for(i = 0; i < TYPE_BITS; i += 2)
{
/*In each iteration, we swap one bit on the 'right half'
of the number with another on the left half*/
count = TYPE_BITS - i - 1; /*this is used to find how many positions
to the left (and right) we gotta move
the bits in this iteration*/
bit1 = n & ((TYPE)1<<(i/2)); /*Extract 'right half' bit; (TYPE)1 keeps the shift in TYPE's width*/
bit1 <<= count; /*Shift it to where it belongs*/
bit2 = n & (TYPE)1<<((i/2) + count); /*Find the 'left half' bit*/
bit2 >>= count; /*Place that bit in bit1's original position*/
nrev |= bit1; /*Now add the bits to the reversal result*/
nrev |= bit2;
}
return nrev;
}
int main()
{
TYPE n = 6;
printf("%lu", reverser(n));
return 0;
}
This time I've used the 'number of bits' idea from TK, but made it somewhat more portable by not assuming a byte contains 8 bits and instead using the CHAR_BIT macro. The code is more efficient now (with the inner for loop removed). I hope the code is also slightly less cryptic this time. :)
The need for using count is that the number of positions by which we have to shift a bit varies in each iteration - we have to move the rightmost bit by 31 positions (assuming 32 bit number), the second rightmost bit by 29 positions and so on. Hence count must decrease with each iteration as i increases.
Hope that bit of info proves helpful in understanding the code...
The following program serves to demonstrate a leaner algorithm for reversing bits, which can be easily extended to handle 64bit numbers.
#include <stdio.h>
#include <stdint.h>
int main(int argc, char**argv)
{
uint32_t x; /* unsigned, so that %x and the shifts below are well defined */
if ( argc != 2 )
{
printf("Usage: %s hexadecimal\n", argv[0]);
return 1;
}
sscanf(argv[1],"%x", &x);
/* swap every neighbouring bit */
x = (x&0xAAAAAAAA)>>1 | (x&0x55555555)<<1;
/* swap every 2 neighbouring bits */
x = (x&0xCCCCCCCC)>>2 | (x&0x33333333)<<2;
/* swap every 4 neighbouring bits */
x = (x&0xF0F0F0F0)>>4 | (x&0x0F0F0F0F)<<4;
/* swap every 8 neighbouring bits */
x = (x&0xFF00FF00)>>8 | (x&0x00FF00FF)<<8;
/* and so forth, for say, 32 bit int */
x = (x&0xFFFF0000)>>16 | (x&0x0000FFFF)<<16;
printf("0x%x\n",x);
return 0;
}
This code should not contain errors, and was tested using 0x12345678 to produce 0x1e6a2c48 which is the correct answer.
typedef unsigned long TYPE;
TYPE reverser(TYPE n)
{
TYPE k = 1, nrev = 0, i, nrevbit1, nrevbit2;
int count;
for(i = 0; !i || (1 << i && (1 << i) != 1); i+=2)
{
/*In each iteration, we swap one bit
on the 'right half' of the number with another
on the left half*/
k = 1<<i; /*this is used to find how many positions
to the left (or right, for the other bit)
we gotta move the bits in this iteration*/
count = 0;
while(k << 1 && k << 1 != 1)
{
k <<= 1;
count++;
}
nrevbit1 = n & (1<<(i/2));
nrevbit1 <<= count;
nrevbit2 = n & 1<<((i/2) + count);
nrevbit2 >>= count;
nrev |= nrevbit1;
nrev |= nrevbit2;
}
return nrev;
}
This works fine in gcc under Windows, but I'm not sure if it's completely platform independent. A few places of concern are:
the condition in the for loop - it assumes that when you left shift 1 beyond the leftmost bit, you get either a 0 with the 1 'falling out' (what I'd expect and what good old Turbo C gives iirc), or the 1 circles around and you get a 1 (what seems to be gcc's behaviour).
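the condition in the inner while loop: see above. But there's a strange thing happening here: in this case, gcc seems to let the 1 fall out and not circle around! (Shifting by the full width of the type or more is undefined behaviour in C, which would explain the inconsistency.)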
The code might prove cryptic: if you're interested and need an explanation please don't hesitate to ask - I'll put it up someplace.
#ΤΖΩΤΖΙΟΥ
In reply to ΤΖΩΤΖΙΟΥ's comments, I present a modified version of the above, which depends on an upper limit for the bit width.
#include <stdio.h>
#include <stdint.h>
typedef int32_t TYPE;
TYPE reverse(TYPE x, int bits)
{
TYPE m=~0;
switch(bits)
{
case 64:
x = (x&0xFFFFFFFF00000000&m)>>32 | (x&0x00000000FFFFFFFF&m)<<32;
case 32:
x = (x&0xFFFF0000FFFF0000&m)>>16 | (x&0x0000FFFF0000FFFF&m)<<16;
case 16:
x = (x&0xFF00FF00FF00FF00&m)>>8 | (x&0x00FF00FF00FF00FF&m)<<8;
case 8:
x = (x&0xF0F0F0F0F0F0F0F0&m)>>4 | (x&0x0F0F0F0F0F0F0F0F&m)<<4;
x = (x&0xCCCCCCCCCCCCCCCC&m)>>2 | (x&0x3333333333333333&m)<<2;
x = (x&0xAAAAAAAAAAAAAAAA&m)>>1 | (x&0x5555555555555555&m)<<1;
}
return x;
}
int main(int argc, char**argv)
{
TYPE x;
TYPE b = (TYPE)-1;
int bits;
if ( argc != 2 )
{
printf("Usage: %s hexadecimal\n", argv[0]);
return 1;
}
for(bits=1;b;b<<=1,bits++);
--bits;
printf("TYPE has %d bits\n", bits);
sscanf(argv[1],"%x", &x);
printf("0x%x\n",reverse(x, bits));
return 0;
}
Notes:
gcc will warn on the 64bit constants
the printfs will generate warnings too
If you need more than 64bit, the code should be simple enough to extend
I apologise in advance for the coding crimes I committed above - mercy good sir!
There's a nice collection of "Bit Twiddling Hacks", including a variety of simple and not-so simple bit reversing algorithms coded in C at http://graphics.stanford.edu/~seander/bithacks.html.
I personally like the "Obvious" algorithm (http://graphics.stanford.edu/~seander/bithacks.html#BitReverseObvious) because, well, it's obvious. Some of the others may require fewer instructions to execute. If I really need to optimize the heck out of something I may choose the not-so-obvious but faster versions. Otherwise, for readability, maintainability, and portability I would choose the Obvious one.
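For readers who do not want to follow the link, here is a sketch in the spirit of that "obvious" approach: shift bits out of the input one at a time and into the result. It is written from memory for uint32_t and is not a verbatim copy of the page; the function name reverse_bits_obvious is an assumption.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

uint32_t reverse_bits_obvious(uint32_t v)
{
    uint32_t r = v;                             /* lowest bit of v is already in place */
    int s = (int)(sizeof v * CHAR_BIT) - 1;     /* shift still owed at the end */
    assert(s == 31);
    for (v >>= 1; v != 0; v >>= 1) {
        r <<= 1;                                /* make room for the next bit */
        r |= v & 1u;                            /* copy the next bit of v */
        s--;
    }
    r <<= s;                                    /* account for v's leading zero bits */
    return r;
}
```

The loop stops as soon as the remaining input is zero, so small values finish early and the final shift moves the collected bits to the top.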
Here is a more generally useful variation. Its advantage is its ability to work in situations where the bit length of the value to be reversed -- the codeword -- is unknown but is guaranteed not to exceed a value we'll call maxLength. A good example of this case is Huffman code decompression.
The code below works on codewords from 1 to 24 bits in length. It has been optimized for fast execution on a Pentium D. Note that it accesses the lookup table as many as 3 times per use. I experimented with many variations that reduced that number to 2 at the expense of a larger table (4096 and 65,536 entries). This version, with the 256-byte table, was the clear winner, partly because it is so advantageous for table data to be in the caches, and perhaps also because the processor has an 8-bit table lookup/translation instruction.
const unsigned char table[] = {
0x00,0x80,0x40,0xC0,0x20,0xA0,0x60,0xE0,0x10,0x90,0x50,0xD0,0x30,0xB0,0x70,0xF0,
0x08,0x88,0x48,0xC8,0x28,0xA8,0x68,0xE8,0x18,0x98,0x58,0xD8,0x38,0xB8,0x78,0xF8,
0x04,0x84,0x44,0xC4,0x24,0xA4,0x64,0xE4,0x14,0x94,0x54,0xD4,0x34,0xB4,0x74,0xF4,
0x0C,0x8C,0x4C,0xCC,0x2C,0xAC,0x6C,0xEC,0x1C,0x9C,0x5C,0xDC,0x3C,0xBC,0x7C,0xFC,
0x02,0x82,0x42,0xC2,0x22,0xA2,0x62,0xE2,0x12,0x92,0x52,0xD2,0x32,0xB2,0x72,0xF2,
0x0A,0x8A,0x4A,0xCA,0x2A,0xAA,0x6A,0xEA,0x1A,0x9A,0x5A,0xDA,0x3A,0xBA,0x7A,0xFA,
0x06,0x86,0x46,0xC6,0x26,0xA6,0x66,0xE6,0x16,0x96,0x56,0xD6,0x36,0xB6,0x76,0xF6,
0x0E,0x8E,0x4E,0xCE,0x2E,0xAE,0x6E,0xEE,0x1E,0x9E,0x5E,0xDE,0x3E,0xBE,0x7E,0xFE,
0x01,0x81,0x41,0xC1,0x21,0xA1,0x61,0xE1,0x11,0x91,0x51,0xD1,0x31,0xB1,0x71,0xF1,
0x09,0x89,0x49,0xC9,0x29,0xA9,0x69,0xE9,0x19,0x99,0x59,0xD9,0x39,0xB9,0x79,0xF9,
0x05,0x85,0x45,0xC5,0x25,0xA5,0x65,0xE5,0x15,0x95,0x55,0xD5,0x35,0xB5,0x75,0xF5,
0x0D,0x8D,0x4D,0xCD,0x2D,0xAD,0x6D,0xED,0x1D,0x9D,0x5D,0xDD,0x3D,0xBD,0x7D,0xFD,
0x03,0x83,0x43,0xC3,0x23,0xA3,0x63,0xE3,0x13,0x93,0x53,0xD3,0x33,0xB3,0x73,0xF3,
0x0B,0x8B,0x4B,0xCB,0x2B,0xAB,0x6B,0xEB,0x1B,0x9B,0x5B,0xDB,0x3B,0xBB,0x7B,0xFB,
0x07,0x87,0x47,0xC7,0x27,0xA7,0x67,0xE7,0x17,0x97,0x57,0xD7,0x37,0xB7,0x77,0xF7,
0x0F,0x8F,0x4F,0xCF,0x2F,0xAF,0x6F,0xEF,0x1F,0x9F,0x5F,0xDF,0x3F,0xBF,0x7F,0xFF};
const unsigned short masks[17] =
{0,0,0,0,0,0,0,0,0,0X0100,0X0300,0X0700,0X0F00,0X1F00,0X3F00,0X7F00,0XFF00};
unsigned long codeword; // value to be reversed, occupying the low 1-24 bits
unsigned char maxLength; // bit length of longest possible codeword (<= 24)
unsigned char sc; // shift count in bits and index into masks array
if (maxLength <= 8)
{
codeword = table[codeword << (8 - maxLength)];
}
else
{
sc = maxLength - 8;
if (maxLength <= 16)
{
codeword = (table[codeword & 0X00FF] << sc)
| table[codeword >> sc];
}
else if (maxLength & 1) // if maxLength is 17, 19, 21, or 23
{
codeword = (table[codeword & 0X00FF] << sc)
| table[codeword >> sc] |
(table[(codeword & masks[sc]) >> (sc - 8)] << 8);
}
else // if maxlength is 18, 20, 22, or 24
{
codeword = (table[codeword & 0X00FF] << sc)
| table[codeword >> sc]
| (table[(codeword & masks[sc]) >> (sc >> 1)] << (sc >> 1));
}
}
How about:
long temp = 0;
int counter = 0;
int number_of_bits = sizeof(value) * 8; // get the number of bits that represent value (assuming that it is aligned to a byte boundary)
while(value > 0) // loop until value is empty
{
temp <<= 1; // shift whatever was in temp left to create room for the next bit
temp |= (value & 0x01); // get the lsb from value and set as lsb in temp
value >>= 1; // shift value right by one to look at next lsb
counter++;
}
value = temp;
if (counter < number_of_bits)
{
value <<= number_of_bits - counter; // shift the reversed bits up into the high end
}
(I'm assuming that you know how many bits value holds and it is stored in number_of_bits)
Obviously temp needs to be the longest imaginable data type and when you copy temp back into value, all the extraneous bits in temp should magically vanish (I think!).
Or, the 'c' way would be to say :
while(value)
your choice
We can store the results of reversing all possible 1-byte sequences in an array (256 distinct entries), then use a combination of lookups into this table and some OR-ing logic to get the reverse of the integer.
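A minimal sketch of that idea for a 32-bit value is shown below. To keep it short the table is filled at run time rather than written out; the names byte_rev, init_byte_rev and reverse32_lut are assumptions made for this example, and 8-bit bytes are assumed.

```c
#include <assert.h>
#include <stdint.h>

static unsigned char byte_rev[256];     /* byte_rev[b] = b with its 8 bits reversed */
static int byte_rev_ready = 0;

/* Fill the table once, before the first call to reverse32_lut(). */
void init_byte_rev(void)
{
    for (unsigned b = 0; b < 256; b++) {
        unsigned r = 0;
        for (unsigned bit = 0; bit < 8; bit++)
            if (b & (1u << bit))
                r |= 0x80u >> bit;      /* bit k moves to bit 7 - k */
        byte_rev[b] = (unsigned char)r;
    }
    byte_rev_ready = 1;
}

/* Reverse each byte via the table and swap the byte order with shifts and ORs. */
uint32_t reverse32_lut(uint32_t x)
{
    assert(byte_rev_ready);
    return ((uint32_t)byte_rev[x & 0xFF] << 24)
         | ((uint32_t)byte_rev[(x >> 8) & 0xFF] << 16)
         | ((uint32_t)byte_rev[(x >> 16) & 0xFF] << 8)
         |  (uint32_t)byte_rev[(x >> 24) & 0xFF];
}
```

Call init_byte_rev() once at start-up; after that each reversal costs four table lookups.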
Here is a variation and correction to TK's solution which might be clearer than the solutions by sundar. It takes single bits from t and pushes them into return_val:
typedef unsigned long TYPE;
#define TYPE_BITS sizeof(TYPE)*8
TYPE reverser(TYPE t)
{
unsigned int i;
TYPE return_val = 0;
for(i = 0; i < TYPE_BITS; i++)
{/*foreach bit in TYPE*/
/* shift the value of return_val to the left and add the rightmost bit from t */
return_val = (return_val << 1) + (t & 1);
/* shift off the rightmost bit of t */
t = t >> 1;
}
return(return_val);
}
The generic approach that would work for objects of any type and any size would be to reverse the order of the bytes of the object, and then reverse the order of the bits in each byte. In this case the bit-level algorithm is tied to a concrete number of bits (a byte), while the "variable" logic (with regard to size) is lifted to the level of whole bytes.
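A sketch of that byte-level approach might look like the following. The function names reverse_byte and reverse_object_bits are assumptions for this example, and the result is only meaningful for objects whose representation has no padding bits (typically unsigned integer types).

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Reverse the bits inside one byte, whatever CHAR_BIT is. */
static unsigned char reverse_byte(unsigned char b)
{
    unsigned char r = 0;
    for (int i = 0; i < CHAR_BIT; i++) {
        r = (unsigned char)((r << 1) | (b & 1u));
        b >>= 1;
    }
    return r;
}

/* Reverse all bits of the object at `obj`: swap bytes end to end,
   reversing the bits of each byte on the way. */
void reverse_object_bits(void *obj, size_t size)
{
    unsigned char *p = obj;
    size_t lo = 0, hi = size;
    assert(p != NULL || size == 0);
    while (lo < hi) {
        hi--;
        unsigned char tmp = reverse_byte(p[lo]);
        p[lo] = reverse_byte(p[hi]);    /* when lo == hi this reverses the middle byte once */
        p[hi] = tmp;
        lo++;
    }
}
```

Usage would be along the lines of reverse_object_bits(&value, sizeof value). Because both the byte order and the bits within each byte are flipped, the outcome does not depend on the machine's endianness.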
Here's my generalization of freespace's solution (in case we one day get 128-bit machines). It results in jump-free code when compiled with gcc -O3, and is obviously insensitive to the definition of foo_t on sane machines. Unfortunately it does depend on shift being a power of 2!
#include <limits.h>
#include <stdio.h>
typedef unsigned long foo_t;
foo_t reverse(foo_t x)
{
int shift = sizeof (x) * CHAR_BIT / 2;
foo_t mask = ((foo_t)1 << shift) - 1; /* cast so the shift is done in foo_t, not int */
int i;
for (i = 0; shift; i++) {
x = ((x & mask) << shift) | ((x & ~mask) >> shift);
shift >>= 1;
mask ^= (mask << shift);
}
return x;
}
int main() {
printf("reverse = 0x%08lx\n", reverse(0x12345678L));
}
In case bit-reversal is time critical, and mainly in conjunction with FFT, the best is to store the whole bit reversed array. In any case, this array will be smaller in size than the roots of unity that have to be precomputed in FFT Cooley-Tukey algorithm. An easy way to compute the array is:
int BitReverse[Size]; // Size is power of 2
void Init()
{
BitReverse[0] = 0;
for(int i = 0; i < Size/2; i++)
{
BitReverse[2*i] = BitReverse[i]/2;
BitReverse[2*i+1] = (BitReverse[i] + Size)/2;
}
} // end it's all