Reverse the bits in every 4-bit group of a 32-bit int - C

Reverse the bits within each 4-bit group, e.g.:
0101 1011 1100 0110 becomes
1010 1101 0011 0110
Another:
1010 1100 0101 1100 becomes
0101 0011 1010 0011
I know how to reverse all 32 bits, as below:
unsigned int reverseBits(unsigned int num)
{
unsigned int count = sizeof(num) * 8 - 1;
unsigned int reverse_num = num;
num >>= 1;
while(num)
{
reverse_num <<= 1;
reverse_num |= num & 1;
num >>= 1;
count--;
}
reverse_num <<= count;
return reverse_num;
}
But how do I solve the problem above?

You can take the algorithm for complete bit-reversal, and delete a couple of steps, leaving you with just: (not tested)
x = ((x >> 1) & 0x55555555) | ((x & 0x55555555) << 1); // swap odd/even bits
x = ((x >> 2) & 0x33333333) | ((x & 0x33333333) << 2); // swap groups of 2
Obviously that assumes unsigned ints are 32 bits.
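As a minimal sketch, the two steps above can be wrapped in a function. The name reverse_bits_in_nibbles is my own, and uint32_t from <stdint.h> is used so that the 32-bit assumption is explicit rather than implied.

```c
#include <stdint.h>

/* Reverse the bit order inside every 4-bit group of a 32-bit value.
   Hypothetical helper name; uint32_t makes the 32-bit assumption explicit. */
uint32_t reverse_bits_in_nibbles(uint32_t x)
{
    /* Swap adjacent bits: b3 b2 b1 b0 -> b2 b3 b0 b1 */
    x = ((x >> 1) & 0x55555555u) | ((x & 0x55555555u) << 1);
    /* Swap adjacent 2-bit pairs: b2 b3 b0 b1 -> b0 b1 b2 b3 */
    x = ((x >> 2) & 0x33333333u) | ((x & 0x33333333u) << 2);
    return x;
}
```

With the examples from the question, reverse_bits_in_nibbles(0x5BC6) should give 0xAD36 and reverse_bits_in_nibbles(0xAC5C) should give 0x53A3.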

1. Lookup-table to reverse nibbles. The i-th element gives the nibble-reversed version of i, where i is an unsigned byte:
static const unsigned char lut[] = {
0x00, 0x08, 0x04, 0x0C, 0x02, 0x0A, 0x06, 0x0E,
0x01, 0x09, 0x05, 0x0D, 0x03, 0x0B, 0x07, 0x0F,
0x80, 0x88, 0x84, 0x8C, 0x82, 0x8A, 0x86, 0x8E,
0x81, 0x89, 0x85, 0x8D, 0x83, 0x8B, 0x87, 0x8F,
0x40, 0x48, 0x44, 0x4C, 0x42, 0x4A, 0x46, 0x4E,
0x41, 0x49, 0x45, 0x4D, 0x43, 0x4B, 0x47, 0x4F,
0xC0, 0xC8, 0xC4, 0xCC, 0xC2, 0xCA, 0xC6, 0xCE,
0xC1, 0xC9, 0xC5, 0xCD, 0xC3, 0xCB, 0xC7, 0xCF,
0x20, 0x28, 0x24, 0x2C, 0x22, 0x2A, 0x26, 0x2E,
0x21, 0x29, 0x25, 0x2D, 0x23, 0x2B, 0x27, 0x2F,
0xA0, 0xA8, 0xA4, 0xAC, 0xA2, 0xAA, 0xA6, 0xAE,
0xA1, 0xA9, 0xA5, 0xAD, 0xA3, 0xAB, 0xA7, 0xAF,
0x60, 0x68, 0x64, 0x6C, 0x62, 0x6A, 0x66, 0x6E,
0x61, 0x69, 0x65, 0x6D, 0x63, 0x6B, 0x67, 0x6F,
0xE0, 0xE8, 0xE4, 0xEC, 0xE2, 0xEA, 0xE6, 0xEE,
0xE1, 0xE9, 0xE5, 0xED, 0xE3, 0xEB, 0xE7, 0xEF,
0x10, 0x18, 0x14, 0x1C, 0x12, 0x1A, 0x16, 0x1E,
0x11, 0x19, 0x15, 0x1D, 0x13, 0x1B, 0x17, 0x1F,
0x90, 0x98, 0x94, 0x9C, 0x92, 0x9A, 0x96, 0x9E,
0x91, 0x99, 0x95, 0x9D, 0x93, 0x9B, 0x97, 0x9F,
0x50, 0x58, 0x54, 0x5C, 0x52, 0x5A, 0x56, 0x5E,
0x51, 0x59, 0x55, 0x5D, 0x53, 0x5B, 0x57, 0x5F,
0xD0, 0xD8, 0xD4, 0xDC, 0xD2, 0xDA, 0xD6, 0xDE,
0xD1, 0xD9, 0xD5, 0xDD, 0xD3, 0xDB, 0xD7, 0xDF,
0x30, 0x38, 0x34, 0x3C, 0x32, 0x3A, 0x36, 0x3E,
0x31, 0x39, 0x35, 0x3D, 0x33, 0x3B, 0x37, 0x3F,
0xB0, 0xB8, 0xB4, 0xBC, 0xB2, 0xBA, 0xB6, 0xBE,
0xB1, 0xB9, 0xB5, 0xBD, 0xB3, 0xBB, 0xB7, 0xBF,
0x70, 0x78, 0x74, 0x7C, 0x72, 0x7A, 0x76, 0x7E,
0x71, 0x79, 0x75, 0x7D, 0x73, 0x7B, 0x77, 0x7F,
0xF0, 0xF8, 0xF4, 0xFC, 0xF2, 0xFA, 0xF6, 0xFE,
0xF1, 0xF9, 0xF5, 0xFD, 0xF3, 0xFB, 0xF7, 0xFF
};
2. Function to reverse the nibbles. It applies the lookup-table to every byte of an unsigned 4-byte integer:
unsigned reverse_nibbles(unsigned i) {
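return ((unsigned)lut[(i & 0xFF000000) >> 24] << 24) | /* cast avoids shifting a promoted int into the sign bit */
((unsigned)lut[(i & 0x00FF0000) >> 16] << 16) |
((unsigned)lut[(i & 0x0000FF00) >> 8] << 8) |
((unsigned)lut[ i & 0x000000FF ] );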
}
Test results (ideone):
0000 0000 0000 0000 0101 1011 1100 0110
0000 0000 0000 0000 1010 1101 0011 0110
0000 0000 0000 0000 1010 1100 0101 1100
0000 0000 0000 0000 0101 0011 1010 0011
1100 1010 1111 1110 1011 1010 1011 1110
0011 0101 1111 0111 1101 0101 1101 0111
The lookup table was pre-calculated this way (ideone):
#include <stdio.h>
int main() {
unsigned i, j;
for (i = 0; i < 256; ++i) {
j = ((i & 0x01) << 3) |
((i & 0x02) << 1) |
((i & 0x04) >> 1) |
((i & 0x08) >> 3) |
((i & 0x10) << 3) |
((i & 0x20) << 1) |
((i & 0x40) >> 1) |
((i & 0x80) >> 3);
printf("0x%02X, ", j);
if (((i + 1) % 8) == 0)
printf("\n");
}
return 0;
}

Use code similar to what you have for each run of 4 bits, shifting each reversed nibble into the final result.
You could have 1 version of your code extract & replace each run of 4 bits (shifting by 4 bits with each iteration), and calling a different version that takes a 4-bit value & reverses those 4 bits (by setting count to 3, I believe).
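A rough sketch of that two-function idea, assuming a 32-bit value held in a uint32_t. The names reverse4 and reverse_each_nibble are hypothetical; the inner helper is just the bit-by-bit loop restricted to 4 bits.

```c
#include <stdint.h>

/* Reverse the low 4 bits of v with a bit-by-bit loop (hypothetical helper). */
static uint32_t reverse4(uint32_t v)
{
    uint32_t r = 0;
    for (int i = 0; i < 4; ++i) {
        r = (r << 1) | (v & 1u);  /* move the lowest remaining bit of v into r */
        v >>= 1;
    }
    return r;
}

/* Extract each run of 4 bits, reverse it, and put it back in the same position. */
uint32_t reverse_each_nibble(uint32_t num)
{
    uint32_t result = 0;
    for (int shift = 0; shift < 32; shift += 4) {
        uint32_t nibble = (num >> shift) & 0xFu;
        result |= reverse4(nibble) << shift;
    }
    return result;
}
```

This is slower than a mask-and-shift or table approach, but it stays close to the loop in the question.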

My answer, which swaps the two 4-bit groups within each byte (it does not reverse the bits inside a group), is as below:
num = ((num & 0x0F0F0F0F) << 4) | ((num >> 4) & 0x0F0F0F0F);

Related

8086-Matrix Scroll ‘A’ left to right C code

I was trying to scroll 8*8 dot matrix for character 'A' left to right but I am stuck at something. The MDA-8086 trainer kit is not currently available to me. As a result, I am unable to run the code to inspect what is actually happening there.
The code is here.
#include "mde8086.h"
/* Output Font 'A' Left to Right*/
int font1[8] = { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff };
int font2[8] = { 0xc0, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff };
int font3[8] = { 0xb7, 0xc0, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff };
int font4[8] = { 0x77, 0xb7, 0xc0, 0xff, 0xff, 0xff, 0xff, 0xff };
int font5[8] = { 0x77, 0x77, 0xb7, 0xc0, 0xff, 0xff, 0xff, 0xff };
int font6[8] = { 0xb7, 0x77, 0x77, 0xb7, 0xc0, 0xff, 0xff, 0xff };
int font7[8] = { 0xc0, 0xb7, 0x77, 0x77, 0xb7, 0xc0, 0xff, 0xff };
int font8[8] = { 0xff, 0xc0, 0xb7, 0x77, 0x77, 0xb7, 0xc0, 0xff };
void wait(long del)
{
while( del-- );
}
void display( int *data1 )
{
int *data;
int common, i, k;
for( k = 0; k != 20; k++ )
{
common = 0x01;
data = data1;
for( i = 0; i != 8; i++ )
{
outportb( PPI2_C, common );
outportb( PPI2_B, *data );
wait(120);
data++;
common = common << 1;
}
}
}
void main(void)
{
outportb( PPI2_CR, 0x80 );
outportb( PPI2_A, 0xff );
do
{
display(font1);
display(font2);
display(font3);
display(font4);
display(font5);
display(font6);
display(font7);
display(font8);
display(font8);
display(font8);
} while(1);
}
When displaying just 'A' without scrolling from left to right, the loop for( k = 0; k != 20; k++ ) is absent. What is the purpose of this loop here?
The output of the above code is shown in the figure below.
The 8x8 dot matrix can only ever show 8 lit LEDs, one vertical column.
(Actually it could show all identical columns at the same time, using more than one set bit inside common, but that would make for complex code and not help in the long run.)
To show one letter, you have to show the eight columns one by one, each time waiting a little (wait(120)). That makes one character, using one font, but only for a very short time. The 8 waits seem to be "in parallel" because of the speed with which that happens.
This is the inner loop.
To give a human time to actually see the character, that is done 20 times.
It is not possible to simply wait again, because that would mean only seeing the last column for a humanly perceivable time.
This is the outer loop.
The scrolling is done without loop, by the many lines with different fonts inside main().
The loop inside main() just repeats this.
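To make the two loops easier to see without the trainer kit, here is a small simulation sketch. It is an assumption-laden stand-in, not the kit's API: write_column is a hypothetical stub that replaces the two outportb calls and only records what would be written, and show_font mirrors the structure of display().

```c
#include <stddef.h>

#define MAX_LOG 1024

/* Hypothetical stand-in for the trainer kit's port writes: it only records them. */
static unsigned char select_log[MAX_LOG];
static unsigned char data_log[MAX_LOG];
static size_t log_len = 0;

static void write_column(unsigned char select, unsigned char data)
{
    if (log_len < MAX_LOG) {
        select_log[log_len] = select;
        data_log[log_len] = data;
        ++log_len;
    }
}

/* Show one 8-column font for `refreshes` full scans of the matrix. */
void show_font(const unsigned char font[8], int refreshes)
{
    for (int k = 0; k < refreshes; ++k) {      /* outer loop: keep the image visible */
        unsigned char select = 0x01;
        for (int i = 0; i < 8; ++i) {          /* inner loop: one column at a time */
            write_column(select, font[i]);
            select = (unsigned char)(select << 1);
        }
    }
}
```

Calling show_font with 20 refreshes records 160 column writes, which is the same pattern the real display() produces for one font before main() moves on to the next one.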

How to create tests in C and check the values on Arm neon

I am trying to write tests in C to check whether my understanding of the ARM NEON intrinsics is correct. I want to compare my expected output with the output of the NEON intrinsics. I can handle all the int, uint, poly and float types, but I'm unable to do the 2D vector types. The compiler reports:
error: invalid operands to binary expression ('int8x8x2_t'(aka 'struct int8x8x2_t') and 'int8x8x2_t').
Here is the test that I tried to write. I'm also unable to get the printf call to work. I understand the type has to be a struct, but how do I go about doing this?
#include<arm_neon.h>
#include<stdio.h>
int test_vtrn_s8()
{
int8x8_t a = { 0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07 };
int8x8_t b = { 0xA0, 0xA1, 0xA2, 0xA3, 0xA4, 0xA5, 0xA6, 0xA7 };
int8x8x2_t want = { {0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08},
{0xA0, 0xA1, 0xA2, 0xA3, 0xA4, 0xA5, 0xA6, 0xA7} };
int8x8x2_t result = vtrn_s8( a, b );
for(int j = 0; j<2; ++j){
for(int i = 0; i<8; ++i){
if(want[j][i] != result[j][i] )
return 1;
/*printf("result is '0x%d'; and 'expected is 0x%dX\n", result, expected);*/
}
}
return 0;
}
int main()
{
int pass = 0;
pass |= test_vtrn_s8();
return pass;
}

8-bit fletcher checksum of 16 byte data

I'm trying to implement an 8-bit Fletcher checksum function.
My data will always be 17 bytes long.
I started with code from Remake of Fletcher checksum from 32bit to 8
Here is what I ended up with:
// 8-bit Fletcher checksum
// data is always 17 byte long
uint8_t fletcher(uint8_t *data) {
uint8_t sum1 = 0x0f, sum2 = 0x0f, len = 17;
while(len) {
sum1 += *data++;
sum2 += sum1;
sum1 = (sum1 & 0x0f) + (sum1 >> 4);
sum2 = (sum2 & 0x0f) + (sum2 >> 4);
len--;
}
sum1 = (sum1 & 0x0f) + (sum1 >> 4);
sum2 = (sum2 & 0x0f) + (sum2 >> 4);
return sum2<<4 | sum1;
}
I'm wondering whether it is good. I have the impression that I could simplify it further, but I can't find where (maybe it just can't be simplified further after all ...).
My main question is whether this code looks like it will work okay. I am using it at both ends of a wireless data link, so it will "work" (return the same checksum for the same data), but it may be wrong in the Fletcher sense and not provide the expected error detection ...
Hope I'm clear enough ...
Thanks in advance !
First off, any variant of Fletcher is not a CRC. It's a checksum.
Second, there is nothing special about the number of bytes you process as long as the algorithm is correct.
Third, you do not need to and should not be doing the modulo 15 at every step. For speed (which is the whole point of a Fletcher sum as compared to a CRC), there should be an inner loop that only consists of sum2 += sum1 += *data++;. Depending on the size of the sum1 and sum2 data types, you can calculate how many iterations you can do before overflowing sum2, assuming that all input bytes are 0xff. Then an outer loop runs that inner loop that many times, followed by the two modulo 15's. The outer loop runs through all of the input data.
Fourth, the (x >> 4) + (x & 0xf) operations do not complete the intended x % 15 when x ends up as 15. There would need to be a final if (x == 15) x = 0;.
Update:
Ok, so here's the code:
#include <stdio.h>
#define MAXPART 5803 /* for 32-bit unsigned sum1, sum2 */
/* #define MAXPART 22 for 16-bit unsigned sum1, sum2 */
unsigned fletcher8(unsigned f8, unsigned char *data, size_t len)
{
unsigned long sum1, sum2;
size_t part;
sum1 = f8 & 0xf;
sum2 = (f8 >> 4) & 0xf;
while (len) {
part = len > MAXPART ? MAXPART : len;
len -= part;
do {
sum2 += sum1 += *data++;
} while (--part);
sum1 %= 15;
sum2 %= 15;
}
return (sum2 << 4) + sum1;
}
#define SIZE 131072
int main(void)
{
unsigned f8 = 1;
unsigned char buf[SIZE];
size_t got;
while ((got = fread(buf, 1, SIZE, stdin)) > 0)
f8 = fletcher8(f8, buf, got);
printf("0x%02x\n", f8);
return 0;
}
Note that I started the Fletcher8 value at 1 instead of 0xff. That is sufficient to ensure that strings of zeros of different lengths do not all produce the same zero result. You can set the initial value to whatever you like, so long as it's not zero.
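To see the effect of the initial value, here is a small reference sketch that reduces modulo 15 after every byte. The name fletcher8_simple is hypothetical, and this version trades speed for being easy to check by hand.

```c
#include <stddef.h>

/* Straightforward reference version: reduce modulo 15 after every byte.
   Hypothetical name; slower than the deferred version but easy to check. */
unsigned fletcher8_simple(unsigned init, const unsigned char *data, size_t len)
{
    unsigned sum1 = init & 0xf;
    unsigned sum2 = (init >> 4) & 0xf;
    for (size_t i = 0; i < len; ++i) {
        sum1 = (sum1 + data[i]) % 15;
        sum2 = (sum2 + sum1) % 15;
    }
    return (sum2 << 4) + sum1;
}
```

With an initial value of 1, one zero byte gives 0x11 and two zero bytes give 0x21, so the length of a run of zeros shows up in sum2. With an initial value of 0, every run of zeros gives 0.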
If the modulo (%) operation on the machine is very slow, then it may be faster to do the % 15 operations with a set of shifts and adds. Here is an example for a 32-bit type:
k = (k >> 16) + (k & 0xffff);
k = (k >> 8) + (k & 0xff);
k = (k >> 4) + (k & 0xf);
k = (k >> 4) + (k & 0xf);
if (k > 14)
k -= 15;
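For checking purposes, the same sequence can be wrapped in a function and compared against the % operator. The wrapper name mod15 is my own; the body is the snippet above on a uint32_t.

```c
#include <stdint.h>

/* Compute k % 15 with shifts and adds. Each fold keeps the value
   unchanged modulo 15 because 16, 256 and 65536 all leave remainder 1. */
uint32_t mod15(uint32_t k)
{
    k = (k >> 16) + (k & 0xffff);
    k = (k >> 8) + (k & 0xff);
    k = (k >> 4) + (k & 0xf);
    k = (k >> 4) + (k & 0xf);
    if (k > 14)
        k -= 15;
    return k;
}
```

The final conditional is the part that is easy to forget: an input of 30 folds down to 15, not 0, so without the subtraction the result would be outside the range 0..14.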
For the stated case where len == 17, the code can be simplified to use unsigned instead of unsigned long for sum1 and sum2, and to skip the outer loop since len <= 22. The non-% modulo operations can be shortened as well. Here's that, where I also eliminated an unnecessary decrement operation in the loop:
unsigned fletch8_17(unsigned char *data)
{
unsigned sum1 = 1;
unsigned sum2 = 0;
unsigned char *end = data + 17;
do {
sum2 += sum1 += *data++;
} while (data < end);
sum1 = (sum1 >> 8) + (sum1 & 0xff);
sum1 = (sum1 >> 4) + (sum1 & 0xf);
if (sum1 > 14) {
sum1 -= 15;
if (sum1 > 14)
sum1 -= 15;
}
sum2 = (sum2 >> 8) + (sum2 & 0xff);
sum2 = (sum2 >> 4) + (sum2 & 0xf);
if (sum2 > 14) {
sum2 -= 15;
if (sum2 > 14)
sum2 -= 15;
}
return (sum2 << 4) + sum1;
}
For comparison, you can try this 8-bit CRC and see how it compares for speed (crc should be initialized to zero):
#include <stddef.h>
/* 8-bit CRC with polynomial x^8+x^6+x^3+x^2+1, 0x14D.
Chosen based on Koopman, et al. (0xA6 in his notation = 0x14D >> 1):
http://www.ece.cmu.edu/~koopman/roses/dsn04/koopman04_crc_poly_embedded.pdf
*/
static unsigned char crc8_table[] = {
0x00, 0x3e, 0x7c, 0x42, 0xf8, 0xc6, 0x84, 0xba, 0x95, 0xab, 0xe9, 0xd7,
0x6d, 0x53, 0x11, 0x2f, 0x4f, 0x71, 0x33, 0x0d, 0xb7, 0x89, 0xcb, 0xf5,
0xda, 0xe4, 0xa6, 0x98, 0x22, 0x1c, 0x5e, 0x60, 0x9e, 0xa0, 0xe2, 0xdc,
0x66, 0x58, 0x1a, 0x24, 0x0b, 0x35, 0x77, 0x49, 0xf3, 0xcd, 0x8f, 0xb1,
0xd1, 0xef, 0xad, 0x93, 0x29, 0x17, 0x55, 0x6b, 0x44, 0x7a, 0x38, 0x06,
0xbc, 0x82, 0xc0, 0xfe, 0x59, 0x67, 0x25, 0x1b, 0xa1, 0x9f, 0xdd, 0xe3,
0xcc, 0xf2, 0xb0, 0x8e, 0x34, 0x0a, 0x48, 0x76, 0x16, 0x28, 0x6a, 0x54,
0xee, 0xd0, 0x92, 0xac, 0x83, 0xbd, 0xff, 0xc1, 0x7b, 0x45, 0x07, 0x39,
0xc7, 0xf9, 0xbb, 0x85, 0x3f, 0x01, 0x43, 0x7d, 0x52, 0x6c, 0x2e, 0x10,
0xaa, 0x94, 0xd6, 0xe8, 0x88, 0xb6, 0xf4, 0xca, 0x70, 0x4e, 0x0c, 0x32,
0x1d, 0x23, 0x61, 0x5f, 0xe5, 0xdb, 0x99, 0xa7, 0xb2, 0x8c, 0xce, 0xf0,
0x4a, 0x74, 0x36, 0x08, 0x27, 0x19, 0x5b, 0x65, 0xdf, 0xe1, 0xa3, 0x9d,
0xfd, 0xc3, 0x81, 0xbf, 0x05, 0x3b, 0x79, 0x47, 0x68, 0x56, 0x14, 0x2a,
0x90, 0xae, 0xec, 0xd2, 0x2c, 0x12, 0x50, 0x6e, 0xd4, 0xea, 0xa8, 0x96,
0xb9, 0x87, 0xc5, 0xfb, 0x41, 0x7f, 0x3d, 0x03, 0x63, 0x5d, 0x1f, 0x21,
0x9b, 0xa5, 0xe7, 0xd9, 0xf6, 0xc8, 0x8a, 0xb4, 0x0e, 0x30, 0x72, 0x4c,
0xeb, 0xd5, 0x97, 0xa9, 0x13, 0x2d, 0x6f, 0x51, 0x7e, 0x40, 0x02, 0x3c,
0x86, 0xb8, 0xfa, 0xc4, 0xa4, 0x9a, 0xd8, 0xe6, 0x5c, 0x62, 0x20, 0x1e,
0x31, 0x0f, 0x4d, 0x73, 0xc9, 0xf7, 0xb5, 0x8b, 0x75, 0x4b, 0x09, 0x37,
0x8d, 0xb3, 0xf1, 0xcf, 0xe0, 0xde, 0x9c, 0xa2, 0x18, 0x26, 0x64, 0x5a,
0x3a, 0x04, 0x46, 0x78, 0xc2, 0xfc, 0xbe, 0x80, 0xaf, 0x91, 0xd3, 0xed,
0x57, 0x69, 0x2b, 0x15};
unsigned crc8(unsigned crc, unsigned char *data, size_t len)
{
unsigned char *end;
if (len == 0)
return crc;
crc ^= 0xff;
end = data + len;
do {
crc = crc8_table[crc ^ *data++];
} while (data < end);
return crc ^ 0xff;
}
An 8-bit CRC will definitely give better error-detection performance than the 8-bit Fletcher checksum. It may even be faster in this case!
Your code looks a lot like the one given in the Optimizations section of the Wikipedia article on the Fletcher checksum. I guess your source took it from there without proper attribution.
Deferred reduction
Your code looks mostly right. However, you sum too much. If your input is only 17 bytes of data, then your maximal value of sum2 will be 17*(17+1)/2*0xf + 0xf = 0x906, which fits into an uint16_t. To reduce that to a single nibble, two reduction steps are sufficient. For sum1, the maximum will be 17*0xf + 0xf = 0x10e which requires two reductions as well. So you could write
uint8_t fletcher(uint8_t *data) {
uint16_t sum1 = 0xf, sum2 = 0xf, len = 17;
while(len) {
sum1 += *data++;
sum2 += sum1;
len--;
};
sum1 = (sum1 & 0x0f) + (sum1 >> 4);
sum1 = (sum1 & 0x0f) + (sum1 >> 4);
sum2 = (sum2 & 0x0f) + (sum2 >> 4);
sum2 = (sum2 & 0x0f) + (sum2 >> 4);
return sum2<<4 | sum1;
}
You could do further “optimizations” on the code, like
do { sum2 += ( sum1 += *data++ ); } while (--len);
or even manually unrolling that loop, but a good optimizing compiler should take care of this for you.
You can adapt the above considerations to other input lengths.
Whole bytes as nibble stream
The above answer assumes that your data only contains unpacked nibbles, i.e. at most the least significant half of each byte is used. I might be wrong here, but I guess that if you were dealing with whole bytes, the solution most in the spirit of Fletcher's checksum would be to treat them as a sequence of twice as many nibbles, e.g.
sum1 += *data & 0xf;
sum2 += sum1;
sum1 += *data >> 4;
sum2 += sum1;
++data;
--len;
There might be more efficient ways to write this using fewer operations. Your compiler may or may not be able to find one of them.
Quality of the checksum
but may be wrong in the fletcher way and not provide the expected error detection ...
I'm not sure how useful a Fletcher checksum really is here. Perhaps a real CRC with 8-bit output would better suit your needs in terms of error checking.

C reverse bits in unsigned integer

I'm converting an unsigned integer to binary using bitwise operators, and currently do integer & 1 to check if bit is 1 or 0 and output, then right shift by 1 to divide by 2. However the bits are returned in the wrong order (reverse), so I thought to reverse the bits order in the integer before beginning.
Is there a simple way to do this?
Example:
So if I'm given the unsigned int 10 = 1010
while (x not eq 0)
if (x & 1)
output a '1'
else
output a '0'
right shift x by 1
this returns 0101 which is incorrect... so I was thinking to reverse the order of the bits originally before running the loop, but I'm unsure how to do this?
Reversing the bits in a word is annoying and it's easier just to output them in reverse order. E.g.,
void write_u32(uint32_t x)
{
int i;
for (i = 0; i < 32; ++i)
putchar((x & ((uint32_t) 1 << (31 - i))) ? '1' : '0');
}
Here's the typical solution to reversing the bit order:
uint32_t reverse(uint32_t x)
{
x = ((x >> 1) & 0x55555555u) | ((x & 0x55555555u) << 1);
x = ((x >> 2) & 0x33333333u) | ((x & 0x33333333u) << 2);
x = ((x >> 4) & 0x0f0f0f0fu) | ((x & 0x0f0f0f0fu) << 4);
x = ((x >> 8) & 0x00ff00ffu) | ((x & 0x00ff00ffu) << 8);
x = ((x >> 16) & 0xffffu) | ((x & 0xffffu) << 16);
return x;
}
You could move from left to right instead, that is, shift a one from the MSB to the LSB, for example:
unsigned n = 20543;
unsigned x = 1u<<31;
while (x) {
printf("%u ", (x&n)!=0);
x = x>>1;
}
You could just loop through the bits from big end to little end.
#define N_BITS (sizeof(unsigned) * CHAR_BIT)
#define HI_BIT (1u << (N_BITS - 1))
for (int i = 0; i < N_BITS; i++) {
printf("%d", !!(x & HI_BIT));
x <<= 1;
}
Where !! can also be written !=0 or >> (N_BITS - 1).
You could reverse the bits the same way you output them, storing them in another integer instead (new_int must start at 0), and then output that one:
for (i = 0; i < (sizeof(unsigned int) * CHAR_BIT); i++)
{
new_int = new_int << 1; /* make room first, so the last bit is not shifted out */
new_int |= (original_int & 1);
original_int = original_int >> 1;
}
Or you could just do the opposite, shift your mask :
unsigned int mask = 1u << ((sizeof(unsigned int) * CHAR_BIT) - 1);
while (mask > 0)
{
bit = original_int & mask;
mask = mask >> 1;
printf("%d", (bit > 0));
}
If you want to remove leading 0's you can either wait for a 1 to get printed, or do a preliminary go-through :
unsigned int mask = 1u << ((sizeof(unsigned int) * CHAR_BIT) - 1);
while ((mask > 0) && ((original_int & mask) == 0))
mask = mask >> 1;
do
{
bit = original_int & mask;
mask = mask >> 1;
printf("%d", (bit > 0));
} while (mask > 0);
this way you will place the mask on the first 1 to be printed and forget about the leading 0's
But remember : printing the binary value of an integer can be done just with printf
unsigned int rev_bits(unsigned int input)
{
unsigned int output = 0;
unsigned int n = sizeof(input) << 3;
unsigned int i = 0;
for (i = 0; i < n; i++)
if ((input >> i) & 0x1)
output |= (1u << (n - 1 - i));
return output;
}
You can reverse an unsigned 32-bit integer and return using the following reverse function :
unsigned int reverse(unsigned int A) {
unsigned int B = 0;
for(int i=0;i<32;i++){
unsigned int j = pow(2,31-i);
if((A & (1u<<i)) == (1u<<i)) B += j;
}
return B;
}
Remember to include the math library. Happy coding :)
I believe the question is asking how to not output in reverse order.
Fun answer (recursion):
#include <stdio.h>
void print_bits_r(unsigned int x){
if(x==0){
printf("0");
return;
}
unsigned int n=x>>1;
if(n!=0){
print_bits_r(n);
}
if(x&1){
printf("1");
}else{
printf("0");
}
}
void print_bits(unsigned int x){
printf("%u=",x);
print_bits_r(x);
printf("\n");
}
int main(void) {
print_bits(10u);//1010
print_bits((1<<5)+(1<<4)+1);//110001
print_bits(498598u);//1111001101110100110
return 0;
}
Expected output:
10=1010
49=110001
498598=1111001101110100110
Sequential version (picks off the high-bits first):
#include <limits.h>//Defines CHAR_BIT
//....
void print_bits_r(unsigned int x){
//unsigned int mask=(UINT_MAX>>1)+1u;//Also works...
unsigned int mask=1u<<(CHAR_BIT*sizeof(unsigned int)-1u);
int start=0;
while(mask!=0){
if((x&mask)!=0){
printf("1");
start=1;
}else{
if(start){
printf("0");
}
}
mask>>=1;
}
if(!start){
printf("0");
}
}
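Another way to reverse the bits in an integer is shown below.
It is efficient for small values, because the loop only runs until the leftmost 1 bit has been consumed.
As a consequence, only the significant bits are reversed; leading zeros are dropped.
Code snippet: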
int reverse ( unsigned int n )
{
int x = 0;
int mask = 1;
while ( n > 0 )
{
x = x << 1;
if ( mask & n )
x = x | 1;
n = n >> 1;
}
return x;
}
The 2nd answer by Dietrich Epp is likely what's best on a modern processor with high speed caches. On typical microcontrollers however that is not the case and there the following is not only faster but also more versatile and more compact (in C):
// reverse a byte
uint8_t reverse_u8(uint8_t x)
{
const unsigned char * rev = "\x0\x8\x4\xC\x2\xA\x6\xE\x1\x9\x5\xD\x3\xB\x7\xF";
return rev[(x & 0xF0) >> 4] | (rev[x & 0x0F] << 4);
}
// reverse a word
uint16_t reverse_u16(uint16_t x)
{
return reverse_u8(x >> 8) | (reverse_u8(x & 0xFF) << 8);
}
// reverse a long
uint32_t reverse_u32(uint32_t x)
{
return reverse_u16(x >> 16) | ((uint32_t)reverse_u16(x & 0xFFFF) << 16);
}
The code is easily translated to Java, Go, Rust etc. Of course if you only need to print the digits, it is best to simply print in reverse order (see the answer by Dietrich Epp).
It seems foolish to reverse the bit order of an integer value and then pick off bits from the low end, when it is trivial to leave it unchanged and pick off bits from the high end.
You want to convert an integer into a text representation, in this case in base-2 (binary). Computers convert integers into text all the time, most often in base-10 or base-16.
A simple built-in solution is:
printf("%b", 123); // outputs 1111011
But that's not standard in C. (See Is there a printf converter to print in binary format?)
Numbers are written with the most-significant digit (or bit) first, so repeatedly taking the least-significant digit (or bit) is half the job. You have to collect the digits and assemble or output them in reverse order.
To display the value 123 as base-10, you would:
Divide 123 by 10, yielding 12 remainder 3.
Divide 12 by 10, yielding 1 remainder 2.
Finally, divide 1 by 10, yielding 0 remainder 1. (0 is the stopping point.)
Display the remainders (3, 2, 1) in reverse order, to display "123".
We could put any number of zeros before the 123, but that is not proper, because they do not contribute anything. Bigger numbers need longer character strings ("123", "123000", "123000000"). With this approach, you don't know how many digits are needed until you compute the most-significant digit, so you can't output the first digit until you have computed all of them.
Alternatively, you can compute the most-significant digit first. It looks a little more complex. Especially in bases other than base-2. Again starting with 123:
Divide 123 by 1000, yielding 0 remainder 123.
Divide 123 by 100, yielding 1 remainder 23.
Divide 23 by 10, yielding 2 remainder 3.
Finally, divide 3 by 1, yielding 3 remainder 0. (0 is the stopping point.)
Display the quotients (0, 1, 2, 3) in the same order, skipping the leading zeros, to display "123".
You could output the digits in order as they are computed. You have to start with a large-enough divisor. For uint16 it's 10000; for uint32 it's 1000000000.
To display the value 123 as base-2, using the first method:
Divide 123 by 2, yielding 61 remainder 1.
Divide 61 by 2, yielding 30 remainder 1.
Divide 30 by 2, yielding 15 remainder 0.
Divide 15 by 2, yielding 7 remainder 1.
Divide 7 by 2, yielding 3 remainder 1.
Divide 3 by 2, yielding 1 remainder 1.
Finally, divide 1 by 2, yielding 0 remainder 1. (0 is the stopping point.)
Display the remainders (1,1,0,1,1,1,1) in reverse order, to display "1111011".
(Dividing by 2 can be accomplished by right-shifting by 1 bit.)
The second method yields the digits (bits) in order.
Divide 123 by 256, yielding 0 remainder 123.
Divide 123 by 128, yielding 0 remainder 123.
Divide 123 by 64, yielding 1 remainder 59.
Divide 59 by 32, yielding 1 remainder 27.
Divide 27 by 16, yielding 1 remainder 11.
Divide 11 by 8, yielding 1 remainder 3.
Divide 3 by 4, yielding 0 remainder 3.
Divide 3 by 2, yielding 1 remainder 1.
Finally, divide 1 by 1, yielding 1 remainder 0. (0 is the stopping point.)
Display the quotients (0,0,1,1,1,1,0,1,1) in the same order, skipping any leading zeros, to display "1111011".
(These divisions can be accomplished using comparisons. The comparison values can be generated by dividing by 2, which means right-shifting by 1 bit.)
Any of these solutions might need a hack to prevent the value 0 from displaying as nothing (a.k.a. "", or the empty string) instead of "0".
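As a sketch of the second method (most-significant digit first) in base 10 for a 32-bit value, including the special case for 0. The function name u32_to_decimal and the choice to write into a caller-supplied buffer are my own assumptions.

```c
#include <stdint.h>
#include <stddef.h>

/* Write the decimal digits of value into out, most-significant digit first.
   out must have room for at least 11 bytes (10 digits plus the terminator).
   Returns the number of digits written. */
size_t u32_to_decimal(uint32_t value, char *out)
{
    uint32_t divisor = 1000000000u;  /* largest power of 10 that fits in 32 bits */
    size_t n = 0;
    int started = 0;                 /* becomes 1 once a non-zero digit is seen */

    while (divisor != 0) {
        uint32_t digit = value / divisor;
        value %= divisor;
        /* Skip leading zeros, but always emit the final digit so 0 prints as "0". */
        if (digit != 0 || started || divisor == 1) {
            out[n++] = (char)('0' + digit);
            started = 1;
        }
        divisor /= 10;
    }
    out[n] = '\0';
    return n;
}
```

For base 2 the same structure applies, with the divisor starting at the highest power of 2 and being halved (right-shifted) each step.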
I came up with a solution which doesn't involve any bitwise operators. It is inefficient in terms of both space and time.
int arr[32];
for(int i=0;i<32;i++)
{
arr[i]=A%2;
A=A/2;
}
double res=1;
double re=0;
for(int i=0;i<32;i++)
{
int j=31-i;
res=arr[i];
while(j>0)
{
res=res*2;
j--;
}
re=re+res;
}
cout<<(unsigned int )re;
Here's a Go version of reversing the bits in an integer, if anyone is looking for one. I wrote this with an approach similar to reversing a string in C: going over bits 0 to 15 (31/2), swap bit i with bit (31-i). Please check the following code.
package main
import "fmt"
func main() {
var num = 2
//swap bits at index i and 31-i for i between 0-15
for i := 0; i < 32/2; i++ {
swap(&num, uint(i))
}
fmt.Printf("num is %d", num)
}
//check if bit at index is set
func isSet(num *int, index uint) int {
return *num & (1 << index)
}
//set bit at index
func set(num *int, index uint) {
*num = *num | (1 << index)
}
//reset bit at index
func reSet(num *int, index uint) {
*num = *num & ^(1 << index)
}
//swap bits on index and 31-index
func swap(num *int, index uint) {
//check index and 31-index bits
a := isSet(num, index)
b := isSet(num, uint(31)-index)
if a != 0 {
//bit at index is 1, set 31-index
set(num, uint(31)-index)
} else {
//bit at index is 0, reset 31-index
reSet(num, uint(31)-index)
}
if b != 0 {
set(num, index)
} else {
reSet(num, index)
}
}
Here's my bit shift version which I think is very concise. Does not work with leading zeros though. The main idea is as follows
Input is in variable a, final answer in b
Keep extracting the right most bit from a using (a&1)
OR that with b and left shift b to make place for the next bit
Right shift a to go to the next bit
#include <stdio.h>
void main()
{
int a = 23;
int b = 0;
while(a!=0)
{
b = (b<<1)|(a&1);
a = a>>1;
}
printf("reversed bits gives %d\n", b);
}
The following makes use of a table that stores the reversed value of each byte, table[byte] == reversed_byte, and reverses the 4 bytes of the unsigned integer. It needs only four table lookups, so it is likely faster than the loop-based answers.
#include <stdint.h>
uint32_t reverse_bits(uint32_t n) {
static const uint8_t table[256] =
{
0x00, 0x80, 0x40, 0xC0, 0x20, 0xA0, 0x60, 0xE0, 0x10, 0x90, 0x50, 0xD0, 0x30, 0xB0, 0x70, 0xF0,
0x08, 0x88, 0x48, 0xC8, 0x28, 0xA8, 0x68, 0xE8, 0x18, 0x98, 0x58, 0xD8, 0x38, 0xB8, 0x78, 0xF8,
0x04, 0x84, 0x44, 0xC4, 0x24, 0xA4, 0x64, 0xE4, 0x14, 0x94, 0x54, 0xD4, 0x34, 0xB4, 0x74, 0xF4,
0x0C, 0x8C, 0x4C, 0xCC, 0x2C, 0xAC, 0x6C, 0xEC, 0x1C, 0x9C, 0x5C, 0xDC, 0x3C, 0xBC, 0x7C, 0xFC,
0x02, 0x82, 0x42, 0xC2, 0x22, 0xA2, 0x62, 0xE2, 0x12, 0x92, 0x52, 0xD2, 0x32, 0xB2, 0x72, 0xF2,
0x0A, 0x8A, 0x4A, 0xCA, 0x2A, 0xAA, 0x6A, 0xEA, 0x1A, 0x9A, 0x5A, 0xDA, 0x3A, 0xBA, 0x7A, 0xFA,
0x06, 0x86, 0x46, 0xC6, 0x26, 0xA6, 0x66, 0xE6, 0x16, 0x96, 0x56, 0xD6, 0x36, 0xB6, 0x76, 0xF6,
0x0E, 0x8E, 0x4E, 0xCE, 0x2E, 0xAE, 0x6E, 0xEE, 0x1E, 0x9E, 0x5E, 0xDE, 0x3E, 0xBE, 0x7E, 0xFE,
0x01, 0x81, 0x41, 0xC1, 0x21, 0xA1, 0x61, 0xE1, 0x11, 0x91, 0x51, 0xD1, 0x31, 0xB1, 0x71, 0xF1,
0x09, 0x89, 0x49, 0xC9, 0x29, 0xA9, 0x69, 0xE9, 0x19, 0x99, 0x59, 0xD9, 0x39, 0xB9, 0x79, 0xF9,
0x05, 0x85, 0x45, 0xC5, 0x25, 0xA5, 0x65, 0xE5, 0x15, 0x95, 0x55, 0xD5, 0x35, 0xB5, 0x75, 0xF5,
0x0D, 0x8D, 0x4D, 0xCD, 0x2D, 0xAD, 0x6D, 0xED, 0x1D, 0x9D, 0x5D, 0xDD, 0x3D, 0xBD, 0x7D, 0xFD,
0x03, 0x83, 0x43, 0xC3, 0x23, 0xA3, 0x63, 0xE3, 0x13, 0x93, 0x53, 0xD3, 0x33, 0xB3, 0x73, 0xF3,
0x0B, 0x8B, 0x4B, 0xCB, 0x2B, 0xAB, 0x6B, 0xEB, 0x1B, 0x9B, 0x5B, 0xDB, 0x3B, 0xBB, 0x7B, 0xFB,
0x07, 0x87, 0x47, 0xC7, 0x27, 0xA7, 0x67, 0xE7, 0x17, 0x97, 0x57, 0xD7, 0x37, 0xB7, 0x77, 0xF7,
0x0F, 0x8F, 0x4F, 0xCF, 0x2F, 0xAF, 0x6F, 0xEF, 0x1F, 0x9F, 0x5F, 0xDF, 0x3F, 0xBF, 0x7F, 0xFF
};
// 1 2 3 4 -> byte 4 becomes 1, 3 becomes 2, 2 becomes 3 and 1 becomes 4.
return ((uint32_t)table[n & 0xff] << 24) | ((uint32_t)table[(n >> 8) & 0xff] << 16) |
((uint32_t)table[(n >> 16) & 0xff] << 8) | ((uint32_t)table[(n >> 24) & 0xff]);
}

in-place bit-reversed shuffle on an array

For an FFT function I need to permute or shuffle the elements within an array in a bit-reversed way. That's a common task with FFTs because most power-of-two sized FFT functions either expect or return their data in a bit-reversed way.
E.g. assume that the array has 256 elements. I'd like to swap each element with the element at its bit-reversed index. Here are two examples (in binary):
Element 00000001b should be swapped with element 10000000b
Element 00010111b should be swapped with element 11101000b
and so on.
Any idea how to do this fast and more important: in-place?
I already have a function that does this swap. It's not hard to write one. Since this is such a common operation in DSP, I have the feeling that there are more clever ways to do it than my very naive loop.
Language in question is C, but any language is fine.
To swap in place with a single pass, iterate once through all elements in increasing index. Perform a swap only if the index is less-than the reversed index -- this will skip the double swap problem and also palindrome cases (elements 00000000b, 10000001b, 10100101b) which inverse to the same value and no swap is required.
// Let data[256] be your element array
for (i=0; i<256; i++)
{
j = bit_reverse(i);
if (i < j)
{
swap(data[i],data[j]);
}
}
bit_reverse() can use Nathaneil's bit-operations trick.
The bit_reverse() will be called 256 times but the swap() will be called less than 128 times.
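A self-contained sketch of that single-pass idea, generalized to any power-of-two length given as a bit count. The helper names and the use of int elements are assumptions for illustration; bit_reverse here is a plain loop rather than the bit-operations trick.

```c
#include <stddef.h>

/* Reverse the low `bits` bits of i (simple loop; hypothetical helper). */
static size_t bit_reverse(size_t i, unsigned bits)
{
    size_t r = 0;
    for (unsigned b = 0; b < bits; ++b) {
        r = (r << 1) | (i & 1u);
        i >>= 1;
    }
    return r;
}

/* Permute data[0 .. 2^bits - 1] into bit-reversed order, in place. */
void bit_reverse_shuffle(int *data, unsigned bits)
{
    size_t n = (size_t)1 << bits;
    for (size_t i = 0; i < n; ++i) {
        size_t j = bit_reverse(i, bits);
        if (i < j) {              /* swap each pair once; skip palindromes */
            int tmp = data[i];
            data[i] = data[j];
            data[j] = tmp;
        }
    }
}
```

For 8 elements initialized to 0..7, calling bit_reverse_shuffle(a, 3) should leave 0, 4, 2, 6, 1, 5, 3, 7. Applying the shuffle twice restores the original order.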
A quick way to do this is to swap every adjacent single bit, then 2-bit fields, etc.
The fast way to do this is:
x = (x & 0x55) << 1 | (x & 0xAA) >> 1; // swaps adjacent bits
x = (x & 0x33) << 2 | (x & 0xCC) >> 2; // swaps 2-bit fields
x = (x & 0x0F) << 4 | (x & 0xF0) >> 4; // swaps the two nibbles
While hard to read, if this is something that needs to be optimized you may want to do it this way.
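Wrapped in a function for an 8-bit index, this could look like the sketch below. The name reverse8 is hypothetical; the casts are only there because the operands are promoted to int before the shifts.

```c
#include <stdint.h>

/* Reverse the bits of one byte with three mask-and-shift swaps. */
uint8_t reverse8(uint8_t x)
{
    x = (uint8_t)((x & 0x55) << 1 | (x & 0xAA) >> 1);  /* swap adjacent bits */
    x = (uint8_t)((x & 0x33) << 2 | (x & 0xCC) >> 2);  /* swap 2-bit fields */
    x = (uint8_t)((x & 0x0F) << 4 | (x & 0xF0) >> 4);  /* swap the two nibbles */
    return x;
}
```

With the examples from the question, reverse8(0x01) should give 0x80 and reverse8(0x17) should give 0xE8.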
This code uses a lookup table to reverse 64-bit numbers very quickly. For your C-language example, I also included versions for 32-, 16-, and 8-bit numbers (assumes int is 32 bits). In an object-oriented language (C++, C#, etc), I would have just overloaded the function.
I don't have a C-compiler handy at the moment so, hopefully, I didn't miss anything.
unsigned char ReverseBits[] =
{
0x00, 0x80, 0x40, 0xC0, 0x20, 0xA0, 0x60, 0xE0, 0x10, 0x90, 0x50, 0xD0, 0x30, 0xB0, 0x70, 0xF0,
0x08, 0x88, 0x48, 0xC8, 0x28, 0xA8, 0x68, 0xE8, 0x18, 0x98, 0x58, 0xD8, 0x38, 0xB8, 0x78, 0xF8,
0x04, 0x84, 0x44, 0xC4, 0x24, 0xA4, 0x64, 0xE4, 0x14, 0x94, 0x54, 0xD4, 0x34, 0xB4, 0x74, 0xF4,
0x0C, 0x8C, 0x4C, 0xCC, 0x2C, 0xAC, 0x6C, 0xEC, 0x1C, 0x9C, 0x5C, 0xDC, 0x3C, 0xBC, 0x7C, 0xFC,
0x02, 0x82, 0x42, 0xC2, 0x22, 0xA2, 0x62, 0xE2, 0x12, 0x92, 0x52, 0xD2, 0x32, 0xB2, 0x72, 0xF2,
0x0A, 0x8A, 0x4A, 0xCA, 0x2A, 0xAA, 0x6A, 0xEA, 0x1A, 0x9A, 0x5A, 0xDA, 0x3A, 0xBA, 0x7A, 0xFA,
0x06, 0x86, 0x46, 0xC6, 0x26, 0xA6, 0x66, 0xE6, 0x16, 0x96, 0x56, 0xD6, 0x36, 0xB6, 0x76, 0xF6,
0x0E, 0x8E, 0x4E, 0xCE, 0x2E, 0xAE, 0x6E, 0xEE, 0x1E, 0x9E, 0x5E, 0xDE, 0x3E, 0xBE, 0x7E, 0xFE,
0x01, 0x81, 0x41, 0xC1, 0x21, 0xA1, 0x61, 0xE1, 0x11, 0x91, 0x51, 0xD1, 0x31, 0xB1, 0x71, 0xF1,
0x09, 0x89, 0x49, 0xC9, 0x29, 0xA9, 0x69, 0xE9, 0x19, 0x99, 0x59, 0xD9, 0x39, 0xB9, 0x79, 0xF9,
0x05, 0x85, 0x45, 0xC5, 0x25, 0xA5, 0x65, 0xE5, 0x15, 0x95, 0x55, 0xD5, 0x35, 0xB5, 0x75, 0xF5,
0x0D, 0x8D, 0x4D, 0xCD, 0x2D, 0xAD, 0x6D, 0xED, 0x1D, 0x9D, 0x5D, 0xDD, 0x3D, 0xBD, 0x7D, 0xFD,
0x03, 0x83, 0x43, 0xC3, 0x23, 0xA3, 0x63, 0xE3, 0x13, 0x93, 0x53, 0xD3, 0x33, 0xB3, 0x73, 0xF3,
0x0B, 0x8B, 0x4B, 0xCB, 0x2B, 0xAB, 0x6B, 0xEB, 0x1B, 0x9B, 0x5B, 0xDB, 0x3B, 0xBB, 0x7B, 0xFB,
0x07, 0x87, 0x47, 0xC7, 0x27, 0xA7, 0x67, 0xE7, 0x17, 0x97, 0x57, 0xD7, 0x37, 0xB7, 0x77, 0xF7,
0x0F, 0x8F, 0x4F, 0xCF, 0x2F, 0xAF, 0x6F, 0xEF, 0x1F, 0x9F, 0x5F, 0xDF, 0x3F, 0xBF, 0x7F, 0xFF
};
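/* unsigned long may be only 32 bits, so use unsigned long long. The casts keep */
/* each shift from being done on a promoted int, which would be undefined here. */
unsigned long long Reverse64Bits(unsigned long long number)
{
unsigned long long result;
result =
((unsigned long long)ReverseBits[ number & 0xff] << 56) |
((unsigned long long)ReverseBits[(number >> 8) & 0xff] << 48) |
((unsigned long long)ReverseBits[(number >> 16) & 0xff] << 40) |
((unsigned long long)ReverseBits[(number >> 24) & 0xff] << 32) |
((unsigned long long)ReverseBits[(number >> 32) & 0xff] << 24) |
((unsigned long long)ReverseBits[(number >> 40) & 0xff] << 16) |
((unsigned long long)ReverseBits[(number >> 48) & 0xff] << 8) |
((unsigned long long)ReverseBits[(number >> 56) & 0xff]);
return result;
}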
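unsigned int Reverse32Bits(unsigned int number)
{
unsigned int result;
/* Cast before shifting: a table value >= 0x80 shifted left by 24 would overflow a signed int. */
result =
((unsigned int)ReverseBits[ number & 0xff] << 24) |
((unsigned int)ReverseBits[(number >> 8) & 0xff] << 16) |
((unsigned int)ReverseBits[(number >> 16) & 0xff] << 8) |
((unsigned int)ReverseBits[(number >> 24) & 0xff]);
return result;
}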
unsigned short Reverse16Bits(unsigned short number)
{
unsigned short result;
result =
(ReverseBits[ number & 0xff] << 8) |
(ReverseBits[(number >> 8) & 0xff]);
return result;
}
unsigned char Reverse8Bits(unsigned char number)
{
unsigned char result;
result = (ReverseBits[number]);
return result;
}
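Since the table above was typed by hand, one way to cross-check it is to generate the same 256 entries at run time. The following is only a sketch under that idea; build_reverse_table and reverse32_with_table are hypothetical names, and uint32_t is used so the 32-bit width does not depend on the size of int.

```c
#include <stdint.h>

/* Fill table[i] with the bit-reversed value of i, for i = 0..255. */
void build_reverse_table(uint8_t table[256])
{
    for (unsigned i = 0; i < 256; i++) {
        uint8_t r = 0;
        for (unsigned bit = 0; bit < 8; bit++) {
            if (i & (1u << bit))
                r |= (uint8_t)(0x80u >> bit); /* bit k moves to position 7 - k */
        }
        table[i] = r;
    }
}

/* Reverse a 32-bit value one byte at a time: the lowest byte becomes the highest. */
uint32_t reverse32_with_table(const uint8_t table[256], uint32_t v)
{
    return ((uint32_t)table[v & 0xFFu] << 24) |
           ((uint32_t)table[(v >> 8) & 0xFFu] << 16) |
           ((uint32_t)table[(v >> 16) & 0xFFu] << 8) |
           (uint32_t)table[(v >> 24) & 0xFFu];
}
```

Comparing each generated entry with the corresponding literal in ReverseBits would catch a typo in the hand-written table.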
If you think about what's happening to the bit-swapped index, it is being counted up in the same way as the non-bit-swapped index, just with the bits used in the reverse order from conventional counting.
Rather than bit-swapping the index every time through the loop, you can manually implement a '++' equivalent that uses the bits in reverse order, and run a for loop with two indices. I've verified that gcc at -O3 inlines the increment function, but whether it is any faster than bit-swapping the number via a lookup every time is for the profiler to say.
Here's an illustrative test program.
#include <stdio.h>
void RevBitIncr( int *n, int bit )
{
do
{
bit >>= 1;
*n ^= bit;
} while( (*n & bit) == 0 && bit != 1 );
}
int main(void)
{
int max = 0x100;
int i, j;
for( i = 0, j = 0; i != max; ++i, RevBitIncr( &j, max ) )
{
if( i < j )
printf( "%02x <-> %02x\n", i, j );
}
return 0;
}
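A variation on the same idea, written as a sketch with unsigned arithmetic and a return value instead of a pointer argument. The name rev_increment and the top parameter (the highest bit of the index range, so 0x80 for the 8-bit case above) are assumptions of mine, not part of the answer.

```c
/* Advance a bit-reversed counter: add 1 starting at the top bit, so the
   carry travels towards the low bits instead of away from them. */
unsigned rev_increment(unsigned j, unsigned top)
{
    unsigned bit = top;
    /* Clear set bits from the top down while the carry keeps propagating. */
    while (bit != 0 && (j & bit)) {
        j ^= bit;
        bit >>= 1;
    }
    /* Set the first clear bit; if every bit was set, bit is 0 and j wraps to 0. */
    return j | bit;
}
```

With top = 0x4 (3-bit indices), starting from 0, successive calls give 4, 2, 6, 1, 5, 3, 7 and then wrap back to 0.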
Using a pre-built lookup table to do the mapping seems to be the obvious solution, though it depends on how big the arrays you will be dealing with are. Even if a direct mapping is not possible, I'd still go for a lookup table, maybe of byte-sized patterns that you can use to build the word-sized pattern for the final index.
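One possible reading of "byte-sized patterns" is sketched below: a 256-entry byte table is combined into a 16-bit reversal, which is then shifted down to the index width actually in use. The names byte_rev, init_byte_rev and reverse_index are hypothetical, and the sketch assumes index widths of 1 to 16 bits.

```c
#include <stdint.h>

static uint8_t byte_rev[256];

/* Fill the byte table once, before the first call to reverse_index. */
void init_byte_rev(void)
{
    for (unsigned i = 0; i < 256; i++) {
        unsigned x = i;
        x = ((x & 0x55u) << 1) | ((x & 0xAAu) >> 1); /* swap adjacent bits */
        x = ((x & 0x33u) << 2) | ((x & 0xCCu) >> 2); /* swap 2-bit fields */
        x = ((x & 0x0Fu) << 4) | ((x & 0xF0u) >> 4); /* swap nibbles */
        byte_rev[i] = (uint8_t)x;
    }
}

/* Reverse the low `bits` bits of index, for 1 <= bits <= 16. */
unsigned reverse_index(unsigned index, unsigned bits)
{
    /* Reverse all 16 bits: each byte is reversed and the two bytes trade places. */
    unsigned r16 = ((unsigned)byte_rev[index & 0xFFu] << 8) |
                   (unsigned)byte_rev[(index >> 8) & 0xFFu];
    /* Drop the unused low bits so the result is `bits` wide. */
    return r16 >> (16u - bits);
}
```

For a 1024-element array (10-bit indices), reverse_index(1, 10) gives 0x200, so element 1 pairs with element 512.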
The following approach computes the next bit-reversed index from the previous one like in Charles Bailey's answer, but in a more optimized way. Note that incrementing a number simply flips a sequence of least-significant bits, for example from 0111 to 1000. So in order to compute the next bit-reversed index, you have to flip a sequence of most-significant bits. If your target platform has a CTZ ("count trailing zeros") instruction, this can be done efficiently.
Example using GCC's __builtin_ctz:
void brswap(double *a, unsigned n) {
for (unsigned i = 0, j = 0; i < n; i++) {
if (i < j) {
double tmp = a[i];
a[i] = a[j];
a[j] = tmp;
}
// Length of the mask.
unsigned len = __builtin_ctz(i + 1) + 1;
// XOR with mask.
j ^= n - (n >> len);
}
}
Without a CTZ instruction, you can also use integer division:
void brswap(double *a, unsigned n) {
for (unsigned i = 0, j = 0; i < n; i++) {
if (i < j) {
double tmp = a[i];
a[i] = a[j];
a[j] = tmp;
}
// Compute a mask of LSBs.
unsigned mask = i ^ (i + 1);
// Using division to bit-reverse a single bit.
unsigned rev = n / (mask + 1);
// XOR with mask.
j ^= n - rev;
}
}
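To see what the XOR mask in the division-based version looks like, it may help to pull the mask computation out on its own. This is just a sketch; reversed_step_mask is a hypothetical helper, and n is assumed to be a power of two.

```c
/* XOR mask that turns the bit-reversed form of i into the bit-reversed
   form of i + 1, for a power-of-two array length n. */
unsigned reversed_step_mask(unsigned i, unsigned n)
{
    unsigned low = i ^ (i + 1); /* run of low bits that the increment flips */
    return n - n / (low + 1);   /* run of the same length at the top of the index */
}
```

For n = 8 the masks for i = 0..6 are 4, 6, 4, 7, 4, 6, 4, so j walks through 0, 4, 2, 6, 1, 5, 3, 7.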
"Element 00000001b should be swapped with element 10000000b"
I think you mean "Element 00000001b should be swapped with element 11111110b" in the first line?
Instead of swapping 256 bytes you could cast the array to (long long*) and swap 32 "long long" values instead; that should be much faster on 64-bit machines (or use 64 long values on a 32-bit machine).
Secondly, if you naively run through the array and swap every value with its complement, then you will swap all elements twice, so you will have done nothing at all :-)
So you first have to identify which are the complements and leave them out of your loop.
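A minimal sketch of visiting each pair only once, assuming (as this answer reads the question) that the partner of index i is its 8-bit complement. The function name swap_with_complement is hypothetical.

```c
/* Swap each element with the one at the complemented 8-bit index,
   touching every pair exactly once. */
void swap_with_complement(unsigned char a[256])
{
    for (unsigned i = 0; i < 256; i++) {
        unsigned partner = ~i & 0xFFu; /* 8-bit complement of the index */
        if (i < partner) {             /* skip the second visit of each pair */
            unsigned char tmp = a[i];
            a[i] = a[partner];
            a[partner] = tmp;
        }
    }
}
```

Without the i < partner test every pair would be swapped twice and the array would end up unchanged, which is the pitfall described above.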

Resources