Convert Little Endian to Big Endian - c

I just want to ask if my method is correct to convert from little endian to big endian, just to make sure if I understand the difference.
I have a number which is stored in little-endian, here are the binary and hex representations of the number:
0001 0010 0011 0100 0101 0110 0111 1000
12345678
In big-endian format I believe the bytes should be swapped, like this:
1000 0111 0110 0101 0100 0011 0010 0001
87654321
Is this correct?
Also, the code below attempts to do this but fails. Is there anything obviously wrong or can I optimize something? If the code is bad for this conversion can you please explain why and show a better method of performing the same conversion?
uint32_t num = 0x12345678;
uint32_t b0,b1,b2,b3,b4,b5,b6,b7;
uint32_t res = 0;
b0 = (num & 0xf) << 28;
b1 = (num & 0xf0) << 24;
b2 = (num & 0xf00) << 20;
b3 = (num & 0xf000) << 16;
b4 = (num & 0xf0000) << 12;
b5 = (num & 0xf00000) << 8;
b6 = (num & 0xf000000) << 4;
b7 = (num & 0xf0000000) << 4;
res = b0 + b1 + b2 + b3 + b4 + b5 + b6 + b7;
printf("%d\n", res);

OP's sample code is incorrect.
Endian conversion can work at the bit level or at the 8-bit byte level, and most endian issues deal with the byte level. OP's code is doing an endian change at the 4-bit nibble level. Recommend instead:
// Swap endian (big to little) or (little to big)
uint32_t num = 9;
uint32_t b0,b1,b2,b3;
uint32_t res;
b0 = (num & 0x000000ff) << 24u;
b1 = (num & 0x0000ff00) << 8u;
b2 = (num & 0x00ff0000) >> 8u;
b3 = (num & 0xff000000) >> 24u;
res = b0 | b1 | b2 | b3;
printf("%" PRIX32 "\n", res);
If performance is truly important, the particular processor would need to be known. Otherwise, leave it to the compiler.
[Edit] OP added a comment that changes things.
"32bit numerical value represented by the hexadecimal representation (st uv wx yz) shall be recorded in a four-byte field as (st uv wx yz)."
It appears in this case, the endianness of the 32-bit number is unknown and the result needs to be stored in memory in little endian order.
uint32_t num = 9;
uint8_t b[4];
b[0] = (uint8_t) (num >> 0u);
b[1] = (uint8_t) (num >> 8u);
b[2] = (uint8_t) (num >> 16u);
b[3] = (uint8_t) (num >> 24u);
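The shift-based store above does not depend on the host's byte order, because shifts operate on the value rather than on its memory layout. A minimal sketch of wrapping it into a store/load pair follows; the names store_le32 and load_le32 are made up for this illustration and are not from the answer.

```c
#include <stdint.h>

/* Hypothetical helper: write v into out[0..3], least significant byte first. */
static void store_le32(uint8_t out[4], uint32_t v)
{
    out[0] = (uint8_t)(v >> 0);
    out[1] = (uint8_t)(v >> 8);
    out[2] = (uint8_t)(v >> 16);
    out[3] = (uint8_t)(v >> 24);
}

/* Hypothetical helper: rebuild the value from four little-endian bytes. */
static uint32_t load_le32(const uint8_t in[4])
{
    return ((uint32_t)in[0] << 0)  |
           ((uint32_t)in[1] << 8)  |
           ((uint32_t)in[2] << 16) |
           ((uint32_t)in[3] << 24);
}
```

For a big-endian field, the same idea should work with the indices reversed.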
[2016 Edit] Simplification
... The type of the result is that of the promoted left operand.... Bitwise shift operators C11 §6.5.7 3
Using a u after the shift constants (right operands) results in the same as without it.
b3 = (num & 0xff000000) >> 24u;
b[3] = (uint8_t) (num >> 24u);
// same as
b3 = (num & 0xff000000) >> 24;
b[3] = (uint8_t) (num >> 24);

Sorry, my answer is a bit late, but it seems nobody has mentioned the built-in functions for reversing byte order, which are very important in terms of performance.
Most modern processors are little-endian, while most network protocols are big-endian. That is history, and you can find more on it on Wikipedia. But it means our processors convert between little- and big-endian millions of times while we browse the Internet.
That is why most architectures have dedicated processor instructions to facilitate this task. For x86 architectures there is the BSWAP instruction, and for ARM there is REV. This is the most efficient way to reverse byte order.
To avoid assembly in our C code, we can use built-ins instead. For GCC there is the __builtin_bswap32() function and for Visual C++ there is _byteswap_ulong(). These functions will generate just one processor instruction on most architectures.
Here is an example:
#include <stdio.h>
#include <inttypes.h>
int main()
{
uint32_t le = 0x12345678;
uint32_t be = __builtin_bswap32(le);
printf("Little-endian: 0x%" PRIx32 "\n", le);
printf("Big-endian: 0x%" PRIx32 "\n", be);
return 0;
}
Here is the output it produces:
Little-endian: 0x12345678
Big-endian: 0x78563412
And here is the disassembly (without optimization, i.e. -O0):
uint32_t be = __builtin_bswap32(le);
0x0000000000400535 <+15>: mov -0x8(%rbp),%eax
0x0000000000400538 <+18>: bswap %eax
0x000000000040053a <+20>: mov %eax,-0x4(%rbp)
There is just one BSWAP instruction indeed.
So, if we do care about the performance, we should use those built-in functions instead of any other method of byte reversing. Just my 2 cents.
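If the same source has to build with more than one compiler, the two built-ins mentioned above could be hidden behind one wrapper. This is only a sketch: the name bswap32_portable is made up here, and the fallback branch is the plain mask-and-shift swap for compilers that offer neither built-in.

```c
#include <stdint.h>
#if defined(_MSC_VER)
#include <stdlib.h>   /* declares _byteswap_ulong */
#endif

/* bswap32_portable is a hypothetical name used only for this sketch. */
static inline uint32_t bswap32_portable(uint32_t v)
{
#if defined(__GNUC__) || defined(__clang__)
    return __builtin_bswap32(v);          /* GCC and Clang built-in */
#elif defined(_MSC_VER)
    return _byteswap_ulong(v);            /* Visual C++ intrinsic */
#else
    /* Fallback: plain mask-and-shift byte swap. */
    return ((v & 0x000000FFu) << 24) |
           ((v & 0x0000FF00u) << 8)  |
           ((v & 0x00FF0000u) >> 8)  |
           ((v & 0xFF000000u) >> 24);
#endif
}
```

Whether the fallback is also turned into a single instruction depends on the compiler and optimization level, so it is worth checking the disassembly as done above.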

I think you can use function htonl(). Network byte order is big endian.
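Keep in mind that htonl() converts from host order to network order, so it only swaps bytes on a little-endian host and does nothing on a big-endian one. A small sketch, assuming a POSIX system where htonl() is declared in <arpa/inet.h> (the helper name put_be32 is made up for this example):

```c
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>  /* htonl (POSIX) */

/* Hypothetical helper: serialise a host-order value as 4 big-endian bytes. */
static void put_be32(uint8_t out[4], uint32_t host_value)
{
    uint32_t net = htonl(host_value);  /* swaps only on little-endian hosts */
    memcpy(out, &net, sizeof net);     /* memory now holds the bytes MSB first */
}
```

After put_be32(buf, 0x12345678) the buffer should hold 12 34 56 78 regardless of the host's byte order.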

"I swap each bytes right?" -> yes, to convert between little and big endian, you just give the bytes the opposite order.
But first, realize a few things:
the size of uint32_t is 32 bits, which is 4 bytes, which is 8 hex digits
the mask 0xf retrieves the 4 least significant bits; to retrieve 8 bits, you need 0xff
So if you want to swap the order of 4 bytes with that kind of mask, you could:
uint32_t b0, b1, b2, b3;
uint32_t res = 0;
b0 = (num & 0xff) << 24; // least significant to most significant
b1 = (num & 0xff00) << 8; // 2nd least sig. to 2nd most sig.
b2 = (num & 0xff0000) >> 8; // 2nd most sig. to 2nd least sig.
b3 = (num & 0xff000000) >> 24; // most sig. to least sig.
res = b0 | b1 | b2 | b3;

You could do this:
unsigned int x = 0x12345678; // unsigned, so that >> does not sign-extend (assumes 32-bit int)
x = ( x >> 24 ) | (( x << 8) & 0x00ff0000 )| ((x >> 8) & 0x0000ff00) | ( x << 24) ;
printf("value = %x", x); // prints value = 78563412

One slightly different way of tackling this that can sometimes be useful is to have a union of the sixteen or thirty-two bit value and an array of chars. I've just been doing this when getting serial messages that come in with big endian order, yet am working on a little endian micro.
union MessageLengthUnion
{
uint16_t asInt;
uint8_t asChars[2];
};
Then when I get the messages in I put the first received uint8 in .asChars[1], the second in .asChars[0] then I access it as the .asInt part of the union in the rest of my program.
If you have a thirty-two bit value to store you can have the array four long.
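A rough sketch of the above follows. The placement of the bytes described in this answer is only right on a little-endian host, so the sketch adds a run-time check of the host's byte order; that check and the function name length_from_wire are my additions, not part of the original code.

```c
#include <stdint.h>

union MessageLengthUnion {
    uint16_t asInt;
    uint8_t  asChars[2];
};

/* Hypothetical helper: 'first' and 'second' are the bytes in the order they
 * arrived on the wire (big-endian, so 'first' is the high byte). */
static uint16_t length_from_wire(uint8_t first, uint8_t second)
{
    /* Probe the host byte order: the low byte of 1 sits at index 0 only on
     * a little-endian machine. */
    const union { uint16_t v; uint8_t b[2]; } probe = { 1 };
    union MessageLengthUnion len;

    if (probe.b[0] == 1) {      /* little-endian host: high byte goes last */
        len.asChars[1] = first;
        len.asChars[0] = second;
    } else {                    /* big-endian host: wire order is host order */
        len.asChars[0] = first;
        len.asChars[1] = second;
    }
    return len.asInt;
}
```

For example, receiving 0x12 and then 0x34 should give 0x1234.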

I am assuming you are on Linux.
Include <byteswap.h> and use bswap_32(), which behaves like uint32_t bswap_32(uint32_t argument);
That is the logical view. For the actual definition, see /usr/include/byteswap.h

one more suggestion :
unsigned int a = 0xABCDEF23;
a = ((a&(0x0000FFFF)) << 16) | ((a&(0xFFFF0000)) >> 16);
a = ((a&(0x00FF00FF)) << 8) | ((a&(0xFF00FF00)) >>8);
printf("%0x\n",a);

A Simple C program to convert from little to big
#include <stdio.h>
int main() {
unsigned int little=0x1234ABCD,big=0;
unsigned char tmp=0,l;
printf(" Little endian little=%x\n",little);
for(l=0;l < 4;l++)
{
tmp=0;
tmp = little | tmp;
big = tmp | (big << 8);
little = little >> 8;
}
printf(" Big endian big=%x\n",big);
return 0;
}

OP's code is incorrect for the following reasons:
The swaps are being performed on a nibble (4-bit) boundary, instead of a byte (8-bit) boundary.
The shift-left << operations of the final four swaps are incorrect, they should be shift-right >> operations and their shift values would also need to be corrected.
The use of intermediary storage is unnecessary, and the code can therefore be rewritten to be more concise/recognizable. In doing so, some compilers will be able to better-optimize the code by recognizing the oft-used pattern.
Consider the following code, which efficiently converts an unsigned value:
// Swap endian (big to little) or (little to big)
uint32_t num = 0x12345678;
uint32_t res =
((num & 0x000000FF) << 24) |
((num & 0x0000FF00) << 8) |
((num & 0x00FF0000) >> 8) |
((num & 0xFF000000) >> 24);
printf("%0x\n", res);
The result is represented here in both binary and hex. Notice how the bytes have swapped:
0111 1000 0101 0110 0011 0100 0001 0010
78563412
Optimizing
In terms of performance, leave it to the compiler to optimize your code when possible. You should avoid unnecessary data structures like arrays for simple algorithms like this; using them will usually lead to different instruction behavior, such as accessing RAM instead of using CPU registers.

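#include <stdio.h>
#include <string.h>
#include <inttypes.h>
uint32_t le_to_be(uint32_t num) {
uint8_t b[4];
uint8_t tmp;
memcpy(b, &num, sizeof b); // copy the bytes out instead of aliasing the array as uint32_t
tmp = b[0]; // swap the outer bytes
b[0] = b[3];
b[3] = tmp;
tmp = b[1]; // swap the inner bytes
b[1] = b[2];
b[2] = tmp;
memcpy(&num, b, sizeof num); // copy the reversed bytes back
return num;
}
int main()
{
printf("big endian value is %" PRIx32 "\n", le_to_be(0xabcdef98));
return 0;
}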

You can use the lib functions. They boil down to assembly, but if you are open to alternate implementations in C, here they are (assuming int is 32-bits) :
void byte_swap16(unsigned short int *pVal16) {
//#define method_one 1
// #define method_two 1
#define method_three 1
#ifdef method_one
unsigned char *pByte;
pByte = (unsigned char *) pVal16;
*pVal16 = (pByte[0] << 8) | pByte[1];
#endif
#ifdef method_two
unsigned char *pByte0;
unsigned char *pByte1;
pByte0 = (unsigned char *) pVal16;
pByte1 = pByte0 + 1;
*pByte0 = *pByte0 ^ *pByte1;
*pByte1 = *pByte0 ^ *pByte1;
*pByte0 = *pByte0 ^ *pByte1;
#endif
#ifdef method_three
unsigned char *pByte;
pByte = (unsigned char *) pVal16;
pByte[0] = pByte[0] ^ pByte[1];
pByte[1] = pByte[0] ^ pByte[1];
pByte[0] = pByte[0] ^ pByte[1];
#endif
}
void byte_swap32(unsigned int *pVal32) {
#ifdef method_one
unsigned char *pByte;
// 0x1234 5678 --> 0x7856 3412
pByte = (unsigned char *) pVal32;
*pVal32 = ( (unsigned int) pByte[0] << 24 ) | (pByte[1] << 16) | (pByte[2] << 8) | ( pByte[3] ); // cast avoids shifting into the sign bit of int
#endif
#if defined(method_two) || defined (method_three)
unsigned char *pByte;
pByte = (unsigned char *) pVal32;
// move lsb to msb
pByte[0] = pByte[0] ^ pByte[3];
pByte[3] = pByte[0] ^ pByte[3];
pByte[0] = pByte[0] ^ pByte[3];
// swap the two middle bytes
pByte[1] = pByte[1] ^ pByte[2];
pByte[2] = pByte[1] ^ pByte[2];
pByte[1] = pByte[1] ^ pByte[2];
#endif
}
And the usage is performed like so:
unsigned short int u16Val = 0x1234;
byte_swap16(&u16Val);
unsigned int u32Val = 0x12345678;
byte_swap32(&u32Val);

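Below is another approach that was useful for me. Note that it reverses the bit order within each byte as well as the byte order:
typedef unsigned char byte;
void convertLittleEndianByteArrayToBigEndianByteArray (byte littlendianByte[], byte bigEndianByte[], int ArraySize){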
int i =0;
for(i =0;i<ArraySize;i++){
bigEndianByte[i] = (littlendianByte[ArraySize-i-1] << 7 & 0x80) | (littlendianByte[ArraySize-i-1] << 5 & 0x40) |
(littlendianByte[ArraySize-i-1] << 3 & 0x20) | (littlendianByte[ArraySize-i-1] << 1 & 0x10) |
(littlendianByte[ArraySize-i-1] >>1 & 0x08) | (littlendianByte[ArraySize-i-1] >> 3 & 0x04) |
(littlendianByte[ArraySize-i-1] >>5 & 0x02) | (littlendianByte[ArraySize-i-1] >> 7 & 0x01) ;
}
}

The program below produces the result as needed:
#include <stdio.h>
unsigned int Little_To_Big_Endian(unsigned int num);
int main( )
{
int num = 0x11223344 ;
printf("\n Little_Endian = 0x%X\n",num);
printf("\n Big_Endian = 0x%X\n",Little_To_Big_Endian(num));
}
unsigned int Little_To_Big_Endian(unsigned int num)
{
return (((num >> 24) & 0x000000ff) | ((num >> 8) & 0x0000ff00) | ((num << 8) & 0x00ff0000) | ((num << 24) & 0xff000000));
}
The function below can also be used:
unsigned int Little_To_Big_Endian(unsigned int num)
{
return (((num & 0x000000ff) << 24) | ((num & 0x0000ff00) << 8 ) | ((num & 0x00ff0000) >> 8) | ((num & 0xff000000) >> 24 ));
}

#include<stdio.h>
int main(){
unsigned int var = 0X12345678; // unsigned avoids sign extension and signed overflow when shifting
var = ((0X000000FF & var)<<24)|
((0X0000FF00 & var)<<8) |
((0X00FF0000 & var)>>8) |
((0XFF000000 & var)>>24);
printf("%x",var);
}

Here is a little function I wrote that works pretty well. It's probably not portable to every single machine, nor as fast as a single CPU instruction, but it should work for most. It can handle numbers up to 32 bytes (256 bits) and works for both big- and little-endian swaps. The nicest part about this function is that you can point it into a byte array coming off or going onto the wire and swap the bytes in place before converting.
#include <stdio.h>
#include <string.h>
void byteSwap(char**,int);
int main() {
//32 bit
int test32 = 0x12345678;
printf("\n BigEndian = 0x%X\n",test32);
char* pTest32 = (char*) &test32;
//convert to little endian
byteSwap((char**)&pTest32, 4);
printf("\n LittleEndian = 0x%X\n", test32);
//64 bit
long int test64 = 0x1234567891234567LL;
printf("\n BigEndian = 0x%lx\n",test64);
char* pTest64 = (char*) &test64;
//convert to little endian
byteSwap((char**)&pTest64,8);
printf("\n LittleEndian = 0x%lx\n",test64);
//back to big endian
byteSwap((char**)&pTest64,8);
printf("\n BigEndian = 0x%lx\n",test64);
return 0;
}
void byteSwap(char** src,int size) {
int x = 0;
char b[32];
while(size-- > 0) { b[x++] = (*src)[size]; } // > 0, not >= 0: otherwise one byte before the buffer is read and one extra byte is copied back
memcpy(*src,&b,x);
}
output:
$gcc -o main *.c -lm
$main
BigEndian = 0x12345678
LittleEndian = 0x78563412
BigEndian = 0x1234567891234567
LittleEndian = 0x6745239178563412
BigEndian = 0x1234567891234567

Related

How can i swap every 2 bits in a binary number?

I'm working on this programming project and part of it is to write a function with just bitwise operators that switches every two bits. I've come up with a comb sort of algorithm that accomplishes this, but it only works for unsigned numbers. Any ideas how I can get it to work with signed numbers as well? I'm completely stumped on this one. Here's what I have so far:
// Mask 1 - For odd bits
int a1 = 0xAA; a1 <<= 24;
int a2 = 0xAA; a2 <<= 16;
int a3 = 0xAA; a3 <<= 8;
int a4 = 0xAA;
int mask1 = a1 | a2 | a3 | a4;
// Mask 2 - For even bits
int b1 = 0x55; b1 <<= 24;
int b2 = 0x55; b2 <<= 16;
int b3 = 0x55; b3 <<= 8;
int b4 = 0x55;
int mask2 = b1 | b2 | b3 | b4;
// Mask Results
int odd = x & mask1;
int even = x & mask2;
int newNum = (odd >> 1) | (even << 1);
return newNum;
The manual creation of the masks by or'ing variables together is because the only constants that can be used are between 0x00-0xFF.
The problem is that odd >> 1 will sign extend with negative numbers. Simply do another and to eliminate the duplicated bit.
int newNum = ((odd >> 1) & mask2) | (even << 1);
Minimizing the operators and noticing the sign extension problem gives:
int odd = 0x55;
odd |= odd << 8;
odd |= odd << 16;
int newnum = ((x & odd) << 1 ) // This is (sort of well defined)
| ((x >> 1) & odd); // this handles the sign extension without
// additional & -operations
One remark though: bit twiddling should be generally applied to unsigned integers only.
When you right shift a signed number, the sign will also be extended. This is known as sign extension. Typically when you are dealing with bit shifting, you want to use unsigned numbers.
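If the value can be treated as unsigned, the sign extension problem disappears entirely. A minimal sketch, assuming a 32-bit value and ignoring the assignment's restriction on constants larger than 0xFF (the function name is made up for this example):

```c
#include <stdint.h>

/* Hypothetical helper: exchange each even-numbered bit with its odd neighbour. */
static uint32_t swap_adjacent_bits(uint32_t x)
{
    const uint32_t even = 0x55555555u;   /* bits 0, 2, 4, ..., 30 */

    /* Even bits move up by one; odd bits move down by one. Because x is
     * unsigned, the right shift always brings in a zero at the top. */
    return ((x & even) << 1) | ((x >> 1) & even);
}
```

With this, 0x80000000 becomes 0x40000000 with no stray 1 bits, which is the case that goes wrong with a signed int.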
Minimizing use of constants by working one byte at a time:
unsigned char* byte_p;
unsigned char byte;
int ii;
byte_p = (unsigned char *)&x;
for(ii=0; ii<4; ii++) {
byte = *byte_p;
*byte_p = ((byte & 0xAA)>>1) | ((byte & 0x55) << 1);
byte_p++;
}
Minimizing operations and keeping constants between 0x00 and 0xFF:
unsigned int comb = (0xAA << 8) + 0xAA;
comb += comb<<16;
newNum = ((x & comb) >> 1) | ((x & (comb >> 1)) << 1);
10 operations.
Just saw the comments above and realized this is implementing (more or less) some of the suggestions that @akisuihkonen made. So consider this a tip of the hat!

Swap byte 2 and 4 in a 32 bit integer

I had this interview question -
Swap byte 2 and byte 4 within an integer sequence.
The integer is 4 bytes wide, i.e. 32 bits.
My approach was to use a char pointer and a temp char to swap the bytes.
For clarity I have broken the steps out; otherwise a character array could be used.
unsigned char *b2, *b4, tmpc;
int n = 0xABCD; ///expected output 0xADCB
b2 = &n; b2++;
b4 = &n; b4 +=3;
///swap the values;
tmpc = *b2;
*b2 = *b4;
*b4 = tmpc;
Any other methods?
int someInt = 0x12345678;
int byte2 = someInt & 0x00FF0000;
int byte4 = someInt & 0x000000FF;
int newInt = (someInt & 0xFF00FF00) | (byte2 >> 16) | (byte4 << 16);
To avoid any concerns about sign extension:
int someInt = 0x12345678;
int newInt = (someInt & 0xFF00FF00) | ((someInt >> 16) & 0x000000FF) | ((someInt << 16) & 0x00FF0000);
(Or, to really impress them, you could use the triple XOR technique.)
Just for fun (probably a tupo somewhere):
int newInt = someInt ^ ((someInt >> 16) & 0x000000FF);
newInt = newInt ^ ((newInt << 16) & 0x00FF0000);
newInt = newInt ^ ((newInt >> 16) & 0x000000FF);
(Actually, I just tested it and it works!)
You can mask out the bytes you want and shift them around. Something like this:
unsigned int swap(unsigned int n) {
unsigned int b2 = (0x0000FF00 & n);
unsigned int b4 = (0xFF000000 & n);
n ^= b2 | b4; // Clear the second and fourth bytes
n |= (b2 << 16) | (b4 >> 16); // Swap and write them.
return n;
}
This assumes that the "first" byte is the lowest order byte (even if in memory it may be stored big-endian).
Also it uses unsigned ints everywhere to avoid right shifting introducing extra 1s due to sign extension.
What about unions?
int main(void)
{
char tmp;
union {int n; char ary[4]; } un;
un.n = 0xABCDEF00;
tmp = un.ary[3];
un.ary[3] = un.ary[1];
un.ary[1] = tmp;
printf("0x%.2X\n", un.n);
}
in > 0xABCDEF00
out>0xEFCDAB00
Please don't forget to check endianness. This only works on little-endian machines, but it should not be hard to make it portable.

Is there a more efficient way of expanding a char to an uint64_t?

I want to inflate an unsigned char to an uint64_t by repeating each bit 8 times. E.g.
char -> uint64_t
0x00 -> 0x00
0x01 -> 0xFF
0x02 -> 0xFF00
0x03 -> 0xFFFF
0xAA -> 0xFF00FF00FF00FF00
I currently have the following implementation, using bit shifts to test if a bit is set, to accomplish this:
#include <stdint.h>
#include <inttypes.h>
#define BIT_SET(var, pos) ((var) & (1 << (pos)))
static uint64_t inflate(unsigned char a)
{
uint64_t MASK = 0xFF;
uint64_t result = 0;
for (int i = 0; i < 8; i++) {
if (BIT_SET(a, i))
result |= (MASK << (8 * i));
}
return result;
}
However, I'm fairly new to C, so this fiddling with individual bits makes me a little wary that there might be a better (i.e. more efficient) way of doing this.
EDIT TO ADD
Ok, so after trying out the table lookup solution, here are the results. However, keep in mind that I didn't test the routine directly, but rather as part of bigger function (a multiplication of binary matrices to be precise), so this might have affected how the results turned out. So, on my computer, when multiplying a million 8x8 matrices, and compiled with:
gcc -O2 -Wall -std=c99 foo.c
I got
./a.out original
real 0m0.127s
user 0m0.124s
sys 0m0.000s
./a.out table_lookup
real 0m0.012s
user 0m0.012s
sys 0m0.000s
So at least on my machine (a virtual machine 64 bit Linux Mint I should mention), the table lookup approach seems to provide a roughly 10-times speed-up, so I will accept that as the answer.
If you're looking for efficiency use a lookup table: a static array of 256 entries, each already holding the required result. You can use your code above to generate it.
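A sketch of what that could look like, filling the table at start-up with the same bit-by-bit loop as in the question. The names inflate_table, inflate_table_init and inflate_lookup are made up for this example, and the table could just as well be generated offline and pasted in as a constant initializer.

```c
#include <stdint.h>

static uint64_t inflate_table[256];   /* hypothetical name; 256 * 8 bytes = 2 KiB */

/* Fill the table once, using the bit-by-bit approach from the question. */
static void inflate_table_init(void)
{
    for (int v = 0; v < 256; v++) {
        uint64_t result = 0;
        for (int i = 0; i < 8; i++) {
            if (v & (1 << i))
                result |= (uint64_t)0xFF << (8 * i);
        }
        inflate_table[v] = result;
    }
}

/* After initialisation every call is a single array read. */
static uint64_t inflate_lookup(unsigned char a)
{
    return inflate_table[a];
}
```

inflate_table_init() has to run once before the first lookup.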
In selected architectures (SSE,Neon) there are fast vector operations that can speed up this task or are designed to do this. Without special instructions the suggested look up table approach is both the fastest and most portable.
If the 2k size is an issue, parallel vector arithmetic operations can be simulated:
static uint64_t inflate_parallel(unsigned char a) {
uint64_t vector = a * 0x0101010101010101ULL;
// replicate the word all over qword
// A5 becomes A5 A5 A5 A5 A5 A5 A5 A5
vector &= 0x8040201008040201; // becomes 80 00 20 00 00 04 00 01 <--
vector += 0x00406070787c7e7f; // becomes 80 40 80 70 78 80 7e 80
// MSB is correct
vector = (vector >> 7) & 0x0101010101010101ULL; // LSB is correct
return vector * 255; // all bits correct
}
EDIT: 2^31 iterations (unrolled four times to mitigate loop evaluation overhead)
time ./parallel time ./original time ./lookup
real 0m2.038s real 0m14.161s real 0m1.436s
user 0m2.030s user 0m14.120s user 0m1.430s
sys 0m0.000s sys 0m0.000s sys 0m0.000s
That's about 7x speedup, while the lookup table gives ~10x speedup
You should profile what your code does, before worrying about optimising it.
On my compiler locally, your code gets entirely inlined, unrolled and turned into 8 constant test + or instructions when the value is unknown, and turned into a constant when the value is known at compile time. I could probably marginally improve it by removing a few branches, but the compiler is doing a reasonable job on its own.
Optimising the loop is then a bit pointless. A table lookup might be more efficient, but would probably prevent the compiler from making optimisations itself.
The desired functionality can be achieved by moving each bit of the source into the lsb of the appropriate target byte (0 → 0, 1 → 8, 2 → 16, ...., 7 → 56), then expanding each lsb to cover the whole byte, which is easily done by multiplying with 0xff (255). Instead of moving bits into place individually using shifts, then combining the results, we can use an integer multiply to shift multiple bits in parallel. To prevent self-overlap, we can move only the least-significant seven source bits in this fashion, but need to move the source msb separately with a shift.
This leads to the following ISO-C99 implementation:
#include <stdint.h>
/* expand each bit in input into one byte in output */
uint64_t fast_inflate (uint8_t a)
{
const uint64_t spread7 = (1ULL << 42) | (1ULL << 35) | (1ULL << 28) | (1ULL << 21) |
(1ULL << 14) | (1ULL << 7) | (1UL << 0);
const uint64_t byte_lsb = (1ULL << 56) | (1ULL << 48) | (1ULL << 40) | (1ULL << 32) |
(1ULL << 24) | (1ULL << 16) | (1ULL << 8) | (1ULL << 0);
uint64_t r;
/* spread bits to lsbs of each byte */
r = (((uint64_t)(a & 0x7f) * spread7) + ((uint64_t)a << 49));
/* extract the lsbs of all bytes */
r = r & byte_lsb;
/* fill each byte with its lsb */
r = r * 0xff;
return r;
}
#define BIT_SET(var, pos) ((var) & (1 << (pos)))
static uint64_t inflate(unsigned char a)
{
uint64_t MASK = 0xFF;
uint64_t result = 0;
for (int i = 0; i < 8; i++) {
if (BIT_SET(a, i))
result |= (MASK << (8 * i));
}
return result;
}
#include <stdio.h>
#include <stdlib.h>
int main (void)
{
uint8_t a = 0;
do {
uint64_t res = fast_inflate (a);
uint64_t ref = inflate (a);
if (res != ref) {
printf ("error # %02x: fast_inflate = %016llx inflate = %016llx\n",
a, res, ref);
return EXIT_FAILURE;
}
a++;
} while (a);
printf ("test passed\n");
return EXIT_SUCCESS;
}
Most x64 compilers will compile fast_inflate() in straightforward manner. For example, my Intel compiler Version 13.1.3.198, when building with /Ox, generates the 11-instruction sequence below. Note that the final multiply with 0xff is actually implemented as a shift and subtract sequence.
fast_inflate PROC
mov rdx, 040810204081H
movzx r9d, cl
and ecx, 127
mov r8, 0101010101010101H
imul rdx, rcx
shl r9, 49
add r9, rdx
and r9, r8
mov rax, r9
shl rax, 8
sub rax, r9
ret
If you're willing to spend 256 * 8 = 2kB of memory on this (i.e. become less efficient in terms of memory, but more efficient in terms of CPU cycles needed), the most efficient way would be to pre-compute a lookup table:
static uint64_t inflate(unsigned char a) {
static const uint64_t charToUInt64[256] = {
0x0000000000000000, 0x00000000000000FF, 0x000000000000FF00, 0x000000000000FFFF,
// ...
};
return charToUInt64[a];
}
Here is one more method using only simple arithmetic:
uint64_t inflate_chqrlie(uint8_t value) {
uint64_t x = value;
x = (x | (x << 28));
x = (x | (x << 14));
x = (x | (x << 7)) & 0x0101010101010101ULL;
x = (x << 8) - x;
return x;
}
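Another very efficient and concise one by phuclv using multiplication and mask. Note that, as written, it yields 0x01 rather than 0xFF for each set bit and places bit 7 of the input in the lowest byte, so its output does not match inflate above: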
static uint64_t inflate_phuclv(uint8_t b) {
uint64_t MAGIC = 0x8040201008040201ULL;
uint64_t MASK = 0x8080808080808080ULL;
return ((MAGIC * b) & MASK) >> 7;
}
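One way the multiply-and-mask idea could be made to agree with the reference inflate is to reverse the byte order of the intermediate result and then multiply by 0xFF. This is only a sketch of that adaptation: the extra steps and the names reverse_bytes64 and inflate_mul_mask are my assumptions, not part of phuclv's code, and the extra work will cost some of its speed.

```c
#include <stdint.h>

/* Hypothetical helper: reverse the order of the 8 bytes in v. */
static uint64_t reverse_bytes64(uint64_t v)
{
    v = ((v << 8)  & 0xFF00FF00FF00FF00ULL) | ((v >> 8)  & 0x00FF00FF00FF00FFULL);
    v = ((v << 16) & 0xFFFF0000FFFF0000ULL) | ((v >> 16) & 0x0000FFFF0000FFFFULL);
    return (v << 32) | (v >> 32);
}

static uint64_t inflate_mul_mask(uint8_t b)
{
    const uint64_t MAGIC = 0x8040201008040201ULL;
    const uint64_t MASK  = 0x8080808080808080ULL;

    /* After this step byte k holds input bit (7 - k) in its lowest bit. */
    uint64_t lsbs = ((MAGIC * b) & MASK) >> 7;

    /* Put bit k into byte k, then widen each 0x01 to 0xFF. */
    return reverse_bytes64(lsbs) * 0xFF;
}
```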
And another with a small lookup table:
static uint32_t const lut_4_32[16] = {
0x00000000, 0x000000FF, 0x0000FF00, 0x0000FFFF,
0x00FF0000, 0x00FF00FF, 0x00FFFF00, 0x00FFFFFF,
0xFF000000, 0xFF0000FF, 0xFF00FF00, 0xFF00FFFF,
0xFFFF0000, 0xFFFF00FF, 0xFFFFFF00, 0xFFFFFFFF,
};
static uint64_t inflate_lut32(uint8_t b) {
return lut_4_32[b & 15] | ((uint64_t)lut_4_32[b >> 4] << 32);
}
I wrote a benchmarking program to determine relative performance of the different approaches on my system (x86_64-apple-darwin16.7.0, Apple LLVM version 9.0.0 (clang-900.0.39.2), clang -O3).
The results show that my function inflate_chqrlie is faster than naive approaches but, in the 64-bit benchmark, slower than the other elaborate versions, all of which are beaten hands down by inflate_lut64 using a 2KB lookup table in cache-optimal situations.
The function inflate_lut32, using a much smaller lookup table (64 bytes instead of 2KB) is not as fast as inflate_lut64, but seems a good compromise for 32-bit architectures as it is still much faster than all other alternatives.
64-bit benchmark:
inflate: 0, 848.316ms
inflate_Curd: 0, 845.424ms
inflate_chqrlie: 0, 371.502ms
fast_inflate_njuffa: 0, 288.669ms
inflate_parallel1: 0, 242.827ms
inflate_parallel2: 0, 315.105ms
inflate_parallel3: 0, 363.379ms
inflate_parallel4: 0, 304.051ms
inflate_parallel5: 0, 301.205ms
inflate_phuclv: 0, 109.130ms
inflate_lut32: 0, 197.178ms
inflate_lut64: 0, 25.160ms
32-bit benchmark:
inflate: 0, 1451.464ms
inflate_Curd: 0, 955.509ms
inflate_chqrlie: 0, 385.036ms
fast_inflate_njuffa: 0, 463.212ms
inflate_parallel1: 0, 468.070ms
inflate_parallel2: 0, 570.107ms
inflate_parallel3: 0, 511.741ms
inflate_parallel4: 0, 601.892ms
inflate_parallel5: 0, 506.695ms
inflate_phuclv: 0, 192.431ms
inflate_lut32: 0, 140.968ms
inflate_lut64: 0, 28.776ms
Here is the code:
#include <stdio.h>
#include <stdint.h>
#include <time.h>
static uint64_t inflate(unsigned char a) {
#define BIT_SET(var, pos) ((var) & (1 << (pos)))
uint64_t MASK = 0xFF;
uint64_t result = 0;
for (int i = 0; i < 8; i++) {
if (BIT_SET(a, i))
result |= (MASK << (8 * i));
}
return result;
}
static uint64_t inflate_Curd(unsigned char a) {
uint64_t mask = 0xFF;
uint64_t result = 0;
for (int i = 0; i < 8; i++) {
if (a & 1)
result |= mask;
mask <<= 8;
a >>= 1;
}
return result;
}
uint64_t inflate_chqrlie(uint8_t value) {
uint64_t x = value;
x = (x | (x << 28));
x = (x | (x << 14));
x = (x | (x << 7)) & 0x0101010101010101ULL;
x = (x << 8) - x;
return x;
}
uint64_t fast_inflate_njuffa(uint8_t a) {
const uint64_t spread7 = (1ULL << 42) | (1ULL << 35) | (1ULL << 28) | (1ULL << 21) |
(1ULL << 14) | (1ULL << 7) | (1UL << 0);
const uint64_t byte_lsb = (1ULL << 56) | (1ULL << 48) | (1ULL << 40) | (1ULL << 32) |
(1ULL << 24) | (1ULL << 16) | (1ULL << 8) | (1ULL << 0);
uint64_t r;
/* spread bits to lsbs of each byte */
r = (((uint64_t)(a & 0x7f) * spread7) + ((uint64_t)a << 49));
/* extract the lsbs of all bytes */
r = r & byte_lsb;
/* fill each byte with its lsb */
r = r * 0xff;
return r;
}
// Aki Suuihkonen: 1.265
static uint64_t inflate_parallel1(unsigned char a) {
uint64_t vector = a * 0x0101010101010101ULL;
// replicate the word all over qword
// A5 becomes A5 A5 A5 A5 A5 A5 A5 A5
vector &= 0x8040201008040201; // becomes 80 00 20 00 00 04 00 01 <--
vector += 0x00406070787c7e7f; // becomes 80 40 80 70 78 80 7e 80
// MSB is correct
vector = (vector >> 7) & 0x0101010101010101ULL; // LSB is correct
return vector * 255; // all bits correct
}
// By seizet and then combine: 1.583
static uint64_t inflate_parallel2(unsigned char a) {
uint64_t vector1 = a * 0x0002000800200080ULL;
uint64_t vector2 = a * 0x0000040010004001ULL;
uint64_t vector = (vector1 & 0x0100010001000100ULL) | (vector2 & 0x0001000100010001ULL);
return vector * 255;
}
// Stay in 32 bits as much as possible: 1.006
static uint64_t inflate_parallel3(unsigned char a) {
uint32_t vector1 = (( (a & 0x0F) * 0x00204081) & 0x01010101) * 255;
uint32_t vector2 = ((((a & 0xF0) >> 4) * 0x00204081) & 0x01010101) * 255;
return (((uint64_t)vector2) << 32) | vector1;
}
// Do the common computation in 64 bits: 0.915
static uint64_t inflate_parallel4(unsigned char a) {
uint32_t vector1 = (a & 0x0F) * 0x00204081;
uint32_t vector2 = ((a & 0xF0) >> 4) * 0x00204081;
uint64_t vector = (vector1 | (((uint64_t)vector2) << 32)) & 0x0101010101010101ULL;
return vector * 255;
}
// Some computation is done in 64 bits a little sooner: 0.806
static uint64_t inflate_parallel5(unsigned char a) {
uint32_t vector1 = (a & 0x0F) * 0x00204081;
uint64_t vector2 = (a & 0xF0) * 0x002040810000000ULL;
uint64_t vector = (vector1 | vector2) & 0x0101010101010101ULL;
return vector * 255;
}
static uint64_t inflate_phuclv(uint8_t b) {
uint64_t MAGIC = 0x8040201008040201ULL;
uint64_t MASK = 0x8080808080808080ULL;
return ((MAGIC * b) & MASK) >> 7;
}
static uint32_t const lut_4_32[16] = {
0x00000000, 0x000000FF, 0x0000FF00, 0x0000FFFF,
0x00FF0000, 0x00FF00FF, 0x00FFFF00, 0x00FFFFFF,
0xFF000000, 0xFF0000FF, 0xFF00FF00, 0xFF00FFFF,
0xFFFF0000, 0xFFFF00FF, 0xFFFFFF00, 0xFFFFFFFF,
};
static uint64_t inflate_lut32(uint8_t b) {
return lut_4_32[b & 15] | ((uint64_t)lut_4_32[b >> 4] << 32);
}
static uint64_t lut_8_64[256];
static uint64_t inflate_lut64(uint8_t b) {
return lut_8_64[b];
}
#define ITER 1000000
int main() {
clock_t t;
uint64_t x;
for (int b = 0; b < 256; b++)
lut_8_64[b] = inflate((uint8_t)b);
#define TEST(func) do { \
t = clock(); \
x = 0; \
for (int i = 0; i < ITER; i++) { \
for (int b = 0; b < 256; b++) \
x ^= func((uint8_t)b); \
} \
t = clock() - t; \
printf("%20s: %llu, %.3fms\n", \
#func, x, t * 1000.0 / CLOCKS_PER_SEC); \
} while (0)
TEST(inflate);
TEST(inflate_Curd);
TEST(inflate_chqrlie);
TEST(fast_inflate_njuffa);
TEST(inflate_parallel1);
TEST(inflate_parallel2);
TEST(inflate_parallel3);
TEST(inflate_parallel4);
TEST(inflate_parallel5);
TEST(inflate_phuclv);
TEST(inflate_lut32);
TEST(inflate_lut64);
return 0;
}
Variations on the same theme as @Aki's answer. Some of them are better here, but it may depend on your compiler and target machine (they should be more suitable for a superscalar processor than Aki's function, even though they do more work, as there are fewer data dependencies).
// Aki Suuihkonen: 1.265
static uint64_t inflate_parallel1(unsigned char a) {
uint64_t vector = a * 0x0101010101010101ULL;
vector &= 0x8040201008040201;
vector += 0x00406070787c7e7f;
vector = (vector >> 7) & 0x0101010101010101ULL;
return vector * 255;
}
// By seizet and then combine: 1.583
static uint64_t inflate_parallel2(unsigned char a) {
uint64_t vector1 = a * 0x0002000800200080ULL;
uint64_t vector2 = a * 0x0000040010004001ULL;
uint64_t vector = (vector1 & 0x0100010001000100ULL) | (vector2 & 0x0001000100010001ULL);
return vector * 255;
}
// Stay in 32 bits as much as possible: 1.006
static uint64_t inflate_parallel3(unsigned char a) {
uint32_t vector1 = (( (a & 0x0F) * 0x00204081) & 0x01010101) * 255;
uint32_t vector2 = ((((a & 0xF0) >> 4) * 0x00204081) & 0x01010101) * 255;
return (((uint64_t)vector2) << 32) | vector1;
}
// Do the common computation in 64 bits: 0.915
static uint64_t inflate_parallel4(unsigned char a) {
uint32_t vector1 = (a & 0x0F) * 0x00204081;
uint32_t vector2 = ((a & 0xF0) >> 4) * 0x00204081;
uint64_t vector = (vector1 | (((uint64_t)vector2) << 32)) & 0x0101010101010101ULL;
return vector * 255;
}
// Some computation is done in 64 bits a little sooner: 0.806
static uint64_t inflate_parallel5(unsigned char a) {
uint32_t vector1 = (a & 0x0F) * 0x00204081;
uint64_t vector2 = (a & 0xF0) * 0x002040810000000ULL;
uint64_t vector = (vector1 | vector2) & 0x0101010101010101ULL;
return vector * 255;
}
Two minor optimizations:
One for testing the bits in the input (a will be destroyed but this doesn't matter)
The other for shifting the mask.
static uint64_t inflate(unsigned char a)
{
uint64_t mask = 0xFF;
uint64_t result = 0;
for (int i = 0; i < 8; i++) {
if (a & 1)
result |= mask;
mask <<= 8;
a >>= 1;
}
return result;
}
Maybe you can also replace the 'for (int i = 0; i < 8; i++)' loop with a 'while (a)' loop.
This terminates only if the right shift a >>= 1 shifts in zeros.
That is guaranteed here because a is an unsigned char. For a negative signed value the result of >> is implementation-defined, and an arithmetic shift would give an infinite loop in some cases.
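A sketch of the 'while (a)' variant, assuming the parameter stays an unsigned char (the name inflate_while is made up for this example):

```c
#include <stdint.h>

static uint64_t inflate_while(unsigned char a)
{
    uint64_t mask = 0xFF;
    uint64_t result = 0;

    while (a) {          /* stops as soon as no set bits remain */
        if (a & 1)
            result |= mask;
        mask <<= 8;      /* next output byte */
        a >>= 1;         /* next input bit; zeros are shifted in */
    }
    return result;
}
```

For small inputs this exits early, for example after one iteration for 0x01, instead of always looping eight times.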
EDIT:
To see the result I compiled both variants with gcc -std=c99 -S source.c.
A quick glance at the resulting assembler outputs shows that the optimization shown above yields ca. 1/3 fewer instructions, most of them inside the loop.

convert big endian to little endian in C [without using provided func] [closed]

I need to write a function to convert big endian to little endian in C. I can not use any library function.
Assuming what you need is a simple byte swap, try something like
Unsigned 16 bit conversion:
swapped = (num>>8) | (num<<8);
Unsigned 32-bit conversion:
swapped = ((num>>24)&0xff) | // move byte 3 to byte 0
((num<<8)&0xff0000) | // move byte 1 to byte 2
((num>>8)&0xff00) | // move byte 2 to byte 1
((num<<24)&0xff000000); // byte 0 to byte 3
This swaps the byte orders from positions 1234 to 4321. If your input was 0xdeadbeef, a 32-bit endian swap might have output of 0xefbeadde.
The code above should be cleaned up with macros or at least constants instead of magic numbers, but hopefully it helps as is
EDIT: as another answer pointed out, there are platform, OS, and instruction set specific alternatives which can be MUCH faster than the above. In the Linux kernel there are macros (cpu_to_be32 for example) which handle endianness pretty nicely. But these alternatives are specific to their environments. In practice endianness is best dealt with using a blend of available approaches
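As a rough sketch, the two expressions above can be wrapped in small functions so the types are explicit. The names swap16 and swap32 are only illustrative, not from any library.

```c
#include <stdint.h>

/* Swap the two bytes of a 16-bit value. The cast discards the
 * extra high bits that appear when num is promoted to int. */
static uint16_t swap16(uint16_t num)
{
    return (uint16_t)((num >> 8) | (num << 8));
}

/* Reverse the four bytes of a 32-bit value. */
static uint32_t swap32(uint32_t num)
{
    return ((num >> 24) & 0x000000ffu) |  /* byte 3 to byte 0 */
           ((num <<  8) & 0x00ff0000u) |  /* byte 1 to byte 2 */
           ((num >>  8) & 0x0000ff00u) |  /* byte 2 to byte 1 */
           ((num << 24) & 0xff000000u);   /* byte 0 to byte 3 */
}
```

Applying either function twice gives back the original value, which is a handy sanity check.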
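By including:
#include <byteswap.h>
you can get an optimized version of machine-dependent byte-swapping functions (this header is a glibc extension, not standard C).
Then, you can easily use the following functions:
bswap_32 (uint32_t input)
or
bswap_16 (uint16_t input)
The double-underscore names __bswap_32 and __bswap_16 are the internal implementations behind these public macros.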
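A minimal usage sketch, assuming a glibc system where byteswap.h is available. The wrapper names flip16 and flip32 are hypothetical; bswap_16 and bswap_32 are the public macros from the header.

```c
#include <byteswap.h>   /* glibc-specific header */
#include <stdint.h>

/* Thin wrappers around the glibc byte-swap macros. */
static uint16_t flip16(uint16_t v)
{
    return bswap_16(v);
}

static uint32_t flip32(uint32_t v)
{
    return bswap_32(v);
}
```

On other C libraries this header may not exist, so portable code usually needs a fallback.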
#include <stdint.h>
//! Byte swap unsigned short
uint16_t swap_uint16( uint16_t val )
{
return (val << 8) | (val >> 8 );
}
//! Byte swap short
int16_t swap_int16( int16_t val )
{
return (val << 8) | ((val >> 8) & 0xFF);
}
//! Byte swap unsigned int
uint32_t swap_uint32( uint32_t val )
{
val = ((val << 8) & 0xFF00FF00 ) | ((val >> 8) & 0xFF00FF );
return (val << 16) | (val >> 16);
}
//! Byte swap int
int32_t swap_int32( int32_t val )
{
val = ((val << 8) & 0xFF00FF00) | ((val >> 8) & 0xFF00FF );
return (val << 16) | ((val >> 16) & 0xFFFF);
}
Update : Added 64bit byte swapping
int64_t swap_int64( int64_t val )
{
val = ((val << 8) & 0xFF00FF00FF00FF00ULL ) | ((val >> 8) & 0x00FF00FF00FF00FFULL );
val = ((val << 16) & 0xFFFF0000FFFF0000ULL ) | ((val >> 16) & 0x0000FFFF0000FFFFULL );
return (val << 32) | ((val >> 32) & 0xFFFFFFFFULL);
}
uint64_t swap_uint64( uint64_t val )
{
val = ((val << 8) & 0xFF00FF00FF00FF00ULL ) | ((val >> 8) & 0x00FF00FF00FF00FFULL );
val = ((val << 16) & 0xFFFF0000FFFF0000ULL ) | ((val >> 16) & 0x0000FFFF0000FFFFULL );
return (val << 32) | (val >> 32);
}
Here's a fairly generic version; I haven't compiled it, so there are probably typos, but you should get the idea,
void SwapBytes(void *pv, size_t n)
{
assert(n > 0);
char *p = pv;
size_t lo, hi;
for(lo=0, hi=n-1; hi>lo; lo++, hi--)
{
char tmp=p[lo];
p[lo] = p[hi];
p[hi] = tmp;
}
}
#define SWAP(x) SwapBytes(&(x), sizeof(x))
NB: This is not optimised for speed or space. It is intended to be clear (easy to debug) and portable.
Update 2018-04-04
Added the assert() to trap the invalid case of n == 0, as spotted by commenter @chux.
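Here is a compilable sketch of the same idea together with one way to call it. The names swap_bytes and reversed32 are made up for this example, and unsigned char is used instead of char, which is my own choice rather than part of the answer above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reverse the n bytes starting at pv, in place. */
static void swap_bytes(void *pv, size_t n)
{
    unsigned char *p = pv;
    size_t lo, hi;

    assert(n > 0);
    for (lo = 0, hi = n - 1; hi > lo; lo++, hi--) {
        unsigned char tmp = p[lo];
        p[lo] = p[hi];
        p[hi] = tmp;
    }
}

/* Example caller: byte-reverse a 32-bit value passed by value. */
static uint32_t reversed32(uint32_t v)
{
    swap_bytes(&v, sizeof v);
    return v;
}
```

Because the function only sees raw bytes, the same call works for any object size, e.g. a double or a uint64_t.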
If you need macros (e.g. embedded system):
#define SWAP_UINT16(x) ((uint16_t)(((x) >> 8) | ((x) << 8)))
#define SWAP_UINT32(x) (((x) >> 24) | (((x) & 0x00FF0000) >> 8) | (((x) & 0x0000FF00) << 8) | ((x) << 24))
Edit: These are library functions. Following them is the manual way to do it.
I am absolutely stunned by the number of people unaware of _byteswap_ushort, _byteswap_ulong, and _byteswap_uint64 (declared in <stdlib.h>). Sure they are Visual C++ specific, but they compile down to some delicious code on x86/IA-64 architectures. :)
Here's an explicit usage of the bswap instruction, pulled from this page. Note that the intrinsic form above will always be faster than this, I only added it to give an answer without a library routine.
uint32 cq_ntohl(uint32 a) {
__asm{
mov eax, a;
bswap eax;
}
}
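If you want the intrinsic route without tying the code to one compiler, something like the sketch below could work. The function name bswap32_intrinsic is hypothetical; _byteswap_ulong is the Visual C++ intrinsic and __builtin_bswap32 is the GCC/Clang builtin, which is a swapped-in equivalent and not something the answer above mentions.

```c
#include <stdint.h>
#if defined(_MSC_VER)
#include <stdlib.h>     /* _byteswap_ulong */
#endif

/* Byte-reverse a 32-bit value using a compiler intrinsic when one
 * is known, falling back to plain shifts and masks otherwise. */
static uint32_t bswap32_intrinsic(uint32_t v)
{
#if defined(_MSC_VER)
    return _byteswap_ulong(v);
#elif defined(__GNUC__) || defined(__clang__)
    return __builtin_bswap32(v);
#else
    return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
           ((v << 8) & 0x00ff0000u) | (v << 24);
#endif
}
```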
As a joke:
#include <stdio.h>
int main (int argc, char *argv[])
{
size_t sizeofInt = sizeof (int);
int i;
union
{
int x;
char c[sizeof (int)];
} original, swapped;
original.x = 0x12345678;
for (i = 0; i < sizeofInt; i++)
swapped.c[sizeofInt - i - 1] = original.c[i];
fprintf (stderr, "%x\n", swapped.x);
return 0;
}
Here's a way using the SSSE3 instruction pshufb via its Intel intrinsic (declared in <tmmintrin.h>), assuming you have a multiple of 4 ints:
unsigned int *bswap(unsigned int *destination, unsigned int *source, int length) {
int i;
__m128i mask = _mm_set_epi8(12, 13, 14, 15, 8, 9, 10, 11, 4, 5, 6, 7, 0, 1, 2, 3);
for (i = 0; i < length; i += 4) {
_mm_storeu_si128((__m128i *)&destination[i],
_mm_shuffle_epi8(_mm_loadu_si128((__m128i *)&source[i]), mask));
}
return destination;
}
Will this work / be faster?
uint32_t swapped, result;
((byte*)&swapped)[0] = ((byte*)&result)[3];
((byte*)&swapped)[1] = ((byte*)&result)[2];
((byte*)&swapped)[2] = ((byte*)&result)[1];
((byte*)&swapped)[3] = ((byte*)&result)[0];
This code snippet can convert 32bit little Endian number to Big Endian number.
#include <stdio.h>
main(){
unsigned int i = 0xfafbfcfd;
unsigned int j;
j= ((i&0xff000000)>>24)| ((i&0xff0000)>>8) | ((i&0xff00)<<8) | ((i&0xff)<<24);
printf("unsigned int j = %x\n ", j);
}
Here's a function I have been using - tested and works on any basic data type:
// SwapBytes.h
//
// Function to perform in-place endian conversion of basic types
//
// Usage:
//
// double d;
// SwapBytes(&d, sizeof(d));
//
inline void SwapBytes(void *source, int size)
{
typedef unsigned char TwoBytes[2];
typedef unsigned char FourBytes[4];
typedef unsigned char EightBytes[8];
unsigned char temp;
if(size == 2)
{
TwoBytes *src = (TwoBytes *)source;
temp = (*src)[0];
(*src)[0] = (*src)[1];
(*src)[1] = temp;
return;
}
if(size == 4)
{
FourBytes *src = (FourBytes *)source;
temp = (*src)[0];
(*src)[0] = (*src)[3];
(*src)[3] = temp;
temp = (*src)[1];
(*src)[1] = (*src)[2];
(*src)[2] = temp;
return;
}
if(size == 8)
{
EightBytes *src = (EightBytes *)source;
temp = (*src)[0];
(*src)[0] = (*src)[7];
(*src)[7] = temp;
temp = (*src)[1];
(*src)[1] = (*src)[6];
(*src)[6] = temp;
temp = (*src)[2];
(*src)[2] = (*src)[5];
(*src)[5] = temp;
temp = (*src)[3];
(*src)[3] = (*src)[4];
(*src)[4] = temp;
return;
}
}
EDIT: This function only swaps the endianness of aligned 16 bit words. A function often necessary for UTF-16/UCS-2 encodings.
EDIT END.
If you want to change the endianness of a memory block you can use my blazingly fast approach.
Your memory array should have a size that is a multiple of 8.
#include <stddef.h>
#include <limits.h>
#include <stdint.h>
void ChangeMemEndianness(uint64_t *mem, size_t size)
{
uint64_t m1 = 0xFF00FF00FF00FF00ULL, m2 = m1 >> CHAR_BIT;
size = (size + (sizeof (uint64_t) - 1)) / sizeof (uint64_t);
for(; size; size--, mem++)
*mem = ((*mem & m1) >> CHAR_BIT) | ((*mem & m2) << CHAR_BIT);
}
This kind of function is useful for changing the endianness of Unicode UCS-2/UTF-16 files.
x86 and x86_64 processors are natively little endian, so a big-endian value has to be byte-swapped before use:
for 16 bit values
unsigned short wBigE = value;
unsigned short wLittleE = ((wBigE & 0xFF) << 8) | (wBigE >> 8);
for 32 bit values
unsigned int iBigE = value;
unsigned int iLittleE = ((iBigE & 0xFF) << 24)
| ((iBigE & 0xFF00) << 8)
| ((iBigE >> 8) & 0xFF00)
| (iBigE >> 24);
This isn't the most efficient solution unless the compiler recognises that this is byte level manipulation and generates byte swapping code. But it doesn't depend on any memory layout tricks and can be turned into a macro pretty easily.
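The same shift-and-mask style can also be used to read and write a fixed byte order without caring what the host is. A minimal sketch, with hypothetical helper names store_be32 and load_be32:

```c
#include <stdint.h>

/* Write v into out[0..3] in big-endian order (most significant
 * byte first), regardless of the host's native byte order. */
static void store_be32(unsigned char out[4], uint32_t v)
{
    out[0] = (unsigned char)((v >> 24) & 0xFF);
    out[1] = (unsigned char)((v >> 16) & 0xFF);
    out[2] = (unsigned char)((v >>  8) & 0xFF);
    out[3] = (unsigned char)(v & 0xFF);
}

/* Read a big-endian 32-bit value from in[0..3]. */
static uint32_t load_be32(const unsigned char in[4])
{
    return ((uint32_t)in[0] << 24) |
           ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] <<  8) |
            (uint32_t)in[3];
}
```

Since the code only uses arithmetic on values, it never has to ask which endianness the machine has.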

Swap bits in a number in C [duplicate]

In a C interview, I was asked to swap the first 4 bits of a number with the last 4 bits (e.g. 1011 1110 should become 1110 1011).
Does anyone have a solution for this?
If you haven't seen or done much bit twiddling, a good resource to study is:
Bit Twiddling Hacks
unsigned char c;
c = ((c & 0xf0) >> 4) | ((c & 0x0f) << 4);
There is no "correct answer" to this kind of interview question. There are several ways to do this (lookup tables, anyone?) and the tradeoffs between each way (readability vs. performance vs. portability vs. maintainability) would need to be discussed.
The question is just an opening gambit to get you discussing some of the above issues, and to determine how 'deeply' you can discuss such problems.
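For the lookup-table option mentioned above, a possible sketch is shown here. The table and function names are invented for this example, and the table has to be initialised once before use.

```c
#include <stdint.h>

/* One entry per possible byte value; filled by init_nibble_swap_table(). */
static uint8_t nibble_swap_table[256];

/* Precompute the nibble-swapped form of every byte value. */
static void init_nibble_swap_table(void)
{
    for (unsigned i = 0; i < 256; i++)
        nibble_swap_table[i] = (uint8_t)(((i & 0x0f) << 4) | ((i & 0xf0) >> 4));
}

/* Swap the two nibbles of c with a single table lookup. */
static uint8_t swap_nibbles_lut(uint8_t c)
{
    return nibble_swap_table[c];
}
```

This trades 256 bytes of memory for avoiding the shifts, which is exactly the kind of tradeoff the interviewer probably wants you to talk about.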
Just use a temporary variable: move the last bits into that variable, shift the remaining bits in the other direction, and finish by masking the bits from the tmp var back in, and you are done.
Update:
Let's add some code and then you can choose what is more readable.
The working one liner
unsigned int data = 0x7654;
data = (data ^ data & 0xff) | ((data & 0xf) << 4) | ((data & 0xf0) >> 4);
printf("data %x \n", data);
the same code but with some tmp vars
unsigned int data = 0x7654;
unsigned int tmp1 = 0;
unsigned int tmp2 = 0;
tmp1 = (0x0f&data)<<4;
tmp2 = (0xf0&data)>>4;
tmp1 = tmp1 | tmp2;
data = data ^ (data & 0xff);
data = data | tmp1;
printf("data %x \n", data);
Well the one liner is shorter anyway :)
Update:
And if you look at the asm code that gcc generated with -Os -S, my guess is that they are more or less identical since the overhead is removed during the "compiler optimisation" part.
There's no need for a temporary variable, something like this should do it:
x = ((x & 0xf) << 4) | ((x & 0xf0) >> 4);
There is a potential pitfall with this depending on the exact type of x. Identification of this problem is left as an exercise for the reader.
C++-like pseudocode (can be easily rewritten to not use temporary variables):
int firstPart = source & 0xF;
int offsetToHigherPart = sizeof( source ) * CHAR_BIT - 4;
int secondPart = ( source >> offsetToHigherPart ) & 0xF;
int maskToSeparateMiddle = -1 & ( ~0xF ) & ( ~( 0xF << offsetToHigherPart ) );
int result = ( firstPart << offsetToHigherPart ) | secondPart | (source & maskToSeparateMiddle);
This requires CHAR_BIT, which is defined in limits.h. It is 8 on most platforms, but strictly speaking it is platform-dependent: the C standard only guarantees that it is at least 8.
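A compilable take on the pseudocode, as a sketch only. It uses uint32_t instead of int to stay clear of signed-shift problems, which is my assumption and not part of the answer; the function name swap_outer_nibbles is also made up.

```c
#include <limits.h>
#include <stdint.h>

/* Swap the lowest 4 bits of source with its highest 4 bits,
 * leaving everything in between untouched. */
static uint32_t swap_outer_nibbles(uint32_t source)
{
    /* Bit position of the top nibble: 28 when CHAR_BIT is 8. */
    const unsigned offset = (unsigned)(sizeof(source) * CHAR_BIT - 4);
    const uint32_t low    = source & 0xFu;
    const uint32_t high   = (source >> offset) & 0xFu;
    /* Ones everywhere except the lowest and highest nibble. */
    const uint32_t middle_mask = ~(uint32_t)0xFu & ~((uint32_t)0xFu << offset);

    return (low << offset) | high | (source & middle_mask);
}
```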
unsigned char b;
b = (b << 4) | (b >> 4);
x86 assembly:
asm{
mov AL, 10111110b
rol AL, 1
rol AL, 1
rol AL, 1
rol AL, 1
}
http://www.geocities.com/SiliconValley/Park/3230/x86asm/asml1005.html
Are you looking for something more clever than standard bit-shifting?
(assuming a is an 8-bit type)
a = ((a >> 4) & 0xF) + ((a << 4) & 0xF0);
The easiest is (t is unsigned):
t = (t>>4)|(t<<4);
But if you want to obfuscate your code, or to swap other bit combinations, you can use this as a base:
mask = 0x0F & (t ^ (t >> 4));
t ^= (mask | (mask << 4));
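The XOR trick generalises to swapping any two non-overlapping bit ranges. A hedged sketch follows; the function name and parameters are illustrative, and it assumes n is smaller than the width of unsigned and that the two ranges do not overlap.

```c
/* Swap the n-bit field starting at bit i with the n-bit field
 * starting at bit j. Assumes the fields do not overlap. */
static unsigned swap_bit_ranges(unsigned b, unsigned i, unsigned j, unsigned n)
{
    /* x has a 1 wherever the two fields differ. */
    unsigned x = ((b >> i) ^ (b >> j)) & ((1u << n) - 1u);
    /* Flipping the differing bits in both fields swaps them. */
    return b ^ ((x << i) | (x << j));
}
```

With i = 0, j = 4 and n = 4 this is the same nibble swap as the two lines above.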
/* swapping four bits */
#include<stdio.h>
void printb(char a) {
int i;
for( i = 7; i >= 0; i--)
printf("%d", (1 & (a >> i)));
printf("\n");
}
int swap4b(char a) {
return ( ((a & 0xf0) >> 4) | ((a & 0x0f) << 4) );
}
int main()
{
char a = 10;
printb(a);
a = swap4b(a);
printb(a);
return 0;
}
This is how you swap bits entirely, to change the bit endianness in a byte.
"iIn" is actually an integer because I'm using it to read from a file. I need the bits in an order where I can easily read them in order.
// swap bits
iIn = ((iIn>>4) & 0x0F) | ((iIn<<4) & 0xF0); // THIS is your solution here.
iIn = ((iIn>>2) & 0x33) | ((iIn<<2) & 0xCC);
iIn = ((iIn>>1) & 0x55) | ((iIn<<1) & 0xAA);
For swapping just two nibbles in a single byte, this is the most efficient way to do this, and it's probably faster than a lookup table in most situations.
I see a lot of people doing the shifting and forgetting to do the masking here. This is a problem when there is sign extension. If you have the type unsigned char, it's fine since it's an unsigned 8 bit quantity, but it will fail with any other type.
The mask doesn't add overhead, with an unsigned char, the mask is implied anyhow, and any decent compiler will remove unnecessary code and has for 20 years.
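To make the sign-extension problem concrete, here is a small sketch comparing the two forms on a signed char. The function names are made up, and it assumes two's complement, where the bit pattern 1011 1110 is the value -66. Right-shifting a negative value is implementation-defined in C, but the masked version gives the same result whether the compiler shifts in zeros or copies of the sign bit.

```c
/* Without a mask on the right-shifted half: for negative v the
 * shifted-in high bits pollute the result. */
static int swap_nibbles_unmasked(signed char v)
{
    return (v >> 4) | ((v & 0x0F) << 4);
}

/* With both halves masked: only the intended 4 bits survive. */
static unsigned char swap_nibbles_masked(signed char v)
{
    return (unsigned char)(((v >> 4) & 0x0F) | ((v & 0x0F) << 4));
}
```

For non-negative inputs both functions agree, which is why the bug tends to go unnoticed in quick tests.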
Solution for swapping a generic n bits between the last and first positions.
Not verified for the case when the total number of bits is less than 2n.
Here 7 is for char; use 31 for a 32-bit integer.
unsigned char swapNbitsFtoL(unsigned char num, char nbits)
{
unsigned char u1 = 0;
unsigned char u2 = 0;
u1 = ~u1;
u1 &= num;
u1 = (u1 >> (7 - (nbits - 1))); /* Move the top nbits bits down to the bottom; nbits is a count, hence (nbits - 1). */
u2 = ~u2;
u2 &= num;
u2 = (u2 << (7 - (nbits - 1))); /* Move the bottom nbits bits up to the top. */
u1 |= u2; /* u1 now holds the swapped first and last nbits bits. */
u2 = 0;
u2 = ~u2;
u2 = ((u2 >> (7 - (nbits - 1))) | (u2 << (7 - (nbits - 1))));
/* u2 is now a mask covering the first and last nbits bits. */
u2 = ~u2;
u2 &= num;
return (u1 | u2);
}
My skills in this area are new and therefore unproven so if I'm wrong then I learn something new, which is at least a part of the point of Stack Overflow.
Would a bitmask and XOR work also?
Like so?
var orginal=
var mask =00001110 //I may have the mask wrong
var value=1011 1110
var result=value^mask;
I might be misunderstanding things, forgive me if I've screwed up entirely.
#include <stdio.h>
#include <conio.h>
#include <math.h>
void main() {
int q,t,n,a[20],j,temp;
int i=0;
int s=0;
int tp=0;
clrscr();
printf("\nenter the num\n");
scanf("%d",&n);
t=n;
while(n>0) {
a[i]=n%2;
i++;
n=n/2;
}
printf("\n\n");
printf("num:%d\n",t);
printf("number in binary format:");
for(j=i-1;j>=0;j--) {
printf("%d",a[j]);
}
printf("\n");
temp=a[i-1];
a[i-1]=a[0];
a[0]=temp;
printf("number in binary format with reversed boundary bits:");
for(j=i-1;j>=0;j--) {
printf("%d",a[j]);
}
printf("\n");
q=i-1;
while(q>=0) {
tp=pow(2,q);
s=s+(tp*a[q]);
q--;
}
printf("resultant number after reversing boundary bits:%d",s);
printf("\n");
getch();
}

Resources