I made a function to set or clear a specific number of bits in a DWORD. My function works. I don't need help making it work. However, I am wondering if the method I've chosen to do it is the fastest possible way.
It's rather hard for me to explain how this works. There are two arrays containing DWORDs that are filled with bits on the left and right side of the DWORD (with all binary 1's). It makes a mask with all the bits filled except for the ones I want to set or clear, and then sets them with bitwise operators based on that mask. It seems rather complicated for such a simple task, but it seems like the fastest way I could come up with. It's much faster than setting them bit by bit.
static DWORD __dwFilledBitsRight[] = {
0x0, 0x1, 0x3, 0x7, 0xF, 0x1F, 0x3F, 0x7F, 0xFF, 0x1FF, 0x3FF, 0x7FF, 0xFFF, 0x1FFF, 0x3FFF, 0x7FFF, 0xFFFF, 0x1FFFF, 0x3FFFF, 0x7FFFF, 0xFFFFF, 0x1FFFFF, 0x3FFFFF, 0x7FFFFF, 0xFFFFFF, 0x1FFFFFF, 0x3FFFFFF, 0x7FFFFFF, 0xFFFFFFF, 0x1FFFFFFF, 0x3FFFFFFF, 0x7FFFFFFF, 0xFFFFFFFF
};
static DWORD __dwFilledBitsLeft[] = {
0x0, 0x80000000, 0xC0000000, 0xE0000000, 0xF0000000, 0xF8000000, 0xFC000000, 0xFE000000, 0xFF000000, 0xFF800000, 0xFFC00000, 0xFFE00000, 0xFFF00000, 0xFFF80000, 0xFFFC0000, 0xFFFE0000, 0xFFFF0000, 0xFFFF8000, 0xFFFFC000, 0xFFFFE000, 0xFFFFF000, 0xFFFFF800, 0xFFFFFC00, 0xFFFFFE00, 0xFFFFFF00, 0xFFFFFF80, 0xFFFFFFC0, 0xFFFFFFE0,
0xFFFFFFF0, 0xFFFFFFF8, 0xFFFFFFFC, 0xFFFFFFFE, 0xFFFFFFFF
};
// nStartBitFromLeft must be between 1 and 32...
// 1 is the bit farthest to the left (actual bit 31)
// 32 is the bit farthest to the right (actual bit 0)
inline void __FillDWORDBits(DWORD *p, int nStartBitFromLeft, int nBits, BOOL bSet)
{
DWORD dwLeftMask = __dwFilledBitsLeft[nStartBitFromLeft - 1]; // Mask for data on the left of the bits we want
DWORD dwRightMask = __dwFilledBitsRight[33 - (nStartBitFromLeft + nBits)]; // Mask for data on the right of the bits we want
DWORD dwBitMask = ~(dwLeftMask | dwRightMask); // Mask for the bits we want
DWORD dwOriginal = *p;
if(bSet) *p = (dwOriginal & dwLeftMask) | (dwOriginal & dwRightMask) | (0xFFFFFFFF & dwBitMask);
else *p = (dwOriginal & dwLeftMask) | (dwOriginal & dwRightMask) | 0;
}
How about:
// Create mask of correct length, and shift to the correct position
DWORD mask = ((1ULL << nBits) - 1) << pos;
// Apply mask (or its inverse)
if (bSet)
{
*p |= mask;
}
else
{
*p &= ~mask;
}
It's pretty likely that simple bitwise operations will be faster than table lookup on any modern processor.
Note: Depending on the relationship between DWORD and long long on this platform, you may need special handling for the case where nBits == sizeof(DWORD)*8. Or if nBits==0 is not a possibility, you could just do DWORD mask = ((2ULL << (nBits - 1)) - 1) << pos;.
Update: It's been mentioned that the if could potentially be slow, which is true. Here's a replacement for it, but you'd need to measure to see if it's actually any faster in practice.
// A bit hacky, but the aim is to get 0x00000000 or 0xFFFFFFFF
// (relies on two's-complement representation)
DWORD blanket = bSet - 1;
// Use the blanket to override one or other masking operation
*p |= (blanket | mask);
*p &= ~(blanket & mask);
This is the way I'd do it. I'd break it into two functions, setbits() and clearbits(). Steps broken out for clarity, and I'm sure it can be far more optimized.
This version is dependent on 32-bit code as it stands. Also, in my world, bit 0 is the rightmost bit. Your mileage may vary.
setbits( DWORD *p , int offset , int len )
{
// offset must be 0-31, len must be 0-31, len+offset must be 0-32
int right_shift = ( !len ? 0 : 32 - (len+offset) ) ;
int left_shift = offset ;
DWORD right_mask = 0xFFFFFFFF >> right_shift ;
DWORD left_mask = 0xFFFFFFFF << left_shift ;
DWORD mask = left_mask & right_mask ;
*p |= mask ;
return ;
}
clearbits( DWORD *p , int offset , int len )
{
// offset must be 0-31, len must be 0-31, len+offset must be 0-32
int right_shift = ( !len ? 0 : 32 - (len+offset) ) ;
int left_shift = offset ;
DWORD right_mask = 0xFFFFFFFF >> right_shift ;
DWORD left_mask = 0xFFFFFFFF << left_shift ;
DWORD mask = ~( left_mask & right_mask ) ;
*p &= mask ;
return ;
}
I stumbled across this improved version whilst looking for something else today. Courtesy of Sean Anderson's Bit Twiddling Hacks at Stanford University:
// uncomment #define to get the super scalar CPU version.
// #define SUPER_SCALAR_CPU
void setbits( unsigned int *p , int offset , int len , int flag )
{
unsigned int mask = ( ( 1 << len ) - 1 ) << offset ;
#if !defined( SUPER_SCALAR_CPU )
*p ^= ( - flag ^ *p ) & mask ;
#else
// supposed to be some 16% faster on a Intel Core 2 Duo than the non-super-scalar version above
*p = (*p & ~ mask ) | ( - flag & mask ) ;
#endif
return ;
}
Much depends on your compiler, though.
Related
I am trying to transmit values between architectures, by creating a uint8_t[] buffer and then sending that. To ensure they are transmitted correctly, the spec is to convert all values to little-endian as they go into the buffer.
I read this article here which discussed how to convert from one endianness to the other, and here where it discusses how to check the endianness of the system.
I am curious if there is a method to read bytes from a uint64 or other value in little-endian order regardless of whether the system is big or little? (ie through some sequence of bitwise operations)
Or is the only method to first check the endianness of the system, and then if big explicitly convert to little?
That's actually quite easy -- you just use shifts to convert between 'native' format (whatever that is) and little-endian
/* put a 32-bit value into a buffer in little-endian order (4 bytes) */
void put32(uint8_t *buf, uint32_t val) {
buf[0] = val;
buf[1] = val >> 8;
buf[2] = val >> 16;
buf[3] = val >> 24;
}
/* get a 32-bit value from a buffer (little-endian) */
uint32_t get32(uint8_t *buf) {
return (uint32_t)buf[0] + ((uint32_t)buf[1] << 8) +
((uint32_t)buf[2] << 16) + ((uint32_t)buf[3] << 24);
}
If you put a value into a buffer, transmit it as a byte stream to another machine, and then get the value from the received buffer, the two machines will have the same 32 bit value regardless of whether they have the same or different native byte oridering. The casts are needed becuase the default promotions will just convert to int, which might be smaller than a uin32_t, in which case the shifts could be out of range.
Be careful if you buffers are char rather than uint8_t (char might or might not be signed) -- you need to mask in that case:
uint32_t get32(char *buf) {
return ((uint32_t)buf[0] & 0xff) + (((uint32_t)buf[1] & 0xff) << 8) +
(((uint32_t)buf[2] & 0xff) << 16) + (((uint32_t)buf[3] & 0xff) << 24);
}
You can always serialize an uint64_t value to array of uint8_t in little endian order as simply
uint64_t source = ...;
uint8_t target[8];
target[0] = source;
target[1] = source >> 8;
target[2] = source >> 16;
target[3] = source >> 24;
target[4] = source >> 32;
target[5] = source >> 40;
target[6] = source >> 48;
target[7] = source >> 56;
or
for (int i = 0; i < sizeof (uint64_t); i++) {
target[i] = source >> i * 8;
}
and this will work anywhere where uint64_t and uint8_t exists.
Notice that this assumes that the source value is unsigned. Bit-shifting negative signed values will cause all sorts of headaches and you just don't want to do that.
Deserialization is a bit more complex if reading byte at a time in order:
uint8_t source[8] = ...;
uint64_t target = 0;
for (int i = 0; i < sizeof (uint64_t); i ++) {
target |= (uint64_t)source[i] << i * 8;
}
The cast to (uint64_t) is absolutely necessary, because the operands of << will undergo integer promotions, and uint8_t would always be converted to a signed int - and "funny" things will happen when you shift a set bit into the sign bit of a signed int.
If you write this into a function
#include <inttypes.h>
void serialize(uint64_t source, uint8_t *target) {
target[0] = source;
target[1] = source >> 8;
target[2] = source >> 16;
target[3] = source >> 24;
target[4] = source >> 32;
target[5] = source >> 40;
target[6] = source >> 48;
target[7] = source >> 56;
}
and compile for x86-64 using GCC 11 and -O3, the function will be compiled to
serialize:
movq %rdi, (%rsi)
ret
which just moves the 64-bit value of source into target array as is. If you reverse the indices (7 ... 0; big-endian), GCC will be clever enough to recognize that too and will compile it (with -O3) to
serialize:
bswap %rdi
movq %rdi, (%rsi)
ret
Most standardized network protocols specify numbers in big-endian format. In fact, big-endian is all referred to as network byte order, and there are functions specifically for translating integers of various sizes between host and network byte order.
These function are htons and ntohs for 16 bit values and htonl and ntohl` for 32 bit values. However, there is no equivalent for 64 bit values, and you're using little-endian for the network protocol, so these won't help you.
You can still however translate between the host byte order and the network byte order (little-endian in this case) without knowing the host order. You can do this by bit shifting the relevant values in to or out of the host numbers.
For example, to convert a 32 bit value from host to little endian and back to host:
uint32_t src_value = *some value*;
uint8_t buf[sizeof(uint32_t)];
int i;
for (i=0; i<sizeof(uint32_t); i++) {
buf[i] = (src_value >> (8 * i)) & 0xff;
}
uint32_t dest_value = 0;
for (i=0; i<sizeof(uint32_t); i++) {
dest_value |= (uint32_t)buf[i] << (8 * i);
}
For two systems that must communicated, you specify an "intercomminication-byte order". Then you have functions that convert between that and the native architecture byte order of each system.
There are three approaches to this problem. In order of efficiency:
Compile time detection of endianess
Run time detection of endianness
Endian agnostic code (corresponding to "sequence of bitwise operations" in your question).
Compile time detection of endianess
On architectures whose byte order is the same as the intercomm byte order, these functions do no transformation, but by using them, the same code becomes portable between systems.
Such functions may already exist on your target platform, for example:
Linux's endian.h be64toh() et-al
POSIX htonl, htons, ntohl, ntohs
Windows' winsock.h (same as POSIX but adds 64 bit htonll() and ntohll()
Where they don't exist creating them with cross-platform support is trivial. For example:
uint16_t intercom_to_host_16( uint16_t intercom_word )
{
#if __BIG_ENDIAN__
return intercom_word ;
#else
return intercom_word >> 8 | intercom_word << 8 ;
#endif
}
Here I have assumed that the intercom order is big-endian, that makes the function compatible with network byte order per ntohs() et al. The macro __BIG_ENDIAN__ is a predefined macro on most compilers. If not simply define it as a command line macro when compiling e.g. -D __BIG_ENDIAN__.
Run time detection of endianness
It is possible to detect endianness at runtime with minimal overhead:
uint16_t intercom_to_host_16( uint16_t intercom_word )
{
static const union
{
uint16_t word ;
uint8_t bytes[2] ;
} test = {.word = 0xff00u } ;
return test.bytes[0] == 0xffu ?
intercom_word :
intercom_word >> 8 | intercom_word << 8 ;
}
Of course you might wrap the test in a function for use in similar functions for other word sizes:
#include <stdbool.h>
bool isBigEndian()
{
static const union
{
uint16_t word ;
uint8_t bytes[2] ;
} test = {.word = 0xff00u } ;
return test.bytes[0] == 0xffu ;
}
Then simply have :
uint16_t intercom_to_host_16( uint16_t intercom_word )
{
return isBigEndian() ? intercom_word :
intercom_word >> 8 | intercom_word << 8 ;
}
Endian agnostic code
It is entirely possible to use endian agnostic code, but in that case all participants in the communication or file processing have the software overhead imposed even if the native byte order is already the same as the intercom byte order.
uint16_t intercom_to_host_16( uint16_t intercom_word )
{
uint8_t host_word [2] = { intercom_word >> 8,
intercom_word << 8 } ;
return *(uint16_t*)host_word ;
}
I am using GCC struct bit fields in an attempt interpret 8 byte CAN message data. I wrote a small program as an example of one possible message layout. The code and the comments should describe my problem. I assigned the 8 bytes so that all 5 signals should equal 1. As the output shows on an Intel PC, that is hardly the case. All CAN data that I deal with is big endian, and the fact that they are almost never packed 8 bit aligned makes htonl() and friends useless in this case. Does anyone know of a solution?
#include <stdio.h>
#include <netinet/in.h>
typedef union
{
unsigned char data[8];
struct {
unsigned int signal1 : 32;
unsigned int signal2 : 6;
unsigned int signal3 : 16;
unsigned int signal4 : 8;
unsigned int signal5 : 2;
} __attribute__((__packed__));
} _message1;
int main()
{
_message1 message1;
unsigned char incoming_data[8]; //This is how this message would come in from a CAN bus for all signals == 1
incoming_data[0] = 0x00;
incoming_data[1] = 0x00;
incoming_data[2] = 0x00;
incoming_data[3] = 0x01; //bit 1 of signal 1
incoming_data[4] = 0x04; //bit 1 of signal 2
incoming_data[5] = 0x00;
incoming_data[6] = 0x04; //bit 1 of signal 3
incoming_data[7] = 0x05; //bit 1 of signal 4 and signal 5
for(int i = 0; i < 8; ++i){
message1.data[i] = incoming_data[i];
}
printf("signal1 = %x\n", message1.signal1);
printf("signal2 = %x\n", message1.signal2);
printf("signal3 = %x\n", message1.signal3);
printf("signal4 = %x\n", message1.signal4);
printf("signal5 = %x\n", message1.signal5);
}
Because struct packing order varies between compilers and architectures, the best option is to use a helper function to pack/unpack the binary data instead.
For example:
static inline void message1_unpack(uint32_t *fields,
const unsigned char *buffer)
{
const uint64_t data = (((uint64_t)buffer[0]) << 56)
| (((uint64_t)buffer[1]) << 48)
| (((uint64_t)buffer[2]) << 40)
| (((uint64_t)buffer[3]) << 32)
| (((uint64_t)buffer[4]) << 24)
| (((uint64_t)buffer[5]) << 16)
| (((uint64_t)buffer[6]) << 8)
| ((uint64_t)buffer[7]);
fields[0] = data >> 32; /* Bits 32..63 */
fields[1] = (data >> 26) & 0x3F; /* Bits 26..31 */
fields[2] = (data >> 10) & 0xFFFF; /* Bits 10..25 */
fields[3] = (data >> 2) & 0xFF; /* Bits 2..9 */
fields[4] = data & 0x03; /* Bits 0..1 */
}
Note that because the consecutive bytes are interpreted as a single unsigned integer (in big-endian byte order), the above will be perfectly portable.
Instead of an array of fields, you could use a structure, of course; but it does not need to have any resemblance to the on-the-wire structure at all. However, if you have several different structures to unpack, an array of (maximum-width) fields usually turns out to be easier and more robust.
All sane compilers will optimize the above code just fine. In particular, GCC with -O2 does a very good job.
The inverse, packing those same fields to a buffer, is very similar:
static inline void message1_pack(unsigned char *buffer,
const uint32_t *fields)
{
const uint64_t data = (((uint64_t)(fields[0] )) << 32)
| (((uint64_t)(fields[1] & 0x3F )) << 26)
| (((uint64_t)(fields[2] & 0xFFFF )) << 10)
| (((uint64_t)(fields[3] & 0xFF )) << 2)
| ( (uint64_t)(fields[4] & 0x03 ) );
buffer[0] = data >> 56;
buffer[1] = data >> 48;
buffer[2] = data >> 40;
buffer[3] = data >> 32;
buffer[4] = data >> 24;
buffer[5] = data >> 16;
buffer[6] = data >> 8;
buffer[7] = data;
}
Note that the masks define the field length (0x03 = 0b11 (2 bits), 0x3F = 0b111111 (16 bits), 0xFF = 0b11111111 (8 bits), 0xFFFF = 0b1111111111111111 (16 bits)); and the shift amount depends on the bit position of the least significant bit in each field.
To verify such functions work, pack, unpack, repack, and re-unpack a buffer that should contain all zeros except one of the fields all ones, and verify the data stays correct over two roundtrips. It usually suffices to detect the typical bugs (wrong bit shift amounts, typos in masks).
Note that documentation will be key to ensure the code remains maintainable. I'd personally add comment blocks before each of the above functions, similar to
/* message1_unpack(): Unpack 8-byte message to 5 fields:
field[0]: Foobar. Bits 32..63.
field[1]: Buzz. Bits 26..31.
field[2]: Wahwah. Bits 10..25.
field[3]: Cheez. Bits 2..9.
field[4]: Blop. Bits 0..1.
*/
with the field "names" reflecting their names in documentation.
I'm working on code to compute CRC32 using the hardware CRC support that's built into the ARM Cortex-M4 processor. For reference, there's an application note that describes the hardware here:
http://www.st.com/st-web-ui/static/active/en/resource/technical/document/application_note/DM00068118.pdf
Basically, you write 32-bits of data at a time to a memory-mapped register (CRC_DR), and then you read the resulting CRC back from the same address. However, the CRC this produces is quite different than the standard result that the software CRC32 libraries produce. I finally found someone who had written code that manipulates the Cortex result to produce the "standard" result:
http://www.cnblogs.com/shangdawei/p/4603948.html
My code (shown below and adapted from the above solution) now produces the "standard" result, but I suspect there are more calls to function ReverseBits than are actually necessary. I'm hoping someone can tell me if it can be simplified.
Thanks!
Dan
#define RCC_BASE 0x40023800
#define RCC_AHB1ENR *((uint32_t *) (RCC_BASE + 0x30))
#define CRC_BASE 0x40023000
#define CRC_DR *((volatile uint32_t *) (CRC_BASE + 0x00))
#define CRC_IDR *((volatile uint32_t *) (CRC_BASE + 0x04))
#define CRC_CR *((volatile uint32_t *) (CRC_BASE + 0x08))
uint32_t ARMcrc32(void *data, uint32_t bytes)
{
uint32_t *p32 = data ;
uint32_t crc, crc_reg ;
RCC_AHB1ENR |= 1 << 12 ; // Enable CRC clock
CRC_CR |= 0x00000001 ; // Reset the CRC calculator
while (bytes >= 4)
{
CRC_DR = ReverseBits(*p32++) ;
bytes -= 4 ;
}
crc_reg = CRC_DR ;
crc = ReverseBits(crc_reg) ;
if (bytes > 0)
{
uint32_t bits = 8 * bytes ;
uint32_t xtra = 32 - bits ;
uint32_t mask = (1 << bits) - 1 ;
CRC_DR = crc_reg ;
CRC_DR = ReverseBits((*p32 & mask) ^ crc) >> xtra ;
crc = (crc >> bits) ^ ReverseBits(CRC_DR);
}
return ~crc ;
}
I built a virtual machine in C. And for this I have the Instruction
pushc <const>
I saved the command and the value in 32 Bit. The First 8 Bit are for the command and the rest for the value.
8 Bit -> Opcode
24 Bit -> Immediate value
For this I make a macro
#define PUSHC 1 //1 is for the command value in the Opcode
#define IMMEDIATE(x) ((x) & 0x00FFFFFF)
UPDATE:
**#define SIGN_EXTEND(i) ((i) & 0x00800000 ? (i) | 0xFF000000 : (i))**
Then I load for testing this in a unsigned int array:
Update:
unsigned int code[] = { (PUSHC << 24 | IMMEDIATE(2)),
(PUSHC << 24 | SIGN_EXTEND(-2)),
...};
later in my code I want to get the Immediate value of the pushc command and push this value to a stack...
I get every Instruction (IR) from the array and built my stack.
UPDATE:
void exec(unsigned int IR){
unsigned int opcode = (IR >> 24) & 0xff;
unsigned int imm = (IR & 0xffffff);
switch(opcode){
case PUSHC: {
stack[sp] = imm;
sp = sp + 1;
break;
}
}
...
}
}
Just use a bitwise AND to mask out the lower 24 bits, then use it in the case:
const uint8_t opcode = (IR >> 24) & 0xff;
const uint32_t imm = (IR & 0xffffff);
switch(opcode)
{
case PUSHC:
stack[sp] = imm;
break;
}
I shifted around the extraction of the opcode to make the case easier to read.
How can I switch the 0th and 3rd bits of each nibble in an integer using only bit operations (no control structures)? What kind of masks do I need to create in order to solve this problem? Any help would be appreciated. For example, 8(1000) become 1(0001).
/*
* SwitchBits(0) = 0
* SwitchBits(8) = 1
* SwitchBits(0x812) = 0x182
* SwitchBits(0x12345678) = 0x82a4c6e1
* Legal Operations: ! ~ & ^ | + << >>
*/
int SwitchBits(int n) {
}
Code:
#include <stdio.h>
#include <inttypes.h>
static uint32_t SwitchBits(uint32_t n)
{
uint32_t bit0_mask = 0x11111111;
uint32_t bit3_mask = 0x88888888;
uint32_t v_bit0 = n & bit0_mask;
uint32_t v_bit3 = n & bit3_mask;
n &= ~(bit0_mask | bit3_mask);
n |= (v_bit0 << 3) | (v_bit3 >> 3);
return n;
}
int main(void)
{
uint32_t i_values[] = { 0, 8, 0x812, 0x12345678, 0x9ABCDEF0 };
uint32_t o_values[] = { 0, 1, 0x182, 0x82A4C6E1, 0x93B5D7F0 };
enum { N_VALUES = sizeof(o_values) / sizeof(o_values[0]) };
for (int i = 0; i < N_VALUES; i++)
{
printf("0x%.8" PRIX32 " => 0x%.8" PRIX32 " (vs 0x%.8" PRIX32 ")\n",
i_values[i], SwitchBits(i_values[i]), o_values[i]);
}
return 0;
}
Output:
0x00000000 => 0x00000000 (vs 0x00000000)
0x00000008 => 0x00000001 (vs 0x00000001)
0x00000812 => 0x00000182 (vs 0x00000182)
0x12345678 => 0x82A4C6E1 (vs 0x82A4C6E1)
0x9ABCDEF0 => 0x93B5D7F0 (vs 0x93B5D7F0)
Note the use of uint32_t to avoid undefined behaviour with sign bits in signed integers.
To obtain a bit, you can mask it out using AND. To get the lowest bit, for example:
x & 0x01
Think about how AND works: both bits must be set. Since we're ANDing with 1, all bits except the first must be 0, because they're 0 in 0x01. The lowest bit will be either 0 or 1, depending on what's in x; said differently, the lowest bit will be the lowest bit in x, which is what we want. Visually:
x = abcd
AND 1 = 0001
--------
000d
(where abcd represent the bits in those slots; we don't know what they are)
To move it to bit 3's position, just shift it:
(x & 0x01) << 3
Visually, again:
x & 0x01 = 000d
<< 3
-----------
d000
To add it in, first, we need to clear out that spot in x for our bit. We use AND again:
x & ~0x08
Here, we invert 0x08 (which is 1000 in binary): this means all bits except bit 3 are set, and when we AND that with x, we get x except for that bit.
Visually,
0x08 = 1000
(invert)
-----------
0111
AND x = abcd
------------
0bcd
Combine with OR:
(x & ~0x08) | ((x & 0x01) << 3)
Visually,
x & ~0x08 = 0bcd
| ((x & 0x01) << 3) = d000
--------------------------
dbcd
Now, this only moves bit 0 to bit 3, and just overwrites bit 3. We still need to do bit 3 → 0. That's simply another:
x & 0x08 >> 3
And we need to clear out its spot:
x & ~0x01
We can combine the two clearing pieces:
x & ~0x09
And then:
(x & ~0x09) | ((x & 0x01) << 3) | ((x & 0x08) >> 3)
That of course handles only the lowest nibble. I'll leave the others as an exercise.
Try below code . Here you should know bitwise operator to implement and correct position to place.Also needs to aware of maintenance ,shifting and toggling basic properties.
#include<stdio.h>
#define BITS_SWAP(x) x=(((x & 0x88888888)>>3) | ((x & 0x11111111)<<3)) | ((x & ~ (0x88888888 | 0x11111111)))
int main()
{
int data=0;
printf("enter the data in hex=0x");
scanf("%x",&data);
printf("bits=%x",BITS_SWAP(data));
return 0;
}
OP
vinay#vinay-VirtualBox:~/c_skill$ ./a.out
enter the data in hex=0x1
bits=8
vinay#vinay-VirtualBox:~/c_skill$ ./a.out
enter the data in hex=0x812
bits=182
vinay#vinay-VirtualBox:~/c_skill$ ./a.out
enter the data in hex=0x12345678
bits=82a4c6e1
vinay#vinay-VirtualBox:~/c_skill$
Try this variant of the xor swap:
uint32_t switch_bits(uint32_t a){
static const mask = 0x11111111;
a ^= (a & mask) << 3;
a ^= (a >> 3) & mask;
a ^= (a & mask) << 3;
return a;
}
Move the low bits to the high bits and mask out the resulting bits.
Move the high bits to the low bits and mask out the resulting bits.
Mask out all bits that have not been moved.
Combine the results with ORs.
Code:
unsigned SwitchBits(unsigned n) {
return ((n << 3) & 0x88888888) | ((n >> 3) & 0x11111111) | (n & 0x66666666);
}
Alternativly, if you would like to be very clever. It can be done with two fewer operations, though this may not actually be faster due to some of the dependicies between instrutions.
Move the high bits to align with the low bits
XOR recording a 0 in the low bit if high an low bits are the same, and a 1 if they are different.
From this, mask out only the low bit of each nibble.
From this, multiply by 9, this will keep the low bit as is, and also copy it to the high bit.
From this, XOR with the original value. in the case that the high and low bit are the same, no change will correctly occure. In the case they are different, they will be effectivly exchanged.
Code:
unsigned SwitchBits(unsigned n) {
return ((((n >> 3) ^ n) & 0x11111111) * 0x9) ^ n;
}