Problem:
I have a sequence of bits with indices 7 6 5 4 3 2 1 0 and I want to swizzle them in the following way:
7 6 5 4 3 2 1 0 = 7 6 5 4 3 2 1 0
_____| | | | | | | |_____
| ___| | | | | |___ |
| | _| | | |_ | |
| | | | | | | |
v v v v v v v v
_ 3 _ 2 _ 1 _ 0 7 _ 6 _ 5 _ 4 _
|___________________|
|
v
7 3 6 2 5 1 4 0
i.e. I want to interleave the bits of the low and high nibbles from a byte.
Naive solution:
I can achieve this behavior in C in the following way:
int output =
((input & (1 << 0)) << 0) |
((input & (1 << 1)) << 1) |
((input & (1 << 2)) << 2) |
((input & (1 << 3)) << 3) |
((input & (1 << 4)) >> 3) |
((input & (1 << 5)) >> 2) |
((input & (1 << 6)) >> 1) |
((input & (1 << 7)) >> 0);
However it's obviously very clunky.
Striving for a more elegant solution:
I was wondering if there were something I could do to achieve this behavior faster, in fewer machine instructions. Using SSE, for example?
Some context for curious people :
I use this for packing 2d signed integer vector coordinates into a 1d value that conserves proximity when dealing with memory and caching. The idea is similar to some texture layouts optimization used by some GPUs on mobile devices.
(i ^ 0xAAAAAAAA) - 0xAAAAAAAA converts from 1d integer to 1d signed integer with this power of two proximity I was talking about.
(x + 0xAAAAAAAA) ^ 0xAAAAAAAA is just the reverse operation, going from 1d signed integer to a 1d integer, still with the same properties.
To have it become 2d and keep the proximity property, I want to alternate the x and y bits.
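The two conversion expressions above can be checked with a small sketch. The helper names `index_to_signed` and `signed_to_index` are made up for illustration, and the arithmetic is done in `uint32_t` (an assumption, so that wrap-around is well defined); the signed value is simply the same bit pattern read as two's complement.

```c
#include <stdint.h>

/* (i ^ 0xAAAAAAAA) - 0xAAAAAAAA: 1d index -> signed coordinate,
   returned as its two's-complement bit pattern. */
static uint32_t index_to_signed(uint32_t i) {
    return (i ^ 0xAAAAAAAAu) - 0xAAAAAAAAu;
}

/* (x + 0xAAAAAAAA) ^ 0xAAAAAAAA: the reverse mapping. */
static uint32_t signed_to_index(uint32_t x) {
    return (x + 0xAAAAAAAAu) ^ 0xAAAAAAAAu;
}
```

Applying one function after the other should give back the starting value for any 32-bit input.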
So you want to interleave the bits of the low and high nibbles in each byte? For scalar code a 256-byte lookup table (LUT) is probably your best bet.
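A minimal sketch of the scalar lookup-table idea, assuming the table is filled once at startup. The names `interleave_lut`, `init_interleave_lut` and `interleave_nibbles` are hypothetical, and the slow reference loop is only there to build the table:

```c
#include <stdint.h>

static uint8_t interleave_lut[256];

/* Reference version: bit i of the low nibble goes to bit 2i,
   bit i of the high nibble goes to bit 2i+1. */
static uint8_t interleave_nibbles_slow(uint8_t v) {
    uint8_t out = 0;
    for (int i = 0; i < 4; i++) {
        out |= (uint8_t)(((v >> i) & 1u) << (2 * i));
        out |= (uint8_t)(((v >> (i + 4)) & 1u) << (2 * i + 1));
    }
    return out;
}

/* Fill the 256-entry table once. */
static void init_interleave_lut(void) {
    for (int v = 0; v < 256; v++)
        interleave_lut[v] = interleave_nibbles_slow((uint8_t)v);
}

/* Per-byte work is then a single load. */
static uint8_t interleave_nibbles(uint8_t v) {
    return interleave_lut[v];
}
```

The table could equally be written out as a constant array so that no initialization call is needed.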
For x86 SIMD, SSSE3 pshufb (_mm_shuffle_epi8) can be used as a LUT that does 16 nibble->byte lookups in parallel. Use this to unpack a nibble to a byte.
__m128i interleave_high_low_nibbles(__m128i v) {
const __m128i lut_unpack_bits_low = _mm_setr_epi8( 0, 1, 0b00000100, 0b00000101,
... // dcba -> 0d0c0b0a
);
const __m128i lut_unpack_bits_high = _mm_slli_epi32(lut_unpack_bits_low, 1);
// dcba -> d0c0b0a0
// ANDing is required because pshufb uses the high bit to zero that element
// 8-bit element shifts aren't available so also we have to mask after shifting
__m128i lo = _mm_and_si128(v, _mm_set1_epi8(0x0f));
__m128i hi = _mm_and_si128(_mm_srli_epi32(v, 4), _mm_set1_epi8(0x0f));
lo = _mm_shuffle_epi8(lut_unpack_bits_low, lo);
hi = _mm_shuffle_epi8(lut_unpack_bits_high, hi);
return _mm_or_si128(lo, hi);
}
This is not faster than a memory LUT for a single byte, but it does 16 bytes in parallel. pshufb is a single-uop instruction on x86 CPUs made in the last decade. (Slow on first-gen Core 2 and K8.)
Having separate lo/hi LUT vectors means that setup can be hoisted out of a loop; otherwise we'd need to shift one LUT result before ORing together.
I am a beginner in C. I have code for Towers of Hanoi, but can someone explain to me what these bitwise operators are doing? I.e., if the value of i is 1, what will the source and target values be?
source = (i & i-1) % 3;
target = ((i | i-1) + 1) % 3;
i & i-1 turns off the lowest set bit in i (if there are any set). For example, consider i=200:
200 in binary is 1100 1000. (The space is inserted for visual convenience.)
To subtract one, the zeros cause us to “borrow” from the next position until we reach a one, producing 1100 0111. Note that, working from the right, all the zeros became ones, and the first one became a zero.
The & produces the bits that are set in both operands. Since i-1 changed all the bits up to the first one, those bits are clear in the &—none of the changed bits are the same in both i and i-1, so none of them is a one in both. The other ones in i, above the lowest one bit, are the same in both i and i-1, so they remain ones in i & i-1. The result of i & i-1 is 1100 0000.
1100 0000 is 1100 1000 with the lowest set bit turned off.
Then the % 3 is selecting which pole in Towers of Hanoi to use as the source. This is discussed in this question.
Similarly i | i-1 turns on all the low zeros in i, all the zeros up to the lowest one bit. Then (i | i-1) + 1 adds one to that. The result is the same as adding one to the lowest one bit in i. That is, the result is i + x, where x is the lowest bit set in i. Using our example value:
i is 1100 1000 and i-1 is 1100 0111.
i | i-1 is 1100 1111.
(i | i-1) + 1 is 1101 0000, which equals 1100 1000 + 0000 1000.
And again, the % 3 selects a pole.
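The two bit tricks described above can be written as small helpers. This is only a sketch; the function names are made up for illustration, and `unsigned` is used so that `i - 1` is well defined when `i` is 0.

```c
/* i & (i - 1): clears the lowest set bit of i (0 stays 0). */
static unsigned clear_lowest_set_bit(unsigned i) {
    return i & (i - 1u);
}

/* (i | (i - 1)) + 1: same as adding the lowest set bit of i to i. */
static unsigned add_lowest_set_bit(unsigned i) {
    return (i | (i - 1u)) + 1u;
}
```

With the example value 200 (1100 1000), the first helper gives 192 (1100 0000) and the second gives 208 (1101 0000), matching the walkthrough.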
A quick overview of bitwise operators:
Each binary operator pairs up the corresponding bits of its two operands and applies the operation to each pair independently.
& Bitwise AND
True only if both bits are true.
Truth table:
A | B | A & B
-------------
0 | 0 | 0
1 | 0 | 0
0 | 1 | 0
1 | 1 | 1
| Bitwise OR
True if either bit is true.
Truth table:
A | B | A | B
-------------
0 | 0 | 0
1 | 0 | 1
0 | 1 | 1
1 | 1 | 1
^ Bitwise XOR
True if exactly one of the two bits is true.
Truth table:
A | B | A ^ B
-------------
0 | 0 | 0
1 | 0 | 1
0 | 1 | 1
1 | 1 | 0
~ Bitwise NOT
Inverts each bit. 1 -> 0, 0 -> 1. This is a unary operator.
Truth table:
A | ~A
------
0 | 1
1 | 0
In your case, if i = 1,
the expressions would be evaluated as:
source = (1 & 1-1) % 3;
target = ((1 | 1-1) + 1) % 3;
// =>
source = (1 & 0) % 3;
target = ((1 | 0) + 1) % 3;
// =>
source = 0 % 3;
target = (1 + 1) % 3;
// =>
source = 0;
target = 2 % 3;
// =>
source = 0;
target = 2;
Good answer above, here is a high-level approach:
i == 1:
source: (1 & 0). Is there any bit position where both operands have a 1? No, there is not. So the overall result is 0, and 0 % 3 = 0.
target: ((1 | 0) + 1) % 3.
(1 | 0) evaluates to 1, since the low bit is set in one of the two operands of the | operator, so now we have (1 + 1). It then follows that 2 % 3 = 2.
Source: 0, target: 2
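To see the two expressions at work over a whole game, here is a hedged sketch that replays the moves for i = 1 .. 2^n - 1 on three pegs and checks that each move is legal. The function `simulate_hanoi` and the peg representation are assumptions made for this illustration, not part of the original code.

```c
/* Replays the moves generated by the two expressions.
   Returns the peg that holds all n disks at the end,
   or -1 if some move was illegal. */
static int simulate_hanoi(int n) {
    int peg[3][32];              /* peg[p][k]: size of the k-th disk from the bottom */
    int height[3] = {0, 0, 0};

    if (n < 1 || n > 20) return -1;
    for (int d = n; d >= 1; d--)
        peg[0][height[0]++] = d; /* largest disk at the bottom of peg 0 */

    for (unsigned i = 1; i < (1u << n); i++) {
        unsigned source = (i & (i - 1)) % 3;
        unsigned target = ((i | (i - 1)) + 1) % 3;

        if (height[source] == 0) return -1;            /* nothing to move */
        int disk = peg[source][height[source] - 1];
        if (height[target] > 0 && peg[target][height[target] - 1] < disk)
            return -1;                                 /* larger disk on smaller */

        height[source]--;
        peg[target][height[target]++] = disk;
    }

    for (int p = 0; p < 3; p++)
        if (height[p] == n) return p;
    return -1;
}
```

For three disks the first move is 0 -> 2, as computed by hand above, and the tower ends up complete on peg 2.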
All of the bit shuffling algorithms I've found deal with 16-bit or 32-bit, which means that even if I use only the first 25-bits of an int, the shuffle will leave bits outside. This function is in an inner loop of a CPU-intensive process so I'd prefer it to be as fast as possible. I've tried modifying the code of the Hacker's Delight 32-bit shuffle algorithm
x = (x & 0x0000FF00) << 8 | (x >> 8) & 0x0000FF00 | x & 0xFF0000FF;
x = (x & 0x00F000F0) << 4 | (x >> 4) & 0x00F000F0 | x & 0xF00FF00F;
x = (x & 0x0C0C0C0C) << 2 | (x >> 2) & 0x0C0C0C0C | x & 0xC3C3C3C3;
x = (x & 0x22222222) << 1 | (x >> 1) & 0x22222222 | x & 0x99999999;
but I am having difficulty doing so, partly because I'm not sure where the masks come from. I tried shifting the number and re-shuffling, but so far the results have all been for naught. Any help would be GREATLY appreciated!
(I am using C but I can convert an algorithm from another language)
First, for the sake of evenness, we can extend the problem to a 26-bit shuffle by remembering that bit 25 will appear at the end of the interleaved list, so we can trim it off after the interleaving operation without affecting the positions of the other bits.
Now we want to interleave the first and second sets of 13 bits; but we only have an algorithm to interleave the first and second sets of 16 bits.
A straightforward approach might be to just move the high and low parts of x into more workable positions before applying the standard algorithm:
x = (x & 0x1ffe000) << 3 | x & 0x00001fff;
x = (x & 0x0000FF00) << 8 | (x >> 8) & 0x0000FF00 | x & 0xFF0000FF;
x = (x & 0x00F000F0) << 4 | (x >> 4) & 0x00F000F0 | x & 0xF00FF00F;
x = (x & 0x0C0C0C0C) << 2 | (x >> 2) & 0x0C0C0C0C | x & 0xC3C3C3C3;
x = (x & 0x22222222) << 1 | (x >> 1) & 0x22222222 | x & 0x99999999;
The zeroes at the top of each half will be interleaved and appear at the top of the result.
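Wrapped up as functions, the approach above might look like the following sketch. The names `shuffle32` and `shuffle25` are hypothetical, and `uint32_t` is assumed so that the shifts behave the same everywhere.

```c
#include <stdint.h>

/* The standard 32-bit shuffle quoted above: interleaves the
   low 16 bits with the high 16 bits. */
static uint32_t shuffle32(uint32_t x) {
    x = (x & 0x0000FF00u) << 8 | ((x >> 8) & 0x0000FF00u) | (x & 0xFF0000FFu);
    x = (x & 0x00F000F0u) << 4 | ((x >> 4) & 0x00F000F0u) | (x & 0xF00FF00Fu);
    x = (x & 0x0C0C0C0Cu) << 2 | ((x >> 2) & 0x0C0C0C0Cu) | (x & 0xC3C3C3C3u);
    x = (x & 0x22222222u) << 1 | ((x >> 1) & 0x22222222u) | (x & 0x99999999u);
    return x;
}

/* 25-bit variant: move bits 13..24 up to start at bit 16, keep
   bits 0..12 in place, then run the 32-bit shuffle. */
static uint32_t shuffle25(uint32_t x) {
    x = (x & 0x1ffe000u) << 3 | (x & 0x00001fffu);
    return shuffle32(x);
}
```

Bit i of the low 13 bits lands at position 2i and bit j of the upper part lands at position 2j + 1, so the result fits in the low 25 bits.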
~ & ^ | + << >> are the only operations I can use
Before I continue, this is a homework question, I've been stuck on this for a really long time.
My original approach: I thought that !x could be done with two's complement and doing something with its additive inverse. I know that an xor is probably in here, but I'm really at a loss as to how to approach this.
For the record: I also cannot use conditionals, loops, ==, etc, only the functions (bitwise) I mentioned above.
For example:
!0 = 1
!1 = 0
!anything besides 0 = 0
Assuming a 32 bit unsigned int:
(((x>>1) | (x&1)) + ~0U) >> 31
should do the trick
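As a function, the expression above could be sketched like this (the name `logical_not_u32` is made up, and `uint32_t` stands in for the assumed 32-bit unsigned int):

```c
#include <stdint.h>

/* (x >> 1) | (x & 1) is nonzero exactly when x is nonzero, and its
   top bit is always clear. Adding ~0 (i.e. subtracting 1) wraps to
   0xFFFFFFFF only when that value was 0, so bit 31 ends up being !x. */
static uint32_t logical_not_u32(uint32_t x) {
    return (((x >> 1) | (x & 1u)) + ~(uint32_t)0) >> 31;
}
```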
Assuming x is signed, need to return 0 for any number not zero, and 1 for zero.
A right shift on a signed integer is an arithmetic shift in most implementations (i.e. the sign bit is copied over). Therefore right shift x by 31 and its negation by 31. If x is nonzero, one of those two is a negative number, and right shifting it by 31 gives 0xFFFFFFFF (if x = 0, both right shifts produce 0x0, which is what you want). You don't know whether x or its negation is the negative number, so just 'or' them together and you will get what you want. Next add 1 and you're good.
implementation:
int bang(int x) {
return ((x >> 31) | ((~x + 1) >> 31)) + 1;
}
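For comparison, here is a variant of the same idea on an unsigned type, which does not rely on how signed right shifts behave. This is a substitute sketch, not the answer's code; `bang_u32` is a hypothetical name, and it still uses only operators from the allowed list.

```c
#include <stdint.h>

/* ~x + 1 is -x in two's complement. x | -x has bit 31 set exactly
   when x != 0, so shifting it down and flipping it gives !x. */
static uint32_t bang_u32(uint32_t x) {
    return ((x | (~x + 1u)) >> 31) ^ 1u;
}
```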
The following code copies any 1 bit to all positions. This maps all non-zeroes to 0xFFFFFFFF == -1, while leaving 0 at 0. Then it adds 1, mapping -1 to 0 and 0 to 1.
x = x | x << 1 | x >> 1;
x = x | x << 2 | x >> 2;
x = x | x << 4 | x >> 4;
x = x | x << 8 | x >> 8;
x = x | x << 16 | x >> 16;
x = x + 1;
For 32 bit signed integer x
// Set the bottom bit if any bit set.
x |= x >> 1;
x |= x >> 2;
x |= x >> 4;
x |= x >> 8;
x |= x >> 16;
x ^= 1; // Toggle the bottom bit - now 0 if any bit set.
x &= 1; // Clear the unwanted bits to leave 0 or 1.
Assuming e.g. an 8-bit unsigned type:
~(((x >> 0) & 1)
| ((x >> 1) & 1)
| ((x >> 2) & 1)
...
| ((x >> 7) & 1)) & 1
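If x is known to be either 0 or 1, you can just do ~x & 1, because it yields 1 for 0 and 0 for 1. It does not work for arbitrary values: it only looks at the lowest bit, so it also yields 1 for every other even number (for example x = 2).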
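A short sketch that contrasts the two behaviours: `~x & 1` only inspects bit 0, while the shift-and-OR fold shown earlier looks at every bit. Both function names are invented for this example, and a 32-bit unsigned input is assumed.

```c
#include <stdint.h>

/* Only looks at bit 0: correct for x in {0, 1}, but returns 1
   for any even x. */
static uint32_t not_bit0_only(uint32_t x) {
    return ~x & 1u;
}

/* Folds every set bit down into bit 0 first, then inverts it. */
static uint32_t not_fold(uint32_t x) {
    x |= x >> 1;
    x |= x >> 2;
    x |= x >> 4;
    x |= x >> 8;
    x |= x >> 16;
    return (x ^ 1u) & 1u;
}
```

For x = 2 the first function returns 1 while the second returns 0, which is what a logical NOT should give.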
for example:
unsigned int a; // value to merge in non-masked bits
unsigned int b; // value to merge in masked bits
unsigned int mask; // 1 where bits from b should be selected; 0 where from a.
unsigned int r; // result of (a & ~mask) | (b & mask) goes here
r = a ^ ((a ^ b) & mask);
merges bits from two values according to the mask.
[taken from here]
In this case, I can see that it works, but I am not sure what the logic is? And I am not sure I can create my own bit operations like this from scratch. How do I start thinking in bits?
Pencil and paper helps the best in cases like this. I usually write it down:
a = 10101110
b = 01100011
mask = 11110000
a ^ b = 10101110
01100011
--------
x => 11001101
x & mask = 11001101
11110000
--------
x => 11000000
a ^ x = 11000000
10101110
--------
x => 01101110
(final x is your r)
I don't know if this is the result you were after, but that's what it does. Writing it out usually helps when I don't understand a bitwise operation.
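The pencil-and-paper run above can be reproduced in C with a tiny helper. This is just a sketch; `merge_bits` is a name chosen here, wrapping the expression from the question.

```c
/* Takes bits from b where mask is 1 and from a where mask is 0. */
static unsigned merge_bits(unsigned a, unsigned b, unsigned mask) {
    return a ^ ((a ^ b) & mask);
}
```

With a = 0xAE (10101110), b = 0x63 (01100011) and mask = 0xF0, the result is 0x6E (01101110): the high nibble of b followed by the low nibble of a.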
In this case, I can see that it works, but I am not sure what the logic is? And I am not sure I can create my own bit operations like this from scratch. How do I start thinking in bits?
People have answered your first question -- explaining the logic. I shall hopefully show you a terribly basic, long-winded but standard method of making any bit twiddling operations. (note, once you get used to working with bits you'll start thinking in & and | straight off without doing such nonsense).
Figure out what you'd like your operation to do.
Write out a FULL truth table.
Either read the sum-of-products directly from the table or make a Karnaugh map. The K-map will reduce the final equation a lot.
???
Profit
Deriving for the example you gave. ie, where a mask selects bits from A or B. (0 is A, 1 is B)
This table is for 1 bit per input. I'm not doing more than one bit, as I don't want to waste my time :) (Why? 2^(2 bits * 3 inputs) = 64 cases :( and 2^(3 bits * 3 inputs) = 512 cases :( )
But the good news is that in this case the operation is independent of the number of bits, so a 1-bit example is 100% fine. In fact, I recommend it :)
| A | B | M || R |
============++====
| 0 | 0 | 0 || 0 |
| 0 | 0 | 1 || 0 |
| 0 | 1 | 0 || 0 |
| 0 | 1 | 1 || 1 |
| 1 | 0 | 0 || 1 |
| 1 | 0 | 1 || 0 |
| 1 | 1 | 0 || 1 |
| 1 | 1 | 1 || 1 |
Hopefully you can see how this truth table works.
How do we get an expression from this? Two methods: K-maps and by hand. Let's do it by hand first, shall we? :)
Looking at the points where R is true, we see:
| A | B | M || R |
============++====
| 0 | 1 | 1 || 1 |
| 1 | 0 | 0 || 1 |
| 1 | 1 | 0 || 1 |
| 1 | 1 | 1 || 1 |
From this we can derive an expression:
R = (~A & B & M) |
( A & ~B & ~M) |
( A & B & ~M) |
( A & B & M)
Hopefully you can see how this works: just OR together the full expressions seen in each case. By "full" I mean that you need to include the negated variables in there too.
Let's try it in Python:
a = 0xAE  # 10101110b
b = 0x63  # 01100011b
m = 0xF0  # 11110000b
r = (~a & b & m) | ( a & ~b & ~m) | ( a & b & ~m) | ( a & b & m)
print(hex(r))
OUTPUT:
0x6e
These numbers are from Abel's example. The output is 0x6E, which is 01101110b.
So it worked! Hurrah. (ps, it's possible to derive an expression for ~r from the first table, should you wish to do so. Just take the cases where r is 0).
This expression you've made is a boolean "sum of products", aka Disjunctive Normal Form (DNF), a term borrowed from formal logic. This expression is also pretty unwieldy. Making it smaller is a tedious thing to do on paper, and is the kind of thing you'll do 500,000 times at Uni on a CS degree if you take the compiler or hardware courses. (Highly recommended :))
So let's do some boolean algebra magic on this (don't try and follow this, it's a waste of time):
(~a & b & m) | ( a & ~b & ~m) | ( a & b & ~m) | ( a & b & m)
|= ((~a & b & m) | ( a & ~b & ~m)) | ( a & b & ~m) | ( a & b & m)
take that first sub-clause that I made:
((~a & b & m) | ( a & ~b & ~m))
|= (~a | (a & ~b & ~m)) & (b | ( a & ~b & ~m)) & (m | ( a & ~b & ~m))
|= ((~a | a) & (~a | ~b) & (~a | ~m)) & (b | ( a & ~b & ~m)) & (m | ( a & ~b & ~m))
|= (T & (~a | ~b) & (~a | ~m)) & (b | ( a & ~b & ~m)) & (m | ( a & ~b & ~m))
|= ((~a | ~b) & (~a | ~m)) & (b | ( a & ~b & ~m)) & (m | ( a & ~b & ~m))
etc etc etc. This is the massively tedious bit, in case you didn't guess. So just whack the expression into a website of your choice, which will tell you
r = (a & ~m) | (b & m)
Hurrah! Correct result. Note, it might even go so far as giving you an expression involving XORs, but who cares? Actually, some people do, as the expression with ands and ors is 4 operations (1 or, 2 and, 1 neg), whilst r = a ^ ((a ^ b) & mask) is 3 (2 xor, 1 and).
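One way to convince yourself that the three forms agree is to evaluate them on inputs whose bits enumerate the whole truth table at once. In this sketch the function names are made up, and the constants 0xF0, 0xCC and 0xAA are chosen so that bit k of (a, b, m) spells out row k of the table.

```c
/* Sum of products read straight from the truth table. */
static unsigned merge_sop(unsigned a, unsigned b, unsigned m) {
    return ((~a & b & m) | (a & ~b & ~m) | (a & b & ~m) | (a & b & m)) & 0xFFu;
}

/* Simplified AND/OR form: 1 or, 2 and, 1 neg. */
static unsigned merge_and_or(unsigned a, unsigned b, unsigned m) {
    return ((a & ~m) | (b & m)) & 0xFFu;
}

/* XOR form: 2 xor, 1 and. */
static unsigned merge_xor(unsigned a, unsigned b, unsigned m) {
    return (a ^ ((a ^ b) & m)) & 0xFFu;
}
```

Called with a = 0xF0, b = 0xCC, m = 0xAA, all three return 0xD8 (11011000): reading that from bit 0 upwards gives the R column of the table, 0 0 0 1 1 0 1 1.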
Now, how do you do it with kmaps? Well, first you need to know how to make them, I'll leave you to do that. :) Just google for it. There's software available, but I think it's best to do it by hand -- it's more fun and the programs don't allow you to cheat.
Cheat? Well, if you have lots of inputs, it's often best to reduce the table like so:
| A | B | M || R |
============++====
| X | X | 0 || A |
| X | X | 1 || B |
eg that 64 case table?
| A1| A0| B1| B0| M1| M0|| R1| R0|
========================++========
| X | X | X | X | 0 | 0 || A1| A0|
| X | X | X | X | 0 | 1 || A1| B0|
| X | X | X | X | 1 | 0 || B1| A0|
| X | X | X | X | 1 | 1 || B1| B0|
Boils down to 4 cases in this example :)
(Where X is "don't care".) Then put that table in your Kmap. Once again, an exercise for you to work out [ie, I've forgotten how to do this].
Hopefully you can now derive your own boolean madness, given a set of inputs and an expected set of outputs.
Have fun.
In order to create boolean expressions like that one, I think you'd have to learn some boolean algebra.
This looks good:
http://www.allaboutcircuits.com/vol_4/chpt_7/1.html
It even has a page on generating boolean expressions from truth tables.
It also has a section on Karnaugh Maps. To be honest, I've forgotten what those are, but they look like they could be useful for what you want to do.
http://www.allaboutcircuits.com/vol_4/chpt_8/1.html
a ^ x for some x gives the result of flipping those bits in a which are set in x.
a ^ b gives you a 1 where the bits in a and b differ, a 0 where they are the same.
Setting x to (a ^ b) & mask gives the result of flipping the bits in a which are different in a and b and are set in the mask. Thus a ^ ((a ^ b) & mask) gives the result of changing, where necessary, the values of the bits which are set in the mask from the value they take in a to the value they take in b.
The basis for most bitwise operations (&, |, ^, ~) is Boolean algebra. Think of performing Boolean algebra on multiple Boolean values in parallel and you've got the bitwise operators. The only operators this doesn't cover are the shift operators (<<, >>), which you can think of as shifting the bits or as multiplication by powers of two. For non-negative x (and no overflow), x << 3 == x * pow(2,3) and x >> 4 == (int)(x * pow(2,-4)).
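The shift-as-arithmetic view can be sketched with two one-line helpers. The names are hypothetical, and unsigned operands are assumed so that the equivalence holds without caveats about negative values (overflow of the left shift is still the caller's problem).

```c
/* x << k multiplies by 2^k (as long as the result fits). */
static unsigned mul_pow2(unsigned x, unsigned k) {
    return x << k;
}

/* x >> k divides by 2^k, discarding the remainder. */
static unsigned div_pow2(unsigned x, unsigned k) {
    return x >> k;
}
```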
Thinking in bits is not that hard; you just need to convert, in your head, all the values into bits and work on them a bit at a time. That sounds hard, but it does get easier over time. A good first step is to start thinking of them as hex digits (4 bits at a time).
For example, let's say a is 0x13, b is 0x22 and mask is 0x0f:
a : 0x13 : 0001 0011
b : 0x22 : 0010 0010
---------------------------------
a^b : 0x31 : 0011 0001
mask : 0x0f : 0000 1111
---------------------------------
(a^b)&mask : 0x01 : 0000 0001
a : 0x13 : 0001 0011
---------------------------------
a^((a^b)&mask) : 0x12 : 0001 0010
This particular example is a way to combine the top four bits of a with the bottom 4 bits of b (the mask decides which bits come from a and which from b).
As the site says, it's an optimization of (a & ~mask) | (b & mask):
a : 0x13 : 0001 0011
~mask : 0xf0 : 1111 0000
---------------------------------
a & ~mask : 0x10 : 0001 0000
b : 0x22 : 0010 0010
mask : 0x0f : 0000 1111
---------------------------------
b & mask : 0x02 : 0000 0010
a & ~mask : 0x10 : 0001 0000
b & mask : 0x02 : 0000 0010
---------------------------------
(a & ~mask) | : 0x12 : 0001 0010
(b & mask)
Aside: I wouldn't be overly concerned about not understanding something on that page you linked to. There's some serious "black magic" going on there. If you really want to understand bit fiddling, start with unoptimized ways of doing it.
First learn the logical (that is, 1-bit) operators well. Try to write down some rules, like
a && b = b && a
1 && a = a
1 || a = 1
0 && a = ... //you get the idea. Come up with as many as you can.
Include the "logical" xor operator:
1 ^^ b = !b
0 ^^ b = ...
Once you have a feel for these, move onto bitwise operators. Try some problems, look at some common tricks and techniques. With a bit of practice you'll feel much more confident.
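Rules like these are easy to check mechanically. C has no `^^` operator, so the sketch below spells logical xor as `!a != !b`, which is one common idiom; the function name `logical_xor` is an assumption of this example.

```c
/* Logical xor: true when exactly one of a, b is nonzero.
   The ! normalizes each operand to 0 or 1 before comparing. */
static int logical_xor(int a, int b) {
    return !a != !b;
}
```

Looping over a and b in {0, 1} and comparing both sides of each rule is a quick way to confirm the identities you come up with.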
Break the expression down into individual bits. Consider a single bit position in the expression (a ^ b) & mask. If the mask has zero at that bit position, (a ^ b) & mask will simply give you zero. Any bit xor'ed with zero will remain unchanged, so a ^ (a ^ b) & mask will simply return a's original value.
If the mask has a 1 at that bit position, (a ^ b) & mask will simply return the value of a ^ b. Now if we xor the value with a, we get a ^ (a ^ b) = (a ^ a) ^ b = b. This is a consequence of a ^ a = 0 -- any value xor'ed with itself will return zero. And then, as previously mentioned, zero xor'ed with any value will just give you the original value.
How to think in bits:
Read up on what others have done, and take note of their strategies. The Stanford site you link to is a pretty good resource -- there are often several techniques shown for a particular operation, allowing you to see the problem from different angles. You might have noticed that there are people who've submitted their own alternative solutions for a particular operation, which were inspired by the techniques applied to a different operation. You could take the same approach.
Also, it might help you to remember a handful of simple identities, which you can then string together for more useful operations. IMO listing out the results for each bit-combination is only useful for reverse-engineering someone else's work.
Maybe you don't need to think in bits - perhaps you can get your compiler to think in bits for you
and you can focus on the actual problem you're trying to solve instead. Using bit manipulation directly
in your code can produce some profoundly impenetrable (if impressive) code -- here are some nice macros
(from the Windows DDK) that demonstrate this:
// from ntifs.h
// These macros are used to test, set and clear flags respectivly
#define FlagOn(_F,_SF) ((_F) & (_SF))
#define BooleanFlagOn(F,SF) ((BOOLEAN)(((F) & (SF)) != 0))
#define SetFlag(_F,_SF) ((_F) |= (_SF))
#define ClearFlag(_F,_SF) ((_F) &= ~(_SF))
Now if you want to set a flag in a value you can simply say SetFlag(x, y), which is much clearer, I think. Moreover,
if you focus on the problem you're trying to address with your bit fiddling, the mechanics will become
second nature without you having to expend any effort. Look after the bits and the bytes will look after
themselves!
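A small usage sketch of macros in this style. The three macros are re-declared locally so the example compiles without the DDK headers, and the `DEMO_FLAG_*` constants and `demo_flags` function are invented purely for illustration.

```c
/* Local copies in the style of the ntifs.h macros quoted above. */
#define FlagOn(_F,_SF)     ((_F) & (_SF))
#define SetFlag(_F,_SF)    ((_F) |= (_SF))
#define ClearFlag(_F,_SF)  ((_F) &= ~(_SF))

/* Hypothetical flag values, one bit each. */
#define DEMO_FLAG_READ   0x1u
#define DEMO_FLAG_WRITE  0x2u

/* Sets both flags, then clears READ again; returns the final flag word. */
static unsigned demo_flags(void) {
    unsigned flags = 0;
    SetFlag(flags, DEMO_FLAG_READ);
    SetFlag(flags, DEMO_FLAG_WRITE);
    ClearFlag(flags, DEMO_FLAG_READ);
    return flags;
}
```

After the three calls only the WRITE bit remains set, so `FlagOn(demo_flags(), DEMO_FLAG_WRITE)` is nonzero and `FlagOn(demo_flags(), DEMO_FLAG_READ)` is zero.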