I am looking for an algorithm to shuffle the first 25 bits of a (32-bit) int

All of the bit shuffling algorithms I've found deal with 16-bit or 32-bit values, which means that even if I use only the first 25 bits of an int, a 32-bit shuffle will scatter those 25 bits across the full result, interleaved with the unused upper bits. This function is in the inner loop of a CPU-intensive process, so I'd prefer it to be as fast as possible. I've tried modifying the code of the Hacker's Delight 32-bit shuffle algorithm
x = (x & 0x0000FF00) << 8 | (x >> 8) & 0x0000FF00 | x & 0xFF0000FF;
x = (x & 0x00F000F0) << 4 | (x >> 4) & 0x00F000F0 | x & 0xF00FF00F;
x = (x & 0x0C0C0C0C) << 2 | (x >> 2) & 0x0C0C0C0C | x & 0xC3C3C3C3;
x = (x & 0x22222222) << 1 | (x >> 1) & 0x22222222 | x & 0x99999999;
but am having difficulty adapting it, partly because I'm not sure where the masks come from. I tried shifting the number and re-shuffling, but so far the results have all been for naught. Any help would be GREATLY appreciated!
(I am using C but I can convert an algorithm from another language)

First, for the sake of evenness, we can extend the problem to a 26-bit shuffle by remembering that bit 25 will appear at the end of the interleaved list, so we can trim it off after the interleaving operation without affecting the positions of the other bits.
Now we want to interleave the first and second sets of 13 bits; but we only have an algorithm to interleave the first and second sets of 16 bits.
A straightforward approach might be to just move the high and low parts of x into more workable positions before applying the standard algorithm:
x = (x & 0x1ffe000) << 3 | x & 0x00001fff;
x = (x & 0x0000FF00) << 8 | (x >> 8) & 0x0000FF00 | x & 0xFF0000FF;
x = (x & 0x00F000F0) << 4 | (x >> 4) & 0x00F000F0 | x & 0xF00FF00F;
x = (x & 0x0C0C0C0C) << 2 | (x >> 2) & 0x0C0C0C0C | x & 0xC3C3C3C3;
x = (x & 0x22222222) << 1 | (x >> 1) & 0x22222222 | x & 0x99999999;
The zeroes at the top of each half will be interleaved and appear at the top of the result.
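Putting the pieces together, a minimal sketch of the whole routine under this approach (shuffle25 is an illustrative name; the input is assumed to be confined to its low 25 bits):
#include <stdint.h>

uint32_t shuffle25(uint32_t x)
{
    /* Move bits 13..24 up by 3 so each 13-bit half sits in its own 16-bit
       half (bit 25 is assumed zero, so it can be left out of the mask). */
    x = (x & 0x01ffe000) << 3 | (x & 0x00001fff);
    /* Standard Hacker's Delight 32-bit perfect outer shuffle. */
    x = (x & 0x0000FF00) << 8 | (x >> 8) & 0x0000FF00 | x & 0xFF0000FF;
    x = (x & 0x00F000F0) << 4 | (x >> 4) & 0x00F000F0 | x & 0xF00FF00F;
    x = (x & 0x0C0C0C0C) << 2 | (x >> 2) & 0x0C0C0C0C | x & 0xC3C3C3C3;
    x = (x & 0x22222222) << 1 | (x >> 1) & 0x22222222 | x & 0x99999999;
    /* The interleaved result occupies the low 25 bits; mask off the
       zero bits that were interleaved in above. */
    return x & 0x01ffffff;
}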


Binary Interleaving, Binary Swizzling, Alternating Bits

Problem:
I have a byte whose bits have indices 7 6 5 4 3 2 1 0, and I want to swizzle them so that, reading from the most significant bit down, the result is:
7 3 6 2 5 1 4 0
i.e. I want to interleave the bits of the low and high nibbles from a byte.
Naive solution:
I can achieve this behavior in C as follows:
int output =
    ((input & (1 << 0)) << 0) |
    ((input & (1 << 1)) << 1) |
    ((input & (1 << 2)) << 2) |
    ((input & (1 << 3)) << 3) |
    ((input & (1 << 4)) >> 3) |
    ((input & (1 << 5)) >> 2) |
    ((input & (1 << 6)) >> 1) |
    ((input & (1 << 7)) >> 0);
However it's obviously very clunky.
Striving for a more elegant solution:
I was wondering if there were something I could do to achieve this behavior faster, in fewer machine instructions. Using SSE, for example?
Some context for curious people:
I use this for packing 2d signed integer vector coordinates into a 1d value that conserves proximity when dealing with memory and caching. The idea is similar to some texture layouts optimization used by some GPUs on mobile devices.
(i ^ 0xAAAAAAAA) - 0xAAAAAAAA converts from 1d integer to 1d signed integer with this power of two proximity I was talking about.
(x + 0xAAAAAAAA) ^ 0xAAAAAAAA is just the reverse operation, going from 1d signed integer to a 1d integer, still with the same properties.
To have it become 2d and keep the proximity property, I want to alternate the x and y bits.
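As an aside, here are those two 1d transforms wrapped as code (a sketch; the function names are made up, the formulas are verbatim from above):
#include <stdint.h>

/* 1d integer -> 1d signed integer, keeping the power-of-two proximity. */
static uint32_t to_signed_1d(uint32_t i)   { return (i ^ 0xAAAAAAAAu) - 0xAAAAAAAAu; }

/* The reverse: 1d signed integer -> 1d integer, same property. */
static uint32_t to_unsigned_1d(uint32_t x) { return (x + 0xAAAAAAAAu) ^ 0xAAAAAAAAu; }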
So you want to interleave the bits of the low and high nibbles in each byte? For scalar code a 256-byte lookup table (LUT) is probably your best bet.
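For instance, a sketch of that scalar LUT approach (swizzle_lut and init_swizzle_lut are illustrative names; the table is built once up front):
#include <stdint.h>

static uint8_t swizzle_lut[256];

static void init_swizzle_lut(void)
{
    for (int b = 0; b < 256; b++) {
        uint8_t out = 0;
        for (int i = 0; i < 4; i++) {
            out |= ((b >> i) & 1) << (2 * i);           /* low-nibble bit i  -> bit 2i   */
            out |= ((b >> (4 + i)) & 1) << (2 * i + 1); /* high-nibble bit i -> bit 2i+1 */
        }
        swizzle_lut[b] = out;
    }
}
Each byte is then swizzled with a single load: output = swizzle_lut[input & 0xff];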
For x86 SIMD, SSSE3 pshufb (_mm_shuffle_epi8) can be used as a parallel LUT, doing 16 nibble->byte lookups at once. Use it to unpack each nibble to a byte.
#include <immintrin.h>

__m128i interleave_high_low_nibbles(__m128i v) {
    // dcba -> 0d0c0b0a (the full 16-entry table of the pattern: spread each
    // nibble's 4 bits into the even bit positions)
    const __m128i lut_unpack_bits_low = _mm_setr_epi8(
        0x00, 0x01, 0x04, 0x05, 0x10, 0x11, 0x14, 0x15,
        0x40, 0x41, 0x44, 0x45, 0x50, 0x51, 0x54, 0x55);
    // dcba -> d0c0b0a0: the same table shifted into the odd bit positions
    const __m128i lut_unpack_bits_high = _mm_slli_epi32(lut_unpack_bits_low, 1);
    // ANDing is required because pshufb uses the high bit to zero that element.
    // 8-bit element shifts aren't available, so we also have to mask after shifting.
    __m128i lo = _mm_and_si128(v, _mm_set1_epi8(0x0f));
    __m128i hi = _mm_and_si128(_mm_srli_epi32(v, 4), _mm_set1_epi8(0x0f));
    lo = _mm_shuffle_epi8(lut_unpack_bits_low, lo);
    hi = _mm_shuffle_epi8(lut_unpack_bits_high, hi);
    return _mm_or_si128(lo, hi);
}
This is not faster than a memory LUT for a single byte, but it does 16 bytes in parallel. pshufb is a single-uop instruction on x86 CPUs made in the last decade. (Slow on first-gen Core 2 and K8.)
Having separate lo/hi LUT vectors means that setup can be hoisted out of a loop; otherwise we'd need to shift one LUT result before ORing together.

K&R exercise 2-6 "setbits"

I've seen the answer here: http://clc-wiki.net/wiki/K%26R2_solutions:Chapter_2:Exercise_6
and I've tested the first one, but in this part:
x = 29638;
y = 999;
p = 10;
n = 8;
return (x & ((~0 << (p + 1)) | (~(~0 << (p + 1 - n)))))
on paper it gives me 6, but in the program it returns 28678...
in this part:
111001111000110
&000100000000111
in the result, the left-most three bits have to be 1's, like in x, but the bitwise operator & says:
The output of bitwise AND is 1 if the corresponding bits of all operands are 1. If either bit of an operand is 0, the corresponding result bit evaluates to 0.
So why does it return a number with those 3 bits set to 1?
Here we go, one step at a time (using 16-bit numbers). We start with:
(x & ((~0 << (p + 1)) | (~(~0 << (p + 1 - n)))))
Substituting in numbers (in decimal):
(29638 & ((~0 << (10 + 1)) | (~(~0 << (10 + 1 - 8)))))
Totalling up the bit shift amounts gives:
(29638 & ((~0 << 11) | (~(~0 << 3))))
Rewriting numbers as binary and applying the ~0s...
(0111001111000110 & ((1111111111111111 << 1011) | (~(1111111111111111 << 0011))))
After performing the shifts we get:
(0111001111000110 & (1111100000000000 | (~ 1111111111111000)))
Applying the other bitwise-NOT (~):
(0111001111000110 & (1111100000000000 | 0000000000000111))
And the bitwise-OR (|):
0111001111000110 & 1111100000000111
And finally the bitwise-AND (&):
0111000000000110
So we then have binary 0111000000000110, which is 2 + 4 + 4096 + 8192 + 16384, which is 28678.
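For reference, here is a sketch of the complete exercise 2-6 function built around that mask logic, following the usual K&R convention that the field is the n bits ending at bit position p (this is a common solution, not necessarily the wiki's exact code):
unsigned setbits(unsigned x, int p, int n, unsigned y)
{
    unsigned field = ~(~0u << n);            /* n low-order 1 bits          */
    return (x & ~(field << (p + 1 - n)))     /* clear bits p .. p+1-n of x  */
         | ((y & field) << (p + 1 - n));     /* insert the low n bits of y  */
}
With x = 29638, y = 999, p = 10, n = 8, the clearing step produces the 28678 derived above, and ORing in (999 & 0xff) << 3 = 1848 gives 30526.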

Counting consecutive 1's in C [duplicate]

Is it possible to count, from left, consecutive 1's in an integer?
So: total number of consecutive set bits starting from the top bit.
Using only:
! ~ & ^ | + << >>
-1 = 0xFFFFFFFF would return 32
0xFFF0F0F0 would return 12 (FFF = 111111111111)
No loops, unfortunately.
Can assume the machine:
Uses 2s complement, 32-bit representations of integers.
Performs right shifts arithmetically.
Has unpredictable behavior when shifting an integer by more than the word size.
I'm forbidden to:
Use any control constructs such as if, do, while, for, switch, etc.
Define or use any macros.
Define any additional functions in this file.
Call any functions.
Use any other operations, such as &&, ||, -, or ?:
Use any form of casting.
Use any data type other than int. This implies that you
cannot use arrays, structs, or unions.
I've looked at
Finding consecutive bit string of 1 or 0
It's using loops, which I can't use. I don't even know where to start.
(Yes, this is an assignment, but I'm simply asking those of you skilled enough for help. I've done pretty much all of those I need to do, but this one just won't work.)
(For those downvoting simply because it's for school, the FAQ says:
1. questions about "a specific programming problem" are on topic - check;
2. "However, if your motivation is 'I would like others to explain ______ to me', then you are probably OK.")
You can do it like this:
int result = clz(~x);
i.e. invert all the bits and then count leading zeroes.
clz returns the number of leading zero bits (also commonly known as nlz; not to be confused with ffs, which counts from the least-significant end) - see here for implementation details: http://en.wikipedia.org/wiki/Find_first_set#Algorithms
Here you are. The function argument may be signed or unsigned; the algorithm is independent of signedness.
int leftmost_ones(int x)
{
    x = ~x;
    x = x | x >> 1 | x >> 2 | x >> 3 | x >> 4 | x >> 5 | x >> 6 | x >> 7 |
        x >> 8 | x >> 9 | x >> 10 | x >> 11 | x >> 12 | x >> 13 | x >> 14 |
        x >> 15 | x >> 16 | x >> 17 | x >> 18 | x >> 19 | x >> 20 | x >> 21 |
        x >> 22 | x >> 23 | x >> 24 | x >> 25 | x >> 26 | x >> 27 | x >> 28 |
        x >> 29 | x >> 30 | x >> 31;
    x = ~x;
    return (x & 1) + (x >> 1 & 1) + (x >> 2 & 1) + (x >> 3 & 1) + (x >> 4 & 1) +
           (x >> 5 & 1) + (x >> 6 & 1) + (x >> 7 & 1) + (x >> 8 & 1) + (x >> 9 & 1) +
           (x >> 10 & 1) + (x >> 11 & 1) + (x >> 12 & 1) + (x >> 13 & 1) + (x >> 14 & 1) +
           (x >> 15 & 1) + (x >> 16 & 1) + (x >> 17 & 1) + (x >> 18 & 1) + (x >> 19 & 1) +
           (x >> 20 & 1) + (x >> 21 & 1) + (x >> 22 & 1) + (x >> 23 & 1) + (x >> 24 & 1) +
           (x >> 25 & 1) + (x >> 26 & 1) + (x >> 27 & 1) + (x >> 28 & 1) + (x >> 29 & 1) +
           (x >> 30 & 1) + (x >> 31 & 1);
}
A version with some optimization:
int leftmost_ones(int x)
{
    x = ~x;
    x |= x >> 16;
    x |= x >> 8;
    x |= x >> 4;
    x |= x >> 2;
    x |= x >> 1;
    x = ~x;
    return (x & 1) + (x >> 1 & 1) + (x >> 2 & 1) + (x >> 3 & 1) + (x >> 4 & 1) +
           (x >> 5 & 1) + (x >> 6 & 1) + (x >> 7 & 1) + (x >> 8 & 1) + (x >> 9 & 1) +
           (x >> 10 & 1) + (x >> 11 & 1) + (x >> 12 & 1) + (x >> 13 & 1) + (x >> 14 & 1) +
           (x >> 15 & 1) + (x >> 16 & 1) + (x >> 17 & 1) + (x >> 18 & 1) + (x >> 19 & 1) +
           (x >> 20 & 1) + (x >> 21 & 1) + (x >> 22 & 1) + (x >> 23 & 1) + (x >> 24 & 1) +
           (x >> 25 & 1) + (x >> 26 & 1) + (x >> 27 & 1) + (x >> 28 & 1) + (x >> 29 & 1) +
           (x >> 30 & 1) + (x >> 31 & 1);
}
Can you use a loop?
unsigned int mask = 0x80000000u;  /* unsigned, so >> shifts in zeroes; a signed
                                     mask would smear under arithmetic shift */
int count = 0;
while (number & mask) {
    count += 1;
    mask >>= 1;
}
I think it's doable, by basically unrolling the typical loop and being generally annoying.
How about this: an expression that is 1 if and only if the answer is 1? I offer:
const int ok1 = !((number & 0xc0000000) - 0x80000000);
The ! and subtraction are there to work around the fact that someone broke the == key on our keyboard, of course. (Since - is also on the forbidden list, number - c would itself need to be spelled number + ~c + 1.)
And then, an expression that is 1 if and only if the answer is 2:
const int ok2 = !((number & 0xe0000000) - 0xc0000000);
If you continue to form these, the final answer is their sum:
const int answer = ok1 + ok2 + ... + ok32;
By the way, I can't seem to remember being given these weirdly restricted assignments when I was in school, I guess times have changed. :)
int count_consecutive_bits(unsigned int x) {
    int res = 0;
    while (x & 0x80000000) { ++res; x <<= 1; }
    return res;
}

Bitwise Operation in C - AnyOddBit

I am having trouble with the last problem of my bit twiddling homework exercise. The function is supposed to return 1 if any odd bit is set to 1.
Here is what I have so far:
int anyOddBit(int x) {
    return (x & 0xaaaaaaaa) != 0;
}
That works perfectly, but I am not allowed to use a constant that large (only 0 through 255, i.e. 0xFF, is allowed). I am also not allowed to use !=.
Specifically, this is what I am limited to using:
Each "Expr" is an expression using ONLY the following:
1. Integer constants 0 through 255 (0xFF), inclusive. You are
not allowed to use big constants such as 0xffffffff.
2. Function arguments and local variables (no global variables).
3. Unary integer operations ! ~
4. Binary integer operations & ^ | + << >>
I can't figure out how to do this within in those restrictions and I'd really appreciate it if someone could point me in the right direction. Thanks in advance!
You could do your ORs ahead of ANDs:
((x>>0) | (x>>8) | (x>>16) | (x>>24)) & 0xaa
The initial shift (x >> 0) will be optimized out - it's there for a consistent look. (Wrap the result in !! if a strict 0-or-1 value is needed, as in the next answer.)
You can use:
!!((  ( x        & 0xff)
    | ((x >>  8) & 0xff)
    | ((x >> 16) & 0xff)
    | ((x >> 24) & 0xff)
   ) & 0xaa)
The "inner" bit, which ORs together each source octet, will give you an octet where each bit is set if the equivalent bit is set in any of the source octets. So, if one of the odd bits is set in the source octets, it will also be set in the target one.
Then, by simply ANDing that with 0xaa, you get a zero value if no odd bits are set, or a non-zero value if any of the odd bits are set.
Then, since you need 0 or 1, and you can't use !=, you can achieve a similar effect with !!, two logical NOT operators. It works because !(any-non-zero-value) gives 0 and !0 gives 1.
In order to do it with 12 operators only (rather than 13 as per my original solution above), you can remove the & 0xff for the >> 24 value since it's not actually necessary (zero-bits are shifted in from the left):
!!((  ( x        & 0xff)
    | ((x >>  8) & 0xff)
    | ((x >> 16) & 0xff)
    | ((x >> 24)       )
   ) & 0xaa)
In fact, you can do even better than that. The final & 0xaa will clear out all the upper 24 bits anyway so no & 0xff sections are needed (it also fits on one line as well):
!!((x | (x >> 8) | (x >> 16) | (x >> 24)) & 0xaa)
That gets it down to nine operators.
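Wrapped up as the complete function (this is just the nine-operator expression above, nothing new):
int anyOddBit(int x) {
    /* OR all four bytes down into the low byte, then test the odd bits. */
    return !!((x | (x >> 8) | (x >> 16) | (x >> 24)) & 0xaa);
}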
0xaaaaaaaa is basically (0xaa << 24) | (0xaa << 16) | (0xaa << 8) | (0xaa), and that is allowed, isn't it?

Implementing logical negation with only bitwise operators (except !)

~ & ^ | + << >> are the only operations I can use
Before I continue, this is a homework question, I've been stuck on this for a really long time.
My original approach: I thought that !x could be done with two's complement, doing something with its additive inverse. I know that an xor is probably in here, but I'm really at a loss how to approach this.
For the record: I also cannot use conditionals, loops, ==, etc, only the functions (bitwise) I mentioned above.
For example:
!0 = 1
!1 = 0
!anything besides 0 = 0
Assuming a 32 bit unsigned int:
(((x>>1) | (x&1)) + ~0U) >> 31
should do the trick
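As a function, a minimal sketch of that expression (bang_u is an illustrative name):
/* !x without ! or comparisons, for 32-bit unsigned x. */
unsigned bang_u(unsigned x)
{
    /* (x >> 1) | (x & 1) is zero only when x is zero; adding ~0U
       (i.e. subtracting 1 mod 2^32) then wraps to 0xFFFFFFFF, so the
       top bit is 1 exactly when x == 0. */
    return (((x >> 1) | (x & 1)) + ~0U) >> 31;
}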
Assuming x is signed: we need to return 0 for any number that is not zero, and 1 for zero.
A right shift of a signed integer is an arithmetic shift in most implementations (the sign bit is copied over). So right shift x by 31, and right shift its negation by 31. At least one of those two is a negative number, so its shift yields 0xFFFFFFFF (if x = 0, both shifts produce 0x0, which is what you want). You don't know whether x or its negation is the negative one, so just OR the two results together. Next add 1 and you're good.
implementation:
int bang(int x) {
    return ((x >> 31) | ((~x + 1) >> 31)) + 1;
}
The following code copies any 1 bit to all positions. This maps all non-zeroes to 0xFFFFFFFF == -1, while leaving 0 at 0. Then it adds 1, mapping -1 to 0 and 0 to 1.
x = x | x << 1 | x >> 1;
x = x | x << 2 | x >> 2;
x = x | x << 4 | x >> 4;
x = x | x << 8 | x >> 8;
x = x | x << 16 | x >> 16;
x = x + 1;
For 32 bit signed integer x
// Set the bottom bit if any bit set.
x |= x >> 1;
x |= x >> 2;
x |= x >> 4;
x |= x >> 8;
x |= x >> 16;
x ^= 1; // Toggle the bottom bit - now 0 if any bit set.
x &= 1; // Clear the unwanted bits to leave 0 or 1.
Assuming e.g. an 8-bit unsigned type:
~(  ((x >> 0) & 1)
  | ((x >> 1) & 1)
  | ((x >> 2) & 1)
  ...
  | ((x >> 7) & 1)) & 1
Note that you can't just do ~x & 1: it only inspects the low bit, so it yields 1 for every even number (e.g. ~2 & 1 == 1), not only for 0.
