I am working with bitvectors in C. My bitvectors are unsigned long longs. For a large number of vectors I need to know the parity, i.e. whether the number of bits that are 1 is even or odd.
The exact value is not important, just the parity. I was wondering if there is anything faster than calculating the number of ones and checking. I tried to think of something, but couldn't find anything.
A short example of how I want this to work:
void checkIntersection(unsigned long long int setA, unsigned long long int setB){
if(isEven(setA & setB)){
//do something
}
}
With divide and conquer technique:
uint64_t a = value;
a ^= (a >> 32); // Fold the 32 MSB over the 32 LSB
a ^= (a >> 16); // reducing the problem by 50%
a ^= (a >> 8); // <-- this can be a good break even point
..
return lookup_table[a & 0xff]; // 16 or 256 entries are typically good
..
Folding procedure can be applied until the end:
a ^= (a >> 1);
return a & 1;
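Putting every folding step together gives a table-free version. This is a minimal sketch assuming a 64-bit input. The name parity_fold is hypothetical, and isEven is the helper from the question, whose body was not shown there.

```c
#include <stdint.h>

/* Parity of a 64-bit word by repeated folding: each XOR halves the
   width that still matters, so after six folds bit 0 holds the XOR
   of all 64 bits (1 = odd number of ones, 0 = even). */
static unsigned parity_fold(uint64_t a)
{
    a ^= a >> 32;
    a ^= a >> 16;
    a ^= a >> 8;
    a ^= a >> 4;
    a ^= a >> 2;
    a ^= a >> 1;
    return (unsigned)(a & 1);
}

/* The question's isEven(), expressed with the fold */
static int isEven(unsigned long long x)
{
    return parity_fold(x) == 0;
}
```

With this, the question's check reads `if (isEven(setA & setB)) { ... }` unchanged.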
On IA (x86), the parity flag (PF) can be read directly after the reduction to 8 bits, since PF reflects the parity of the low byte of a result.
a ^= (a >> 4); is another good place to stop dividing, since some processor architectures can hold a parallel 16-entry lookup table uint8_t LUT[16] in an XMM (or NEON) register. Or the potential cache misses of a 256-entry LUT can simply outweigh the cost of one extra round. It's naturally best to measure which LUT size is optimal on a given architecture.
This last table actually consists of only 16 bits and can be emulated with the sequence:
return ((TRUTH_TABLE_FOR_PARITY) >> (a & 15)) & 1;
where bit N of the magic constant above encodes the boolean value for Parity(N).
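The magic constant is fully determined: the parities of 0..15 are 0,1,1,0,1,0,0,1,1,0,0,1,0,1,1,0, so bits 1, 2, 4, 7, 8, 11, 13 and 14 are set, giving 0x6996. A sketch of the 4-bit stopping point using it (the function name parity_nibble_table is hypothetical):

```c
#include <stdint.h>

/* Bit N of 0x6996 is Parity(N) for N = 0..15:
   parities of 15..0 read as 0110 1001 1001 0110 */
#define TRUTH_TABLE_FOR_PARITY 0x6996u

static unsigned parity_nibble_table(uint64_t a)
{
    a ^= a >> 32;
    a ^= a >> 16;
    a ^= a >> 8;
    a ^= a >> 4;                 /* low nibble now carries the parity */
    return (TRUTH_TABLE_FOR_PARITY >> (a & 15)) & 1;
}
```

The "table" lives in an immediate operand, so there is no memory access at all.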
You could precompute in an array the parity for all possible combinations of bits in a byte:
bool pre[256] = { 0, 1, 1, 0, 1, ....}
When you need to find out the parity of a wider value, you just do:
bool parity (long long unsigned x)
{
bool parity = 0;
while(x)
{
parity ^= pre[x&0xff];
x>>=8;
}
return parity;
}
Disclaimer: I haven't tested the code, it's just an idea.
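Rather than typing out all 256 entries, the table can be filled at run time from the recurrence parity(b) = parity(b >> 1) XOR (b & 1). A sketch under that approach, with lazy initialization on first call (not thread-safe as written; init_parity_table is a hypothetical name):

```c
#include <stdbool.h>

/* pre[b] is the parity of byte b */
static bool pre[256];

static void init_parity_table(void)
{
    pre[0] = 0;
    for (int b = 1; b < 256; b++)
        pre[b] = pre[b >> 1] ^ (b & 1);   /* drop lowest bit, XOR it back in */
}

static bool parity(unsigned long long x)
{
    static bool ready = false;
    if (!ready) {                         /* build the table once */
        init_parity_table();
        ready = true;
    }
    bool p = 0;
    while (x) {                           /* stops once remaining bits are 0 */
        p ^= pre[x & 0xff];
        x >>= 8;
    }
    return p;
}
```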
Pretty easy. Something like
unsigned population(unsigned long long x) {
x = ((x >> 1) & 0x5555555555555555) + (x & 0x5555555555555555);
x = ((x >> 2) & 0x3333333333333333) + (x & 0x3333333333333333);
x = ((x >> 4) & 0x0f0f0f0f0f0f0f0f) + (x & 0x0f0f0f0f0f0f0f0f);
x = (x >> 8) + x; // Don't need to mask, because 64 < 0xff
x = (x >> 16) + x;
x = (x >> 32) + x;
return x & 0xff;
}
should work. Also, some CPUs have population count instructions (on x86 this is POPCNT, available on CPUs with SSE4.2 or AMD's ABM extension).
If you like this kind of thing, you should check out the book Hacker’s Delight by Henry S. Warren, Jr.
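For the original question only the lowest bit of the count matters, so parity is population(x) & 1. A sketch of the question's isEven() on top of the population function above (restated with ULL suffixes so the block stands alone). On GCC and Clang it uses the __builtin_parityll builtin and leaves the instruction choice to the compiler; the #if fallback structure is an illustrative choice.

```c
#include <stdint.h>

/* SWAR population count, as in the answer above */
static unsigned population(uint64_t x)
{
    x = ((x >> 1) & 0x5555555555555555ULL) + (x & 0x5555555555555555ULL);
    x = ((x >> 2) & 0x3333333333333333ULL) + (x & 0x3333333333333333ULL);
    x = ((x >> 4) & 0x0f0f0f0f0f0f0f0fULL) + (x & 0x0f0f0f0f0f0f0f0fULL);
    x = (x >> 8) + x;      /* byte sums stay below 256: no mask needed */
    x = (x >> 16) + x;
    x = (x >> 32) + x;
    return (unsigned)(x & 0xff);
}

/* Even parity means the lowest bit of the count is 0 */
static int isEven(unsigned long long x)
{
#if defined(__GNUC__)
    return __builtin_parityll(x) == 0;
#else
    return (population(x) & 1) == 0;
#endif
}
```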
Related
I am trying to write a function that calculates the average number of 1 bits per byte.
float AvgOnesOnBinaryString (int x)
for example:
-252 is 11111111 11111111 11111111 00000100
so the function returns 6.25
because ( 8+8+8+1) / 4 = 6.25
I have to use the function that counts bits in a char:
int countOnesOnBinaryString (char x){
int bitCount = 0;
while(x > 0)
{
if ( x & 1 == 1 )
bitCount++;
x = x>>1;
}
return bitCount;
}
I tried:
float AvgOnesOnBinaryString (int x){
float total = 0;
total += countOnesOnBinaryString((x >> 24));
total += countOnesOnBinaryString((x >> 16));
total += countOnesOnBinaryString((x >> 8));
total += countOnesOnBinaryString(x);
return total/4;
}
but I get the answer 0.25 and not 6.25.
What could be the problem?
UPDATE
I can't change the AvgOnesOnBinaryString function signature.
The C language allows compilers to define char as either a signed or unsigned type. I suspect it is signed on your platform, meaning that a byte like 0xff is likely interpreted as -1. This means that the x > 0 test in countOnesOnBinaryString yields false, so countOnesOnBinaryString(0xff) would return 0 instead of the correct value 8.
You should change countOnesOnBinaryString to take an argument of type unsigned char instead of char.
For somewhat related reasons, it would also be a good idea to change the argument of AvgOnesOnBinaryString to be unsigned int. Or even better, uint32_t from <stdint.h>, since your code assumes the input value is 32 bits, and (unsigned) int is allowed to be of some other size.
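A sketch of that fix which keeps the required AvgOnesOnBinaryString(int) signature: the helper takes unsigned char, and the int is converted to uint32_t before shifting, so every shift is logical. This assumes 8-bit bytes and counts only the low 32 bits of x. The loop body also uses bitCount += x & 1, which sidesteps the precedence trap in x & 1 == 1.

```c
#include <stdint.h>

/* Count ones in one byte; as unsigned char, 0xff is 255 rather than -1,
   so the loop runs for every nonzero byte. */
static int countOnesOnBinaryString(unsigned char x)
{
    int bitCount = 0;
    while (x > 0) {
        bitCount += x & 1;
        x >>= 1;
    }
    return bitCount;
}

/* Signature kept as in the question; the value is reinterpreted as a
   32-bit unsigned pattern (well defined, modulo 2^32) before shifting. */
float AvgOnesOnBinaryString(int x)
{
    uint32_t u = (uint32_t)x;
    float total = 0;
    total += countOnesOnBinaryString((unsigned char)(u >> 24));
    total += countOnesOnBinaryString((unsigned char)(u >> 16));
    total += countOnesOnBinaryString((unsigned char)(u >> 8));
    total += countOnesOnBinaryString((unsigned char)u);
    return total / 4;
}
```

For -252 (0xFFFFFF04) the four bytes contribute 8 + 8 + 8 + 1, giving 6.25.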
There is one algorithm that gives you the count of the number of 1 bits in an unsigned variable far more quickly. Only 5 steps are needed for a 32-bit integer. I'll show it to you in C for a full-length 64-bit unsigned number, so you can probably guess the pattern and why it works (it is explained below):
uint64_t
no_of_1_bits(uint64_t the_value)
{
the_value = ((the_value & 0xaaaaaaaaaaaaaaaa) >> 1) + (the_value & 0x5555555555555555);
the_value = ((the_value & 0xcccccccccccccccc) >> 2) + (the_value & 0x3333333333333333);
the_value = ((the_value & 0xf0f0f0f0f0f0f0f0) >> 4) + (the_value & 0x0f0f0f0f0f0f0f0f);
the_value = ((the_value & 0xff00ff00ff00ff00) >> 8) + (the_value & 0x00ff00ff00ff00ff);
the_value = ((the_value & 0xffff0000ffff0000) >> 16) + (the_value & 0x0000ffff0000ffff);
the_value = ((the_value & 0xffffffff00000000) >> 32) + (the_value & 0x00000000ffffffff);
return the_value;
}
The number of 1 bits will be in the 64-bit value of the_value. If you divide the result by eight, you'll have the average number of 1 bits per byte of a 64-bit value. (Beware of shifting signed chars: the sign bit is replicated, so a loop that tests x != 0 would never stop for a negative number, and one that tests x > 0, like yours, never runs at all.)
For 8 bit bytes, the algorithm reduces to:
uint8_t
no_of_1_bits(uint8_t the_value)
{
the_value = ((the_value & 0xaa) >> 1) + (the_value & 0x55);
the_value = ((the_value & 0xcc) >> 2) + (the_value & 0x33);
the_value = ((the_value & 0xf0) >> 4) + (the_value & 0x0f);
return the_value;
}
and again, the number of 1 bits is in the variable the_value.
The idea of this algorithm is to produce in the first step the sum of each pair of bits in a two bit accumulator (we shift the left bit of a pair to the right to align it with the right one, then we add them together, and in parallel for each pair of bits). As the accumulators are two bits, it is impossible to overflow (so there's never a carry from a pair of bits to the next, and we use the full integer as a series of two bit registers to add the sum)
Then we sum each pair of two-bit fields into a four-bit accumulator, and again that never overflows. We do the same with the nibbles we got, summing them into 8-bit fields: a 4-bit field already has room for its maximum count of 4, so an 8-bit field holding at most 8 certainly can't overflow. Continue until you add the left half of the word to the right half. You finally end up with the sum of all bits in one register of the full word length.
Easy, isn't it? :)
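The 32-bit, five-step variant mentioned at the start of this answer isn't spelled out. A sketch of it, with the question's AvgOnesOnBinaryString built on top by dividing the count by 4 bytes (the name no_of_1_bits32 is hypothetical):

```c
#include <stdint.h>

/* Pairwise accumulators for 32 bits: field widths 2, 4, 8, 16, 32 */
static uint32_t no_of_1_bits32(uint32_t v)
{
    v = ((v & 0xaaaaaaaau) >> 1)  + (v & 0x55555555u);
    v = ((v & 0xccccccccu) >> 2)  + (v & 0x33333333u);
    v = ((v & 0xf0f0f0f0u) >> 4)  + (v & 0x0f0f0f0fu);
    v = ((v & 0xff00ff00u) >> 8)  + (v & 0x00ff00ffu);
    v = ((v & 0xffff0000u) >> 16) + (v & 0x0000ffffu);
    return v;
}

/* Average ones per byte; the int is viewed as a 32-bit unsigned pattern */
float AvgOnesOnBinaryString(int x)
{
    return no_of_1_bits32((uint32_t)x) / 4.0f;
}
```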
Is there a bit twiddling hack for efficiently unpacking a 16-bit packed BCD number?
Doing it the pedestrian way requires 10 operations (3 shifts, 4 ANDs and 3 ORs or ADDs):
x = (bcd & 0xF000) << 12
| (bcd & 0x0F00) << 8
| (bcd & 0x00F0) << 4
| (bcd & 0x000F);
With multi-way ADD/OR the critical path length would be 3 but these operations tend to be binary and so most CPUs would be looking at a critical path of length 4.
Can this be done more efficiently?
Note: for some purposes it can be equally useful if some permutation of the nibbles can be unpacked especially efficiently, like if the word to be unpacked comes from a lookup table over whose creation I have full control (so that I can stick each digit wherever I want). The purpose of using packed instead of unpacked BCD in this case would be to halve the memory pressure and to avoid exceeding the size of the L1 cache, taking some load off an over-saturated memory subsystem by increasing the load on the CPU's ALUs.
For example, if I permute the digits like 0x1324 then a simple de-interleave yields 0x01020304:
x = ((bcd << 12) | bcd) & 0x0F0F0F0F
That's just three operations with critical path length 3, quite an improvement over the original version...
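A sketch of both sides of that trick: a hypothetical pack_permuted helper for building lookup-table entries with the digits stored in nibble order d1 d3 d2 d4, and the three-operation unpack. The OR makes d1 and d4 collide in bits 12-15, but the final mask discards exactly that nibble.

```c
#include <stdint.h>

/* Store digits d1..d4 (most significant first) as nibbles d1 d3 d2 d4,
   e.g. digits 1,2,3,4 become 0x1324. Hypothetical table-building helper. */
static uint16_t pack_permuted(unsigned d1, unsigned d2, unsigned d3, unsigned d4)
{
    return (uint16_t)((d1 << 12) | (d3 << 8) | (d2 << 4) | d4);
}

/* Shift, OR, AND: yields one digit per byte, 0x0d1 0d2 0d3 0d4 */
static uint32_t unpack_permuted(uint16_t bcd)
{
    uint32_t x = bcd;
    return ((x << 12) | x) & 0x0F0F0F0Fu;
}
```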
Here is an alternative way, with fewer operations but a longer critical path, based on the binary decomposition of the move distances of the nibbles (the nibbles that move by 8 or 12 bits are first moved together by 8, then the nibbles that move by 4 or 12 bits are moved together by 4).
x = bcd
x = ((x & 0xFF00) << 8) | (x & 0xFF)
x = ((x & 0x00F000F0) << 4) | (x & 0x000F000F)
For example:
// start
0000ABCD
// move A and B by 8
00AB00CD
// move A and C by 4
0A0B0C0D
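A self-contained check of the two-round version against the pedestrian one over all 16-bit inputs. This is a sketch; the names spread_ref, spread_two_step and spreads_agree are hypothetical.

```c
#include <stdint.h>

/* Reference: move each nibble individually */
static uint32_t spread_ref(uint32_t bcd)
{
    return ((bcd & 0xF000u) << 12) | ((bcd & 0x0F00u) << 8)
         | ((bcd & 0x00F0u) << 4)  |  (bcd & 0x000Fu);
}

/* Two rounds following the binary decomposition of the move distances */
static uint32_t spread_two_step(uint32_t x)
{
    x = ((x & 0xFF00u) << 8) | (x & 0x00FFu);           /* A, B move by 8 */
    x = ((x & 0x00F000F0u) << 4) | (x & 0x000F000Fu);   /* A, C move by 4 */
    return x;
}

/* Exhaustive comparison; returns 1 if both agree on every input */
static int spreads_agree(void)
{
    for (uint32_t v = 0; v <= 0xFFFFu; v++)
        if (spread_ref(v) != spread_two_step(v))
            return 0;
    return 1;
}
```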
The most efficient solution will be machine specific, as different ISAs have different capabilities when it comes to dealing with immediate constants, or combining shifts with ALU operations. Here is an alternative implementation with good instruction-level parallelism that may be superior on platforms with a very fast integer multiply. Integer multiply is often helpful for bit twiddling algorithms by performing multiple shift-add operations in parallel.
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
/* reference implementation */
uint32_t bcd_spread_1 (uint32_t a)
{
return (((a & 0xF000) << 12) |
((a & 0x0F00) << 8) |
((a & 0x00F0) << 4) |
((a & 0x000F) << 0));
}
/* alternative implementation */
uint32_t bcd_spread_2 (uint32_t a)
{
return ((((a & 0xf0f0) * 0x1010) & 0x0f000f00) |
(((a & 0x0f0f) * 0x0101) & 0x000f000f));
}
/* BCD addition. Knuth TAOCP 4 */
uint32_t median (uint32_t x, uint32_t y, uint32_t z)
{
return (x & (y | z)) | (y & z);
}
uint32_t bcd_add (uint32_t x, uint32_t y)
{
uint32_t z, u, t;
z = y + 0x66666666;
u = x + z;
t = median (~x, ~z, u) & 0x88888888;
return u - t + (t >> 2);
}
int main (void)
{
uint32_t x, y, bcd = 0;
do {
x = bcd_spread_1 (bcd);
y = bcd_spread_2 (bcd);
if (x != y) {
printf ("!!!! bcd=%04x x=%08x y=%08x\n", bcd, x, y);
return EXIT_FAILURE;
}
bcd = bcd_add (bcd, 1);
} while (bcd < 0x10000);
return EXIT_SUCCESS;
}
Use the DoubleDabble algorithm.
[Part of a HW question]
Assume 2's complement, 32bit word-length. Only signed int and constants 0 through 0xFF allowed. I've been asked to implement a logical right shift by "n" bits (0 <= n <= 31) using ONLY the operators:
! ~ & ^ | + << >>
I figured I could store and clear the sign bit, perform the shift, and replace the stored sign bit in its new location.
I would like to implement the operation "31 - n" (w/out using the "-" operator) to find the appropriate location for the stored sign bit post shift.
If n were positive, I could use the expression: "31 + (~n + 1)", but I don't believe this will work in the case when n = 0.
Here's what I have so far:
int logicalShift(int x, int n) {
/* Store & clear sign bit, perform shift, and replace stored sign bit
in new location */
int bit = (x >> 31) & 1; // Store most significant bit
x &= ~(1 << 31); // Clear most significant bit
x = x >> n; // Shift by n
x &= ~((~bit) << (31 - n)); // Replace MSbit in new location
return x;
}
Any help and/or hints are appreciated.
[EDIT: Solved]
Thanks to everyone for the help. ~n + 1 works to negate n in this situation, including for the case n = 0 (where it returns 0 as desired). Functional code is below (by no means the most elegant solution). Utility operations borrowed from: How do you set, clear, and toggle a single bit?
int logicalShift(int x, int n) {
/* Store & clear sign bit, perform shift, and replace stored sign bit
in new location */
int bit = (x >> 31) & 1; // Store most significant bit
x &= ~(1 << 31); // Clear most significant bit
x = x >> n; // Shift by n
x ^= ((~bit + 1) ^ x) & (1 << (31 + (~n + 1))); // Replace MSbit in new location
return x;
}
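A simple solution, provided the masked value behaves as a signed int so that >> copies the sign bit (with 32-bit int the C literal 0x80000000 has type unsigned int, which makes that shift logical), is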
int logicalShift(int x, int n) {
return (x >> n) ^ (((x & 0x80000000) >> n) << 1);
}
Sadly, using the constant 0x80000000 is forbidden. We could calculate it as 1 << 31 (ignoring undefined behavior in C) or, to save an instruction, calculate 31 - n as n ^ 31 (valid for 0 <= n <= 31) and then use the following somewhat more contrived method:
int logicalShift(int x, int n) {
int b = 1 << (n ^ 31);
return b ^ ((x >> n) + b);
}
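The second formula relies on signed overflow in (x >> n) + b, which is undefined in C. A sketch of the same arithmetic with the addition done in uint32_t, so the carry out of bit 31 simply wraps. It assumes 32-bit int32_t and arithmetic right shift of negative values (implementation-defined, but what mainstream compilers do). The casts take it outside the homework's operator set, so this is for checking the idea, and logicalShift_u is a hypothetical name.

```c
#include <stdint.h>

/* b marks bit 31 - n, where the copy of the sign bit lands after x >> n.
   If that bit is 1, adding b carries through all the smeared sign bits
   above it and clears them; XOR with b then restores bit 31 - n itself. */
static int32_t logicalShift_u(int32_t x, int n)
{
    uint32_t b = (uint32_t)1 << (n ^ 31);   /* n ^ 31 == 31 - n for 0..31 */
    uint32_t s = (uint32_t)(x >> n);        /* arithmetic shift assumed */
    return (int32_t)(b ^ (s + b));
}
```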
I have a long sequence of bits stored in an array of unsigned long integers, like this
struct bit_array
{
int size; /* nr of bits */
unsigned long *array; /* the container that stores bits */
};
I am trying to design an algorithm to reverse the order of bits in *array. Problems:
size can be anything, i.e. not necessarily a multiple of 8 or 32 etc, so the first bit in the input array can end up at any position within the unsigned long in the output array;
the algorithm should be platform-independent, i.e. work for any sizeof(unsigned long).
Code, pseudocode, algo description etc. -- anything better than bruteforce ("bit by bit") approach is welcome.
My favorite solution is to fill a lookup-table that does bit-reversal on a single byte (hence 256 byte entries).
You apply the table to 1 to 4 bytes of the input operand, with a swap. If the size isn't a multiple of 8, you will need to adjust by a final right shift.
This scales well to larger integers.
Example:
11 10010011 00001010 -> 01010000 11001001 11000000 -> 01 01000011 00100111
To split the number into bytes portably, you need to use bitwise masking/shifts; mapping of a struct or array of bytes onto the integer can make it more efficient.
For raw performance you could consider mapping up to 16 bits at a time (a 65536-entry table), but that doesn't look quite reasonable.
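A sketch of the byte-table approach for a single unsigned long of any width, assuming CHAR_BIT == 8. Bytes are peeled off the low end with shifts and masks (so it is endian-independent), and their reversed forms are pushed in at the low end of the result, which swaps the byte order. The names rev8, reverse_ulong_lut and reverse_low_bits are hypothetical.

```c
#include <limits.h>
#include <stddef.h>

static unsigned char rev8[256];   /* rev8[b] = b with its 8 bits reversed */

static void init_rev8(void)
{
    for (int b = 0; b < 256; b++) {
        unsigned r = 0;
        for (int i = 0; i < 8; i++)
            if (b & (1 << i))
                r |= 1u << (7 - i);
        rev8[b] = (unsigned char)r;
    }
}

static unsigned long reverse_ulong_lut(unsigned long x)
{
    static int ready = 0;
    if (!ready) { init_rev8(); ready = 1; }
    unsigned long r = 0;
    for (size_t i = 0; i < sizeof x; i++) {
        r = (r << 8) | rev8[x & 0xff];   /* lowest byte ends up highest */
        x >>= 8;
    }
    return r;
}

/* Reverse only the low `bits` bits (1 <= bits <= width): reverse the
   whole word, then apply the final right shift described above. */
static unsigned long reverse_low_bits(unsigned long x, unsigned bits)
{
    return reverse_ulong_lut(x) >> (sizeof x * CHAR_BIT - bits);
}
```

With the 18-bit example above, reverse_low_bits(0x3930A, 18) gives 0x14327, i.e. 01 01000011 00100111.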
I like the idea of a lookup table. Still, this is also a typical task for log(n) group bit tricks, which can be very fast. For example (this version assumes a 64-bit unsigned long):
unsigned long reverseOne(unsigned long x) {
x = ((x & 0xFFFFFFFF00000000) >> 32) | ((x & 0x00000000FFFFFFFF) << 32);
x = ((x & 0xFFFF0000FFFF0000) >> 16) | ((x & 0x0000FFFF0000FFFF) << 16);
x = ((x & 0xFF00FF00FF00FF00) >> 8) | ((x & 0x00FF00FF00FF00FF) << 8);
x = ((x & 0xF0F0F0F0F0F0F0F0) >> 4) | ((x & 0x0F0F0F0F0F0F0F0F) << 4);
x = ((x & 0xCCCCCCCCCCCCCCCC) >> 2) | ((x & 0x3333333333333333) << 2);
x = ((x & 0xAAAAAAAAAAAAAAAA) >> 1) | ((x & 0x5555555555555555) << 1);
return x;
}
The underlying idea is that when we aim to reverse the order of some sequence we may swap the head and tail halves of this sequence and then separately reverse each of halves (which is done here by applying the same procedure recursively to each half).
Here is a more portable version supporting unsigned long widths of 4,8,16 or 32 bytes.
#include <limits.h>
#define ones32 0xFFFFFFFFUL
#if (ULONG_MAX >> 128)
#define fill32(x) (x|(x<<32)|(x<<64)|(x<<96)|(x<<128)|(x<<160)|(x<<192)|(x<<224))
#define patt128 (ones32|(ones32<<32)|(ones32<<64) |(ones32<<96))
#define patt64 (ones32|(ones32<<32)|(ones32<<128)|(ones32<<160))
#define patt32 (ones32|(ones32<<64)|(ones32<<128)|(ones32<<192))
#else
#if (ULONG_MAX >> 64)
#define fill32(x) (x|(x<<32)|(x<<64)|(x<<96))
#define patt64 (ones32|(ones32<<32))
#define patt32 (ones32|(ones32<<64))
#else
#if (ULONG_MAX >> 32)
#define fill32(x) (x|(x<<32))
#define patt32 (ones32)
#else
#define fill32(x) (x)
#endif
#endif
#endif
unsigned long reverseOne(unsigned long x) {
#if (ULONG_MAX >> 32)
#if (ULONG_MAX >> 64)
#if (ULONG_MAX >> 128)
x = ((x & ~patt128) >> 128) | ((x & patt128) << 128);
#endif
x = ((x & ~patt64) >> 64) | ((x & patt64) << 64);
#endif
x = ((x & ~patt32) >> 32) | ((x & patt32) << 32);
#endif
x = ((x & fill32(0xffff0000UL)) >> 16) | ((x & fill32(0x0000ffffUL)) << 16);
x = ((x & fill32(0xff00ff00UL)) >> 8) | ((x & fill32(0x00ff00ffUL)) << 8);
x = ((x & fill32(0xf0f0f0f0UL)) >> 4) | ((x & fill32(0x0f0f0f0fUL)) << 4);
x = ((x & fill32(0xccccccccUL)) >> 2) | ((x & fill32(0x33333333UL)) << 2);
x = ((x & fill32(0xaaaaaaaaUL)) >> 1) | ((x & fill32(0x55555555UL)) << 1);
return x;
}
In Sean Eron Anderson's Bit Twiddling Hacks collection, which covers many related topics, the bits of an individual array entry could be reversed as follows.
#include <limits.h>
unsigned int reverse_bits(unsigned int v) // v: input bits to be reversed (wrapper name is hypothetical)
{
unsigned int r = v; // r will be reversed bits of v; first get LSB of v
int s = sizeof(v) * CHAR_BIT - 1; // extra shift needed at end
for (v >>= 1; v; v >>= 1)
{
r <<= 1;
r |= v & 1;
s--;
}
r <<= s; // shift when v's highest bits are zero
return r;
}
The reversal of the entire array could be done afterwards by rearranging the individual positions.
You must define the order of bits in an unsigned long. You might assume that bit n corresponds to array[x] & (1UL << n), but this needs to be specified. If so, you need to handle the byte ordering (little or big endian) if you are going to access the array as bytes instead of as unsigned longs.
I would definitely implement brute force first and measure whether the speed is an issue. No need to waste time trying to optimize this if it is not used a lot on large arrays. An optimized version can be tricky to implement correctly. If you end up trying anyway, the brute force version can be used to verify correctness on test values and benchmark the speed of the optimized version.
The fact that the size is not a multiple of the number of bits in a long is the hardest part of the problem. It can result in a lot of bit shifting.
But you don't have to do that if you can introduce a new struct member:
struct bit_array
{
int size; /* nr of bits */
int offset; /* First bit position */
unsigned long *array; /* the container that stores bits */
};
Offset would tell you how many bits to ignore at the beginning of the array.
Then you only have to do the following steps:
Reverse array elements.
Swap the bits of each element. There are many hacks for this in the other answers, but your compiler might also provide intrinsic functions that do it in fewer instructions (like the RBIT instruction on some ARM cores).
Calculate the new starting offset. This is equal to the number of unused bits the last element had.
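The three steps can be sketched as follows. This assumes that bit i of the sequence lives at global position offset + i, where position p is bit p % ULONG_BITS of array[p / ULONG_BITS]. Under that convention reversing everything maps position p to nwords * ULONG_BITS - 1 - p, so the new offset is the count of unused bits at the end. The per-word reversal is brute force for clarity, and bit_array_reverse, bit_array_get and rev_word are hypothetical names.

```c
#include <limits.h>
#include <stddef.h>

#define ULONG_BITS (sizeof(unsigned long) * CHAR_BIT)

struct bit_array
{
    int size;              /* nr of bits */
    int offset;            /* first bit position */
    unsigned long *array;  /* the container that stores bits */
};

/* Brute-force reversal of one word; swap in a faster hack or intrinsic */
static unsigned long rev_word(unsigned long x)
{
    unsigned long r = 0;
    for (size_t i = 0; i < ULONG_BITS; i++, x >>= 1)
        r = (r << 1) | (x & 1);
    return r;
}

static void bit_array_reverse(struct bit_array *ba)
{
    size_t used = (size_t)ba->offset + (size_t)ba->size;
    size_t n = (used + ULONG_BITS - 1) / ULONG_BITS;   /* words in use */

    /* Steps 1 and 2: reverse word order and the bits inside each word
       (the middle word of an odd count is reversed in place). */
    for (size_t i = 0; i < (n + 1) / 2; i++) {
        size_t j = n - 1 - i;
        unsigned long t = rev_word(ba->array[i]);
        ba->array[i] = rev_word(ba->array[j]);
        ba->array[j] = t;
    }
    /* Step 3: new offset = unused bits at the end of the last word */
    ba->offset = (int)(n * ULONG_BITS - used);
}

static int bit_array_get(const struct bit_array *ba, int i)
{
    size_t p = (size_t)ba->offset + (size_t)i;
    return (int)((ba->array[p / ULONG_BITS] >> (p % ULONG_BITS)) & 1);
}
```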
I would split the problem into two parts.
First, I would ignore the fact that the number of used bits is not a multiple of 32. I would use one of the given methods to swap around the whole array like that.
pseudocode:
for half the longs in the array:
take the first longword;
take the last longword;
swap the bits in the first longword
swap the bits in the last longword;
store the swapped first longword into the last location;
store the swapped last longword into the first location;
and then fix up the fact that the first few bits (call that number n) are actually garbage bits from the end of the longs:
for all of the longs in the array:
split the value in the leftmost n bits and the rest;
store the leftmost n bits into the righthand part of the previous word;
shift the rest bits to the left over n positions (making the rightmost n bits zero);
store them back;
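A sketch of the two passes, assuming bits are stored LSB-first (sequence bit i is bit i % WORD_BITS of word i / WORD_BITS). With that ordering the n garbage bits land at the low end after the whole-array reversal, so the fix-up is a right shift across words; with the ordering the pseudocode has in mind the shift direction flips, as noted further down. reverse_two_pass, get_bit and rev_word are hypothetical names.

```c
#include <limits.h>
#include <stddef.h>

#define WORD_BITS (sizeof(unsigned long) * CHAR_BIT)

static unsigned long rev_word(unsigned long x)
{
    unsigned long r = 0;
    for (size_t i = 0; i < WORD_BITS; i++, x >>= 1)
        r = (r << 1) | (x & 1);
    return r;
}

static void reverse_two_pass(unsigned long *a, size_t size)
{
    size_t nwords = (size + WORD_BITS - 1) / WORD_BITS;
    size_t n = nwords * WORD_BITS - size;          /* 0 <= n < WORD_BITS */

    /* Pass 1: swap and bit-reverse words from both ends */
    for (size_t i = 0; i < (nwords + 1) / 2; i++) {
        size_t j = nwords - 1 - i;
        unsigned long t = rev_word(a[i]);
        a[i] = rev_word(a[j]);
        a[j] = t;
    }
    if (n == 0)
        return;
    /* Pass 2: shift the whole array down by n; the low n bits of the
       next (not yet shifted) word fill the top of the current one. */
    for (size_t i = 0; i < nwords; i++) {
        unsigned long next = (i + 1 < nwords) ? a[i + 1] << (WORD_BITS - n) : 0;
        a[i] = (a[i] >> n) | next;
    }
}

static int get_bit(const unsigned long *a, size_t i)
{
    return (int)((a[i / WORD_BITS] >> (i % WORD_BITS)) & 1);
}
```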
You could try to fold that into one pass over the whole array of course. Something like this:
for half the longs in the array:
take the first longword;
take the last longword;
swap the bits in the first longword
swap the bits in the last longword;
split both values into the leftmost n bits and the rest;
for the new first longword:
store the leftmost n bits into the righthand side of the previous word;
store the remaining bits into the first longword, shifted left;
for the new last longword:
remember the leftmost n bits for the next iteration;
store the remembered leftmost n bits, combined with the remaining bits, into the last longword;
store the swapped first longword into the last location;
store the swapped last longword into the first location;
I'm abstracting from the edge cases here (first and last longword), and you may need to reverse the shifting direction depending on how the bits are ordered inside each longword.
I am having a little trouble with this function of mine. We are supposed to use bitwise operators only (that means no logical operators and no loops or if statements) and we aren't allowed to use a constant bigger than 0xFF.
I got my function to work, but it uses a huge constant. When I try to implement it with smaller numbers and shifting, I can't get it to work and I'm not sure why.
The function is supposed to check all of the even bits in a given integer, and return 1 if they are all set to 1.
Working code
int allEvenBits(int x) {
/* implements a check for all even-numbered bits in the word set to 1 */
/* if yes, the program outputs 1 WORKING */
int all_even_bits = 0x55555555;
return (!((x & all_even_bits) ^ all_even_bits));
}
Trying to implement with a smaller constant and shifts
int allEvenBits(int x) {
/* implements a check for all even-numbered bits in the word set to 1 */
/* if yes, the program outputs 1 WORKING */
int a, b, c, d, e = 0;
int mask = 0x55;
/* first 8 bits */
a = (x & mask)&1;
/* second eight bits */
b = ((x>>8) & mask)&1;
/* third eight bits */
c = ((x>>16) & mask)&1;
/* fourth eight bits */
d = ((x>>24) & mask)&1;
e = a & b & c & d;
return e;
}
What am I doing wrong here?
When you do, for example, this:
d = ((x>>24) & mask)&1;
...you're actually checking whether the lowest bit (with value 1) is set, not whether the mask bits are set, since the &1 at the end bitwise ANDs the result of the rest with 1. If you change the &1 to == mask, you'll instead get 1 when all of the bits set in mask are set in (x>>24), as intended. And of course, the same problem exists for the other similar lines as well.
If you can't use comparisons like == or != either, then you'll need to shift all the interesting bits into the same position, then AND them together and with a mask to eliminate the other bit positions. In two steps, this could be:
/* get bits that are set in every byte of x */
x = (x >> 24) & (x >> 16) & (x >> 8) & x;
/* 1 if all of bits 0, 2, 4 and 6 are set */
return (x >> 6) & (x >> 4) & (x >> 2) & x & 1;
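The two steps wrapped into a complete function. This sketch works on an unsigned copy so the right shifts are logical rather than implementation-defined, and it assumes a 32-bit unsigned int.

```c
/* 1 if every even-numbered bit (0, 2, ..., 30) of x is set */
static int allEvenBits(int x)
{
    unsigned u = (unsigned)x;
    u = (u >> 24) & (u >> 16) & (u >> 8) & u;            /* AND the 4 bytes */
    return (int)((u >> 6) & (u >> 4) & (u >> 2) & u & 1); /* bits 0,2,4,6 */
}
```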
I don't know why you are ANDing your values with 1. What is the purpose of that?
This code is untested, but I would do something along the lines of the following.
int allEvenBits(int x) {
return ((x & 0x55) == 0x55) &&
(((x >> 8) & 0x55) == 0x55) &&
(((x >> 16) & 0x55) == 0x55) &&
(((x >> 24) & 0x55) == 0x55);
}
Say you are checking the 4 least significant bits: the even ones (bits 0 and 2) make 0101. Now AND this mask with the low 4 bits of the number you're checking; if all the even bits are set, all the 1s of the mask survive. So the test would be ((number & mask) == mask) with mask 0101 for the 4 least significant bits, and you do this in blocks of 4 bits (or 8, since constants up to 0xFF are allowed).
If you aren't allowed to use constants larger than 0xff and your existing program works, how about replacing:
int all_even_bits = 0x55555555;
by:
int all_even_bits = 0x55;
all_even_bits |= all_even_bits << 8; /* it's now 0x5555 */
all_even_bits |= all_even_bits << 16; /* it's now 0x55555555 */
Some of the other answers here right shift signed integers (i.e. int), which is implementation-defined behaviour for negative values.
An alternative route is:
int allevenbitsone(unsigned int a)
{
a &= a>>16; /* superimpose top 16 bits on bottom */
a &= a>>8; /* superimpose top 8 bits on bottom */
a &= a>>4; /* superimpose top 4 bits on bottom */
a &= a>>2; /* and down to last 2 bits */
return a&1; /* return & of even bits */
}
What this is doing is and-ing together the even 16 bits into bit 0, and the odd 16 bits into bit 1, then returning bit 0.
The main problem in your code is that you're doing &1: you take the low 8 bits of the number, mask them with 0x55 and then use only the lowest bit, which is wrong.
Consider a straightforward approach:
int evenBitsIn8BitNumber(int a) {
return (a & (a>>2) & (a>>4) & (a>>6)) & 1;
}
int allEvenBits(int a) {
return evenBitsIn8BitNumber(a) &
evenBitsIn8BitNumber(a>>8) &
evenBitsIn8BitNumber(a>>16) &
evenBitsIn8BitNumber(a>>24);
}