Finding out the number of bits set in a variable in a faster manner [duplicate] - c

This question already has answers here:
Possible Duplicate:
Best algorithm to count the number of set bits in a 32-bit integer?
Closed 13 years ago.
Finding out the number of bits set in a variable is easy. But how can we perform the same operation in the fastest way?

This page on Bit Twiddling Hacks covers several techniques to count the number of bits set, and discusses the performance of each.

The bit twiddling hacks page has a variety of suggestions.

I highly recommend reading Hacker's Delight for all questions regarding various forms of bit-twiddling. For counting bits, in particular, it analyzes several algorithms depending on the instructions you might have available to you.

If you're asking the question, then chances are __builtin_popcount on gcc is at least as fast as what you're currently doing. __builtin_popcount can generally be beaten on x86, so presumably on other CPUs too, but you don't say what your CPU is other than "embedded". It affects the answer.
If you're not using gcc, then you need to look up how to do a fast popcount on your actual compiler and/or CPU. For obvious reasons, there is no such thing as "the fastest way to count set bits in C".
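For reference, a minimal sketch of calling the GCC builtin mentioned above (the test value is an arbitrary example of mine):

#include <stdio.h>

int main(void)
{
    unsigned v = 0x225;                     /* binary 10 0010 0101 */
    printf("%d\n", __builtin_popcount(v)); /* prints 4 */
    return 0;
}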

int count_bits(unsigned value)
{
    int i, size, set = 0;   /* 'set' must be initialized */
    for (i = 0, size = (int)(sizeof(value) * 8); i < size; i++)
    {
        if (value & (1u << i)) set++;   /* test bit i */
    }
    return set;
}

Counting the set bits in a variable is termed the "population count", shortened to "popcount".
A very good micro-benchmark of different software algorithms is given at: http://www.dalkescientific.com/writings/diary/archive/2008/07/05/bitslice_and_popcount.html
AMD "Barcelona" processors onwards have a fast fixed-cost POPCNT instruction, which in GCC you can get using __builtin_popcount (when compiling with an appropriate -march setting)
On Intel boxes I've found that __builtin_ffs in a loop works best for sparse bit sets.
It's something you can't rely upon; you must micro-benchmark if this is important to you.
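As an illustration of the sparse-set idea, here is a sketch of my own using Kernighan's bit-clearing trick (not necessarily the exact __builtin_ffs loop benchmarked above): the loop runs once per set bit rather than once per bit position, so it is fast when few bits are set.

int popcount_sparse(unsigned v)
{
    int count = 0;
    while (v) {
        v &= v - 1;   /* clears the lowest set bit */
        count++;
    }
    return count;
}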

If the variable is an integer, you can count its bits using

public static int BitCount(int x)
{
    return (x == 0) ? 0 : ((x < 0) ? 1 : 0) + BitCount(x << 1);
}
Explanation:
The function is recursive: if the number is zero, no bits are set and it returns zero.
Otherwise, it checks the sign bit and records a 1 if it is set, a 0 if not,
then shifts the entire number one bit to the left,
eliminating the sign bit just examined
and putting a zero in the rightmost bit,
and calls itself again with the new left-shifted value.
The overall effect is to examine each bit from leftmost to rightmost. For each bit, a 1 or 0 is kept on the call stack, the next bit is left-shifted into the sign-bit position, and the function recurses. Once every set bit has been shifted out, the value becomes zero and the recursion stops. The function then returns up the call stack, adding up all the values it recorded on the way down, and returns the total.

Related

Operating Rightmost/Leftmost n-Bits, Not All the Bits of A Integer Type Data Variable

In a programming task, I have to add a smaller integer in variable B (data type int)
to a larger integer (20 decimal digits) in variable A (data type long long int),
then compare A with variable C, which is as large an integer (data type long long int) as A.
What I realized is that since I add a smaller B to A,
I don't need to check all the digits of A when I compare it with C; in other words, we don't need to check all the bits of A and C.
Given that I know how many bits from the right I need to check, say n bits,
is there a way/technique to check only those specific n bits from the right (not all the bits of A and C) to make the program faster in the C programming language?
Comparing all the bits takes more time, and since I am working with large numbers, the program becomes slower.
Every time I search on Google, bit masking comes up, which uses all the bits of A and C; that doesn't do what I am asking for, so probably I am not using the correct terminology. Please help.
Addition:
Initial comments on this post made me think there is no way, but I found the following:
Bit Manipulation by University of Colorado Boulder
(#cuboulder, after 7:45)
...the bit band region is accessed via a bit band alias; each bit in a
supported bit band region has its own unique address and we can access
that bit using a pointer to its bit band alias location; the least
significant bit in an alias location can be set or cleared and that
will be mapped to the bit in the corresponding data or peripheral
memory; unfortunately this will not help you if you need to write to
multiple bit locations in memory, as these operations only allow a
single bit to be cleared or set...
Is the above what I am asking for? If yes, then
where can I find the details, as a beginner?
Updated question:
Is there a way/technique to check only those specific n bits from the right (not all the bits of A and C) in the C programming language (or any other language) that makes the program faster?
Your assumption that comparing fewer bits is faster might be true in some cases but is probably not true in most cases.
I'm only familiar with x86 CPUs. An x86-64 processor has 64-bit-wide registers. These can be accessed as 64-bit registers, but the lower bits also as 32-, 16- and 8-bit registers. There are processor instructions which work with the 64-, 32-, 16- or 8-bit parts of the registers. Comparing 8 bits is one instruction, but so is comparing 64 bits.
If using the 32-bit comparison were faster than the 64-bit comparison, you could gain some speed. But it seems there is no speed difference on current processor generations. (Check out the "cmp" instruction with the link to uops.info from #harold.)
If your long long data type is actually bigger than the word size of your processor, then it's a different story. E.g. if your long long is 64 bits but you are on a 32-bit processor, then the comparison cannot be handled by one register and you would need multiple instructions. So if you know that comparing only the lower 32 bits would be enough, this could save some time.
Also note that comparing only, e.g., 20 bits would actually take more time than comparing 32 bits: you would have to mask off the 12 highest bits and then compare the remaining 32. So you would need a bitwise-AND instruction in addition to the comparison.
As you see, this is very processor-specific, and you are down at the level of processor opcodes. As #RawkFist wrote in his comment, you could try to get the C compiler to create such instructions, but that does not automatically mean that this is any faster.
All of this is only relevant if these operations are executed a lot. I'm not sure what you are doing. If, e.g., you add many values B to A and compare them to C each time, it might be faster to start with C, subtract the B values from it and compare with 0, because internally the compare operation works like a subtraction. So instead of an add and a compare instruction, a single subtraction would be enough within the loop. But modern CPUs and compilers are very smart and optimize a lot, so maybe the compiler already performs such optimizations.
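A minimal sketch of that rewrite (the function name and parameters are hypothetical; the original post only names A, B and C):

#include <stddef.h>

/* Checks whether a plus a running sum of b[0..n-1] ever equals c,
   using one subtraction per iteration instead of an add plus a compare. */
int prefix_sum_hits(long long a, const int *b, size_t n, long long c)
{
    long long diff = c - a;        /* remaining amount needed to reach c */
    for (size_t i = 0; i < n; i++) {
        diff -= b[i];
        if (diff == 0)
            return 1;              /* a + b[0] + ... + b[i] == c */
    }
    return 0;
}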
Try this question.
Is there a way/technique to check only those specific n bits from the right (not all the bits of A and C) in the C programming language (or any other language) that makes the program faster?
Yes - when A + B != C: we can short-cut the comparison once a difference is found, working from least to most significant.
No - when A + B == C: all bits need comparison.
Now back to OP's original question
Is there a way/technique to check only those specific n bits from the right (not all the bits of A and C) in the C programming language (or any other language) that makes the program faster?
No. In order to do so, we would need to out-think the compiler. A decent optimizing compiler will itself notice any "tricks" available for a long long + int == long long comparison and emit efficient code.
Yet what about really long compares? How about a custom uint1000000 for A and C?
For long compares of a custom type, a quick compare can be had.
First, select a fast working type. unsigned is a prime candidate.
typedef unsigned ufast;
Now define the wide integer.
#include <limits.h>
#include <stdbool.h>
#define UINT1000000_N (1000000/(sizeof(ufast) * CHAR_BIT))
typedef struct {
    // Least significant first
    ufast digit[UINT1000000_N];
} uint1000000;
Perform the addition and compare one "digit" at a time.
bool uint1000000_fast_offset_compare(const uint1000000 *A, unsigned B,
        const uint1000000 *C) {
    ufast carry = B;
    for (unsigned i = 0; i < UINT1000000_N; i++) {
        ufast sum = A->digit[i] + carry;
        if (sum != C->digit[i]) {
            return false;
        }
        carry = sum < A->digit[i];
    }
    return true;
}
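A hypothetical usage example of the sketch above (the variable names are mine):

#include <stdio.h>

int main(void)
{
    static uint1000000 a, c;   /* static: zero-initialized, so a == 0 */
    c.digit[0] = 42;           /* c = 42 */
    /* 0 + 42 == 42, so the digit-at-a-time compare reports true */
    printf("%d\n", (int)uint1000000_fast_offset_compare(&a, 42, &c));
    return 0;
}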

Is bit masking comparable to "accessing an array" in bits?

For all the definitions I've seen of bit masking, they all just dive right into how to bit mask, use bitwise, etc. without explaining a use case for any of it. Is the purpose of updating all the bits you want to keep and all the bits you want to clear to "access an array" in bits?
Is the purpose of updating all the bits you want to keep and all the bits you want to clear to "access an array" in bits?
I will say the answer is no.
When you access an array of int you'll do:
int_array[index] = 42; // Write access
int x = int_array[42]; // Read access
If you want to write similar functions to read/write a specific bit in e.g. an unsigned int in an "array-like fashion", it could look like:
unsigned a = 0;
set_bit(a, 4); // Set bit number 4
unsigned x = get_bit(a, 4); // Get bit number 4
The implementation of set_bit and get_bit will require (among other things) some bitwise mask operation.
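For instance, a minimal sketch of how such helpers might look (an assumed implementation of mine, written as macros so the first argument can be assigned to):

#define set_bit(var, n)    ((var) |= 1u << (n))       /* set bit n to 1 */
#define clear_bit(var, n)  ((var) &= ~(1u << (n)))    /* set bit n to 0 */
#define get_bit(var, n)    (((var) >> (n)) & 1u)      /* read bit n */

Real code might prefer inline functions taking a pointer, to avoid the usual macro pitfalls such as multiple evaluation of arguments.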
So yes - to access bits in an "array like fashion" you'll need masking but...
There are many other uses of bit level masking.
Example:
int buffer[64];
unsigned index = 0;
void add_to_cyclic_buffer(int n)
{
    buffer[index] = n;
    ++index;
    index &= 0x3f; // Masking by 0x3f ensures index is always in the range 0..63
}
Example:
unsigned a = some_func();
a |= 1; // Make sure a is odd
a &= ~1; // Make sure a is even
Example:
unsigned a = some_func();
a &= ~0xf; // Make sure a is a multiple of 16
These are just a few examples of using "masking" that have nothing to do with accessing bits as an array. Many other examples can be made.
So to conclude:
Masking can be used to write functions that access bits in an array like fashion but masking is used for many other things as well.
So there are 3 (or 4) main uses.
One, as you say, is where you use the word as a set of true/false flags, where each flag is just indexed in a symmetric manner. I use 'word' here to mean the piece of discrete memory that you access in a single operation, so a byte holds 8 bit values and a 'long long' holds 64 bits. With a bit more effort, an array of words can be used as an array of packed flags.
A second is where you are doing some manipulation of the value, but still consider the word to hold one value. There are many tricks like setting or clearing bottom bits to ensure alignment, or clearing top bits to get a modulus, shifting to divide or multiply by powers of 2.
A third use is where you want to pack lots of smaller-ranged values into a word. Each of the values has a particular meaning in context. This may be either because you need to communicate with a device that has defined this as its protocol, or because you need to create so many objects that the space saved in each object outweighs the increase in code size and the cost in code speed (though that might be weighed against the increased cache misses causing slowdown if the object were bigger).
As a distinction the fourth case is where these fields are distinct 1-bit flags that have specific meanings in the context of the code. Data objects tend to collect a number of such flags, and it is simply more convenient sometimes to store them as bits in a single location, than to use separate bytes for each flag. Generally testing a particular fixed indexed bit, or a fixed masked bit is no more expensive in code size or speed than testing the whole byte, though writing can be more complex. The storage savings are clear, so often programmers will declare an enumeration of bit masks by default when faced with creating a number of flags in a structure, or when writing a function.
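A small illustration of that flag-enumeration idiom (the flag names are hypothetical):

enum {
    FLAG_VISIBLE = 1u << 0,
    FLAG_DIRTY   = 1u << 1,
    FLAG_LOCKED  = 1u << 2
};

struct object {
    unsigned flags;   /* several 1-bit flags packed into one word */
};

void example(struct object *obj)
{
    obj->flags |= FLAG_DIRTY;          /* set a flag */
    obj->flags &= ~FLAG_VISIBLE;       /* clear a flag */
    if (obj->flags & FLAG_LOCKED) {    /* test a flag */
        /* ... */
    }
}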

Is bitwise & equivalent to modulo operation

I came across the following snippet, which as I understand it converts an integer into its binary equivalent.
Can anyone tell me why &1 is used instead of %2? Many thanks.
for (i = 0; i <= nBits; ++i) {
    bits[i] = ((unsigned long) x & 1) ? 1 : 0;
    x = x / 2;
}
The representation of unsigned integers is specified by the Standard: an unsigned integer with n value bits represents numbers in the range [0, 2^n), with the usual binary semantics. Therefore, the least significant bit is the remainder of the value of the integer after division by 2.
It is debatable whether it's useful to replace readable mathematics with low-level bit operations; this kind of style was popular in the 70s when compilers weren't very smart. Nowadays I think you can assume that a compiler will know that dividing by two can be realized as bit shift etc., so you can just Write What You Mean.
What the code snippet does is not convert an unsigned int into a binary number (its internal representation is already binary). It creates a bit array with the values of the unsigned int's bits; it spreads them out over an array, if you will.
e.g. x=3 => bits[2]=0 bits[1]=1 bits[0]=1
To do this, it:
selects the last bit of the number and places it in the bits array (the &1 operation),
then shifts the number to the right by one position (/2 is equivalent to >>1 here),
and repeats the above operations for all the bits.
You could have used %2 instead of &1; the generated code should be the same. But I guess it's just a matter of programming style and preference. For most programmers, &1 is a lot clearer than %2.
In your example, %2 and &1 are the same. Which one to use is probably simply a matter of taste. While %2 is probably easier to read for people with a strong mathematics background, &1 is easier to understand for people with a strong technical background.
They are equivalent in this very special case. It's an old Fortran-influenced style.
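One caveat worth adding (my note, not from the answers above): the two forms agree for unsigned values, as in the snippet being discussed, but they differ for negative signed values, since C99 defines % to truncate toward zero:

#include <stdio.h>

int main(void)
{
    unsigned u = 7;
    printf("%u %u\n", u % 2, u & 1);   /* prints: 1 1 */

    int n = -7;
    printf("%d %d\n", n % 2, n & 1);   /* prints: -1 1 */
    return 0;
}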

How does C perform the % operation internally

I am curious to understand the logic behind the mod operation, since I understand that bit-shifting operations can be used to do different things, such as shifting to multiply.
One way I can see it being done is by a recursive algorithm that keeps dividing until you cannot divide anymore, but this does not seem efficient.
Any ideas will be helpful. Thanks in advance!
The quick version is: it depends on the hardware, the optimizer, whether it's division by a constant or not, whether there are exceptions to be checked for (e.g. modulo by 0), and if and how negative numbers are handled (this is a scary question for C++), etc...
R gave a nice, concise answer for unsigned integers, but it's difficult to understand unless you're well versed with C.
The crux of the technique illuminated by R is to strip away multiples of q until there are no more multiples of q left. We could naively do this with a simple loop:
while (p >= q) p -= q; // One liner, woohoo!
The code may be short, but for large values of p and small values of q this might take a very long time.
Better than stripping away one q at a time would be to strip away many q's at a time. Note that we actually want to strip away as many q's as possible -- that is, floor(p/q) many q's... And indeed, that's a valid technique. For unsigned integers, one would expect that p % q == p - (p / q) * q. (Note that unsigned integer division rounds down.)
But this almost feels like cheating because division and remainder operations are so intimately related. (In fact, often if hardware natively supports division, it supports a divide-and-compute-remainder operation because they're so strongly related.)
Assuming we've no access to division, how shall we find a multiple of q greater than 1 to strip away? In hardware, fixed shift operations are cheap (if not practically free) and conceptually represent multiplication by a non-negative power of two. For example, shifting a bit string left by 3 is equivalent to multiplying by 8 (that is, 2^3), e.g. 5 decimal is equivalent to '101' binary. Shift '101' in binary by adding three zeroes on the right (giving '101000') and the result is 50 in decimal -- five times eight.
Likewise, shift operations are very cheap as software operations and you'll struggle to find a controller that doesn't support them and quickly. (Some architectures such as ARM can even combine shifts with other instructions to make them 'free' a good deal of the time.)
ARMed (couldn't resist) with these shift operations, we can proceed as follows:
Find out the largest power of two we can multiply q by and still be less than p.
Working from the largest power of two to the smallest, multiply q by each power of two and if it's less than what's left of p subtract it from what's left of p.
Whatever you've got left is the remainder.
Why does this work? Because in the end you'll find that all the subtracted powers of two actually sum to floor(p / q)! Don't take my word for it, similar knowledge has been known for a very long time.
Breaking apart R's answer:
#define HI (-1U-(-1U/2))
This effectively gives you an unsigned integer with only the highest value bit set.
unsigned i;
for (i=0; !(HI & (q<<i)); i++);
This line actually finds the highest power of two q can be multiplied before overflowing an unsigned integer. This isn't strictly necessary, but it doesn't change the results other than increasing the amount of execution time required.
In case you're not familiar with the C-isms in this line:
(q<<i) is a left bit shift by i. Recall this is equivalent to multiplying by 2^i.
HI & (q<<i) performs a bitwise-AND. Since HI only has its top bit populated this will only result in a non-zero value when (q<<i) is large enough to cause the top bit to be non-zero. One more shift over to the left and there'd be an integer overflow.
!(HI & (q<<i)) is 'true' when (HI & (q<<i)) is zero and 'false' otherwise.
do { if (p >= (q<<i)) p -= (q<<i); } while (i--);
This is a simple decreasing loop, do { ... } while (i--);. Note that the post-decrement is used on i: the loop body executes, then i's value is tested (the loop continues if it was non-zero), and then one is subtracted from i. This has the property that the loop executes one final time when i is 0, which is important because we may need to strip away an unmultiplied copy of q.
if (p >= (q<<i)) checks if the 2^i * q is less than or equal to p. If it is, p -= (q<<i) strips it away.
The remainder is left.
While most C implementations run on hardware that has a division instruction, the remainder operation can be performed roughly like this, for computing p%q, assuming unsigned values:
#define HI (-1U-(-1U/2))
unsigned i;
for (i=0; !(HI & (q<<i)); i++);
do { if (p >= (q<<i)) p -= (q<<i); } while (i--);
The resulting remainder is in p.
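For experimentation, the same snippet can be packaged as a self-contained function (my wrapper and naming, not part of the original answer; it assumes q != 0):

unsigned mod_shift_subtract(unsigned p, unsigned q)
{
#define HI (-1U - (-1U / 2))                /* only the highest bit set */
    unsigned i;
    for (i = 0; !(HI & (q << i)); i++)      /* largest shift of q without overflow */
        ;
    do {
        if (p >= (q << i))
            p -= (q << i);                  /* strip away that multiple of q */
    } while (i--);
#undef HI
    return p;                               /* the remainder p % q */
}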
In addition to a hardware instruction and implementation using shifts, as R.. suggests, there's also reciprocal multiplication.
This technique can be used when the right-hand side of % is a constant, known at compile time.
Reciprocal multiplication is used to implement division, but using it for % is easy, based on the formula a%b == a-(a/b)*b.
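For a compile-time-constant divisor the optimizer typically performs this transformation on its own, so the identity can be written directly; a small sketch of mine:

/* With optimization enabled, a modern compiler usually turns the
   division by the constant 7 into a multiply by a precomputed
   reciprocal, so no divide instruction is executed. */
unsigned rem7(unsigned a)
{
    return a - (a / 7) * 7;   /* a % 7, via the identity above */
}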
Depending on the smarts of the optimizer, there is a shortcut for modulo by a power of 2. For example, a % 32 can be implemented as a & 31. In general, a % (2^N) == a & (2^N - 1). This is lightning fast compared to division: most dividers (even hardware ones) require at least 1 cycle for each bit of the result, while a logical AND takes just a few cycles (in the pipeline).
EDIT: this only works if a is unsigned!
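A quick illustration of that shortcut (example values of mine; note the unsigned type, as the EDIT warns):

#include <stdio.h>

int main(void)
{
    unsigned a = 1000;                    /* 1000 == 31*32 + 8 */
    printf("%u %u\n", a % 32, a & 31);    /* prints: 8 8 */
    return 0;
}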

How to determine the number of useful bits of a number?

I'm writing a function which determines the number of useful bits of a 16-bit integer.
int16_t
f(int16_t x)
{
    /* ... */
}
For example, the number "00000010 00100101" has 10 useful bits. I think I should use some bitwise operators, but I don't know how. I'm looking for some ways to do it.
If you're using gcc (or a gcc-compatible compiler such as ICC) then you can use the built-in intrinsics, e.g.

#include <limits.h>
#include <stdint.h>

int f(int16_t x)
{
    /* Mask to 16 bits to avoid sign extension; __builtin_clz counts
       within the promoted unsigned int, so subtract from its width. */
    unsigned v = (uint16_t)x;
    return v != 0 ? (int)(sizeof(unsigned) * CHAR_BIT) - __builtin_clz(v) : 0;
}
This assumes you just want the number of bits to the right of the last leading zero bit.
For MSVC you can use _BitScanReverse with some adjustment.
Otherwise if you need this to be portable then you can implement your own general purpose clz function, see e.g. http://en.wikipedia.org/wiki/Find_first_set
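A minimal portable fallback in the spirit of that page (a sketch of mine using plain iteration, not the faster branchless variants described there):

#include <stdint.h>

/* Counts leading zeros of a 16-bit value by discarding one bit per
   loop iteration until nothing remains. */
int clz16(uint16_t x)
{
    int n = 16;
    while (x) {
        x >>= 1;
        n--;
    }
    return n;
}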
These are called bit scan operations, and on the Intel architecture there are assembly instructions for them (which you can call directly from C); see here. If you are using an MS compiler, start from here.
Logarithms compute the number of digits needed to represent a number in a given base:
Let [x] be x rounded down to an integer (the floor).
Then [log_b(x)] + 1 is the number of digits needed to represent x in base b (for x >= 1).
Hence, if you want to know the number of significant bits of some x > 0 in C, floor(log2(x)) + 1 will tell you.
On targets without a count-leading-zeros instruction, computing the logarithm may actually be faster than naively iterating over the bits.
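A sketch of that approach (my own; fine for 16-bit inputs, though floating-point rounding can bite for much wider integers):

#include <math.h>
#include <stdint.h>

int useful_bits(uint16_t x)
{
    /* floor(log2(x)) + 1 significant bits for x > 0; 0 for x == 0 */
    return x ? (int)floor(log2((double)x)) + 1 : 0;
}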
