Most software implementations of clz() are optimized for 32-bit unsigned integers.
How can I efficiently count leading zeros in a 24-bit unsigned integer?
UPD. Target's characteristics:
CHAR_BIT 24
sizeof(int) 1
sizeof(long int) 2
sizeof(long long int) 3
TL;DR: See point 4 below for the C program.
Assuming your hypothetical target machine is capable of correctly implementing unsigned 24-bit multiplication (which must return the low-order 24 bits of the product), you can use the same trick as is shown in the answer you link. (But you might not want to. See [Note 1].) It's worth trying to understand what's going on in the linked answer.
The input is reduced to a small set of values, where all integers with the same number of leading zeros map to the same value. The simple way of doing that is to flood every bit to cover all the bit positions to the right of it:
x |= x>>1;
x |= x>>2;
x |= x>>4;
x |= x>>8;
x |= x>>16;
That will work for 17 up to 32 bits; if your target datatype has 9 to 16 bits, you could leave off the last shift-and-or because there is no bit position 16 bits to the right of any bit. And so on. But with 24 bits, you'll want all five shift-and-or.
With that, you've turned x into one of 25 values (for 24-bit ints):
x clz x clz x clz x clz x clz
-------- --- -------- --- -------- --- -------- --- -------- ---
0x000000 24 0x00001f 19 0x0003ff 14 0x007fff 9 0x0fffff 4
0x000001 23 0x00003f 18 0x0007ff 13 0x00ffff 8 0x1fffff 3
0x000003 22 0x00007f 17 0x000fff 12 0x01ffff 7 0x3fffff 2
0x000007 21 0x0000ff 16 0x001fff 11 0x03ffff 6 0x7fffff 1
0x00000f 20 0x0001ff 15 0x003fff 10 0x07ffff 5 0xffffff 0
Now, to turn x into clz, we need a good hash function. We don't necessarily expect that hash(x)==clz, but we want the 25 possible x values to hash to different numbers, ideally in a small range. As with the link you provide, the hash function we'll choose is to multiply by a carefully-chosen multiplier and then mask off a few bits. Using a mask means that we need to choose five bits; in theory, we could use a 5-bit mask anywhere in the 24-bit word, but in order to not have to think too much, I just chose the five high-order bits, the same as the 32-bit solution. Unlike the 32-bit solution, I didn't bother adding 1, and I expect to get distinct values for all 25 possible inputs. The equivalent isn't possible with a five-bit mask and 33 possible clz values (as in the 32-bit case), so they have to jump through an additional hoop if the original input was 0.
Since the hash function doesn't directly produce the clz value, but rather a number between 0 and 31, we need to translate the result to a clz value, which uses a 32-byte lookup table, called debruijn in the 32-bit algorithm for reasons I'm not going to get into.
An interesting question is how to select a multiplier with the desired characteristics. One possibility would be to do a bunch of number theory to elegantly discover a solution. That's how it was done decades ago, but these days I can just write a quick-and-dirty Python program to do a brute force search over all the possible multipliers. After all, in the 24-bit case there are only about 16 million possibilities and lots of them work. The actual Python code I used is:
# Compute the 25 target values
targ=[2**i - 1 for i in range(25)]
# For each possible multiplier, compute all 25 hashes, and see if they
# are all different (that is, the set of results has size 25):
next(i for i in range(2**19, 2**24)
if len(targ)==len(set(((i * t) >> 19) & 0x1f
for t in targ)))
Calling next on a generator expression returns the first generated value, which in this case is 0x8CB4F, or 576335. Since the search starts at 0x80000 (which is the smallest multiplier for which hash(1) is not 0), the result was printed instantly. I then spent a few more milliseconds to generate all the possible multipliers between 2^19 and 2^20, of which there are 90, and selected 0xCAE8F (831119) for purely personal aesthetic reasons.
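As a rough C counterpart to the Python check above, the sketch below tests whether a candidate multiplier hashes all 25 flooded values to distinct 5-bit results. The names hash24 and is_perfect_multiplier are made up for illustration, and the 24-bit product is emulated on a wider uint32_t by masking, which is an assumption about how the target behaves.

```c
#include <stdint.h>

/* Hash used in the answer: multiply, keep the low 24 bits of the
   product (emulating a 24-bit unsigned), then take the top 5 bits. */
static unsigned hash24(uint32_t multiplier, uint32_t x)
{
    return (unsigned)(((multiplier * x) & 0xFFFFFFu) >> 19);
}

/* Returns 1 if the 25 values 2^i - 1 (i = 0..24) all hash differently. */
int is_perfect_multiplier(uint32_t multiplier)
{
    uint32_t seen = 0;                  /* one bit per hash value 0..31 */
    for (int i = 0; i <= 24; ++i) {
        uint32_t target = (UINT32_C(1) << i) - 1;
        uint32_t bit = UINT32_C(1) << hash24(multiplier, target);
        if (seen & bit)
            return 0;                   /* two inputs collided */
        seen |= bit;
    }
    return 1;
}
```

With the multiplier chosen in the answer, is_perfect_multiplier(0xCAE8F) returns 1, while 0x80000 fails because it merely copies the low five bits of the input into the hash.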
The last step is to create the lookup table from the computed hash function. (Not saying this is good Python. I just took it from my command history; I might come back and clean it up later. But I included it for completeness.):
lut = dict((i,-1) for i in range(32))
lut.update((((v * 0xcae8f) >> 19) & 0x1f, 24 - i)
for i, v in enumerate(targ))
print(" static const char lut[] = {\n " +
",\n ".join(', '.join(f"{lut[i]:2}" for i in range(j, j+8))
for j in range(0, 32, 8)) +
"\n };\n")
# The result is pasted into the C code below.
So then it's just a question of assembling the C code:
// Assumes that `unsigned int` has 24 value bits.
int clz(unsigned x) {
static const char lut[] = {
24, 23, 7, 18, 22, 6, -1, 9,
-1, 17, 15, 21, 13, 5, 1, -1,
8, 19, 10, -1, 16, 14, 2, 20,
11, -1, 3, 12, 4, -1, 0, -1
};
x |= x>>1;
x |= x>>2;
x |= x>>4;
x |= x>>8;
x |= x>>16;
return lut[((x * 0xcae8f) >> 19) & 0x1f];
}
The test code calls clz on every 24-bit integer in turn. Since I don't have a 24-bit machine handy, I just assume that the arithmetic will work the same on the hypothetical 24-bit machine in the OP.
#include <stdio.h>
// For each 24-bit integer in turn (from 0 to 2**24-1), if
// clz(i) is different from clz(i-1), print clz(i) and i.
//
// Expected output is 0 and the powers of 2 up to 2**23, with
// descending clz values from 24 to 0.
int main(void) {
int prev = -1;
for (unsigned i = 0; i < 1<<24; ++i) {
int pfxlen = clz(i);
if (pfxlen != prev) {
printf("%2d 0x%06X\n", pfxlen, i);
prev = pfxlen;
}
}
return 0;
}
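Since the test above runs on a host whose unsigned is wider than 24 bits, one way to make the 24-bit assumption explicit is to mask the product yourself and compare against a naive bit loop. This is only a sketch: clz24_mul and clz24_naive are illustrative names, the table is the one from the answer, and signed char is used so that the -1 filler entries behave the same everywhere.

```c
#include <stdint.h>

/* Multiply-and-lookup clz with the 24-bit wraparound made explicit. */
int clz24_mul(uint32_t x)
{
    static const signed char lut[32] = {
        24, 23,  7, 18, 22,  6, -1,  9,
        -1, 17, 15, 21, 13,  5,  1, -1,
         8, 19, 10, -1, 16, 14,  2, 20,
        11, -1,  3, 12,  4, -1,  0, -1
    };
    x &= 0xFFFFFFu;                 /* pretend the type has 24 value bits */
    x |= x >> 1;
    x |= x >> 2;
    x |= x >> 4;
    x |= x >> 8;
    x |= x >> 16;
    x = (x * 0xCAE8Fu) & 0xFFFFFFu; /* low 24 bits of the product */
    return lut[x >> 19];
}

/* Reference implementation: test bits from bit 23 downward. */
int clz24_naive(uint32_t x)
{
    int n = 0;
    for (uint32_t bit = UINT32_C(1) << 23; bit != 0 && !(x & bit); bit >>= 1)
        ++n;
    return n;
}
```

Comparing the two functions over every value from 0 to 0xFFFFFF is a cheap way to check both the table and the multiplier.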
Notes:
If the target machine does not implement 24-bit unsigned multiply in hardware --i.e., it depends on a software emulation-- then it's almost certainly faster to do the clz by just looping over initial bits, particularly if you fold the loop by scanning several bits at a time with a lookup table. That might be faster even if the machine does do efficient hardware multiplies. For example, you can scan six bits at a time with a 32-entry table:
// Assumes that `unsigned int` has 24 value bits.
int clz(unsigned int x) {
static const char lut[] = {
5, 4, 3, 3, 2, 2, 2, 2,
1, 1, 1, 1, 1, 1, 1, 1,
0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0
};
/* Six bits at a time makes octal easier */
if (x & 077000000u) return lut[x >> 19];
if (x & 0770000u) return lut[x >> 13] + 6;
if (x & 07700u) return lut[x >> 7] + 12;
if (x ) return lut[x >> 1] + 18;
return 24;
}
That table could be reduced to 48 bits but the extra code would likely eat up the savings.
A couple of clarifications seem to be in order here. First, although we're scanning six bits at a time, we only use five of them to index the table. That's because we've previously verified that the six bits in question are not all zero; in that case, the low-order bit is either irrelevant (if some other bit is set) or it's 1. Also, we get the table index by shifting without masking; the masking is unnecessary because we know from the masked tests that all the higher order bits are 0. (This will, however, fail miserably if x has more than 24 bits.)
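To illustrate the remark that the table could shrink to 48 bits: only the first 16 entries are non-zero and each fits in three bits, so they pack into a single octal constant with one digit per entry. The following is a sketch under that reading, not the answer's own code; packed_lut and clz24_packed are hypothetical names, and unsigned long long is assumed to hold the 48-bit constant.

```c
#include <stdint.h>

/* Entry i of the original table lives in octal digit i (the lowest digit
   is entry 0); entries 16..31 are all 0 and are handled by the range test. */
static int packed_lut(uint32_t idx)
{
    return idx >= 16 ? 0
                     : (int)((01111111122223345ULL >> (3 * idx)) & 7u);
}

/* Same scan as above, six bits at a time; x must fit in 24 bits. */
int clz24_packed(uint32_t x)
{
    if (x & 077000000u) return packed_lut(x >> 19);
    if (x & 0770000u)   return packed_lut(x >> 13) + 6;
    if (x & 07700u)     return packed_lut(x >> 7) + 12;
    if (x)              return packed_lut(x >> 1) + 18;
    return 24;
}
```

The extra comparison and shift in packed_lut are exactly the "extra code" the note warns about, so this is mostly of interest when data memory is scarcer than code space.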
Convert the 24 bit integer into a 32 bit one (either by type punning or explicitly shuffling around the bits), then do the 32 bit clz, and subtract 8.
Why do it that way? Because in this day and age you'll be hard pressed to find a machine that deals with 24 bit types, natively, in the first place.
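A minimal sketch of that widen-then-subtract idea follows. The clz32 here is a plain binary-search stand-in for whatever 32-bit clz your platform offers, and both function names are assumptions made for the example.

```c
#include <stdint.h>

/* Portable 32-bit clz by binary search; returns 32 for x == 0. */
static int clz32(uint32_t x)
{
    int n = 0;
    if (x == 0) return 32;
    if (!(x & 0xFFFF0000u)) { n += 16; x <<= 16; }
    if (!(x & 0xFF000000u)) { n += 8;  x <<= 8;  }
    if (!(x & 0xF0000000u)) { n += 4;  x <<= 4;  }
    if (!(x & 0xC0000000u)) { n += 2;  x <<= 2;  }
    if (!(x & 0x80000000u)) { n += 1; }
    return n;
}

/* Treat the 24-bit value as the low bits of a 32-bit word and
   discount the 8 padding bits that are always zero. */
int clz24_via32(uint32_t x24)
{
    return clz32(x24 & 0xFFFFFFu) - 8;
}
```

Because clz32 defines the zero case itself, clz24_via32(0) comes out as 24 without a separate check.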
I would look for the builtin function or intrinsic available for your platform and compiler. Those functions usually implement the most efficient way of finding the most significant bit number. For example, gcc has __builtin_clz function.
If the 24 bit integer is stored in a byte array (for example received from sensor)
#include <limits.h> // CHAR_BIT
#include <string.h> // memcpy
#define BITS(x) (CHAR_BIT * sizeof(x) - 24)
int unaligned24clz(const void * restrict val)
{
unsigned u = 0;
memcpy(&u, val, 3); // fills the low-order bytes only on a little-endian host
#if defined(__GNUC__)
return __builtin_clz(u) - BITS(u);
#elif defined(__ICCARM__)
return __CLZ(u) - BITS(u);
#elif defined(__arm__)
return __clz(u) - BITS(u);
#else
return clz(u) - BITS(u); //portable version using standard C features
#endif
}
If it is stored in valid integer
int clz24(const unsigned u)
{
#if defined(__GNUC__)
return __builtin_clz(u) - BITS(u);
#elif defined(__ICCARM__)
return __CLZ(u) - BITS(u);
#elif defined(__arm__)
return __clz(u) - BITS(u);
#else
return clz(u) - BITS(u); //portable version using standard C features
#endif
}
https://godbolt.org/z/z6n1rKjba
You can add more compilers support if you need.
Remember that the result of __builtin_clz is undefined when the value is 0, so you will need to add a separate check for that case.
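One possible shape for that extra check is sketched below, together with a byte-order-independent way of assembling the three bytes. The names load24_le and clz24_checked are hypothetical, the buffer is assumed to hold the least significant byte first, and unsigned int is assumed to be at least 32 bits wide, as in the code above.

```c
#include <limits.h>
#include <stdint.h>

/* Build the value from three bytes stored least significant byte first;
   unlike memcpy, this does not depend on the host's byte order. */
uint32_t load24_le(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) | ((uint32_t)p[2] << 16);
}

/* clz for a 24-bit value, with the zero case handled explicitly. */
int clz24_checked(uint32_t u)
{
    u &= 0xFFFFFFu;
    if (u == 0)
        return 24;                  /* __builtin_clz(0) is undefined */
#if defined(__GNUC__)
    return __builtin_clz((unsigned)u) - ((int)(sizeof(unsigned) * CHAR_BIT) - 24);
#else
    int n = 0;                      /* fallback: scan from bit 23 down */
    for (uint32_t bit = UINT32_C(1) << 23; !(u & bit); bit >>= 1)
        ++n;
    return n;
#endif
}
```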
I need to merge two variables into one like in example below:
x = 0b110101
y = 0b10011
merged(x,y) -> 0b11010111001
merged(y,x) -> 0b10011101011
So, y is reversed and concatenated to x without unnecessary zeros (last bit of result is always 1)
In other words: merged(abcdefg, hijklm) results in abcdefgmlkjih where a and h are 1
What would be the most efficient way to do this in C?
EDIT: most efficient = fastest, sizeof(x) = sizeof(y) = 1, CHAR_BIT= 8
EDIT2: Since question has been put on hold I will post summary right here:
Further limitations and more detail:
Target language: C with 64-bit operations support
For this exact problem, the fastest method would be a 256-element lookup table, as was suggested in the comments. For the second argument it returns the index N of the high bit and the bit-reversed value, so we can shift the first argument to the left by N and perform a bitwise or with the reversed value of the second argument.
In case someone needs good performance for arguments larger than 8 bit (lookup table is not an option) they could:
Find index of high bit using de Bruijn algorithm. Example for 32 bits:
uint32_t msbDeBruijn32( uint32_t v )
{
static const int MultiplyDeBruijnBitPosition[32] =
{
0, 9, 1, 10, 13, 21, 2, 29, 11, 14, 16, 18, 22, 25, 3, 30,
8, 12, 20, 28, 15, 17, 24, 7, 19, 27, 23, 6, 26, 5, 4, 31
};
v |= v >> 1; // first round down to one less than a power of 2
v |= v >> 2;
v |= v >> 4;
v |= v >> 8;
v |= v >> 16;
return MultiplyDeBruijnBitPosition[( uint32_t )( v * 0x07C4ACDDU ) >> 27];
}
taken from this question:
Find most significant bit (left-most) that is set in a bit array
Reverse bits in bytes (one by one) of the second argument using this bit twiddling hack
unsigned char b; // reverse this (8-bit) byte
b = (b * 0x0202020202ULL & 0x010884422010ULL) % 1023;
taken from here: http://graphics.stanford.edu/~seander/bithacks.html#BitReverseObvious
So, this question now can be closed/deleted
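The 256-element lookup table described in the summary could look roughly like this. It is a sketch rather than the asker's actual code: the tables are filled at start-up instead of being written out, the stored shift count is the bit length of the second argument (high-bit index plus one), and all names are invented for the example.

```c
#include <stdint.h>

static uint8_t bit_length[256];   /* high-bit index plus one; 0 for 0 */
static uint8_t reversed[256];     /* value with its significant bits reversed */

/* Fill both tables once, before the first call to merged_lut(). */
void merged_lut_init(void)
{
    for (unsigned v = 0; v < 256; ++v) {
        unsigned len = 0, rev = 0;
        for (unsigned t = v; t != 0; t >>= 1) {
            rev = (rev << 1) | (t & 1u);
            ++len;
        }
        bit_length[v] = (uint8_t)len;
        reversed[v] = (uint8_t)rev;
    }
}

/* Shift x left by the length of y, then append y reversed. */
uint16_t merged_lut(uint8_t x, uint8_t y)
{
    return (uint16_t)(((unsigned)x << bit_length[y]) | reversed[y]);
}
```

With the values from the question, merged_lut(0x35, 0x13) gives 0x6B9 (0b11010111001) and merged_lut(0x13, 0x35) gives 0x4EB (0b10011101011).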
Solution
This should do the trick in C++ and C:
unsigned int merged(unsigned char a, unsigned char b) {
unsigned int r = a;
for (; b; b>>=1)
r = (r<<1)|(b&1);
return r;
}
I don't think that there's a more efficient way to do it. Here is an online demo.
How does it work ?
This works by building the result by starting with the first argument:
r is 0b110101;
Then it left-shifts r, which is, using your terminology, like concatenating a 0 at the lowest end of r:
0b110101 << 1 is 0b1101010;
and simultaneously we get the lowest bit of b with a bitwise and, then set the lowest bit of r to the same value using a bitwise or:
0b10011 & 1 is 0b00001
so (0b110101 << 1)|(0b10011 & 1) is 0b1101011;
We then right-shift b to process the next lowest bit in the same way:
0b10011 >> 1 is 0b1001;
as long as there is some bit set left in b.
Important remarks
You have to be careful about the type used in order to avoid the risk of overflow. So for 2 unsigned char as input you could use unsigned int as output.
Note that using signed integers would be risky, if the left-shift might cause the sign bit to change. This is why unsigned is important here.
I wrote an algorithm (taken from "The C Programming Language") that counts the number of 1-bits very fast:
int countBit1Fast(int n)
{
int c = 0;
for (; n; ++c)
n &= n - 1;
return c;
}
But a friend told me that __builtin_popcount(int) is a lot faster, but less portable. I gave it a try and it was MANY times faster! Why is it so fast? I want to count bits as fast as possible, but without being tied to a particular compiler.
EDIT: I may use it on PIC micro-controllers and maybe on non-intel processors, so I need the maximum portability.
I wrote an algorithm (taken from "The C Programming Language") that counts the number of 1-bits very fast:
I don't see why anyone would characterize your approach as "very fast". It's a bit clever, and it should be faster on average than naive alternatives. It also does not depend on the width of the representation of int, which is a plus. I observe that it has undefined behavior for negative arguments, but that's a common theme for bitwise operators and functions.
Let's analyze, supposing a non-negative argument:
int c = 0;
for (; n; ++c)
n &= n - 1;
How many loop iterations are performed?
1 for each 1 bit in the binary representation of the value, irrespective of where in the value each bit lies
How much work is performed per iteration
one increment of c
one comparison of n against zero (plus one more of these when breaking out of the loop)
one decrement of n by 1
one bitwise 'and'
That ignores reads and stores, which very likely can be made free or especially cheap by keeping the operands in registers. If we assume equal cost for each of those, that's four operations per iteration. For random 32-bit integers, there will be an average of 16 iterations, for a total of 65 operations on average. (Best case is just one operation, but worst is 129, which is no better than a naive implementation).
__builtin_popcount(), on the other hand, uses a single instruction regardless of input on platforms that support it, such as yours very likely is. Even on those that don't have a for-purpose instruction, however, it can be done faster (on average).
@dbush has presented one such mechanism that has similar advantages to the one you present. In particular, it does not depend on a pre-chosen integer width, and although it does depend on where in the representation the 1 bits reside, it does run faster for some arguments (smaller ones) than others. If I'm counting right, that one will average around 20 operations on random 32-bit inputs: five in each of four loop iterations (only 0.4% of random inputs would require fewer than four iterations). I'm counting one table read per iteration there, which I assume can be served from cache, but which is probably still not as fast as an arithmetic operation on values already held in registers.
One that is strictly computational would be:
int countBit1Fast(uint32_t n) {
n = (n & 0x55555555u) + ((n >> 1) & 0x55555555u);
n = (n & 0x33333333u) + ((n >> 2) & 0x33333333u);
n = (n & 0x0f0f0f0fu) + ((n >> 4) & 0x0f0f0f0fu);
n = (n & 0x00ff00ffu) + ((n >> 8) & 0x00ff00ffu);
n = (n & 0x0000ffffu) + ((n >>16) & 0x0000ffffu);
return n;
}
That's pretty easy to count: five additions, five shifts, and ten bitwise 'and' operations, and 5 loads of constants for a total of 25 operations for every input (and it goes up only to 30 for 64-bit inputs, though those are now 64-bit operations instead of 32-bit ones). This version is, however, intrinsically dependent on a particular size of the input data type.
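For a side-by-side view, here is a sketch that puts the loop and the fixed-cost version on the same unsigned type and adds a wrapper that uses the builtin only when the compiler is known to provide it. The function names are illustrative, and the wrapper goes through __builtin_popcountl on the assumption that unsigned long is the narrowest builtin argument type guaranteed to hold 32 bits.

```c
#include <stdint.h>

/* Kernighan's loop on an unsigned type: one iteration per set bit. */
int popcount_loop(uint32_t n)
{
    int c = 0;
    for (; n != 0; ++c)
        n &= n - 1;                 /* clears the lowest set bit */
    return c;
}

/* Fixed-cost parallel sum, same steps as the function above. */
int popcount_parallel(uint32_t n)
{
    n = (n & 0x55555555u) + ((n >> 1)  & 0x55555555u);
    n = (n & 0x33333333u) + ((n >> 2)  & 0x33333333u);
    n = (n & 0x0f0f0f0fu) + ((n >> 4)  & 0x0f0f0f0fu);
    n = (n & 0x00ff00ffu) + ((n >> 8)  & 0x00ff00ffu);
    n = (n & 0x0000ffffu) + ((n >> 16) & 0x0000ffffu);
    return (int)n;
}

/* Builtin where available, portable fallback everywhere else. */
int popcount_portable(uint32_t n)
{
#if defined(__GNUC__)
    return __builtin_popcountl((unsigned long)n);
#else
    return popcount_parallel(n);
#endif
}
```

Keeping the fallback in the same function means callers never need to know which path was compiled in.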
As others have mentioned, __builtin_popcount() is fast because it uses a single x86 instruction.
If you want something faster than what you have that doesn't use anything processor or compiler specific you can create a lookup table with 256 entries:
int bitcount[] = {
0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4,
1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,
1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,
2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,
2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,
2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
4, 5, 5, 6, 5, 6, 6, 7, 5, 6, 6, 7, 6, 7, 7, 8,
};
Then use that to get the bit count of each byte:
int countBit1Fast(int n)
{
int i, count = 0;
unsigned char *ptr = (unsigned char *)&n;
for (i=0;i<sizeof(int);i++) {
count += bitcount[ptr[i]];
}
return count;
}
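If typing out 256 entries feels error-prone, the same table can be generated at start-up, and the bytes can be picked out with shifts instead of a pointer cast. This is a sketch with made-up names (bitcount_tab, bitcount_init, popcount_bytes) and it assumes a 32-bit input.

```c
#include <stdint.h>

static unsigned char bitcount_tab[256];

/* bitcount_tab[i] = low bit of i + count for the remaining bits (i / 2). */
void bitcount_init(void)
{
    bitcount_tab[0] = 0;
    for (unsigned i = 1; i < 256; ++i)
        bitcount_tab[i] = (unsigned char)((i & 1u) + bitcount_tab[i / 2]);
}

/* Adds the table entry for each of the four bytes, extracted with shifts. */
int popcount_bytes(uint32_t n)
{
    return bitcount_tab[n & 0xFFu]
         + bitcount_tab[(n >> 8) & 0xFFu]
         + bitcount_tab[(n >> 16) & 0xFFu]
         + bitcount_tab[n >> 24];
}
```

bitcount_init() has to run once before the first lookup; on a small microcontroller a const table in flash, as written out above, may still be the better trade.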
__builtin_popcount(unsigned int) is so fast because it is a gcc extension that utilizes a builtin hardware instruction. If you are willing to trade architecture portability for compiler portability, look into the just-as-fast Intel intrinsic functions, specifically:
_mm_popcnt_u32(unsigned __int32);
_mm_popcnt_u64(unsigned __int64);
You must then include the <nmmintrin.h> header file (or the umbrella header <immintrin.h>) to use these intrinsic functions; however, they will work with non-gcc compilers. You may also have to supply a target architecture to get the functions to inline (which is strictly required), using something like -march=native.
As others have mentioned, on x86_64 you have a popcount CPU instruction that will thoroughly trounce any software implementation.
In the absence of a CPU popcount instruction, which method is fastest depends on word size, lookup speed (which may depend on CPU cache behaviour), and the effectiveness of super-scalar pipelining.
The simple approach of taking each byte, looking it up in a table, and adding together these values is quite quick, taking about ceil(num_bits/8)*3-1 operations, depending on how "array fetch" works.
There's another less well known method that works by grouping bits into runs, then repeatedly creating half as many runs that are twice the size as before, where each run contains the sum of two previous runs.
This algorithm takes 4×log₂(num_bits)-1 steps, which means it performs comparatively poorly for small integer sizes, but improves for larger ones:
int size (bits)  ops (lookup)  ops (run-add)
---------------  ------------  -------------
              8             2             11
             16             5             15
             32            11             19
             64            23             23
            128            47             27
            256            95             31
Initially you start with every bit in its own run; then you take pairs of sets and add them together, so each is a number from 0 to 2 inclusive, which conveniently fits in a 2-bit unsigned int:
x = (x >> 1 & 0x55555555555555555555555555555555)
+(x & 0x55555555555555555555555555555555);
Now every pair of bits contains a number from 0 to 2, indicating how many bits used to be set in that pair.
The subsequent steps are then fairly straightforward: combine adjacent runs into new runs that are twice the width:
x = (x >> 2 & 0x33333333333333333333333333333333)
+(x & 0x33333333333333333333333333333333);
Now each run of 4 bits contains a number from 0 to 4. Since those numbers fit in 3 bits, the top bit of each run will always be 0, and doesn't need to be included in the mask.
x = (x >> 4 & 0x07070707070707070707070707070707)
+(x & 0x07070707070707070707070707070707);
Now each run of 8 bits contains a number from 0 to 8. Since those numbers fit in 4 bits, only the low 4 bits of each 16-bit field need to be included in the next mask; the top 12 bits are left out.
x = (x >> 8 & 0x000f000f000f000f000f000f000f000f)
+(x & 0x000f000f000f000f000f000f000f000f);
Now each run of 16 bits contains a number from 0 to 16. Since those numbers fit in 5 bits, only the low 5 bits of each 32-bit field need to be included in the next mask; the top 27 bits are left out.
x = (x >>16 & 0x0000001f0000001f0000001f0000001f)
+(x & 0x0000001f0000001f0000001f0000001f);
Now each run of 32 bits contains a number from 0 to 32. Since those numbers fit in 6 bits, only the low 6 bits of each 64-bit field need to be included in the next mask; the top 58 bits are left out.
x = (x >>32 & 0x000000000000003f000000000000003f)
+(x & 0x000000000000003f000000000000003f);
Now each run of 64 bits contains a number from 0 to 64. Since those numbers fit in 7 bits, only the low 7 bits of the 128-bit field need to be included in the next mask; the top 121 bits are left out.
x = (x >>64 & 0x0000000000000000000000000000007f)
+(x & 0x0000000000000000000000000000007f);
In general, for step i, pre-compute
w0 = 1<<i; /* number of bits per run for THIS cycle */
w1 = 1<<i+1; /* number of bits per run for NEXT cycle */
r1 = w1-1; /* mask for a number from 0 .. w0 inclusive */
/* Create a pattern of bits with a 1 every w1 bits: */
m1 = 1 << w1;
m3 = UINTMAX / (m1 - 1);
m4 = m3 * r1;
shift[i] = w0;
mask[i] = m4;
/* for the variant below */
m0 = 1 << w0;
s_mult[i] = m0 - 1;
and then for each step use:
x = (x >> shift[i] & mask[i])
+(x & mask[i]);
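The 128-bit constants above are not valid literals in standard C, so here is a sketch of the same run-add scheme cut down to 64 bits, once written out and once with the masks computed per step from the recipe. Function names are invented for the example. One assumption worth flagging: in the last step the run width equals the word width, so 1 << w1 cannot be formed and the "one bit every w1 bits" pattern is special-cased to 1.

```c
#include <stdint.h>

/* Run-add popcount written out for 64 bits with the narrowed masks. */
int popcount64_runadd(uint64_t x)
{
    const uint64_t m1 = UINT64_C(0x5555555555555555);
    const uint64_t m2 = UINT64_C(0x3333333333333333);
    const uint64_t m3 = UINT64_C(0x0707070707070707);
    const uint64_t m4 = UINT64_C(0x000f000f000f000f);
    const uint64_t m5 = UINT64_C(0x0000001f0000001f);
    const uint64_t m6 = UINT64_C(0x000000000000003f);
    x = (x >>  1 & m1) + (x & m1);
    x = (x >>  2 & m2) + (x & m2);
    x = (x >>  4 & m3) + (x & m3);
    x = (x >>  8 & m4) + (x & m4);
    x = (x >> 16 & m5) + (x & m5);
    x = (x >> 32 & m6) + (x & m6);
    return (int)x;
}

/* Same computation with shift and mask derived for each step i. */
int popcount64_generic(uint64_t x)
{
    for (unsigned i = 0; i < 6; ++i) {
        unsigned w0 = 1u << i;      /* bits per run for this step */
        unsigned w1 = w0 << 1;      /* bits per run for the next step */
        uint64_t r1 = w1 - 1u;      /* low bits kept in each w1-bit field */
        /* a 1 every w1 bits; for w1 == 64 that is just the value 1 */
        uint64_t ones = (w1 == 64) ? 1
                                   : UINT64_MAX / ((UINT64_C(1) << w1) - 1);
        uint64_t mask = ones * r1;
        x = (x >> w0 & mask) + (x & mask);
    }
    return (int)x;
}
```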
Depending on how fast your CPU can do multiplication, this might make better use of pipelining:
x -= x >> 1 & 0x55555555555555555555555555555555;
x -= (x >> 2 & 0x33333333333333333333333333333333) * 3;
x -= (x >> 4 & 0x07070707070707070707070707070707) * 0xf;
x -= (x >> 8 & 0x000f000f000f000f000f000f000f000f) * 0xff;
x -= (x >>16 & 0x0000001f0000001f0000001f0000001f) * 0xffff;
x -= (x >>32 & 0x000000000000003f000000000000003f) * 0xffffffff;
x -= (x >>64 & 0x0000000000000000000000000000007f) * 0xffffffffffffffff;
x -= (x >> shift[i] & mask[i]) * s_mult[i];
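Cut down to 64 bits, the subtract-and-multiply variant might look like the sketch below. The function name is an assumption, and the constants are simply the 128-bit ones above truncated to 64 bits. Each step replaces a field holding hi * 2^w + lo by hi + lo, which is why the multiplier is 2^w - 1.

```c
#include <stdint.h>

/* Subtract-and-multiply form of the run-add popcount for 64 bits. */
int popcount64_submul(uint64_t x)
{
    x -=  x >>  1 & UINT64_C(0x5555555555555555);
    x -= (x >>  2 & UINT64_C(0x3333333333333333)) * 3u;
    x -= (x >>  4 & UINT64_C(0x0707070707070707)) * 0xfu;
    x -= (x >>  8 & UINT64_C(0x000f000f000f000f)) * 0xffu;
    x -= (x >> 16 & UINT64_C(0x0000001f0000001f)) * 0xffffu;
    x -= (x >> 32 & UINT64_C(0x000000000000003f)) * 0xffffffffu;
    return (int)x;
}
```

No field ever borrows from its neighbour, because the amount subtracted from each field is never larger than the field's current value.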
I have a char[10] array that contains hex characters, and I'd like to end up with a byte[5] array of the values of those characters.
In general, how would I go from a char[2] hex value (30) to a single decimal byte (48)?
Language is actually Arduino, but basic C would be best.
1 byte = 8 bit = 2 x (hex digit)
What you can do is left shift the hex-digit stored in a byte by 4 places (alternatively multiply by 16) and then add the 2nd byte which has the 2nd hex digit.
So to convert 30 in hex to 48 in decimal:
take first hex digit, here 3; multiply it by 16 getting 3*16 = 48
add the second byte, here 0; getting 48+0 = 48 which is your final answer
Char array in "ch", byte array as "out"
byte conv[23] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, -1, -1, -1, -1, -1, -1, -1, 10, 11, 12, 13, 14, 15};
// Loop over the char array from 0 to 9, stepping by 2
int j = 0;
for (int i = 0; i < 10; i += 2) {
out[j] = conv[ch[i]-'0'] * 16 + conv[ch[i+1]-'0'];
j++;
}
Untested.
The trick is in the array 'conv'. It's a fragment of an ASCII table, starting with the character for 0. The -1's are for the junk between '9' and 'A'.
Oh, this is also not a safe or defensive routine. Bad input will result in crashes, glitches, etc.
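For comparison, a more defensive take on the same conversion could look like this sketch. It is plain C rather than Arduino-specific (uint8_t stands in for byte), it also accepts lowercase digits, and every function name here is made up for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hex_digit_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Two hex digits to one byte value (0..255), or -1 on bad input. */
int hex_pair_to_byte(char hi, char lo)
{
    int h = hex_digit_value(hi);
    int l = hex_digit_value(lo);
    if (h < 0 || l < 0)
        return -1;
    return h * 16 + l;
}

/* Converts hex_len digits into hex_len / 2 bytes; returns 0 on success,
   -1 if the length is odd or a non-hex character is found. */
int hex_to_bytes(const char *hex, size_t hex_len, uint8_t *out)
{
    if (hex_len % 2 != 0)
        return -1;
    for (size_t i = 0; i < hex_len; i += 2) {
        int value = hex_pair_to_byte(hex[i], hex[i + 1]);
        if (value < 0)
            return -1;
        out[i / 2] = (uint8_t)value;
    }
    return 0;
}
```

For the example in the question, hex_pair_to_byte('3', '0') returns 48.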
How do you count the number of zero-bit groups in a number? A bit group is any run of consecutive zero bits or one bits. For example, 2 is represented as ....0000000000000000010 and has two zero-bit groups: the least significant bit, and the group that starts after the one.
Also, I badly need references on bit-manipulation algorithms; if anyone has one, please share.
Here are some hints for you:
if (x & 1) {...} checks if the least-significant bit of x is set;
x >>= 1 shifts the value of x one bit to the right;
mind the negative numbers when you do bit manipulation.
Here are a couple fun ways to do it. The #define-s at the beginning are only for being able to express the function inputs in binary notation. The two functions that do the work are variations on a theme: the first one uses a de Bruijn sequence and a lookup table to figure out how many trailing zeros there are in the parameter, and does the rest accordingly. The second uses a Mod37 table to do the same, which is very similar in concept but involves a modulo operation instead of a multiplication and a bit shift. One of them is faster. Too lazy to figure out which one.
This is a lot more code than the obvious solution. But this can be very effective if you have primarily zeros in the input, as it only requires one loop iteration (just one branch, actually) for every 1-bit, instead of one loop iteration for every bit.
#include <stdio.h>
#define HEX__(n) 0x##n##LU
#define B8__(x) ((x&0x0000000FLU)? 1:0) \
+((x&0x000000F0LU)? 2:0) \
+((x&0x00000F00LU)? 4:0) \
+((x&0x0000F000LU)? 8:0) \
+((x&0x000F0000LU)? 16:0) \
+((x&0x00F00000LU)? 32:0) \
+((x&0x0F000000LU)? 64:0) \
+((x&0xF0000000LU)?128:0)
#define B8(d) ((unsigned char)B8__(HEX__(d)))
#define B16(dmsb,dlsb) (((unsigned short)B8(dmsb)<<8) + B8(dlsb))
#define B32(dmsb,db2,db3,dlsb) (((unsigned long)B8(dmsb)<<24) \
+ ((unsigned long)B8(db2)<<16) \
+ ((unsigned long)B8(db3)<<8) \
+ B8(dlsb))
unsigned int count_zero_groups_debruijn(unsigned int v)
{
// number of zero-bit groups (set to 1 if high-bit is zero)
unsigned int g = (~(v & 0x80000000)) >> 31;
// lookup table for deBruijn
static const int _DeBruijnTable[32] =
{
0, 1, 28, 2, 29, 14, 24, 3, 30, 22, 20, 15, 25, 17, 4, 8,
31, 27, 13, 23, 21, 19, 16, 7, 26, 12, 18, 6, 11, 5, 10, 9
};
do {
// get number of trailing zeros in v
int tz = _DeBruijnTable[((v & -v) * 0x077CB531U) >> 27];
// increment zero group count if there is at least one trailing zero
g += (tz > 0) * 1;
// shift out trailing zeros and the preceding 1
// (two shifts, so the total never reaches the full 32-bit width)
v = (v >> tz) >> 1;
} while (v);
return g;
}
unsigned int count_zero_groups_mod37(unsigned int v)
{
// number of zero-bit groups (set to 1 if high-bit is zero)
unsigned int g = (~(v & 0x80000000)) >> 31;
// lookup table for mod37
static const int _Mod37Table[] =
{
0, 0, 1, 26, 2, 23, 27, 0, 3, 16, 24, 30, 28, 11, 0, 13, 4,
7, 17, 0, 25, 22, 31, 15, 29, 10, 12, 6, 0, 21, 14, 9, 5, 20,
8, 19, 18
};
do {
// get number of trailing zeros in v
int tz = _Mod37Table[(v & -v) % 37];
// increment zero group count if there is at least one trailing zero
g += (tz > 0) * 1;
// shift out trailing zeros and the preceding 1
// (two shifts, so the total never reaches the full 32-bit width)
v = (v >> tz) >> 1;
} while (v);
return g;
}
int main(int argc, char* argv[])
{
printf("zero groups: %d (should be 6)\n", count_zero_groups_debruijn(B32(10011001,10000000,00001001,00010011)));
printf("zero groups: %d (should be 6)\n", count_zero_groups_debruijn(B32(10011001,10000000,00001001,00010000)));
printf("zero groups: %d (should be 6)\n", count_zero_groups_debruijn(B32(00011001,10000000,00001001,00010001)));
printf("zero groups: %d (should be 6)\n", count_zero_groups_debruijn(B32(00011001,10000000,00001001,00010000)));
printf("zero groups: %d (should be 0)\n", count_zero_groups_debruijn(B32(11111111,11111111,11111111,11111111)));
printf("zero groups: %d (should be 1)\n", count_zero_groups_debruijn(B32(00000000,00000000,00000000,00000000)));
printf("zero groups: %d (should be 6)\n", count_zero_groups_mod37 (B32(10011001,10000000,00001001,00010011)));
printf("zero groups: %d (should be 6)\n", count_zero_groups_mod37 (B32(10011001,10000000,00001001,00010000)));
printf("zero groups: %d (should be 6)\n", count_zero_groups_mod37 (B32(00011001,10000000,00001001,00010001)));
printf("zero groups: %d (should be 6)\n", count_zero_groups_mod37 (B32(00011001,10000000,00001001,00010000)));
printf("zero groups: %d (should be 0)\n", count_zero_groups_mod37 (B32(11111111,11111111,11111111,11111111)));
printf("zero groups: %d (should be 1)\n", count_zero_groups_mod37 (B32(00000000,00000000,00000000,00000000)));
return 0;
}
The simplest way is to count the number of transitions from one to zero in a loop, shifting a bit mask along and testing with the bitwise 'and' operation. It is also necessary to check the first bit and increase the obtained quantity by one if it is 0.
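A minimal sketch of that loop for a 32-bit value follows. The function name is an assumption, and the "first bit" rule is folded into the loop by pretending that a 1 sits just below bit 0.

```c
#include <stdint.h>

/* Counts runs of zero bits by walking a one-bit mask from bit 0 upward. */
unsigned count_zero_groups_loop(uint32_t v)
{
    unsigned groups = 0;
    int prev_is_one = 1;    /* so a zero at bit 0 starts a group */
    for (uint32_t mask = 1; mask != 0; mask <<= 1) {
        int is_one = (v & mask) != 0;
        if (!is_one && prev_is_one)
            ++groups;       /* transition from one to zero */
        prev_is_one = is_one;
    }
    return groups;
}
```

For the example in the question, count_zero_groups_loop(2) returns 2.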
Andrew's solution is without doubt the simplest to design and implement, but I can't help but wonder if there is a quicker solution using bigger bit masks.
At a job interview I was asked to write some code to identify the most significant set bit. After spending a few minutes coming up with an ultra slim ultra fast binary search using shrinking bitmasks, which I kept suddenly realising could be optimised further, and further, resulting in large quantities of scribbles on lots of pieces of paper, the examiner looked at me blankly and asked if I knew how to use a for loop.
Maybe he should have just asked me to use a for loop to solve the problem.
Anyway it's not impossible that a similar such solution exists here.
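One candidate for such a mask-based solution, offered as a sketch rather than a proven optimum: every zero group has exactly one lowest bit, and that bit is a 0 with a 1 directly below it (or it is bit 0 itself). Marking those positions with a single expression and counting them with a population count gives the answer without a per-bit loop. The helper names are invented, and a 32-bit input is assumed.

```c
#include <stdint.h>

/* Parallel population count for 32 bits. */
static unsigned popcount32(uint32_t n)
{
    n = (n & 0x55555555u) + ((n >> 1)  & 0x55555555u);
    n = (n & 0x33333333u) + ((n >> 2)  & 0x33333333u);
    n = (n & 0x0f0f0f0fu) + ((n >> 4)  & 0x0f0f0f0fu);
    n = (n & 0x00ff00ffu) + ((n >> 8)  & 0x00ff00ffu);
    n = (n & 0x0000ffffu) + ((n >> 16) & 0x0000ffffu);
    return (unsigned)n;
}

unsigned count_zero_groups_mask(uint32_t v)
{
    /* Bit i of starts is set when bit i of v is 0 and the bit below it
       is 1; the "| 1" makes bit 0 behave as if a 1 sat below it. */
    uint32_t starts = (uint32_t)~v & ((v << 1) | 1u);
    return popcount32(starts);
}
```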
You can repeatedly use integer division and the modulo operator to extract the bits, and keep track of the groups within your loop. It sounds like a homework question, so I'm not sure how much of a service it does you to give you a full solution? Consider that you can use this algorithm to get the base-2 representation of a positive integer (in fact it works with any base >= 2):
int example = 40;
while (example > 0) {
printf("%d\n", example % 2);
example /= 2;
}
This will print out the bits in reverse order (that is, starting from the least significant). From here there's not much work to be done to count the groups you want to count. Should I go further or can you take it from here?
Try this code.
I didn't test it, so do let me know if you find a bug.
num is the input.
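#include <limits.h>
#include <stdio.h>
int main(void)
{
int count = 0;
unsigned int num = 0xF0000000u, mask = 1; /* mask and num must have the same width */
int i;
int flag = 1; /* 1 at the start and while the previous bit was a one */
/* visit every bit, including the most significant one */
for (i = 0; i < (int)(sizeof(num) * CHAR_BIT); i++) {
if(flag && !(num & mask)) {
count++; /* a new group of zeros starts here */
flag = 0;
}
else if(num & mask)
flag = 1;
mask = mask<<1;
}
printf("\n%d\n",count);
return 0;
}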