Obtaining a special random number in C
I have an algorithm problem that I need to speed up :)
I need a 32-bit random number with exactly 10 bits set to 1. At the same time, bit patterns like 101 (5 dec) and 11 (3 dec) are to be considered illegal.
Now the MCU is an 8051 (8-bit) and I tested all this in Keil uVision. My first attempt completes, giving the solution
0x48891249
01001000100010010001001001001001 // correct, 10 bits set to 1, no 101 or 11
The problem is that it completes in 97 seconds, or 1165570706 CPU cycles, which is ridiculous!!!
Here is my code
// returns true if the number is not good, i.e. contains at least one 11 or 101 bit sequence
bool checkFive(unsigned long num)
{
    unsigned char tmp;
    do {
        tmp = (unsigned char)num;
        if ((tmp & 7) == 5
            || (tmp & 3) == 3)  // illegal pattern 101 or 11
            return true;        // found
        num >>= 1;
    } while (num);
    return false;
}
void main(void) {
    unsigned long v, num;   // num: candidate value, v: working copy for bit counting
    unsigned long c;        // c accumulates the total bits set in v
    do {
        num = (unsigned long)rand() << 16 | rand();
        v = num;
        // count all 1 bits, Kernighan style
        for (c = 0; v; c++)
            v &= v - 1;     // clear the least significant bit set
    } while (c != 10 || checkFive(num));
    while (1);              // done: hold here (embedded target)
}
The big question for a brilliant mind :)
Can it be done faster? It seems that my approach is naive.
Thank you in advance,
Wow, I'm impressed, thanks all for the suggestions. However, before accepting, I need to test them over the next few days.
Now, the first option (a look-up table) is just not realistic; it would completely blow the 4K RAM of the entire 8051 microcontroller :) As you can see in the image below, I tested all combinations in Code::Blocks, but there are way more than 300 and it still hadn't finished by index 5000...
The code I use to test
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <stdbool.h>
//#define bool bit
//#define true 1
//#define false 0
// returns true if the number is not good, i.e. contains at least one 11 or 101 bit sequence
bool checkFive(uint32_t num)
{
    uint8_t tmp;
    do {
        tmp = (unsigned char)num;
        if ((tmp & 7) == 5
            || (tmp & 3) == 3)  // illegal pattern 101 or 11
            return true;        // found
        num >>= 1;
    } while (num);
    return false;
}
int main(void) {
    uint32_t v, num;         // num: candidate value, v: working copy for bit counting
    uint32_t c, count = 0;   // c accumulates the total bits set in v
    printf("Program started\n");
    for (num = 0; ; num++)   // note: "num <= 0xFFFFFFFF" would always be true for a uint32_t
    {
        v = num;
        // count all 1 bits, Kernighan style
        for (c = 0; v; c++)
            v &= v - 1;      // clear the least significant bit set
        if (c == 10 && !checkFive(num)) {
            count++;
            printf("%u: %08X\n", count, num);
        }
        if (num == 0xFFFFFFFF)  // stop after the last 32-bit value
            break;
    }
    printf("Complete\n");
    return 0;
}
Perhaps I can re-formulate the problem:
I need a number with:
a precise (known) number of 1 bits, 10 in my example
no 11 or 101 patterns
the remaining zeroes can be anywhere
So somehow, shuffle only the 1 bits around inside the word.
Or, take 0x00000000 and add just 10 one-bits at random positions, avoiding the illegal patterns.
Solution
Given a routine r(n) that returns a random integer from 0 (inclusive) to n (exclusive) with uniform distribution, the values described in the question may be generated with a uniform distribution by calls to P(10, 4) where P is:
static uint32_t P(int a, int b)
{
    if (a == 0 && b == 0)
        return 0;
    else
        return r(a+b) < a ? P(a-1, b) << 3 | 1 : P(a, b-1) << 1;
}
The required random number generator can be:
static int r(int a)
{
    int q;
    do
        q = rand() / ((RAND_MAX+1u)/a);
    while (a <= q);
    return q;
}
(The purpose of dividing by (RAND_MAX+1u)/a and the do-while loop is to trim the range of rand to an even multiple of a so that bias due to a non-multiple range is eliminated.)
(The recursion in P may be converted to iteration. This is omitted as it is unnecessary to illustrate the algorithm.)
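For readers who want it anyway, here is one possible iterative form (a sketch of mine, not part of the original answer; P_iterative is just a name I picked). It makes the same a/(a+b) decision at each step, only emitting the chosen symbols from the most significant end first, which leaves the distribution over the resulting values uniform:
static uint32_t P_iterative(int a, int b)
{
    uint32_t x = 0;
    while (a > 0 || b > 0)
    {
        if (r(a + b) < a)   // probability a/(a+b): append "001"
        {
            x = x << 3 | 1;
            a--;
        }
        else                // probability b/(a+b): append "0"
        {
            x = x << 1;
            b--;
        }
    }
    return x;
}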
Discussion
If the number cannot contain consecutive bits 11 or 101, then the closest together two 1 bits can be is three bits apart, as in 1001. Fitting ten 1 bits in 32 bits then requires at least 28 bits, as in 1001001001001001001001001001. Therefore, to satisfy the constraints that there is no 11 or 101 and there are exactly 10 1 bits, the value must be 1001001001001001001001001001 with four 0 bits inserted in some positions (including possibly the beginning or the end).
Selecting such a value is equivalent to placing 10 instances of 001 and 4 instances of 0 in some order.1 There are 14! ways of ordering 14 items, but any of the 10! ways of rearranging the 10 001 instances with each other are identical, and any of the 4! ways of rearranging the 0 instances with each other are identical, so the number of distinct selections is 14! / 10! / 4!, also known as the number of combinations of selecting 10 things from 14. This is 1,001.
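As a quick arithmetic check: 14! / (10! 4!) = (14 × 13 × 12 × 11) / (4 × 3 × 2 × 1) = 24024 / 24 = 1,001.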
To perform such a selection with uniform distribution, we can use a recursive algorithm:
Select the first choice with probability distribution equal to the proportion of the choices in the possible orderings.
Select the remaining choices recursively.
When ordering a instances of one object and b of a second object, a/(a+b) of the potential orderings will start with the first object, and b/(a+b) will start with the second object. Thus, the design of the P routine is:
If there are no objects to put in order, return the empty bit string.
Select a random integer in [0, a+b). If it is less than a (which has probability a/(a+b)), insert the bit string 001 and then recurse to select an order for a-1 instances of 001 and b instances of 0.
Otherwise, insert the bit string 0 and then recurse to select an order for a instances of 001 and b-1 instances of 0.
(Since, once a is zero, only 0 instances are generated, if (a == 0 && b == 0) in P may be changed to if (a == 0). I left it in the former form as that shows the general form of a solution in case other strings are involved.)
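A minimal test harness (my addition, not part of the answer) that seeds rand(), draws a few values from P(10, 4) and re-checks the question's constraints; it assumes the r and P routines above are in scope. The pattern test uses the fact that an 11 or 101 sequence exists exactly when v & (v >> 1) or v & (v >> 2) is non-zero:
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(void)
{
    srand((unsigned)time(NULL));
    for (int i = 0; i < 5; i++)
    {
        uint32_t v = P(10, 4);
        int pop = 0;
        for (uint32_t t = v; t; t &= t - 1)          // Kernighan popcount
            pop++;
        int bad = (v & (v >> 1)) || (v & (v >> 2));  // any 11 or 101?
        printf("0x%08lX  popcount=%d  %s\n",
               (unsigned long)v, pop, bad ? "ILLEGAL" : "ok");
    }
    return 0;
}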
Bonus
Here is a program to list all values (although not in ascending order).
#include <stdint.h>
#include <stdio.h>

static void P(uint32_t x, int a, int b)
{
    if (a == 0 && b == 0)
        printf("0x%x\n", x);
    else
    {
        if (0 < a) P(x << 3 | 1, a-1, b);
        if (0 < b) P(x << 1, a, b-1);
    }
}

int main(void)
{
    P(0, 10, 4);
}
Footnote
1 This formulation means we end up with a string starting 001… rather than 1…, but the resulting value, interpreted as binary, is equivalent, even if there are instances of 0 inserted ahead of it. So the strings with 10 001 and 4 0 are in one-to-one correspondence with the strings with 4 0 inserted into 1001001001001001001001001001.
One way to satisfy your criteria within a limited number of iterations is to use the fact that there can be no more than four groups of 000 within the bit population. This also means there can be only one group of 0000 in the value. Knowing this, you can seed your value with a single 1 in bits 27-31 and then continue adding random bits, checking that each added bit satisfies your 3 (11) and 5 (101) constraints.
When adding random bits to your value while satisfying your constraints, there will always be partial combinations that can never be completed to satisfy all constraints. To protect against those cases, just keep an iteration count and reset/restart the value generation if the iterations exceed a limit. Here, if a solution is going to be found, it will be found in fewer than 100 iterations, and one is generally found in 1-8 attempts. That means for each value you generate you have, on average, no more than 800 iterations, which is a far cry less than "97 seconds or 1165570706 CPU cycles" (I haven't counted cycles, but the return is almost instantaneous).
There are many ways to approach this problem, this is just one that worked in a reasonable amount of time:
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <limits.h>

#define BPOP 10
#define NBITS 32
#define LIMIT 100

/** rand_int for use with shuffle */
static int rand_int (int n)
{
    int limit = RAND_MAX - RAND_MAX % n, rnd;

    rnd = rand();
    for (; rnd >= limit; )
        rnd = rand();

    return rnd % n;
}

int main (void) {

    int pop = 0;
    unsigned v = 0, n = NBITS;
    size_t its = 1;

    srand (time (NULL));

    /* one of first 5 bits must be set */
    v |= 1u << (NBITS - 1 - rand_int (sizeof v + 1));
    pop++;                              /* increment pop count */

    while (pop < BPOP) {                /* loop until pop count 10 */
        if (++its >= LIMIT) {           /* check iterations */
#ifdef DEBUG
            fprintf (stderr, "failed solution.\n");
#endif
            pop = its = 1;              /* reset for next iteration */
            v = 0;
            v |= 1u << (NBITS - 1 - rand_int (sizeof v + 1));
        }
        unsigned shift = rand_int (NBITS);  /* get random shift */
        if (v & (1u << shift))              /* if bit already set */
            continue;
        /* protect against 5 (101) */
        if ((shift + 2) < NBITS && v & (1u << (shift + 2)))
            continue;
        if ((int)(shift - 2) >= 0 && v & (1u << (shift - 2)))
            continue;
        /* protect against 3 (11) */
        if ((shift + 1) < NBITS && v & (1u << (shift + 1)))
            continue;
        if ((int)(shift - 1) >= 0 && v & (1u << (shift - 1)))
            continue;

        v |= 1u << shift;               /* add bit at shift */
        pop++;                          /* increment pop count */
    }

    printf ("\nv : 0x%08x\n", v);       /* output value */
    while (n--) {                       /* output binary confirmation */
        if (n+1 < NBITS && (n+1) % 4 == 0)
            putchar ('-');
        putchar ((v >> n & 1) ? '1' : '0');
    }
    putchar ('\n');

#ifdef DEBUG
    printf ("\nits: %zu\n", its);
#endif

    return 0;
}
(note: you will probably want a better random source like getrandom() or reading from /dev/urandom if you intend to generate multiple random solutions within a loop -- especially if you are calling the executable in a loop from your shell)
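For example, one way to seed from the system entropy pool instead of time(NULL) (my sketch, not part of the answer; it assumes a Unix-like system that exposes /dev/urandom):
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

static unsigned seed_from_urandom (void)
{
    unsigned seed = 0;
    FILE *f = fopen ("/dev/urandom", "rb");

    if (f) {
        if (fread (&seed, sizeof seed, 1, f) != 1)
            seed = 0;                   /* short read: fall back below */
        fclose (f);
    }
    if (seed == 0)
        seed = (unsigned)time (NULL);   /* fallback if /dev/urandom is unavailable */

    return seed;
}
You would then call srand (seed_from_urandom()); in place of srand (time (NULL));.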
I have also included a DEBUG define that you can enable by adding the -DDEBUG option to your compiler invocation to see the number of failed solutions and the number of iterations for the final value.
Example Use/Output
The results for 8 successive runs:
$ ./bin/randbits
v : 0x49124889
0100-1001-0001-0010-0100-1000-1000-1001
v : 0x49124492
0100-1001-0001-0010-0100-0100-1001-0010
v : 0x48492449
0100-1000-0100-1001-0010-0100-0100-1001
v : 0x91249092
1001-0001-0010-0100-1001-0000-1001-0010
v : 0x92488921
1001-0010-0100-1000-1000-1001-0010-0001
v : 0x89092489
1000-1001-0000-1001-0010-0100-1000-1001
v : 0x82491249
1000-0010-0100-1001-0001-0010-0100-1001
v : 0x92448922
1001-0010-0100-0100-1000-1001-0010-0010
As Eric mentioned in his answer, since each 1 bit must be separated by at least two 0 bits, you basically start with the 28-bit pattern 1001001001001001001001001001. It's then a matter of placing the remaining four 0 bits within this bit pattern, and there are 11 distinct places to insert each zero.
This can be accomplished by first selecting a random number from 0 to 10 (one of the 11 positions) to determine where to place a zero. Then you left-shift all the bits above the target position by 1. Repeat 3 more times, and you have your value.
This can be done as follows:
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <time.h>

void binprint(uint32_t n)
{
    int i;
    for (i = 0; i < 32; i++) {
        if (n & (1u << (31 - i))) {
            putchar('1');
        } else {
            putchar('0');
        }
    }
}

// inserts a 0 bit into val after pos "1" bits are found
uint32_t insert(uint32_t val, int pos)
{
    int cnt = 0;
    uint32_t mask = 1u << 31;
    uint32_t upper, lower;

    while (cnt < pos) {
        if (val & mask) {   // look for a set bit and count if you find one
            cnt++;
        }
        mask >>= 1;
    }
    if (mask == (1u << 31)) {
        return val;         // insert at the start: no change
    } else if (mask == 0) {
        return val << 1;    // insert at the end: shift the whole thing by 1
    } else {
        mask = (mask << 1) - 1;       // mask has all bits below the target set
        lower = val & mask;           // extract the lower portion
        upper = val & (~mask);        // extract the upper portion
        return (upper << 1) | lower;  // recombine with the upper portion shifted 1 bit
    }
}

int main()
{
    int i;
    uint32_t val = 01111111111;  // hey look, a good use of octal!

    srand(time(NULL));
    for (i = 0; i < 4; i++) {
        int p = rand() % 11;
        printf("p=%d\n", p);
        val = insert(val, p);
    }
    binprint(val);
    printf("\n");
    return 0;
}
Sample output for two runs:
p=3
p=10
p=9
p=0
01001001000100100100100100100010
...
p=3
p=9
p=3
p=1
10001001000010010010010010010001
Run time is negligible.
Since you don't want a lookup table, here is a way:
Basically you have this 28-bit number of 0s and 1s into which you need to insert four extra 0 bits:
0b1001001001001001001001001001
Hence you can use the following algorithm:
int special_rng_nolookup(void)
{
    int secret = 0b1001001001001001001001001001;  // 0x9249249
    int low_secret;
    int high_secret;
    unsigned int i = 28;              // current length of secret
    unsigned int rng;
    unsigned int mask = 0xffffffff;   // all bits set in the integer
    while (i < 32)
    {
        rng = hw_random();            // pseudo-code for a hardware RNG read, e.g. the rdrand instruction
        rng %= (i + 1);               // a number between 0 and 28, then 0..29, 0..30, 0..31 on the next loops
        low_secret = secret & (mask >> (i - rng));         // locate where you will add your 0 and save the lower part of your number
        high_secret = (secret ^ low_secret) << (!(!rng));  // remove the lower part and shift to insert a 0 between the higher and lower parts; if rng was 0 the 0 goes at the very beginning (left part), so no shift
        secret = high_secret | low_secret;                 // put them together
        ++i;
    }
    return secret;
}
Related
random 4 digit number with non repeating digits in C
I'm trying to make a get_random_4digit function that generates a 4 digit number that has non-repeating digits ranging from 1-9 while only using ints, if, while and functions, so no arrays etc. This is the code I have but it is not really working as intended, could anyone point me in the right direction? int get_random_4digit() { int d1 = rand() % 9 + 1; int d2 = rand() % 9 + 1; while (true) { if (d1 != d2) { int d3 = rand() % 9 + 1; if (d3 != d1 || d3 != d2) { int d4 = rand() % 9 + 1; if (d4 != d1 || d4 != d2 || d4 != d3) { random_4digit = (d1 * 1000) + (d2 * 100) + (d3 * 10) + d4; break; } } } } printf("Random 4digit = %d\n", random_4digit); }
A KISS-approach could be this: int getRandom4Digits() { uint16_t acc = 0; uint16_t used = 0; for (int i = 0; i < 4; i++) { int idx; do { idx = rand() % 9; // Not equidistributed but never mind... } while (used & (1 << idx)); acc = acc * 10 + (idx + 1); used |= (1 << idx); } return acc; } This looks terribly dumb at first. A quick analysis gives that this really isn't so bad, giving a number of calls to rand() to be about 4.9. The expected number of inner loop steps [and corresponding calls to rand(), if we assume rand() % 9 to be i.i.d.] will be: 9/9 + 9/8 + 9/7 + 9/6 ~ 4.9107.
There are 9 possibilities for the first digit, 8 possibilities for the second digit, 7 possibilities for the third digit and 6 possibilities for the last digit. This works out to "9*8*7*6 = 3024 permutations". Start by getting a random number from 0 to 3023. Let's call that P. To do this without causing a biased distribution use something like do { P = rand() & 0xFFF; } while(P >= 3024);. Note: If you don't care about uniform distribution you could just do P = rand() % 3024;. In this case lower values of P will be more likely because RAND_MAX doesn't divide by 3024 nicely. The first digit has 9 possibilities, so do d1 = P % 9 + 1; P = P / 9;. The second digit has 8 possibilities, so do d2 = P % 8 + 1; P = P / 8;. The third digit has 7 possibilities, so do d3 = P % 7 + 1; P = P / 7;. For the last digit you can just do d4 = P + 1; because we know P can't be too high. Next; convert "possibility" into a digit. For d1 you do nothing. For d2 you need to increase it if it's greater than or equal to d1, like if(d2 >= d1) d2++;. Do the same for d3 and d4 (comparing against all previous digits). The final code will be something like: int get_random_4digit() { int P, d1, d2, d3, d4; do { P = rand() & 0xFFF; } while(P >= 3024); d1 = P % 9 + 1; P = P / 9; d2 = P % 8 + 1; P = P / 8; d3 = P % 7 + 1; P = P / 7; d4 = P + 1; if(d2 >= d1) d2++; if(d3 >= d1) d3++; if(d3 >= d2) d3++; if(d4 >= d1) d4++; if(d4 >= d2) d4++; if(d4 >= d3) d4++; return d1*1000 + d2*100 + d3*10 + d4; }
You could start with an integer number, 0x123456789, and pick random nibbles from it (the 4 bits that makes up one of the digits in the hex value). When a nibble has been selected, remove it from the number and continue picking from those left. This makes exactly 4 calls to rand() and has no if or other conditions (other than the loop condition). #include <stdint.h> #include <stdio.h> #include <stdlib.h> #include <time.h> int get_random_4digit() { uint64_t bits = 0x123456789; // nibbles int res = 0; // pick random nibbles for(unsigned last = 9 - 1; last > 9 - 1 - 4; --last) { unsigned lsh = last * 4; // shift last nibble unsigned sel = (rand() % (last + 1)) * 4; // shift for random nibble // multiply with 10 and add the selected nibble res = res * 10 + ((bits & (0xFULL << sel)) >> sel); // move the last unselected nibble right to where the selected // nibble was: bits = (bits & ~(0xFULL << sel)) | ((bits & (0xFULL << lsh)) >> (lsh - sel)); } return res; } Demo Another variant could be to use the same value, 0x123456789, and do a Fisher-Yates shuffle on the nibbles. When the shuffle is done, return the 4 lowest nibbles. This is more expensive since it randomizes the order of all 9 nibbles - but it makes it easy if you want to select an arbitrary amount of them afterwards. Example: #include <stdlib.h> #include <stdint.h> #include <stdio.h> #include <time.h> uint16_t get_random_4digit() { uint64_t bits = 0x123456789; // nibbles // shuffle the nibbles for(unsigned idx = 9 - 1; idx > 0; --idx) { unsigned ish = idx * 4; // index shift // shift for random nibble to swap with `idx` unsigned swp = (rand() % (idx + 1)) * 4; // extract the selected nibbles uint64_t a = (bits & (0xFULL << ish)) >> ish; uint64_t b = (bits & (0xFULL << swp)) >> swp; // swap them bits &= ~((0xFULL << ish) | (0xFULL << swp)); bits |= (a << swp) | (b << ish); } return bits & 0xFFFF; // return the 4 lowest nibbles } The bit manipulation can probably be optimized - but I wrote it like I thought it so it's probably better for readability to leave it as-is You can then print the value as a hex value to get the output you want - or extract the 4 nibbles and convert it for decimal output. int main() { srand(time(NULL)); uint16_t res = get_random_4digit(); // print directly as hex: printf("%X\n", res); // or extract the nibbles and multiply to get decimal result - same output: uint16_t a = (res >> 12) & 0xF; uint16_t b = (res >> 8) & 0xF; uint16_t c = (res >> 4) & 0xF; uint16_t d = (res >> 0) & 0xF; uint16_t dec = a * 1000 + b * 100 + c * 10 + d; printf("%d\n", dec); } Demo
You should keep generating digits until distinct one found: int get_random_4digit() { int random_4digit = 0; /* We must have 4 digits number - at least 1234 */ while (random_4digit < 1000) { int digit = rand() % 9 + 1; /* check if generated digit is not in the result */ for (int number = random_4digit; number > 0; number /= 10) if (number % 10 == digit) { digit = 0; /* digit has been found, we'll try once more */ break; } if (digit > 0) /* unique digit generated, we add it to result */ random_4digit = random_4digit * 10 + digit; } return random_4digit; } Please, fiddle youself
One way to do this is to create an array with all 9 digits, pick a random one and remove it from the list. Something like this: uint_fast8_t digits[]={1,2,3,4,5,6,7,8,9}; //only 1-9 are allowed, 0 is not allowed uint_fast8_t left=4; //How many digits are left to create unsigned result=0; //Store the 4-digit number here while(left--) { uint_fast8_t digit=getRand(9-4+left); //pick a random index result=result*10+digits[digit]; //Move all digits above the selcted one 1 index down. //This removes the picked digit from the array. while(digit<8) { digits[digit]=digits[digit+1]; digit++; } } You said you need a solution without arrays. Luckily, we can store up to 16 4 bit numbers in a single uint64_t. Here is an example that uses a uint64_t to store the digit list so that no array is needed. #include <stdint.h> #include <inttypes.h> #include <stdarg.h> #include <stdio.h> #include <stdlib.h> unsigned getRand(unsigned max) { return rand()%(max+1); } //Creates a uint64_t that is used as an array. //Use no more than 16 values and don't use values >0x0F //The last argument will have index 0 uint64_t fakeArrayCreate(uint_fast8_t count, ...) { uint64_t result=0; va_list args; va_start (args, count); while(count--) { result=(result<<4) | va_arg(args,int); } return result; } uint_fast8_t fakeArrayGet(uint64_t array, uint_fast8_t index) { return array>>(4*index)&0x0F; } uint64_t fakeArraySet(uint64_t array, uint_fast8_t index, uint_fast8_t value) { array = array & ~((uint64_t)0x0F<<(4*index)); array = array | ((uint64_t)value<<(4*index)); return array; } unsigned getRandomDigits(void) { uint64_t digits = fakeArrayCreate(9,9,8,7,6,5,4,3,2,1); uint_fast8_t left=4; unsigned result=0; while(left--) { uint_fast8_t digit=getRand(9-4+left); result=result*10+fakeArrayGet(digits,digit); //Move all digits above the selcted one 1 index down. //This removes the picked digit from the array. while(digit<8) { digits=fakeArraySet(digits,digit,fakeArrayGet(digits,digit+1)); digit++; } } return result; } //Test our function int main(int argc, char **argv) { srand(atoi(argv[1])); printf("%u\n",getRandomDigits()); }
You could use a partial Fisher-Yates shuffle on an array of 9 digits, stopping after 4 digits: // Return random integer from 0 to n-1 // (for n in range 1 to RAND_MAX+1u). int get_random_int(unsigned int n) { unsigned int x = (RAND_MAX + 1u) / n; unsigned int limit = x * n; int s; do { s = rand(); } while (s >= limit); return s / x; } // Return random 4-digit number from 1234 to 9876 with no // duplicate digits and no 0 digit. int get_random_4digit(void) { char possible[9] = {1, 2, 3, 4, 5, 6, 7, 8, 9}; int result = 0; int i; // Uses partial Fisher-Yates shuffle. for (i = 0; i < 4; i++) { // Get random position rand_pos from remaining possibilities i to 8 // (positions before i contain previous selected digits). int rand_pos = i + get_random_int(9 - i); // Select digit from position rand_pos. char digit = possible[rand_pos]; // Exchange digits at positions i and rand_pos. possible[rand_pos] = possible[i]; possible[i] = digit; // not really needed // Put selected digit into result. result = result * 10 + digit; } return result; } EDIT: I forgot the requirement "while only using int's, if, while and functions, so no arrays etc.", so feel free to ignore this answer! If normal C integer types are allowed including long long int, the get_random_4digit() function above can be replaced with the following to satisfy the requirement: // Return random 4-digit number from 1234 to 9876 with no // duplicate digits and no 0 digit. int get_random_4digit(void) { long long int possible = 0x123456789; // 4 bits per digit int result = 0; int i; // Uses partial Fisher-Yates shuffle. i = 0; while (i < 4) { // Determine random position rand_pos in remaining possibilities 0 to 8-i. int rand_pos = get_random_int(9 - i); // Select digit from position rand_pos. int digit = (possible >> (4 * rand_pos)) & 0xF; // Replace digit at position rand_pos with digit at position 0. possible ^= ((possible ^ digit) & 0xF) << (4 * rand_pos); // Shift remaining possible digits down one position. possible >>= 4; // Put selected digit into result. result = result * 10 + digit; i++; } return result; }
There are multiple answers to this question already, but none of them seem to fit the requirement only using ints, if, while and functions. Here is a modified version of Pelle Evensen's simple solution: #include <stdlib.h> int get_random_4digit(void) { int acc = 0, used = 0, i = 0; while (i < 4) { int idx = rand() % 9; // Not strictly uniform but never mind... if (!(used & (1 << idx))) { acc = acc * 10 + idx + 1; used |= 1 << idx; i++; } } return acc; }
Binary two consecutive 1 bits
I am trying to implement a function in C that outputs the number of non-overlapping pairs of consecutive 1-bits in an integer. This is my code:
#include <stdio.h>

int numPairs(int num)
{
    int count = 0;
    while (num) {
        num = (num & (num << 1));
        count++;
    }
    return count / 2;
}

int main()
{
    printf("%d\t", numPairs(10000));
    printf("%d\t", numPairs(146));
    printf("%d\t", numPairs(7645));
    printf("%d\t", numPairs(16383));
    return 0;
}
My output is
1 0 1 7
but the output should be
1 0 3 7
Everything is correct except for 7645, and I don't know what is wrong. For 7645 my code gives 1, but the correct result is 3.
Your method is inappropriate: you count the number of iterations required to null the expression n = n & (n << 1);. That gives the maximum number of consecutive 1 bits. If the bit pairs are separate, the result will differ from the number of non-overlapping bit pairs. In the case of 7645, 0x1DDD or 0001 1101 1101 1101 in binary, there are 3 groups of 3 consecutive 1 bits, but they get nulled in parallel in 3 iterations of the loop, hence count / 2 is 1. You must use a different algorithm, such as:
int numPairs(int num)
{
    int count = 0;
    unsigned int x = num;
    while (x) {
        if ((x & 3) == 3) {
            count++;
            x >>= 2;
        } else {
            x >>= 1;
        }
    }
    return count;
}
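A small harness (my addition, not from the answer) re-running the question's four test values against this corrected routine; the expected output, per the question, is 1 0 3 7:
#include <stdio.h>

int numPairs(int num);  // the corrected version above

int main(void)
{
    int tests[] = { 10000, 146, 7645, 16383 };
    int i;
    for (i = 0; i < 4; i++)
        printf("%d\t", numPairs(tests[i]));
    printf("\n");
    return 0;
}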
In case speed is important, this can also be done with bit manipulation operations:
int numPairs(uint32_t x)
{
    return __builtin_popcount((((x ^ 0x55555555) + 0x55555555) ^ 0x55555555) & x);
}
This produces a 1 bit in the upper bit of each disjoint 2-bit group of ones, and then counts the 1 bits.
C decompress Bitmask source
This may be somewhat of an odd question as well as my first one ever on this site and a pretty complicated thing to ask basically I have this decompresser for a very specific archived file, I barely understand this but from what i can grasp its some sort of "bit mask" it reads the first 2 bytes out of target file, and stores them as a sequence. The first for loop is where I get confused Say for arguments sake mask is 2 bytes 10 04, or 1040(decimal) thats what it usually is in these files for (t = 0; t<16; t++) { if (mask & (1 << (15 - t))) { This seems to be looping through all 16 bits of those 2 bytes and running an AND operation on mask (1040) on every bit? The if statement is what I don't understand completely? Whats triggering the if? If the bit is greater then 0? Because if mask is 1040, then really what were looking at is if(1040 & 32768) index 15 if(1040 & 16384) index 14 if(1040 & 8192) index 13 if(1040 & 4096) index 12 if(1040 & 2048) index 11 if(1040 & 1024) index 10 if(1040 & 512) and so on..... if(1040 & 256) I just really need to know whats triggering this if statement? i think i might be over thinking it, but is it simply trigger if the current bit is greater then 0? The only other thing i can do is compile this source myself, insert printfs on key variables and go hand in hand with a hex editor and try and figure out whats actually going on here, if anyone could give me a hand would be awesome. #include <stdlib.h> #include <stdio.h> #include <stdint.h> uint8_t dest[1024 * 1024 * 4]; // holds the actual data int main(int argc, char *argv[]) { FILE *fi, *fo; char fname[255]; uint16_t mask, tmp, offset, length; uint16_t seq; uint32_t dptr, sptr; uint16_t l, ct; uint16_t t, s; int test_len; int t_length, t_off; // Print Usage if filename is missing if (argc<3) { printf("sld_unpack - Decompressor for .sld files ()\nsld_unpack <filename.sld> <filename.t2>\n"); return(-1); } // Open .SLD-file if (!(fi = fopen(argv[1], "rb"))) { printf("Error opening %s\n", argv[1]); return(-1); } dptr = 0; fread((uint16_t*)&seq, 1, 2, fi); // read 1st 2 bytes in file test_len = ftell(fi); printf("[Main Header sequence: %d]\n 'offset' : %d \n", seq, test_len); sptr = 0; fread((uint16_t*)&seq, 1, 2, fi); while (!feof(fi)) { // while not at the end of the file set mask equal to sequence (first 2 bytes of header) mask = seq; // loop through 16 bit mask for (t = 0; t<16; t++) { if (mask & (1 << (15 - t))) { // check all bit fields and run AND check to if value greater then 0? test_len = ftell(fi); fread((uint16_t*)&seq, 1, 2, fi); // read sptr = sptr + 2; // set from 0 to 2 tmp = seq; // set tmp to sequence offset = ((uint32_t)tmp & 0x07ff) * 2; length = ((tmp >> 11) & 0x1f) * 2; // 32 - 1? if (length>0) { for (l = 0; l<length; l++) { dest[dptr] = dest[dptr - offset]; dptr++; } } else { // if length == 0 t_length = ftell(fi); fread((uint16_t*)&seq, 1, 2, fi); sptr = sptr + 2; length = seq * 2; for (s = 0; s<length; s++) { dest[dptr] = dest[dptr - offset]; dptr++; } } } else { // if sequence AND returns 0 (or less)? fread((uint16_t*)&seq, 1, 2, fi); t_length = ftell(fi); sptr = sptr + 2; dest[dptr++] = seq & 0xff; dest[dptr++] = (seq >> 8) & 0xff; } } fread((uint16_t*)&seq, 1, 2, fi); } fclose(fi); sprintf(fname, "%s\0", argv[2]); if (!(fo = fopen(fname, "wb"))) { // if file printf("Error creating %s\n", fname); return(-1); } fwrite((uint8_t*)&dest, 1, dptr, fo); fclose(fo); printf("Done.\n"); return(0); }
Be careful here. for arguments sake mask is 2 bytes 10 04, or 1040(decimal) That assumption may be nowhere close to true. You need to show how mask is defined, but generally a mask of bytes 10 (00001010) and 40 (00101000) is binary 101000101000 or decimal (2600) not quite 1040. The general mask of 2600 decimal will match when bits 4,6,10 & 12 are set. Remember a bit mask is nothing more than a number whose binary representation when anded or ored with a second number produces some desired result. Nothing magic about a bit mask, its just a number with the right bits set for your intended purpose. When you and two numbers together and test, your are testing whether there are common bits set in both numbers. Using the for loop and shift, you are doing a bitwise test for which common bits are set. Using the mask of 2600 with the loop counter will test true when bits 4,6,10 & 12 are set. In other words when the test clause equals 8, 32, 512 or 2048. The following is a short example of what is happening in the loop and if statements. #include <stdio.h> /* BUILD_64 */ #if defined(__LP64__) || defined(_LP64) # define BUILD_64 1 #endif /* BITS_PER_LONG */ #ifdef BUILD_64 # define BITS_PER_LONG 64 #else # define BITS_PER_LONG 32 #endif /* CHAR_BIT */ #ifndef CHAR_BIT # define CHAR_BIT 8 #endif char *binpad (unsigned long n, size_t sz); int main (void) { unsigned short t, mask; mask = (10 << 8) | 40; printf ("\n mask : %s (%hu)\n\n", binpad (mask, sizeof mask * CHAR_BIT), mask); for (t = 0; t<16; t++) if (mask & (1 << (15 - t))) printf (" t %2hu : %s (%hu)\n", t, binpad (mask & (1 << (15 - t)), sizeof mask * CHAR_BIT), mask & (1 << (15 - t))); return 0; } /** returns pointer to binary representation of 'n' zero padded to 'sz'. * returns pointer to string contianing binary representation of * unsigned 64-bit (or less ) value zero padded to 'sz' digits. */ char *binpad (unsigned long n, size_t sz) { static char s[BITS_PER_LONG + 1] = {0}; char *p = s + BITS_PER_LONG; register size_t i; for (i = 0; i < sz; i++) *--p = (n>>i & 1) ? '1' : '0'; return p; } Output $ ./bin/bitmask1040 mask : 0000101000101000 (2600) t 4 : 0000100000000000 (2048) t 6 : 0000001000000000 (512) t 10 : 0000000000100000 (32) t 12 : 0000000000001000 (8)
The if statement is what I don't understand completely? Whats triggering the if? If the bit is greater then 0? ... I just really need to know whats triggering this if statement? i think i might be over thinking it, but is it simply trigger if the current bit is greater then 0? The C (and C++) if statement "triggers" when the conditional statement evaluates to true, which is any non-zero value; zero equates to false. Straight C doesn't have a Boolean type, it just use the convention of zero (0) is false, and any other value is true. if (mask & (1 << (15 - t))) {...} is the same as if ((mask & (1 << (15 - t))) != 0) {...} The expression you gave is only true (non-zero) when there is a bit in the mask in the same position that the 1 was shifted by. i.e. is the 15th bit in the mask set, etc. N.b. mask & (1 << (15 - t)) can only ever be 0 or 1 er... will only have one bit set.
Find a unique bit in a collection of numbers
Best way to explain this is a demonstration. There is a collection of numbers. They may be repeated, so: 1110, 0100, 0100, 0010, 0110 ... The number I am looking for is the one that has a bit set, that does not appear in any of the others. The result is the number (in this case 1 - the first number) and the bit position (or the mask is fine) so 1000 (4th bit). There may be more than one solution, but for this purpose it may be greedy. I can do it by iteration... For each number N, it is: N & ~(other numbers OR'd together) But the nature of bits is that there is always a better method if you think outside the box. For instance numbers that appear more than once will never have a unique bit, and have no effect on ORing.
You just need to record whether each bit has been seen once or more and if it's been seen twice or more. Unique bits are those that have been seen once or more and not twice or more. This can be done efficiently using bitwise operations. count1 = 0 count2 = 0 for n in numbers: count2 |= count1 & n count1 |= n for n in numbers: if n & count1 & ~count2: return n If you don't want to iterate over the numbers twice you can keep track of the some number that you've seen that contains each bit. This might be a good optimisation if the numbers are stored on disk and so streaming them requires disk-access, but of course it makes the code a bit more complex. examples = [-1] * wordsize count1 = 0 count2 = 0 for n in numbers: if n & ~count1: for i in xrange(wordsize): if n & (1 << i): examples[i] = n count2 |= count1 & n count1 |= n for i in xrange(wordsize): if (count1 & ~count2) & (1 << i): return examples[i] You might use tricks to extract the bit indexes more efficiently in the loop that sets examples, but since this code is executed at most 'wordsize' times, it's probably not worth it. This code translates easily to C... I just wrote in Python for clarity.
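Since the answer says the Python translates easily to C, here is a rough C sketch of the first (two-pass) variant (my code, not the answerer's; it assumes 32-bit or smaller unsigned values and names I picked myself):
#include <stddef.h>

/* Returns the index of a number owning a unique bit, or -1 if there is none.
 * If unique_mask_out is non-NULL, it receives the unique bit(s) of that number. */
static int find_unique_bit_owner(const unsigned *numbers, size_t count,
                                 unsigned *unique_mask_out)
{
    unsigned count1 = 0, count2 = 0;
    size_t i;
    for (i = 0; i < count; i++) {
        count2 |= count1 & numbers[i];  /* bits seen two or more times */
        count1 |= numbers[i];           /* bits seen one or more times */
    }
    for (i = 0; i < count; i++) {
        unsigned unique = numbers[i] & count1 & ~count2;  /* bits seen exactly once */
        if (unique) {
            if (unique_mask_out)
                *unique_mask_out = unique;
            return (int)i;
        }
    }
    return -1;
}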
(long version of what I wrote in a comment) By counting the number of times that the bit at index k is one for every k (there is a trick to do this faster than naively, but it's still O(n)), you get a list of bitlength counters in which a count of 1 means that bit was only one once. The index of that counter (found in O(1) because you have a fixed number of bits) is therefore the bit-position you want. To find the number with that bit set, just iterate of all the numbers again and check whether it has that bit set (O(n) again), if it does it's the number you want. In total: O(n) versus O(n2) of checking every number against all others.
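A short C sketch of this counting idea (my code, assuming at most 32-bit unsigned inputs):
#include <stddef.h>

/* Count, for each bit index k, how many inputs have bit k set; a count of
 * exactly 1 marks a unique bit, and a second scan finds the number owning it. */
static int find_by_bit_counts(const unsigned *numbers, size_t count)
{
    int counts[32] = {0};
    size_t i;
    int k;
    for (i = 0; i < count; i++)
        for (k = 0; k < 32; k++)
            if (numbers[i] & (1u << k))
                counts[k]++;
    for (k = 0; k < 32; k++)
        if (counts[k] == 1)
            for (i = 0; i < count; i++)
                if (numbers[i] & (1u << k))
                    return (int)i;  /* index of the number owning the unique bit */
    return -1;                      /* no unique bit found */
}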
This method uses less than 2 passes (but alters the input array) #include <stdio.h> unsigned array[] = { 0,1,2,3,4,5,6,7,8,16,17 }; #define COUNTOF(a) (sizeof(a)/sizeof(a)[0]) void swap(unsigned *a, unsigned *b) { unsigned tmp; tmp = *a; *a = *b; *b = tmp; } int main(void) { unsigned idx,bot,totmask,dupmask; /* First pass: shift all elements that introduce new bits into the found[] array. ** totmask is a mask of bits that occur once or more ** dupmask is a mask of bits that occur twice or more */ totmask=dupmask=0; for (idx=bot=0; idx < COUNTOF(array); idx++) { dupmask |= array[idx] & totmask; if (array[idx] & ~totmask) goto add; continue; add: totmask |= array[idx]; if (bot != idx) swap(array+bot,array+idx); bot++; } fprintf(stderr, "Bot=%u, totmask=%u, dupmask=%u\n", bot, totmask, dupmask ); /* Second pass: reduce list of candidates by checking if ** they consist of *only* duplicate bits */ for (idx=bot; idx-- > 0 ; ) { if ((array[idx] & dupmask) == array[idx]) goto del; continue; del: if (--bot != idx) swap(array+bot,array+idx); } fprintf(stdout, "Results[%u]:\n", bot ); for (idx=0; idx < bot; idx++) { fprintf(stdout, "[%u]: %x\n" ,idx, array[idx] ); } return 0; } UPDATE 2011-11-28 Another version, that does not alter the original array. The (temporary) results are kept in a separate array. #include <stdio.h> #include <limits.h> #include <assert.h> unsigned array[] = { 0,1,2,3,4,5,6,7,8,16,17,32,33,64,96,128,130 }; #define COUNTOF(a) (sizeof(a)/sizeof(a)[0]) void swap(unsigned *a, unsigned *b) { unsigned tmp; tmp = *a, *a = *b, *b = tmp; } int main(void) { unsigned idx,nfound,totmask,dupmask; unsigned found[sizeof array[0] *CHAR_BIT ]; /* First pass: save all elements that introduce new bits to the left ** totmask is a mask of bits that occur once or more ** dupmask is a mask of bits that occur twice or more */ totmask=dupmask=0; for (idx=nfound=0; idx < COUNTOF(array); idx++) { dupmask |= array[idx] & totmask; if (array[idx] & ~totmask) goto add; continue; add: totmask |= array[idx]; found[nfound++] = array[idx]; assert(nfound <= COUNTOF(found) ); } fprintf(stderr, "Bot=%u, totmask=%u, dupmask=%u\n", nfound, totmask, dupmask ); /* Second pass: reduce list of candidates by checking if ** they consist of *only* duplicate bits */ for (idx=nfound; idx-- > 0 ; ) { if ((found[idx] & dupmask) == found[idx]) goto del; continue; del: if (--nfound != idx) swap(found+nfound,found+idx); } fprintf(stdout, "Results[%u]:\n", nfound ); for (idx=0; idx < nfound; idx++) { fprintf(stdout, "[%u]: %x\n" ,idx, found[idx] ); } return 0; }
As pointed out this is not working: You can XOR together the numbers, the result will give you the mask. And then you have to find the first number which doesn't give 0 for the N & mask expression.
What is the fastest/most efficient way to find the highest set bit (msb) in an integer in C?
If I have some integer n, and I want to know the position of the most significant bit (that is, if the least significant bit is on the right, I want to know the position of the farthest left bit that is a 1), what is the quickest/most efficient method of finding out? I know that POSIX supports a ffs() method in <strings.h> to find the first set bit, but there doesn't seem to be a corresponding fls() method. Is there some really obvious way of doing this that I'm missing? What about in cases where you can't use POSIX functions for portability? EDIT: What about a solution that works on both 32- and 64-bit architectures (many of the code listings seem like they'd only work on 32-bit integers).
GCC has: -- Built-in Function: int __builtin_clz (unsigned int x) Returns the number of leading 0-bits in X, starting at the most significant bit position. If X is 0, the result is undefined. -- Built-in Function: int __builtin_clzl (unsigned long) Similar to `__builtin_clz', except the argument type is `unsigned long'. -- Built-in Function: int __builtin_clzll (unsigned long long) Similar to `__builtin_clz', except the argument type is `unsigned long long'. I'd expect them to be translated into something reasonably efficient for your current platform, whether it be one of those fancy bit-twiddling algorithms, or a single instruction. A useful trick if your input can be zero is __builtin_clz(x | 1): unconditionally setting the low bit without modifying any others makes the output 31 for x=0, without changing the output for any other input. To avoid needing to do that, your other option is platform-specific intrinsics like ARM GCC's __clz (no header needed), or x86's _lzcnt_u32 on CPUs that support the lzcnt instruction. (Beware that lzcnt decodes as bsr on older CPUs instead of faulting, which gives 31-lzcnt for non-zero inputs.) There's unfortunately no way to portably take advantage of the various CLZ instructions on non-x86 platforms that do define the result for input=0 as 32 or 64 (according to the operand width). x86's lzcnt does that, too, while bsr produces a bit-index that the compiler has to flip unless you use 31-__builtin_clz(x). (The "undefined result" is not C Undefined Behavior, just a value that isn't defined. It's actually whatever was in the destination register when the instruction ran. AMD documents this, Intel doesn't, but Intel's CPUs do implement that behaviour. But it's not whatever was previously in the C variable you're assigning to, that's not usually how things work when gcc turns C into asm. See also Why does breaking the "output dependency" of LZCNT matter?)
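As a small illustration (my sketch; fls32 is just a name I picked, and it assumes a 32-bit unsigned int):
/* Position of the most significant set bit, 0-based; returns -1 for 0,
 * where __builtin_clz is undefined. */
static inline int fls32(unsigned x)
{
    return x ? 31 - __builtin_clz(x) : -1;
}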
Since 2^N is an integer with only the Nth bit set (1 << N), finding the position (N) of the highest set bit is the integer log base 2 of that integer. http://graphics.stanford.edu/~seander/bithacks.html#IntegerLogObvious unsigned int v; unsigned r = 0; while (v >>= 1) { r++; } This "obvious" algorithm may not be transparent to everyone, but when you realize that the code shifts right by one bit repeatedly until the leftmost bit has been shifted off (note that C treats any non-zero value as true) and returns the number of shifts, it makes perfect sense. It also means that it works even when more than one bit is set — the result is always for the most significant bit. If you scroll down on that page, there are faster, more complex variations. However, if you know you're dealing with numbers with a lot of leading zeroes, the naive approach may provide acceptable speed, since bit shifting is rather fast in C, and the simple algorithm doesn't require indexing an array. NOTE: When using 64-bit values, be extremely cautious about using extra-clever algorithms; many of them only work correctly for 32-bit values.
Assuming you're on x86 and game for a bit of inline assembler, Intel provides a BSR instruction ("bit scan reverse"). It's fast on some x86s (microcoded on others). From the manual: Searches the source operand for the most significant set bit (1 bit). If a most significant 1 bit is found, its bit index is stored in the destination operand. The source operand can be a register or a memory location; the destination operand is a register. The bit index is an unsigned offset from bit 0 of the source operand. If the content source operand is 0, the content of the destination operand is undefined. (If you're on PowerPC there's a similar cntlz ("count leading zeros") instruction.) Example code for gcc: #include <iostream> int main (int,char**) { int n=1; for (;;++n) { int msb; asm("bsrl %1,%0" : "=r"(msb) : "r"(n)); std::cout << n << " : " << msb << std::endl; } return 0; } See also this inline assembler tutorial, which shows (section 9.4) it being considerably faster than looping code.
This is sort of like finding a kind of integer log. There are bit-twiddling tricks, but I've made my own tool for this. The goal of course is for speed. My realization is that the CPU has an automatic bit-detector already, used for integer to float conversion! So use that. double ff=(double)(v|1); return ((*(1+(uint32_t *)&ff))>>20)-1023; // assumes x86 endianness This version casts the value to a double, then reads off the exponent, which tells you where the bit was. The fancy shift and subtract is to extract the proper parts from the IEEE value. It's slightly faster to use floats, but a float can only give you the first 24 bit positions because of its smaller precision. To do this safely, without undefined behaviour in C++ or C, use memcpy instead of pointer casting for type-punning. Compilers know how to inline it efficiently. // static_assert(sizeof(double) == 2 * sizeof(uint32_t), "double isn't 8-byte IEEE binary64"); // and also static_assert something about FLT_ENDIAN? double ff=(double)(v|1); uint32_t tmp; memcpy(&tmp, ((const char*)&ff)+sizeof(uint32_t), sizeof(uint32_t)); return (tmp>>20)-1023; Or in C99 and later, use a union {double d; uint32_t u[2];};. But note that in C++, union type punning is only supported on some compilers as an extension, not in ISO C++. This will usually be slower than a platform-specific intrinsic for a leading-zeros counting instruction, but portable ISO C has no such function. Some CPUs also lack a leading-zero counting instruction, but some of those can efficiently convert integers to double. Type-punning an FP bit pattern back to integer can be slow, though (e.g. on PowerPC it requires a store/reload and usually causes a load-hit-store stall). This algorithm could potentially be useful for SIMD implementations, because fewer CPUs have SIMD lzcnt. x86 only got such an instruction with AVX512CD
This should be lightning fast: int msb(unsigned int v) { static const int pos[32] = {0, 1, 28, 2, 29, 14, 24, 3, 30, 22, 20, 15, 25, 17, 4, 8, 31, 27, 13, 23, 21, 19, 16, 7, 26, 12, 18, 6, 11, 5, 10, 9}; v |= v >> 1; v |= v >> 2; v |= v >> 4; v |= v >> 8; v |= v >> 16; v = (v >> 1) + 1; return pos[(v * 0x077CB531UL) >> 27]; }
Kaz Kylheku here I benchmarked two approaches for this over 63 bit numbers (the long long type on gcc x86_64), staying away from the sign bit. (I happen to need this "find highest bit" for something, you see.) I implemented the data-driven binary search (closely based on one of the above answers). I also implemented a completely unrolled decision tree by hand, which is just code with immediate operands. No loops, no tables. The decision tree (highest_bit_unrolled) benchmarked to be 69% faster, except for the n = 0 case for which the binary search has an explicit test. The binary-search's special test for 0 case is only 48% faster than the decision tree, which does not have a special test. Compiler, machine: (GCC 4.5.2, -O3, x86-64, 2867 Mhz Intel Core i5). int highest_bit_unrolled(long long n) { if (n & 0x7FFFFFFF00000000) { if (n & 0x7FFF000000000000) { if (n & 0x7F00000000000000) { if (n & 0x7000000000000000) { if (n & 0x4000000000000000) return 63; else return (n & 0x2000000000000000) ? 62 : 61; } else { if (n & 0x0C00000000000000) return (n & 0x0800000000000000) ? 60 : 59; else return (n & 0x0200000000000000) ? 58 : 57; } } else { if (n & 0x00F0000000000000) { if (n & 0x00C0000000000000) return (n & 0x0080000000000000) ? 56 : 55; else return (n & 0x0020000000000000) ? 54 : 53; } else { if (n & 0x000C000000000000) return (n & 0x0008000000000000) ? 52 : 51; else return (n & 0x0002000000000000) ? 50 : 49; } } } else { if (n & 0x0000FF0000000000) { if (n & 0x0000F00000000000) { if (n & 0x0000C00000000000) return (n & 0x0000800000000000) ? 48 : 47; else return (n & 0x0000200000000000) ? 46 : 45; } else { if (n & 0x00000C0000000000) return (n & 0x0000080000000000) ? 44 : 43; else return (n & 0x0000020000000000) ? 42 : 41; } } else { if (n & 0x000000F000000000) { if (n & 0x000000C000000000) return (n & 0x0000008000000000) ? 40 : 39; else return (n & 0x0000002000000000) ? 38 : 37; } else { if (n & 0x0000000C00000000) return (n & 0x0000000800000000) ? 36 : 35; else return (n & 0x0000000200000000) ? 34 : 33; } } } } else { if (n & 0x00000000FFFF0000) { if (n & 0x00000000FF000000) { if (n & 0x00000000F0000000) { if (n & 0x00000000C0000000) return (n & 0x0000000080000000) ? 32 : 31; else return (n & 0x0000000020000000) ? 30 : 29; } else { if (n & 0x000000000C000000) return (n & 0x0000000008000000) ? 28 : 27; else return (n & 0x0000000002000000) ? 26 : 25; } } else { if (n & 0x0000000000F00000) { if (n & 0x0000000000C00000) return (n & 0x0000000000800000) ? 24 : 23; else return (n & 0x0000000000200000) ? 22 : 21; } else { if (n & 0x00000000000C0000) return (n & 0x0000000000080000) ? 20 : 19; else return (n & 0x0000000000020000) ? 18 : 17; } } } else { if (n & 0x000000000000FF00) { if (n & 0x000000000000F000) { if (n & 0x000000000000C000) return (n & 0x0000000000008000) ? 16 : 15; else return (n & 0x0000000000002000) ? 14 : 13; } else { if (n & 0x0000000000000C00) return (n & 0x0000000000000800) ? 12 : 11; else return (n & 0x0000000000000200) ? 10 : 9; } } else { if (n & 0x00000000000000F0) { if (n & 0x00000000000000C0) return (n & 0x0000000000000080) ? 8 : 7; else return (n & 0x0000000000000020) ? 6 : 5; } else { if (n & 0x000000000000000C) return (n & 0x0000000000000008) ? 4 : 3; else return (n & 0x0000000000000002) ? 2 : (n ? 
1 : 0); } } } } } int highest_bit(long long n) { const long long mask[] = { 0x000000007FFFFFFF, 0x000000000000FFFF, 0x00000000000000FF, 0x000000000000000F, 0x0000000000000003, 0x0000000000000001 }; int hi = 64; int lo = 0; int i = 0; if (n == 0) return 0; for (i = 0; i < sizeof mask / sizeof mask[0]; i++) { int mi = lo + (hi - lo) / 2; if ((n >> mi) != 0) lo = mi; else if ((n & (mask[i] << lo)) != 0) hi = mi; } return lo + 1; } Quick and dirty test program: #include <stdio.h> #include <time.h> #include <stdlib.h> int highest_bit_unrolled(long long n); int highest_bit(long long n); main(int argc, char **argv) { long long n = strtoull(argv[1], NULL, 0); int b1, b2; long i; clock_t start = clock(), mid, end; for (i = 0; i < 1000000000; i++) b1 = highest_bit_unrolled(n); mid = clock(); for (i = 0; i < 1000000000; i++) b2 = highest_bit(n); end = clock(); printf("highest bit of 0x%llx/%lld = %d, %d\n", n, n, b1, b2); printf("time1 = %d\n", (int) (mid - start)); printf("time2 = %d\n", (int) (end - mid)); return 0; } Using only -O2, the difference becomes greater. The decision tree is almost four times faster. I also benchmarked against the naive bit shifting code: int highest_bit_shift(long long n) { int i = 0; for (; n; n >>= 1, i++) ; /* empty */ return i; } This is only fast for small numbers, as one would expect. In determining that the highest bit is 1 for n == 1, it benchmarked more than 80% faster. However, half of randomly chosen numbers in the 63 bit space have the 63rd bit set! On the input 0x3FFFFFFFFFFFFFFF, the decision tree version is quite a bit faster than it is on 1, and shows to be 1120% faster (12.2 times) than the bit shifter. I will also benchmark the decision tree against the GCC builtins, and also try a mixture of inputs rather than repeating against the same number. There may be some sticking branch prediction going on and perhaps some unrealistic caching scenarios which makes it artificially faster on repetitions.
Although I would probably only use this method if I absolutely required the best possible performance (e.g. for writing some sort of board game AI involving bitboards), the most efficient solution is to use inline ASM. See the Optimisations section of this blog post for code with an explanation. [...], the bsrl assembly instruction computes the position of the most significant bit. Thus, we could use this asm statement: asm ("bsrl %1, %0" : "=r" (position) : "r" (number));
unsigned int msb32(register unsigned int x) { x |= (x >> 1); x |= (x >> 2); x |= (x >> 4); x |= (x >> 8); x |= (x >> 16); return(x & ~(x >> 1)); } 1 register, 13 instructions. Believe it or not, this is usually faster than the BSR instruction mentioned above, which operates in linear time. This is logarithmic time. From http://aggregate.org/MAGIC/#Most%20Significant%201%20Bit
What about int highest_bit(unsigned int a) { int count; std::frexp(a, &count); return count - 1; } ?
Here are some (simple) benchmarks, of algorithms currently given on this page... The algorithms have not been tested over all inputs of unsigned int; so check that first, before blindly using something ;) On my machine clz (__builtin_clz) and asm work best. asm seems even faster then clz... but it might be due to the simple benchmark... //////// go.c /////////////////////////////// // compile with: gcc go.c -o go -lm #include <math.h> #include <stdio.h> #include <stdlib.h> #include <time.h> /***************** math ********************/ #define POS_OF_HIGHESTBITmath(a) /* 0th position is the Least-Signif-Bit */ \ ((unsigned) log2(a)) /* thus: do not use if a <= 0 */ #define NUM_OF_HIGHESTBITmath(a) ((a) \ ? (1U << POS_OF_HIGHESTBITmath(a)) \ : 0) /***************** clz ********************/ unsigned NUM_BITS_U = ((sizeof(unsigned) << 3) - 1); #define POS_OF_HIGHESTBITclz(a) (NUM_BITS_U - __builtin_clz(a)) /* only works for a != 0 */ #define NUM_OF_HIGHESTBITclz(a) ((a) \ ? (1U << POS_OF_HIGHESTBITclz(a)) \ : 0) /***************** i2f ********************/ double FF; #define POS_OF_HIGHESTBITi2f(a) (FF = (double)(ui|1), ((*(1+(unsigned*)&FF))>>20)-1023) #define NUM_OF_HIGHESTBITi2f(a) ((a) \ ? (1U << POS_OF_HIGHESTBITi2f(a)) \ : 0) /***************** asm ********************/ unsigned OUT; #define POS_OF_HIGHESTBITasm(a) (({asm("bsrl %1,%0" : "=r"(OUT) : "r"(a));}), OUT) #define NUM_OF_HIGHESTBITasm(a) ((a) \ ? (1U << POS_OF_HIGHESTBITasm(a)) \ : 0) /***************** bitshift1 ********************/ #define NUM_OF_HIGHESTBITbitshift1(a) (({ \ OUT = a; \ OUT |= (OUT >> 1); \ OUT |= (OUT >> 2); \ OUT |= (OUT >> 4); \ OUT |= (OUT >> 8); \ OUT |= (OUT >> 16); \ }), (OUT & ~(OUT >> 1))) \ /***************** bitshift2 ********************/ int POS[32] = {0, 1, 28, 2, 29, 14, 24, 3, 30, 22, 20, 15, 25, 17, 4, 8, 31, 27, 13, 23, 21, 19, 16, 7, 26, 12, 18, 6, 11, 5, 10, 9}; #define POS_OF_HIGHESTBITbitshift2(a) (({ \ OUT = a; \ OUT |= OUT >> 1; \ OUT |= OUT >> 2; \ OUT |= OUT >> 4; \ OUT |= OUT >> 8; \ OUT |= OUT >> 16; \ OUT = (OUT >> 1) + 1; \ }), POS[(OUT * 0x077CB531UL) >> 27]) #define NUM_OF_HIGHESTBITbitshift2(a) ((a) \ ? 
(1U << POS_OF_HIGHESTBITbitshift2(a)) \ : 0) #define LOOPS 100000000U int main() { time_t start, end; unsigned ui; unsigned n; /********* Checking the first few unsigned values (you'll need to check all if you want to use an algorithm here) **************/ printf("math\n"); for (ui = 0U; ui < 18; ++ui) printf("%i\t%i\n", ui, NUM_OF_HIGHESTBITmath(ui)); printf("\n\n"); printf("clz\n"); for (ui = 0U; ui < 18U; ++ui) printf("%i\t%i\n", ui, NUM_OF_HIGHESTBITclz(ui)); printf("\n\n"); printf("i2f\n"); for (ui = 0U; ui < 18U; ++ui) printf("%i\t%i\n", ui, NUM_OF_HIGHESTBITi2f(ui)); printf("\n\n"); printf("asm\n"); for (ui = 0U; ui < 18U; ++ui) { printf("%i\t%i\n", ui, NUM_OF_HIGHESTBITasm(ui)); } printf("\n\n"); printf("bitshift1\n"); for (ui = 0U; ui < 18U; ++ui) { printf("%i\t%i\n", ui, NUM_OF_HIGHESTBITbitshift1(ui)); } printf("\n\n"); printf("bitshift2\n"); for (ui = 0U; ui < 18U; ++ui) { printf("%i\t%i\n", ui, NUM_OF_HIGHESTBITbitshift2(ui)); } printf("\n\nPlease wait...\n\n"); /************************* Simple clock() benchmark ******************/ start = clock(); for (ui = 0; ui < LOOPS; ++ui) n = NUM_OF_HIGHESTBITmath(ui); end = clock(); printf("math:\t%e\n", (double)(end-start)/CLOCKS_PER_SEC); start = clock(); for (ui = 0; ui < LOOPS; ++ui) n = NUM_OF_HIGHESTBITclz(ui); end = clock(); printf("clz:\t%e\n", (double)(end-start)/CLOCKS_PER_SEC); start = clock(); for (ui = 0; ui < LOOPS; ++ui) n = NUM_OF_HIGHESTBITi2f(ui); end = clock(); printf("i2f:\t%e\n", (double)(end-start)/CLOCKS_PER_SEC); start = clock(); for (ui = 0; ui < LOOPS; ++ui) n = NUM_OF_HIGHESTBITasm(ui); end = clock(); printf("asm:\t%e\n", (double)(end-start)/CLOCKS_PER_SEC); start = clock(); for (ui = 0; ui < LOOPS; ++ui) n = NUM_OF_HIGHESTBITbitshift1(ui); end = clock(); printf("bitshift1:\t%e\n", (double)(end-start)/CLOCKS_PER_SEC); start = clock(); for (ui = 0; ui < LOOPS; ++ui) n = NUM_OF_HIGHESTBITbitshift2(ui); end = clock(); printf("bitshift2\t%e\n", (double)(end-start)/CLOCKS_PER_SEC); printf("\nThe lower, the better. Take note that a negative exponent is good! ;)\n"); return EXIT_SUCCESS; }
Some overly complex answers here. The Debruin technique should only be used when the input is already a power of two, otherwise there's a better way. For a power of 2 input, Debruin is the absolute fastest, even faster than _BitScanReverse on any processor I've tested. However, in the general case, _BitScanReverse (or whatever the intrinsic is called in your compiler) is the fastest (on certain CPU's it can be microcoded though). If the intrinsic function is not an option, here is an optimal software solution for processing general inputs. u8 inline log2 (u32 val) { u8 k = 0; if (val > 0x0000FFFFu) { val >>= 16; k = 16; } if (val > 0x000000FFu) { val >>= 8; k |= 8; } if (val > 0x0000000Fu) { val >>= 4; k |= 4; } if (val > 0x00000003u) { val >>= 2; k |= 2; } k |= (val & 2) >> 1; return k; } Note that this version does not require a Debruin lookup at the end, unlike most of the other answers. It computes the position in place. Tables can be preferable though, if you call it repeatedly enough times, the risk of a cache miss becomes eclipsed by the speedup of a table. u8 kTableLog2[256] = { 0,0,1,1,2,2,2,2,3,3,3,3,3,3,3,3,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4, 5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5, 6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6, 6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6, 7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7, 7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7, 7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7, 7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7 }; u8 log2_table(u32 val) { u8 k = 0; if (val > 0x0000FFFFuL) { val >>= 16; k = 16; } if (val > 0x000000FFuL) { val >>= 8; k |= 8; } k |= kTableLog2[val]; // precompute the Log2 of the low byte return k; } This should produce the highest throughput of any of the software answers given here, but if you only call it occasionally, prefer a table-free solution like my first snippet.
I had a need for a routine to do this and before searching the web (and finding this page) I came up with my own solution basedon a binary search. Although I'm sure someone has done this before! It runs in constant time and can be faster than the "obvious" solution posted, although I'm not making any great claims, just posting it for interest. int highest_bit(unsigned int a) { static const unsigned int maskv[] = { 0xffff, 0xff, 0xf, 0x3, 0x1 }; const unsigned int *mask = maskv; int l, h; if (a == 0) return -1; l = 0; h = 32; do { int m = l + (h - l) / 2; if ((a >> m) != 0) l = m; else if ((a & (*mask << l)) != 0) h = m; mask++; } while (l < h - 1); return l; }
A version in C using successive approximation:

unsigned int getMsb(unsigned int n)
{
    unsigned int msb  = sizeof(n) * 4;
    unsigned int step = msb;

    while (step > 1)
    {
        step /= 2;
        if (n >> msb)
            msb += step;
        else
            msb -= step;
    }

    if (n >> msb)
        msb++;

    return (msb - 1);
}

Advantage: the running time is constant regardless of the provided number, as the number of loop iterations is always the same (4 iterations when using "unsigned int").
That's some kind of binary search; it works with all kinds of (unsigned!) integer types:

#include <climits>
#define UINT unsigned int
#define UINT_BIT (CHAR_BIT*sizeof(UINT))

int msb(UINT x)
{
    if(0 == x)
        return -1;

    int c = 0;

    for(UINT i = UINT_BIT>>1; 0 < i; i >>= 1)
        if(static_cast<UINT>(x >> i))
        {
            x >>= i;
            c |= i;
        }

    return c;
}

To make it complete:

#include <climits>
#define UINT unsigned int
#define UINT_BIT (CHAR_BIT*sizeof(UINT))

int lsb(UINT x)
{
    if(0 == x)
        return -1;

    int c = UINT_BIT-1;

    for(UINT i = UINT_BIT>>1; 0 < i; i >>= 1)
        if(static_cast<UINT>(x << i))
        {
            x <<= i;
            c ^= i;
        }

    return c;
}
Expanding on Josh's benchmark... one can improve the clz as follows:

/***************** clz2 ********************/
#define NUM_OF_HIGHESTBITclz2(a) ((a)                                              \
                  ? (((1U) << (sizeof(unsigned)*8-1)) >> __builtin_clz(a))         \
                  : 0)

Regarding the asm: note that there are bsr and bsrl (this is the "long" version). The normal one might be a bit faster.
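For reference, here is a minimal sketch of calling bsr directly through GCC/Clang extended asm on x86; the helper name is made up for illustration, and the result is undefined for a zero argument, since bsr leaves its destination unchanged in that case:

// Sketch only: x86, GCC/Clang extended asm; caller must guarantee a != 0.
static inline unsigned bsr32(unsigned a)
{
    unsigned pos;
    __asm__("bsrl %1, %0" : "=r"(pos) : "rm"(a));
    return pos;   /* index of the highest set bit, 0..31 */
}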
As the answers above point out, there are a number of ways to determine the most significant bit. However, as was also pointed out, the methods are likely to be unique to either 32bit or 64bit registers. The stanford.edu bithacks page provides solutions that work for both 32bit and 64bit computing. With a little work, they can be combined to provide a solid cross-architecture approach to obtaining the MSB. The solution I arrived at that compiled/worked across 64 & 32 bit computers was: #if defined(__LP64__) || defined(_LP64) # define BUILD_64 1 #endif #include <stdio.h> #include <stdint.h> /* for uint32_t */ /* CHAR_BIT (or include limits.h) */ #ifndef CHAR_BIT #define CHAR_BIT 8 #endif /* CHAR_BIT */ /* * Find the log base 2 of an integer with the MSB N set in O(N) * operations. (on 64bit & 32bit architectures) */ int getmsb (uint32_t word) { int r = 0; if (word < 1) return 0; #ifdef BUILD_64 union { uint32_t u[2]; double d; } t; // temp t.u[__FLOAT_WORD_ORDER==LITTLE_ENDIAN] = 0x43300000; t.u[__FLOAT_WORD_ORDER!=LITTLE_ENDIAN] = word; t.d -= 4503599627370496.0; r = (t.u[__FLOAT_WORD_ORDER==LITTLE_ENDIAN] >> 20) - 0x3FF; #else while (word >>= 1) { r++; } #endif /* BUILD_64 */ return r; }
I know this question is very old, but just having implemented an msb() function myself, I found that most solutions presented here and on other websites are not necessarily the most efficient - at least for my personal definition of efficiency (see also Update below). Here's why:

Most solutions (especially those which employ some sort of binary search scheme or the naïve approach which does a linear scan from right to left) seem to neglect the fact that for arbitrary binary numbers, there are not many which start with a very long sequence of zeros. In fact, for any bit-width, half of all integers start with a 1 and a quarter of them start with 01. See where I'm going with this? My argument is that a linear scan starting from the most significant bit position to the least significant (left to right) is not as "linear" as it might look at first glance.

It can be shown [1] that for any bit-width, the average number of bits that need to be tested is at most 2. This translates to an amortized time complexity of O(1) with respect to the number of bits (!).

Of course, the worst case is still O(n), worse than the O(log(n)) you get with binary-search-like approaches, but since there are so few worst cases, they are negligible for most applications (Update: not quite: there may be few, but they might occur with high probability - see Update below).

Here is the "naïve" approach I've come up with, which at least on my machine beats most other approaches (binary search schemes for 32-bit ints always require log2(32) = 5 steps, whereas this silly algorithm requires less than 2 on average) - sorry for this being C++ and not pure C:

template <typename T>
auto msb(T n) -> int
{
    static_assert(std::is_integral<T>::value && !std::is_signed<T>::value,
        "msb<T>(): T must be an unsigned integral type.");

    // scan from the most significant bit downwards; a signed loop index avoids
    // the wrap-around an unsigned counter would cause, and building the mask
    // from T(1) avoids shift overflow for types wider than int
    for (int i = std::numeric_limits<T>::digits - 1; i >= 0; --i)
    {
        if ((n & (T(1) << i)) != 0)
            return i;
    }

    return 0;
}

Update: While what I wrote here is perfectly true for arbitrary integers, where every combination of bits is equally probable (my speed test simply measured how long it took to determine the MSB for all 32-bit integers), real-life integers, for which such a function will be called, usually follow a different pattern: In my code, for example, this function is used to determine whether an object size is a power of 2, or to find the next power of 2 greater than or equal to an object size. My guess is that most applications using the MSB involve numbers which are much smaller than the maximum number an integer can represent (object sizes rarely utilize all the bits in a size_t). In this case, my solution will actually perform worse than a binary search approach - so the latter should probably be preferred, even though my solution will be faster when looping through all integers.

TL;DR: Real-life integers will probably have a bias towards the worst case of this simple algorithm, which will make it perform worse in the end - despite the fact that it's amortized O(1) for truly arbitrary integers.

[1] The argument goes like this (rough draft): Let n be the number of bits (bit-width). There are a total of 2^n integers which can be represented with n bits. There are 2^(n-1) integers starting with a 1 (the first 1 is fixed, the remaining n - 1 bits can be anything). Those integers require only one iteration of the loop to determine the MSB. Further, there are 2^(n-2) integers starting with 01, requiring 2 iterations, 2^(n-3) integers starting with 001, requiring 3 iterations, and so on.
If we sum up all the required iterations for all possible integers and divide by 2^n, the total number of integers, we get the average number of iterations needed to determine the MSB of an n-bit integer:

(1 * 2^(n-1) + 2 * 2^(n-2) + 3 * 2^(n-3) + ... + n) / 2^n

This series is convergent and has a limit of 2 as n goes to infinity. Thus, the naïve left-to-right algorithm actually has an amortized constant time complexity of O(1) for any number of bits.
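A quick numerical check of that limit (a throwaway sketch of my own, not part of the original answer): compute the average iteration count for a few bit widths and watch it approach 2.

#include <stdio.h>

int main(void)
{
    for (int n = 4; n <= 64; n *= 2) {
        double avg = 0.0, p = 0.5;          /* probability that the first 1 is at position 1 */
        for (int k = 1; k <= n; k++, p /= 2)
            avg += k * p;                   /* k iterations with probability 2^-k */
        avg += n * p * 2;                   /* the all-zero value also takes n iterations */
        printf("n = %2d  average iterations = %f\n", n, avg);
    }
    return 0;
}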
C99 has given us log2. This removes the need for all the special-sauce log2 implementations you see on this page. You can use the standard's log2 implementation like this:

const auto n = 13UL;
const auto Index = (unsigned long)log2(n);
printf("MSB is: %lu\n", Index); // Prints 3 (zero offset)

An n of 0UL needs to be guarded against as well, because -∞ is returned and FE_DIVBYZERO is raised. I have written an example with that check that arbitrarily sets Index to ULONG_MAX here: https://ideone.com/u26vsi

The Visual Studio counterpart to ephemient's gcc-only answer is:

const auto n = 13UL;
unsigned long Index;
_BitScanReverse(&Index, n);
printf("MSB is: %lu\n", Index); // Prints 3 (zero offset)

The documentation for _BitScanReverse states that Index is:

    Loaded with the bit position of the first set bit (1) found

In practice I've found that if n is 0UL, Index is set to 0UL, just as it would be for an n of 1UL. But the only thing guaranteed in the documentation for an n of 0UL is that the return value is:

    0 if no set bits were found

Thus, similarly to the preferable log2 implementation above, the return value should be checked, setting Index to a flagged value in this case. I've again written an example of using ULONG_MAX for this flag value here: http://rextester.com/GCU61409
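Since the worked examples above are only linked externally, here is a minimal inline sketch of the guarded log2 variant described there; the ULONG_MAX sentinel is just this answer's convention, not anything standard:

#include <limits.h>
#include <math.h>
#include <stdio.h>

int main(void)
{
    unsigned long n = 13UL;
    unsigned long Index = (n == 0UL) ? ULONG_MAX : (unsigned long)log2((double)n);

    if (Index == ULONG_MAX)
        printf("no bit set\n");
    else
        printf("MSB is: %lu\n", Index);   /* prints 3 for n == 13 */
    return 0;
}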
Think bitwise operators. I misunderstood the question the first time. You should produce an int with the leftmost bit set (the others zero). Assuming cmp is set to that value:

unsigned int cmp = 1u << (sizeof(int)*8 - 1);   /* leftmost bit set, all others zero */
int position = sizeof(int)*8;

while (!(n & cmp)) {    /* n must be non-zero, or this never terminates */
    n <<= 1;
    position--;
}
Wow, that was many answers. I am not sorry for answering an old question.

int result = 0;  // could be a char or int8_t instead

if(value){  // this assumes the value is 64bit
    if(0xFFFFFFFF00000000&value){ value>>=(1<<5); result|=(1<<5); }  // if it is 32bit then remove this line
    if(0x00000000FFFF0000&value){ value>>=(1<<4); result|=(1<<4); }  // and remove the 32 msb from the masks below
    if(0x000000000000FF00&value){ value>>=(1<<3); result|=(1<<3); }
    if(0x00000000000000F0&value){ value>>=(1<<2); result|=(1<<2); }
    if(0x000000000000000C&value){ value>>=(1<<1); result|=(1<<1); }
    if(0x0000000000000002&value){ result|=(1<<0); }
}else{
    result=-1;
}

This answer is pretty similar to another answer... oh well.
Note that what you are trying to do is calculate the integer log2 of an integer, #include <stdio.h> #include <stdlib.h> unsigned int Log2(unsigned long x) { unsigned long n = x; int bits = sizeof(x)*8; int step = 1; int k=0; for( step = 1; step < bits; ) { n |= (n >> step); step *= 2; ++k; } //printf("%ld %ld\n",x, (x - (n >> 1)) ); return(x - (n >> 1)); } Observe that you can attempt to search more than 1 bit at a time. unsigned int Log2_a(unsigned long x) { unsigned long n = x; int bits = sizeof(x)*8; int step = 1; int step2 = 0; //observe that you can move 8 bits at a time, and there is a pattern... //if( x>1<<step2+8 ) { step2+=8; //if( x>1<<step2+8 ) { step2+=8; //if( x>1<<step2+8 ) { step2+=8; //} //} //} for( step2=0; x>1L<<step2+8; ) { step2+=8; } //printf("step2 %d\n",step2); for( step = 0; x>1L<<(step+step2); ) { step+=1; //printf("step %d\n",step+step2); } printf("log2(%ld) %d\n",x,step+step2); return(step+step2); } This approach uses a binary search unsigned int Log2_b(unsigned long x) { unsigned long n = x; unsigned int bits = sizeof(x)*8; unsigned int hbit = bits-1; unsigned int lbit = 0; unsigned long guess = bits/2; int found = 0; while ( hbit-lbit>1 ) { //printf("log2(%ld) %d<%d<%d\n",x,lbit,guess,hbit); //when value between guess..lbit if( (x<=(1L<<guess)) ) { //printf("%ld < 1<<%d %ld\n",x,guess,1L<<guess); hbit=guess; guess=(hbit+lbit)/2; //printf("log2(%ld) %d<%d<%d\n",x,lbit,guess,hbit); } //when value between hbit..guess //else if( (x>(1L<<guess)) ) { //printf("%ld > 1<<%d %ld\n",x,guess,1L<<guess); lbit=guess; guess=(hbit+lbit)/2; //printf("log2(%ld) %d<%d<%d\n",x,lbit,guess,hbit); } } if( (x>(1L<<guess)) ) ++guess; printf("log2(x%ld)=r%d\n",x,guess); return(guess); } Another binary search method, perhaps more readable, unsigned int Log2_c(unsigned long x) { unsigned long v = x; unsigned int bits = sizeof(x)*8; unsigned int step = bits; unsigned int res = 0; for( step = bits/2; step>0; ) { //printf("log2(%ld) v %d >> step %d = %ld\n",x,v,step,v>>step); while ( v>>step ) { v>>=step; res+=step; //printf("log2(%ld) step %d res %d v>>step %ld\n",x,step,res,v); } step /= 2; } if( (x>(1L<<res)) ) ++res; printf("log2(x%ld)=r%ld\n",x,res); return(res); } And because you will want to test these, int main() { unsigned long int x = 3; for( x=2; x<1000000000; x*=2 ) { //printf("x %ld, x+1 %ld, log2(x+1) %d\n",x,x+1,Log2(x+1)); printf("x %ld, x+1 %ld, log2_a(x+1) %d\n",x,x+1,Log2_a(x+1)); printf("x %ld, x+1 %ld, log2_b(x+1) %d\n",x,x+1,Log2_b(x+1)); printf("x %ld, x+1 %ld, log2_c(x+1) %d\n",x,x+1,Log2_c(x+1)); } return(0); }
Putting this in since it's 'yet another' approach; it seems to be different from the others already given.

Returns -1 if x==0, otherwise floor(log2(x)) (max result 31).

Reduce from a 32-bit to a 4-bit problem, then use a table. Perhaps inelegant, but pragmatic. This is what I use when I don't want to use __builtin_clz because of portability issues.

To make it more compact, one could instead use a loop to reduce, adding 4 to r each time, max 7 iterations (see the sketch after this answer). Or some hybrid, such as (for 64 bits): loop to reduce to 8, test to reduce to 4.

int log2floor( unsigned x ){
   static const signed char wtab[16] = {-1,0,1,1, 2,2,2,2, 3,3,3,3,3,3,3,3};
   int r = 0;
   unsigned xk = x >> 16;
   if( xk != 0 ){
       r = 16;
       x = xk;
   }
   // x is 0 .. 0xFFFF
   xk = x >> 8;
   if( xk != 0){
       r += 8;
       x = xk;
   }
   // x is 0 .. 0xFF
   xk = x >> 4;
   if( xk != 0){
       r += 4;
       x = xk;
   }
   // now x is 0..15; x=0 only if originally zero.
   return r + wtab[x];
}
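One possible shape of the loop-based reduction mentioned above (my sketch of that idea, not the answer's own code):

/* Sketch: reduce in 4-bit steps with a loop, then finish with the same nibble table. */
int log2floor_loop(unsigned x)
{
    static const signed char wtab[16] = {-1,0,1,1, 2,2,2,2, 3,3,3,3,3,3,3,3};
    int r = 0;
    while (x >> 4) {    /* at most 7 iterations for a 32-bit value */
        x >>= 4;
        r += 4;
    }
    return r + wtab[x]; /* still returns -1 for x == 0 */
}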
Another poster provided a lookup-table using a byte-wide lookup. In case you want to eke out a bit more performance (at the cost of 32K of memory instead of just 256 lookup entries) here is a solution using a 15-bit lookup table, in C# 7 for .NET. The interesting part is initializing the table. Since it's a relatively small block that we want for the lifetime of the process, I allocate unmanaged memory for this by using Marshal.AllocHGlobal. As you can see, for maximum performance, the whole example is written as native: readonly static byte[] msb_tab_15; // Initialize a table of 32768 bytes with the bit position (counting from LSB=0) // of the highest 'set' (non-zero) bit of its corresponding 16-bit index value. // The table is compressed by half, so use (value >> 1) for indexing. static MyStaticInit() { var p = new byte[0x8000]; for (byte n = 0; n < 16; n++) for (int c = (1 << n) >> 1, i = 0; i < c; i++) p[c + i] = n; msb_tab_15 = p; } The table requires one-time initialization via the code above. It is read-only so a single global copy can be shared for concurrent access. With this table you can quickly look up the integer log2, which is what we're looking for here, for all the various integer widths (8, 16, 32, and 64 bits). Notice that the table entry for 0, the sole integer for which the notion of 'highest set bit' is undefined, is given the value -1. This distinction is necessary for proper handling of 0-valued upper words in the code below. Without further ado, here is the code for each of the various integer primitives: ulong (64-bit) Version /// <summary> Index of the highest set bit in 'v', or -1 for value '0' </summary> public static int HighestOne(this ulong v) { if ((long)v <= 0) return (int)((v >> 57) & 0x40) - 1; // handles cases v==0 and MSB==63 int j = /**/ (int)((0xFFFFFFFFU - v /****/) >> 58) & 0x20; j |= /*****/ (int)((0x0000FFFFU - (v >> j)) >> 59) & 0x10; return j + msb_tab_15[v >> (j + 1)]; } uint (32-bit) Version /// <summary> Index of the highest set bit in 'v', or -1 for value '0' </summary> public static int HighestOne(uint v) { if ((int)v <= 0) return (int)((v >> 26) & 0x20) - 1; // handles cases v==0 and MSB==31 int j = (int)((0x0000FFFFU - v) >> 27) & 0x10; return j + msb_tab_15[v >> (j + 1)]; } Various overloads for the above public static int HighestOne(long v) => HighestOne((ulong)v); public static int HighestOne(int v) => HighestOne((uint)v); public static int HighestOne(ushort v) => msb_tab_15[v >> 1]; public static int HighestOne(short v) => msb_tab_15[(ushort)v >> 1]; public static int HighestOne(char ch) => msb_tab_15[ch >> 1]; public static int HighestOne(sbyte v) => msb_tab_15[(byte)v >> 1]; public static int HighestOne(byte v) => msb_tab_15[v >> 1]; This is a complete, working solution which represents the best performance on .NET 4.7.2 for numerous alternatives that I compared with a specialized performance test harness. Some of these are mentioned below. The test parameters were a uniform density of all 65 bit positions, i.e., 0 ... 31/63 plus value 0 (which produces result -1). The bits below the target index position were filled randomly. The tests were x64 only, release mode, with JIT-optimizations enabled. That's the end of my formal answer here; what follows are some casual notes and links to source code for alternative test candidates associated with the testing I ran to validate the performance and correctness of the above code. The version provided above above, coded as Tab16A was a consistent winner over many runs. 
These various candidates, in active working/scratch form, can be found here, here, and here. 1 candidates.HighestOne_Tab16A 622,496 2 candidates.HighestOne_Tab16C 628,234 3 candidates.HighestOne_Tab8A 649,146 4 candidates.HighestOne_Tab8B 656,847 5 candidates.HighestOne_Tab16B 657,147 6 candidates.HighestOne_Tab16D 659,650 7 _highest_one_bit_UNMANAGED.HighestOne_U 702,900 8 de_Bruijn.IndexOfMSB 709,672 9 _old_2.HighestOne_Old2 715,810 10 _test_A.HighestOne8 757,188 11 _old_1.HighestOne_Old1 757,925 12 _test_A.HighestOne5 (unsafe) 760,387 13 _test_B.HighestOne8 (unsafe) 763,904 14 _test_A.HighestOne3 (unsafe) 766,433 15 _test_A.HighestOne1 (unsafe) 767,321 16 _test_A.HighestOne4 (unsafe) 771,702 17 _test_B.HighestOne2 (unsafe) 772,136 18 _test_B.HighestOne1 (unsafe) 772,527 19 _test_B.HighestOne3 (unsafe) 774,140 20 _test_A.HighestOne7 (unsafe) 774,581 21 _test_B.HighestOne7 (unsafe) 775,463 22 _test_A.HighestOne2 (unsafe) 776,865 23 candidates.HighestOne_NoTab 777,698 24 _test_B.HighestOne6 (unsafe) 779,481 25 _test_A.HighestOne6 (unsafe) 781,553 26 _test_B.HighestOne4 (unsafe) 785,504 27 _test_B.HighestOne5 (unsafe) 789,797 28 _test_A.HighestOne0 (unsafe) 809,566 29 _test_B.HighestOne0 (unsafe) 814,990 30 _highest_one_bit.HighestOne 824,345 30 _bitarray_ext.RtlFindMostSignificantBit 894,069 31 candidates.HighestOne_Naive 898,865 Notable is that the terrible performance of ntdll.dll!RtlFindMostSignificantBit via P/Invoke: [DllImport("ntdll.dll"), SuppressUnmanagedCodeSecurity, SecuritySafeCritical] public static extern int RtlFindMostSignificantBit(ulong ul); It's really too bad, because here's the entire actual function: RtlFindMostSignificantBit: bsr rdx, rcx mov eax,0FFFFFFFFh movzx ecx, dl cmovne eax,ecx ret I can't imagine the poor performance originating with these five lines, so the managed/native transition penalties must be to blame. I was also surprised that the testing really favored the 32KB (and 64KB) short (16-bit) direct-lookup tables over the 128-byte (and 256-byte) byte (8-bit) lookup tables. I thought the following would be more competitive with the 16-bit lookups, but the latter consistently outperformed this: public static int HighestOne_Tab8A(ulong v) { if ((long)v <= 0) return (int)((v >> 57) & 64) - 1; int j; j = /**/ (int)((0xFFFFFFFFU - v) >> 58) & 32; j += /**/ (int)((0x0000FFFFU - (v >> j)) >> 59) & 16; j += /**/ (int)((0x000000FFU - (v >> j)) >> 60) & 8; return j + msb_tab_8[v >> j]; } The last thing I'll point out is that I was quite shocked that my deBruijn method didn't fare better. This is the method that I had previously been using pervasively: const ulong N_bsf64 = 0x07EDD5E59A4E28C2, N_bsr64 = 0x03F79D71B4CB0A89; readonly public static sbyte[] bsf64 = { 63, 0, 58, 1, 59, 47, 53, 2, 60, 39, 48, 27, 54, 33, 42, 3, 61, 51, 37, 40, 49, 18, 28, 20, 55, 30, 34, 11, 43, 14, 22, 4, 62, 57, 46, 52, 38, 26, 32, 41, 50, 36, 17, 19, 29, 10, 13, 21, 56, 45, 25, 31, 35, 16, 9, 12, 44, 24, 15, 8, 23, 7, 6, 5, }, bsr64 = { 0, 47, 1, 56, 48, 27, 2, 60, 57, 49, 41, 37, 28, 16, 3, 61, 54, 58, 35, 52, 50, 42, 21, 44, 38, 32, 29, 23, 17, 11, 4, 62, 46, 55, 26, 59, 40, 36, 15, 53, 34, 51, 20, 43, 31, 22, 10, 45, 25, 39, 14, 33, 19, 30, 9, 24, 13, 18, 8, 12, 7, 6, 5, 63, }; public static int IndexOfLSB(ulong v) => v != 0 ? 
bsf64[((v & (ulong)-(long)v) * N_bsf64) >> 58] : -1; public static int IndexOfMSB(ulong v) { if ((long)v <= 0) return (int)((v >> 57) & 64) - 1; v |= v >> 1; v |= v >> 2; v |= v >> 4; // does anybody know a better v |= v >> 8; v |= v >> 16; v |= v >> 32; // way than these 12 ops? return bsr64[(v * N_bsr64) >> 58]; } There's much discussion of how superior and great deBruijn methods at this SO question, and I had tended to agree. My speculation is that, while both the deBruijn and direct lookup table methods (that I found to be fastest) both have to do a table lookup, and both have very minimal branching, only the deBruijn has a 64-bit multiply operation. I only tested the IndexOfMSB functions here--not the deBruijn IndexOfLSB--but I expect the latter to fare much better chance since it has so many fewer operations (see above), and I'll likely continue to use it for LSB.
I assume your question is for an integer (called v below) and not an unsigned integer.

int v = 612635685; // whatever value you wish

int get_msb(int v)
{
    int r = 31;  // maximum number of iterations until the integer has been totally left-shifted out,
                 // considering that the first bit is index 0. We could also use (sizeof(int) << 3) - 1
                 // instead of 31 to make it work on any platform.

    while (!(v & 0x80000000) && r--) {  // mask of the highest bit
        v <<= 1;                        // multiply the integer by 2
    }

    return r;  // will even return -1 if no bit was set, allowing error catching
}

If you want to make it work without taking the sign into account you can add an extra 'v <<= 1;' before the loop (and change the r value to 30 accordingly). Please let me know if I forgot anything. I haven't tested it but it should work just fine.
This looks big but works really fast compared to loop thank from bluegsmith int Bit_Find_MSB_Fast(int x2) { long x = x2 & 0x0FFFFFFFFl; long num_even = x & 0xAAAAAAAA; long num_odds = x & 0x55555555; if (x == 0) return(0); if (num_even > num_odds) { if ((num_even & 0xFFFF0000) != 0) // top 4 { if ((num_even & 0xFF000000) != 0) { if ((num_even & 0xF0000000) != 0) { if ((num_even & 0x80000000) != 0) return(32); else return(30); } else { if ((num_even & 0x08000000) != 0) return(28); else return(26); } } else { if ((num_even & 0x00F00000) != 0) { if ((num_even & 0x00800000) != 0) return(24); else return(22); } else { if ((num_even & 0x00080000) != 0) return(20); else return(18); } } } else { if ((num_even & 0x0000FF00) != 0) { if ((num_even & 0x0000F000) != 0) { if ((num_even & 0x00008000) != 0) return(16); else return(14); } else { if ((num_even & 0x00000800) != 0) return(12); else return(10); } } else { if ((num_even & 0x000000F0) != 0) { if ((num_even & 0x00000080) != 0)return(8); else return(6); } else { if ((num_even & 0x00000008) != 0) return(4); else return(2); } } } } else { if ((num_odds & 0xFFFF0000) != 0) // top 4 { if ((num_odds & 0xFF000000) != 0) { if ((num_odds & 0xF0000000) != 0) { if ((num_odds & 0x40000000) != 0) return(31); else return(29); } else { if ((num_odds & 0x04000000) != 0) return(27); else return(25); } } else { if ((num_odds & 0x00F00000) != 0) { if ((num_odds & 0x00400000) != 0) return(23); else return(21); } else { if ((num_odds & 0x00040000) != 0) return(19); else return(17); } } } else { if ((num_odds & 0x0000FF00) != 0) { if ((num_odds & 0x0000F000) != 0) { if ((num_odds & 0x00004000) != 0) return(15); else return(13); } else { if ((num_odds & 0x00000400) != 0) return(11); else return(9); } } else { if ((num_odds & 0x000000F0) != 0) { if ((num_odds & 0x00000040) != 0)return(7); else return(5); } else { if ((num_odds & 0x00000004) != 0) return(3); else return(1); } } } } }
There's a proposal to add bit manipulation functions to C; specifically, counting leading zeros is helpful for finding the highest set bit. See http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2827.htm#design-bit-leading.trailing.zeroes.ones

They are expected to be implemented as built-ins where possible, so this should be an efficient way to do it. This is similar to what was recently added to C++ (std::countl_zero, etc.).
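If that proposal lands in the form that, to my understanding, became C23's <stdbit.h>, usage might look roughly like the sketch below; treat the header and macro names as assumptions, since compiler and libc support may still be missing:

/* Sketch only: assumes a C23 toolchain that ships <stdbit.h>. */
#include <stdbit.h>
#include <stdio.h>

int main(void)
{
    unsigned int x = 0x00f00000u;
    if (x != 0)
        /* bit_width(x) is one more than the index of the highest set bit */
        printf("MSB index: %u\n", stdc_bit_width(x) - 1u);
    return 0;
}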
The code:

// x >= 1
int func(unsigned x)
{
    double d = x;
    // note: type-punning the double through a pointer like this technically violates
    // strict aliasing; copying the bits out with memcpy would be the safer route
    int p = (int)((*reinterpret_cast<long long*>(&d) >> 52) - 1023);
    printf("The left-most non zero bit of %u is bit %d\n", x, p);
    return p;
}

Or get the integer part of the FPU instruction FYL2X (Y*Log2 X) by setting Y=1.
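A portable variant of the same exponent-extraction idea, using frexp instead of type punning (my sketch, not part of the original answer):

#include <math.h>

/* Sketch: frexp gives x = m * 2^e with 0.5 <= m < 1, so for x >= 1 the MSB index is e - 1. */
static int msb_via_frexp(unsigned x)
{
    int e;
    frexp((double)x, &e);
    return e - 1;   /* returns -1 for x == 0, since frexp(0) sets e to 0 */
}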
My humble method is very simple:

MSB(x) = INT[Log(x) / Log(2)]

Translation: the MSB of x is the integer part of the log of x divided by the log of 2.

This can easily and quickly be adapted to any programming language. Try it on your calculator to see for yourself that it works.
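Directly in C that would be something like the sketch below; beware that floating-point rounding can be off by one near exact powers of two, so the builtin/table approaches elsewhere in this thread are safer when exact results matter:

#include <math.h>

/* Sketch of the formula above; x must be non-zero. */
static int msb_log(unsigned x)
{
    return (int)(log((double)x) / log(2.0));
}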
Here is a fast solution for C that works in GCC and Clang, ready to be copied and pasted.

#include <limits.h>

unsigned int fls(const unsigned int value)
{
    return (unsigned int)1 << ((sizeof(unsigned int) * CHAR_BIT) - __builtin_clz(value) - 1);
}

unsigned long flsl(const unsigned long value)
{
    return (unsigned long)1 << ((sizeof(unsigned long) * CHAR_BIT) - __builtin_clzl(value) - 1);
}

unsigned long long flsll(const unsigned long long value)
{
    return (unsigned long long)1 << ((sizeof(unsigned long long) * CHAR_BIT) - __builtin_clzll(value) - 1);
}

And a little improved version for C++.

#include <climits>

constexpr unsigned int fls(const unsigned int value)
{
    return (unsigned int)1 << ((sizeof(unsigned int) * CHAR_BIT) - __builtin_clz(value) - 1);
}

constexpr unsigned long fls(const unsigned long value)
{
    return (unsigned long)1 << ((sizeof(unsigned long) * CHAR_BIT) - __builtin_clzl(value) - 1);
}

constexpr unsigned long long fls(const unsigned long long value)
{
    return (unsigned long long)1 << ((sizeof(unsigned long long) * CHAR_BIT) - __builtin_clzll(value) - 1);
}

The code assumes that value won't be 0. If you want to allow 0, you need to modify it.
Since I seemingly have nothing else to do, I dedicated an inordinate amount of time to this problem during the weekend. Without direct hardware support, it SEEMED like it should be possible to do better than O(log(w)) for w = 64 bits. And indeed, it is possible to do it in O(log log w), except the performance crossover doesn't happen until w >= 256 bits. Either way, I gave it a go and the best I could come up with was the following mix of techniques:

uint64_t msb64 (uint64_t n)
{
    const uint64_t M1 = 0x1111111111111111;

    // we need to clear blocks of b=4 bits: log(w/b) >= b
    n |= (n>>1);
    n |= (n>>2);

    // reverse prefix scan, compiles to 1 mulx
    uint64_t s = ((M1<<4)*(__uint128_t)(n&M1))>>64;

    // parallel-reduce each block
    s |= (s>>1);
    s |= (s>>2);

    // parallel reduce, 1 imul
    uint64_t c = (s&M1)*(M1<<4);

    // collect last nibble, generate compute count - count%4
    c = c >> (64-4-2); // move last nibble to lowest bits leaving two extra bits
    c &= (0x0F<<2);    // zero the lowest 2 bits

    // add the missing bits; this could be better solved with a bit of foresight
    // by having the sum already stored
    uint8_t b = (n >> c); // & 0x0F; // no need to zero the bits over the msb
    const uint64_t S = 0x3333333322221100; // last should give -1ul

    return c | ((S>>(4*b)) & 0x03);
}

This solution is branchless and doesn't require an external table that can generate cache misses. The two 64-bit multiplications aren't much of a performance issue on modern x86-64 architectures.

I benchmarked the 64-bit versions of some of the most common solutions presented here and elsewhere. Finding a consistent timing and ranking proved to be way harder than I expected. This has to do not only with the distribution of the inputs, but also with out-of-order execution and other CPU shenanigans, which can sometimes overlap the computation of two or more iterations of a loop. I ran the tests on an AMD Zen using RDTSC and taking a number of precautions such as running a warm-up, introducing artificial chain dependencies, and so on.

For a 64-bit pseudorandom even distribution the results are:

name       cycles   comment
clz          5.16   builtin intrinsic, fastest
cast         5.18   cast to double, extract exp
ulog2        7.50   reduction + de Bruijn
msb64*      11.26   this version
unrolled    19.12   varying performance
obvious    110.49   "obviously" slowest for int64

Casting to double is always surprisingly close to the builtin intrinsic. The "obvious" way of adding the bits one at a time has the largest spread in performance of all, being comparable to the fastest methods for small numbers and 20x slower for the largest ones. My method is around 50% slower than de Bruijn, but has the advantage of using no extra memory and having predictable performance. I might try to further optimize it if I ever have time.