I have a requirement to compute k as the smallest power of 2 which is >= an integer value, n (n is always > 0)
currently I am using:
#define log2(x) log(x)/log(2)
#define round(x) (int)(x+0.5)
k = round(pow(2,(ceil(log2(n)))));
this is in a performance critical function
Is there a more computationally efficient way of calculating k?
/* returns greatest power of 2 less than or equal to x, branch-free */
/* Source: Hacker's Delight, First Edition. */
int
flp2(int x)
{
x = x | (x>>1);
x = x | (x>>2);
x = x | (x>>4);
x = x | (x>>8);
x = x | (x>>16);
return x - (x>>1);
}
It's entertaining to study it and see how it works. I think the only way for you to know for sure which of the solutions you see will be optimal for your situation is to use all of them in a test fixture, profile them, and see which is most efficient for your purpose.
Being branch-free, this one is likely to be quite good performance-wise relative to some others, but you should test it directly to be sure.
If you want the least power of two greater than or equal to X, you can use a slightly different solution:
unsigned
clp2(unsigned x)
{
x = x -1;
x = x | (x >> 1);
x = x | (x >> 2);
x = x | (x >> 4);
x = x | (x >> 8);
x = x | (x >> 16);
return x + 1;
}
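As a quick sanity check, the bit-smearing version can be compared against the plain doubling loop. The sketch below assumes 32-bit inputs; the names next_pow2_u32 and next_pow2_loop are made up for this illustration and are not from Hacker's Delight.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: smallest power of two >= x, for 1 <= x <= 2^31. */
static uint32_t next_pow2_u32(uint32_t x)
{
    assert(x >= 1 && x <= UINT32_C(0x80000000)); /* result must fit in 32 bits */
    x -= 1;        /* so that exact powers of two map to themselves */
    x |= x >> 1;   /* copy the highest set bit into the bit below it */
    x |= x >> 2;   /* ... then into the next two bits */
    x |= x >> 4;
    x |= x >> 8;
    x |= x >> 16;  /* now every bit below the highest set bit is 1 */
    return x + 1;  /* all-ones plus one is the next power of two */
}

/* Reference implementation: the simple doubling loop, used to cross-check. */
static uint32_t next_pow2_loop(uint32_t x)
{
    uint32_t k = 1;
    while (k < x)
        k <<= 1;
    return k;
}
```

For example, next_pow2_u32(123) gives 128 and next_pow2_u32(128) also gives 128. Which of the two is faster depends on the input range and the compiler, so it is still worth profiling.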
int calculate_least_covering_power_of_two(int x)
{
int k = 1;
while( k < x ) k = k << 1;
return k;
}
lim = 123;
n = 1;
while( ( n = n << 1 ) <= lim );
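Multiply your number by 2 until it's bigger than lim. Note that this gives the smallest power of 2 strictly greater than lim, so an exact power of 2 such as 128 yields 256.
A left shift by one multiplies the value by 2.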
Yes, You can calculate this by simply taking the number in question, and using bit-shifts to determine the power of 2.
Right-shifting takes all the bits in the number and moves them to the right, dropping the far right (least significant) digit. It is equivalent to performing an integer division by 2. Left-shifting a value moves all the bits to the left, dropping the bits that shift off the left end, and adding zeroes to the right end, effectively multiplying the value by 2.
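So if you count how many times you need to right shift before the number reaches zero, you have calculated one more than the integer portion of the base 2 logarithm. Then use it to create your result by left-shifting the value 1 that many times. Note that this gives the smallest power of 2 strictly greater than val (8 gives 16); if an exact power of 2 should map to itself, run the loop on val - 1 instead.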
int CalculateK(int val)
{
int cnt = 0;
while(val > 0)
{
cnt++;
val = val >> 1;
}
return 1 << cnt;
}
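EDIT: Alternatively, and a bit simpler: you don't have to calculate the count. (As written, this also returns the smallest power of 2 strictly greater than val; use res < val as the loop condition if an exact power of 2 should be returned unchanged.)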
int CalculateK(int val)
{
int res = 1;
while(res <= val) res <<= 1;
return res ;
}
k = 1 << (int)(ceil(log2(n)));
You can take advantage of the fact that binary digits represent powers of two (1 is 1, 10 is 2, 100 is 4, etc). Shifting 1 left by the exponent of 2 gives you the same value, but it's much faster.
Although if you can somehow avoid the ceil(log2(n)) you will see a much larger performance increase.
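One way to avoid ceil(log2(n)) entirely is a count-leading-zeros instruction. The sketch below assumes GCC or Clang, which provide the __builtin_clz builtin; other compilers need a different intrinsic. The function name next_pow2_clz is hypothetical.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: smallest power of two >= n.
   Relies on the GCC/Clang builtin __builtin_clz, whose result is
   undefined for an argument of 0. */
static unsigned next_pow2_clz(unsigned n)
{
    const unsigned bits = sizeof(unsigned) * CHAR_BIT;
    assert(n >= 1 && n <= (1u << (bits - 1))); /* result must be representable */
    if (n == 1)
        return 1;  /* avoids calling __builtin_clz(0) below */
    /* If the highest set bit of n - 1 is at position p, the answer is 2^(p+1),
       and p + 1 equals bits minus the number of leading zeros of n - 1. */
    return 1u << (bits - (unsigned)__builtin_clz(n - 1));
}
```

On most current targets this compiles to a handful of instructions with no loop and no floating point, but as with the other variants it is worth measuring in the real function.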
Source: hackersdelight.org
/* altered to: power of 2 which is greater than an integer value */
unsigned clp2(unsigned x) {
x = x | (x >> 1);
x = x | (x >> 2);
x = x | (x >> 4);
x = x | (x >> 8);
x = x | (x >>16);
return x + 1;
}
Keep in mind you will need to add:
x = x | (x >> 32);
For 64bit numbers.
Related
I need to find the mask value corresponding to a number provided by the user.
For example, if the user provides the input
22 (in binary 10110)
then I need to find the mask value by keeping the highest set bit of the input as 1 and setting the rest to 0.
So in this case it should be:
16 (in binary 10000)
Is there any built-in method in the C language to do so?
you could compute the position of the highest bit
Once you have it, just shift left to get the proper mask value:
unsigned int x = 22;
int result = 0;
if (x != 0)
{
unsigned int y = x;
int bit_pos=-1;
while (y != 0)
{
y >>= 1;
bit_pos++;
}
result = 1<<bit_pos;
}
this sets result to 16
(there's a particular case if entered value is 0)
Basically, you need to floor align to the nearest power of two number. I am not sure there is a standard function for that, but try the following:
static inline uint32_t
floor_align32pow2(uint32_t x)
{
x |= x >> 1;
x |= x >> 2;
x |= x >> 4;
x |= x >> 8;
x |= x >> 16;
return (x >> 1) + (x & 1);
}
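A small variant of the same smearing idea, shown here only as a sketch: once every bit below the highest set bit is 1, XOR-ing the value with itself shifted right by one leaves just the highest bit. The names highest_bit_mask and check_highest_bit_mask are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: keep only the highest set bit of x (0 stays 0). */
static uint32_t highest_bit_mask(uint32_t x)
{
    x |= x >> 1;
    x |= x >> 2;
    x |= x >> 4;
    x |= x >> 8;
    x |= x >> 16;         /* x is now 0...011...1 */
    return x ^ (x >> 1);  /* cancels every bit except the top one */
}

/* The example from the question: 22 (10110) should give 16 (10000). */
static void check_highest_bit_mask(void)
{
    assert(highest_bit_mask(22) == 16);
    assert(highest_bit_mask(0) == 0);
}
```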
This question I have tried to solve it but couldn't get any way. Any pointers would be appreciated.
Regular subtraction way of doing division is not the intention here, ingenious way of using shifting operator to get this done is the intention.
Although an answer has been accepted, I post mine for what it's worth.
UPDATE. This works by multiplying by a recurring binary fraction. In decimal 1/9 = 0.1111111 recurring. In binary, that is 1/1001 = 0.000111000111000111 recurring.
Notice the binary multiplier is in groups of 6 bits, decimal 7 recurring. So what I want to do here, is to multiply the dividend by 7, shift it right 6 bits, and add it to a running quotient. However to keep significance, I do the shift after the addition, and shift the quotient q after the loop ends to align it properly.
There are up to 6 iterations of the calculation loop for a 32 bit int (6 bits * 6 shifts = 36 bits).
#include<stdio.h>
int main(void)
{
unsigned x, y, q, d;
int i, err = 0;
for (x=1; x<100; x++) { // candidates
q = 0; // quotient
y = (x << 3) - x; // y = x * 7
while(y) { // until nothing significant
q += y; // add (effectively) binary 0.000111
y >>= 6; // realign
}
q >>= 6; // align
d = x / 9; // the true answer
if (d != q) {
printf ("%u / 9 = %u (%u)\n", x, q, d); // print any errors (%u: the values are unsigned)
err++;
}
}
printf ("Errors: %d\n", err);
return 0;
}
Unfortunately, this fails for every candidate that is a multiple of 9, for rounding error, due to the same reason that multiplying decimal 27 * 0.111111 = 2.999999 and not 3. So I now complicate the answer by keeping the 4 l.s. bits of the quotient for rounding. The result is it works for all int values limited by the two top nibbles, one for the * 7 and one for the * 16 significance.
#include<stdio.h>
int main(void)
{
unsigned x, y, q, d;
int i, err = 0;
for (x=1; x<0x00FFFFFF; x++) {
q = 8; // quotient with (effectively) 0.5 for rounding
y = (x << 3) - x; // y = x * 7
y <<= 4; // y *= 16 for rounding
while(y) { // until nothing significant
q += y; // add (effectively) binary 0.000111
y >>= 6; // realign
}
q >>= (4 + 6); // the 4 bits significance + recurrence
d = x / 9; // the true answer
if (d != q) {
printf ("%u / 9 = %u (%u)\n", x, q, d); // print any errors (%u: the values are unsigned)
err++;
}
}
printf ("Errors: %d\n", err);
return 0;
}
Here's a solution heavily inspired by Hacker's Delight that really uses only bit shifts:
def divu9(n):
q = n - (n >> 3)
q = q + (q >> 6)
q = q + (q>>12) + (q>>24); q = q >> 3
r = n - (((q << 2) << 1) + q)
return q + ((r + 7) >> 4)
#return q + (r > 8)
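For comparison with the C programs above, here is the same routine written in C with a small self-check. This is a sketch assuming 32-bit unsigned input; check_divu9 is a hypothetical helper added for the illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned division by 9 using only shifts, adds and subtracts. */
static uint32_t divu9(uint32_t n)
{
    uint32_t q = n - (n >> 3);       /* q = n * 0.111 (binary) */
    q += q >> 6;                     /* extend the repeating pattern 0.111000111... */
    q += (q >> 12) + (q >> 24);
    q >>= 3;                         /* q now approximates n / 9 from below */
    uint32_t r = n - ((q << 3) + q); /* r = n - 9 * q */
    return q + ((r + 7) >> 4);       /* add 1 when the remainder reached 9 */
}

/* Hypothetical self-check against the built-in division. */
static void check_divu9(void)
{
    for (uint32_t n = 0; n < 100000; n++)
        assert(divu9(n) == n / 9);
    assert(divu9(UINT32_MAX) == UINT32_MAX / 9);
}
```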
See this answer: https://stackoverflow.com/a/11694778/4907651
Exactly what you're looking for except the divisor is 3.
EDIT: explanation
I will replace the add function with simply + as you're looking for the solution without using * or / only.
In this explanation, we assume we are dividing by 3.
Also, I am assuming you know how to convert decimal to binary and vice versa.
int divideby3 (int num) {
int sum = 0;
while (num > 3) {
sum += (num >> 2);
num = (num >> 2) + (num & 3);
}
if (num == 3)
sum += 1;
return sum;
}
This approach uses bitwise operators:
bitwise AND: &.
bitwise left shift: <<. Shifts binary values left.
bitwise right shift: >>. Shifts binary values right.
bitwise XOR: ^
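The first condition (num > 3) is as such because the divisor is 3. Note that the trick of using >> 2 and & 3 relies on 4 being one more than the divisor (4 = 3 + 1), so it carries over directly only to divisors of the form 2^k - 1 (3, 7, 15, ...). For a divisor of 9, changing the condition to (num > 9) is not enough on its own; the shift and mask steps have to be reworked as well.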
Suppose the number we want to divide is 6.
In binary, 6 is represented as 000110.
Now, we enter while (num > 3) loop. The first statement adds sum (initialised to 0) to num >> 2.
What num >> 2 does:
num in binary initially: 00000000 00000110
after bitwise shift: 00000000 00000001 i.e. 1 in decimal
sum after adding num >> 2 is 1.
Since we know num >> 2 is equal to 1, we add that to num & 3.
num in binary initially: 00000000 00000110
3 in binary: 00000000 00000011
For each bit position in the result of expression a & b, the bit is 1 if both operands contain 1, and 0 otherwise
result of num & 3: 00000000 00000010 i.e. 2 in decimal
num after num = (num >> 2) + (num & 3) equals 1 + 2 = 3
Now, since num is EQUAL to 3, we enter the if (num == 3) branch.
We then add 1 to sum, and return the value. This value of sum is the quotient.
As expected, the value returned is 2.
Hope that wasn't a horrible explanation.
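To make the dependence on the divisor explicit, the walkthrough above can be generalised to any divisor of the form 2^k - 1. This is only a sketch: the function name div_pow2_minus1 and the parameter k are made up for the illustration, and 9 is not of this form, so the sketch does not cover it.

```c
#include <assert.h>

/* Hypothetical generalisation: floor(num / d) for d = 2^k - 1 (3, 7, 15, 31, ...). */
static unsigned div_pow2_minus1(unsigned num, unsigned k)
{
    assert(k >= 2 && k <= 16);
    const unsigned d = (1u << k) - 1;
    unsigned sum = 0;
    while (num > d) {
        /* num = 2^k * hi + lo = d * hi + (hi + lo), so hi whole copies of d fit */
        sum += num >> k;
        num = (num >> k) + (num & d);  /* what is still left to divide: hi + lo */
    }
    if (num == d)                      /* exactly one more copy of d fits */
        sum += 1;
    return sum;
}
```

With k = 2 this is the divideby3 function from the answer; div_pow2_minus1(6, 2) returns 2, matching the worked example.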
Create a loop and at every step subtract 9: N-9, then (N-9)-9, and so on until N<9 or N=0, counting each subtraction as a step. For example, for 36/9: 36-9=27 (count 1), 27-9=18 (count 2), 18-9=9 (count 3), 9-9=0 (count 4).
So 36/9 = 4
The ancient Egyptian multiplication algorithm (http://en.wikipedia.org/wiki/Ancient_Egyptian_multiplication) can do it using only subtraction and binary shifts in log(n) time. However, as far as I know, state-of-the-art hardware already uses either this one or even better algorithms. Therefore, I do not think there is anything you can do (assuming performance is your goal) unless you can somehow avoid the division completely or change your use case so that you can divide by a power of 2, because there are some tricks for these cases.
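As an illustration of division with only shifts and subtraction, here is a sketch of plain binary long division (restoring shift-and-subtract division). It is a related textbook technique rather than a transcription of the algorithm on the linked page, and the name div_shift_subtract is hypothetical. It assumes 32-bit unsigned operands.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: floor(n / d) using shifts, compares and subtraction. */
static uint32_t div_shift_subtract(uint32_t n, uint32_t d)
{
    assert(d != 0);
    uint32_t q = 0;
    uint64_t r = 0;  /* 64 bits so that r << 1 cannot overflow for large d */
    for (int i = 31; i >= 0; i--) {
        r = (r << 1) | ((n >> i) & 1u);  /* bring down the next bit of n */
        if (r >= d) {                    /* the divisor fits at this position */
            r -= d;
            q |= UINT32_C(1) << i;       /* so this quotient bit is 1 */
        }
    }
    return q;
}
```

The loop always runs 32 times, once per bit of the dividend, which is where the logarithmic step count comes from.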
If you're not allowed to multiply/divide, you're left with addition/subtraction. Dividing by a number tells you how many times the divisor fits into the dividend, so you can count how many times you can subtract the divisor from the original value:
int dividend = 85;
int divisor = 9;
int remaining = dividend;
int result = 0;
while (remaining >= divisor)
{
remaining -= divisor;
result++;
}
std::cout << dividend << " / " << divisor << " = " << result;
If you need to divide a positive number, you can use the following function:
unsigned int divideBy9(unsigned int num)
{
unsigned int result = 0;
while (num >= 9)
{
result += 1;
num -= 9;
}
return result;
}
In the case of a negative number, you can use a similar approach.
Hope this helps!
I'm working on a function that returns 1 when x can be represented as an n-bit, 2’s complement number and 0 if it can't. Right now my code works for some examples like (5, 3), (-4, 3). But I can't get it to work for instances where n is bigger than x like (2, 6). Any suggestions as to why?
I do have restrictions, though. The following are not allowed: casting (either explicit or implicit), relative comparison operators (<, >, <=, and >=), division, modulus, multiplication, subtraction, conditionals (if or ? :), loops, switch statements, function calls, and macro invocations. Assume 1 < n < 32.
int problem2(int x, int n){
int temp = x;
uint32_t mask;
int maskco;
mask = 0xFFFFFFFF << n;
maskco = (mask | temp);
return (maskco) == x;
}
In your function, temp is just redundant, and maskco always has the top bit(s) set, so it won't work if x is a positive number where the top bit isn't set.
The simple solution is to mask out the most significant bits of the absolute value, leaving only the low n bits and check if it's still equal to the original value. The absolute value can be calculated using this method
int fit_in_n_bits(int x, int n)
{
int maskabs = x >> (sizeof(int) * CHAR_BIT - 1);
int xabs = (x + maskabs) ^ maskabs; // xabs = |x|
int nm = ~n + 1U; // nm = -n
int mask = 0xFFFFFFFFU >> (32 + nm);
return (xabs & mask) == xabs;
}
Another way:
int fit_in_n_bits2(int x, int n)
{
int nm = ~n + 1U;
int shift = 32U + nm;
int masksign = x >> (shift + 1);
int maskzero = 0xFFFFFFFFU >> shift;
return ((x & maskzero) | masksign) == x;
}
You can also check out oon's way here
int check_bits_fit_in_2s_complement(signed int x, unsigned int n) {
int mask = x >> 31;
return !(((~x & mask) + (x & ~mask))>> (n + ~0));
}
One more way
/*
* fitsBits - return 1 if x can be represented as an
* n-bit, two's complement integer.
* 1 <= n <= 32
* Examples: fitsBits(5,3) = 0, fitsBits(-4,3) = 1
* Legal ops: ! ~ & ^ | + << >>
* Max ops: 15
* Rating: 2
*/
int fitsBits(int x, int n) {
int r, c;
c = 33 + ~n;
r = !(((x << c)>>c)^x);
return r;
}
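Whichever variant you pick, it helps to run it against the boundary cases. The sketch below is one possible check, assuming a 32-bit int and an arithmetic (sign-propagating) right shift for negative values, which the C standard leaves implementation-defined. The name fits_bits_sketch is hypothetical, and the assert on n is only there for the illustration (it uses comparisons that the puzzle itself forbids).

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: 1 if x fits in an n-bit two's complement number, else 0.
   Assumes 32-bit int and arithmetic right shift of negative values. */
static int fits_bits_sketch(int x, int n)
{
    assert(n >= 1 && n <= 32);     /* precondition check, for illustration only */
    int sign = x >> 31;            /* 0 if x >= 0, -1 (all ones) if x < 0 */
    int folded = x ^ sign;         /* x if x >= 0, ~x (that is, -x - 1) if x < 0 */
    return !(folded >> (n + ~0));  /* n + ~0 is n - 1; any bit at or above n-1 means no fit */
}
```

Folding negative values with ~x rather than -x is what makes the asymmetric range work: for n = 3 the valid range is -4 to 3, and ~(-4) is 3.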
Related:
How to tell if a 32 bit int can fit in a 16 bit short
counting the number of bit required to represent an integer in 2's complement
int problem2_mj(int x, int n){
unsigned int r;
int const mask = (-x) >> sizeof(int) * CHAR_BIT - 1;
r = (-x + mask - (1 & mask)) ^ mask; // Converts +n -> n, -n -> (n-1)
return !(((1 << (n-1)) - r) >> sizeof(int) * CHAR_BIT - 1);
}
Find the absolute value and subtract 1 if the number was negative
Check if number is less than or equal to 2^(n-1)
Check a working demo here
As per your updated request here is the code how to add two numbers:
int AddNums(int x, int y)
{
int carry;
// Iteration 1
carry = x & y;
x = x ^ y;
y = carry << 1;
// Iteration 2
carry = x & y;
x = x ^ y;
y = carry << 1;
...
// Iteration 31 (I am assuming the size of int is 32 bits)
carry = x & y;
x = x ^ y;
y = carry << 1;
return x;
}
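For reference, the unrolled iterations are the fixed-length form of the following carry-propagation loop. This sketch uses a loop (which the original restrictions rule out) and unsigned 32-bit values so that wrap-around is well defined; the names add_no_plus and check_add_no_plus are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: x + y (mod 2^32) using only &, ^ and <<. */
static uint32_t add_no_plus(uint32_t x, uint32_t y)
{
    while (y != 0) {
        uint32_t carry = x & y;  /* positions where both bits are 1 produce a carry */
        x ^= y;                  /* sum of the bits, ignoring carries */
        y = carry << 1;          /* carries move one position to the left */
    }
    return x;
}

/* A carry can travel at most 32 positions, so the loop always terminates. */
static void check_add_no_plus(void)
{
    assert(add_no_plus(3, 5) == 8);
    assert(add_no_plus(UINT32_MAX, 1) == 0);  /* wraps around */
}
```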
Getting the modulus of a number can be easily done without the modulus operator or divisions, if your operand is a power of 2. In that case, the following formula holds: x % y = (x & (y − 1)). This is often more performant on many architectures. Can the same be done for mod 31?
int mod31(int a){ return a % 31; };
Here are two ways to approach this problem. The first one using a common bit-twiddling technique, and if carefully optimized can beat hardware division. The other one substitutes a multiply for the divide, similar to the optimization performed by gcc, and is far and away the fastest. The bottom line is that there's not much point trying to avoid the % operator if the second argument is constant, because gcc's got it covered. (And probably other compilers, too.)
The following function is based on the fact that x is the same (mod 31) as the sum of the base-32 digits of x. That's true because 32 is 1 mod 31, and consequently any power of 32 is 1 mod 31. So each "digit" position in a base-32 number contributes the digit * 1 to the mod 31 sum. And it's easy to get the base-32 representation: we just take the bits five at a time.
(Like the rest of the functions in this answer, it will only work for non-negative x).
unsigned mod31(unsigned x) {
unsigned tmp;
for (tmp = 0; x; x >>= 5) {
tmp += x & 31;
}
// Here we assume that there are at most 160 bits in x
tmp = (tmp >> 5) + (tmp & 31);
return tmp >= 31 ? tmp - 31 : tmp;
}
For a specific integer size, you could unroll the loop and quite possibly beat division. (And see @chux's answer for a way to convert the loop into O(log bits) operations instead of O(bits).) It's more difficult to beat gcc, which avoids division when the divisor is a constant known at compile-time.
In a very quick benchmark using unsigned 32 bit integers, the naive unrolled loop took 19 seconds and a version based on @chux's answer took only 13 seconds, but gcc's x%31 took 9.7 seconds. Forcing gcc to use a hardware divide (by making the division non-constant) took 23.4 seconds, and the code as shown above took 25.6 seconds. Those figures should be taken with several grains of salt. The times are for computing i%31 for all possible values of i, on my laptop using -O3 -march=native.
gcc avoids 32-bit division by a constant by replacing it with what is essentially a 64-bit multiplication by the inverse of the constant followed by a right shift. (The actual algorithm does a bit more work to avoid overflows.) The procedure was implemented more than 20 years ago in gcc v2.6, and the paper which describes the algorithm is available on the gmp site. (GMP also uses this trick.)
Here's a simplified version: Say we want to compute n // 31 for some unsigned 32-bit integer n (using the pythonic // to indicate truncated integer division). We use the "magic constant" m = 2^32 // 31, which is 138547332. Now it's clear that for any n:
m * n <= 2^32 * n/31 < m * n + n
⇒ m * n // 2^32 <= n//31 <= (m * n + n) // 2^32
(Here we make use of the fact that if a < b then floor(a) <= floor(b).)
Furthermore, since n < 2^32, m * n // 2^32 and (m * n + n) // 2^32 are either the same integer or two consecutive integers. Consequently, one (or both) of those two is the actual value of n//31.
Now, we really want to compute n%31. So we need to multiply the (presumed) quotient by 31, and subtract that from n. If we use the smaller of the two possible quotients, it may turn out that the computed modulo value is too big, but it can only be too big by 31.
Or, to put it in code:
static unsigned long long magic = 138547332;
unsigned mod31g(unsigned x) {
unsigned q = (x * magic) >> 32;
// To multiply by 31, we multiply by 32 and subtract
unsigned mod = x - ((q << 5) - q);
return mod < 31 ? mod : mod - 31;
}
The actual algorithm used by gcc avoids the test at the end by using a slightly more accurate computation based on multiplying by 2^37//31 + 1. That always produces the correct quotient, but at the cost of some extra shifts and adds to avoid integer overflow. As it turns out, the version above is slightly faster -- in the same benchmark as above, it took only 6.3 seconds.
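The simplified multiply-and-correct scheme is easy to spot-check against the % operator. The sketch below uses fixed-width types and samples the 32-bit range rather than testing every value; the names mod31_magic and check_mod31_magic are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: x % 31 via multiplication by floor(2^32 / 31). */
static uint32_t mod31_magic(uint32_t x)
{
    const uint64_t magic = (UINT64_C(1) << 32) / 31;  /* 138547332 */
    uint32_t q = (uint32_t)((x * magic) >> 32);       /* x / 31, or one less */
    uint32_t mod = x - ((q << 5) - q);                /* x - 31 * q, in 0..61 */
    return mod < 31 ? mod : mod - 31;                 /* correct a too-small quotient */
}

/* Sampled comparison with the built-in remainder, including both ends of the range. */
static void check_mod31_magic(void)
{
    for (uint64_t n = 0; n <= UINT32_MAX; n += 9973)
        assert(mod31_magic((uint32_t)n) == (uint32_t)n % 31);
    assert(mod31_magic(UINT32_MAX) == UINT32_MAX % 31);
}
```

The product x * magic is computed in 64 bits because magic is a 64-bit value, which is why the shift by 32 is well defined.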
Other benchmarked functions, for completeness:
Naive unrolled loop
unsigned mod31b(unsigned x) {
unsigned tmp = x & 31; x >>= 5;
tmp += x & 31; x >>= 5;
tmp += x & 31; x >>= 5;
tmp += x & 31; x >>= 5;
tmp += x & 31; x >>= 5;
tmp += x & 31; x >>= 5;
tmp += x & 31;
tmp = (tmp >> 5) + (tmp & 31);
return tmp >= 31 ? tmp - 31 : tmp;
}
@chux's improvement, slightly optimized
static const unsigned mask1 = (31U << 0) | (31U << 10) | (31U << 20) | (31U << 30);
static const unsigned mask2 = (31U << 5) | (31U << 15) | (31U << 25);
unsigned mod31c(unsigned x) {
x = (x & mask1) + ((x & mask2) >> 5);
x += x >> 20;
x += x >> 10;
x = (x & 31) + ((x >> 5) & 31);
return x >= 31 ? x - 31: x;
}
[Edit2] below for performance notes
An attempt with only 1 if condition.
This approach is O(log2(sizeof unsigned)). Should the code use uint64_t, run time would increase by one set of ands/shifts/adds, rather than doubling as it would with a loop approach.
unsigned mod31(uint32_t x) {
#define m31 (31lu)
#define m3131 ((m31 << 5) | m31)
#define m31313131 ((m3131 << 10) | m3131)
static const uint32_t mask1 = (m31 << 0) | (m31 << 10) | (m31 << 20) | (m31 << 30);
static const uint32_t mask2 = (m31 << 5) | (m31 << 15) | (m31 << 25);
uint32_t a = x & mask1;
uint32_t b = x & mask2;
x = a + (b >> 5);
// x = xx 0000x xxxxx 0000x xxxxx 0000x xxxxx
a = x & m31313131;
b = x & (m31313131 << 20);
x = a + (b >> 20);
// x = 00 00000 00000 000xx xxxxx 000xx xxxxx
a = x & m3131;
b = x & (m3131 << 10);
x = a + (b >> 10);
// x = 00 00000 00000 00000 00000 00xxx xxxxx
a = x & m31;
b = x & (m31 << 5);
x = a + (b >> 5);
// x = 00 00000 00000 00000 00000 0000x xxxxx
return x >= 31 ? x-31 : x;
}
[Edit]
The first addition method sums the individual 7 groups of five bit in parallel. Subsequent additions bring the 7 group into 4, then 2, then 1. This final 7-bit sum then proceeds to add its upper half (2-bits) to its lower half(5-bits). Code then uses one test to perform the final "mod".
This method scales for wider unsigned types, up to at least uint165_t: log2(31+1)*(31+2). Past that, a little more code is needed.
See @rici for some good optimizations. I still recommend using uint32_t rather than unsigned, and 31UL in shifts like 31U << 15, as an unsigned 31U may be only 16 bits long. (16-bit int was popular in the embedded world in 2014.)
[Edit2]
Besides letting the compiler use its optimizer, 2 additional techniques sped performance. These are more minor parlor tricks that yielded a modest improvement. Keep in mind YMMV and this is for a 32-bit unsigned.
Using a table look-up for the last modulo improved 10-20%. Using unsigned t table rather than unsigned char t helped a bit too. It turned out that table length, as first expected needed to be 2*31, only needed 31+5.
Using a local variable rather than always calling the function parameter surprisingly helped. Likely a weakness in my gcc compiler.
Found non-branching solutions, not shown, to replace x >= 31 ? x-31 : x. but their coding complexity was greater and performance was slower.
All-in-all, a fun exercise.
unsigned mod31quik(unsigned xx) {
#define mask (31u | (31u << 10) | (31u << 20) | (31u << 30))
unsigned x = (xx & mask) + ((xx >> 5) & mask);
x += x >> 20;
x += x >> 10;
x = (x & 31u) + ((x >> 5) & 31u);
static const unsigned char t[31 * 2 /* 36 */] = { 0, 1, 2, 3, 4, 5, 6,
7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24,
25, 26, 27, 28, 29, 30, 0, 1, 2, 3, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 };
return t[x];
}
int mod31(int a){
while(a >= 31) {
a -= 31;
}
return a;
};
It works if a > 0, but I doubt it will be faster than % operator.
If you want to get the modulus of dividing by a denominator d such that d = (1 << e) - 1 where e is some exponent, you can use the fact that the binary expansion of 1/d is a repeating fraction with bits set every e digits. For example, for e = 5, d = 31, and 1/d = 0.0000100001....
Similar to rici’s answer, this algorithm effectively computes the sum of the base-(1 << e) digits of a:
uint16_t mod31(uint16_t a) {
uint16_t b;
for (b = a; a > 31; a = b)
for (b = 0; a != 0; a >>= 5)
b += a & 31;
return b == 31 ? 0 : b;
}
You can unroll this loop, because the denominator and the number of bits in the numerator are both constant, but it’s probably better to let the compiler do that. And of course you can change 5 to an input parameter and 31 to a variable computed from that.
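A sketch of that parameterised form, assuming 32-bit unsigned input; the function name mod_pow2_minus1 and the parameter e are chosen for the illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: a % d for d = (1 << e) - 1, by summing base-2^e digits. */
static uint32_t mod_pow2_minus1(uint32_t a, unsigned e)
{
    assert(e >= 2 && e <= 31);
    const uint32_t d = (UINT32_C(1) << e) - 1;
    while (a > d) {
        uint32_t sum = 0;
        for (; a != 0; a >>= e)
            sum += a & d;   /* each base-2^e digit counts once, since 2^e = 1 (mod d) */
        a = sum;            /* the digit sum is smaller than a and congruent to it */
    }
    return a == d ? 0 : a;  /* d itself is congruent to 0 */
}
```

With e = 5 this computes the same result as the mod31 function above, extended to 32-bit inputs; mod_pow2_minus1(62, 5) returns 0.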
You could use successive addition / subtraction. There is no other trick: since 31 is a prime number, to see what the modulus of a number N is mod 31 you will have to divide and find the remainder.
int mode(int number, int modulus) {
int result = number;
if (number >= 0) {
while (result >= modulus) { result = result - modulus; }
} else {
while (result < 0) { result = result + modulus; }
}
return result;
}
unsigned long ccNextPOT(unsigned long x){
x = x - 1;
x = x | (x >> 1);
x = x | (x >> 2);
x = x | (x >> 4);
x = x | (x >> 8);
x = x | (x >>16);
return x + 1;
}
The OR and SHIFT statements fill all bits of x to the right of the most significant set bit with ones (up to 32 bits). Together with the pre-decrement and post-increment statements, this computes (as the function name suggests) the next power of two equal to or greater than the given number (if x is greater than 0 and less than 2^32).