Let's say I've been given two integers a, b, where a is a positive integer and smaller than b. I have to find an efficient algorithm that gives me the sum of the number of base-2 digits (number of bits) over the interval [a, b]. For example, in the interval [0, 4] the sum of digits is equal to 9, because 0 = 1 digit, 1 = 1 digit, 2 = 2 digits, 3 = 2 digits and 4 = 3 digits.
My program is capable of calculating this number by using a loop, but I'm looking for something more efficient for large numbers. Here are snippets of my code, just to give you an idea:
int numberOfBits(int i) {
    if (i == 0) {
        return 1;
    }
    else {
        return (int) log2(i) + 1;
    }
}
The function above is for calculating the number of digits of one number in the interval.
The code below shows you how I use it in my main function.
for (i = a; i <= b; i++) {
    l = l + numberOfBits(i);
}
printf("Digits: %d\n", l);
Ideally I should be able to get the number of digits by using the two values of my interval and using some special algorithm to do that.
Try this code; I think it gives you what you need to count the binary digits:
int bit(int x)
{
    if (!x) return 1;
    else
    {
        int i;
        for (i = 0; x; i++, x >>= 1);
        return i;
    }
}
The main thing to understand here is that the number of digits used to represent a number in binary increases by one with each power of two:
+--------------+---------------+
| number range | binary digits |
+==============+===============+
| 0 - 1 | 1 |
+--------------+---------------+
| 2 - 3 | 2 |
+--------------+---------------+
| 4 - 7 | 3 |
+--------------+---------------+
| 8 - 15 | 4 |
+--------------+---------------+
| 16 - 31 | 5 |
+--------------+---------------+
| 32 - 63 | 6 |
+--------------+---------------+
| ... | ... |
A trivial improvement over your brute-force algorithm is then to figure out how many times this number of digits increases between the two numbers passed in (given by the base-two logarithm), and to add up the digits by multiplying, for each digit count, the quantity of numbers representable with that many digits (given by the corresponding power of two) by the digit count itself.
A naive implementation of this algorithm is:
int digits_sum_seq(int a, int b)
{
    int sum = 0;
    int i = 0;
    int log2b = b <= 0 ? 1 : floor(log2(b));
    int log2a = a <= 0 ? 1 : floor(log2(a)) + 1;

    sum += (pow(2, log2a) - a) * (log2a);
    for (i = log2b; i > log2a; i--)
        sum += pow(2, i - 1) * i;
    sum += (b - pow(2, log2b) + 1) * (log2b + 1);
    return sum;
}
It can then be improved by the more efficient versions of the log and pow functions seen in the other answers.
First, we can improve the speed of log2, but that only gives us a fixed factor speed-up and doesn't change the scaling.
Faster log2 adapted from: https://graphics.stanford.edu/~seander/bithacks.html#IntegerLogLookup
The lookup table method takes only about 7 operations to find the log
of a 32-bit value. If extended for 64-bit quantities, it would take
roughly 9 operations. Another operation can be trimmed off by using
four tables, with the possible additions incorporated into each. Using
int table elements may be faster, depending on your architecture.
Second, we must re-think the algorithm. If you know that numbers between N and M have the same number of digits, would you add them up one by one or would you rather do (M-N+1)*numDigits?
But what do we do if our range spans several digit counts? Let's just find the intervals of same digit count, and add up the sums of those intervals. Implemented below. I think that my findEndLimit could be further optimized with a lookup table.
Code
#include <stdio.h>
#include <limits.h>
#include <time.h>

unsigned int fastLog2(unsigned int v)
{
    static const char LogTable256[256] =
    {
#define LT(n) n, n, n, n, n, n, n, n, n, n, n, n, n, n, n, n
        -1, 0, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3,
        LT(4), LT(5), LT(5), LT(6), LT(6), LT(6), LT(6),
        LT(7), LT(7), LT(7), LT(7), LT(7), LT(7), LT(7), LT(7)
    };

    register unsigned int t, tt; // temporaries
    if (tt = v >> 16)
    {
        return (t = tt >> 8) ? 24 + LogTable256[t] : 16 + LogTable256[tt];
    }
    else
    {
        return (t = v >> 8) ? 8 + LogTable256[t] : LogTable256[v];
    }
}

unsigned int numberOfBits(unsigned int i)
{
    if (i == 0) {
        return 1;
    }
    else {
        return fastLog2(i) + 1;
    }
}

unsigned int findEndLimit(unsigned int sx, unsigned int ex)
{
    unsigned int sy = numberOfBits(sx);
    unsigned int ey = numberOfBits(ex);
    unsigned int mx;
    unsigned int my;

    if (sy == ey) // this also means sx == ex
        return ex;

    // assumes sy < ey
    mx = (ex - sx) / 2 + sx; // will eq. sx for sx + 1 == ex
    my = numberOfBits(mx);
    while (ex - sx != 1) {
        mx = (ex - sx) / 2 + sx; // will eq. sx for sx + 1 == ex
        my = numberOfBits(mx);
        if (my == ey) {
            ex = mx;
            ey = numberOfBits(ex);
        }
        else {
            sx = mx;
            sy = numberOfBits(sx);
        }
    }
    return sx + 1;
}

int main(void)
{
    unsigned int a, b, m;
    unsigned long l;
    clock_t start, end;

    l = 0;
    a = 0;
    b = UINT_MAX;

    start = clock();
    unsigned int i;
    for (i = a; i < b; ++i) {
        l += numberOfBits(i);
    }
    if (i == b) {
        l += numberOfBits(i);
    }
    end = clock();
    printf("Naive\n");
    printf("Digits: %lu; Time: %fs\n", l, ((double)(end-start))/CLOCKS_PER_SEC);

    l = 0;
    start = clock();
    do {
        m = findEndLimit(a, b);
        l += (b - m + 1) * (unsigned long)numberOfBits(b);
        b = m - 1;
    } while (b > a);
    l += (b - a + 1) * (unsigned long)numberOfBits(b);
    end = clock();
    printf("Binary search\n");
    printf("Digits: %lu; Time: %fs\n", l, ((double)(end-start))/CLOCKS_PER_SEC);
}
Output
From 0 to UINT_MAX
$ ./main
Naive
Digits: 133143986178; Time: 25.722492s
Binary search
Digits: 133143986178; Time: 0.000025s
My findEndLimit can take a long time in some edge cases:
From UINT_MAX/16+1 to UINT_MAX/8
$ ./main
Naive
Digits: 7784628224; Time: 1.651067s
Binary search
Digits: 7784628224; Time: 4.921520s
Conceptually, you would need to split the task into two subproblems -
1) find the sum of digits from 0..M, and from 0..N, then subtract.
2) find floor(log2(x)), because e.g. for the number 77 the numbers 64, 65, ..., 77 all have 7 digits, the 32 numbers below them (32..63) have 6 digits, the 16 below those have 5 digits and so on, which makes a geometric progression.
Thus:
int digits(int a) {
    if (a == 0) return 1;         // should digits(0) be 0 or 1 ?
    int b = (int)floor(log2(a));  // use any all-integer calculation hack
    int sum = 1 + (b+1) * (a - (1<<b) + 1); // added 1, due to digits(0)==1
    while (b--)                   // down to b == 0, so the number 1 is counted too
        sum += (b + 1) << b;      // shortcut for (b + 1) * (1 << b)
    return sum;
}

int digits_range(int a, int b) {
    if (a <= 0 || b <= 0) return -1; // formulas work for strictly positive numbers
    return digits(b) - digits(a-1);
}
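A quick sanity check against the example from the question (assuming the two functions above plus <stdio.h> and <math.h>):

#include <stdio.h>

int main(void)
{
    printf("%d\n", digits(4));          /* 9: sum of digits over [0, 4] */
    printf("%d\n", digits_range(1, 4)); /* 8: sum of digits over [1, 4] */
    return 0;
}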
As efficiency depends on the tools available, one approach would be doing it "analog":
#include <stdlib.h>
#include <stdio.h>
#include <math.h>

unsigned long long pow2sum_min(unsigned long long n, unsigned long long m)
{
    if (m >= n)
    {
        return 1;
    }
    --n;
    return (2ULL << n) + pow2sum_min(n, m);
}

#define LN(x) (log2(x)/log2(M_E))

int main(int argc, char** argv)
{
    if (2 >= argc)
    {
        fprintf(stderr, "%s a b\n", argv[0]);
        exit(EXIT_FAILURE);
    }
    long a = atol(argv[1]), b = atol(argv[2]);
    if (0L >= a || 0L >= b || b < a)
    {
        puts("Na ...!");
        exit(EXIT_FAILURE);
    }

    /* Expand interval to cover full dimensions: */
    unsigned long long a_c = pow(2, floor(log2(a)));
    unsigned long long b_c = pow(2, floor(log2(b+1)) + 1);
    double log2_a_c = log2(a_c);
    double log2_b_c = log2(b_c);
    unsigned long p2s = pow2sum_min(log2_b_c, log2_a_c) - 1;

    /* Integral of log2(x) between a_c and b_c: */
    double A = ((b_c * (LN(b_c) - 1))
              - (a_c * (LN(a_c) - 1)))/LN(2)
              + (b+1 - a);
    /* "Integer"-integral - integral of log2(x)'s inverse function (2**x) between log(a_c) and log(b_c): */
    double D = p2s - (b_c - a_c)/LN(2);
    /* Corrective from a_c/b_c to a/b : */
    double C = (log2_b_c - 1)*(b_c - (b+1)) + log2_a_c*(a - a_c);

    printf("Total used digits: %lld\n", (long long) ((A - D - C) + .5));
}
:-)
The main thing here is the number and kind of iterations done. The number of iterations is

log2(b_c) - log2(a_c)

each of them doing one

n - 1      /* integer decrement */
2**n + s   /* one bit-shift and one integer addition */
Here's an entirely look-up based approach. You don't even need the log2 :)
Algorithm
First we precompute interval limits where the number of bits would change and create a lookup table. In other words we create an array limits[n] (for n-bit integers), where limits[i] gives us the biggest integer that can be represented with (i+1) bits. Our array is then {1, 3, 7, ..., 2^n - 1}.
Then, when we want to determine the sum of bits for our range, we must first match our range limits a and b with the smallest index for which a <= limits[i] and b <= limits[j] holds, which will then tell us that we need (i+1) bits to represent a, and (j+1) bits to represent b.
If the indexes are the same, then the result is simply (b-a+1)*(i+1), otherwise we must separately get the number of bits from our value to the edge of same number of bits interval, and add up total number of bits for each interval between as well. In any case, simple arithmetic.
Code
#include <stdio.h>
#include <limits.h>
#include <time.h>

unsigned long bitsnumsum(unsigned int a, unsigned int b)
{
    // generate lookup table
    // limits[i] is the max. number we can represent with (i+1) bits
    static const unsigned int limits[32] =
    {
#define LTN(n) n*2u-1, n*4u-1, n*8u-1, n*16u-1, n*32u-1, n*64u-1, n*128u-1, n*256u-1
        LTN(1),
        LTN(256),
        LTN(256*256),
        LTN(256*256*256)
    };

    // make it work for any order of arguments
    if (b < a) {
        unsigned int c = a;
        a = b;
        b = c;
    }

    // find interval of a
    unsigned int i = 0;
    while (a > limits[i]) {
        ++i;
    }
    // find interval of b
    unsigned int j = i;
    while (b > limits[j]) {
        ++j;
    }

    // add it all up
    unsigned long sum = 0;
    if (i == j) {
        // a and b in the same range
        // conveniently, this also deals with j == 0
        // so no danger to do [j-1] below
        return (i+1) * (unsigned long)(b - a + 1);
    }
    else {
        // add sum of digits in range [a, limits[i]]
        sum += (i+1) * (unsigned long)(limits[i] - a + 1);
        // add sum of digits in range [limits[j-1]+1, b]
        sum += (j+1) * (unsigned long)(b - limits[j-1]);
        // add sum of digits for the complete intervals in between
        for (++i; i < j; ++i) {
            sum += (i+1) * (unsigned long)(limits[i] - limits[i-1]);
        }
        return sum;
    }
}

int main(void)
{
    clock_t start, end;
    unsigned int a = 0, b = UINT_MAX;

    start = clock();
    printf("Sum of binary digits for numbers in range "
           "[%u, %u]: %lu\n", a, b, bitsnumsum(a, b));
    end = clock();
    printf("Time: %fs\n", ((double)(end-start))/CLOCKS_PER_SEC);
}
Output
$ ./lookup
Sum of binary digits for numbers in range [0, 4294967295]: 133143986178
Time: 0.000282s
Algorithm
The main idea is to find n2 = log2(x) rounded down; then x has n2 + 1 digits. Let pow2 = 1 << n2. (n2 + 1) * (x - pow2 + 1) is the sum of digits over the values [pow2...x]. Now add the sum of digits contributed by each power-of-two range below pow2.
Code
I am certain various simplifications can be made.
Untested code. Will review later.
#include <assert.h>

// Let us use unsigned for everything.
unsigned ulog2(unsigned value) {
    unsigned result = 0;
    if (0xFFFF0000u & value) {
        value >>= 16; result += 16;
    }
    if (0xFF00u & value) {
        value >>= 8; result += 8;
    }
    if (0xF0u & value) {
        value >>= 4; result += 4;
    }
    if (0xCu & value) {
        value >>= 2; result += 2;
    }
    if (0x2 & value) {
        value >>= 1; result += 1;
    }
    return result;
}
unsigned bit_count_helper(unsigned x) {
    if (x == 0) {
        return 1; // just the one digit of 0
    }
    unsigned n2 = ulog2(x);
    unsigned pow2 = 1u << n2;
    unsigned sum = (n2 + 1) * (x - pow2 + 1); // digits of the values pow2 to x
    while (n2 > 0) {
        // ... + 5*16 + 4*8 + 3*4 + 2*2 + 1*1
        pow2 /= 2;
        sum += n2 * pow2;
        n2--;
    }
    return sum + 1; // + 1 for the digit of 0
}

unsigned bit_count(unsigned a, unsigned b) {
    assert(a <= b);
    // sum of digits over the inclusive range [a, b]
    return bit_count_helper(b) - (a == 0 ? 0 : bit_count_helper(a - 1));
}
For this problem your solution is the simplest, the one called "naive", where you look at every element in the sequence (in your case, the interval) to check something or execute operations.
Naive Algorithm
Assuming that a and b are positive integers with b greater than a, let's call n = (b - a) the dimension/size of the interval [a, b].
With our n elements, and using big-O notation, the worst-case cost is O(n * numberOfBits_cost).
From this we can see that we can speed up the algorithm either by using a faster numberOfBits() or by finding a way not to look at every one of the n elements of the interval.
Intuition
Now looking at a possible interval [6, 14], you can see that 6 and 7 need 3 digits, while 8, 9, 10, 11, 12, 13 and 14 need 4. This results in calling numberOfBits() for every number that uses the same number of digits to be represented, while one multiplication per sub-interval would be faster:

(numbers_in_subinterval) * digitsForThisInterval
((14-8)+1)*4 = 28
((7-6)+1)*3 = 6

So we reduced looping over 9 elements, with 9 operations, to just 2 operations.
So writing a function that uses this intuition gives us an algorithm that is more efficient in time (not necessarily in memory). Using your numberOfBits() function, I have created this solution:
int intuitionSol(int a, int b) {
    int digitsForA = numberOfBits(a);
    int digitsForB = numberOfBits(b);

    if (digitsForA != digitsForB) {
        // because a or b may not be the first or last element of the
        // sub-interval that a specific number of digits can represent,
        // some correction operations on a and b are needed first
        int tmp = pow(2, digitsForA) - a;
        int result = tmp * digitsForA; // will contain the final result
        int i;
        for (i = digitsForA + 1; i < digitsForB; i++) {
            int interval_elements = pow(2, i) - pow(2, i - 1);
            result = result + (interval_elements * i);
            //printf("NumOfElem: %i for %i digits; sum:= %i\n", interval_elements, i, result);
        }
        int tmp1 = (b + 1) - pow(2, digitsForB - 1);
        result = result + tmp1 * digitsForB;
        return result;
    }
    else {
        int elements = (b - a) + 1;
        return elements * digitsForA; // or digitsForB
    }
}
Let's look at the cost: this algorithm's cost is that of the correction operations on a and b plus the for-loop, the most expensive part. In my solution I'm not looping over all elements, but only over numberOfBits(b) - numberOfBits(a) values; in the worst case, [0, n], that becomes log(n) - 1, which is O(log n).
To summarize, we went from a linear cost O(n) to a logarithmic one O(log n) in the worst case.
Note
When I talk about interval or sub-interval I refer to the interval of elements that use the same number of digits to represent the number in binary.
Following are some outputs from my tests, with the last one showing the difference:
Considered interval is [0,4]
YourSol: 9 in time: 0.000015s
IntuitionSol: 9 in time: 0.000007s
Considered interval is [0,0]
YourSol: 1 in time: 0.000005s
IntuitionSol: 1 in time: 0.000005s
Considered interval is [4,7]
YourSol: 12 in time: 0.000016s
IntuitionSol: 12 in time: 0.000005s
Considered interval is [2,123456]
YourSol: 1967697 in time: 0.005010s
IntuitionSol: 1967697 in time: 0.000015s
Here is the code:
long long mul(long long x)
{
    uint64_t M[64] = INIT;
    uint64_t result = 0;

    for ( int i = 0; i < 64; i++ )
    {
        uint64_t a = x & M[i];
        uint64_t b = 0;
        while ( a ) {
            b ^= a & 1;
            a >>= 1;
        }
        result |= b << (63 - i);
    }
    return result;
}
This code implements multiplication of a matrix and a vector over GF(2): it returns the product of the 64x64 matrix M and the 1x64 vector x.
I want to know what linear algebraic operation (over GF(2)) this code performs:
long long unknown(long long x)
{
    uint64_t A[] = INIT;
    uint64_t a = 0, b = 0;
    int i, j;

    for( i = 1; i <= 64; i++ ){
        for( j = i; j <= 64; j++ ){
            if( ((x >> (64-i)) & 1) && ((x >> (64-j)) & 1) )
                a ^= A[b];
            b++;
        }
    }
    return a;
}
I want to know what linear algebraic operation( on GF(2) ) this code is:
Of course you mean GF(2)^64, the space of 64-dimensional vectors over GF(2).
Consider first the loop structure:
for( i = 1; i <= 64; i++ ){
for( j = i; j <= 64; j++ ){
That's looking at every distinct pair of indices (the two indices themselves not necessarily distinct from each other). That should provide a first clue. We then see

if( ((x >> (64-i)) & 1) && ((x >> (64-j)) & 1) )

, which is testing whether vector x has both bit i and bit j set. If it does, then we add a row of matrix A into accumulation variable a, by vector sum (== element-wise exclusive or). By incrementing b on every inner-loop iteration, we ensure that each iteration services a different row of A. And that also tells us that A must have 64 * 65 / 2 = 2080 rows (that matter).
In general, this is not a linear operation at all. The criterion for an operation o on a vector space over GF(2) to be linear boils down to this expression holding for all pairs of vectors x and y:
o(x + y) = o(x) + o(y)
Now, for notational convenience, let's consider the space GF(2)^2 instead of GF(2)^64; the result can be extended from the former to the latter simply by adding zeroes. Let x be the bit vector (1, 0) (represented, for example, by the integer 2). Let y be the bit vector (0, 1) (represented by the integer 1). And let A be this matrix:
1 0
0 1
1 0
Your operation has the following among its results:
operand result as integer comment
x (1, 0) 2 Only the first row is accumulated
y (1, 0) 2 Only the third row is accumulated
x + y (0, 1) 1 All rows are accumulated
Clearly, it is not the case that o(x) + o(y) = o(x + y) for this x, y, and A, so the operation is not linear for this A.
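A tiny C sketch (my own illustration) that replays this counterexample, with the loops cut down from 64 to 2 bits:

#include <stdint.h>
#include <stdio.h>

/* rows of A for the index pairs (1,1), (1,2), (2,2), as 2-bit vectors */
static const uint8_t A[3] = { 2u, 1u, 2u }; /* (1,0), (0,1), (1,0) */

static uint8_t op(uint8_t x)
{
    uint8_t a = 0, b = 0;
    for (int i = 1; i <= 2; i++)
        for (int j = i; j <= 2; j++) {
            if (((x >> (2 - i)) & 1) && ((x >> (2 - j)) & 1))
                a ^= A[b];
            b++;
        }
    return a;
}

int main(void)
{
    uint8_t x = 2, y = 1; /* (1, 0) and (0, 1) */
    printf("o(x)=%u o(y)=%u o(x+y)=%u\n", op(x), op(y), op(x ^ y));
    /* prints o(x)=2 o(y)=2 o(x+y)=1, confirming non-linearity */
    return 0;
}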
There are matrices A for which the corresponding operation is linear, but what linear operation they represent will depend on A. For example, it is possible to represent a wide variety of matrix-vector multiplications this way. It's not clear to me whether linear operations other than matrix-vector multiplications can be represented in this form, but I'm inclined to think not.
I am working on a fixed-point platform (floating-point arithmetic not supported).
I represent any rational number q as the floor value of q * (1 << precision).
I need an efficient method for calculating log base 2 of x, where 1 < x < 2.
Here is what I've done so far:
uint64_t Log2(uint64_t x, uint8_t precision)
{
    uint64_t res = 0;
    uint64_t one = (uint64_t)1 << precision;
    uint64_t two = (uint64_t)2 << precision;

    for (uint8_t i = precision; i > 0; i--)
    {
        x = (x * x) / one; // now 1 < x < 4
        if (x >= two)
        {
            x >>= 1; // now 1 < x < 2
            res += (uint64_t)1 << (i - 1);
        }
    }
    return res;
}
This works well, however, it takes a toll on the overall performance of my program, which requires executing this for a large amount of input values.
For what it's worth, the precision used is 31, but this may change, so I need to keep it as a variable.
Are there any optimizations that I can apply here?
I was thinking of something in the form of "multiply first, sum up last".
But that would imply calculating x ^ (2 ^ precision), which would very quickly overflow.
Update
I have previously tried to get rid of the branch, but it just made things worse:
for (uint8_t i = precision; i > 0; i--)
{
    x = (x * x) / one; // now 1 < x < 4
    uint64_t n = x / two;
    x >>= n; // now 1 < x < 2
    res += n << (i - 1);
}
return res;
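(For what it's worth, since x < 2*two after the squaring step, the division there already yields only 0 or 1, so a comparison gives the same flag without dividing; a sketch:)

for (uint8_t i = precision; i > 0; i--)
{
    x = (x * x) / one;                 // now 1 < x < 4
    uint64_t n = (uint64_t)(x >= two); // 0 or 1, no division
    x >>= n;                           // now 1 < x < 2
    res += n << (i - 1);
}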
The only things I can think of are to run the loop with a right shift instead of a decrement and to change a few operations to their equivalent binary ops. That may or may not be relevant to your platform, but on my x64 PC they yield an improvement of about 2%:
uint64_t Log2(uint64_t x, uint8_t precision)
{
    uint64_t res = 0;
    uint64_t two = (uint64_t)2 << precision;

    for (uint64_t b = (uint64_t)1 << (precision - 1); b; b >>= 1)
    {
        x = (x * x) >> precision; // now 1 < x < 4
        if (x & two)
        {
            x >>= 1; // now 1 < x < 2
            res |= b;
        }
    }
    return res;
}
My proposal goes in the opposite direction: towards constant performance in a fixed number of steps.
Given that a reasonably small amount of resources suffices and the precision target is known and always reached, a constant-time deployment can beat most iterative schemes.
A Taylor expansion (since 1715) of log2(x) provides both a solid calculus foundation and (almost) arbitrary precision, known a priori to be feasible for any depth of fixed-point arithmetic (be it for Epiphany / FPGA / ASIC / whatever).
The math transforms the whole problem into a small set of node points X_tab_i, for each of which (as few as the platform precision requires) constants are pre-calculated. The rest is a platform-efficient assembly of the Taylor sum of products, yielding the result in constant time with the residual error kept under a design-driven threshold (the PSPACE x PTIME trade-off is settled at design time, yet the deployed process is always CTIME, CSPACE).
Voilà:
Given X: lookup closest X_tab_i,
with C0_tab_i, C1_tab_i, C2_tab_i, .., Cn_tab_i
//-----------------------------------------------------------------<STATIC/CONST>
// ![i]
#DEFINE C0_tab_i <log2( X_tab_i )>
#DEFINE C1_tab_i < ( X_tab_i )^(-1) * ( +1 / ( 1 * ln(2) )>
#DEFINE C2_tab_i < ( X_tab_i )^(-2) * ( -1 / ( 2 * ln(2) )>
#DEFINE C3_tab_i < ( X_tab_i )^(-3) * ( +1 / ( 3 * ln(2) )>
::: : : :
#DEFINE CN_tab_i < ( X_tab_i )^(-N) * ( -1^(N-1) ) / ( N * ln(2) )>
// -----------------------------------------------------------------<PROCESS>-BEG
DIFF = X - X_tab_i; CORR = DIFF;
RES = C0_tab_i
+ C1_tab_i * CORR; CORR *= DIFF;
RES += C2_tab_i * CORR; CORR *= DIFF;
... +=
RES += Cn_tab_i * CORR; CORR *= DIFF;
// --------------------------------------------------------------<PROCESS>-END:
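A rough C sketch of the scheme (my own illustration, in double precision for readability; a real fixed-point deployment would pre-scale the constants, and the node spacing and the cutoff after C3 are arbitrary choices here):

#include <math.h>
#include <stdio.h>

#define NODES 8 /* node points spread over [1, 2) */

int main(void)
{
    double x = 1.37; /* input in (1, 2) */

    /* lookup the closest node X_tab_i */
    int i = (int)((x - 1.0) * NODES);
    double X = 1.0 + (i + 0.5) / NODES;

    /* constants that would be pre-calculated per node */
    double C0 = log2(X);
    double C1 = +1.0 / (1.0 * X * log(2.0));
    double C2 = -1.0 / (2.0 * X * X * log(2.0));
    double C3 = +1.0 / (3.0 * X * X * X * log(2.0));

    /* the constant-time sum of products */
    double diff = x - X, corr = diff;
    double res = C0;
    res += C1 * corr; corr *= diff;
    res += C2 * corr; corr *= diff;
    res += C3 * corr;

    printf("approx = %.9f, exact = %.9f\n", res, log2(x));
    return 0;
}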
The problem I have is x = (16807 * k) % 65536,
i.e. 16807k ≡ x (mod 65536).
I need to calculate k knowing x.
My best effort so far is something of a brute force. Is there a mathematical way to calculate k?
If not, any optimisations on my current code would be appreciated.
t = x;
while ( t += 15115 ) // 16807k = 65536n + x - this is the n
{
    if (t % 16807 == 0)
        return t / 16807;
}
return x;
EDIT: Changed += to 15115
An odd number has a multiplicative inverse modulo a power of two.
The inverse of 16807 mod 2^16 is 22039.
That means that (16807 * 22039) % 65536 == 1, and consequently, that
(16807 * 22039 * x) % 65536 == x
And
k = (22039 * x) % 65536
So you don't have to try anything, you can simply calculate k directly.
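A minimal sketch of that direct computation in C (the helper name is mine; uint16_t arithmetic supplies the mod 65536 reduction for free):

#include <stdint.h>

/* k such that (16807 * k) % 65536 == x; 22039 is the inverse of 16807 mod 2^16 */
static inline uint16_t solve_k(uint16_t x)
{
    return (uint16_t)(22039u * x); /* uint16_t truncation == reduction mod 65536 */
}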
You solve this kind of problem using the extended Euclidean algorithm for the GCD of 16807 and 65536.
The remainder sequence is initiated with

R0 = 65536
R1 = 16807

and the computation of the inverse with

V0 = 0   (V0*16807 == R0 mod 65536)
V1 = 1   (V1*16807 == R1 mod 65536)

Then using integer long division,

Q1 = R0/R1 = 3,
R2 = R0 - Q1*R1 = 15115
V2 = V0 - Q1*V1 = -3   (V2*16807 == R2 mod 65536)

Q2 = R1/R2 = 1,
R3 = R1 - Q2*R2 = 1692
V3 = V1 - Q2*V2 = 4

Q3 = 8,  R4 = 1579, V4 = -35
Q4 = 1,  R5 = 113,  V5 = 39
Q5 = 13, R6 = 110,  V6 = -542
Q6 = 1,  R7 = 3,    V7 = 581
Q7 = 36, R8 = 2,    V8 = -21458
Q8 = 1,  R9 = 1,    V9 = 22039

so that 22039 is found as the modular inverse of 16807 modulo 65536.
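For reference, a minimal C sketch of the same computation (the loop mirrors the table above; variable names are mine):

#include <stdio.h>

int main(void)
{
    long r0 = 65536, r1 = 16807; /* remainder sequence */
    long v0 = 0, v1 = 1;         /* invariant: v_i * 16807 == r_i (mod 65536) */

    while (r1 != 0) {
        long q  = r0 / r1;
        long r2 = r0 - q * r1;
        long v2 = v0 - q * v1;
        r0 = r1; r1 = r2;
        v0 = v1; v1 = v2;
    }
    /* r0 is now gcd = 1, and v0 the inverse (possibly negative) */
    printf("inverse = %ld\n", (v0 % 65536 + 65536) % 65536); /* 22039 */
    return 0;
}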
If you have to look up k repeatedly for different x, you can build a table of solutions before you start decoding:
uint16_t g = 16807u;
uint16_t *mods = malloc(0x10000 * sizeof(*mods));
int i;

for (i = 0; i < 0x10000; i++) {
    uint16_t x = g * i; // x is effectively x mod 2**16
    mods[x] = i;
}
The solution to your equation in the 16-bit range is then:
uint16_t k = mods[x];
It is assumed that x is a 16-bit unsigned integer. Don't forget to free(mods) after you're done.
If k is a solution, then k+65536 is also a solution.
The straightforward brute-force method to find the first k (k>= 0) would be:
for (k = 0; k < 65536; k++) {
    if ((k * 16807) % 65536 == x) {
        // Found it!
        break;
    }
}
if (k == 65536) {
    // No solution found
}
return k;
What is the most efficient way given to raise an integer to the power of another integer in C?
// 2^3
pow(2,3) == 8
// 5^5
pow(5,5) == 3125
Exponentiation by squaring.
int ipow(int base, int exp)
{
    int result = 1;
    for (;;)
    {
        if (exp & 1)
            result *= base;
        exp >>= 1;
        if (!exp)
            break;
        base *= base;
    }
    return result;
}
This is the standard method for doing modular exponentiation for huge numbers in asymmetric cryptography.
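For illustration, here's the same loop with a modulus folded in, roughly as modular exponentiation is usually sketched (uint64_t stands in for the big-number type; the products can overflow for moduli above 2^32, which real implementations avoid with wide or modular multiplication):

#include <stdint.h>

uint64_t powmod(uint64_t base, uint64_t exp, uint64_t mod)
{
    uint64_t result = 1 % mod;
    base %= mod;
    while (exp)
    {
        if (exp & 1)
            result = (result * base) % mod; /* overflows if mod > 2^32 */
        base = (base * base) % mod;
        exp >>= 1;
    }
    return result;
}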
Note that exponentiation by squaring is not always the optimal method. It is probably the best you can do as a general method that works for all exponent values, but for a specific exponent value there might be a better sequence that needs fewer multiplications.
For instance, if you want to compute x^15, the method of exponentiation by squaring will give you:
x^15 = (x^7)*(x^7)*x
x^7 = (x^3)*(x^3)*x
x^3 = x*x*x
This is a total of 6 multiplications.
It turns out this can be done using "just" 5 multiplications via addition-chain exponentiation.
n*n = n^2
n^2*n = n^3
n^3*n^3 = n^6
n^6*n^6 = n^12
n^12*n^3 = n^15
There are no efficient algorithms to find this optimal sequence of multiplications. From Wikipedia:
The problem of finding the shortest addition chain cannot be solved by dynamic programming, because it does not satisfy the assumption of optimal substructure. That is, it is not sufficient to decompose the power into smaller powers, each of which is computed minimally, since the addition chains for the smaller powers may be related (to share computations). For example, in the shortest addition chain for a¹⁵ above, the subproblem for a⁶ must be computed as (a³)² since a³ is re-used (as opposed to, say, a⁶ = a²(a²)², which also requires three multiplies).
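To make the chain concrete, here is that five-multiplication x^15 spelled out (a sketch; ignores overflow):

int pow15(int x)
{
    int x3  = (x * x) * x; /* n^2, n^3:  2 multiplications */
    int x6  = x3 * x3;     /* n^6:       3 */
    int x12 = x6 * x6;     /* n^12:      4 */
    return x12 * x3;       /* n^15:      5 */
}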
If you need to raise 2 to a power, the fastest way to do so is to bit-shift by the power.
2 ** 3 == 1 << 3 == 8
2 ** 30 == 1 << 30 == 1073741824 (A Gigabyte)
Here is the method in Java
private int ipow(int base, int exp)
{
    int result = 1;
    while (exp != 0)
    {
        if ((exp & 1) == 1)
            result *= base;
        exp >>= 1;
        base *= base;
    }
    return result;
}
An extremely specialized case is when you need, say, 2^(-x to the y), where x is of course negative and y is too large to do shifting on an int. You can still compute 2^x in constant time by manipulating a float.
struct IeeeFloat
{
    unsigned int base : 23;
    unsigned int exponent : 8;
    unsigned int signBit : 1;
};

union IeeeFloatUnion
{
    IeeeFloat brokenOut;
    float f;
};

inline float twoToThe(char exponent)
{
    // notice how the range checking is already done on the exponent var
    static IeeeFloatUnion u;
    u.f = 2.0;
    // Change the exponent part of the float
    u.brokenOut.exponent += (exponent - 1);
    return (u.f);
}
You can get more powers of 2 by using a double as the base type.
(Thanks a lot to commenters for helping to square this post away).
There's also the possibility that, by learning more about IEEE floats, other special cases of exponentiation might present themselves.
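As an aside, the bit-field layout above is implementation-defined; a more portable variant of the same trick (assuming IEEE-754 single precision) builds the bit pattern directly:

#include <stdint.h>
#include <string.h>

/* 2^exponent for -126 <= exponent <= 127 (normal range only) */
float twoToThePortable(int exponent)
{
    uint32_t bits = (uint32_t)(exponent + 127) << 23; /* biased exponent field */
    float f;
    memcpy(&f, &bits, sizeof f); /* avoids bit-field and aliasing issues */
    return f;
}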
power() function to work for Integers Only
int power(int base, unsigned int exp) {
    if (exp == 0)
        return 1;
    int temp = power(base, exp / 2);
    if (exp % 2 == 0)
        return temp * temp;
    else
        return base * temp * temp;
}
Complexity = O(log(exp))
power() function to work for negative exp and float base.
float power(float base, int exp) {
    if (exp == 0)
        return 1;
    float temp = power(base, exp / 2);
    if (exp % 2 == 0)
        return temp * temp;
    else {
        if (exp > 0)
            return base * temp * temp;
        else
            return (temp * temp) / base; // negative exponent computation
    }
}
Complexity = O(log(exp))
If you want the value of an integer 2 raised to the power of something, it is always better to use the shift option:
pow(2,5) can be replaced by 1<<5
This is much more efficient.
int pow(int base, int exponent)
{
    // Does not work for negative exponents. (But that would be leaving the range of int)
    if (exponent == 0) return 1; // base case
    int temp = pow(base, exponent / 2);
    if (exponent % 2 == 0)
        return temp * temp;
    else
        return (base * temp * temp);
}
Just as a follow-up to comments on the efficiency of exponentiation by squaring.
The advantage of that approach is that it runs in log(n) time. For example, if you were going to calculate something huge, such as x^1048575 (2^20 - 1), you only have to go through the loop 20 times, not 1 million+ using the naive approach.
Also, in terms of code complexity, it is simpler than trying to find the most optimal sequence of multiplications, a la Pramod's suggestion.
Edit:
I guess I should clarify before someone tags me for the potential for overflow. This approach assumes that you have some sort of hugeint library.
Late to the party:
Below is a solution that also deals with y < 0 as best as it can.
It uses a result of intmax_t for maximum range. There is no provision for answers that do not fit in intmax_t.
powjii(0, 0) --> 1, which is a common result for this case.
pow(0, negative), another undefined result, returns INTMAX_MAX.
intmax_t powjii(int x, int y) {
    if (y < 0) {
        switch (x) {
        case 0:
            return INTMAX_MAX;
        case 1:
            return 1;
        case -1:
            return y % 2 ? -1 : 1;
        }
        return 0;
    }
    intmax_t z = 1;
    intmax_t base = x;
    for (;;) {
        if (y % 2) {
            z *= base;
        }
        y /= 2;
        if (y == 0) {
            break;
        }
        base *= base;
    }
    return z;
}
This code uses a forever loop for(;;) to avoid the final base *= base common in other looped solutions. That multiplication is 1) not needed and 2) could be an int*int overflow, which is UB.
A more generic solution, considering negative exponents:

private static int pow(int base, int exponent) {
    int result = 1;
    if (exponent == 0)
        return result; // base case
    if (exponent < 0)
        return 1 / pow(base, -exponent); // note: integer division truncates to 0 for |base| > 1
    int temp = pow(base, exponent / 2);
    if (exponent % 2 == 0)
        return temp * temp;
    else
        return (base * temp * temp);
}
The O(log N) solution in Swift...
// Time complexity is O(log N)
func power(_ base: Int, _ exp: Int) -> Int {
    // 1. Base cases: a^0 == 1 and a^1 == a. (The exp == 0 case is needed so the
    //    recursion below terminates.)
    // Time complexity O(1)
    if exp == 0 {
        return 1
    }
    if exp == 1 {
        return base
    }
    // 2. Calculate the value of the number raised to half of the exponent; the final
    //    answer is obtained by squaring that result (a^2n == (a^n)^2 == a^n * a^n),
    //    so we only do half the work.
    // Time complexity O(log N)
    let tempVal = power(base, exp / 2)
    // 3. If the exponent was odd, one extra factor of the base is needed
    //    (a^(2n+1) == a * a^n * a^n); if it was even, the square alone suffices.
    // Time complexity O(1)
    return (exp % 2 == 1 ? base : 1) * tempVal * tempVal
}
int pow(int const x, unsigned const e) noexcept
{
return !e ? 1 : 1 == e ? x : (e % 2 ? x : 1) * pow(x * x, e / 2);
//return !e ? 1 : 1 == e ? x : (((x ^ 1) & -(e % 2)) ^ 1) * pow(x * x, e / 2);
}
Yes, it's recursive, but a good optimizing compiler will optimize recursion away.
One more implementation (in Java). It may not be the most efficient solution, but the number of iterations is the same as that of the exponentiation-by-squaring solution.
public static long pow(long base, long exp) {
    if (exp == 0) {
        return 1;
    }
    if (exp == 1) {
        return base;
    }
    if (exp % 2 == 0) {
        long half = pow(base, exp / 2);
        return half * half;
    } else {
        long half = pow(base, (exp - 1) / 2);
        return base * half * half;
    }
}
I use recursion: if the exponent is even, 5^10 = 25^5.

int pow(int base, int exp) {
    if (exp == 0)
        return 1;
    else if (exp > 0 && exp % 2 == 0) {
        return pow(base * base, exp / 2);
    }
    else if (exp > 0 && exp % 2 != 0) {
        return base * pow(base, exp - 1);
    }
    return 0; // negative exponents are not handled
}
In addition to the answer by Elias, which causes Undefined Behaviour when implemented with signed integers, and incorrect values for high input when implemented with unsigned integers,
here is a modified version of the Exponentiation by Squaring that also works with signed integer types, and doesn't give incorrect values:
#include <stdint.h>
#define SQRT_INT64_MAX (INT64_C(0xB504F333))

int64_t alx_pow_s64(int64_t base, uint8_t exp)
{
    int_fast64_t base_;
    int_fast64_t result;

    base_ = base;
    if (base_ == 1)
        return 1;
    if (!exp)
        return 1;
    if (!base_)
        return 0;

    result = 1;
    if (exp & 1)
        result *= base_;
    exp >>= 1;
    while (exp) {
        if (base_ > SQRT_INT64_MAX)
            return 0;
        base_ *= base_;
        if (exp & 1)
            result *= base_;
        exp >>= 1;
    }
    return result;
}
Considerations for this function:
(1 ** N) == 1
(N ** 0) == 1
(0 ** 0) == 1
(0 ** N) == 0
If any overflow or wrapping is going to take place, return 0;
I used int64_t, but any width (signed or unsigned) can be used with little modification. However, if you need to use a non-fixed-width integer type, you will need to change SQRT_INT64_MAX to (int)sqrt(INT_MAX) (in the case of using int) or something similar, which should be optimized, but is uglier and not a C constant expression. Also, casting the result of sqrt() to an int is not very good because of floating-point precision in the case of a perfect square, but as I don't know of any implementation where INT_MAX (or the maximum of any type) is a perfect square, you can live with that.
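If the constant is too ugly, an alternative sketch (my own) drops it entirely and probes with a division before squaring; it is slower but type-agnostic, provided base_ is positive at that point in the loop (which holds after the first squaring):

#include <stdint.h>

/* would base_ * base_ overflow? (assumes base_ > 0) */
static inline int square_would_overflow(int64_t base_)
{
    return base_ > INT64_MAX / base_;
}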
I have implemented an algorithm that memoizes all computed powers and then uses them when needed. So, for example, x^13 is equal to (x^2)^2^2 * x^2^2 * x, where x^2^2 is taken from the table instead of being computed once again. This is basically an implementation of @Pramod's answer (but in C#).
The number of multiplications needed is Ceil(Log n).
public static int Power(int @base, int exp)
{
    int[] tab = new int[exp + 2]; // +2 so tab[1] exists even for exp == 0
    tab[0] = 1;
    tab[1] = @base;
    return Power(@base, exp, tab);
}

public static int Power(int @base, int exp, int[] tab)
{
    if (exp == 0) return 1;
    if (exp == 1) return @base;
    int i = 1;
    while (i < exp / 2)
    {
        if (tab[2 * i] <= 0)
            tab[2 * i] = tab[i] * tab[i];
        i = i << 1;
    }
    if (exp <= i)
        return tab[i];
    else
        return tab[i] * Power(@base, exp - i, tab);
}
Here is an O(1) algorithm for calculating x ** y, inspired by this comment. It works for 32-bit signed int.
For small values of y, it uses exponentiation by squaring. For large values of y, there are only a few values of x where the result doesn't overflow. This implementation uses a lookup table to read the result without calculating.
On overflow, the C standard permits any behavior, including crash. However, I decided to do bound-checking on LUT indices to prevent memory access violation, which could be surprising and undesirable.
Pseudo-code:
If `x` is between -2 and 2, use special-case formulas.
Otherwise, if `y` is between 0 and 8, use special-case formulas.
Otherwise:
Set x = abs(x); remember if x was negative
If x <= 10 and y <= 19:
Load precomputed result from a lookup table
Otherwise:
Set result to 0 (overflow)
If x was negative and y is odd, negate the result
C code:
#define POW9(x) x * x * x * x * x * x * x * x * x
#define POW10(x) POW9(x) * x
#define POW11(x) POW10(x) * x
#define POW12(x) POW11(x) * x
#define POW13(x) POW12(x) * x
#define POW14(x) POW13(x) * x
#define POW15(x) POW14(x) * x
#define POW16(x) POW15(x) * x
#define POW17(x) POW16(x) * x
#define POW18(x) POW17(x) * x
#define POW19(x) POW18(x) * x
int mypow(int x, unsigned y)
{
    static int table[8][11] = {
        {POW9(3), POW10(3), POW11(3), POW12(3), POW13(3), POW14(3), POW15(3), POW16(3), POW17(3), POW18(3), POW19(3)},
        {POW9(4), POW10(4), POW11(4), POW12(4), POW13(4), POW14(4), POW15(4), 0, 0, 0, 0},
        {POW9(5), POW10(5), POW11(5), POW12(5), POW13(5), 0, 0, 0, 0, 0, 0},
        {POW9(6), POW10(6), POW11(6), 0, 0, 0, 0, 0, 0, 0, 0},
        {POW9(7), POW10(7), POW11(7), 0, 0, 0, 0, 0, 0, 0, 0},
        {POW9(8), POW10(8), 0, 0, 0, 0, 0, 0, 0, 0, 0},
        {POW9(9), 0, 0, 0, 0, 0, 0, 0, 0, 0, 0},
        {POW9(10), 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}
    };
    int is_neg;
    int r;

    switch (x)
    {
    case 0:
        return y == 0 ? 1 : 0;
    case 1:
        return 1;
    case -1:
        return y % 2 == 0 ? 1 : -1;
    case 2:
        return 1 << y;
    case -2:
        return (y % 2 == 0 ? 1 : -1) << y;
    default:
        switch (y)
        {
        case 0:
            return 1;
        case 1:
            return x;
        case 2:
            return x * x;
        case 3:
            return x * x * x;
        case 4:
            r = x * x;
            return r * r;
        case 5:
            r = x * x;
            return r * r * x;
        case 6:
            r = x * x;
            return r * r * r;
        case 7:
            r = x * x;
            return r * r * r * x;
        case 8:
            r = x * x;
            r = r * r;
            return r * r;
        default:
            is_neg = x < 0;
            if (is_neg)
                x = -x;
            if (x <= 10 && y <= 19)
                r = table[x - 3][y - 9];
            else
                r = 0;
            if (is_neg && y % 2 == 1)
                r = -r;
            return r;
        }
    }
}
My case is a little different: I'm trying to create a mask from a power, but I thought I'd share the solution I found anyway.
Obviously, it only works for powers of 2.

Mask1 = 1 << (Exponent - 1);
Mask2 = Mask1 - 1;
return Mask1 + Mask2;

For example, with Exponent = 5: Mask1 = 16, Mask2 = 15, and the result is 31, i.e. 2^5 - 1.
In case you know the exponent (and it is an integer) at compile-time, you can use templates to unroll the loop. This can be made more efficient, but I wanted to demonstrate the basic principle here:
#include <cstdlib>
#include <iostream>

template <unsigned long N>
unsigned long inline exp_unroll(unsigned base) {
    return base * exp_unroll<N - 1>(base);
}
We terminate the recursion using a template specialization:
template <>
unsigned long inline exp_unroll<1>(unsigned base) {
    return base;
}
The exponent needs to be known at compile time:

int main(int argc, char * argv[]) {
    std::cout << argv[1] << "**5 = " << exp_unroll<5>(atoi(argv[1])) << std::endl;
}
I've noticed something strange about the standard exponentiation-by-squaring algorithm with GNU GMP:
I implemented two nearly identical power-modulo functions: one using the most vanilla binary exponentiation-by-squaring algorithm, labeled ______2(), and another with basically the same concept, but re-mapped to dividing by 10 at each round instead of dividing by 2, labeled ______10().
( time ( jot - 1456 9999999999 6671 | pvE0 |
gawk -Mbe '
function ______10(_, __, ___, ____, _____, _______) {
__ = +__
____ = (____+=_____=____^= \
(_ %=___=+___)<_)+____++^____--
while (__) {
if (_______= __%____) {
if (__==_______) {
return (_^__ *_____) %___
}
__-=_______
_____ = (_^_______*_____) %___
}
__/=____
_ = _^____%___
}
}
function ______2(_, __, ___, ____, _____) {
__=+__
____+=____=_____^=(_%=___=+___)<_
while (__) {
if (__ %____) {
if (__<____) {
return (_*_____) %___
}
_____ = (_____*_) %___
--__
}
__/=____
_= (_*_) %___
}
}
BEGIN {
OFMT = CONVFMT = "%.250g"
__ = (___=_^= FS=OFS= "=")(_<_)
_____ = __^(_=3)^--_ * ++_-(_+_)^_
______ = _^(_+_)-_ + _^!_
_______ = int(______*_____)
________ = 10 ^ 5 + 1
_________ = 8 ^ 4 * 2 - 1
}
GNU Awk 5.1.1, API: 3.1 (GNU MPFR 4.1.0, GNU MP 6.2.1)
($++NF = ______10(_=$___, NR %________ +_________,_______*(_-11))) ^!___'
out9: 48.4MiB 0:00:08 [6.02MiB/s] [6.02MiB/s] [ <=> ]
in0: 15.6MiB 0:00:08 [1.95MiB/s] [1.95MiB/s] [ <=> ]
( jot - 1456 9999999999 6671 | pvE 0.1 in0 | gawk -Mbe ; )
8.31s user 0.06s system 103% cpu 8.058 total
ffa16aa937b7beca66a173ccbf8e1e12 stdin
($++NF = ______2(_=$___, NR %________ +_________,_______*(_-11))) ^!___'
out9: 48.4MiB 0:00:12 [3.78MiB/s] [3.78MiB/s] [<=> ]
in0: 15.6MiB 0:00:12 [1.22MiB/s] [1.22MiB/s] [ <=> ]
( jot - 1456 9999999999 6671 | pvE 0.1 in0 | gawk -Mbe ; )
13.05s user 0.07s system 102% cpu 12.821 total
ffa16aa937b7beca66a173ccbf8e1e12 stdin
For reasons extremely counter-intuitive and unknown to me, for a wide variety of inputs I threw at it, the div-10 variant is nearly always faster. It's the matching of the hashes between the two that makes it truly baffling, despite computers obviously not being built in and for a base-10 paradigm.
Am I missing something critical or obvious in the code/approach that might be skewing the results in a confounding manner? Thanks.