Finding the minimum value to satisfy a modulus - c

The problem I have is x = (16807 * k) % 65536,
i.e. 16807k ≡ x (mod 65536).
I need to calculate k knowing x.
My best effort so far is something of a brute force. Is there a mathematical way to calculate k?
If not, any optimisations to my current code would be appreciated.
t = x;
while ( t += 15115 ) // 16807k = 65536n + x - this is the n
{
    if (t % 16807 == 0)
        return t / 16807;
}
return x;
EDIT: Changed += to 15115

An odd number has a multiplicative inverse modulo a power of two.
The inverse of 16807 mod 2^16 is 22039.
That means that (16807 * 22039) % 65536 == 1, and consequently, that
(16807 * 22039 * x) % 65536 == x
And
k = (22039 * x) % 65536
So you don't have to try anything, you can simply calculate k directly.
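For instance, a quick sanity check in C (a minimal sketch of mine; the constants are the ones derived above):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint16_t x = 12345;                  /* any 16-bit target value */
    uint16_t k = (uint16_t)(22039u * x); /* k = (22039 * x) % 65536 */
    uint16_t check = (uint16_t)(16807u * k);
    printf("k = %u, 16807*k mod 65536 = %u\n", (unsigned)k, (unsigned)check);
    return 0;                            /* check equals x */
}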

You solve this kind of problem using the extended Euclidean algorithm for the GCD of 16807 and 65536.
The remainder sequence is initialised with
R0=65536
R1=16807
and the computation of the inverse with
V0=0 (V0*16807 == R0 mod 65536)
V1=1 (V1*16807 == R1 mod 65536)
Then, using integer long division:
Q1=R0/R1=3,
R2=R0-Q1*R1=15115
V2=V0-Q1*V1=-3 (V2*16807 == R2 mod 65536)
Q2=R1/R2=1,
R3=R1-Q2*R2=1692
V3=V1-Q2*V2=4
Q3=8, R4=1579, V4=-35
Q4=1, R5=113, V5=39
Q5=13, R6=110, V6=-542
Q6=1, R7=3, V7=581
Q7=36, R8=2, V8=-21458
Q8=1, R9=1, V9=22039
so that 22039 is found as the modular inverse of 16807 modulo 65536, since V9*16807 == R9 == 1 (mod 65536).
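A minimal sketch of the same computation in C (the helper name is mine, not from the answer):

#include <stdio.h>

/* Extended Euclidean algorithm: returns gcd(m, b) and stores into *inv a
   value in [0, m) with (*inv * b) % m == gcd(m, b), following the R/V
   sequences above. */
long ext_gcd(long m, long b, long *inv)
{
    long r0 = m, r1 = b;   /* remainder sequence R0, R1 */
    long v0 = 0, v1 = 1;   /* inverse sequence   V0, V1 */
    while (r1 != 0) {
        long q = r0 / r1;
        long r2 = r0 - q * r1;
        long v2 = v0 - q * v1;
        r0 = r1; r1 = r2;
        v0 = v1; v1 = v2;
    }
    *inv = (v0 % m + m) % m;   /* normalise into [0, m) */
    return r0;
}

int main(void)
{
    long inv;
    ext_gcd(65536, 16807, &inv);
    printf("%ld\n", inv);      /* prints 22039 */
    return 0;
}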

If you have to look up k repeatedly for different x, you can build a table of solutions before you start decoding:

uint16_t g = 16807u;
uint16_t *mods = malloc(0x10000 * sizeof(*mods));
int i;
for (i = 0; i < 0x10000; i++) {
    uint16_t x = g * i; // effectively (16807 * i) mod 2**16
    mods[x] = i;
}

The solution to your equation in the 16-bit range is then:

uint16_t k = mods[x];

It is assumed that x is a 16-bit unsigned integer. The table covers every possible x because 16807 is odd, so i -> (16807 * i) mod 2**16 is a bijection on [0, 0x10000). Don't forget to free(mods) after you're done.

If k is a solution, then k + 65536 is also a solution.
The straightforward brute-force method to find the first k (k >= 0) would be:

for (k = 0; k < 65536; k++) {
    if ( (k * 16807) % 65536 == x ) {
        // Found it!
        break;
    }
}
if (k == 65536) {
    // No solution found
}
return k;

Related

Uniform distribution in arc4random_uniform and PCG

Both arc4random_uniform from OpenBSD and the PCG library by Melissa O'Neill use a similar-looking algorithm to generate an unbiased unsigned integer below an exclusive upper bound.
inline uint64_t
pcg_setseq_64_rxs_m_xs_64_boundedrand_r(struct pcg_state_setseq_64 *rng,
                                        uint64_t bound) {
    uint64_t threshold = -bound % bound;
    for (;;) {
        uint64_t r = pcg_setseq_64_rxs_m_xs_64_random_r(rng);
        if (r >= threshold)
            return r % bound;
    }
}
Isn't -bound % bound always zero? If it's always zero, then why have the loop and the if statement at all?
The OpenBSD implementation has the same thing too:
uint32_t
arc4random_uniform(uint32_t upper_bound)
{
    uint32_t r, min;

    if (upper_bound < 2)
        return 0;

    /* 2**32 % x == (2**32 - x) % x */
    min = -upper_bound % upper_bound;

    /*
     * This could theoretically loop forever but each retry has
     * p > 0.5 (worst case, usually far better) of selecting a
     * number inside the range we need, so it should rarely need
     * to re-roll.
     */
    for (;;) {
        r = arc4random();
        if (r >= min)
            break;
    }

    return r % upper_bound;
}
Apple's version of arc4random_uniform computes the threshold differently:
u_int32_t
arc4random_uniform(u_int32_t upper_bound)
{
    u_int32_t r, min;

    if (upper_bound < 2)
        return (0);

#if (ULONG_MAX > 0xffffffffUL)
    min = 0x100000000UL % upper_bound;
#else
    /* Calculate (2**32 % upper_bound) avoiding 64-bit math */
    if (upper_bound > 0x80000000)
        min = 1 + ~upper_bound; /* 2**32 - upper_bound */
    else {
        /* (2**32 - (x * 2)) % x == 2**32 % x when x <= 2**31 */
        min = ((0xffffffff - (upper_bound * 2)) + 1) % upper_bound;
    }
#endif

    /*
     * This could theoretically loop forever but each retry has
     * p > 0.5 (worst case, usually far better) of selecting a
     * number inside the range we need, so it should rarely need
     * to re-roll.
     */
    for (;;) {
        r = arc4random();
        if (r >= min)
            break;
    }

    return (r % upper_bound);
}
Because bound is a uint64_t, -bound is evaluated modulo 2^64. The result is 2^64 - bound, not a negative number.
Then -bound % bound calculates the residue of 2^64 - bound modulo bound, which equals the residue of 2^64 modulo bound.
By setting threshold to this value and rejecting numbers that are less than threshold, the routine shrinks the accepted interval to 2^64 - threshold numbers, a count that is an exact multiple of bound.
From a number r selected in that interval, the routine returns r % bound. Due to the trimming of the interval, there is an equal number of occurrences of each residue, so the result has no bias for any residue over any other.
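The same idea in a self-contained sketch (mine, not from either library; next32() stands in for whatever uniform 32-bit generator you have):

#include <stdint.h>

extern uint32_t next32(void);   /* stand-in: any uniform 32-bit PRNG */

/* Unbiased value in [0, bound), bound > 0: reject the first
   (2^32 % bound) outputs so the accepted count is an exact
   multiple of bound. */
uint32_t bounded32(uint32_t bound)
{
    uint32_t threshold = -bound % bound;   /* (2^32 - bound) % bound */
    for (;;) {
        uint32_t r = next32();
        if (r >= threshold)
            return r % bound;
    }
}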

What is the matrix/vector operation that corresponds to this code?

Here is the code:
long long mul(long long x)
{
    uint64_t M[64] = INIT;
    uint64_t result = 0;
    for ( int i = 0; i < 64; i++ )
    {
        uint64_t a = x & M[i];
        uint64_t b = 0;
        while ( a ) {
            b ^= a & 1;   /* b accumulates the parity of the masked bits */
            a >>= 1;
        }
        result |= b << (63 - i);
    }
    return result;
}
This code implements matrix-vector multiplication over GF(2): it returns result as the product of the 64x64 bit matrix M and the 64-component vector x.
I want to know what linear algebraic operation (on GF(2)) this code is:
long long unknown(long long x)
{
    uint64_t A[] = INIT;
    uint64_t a = 0, b = 0;
    int i, j;
    for( i = 1; i <= 64; i++ ){
        for( j = i; j <= 64; j++ ){
            if( ((x >> (64-i)) & 1) && ((x >> (64-j)) & 1) )
                a ^= A[b];
            b++;
        }
    }
    return a;
}
I want to know what linear algebraic operation (on GF(2)) this code is:
Of course you mean GF(2)^64, the space of 64-dimensional vectors over GF(2).
Consider first the loop structure:
for( i = 1; i <= 64; i++ ){
    for( j = i; j <= 64; j++ ){
That's looking at every pair of indices (i, j) with i <= j, the two indices not necessarily distinct from each other. That should provide a first clue. We then see
if( ((x >> (64-i)) & 1) && ((x >> (64-j)) & 1) )
, which tests whether vector x has both bit i and bit j set. If it does, then we add a row of matrix A into accumulation variable a, by vector sum (== element-wise exclusive or). By incrementing b on every inner-loop iteration, we ensure that each iteration services a different row of A. That also tells us that A must have 64 * 65 / 2 = 2080 rows (that matter).
In general, this is not a linear operation at all. The criterion for an operation o on a vector space over GF(2) to be linear boils down to this expression holding for all pairs of vectors x and y:
o(x + y) = o(x) + o(y)
Now, for notational convenience, let's consider the space GF(2)^2 instead of GF(2)^64; the result can be extended from the former to the latter simply by adding zeroes. Let x be the bit vector (1, 0) (represented, for example, by the integer 2). Let y be the bit vector (0, 1) (represented by the integer 1). And let A be this matrix:
1 0
0 1
1 0
Your operation has the following among its results:
operand | result | as integer | comment
--------+--------+------------+-----------------------------------
x       | (1, 0) | 2          | Only the first row is accumulated
y       | (1, 0) | 2          | Only the third row is accumulated
x + y   | (0, 1) | 1          | All rows are accumulated
Clearly, it is not the case that o(x) + o(y) = o(x + y) for this x, y, and A, so the operation is not linear for this A.
There are matrices A for which the corresponding operation is linear, but what linear operation they represent will depend on A. For example, it is possible to represent a wide variety of matrix-vector multiplications this way. It's not clear to me whether linear operations other than matrix-vector multiplications can be represented in this form, but I'm inclined to think not.
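To make the counterexample concrete, here is a small self-contained check (my own sketch, scaling the loops above down to 2-bit vectors):

#include <stdio.h>
#include <stdint.h>

/* 2-bit version of unknown(): rows of A correspond to the index pairs
   (1,1), (1,2), (2,2). */
static uint8_t quad2(uint8_t x, const uint8_t A[3])
{
    uint8_t a = 0, b = 0;
    for (int i = 1; i <= 2; i++)
        for (int j = i; j <= 2; j++) {
            if (((x >> (2 - i)) & 1) && ((x >> (2 - j)) & 1))
                a ^= A[b];
            b++;
        }
    return a;
}

int main(void)
{
    const uint8_t A[3] = { 2, 1, 2 };  /* rows (1,0), (0,1), (1,0) */
    uint8_t x = 2, y = 1;              /* vectors (1,0) and (0,1)  */
    printf("o(x)=%u o(y)=%u o(x+y)=%u\n",
           quad2(x, A), quad2(y, A), quad2(x ^ y, A));
    /* prints o(x)=2 o(y)=2 o(x+y)=1, so o(x) + o(y) != o(x + y) */
    return 0;
}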

Compute logarithmic expression without floating point arithmetics or log

I need to compute the mathematical expression floor(ln(u)/ln(1-p)) for 0 < u < 1 and 0 < p < 1 in C on an embedded processor with no floating-point arithmetic and no ln function. The result is a positive integer. I know about the limit cases (p = 0); I'll deal with them later...
I imagine that the solution involves having u and p range over 0..UINT16_MAX and appealing to a lookup table for the logarithm, but I cannot figure out how exactly: what does the lookup table map to?
The result need not be 100% exact; approximations are OK.
Thanks!
Since the logarithm is used in both dividend and divisor, there is no need to use log(); we can use log2() instead. Due to the restrictions on the inputs u and p the logarithms are known to be both negative, so we can restrict ourselves to compute the positive quantity -log2().
We can use fixed-point arithmetic to compute the logarithm. We do so by multiplying the original input by a sequence of factors of decreasing magnitude that approach 1. Considering each of the factor in sequence, we multiply the input only by those factors that result in a product closer to 1, but without exceeding it. While doing so, we sum the log2() of the factors that "fit". At the end of this procedure we wind up with a number very close to 1 as our final product, and a sum that represents the binary logarithm.
This process is known in the literature as multiplicative normalization or pseudo division, and some early publications describing it are the works by De Lugish and Meggitt. The latter indicates that the origin is basically Henry Briggs's method for computing common logarithms.
B. de Lugish. "A Class of Algorithms for Automatic Evaluation of Functions and Computations in a Digital Computer". PhD thesis, Dept. of Computer Science, University of Illinois, Urbana, 1970.
J. E. Meggitt. "Pseudo division and pseudo multiplication processes". IBM Journal of Research and Development, Vol. 6, No. 2, April 1962, pp. 210-226
As the chosen set of factors comprises 2^i and (1 + 2^-i), the necessary multiplications can be performed without a multiplication instruction: the products can be computed by either a shift or a shift plus an add.
Since the inputs u and p are purely fractional numbers with 16 bits, we may want to choose a 5.16 fixed-point result for the logarithm. By simply dividing the two logarithm values, we remove the fixed-point scale factor and apply a floor() operation at the same time, because for positive numbers floor(x) is identical to trunc(x), and integer division truncates.
Note that the fixed-point computation of the logarithm results in large relative error for inputs near 1. This in turn means the entire function computed using fixed-point arithmetic may deliver results significantly different from the reference if p is small. An example of this is the following test case: u=55af p=0052 res=848 ref=874.
#include <stdlib.h>
#include <stdio.h>
#include <stdint.h>

/* input x is a 0.16 fixed-point number in [0,1)
   function returns -log2(x) as a 5.16 fixed-point number in (0, 16]
*/
uint32_t nlog2_16 (uint16_t x)
{
    uint32_t r = 0;
    uint32_t t, a = x;
    /* try factors 2**i with i = 8, 4, 2, 1 */
    if ((t = a << 8 ) < 0x10000) { a = t; r += 0x80000; }
    if ((t = a << 4 ) < 0x10000) { a = t; r += 0x40000; }
    if ((t = a << 2 ) < 0x10000) { a = t; r += 0x20000; }
    if ((t = a << 1 ) < 0x10000) { a = t; r += 0x10000; }
    /* try factors (1+2**(-i)) with i = 1, .., 16 */
    if ((t = a + (a >> 1)) < 0x10000) { a = t; r += 0x095c0; }
    if ((t = a + (a >> 2)) < 0x10000) { a = t; r += 0x0526a; }
    if ((t = a + (a >> 3)) < 0x10000) { a = t; r += 0x02b80; }
    if ((t = a + (a >> 4)) < 0x10000) { a = t; r += 0x01664; }
    if ((t = a + (a >> 5)) < 0x10000) { a = t; r += 0x00b5d; }
    if ((t = a + (a >> 6)) < 0x10000) { a = t; r += 0x005ba; }
    if ((t = a + (a >> 7)) < 0x10000) { a = t; r += 0x002e0; }
    if ((t = a + (a >> 8)) < 0x10000) { a = t; r += 0x00171; }
    if ((t = a + (a >> 9)) < 0x10000) { a = t; r += 0x000b8; }
    if ((t = a + (a >> 10)) < 0x10000) { a = t; r += 0x0005c; }
    if ((t = a + (a >> 11)) < 0x10000) { a = t; r += 0x0002e; }
    if ((t = a + (a >> 12)) < 0x10000) { a = t; r += 0x00017; }
    if ((t = a + (a >> 13)) < 0x10000) { a = t; r += 0x0000c; }
    if ((t = a + (a >> 14)) < 0x10000) { a = t; r += 0x00006; }
    if ((t = a + (a >> 15)) < 0x10000) { a = t; r += 0x00003; }
    if ((t = a + (a >> 16)) < 0x10000) { a = t; r += 0x00001; }
    return r;
}

/* Compute floor(log(u)/log(1-p)) for 0 < u < 1 and 0 < p < 1,
   where 'u' and 'p' are represented as 0.16 fixed-point numbers.
   Result is an integer in range [0, 1048576].
*/
uint32_t func (uint16_t u, uint16_t p)
{
    uint16_t one_minus_p = 0x10000 - p; // 1.0 - p
    uint32_t log_u = nlog2_16 (u);
    uint32_t log_p = nlog2_16 (one_minus_p);
    uint32_t res = log_u / log_p; // divide and floor in one go
    return res;
}
The maximum value of this function basically depends on the precision limit, that is, how arbitrarily close to the limits (u -> 0) or (1 - p -> 1) the fixed-point values can get.
If we assume k fractional bits, i.e. the limits u = 2^-k and 1 - p = 1 - 2^-k,
then the maximum value is: k / (k - log2(2^k - 1))
(Being a ratio of logarithms, the expression is independent of the base, so we are free to use e.g. lb(x) or log2.)
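Plugging in a few values of k (a quick check of mine, using ordinary floating point just to evaluate the bound):

#include <math.h>
#include <stdio.h>

/* Evaluate the bound k / (k - log2(2^k - 1)) for a few fractional widths. */
int main(void)
{
    for (int k = 8; k <= 16; k += 2) {
        double max = k / (k - log2(pow(2.0, k) - 1.0));
        printf("k = %2d: max result is about %.0f\n", k, max);
    }
    return 0;   /* e.g. k = 10 gives roughly 7094 */
}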
Unlike njuffa's answer, I went with a lookup table approach, settling on k = 10 fractional bits to represent 0 < frac(u) < 1024 and 0 < frac(p) < 1024. This requires a log table with 2^k entries. Using 32-bit table values, we're only looking at a 4KiB table.
Any more than that, and you are using enough memory that you could seriously consider using the relevant parts of a 'soft-float' library. e.g., k = 16 would yield a 256KiB LUT.
We're computing the values -log2(i / 1024.0) for 0 < i < 1024. Since these values are in the half-open interval (0, k], we only need 4 binary digits to store the integral part. So we store the precomputed LUT in 32-bit [4.28] fixed-point format:
uint32_t lut[1024]; /* never use lut[0] */

for (uint32_t i = 1; i < 1024; i++)
    lut[i] = (uint32_t) (-log2(i / 1024.0) * 268435456.0);
Given: u, p represented by [0.10] fixed-point values in [1, 1023]:

uint32_t func (uint16_t u, uint16_t p)
{
    /* assert: 0 < u, p < 1024 */
    return lut[u] / lut[1024 - p];
}
We can easily test all valid (u, p) pairs against the 'naive' floating-point evaluation:
floor(log(u / 1024.0) / log(1.0 - p / 1024.0))
and only get a mismatch (+1 too high) on the following cases:
u = 193, p = 1 : 1708 vs 1707 (1.7079978488147417e+03)
u = 250, p = 384 : 3 vs 2 (2.9999999999999996e+00)
u = 413, p = 4 : 232 vs 231 (2.3199989016957960e+02)
u = 603, p = 1 : 542 vs 541 (5.4199909906444600e+02)
u = 680, p = 1 : 419 vs 418 (4.1899938077226307e+02)
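Spelled out, that exhaustive check might look like this (a sketch, assuming the lut[] and func() definitions above):

#include <math.h>
#include <stdint.h>
#include <stdio.h>

void check_all(void)
{
    for (int u = 1; u < 1024; u++) {
        for (int p = 1; p < 1024; p++) {
            uint32_t got = func((uint16_t) u, (uint16_t) p);
            uint32_t ref = (uint32_t) floor(log(u / 1024.0) /
                                            log(1.0 - p / 1024.0));
            if (got != ref)
                printf("u = %d, p = %d : %u vs %u\n", u, p, got, ref);
        }
    }
}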
Finally, it turns out that using the natural logarithm in a [3.29] fixed-point format gives us even higher precision, where:
lut[i] = (uint32_t) (-log(i / 1024.0) * 536870912.0);
only yields a single 'mismatch', though 'bignum' precision suggests it's correct:
u = 250, p = 384 : 3 vs 2 (2.9999999999999996e+00)

Sum of sums of divisors of numbers less than or equal to N

I really need some help with this problem:
Given a positive integer N, we define xsum(N) as the sum, over every integer from 1 to N, of that integer's divisors.
For example: xsum(6) = 1 + (1 + 2) + (1 + 3) + (1 + 2 + 4) + (1 + 5) + (1 + 2 + 3 + 6) = 33.
(xsum = sum of divisors of 1 + sum of divisors of 2 + ... + sum of divisors of 6)
Given a positive integer K, you are asked to find the lowest N that satisfies the condition: xsum(N) >= K
K is a nonzero natural number with at most 14 digits
time limit: 0.2 sec
Obviously, brute force will fail most cases with Time Limit Exceeded. I haven't found anything better yet, so here is that code:
fscanf(fi, "%lld", &k);
i = 2;
sum = 1;
while (sum < k) {
    sum = sum + i + 1;
    d = 2;
    while (d * d <= i) {
        if (i % d == 0 && d * d != i)
            sum = sum + d + i / d;
        else if (d * d == i)
            sum += d;
        d++;
    }
    i++;
}
Any better ideas?
For each number n in range [1, N] the following applies: n is a divisor of exactly floor(N / n) numbers in range [1, N]. Thus for each n we add a total of n * floor(N / n) to the result. As a check: xsum(6) = 1*6 + 2*3 + 3*2 + 4*1 + 5*1 + 6*1 = 33, matching the example above.
long long xsum(long long N){
    /* long long throughout: K can have up to 14 digits, which overflows int */
    long long result = 0;
    for(long long i = 1; i <= N; i++)
        result += (N / i) * i; // due to the integer division the two i's don't cancel out
    return result;
}
The idea behind this algorithm can also be used to solve the main problem (smallest N such that xsum(N) >= K) faster than a brute-force search.
The search can be bounded using a rule derived from the code above: each term i * (N / i) is at most N, so xsum(N) <= N * N, and any solution must therefore satisfy N >= sqrt(K). This gives a lower bound for starting the search.
The next step is to find an upper bound. Since the growth of xsum(N) is (approximately) quadratic, we can use this to approximate N. This optimized guessing finds the searched value pretty fast.
long long N(long long K){
    // start with the lower bound of N
    long long upperN = (long long) sqrt((double) K);
    long long lowerN = upperN;
    long long tmpSum;

    // search until xsum(upperN) reaches K
    while((tmpSum = xsum(upperN)) < K){
        long long r = K - tmpSum;
        lowerN = upperN;
        upperN += (long long) sqrt((double) (r / 3)) + 1;
    }

    // Now we have an upper and a lower bound for searching N;
    // the rest of the search can be done using binary search
    // (a sketch follows below)
    long long N; // search for the value
    return N;
}
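A minimal sketch of that binary search (my code, not the answer's), reusing xsum() and the bracketing loop above:

#include <math.h>

/* Smallest N with xsum(N) >= K. */
long long smallestN(long long K)
{
    long long lowerN = (long long) sqrt((double) K);
    long long upperN = lowerN, tmpSum;

    if (xsum(lowerN) >= K)
        return lowerN;              /* the lower bound already suffices */

    while ((tmpSum = xsum(upperN)) < K) {
        lowerN = upperN;
        upperN += (long long) sqrt((double) ((K - tmpSum) / 3)) + 1;
    }
    /* invariant: xsum(lowerN) < K <= xsum(upperN) */
    while (lowerN + 1 < upperN) {
        long long mid = lowerN + (upperN - lowerN) / 2;
        if (xsum(mid) >= K)
            upperN = mid;
        else
            lowerN = mid;
    }
    return upperN;
}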

The most efficient way to implement an integer based power function pow(int, int)

What is the most efficient way to raise an integer to the power of another integer in C?
// 2^3
pow(2,3) == 8
// 5^5
pow(5,5) == 3125
Exponentiation by squaring.
int ipow(int base, int exp)
{
    int result = 1;
    for (;;)
    {
        if (exp & 1)
            result *= base;
        exp >>= 1;
        if (!exp)
            break;
        base *= base;
    }
    return result;
}
This is the standard method for doing modular exponentiation for huge numbers in asymmetric cryptography.
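For illustration, here is the same square-and-multiply loop with a modular reduction at each step (my sketch, not from the answer; correct for moduli below 2^32, so the intermediate products fit in a uint64_t):

#include <stdint.h>

/* (base^exp) % m by square-and-multiply; assumes 0 < m < 2^32 */
uint64_t ipow_mod(uint64_t base, uint64_t exp, uint64_t m)
{
    uint64_t result = 1 % m;
    base %= m;
    while (exp) {
        if (exp & 1)
            result = (result * base) % m;
        exp >>= 1;
        base = (base * base) % m;
    }
    return result;
}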
Note that exponentiation by squaring is not the most optimal method. It is probably the best you can do as a general method that works for all exponent values, but for a specific exponent value there might be a better sequence that needs fewer multiplications.
For instance, if you want to compute x^15, the method of exponentiation by squaring will give you:
x^15 = (x^7)*(x^7)*x
x^7 = (x^3)*(x^3)*x
x^3 = x*x*x
This is a total of 6 multiplications.
It turns out this can be done using "just" 5 multiplications via addition-chain exponentiation.
n*n = n^2
n^2*n = n^3
n^3*n^3 = n^6
n^6*n^6 = n^12
n^12*n^3 = n^15
There are no efficient algorithms to find this optimal sequence of multiplications. From Wikipedia:
The problem of finding the shortest addition chain cannot be solved by dynamic programming, because it does not satisfy the assumption of optimal substructure. That is, it is not sufficient to decompose the power into smaller powers, each of which is computed minimally, since the addition chains for the smaller powers may be related (to share computations). For example, in the shortest addition chain for a¹⁵ above, the subproblem for a⁶ must be computed as (a³)² since a³ is re-used (as opposed to, say, a⁶ = a²(a²)², which also requires three multiplies).
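As a concrete illustration, the 5-multiplication chain above written out in C (a sketch of mine, not from the answer):

/* x^15 in 5 multiplications via the chain n^2, n^3, n^6, n^12, n^15 */
long long pow15(long long x)
{
    long long x2  = x  * x;    /* 1: n^2  */
    long long x3  = x2 * x;    /* 2: n^3  */
    long long x6  = x3 * x3;   /* 3: n^6  */
    long long x12 = x6 * x6;   /* 4: n^12 */
    return x12 * x3;           /* 5: n^15 */
}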
If you need to raise 2 to a power, the fastest way to do so is to bit-shift by the power:
2 ** 3 == 1 << 3 == 8
2 ** 30 == 1 << 30 == 1073741824 (a gigabyte)
Here is the method in Java
private int ipow(int base, int exp)
{
    int result = 1;
    while (exp != 0)
    {
        if ((exp & 1) == 1)
            result *= base;
        exp >>= 1;
        base *= base;
    }
    return result;
}
An extremely specialized case is when you need, say, 2^x, where x may be negative or too large to do shifting on an int. You can still compute 2^x in constant time by manipulating a float.
struct IeeeFloat
{
    unsigned int base : 23;      // the mantissa bits
    unsigned int exponent : 8;
    unsigned int signBit : 1;
};

union IeeeFloatUnion
{
    struct IeeeFloat brokenOut;
    float f;
};

inline float twoToThe(char exponent)
{
    // notice how the range checking is already done on the exponent var
    static union IeeeFloatUnion u;
    u.f = 2.0;
    // Change the exponent part of the float
    u.brokenOut.exponent += (exponent - 1);
    return (u.f);
}
You can get more powers of 2 by using a double as the base type.
(Thanks a lot to commenters for helping to square this post away.)
As you learn more about IEEE floats, other special cases of exponentiation might present themselves.
power() function to work for Integers Only
int power(int base, unsigned int exp){
    if (exp == 0)
        return 1;
    int temp = power(base, exp / 2);
    if (exp % 2 == 0)
        return temp * temp;
    else
        return base * temp * temp;
}
Complexity = O(log(exp))
power() function to work for negative exp and float base.
float power(float base, int exp) {
    if (exp == 0)
        return 1;
    float temp = power(base, exp / 2);
    if (exp % 2 == 0)
        return temp * temp;
    else {
        if (exp > 0)
            return base * temp * temp;
        else
            return (temp * temp) / base; // negative exponent computation
    }
}
Complexity = O(log(exp))
If you want the value of 2 raised to some integer power, it is always better to use the shift option:
pow(2,5) can be replaced by 1 << 5
This is much more efficient.
int pow(int base, int exponent)
{
    // Does not work for negative exponents. (But that would be leaving the range of int.)
    if (exponent == 0) return 1; // base case
    int temp = pow(base, exponent / 2);
    if (exponent % 2 == 0)
        return temp * temp;
    else
        return (base * temp * temp);
}
Just as a follow-up to comments on the efficiency of exponentiation by squaring:
The advantage of that approach is that it runs in log(n) time. For example, if you were going to calculate something huge, such as x^1048575 (2^20 - 1), you only have to go through the loop 20 times, not 1 million+ using the naive approach.
Also, in terms of code complexity, it is simpler than trying to find the most optimal sequence of multiplications, a la Pramod's suggestion.
Edit:
I guess I should clarify before someone tags me for the potential for overflow. This approach assumes that you have some sort of hugeint library.
Late to the party:
Below is a solution that also deals with y < 0 as best as it can.
It uses a result of intmax_t for maximum range. There is no provision for answers that do not fit in intmax_t.
powjii(0, 0) --> 1, which is a common result for this case.
powjii(0, negative), another undefined case, returns INTMAX_MAX.
#include <stdint.h>

intmax_t powjii(int x, int y) {
    if (y < 0) {
        switch (x) {
        case 0:
            return INTMAX_MAX;
        case 1:
            return 1;
        case -1:
            return y % 2 ? -1 : 1;
        }
        return 0;
    }
    intmax_t z = 1;
    intmax_t base = x;
    for (;;) {
        if (y % 2) {
            z *= base;
        }
        y /= 2;
        if (y == 0) {
            break;
        }
        base *= base;
    }
    return z;
}
This code uses a forever loop for(;;) to avoid the final base *= base common in other looped solutions. That multiplication is 1) not needed and 2) could be an int*int overflow, which is UB.
A more generic solution, considering negative exponents:
private static int pow(int base, int exponent) {
    int result = 1;
    if (exponent == 0)
        return result; // base case
    if (exponent < 0)
        return 1 / pow(base, -exponent); // integer division: truncates to 0 unless |base| is 1
    int temp = pow(base, exponent / 2);
    if (exponent % 2 == 0)
        return temp * temp;
    else
        return (base * temp * temp);
}
The O(log N) solution in Swift...
// Time complexity is O(log N)
func power(_ base: Int, _ exp: Int) -> Int {
    // 0. Anything raised to the power 0 is 1; this also terminates the recursion
    if exp == 0 {
        return 1
    }
    // 1. If the exponent is 1 then return the number (e.g. a^1 == a)
    // Time complexity O(1)
    if exp == 1 {
        return base
    }
    // 2. Calculate the value of the number raised to half of the exponent. This will be used to calculate the final answer by squaring the result (e.g. a^2n == (a^n)^2 == a^n * a^n). The idea is that we can do half the amount of work by obtaining a^n and multiplying the result by itself to get a^2n.
    // Time complexity O(log N)
    let tempVal = power(base, exp / 2)
    // 3. If the exponent was odd then decompose the result so that the exponent can be halved (e.g. a^(2n+1) == a^1 * a^n * a^n). If the exponent is even then the result is the half-exponent value squared (e.g. a^2n == a^n * a^n == (a^n)^2).
    // Time complexity O(1)
    return (exp % 2 == 1 ? base : 1) * tempVal * tempVal
}
int pow(int const x, unsigned const e) noexcept
{
    return !e ? 1 : 1 == e ? x : (e % 2 ? x : 1) * pow(x * x, e / 2);
    //return !e ? 1 : 1 == e ? x : (((x ^ 1) & -(e % 2)) ^ 1) * pow(x * x, e / 2);
}
Yes, it's recursive, but a good optimizing compiler will optimize recursion away.
One more implementation (in Java). It may not be the most efficient solution, but the number of iterations is the same as that of the exponentiation-by-squaring solution.
public static long pow(long base, long exp){
    if (exp == 0) {
        return 1;
    }
    if (exp == 1) {
        return base;
    }
    if (exp % 2 == 0) {
        long half = pow(base, exp / 2);
        return half * half;
    } else {
        long half = pow(base, (exp - 1) / 2);
        return base * half * half;
    }
}
I use recursion: if the exp is even, 5^10 = 25^5.
int pow(int base, int exp) {
    if (exp == 0)
        return 1;
    else if (exp > 0 && exp % 2 == 0)
        return pow(base * base, exp / 2);
    else if (exp > 0 && exp % 2 != 0)
        return base * pow(base, exp - 1);
    return 0; // negative exponents not supported
}
In addition to the answer by Elias (which causes undefined behaviour when implemented with signed integers, and incorrect values for high input when implemented with unsigned integers),
here is a modified version of exponentiation by squaring that also works with signed integer types and doesn't give incorrect values:
#include <stdint.h>

#define SQRT_INT64_MAX (INT64_C(0xB504F333))

int64_t alx_pow_s64 (int64_t base, uint8_t exp)
{
    int_fast64_t base_;
    int_fast64_t result;

    base_ = base;
    if (base_ == 1)
        return 1;
    if (!exp)
        return 1;
    if (!base_)
        return 0;

    result = 1;
    if (exp & 1)
        result *= base_;
    exp >>= 1;
    while (exp) {
        if (base_ > SQRT_INT64_MAX)
            return 0;
        base_ *= base_;
        if (exp & 1)
            result *= base_;
        exp >>= 1;
    }

    return result;
}
Considerations for this function:
(1 ** N) == 1
(N ** 0) == 1
(0 ** 0) == 1
(0 ** N) == 0
If any overflow or wrapping is going to take place, return 0;
I used int64_t, but any width (signed or unsigned) can be used with little modification. However, if you need to use a non-fixed-width integer type, you will need to change SQRT_INT64_MAX to (int)sqrt(INT_MAX) (in the case of using int) or something similar, which should be optimized, but is uglier and not a C constant expression. Also, casting the result of sqrt() to an int is not very good because of floating-point precision in the case of a perfect square, but as I don't know of any implementation where INT_MAX (or the maximum of any type) is a perfect square, you can live with that.
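An alternative that sidesteps the square-root constant entirely (my own sketch, not part of the answer) is to guard each squaring with a division-based overflow check, which works for any signed width:

#include <stdint.h>

/* Returns nonzero if b * b would overflow int64_t; no sqrt constant needed. */
static int square_would_overflow(int_fast64_t b)
{
    if (b < 0) {
        if (b == INT64_MIN)
            return 1;   /* -b itself would overflow, so b*b surely does */
        b = -b;
    }
    return b != 0 && b > INT64_MAX / b;
}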
I have implemented an algorithm that memoizes all computed powers and then uses them when needed. So, for example, x^13 is equal to (x^2)^2^2 * x^2^2 * x, where x^2^2 is taken from the table instead of being computed again. This is basically an implementation of Pramod's answer (but in C#; @base is used below because base is a reserved word in C#).
The number of multiplications needed is Ceil(Log n).
public static int Power(int @base, int exp)
{
    int[] tab = new int[exp + 1];
    tab[0] = 1;
    tab[1] = @base;
    return Power(@base, exp, tab);
}

public static int Power(int @base, int exp, int[] tab)
{
    if (exp == 0) return 1;
    if (exp == 1) return @base;
    int i = 1;
    while (i < exp / 2)
    {
        if (tab[2 * i] <= 0)
            tab[2 * i] = tab[i] * tab[i];
        i = i << 1;
    }
    if (exp <= i)
        return tab[i];
    else
        return tab[i] * Power(@base, exp - i, tab);
}
Here is an O(1) algorithm for calculating x ** y, inspired by this comment. It works for 32-bit signed int.
For small values of y, it uses exponentiation by squaring. For large values of y, there are only a few values of x where the result doesn't overflow. This implementation uses a lookup table to read the result without calculating.
On overflow, the C standard permits any behavior, including crash. However, I decided to do bound-checking on LUT indices to prevent memory access violation, which could be surprising and undesirable.
Pseudo-code:
If `x` is between -2 and 2, use special-case formulas.
Otherwise, if `y` is between 0 and 8, use special-case formulas.
Otherwise:
    Set x = abs(x); remember if x was negative
    If x <= 10 and y <= 19:
        Load precomputed result from a lookup table
    Otherwise:
        Set result to 0 (overflow)
    If x was negative and y is odd, negate the result
C code:
#define POW9(x) x * x * x * x * x * x * x * x * x
#define POW10(x) POW9(x) * x
#define POW11(x) POW10(x) * x
#define POW12(x) POW11(x) * x
#define POW13(x) POW12(x) * x
#define POW14(x) POW13(x) * x
#define POW15(x) POW14(x) * x
#define POW16(x) POW15(x) * x
#define POW17(x) POW16(x) * x
#define POW18(x) POW17(x) * x
#define POW19(x) POW18(x) * x
int mypow(int x, unsigned y)
{
    static int table[8][11] = {
        {POW9(3), POW10(3), POW11(3), POW12(3), POW13(3), POW14(3), POW15(3), POW16(3), POW17(3), POW18(3), POW19(3)},
        {POW9(4), POW10(4), POW11(4), POW12(4), POW13(4), POW14(4), POW15(4), 0, 0, 0, 0},
        {POW9(5), POW10(5), POW11(5), POW12(5), POW13(5), 0, 0, 0, 0, 0, 0},
        {POW9(6), POW10(6), POW11(6), 0, 0, 0, 0, 0, 0, 0, 0},
        {POW9(7), POW10(7), POW11(7), 0, 0, 0, 0, 0, 0, 0, 0},
        {POW9(8), POW10(8), 0, 0, 0, 0, 0, 0, 0, 0, 0},
        {POW9(9), 0, 0, 0, 0, 0, 0, 0, 0, 0, 0},
        {POW9(10), 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}
    };

    int is_neg;
    int r;

    switch (x)
    {
    case 0:
        return y == 0 ? 1 : 0;
    case 1:
        return 1;
    case -1:
        return y % 2 == 0 ? 1 : -1;
    case 2:
        return 1 << y;
    case -2:
        return (y % 2 == 0 ? 1 : -1) << y;
    default:
        switch (y)
        {
        case 0:
            return 1;
        case 1:
            return x;
        case 2:
            return x * x;
        case 3:
            return x * x * x;
        case 4:
            r = x * x;
            return r * r;
        case 5:
            r = x * x;
            return r * r * x;
        case 6:
            r = x * x;
            return r * r * r;
        case 7:
            r = x * x;
            return r * r * r * x;
        case 8:
            r = x * x;
            r = r * r;
            return r * r;
        default:
            is_neg = x < 0;
            if (is_neg)
                x = -x;
            if (x <= 10 && y <= 19)
                r = table[x - 3][y - 9];
            else
                r = 0;
            if (is_neg && y % 2 == 1)
                r = -r;
            return r;
        }
    }
}
My case is a little different: I'm trying to create a mask from a power, but I thought I'd share the solution I found anyway.
Obviously, it only works for powers of 2.

Mask1 = 1 << (Exponent - 1);   // e.g. Exponent = 5: Mask1 = 0b10000
Mask2 = Mask1 - 1;             //                    Mask2 = 0b01111
return Mask1 + Mask2;          // (1 << Exponent) - 1 = 0b11111
In case you know the exponent (and it is an integer) at compile-time, you can use templates to unroll the loop. This can be made more efficient, but I wanted to demonstrate the basic principle here:
#include <cstdlib>
#include <iostream>

template<unsigned long N>
unsigned long inline exp_unroll(unsigned base) {
    return base * exp_unroll<N-1>(base);
}
We terminate the recursion using a template specialization:
template<>
unsigned long inline exp_unroll<1>(unsigned base) {
    return base;
}
Only the base needs to be supplied at runtime; the exponent is fixed at compile time:
int main(int argc, char * argv[]) {
    std::cout << argv[1] << "**5 = " << exp_unroll<5>(atoi(argv[1])) << std::endl;
}
I've noticed something strange about the standard exponentiation-by-squaring algorithm with GNU GMP:
I implemented two nearly identical functions: a power-modulo function using the most vanilla binary exponentiation-by-squaring algorithm,
labeled ______2(),
then another one with basically the same concept, but re-mapped to dividing by 10 at each round instead of dividing by 2,
labeled ______10().
( time ( jot - 1456 9999999999 6671 | pvE0 |
gawk -Mbe '
function ______10(_, __, ___, ____, _____, _______) {
__ = +__
____ = (____+=_____=____^= \
(_ %=___=+___)<_)+____++^____--
while (__) {
if (_______= __%____) {
if (__==_______) {
return (_^__ *_____) %___
}
__-=_______
_____ = (_^_______*_____) %___
}
__/=____
_ = _^____%___
}
}
function ______2(_, __, ___, ____, _____) {
__=+__
____+=____=_____^=(_%=___=+___)<_
while (__) {
if (__ %____) {
if (__<____) {
return (_*_____) %___
}
_____ = (_____*_) %___
--__
}
__/=____
_= (_*_) %___
}
}
BEGIN {
OFMT = CONVFMT = "%.250g"
__ = (___=_^= FS=OFS= "=")(_<_)
_____ = __^(_=3)^--_ * ++_-(_+_)^_
______ = _^(_+_)-_ + _^!_
_______ = int(______*_____)
________ = 10 ^ 5 + 1
_________ = 8 ^ 4 * 2 - 1
}
GNU Awk 5.1.1, API: 3.1 (GNU MPFR 4.1.0, GNU MP 6.2.1)
.
($++NF = ______10(_=$___, NR %________ +_________,_______*(_-11))) ^!___'
out9: 48.4MiB 0:00:08 [6.02MiB/s] [6.02MiB/s] [ <=> ]
in0: 15.6MiB 0:00:08 [1.95MiB/s] [1.95MiB/s] [ <=> ]
( jot - 1456 9999999999 6671 | pvE 0.1 in0 | gawk -Mbe ; )
8.31s user 0.06s system 103% cpu 8.058 total
ffa16aa937b7beca66a173ccbf8e1e12 stdin
($++NF = ______2(_=$___, NR %________ +_________,_______*(_-11))) ^!___'
out9: 48.4MiB 0:00:12 [3.78MiB/s] [3.78MiB/s] [<=> ]
in0: 15.6MiB 0:00:12 [1.22MiB/s] [1.22MiB/s] [ <=> ]
( jot - 1456 9999999999 6671 | pvE 0.1 in0 | gawk -Mbe ; )
13.05s user 0.07s system 102% cpu 12.821 total
ffa16aa937b7beca66a173ccbf8e1e12 stdin
For reasons extremely counter-intuitive and unknown to me, for a wide variety of inputs I threw at it, the div-10 variant is nearly always faster. It's the matching of the output hashes between the two that made it truly baffling, despite computers obviously not being built in and for a base-10 paradigm.
Am I missing something critical or obvious in the code/approach that might be skewing the results in a confounding manner? Thanks.
