GNU Scientific Library, Power function efficiency - c

I was going through the GSL library. Here is the function it uses to raise a double to an integer power:
double gsl_pow_int(double x, int n)
{
double value = 1.0;
if(n < 0) {
x = 1.0/x;
n = -n;
}
/* repeated squaring method
* returns 0.0^0 = 1.0, so continuous in x
*/
do {
if(n & 1) value *= x; /* for n odd */
n >>= 1;
x *= x;
} while (n);
return value;
}
But wouldn't it be more efficient if they used this?
double gsl_pow_int(double x, int n)
{
double value = 1.0;
if(n < 0) {
x = 1.0/x;
n = -n;
}
/* repeated squaring method
* returns 0.0^0 = 1.0, so continuous in x
*/
do{
if(--n)value*=x;
}while(n);
return value;
}

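Your code doesn't even compute the right result. if(--n) value*=x; multiplies only n-1 times, so it returns x^(n-1) (for negative powers too, after the inversion). For n == 0 the first --n makes n negative, so the loop effectively never ends. How can you claim that your code is optimised?
Also, just removing whitespace from your program doesn't make your code more optimised. Their code is more readable and better indented than yours, it is correct for negative powers too, and it is much more efficient.
Also, & and >> are cheap operations, but the real gain is algorithmic: the GSL loop squares x and halves n on every pass, so it runs only about log2(n) times, whereas a loop that multiplies by x once per pass runs n times.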
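To make the difference concrete, here is a small sketch (not GSL code). pow_by_squaring follows the same loop as gsl_pow_int above, pow_linear does one multiplication per unit of n, and squaring_iterations counts how many passes the squaring loop makes. All three names are hypothetical, and n is assumed not to be INT_MIN (negating it would overflow).

```c
/* Same loop as gsl_pow_int: consume n one binary digit at a time. */
double pow_by_squaring(double x, int n)
{
    double value = 1.0;
    if (n < 0) {              /* assumes n != INT_MIN */
        x = 1.0 / x;
        n = -n;
    }
    do {
        if (n & 1) value *= x;    /* current bit of n is set */
        n >>= 1;                  /* move on to the next bit */
        x *= x;                   /* x, x^2, x^4, x^8, ... */
    } while (n);
    return value;
}

/* Naive method for n >= 0: one multiplication per unit of n, O(n). */
double pow_linear(double x, int n)
{
    double value = 1.0;
    for (int i = 0; i < n; i++)
        value *= x;
    return value;
}

/* Number of passes the do/while above makes for a non-negative n:
   one per binary digit of n (and one for n == 0). */
int squaring_iterations(unsigned n)
{
    int count = 0;
    do {
        n >>= 1;
        count++;
    } while (n);
    return count;
}
```

For n = 1000000 the squaring loop makes 20 passes (1000000 has 20 binary digits), while the linear loop makes 1000000.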

Related

Time Complexity for Power Function in C

For the Power function below, I get "time limit exceeded".
I can see other solutions to this problem here, but I want to know why my implementation exceeds the time limit.
double Power(double x, int n)
{
if (n == 0) return 1;
if (x == 0) return 0;
double result = x;
int temp = n;
if (temp < 0)
{
temp = temp * -1;
}
for (int i = 1; i < temp; i++)
{
result *= x;
}
if (n < 0)
{
result = 1 / result;
}
return result;
}
Your algorithm is very slow for large values of n. You're doing n multiplications to get the power, so this is O(n) complexity.
p = x*x*x*...*x   (n factors of x)
You can speed up the calculation by grouping the values. For example, you could calculate the square of x once and then multiply n/2 copies of that square together (note that you need one extra factor of x at the end if n is odd).
x2 = x*x
p = x2*x2*...*x2 (*x)   (n/2 factors of x2, plus one x if n is odd)
With this you need only about n/2 + 1 multiplications. That is still O(n), but roughly twice as fast in the limit of large n.
As you might guess, you can group the values even further and reuse those grouped powers, which leads to O(log n) time complexity, as @dbush pointed out in a comment on your question:
double Power(double x, int n) {
double result = 1.0;
double group;
if ( x == 0.0 ) {
return 0.0;
}
if ( n < 0 ) {
n = -n;
group = 1.0/x;
} else {
group = x;
}
while ( n > 0 ) {
if ( n % 2 ) {
result *= group;
}
n = n/2;
group *= group;
}
return result;
}
This algorithm keeps squaring the value of group and multiplies it into the result whenever the current lowest bit of n is 1 (n % 2 != 0).
Note: there is also a constant-time, O(1), implementation of the power function (e.g. pow from math.h). It makes use of the fact that doubles have only limited precision. For x > 0 the power can be written as
pow(x,n) = exp(n*log(x))
and the exponential exp as well as the natural logarithm log can be calculated in constant time (see my answer to this question for example). For small integer values of n, the algorithm above is faster, though.

Program for finding nth root of the number without any external library or header like math.h

Is there any way to find the nth root of a number without any external library in C? I'm working on bare-metal code, so there is no OS, and no complete C standard library either.
You can write a program like this for the nth root. This one computes the integer (floor) square root:
int floorSqrt(int x)
{
    // Base cases
    if (x == 0 || x == 1)
        return x;
    // Starting from 1, try all numbers until i*i is greater than x.
    // long long avoids overflow of i*i when x is close to INT_MAX.
    long long i = 1, result = 1;
    while (result <= x)
    {
        i++;
        result = i * i;
    }
    return (int)(i - 1);
}
You can use the same approach for nth root.
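The linear scan above needs about sqrt(x) steps, and more for larger roots of large numbers, so for the nth root a binary search over the answer is a common alternative. This is a sketch of that swapped-in technique (not the code above), using only integer arithmetic so it needs no library. floor_nth_root and pow_exceeds are hypothetical names, and n >= 1 is assumed.

```c
#include <stdint.h>

/* Returns 1 if base^n > limit, 0 otherwise, without ever overflowing. */
static int pow_exceeds(uint64_t base, unsigned n, uint64_t limit)
{
    uint64_t acc = 1;
    for (unsigned k = 0; k < n; k++) {
        if (base != 0 && acc > limit / base)
            return 1;              /* acc * base would already exceed limit */
        acc *= base;
    }
    return 0;
}

/* floor(x^(1/n)) for n >= 1, by binary search on the answer.
   Invariant: lo^n <= x < hi^n. */
uint32_t floor_nth_root(uint32_t x, unsigned n)
{
    if (n == 0)
        return 0;                  /* 0th root is undefined; arbitrary choice */
    uint64_t lo = 0, hi = (uint64_t)x + 1;
    while (hi - lo > 1) {
        uint64_t mid = lo + (hi - lo) / 2;
        if (pow_exceeds(mid, n, x))
            hi = mid;
        else
            lo = mid;
    }
    return (uint32_t)lo;
}
```

Each step halves the interval, so a 32-bit input needs at most about 33 steps, each costing at most n multiplications.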
Here is a C implementation of the nth root algorithm you can find on Wikipedia. It needs an exponentiation algorithm, so I also include an implementation of a basic method for exponentiation by squaring, which you can also find on Wikipedia.
double npower(double const base, int const n)
{
    if (n < 0) return npower(1/base, -n);
    else if (n == 0) return 1.0;
    else if (n == 1) return base;
    else if (n % 2) return base*npower(base*base, n/2);
    else return npower(base*base, n/2);
}
double nroot(double const base, int const n)
{
    if (n == 1) return base;
    else if (n <= 0 || base < 0) return NAN;
    else if (base == 0) return 0.0;  // avoids dividing by npower(0, n-1) == 0
    else {
        double delta, x = base/n;
        do {
            delta = (base/npower(x,n-1)-x)/n;
            x += delta;
        } while ((delta < 0 ? -delta : delta) >= 1e-8);  // |delta| without fabs() from <math.h>
        return x;
    }
}
Some comments on this:
The nth root algorithm on Wikipedia leaves freedom for the initial guess. In this example I set it up to be base/n, but this was just a guess.
The macro NAN is usually defined in <math.h>, so you would need to define it to be suitable for your needs.
Both functions are implemented in a very rough and simple way, and their performance can be greatly improved with careful thought.
The tolerance in this example is set to 1e-8, which is arbitrary and should be adapted. It should probably be proportional to the value of the base.
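Following the last two comments, here is a sketch of the same Newton iteration with a relative tolerance and an iteration cap, still without <math.h>. nroot_rel, ipow_d and abs_d are hypothetical names, rel_tol and max_iter are illustrative parameters rather than values from the answer above, and it assumes base >= 0 and n >= 1.

```c
/* Integer power by squaring (n >= 0), no <math.h> needed. */
static double ipow_d(double base, unsigned n)
{
    double result = 1.0;
    while (n) {
        if (n & 1u) result *= base;
        base *= base;
        n >>= 1;
    }
    return result;
}

static double abs_d(double v) { return v < 0 ? -v : v; }

/* Newton's method for the nth root, stopping when the step is small
   RELATIVE to the current estimate, or after max_iter steps. */
double nroot_rel(double base, unsigned n, double rel_tol, int max_iter)
{
    if (n == 1 || base == 0.0)
        return base;
    /* avoid a tiny starting guess, which would make the first step huge */
    double x = (base / n > 1.0) ? base / n : 1.0;
    for (int i = 0; i < max_iter; i++) {
        double delta = (base / ipow_d(x, n - 1) - x) / n;
        x += delta;
        if (abs_d(delta) <= rel_tol * x)
            break;
    }
    return x;
}
```

With a relative tolerance the same setting works for both tiny and huge bases, which an absolute 1e-8 cannot do.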
You can try this nth_root C function:
// return a number that, when multiplied by itself nth times, makes N.
unsigned nth_root(const unsigned n, const unsigned nth) {
unsigned a = n, b, c, r = nth ? n + (n > 1) : n == 1 ;
for (; a < r; b = a + (nth - 1) * r, a = b / nth)
for (r = a, a = n, c = nth - 1; c && (a /= r); --c);
return r;
}
Source

For loop with unsigned int

I have a logic problem in my code. Maybe it is caused by overflow, but I can't solve it on my own, so I would be thankful if anyone can help me.
In the following piece of code, I have implemented the function taylor_log(), which computes "n" iterations of the Taylor polynomial. In the void function I am looking for the number of iterations (*limit) that is enough to compute the logarithm with the desired accuracy, compared to the log function from <math.h>.
The thing is that sometimes UINT_MAX iterations are not enough to reach the desired accuracy, and at this point I want to let the user know that the number of needed iterations is higher than UINT_MAX. But my code doesn't work, for example for x = 1e+280, eps = 623. It just counts and counts and never gives a result.
TaylorPolynomial
double taylor_log(double x, unsigned int n){
double f_sum = 1.0;
double sum = 0.0;
for (unsigned int i = 1; i <= n; i++)
{
f_sum *= (x - 1) / x;
sum += f_sum / i;
}
return sum;
}
void guessIt(double x, double eps, unsigned int *limit){
*limit = 10;
double real_log = log(x);
double t_log = taylor_log(x, *limit);
while(myabs(real_log - t_log) > eps)
{
if (*limit == UINT_MAX)
{
*limit = 0;
break;
}
if (*limit >= UINT_MAX/2)
{
*limit = UINT_MAX;
t_log = taylor_log(x, *limit);
}
else
{
*limit = (*limit) *2;
t_log = taylor_log(x, *limit);
}
}
}
EDIT: Ok guys, thanks for your reactions so far. I have changed my code to this:
if (*limit == UINT_MAX-1)
{
*limit = 0;
break;
}
if (*limit >= UINT_MAX/2)
{
*limit = UINT_MAX-1;
t_log = taylor_log(x, *limit);
}
but it still doesn't work correctly. I put a printf at the beginning of the taylor_log() function to see the value of "n", and it's (..., 671088640, 1342177280, 2684354560, 5, 4, 3, 2, 2, 1, 2013265920, ...). I don't understand it.
This code below assigns the limit to UINT_MAX
if (*limit >= UINT_MAX/2)
{
*limit = UINT_MAX;
t_log = taylor_log(x, *limit);
}
And your for loop is defined like this:
for (unsigned int i = 1; i <= n; i++)
i will ALWAYS be less than or equal to UINT_MAX, because UINT_MAX is the largest value an unsigned int can hold. So when n == UINT_MAX the condition i <= n is always true: once i reaches UINT_MAX, i++ wraps around to zero and the loop starts over, so the exit condition is never met.
You should change your loop condition to i < n or change your limit to UINT_MAX - 1.
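Another way to keep the full range, including n == UINT_MAX, is to test for the last iteration after the loop body instead of before it, so i never needs to go past n. A sketch (taylor_log_safe is a hypothetical name; the series is the same one as taylor_log in the question). It only guarantees that the loop ends; it does not make the series converge any faster, and n == UINT_MAX still means about four billion iterations.

```c
/* Same series as taylor_log, but safe for n == UINT_MAX:
   the exit test runs after the body, so i++ never wraps around. */
double taylor_log_safe(double x, unsigned int n)
{
    double f_sum = 1.0;
    double sum = 0.0;
    if (n == 0)
        return 0.0;
    for (unsigned int i = 1; ; i++) {
        f_sum *= (x - 1) / x;
        sum += f_sum / i;
        if (i == n)        /* last term done: stop before i++ can wrap to 0 */
            break;
    }
    return sum;
}
```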
[Edit]
OP coded correctly, but must ensure a limited range (0.5 < x < 2.0 ?).
Below is a code version that determines by itself when to stop. The iteration count gets high for x near 0.5 and 2.0, going into the millions. Hence the alternatives coded further below.
double taylor_logA(double x) {
double f_sum = 1.0;
double sum = 0.0;
for (unsigned int i = 1; ; i++) {
f_sum *= (x - 1) / x;
double sum_before = sum;
sum += f_sum / i;
if (sum_before == sum) {
printf("%u\n", i); /* i is unsigned, so %u; needs <stdio.h> */
break;
}
}
return sum;
}
Alternative implementation of the series: Ref
Sample alternative - it converges faster.
double taylor_log2(double x, unsigned int n) {
double f_sum = 1.0;
double sum = 0.0;
for (unsigned int i = 1; i <= n; i++) {
f_sum *= x - 1; // (x - 1)^i
if (i & 1) sum += f_sum / i;
else sum -= f_sum / i; // subtract even terms
}
return sum;
}
With a reasonable number of terms, this converges as needed.
Alternatively, continue until the terms are too small to change the sum (maybe 50 or so terms):
double taylor_log3(double x) {
double f_sum = 1.0;
double sum = 0.0;
for (unsigned int i = 1; ; i++) {
double sum_before = sum;
f_sum *= x - 1;
if (i & 1) sum += f_sum / i;
else sum -= f_sum / i;
if (sum_before == sum) {
printf("%u\n", i); /* i is unsigned, so %u; needs <stdio.h> */
break;
}
}
return sum;
}
Other improvements are possible; for an example, see More efficient series.
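One commonly used faster-converging choice (not necessarily the one behind the link above) is ln(x) = 2*(z + z^3/3 + z^5/5 + ...) with z = (x-1)/(x+1), which converges for every x > 0. Here is a sketch in the same stop-when-the-sum-stops-changing style as taylor_log3; log_atanh_series is a hypothetical name. For very large or very small x, z gets close to ±1 and convergence slows down again, so it still pays to combine it with range reduction.

```c
/* ln(x) for x > 0 using ln(x) = 2 * sum_{k>=0} z^(2k+1) / (2k+1),
   where z = (x - 1) / (x + 1). Each term is about z*z times the
   previous one, so near x == 1 only a handful of terms are needed. */
double log_atanh_series(double x)
{
    double z = (x - 1.0) / (x + 1.0);
    double z2 = z * z;
    double power = z;              /* z^(2k+1) */
    double sum = 0.0;
    for (unsigned k = 0; ; k++) {
        double before = sum;
        sum += power / (2 * k + 1);
        if (sum == before)         /* term too small to change the sum */
            break;
        power *= z2;
    }
    return 2.0 * sum;
}
```

For x = 2, z = 1/3, so each term is about 1/9 of the previous one and roughly 16 terms reach double precision.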
First, using std::numeric_limits<unsigned int>::max() will make your code more C++-ish than C-ish. Second, you can use the integral type unsigned long long and std::numeric_limits<unsigned long long>::max() for the limit, which is pretty much the largest limit for an integral type. If you want a higher limit, you may use long double. Floating-point types also allow you to use infinity, via std::numeric_limits<double>::infinity(); note that infinity works with double, float and long double.
If none of these types provides the precision you need, look at boost::multiprecision.
First of all, the Taylor series for the logarithm function only converges for values of 0 < x < 2, so it's quite possible that the eps precision is never hit.
Secondly, are you sure that it loops forever, instead of hitting the *limit >= UINT_MAX/2 after a very long time?
OP is using the series well outside its usable range of 0.5 < x < 2.0, with calls like taylor_log(1e280, n).
Even within the range, x values near the limits of 0.5 and 2.0 converge very slowly, needing millions of iterations or more, and a precise log() will not result. It is best to use a range around 1.0 whose ends differ by a factor of 2.
Create a wrapper function to call the original function in its sweet range of sqrt(2)/2 < x < sqrt(2). Converges, worst case, with about 40 iterations.
#include <math.h> // frexp()
#define SQRT_0_5 0.70710678118654752440084436210485
#define LN2 0.69314718055994530941723212145818
// Valid over the range (0...DBL_MAX]
double taylor_logB(double x, unsigned int n) {
int expo;
double signif = frexp(x, &expo);
if (signif < SQRT_0_5) {
signif *= 2;
expo--;
}
double y = taylor_log(signif,n);
y += expo*LN2;
return y;
}

Fast binary exponentiation implementation in OpenCL

I've been trying to design a fast binary exponentiation implementation in OpenCL. My current implementation is very similar to the one in this book about pi.
// Returns 16^n mod ak
inline double expm (long n, double ak)
{
double r = 16.0;
long nt;
if (ak == 1) return 0.;
if (n == 0) return 1;
if (n == 1) return fmod(16.0, ak);
for (nt=1; nt <= n; nt <<=1);
nt >>= 2;
do
{
r = fmod(r*r, ak);
if ((n & nt) != 0)
r = fmod(16.0*r, ak);
nt >>= 1;
} while (nt != 0);
return r;
}
Is there room for improvement? Right now my program is spending the vast majority of its time in this function.
My first thought is to vectorize it, for a potential speedup of ~1.6x. This uses 5 multiplies per loop compared to 2 multiplies in the original, but needs approximately a quarter of the number of loops for sufficiently large n. Converting all the doubles to longs and swapping out the fmods for % may provide some speedup, depending on the exact GPU used and so on.
inline double expm(long n, double ak) {
    double4 akv = (double4)(ak);                //ak in every lane
    double4 r = (double4)(1.0, 1.0, 1.0, 1.0);  //vector literals need the (double4) cast;
                                                //a bare (a, b, c, d) is the comma operator
    long4 ns = n & (long4)(0x1111111111111111, 0x2222222222222222, 0x4444444444444444,
                           0x8888888888888888);
    long nt;
    if(ak == 1) return 0.;
    for(nt=15; nt<n; nt<<=4); //This can probably be vectorized somehow as well.
    do {
        //Raise r to the 16th power, same as multiplying the exponent
        //(of the result) by 16, same as bitshifting the exponent
        //to the left by 4 bits. Reducing after every squaring keeps
        //each product below ak*ak, as in the original.
        double4 tmp = fmod(r*r, akv);
        tmp = fmod(tmp*tmp, akv);
        tmp = fmod(tmp*tmp, akv);
        r = fmod(tmp*tmp, akv);
        r = select(fmod(r*(double4)(16.0, 256.0, 65536.0, 4294967296.0), akv), r, (ns & nt) - 1);
        nt >>= 4;
    } while(nt != 0); //Process n four bits at a time.
    //And then combine all of them, reducing after each product.
    return fmod(fmod(r.x*r.y, ak)*fmod(r.z*r.w, ak), ak);
}
Edit: I'm pretty sure it works now.
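The loop that computes nt (essentially log2(n)) can be avoided by processing the bits from the least significant end instead, with
if (n & 1) ...; n >>= 1;
inside the do-while loop.
Given that initially r = 16, the reductions in fmod(r*r, ak) and fmod(16*r, ak) could be delayed, computing the modulo only every Nth iteration or so, as long as the intermediate values stay exactly representable. Loop unrolling?
Also, why fmod?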
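Here is a rough sketch of the first and last suggestions combined: right-to-left binary exponentiation (so no loop to find the top bit) on integers, with % instead of fmod. It is plain C99 (in OpenCL C, uint64_t would be ulong), expm_int is a hypothetical name, and it assumes 1 <= ak < 2^32 so that the product of two residues fits in 64 bits. Whether it beats the double version will depend on the GPU.

```c
#include <stdint.h>

/* 16^n mod ak by right-to-left binary exponentiation.
   Assumes 1 <= ak < 2^32, so (ak-1)*(ak-1) fits in uint64_t. */
uint64_t expm_int(uint64_t n, uint64_t ak)
{
    uint64_t result = 1 % ak;      /* 0 when ak == 1 */
    uint64_t base = 16 % ak;       /* 16^(2^k) mod ak for the current bit k */
    while (n) {
        if (n & 1)                 /* this bit of n contributes 16^(2^k) */
            result = (result * base) % ak;
        n >>= 1;
        base = (base * base) % ak;
    }
    return result;
}
```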

The most efficient way to implement an integer based power function pow(int, int)

What is the most efficient way to raise an integer to the power of another integer in C?
// 2^3
pow(2,3) == 8
// 5^5
pow(5,5) == 3125
Exponentiation by squaring.
int ipow(int base, int exp)
{
int result = 1;
for (;;)
{
if (exp & 1)
result *= base;
exp >>= 1;
if (!exp)
break;
base *= base;
}
return result;
}
This is the standard method for doing modular exponentiation for huge numbers in asymmetric cryptography.
Note that exponentiation by squaring is not the most optimal method. It is probably the best you can do as a general method that works for all exponent values, but for a specific exponent value there might be a better sequence that needs fewer multiplications.
For instance, if you want to compute x^15, the method of exponentiation by squaring will give you:
x^15 = (x^7)*(x^7)*x
x^7 = (x^3)*(x^3)*x
x^3 = x*x*x
This is a total of 6 multiplications.
It turns out this can be done using "just" 5 multiplications via addition-chain exponentiation:
x*x = x^2
x^2*x = x^3
x^3*x^3 = x^6
x^6*x^6 = x^12
x^12*x^3 = x^15
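The chain above written out in C, as a sketch: pow15_chain is a hypothetical name, and long long is an arbitrary choice that overflows for larger |x|. Note that x^3 is computed once and used twice, which is exactly the sharing that exponentiation by squaring (6 multiplications here) does not exploit.

```c
/* x^15 using the addition chain 1, 2, 3, 6, 12, 15: five multiplications. */
long long pow15_chain(long long x)
{
    long long x2  = x * x;      /* multiplication 1: x^2  */
    long long x3  = x2 * x;     /* multiplication 2: x^3  */
    long long x6  = x3 * x3;    /* multiplication 3: x^6  */
    long long x12 = x6 * x6;    /* multiplication 4: x^12 */
    return x12 * x3;            /* multiplication 5: x^15 (reuses x^3) */
}
```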
No efficient algorithm is known for finding this optimal sequence of multiplications. From Wikipedia:
The problem of finding the shortest addition chain cannot be solved by dynamic programming, because it does not satisfy the assumption of optimal substructure. That is, it is not sufficient to decompose the power into smaller powers, each of which is computed minimally, since the addition chains for the smaller powers may be related (to share computations). For example, in the shortest addition chain for a¹⁵ above, the subproblem for a⁶ must be computed as (a³)² since a³ is re-used (as opposed to, say, a⁶ = a²(a²)², which also requires three multiplies).
If you need to raise 2 to a power, the fastest way to do so is to bit shift by the power:
2 ** 3 == 1 << 3 == 8
2 ** 30 == 1 << 30 == 1073741824 (1 GiB, a binary gigabyte)
Here is the method in Java
private int ipow(int base, int exp)
{
int result = 1;
while (exp != 0)
{
if ((exp & 1) == 1)
result *= base;
exp >>= 1;
base *= base;
}
return result;
}
An extremely specialized case is when you need 2 raised to a power that is negative, or too large to compute by shifting an int. You can still compute 2^x in constant time by manipulating the bits of a float.
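// Assumes IEEE-754 single precision and a compiler that allocates
// bit-fields starting from the least significant bit (implementation-defined).
struct IeeeFloat
{
    unsigned int base : 23;      // fraction (mantissa) bits
    unsigned int exponent : 8;   // biased exponent
    unsigned int signBit : 1;
};
union IeeeFloatUnion
{
    IeeeFloat brokenOut;
    float f;
};
inline float twoToThe(char exponent)
{
    // A (signed) char limits the input range, but only exponents from
    // -126 to 127 give a normal float; e.g. -128 wraps the exponent
    // field to 255, which is infinity.
    IeeeFloatUnion u;
    u.f = 2.0;
    // Change the exponent part of the float
    u.brokenOut.exponent += (exponent - 1);
    return (u.f);
}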
You can get more powers of 2 by using a double as the base type.
(Thanks a lot to commenters for helping to square this post away).
By learning more about IEEE floats, you may also find other special cases of exponentiation that can be handled this way.
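A more portable variant of the same idea, sketched in C: instead of a union with bit-fields (whose bit order is implementation-defined), it builds the bit pattern in a uint32_t and copies it into a float with memcpy. It assumes float is IEEE-754 binary32; two_to_the_bits is a hypothetical name. If <math.h> is available, the standard ldexpf(1.0f, e) computes the same value.

```c
#include <stdint.h>
#include <string.h>

/* 2^e as a float, built directly from IEEE-754 binary32 bits
   (1 sign bit, 8 exponent bits with bias 127, 23 fraction bits). */
float two_to_the_bits(int e)
{
    uint32_t bits;
    if (e > 127)
        bits = UINT32_C(0xFF) << 23;        /* too large: +infinity */
    else if (e >= -126)
        bits = (uint32_t)(e + 127) << 23;   /* normal: biased exponent, fraction 0 */
    else if (e >= -149)
        bits = UINT32_C(1) << (e + 149);    /* subnormal: a single fraction bit */
    else
        bits = 0;                           /* too small: +0.0 */
    float f;
    memcpy(&f, &bits, sizeof f);            /* well-defined, unlike some punning */
    return f;
}
```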
power() function that works for integers only:
int power(int base, unsigned int exp){
if (exp == 0)
return 1;
int temp = power(base, exp/2);
if (exp%2 == 0)
return temp*temp;
else
return base*temp*temp;
}
Complexity = O(log(exp))
power() function that works for a negative exp and a float base:
float power(float base, int exp) {
if( exp == 0)
return 1;
float temp = power(base, exp/2);
if (exp%2 == 0)
return temp*temp;
else {
if(exp > 0)
return base*temp*temp;
else
return (temp*temp)/base; //negative exponent computation
}
}
Complexity = O(log(exp))
If you want to get the value of an integer for 2 raised to the power of something it is always better to use the shift option:
pow(2,5) can be replaced by 1<<5
This is much more efficient.
int pow( int base, int exponent)
{ // Does not work for negative exponents. (But that would be leaving the range of int)
if (exponent == 0) return 1; // base case;
int temp = pow(base, exponent/2);
if (exponent % 2 == 0)
return temp * temp;
else
return (base * temp * temp);
}
Just as a follow up to comments on the efficiency of exponentiation by squaring.
The advantage of that approach is that it runs in log(n) time. For example, if you were going to calculate something huge, such as x^1048575 (2^20 - 1), you only have to go through the loop 20 times, not 1 million+ times as with the naive approach.
Also, in terms of code complexity, it is simpler than trying to find the most optimal sequence of multiplications, a la Pramod's suggestion.
Edit:
I guess I should clarify before someone tags me for the potential for overflow. This approach assumes that you have some sort of hugeint library.
Late to the party:
Below is a solution that also deals with y < 0 as well as it can.
It uses a result of intmax_t for maximum range. There is no provision for answers that do not fit in intmax_t.
powjii(0, 0) --> 1 which is a common result for this case.
pow(0,negative), another undefined result, returns INTMAX_MAX
intmax_t powjii(int x, int y) {
if (y < 0) {
switch (x) {
case 0:
return INTMAX_MAX;
case 1:
return 1;
case -1:
return y % 2 ? -1 : 1;
}
return 0;
}
intmax_t z = 1;
intmax_t base = x;
for (;;) {
if (y % 2) {
z *= base;
}
y /= 2;
if (y == 0) {
break;
}
base *= base;
}
return z;
}
This code uses a forever loop for(;;) to avoid the final base *= base common in other looped solutions. That multiplication is 1) not needed and 2) could overflow intmax_t, which is UB because intmax_t is a signed type.
A more generic solution, considering negative exponents:
private static int pow(int base, int exponent) {
int result = 1;
if (exponent == 0)
return result; // base case;
if (exponent < 0)
return 1 / pow(base, -exponent);
int temp = pow(base, exponent / 2);
if (exponent % 2 == 0)
return temp * temp;
else
return (base * temp * temp);
}
The O(log N) solution in Swift...
// Time complexity is O(log N)
func power(_ base: Int, _ exp: Int) -> Int {
    // 0. Anything raised to 0 is 1 (without this base case, power(base, 0) would recurse forever). Assumes exp >= 0.
    if exp == 0 {
        return 1
    }
    // 1. If the exponent is 1 then return the number (e.g. a^1 == a)
    //Time complexity O(1)
    if exp == 1 {
        return base
    }
    // 2. Calculate the value of the number raised to half of the exponent. This will be used to calculate the final answer by squaring the result (e.g. a^2n == (a^n)^2 == a^n * a^n). The idea is that we can do half the amount of work by obtaining a^n and multiplying the result by itself to get a^2n
    //Time complexity O(log N)
    let tempVal = power(base, exp/2)
    // 3. If the exponent was odd then decompose the result in such a way that it allows you to divide the exponent in two (e.g. a^(2n+1) == a^1 * a^2n == a^1 * a^n * a^n). If the exponent is even then the result must be the base raised to half the exponent squared (e.g. a^2n == a^n * a^n = (a^n)^2).
    //Time complexity O(1)
    return (exp % 2 == 1 ? base : 1) * tempVal * tempVal
}
int pow(int const x, unsigned const e) noexcept
{
return !e ? 1 : 1 == e ? x : (e % 2 ? x : 1) * pow(x * x, e / 2);
//return !e ? 1 : 1 == e ? x : (((x ^ 1) & -(e % 2)) ^ 1) * pow(x * x, e / 2);
}
Yes, it's recursive, but a good optimizing compiler can usually optimize the recursion away.
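Strictly speaking, the recursive call above is not a tail call (its result is still multiplied afterwards), so removing the recursion relies on the compiler being clever. Passing an accumulator puts the call in tail position, which optimizing compilers such as GCC and Clang typically turn into a loop, although the C standard does not require it. A sketch in C; pow_acc and pow_tail are hypothetical names.

```c
/* acc holds the product of the factors peeled off so far,
   so the recursive call is the very last thing done. */
static long long pow_acc(long long x, unsigned e, long long acc)
{
    if (e == 0)
        return acc;
    if (e == 1)
        return acc * x;    /* stop before computing an unused x * x */
    return pow_acc(x * x, e / 2, (e % 2) ? acc * x : acc);
}

long long pow_tail(long long x, unsigned e)
{
    return pow_acc(x, e, 1);
}
```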
One more implementation (in Java). It may not be the most efficient solution, but the number of iterations is the same as for exponentiation by squaring.
public static long pow(long base, long exp){
if(exp ==0){
return 1;
}
if(exp ==1){
return base;
}
if(exp % 2 == 0){
long half = pow(base, exp/2);
return half * half;
}else{
long half = pow(base, (exp -1)/2);
return base * half * half;
}
}
I use recursion: if exp is even, 5^10 = 25^5.
int pow(int base, unsigned exp){   // % needs integer operands, so exp cannot be a float
    if (exp==0) return 1;
    else if (exp%2==0){            // even: base^exp == (base*base)^(exp/2)
        return pow(base*base,exp/2);
    }else{                         // odd: peel off one factor of base
        return base*pow(base,exp-1);
    }
}
In addition to the answer by Elias, which causes Undefined Behaviour when implemented with signed integers, and incorrect values for high input when implemented with unsigned integers,
here is a modified version of the Exponentiation by Squaring that also works with signed integer types, and doesn't give incorrect values:
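#include <stdint.h>
#define SQRT_INT64_MAX (INT64_C(0xB504F333))
int64_t alx_pow_s64 (int64_t base, uint8_t exp)
{
    int_fast64_t base_;
    int_fast64_t result;
    base_ = base;
    if (base_ == 1)
        return 1;
    if (!exp)
        return 1;
    if (!base_)
        return 0;
    result = 1;
    if (exp & 1)
        result *= base_;
    exp >>= 1;
    while (exp) {
        // Check both signs: squaring a large negative base_ overflows too.
        if (base_ > SQRT_INT64_MAX || base_ < -SQRT_INT64_MAX)
            return 0;
        base_ *= base_;    // base_ > 0 from here on
        if (exp & 1) {
            // result * base_ would not fit in int64_t
            if (result > INT64_MAX / base_ || result < INT64_MIN / base_)
                return 0;
            result *= base_;
        }
        exp >>= 1;
    }
    return result;
}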
Considerations for this function:
(1 ** N) == 1
(N ** 0) == 1
(0 ** 0) == 1
(0 ** N) == 0
If any overflow or wrapping is going to take place, return 0;
I used int64_t, but any width (signed or unsigned) can be used with little modification. However, if you need to use a non-fixed-width integer type, you will need to replace SQRT_INT64_MAX with (int)sqrt(INT_MAX) (in the case of using int) or something similar, which should be optimized away, but it is uglier and not a C constant expression. Also, casting the result of sqrt() to an int is not very good because of floating-point precision in the case of a perfect square, but as I don't know of any implementation where INT_MAX (or the maximum of any type) is a perfect square, you can live with that.
I have implemented an algorithm that memoizes all computed powers and then uses them when needed. So, for example, x^13 is equal to ((x^2)^2)^2 * (x^2)^2 * x, where (x^2)^2 is taken from the table instead of being computed again. This is basically an implementation of @Pramod's answer (in Java).
The number of multiplications needed is O(log n).
public static int Power(int base, int exp)
{
int tab[] = new int[exp + 1];
tab[0] = 1;
tab[1] = base;
return Power(base, exp, tab);
}
public static int Power(int base, int exp, int tab[])
{
if(exp == 0) return 1;
if(exp == 1) return base;
int i = 1;
while(i < exp/2)
{
if(tab[2 * i] <= 0)
tab[2 * i] = tab[i] * tab[i];
i = i << 1;
}
if(exp <= i)
return tab[i];
else return tab[i] * Power(base, exp - i, tab);
}
Here is an O(1) algorithm for calculating x ** y, inspired by this comment. It works for 32-bit signed int.
For small values of y, it uses exponentiation by squaring. For large values of y, there are only a few values of x where the result doesn't overflow. This implementation uses a lookup table to read the result without calculating.
On overflow, the C standard permits any behavior, including crash. However, I decided to do bound-checking on LUT indices to prevent memory access violation, which could be surprising and undesirable.
Pseudo-code:
If `x` is between -2 and 2, use special-case formulas.
Otherwise, if `y` is between 0 and 8, use special-case formulas.
Otherwise:
Set x = abs(x); remember if x was negative
If x <= 10 and y <= 19:
Load precomputed result from a lookup table
Otherwise:
Set result to 0 (overflow)
If x was negative and y is odd, negate the result
C code:
#define POW9(x) x * x * x * x * x * x * x * x * x
#define POW10(x) POW9(x) * x
#define POW11(x) POW10(x) * x
#define POW12(x) POW11(x) * x
#define POW13(x) POW12(x) * x
#define POW14(x) POW13(x) * x
#define POW15(x) POW14(x) * x
#define POW16(x) POW15(x) * x
#define POW17(x) POW16(x) * x
#define POW18(x) POW17(x) * x
#define POW19(x) POW18(x) * x
int mypow(int x, unsigned y)
{
static int table[8][11] = {
{POW9(3), POW10(3), POW11(3), POW12(3), POW13(3), POW14(3), POW15(3), POW16(3), POW17(3), POW18(3), POW19(3)},
{POW9(4), POW10(4), POW11(4), POW12(4), POW13(4), POW14(4), POW15(4), 0, 0, 0, 0},
{POW9(5), POW10(5), POW11(5), POW12(5), POW13(5), 0, 0, 0, 0, 0, 0},
{POW9(6), POW10(6), POW11(6), 0, 0, 0, 0, 0, 0, 0, 0},
{POW9(7), POW10(7), POW11(7), 0, 0, 0, 0, 0, 0, 0, 0},
{POW9(8), POW10(8), 0, 0, 0, 0, 0, 0, 0, 0, 0},
{POW9(9), 0, 0, 0, 0, 0, 0, 0, 0, 0, 0},
{POW9(10), 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}
};
int is_neg;
int r;
switch (x)
{
case 0:
return y == 0 ? 1 : 0;
case 1:
return 1;
case -1:
return y % 2 == 0 ? 1 : -1;
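case 2:
return y < 31 ? 1 << y : 0; /* 1 << 31 would overflow int: y >= 31 is treated as overflow */
case -2:
return y < 31 ? (y % 2 == 0 ? 1 << y : -(1 << y)) : 0; /* avoids left-shifting a negative value (UB) */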
default:
switch (y)
{
case 0:
return 1;
case 1:
return x;
case 2:
return x * x;
case 3:
return x * x * x;
case 4:
r = x * x;
return r * r;
case 5:
r = x * x;
return r * r * x;
case 6:
r = x * x;
return r * r * r;
case 7:
r = x * x;
return r * r * r * x;
case 8:
r = x * x;
r = r * r;
return r * r;
default:
is_neg = x < 0;
if (is_neg)
x = -x;
if (x <= 10 && y <= 19)
r = table[x - 3][y - 9];
else
r = 0;
if (is_neg && y % 2 == 1)
r = -r;
return r;
}
}
}
My case is a little different: I'm trying to create a mask from a power, but I thought I'd share the solution I found anyway.
Obviously, it only works for powers of 2.
Mask1 = 1 << (Exponent - 1);
Mask2 = Mask1 - 1;
return Mask1 + Mask2;
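A typed version of that snippet, as a sketch: with a 32-bit unsigned type, building the mask from 1 << (Exponent - 1) keeps the shift count below 32, so Exponent == 32 works, whereas computing (1u << 32) - 1 directly would be undefined behaviour. low_bits_mask is a hypothetical name, and 1 <= exponent <= 32 is assumed.

```c
#include <stdint.h>

/* Mask with the low `exponent` bits set, i.e. 2^exponent - 1,
   for 1 <= exponent <= 32. The shift count never reaches 32. */
uint32_t low_bits_mask(unsigned exponent)
{
    uint32_t mask1 = UINT32_C(1) << (exponent - 1);  /* 2^(exponent-1)     */
    uint32_t mask2 = mask1 - 1;                      /* 2^(exponent-1) - 1 */
    return mask1 + mask2;                            /* 2^exponent - 1     */
}
```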
In case you know the exponent (and it is an integer) at compile-time, you can use templates to unroll the loop. This can be made more efficient, but I wanted to demonstrate the basic principle here:
#include <iostream>
#include <cstdlib>   // std::atoi
template<unsigned long N>
unsigned long inline exp_unroll(unsigned base) {
    return base * exp_unroll<N-1>(base);
}
We terminate the recursion using a template specialization:
template<>
unsigned long inline exp_unroll<1>(unsigned base) {
    return base;
}
The exponent needs to be known at compile time:
int main(int argc, char * argv[]) {
    if (argc < 2) return 1;  // expects the base as the first argument
    std::cout << argv[1] << "**5= " << exp_unroll<5>(std::atoi(argv[1])) << std::endl;
}
I've noticed something strange about the standard exponentiation-by-squaring algorithm with GNU GMP:
I implemented 2 nearly identical functions: a power-modulo function using the most vanilla binary exponentiation-by-squaring algorithm, labeled ______2(), and another one with basically the same concept, but re-mapped to dividing by 10 at each round instead of dividing by 2, labeled ______10().
( time ( jot - 1456 9999999999 6671 | pvE0 |
gawk -Mbe '
function ______10(_, __, ___, ____, _____, _______) {
__ = +__
____ = (____+=_____=____^= \
(_ %=___=+___)<_)+____++^____--
while (__) {
if (_______= __%____) {
if (__==_______) {
return (_^__ *_____) %___
}
__-=_______
_____ = (_^_______*_____) %___
}
__/=____
_ = _^____%___
}
}
function ______2(_, __, ___, ____, _____) {
__=+__
____+=____=_____^=(_%=___=+___)<_
while (__) {
if (__ %____) {
if (__<____) {
return (_*_____) %___
}
_____ = (_____*_) %___
--__
}
__/=____
_= (_*_) %___
}
}
BEGIN {
OFMT = CONVFMT = "%.250g"
__ = (___=_^= FS=OFS= "=")(_<_)
_____ = __^(_=3)^--_ * ++_-(_+_)^_
______ = _^(_+_)-_ + _^!_
_______ = int(______*_____)
________ = 10 ^ 5 + 1
_________ = 8 ^ 4 * 2 - 1
}
GNU Awk 5.1.1, API: 3.1 (GNU MPFR 4.1.0, GNU MP 6.2.1)
.
($++NF = ______10(_=$___, NR %________ +_________,_______*(_-11))) ^!___'
out9: 48.4MiB 0:00:08 [6.02MiB/s] [6.02MiB/s] [ <=> ]
in0: 15.6MiB 0:00:08 [1.95MiB/s] [1.95MiB/s] [ <=> ]
( jot - 1456 9999999999 6671 | pvE 0.1 in0 | gawk -Mbe ; )
8.31s user 0.06s system 103% cpu 8.058 total
ffa16aa937b7beca66a173ccbf8e1e12 stdin
($++NF = ______2(_=$___, NR %________ +_________,_______*(_-11))) ^!___'
out9: 48.4MiB 0:00:12 [3.78MiB/s] [3.78MiB/s] [<=> ]
in0: 15.6MiB 0:00:12 [1.22MiB/s] [1.22MiB/s] [ <=> ]
( jot - 1456 9999999999 6671 | pvE 0.1 in0 | gawk -Mbe ; )
13.05s user 0.07s system 102% cpu 12.821 total
ffa16aa937b7beca66a173ccbf8e1e12 stdin
For reasons extremely counter-intuitive and unknown to me, for a wide variety of inputs I threw at it, the div-10 variant is nearly always faster. The matching hashes of the 2 outputs (so both compute the same results) are what make it truly baffling, since computers are obviously not built for a base-10 paradigm.
Am I missing something critical or obvious in the code/approach that might be skewing the results in a confounding manner ? Thanks.
