Optimize C code preventing while loops [duplicate] - c

I'm looking for some nice C code that will accomplish effectively:
while (deltaPhase >= M_PI) deltaPhase -= M_TWOPI;
while (deltaPhase < -M_PI) deltaPhase += M_TWOPI;
What are my options?

Edit Apr 19, 2013:
Modulo function updated to handle boundary cases as noted by aka.nice and arr_sea:
static const double _PI= 3.1415926535897932384626433832795028841971693993751058209749445923078164062862089986280348;
static const double _TWO_PI= 6.2831853071795864769252867665590057683943387987502116419498891846156328125724179972560696;
// Floating-point modulo
// The result (the remainder) has same sign as the divisor.
// Similar to matlab's mod(); Not similar to fmod() - Mod(-3,4)= 1 fmod(-3,4)= -3
template<typename T>
T Mod(T x, T y)
{
static_assert(!std::numeric_limits<T>::is_exact , "Mod: floating-point type expected");
if (0. == y)
return x;
double m= x - y * floor(x/y);
// handle boundary cases resulted from floating-point cut off:
if (y > 0) // modulo range: [0..y)
{
if (m>=y) // Mod(-1e-16 , 360. ): m= 360.
return 0;
if (m<0 )
{
if (y+m == y)
return 0 ; // just in case...
else
return y+m; // Mod(106.81415022205296 , _TWO_PI ): m= -1.421e-14
}
}
else // modulo range: (y..0]
{
if (m<=y) // Mod(1e-16 , -360. ): m= -360.
return 0;
if (m>0 )
{
if (y+m == y)
return 0 ; // just in case...
else
return y+m; // Mod(-106.81415022205296, -_TWO_PI): m= 1.421e-14
}
}
return m;
}
// wrap [rad] angle to [-PI..PI)
inline double WrapPosNegPI(double fAng)
{
return Mod(fAng + _PI, _TWO_PI) - _PI;
}
// wrap [rad] angle to [0..TWO_PI)
inline double WrapTwoPI(double fAng)
{
return Mod(fAng, _TWO_PI);
}
// wrap [deg] angle to [-180..180)
inline double WrapPosNeg180(double fAng)
{
return Mod(fAng + 180., 360.) - 180.;
}
// wrap [deg] angle to [0..360)
inline double Wrap360(double fAng)
{
return Mod(fAng ,360.);
}

One-liner constant-time solution:
Okay, it's a two-liner if you count the second function for [min,max) form, but close enough — you could merge them together anyways.
/* change to `float/fmodf` or `long double/fmodl` or `int/%` as appropriate */
/* wrap x -> [0,max) */
double wrapMax(double x, double max)
{
/* integer math: `(max + x % max) % max` */
return fmod(max + fmod(x, max), max);
}
/* wrap x -> [min,max) */
double wrapMinMax(double x, double min, double max)
{
return min + wrapMax(x - min, max - min);
}
Then you can simply use deltaPhase = wrapMinMax(deltaPhase, -M_PI, +M_PI).
The solution is constant-time, meaning that the time it takes does not depend on how far your value is from [-PI,+PI), for better or for worse.
Verification:
Now, I don't expect you to take my word for it, so here are some examples, including boundary conditions. I'm using integers for clarity, but it works much the same with fmod() and floats:
Positive x:
wrapMax(3, 5) == 3: (5 + 3 % 5) % 5 == (5 + 3) % 5 == 8 % 5 == 3
wrapMax(6, 5) == 1: (5 + 6 % 5) % 5 == (5 + 1) % 5 == 6 % 5 == 1
Negative x:
Note: These assume that integer modulo copies left-hand sign; if not, you get the above ("Positive") case.
wrapMax(-3, 5) == 2: (5 + (-3) % 5) % 5 == (5 - 3) % 5 == 2 % 5 == 2
wrapMax(-6, 5) == 4: (5 + (-6) % 5) % 5 == (5 - 1) % 5 == 4 % 5 == 4
Boundaries:
wrapMax(0, 5) == 0: (5 + 0 % 5) % 5 == (5 + 0) % 5 == 5 % 5 == 0
wrapMax(5, 5) == 0: (5 + 5 % 5) % 5 == (5 + 0) % 5 == 5 % 5 == 0
wrapMax(-5, 5) == 0: (5 + (-5) % 5) % 5 == (5 + 0) % 5 == 5 % 5 == 0
Note: Possibly -0 instead of +0 for floating-point.
The wrapMinMax function works much the same: wrapping x to [min,max) is the same as wrapping x - min to [0,max-min), and then (re-)adding min to the result.
I don't know what would happen with a negative max, but feel free to check that yourself!

If ever your input angle can reach arbitrarily high values, and if continuity matters, you can also try
atan2(sin(x),cos(x))
This will preserve continuity of sin(x) and cos(x) better than modulo for high values of x, especially in single precision (float).
Indeed, exact_value_of_pi - double_precision_approximation ~= 1.22e-16
On the other hand, most library/hardware use a high precision approximation of PI for applying the modulo when evaluating trigonometric functions (though x86 family is known to use a rather poor one).
Result might be in [-pi,pi], you'll have to check the exact bounds.
Personally, I would prevent any angle from reaching several revolutions by wrapping systematically, and stick to an fmod solution like the Boost one.

There is also the fmod function in math.h, but the sign causes trouble, so a subsequent operation is needed to make the result fit in the proper range (like you already do with the while loops). For big values of deltaPhase this is probably faster than subtracting/adding M_TWOPI hundreds of times.
deltaPhase = fmod(deltaPhase, M_TWOPI);
EDIT:
I didn't try it intensively but I think you can use fmod this way by handling positive and negative values differently:
if (deltaPhase>0)
deltaPhase = fmod(deltaPhase+M_PI, 2.0*M_PI)-M_PI;
else
deltaPhase = fmod(deltaPhase-M_PI, 2.0*M_PI)+M_PI;
The computational time is constant (unlike the while solution which gets slower as the absolute value of deltaPhase increases)

I would do this:
double wrap(double x) {
return x-2*M_PI*floor(x/(2*M_PI)+0.5);
}
There will be significant numerical errors. The best solution to the numerical errors is to store your phase scaled by 1/PI or by 1/(2*PI) and depending on what you are doing store them as fixed point.

Instead of working in radians, use angles scaled by 1/(2π) and use modf, floor etc. Convert back to radians to use library functions.
This also means that rotating ten thousand and a half revolutions gives the same result as rotating half a revolution and then ten thousand revolutions. That is not guaranteed if your angles are in radians, because in the scaled form whole revolutions have an exact floating-point representation, whereas in radians you are summing approximate representations:
#include <iostream>
#include <cmath>
float wrap_rads ( float r )
{
while ( r > M_PI ) {
r -= 2 * M_PI;
}
while ( r <= -M_PI ) {
r += 2 * M_PI;
}
return r;
}
float wrap_grads ( float r )
{
float i;
r = modff ( r, &i );
if ( r > 0.5 ) r -= 1;
if ( r <= -0.5 ) r += 1;
return r;
}
int main ()
{
for (int rotations = 1; rotations < 100000; rotations *= 10 ) {
{
float pi = ( float ) M_PI;
float two_pi = 2 * pi;
float a = pi;
a += rotations * two_pi;
std::cout << rotations << " and a half rotations in radians " << a << " => " << wrap_rads ( a ) / two_pi << '\n' ;
}
{
float pi = ( float ) 0.5;
float two_pi = 2 * pi;
float a = pi;
a += rotations * two_pi;
std::cout << rotations << " and a half rotations in grads " << a << " => " << wrap_grads ( a ) / two_pi << '\n' ;
}
std::cout << '\n';
}}

Here is a version for other people finding this question that can use C++ with Boost:
#include <boost/math/constants/constants.hpp>
#include <boost/math/special_functions/sign.hpp>
template<typename T>
inline T normalizeRadiansPiToMinusPi(T rad)
{
// copy the sign of the value in radians to the value of pi
T signedPI = boost::math::copysign(boost::math::constants::pi<T>(),rad);
// set the value of rad to the appropriate signed value between pi and -pi
rad = fmod(rad+signedPI,(2*boost::math::constants::pi<T>())) - signedPI;
return rad;
}
C++11 version, no Boost dependency:
#include <cmath>
// Bring the 'difference' between two angles into [-pi; pi].
template <typename T>
T normalizeRadiansPiToMinusPi(T rad) {
// Copy the sign of the value in radians to the value of pi.
T signed_pi = std::copysign(M_PI,rad);
// Set the value of difference to the appropriate signed value between pi and -pi.
rad = std::fmod(rad + signed_pi,(2 * M_PI)) - signed_pi;
return rad;
}

I encountered this question when searching for how to wrap a floating point value (or a double) between two arbitrary numbers. It didn't answer specifically for my case, so I worked out my own solution, shown below. It takes a given value and wraps it between lowerBound and upperBound, where upperBound meets lowerBound such that they are equivalent (i.e. 360 degrees == 0 degrees, so 360 would wrap to 0).
Hopefully this answer is helpful to others stumbling across this question looking for a more generic bounding solution.
double boundBetween(double val, double lowerBound, double upperBound){
if(lowerBound > upperBound){std::swap(lowerBound, upperBound);}
val-=lowerBound; //adjust to 0
double rangeSize = upperBound - lowerBound;
if(rangeSize == 0){return upperBound;} //avoid dividing by 0
return val - (rangeSize * std::floor(val/rangeSize)) + lowerBound;
}
A related question for integers is available here:
Clean, efficient algorithm for wrapping integers in C++

A two-liner, non-iterative, tested solution for normalizing arbitrary angles to [-π, π):
double normalizeAngle(double angle)
{
double a = fmod(angle + M_PI, 2 * M_PI);
return a >= 0 ? (a - M_PI) : (a + M_PI);
}
Similarly, for [0, 2π):
double normalizeAngle(double angle)
{
double a = fmod(angle, 2 * M_PI);
return a >= 0 ? a : (a + 2 * M_PI);
}

In the case where fmod() is implemented through truncated division and has the same sign as the dividend, it can be taken advantage of to solve the general problem thusly:
For the case of (-PI, PI]:
if (x > 0) x = x - 2PI * ceil(x/2PI) #Shift to the negative regime
return fmod(x - PI, 2PI) + PI
And for the case of [-PI, PI):
if (x < 0) x = x - 2PI * floor(x/2PI) #Shift to the positive regime
return fmod(x + PI, 2PI) - PI
[Note that this is pseudocode; my original was written in Tcl, and I didn't want to torture everyone with that. I needed the first case, so had to figure this out.]

deltaPhase -= floor(deltaPhase/M_TWOPI)*M_TWOPI;

The way you suggested is best. It is fastest for small deflections. If angles in your program are constantly being deflected into the proper range, then you should only rarely run into big out-of-range values. Therefore paying the cost of complicated modular arithmetic code every round seems wasteful. Comparisons are cheap compared to modular arithmetic (http://embeddedgurus.com/stack-overflow/2011/02/efficient-c-tip-13-use-the-modulus-operator-with-caution/).

In C99:
float unwindRadians( float radians )
{
const bool radiansNeedUnwinding = radians < -M_PI || M_PI <= radians;
if ( radiansNeedUnwinding )
{
if ( signbit( radians ) )
{
radians = -fmodf( -radians + M_PI, 2.f * M_PI ) + M_PI;
}
else
{
radians = fmodf( radians + M_PI, 2.f * M_PI ) - M_PI;
}
}
return radians;
}

If linking against glibc's libm (including newlib's implementation) you can access
__ieee754_rem_pio2f() and __ieee754_rem_pio2() private functions:
extern __int32_t __ieee754_rem_pio2f (float,float*);
float wrapToPI(float xf){
const float p[4]={0,M_PI_2,M_PI,-M_PI_2};
float yf[2];
int q;
int qmod4;
q=__ieee754_rem_pio2f(xf,yf);
/* xf = q * M_PI_2 + yf[0] + yf[1] /
* yf[1] << yf[0], not sure if it could be ignored */
qmod4= q % 4;
if (qmod4==2)
/* (yf[0] > 0) defines interval (-pi,pi]*/
return ( (yf[0] > 0) ? -p[2] : p[2] ) + yf[0] + yf[1];
else
return p[qmod4] + yf[0] + yf[1];
}
Edit: Just realised that you need to link to libm.a, I couldn't find the symbols declared in libm.so

I have used (in python):
def WrapAngle(Wrapped, UnWrapped ):
TWOPI = math.pi * 2
TWOPIINV = 1.0 / TWOPI
return UnWrapped + round((Wrapped - UnWrapped) * TWOPIINV) * TWOPI
C code equivalent:
#define TWOPI 6.28318531
double WrapAngle(const double dWrapped, const double dUnWrapped )
{
const double TWOPIINV = 1.0/ TWOPI;
return dUnWrapped + round((dWrapped - dUnWrapped) * TWOPIINV) * TWOPI;
}
Notice that this brings it into the wrapped domain +/- 2pi, so for the +/- pi domain you need to handle that afterward, like:
if( angle > pi):
angle -= 2*math.pi

Related

Writing a function that calculates the sum of squares within a range in one line in C

My try
double sum_squares_from(double x, double n){
return n<=0 ? 0 : x*x + sum_squares_from((x+n-1)*(x+n-1),n-1);
}
Instead of using loops my professor wants us to write functions like this...
What the exercise asks for is a function sum_squares_from() with double x being the starting number and n the number of terms. For example, if you do x = 2 and n = 4 you get 2*2+3*3+4*4+5*5. It returns zero if n == 0.
My thinking was that in my example what I have is basically x*x+(x+1)(x+1)+(x+1+1)(x+1+1)+(x+1+1+1)(x+1+1+1) = (x+0)(x+0)+(x+1)(x+1)+(x+2)(x+2)+(x+3)(x+3) = (x+n-1)^2 repeated n times where n gets decremented every time by one until it becomes zero and then you sum everything.
Did I do it right?
(if my professor seems a bit demanding... he somehow does this sort of thing all in his head without auxiliary calculations. Scary guy)
It's not recursive, but it's one line:
int
sum_squares(int x, int n) {
return ((x + n - 1) * (x + n) * (2 * (x + n - 1) + 1) / 6) - ((x - 1) * x * (2 * (x - 1) + 1) / 6);
}
Sum of squares (of integers) has a closed-form solution for 1 .. n. This code calculates the sum of squares from 1 .. (x+n-1) and then subtracts the sum of squares from 1 .. (x-1).
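To check the closed form against a plain loop, something like the following can be used. This is a sketch with made-up names (sum_squares_loop, squares_up_to, sum_squares_closed), and it uses long long instead of int to postpone overflow.

```c
#include <assert.h>

/* Reference: add the squares x^2 + (x+1)^2 + ... one by one (n terms). */
static long long sum_squares_loop(long long x, int n)
{
    assert(n >= 0);
    long long s = 0;
    for (int i = 0; i < n; i++)
        s += (x + i) * (x + i);
    return s;
}

/* S(m) = 1^2 + ... + m^2 = m(m+1)(2m+1)/6; the product is always divisible by 6. */
static long long squares_up_to(long long m)
{
    return m * (m + 1) * (2 * m + 1) / 6;
}

/* Closed form: S(x+n-1) - S(x-1). */
static long long sum_squares_closed(long long x, int n)
{
    assert(n >= 0);
    return squares_up_to(x + n - 1) - squares_up_to(x - 1);
}
```

For x = 2 and n = 4 both give 4 + 9 + 16 + 25 = 54, which is S(5) - S(1) = 55 - 1.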
The original version of this answer used ASCII art.
So,
$$\sum_{i=0}^{n} i = \frac{n(n+1)}{2}$$
$$\sum_{i=0}^{n} i^2 = \frac{n(n+1)(2n+1)}{6}$$
We note that,
$$\begin{aligned}
\sum_{i=0}^{n} (x+i)^2 &= \sum_{i=0}^{n} \left( x^2 + 2xi + i^2 \right) \\
&= (n+1)x^2 + 2x \sum_{i=0}^{n} i + \sum_{i=0}^{n} i^2 \\
&= (n+1)x^2 + n(n+1)x + \frac{n(n+1)(2n+1)}{6}
\end{aligned}$$
Thus, your sum has the closed form:
double sum_squares_from(double x, int n) {
return ((n-- > 0)
? (n + 1) * x * x
+ x * n * (n + 1)
+ n * (n + 1) * (2 * n + 1) / 6.
: 0);
}
If I apply some obfuscation, the one-line version becomes:
double sum_squares_from(double x, int n) {
return (n-->0)?(n+1)*(x*x+x*n+n*(2*n+1)/6.):0;
}
If the task is to implement the summation in a loop, use tail recursion. Tail recursion can be mechanically replaced with a loop, and many compilers implement this optimization.
static double sum_squares_from_loop(double x, int n, double s) {
return (n <= 0) ? s : sum_squares_from_loop(x+1, n-1, s+x*x);
}
double sum_squares_from(double x, int n) {
return sum_squares_from_loop(x, n, 0);
}
As an illustration, if you observe the generated assembly in GCC at a sufficient optimization level (-Os, -O2, or -O3), you will notice that the recursive call is eliminated (and sum_squares_from_loop is inlined to boot).
As mentioned in my original comment, n should not be type double, but instead be type int to avoid floating point comparison problems with n <= 0. Making the change and simplifying the multiplication and recursive call, you do:
double sum_squares_from(double x, int n)
{
return n <= 0 ? 0 : x * x + sum_squares_from (x + 1, n - 1);
}
If you think about starting with x * x and increasing x by 1, n times, then the simple x * x + sum_squares_from (x + 1, n - 1) is quite easy to understand.
Maybe this?
double sum_squares_from(double x, double n) {
return n <= 0 ? 0 : (x + n - 1) * (x + n - 1) + sum_squares_from(x, n - 1);
}

An efficient method for calculating log base 2 of a number between 1 and 2

I am working on a fixed-point platform (floating-point arithmetic not supported).
I represent any rational number q as the floor value of q * (1 << precision).
I need an efficient method for calculating log base 2 of x, where 1 < x < 2.
Here is what I've done so far:
uint64_t Log2(uint64_t x, uint8_t precision)
{
uint64_t res = 0;
uint64_t one = (uint64_t)1 << precision;
uint64_t two = (uint64_t)2 << precision;
for (uint8_t i = precision; i > 0 ; i--)
{
x = (x * x) / one; // now 1 < x < 4
if (x >= two)
{
x >>= 1; // now 1 < x < 2
res += (uint64_t)1 << (i - 1);
}
}
return res;
}
This works well, however, it takes a toll on the overall performance of my program, which requires executing this for a large amount of input values.
For all it matters, the precision used is 31, but this may change so I need to keep it as a variable.
Are there any optimizations that I can apply here?
I was thinking of something in the form of "multiply first, sum up last".
But that would imply calculating x ^ (2 ^ precision), which would very quickly overflow.
Update
I have previously tried to get rid of the branch, but it just made things worse:
for (uint8_t i = precision; i > 0 ; i--)
{
x = (x * x) / one; // now 1 < x < 4
uint64_t n = x / two;
x >>= n; // now 1 < x < 2
res += n << (i - 1);
}
return res;
The only thing I can think of is to do the loop with a right-shift instead of a decrement and change a few operations to their equivalent binary ops. That may or may not be relevant to your platform, but on my x64 PC they yield an improvement of about 2%:
uint64_t Log2(uint64_t x, uint8_t precision)
{
uint64_t res = 0;
uint64_t two = (uint64_t)2 << precision;
for (uint64_t b = (uint64_t)1 << (precision - 1); b; b >>= 1)
{
x = (x * x) >> precision; // now 1 < x < 4
if (x & two)
{
x >>= 1; // now 1 < x < 2
res |= b;
}
}
return res;
}
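Before swapping one variant for the other, it may help to confirm that they return the same bits. The sketch below restates both loops under the hypothetical names log2_fixed_div and log2_fixed_shift. The precision limit of 31 in the assert is an assumption added here so that x * x cannot overflow 64 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Division-based variant from the question (types spelled uint64_t).
 * Expects x in [1,2) scaled by 2^precision. */
static uint64_t log2_fixed_div(uint64_t x, uint8_t precision)
{
    assert(precision >= 1 && precision <= 31);   /* keeps x * x below 2^64 */
    uint64_t res = 0;
    uint64_t one = (uint64_t)1 << precision;
    uint64_t two = (uint64_t)2 << precision;
    for (uint8_t i = precision; i > 0; i--) {
        x = (x * x) / one;                       /* squaring doubles the log */
        if (x >= two) {                          /* log >= 1: emit a 1 bit */
            x >>= 1;
            res += (uint64_t)1 << (i - 1);
        }
    }
    return res;
}

/* Shift-based variant from the answer; same arithmetic with bit operations. */
static uint64_t log2_fixed_shift(uint64_t x, uint8_t precision)
{
    assert(precision >= 1 && precision <= 31);
    uint64_t res = 0;
    uint64_t two = (uint64_t)2 << precision;
    for (uint64_t b = (uint64_t)1 << (precision - 1); b; b >>= 1) {
        x = (x * x) >> precision;
        if (x & two) {                           /* x < 4, so this bit means x >= 2 */
            x >>= 1;
            res |= b;
        }
    }
    return res;
}
```

For x = 1.5 at precision 31 (that is, (uint64_t)3 << 30), both should return a value close to log2(1.5) * 2^31, roughly 1256197405.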
My proposal goes in the opposite direction: towards constant performance with a fixed number of steps.
Provided a reasonably small amount of resources still suffices and the precision target is known and always reached, a constant-performance deployment can beat most iterative schemes.
A Taylor expansion ( since 1715 ) of log2(x) provides both a solid calculus basis plus (almost) infinite precision a-priori known to be feasible for any depth of fixed-point arithmetic ( be it for Epiphany / FPGA / ASIC / you keep it private / ... )
Math transforms the whole problem into an optionally small amount of a few node points X_tab_i, for which ( as few as platform precision requires ) constants are pre-calculated for each node point. The rest is a platform-efficient assembly of Taylor sum of products, granting the result is obtained both in constant-time + having a residual error under design-driven threshold ( the target PSPACE x PTIME constraints tradeoff here is obvious for design phase, yet the process is always a CTIME, CSPACE once deployed )
Voilà:
Given X: lookup closest X_tab_i,
with C0_tab_i, C1_tab_i, C2_tab_i, .., Cn_tab_i
//-----------------------------------------------------------------<STATIC/CONST>
// ![i]
#DEFINE C0_tab_i <log2( X_tab_i )>
#DEFINE C1_tab_i < ( X_tab_i )^(-1) * ( +1 / ( 1 * ln(2) ) )>
#DEFINE C2_tab_i < ( X_tab_i )^(-2) * ( -1 / ( 2 * ln(2) ) )>
#DEFINE C3_tab_i < ( X_tab_i )^(-3) * ( +1 / ( 3 * ln(2) ) )>
::: : : :
#DEFINE CN_tab_i < ( X_tab_i )^(-N) * ( (-1)^(N-1) / ( N * ln(2) ) )>
// -----------------------------------------------------------------<PROCESS>-BEG
DIFF = X - X_tab_i; CORR = DIFF;
RES = C0_tab_i
+ C1_tab_i * CORR; CORR *= DIFF;
RES += C2_tab_i * CORR; CORR *= DIFF;
... +=
RES += Cn_tab_i * CORR; CORR *= DIFF;
// --------------------------------------------------------------<PROCESS>-END:

Value of a variable not changing in a loop

I have a function that calculates the sin() of a radian angle. It takes two parameters, the value of the angle in radian and the terms. Just to make everything clear, this is how sin() is calculated:
sin(x) = x - (1/3! * X^3) + (1/5! * X^5) - (1/7! * X^7) + (1/9! * X^9) - ...
This is the function that does this calculation:
double sinx(double theta, int terms) //Theta is the angle x in radian
{
double result = 0;//this variable holds the value and it's updated with each term.
int i = 1;
int num = 3;
while(i <= terms-1)
{
if(i % 2 != 0){
result = result - ( (1.0/factorial(num)) * pow(theta, num) );
printf("if\n");//this is just for debugging
}
else if(i % 2 == 0){
result = result + ( (1.0/factorial(num)) * pow(theta, num) );
printf("else if\n");//this is for debugging too
}
printf("%lf\n", result);//debugging also
num = num + 2;
i = i + 1;
}
return theta + result; //this calculates the final term
}
The problem is the variable result's value doesn't change. Which also results in the final result not changing when using different number of terms.
Those are some outputs I get:
//with theta = 0.2 and terms = 6 ;;
if
-0.001333
else if
-0.001331
if
-0.001331
else if
-0.001331
if
-0.001331
Computed Sin<0.200000> = 0.198669. //this is the returned value. It's printed in the main
//with theta = 0.2 and terms = 7
if
-0.001333
else if
-0.001331
if
-0.001331
else if
-0.001331
if
-0.001331
else if
-0.001331
Computed Sin<0.200000> = 0.198669.
Any ideas?
Your code should be totally right. At least my calculator gives the same result.
If you change your printf("%lf\n", result); to printf("%.17f\n", result); you get this output:
if
-0.00133333333333333
else if
-0.00133066666666667
if
-0.00133066920634921
else if
-0.00133066920493827
if
-0.00133066920493878
else if
-0.00133066920493878
Now you can see, that it is still changing in every loop, but very little.
Really it converges fast so for double precision there is no difference between 6 and 7 terms. Here is the dump with better precision:
if
-0.00133333333333333350
else if
-0.00133066666666666680
if
-0.00133066920634920640
else if
-0.00133066920493827170
if
-0.00133066920493878470
Sin(0.2, 6) = 0.19866933079506122000
if
-0.00133333333333333350
else if
-0.00133066666666666680
if
-0.00133066920634920640
else if
-0.00133066920493827170
if
-0.00133066920493878470
else if
-0.00133066920493878450
Sin(0.2, 7) = 0.19866933079506122000
Everything looks correct here. The reason the result doesn't appear to change is to do with how quickly the Taylor series for sin converges for small angles. If you try with a bigger number say pi you should see the value updating slightly more often. You may also want to include something to limit theta from -pi to +pi as sin is a periodic function.
theta = mod(theta+pi, 2*pi) - pi
Including this restriction will alleviate the need for more terms if you start calculating values > pi or < -pi
If performance is important then you can reduce some of the calculations by removing repeats in calculating the factorials and large exponents
double sin(double theta, int terms = 7)
{
theta = mod(theta+pi, 2*pi) - pi;
double sum = theta, term = theta, fact = 3;
for (int i = 1; i < terms; i++)
{
term = -term * theta * theta /(fact * (fact - 1));
sum += term;
fact += 2;
}
return sum;
}
Your program as you have posted seems quite right IF you have your factorial written the right way. I have written factorial this way:
double factorial(int n) {
if(n <= 1) {
return 1.0;
}
else {
return n * factorial(n-1);
}
}
Try using this.
Using 0.785398 (approx. pi/4) and 10 terms, I get output 0.707107.
double d = sinx(0.785398, 10);
printf("%f\n", d); // prints 0.707107
Here are some runs:
printf("%.20f\n", sinx(3.1415926535897932, 100));
printf("%.20f\n", sinx(3.1415926535897932/2, 100));
printf("%.20f\n", sinx(3.1415926535897932/4, 100));
Outputs:
0.00000000000000044409
1.00000000000000000000
0.70710678118654746000
which seem accurate enough, given the pi used is only approximate.
What did you expect?
The third term is
0.2^5/120 = 0.000002
if you show the first six decimals, and the next terms are yet smaller.
Side remark:
It is much more efficient and more accurate to compute a term from the previous, using the recurrence
T*= Z/(N*(N-1))
where Z= -X*X (and this way, the alternating signs are automatically handled).
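A minimal C sketch of that recurrence, assuming `terms` counts the leading x term as in the question's sinx(). The name sin_series is made up for this example.

```c
#include <assert.h>

/* Sum the first `terms` terms of x - x^3/3! + x^5/5! - ... */
static double sin_series(double x, int terms)
{
    assert(terms >= 1);
    const double z = -x * x;     /* Z = -X*X carries the alternating sign */
    double term = x;             /* first term, x^1/1! */
    double sum = x;
    for (int n = 3; n < 2 * terms + 1; n += 2) {
        term *= z / ((double)n * (n - 1));   /* T *= Z/(N*(N-1)) */
        sum += term;
    }
    return sum;
}
```

No factorial or pow call is needed, and no intermediate value grows large, which avoids the overflow an integer factorial would eventually hit.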

How to evaluate the Sine Series (Taylor) for value of x using Recursion in C?

(C) calculate series
y = x - x^3/3! + x^5/5! - x^7/7! + .....
where stopping criterion is
| x^i/i! | <= 0.001
What I have tried :
#include<stdio.h>
#include<math.h>
int fact(int x){
if(x>1){
return x * fact(x-1);
}
else {
return 1 ;
}
}
int main () {
int x , i=1 , sign=1 ;
float result ;
scanf("%d",&x);
while(abs(pow(x,i)/fact(i))>0.001){
result += sign*(pow(x,i)/fact(i));
i+2;
sign = sign * -1 ;
}
printf("result= %f\n",result);
return 0 ;
}
The problem is:
when I input 90, the output should be 1 (it's like sin(x)),
but I'm getting a different output.
The code (at least) misses to initialise result.
Change
float result;
to
float result = 0.f;
Also
i+2;
is a NOP (no-operation). It computes i + 2 but does not assign the result to anything, so the result is thrown away.
To increment i by 2 do:
i = i + 2;
or
i += 2;
Also, using abs() won't work, as it returns an int.
Use fabs() to get a floating-point value.
Or just do not use it at all, as its argument will never be negative here.
As a final piece of advice, prefer double over float, as float's accuracy is limited.
The problem is very clear. You have to convert degree into radian before performing the loop. Your code has some other issues also.
Here is the rectified code, which gives you 1 for 90:
#include<stdio.h>
#include<math.h>
int fact(int x){
if(x>1){
return x * fact(x-1);
}
else {
return 1 ;
}
}
int main () {
int x , i=1 , sign=1;
double result = 0, rad; /* result must start at 0 */
scanf("%d",&x);
rad = x/180.0*3.1415;
while((pow(rad,i)/fact(i))>0.001){ /* test the term in radians, not degrees */
result += sign*(pow(rad,i)/fact(i));
i+=2;
sign *= -1 ;
}
printf("result= %f\n",result);
return 0 ;
}
sin(90)=0.89399666360055789051826949840421...
Besides the unit confusion, your code is not very efficient, as you compute the powers and factorials from scratch on each term, when a nice recurrence exists.
Sin= x
Term= x
Factor= -x*x
i= 2
while |Term| > 0.001:
Term*= Factor / (i * (i+1))
Sin+= Term
i+= 2
Because of huge cancellation errors, this formula is not appropriate for large values of the argument. My own implementation gives -1.07524337969e+21 for 90.
If you compute it for 90-14*2*Pi instead, you get 0.893995..., which is not so bad a result.
An algorithm that calculates sin(x) using the following power series: sin(x) = (x/1!) - (x^3/3!) + (x^5/5!) - (x^7/7!) + ... We stop the calculation when the difference between two successive terms of the sum is less than a certain tolerance.

The most efficient way to implement an integer based power function pow(int, int)

What is the most efficient way to raise an integer to the power of another integer in C?
// 2^3
pow(2,3) == 8
// 5^5
pow(5,5) == 3125
Exponentiation by squaring.
int ipow(int base, int exp)
{
int result = 1;
for (;;)
{
if (exp & 1)
result *= base;
exp >>= 1;
if (!exp)
break;
base *= base;
}
return result;
}
This is the standard method for doing modular exponentiation for huge numbers in asymmetric cryptography.
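For reference, the modular variant differs only in reducing after each multiplication. The sketch below uses a made-up name, mod_pow, and restricts the modulus to 32 bits as a simplifying assumption so that plain uint64_t products cannot overflow. Real cryptographic code uses big-number libraries instead.

```c
#include <assert.h>
#include <stdint.h>

/* (base^exp) mod m by squaring. m is limited to 32 bits so that every
 * product of two reduced values fits in uint64_t. */
static uint64_t mod_pow(uint64_t base, uint64_t exp, uint64_t m)
{
    assert(m != 0 && m <= UINT32_MAX);
    uint64_t result = 1 % m;      /* handles m == 1 */
    base %= m;
    while (exp) {
        if (exp & 1)
            result = result * base % m;
        base = base * base % m;
        exp >>= 1;
    }
    return result;
}
```

For example, mod_pow(2, 10, 1000) is 24, since 2^10 = 1024.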
Note that exponentiation by squaring is not the most optimal method. It is probably the best you can do as a general method that works for all exponent values, but for a specific exponent value there might be a better sequence that needs fewer multiplications.
For instance, if you want to compute x^15, the method of exponentiation by squaring will give you:
x^15 = (x^7)*(x^7)*x
x^7 = (x^3)*(x^3)*x
x^3 = x*x*x
This is a total of 6 multiplications.
It turns out this can be done using "just" 5 multiplications via addition-chain exponentiation.
n*n = n^2
n^2*n = n^3
n^3*n^3 = n^6
n^6*n^6 = n^12
n^12*n^3 = n^15
There are no known efficient algorithms to find this optimal sequence of multiplications. From Wikipedia:
The problem of finding the shortest addition chain cannot be solved by dynamic programming, because it does not satisfy the assumption of optimal substructure. That is, it is not sufficient to decompose the power into smaller powers, each of which is computed minimally, since the addition chains for the smaller powers may be related (to share computations). For example, in the shortest addition chain for a¹⁵ above, the subproblem for a⁶ must be computed as (a³)² since a³ is re-used (as opposed to, say, a⁶ = a²(a²)², which also requires three multiplies).
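Written out in C, the five-multiplication chain for the 15th power could look like this. The function name pow15 and the input range check are assumptions added for the sketch.

```c
#include <assert.h>

/* x^15 in five multiplications, following the addition chain 1, 2, 3, 6, 12, 15. */
static long long pow15(long long x)
{
    assert(x >= -18 && x <= 18);  /* 19^15 no longer fits in a signed 64-bit value */
    long long x2  = x * x;        /* multiplication 1 */
    long long x3  = x2 * x;       /* multiplication 2 */
    long long x6  = x3 * x3;      /* multiplication 3: reuses x^3 */
    long long x12 = x6 * x6;      /* multiplication 4 */
    return x12 * x3;              /* multiplication 5: reuses x^3 again */
}
```

The reuse of x3 in the last step is what saves one multiplication compared with plain squaring.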
If you need to raise 2 to a power. The fastest way to do so is to bit shift by the power.
2 ** 3 == 1 << 3 == 8
2 ** 30 == 1 << 30 == 1073741824 (A Gigabyte)
Here is the method in Java
private int ipow(int base, int exp)
{
int result = 1;
while (exp != 0)
{
if ((exp & 1) == 1)
result *= base;
exp >>= 1;
base *= base;
}
return result;
}
An extremely specialized case is when you need, say, 2^(-x to the y), where x is of course negative and y is too large to do shifting on an int. You can still do 2^x in constant time by screwing with a float.
struct IeeeFloat
{
unsigned int base : 23;
unsigned int exponent : 8;
unsigned int signBit : 1;
};
union IeeeFloatUnion
{
IeeeFloat brokenOut;
float f;
};
inline float twoToThe(char exponent)
{
// notice how the range checking is already done on the exponent var
static IeeeFloatUnion u;
u.f = 2.0;
// Change the exponent part of the float
u.brokenOut.exponent += (exponent - 1);
return (u.f);
}
You can get more powers of 2 by using a double as the base type.
(Thanks a lot to commenters for helping to square this post away).
There's also the possibility that, by learning more about IEEE floats, other special cases of exponentiation might present themselves.
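If the goal is only a floating-point power of two, the standard library's ldexpf does the same job without depending on bit-field layout. This is a small sketch, not the method used above. The name two_to_the_n is made up, and the range check assumes float is IEEE-754 binary32.

```c
#include <assert.h>
#include <math.h>

/* 2^n as a float via the standard ldexpf, which scales by a power of two
 * without touching the bit layout by hand. */
static float two_to_the_n(int n)
{
    assert(n >= -149 && n <= 127);   /* exact powers of two, assuming IEEE-754 binary32 */
    return ldexpf(1.0f, n);
}
```

ldexp and ldexpl are the double and long double counterparts, which cover a wider exponent range.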
power() function to work for Integers Only
int power(int base, unsigned int exp){
if (exp == 0)
return 1;
int temp = power(base, exp/2);
if (exp%2 == 0)
return temp*temp;
else
return base*temp*temp;
}
Complexity = O(log(exp))
power() function to work for negative exp and float base.
float power(float base, int exp) {
if( exp == 0)
return 1;
float temp = power(base, exp/2);
if (exp%2 == 0)
return temp*temp;
else {
if(exp > 0)
return base*temp*temp;
else
return (temp*temp)/base; //negative exponent computation
}
}
Complexity = O(log(exp))
If you want to get the value of an integer for 2 raised to the power of something it is always better to use the shift option:
pow(2,5) can be replaced by 1<<5
This is much more efficient.
int pow( int base, int exponent)
{ // Does not work for negative exponents. (But that would be leaving the range of int)
if (exponent == 0) return 1; // base case;
int temp = pow(base, exponent/2);
if (exponent % 2 == 0)
return temp * temp;
else
return (base * temp * temp);
}
Just as a follow up to comments on the efficiency of exponentiation by squaring.
The advantage of that approach is that it runs in log(n) time. For example, if you were going to calculate something huge, such as x^1048575 (2^20 - 1), you only have to go thru the loop 20 times, not 1 million+ using the naive approach.
Also, in terms of code complexity, it is simpler than trying to find the most optimal sequence of multiplications, a la Pramod's suggestion.
Edit:
I guess I should clarify before someone tags me for the potential for overflow. This approach assumes that you have some sort of hugeint library.
Late to the party:
Below is a solution that also deals with y < 0 as best as it can.
It uses a result of intmax_t for maximum range. There is no provision for answers that do not fit in intmax_t.
powjii(0, 0) --> 1 which is a common result for this case.
powjii(0, negative), another undefined result, returns INTMAX_MAX
intmax_t powjii(int x, int y) {
if (y < 0) {
switch (x) {
case 0:
return INTMAX_MAX;
case 1:
return 1;
case -1:
return y % 2 ? -1 : 1;
}
return 0;
}
intmax_t z = 1;
intmax_t base = x;
for (;;) {
if (y % 2) {
z *= base;
}
y /= 2;
if (y == 0) {
break;
}
base *= base;
}
return z;
}
This code uses a forever loop for(;;) to avoid the final base *= base common in other looped solutions. That multiplication is 1) not needed and 2) could be int*int overflow which is UB.
A more generic solution, considering negative exponents:
private static int pow(int base, int exponent) {
int result = 1;
if (exponent == 0)
return result; // base case;
if (exponent < 0)
return 1 / pow(base, -exponent);
int temp = pow(base, exponent / 2);
if (exponent % 2 == 0)
return temp * temp;
else
return (base * temp * temp);
}
The O(log N) solution in Swift...
// Time complexity is O(log N)
func power(_ base: Int, _ exp: Int) -> Int {
// 1. Base cases: a^0 == 1 and a^1 == a (without the exp == 0 case, power(x, 0) would recurse forever)
//Time complexity O(1)
if exp == 0 {
return 1
}
if exp == 1 {
return base
}
// 2. Calculate the value of the number raised to half of the exponent. This will be used to calculate the final answer by squaring the result (e.g a^2n == (a^n)^2 == a^n * a^n). The idea is that we can do half the amount of work by obtaining a^n and multiplying the result by itself to get a^2n
//Time complexity O(log N)
let tempVal = power(base, exp/2)
// 3. If the exponent was odd then decompose the result in such a way that it allows you to divide the exponent in two (e.g. a^(2n+1) == a^1 * a^2n == a^1 * a^n * a^n). If the exponent is even then the result must be the base raised to half the exponent squared (e.g. a^2n == a^n * a^n = (a^n)^2).
//Time complexity O(1)
return (exp % 2 == 1 ? base : 1) * tempVal * tempVal
}
int pow(int const x, unsigned const e) noexcept
{
return !e ? 1 : 1 == e ? x : (e % 2 ? x : 1) * pow(x * x, e / 2);
//return !e ? 1 : 1 == e ? x : (((x ^ 1) & -(e % 2)) ^ 1) * pow(x * x, e / 2);
}
Yes, it's recursive, but a good optimizing compiler will optimize recursion away.
One more implementation (in Java). It may not be the most efficient solution, but the number of recursive steps is the same as in exponentiation by squaring.
public static long pow(long base, long exp){
if(exp ==0){
return 1;
}
if(exp ==1){
return base;
}
if(exp % 2 == 0){
long half = pow(base, exp/2);
return half * half;
}else{
long half = pow(base, (exp -1)/2);
return base * half * half;
}
}
I use recursion: if exp is even, square the base and halve the exponent, e.g. 5^10 = 25^5.
int pow(int base, unsigned exp){
    if (exp == 0) return 1;
    else if (exp % 2 == 0){
        return pow(base * base, exp / 2); // even: b^e == (b*b)^(e/2)
    } else {
        return base * pow(base, exp - 1); // odd: b^e == b * b^(e-1)
    }
}
In addition to the answer by Elias, which causes Undefined Behaviour when implemented with signed integers, and incorrect values for high input when implemented with unsigned integers,
here is a modified version of the Exponentiation by Squaring that also works with signed integer types, and doesn't give incorrect values:
#include <stdint.h>
#define SQRT_INT64_MAX (INT64_C(0xB504F333))
int64_t alx_pow_s64 (int64_t base, uint8_t exp)
{
int_fast64_t base_;
int_fast64_t result;
base_ = base;
if (base_ == 1)
return 1;
if (!exp)
return 1;
if (!base_)
return 0;
result = 1;
if (exp & 1)
result *= base_;
exp >>= 1;
while (exp) {
if (base_ > SQRT_INT64_MAX || base_ < -SQRT_INT64_MAX) /* squaring would overflow, for either sign */
return 0;
base_ *= base_;
if (exp & 1)
result *= base_;
exp >>= 1;
}
return result;
}
Considerations for this function:
(1 ** N) == 1
(N ** 0) == 1
(0 ** 0) == 1
(0 ** N) == 0
If any overflow or wrapping is going to take place, return 0;
I used int64_t, but any width (signed or unsigned) can be used with little modification. However, if you need a non-fixed-width integer type, you will need to replace SQRT_INT64_MAX with (int)sqrt(INT_MAX) (in the case of int) or something similar, which the compiler should optimize, but it is uglier and not a C constant expression. Casting the result of sqrt() to an int is also not ideal, because floating-point precision would matter if the maximum were a perfect square; but as I don't know of any implementation where INT_MAX, or the maximum of any type, is a perfect square, you can live with that.
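The guard in the function above looks at the squaring step. As a hedged sketch of one way to also cover the multiplication into the result, the version below works on the magnitude with unsigned arithmetic and checks each product by division first. The name checked_pow_s64 and the choice to report overflow through a bool return value (instead of returning 0) are assumptions made for this illustration, not part of the answer.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: stores base**exp in *out and returns true,
 * or returns false (leaving *out untouched) if the value does not fit. */
bool checked_pow_s64(int64_t base, unsigned exp, int64_t *out)
{
    /* Magnitude of base; the unsigned negation is well defined even for INT64_MIN. */
    uint64_t mag = base < 0 ? 0 - (uint64_t)base : (uint64_t)base;
    bool negative = base < 0 && (exp & 1u);
    /* Largest magnitude the result may have: 2^63 if negative, else 2^63 - 1. */
    uint64_t limit = negative ? (uint64_t)INT64_MAX + 1 : (uint64_t)INT64_MAX;
    uint64_t result = 1;

    while (exp) {
        if (exp & 1u) {
            if (mag != 0 && result > limit / mag)
                return false;      /* result * mag would exceed limit */
            result *= mag;
        }
        exp >>= 1;
        if (exp) {                 /* square only while bits remain */
            if (mag > UINT32_MAX)
                return false;      /* mag * mag would not fit in 64 bits */
            mag *= mag;
        }
    }
    /* When negative, result >= 1, so result - 1 always fits in int64_t. */
    *out = negative ? -(int64_t)(result - 1) - 1 : (int64_t)result;
    return true;
}
```

With this sketch, 10 ** 18 succeeds while 10 ** 19 is reported as overflow, and (-2) ** 63 yields INT64_MIN.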
I have implemented an algorithm that memoizes all computed powers and then reuses them when needed. For example, x^13 is equal to ((x^2)^2)^2 * (x^2)^2 * x, where (x^2)^2 is taken from the table instead of being computed again. This is basically an implementation of @Pramod's answer (but in C#).
The number of multiplications needed is Ceil(Log n).
public static int Power(int base, int exp)
{
if(exp == 0) return 1; // tab would have a single slot, so tab[1] below would be out of range
int tab[] = new int[exp + 1];
tab[0] = 1;
tab[1] = base;
return Power(base, exp, tab);
}
public static int Power(int base, int exp, int tab[])
{
if(exp == 0) return 1;
if(exp == 1) return base;
int i = 1;
while(i < exp/2)
{
if(tab[2 * i] <= 0)
tab[2 * i] = tab[i] * tab[i];
i = i << 1;
}
if(exp <= i)
return tab[i];
else return tab[i] * Power(base, exp - i, tab);
}
Here is an O(1) algorithm for calculating x ** y, inspired by this comment. It works for 32-bit signed int.
For small values of y, it uses exponentiation by squaring. For large values of y, there are only a few values of x where the result doesn't overflow. This implementation uses a lookup table to read the result without calculating.
On overflow, the C standard permits any behavior, including crash. However, I decided to do bound-checking on LUT indices to prevent memory access violation, which could be surprising and undesirable.
Pseudo-code:
If `x` is between -2 and 2, use special-case formulas.
Otherwise, if `y` is between 0 and 8, use special-case formulas.
Otherwise:
    Set x = abs(x); remember if x was negative
    If x <= 10 and y <= 19:
        Load precomputed result from a lookup table
    Otherwise:
        Set result to 0 (overflow)
    If x was negative and y is odd, negate the result
C code:
#define POW9(x) x * x * x * x * x * x * x * x * x
#define POW10(x) POW9(x) * x
#define POW11(x) POW10(x) * x
#define POW12(x) POW11(x) * x
#define POW13(x) POW12(x) * x
#define POW14(x) POW13(x) * x
#define POW15(x) POW14(x) * x
#define POW16(x) POW15(x) * x
#define POW17(x) POW16(x) * x
#define POW18(x) POW17(x) * x
#define POW19(x) POW18(x) * x
int mypow(int x, unsigned y)
{
static int table[8][11] = {
{POW9(3), POW10(3), POW11(3), POW12(3), POW13(3), POW14(3), POW15(3), POW16(3), POW17(3), POW18(3), POW19(3)},
{POW9(4), POW10(4), POW11(4), POW12(4), POW13(4), POW14(4), POW15(4), 0, 0, 0, 0},
{POW9(5), POW10(5), POW11(5), POW12(5), POW13(5), 0, 0, 0, 0, 0, 0},
{POW9(6), POW10(6), POW11(6), 0, 0, 0, 0, 0, 0, 0, 0},
{POW9(7), POW10(7), POW11(7), 0, 0, 0, 0, 0, 0, 0, 0},
{POW9(8), POW10(8), 0, 0, 0, 0, 0, 0, 0, 0, 0},
{POW9(9), 0, 0, 0, 0, 0, 0, 0, 0, 0, 0},
{POW9(10), 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}
};
int is_neg;
int r;
switch (x)
{
case 0:
return y == 0 ? 1 : 0;
case 1:
return 1;
case -1:
return y % 2 == 0 ? 1 : -1;
case 2:
return 1 << y;
case -2:
return (y % 2 == 0 ? 1 : -1) << y;
default:
switch (y)
{
case 0:
return 1;
case 1:
return x;
case 2:
return x * x;
case 3:
return x * x * x;
case 4:
r = x * x;
return r * r;
case 5:
r = x * x;
return r * r * x;
case 6:
r = x * x;
return r * r * r;
case 7:
r = x * x;
return r * r * r * x;
case 8:
r = x * x;
r = r * r;
return r * r;
default:
is_neg = x < 0;
if (is_neg)
x = -x;
if (x <= 10 && y <= 19)
r = table[x - 3][y - 9];
else
r = 0;
if (is_neg && y % 2 == 1)
r = -r;
return r;
}
}
}
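The shape of the lookup table can be cross-checked with a small helper. This is only a sketch and not part of the answer: max_exponent_int32 is a hypothetical name, and it assumes base >= 2.

```c
#include <stdint.h>

/* Hypothetical helper: largest e such that base**e still fits in a
 * 32-bit signed int. Assumes base >= 2. The division test makes sure
 * the multiplication is only performed when it cannot overflow. */
unsigned max_exponent_int32(int32_t base)
{
    int32_t p = 1;
    unsigned e = 0;
    while (p <= INT32_MAX / base) {
        p *= base;
        ++e;
    }
    return e;
}
```

For bases 3 through 10 this gives 19, 15, 13, 11, 11, 10, 9 and 9, which matches the last non-zero column of each table row (the columns start at y = 9). For base 11 it gives 8, which is why the table can stop at x <= 10.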
My case is a little different, I'm trying to create a mask from a power, but I thought I'd share the solution I found anyway.
Obviously, it only works for powers of 2.
Mask1 = 1 << (Exponent - 1);
Mask2 = Mask1 - 1;
return Mask1 + Mask2;
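Wrapped into a function, the three lines above might look like the following sketch. The name low_bits_mask and the 32-bit width are assumptions for illustration. The sum Mask1 + Mask2 equals 2^Exponent - 1, and building it from a shift by Exponent - 1 avoids shifting by the full width of the type when all bits are requested.

```c
#include <stdint.h>

/* Hypothetical helper: returns a mask with the low `bits` bits set.
 * Valid for 1 <= bits <= 32. */
uint32_t low_bits_mask(unsigned bits)
{
    uint32_t mask1 = UINT32_C(1) << (bits - 1); /* the top bit of the mask */
    uint32_t mask2 = mask1 - 1;                 /* every bit below it */
    return mask1 + mask2;
}
```

For example, low_bits_mask(8) is 0xFF and low_bits_mask(32) is 0xFFFFFFFF.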
In case you know the exponent (and it is an integer) at compile-time, you can use templates to unroll the loop. This can be made more efficient, but I wanted to demonstrate the basic principle here:
#include <cstdlib> // atoi
#include <iostream>
template<unsigned long N>
unsigned long inline exp_unroll(unsigned base) {
return base * exp_unroll<N-1>(base);
}
We terminate the recursion using a template specialization:
template<>
unsigned long inline exp_unroll<1>(unsigned base) {
return base;
}
The exponent needs to be known at compile time; only the base is supplied at runtime:
int main(int argc, char * argv[]) {
if (argc < 2) return 1; // a base argument is required
std::cout << argv[1] << "**5= " << exp_unroll<5>(atoi(argv[1])) << std::endl;
}
I've noticed something strange about the standard exponentiation-by-squaring algorithm with GNU GMP:
I implemented two nearly identical functions: a power-modulo function using the most vanilla binary exponentiation-by-squaring algorithm, labeled ______2(), and another one built on basically the same concept, but re-mapped to dividing by 10 at each round instead of dividing by 2, labeled ______10().
( time ( jot - 1456 9999999999 6671 | pvE0 |
gawk -Mbe '
function ______10(_, __, ___, ____, _____, _______) {
__ = +__
____ = (____+=_____=____^= \
(_ %=___=+___)<_)+____++^____--
while (__) {
if (_______= __%____) {
if (__==_______) {
return (_^__ *_____) %___
}
__-=_______
_____ = (_^_______*_____) %___
}
__/=____
_ = _^____%___
}
}
function ______2(_, __, ___, ____, _____) {
__=+__
____+=____=_____^=(_%=___=+___)<_
while (__) {
if (__ %____) {
if (__<____) {
return (_*_____) %___
}
_____ = (_____*_) %___
--__
}
__/=____
_= (_*_) %___
}
}
BEGIN {
OFMT = CONVFMT = "%.250g"
__ = (___=_^= FS=OFS= "=")(_<_)
_____ = __^(_=3)^--_ * ++_-(_+_)^_
______ = _^(_+_)-_ + _^!_
_______ = int(______*_____)
________ = 10 ^ 5 + 1
_________ = 8 ^ 4 * 2 - 1
}
GNU Awk 5.1.1, API: 3.1 (GNU MPFR 4.1.0, GNU MP 6.2.1)
.
($++NF = ______10(_=$___, NR %________ +_________,_______*(_-11))) ^!___'
out9: 48.4MiB 0:00:08 [6.02MiB/s] [6.02MiB/s] [ <=> ]
in0: 15.6MiB 0:00:08 [1.95MiB/s] [1.95MiB/s] [ <=> ]
( jot - 1456 9999999999 6671 | pvE 0.1 in0 | gawk -Mbe ; )
8.31s user 0.06s system 103% cpu 8.058 total
ffa16aa937b7beca66a173ccbf8e1e12 stdin
($++NF = ______2(_=$___, NR %________ +_________,_______*(_-11))) ^!___'
out9: 48.4MiB 0:00:12 [3.78MiB/s] [3.78MiB/s] [<=> ]
in0: 15.6MiB 0:00:12 [1.22MiB/s] [1.22MiB/s] [ <=> ]
( jot - 1456 9999999999 6671 | pvE 0.1 in0 | gawk -Mbe ; )
13.05s user 0.07s system 102% cpu 12.821 total
ffa16aa937b7beca66a173ccbf8e1e12 stdin
For reasons that are extremely counter-intuitive and unknown to me, for a wide variety of inputs I threw at it, the div-10 variant is nearly always faster. What makes it truly baffling is that the output hashes of the two match, so both compute the same results, even though computers are obviously not built in and for a base-10 paradigm.
Am I missing something critical or obvious in the code/approach that might be skewing the results in a confounding manner ? Thanks.
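Since the awk source is hard to read, here is a de-obfuscated sketch in C of what the two functions appear to do: a right-to-left modular exponentiation that consumes one digit of the exponent per round, in a selectable radix. The names modpow_radix and small_pow_mod are hypothetical, the modulus is limited to 32 bits so that every product fits in 64 bits (the gawk version uses arbitrary precision instead), and mod >= 1 and radix >= 2 are assumed.

```c
#include <stdint.h>

/* Hypothetical helper: (b^d) mod m by repeated multiplication, meant for
 * a small d. Requires b < m and m < 2^32 so that r * b fits in 64 bits. */
static uint64_t small_pow_mod(uint64_t b, unsigned d, uint64_t m)
{
    uint64_t r = 1 % m;
    while (d--) {
        r = (r * b) % m;
    }
    return r;
}

/* Hypothetical name: radix == 2 is the usual square-and-multiply loop,
 * radix == 10 corresponds to the divide-by-10 variant. */
uint64_t modpow_radix(uint64_t base, uint64_t exp, uint32_t mod, unsigned radix)
{
    uint64_t m = mod;
    uint64_t acc = 1 % m;
    base %= m;
    while (exp) {
        unsigned digit = (unsigned)(exp % radix);       /* lowest digit */
        acc = (acc * small_pow_mod(base, digit, m)) % m;
        exp /= radix;
        if (exp) {
            base = small_pow_mod(base, radix, m);       /* base^radix for the next digit */
        }
    }
    return acc;
}
```

Both radices return the same value, e.g. modpow_radix(3, 13, 1000, 2) and modpow_radix(3, 13, 1000, 10) are both 323. For an exponent near 100000 the radix-2 loop runs 17 rounds and the radix-10 loop 6. The sketch only illustrates the control flow and makes no claim about the cause of the gawk timings.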
