Pow function returning wrong result - c

When I use the pow() function, sometimes the results are off by one. For example, this code produces 124, but I know that 5³ should be 125.
#include <stdio.h>
#include <math.h>
int main(){
    int i = pow(5, 3);
    printf("%d", i);
}
Why is the result wrong?

Your problem is that you are mixing integer variables with floating-point math. My bet is that the result of pow(5, 3) is something like 124.999999 due to rounding, and when it is converted to an integer variable it gets truncated to 124.
There are 3 ways to deal with this:
1. Mix floating-point math and integers more safely:
int x=5,y=3,z;
// or
Using this will always carry a risk of rounding errors affecting the result, especially for higher exponents.
2. Compute on floating-point variables only:
Replace int with float or double. This is a bit safer than #1, but in some cases it is not usable (it depends on the task), and it may need an occasional floor, ceil, or round along the way to get the wanted result.
3. Use integer math only:
This is the safest way (unless you cross the int limit). pow can be computed in integer math relatively easily; see:
Power by squaring for negative exponents
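As a minimal sketch of option 3, assuming a non-negative exponent and a result that fits in long long, exponentiation by squaring could look like the following. The name ipow is hypothetical; it is not a standard library function.

```c
/* Integer power by repeated squaring.
 * Assumption: the mathematical result fits in long long (no overflow check here). */
long long ipow(long long base, unsigned int exp)
{
    long long result = 1;
    while (exp > 0) {
        if (exp & 1u)          /* lowest bit set: multiply this power in */
            result *= base;
        exp >>= 1;
        if (exp > 0)           /* skip the final squaring, which is never used */
            base *= base;
    }
    return result;
}
```

With this, int i = (int)ipow(5, 3); involves no floating-point step at all, so there is nothing to truncate.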

pow(x, y) is most likely implemented as exp(y * log(x)): modern CPUs can evaluate exp and log in a couple of flicks of the wrist.
Although adequate for many scientific applications, when truncating the result to an integer, the result can be off for even trivial arguments. That's what is happening here.
Your best bet is to roll your own version of pow for integer arguments, or find one in a good library. As a starting point, see The most efficient way to implement an integer based power function pow(int, int)

Use Float Data Type
#include <stdio.h>
#include <math.h>
int main()
{
    float x=2;
    float y=2;
    float p= pow(x,y);
    printf("%f\n", p);
    return 0;
}

You can use this function instead of pow:
long long int Pow(long long int base, unsigned int exp)
{
    if (exp > 0)
        return base * Pow(base, exp-1);
    return 1;
}


Using round() function in c

I'm a bit confused about the round() function in C.
First of all, man says:
#include <math.h>
double round(double x);
These functions return the rounded integer value.
If x is integral, +0, -0, NaN, or infinite, x itself is returned.
Is the return value a double / float or an int?
Second, I've created a function that first rounds, then casts to int. Later in my code I use it as a means to compare doubles:
int tointn(double in,int n)
{
    int i = 0;
    i = (int)round(in*pow(10,n));
    return i;
}
This function apparently isn't stable throughout my tests. Is there redundancy here? Well... I'm not looking only for an answer, but a better understanding on the subject.
The wording in the man-page is meant to be read literally, that is in its mathematical sense. The wording "x is integral" means that x is an element of Z, not that x has the data type int.
Casting a double to int can be dangerous because a double can hold integral values far larger than an int can: every integer up to 2^53 is exactly representable (assuming an IEEE 754 conforming binary64), while the maximum value an int can hold is usually much smaller (int is mostly 32 bits on 32-bit architectures and also 32 bits on many 64-bit architectures). Converting a value that is out of range for int is undefined behaviour.
If you need only powers of ten you can test it with this little program yourself:
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
int main(){
    int i;
    for(i = 0;i < 26;i++){
        printf("%d:\t%.2f\t%d\n",i, pow(10,i), (int)pow(10,i));
    }
    return 0;
}
Instead of casting, you should use a function that returns a proper integral data type, e.g. lround(3).
Here is an excerpt from the man page:
#include <math.h>
double round(double x);
float roundf(float x);
long double roundl(long double x);
Notice: the returned value is never of an integer type. However, the fractional part of the returned value is set to 0.
Notice: the type of the returned value depends on which of these functions is called.
Here is an excerpt from the man page about which way the rounding will be done:
These functions round x to the nearest integer, but round halfway cases
away from zero (regardless of the current rounding direction, see
fenv(3)), instead of to the nearest even integer like rint(3).
For example, round(0.5) is 1.0, and round(-0.5) is -1.0.
If you want a long integer to be returned then please use lround:
long int tolongint(double in)
{
    return lround(in);
}
For details please see lround, which is available as of C99 (and C++11).

Division of two floats giving incorrect answer

Attempting to divide two floats in C, using the code below:
#include <stdio.h>
#include <math.h>
int main(){
    float fpfd = 122.88e6;
    float flo = 10e10;
    float int_part, frac_part;
    int_part = (int)(flo/fpfd);
    frac_part = (flo/fpfd) - int_part;
    printf("\nInt_Part = %f\n", int_part);
    printf("Frac_Part = %f\n", frac_part);
    return 0;
}
To compile and run this code, I use the commands:
>> gcc test_prog.c -o test_prog -lm
>> ./test_prog
I then get this output:
Int_Part = 813.000000
Frac_Part = 0.802063
Now, this Frac_Part seems to be incorrect. I have tried the same equation on a calculator and then in Wolfram Alpha, and they both give me:
Frac_Part = 0.802083
Notice the number at the fifth decimal place is different.
This may seem insignificant to most, but for the calculations I am doing it is of paramount importance.
Can anyone explain to me why the C code is making this error?
When you have inadequate precision from floating point operations, the first most natural step is to just use floating point types of higher precision, e.g. use double instead of float. (As pointed out immediately in the other answers.)
Second, examine the different floating point operations and consider their precisions. The one that stands out to me as being a source of error is the method above of separating a float into integer part and fractional part, by simply casting to int and subtracting. This is not ideal, because, when you subtract the integer part from the original value, you are doing arithmetic where the three numbers involved (two inputs and result) have very different scales, and this will likely lead to precision loss.
I would suggest using the C <math.h> function modf instead to split floating-point numbers into an integer part and a fractional part. http://www.techonthenet.com/c_language/standard_library_functions/math_h/modf.php
(In greater detail: When you do an operation like f - (int)f, the floating point addition procedure is going to see that two numbers of some given precision X are being added, and it's going to naturally assume that the result will also have precision X. Then it will perform the actual computation under that assumption, and finally reevaluate the precision of the result at the end. Because the initial prediction turned out not to be ideal, some low order bits are going to get lost.)
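A minimal sketch of the modf suggestion, combined with the switch to double, might look like this. The helper name split_ratio is hypothetical; modf itself is the standard <math.h> function.

```c
#include <math.h>

/* Split num/den into its integer and fractional parts with modf.
 * The integer part is stored through int_part; the fractional part is returned. */
double split_ratio(double num, double den, double *int_part)
{
    return modf(num / den, int_part);
}
```

Called as double ip; double fp = split_ratio(10e10, 122.88e6, &ip); this leaves 813.0 in ip and a value close to 0.802083 in fp.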
float is a single-precision floating-point type; you should try double instead. The following code gives me the right result:
#include <stdio.h>
#include <math.h>
int main(){
    double fpfd = 122.88e6;
    double flo = 10e10;
    double int_part, frac_part;
    int_part = (int)(flo/fpfd);
    frac_part = (flo/fpfd) - int_part;
    printf("\nInt_Part = %f\n", int_part);
    printf("Frac_Part = %f\n", frac_part);
    return 0;
}
Why?
As I said, float is a single-precision floating-point type; it is smaller than double (on most architectures, sizeof(float) < sizeof(double)).
By using double instead of float you will have more bits to store the mantissa and the exponent of the number (see Wikipedia).
float has only 6~9 significant digits, which is not precise enough for many uses in practice. Changing all float variables to double (which provides 15~17 significant digits) gives this output:
Int_Part = 813.000000
Frac_Part = 0.802083

Why does pow(n,2) return 24 when n=5, with my compiler and OS?

#include <stdio.h>
#include <stdlib.h>
#include <math.h>
int main()
{
int n,i,ele;
return 0;
}
The output is 24.
I'm using GNU/GCC in Code::Blocks.
What is happening?
I know the pow function returns a double, but 25 fits in an int, so why does this code print 24 instead of 25? With n=4, n=6, n=3, or n=2 the code works, but with 5 it doesn't.
Here is what may be happening here. You should be able to confirm this by looking at your compiler's implementation of the pow function:
Assuming you have the correct #include's, (all the previous answers and comments about this are correct -- don't take the #include files for granted), the prototype for the standard pow function is this:
double pow(double, double);
and you're calling pow like this:
The pow function goes through an algorithm (probably using logarithms), thus uses floating point functions and values to compute the power value.
The pow function does not go through a naive "multiply the value of x a total of n times", since it has to also compute pow using fractional exponents, and you can't compute fractional powers that way.
So more than likely, the computation of pow using the parameters 5 and 2 resulted in a slight rounding error. When you assigned to an int, you truncated the fractional value, thus yielding 24.
If you are using integers, you might as well write your own "intpow" or similar function that simply multiplies the value the requisite number of times. The benefits of this are:
You won't get into the situation where you may get subtle rounding errors using pow.
Your intpow function will more than likely run faster than an equivalent call to pow.
You want an int result from a function meant for doubles.
You should perhaps use
ele=(int)(0.5 + pow(n,2)); /* add 0.5 to round, then cast to int */
Floating-point arithmetic is not exact.
Although small values can be added and subtracted exactly, the pow() function normally works by multiplying logarithms, so even if the inputs are both exact, the result is not. Assigning to int always truncates, so if the inexactness is negative, you'll get 24 rather than 25.
The moral of this story is to use integer operations on integers, and be suspicious of <math.h> functions when the actual arguments are to be promoted or truncated. It's unfortunate that GCC doesn't warn unless you add -Wfloat-conversion (it's not in -Wall -Wextra, probably because there are many cases where such conversion is anticipated and wanted).
For integer powers, it's always safer and faster to use multiplication (division if negative) rather than pow() - reserve the latter for where it's needed! Do be aware of the risk of overflow, though.
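Here is a rough sketch of the "multiply, but watch for overflow" advice for non-negative exponents. The name checked_ipow is hypothetical, and the code assumes that long long is at least twice as wide as int (for example 64 bits versus 32 bits), so one multiplication can never overflow the intermediate.

```c
#include <limits.h>
#include <stdbool.h>

/* Compute base^exp by repeated multiplication.
 * Returns false (leaving *out untouched) if the result does not fit in int.
 * Assumption: long long is at least twice as wide as int. */
bool checked_ipow(int base, unsigned int exp, int *out)
{
    long long result = 1;
    for (unsigned int i = 0; i < exp; i++) {
        result *= base;                     /* both factors are within int range */
        if (result > INT_MAX || result < INT_MIN)
            return false;                   /* would overflow int */
    }
    *out = (int)result;
    return true;
}
```

The simple loop is linear in exp, which is fine for small exponents. Negative exponents would need division and generally do not give integer results, so they are left out of this sketch.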
When you use pow with variables, its result is double. Assigning to an int truncates it.
So you can avoid this error by assigning result of pow to double or float variable.
So basically
It translates to exp(log(x) * y), which will produce a result that isn't precisely the same as x^y, just a near approximation as a floating-point value. So, for example, 5^2 may become 24.9999996 or 25.00002.

Can I calculate error introduced by doubles?

Suppose I have an irrational number like \sqrt{3}. As it is irrational, it has no finite decimal representation. So when you try to express it with an IEEE 754 double, you will introduce an error.
A decimal representation with a lot of digits is:
Now, when I calculate \sqrt{3}, I get 1.732051:
#include <stdio.h> // printf
#include <math.h> // needed for sqrt
int main() {
    double myVar = sqrt (3);
    printf("as double:\t%f\n", myVar);
}
According to Wolfram|Alpha, I have an error of 1.11100... × 10^-7.
Is there any way I can calculate the error myself?
(I don't mind switching to C++, Python or Java. I could probably also use Mathematica, if there is no simple alternative)
Just to clarify: I don't want a solution that works only for sqrt{3}. I would like to get a function that gives me the error for any number. If that is not possible, I would at least like to know how Wolfram|Alpha gets more values.
My try
While writing this question, I found this:
#include <stdio.h> // printf
#include <math.h> // needed for sqrt
#include <float.h> // needed for higher precision
int main() {
    long double r = sqrtl(3.0L);
    printf("Precision: %d digits; %.*Lg\n",LDBL_DIG,LDBL_DIG,r);
}
With this one, I can get the error down to 2.0 * 10^-18 according to Wolfram|Alpha. So I thought this might be close enough to get a good estimation of the error. I wrote this:
#include <stdio.h> // printf
#include <math.h> // needed for sqrt
#include <float.h>
int main() {
    double myVar = sqrt (3);
    long double r = sqrtl(3.0L);
    long double error = abs(r-myVar) / r;
    printf("Double:\t\t%f\n", myVar);
    printf("Precision:\t%d digits; %.*Lg\n",LDBL_DIG,LDBL_DIG,r);
    printf("Error:\t\t%.*Lg\n", LDBL_DIG, error);
}
But it outputs:
Double: 1.732051
Precision: 18 digits; 1.73205080756887729
Error: 0
How can I fix that to get the error?
What Every Computer Scientist Should Know About Floating-Point Arithmetic by Goldberg is the definitive guide you are looking for.
printf rounds doubles to 6 places when you use %f without a precision.
double x = 1.3;
long double y = 1.3L;
long double err = y - (double) x;
printf("Error %.20Lf\n", err);
My output: -0.00000000000000004445
If the result is 0, your long double and double are the same.
One way to obtain an interval that is guaranteed to contain the real value of the computation is to use interval arithmetic. Then, comparing the double result to the interval tells you how far the double computation is, at worst, from the real computation.
Frama-C's value analysis can do this for you with option -all-rounding-modes.
double Frama_C_sqrt(double x);
double sqrt(double x)
{
    return Frama_C_sqrt(x);
}
double y;
int main(){
    y = sqrt(3.0);
}
Analyzing the program with:
frama-c -val t.c -float-normal -all-rounding-modes
[value] Values at end of function main:
y ∈ [1.7320508075688772 .. 1.7320508075688774]
This means that the real value of sqrt(3), and thus the value that would be in variable y if the program computed with real numbers, is within the double bounds [1.7320508075688772 .. 1.7320508075688774].
Frama-C's value analysis does not support the long double type, but if I understand correctly, you were only using long double as reference to estimate the error made with double. The drawback of that method is that long double is itself imprecise. With interval arithmetic as implemented in Frama-C's value analysis, the real value of the computation is guaranteed to be within the displayed bounds.
You have a mistake in how you print the double: printf("Double:\t\t%f\n", myVar); shows only Double: 1.732051.
The actual value of double myVar is
1.732050807568877281 //18 digits
so 1.732050807568877281-1.732050807568877281 is zero
According to the C standard printf("%f", d) will default to 6 digits after the decimal point. This is not the full precision of your double.
It might be that double and long double happen to be the same on your architecture. I have different sizes for them on my architecture and get a non-zero error in your example code.
You want fabsl instead of abs when calculating the error, at least when using C. (In C, abs is integer.) With this substitution, I get:
Double: 1.732051
Precision: 18 digits; 1.73205080756887729
Error: 5.79643049346087304e-17
(Calculated on Mac OS X 10.8.3 with Apple clang 4.0.)
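The abs to fabsl substitution can be packaged as a small helper, sketched below. The name relative_error is hypothetical, and the helper assumes that long double is actually wider than double on the target platform; where the two types are identical it simply returns 0 for a case like sqrt(3).

```c
#include <math.h>

/* Relative error of `approx` with respect to a higher-precision `reference`.
 * fabsl is the long double absolute value; abs() would convert to int first. */
long double relative_error(double approx, long double reference)
{
    return fabsl(reference - (long double)approx) / fabsl(reference);
}
```

In the question's program this would be called as relative_error(sqrt(3), sqrtl(3.0L)) and printed with a %Lg conversion.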
Using long double to estimate the errors in double is a reasonable approach for a few simple calculations, except:
If you are calculating the more accurate long double results, why bother with double?
Error behavior in sequences of calculations is hard to describe and can grow to the point where long double is not providing an accurate estimate of the exact result.
There exist perverse situations where long double gets less accurate results than double. (Mostly encountered when somebody constructs an example to teach students a lesson, but they exist nonetheless.)
In general, there is no simple and efficient way to calculate the error in a floating-point result in a sequence of calculations. If there were, it would be effectively a means of calculating a more accurate result, and we would use that instead of the floating-point calculations alone.
In special cases, such as when developing math library routines, the errors resulting from a particular sequence of code are studied carefully (and the code is redesigned as necessary to have acceptable error behavior). More often, error is estimated either by performing various “experiments” to see how much results fluctuate with varying inputs or by studying general mathematical behavior of systems.
You also asked “I would like to get a function that gives me the error for any number.” Well, that is easy: given any number x and the calculated result x', the error is exactly x' – x. The actual problem is that you probably do not have a description of x that can be used to evaluate that expression easily. In your example, x is sqrt(3). Obviously, then, the error is x' – sqrt(3), and x' is exactly 1.732050807568877193176604123436845839023590087890625. Now all you need to do is evaluate sqrt(3). In other words, numerically evaluating the error is about as hard as numerically evaluating the original number.
Is there some class of numbers you want to perform this analysis for?
Also, do you actually want to calculate the error or just a good bound on the error? The latter is somewhat easier, although it remains hard for sequences of calculations. For all elementary operations, IEEE 754 requires the produced result to be the result that is nearest the mathematically exact result (in the appropriate direction for the rounding mode being used). In round-to-nearest mode, this implies that each result is at most 1/2 ULP (unit of least precision) away from the exact result. For operations such as those found in the standard math library (sine, logarithm, et cetera), most libraries will produce results within a few ULP of the exact result.

How to use % operator for float values in c

When I use the % operator on float values I get an error stating "invalid operands to binary % (have ‘float’ and ‘double’)". I want to enter integer values only, but the numbers are very large (not in the range of the int type), so to avoid the inconvenience I use float. Is there any way to use the % operator on such large integer values?
You can use the fmod function from the standard math library. Its prototype is in the standard header <math.h>.
You're probably better off using long long, which has greater precision than double in most systems.
Note: If your numbers are bigger than a long long can hold, then fmod probably won't behave the way you want it to. In that case, your best bet is a bigint library, such as this one.
The % operator is only defined for integer type operands; you'll need to use the fmod* library functions for floating-point types:
#include <math.h>
double fmod(double x, double y);
float fmodf(float x, float y);
long double fmodl(long double x, long double y);
When I haven't had easy access to fmod or other libraries (for example, doing a quick Arduino sketch), I find that the following works well enough:
float someValue = 0.0;
// later...
// Since someValue = (someValue + 1) % 256 won't work for floats...
someValue += 1.0; // (or whatever increment you want to use)
while (someValue >= 256.0){
    someValue -= 256.0;
}
Consider: int is typically 32 bits and long long int is 64 bits.
Yes, the % (modulo) operator doesn't work with float and double. If you want to do the modulo operation on large numbers, long long int (64 bits) might help you.
If the range is still greater than 64 bits, you need to store the data in a string and do the modulo operation algorithmically.
Alternatively, you can switch to a scripting language like Python.
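A rough sketch of the string-based approach: process the decimal digits from left to right and keep only the running remainder. The name mod_decimal_string is hypothetical, and the code assumes a non-negative number, a modulus m > 0, and an unsigned long long that is wide enough to hold 10 * m.

```c
#include <ctype.h>

/* Remainder of a non-negative decimal number, given as a digit string, modulo m.
 * Stops at the first non-digit character; returns 0 for an empty string.
 * Assumptions: m > 0 and 10 * m fits in unsigned long long. */
unsigned int mod_decimal_string(const char *digits, unsigned int m)
{
    unsigned long long r = 0;
    for (; isdigit((unsigned char)*digits); digits++) {
        /* (r * 10 + digit) mod m keeps r below m at every step */
        r = (r * 10 + (unsigned long long)(*digits - '0')) % m;
    }
    return (unsigned int)r;
}
```

This works for inputs of any length because only the remainder, never the full number, is held in an integer variable.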
If you want to use an integer, use long long; don't use a format that is non-ideal for your problem if a better format exists.
