C weird approximation on floating point [duplicate] - c

This question already has answers here:
Is floating point math broken?
(31 answers)
Closed 8 years ago.
I have the following code:
#include<stdio.h>
int main(int argc, char const *argv[])
{
float min, max, step;
min = -0.85, max = 0.85, step = 0.002;
int rank = 3, total = 4;
float step1 = min + (max - min) * rank / total; // should be 0.425
printf("%f %.7g\n", step1, step1); // 0.425000 0.4250001
float step2 = min + (max - min) * (rank + 1) / total - step; //should be 0.848
printf("%f %.7g\n", step2, step2); // 0.848000 0.848
float noc = (step2 - step1 + step) / step; //should be 212,5
printf("%f %.7g\n", noc, noc); // 212.499985 212.5
int nol = 1200;
int result = (int)nol * noc; //should be 255000
printf("%d\n", result); // 254999
return 0;
}
This is a fragment of code isolated from a project I have to do. The final result should be 255000, but for some reason it shows 254999. Can someone please explain to me what happens in the process? I have read somewhere that multiplying a floating-point number by 10^k and then dividing back solves such problems, but in this case, because the variable step varies from 0.000001 to 0.1, I can't actually use that (in the same way, I can't use a defined EPSILON). What else can I do?
Thanks in advance!
P.S.: I have tried double and long double as well, but with the same problem; the error just appears at a later decimal place. I am using gcc 4.8.2, under Ubuntu 14.04.1.

Truncation vs. rounding.
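Due to the subtle rounding effects of FP arithmetic, the product nol * noc may be slightly less than the expected integer value. Note that in (int)nol * noc the cast applies only to nol; the product is still a float, and converting it to int discards the fractional part. I suggest rounding before the conversion to int.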
#include <math.h>
int result = (int) roundf(nol * noc);

the significant problem(s) are:
1) mixing floating point and double with integer math
--so the compiler promotes all the math to float (or double)
2) not all numbers can be expressed exactly in float
3) --the initialization of min, max, step are taking double literals
and converting them to float
--even double cannot express all values exactly
--some precision is lost when performing the conversion from double to float
4) taken on its own, the excerpt (rank + 1) / total is integer division and yields 1 for rank = 3, total = 4
--in your full expression, however, (max - min) * (rank + 1) is evaluated first, so the division is performed in floating point
5) argc and argv are not referenced in your code.
--this, given that all warnings are enabled, will raise two warnings
at compile time about unused parameters
6) this line in your code is unconventional style: #include<stdio.h>
--it is valid C and the compiler accepts it
--the usual spelling is #include <stdio.h>

Related

Calculator for pi stopping

I'm trying to calculate pi with the precision of 10 decimal places. And the efficiency has to be the best(speed and memory allocation). The programming language is C in CodeBlocks.
I don't want to change the formula I'm using:
Problem: after a while, the resulting number stops incrementing but the iteration doesn't stop.
I'm not sure if this is a math problem or some kind of variable overflow.
The resulting number is 3.1415926431 and the number I want to achieve is 3.1415926535.
Every time the incrementation stops at this specific number and the iteration continues. Is there a possibility of an overflow or something?
For now I'm printing every thousandth iteration (just to see the progress). This will be deleted in the end.
Note that
a = n; a *= 4 * a; is for memory efficiency; I made several similar shortcuts.
The code I'm using:
#include <stdio.h>
#include <math.h>
#include <time.h>
int main(){
double time_spent = 0.0;
clock_t begin = clock();
int n=1;
double resultNumber= 1;
double pi = 3.1415926535;
double pi2 = pi / 2;
double a;
while(1){
a = n;
a *= 4 * a;
resultNumber *= a / (a - 1);
n++;
if (fabs(resultNumber - pi2) < pow(10,-10))
break;
if (n%1000==0) {
printf("%.10f %d\n", resultNumber*2, n);
}
}
clock_t end = clock();
time_spent += (double)(end - begin) / CLOCKS_PER_SEC;
printf("The elapsed time is %f seconds", time_spent);
return 0;
}
You can try it out here:
https://onlinegdb.com/q2Gil1DHdy
Is there a possibility of an overflow or something?
The precision of floating-point numbers is limited. In a typical C implementation, double has 53 bits of mantissa, which corresponds to about 15 significant decimal digits. But the range of such FP numbers is much larger than ±10^15, so when your FP number is large enough, the units digit is not significant. Then subtracting 1 from it will not produce a different number. When your a reaches that point, the quotient a / (a - 1) will be identically 1, so multiplying by that will not change the working result.
It's possible that you would get enough additional precision by using long double instead of double. That might help both in getting you more terms in your product before the problem described above sets in, and also by reducing the relative magnitude of FP rounding errors earlier in the computation.
You can rescue a little of the accuracy by the following trick:
4n² / (4n² - 1) = 1 + 1 / (4n² - 1)
For large n, these factors are close to 1 and challenge the floating-point representation. You can use the identity
(1 + a)(1 + b)(1 + c)... = 1 + (a + b + c...) + (ab + ac + ... + bc + ...) ...
So for small terms a, b, c... (when the second order terms disappear), it is more accurate to use the approximation 1 + (a + b + c...), of course summing inside the parenthesis first.

Sum of array of floats returns different results [duplicate]

This question already has answers here:
How best to sum up lots of floating point numbers?
(5 answers)
Is floating point math broken?
(31 answers)
Closed 4 years ago.
Here I have a function sum() of type float that takes in a pointer t of type float and an integer size. It returns the sum of all the elements in the array. Then I create two arrays using that function. One that has the BIG value at the first index and one that has it at the last index. When I return the sums of each of those arrays I get different results. This is my code:
#include <stdlib.h>
#include <stdio.h>
#define N 1024
#define SMALL 1.0
#define BIG 100000000.0
float sum(float* t, int size) { // here I define the function sum()
float s = 0.0;
for (int i = 0; i < size; i++) {
s += t[i];
}
return s;
}
int main() {
float tab[N];
for (int i = 0; i < N; i++) {
tab[i] = SMALL;
}
tab[0] = BIG;
float sum1 = sum(tab, N); // initialize sum1 with the big value at index 0
printf("sum1 = %f\n", sum1);
tab[0] = SMALL;
tab[N-1] = BIG;
float sum2 = sum(tab, N); // initialize sum2 with the big value at last index
printf("sum2 = %f\n", sum2);
return 0;
}
After compiling the code and running it I get the following output:
sum1 = 100000000.000000
sum2 = 100001024.000000
Why do I get different results even though the arrays have the same elements ( but at different indexes ).
What you're experiencing is floating point imprecision. Here's a simple demonstration.
#include <stdio.h>
int main(void) {
float big = 100000000.0;
float small = 1.0;
printf("%f\n", big + small);
printf("%f\n", big + (19 * small));
return 0;
}
You'd expect 100000001.0 and 100000019.0.
$ ./test
100000000.000000
100000016.000000
Why'd that happen? Because computers don't store numbers like we do, floating point numbers doubly so. A float has a size of just 32 bits, but can store numbers up to about 3.4 × 10^38 rather than just the 2^31 a 32-bit integer can. And it can store fractional values. How? They cheat. What it really stores is the sign, an exponent, and a mantissa.
sign * 2^exponent * mantissa
The mantissa is what determines accuracy, and there are only 24 bits of it in a float. So large numbers lose precision.
You can read about exactly how and play around with the representation.
To solve this either use a double which has greater precision, or use an accurate, but slower, arbitrary precision library such as GMP.
Why do I get different results even though the arrays have the same elements
In floating-point math, 100000000.0 + 1.0 equals 100000000.0 and not 100000001.0, but 100000000.0 + 1024.0 does equal 100001024.0. Given the value 100000000.0, the value 1.0 is too small to show up in the available bits used to represent 100000000.0.
So when you put 100000000.0 first, all the later + 1.0 operations have no effect.
When you put 100000000.0 last, though, the preceding 1023 additions of 1.0 accumulate exactly to 1023.0, and that is "big enough" to make a difference given the available precision: 100000000.0 + 1023.0 rounds to the nearest representable float, which is 100001024.0.

How to compare double variables in the if statement

I am trying to compare these doubles, but the comparison doesn't seem to work correctly.
Here it goes: (This is exactly my problem)
#include <stdio.h>
#include <math.h>
int main () {
int i_wagen;
double dd[20];
dd[0]=0.;
dd[1]=0.;
double abstand= 15.;
double K_spiel=0.015;
double s_rel_0= K_spiel;
int i;
for(i=1; i<=9; i++)
{
i_wagen=2*(i-1)+2;
dd[i_wagen]=dd[i_wagen-1]-abstand;
i_wagen=2*(i-1)+3;
dd[i_wagen]=dd[i_wagen-1]-s_rel_0;
}
double s_rel=dd[3-1]-dd[3];
if((fabs(s_rel) - K_spiel) == 0.)
{
printf("yes\n");
}
return(0);
}
After executing the program, it won't print the yes.
How to compare double variables in the if statement?
Take into account the limited precision of the double representation of floating point numbers!
Your problem is simple and covered in Is floating point math broken?
Floating point operations are not precise. The representation of the given number may not be precise.
For 0.1 in the standard binary64 format, the representation can be written exactly as 0.1000000000000000055511151231257827021181583404541015625
Double precision (double) gives you only 52 bits of significand (53 counting the implicit leading bit), 11 bits of exponent, and 1 sign bit. Most C implementations use the IEEE 754 encoding for floating point numbers.
See the output of your program and the possible fix where you settle down for the variable being close to 0.0:
#include <stdio.h>
#include <math.h>
#define PRECISION 1e-6
int main (void) {
int i_wagen;
double dd[20];
dd[0]=0.;
dd[1]=0.;
double abstand= 15.;
double K_spiel=0.015;
double s_rel_0= K_spiel;
int i;
for(i=1; i<=9; i++)
{
i_wagen = 2*(i-1)+2;
dd[i_wagen] = dd[i_wagen-1]-abstand;
i_wagen = 2*(i-1)+3;
dd[i_wagen] = dd[i_wagen-1] - s_rel_0;
}
double s_rel = dd[3-1]-dd[3];
printf(" s_rel %.16f K_spiel %.16f diff %.16f \n" , s_rel, K_spiel, ((fabs(s_rel) - K_spiel)) );
if((fabs(s_rel) - K_spiel) == 0.0) // THIS WILL NOT WORK!
{
printf("yes\n");
}
// Settle down for being close enough to 0.0
if( fabs( (fabs(s_rel) - K_spiel)) < PRECISION)
{
printf("yes!!!\n");
}
return(0);
}
Output:
s_rel 0.0150000000000006 K_spiel 0.0150000000000000 diff 0.0000000000000006
yes!!!
You're comparing x to two different matrix entries: the first if compares x to coeff[0][0], the second to coeff[0][1]. So if x is greater than coeff[0][0] and less than or equal to coeff[0][1] the program will execute the final else branch. You probably want to compare x to the same matrix entry in both if statements. And in that case, the last else branch would be useless, since one of the three cases (less than, equal to or greater than) MUST be true.
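First, a note on dd[i_wagen-1] as used in the statement:
dd[i_wagen]=dd[i_wagen-1]-abstand;
Only dd[0] and dd[1] are assigned before the loop. The loop does fill dd[2] through dd[19] in order, so each element is written before it is read, but that depends on the loop bounds staying in step with the array size.
To initialize the whole array up front, you can use: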
double dd[20]={0}; //sufficient
or possibly
double dd[20]={0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0}; //explicit, but not necessary
Moving to your actual question, it all comes down to this statement:
if((fabs(s_rel) - K_spiel) == 0.)
You have initialized K_spiel to 0.015. And at this point in your execution flow s_rel appears to be close to 0.015. But it is actually closer to 0.0150000000000006. So the comparison fails.
One trick that is commonly used is to define an epsilon value, and use it to determine if the difference between two floating point values is small enough to satisfy your purpose:
From The Art of Computer Programming, the following snippet uses this approach, and will work for your very specific example: (caution: Read why this approach will not work for all floating point related comparisons.)
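#include <math.h>
#include <stdbool.h>
// Relative comparison: the tolerance scales with the larger magnitude.
bool approximatelyEqual(double a, double b, double epsilon)
{
return fabs(a - b) <= ( (fabs(a) < fabs(b) ? fabs(b) : fabs(a)) * epsilon);
}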
So replace the line:
if((fabs(s_rel) - K_spiel) == 0.)
with
if(approximatelyEqual(s_rel, K_spiel, 1e-8))

C multiplying and dividing

I am running a program for school and am running into a simple math problem that C is not calculating correctly. I am simply trying to divide my output by a number and it is returning the original number.
int main(void)
{
float fcost = 599;
float shipt = (fcost / 120);
shipt = shipt * 120;
printf("%4.0f", shipt);
return 0;
}
From what was stated in the comments, you want the result of the division to be rounded down.
In order to do that, you should use int instead of float so that integer division is performed (which truncates the fractional part) instead of floating point division (which retains the fractional part):
int fcost = 599;
int shipt = (fcost / 120);
shipt *=120;
printf("%d", shipt);
If you're trying to simply divide the number by 120, remove the next line, i.e remove
shipt*=120
As it nullifies the previous line's process.
In the event this isn't what you mean, please clarify.
OP appears to want a truncated number quotient.
Converting a FP number to int risks undefined behavior (UB) if the original double is much outside the range of int.
The standard library provides various rounding functions to avoid the above problem.
#include <math.h>
#include <stdio.h>
int main(void) {
float fcost = 599;
// use exactly ONE of the three alternatives below
// to round to the next lower whole number
float shipt = floorf(fcost / 120);
// to round to the next higher whole number
// float shipt = ceilf(fcost / 120);
// to round to the next whole number toward 0
// float shipt = truncf(fcost / 120);
shipt *= 120;
printf("%4.0f", shipt);
return 0;
}

Floating point exception in C

#include <stdio.h>
#include <math.h>
#include <string.h>
long fctl(int n){
int a=1,i;
for(i=n;i>1;--i)
a*=i;
return a;
}
long ch(int n, int r){
return fctl(n) / (fctl(n-r)*fctl(r));
}
int main(){
char *hands[] = {"onepair", "twopair", "triple", "straight", "flush", "fullhouse", "fourofakind", "straightflush", "royalflush"};
double handprobs[9];
handprobs[0] = ch(13,1)*ch(4,2) * ch(12,3)*pow(ch(4,1), 3) / ch(52,5);
handprobs[1] = ch(13,2)*pow(ch(4,2), 2) * ch(11,1)*ch(4,1) / ch(52,5);
handprobs[2] = ch(13,1)*ch(4,3) * ch(12,2)*pow(ch(4,1), 2) / ch(52,5);
handprobs[3] = 10.0 * pow(ch(4, 1),5) / ch(52, 5) - 10.0/ch(52,5) - 4.0/ch(52,5);
handprobs[4] = ch(13,5)*ch(4,1) / ch(52, 5) - 10.0/ch(52,5);
handprobs[5] = ch(13,1)*ch(4,3) * ch(12,1)*ch(4,2) / ch(52,5);
handprobs[6] = ch(13,1)*1 * ch(12,1)*ch(4,1) / ch(52,5);
handprobs[7] = 40.0 / ch(52, 5) - 4.0/ch(52,5),
handprobs[8] = 4.0 / ch(52, 5);
int i;
for(i=0;hands[i];++i){
printf("%s\t%f\n",hands[i], handprobs[i]);
}
}
When I compile it returns "Floating point exception (core dumped)", not sure why. (Have tried converting all the probs with (double).) Any ideas?
fctl(52) is waaaaay too big for an int. You're going to have to rethink your approach to doing this calculation. You can output INT_MAX to see how far you can actually go. You can buy a tiny bit more space by using unsigned long long (cf. ULLONG_MAX) but that is still nowhere near big enough for 52! .
Invoking integer overflow causes undefined behaviour; "floating point exception" often means attempt to do integer division by zero, which is plausible given your attempted calculations plus the fact that they overflowed. Don't ask me why this is reported as FPE despite the fact that it didn't involve any floating point. (probably "historical reasons")
After the accepted answer:
@Matt McNabb rightly points out that fctl(52) is certainly too big for a valid numeric result to fit in a long. (@mrVoid asserts a 225-bit int is needed.)
But @Lưu Vĩnh Phúc is certainly on the right track as to what caused the exception.
fctl(x) is the product of the numbers 1 to x, and half of those are even. Thus fctl(x) will have at least x/2 least-significant bits set to zero.
Assuming a 32-bit int, once the number of zero LSBits of fctl(n-r) and fctl(r) together reaches 32, the product (fctl(n-r)*fctl(r)) will be 0, and return fctl(n) / (0); throws an exception.
On many systems an integer divide by 0 is reported as a floating-point error. I think this oddity occurs to simplify trap handling.
