float vs. double strange behaviour in c

I'm having trouble implementing a simple training program in C. The program should pick the sine or cosine of a random angle, print the question "calculate cosine/sine of the angle x" to the user, and have the user type in the right answer in the form "factor sqrt(value)", i.e. for cos(0) the user should type 1, for sin(45) the user should type 0.5sqrt(2). Most of the code is given in this task. The program doesn't work properly: for cos(270) the right answer is meant to be -0.000. Why is this happening? Why doesn't this code scream "division by 0"? Furthermore, according to the task description the variable right should be of type double and rueckgabe of type int. But when I use double instead of float, I just get very high values (like 21234 or -435343). If I used int as the return value of get_user_input(), the program wouldn't work, right?
Here's the code:
#include <stdio.h>
#include <math.h>
#include <time.h>
#include <stdlib.h>
#define PI (acos(-1))
#define ACCURACY 1e-4
float get_user_input(int is_cos, int angle){
if (is_cos == 1) {
printf("Berechnen Sie den Cosinus zu %i\n", angle);
}
else {
printf("Berechnen Sie den Sinus zu %i\n", angle);
}
float faktor, wurzel=1.;
float rueckgabe;
scanf("%fsqrt(%f)", &faktor, &wurzel);
rueckgabe = faktor * sqrt(wurzel);
return rueckgabe;
}
int main (){
float right;
int correct;
int angles[] = { 0, 30, 45, 60, 90, 180, 270, 360 };
srand ( time(NULL) );
int is_cos = rand()%2;
int angle = angles[ rand()%(sizeof(angles)/sizeof(int)) ];
if( is_cos == 1) {
right = cos(angle/180.*PI);
}
else {
right = sin(angle/180.*PI);
}
correct = fabs(get_user_input(is_cos, angle)/right - 1.) <= ACCURACY;
printf("Ihre Antwort war %s!\n", correct ? "richtig" : "falsch");
return 0;
}

Since the sine and cosine return values in the range [-1, 1], I'd suggest that you use the absolute, rather than the relative, error. By replacing
correct = fabs(get_user_input(is_cos, angle)/right - 1.) <= ACCURACY;
with
correct = fabs(get_user_input(is_cos, angle) - right) <= ACCURACY;
everything should work as expected.
Generally I tend to use relative errors for large values and absolute errors for small values. You can combine both with
fabs(a-b)/(1.0+min(fabs(a), fabs(b)))
which (assuming you have a reasonable definition of min) tends to the relative error for large values and to the absolute error for small ones.

In your program, both divisor and dividend can be close to / exactly 0 at the same time.
Your program does not give you a "division by zero"-error, because by default most floating point implementations silently give you infinity/NaN/0/-0, depending on the exact values you divide.
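A small sketch of that behaviour, assuming an IEEE 754 style implementation (which is typical but not guaranteed by the C standard). The function name quotient_class is made up for illustration; fpclassify and the FP_* macros come from <math.h>.

```c
#include <math.h>

/* Hypothetical helper: divides and reports what kind of value came out.
   Returns one of FP_INFINITE, FP_NAN, FP_ZERO, FP_NORMAL, FP_SUBNORMAL. */
int quotient_class(double numerator, double denominator)
{
    double q = numerator / denominator;  /* no trap, no error message */
    return fpclassify(q);
}
```

On such a system quotient_class(1.0, 0.0) reports FP_INFINITE and quotient_class(0.0, 0.0) reports FP_NAN, and the program simply carries on with those values.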

The cosine of 270° is not exactly zero in floating-point arithmetic, because 270° cannot be expressed exactly in radians.
The following table shows the cosine of 270° (in the middle row) and in the first and last rows the cosines of the adjacent 32-bit floating-point numbers:
phi          cos(phi)
4.7123885    -0.0000004649123
4.712389      0.000000011924881
4.7123895     0.00000048876205
And the same for 64-bit double-precision floating-point numbers:
phi                  cos(phi)
4.712388980384689    -1.0718754395722282e-15
4.71238898038469     -1.8369701987210297e-16
4.712388980384691     7.044813998280222e-16
With the current floating-point precision, there's no way that cos(phi) with phi in the vicinity of 1.5*pi can be exactly zero.
You could fix that by writing a cosdeg that takes an argument in degrees and returns exact values for the angles where the cosines and sines are -1, 0 or 1, and values calculated with radians otherwise. (Which then will happily generate the desired division by zero.)

The arguments to sin() and cos() are expressed in radians, not degrees.

Related

Round float to 2 decimal places in C language?

float number = 123.8798831;
number=(floorf((number + number * 0.1) * 100.0)) / 100.0;
printf("number = %f",number);
I want to get number = 136.25
But the compiler shows me number = 136.259995
I know that I can write printf("number = %.2f",number), but I need the number itself for further operations. It is necessary that the number be stored in a variable as number = 136.25

It is necessary that the number be stored in a variable as number = 136.25
But that would be the incorrect result. The precise result of number + number * 0.1 is 136.26787141. When you round that downwards to 2 decimal places, the number that you would get is 136.26, and not 136.25.
However, there is no way to store 136.26 in a float because it simply isn't a representable value (on your system). Best you can get is a value that is very close to it. You have successfully produced a floating point number that is very close to 136.26. If you cannot accept the slight error in the value, then you shouldn't be using finite precision floating point arithmetic.
If you wish to print the value of a floating point number up to limited number of decimals, you must understand that not all values can be represented by floating point numbers, and that you must use %.2f to get desired output.
Round float to 2 decimal places in C language?
Just like you did:
multiply with 100
round
divide by 100

I agree with the other comments/answers that using floating point numbers for money is usually not a good idea, because not all numbers can be stored exactly. Basically, when you use floating point numbers, you sacrifice exactness for being able to store very large and very small numbers and being able to store decimals. You don't want to sacrifice exactness when dealing with real money, but I think this is a student project, and no actual money is being calculated, so I wrote this small program to show one way of doing it.
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
int main(void)
{
double number, percent_interest, interest, result, rounded_result;
number = 123.8798831;
percent_interest = 0.1;
interest = (number * percent_interest)/100; //Calculate interest of interest_rate percent.
result = number + interest;
rounded_result = floor(result * 100) / 100;
printf("number=%f, percent_interest=%f, interest=%f, result=%f, rounded_result=%f\n", number, percent_interest, interest, result, rounded_result);
return EXIT_SUCCESS;
}
As you can see, I use double instead of float, because double has more precision and floating point constants are of type double, not float. The code in your question should give you a warning because in
float number = 123.8798831;
123.8798831 is of type double and has to be converted to float (possibly losing precision in the process).
You should also notice that my program calculates interest at .1% (like you say you want to do) unlike the code in your question which calculates interest at 10%. Your code multiplies by 0.1 which is 10/100 or 10%.

Here is an example of a function you can use for rounding to x number of decimals.
Code:
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <stddef.h>
double dround(double number, int dp)
{
int charsNeeded = 1 + snprintf(NULL, 0, "%.*f", dp, number);
char *buffer = malloc(charsNeeded);
snprintf(buffer, charsNeeded, "%.*f", dp, number);
double result = atof(buffer);
free(buffer);
return result;
}
int main()
{
float number = 37.777779;
number = dround(number,2);
printf("Number is %f\n",number);
return 0;
}

Underflow error in floating point arithmetic in C

I am new to C, and my task is to create a function
f(x) = sqrt[(x^2)+1]-1
that can handle very large numbers and very small numbers. I am submitting my script on an online interface that checks my answers.
For very large numbers I simplify the expression to:
f(x) = x-1
By just using the highest power. This was the correct answer.
The same logic does not work for smaller numbers. For small numbers (on the order of 1e-7), they are very quickly truncated to zero, even before they are squared. I suspect that this has to do with floating point precision in C. In my textbook, it says that the float type has smallest possible value of 1.17549e-38, with 6 digit precision. So although 1e-7 is much larger than 1.17e-38, it has a higher precision, and is therefore rounded to zero. This is my guess, correct me if I'm wrong.
As a solution, I am thinking that I should convert x to a long double when x < 1e-6. However when I do this, I still get the same error. Any ideas? Let me know if I can clarify. Code below:
#include <math.h>
#include <stdio.h>
double feval(double x) {
/* Insert your code here */
if (x > 1e299)
{
return x-1;
}
if (x < 1e-6)
{
long double g;
g = x;
printf("x = %Lf\n", g);
long double a;
a = pow(x,2);
printf("x squared = %Lf\n", a);
return sqrt(g*g+1.)- 1.;
}
else
{
printf("x = %f\n", x);
printf("Used third \n");
return sqrt(pow(x,2)+1.)-1;
}
}
int main(void)
{
double x;
printf("Input: ");
scanf("%lf", &x);
double b;
b = feval(x);
printf("%f\n", b);
return 0;
}

For small inputs, you're getting truncation error when you do 1+x^2. If x=1e-7f, x*x will happily fit into a 32 bit floating point number (with a little bit of error due to the fact that 1e-7 does not have an exact floating point representation), but x*x will be so much smaller than 1 that floating point precision will not be sufficient to represent 1+x*x.
It would be more appropriate to do a Taylor expansion of sqrt(1+x^2), which to lowest order would be
sqrt(1+x^2) = 1 + 0.5*x^2 + O(x^4)
Then, you could write your result as
sqrt(1+x^2)-1 = 0.5*x^2 + O(x^4),
avoiding the scenario where you add a very small number to 1.
As a side note, you should not use pow for integer powers. For x^2, you should just do x*x. Arbitrary integer powers are a little trickier to do efficiently; the GNU scientific library for example has a function for efficiently computing arbitrary integer powers.
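The lowest-order expansion above can be sketched like this. The function name is hypothetical, and the sketch only covers the small-x branch; where to switch over to the direct formula is left to the caller.

```c
/* Hypothetical helper for small |x|: sqrt(1 + x^2) - 1 is approximately
   x^2 / 2, with an error of order x^4. Nothing tiny is added to 1 here. */
double feval_small(double x)
{
    return 0.5 * x * x;   /* x*x instead of pow(x, 2), as recommended above */
}
```

For x = 1e-7 this gives about 5e-15.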

There are two issues here when implementing this in the naive way: overflow or underflow in intermediate computation when computing x * x, and subtractive cancellation during the final subtraction of 1. The second issue is an accuracy issue.
ISO C has a standard math function hypot (x, y) that performs the computation sqrt (x * x + y * y) accurately while avoiding underflow and overflow in intermediate computation. A common approach to fix issues with subtractive cancellation is to transform the computation algebraically such that it is transformed into multiplications and / or divisions.
Combining these two fixes leads to the following implementation for float argument. It has an error of less than 3 ulps across all possible inputs according to my testing.
/* Compute sqrt(x*x+1)-1 accurately and without spurious overflow or underflow */
float func (float x)
{
return (x / (1.0f + hypotf (x, 1.0f))) * x;
}

A trick that is often useful in these cases is based on the identity
(a+1)*(a-1) = a*a-1
In this case
sqrt(x*x+1)-1 = (sqrt(x*x+1)-1)*(sqrt(x*x+1)+1)
/(sqrt(x*x+1)+1)
= (x*x+1-1) / (sqrt(x*x+1)+1)
= x*x/(sqrt(x*x+1)+1)
The last formula can be used as an implementation. For very small x, sqrt(x*x+1)+1 will be close to 2 (for small enough x it will be exactly 2), but we don't lose precision in evaluating it.

The problem isn't with running into the minimum value, but with the precision.
As you said yourself, float on your machine has about 7 digits of precision. So let's take x = 1e-7, so that x^2 = 1e-14. That's still well within the range of float, no problems there. But now add 1. The exact answer would be 1.00000000000001. But if we only have 7 digits of precision, this gets rounded to 1.0000000, i.e. exactly 1. So you end up computing sqrt(1.0)-1 which is exactly 0.
One approach would be to use the linear approximation of sqrt around 1, namely sqrt(y) ~ 1+0.5*(y-1). With y = x^2+1 that leads to the approximation f(x) ~ 0.5*x^2.

How to compare double variables in the if statement

As I am trying to compare these doubles, it won't seem to be working correctly
Here it goes: (This is exactly my problem)
#include <stdio.h>
#include <math.h>
int main () {
int i_wagen;
double dd[20];
dd[0]=0.;
dd[1]=0.;
double abstand= 15.;
double K_spiel=0.015;
double s_rel_0= K_spiel;
int i;
for(i=1; i<=9; i++)
{
i_wagen=2*(i-1)+2;
dd[i_wagen]=dd[i_wagen-1]-abstand;
i_wagen=2*(i-1)+3;
dd[i_wagen]=dd[i_wagen-1]-s_rel_0;
}
double s_rel=dd[3-1]-dd[3];
if((fabs(s_rel) - K_spiel) == 0.)
{
printf("yes\n");
}
return(0);
}
After executing the program, it won't print the yes.

How to compare double variables in the if statement?
Take into account the limited precision of the double representation of floating point numbers!
Your problem is simple and covered in Is floating point math broken?
Floating point operations are not precise. The representation of the given number may not be precise.
For 0.1 in the standard binary64 format, the representation can be written exactly as 0.1000000000000000055511151231257827021181583404541015625
Double precision (double) gives you only 52 stored bits of significand (53 counting the implicit leading bit), 11 bits of exponent, and 1 sign bit. Floating point numbers in C typically use IEEE 754 encoding.
See the output of your program and the possible fix where you settle down for the variable being close to 0.0:
#include <stdio.h>
#include <math.h>
#define PRECISION 1e-6
int main (void) {
int i_wagen;
double dd[20];
dd[0]=0.;
dd[1]=0.;
double abstand= 15.;
double K_spiel=0.015;
double s_rel_0= K_spiel;
int i;
for(i=1; i<=9; i++)
{
i_wagen = 2*(i-1)+2;
dd[i_wagen] = dd[i_wagen-1]-abstand;
i_wagen = 2*(i-1)+3;
dd[i_wagen] = dd[i_wagen-1] - s_rel_0;
}
double s_rel = dd[3-1]-dd[3];
printf(" s_rel %.16f K_spiel %.16f diff %.16f \n" , s_rel, K_spiel, ((fabs(s_rel) - K_spiel)) );
if((fabs(s_rel) - K_spiel) == 0.0) // THIS WILL NOT WORK!
{
printf("yes\n");
}
// Settle down for being close enough to 0.0
if( fabs( (fabs(s_rel) - K_spiel)) < PRECISION)
{
printf("yes!!!\n");
}
return(0);
}
Output:
s_rel 0.0150000000000006 K_spiel 0.0150000000000000 diff 0.0000000000000006
yes!!!

You're comparing x to two different matrix entries: the first if compares x to coeff[0][0], the second to coeff[0][1]. So if x is greater than coeff[0][0] and less than or equal to coeff[0][1], the program will execute the final else branch. You probably want to compare x to the same matrix entry in both if statements. And in that case, the last else branch would be useless, since one of the three cases (less than, equal to or greater than) MUST be true.

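First, check that dd[i_wagen-1] as used in the statement:
dd[i_wagen]=dd[i_wagen-1]-abstand;
is initialized before it is read. In this program it is: dd[0] and dd[1] are assigned explicitly, and the loop writes every later element before reading it. Reading an uninitialized element would give unpredictable results, though, so initializing the whole array is a cheap safeguard.
To initialize, you can use: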
double dd[20]={0}; //sufficient
or possibly
double dd[20]={0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0}; //explicit, but not necessary
Moving to your actual question, it all comes down to this statement:
if((fabs(s_rel) - K_spiel) == 0.)
You have initialized K_spiel to 0.015. And at this point in your execution flow s_rel appears to be close to 0.015. But it is actually closer to 0.0150000000000006. So the comparison fails.
One trick that is commonly used is to define an epsilon value, and use it to determine if the difference between two floating point values is small enough to satisfy your purpose:
From The Art of Computer Programming, the following snippet uses this approach, and will work for your very specific example: (caution: Read why this approach will not work for all floating point related comparisons.)
bool approximatelyEqual(double a, double b, double epsilon) /* bool needs <stdbool.h>, fabs needs <math.h> */
{
return fabs(a - b) <= ( (fabs(a) < fabs(b) ? fabs(b) : fabs(a)) * epsilon);
}
So replace the line:
if((fabs(s_rel) - K_spiel) == 0.)
with
if(approximatelyEqual(s_rel, K_spiel, 1e-8))

Understanding the maximum values that can be stored in floats in C

I have come across some behaviour with the float type in C that I do not understand, and was hoping might be explained. Using the macros defined in float.h I can determine the maximum/minimum values that the datatype can store on the given hardware. However when performing a calculation that should not exceed these limits, I find that a typed float variable fails where a double succeeds.
The following is a minimal example, which compiles on my machine.
#include <stdio.h>
#include <stdlib.h>
#include <float.h>
int main(int argc, char **argv)
{
int gridsize;
long gridsize3;
float *datagrid;
float sumval_f;
double sumval_d;
long i;
gridsize = 512;
gridsize3 = (long)gridsize*gridsize*gridsize;
datagrid = calloc(gridsize3, sizeof(float));
if(datagrid == NULL)
{
free(datagrid);
printf("Memory allocation failed\n");
exit(0);
}
for(i=0; i<gridsize3; i++)
{
datagrid[i] += 1.0;
}
sumval_f = 0.0;
sumval_d = 0.0;
for(i=0; i<gridsize3; i++)
{
sumval_f += datagrid[i];
sumval_d += (double)datagrid[i];
}
printf("\ngridsize3 = %e\n", (float)gridsize3);
printf("FLT_MIN = %e\n", FLT_MIN);
printf("FLT_MAX = %e\n", FLT_MAX);
printf("DBL_MIN = %e\n", DBL_MIN);
printf("DBL_MAX = %e\n", DBL_MAX);
printf("\nfloat sum = %f\n", sumval_f);
printf("double sum = %lf\n", sumval_d);
printf("sumval_d/sumval_f = %f\n\n", sumval_d/(double)sumval_f);
free(datagrid);
return(0);
}
Compiling with gcc I find the output:
gridsize3 = 1.342177e+08
FLT_MIN = 1.175494e-38
FLT_MAX = 3.402823e+38
DBL_MIN = 2.225074e-308
DBL_MAX = 1.797693e+308
float sum = 16777216.000000
double sum = 134217728.000000
sumval_d/sumval_f = 8.000000
Whilst compiling with icc the sumval_f = 67108864.0 and hence the final ratio is instead 2.0*. Note that the float sum is incorrect, whilst the double sum is correct.
As far as I can tell the output of FLT_MAX suggests that the sum should fit into a float, and yet it seems to plateau out at either an eighth or a half of the full value.
Is there a compiler specific override to the values found using float.h?
Why is a double required to correctly find the sum of this array?
*Interestingly the inclusion of an if statement inside the for loop that prints values of the array causes the value to match the gcc output, i.e. an eighth of the correct sum, rather than a half.

The problem here isn't the range of values but the precision.
Assuming a 32-bit IEEE754 float, this datatype has a maximum of 24 bits of precision. This means that not all integers larger than 16777216 can be represented exactly.
So when your sum reaches 16777216, adding 1 to it is outside the precision of what the datatype can store, so the number doesn't get any bigger.
A (presumably) 64-bit double has 53 bits of precision. This is enough bits to hold all integer values up to your sum of 134217728, so it gives you an accurate result.
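The plateau can be reproduced in isolation with a sketch like the one below, assuming a 32-bit IEEE 754 float. The helper name is made up; the volatile store is there only to force the result to be rounded to float on platforms that evaluate intermediate results in higher precision.

```c
/* Hypothetical helper: adds 1 in float precision and returns the rounded result. */
float add_one(float value)
{
    volatile float result = value + 1.0f;
    return result;
}
```

Under that assumption add_one(16777215.0f) yields 16777216.0f, but add_one(16777216.0f) yields 16777216.0f again, which is why the float sum in the question stops growing.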

A float can precisely represent any integer between -16777215 and +16777215, inclusive. It can also represent all even integers between -2*16777215 and +2*16777215 (including +/- 2*8388608, i.e. 16777216), all multiples of 4 between -4*16777215 and +4*16777215, and likewise for all power-of-two scaling factors up to 2^104 (roughly 2.028E+31). Additionally, it can represent multiples of 1/2 from -16777215/2 to +16777215/2, multiples of 1/4 from -16777215/4 to +16777215/4, etc. down to multiples of 1/2^149 from -16777215/(2^149) to +16777215/(2^149).

Floating point numbers have to stand in for the infinitely many real values between any two numbers, but computers cannot hold an infinite number of values. So a compromise is made: a floating point number holds an approximation of the value.
This means that if you pick a value that is "more" than the stored floating point number, but not enough to arrive at the "next" storable approximation, then storing that logically bigger number won't actually change the floating point value.
The "error" in a floating point approximation is variable. For small numbers, the absolute error is small; for bigger numbers, the error is proportionally the same, but larger in absolute terms.

Sine function using Taylor expansion (C Programming)

Here is the question..
This is what I've done so far,
#include <stdio.h>
#include <math.h>
long int factorial(int m)
{
if (m==0 || m==1) return (1);
else return (m*factorial(m-1));
}
double power(double x,int n)
{
double val=1;
int i;
for (i=1;i<=n;i++)
{
val*=x;
}
return val;
}
double sine(double x)
{
int n;
double val=0;
for (n=0;n<8;n++)
{
double p = power(-1,n);
double px = power(x,2*n+1);
long fac = factorial(2*n+1);
val += p * px / fac;
}
return val;
}
int main()
{
double x;
printf("Enter angles in degrees: ");
scanf("%lf",&x);
printf("\nValue of sine of %.2f is %.2lf\n",x,sine(x * M_PI / 180));
printf("\nValue of sine of %.2f from library function is %.2lf\n",x,sin(x * M_PI / 180));
return 0;
}
The problem is that the program works perfectly fine from 0 to 180 degrees, but beyond that it gives wrong results. Also, when I increase the limit on n in for (n=0;n<8;n++) beyond 8, I get a significant error. There is nothing wrong with the algorithm, I've tested it on my calculator, and the program seems to be fine as well. I think the problem is due to the range of the data type. What should I correct to get rid of this error?
Thanks..

You are correct that the error is due to the range of the data type. In sine(), you are calculating the factorial of 15, which is a huge number and does not fit in 32 bits (which is presumably what long int is implemented as on your system). To fix this, you could either:
Redefine factorial to return a double.
Rework your code to combine power and factorial into one loop, which alternately multiplies by x, and divides by i. This will be messier-looking but will avoid the possibility of overflowing a double (granted, I don't think that's a problem for your use case).
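The second option could look roughly like the sketch below. The function name and the terms parameter are assumptions for illustration; each term is obtained from the previous one, so neither a factorial nor a power is ever computed on its own.

```c
/* Hypothetical replacement for sine(): sums the first `terms` terms of
   x - x^3/3! + x^5/5! - ... without separate power() or factorial(). */
double sine_series(double x, int terms)
{
    double term = x;   /* first term: x^1 / 1! */
    double sum = x;

    for (int n = 1; n < terms; n++) {
        /* next term = previous term * (-x^2) / ((2n) * (2n + 1)) */
        term *= -x * x / ((2.0 * n) * (2.0 * n + 1.0));
        sum += term;
    }
    return sum;
}
```

Because everything stays in double, raising terms beyond 8 no longer overflows an integer type.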

15! is indeed beyond range that a 32bit integer can hold. I'd use doubles throughout if I were you.
The Taylor series for sin(x) converges more slowly for large values of x. For x outside [-π, π], I'd add/subtract multiples of 2*π to get as small an x as possible.

You need range reduction. Note that a Taylor series is best near zero and that in the negative range it is the (negative) mirror image of its positive range. So, in short: reduce the range (modulo 2 PI) to wrap it into the range where you have the highest accuracy. The range beyond 1/2 PI is getting less accurate, so you also want to use the formula: sin(1/2 PI + x) = sin(1/2 PI - x). For negative values use the formula: sin(-x) = -sin(x). Now you only need to evaluate the interval 0 to 1/2 PI while spanning the whole range. Of course for VERY large values the accuracy of the reduction modulo 2 PI will suffer.

You may be having a problem with 15!.
I would print out the values for p, px, fac, and the value for the term for each iteration, and check them out.

You're only including 8 terms in an infinite series. If you think about it for a second in terms of a polynomial, you should see that you don't have a good enough fit for the entire curve.
The fact is that you only need to write the function for 0 <= x <= π; all other values will follow using these relationships:
sin(-x) = -sin(x)
and
sin(x+π) = -sin(x)
and
sin(x+2nπ) = sin(x)
I'd recommend that you normalize your input angle using these to make your function work for all angles as written.
There's a lot of inefficiency built into your code (e.g. you keep recalculating factorials that would easily fit in a table lookup; you use power() to oscillate between -1 and +1). But first make it work correctly, then make it faster.
