Precision between float and double in C

I understand there are several topics similar to mine, but I still don't really get it, so I'm hoping someone can explain this to me in a simpler but explicit way instead of pasting links to other topics. Thanks.
Here's a sample code:
int a = 960;
int b = 16;
float c = a*0.001;
float d = a*0.001 + b;
double e = a*0.001 + b;
printf("%f\n%f\n%lf", c, d, e);
which outputs:
0.960000
16.959999
16.960000
My two questions are:
Why does adding an integer to a float end up as the second output, while changing float to double solves the problem, as in the third output?
Why does the third output have the same number of digits after the decimal point as the first and second outputs, even though it should be a more precise value?

Both print the same number of decimal places because 6 is the default precision for %f. You can change that as in the edited example below. The syntax is %.Nf, where the precision N is either written directly as a number (as in %.9f) or, as in the second printf call, given as * and supplied as an extra int argument (%.*f).
#include <stdio.h>
int main(void) {
int a = 960;
int b = 16;
float c = a*0.001;
float d = a*0.001 + b;
double e = a*0.001 + b;
printf("%.9f\n", c);
printf("%.*f\n", 9, d);
printf("%.16f\n", e);
}
Program output:
0.959999979
16.959999084
16.9600000000000009
The extra decimal places now show that none of the results is exact. One reason is that 0.001 cannot be represented exactly as a binary floating-point value. Another is that a float keeps only 24 significant bits (about 7 decimal digits) while a double keeps 53 (about 16), which is why the float result for 16.96 is already off in the sixth decimal place while the double result is not. There are other reasons too, which have been covered extensively elsewhere.
One easy way to see why: a 32-bit float can encode at most 2^32 different values, but there are infinitely many real numbers within the range of float, so only a finite subset of them can be represented exactly. The fraction 1/1000 is not among them: in binary it is a recurring fraction (just as 1/3 is in decimal).

I think the calculation a*0.001 will be done in double precision in both cases, then some precision is lost when you store it as a float.
You can choose how many decimal digits are printed by printf by writing e.g. "%.10lf" (to get 10 digits) instead of just "%lf".

Related

Trouble when computing modulos with floats in C

I am not an expert in programming, and I am facing the following issue.
I need to compute modulo between floats A and B.
So I use fmod((double)A, (double)B).
Theoretically, if A is a multiple of B, then the result is 0.0.
However, due to floating-point precision limits, A and B are not exactly the numbers I expected.
So the result of the modulo computation is not 0.0 but something different, which is a problem.
Example:
A=99999.9, but the compiler interprets it as 99999.898.
B=99.9, but the compiler interprets it as 99.900002.
fmod(A,B) expected to be 0.0, but gives actually 99.9.
So the question is: how do you usually handle this kind of situation?
Thank you
The trouble is that:
A is not 99999.9, but 99999.8984375 and
B is not 99.9, but 99.90000152587890625 and
A mod B is 99.89691162109375
OP is getting the correct answer for the arguments given.
OP needs to use different arguments.
A reasonable alternative is to scale the arguments by a power of 10, round them to integers, apply %, then convert back to floating point and un-scale.
Overflow is a concern.
Since OP wants to treat numbers to the nearest 0.1, scale by 10.
#include <math.h>   /* fmod(), lround() */
#include <stdio.h>
int main(void) {
float A = 99999.9;
float B = 99.9;
printf("%.25f\n", A);
printf("%.25f\n", B);
printf("%.25f\n", fmod(A,B));
long long a = lround(A*10.0);
long long b = lround(B*10.0);
long long m = a%b;
double D = m/10.0;
printf("D = %.25f\n", D);
return 0;
}
Output
99999.8984375000000000000000000
99.9000015258789062500000000
99.8969116210937500000000000
D = 0.0000000000000000000000000
Alternative: instead of
long long a = lround(A*10.0);
long long b = lround(B*10.0);
long long m = a%b;
double D = m/10.0;
scale, but skip the integer conversion part:
double a = round(A*10.0);
double b = round(B*10.0);
double m = fmod(a,b);
double D = m/10.0;

Why doesn't roundf() round a float value and why do int - float math operations return wrong values?

I don't understand why the roundf() function from math.h doesn't round the donation variable, whilst it rounds livestockPM without a problem. I need the rounded values for other calculations, and I'm using printf to check whether the values are correct, but it simply prints wrong values (donation is not rounded). Also, the variable final only ever shows values as if rounded to .00, no matter what values farmer1, farmer2 and farmer3 hold.
#include<stdio.h>
#include<stdlib.h>
#include<math.h>
int main(){
int farmer1 = 9940;
int farmer2 = 4241;
int farmer3 = 7779;
float livestockPM = (float)farmer1 / (float)farmer2;
printf("livestock: %f\n",livestockPM);
livestockPM = roundf(livestockPM * 100) / 100;
printf("livestock rounded: %f\n",livestockPM);
float donation = (float)livestockPM * (float)farmer3;
printf("donation: %f\n", donation);
donation = roundf(donation * 100.00) / 100.00;
printf("donation rounded: %f\n", donation);
float final = donation * (float)farmer2;
printf("final: %f\n", final);
return 0;
}
Output:
livestock: 2.343787
livestock rounded: 2.340000
donation: 18202.859375
donation rounded: 18202.859375
final: 77198328.000000
Anyone got any idea why? I suspected it was because of multiplying a float by an int, but I can't seem to get it to work this way. I've tried removing the (float) casts from the integer variables, but the results were undesirable as well. Thanks.
OP's float is encoded using binary floating point, and near 18202.86 it lacks the precision to hold a value that "%f" prints as 18202.860000.
A float cannot represent every possible number. As a binary floating-point number it can represent numbers like the two below (see IEEE 754 Converter), but not the values in between:
18202.859375
18202.861328125
When the following executes, the best possible result is again 18202.859375.
float donation_rounded = roundf(18202.859375 * 100.00) / 100.00;
Recall that printf("%f\n", x) prints a number rounded textually to the closest 0.000001 value.
Code could use double, which may meet OP's immediate need, but the same problem will recur with very large numbers. #user3386109
As OP appears to be trying to handle money, there is no great solution in standard C. "best money/currency representation" goes into some of the issues.

Moving decimal place to right in C

I'm new to C, and when I run the code below, the value printed is 12098 instead of 12099.
I'm aware that working with decimals always involves a degree of inaccuracy, but is there a way to accurately move the decimal point to the right two places every time?
#include <stdio.h>
int main(void)
{
int i;
float f = 120.99;
i = f * 100;
printf("%d", i);
}
Use the round function from <math.h>:
float f = 120.99;
int i = round( f * 100.0 );
Be aware, however, that a float typically only has 6 or 7 digits of precision, so there's a maximum value where this will work. The smallest float value that won't convert properly is the number 131072.01. If you multiply by 100 and round, the result will be 13107202.
You can extend the range of your numbers by using double values, but even a double has limited precision. (A double has about 15 to 17 significant decimal digits of precision.) For example, the following code will print 10000000000000098:
double d = 100000000000000.99;
uint64_t j = (uint64_t)round( d * 100.0 );   /* uint64_t from <stdint.h>, round() from <math.h> */
printf( "%" PRIu64 "\n", j );                /* PRIu64 from <inttypes.h>; %llu does not portably match uint64_t */
That's just an example; finding the smallest number that exceeds the precision of a double is left as an exercise for the reader.
Use fixed-point arithmetic on integers:
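#include <stdio.h>
/* Named ABS so it does not clash with abs() from <stdlib.h>. */
#define ABS(x) ((x)<0 ? -(x) : (x))
int main(void)
{
int d = 12099;     /* 120.99 stored as a count of hundredths */
int i = d * 100;   /* decimal point moved two places right: 12099.00 */
printf("%d.%02d\n", d/100, ABS(d)%100);   /* prints 120.99 */
printf("%d.%02d\n", i/100, ABS(i)%100);   /* prints 12099.00 */
}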
Your problem is that floats are represented internally using IEEE-754, which is base 2, not base 10. 0.25 has an exact representation, but 0.1 does not, and neither does 120.99.
What really happens is that, due to floating-point inaccuracy, the IEEE-754 float closest to the decimal value 120.99, multiplied by 100, is slightly below 12099, so it is truncated to 12098. Your compiler may warn you about the conversion from float to int (mine did).
A simple way to get what you expect is to add 0.5 before the truncation to int:
i = (f * 100) + 0.5;
But beware: floating-point values are inherently inexact when representing decimal values.
Edit:
Of course, for negative numbers it should be i = (f * 100) - 0.5; ...
If you'd like to keep operating on the number as a floating-point number, then the answer is more or less no. There are various things you can do for small numbers, but as your numbers get larger, you'll have issues.
If you'd like to only print the number, then my recommendation would be to convert the number to a string, and then move the decimal point there. This can be slightly complicated depending on how the number is represented in the string (exponential notation and so on).
If you'd like this to work and you don't mind not using floating point, then I'd recommend researching any of the many fixed-point decimal libraries.
You can use
float f = 120.99f;
or
double f = 120.99;
By default C treats floating-point constants such as 120.99 as double, so if you store one in a float variable, an implicit conversion happens and precision is lost.
I think this works.

Number of decimal digits in C

I'd like to change the number of decimal digits shown whenever I print a float in C. Does it have something to do with the FLT_DIG value defined in float.h? If so, how could I change it from 6 to 10?
I'm getting a number like 0.000000 while the actual value is 0.0000003455.
There are two separate issues here: the precision of the stored floating-point number, which is determined by choosing float or double, and the precision with which the number is printed, for example:
float foo = 0.0123456789;
printf("%.4f\n", foo); // This will print 0.0123 (4 digits).
double bar = 0.012345678912345;
printf("%.10lf\n", bar); // This will print 0.0123456789
I experimented with this problem and found that you cannot get great precision with float; they are really bad. But if you use double it gives you the right answer. Just use %.10lf for precision up to 10 decimal places.
You're running out of precision. Floats don't have great precision; if you want more decimal places, use the double data type.
Also, it seems that you're using printf() & co. to display the numbers. If you switch from float to double, note that for printf() %f already handles double (float arguments are promoted to double) and %lf means the same thing since C99; it is scanf() where the specifier must change from %f to %lf for a double.
#kosmoplan - thank you for a good question!
#epsalon - thank you for a good response. My first thought, too, was "float" vs. "double". I was mistaken. You hit it on the head by realizing it was actually a "printf/format" issue. Good job!
Finally, to put to rest some lingering peripheral controversy:
/*
SAMPLE OUTPUT:
a=0.000000, x=0.012346, y=0.012346
a=0.0000003455, x=0.0123456791, y=0.0123456789
*/
#include <stdio.h>
int
main (int argc, char *argv[])
{
float x = 0.0123456789, a = 0.0000003455;
double y = 0.0123456789;
printf ("a=%f, x=%f, y=%lf\n", a, x, y);
printf ("a=%.10f, x=%.10f, y=%.10lf\n", a, x, y);
return 0;
}

How do I compute maximum/minimum of 8 different float values

I need to find maximum and minimum of 8 float values I get. I did as follows. But float comparisons are going awry as warned by any good C book!
How do I compute the max and min in an accurate way?
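#include <stdio.h>
float mymax(float a, float b);   /* prototypes, so the calls below pass and return float */
float mymin(float a, float b);
int main(void)
{
float mx,mx1,mx2,mx3,mx4,mn,mn1,mn2,mn3,mn4,tm1,tm2;
mx1 = mymax(2.1,2.01); //this returns 2.09999 instead of 2.1 because a is passed as 2.09999.
mx2 = mymax(-3.5,7.000001);
mx3 = mymax(7,5);
mx4 = mymax(7.0000011,0); //this returns incorrectly- 7.000001
tm1 = mymax(mx1,mx2);
tm2 = mymax(mx3,mx4);
mx = mymax(tm1,tm2);
mn1 = mymin(2.1,2.01);
mn2 = mymin(-3.5,7.000001);
mn3 = mymin(7,5);
mn4 = mymin(7.0000011,0);
tm1 = mymin(mn1,mn2); /* was mymin(mx1,mx2): the minimum must combine the mn values */
tm2 = mymin(mn3,mn4);
mn = mymin(tm1,tm2);
printf("Max is %f, Min is %f \n",mx,mn);
return 0; /* getch() from <conio.h> is non-standard, so it is omitted */
}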
float mymax(float a,float b)
{
if(a >= b)
{
return a;
}
else
{
return b;
}
}
float mymin(float a,float b)
{
if(a <= b)
{
return a;
}
else
{
return b;
}
}
How can I do exact comparisons of these floats? This is all C code.
thank you.
-AD.
You are doing exact comparison of these floats. The problem (with your example code at least) is that float simply does not have enough digits of precision to represent the values of your literals sufficiently. 7.000001 and 7.0000011 simply are so close together that the mantissa of a 32 bit float cannot represent them differently.
But the example seems artificial. What is the real problem you're trying to solve? What values will you actually be working with? Or is this just an academic exercise?
The best solution depends on the answer to that. If your actual values just require somewhat more precision than float can provide, use double. If you need exact representation of decimal digits, use a decimal type library. If you want to improve your understanding of how floating point values work, read The Floating-Point Guide.
You can do exact comparison of floats, either directly as floats or, for floats with the sign bit clear, by reinterpreting their bits as integers.
#include <stdint.h>
#include <string.h>
float a = 1.0f;
float b = 2.0f;
int32_t ia, ib;
memcpy(&ia, &a, sizeof ia); /* copy the bit pattern; *(int *)&a violates strict aliasing */
memcpy(&ib, &b, sizeof ib);
/* For positive floats (and +0.0) that are not NaN, comparing a and b gives
the same results as comparing ia and ib, because IEEE-754 floats of the
same sign are ordered like their bit patterns read as integers (this
assumes float is a 32-bit IEEE-754 type). Negative floats use sign and
magnitude, so their integer order is reversed. */
But you will never be able to represent exactly 2.1 as a float.
Your problem is not a problem of comparison, it is a problem of representation of a value.
I'd claim that these comparisons are actually exact, since no value is altered.
The problem is that many float literals can't be represented exactly by IEEE-754 floating-point numbers, for example 2.1.
If you need an exact representation of base-10 fractional numbers you could, for example, write your own fixed-point BCD arithmetic.
Concerning finding min and max at the same time:
A way that needs fewer comparisons is to compare the two elements of each index pair (2*i, 2*i+1) once, which gives the smaller and the larger element of every pair (n/2 comparisons).
Then find the minimum of the smaller elements (n/2 - 1 comparisons) and the maximum of the larger elements (n/2 - 1 comparisons).
So we get 3n/2 - 2 comparisons instead of 2n - 2 when finding the minimum and maximum separately (for n = 8: 10 instead of 14).
The < and > comparisons always work correctly with floats or doubles. Only the == comparison has problems, which is why you are advised to compare against an epsilon.
So your method of calculating min and max has no issue. Note that if you use float, you should write the literal as 2.1f instead of 2.1.
