Floating point rounding in C - c

I've run into some weird rounding behaviour with floats. The code below demonstrates the problem. What is the best way to solve this? I've been looking for solutions but haven't had much luck.
#include<stdio.h>
int main(void)
{
float t;
t = 5592411;
printf("%f\n", 1.5*t);
t *= 1.5;
printf("%f\n", t);
return 0;
}
The code above should print out the same value, but I get this on my setup using GCC 4.7.2:
8388616.500000
8388616.000000
If I use a calculator, I get the first value, so I assume the second is being rounded somehow. I have identical Fortran code which does not round the value (it keeps the 0.5).

1.5 is a double constant rather than a float and C has automatic promotion rules. So when you perform 1.5*t what happens is (i) t is converted to a double; (ii) that double is multiplied by the double 1.5; and (iii) the double is printed (as %f is the formatter for a double).
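Conversely, t *= 1.5 converts t to a double, performs a double multiplication and then rounds the result to store it back into a [single precision] float. With IEEE 754 floats, 8388616.5 lies exactly halfway between the adjacent floats 8388616 and 8388617, and the default round-to-nearest-even mode picks 8388616.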
For evidence, try either:
float t;
t = 5592411;
printf("%f\n", 1.5f*t); // multiply a float by a float, for no promotion
t *= 1.5;
printf("%f\n", t);
return 0;
Or:
double t; // store our intermediate results in a double
t = 5592411;
printf("%f\n", 1.5f*t);
t *= 1.5;
printf("%f\n", t);
return 0;

The first calculation is done in double precision. The second is calculated the same way, but the result is rounded to single precision in the assignment to float.
If you use double for your variable, you'll get the same result. It's a good idea to use this type over float whenever accuracy may be a concern.

In the first case, the result is a double which can precisely represent the desired value.
In the second case, the result is a float which can't precisely represent the desired value.
Try the same with double and you'll end up with the same results either way.
#include<stdio.h>
int main(void)
{
double t;
t = 5592411;
printf("%f\n", 1.5*t);
t *= 1.5;
printf("%f\n", t);
return 0;
}

Writing 1.5 in C code is interpreted as a double, which has more precision than the float type.
The first case,
printf("%f\n", 1.5*t);
results in t being implicitly converted to a double (with greater precision) and then multiplied. The result is also a double, which is what printf expects for %f (a float argument would be promoted to double anyway), so it is printed without loss.
The second case, t *= 1.5, performs the same double multiplication but then converts the result back to the float type, which has less precision and cannot hold the trailing .5.
If you want to avoid this effect, use 1.5f instead of 1.5 to use floats, or change the type of t to double.

Note that argument passing is not the cause here. When a float is passed to a variadic function such as printf, it is promoted to double, so %f always receives a double whether the expression had type float or double. Only a genuine mismatch, such as passing an int or a long double where %f expects a double, is undefined behavior, and then the output might be close to the intended result or might be way out in left field.

Related

VS code is showing different answer for two same code

I ran two versions of the same code, but they print different answers.
Code 1:
#include<stdio.h>
int main(){
float far = 98.6;
printf("%f", (far-32)*5/9);
return 0;
}
Code 2:
#include<stdio.h>
int main(){
float far = 98.6;
float cel;
cel = (far-32)*5/9;
printf("%f", cel);
return 0;
}
First code gives 36.99999 as output and second code gives 37.00000 as output.
Research FLT_EVAL_METHOD (defined in <float.h>). It reports the precision the implementation may use for intermediate floating-point math.
printf("%d\n", FLT_EVAL_METHOD);
When this is non-zero, the two programs may have different output, as printf("%f", (far-32)*5/9); can print the result of (far-32)*5/9 using double or float math.
In the 2nd case, (far-32)*5/9 is performed using float or double math, then saved as a float, and then printed. Its promotion to a double as part of a printf() ... argument does not affect the value.
For deeper understanding, print far, cel, (far-32)*5/9 with "%a" and "%.17g" for greater detail.
In both cases, far is the float value 0x1.8a6666p+6 or 98.599998474121094...
As I see it the first used double math in printf("%f", (far-32)*5/9); and the second used double math too, yet rounded to a float result from cel = (far-32)*5/9;. To be certain we need to see the intermediate results as suggested above.
Avoid double constants with float objects. It sometimes makes a difference.
// float far = 98.6;
float far = 98.6f;
Use double objects as a default. Save float for select speed/space cases. #Some programmer dude.
The difference lies in the types used and the printf call.
Variable-argument functions like printf will promote arguments of smaller types. So for example a float argument will be promoted to double.
The type double has much higher precision than float.
Since in the first program you do the calculation as part of the actual printf call, without storing the result in a lower-precision variable, the compiler might perform the whole calculation using double, which increases the precision. That precision is lost when you store the result in cel in the second example.
Unless you have very specific requirements, you should generally always use double for all your floating-point variables and values and calculations.

How to extract fractional part of a double value without rounding in c

When I try to extract the fractional part of a double in C, it seems to be rounded down. Is there a way to do it without rounding?
double t = 8.2;
int ipart = (int)t;
long long val = abs((long long)(t*1000000));
long long fpart = (val)%1000000;
fpart gives 199999. Is there a way to get 200000 without rounding down? I have tried many ways, but none of them seems to work for all numbers.
The intention is to finally convert this double into a string which should have the exact value "8.20000". If I can extract the fractional part into a long long variable, then I can generate the string using snprintf.
How to extract fractional part of a double value without rounding ...?
Use modf() from the standard C library.
#include <math.h>
double ipart;
double fraction = modf(value, &ipart);
printf("Fraction %g\n", fraction);
printf("Whole number %g\n", ipart);
The modf functions break the argument value into integral and fractional parts, each of which has the same type and sign as the argument. They store the integral part (in floating-point format) in the object pointed to by iptr.
C17dr § 7.12.6.12 2
Deeper into 8.2
double can represent about 2^64 different values exactly. 8.2 is not one of them.
With double t = 8.2;, t takes on a nearby value, which is exactly 8.199999999999999289457264239899814128875732421875 or
8 + 1801439850948192/9007199254740992 due to the binary nature of common double.
To express the fraction as a ratio of integers, use fraction*pow(2,DBL_MANT_DIG) as the numerator and pow(2,DBL_MANT_DIG) as the denominator.
Thus the goal of "to get it as 200000 without rounding down" as 200000/denominator for the fraction part of t is not possible.
The value 8.2 can't be exactly represented in binary floating point. The actual value is closer to 8.19999999999999929.
Because of this, you're forced to round:
long long val = llabs(round(t*1000000));
Or add 0.5 to the value before converting:
long long val = llabs((t*1000000) + 0.5);

Assigning a value to a float variable changes the value in C

I am trying to calculate an average of integer numbers and assign it to a float variable. When I debug it with cgdb and print the right side of the average calculation, it gives me the right number. However, when I assign it to the (float*)payload the value changes from 401850471 to 401850464.00.
float sum= 0.0;
for (int i = 0;
i<avg_operator->data_source->column_pointer.result->num_tuples;
i++) {
sum+= ((int*)avg_operator->data_source->column_pointer.result->payload)[i];
}
((float*)avg_operator->result->payload)[0]=
sum/(float)avg_operator->data_source->column_pointer.result->num_tuples;
You cannot convert an int to a float by casting their pointers, that gives a random / undefined value. You need to dereference the float pointer, and assign the value.
that:
((float*)avg_operator->result->payload)[0]= sum/(float)avg_operator->data_source->column_pointer.result->num_tuples;
isn't casting, it's lying to the compiler. You should dereference, and no need to cast to float, as the conversion to integer is done automatically:
avg_operator->result->payload[0]= sum/(float)avg_operator->data_source->column_pointer.result->num_tuples;
(well, maybe you need to round the value instead of truncating, though)
also, since payload is an integer, no need to cast to integer pointer as well, just do:
sum+= avg_operator->data_source->column_pointer.result->payload[i];
and define sum as a float, one never knows with floating point accumulation error (if the sum isn't too big for an integer, that is)
When I debug it with cgdb and print the right side of the average calculation, it gives me the right number.
The debugger is showing the quotient using double math. C allows float division to use wider types. But once the quotient is assigned to a float, precision narrowing may occur.
401850471 is a 29 bit value. A float typically has 24 bits of precision. Something must give.
401850464.0 is the closest representable float to 401850471, so that speaks well to that at least there is a reasonable result.
OP is also doing other strange code manipulations. A recommended solution begins with a wider sum type and more precise division and storage.
long long sum = 0;
int n = avg_operator->data_source->column_pointer.result->num_tuples;
int *data = (int*)avg_operator->data_source->column_pointer.result->payload;
for (int i = 0; i < n; i++) {
sum += data[i];
}
double average = 1.0 * sum / n;
printf("Average %f\n", average);
If the answer must be a float, code must live with a rounded (in a binary sense) average.

Trying to print answer to equation and getting zero in C.

printf("Percent decrease: ");
printf("%.2f", (float)((orgChar-codeChar)/orgChar));
I'm using this statement to print some results to my command console, however, I end up with zero. Putting the equation into another variable doesn't work either.
orgChar = 91 and codeChar = 13, how do I print out this equation?
Integer division yields 0 here, and the cast to float is applied only afterwards, so you end up with 0.
Convert one of the operands to float before the division:
(orgChar-codeChar)/(float)orgChar
As others have mentioned, the subtraction and division are done using integer math before the cast to (float). By that point, the integer division has a truncated result of 0. Instead:
// (float)((orgChar-codeChar)/orgChar)
((float) orgChar - codeChar)/orgChar
// or
(orgChar - codeChar)/ (float) orgChar
As the float argument gets converted to double as part of the "usual argument promotion" of arguments to a variadic function like printf(), might as well do
printf("%.2f", (orgChar-codeChar)/ (double) orgChar);
Casting, in general, should be avoided. Some casts unintentionally narrow the operation. In the shift example below, if unsigned is 32-bit and a1 is uint64_t, the cast narrows a1 before the shift and unexpected results may occur. If a1 is a char, it is converted to an unsigned without trouble.
The second method, *1u, will not narrow. It ensures a2*1u is at least the width of an unsigned.
unsigned sh1 = (unsigned) a1 >> b1; // avoid
unsigned sh2 = a2*1u >> b2; // better
So rather than (float) or (double), I recommend the idiom of multiplying by 1.0.
printf("%.2f", (orgChar - codeChar) * 1.0 / orgChar);
You don't need to typecast the whole expression. You can simply cast either the numerator or the denominator to get a float result, which %.2f then prints with 2 decimal places.
For example:
In this code, defining a variable c as float doesn't guarantee that the result is a float. To get the precise result you need to typecast either the numerator or the denominator.
You shouldn't need to cast to float at all. Simply make sure both variables are of type float or double before attempting to print them as floats. This means either declaring the variables as floats, or using the correct function, such as atof () when converting the data to floats (normally this is done when you get the data from the command-line or a file.)
This should work...
#include <stdio.h>
int
main (void)
{
float orgChar = 91;
float codeChar = 13;
printf ("%.2f\n", (orgChar - codeChar) / orgChar);
return 0;
}

Why does 1/2 in c = -2010232232?

I have a very simple C program.
int main(void){
double sum = 1/2;
printf("%d\n", sum);
return 0;
}
Why does it return a number like "-2030243223"?
And why does this number change every time I run the program?
I've tried using int's and float's but I can't seem to get the output to be 0.5!?
Use %f to print a double, not %d. The latter causes undefined behavior.
Also the expression 1/2 uses integer division which yields 0, so to get .5, use 1/2. (note trailing period).
Finally, to actually get .5 instead of something like 0.500000, specify the precision:
printf("%.1f\n", sum);
You're passing an IEEE floating-point number to printf, but telling it it's an integer.
Change
printf("%d\n", sum);
to
printf("%f\n", sum);
As noted in other comments, you probably also want to do floating-point division rather than integer division. 1/2 is zero; you're dividing two integers, so the result is an integer (yes, even though you're about to assign it into a floating-point variable -- the type of an expression is determined by its operands, not by where the result goes) and it has to discard the remainder. You probably want 1.0/2 instead; if either operand is floating-point, then the result is floating-point, so you'll get 0.5.
You're trying to print a double as an integer (%d).
Even though they both are numbers, internally there are huge differences, for example, memory ones.
You're trying to print a double as an int. Try using:
printf("%g\n", sum);
Also, you will want to make your initial division 1.0/2.0 or else you will get 0.
Try this:
int main(void){
double sum = 1.0/2.0;
printf("%10.3f\n", sum);
return 0;
}
There are a couple of issues with your code:
int main(void){
// you need to divide a double by a double, not two ints
double sum = 1.0/2.0;
// %d is for int, not doubles. You use %f in the case of printing a double
printf("%f\n", sum);
return 0;
}
Now if you really want to print the variable as an int, you can cast the variable sum to an int by doing printf("%d\n", (int)sum);, but the fractional part is discarded, so 0.5 prints as 0.
Try the following:
printf("%f\n", sum);
%d is a placeholder for integer, maybe the address of sum has been printed out^^
The bad format can't be the whole answer, because:
With OP's code, the double should be zero, stored as 0x0000000000000000, which couldn't show as "-2030243223" (0x86FCF269).
OP says this value changes all the time.
So there must be something else, such as the fact that the header is missing: I guess the compiler doesn't know printf is a variadic function, and tries to pass the argument through floating-point registers.
printf, being a variadic function, probably attempts to read its input value from the stack, hence garbage.
