Typically, rounding to 2 decimal places is very easy with
printf("%.2lf",<variable>);
However, the rounding system will usually round to the nearest even. For example,
2.554 -> 2.55
2.555 -> 2.56
2.565 -> 2.56
2.566 -> 2.57
And what I want to achieve is that
2.555 -> 2.56
2.565 -> 2.57
In fact, rounding half-up is doable in C, but only for integers:
int a = (int)(b+0.5);
So I'm asking how to do the same thing with 2 decimal places, on positive values, instead of with integers, to get the printed results described above.
It is not clear whether you actually want to "round half-up", or rather "round half away from zero", which requires different treatment for negative values.
Single precision binary float is precise to at least 6 decimal digits, and double to at least 15, so nudging an FP value by DBL_EPSILON (defined in float.h) will cause printf( "%.2lf", x ) to round n.nn5 values up to the next 100th, without affecting the displayed value for values that are not n.nn5.
double x2 = x * (1 + DBL_EPSILON) ; // round half-away from zero
printf( "%.2lf", x2 ) ;
For different rounding behaviours:
double x2 = x * (1 - DBL_EPSILON) ; // round half-toward zero
double x2 = x + DBL_EPSILON ; // round half-up
double x2 = x - DBL_EPSILON ; // round half-down
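As a rough sketch of the multiplicative nudge above: the helper name format_half_away is made up for illustration, and the behaviour assumes IEEE 754 binary64 doubles in the default round-to-nearest mode. The value is scaled by (1 + DBL_EPSILON) and then formatted with two decimals.

```c
#include <float.h>
#include <stdio.h>

/* Hypothetical helper (not from the answer): nudge x away from zero by
   roughly one part in 2^52, then format it with two decimals.
   Assumes IEEE 754 binary64 doubles and round-to-nearest. */
int format_half_away(char *buf, size_t size, double x)
{
    double nudged = x * (1 + DBL_EPSILON);
    return snprintf(buf, size, "%.2f", nudged);
}
```

With these assumptions, 2.565 (stored slightly below 2.565) is formatted as 2.57, while 2.554 is still formatted as 2.55.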
Following is precise code to round a double to the nearest 0.01 double.
The code functions like x = round(100.0*x)/100.0; except it uses manipulations to ensure that scaling by 100.0 is done exactly, without precision loss.
Likely this is more code than OP is interested in, but it does work.
It works for the entire double range -DBL_MAX to DBL_MAX. (still should do more unit testing).
It depends on FLT_RADIX == 2, which is common.
#include <float.h>
#include <math.h>
#include <stdio.h>
void r100_best(const char *s) {
double x;
sscanf(s, "%lf", &x);
// Break x into whole number and fractional parts.
// Code only needs to round the fractional part.
// This preserves the entire `double` range.
double xi, xf;
xf = modf(x, &xi);
// Multiply the fractional part by N (256).
// Break into whole and fractional parts.
// This provides the needed extended precision.
// N should be >= 100 and a power of 2.
// The multiplication by a power of 2 will not introduce any rounding.
double xfi, xff;
xff = modf(xf * 256, &xfi);
// Multiply both parts by 100.
// *100 needs 7 more bits of precision; the preceding code
// ensures the 8 least-significant bits of xfi and xff are zero, so they fit.
int xfi100, xff100;
xfi100 = (int) (xfi * 100.0);
xff100 = (int) (xff * 100.0); // Cast here will truncate (towards 0)
// sum the 2 parts.
// sum is the exact truncate-toward-0 version of xf*256*100
int sum = xfi100 + xff100;
// add in half N
if (sum < 0)
sum -= 128;
else
sum += 128;
xf = sum / 256;
xf /= 100;
double y = xi + xf;
printf("%6s %25.22f ", "x", x);
printf("%6s %25.22f %.2f\n", "y", y, y);
}
int main(void) {
r100_best("1.105");
r100_best("1.115");
r100_best("1.125");
r100_best("1.135");
r100_best("1.145");
r100_best("1.155");
r100_best("1.165");
return 0;
}
[Edit] OP clarified that only the printed value needs rounding to 2 decimal places.
OP's framing of the "half-way" cases as "round to even" versus "round away from zero" is misleading. Of 100 "half-way" numbers like 0.005, 0.015, 0.025, ... 0.995, only 4 are typically exactly "half-way": 0.125, 0.375, 0.625, 0.875. This is because floating-point number formats use base 2, and numbers like 2.565 cannot be exactly represented.
Instead, sample numbers like 2.565 have as the closest double value 2.564999999999999947... assuming binary64. Rounding that number to the nearest 0.01 should give 2.56 rather than 2.57 as desired by OP.
Thus only numbers ending with 0.125 and 0.625 are exactly half-way and round down (to even) rather than up as desired by OP; 0.375 and 0.875 already round up. Suggest to accept that and use:
printf("%.2lf",variable); // This should be sufficient
To get close to OP's goal, numbers could be A) tested against ending with 0.125 or 0.625 or B) increased slightly. The smallest increase would be
#include <math.h>
printf("%.2f", nextafter(x, 2*x));
Another nudge method is found in @Clifford's answer.
[Former answer that rounds a double to the nearest double multiple of 0.01]
Typical floating-point uses formats like binary64, which employs base 2. "Rounding to nearest mathematical 0.01 and ties away from 0.0" is challenging.
As @Pascal Cuoq mentions, floating-point numbers like 2.555 typically are only near 2.555 and have a more precise value like 2.555000000000000159872... which is not half-way.
@BLUEPIXY's solution below is best and practical.
x = round(100.0*x)/100.0;
"The round functions round their argument to the nearest integer value in floating-point
format, rounding halfway cases away from zero, regardless of the current rounding direction." C11dr §7.12.9.6.
The ((int)(100 * (x + 0.005)) / 100.0) approach has 2 problems: it may round in the wrong direction for negative numbers (OP did not specify) and integers typically have a much smaller range (INT_MIN to INT_MAX) than double.
There are still some cases, such as double x = atof("1.115");, which end up near 1.12 when the result really should be 1.11, because 1.115 as a double is really closer to 1.11 and not "half-way".
string  x                             rounded x
1.115   1.1149999999999999911182e+00  1.1200000000000001065814e+00
OP has not specified rounding of negative numbers, assuming y = -f(-x).
Related
I am doing a numerical analysis of math software I developed. I want to identify the uncertainty of my result. With f() being my method and x an input value, I want to identify the y such that my result is f(x) +/- y. My f() method has multiple operations between float variables. To study the error propagation that occurs in f(), I have to apply the statistical propagation-of-uncertainty formulas, and in order to do so I have to know the uncertainty of a float variable.
I do understand the architecture of a float variable as specified in the IEEE 754 standard and the rounding error converting a decimal value to float inherent to the latter.
From what I understood of the literature, the FLT_EPSILON macro in http://www.cplusplus.com/reference/cfloat/
defines my y value but this quick test proves it wrong:
float f1 = 1.234567f;
float f2 = 1.234567f + 1.192092896e-7f;
float f3 = 1.234567f + 1.192092895e-7f;
printf("Inicial:\t%f\n", f1);
printf("Inicial:\t%f\n", f2);
printf("Inicial:\t%f\n\n", f3);
Output:
Inicial: 1.234567
Inicial: 1.234567
Inicial: 1.234567
When the expected output should be:
Inicial: 1.234567
Inicial: 1.234568 <---
Inicial: 1.234567
What is it that I am wrong about?
Should not the float value of x + FLT_EPSILON and x - FLT_EPSILON be the same?
EDIT: My question is being R the float value of x, what is the y value that x + y || x - y equals the same R float value?
Propagation of uncertainty is from the field of statistics and refers to how uncertainties in inputs affect mathematical functions of them. The analysis of errors that occur in computational arithmetic is numerical analysis.
FLT_EPSILON is not a measure of uncertainty or error in floating-point results. It is the distance between 1 and the next value representable in the float type. Hence, it is the size of steps between representable numbers at the magnitude of 1.
When you convert a decimal numeral to floating-point, the rounding error that results may have a magnitude of up to ½ the step size when the common round-to-nearest mode is used. The reason the bound is ½ the step size is that for any number x (within the finite domain of the floating-point format), there is a representable value within ½ the step size (inclusive). This is because, if there is a representable number more than ½ the step size in one direction, there is a representable number less than ½ the step size in the other direction.
The step size varies with the magnitudes of the numbers. With binary floating-point, it doubles at 2, and again at 4, then 8, and so on. Below 1, it halves, and again at ½, ¼, and so on.
When you perform floating-point arithmetic operations, the rounding that occurs in the computation may compound or cancel previous errors. There is no general formula for the final error.
The two numerals used in your sample code, 1.192092897e-7f and 1.192092896e-7f, are so close together that they convert to the same float value, 2^-23. That is why there is no difference between your f2 and f3.
There is a difference between f1 and f2, but you did not print enough digits to display it.
You ask “Should not the float value of x + FLT_EPSILON and x - FLT_EPSILON be the same?”, but your code does not contain x - FLT_EPSILON.
Re: “My question is being R the float value of x, what is the y value that x + y || x - y equals the same R float value?” This is trivially satisfied by y = 0. Did you mean to ask what is the largest value of y that satisfies the condition? That is a bit complicated.
The step size for a number x is called the ULP of x, which we may consider as a function ULP(x). ULP stands for Unit of Least Precision. It is the place value of the least digit in the floating-point representation of x. It is not a constant; it is a function of x.
For most values representable in a floating-point format, the largest y that satisfies your condition is ½ ULP(x) if the least digit in the floating-point representation of x is even and, if the digit is odd, it is just under ½ ULP(x). This complication arises from the rule that the results of arithmetic are rounded to the nearest representable value and, in case of a tie, the value with the even low digit is chosen. Thus, adding ½ ULP(x) to x will yield a tie that will round to x if the low digit is even, but will not round to x if the low digit is odd.
However, for x that are on the boundary where the ULP changes, the largest y that satisfies your condition is ¼ ULP(x). This is because, just below x (in magnitude), the step size changes, and the next number lower than x is half of x’s step size away instead of the usual full step size. So you can only go halfway toward that value before changing the result of the subtraction, so the most y can be is ¼ ULP(x).
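The three cases described above can be checked near x = 1, where ULP(x) equals FLT_EPSILON. This is a sketch with made-up helper names, assuming IEEE 754 binary32 and the default round-to-nearest, ties-to-even mode; the volatile temporaries are there to force each result to be rounded to float.

```c
#include <float.h>

/* Returns 1 if x + y rounds back to x in float arithmetic. */
int sum_unchanged(float x, float y)
{
    volatile float s = x + y;   /* volatile forces rounding to float */
    return s == x;
}

/* Returns 1 if x - y rounds back to x in float arithmetic. */
int diff_unchanged(float x, float y)
{
    volatile float d = x - y;
    return d == x;
}

/* The cases from the text, checked at x = 1 and its upper neighbour. */
int half_ulp_cases_hold(void)
{
    float even = 1.0f;                /* low significand bit is 0 */
    float odd  = 1.0f + FLT_EPSILON;  /* low significand bit is 1 */
    return sum_unchanged(even, FLT_EPSILON / 2)     /* tie rounds back to x    */
        && !sum_unchanged(odd, FLT_EPSILON / 2)     /* tie rounds away from x  */
        && diff_unchanged(even, FLT_EPSILON / 4)    /* boundary case: 1/4 ULP  */
        && !diff_unchanged(even, FLT_EPSILON / 2);  /* 1/2 ULP changes x below 1 */
}
```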
Float is a 32-bit IEEE 754 single-precision floating-point number: 1 bit for the sign, 8 bits for the exponent, and 23 bits for the significand (plus an implicit leading bit), i.e. float has about 7 decimal digits of precision.
Increase the number of digits printed by printf to see more, but after 7 digits it's just noise:
#include <stdio.h>
int main(void) {
float f1 = 1.234567f;
float f2 = 1.234567f + 1.192092897e-7f;
float f3 = 1.234567f + 1.192092896e-7f;
printf("Inicial:\t%.16f\n", f1);
printf("Inicial:\t%.16f\n", f2);
printf("Inicial:\t%.16f\n\n", f3);
return 0;
}
Output:
Inicial: 1.2345670461654663
Inicial: 1.2345671653747559
Inicial: 1.2345671653747559
float f1 = 1.234567f;
float f2 = f1 + 1.192092897e-7f;
float f3 = f1 + 1.192092896e-7f;
printf("Inicial:\t%.20f\n", f1);
printf("Inicial:\t%.20f\n", f2);
printf("Inicial:\t%.20f\n\n", f3);
Output:
Inicial: 1.23456704616546630000
Inicial: 1.23456716537475590000
Inicial: 1.23456716537475590000
No, your expectation is wrong.
In the first printf call you are printing f1, to which nothing has been added, so it is just 1.234567f.
How to round result to the third digit after the decimal point.
float result = cos(number);
Note that I want to save the result up to the third digit, no rounding. And no, I don't want to print it with .3f, I need to save it as new value;
Example:
0.00367 -> 0.003
N.B. No extra zeroes after 3 are wanted.
Also, I need to be able to get the 3rd digit. For example if it is 0.0037212, I want to get the 3 and use it as an int in some calculation.
0.00367 -> 0.003
A float can typically represent about 2^32 different values exactly. 0.00367 and 0.003 are not in that set.
The closest float to 0.00367 is 0.0036700000055134296417236328125
The closest float to 0.003__ is 0.0030000000260770320892333984375
I want to save the result up to the third digit
This goal needs a compromise. Save the result to a float near a multiple of 0.001.
Scaling by 1000.0, truncating and dividing by 1000.0 will work for most values.
float y1 = truncf(x * 1000.0f) / 1000.0f;
The above gives a slightly wrong answer with some values near x.xxx000... and x.xxx999.... Using higher precision can solve that.
float y2 = (float) (trunc(x * 1000.0) / 1000.0);
I want to get the 3 and use it as an int in some calculation.
Skip the un-scaling part and only keep 1 digit with fmod().
int digit = (int) fmod(trunc(x * 1000.0), 10);
digit = abs(digit);
In the end, I suspect this approach will not completely satisfy OP's unstated "use it as an int in some calculation.". There are many subtleties to FP math, especially when trying to use a binary FP, as are most double, in some sort of decimal way.
Perhaps the following will meet OP's goal, even though it does some rounding:
int third_digit = (int) lround(cos(number)*1000.0) % 10;
third_digit = abs(third_digit);
You can scale the value up, use trunc to truncate toward zero, then scale down:
float result = trunc(cos(number) * 1000) / 1000;
Note that due to the inexact nature of floating point numbers, the result won't be the exact value.
If you're looking to specifically extract the third decimal digit, you can do that as follows:
int digit = (int)(result * 1000) % 10;
This will scale the number up so that the digit in question is to the left of the decimal point, then extract that digit.
You can subtract from the number its remainder from division by 0.001:
result -= fmod(result, 0.001);
Update:
The question is updated with very conflicting requirements. If you have an exact 0.003 number, there will be an infinite number of zeroes after it; that is a mathematical property of numbers. OTOH, float representation cannot guarantee that every exact number of 3 decimal digits will be represented exactly. To solve this problem you will need to give up on using the float type and switch to some sort of fixed-point representation.
Overkill, using sprintf()
double /* or float */ val = 0.00385475337;
if (val < 0) exit(EXIT_FAILURE);
if (val >= 1) exit(EXIT_FAILURE);
char tmp[55];
sprintf(tmp, "%.50f", val);
int third_digit = tmp[4] - '0';
I have seen this code:
(int)(num < 0 ? (num - 0.5) : (num + 0.5))
(How to round floating point numbers to the nearest integer in C?)
for rounding but I need to use float and precision for three digits after the point.
Examples:
254.450 should be rounded up to 255.
254.432 should be rounded down to 254
254.448 should be rounded down to 254
and so on.
Notice: This is what I mean by "3 digits" the bold digits after the dot.
I believe it should be faster than roundf(), because I do many hundreds of thousands of rounding operations. Do you have some tips on how to do that? I tried to find the source of roundf but found nothing.
Note: I need it for RGB2HSV conversion function so I think 3 digits should be enough. I use positive numbers.
"it should be faster then roundf()" is only verifiable with profiling various approaches.
To round to 0 places (round to nearest whole number), use roundf()
float f;
float f_rounded3 = roundf(f);
To round to 3 places using float, use round()
The round functions round their argument to the nearest integer value in floating-point format, rounding halfway cases away from zero, regardless of the current rounding direction.
#include <math.h>
float f;
float f_rounded3 = round(f * 1000.0)/1000.0;
Code purposely uses the intermediate type of double; otherwise code could use the following, with reduced range:
float f_rounded3 = roundf(f * 1000.0f)/1000.0f;
If code is having trouble rounding 254.450 to 255.0 using roundf() or various tests, it is likely because the value is not 254.450, but a float close to it like 254.4499969, which rounds to 254. Typical FP uses a binary format, in which 254.450 is not exactly representable.
You can use a double transformation, float -> string -> float, where the first transformation keeps 3 digits after the point:
sprintf(tmpStr, "%.3f", num);
This works for me:
#include <stdio.h>
int main(int ac, char**av)
{
float val = 254.449f;
float val2 = 254.450f;
int res = (int)(val < 0 ? (val - 0.55f) : (val + 0.55f));
int res2 = (int)(val2 < 0 ? (val2 - 0.55f) : (val2 + 0.55f));
printf("%f %d %d\n", val, res, res2);
return 0;
}
output : 254.449005 254 255
To increase the precision, just add as many 5s as you want to 0.55f, like 0.555f, 0.5555f, etc.
I wanted something like this:
float num = 254.454300;
float precision=10;
float p = 10*precision;
num = (int)(num * p + 0.5) / p ;
But the result will be inaccurate (with error) - my x86 machine gives me this result: 254.449997
If you change the rounding border from b=0.5 to b=0.45, note that for positive values the rounded value is round_0(x,b)=(int)( x+(1-b) ); therefore b=0.45 ⟹ round_0(x)=(int)(x+0.55), and you can treat the sign separately. But remember that 254.45 does not exist exactly, only 254.449997 (float) and 254.449999999999989 (double), so you may prefer to use b=0.4495.
If you have float round_0(float) for zero-digit rounding (it can be like the one shown in the question), you can do one-, two-, ... n-digit rounding like this in C/C++: #define round_n(x,n) (round_0((x)*1e##n)/1e##n).
round_1( x , b ) = round_0( 10*x ,b)/10
round_2( x , b ) = round_0( 100*x ,b)/100
round_3( x , b ) = round_0( 1000*x ,b)/1000
round_n( x , b , n ) = round_0( (10^n)*x ,b)/(10^n)
But typecasting to int and then (one more typecast) back to float is slower than rounding with arithmetic operations. If the compiler does not simplify the add/sub pair away (some compilers have a setting for this), you can do a faster zero-digit round that stays in the floating-point type:
inline float round_0( float x , float b=0.5f ){
return (( x+(0.5f-b) )+(3<<22))-(3<<22) ; // or (( x+(0.5f-b) )-(3<<22))+(3<<22) ;
}
inline double round_0( double x , double b=0.5 ){
return (( x+(0.5-b) )+6755399441055744.0)-6755399441055744.0 ; // constant is 3*2^51; writing (3<<51) would overflow int
}
When b=0.5 it correctly rounds to the nearest integer if |x|<2^22 (float) or |x|<2^51 (double). But if the compiler keeps values in x87 FPU registers (ten-byte floating point) when optimizing loads, then the constant has to be 3.0*(1ull<<62), which works for |x|<2^62, and using long double can be faster.
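The snippets above use C++ default arguments. A plain C sketch of the same add/subtract trick for b=0.5 is shown below; it assumes IEEE 754 binary32, the default round-to-nearest mode and |x| < 2^22, and the name round_magic is made up. Note that ties go to the even integer here, not away from zero.

```c
/* Round x to the nearest integer by pushing its fraction bits off the end
   of the significand and then removing the offset again.
   Assumes IEEE 754 binary32, round-to-nearest, and |x| < 2^22. */
float round_magic(float x)
{
    const float magic = 12582912.0f;  /* 3 << 22, i.e. 1.5 * 2^23 */
    volatile float t = x + magic;     /* volatile keeps the compiler from
                                         simplifying the add/sub pair */
    return t - magic;
}
```

Under those assumptions round_magic(2.7f) is 3.0f and round_magic(2.5f) is 2.0f.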
I found Stevens Computing Services – K & R Exercise 2-1 a very thorough answer to K&R 2-1. This slice of the full code computes the maximum value of a float type in the C programming language.
Unluckily my theoretical comprehension of float values is quite limited. I know they are composed of significand (mantissa.. ) and a magnitude which is a power of 2.
#include <stdio.h>
#include <limits.h>
#include <float.h>
int main(void)
{
float flt_a, flt_b, flt_c, flt_m;
/* FLOAT */
printf("\nFLOAT MAX\n");
printf("<limits.h> %E ", FLT_MAX);
flt_a = 2.0;
flt_b = 1.0;
while (flt_a != flt_b) {
flt_m = flt_b; /* MAX POWER OF 2 IN MANTISSA */
flt_a = flt_b = flt_b * 2.0;
flt_a = flt_a + 1.0;
}
flt_m = flt_m + (flt_m - 1); /* MAX VALUE OF MANTISSA */
flt_a = flt_b = flt_c = flt_m;
while (flt_b == flt_c) {
flt_c = flt_a;
flt_a = flt_a * 2.0;
flt_b = flt_a / 2.0;
}
printf("COMPUTED %E\n", flt_c);
}
I understand that the latter part basically checks to which power of 2 it's possible to raise the significand with a three variable algorithm. What about the first part?
I can see that a progression of multiples of 2 should eventually determine the value of the significand, but I tried to trace a few small numbers to check how it should work and it failed to find the right values...
======================================================================
What are the concepts on which this program is based upon and does this program gets more precise as longer and non-integer numbers have to be found?
The first loop determines the number of bits contributing to the significand by finding the least power of 2 such that adding 1 to it (using floating-point arithmetic) fails to change its value. If that's the nth power of two, then the significand uses n bits, because with n bits you can express all the integers from 0 through 2^n - 1, but not 2^n. The floating-point representation of 2^n must therefore have an exponent large enough that the (binary) units digit is not significant.
By that same token, having found the first power of 2 whose float representation has worse than unit precision, the maximum float value that does have unit precision is one less. That value is recorded in variable flt_m.
The second loop then tests for the maximum exponent by starting with the maximum unit-precision value, and repeatedly doubling it (thereby increasing the exponent by 1) until it finds that the result cannot be converted back by halving it. The maximum float is the value before that final doubling.
Do note, by the way, that all the above supposes a base-2 floating-point representation. You are unlikely to run into anything different, but C does not actually require any specific representation.
With respect to the second part of your question,
does this program gets more precise as longer and non-integer numbers have to be found?
the program takes care to avoid losing precision. It does assume a binary floating-point representation such as you described, but it will work correctly regardless of the number of bits in the significand or exponent of such a representation. No non-integers are involved, but the program already deals with numbers that have worse than unit precision, and with numbers larger than can be represented with type int.
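The two loops can be restated as separate functions. This is a variation on the quoted program, not the original: it assumes a binary float format with infinities (as in IEEE 754), detects the end of the second loop with isinf() instead of halving, and uses volatile so intermediate results are rounded to float.

```c
#include <float.h>
#include <math.h>

/* First loop: count significand bits by finding the least power of 2
   for which adding 1 no longer changes the value. */
int significand_bits(void)
{
    volatile float b = 1.0f, b_plus_1 = 2.0f;
    int n = 0;
    while (b_plus_1 != b) {
        b = b * 2.0f;
        b_plus_1 = b + 1.0f;
        n++;
    }
    return n;
}

/* Second loop: start from the all-ones significand (2^n - 1) and keep
   doubling while the result is still finite. */
float computed_flt_max(void)
{
    int n = significand_bits();
    volatile float c = 1.0f;
    for (int i = 1; i < n; i++)
        c = c * 2.0f + 1.0f;          /* builds 2^n - 1 exactly */

    volatile float next = c * 2.0f;
    while (!isinf(next)) {
        c = next;
        next = c * 2.0f;
    }
    return c;
}
```

On such a platform significand_bits() should agree with FLT_MANT_DIG and computed_flt_max() with FLT_MAX.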
I'm new to C and when I run the code below, the value that is put out is 12098 instead of 12099.
I'm aware that working with decimals always involves a degree of inaccuracy, but is there a way to accurately move the decimal point to the right two places every time?
#include <stdio.h>
int main(void)
{
int i;
float f = 120.99;
i = f * 100;
printf("%d", i);
}
Use the round function
float f = 120.99;
int i = round( f * 100.0 );
Be aware however, that a float typically only has 6 or 7 digits of precision, so there's a maximum value where this will work. The smallest float value that won't convert properly is the number 131072.01. If you multiply by 100 and round, the result will be 13107202.
You can extend the range of your numbers by using double values, but even a double has limited range. (A double has 16 or 17 digits of precision.) For example, the following code will print 10000000000000098
double d = 100000000000000.99;
uint64_t j = round( d * 100.0 );
printf( "%llu\n", (unsigned long long) j );
That's just an example; finding the smallest number that exceeds the precision of a double is left as an exercise for the reader.
Use fixed-point arithmetic on integers:
#include <stdio.h>
#define abs(x) ((x)<0 ? -(x) : (x))
int main(void)
{
int d = 12099;
int i = d * 100;
printf("%d.%02d\n", d/100, abs(d)%100);
printf("%d.%02d\n", i/100, abs(i)%100);
}
Your problem is that floats are represented internally using IEEE-754, which is in base 2 and not in base 10. 0.25 has an exact representation, but 0.1 does not, nor does 120.99.
What really happens is that, due to floating-point inaccuracy, the IEEE-754 float closest to the decimal value 120.99, multiplied by 100, is slightly below 12099, so it is truncated to 12098. Your compiler should have warned you about the truncation from float to int (mine did).
A simple way to get what you expect is to add 0.5 to the float before the truncation to int:
i = (f * 100) + 0.5;
But beware: floating point is inherently inaccurate when processing decimal values.
Edit:
Of course, for negative numbers it should be i = (f * 100) - 0.5; ...
If you'd like to continue operating on the number as a floating-point number, then the answer is more or less no. There are various things you can do for small numbers, but as your numbers get larger, you'll have issues.
If you'd like to only print the number, then my recommendation would be to convert the number to a string, and then move the decimal point there. This can be slightly complicated depending on how you represent the number in the string (exponential and what not).
If you'd like this to work and you don't mind not using floating point, then I'd recommend researching any number of fixed decimal libraries.
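As an illustration of the string-based route, here is a minimal sketch that reads a plain decimal string and returns an integer count of hundredths, never going through floating point. The function name is hypothetical, exponential notation is not handled, digits past the second decimal are dropped, and there is no overflow or syntax checking.

```c
#include <ctype.h>

/* Hypothetical helper: parse "120.99" into 12099 (hundredths) using only
   integer arithmetic. Extra decimals are truncated, missing ones padded. */
long parse_hundredths(const char *s)
{
    int negative = 0;
    int decimals = 0;
    long value = 0;

    if (*s == '-') { negative = 1; s++; }
    else if (*s == '+') { s++; }

    while (isdigit((unsigned char) *s))          /* integer part */
        value = value * 10 + (*s++ - '0');

    if (*s == '.') {
        s++;
        while (decimals < 2 && isdigit((unsigned char) *s)) {
            value = value * 10 + (*s++ - '0');   /* up to two decimals */
            decimals++;
        }
    }
    while (decimals++ < 2)                       /* pad "120.9" or "7" */
        value *= 10;

    return negative ? -value : value;
}
```

The result can then be printed with the d/100 and d%100 split shown in the fixed-point answer above.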
You can use
float f = 120.99f;
or
double f = 120.99;
By default, C treats floating-point constants as double, so if you store one in a float variable an implicit conversion happens, which is undesirable ...
I think this works.