Fast rounding float on three digits precision in C

I have seen this code:
(int)(num < 0 ? (num - 0.5) : (num + 0.5))
(How to round floating point numbers to the nearest integer in C?)
for rounding but I need to use float and precision for three digits after the point.
Examples:
254.450 should be rounded up to 255.
254.432 should be rounded down to 254
254.448 should be rounded down to 254
and so on.
Note: by "3 digits" I mean the three digits after the decimal point.
I believe it should be faster than roundf(), because I perform many hundreds of thousands of rounding operations. Do you have any tips on how to do that? I tried to find the source of roundf but found nothing.
Note: I need it for RGB2HSV conversion function so I think 3 digits should be enough. I use positive numbers.

"it should be faster then roundf()" is only verifiable with profiling various approaches.
To round to 0 places (round to nearest whole number), use roundf()
float f;
float f_rounded0 = roundf(f);
To round to 3 places using float, use round()
The round functions round their argument to the nearest integer value in floating-point format, rounding halfway cases away from zero, regardless of the current rounding direction.
#include <math.h>
float f;
float f_rounded3 = round(f * 1000.0)/1000.0;
The code purposely uses double as the intermediate type; otherwise, with reduced range, code could use:
float f_rounded3 = roundf(f * 1000.0f)/1000.0f;
If code is having trouble rounding 254.450 to 255.0 using roundf() or various tests, it is likely because the value is not 254.450, but a float close to it like 254.4499969, which rounds to 254. Typical FP uses a binary format, and 254.450 is not exactly representable.

You can use a double conversion, float -> string -> float, where the first conversion keeps 3 digits after the point:
sprintf(tmpStr, "%.3f", num);
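A sketch of the full round trip, assuming snprintf and strtof from the standard library. The function name round3_via_string and the buffer size are illustrative choices. Formatting and parsing are likely much slower than arithmetic rounding, so this probably does not meet the speed goal in the question.

```c
#include <stdio.h>
#include <stdlib.h>

/* Round to three decimals by formatting and re-parsing.
   Returns the float nearest to the three-digit decimal string. */
float round3_via_string(float num)
{
    char tmp[64];   /* enough for any finite float printed with %.3f */
    snprintf(tmp, sizeof tmp, "%.3f", num);
    return strtof(tmp, NULL);
}
```

For example, round3_via_string(254.4567f) returns the float closest to 254.457.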

This works for me:
#include <stdio.h>
int main(int ac, char**av)
{
float val = 254.449f;
float val2 = 254.450f;
int res = (int)(val < 0 ? (val - 0.55f) : (val + 0.55f));
int res2 = (int)(val2 < 0 ? (val2 - 0.55f) : (val2 + 0.55f));
printf("%f %d %d\n", val, res, res2);
return 0;
}
output : 254.449005 254 255
To increase the precision, just append more 5s to 0.55f, e.g. 0.555f, 0.5555f, etc.

I wanted something like this:
float num = 254.454300;
float precision=10;
float p = 10*precision;
num = (int)(num * p + 0.5) / p ;
But the result will be inaccurate (with error) - my x86 machine gives me this result: 254.449997

If you move the rounding border from b=0.5 to b=0.45, note that for positive values the rounded value is round_0(x,b)=(int)( x+(1-b) ); therefore b=0.45 ⟹ round_0(x)=(int)(x+0.55), and you can treat the sign separately. But remember that 254.45 does not exist as a binary value, only 254.449997 (float) and 254.449999999999989 (double), so you may prefer to use b=0.4495.
If you have a function float round_0(float) for zero-digit rounding (it can be like the one shown in the question), you can build one-, two-, ... n-digit rounding from it like this in C/C++: #define round_n(x,n) (round_0((x)*1e##n)/1e##n).
round_1( x , b ) = round_0( 10*x ,b)/10
round_2( x , b ) = round_0( 100*x ,b)/100
round_3( x , b ) = round_0( 1000*x ,b)/1000
round_n( x , b , n ) = round_0( (10^n)*x ,b)/(10^n)
But casting to int and then (one more cast) back to float is slower than rounding with arithmetic operations alone. If the compiler does not simplify the add/sub pair (some compilers have a setting for this), you can do a faster zero-digit round that stays in the floating-point type:
inline float round_0( float x , float b=0.5f ){
return (( x+(0.5f-b) )+(3<<22))-(3<<22) ; // or (( x+(0.5f-b) )-(3<<22))+(3<<22) ;
}
inline double round_0( double x , double b=0.5 ){
return (( x+(0.5-b) )+(3LL<<51))-(3LL<<51) ; // or (( x+(0.5-b) )-(3LL<<51))+(3LL<<51) ; 3LL avoids overflowing a 32-bit int
}
When b=0.5, it correctly rounds to the nearest integer if |x|<=2^23 (float) or |x|<=2^52 (double). But if the compiler uses the FPU (ten-byte floating point) when optimizing loads, then the constant is 3.0*(1u<<63), it works for |x|<=2^64, and using long double can be faster.
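A plain C sketch of the add-and-subtract trick for float with b=0.5. It assumes IEEE 754 binary32, the default round-to-nearest-even mode, float expressions evaluated in float precision, and inputs well inside the range where the trick is valid. The name round_magic is hypothetical, and volatile is used here only to keep the compiler from folding the two operations away.

```c
/* Round to the nearest integer by adding and subtracting 1.5 * 2^23.
   At that magnitude a float has no fraction bits, so the addition
   itself performs the rounding. Assumes |x| < 2^22. */
float round_magic(float x)
{
    const float magic = 12582912.0f;   /* 3 << 22 == 1.5 * 2^23 */
    volatile float t = x + magic;      /* fraction is rounded off here */
    return t - magic;
}
```

Note that halfway cases go to the even neighbour (2.5 becomes 2, 3.5 becomes 4), which differs from roundf(), which rounds them away from zero.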

Related

Round 37.1-28.75 float calculation correctly to 8.4 instead of 8.3

I have problem with floating point rounding. I want to calculate floating point numbers and round them to (given) N decimals. In this example I want to round to 1 decimal places.
Calculation 37.1-28.75 will result into floating point 8.349998 (instead of 8.35), which will result printf rounding to 8.3 instead of 8.4 for 1 decimal places.
The actual result in math is 37.10-28.75=8.35000000, but due to floating point imprecision it is converted into 8.349998, which is then converted into 8.3 instead of 8.4 when using 1 decimal place rounding.
Minimum reproducible example:
float a = 37.10;
float b = 28.75;
//a-b = 8.35 = 8.4
printf("%.1f\n", a - b); //outputs 8.3 instead of 8.4
Is it valid to add following to the result:
float result = a - b;
if (result > 0.0f)
{
result += powf(10, -nr_of_decimals - 1) / 2;
}
else
{
result -= powf(10, -nr_of_decimals - 1) / 2;
}
EDIT: corrected that I want 1 decimal place rounded output, not 2 decimal places
EDIT2: negative results are needed as well (28.75-37.1 = -8.4)
On my system I do actually get 8.35. It's possible that you have to set the rounding direction to "nearest" first, try this (compile with e.g. gcc ... -lm):
#include <fenv.h>
#include <stdio.h>
int main()
{
float a = 37.10;
float b = 28.75;
float res = a - b;
fesetround(FE_TONEAREST);
printf("%.2f\n", res);
}
Binary floating point is, after all, binary, and if you do care about the correct decimal rounding this much, then your choices would be:
decimal floating point, or
fixed point.
I'd say the solution is to use fixed point, especially if you're on embedded, and forget about everything else.
With
int32_t a = 3710;
int32_t b = 2875;
the result of
a - b
will exactly be
835
every time; and then you just need to have a simple fixed point printing routine for the desired precision, and check the following digit after the last digit to see if it needs to be rounded up.
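A small sketch of such a routine, assuming values are stored as int32_t hundredths and printed with one decimal. The function names are invented for illustration, and the code does not guard against overflow near INT32_MIN or INT32_MAX.

```c
#include <stdint.h>
#include <stdio.h>

/* Round a value stored in hundredths to tenths,
   halfway cases away from zero. */
int32_t hundredths_to_tenths(int32_t h)
{
    return (h >= 0) ? (h + 5) / 10 : (h - 5) / 10;
}

/* Write a value stored in tenths as "[-]I.F" into buf. */
void format_tenths(char *buf, size_t size, int32_t t)
{
    int32_t mag = (t < 0) ? -t : t;
    snprintf(buf, size, "%s%ld.%ld",
             (t < 0) ? "-" : "", (long)(mag / 10), (long)(mag % 10));
}
```

With a = 3710 and b = 2875, hundredths_to_tenths(a - b) is 84, which format_tenths writes as 8.4; the reversed subtraction gives -84 and -8.4.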
If you want to round to 2 decimals, you can add 0.005 to the result and then offset it with floorf:
float f = 37.10f - 28.75f;
float r = floorf((f + 0.005f) * 100.f) / 100.f;
printf("%f\n", r);
The output is 8.350000
Why are you using floats instead of doubles?
Regarding your question:
Is it valid to add following to the result:
float result = a - b;
if (result > 0.0f)
{
result += powf(10, -nr_of_decimals - 1) / 2;
}
else
{
result -= powf(10, -nr_of_decimals - 1) / 2;
}
It doesn't seem so, on my computer I get 8.350498 instead of 8.350000.
After your edit:
Calculation 37.1-28.75 will result into floating point 8.349998, which will result printf rounding to 8.3 instead of 8.4.
Then
float r = roundf((f + (f < 0.f ? -0.05f : +0.05f)) * 10.f) / 10.f;
is what you are looking for.

Save float result number to third digit, no rounding in C

How can I cut off a result after the third digit after the decimal point?
float result = cos(number);
Note that I want to save the result up to the third digit, no rounding. And no, I don't want to print it with .3f, I need to save it as new value;
Example:
0.00367 -> 0.003
N.B. No extra zeroes after 3 are wanted.
Also, I need to be able to get the 3rd digit. For example if it is 0.0037212, I want to get the 3 and use it as an int in some calculation.
0.00367 -> 0.003
A float can typically represent about 2^32 different values exactly. 0.00367 and 0.003 are not in that set.
The closest float to 0.00367 is 0.0036700000055134296417236328125
The closest float to 0.003__ is 0.0030000000260770320892333984375
I want to save the result up to the third digit
This goal needs a compromise. Save the result to a float near a multiple of 0.001.
Scaling by 1000.0, truncating and dividing by 1000.0 will work for most values.
float y1 = truncf(x * 1000.0f) / 1000.0f;
The above gives a slightly wrong answer with some values near x.xxx000... and x.xxx999.... Using higher precision can solve that.
float y2 = (float) (trunc(x * 1000.0) / 1000.0);
I want to get the 3 and use it as an int in some calculation.
Skip the un-scaling part and only keep 1 digit with fmod().
int digit = (int) fmod(trunc(x * 1000.0), 10);
digit = abs(digit);
In the end, I suspect this approach will not completely satisfy OP's unstated "use it as an int in some calculation.". There are many subtleties to FP math, especially when trying to use a binary FP, as are most double, in some sort of decimal way.
Perhaps the following will meet OP's goal, even though it does some rounding:
int third_digit = (int) lround(cos(number)*1000.0) % 10;
third_digit = abs(third_digit);
You can scale the value up, use trunc to truncate toward zero, then scale down:
float result = trunc(cos(number) * 1000) / 1000;
Note that due to the inexact nature of floating point numbers, the result won't be the exact value.
If you're looking to specifically extract the third decimal digit, you can do that as follows:
int digit = (int)(result * 1000) % 10;
This will scale the number up so that the digit in question is to the left of the decimal point, then extract that digit.
You can subtract from the number its remainder from division by 0.001:
result -= fmod(result, 0.001);
Update:
The question is updated with very conflicting requirements. If you have an exact 0.003 number, there will be an infinite number of zeroes after it; that is a mathematical property of numbers. OTOH, float representation cannot guarantee that every exact number of 3 decimal digits will be represented exactly. To solve this problem you will need to give up on using the float type and switch to some sort of fixed point representation.
Overkill, using sprintf()
double /* or float */ val = 0.00385475337;
if (val < 0) exit(EXIT_FAILURE);
if (val >= 1) exit(EXIT_FAILURE);
char tmp[55];
sprintf(tmp, "%.50f", val);
int third_digit = tmp[4] - '0';

Moving decimal place to right in c

I'm new to C and when I run the code below, the value that is put out is 12098 instead of 12099.
I'm aware that working with decimals always involves a degree of inaccuracy, but is there a way to accurately move the decimal point to the right two places every time?
#include <stdio.h>
int main(void)
{
int i;
float f = 120.99;
i = f * 100;
printf("%d", i);
}
Use the round function
float f = 120.99;
int i = round( f * 100.0 );
Be aware however, that a float typically only has 6 or 7 digits of precision, so there's a maximum value where this will work. The smallest float value that won't convert properly is the number 131072.01. If you multiply by 100 and round, the result will be 13107202.
You can extend the range of your numbers by using double values, but even a double has limited range. (A double has 16 or 17 digits of precision.) For example, the following code will print 10000000000000098
double d = 100000000000000.99;
uint64_t j = round( d * 100.0 );
printf( "%llu\n", j );
That's just an example; finding the smallest number that exceeds the precision of a double is left as an exercise for the reader.
Use fixed-point arithmetic on integers:
#include <stdio.h>
#define abs(x) ((x)<0 ? -(x) : (x))
int main(void)
{
int d = 12099;
int i = d * 100;
printf("%d.%02d\n", d/100, abs(d)%100);
printf("%d.%02d\n", i/100, abs(i)%100);
}
Your problem is that floats are represented internally using IEEE-754, which is in base 2 and not in base 10. 0.25 has an exact representation, but 0.1 does not, nor does 120.99.
What really happens is that, due to floating-point inaccuracy, the IEEE-754 float closest to the decimal value 120.99, multiplied by 100, is slightly below 12099, so it is truncated to 12098. Your compiler should have warned you that you had a truncation from float to int (mine did).
The only foolproof way to get what you expect is to add 0.5 to the float before the truncation to int :
i = (f * 100) + 0.5;
But beware: floating point is inherently inaccurate when processing decimal values.
Edit :
Of course for negative numbers, it should be i = (f * 100) - 0.5 ...
If you'd like to continue operating on the number as a floating point number, then the answer is more or less no. There's various things you can do for small numbers, but as your numbers get larger, you'll have issues.
If you'd like to only print the number, then my recommendation would be to convert the number to a string, and then move the decimal point there. This can be slightly complicated depending on how you represent the number in the string (exponential and what not).
If you'd like this to work and you don't mind not using floating point, then I'd recommend researching any number of fixed decimal libraries.
You can use
float f = 120.99f
or
double f = 120.99
By default, C treats floating-point constants as double, so if you store one in a float variable an implicit conversion happens, which is bad ...
I think this works.

Round positive value half-up to 2 decimal places in C

Typically, rounding to 2 decimal places is very easy with
printf("%.2lf",<variable>);
However, the rounding system will usually round to the nearest even. For example,
2.554 -> 2.55
2.555 -> 2.56
2.565 -> 2.56
2.566 -> 2.57
And what I want to achieve is that
2.555 -> 2.56
2.565 -> 2.57
In fact, rounding half-up is doable in C, but for Integer only;
int a = (int)(b+0.5)
So, I'm asking for how to do the same thing as above with 2 decimal places on positive values instead of Integer to achieve what I said earlier for printing.
It is not clear whether you actually want to "round half-up", or rather "round half away from zero", which requires different treatment for negative values.
Single precision binary float is precise to at least 6 decimal digits, and double to at least 15, so nudging a FP value by DBL_EPSILON (defined in float.h) will cause a round-up to the next 100th by printf( "%.2lf", x ) for n.nn5 values, without affecting the displayed value for values that are not n.nn5.
double x2 = x * (1 + DBL_EPSILON) ; // round half-away from zero
printf( "%.2lf", x2 ) ;
For different rounding behaviours:
double x2 = x * (1 - DBL_EPSILON) ; // round half-toward zero
double x2 = x + DBL_EPSILON ; // round half-up
double x2 = x - DBL_EPSILON ; // round half-down
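A sketch of the first variant wrapped in a helper, assuming IEEE 754 binary64 double and a printf family that rounds the stored value correctly. The function name format_half_away is hypothetical.

```c
#include <float.h>
#include <stdio.h>

/* Format x with two decimals after scaling it by (1 + DBL_EPSILON),
   so a value stored just below n.nn5 is pushed just above it and
   prints as if the tie had been rounded away from zero. */
void format_half_away(char *buf, size_t size, double x)
{
    double nudged = x * (1 + DBL_EPSILON);
    snprintf(buf, size, "%.2f", nudged);
}
```

For 2.565, which is stored slightly below 2.565, this writes 2.57; a value such as 2.554 is unaffected and still prints as 2.55.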
Following is precise code to round a double to the nearest 0.01 double.
The code functions like x = round(100.0*x)/100.0; except it uses manipulations to ensure scaling by 100.0 is done exactly, without precision loss.
Likely this is more code than OP is interested, but it does work.
It works for the entire double range -DBL_MAX to DBL_MAX. (still should do more unit testing).
It depends on FLT_RADIX == 2, which is common.
#include <float.h>
#include <math.h>
#include <stdio.h>
void r100_best(const char *s) {
double x;
sscanf(s, "%lf", &x);
// Break x into whole number and fractional parts.
// Code only needs to round the fractional part.
// This preserves the entire `double` range.
double xi, xf;
xf = modf(x, &xi);
// Multiply the fractional part by N (256).
// Break into whole and fractional parts.
// This provides the needed extended precision.
// N should be >= 100 and a power of 2.
// The multiplication by a power of 2 will not introduce any rounding.
double xfi, xff;
xff = modf(xf * 256, &xfi);
// Multiply both parts by 100.
// *100 incurs 7 more bits of precision, of which the preceding code
// ensures the 8 LSbits of xfi, xff are zero.
int xfi100, xff100;
xfi100 = (int) (xfi * 100.0);
xff100 = (int) (xff * 100.0); // Cast here will truncate (towards 0)
// sum the 2 parts.
// sum is the exact truncate-toward-0 version of xf*256*100
int sum = xfi100 + xff100;
// add in half N
if (sum < 0)
sum -= 128;
else
sum += 128;
xf = sum / 256;
xf /= 100;
double y = xi + xf;
printf("%6s %25.22f ", "x", x);
printf("%6s %25.22f %.2f\n", "y", y, y);
}
int main(void) {
r100_best("1.105");
r100_best("1.115");
r100_best("1.125");
r100_best("1.135");
r100_best("1.145");
r100_best("1.155");
r100_best("1.165");
return 0;
}
[Edit] OP clarified that only the printed value needs rounding to 2 decimal places.
OP's framing of the rounding of "half-way" numbers as "round to even" or "round away from zero" is misleading. Of 100 "half-way" numbers like 0.005, 0.015, 0.025, ... 0.995, only 4 are typically exactly "half-way": 0.125, 0.375, 0.625, 0.875. This is because floating-point number formats use base-2, and numbers like 2.565 cannot be exactly represented.
Instead, sample numbers like 2.565 have as the closest double value of 2.564999999999999947... assuming binary64. Rounding that number to nearest 0.01 should be 2.56 rather than 2.57 as desired by OP.
Thus only numbers ending with 0.125 and 0.625 are exactly half-way and round down rather than up as desired by OP. Suggest accepting that and using:
printf("%.2lf",variable); // This should be sufficient
To get close to OP's goal, numbers could be A) tested against ending with 0.125 or 0.625 or B) increased slightly. The smallest increase would be
#include <math.h>
printf("%.2f", nextafter(x, 2*x));
Another nudge method is found with @Clifford.
[Former answer that rounds a double to the nearest double multiple of 0.01]
Typical floating-point uses formats like binary64 which employs base-2. "Rounding to nearest mathematical 0.01 and ties away from 0.0" is challenging.
As @Pascal Cuoq mentions, floating point numbers like 2.555 typically are only near 2.555 and have a more precise value like 2.555000000000000159872... which is not half way.
@BLUEPIXY solution below is best and practical.
x = round(100.0*x)/100.0;
"The round functions round their argument to the nearest integer value in floating-point
format, rounding halfway cases away from zero, regardless of the current rounding direction." C11dr §7.12.9.6.
The ((int)(100 * (x + 0.005)) / 100.0) approach has 2 problems: it may round in the wrong direction for negative numbers (OP did not specify), and integers typically have a much smaller range (INT_MIN to INT_MAX) than double.
There are still some cases, such as double x = atof("1.115");, which end up near 1.12 when the result really should be 1.11, because 1.115 as a double is really closer to 1.11 and not "half-way".
string x rounded x
1.115 1.1149999999999999911182e+00 1.1200000000000001065814e+00
OP has not specified rounding of negative numbers, assuming y = -f(-x).

Implementing single-precision division as double-precision multiplication

Question
For a C99 compiler implementing exact IEEE 754 arithmetic, do values of f, divisor of type float exist such that f / divisor != (float)(f * (1.0 / divisor))?
EDIT: By “implementing exact IEEE 754 arithmetic” I mean a compiler that rightfully defines FLT_EVAL_METHOD as 0.
Context
A C compiler that provides IEEE 754-compliant floating-point can only replace a single-precision division by a constant by a single-precision multiplication by the inverse if said inverse is itself representable exactly as a float.
In practice, this only happens for powers of two. So a programmer, Alex, may be confident that f / 2.0f will be compiled as if it had been f * 0.5f, but if it is acceptable for Alex to multiply by 0.10f instead of dividing by 10, Alex should express it by writing the multiplication in the program, or by using a compiler option such as GCC's -ffast-math.
This question is about transforming a single-precision division into a double-precision multiplication. Does it always produce the correctly rounded result? Is there a chance that it could be cheaper, and thus be an optimization that compilers might make (even without -ffast-math)?
I have compared (float)(f * 0.10) and f / 10.0f for all single-precision values of f between 1 and 2, without finding any counter-example. This should cover all divisions of normal floats producing a normal result.
Then I generalized the test to all divisors with the program below:
#include <float.h>
#include <math.h>
#include <stdio.h>
int main(void){
for (float divisor = 1.0; divisor != 2.0; divisor = nextafterf(divisor, 2.0))
{
double factor = 1.0 / divisor; // double-precision inverse
for (float f = 1.0; f != 2.0; f = nextafterf(f, 2.0))
{
float cr = f / divisor;
float opt = f * factor; // double-precision multiplication
if (cr != opt)
printf("For divisor=%a, f=%a, f/divisor=%a but (float)(f*factor)=%a\n",
divisor, f, cr, opt);
}
}
}
The search space is just large enough to make this interesting (2^46). The program is currently running. Can someone tell me whether it will print something, perhaps with an explanation why or why not, before it has finished?
Your program won't print anything, assuming round-ties-to-even rounding mode. The essence of the argument is as follows:
We're assuming that both f and divisor are between 1.0 and 2.0. So f = a / 2^23 and divisor = b / 2^23 for some integers a and b in the range [2^23, 2^24). The case divisor = 1.0 isn't interesting, so we can further assume that b > 2^23.
The only way that (float)(f * (1.0 / divisor)) could give the wrong result would be for the exact value f / divisor to be so close to a halfway case (i.e., a number exactly halfway between two single-precision floats) that the accumulated errors in the expression f * (1.0 / divisor) push us to the other side of that halfway case from the true value.
But that can't happen. For simplicity, let's first assume that f >= divisor, so that the exact quotient is in [1.0, 2.0). Now any halfway case for single precision in the interval [1.0, 2.0) has the form c / 2^24 for some odd integer c with 2^24 < c < 2^25. The exact value of f / divisor is a / b, so the absolute value of the difference f / divisor - c / 2^24 is bounded below by 1 / (2^24 b), so is at least 1 / 2^48 (since b < 2^24). So we're more than 16 double-precision ulps away from any halfway case, and it should be easy to show that the error in the double precision computation can never exceed 16 ulps. (I haven't done the arithmetic, but I'd guess it's easy to show an upper bound of 3 ulps on the error.)
So f / divisor can't be close enough to a halfway case to create problems. Note that f / divisor can't be an exact halfway case, either: since c is odd, c and 2^24 are relatively prime, so the only way we could have c / 2^24 = a / b is if b is a multiple of 2^24. But b is in the range (2^23, 2^24), so that's not possible.
The case where f < divisor is similar: the halfway cases then have the form c / 2^25 and the analogous argument shows that abs(f / divisor - c / 2^25) is greater than 1 / 2^49, which again gives us a margin of 16 double-precision ulps to play with.
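The claim can be spot-checked without running the full 2^46 search. The sketch below samples pseudo-random pairs in [1, 2) and counts disagreements. It assumes IEEE 754 arithmetic, round-to-nearest-even, and FLT_EVAL_METHOD equal to 0; the generator constants and function names are arbitrary choices for illustration, and a sample is of course not a proof.

```c
#include <stdint.h>

/* Build a float in [1, 2) from a 23-bit mantissa value (exact). */
static float from_mantissa(uint32_t m)
{
    return 1.0f + (float)(m & 0x7FFFFFu) / 8388608.0f;   /* m / 2^23 */
}

/* Compare f / divisor with (float)(f * (1.0 / divisor)) on a
   pseudo-random sample of pairs and count disagreements. */
int count_double_mul_mismatches(int samples)
{
    uint32_t state = 12345u;   /* arbitrary seed for a simple LCG */
    int mismatches = 0;
    for (int i = 0; i < samples; i++) {
        state = state * 1664525u + 1013904223u;
        float f = from_mantissa(state >> 9);
        state = state * 1664525u + 1013904223u;
        float divisor = from_mantissa(state >> 9);
        double factor = 1.0 / divisor;        /* double-precision inverse */
        float cr = f / divisor;               /* correctly rounded quotient */
        float opt = (float)(f * factor);      /* double multiply, then round */
        if (cr != opt)
            mismatches++;
    }
    return mismatches;
}
```

In line with the argument above, the count should be zero for any sample size when both operands are normal floats in [1, 2).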
It's certainly not possible if non-default rounding modes are possible. For example, in replacing 3.0f / 3.0f with 3.0f * C, a value of C less than the exact reciprocal would yield the wrong result in downward or toward-zero rounding modes, whereas a value of C greater than the exact reciprocal would yield the wrong result for upward rounding mode.
It's less clear to me whether what you're looking for is possible if you restrict to default rounding mode. I'll think about it and revise this answer if I come up with anything.
Random search resulted in an example.
Looks like when the result is a "denormal/subnormal" number, the inequality is possible. But then, maybe my platform is not IEEE 754 compliant?
f 0x1.7cbff8p-25
divisor -0x1.839p+116
q -0x1.f8p-142
q2 -0x1.f6p-142
int MyIsFinite(float f) {
union {
float f;
unsigned char uc[sizeof (float)];
unsigned long ul;
} x;
x.f = f;
return (x.ul & 0x7F800000L) != 0x7F800000L;
}
float floatRandom() {
union {
float f;
unsigned char uc[sizeof (float)];
} x;
do {
size_t i;
for (i=0; i<sizeof(x.uc); i++) x.uc[i] = rand();
} while (!MyIsFinite(x.f));
return x.f;
}
void testPC() {
for (;;) {
volatile float f, divisor, q, qd;
do {
f = floatRandom();
divisor = floatRandom();
q = f / divisor;
} while (!MyIsFinite(q));
qd = (float) (f * (1.0 / divisor));
if (qd != q) {
printf("%a %a %a %a\n", f, divisor, q, qd);
return;
}
}
}
Eclipse PC Version: Juno Service Release 2
Build id: 20130225-0426
