Trouble when computing modulos with floats in C

I am not an expert in programming, and I am facing the following issue.
I need to compute modulo between floats A and B.
So I use fmod((double)A, (double)B).
Theoretically, if A is a multiple of B, then the result is 0.0.
However, because of floating-point precision, A and B are not exactly the numbers I expected to have.
As a result, the modulo computation does not return 0.0 but something different, which is a problem.
Example:
A=99999.9, but the compiler interprets it as 99999.898.
B=99.9, but the compiler interprets it as 99.900002.
fmod(A,B) is expected to be 0.0, but actually gives 99.9.
So the question is: how do you usually handle this kind of situation?
Thank you

The trouble is that:
A is not 99999.9, but 99999.8984375 and
B is not 99.9, but 99.90000152587890625 and
A mod B is 99.89691162109375
OP is getting the correct answer for the arguments given.
To get the expected result, different arguments are needed.
A reasonable alternative is to scale the arguments by a power of 10, round to an integer, apply %, convert back to floating point and un-scale.
Overflow is a concern.
Since OP wants to treat numbers to the nearest 0.1, scale by 10.
#include <math.h>
#include <stdio.h>
int main(void) {
float A = 99999.9;
float B = 99.9;
printf("%.25f\n", A);
printf("%.25f\n", B);
printf("%.25f\n", fmod(A,B));
long long a = lround(A*10.0);
long long b = lround(B*10.0);
long long m = a%b;
double D = m/10.0;
printf("D = %.25f\n", D);
return 0;
}
Output
99999.8984375000000000000000000
99.9000015258789062500000000
99.8969116210937500000000000
D = 0.0000000000000000000000000
Alternative
long long a = lround(A*10.0);
long long b = lround(B*10.0);
long long m = a%b;
double D = m/10.0;
Scale, but skip the integer conversion part
double a = round(A*10.0);
double b = round(B*10.0);
double m = fmod(a,b);
double D = m/10.0;

Related

Return zero and Inf values in C

I use C to do computation using the following code:
#include <stdio.h>
#include <math.h>
int main(void) {
float x = 3.104924e-33;
int i = 6000, j = 1089;
float value, value_inv;
value = sqrt(x / ((float)i * j));
value_inv = 1. / value;
printf("value = %e\n", value);
printf("value_inv = %e\n", value_inv);
}
We can see, in fact, value = 2.18e-20. This does not exceed the boundary of float data type in C. But why the computer gives me
value = 0.000000e+00
value_inv = inf
Does anybody know why it happens and how to solve this problem without changing data type to double?
OP's float apparently does not support sub-normals. C allows non-support.
As for solving this without changing the data type to double: this may be an implementation detail or the result of a compiler option. Without changing to double, look to a different compiler or different options. Look at options concerning sub-normal support, the precision used for intermediate calculations, and optimization levels (which sometimes short-change edge cases like this).
On my machine, which does handle sub-normals, using C11: FLT_TRUE_MIN, the smallest non-zero float, is smaller than FLT_MIN, the smallest normal non-zero float.
#include<float.h>
float xx = x/((float)i*j);
printf("xx = %e %e %e\n",xx, FLT_MIN, FLT_TRUE_MIN);
Output
xx = 4.751943e-40 1.175494e-38 1.401298e-45
In OP's case, without sub-normal support, xx became 0.0f and led to the undesired output.
Using double math will handle the small intermediate float values.
value = sqrt(x/(1.0*i*j)); // Form product with `double` math
value_inv = 1.0f/value; // Here we can just use float math
printf("value = %e\n",value);
printf("value_inv = %e\n",value_inv);
Output
value = 2.179897e-20
value_inv = 4.587373e+19
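To check whether an intermediate value such as xx lands in the sub-normal range discussed above, the standard fpclassify macro from math.h can be used. The helper name float_class below is made up for illustration; what it reports for tiny values depends on whether the platform and compiler options keep sub-normals or flush them to zero.

```c
#include <math.h>

/* Hypothetical helper: names the floating-point class of a float,
   using the standard fpclassify macro. */
const char *float_class(float v)
{
    switch (fpclassify(v)) {
    case FP_ZERO:      return "zero";
    case FP_SUBNORMAL: return "subnormal";
    case FP_NORMAL:    return "normal";
    case FP_INFINITE:  return "infinite";
    case FP_NAN:       return "nan";
    default:           return "other";   /* implementation-defined class */
    }
}
```

Passing x / ((float)i * j) from the question would be expected to report "subnormal" where sub-normals are supported, and "zero" where they are not.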
On my computer (Ryzen 2700X, x86_64) the results are:
value = 2.179897e-020
value_inv = 4.587373e+019
You can try 1.f instead of 1., which is actually a double:
value_inv = 1.f/value;
Apparently your system does not support more digits for float. On my system the output is:
value = 2.179895e-020
value_inv = 4.587376e+019
I got the answer by myself.
I should change sqrt(x/((float)i*j)) to sqrt((double)x/((double)i*j)). After this, I can get correct result:
value = 2.179897e-20
value_inv = 4.587373e+19
There is no reason to use float instead of double for such computations:
3.104924e-33 is a double constant, it gets converted to float upon assignment, with a potential loss of precision
sqrt gets a double argument and returns a double value. Implicit conversions occur again with potential loss of precision.
1. / value computes with the type double because 1. has this type. value gets converted before the division and the result is converted to float to store to value_inv.
value and value_inv are implicitly converted to double when passed to printf.
All these conversions may incur loss of precision or even truncation to 0.0. You should instead always use double unless there is a strong requirement to use float:
#include <stdio.h>
#include <math.h>
int main() {
double x = 3.104924e-33;
int i = 6000, j = 1089;
double value, value_inv;
value = sqrt(x / ((double)i * j));
value_inv = 1. / value;
printf("value = %e\n", value);
printf("value_inv = %e\n", value_inv);
return 0;
}
If for some reason you are required to use float, be careful to avoid unneeded conversions:
#include <stdio.h>
#include <math.h>
int main() {
float x = 3.104924e-33F;
int i = 6000, j = 1089;
float value, value_inv;
value = sqrtf(x / ((float)i * j));
value_inv = 1.F / value;
printf("value = %e\n", value);
printf("value_inv = %e\n", value_inv);
return 0;
}

precision between float and double in C

I understand there are several topics same as mine, but I still don't really get it, so I'm expecting someone could explain this in a more simple but explicit way for me instead of pasting other topics' links, thanks.
Here's a sample code:
int a = 960;
int b = 16;
float c = a*0.001;
float d = a*0.001 + b;
double e = a*0.001 + b;
printf("%f\n%f\n%lf", c, d, e);
which outputs:
0.960000
16.959999
16.960000
My two questions are:
Why does adding an integer to a float end up as the second output, while changing float to double solves the problem, as in the third output?
Why does the third output have the same number of digits with the first and second output after the decimal point since it should be a more precise value?
Both print the same number of decimal places because 6 is the default precision. You can change that as in the edited example below, where the syntax is %.*f. The precision can be either a number written in the format string, as in the first and third calls, or a * whose value is supplied as another argument, as in the second call.
#include <stdio.h>
int main(void) {
int a = 960;
int b = 16;
float c = a*0.001;
float d = a*0.001 + b;
double e = a*0.001 + b;
printf("%.9f\n", c);
printf("%.*f\n", 9, d);
printf("%.16f\n", e);
}
Program output:
0.959999979
16.959999084
16.9600000000000009
The extra decimal places now show that none of the results is exact. One reason is that 0.001 cannot be exactly encoded as a floating-point value. There are other reasons too, which have been extensively covered.
One easy way to understand why, is that a float has about 2^32 different values that can be encoded, however there is an infinity of real numbers within the range of float, and only about 2^32 of them can be represented exactly. In the case of the fraction 1/1000, in binary it is a recurring value (as is the fraction 1/3 in decimal).
I think the calculation a*0.001 will be done in double precision in both cases, then some precision is lost when you store it as a float.
You can choose how many decimal digits are printed by printf by writing e.g. "%.10lf" (to get 10 digits) instead of just "%lf".
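The precision lost when a double result is stored in a float can be measured directly. The following is a minimal sketch; the helper name narrowing_error is made up for illustration.

```c
/* Hypothetical helper: the error introduced by storing v in a float. */
double narrowing_error(double v)
{
    float f = (float)v;      /* rounds v to a nearby float */
    return (double)f - v;    /* widening back to double is exact */
}
```

For a value like 0.5, which both types represent exactly, the error is 0. For 16.96 it is on the order of 1e-6, which is consistent with the 16.959999 printed for d in the question.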

How to make your result on divide number is not rounded?

Example (in C):
#include<stdio.h>
int main()
{
int a, b = 999;
float c = 0.0;
scanf("%d", &a);
c = (float)a/b;
printf("%.3lf...", c);
return 0;
}
If I put 998 it will come out 0.999, but I want the result be 0.998; how?
It looks like you want to truncate instead of round.
The mathematical result of 998/999 is 0.9989989989... Rounded to three decimal places, that is 0.999. So if you use %.3f to print it, that's what you're going to get.
When you convert a floating-point number to integer in C, the fractional part is truncated. So if you had the number 998.9989989 and you converted it to an int, you'd get 998. So you can get the result you want by multiplying by 1000, truncating to an int, and dividing by 1000 again:
c = c * 1000;
c = (int)c;
c = c / 1000;
Or you could shorten that to
c = (int)(c * 1000) / 1000.;
This will work fine for problems such as 998/999 ≈ 0.998, but you're close to the edge of where type float's limited precision will start introducing its own rounding issues. Using double would be a better choice. (Type float's limited precision almost always introduces issues.)

Concatenating longs and floats into a long

I have two numbers:
1234567890 <--- Long
and
0.123456 <--- Float
Is there any way to combine these to make a float(or double) in the following format:
(123)4567890.123456
I don't mind if the numbers in brackets have to be removed.
Given a long l and a float f, you can use:
double result = l % 10000000 + (double) f;
This will usually lose some accuracy in the fraction portion.
Update: From a comment, we learn that these values are a time represented as a number of seconds and a fraction of a second and that it is desired to calculate an interval. If we want to find the difference between two times, then we can calculate the difference with fewer problems from accuracy and precision this way:
double SubtractTimes(long l0, float f0, long l1, float f1)
{
long ld = l1 - l0;
double fd = (double) f1 - f0;
return ld + fd;
}
Note: If there is a concern that the time may have wrapped around some upper limit, then the code should test for this and make adjustments.
I must be missing something. Isn't it as easy as this?
long l = 1234567890;
float f = 0.123456;
float result = l + f;
Use this:
double result = l + f;
printf("%.6f",result);
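The difference between storing the sum in a float and in a double can be seen with the numbers from the question. This is a sketch with made-up function names; it assumes IEEE-754 single and double precision for float and double.

```c
/* Hypothetical helpers: the same sum, stored in a float and in a double. */
float sum_as_float(long l, float f)
{
    return l + f;            /* l is converted to float before the addition */
}

double sum_as_double(long l, float f)
{
    return (double)l + f;    /* the addition is carried out in double */
}
```

A float carries about 7 significant decimal digits, so sum_as_float(1234567890L, 0.123456f) comes back as 1234567936 with the fraction lost entirely. The double version keeps the fraction to roughly six decimal places.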

Why storing a double expression in a variable before a cast to int can lead to different results than casting it directly?

I write this short program to test the conversion from double to int:
int main() {
int a;
int d;
double b = 0.41;
/* Cast from variable. */
double c = b * 100.0;
a = (int)(c);
/* Cast expression directly. */
d = (int)(b * 100.0);
printf("c = %f \n", c);
printf("a = %d \n", a);
printf("d = %d \n", d);
return 0;
}
Output:
c = 41.000000
a = 41
d = 40
Why do a and d have different values even though they are both the product of b and 100?
The C standard allows a C implementation to compute floating-point operations with more precision than the nominal type. For example, the Intel 80-bit floating-point format may be used when the type in the source code is double, for the IEEE-754 64-bit format. In this case, the behavior can be completely explained by assuming the C implementation uses long double (80 bit) whenever it can and converts to double when the C standard requires it.
I conjecture what happens in this case is:
In double b = 0.41;, 0.41 is converted to double and stored in b. The conversion results in a value slightly less than .41.
In double c = b * 100.0;, b * 100.0 is evaluated in long double. This produces a value slightly less than 41.
That expression is used to initialize c. The C standard requires that it be converted to double at this point. Because the value is so close to 41, the conversion produces exactly 41. So c is 41.
a = (int)(c); produces 41, as normal.
In d = (int)(b * 100.0);, we have the same multiplication as before. The value is the same as before, something slightly less than 41. However, this value is not assigned to or used to initialize a double, so no conversion to double occurs. Instead, it is converted to int. Since the value is slightly less than 41, the conversion produces 40.
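Whether an implementation evaluates double expressions with extra precision, as conjectured above, can be queried through the standard FLT_EVAL_METHOD macro in float.h (C99 and later). The two function names below are made up for illustration, and the strings are informal summaries of the values defined by the C standard.

```c
#include <float.h>

/* Hypothetical helper: informal description of an FLT_EVAL_METHOD value. */
const char *eval_method_description(int method)
{
    switch (method) {
    case 0:  return "each operation is evaluated in its own type";
    case 1:  return "float and double operations are evaluated as double";
    case 2:  return "all operations are evaluated as long double";
    case -1: return "indeterminable";
    default: return "implementation-defined";
    }
}

/* Describes the evaluation method of the compiler building this file. */
const char *current_eval_method(void)
{
    return eval_method_description(FLT_EVAL_METHOD);
}
```

A value of 2 matches the behavior described in the steps above; a value of 0 would suggest looking for another explanation, such as the rounding mode.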
The compiler can infer that c has to be initialized with 0.41 * 100.0 and does that better than the calculation of d.
The crux of the problem is that 0.41 is not exactly representable in IEEE 754 64-bit binary floating point. The actual value (with only enough precision to show the relevant part) is 0.409999999999999975575..., while 100 can be represented exactly. Multiplying these together should yield 40.9999999999999975575..., which is again not quite representable. In the likely case that the rounding mode is towards nearest, zero, or negative infinity, this should be rounded to 40.9999999999999964.... When cast to an int, this is truncated to 40.
The compiler is allowed to do calculations with higher precision, however, and in particular may replace the multiplication in the assignment of c with a direct store of the computed value.
Edit: I miscalculated the largest representable number less than 41, the correct value is approximately 40.99999999999999289.... As both Eric Postpischil and Daniel Fischer correctly point out, even the value calculated as a double should be rounded to 41 unless the rounding mode is towards zero or negative infinity. Do you know what the rounding mode is? It makes a difference, as this code sample shows:
#include <stdio.h>
#include <fenv.h>
#pragma STDC FENV_ACCESS ON
int main(void)
{
int roundMode = fegetround( );
volatile double d1;
volatile double d2;
volatile double result;
volatile int rounded;
fesetround(FE_TONEAREST);
d1 = 0.41;
d2 = 100;
result = d1 * d2;
rounded = result;
printf("nearest rounded=%i\n", rounded);
fesetround(FE_TOWARDZERO);
d1 = 0.41;
d2 = 100;
result = d1 * d2;
rounded = result;
printf("zero rounded=%i\n", rounded);
fesetround(roundMode);
return 0;
}
Output:
nearest rounded=41
zero rounded=40
