I am quite new to C. While writing a small game demo, I ran into a really strange problem.
void testC()
{
float a = 825300160;
float b = a + 0.1;
assert(a != b);
}
The assert above fails, which seems very strange.
My environment is Mac OS X ML with gcc 4.2.1.
The fraction field of a float has 23 bits (24 significant bits counting the implicit leading 1). You need 30 bits to represent 825300160, so the low-order bits of any result this large are dropped; representable values near this number are 64 apart. Adding 0.1 makes no difference: you need to add roughly 32 for the number to change:
float a = 825300160;
float b = a + 31.5;
assert(a != b); // No change is detected
float c = a + 32;
assert(a != c); // Change is detected
There's not enough precision in the float type. If you really need to distinguish a 0.1 addition to a number as large as 825300160, use double.
As this site shows, both a and b would be represented as
0 10011100 10001001100010001010011
in the IEEE standard for floats, where the first bit is the sign, the next 8 are the exponent, and the remaining 23 the mantissa. There's just not enough space in those 23 bits to represent the difference, because the exponent is so large.
The same operations seem to work differently for larger and smaller values (I think the code below explains the question better than I could in words) I have calculated max and max3 in the same way except the values are different. Similarly I have calculated max2 and max4 the exact same way with different values. Yet the answer I'm getting is very different?:
#include <stdio.h>
#include <math.h>
int main(void)
{
// 86997171 / 48 = 1812441.0625
int max = ceil((float) 86997171 / 48);
float max2 = ((float) 86997171)/ 48;
printf("max = %i, max2 = %f\n", max, max2);
int max3 = ceil((float) 3 / 2);
float max4 = ((float) 3) / 2;
printf("ma3 = %i, max4 = %f\n", max3, max4);
}
Output:
max = 1812441, max2 = 1812441.000000
ma3 = 2, max4 = 1.500000
I was expecting max = 1812442, max2 = 1812441.062500 to be the output, since that's what it should be in principle. Now I don't know what to do
float division in C for large numbers
This issue has nothing to do with division. The rounding error occurs in the initial conversion to float.
In the format most commonly used for float, IEEE-754 binary32, the two representable numbers closest to 86,997,171 are 86,997,168 and 86,997,176. (These are 10,874,646•2^3 and 10,874,647•2^3. 10,874,646 and 10,874,647 are 24-bit numbers (it takes 24 digits in binary to represent them), and 24 bits is all the binary32 format has for representing the fraction portion of a floating-point number.)
Of those two, 86,997,168 is closer. So, in (float) 86997171, 86,997,171 is converted to 86,997,168.
Then 86,997,168 / 48 is 1,812,441. So (float) 86997171 / 48 is 1,812,441, and so is ceil((float) 86997171 / 48). So max and max2 are both set to 1,812,441.
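You can watch the conversion happen without any division involved. A small sketch, assuming float is IEEE-754 binary32; through_float is a hypothetical helper name, not something from the question:

```c
/* Converts n to float and back, revealing the value the float really holds. */
int through_float(int n)
{
    volatile float f = (float)n;  /* volatile forces the value to be stored as a float */
    return (int)f;                /* safe here: the float holds a whole number in int range */
}
```

With that assumption, through_float(86997171) returns 86997168, while a small value such as 3 comes back unchanged.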
In C, float is a single-precision floating-point type. It is usually 4 bytes (on most compilers), so its precision is around 6-9 significant digits, typically 7.
Your number in question, 1812441.0625, has 11 digits, which don't fit in a float.
You should use double instead, which in C is a double-precision floating-point type. It is usually 8 bytes (on most compilers), so its precision is around 15-17 significant digits, typically 16, and it can therefore keep the precision of your number.
In fact, using double in this case gives:
max = 1812442, max2 = 1812441.062500
ma3 = 2, max4 = 1.500000
which is what you need.
Note that the precision of these types is explained here. The digit counts above are a simplification (as explained by the link), but they give a good perspective on your question.
I can't understand why the code below doesn't work properly with the value 4.2. Using a debugger, I learned that 4.2 isn't actually the number four point two; as a floating-point value, 4.2 becomes 4.19999981.
To make up for this, I just added change = change + 0.00001; there on line 18.
Why do I have to do that? Why is this the way floating-point numbers work?
#include <stdio.h>
#include <cs50.h>
float change;
int coinTotal;
int main(void)
{
do {
// Prompting the user to give the change amount
printf("Enter change: ");
// Getting a float from the user
change = get_float();
}
while (change < 0);
change = change + 0.00001;
// Subtracting quarters from the change given
for (int i = 0; change >= 0.25; i++)
{
change = change - 0.25;
coinTotal++;
}
// Subtracting dimes from the remaining change
for(int i = 0; change >= 0.1; i++)
{
change = change - 0.1;
coinTotal++;
}
// Subtracting nickels from the remaining change
for(int i = 0; change >= 0.05; i++)
{
change = change - 0.05;
coinTotal++;
}
// Subtracting pennies from the remaining change
for(int i = 0; change >= 0.01; i++)
{
change = change - 0.01;
coinTotal++;
}
// Printing total coins used
printf("%i\n", coinTotal);
}
Typically float can represent about 2^32 different values exactly. With float, 4.2 is not one of them. Instead the value is about 4.19999981 as OP has reported.
Working with fractional money is tricky. Rarely is float an acceptable type for money. This details some alternatives like base-10 FP, double, integers and custom types.
If code stays with some FP type, change >= 0.1 and the other compares need to be altered to change >= (0.1 - 0.005) or the like. The compare needs to be tolerant of values just less than or greater than a multiple of 0.01.
As you have discovered, it is impossible to represent most decimal fractions exactly as floating-point values, because the machine stores them in a fixed number of bits.
The most common standard is IEEE 754.
Most commonly you will work with single-precision floats (32 bits in total). The number is represented with 1 bit for the sign, 8 bits for the exponent and 23 bits for the mantissa.
The representation is as follows x=S*M*B^E where:
S - sign (-1 or 1)
M - mantissa (a normalized fraction)
B - Base (here as 2)
E - exponent (8 bits, stored as 0..255 with a bias of 127)
The fraction (M) is what causes the problems with accurate representation of values. You have to approximate a value with a limited number of bits (you can only represent exactly those values that are sums of 1/2, 1/4, 1/8, ...).
Commonly, 32 bits give you a precision of around 6 to 7 significant decimal digits.
You can use 64 bits (double) for a greater range and considerably better precision (around 15 to 17 significant digits).
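If you want to look at the three fields yourself, something along these lines should work. It is only a sketch and assumes float is a 32-bit IEEE 754 value; the struct and function names are invented for the example:

```c
#include <stdint.h>
#include <string.h>

struct float_fields {
    unsigned sign;      /* 1 bit */
    unsigned exponent;  /* 8 bits, stored with a bias of 127 */
    uint32_t fraction;  /* 23 bits, the leading 1 is implicit */
};

struct float_fields split_float(float x)
{
    uint32_t bits;
    struct float_fields f;
    memcpy(&bits, &x, sizeof bits);      /* copy the raw bit pattern */
    f.sign     = bits >> 31;
    f.exponent = (bits >> 23) & 0xFFu;
    f.fraction = bits & 0x7FFFFFu;
    return f;
}
```

For example, -1.0f splits into sign 1, stored exponent 127 (that is, 2^0 after removing the bias) and fraction 0.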
Make every number in your program 100 times bigger (so that you work in whole cents), use the roundf function from math.h when converting, and divide the result by 100 only when you are about to print the value to the screen.
int main(void)
{
float me = 1.1;
double you = 1.1;
if ( me == you ) {
printf("I love U");
} else {
printf("I hate U");
}
}
This prints "I hate U". Why?
Floats use binary fractions. If you convert 1.1 to float, the result is a binary representation.
Each bit to the right of the binary point halves the weight of the digit, just as each decimal place divides it by ten. Bits to the left of the point double it (times ten for decimal).
in decimal:  ... 0*2 + 1*1 + 0*0.5 + 0*0.25 + 0*0.125 + 1*0.0625 + ...
binary:           0     1  .   0       0        0         1      ...
2's exp:          1     0     -1      -2       -3        -4
(the power of 2 that weights each bit)
The problem is that 1.1 cannot be converted exactly to a binary representation. A double, however, has more significant digits than a float.
When you compare the values, the float is first converted to double. But as the computer does not know the original decimal value, it simply fills the trailing digits of the new double with zeros, while the double value is more precise. So the two compare as not equal.
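To make the direction of the conversion visible, here is a tiny sketch (the function names are made up, and it assumes IEEE-754 float and double):

```c
/* 1 if f and d are equal once f is widened to double, which is what f == d does. */
int equal_as_double(float f, double d)
{
    return (double)f == d;
}

/* 1 if f and d are equal once d is narrowed to float. */
int equal_as_float(float f, double d)
{
    return f == (float)d;
}
```

For 1.1f and 1.1 the first returns 0 and the second returns 1. For a value that is exact in binary, such as 0.5, both return 1.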
This is a common pitfall when using floats. For this and other reasons (e.g. rounding errors), you should not compare for exact equality or inequality, but use a ranged compare with a small tolerance such as the machine epsilon (the gap between 1.0 and the next representable value):
#include <float.h>
#include <math.h>
...
// check for "almost equal"
if ( fabs(fval - dval) <= FLT_EPSILON )
...
Note the use of FLT_EPSILON, which is that gap for single-precision float values. Also note the <= rather than <: for values between 1 and 2, neighbouring floats differ by exactly FLT_EPSILON, so < would accept only an exact match.
If you compare two doubles, you might use DBL_EPSILON, but be careful with that.
Depending on intermediate calculations, the tolerance has to be increased (you cannot reduce it further than epsilon), as rounding errors, etc. will sum up. Floats in general are not forgiving with wrong assumptions about precision, conversion and rounding.
Edit:
As suggested by @chux, this might not work as expected for larger values, as you have to scale EPSILON according to the exponents. This conforms to what I stated: float comparison is not as simple as integer comparison. Think before comparing.
In short, you should NOT use == to compare floating-point values.
For example:
float i = 1.1; // or double
float j = 1.1; // or double
This statement
(i==j) == true // is not always valid
For a correct comparison you should use an epsilon (a very small number):
(fabs(i-j)<epsilon)== true // this statement is valid
The question simplifies to why do me and you have different values?
Usually, C floating point is based on a binary representation. Many compilers & hardware follow IEEE 754 binary32 and binary64. Rare machines use a decimal, base-16 or other floating point representation.
OP's machine certainly does not represent 1.1 exactly as 1.1, but to the nearest representable floating point number.
Consider the below which prints out me and you to high precision. The previous representable floating point numbers are also shown. It is easy to see me != you.
#include <math.h>
#include <stdio.h>
int main(void) {
float me = 1.1;
double you = 1.1;
printf("%.50f\n", nextafterf(me,0)); // previous float value
printf("%.50f\n", me);
printf("%.50f\n", nextafter(you,0)); // previous double value
printf("%.50f\n", you);
}
Output:
1.09999990463256835937500000000000000000000000000000
1.10000002384185791015625000000000000000000000000000
1.09999999999999986677323704498121514916420000000000
1.10000000000000008881784197001252323389053300000000
But it is more complicated: C allows code to use higher precision for intermediate calculations depending on FLT_EVAL_METHOD. So on another machine, where FLT_EVAL_METHOD==1 (evaluate all FP to double), the compare test may pass.
Comparing for exact equality is rarely used in floating point code, aside from comparison to 0.0. More often code uses an ordered compare a < b. Comparing for approximate equality involves another parameter to control how near. @R.. has a good answer on that.
Because you are comparing two floating-point values!
Floating-point comparison is not exact because of rounding errors. Simple values like 1.1 or 0.1 cannot be precisely represented using binary floating-point numbers, and the limited precision of floating-point numbers means that slight changes in the order of operations can change the result. Different compilers and CPU architectures store temporary results at different precisions, so results will differ depending on the details of your environment. For example:
double a = 0.15 + 0.15;
double b = 0.1 + 0.2;
if(a == b) // can be false!
if(a >= b) // can also be false!
Even
if(fabs(a-b) < 0.0001) // wrong - don't do this
This is a bad way to do it because a fixed epsilon (0.0001), chosen because it "looks small", could actually be far too large when the numbers being compared are themselves very small.
I personally use the following method (written in C++); maybe it will help you:
#include <iostream> // std::cout
#include <cmath> // std::abs
#include <algorithm> // std::min
using namespace std;
#define MIN_NORMAL 1.17549435E-38f
#define MAX_VALUE 3.4028235E38f
bool nearlyEqual(float a, float b, float epsilon) {
float absA = std::abs(a);
float absB = std::abs(b);
float diff = std::abs(a - b);
if (a == b) {
return true;
} else if (a == 0 || b == 0 || diff < MIN_NORMAL) {
return diff < (epsilon * MIN_NORMAL);
} else {
return diff / std::min(absA + absB, MAX_VALUE) < epsilon;
}
}
This method passes tests for many important special cases, for different a, b and epsilon.
And don't forget to read What Every Computer Scientist Should Know About Floating-Point Arithmetic!
I'm new to C and when I run the code below, the value that is put out is 12098 instead of 12099.
I'm aware that working with decimals always involves a degree of inaccuracy, but is there a way to accurately move the decimal point to the right two places every time?
#include <stdio.h>
int main(void)
{
int i;
float f = 120.99;
i = f * 100;
printf("%d", i);
}
Use the round function (declared in math.h):
float f = 120.99;
int i = round( f * 100.0 );
Be aware however, that a float typically only has 6 or 7 digits of precision, so there's a maximum value where this will work. The smallest float value that won't convert properly is the number 131072.01. If you multiply by 100 and round, the result will be 13107202.
You can extend the range of your numbers by using double values, but even a double has limited precision. (A double has 15 to 17 significant decimal digits.) For example, the following code will print 10000000000000098:
double d = 100000000000000.99;
uint64_t j = round( d * 100.0 );
printf( "%llu\n", (unsigned long long)j );
That's just an example; finding the smallest number that exceeds the precision of a double is left as an exercise for the reader.
Use fixed-point arithmetic on integers:
#include <stdio.h>
#define abs(x) ((x)<0 ? -(x) : (x))
int main(void)
{
int d = 12099;
int i = d * 100;
printf("%d.%02d\n", d/100, abs(d)%100);
printf("%d.%02d\n", i/100, abs(i)%100);
}
Your problem is that floats are represented internally using IEEE-754, which is base 2 and not base 10. 0.25 has an exact representation, but 0.1 does not, nor does 120.99.
What really happens is that, due to floating-point inaccuracy, the IEEE-754 float closest to the decimal value 120.99, multiplied by 100, is slightly below 12099, so it is truncated to 12098. Your compiler should have warned you about the truncation from float to int (mine did).
A simple way to get what you expect is to add 0.5 to the float before the truncation to int:
i = (f * 100) + 0.5;
But beware: floating-point values are inherently inaccurate when processing decimal values.
Edit:
Of course, for negative numbers it should be i = (f * 100) - 0.5; ...
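Put together, the positive and negative cases could look like the sketch below. round_by_half is a made-up name, and it assumes the result fits in an int:

```c
/* Rounds to the nearest integer by adding or subtracting 0.5 before truncating. */
int round_by_half(double x)
{
    return (int)(x >= 0 ? x + 0.5 : x - 0.5);
}
```

Called as round_by_half(f * 100.0) with f = 120.99f this gives 12099, and -12099 for the negated value. It is not watertight, though: on typical IEEE-754 hardware the largest double below 0.5 (0.49999999999999994) rounds up to 1 because the addition itself rounds, which is one reason the standard lround function is usually the safer choice.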
If you'd like to continue operating on the number as a floating point number, then the answer is more or less no. There's various things you can do for small numbers, but as your numbers get larger, you'll have issues.
If you'd like to only print the number, then my recommendation would be to convert the number to a string, and then move the decimal point there. This can be slightly complicated depending on how you represent the number in the string (exponential and what not).
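A rough sketch of the string approach for the simple fixed-notation case follows. The function name is invented, it assumes the default "C" locale (so the decimal separator is '.'), and it does not attempt to handle exponential notation:

```c
#include <stdio.h>
#include <string.h>

/* Writes f with two decimals and the decimal point removed, e.g. 120.99 -> "12099".
   Returns 0 on success, -1 if a buffer is too small. */
int shift_two_places(float f, char *out, size_t out_size)
{
    char tmp[64];
    int len = snprintf(tmp, sizeof tmp, "%.2f", f);
    if (len < 0 || (size_t)len >= sizeof tmp || (size_t)len > out_size)
        return -1;
    size_t j = 0;
    for (int k = 0; k < len; k++) {
        if (tmp[k] != '.')          /* copy everything except the decimal point */
            out[j++] = tmp[k];
    }
    out[j] = '\0';
    return 0;
}
```

This relies on printf rounding the value to two decimals for display, so 120.99f comes out as "12099". Values below 1 keep their leading zero (0.05 becomes "005"), which you may want to strip.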
If you'd like this to work and you don't mind not using floating point, then I'd recommend researching any number of fixed decimal libraries.
You can use
float f = 120.99f;
or
double f = 120.99;
By default, C treats unsuffixed floating-point constants as double, so storing one in a float variable causes an implicit conversion, which is best avoided.
I think this works.