Data lost from float assignment to float print - c

This came up during testing where I have to compare values between actual output and expected output.
Code:
float nf = 584227.4649743827f;
printf("Output: \t %.9f \n", nf);
Output:
Output: 584227.437500
I clearly have some gaps in my knowledge in C, so could someone explain to me this situation:
Why is there this deviation (0.027474382659420f) in the print ?
Is this only the limitation of print, or is it the float data type limitation?
which value is actually stored in the variable nf?
How should I work with values like this so I don't lose information like having a deviation of 0.027474382659420f during assignment.
Any other suggestions related to this kind of problem in testing would also be much appreciated.

Why is there this deviation (0.027474382659420f) in the print ?
Because float has an accuracy of about 7 significant decimal digits, and your deviation starts right after that, at the eighth digit.
Is this only the limitation of print, or is it the float data type
limitation?
It's a limitation of floating point numbers, so that also includes double (although double has a higher precision). It also has to do with the conversion between binary and decimal numbers: for instance, 0.2 is a repeating fraction in binary representation, so it is subject to rounding errors too, and might actually be stored as something like 0.200000000000000011.
which value is actually stored in the variable nf?
The one that you see printed. The 584227.4649743827f that you specified most likely won't even exist in the binary of your compiled program and is "translated" to the actually used value during compilation.
How should I work with values like this so I don't lose information
like having a deviation of 0.027474382659420f during assignment.
Use double, that has an accuracy of about 15-17 digits. Also, you need to remove the f from the 584227.4649743827f, turning it into a double constant instead. If that is not accurate enough, you may have to use external libraries instead for arbitrary precision numbers, such as GMP.
Your floating point numbers most likely adhere to the IEEE 754 standard, but there's no guarantee.

Related

Adding two large double numbers in C gives an incorrect result

I would like to ask you a question regarding adding two big numbers that are double in the C language.
Let's say there are two numbers that are double: 1.31E+42 and 1.399E+43.
If I do the adding in Excel, the result is 15300000000000000000000000000000000000000000. That should be correct.
If I do the adding in the C language, the result is 15299999999999999804719125983728080953278464.
The difference is quite huge. Does anybody know how to get the right result when adding or multiplying big numbers in the C language that are double?
I have to add another important piece of information. One thing is to print it out. I know that it is possible to print the number as you guys suggested. But I also need it as a value for further work. To be specific, it is a task of analysing two circles - their intersection and/or touching (if they touch externally or internally or if there is an intersection).
https://www.bbc.co.uk/bitesize/guides/z9pssbk/revision/4
Two circles will touch if the distance between their centres (V) is equal to the sum of their radii (external touch), or the difference between their radii (internal touch). So I have to do the adding for external touch and then compare whether the distance between their centers V is the same as the sum of their radii. So it is not only about printing the value out.
First circle:
Sx = -3.2E+41, Sy = -3.31E+42, R = 1.31E+42
Second circle:
Sx = 1.354E+43, Sy = 3.17E+42, R = 1.399E+43
The distance between their centers in C language is:
V = 15300000000000002280599204554488630751526912.000000
Sum of their radii is in the C language:
SumR =15299999999999999804719125983728080953278464.000000
According to the reference they should touch externally, so if I do the following condition I get the information that there is no external touch.
if (fabs(V - SumR) < 0.001)
    printf("There is an external touch");
else
    printf("No external touch");
Floating-point arithmetic approximates real-number arithmetic. When converting decimal numbers to binary floating-point or doing floating-point arithmetic, you generally should not expect the results you would get with real-number arithmetic.
This answer assumes your C implementation uses the IEEE-754 binary64 format for double and performs arithmetic using round-to-nearest-ties-to-even, including conversion from decimal to double.
The binary64 format has a sign, a 53-bit significand (a “fraction portion” of the number), and an 11-bit exponent.
This format cannot represent 1.31•10^42. The nearest value it can represent is 1310000000000000060347708657386176332693504. When your C source text contains 1.31e42, your compiler converts it to 1310000000000000060347708657386176332693504. If we write this using hexadecimal for the significand, it is 1.E137CED6DF0D1₁₆•2^139. You can see the initial “1” and 13 more hexadecimal digits (4 bits each) make up 53 bits.
The format also cannot represent 1.399•10^43. The nearest value it can represent is 13989999999999999899113922237014438982975488. When your C source text contains 1.399e43, your compiler converts it to 13989999999999999899113922237014438982975488. Using hexadecimal, this is 1.4131D470653E9₁₆•2^143.
When these are added, the ordinary mathematical result is not representable. The result produced is the nearest representable number, which is 15299999999999999804719125983728080953278464. Using hexadecimal, this is 1.5F45515DD32F6₁₆•2^143.
This is the right result as expected for floating-point arithmetic. Getting the “right” result for your purpose depends on what you want to accomplish. For many purposes, getting an approximate result with floating-point suffices, and one simply understands that the result is approximate. If an exact result is necessary, alternative formats and software must be used.
If you need to process big numbers, you should use an arbitrary precision arithmetic (a.k.a bignums) library, such as GMPlib.
If you want to use floating point numbers, take time to understand them by reading the floating point guide and about IEEE 754. They don't follow intuitive properties of real numbers (e.g. most operations are not associative).
Read of course Modern C and this C reference website, then the C11 draft standard n1570.

C float and double comparisons

I'm comparing simple floats and doubles in C, specifically the value 8.7 for both of them. Now I assign 8.7 to each variable, when I print I get a result of 8.7000 for both values. Why has the compiler added these zeros? And the main question I wanted to ask was is there any further numbers that I'm not seeing, as in hidden after the trailing zeros. I read that I shouldn't do comparisons like this with float because of a lack of precision, but I thought with such a small value surely it can store 8.7 with a degree of accuracy needed to compare itself with another 8.7 value?
My only worry is that it's actually being represented somewhere in memory as e.g. 8.70000003758 or something, which is throwing my comparisons off? I tried to printf with %.20f to see any further numbers that might be hiding but I think that just created numbers that were otherwise not there as the whole accuracy of the number changed to 8.6918734634834929 or something similar.
I'm comparing simple floats and doubles in C, specifically the value 8.7 for both of them.
Bad choice, since 8.7 has no exact binary representation.
Now I assign 8.7 to each variable, when I print I get a result of 8.7000 for both values. Why has the compiler added these zeros?
It hasn't, your print routine has.
And the main question I wanted to ask was is there any further numbers that I'm not seeing, as in hidden after the trailing zeros.
Definitely, since 8.7 has no exact binary representation. (Try to write it out as the sum of integer powers of 2, you can't do it.)
I read that I shouldn't do comparisons like this with float because of a lack of precision, but I thought with such a small value surely it can store 8.7 with a degree of accuracy needed to compare itself with another 8.7 value?
You thought wrong. 1/3 is small but has no exact decimal representation with a finite number of digits. Whether a value is big or small has nothing to do with whether it can be represented exactly with a finite number of digits in a particular base.
My only worry is that it's actually being represented somewhere in memory as e.g. 8.70000003758 or something, which is throwing my comparisons off?
Exactly, just as representing 1/3 as 0.333333333 would do.
I tried to printf with %.20f to see any further numbers that might be hiding but I think that just created numbers that were otherwise not there as the whole accuracy of the number changed to 8.6918734634834929 or something similar.
That's probably just a bug. Show us that code. Perhaps you tried to output a double and left out the l.

upper bound for the floating point error for a number

There are many questions (and answers) on this subject, but I am too thick to figure it out. In C, for a floating point of a given type, say double:
double x;
scanf("%lf", &x);
Is there a generic way to calculate an upper bound (as small as possible) for the error between the decimal fraction string passed to scanf and the internal representation of what is now in x?
If I understand correctly, there is sometimes going to be an error, and it will increase as the absolute value of the decimal fraction increases (in other words, 0.1 will be a bit off, but 100000000.1 will be off by much more).
This aspect of the C standard is slightly under-specified, but you can expect the conversion from decimal to double to be within one Unit in the Last Place of the original.
You seem to be looking for a bound on the absolute error of the conversion. With the above assumption, you can compute such a bound as a double as DBL_EPSILON * x. DBL_EPSILON is typically 2^-52.
A tighter bound on the error that can have been made during the conversion can be computed as follows:
double va = fabs(x);
double error = nextafter(va, INFINITY) - va; /* step toward +infinity; a NaN direction such as 0./0. would yield NaN */
The best conversion functions guarantee conversion to half a ULP in default round-to-nearest mode. If you are using conversion functions with this guarantee, you can divide the bound I offer by two.
The above applies when the original number represented in decimal is 0 or when its absolute value lies between DBL_MIN (approx. 2*10^-308) and DBL_MAX (approx. 2*10^308). If the nonzero decimal number's absolute value is lower than DBL_MIN, then the absolute error is only bounded by DBL_MIN * DBL_EPSILON. If the absolute value is higher than DBL_MAX, you are likely to get infinity as the result of the conversion.
You can't think of this in terms of base 10; the error is in base 2, which won't necessarily point to a specific decimal place in base 10.
You have two underlying issues with your question. First, scanf takes an ASCII string and converts it to a binary number; that is one piece of software which uses a number of C libraries. I have seen, for example, compile-time parsing vs. runtime parsing give different conversion results on the same system. So in terms of error, if you want an exact number, convert it yourself and place that binary number in the register/variable; otherwise accept what you get with the conversion and understand there may be rounding or clipping on the conversion that you didn't expect (which results in an accuracy issue: you didn't get the number you expected).
The second and real problem Pascal already answered. You only have x number of binary places. In terms of decimal, if you had 3 decimal places the number 1.2345 would have to be represented as either 1.234 or 1.235. The same goes for binary: if you have 3 bits of mantissa then 1.0011 is either 1.001 or 1.010 depending on rounding. The mantissa length for IEEE floating point numbers is well documented; you can simply google to find how many binary places you have for each precision.

Why can't I multiply a float? [duplicate]

Possible Duplicate:
Dealing with accuracy problems in floating-point numbers
I was quite surprised when I tried to multiply a float in C (with GCC 3.2) and it did not do what I expected. As a sample:
#include <stdio.h>

int main(void) {
    float nb = 3.11f;   /* 3.11 is not exactly representable in binary */
    nb *= 10;
    printf("%f\n", nb);
    return 0;
}
Displays: 31.099998
I am curious regarding the way floats are implemented and why it produces this unexpected behavior?
First off, you can multiply floats. The problem you have is not the multiplication itself, but the original number you've used. Multiplication can lose some precision, but here the original number you've multiplied started with lost precision.
This is actually an expected behavior. floats are implemented using binary representation which means they can't accurately represent decimal values.
See MSDN for more information.
You can also see in the description of float that it has 6-7 significant digits accuracy. In your example if you round 31.099998 to 7 significant digits you will get 31.1 so it still works as expected here.
The double type would of course be more accurate, but it still has rounding error due to its binary representation, while the number you wrote is decimal.
If you want complete accuracy for decimal numbers, you should use a decimal type. This type exists in languages like C#. http://msdn.microsoft.com/en-us/library/system.decimal.aspx
You can also use rational numbers representation. Using two integers which will give you complete accuracy as long as you can represent the number as a division of two integers.
This is working as expected. Computers have finite precision, because they're trying to compute floating point values from integers. This leads to floating point inaccuracies.
The Floating point wikipedia page goes into far more detail on the representation and resulting accuracy problems than I could here :)
Interesting real-world side-note: this is partly why a lot of money calculations are done using integers (cents) - don't let the computer lose money with lack of precision! I want my $0.00001!
The number 3.11 cannot be represented in binary. The closest you can get with 24 significant bits is 11.0001110000101000111101, which works out to 3.1099998950958251953125 in decimal.
If your number 3.11 is supposed to represent a monetary amount, then you need to use a decimal representation.
In the Python communities we often see people surprised at this, so there are well-tested-and-debugged FAQs and tutorial sections on the issue (of course they're phrased in terms of Python, not C, but since Python delegates float arithmetic to the underlying C and hardware anyway, all the descriptions of float's mechanics still apply).
It's not the multiplication's fault, of course -- remove the statement where you multiply nb and you'll see similar issues anyway.
From Wikipedia article:
The fact that floating-point numbers cannot precisely represent all real numbers, and that floating-point operations cannot precisely represent true arithmetic operations, leads to many surprising situations. This is related to the finite precision with which computers generally represent numbers.
Floating points are not precise because they use base 2 (because it's binary: either 0 or 1) instead of base 10. And base 2 converting to base 10, as many have stated before, will cause rounding precision issues.

Why does the value of this float change from what it was set to?

Why is this C program giving the "wrong" output?
#include <stdio.h>

int main(void)
{
    float f = 12345.054321;   /* rounded to the nearest float on assignment */
    printf("%f\n", f);
    return 0;
}
Output:
12345.054688
But the output should be, 12345.054321.
I am using VC++ in VS2008.
It's giving the "wrong" answer simply because not all real values are representable by floats (or doubles, for that matter). What you'll get is an approximation based on the underlying encoding.
In order to represent every real value, even between 1.0x10^-100 and 1.1x10^-100 (a truly minuscule range), you still require an infinite number of bits.
Single-precision IEEE754 values have only 32 bits available (some of which are tasked to other things such as exponent and NaN/Inf representations) and cannot therefore give you infinite precision. They actually have 23 bits available giving precision of about 2^24 (there's an extra implicit bit) or just over 7 decimal digits (log10(2^24) is roughly 7.2).
I enclose the word "wrong" in quotes because it's not actually wrong. What's wrong is your understanding about how computers represent numbers (don't be offended though, you're not alone in this misapprehension).
Head on over to http://www.h-schmidt.net/FloatApplet/IEEE754.html and type your number into the "Decimal representation" box to see this in action.
If you want a more accurate number, use doubles instead of floats - these have double the number of bits available for representing values (assuming your C implementation is using IEEE754 single and double precision data types for float and double respectively).
If you want arbitrary precision, you'll need to use a "bignum" library like GMP although that's somewhat slower than native types so make sure you understand the trade-offs.
The decimal number 12345.054321 cannot be represented accurately as a float on your platform. The result that you are seeing is a decimal approximation to the closest number that can be represented as a float.
floats are about convenience and speed, and use a binary representation - if you care about precision use a decimal type.
To understand the problem, read What Every Computer Scientist Should Know About Floating-Point Arithmetic:
http://docs.sun.com/source/806-3568/ncg_goldberg.html
For a solution, see the Decimal Arithmetic FAQ:
http://speleotrove.com/decimal/decifaq.html
It's all to do with precision. Your number cannot be stored accurately in a float.
Single-precision floating point values can only represent about seven significant (decimal) digits. Beyond that point, you're seeing quantization error.

Resources