Why do I need 17 significant digits (and not 16) to represent a double?

Can someone give me an example of a floating point number (double precision) that needs more than 16 significant decimal digits to represent it?
I have found in this thread that sometimes you need up to 17 digits, but I am not able to find an example of such a number (16 seems enough to me).
Can somebody clarify this?

My other answer was dead wrong.
#include <stdio.h>

int main(int argc, char *argv[])
{
    unsigned long long n = 1ULL << 53;
    unsigned long long a = 2 * (n - 1);
    unsigned long long b = 2 * (n - 2);
    printf("%llu\n%llu\n%d\n", a, b, (double)a == (double)b);
    return 0;
}
Compile and run to see:
18014398509481982
18014398509481980
0
a and b are just 2*(2^53-1) and 2*(2^53-2).
Those are 17-digit base-10 numbers. When rounded to 16 digits, they are the same. Yet a and b clearly only need 53 bits of precision to represent in base-2. So if you take a and b and cast them to double, you get your counter-example.

The correct answer is the one by Nemo above. Here I am just pasting a simple Fortran program giving an example of two numbers that need 17 digits of precision to print, showing that one does need the (es23.16) format to print double-precision numbers if one doesn't want to lose any precision:
program test
    implicit none
    integer, parameter :: dp = kind(0.d0)
    real(dp) :: a, b
    a = 1.8014398509481982e+16_dp
    b = 1.8014398509481980e+16_dp
    print *, "First we show, that we have two different 'a' and 'b':"
    print *, "a == b:", a == b, "a-b:", a-b
    print *, "using (es22.15)"
    print "(es22.15)", a
    print "(es22.15)", b
    print *, "using (es23.16)"
    print "(es23.16)", a
    print "(es23.16)", b
end program
It prints:
First we show, that we have two different 'a' and 'b':
a == b: F a-b: 2.0000000000000000
using (es22.15)
1.801439850948198E+16
1.801439850948198E+16
using (es23.16)
1.8014398509481982E+16
1.8014398509481980E+16

I think the guy on that thread is wrong, and 16 base-10 digits are always enough to represent an IEEE double.
My attempt at a proof would go something like this:
Suppose otherwise. Then, necessarily, two distinct double-precision numbers must be represented by the same 16-significant-digit base-10 number.
But two distinct double-precision numbers must differ by at least one part in 2^53, which is greater than one part in 10^16. And no two numbers differing by more than one part in 10^16 could possibly round to the same 16-significant-digit base-10 number.
This is not completely rigorous and could be wrong. :-)

Dig into the single- and double-precision basics and wean yourself off the notion of this or that many (16-17) DECIMAL digits and start thinking in (53) BINARY digits. The necessary examples may be found here at Stack Overflow if you spend some time digging.
And I fail to see how you can award a best answer to anyone giving a DECIMAL answer without a qualified BINARY explanation. This stuff is straightforward but it is not trivial.

The largest continuous range of integers that can be exactly represented by a double (8-byte IEEE) is -2^53 to 2^53 (-9007199254740992. to 9007199254740992.). The numbers -2^53-1 and 2^53+1 cannot be exactly represented by a double.
Therefore, no more than 16 significant decimal digits to the left of the decimal point will exactly represent a double in the continuous range.

Related

Why double and %f don't want to print 10 decimals?

I am learning the C programming language and am figuring out format specifiers, but it seems as if double and %f are not working correctly.
Here is my code
#include <stdio.h>

int main(void)
{
    double a = 15.1234567899876;
    printf("%13.10f", a);
}
In my textbook it's stated that in "%13.10f", 13 stands for the total number of digits we want printed (including the dot) and 10 is the number of decimals. So I expected to get 15.1234567899 but didn't.
After running it I get 15.1234567900. It's not just that there aren't enough decimals; the decimals are not printed correctly. The variable a has an 8 after the 7 and before the 9, but the printed number does not.
Can someone please tell me where I am wrong.
Thank you. Lp
printf is supposed to round the result to the number of digits you asked for.
you asked:   15.1234567899876
you got:     15.1234567900
digit count:    1234567890
So printf is behaving correctly.
You should beware, though, that both types float and double have finite precision. Also their finite precision is as a number of binary bits, not decimal digits. So after about 7 digits for a float, and about 16 digits for a double, you'll start seeing results that can seem quite strange if you don't realize what's going on. You can see this if you start printing more digits:
printf("%18.15f\n", a);
you asked: 15.1234567899876
you got:   15.123456789987600
So that's okay. But:
printf("%23.20f\n", a);
you asked: 15.1234567899876
you got:   15.12345678998759979095
Here we see that, at the 15th digit, the number actually stored internally begins to differ slightly from the number you asked for. You can read more about this at Is floating point math broken?
Footnote: What was the number actually stored internally? It was the hexadecimal floating-point number 0xf.1f9add3b7744, or expressed in C's %a format, 0x1.e3f35ba76ee88p+3. Converted back to decimal, it's exactly 15.1234567899875997909475699998438358306884765625. All those other renditions (15.1234567900, 15.123456789987600, and 15.12345678998759979095) are rounded to some smaller number of digits. The internal value makes the most sense, perhaps, expressed in binary, where it's 0b1111.0001111110011010110111010011101101110111010001000, with exactly 53 significant bits, of which 52 are explicit and one implicit, per IEEE-754 double precision.

Why does C print float values after the decimal point different from the input value? [duplicate]

This question already has answers here: Why IEEE754 single-precision float has only 7 digit precision?
Why does C print float values after the decimal point different from the input value?
Following is the code.
CODE:
#include <stdio.h>
#include <math.h>

int main(void)
{
    float num = 2118850.132000;
    printf("num:%f", num);
}
OUTPUT:
num:2118850.250000
This should have printed 2118850.132000, but instead it changes the digits after the decimal point to .250000. Why is this happening?
Also, what can one do to avoid this?
Please guide me.
Your computer uses binary floating point internally. Type float has 24 bits of precision, which translates to approximately 7 decimal digits of precision.
Your number, 2118850.132, has 10 decimal digits of precision. So right away we can see that it probably won't be possible to represent this number exactly as a float.
Furthermore, due to the properties of binary numbers, no decimal fraction that ends in 1, 2, 3, 4, 6, 7, 8, or 9 (that is, numbers like 0.1 or 0.2 or 0.132) can be exactly represented in binary. So those numbers are always going to experience some conversion or roundoff error.
When you enter the number 2118850.132 as a float, it is converted internally into the binary fraction 1000000101010011000010.01. That's equivalent to the decimal fraction 2118850.25. So that's why the .132 seems to get converted to 0.25.
As I mentioned, float has only 24 bits of precision. You'll notice that 1000000101010011000010.01 is exactly 24 bits long. So we can't, for example, get closer to your original number by using something like 1000000101010011000010.001, which would be equivalent to 2118850.125, which would be closer to your 2118850.132. No, the next lower 24-bit fraction is 1000000101010011000010.00 which is equivalent to 2118850.00, and the next higher one is 1000000101010011000010.10 which is equivalent to 2118850.50, and both of those are farther away from your 2118850.132. So 2118850.25 is as close as you can get with a float.
If you used type double you could get closer. Type double has 53 bits of precision, which translates to approximately 16 decimal digits. But you still have the problem that .132 ends in 2 and so can never be exactly represented in binary. As type double, your number would be represented internally as the binary number 1000000101010011000010.0010000111001010110000001000010 (note 53 bits), which is equivalent to 2118850.132000000216066837310791015625, which is much closer to your 2118850.132, but is still not exact. (Also notice that 2118850.132000000216066837310791015625 begins to diverge from your 2118850.1320000000 after 16 digits.)
So how do you avoid this? At one level, you can't. It's a fundamental limitation of finite-precision floating-point numbers that they cannot represent all real numbers with perfect accuracy. Also, the fact that computers typically use binary floating-point internally means that they can almost never represent "exact-looking" decimal fractions like .132 exactly.
There are two things you can do:
If you need more than about 7 digits worth of precision, definitely use type double, don't try to use type float.
If you believe your data is accurate to three places past the decimal, print it out using %.3f. If you take 2118850.132 as a double, and printf it using %.3f, you'll get 2118850.132, like you want. (But if you printed it with %.12f, you'd get the misleading 2118850.132000000216.)
This will work if you use double instead of float:
#include <stdio.h>

int main(void)
{
    double num = 2118850.132000;
    printf("num:%f\n", num);
    return 0;
}

How are results rounded in floating-point arithmetic?

I wrote this code that simply sums a list of n numbers, to practice with floating point arithmetic, and I don't understand this:
I am working with float, this means I have 7 digits of precision, therefore, if I do the operation 10002*10002=100040004, the result in data type float will be 100040000.000000, since I lost any digit beyond the 7th (the program still knows the exponent, as seen here).
If the input in this program is
3
10000
10001
10002
However, when this program computes 30003*30003 = 900180009, you will see that it actually gets 30003*30003 = 900180032.000000.
I understand this 32 appears because I am working with float, and my goal is not to make the program more precise but to understand why this is happening. Why is it 900180032.000000 and not 900180000.000000? Why does this decimal noise (32) appear in 30003*30003 and not in 10002*10002, even though the magnitudes of the numbers are about the same? Thank you for your time.
#include <stdio.h>
#include <math.h>

#define MAX_SIZE 200

int main()
{
    int numbers[MAX_SIZE];
    int i, N;
    float sum = 0;
    float sumb = 0;
    float sumc = 0;

    printf("introduce n");
    scanf("%d", &N);
    printf("write %d numbers:\n", N);
    for (i = 0; i < N; i++)
    {
        scanf("%d", &numbers[i]);
    }

    int r = 0;
    while (r < N) {
        sum = sum + numbers[r];
        sumb = sumb + (numbers[r] * numbers[r]);
        printf("sum is %f\n", sum);
        printf("sumb is %f\n", sumb);
        r++;
    }
    sumc = (sum * sum);
    printf("sumc is %f\n", sumc);
}
As explained below, the computed result of multiplying 10,002 by 10,002 must be a multiple of eight, and the computed result of multiplying 30,003 by 30,003 must be a multiple of 64, due to the magnitudes of the numbers and the number of bits available for representing them. Although your question asks about “decimal noise,” there are no decimal digits involved here. The results are entirely due to rounding to multiples of powers of two. (Your C implementation appears to use the common IEEE 754 format for binary floating-point.)
When you multiply 10,002 by 10,002, the computed result must be a multiple of eight. I will explain why below. The mathematical result is 100,040,004. The nearest multiples of eight are 100,040,000 and 100,040,008. They are equally far from the exact result, and the rule used to break ties chooses the even multiple (100,040,000 is eight times 12,505,000, an even number, while 100,040,008 is eight times 12,505,001, an odd number).
Many C implementations use IEEE 754 32-bit basic binary floating-point for float. In this format, a number is represented as an integer M multiplied by a power of two, 2^e. The integer M must be less than 2^24 in magnitude. The exponent e may be from −149 to 104. These limits come from the numbers of bits used to represent the integer and the exponent.
So all float values in this format have the value M • 2^e for some M and some e. There are no decimal digits in the format, just an integer multiplied by a power of two.
Consider the number 100,040,004. The biggest M we can use is 16,777,215 (2^24 − 1). That is not big enough that we can write 100,040,004 as M • 2^0. So we must increase the exponent. Even with 2^2, the biggest we can get is 16,777,215 • 2^2 = 67,108,860. So we must use 2^3. And that is why the computed result must be a multiple of eight, in this case.
So, to produce a result for 10,002 • 10,002 in float, the computer uses 12,505,000 • 2^3, which is 100,040,000.
In 30,003 • 30,003, the result must be a multiple of 64. The exact result is 900,180,009. 2^5 is not enough because 16,777,215 • 2^5 is 536,870,880. So we need 2^6, which is 64. The two nearest multiples of 64 are 900,179,968 and 900,180,032. In this case, the latter is closer (23 away versus 41 away), so it is chosen.
(While I have described the format as an integer times a power of two, it can also be described as a binary numeral with one binary digit before the radix point and 23 binary digits after it, with the exponent range adjusted to compensate. These are mathematically equivalent. The IEEE 754 standard uses the latter description. Textbooks may use the former description because it makes analyzing some of the numerical properties easier.)
Floating point arithmetic is done in binary, not in decimal.
Floats have 24 binary bits of significand precision: 23 bits are stored explicitly, plus one implicit leading 1 bit. This converts to approximately 7 decimal digits of precision.
The number you're looking at, 900180032, is already 9 digits long, so it makes sense that the last two digits (the 32) might be wrong. The rounding, like the arithmetic, is done in binary, and the reason for the difference in rounding can only be seen if you break things down into binary.
900180032 = 110101101001111010100001000000
900180000 = 110101101001111010100000100000
If you count from the first 1 to the last 1 in each of those numbers, inclusive, that is how many significant bits it takes to store the number. 900180032 spans only 24 significant bits, which a float can hold (23 stored bits plus the implicit leading 1), while 900180000 spans 25 significant bits, which makes 900180000 an impossible number to store in a float. 900180032 is the closest number to the correct answer, 900180009, that a float can store.
In the other example
100040000 = 101111101100111110101000000
100040004 = 101111101100111110101000100
The correct answer, 100040004, spans 25 significant bits, too many for a float. The nearest number that spans 24 or fewer significant bits is 100040000, which spans only 21.
For more on how floating point arithmetic works, try here: http://steve.hollasch.net/cgindex/coding/ieeefloat.html

Count number of digits after `.` in floating point numbers?

This is one interview question.
How do you compute the number of digits after the . in a floating point number?
e.g. if given 3.554, output = 3;
for 43.000, output = 0.
My code snippet is here
double no = 3.44;
int count = 0;
while (no != (int)no)
{
    count++;
    no = no * 10;
}
printf("%d", count);
There are some numbers that cannot be represented exactly by the float type. For example, there is no 73.487 in the float type; the number float uses in C to approximate it is 73.486999999999995.
Now how can this be solved, since the loop above goes infinite?
Note: In the IEEE 754 specification, a 32-bit float is divided as 1 + 8 + 23 bits: the sign, the exponent, and the mantissa.
I doubt this is what you want since the question is asking for something that's not usually meaningful with floating point numbers, but here is the answer:
#include <math.h>  /* for rint */

int digits_after_decimal_point(double x)
{
    int i;
    /* Doubling x strips one binary fractional digit per step, and a dyadic
       rational has exactly as many decimal fractional digits as binary ones,
       so this counts decimal places without any base-10 rounding error. */
    for (i = 0; x != rint(x); x += x, i++)
        ;
    return i;
}
The problem isn't really solvable as stated, since floating-point is typically represented in binary, not in decimal. As you say, many (in fact most) decimal numbers are not exactly representable in floating-point.
On the other hand, all numbers that are exactly representable in binary floating-point are decimals with a finite number of digits -- but that's not particularly useful if you want a result of 2 for 3.44.
When I run your code snippet, it says that 3.44 has 2 digits after the decimal point -- because 3.44 * 10.0 * 10.0 just happens to yield exactly 344.0. That might not happen for another number like, say, 3.43 (I haven't tried it).
When I try it with 1.0/3.0, it goes into an infinite loop. Adding some printfs shows that no becomes exactly 33333333333333324.0 after 17 iterations -- but that number is too big to be represented as an int (at least on my system), and converting it to int has undefined behavior.
And for large numbers, repeatedly multiplying by 10 will inevitably give you a floating-point overflow. There are ways to avoid that, but they don't solve the other problems.
If you store the value 3.44 in a double object, the actual value stored (at least on my system) is exactly 3.439999999999999946709294817992486059665679931640625, which has 51 decimal digits in its fractional part. Suppose you really want to compute the number of decimal digits after the point in 3.439999999999999946709294817992486059665679931640625. Since 3.44 and 3.439999999999999946709294817992486059665679931640625 are effectively the same number, there's no way for any C function to distinguish between them and know whether it should return 2 or 51 (or 50 if you meant 3.43999999999999994670929481799248605966567993164062, or ...).
You could probably detect that the stored value is "close enough" to 3.44, but that makes it a much more complex problem -- and it loses the ability to determine the number of decimal digits in the fractional part of 3.439999999999999946709294817992486059665679931640625.
The question is meaningful only if the number you're given is stored in some format that can actually represent decimal fractions (such as a string), or if you add some complex requirement for determining which decimal fraction a given binary approximation is meant to represent.
There's probably a reasonable way to do the latter by looking for the unique decimal fraction whose nearest approximation in the given floating-point type is the given binary floating-point number.
The question could be interpreted as such:
Given a floating point number, find the shortest decimal representation that would be re-interpreted as the same floating point value with correct rounding.
Once formulated like this, the answer is Yes we can - see this algorithm:
Printing floating point numbers quickly and accurately. Robert G. Burger and R. Kent Dybvig. ACM SIGPLAN 1996 Conference on Programming Language Design and Implementation, June 1996
http://www.cs.indiana.edu/~dyb/pubs/FP-Printing-PLDI96.pdf
See also references from Compute the double value nearest preferred decimal result for a Smalltalk implementation.
Sounds like you need to either use sprintf to get an actual rounded version, or have the input be a string (and not parsed to a float).
Either way, once you have a string version of the number, counting characters after the decimal should be trivial.
Here is my logic to count the number of digits.
number = 245.98
Take the input as a string:
char str[10] = "245.98";
Convert the string to an int, using atoi, to count the number of digits before the decimal point:
int atoi(const char *string)
Use the n/10 logic inside a while loop to count those digits.
For the digits after the decimal point:
Get the length of the string using strlen(num).
Scan with while (num[i] != '.'), incrementing i.
Then add the output of step 3 and the output of step 4.
Code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main()
{
    char num[100] = "345653.8768";
    int count = 0;
    int i = 0;
    int len;
    int before_decimal = atoi(num);
    int after_decimal;
    int total_Count;

    printf("Converting string to int : %d\n", before_decimal);

    // Count the number of digits before the decimal point
    while (before_decimal != 0) {
        before_decimal = before_decimal / 10;
        count++;
    }
    printf("number of digits before decimal are %d\n", count);

    // Get the number of digits after the decimal point:
    // first get the length of the string
    len = strlen(num);
    printf("Total number of digits including '.' are =%d\n", len);

    // Find the position of the '.' decimal point
    while (num[i] != '.') {
        i++;
    }
    // total length - digits before the decimal (i) - 1 (for the '.' itself)
    after_decimal = len - i - 1;
    printf("Number of digits after decimal points are %d\n", after_decimal);

    // Add both counts: digits before the decimal point and after it
    total_Count = count + after_decimal;
    printf("Total number of digits are :%d\n", total_Count);
    return 0;
}
Output:
Converting string to int : 345653
number of digits before decimal are 6
Total number of digits including '.' are =11
Number of digits after decimal points are 4
Total number of digits are :10
There are no general exact solutions. But you can convert the value to a string, ignore the part exceeding the type's precision, and exclude the trailing 0s or 9s. This will work for more cases, but it still won't return the correct answer for all.
For example, a double's accuracy is about 15 digits if the input is a decimal string from the user (17 digits for a binary-decimal-binary round trip), so for 73.486999999999995 there are 15 − 2 = 13 digits after the radix point (minus the 2 digits in the integer part). After that, there are still many 9s in the fractional part; subtract them from the count too. Here there are ten 9s, which means there are 13 − 10 = 3 decimal digits. If you use 17 digits, then the last digit may be just garbage; exclude it before counting the 9s or 0s.
Alternatively, just start from the 15th or 16th digit and iterate until you see the first non-0 and non-9 digit. Count the remaining digits and you'll get 3 in this case. Of course, while iterating you must also make sure that the trailing part is all 0s or all 9s.
Request: e.g. if given 3.554, the output is 3; for 43.000, the output is 0.
Problem: the input is already a decimal like 0.33345. When this gets converted to a double, it might be something like 0.333459999...125. The goal is merely to determine that 0.33345 is a shorter decimal that will produce the same double. The solution is to convert it to a string with the smallest number of digits that results in the same original value.
int digits(double v) {
    int d = 0;
    while (d < 50) {
        string t = DoubleToString(v, d);
        double vt = StrToDouble(t);
        if (MathAbs(v - vt) < 1e-15)
            break;
        ++d;
    }
    return d;
}
double v=0.33345; PrintFormat("v=%g, d=%i", v,digits(v));// v=0.33345, d=5
v=0.01; PrintFormat("v=%g, d=%i", v,digits(v));// v=0.01, d=2
v=0.00001; PrintFormat("v=%g, d=%i", v,digits(v));// v=1e-05, d=5
v=5*0.00001; PrintFormat("v=%g, d=%i", v,digits(v));// v=5e-05, d=5
v=5*.1*.1*.1; PrintFormat("v=%g, d=%i", v,digits(v));// v=0.005, d=3
v=0.05; PrintFormat("v=%g, d=%i", v,digits(v));// v=0.05, d=2
v=0.25; PrintFormat("v=%g, d=%i", v,digits(v));// v=0.25, d=2
v=1/3.; PrintFormat("v=%g, d=%i", v,digits(v));// v=0.333333, d=15
What you can do is multiply the number by various powers of 10, round that to the nearest integer, and then divide by the same number of powers of 10. When the final result compares different from the original number, you've gone one digit too far.
I haven't read it in a long time, so I don't know how it relates to this idea, but How to Print Floating-Point Numbers Accurately from PLDI 1990 and 2003 Retrospective are probably very relevant to the basic problem.

Why are my exponential numbers being rounded? (C language)

I am getting unexpected results when printing some doubles. Some rounding is taking place, and I'm not sure why.
#include <stdio.h>

int main(void)
{
    double d1 = 0;
    double d2 = 0;
    d1 = 1.2345678901234567e16;
    d2 = 112233445566778899.0;
    printf("d1: %.0lf\n", d1);
    printf("d2: %.0lf\n", d2);
    return 0;
}
The results of running the program are:
d1: 12345678901234568
d2: 112233445566778900
In the first case, I'm not sure why the last digit (the 7) got rounded to an 8 when there are no digits after it.
In the second case, I also don't know why the number in the hundreds position got rounded. Doubles should accommodate numbers much larger than these without rounding.
Thanks
Not "much larger" - in fact you're right at the limit for "accuracy". A double has 53 bits of accuracy. Your first number is about 10^16, which would need about 16/(log 2) = 53.15 bits to be accurate to within an integer.
“Doubles should accommodate numbers much larger than these without rounding.” Why do you think so?
An IEEE standard double (which is what you are using) has 53 bits (binary digits) of precision.
Go to Wolfram Alpha and ask it for the binary representation of 12345678901234567. It will tell you that the binary form has 54 digits. Therefore it cannot be represented exactly as a double.
Your second number requires 57 digits, so it too cannot be represented exactly.
Doubles can accommodate numbers much larger than these without rounding, but only if, for example, they are powers of 2. If there is a large distance between the leftmost and the rightmost 1 in their binary representation, they will be rounded.
A 64-bit double only has 16 or so decimal digits of precision - you're simply reaching the precision limits of the data type.
