How to translate the value of %a given by printf? - c

What is the value of 0x1.921fb82c2bd7fp+1 in a human readable presentation?
I got this value by printf using %a.

The mantissa is hexadecimal and the exponent is a decimal value representing the power of 2 the mantissa is scaled by.
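If you want to see the decimal value, you can feed the hex-float form straight back to the compiler or to printf (a minimal sketch; hexadecimal floating constants are a C99 feature):

#include <stdio.h>

int main(void)
{
    /* same notation that %a produced */
    double x = 0x1.921fb82c2bd7fp+1;

    printf("%a\n", x);     /* prints the hex-float form again    */
    printf("%f\n", x);     /* prints 3.141593, the decimal value */
    printf("%.17g\n", x);  /* full precision, roughly 3.1415930  */
    return 0;
}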

Related

How many digits after the decimal point can a float variable save in c? [duplicate]

Generally we say that a float has precision of 6 digits after the decimal point. But if we store a large number of the order of 10^30 we won't get 6 digits after the decimal point. So is it correct to say that floats have a precision of 6 digits after the decimal point?
"6 digits after the decimal point" is nonesnse, and your example is a good demonstration of this.
The IEEE-754 single-precision format is an exact specification of the float data type.
The precision of a float is 24 bits: there are 23 bits denoting the fraction after the binary point, plus an implicit leading bit, giving 24 significant bits in total.
Hence in decimal digits this is approximately:
24 * log(2) / log(10) = 7.22
It sounds like you're asking about precision to decimal places (digits after the decimal point), whereas significant figures (total number of digits excluding leading and trailing zeroes) is a better way to describe the accuracy of numbers.
You're correct in that the number of digits after the decimal point will change when the number is larger - but if we're talking precision, the number of significant figures will not change when the number is larger. However, the answer isn't simple for decimal numbers:
Most systems these days use the IEEE floating point format to represent numbers in C. However, if you're on something unusual, it's worth checking. Single precision IEEE float numbers are made up of three parts:
The sign bit (is this number positive or negative)
The (generally also signed) exponent
The fraction (the number before the exponent is applied)
As we'd expect, this is all stored in binary.
How many significant figures?
If you are using IEEE-754 numbers, "how many significant figures" probably isn't an easy way to think about it, because the precision is measured in binary significant figures rather than decimal. floats have only 23 bits of accuracy for the fraction part, but because there's an implicit leading bit (unless the exponent field is all zeroes, which indicates a subnormal number or zero), there are 24 effective bits of precision.
This means there are 24 significant binary digits, which does not translate to an exact number of decimal significant figures. You can use the formula 24 * log(2) / log(10) to determine that there are 7.225 digits of decimal precision, which isn't a very good answer to your question, since there are numbers of 24 significant binary digits which only have 6 significant decimal digits.
So, single precision floating point numbers have 6-9 significant decimal digits of precision, depending on the number.
Interestingly, you can also use this precision to work out the largest consecutive integer (counting from zero) that you can successfully represent in a single precision float. It is 2^24, or 16,777,216. You can exactly store larger integers, but only if they can be represented in 24 significant binary digits.
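As a quick check (a minimal sketch, assuming an IEEE-754 single-precision float), 2^24 + 1 is the first integer that no longer fits in 24 significant binary digits and silently rounds back down:

#include <stdio.h>

int main(void)
{
    float a = 16777216.0f;  /* 2^24     : exactly representable       */
    float b = 16777217.0f;  /* 2^24 + 1 : needs 25 bits, gets rounded */

    printf("%.1f %.1f\n", a, b);  /* prints 16777216.0 16777216.0     */
    printf("%d\n", a == b);       /* prints 1: both ended up as 2^24  */
    return 0;
}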
Further trivia: The limited size of the fraction component is the same thing that causes this in Javascript:
> console.log(9999999999999999);
10000000000000000
Javascript numbers are always represented as double precision floats, which have 53 bits of precision. This means between 2^53 and 2^54, only even numbers can be represented, because the final bit of any odd number is lost.
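The same effect can be reproduced in C (a small sketch; JavaScript numbers are just IEEE-754 doubles):

#include <stdio.h>

int main(void)
{
    /* 9999999999999999 lies between 2^53 and 2^54, where only even
       integers are representable, so it rounds to the nearest even one. */
    double d = 9999999999999999.0;
    printf("%.0f\n", d);   /* prints 10000000000000000 */
    return 0;
}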
The precision of floating point numbers should be measured in binary digits, not decimal digits. This is because computers operate on binary numbers, and a binary fraction can only approximate a decimal fraction.
Language lawyers will say that the exact width of a float is unspecified by the C standard and therefore implementation-dependent, but on any platform you are likely to encounter a C float means an IEEE754 single-precision number.
IEEE754 specifies that a floating point number is in scientific notation: (-1)^s × 2^e × m
where s is one bit wide, e is eight bits wide, and m is twenty three bits wide. Mathematically, m is 24 bits wide because it's always assumed that the top bit is 1.
So, the maximum number of decimal digits that can be approximated with this representation is: log10(2^24) ≈ 7.22.
That approximates seven significant decimal digits, with an exponent ranging from 2^-126 to 2^127.
Notice that the exponent is measured separately. This is exactly like if you were using ordinary scientific notation, like "A person weighs 72.3 kilograms = 7.23×10^4 grams". Notice that there are three significant digits here, representing that the number is only accurate to within 100 grams. But there is also an exponent which is a different number entirely. You can have a very big exponent with very few significant digits, like "the sun weighs 1.99×10^33 grams." Big number, few digits.
In a nutshell, a float can store about 7-8 significant decimal digits. Let me illustrate this with an example:
1234567001.00
       ^
       +---------------- this information is lost
.01234567001
         ^
         +-------------- this information is lost
Basically, the float stores two values: 1234567 and the position of the decimal point.
Now, this is a simplified example. Floats store binary values instead of decimal values. A 32-bit IEEE 754 float has space for 23 "significant bits" (plus the first one which is always assumed to be 1), which corresponds to roughly 7-8 decimal digits.
1234567001.00 (dec) =
1001001100101011111111101011001.00 (bin) gets rounded to
1001001100101011111111110000000.00 =
 |       23 bits       |
1234567040.00 (dec)
And this is exactly what C produces:
#include <stdio.h>

int main(void) {
    float a = 1234567001;
    printf("%f\n", a); // outputs 1234567040.000000
    return 0;
}

Convert integer to IEEE floating point?

I am currently reading "Computer Systems: A Programmer's Perspective". In the book, big-endian is used (most significant bits first). In the context of IEEE floating point numbers, using 32-bit single-precision, here is a citation of conversion between an integer and IEEE floating point:
One useful exercise for understanding floating-point representations
is to convert sample integer values into floating-point form. For
example, we saw in Figure
2.15 that 12,345 has binary representation [11000000111001]. We create a normalized representation of this by shifting 13 positions to the
right of a binary point, giving 12,345 = 1.1000000111001₂ × 2^13. To
encode this in IEEE single-precision format, we construct the fraction
field by dropping the leading 1 and adding 10 zeros to the end, giving
binary representation [10000001110010000000000]. To construct the
exponent field, we add bias 127 to 13, giving 140, which has binary
representation [10001100]. We combine this with a sign bit of 0 to get
the floating-point representation in binary of
[01000110010000001110010000000000].
What I do not understand is "by dropping the leading 1 and adding 10 zeros to the end, giving
binary representation [10000001110010000000000]." If big-endian is used, why can you add 10 zeros to the end of 1000000111001? Doesn't that lead to another value than that after the binary point? It would make sense to me if we added 10 zeros in the front since the final decimal value would still be that originally after the binary point.
Why/how can you add 10 zeros to the back without changing the value if big-endian is used?
This is how the number 12345 is represented as a 32-bit single-precision IEEE754 float:
              3  2          1         0
              1 09876543 21098765432109876543210
              S ---E8--- ----------F23----------
      Binary: 0 10001100 10000001110010000000000
         Hex: 4640 E400
   Precision: SP
        Sign: Positive
    Exponent: 13 (Stored: 140, Bias: 127)
   Hex-float: +0x1.81c8p13
       Value: +12345.0 (NORMAL)
Since this is a NORMAL value, the fractional part is interpreted with an implicit 1-bit; that is, it is 1.10000001110010000000000. So, to fill the 23-bit mantissa you simply add 10 zeros at the end, as that doesn't change the value.
Endianness isn't really related to how these numbers are represented, as each bit has a fixed meaning. But in general, the most-significant-bit is to the left in both the exponent and the mantissa.
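If you want to inspect the fields yourself, here is a minimal sketch (assuming a 32-bit IEEE-754 float and the usual <stdint.h> types):

#include <stdio.h>
#include <string.h>
#include <stdint.h>

int main(void)
{
    float f = 12345.0f;
    uint32_t bits;

    memcpy(&bits, &f, sizeof bits);   /* reinterpret the float's bytes */

    printf("hex      : 0x%08X\n", (unsigned)bits);              /* 0x4640E400     */
    printf("sign     : %u\n", (unsigned)(bits >> 31));           /* 0              */
    printf("exponent : %u\n", (unsigned)((bits >> 23) & 0xFFu)); /* 140 = 127 + 13 */
    printf("fraction : 0x%06X\n", (unsigned)(bits & 0x7FFFFFu)); /* 0x40E400       */
    return 0;
}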

printf with %a does not seem to produce a hexadecimal number

I have been told that "%a" used in C's printf would display hexadecimal format of a number. To test it, I print out the representation of 2^10:
printf ("%a", pow(2.0,10));
which gives
0x1p+10
I am confused because the exponent part "+10" looks more like a decimal format rather than a hexadecimal format. A hexadecimal format should have been 1pA. Where am I wrong?
It's correct; that format is called hexadecimal floating-point notation for doubles.
The man page says:
For a conversion, the double argument is converted to hexadecimal
notation (using the letters abcdef) in the style [-]0xh.hhhp[+-]d [...]
the exponent consists of a positive or negative sign followed by a
decimal number representing an exponent of 2.
So it's correct that while the mantissa is in hex, the exponent is still decimal.
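For example (a minimal sketch; note that hexadecimal floating constants such as 0x1p+10 are also valid C99 source):

#include <stdio.h>
#include <math.h>

int main(void)
{
    printf("%a\n", pow(2.0, 10)); /* 0x1p+10: 1.0 * 2^10                    */
    printf("%a\n", 1024.5);       /* typically 0x1.002p+10, mantissa in hex */
    printf("%f\n", 0x1p+10);      /* 1024.000000, the same value back       */
    return 0;
}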

Computer Precision in C language

I find intriguing some features on C programming and computer precision.
For example, on my computer if I print DBL_MANT_DIG (a macro from float.h) that indicates the bit precision of a double, it returns 64. That means 64 bits of mantissa. And that means I can store up to 19 digits in the mantissa.
However if I ask the computer to print more digits, say printf("%.40lf",...), it still does print them. What are those digits and where are they stored?
Another thing is that if a print the variable DBL_MAX I get: 179769313486231570814527423731704356798070567525844996598917476803157260780028538760589558632766878171540458953514382464234321326889464182768467546703537516986049910576551282076245490090389328944075868508455133942304583236903222948165808559332123348274797826204144723168738177180919299881250404026184124858368
This has more than 19 digits. Where are they stored again?
To print these numbers I do:
#include <stdio.h>
#include <stdlib.h>
#include <limits.h>
#include <float.h>
int main(void)
{
printf("Double Min/Max: %lf %lf\n", DBL_MIN, DBL_MAX);
printf("Digits mantissa (bit precission) double: %d\n", DBL_MANT_DIG);
return 0;
}
(An IEEE double-precision floating point value has a 52-bit mantissa, not 64-bit.)
You can think of floating-point numbers as being stored in the binary equivalent of scientific notation. Even though the number you posted has "more than 19 digits," it can still be represented using only 52 mantissa bits.
Imagine you have a mantissa that holds 4 decimal digits, and a 3-decimal-digit exponent. This is a floating point number, but in decimal, not binary. The maximum representable value here is 9999e999 (= 9999 × 10^999), the decimal expansion of which clearly has far more than 4 digits. But it is representable using 4 decimal digits and a 3-digit exponent.
Here is a thought experiment. Consider the fraction 3/7. Print it as a decimal.
0.428571428571428571428571428571428571428571428571...
Where are the digits stored?
Consider the number 1/8. It can be represented with just an exponent and an all-zero fraction field. Yet its decimal representation 0.125 has 3 nonzero digits. The decimal representation is computed during printing. Internally a binary representation with a base-2 exponent is used.
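You can see the compact internal form directly with the %a conversion (a short sketch, assuming an IEEE-754 double):

#include <stdio.h>
#include <float.h>

int main(void)
{
    /* The huge decimal expansion is produced at print time from a
       53-bit significand and a power-of-two exponent.             */
    printf("%a\n", DBL_MAX);    /* 0x1.fffffffffffffp+1023         */
    printf("%.0f\n", DBL_MAX);  /* the 309-digit decimal expansion */
    return 0;
}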

problems in floating point comparison [duplicate]

This question already has answers here:
strange output in comparison of float with float literal
(8 answers)
Closed 7 years ago.
#include <stdio.h>

int main(void)
{
    float f = 0.98;
    if (f <= 0.98)
        printf("hi");
    else
        printf("hello");
    return 0;
}
I am getting this problem here. On using different floating point values of f I am getting different results.
Why is this happening?
f is using float precision, but 0.98 is a double by default, so the statement f <= 0.98 is evaluated in double precision.
f is therefore converted to a double in the comparison, but the value stored in f may already be slightly larger than the double 0.98.
Use
if(f <= 0.98f)
or use a double for f instead.
In detail... assuming float is IEEE single-precision and double is IEEE double-precision.
These kinds of floating point numbers are stored with a base-2 representation. In base 2 this number needs infinite precision to represent, because it is a repeating fraction in binary:
0.98 = 0.1111101011100001010001111010111000010100011110101110000101000...
A float can only store 24 bits of significant figures, i.e.
0.111110101110000101000111_101...
                          ^ round off here
= 0.111110101110000101001000
= 16441672 / 2^24
= 0.98000001907...
A double can store 53 bits of significant figures, so
0.11111010111000010100011110101110000101000111101011100_00101000...
                                                       ^ round off here
= 0.11111010111000010100011110101110000101000111101011100
= 8827055269646172 / 2^53
= 0.97999999999999998224...
So 0.98 becomes slightly larger when stored as a float, and slightly smaller when stored as a double.
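You can see both effects directly (a minimal sketch; the digits shown assume IEEE-754 float and double):

#include <stdio.h>

int main(void)
{
    float  f = 0.98f;
    double d = 0.98;

    printf("%.20f\n", (double)f);  /* about 0.98000001907348632812 */
    printf("%.20f\n", d);          /* about 0.97999999999999998224 */
    printf("%d\n", f <= 0.98);     /* 0: the float value is larger */
    return 0;
}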
It's because floating point values are not exact representations of decimal numbers. Base-ten fractions have to be represented on the computer as base-two fractions, and most of them cannot be represented exactly. It's in this conversion that precision is lost.
Read more about this at http://en.wikipedia.org/wiki/Floating_point
An example (from encountering this problem in my VB6 days)
To convert the number 1.1 to a single precision floating point number we need to convert it to binary. There are 32 bits that need to be created.
Bit 1 is the sign bit (is it negative [1] or positive [0])
Bits 2-9 are for the exponent value
Bits 10-32 are for the mantissa (a.k.a. significand, basically the coefficient of scientific notation )
So for 1.1 the single floating point value is stored as follows (this is the truncated value; the compiler may round the least significant bit behind the scenes, but all I do here is truncate, which is slightly less accurate but doesn't change the results of this example):
s --exp--- -------mantissa--------
0 01111111 00011001100110011001100
If you notice in the mantissa there is the repeating pattern 0011. 1/10 in binary is like 1/3 in decimal. It goes on forever. So to retrieve the values from the 32-bit single precision floating point value we must first convert the exponent and mantissa to decimal numbers so we can use them.
sign = 0 = a positive number
exponent: 01111111 = 127
mantissa: 00011001100110011001100 = 838860
With the mantissa we need to convert it to a decimal value. The reason is there is an implied integer ahead of the binary number (i.e. 1.00011001100110011001100). The implied number is because the mantissa represents a normalized value to be used in the scientific notation: 1.0001100110011.... * 2^(x-127).
To get the decimal value out of 838860 we simply divide by 2^23, as there are 23 bits in the mantissa. This gives us 0.099999904632568359375. Adding the implied 1 gives us 1.099999904632568359375. The exponent is 127, but the formula calls for 2^(x-127).
So here is the math:
(1 + 0.099999904632568359375) * 2^(127-127)
1.099999904632568359375 * 1 = 1.099999904632568359375
As you can see 1.1 is not really stored in the single floating point value as 1.1.
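Here is a small sketch that redoes that arithmetic in C from the fields above (the field values are the truncated ones used in this example):

#include <stdio.h>
#include <math.h>

int main(void)
{
    int  sign     = 0;       /* positive                             */
    int  exponent = 127;     /* stored exponent, bias 127            */
    long mantissa = 838860;  /* 00011001100110011001100 (truncated)  */

    double value = (sign ? -1.0 : 1.0)
                 * (1.0 + mantissa / pow(2.0, 23))  /* implied leading 1 */
                 * pow(2.0, exponent - 127);

    printf("%.21f\n", value);   /* 1.099999904632568359375 */
    return 0;
}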
