I have been told that "%a" used in C's printf would display hexadecimal format of a number. To test it, I print out the representation of 2^10:
printf ("%a", pow(2.0,10));
which gives
0x1p+10
I am confused because the exponent part "+10" looks more like a decimal format rather than a hexadecimal format. A hexadecimal format should have been 1pA. Where am I wrong?
It's correct, that format is called hexadecimal for doubles.
The man page says:
For a conversion, the double argument is converted to hexadecimal
notation (using the letters abcdef) in the style [-]0xh.hhhp[+-]d [...]
the exponent consists of a positive or negative sign followed by a
decimal number representing an exponent of 2.
So it's correct that while the mantissa is in hex, the exponent is still decimal.
I find some features of C programming and floating-point precision intriguing.
For example, on my computer, if I print the DBL_MANT_DIG macro (from float.h), which gives the bit precision of a double, it returns 64. That means 64 bits of mantissa, which means I can store up to 19 digits in the mantissa.
However, if I ask the computer to print more digits, say printf("%.40lf",...), it still prints them. What are those digits and where are they stored?
Another thing is that if I print the DBL_MAX macro I get: 179769313486231570814527423731704356798070567525844996598917476803157260780028538760589558632766878171540458953514382464234321326889464182768467546703537516986049910576551282076245490090389328944075868508455133942304583236903222948165808559332123348274797826204144723168738177180919299881250404026184124858368
This has more than 19 digits. Where are they stored?
To print these numbers I do:
#include <stdio.h>
#include <stdlib.h>
#include <limits.h>
#include <float.h>
int main(void)
{
    printf("Double Min/Max: %lf %lf\n", DBL_MIN, DBL_MAX);
    printf("Digits mantissa (bit precision) double: %d\n", DBL_MANT_DIG);
    return 0;
}
(An IEEE double-precision floating point value has a 53-bit significand, of which 52 bits are stored explicitly; it is not 64-bit.)
You can think of floating-point numbers as being stored in the binary equivalent of scientific notation. Even though the number you posted has "more than 19 digits," it can still be represented using only those 53 significand bits plus an exponent.
Imagine you have a mantissa that holds 4 decimal digits, and a 3 decimal digit exponent. This is a floating point number, but in decimal, not binary. The maximum representable value here is 9999e999 (= 9999 * 10^999), the decimal expansion of which clearly has far more than 4 digits. But it is representable using 4 decimal digits and a 3-digit exponent.
Here is a thought experiment. Consider the fraction 3/7. Print it as a decimal.
0.428571428571428571428571428571428571428571428571...
Where are the digits stored?
Consider the number 1/8. It can be represented with only an exponent and no stored mantissa bits (the leading 1 is implicit). Yet its decimal representation 0.125 has 3 nonzero digits. The decimal representation is computed during printing. Internally a binary representation with a base 2 exponent is used.
What is the value of 0x1.921fb82c2bd7fp+1 in a human readable presentation?
I got this value by printf using %a.
The mantissa is hexadecimal and the exponent is a decimal value representing the power of 2 the mantissa is scaled by.
If you print a float with more precision than is stored in memory, aren't the extra places supposed to have zeros in them? I have code that is something like this:
double z[2*N] = {0};
...
for (n = 1; n <= 2*N; n++) {
    fprintf(u1, "%.25g", z[n-1]);
    fputc(n < 2*N ? ',' : '\n', u1);
}
Which is creating output like this:
0,0.7071067811865474617150085,....
A float should have only 17 decimal places (right? Don't 53 bits come out to 17 decimal places?). If that's so, then the 18th, 19th... 25th places should have zeros. Notice in the above output that they have digits other than 0 in them.
Am I misunderstanding something? If so, what?
No. 53 bits means that roughly the first 17 significant decimal digits are the ones you can trust. The later digits are nonzero because the double is stored in binary, a different base from the base-10 notation we print in: 1/2^53 is not exactly 1/10^n for any n, i.e.,
1/2^53 = .0000000000000001110223024625156540423631668090820312500000000
The string printed by your implementation shows the exact value of the double in your example, rounded to the 25 significant digits you requested, and this is permitted by the C standard, as I show below.
First, we should understand what the floating-point object represents. The C standard does a poor job of this, but, presuming your implementation uses the IEEE 754 floating-point standard, a normal floating-point object represents exactly (-1)^s • 2^e • (1+f) for some sign bit s (0 or 1), exponent e (in range for the specific type, -1022 to 1023 for double), and fraction f (also in range, 52 bits after a radix point for double). Many people use the object to approximate nearby values, but, according to the standard, the object only represents the one value it is defined to be.
The value you show, 0.7071067811865474617150085, is the 25-digit rounding of the exact value of a specific double (sign bit 0, exponent -1, and fraction bits [in hexadecimal] .6a09e667f3bcc). It is important to understand that this double represents exactly one value; it does not represent nearby values, such as 0.707106781186547461715.
Now that we know the value being passed to fprintf, we can consider what the C standard says about this. First, the C standard defines a constant named DECIMAL_DIG. C 2011 5.2.4.2.2 11 defines this to be the number of decimal digits such that any floating-point number in the widest supported type can be rounded to that many decimal digits and back again without change to the value. The precision you passed to fprintf, 25, is likely greater than the value of DECIMAL_DIG on your system.
In C 2011 7.21.6.1 13, the standard says “If the number of significant decimal digits is more than DECIMAL_DIG but the source value is exactly representable with DECIMAL_DIG digits, then the result should be an exact representation with trailing zeros. Otherwise, the source value is bounded by two adjacent decimal strings L < U , both having DECIMAL_DIG significant digits; the value of the resultant decimal string D should satisfy L ≤ D ≤ U, with the extra stipulation that the error should have a correct sign for the current rounding direction.”
This wording allows the compiler some wiggle room. The intent is that the result must be accurate enough that it can be converted back to the original double with no error. It may be more accurate, and some C implementations will produce the exactly correct value, which is permitted since it satisfies the paragraph above.
Incidentally, the value you show is not the double closest to sqrt(2)/2. That value is +0x1.6A09E667F3BCDp-1 = 0.70710678118654757273731092936941422522068023681640625.
The digits you see come from an actual double: the 64-bit pattern of the stored value is 3FE6A09E667F3BCC, and 0.7071067811865474617150085 is that value printed to 25 significant digits.
The stored value is a 53-bit significand scaled by a power of two, so its exact decimal expansion can run well past 17 digits; you cannot say that 53 bits come out to exactly 17 decimal places.
EDIT:
Look at the example below in the wiki article for another instance:
0.333333333333333314829616256247390992939472198486328125
=2^(−54) × 15 5555 5555 5555 base16
=2^(−2) × (15 5555 5555 5555 base16 × 2^(−52) )
You are asking about float, but your code uses double.
Anyway, neither float nor double always has the same number of decimal digits. A float is assigned 32 bits (4 bytes) for its floating point representation according to IEEE 754.
From Wikipedia:
The IEEE 754 standard specifies a binary32 as having:
Sign bit: 1 bit
Exponent width: 8 bits
Significand precision: 24 (23 explicitly stored)
This gives from 6 to 9 significant decimal digits precision (if a
decimal string with at most 6 significant decimal is converted to IEEE
754 single precision and then converted back to the same number of
significant decimal, then the final string should match the original;
and if an IEEE 754 single precision is converted to a decimal string
with at least 9 significant decimal and then converted back to single,
then the final number must match the original).
In the case of double, from Wikipedia again:
Double-precision binary floating-point is a commonly used format on
PCs, due to its wider range over single-precision floating point, in
spite of its performance and bandwidth cost. As with single-precision
floating-point format, it lacks precision on integer numbers when
compared with an integer format of the same size. It is commonly known
simply as double. The IEEE 754 standard specifies a binary64 as
having:
Sign bit: 1 bit
Exponent width: 11 bits
Significand precision: 53 bits (52 explicitly stored)
This gives from 15 - 17 significant
decimal digits precision. If a decimal string with at most 15
significant decimal is converted to IEEE 754 double precision and then
converted back to the same number of significant decimal, then the
final string should match the original; and if an IEEE 754 double
precision is converted to a decimal string with at least 17
significant decimal and then converted back to double, then the final
number must match the original.
On the other hand, you can't expect that if you have a float and print it with more precision than is really stored, the rest of the digits will be filled with 0s. The compiler can't guess what you are trying to do.
float f1 = 123.125;
int i1 = -150;
f1 = i1; // integer to floating conversion
printf("%i assigned to an float produces %f\n", i1, f1);
Output:
-150 assigned to an float produces -150.000000
My question is why the result has 6 zeros (000000) after the . and not 7 or 8 or some number?
That's just what printf does. See the man page where it says
f, F
The double argument shall be converted to decimal notation in the style "[-]ddd.ddd", where the number of digits after the radix character is equal to the precision specification. If the precision is missing, it shall be taken as 6; if the precision is explicitly zero and no '#' flag is present, no radix character shall appear. If a radix character appears, at least one digit appears before it. The low-order digit shall be rounded in an implementation-defined manner.
(emphasis mine)
It has nothing to do with how 150 is represented as a floating point number in memory (and in fact, it's promoted to a double because printf is varargs).
The number of zeros you see is a result of the default precision used by the %f printf conversion. It's basically unrelated to the integer to floating point conversion.
Because the C standard (§7.19.6.1) says that in the absence of information to the contrary, %f will print 6 decimal places.
f,F A double argument representing a floating-point number is converted to
decimal notation in the style [−]ddd.ddd, where the number of digits after
the decimal-point character is equal to the precision specification. If the
precision is missing, it is taken as 6; if the precision is zero and the # flag is
not specified, no decimal-point character appears.
Floating point arithmetic is not exact. printf is just showing that number of zeroes.
From the documentation:
The default number of digits after the
decimal point is six, but this can be
changed with a precision field. If a
decimal point appears, at least one
digit appears before it. The "double"
value is rounded to the correct number
of decimal places.
I'm learning C right now and there is a conversion specifier %a which writes a number in p-notation as opposed to %e which writes something in e-notation (exponential notation).
What is p-notation?
You use %a to get a hexadecimal representation of a floating-point number. This might be useful if you are a student learning floating-point representations, or if you want to be able to read and write an exact floating-point number with no rounding error (but not very human-readable).
This format specifier, along with many others, was added as part of the C99 standard. Dinkumware have an excellent C99 library reference free online; it's PJ Plauger's company, and he had a lot to do with both C89 and C99 standard libraries. Link above is to printing functions; the general library reference is http://www.dinkumware.com/manuals/default.aspx
Here is an extract from the c99 standard, section 7.19.6.1 (7) which shows the details for %a or %A (similar to the mac details given by dmckee above):
A double argument representing a
floating-point number is converted in
the style [−]0xh.hhhhp±d, where there
is one hexadecimal digit (which is
nonzero if the argument is a
normalized floating-point number and
is otherwise unspecified) before the
decimal-point character and the
number of hexadecimal digits after it
is equal to the precision; if the
precision is missing and FLT_RADIX is
a power of 2, then the precision is
sufficient for an exact representation
of the value; if the precision is
missing and FLT_RADIX is not a power
of 2, then the precision is sufficient
to distinguish values of type
double, except that trailing zeros may
be omitted; if the precision is zero
and the # flag is not specified, no
decimal-point character appears. The
letters abcdef are used for a
conversion and the letters ABCDEF for
A conversion. The A conversion
specifier produces a number with X and
P instead of x and p. The exponent
always contains at least one digit,
and only as many more digits as
necessary to represent the decimal
exponent of 2. If the value is zero,
the exponent is zero.
From the printf(3) man page on my Mac OS X box (therefore the BSD c standard library implementation):
aA
The double argument is rounded and converted to hexadecimal notation
in the style [-]0xh.hhhp[+-]d, where the number of digits after the
hexadecimal-point character is equal to the precision specification.
If the precision is missing, it is taken as enough to represent the
floating-point number exactly, and no rounding occurs. If the
precision is zero, no hexadecimal-point character appears. The p is
a literal character `p', and the exponent consists of a positive or
negative sign followed by a decimal number representing an exponent
of 2. The A conversion uses the prefix ``0X'' (rather than ``0x''),
the letters ``ABCDEF'' (rather than ``abcdef'') to represent the hex
digits, and the letter `P' (rather than `p') to separate the mantissa
and exponent.
The 'p' (or 'P') serves to separate the (hexadecimal) mantissa from the exponent, which is written in decimal and is a power of 2.
These specifiers are not in my K&R, and the man page is not specific about what standard (if any) specifies them.
I just checked my Debian 5.0 box (using glibc 2.7) which also has it; that man page says that it is c99 related (again, no reference to any particular standard).
This might be useful: http://www.cppreference.com/wiki/c/io/printf
Specifically, here are the format specifiers you can use in printf (w/o modifiers like .02 etc):
Code     Format
%c       character
%d       signed integers
%i       signed integers
%I64d    long long (8B integer), MS-specific
%I64u    unsigned long long (8B integer), MS-specific
%e       scientific notation, with a lowercase "e"
%E       scientific notation, with an uppercase "E"
%f       floating point
%g       use %e or %f, whichever is shorter
%G       use %E or %f, whichever is shorter
%o       octal
%s       a string of characters
%u       unsigned integer
%x       unsigned hexadecimal, with lowercase letters
%X       unsigned hexadecimal, with uppercase letters
%p       a pointer
%n       the argument shall be a pointer to an integer into which is placed the number of characters written so far
There is no %a format specifier (as far as I'm aware, and certainly not in any of the common implementations).
There is a %p format specifier which prints a pointer address.
Ref.
UPDATE: please see other posts.