You could say that I am relatively new to C, but I need clarification on a question.
I have a char[] that represents a number. If the number it holds is greater than LONG_MAX, I want to tell the user it is too large. The problem is that when I convert it to a double for the comparison, the value gets rounded. Here's what I mean.
int main(int argc, char ** argv) {
const char *str = argv[1]; /* I set it to 9223372036854775809, only 2 higher than LONG_MAX */
double l = atof(str);
double j = LONG_MAX;
printf("%lf\n", l); /* This prints 9223372036854775808.000000, which is LONG_MAX ??? WHY?? */
printf("%lf\n", j); /* This prints same as above, 9223372036854775808.000000 */
printf("%s\n", l > j ? "true" : "false"); /* false */
return 0; /* what am I doing wrong? */
}
UPDATE:
I tried your iret solution and I still run into the same rounding problem
j = LONG_MAX;
int iret = sscanf (str, "%lf", &l);
if (iret != 1)
return 0; /* conversion was bad */
else {
if (l > j || l < -(j))
return 0; /* too small or too large */
}
printf("%lf\n", l);
printf("%lf\n", j);
printf("%s\n", l > j ? "true" : "false");
You can check for overflow easily enough with strtol, but it requires a little extra work.
const char *str = ...;
char *e;
long x;
errno = 0;
x = strtol(str, &e, 0);
if (!*str || *e) {
fprintf(stderr, "invalid number: %s\n", str);
exit(1);
}
if ((x == LONG_MAX || x == LONG_MIN) && errno == ERANGE) {
fprintf(stderr, "number too large: %s\n", str);
exit(1);
}
Now, let's talk about the problem with strtod (or atof, which is just a broken version of strtod).
If you convert 9223372036854775809 to a double, then 9223372036854775808 is correct. A double has 53 bits of precision, and a 64-bit long has, well, 64 bits. As soon as you start working with floating-point numbers you need to be prepared for rounding.
For example, there is round-off error in the following code. Can you spot it?
double x = 0.1;
Footnote: I'm assuming 64-bit long and IEEE double-precision double here.
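To make the round-off in `double x = 0.1;` visible, here is a minimal sketch, assuming IEEE 754 double precision as in the footnote. The helper names `ten_tenths_is_one` and `show_tenth` are made up for illustration.

```c
#include <stdio.h>

/* Hypothetical helper: returns 1 if adding 0.1 ten times gives exactly 1.0.
 * 0.1 has no finite binary expansion, so the stored value is only close to
 * 0.1 and every addition rounds again. */
int ten_tenths_is_one(void)
{
    double sum = 0.0;
    for (int i = 0; i < 10; i++)
        sum += 0.1;
    return sum == 1.0;
}

/* Prints the stored value of 0.1 with more digits than "%f" shows by default. */
void show_tenth(void)
{
    printf("%.20f\n", 0.1);
}
```

Calling `show_tenth()` prints digits beyond the leading 0.1 that are not all zero, which is the rounding the question above ran into on a larger scale.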
Per the "man" page for atof:
http://linux.die.net/man/3/atof
The atof() function converts the initial portion of the string pointed
to by nptr to double. ... atof() does not detect errors.
Which is why I prefer to use "sscanf" instead:
int iret = sscanf (SOME_TEXT, "%lf", &SOME_DOUBLE);
If "iret != 1", then I know an error occurred, and can take the appropriate action.
IMHO...
PS:
Why aren't you declaring "l" and "j". Naughty naughty! You should always declare your variables, even in FORTRAN and Basic ;)
And why not declare them as float ("%f") or double ("%lf")?
I believe the reason why you are getting the value "9223372036854775808.000000" for both your inputs is due to the precision limitations of floating point approximations.
By the way, the value isn't LONG_MAX, but LONG_MAX + 1 -> 2^63.
By definition, the representation of LONG_MAX exactly (as an integer) requires 63 bits of precision. However, in a 64-bit floating point representation like double, there are only 53 bits of precision, since the other bits are required to store the sign and exponent. That's why both LONG_MAX and LONG_MAX + 2 end up being rounded to 9223372036854775808 <=> 2^63.
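The collision described above can be checked directly. This is a sketch under the same assumptions (64-bit long, IEEE 754 double, round-to-nearest); the function names are hypothetical.

```c
#include <limits.h>
#include <stdlib.h>

/* Returns 1 if converting LONG_MAX to double yields exactly 2^63.
 * The literal 9223372036854775808.0 is 2^63, which a double holds exactly. */
int long_max_rounds_to_2_pow_63(void)
{
    double j = (double)LONG_MAX;
    return j == 9223372036854775808.0;
}

/* Returns 1 if the decimal string s parses to the same double as LONG_MAX,
 * i.e. the two can no longer be told apart after conversion. */
int string_collides_with_long_max(const char *s)
{
    return strtod(s, NULL) == (double)LONG_MAX;
}
```

With these assumptions, `string_collides_with_long_max("9223372036854775809")` returns 1, so `l > j` in the question can never be true for that input.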
To handle this case properly, perhaps look at strtol which will set an error code when the input is out of range.
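One way to package the strtol approach is a small helper that reports why parsing failed. This is only a sketch: the enum and the name `parse_long` are invented here, and base 10 is an assumption.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical status codes for this sketch. */
enum parse_status { PARSE_OK = 0, PARSE_INVALID = 1, PARSE_RANGE = 2 };

/* Parses s as a base-10 long. On success stores the value in *out. */
enum parse_status parse_long(const char *s, long *out)
{
    char *end;
    long v;

    errno = 0;                      /* strtol only sets errno on failure */
    v = strtol(s, &end, 10);
    if (end == s || *end != '\0')   /* no digits, or trailing junk */
        return PARSE_INVALID;
    if (errno == ERANGE)            /* value outside [LONG_MIN, LONG_MAX] */
        return PARSE_RANGE;
    *out = v;
    return PARSE_OK;
}
```

Because the comparison happens inside strtol on integers, no floating-point rounding is involved and an input 2 above LONG_MAX is reported as out of range.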
Your long seems to be 63 bits (plus sign). A float has 23 significant bits (or 24 with the leading 1); a double, which your code uses, has 52 (or 53). Your two values are exactly the same in those upper bits. So they become the same.
A float is basically mantissa*2^exponent. The exponent makes sure that the magnitude still matches. But you only have 23 bits of mantissa. So it becomes top23bitsoflong*2^(numberofremainingbits).
This is a bit simplified ignoring leading one and exponent bias.
I'm looking for an alternative to the ceil() and floor() functions in C, because I am not allowed to use them in a project.
What I have built so far is a back-and-forth approach using the cast operator: I convert the floating-point value (in my case a double) to an int, and then, since I need the closest integers above and below the given value to be double values as well, I convert back to double:
#include <stdio.h>
int main(void) {
double original = 124.576;
double floorint;
double ceilint;
int f;
int c;
f = (int)original; //Truncation to closest floor integer value
c = f + 1;
floorint = (double)f;
ceilint = (double)c;
printf("Original Value: %lf, Floor Int: %lf , Ceil Int: %lf", original, floorint, ceilint);
}
Output:
Original Value: 124.576000, Floor Int: 124.000000 , Ceil Int: 125.000000
For this example normally I would not need the ceil and floor integer values of c and f to be converted back to double but I need them in double in my real program. Consider that as a requirement for the task.
Although the output gives the desired values and seems right so far, I'm still concerned whether this method is really appropriate: does it introduce any bad behavior into the program, or cost performance compared with other alternatives, if there are any?
Do you know a better alternative? And if so, why this one should be better?
Thank you very much.
Do you know a better alternative? And if so, why this one should be better?
OP's code fails when:
original is already a whole number.
original is a negative like -1.5. Truncation is not floor there.
original is just outside int range.
original is not-a-number.
Alternative construction
double my_ceil(double x)
Using the cast-to-some-integer-type trick is a problem when x is outside the integer range. So check first if x is inside the range of a wide enough integer (one whose precision exceeds double). x values outside that are already whole numbers. I recommend going for the widest integer, (u)intmax_t.
Remember that a cast to an integer is a round toward 0 and not a floor. Different handling needed if x is negative/positive when code is ceil() or floor(). OP's code missed this.
I'd avoid if (x >= INTMAX_MAX) { as that involves (double) INTMAX_MAX whose rounding and then precise value is "chosen in an implementation-defined manner". Instead, I'd compare against INTMAX_MAX_P1. some_integer_MAX is a Mersenne Number and with 2's complement, ...MIN is a negated "power of 2".
#include <inttypes.h>
#define INTMAX_MAX_P1 ((INTMAX_MAX/2 + 1)*2.0)
double my_ceil(double x) {
if (x >= INTMAX_MAX_P1) {
return x;
}
if (x < INTMAX_MIN) {
return x;
}
intmax_t i = (intmax_t) x; // this rounds towards 0
if (x < 0 || x == i) return i; // negative x is already rounded up (test x, not i, so -1 < x < 0 gives 0).
return i + 1.0;
}
As x may be a not-a-number, it is more useful to reverse the compare as relational compare of a NaN is false.
double my_ceil(double x) {
if (x >= INTMAX_MIN && x < INTMAX_MAX_P1) {
intmax_t i = (intmax_t) x; // this rounds towards 0
if (x < 0 || x == i) return i; // negative x is already rounded up (test x, not i, so -1 < x < 0 gives 0).
return i + 1.0;
}
return x;
}
double my_floor(double x) {
if (x >= INTMAX_MIN && x < INTMAX_MAX_P1) {
intmax_t i = (intmax_t) x; // this rounds towards 0
if (x > 0 || x == i) return i; // positive x is already rounded down (test x, not i, so 0 < x < 1 gives 0).
return i - 1.0;
}
return x;
}
You're missing an important step: you need to check if the number is already integral, so for ceil assuming non-negative numbers (generalisation is trivial), use something like
double ceil(double f){
if (f >= LLONG_MAX){
// f will be integral unless you have a really funky platform
return f;
} else {
long long i = f;
return 0.0 + i + (f != i); // to obviate potential long long overflow
}
}
Another missing piece in the puzzle, which is covered off by my enclosing if, is to check if f is within the bounds of a long long. On common platforms if f was outside the bounds of a long long then it would be integral anyway.
Note that for non-negative numbers floor is trivial, because truncation to long long is always towards zero.
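For negative inputs, truncation toward zero rounds up rather than down, so a floor in the same style needs one adjustment. A possible sketch, assuming a 64-bit long long and IEEE 754 double (the name `my_floor_ll` is made up):

```c
#include <limits.h>

/* Sketch of floor() without <math.h>. Values whose magnitude is at or beyond
 * 2^63 are assumed to be integral already (true for IEEE 754 double). */
double my_floor_ll(double f)
{
    /* 2^63 built from an exactly representable power of two */
    const double limit = (LLONG_MAX / 2 + 1) * 2.0;
    long long i;

    if (!(f > -limit && f < limit))
        return f;                        /* huge values, infinities, NaN */
    i = (long long)f;                    /* truncates toward zero */
    return (double)i - (f < (double)i);  /* step down for negative fractions */
}
```

The comparison `f < (double)i` is 1 only when f is negative and has a fractional part, which is exactly the case where truncation went the wrong way.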
Is there a printf width specifier which can be applied to a floating point specifier that would automatically format the output to the necessary number of significant digits such that when scanning the string back in, the original floating point value is acquired?
For example, suppose I print a float to a precision of 2 decimal places:
float foobar = 0.9375;
printf("%.2f", foobar); // prints out 0.94
When I scan the output 0.94, I have no standards-compliant guarantee that I'll get the original 0.9375 floating-point value back (in this example, I probably won't).
I would like a way to tell printf to automatically print the floating-point value to the necessary number of significant digits to ensure that it can be scanned back to the original value passed to printf.
I could use some of the macros in float.h to derive the maximum width to pass to printf, but is there already a specifier to automatically print to the necessary number of significant digits -- or at least to the maximum width?
I recommend @Jens Gustedt's hexadecimal solution: use %a.
OP wants “print with maximum precision (or at least to the most significant decimal)”.
A simple example would be to print one seventh as in:
#include <float.h>
int Digs = DECIMAL_DIG;
double OneSeventh = 1.0/7.0;
printf("%.*e\n", Digs, OneSeventh);
// 1.428571428571428492127e-01
But let's dig deeper ...
Mathematically, the answer is "0.142857 142857 142857 ...", but we are using finite precision floating point numbers.
Let's assume IEEE 754 double-precision binary.
So the OneSeventh = 1.0/7.0 results in the value below. Also shown are the preceding and following representable double floating point numbers.
OneSeventh before = 0.1428571428571428 214571170656199683435261249542236328125
OneSeventh = 0.1428571428571428 49212692681248881854116916656494140625
OneSeventh after = 0.1428571428571428 769682682968777953647077083587646484375
Printing the exact decimal representation of a double has limited uses.
C has 2 families of macros in <float.h> to help us.
The first set is the number of significant digits to print in a string in decimal so when scanning the string back,
we get the original floating point. These are shown with the C spec's minimum value and a sample C11 compiler.
FLT_DECIMAL_DIG 6, 9 (float) (C11)
DBL_DECIMAL_DIG 10, 17 (double) (C11)
LDBL_DECIMAL_DIG 10, 21 (long double) (C11)
DECIMAL_DIG 10, 21 (widest supported floating type) (C99)
The second set is the number of significant digits a string may be scanned into a floating point and then the FP printed, still retaining the same string presentation. These are shown with the C spec's minimum value and a sample C11 compiler. I believe these were available pre-C99.
FLT_DIG 6, 6 (float)
DBL_DIG 10, 15 (double)
LDBL_DIG 10, 18 (long double)
The first set of macros seems to meet OP's goal of significant digits. But that macro is not always available.
#ifdef DBL_DECIMAL_DIG
#define OP_DBL_Digs (DBL_DECIMAL_DIG)
#else
#ifdef DECIMAL_DIG
#define OP_DBL_Digs (DECIMAL_DIG)
#else
#define OP_DBL_Digs (DBL_DIG + 3)
#endif
#endif
The "+ 3" was the crux of my previous answer.
It centered on this: knowing the digit count for the round trip string-FP-string (set #2 macros, available in C89), how would one determine the digits for FP-string-FP (set #1 macros, available post C89)? In general, adding 3 was the result.
Now how many significant digits to print is known and driven via <float.h>.
To print N significant decimal digits one may use various formats.
With "%e", the precision field is the number of digits after the lead digit and decimal point.
So - 1 is in order. Note: This -1 is not in the initial int Digs = DECIMAL_DIG;
printf("%.*e\n", OP_DBL_Digs - 1, OneSeventh);
// 1.4285714285714285e-01
With "%f", the precision field is the number of digits after the decimal point.
For a number like OneSeventh/1000000.0, one would need OP_DBL_Digs + 6 to see all the significant digits.
printf("%.*f\n", OP_DBL_Digs , OneSeventh);
// 0.14285714285714285
printf("%.*f\n", OP_DBL_Digs + 6, OneSeventh/1000000.0);
// 0.00000014285714285714285
Note: Many are used to "%f". That displays 6 digits after the decimal point; 6 is the display default, not the precision of the number.
The short answer to print floating point numbers losslessly (such that they can be read
back in to exactly the same number, except NaN and Infinity):
If your type is float: use printf("%.9g", number).
If your type is double: use printf("%.17g", number).
Do NOT use %f, since that only specifies how many significant digits after the decimal and will truncate small numbers. For reference, the magic numbers 9 and 17 can be found in float.h which defines FLT_DECIMAL_DIG and DBL_DECIMAL_DIG.
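A quick way to convince yourself of the 17-digit claim is to print and re-read a value. The helper below is a sketch (the name `round_trips` is hypothetical) and assumes IEEE 754 double with a correctly rounding strtod().

```c
#include <stdio.h>
#include <stdlib.h>

/* Returns 1 if d survives printing with `digits` significant digits
 * via "%.*g" and parsing back with strtod(). */
int round_trips(double d, int digits)
{
    char buf[64];

    snprintf(buf, sizeof buf, "%.*g", digits, d);
    return strtod(buf, NULL) == d;
}
```

For example, `round_trips(1.0/7.0, 17)` succeeds, while `round_trips(1.0/7.0, 15)` does not, because 15 digits land on a neighbouring double.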
If you are only interested in the bit (resp hex pattern) you could use the %a format. This guarantees you:
The default precision suffices for an exact representation of the value if an exact representation in base 2 exists and otherwise is sufficiently large to distinguish values of type double.
I'd have to add that this is only available since C99.
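The same kind of check works for %a. This sketch assumes a C99 library whose strtod() accepts hexadecimal floating-point input, and a binary (FLT_RADIX 2) double; `hex_round_trips` is an invented name.

```c
#include <stdio.h>
#include <stdlib.h>

/* Returns 1 if d survives being printed with "%a" (default precision)
 * and parsed back with strtod(). */
int hex_round_trips(double d)
{
    char buf[64];

    snprintf(buf, sizeof buf, "%a", d);
    return strtod(buf, NULL) == d;
}
```

No digit count has to be chosen here, which is the attraction of %a when the output only needs to be machine-readable.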
No, there is no such printf width specifier to print floating-point with maximum precision. Let me explain why.
The maximum precision of float and double is variable, and dependent on the actual value of the float or double.
Recall float and double are stored in sign.exponent.mantissa format. This means that there are many more bits used for the fractional component for small numbers than for big numbers.
For example, float can easily distinguish between 0.0 and 0.1.
float r = 0;
printf( "%.6f\n", r ) ; // 0.000000
r+=0.1 ;
printf( "%.6f\n", r ) ; // 0.100000
But float has no idea of the difference between 1e27 and 1e27 + 0.1.
r = 1e27;
printf( "%.6f\n", r ) ; // 999999988484154753734934528.000000
r+=0.1 ;
printf( "%.6f\n", r ) ; // still 999999988484154753734934528.000000
This is because all the precision (which is limited by the number of mantissa bits) is used up for the large part of the number, left of the decimal.
The %.f modifier just says how many decimal values you want to print from the float number as far as formatting goes. The fact that the accuracy available depends on the size of the number is up to you as the programmer to handle. printf can't/doesn't handle that for you.
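The point about magnitude can be tested without printing anything. A minimal sketch, assuming IEEE 754 single precision with round-to-nearest-even; `step_is_lost` is a hypothetical helper.

```c
/* Returns 1 if adding `step` to `base` leaves the float unchanged,
 * i.e. the step is smaller than what the float can resolve at that size. */
int step_is_lost(float base, float step)
{
    /* volatile forces the sum to be rounded to float before comparing */
    volatile float sum = base + step;

    return sum == base;
}
```

`step_is_lost(1e27f, 0.1f)` returns 1 and `step_is_lost(0.0f, 0.1f)` returns 0, matching the two printf experiments above.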
Simply use the macros from <float.h> and the variable-width conversion specifier (".*"):
float f = 3.14159265358979323846;
printf("%.*f\n", FLT_DIG, f);
In one of my comments to an answer I lamented that I've long wanted some way to print all the significant digits in a floating point value in decimal form, in much the same way as the question asks. Well I finally sat down and wrote it. It's not quite perfect, and this is demo code that prints additional information, but it mostly works for my tests. Please let me know if you (i.e. anyone) would like a copy of the whole wrapper program which drives it for testing.
static unsigned int
ilog10(uintmax_t v);
/*
* Note: As presented this demo code prints a whole line including information
* about how the form was arrived with, as well as in certain cases a couple of
* interesting details about the number, such as the number of decimal places,
* and possibly the magnitude of the value and the number of significant
* digits.
*/
void
print_decimal(double d)
{
size_t sigdig;
int dplaces;
double flintmax;
/*
* If we really want to see a plain decimal presentation with all of
* the possible significant digits of precision for a floating point
* number, then we must calculate the correct number of decimal places
* to show with "%.*f" as follows.
*
* This is in lieu of always using either full on scientific notation
* with "%e" (where the presentation is always in decimal format so we
* can directly print the maximum number of significant digits
* supported by the representation, taking into account the one digit
* represented by the leading digit)
*
* printf("%1.*e", DBL_DECIMAL_DIG - 1, d)
*
* or using the built-in human-friendly formatting with "%g" (where a
* '*' parameter is used as the number of significant digits to print
* and so we can just print exactly the maximum number supported by the
* representation)
*
* printf("%.*g", DBL_DECIMAL_DIG, d)
*
*
* N.B.: If we want the printed result to again survive a round-trip
* conversion to binary and back, and to be rounded to a human-friendly
* number, then we can only print DBL_DIG significant digits (instead
* of the larger DBL_DECIMAL_DIG digits).
*
* Note: "flintmax" here refers to the largest consecutive integer
* that can be safely stored in a floating point variable without
* losing precision.
*/
#ifdef PRINT_ROUND_TRIP_SAFE
# ifdef DBL_DIG
sigdig = DBL_DIG;
# else
sigdig = ilog10(uipow(FLT_RADIX, DBL_MANT_DIG - 1));
# endif
#else
# ifdef DBL_DECIMAL_DIG
sigdig = DBL_DECIMAL_DIG;
# else
sigdig = (size_t) lrint(ceil(DBL_MANT_DIG * log10((double) FLT_RADIX))) + 1;
# endif
#endif
flintmax = pow((double) FLT_RADIX, (double) DBL_MANT_DIG); /* xxx use uipow() */
if (d == 0.0) {
printf("z = %.*s\n", (int) sigdig + 1, "0.000000000000000000000"); /* 21 */
} else if (fabs(d) >= 0.1 &&
fabs(d) <= flintmax) {
dplaces = (int) (sigdig - (size_t) lrint(ceil(log10(ceil(fabs(d))))));
if (dplaces < 0) {
/* XXX this is likely never less than -1 */
/*
* XXX the last digit is not significant!!! XXX
*
* This should also be printed with sprintf() and edited...
*/
printf("R = %.0f [%d too many significant digits!!!, zero decimal places]\n", d, abs(dplaces));
} else if (dplaces == 0) {
/*
* The decimal fraction here is not significant and
* should always be zero (XXX I've never seen this)
*/
printf("R = %.0f [zero decimal places]\n", d);
} else {
if (fabs(d) == 1.0) {
/*
* This is a special case where the calculation
* is off by one because log10(1.0) is 0, but
* we still have the leading '1' whole digit to
* count as a significant digit.
*/
#if 0
printf("ceil(1.0) = %f, log10(ceil(1.0)) = %f, ceil(log10(ceil(1.0))) = %f\n",
ceil(fabs(d)), log10(ceil(fabs(d))), ceil(log10(ceil(fabs(d)))));
#endif
dplaces--;
}
/* this is really the "useful" range of %f */
printf("r = %.*f [%d decimal places]\n", dplaces, d, dplaces);
}
} else {
if (fabs(d) < 1.0) {
int lz;
lz = abs((int) lrint(floor(log10(fabs(d)))));
/* i.e. add # of leading zeros to the precision */
dplaces = (int) sigdig - 1 + lz;
printf("f = %.*f [%d decimal places]\n", dplaces, d, dplaces);
} else { /* d > flintmax */
size_t n;
size_t i;
char *df;
/*
* hmmmm... the easy way to suppress the "invalid",
* i.e. non-significant digits is to do a string
* replacement of all digits after the first
* DBL_DECIMAL_DIG to convert them to zeros, and to
* round the least significant digit.
*/
df = malloc((size_t) 1);
n = (size_t) snprintf(df, (size_t) 1, "%.1f", d);
n++; /* for the NUL */
df = realloc(df, n);
(void) snprintf(df, n, "%.1f", d);
if ((n - 2) > sigdig) {
/*
* XXX rounding the integer part here is "hard"
* -- we would have to convert the digits up to
* this point back into a binary format and
* round that value appropriately in order to
* do it correctly.
*/
if (df[sigdig] >= '5' && df[sigdig] <= '9') {
if (df[sigdig - 1] == '9') {
/*
* xxx fixing this is left as
* an exercise to the reader!
*/
printf("F = *** failed to round integer part at the least significant digit!!! ***\n");
free(df);
return;
} else {
df[sigdig - 1]++;
}
}
for (i = sigdig; df[i] != '.'; i++) {
df[i] = '0';
}
} else {
i = n - 1; /* less the NUL */
if (isnan(d) || isinf(d)) {
sigdig = 0; /* "nan" or "inf" */
}
}
printf("F = %.*s. [0 decimal places, %lu digits, %lu digits significant]\n",
(int) i, df, (unsigned long int) i, (unsigned long int) sigdig);
free(df);
}
}
return;
}
static unsigned int
msb(uintmax_t v)
{
unsigned int mb = 0;
while (v >>= 1) { /* unroll for more speed... (see ilog2()) */
mb++;
}
return mb;
}
static unsigned int
ilog10(uintmax_t v)
{
unsigned int r;
static unsigned long long int const PowersOf10[] =
{ 1LLU, 10LLU, 100LLU, 1000LLU, 10000LLU, 100000LLU, 1000000LLU,
10000000LLU, 100000000LLU, 1000000000LLU, 10000000000LLU,
100000000000LLU, 1000000000000LLU, 10000000000000LLU,
100000000000000LLU, 1000000000000000LLU, 10000000000000000LLU,
100000000000000000LLU, 1000000000000000000LLU,
10000000000000000000LLU };
if (!v) {
return ~0U;
}
/*
* By the relationship "log10(v) = log2(v) / log2(10)", we need to
* multiply "log2(v)" by "1 / log2(10)", which is approximately
* 1233/4096, or (1233, followed by a right shift of 12).
*
* Finally, since the result is only an approximation that may be off
* by one, the exact value is found by subtracting "v < PowersOf10[r]"
* from the result.
*/
r = ((msb(v) * 1233) >> 12) + 1;
return r - (v < PowersOf10[r]);
}
I ran a small experiment to verify that printing with DBL_DECIMAL_DIG does indeed exactly preserve the number's binary representation. It turned out that for the compilers and C libraries I tried, DBL_DECIMAL_DIG is indeed the number of digits required, and printing with even one digit less creates a significant problem.
#include <float.h>
#include <math.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
union {
short s[4];
double d;
} u;
void
test(int digits)
{
int i, j;
char buff[40];
double d2;
int n, num_equal, bin_equal;
srand(17);
n = num_equal = bin_equal = 0;
for (i = 0; i < 1000000; i++) {
for (j = 0; j < 4; j++)
u.s[j] = (rand() << 8) ^ rand();
if (isnan(u.d))
continue;
n++;
sprintf(buff, "%.*g", digits, u.d);
sscanf(buff, "%lg", &d2);
if (u.d == d2)
num_equal++;
if (memcmp(&u.d, &d2, sizeof(double)) == 0)
bin_equal++;
}
printf("Tested %d values with %d digits: %d found numericaly equal, %d found binary equal\n", n, digits, num_equal, bin_equal);
}
int
main()
{
test(DBL_DECIMAL_DIG);
test(DBL_DECIMAL_DIG - 1);
return 0;
}
I ran this with Microsoft's C compiler 19.00.24215.1 and gcc version 6.3.0 20170516 (Debian 6.3.0-18+deb9u1). Using one less decimal digit roughly halves the number of values that compare exactly equal. (I also verified that rand() as used indeed produces about one million different numbers.) Here are the detailed results.
Microsoft C
Tested 999507 values with 17 digits: 999507 found numericaly equal, 999507 found binary equal
Tested 999507 values with 16 digits: 545389 found numericaly equal, 545389 found binary equal
GCC
Tested 999485 values with 17 digits: 999485 found numericaly equal, 999485 found binary equal
Tested 999485 values with 16 digits: 545402 found numericaly equal, 545402 found binary equal
To my knowledge, there is a widely used algorithm that outputs the necessary number of significant digits such that, when the string is scanned back in, the original floating-point value is recovered: dtoa.c, written by David Gay, which is available on Netlib (see also the associated paper). This code is used e.g. in Python, MySQL, Scilab, and many others.
I'm trying to write a code that converts a real number to a 64 bit floating point binary. In order to do this, the user inputs a real number (for example, 547.4242) and the program must output a 64 bit floating point binary.
My ideas:
The sign part is easy.
The program converts the integer part (547 for the previous example) and stores the result in an int variable. Then, the program converts the fractional part (.4242 for the previous example) and stores the result into an array (each position of the array stores '1' or '0').
This is where I'm stuck. Summarizing, I have: "Integer part = 1000100011" (type int) and "Fractional part = 0110110010011000010111110000011011110110100101000100" (array).
How can I proceed?
The following code determines the internal representation of a floating-point number in IEEE 754 notation. It was written in the Turbo C++ IDE, but you can easily adapt it to other compilers.
#include<conio.h>
#include<stdio.h>
void decimal_to_binary(unsigned char);
union u
{
float f;
char c;
};
int main()
{
int i;
char*ptr;
union u a;
clrscr();
printf("ENTER THE FLOATING POINT NUMBER : \n");
scanf("%f",&a.f);
ptr=&a.c+sizeof(float);
for(i=0;i<sizeof(float);i++)
{
ptr--;
decimal_to_binary(*ptr);
}
getch();
return 0;
}
void decimal_to_binary(unsigned char n)
{
int arr[8];
int i;
//printf("n = %u ",n);
for(i=7;i>=0;i--)
{
if(n%2==0)
arr[i]=0;
else
arr[i]=1;
n/=2;
}
for(i=0;i<8;i++)
printf("%d",arr[i]);
printf(" ");
}
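Since the question asks about 64-bit values, here is a portable sketch of the same idea for double, using memcpy instead of a union and no Turbo C headers. It assumes double is IEEE 754 binary64 and has the same byte order as uint64_t; the function names are made up.

```c
#include <stdint.h>
#include <string.h>

/* Copies the object representation of a double into a 64-bit integer. */
uint64_t double_bits(double d)
{
    uint64_t bits;

    memcpy(&bits, &d, sizeof bits);
    return bits;
}

/* Writes the 64 bits, most significant first, as '0'/'1' characters.
 * out must have room for 65 bytes (64 digits plus the terminating NUL).
 * Layout: 1 sign bit, 11 exponent bits, 52 fraction bits. */
void double_to_binary(double d, char out[65])
{
    uint64_t bits = double_bits(d);
    int i;

    for (i = 0; i < 64; i++)
        out[i] = ((bits >> (63 - i)) & 1u) ? '1' : '0';
    out[64] = '\0';
}
```

This only exposes the bits the compiler already produced; it does not perform the decimal-to-binary conversion itself.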
In order to correctly round all possible decimal representations to the nearest double, you need big integers. Using only the basic integer types from C will leave you to re-implement big integer arithmetics. Each of these two approaches is possible, more information about each follows:
For the first approach, you need a big integer library: GMP is a good one. Armed with such a big integer library, you tackle an input such as the example 123.456E78 as the integer 123456 * 10^75 and start wondering what values M in [2^53 … 2^54) and P in [-1022 … 1023] make (M / 2^53) * 2^P closest to this number. This question can be answered with big integer operations, following the steps described in this blog post (summary: first determine P. Then use a division to compute M). A complete implementation must take care of subnormal numbers and infinities (inf is the correct result to return for any decimal representation of a number that would have an exponent larger than +1023).
The second approach, if you do not want to include or implement a full general-purpose big integer library, still requires a few basic operations to be implemented on arrays of C integers representing large numbers. The function decfloat() in this implementation represents large numbers in base 10^9 because that simplifies the conversion from the initial decimal representation to the internal representation as an array x of uint32_t.
Following is a basic conversion. Enough to get OP started.
OP's "integer part of real number" --> int is far too limiting. Better to simply convert the entire string to a large integer like uintmax_t. Note the decimal point '.' and account for overflow while scanning.
This code does not handle exponents nor negative numbers. It may be off in the last bit or so due to the limited integer ui or the final scaling by a power of 10. It handles most overflow cases.
#include <ctype.h>    /* isdigit */
#include <inttypes.h>
#include <math.h>     /* power-of-10 scaling */
double my_atof(const char *src) {
uintmax_t ui = 0;
int dp = '.';
size_t dpi;
size_t i = 0;
size_t toobig = 0;
int ch;
for (i = 0; (ch = (unsigned char) src[i]) != '\0'; i++) {
if (ch == dp) {
dp = '\0'; // only get 1 dp
dpi = i;
continue;
}
if (!isdigit(ch)) {
break; // illegal character
}
ch -= '0';
// detect overflow
if (toobig ||
(ui >= UINTMAX_MAX / 10 &&
(ui > UINTMAX_MAX / 10 || ch > UINTMAX_MAX % 10))) {
toobig++;
continue;
}
ui = ui * 10 + ch;
}
intmax_t expo = toobig;
if (dp == '\0') {
expo -= i - dpi - 1;
}
double num;
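if (expo < 0) {
// slightly more precise than multiplying by a negative power of 10;
// pow10() is a non-standard GNU extension, so use pow(10.0, ...)
num = ui / pow(10.0, (double) -expo);
} else {
num = ui * pow(10.0, (double) expo);
}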
return num;
}
The trick is to treat the value as an integer: read your 547.4242 as an unsigned long long (i.e. 64 bits or more), giving 5474242, while counting the number of digits after the '.', in this case 4. Now you have a value which is 10^4 bigger than it should be. So you convert the 5474242 to floating point (as a double, or long double) and divide by 10^4.
Decimal to binary conversion is deceptively simple. When you have more bits than the float will hold, then it will have to round. More fun occurs when you have more digits than a 64-bit integer will hold -- noting that trailing zeros are special -- and you have to decide whether to round or not (and what rounding occurs when you convert to floating point). Then there's dealing with an E+/-99. Then when you do the eventual division (or multiplication) by 10^n, you have (a) another potential rounding, and (b) the issue that large 10^n are not exactly represented in your floating point -- which is another source of error. (And for E+/-99 forms, you may need up to and a little beyond 10^300 for the final step.)
Enjoy !
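The "read it as an integer, then divide" idea above can be sketched in a few lines. This version is deliberately naive: it assumes a short, non-negative input with no exponent, and it does not guard against overflow of the digit accumulator. The name `parse_small_decimal` is made up.

```c
#include <ctype.h>

/* Parses strings such as "547.4242": accumulates all digits as one integer
 * (5474242), tracks 10^(digits after the point) in scale (10000), and
 * divides at the end. Stops at the first unexpected character. */
double parse_small_decimal(const char *s)
{
    unsigned long long digits = 0;
    double scale = 1.0;
    int seen_point = 0;

    for (; *s != '\0'; s++) {
        if (*s == '.' && !seen_point) {
            seen_point = 1;
        } else if (isdigit((unsigned char)*s)) {
            digits = digits * 10 + (unsigned)(*s - '0');
            if (seen_point)
                scale *= 10.0;   /* exact for small powers of 10 */
        } else {
            break;
        }
    }
    return (double)digits / scale;
}
```

For small inputs both the integer and the power of 10 are exact, so the single division is the only rounding step; the caveats listed above start to matter once either stops being exact.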
Is there a printf width specifier which can be applied to a floating point specifier that would automatically format the output to the necessary number of significant digits such that when scanning the string back in, the original floating point value is acquired?
For example, suppose I print a float to a precision of 2 decimal places:
float foobar = 0.9375;
printf("%.2f", foobar); // prints out 0.94
When I scan the output 0.94, I have no standards-compliant guarantee that I'll get the original 0.9375 floating-point value back (in this example, I probably won't).
I would like a way tell printf to automatically print the floating-point value to the necessary number of significant digits to ensure that it can be scanned back to the original value passed to printf.
I could use some of the macros in float.h to derive the maximum width to pass to printf, but is there already a specifier to automatically print to the necessary number of significant digits -- or at least to the maximum width?
I recommend #Jens Gustedt hexadecimal solution: use %a.
OP wants “print with maximum precision (or at least to the most significant decimal)”.
A simple example would be to print one seventh as in:
#include <float.h>
int Digs = DECIMAL_DIG;
double OneSeventh = 1.0/7.0;
printf("%.*e\n", Digs, OneSeventh);
// 1.428571428571428492127e-01
But let's dig deeper ...
Mathematically, the answer is "0.142857 142857 142857 ...", but we are using finite precision floating point numbers.
Let's assume IEEE 754 double-precision binary.
So the OneSeventh = 1.0/7.0 results in the value below. Also shown are the preceding and following representable double floating point numbers.
OneSeventh before = 0.1428571428571428 214571170656199683435261249542236328125
OneSeventh = 0.1428571428571428 49212692681248881854116916656494140625
OneSeventh after = 0.1428571428571428 769682682968777953647077083587646484375
Printing the exact decimal representation of a double has limited uses.
C has 2 families of macros in <float.h> to help us.
The first set is the number of significant digits to print in a string in decimal so when scanning the string back,
we get the original floating point. There are shown with the C spec's minimum value and a sample C11 compiler.
FLT_DECIMAL_DIG 6, 9 (float) (C11)
DBL_DECIMAL_DIG 10, 17 (double) (C11)
LDBL_DECIMAL_DIG 10, 21 (long double) (C11)
DECIMAL_DIG 10, 21 (widest supported floating type) (C99)
The second set is the number of significant digits a string may be scanned into a floating point and then the FP printed, still retaining the same string presentation. There are shown with the C spec's minimum value and a sample C11 compiler. I believe available pre-C99.
FLT_DIG 6, 6 (float)
DBL_DIG 10, 15 (double)
LDBL_DIG 10, 18 (long double)
The first set of macros seems to meet OP's goal of significant digits. But that macro is not always available.
#ifdef DBL_DECIMAL_DIG
#define OP_DBL_Digs (DBL_DECIMAL_DIG)
#else
#ifdef DECIMAL_DIG
#define OP_DBL_Digs (DECIMAL_DIG)
#else
#define OP_DBL_Digs (DBL_DIG + 3)
#endif
#endif
The "+ 3" was the crux of my previous answer.
Its centered on if knowing the round-trip conversion string-FP-string (set #2 macros available C89), how would one determine the digits for FP-string-FP (set #1 macros available post C89)? In general, add 3 was the result.
Now how many significant digits to print is known and driven via <float.h>.
To print N significant decimal digits one may use various formats.
With "%e", the precision field is the number of digits after the lead digit and decimal point.
So - 1 is in order. Note: This -1 is not in the initial int Digs = DECIMAL_DIG;
printf("%.*e\n", OP_DBL_Digs - 1, OneSeventh);
// 1.4285714285714285e-01
With "%f", the precision field is the number of digits after the decimal point.
For a number like OneSeventh/1000000.0, one would need OP_DBL_Digs + 6 to see all the significant digits.
printf("%.*f\n", OP_DBL_Digs , OneSeventh);
// 0.14285714285714285
printf("%.*f\n", OP_DBL_Digs + 6, OneSeventh/1000000.0);
// 0.00000014285714285714285
Note: Many are use to "%f". That displays 6 digits after the decimal point; 6 is the display default, not the precision of the number.
The short answer to print floating point numbers losslessly (such that they can be read
back in to exactly the same number, except NaN and Infinity):
If your type is float: use printf("%.9g", number).
If your type is double: use printf("%.17g", number).
Do NOT use %f, since its precision only specifies the number of digits after the decimal point, so small numbers get truncated. For reference, the magic numbers 9 and 17 can be found in float.h, which defines FLT_DECIMAL_DIG and DBL_DECIMAL_DIG.
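To see why 17 matters, here is a small sketch that prints a double with a given number of significant digits and reads it back. It assumes an IEEE-754 double and a correctly rounding strtod(); survives is a hypothetical helper name.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Returns 1 if x, printed with "%.*g" and `digits` significant digits,
 * parses back to exactly the same double. Hypothetical helper. */
int survives(double x, int digits)
{
    char buf[64];

    assert(digits >= 1 && digits <= 40);
    snprintf(buf, sizeof buf, "%.*g", digits, x);
    return strtod(buf, NULL) == x;
}
```

For the sum 0.1 + 0.2, survives(sum, 17) holds, but survives(sum, 15) does not: 15 digits print it as 0.3, which is a different double.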
If you are only interested in the bit pattern (or its hex form), you could use the %a format. The standard guarantees:
"The default precision suffices for an exact representation of the value if an exact representation in base 2 exists and otherwise is sufficiently large to distinguish values of type double."
I'd have to add that this is only available since C99.
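A minimal sketch of the %a round trip, assuming a C99 library where strtod() accepts hexadecimal floating-point input (it is required to) and FLT_RADIX is 2; hex_round_trip is a hypothetical helper name.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Prints x with "%a", parses the text back with strtod(), and reports
 * whether the two doubles are bit-for-bit identical. */
int hex_round_trip(double x)
{
    char buf[64];
    double back;
    int n;

    n = snprintf(buf, sizeof buf, "%a", x);
    assert(n > 0 && (size_t) n < sizeof buf);   /* output was not truncated */
    back = strtod(buf, NULL);
    return memcmp(&back, &x, sizeof x) == 0;
}
```

The exact text produced by "%a" may differ between libraries (for instance in the choice of leading hex digit), so the sketch compares the values, not the strings.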
No, there is no such printf width specifier to print floating-point with maximum precision. Let me explain why.
The maximum precision of float and double is variable, and dependent on the actual value of the float or double.
Recall float and double are stored in sign.exponent.mantissa format. This means that there are many more bits used for the fractional component for small numbers than for big numbers.
For example, float can easily distinguish between 0.0 and 0.1.
float r = 0;
printf( "%.6f\n", r ) ; // 0.000000
r+=0.1 ;
printf( "%.6f\n", r ) ; // 0.100000
But float has no idea of the difference between 1e27 and 1e27 + 0.1.
r = 1e27;
printf( "%.6f\n", r ) ; // 999999988484154753734934528.000000
r+=0.1 ;
printf( "%.6f\n", r ) ; // still 999999988484154753734934528.000000
This is because all the precision (which is limited by the number of mantissa bits) is used up for the large part of the number, left of the decimal.
The precision in %.Nf just says how many decimal places you want printed from the float, as far as formatting goes. The fact that the accuracy available depends on the size of the number is up to you as the programmer to handle. printf can't/doesn't handle that for you.
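The magnitude dependence can be checked directly. This is a small sketch assuming a 32-bit IEEE-754 float with a 24-bit significand; absorbed is a hypothetical helper name.

```c
#include <assert.h>

/* Returns 1 if adding inc to base leaves base unchanged in float
 * arithmetic, i.e. inc is below the resolution of base. */
int absorbed(float base, float inc)
{
    volatile float sum;   /* volatile forces the sum to be rounded to float */

    assert(inc > 0.0f);
    sum = base + inc;
    return sum == base;
}
```

Here absorbed(1e27f, 0.1f) is true while absorbed(0.0f, 0.1f) is false. The same effect already shows up at 2^24: absorbed(16777216.0f, 1.0f) is true, because 16777217 is not representable as a float.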
Simply use the macros from <float.h> and the variable-width conversion specifier (".*"):
float f = 3.14159265358979323846;
printf("%.*f\n", FLT_DIG, f);
In one of my comments to an answer I lamented that I've long wanted some way to print all the significant digits in a floating point value in decimal form, in much the same way as the question asks. Well, I finally sat down and wrote it. It's not quite perfect, and this is demo code that prints additional information, but it mostly works for my tests. Please let me know if you (i.e. anyone) would like a copy of the whole wrapper program which drives it for testing.
static unsigned int
ilog10(uintmax_t v);
/*
* Note: As presented this demo code prints a whole line including information
* about how the form was arrived at, as well as in certain cases a couple of
* interesting details about the number, such as the number of decimal places,
* and possibly the magnitude of the value and the number of significant
* digits.
*/
void
print_decimal(double d)
{
size_t sigdig;
int dplaces;
double flintmax;
/*
* If we really want to see a plain decimal presentation with all of
* the possible significant digits of precision for a floating point
* number, then we must calculate the correct number of decimal places
* to show with "%.*f" as follows.
*
* This is in lieu of always using either full on scientific notation
* with "%e" (where the presentation is always in decimal format so we
* can directly print the maximum number of significant digits
* supported by the representation, taking into account the one digit
* represented by the leading digit)
*
* printf("%1.*e", DBL_DECIMAL_DIG - 1, d)
*
* or using the built-in human-friendly formatting with "%g" (where a
* '*' parameter is used as the number of significant digits to print
* and so we can just print exactly the maximum number supported by the
* representation)
*
* printf("%.*g", DBL_DECIMAL_DIG, d)
*
*
* N.B.: If we want the printed result to again survive a round-trip
* conversion to binary and back, and to be rounded to a human-friendly
* number, then we can only print DBL_DIG significant digits (instead
* of the larger DBL_DECIMAL_DIG digits).
*
* Note: "flintmax" here refers to the largest consecutive integer
* that can be safely stored in a floating point variable without
* losing precision.
*/
#ifdef PRINT_ROUND_TRIP_SAFE
# ifdef DBL_DIG
sigdig = DBL_DIG;
# else
sigdig = ilog10(uipow(FLT_RADIX, DBL_MANT_DIG - 1));
# endif
#else
# ifdef DBL_DECIMAL_DIG
sigdig = DBL_DECIMAL_DIG;
# else
sigdig = (size_t) lrint(ceil(DBL_MANT_DIG * log10((double) FLT_RADIX))) + 1;
# endif
#endif
flintmax = pow((double) FLT_RADIX, (double) DBL_MANT_DIG); /* xxx use uipow() */
if (d == 0.0) {
printf("z = %.*s\n", (int) sigdig + 1, "0.000000000000000000000"); /* 21 */
} else if (fabs(d) >= 0.1 &&
fabs(d) <= flintmax) {
dplaces = (int) (sigdig - (size_t) lrint(ceil(log10(ceil(fabs(d))))));
if (dplaces < 0) {
/* XXX this is likely never less than -1 */
/*
* XXX the last digit is not significant!!! XXX
*
* This should also be printed with sprintf() and edited...
*/
printf("R = %.0f [%d too many significant digits!!!, zero decimal places]\n", d, abs(dplaces));
} else if (dplaces == 0) {
/*
* The decimal fraction here is not significant and
* should always be zero (XXX I've never seen this)
*/
printf("R = %.0f [zero decimal places]\n", d);
} else {
if (fabs(d) == 1.0) {
/*
* This is a special case where the calculation
* is off by one because log10(1.0) is 0, but
* we still have the leading '1' whole digit to
* count as a significant digit.
*/
#if 0
printf("ceil(1.0) = %f, log10(ceil(1.0)) = %f, ceil(log10(ceil(1.0))) = %f\n",
ceil(fabs(d)), log10(ceil(fabs(d))), ceil(log10(ceil(fabs(d)))));
#endif
dplaces--;
}
/* this is really the "useful" range of %f */
printf("r = %.*f [%d decimal places]\n", dplaces, d, dplaces);
}
} else {
if (fabs(d) < 1.0) {
int lz;
lz = abs((int) lrint(floor(log10(fabs(d)))));
/* i.e. add # of leading zeros to the precision */
dplaces = (int) sigdig - 1 + lz;
printf("f = %.*f [%d decimal places]\n", dplaces, d, dplaces);
} else { /* d > flintmax */
size_t n;
size_t i;
char *df;
/*
* hmmmm... the easy way to suppress the "invalid",
* i.e. non-significant digits is to do a string
* replacement of all digits after the first
* DBL_DECIMAL_DIG to convert them to zeros, and to
* round the least significant digit.
*/
df = malloc((size_t) 1);
n = (size_t) snprintf(df, (size_t) 1, "%.1f", d);
n++; /* for the NUL */
df = realloc(df, n);
(void) snprintf(df, n, "%.1f", d);
if ((n - 2) > sigdig) {
/*
* XXX rounding the integer part here is "hard"
* -- we would have to convert the digits up to
* this point back into a binary format and
* round that value appropriately in order to
* do it correctly.
*/
if (df[sigdig] >= '5' && df[sigdig] <= '9') {
if (df[sigdig - 1] == '9') {
/*
* xxx fixing this is left as
* an exercise to the reader!
*/
printf("F = *** failed to round integer part at the least significant digit!!! ***\n");
free(df);
return;
} else {
df[sigdig - 1]++;
}
}
for (i = sigdig; df[i] != '.'; i++) {
df[i] = '0';
}
} else {
i = n - 1; /* less the NUL */
if (isnan(d) || isinf(d)) {
sigdig = 0; /* "nan" or "inf" */
}
}
printf("F = %.*s. [0 decimal places, %lu digits, %lu digits significant]\n",
(int) i, df, (unsigned long int) i, (unsigned long int) sigdig);
free(df);
}
}
return;
}
static unsigned int
msb(uintmax_t v)
{
unsigned int mb = 0;
while (v >>= 1) { /* unroll for more speed... (see ilog2()) */
mb++;
}
return mb;
}
static unsigned int
ilog10(uintmax_t v)
{
unsigned int r;
static unsigned long long int const PowersOf10[] =
{ 1LLU, 10LLU, 100LLU, 1000LLU, 10000LLU, 100000LLU, 1000000LLU,
10000000LLU, 100000000LLU, 1000000000LLU, 10000000000LLU,
100000000000LLU, 1000000000000LLU, 10000000000000LLU,
100000000000000LLU, 1000000000000000LLU, 10000000000000000LLU,
100000000000000000LLU, 1000000000000000000LLU,
10000000000000000000LLU };
if (!v) {
return ~0U;
}
/*
* By the relationship "log10(v) = log2(v) / log2(10)", we need to
* multiply "log2(v)" by "1 / log2(10)", which is approximately
* 1233/4096, or (1233, followed by a right shift of 12).
*
* Finally, since the result is only an approximation that may be off
* by one, the exact value is found by subtracting "v < PowersOf10[r]"
* from the result.
*/
r = ((msb(v) * 1233) >> 12) + 1;
return r - (v < PowersOf10[r]);
}
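Since the multiply-and-shift trick in ilog10() is an approximation corrected by a table lookup, it can be handy to have a slow but obviously correct reference to compare against. The following is only a sketch for that purpose; ilog10_slow is a hypothetical name and is not part of the demo code above.

```c
#include <assert.h>
#include <stdint.h>

/* Reference implementation: floor(log10(v)) for v > 0, computed by
 * repeated division. Slow, but easy to verify by inspection. */
unsigned int ilog10_slow(uintmax_t v)
{
    unsigned int r = 0;

    assert(v != 0);   /* log10(0) is undefined */
    while (v >= 10) {
        v /= 10;
        r++;
    }
    return r;
}
```

A test driver could loop over powers of ten and their neighbours and compare the results of the two functions.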
I ran a small experiment to verify that printing with DBL_DECIMAL_DIG does indeed exactly preserve the number's binary representation. It turned out that for the compilers and C libraries I tried, DBL_DECIMAL_DIG is indeed the number of digits required, and printing with even one digit less creates a significant problem.
#include <float.h>
#include <math.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
union {
short s[4];
double d;
} u;
void
test(int digits)
{
int i, j;
char buff[40];
double d2;
int n, num_equal, bin_equal;
srand(17);
n = num_equal = bin_equal = 0;
for (i = 0; i < 1000000; i++) {
for (j = 0; j < 4; j++)
u.s[j] = (rand() << 8) ^ rand();
if (isnan(u.d))
continue;
n++;
sprintf(buff, "%.*g", digits, u.d);
sscanf(buff, "%lg", &d2);
if (u.d == d2)
num_equal++;
if (memcmp(&u.d, &d2, sizeof(double)) == 0)
bin_equal++;
}
printf("Tested %d values with %d digits: %d found numericaly equal, %d found binary equal\n", n, digits, num_equal, bin_equal);
}
int
main()
{
test(DBL_DECIMAL_DIG);
test(DBL_DECIMAL_DIG - 1);
return 0;
}
I ran this with Microsoft's C compiler 19.00.24215.1 and gcc version 7.4.0 20170516 (Debian 6.3.0-18+deb9u1). Using one less decimal digit nearly halves the number of values that compare exactly equal. (I also verified that rand() as used indeed produces about one million different numbers.) Here are the detailed results.
Microsoft C
Tested 999507 values with 17 digits: 999507 found numericaly equal, 999507 found binary equal
Tested 999507 values with 16 digits: 545389 found numericaly equal, 545389 found binary equal
GCC
Tested 999485 values with 17 digits: 999485 found numericaly equal, 999485 found binary equal
Tested 999485 values with 16 digits: 545402 found numericaly equal, 545402 found binary equal
To my knowledge, there is a well-known algorithm for printing just the number of significant digits needed so that scanning the string back in recovers the original floating-point value. It is implemented in dtoa.c, written by David Gay, which is available on Netlib (see also the associated paper). This code is used, e.g., in Python, MySQL, Scilab, and many others.
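To make the goal concrete, here is a brute-force stand-in. It is NOT David Gay's algorithm, only a sketch of the property that dtoa.c achieves far more efficiently: it tries increasing precisions until the printed text parses back to the same value. It assumes an IEEE-754 double (so 17 digits always suffice) and a correctly rounding strtod(); shortest_digits is a hypothetical helper name.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Returns the smallest precision for "%.*g" at which x survives a
 * print-and-parse round trip. Brute force, for illustration only. */
int shortest_digits(double x)
{
    char buf[64];
    int digits;

    assert(x == x);   /* NaN never compares equal, so reject it */
    for (digits = 1; digits < 17; digits++) {
        snprintf(buf, sizeof buf, "%.*g", digits, x);
        if (strtod(buf, NULL) == x)
            return digits;
    }
    return 17;        /* assumed upper bound for an IEEE-754 double */
}
```

It would be used as printf("%.*g\n", shortest_digits(x), x). For 0.1 it needs 1 digit, for 1.0 / 3.0 it needs 16, and for 0.1 + 0.2 it needs all 17.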
I'm attempting to store the value 0.9999 into an mpfr_t variable using the mpfr_set_str() function.
But 0.9999 is rounded to 1 (or some other value != 0.9999) during storage, no matter which rounding mode I use (GMP_RNDD, GMP_RNDU, GMP_RNDN, GMP_RNDZ).
So what's the best method to store 0.9999 in an mpfr_t variable using mpfr_set_str()?
Is it possible?
Here is my test program, it prints "buffer is: 1", instead of the wanted "buffer is: 0.9999":
int main()
{
size_t precision = 4;
mpfr_t mpfrValue;
mpfr_init2(mpfrValue, precision);
mpfr_set_str(mpfrValue, "0.9999", 10, GMP_RNDN);
char *buffer = (char*)malloc((sizeof(char) * precision) + 3);
mp_exp_t exponent;
mpfr_get_str(buffer,
&exponent,
10,
precision,
mpfrValue,
GMP_RNDN);
printf("buffer is: %s\n", buffer);
free(buffer);
mpfr_clear(mpfrValue);
return 0;
}
Thanks for the help
The precision is given in bits, not in decimal digits, as you seem to be assuming. It seems that 15 bits of precision are enough to reprint the correct value to 4 decimal digits. Also, you can output directly using mpfr_printf.
If you do need to use mpfr_get_str, I would pass null as the first parameter. If you do that the string is allocated for you. Then, to free it you call mpfr_free_str.
#include <stdio.h>
#include <mpfr.h>

int main(void)
{
size_t precision = 15;
mpfr_t mpfrValue;
mpfr_init2(mpfrValue, precision);
mpfr_set_str(mpfrValue, "0.9999", 10, GMP_RNDN);
mpfr_printf("Value is: %Rf\n", mpfrValue);
mpfr_clear(mpfrValue);
return 0;
}
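The 15 bits used above are consistent with a general rule of thumb: d decimal digits survive a trip through a binary format with p bits when 2^(p-1) >= 10^d, i.e. p = ceil(d * log2(10)) + 1. This rule is taken from the way the C standard defines DBL_DIG, not from the MPFR documentation, so treat it as an assumption. The sketch below uses integer arithmetic only; bits_for_decimal_digits is a hypothetical helper name.

```c
#include <assert.h>

/* Hypothetical helper: binary precision (in bits) sufficient to hold d
 * decimal digits, using p = ceil(d * log2(10)) + 1.
 * log2(10) = 3.3219... is approximated from above by 3322/1000, so the
 * result is never too small within the range allowed by the assert. */
long bits_for_decimal_digits(long d)
{
    assert(d >= 1 && d <= 1000);
    return (d * 3322 + 999) / 1000 + 1;
}
```

For d = 4 this gives 15, the value passed to mpfr_init2() above. For d = 15 it gives 51, which fits in the 53 bits of an IEEE-754 double, in line with DBL_DIG being 15.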