e format with printf not printing the desired output [duplicate]


I want to control the number of exponent digits printed after the 'e' by C printf's %e.
For example, printf("%e") produces 2.35e+03, but I want 2.35e+003, with 3 exponent digits. How do I get that with printf?
Code:
#include<stdio.h>
int main()
{
double x=34523423.52342353;
printf("%.3g\n%.3e",x,x);
return 0;
}
Result:
http://codepad.org/dSLzQIrn
3.45e+07
3.452e+07
I want
3.45e+007
3.452e+007
But interestingly, I got the right results in Windows with MinGW.

"...The exponent always contains at least two digits, and only as many more digits as necessary to represent the exponent. ..." C11dr §7.21.6.1 8
So 3.45e+07 is compliant (what OP does not want) and 3.45e+007 is not compliant (what OP wants).
As C does not provide a standard way for code to alter the number of exponent digits, code is left to fend for itself.
Some compilers offer limited control, for example Visual Studio's _set_output_format.
For fun, following is DIY code
double x = 34523423.52342353;
// - 1 . xxx e - EEEE \0
#define ExpectedSize (1+1+1 +3 +1+1+ 4 + 1)
char buf[ExpectedSize + 10];
snprintf(buf, sizeof buf, "%.3e", x);
char *e = strchr(buf, 'e'); // lucky 'e' not in "Infinity" nor "NaN"
if (e) {
e++;
int expo = atoi(e);
snprintf(e, sizeof buf - (e - buf), "%05d", expo); // 5 more illustrative than 3
}
puts(buf);
3.452e00007
Also see c++ how to get "one digit exponent" with printf

printf Format tags prototype:
%[flags][width][.precision][length]specifier
The precision
... This gives ... the number of digits to appear after
the radix character for a, A, e, E, f, and F conversions ... .
You are using the conversion and the precision specifier correctly; the difference lies in the C library implementations on the two systems. The precision specifies the number of digits after the '.' (dot, period, etc.). It does not set the number of digits used for the exponent. The fact that you get 3 exponent digits on Windows is just how that C library formats the exponent, not something the C standard requires of printf.
It would take comparing how the source implementations differ to see what is relied on for that piece of the format string. (it will probably boil down to some obscure difference in the way the windows v. linux/unix environments/locale/etc. are defined or specified)
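One way to see that the precision and the exponent width are independent is to format a value and count the digits printed after the exponent sign. The following is a minimal sketch; the helper name exponent_digits is made up for illustration and is not part of any library.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format x with "%.*e" and return how many digits
   the C library printed in the exponent, or -1 if no exponent was found
   (for example for infinity or NaN). */
int exponent_digits(double x, int precision)
{
    char buf[64];
    snprintf(buf, sizeof buf, "%.*e", precision, x);
    const char *e = strchr(buf, 'e');
    if (e == NULL)
        return -1;
    e++;                                  /* skip 'e' */
    if (*e == '+' || *e == '-')
        e++;                              /* skip the exponent sign */
    int digits = 0;
    while (isdigit((unsigned char)e[digits]))
        digits++;
    return digits;
}
```

Calling exponent_digits(34523423.52342353, 3) and exponent_digits(34523423.52342353, 10) should give the same count on any one C library: changing the precision does not change the exponent width.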

char *nexp(double x, int p, int n) // Number with p digits of precision, n digits of exponent.
{
enum { NN = 12 }; // A true constant: in C a static array cannot be sized by a const int.
static char s[NN][256];//(fvca)
static int i=-1;
int j,e;
i=(i+1)%NN; // Index of what s is to be used...
sprintf(s[i],"%.*lE", p,x); // Number...
for(j=0; s[i][j]; j++) if(s[i][j]=='E') break; // Find the 'E'...
if(s[i][j]=='E') // Found!
{
e= atoi(s[i]+j+1);
sprintf(s[i]+j+1, "%+0*d", n+1,e);
return s[i];
}
else return "***";
}
// Best Regards, GGa
// G_G

Related

How do I print a floating-point value for later scanning with perfect accuracy?

Suppose I have a floating-point value of type float or double (i.e. 32 or 64 bits on typical machines). I want to print this value as text (e.g. to the standard output stream), and then later, in some other process, scan it back in - with fscanf() if I'm using C, or perhaps with istream::operator>>() if I'm using C++. But I need the scanned float to end up exactly identical to the original value (up to equivalent representations of the same value). Also, the printed value should be easily readable - to a human - as floating-point, i.e. I don't want to print 0x42355316 and reinterpret that as a 32-bit float.
How should I do this? I'm assuming the standard library of (C and C++) won't be sufficient, but perhaps I'm wrong. I suppose that a sufficient number of decimal digits might be able to guarantee an error that's underneath the precision threshold - but that's not the same as guaranteeing the rounding/truncation will happen just the way I want it.
Notes:
The scanning does not have to be perfectly accurate w.r.t. the value it scans, only the original value.
If it makes it easier, you may assume the value is a number and is not infinity.
denormal support is desired but not required; still if we get a denormal, failure should be conspicuous.

First, you should use the %a format with fprintf and fscanf. This is what it was designed for, and the C standard requires it to work (reproduce the original number) if the implementation uses binary floating-point.
Failing that, you should print a float with at least FLT_DECIMAL_DIG significant digits and a double with at least DBL_DECIMAL_DIG significant digits. Those constants are defined in <float.h> and are defined:
… number of decimal digits, n, such that any floating-point number with p radix b digits can be rounded to a floating-point number with n decimal digits and back again without change to the value,… [b is the base used for the floating-point format, defined in FLT_RADIX, and p is the number of base-b digits in the format.]
For example:
printf("%.*g\n", FLT_DECIMAL_DIG, 1.f/3);
or:
#define QuoteHelper(x) #x
#define Quote(x) QuoteHelper(x)
…
printf("%." Quote(FLT_DECIMAL_DIG) "g\n", 1.f/3);
In C++, these constants are defined in <limits> as std::numeric_limits<Type>::max_digits10, where Type is float or double or another floating-point type.
Note that the C standard only recommends that such a round-trip through a decimal numeral work; it does not require it. For example, C 2018 5.2.4.2.2 15 says, under the heading “Recommended practice”:
Conversion from (at least) double to decimal with DECIMAL_DIG digits and back should be the identity function. [DECIMAL_DIG is the equivalent of FLT_DECIMAL_DIG or DBL_DECIMAL_DIG for the widest floating-point format supported in the implementation.]
In contrast, if you use %a, and FLT_RADIX is a power of two (meaning the implementation uses a floating-point base that is two, 16, or another power of two), then the C standard requires that the result of scanning the numeral produced with %a equals the original number.
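Both routes can be checked with a short round-trip test. This is a minimal sketch, assuming a C11 library (for DBL_DECIMAL_DIG) and finite, non-NaN inputs; the function names roundtrips_hex and roundtrips_decimal are made up for illustration.

```c
#include <float.h>
#include <stdio.h>
#include <stdlib.h>

/* Print x with "%a", parse it back with strtod, and report whether the
   value survived unchanged. */
int roundtrips_hex(double x)
{
    char buf[64];
    snprintf(buf, sizeof buf, "%a", x);
    return strtod(buf, NULL) == x;
}

/* Same check through a decimal numeral with DBL_DECIMAL_DIG significant
   digits. The standard only recommends that this works. */
int roundtrips_decimal(double x)
{
    char buf[64];
    snprintf(buf, sizeof buf, "%.*g", DBL_DECIMAL_DIG, x);
    return strtod(buf, NULL) == x;
}
```

On an implementation with correctly rounded conversions, both functions should return 1 for values such as 0.1 or 1.0/3.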

I need the scanned float to end up being exactly, identical to the original value.
As already pointed out in the other answers, that can be achieved with the %a format specifier.
Also, the printed value should be easily readable - to a human - as floating-point, i.e. I don't want to print 0x42355316 and reinterpret that as a 32-bit float.
That's more tricky and subjective. The first part of the string that %a produces is in fact a fraction composed by hexadecimal digits, so that an output like 0x1.4p+3 may take some time to be parsed as 10 by a human reader.
An option could be to print all the decimal digits needed to represent the floating-point value, but there may be a lot of them. Consider, for example the value 0.1, its closest representation as a 64-bit float may be
0x1.999999999999ap-4 == 0.1000000000000000055511151231257827021181583404541015625
While printf("%.*lf\n", DBL_DECIMAL_DIG, 0.1); (see e.g. Eric's answer) would print
0.10000000000000001 // If DBL_DECIMAL_DIG == 17
My proposal is somewhere in the middle. Similarly to what %a does, we can exactly represent any floating-point value with radix 2 as a fraction multiplied by 2 raised to some integer power. We can transform that fraction into a whole number (increasing the exponent accordingly) and print it as a decimal value.
0x1.999999999999ap-4 --> 1.999999999999a (base 16) * 2^-4 --> 1999999999999a (base 16) * 2^-56
--> 7205759403792794 (base 10) * 2^-56
That whole number has a limited number of digits (it's < 2^53), but the result is still an exact representation of the original double value.
The following snippet is a proof of concept, without any check for corner cases. The format specifier %a separates the mantissa and the exponent with a p character (as in "... multiplied by two raised to the Power of..."); I'll use a q instead, for no particular reason other than using a different symbol.
The value of the mantissa will also be reduced (and the exponent raised accordingly), removing all the trailing zero bits. The idea being that 5q+1 (parsed as 5 (base 10) * 2^1) should be more "easily" identified as 10 than 2814749767106560q-48.
#include <math.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
void to_my_format(double x, char *str)
{
int exponent;
double mantissa = frexp(x, &exponent);
long long m = 0;
if ( mantissa ) {
// frexp returns a value in [0.5, 1), so scaling by 2^53 is needed to keep all 53 significant bits
exponent -= 53;
m = (long long)scalbn(mantissa, 53);
// A reduced mantissa should be more readable
while (m && m % 2 == 0) {
++exponent;
m /= 2;
}
}
sprintf(str, "%lldq%+d", m, exponent);
// ^
// Here 'q' is used to separate the mantissa from the exponent
}
double from_my_format(char const *str)
{
char *end;
long long mantissa = strtoll(str, &end, 10);
long exponent = strtol(str + (end - str + 1), &end, 10);
return scalbn(mantissa, exponent);
}
int main(void)
{
double tests[] = { 1, 0.5, 2, 10, -256, acos(-1), 1000000, 0.1, 0.125 };
size_t n = (sizeof tests) / (sizeof *tests);
char num[32];
for ( size_t i = 0; i < n; ++i ) {
to_my_format(tests[i], num);
double x = from_my_format(num);
printf("%22s%22a ", num, tests[i]);
if ( tests[i] != x )
printf(" *** %22a *** Round-trip failed\n", x);
else
printf("%58.55g\n", x);
}
return 0;
}
Testable here.
Generally, the improvement in readability is admittedly little to none, and surely a matter of opinion.

You can use the %a format specifier to print the value as hexadecimal floating point. Note that this is not the same as reinterpreting the float as an integer and printing the integer value.
For example:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main()
{
float x;
scanf("%f", &x);
printf("x=%.7f\n", x);
char str[20];
sprintf(str, "%a", x);
printf("str=%s\n", str);
float y;
sscanf(str, "%f", &y);
printf("y=%.7f\n", y);
printf("x==y: %d\n", (x == y));
return 0;
}
With an input of 4, this outputs:
x=4.0000000
str=0x1p+2
y=4.0000000
x==y: 1
With an input of 3.3, this outputs:
x=3.3000000
str=0x1.a66666p+1
y=3.3000000
x==y: 1
As you can see from the output, the %a format specifier prints in exponential format with the significand in hex and the exponent in decimal. This format can then be converted directly back to the exact same value as demonstrated by the equality check.

How to convert float number to string without losing user-entered precision in C?

Here's what I'm trying to do:
I need to print the fractional part of a floating number which has to be input as a float during user input.
The fractional part should be like: if float is 43.3423, the output should be 3423; and if number is 45.3400 output should be 3400.
This can be done easily with a string input but I need a way to make this work with float without losing the extra zeros or without appending zeros to user's original input.
Here's what I already tried :-
Take the fractional part by frac = num - (int)num and then multiplying frac until we get zero as the remainder. But this fails for cases like 34.3400 — the last two zeros won't get included with this method.
Convert the float number to a string by
char string[20];
sprintf(string, "%f", float_number);
The sprintf function puts the float number as a string but here also it doesn't automatically detect the user entered precision and fills the string with extra zeros at the end (6 total precision). So here also the information about the user's original entered precision is not obtained.
So, is there a way to get this done? The number must be taken as float number from user. Is there any way to get info about what's the user's entered precision? If it's not possible, an explanation would be very helpful.

I think I understand where you're coming from. E.g. in physics, it's a difference whether you write 42.5 or 42.500, the number of significant digits is implicitly given. 42.5 stands for any number x: 42.45 <= x < 42.55 and 42.500 for any x: 42.4995 <= x < 42.5005.
For larger numbers, you would use scientific notation: 1.0e6 would mean a number x with x: 950000 <= x < 1050000.
A floating point number uses this same format, but with binary digits (sometimes called bits ;)) instead of decimal digits. But there are two important differences:
The number of digits (bits) used depends only on the data type of the floating point number. If your data type has e.g. 20 bits for the mantissa, every number stored in it will have these 20 bits. The mantissa is always stored without a part after the "decimal" (binary?) point, so you won't know how many significant bits there are.
There's no direct mapping between bits and decimal digits. You need roughly 3.3 bits (log2 of 10) to represent a decimal digit. So even if you knew the number of significant bits, you still wouldn't know how many significant decimal digits that would make.
To address your problem, you could store the number of significant digits yourself in something like this:
struct myNumber
{
double value;
int nsignificant;
};
Of course, you have to parse the input yourself to find out what to place in nsignificant. Also, use at least double here for the value, the very limited precision of float won't get you far. With this, you could use nsignificant to determine a proper format string for printing the number with the amount of digits you want.
This still has the problem mentioned above: you can't directly map decimal digits to bits, so there's never a guarantee your number can be stored with the precision you intend. In cases where an exact decimal representation is important, you'll want to use a different data type for that. C# provides one, but C doesn't. You'd have to implement it yourself. You could start with something like this:
struct myDecimal
{
long mantissa;
short exponent;
short nsignificant;
};
In this struct, you could e.g. place 1.0e6 like this:
struct myDecimal x = {
.mantissa = 1,
.exponent = 6,
.nsignificant = 2
};
Of course, this would require you to write quite a lot of own code for parsing and formatting these numbers.

which has to be input as a float during user input.
So, is there a way to get this done.
Almost. The "trick" is to note the textual length of user input. The below will remember the offset of the first non-whitespace character and the offset after the numeric input.
scanf(" %n%f%n", &n1, &input, &n2);
n2 - n1 gives code the length of user input to represent the float. This method can get fooled if user input is in exponential notation, hexadecimal FP notation, infinity, Not-a-number, excessive leading zeros, etc. Yet works well with straight decimal input.
The idea is to print the number to a buffer with at least n2 - n1 precision and then determine how much of the fractional portion to print.
Recall that float typically has about 6-7 significant decimal digits, so attempting to input text like "123456789.0" will result in a float with the exact value of 123456792.0 and the output will be based on that value.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int scan_print_float(void) {
float input;
int n1, n2;
int cnt = scanf(" %n%f%n", &n1, &input, &n2);
if (cnt == 1) {
int len = n2 - n1;
char buf[len * 2 + 1];
snprintf(buf, sizeof buf, "%.*f", len, input);
char dp = '.';
char *p = strchr(buf, dp);
if (p) {
int front_to_dp = p + 1 - buf;
int prec = len - front_to_dp;
if (prec >= 0) {
return printf("<%.*s>\n", prec, p+1);
}
}
}
puts(".");
return 0;
}
int main(void) {
while (scan_print_float()) {
fflush(stdout);
}
return EXIT_SUCCESS;
}
Input/Output
43.3423
<3423>
45.3400
<3400>
-45.3400
<3400>
0.00
<00>
1234.500000
<500000>
.
.
To robustly handle this and the various edge cases, code should read user input as text and not as a float.
Note: float can typically represent about 2^32 numbers exactly.
43.3423 is usually not one of them. Instead it has an exact value of 43.3423004150390625
43.3400 is usually not one of them. Instead it has an exact value of 43.340000152587890625
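Reading the input as text, as recommended above, can be sketched as follows. This is only an illustration under the assumption of plain decimal input such as "45.3400" (no exponent, no hexadecimal form); the helper name fraction_text is made up.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the digits that follow the first '.' in the
   user's text into out (at most outsize - 1 of them) and return out.
   If there is no '.', out becomes the empty string. */
char *fraction_text(const char *input, char *out, size_t outsize)
{
    if (outsize == 0)
        return out;
    out[0] = '\0';
    const char *dot = strchr(input, '.');
    if (dot != NULL) {
        size_t n = strspn(dot + 1, "0123456789");  /* digits after the point */
        if (n > outsize - 1)
            n = outsize - 1;
        memcpy(out, dot + 1, n);
        out[n] = '\0';
    }
    return out;
}
```

The same text can still be handed to strtof when the numeric value is needed, so nothing about the user's trailing zeros is lost.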

The only way is to create a struct with the original string value and/ or required precision for rounding

C: convert a real number to 64 bit floating point binary

I'm trying to write a code that converts a real number to a 64 bit floating point binary. In order to do this, the user inputs a real number (for example, 547.4242) and the program must output a 64 bit floating point binary.
My ideas:
The sign part is easy.
The program converts the integer part (547 for the previous example) and stores the result in an int variable. Then, the program converts the fractional part (.4242 for the previous example) and stores the result into an array (each position of the array stores '1' or '0').
This is where I'm stuck. Summarizing, I have: "Integer part = 1000100011" (type int) and "Fractional part = 0110110010011000010111110000011011110110100101000100" (array).
How can I proceed?

The following code prints the internal representation of a float according to the IEEE 754 format. It assumes a little-endian machine, where the byte holding the sign bit has the highest address.
#include<stdio.h>
void decimal_to_binary(unsigned char);
union u
{
float f;
unsigned char c[sizeof(float)]; // the bytes of f
};
int main(void)
{
int i;
union u a;
printf("ENTER THE FLOATING POINT NUMBER : \n");
if(scanf("%f",&a.f)!=1)
return 1;
// Highest address first, so the sign bit is printed first on a little-endian machine.
for(i=(int)sizeof(float)-1;i>=0;i--)
{
decimal_to_binary(a.c[i]);
}
printf("\n");
return 0;
}
void decimal_to_binary(unsigned char n)
{
int arr[8];
int i;
//printf("n = %u ",n);
for(i=7;i>=0;i--)
{
if(n%2==0)
arr[i]=0;
else
arr[i]=1;
n/=2;
}
for(i=0;i<8;i++)
printf("%d",arr[i]);
printf(" ");
}
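The question asks for the 64-bit format, which the float-based program above does not cover. The following is a minimal sketch for double, assuming double is an IEEE 754 binary64 value; the name double_to_bits is made up for illustration. Copying the bytes into a uint64_t and shifting avoids any dependence on byte order.

```c
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(double) == sizeof(uint64_t), "double must be 64 bits");

/* Write the 64 bits of x, most significant first, into out as '0'/'1'
   characters followed by a terminating '\0' (out needs 65 bytes). */
void double_to_bits(double x, char out[65])
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);       /* copy the object representation */
    for (int i = 0; i < 64; i++)
        out[i] = ((bits >> (63 - i)) & 1u) ? '1' : '0';
    out[64] = '\0';
}
```

The first character is the sign, the next 11 are the biased exponent, and the remaining 52 are the stored fraction bits.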

In order to correctly round all possible decimal representations to the nearest double, you need big integers. Using only the basic integer types from C will leave you to re-implement big integer arithmetics. Each of these two approaches is possible, more information about each follows:
For the first approach, you need a big integer library: GMP is a good one. Armed with such a big integer library, you tackle an input such as the example 123.456E78 as the integer 123456 * 10^75 and start wondering what values M in [2^53 … 2^54) and P in [-1022 … 1023] make (M / 2^53) * 2^P closest to this number. This question can be answered with big integer operations, following the steps described in this blog post (summary: first determine P. Then use a division to compute M). A complete implementation must take care of subnormal numbers and infinities (inf is the correct result to return for any decimal representation of a number that would have an exponent larger than +1023).
The second approach, if you do not want to include or implement a full general-purpose big integer library, still requires a few basic operations to be implemented on arrays of C integers representing large numbers. The function decfloat() in this implementation represents large numbers in base 10^9 because that simplifies the conversion from the initial decimal representation to the internal representation as an array x of uint32_t.

Following is a basic conversion. Enough to get OP started.
OP's "integer part of real number" --> int is far too limiting. Better to simply convert the entire string to a large integer like uintmax_t. Note the decimal point '.' and account for overflow while scanning.
This code does not handle exponents nor negative numbers. It may be off in the last bit or so due to the limited integer ui or the final num = ui * pow(10, expo). It handles most overflow cases.
#include <ctype.h>
#include <inttypes.h>
#include <math.h>
#include <stddef.h>
double my_atof(const char *src) {
uintmax_t ui = 0;
int dp = '.';
size_t dpi = 0;
size_t i = 0;
size_t toobig = 0;
int ch;
for (i = 0; (ch = (unsigned char) src[i]) != '\0'; i++) {
if (ch == dp) {
dp = '\0'; // only get 1 dp
dpi = i;
continue;
}
if (!isdigit(ch)) {
break; // illegal character
}
ch -= '0';
// detect overflow
if (toobig ||
(ui >= UINTMAX_MAX / 10 &&
(ui > UINTMAX_MAX / 10 || ch > UINTMAX_MAX % 10))) {
toobig++;
continue;
}
ui = ui * 10 + ch;
}
intmax_t expo = (intmax_t) toobig;
if (dp == '\0') {
expo -= (intmax_t) (i - dpi - 1);
}
double num;
// pow() from <math.h> is used because pow10() is a non-standard extension
if (expo < 0) {
// slightly more precise than: num = ui * pow(10, expo);
num = ui / pow(10.0, (double) -expo);
} else {
num = ui * pow(10.0, (double) expo);
}
return num;
}

The trick is to treat the value as an integer, so read your 547.4242 as an unsigned long long (ie 64-bits or more), ie 5474242, counting the number of digits after the '.', in this case 4. Now you have a value which is 10^4 bigger than it should be. So you float the 5474242 (as a double, or long double) and divide by 10^4.
Decimal to binary conversion is deceptively simple. When you have more bits than the float will hold, then it will have to round. More fun occurs when you have more digits than a 64-bit integer will hold -- noting that trailing zeros are special -- and you have to decide whether to round or not (and what rounding occurs when you float). Then there's dealing with an E+/-99. Then when you do the eventual division (or multiplication) by 10^n, you have (a) another potential rounding, and (b) the issue that large 10^n are not exactly represented in your floating point -- which is another source of error. (And for E+/-99 forms, you may need up to and a little beyond 10^300 for the final step.)
Enjoy !

Print float/double without trailing zeros? [duplicate]

There are a few questions related to this, but I haven't seen one that correctly answers this question. I want to print a floating-point number, but I want the number of decimal places to be adaptive. As an example:
0 -> 0
1234 -> 1234
0.1234 -> 0.1234
0.3 -> 0.3
Annoyingly, the %f specifier will only print to a fixed precision, so it will add trailing zeros to all numbers that don't reach that precision. Some have suggested the %g specifier, which works for a set of numbers, but it will switch to scientific notation for some numbers, like this:
printf("%g", 1000000.0); // prints 1e+06
How can I print floating-point numbers without the unnecessary zeros, while still maintaining printf's standard accuracy for numbers that actually have fractional components?

Use snprintf to print to a temporary buffer then remove the trailing '0' characters manually. There is no other way that's both correct and reasonably easy to implement.

The problem is that using IEEE standard 754 representation, floating point values (with a fractional part) can never have "trailing zeros".
Trailing zeros mean that the fractional value can be written as x/10^n for some integers x, n. But the only fractions that can be represented by this standard have the form x/2^n for some integers x, n.
So what you write as 0.1234 is represented using the bytes 0x3D 0xFC 0xB9 0x24. This is:
Sign = 0
Exponent = 01111011 (which means -4)
Significand: 1.11111001011100100100100
The significand means: 1 + 1/2 + 1/4 + 1/8 + 1/16 + 0/32 + 0/64 + 1/128 + 0/256 + 1/512 + 1/1024 + 1/2048 + 0/4096 + 0/8192 + ...
If you perform this calculation, you get 1.974400043487548828125.
So the number is + 1.974400043487548828125 * 2^(-4) = 0.1234000027179718
(I've calculated this using a computer, of course, so it could be off for the same reason...)
As you can see, the computer does not want to decide for you that you want to chop this number after 4 digits (only) and not after 9 digits (0.123400002). The point is that the computer doesn't see this number as 0.1234 with an infinite number of trailing zeros.
So I don't think there's a better way than R.'s.
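The field breakdown above can be reproduced in code. This is a minimal sketch, assuming float is an IEEE 754 binary32 value and the input is a normal number (not zero, subnormal, infinity or NaN); the helper name float_fields is made up for illustration.

```c
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Hypothetical helper: split a float into its sign bit, unbiased
   exponent and 23 stored fraction bits. */
void float_fields(float f, unsigned *sign, int *exponent, uint32_t *fraction)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);                 /* copy the object representation */
    *sign = (unsigned)(bits >> 31);
    *exponent = (int)((bits >> 23) & 0xFFu) - 127;  /* remove the bias */
    *fraction = bits & 0x7FFFFFu;                   /* 23 stored bits */
}
```

For 0.1234f this gives sign 0, exponent -4 and fraction bits 0x7CB924, which matches the bytes 0x3D 0xFC 0xB9 0x24 quoted above.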

Try:
printf("%.20g\n", 1000000.0); // = 1000000
This will switch to scientific notation after 20 significant digits (default is after 6 digits for "%g"):
printf("%.20g\n", 1e+19); // = 10000000000000000000
printf("%.20g\n", 1e+20); // = 1e+20
But be careful with double precision:
printf("%.20g\n", 0.12345); // = 0.12345000000000000417
printf("%.15g\n", 0.12345); // = 0.12345

I wrote a small function to do this myself using the method already mentioned, here it is with a tester. I can't guarantee it's bug free but I think it's fine.
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
char *convert(char *s, double x);
int main()
{
char str[3][20];
printf("%s\n%s\n%s\n", convert(str[0], 0),
convert (str[1], 2823.28920000),
convert (str[2], 4.000342300));
}
char *convert(char *s, double x)
{
char *buf = malloc(100);
char *p;
int ch;
sprintf(buf, "%.10f", x);
p = buf + strlen(buf) - 1;
while (*p == '0' && *p-- != '.');
*(p+1) = '\0';
if (*p == '.') *p = '\0';
strcpy(s, buf);
free (buf);
return s;
}
Output:
0
2823.2892
4.0003423

You can print like this:
printf("%.0f", 1000000.0);
For a more detailed answer, look here.

What's the first double that deviates from its corresponding long by delta?

I want to know the first double from 0d upwards that deviates by the long of the "same value" by some delta, say 1e-8. I'm failing here though. I'm trying to do this in C although I usually use managed languages, just in case. Please help.
#include <stdio.h>
#include <limits.h>
#define DELTA 1e-8
int main() {
double d = 0; // checked, the literal is fine
long i;
for (i = 0L; i < LONG_MAX; i++) {
d=i; // gcc does the cast right, i checked
if (d-i > DELTA || d-i < -DELTA) {
printf("%f", d);
break;
}
}
}
I'm guessing that the issue is that d-i casts i to double and therefore d==i and then the difference is always 0. How else can I detect this properly -- I'd prefer fun C casting over comparing strings, which would take forever.
ANSWER: is exactly as we expected. 2^53+1 = 9007199254740993 is the first point of difference according to standard C/UNIX/POSIX tools. Thanks much to pax for his program. And I guess mathematics wins again.

Doubles in IEEE 754 have a precision of 52 bits which means they can store numbers accurately up to (at least) 2^51.
If your longs are 32-bit, they will only have the (positive) range 0 to 2^31 so there is no 32-bit long that cannot be represented exactly as a double. For a 64-bit long, it will be (roughly) 2^52 so I'd be starting around there, not at zero.
You can use the following program to detect where the failures start to occur. An earlier version I had relied on the fact that the last digit in a number that continuously doubles follows the sequence {2,4,8,6}. However, I opted eventually to use a known trusted tool (bc) for checking the whole number, not just the last digit.
Keep in mind that this may be affected by the actions of sprintf() rather than the real accuracy of doubles (I don't think so personally since it had no troubles with certain numbers up to 2^143).
This is the program:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main() {
FILE *fin;
double d = 1.0; // 2^n-1 to avoid exact powers of 2.
int i = 1;
char ds[1000];
char tst[1000];
// Loop forever, rely on break to finish.
while (1) {
// Get C version of the double.
sprintf (ds, "%.0f", d);
// Get bc version of the double.
sprintf (tst, "echo '2^%d - 1' | bc >tmpfile", i);
system(tst);
fin = fopen ("tmpfile", "r");
fgets (tst, sizeof (tst), fin);
fclose (fin);
tst[strlen (tst) - 1] = '\0';
// Check them.
if (strcmp (ds, tst) != 0) {
printf( "2^%d - 1 <-- bc failure\n", i);
printf( " got [%s]\n", ds);
printf( " expected [%s]\n", tst);
break;
}
// Output for status then move to next.
printf( "2^%d - 1 = %s\n", i, ds);
d = (d + 1) * 2 - 1; // Again, 2^n - 1.
i++;
}
}
This keeps going until:
2^51 - 1 = 2251799813685247
2^52 - 1 = 4503599627370495
2^53 - 1 = 9007199254740991
2^54 - 1 <-- bc failure
got [18014398509481984]
expected [18014398509481983]
which is about where I expected it to fail.
As an aside, I originally used numbers of the form 2^n but that got me up to:
2^136 = 87112285931760246646623899502532662132736
2^137 = 174224571863520493293247799005065324265472
2^138 = 348449143727040986586495598010130648530944
2^139 = 696898287454081973172991196020261297061888
2^140 = 1393796574908163946345982392040522594123776
2^141 = 2787593149816327892691964784081045188247552
2^142 = 5575186299632655785383929568162090376495104
2^143 <-- bc failure
got [11150372599265311570767859136324180752990210]
expected [11150372599265311570767859136324180752990208]
with the size of a double being 8 bytes (checked with sizeof). It turned out these numbers were of the binary form "1000..." which can be represented for far longer with doubles. That's when I switched to using 2^n - 1 to get a better bit pattern: all one bits.

The first long to be 'wrong' when cast to a double will not be off by 1e-8, it will be off by 1. As long as the double can fit the long in its significand, it will represent it accurately.
I forget exactly how many bits a double has for precision vs offset, but that would tell you the max size it could represent. The first long to be wrong should have the binary form 10000..., so you can find it much quicker by starting at 1 and left-shifting.
Wikipedia says 52 bits in the significand, not counting the implicit starting 1. That should mean the first long to be cast to a different value is 2^53.
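The left-shifting search described above can be sketched directly. With a p-bit significand every integer up to 2^p is exact, so the first integer that does not survive the conversion is 2^p + 1, and only values of the form 2^k + 1 need testing. This sketch assumes double is IEEE 754 binary64 with round-to-nearest conversions; the function name first_inexact_integer is made up for illustration.

```c
#include <stdint.h>

/* Return the smallest positive integer of the form 2^k + 1 that changes
   value when converted to double and back, or -1 if none is found. */
int64_t first_inexact_integer(void)
{
    for (int k = 1; k < 62; k++) {
        int64_t candidate = (INT64_C(1) << k) + 1;
        /* volatile forces the value to be stored as a real double,
           even on machines that compute in wider registers */
        volatile double d = (double)candidate;
        if ((int64_t)d != candidate)
            return candidate;
    }
    return -1;
}
```

On a typical implementation this returns 9007199254740993, that is 2^53 + 1, in agreement with the result quoted in the question.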

Although I'm hesitant to mention Fortran 95 and successors in this discussion, I'll mention that Fortran since the 1990 standard has offered a SPACING intrinsic function which tells you what the difference between representable REALs are about a given REAL. You could do a binary search on this, stopping when SPACING(X) > DELTA. For compilers that use the same floating point model as the one you are interested in (likely to be the IEEE754 standard), you should get the same results.

Off hand, I thought that doubles could represent all integers (within their bounds) exactly.
If that is not the case, then you're going to want to cast both i and d to something with MORE precision than either of them. Perhaps a long double will work.
