I want to write code that takes a decimal number like 612.216 and prints it as 612216 (that is, converts it to an integer). However, the program changes my number to something like 2162160000000000000000001 and I don't know what to do about it.
This is my code:
#include <stdio.h>
#include <math.h>
int main() {
long double x;
scanf_s("%Lf", &x);
while (floor(x)!=x)
x = x * 10;
printf("%Lf", x);
return 0;}
How about this:
#include <stdio.h>
int main() {
double number = 612.216;
char number_as_string[20];
snprintf(number_as_string, sizeof number_as_string, "%g", number); // "%g" omits the trailing zeros that "%f" would print
for(int i = 0; number_as_string[i] != '\0'; i++)
if(number_as_string[i] != '.')
printf("%c", number_as_string[i]);
return 0;
}
The downside is the statically allocated array. You can use snprintf to convert the double into an array of chars.
The floating point representation isn't exact, so most decimal fractions are stored with a very small error. You could try something like this pseudocode,
while ((x - floor(x)) > 0.0000000000000000001)
x *= 10;
Your <float.h> header might define a better number to use, like FLT_EPSILON or some such ;)
The problem with your floor(x)!=x check is that it doesn't take into account any inaccuracy in the representation of the input long double number. (In the example given, this causes an 'extra' 0.0000000000000000000001 to be added to the actual value.) See Is floating point math broken? for more information on such inaccuracies inherent in any representation of floating-point numbers.
To fix this in your code, you can compare the difference between floor(x) and x to a given 'tolerance' - if it's less than that, consider the loop finished. You can use a value derived from the LDBL_EPSILON constant as a typical value for that 'tolerance', though you may like to experiment with different values.
Here is a possible code solution:
#include <stdio.h>
#include <math.h>
#include <float.h> // For the LDBL_EPSILON definition
int main()
{
long double x;
scanf_s("%Lf", &x);
while ((x - floorl(x)) > (LDBL_EPSILON * x * 10)) // floorl keeps long double precision. Try changing the "10" value!
x = x * 10;
printf("%.0Lf", x); // Add the ".0" to remove the trailing ".000000" in the output
return 0;
}
long double can store many finite values exactly. They are all of the form:
+/- some_integer * 2^some_exponent
Since "612.216" is not representable like that (0.216 cannot be expressed as a binary fraction like 0.25 can), a nearby long double value was used, like ~612.2160000000000000253...
Also, OP's repeated use of x = x * 10; adds small rounding errors and does not pose a reasonable conversion limit.
An alternative approach uses LDBL_DIG (the number of significant decimal digits that round trip from decimal text to long double to decimal text unchanged) to print the long double to a buffer. Let *printf() do the heavy lifting of converting a long double to the best decimal text.
#include <float.h>
#include <stdio.h>
// To print up to the LDBL_DIG most significant digits w/o trailing zeros:
void print_sig_digits(long double x) {
// - d . ddd....ddd e - expo \0
char buf[1 + 1 + 1 + (LDBL_DIG-1) + 1 + 1 + 8 +1];
#define e_OFFSET (1 + 1 + 1 + (LDBL_DIG-1))
// Print using exponential format
snprintf(buf, sizeof buf, "%+.*Le", LDBL_DIG, x);
buf[e_OFFSET] = '\0'; // End string at 'e'
for (int i = e_OFFSET - 1; buf[i] == '0'; i--) {
buf[i] = '\0'; // Lop off trailing '0'
}
buf[2] = buf[1]; // Copy first digit over '.'
printf("%s\n", &buf[2]);
}
int main(void) {
printf("LDBL_DIG: %d\n", LDBL_DIG);
print_sig_digits( 612.216L);
print_sig_digits( 1.0L/7);
print_sig_digits( 0.000123L);
return 0;
}
Output
LDBL_DIG: 18
612216
142857142857142857
123
I want to read digit by digit the decimals of the sqrt of 5 in C.
The square root of 5 is 2.23606797749979..., so this would be the expected output:
2
3
6
0
6
7
9
7
7
...
I've found the following code:
#include<stdio.h>
void main()
{
int number;
float temp, sqrt;
printf("Provide the number: \n");
scanf("%d", &number);
// store the half of the given number e.g from 256 => 128
sqrt = number / 2;
temp = 0;
// Iterate until sqrt is different of temp, that is updated on the loop
while(sqrt != temp){
// initially 0, is updated with the initial value of 128
// (on second iteration = 65)
// and so on
temp = sqrt;
// Then, replace values (256 / 128 + 128 ) / 2 = 65
// (on second iteration 34.46923076923077)
// and so on
sqrt = ( number/temp + temp) / 2;
}
printf("The square root of '%d' is '%f'", number, sqrt);
}
But this approach stores the result in a float variable, and I don't want to depend on the limits of the float types, as I would like to extract like 10,000 digits, for instance. I also tried to use the native sqrt() function and casting it to string number using this method, but I faced the same issue.
What you've asked about is a very hard problem, and whether it's even possible to do "one by one" (i.e. without working space requirement that scales with how far out you want to go) depends on both the particular irrational number and the base you want it represented in. For example, in 1995 when a formula for pi was discovered that allows computing the nth binary digit in O(1) space, this was a really big deal. It was not something people expected to be possible.
If you're willing to accept O(n) space, then some cases like the one you mentioned are fairly easy. For example, if you have the first n digits of the square root of a number as a decimal string, you can simply try appending each digit 0 to 9, then squaring the string with long multiplication (same as you learned in grade school), and choosing the last one that doesn't overshoot. Of course this is very slow, but it's simple. The easy way to make it a lot faster (but still asymptotically just as bad) is using an arbitrary-precision math library in place of strings. Doing significantly better requires more advanced approaches and in general may not be possible.
As already noted, you need to change the algorithm into a digit-by-digit one (there are some examples in the Wikipedia page about methods of computing square roots) and use an arbitrary precision arithmetic library to perform the calculations (for instance, GMP).
In the following snippet I implemented the aforementioned algorithm, using GMP (but not the square root function that the library provides). Instead of calculating one decimal digit at a time, this implementation uses a larger base, a large power of 10 that fits inside an unsigned long, so that it can produce 9 or 18 decimal digits at every iteration.
It also uses an adapted Newton method to find the actual "digit".
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <gmp.h>
unsigned long max_ul(unsigned long a, unsigned long b)
{
return a < b ? b : a;
}
int main(int argc, char *argv[])
{
// The GMP functions accept 'unsigned long int' values as parameters.
// The algorithm implemented here can work with bases other than 10,
// so that it can evaluate more than one decimal digit at a time.
const unsigned long base = sizeof(unsigned long) > 4
? 1000000000000000000
: 1000000000;
const unsigned long decimals_per_digit = sizeof(unsigned long) > 4 ? 18 : 9;
// Extract the number to be square rooted and the desired number of decimal
// digits from the command line arguments. Fallback to 0 in case of errors.
const unsigned long number = argc > 1 ? atoi(argv[1]) : 0;
const unsigned long n_digits = argc > 2 ? atoi(argv[2]) : 0;
// All the variables used by GMP need to be properly initialized before use.
// 'c' is basically the remainder, initially set to the original number
mpz_t c;
mpz_init_set_ui(c, number);
// At every iteration, the algorithm "moves to the left" by two "digits"
// the remainder, so it multiplies it by base^2.
mpz_t base_squared;
mpz_init_set_ui(base_squared, base);
mpz_mul(base_squared, base_squared, base_squared);
// 'p' stores the digits of the root found so far. The others are helper variables
mpz_t p;
mpz_init_set_ui(p, 0UL);
mpz_t y;
mpz_init(y);
mpz_t yy;
mpz_init(yy);
mpz_t dy;
mpz_init(dy);
mpz_t dx;
mpz_init(dx);
mpz_t pp;
mpz_init(pp);
// Timing, for testing purposes
clock_t start = clock(), diff;
unsigned long x_max = number;
// Each "digit" corresponds to some decimal digits
for (unsigned long i = 0,
last = (n_digits + decimals_per_digit) / decimals_per_digit + 1UL;
i < last; ++i)
{
// Find the greatest x such that: x * (2 * base * p + x) <= c
// where x is in [0, base), using a specialized Newton method
// pp = 2 * base * p
mpz_mul_ui(pp, p, 2UL * base);
unsigned long x = x_max;
for (;;)
{
// y = x * (pp + x)
mpz_add_ui(yy, pp, x);
mpz_mul_ui(y, yy, x);
// dy = y - c
mpz_sub(dy, y, c);
// If y <= c we have found the correct x
if ( mpz_sgn(dy) <= 0 )
break;
// Newton's step: dx = dy/y' where y' = 2 * x + pp
mpz_add_ui(yy, yy, x);
mpz_tdiv_q(dx, dy, yy);
// Update x even if dx == 0 (last iteration)
x -= max_ul(mpz_get_si(dx), 1);
}
x_max = base - 1;
// The actual format of the printed "digits" is up to you
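if (i == 0)
printf("%lu.\n", x);
else
{
// Zero-pad every later "digit" to the width of the base, so none is dropped
printf("%0*lu", (int)decimals_per_digit, x);
if (i % 4 == 0)
putchar('\n');
}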
// p = base * p + x
mpz_mul_ui(p, p, base);
mpz_add_ui(p, p, x);
// c = (c - y) * base^2
mpz_sub(c, c, y);
mpz_mul(c, c, base_squared);
}
diff = clock() - start;
long int msec = diff * 1000L / CLOCKS_PER_SEC;
printf("\n\nTime taken: %ld.%03ld s\n", msec / 1000, msec % 1000);
// Final cleanup
mpz_clear(c);
mpz_clear(base_squared);
mpz_clear(p);
mpz_clear(pp);
mpz_clear(dx);
mpz_clear(y);
mpz_clear(dy);
mpz_clear(yy);
}
You can see the outputted digits here.
Your title says:
How to compute the digits of an irrational number one by one?
Irrational numbers are not limited to most square roots. They also include numbers of the form log(x), exp(z), sin(y), etc. (transcendental numbers). However, there are some important factors that determine whether or how fast you can compute a given irrational number's digits one by one (that is, from left to right).
Not all irrational numbers are computable; that is, for some of them there is no algorithm that approximates them to any desired length (whether by a closed form expression, a series, or otherwise).
There are many ways numbers can be expressed, such as by their binary or decimal expansions, as continued fractions, as series, etc. And there are different algorithms to compute a given number's digits depending on the representation.
Some formulas compute a given number's digits in a particular base (such as base 2), not in an arbitrary base.
For example, besides the first formula to extract the digits of π without computing the previous digits, there are other formulas of this type (known as BBP-type formulas) that extract the digits of certain irrational numbers. However, these formulas only work for a particular base, not all BBP-type formulas have a formal proof, and most importantly, not all irrational numbers have a BBP-type formula (essentially, only certain log and arctan constants do, not numbers of the form exp(x) or sqrt(x)).
On the other hand, if you can express an irrational number as a continued fraction (which all real numbers have), you can extract its digits from left to right, and in any base desired, using a specific algorithm. What is more, this algorithm works for any real number constant, including square roots, exponentials (e and exp(x)), logarithms, etc., as long as you know how to express it as a continued fraction. For an implementation see "Digits of pi and Python generators". See also Code to Generate e one Digit at a Time.
Is there a printf width specifier which can be applied to a floating point specifier that would automatically format the output to the necessary number of significant digits such that when scanning the string back in, the original floating point value is acquired?
For example, suppose I print a float to a precision of 2 decimal places:
float foobar = 0.9375;
printf("%.2f", foobar); // prints out 0.94
When I scan the output 0.94, I have no standards-compliant guarantee that I'll get the original 0.9375 floating-point value back (in this example, I probably won't).
I would like a way tell printf to automatically print the floating-point value to the necessary number of significant digits to ensure that it can be scanned back to the original value passed to printf.
I could use some of the macros in float.h to derive the maximum width to pass to printf, but is there already a specifier to automatically print to the necessary number of significant digits -- or at least to the maximum width?
I recommend @Jens Gustedt's hexadecimal solution: use %a.
OP wants “print with maximum precision (or at least to the most significant decimal)”.
A simple example would be to print one seventh as in:
#include <float.h>
int Digs = DECIMAL_DIG;
double OneSeventh = 1.0/7.0;
printf("%.*e\n", Digs, OneSeventh);
// 1.428571428571428492127e-01
But let's dig deeper ...
Mathematically, the answer is "0.142857 142857 142857 ...", but we are using finite precision floating point numbers.
Let's assume IEEE 754 double-precision binary.
So the OneSeventh = 1.0/7.0 results in the value below. Also shown are the preceding and following representable double floating point numbers.
OneSeventh before = 0.1428571428571428 214571170656199683435261249542236328125
OneSeventh = 0.1428571428571428 49212692681248881854116916656494140625
OneSeventh after = 0.1428571428571428 769682682968777953647077083587646484375
Printing the exact decimal representation of a double has limited uses.
C has 2 families of macros in <float.h> to help us.
The first set is the number of significant digits to print in a string in decimal so when scanning the string back,
we get the original floating point. They are shown with the C spec's minimum value and a sample C11 compiler.
FLT_DECIMAL_DIG 6, 9 (float) (C11)
DBL_DECIMAL_DIG 10, 17 (double) (C11)
LDBL_DECIMAL_DIG 10, 21 (long double) (C11)
DECIMAL_DIG 10, 21 (widest supported floating type) (C99)
The second set is the number of significant digits a string may be scanned into a floating point and then the FP printed, still retaining the same string presentation. They are shown with the C spec's minimum value and a sample C11 compiler. I believe they were available pre-C99.
FLT_DIG 6, 6 (float)
DBL_DIG 10, 15 (double)
LDBL_DIG 10, 18 (long double)
The first set of macros seems to meet OP's goal of significant digits. But that macro is not always available.
#ifdef DBL_DECIMAL_DIG
#define OP_DBL_Digs (DBL_DECIMAL_DIG)
#else
#ifdef DECIMAL_DIG
#define OP_DBL_Digs (DECIMAL_DIG)
#else
#define OP_DBL_Digs (DBL_DIG + 3)
#endif
#endif
The "+ 3" was the crux of my previous answer.
It centered on this: knowing the digit count for the round-trip conversion string-FP-string (set #2 macros, available in C89), how would one determine the digits for FP-string-FP (set #1 macros, available post C89)? In general, adding 3 was the result.
Now how many significant digits to print is known and driven via <float.h>.
To print N significant decimal digits one may use various formats.
With "%e", the precision field is the number of digits after the lead digit and decimal point.
So - 1 is in order. Note: This -1 is not in the initial int Digs = DECIMAL_DIG;
printf("%.*e\n", OP_DBL_Digs - 1, OneSeventh);
// 1.4285714285714285e-01
With "%f", the precision field is the number of digits after the decimal point.
For a number like OneSeventh/1000000.0, one would need OP_DBL_Digs + 6 to see all the significant digits.
printf("%.*f\n", OP_DBL_Digs , OneSeventh);
// 0.14285714285714285
printf("%.*f\n", OP_DBL_Digs + 6, OneSeventh/1000000.0);
// 0.00000014285714285714285
Note: Many are used to "%f". That displays 6 digits after the decimal point; 6 is the display default, not the precision of the number.
The short answer to print floating point numbers losslessly (such that they can be read
back in to exactly the same number, except NaN and Infinity):
If your type is float: use printf("%.9g", number).
If your type is double: use printf("%.17g", number).
Do NOT use %f, since that only specifies how many digits to print after the decimal point and will truncate small numbers. For reference, the magic numbers 9 and 17 can be found in float.h, which defines FLT_DECIMAL_DIG and DBL_DECIMAL_DIG.
If you are only interested in the bit (resp hex pattern) you could use the %a format. This guarantees you:
The default precision suffices for an exact representation of the value if an exact representation in base 2 exists and otherwise is sufficiently large to distinguish values of type double.
I'd have to add that this is only available since C99.
No, there is no such printf width specifier to print floating-point with maximum precision. Let me explain why.
The maximum precision of float and double is variable, and dependent on the actual value of the float or double.
Recall float and double are stored in sign.exponent.mantissa format. This means that there are many more bits used for the fractional component for small numbers than for big numbers.
For example, float can easily distinguish between 0.0 and 0.1.
float r = 0;
printf( "%.6f\n", r ) ; // 0.000000
r+=0.1 ;
printf( "%.6f\n", r ) ; // 0.100000
But float has no idea of the difference between 1e27 and 1e27 + 0.1.
r = 1e27;
printf( "%.6f\n", r ) ; // 999999988484154753734934528.000000
r+=0.1 ;
printf( "%.6f\n", r ) ; // still 999999988484154753734934528.000000
This is because all the precision (which is limited by the number of mantissa bits) is used up for the large part of the number, left of the decimal.
The %.f modifier just says how many decimal values you want to print from the float number as far as formatting goes. The fact that the accuracy available depends on the size of the number is up to you as the programmer to handle. printf can't/doesn't handle that for you.
Simply use the macros from <float.h> and the variable-width conversion specifier (".*"):
float f = 3.14159265358979323846;
printf("%.*f\n", FLT_DIG, f);
In one of my comments to an answer I lamented that I've long wanted some way to print all the significant digits in a floating point value in decimal form, in much the same way as the question asks. Well I finally sat down and wrote it. It's not quite perfect, and this is demo code that prints additional information, but it mostly works for my tests. Please let me know if you (i.e. anyone) would like a copy of the whole wrapper program which drives it for testing.
static unsigned int
ilog10(uintmax_t v);
/*
* Note: As presented this demo code prints a whole line including information
* about how the form was arrived at, as well as in certain cases a couple of
* interesting details about the number, such as the number of decimal places,
* and possibly the magnitude of the value and the number of significant
* digits.
*/
void
print_decimal(double d)
{
size_t sigdig;
int dplaces;
double flintmax;
/*
* If we really want to see a plain decimal presentation with all of
* the possible significant digits of precision for a floating point
* number, then we must calculate the correct number of decimal places
* to show with "%.*f" as follows.
*
* This is in lieu of always using either full on scientific notation
* with "%e" (where the presentation is always in decimal format so we
* can directly print the maximum number of significant digits
* supported by the representation, taking into account the one digit
* represented by the leading digit)
*
* printf("%1.*e", DBL_DECIMAL_DIG - 1, d)
*
* or using the built-in human-friendly formatting with "%g" (where a
* '*' parameter is used as the number of significant digits to print
* and so we can just print exactly the maximum number supported by the
* representation)
*
* printf("%.*g", DBL_DECIMAL_DIG, d)
*
*
* N.B.: If we want the printed result to again survive a round-trip
* conversion to binary and back, and to be rounded to a human-friendly
* number, then we can only print DBL_DIG significant digits (instead
* of the larger DBL_DECIMAL_DIG digits).
*
* Note: "flintmax" here refers to the largest consecutive integer
* that can be safely stored in a floating point variable without
* losing precision.
*/
#ifdef PRINT_ROUND_TRIP_SAFE
# ifdef DBL_DIG
sigdig = DBL_DIG;
# else
sigdig = ilog10(uipow(FLT_RADIX, DBL_MANT_DIG - 1));
# endif
#else
# ifdef DBL_DECIMAL_DIG
sigdig = DBL_DECIMAL_DIG;
# else
sigdig = (size_t) lrint(ceil(DBL_MANT_DIG * log10((double) FLT_RADIX))) + 1;
# endif
#endif
flintmax = pow((double) FLT_RADIX, (double) DBL_MANT_DIG); /* xxx use uipow() */
if (d == 0.0) {
printf("z = %.*s\n", (int) sigdig + 1, "0.000000000000000000000"); /* 21 */
} else if (fabs(d) >= 0.1 &&
fabs(d) <= flintmax) {
dplaces = (int) (sigdig - (size_t) lrint(ceil(log10(ceil(fabs(d))))));
if (dplaces < 0) {
/* XXX this is likely never less than -1 */
/*
* XXX the last digit is not significant!!! XXX
*
* This should also be printed with sprintf() and edited...
*/
printf("R = %.0f [%d too many significant digits!!!, zero decimal places]\n", d, abs(dplaces));
} else if (dplaces == 0) {
/*
* The decimal fraction here is not significant and
* should always be zero (XXX I've never seen this)
*/
printf("R = %.0f [zero decimal places]\n", d);
} else {
if (fabs(d) == 1.0) {
/*
* This is a special case where the calculation
* is off by one because log10(1.0) is 0, but
* we still have the leading '1' whole digit to
* count as a significant digit.
*/
#if 0
printf("ceil(1.0) = %f, log10(ceil(1.0)) = %f, ceil(log10(ceil(1.0))) = %f\n",
ceil(fabs(d)), log10(ceil(fabs(d))), ceil(log10(ceil(fabs(d)))));
#endif
dplaces--;
}
/* this is really the "useful" range of %f */
printf("r = %.*f [%d decimal places]\n", dplaces, d, dplaces);
}
} else {
if (fabs(d) < 1.0) {
int lz;
lz = abs((int) lrint(floor(log10(fabs(d)))));
/* i.e. add # of leading zeros to the precision */
dplaces = (int) sigdig - 1 + lz;
printf("f = %.*f [%d decimal places]\n", dplaces, d, dplaces);
} else { /* d > flintmax */
size_t n;
size_t i;
char *df;
/*
* hmmmm... the easy way to suppress the "invalid",
* i.e. non-significant digits is to do a string
* replacement of all digits after the first
* DBL_DECIMAL_DIG to convert them to zeros, and to
* round the least significant digit.
*/
df = malloc((size_t) 1);
n = (size_t) snprintf(df, (size_t) 1, "%.1f", d);
n++; /* for the NUL */
df = realloc(df, n);
(void) snprintf(df, n, "%.1f", d);
if ((n - 2) > sigdig) {
/*
* XXX rounding the integer part here is "hard"
* -- we would have to convert the digits up to
* this point back into a binary format and
* round that value appropriately in order to
* do it correctly.
*/
if (df[sigdig] >= '5' && df[sigdig] <= '9') {
if (df[sigdig - 1] == '9') {
/*
* xxx fixing this is left as
* an exercise to the reader!
*/
printf("F = *** failed to round integer part at the least significant digit!!! ***\n");
free(df);
return;
} else {
df[sigdig - 1]++;
}
}
for (i = sigdig; df[i] != '.'; i++) {
df[i] = '0';
}
} else {
i = n - 1; /* less the NUL */
if (isnan(d) || isinf(d)) {
sigdig = 0; /* "nan" or "inf" */
}
}
printf("F = %.*s. [0 decimal places, %lu digits, %lu digits significant]\n",
(int) i, df, (unsigned long int) i, (unsigned long int) sigdig);
free(df);
}
}
return;
}
static unsigned int
msb(uintmax_t v)
{
unsigned int mb = 0;
while (v >>= 1) { /* unroll for more speed... (see ilog2()) */
mb++;
}
return mb;
}
static unsigned int
ilog10(uintmax_t v)
{
unsigned int r;
static unsigned long long int const PowersOf10[] =
{ 1LLU, 10LLU, 100LLU, 1000LLU, 10000LLU, 100000LLU, 1000000LLU,
10000000LLU, 100000000LLU, 1000000000LLU, 10000000000LLU,
100000000000LLU, 1000000000000LLU, 10000000000000LLU,
100000000000000LLU, 1000000000000000LLU, 10000000000000000LLU,
100000000000000000LLU, 1000000000000000000LLU,
10000000000000000000LLU };
if (!v) {
return ~0U;
}
/*
* By the relationship "log10(v) = log2(v) / log2(10)", we need to
* multiply "log2(v)" by "1 / log2(10)", which is approximately
* 1233/4096, or (1233, followed by a right shift of 12).
*
* Finally, since the result is only an approximation that may be off
* by one, the exact value is found by subtracting "v < PowersOf10[r]"
* from the result.
*/
r = ((msb(v) * 1233) >> 12) + 1;
return r - (v < PowersOf10[r]);
}
I ran a small experiment to verify that printing with DBL_DECIMAL_DIG does indeed exactly preserve the number's binary representation. It turned out that for the compilers and C libraries I tried, DBL_DECIMAL_DIG is indeed the number of digits required, and printing with even one digit less creates a significant problem.
#include <float.h>
#include <math.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
union {
short s[4];
double d;
} u;
void
test(int digits)
{
int i, j;
char buff[40];
double d2;
int n, num_equal, bin_equal;
srand(17);
n = num_equal = bin_equal = 0;
for (i = 0; i < 1000000; i++) {
for (j = 0; j < 4; j++)
u.s[j] = (rand() << 8) ^ rand();
if (isnan(u.d))
continue;
n++;
sprintf(buff, "%.*g", digits, u.d);
sscanf(buff, "%lg", &d2);
if (u.d == d2)
num_equal++;
if (memcmp(&u.d, &d2, sizeof(double)) == 0)
bin_equal++;
}
printf("Tested %d values with %d digits: %d found numericaly equal, %d found binary equal\n", n, digits, num_equal, bin_equal);
}
int
main()
{
test(DBL_DECIMAL_DIG);
test(DBL_DECIMAL_DIG - 1);
return 0;
}
I ran this with Microsoft's C compiler 19.00.24215.1 and gcc version 7.4.0 20170516 (Debian 6.3.0-18+deb9u1). Using one less decimal digit roughly halves the number of values that compare exactly equal. (I also verified that rand() as used indeed produces about one million different numbers.) Here are the detailed results.
Microsoft C
Tested 999507 values with 17 digits: 999507 found numericaly equal, 999507 found binary equal
Tested 999507 values with 16 digits: 545389 found numericaly equal, 545389 found binary equal
GCC
Tested 999485 values with 17 digits: 999485 found numericaly equal, 999485 found binary equal
Tested 999485 values with 16 digits: 545402 found numericaly equal, 545402 found binary equal
To my knowledge, there is a widely used algorithm that outputs the necessary number of significant digits such that, when the string is scanned back in, the original floating point value is recovered: dtoa.c written by David Gay, which is available here on Netlib (see also the associated paper). This code is used e.g. in Python, MySQL, Scilab, and many others.
I was looking at another question (here) where someone was looking for a way to get the square root of a 64 bit integer in x86 assembly.
This turns out to be very simple. The solution is to convert to a floating point number, calculate the sqrt and then convert back.
I need to do something very similar in C however when I look into equivalents I'm getting a little stuck. I can only find a sqrt function which takes in doubles. Doubles do not have the precision to store large 64bit integers without introducing significant rounding error.
Is there a common math library that I can use which has a long double sqrt function?
There is no need for long double; the square root can be calculated with double (if it is IEEE-754 64-bit binary). The rounding error in converting a 64-bit integer to double is nearly irrelevant in this problem.
The rounding error is at most one part in 2^53. This causes an error in the square root of at most one part in 2^54. The sqrt itself has a rounding error of less than one part in 2^53, due to rounding the mathematical result to the double format. The sum of these errors is tiny; the largest possible square root of a 64-bit integer (rounded to 53 bits) is 2^32, so an error of three parts in 2^54 is less than .00000072.
For a uint64_t x, consider sqrt(x). We know this value is within .00000072 of the exact square root of x, but we do not know its direction. If we adjust it to sqrt(x) - 0x1p-20, then we know we have a value that is less than, but very close to, the square root of x.
Then this code calculates the square root of x, truncated to an integer, provided the operations conform to IEEE 754:
uint64_t y = sqrt(x) - 0x1p-20;
if (2*y < x - y*y)
++y;
(2*y < x - y*y is equivalent to (y+1)*(y+1) <= x except that it avoids wrapping the 64-bit integer if y+1 is 2^32.)
Function sqrtl(), taking a long double, is part of C99.
Note that your compilation platform does not have to implement long double as 80-bit extended-precision. It is only required to be at least as wide as double, and Visual Studio implements it as a plain double. GCC and Clang do compile long double to 80-bit extended-precision on Intel processors.
Yes, the standard library has sqrtl() (since C99).
If you only want to calculate sqrt for integers, using divide and conquer should find the result in max 32 iterations:
uint64_t mysqrt (uint64_t a)
{
uint64_t min=0;
//uint64_t max=1<<32;
uint64_t max=((uint64_t) 1) << 32; //chux' bugfix
while(1)
{
if (max <= 1 + min)
return min;
uint64_t sqt = min + (max - min)/2;
uint64_t sq = sqt*sqt;
if (sq == a)
return sqt;
if (sq > a)
max = sqt;
else
min = sqt;
}
}
Debugging is left as exercise for the reader.
Here we collect several observations in order to arrive at a solution:
1. In standard C >= 1999, it is guaranteed that non-negative integers have the bit representation one would expect for any base-2 number.
----> Hence, we can rely on bit manipulation of this type of number.
2. If x is an unsigned integer type, then x >> 1 == x / 2 and x << 1 == x * 2.
(!) But: the bit operations may well be faster than their arithmetical counterparts.
3. sqrt(x) is mathematically equivalent to exp(log(x)/2.0).
4. If we consider truncated logarithms and base-2 exponentials for integers, we can obtain a fair estimate: IntExp2( IntLog2(x) / 2) "==" IntSqrtDn(x), where "==" is informal notation meaning almost equal to (in the sense of a good approximation).
5. If we write IntExp2( IntLog2(x) / 2 + 1) "==" IntSqrtUp(x), we obtain an "above" approximation for the integer square root.
6. The approximations obtained in (4.) and (5.) are a little rough (they enclose the true value of sqrt(x) between two consecutive powers of 2), but they can be a very good starting point for any algorithm that searches for the square root of x.
7. The Newton algorithm for square roots can work well for integers, if we have a good first approximation to the real solution.
http://en.wikipedia.org/wiki/Integer_square_root
The final algorithm needs some mathematical verification to be fully sure that it always works properly, but I will not do it right now... I will show you the final program instead:
#include <stdio.h> /* For printf()... */
#include <stdint.h> /* For uintmax_t... */
#include <math.h> /* For sqrt() .... */
int IntLog2(uintmax_t n) {
if (n == 0) return -1; /* Error */
int L;
for (L = 0; n >>= 1; L++)
;
return L; /* It takes < 64 steps for long long */
}
uintmax_t IntExp2(int n) {
if (n < 0)
return 0; /* Error */
uintmax_t E;
for (E = 1; n-- > 0; E <<= 1)
;
return E; /* It takes < 64 steps for long long */
}
uintmax_t IntSqrtDn(uintmax_t n) { return IntExp2(IntLog2(n) / 2); }
uintmax_t IntSqrtUp(uintmax_t n) { return IntExp2(IntLog2(n) / 2 + 1); }
int main(void) {
uintmax_t N = 947612934; /* Try here your number! */
uintmax_t sqrtn = IntSqrtDn(N), /* 1st approx. to sqrt(N) by below */
sqrtn0 = IntSqrtUp(N); /* 1st approx. to sqrt(N) by above */
/* The following means while( abs(sqrt-sqrt0) > 1) { stuff... } */
/* However, we take care of subtractions on unsigned arithmetic, just in case... */
while ( (sqrtn > sqrtn0 + 1) || (sqrtn0 > sqrtn+1) )
sqrtn0 = sqrtn, sqrtn = (sqrtn0 + N/sqrtn0) / 2; /* Newton iteration */
printf("N==%ju, sqrt(N)==%g, IntSqrtDn(N)==%ju, IntSqrtUp(N)==%ju, sqrtn==%ju, sqrtn*sqrtn==%ju\n\n",
N, sqrt((double) N), IntSqrtDn(N), IntSqrtUp(N), sqrtn, sqrtn*sqrtn); /* %ju is the conversion for uintmax_t */
return 0;
}
The last value stored in sqrtn is the integer square root of N.
The last line of the program just shows all the values, for verification purposes.
So, you can try different values of N and see what happens.
If we add a counter inside the while-loop, we'll see that no more than a few iterations happen.
Remark: It is necessary to verify that the condition abs(sqrtn-sqrtn0)<=1 is always achieved when working in the integer-number setting. If not, we shall have to fix the algorithm.
Remark 2: In the initialization statements, observe that sqrtn0 == sqrtn * 2 == sqrtn << 1. This saves us some calculations.
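A minimal sketch of one way to settle the termination question raised in the first remark, assuming the initial guess is at or above the true root: stop as soon as a Newton step fails to decrease the estimate, instead of comparing two successive values. The name isqrt_newton is hypothetical and is not part of the program above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: floor(sqrt(n)) by Newton's iteration on integers. */
uintmax_t isqrt_newton(uintmax_t n) {
    if (n < 2)
        return n;                         /* sqrt(0) == 0, sqrt(1) == 1 */

    /* Count the bits of n, then start from a power of two >= sqrt(n). */
    int bits = 0;
    for (uintmax_t t = n; t != 0; t >>= 1)
        bits++;
    uintmax_t x = (uintmax_t)1 << ((bits + 1) / 2);

    for (;;) {
        uintmax_t y = (x + n / x) / 2;    /* Newton step */
        if (y >= x) {                     /* estimate stopped decreasing */
            assert(x <= n / x);           /* sanity check: x * x <= n */
            return x;
        }
        x = y;
    }
}
```

Because the estimates only ever decrease toward the root, the loop cannot oscillate between two neighbouring values, which is the case the abs(sqrtn-sqrtn0)<=1 condition has to guard against.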
// sqrt_i64 returns the integer square root of v.
int64_t sqrt_i64(int64_t v) {
uint64_t q = 0, b = 1, r = v;
for( b <<= 62; b > 0 && b > r; b >>= 2);
while( b > 0 ) {
uint64_t t = q + b;
q >>= 1;
if( r >= t ) {
r -= t;
q += b;
}
b >>= 2;
}
return q;
}
The for loop may be optimized by using the clz machine code instruction.
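As a hedged sketch of that optimization, the starting value of b (the largest power of four not exceeding r) can be computed directly from the position of the highest set bit. This assumes GCC or Clang, whose __builtin_clzll intrinsic usually maps to a count-leading-zeros instruction; top_power_of_four is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: largest power of four <= r, or 0 when r == 0.
 * Assumes GCC or Clang (__builtin_clzll) and a 64-bit unsigned long long. */
uint64_t top_power_of_four(uint64_t r) {
    if (r == 0)
        return 0;                            /* clz of 0 is undefined */
    int msb = 63 - __builtin_clzll(r);       /* index of the highest set bit */
    uint64_t b = (uint64_t)1 << (msb & ~1);  /* round down to an even bit index */
    assert(b <= r && r / 4 < b);             /* sanity check on the result */
    return b;
}
```

In sqrt_i64 this would stand in for the for loop, as in b = top_power_of_four(r); and it yields the same b that the loop reaches, including b == 0 for r == 0.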
Is there a printf width specifier which can be applied to a floating point specifier that would automatically format the output to the necessary number of significant digits such that when scanning the string back in, the original floating point value is acquired?
For example, suppose I print a float to a precision of 2 decimal places:
float foobar = 0.9375;
printf("%.2f", foobar); // prints out 0.94
When I scan the output 0.94, I have no standards-compliant guarantee that I'll get the original 0.9375 floating-point value back (in this example, I probably won't).
I would like a way to tell printf to automatically print the floating-point value to the necessary number of significant digits to ensure that it can be scanned back to the original value passed to printf.
I could use some of the macros in float.h to derive the maximum width to pass to printf, but is there already a specifier to automatically print to the necessary number of significant digits -- or at least to the maximum width?
I recommend @Jens Gustedt's hexadecimal solution: use %a.
OP wants “print with maximum precision (or at least to the most significant decimal)”.
A simple example would be to print one seventh as in:
#include <float.h>
int Digs = DECIMAL_DIG;
double OneSeventh = 1.0/7.0;
printf("%.*e\n", Digs, OneSeventh);
// 1.428571428571428492127e-01
But let's dig deeper ...
Mathematically, the answer is "0.142857 142857 142857 ...", but we are using finite precision floating point numbers.
Let's assume IEEE 754 double-precision binary.
So the OneSeventh = 1.0/7.0 results in the value below. Also shown are the preceding and following representable double floating point numbers.
OneSeventh before = 0.1428571428571428 214571170656199683435261249542236328125
OneSeventh = 0.1428571428571428 49212692681248881854116916656494140625
OneSeventh after = 0.1428571428571428 769682682968777953647077083587646484375
Printing the exact decimal representation of a double has limited uses.
C has 2 families of macros in <float.h> to help us.
The first set is the number of significant digits to print in a string in decimal so that, when scanning the string back,
we get the original floating point. They are shown with the C spec's minimum value and the value from a sample C11 compiler.
FLT_DECIMAL_DIG 6, 9 (float) (C11)
DBL_DECIMAL_DIG 10, 17 (double) (C11)
LDBL_DECIMAL_DIG 10, 21 (long double) (C11)
DECIMAL_DIG 10, 21 (widest supported floating type) (C99)
The second set is the number of significant digits a string may be scanned into a floating point and then the FP printed, still retaining the same string presentation. They are shown with the C spec's minimum value and the value from a sample C11 compiler. I believe these are available pre-C99.
FLT_DIG 6, 6 (float)
DBL_DIG 10, 15 (double)
LDBL_DIG 10, 18 (long double)
The first set of macros seems to meet OP's goal of significant digits. But that macro is not always available.
#ifdef DBL_DECIMAL_DIG
#define OP_DBL_Digs (DBL_DECIMAL_DIG)
#else
#ifdef DECIMAL_DIG
#define OP_DBL_Digs (DECIMAL_DIG)
#else
#define OP_DBL_Digs (DBL_DIG + 3)
#endif
#endif
The "+ 3" was the crux of my previous answer.
It centered on this question: knowing the digit count for the string-FP-string round trip (set #2 macros, available in C89), how would one determine the digit count for FP-string-FP (set #1 macros, available after C89)? In general, adding 3 was the result.
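To make the relation between the two sets concrete, here is a rough sketch that derives both digit counts from the number of mantissa bits p, assuming FLT_RADIX == 2. It uses the C standard's formulas floor((p - 1) * log10(2)) and ceil(1 + p * log10(2)), with 30103/100000 as an integer approximation of log10(2). The function names are hypothetical.

```c
#include <assert.h>

/* Hypothetical helpers, assuming a binary floating-point format (FLT_RADIX == 2).
 * 30103/100000 approximates log10(2); the range check keeps it trustworthy. */

/* Digits that survive string -> FP -> string (the xxx_DIG family). */
int digits_text_roundtrip(int p) {
    assert(p > 0 && p <= 128);
    return (int)((p - 1) * 30103L / 100000L);   /* floor((p - 1) * log10(2)) */
}

/* Digits needed for FP -> string -> FP (the xxx_DECIMAL_DIG family). */
int digits_value_roundtrip(int p) {
    assert(p > 0 && p <= 128);
    return (int)(2 + p * 30103L / 100000L);     /* ceil(1 + p * log10(2)) */
}
```

For p == 24 (IEEE 754 float) this gives 6 and 9, and for p == 53 (double) it gives 15 and 17, so the gap is 3 and 2 respectively, which is consistent with "+ 3" as a safe fallback.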
Now how many significant digits to print is known and driven via <float.h>.
To print N significant decimal digits one may use various formats.
With "%e", the precision field is the number of digits after the lead digit and decimal point.
So - 1 is in order. Note: This -1 is not in the initial int Digs = DECIMAL_DIG;
printf("%.*e\n", OP_DBL_Digs - 1, OneSeventh);
// 1.4285714285714285e-01
With "%f", the precision field is the number of digits after the decimal point.
For a number like OneSeventh/1000000.0, one would need OP_DBL_Digs + 6 to see all the significant digits.
printf("%.*f\n", OP_DBL_Digs , OneSeventh);
// 0.14285714285714285
printf("%.*f\n", OP_DBL_Digs + 6, OneSeventh/1000000.0);
// 0.00000014285714285714285
Note: Many are used to "%f". That displays 6 digits after the decimal point; 6 is the display default, not the precision of the number.
The short answer to print floating point numbers losslessly (such that they can be read
back in to exactly the same number, except NaN and Infinity):
If your type is float: use printf("%.9g", number).
If your type is double: use printf("%.17g", number).
Do NOT use %f, since that only specifies how many significant digits after the decimal and will truncate small numbers. For reference, the magic numbers 9 and 17 can be found in float.h which defines FLT_DECIMAL_DIG and DBL_DECIMAL_DIG.
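A small sketch for checking this claim on your own platform: print with a given number of significant digits, parse the text back with strtod, and compare. The helper name survives_round_trip is hypothetical, and the result assumes the C library converts in both directions with correct rounding.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: print x with `digits` significant digits using "%.*g",
 * parse the text back, and report whether the value survived unchanged. */
int survives_round_trip(double x, int digits) {
    char buf[64];
    int len = snprintf(buf, sizeof buf, "%.*g", digits, x);
    assert(len > 0 && (size_t)len < sizeof buf);   /* text fit in buf */
    return strtod(buf, NULL) == x;
}
```

With 17 digits the check should pass for any finite double, for example survives_round_trip(1.0 / 7.0, 17), while survives_round_trip(1.0 / 3.0, 15) is expected to fail because 15 digits cannot pin down that value.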
If you are only interested in the bit pattern (or rather its hex representation), you could use the %a format. This guarantees you:
The default precision suffices for an exact representation of the value if an exact representation in base 2 exists and otherwise is sufficiently large to distinguish values of type double.
I'd have to add that this is only available since C99.
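A brief sketch of the %a route, assuming a C99 library where strtod also accepts hexadecimal floating-point input. The helper name hex_round_trip is made up for this example; the exact text produced by %a (for instance the leading hex digit) can vary between implementations, so none is shown here.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: write x in hexadecimal floating-point form ("%a")
 * into buf and return the value that strtod parses back from it. */
double hex_round_trip(double x, char *buf, size_t size) {
    int len = snprintf(buf, size, "%a", x);
    assert(len > 0 && (size_t)len < size);   /* output was not truncated */
    return strtod(buf, NULL);
}
```

For a finite double x, hex_round_trip(x, buf, sizeof buf) == x is expected to hold when FLT_RADIX is 2, since every binary mantissa has an exact hexadecimal representation.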
No, there is no such printf width specifier to print floating-point with maximum precision. Let me explain why.
The maximum precision of float and double is variable, and dependent on the actual value of the float or double.
Recall float and double are stored in sign.exponent.mantissa format. This means that there are many more bits used for the fractional component for small numbers than for big numbers.
For example, float can easily distinguish between 0.0 and 0.1.
float r = 0;
printf( "%.6f\n", r ) ; // 0.000000
r+=0.1 ;
printf( "%.6f\n", r ) ; // 0.100000
But float has no idea of the difference between 1e27 and 1e27 + 0.1.
r = 1e27;
printf( "%.6f\n", r ) ; // 999999988484154753734934528.000000
r+=0.1 ;
printf( "%.6f\n", r ) ; // still 999999988484154753734934528.000000
This is because all the precision (which is limited by the number of mantissa bits) is used up for the large part of the number, left of the decimal.
The precision in a %.Nf conversion just says how many decimal places you want printed from the float number, as far as formatting goes. The fact that the accuracy available depends on the size of the number is up to you as the programmer to handle. printf can't/doesn't handle that for you.
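The same effect can be checked without printing anything. This is a minimal sketch with a hypothetical helper, is_absorbed, that reports whether an increment disappears when added to a float of a given magnitude.

```c
#include <assert.h>

/* Hypothetical helper: returns 1 when adding `small` to `big` leaves `big`
 * unchanged, i.e. `small` is below float's resolution at that magnitude. */
int is_absorbed(float big, float small) {
    assert(small != 0.0f);              /* adding zero is trivially absorbed */
    volatile float sum = big + small;   /* volatile forces rounding to float */
    return sum == big;
}
```

Here is_absorbed(1e27f, 0.1f) is true while is_absorbed(0.0f, 0.1f) is false, matching the two printf experiments above.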
Simply use the macros from <float.h> and the variable-width conversion specifier (".*"):
float f = 3.14159265358979323846;
printf("%.*f\n", FLT_DIG, f);
In one of my comments to an answer I lamented that I've long wanted some way to print all the significant digits in a floating point value in decimal form, in much the same way as the question asks. Well, I finally sat down and wrote it. It's not quite perfect, and this is demo code that prints additional information, but it mostly works for my tests. Please let me know if you (i.e. anyone) would like a copy of the whole wrapper program which drives it for testing.
static unsigned int
ilog10(uintmax_t v);
/*
* Note: As presented this demo code prints a whole line including information
* about how the form was arrived with, as well as in certain cases a couple of
* interesting details about the number, such as the number of decimal places,
* and possibly the magnitude of the value and the number of significant
* digits.
*/
void
print_decimal(double d)
{
size_t sigdig;
int dplaces;
double flintmax;
/*
* If we really want to see a plain decimal presentation with all of
* the possible significant digits of precision for a floating point
* number, then we must calculate the correct number of decimal places
* to show with "%.*f" as follows.
*
* This is in lieu of always using either full on scientific notation
* with "%e" (where the presentation is always in decimal format so we
* can directly print the maximum number of significant digits
* supported by the representation, taking into account the one digit
* represented by the leading digit)
*
* printf("%1.*e", DBL_DECIMAL_DIG - 1, d)
*
* or using the built-in human-friendly formatting with "%g" (where a
* '*' parameter is used as the number of significant digits to print
* and so we can just print exactly the maximum number supported by the
* representation)
*
* printf("%.*g", DBL_DECIMAL_DIG, d)
*
*
* N.B.: If we want the printed result to again survive a round-trip
* conversion to binary and back, and to be rounded to a human-friendly
* number, then we can only print DBL_DIG significant digits (instead
* of the larger DBL_DECIMAL_DIG digits).
*
* Note: "flintmax" here refers to the largest consecutive integer
* that can be safely stored in a floating point variable without
* losing precision.
*/
#ifdef PRINT_ROUND_TRIP_SAFE
# ifdef DBL_DIG
sigdig = DBL_DIG;
# else
sigdig = ilog10(uipow(FLT_RADIX, DBL_MANT_DIG - 1));
# endif
#else
# ifdef DBL_DECIMAL_DIG
sigdig = DBL_DECIMAL_DIG;
# else
sigdig = (size_t) lrint(ceil(DBL_MANT_DIG * log10((double) FLT_RADIX))) + 1;
# endif
#endif
flintmax = pow((double) FLT_RADIX, (double) DBL_MANT_DIG); /* xxx use uipow() */
if (d == 0.0) {
printf("z = %.*s\n", (int) sigdig + 1, "0.000000000000000000000"); /* 21 */
} else if (fabs(d) >= 0.1 &&
fabs(d) <= flintmax) {
dplaces = (int) (sigdig - (size_t) lrint(ceil(log10(ceil(fabs(d))))));
if (dplaces < 0) {
/* XXX this is likely never less than -1 */
/*
* XXX the last digit is not significant!!! XXX
*
* This should also be printed with sprintf() and edited...
*/
printf("R = %.0f [%d too many significant digits!!!, zero decimal places]\n", d, abs(dplaces));
} else if (dplaces == 0) {
/*
* The decimal fraction here is not significant and
* should always be zero (XXX I've never seen this)
*/
printf("R = %.0f [zero decimal places]\n", d);
} else {
if (fabs(d) == 1.0) {
/*
* This is a special case where the calculation
* is off by one because log10(1.0) is 0, but
* we still have the leading '1' whole digit to
* count as a significant digit.
*/
#if 0
printf("ceil(1.0) = %f, log10(ceil(1.0)) = %f, ceil(log10(ceil(1.0))) = %f\n",
ceil(fabs(d)), log10(ceil(fabs(d))), ceil(log10(ceil(fabs(d)))));
#endif
dplaces--;
}
/* this is really the "useful" range of %f */
printf("r = %.*f [%d decimal places]\n", dplaces, d, dplaces);
}
} else {
if (fabs(d) < 1.0) {
int lz;
lz = abs((int) lrint(floor(log10(fabs(d)))));
/* i.e. add # of leading zeros to the precision */
dplaces = (int) sigdig - 1 + lz;
printf("f = %.*f [%d decimal places]\n", dplaces, d, dplaces);
} else { /* d > flintmax */
size_t n;
size_t i;
char *df;
/*
* hmmmm... the easy way to suppress the "invalid",
* i.e. non-significant digits is to do a string
* replacement of all digits after the first
* DBL_DECIMAL_DIG to convert them to zeros, and to
* round the least significant digit.
*/
df = malloc((size_t) 1);
n = (size_t) snprintf(df, (size_t) 1, "%.1f", d);
n++; /* for the NUL */
df = realloc(df, n);
(void) snprintf(df, n, "%.1f", d);
if ((n - 2) > sigdig) {
/*
* XXX rounding the integer part here is "hard"
* -- we would have to convert the digits up to
* this point back into a binary format and
* round that value appropriately in order to
* do it correctly.
*/
if (df[sigdig] >= '5' && df[sigdig] <= '9') {
if (df[sigdig - 1] == '9') {
/*
* xxx fixing this is left as
* an exercise to the reader!
*/
printf("F = *** failed to round integer part at the least significant digit!!! ***\n");
free(df);
return;
} else {
df[sigdig - 1]++;
}
}
for (i = sigdig; df[i] != '.'; i++) {
df[i] = '0';
}
} else {
i = n - 1; /* less the NUL */
if (isnan(d) || isinf(d)) {
sigdig = 0; /* "nan" or "inf" */
}
}
printf("F = %.*s. [0 decimal places, %lu digits, %lu digits significant]\n",
(int) i, df, (unsigned long int) i, (unsigned long int) sigdig);
free(df);
}
}
return;
}
static unsigned int
msb(uintmax_t v)
{
unsigned int mb = 0;
while (v >>= 1) { /* unroll for more speed... (see ilog2()) */
mb++;
}
return mb;
}
static unsigned int
ilog10(uintmax_t v)
{
unsigned int r;
static unsigned long long int const PowersOf10[] =
{ 1LLU, 10LLU, 100LLU, 1000LLU, 10000LLU, 100000LLU, 1000000LLU,
10000000LLU, 100000000LLU, 1000000000LLU, 10000000000LLU,
100000000000LLU, 1000000000000LLU, 10000000000000LLU,
100000000000000LLU, 1000000000000000LLU, 10000000000000000LLU,
100000000000000000LLU, 1000000000000000000LLU,
10000000000000000000LLU };
if (!v) {
return ~0U;
}
/*
* By the relationship "log10(v) = log2(v) / log2(10)", we need to
* multiply "log2(v)" by "1 / log2(10)", which is approximately
* 1233/4096, or (1233, followed by a right shift of 12).
*
* Finally, since the result is only an approximation that may be off
* by one, the exact value is found by subtracting "v < PowersOf10[r]"
* from the result.
*/
r = ((msb(v) * 1233) >> 12) + 1;
return r - (v < PowersOf10[r]);
}
I ran a small experiment to verify that printing with DBL_DECIMAL_DIG does indeed exactly preserve the number's binary representation. It turned out that for the compilers and C libraries I tried, DBL_DECIMAL_DIG is indeed the number of digits required, and printing with even one digit fewer creates a significant problem.
#include <float.h>
#include <math.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
union {
short s[4];
double d;
} u;
void
test(int digits)
{
int i, j;
char buff[40];
double d2;
int n, num_equal, bin_equal;
srand(17);
n = num_equal = bin_equal = 0;
for (i = 0; i < 1000000; i++) {
for (j = 0; j < 4; j++)
u.s[j] = (rand() << 8) ^ rand();
if (isnan(u.d))
continue;
n++;
sprintf(buff, "%.*g", digits, u.d);
sscanf(buff, "%lg", &d2);
if (u.d == d2)
num_equal++;
if (memcmp(&u.d, &d2, sizeof(double)) == 0)
bin_equal++;
}
printf("Tested %d values with %d digits: %d found numericaly equal, %d found binary equal\n", n, digits, num_equal, bin_equal);
}
int
main()
{
test(DBL_DECIMAL_DIG);
test(DBL_DECIMAL_DIG - 1);
return 0;
}
I ran this with Microsoft's C compiler 19.00.24215.1 and gcc version 7.4.0 20170516 (Debian 6.3.0-18+deb9u1). Using one fewer decimal digit nearly halves the count of numbers that compare exactly equal. (I also verified that rand() as used indeed produces about one million different numbers.) Here are the detailed results.
Microsoft C
Tested 999507 values with 17 digits: 999507 found numericaly equal, 999507 found binary equal
Tested 999507 values with 16 digits: 545389 found numericaly equal, 545389 found binary equal
GCC
Tested 999485 values with 17 digits: 999485 found numericaly equal, 999485 found binary equal
Tested 999485 values with 16 digits: 545402 found numericaly equal, 545402 found binary equal
To my knowledge, there is a widely used algorithm for printing just the number of significant digits needed so that scanning the string back in recovers the original floating point value: dtoa.c, written by David Gay, which is available on Netlib (see also the associated paper). This code is used in Python, MySQL, Scilab, and many others.
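For illustration only, the goal of "shortest text that still round-trips" can be approximated by brute force: try increasing precisions until the parsed value matches. This is not how dtoa.c works (that code computes the digits directly and far more efficiently); it is a sketch with a hypothetical helper name, assuming IEEE 754 doubles and a correctly rounding C library.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

enum { MAX_DIGITS = 17 };   /* DBL_DECIMAL_DIG for IEEE 754 double (assumed) */

/* Hypothetical helper: leaves in buf the shortest "%.*g" text that parses
 * back to exactly x, and returns the precision that was needed. */
int shortest_round_trip(double x, char *buf, size_t size) {
    for (int prec = 1; prec <= MAX_DIGITS; prec++) {
        int len = snprintf(buf, size, "%.*g", prec, x);
        assert(len > 0 && (size_t)len < size);   /* buf was large enough */
        if (strtod(buf, NULL) == x)
            return prec;                         /* first precision that round-trips */
    }
    return MAX_DIGITS;   /* reached only for NaN, which never compares equal */
}
```

For example, 0.1 needs only one digit and 0.9375 needs four, whereas a value such as 1.0 / 3.0 needs sixteen.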