Calculate the range of double - c

As part of an exercise from "The C Programming Language", I am trying to find a way to calculate the maximum possible float and the maximum possible double on my computer. The technique shown below works for float (it finds the max float) but not for double:
// max float:
float f = 1.0;
float last_f;
float step = 9.0;
while(1) {
last_f = f;
f *= (1.0 + step);
while (f == INFINITY) {
step /= 2.0;
f = last_f * (1.0 + step);
}
if (! (f > last_f) )
break;
}
printf("calculated float max : %e\n", last_f);
printf("limits.h float max : %e\n", FLT_MAX);
printf("diff : %e\n", FLT_MAX - last_f);
printf("The expected value? : %s\n\n", (FLT_MAX == last_f)? "yes":"no");
// max double:
double d = 1.0;
double last_d;
double step_d = 9.0;
while(1) {
last_d = d;
d *= (1.0 + step_d);
while (d == INFINITY) {
step_d /= 2.0;
d = last_d * (1.0 + step_d);
}
if (! (d > last_d) )
break;
}
printf("calculated double max: %e\n", last_d);
printf("limits.h double max : %e\n", DBL_MAX);
printf("diff : %e\n", DBL_MAX - last_d);
printf("The expected value? : %s\n\n", (DBL_MAX == last_d)? "yes":"no");
This results in:
calculated float max : 3.402823e+38
limits.h float max : 3.402823e+38
diff : 0.000000e+00
The expected value? : yes
calculated double max: 1.797693e+308
limits.h double max : 1.797693e+308
diff : 1.995840e+292
The expected value? : no
It looks to me like it still calculates using single precision in the second case.
What am I missing?

OP's approach works when calculations are done with wider precision than float in the first case and wider than double in the second case.
In the first case, OP reports FLT_EVAL_METHOD == 0, so float calculations are done as float and double calculations are done as double. Note that although step is a float, 1.0 + step is a double calculation because 1.0 is a double constant.
The code below forces every intermediate result to double, so I can replicate OP's problem even with my FLT_EVAL_METHOD == 2 (long double is used for internal calculations).
volatile double d = 1.0;
volatile double last_d;
volatile double step_d = 9.0;
while(1) {
last_d = d;
d *= (1.0 + step_d);
while (d == INFINITY) {
step_d /= 2.0;
volatile double sum = 1.0 + step_d;
d = last_d * sum;
//d = last_d + step_d*last_d;
}
if (! (d > last_d) ) {
break;
}
}
diff : 1.995840e+292
The expected value? : no
Instead, OP should use the following, which does not form the inexact sum 1.0 + step_d when step_d is small. Rather, it forms the product step_d*last_d, which keeps its relative accuracy however small step_d becomes. The second form results in a more accurate calculation of the new d by providing an additional bit of calculation precision in d. Higher-precision FP is not needed to employ OP's approach.
d = last_d + step_d*last_d;
diff : 0x0p+0 0.000000e+00
The expected value? : yes

The expressions with the literals n.0 are all double precision floating point types. That allows the assignment to f to be calculated using a higher precision intermediate value.
It's this effect that allows the algorithm to converge in the float case.
With strict double precision floating point such convergence is not possible.
If you had used the f suffix on the literals in the float case then convergence would not occur there either.
A fix would be to use long double suffixes on the literals if your platform has a wider long double type.

Related

Print Value of pi. How many terms of this series do you have to use before you first get 3.14? 3.141? 3.1415? 3.14159?

#include <stdio.h>
#include <math.h>
int main(void) {
// Set the initial value of pi to 0
double pi = 0.0;
// Set the initial value of the term to 1
double term = 1.0;
// Set the initial value of the divisor to 1
double divisor = 1.0;
// Print the table header
printf("%10s%25s\n", "Number of terms", "Approximation of pi");
// Calculate and print the approximations of pi
for (int i = 1; i <= 20; i++) {
pi += term / divisor;
printf("%10d%25.10f\n", i, pi*4.0);
term *= -1.0;
divisor += 2.0;
}
return 0;
}
I tried to correct the code, but I still can't get as close to the value as my teacher asks for in our assignment.
The question is:
Calculate the value of π from the infinite series. Print a table that
shows the value of π approximated by one term of this series, by two terms, by three terms,
and so on. How many terms of this series do you have to use before you first get 3.14?
3.141? 3.1415? 3.14159?
How many terms of this series do you have to use before you first get 3.14? 3.141? 3.1415? 3.14159?
The details of "first get 3.14" are a bit unclear. The code below attempts something like OP's goal and illustrates the slow convergence; computation time is proportional to the number of terms.
With a high term count, the round-off errors incurred by each division and addition eventually make this computation too inaccurate.
#include <math.h>
#include <stdio.h>
int main(void) {
double pi_true = 3.1415926535897932384626433832795;
double threshold = 0.5;
int dp = 0;
// Set the initial value of pi to 0
double pi = 0.0;
// Set the initial value of the term to 1
double term = 1.0;
// Set the initial value of the divisor to 1
double divisor = 1.0;
// Print the table header
printf("%7s %12s %-25.16f\n", "", "", pi_true);
printf("%7s %12s %-25s\n", "", "# of terms", "Approximation of pi");
// Calculate and print the approximations of pi
for (long long i = 1; ; i++) {
pi += term / divisor;
double diff = fabs(4*pi - pi_true);
if (diff <= threshold) {
printf("%7.1e %12lld %-25.16f %-25.*f\n", diff, i, pi * 4.0, dp++, pi * 4.0);
fflush(stdout);
threshold /= 10;
if (4*pi == pi_true) {
break;
}
}
term *= -1.0;
divisor += 2.0;
}
puts("Done");
return 0;
}
Output
3.1415926535897931
# of terms Approximation of pi
4.7e-01 2 2.6666666666666670 3
5.0e-02 20 3.0916238066678399 3.1
5.0e-03 200 3.1365926848388161 3.14
5.0e-04 2000 3.1410926536210413 3.141
5.0e-05 20000 3.1415426535898248 3.1415
5.0e-06 200001 3.1415976535647618 3.14160
5.0e-07 2000001 3.1415931535894743 3.141593
5.0e-08 19999992 3.1415926035897974 3.1415926
5.0e-09 199984633 3.1415926585897931 3.14159266
5.0e-10 1993125509 3.1415926540897927 3.141592654
5.0e-11 19446391919 3.1415926536397927 3.1415926536
...
Ref 3.1415926535897931
On a second attempt, perhaps this is closer to OP's goal:
#include <stdio.h>
#include <stdlib.h>
int main(void) {
double pi_true = 3.1415926535897932384626433832795;
double threshold_lo = 2.5;
double threshold_hi = 3.5;
double error_band = 0.5;
int dp = 0;
// Set the initial value of pi to 0
double pi = 0.0;
// Set the initial value of the term to 4
double term = 4.0;
// Set the initial value of the divisor to 1
double divisor = 1.0;
// Print the table header
printf("%12s %-25.16f\n", "", pi_true);
printf("%12s %-25s\n", "# of terms", "Approximation of pi");
// Calculate and print the approximations of pi
for (long long i = 1;; i++) {
pi += term / divisor;
if (pi > threshold_lo && pi < threshold_hi) {
printf("%12lld %-25.16f %-25.*f\n", i, pi, dp++, pi);
fflush(stdout);
char buf[100] = "3.1415926535897932384626433832795";
buf[dp + 2] = 0;
error_band /= 10.0;
double target = atof(buf);
threshold_lo = target - error_band;
threshold_hi = target + error_band;
}
term *= -1.0;
divisor += 2.0;
}
puts("Done");
return 0;
}
Output
3.1415926535897931
# of terms Approximation of pi
2 2.6666666666666670 3
12 3.0584027659273332 3.1
152 3.1350137774059244 3.14
916 3.1405009508583017 3.141
7010 3.1414500002381582 3.1415
130658 3.1415850000208838 3.14159
866860 3.1415915000009238 3.141592
9653464 3.1415925500000141 3.1415926
116423306 3.1415926450000007 3.14159265
919102060 3.1415926525000004 3.141592653
7234029994 3.1415926534500005 3.1415926535

Error when calculating max value for double variable in C

I followed the solution here: How to Calculate Double + Float Precision and have been unable to calculate the maximum value for variables of type double.
I run:
double dbl_max = pow(2, pow(2, 10)) * (1-pow(2, -53));
printf("%.2e", dbl_max);
Result: inf
Or:
double dbl_max = (pow(2, pow(2, 10)));
printf("%.2e", dbl_max);
Result: inf
Or:
double dbl_max = pow(2, pow(2, 9)) * (1-pow(2, -53));
printf("%.2e", dbl_max);
Result: 1.34e+154
Why isn't the calculation fitting into the variable? The top sample above works just fine for float variables.
The intermediate exponent is one too high. Change pow(2, 10) to (pow(2, 10) - 1) and it should work. You can compensate by multiplying the final result by 2. – Tom Karzes
double dbl_max = pow(2, pow(2, 10)-1) * (1-pow(2, -53)) * 2;
printf("%.2e", dbl_max);

Division issues in C

I don't really know how to explain this (that's why the title is so vague), but I need a way to make C divide in a certain way: the quotient should have no decimals (everything else goes into the remainder). For example:
Instead of 5.21 / .25 = 20.84
I need this 5.21 / .25 = *20* Remainder = *.21*
I found out how to get the remainder with fmod(), but how do I find the 20?
Thanks ~
How about using implicit conversions?
float k = 5.21 / .25;
int n = k;
k -= n;
This results in:
k = .84
n = 20
Using only an int will also do the job if you don't need the remainder:
int k = 5.21 / .25;
This truncates the quotient automatically, giving k = 20.
Use double modf(double value, double *iptr) to extract the integer portion of a FP number.
The modf functions break the argument value into integral and fractional parts, each of which has the same type and sign as the argument. C11 §7.12.6.12 2
#include <math.h>
#include <stdio.h>
int main() {
double a = 5.21;
double b = 0.25;
double q = a / b;
double r = fmod(a, b);
printf("quotient: %f\n", q);
printf("remainder: %f\n", r);
double ipart;
double fpart = modf(q, &ipart);
printf("quotient i part: %f\n", ipart);
printf("quotient f part: %f\n", fpart);
return 0;
}
Output
quotient: 20.840000
remainder: 0.210000
quotient i part: 20.000000
quotient f part: 0.840000
Using int is problematic due to a limited range, precision and sign issues.

casting signed to double different result than casting to float then double

As part of an assignment, I am working out whether the expression (double) (float) x == (double) x
always returns 1 or not (x is a signed integer).
It works for every value except INT_MAX. I was wondering why that is. If I print the values, they both show the same value, even for INT_MAX.
x = INT_MAX ;
printf("Signed X: %d\n",x);
float fx1 = (float)x;
double dx1 = (double)x;
double dfx = (double)(float)x;
printf("(double) x: %g\n",dx1);
printf("(float) x: %f \n",fx1);
printf("(double)(float)x: %g\n",dfx);
if((double) (float) x == (double) x){
printf("RESULT:%d\n", ((double)(float) x == (double) x));
}
EDIT: the entire program:
#include<stdio.h>
#include<stdlib.h>
#include<limits.h>
int main(int argc, char *argv[]){
//create random values
int x = INT_MAX ;
printf("Signed X: %d\n",x);
float fx1 = (float)x;
double dx1 = (double)x;
double dfx = (double)(float)x;
printf("(double) x: %g\n",dx1);
printf("(float) x: %f \n",fx1);
printf("(double)(float)x: %g\n",dfx);
if((double) (float) x == (double) x){
printf("RESULT:%d\n", ((double)(float) x == (double) x));
}
return 0;
}//end of main function
int and float most likely have the same number of bits in their representation, namely 32. A float has a mantissa, an exponent and a sign bit, so the mantissa must have fewer than the 31 bits needed for the bigger int values like INT_MAX. So there is a loss of precision when such a value is stored in a float.

Manually implementing a rounding function in C

I have written a C program (which is part of my project) to round off a float value to the given precision specified by the user. The function is something like this
float round_offf (float num, int precision)
What I have done in this program is convert the float number into a string and then processed it.
But is there a way to keep the number as a float and implement the same thing?
Eg. num = 4.445 prec = 1 result = 4.4
Of course there is. Very simple:
#include <math.h>
float custom_round(float num, int prec)
{
int trunc = round(num * pow(10, prec));
return (float)trunc / pow(10, prec);
}
Edit: it seems to me that you want this because you think you can't have dynamic precision in a format string. Apparently, you can:
int precision = 3;
double pie = 3.14159265358979323846; // I'm hungry, I need a double pie
printf("Pi equals %.*lf\n", precision, pie);
This prints 3.142.
Yes:
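#include <math.h>
float round_offf(float num, int precision)
{
int result;
int power;
/* Scale by one extra decimal digit so that digit can decide the rounding. */
power = (int)round(pow(10, precision + 1));
result = (int)(num * power);
/* Round half up; this simple test assumes num is non-negative. */
if ((result % 10) >= 5)
result += 10;
/* Drop the extra digit. */
result /= 10;
/* result now holds 'precision' decimals, so divide by 10^precision. */
return ((float)result / (float)(power / 10));
}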

Resources