How to use float.h macros to enhance the floating point precision

As I understood from this answer, there is a way to extend the precision using float.h via the macro LDBL_MANT_DIG. My goal is to enhance the floating point precision of double values so that I can store a more accurate number, e.g., 0.000000000566666 instead of 0.000000. Kindly, can someone give a short example of how to use this macro so that I can extend the precision stored in the buffer?

Your comment about wanting to store more accurate numbers so you don't get just 0.000000 suggests that the problem is not in the storage but in the way you're printing the numbers. Consider the following code:
#include <stdio.h>
int main(void)
{
float f = 0.000000000566666F;
double d = 0.000000000566666;
long double l = 0.000000000566666L;
printf("%f %16.16f %13.6e\n", f, f, f);
printf("%f %16.16f %13.6e\n", d, d, d);
printf("%lf %16.16lf %13.6le\n", d, d, d);
printf("%Lf %16.16Lf %13.6Le\n", l, l, l);
return 0;
}
When run, it produces:
0.000000 0.0000000005666660 5.666660e-10
0.000000 0.0000000005666660 5.666660e-10
0.000000 0.0000000005666660 5.666660e-10
0.000000 0.0000000005666660 5.666660e-10
As you can see, the default "%f" format prints 6 decimal places, so the value is displayed as 0.000000. However, as the formats with more precision show, the value is stored correctly and can be displayed with more decimal places, with the %e format, or with the %g format (not shown in the code; in this example a plain %g would print 5.66666e-10, the same value as %e without the trailing zero).
The %f conversion specification, as opposed to %lf or %Lf, says 'print a double'. Note that when float values are passed to printf(), they are automatically converted to double (just as numeric types shorter than int are promoted to int). Therefore, %f can be used for both float and double types, and indeed the %lf format (which was defined in C99 — everything else was defined in C90) can be used to format float or double values. The %Lf format expects a long double.
There isn't a way to store more precision in a float or double simply by using any of the macros from <float.h>. Those are more descriptions of the characteristics of the floating-point types and the way that they behave than anything else.

The answer you cited only mentions that the macro is equal to the number of precision digits that you can store. It cannot in any way increase precision. But the macro is for "long doubles", not doubles. You can use the long double type if you need more precision than the double type:
long double x = 3.14L;
Notice the "L" after the number for specifying a long double literal.

Floating-point types are implemented in hardware. The precision is standardized across the industry and baked into the circuits of the CPU. There's no way to increase it beyond long double except an extended-precision software library such as GMP.
The good news is that floating-point numbers don't get bogged down in leading zeroes. 0.000000000566666 won't round to zero. With only six digits, you only even need a single-precision float to represent it well.
There is an issue with math.h (not float.h), where the POSIX standard fails to provide π and e with long double precision. There are a couple workarounds: GNU defines e.g. M_PIl and M_El, or you can also use the preprocessor to paste an l onto such literal constants in another library (giving the number long double type) and hope for spare digits.

Related

Is there any way to not lose the precision and still get the value?

First off, I'm a total beginner at C, with prior experience of programming in Java and Python. The goal of the program was to add 2 numbers. While I was playing with the code, I encountered an issue with precision. The issue appeared when I added 2 numbers: one of float data type and the other of double data type.
Code:
#include <stdio.h>
int main() {
double b=20.12345678;
float c=30.1234f;
printf("The Sum of %.8f and %.4f is= %.8f\n", b, c, b+c);
return 0;
}
Output:
The Sum of 20.12345678 and 30.1234 is= 50.24685651
However, the correct output should be: 50.24685678
float values are accurate up to 6 decimal places, and so is the output.
I tried casting the values explicitly to double type, but it's still of no use.
PS: When I convert the variable type from float to double, the output is precise; but is there any other way to add float and double values without messing with their data type?
Thank You.

float only guarantees 6 decimal digits of precision, so any computation with a float (even if the other operands are double, even if you're storing the result to a double) will only be precise to 6 digits.
If you need greater precision, then limit yourself to double or long double. The C standard only guarantees 10 decimal digits of precision for those types, although a typical IEEE 754 double provides 15. If you need more than the native types offer, then you'll need to use something other than the native floating point types and library functions. You'll either need to roll your own, or use an arbitrary precision math library like GNU MP.

The value assigned to c can't be expressed exactly so it gets assigned the next closest value. You don't see that when printing to 4 decimal places but you do see it if you print 8:
printf("The Sum of %.8f and %.8f is= %.8f\n", b, c, b+c);
Output:
The Sum of 20.12345678 and 30.12339973 is= 50.24685651
So the constant 30.1234f is already imprecise enough for the calculation you're trying to do.

How do I print in double precision?

I'm completely new to C and I'm trying to complete an assignment. The exercise is to print tan(x) with x incrementing from 0 to pi/2.
We need to print this in float and double. I wrote a program that seems to work, but I only printed floats, while I expected double.
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
int main()
{
double x;
double pi;
pi = M_PI;
for (x = 0; x<=pi/2; x+= pi/20)
{
printf("x = %lf, tan = %lf\n",x, tan(x));
}
exit(0);
}
My question is:
Why do I get floats, while I defined the variables as double and used %lf in the printf function?
What do I need to change to get doubles as output?

"...but I only printed floats, while I expected double"
You are actually outputting double values.
float arguments to variadic functions (including printf()) are implicitly promoted to double.
So even if your statement
printf("x = %lf, tan = %lf\n",x, tan(x));
were changed to:
printf("x = %f, tan = %f\n",x, tan(x));
It would still output double values, as both "%f" and "%lf" are double format specifiers for printf(). (This does not carry over to scanf(), where "%f" expects a float * and "%lf" expects a double *.)
Edit to address following statement/questions in comments:
"I know that a double notation has 15 digits of [precision]."
Yes. But there is a difference between the actual IEEE 754 specified characteristics of the float/double data types, and the way that they can be made to appear using format specifiers in the printf() function.
In simplest terms:
double has roughly double (2x) the precision of a float.
float is a 32 bit IEEE 754 single precision Floating Point Number with 1 bit for the sign, 8 bits for the exponent, and 24* bits for the value, resulting in roughly 7 decimal digits of precision.
double is a 64 bit IEEE 754 double precision Floating Point Number with 1 bit for the sign, 11 bits for the exponent, and 53* bits for the value resulting in 15 decimal digits of precision.
* - including the implicit bit (which always equals 1 for normal numbers, and 0 for subnormal numbers. This implicit bit is not stored in memory), but not the sign bit.
"...But with %.20f I was able to print more digits, how is that possible and where do the digits come from?"
The extra digits come from the binary representation: the stored value is the binary fraction closest to the decimal number you wrote, and a large precision specifier such as .20 prints the decimal expansion of that binary value well beyond the digits that are meaningful.
Although precision specifiers have their rightful place, they can also produce misleading results.

Why do I get floats, while I defined the variables as double and used %lf in the printf function?
Code is not getting "floats", output is simply text. Even if the argument coded is a float or a double, the output is the text translation of the floating point number - often rounded.
printf() simply follows the behavior of "%lf": print a floating point value with 6 places after the decimal point. With printf(), "%lf" performs exactly like "%f".
printf("%lf\n%lf\n%f\n%f\n", 123.45, 123.45f, 123.45, 123.45f);
// 123.450000
// 123.449997
// 123.450000
// 123.449997
What do I need to change to get doubles as output?
Nothing, the output is text, not double. To see more digits, print with greater precision.
printf("%.50f\n%.25f\n", 123.45, 123.45f);
// 123.45000000000000284217094304040074348449710000000000
// 123.4499969482421875000000000
how do I manipulate the code so that my output is in float notation?
Try "%e", "%a" for exponential notation. For a better idea of how many digits to print: Printf width specifier to maintain precision of floating-point value.
printf("%.50e\n%.25e\n", 123.45, 123.45f);
printf("%a\n%a\n", 123.45, 123.45f);
// 1.23450000000000002842170943040400743484497100000000e+02
// 1.2344999694824218750000000e+02
// 0x1.edccccccccccdp+6
// 0x1.edccccp+6
printf("%.*e\n%.*e\n", DBL_DECIMAL_DIG-1, 123.45, FLT_DECIMAL_DIG-1,123.45f);
// 1.2345000000000000e+02
// 1.23449997e+02

How can I fix my floats being rounded down to doubles?

I know that by default in C when you declare a float it gets automatically saved as a double and that if you want it to be saved as a float you have to declare it like this
float x = 0.11f;
but what if my x value comes from a scanf? How can I do so that when I print it it doesn't get rounded down or up?
Here's my code btw, thanks for the help.
#include <stdio.h>
int main() {
float number = 0;
float comparison;
do{
printf("\nEnter a number: ");
scanf("%f", &comparison);
if(comparison > number) {
number = comparison;
}
}while(comparison > 0);
printf("The largest number entered was: %f\n\n", number);
}

what if my x value comes from a scanf? How can I do so that when I print it it doesn't get rounded down or up?
scanf with an %f directive will read the input and convert it to a float (not a double). If the matched text does not correspond to a number exactly representable as a float then there will be rounding at this stage. There is no alternative.
When you pass an argument of type float to printf() for printing, it will be promoted to type double. This is required by the signature of that function. But type double can exactly represent all values of type float, so this promotion does not involve any rounding. printf's handling of the %f directives is aligned with this automatic promotion: the corresponding (promoted) argument is expected to be of type double.
There are multiple avenues to reproducing the input exactly, depending on what constraints you are willing to put on that input. The most general is to read, store, and print the data as a string, though even this has its complications.
If you are willing to place a limit on the maximum decimal range and precision for which verbatim reproduction is supported, then you may be able to get output rounded to the same representation as the input by specifying a precision in your printf field directives:
float f;
scanf("%f", &f);
printf("%f %.2f %5.2f\n", f, f, f);
If you want to use a built-in floating-point format and also avoid trailing zeroes being appended then either an explicit precision like that or a %g directive is probably needed:
printf("%f %g\n", f, f);
Other alternatives are more involved, such as creating a fixed-point or arbitrary-precision decimal data type, along with appropriate functions for reading and writing it. I presume that goes beyond what you're presently interested in doing.
Note: "double" is short for "double precision", as opposed to notionally single-precision "float". The former is the larger type in terms of storage and representational capability. In real-world implementations, there is never any "rounding down" from float to double.

C - Long double min and max value [duplicate]

I'm working with C, I have to do an exercise in which I have to print the value of long double min and long double max.
I used float.h as header, but these two macros (LDBL_MIN/MAX) give me the same value as if it was just a double.
I'm using Visual Studio 2015 and if I hover the mouse on LDBL_MIN it says #define LDBL_MIN DBL_MIN. Is that why it prints DBL_MIN instead of LDBL_MIN?
How can I fix this problem?
printf("Type: Long Double Value: %lf Min: %e Max: %e Memory:%lu\n",
val10, LDBL_MIN, LDBL_MAX, longd_size);
It is a problem because my assignment requires two different values for LDBL and DBL.

C does not specify that long double must have a greater precision/range than double.
Even if the implementation treats them as different types, they may have the same implementation, range, precision, min value, max value, etc.
Concerning Visual Studio, the Microsoft documentation on long double helps: in that compiler, long double has the same representation as double.
To fix the problem, use another compiler that supports long double with a greater precision/range than double. Perhaps GCC?

From this reference on the floating point types:
long double - extended precision floating point type. Matches IEEE-754 extended floating-point type if supported, otherwise matches some non-standard extended floating-point type as long as its precision is better than double and range is at least as good as double, otherwise matches the type double. Some x86 and x86_64 implementations use the 80-bit x87 floating point type.
Added emphasis is mine.
What the above quote says is that while a compliant C compiler must have the long double type, it doesn't really have to support it differently than double. Something which is probably the case with the Visual Studio C compiler.

Those macros are either broken, or long double is just an alias for double on your system. To test, set a long double to DBL_MAX, multiply by two, then subtract DBL_MAX from it. If the result is finite, then you have extra exponent space in the long double. If not, and long double is bigger than double, the extra bytes could just be padding, or you could have the same exponent space and more precision. So LDBL_MAX's genuine value will be just a smidgen over DBL_MAX.
The easiest way to generate the max is simply to look up the binary representation. However if you want to do it in portable C, you can probe it by repeated multiplications to get the magnitude, then fill out the mantissa by repeatedly adding descending powers of two until you run out of precision.

Pointers in C programming with double precision

OK, so this is what I must do, but I can't make it work:
a) Change to float instead of integers. And assign 0.3 as starting value to "u".
b) Use double precision instead of integers. Assign 0.3x10^45 as starting value for "u".
c) Use characters instead of integers. Assign starting value as 'C' for "u".
#include <stdio.h>
main ()
{
int u = 3;
int v;
int *pu;
int *pv;
pu = &u;
v = *pu;
pv = &v;
printf("\nu=%d &u=%X pu=%X *pu=%d", u, &u, pu, *pu);
printf("\n\nv=%d &v=%X pv=%X *pv=%d", v, &v, pv, *pv);
}
I'll be really grateful if anyone could modify my code to do the things above. Thanks

This question is testing a few things. First do you know your types? You are expected to know that a floating-point number is declared with float, a double precision number with double, and a character with char.
Second you are expected to know how to assign a literal value to those different types. For the float literal you are probably expected to use 0.3f, since without that suffix it would be double precision by default (although in this context it isn't going to make any difference). For the double, you are expected to know how to use scientific notation (the literal value should be 0.3e45). The character literal I would hope is fairly obvious to you.
Finally you are expected to know the various type characters used in the printf format specification. Both single and double precision numbers use the same type characters, but you have a choice of %e, %f or %g, depending on your requirements. I tend to use %g as a good general purpose choice, but my guess is they are expecting you to use %e for the double (because that forces the use of scientific notation) and possibly %f for the float - it depends what you have been taught. For a character you use %c.
Also, note that you should only be replacing the %d type characters in the format strings. The %X values are used to output a hexadecimal representation of the pointers (&u and pu). A pointer isn't going to change into a floating point value or a character just because the type that is being pointed to has changed; an address is still printed as an address. (Strictly speaking, the portable way to print a pointer is %p with the argument cast to void *, since %X expects an unsigned int.)