How to set precision of a float - c

Can someone explain to me how to choose the precision of a float with a C function?
Examples:
theFatFunction(0.666666666, 3) returns 0.667
theFatFunction(0.111111111, 3) returns 0.111

You can't do that, since precision is determined by the data type (i.e. float or double or long double). If you want to round it for printing purposes, you can use the proper format specifiers in printf(), i.e. printf("%0.3f\n", 0.666666666).

You can't. Precision depends entirely on the data type. You've got float and double and that's it.

Floats have a fixed precision that you can't change. What you can sometimes do is round the number.
To round, consider scaling the value by a power of 10 yourself. Note that not all numbers are exactly representable as floats, either.

Most systems follow IEEE-754 floating point standard which defines several floating point types.
On these systems, float is usually the IEEE-754 binary32 single precision type: it has 24 bits of precision. double is the binary64 double precision type; it has 53 bits of precision. The precision in bits is defined by the IEEE-754 standard and cannot be changed.
When you print values of floating point types using functions of the fprintf family (e.g., printf), the precision of the f conversion is the number of digits printed after the decimal point, and it defaults to 6. You can change it with a . followed by a decimal number in the conversion specification. For example:
printf("%.10f\n", 4.0 * atan(1.0)); // prints 3.1415926536
whereas
printf("%f\n", 4.0 * atan(1.0)); // prints 3.141593

It might be roughly the following steps:
Add 0.0005 to 0.666666666 (we get 0.667166666)
Multiply by 1000 (we get 667.166666)
Convert the number to an int, which truncates the fraction (we get 667)
Convert it back to float (we get 667.0)
Divide by 1000 (we get 0.667)
Thank you.

Precision is determined by the data type (i.e. float or double or long double).
If you want to round it for printing purposes, you can use the proper format specifiers in printf(), e.g.
printf("%0.3f\n", 0.666666666); // prints 0.667 in C
If you want to round it for calculating purposes, first multiply the float by 10^(number of digits), cast it to int, do the calculation, then cast back to float and divide by the same power of 10:
float f=0.66666;
f *= 1000; // 666.660
int i = (int)f; // 666
i = 2*i; // 1332
f = i; // 1332
f /= 1000; // 1.332
printf("%f",f); //1.332000

Related

How do I print in double precision?

I'm completely new to C and I'm trying to complete an assignment. The exercise is to print tan(x) with x incrementing from 0 to pi/2.
We need to print this in float and double. I wrote a program that seems to work, but I only printed floats, while I expected double.
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
int main()
{
double x;
double pi;
pi = M_PI;
for (x = 0; x<=pi/2; x+= pi/20)
{
printf("x = %lf, tan = %lf\n",x, tan(x));
}
exit(0);
}
My question is:
Why do I get floats, while I defined the variables as double and used %lf in the printf function?
What do I need to change to get doubles as output?
"...but I only printed floats, while I expected double"
You are actually outputting double values.
float arguments to variadic functions (including printf()) are implicitly promoted to double.
So even if your statement
printf("x = %lf, tan = %lf\n",x, tan(x));
were changed to:
printf("x = %f, tan = %f\n",x, tan(x));
It would still output double, as both "%f" and "%lf" are double format specifiers for printf(). (This holds for the printf family only; scanf() does distinguish "%f" from "%lf".)
Edit to address following statement/questions in comments:
"I know that a double notation has 15 digits of [precision]."
Yes. But there is a difference between the actual IEEE 754 specified characteristics of the float/double data types, and the way that they can be made to appear using format specifiers in the printf() function.
In simplest terms:
double has roughly double (2x) the precision of a float.
float is a 32 bit IEEE 754 single precision Floating Point Number with 1 bit for the sign, 8 bits for the exponent, and 24* bits for the value, resulting in 7 decimal digits of precision.
double is a 64 bit IEEE 754 double precision Floating Point Number with 1 bit for the sign, 11 bits for the exponent, and 53* bits for the value resulting in 15 decimal digits of precision.
* - including the implicit bit (which always equals 1 for normal numbers, and 0 for subnormal numbers. This implicit bit is not stored in memory), but not the sign bit.
"...But with %.20f I was able to print more digits, how is that possible and where do the digits come from?"
The extra digits come from the binary representation: the stored value is the nearest binary fraction to the decimal number you wrote, and a large precision specifier forces printf() to display the exact decimal expansion of that stored value, which is more than the number you intended to represent.
Although precision specifiers have their rightful place, they can also produce misleading results.
Why do I get floats, while I defined the variables as double and used %lf in the printf function?
Code is not getting "floats", output is simply text. Even if the argument coded is a float or a double, the output is the text translation of the floating point number - often rounded.
printf() simply follows the behavior of "%lf": print a floating point value with 6 places after the decimal point. With printf(), "%lf" performs exactly like "%f".
printf("%lf\n%lf\n%f\n%f\n", 123.45, 123.45f, 123.45, 123.45f);
// 123.450000
// 123.449997
// 123.450000
// 123.449997
What do I need to change to get doubles as output?
Nothing, the output is text, not double. To see more digits, print with greater precision.
printf("%.50f\n%.25f\n", 123.45, 123.45f);
// 123.45000000000000284217094304040074348449710000000000
// 123.4499969482421875000000000
how do I manipulate the code so that my output is in float notation?
Try "%e", "%a" for exponential notation. For a better idea of how many digits to print, see the related question "Printf width specifier to maintain precision of floating-point value".
printf("%.50e\n%.25e\n", 123.45, 123.45f);
printf("%a\n%a\n", 123.45, 123.45f);
// 1.23450000000000002842170943040400743484497100000000e+02
// 1.2344999694824218750000000e+02
// 0x1.edccccccccccdp+6
// 0x1.edccccp+6
printf("%.*e\n%.*e\n", DBL_DECIMAL_DIG-1, 123.45, FLT_DECIMAL_DIG-1,123.45f);
// 1.2345000000000000e+02
// 1.23449997e+02

Why casting double to int might give different results?

I am using fixed decimal point number (using uint16_t) to store percentage with 2 fractional digits. I have found that the way I am casting the double value to integer makes a difference in the resulting value.
const char* testString = "99.85";
double percent = atof(testString);
double hundred = 100;
uint16_t reInt1 = (uint16_t)(hundred * percent);
double stagedDouble = hundred * percent;
uint16_t reInt2 = (uint16_t)stagedDouble;
Example output:
percent: 99.850000
stagedDouble: 9985.000000
reInt1: 9984
reInt2: 9985
The error is visible in about 47% of all values between 0 and 10000 (of the fixed point representation). It does not appear at all when casting with stagedDouble. And I do not understand why the two integers are different. I am using GCC 6.3.0.
Edit:
Improved code snippet to demonstrate the percent variable and to unify the coefficient between the two statements. Changing 100 into a double looks like a qualitative change that might affect the output, but it does not change a thing in my program.
Is percent a float? If so, look at what types you're multiplying.
reInt1 is double * float and stagedDouble is int * float. Mixing up floating point math can cause these types of rounding errors.
Changing the 100's to be both double or both int results in the same answer.
The reported behavior is consistent with percent being declared float, and the use of IEEE-754 basic 32-bit and 64-bit binary floating-point for float and double.
uint16_t reInt1 = (uint16_t)(100.0 * percent);
Since 100.0 is a double constant, this converts percent to double, performs a multiplication in double, and converts the result to uint16_t. The multiplication may have a very slight rounding error, up to ½ ULP of the double format, a relative error around 2^−53.
double stagedDouble = 100 * percent;
uint16_t reInt2 = (uint16_t)stagedDouble;
Since 100 is an int constant, this converts 100 to float, performs a multiplication in float, converts the result to double for stagedDouble, and then converts that to uint16_t. The rounding error in the multiplication may be up to ½ ULP of the float format, a relative error around 2^−24.
Since all of the values are near hundredths of an integer, a 50:50 ratio of errors up:down would make about half the results just under what is needed for the integer threshold. In the multiplications, all those with values that are 0, 25, 50, or 75 one-hundredths would be exact (because 25/100 is ¼, which is exactly representable in binary floating-point), so 96/100 would have rounding errors. If the directions of the float and double rounding errors behave as independent, uniform random variables, about half would round in different directions, producing different results, giving about 48% mismatches, which is consistent with the 47% reported in the question.
(However, when I measure the actual results, I get 42% differences between the float and double methods. I suspect that has something to do with the trailing bits in the float multiplication before rounding—the distribution might not act like a uniform distribution of two possibilities. It may be the OP’s code prepares the percent values in some way other than dividing an integer value by 100.)

large numbers and float and double in C

I need to deal with very large matrices and/or large numbers and I don't know why
double result = 2251.000000 * 9488.000000 + 7887.000000 * 8397.000000;
gives me the correct output of 87584627.000000.
Same with int result.
However, if I use float result = 2251.000000f + ... etc,
it gives me 87584624.000000 and I have no idea why!
Can somebody tell me what I'm missing?
The most common format for floating point numbers in C is the IEEE-754 format. The binary32 format corresponds to a float, and the binary64 format corresponds to a double.
A float has just over 7 decimal digits of precision. Since the answer to your equation has 8 significant digits, it is not guaranteed to be exactly representable as a float, and in this case it is not: the nearest float is 87584624.
A double has almost 16 decimal digits of precision, and therefore does have an exact representation of the answer. Therefore, in general, when you are doing general purpose mathematics, you should be using doubles. However, it's important to note that even a double may not have enough precision for every application. For example, the national debt of the United States is 18,149,752,816,959.61 which barely fits into a double.

How do I round off a Float value in C

I'm looking for a function that can round off a float value in C. Suppose I have the number 0.1153846; it should be rounded to 6 decimal places, producing 0.115385.
There are functions like lroundf(), but I'm not sure how to use them in my context.
I'm on gcc compiler and any help would be much appreciated.
float f = 0.1153846f;
f = roundf(f * 1000000) / 1000000; // roundf() rounds to nearest; floor() would truncate to 0.115384
This should work.
You might want to do
double x = 0.1153846;
double rx = round (x * 1e6) * 1.e-6;
However, remember that IEEE 754 floating point numbers are binary, with a base 2 mantissa.
It's quite unusual for float to be a decimal type. That means that whatever you do, the result of the rounding will more often than not be unrepresentable in a float and will be adjusted again to match a representable number.
If such rounding is really needed for computational purposes, a floating point type is probably not the correct type to use.
If it is just for display purposes, use the control the printf family gives you.
int precision = 3;
float num = 1.63322;
printf("%.*f",precision,num);

Multiplying two floats doesn't give exact result

I am trying to multiply two floats as follows:
float number1 = 321.12;
float number2 = 345.34;
float result = number1 * number2;
The result I want to see is 110895.582, but when I run the code it just gives me 110896. I run into this issue most of the time. Any calculator gives me the exact result with all decimals. How can I achieve that result?
edit : It's C code. I'm using XCode iOS simulator.
There's a lot of rounding going on.
float a = 321.12; // this number will be rounded
float b = 345.34; // this number will also be rounded
float r = a * b; // and this number will be rounded too
printf("%.15f\n", r);
I get 110895.578125000000000 after the three separate roundings.
If you want more than 6 decimal digits' worth of precision, you will have to use double and not float. (Note that I said "decimal digits' worth", because you don't get decimal digits, you get binary.) As it stands, 1/2 ULP of error (a worst-case bound for a perfectly rounded result) is about 0.004.
If you want exactly rounded decimal numbers, you will have to use a specialized decimal library for such a task. A double has more than enough precision for scientists, but if you work with money everything has to be 100% exact. No floating point numbers for money.
Unlike integers, floating point numbers take some real work before you can get accustomed to their pitfalls. See "What Every Computer Scientist Should Know About Floating-Point Arithmetic", which is the classic introduction to the topic.
Edit: Actually, I'm not sure that the code rounds three times. It might round five times, since the constants for a and b might be rounded first to double-precision and then to single-precision when they are stored. But I don't know the rules of this part of C very well.
You will never get the exact result that way.
First of all, number1 ≠ 321.12 because that value cannot be represented exactly in a base-2 system. You'll need an infinite number of bits for it.
The same holds for number2 ≠ 345.34.
So you start with inexact values.
Then the product will get rounded because multiplication gives you double the number of significant digits but the product has to be stored in float again if you multiply floats.
You probably want to use a base-10 representation for your numbers. Or, if your numbers only have 2 decimal digits in the fractional part, you can use integers (32-bit integers are sufficient in this case, but you may end up needing 64-bit):
32112 * 34534 = 1108955808.
That represents 321.12 * 345.34 = 110895.5808.
Since you are using C you could easily set the precision by using "%.xf" where x is the wanted precision.
For example:
float n1 = 321.12;
float n2 = 345.34;
float result = n1 * n2;
printf("%.20f", result);
Output:
110895.57812500000000000000
However, note that float only gives six digits of precision. For better precision use double.
Floating point variables are only an approximate representation, not a precise one. Not every number can "fit" into a float variable. For example, there is no way to put 1/10 (0.1) into a binary variable, just like it's not possible to put 1/3 into a decimal one (you can only approximate it with an endless 0.33333...).
When outputting such variables, it's usual to apply several rounding options. Unless you set them all, you can never be sure which of them are applied. This is especially true for the C++ stream << operators, as the stream can be told how to round BEFORE <<.
Printf also does some rounding. Consider http://codepad.org/LLweoeHp:
float t = 0.1f;
printf("result: %f\n", t);
--
result: 0.100000
Well, it looks fine. Why? Because printf defaulted to some precision and rounded the output. Let's dial in 50 places after the decimal point: http://codepad.org/frUPOvcI
float t = 0.1f;
printf("result: %.50f\n", t);
--
result: 0.10000000149011611938476562500000000000000000000000
That's different, isn't it? The digits up to 625 are the exact decimal value of the binary number the float actually stores. That expansion ends there, which is why we see zeroes afterwards.
A double can hold more digits, but 0.1 in binary is not finite. Double has to give up, eventually: http://codepad.org/RAd7Yu2r
double t = 0.1;
printf("result: %.70f\n", t);
--
result: 0.1000000000000000055511151231257827021181583404541015625000000000000000
In your example, 321.12 alone is enough to cause trouble: http://codepad.org/cgw3vUKn
float t = 321.12f;
printf("and the result is: %.50f\n", t);
--
and the result is: 321.11999511718750000000000000000000000000000000000000
This is why one has to round floating point values before presenting them to humans.
Calculator programs typically don't use floats or doubles for this. They implement a decimal number format, e.g.:
struct decimal
{
int mantissa; // meaningful digits
int exponent; // number of decimal zeroes
};
Of course, that requires reimplementing all operations: addition, subtraction, multiplication and division. Or just look for a decimal library.
