Rounding float to int - c

I'm teaching myself to code (noob here). The answer to a practice problem is as follows:
amount = (int) round (c);
Where c is a float.
Is it safe to say that this line converts the float to an integer through rounding?
I tried researching methods of converting floats to integers but none used the syntax as above.

You should look at the return value of round.
If it returns a floating-point value (in C, round() from math.h returns a double), then the (int) cast is what converts the rounded value to an int.
If it returns an int, then the conversion happens inside the function, and there is no need to convert it again.
This is, of course, only if you really want to round the number. If you want 10.8 to become 11, then your code is a possible solution; if you want it to become 10, then just convert (cast) it to an int.

I would just do amount = int(c)
Here is a full example
amount = 10.3495829
amount = int(amount)
print(amount)
It should print 10!

A float has a larger range than the int primitive type, so you can convert a float to an int by down-casting it:
int value = (int) 9.99f; // value is 9
Note that this cast discards everything after the decimal point (truncation toward zero); it does not round to nearest or floor the value.
As the example above shows, casting a float such as 9.99 to an integer yields 9. If you need rounding, use the Math.round() method, which returns the nearest integer by adding 0.5 to the value and then taking the floor.
Java tutorial , primitive datatypes
Java language specifications, casting of primitive types

Related

type of numbers that are not stored in a variable [duplicate]

I'm a little confused about some numbers in C code. I have the following piece of code:
int k;
float a = 0.04f;
for (k=0; k*a < 0.12; k++) {
/* do something */
}
In this case, what is the type of "0.12"? double? float? It has never been declared anywhere. In my program the loop above is executed 4 times, even though 3*0.04 = 0.12 and 0.12 < 0.12 is not true. Once I replace 0.12 with 0.12F (because I am restricted to float precision throughout the program), the loop is executed 3 times. I do not understand why, or what is happening here. Are there any proper guidelines on how to write such statements to avoid unexpected issues?
Another related issue is the following: in the definition of variables, say
float b = 1/180 * 3.14159265359;
what exactly "is" "1" in this case? And "180"? Integers? Are they converted to float numbers? Is it okay to write it like that, or should it be "1.0f/180.0f*3.14159265359f;"?
Last part of the question:
if I have a function
void testfunction(float a)
which does some things,
and I call the function with testfunction(40.0/6.0), how is that division handled? It seems that it is calculated with double precision and then converted to a float. Why?
That was a long question; I hope someone can help me understand it.
...numbers that are not stored in a variable
They are called "constants".
Any unsuffixed floating point constant has type double.
Quoting C11, chapter §6.4.4.2
An unsuffixed floating constant has type double. If suffixed by the letter f or F, it has
type float. If suffixed by the letter l or L, it has type long double.
For Integer constants, the type will depend on the value.
Quoting C11, chapter §6.4.4.1,
The type of an integer constant is the first of the corresponding list in which its value can
be represented. [..]
and for unsuffixed decimal number, the list is
int
long int
long long int
Regarding the mathematical operation accuracy of floating point numbers, see this post
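The integer-constant rule also answers the 1/180 part of the question. As a small sketch (the function names are illustrative only): both 1 and 180 have type int, so 1/180 is an integer division that yields 0 before any floating-point arithmetic takes place.

```c
/* 1 and 180 are int constants: 1/180 is integer division and yields 0,
   so the whole product is 0.0 regardless of the value of pi. */
float factor_with_int_constants(void)
{
    return 1/180 * 3.14159265359;
}

/* With float constants the division is carried out in float,
   giving roughly 0.01745 (pi / 180). */
float factor_with_float_constants(void)
{
    return 1.0f/180.0f * 3.14159265359f;
}
```

Writing the constants with a decimal point (and an f suffix when float precision is wanted) avoids the silent integer division.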
"0.12" is a constant, and yes, without a trailing f or F, it will be interpreted as a double.
That number cannot be expressed exactly as a binary fraction. When you compute k * a, the result is a float because k is converted to float before the multiplication. The result is slightly less than 0.12. When you compare it against a double, the float is converted to double; the conversion does not change its value, but the double constant 0.12 is a much closer approximation of 0.12, so the float product still compares as smaller. When you use a float constant, no conversion is needed and, by good luck, the product comes out exactly equal to the binary representation of "0.12f". This would likely not be the case for a more complicated operation.
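The loop from the question can be reproduced with a short sketch. It assumes IEEE 754 float and double (the usual case); the function names are invented here for illustration.

```c
/* Counts iterations when the bound is the double constant 0.12.
   3 * 0.04f is slightly below the double 0.12, so k == 3 still passes. */
int iterations_with_double_bound(void)
{
    float a = 0.04f;
    int count = 0;
    for (int k = 0; k * a < 0.12; k++)
        count++;
    return count;
}

/* Counts iterations when the bound is the float constant 0.12f.
   3 * 0.04f happens to equal 0.12f exactly, so the loop stops at k == 3. */
int iterations_with_float_bound(void)
{
    float a = 0.04f;
    int count = 0;
    for (int k = 0; k * a < 0.12f; k++)
        count++;
    return count;
}
```

On such a platform the first function returns 4 and the second returns 3, matching the behavior described in the question. Counting with an integer loop variable and a precomputed integer limit sidesteps the issue entirely.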
If we use a fractional constant such as 3.4, its default type is double. You must explicitly write 3.4f if you want it treated as single precision (float).
if you call the following function
int fun()
{
float a= 1.2f;
double b = 1.2;
return (a==b);
}
this will return zero (false) because, before the comparison, the float is converted to double (lower type to higher type). The float could not store 1.2 exactly in the first place, so the two values differ beyond roughly the 7th significant digit. You can see the difference with the following print statement:
printf("double: %0.9f \n float: %0.9f\n",1.2,1.2f);
It prints something like
double: 1.200000000
float: 1.200000048
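The mismatch only shows up for values that have no exact binary representation. A tiny sketch (float_equals_double is a hypothetical helper, assuming IEEE 754 types) makes this visible:

```c
/* The float argument is converted to double before the comparison;
   the conversion itself is exact, so any difference was already present
   when the constant was rounded to float. */
int float_equals_double(float a, double b)
{
    return a == b;
}
```

Here float_equals_double(1.2f, 1.2) is 0, while float_equals_double(0.5f, 0.5) is 1, because 0.5 is exactly representable in both types.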

Why is the return type of the "ceil()" function "double" instead of some integer type?

I've just implemented a line of code, where two numbers need to be divided and the result needs to be rounded up to the next integer number. I started very naïvely:
i_quotient = ceil(a/b);
As the numbers a and b are both integer numbers, this did not work: the division gets executed as an integer division, which is rounding down by default, so I need to force the division to be a floating point operation:
i_quotient = ceil((double) a / b);
Now this seems to work, but it leaves a warning saying that I am trying to assign a double to an integer, and indeed, following the header file "math.h" the return type of the ceil() function is "double", and now I'm lost: what's the sense of a rounding function to return a double? Can anybody enlighten me about this?
A double has a range that can be greater than any integer type.
Returning double is the only way to ensure that the result type has a range that can handle all possible input.
ceil() takes a double as an argument. So, if it were to return an integer, what integer type would you choose that can still represent its ceiled value?
Whatever may be the type, it should be able to represent all possible double values.
The integer type that can hold the highest possible value is uintmax_t.
But even that is not guaranteed to hold every double value, although on some implementations it might.
So, it makes sense to return a double value for ceil(). If an integer value is needed, then the caller can always cast it to the desired integer type.
OP starts with two integers a,b and questions why a function double ceil(double) that takes a double, does not return some integer type.
Most floating-point math functions take floating point arguments and return the same type.
A big reason double ceil(double) does not return an integer type is that such limited functionality is rarely needed. Integer types have (or almost always have) a more limited range than double. ceil(DBL_MAX) is not expected to fit in an integer type.
There is little need to use double math to solve an integer problem.
If code needs to divide integers and round up the quotient, use the following. Ref:#mch
i_quotient = (a + b - 1) / b;
The above will handle most of OP's cases when a >= 0 and b > 0. Other considerations are needed when a or b are negative or if a + b - 1 may overflow.
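As a sketch of the integer-only approach, under the stated assumptions a >= 0 and b > 0 (the function names are illustrative), the second variant below avoids the a + b - 1 overflow concern by using the remainder instead:

```c
/* Ceiling of a / b using integer arithmetic only.
   Assumes a >= 0 and b > 0, and that a + b - 1 does not overflow. */
int ceil_div(int a, int b)
{
    return (a + b - 1) / b;
}

/* Same result for a >= 0 and b > 0, without forming a + b - 1:
   add one whenever the division leaves a remainder. */
int ceil_div_no_overflow(int a, int b)
{
    return a / b + (a % b != 0);
}
```

For example, both functions map (7, 2) to 4 and (8, 2) to 4.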
Because why should it? Converting between int and double takes time, and this overhead can become significant. If you want to convert a double to an int, do so explicitly:
i_quotient = (int)ceil((double) a / b);
Check this answer if you want to know more about this latency. You have to consider that C is quite old and achievable performance was one of the top priorities. But even C# and other modern languages usually return a floating-point value for ceil, just for consistency.
Leaving technical discussions aside, couldn't it simply be for consistency?
If the function takes a double it should return a result of the same type, if there's no particular reasons to return a different type.
It's up to the user to transform it to an integer if he needs to.
After all you may be working only with doubles in your application.
Although ceil means to round up to the next whole number , it doesn't mean strictly that it is an integer, it's obvious that an integer is a whole number but that doesn't have to prejudice our mind.

how to truncate a number with a decimal point into a int? what's the function for this?

The problem occurs when I do a division operation. I would like to know how to truncate a number with a decimal point into a whole number such as 2, 4, 67.
It truncates automatically if you assign the value to an "int" variable:
int c;
c = a/b;
Or you can cast like this:
c = (int) (a/b);
This truncates it even if c is defined as float or double.
Usually truncation is not the best choice (it depends on what you want to achieve, of course). Usually the result is rounded like this:
c = round(a/b);
This is usually better because the result is rounded properly. Note that round() in C takes a single argument. On Linux, you can easily look up the exact data types with "man round".
You can use the trunc() function defined in math.h. It removes the fractional part and returns the nearest integer that is not larger in magnitude than the given number (that is, it rounds toward zero).
This is how it is defined:
double trunc(double x);
Below is how you can use it:
double a = 18.67;
double b = 3.8;
int c = trunc(a/b);
You can check man trunc on Linux to get more details about this function. As pointed out in previous answers, you can cast division result to integer or it will automatically be truncated if assigned to integer but if you were interested to know about a C function which does the job then trunc() is the one.
int result = (int)ceilf(myFloat );
int result = (int)roundf(myFloat );
int result = (int)floor(myFloat);
float result = ceilf(myFloat );
float result = roundf(myFloat );
float result = floor(myFloat);
I think it will be helpful to you.
Manually or implicitly converting from a floating-point type to an integral type truncates toward zero. Keep in mind that if the integral type is not large enough to store the truncated value, the behavior is undefined. If you simply need to print the value with everything past the decimal point truncated, use printf():
printf("%.0f", floor(float_val));
As Tõnu Samuel has pointed out, that printf() invocation will actually round the floating-point parameter by default.

When a double with an integer value is cast to an integer, is it guaranteed to do it 'properly'?

When a double has an 'exact' integer value, like so:
double x = 1.0;
double y = 123123;
double z = -4.000000;
Is it guaranteed that these will convert properly to 1, 123123, and -4 when cast to an integer type via (int)x, (int)y, (int)z (and not truncate to 0, 123122 or -5 because of floating-point weirdness)? I ask because this page (which is about floating-point numbers in Lua, a language that has only doubles as its numeric type by default) says that integer operations with doubles are exact according to IEEE 754, but I'm not sure whether, when calling C functions with integer-type parameters, I need to worry about rounding doubles manually, or whether it is taken care of when the doubles have exact integer values.
Yes, if the integer value fits in an int.
A double could represent integer values that are out of range for your int type. For example, 123123.0 cannot be converted to an int if your int type has only 16 bits.
It's also not guaranteed that a double can represent every value a particular type can represent. IEEE 754 uses something like 52 or 53 bits for the mantissa. If your long has 64 bits, then converting a very large long to double and back might not give the same value.
As Daniel Fischer stated, if the value of the integer part of the double (in your case, the double exactly) is representable in the type you are converting to, the result is exact. If the value is out of range of the destination type, the behavior is undefined. “Undefined” means the standard allows any behavior: You might get the closest representable number, you might get zero, you might get an exception, or the computer might explode. (Note: While the C standard permits your computer to explode, or even to destroy the universe, it is likely the manufacturer’s specifications impose a stricter limit on the behavior.)
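One way to stay clear of the undefined case is to check the range before casting. The following is only a sketch: double_to_int_checked is a hypothetical helper, and it assumes every int value is exactly representable as a double, which holds for the common 32-bit int.

```c
#include <limits.h>
#include <stdbool.h>

/* Converts x to int only when the truncated value is representable.
   Returns false (leaving *out untouched) for NaN and out-of-range values. */
bool double_to_int_checked(double x, int *out)
{
    /* Values strictly between INT_MIN - 1 and INT_MAX + 1 truncate into range.
       Comparisons involving NaN are false, so NaN is rejected as well. */
    if (x > (double) INT_MIN - 1.0 && x < (double) INT_MAX + 1.0) {
        *out = (int) x;
        return true;
    }
    return false;
}
```

A caller would test the return value and use *out only on success, for example when passing a double from a scripting layer to a C function that expects an int.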
It will do it correctly if the value really is a true integer, which you might be assured of in some contexts. But if the value is the result of previous floating-point calculations, you cannot easily know that.
Why not explicitly calculate the value with the floor() function, as in long value = floor(x + 0.5)? Or, even better, use the modf() function to check for an integer value.
Yes, it will hold the exact value you give it, because you entered it in code. Sometimes a calculation yields 0.99999999999, for example, but that is due to the error in calculating with doubles, not to their storage capacity.

Why does Splint (the C code checker) give an error when comparing a float to an int?

Both are mathematical values, however the float does have more precision. Is that the only reason for the error - the difference in precision? Or is there another potential (and more serious) problem?
It's because the set of integer values does not equal the set of float values for the 'int' and 'float' types. For example, the float value 0.5 has no equal in the integer set and the integer value 4519245367 might not exist in the set of values a float can store. So, the checker flags this as an issue to be checked by the programmer.
Because it probably isn't a very good idea. Not all floats can be truncated to ints; not all ints can be converted to floats.
When doing the comparison, the integer value will get "promoted" to a floating-point value. At that point you are doing an exact equality comparison between two floating-point numbers, which is almost always a bad thing.
You should generally have some sort of "epsilon ball", or range of acceptable values, and check whether the two values are close enough to each other to be considered equal. You need a function roughly like this:
int double_equals(double a, double b, double epsilon)
{
return ( a > ( b - epsilon ) && a < ( b + epsilon ) );
}
If your application doesn't have an obvious choice of epsilon, then use DBL_EPSILON.
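A fixed epsilon works best when the magnitude of the values is known. As a hedged variation on the function above (nearly_equal and rel_tol are names invented here, not part of any library), the tolerance can be scaled by the larger operand:

```c
/* Returns 1 when a and b differ by at most rel_tol times the larger magnitude.
   Illustrative sketch: NaN and infinities are not treated specially. */
int nearly_equal(double a, double b, double rel_tol)
{
    double diff = a - b;
    if (diff < 0.0)
        diff = -diff;

    double abs_a = a < 0.0 ? -a : a;
    double abs_b = b < 0.0 ? -b : b;
    double scale = abs_a > abs_b ? abs_a : abs_b;

    return diff <= rel_tol * scale;
}
```

For instance, nearly_equal(0.1 + 0.2, 0.3, 1e-9) is 1 even though 0.1 + 0.2 == 0.3 is false for IEEE 754 doubles.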
Because a float can't store every int value exactly, and an int can't store a fractional value at all. If you have two variables, int i and float f, then even after assigning "i = f;", the comparison "if (i == f)" may well be false.
Assuming signed integers and IEEE floating point format, the magnitudes of integers that can be represented are:
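short -> 15 bits
float -> 24 bits (23 stored plus one implicit bit)
long -> 31 bits (assuming a 32-bit long)
double -> 53 bits (52 stored plus one implicit bit)
Therefore a float can represent any short and a double can represent any 32-bit long.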
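The limit for float is easy to observe: in IEEE 754 single precision, integers are exactly representable up to 2^24 = 16777216, and gaps appear after that. A minimal sketch (fits_in_float is a hypothetical helper, assuming IEEE 754 float and a double that represents every int exactly):

```c
/* Returns 1 when i survives a conversion to float unchanged.
   The comparison is done in double so that an out-of-range float is
   never converted back to int. */
int fits_in_float(int i)
{
    volatile float f = (float) i;   /* volatile forces rounding to float precision */
    return (double) f == (double) i;
}
```

Under these assumptions fits_in_float(16777216) is 1 but fits_in_float(16777217) is 0, since 16777217 is rounded to a neighboring representable value.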
If you need to get around this (you have a legitimate reason and are happy none of the issues mentioned in the other answers are an issue for you) then just cast from one type to another.

Resources