how do i round a float to an int? [closed]

Closed. This question needs debugging details. It is not currently accepting answers.
Edit the question to include desired behavior, a specific problem or error, and the shortest code necessary to reproduce the problem. This will help others answer the question.
Closed 7 years ago.
Say I have:
55.3 and want 55
55.6 and want 56
81.1 and want 81
etc.
I have been trying to use the round() function, but it seems to keep giving me the highest value for all decimal places.

OP has not posted the code that failed. Almost certainly OP coded it incorrectly.
Using round() is the correct way to round a double before converting to an int. Of course, the rounded value must be in the range of int.
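#include <math.h>
#include <limits.h>   // INT_MIN, INT_MAX
#include <assert.h>
int round_to_int(double x) {
    x = round(x);
    assert(x >= INT_MIN);               // lower bound of int
    assert(x < (INT_MAX/2 + 1)*2.0);    // INT_MAX + 1, computed without integer overflow
    return (int) x;
}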
See How to test for lossless double / integer conversion? for details about the assert()s.
Why not use (int)(x + 0.5);?
1) It fails for negative numbers.
2) It can fail for the double just smaller than 0.5 as the x + 0.5 can round to 1.0.
3) When the precision of int exceeds that of double, for values whose least significant bit is worth 0.5 or 1.0, x + 0.5 may round to the next integer.
4) Unadorned, it has no range checking.
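The first and fourth points can be illustrated with a minimal sketch. The function names naive_round and round_half_away are hypothetical and not from the question. The sign-aware variant below fixes the negative-number case and adds a range check, but it is still exposed to point 2 (the double just below 0.5), so round() from <math.h> remains the more robust choice.

```c
#include <assert.h>
#include <limits.h>

/* The naive idiom: adds 0.5, then the cast truncates toward zero. */
int naive_round(double x)
{
    return (int)(x + 0.5);
}

/* Sign-aware variant (hypothetical helper): shifts half a unit away
   from zero before truncating, and asserts the result fits in an int. */
int round_half_away(double x)
{
    double shifted = x >= 0.0 ? x + 0.5 : x - 0.5;
    assert(shifted > INT_MIN - 1.0);              /* truncation stays >= INT_MIN */
    assert(shifted < (INT_MAX / 2 + 1) * 2.0);    /* strictly below INT_MAX + 1  */
    return (int)shifted;
}
```

For example, naive_round(-55.6) yields -55 because -55.1 is truncated toward zero, while round_half_away(-55.6) yields -56.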

In the olden days we used to say int the_int = (int)(some_double + 0.5); (obviously beware if you are dealing with negative values too).

Increase the magnitude by one half and truncate:
int round_up_or_down(double x)
{
    /* the implicit conversion to int truncates toward zero */
    return x > 0 ? x + 0.5 : x - 0.5;
}
This rounds halfway cases away from zero, so the real intervals map as follows:
...
(-1.5, -0.5] => -1
(-0.5, +0.5) => 0
[+0.5, +1.5) => +1
...

Related

What is the difference in a division by 12 and the subtraction of a n/3 by n/4 [closed]

Closed. This question needs debugging details. It is not currently accepting answers.
Closed 1 year ago.
I was doing a CS50 lab and implemented the following algorithm in C.
The code must return int values, so I declared n_initial and n_final as int.
while (n_final > n_initial)
{
    n_initial += n_initial / 12;
    cont++;
}
The problem is that it works with very large values, but in some cases it never leaves the while loop.
For instance, when n_initial = 10 and n_final = 100, the program never leaves the while loop. It only seems to work "right" when I debug the code, and even then the value does not come out as expected.
// another algorithm, which works
while (n_final > n_initial)
{
    n_initial += (n_initial / 3) - (n_initial / 4);
    cont++;
}
So I did it this other way and it worked, and I would like to know why, given that mathematically they are the same thing.
// Sorry for my bad English, I'm Brazilian.

Assuming that n_initial is an integer, the result of each division is truncated to an integer (rounded down, for positive values). So, for example, the result for an initial value of 33 would be:
n_initial += n_initial/12;
—> 33 + 2 = 35
n_initial += (n_initial/3) - (n_initial/4);
—> 33 + 11 - 8 = 36
So while the two would produce the same result using fractional arithmetic by hand, with integers they do not. Note that if floating-point arithmetic were used, the values would be nearly equivalent.
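The difference can be checked with a small sketch. The helper names growth_single, growth_split, and count_steps are hypothetical. count_steps returns -1 when the growth term truncates to zero, which is the situation in which the original loop never terminates.

```c
/* Growth term using one integer division; n / 12 truncates toward zero. */
int growth_single(int n) { return n / 12; }

/* Growth term using two integer divisions; each truncates separately. */
int growth_split(int n) { return n / 3 - n / 4; }

/* Counts iterations until start reaches target.
   Returns -1 if the growth term is not positive, i.e. the loop would stall. */
int count_steps(int start, int target, int (*growth)(int))
{
    int steps = 0;
    while (start < target) {
        int g = growth(start);
        if (g <= 0)
            return -1;   /* no progress is possible */
        start += g;
        steps++;
    }
    return steps;
}
```

With start = 10 and target = 100, count_steps(10, 100, growth_single) reports a stall because 10 / 12 is 0, while the split form adds 10 / 3 - 10 / 4 = 3 - 2 = 1 on the first pass and keeps growing.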

it is clear that (in mathematics):
(initial/3.0) - (initial/4.0) ==
(4.0 * initial - 3.0 * initial) / 12.0 ==
((4.0 - 3.0) * initial) / 12.0 ==
(1.0 * initial) / 12.0 ==
initial / 12.0
so they should be equal (as real values)... but as you are using integers (and integer division), they are not. Integer division discards the remainder, so each quotient is truncated. The two expressions truncate at different points, so they can give different results.
In the case you describe, if you divide 10 by 12 you get 0 as the quotient (0.83333333333 is not an integer; the whole 10 ends up in the remainder, as the denominator is larger than the numerator), so you end up adding 0 to initial, and this makes the loop run forever.
When you evaluate the second expression, you have: 10 / 3 - 10 / 4 == 3 - 2 == 1 (you lose the remainder, 10 % 3 == 1, in the first operand, and 10 % 4 == 2 in the second), and this time you end up adding 1, so the value keeps growing and the loop eventually ends.
You have been applying real-number algebra to integers, thinking that they behave like real values, and so the two expressions you wrote above are not equivalent. You could solve your problem by introducing some floating-point literals and forcing the operations to be done in floating point (but if you finally force your numbers back to integers, you can run into the same problem again). As in
float initial = 10.0f;
/* use one form or the other: each adds about 0.8333333 when initial is 10 */
initial += initial / 12.0;
initial += initial / 3.0 - initial / 4.0;
Either form gives you the same (well, almost exactly the same) result.

Negative zero isn't equal to positive zero [closed]

Closed. This question needs debugging details. It is not currently accepting answers.
Closed 5 years ago.
I am having a strange problem and I don't know what to do.
Here is an excerpt of my code:
printf("%lf , %lf \n", cGrid_Y,sideLength);
printf("%lf <= %lf\n", Point_Y, cGrid_Y+sideLength);
bool x = (Point_Y <= (sideLength + cGrid_Y) );
printf("%s \n", x ? "true" : "false");
cGrid_Y and sideLength are doubles. And I am getting this output:
-12.800000 , 12.800000
0.000000 <= -0.000000
false
So my question is, why I am not getting a true ?

This is not a problem with negative zeros. 0.0 <= -0.0 is true. The problem is that your values are not actually zero or negative zero but some very small value that's being rounded to 0 for presentation when you ask printf to show it rounded to 6 decimal places. Either print with %e or %g (which will use exponential notation to show a better approximation) or %.1100f which is sufficient precision to show the exact value of any double.
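As a minimal sketch of this effect, the hypothetical helpers below show that a tiny negative value formats as "-0.000000" under %f even though it is not zero, and how a wider format reveals it. The value -1e-15 is an assumed example, not the value from the question.

```c
#include <stdio.h>
#include <string.h>

/* Returns 1 if "%f" formatting of v produces the text "-0.000000". */
int prints_as_negative_zero(double v)
{
    char buf[64];
    snprintf(buf, sizeof buf, "%f", v);
    return strcmp(buf, "-0.000000") == 0;
}

/* Prints v with 17 significant digits, enough to tell any two doubles apart. */
void print_full(double v)
{
    printf("%.17g\n", v);
}
```

Here prints_as_negative_zero(-1e-15) is true, yet 0.0 <= -1e-15 is false, which matches the confusing output in the question. A real negative zero behaves differently: 0.0 <= -0.0 is true.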

So my question is, why I am not getting a true ?
Are you sure that Point_Y == 0.0? Same for the sum of sideLength + cGrid_Y?
Check your assumptions:
printf("Point_Y is zero: %d\n", Point_Y == 0.0);
printf("`sideLength + cGrid_Y` is zero: %d\n", (sideLength + cGrid_Y) == 0.0);
The reason you got confused is that, by default, printf does not print floats/doubles to full precision.
#include <stdio.h>
int main()
{
double x = 0.0000001;
printf("x: %f\n", x);
}
outputs: x: 0.000000

Edit: I originally wrote something mistaken. I thought that negative zero doesn't equal positive zero, whereas after reading it all properly, -0 equals +0.
The source of your problem: you are comparing two doubles as if they were exact, and this is a terrible idea, due to the imprecision of floating-point representation. Note the common example: 0.1 + 0.2 does NOT equal 0.3:
Is floating point math broken?
https://www.quora.com/Why-is-0-1+0-2-not-equal-to-0-3-in-most-programming-languages
You cannot represent 12.8 in binary without losing part of the information, due to the fundamental nature of how this number is stored in the computer's memory.
In the same way, you can't represent 1/7 in our "normal" decimal representation of numbers with a finite number of digits.

Questions about float characteristics [duplicate]

This question already has answers here:
Why are floating point numbers inaccurate?
(5 answers)
Closed 5 years ago.
Q1: For what reason isn't it recommended to compare floats by == or != like in V1?
Q2: Does fabs() in V2 work the same way, like I programmed it in V3?
Q3: Is it ok to use (x >= y) and (x <= y)?
Q4: According to Wikipedia float has a precision between 6 and 9 digits, in my case 7 digits. So on what does it depend, which precision between 6 and 9 digits my float has? See [1]
[1] float characteristics
Source: Wikipedia
Type | Size | Precision | Range
Float | 4Byte ^= 32Bits | 6-9 decimal digits | (2-2^-23)*2^127
Source: tutorialspoint
Type | Size | Precision | Range
Float | 4Byte ^= 32Bits | 6 decimal digits | 1.2E-38 to 3.4E+38
Source: chortle
Type | Size | Precision | Range
Float | 4Byte ^= 32Bits | 7 decimal digits | -3.4E+38 to +3.4E+38
The following three codes produce the same result, still it is not recommended to use the first variant.
1. Variant
#include <stdio.h> // printf() scanf()
int main()
{
    float a = 3.1415926;
    float b = 3.1415930;
    if (a == b)
    {
        printf("a(%+.7f) == b(%+.7f)\n", a, b);
    }
    if (a != b)
    {
        printf("a(%+.7f) != b(%+.7f)\n", a, b);
    }
    return 0;
}
V1-Output:
a(+3.1415925) != b(+3.1415930)
2. Variant
#include <stdio.h> // printf() scanf()
#include <float.h> // FLT_EPSILON == 0.0000001
#include <math.h> // fabs()
int main()
{
    float x = 3.1415926;
    float y = 3.1415930;
    if (fabs(x - y) < FLT_EPSILON)
    {
        printf("x(%+.7f) == y(%+.7f)\n", x, y);
    }
    if (fabs(x - y) > FLT_EPSILON)
    {
        printf("x(%+.7f) != y(%+.7f)\n", x, y);
    }
    return 0;
}
V2-Output:
x(+3.1415925) != y(+3.1415930)
3. Variant:
#include <stdio.h> // printf() scanf()
#include <float.h> // FLT_EPSILON == 0.0000001
#include <stdlib.h> // abs()
int main()
{
    float x = 3.1415926;
    float y = 3.1415930;
    const int FPF = 10000000; // Float_Precision_Factor
    if ((float)(abs((x - y) * FPF)) / FPF < FLT_EPSILON) // if (x == y)
    {
        printf("x(%+.7f) == y(%+.7f)\n", x, y);
    }
    if ((float)(abs((x - y) * FPF)) / FPF > FLT_EPSILON) // if (x != y)
    {
        printf("x(%+.7f) != y(%+.7f)\n", x, y);
    }
    return 0;
}
V3-Output:
x(+3.1415925) != y(+3.1415930)
I am grateful for any help, links, references and hints!

When working with floating-point operations, almost every step may introduce a small rounding error. Convert a number from decimal in the source code to the floating-point format? There is a small error, unless the number is exactly representable. Add two numbers? Their exact sum often has more bits than fit in the floating-point format, so it has to be rounded to fit. The same is true for multiplication and division. Take a square root? The result is usually irrational and cannot be represented in the floating-point format, so it is rounded. Call the library to get the cosine or the logarithm? The exact result is usually irrational, so it is rounded. And most math libraries have some additional error as well, because calculating those functions very precisely is hard.
So, let’s say you calculate some value and have a result in x. It has a variety of errors incorporated into it. And you calculate another value and have a result in y. Suppose that, if calculated with exact mathematics, these two values would be equal. What is the chance that the errors in x and y are exactly the same?
It is unlikely. If x and y were calculated in different ways, they experienced different errors, and it is essentially chance whether they have the same total error or not. Therefore, even if the exact mathematical results would be equal, x == y may be false because of the errors.
Similarly, two exact mathematical values might be different, but the errors might coincide so that x == y returns true.
Therefore x == y and x != y generally cannot be used to tell if the desired exact mathematical values are equal or not.
What can be used? Unfortunately, there is no general solution to this. Your examples use FLT_EPSILON as an error threshold, but that is not useful. After more than a few floating-point operations, the error may easily accumulate to more than FLT_EPSILON, either as an absolute error or as a relative error.
In order to make a comparison, you need to have some knowledge about how large the accumulated error might be, and that depends greatly on the particular calculations you have performed. You also need to know what the consequences of false positives and false negatives are—is it more important to avoid falsely stating two things are equal or to avoid falsely stating two things are unequal? These issues are specific to each algorithm and its data.
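One common shape for such a comparison is sketched below, under the assumption that the caller has already worked out error bounds for its own calculation. The function name nearly_equal and both tolerance parameters are hypothetical, and no particular tolerance values are recommended here, since they depend on the algorithm and its data.

```c
/* Returns 1 if a and b differ by at most abs_tol, or by at most rel_tol
   times the larger of their magnitudes. The caller supplies both
   tolerances based on its own error analysis. */
int nearly_equal(double a, double b, double abs_tol, double rel_tol)
{
    double diff = a > b ? a - b : b - a;
    double mag_a = a < 0 ? -a : a;
    double mag_b = b < 0 ? -b : b;
    double largest = mag_a > mag_b ? mag_a : mag_b;
    return diff <= abs_tol || diff <= rel_tol * largest;
}
```

The absolute tolerance handles values near zero, where a relative bound shrinks to nothing. The relative tolerance scales with the size of the operands. Which of the two matters, and how large each should be, is exactly the algorithm-specific knowledge described above.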

Because on 64 bit machine you will find out that 0.1*3 = 0.30000000000000004 :-)
See the links that @yano and @PM-77-1 provided as comments.

As you know, a machine stores everything as 0s and 1s.
Also, not every floating-point value is representable in binary with a limited number of bits.
The computer stores the nearest representable binary value for a given number.
So there is a difference between 2.0000001 and 2.0000000 in the eyes of the computer (even though we might call them equal).
This trouble does not always appear, but comparing with == is risky.

rand() and RAND_MAX giving different values depending upon the datatype [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 6 years ago.
Here, when I print the value of x, it gives zero as output, whereas when I print y, I get the correct value (a random number between 0 and 1). The cast seems to make the difference. Why do I need the cast?
double x,y;
x=rand()/RAND_MAX;
printf("X=%f\n",x);
y=(double)rand()/RAND_MAX;
printf("Y=%f",y);
Output
X=0.000000
Y=0.546745

When you divide an integer by an integer, you get truncating integer division.
So using
y = (double)rand() / RAND_MAX;
is absolutely the right way to get the result you want.

Different types of division yielded different responses.
// Integer division
x=rand()/RAND_MAX;
// floating-point division
y=(double)rand()/RAND_MAX;
It isn't that a cast is needed, but it is one of the ways to ensure floating-point division. I like the last one, as it ensures the division will be done at least to the precision of x, be it float, double or long double, without changing code.
x = (double)rand()/RAND_MAX;
x = 1.0*rand()/RAND_MAX;
x = rand()/(1.0*RAND_MAX);
x = rand();
x /= RAND_MAX;
BTW, often code needs to generate a number [0.0 ... 1.0): from 0.0 to almost 1.0.
x = rand();
x /= RAND_MAX + 1.0;
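The two kinds of division can be compared on fixed inputs with a short sketch. The helper names scaled_int and scaled_unit are hypothetical, and they take the raw value as a parameter instead of calling rand() so that the behavior is reproducible.

```c
#include <stdlib.h>   /* RAND_MAX */

/* Integer division: every r below RAND_MAX yields 0; only r == RAND_MAX yields 1. */
int scaled_int(int r)
{
    return r / RAND_MAX;
}

/* Floating-point division mapping r in [0, RAND_MAX] onto [0.0, 1.0).
   Adding 1.0 promotes the divisor to double, so there is no integer overflow. */
double scaled_unit(int r)
{
    return r / (RAND_MAX + 1.0);
}
```

Passing rand() as the argument, as in scaled_unit(rand()), gives the usual value in the range from 0.0 up to, but not including, 1.0.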

Bitwise overflow checking in c [closed]

Closed. This question needs debugging details. It is not currently accepting answers.
Closed 7 years ago.
I am trying to write two functions that check for overflow in C (using only ! ~ | & ^ +) but can't get them to work. The first checks whether a given two's complement signed int fits in a certain number of bits: fitsB(int x, int n), where x is the int and n is the number of bits to use. The second checks whether two ints can be added without overflowing: overflowInt(int x, int y). I can do it for unsigned ints, but the negatives make things harder for me. Does anyone know how?
Also, no casting is allowed, and ints are always 32 bits.

/*
* addOK - Determine if can compute x+y without overflow
* Example: addOK(0x80000000,0x80000000) = 0,
* addOK(0x80000000,0x70000000) = 1,
* Legal ops: ! ~ & ^ | + << >>
* Max ops: 20
* Rating: 3
*/
int addOK(int x, int y) {
    // Extract the sign of each operand and of the sum.
    // Assuming an arithmetic right shift, v >> 31 is 0 for non-negative v and -1 for negative v.
    // If x and y have different signs, the addition cannot overflow.
    // If they have the same sign, there was no overflow exactly when
    // the sum has that same sign too.
    int z = x + y;
    int a = x >> 31;
    int b = y >> 31;
    int c = z >> 31;
    return !!(a ^ b) | (!(a ^ c) & !(b ^ c));
}

x will fit in n bits (as a two's complement value) if -2^(n-1) <= x < 2^(n-1).
The overflow question needs more information. Two ints will not overflow if you assign them to a long (or a double).

Using the above example (Adam Shiemke), you can find the maximum (positive) and minimum (negative) values to get the range for n bits. The maximum value that can be represented in n bits is 2^(n-1) (from Adam's example) minus one. For the minimum value, negate 2^(n-1): x >= -(2^(n-1)) (note the >=, not >, for the minimum). For example, for n = 4 bits, 2^(4-1) - 1 = 2^3 - 1 = 7, so x <= 7 and x >= -8 = -(2^(4-1)).
This assumes the initial input does not overflow a 32-bit quantity (hopefully an error occurs in that condition) and that the number of bits you are using is less than 32 (as you are adding 1 for the negative range, and if you have 32 bits, it will overflow; see below for an explanation).
To determine whether an addition will overflow, start from the maximum value: you need x + y <= maximum value. By algebra, this gives y <= maximum value - x. You can then compare the passed-in value of y, and if it does not meet the condition, the addition will overflow. For example, if x is the maximum value, then y must be less than or equal to zero or the addition will overflow. For negative x, the symmetric check is y >= minimum value - x.
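The range checks described above can be sketched as follows. Note that this sketch uses ordinary comparison operators and a wider type, so it does not satisfy the restricted operator set from the question. The names fits_bits and add_ok are hypothetical, and n is assumed to be between 1 and 32.

```c
#include <limits.h>

/* Returns 1 if x is representable as an n-bit two's complement value.
   Assumes 1 <= n <= 32; long long avoids overflow when n is 32. */
int fits_bits(int x, int n)
{
    long long max = (1LL << (n - 1)) - 1;   /* 2^(n-1) - 1 */
    long long min = -max - 1;               /* -(2^(n-1))  */
    return x >= min && x <= max;
}

/* Returns 1 if x + y can be computed without signed overflow.
   The sum itself is never formed, so no overflow occurs in the check. */
int add_ok(int x, int y)
{
    if (x > 0)
        return y <= INT_MAX - x;   /* y <= maximum value - x */
    return y >= INT_MIN - x;       /* x <= 0, so INT_MIN - x cannot overflow */
}
```

For n = 4, fits_bits accepts the values from -8 to 7, matching the worked example. add_ok rearranges the inequality so that only subtractions that cannot overflow are performed.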
