Round up a float number that has a non-zero decimal digit


I asked about rounding a number half up earlier today and got great help from @alk. In that post, I wanted to round 4.5 up to 5 but round 4.4 down to 4. The solution given by @alk was:
int round_number(float x)
{
return x + 0.5;
}
and it works very elegantly!
In this post, I would like to discuss how to implement the ceil() function in C.
Along the same lines as @alk's solution, I came up with the following:
int round_up(float y)
{
return y + 0.99999999;
}
This works in all cases except when y has a tiny fractional part such as .00000001. I am wondering if there's a better way to do the same thing as ceil() in C.

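Unless you reliably know the epsilon of float (standard C does provide FLT_EPSILON in <float.h>, but that is the spacing of floats near 1.0 only, so no single constant works for every magnitude), I think you're stuck with return (y < 0 || y == (int)y) ? y : y + 1;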

This fails for negative numbers.
int round_up(float y) {
return y + 0.99999999;
}
But let's use that to our advantage. Conversion from float to int truncates toward 0.0, so for negative numbers it already performs a "round up" (ceiling). For a positive float, conversion to int performs a "floor"; adjust when y is not an integer.
(Assume y within INT_MIN ... INT_MAX.)
#include <stdio.h>
/* Named ceil_int rather than ceil: ceil is a standard library function. */
int ceil_int(float y) {
    if (y < 0) {
        return y; // this does a ceiling function as y < 0.
    }
    int i = y; // this does a floor function as y >= 0.
    if (i != y) i++;
    return i;
}
void ceil_test(float y) {
    printf("%f %d\n", y, ceil_int(y));
}

The first snippet works incorrectly for negative numbers: -3.5 will become -3, not -4. To round values properly, use
int round_number(float x)
{
if (x >= 0)
return x + 0.5f;
else
return x - 0.5f;
}
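Even that way it's still incorrect for some values, for example the largest float below 0.5 (x + 0.5f rounds up to 1.0f) and odd integers between 2^23 and 2^24 (x + 0.5f is a tie that rounds to the even neighbor), assuming float arithmetic is done in float precision. See "Why does Math.round(0.49999999999999994) return 1?" for the same issue in Java. Note the f suffix, which makes the constants float literals; without it, the addition is done in double precision.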
For ceiling, adding 1 is enough
int ceiling(float x)
{
if (x < 0 || (int)x == x)
return x;
else
return x + 1.0f;
}
When x is an integer, e.g. x = 3.0 (or -3.0), it returns 3 (or -3). For x = 3.1 it returns 4, and for x = -3.1 it returns -3.

Related

How to find the numbers that satisfy (x - y * sqrt(2016.0)) / (y + sqrt(2016.0)) = 2016 using loops in C

I'm trying to find numbers that satisfy the equation (x - y * √2016) / (y + √2016) = 2016.
The numbers x and y can be rational.
This is what I have tried so far:
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
int main() {
int x, y;
for(x = 1; x < 10000; x++) {
for(y = 1; y < 10000; y++) {
if( (x - y * sqrt(2016.0)) / (y + sqrt(2016.0) ) == 2016) {
printf("Numbers are: %d and %d.", x, y);
}
}
}
return 0;
}

Using floating-point math and brute-force search to "solve" this problem is conceptually a bad idea. With FP math, round-off error propagates in non-intuitive ways, so many equations that are solvable in a mathematical sense have no (exact) solution in FP numbers. Using FP math to approximate solutions of mathematical equations is therefore inherently difficult.
I suggest simplifying the problem before programming.
If you do this and search only for integer solutions, you will find that the only solution is
x = -2016^2 = -4064256
y = -2016
Why: rearranging gives
x = 2016*y + (2016 + y)*sqrt(2016)
Since sqrt(2016) is irrational (2016 is not a perfect square), the factor (2016 + y) in front of it must be zero for x to be an integer. Everything else follows from that.
If a non-integer solution is wanted, the formula above gives x for every y, which even enumerates all solutions.
This shows that simplifying a mathematical problem before attempting to solve it on a computer is usually essential (especially with FP math).
EDIT: If you are looking for rational numbers, the same argument applies as in the integer case: since sqrt(2016) is not rational, y must again be -2016. So for the rational case the only solution is the same as for the integers, i.e.,
x = -2016^2 = -4064256
y = -2016
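The general solution can be checked numerically. The sketch below (function names x_for_y and lhs are hypothetical) evaluates x = 2016*y + (2016 + y)*sqrt(2016) for a given y and plugs the pair back into the left-hand side of the original equation. Away from y = -2016 the result equals 2016 only up to rounding, so any check needs a tolerance.

```c
#include <math.h>

/* General solution derived above: x = 2016*y + (2016 + y)*sqrt(2016). */
double x_for_y(double y)
{
    return 2016.0 * y + (2016.0 + y) * sqrt(2016.0);
}

/* Left-hand side of the original equation (undefined at y = -sqrt(2016)). */
double lhs(double x, double y)
{
    return (x - y * sqrt(2016.0)) / (y + sqrt(2016.0));
}
```

For y = -2016 the (2016 + y) term vanishes exactly, so x_for_y(-2016.0) is exactly -4064256.0; for y = 1.0 the value of lhs(x_for_y(1.0), 1.0) is close to, but not necessarily exactly, 2016.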

This is just the equation for a line. Here's an exact solution:
x = (sqrt(2016) + 2016)*y + 2016*sqrt(2016)
For any value of y, x is given by the above. The x-intercept is:
x = 2016*sqrt(2016)
y = 0
The y-intercept is:
x = 0
y = -2016*sqrt(2016)/(sqrt(2016)+2016)

numbers that satisfy (x - y * sqrt(2016.0)) / (y + sqrt(2016.0)) = 2016
Starting with @Tom Karzes's answer:
x = (sqrt(2016) + 2016)*y + 2016*sqrt(2016)
Let y = -2016
x = (sqrt(2016) + 2016)*-2016 + 2016*sqrt(2016)
x = 2016*-2016 = -4064256
So x,y = -4064256, -2016 is one exact solution.
Mathematically, this is the only integer (or rational) solution.
Because sqrt(x) is not exactly √x, and because of the peculiarities of double math, other pairs may also pass a C simulation.
For a C simulation like the OP's, let us "guess" that x and y are both multiples of 2016 and may be negative.
#include <stdio.h>
#include <float.h>   /* for DBL_MAX */
#include <math.h>
/* Squared distance of the left-hand side from 2016. */
double f(int x, int y) {
    double z = (x - y * sqrt(2016.0)) / (y + sqrt(2016.0));
    z = z - 2016;
    return z * z;
}
#define M 3000
int main(void) {
    double best_diff = DBL_MAX;
    int best_x = 0;
    int best_y = 0;
    int x, y;
    for (x = -M; x < M; x++) {
        for (y = -M; y < M; y++) {
            double diff = f(x * 2016, y * 2016);
            if (diff < best_diff) {
                best_diff = diff;
                best_x = x;
                best_y = y;
            }
            if (diff == 0) {
                /* print the current pair, not the best-so-far one */
                printf("Numbers are: %d and %d.\n", x * 2016, y * 2016);
            }
        }
    }
    if (best_diff != 0.0) {
        printf("Numbers are: %d and %d --> %e.\n", best_x * 2016, best_y * 2016, best_diff);
    }
    return 0;
}
Output
Numbers are: -4064256 and -2016.

The results of floating-point operations are, in general, not exact.
Change:
if( (x - y * sqrt(2016.0)) / (y + sqrt(2016.0) ) == 2016)
to something like
if( fabs((x - y * sqrt(2016.0)) / (y + sqrt(2016.0) ) - 2016) < 0.00000001)
where 0.00000001 is a tolerance chosen by you.
But, as pointed out, you don't want to search the domains of more variables than necessary. Solve the math first. Using Wolfram Alpha, we get y = (x - 24192*√14)/(12*(168 + √14)), using √2016 = 12√14.

Program will compile but not execute

I'm relatively new to C and I am trying to get a program to run. It's a program that lists all primes from 1 to 1000. I compile it by typing gcc Primes.c at the command line and run it in NppExec.
Code:
#include <stdio.h>
#include <math.h>
int x;
int bool = 1;
int y;
main() {
for (x = 2; x <= 100; x++) {
bool = 1;
for (y = 0; y <= floor(x/2); y++) {
if (x % y == 0) {
bool = 0;
}
}
if (bool == 1) {
printf("%d\n", x);
}
}
}
I type:
NPP_SAVE
CD $(CURRENT_DIRECTORY)
C:\MinGW32\bin\gcc.exe -g "$(FILE_NAME)"
a
into the box and then execute. A window pops up saying a.exe has stopped working. Any ideas on how to fix this?

There is a bug and many smaller issues in your code:
You should not define x, bool and y as global variables; define them inside the body of the main function.
The prototype for main is int main(void) or int main(int argc, char *argv[]). main() with no return type relies on "implicit int", which was removed in C99; older compilers accept it, but recent gcc and clang releases reject it by default in C99 and later modes. You should also add return 0; at the end of main: although it was made implicit in C99, it is better style and more portable to older compilers.
You should not use bool as a variable name: bool is usually used as a type name (and is a keyword in C23). Name your flag prime instead.
Your outer loop stops at 100, but you want the primes up to 1000.
y <= floor(x / 2) is needlessly complicated: x / 2 is an integer expression that already has the value you expect. Passing it to floor converts it to double and returns the same value, and y is then converted to double for the comparison. y <= x / 2 suffices. Since floor is the only function used from <math.h>, you can remove that include too.
Even so, this condition makes the program perform far too many iterations: it is enough to test divisors up to the square root, i.e. y * y <= x.
Initializing y to 0 causes a division by zero when you compute x % y, which invokes undefined behavior: that is the crash you observe. You must start y at 2, because every number is divisible by 1.
You should exit the loop as soon as you detect that x is composite, avoiding unnecessary iterations.
Here is a corrected version:
#include <stdio.h>
int main(void) {
int x, y, prime;
for (x = 2; x <= 1000; x++) {
prime = 1;
for (y = 2; y * y <= x; y++) {
if (x % y == 0) {
prime = 0;
break;
}
}
if (prime == 1) {
printf("%d\n", x);
}
}
return 0;
}
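The same trial division can be factored into a function, which makes it easy to check against known counts (there are 25 primes up to 100 and 168 up to 1000). This is a sketch; is_prime and count_primes are hypothetical names, and d <= n / d is used instead of d * d <= n only to avoid overflow for n close to INT_MAX.

```c
#include <stdbool.h>

/* Trial division up to the square root, as in the corrected program. */
bool is_prime(int n)
{
    if (n < 2)
        return false;                   /* 0, 1 and negatives are not prime */
    for (int d = 2; d <= n / d; d++) {  /* same as d * d <= n, without overflow */
        if (n % d == 0)
            return false;               /* found a divisor: composite */
    }
    return true;
}

/* Counts the primes in [2, limit]. */
int count_primes(int limit)
{
    int count = 0;
    for (int n = 2; n <= limit; n++)
        if (is_prime(n))
            count++;
    return count;
}
```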

You have
for (y = 0; y <= floor(x/2); y++) {
if (x % y == 0) {
You initialize y to zero, then immediately divide by y (i.e. by zero) in x % y. Division by zero is undefined behavior; here it crashes your program.

The problem is these two lines:
for (y = 0; y <= floor(x/2); y++) {
if (x % y == 0) {
What do you expect to get with 2 % 0?
As per the standard:
C99 6.5.5p5 - The result of the / operator is the quotient from the division of the first operand by the second; the result of the % operator is the remainder. In both operations, if the value of the second operand is zero, the behavior is undefined.
The crash is caused by the modulo operator when y is 0.

How to evaluate the Sine Series (Taylor) for value of x using Recursion in C?

Calculate the series (in C)
y = x - x^3/3! + x^5/5! - x^7/7! + ...
where the stopping criterion is
|x^i / i!| <= 0.001
What I have tried:
#include<stdio.h>
#include<math.h>
int fact(int x){
if(x>1){
return x * fact(x-1);
}
else {
return 1 ;
}
}
int main () {
int x , i=1 , sign=1 ;
float result ;
scanf("%d",&x);
while(abs(pow(x,i)/fact(i))>0.001){
result += sign*(pow(x,i)/fact(i));
i+2;
sign = sign * -1 ;
}
printf("result= %f\n",result);
return 0 ;
}
The problem is that when I input 90, the output should be 1 (like sin(x) for 90 degrees), but I'm getting a different output.

The code (at least) fails to initialise result.
Change
float result;
to
float result = 0.f;
Also
i+2;
is effectively a no-op: it computes i + 2 and throws the result away without assigning it to anything.
To increment i by 2 do:
i = i + 2;
or
i += 2;
Also, abs() won't work here, as it takes and returns an int.
Use fabs() to get a floating-point value.
Or, for non-negative input, just do not use it at all, as its argument will never be negative.
As a final piece of advice, prefer double over float, as float's accuracy is limited.

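The problem is clear: you have to convert degrees to radians before running the loop. Your code also has some other issues.
Here is the rectified code; for 90 it prints a value close to 1 (about 0.99984, since the loop stops once a term drops to 0.001 or below):
#include <stdio.h>
#include <math.h>
/* double instead of int: an int factorial overflows beyond 12! */
double fact(int x) {
    if (x > 1) {
        return x * fact(x - 1);
    } else {
        return 1;
    }
}
int main(void) {
    int x, i = 1, sign = 1;
    double result = 0.0, rad;                        /* result must start at 0 */
    if (scanf("%d", &x) != 1)
        return 1;
    rad = x / 180.0 * acos(-1.0);                    /* degrees to radians; acos(-1.0) is pi */
    while (fabs(pow(rad, i) / fact(i)) > 0.001) {    /* test the radian term, not x */
        result += sign * (pow(rad, i) / fact(i));
        i += 2;
        sign *= -1;
    }
    printf("result= %f\n", result);
    return 0;
}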

sin(90) = 0.89399666360055789051826949840421... (with 90 in radians)
Besides the unit confusion, your code is not very efficient, as you compute the powers and factorials from scratch on each term, when a nice recurrence exists.
Sin= x
Term= x
Factor= -x*x
i= 2
while |Term| > 0.001:
    Term*= Factor / (i * (i+1))
    Sin+= Term
    i+= 2
Because of huge cancellation errors, this formula is not appropriate for large values of the argument. My own implementation gives -1.07524337969e+21 for 90.
If you compute it for 90-14*2*Pi instead, you get 0.893995..., not so bad a result.


Comparing fractions with struct

The function is supposed to compare two fractions that are stored in two structs.
If fraction L = fraction R return 0
If L > R return 1
If R > L return -1
Here is the code I have now:
int compare_fractions(Fraction L, Fraction R)
{
double z = (L.numer/L.denom) - (R.numer/R.denom);
// THIS CODE IS INCORRECT - FIX IT!
if(z == 0)
return 0;
else if(z < 0)
return -1;
else if(z > 0)
return 1;
}
However, when I run the following tests, I get 0 for each of these comparisons:
(1,3) ? (2,3)
(5,6) ? (3,4)
(2,4) ? (1,4)
where (1,3) is fraction L and (2,3) is fraction R

If the numerator and denominator are ints (or another integer type), the division is integer division, so you'll never get the correct fractional part.
Casting to double fixes most of the problem, but you pay for slow divisions and can occasionally get wrong results due to floating-point rounding.
Use multiplication instead. It is much faster, avoids floating-point division (which is very slow on some architectures), and you no longer need to worry about floating-point comparisons.
int compare_fractions(Fraction L, Fraction R)
{
int z = L.numer*R.denom - L.denom*R.numer;
if (z == 0)
return 0;
else if (z > 0)
return 1;
else
return -1;
}
Of course, you need to make sure that all the denominators are positive; otherwise you need to normalize the fractions first (you can use chux's suggestion below). You also need to account for overflow if your values can be large, by doing the math in a wider type, like
long long z = (long long)L.numer*R.denom - (long long)L.denom*R.numer;
If you can relax the requirements a bit and return a negative, zero or positive value for the less-than, equal or greater-than case, just like strcmp(), then you can remove the checks on z altogether and return L.numer*R.denom - L.denom*R.numer directly instead (subject to the same overflow caveat).
If you still need to return -1, 0 and 1, there are several ways to shorten/optimize it, like
return (z > 0) - (z < 0);
return (z == 0) ? 0 : (z < 0 ? -1 : 1);
return (z >> 31) | (!!z); // only for a 32-bit int z; right-shifting a negative value is implementation-defined
See also:
Is there a standard sign function (signum, sgn) in C/C++?
Fast sign of integer in C
Branchless code that maps zero, negative, and positive to 0, 1, 2
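Putting the pieces of this answer together, here is a sketch that normalizes negative denominators, cross-multiplies in long long and returns -1, 0 or 1. The question does not show the Fraction definition, so the struct below (two int fields, numer and denom, with a nonzero denom) is an assumption.

```c
/* Assumed layout: the question does not show the struct definition. */
typedef struct {
    int numer;
    int denom;   /* must be nonzero */
} Fraction;

/* Compares L and R by cross-multiplying in long long: no division and
   no floating point. Cannot overflow as long as no field equals INT_MIN.
   Returns -1, 0 or 1. */
int compare_fractions(Fraction L, Fraction R)
{
    long long ln = L.numer, ld = L.denom;
    long long rn = R.numer, rd = R.denom;
    if (ld < 0) { ln = -ln; ld = -ld; }   /* normalize: denominators positive */
    if (rd < 0) { rn = -rn; rd = -rd; }
    long long z = ln * rd - ld * rn;      /* sign of L - R, scaled by ld*rd > 0 */
    return (z > 0) - (z < 0);
}
```

With this version, (1,3) ? (2,3) gives -1, and (5,6) ? (3,4) and (2,4) ? (1,4) both give 1.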

When you divide an int by another int, the division is done in integer arithmetic and (because the result must be an int as well) the result is truncated toward zero. Only then is it converted to double:
int a = 7;
int b = 3;
double c = a / b; // = 2, because 2.333... rounded down is 2, which is
// then cast to a double
The solution is to cast either the numerator or the denominator to a double before dividing:
int a = 7;
int b = 3;
double c = (double)a / b; // = 2.333... because it's cast to a double before
// dividing
//double c = a / (double)b; // this will also work
More specifically, if you change one line in your code to this, it should work:
double z = ((double)L.numer/L.denom) - ((double)R.numer/R.denom);

How to check if float can be exactly represented as an integer

I'm looking for a reasonably efficient way of determining whether a floating-point value (double) can be exactly represented by an integer data type (long, 64 bit).
My initial thought was to check the exponent to see if it was 0 (or more precisely its biased value, 1023 for a double). But that won't work because 2.0 would be e=1 m=1...
So basically, I am stuck. I have a feeling that I can do this with bit masks, but I'm just not getting my head around how to do that at this point.
So how can I check to see if a double is exactly representable as a long?
Thanks

I think I have found a way to clamp a double into an integer in a standard-conforming fashion (this is not really what the question is about, but it helps a lot). First, we need to see why the obvious code is not correct.
// INCORRECT CODE
uint64_t double_to_uint64 (double x)
{
if (x < 0.0) {
return 0;
}
if (x > UINT64_MAX) {
return UINT64_MAX;
}
return x;
}
The problem here is that in the second comparison, UINT64_MAX is implicitly converted to double. The C standard does not specify exactly how this conversion works, only that it is rounded up or down to a representable value. This means that the second comparison may be false even if it should mathematically be true (which can happen when UINT64_MAX is rounded up and 'x' lies mathematically between UINT64_MAX and (double)UINT64_MAX). As such, the conversion of double to uint64_t can result in undefined behavior in that edge case.
Surprisingly, the solution is very simple. Consider that while UINT64_MAX may not be exactly representable in a double, UINT64_MAX+1, being a power of two (and not too large), certainly is. So, if we first round the input to an integer, the comparison x > UINT64_MAX is equivalent to x >= UINT64_MAX+1, except for possible overflow in the addition. We can avoid the overflow by using ldexp instead of adding one to UINT64_MAX. With that, the following code should be correct.
/* Input: a double 'x', which must not be NaN.
* Output: If 'x' is lesser than zero, then zero;
* otherwise, if 'x' is greater than UINT64_MAX, then UINT64_MAX;
* otherwise, 'x', rounded down to an integer.
*/
uint64_t double_to_uint64 (double x)
{
assert(!isnan(x));
double y = floor(x);
if (y < 0.0) {
return 0;
}
if (y >= ldexp(1.0, 64)) {
return UINT64_MAX;
}
return y;
}
Now, back to your question: is x exactly representable in a uint64_t? Only if it is neither rounded nor clamped.
/* Input: a double 'x', which must not be NaN.
* Output: If 'x' is exactly representable in an uint64_t,
* then 1, otherwise 0.
*/
int double_representable_in_uint64 (double x)
{
assert(!isnan(x));
return (floor(x) == x && x >= 0.0 && x < ldexp(1.0, 64));
}
The same algorithm can be used for integers of different sizes, and also for signed integers with a minor modification. The code that follows does some very basic tests of the uint32_t and uint64_t versions (only false positives can possibly be caught), but is also suitable for manual examination of the edge cases.
#include <inttypes.h>
#include <math.h>
#include <limits.h>
#include <assert.h>
#include <stdio.h>
uint32_t double_to_uint32 (double x)
{
assert(!isnan(x));
double y = floor(x);
if (y < 0.0) {
return 0;
}
if (y >= ldexp(1.0, 32)) {
return UINT32_MAX;
}
return y;
}
uint64_t double_to_uint64 (double x)
{
assert(!isnan(x));
double y = floor(x);
if (y < 0.0) {
return 0;
}
if (y >= ldexp(1.0, 64)) {
return UINT64_MAX;
}
return y;
}
int double_representable_in_uint32 (double x)
{
assert(!isnan(x));
return (floor(x) == x && x >= 0.0 && x < ldexp(1.0, 32));
}
int double_representable_in_uint64 (double x)
{
assert(!isnan(x));
return (floor(x) == x && x >= 0.0 && x < ldexp(1.0, 64));
}
int main ()
{
{
printf("Testing 32-bit\n");
for (double x = 4294967295.999990; x < 4294967296.000017; x = nextafter(x, INFINITY)) {
uint32_t y = double_to_uint32(x);
int representable = double_representable_in_uint32(x);
printf("%f -> %" PRIu32 " representable=%d\n", x, y, representable);
assert(!representable || (double)(uint32_t)x == x);
}
}
{
printf("Testing 64-bit\n");
for (double x = 18446744073709510656.0; x < 18446744073709629440.0; x = nextafter(x, INFINITY)) {
uint64_t y = double_to_uint64(x);
int representable = double_representable_in_uint64(x);
printf("%f -> %" PRIu64 " representable=%d\n", x, y, representable);
assert(!representable || (double)(uint64_t)x == x);
}
}
}

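Here's one method that could work in most cases. I'm not sure if/how it will break if you give it NaN, INF or very large (overflowing) numbers...
(I think they would all return false, i.e. not exactly representable, but note that converting NaN, an infinity or an out-of-range value to an integer type is undefined behavior in C, so nothing is guaranteed.)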
You could:
Cast it to an integer.
Cast it back to double.
Compare with original value.
Something like this:
double val = ... ; // Value
if ((double)(long long)val == val){
// Exactly representable
}
floor() and ceil() are also fair game, though they only test for a whole number, not whether the value fits in an integer type:
floor(val) == val
ceil(val) == val
And here's a messy bit-mask solution:
This uses union type-punning and assumes IEEE double precision. Union type-punning is only valid in C99 TC3 (Technical Corrigendum 3) and later.
#include <stdint.h>
int representable(double x){
// Handle corner cases:
if (x == 0)
return 1;
// -2^63 is representable as a signed 64-bit integer, but +2^63 is not.
if (x == -9223372036854775808.)
return 1;
// Warning: Union type-punning is only valid in C99 TC3 or later.
union{
double f;
uint64_t i;
} val;
val.f = x;
uint64_t exp = val.i & 0x7ff0000000000000ull;
uint64_t man = val.i & 0x000fffffffffffffull;
man |= 0x0010000000000000ull; // Implicit leading 1-bit.
int shift = (int)(exp >> 52) - 1075; // convert before subtracting to avoid unsigned wraparound
// Out of range
if (shift < -52 || shift > 10)
return 0;
// Test mantissa
if (shift < 0){
shift = -shift;
return ((man >> shift) << shift) == man;
}else{
return ((man << shift) >> shift) == man;
}
}
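If the union's type-punning rules are a concern, memcpy is a well-defined way to get at the same bits in any C version. A minimal sketch assuming a 64-bit IEEE 754 double (checked with a C11 _Static_assert); double_bits and unbiased_exponent are hypothetical names. It also confirms the observation in the question that 2.0 has exponent 1.

```c
#include <stdint.h>
#include <string.h>

/* Copies the object representation of x into a uint64_t.
   memcpy-based punning is well-defined in every C standard. */
uint64_t double_bits(double x)
{
    _Static_assert(sizeof(double) == sizeof(uint64_t), "double must be 64 bits");
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);
    return bits;
}

/* Unbiased binary exponent from the 11-bit exponent field
   (IEEE 754 binary64 assumed; meaningless for zero, subnormals, Inf, NaN). */
int unbiased_exponent(double x)
{
    return (int)((double_bits(x) >> 52) & 0x7ff) - 1023;
}
```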

You can use the modf function to split a float into the integer and fraction parts. modf is in the standard C library.
#include <math.h>
#include <limits.h>
double val = ...
double i;
long l;
/* check if fractional part is 0 */
if (modf(val, &i) == 0.0) {
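/* val is an integer. Check that it can be stored in a long: LONG_MAX may round up
   when converted to double, so compare against -(double)LONG_MIN (LONG_MAX + 1 on
   two's complement machines) with a strict < */
if (val >= (double)LONG_MIN && val < -(double)LONG_MIN) {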
/* can be exactly represented by a long */
l = val;
}
}

How to check if float can be exactly represented as an integer?
I'm looking for a reasonably efficient way of determining if a floating point value double can be exactly represented by an integer data type long, 64 bit.
Tests for the range (LONG_MIN, LONG_MAX) and for a fractional part (modf()) are needed. You also need to watch out for not-a-numbers.
The usual idea is to test (double)(long)x == x, but to avoid using it directly: (long)x is undefined behavior (UB) when x is out of range.
The valid range of conversion for (long)x is LONG_MIN - 1 < x < LONG_MAX + 1, as the conversion discards any fractional part of x. So code needs to test, using FP math, whether x is in range.
#include <limits.h>
#include <math.h>
#include <stdbool.h>
#define DBL_LONG_MAXP1 (2.0*(LONG_MAX/2+1))
#define DBL_LONG_MINM1 (2.0*(LONG_MIN/2-1))
bool double_to_long_exact_possible(double x) {
if (x < DBL_LONG_MAXP1) {
double whole_number_part;
if (modf(x, &whole_number_part) != 0.0) {
return false; // Fractional part exist.
}
#if -LONG_MAX == LONG_MIN
// rare non-2's complement machine
return x > DBL_LONG_MINM1;
#else
return x - LONG_MIN > -1.0;
#endif
}
return false; // Too large or NaN
}

Any IEEE floating-point double or float value with a magnitude at or above 2^52 or 2^23, respectively, is a whole number. Adding 2^52 or 2^23 to a positive number whose magnitude is less than that will cause it to be rounded to a whole number. Subtracting the value that was added then yields a whole number, which equals the original iff the original was a whole number. Note that this algorithm will fail with some numbers larger than 2^52, but it isn't needed for numbers that big.
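A sketch of this trick for double, under stated assumptions: IEEE 754 binary64, the default round-to-nearest mode, FLT_EVAL_METHOD == 0 (no extended-precision intermediates, as can happen with x87) and no -ffast-math, which may simplify (a + c) - c to a. is_whole_number is a hypothetical name, and it only answers the whole-number part of the question; a range check against LONG_MIN/LONG_MAX is still needed.

```c
#include <math.h>
#include <stdbool.h>

/* Add-and-subtract 2^52 test for a whole-number double (assumptions above). */
bool is_whole_number(double x)
{
    const double two52 = 4503599627370496.0;   /* 2^52 */
    if (isnan(x) || isinf(x))
        return false;
    double a = fabs(x);
    if (a >= two52)
        return true;                 /* spacing between doubles is >= 1 here */
    double rounded = (a + two52) - two52;   /* a rounded to the nearest integer */
    return rounded == a;
}
```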

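Could you use the remainder to check if the double is divisible by one... or am I completely misunderstanding the question? In C the % operator only accepts integer operands, so use fmod() from <math.h> instead:
double val = ... ; // Value
if (fmod(val, 1.0) == 0.0) {
// val is evenly divisible by 1 and is therefore a whole number
// (this does not check that it fits in a long)
}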
