I need to convert a double/float angle to the range of [-180,180] by adding or subtracting 360. The remainder function works, but I am not sure why.
x = remainder (x, 360);
Why does this produce a range of [-180,180] and not [0,359.99999...]?
I understand that remainder and mod are the same for positive numbers, but they work differently for negative numbers... I just have not seen a good explanation of what is happening.
I'm happy that this works of course, but I don't really understand why.
Taken from cppreference:
The IEEE floating-point remainder of the division operation x/y calculated by this function is exactly the value x - n*y, where the value n is the integral value nearest the exact value x/y. When |n-x/y| = ½, the value n is chosen to be even.
In contrast to fmod(), the returned value is not guaranteed to have the same sign as x.
If the returned value is 0, it will have the same sign as x.
What's happening here is that remainder(x, y) divides x by y, rounds that quotient to the nearest integer, multiplies it by y, and then subtracts the result from x.
Example:
remainder(160.0f, 360.0f)
result = 160 - round(160.0f / 360.0f) * 360
result = 160 - round(0.44f) * 360
result = 160 - (0 * 360)
result = 160.0f
Example2:
remainder(190.0f, 360.0f)
result = 190 - round(190.0f / 360.0f) * 360
result = 190 - round(0.53f) * 360
result = 190 - (1 * 360)
result = -170.0f
Thus you could end up having negative numbers depending on your input variable.
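For completeness, here is a minimal sketch of the angle-normalization use from the question (the wrap180 helper name is mine, not from the original post):

#include <math.h>
#include <stdio.h>

/* remainder(x, 360) = x - round(x/360)*360, so the result is always
   within half a period of zero, i.e. in [-180, 180]. */
static double wrap180(double x) {
    return remainder(x, 360.0);
}

int main(void) {
    printf("%g %g %g\n", wrap180(160.0), wrap180(190.0), wrap180(-350.0));
    /* prints: 160 -170 10 */
    return 0;
}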
Related
I am just starting to learn C programming but whenever I run this:
double a = 9.92;
double b = 2.00;
printf("%.2lf, %.2lf, %.2lf", a, b, remainder(a, b));
The output for the remainder is always -0.08. The output that I was hoping for is 1.92.
The remainder function rounds the result of the division to the nearest integer, which means you can get a negative result. From the man page:
The remainder() function computes the remainder of dividing x
by y. The return value is x-n*y, where n is the value x / y,
rounded to the nearest integer. If the absolute value of x-n*y
is 0.5, n is chosen to be even.
So given your input, 9.92 / 2 results in 4.96 which rounded to the nearest integer is 5. Then you have 9.92 - (5 * 2.00) == 9.92 - 10.0 == -0.08.
You want to instead use fmod, which rounds the division toward 0. The man page for that states:
The fmod() function computes the floating-point remainder of
dividing x by y. The return value is x - n * y, where n is the
quotient of x / y, rounded toward zero to an integer.
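A small side-by-side sketch of the two functions on the asker's values (the printed results follow from the definitions quoted above):

#include <math.h>
#include <stdio.h>

int main(void) {
    double a = 9.92, b = 2.00;
    /* remainder: n = round(9.92/2.00) = round(4.96) = 5 -> 9.92 - 10.00 = -0.08 */
    /* fmod:      n = trunc(9.92/2.00) = 4              -> 9.92 -  8.00 =  1.92 */
    printf("remainder = %.2f\n", remainder(a, b));
    printf("fmod      = %.2f\n", fmod(a, b));
    return 0;
}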
I want to know whether the program defined below can return 1 assuming:
IEEE754 floating point arithmetics
no overflow (neither in max/x nor in f*x)
no nan or inf (obviously)
0 < x and 0 < n < 32
no unsafe math optimization
int canfail(int n, double x) {
    double max = 1ULL << n; // 2^n
    double f = max / x;
    return f * x > max;
}
In my opinion, it should sometimes return 1, as roundToNearest(max / x) can in general be greater than max/x.
I'm able to find numbers for the opposite case, where f * x < max, but I have no examples of input that show f * x > max and I have no idea how to find one. Can somebody help?
EDIT:
I know the value of x is in a range between 10^(-6) and 10^6 (that still leaves far too many possible double values), but I know I will not have to deal with overflow, underflow or sub-normal numbers!
In addition, I just realized that because max is a power of two and we don't deal with overflow, the solution will be the same with max fixed to 1, as it is exactly the same computation, just shifted.
Therefore, the problem corresponds to finding a positive, normal double value x such that (1/x) * x > 1.0!
I made a little program to try to find a solution:
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <stdint.h>
#include <omp.h>
int main( void ) {
#pragma omp parallel
    {
        unsigned short int xsubi[3] = {
            omp_get_thread_num(),
            omp_get_thread_num(),
            omp_get_thread_num()
        };
#pragma omp for
        for (int64_t i = 0; i < INT64_MAX; i++) {
            double x = fmod(nrand48(xsubi), 1048576.0);
            if (x < 0.000001)
                continue;
            double f = 1.0 / x;
            if (f * x > 1.0) {
                printf("found !!! x=%.30f\n", x);
                fflush(stdout);
            }
        }
    }
    return 1;
}
If you change the sign of the comparison, you will find some value quickly. However, it seems to run forever with f * x > 1.0.
In the absence of underflow or overflow, the exponents are irrelevant; if M/x*x > M, then (M/p) / (x/q) * (x/q) > (M/p) for any powers of two p and q. So let's consider 2^52 ≤ x < 2^53 and M = 2^105. We can eliminate x = 2^52 since this yields exact floating-point arithmetic, so 2^52 < x < 2^53.
Division of 2^105 by x yields integer quotient q and integer remainder r, with 2^52 ≤ q < 2^53, 0 < r < x, and 2^105 = q•x + r.
In order for M/x*x to exceed M, both the division and the multiplication must round up. Since the division rounds up, x/2 ≤ r.
With rounding up, the result of floating-point division of 2^105 by x yields q+1. The exact (not rounded) multiplication then yields (q+1)•x = q•x + x = q•x + r + (x - r) = 2^105 + (x - r). Since x/2 ≤ r, we have x - r ≤ x/2 < 2^52, which is less than half an ulp of 2^105 (the ulp there is 2^53), so rounding this exact result rounds down, yielding 2^105. (Even an exact tie at half an ulp would round down, since the low bit of 2^105 is even.)
Therefore, for powers of two M and all arithmetic within exponent bounds, M/x*x > M never occurs with round-to-nearest-ties-to-even.
Multiplication by a power of two is just a scaling of the exponent and does not change the problem, so it's the same as finding x such that (1/x) * x > 1.
One solution is brute force search.
For the same reason, we can limit the search for such an x to the interval [1.0, 2.0).
A better approach is to analyze error bounds without brute force.
Let's denote by ix the floating-point value nearest to 1/x.
Considering x and ix as exact fractions, we can write the integer division: 1 = ix * x + r, where r is the remainder
(these are all fractions whose denominators are powers of 2, so we would have to multiply the whole equation by an appropriate power of 2 to really have an integer division).
In other words, ix = 1/x - r/x, where -r/x is the rounding error of inversion.
When we multiply the inverse approximation by x, the exact value is ix*x = 1 - r.
We know that the floating point result will be rounded to the nearest float to that exact value.
So, assuming the default rounding mode (to nearest, ties to even), the question asked is whether -r can exceed 0.5 ulp.
The short answer is never!
Suppose |r| > 0.5 ulp; then the rounding error -r/x of the inversion exceeds half an ulp of the exact result 1/x, which a correctly rounded division cannot do.
This is not a proper proof, because the exact result is not a floating-point value and does not have an ulp, but you get the idea...
I might come back with a correct proof if I have time, but my bet is that you can find it already done, possibly on SO.
EDIT
Why can you find (1/x) * x < 1?
Simply because 1.0 is at a binade limit: just below 1 the floating-point spacing halves, so we would have to prove that r < 0.25 ulp, which we cannot...
canfail(1, pow(2, 1023) * (2 - pow(2, -51))) will return 1.
I need to convert and round F to C. My function is simply:
return (int)((((float)5 / (float)9) * (f - 32)) + 0.5)
But if I input 14 f, I get back -9 c, instead of -10 c.
C has a nice function lround() to round and convert to an integer.
The lround and llround functions round their argument to the nearest integer value, rounding halfway cases away from zero, regardless of the current rounding direction. C11dr §7.12.9.7 2
#include <math.h>
return lround(5.0/9.0 * (f - 32));
The +0.5-and-then-cast-to-int approach has various troubles with it. It "rounds" incorrectly for negative values and rounds incorrectly for various edge cases when x + 0.5 is not exact.
Use the <math.h> rounding functions (rint(), round(), nearbyint(), etc.); they are the best tools in the shed.
OP commented about needing a VxWorks solution. That apparently has iround() to do the job.
For a no math.h nor double solution:
Use (a + sign(a)*b/2)/b idiom. After offsetting by 32 degrees F, we need c = irounded(5*f/9) or c = irounded(10*f/18).
int FtoC(int f) {
    f -= 32;                        // offset so 32 F maps to 0 C
    if (f < 0) {
        return (2*5*f - 9)/(2*9);   // (10*f - 9)/18: nearest, halves away from zero
    }
    return (2*5*f + 9)/(2*9);       // (10*f + 9)/18: nearest, halves away from zero
}
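A quick check of the idiom (a small test harness I added for illustration; it assumes it is compiled together with the FtoC() above):

#include <stdio.h>

int FtoC(int f); /* defined above */

int main(void) {
    /* 14 F is exactly -10 C, 212 F is 100 C, 50 F is 10 C */
    printf("%d %d %d\n", FtoC(14), FtoC(212), FtoC(50)); /* prints: -10 100 10 */
    return 0;
}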
((14 - 32) * 5.0) / 9.0 = -10.0
-10.0 + 0.5 = -9.5
(int)(-9.5) = -9
Adding 0.5 for rounding purposes will only work when the result of the calculation of f - 32 is positive. If the result is negative, it has to be changed to -0.5.
You could change your code to this:
float roundVal = (f < 32) ? -0.5f : 0.5f;
return (int)((((float)5 / (float)9) * (f - 32)) + roundVal);
It sounds like you have two problems:
The number you're trying to round is negative, meaning that the standard trick of adding 0.5 goes the wrong way.
Standard rounding functions like round() are for some reason denied to you.
So just write your own:
double my_round(double x, double to_nearest)
{
    if (x >= 0)
        return (int)(x / to_nearest + 0.5) * to_nearest;
    else
        return (int)(x / to_nearest - 0.5) * to_nearest;
}
Now you can write
return (int)my_round(5./9. * (f - 32), 1.0);
Everyone's claim that
"Int() doesn't function correctly in the negative region of the number line"
is completely and utterly WRONG, and quite disgusting! We programmers should know and understand the concept of "the number line"!
Int(9.5) == 10 => true
Int(-9.5) == -9 => true
Let's say we have a dataset that, coincidentally, is all something-point-5 values, and is a linear system.
Keep in mind that this is MATLAB syntax; to me programming is programming, so it is entirely applicable in any language.
x = [-9.5:1:9.5]   % -9.5 to 9.5 at increments of 1
% -9.5 -8.5 -7.5 ..... 9.5

% Now we need a function Int(), and let's say it rounds to the nearest,
% as y'all say it should: "direction of the sign". MATLAB doesn't
% have Int()... that I know of.

function INTEGER = Int_stupid(NUMBER)
    POL = NUMBER / abs(NUMBER);         % polarity multiplier
    VALUE_temp = NUMBER + (POL * 0.5);  % incorrectly implemented
                                        % rounding to the nearest
    % A number divided by its absolute value is 1 times its polarity:
    %   ( -9.5 / abs( -9.5 ) ) = -1
    %   (  9.5 / abs(  9.5 ) ) =  1
end

function INTEGER = Int(NUMBER)          % how every other Int function works
    VALUE_temp = NUMBER + 0.5;          % correctly implemented rounding
                                        % to the nearest
end

% Now we need the whole dataset rounded to the "nearest, direction of the sign"
x_rounded = Int_stupid(x)   % => x = [-10, -9, -8, ... -1, 1, 2, ...]
% Notice how there is no 0; there is a discontinuity in this bad rounding.
% Notice that in the plot there is a zig, or zag, in my PERFECT LINEAR SYSTEM.
% Notice the two parallel lines with no defects representing the RAW linear
% system, and the parallel correctly rounded => floor( x + 0.5 )
Data rounded to the nearest, if done correctly, will parallel the actual data.
Sorry for my anger and programmatic insults. I expect experts to be experts who don't sell completely incorrect information. And if I do the same, I expect the same humiliation from my peers => YOU.
References (for 2nd-grade how to round numbers):
https://math.stackexchange.com/questions/3448/rules-for-rounding-positive-and-negative-numbers
https://en.wikipedia.org/wiki/IEEE_754#Rounding_algorithms
I am coding a very simple program in C but I keep getting wrong answers to calculations that I am doing. The final output that I want needs to have no decimal places so I am using int as the data type even though the answer will not be an integer. Here is the code:
int numberOfInches = (100/254)*101;
I either get the answer 0 if I use int as the data type or crazy long numbers if I try using float or double. Any ideas on what I am doing wrong?
100 / 254
This is integer division, which yields 0. You then multiply that by 101.
To do floating point division, at least one of the operands of / must be floating point:
int n = (int)((100. / 254.) * 101.);
When both operands of the / operator are integers, it performs integer division, i.e. the result is the quotient with the fractional part truncated. The result of 100/254 is less than 1, so it truncates to 0.
You can either make one of the constants floating point:
(100.0/254)*101
Or you can do the division last:
(100*101)/254
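A small demonstration of the difference (a sketch; note that doing the division last still truncates 39.76... down to 39):

#include <stdio.h>

int main(void) {
    printf("%d\n", (100 / 254) * 101);          /* 0  : 100/254 truncates to 0 first    */
    printf("%d\n", (100 * 101) / 254);          /* 39 : 10100/254 = 39.76..., truncated */
    printf("%d\n", (int)((100.0 / 254) * 101)); /* 39 : floating-point, truncated by cast */
    return 0;
}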
You use an int, so each intermediate result is truncated to an integer:
100/254 = 0, THEN 0 * 101 = 0, SO the final result is 0.
I think you can do something like:
int numberOfInches = 100 * 101 / 254;
result: 100 * 101 = 10100, THEN 10100 / 254 = 39.7xxxxxx, which truncates, SO the final result is 39.
Obviously it will give 0.
int numberOfInches = (100/254)*101;
The inner parentheses are evaluated first,
so (100/254) will be 0. And when you multiply 0 by 101, i.e. 0*101, you get 0.
So the result will be 0.
To get the correct output, use the following.
int numberOfInches =(int)((100.0/254)*101);
The division in parentheses is integer division, so 100/254 = 0. If you want to calculate a value with a fractional part, try:
int numberOfInches = (100.0 / 254.0) * 101.0;
Why not use float? Try 100.0f/254.0f.
I'm working via a basic 'Programming in C' book.
I have written the following code based off of it in order to calculate the square root of a number:
#include <stdio.h>

float absoluteValue (float x)
{
    if (x < 0)
        x = -x;
    return (x);
}

float squareRoot (float x, float epsilon)
{
    float guess = 1.0;

    while (absoluteValue(guess * guess - x) >= epsilon)
    {
        guess = (x/guess + guess) / 2.0;
    }

    return guess;
}

int main (void)
{
    printf("SquareRoot(2.0) = %f\n", squareRoot(2.0, .00001));
    printf("SquareRoot(144.0) = %f\n", squareRoot(144.0, .00001));
    printf("SquareRoot(17.5) = %f\n", squareRoot(17.5, .00001));
    return 0;
}
An exercise in the book has said that the current criteria used for termination of the loop in squareRoot() is not suitable for use when computing the square root of a very large or a very small number.
Instead of comparing the difference between the value of x and the value of guess^2, the program should compare the ratio of the two values to 1. The closer this ratio gets to 1, the more accurate the approximation of the square root.
If the ratio is just guess^2/x, shouldn't my code inside of the while loop:
guess = (x/guess + guess) / 2.0;
be replaced by:
guess = ((guess * guess) / x ) / 1 ; ?
This compiles but nothing is printed out into the terminal. Surely I'm doing exactly what the exercise is asking?
To calculate the ratio, just do (guess * guess / x); that could be either higher or lower than 1 depending on your implementation. Similarly, your margin of error (in percent) would be absoluteValue((guess * guess / x) - 1) * 100.
All they want you to check is how close the square root is. By squaring the number you get and dividing it by the number you took the square root of you are just checking how close you were to the original number.
Example:
sqrt(4) = 2
2 * 2 / 4 = 1 (this is exact, so we get 1, since 2 * 2 = 4)
margin of error = (1 - 1) * 100 = 0% margin of error
Another example:
sqrt(4) = 1.999 (let's just say you got this)
1.999 * 1.999 = 3.996
3.996/4 = .999 (so we are close but not exact)
To check margin of error:
.999 - 1 = -.001
absoluteValue(-.001) = .001
.001 * 100 = .1% margin of error
How about applying a little algebra? Your current criterion is:
|guess^2 - x| >= epsilon
You are elsewhere assuming that guess is nonzero, so it is algebraically safe to convert that to
|1 - x / guess^2| >= epsilon / guess^2
epsilon is just a parameter governing how close the match needs to be, and the above reformulation shows that it must be expressed in terms of the floating-point spacing near guess^2 to yield equivalent precision for all evaluations. But of course that's not possible because epsilon is a constant. This is, in fact, exactly why the original criterion gets less effective as x diverges from 1.
Let us instead write the alternative expression
|1 - x / guess^2| >= delta
Here, delta expresses the desired precision in terms of the spacing of floating point values in the vicinity of 1, which is related to a fixed quantity sometimes called the "machine epsilon". You can directly select the required precision via your choice of delta, and you will get the same precision for all x, provided that no arithmetic operations overflow.
Now just convert that back into code.
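Here is one way that conversion might look (a sketch under the reformulation above; the function name, the use of fabsf() in place of the book's absoluteValue(), and the delta value of 1e-5 are my choices, not the book's):

#include <math.h>
#include <stdio.h>

/* Newton's method with a relative (ratio-based) termination test:
   stop once |1 - x / guess^2| drops below delta. */
float squareRootRel(float x, float delta) {
    float guess = 1.0f;
    while (fabsf(1.0f - x / (guess * guess)) >= delta) {
        guess = (x / guess + guess) / 2.0f;
    }
    return guess;
}

int main(void) {
    printf("%f %f %f\n",
           squareRootRel(2.0f, 1e-5f),
           squareRootRel(144.0f, 1e-5f),
           squareRootRel(17.5f, 1e-5f));
    /* approximately 1.414214 12.000000 4.183300 */
    return 0;
}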
Suggest a different point of view.
With this method, guess_next = (x/guess + guess) / 2.0;, once the initial approximation is in the neighborhood, the number of accurate bits doubles each iteration. Example: log2(FLT_EPSILON) is about -23, so 6 iterations are needed. (Think 23, 12, 6, 3, 2, 1.)
The trouble with using guess * guess is that it may underflow to 0.0 or overflow to infinity for a non-zero x.
To form a quality initial guess:
assert(x > 0.0f);
int expo;
float signif = frexpf(x, &expo);       // x = signif * 2^expo, signif in [0.5, 1)
float guess = ldexpf(signif, expo/2);  // halve the exponent for a rough first guess
Now iterate N times (e.g. 6), with N based on FLT_EPSILON, FLT_DECIMAL_DIG or FLT_DIG.
for (int i = 0; i < N; i++) {
    guess = (x/guess + guess) / 2.0f;
}
The cost of perhaps an extra iteration is saved by avoiding an expensive termination condition calculation.
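Assembled into one function (a sketch of what this answer describes; the fixed count of 6 iterations and the function name are my assumptions):

#include <assert.h>
#include <math.h>
#include <stdio.h>

/* Newton's method with a frexpf()/ldexpf() seed and a fixed iteration
   count instead of an epsilon-based termination test. */
float squareRoot_fixed(float x) {
    assert(x > 0.0f);
    int expo;
    float signif = frexpf(x, &expo);        /* x = signif * 2^expo, signif in [0.5, 1) */
    float guess = ldexpf(signif, expo / 2); /* halve the exponent for a rough sqrt */
    for (int i = 0; i < 6; i++) {           /* 6 doublings cover float precision */
        guess = (x / guess + guess) / 2.0f;
    }
    return guess;
}

int main(void) {
    printf("%f %f %f\n", squareRoot_fixed(2.0f), squareRoot_fixed(144.0f),
           squareRoot_fixed(17.5f));
    /* approximately 1.414214 12.000000 4.183300 */
    return 0;
}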
If code wants to compare a/b for nearness to 1.0f, simply use some epsilon factor like 1 or 2.
float a = guess;
float b = x/guess;
assert(b);
float q = a/b;
#define FACTOR (1.0f /* some value 1.0f to maybe 2, 3 or 4 */)
if (q >= 1.0f - FLT_EPSILON*FACTOR && q <= 1.0f + FLT_EPSILON*FACTOR) {
    close_enough();
}
First lesson in numerical analysis: for floating point numbers x+y has the potential for large relative errors, especially when the sum is near zero, but x*y has very limited relative errors.