What shall I do, if C compares two doubles wrong? [closed] - c

I want to write a fmod() function
double fmod(double x, double y) {
double mod = x;
while(mod >= y)
{
mod -= y;
}
return mod;
}
But fmod(1.2, 0.05) returns 0.05

Although the title asks about incorrect comparison, the comparison in the program shown is correct. It is the only correct floating-point operation in the program; all the others have errors (compared to real-number arithmetic).
In fmod(1.2, 0.05), neither 1.2 nor 0.05 is representable in the double format used in your C implementation. These numerals in source code are rounded to the nearest representable values, 1.1999999999999999555910790149937383830547332763671875 and 0.05000000000000000277555756156289135105907917022705078125.
Then, in the subtraction in mod -= y;, the exact real arithmetic result, 1.14999999999999995281552145343084703199565410614013671875 is not representable, so it is rounded to 1.149999999999999911182158029987476766109466552734375.
Similar errors continue during the calculations, until eventually 0.0499999999999994615418330567990778945386409759521484375 is produced. At each point, the comparison mod >= y correctly evaluates whether mod is greater than or equal to y. When mod is less than y, the loop stops.
However, due to intervening errors, the result produced, 0.0499999999999994615418330567990778945386409759521484375, is not equal to the residue of 1.1999999999999999555910790149937383830547332763671875 divided by 0.05000000000000000277555756156289135105907917022705078125. The correct result can be calculated with the standard fmod function, which returns 0.04999999999999989175325509904723730869591236114501953125.
Note that, when you define a function named fmod, the C standard does not define the behavior because this conflicts with the standard library function of that name. You ought to give it a different name, such as fmodAlternate.
Inside the fmod routine, errors can be avoided. It is possible to implement fmod so that it returns an exact result for the arguments it is given. (This is possible because the result is always in a region of the floating-point range that is fine enough [has a low enough exponent] to represent the real arithmetic result exactly.) However, the errors in providing the arguments cannot be corrected: It is not possible to represent 1.2 or 0.05 in the double format your C implementation uses. The source code fmod(1.2, .05) will always calculate fmod(1.1999999999999999555910790149937383830547332763671875, 0.05000000000000000277555756156289135105907917022705078125), which is 0.04999999999999989175325509904723730869591236114501953125.
An alternative is to represent the numbers differently. For example, you could scale these numbers by a factor of 100, and fmod(120, 5) will return 0. What solution is appropriate depends on the circumstances of the problem you are trying to solve.

Related

Why doesn't roundf work for me in this case? [closed]

This is C. I am a beginner, so sorry for the experts to whom this question may seem trivial. I am trying to round this float to the nearest integer, away from zero. I've also tried rintf based on some other posts on the internet, but it just won't work! I used printf to check the results, and they weren't rounded to the nearest integer.
//Approximate US grade level.
float index = 0.0588 * L - 0.296 * S - 15.8;
float roundf(float index);
Note that
float roundf(float index);
is a declaration of a function. It is not a call.
If you wrote float roundf(float index); where you intended a call, for example as index = float roundf(float index);, you should get a compiler error, unless you are on an unusual compiler. On a line of its own, as in your snippet, it typically compiles but only redeclares the function and rounds nothing, which would explain why it "won't work" as expected.
A correct call would be index = roundf(index);.
I used printf to check the results, and they weren't rounded to the nearest integer.
Note that floating-point types have limited precision, so they are not ideal when exact integer values matter. A float or double cannot represent every integer exactly; with the usual IEEE-754 formats, a float is exact only for integers up to 2^24 in magnitude and a double up to 2^53.
Related:
Why not use Double or Float to represent currency?
Functions in C take an input and return an output in this way: output = function(input);. Some functions take more than one input, of course, but the principle is the same.
For your case, try index = roundf(index); if you want the result to overwrite the non-rounded value.

Simple floating point multiplication not giving expected result [duplicate]

This question already has answers here:
Is floating point math broken?
When given the input 150, I expect the output to be the mathematically correct answer 70685.7750, but I am getting the wrong output 70685.7812.
#include<stdio.h>
int main()
{
float A,n,R;
n=3.14159;
scanf("%f",&R);
A=n*(R*R);
printf("A=%.4f\n",A);
}
float and double values cannot represent most real numbers exactly. The main reason is that storage is finite, while most non-integer values would need infinitely many digits.
The best example is PI. You can specify as many digits as you want, but it will still be an approximation.
The limited precision of the representation is the reason for the following rule:
when working with float and double numbers, do not check for equality (m == n), but check that the absolute difference between them is smaller than a certain error (fabs(m - n) < e)
Please note, as mentioned in the comments too, that the above rule is not "the mother rule of all rules". There are other rules also.
Careful analysis must be done for each particular situation, in order to have a properly working application.
(Thanks @EricPostpischil for the reminder)
It is common for a variable of type float to be an IEEE-754 32-bit floating point number.
The number 3.14159 cannot be stored exactly in an IEEE-754 32-bit float - the closest value is approximately 3.14159012. 150 * 150 * 3.14159012 is 70685.7777, and the closest value to this that can be represented in a 32-bit float is 70685.78125, which you are then printing with %.4f so you see 70685.7812.
Another way of thinking about this is that your n value only ends up being accurate to the sixth significant figure, so, as you are just calculating a series of multiplications, your result is also only accurate to the sixth significant figure (i.e. 70685.8). (In the general case this can be worse: for example, subtraction of two close values can lead to a large increase in the relative error.)
If you switch to using variables of type double (and change the scanf() to use %lf), then you will likely get the answer you are after. double is typically a 64-bit float, which means that the error in the representation of your n value and the result is small enough not to affect the fourth decimal place.
Have you heard that float and double values aren't always perfectly accurate, have limited precision? Have you heard that type float gives you the equivalent of only about 7 decimal digits' worth of precision? This is what that means. Your expected and actual answers, 70685.7750 and 70685.7812, differ in the seventh digit, just about as expected.
I expect the output to be the mathematically correct answer
I am sorry to disappoint you, but that's your mistake. As a general rule, when you're doing floating-point arithmetic, you will never get the mathematically correct answer, you will always get a limited-precision approximation of the mathematically correct answer.
The canonical SO answers to this sort of question are collected at Is floating point math broken?. You might want to read some of those answers for more enlightenment.

Problems with palindrome in c [closed]

A C program to find if the input number is palindrome or not
The problem as I see it is that the even numbered powers come out strange. Can anyone please tell me what the problem could be?
The pow function is returning inexact values. Instead of using pow in the code, multiply the running sum by 10 at each step to build the reversed number:
remainder = n%10;
reversed = reversed*10 + remainder;
n /= 10;
For accurate results with pow, keep the computation in double as far as you can. Converting its result to an integer type truncates it, which tends to leave you with inaccurate results.
The implementation of pow you are using returns incorrect values. For integer powers of 10, it ought to return exactly 1, 10, 100, 1000, et cetera, but it returns values slightly different. Furthermore, when it returns a value slightly under an integer and that value is converted from floating-point to int, it is truncated, so the result of int x = pow(10, 3) may be 999 rather than 1000.
Do not use this pow for exponentiating ten. You can write a simple integer replacement for pow (with another name, of course), or you can rewrite your code to avoid relying on exponentiating ten. (Working iteratively, with 1, 10, 100, 1000, and so on, is often better—simply multiplying by ten at each step instead of exponentiating to calculate the power.)

Strange float rounding [duplicate]

This question already has answers here:
Why does division result in zero instead of a decimal?
I'm programming a microcontroller atmega168 (8 bit).
I want to do something like:
float A = cos(- (2/3) * M_PI);
Including of course math.h (#define M_PI 3.14159265358979323846)
As result, instead of having -0.5, i get 1.
I check the result using a serial communication to my pc that i'm sure that works also for float numbers because if i set
A= -0.50;
I receive the correct result.
PS. I cannot use double...also because i don't see the reason of doing so
Help me please!
2/3 is evaluated using integer arithmetic. It evaluates to 0. You mean to use floating-point division:
float A = cos(-(2.0/3.0) * M_PI);
If you want float literals use an f suffix:
float A = cos(-(2.0f/3.0f) * M_PI);
Do note however, that the M_PI macro, expanded here, is a double literal. Is that what you want?
I presume the real code doesn't look quite like this. If this really is your code then you would write float A = -0.5f and move on. I guess the real code has variables.
How much precision do you need, and where are the numbers coming from? If your goal is to compute the cosine of 120 degrees, just set A to -0.5. If your original numbers are not expressed as radians, and if you don't need an absolutely precise result, a table-based approximation may be more useful than built-in trig functions, since you can strike whatever balance between table size, execution speed, and precision will best fit your needs. Note also that if your original numbers are integers, you may be able to compute trig functions without using any floating-point values [e.g. one could have a function that accepts angles from 0-65535 and returns values from -16384 to +16384]. Integer math is often much faster than floating-point, so such a function could be a major performance win.

Why does MSVS not optimize away +0? [duplicate]

This question already has answers here:
Why does Clang optimize away x * 1.0 but NOT x + 0.0?
This question demonstrates a very interesting phenomenon: denormalized floats slow down the code more than an order of magnitude.
The behavior is well explained in the accepted answer. However, there is one comment, with currently 153 upvotes, that I cannot find satisfactory answer to:
Why isn't the compiler just dropping the +/- 0 in this case?!? –
Michael Dorgan
Side note: I have the impression that 0f is/must be exactly representable (furthermore, its binary representation must be all zeroes), but can't find such a claim in the C11 standard. A quote proving this, or an argument disproving this claim, would be most welcome. Regardless, Michael's question is the main question here.
§5.2.4.2.2
An implementation may give zero and values that are not floating-point
numbers (such as infinities and NaNs) a sign or may leave them
unsigned.
The compiler cannot eliminate the addition of a floating-point positive zero because it is not an identity operation. By IEEE 754 rules, the result of adding +0. to −0. is not −0.; it is +0.
The compiler may eliminate the subtraction of +0. or the addition of −0. because those are identity operations.
For example, when I compile this:
double foo(double x) { return x + 0.; }
with Apple GNU C 4.2.1 using -O3 on an Intel Mac, the resulting assembly code contains addsd LC0(%rip), %xmm0. When I compile this:
double foo(double x) { return x - 0.; }
there is no add instruction; the assembly merely returns its input.
So, it is likely the code in the original question contained an add instruction for this statement:
y[i] = y[i] + 0;
but contained no instruction for this statement:
y[i] = y[i] - 0;
However, the first statement involved arithmetic with subnormal values in y[i], so it was sufficient to slow down the program.
It is not the zero constant 0.0f that is denormalized, it is the values that approach zero each iteration of the loop. As they become closer and closer to zero, they need more precision to represent, hence the denormalization. In the original question, these are the y[i] values.
The crucial difference between the slow and fast versions of the code is the statement y[i] = y[i] + 0.1f;. As soon as this line is executed, the extra precision in the float is lost, and the denormalization needed to represent that precision is no longer needed. Afterwards, floating point operations on y[i] remain fast because they aren't denormalized.
Why is the extra precision lost when you add 0.1f? Because floating-point numbers only have so many significant digits. Say you have enough storage for three significant digits, then 0.00001 = 1e-5, and 0.00001 + 0.1 = 0.1, at least for this example float format, because it doesn't have room to store the least significant digit in 0.10001.

Resources