For fun, I was trying to evaluate the Gaussian integral from 0 to 1 using a series expansion. To do this, I wrote a factorial function which works correctly up to 20! (I checked), and then I wrote this:
int main(){
int n;
long double result=0;
for(n=0; n<=5; n++){
if(n%2==0){
result+=(((long double) 1/(long double)(factorial(n)*(2*n+1))));
} else {
result-=(((long double) 1/(long double)(factorial(n)*(2*n+1))));
}
}
printf("The Gaussian integral from 0 to 1 is %Lf\n", result);
}
This gives me a strange negative number which is obviously not even close. I suspect the problem is with the cast, but I don't know what it is. Any thoughts? This is not the first thing I tried. I tried converting anything in the expression and putting the explicit cast at the beginning, but it didn't work.
You are using the MinGW compiler (port of gcc for Windows), which has issues with the long double type. This is due to conflicts between GCC's implementation of long double and Microsoft's C library. See also this question.
According to this question, defining __USE_MINGW_ANSI_STDIO may solve this. If not, using double instead will work.
In (long double)(factorial(n)*(2*n+1)), the multiplications are integer multiplications, and the first one can overflow if the result of factorial is already close to the limit of the integer type used.
Write ((long double)factorial(n))*(2*n+1) so that the first multiplication is a floating-point multiplication.
You're almost certainly overflowing your integer type. In C, signed integer overflow is technically undefined behaviour.
For a 32-bit integer (signed or unsigned), 13! overflows; for a 64-bit integer, 21! does.
Your algorithm will survive a little longer if you use a floating-point double type or an extension like unsigned __int128 (gives you, I think, up to 34!) if your compiler supports it.
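The overflow thresholds quoted above can be checked with a short sketch. The helper name max_factorial_arg is made up for this illustration. It finds the largest n for which n! still fits below a given limit (assumed to be at least 1), testing before each multiplication so that nothing actually overflows.

```c
#include <stdint.h>

/* Hypothetical helper: largest n such that n! <= limit (limit >= 1).
   The test f <= limit / (n + 1) runs before the multiplication,
   so f * (n + 1) can never exceed limit. */
int max_factorial_arg(uint64_t limit)
{
    uint64_t f = 1;   /* holds n! */
    int n = 0;        /* 0! = 1 */
    while (f <= limit / (uint64_t)(n + 1)) {
        n++;
        f *= (uint64_t)n;
    }
    return n;
}
```

max_factorial_arg(UINT32_MAX) gives 12 and max_factorial_arg(UINT64_MAX) gives 20, which matches the 13! and 21! limits mentioned above.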
Another problem that you have is that you are progressively adding terms of decreasing size to your total. That's never a good idea when working with floating point types. If you run your for loop in the reverse order then the result will be more accurate.
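One way to sidestep both the cast and the factorial overflow is to never form n! as an integer at all. The following is only a sketch under that assumption: the function name gauss_series and its terms parameter are invented for illustration, and it keeps 1/n! as a running double instead of calling the question's factorial function.

```c
/* Partial sum of  sum_{n>=0} (-1)^n / (n! * (2n+1)),
   the series for the integral of exp(-x*x) from 0 to 1.
   'terms' is the number of terms to add (hypothetical parameter). */
double gauss_series(int terms)
{
    double sum = 0.0;
    double inv_fact = 1.0;              /* 1/0! */
    for (int n = 0; n < terms; n++) {
        if (n > 0)
            inv_fact /= n;              /* 1/(n-1)! becomes 1/n! */
        double term = inv_fact / (2 * n + 1);
        sum += (n % 2 == 0) ? term : -term;
    }
    return sum;
}
```

With terms = 6 (n from 0 to 5, as in the question) this gives about 0.7467. Since the result is a plain double, it can be printed with %f, which also avoids the MinGW long double issue.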
Related
I was solving this problem on spoj http://www.spoj.com/problems/ATOMS/. I had to give the integral part of log(m / n) / log(k) as output. I had taken m, n, k as long long. When I was calculating it using long doubles, I was getting a wrong answer, but when I used float, it got accepted.
printf("%lld\n", (long long)(log(m / (long double)n) / log(k)));
This was giving a wrong answer but this:
printf("%lld\n", (long long)((float)log(m / (float)n) / (float)log(k)));
got accepted. So are there situations when float is better than double with respect to precision?
A float is never more accurate than a double since the former must be a subset of the latter, by the C standard:
6.2.5/6: "The set of values of the type float is a subset of the set of values of the type double; the set of values of the type double is a subset of the set of values of the type long double."
Note that the standard does not insist on a particular floating point representation although IEEE754 is particularly common.
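Whichever floating type is used, truncating a quotient of logarithms is fragile when the exact answer is a whole number, because the computed quotient can land just below it. Here is a sketch of an integer-only alternative, under my assumption that the task is to find the largest t with n * k^t <= m. The function name floor_log_ratio is hypothetical.

```c
#include <assert.h>

/* Hypothetical helper: largest t >= 0 with n * k^t <= m,
   i.e. the integer part of log(m / n) / log(k) when m >= n.
   Assumes m >= 0, n >= 1 and k >= 2. */
long long floor_log_ratio(long long m, long long n, long long k)
{
    assert(n >= 1 && k >= 2);
    long long t = 0;
    /* For positive integers, n * k <= m is equivalent to n <= m / k,
       and the division form cannot overflow. */
    while (n <= m / k) {
        n *= k;
        t++;
    }
    return t;
}
```

For example, floor_log_ratio(1000, 1, 10) is 3 and floor_log_ratio(999, 1, 10) is 2, with no rounding involved.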
It might be better in some cases in terms of calculation time/space performance. One example that is just on the table in front of me - an ARM Cortex-M4F based microcontroller, having a hardware Floating Point Unit (FPU), capable of working with single-precision arithmetic, but not with double precision, which is giving an incredible boost to floating point calculations.
Try this simple code :
#include<stdio.h>
int main(void)
{
float i=3.3;
if(i==3.3)
printf("Equal\n");
else
printf("Not Equal\n");
return 0;
}
Now try the same with double as a datatype of i.
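What the snippet demonstrates: 3.3 has no exact binary representation, so the float nearest to it and the double nearest to it are different values, and the comparison promotes the float to double before testing equality. A small sketch (the function name is invented here) that makes the conversion explicit:

```c
/* Returns 1 if storing d in a float and widening it back
   gives the same value, 0 if the float had to round. */
int survives_float(double d)
{
    float f = (float)d;        /* rounds to the nearest float */
    return (double)f == d;     /* compared in double, as i == 3.3 is */
}
```

survives_float(3.3) is 0, while survives_float(0.5) is 1, since 0.5 is a power of two and is exact in both types.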
double will always give you more precision than a float.
With double, you encode the number using 64 bits, while you're using only 32 bits with float.
Edit: As Jens mentioned it may not be the case. double will give more precision only if the compiler is using IEEE-754. That's the case of GCC, Clang and MSVC. I haven't yet encountered a compiler which didn't use 32 bits for floats and 64 bits for doubles though...
I'm using Code::Blocks and it gives a different output from other compilers, and I can't find a solution. What's the undefined behaviour in this program, and is there any way to avoid it?
This is the code to print the nth number in a number system with only 3 & 4.
#include<stdio.h>
#include<math.h>
int main(void)
{
int n,i,value;
scanf("%d",&n);
value=i=0;
while(n>0)
{
if((n%2)==0)
{
value+=4*pow(10,i);
}
else
{
value+=3*pow(10,i);
}
n=(n-1)/2;
i=i+1;
}
printf("\nThe number is : %d",value);
}
It works fine for numbers up to 6, but the output for numbers greater than 6 is one less than it should be. E.g. for n=7 the output is 332 where it should be 333.
EDIT : Provided the full code with braces.
You're using the function pow(), which has the signature
double pow(double x, double y);
and doing the calculations as int. Could this be a rounding/truncation error?
There is no undefined behavior in this code. i=i+1; is well-defined behavior, not to be confused with i=i++; which gives undefined behavior.
The only thing that could cause different outputs here would be floating point inaccuracy.
Try value += 4 * (int)nearbyint(pow(10,i)); and see if it makes any difference.
Seems that floating point gets truncated.
It sounds like a compiler bug.
You are computing the result as value+=3*pow(10,i); but what this actually translates to is value+= (int)(3*pow(10,i));
One of two things might be wrong here:
pow(10,0)!=1.0
cast to int is truncating the result incorrectly.
To debug it easily, just try printing the partial results and see where the problem is.
The problem here is most likely that the pow function on this particular platform performs its computations by taking the log of the argument (perhaps natural log; perhaps log base 2), multiplying by the exponent, and then raising the base of the first logarithm to the power of that product. Performing such an operation with infinite-precision numbers would yield a mathematically-correct result, as would performing the operation on extended-precision numbers and returning a double result. My guess would be that the pow function used on this implementation may have been written for a platform which could perform the intermediate computations using extended-precision numbers, and which would consequently return correct double-precision values, but it is being run on a platform which lacks an extended-precision type. As a consequence of this, pow(10,3) may be returning something like 999.9999999997, and coercing that to int yields 999 rather than 1000.
If you're trying to get an integer-type result, there's really no reason to compute the power as a floating-point value. Rather than computing 10^i within the loop, it would be better to have a variable that's initialized to 1 and gets multiplied by 10 each time through the loop.
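A minimal sketch of that suggestion, keeping the question's loop but replacing pow(10,i) with an integer place value. The function name nth_34_number is made up here, and the sketch assumes n is at most 1022 so that the result fits in a 32-bit int.

```c
/* n-th number (n >= 1) in the system whose only digits are 3 and 4:
   3, 4, 33, 34, 43, 44, 333, ...
   Assumes n <= 1022 so the result fits in a 32-bit int. */
int nth_34_number(int n)
{
    int value = 0;
    int place = 1;                 /* 10^i, kept as an integer */
    while (n > 0) {
        value += ((n % 2 == 0) ? 4 : 3) * place;
        n = (n - 1) / 2;
        place *= 10;               /* next decimal position */
    }
    return value;
}
```

nth_34_number(7) returns 333, the value the question expected, and no floating-point function is involved.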
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
int main()
{
int n,i,ele;
n=5;
ele=pow(n,2);
printf("%d",ele);
return 0;
}
The output is 24.
I'm using GNU/GCC in Code::Blocks.
What is happening?
I know the pow function returns a double, but 25 fits in an int, so why does this code print 24 instead of 25? With n=4, n=6, n=3 or n=2 the code works, but with 5 it doesn't.
Here is what may be happening here. You should be able to confirm this by looking at your compiler's implementation of the pow function:
Assuming you have the correct #include's, (all the previous answers and comments about this are correct -- don't take the #include files for granted), the prototype for the standard pow function is this:
double pow(double, double);
and you're calling pow like this:
pow(5,2);
The pow function goes through an algorithm (probably using logarithms), thus uses floating point functions and values to compute the power value.
The pow function does not go through a naive "multiply the value of x a total of n times", since it has to also compute pow using fractional exponents, and you can't compute fractional powers that way.
So more than likely, the computation of pow using the parameters 5 and 2 resulted in a slight rounding error. When you assigned to an int, you truncated the fractional value, thus yielding 24.
If you are using integers, you might as well write your own "intpow" or similar function that simply multiplies the value the requisite number of times. The benefits of this are:
You won't get into the situation where you may get subtle rounding errors using pow.
Your intpow function will more than likely run faster than an equivalent call to pow.
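Such an "intpow" could look like the sketch below. The name comes from the suggestion above, but the signature is my own assumption. It handles non-negative exponents only and does not detect overflow.

```c
#include <assert.h>

/* Hypothetical integer power: base^exp by repeated multiplication.
   Assumes exp >= 0 and that the result fits in long long;
   overflow is not detected. */
long long intpow(long long base, int exp)
{
    assert(exp >= 0);
    long long result = 1;
    while (exp-- > 0)
        result *= base;
    return result;
}
```

With this, ele = (int)intpow(n, 2); yields 25 for n = 5, since only exact integer multiplications are performed.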
You want int result from a function meant for doubles.
You should perhaps use
ele=(int)(0.5 + pow(n,2));
/* (int) casts; adding 0.5 first rounds to nearest (for non-negative results) */
Floating-point arithmetic is not exact.
Although small values can be added and subtracted exactly, the pow() function normally works by multiplying logarithms, so even if the inputs are both exact, the result is not. Assigning to int always truncates, so if the inexactness is negative, you'll get 24 rather than 25.
The moral of this story is to use integer operations on integers, and be suspicious of <math.h> functions when the actual arguments are to be promoted or truncated. It's unfortunate that GCC doesn't warn unless you add -Wfloat-conversion (it's not in -Wall -Wextra, probably because there are many cases where such conversion is anticipated and wanted).
For integer powers, it's always safer and faster to use multiplication (division if negative) rather than pow() - reserve the latter for where it's needed! Do be aware of the risk of overflow, though.
When you use pow with variables, its result is double. Assigning to an int truncates it.
So you can avoid this error by assigning the result of pow to a double or float variable.
So basically, pow(x, y) translates to exp(log(x) * y), which produces a result that isn't precisely x^y, just a near approximation as a floating-point value. So, for example, 5^2 may come out as 24.9999996 or 25.00002.
I recently came across a C code (working by the way) where I found
freq_xtal = ((622.08E6 * vcxo_reg_val->hiv * vcxo_reg_val->n1)/(temp_rfreq));
From my intuition it seems that 622.08E6 should mean 622.08 × 10^6. According to this question, this assumption is correct.
So I tried replacing 622.08e6 with
uint32_t default_freq = 622080000;
For some reason this doesn't seem to work.
Any thoughts or suggestions appreciated.
The problem you are having (and I'm speculating here because I don't have the rest of your code) appears to be that replacing the floating-point constant with an integer caused the multiplication and division to be done in integer arithmetic rather than floating point. As a result, you now compute the wrong value.
Try type casting your uint32_t to a double and see if that clears it up.
The problem is due to overflow!
The original expression (622.08E6 * vcxo_reg_val->hiv * vcxo_reg_val->n1)/temp_rfreq (you have too many unnecessary parentheses though) is done in double precision because 622.08E6 is a double literal. That results in a floating-point value.
However, if you replace the literal with 622080000, then the whole expression will be done in integer math if all the variables are integers. More importantly, integer math will overflow (or at least much sooner than floating-point math would).
Notice that UINT32_MAX / 622080000.0 ≈ 6.9. That means multiplying the constant by just 7 already overflows a 32-bit integer, and in the code you multiply 622080000 by two other values whose product may well be above 6. You should add the ULL suffix to do the math in unsigned long long:
freq_xtal = (622080000ULL * vcxo_reg_val->hiv * vcxo_reg_val->n1)/temp_rfreq;
or change the variable to uint64_t default_freq = 622080000ULL;
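To see the difference in isolation, here is a sketch that computes only the numerator both ways. The parameter types are an assumption (the question does not show how hiv and n1 are declared), and the function names are invented for this illustration.

```c
#include <stdint.h>

/* 32-bit arithmetic: the product wraps modulo 2^32.
   hiv and n1 are assumed to be 32-bit unsigned values. */
uint32_t scaled_u32(uint32_t hiv, uint32_t n1)
{
    uint32_t default_freq = 622080000u;
    return (uint32_t)(default_freq * hiv * n1);
}

/* 64-bit arithmetic: the ULL constant widens the whole product. */
uint64_t scaled_u64(uint32_t hiv, uint32_t n1)
{
    return 622080000ULL * hiv * n1;
}
```

With hiv * n1 equal to 7, scaled_u64 returns 4354560000, while scaled_u32 returns 59592704, which is that value reduced modulo 2^32.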
I have this code
#include <stdio.h>
#include <math.h>
static double const x = 665857;
static double const y = 470832;
int main(){
double z = x*x*x*x -y*y*y*y*4 - y*y*4;
printf("%f \n",z);
return 0;
}
The true value of this expression is 1. As already answered in a previous question of mine, this code fails because of catastrophic cancellation. However, now I've found an even stranger thing. It works if you use long longs, while, as far as I know, they have less range than doubles. Why?
long long has less range, but more precision than double.
However, that's not what's at work here. Your computation actually exceeds the range of long long as well, but because of the way in which integer overflow is handled on your system, the correct result falls out anyway. (Note that the behavior of signed integer overflow is not pinned down by the C standard, but "usually" behaves as you see here).
If you look instead at the intermediate result x*x*x*x, you will see that if you compute it using double, it has a sensible value; not exact, but rounded and good enough for most purposes. However, if you compute it in long long, you will find a number that appears at first to be absolutely bonkers, due to overflow.
In a double there are bits for the mantissa and the exponent. For large doubles, the distance between two adjacent doubles (same exponent, mantissa differing by 1) is much larger than 1. Hence you are in the same situation as infinity + 1 = infinity.
long longs will overflow and effectively calculate modulo 2^n (where n is the width of the type in bits), and hence the result, when it should be 1, can indeed be 1.
An overflow for a floating-point type can either be considered undefined or an error, depending on language and environment. An overflow for an integral type typically wraps around (sometimes still yielding correct results, and sometimes not); in C this is guaranteed only for unsigned types, while signed overflow is undefined behaviour.
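The modular behaviour can be reproduced without relying on signed overflow. The sketch below is my own variation on the question's code: it uses uint64_t instead of long long, because unsigned arithmetic in C is defined to wrap modulo 2^64. The function name is hypothetical.

```c
#include <stdint.h>

/* x^4 - 4y^4 - 4y^2 evaluated in unsigned 64-bit arithmetic,
   where every operation is reduced modulo 2^64 by definition. */
uint64_t quartic_mod64(uint64_t x, uint64_t y)
{
    uint64_t x4 = x * x * x * x;   /* wraps when the true value exceeds 2^64 */
    uint64_t y4 = y * y * y * y;   /* may wrap as well */
    return x4 - 4 * y4 - 4 * y * y;
}
```

quartic_mod64(665857, 470832) returns 1. The intermediate values x4 and y4 are wrapped, but the final difference is correct modulo 2^64, and since the true result 1 lies in the representable range, it comes out exactly.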