Taylor series of function e^x

Given a number x, you need to calculate the sum of the Taylor series of e^x.
e^x = 1 + x + x^2/2! + x^3/3! + ...
Keep summing until the general term is less than or equal to 10^(-9) in absolute value.
Below is my solution, but it gives wrong results for x < 0. Do you have any idea how to fix it so that it works for negative numbers?
int x,i,n;
long long fact; //fact needs to be double
double sum=0,k=1;
scanf("%d",&x);
i=0; sum=0; k=1;
while (fabs(k)>=1.0E-9) {
fact=1;
for (int j=1;j<=i;++j)
fact*=j;
k=pow(x,i)/fact;
sum+=k;
++i;
}
printf("%lf\n",sum);

You should not use the pow function for raising a (possibly negative) number to an integer power. Instead use repeated multiplication as you do to compute the factorial.
Notice also that you could store the last computed values of $n!$ and $x^n$ to obtain $(n+1)!$ and $x^{n+1}$ with a single multiplication each.
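A minimal sketch of that idea, assuming the question's 1e-9 threshold; the helper name exp_taylor is illustrative and not from the original post. Each term is derived from the previous one by multiplying by x/i, so neither pow nor an explicit factorial is needed:

```c
#include <math.h>

/* Sum the Taylor series of e^x until the term drops to 1e-9 or below.
   exp_taylor is a hypothetical name used only for this sketch. */
double exp_taylor(double x)
{
    double term = 1.0;  /* x^0 / 0! */
    double sum = 0.0;
    for (int i = 1; fabs(term) > 1.0e-9; ++i) {
        sum += term;
        term *= x / i;  /* x^i / i! obtained from x^(i-1) / (i-1)! */
    }
    return sum;
}
```

Because the running term stays a double, nothing overflows the way an integer factorial does. For strongly negative x the alternating terms cancel heavily, so computing 1.0 / exp_taylor(-x) may be the more accurate choice.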

Your problem is that your factorial computation overflows and becomes garbage.
After that, your i-th term no longer decreases and the loop produces completely wrong results.
Beyond n = 20, a 64-bit integer can no longer hold n!: 21! already exceeds 2^64. See: http://www.wolframalpha.com/input/?i=21%21%2F2%5E64
If x^n/n! has not dropped below your threshold (1e-9) by n=20, your computation of n! will overflow even a 64-bit integer. When that happens you typically get the value of n! modulo 2^64 (I am simplifying: because you used a signed type you can get seemingly random negative values instead, and signed overflow is formally undefined behavior in C, but the principle remains). These values may be very small instead of very large, which makes your x^n/n! grow instead of shrink.
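The limit can be checked directly. The sketch below (the function name is an assumption made for illustration) finds the largest n whose factorial fits in an unsigned 64-bit integer, testing before each multiplication so that nothing actually overflows:

```c
#include <stdint.h>

/* Largest n for which n! still fits in a uint64_t.
   The bound is checked before multiplying, so no overflow occurs. */
int max_factorial_arg_u64(void)
{
    uint64_t fact = 1;  /* holds n! */
    int n = 1;
    while (fact <= UINT64_MAX / (uint64_t)(n + 1)) {
        ++n;
        fact *= (uint64_t)n;
    }
    return n;
}
```

The same pre-multiplication test can be used in any loop that accumulates an integer product.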

fact needs to be a double; it cannot be a long long because of the divisions.

Related

Binomial Coefficients rounding error

I have to calculate in c binomial coefficients of the expression
(x+y)**n, with n very large (order of 500-1000). The first algo to calculate binomial coefficients that came to my mind was multiplicative formula. So I coded it into my program as
long double binomial(int k, int m)
{
int i,j;
long double num=1, den=1;
j=m<(k-m)?m:(k-m);
for(i=1;i<=j;i++)
{
num*=(k+1-i);
den*=i;
}
return num/den;
}
This code is really fast on a single thread compared, for example, to the recursive formula, although the latter is less subject to rounding errors since it involves only sums and no divisions.
So I wanted to test these algorithms for large values and tried to evaluate 500 choose 250 (order 10^160). I found that the "relative error" is less than 10^(-19), so basically they are the same number, although they differ by something like 10^141.
So I'm wondering: is there a way to estimate the order of the error of the calculation? And is there some fast way to calculate binomial coefficients that is more precise than the multiplicative formula? Since I don't know the precision of my algorithm, I don't know where to truncate Stirling's series to get better results.
I've googled for tables of binomial coefficients so I could copy from those, but the best one I've found stops at n=100.

If you're just computing individual binomial coefficients C(n,k) with n fairly large but no larger than about 1750, then your best bet with a decent C library is to use the tgammal standard library function:
tgammal(n+1) / (tgammal(n-k+1) * tgammal(k+1))
Tested with the GNU implementation of libm, that consistently produced results within a few ULP of the precise value, and generally better than solutions based on multiplying and dividing.
If k is small (or large) enough that the binomial coefficient does not overflow 64 bits of precision, then you can get a precise result by alternately multiplying and dividing.
If n is so large that tgammal(n+1) exceeds the range of a long double (more than 1754) but not so large that the numerator overflows, then a multiplicative solution is the best you can get without a bignum library. However, you could also use
expl(lgammal(n+1) - lgammal(n-k+1) - lgammal(k+1))
which is less precise but easier to code. (Also, if the logarithm of the coefficient is useful to you, the above formula will work over quite a large range of n and k. Not having to use expl will improve the accuracy.)
If you need a range of binomial coefficients with the same value of n, then your best bet is iterative addition:
void binoms(unsigned n, long double* res) {
// res must have (n+3)/2 elements
res[0] = 1;
for (unsigned i = 2, half = 0; i <= n; ++i) {
res[half + 1] = res[half] * 2;
for (int k = half; k > 0; --k)
res[k] += res[k-1];
if (i % 2 == 0)
++half;
}
}
The above produces only the coefficients with k from 0 to n/2. It has a slightly larger round-off error than the multiplicative algorithm (at least when k is getting close to n/2), but it's a lot quicker if you need all the coefficients and it has a larger range of acceptable inputs.

To get exact integer results for small k and m, a better solution might be (a slight variation of your code) :
unsigned long binomial(int k, int m)
{
int i,j; unsigned long num=1;
j=m<(k-m)?m:(k-m);
for(i=1;i<=j;i++)
{
num*=(k+1-i);
num/=i;
}
return num;
}
After each division num/=i, num holds a binomial coefficient, which is an integer, so the division never truncates. For approximate results with bigger k and m, your solution might be good enough. But beware that long double multiplication is much slower than multiplication and division of integers (unsigned long or size_t). If you want exact results for bigger numbers, you will probably need to write a big-integer class or include one from a library. You can also search for fast factorial algorithms for extremely large n; that may help with combinatorics, too. Stirling's formula is a good approximation for ln(n!) when n is large. It all depends on how accurate you want to be.

If you really want to use the multiplicative formula, I would recommend an exception-based approach:
1. Implement the formula with large integers (long long, for example).
2. Attempt division operations as soon as possible (as suggested by Zhuoran).
3. Add code to check the correctness of every division and multiplication.
4. Resolve incorrect divisions or multiplications, e.g.:
   - try the division in the loop proposed by Zhuoran, but if it fails, fall back to the initial algorithm (accumulating the product of divisors in den);
   - store the unresolved multipliers and divisors in additional long integers and try to resolve them in later loop iterations.
If you really use large numbers, your result might not fit in a long integer. In that case you can switch to long double or use your own LongInteger storage.
This is a skeleton code, to give you an idea:
long long binomial_l(int k, int m)
{
int i,j;
long long num=1, den=1;
j=m<(k-m)?m:(k-m);
for(i=1;i<=j;i++)
{
int multiplier=(k+1-i);
int divisor=i;
long long candidate_num=num*multiplier;
//check multiplication
if((candidate_num/multiplier)!=num)
{
//resolve exception...
}
else
{
num=candidate_num;
}
candidate_num=num/divisor;
//check division
if((candidate_num*divisor)==num)
{
num=candidate_num;
}
else
{
//resolve exception
den*=divisor;
//this multiplication should also be checked...
}
}
long long candidate_result= num/den;
if((candidate_result*den)==num)
{
return candidate_result;
}
// you should not get here if all exceptions are resolved
return 0;
}

This may not be what OP is looking for, but one can analytically approximate nCr for large n with binary entropy function. It is mentioned in
Page 10 of http://www.inference.phy.cam.ac.uk/mackay/itprnn/ps/5.16.pdf
https://math.stackexchange.com/questions/835017/using-binary-entropy-function-to-approximate-logn-choose-k

Upper bound for number of digits of big integer in different base

I want to create a big integer from string representation and to do that efficiently I need an upper bound on the number of digits in the target base to avoid reallocating memory.
Example:
A 640 bit number has 640 digits in base 2, but only ten digits in base 2^64, so I will have to allocate ten 64 bit integers to hold the result.
The function I am currently using is:
int get_num_digits_in_different_base(int n_digits, double src_base, double dst_base){
return ceil(n_digits*log(src_base)/log(dst_base));
}
Where src_base is in {2, ..., 10 + 26} and dst_base is in {2^8, 2^16, 2^32, 2^64}.
I am not sure if the result will always be correctly rounded though. log2 would be easier to reason about, but I read that older versions of Microsoft Visual C++ do not support that function. It could be emulated like log2(x) = log(x)/log(2) but now I am back where I started.
GMP probably implements a function to do base conversion, but I may not read the source or else I might get GPL cancer so I can not do that.

I imagine speed is of some concern, or else you could just try the floating point-based estimate and adjust if it turned out to be too small. In that case, one can sacrifice tightness of the estimate for speed.
In the following, let dst_base be 2^w, src_base be b, and n_digits be n.
Let k(b,w) = max{j | b^j < 2^w}. This is the largest exponent j for which b^j is guaranteed to fit in a w-bit (non-negative) binary integer. Because there are relatively few source and destination bases, these values can be precomputed and looked up in a table; mathematically, k(b,w) = [w log 2 / log b], where [.] denotes the integer part. (When b is itself a power of two, this formula can return a j with b^j = 2^w; that is still sufficient here, because every j-digit number in base b is at most b^j - 1.)
For a given n, let m = ceil(n / k(b,w)), so that n ≤ m · k(b,w). Then the maximum number of dst_base digits required to hold a number less than b^n is:
ceil(log(b^n - 1) / log(2^w)) ≤ ceil(log(b^n) / log(2^w))
≤ ceil(m · log(b^k(b,w)) / log(2^w)) ≤ m.
In short, if you precompute the k(b,w) values, you can quickly get an upper bound (which is not tight!) by dividing n by k and rounding up.
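A small integer-only sketch of this bound, assuming 2 ≤ b < 2^w and 1 ≤ w ≤ 64. The function names are illustrative, and k is computed here by repeated multiplication with the strict definition b^j < 2^w rather than read from a table:

```c
#include <stdint.h>

/* k(b, w): largest j with b^j < 2^w, assuming 2 <= b < 2^w and 1 <= w <= 64. */
unsigned max_power_fitting(unsigned b, unsigned w)
{
    uint64_t limit = (w >= 64) ? UINT64_MAX : (((uint64_t)1 << w) - 1);
    uint64_t p = 1;  /* holds b^k */
    unsigned k = 0;
    while (p <= limit / b) {  /* same as p * b <= limit, without overflow */
        p *= b;
        ++k;
    }
    return k;
}

/* Upper bound on the base-2^w digits needed for n base-b digits: ceil(n / k). */
unsigned digits_upper_bound(unsigned n, unsigned b, unsigned w)
{
    unsigned k = max_power_fitting(b, w);
    return (n + k - 1) / k;
}
```

For example, 640 binary digits with w = 64 give k = 63 and a bound of 11 words, one more than the 10 actually needed, which shows that the bound is safe but not tight.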

I'm not sure about floating-point rounding in this case, but it is relatively easy to implement this using only integers, as log2 is a classic bit-manipulation pattern and integer division can easily be rounded up. The following code is equivalent to yours, but uses integers:
// Returns log2(x) rounded up using bit manipulation (not most efficient way)
unsigned int log2(unsigned int x)
{
unsigned int y = 0;
--x;
while (x) {
y++;
x >>= 1;
}
return y;
}
// Returns ceil(a/b) using integer division
unsigned int roundup(unsigned int a, unsigned int b)
{
return (a + b - 1) / b;
}
unsigned int get_num_digits_in_different_base(unsigned int n_digits, unsigned int src_base, unsigned int log2_dst_base)
{
return roundup(n_digits * log2(src_base), log2_dst_base);
}
Please note that:
- This function returns different results from yours! However, in every case I looked at, both were still correct (the smaller value was more accurate, but your requirement is just an upper bound).
- The integer version I wrote receives log2_dst_base instead of dst_base to avoid overflow for 2^64.
- log2 can be made more efficient using lookup tables.
- I've used unsigned int instead of int.

accuracy of sqrt of integers

I have a loop like this:
for(uint64_t i=0; i*i<n; i++) {
This requires doing a multiplication every iteration. If I could calculate the sqrt before the loop then I could avoid this.
unsigned cut = sqrt(n);
for(uint64_t i=0; i<cut; i++) {
In my case it's okay if the sqrt function rounds up to the next integer but it's not okay if it rounds down.
My question is: is the sqrt function accurate enough to do this for all cases?
Edit: Let me list some cases. If n is a perfect square so that n = y^2 my question would be - is cut=sqrt(n)>=y for all n? If cut=y-1 then there is a problem. E.g. if n = 120 and cut = 10 it's okay but if n=121 (11^2) and cut is still 10 then it won't work.
My first concern was the fractional part of float only has 23 bits and double 52 so they can't store all the digits of some 32-bit or 64-bit integers. However, I don't think this is a problem. Let's assume we want the sqrt of some number y but we can't store all the digits of y. If we let the fraction of y we can store be x we can write y = x + dx then we want to make sure that whatever dx we choose does not move us to the next integer.
sqrt(x+dx) < sqrt(x) + 1 //solve
dx < 2*sqrt(x) + 1
// e.g for x = 100 dx < 21
// sqrt(100+20) < sqrt(100) + 1
Float can store 23 bits so we let y = 2^23 + 2^9. This is more than sufficient since 2^9 < 2*sqrt(2^23) + 1. It's easy to show this for double as well with 64-bit integers. So although they can't store all the digits, as long as the sqrt of what they can store is accurate, the sqrt(fraction) should be sufficient. Now let's look at what happens for integers close to UINT_MAX and the sqrt:
unsigned xi = -1-1;
printf("%u %u\n", xi, (unsigned)(float)xi); //4294967294 4294967295
printf("%u %u\n", (unsigned)sqrt(xi), (unsigned)sqrtf(xi)); //65535 65536
Since float can't store all the digits of 2^32-2 and double can, they get different results for the sqrt. But the float version of the sqrt is one integer larger. This is what I want. For 64-bit integers, as long as the sqrt of the double always rounds up, it's okay.

First, integer multiplication is really quite cheap. So long as you have more than a few cycles of work per loop iteration and one spare execute slot, it should be entirely hidden by reorder on most non-tiny processors.
If you did have a processor with dramatically slow integer multiply, a truly clever compiler might transform your loop to:
for (uint64_t i = 0, j = 0; j < n; j += 2*i+1, i++)
replacing the multiply with an lea or a shift and two adds.
Those notes aside, let’s look at your question as stated. No, you can’t just use i < sqrt(n). Counter-example: n = 0x20000000000000. Assuming adherence to IEEE-754, you will have cut = 0x5a82799, and cut*cut is 0x1ffffff8eff971.
However, a basic floating-point error analysis shows that the error in computing sqrt(n) (before conversion to integer) is bounded by 3/4 of an ULP. So you can safely use:
uint32_t cut = sqrt(n) + 1;
and you’ll perform at most one extra loop iteration, which is probably acceptable. If you want to be totally precise, instead use:
uint32_t cut = sqrt(n);
cut += (uint64_t)cut*cut < n;
Edit: z boson clarifies that for his purposes, this only matters when n is an exact square (otherwise, getting a value of cut that is “too small by one” is acceptable). In that case, there is no need for the adjustment and one can safely just use:
uint32_t cut = sqrt(n);
Why is this true? It’s pretty simple to see, actually. Converting n to double introduces a perturbation:
double_n = n*(1 + e)
which satisfies |e| < 2^-53. The mathematical square root of this value can be expanded as follows:
square_root(double_n) = square_root(n)*square_root(1+e)
Now, since n is assumed to be a perfect square with at most 64 bits, square_root(n) is an exact integer with at most 32 bits, and is the mathematically precise value that we hope to compute. To analyze the square_root(1+e) term, use a Taylor series about 1:
square_root(1+e) = 1 + e/2 + O(e^2)
= 1 + d with |d| <~ 2^-54
Thus, the mathematically exact value square_root(double_n) is less than half an ULP away from[1] the desired exact answer, and necessarily rounds to that value.
[1] I’m being fast and loose here in my abuse of relative error estimates, where the relative size of an ULP actually varies across a binade — I’m trying to give a bit of the flavor of the proof without getting too bogged down in details. This can all be made perfectly rigorous, it just gets to be a bit wordy for Stack Overflow.

My whole answer is useless if you have access to IEEE 754 double-precision floating point, since Stephen Canon demonstrated both:
- a simple way to avoid imul in the loop
- a simple way to compute the ceiling sqrt
Otherwise, if for some reason you have a non IEEE 754 compliant platform, or only single precision, you could get the integer part of square root with a simple Newton-Raphson loop. For example in Squeak Smalltalk we have this method in Integer:
sqrtFloor
"Return the integer part of the square root of self"
| guess delta |
guess := 1 bitShift: (self highBit + 1) // 2.
[
delta := (guess squared - self) // (guess + guess).
delta = 0 ] whileFalse: [
guess := guess - delta ].
^guess - 1
Where // is operator for quotient of integer division.
Final guard guess*guess <= self ifTrue: [^guess]. can be avoided if initial guess is fed in excess of exact solution as is the case here.
Initializing with approximate float sqrt was not an option because integers are arbitrarily large and might overflow
But here, you could seed the initial guess with floating point sqrt approximation, and my bet is that the exact solution will be found in very few loops. In C that would be:
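uint32_t sqrtFloor(uint64_t n)
{
    int64_t diff;
    int64_t delta;
    uint64_t guess = (uint64_t)sqrt((double)n); /* floating-point seed */
    if (guess == 0)
        return 0; /* n == 0: avoid dividing by zero below */
    /* Cast the divisor so the division is signed; diff can be negative. */
    while ((delta = (diff = (int64_t)(guess*guess - n)) / (int64_t)(guess + guess)) != 0)
        guess -= delta;
    return (uint32_t)(guess - (diff > 0));
}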
That's a few integer multiplications and divisions, but outside the main loop.

What you are looking for is a way to calculate a rational upper bound on the square root of a natural number. A continued fraction is what you need; see Wikipedia.
For x > 0, there is
$\sqrt{x} = 1 + \cfrac{x-1}{2 + \cfrac{x-1}{2 + \cfrac{x-1}{2 + \cdots}}}$.
To make the notation more compact, the above formula can be rewritten in a shorter form [formula not preserved].
Truncating the continued fraction by removing the tail term (x-1)/2's at each recursion depth gives a sequence of approximations of sqrt(x) [list not preserved].
Upper bounds appear at the odd-numbered lines and get tighter. When the distance between an upper bound and its neighboring lower bound is less than 1, that approximation is what you need. Using that value as the value of cut (here cut must be a floating-point number) solves the problem.
For very large numbers, rational numbers should be used, so that no precision is lost during conversion between integers and floating-point numbers.

C Program to sum a simple series

I have a program here that is supposed to sum up the series
1+1/2+1/3+1/4... etc
The only user entry is to enter how many times you want this sum to run for.
However, I keep getting the sum one.
#include <stdio.h>
int main(void)
{
int b,x; /* b is number of times program runs and x is the count*/
float sum;
printf("Enter the number of times you want series to run.\n");
scanf("%d", &b);
printf("x sum\n");
for(x=1,sum=0;x<b+1;x++)
{
printf("%d %9.3f\n",x, (sum +=(float)(1/x)));
}
return 0;
}
I don't quite get why it isn't working. As you can see, I did tell it to print x and when it did, x was incrementing correctly.The sum just kept adding up to one.

You have misplaced parentheses so you're doing integer division for 1/x and getting 0 for any value of x > 1.
I suggest you change:
printf("%d %9.3f\n",x, (sum +=(float)(1/x)));
to:
printf("%d %9.3f\n",x, (sum += 1.0f/x));

Two problems: one dull, one interesting.
1) 1 / x will be imprecise since 1 and x are both integral types and so the computation will be done in integer arithmetic. All the cast does is convert the resultant integral type to floating point. To resolve this, write 1.0 / x. Then 'x' is promoted to floating point prior to the division.
2) You should reverse the order of the for loop:
sum = 0.0;
for(x = b; x >= 1; --x)
(I've also moved the initialisation of sum out of the for loop, as sum = 0 is an expression of type float but x = b is an expression of type int, so you ought not use the comma operator as they have different data types.)
The reason is subtle: you should only add floating points of similar magnitude. Doing the loop my way means the smaller values are added first.
The effect will be noticeable for high values of b; try it. Your original way will always understate the sum.
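A small sketch to try this out, with both orders in single precision and a double-precision sum for comparison. The function names are made up for this example, and the exact figures depend on the platform's float arithmetic:

```c
/* Harmonic sum 1 + 1/2 + ... + 1/b in float, largest terms first. */
float harmonic_forward(int b)
{
    float sum = 0.0f;
    for (int x = 1; x <= b; ++x)
        sum += 1.0f / (float)x;
    return sum;
}

/* Same sum in float, smallest terms first. */
float harmonic_backward(int b)
{
    float sum = 0.0f;
    for (int x = b; x >= 1; --x)
        sum += 1.0f / (float)x;
    return sum;
}

/* Double-precision sum used as a reference value. */
double harmonic_reference(int b)
{
    double sum = 0.0;
    for (int x = b; x >= 1; --x)
        sum += 1.0 / (double)x;
    return sum;
}
```

With b around ten million, the forward float sum stops growing once the terms become smaller than half a unit in the last place of the running total, while the backward sum stays close to the double reference.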

The problem is integer division when you do 1/x, which always results in 0 as long as x is greater than 1. Even if you later convert this to a float, the "damage" is already done. An easy fix would be to change the division to 1.0f/x.

Since you have declared x as an int, (1/x) returns 1 when x is 1 and 0 for x>1. So, sum remains 1. So you get the same result.
So, change (1/x) to 1.0f/x, so that the result is returned as a float

Here you are computing 1/x, in which the fractional value is truncated. Converting it to float after the original value has been truncated doesn't make sense.
So change this:
printf("%d %9.3f\n",x, (sum +=(float)(1/x)));
to
printf("%d %9.3f\n",x, (sum += 1.0f/x));

The expression (1/x) will always be integer division. For the first run this will be 1/1 giving you 1. However, next time round it will be 1/2 which is 0. Basically for 1/x where x>1 the answer will be zero.
To get around this write the expression as 1.0/x which will cause x to be promoted to a double, giving you double division.

What is the time complexity of this multiplication algorithm?

For the classic interview question "How do you perform integer multiplication without the multiplication operator?", the easiest answer is, of course, the following linear-time algorithm in C:
int mult(int multiplicand, int multiplier)
{
for (int i = 1; i < multiplier; i++)
{
multiplicand += multiplicand;
}
return multiplicand;
}
Of course, there is a faster algorithm. If we take advantage of the property that bit shifting to the left is equivalent to multiplying by 2 to the power of the number of bits shifted, we can bit-shift up to the nearest power of 2, and use our previous algorithm to add up from there. So, our code would now look something like this:
#include <math.h>
int log2( double n )
{
return log(n) / log(2);
}
int mult(int multiplicand, int multiplier)
{
int nearest_power = 2 ^ (floor(log2(multiplier)));
multiplicand << nearest_power;
for (int i = nearest_power; i < multiplier; i++)
{
multiplicand += multiplicand;
}
return multiplicand;
}
I'm having trouble determining what the time complexity of this algorithm is. I don't believe that O(n - 2^(floor(log2(n)))) is the correct way to express this, although (I think?) it's technically correct. Can anyone provide some insight on this?

multiplier - nearest_power can be as large as half of multiplier, and as it tends towards infinity the constant 0.5 there doesn't matter (not to mention we get rid of constants in Big O). The loop is therefore O(multiplier). I'm not sure about the bit-shifting.
Edit: I took more of a look around on the bit-shifting. As gbulmer says, it can be O(n), where n is the number of bits shifted. However, it can also be O(1) on certain architectures. See: Is bit shifting O(1) or O(n)?
However, it doesn't matter in this case! n > log2(n) for all valid n. So we have O(n) + O(multiplier) which is a subset of O(2*multiplier) due to the aforementioned relationship, and thus the whole algorithm is O(multiplier).

The point of finding the nearest power is so that your function runtime could get close to runtime O(1). This happens when 2^nearest_power is very close to the result of your addition.
Behind the scenes the whole "to the power of 2" is done with bit shifting.
So, to answer your question, the second version of your code is still worst-case linear time: O(multiplier).
Your answer, O(n - 2^(floor(log2(n)))), is also not incorrect; it's just very precise and might be hard to do in your head quickly to find the bounds.

Edit
Let's look at the second posted algorithm, starting with:
int nearest_power = 2 ^ (floor(log2(multiplier)));
I believe calculating log2 is, rather pleasingly, O(log2(multiplier)).
Then nearest_power lands in the interval [multiplier/2, multiplier], whose width is multiplier/2. This is the same as finding the highest set bit of a positive number.
So the for loop is O(multiplier/2); the constant 1/2 drops out, so it is O(n).
On average, it is half the interval away, which would be O(multiplier/4). But that is just the constant 1/4 times n, so it is still O(n), only with a smaller constant.
A faster algorithm.
Our intuition is that we can multiply by an n-digit number in n steps.
In binary this uses a 1-bit shift, a 1-bit test and a binary add to construct the whole answer. Each of those operations is O(1). This is long multiplication, one digit at a time.
If we use O(1) operations for n, an x-bit number, it is O(log2(n)) or O(x), where x is the number of bits in the number.
This is an O(log2(n)) algorithm:
int mult(int multiplicand, int multiplier) {
int product = 0;
while (multiplier) {
if (multiplier & 1) product += multiplicand;
multiplicand <<= 1;
multiplier >>= 1;
}
return product;
}
It is essentially how we do long multiplication.
Of course, the wise thing to do is use the smaller number as the multiplier. (I'll leave that as an exercise for the reader :-)
This only works for positive values, but by testing and remembering the signs of the input, operating on positive values, and then adjusting the sign, it works for all numbers.
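One possible way to fill in those two remarks is sketched below. The name mult_signed is hypothetical, and the sketch assumes the true product fits in an int:

```c
/* Shift-and-add multiplication for signed inputs.
   Assumes the mathematical product fits in an int. */
int mult_signed(int a, int b)
{
    int negative = (a < 0) != (b < 0);  /* sign of the result */
    unsigned ua = (a < 0) ? 0u - (unsigned)a : (unsigned)a;
    unsigned ub = (b < 0) ? 0u - (unsigned)b : (unsigned)b;
    unsigned product = 0;

    if (ub > ua) {  /* use the smaller magnitude as the multiplier */
        unsigned tmp = ua;
        ua = ub;
        ub = tmp;
    }
    while (ub) {
        if (ub & 1u)
            product += ua;  /* add the shifted multiplicand for each set bit */
        ua <<= 1;
        ub >>= 1;
    }
    return negative ? -(int)product : (int)product;
}
```

Working on unsigned magnitudes keeps the shifts well defined, and the loop runs once per bit of the smaller operand.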
