A continued fraction is a series of divisions of this kind:
depth 1: 1+1/s
depth 2: 1+1/(1+1/s)
depth 3: 1+1/(1+1/(1+1/s))
...
The depth is an integer, but s is a floating point number.
What would be an optimal algorithm (performance-wise) to calculate the result for such a fraction with large depth?
Hint: unroll each of these formulas using basic algebra. You will see a pattern emerge.
I'll show you the first steps so it becomes obvious:
f(2,s) = 1+1/s = (s+1)/s
f(3,s) = 1+1/f(2,s) = 1+(s/(s+1)) = (1*(s+1) + s)/(s+1) = (2*s + 1) / (s + 1)
/* You multiply the first "1" by the denominator */
f(4,s) = 1+1/f(3,s) = 1+(s+1)/(2s+1) = (1*(2*s+1) + (s+1))/(2*s+1) = (3*s + 2) / (2*s + 1)
f(5,s) = 1+1/f(4,s) = 1+(2s+1)/(3s+2) = (1*(3*s+2) + (2s+1))/(3*s+2) = (5*s + 3) / (3*s + 2)
...
Hint 2: if you don't see the obvious pattern emerging from the above, the optimal algorithm would involve calculating Fibonacci numbers (so you'd need to google for an efficient Fibonacci number generator).
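To make the pattern concrete, here is a minimal C sketch (illustrative only) that evaluates f(d,s) = (F(d)*s + F(d-1)) / (F(d-1)*s + F(d-2)) with F(0)=0, F(1)=1, using a plain O(d) Fibonacci loop; a fast-doubling Fibonacci generator only matters if you need exact terms, since in floating point the ratio converges after a few dozen steps.

#include <stdio.h>

double continued_fraction(unsigned depth, double s)
{
    /* F(0) = 0, F(1) = 1, F(2) = 1, ... kept in doubles; only the ratios
       matter for the final result, so we can rescale to dodge overflow. */
    double f0 = 0.0, f1 = 1.0, f2 = 1.0;   /* F(d-2), F(d-1), F(d) for d = 2 */
    unsigned d;

    for (d = 2; d < depth; ++d) {
        f0 = f1;
        f1 = f2;
        f2 = f0 + f1;
        if (f2 > 1e300) {                  /* rescale: the result is unchanged */
            f0 /= f2;
            f1 /= f2;
            f2 = 1.0;
        }
    }
    return (f2 * s + f1) / (f1 * s + f0);
}

int main(void)
{
    printf("%.15f\n", continued_fraction(5, 2.0));   /* (5*2+3)/(3*2+2) = 13/8 */
    return 0;
}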
I'd like to elaborate a bit on DVK's excellent answer. I'll stick with his notation f(d,s) to denote the sought value for depth d.
If you calculate the value f(d,s) for large d, you'll notice that the values converge as d increases.
Let φ=f(∞,s). That is, φ is the limit as d approaches infinity, and is the continued fraction fully expanded. Note that φ contains a copy of itself, so that we can write φ=1+1/φ. Multiplying both sides by φ and rearranging, we get the quadratic equation
φ² - φ - 1 = 0
which can be solved to get
φ = (1 + √5)/2.
This is the famous golden ratio.
You'll find that f(d,s) is very close to φ as d gets large.
But wait. There's more!
As DVK pointed out, the formula for f(d,s) involves terms from the Fibonacci sequence. In particular, it involves ratios of successive terms of the Fibonacci sequence. There is a closed form expression for the nth term of the sequence, namely
(φⁿ - (1-φ)ⁿ)/√5.
Since |1-φ| is less than one, (1-φ)ⁿ gets small as n gets large, so a good approximation to the nth Fibonacci term is φⁿ/√5. And getting back to DVK's formula, the ratio of successive terms in the Fibonacci sequence will tend to φⁿ⁺¹/φⁿ = φ.
So that's a second way of getting to the fact that the continued fraction in this question evaluates to φ.
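A small numerical check of both facts (illustrative C, nothing more): the Binet approximation φⁿ/√5 tracks the exact Fibonacci numbers, and the ratio of successive terms settles on φ.

#include <math.h>
#include <stdio.h>

int main(void)
{
    double phi = (1.0 + sqrt(5.0)) / 2.0;
    double prev = 1.0, cur = 1.0;            /* F(1), F(2) */
    int n;

    for (n = 3; n <= 20; ++n) {
        double next = prev + cur;
        prev = cur;
        cur = next;
        printf("F(%2d) = %5.0f   phi^n/sqrt(5) = %9.4f   ratio = %.10f\n",
               n, cur, pow(phi, n) / sqrt(5.0), cur / prev);
    }
    printf("phi    = %.10f\n", phi);
    return 0;
}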
Smells like tail recursion(recursion(recursion(...))).
(In other words - loop it!)
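A minimal sketch of what "loop it" means here (illustrative C): evaluate from the innermost term outward, one addition and one division per level, no recursion.

#include <stdio.h>

double continued_fraction_loop(unsigned depth, double s)
{
    double x = 1.0 + 1.0 / s;        /* depth 1 */
    unsigned d;

    for (d = 2; d <= depth; ++d)
        x = 1.0 + 1.0 / x;           /* wrap one more level around it */
    return x;
}

int main(void)
{
    printf("%.15f\n", continued_fraction_loop(1000, 3.7));  /* converges to ~1.618... */
    return 0;
}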
I would start with calculating 1/s, which we will call a.
Then use a for-loop, because if you use recursion in C you may experience a stack overflow.
Since this is homework I won't give much code, but if you start with a simple loop of depth 1 and keep increasing the depth until you get to 4, you'll see that you can just run it n times.
Since you are always going to need 1/s and division is expensive, doing that division just one time will help with performance.
I expect that if you work it out that you can actually find a pattern that will help you to further optimize.
You may find an article such as this: http://www.b-list.org/weblog/2006/nov/05/programming-tips-learn-optimization-strategies/, to be helpful.
I am assuming by performance-wise you mean that you want it to be fast, regardless of memory used, btw.
You may find that if you cache the values you calculate at each step, you can reuse them rather than redoing an expensive calculation.
I personally would do 4-5 steps by hand, writing out the equations and results of each step, and see if any pattern emerges.
Update:
GCC has added tail-call optimization, and I never noticed it, since I try to limit recursion heavily in C out of habit. But this answer has a nice quick explanation of the different optimizations gcc does at each optimization level.
http://answers.yahoo.com/question/index?qid=20100511111152AAVHx6s
Related
I am trying to generate a logarithmically spaced array in C.
For example, starting at 100 and ending at 500, with 40 logarithmic spaced points.
Can anyone help me? Are there any logspace() functions available?
With no further constraints, simply divide the linear interval [ln(100)..ln(500)] into as many equidistant subintervals as you need. Then take the exp() of each point.
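A minimal C sketch of that approach (the name logspace is just for this example, not a standard function):

#include <math.h>
#include <stdio.h>

/* fill out[] with n points spaced logarithmically from start to stop,
   both ends included (n must be at least 2) */
void logspace(double start, double stop, int n, double *out)
{
    double log_start = log(start);
    double step = (log(stop) - log_start) / (n - 1);
    int i;

    for (i = 0; i < n; ++i)
        out[i] = exp(log_start + i * step);
}

int main(void)
{
    double pts[40];
    int i;

    logspace(100.0, 500.0, 40, pts);
    for (i = 0; i < 40; ++i)
        printf("%f\n", pts[i]);
    return 0;
}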
Array indexing is always linear, with integer steps. So you have to map the logarithmic scale to the linear index. This can be done either by simply taking log(log_index), or by a table of ranges and a linear search in it. For log(), there might be approximations which suit your needs better and are faster than a full-grown (float) logarithm function.
You might for instance take the number of the uppermost 1-bit in the log-index and use the next n lower bits as range-index:
// all vars are size_t (unsigned at least!)
base_index = get_number_of_uppermost_bit(log_index);  // index of the highest set bit
shift = (base_index > 3U) ? (base_index - 3U) : 0;
lin_index = base_index * 8U + ((log_index >> shift) & (8U - 1U));
The values of 8 and 3 (ld(8)) are the number of entries per log-range. Note these are linear (sometimes an acceptable approximation). You can also apply the algorithm to the lower bits, however getting an integer log function. But the above is faster and might be sufficient. Alternatively, you can use a lookup table for the lower 3 bits.
A decimal stepping would be more difficult that way and pretty inefficient.
I want to generate random numbers in sorted order.
I wrote below code:
void CreateSortedNode(pNode head)
{
    int size = 10, last = 0;
    pNode temp;

    while (size-- > 0) {
        temp = (pNode)malloc(sizeof(struct node));
        last += (rand() % 10);
        temp->data = last;   /* randomly generated numbers end up in sorted order */
        list_add(temp);
    }
}
[EDIT:]
Expecting numbers to be generated in increasing or decreasing order, e.g. {2, 5, 9, 23, 45, 68}.
int main()
{
    int size = 10, last = 0;

    while (size-- > 0) {
        last += (rand() % 10);
        printf("%4d", last);
    }
    return 0;
}
Any better idea?
Solved back in 1979 (by Bentley and Saxe at Carnegie-Mellon):
https://apps.dtic.mil/dtic/tr/fulltext/u2/a066739.pdf
The solution is ridiculously compact in terms of code too!
Their paper is in Pascal, I converted it to Python so it should work with any language:
from random import random

cur_max = 100          # desired maximum random number
n = 100                # size of the array to fill
x = [0] * n            # generate an array x of size n

for i in range(n, 0, -1):
    cur_max = cur_max * random() ** (1 / i)   # the magic formula (Python 3: 1/i is true division)
    x[i - 1] = cur_max

print(x)               # the results
Enjoy your sorted random numbers...
Without any information about sample size or sample universe, it's not easy to know if the following is interesting but irrelevant or a solution, but since it is in any case interesting, here goes.
The problem:
In O(1) space, produce an unbiased ordered random sample of size n from an ordered set S of size N: <S1,S2,…SN>, such that the elements in the sample are in the same order as the elements in the ordered set.
The solution:
1. With probability n/|S|, add S1 to the sample and decrement n.
2. Remove S1 from S.
3. Repeat steps 1 and 2, each time with the new first element (and size) of S, until n is 0, at which point the sample will have the desired number of elements.
The solution in python:
from random import randrange

# select n random integers in order from range(N)
def sample(n, N):
    # insist that 0 <= n <= N
    for i in range(N):
        if randrange(N - i) < n:
            yield i
            n -= 1
            if n <= 0:
                break
The problem with the solution:
It takes O(N) time. We'd really like to take O(n) time, since n is likely to be much smaller than N. On the other hand, we'd like to retain the O(1) space, in case n is also quite large.
A better solution (outline only)
(The following is adapted from a 1987 paper by Jeffrey Scott Vitter, "An Efficient Algorithm for Sequential Random Sampling". See Dr. Vitter's publications page. Please read the paper for the details.)
Instead of incrementing i and selecting a random number, as in the above python code, it would be cool if we could generate a random number according to some distribution which would be the number of times that i will be incremented without any element being yielded. All we need is the distribution (which will obviously depend on the current values of n and N.)
Of course, we can derive the distribution precisely from an examination of the algorithm. That doesn't help much, though, because the resulting formula requires a lot of time to compute accurately, and the end result is still O(N).
However, we don't always have to compute it accurately. Suppose we have some easily computable, reasonably good approximation which consistently underestimates the probabilities (with the consequence that it will sometimes not make a prediction). If that approximation works, we can use it; if not, we'll need to fall back to the accurate computation. If that happens sufficiently rarely, we might be able to achieve O(n) on average. And indeed, Dr. Vitter's paper shows how to do this. (With code.)
Suppose you wanted to generate just three random numbers, x, y, and z so that they are in sorted order x <= y <= z. You will place these in some C++ container, which I'll just denote as a list like D = [x, y, z], so we can also say that x is component 0 of D, or D_0 and so on.
For any sequential algorithm that first draws a random value for x, let's say it comes up with 2.5, this tells us something about what y has to be: namely, y >= 2.5.
So, conditional on the value of x, your desired random number algorithm has to satisfy the property that p(y >= x | x) = 1. If the distribution you are drawing from is anything like a common distribution, like uniform or Gaussian, then it's clear that unconditionally p(y >= x) would be some other expression involving the density of that distribution. (In fact, only a pathological distribution like a Dirac delta at "infinity" could make them independent, and that would be nonsense for your application.)
So what we can speculate with great confidence is that p(y >= t | x) for various values of t is not equal to p(y >= t). That's the definition for dependent random variables. So now you know that the random variable y (second in your eventual list) is not statistically independent of x.
Another way to state it is that in your output data D, the components of D are not statistically independent observations. And in fact they must be positively correlated since if we learn that x is bigger than we thought, we also automatically learn that y is bigger than or equal to what we thought.
In this sense, a sequential algorithm that provides this kind of output is an example of a Markov Chain. The probability distribution of a given number in the sequence is conditionally dependent on the previous number.
If you really want a Markov Chain like that (I suspect that you don't), then you could instead draw a first number at random (for x) and then draw positive deltas, which you will add to each successive number, like this:
Draw a value for x, say 2.5
Draw a strictly positive value for y-x, say 13.7, so y is 2.5 + 13.7 = 16.2
Draw a strictly positive value for z-y, say 0.001, so z is 16.201
and so on...
You just have to acknowledge that the components of your result are not statistically independent, and so you cannot use them in an application that relies on statistical independence assumptions.
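A minimal C sketch of the delta scheme above (the ranges for the start value and the deltas are purely illustrative):

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* draw a starting value, then keep adding strictly positive deltas,
   so the sequence is sorted by construction */
void sorted_draws(double *out, int count)
{
    double value = 10.0 * rand() / RAND_MAX;                /* the first draw, x */
    int i;

    for (i = 0; i < count; ++i) {
        out[i] = value;
        value += 10.0 * (rand() + 1.0) / (RAND_MAX + 1.0);  /* strictly positive delta */
    }
}

int main(void)
{
    double d[3];

    srand((unsigned)time(NULL));
    sorted_draws(d, 3);
    printf("x = %f, y = %f, z = %f\n", d[0], d[1], d[2]);
    return 0;
}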
I am studying for an exam and I came across some problems I need to address, dealing with base cases:
I am converting from code to a recurrence relation, not the other way around.
Example 1:
if(n==1) return 0;
Now the recurrence relation to that piece of code is: T(1) = 0
How i got that?
By looking at n==1, we see this is a comparison with a value > 0, which is doing some form of work, so we write "T(1)"; and return 0; isn't doing any work, so we say "= 0".
=> T(1) = 0;
Example 2:
if(n==0) return n+1*2;
Analyzing: n==0 means we aren't doing any work, so T(0); but return n+1*2; is doing work, so "= 1".
=> T(0) = 1;
What I want to know is: is this the correct way of analyzing a piece of code like that to come up with a recurrence relation base case?
I am unsure about these, which I came up with on my own to exhaust the possibilities of base cases:
Example 3: if(n==m-2) return n-1; //answer: T(1) = 1; ?
Example 4: if(n!=2) return n; //answer: T(1) = 1; ?
Example 5: if(n/2==0) return 1; //answer: T(1) = 1; ?
Example 6: if(n<2) return; //answer: T(1) = 0; ?
It's hard to analyze base cases outside of the context of the code, so it might be helpful if you posted the entire function. However, I think your confusion is arising from the assumption that T(n) always represents "work". I'm guessing that you are taking a class on complexity and you have learned about recurrence relations as a method for expressing the complexity of a recursive function.
T(n) is just a function: you plug in a number n (usually a positive integer) and you get out a number T(n). Just like any other function, T(n) means nothing on its own. However, we often use a function with the notation T(n) to express the amount of time required by an algorithm to run on an input of size n. These are two separate concepts; (1) a function T(n) and the various ways to represent it, such as a recurrence relationship, and (2) the number of operations required to run an algorithm.
Let me give an example.
int factorial(int n)
{
    if (n > 0)
        return n * factorial(n - 1);
    else
        return 1;
}
Let's see if we can write some function F(n) that represents the output of the code. Well, F(n) = n*F(n-1), with F(0) = 1. Why? Clearly from the code, the result of F(0) is 1. For any other value of n, the result is F(n) = n*F(n-1). That recurrence relation is a very convenient way to express the output of a recursive function. Of course, I could just as easily say that F(n) = n! (the factorial operator), which is also correct. That's a non-recurrence expression of the same function. Notice that I haven't said anything about the run time of the algorithm or how much "work" it is doing. I'm just writing a mathematical expression for what the code outputs.
Dealing with the run time of a function is a little trickier, since you have to decide what you mean by "work" or "an operation." Let's suppose that we don't count "return" as an operation, but we do count multiplication as an operation and we count a conditional (the if statement) as an operation. Under these assumptions, we can try to write a recurrence relation for a function T(n) that describes how much work is done when the input n is given to the function. (Later, I'll make a comment about why this is a poor question.) For n = 0, we have a conditional (the if statement) and a return, nothing else. So T(0) = 1. For any other n > 0, we have a conditional, a multiply, and however many operations are required to compute T(n-1). So the total for n is:
T(n) = 1 (conditional) + 1 (multiply) + T(n-1) = 2 + T(n-1),
T(0) = 1.
We could write T(n) as a recurrence relation: T(n) = 2 + T(n-1), T(0) = 1. Of course, this is also just the function T(n) = 1 + 2n (unroll it: T(n) = 2 + T(n-1) = 4 + T(n-2) = ... = 2n + T(0) = 2n + 1). Again, I want to stress that these are two very different functions. F(n) is describing the output of the function when n is the input. T(n) is describing how much work is done when n is the input.
Now, the T(n) that I just described is bad in terms of complexity theory. The reason is that complexity theory isn't about describing how much work is required to compute functions that take only a single integer as an argument. In other words, we aren't looking at the work required for a function of the form F(n). We want something more general: how much work is required to perform an algorithm on an input of size n. For example, MergeSort is an algorithm for sorting a list of objects. It requires roughly nlog(n) operations to run MergeSort on a list of n items. Notice that MergeSort isn't doing anything with the number n, rather, it operates on a list of size n. In contrast, our factorial function F(n) isn't operating on an input of size n: presumably n is an integer type, so it is probably 32-bits or 64-bits or something, no matter its value. Or you can get picky and say that its size is the minimum number of bits to describe it. In any case, n is the input, not the size of the input.
When you are answering these questions, it is important to be very clear about whether they want a recurrence relation that describes the output of the function, or a recurrence relation that describes the run time of the function.
Background:
Given n balls such that:
'a' balls are of colour GREEN
'b' balls are of colour BLUE
'c' balls are of colour RED
...
(of course a + b + c + ... = n)
The number of permutations in which these balls can be arranged is given by:
perm = n! / (a! b! c! ..)
Question 1:
How can I 'elegantly' calculate perm so as to avoid integer overflow for as long as possible, and be sure that when I am done calculating, I either have the correct value of perm, or I know that the final result will overflow?
Basically, I want to avoid using something like GNU GMP.
Optionally, Question 2:
Is this a really bad idea, and should I just go ahead and use GMP?
These are known as the multinomial coefficients, which I shall denote by m(a,b,...).
And you can efficiently calculate them avoiding overflow by exploiting this identity (which should be fairly simple to prove):
m(a,b,c,...) = m(a-1,b,c,...) + m(a,b-1,c,...) + m(a,b,c-1,...) + ...
m(0,0,0,...) = 1 // base case
m(anything negative) = 0 // base case 2
Then it's a simple matter of using recursion to calculate the coefficients. Note that to avoid an exponential running time, you need to either cache your results (to avoid recomputation) or use dynamic programming.
To check for overflow, just make sure the sums won't overflow.
And yes, it's a very bad idea to use arbitrary precision libraries to do this simple task.
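For illustration, here is a minimal sketch of that recurrence in C for exactly three colours, with a small memo table (the bound MAXN and the types are arbitrary, and the overflow check mentioned above is left out):

#include <stdio.h>

#define MAXN 20   /* arbitrary bound for this sketch */

static unsigned long long memo[MAXN + 1][MAXN + 1][MAXN + 1];
static unsigned char     seen[MAXN + 1][MAXN + 1][MAXN + 1];

/* m(a,b,c) via the recurrence above; overflow checking is omitted */
unsigned long long multinomial3(int a, int b, int c)
{
    unsigned long long r;

    if (a < 0 || b < 0 || c < 0)
        return 0;                              /* base case 2 */
    if (a == 0 && b == 0 && c == 0)
        return 1;                              /* base case */
    if (seen[a][b][c])
        return memo[a][b][c];

    r = multinomial3(a - 1, b, c)
      + multinomial3(a, b - 1, c)
      + multinomial3(a, b, c - 1);
    seen[a][b][c] = 1;
    memo[a][b][c] = r;
    return r;
}

int main(void)
{
    /* 1 + 2 + 3 = 6 balls: 6!/(1! 2! 3!) = 60 */
    printf("%llu\n", multinomial3(1, 2, 3));
    return 0;
}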
If you have globs of cpu time, you can make lists out of all the factorials, then find the prime factorization of all the numbers in the lists, then cancel all the numbers on the top with those on the bottom, until the numbers are completely reduced.
The overflow-safest way is the way Dave suggested. You find the exponent with which the prime p divides n! by the summation
m = n;
e = 0;
do {
    m /= p;
    e += m;
} while (m > 0);
Subtract the exponents of p in the factorisations of a! etc. Do that for all primes <= n, and you have the factorisation of the multinomial coefficient. That calculation overflows if and only if the final result overflows. But the multinomial coefficients grow rather fast, so you will have overflow already for fairly small n. For substantial calculations, you will need a bignum library (if you don't need exact results, you can get by a bit longer using doubles).
Even if you use a bignum library, it is worthwhile to keep intermediate results from getting too large, so instead of calculating the factorials and dividing huge numbers, it is better to calculate the parts in sequence,
n!/(a! * b! * c! * ...) = n! / (a! * (n-a)!) * (n-a)! / (b! * (n-a-b)!) * ...
and to compute each of these factors, let's take the second for illustration,
(n-a)! / (b! * (n-a-b)!) = \prod_{i = 1}^b (n-a+1-i)/i
is calculated with
prod = 1;
for (i = 1; i <= b; i++) {
    prod = prod * (n - a + 1 - i);
    prod = prod / i;    /* exact: the value is i * C(n-a, i) just before this division */
}
Finally multiply the parts. This requires n divisions and n + number_of_parts - 1 multiplications, keeping the intermediate results moderately small.
According to this link, you can calculate multinomial coefficients as a product of several binomial coefficients, controlling integer overflow on the way.
This reduces original problem to overflow-controlled computation of a binomial coefficient.
Notation: n! = prod(1,n), where you can guess what prod does.
It's very easy, but first you must know that for any two positive integers (i, n > 0) the following expression is a positive integer:
prod(i,i+n-1)/prod(1,n)
Thus the idea is to slice the computation of n! into small chunks and to divide as soon as possible.
First with a, then with b, and so on.
perm = (a!/a!) * (prod(a+1, a+b)/b!) * ... * (prod(a+b+c+...y+1,n)/z!)
Each of these factors is an integer, so if perm does not overflow, neither will any of its factors.
However, the calculation of such a factor could overflow in the numerator or the denominator. That's avoidable by alternating a multiplication from the numerator with a division:
prod(a+1, a+b)/b! = (a+1)/1 * (a+2)/2 * (a+3)/3 * ... * (a+b)/b
That way every division yields an integer.
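Putting the pieces together, a sketch of the alternating multiply/divide scheme in C (no overflow check; every intermediate value is an integer and stays as small as possible):

#include <stdio.h>

/* multiply one numerator factor, then divide by the next denominator
   integer, so every intermediate value is an integer; no overflow check */
unsigned long long multinomial(const int *counts, int ncounts)
{
    unsigned long long result = 1;
    int consumed = 0;                    /* a, then a+b, then a+b+c, ... */
    int k, i;

    for (k = 0; k < ncounts; ++k) {
        for (i = 1; i <= counts[k]; ++i) {
            result = result * (unsigned long long)(consumed + i);  /* numerator */
            result = result / i;                                   /* exact division */
        }
        consumed += counts[k];
    }
    return result;
}

int main(void)
{
    int counts[] = {1, 2, 3};            /* 6 balls: expect 6!/(1! 2! 3!) = 60 */
    printf("%llu\n", multinomial(counts, 3));
    return 0;
}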
For one of my course projects I started implementing a "Naive Bayesian classifier" in C. My project is to implement a document classifier application (especially spam) using huge training data.
Now I have a problem implementing the algorithm because of the limitations of C's datatypes.
( Algorithm I am using is given here, http://en.wikipedia.org/wiki/Bayesian_spam_filtering )
PROBLEM STATEMENT:
The algorithm involves taking each word in a document and calculating the probability of it being a spam word. If p1, p2, p3, ..., pn are the probabilities of words 1, 2, 3, ..., n, then the probability of the doc being spam is calculated as
p = (p1 * p2 * ... * pn) / (p1 * p2 * ... * pn + (1-p1) * (1-p2) * ... * (1-pn))
Here, a probability value can very easily be around 0.01, so even if I use the datatype "double" my calculation will go for a toss. To confirm this I wrote the sample code given below.
#define PROBABILITY_OF_UNLIKELY_SPAM_WORD (0.01)
#define PROBABILITY_OF_MOSTLY_SPAM_WORD   (0.99)

int main()
{
    int index;
    long double numerator = 1.0;
    long double denom1 = 1.0, denom2 = 1.0;
    long double doc_spam_prob;

    /* Simulating FEW unlikely spam words */
    for (index = 0; index < 162; index++)
    {
        numerator = numerator * (long double)PROBABILITY_OF_UNLIKELY_SPAM_WORD;
        denom2 = denom2 * (long double)PROBABILITY_OF_UNLIKELY_SPAM_WORD;
        denom1 = denom1 * (long double)(1 - PROBABILITY_OF_UNLIKELY_SPAM_WORD);
    }

    /* Simulating lot of mostly definite spam words */
    for (index = 0; index < 1000; index++)
    {
        numerator = numerator * (long double)PROBABILITY_OF_MOSTLY_SPAM_WORD;
        denom2 = denom2 * (long double)PROBABILITY_OF_MOSTLY_SPAM_WORD;
        denom1 = denom1 * (long double)(1 - PROBABILITY_OF_MOSTLY_SPAM_WORD);
    }

    doc_spam_prob = (numerator / (denom1 + denom2));
    return 0;
}
I tried float, double and even long double datatypes, but still the same problem.
Hence, say in a 100K-word document I am analyzing, if just 162 words have a 1% spam probability and the remaining 99838 are conspicuously spam words, my app will still say it is a Not Spam doc, because of precision error (the numerator easily goes to ZERO)!
This is the first time I am hitting such issue. So how exactly should this problem be tackled?
This happens often in machine learning. AFAIK, there's nothing you can do about the loss in precision. So to bypass this, we use the log function and convert divisions and multiplications to subtractions and additions, resp.
So I decided to do the math.
The original equation is:
p = (p_1 * p_2 * ... * p_n) / (p_1 * p_2 * ... * p_n + (1-p_1) * (1-p_2) * ... * (1-p_n))
I slightly modify it:
1/p = 1 + ((1-p_1) * (1-p_2) * ... * (1-p_n)) / (p_1 * p_2 * ... * p_n)
Taking logs on both sides:
ln(1/p - 1) = [ln(1-p_1) + ... + ln(1-p_n)] - [ln(p_1) + ... + ln(p_n)]
Let
eta = [ln(1-p_1) + ... + ln(1-p_n)] - [ln(p_1) + ... + ln(p_n)]
Substituting,
ln(1/p - 1) = eta
Hence the alternate formula for computing the combined probability:
p = 1 / (1 + e^eta)
If you need me to expand on this, please leave a comment.
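To make the log-domain formula concrete, a minimal C sketch (assuming every p_i is strictly between 0 and 1):

#include <math.h>
#include <stdio.h>

/* eta = sum(ln(1 - p_i) - ln(p_i)); p = 1 / (1 + e^eta).
   No product of thousands of tiny numbers ever appears. */
double combined_spam_probability(const double *p, int n)
{
    double eta = 0.0;
    int i;

    for (i = 0; i < n; ++i)
        eta += log(1.0 - p[i]) - log(p[i]);
    return 1.0 / (1.0 + exp(eta));
}

int main(void)
{
    /* 162 "unlikely" words at 1% and 1000 strong spam words at 99%,
       mirroring the sample program in the question */
    double p[1162];
    int i;

    for (i = 0; i < 162; ++i)
        p[i] = 0.01;
    for (i = 162; i < 1162; ++i)
        p[i] = 0.99;
    printf("%f\n", combined_spam_probability(p, 1162));
    return 0;
}

With that word mix, eta is a large negative number, exp(eta) underflows harmlessly to 0, and the document comes out as spam with p = 1.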
Here's a trick:
for the sake of readability, let S := p_1 * ... * p_n and H := (1-p_1) * ... * (1-p_n),
then we have:
p = S / (S + H)
p = 1 / ((S + H) / S)
p = 1 / (1 + H / S)
Let's expand again:
p = 1 / (1 + ((1-p_1) * ... * (1-p_n)) / (p_1 * ... * p_n))
p = 1 / (1 + (1-p_1)/p_1 * ... * (1-p_n)/p_n)
So basically, you will obtain a product of quite large numbers (between 0 and, for p_i = 0.01, 99). The idea is, not to multiply tons of small numbers with one another, to obtain, well, 0, but to make a quotient of two small numbers. For example, if n = 1000000 and p_i = 0.5 for all i, the above method will give you 0/(0+0) which is NaN, whereas the proposed method will give you 1/(1+1*...1), which is 0.5.
You can get even better results, when all p_i are sorted and you pair them up in opposed order (let's assume p_1 < ... < p_n), then the following formula will get even better precision:
p = 1 / (1 + (1-p_1)/p_n * ... * (1-p_n)/p_1)
That way you divide big numerators (small p_i) by big denominators (big p_(n+1-i)), and small numerators by small denominators.
edit: MSalter proposed a useful further optimization in his answer. Using it, the formula reads as follows:
p = 1 / (1 + (1-p_1)/p_n * (1-p_2)/p_(n-1) * ... * (1-p_(n-1))/p_2 * (1-p_n)/p_1)
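A minimal C sketch of the pairing formula (note it sorts the caller's array in place):

#include <stdio.h>
#include <stdlib.h>

static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* sort the p_i, then pair small p_i (big numerator) with big p_(n+1-i)
   (big denominator); sorts p[] in place */
double combined_probability(double *p, int n)
{
    double ratio = 1.0;
    int i;

    qsort(p, n, sizeof *p, cmp_double);
    for (i = 0; i < n; ++i)
        ratio *= (1.0 - p[i]) / p[n - 1 - i];
    return 1.0 / (1.0 + ratio);
}

int main(void)
{
    double p[] = {0.99, 0.01, 0.99, 0.5};
    printf("%f\n", combined_probability(p, 4));   /* prints 0.990000 */
    return 0;
}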
Your problem is caused because you are collecting too many terms without regard for their size. One solution is to take logarithms. Another is to sort your individual terms. First, let's rewrite the equation as 1/p = 1 + ∏((1-p_i)/p_i). Now your problem is that some of the terms are small, while others are big. If you have too many small terms in a row, you'll underflow, and with too many big terms you'll overflow the intermediate result.
So, don't put too many of the same order in a row. Sort the terms (1-p_i)/p_i. As a result, the first will be the smallest term, the last the biggest. Now, if you multiplied them straight away you would still have an underflow. But the order of calculation doesn't matter. Use two iterators into your temporary collection. One starts at the beginning (i.e. (1-p_0)/p_0), the other at the end (i.e. (1-p_n)/p_n), and your intermediate result starts at 1.0. Now, when your intermediate result is >= 1.0, you take a term from the front, and when your intermediate result is < 1.0 you take a term from the back.
The result is that as you take terms, the intermediate result will oscillate around 1.0. It will only go up or down as you run out of small or big terms. But that's OK. At that point, you've consumed the extremes on both ends, so the intermediate result will slowly approach the final result.
There's of course a real possibility of overflow. If the input is completely unlikely to be spam (p=1E-1000) then 1/p will overflow, because ∏((1-p_i)/p_i) overflows. But since the terms are sorted, we know that the intermediate result will overflow only if ∏((1-p_i)/p_i) overflows. So, if the intermediate result overflows, there's no subsequent loss of precision.
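A minimal C sketch of that two-iterator scheme (it assumes the terms (1-p_i)/p_i have already been computed and sorted ascending):

#include <stdio.h>

/* terms[] holds (1-p_i)/p_i sorted ascending; take a small term while the
   running product is >= 1.0 and a big term while it is < 1.0, so the
   intermediate value oscillates around 1.0 */
double balanced_product(const double *terms, int n)
{
    double prod = 1.0;
    int lo = 0, hi = n - 1;

    while (lo <= hi) {
        if (prod >= 1.0)
            prod *= terms[lo++];   /* small term pulls the product back down */
        else
            prod *= terms[hi--];   /* big term pulls it back up */
    }
    return prod;                   /* the final p is 1 / (1 + prod) */
}

int main(void)
{
    double terms[] = {0.01, 0.5, 2.0, 99.0};   /* already sorted ascending */
    double prod = balanced_product(terms, 4);
    printf("p = %f\n", 1.0 / (1.0 + prod));
    return 0;
}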
Try computing the inverse 1/p. That gives you an equation of the form 1/p = 1 + ((1-p1) * (1-p2) * ...) / (p1 * p2 * ...).
If you then count the occurrence of each probability--it looks like you have a small number of values that recur--you can use the pow() function--pow(1-p, occurrences_of_p) * pow(1-q, occurrences_of_q)--and avoid individual roundoff with each multiplication.
You can use probabilities in percent or per mille:
doc_spam_prob= (numerator*100/(denom1+denom2));
or
doc_spam_prob= (numerator*1000/(denom1+denom2));
or use some other coefficient
I am not strong in math so I cannot comment on possible simplifications to the formula that might eliminate or reduce your problem. However, I am familiar with the precision limitations of long double types and am aware of several arbitrary and extended precision math libraries for C. Check out:
http://www.nongnu.org/hpalib/
and
http://www.tc.umn.edu/~ringx004/mapm-main.html