Run time of a function - C

Let function f() be:
void f(int n)
{
    for (int i=1; i<=n; i++)
        for (int j=1; j<=n*n/i; j+=i)
            printf("*");
}
According to my calculations, the run time in big-O terms should be O(n^2 log n).
The answer is O(n^2). Why is that?

I owe you an apology. I misread your code the first time around, so the initial answer I gave was incorrect. Here's a corrected answer, along with a comparison with the original answer that explains where my analysis went wrong. I hope you find this interesting - I think there's some really cool math that arises from this!
The code you've posted is shown here:
for (int i=1; i<=n; i++)
    for (int j=1; j<=n*n/i; j+=i)
        printf("*");
To determine the runtime of this code, let's look at how much work the inner loop does across all iterations. When i = 1, the loop counts up to n^2 by ones, so it does n^2 work. When i = 2, the loop counts up to n^2 / 2 by twos, so it does n^2 / 4 work. When i = 3, the loop counts up to n^2 / 3 by threes, so it does n^2 / 9 work. More generally, the kth iteration does n^2 / k^2 work, since it counts up to n^2 / k with steps of size k.
If we sum up the work done here for i ranging from 1 to n, inclusive, we see that the runtime is
n^2 + n^2 / 4 + n^2 / 9 + n^2 / 16 + ... + n^2 / n^2
= n^2 (1 + 1/4 + 1/9 + 1/16 + 1/25 + ... + 1/n^2).
The summation here (1 + 1/4 + 1/9 + 1/16 + ...) has the (surprising!) property that, in the limit, it's exactly equal to π^2 / 6. In other words, the runtime of your code asymptotically approaches n^2 π^2 / 6, so the runtime is O(n^2). You can see this by writing a program that compares the number of actual steps against n^2 π^2 / 6 and looking at the results.
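As a quick sanity check, here is a minimal C sketch that tallies the inner-loop iterations of the posted code and compares the count against n^2 π^2 / 6 (the sample values of n are arbitrary); the ratio should drift toward 1 as n grows:

#include <stdio.h>

int main(void)
{
    const double PI = 3.14159265358979323846;
    for (long n = 100; n <= 3200; n *= 2) {
        long long steps = 0;
        for (long i = 1; i <= n; i++)
            for (long j = 1; j <= n * n / i; j += i)
                steps++;                          /* stands in for printf("*") */
        double predicted = (double)n * n * PI * PI / 6.0;
        printf("n=%5ld  steps=%12lld  predicted=%14.0f  ratio=%.4f\n",
               n, steps, predicted, steps / predicted);
    }
    return 0;
}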
I got this wrong the first time around because I misread your code as though it were written as
for (int i=1; i<=n; i++)
    for (int j=1; j<=n*n/i; j+=1)
        printf("*");
In other words, I thought that the inner loop took steps of size one on each iteration rather than steps of size i. In that case, the work done by the kth iteration of the loop is n^2 / k, rather than n^2 / k^2, which gives a runtime of
n^2 + n^2/2 + n^2/3 + n^2/4 + ... + n^2/n
= n^2 (1 + 1/2 + 1/3 + 1/4 + ... + 1/n)
Here, we can use the fact that 1 + 1/2 + 1/3 + ... + 1/n is a well-known summation. The nth harmonic number is defined as H_n = 1 + 1/2 + 1/3 + ... + 1/n, and it's known that the harmonic numbers obey H_n = Θ(log n), so this version of the code runs in time O(n^2 log n). It's interesting how this change so dramatically changes the runtime of the code!
As an interesting generalization, let's suppose that you change the inner loop so that the step size is i^ε for some ε > 0 (and assuming you round up). In that case, the number of iterations on the kth time through the inner loop will be n^2 / k^(1+ε), since the upper bound on the loop is n^2 / k and you're taking steps of size k^ε. Via a similar analysis to what we've seen before, the runtime will be
n^2 + n^2 / 2^(1+ε) + n^2 / 3^(1+ε) + n^2 / 4^(1+ε) + ... + n^2 / n^(1+ε)
= n^2 (1 + 1/2^(1+ε) + 1/3^(1+ε) + 1/4^(1+ε) + ... + 1/n^(1+ε))
If you've taken a calculus course, you might recognize that the series
1 + 1/2^(1+ε) + 1/3^(1+ε) + 1/4^(1+ε) + ... + 1/n^(1+ε)
converges to some fixed limit for any ε > 0, meaning that if the step size is any positive power of i, the overall runtime will be O(n^2). This means that all of the following pieces of code have runtime O(n^2):
for (int i=1; i<=n; i++)
    for (int j=1; j<=n*n/i; j+=i)
        printf("*");

for (int i=1; i<=n; i++)
    for (int j=1; j<=n*n/i; j+=i*i)
        printf("*");

for (int i=1; i<=n; i++)
    for (int j=1; j<=n*n/i; j+=i*(sqrt(i) + 1))
        printf("*");

The first loop runs n times, and for each i the second loop runs about (n/i)^2 times (not n^2/i times), because we have j += i rather than j++. So the total time is as follows:
∑_{i=1}^{n} (n/i)^2 = n^2 ∑_{i=1}^{n} 1/i^2 < 2*n^2
(the last step uses ∑ 1/i^2 < 1 + ∑_{i>=2} 1/(i*(i-1)) = 2).
So the time complexity is O(n^2).

From what I've learned in theory, the i in the step does not affect the complexity class very much here. Since the quadratic term dominates, the log n factor gets dropped, so it is considered just O(n^2) instead of the expected O(n^2 log n).
Recall that when we use big-O notation, we drop constants and low-order terms. This is because when the problem size gets sufficiently large, those terms don't matter. However, this means that two algorithms can have the same big-O time complexity, even though one is always faster than the other. For example, suppose algorithm 1 requires N^2 time, and algorithm 2 requires 10*N^2 + N time. For both algorithms, the time is O(N^2), but algorithm 1 will always be faster than algorithm 2. In this case, the constants and low-order terms do matter in terms of which algorithm is actually faster.
However, it is important to note that constants do not matter in terms of the question of how an algorithm "scales" (i.e., how does the algorithm's time change when the problem size doubles). Although an algorithm that requires N^2 time will always be faster than an algorithm that requires 10*N^2 time, for both algorithms, if the problem size doubles, the actual time will quadruple.
When two algorithms have different big-O time complexity, the constants and low-order terms only matter when the problem size is small. For example, even if there are large constants involved, a linear-time algorithm will always eventually be faster than a quadratic-time algorithm. Compare 100*N (a time that is linear in N) with N^2/100 (a time that is quadratic in N): for values of N less than 10^4, the quadratic time is smaller than the linear time, but for all values of N greater than 10^4, the linear time is smaller.
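The crossover point is easy to check with a throwaway C sketch (the sample values of N are arbitrary) that prints both cost functions around N = 10^4:

#include <stdio.h>

int main(void)
{
    /* compare a linear cost 100*N with a quadratic cost N*N/100 near the crossover */
    long long values[] = { 1000, 5000, 10000, 20000, 100000 };
    for (int i = 0; i < 5; i++) {
        long long N = values[i];
        printf("N=%7lld  100*N=%12lld  N*N/100=%12lld\n", N, 100 * N, N * N / 100);
    }
    return 0;
}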
Take a look at this article for more details.

Related

Deciphering the purpose of a function in C with two nested for loops, one int input and an int return

A friend from another college came to me with this challenge. I was unable to help him, and became rather troubled as I can't understand the purpose of the function he was supposed to decipher. Has anyone ever seen anything like this?
I've tried visualising the data, but couldn't really take any meaningful conclusions out of the graph:
(blue is the return value of mistery, orange is the difference between the last and the current return. The scale is logarithmic for easier reading.)
int mistery(int n){
    int i, j, res = 0;
    for(i = n / 2; i <= n; i++){
        for(j = 2; j <= n; j *= 2){
            res += n / 2;
        }
    }
    return res;
}
It seems like random code, but I've actually seen the worksheet where this appears. Any input is welcome.
Thanks
The increment added to the result variable on each iteration of the inner loop depends only on function parameter n, which the function does not modify. The result is therefore the product of the increment (n/2) with the total number of inner-loop iterations (supposing that does not overflow).
So how many loop iterations are there? Consider first the outer loop. If the lower bound of i were 0, then the inclusive upper bound of n would yield n+1 iterations. But we are skipping the first n/2 iterations (0 ... (n/2)-1), so that's n+1-(n/2). All the divisions here are integer divisions, and with integer division it is true for all m that m = (m/2) + ((m+1)/2). We can use that to rewrite the number of outer-loop iterations as ((n+1)/2) + 1, or (n+3)/2 (still using integer division).
Now consider the inner loop. The index variable j starts at 2 and is doubled at every iteration until it exceeds n. This yields a number of iterations equal to the floor of the base-2 logarithm of n. The overall number of iterations is therefore
(n+3)/2 * floor(log2(n))
(supposing that we can assume an exact result when n is a power of 2). The combined result, then, is
((n+3)/2) * (n/2) * floor(log2(n))
where the divisions are still integer divisions, and are performed before the multiplications. This explains why the function's derivative spikes at power-of-2 arguments: the number of inner-loop iterations, floor(log2(n)), increases by one at each of those points.
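As a quick cross-check of that closed form, here is a small C sketch (my own illustration) that compares mistery(n) with ((n+3)/2) * (n/2) * floor(log2(n)) for small n:

#include <stdio.h>

static int mistery(int n){
    int i, j, res = 0;
    for(i = n / 2; i <= n; i++){
        for(j = 2; j <= n; j *= 2){
            res += n / 2;
        }
    }
    return res;
}

static int floor_log2(int n){          /* floor of the base-2 logarithm, for n >= 1 */
    int k = 0;
    while (n > 1) { n /= 2; k++; }
    return k;
}

int main(void)
{
    for (int n = 2; n <= 40; n++) {
        int closed = ((n + 3) / 2) * (n / 2) * floor_log2(n);
        printf("n=%2d  mistery=%5d  formula=%5d%s\n",
               n, mistery(n), closed, mistery(n) == closed ? "" : "  MISMATCH");
    }
    return 0;
}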
We haven't any context from which to guess the purpose of the function, and I don't recognize it as a well-known function, but we can talk a bit about its properties. For example,
it grows asymptotically faster than n^2, and asymptotically slower than n^3. In fact,
it has a form reminiscent of those that tend to appear in computational asymptotic bounds, so perhaps it is an estimate or bound for the memory or time that some computation will require. Also,
it is strictly increasing for n > 0. That might not be immediately clear because of the involvement of integer division, but note that n and n + 3 have opposite parity, so whenever n increases by one, exactly one of n/2 and (n+3)/2 increases by one, while the other does not change (and the logarithmic term is non-decreasing).
as already discussed, it has sudden jumps at powers of 2. This can be viewed in terms of the number of iterations of the outer loop, or, alternatively, in terms of the involvement of the floor() function in the single-equation form.
The polynomial part of the function is related to the equation for the sum of the integers from 1 to n.
The logarithmic part of the function is related to the number of significant bits in the binary representation of n.

While loop time complexity O(logn)

I cannot understand why the time complexity for this code is O(log n):
double n;
/* ... */
while (n > 1) {
    n *= 0.999;
}
At least it says so in my study materials.
Imagine if the code were as follows:
double n;
/* ... */
while (n > 1) {
    n *= 0.5;
}
It should be intuitive to see how this is O(log n), I hope.
When you multiply by 0.999 instead, it becomes slower by a constant factor, but of course the complexity is still written as O(log n).
You want to calculate how many iterations you need before n becomes equal to (or less than) 1.
If you call the number of iterations k, you want to solve
n * 0.999^k = 1
It goes like this
n * 0.999^k = 1
0.999^k = 1/n
log(0.999^k) = log(1/n)
k * log(0.999) = -log(n)
k = -log(n)/log(0.999)
k = (-1/log(0.999)) * log(n)
For big-O we only care about "the big picture" so we throw away constants. Here log(0.999) is a negative constant so (-1/log(0.999)) is a positive constant that we can "throw away", i.e. set to 1. So we get:
k ~ log(n)
So the code is O(log n).
From this you can also notice that the value of the constant (i.e. 0.999 in your example) doesn't matter for the big-O calculation. All constant values greater than 0 and less than 1 will result in O(log n).
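To see both the logarithmic growth and the constant factor, here is a small C sketch (my own illustration; the sample values of n are arbitrary) that counts the actual iterations and prints the predicted -log(n)/log(0.999) next to them:

#include <stdio.h>
#include <math.h>

int main(void)
{
    for (double n = 10.0; n <= 1.0e7; n *= 10.0) {
        double x = n;
        long iterations = 0;
        while (x > 1) {
            x *= 0.999;           /* the loop from the question */
            iterations++;
        }
        double predicted = -log(n) / log(0.999);
        printf("n=%10.0f  iterations=%8ld  predicted=%10.1f\n",
               n, iterations, predicted);
    }
    return 0;
}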
A logarithm has two inputs: a base and a number. The result of a logarithm is the power you need to raise the base to in order to reach that number. Here the base is 0.999 (a factor just below 1) and the scalar is n: the number of steps is the power you need to raise the base to so that, multiplied by n, the result drops below 1. This corresponds to the definition of the logarithm, with which I started my answer.
EDIT:
Think about it this way: you have n as input and you search for the first k where
n * 0.999^k < 1
You find this k by incrementing it: if the value is l at one step, it will be l * 0.999 at the next step. Repeating this is what gives your multiplication algorithm its logarithmic complexity.

Finding the smallest sum of the difference of A[i] and a constant

For an assignment I need to solve a mathematical problem. I narrowed it down to the following:
Let A[1, ... ,n] be an array of n integers.
Let y be an integer constant.
Now, I have to write an algorithm that finds the minimum of M(y) in O(n) time:
M(y) = Sum |A[i] - y|, i = 1 to n. Note that I don't just take A[i] - y, but the absolute value |A[i] - y|.
For clarity, I also put this equation in Wolfram Alpha.
I have considered the least squares method, but that will not yield the minimum of M(y); it gives something more like an average value of A, I think. As I'm taking the absolute value of A[i] - y, there is also no way I can differentiate this function with respect to y. Also, I can't just come up with any algorithm, because I have to do it in O(n) time. Also, I believe there can be more than one correct answer for y in some cases; in that case, the value of y must be equal to one of the integer elements of A.
This has really been eating me for a whole week now and I still haven't figured it out. Can anyone please teach me the way to go or point me in the right direction? I'm stuck. Thank you so much for your help.
You want to pick a y for which M(y) = sum(abs(A[i] - y)) is minimal. Let's assume every A[i] is positive (it does not change the result, because the problem is invariant by translation).
Let's start with two simple observations. First, if you pick y such that y < min(A) or y > max(A), you end up with a greater value for M(y) than if you picked y such that min(A) <= y <= max(A). Also, there is a unique local minimum or range of minima of A (M(y) is convex).
So we can start by picking some y in the interval [min(A) .. max(A)] and try to move this value around so that we get a smaller M(y). To make things easier to understand, let's sort A and pick an i in [1 .. n] (so y = A[i]).
There are three cases to consider.
If A[i+1] > A[i], and either {n is odd and i < (n+1)/2} or {n is even and i < n/2}, then M(A[i+1]) < M(A[i]).
This is because, going from M(A[i]) to M(A[i+1]), the number of terms that decrease (that is, n-i) is greater than the number of terms that increase (that is, i), and every term that changes does so by the same amount. In the case where n is odd, i < (n+1)/2 <=> 2*i < n+1 <=> 2*i < n, because 2*i and n+1 are both even, so 2*i < n+1 forces 2*i <= n-1 < n.
In more formal terms, M(A[i]) = sum(A[i]-A[s]) + sum(A[g]-A[i]), where s and g represent indices such that A[s] < A[i] and A[g] > A[i]. So if A[i+1] > A[i], then M(A[i+1]) = sum(A[i]-A[s]) + i*(A[i+1]-A[i]) + sum(A[g]-A[i]) - (n-i)*(A[i+1]-A[i]) = M(A[i]) + (2*i-n)*(A[i+1]-A[i]). Since 2*i < n and A[i+1] > A[i], (2*i-n)*(A[i+1]-A[i]) < 0, so M(A[i+1]) < M(A[i]).
Similarly, if A[i-1] < A[i], and either {n is odd and i > (n+1)/2} or {n is even and i > (n/2)+1}, then M(A[i-1]) < M(A[i]).
Finally, if {n is odd and i = (n+1)/2} or {n is even and i = (n/2) or (n/2)+1}, then you have a minimum, because decrementing or incrementing i will eventually lead you to the first or second case, respectively. There are leftover possible values for i, but all of them lead to A[i] being a minimum too.
The median of A is exactly the value A[i] where i satisfies the last case. If the number of elements in A is odd, then you have exactly one such value, y = A[(n+1)/2] (but possibly multiple indices for it); if it's even, then you have a range (which may contain just one integer) of such values, A[n/2] <= y <= A[n/2+1].
There is a standard C++ algorithm that can help you find the median in O(n) time: nth_element. If you are using another language, look up the median of medians algorithm (which Nico Schertler pointed out) or even introselect (which is what nth_element typically uses).
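For illustration, here is a rough C sketch of the overall idea (my own, since the thread only names the algorithms): it selects a median with a simple randomized quickselect, which is O(n) only on average rather than in the worst case, and then evaluates M(y). The array contents are just example values.

#include <stdio.h>
#include <stdlib.h>

static void swap(int *a, int *b) { int t = *a; *a = *b; *b = t; }

/* Return the k-th smallest element (0-based) of v[0..len-1]; reorders v. */
static int quickselect(int *v, int len, int k)
{
    int lo = 0, hi = len - 1;
    while (lo < hi) {
        int pivot = v[lo + rand() % (hi - lo + 1)];
        int i = lo, j = hi;
        while (i <= j) {                 /* Hoare-style partition around pivot */
            while (v[i] < pivot) i++;
            while (v[j] > pivot) j--;
            if (i <= j) { swap(&v[i], &v[j]); i++; j--; }
        }
        if (k <= j)      hi = j;         /* recurse into the half containing k */
        else if (k >= i) lo = i;
        else             return v[k];
    }
    return v[k];
}

static long long M(const int *a, int n, int y)
{
    long long s = 0;
    for (int i = 0; i < n; i++)
        s += llabs((long long)a[i] - y);
    return s;
}

int main(void)
{
    int a[] = { 7, 1, 3, 10, 4 };
    int n = sizeof a / sizeof a[0];
    int y = quickselect(a, n, n / 2);    /* a median minimizes M(y) */
    printf("y = %d, M(y) = %lld\n", y, M(a, n, y));
    return 0;
}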

Subinterval of length >= k in an array, with maximum average [duplicate]

From an integer array A[N], I'd like to find an interval [i,j] that has a maximized average (A[i] + A[i + 1] + .. + A[j]) / (j - i + 1).
The length of the interval (j - i + 1) should be more than L (L >= 1).
What I thought of was to calculate the average for every i ~ j, but it is too slow to do it like this (N is too big).
Is there an algorithm faster than O(N^2)? Or, alternatively, is there a randomized method for this?
There is an O(N*logC) algorithm, where C is proportional to the maximum element value of the array. Compared with some more complicated algorithms in recent papers, this algorithm is easier to understand, can be implemented in a short time, and is still fast enough in practice.
For simplicity, we assume there is at least one non-negative integer in the array.
The algorithm is based on binary search. First, observe that the final answer must lie in the range [0, max(A)], and we halve this interval in each iteration until it is small enough (10^-6, for example). In each iteration, assume the current interval is [a, b]; we need to check whether the maximum average is no less than (a+b)/2. If so, we narrow the interval to [(a+b)/2, b]; otherwise we narrow it to [a, (a+b)/2].
Now the problem is: Given a number K, how to check that the final answer is at least K?
Assume the average is at least K; then there exist some i, j such that (A[i] + A[i+1] + ... + A[j]) / (j - i + 1) >= K. Multiplying both sides by (j - i + 1) and moving the right side to the left, we get (A[i] - K) + (A[i+1] - K) + ... + (A[j] - K) >= 0.
So, letting B[i] = A[i] - K, we only need to find an interval [i, j] (with j - i + 1 > L) such that B[i] + ... + B[j] >= 0. Now the problem is: given array B and length L, find an interval of maximum sum whose length is more than L. If that maximum sum is >= 0, the original average K is achievable.
The second problem can be solved by a linear scan. Let sumB[0] = 0 and sumB[i] = B[1] + B[2] + ... + B[i]. For each index i, the max-sum interval ending at B[i] is sumB[i] - min(sumB[0], sumB[1], ..., sumB[i-L-1]). When scanning the array with increasing i, we can maintain min(sumB[0], ..., sumB[i-L-1]) on the fly.
The time complexity of the sub-problem is O(N). And we need O(logC) iterations, so the total complexity is O(N*logC).
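Here is a compact C sketch of the whole procedure (my own illustration; the array, L, and the 1e-6 tolerance are example choices, not from the original answer):

#include <stdio.h>

#define N 8

/* Return 1 if some interval of length > L has average >= K (assumes n <= N). */
static int feasible(const int A[], int n, int L, double K)
{
    double sum[N + 1];                   /* prefix sums of B[i] = A[i] - K */
    sum[0] = 0.0;
    for (int i = 1; i <= n; i++)
        sum[i] = sum[i - 1] + (A[i - 1] - K);

    double minPrefix = sum[0];
    for (int i = L + 1; i <= n; i++) {   /* interval [j+1, i] has length i - j > L */
        if (sum[i - L - 1] < minPrefix)
            minPrefix = sum[i - L - 1];
        if (sum[i] - minPrefix >= 0.0)
            return 1;
    }
    return 0;
}

int main(void)
{
    int A[N] = { 1, 12, -5, -6, 50, 3, 7, -2 };
    int L = 3;                           /* require interval length > L */

    double lo = 0.0, hi = 0.0;
    for (int i = 0; i < N; i++)
        if (A[i] > hi) hi = A[i];        /* the answer lies in [0, max(A)] */

    while (hi - lo > 1e-6) {             /* binary search on the average */
        double mid = (lo + hi) / 2.0;
        if (feasible(A, N, L, mid))
            lo = mid;
        else
            hi = mid;
    }
    printf("maximum average over intervals of length > %d: %.6f\n", L, lo);
    return 0;
}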
P.S. This kind of "average problem" belongs to a family of problems called fractional programming. Similar problems include the minimum average-weight spanning tree, the minimum average-weight cycle, etc.
P.S. again: the O(logC) is a loose bound. I think we can reduce it by some careful analysis.

Efficiently calculating nCk mod p

I have come across this problem many times but I am unable to solve it. Some case or other always produces a wrong answer, or else the program I write is too slow. Formally, I am talking about calculating
nCk mod p, where p is a prime, n is a large number, and 1 <= k <= n.
What I have tried:
I know the recursive formulation of the binomial coefficient and how to model it as a dynamic programming problem, but I feel that it is slow. The recursive formulation is C(n, k) + C(n, k-1) = C(n+1, k). I took care of the modulus while storing values in the array to avoid overflows, but I am not sure that just doing a mod p on the result will avoid all overflows.
To compute nCr, there's a simple algorithm based on the rule nCr = (n - 1)C(r - 1) * n / r:
def nCr(n, r):
    if r == 0:
        return 1
    return n * nCr(n - 1, r - 1) // r
Now in modular arithmetic we don't quite have division, but we have modular inverses which (when modding by a prime) are just as good:
def nCrModP(n, r, p):
    if r == 0:
        return 1
    return n * nCrModP(n - 1, r - 1, p) * modinv(r, p) % p
Here's one implementation of modinv on rosettacode
Not sure what you mean by "storing values in array", but I assume the array serves as a lookup table while running, to avoid redundant calculations and speed things up. This should take care of the speed problem. Regarding the overflows - you can perform the modulo operation at any stage of the computation and repeat it as much as you want - the result will be correct.
First, let's work with the case where p is relatively small.
Take the base-p expansions of n and k: write n = n_0 + n_1 p + n_2 p^2 + ... + n_m p^m and k = k_0 + k_1 p + ... + k_m p^m, where each n_i and each k_i is at least 0 but less than p. A theorem (which I think is due to Edouard Lucas) states that C(n, k) ≡ C(n_0, k_0) * C(n_1, k_1) * ... * C(n_m, k_m) (mod p). This reduces the computation to a mod-p product of the small binomial coefficients handled by the "n is relatively small" case below.
Second, if n is relatively small, you can just compute binomial coefficients using dynamic programming on the formula C(n,k) = C(n-1,k-1) + C(n-1,k), reducing mod p at each step. Or do something more clever.
Third, if k is relatively small (and less than p), you should be able to compute n!/(k!(n-k)!) mod p by computing n!/(n-k)! as n * (n-1) * ... * (n-k+1), reducing modulo p after each product, then multiplying by the modular inverses of each number between 1 and k.
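To make the third approach concrete, here is a short C sketch (my own; it assumes k < p, a prime modulus, and a compiler with __uint128_t) that multiplies n*(n-1)*...*(n-k+1) mod p and then divides by k! via a Fermat-style modular inverse; extended Euclid, as on the Rosetta Code page, would work just as well:

#include <stdio.h>
#include <stdint.h>

static uint64_t mulmod(uint64_t a, uint64_t b, uint64_t p)
{
    return (__uint128_t)a * b % p;       /* assumes __uint128_t (GCC/Clang, 64-bit) */
}

static uint64_t powmod(uint64_t base, uint64_t exp, uint64_t p)
{
    uint64_t result = 1;
    base %= p;
    while (exp > 0) {
        if (exp & 1) result = mulmod(result, base, p);
        base = mulmod(base, base, p);
        exp >>= 1;
    }
    return result;
}

/* C(n, k) mod p for prime p, assuming k < p. */
static uint64_t nCk_mod_p(uint64_t n, uint64_t k, uint64_t p)
{
    if (k > n) return 0;
    uint64_t num = 1, den = 1;
    for (uint64_t i = 0; i < k; i++) {
        num = mulmod(num, (n - i) % p, p);   /* n * (n-1) * ... * (n-k+1) */
        den = mulmod(den, (i + 1) % p, p);   /* 1 * 2 * ... * k           */
    }
    return mulmod(num, powmod(den, p - 2, p), p);   /* Fermat: den^(p-2) is 1/den mod p */
}

int main(void)
{
    /* example values only */
    printf("C(10, 3) mod 13 = %llu\n",
           (unsigned long long)nCk_mod_p(10, 3, 13));           /* 120 mod 13 = 3 */
    printf("C(1000000, 5) mod 1000000007 = %llu\n",
           (unsigned long long)nCk_mod_p(1000000, 5, 1000000007ULL));
    return 0;
}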
