Accurate method for finding the time complexity of a function - C

How to find the time complexity of this function:
Code
void f(int n)
{
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < i; ++j)
            for (int k = i*j; k > 0; k /= 2)
                printf("~");
}
I took an educated guess of (n^2)*log(n) based on intuition and it turned out to be correct.
But I can't seem to find an accurate explanation for it.

For every value of i, i > 0, the middle loop runs for j = 0, 1, ..., i-1. The j = 0 pass starts k at 0, so the innermost loop runs for the i-1 starting values
i*1, i*2, ..., i*(i-1)
Since k is divided by 2 until it reaches 0, each of these innermost loops requires about lg(k) steps. Hence
lg(i*1) + lg(i*2) + ... + lg(i*(i-1)) = (lg(i) + lg(1)) + (lg(i) + lg(2)) + ... + (lg(i) + lg(i-1))
= (i-1)*lg(i) + lg(2) + ... + lg(i-1)        ; since lg(1) = 0
Therefore the total would be
f(n) ::= sum_{i=1}^{n-1} [ (i-1)*lg(i) + lg(2) + ... + lg(i-1) ]
Let's now bound f(n+1) from above, using lg(2) + ... + lg(i-1) <= (i-1)*lg(i-1) and (i-1)*lg(i) <= i*lg(i):
f(n+1) <= sum_{i=1}^n [ i*lg(i) + (i-1)*lg(i-1) ]
       <= 2 * sum_{i=1}^n i*lg(i)
       <= C * integral_1^{n+1} x*ln(x) dx    ; integral upper bound, some constant C > 0
        = C * ((n+1)^2/2 * ln(n+1) - (n+1)^2/4 + 1/4)    ; integral of x*ln(x) = x^2/2*ln(x) - x^2/4
        = O(n^2*lg(n))
If we now bound f(n+1) from below, dropping the trailing lg(2) + ... + lg(i-1) terms:
f(n+1) >= sum_{i=1}^n (i-1)*lg(i)
       >= sum_{i=2}^n (i/2)*lg(i)            ; since i-1 >= i/2 for i >= 2
       >= C * integral_1^n x*ln(x) dx        ; integral lower bound, some constant C > 0
        = C * (n^2/2 * ln(n) - n^2/4 + 1/4)  ; integral of x*ln(x) = x^2/2*ln(x) - x^2/4
        = Omega(n^2*lg(n))
Together, the two bounds give f(n) = Theta(n^2*lg(n)), confirming the guessed (n^2)*log(n).
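As an empirical sanity check (a sketch of mine, not part of the original question), you can count the innermost steps directly and compare against n^2*lg(n); the ratio should settle near a constant as n grows:
#include <cmath>
#include <cstdio>

// Count exactly how many times the innermost body of f(n) runs.
static long long count_steps(int n) {
    long long steps = 0;
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < i; ++j)
            for (long long k = (long long)i * j; k > 0; k /= 2)
                ++steps;
    return steps;
}

int main() {
    for (int n = 128; n <= 4096; n *= 2) {
        long long s = count_steps(n);
        double model = (double)n * n * std::log2((double)n);
        std::printf("n=%5d  steps=%11lld  steps/(n^2*lg n)=%.4f\n", n, s, s / model);
    }
    return 0;
}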

Related

How to use MPI and OpenMP to run a parallel loop

I need to use MPI and OpenMP (2 different problems) to parallelize a code from the Sbac-Pad marathon (reference: http://lspd.mackenzie.br/marathon/18/problems.html). I am working on the Himeno benchmark. I believe the only part of this code that is worth parallelizing is the jacobi function:
#define MR(mt,n,r,c,d) mt->m[(n) * mt->mrows * mt->mcols * mt->mdeps + (r) * mt->mcols* mt->mdeps + (c) * mt->mdeps + (d)]
struct Matrix {
    float* m;
    int mnums;
    int mrows;
    int mcols;
    int mdeps;
};
float
jacobi(int nn, Matrix* a, Matrix* b, Matrix* c,
       Matrix* p, Matrix* bnd, Matrix* wrk1, Matrix* wrk2)
{
    int i, j, k, n, imax, jmax, kmax;
    float gosa, s0, ss;

    imax = p->mrows - 1;
    jmax = p->mcols - 1;
    kmax = p->mdeps - 1;

    for (n = 0; n < nn; n++) {
        gosa = 0.0;
        for (i = 1; i < imax; i++)
            for (j = 1; j < jmax; j++)
                for (k = 1; k < kmax; k++) {
                    s0 = MR(a,0,i,j,k) * MR(p,0,i+1,j,  k)
                       + MR(a,1,i,j,k) * MR(p,0,i,  j+1,k)
                       + MR(a,2,i,j,k) * MR(p,0,i,  j,  k+1)
                       + MR(b,0,i,j,k)
                         * ( MR(p,0,i+1,j+1,k) - MR(p,0,i+1,j-1,k)
                           - MR(p,0,i-1,j+1,k) + MR(p,0,i-1,j-1,k) )
                       + MR(b,1,i,j,k)
                         * ( MR(p,0,i,j+1,k+1) - MR(p,0,i,j-1,k+1)
                           - MR(p,0,i,j+1,k-1) + MR(p,0,i,j-1,k-1) )
                       + MR(b,2,i,j,k)
                         * ( MR(p,0,i+1,j,k+1) - MR(p,0,i-1,j,k+1)
                           - MR(p,0,i+1,j,k-1) + MR(p,0,i-1,j,k-1) )
                       + MR(c,0,i,j,k) * MR(p,0,i-1,j,  k)
                       + MR(c,1,i,j,k) * MR(p,0,i,  j-1,k)
                       + MR(c,2,i,j,k) * MR(p,0,i,  j,  k-1)
                       + MR(wrk1,0,i,j,k);
                    ss = (s0 * MR(a,3,i,j,k) - MR(p,0,i,j,k)) * MR(bnd,0,i,j,k);
                    gosa += ss * ss;
                    MR(wrk2,0,i,j,k) = MR(p,0,i,j,k) + omega * ss;
                }
        for (i = 1; i < imax; i++)
            for (j = 1; j < jmax; j++)
                for (k = 1; k < kmax; k++)
                    MR(p,0,i,j,k) = MR(wrk2,0,i,j,k);
    } /* end n loop */
    return gosa;
}
The problem is, this function seems to have a sequential nature, since every iteration of the n loop depends on the previous one. What I tried, using MPI, was making an auxiliary variable for gosa (auxgosa) and using MPI_Reduce after the i-j-k loops, like the following (the root process is rank 0):
// rank is the current process
// size is the total number of processes
int start = ((imax+1)/size) * rank;
int stop  = ((imax+1)/size) * (rank+1) - 1;
if (rank == 0) { start++; }
for (n = 0; n < nn; n++) {
    gosa = 0.0;
    auxgosa = 0.0;
    for (i = start; i < stop; i++)
        for (j = 1; j < jmax; j++)
            for (k = 1; k < kmax; k++) {
                s0 = MR(aa,0,i,j,k) * MR(pp,0,i+1,j,  k)
                   + MR(aa,1,i,j,k) * MR(pp,0,i,  j+1,k)
                   + MR(aa,2,i,j,k) * MR(pp,0,i,  j,  k+1)
                   + MR(bb,0,i,j,k)
                     * ( MR(pp,0,i+1,j+1,k) - MR(pp,0,i+1,j-1,k)
                       - MR(pp,0,i-1,j+1,k) + MR(pp,0,i-1,j-1,k) )
                   + MR(bb,1,i,j,k)
                     * ( MR(pp,0,i,j+1,k+1) - MR(pp,0,i,j-1,k+1)
                       - MR(pp,0,i,j+1,k-1) + MR(pp,0,i,j-1,k-1) )
                   + MR(bb,2,i,j,k)
                     * ( MR(pp,0,i+1,j,k+1) - MR(pp,0,i-1,j,k+1)
                       - MR(pp,0,i+1,j,k-1) + MR(pp,0,i-1,j,k-1) )
                   + MR(cc,0,i,j,k) * MR(pp,0,i-1,j,  k)
                   + MR(cc,1,i,j,k) * MR(pp,0,i,  j-1,k)
                   + MR(cc,2,i,j,k) * MR(pp,0,i,  j,  k-1)
                   + MR(awrk1,0,i,j,k);
                ss = (s0 * MR(aa,3,i,j,k) - MR(pp,0,i,j,k)) * MR(abnd,0,i,j,k);
                auxgosa += ss * ss;
                MR(awrk2,0,i,j,k) = MR(pp,0,i,j,k) + omega * ss;
            }
    MPI_Reduce(&auxgosa, &gosa, 1, MPI_FLOAT, MPI_SUM, 0, MPI_COMM_WORLD);
    for (i = 1; i < imax; i++)
        for (j = 1; j < jmax; j++)
            for (k = 1; k < kmax; k++)
                MR(pp,0,i,j,k) = MR(awrk2,0,i,j,k);
} /* end n loop */
Unfortunately, this didn't work. Could anyone give me some insight about this? I plan to use a similar strategy with OpenMP.
If awrk2 is different from a, p, b, c and wrk1, then there is no loop-carried dependence within a single iteration of the n loop.
A simple google search will point you to parallelized versions of the Himeno benchmark (MPI, OpenMP and hybrid MPI+OpenMP versions are available).
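To make the OpenMP half concrete, here is a minimal sketch of one sweep (my own illustration, not from the benchmark sources; it assumes the MR macro, struct Matrix, and the global omega from the question are in scope). Since every (i,j,k) cell reads p and writes only its own wrk2 cell, the sweep parallelizes over i with a reduction for gosa. For the MPI version, note two likely problems with the attempt above: each rank needs the pp planes at i = start-1 and i = stop from its neighbors before every sweep (a halo exchange, e.g. with MPI_Sendrecv), and the copy-back loop should cover only the rank's own [start, stop) range rather than 1..imax.
/* Sketch of one OpenMP-parallel Jacobi sweep. */
float sweep(Matrix* a, Matrix* b, Matrix* c, Matrix* p,
            Matrix* bnd, Matrix* wrk1, Matrix* wrk2)
{
    int imax = p->mrows - 1, jmax = p->mcols - 1, kmax = p->mdeps - 1;
    float gosa = 0.0f;

    /* Iterations are independent: each cell writes only wrk2, and the
       reduction clause combines the per-thread partial sums of gosa. */
    #pragma omp parallel for reduction(+:gosa)
    for (int i = 1; i < imax; i++)
        for (int j = 1; j < jmax; j++)
            for (int k = 1; k < kmax; k++) {
                float s0 = MR(a,0,i,j,k) * MR(p,0,i+1,j,k)
                         + MR(a,1,i,j,k) * MR(p,0,i,j+1,k)
                         + MR(a,2,i,j,k) * MR(p,0,i,j,k+1)
                         + MR(b,0,i,j,k) * ( MR(p,0,i+1,j+1,k) - MR(p,0,i+1,j-1,k)
                                           - MR(p,0,i-1,j+1,k) + MR(p,0,i-1,j-1,k) )
                         + MR(b,1,i,j,k) * ( MR(p,0,i,j+1,k+1) - MR(p,0,i,j-1,k+1)
                                           - MR(p,0,i,j+1,k-1) + MR(p,0,i,j-1,k-1) )
                         + MR(b,2,i,j,k) * ( MR(p,0,i+1,j,k+1) - MR(p,0,i-1,j,k+1)
                                           - MR(p,0,i+1,j,k-1) + MR(p,0,i-1,j,k-1) )
                         + MR(c,0,i,j,k) * MR(p,0,i-1,j,k)
                         + MR(c,1,i,j,k) * MR(p,0,i,j-1,k)
                         + MR(c,2,i,j,k) * MR(p,0,i,j,k-1)
                         + MR(wrk1,0,i,j,k);
                float ss = (s0 * MR(a,3,i,j,k) - MR(p,0,i,j,k)) * MR(bnd,0,i,j,k);
                gosa += ss * ss;
                MR(wrk2,0,i,j,k) = MR(p,0,i,j,k) + omega * ss;
            }

    /* The copy-back is embarrassingly parallel as well. */
    #pragma omp parallel for
    for (int i = 1; i < imax; i++)
        for (int j = 1; j < jmax; j++)
            for (int k = 1; k < kmax; k++)
                MR(p,0,i,j,k) = MR(wrk2,0,i,j,k);

    return gosa;
}
Compile with -fopenmp (gcc); without that flag the pragmas are ignored and the code runs serially, which makes it easy to validate gosa against the sequential version.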

Loop Splitting makes code slower

So I'm optimizing a loop (as homework) that adds 10,000 elements 600,000 times. The time without optimizations is about 23.34 seconds, and my goal is to reach less than 7 seconds for a B and less than 5 for an A.
So I started my optimizations by first unrolling the loop like this.
int j;
for (j = 0; j < ARRAY_SIZE; j += 8) {
    sum += array[j] + array[j+1] + array[j+2] + array[j+3]
         + array[j+4] + array[j+5] + array[j+6] + array[j+7];
}
This reduces the runtime to about 6.4 seconds (I can hit about 6 if I unroll further).
So I figured I would try adding sub-sums and making a final sum at the end to save time on read-write dependencies, and I came up with code that looks like this.
int j;
for (j = 0; j < ARRAY_SIZE; j += 8) {
    sum0 += array[j]   + array[j+1];
    sum1 += array[j+2] + array[j+3];
    sum2 += array[j+4] + array[j+5];
    sum3 += array[j+6] + array[j+7];
}
However, this increases the runtime to about 6.8 seconds.
I tried a similar technique using pointers and the best I could do was about 15 seconds.
I only know that the machine I'm running this on (as it is a service purchased by the school) is a 32-bit, remote, Intel-based Linux virtual server that I believe is running Red Hat.
I've tried every technique I can think of to speed up the code, but they all seem to have the opposite effect. Could someone elaborate on what I'm doing wrong, or suggest another technique I could use to lower the runtime? The best the teacher could do was about 4.8 seconds.
As an additional condition, I cannot have more than 50 lines of code in the finished project, so doing something complex is likely not possible.
Here is a full copy of both sources:
#include <stdio.h>
#include <stdlib.h>

// You are only allowed to make changes to this code as specified by the comments in it.
// The code you submit must have these two values.
#define N_TIMES    600000
#define ARRAY_SIZE 10000

int main(void)
{
    double *array = calloc(ARRAY_SIZE, sizeof(double));
    double sum = 0;
    int i;

    // You can add variables between this comment ...
    // double sum0 = 0;
    // double sum1 = 0;
    // double sum2 = 0;
    // double sum3 = 0;
    // ... and this one.

    // Please change 'your name' to your actual name.
    printf("CS201 - Asgmt 4 - ACTUAL NAME\n");

    for (i = 0; i < N_TIMES; i++) {
        // You can change anything between this comment ...
        int j;
        for (j = 0; j < ARRAY_SIZE; j += 8) {
            sum += array[j] + array[j+1] + array[j+2] + array[j+3]
                 + array[j+4] + array[j+5] + array[j+6] + array[j+7];
        }
        // ... and this one. But your inner loop must do the same
        // number of additions as this one does.
    }

    // You can add some final code between this comment ...
    // sum = sum0 + sum1 + sum2 + sum3;
    // ... and this one.

    return 0;
}
Broken up code
#include <stdio.h>
#include <stdlib.h>

// You are only allowed to make changes to this code as specified by the comments in it.
// The code you submit must have these two values.
#define N_TIMES    600000
#define ARRAY_SIZE 10000

int main(void)
{
    double *array = calloc(ARRAY_SIZE, sizeof(double));
    double sum = 0;
    int i;

    // You can add variables between this comment ...
    double sum0 = 0;
    double sum1 = 0;
    double sum2 = 0;
    double sum3 = 0;
    // ... and this one.

    // Please change 'your name' to your actual name.
    printf("CS201 - Asgmt 4 - ACTUAL NAME\n");

    for (i = 0; i < N_TIMES; i++) {
        // You can change anything between this comment ...
        int j;
        for (j = 0; j < ARRAY_SIZE; j += 8) {
            sum0 += array[j]   + array[j+1];
            sum1 += array[j+2] + array[j+3];
            sum2 += array[j+4] + array[j+5];
            sum3 += array[j+6] + array[j+7];
        }
        // ... and this one. But your inner loop must do the same
        // number of additions as this one does.
    }

    // You can add some final code between this comment ...
    sum = sum0 + sum1 + sum2 + sum3;
    // ... and this one.

    return 0;
}
ANSWER
The 'time' application we use to judge the grade is a little bit off. The best I could do was about 4.9 seconds, by unrolling the loop 50 times and grouping it like I did below, using TomKarzes's basic format.
int j;
for (j = 0; j < ARRAY_SIZE; j += 50) {
    sum += (((((((array[j] + array[j+1]) + (array[j+2] + array[j+3])) +
                ((array[j+4] + array[j+5]) + (array[j+6] + array[j+7]))) +
               (((array[j+8] + array[j+9]) + (array[j+10] + array[j+11])) +
                ((array[j+12] + array[j+13]) + (array[j+14] + array[j+15])))) +
              ((((array[j+16] + array[j+17]) + (array[j+18] + array[j+19]))))) +
             (((((array[j+20] + array[j+21]) + (array[j+22] + array[j+23])) +
                ((array[j+24] + array[j+25]) + (array[j+26] + array[j+27]))) +
               (((array[j+28] + array[j+29]) + (array[j+30] + array[j+31])) +
                ((array[j+32] + array[j+33]) + (array[j+34] + array[j+35])))) +
              ((((array[j+36] + array[j+37]) + (array[j+38] + array[j+39])))))) +
            ((((array[j+40] + array[j+41]) + (array[j+42] + array[j+43])) +
              ((array[j+44] + array[j+45]) + (array[j+46] + array[j+47]))) +
             (array[j+48] + array[j+49])));
}
I experimented with the grouping a bit. On my machine, with my gcc, I found that the following worked best:
for (j = 0; j < ARRAY_SIZE; j += 16) {
    sum = sum +
          (array[j   ] + array[j+ 1]) +
          (array[j+ 2] + array[j+ 3]) +
          (array[j+ 4] + array[j+ 5]) +
          (array[j+ 6] + array[j+ 7]) +
          (array[j+ 8] + array[j+ 9]) +
          (array[j+10] + array[j+11]) +
          (array[j+12] + array[j+13]) +
          (array[j+14] + array[j+15]);
}
In other words, it's unrolled 16 times, it groups the sums into pairs, and then it adds the pairs linearly. I also removed the += operator, which affects when sum is first used in the additions.
I found that the measured times varied significantly from one run to the next, even without changing anything, so I suggest timing each version several times before making any conclusions about whether the time has improved or gotten worse.
I'd be interested to know what numbers you get on your machine with this version of the inner loop.
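A minimal repeat-timing harness might look like this (a sketch of mine, assuming POSIX clock_gettime; it is not part of the assignment code, and the kernel here is just the plain unoptimized loop):
#include <stdio.h>
#include <time.h>

#define ARRAY_SIZE 10000
#define N_TIMES    600000

// Monotonic wall-clock time in seconds.
static double now_sec(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec * 1e-9;
}

int main(void) {
    static double array[ARRAY_SIZE];  /* zero-initialized, like calloc above */
    double sum = 0;

    for (int run = 0; run < 5; run++) {          /* repeat to see the spread */
        double t0 = now_sec();
        for (int i = 0; i < N_TIMES; i++)
            for (int j = 0; j < ARRAY_SIZE; j++)
                sum += array[j];
        printf("run %d: %.3f s (sum=%g)\n", run, now_sec() - t0, sum);
    }
    return 0;
}
Printing sum keeps the compiler from optimizing the loop away entirely.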
Update: Here's my current fastest version (on my machine, with my compiler):
int j1, j2;
j1 = 0;
do {
j2 = j1 + 20;
sum = sum +
(array[j1 ] + array[j1+ 1]) +
(array[j1+ 2] + array[j1+ 3]) +
(array[j1+ 4] + array[j1+ 5]) +
(array[j1+ 6] + array[j1+ 7]) +
(array[j1+ 8] + array[j1+ 9]) +
(array[j1+10] + array[j1+11]) +
(array[j1+12] + array[j1+13]) +
(array[j1+14] + array[j1+15]) +
(array[j1+16] + array[j1+17]) +
(array[j1+18] + array[j1+19]);
j1 = j2 + 20;
sum = sum +
(array[j2 ] + array[j2+ 1]) +
(array[j2+ 2] + array[j2+ 3]) +
(array[j2+ 4] + array[j2+ 5]) +
(array[j2+ 6] + array[j2+ 7]) +
(array[j2+ 8] + array[j2+ 9]) +
(array[j2+10] + array[j2+11]) +
(array[j2+12] + array[j2+13]) +
(array[j2+14] + array[j2+15]) +
(array[j2+16] + array[j2+17]) +
(array[j2+18] + array[j2+19]);
}
while (j1 < ARRAY_SIZE);
This uses a total unroll amount of 40, split into two groups of 20, with alternating induction variables that are pre-incremented to break dependencies, and a post-tested loop. Again, you can experiment with the parentheses groupings to fine-tune it for your compiler and platform.
I tried your code with the following approaches:
No optimization, for loop with integer indexes by 1, simple sum +=. This took 16.4 seconds on my 64 bit 2011 MacBook Pro.
gcc -O2, same code, got down to 5.46 seconds.
gcc -O3, same code, got down to 5.45 seconds.
I tried using your code with 8-way addition into the sum variable. This took it down to 2.03 seconds.
I doubled that to 16-way addition into the sum variable; this took it down to 1.91 seconds.
I doubled that to 32-way addition into the sum variable, and the time went up to 2.08 seconds.
I switched to a pointer approach, as suggested by #kcraigie. With -O3, the time was 6.01 seconds. (Very surprising to me!)
register double *p;
for (p = array; p < array + ARRAY_SIZE; ++p) {
    sum += *p;
}
I changed the for loop to a while loop, with sum += *p++ and got the time down to 5.64 seconds.
I changed the while loop to count down instead of up, the time went up to 5.88 seconds.
I changed back to a for loop with an incrementing-by-8 integer index, added 8 register double sum0..sum7 variables, and added _array[j+N] to sumN for N in [0,7], with _array declared as a register double *const initialized to array, on the chance that it matters. This got the time down to 1.86 seconds.
I changed to a macro that expanded to 10,000 copies of +_array[N], with N a constant. Then I did sum = tnKX(addsum) and the compiler crashed with a segmentation fault. So a pure-inlining approach isn't going to work.
I switched to a macro that expanded to 10,000 copies of sum += _array[N] with N a constant. That ran in 6.63 seconds!! Apparently the overhead of loading all that code reduces the effectiveness of the inlining.
I tried declaring a static double _array[ARRAY_SIZE]; and then using __builtin_memcpy to copy it before the first loop. With the 8-way parallel addition, this resulted in a time of 2.96 seconds. I don't think static array is the way to go. (Sad - I was hoping the constant address would be a winner.)
From all this, it seems like 16-way inlining or 8-way parallel variables should be the way to go. You'll have to try this on your own platform to make sure - I don't know what the wider architecture will do to the numbers.
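For reference, here is a sketch of the 8-parallel-accumulator inner loop described above (my reconstruction, not the poster's exact code). The separate accumulators break the serial dependence on a single sum, letting the floating-point adds overlap; the pieces drop into the assignment template where its comments allow changes:
// Added between the variable-declaration comment markers:
double s0 = 0, s1 = 0, s2 = 0, s3 = 0, s4 = 0, s5 = 0, s6 = 0, s7 = 0;

// The inner loop (same number of additions as the original):
int j;
for (j = 0; j < ARRAY_SIZE; j += 8) {
    s0 += array[j];
    s1 += array[j+1];
    s2 += array[j+2];
    s3 += array[j+3];
    s4 += array[j+4];
    s5 += array[j+5];
    s6 += array[j+6];
    s7 += array[j+7];
}

// Added between the final-code comment markers, after the outer loop:
sum = ((s0 + s1) + (s2 + s3)) + ((s4 + s5) + (s6 + s7));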
Edit:
Following a suggestion from #pvg, I added this code:
int ntimes = 0;
// ... and this one.
...
// You can change anything between this comment ...
if (ntimes++ == 0) {
Which reduced the run time to < 0.01 seconds. ;-) It's a winner, if you don't get hit with the F-stick.

Largest Slice Sum from Two Different Arrays

Original Problem: Problem 1 (INOI 2015)
There are two arrays A[1..N] and B[1..N].
An operation SSum is defined on them as
SSum[i,j] = A[i] + A[j] + (B[i+1] + B[i+2] + ... + B[j-1])                when i < j
SSum[i,j] = A[i] + A[j] + (B[1] + ... + B[j-1]) + (B[i+1] + ... + B[N])  when i > j
SSum[i,i] = A[i]
The challenge is to find the largest possible value of SSum.
I had an O(n^2) solution based on computing the prefix sums of B:
#include <algorithm>   // for std::max
#include <iostream>

int main() {
    int N;
    std::cin >> N;
    int *a = new int[N+1];
    long long int *bPrefixSums = new long long int[N+1];

    for (int iii = 1; iii <= N; iii++)   // 1-based arrays to prevent confusion
        std::cin >> a[iii];

    bPrefixSums[0] = 0;
    for (int b, iii = 1; iii <= N; iii++) {
        std::cin >> b;
        bPrefixSums[iii] = bPrefixSums[iii-1] + b;
    }

    long long int SSum, SSumMax = -(1LL << 60);   // was -(1<<10), not small enough to be a safe sentinel
    for (int i = 1; i <= N; i++)
        for (int j = 1; j <= N; j++) {
            if (i < j)
                SSum = a[i] + a[j] + (bPrefixSums[j-1] - bPrefixSums[i]);
            else if (i == j)
                SSum = a[i];
            else
                SSum = a[i] + a[j] + ((bPrefixSums[N] - bPrefixSums[i]) + bPrefixSums[j-1]);
            SSumMax = std::max(SSum, SSumMax);
        }

    std::cout << SSumMax;
    return 0;
}
For larger values of N around 10^6, the program fails to complete the task in 3 seconds.
Since I didn't get enough rep to add a comment, I shall just write the ideas here in this answer.
This problem is really nice, and I was actually inspired by this link. Thanks to #superty.
We can split this problem into three cases: i == j, i < j, and i > j; the answer is the maximum over the three.
Consider i == j: the result is just A[i], so the best is the largest A[i], found in O(n) time.
Consider i < j: It's quite similar to the classical maximum-sum problem; for each j we only need the i to its left that maximizes the result.
Think about the classical problem first: if we are asked for the maximum partial sum of an array a, we compute the prefix sums of a to get O(n) complexity. This problem is almost the same.
Here (i < j) we have SSum[i,j] = A[i] + A[j] + (B[i+1] + ... + B[j-1]) = (B[1] + ... + B[j-1] + A[j]) - (B[1] + ... + B[i] - A[i]). The first term depends only on j and the second only on i, so the solution is clear: compute two 'prefix sums' and, for each j, subtract the smallest value of the second term over all i < j.
Consider i > j: It's quite similar to a related discussion on SO (though that discussion doesn't help much here).
Similarly, we get SSum[i,j] = A[i] + A[j] + (B[1] + ... + B[j-1]) + (B[i+1] + ... + B[N]) = (B[1] + ... + B[j-1] + A[j]) + (A[i] + B[i+1] + ... + B[N]). Compute the prefix sums and suffix sums of B, then build two more arrays: ans_left[i], the maximum of the first term over all j <= i, and ans_right[j], the maximum of the second term over all i >= j. The answer for this case is the maximum over all i of (ans_left[i] + ans_right[i+1]).
Finally, the maximum result required for the original problem is the maximum of the answers for these three sub-cases.
It is clear that the total complexity is O(n); a sketch of the whole approach follows.
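Here is a minimal C++ sketch of the O(n) approach just described (my own illustration; the 1-based indexing matches the question's code, and names like pre, suf, and bestLeft are mine):
#include <algorithm>
#include <iostream>
#include <vector>
using namespace std;

int main() {
    int N;
    cin >> N;
    vector<long long> A(N + 2), B(N + 2), pre(N + 2, 0), suf(N + 3, 0);
    for (int i = 1; i <= N; i++) cin >> A[i];
    for (int i = 1; i <= N; i++) cin >> B[i];
    for (int i = 1; i <= N; i++) pre[i] = pre[i - 1] + B[i];  // pre[i] = B[1..i]
    for (int i = N; i >= 1; i--) suf[i] = suf[i + 1] + B[i];  // suf[i] = B[i..N]

    // Case i == j: the best single A[i].
    long long best = A[1];
    for (int i = 2; i <= N; i++) best = max(best, A[i]);

    // Case i < j: SSum = (pre[j-1] + A[j]) - (pre[i] - A[i]).
    // Scan j left to right, keeping the minimum of (pre[i] - A[i]) over i < j.
    long long minTerm = pre[1] - A[1];                // i = 1
    for (int j = 2; j <= N; j++) {
        best = max(best, pre[j - 1] + A[j] - minTerm);
        minTerm = min(minTerm, pre[j] - A[j]);
    }

    // Case i > j: SSum = (pre[j-1] + A[j]) + (A[i] + suf[i+1]).
    // bestLeft tracks the maximum of (pre[j-1] + A[j]) over j < i.
    long long bestLeft = pre[0] + A[1];               // j = 1
    for (int i = 2; i <= N; i++) {
        best = max(best, bestLeft + A[i] + suf[i + 1]);
        bestLeft = max(bestLeft, pre[i - 1] + A[i]);
    }

    cout << best << "\n";
    return 0;
}
Each case is a single linear scan, so the whole thing runs in O(n) time and memory, versus O(n^2) for the double loop in the question.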

kth smallest element in two sorted arrays - O(log n) solution

The above is one of the interview questions. There is an article about an O(log n) algorithm explaining the invariant (i + j = k - 1). I'm having much difficulty understanding this algorithm. Could anyone explain it in a simple way, and also why they calculate i as (int)((double)m / (m+n) * (k-1))? I appreciate your help. Thanks.
protected static int kthSmallestEasy(int[] A, int aLow, int aLength,
                                     int[] B, int bLow, int bLength, int k)
{
    // Error handling
    assert(aLow >= 0); assert(bLow >= 0);
    assert(aLength >= 0); assert(bLength >= 0); assert(aLength + bLength >= k);

    int i = (int)((double)((k - 1) * aLength / (aLength + bLength)));
    int j = k - 1 - i;

    int Ai_1 = aLow + i == 0        ? Int32.MinValue : A[aLow + i - 1];
    int Ai   = aLow + i == A.Length ? Int32.MaxValue : A[aLow + i];
    int Bj_1 = bLow + j == 0        ? Int32.MinValue : B[bLow + j - 1];
    int Bj   = bLow + j == B.Length ? Int32.MaxValue : B[bLow + j];

    if (Bj_1 < Ai && Ai < Bj)
        return Ai;
    else if (Ai_1 < Bj && Bj < Ai)
        return Bj;

    assert(Ai < Bj_1 || Bj < Ai_1);   // was "Ai < Bj - 1", presumably a typo for Bj_1
    if (Ai < Bj_1)   // exclude A[aLow .. i] and B[j .. bHigh], k was replaced by k - i - 1
        return kthSmallestEasy(A, aLow + i + 1, aLength - i - 1, B, bLow, j, k - i - 1);
    else             // exclude A[i .. aHigh] and B[bLow .. j], k was replaced by k - j - 1
        return kthSmallestEasy(A, aLow, i, B, bLow + j + 1, bLength - j - 1, k - j - 1);
}
Could anyone explain this algorithm in a simple way?
Yes, it is essentially a bisection algorithm.
In successive passes, it moves the probe index of one array upward and the probe index of the other downward, keeping the sum of the two indices equal to k - 1 (the invariant), until the kth value is bracketed.
And also, why do they calculate i as (int)((double)m / (m+n) * (k-1))?
This gives an estimate of the new half-way point, assuming an equidistribution of values between the known points.
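For instance (a made-up illustration, not from the article): with m = aLength = 10, n = bLength = 30 and k = 20, the first probe is i = (int)(10.0 / 40 * 19) = 4 and j = 19 - 4 = 15. The initial guess takes roughly a quarter of the first k - 1 candidates from A, matching A's share of the combined length, and i + j = k - 1 holds as required.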

complexity for a nested loop with varying internal loop

Very similar complexity examples. I am trying to understand how these questions vary. Exam coming up tomorrow :( Any shortcuts for finding the complexities here?
CASE 1:
void doit(int N) {
    while (N) {
        for (int j = 0; j < N; j += 1) {}
        N = N / 2;
    }
}
CASE 2:
void doit(int N) {
    while (N) {
        for (int j = 0; j < N; j *= 4) {}
        N = N / 2;
    }
}
CASE 3:
void doit(int N) {
    while (N) {
        for (int j = 0; j < N; j *= 2) {}
        N = N / 2;
    }
}
Thank you so much!
void doit(int N) {
    while (N) {
        for (int j = 0; j < N; j += 1) {}
        N = N / 2;
    }
}
To find the O() of this, notice that we are dividing N by 2 on each iteration of the outer loop. So (not to insult your intelligence, but for completeness) the final non-zero pass of the inner loop runs with N = 1, the pass before that with N roughly 2, before that roughly 4, and so on: the outer loop executes about log(N) times in total.
Why do we care about that? Well, the inner-loop costs form a geometric series with a nice closed form:
Sum = sum_{k=0}^{log(N)} 2^k = 2^(log(N)+1) - 1 = 2N - 1 = O(N)
You already have the answer to number 1 - O(N), as given by #NickO; here is an alternative explanation.
Denote the total number of inner-loop iterations by T(N), and note that the outer loop runs h = log_2(N) times.
T(N) = N + N/2 + ... + N/(2^i) + ... + 2 + 1
     < 2N          (sum of a geometric series)
which is in O(N).
Number 3 is O((log(N))^2):
Denote the total number of inner-loop iterations by T(N), and note that the outer loop runs h = log_2(N) times. Since j doubles, each inner loop costs the log of its current bound:
T(N) = log(N) + log(N/2) + log(N/4) + ... + log(1)
     = log(N * (N/2) * (N/4) * ... * 1)              (because log(a*b) = log(a) + log(b))
     = log(N^h * (1 * 1/2 * 1/4 * ... * 1/N))
     = log(N^h) + log(1 * 1/2 * 1/4 * ... * 1/N)     (because log(a*b) = log(a) + log(b))
     < log(N^h) + log(1)                             (the product of the fractions is < 1)
     = log(N^h)                                      (log(1) = 0)
     = h * log(N)                                    (log(a^b) = b*log(a))
     = (log(N))^2                                    (because h = log_2(N))
Number 2 is almost identical to number 3: with j *= 4 the inner loop runs log_4(N) times instead of log_2(N), which changes only the constant factor, so it is also O((log(N))^2).
(In 2 and 3 we assume j starts from 1, not 0; if it really starts at 0, then j never grows and the inner loop never terminates, as #WhozCraig pointed out.)
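If you want to sanity-check these bounds empirically, here is a small sketch (my own, with j starting at 1 in the doubling case so the loop terminates) that counts inner-loop iterations for cases 1 and 3:
#include <cstdio>

// Case 1: j += 1 inside the halving outer loop.
static long long case1(long long N) {
    long long c = 0;
    while (N) {
        for (long long j = 0; j < N; j += 1) ++c;
        N /= 2;
    }
    return c;
}

// Case 3: j *= 2, starting at 1 so the loop makes progress.
static long long case3(long long N) {
    long long c = 0;
    while (N) {
        for (long long j = 1; j < N; j *= 2) ++c;
        N /= 2;
    }
    return c;
}

int main() {
    for (long long N = 1024; N <= 1024 * 1024; N *= 32)
        std::printf("N=%8lld  case1=%10lld  case3=%6lld\n", N, case1(N), case3(N));
    return 0;
}
As N grows, case1 tracks 2N and case3 tracks roughly (log_2 N)^2 / 2, consistent with the O(N) and O((log N)^2) bounds above.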
