How are big O notation equations assigned in search algorithms? - c

For linear search it makes sense that the run time is O(N), since in the worst case it checks the N elements one at a time. As for bubble sort, my understanding is that its runtime is O(n^2); this makes sense to me because you iterate over the elements of the array and on each pass compare two values at a time until the end of the array.
But for merge sort, which always splits the data in half, I'm confused about why the run time is n log n. Additionally, I want to clarify my understanding of insertion sort's runtime of O(n^2). Since insertion sort looks for the smallest number and then compares it to every single number in the array, it would be n^2 because it loops through the array contents for every iteration.
If I could get some advice about merge sort, and a general understanding of run times, that would be appreciated. I am an absolute newbie and wanted to throw that disclaimer out there.

Let's assume that sorting an array of N elements takes T(N) time. In merge sort we know that we need to sort two arrays of N/2 elements (that is, 2*T(N/2)) and then merge them (in O(N) time, that is, c*N for some constant c).
So, T(N) = 2T(N/2) + c*N.
We could stop here, as this is basically the "equation" you are asking about. But let's go a bit further.
To simplify things, we can show that T(N) = kN log N as follows (for some constant k):
Let's substitute T on both sides of the equation we have derived:
kN log N = 2 * k*(N/2) log (N/2) + c*N
and expand the right hand side (assuming log with base 2):
= k*N*(log N - log 2) + c*N = k*N*(log N - 1) + c*N = k*N*log N + (c-k)*N
That is, for c=k the equality holds, which shows that T(N) is of the form kN log N, that is, O(N log N).
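To connect the recurrence to code, here is a minimal merge sort sketch in C++ (my own illustration, not part of the original answer): the two recursive calls are the 2*T(N/2) term and the merging loop is the c*N term.

#include <algorithm>
#include <cstddef>
#include <vector>

// Minimal merge sort sketch: the two recursive calls correspond to 2*T(N/2),
// and the merging loop corresponds to the c*N term.
void mergeSort(std::vector<int>& a, std::size_t lo, std::size_t hi) {
    if (hi - lo < 2) return;                 // 0 or 1 elements: already sorted
    std::size_t mid = lo + (hi - lo) / 2;
    mergeSort(a, lo, mid);                   // T(N/2)
    mergeSort(a, mid, hi);                   // T(N/2)
    std::vector<int> merged;                 // merge step: O(N) work
    merged.reserve(hi - lo);
    std::size_t i = lo, j = mid;
    while (i < mid && j < hi) merged.push_back(a[i] <= a[j] ? a[i++] : a[j++]);
    while (i < mid) merged.push_back(a[i++]);
    while (j < hi)  merged.push_back(a[j++]);
    std::copy(merged.begin(), merged.end(), a.begin() + lo);
}

Calling mergeSort(v, 0, v.size()) sorts the whole vector; every level of the recursion touches all N elements once, and there are about log N levels, which is where the N log N comes from.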

Related

Time Complexity of Insertion and Selection sort When there are only two key values in an array

I have been reviewing Algorithms, 4th Edition by Sedgewick recently, and came across a problem that I cannot solve.
The problem goes like this:
2.1.28 Equal keys. Formulate and validate hypotheses about the running time of insertion
sort and selection sort for arrays that contain just two key values, assuming that
the values are equally likely to occur.
Explanation: You have n elements, each can be 0 or 1 (without loss of generality), and for each element x: P(x=0)=P(x=1).
Any help will be welcomed.
Selection sort:
The time complexity is going to remain the same (as it is without the two-key assumption); it depends not on the values in the array, only on the number of elements.
Time complexity for selection sort in this case is O(n^2)
However, this is true only for the original algorithm that scans the entire tail of the array on each outer-loop iteration. If you optimize it to stop at the next "0": at iteration i, since you have already "cleared" the first i-1 zeros, the expected location of the i-th zero is around index 2i. This means the inner loop will need to do roughly 2i-(i-1)=i+1 iterations each time.
Summing it up gives:
1 + 2 + ... + n = n(n+1)/2
Which is, unfortunately, still in O(n^2).
Another optimization could be to "remember" where you last stopped. This will significantly improve the complexity to O(n), since you don't need to traverse the same element more than once - but that's going to be a different algorithm, not selection sort.
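To make the value-independence concrete, here is a plain selection sort sketch in C++ (my own illustration, not taken from the answer): the inner loop always scans the whole unsorted tail, so it performs n(n-1)/2 comparisons whether the array holds two distinct values or n of them.

#include <cstddef>
#include <utility>
#include <vector>

// Plain selection sort: the inner loop scans the entire unsorted tail on
// every pass, so the comparison count is n(n-1)/2 regardless of the values.
void selectionSort(std::vector<int>& a) {
    for (std::size_t i = 0; i + 1 < a.size(); ++i) {
        std::size_t minPos = i;
        for (std::size_t j = i + 1; j < a.size(); ++j)   // n-1-i comparisons
            if (a[j] < a[minPos]) minPos = j;
        std::swap(a[i], a[minPos]);
    }
}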
Insertion Sort:
Here, things are more complicated. Note that in the inner loop (taken from Wikipedia), the number of operations depends on the values:
while j > 0 and A[j-1] > x
However, recall that in insertion sort, after the ith step, the first i elements are sorted. Since we are assuming P(x=0)=P(x=1), an average of i/2 elements are 0's and i/2 are 1's.
This means that, on average, the time complexity of the inner loop is O(i/2).
Summing this up will get you:
1/2 + 2/2 + 3/2 + ... + n/2 = 1/2* (1+2+...+n) = 1/2*n(n+1)/2 = n(n+1)/4
The above is however, still in O(n^2).
The above is not a formal proof, because it implicitly assumes E(f(X)) = f(E(X)), which is not true in general, but it can give you guidelines for how to build a formal proof.
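For comparison, here is a standard insertion sort sketch in C++ (again my own illustration): with random 0/1 keys, the while loop below shifts on average about i/2 ones to the right in step i, which is where the n(n+1)/4 estimate above comes from.

#include <cstddef>
#include <vector>

// Standard insertion sort: in step i the while loop walks left past every
// element greater than x; with random 0/1 keys that is about i/2 elements
// on average.
void insertionSort(std::vector<int>& a) {
    for (std::size_t i = 1; i < a.size(); ++i) {
        int x = a[i];
        std::size_t j = i;
        while (j > 0 && a[j - 1] > x) {      // the value-dependent inner loop
            a[j] = a[j - 1];
            --j;
        }
        a[j] = x;
    }
}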
Well, obviously you only need to search until you find the first 0 when looking for the next smallest. For example, in selection sort, you scan the array looking for the next smallest number to swap into the current position. Since there are only 0s and 1s you can stop the scan when encountering the first 0 (since it is the next smallest number), so there is no need to continue scanning the rest of the array in this cycle. If no 0 is found then the sorting is complete, since the "unsorted" portion is all 1s.
Insertion sort is basically the same. They are both O(N) in this case.

The number of distinct integers in an array is O(log n). How to get an O(n log log n) worst-case time algorithm to sort such sequences?

The question is from The Algorithm Design Manual. I have been working on it but haven't found a method to arrive at the right answer.
Question:
We seek to sort a sequence S of n integers with many duplications, such that the number of distinct integers in S is O(log n). Give an O(n log log n) worst-case time algorithm to sort such sequences.
I think maybe I can first pick out all the distinct elements to form an array of length log n, then record the frequencies and sort it. However, my first step seems to blow up the running time too much... Is there a better selection method, or is my method totally wrong? Thanks
Use a balanced binary tree to count the number of occurrences of each number. Since there are only log N distinct numbers, the size of the tree is log N, and thus each operation is performed in O(log log N). (This is exactly how map<> is implemented in C++.)
Then just iterate over the nodes of the tree with an in-order traversal, and print each integer the required number of times in that order.
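A possible C++ sketch of this idea using std::map (the function name is mine, not from the answer): each map operation costs O(log k) for k stored keys, i.e. O(log log n) here, so the counting pass is O(n log log n) and the rewrite pass is O(n).

#include <cstddef>
#include <map>
#include <vector>

// Sort a vector with only O(log n) distinct values by counting them in a
// balanced tree and writing them back in sorted order.
void sortFewDistinct(std::vector<int>& a) {
    std::map<int, std::size_t> counts;       // value -> number of occurrences
    for (int x : a) ++counts[x];             // n operations, O(log log n) each
    std::size_t pos = 0;
    for (const auto& [value, cnt] : counts)  // keys come out in sorted order
        for (std::size_t i = 0; i < cnt; ++i)
            a[pos++] = value;
}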
Create an array containing pairs of (unique numbers, count). The array is initially empty and kept sorted.
For each number in your original array, look the number up in the sorted array using binary search. Since the array has size O(log N), each binary search takes O(log log N); you do that N times, for a total of O(N log log N). When the number is found, you increase its count.
When it is not found, you insert the new number with a count of 1. This operation only happens O(log N) times, and is trivially done in O(log N) steps, for a total of O(log^2 N), which is much smaller than O(N log log N).
When you are done, fill the original array with the required numbers. That takes O (N).
There's really no need to create a balanced sorted tree to make the insertions faster, because the set of unique numbers is so small.
If the set of integers is all contained in a range X ≤ number ≤ Y, then the problem can be solved in O(max(N, Y - X + 1)) using an array of Y - X + 1 counters, without even bothering to find the unique numbers. The technique is reportedly used to great effect in Iain Banks' book "Player of Games".
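Here is a rough C++ sketch of the pair-array variant described above (the names are mine; it is only meant to illustrate the counting argument):

#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Keep a small sorted array of (value, count) pairs. Each lookup is a binary
// search over at most O(log n) entries, i.e. O(log log n); insertions of new
// values happen only O(log n) times.
void sortFewDistinctFlat(std::vector<int>& a) {
    std::vector<std::pair<int, std::size_t>> counts;      // kept sorted by value
    for (int x : a) {
        auto it = std::lower_bound(
            counts.begin(), counts.end(), x,
            [](const std::pair<int, std::size_t>& p, int v) { return p.first < v; });
        if (it != counts.end() && it->first == x)
            ++it->second;                                  // seen before: bump count
        else
            counts.insert(it, {x, 1});                     // new value: rare, O(log n) shift
    }
    std::size_t pos = 0;
    for (const auto& [value, cnt] : counts)                // write back in sorted order
        for (std::size_t i = 0; i < cnt; ++i)
            a[pos++] = value;
}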

find nth-smallest value across m sorted arrays using idea from 2 sorted arrays

May I ask whether this would be possible? The general approach would be something like finding the n-th value in two sorted arrays: ignore the insignificant parts and focus on the rest by adjusting the value of n in the recursion.
The two sorted arrays problem can be solved in O(log(|A|)+log(|B|)) time. Since this question is similar, I would like to ask whether there exists an algorithm for m sorted arrays with time O(log(|A1|)+log(|A2|)+...+log(|Am|)),
or some similar variation that is near the time I mentioned above (due to the variable m, we might need some other sorting algorithm for the pivots from those arrays),
or if such algorithm doesn't exist, why?
I just can't find this algorithm from googling
There is a simple randomized algorithm:
Select a pivot randomly from any of the m arrays. Let's call it x
For every array, do a binary search for x to find out how many values < x are in the array. Say we have ri values < x in array i. We know that x has rank r = sum(i = 1 to m, ri) in the union of all arrays.
If n <= r, we restrict each array i to the indices 0...(ri - 1) and recurse. If n > r, we restrict each array to the indices ri...|Ai| - 1 and recurse with rank n - r in place of n
repeat
The expected recursion depth is O(log(N)), with N being the total number of elements, by a proof similar to that of quickselect, so the expected running time is something like O(m * log^2(N)).
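A possible C++ sketch of this randomized scheme (my own interpretation, not the answer author's code; it also counts elements equal to the pivot so that every round is guaranteed to discard something):

#include <algorithm>
#include <cstddef>
#include <random>
#include <vector>

// Find the n-th smallest element (1-based) in the union of m sorted arrays.
// Each round picks a random pivot from the active ranges, counts with binary
// searches how many active elements are < pivot (r) and <= pivot (s), then
// either reports the pivot or shrinks every active range.
int nthSmallest(const std::vector<std::vector<int>>& arrays, std::size_t n) {
    std::size_t m = arrays.size();
    std::vector<std::size_t> lo(m, 0), hi(m);              // active range [lo_i, hi_i)
    for (std::size_t i = 0; i < m; ++i) hi[i] = arrays[i].size();
    std::mt19937 rng(std::random_device{}());

    for (;;) {
        std::size_t total = 0;
        for (std::size_t i = 0; i < m; ++i) total += hi[i] - lo[i];
        // Pick a random remaining element as the pivot x.
        std::size_t t = std::uniform_int_distribution<std::size_t>(0, total - 1)(rng);
        int x = 0;
        for (std::size_t i = 0; i < m; ++i) {
            std::size_t len = hi[i] - lo[i];
            if (t < len) { x = arrays[i][lo[i] + t]; break; }
            t -= len;
        }
        // Count active elements strictly below x (r) and up to and including x (s).
        std::size_t r = 0, s = 0;
        std::vector<std::size_t> below(m), upto(m);
        for (std::size_t i = 0; i < m; ++i) {
            auto first = arrays[i].begin() + lo[i], last = arrays[i].begin() + hi[i];
            below[i] = std::lower_bound(first, last, x) - arrays[i].begin();
            upto[i]  = std::upper_bound(first, last, x) - arrays[i].begin();
            r += below[i] - lo[i];
            s += upto[i] - lo[i];
        }
        if (n <= r) {                                      // answer is strictly below x
            for (std::size_t i = 0; i < m; ++i) hi[i] = below[i];
        } else if (n <= s) {                               // answer is x itself
            return x;
        } else {                                           // answer is strictly above x
            for (std::size_t i = 0; i < m; ++i) lo[i] = upto[i];
            n -= s;
        }
    }
}

Each round costs m binary searches, and the expected number of rounds is O(log N), which matches the O(m * log^2(N)) bound above.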
The paper "Generalized Selection and Ranking" by Frederickson and Johnson proposes selection and ranking algorithms for different scenarios, for example an O(m + c * log(k/c)) algorithm to select the k-th element from m equally sized sorted sequences, with c = min{m, k}.

finding the maximum number in array

There is an array of numbers, the array is unsorted, and we should find the maximum number n such that at least n numbers are bigger than it (this number may or may not be in the array).
For example, if we are given 2 5 7 6 9, the number 4 is the maximum number such that at least 4 numbers (or more) are bigger than 4 (5, 6, 7 and 9 are bigger).
I solved this problem, but I think it will exceed the time limit on big arrays of numbers, so I want to solve it another way.
So I use merge sort for the sorting because it takes n log(n), and then I use a counter that counts from 1 to k as long as we have at least k numbers bigger than k. For example, we count from 1 to 4; then at 5 we don't have 5 numbers bigger than 5, so we take k-1 = 4 and this is our n.
Is that good, or might it exceed the time limit? Does anybody have another idea?
Thanks
In C++ there is a function called std::nth_element which can find the n-th element of an array in linear time. Using this function you should find the (N - n)-th element (where N is the total number of elements in the array) and subtract 1 from it.
As you seek a solution in C you cannot make use of this function, but you can implement your solution similarly. nth_element performs something quite similar to qsort, but it only performs the partition step on the part of the array where the n-th element lies.
Now let's assume you have nth_element implemented. We will perform something like a combination of binary search and nth_element. First we assume that the answer to the question is the middle element of the array (i.e. the N/2-th element). We use nth_element to find the N/2-th element. If it is more than N/2 we know the answer to your problem is at least N/2, otherwise it will be less. Either way, in order to find the answer we will only continue with one of the two partitions created by the N/2-th element. If this partition is the right one (elements bigger than the N/2-th element) we continue solving the same problem, otherwise we start searching for the max element M to the left of the N/2-th element that has at least x bigger elements such that x + N/2 > M. The two subproblems will have the same complexity. You continue performing this operation until the interval you are interested in is of length 1.
Now let's prove that the complexity of the above algorithm is linear. The first nth_element is linear, performing on the order of N operations; the second nth_element, which only considers one half of the array, performs on the order of N/2 operations; the third, on the order of N/4; and so on. All in all you will perform on the order of N + N/2 + N/4 + ... + 1 operations. This sum is less than 2 * N, thus your complexity is still linear.
Your solution is asymptotically slower than what I propose above as it has a complexity O(n*log(n)), while my solution has complexity of O(n).
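Here is one way the idea might look in code, as a C++ sketch (my own interpretation of the answer above, not the author's code): binary search on the candidate answer v, with each probe running std::nth_element only on the part of the array that is still unresolved, so the work shrinks roughly geometrically.

#include <algorithm>
#include <cstddef>
#include <functional>
#include <vector>

// Largest v such that at least v elements of a are strictly greater than v.
// Each probe partially sorts only the unresolved window [L, R), so the total
// work is on the order of N + N/2 + N/4 + ... = O(N).
std::size_t maxCovered(std::vector<long long> a) {
    std::size_t n = a.size();
    std::size_t lo = 0, hi = n;              // the answer lies in [lo, hi]
    std::size_t L = 0, R = n;                // still-unpartitioned index window
    while (lo < hi) {
        std::size_t v = (lo + hi + 1) / 2;   // candidate answer, between 1 and n
        std::size_t k = v - 1;               // index of the v-th largest element
        std::nth_element(a.begin() + L, a.begin() + k, a.begin() + R,
                         std::greater<long long>());
        if (a[k] > static_cast<long long>(v)) {   // at least v elements exceed v
            lo = v;
            L = k + 1;                       // the top v elements are settled
        } else {
            hi = v - 1;
            R = k;                           // indices k and beyond hold the smallest remaining elements
        }
    }
    return lo;
}

For the example in the question, maxCovered({2, 5, 7, 6, 9}) returns 4.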
I would use a modified variant of a sorting algorithm that uses pivot values.
The reason is that you want to sort as few elements as possible.
So I would use qsort as my base algorithm and let the pivot element control which partition to sort (you will only need to sort one).

Time complexity in Divide and conquer algorithms

Could you please help me understand the time complexity of divide and conquer algorithms?
Let's take example of this one.
http://www.geeksforgeeks.org/archives/4583 Method 2:
It gives T(n) = (3/2)n - 2, and I don't understand why.
I am sorry if I gave you an extra page to open, but I really want to understand this at least at a good high level so that I can work out the complexity of such algorithms on my own. Your answer is highly appreciated.
I can't open the link for some reason, but I'll still give it a try.
When you use the divide and conquer strategy, what you do is you break up the problem into many smaller problems and then you combine the solutions for the small problems to get the solution for the main problem.
How to solve the smaller problems: By breaking them up further. This process of breaking up continues until you reach a level where the problem is small enough to be handled directly.
How to compute time complexity:
Assume the time taken by your algo is T(n). Notice that time taken is a function of the problem size i.e. n.
Now, notice what you are doing. You break up the problem into, let's say, k parts each of size n/k (they may not be equal in size, in which case you'll have to add the time taken by each of them individually). Now you'll solve these k parts. The time taken by each part would be T(n/k), because the problem size is reduced to n/k. And you are solving k of these, so it takes k * T(n/k) time.
After solving these smaller problems, you'll combine their solutions. This will also take some time. And that time will be a function of your problem size again. (It could also be constant). Let that time be O(f(n)).
So, total time taken by your algorithm will be:
T(n) = (k * T(n/k)) + O(f(n))
You've got a recurrence relation now which you can solve to get T(n).
As the link indicates:
T(n) = T(floor(n/2)) + T(ceil(n/2)) + 2
T(2) = 1
T(1) = 0
For T(2), it is a base case with a single comparison before returning; for T(1), it is a base case without any comparison.
For T(n): you recursively call the method on the two halves of the array, and compare the two (min, max) tuples to find the real min and max, which gives you the above T(n) equation.
If n is a power of 2, then we can write T(n) as:
T(n) = 2T(n/2) + 2
This is fairly self-explanatory.
T(n) = (3/2)n - 2
Here, you solve it by induction:
Base case: for n=2: T(2) = 1 = (3/2)*2 -2
We assume T(k) = (3/2)k - 2 for each k < n
T(n) = 2T(n/2) + 2 =(*) 2*((3/2)*(n/2) - 2) + 2 = 3*(n/2) - 4 + 2 = (3/2)*n - 2
(*) by the induction hypothesis, which applies because n/2 < n
Because the induction goes through, we can conclude: T(n) = (3/2)n - 2
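For reference, here is a C++ sketch of the divide-and-conquer min/max routine this recurrence describes (my own illustration of the "Method 2" idea, so the details may differ from the linked page):

#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Divide-and-conquer min/max on a[lo, hi): two recursive calls on the halves
// plus 2 comparisons to combine, i.e. T(n) = T(floor(n/2)) + T(ceil(n/2)) + 2.
std::pair<int, int> minMax(const std::vector<int>& a, std::size_t lo, std::size_t hi) {
    if (hi - lo == 1)                        // T(1) = 0: no comparison
        return {a[lo], a[lo]};
    if (hi - lo == 2)                        // T(2) = 1: one comparison
        return a[lo] < a[lo + 1] ? std::make_pair(a[lo], a[lo + 1])
                                 : std::make_pair(a[lo + 1], a[lo]);
    std::size_t mid = lo + (hi - lo) / 2;
    auto left  = minMax(a, lo, mid);
    auto right = minMax(a, mid, hi);
    return {std::min(left.first, right.first),     // one comparison for the min
            std::max(left.second, right.second)};  // one comparison for the max
}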

Resources