I have an array of N numbers which are all the same. I am applying quicksort to it.
What should be the time complexity of the sort in this case?
I googled around this question but did not find an exact explanation.
Any help would be appreciated.
This depends on the implementation of quicksort. The traditional implementation, which partitions into 2 sections (< and >=), will be O(n*n) on identical input. While no swaps necessarily occur, it will still make n recursive calls, each of which has to compare the pivot against n - recursionDepth elements, i.e. O(n*n) comparisons need to be made.
However, there is a simple variant which partitions into 3 sets (<, = and >). This variant has O(n) performance in this case: instead of choosing a pivot, swapping and then recursing on 0 to pivotIndex-1 and pivotIndex+1 to n, it swaps everything equal to the pivot into the 'middle' partition (which, when all inputs are identical, always means swapping an element with itself, i.e. a no-op). The call stack is therefore only 1 deep in this particular case: n comparisons and no swaps occur. I believe this variant has made its way into the standard library on Linux, at least.
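A sketch of that 3-way ("fat pivot" / Dutch national flag) partitioning, following the well-known Dijkstra/Sedgewick formulation; the Python names and structure here are mine, not taken from any particular library:

```python
def quicksort_3way(a, lo=0, hi=None):
    """Quicksort with 3-way (Dutch national flag) partitioning."""
    if hi is None:
        hi = len(a) - 1
    if lo >= hi:
        return
    pivot = a[lo]
    lt, i, gt = lo, lo + 1, hi        # invariant: a[lo:lt] < pivot, a[lt:i] == pivot, a[gt+1:hi+1] > pivot
    while i <= gt:
        if a[i] < pivot:
            a[lt], a[i] = a[i], a[lt]
            lt += 1
            i += 1
        elif a[i] > pivot:
            a[i], a[gt] = a[gt], a[i]
            gt -= 1
        else:                         # equal to the pivot: it is already in the middle partition
            i += 1
    quicksort_3way(a, lo, lt - 1)     # recurse only on the strictly-smaller part
    quicksort_3way(a, gt + 1, hi)     # and the strictly-greater part

# On all-identical input every element lands in the middle partition,
# so both recursive calls get empty ranges: one linear pass, no swaps.
data = [7] * 8
quicksort_3way(data)
print(data)
```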
The performance of quicksort depends on the pivot selection. The closer the chosen pivot is to the median element, the better quicksort performs.
In this specific case you're lucky: the pivot you select will always be a median, since all values are the same. The partition step of quicksort will hence never have to swap elements, and the two pointers will meet exactly in the middle. The two subproblems will therefore be exactly half the size, giving you a perfect O(n log n).
To be a little more specific, this depends on how well the partition step is implemented. The loop invariant only needs to ensure that smaller elements end up in the left-hand subproblem and greater elements in the right-hand subproblem. There's no guarantee that a partition implementation never swaps equal elements, but it is always unnecessary work, so no clever implementation should do it: the left and right pointers will never detect an inversion with respect to the pivot (i.e. you will never hit the case where *left > pivot && *right < pivot), so the left pointer is incremented and the right pointer decremented at every step, and they finally meet in the middle, generating subproblems of size n/2.
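For concreteness, here is a sketch of a classic two-pointer (Hoare-style) partition; the names are mine. On all-equal input both scans stop immediately, the pointers advance in lock-step, any swaps exchange equal values (no-ops), and the split point lands in the middle:

```python
def hoare_partition(a, lo, hi):
    """Two-pointer partition: scans stop on elements equal to the pivot."""
    pivot = a[lo]
    left, right = lo - 1, hi + 1
    while True:
        left += 1
        while a[left] < pivot:
            left += 1
        right -= 1
        while a[right] > pivot:
            right -= 1
        if left >= right:
            return right              # split point: [lo..right] and [right+1..hi]
        a[left], a[right] = a[right], a[left]

def quicksort(a, lo=0, hi=None):
    if hi is None:
        hi = len(a) - 1
    if lo < hi:
        mid = hoare_partition(a, lo, hi)
        quicksort(a, lo, mid)         # the split index stays on the left side
        quicksort(a, mid + 1, hi)

data = [5] * 8
quicksort(data)                       # subproblems of size ~n/2 at every level: O(n log n)
print(data)
```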
It depends on the particular implementation.
If there is only one kind of comparison (≤ or <) to determine where the other elements go relative to the pivot, they will all go into one of the partitions, and you will get O(n^2) performance, since the problem size will decrease by only 1 each step.
The algorithm listed here is an example (the accompanying illustrations are for a different algorithm).
If there are two kinds of comparisons, for example < for elements on the left and > for elements on the right, as is the case in a two-pointer implementation, and if you take care to move the pointers in step, then you might get perfect O(n log n) performance, because the equal elements will be split evenly between the two partitions.
The illustrations in the link above use an algorithm which doesn't move pointers in step, so you still get poor performance (look at the "Few unique" case).
So it depends whether you have this special case in mind when implementing the algorithm.
Practical implementations often handle a broader special case: if there are no swaps in the partitioning step, they assume the data is nearly sorted, and use an insertion sort, which gives an even better O(n) in the case of all equal elements.
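A hedged sketch of that kind of heuristic; the swap-tracking partition and the insertion-sort fallback below are my own illustration, not any particular library's code:

```python
def insertion_sort(a, lo, hi):
    for i in range(lo + 1, hi + 1):
        x, j = a[i], i - 1
        while j >= lo and a[j] > x:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = x

def partition_tracking_moves(a, lo, hi):
    """Lomuto-style partition that also reports whether it changed the array."""
    pivot, i, moved = a[hi], lo, False
    for j in range(lo, hi):
        if a[j] < pivot:
            if a[i] != a[j]:          # skipping a swap of equal values is harmless
                a[i], a[j] = a[j], a[i]
                moved = True
            i += 1
    if a[i] != a[hi]:
        a[i], a[hi] = a[hi], a[i]
        moved = True
    return i, moved

def quicksort(a, lo=0, hi=None):
    if hi is None:
        hi = len(a) - 1
    if lo >= hi:
        return
    mid, moved = partition_tracking_moves(a, lo, hi)
    if not moved:
        # Nothing was moved: assume the range is (nearly) sorted and finish with
        # insertion sort, which is O(n) on sorted or all-equal input.
        insertion_sort(a, lo, hi)
        return
    quicksort(a, lo, mid - 1)
    quicksort(a, mid + 1, hi)
```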
tobyodavies has provided the right solution. It does handle the case and finishes in O(n) time when all the keys are equal.
It is the same partitioning as we do in the Dutch national flag problem:
http://en.wikipedia.org/wiki/Dutch_national_flag_problem
Sharing the code from Princeton:
http://algs4.cs.princeton.edu/23quicksort/Quick3way.java.html
If you implement the 2-way partitioning algorithm (where the scans stop on keys equal to the partitioning element), then at every step the array will be halved. This is because when identical keys are encountered, the scan stops; as a result, at each step the partitioning element ends up near the center of the subarray, thereby halving the array in every subsequent recursive call. This case is similar to mergesort, which uses ~N lg N compares to sort an array of N elements. Ergo, for duplicate keys, the traditional 2-way partitioning algorithm for quicksort uses ~N lg N compares, i.e. it stays linearithmic.
Quicksort is typically written using a "partition" function and a "quicksort" function.
Basically, there are two main ways of implementing quicksort.
The difference between the two is only the "partition" function:
1. Lomuto
2. Hoare
With a partitioning algorithm such as the Lomuto partition scheme described above (even one that chooses good pivot values), quicksort exhibits poor performance for inputs that contain many repeated elements. The problem is clearly apparent when all the input elements are equal: at each recursion, the left partition is empty (no input values are less than the pivot), and the right partition has only decreased by one element (the pivot is removed). Consequently, the Lomuto partition scheme takes quadratic time to sort an array of equal values.
So, this takes O(n^2) time using the Lomuto partition algorithm.
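For illustration, here is a bare-bones rendering of the Lomuto scheme in Python (my own sketch of the scheme described above); on all-equal input the left partition is always empty and the right one shrinks by a single element per call:

```python
def lomuto_partition(a, lo, hi):
    """Lomuto scheme: pivot is the last element; everything strictly smaller
    than it is moved to the front of the range."""
    pivot, i = a[hi], lo
    for j in range(lo, hi):
        if a[j] < pivot:
            a[i], a[j] = a[j], a[i]
            i += 1
    a[i], a[hi] = a[hi], a[i]
    return i

def quicksort_lomuto(a, lo=0, hi=None):
    if hi is None:
        hi = len(a) - 1
    if lo < hi:
        p = lomuto_partition(a, lo, hi)
        quicksort_lomuto(a, lo, p - 1)   # empty when all keys are equal
        quicksort_lomuto(a, p + 1, hi)   # shrinks by only one element per call
```

On an array of n equal keys this makes about n nested calls and ~n^2/2 comparisons.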
By using the Hoare partition algorithm we get the best case when all the array elements are equal: the partitioning is balanced, so the running time is O(n log n).
Reference: https://en.wikipedia.org/wiki/Quicksort
Related
What is the minimum number of recursive calls needed to sort a list of n elements using quicksort?
I cannot understand how many times the recursive function is actually called, and specifically what is meant by "minimum number".
Quick Sort is a collection of algorithms where a data set is sorted by choosing a pivot value, partitioning the data and recursing on the partitions until the partition size is smaller than 2.
Simple implementations use the first, last or middle element as pivot and partition the data into 2 sets:
the elements that compare less than or equal to the pivot
the elements that compare greater than or equal to the pivot
Efficient implementations use elaborate methods to choose the pivot value and partition the data into 3 sets:
the elements that compare less than the pivot
the elements that compare equal to the pivot
the elements that compare greater than the pivot
They might also switch to a different algorithm below a certain partition size and/or if a pathological distribution is detected, to avoid quadratic time complexity.
For the 3-set implementations, the optimal case is the one where all elements compare equal and no recursion is needed. This constitutes the minimum number of recursive calls: 0.
In other cases, the number of recursive calls is highly dependent on the data distribution, the pivot selection method and other implementation choices such as:
base case handling: testing the partition length before recursing or upon entering the function,
switching to a different algorithm such as insertion sort for small partitions or shell sort for pathological distributions
On average the number of recursive calls for quick sort is approximately:
2n if the length test is only at the start of the function
n if the test is performed before recursing
n/t if switching to another algorithm for partition lengths below a threshold of t.
Note that the depth of recursion, which is a different but important question, can be limited to log2(n) by combining iteration and recursion, recursing on the smaller partition and iterating on the larger one.
Note also that quick sort can be implemented without recursion, using small arrays of length log2(n) to keep track of pending partitions.
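A sketch of the depth-limiting trick mentioned above: recurse into the smaller partition and iterate on the larger one, so the stack never grows beyond roughly log2(n) frames. The Lomuto partition is included only to keep the sketch self-contained:

```python
def partition(a, lo, hi):
    # Lomuto scheme, used here only so the example runs on its own.
    pivot, i = a[hi], lo
    for j in range(lo, hi):
        if a[j] < pivot:
            a[i], a[j] = a[j], a[i]
            i += 1
    a[i], a[hi] = a[hi], a[i]
    return i

def quicksort(a, lo=0, hi=None):
    if hi is None:
        hi = len(a) - 1
    while lo < hi:
        p = partition(a, lo, hi)
        # Recurse on the smaller side, then loop on the larger side:
        # each recursive frame handles at most half its parent's range,
        # so the recursion depth is bounded by about log2(n).
        if p - lo < hi - p:
            quicksort(a, lo, p - 1)
            lo = p + 1
        else:
            quicksort(a, p + 1, hi)
            hi = p - 1
```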
Suppose I have 2 cases:
Case 1: I always choose the 1st element as pivot. In this case the worst case, O(n^2), occurs when the array is already sorted or reverse sorted.
Case 2: I choose a random element as pivot. Here the worst case, O(n^2), is possible when the random pivot is always the max or the min element of the subarray.
Can't I argue that if we are given a random array, P(O(n^2) in Case 1) = P(O(n^2) in Case 2)? Because intuitively P(sorted or reverse-sorted array) = P(random pivot is always the max or the min element in the subarray).
If so, how is the 2nd case any good, given that we need extra effort to select a random pivot? We only need the 2nd case when the data follows a certain pattern. Am I right? Please enlighten.
When all permutations of the input are equally likely, the probability of choosing a bad pivot every time is the same for both strategies (first or random). It would be the same for any strategy that makes no comparisons (middle, third, alternating between the second-to-last and the second...).
(This might be different for a strategy that compares elements, such as median-of-three.)
But the truth is that in practice the permutations aren't equiprobable at all, and there is a strong bias toward nearly sorted sequences.
Said differently, when the input is well shuffled or when you choose the pivot randomly, you must be very unlucky to make a bad draw every single time, and the probability of the worst case is infinitesimal. For a sorted sequence the odds are quite different, as you are sure to lose every time!
As a side note, picking a random value does have a cost, which is not negligible compared to the partitioning of small sequences. This is why it matters to switch to a more straightforward sort for sequences of length L or less, and to tune the value of L for best performance.
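A sketch combining both remarks: a random pivot plus a cutoff below which a simpler sort takes over. The cutoff value 16 is purely illustrative; in practice it would be tuned:

```python
import random

CUTOFF = 16                                # "L": illustrative, tune empirically

def insertion_sort(a, lo, hi):
    for i in range(lo + 1, hi + 1):
        x, j = a[i], i - 1
        while j >= lo and a[j] > x:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = x

def quicksort(a, lo=0, hi=None):
    if hi is None:
        hi = len(a) - 1
    if hi - lo + 1 <= CUTOFF:
        insertion_sort(a, lo, hi)          # small ranges: skip pivot selection entirely
        return
    r = random.randint(lo, hi)             # random pivot, moved to the end so a
    a[r], a[hi] = a[hi], a[r]              # plain Lomuto partition can be used
    pivot, i = a[hi], lo
    for j in range(lo, hi):
        if a[j] < pivot:
            a[i], a[j] = a[j], a[i]
            i += 1
    a[i], a[hi] = a[hi], a[i]
    quicksort(a, lo, i - 1)
    quicksort(a, i + 1, hi)
```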
To avoid the worst case, you would have to choose the optimal pivot for each subdivision: the median element. If you use any shortcut method to select the pivot, like random, first, or median-of-three, then the possibility of encountering the worst case remains. It's just a question of probabilities.
Certain input cases are likely to occur, at least in some applications, such as the case when the elements are already sorted or nearly sorted.
If there is the threat of worst-case behavior, then it is good to at least mitigate it by preventing likely input cases from triggering that worst-case behavior.
By picking a predictable element, like the first element, you could easily hit the worst case. If a grain of randomness is added, the pattern will likely be broken, and the actual running time of the sorting algorithm will typically be lower than O(N^2).
On a related note, randomly picking the pivot is not that good an idea either. There are techniques, such as median of medians, which come with a proof that the worst-case running time is still O(N log N). That is a huge advantage over taking the first element as the pivot.
You can refer to this article for an implementation based on median of medians: Finding Kth Smallest Element in an Unsorted Array
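For reference, a simplified (non-in-place) sketch of median-of-medians selection; the linked article will have its own version, this one is just generic:

```python
def median_of_medians(a, k):
    """Return the k-th smallest element of a (k is 0-based), worst-case linear time."""
    if len(a) <= 5:
        return sorted(a)[k]
    # Median of each group of 5, then the median of those medians as the pivot.
    medians = [sorted(a[i:i + 5])[len(a[i:i + 5]) // 2] for i in range(0, len(a), 5)]
    pivot = median_of_medians(medians, len(medians) // 2)
    lows = [x for x in a if x < pivot]
    pivots = [x for x in a if x == pivot]
    highs = [x for x in a if x > pivot]
    if k < len(lows):
        return median_of_medians(lows, k)
    if k < len(lows) + len(pivots):
        return pivot
    return median_of_medians(highs, k - len(lows) - len(pivots))

# Using its result as the quicksort pivot is what yields the O(N log N) guarantee:
# median_of_medians(arr, len(arr) // 2) returns a true median to partition around.
```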
We're not worried about the runtime for when we're given a randomly-generated array. We're worried about the runtime when the array is sorted or near-sorted, which is actually pretty common. We're also worried about the runtime when the array is generated by an adversary with elements specifically chosen to ruin our day.
If it were just about random input, picking the first element as the pivot would be fine.
I have an array of n elements in which only one element is not repeated; all the other numbers are repeated more than once. There is no limit on the range of the numbers in the array.
Some solutions are:
Using a hash table, which would give linear time complexity but very poor space complexity
Sorting the list using merge sort, O(n log n), and then finding the element which doesn't repeat
Is there a better solution?
One general approach is to implement a bucketing technique (of which hashing is such a technique) to distribute the elements into different "buckets" using their identity (say index) and then find the bucket with the smallest size (1 in your case). This problem, I believe, is also known as the minority element problem. There will be as many buckets as there are unique elements in your set.
Doing this by hashing is problematic because of collisions and how your algorithm might handle that. Certain associative array approaches such as tries and extendable hashing don't seem to apply as they are better suited to strings.
One application of the above is to the Union-Find data structure. Your sets will be the buckets and you'll need to call MakeSet() and Find() for each element in your array for a cost of $O(\alpha(n))$ per call, where $\alpha(n)$ is the extremely slow-growing inverse Ackermann function. You can think of it as being effectively a constant.
You'll have to do a Union when an element already exists. With some changes to keep track of the set with minimum cardinality, this solution should work. The time complexity of this solution is $O(n\alpha(n))$.
Your problem also appears to be loosely related to the Element Uniqueness problem.
Try multi-pass scanning if you have a strict space limitation.
Say the input has n elements and you can only hold m elements in memory. If you use a hash-table approach, in the worst case you need to handle n/2 unique numbers, so you want m > n/2. In case you don't have that big an m, you can partition the value range into k = (max(input) - min(input)) / (2m) groups and scan the n input elements k times (in the worst case):
1st run: you only hash-get/put/mark/whatever the elements with value < min(input) + 2m; in the range (min(input), min(input) + 2m) there are at most m unique elements, so you can handle that. If you are lucky, you have already found the unique one; otherwise continue.
2nd run: only operate on elements with values in the range (min(input) + 2m, min(input) + 4m),
and so on, and so forth.
In this way, you compromise on time complexity, which becomes O(kn), but you get a space complexity bound of O(m).
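A hedged sketch of this multi-pass scheme, assuming integer values and an input that can be re-read on each pass; the `read_input` callable and the window arithmetic are illustrative, and the counting table only ever holds values from the current window:

```python
def find_unique_multipass(read_input, lo, hi, m):
    """Scan the input once per value window of width 2*m, counting only values
    that fall inside the current window. `read_input` is any callable that
    yields the n elements again on each call (e.g. re-reads a file).
    Returns the value that occurs exactly once."""
    window = 2 * m
    start = lo
    while start <= hi:
        end = start + window
        counts = {}
        for x in read_input():
            if start <= x < end:
                counts[x] = counts.get(x, 0) + 1
        for value, c in counts.items():
            if c == 1:
                return value              # found the non-repeated element
        start = end                       # otherwise move on to the next window
    return None

# Example with an in-memory list standing in for the re-readable input:
data = [4, 9, 4, 7, 9, 7, 12, 3, 3]
print(find_unique_multipass(lambda: iter(data), min(data), max(data), m=2))  # -> 12
```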
Two ideas come to my mind:
A smoothsort may be a better alternative than the cited merge sort for your needs, given that it is O(1) in memory usage and O(n log n) in the worst case like merge sort, but O(n) in the best case;
Based on the (reverse) idea of the splay tree, you could make a type of tree that pushes the leaves toward the bottom once they are used (instead of upward, as in the splay tree). This would still give you an O(n log n) implementation of the sort, but the advantage would be the O(1) step of finding the unique element: it would be the root. The sort-then-scan approach costs O(n log n) + O(n), while this algorithm would be O(n log n) + O(1).
Otherwise, as you stated, a hash-based solution (like a hash-implemented set) would give an O(n) algorithm (O(n) to insert each element and keep a count for it, and O(n) to traverse the set to find the unique element), but you seemed to dislike the memory usage, though I don't know why. Memory is cheap these days...
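For completeness, the hash-based version described above is only a few lines using Python's collections.Counter as the counting set:

```python
from collections import Counter

def find_unique(arr):
    """O(n) time, O(u) extra space, where u is the number of distinct values."""
    counts = Counter(arr)            # one pass to count occurrences
    for x, c in counts.items():      # one pass over the distinct values
        if c == 1:
            return x
    return None

print(find_unique([14, 3, 14, 7, 3, 7, 42]))   # -> 42
```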
Got asked this in a lecture... stumped by it a bit.
How can you guarantee that quicksort will always sort an array of integers?
Thanks.
Gratuitously plagiarising Wikipedia:
The correctness of the partition algorithm is based on the following two arguments:
At each iteration, all the elements processed so far are in the desired position: before the pivot if less than the pivot's value, after the pivot if greater than the pivot's value (loop invariant).
Each iteration leaves one fewer element to be processed (loop variant).
The correctness of the overall algorithm can be proven via induction: for zero or one element, the algorithm leaves the data unchanged; for a larger data set it produces the concatenation of two parts, elements less than the pivot and elements greater than it, themselves sorted by the recursive hypothesis.
Quicksort works by taking a pivot value and splitting the remaining data into two groups, one higher and one lower. You then do this to each group in turn until you get groups no larger than one. At that point you can guarantee that the data is sorted, because each pivot value is in its correct place: during partitioning it was compared against every other element in its group, and the group itself already lies on the correct side of every earlier pivot. In the end, you are left with sets of size 1 or 0, which cannot be out of order and are therefore already sorted.
Hope this helps, it was what we were taught for A Level Further Mathematics (16-18, UK).
Your professor may be referring to "stability." Have a look here: http://en.wikipedia.org/wiki/Stable_sort#Stability. Stable sorting algorithms maintain the relative order of records with equal keys. If all keys are different then this distinction is not necessary.
Quicksort (in efficient implementations) is not a stable sort, so one way to guarantee stability would be to ensure that there are no duplicate integers in your array.
I am trying to sort an array which has properties like this:
it increases up to some point, then starts decreasing, then increases and then decreases again, and so on. Is there any algorithm which can sort this in less than n log(n) complexity by making use of its being partially ordered?
array example = 14,19,34,56,36,22,20,7,45,56,50,32,31,45......... upto n
Thanks in advance
Any sequence of numbers will go up and down and up and down again, etc., unless it is already fully sorted (it may start with a down, of course). You could run through the sequence noting the points where it changes direction, then merge-sort the resulting runs (reading the descending runs in reverse).
In general the complexity is N log N, because we don't know how sorted the input is at this point. If it is moderately well sorted, i.e. there are fewer changes of direction, it will take fewer comparisons.
You could find the change / partition points, and perform a merge sort between pairs of partitions. This would take advantage of the existing ordering, since normally a merge sort starts with pairs of single elements.
Edit: Just trying to figure out the complexity here. Merge sort is n log(n), where the log(n) factor is the number of times you have to merge up a level: first every pair of elements, then every pair of pairs, and so on until you reach the size of the array. In this case you have n elements in p partitions, where p < n, so the log factor depends on p rather than n: merging each pair of partitions and repeating on half as many partitions after each round gives roughly n log(p). I am open to correction.
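One way to realize the run-detection-plus-merge idea from the answers above; heapq.merge performs a k-way merge of the p runs, which costs O(n log p) for runs of total length n:

```python
from heapq import merge

def split_runs(a):
    """Split the sequence into maximal ascending runs,
    reversing descending runs as they are found."""
    runs, i, n = [], 0, len(a)
    while i < n:
        j = i + 1
        if j < n and a[j] < a[i]:                 # descending run
            while j < n and a[j] < a[j - 1]:
                j += 1
            runs.append(a[i:j][::-1])             # reverse it into ascending order
        else:                                     # ascending (or single-element) run
            while j < n and a[j] >= a[j - 1]:
                j += 1
            runs.append(a[i:j])
        i = j
    return runs

def sort_partially_ordered(a):
    return list(merge(*split_runs(a)))

print(sort_partially_ordered([14, 19, 34, 56, 36, 22, 20, 7, 45, 56, 50, 32, 31, 45]))
```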
See Topological sorting
If you know for a fact that the data is "almost sorted" and the set size is reasonably small (say an array that can be indexed by a 16-bit integer), then Shellsort is probably your best bet. Yes, it has a basic time complexity of O(n^2) (which the choice of gap sequence can reduce to a current best worst case of O(n*log^2(n))), but its performance improves with the sortedness of the input, down to a best case of O(n) on an already-sorted set. Using Sedgewick's sequence for the gap sizes will give the best performance on those occasions when the input is not as sorted as you expected it to be.
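A short Shellsort sketch for reference; the hard-coded gaps below are Ciura's well-known sequence rather than Sedgewick's, purely to keep the example compact, and they would need extending for very large inputs:

```python
def shellsort(a, gaps=(701, 301, 132, 57, 23, 10, 4, 1)):
    """Gapped insertion sort. On nearly-sorted input few elements move,
    so the inner loop rarely runs and the cost approaches O(n)."""
    n = len(a)
    for gap in gaps:
        if gap >= n:
            continue
        for i in range(gap, n):
            x, j = a[i], i
            while j >= gap and a[j - gap] > x:
                a[j] = a[j - gap]
                j -= gap
            a[j] = x
    return a

print(shellsort([14, 19, 34, 56, 36, 22, 20, 7, 45]))
```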
Strand Sort might be close to what you're looking for. O(n sqrt(n)) in the average case, O(n) best case (list already sorted), O(n^2) worst case (list sorted in reverse order).
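A hedged sketch of strand sort: repeatedly pull an increasing "strand" out of the remaining input and merge it into the result (heapq.merge stands in for the merge step):

```python
from heapq import merge

def strand_sort(seq):
    remaining, result = list(seq), []
    while remaining:
        strand = [remaining.pop(0)]        # start a new increasing strand
        rest = []
        for x in remaining:
            if x >= strand[-1]:
                strand.append(x)           # extend the strand
            else:
                rest.append(x)             # leave for a later pass
        remaining = rest
        result = list(merge(result, strand))
    return result

print(strand_sort([3, 1, 5, 4, 2]))        # already-sorted input needs only one pass
```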
Share and enjoy.