What is the minimum number of recursive calls needed to sort a list of n elements using quicksort?
I cannot work out how many times the recursive function is actually called, and specifically what is meant by "minimum number".
Quick Sort is a collection of algorithms where a data set is sorted by choosing a pivot value, partitioning the data and recursing on the partitions until the partition size is smaller than 2.
Simple implementations use the first, last or middle element as pivot and partition the data into 2 sets:
the elements that compare less than or equal to the pivot
the elements that compare greater than or equal to the pivot
Efficient implementations use elaborate methods to choose the pivot value and partition the data into 3 sets:
the elements that compare less than the pivot
the elements that compare equal to the pivot
the elements that compare greater than the pivot
They might also switch to a different algorithm below a certain partition size and/or if a pathological distribution is detected, to avoid quadratic time complexity.
For the 3-set implementations, the optimal case is the one where all elements compare equal and no recursion is needed. This constitutes the minimum number of recursive calls: 0.
In other cases, the number of recursive calls is highly dependent on the data distribution, the pivot selection method and other implementation choices such as:
base case handling: testing the partition length before recursing or upon entering the function,
switching to a different algorithm such as insertion sort for small partitions or shell sort for pathological distributions
On average the number of recursive calls for quick sort is approximately:
2n if the length test is only at the start of the function
n if the test is performed before recursing
n/t if switching to another algorithm for partition lengths below a threshold of t.
Note that the depth of recursion, which is a different but important question, can be limited to log2(n) by combining iteration and recursion, recursing on the smaller partition and iterating on the larger one.
Note also that quick sort can be implemented without recursion, using small arrays of length log2(n) to keep track of pending partitions.
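To make this concrete, here is a minimal 3-set quicksort sketch in C. The function names, the CUTOFF value and the middle-element pivot are my own choices, not taken from any particular library. The partition length is tested before recursing, so an all-equal input triggers zero recursive calls, and small partitions fall back to insertion sort as described above.

#include <stddef.h>

#define CUTOFF 16  /* arbitrary small-partition threshold, an assumption for this sketch */

static void swap_int(int *a, int *b) { int t = *a; *a = *b; *b = t; }

/* plain insertion sort used for small partitions */
static void insertion_sort(int *a, size_t n) {
    for (size_t i = 1; i < n; i++) {
        int v = a[i];
        size_t j = i;
        while (j > 0 && a[j - 1] > v) { a[j] = a[j - 1]; j--; }
        a[j] = v;
    }
}

/* 3-set (less / equal / greater) quicksort sketch */
void quicksort3(int *a, size_t n) {
    if (n < CUTOFF) { insertion_sort(a, n); return; }

    int pivot = a[n / 2];           /* naive pivot choice, good enough for a sketch */
    size_t lt = 0, i = 0, gt = n;   /* invariant: a[0..lt) < pivot, a[lt..i) == pivot, a[gt..n) > pivot */
    while (i < gt) {
        if (a[i] < pivot)      swap_int(&a[i++], &a[lt++]);
        else if (a[i] > pivot) swap_int(&a[i], &a[--gt]);
        else                   i++;
    }
    /* partition length is tested before recursing, so if every element
       compares equal (lt == 0 and gt == n) there are 0 recursive calls */
    if (lt > 1)     quicksort3(a, lt);
    if (n - gt > 1) quicksort3(a + gt, n - gt);
}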
I need to optimize my algorithm for counting the numbers in an (unsorted) array that are larger than, smaller than, or equal to a given number.
I have to do this many times, and the array can have thousands of elements.
The array doesn't change; the number changes.
Example:
array: 1,2,3,4,5
n = 3
Number of <: 2
Number of >: 2
Number of ==: 1
First thought:
Iterate through the array and check whether each element is >, < or == n.
O(n*k)
Possible optimization:
O((n+k) * log n)
First sort the array (I'm using C's qsort), then use binary search to find an equal number, and then somehow count the smaller and larger values. But how do I do that?
If the element exists (bsearch returns a pointer to it), I also need to check whether the array contains duplicates of it (so I need to check before and after the element while the values are equal to the found element), and then use some pointer arithmetic to count the larger and smaller values.
How do I get the number of larger/smaller values given a pointer to an equal element?
And what do I do if I don't find the value (bsearch returns NULL)?
If the array is unsorted, and the numbers in it have no other useful properties, there is no way to beat an O(n) approach of walking the array once, and counting items in the three buckets.
Sorting the array followed by a binary search would be no better than O(n), assuming that you employ a sort algorithm that is linear in time (e.g. a radix sort). For comparison-based sorts, such as quicksort, the timing would increase to O(n*log2n).
On the other hand, sorting would help if you need to run multiple queries against the same set of numbers. The timing for k queries against n numbers would go from O(n*k) for k linear searches to O(n+k*log2n) assuming a linear-time sort, or O((n+k)*log2n) with comparison-based sort. Given a sufficiently large k, the average query time would go down.
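As a rough sketch of the multi-query approach in C (lower_bound and upper_bound here are hand-rolled helpers, not standard library functions): sort once with qsort, then answer each query with two binary searches; the three counts follow directly from the two indices.

#include <stdio.h>
#include <stdlib.h>

static int cmp_int(const void *a, const void *b) {
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* index of the first element >= x */
static size_t lower_bound(const int *a, size_t n, int x) {
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (a[mid] < x) lo = mid + 1; else hi = mid;
    }
    return lo;
}

/* index of the first element > x */
static size_t upper_bound(const int *a, size_t n, int x) {
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (a[mid] <= x) lo = mid + 1; else hi = mid;
    }
    return lo;
}

int main(void) {
    int a[] = {1, 2, 3, 4, 5};
    size_t n = sizeof a / sizeof a[0];
    qsort(a, n, sizeof a[0], cmp_int);     /* O(n log n), done once */

    int x = 3;                             /* one query, O(log n)   */
    size_t lo = lower_bound(a, n, x);
    size_t hi = upper_bound(a, n, x);
    printf("<: %zu  ==: %zu  >: %zu\n", lo, hi - lo, n - hi);  /* prints <: 2  ==: 1  >: 2 */
    return 0;
}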
Since the array is (apparently?) not changing, presort it. This allows a binary search (O(log n)).
a.) implement your own version of bsearch (it will be less code anyhow)
you can do it inline using indices vs. pointers
you won't need function pointers to a specialized function
b.) Since you say that you want to count the number of matches, you imply that the array can contain multiple entries with the same value (otherwise you would have used a boolean has_n).
This means you'll need to do a linear search for the beginning and end of the run of "n"s (see the sketch below).
From those positions you can calculate the number of values less than n and greater than n.
It appears that you have some unwritten algorithm for choosing these values (for n=3 you look for the count of values greater than and less than 2 and equal to 1), so there is no way to give specific code.
c.) For further optimization (at the expense of memory) you can sort the data into a binary search tree of structs that hold not just the value, but also the count and the number of values before and after each value. It may not use more memory at all if you have a lot of repeat values, but it is hard to tell without the dataset.
That's as much as I can help without code that describes your hidden algorithms and data or at least a sufficient description (aside from recommending a course or courses in data structures and algorithms).
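For part (b), a minimal sketch of the bsearch-plus-scan idea, assuming int data (count_around and cmp_int are my own names): bsearch finds some element equal to x, the scan widens that hit to the whole run of duplicates, and pointer arithmetic gives the three counts. If bsearch returns NULL, this sketch falls back to a simple scan, though a lower-bound style binary search would also work.

#include <stdio.h>
#include <stdlib.h>

static int cmp_int(const void *a, const void *b) {
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Given a sorted array and a value x, derive the three counts.
 * bsearch may point anywhere inside a run of duplicates, so widen the hit. */
static void count_around(const int *a, size_t n, int x,
                         size_t *less, size_t *equal, size_t *greater) {
    const int *hit = bsearch(&x, a, n, sizeof *a, cmp_int);
    if (!hit) {
        /* value absent: fall back to a scan (a lower-bound binary search works too) */
        size_t lt = 0;
        while (lt < n && a[lt] < x) lt++;
        *less = lt; *equal = 0; *greater = n - lt;
        return;
    }
    const int *first = hit, *last = hit;
    while (first > a && first[-1] == x) first--;       /* walk left over duplicates  */
    while (last < a + n - 1 && last[1] == x) last++;   /* walk right over duplicates */
    *less = (size_t)(first - a);
    *equal = (size_t)(last - first + 1);
    *greater = n - *less - *equal;
}

int main(void) {
    int a[] = {1, 2, 3, 3, 4, 5};
    size_t n = sizeof a / sizeof a[0];
    qsort(a, n, sizeof a[0], cmp_int);
    size_t lt, eq, gt;
    count_around(a, n, 3, &lt, &eq, &gt);
    printf("<: %zu  ==: %zu  >: %zu\n", lt, eq, gt);   /* prints <: 2  ==: 2  >: 2 */
    return 0;
}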
The best-case scenario of insertion sort is said to be O(n); however, if you have 2 elements in an array that are already sorted, such as 10 and 11, doesn't it only make one comparison rather than 2?
Time complexity of O(n) does not mean that the number of steps is exactly n, it means that the number of steps is dominated by a linear function. Basically, sorting twice as many elements should take at most twice as much time for large numbers.
The best case scenario for insertion sort is when you can insert the new element after just one comparison. This can happen in only 2 cases:
You are inserting elements from a reverse-sorted list and you compare each new element with the first element of the target list.
You are inserting elements from a sorted list and you compare each new element with the last element of the target list.
In these 2 cases, each new element is inserted after just one comparison, including in the case you mention.
The time complexity would indeed be O(n) for these very special cases. You do not need such a favorable case for this complexity: the time complexity will be O(n) as long as there is a constant upper bound on the number of comparisons, independent of the list length.
Note that it is common to handle already-sorted input as a special case. If the optimization described in case 2 above is not implemented, sorting an already sorted list would be the worst-case scenario, with n comparisons for the insertion of the (n+1)th element.
In the general case, insertion sort on lists has a time complexity of O(n^2), but careful implementation can produce an optimal solution for already sorted lists.
Note that this is true for lists where inserting at any position has a constant cost; insertion sort on arrays does not have this property. It can still be optimized to handle these special cases, but not both at the same time.
Insertion sort does N - 1 comparisons if the input is already sorted.
This is because it compares every element with the previous one and does something if the order is not right (what it does is not important here, because the order is always right). So it does this N - 1 times.
It looks like you need to understand big-O notation. O(n) does not mean n operations; it does not even mean close to n operations (n/10^9 is O(n) and is not really close to n). All it means is that the function is approximately linear (think of it as a limit where n -> inf).
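A tiny sketch in C (the comparison counter exists only for illustration): on already-sorted input the inner loop stops after a single comparison for each of the n-1 inserted elements, so the array {10, 11} from the question costs exactly one comparison.

#include <stdio.h>

/* insertion sort that counts comparisons, purely to illustrate the point */
static long insertion_sort_count(int *a, int n) {
    long comparisons = 0;
    for (int i = 1; i < n; i++) {
        int v = a[i];
        int j = i - 1;
        while (j >= 0) {
            comparisons++;
            if (a[j] <= v) break;   /* element already in place: one comparison, done */
            a[j + 1] = a[j];
            j--;
        }
        a[j + 1] = v;
    }
    return comparisons;
}

int main(void) {
    int a[] = {10, 11};
    int n = (int)(sizeof a / sizeof a[0]);
    printf("comparisons on sorted input: %ld\n", insertion_sort_count(a, n));  /* prints 1, i.e. n - 1 */
    return 0;
}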
Got asked this in a lecture... stumped by it a bit.
How can you guarantee that quicksort will always sort an array of integers?
Thanks.
Gratuitously plagiarising Wikipedia:
The correctness of the partition algorithm is based on the following two arguments:
At each iteration, all the elements processed so far are in the desired position: before the pivot if less than the pivot's value, after the pivot if greater than the pivot's value (loop invariant).
Each iteration leaves one fewer element to be processed (loop variant).
The correctness of the overall algorithm can be proven via induction: for zero or one element, the algorithm leaves the data unchanged; for a larger data set it produces the concatenation of two parts, elements less than the pivot and elements greater than it, themselves sorted by the recursive hypothesis.
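To make the invariant and variant concrete, here is a sketch of the Lomuto partition scheme in C (one common textbook formulation, not necessarily the one your lecturer had in mind), with the two arguments written out as comments.

/* Lomuto partition sketch: returns the final index of the pivot.
 * Loop invariant, before each iteration of the for loop:
 *   a[lo .. i]     holds elements <  pivot  (already in the right region)
 *   a[i+1 .. j-1]  holds elements >= pivot  (already in the right region)
 *   a[j .. hi-1]   is still unprocessed
 * Each iteration processes exactly one more element (loop variant),
 * so the loop terminates with every processed element on the correct side. */
static int partition_lomuto(int *a, int lo, int hi) {
    int pivot = a[hi];
    int i = lo - 1;
    for (int j = lo; j < hi; j++) {
        if (a[j] < pivot) {
            i++;
            int t = a[i]; a[i] = a[j]; a[j] = t;
        }
    }
    int t = a[i + 1]; a[i + 1] = a[hi]; a[hi] = t;
    return i + 1;   /* pivot is now in its final sorted position */
}

/* Induction on subarray length: 0 or 1 elements are already sorted;
 * otherwise the pivot is in place and both halves are sorted recursively. */
void quicksort(int *a, int lo, int hi) {
    if (lo < hi) {
        int p = partition_lomuto(a, lo, hi);
        quicksort(a, lo, p - 1);
        quicksort(a, p + 1, hi);
    }
}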
Quicksort functions by taking a pivot value and splitting the remaining data into two groups, one higher and one lower. You then do this to each group in turn until you get groups no larger than one. At that point you can guarantee that the data is sorted, because any pivot value is in the correct place: it has been directly compared against another pivot value which is also in the correct place. In the end you are left with sets of size 1 or size 0, which cannot be rearranged and are therefore already sorted.
Hope this helps, it was what we were taught for A Level Further Mathematics (16-18, UK).
Your professor may be referring to "stability." Have a look here: http://en.wikipedia.org/wiki/Stable_sort#Stability. Stable sorting algorithms maintain the relative order of records with equal keys. If all keys are different then this distinction is not necessary.
Quicksort (in efficient implementations) is not a stable sort, so one way to guarantee stability would be to ensure that there are no duplicate integers in your array.
I have an array of N numbers which are all the same. I am applying quicksort to it.
What should be the time complexity of the sorting in this case?
I googled this question but did not find an exact explanation.
Any help would be appreciated.
This depends on the implementation of quicksort. The traditional implementation, which partitions into 2 sections (< and >=), will take O(n*n) on identical input. While no swaps necessarily occur, it causes n recursive calls to be made, each of which has to compare the pivot against n - recursionDepth elements, i.e. O(n*n) comparisons need to be made.
However, there is a simple variant which partitions into 3 sets (<, = and >). This variant has O(n) performance in this case: instead of choosing the pivot, swapping and then recursing on 0 to pivotIndex-1 and pivotIndex+1 to n, it swaps everything equal to the pivot into the 'middle' partition (which, when all inputs are identical, always means swapping an element with itself, i.e. a no-op). The call stack is then only 1 deep in this particular case; n comparisons and no swaps occur. I believe this variant has made its way into the standard library on Linux, at least.
The performance of quicksort depends on the pivot selection. The closer the chosen pivot is to the median element, the better quicksort's performance.
In this specific case you're lucky: the pivot you select will always be a median, since all values are the same. The partition step of quicksort will hence never have to swap elements, and the two pointers will meet exactly in the middle. The two subproblems will therefore be exactly half the size, giving you a perfect O(n log n).
To be a little more specific, this depends on how well the partition step is implemented. The loop invariant only needs to ensure that smaller elements end up in the left-hand sub-problem and greater elements in the right-hand sub-problem. There is no guarantee that a partition implementation never swaps equal elements, but it is always unnecessary work, so no clever implementation should do it: the left and right pointers will never detect an inversion with respect to the pivot (i.e. you will never hit the case where *left > pivot && *right < pivot), so the left pointer is incremented and the right pointer decremented at every step, and they finally meet in the middle, generating subproblems of size n/2.
It depends on the particular implementation.
If there is only one kind of comparison (≤ or <) to determine where the other elements go relative to the pivot, they will all go into one of the partitions, and you will get O(n^2) performance, since the problem size will decrease by only 1 each step.
The algorithm listed here is an example (the accompanying illustrations are for a different algorithm).
If there are two kinds of comparisons, for example < for elements on the left and > for elements on the right, as is the case in a two-pointer implementation, and if you take care to move the pointers in step, then you might get perfect O(n log n) performance, because half the equal elements will be split evenly in the two partitions.
The illustrations in the link above use an algorithm which doesn't move pointers in step, so you still get poor performance (look at the "Few unique" case).
So it depends whether you have this special case in mind when implementing the algorithm.
Practical implementations often handle a broader special case: if there are no swaps in the partitioning step, they assume the data is nearly sorted, and use an insertion sort, which gives an even better O(n) in the case of all equal elements.
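A sketch of that two-pointer idea in C, following the Hoare-style partition scheme as presented on Wikipedia (the function names are mine): both scans stop on elements equal to the pivot, so on all-equal input the pointers advance one position per outer iteration, swap (which has no visible effect on equal values), and meet near the middle, halving the subarray at each level of recursion.

/* Hoare-style partition sketch: returns an index j such that every element
 * of a[lo..j] is <= the pivot value and every element of a[j+1..hi] is >= it. */
static int partition_hoare(int *a, int lo, int hi) {
    int pivot = a[lo + (hi - lo) / 2];
    int i = lo - 1, j = hi + 1;
    for (;;) {
        do { i++; } while (a[i] < pivot);   /* stops on elements == pivot ...        */
        do { j--; } while (a[j] > pivot);   /* ... on both sides, which splits them  */
        if (i >= j) return j;
        int t = a[i]; a[i] = a[j]; a[j] = t;
    }
}

void quicksort_hoare(int *a, int lo, int hi) {
    if (lo < hi) {
        int p = partition_hoare(a, lo, hi);
        quicksort_hoare(a, lo, p);          /* note: p, not p - 1, with this scheme */
        quicksort_hoare(a, p + 1, hi);
    }
}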
tobyodavies has provided the right solution. It does handle the case and finishes in O(n) time when all the keys are equal.
It is the same partitioning as in the Dutch national flag problem:
http://en.wikipedia.org/wiki/Dutch_national_flag_problem
Sharing the code from Princeton:
http://algs4.cs.princeton.edu/23quicksort/Quick3way.java.html
If you implement the 2-way partitioning algorithm, then at every step the array will be halved. This is because when identical keys are encountered, the scan stops. As a result, at each step the partitioning element will be positioned at the center of the subarray, thereby halving the array in every subsequent recursive call. Now, this case is similar to the mergesort case, which uses ~N lg N compares to sort an array of N elements. Ergo, for duplicate keys the traditional 2-way partitioning algorithm for quicksort uses ~N lg N compares, thereby following a linearithmic approach.
Quicksort code is written using "partition" and "quicksort" functions.
Basically, there are two main ways of implementing quicksort.
The difference between these two is only the "partition" function:
1. Lomuto
2. Hoare
With a partitioning algorithm such as the Lomuto partition scheme described above (even one that chooses good pivot values), quicksort exhibits poor performance for inputs that contain many repeated elements. The problem is clearly apparent when all the input elements are equal: at each recursion, the left partition is empty (no input values are less than the pivot), and the right partition has only decreased by one element (the pivot is removed). Consequently, the Lomuto partition scheme takes quadratic time to sort an array of equal values.
So, this takes O(n^2) time using the Lomuto partition algorithm.
By using the Hoare partition algorithm we get the best case when all the array elements are equal: the partitions are balanced and the time complexity is O(n log n).
Reference: https://en.wikipedia.org/wiki/Quicksort
I am trying to sort an array which has properties like
it increases up to some point, then it starts decreasing, then increases and then decreases, and so on. Is there any algorithm which can sort this in less than n log(n) complexity by making use of it being partially ordered?
array example = 14,19,34,56,36,22,20,7,45,56,50,32,31,45......... up to n
Thanks in advance
Any sequence of numbers will go up and down and up and down again, etc., unless it is already fully sorted (it may start with a down, of course). You could run through the sequence noting the points where it changes direction, then merge-sort the sequences (reading the backward sequences in reverse).
In general the complexity is N log N because we don't know how sorted it is at this point. If it is moderately well sorted, i.e. there are fewer changes of direction, it will take fewer comparisons.
You could find the change / partition points, and perform a merge sort between pairs of partitions. This would take advantage of the existing ordering, as normally the merge sort starts with pairs of elements.
Edit: Just trying to figure out the complexity here. Merge sort is n log(n), where the log(n) relates to the number of times you have to re-partition. First every pair of elements, then every pair of pairs, etc., until you reach the size of the array. In this case you have n elements with p partitions, where p < n, so I'm guessing the complexity is p log(p), but am open to correction. E.g. merge each pair of partitions, and repeat based on half the number of partitions after the merge.
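A rough sketch of that approach in C (all names are my own): one pass records the positions where the direction changes and reverses the descending runs in place, then neighbouring runs are merged pairwise. Each merging pass halves the number of runs, so for p initial runs there are about log2(p) passes over the n elements.

#include <stdlib.h>
#include <string.h>

/* reverse a[lo..hi] in place */
static void reverse(int *a, size_t lo, size_t hi) {
    while (lo < hi) { int t = a[lo]; a[lo] = a[hi]; a[hi] = t; lo++; hi--; }
}

/* merge the two sorted runs a[lo..mid) and a[mid..hi) via the buffer tmp */
static void merge(int *a, int *tmp, size_t lo, size_t mid, size_t hi) {
    size_t i = lo, j = mid, k = lo;
    while (i < mid && j < hi) tmp[k++] = (a[j] < a[i]) ? a[j++] : a[i++];
    while (i < mid) tmp[k++] = a[i++];
    while (j < hi)  tmp[k++] = a[j++];
    memcpy(a + lo, tmp + lo, (hi - lo) * sizeof *a);
}

/* natural merge sort sketch: exploit the existing ascending/descending runs */
void natural_merge_sort(int *a, size_t n) {
    if (n < 2) return;
    size_t *bounds = malloc((n + 1) * sizeof *bounds);  /* run start positions */
    int *tmp = malloc(n * sizeof *tmp);
    if (!bounds || !tmp) { free(bounds); free(tmp); return; }

    /* 1. find maximal runs, reversing descending ones so every run ascends */
    size_t nruns = 0, i = 0;
    while (i < n) {
        size_t start = i++;
        if (i < n && a[i] < a[i - 1]) {               /* descending run */
            while (i < n && a[i] < a[i - 1]) i++;
            reverse(a, start, i - 1);
        } else {                                      /* ascending run  */
            while (i < n && a[i] >= a[i - 1]) i++;
        }
        bounds[nruns++] = start;
    }
    bounds[nruns] = n;

    /* 2. merge neighbouring runs pairwise until a single run remains:
          about log2(p) passes over the data for p initial runs */
    while (nruns > 1) {
        size_t w = 0;
        for (size_t r = 0; r + 1 < nruns; r += 2) {
            merge(a, tmp, bounds[r], bounds[r + 1], bounds[r + 2]);
            bounds[w++] = bounds[r];
        }
        if (nruns % 2) bounds[w++] = bounds[nruns - 1];  /* odd run carried over */
        bounds[w] = n;
        nruns = w;
    }
    free(bounds);
    free(tmp);
}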
See Topological sorting
If you know for a fact that the data are "almost sorted" and the set size is reasonably small (say an array that can be indexed by a 16-bit integer), then Shell is probably your best bet. Yes, it has a basic time complexity of O(n^2) (which can be reduced by the sequence used for gap sizing to a current best-worst-case of O(n*log^2(n))), but the performance improves with the sortedness of the input set to a best-case of O(n) on an already-sorted set. Using Sedgewick's sequence for gap size will give the best performance on those occasions when the input is not as sorted as you expected it to be.
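For reference, a bare-bones shellsort sketch in C, using Knuth's 3h+1 gap sequence rather than the Sedgewick sequence recommended above, just to keep it short: each pass is a gapped insertion sort, and the final gap-1 pass benefits directly from input that is already nearly sorted.

#include <stddef.h>

/* shellsort sketch with Knuth's 3h+1 gaps: 1, 4, 13, 40, ... */
void shellsort(int *a, size_t n) {
    size_t gap = 1;
    while (gap < n / 3) gap = 3 * gap + 1;
    for (; gap > 0; gap /= 3) {
        /* gapped insertion sort: nearly-sorted input needs very few moves */
        for (size_t i = gap; i < n; i++) {
            int v = a[i];
            size_t j = i;
            while (j >= gap && a[j - gap] > v) {
                a[j] = a[j - gap];
                j -= gap;
            }
            a[j] = v;
        }
    }
}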
Strand Sort might be close to what you're looking for. O(n sqrt(n)) in the average case, O(n) best case (list already sorted), O(n^2) worst case (list sorted in reverse order).
Share and enjoy.