Sort an array which is partially sorted

Sort an array which is partially sorted - c

I am trying to sort an array with the following property:
it increases up to some point, then starts decreasing, then increases again, then decreases, and so on. Is there any algorithm that can sort this in less than O(n log n) time by making use of the partial ordering?
Example array: 14, 19, 34, 56, 36, 22, 20, 7, 45, 56, 50, 32, 31, 45, ... up to n elements
Thanks in advance

Any sequence of numbers will go up and down and up and down again, etc., unless it is already fully sorted (it may start with a down, of course). You could run through the sequence noting the points where it changes direction, then merge-sort the resulting runs (reading the descending runs in reverse).
In general the complexity is N log N because we don't know how sorted it is at this point. If it is moderately well sorted, i.e. there are fewer changes of direction, it will take fewer comparisons.

You could find the change / partition points, and perform a merge sort between pairs of partitions. This takes advantage of the existing ordering, whereas a normal merge sort starts with pairs of single elements.
Edit: Just trying to figure out the complexity here. Merge sort is n log(n), where the log(n) relates to the number of merge passes: first every pair of elements, then every pair of pairs, etc., until you reach the size of the array. In this case you have n elements in p partitions, where p <= n, so only about log(p) passes are needed: merge each pair of partitions, then repeat on half the number of partitions. Each pass still touches all n elements, so I'm guessing the complexity is n log(p), but am open to correction.
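A minimal Python sketch of the run-detection-plus-merge idea from the answers above. The function names (split_into_runs, merge, natural_merge_sort) are made up for illustration, and this is only one way to do it, not a tuned implementation.

```python
def split_into_runs(a):
    """Split a into maximal monotone runs; every run is returned ascending."""
    runs = []
    i, n = 0, len(a)
    while i < n:
        j = i + 1
        if j < n and a[j] < a[i]:
            # Strictly descending run: extend it, then store it reversed.
            while j < n and a[j] < a[j - 1]:
                j += 1
            runs.append(a[i:j][::-1])
        else:
            # Non-descending run (also covers a single trailing element).
            while j < n and a[j] >= a[j - 1]:
                j += 1
            runs.append(a[i:j])
        i = j
    return runs


def merge(left, right):
    """Standard two-way merge of two ascending lists."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        if right[j] < left[i]:
            out.append(right[j])
            j += 1
        else:
            out.append(left[i])
            i += 1
    out.extend(left[i:])
    out.extend(right[j:])
    return out


def natural_merge_sort(a):
    """Merge neighbouring runs pairwise until one run is left."""
    runs = split_into_runs(list(a))
    if not runs:
        return []
    while len(runs) > 1:
        merged = [merge(runs[k], runs[k + 1]) for k in range(0, len(runs) - 1, 2)]
        if len(runs) % 2:
            merged.append(runs[-1])  # odd run out, carried to the next pass
        runs = merged
    return runs[0]
```

On the example array from the question, split_into_runs finds 5 runs, so three merge passes are enough instead of the four a bottom-up merge sort would need for 14 single elements.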

See Topological sorting

If you know for a fact that the data are "almost sorted" and the set size is reasonably small (say an array that can be indexed by a 16-bit integer), then Shellsort is probably your best bet. Yes, it has a basic time complexity of O(n^2) (which the choice of gap sequence can reduce to a current best worst case of O(n*log^2(n))), but the performance improves with the sortedness of the input set, to a best case of O(n) on an already-sorted set. Using Sedgewick's sequence for gap size will give the best performance on those occasions when the input is not as sorted as you expected it to be.
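For reference, a bare-bones Shellsort sketch in Python. It uses the simple "halve the gap" sequence purely as an assumption to keep the example short; it is not Sedgewick's sequence, and the name shell_sort is just illustrative.

```python
def shell_sort(items):
    """Shellsort with gaps n//2, n//4, ..., 1 (a simplifying assumption)."""
    a = list(items)
    gap = len(a) // 2
    while gap > 0:
        # Gapped insertion sort: each pass leaves the list gap-sorted.
        for i in range(gap, len(a)):
            key = a[i]
            j = i
            while j >= gap and a[j - gap] > key:
                a[j] = a[j - gap]
                j -= gap
            a[j] = key
        gap //= 2
    return a
```

On already-sorted input the inner while loop never shifts anything, which is where the good behaviour on nearly sorted data comes from.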

Strand Sort might be close to what you're looking for. O(n sqrt(n)) in the average case, O(n) best case (list already sorted), O(n^2) worst case (list sorted in reverse order).
Share and enjoy.
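A rough Python sketch of strand sort, for anyone who hasn't met it. It pulls an ascending "strand" out of the remaining input and merges it into the result, repeating until nothing is left. heapq.merge from the standard library does the merging; the name strand_sort is only illustrative.

```python
import heapq


def strand_sort(items):
    """Repeatedly extract an ascending strand and merge it into the result."""
    remaining = list(items)
    result = []
    while remaining:
        strand = [remaining[0]]
        rest = []
        for x in remaining[1:]:
            if x >= strand[-1]:
                strand.append(x)   # x extends the current ascending strand
            else:
                rest.append(x)     # x waits for a later strand
        remaining = rest
        result = list(heapq.merge(result, strand))
    return result
```

An already-sorted list is consumed in a single strand, while a reverse-sorted list yields one element per strand, matching the best and worst cases mentioned above.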

Related

How many comparisons does insertion sort do in an already-ordered 2-element array?

The best case scenario of insertion sort is meant to be O(n), however, if you have 2 elements in an array that are already sorted, such as 10 and 11, doesn't it only make one comparison rather than 2?

Time complexity of O(n) does not mean that the number of steps is exactly n, it means that the number of steps is dominated by a linear function. Basically, sorting twice as many elements should take at most twice as much time for large numbers.
The best case scenario for insert sort is when you can insert the new element after just one comparison. This can happen in only 2 cases:
You are inserting elements in from a reverse sorted list and you compare the new element with the first element of the target list.
You are inserting elements from a sorted list and you compare the new element with the last one of the target list.
In these 2 cases, each new element is inserted after just one comparison, including in the case you mention.
The time complexity would be indeed O(n) for these very special cases. You do not need such a favorable case for this complexity, the time complexity will be O(n) if there is a constant upper bound for the number of comparisons independent of the list length.
Note that it is a common optimization to try and handle sorted lists in an optimized way. If the optimization mentioned in the second paragraph above is not implemented, sorting an already sorted list would be the worst case scenario, with n comparisons for the insertion of the n+1th element.
In the general case, insertion sort on lists has a time complexity of O(n^2), but a careful implementation can produce an optimal solution for already sorted lists.
Note that this is true for lists where inserting at any position has a constant cost; insertion sort on arrays does not have this property. It can still be optimized to handle these special cases, but not both at the same time.

Insertion sort does N - 1 comparisons if the input is already sorted.
This is because for every element it compares it with a previous element and does something if the order is not right (it is not important what it does now, because the order is always right). So you will do this N - 1 times.
So it looks like you need to understand big O notation. O(n) does not mean n operations; it does not even mean close to n operations (n/10^9 is O(n) and it is not really close to n). All it means is that the function grows at most linearly (think about it as a limit where n -> inf).
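To make the N - 1 figure concrete, here is a small Python sketch of array-based insertion sort that counts element comparisons. The function name insertion_sort_count is made up for this example.

```python
def insertion_sort_count(items):
    """Return (sorted copy, number of element comparisons performed)."""
    a = list(items)
    comparisons = 0
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        while j >= 0:
            comparisons += 1
            if a[j] <= key:
                break              # key is already in place relative to a[j]
            a[j + 1] = a[j]        # shift the larger element one slot right
            j -= 1
        a[j + 1] = key
    return a, comparisons
```

For the two-element input [10, 11] this does exactly one comparison, and for any sorted input of length N it does N - 1; a reverse-sorted input of length N costs N(N-1)/2.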

Efficient way to compute sum of k largest numbers in a list?

I was reading some practice interview questions and I have a question about this one. Assume a list of random integers, each between 1 and 100; compute the sum of the k largest integers. Discuss space and time complexity and whether the approach changes if each integer is between 1 and m, where m varies.
My first thought is to sort the array and compute the sum of the largest k numbers. Then I thought about using a binary tree structure, where I could start looking from the bottom-right of the tree. I am not sure whether my approach would change if the numbers are 1 to m rather than 1 to 100. Any thoughts on the most efficient approach?

The most efficient way might be to use something like randomized quickselect. It doesn't do the sorting step to completion and instead does just the partition step from quicksort. If you don't need the k largest integers in any particular order, this is the way I'd go. It takes expected linear time, but the analysis is not very straightforward. m would have little impact on this. Also, you can write the code in such a way that the sum is computed as you partition the array.
Time: O(n) expected
Space: O(1)
The alternative is sorting using something like counting sort which has a linear time guarantee. As you say the values are integers in a fixed range, it would work quite well. As m increases the space requirement goes up, but computing the sum is quite efficient within the buckets.
Time: O(n + m) in the worst case (one pass over the n values to fill the buckets, then one pass over at most m buckets)
Space: O(m)
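A short Python sketch of the counting (bucket) approach, assuming every value is an integer in 1..m as in the question. The name sum_k_largest_counting is hypothetical.

```python
def sum_k_largest_counting(values, k, m):
    """Sum of the k largest values, assuming each value is an int in 1..m."""
    counts = [0] * (m + 1)          # counts[v] = how many times v occurs
    for v in values:
        counts[v] += 1
    total = 0
    remaining = k
    # Walk the buckets from the largest value down, taking what is needed.
    for v in range(m, 0, -1):
        if remaining == 0:
            break
        take = min(counts[v], remaining)
        total += take * v
        remaining -= take
    return total
```

If k is larger than the list, this simply sums everything; whether that is the desired behaviour is a design choice the question leaves open.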

I'd say sorting is probably unnecessary. If k is small, then all you need to do is maintain a sorted list that truncates elements beyond the kth largest element.
Each step in this should be O(k) in the worst possible case where the element added is maximized. However, the average case scenario is much better, after a certain number of elements, most should just be smaller than the last element in the list and the operation will be O(log(k)).

One way is to use a min-heap (implemented as a binary tree) of maximum size k. To see if a new element belongs in the heap or not is only O(1) since it's a min-heap and retrieval of minimum element is a constant time operation. Each insertion step (or non-insertion...in the case of an element that is too small to be inserted) along the O(n) list is O(log k). The final tree traversal and summation step is O(k).
Total complexity:
O(n log k + k) = O(n log k)
Unless you have multiple cores running on your computer, in which case, parallel computing is an option, summation should only be done at the end. On-the-fly-computing adds additional computation steps without actually reducing your time complexity at all (you will actually have more computations to do) . You will always have to sum k elements anyways, so why not avoid the additional addition and subtraction steps?
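The min-heap approach might look like this in Python, using the standard heapq module. Summation is left to the end, as suggested above. The name sum_k_largest_heap is made up for the example.

```python
import heapq


def sum_k_largest_heap(values, k):
    """Keep the k largest values seen so far in a min-heap, then sum them."""
    if k <= 0:
        return 0
    heap = []
    for v in values:
        if len(heap) < k:
            heapq.heappush(heap, v)
        elif v > heap[0]:
            # heap[0] is the smallest of the current top k: an O(1) check,
            # followed by an O(log k) replacement only when v belongs.
            heapq.heapreplace(heap, v)
    return sum(heap)
```

The standard library also offers heapq.nlargest(k, values), which returns the k largest items directly if you would rather not manage the heap by hand.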

Find the one non-repeating element in array?

I have an array of n elements in which exactly one element is not repeated; all the other numbers appear more than once. There is no limit on the range of the numbers in the array.
Some solutions are:
Using a hash table, which gives linear time complexity but very poor space complexity
Sorting the list using MergeSort in O(n log n) and then finding the element which doesn't repeat
Is there a better solution?

One general approach is to implement a bucketing technique (of which hashing is such a technique) to distribute the elements into different "buckets" using their identity (say index) and then find the bucket with the smallest size (1 in your case). This problem, I believe, is also known as the minority element problem. There will be as many buckets as there are unique elements in your set.
Doing this by hashing is problematic because of collisions and how your algorithm might handle that. Certain associative array approaches such as tries and extendable hashing don't seem to apply as they are better suited to strings.
One application of the above is to the Union-Find data structure. Your sets will be the buckets and you'll need to call MakeSet() and Find() for each element in your array for a cost of $O(\alpha(n))$ per call, where $\alpha(n)$ is the extremely slow-growing inverse Ackermann function. You can think of it as being effectively a constant.
You'll have to do a Union when an element already exists. With some changes to keep track of the set with minimum cardinality, this solution should work. The time complexity of this solution is $O(n\alpha(n))$.
Your problem also appears to be loosely related to the Element Uniqueness problem.

Try a multi-pass scanning if you have strict space limitation.
Say the input has n elements and you can only hold m elements in your memory. If you use a hash-table approach, in the worst case you need to handle n/2 unique numbers so you want m>n/2. In case you don't have that big m, you can partition n elements to k=(max(input)-min(input))/(2m) groups, and go ahead scan the n input elements k times (in the worst case):
1st run: you only hash-get/put/mark/whatever elements with value < min(input)+m*2; because in the range (min(input), min(input)+m*2) there are at most m unique elements and you can handle that. If you are lucky you already find the unique one, otherwise continue.
2nd run: only operate on elements with value in range (min(input)+m*2, min(input)+m*4), and
so on, so forth
In this way, you trade the time complexity up to O(kn), but you get a space complexity bound of O(m).

Two ideas come to my mind:
A smoothsort may be a better alternative than the cited mergesort for your needs, given that it uses O(1) memory, is O(n log n) in the worst case like merge sort, but is O(n) in the best case;
Based on the (reverse) idea of the splay tree, you could make a type of tree that pushes the leaves toward the bottom once they are used (instead of upward as in the splay tree). This would still give you an O(n log n) implementation of the sort, but the advantage would be the O(1) step of finding the unique element: it would be the root. Sorting and then scanning costs O(n log n) + O(n), whereas this algorithm would cost O(n log n) + O(1).
Otherwise, as you stated, using a hash based solution (like hash-implemented set) would result in a O(n) algorithm (O(n) to insert and add a counting reference to it and O(n) to traverse your set to find the unique element) but you seemed to dislike the memory usage, though I don't know why. Memory is cheap, these days...
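For completeness, the hash-based version is only a few lines in Python with collections.Counter. This is a sketch with an illustrative name (find_unique); it takes O(n) expected time and O(number of distinct values) extra space.

```python
from collections import Counter


def find_unique(items):
    """Return the first element that occurs exactly once, or None."""
    counts = Counter(items)        # one pass to count occurrences
    for x in items:                # second pass to find the count-1 element
        if counts[x] == 1:
            return x
    return None
```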

Quicksort complexity when all the elements are same?

I have an array of N numbers which are all the same. I am applying quicksort to it.
What would the time complexity of the sort be in this case?
I googled this question but did not find an exact explanation.
Any help would be appreciated.

This depends on the implementation of Quicksort. The traditional implementation, which partitions into 2 sections (< and >=), will be O(n*n) on identical input. While no swaps will necessarily occur, it will cause n recursive calls to be made, each of which needs to compare the pivot with n - recursionDepth elements, i.e. O(n*n) comparisons need to be made.
However, there is a simple variant which partitions into 3 sets (<, = and >). This variant has O(n) performance in this case: instead of choosing the pivot, swapping, and then recursing on 0 to pivotIndex-1 and pivotIndex+1 to n, it swaps all things equal to the pivot into the 'middle' partition (which in the case of all identical inputs always means swapping with itself, i.e. a no-op). The call stack will therefore only be 1 deep in this particular case, with n comparisons and no swaps. I believe this variant has made its way into the standard library on Linux at least.
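A compact Python sketch of the 3-way variant (Dijkstra-style partitioning, in place). The first element is used as the pivot only to keep the example short; a real implementation would pick the pivot more carefully. The name quicksort_3way is illustrative.

```python
def quicksort_3way(a, lo=0, hi=None):
    """In-place quicksort that partitions into <, == and > the pivot."""
    if hi is None:
        hi = len(a) - 1
    if lo >= hi:
        return
    pivot = a[lo]
    lt, i, gt = lo, lo + 1, hi
    # Invariant: a[lo:lt] < pivot, a[lt:i] == pivot, a[gt+1:hi+1] > pivot.
    while i <= gt:
        if a[i] < pivot:
            a[lt], a[i] = a[i], a[lt]
            lt += 1
            i += 1
        elif a[i] > pivot:
            a[i], a[gt] = a[gt], a[i]
            gt -= 1
        else:
            i += 1
    quicksort_3way(a, lo, lt - 1)   # elements smaller than the pivot
    quicksort_3way(a, gt + 1, hi)   # elements larger than the pivot
```

With all keys equal, the loop makes one pass, both recursive calls receive empty ranges, and the sort finishes after a single linear scan.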

The performance of quicksort depends on the pivot selection. The closer the chosen pivot is to the median element, the better is quicksort's performance.
In this specific case you're lucky: the pivot you select will always be a median, since all values are the same. The partition step of quicksort will hence never have to swap elements, and the two pointers will meet exactly in the middle. The two subproblems will therefore be exactly half the size, giving you a perfect O(n log n).
To be a little more specific, this depends on how well the partition step is implemented. The loop invariant only needs to make sure that smaller elements are in the left-hand sub-problem, while greater elements are in the right-hand sub-problem. There's no guarantee that a partition implementation never swaps equal elements. But it is always unnecessary work, so no clever implementation should do it: the left and right pointers will never detect an inversion with respect to the pivot (i.e. you will never hit the case where *left > pivot && *right < pivot), and so the left pointer will be incremented and the right pointer decremented at every step, and they will finally meet in the middle, generating subproblems of size n/2.

It depends on the particular implementation.
If there is only one kind of comparison (≤ or <) to determine where the other elements go relative to the pivot, they will all go into one of the partitions, and you will get O(n^2) performance, since the problem size will decrease by only 1 each step.
The algorithm listed here is an example (the accompanying illustrations are for a different algorithm).
If there are two kinds of comparisons, for example < for elements on the left and > for elements on the right, as is the case in a two-pointer implementation, and if you take care to move the pointers in step, then you might get perfect O(n log n) performance, because the equal elements will be split evenly between the two partitions.
The illustrations in the link above use an algorithm which doesn't move pointers in step, so you still get poor performance (look at the "Few unique" case).
So it depends whether you have this special case in mind when implementing the algorithm.
Practical implementations often handle a broader special case: if there are no swaps in the partitioning step, they assume the data is nearly sorted, and use an insertion sort, which gives an even better O(n) in the case of all equal elements.

tobyodavies has provided the right solution. It does handle the case and finish in O(n) time when all the keys are equal.
It is the same partitioning as in the Dutch national flag problem:
http://en.wikipedia.org/wiki/Dutch_national_flag_problem
Sharing the code from Princeton:
http://algs4.cs.princeton.edu/23quicksort/Quick3way.java.html

If you implement the 2-way partitioning algorithm then at every step the array will be halved. This is because when identical keys will be encountered, the scan stops. As a result at each step, the partitioning element will be positioned at the center of the subarray thereby halving the array in every subsequent recursive call. Now, this case is similar to the mergesort case which uses ~N lg N compares to sort an array of N elements. Ergo for duplicate keys, the traditional 2-way partitioning algorithm for Quicksort uses ~N lg N compares, thereby following a linearithmic approach.

Quicksort code is written using "partition" and "quicksort" functions.
Basically, there are two common ways of implementing Quicksort.
The only difference between the two is the "partition" function:
1. Lomuto
2. Hoare
With a partitioning algorithm such as the Lomuto partition scheme (even one that chooses good pivot values), quicksort exhibits poor performance for inputs that contain many repeated elements. The problem is clearly apparent when all the input elements are equal: at each recursion, the left partition is empty (no input values are less than the pivot), and the right partition has only decreased by one element (the pivot is removed). Consequently, the Lomuto partition scheme takes quadratic time to sort an array of equal values.
So, this takes O(n^2) time using the Lomuto partition algorithm.
With the Hoare partition algorithm, an array of all-equal elements is a good case instead: the two scans stop on every element equal to the pivot and meet in the middle, so each partition splits the array in half. The time complexity is O(n log n).
Reference: https://en.wikipedia.org/wiki/Quicksort
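To see the difference between the two partition schemes on equal keys, here is a Python sketch that counts element comparisons for each. Both follow the textbook formulations (last element as the Lomuto pivot, middle element as the Hoare pivot). The function names are invented for this example, and the input is kept small because the Lomuto version recurses n levels deep on equal keys.

```python
def lomuto_comparisons(items):
    """Quicksort with Lomuto partitioning; returns (sorted list, comparisons)."""
    a = list(items)
    count = 0

    def sort(lo, hi):
        nonlocal count
        if lo >= hi:
            return
        pivot = a[hi]
        i = lo
        for j in range(lo, hi):
            count += 1
            if a[j] < pivot:
                a[i], a[j] = a[j], a[i]
                i += 1
        a[i], a[hi] = a[hi], a[i]   # put the pivot in its final position
        sort(lo, i - 1)
        sort(i + 1, hi)

    sort(0, len(a) - 1)
    return a, count


def hoare_comparisons(items):
    """Quicksort with Hoare partitioning; returns (sorted list, comparisons)."""
    a = list(items)
    count = 0

    def sort(lo, hi):
        nonlocal count
        if lo >= hi:
            return
        pivot = a[(lo + hi) // 2]
        i, j = lo - 1, hi + 1
        while True:
            i += 1
            count += 1
            while a[i] < pivot:     # left scan stops on elements >= pivot
                i += 1
                count += 1
            j -= 1
            count += 1
            while a[j] > pivot:     # right scan stops on elements <= pivot
                j -= 1
                count += 1
            if i >= j:
                break
            a[i], a[j] = a[j], a[i]
        sort(lo, j)
        sort(j + 1, hi)

    sort(0, len(a) - 1)
    return a, count
```

For 64 equal keys the Lomuto version performs 64*63/2 = 2016 comparisons, while the Hoare version splits each range in half and stays in the n log n range.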

How can we find the i'th greatest element of the array?

Algorithm for finding the nth smallest/largest element in an array using a self-balancing binary search tree data structure.
I read the post "Find kth smallest element in a binary search tree in Optimum way", but the correct answer is not clear to me: I could not work it out on an example that I tried. A bit more explanation would be appreciated.

C.A.R. Hoare's select algorithm is designed for precisely this purpose. It executes in [expected] linear time, with logarithmic extra storage.
Edit: the obvious alternative of sorting, then picking the right element has O(N log N) complexity instead of O(N). Storing the i largest elements in sorted order requires O(i) auxiliary storage, and roughly O(N * i log i) complexity. This can be a win if i is known a priori to be quite small (e.g. 1 or 2). For more general use, select is usually better.
Edit2: offhand, I don't have a good reference for it, but described the idea in a previous answer.
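A simplified Python sketch of the selection idea (quickselect with a random pivot), written for the i'th greatest element with i counted from 1. It builds new lists instead of partitioning in place, so it uses O(N) extra storage rather than the logarithmic storage of the in-place algorithm; that is a simplification made here for readability. The name ith_greatest is illustrative.

```python
import random


def ith_greatest(items, i):
    """Return the i'th greatest element (i = 1 is the maximum)."""
    if not 1 <= i <= len(items):
        raise ValueError("i must be between 1 and len(items)")
    pool = list(items)
    while True:
        pivot = random.choice(pool)
        greater = [x for x in pool if x > pivot]
        equal_count = sum(1 for x in pool if x == pivot)
        if i <= len(greater):
            pool = greater                     # answer is among the larger ones
        elif i <= len(greater) + equal_count:
            return pivot                       # answer is the pivot value itself
        else:
            i -= len(greater) + equal_count    # skip everything >= pivot
            pool = [x for x in pool if x < pivot]
```

Only one side of each partition is kept, which is what brings the expected cost down from O(N log N) for a full sort to O(N).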

First sort the array descending, then take the ith element.

Create a sorted data structure to hold i elements and set the initial count to 0.
Process each element in the source array, adding it to that new structure until the new structure is full.
Then process the rest of the source array. For each one that is larger than the smallest in the sorted data structure, remove the smallest from that structure and put the new one in.
Once you've processed all elements in the source array, your structure will hold the i greatest elements. Just grab the last of these and you have your i'th greatest element.
Voila!
Alternatively, sort it then just grab the i'th element directly.

That's a fitting task for heaps, which feature very low insert and low delete_min costs, e.g. pairing heaps. It would have worst-case O(n*log(n)) performance. But since they are non-trivial to implement, better to check the selection algorithms suggested elsewhere first.

There are many strategies available for your task (if you don't focus on the self-balancing tree to begin with).
It's usually a tradeoff speed / memory. Most algorithms require either to modify the array in place or O(N) additional storage.
The solution with a self-balancing tree is in the latter category, but it's not the right choice here. The issue is that building the tree itself takes O(N*log N), which will dominate the later search term and give a final complexity of O(N*log N). Therefore you do no better than simply sorting the array, and you use a more complex data structure...
In general, the issue largely depends on the magnitude of i relative to N. If you think about it for a minute, for i == 1 it's trivial, right? It's called finding the maximum.
Well, the same strategy obviously works for i == 2 (carrying the 2 maximum elements around) in linear time. And it's also trivially symmetric: i.e. if you need to find the (N-1)th element, then just carry around the 2 minimum elements.
However, it loses efficiency when i is about N/2 or N/4. Carrying the i maximum elements then means sorting an array of size i... and thus we fall back on the N*log N wall.
Jerry Coffin pointed out a simple solution, which works well for this case. Here is the reference on Wikipedia. The full article also describes the Median of Median method: it's more reliable, but involves more work and is thus generally slower.

Create an empty list L
For each element x in the original list,
    add x in sorted position to L
    if L has more than i elements,
        pop the smallest one off L
if L has i elements,
    return the smallest element of L (the i-th greatest overall),
else
    return failure
This should take O(N log(i)), provided L supports O(log(i)) insertion and removal of the minimum. If i is assumed to be a constant, then it is O(N).
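The pseudocode above translates almost line for line into Python with the standard bisect module. Note the assumption: a plain Python list is used for L, so each insert and each pop from the front costs O(i) for shifting elements even though finding the position is O(log i). The name ith_greatest_bounded is made up.

```python
import bisect


def ith_greatest_bounded(items, i):
    """Keep the i largest values in a sorted list; return the i'th greatest."""
    if i < 1:
        return None
    kept = []                      # ascending, never longer than i after a step
    for x in items:
        bisect.insort(kept, x)     # insert x at its sorted position
        if len(kept) > i:
            kept.pop(0)            # drop the smallest of the i + 1 values
    if len(kept) == i:
        return kept[0]             # smallest of the i largest
    return None                    # fewer than i elements: failure
```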

Build a max-heap from the elements and extract the maximum i times; the last value extracted is the i'th greatest.
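One way to read the heap suggestion in Python: heapq only provides a min-heap, so this sketch stores negated values (which assumes the elements are numbers), letting the heap's minimum stand in for the maximum. Building the heap is O(N) and each pop is O(log N). The name ith_greatest_heap is illustrative.

```python
import heapq


def ith_greatest_heap(items, i):
    """Heapify once, pop i - 1 times, then read the top of the heap."""
    if not 1 <= i <= len(items):
        return None
    heap = [-x for x in items]     # negate so the min-heap yields the maximum
    heapq.heapify(heap)            # O(N) heap construction
    for _ in range(i - 1):
        heapq.heappop(heap)        # discard the i - 1 largest values
    return -heap[0]
```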
