Sorting a partially sorted array in O(n)

Hey so I'm just really stuck on this question.
I need to devise an algorithm (no need for code) that sorts a certain partially sorted array into a fully sorted array. The array has N real numbers, and the first N - [N/sqrt(N)] elements (the [] denotes the floor of this number) are sorted, while the rest are not. There are no special properties to the unsorted numbers at the end; in fact, I'm told nothing about them other than that they are real numbers like the rest.
The kicker is that the time complexity of the algorithm needs to be O(n).
My first thought was to try and sort only the unsorted numbers and then use a merge algorithm, but I can't figure out any sorting algorithm that would work here in O(n). So I'm thinking about this all wrong, any ideas?

This is not possible in the general case using a comparison-based sorting algorithm. You are most likely missing something from the question.
Imagine the partially sorted array [1, 2, 3, 4564, 8481, 448788, 145, 86411, 23477]. It contains 9 elements, the first 3 of which are sorted (note that floor(N/sqrt(N)) = floor(sqrt(N)), assuming you meant N/sqrt(N), and floor(sqrt(9)) = 3). The problem is that the unsorted elements all lie in a range that does not contain the sorted elements. That makes the sorted part of the array useless to any sorting algorithm, since those elements will stay where they are anyway (or be moved to the very end in the case where they are greater than the unsorted elements).
With this kind of input, you still need to sort, independently, N - floor(sqrt(N)) elements. And as far as I know, N - floor(sqrt(N)) ~ N (the ~ basically means "is the same complexity as"). So you are left with an array of approximately N elements to sort, which takes O(N log N) time in the general case.
Now, I specified "using a comparison-based sorting algorithm", because sorting real numbers (in some range, like the usual floating-point numbers stored in computers) can be done in amortized O(N) time using a hash sort (similar to a counting sort), or maybe even a modified radix sort if done properly. But the fact that a part of the array is already sorted doesn't help.
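For instance, here is a minimal bucket-sort sketch in Python, under the (important) assumption that the values are roughly uniformly distributed over a known range such as [0, 1); the names are mine, not from the question:

def bucket_sort(values):
    # Assumes values are roughly uniform in [0, 1); average-case O(N).
    n = len(values)
    buckets = [[] for _ in range(n)]
    for x in values:
        buckets[min(int(x * n), n - 1)].append(x)
    out = []
    for b in buckets:
        out.extend(sorted(b))   # each bucket holds O(1) items on average
    return out

print(bucket_sort([0.42, 0.07, 0.91, 0.33, 0.58]))
# [0.07, 0.33, 0.42, 0.58, 0.91]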

In other words, this means there are floor(sqrt(N)) unsorted elements at the end of the array. You can sort them with an O(n^2) algorithm, which will take O(sqrt(N)^2) = O(N) time; then do the merge you mentioned, which also runs in O(N). Both steps together therefore take just O(N).
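A minimal Python sketch of that two-step plan (my own illustration of the answer, assuming exactly floor(sqrt(N)) unsorted trailing elements):

import math

def sort_partially_sorted(a):
    # Assumes the first n - floor(sqrt(n)) elements are already sorted.
    n = len(a)
    k = n - math.isqrt(n)              # start of the unsorted tail
    prefix, tail = a[:k], a[k:]

    # Insertion sort the tail: O(sqrt(n)^2) = O(n).
    for i in range(1, len(tail)):
        x, j = tail[i], i - 1
        while j >= 0 and tail[j] > x:
            tail[j + 1] = tail[j]
            j -= 1
        tail[j + 1] = x

    # Linear merge of the two sorted parts: O(n).
    out, i, j = [], 0, 0
    while i < len(prefix) and j < len(tail):
        if prefix[i] <= tail[j]:
            out.append(prefix[i]); i += 1
        else:
            out.append(tail[j]); j += 1
    return out + prefix[i:] + tail[j:]

print(sort_partially_sorted([1, 2, 3, 4, 5, 6, 145, 23, 99]))
# [1, 2, 3, 4, 5, 6, 23, 99, 145]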


Understanding the Big O for squaring elements in an array

I was working on a problem on LeetCode where you have to square the numbers in a sorted array. Here is the original problem:
Given an array of integers A sorted in non-decreasing order, return an array of the squares of each number, also in sorted non-decreasing order.
I am trying to understand the big O for my code and for the code that was given in the solution.
This is my code
def sortedSquare(A):
    new_A = []
    for num in A:
        num = num * num
        new_A.append(num)
    return sorted(new_A)

print(sortedSquare([-4, -1, 0, 3, 10]))
Here is the code from the solution:
def sortedSquares(self, A):
    return sorted(x*x for x in A)
For the solution, the Big O is N log N, where N is the length of the array. I don't understand where the log N comes from; why isn't the Big O just N?
For my solution, I am seeing it as Big O of N because I am just iterating through the entire array.
Also, is my solution a good solution compared to the solution that was given?
Your solution does the exact same thing as the given solution. Both solutions square all the elements and then sort the resultant array, with the leetcode solution being a bit more concise.
The reason why both these solutions are O(N log N) is the use of sorted(). Python's built-in sort is Timsort, which sorts the array in O(N log N) time. The call to sorted(), not the squaring, provides the dominant factor in your time complexity (O(N log N) + O(N) = O(N log N)).
Note though that this problem can be solved in O(N) using two pointers or by using the merge step in mergesort.
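For completeness, a short Python sketch of the two-pointer approach (my illustration, not LeetCode's official solution): the largest square always sits at one of the two ends of the sorted input, so you can fill the output from the back in a single pass.

def sorted_squares(A):
    # Two pointers: the largest remaining square is at one of the two ends.
    n = len(A)
    out = [0] * n
    lo, hi = 0, n - 1
    for k in range(n - 1, -1, -1):
        if abs(A[lo]) > abs(A[hi]):
            out[k] = A[lo] * A[lo]
            lo += 1
        else:
            out[k] = A[hi] * A[hi]
            hi -= 1
    return out

print(sorted_squares([-4, -1, 0, 3, 10]))   # [0, 1, 9, 16, 100]
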
Edit:
David Eisenstat brought up a very good point about Timsort. Timsort detects ascending and strictly descending runs and merges them. Since the resulting squared array will be first decreasing and then increasing, Timsort will actually reverse the decreasing run and then merge the two runs in O(N).
The way complexity works is that the overall complexity for the whole program is the worst complexity for any one part. So, in your case, you have the part that squares the numbers and you have the part that sorts the numbers. So which part is the one that determines the overall complexity?
The squaring part is O(n) because you touch each element only once in order to square it.
What about the sorting part? Generally it depends on what sorting function you use:
Most sort routines have O(n*log(n)) because they use a divide and conquer algorithm.
Some (like bubble sort) have O(n^2)
Some (like the counting sort) have O(n)
In your case, they say that the given solution is O(n*log(n)), and since the squaring part is O(n), the sorting part must be O(n*log(n)). And since your code uses the same sorting function as the given solution, your sort must also be O(n*log(n)).
So your squaring part is O(n), your sorting part is O(n*log(n)), and the overall complexity is the worst of those: O(n*log(n)).
If extra storage space is allowed (as in your solution), the whole process can be performed in time O(N). The initial array is already sorted, so you can split it into two subsequences holding the negative and the positive values.
Square all elements (O(N)) and reverse the negative subsequence (O(N) at worst), so that both sequences are sorted. If one of the subsequences is empty, you are done.
Otherwise, merge the two sequences, in time O(N) (this is the step that uses extra O(N) space).
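Here is a rough Python sketch of that split/square/reverse/merge idea (my own illustration of the answer above):

def sorted_squares_merge(A):
    # Split at the first non-negative element.
    split = next((i for i, x in enumerate(A) if x >= 0), len(A))
    # Squares of the negatives, reversed so the sequence is ascending.
    neg = [x * x for x in A[split - 1::-1]] if split > 0 else []
    pos = [x * x for x in A[split:]]
    # Linear merge of the two sorted sequences (uses O(N) extra space).
    out, i, j = [], 0, 0
    while i < len(neg) and j < len(pos):
        if neg[i] <= pos[j]:
            out.append(neg[i]); i += 1
        else:
            out.append(pos[j]); j += 1
    return out + neg[i:] + pos[j:]

print(sorted_squares_merge([-4, -1, 0, 3, 10]))   # [0, 1, 9, 16, 100]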

How many comparisons does insertion sort do in an already-ordered 2-element array?

The best case scenario of insertion sort is meant to be O(n). However, if you have 2 elements in an array that are already sorted, such as 10 and 11, doesn't it only make one comparison rather than 2?
Time complexity of O(n) does not mean that the number of steps is exactly n, it means that the number of steps is dominated by a linear function. Basically, sorting twice as many elements should take at most twice as much time for large numbers.
The best case scenario for insert sort is when you can insert the new element after just one comparison. This can happen in only 2 cases:
You are inserting elements from a reverse-sorted list and you compare each new element with the first element of the target list.
You are inserting elements from a sorted list and you compare the new element with the last one of the target list.
In these 2 cases, each new element is inserted after just one comparison, including in the case you mention.
The time complexity would indeed be O(n) for these very special cases. You do not need such a favorable case for this complexity: the time complexity will be O(n) as long as there is a constant upper bound on the number of comparisons per insertion, independent of the list length.
Note that it is a common optimization to try to handle sorted lists in this optimized way. If the optimization mentioned in the second point above is not implemented, sorting an already sorted list would instead be the worst case scenario, with n comparisons for the insertion of the (n+1)-th element.
In the general case, insertion sort on lists has a time complexity of O(n^2), but careful implementation can produce an optimal solution for already sorted lists.
Note that this is true for lists, where inserting at any position has a constant cost; insertion sort on arrays does not have this property. It can still be optimized to handle these special cases, but not both at the same time.
Insertion sort does N - 1 comparisons if the input is already sorted.
This is because it compares every element with the previous one and does something if their order is not right (what it does is not important here, because the order is always right). So you will do this N - 1 times.
So it looks like you need to get comfortable with big-O notation. O(n) does not mean n operations; it does not even mean close to n operations (n/10^9 is O(n), and it is not really close to n). All it means is that the function is approximately linear (think about it as a limit where n -> inf).
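To see the N - 1 count concretely, here is a small Python sketch (the names are mine) of an insertion sort that stops scanning as soon as the order is right, with a comparison counter:

def insertion_sort_count(a):
    a = list(a)
    comparisons = 0
    for i in range(1, len(a)):
        x, j = a[i], i - 1
        while j >= 0:
            comparisons += 1          # one comparison: a[j] vs x
            if a[j] <= x:
                break                 # already in order: stop immediately
            a[j + 1] = a[j]           # shift right and keep scanning left
            j -= 1
        a[j + 1] = x
    return a, comparisons

print(insertion_sort_count([10, 11]))        # ([10, 11], 1), i.e. N - 1
print(insertion_sort_count([1, 2, 3, 4]))    # ([1, 2, 3, 4], 3)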

Searching algorithm with complexity O(log n), UNSORTED list/array

I had this exercise in an exam, which stated:
Find an algorithm which can search for the highest number in an
unsorted list and have a Big-Oh complexity of O(log(N)).
The only searching algorithm with a log n complexity that I have found is the binary search algorithm but that one requires my list/array to be sorted.
Is there such an algorithm?
This is a trick question. It has not been stated that the list has N elements. So you can use a change of variable and let N = 2^K. Now solve the problem with a linear algorithm on a list with K elements: O(K) = O(log N).
If we assume there are N elements in the list, a possible solution would be to use N parallel computing elements [CE_0 .. CE_N]. In the base case of the algorithm, we let each computing element CE_i in [CE_{N/2} .. CE_N] compare the list values x_{2i-N} and x_{2i-N+1}. Each computing element reports the larger of its two assigned values to CE_{i/2}. The iterative step of the algorithm is that each computing element CE_k that receives two reported values reports the larger to CE_{k/2}. This iterative logic continues until CE_0 processes a report from itself. Instead of reporting to itself again, it outputs the result.
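Here is a sequential Python simulation of that tournament (purely illustrative: on one CPU it still does O(N) comparisons in total, but the number of rounds, i.e. the parallel depth, is only about log2(N)):

def tournament_max(values):
    # Each round, pairs are compared "in parallel"; winners advance.
    layer = list(values)
    rounds = 0
    while len(layer) > 1:
        nxt = []
        for i in range(0, len(layer) - 1, 2):
            nxt.append(max(layer[i], layer[i + 1]))
        if len(layer) % 2:
            nxt.append(layer[-1])    # odd element gets a bye this round
        layer = nxt
        rounds += 1                  # rounds ends up ~ ceil(log2(N))
    return layer[0], rounds

print(tournament_max([7, 3, 9, 1, 4, 8]))   # (9, 3): max found in 3 rounds
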
If parallel computation is ruled out, then there is no solution to the problem.
No, there is no such algorithm. In an unsorted list, finding the highest number requires looking at all the elements.
So no algorithm better than O(n) exists!
The best one can do is O(n) time in an unsorted array.
But instead of simply scanning the whole list, you can apply a partition() routine (from the quicksort algorithm) and, instead of recursing on the lower half of the partition, recurse on the upper half, and keep partitioning until the largest element is found. This takes O(n) time on average.
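A minimal Python sketch of that partition idea (my illustration; expected O(n) for a non-empty list, because the upper part shrinks geometrically on average):

import random

def partition(a, lo, hi):
    # Lomuto partition around a random pivot; returns the pivot's final index.
    p = random.randint(lo, hi)
    a[p], a[hi] = a[hi], a[p]
    pivot, i = a[hi], lo
    for j in range(lo, hi):
        if a[j] < pivot:
            a[i], a[j] = a[j], a[i]
            i += 1
    a[i], a[hi] = a[hi], a[i]
    return i

def find_max(values):
    a = list(values)                 # assumes a non-empty input
    lo, hi = 0, len(a) - 1
    while lo < hi:
        p = partition(a, lo, hi)
        if p == hi:
            return a[p]              # nothing to the right: pivot is the max
        lo = p + 1                   # keep partitioning only the upper part
    return a[lo]

print(find_max([3, 1, 4, 1, 5, 9, 2, 6]))   # 9
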
Check these out for a detailed explanation:
http://en.wikipedia.org/wiki/Quickselect
How to find the kth largest element in an unsorted array of length n in O(n)?
Hope it helped! :)

Sort the t smallest integers of an unsorted array of length n

I am trying to find the most efficient way to sort the t smallest integers of an unsorted array of length n.
I am trying to get O(n) runtime but I keep getting stuck.
The best I can think of is just sorting the entire array and taking the first t. With every other approach, I keep hitting the chance that one of the smallest elements is left behind, and if I check them all, then it has the same time complexity as sorting the entire array.
Can anyone give me some ideas?
Run something like quickselect to find the t-th element and then partition the data to extract the t smallest elements. This can be done in O(n) time (average case).
Quickselect is:
An algorithm, similar to quicksort, which repeatedly picks a 'pivot' and partitions the data according to this pivot (leaving the pivot in the middle, with smaller elements on the left and larger elements on the right). It then recurses into the side that contains the target element (which it can easily determine by just counting the number of elements on either side).
Then you'll still need to sort the t elements, which can be done with, for example, quicksort or mergesort, giving a running time of O(t log t).
The total running time will be O(n + t log t) - you probably can't do much better than that.
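A rough Python sketch of this quickselect-then-sort plan (my own illustration; the filtering style below uses O(n) extra space, unlike an in-place partition):

import random

def t_smallest_sorted(a, t):
    def smallest(xs, t):
        # Quickselect-style filtering: average O(len(xs)).
        if t <= 0:
            return []
        if t >= len(xs):
            return list(xs)
        pivot = random.choice(xs)
        less = [x for x in xs if x < pivot]
        if t <= len(less):
            return smallest(less, t)
        equal = [x for x in xs if x == pivot]
        if t <= len(less) + len(equal):
            return less + equal[:t - len(less)]
        greater = [x for x in xs if x > pivot]
        return less + equal + smallest(greater, t - len(less) - len(equal))

    return sorted(smallest(a, t))    # final sort of t items: O(t log t)

print(t_smallest_sorted([9, 4, 7, 1, 3, 8, 2], 3))   # [1, 2, 3]
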
If t is considerably smaller than n, you can find those t elements in one traversal of the array, always keeping the t smallest items seen so far and discarding bigger integers. Many data structures are available for this, a BST for example.
The run time will then be O(n log(t)), since each of the n elements costs at most one O(log t) update of the structure.
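A Python sketch of that one-pass idea using a bounded max-heap instead of a BST (heapq is a min-heap, so negated values are stored; this is my illustration, not the answer's exact data structure):

import heapq

def t_smallest(a, t):
    if t <= 0:
        return []
    heap = []                            # max-heap of the t smallest seen (negated)
    for x in a:
        if len(heap) < t:
            heapq.heappush(heap, -x)
        elif -heap[0] > x:               # largest kept value is bigger than x
            heapq.heapreplace(heap, -x)  # evict it and keep x: O(log t)
    return sorted(-v for v in heap)      # final sort of t items: O(t log t)

print(t_smallest([9, 4, 7, 1, 3, 8, 2], 3))   # [1, 2, 3]
# The standard library's heapq.nsmallest(t, a) does essentially the same thing.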

Algorithm for sorting d sorted arrays

Please help me understand the running time of the following algorithm.
I have d already sorted arrays (every array has more than 1 element) with n elements in total.
I want to end up with one sorted array of size n.
If I am not mistaken, insertion sort runs in linear time on partially sorted arrays.
If I concatenate these d arrays into one n-element array and sort it with insertion sort,
isn't that a partially sorted array, and won't the running time of insertion sort on it be O(n)?
Insertion sort is O(n^2), even when the original array is a concatenation of several presorted arrays. You probably need to use mergesort to combine the several sorted arrays into one sorted array. This will give you O(n·log(d)) performance.
No, this will take quadratic time. Insertion sort is only linear if each element is at most a constant distance k away from the place where it would be in the sorted array, in which case it takes O(nk) time; that's what's meant by "partially sorted" there. You don't have that guarantee.
You can do this in linear time only under the assumption that the number of subarrays is guaranteed to be a small constant. In that case, you can use a k-way merge.
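A Python sketch of a heap-based k-way merge (my illustration; with d arrays this runs in O(n log d), which is O(n) when d is a constant):

import heapq

def k_way_merge(arrays):
    # Heap of (head value, array index, index within that array).
    heap = [(arr[0], i, 0) for i, arr in enumerate(arrays) if arr]
    heapq.heapify(heap)
    out = []
    while heap:
        value, i, j = heapq.heappop(heap)
        out.append(value)
        if j + 1 < len(arrays[i]):
            # Push the next head of the array we just consumed from.
            heapq.heappush(heap, (arrays[i][j + 1], i, j + 1))
    return out

print(k_way_merge([[1, 4, 9], [2, 3, 8], [5, 7]]))
# [1, 2, 3, 4, 5, 7, 8, 9]
# The standard library's heapq.merge(*arrays) does the same thing lazily.
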
Insertion sort is fairly (relatively) linear for small values of N. If N is large then your performance will more likely be N^2.
The fact that the sub-arrays are sorted won't, I believe, help that much if N is sufficiently large.
Timsort is a good candidate for partially sorted arrays
If the arrays are known to be sorted, it's a simple matter of treating each array as a queue, sorting the "heads", selecting the smallest of the heads to put into the new array, then "popping" the selected value from its array.
If d is small, then a simple bubble sort works well for ordering the heads; otherwise you should use some sort of insertion, since only one element needs to be placed back into order each time.
This is basically a "merge sort", I believe. Very useful when the list to be sorted exceeds working storage, since you can sort smaller lists first, without thrashing, then combine using very little working storage.
