I'm looking for an array of n numbers that Insertion Sort will sort in Θ(n^(7/4)) time.
What kind of array, as a function of n, will give me that running time?
For example, it could be an array that is half (or more) sorted and half not, or sorted backwards, or anything else.
Hope it's clear enough.
Thanks!
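One standard way to think about this: insertion sort runs in Θ(n + I) time, where I is the number of inversions (pairs out of order). So any array with Θ(n^(7/4)) inversions does the job; for instance, reversing a block of n^(7/8) elements creates about (n^(7/8))²/2 = n^(7/4)/2 inversions. A small sketch (my own code; it counts element shifts, which equal the inversion count, as a proxy for running time):

```python
def insertion_sort_shifts(a):
    """Insertion sort on a copy; returns the number of element shifts,
    which equals the number of inversions and tracks the running time."""
    a = list(a)
    shifts = 0
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        while j >= 0 and a[j] > key:
            a[j + 1] = a[j]   # shift one element right
            j -= 1
            shifts += 1
        a[j + 1] = key
    return shifts

n = 256                        # 256 = 2^8, so n ** (7/8) = 2^7 = 128 exactly
k = round(n ** (7 / 8))        # block size to reverse; one possible choice
arr = list(range(n))
arr[:k] = reversed(arr[:k])    # only the first k elements are out of order

shifts = insertion_sort_shifts(arr)
print(shifts == k * (k - 1) // 2)  # True: shift count equals the inversion count
```

The reversed block contributes k(k-1)/2 inversions and nothing else does, so the shift count is ~n^(7/4)/2, giving the Θ(n^(7/4)) total you asked about.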
Hey so I'm just really stuck on this question.
I need to devise an algorithm (no need for code) that turns a certain partially sorted array into a fully sorted array. The array has N real numbers, and the first N - [N/sqrt(N)] elements (the [] denotes the floor of that number) are sorted, while the rest are not. There are no special properties to the unsorted numbers at the end; in fact I'm told nothing about them other than that they're real numbers like the rest.
The kicker is that the algorithm's time complexity needs to be O(N).
My first thought was to sort only the unsorted numbers and then use a merge step, but I can't figure out any sorting algorithm that would work here in O(N). So I'm thinking about this all wrong; any ideas?
This is not possible in the general case using a comparison-based sorting algorithm. You are most likely missing something from the question.
Imagine the partially sorted array [1, 2, 3, 4564, 8481, 448788, 145, 86411, 23477]. It contains 9 elements, the first 3 of which are sorted (note that floor(N/sqrt(N)) = floor(sqrt(N)), assuming you meant N/sqrt(N), and floor(sqrt(9)) = 3). The problem is that the unsorted elements all lie in a range that does not contain the sorted elements. That makes the sorted part of the array useless to any sorting algorithm, since those elements will stay where they are anyway (or be moved to the very end if they are greater than the unsorted elements).
With this kind of input, you still need to sort, independently, N - floor(sqrt(N)) elements. And as far as I know, N - floor(sqrt(N)) ~ N (the ~ basically means "is the same complexity as"). So you are left with an array of approximately N elements to sort, which takes O(N log N) time in the general case.
Now, I specified "using a comparison-based sorting algorithm", because sorting real numbers (in some range, like the usual floating-point numbers stored in computers) can be done in amortized O(N) time using a hash sort (similar to a counting sort), or maybe even a modified radix sort if done properly. But the fact that a part of the array is already sorted doesn't help.
In other words, the question says there are only sqrt(N) unsorted elements at the end of the array. You can sort them with an O(n²) algorithm, which gives a time of O(sqrt(N)²) = O(N); then do the merge you mentioned, which also runs in O(N). Both steps together therefore take just O(N).
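A sketch of that plan (the function name is mine; insertion sort handles the sqrt(N)-sized tail in O(N), and a standard linear merge combines the two sorted runs):

```python
import math

def sort_mostly_sorted(a):
    """Sort an array whose first len(a) - floor(sqrt(len(a))) elements
    are already sorted, in O(N) total time."""
    n = len(a)
    k = math.isqrt(n)                  # size of the unsorted tail
    head, tail = a[:n - k], a[n - k:]
    # insertion sort the tail: O(k^2) = O(n)
    for i in range(1, len(tail)):
        key, j = tail[i], i - 1
        while j >= 0 and tail[j] > key:
            tail[j + 1] = tail[j]
            j -= 1
        tail[j + 1] = key
    # linear merge of the two sorted runs: O(n)
    out, i, j = [], 0, 0
    while i < len(head) and j < len(tail):
        if head[i] <= tail[j]:
            out.append(head[i]); i += 1
        else:
            out.append(tail[j]); j += 1
    return out + head[i:] + tail[j:]
```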
I was asked a question recently in an interview.
We are given an array A of size n+m, with the first n places filled with elements in random order (and m empty places at the end). We also have an array B with m elements in random order.
Write a merge function so that array A ends up with all n+m elements in sorted order.
I was able to give an O((n+m)log(n+m)) solution.
Is there a better solution to this problem?
No, there's no better solution. Let t = max(m, n); then the complexity is O(t log t). How do we prove there's no better solution?
Well, if there were a better solution when nothing is known about the data, then given any (big enough) array of size N, we could split it into arrays of sizes n and m and sort it in less than N log N, contradicting the comparison-sort lower bound.
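For reference, the O((n+m)log(n+m)) approach the asker mentions can use A's empty tail directly: sort A's first n elements and all of B, then merge from the back so no element is overwritten before it is read. A sketch (my own names; I'm assuming the empty places are at the end of A, as stated):

```python
def merge_into(a, n, b):
    """a holds n real elements followed by len(b) empty slots.
    Fill a with all n + len(b) elements in sorted order and return it."""
    m = len(b)
    a[:n] = sorted(a[:n])      # O(n log n)
    b = sorted(b)              # O(m log m)
    i, j = n - 1, m - 1
    # merge from the back: O(n + m), never clobbers unread slots of a
    for k in range(n + m - 1, -1, -1):
        if j < 0 or (i >= 0 and a[i] >= b[j]):
            a[k] = a[i]; i -= 1
        else:
            a[k] = b[j]; j -= 1
    return a
```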
I am trying to find the most efficient way to sort the t smallest integers of an unsorted array of length n.
I am trying to get O(n) runtime but keep getting stuck.
The best I can think of is just sorting the entire array and taking the first t. In every other approach I keep hitting the chance that one of the smallest is left behind, and if I check them all, it has the same time complexity as sorting the entire array.
Can anyone give me some ideas?
Run something like quickselect to find the t-th smallest element and then partition the data to extract the t smallest elements. This can be done in O(n) time (average case).
Quickselect is:
An algorithm, similar to quicksort, which repeatedly picks a 'pivot' and partitions the data around it (leaving the pivot in the middle, with smaller elements on the left and larger elements on the right). It then recurses into the side which contains the target element (which it can easily determine by just counting the number of elements on either side).
Then you'll still need to sort the t elements, which can be done with, for example, quicksort or mergesort, giving a running time of O(t log t).
The total running time will be O(n + t log t) - you probably can't do much better than that.
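A sketch of that idea (my own code: a quickselect with random pivots, so the O(n) bound is expected rather than worst-case, followed by sorting just the t survivors):

```python
import random

def smallest_t_sorted(a, t):
    """Return the t smallest elements of a in sorted order.
    Quickselect partitions in expected O(n); sorting the survivors is O(t log t)."""
    a = list(a)
    if t <= 0:
        return []
    lo, hi = 0, len(a) - 1
    k = t - 1                      # index of the t-th smallest
    while lo < hi:
        pivot = a[random.randint(lo, hi)]
        # three-way partition: [< pivot | == pivot | > pivot]
        lt, i, gt = lo, lo, hi
        while i <= gt:
            if a[i] < pivot:
                a[lt], a[i] = a[i], a[lt]; lt += 1; i += 1
            elif a[i] > pivot:
                a[i], a[gt] = a[gt], a[i]; gt -= 1
            else:
                i += 1
        if k < lt:
            hi = lt - 1
        elif k > gt:
            lo = gt + 1
        else:
            break                  # position k now holds the t-th smallest
    return sorted(a[:t])
```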
If t is considerably smaller than n, you can find those t elements in one pass over the array, always keeping the t smallest items seen so far and discarding bigger integers. Many data structures work for this: a balanced BST or a max-heap of size t, for example.
The run time will then be O(n log t).
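That "keep the t smallest seen so far" pass can be sketched with Python's heapq (a min-heap, so negating values simulates the max-heap of size t; the function name is mine):

```python
import heapq

def t_smallest(a, t):
    """One pass over a, maintaining a size-t max-heap of the smallest
    items seen so far: O(n log t) total."""
    heap = []                       # negated values: root is the largest kept
    for x in a:
        if len(heap) < t:
            heapq.heappush(heap, -x)
        elif -heap[0] > x:          # x beats the largest of our current t smallest
            heapq.heapreplace(heap, -x)
    return sorted(-v for v in heap)
```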
Please help me understand the running time of the following approach.
I have d already-sorted arrays (each with more than one element) containing n elements in total.
I want to end up with one sorted array of size n.
If I am not mistaken, insertion sort runs in linear time on partially sorted arrays.
If I concatenate these d arrays into one n-element array and sort it with insertion sort,
isn't that a partially sorted array, and won't the running time of insertion sort on it be O(n)?
Insertion sort is O(n²) even when the original array is a concatenation of several presorted arrays. You probably want to use a merge (as in mergesort) to combine the d sorted arrays into one sorted array. This gives you O(n·log d) performance.
No, this will take quadratic time. Insertion sort is only linear if each element is at most a constant distance k away from the point where it would be in a sorted array, in which case it takes O(nk) time; that's what's meant by partially sorted there. You don't have that guarantee.
You can do this in linear time only under the assumption that the number of subarrays is guaranteed to be a small constant. In that case, you can use a k-way merge.
Insertion sort is fairly close to linear for small values of N. If N is large, your performance will more likely be N².
The fact that the sub-arrays are sorted won't, I believe, help that much if N is sufficiently large.
Timsort is a good candidate for partially sorted arrays: it detects the already-sorted runs and merges them instead of re-sorting them.
If the arrays are known to be sorted, it's a simple matter of treating each array as a queue, sorting the "heads", selecting the smallest of the heads to put into the new array, then "popping" the selected value from its array.
If d is small, then a simple bubble sort works well for keeping the heads in order; otherwise you should use some sort of insertion, since only one element needs to be placed back into an otherwise sorted order.
This is basically a "merge sort", I believe. Very useful when the list to be sorted exceeds working storage, since you can sort smaller lists first, without thrashing, then combine using very little working storage.
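That "heads" scheme is exactly a k-way merge; keeping the heads in a heap makes it O(n log d). Python ships this as heapq.merge, and a hand-rolled sketch looks like:

```python
import heapq

def k_way_merge(arrays):
    """Merge d sorted arrays (n elements in total) into one sorted list
    in O(n log d) time."""
    result = []
    # heap of (head value, array index, position within that array)
    heap = [(arr[0], i, 0) for i, arr in enumerate(arrays) if arr]
    heapq.heapify(heap)
    while heap:
        value, i, pos = heapq.heappop(heap)   # smallest current head
        result.append(value)
        if pos + 1 < len(arrays[i]):          # push that array's next head
            heapq.heappush(heap, (arrays[i][pos + 1], i, pos + 1))
    return result
```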
I have an array of structs called struct Test testArray[25].
The Test struct contains a member called int size.
What is the fastest way to get another array of Test structs that contains everything from the original except the 5 largest, based on the member size, WITHOUT modifying the original array?
NOTE: The number of items in the array can be much larger; I was just using 25 for testing, and the values could be dynamic. I just wanted a smaller subset for testing.
I was thinking of making a copy of the original testArray and then sorting that array. Then return an array of Test structs that did not contain the top 5 or bottom 5 (depending on asc or desc).
OR
Iterating through the testArray looking for the 5 largest and then making a copy of the original array excluding those 5. This way seems like it would iterate through the array too many times, comparing each element against the array of the 5 largest found so far.
Follow up question:
Here is what I am doing now; let me know what you think.
Since the number of largest elements I am interested in will stay the same, I iterate through the array, find the largest element, and swap it to the front. Then I skip the first element, look for the largest after that, and swap it into the second index, and so on, until I have the 5 largest at the front. Then I stop sorting and just copy everything from the sixth index onward into a new array.
This way, no matter what, I only iterate through the array 5 times, and I do not have to sort the whole thing.
Partial sorting with a linear-time selection algorithm will do this in O(n) time, where a full sort would be O(n log n).
To quote the Partial Sorting page:
The linear-time selection algorithm described above can be used to find the k smallest or the k largest elements in worst-case linear time O(n). To find the k smallest elements, find the kth smallest element using the linear-time median-of-medians selection algorithm. After that, partition the array with the kth smallest element as pivot. The k smallest elements will be the first k elements.
You can find the k largest items in O(n), although making a copy of the array, or an array of pointers to each element (smarter), will cost you some time as well; you have to do that regardless, though, since you can't modify the original.
If you'd like me to give a complete explanation of the algorithm involved, just comment.
Update:
Regarding your follow-up question, which basically suggests iterating over the list five times: that will work, but it iterates over the list more times than you need to. Finding the k largest elements in a single pass with an O(n) selection algorithm is much better. That way you iterate once for the selection and once more to build your new array (if you use median-of-medians, you will not need a third pass to remove the five largest items, since you can just split the working array into two parts around the 5th-largest item), rather than iterating five times to find the largest items plus once more to build the new array.
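A sketch of that selection-on-a-copy approach (my own names, with dicts standing in for the Test structs; for brevity I use heapq.nlargest, which is O(n log k) rather than the strictly linear median-of-medians it could be swapped for):

```python
import heapq

def all_but_k_largest(tests, k=5):
    """Return a new list excluding the k largest by 'size'; original untouched.
    Selecting by index keeps duplicate sizes unambiguous and preserves order."""
    largest = set(heapq.nlargest(k, range(len(tests)),
                                 key=lambda i: tests[i]["size"]))
    return [t for i, t in enumerate(tests) if i not in largest]

tests = [{"size": s} for s in [3, 9, 1, 7, 8, 2, 6, 5, 4, 10]]
kept = all_but_k_largest(tests)   # drops sizes 10, 9, 8, 7, 6
```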
As stated, sorting is O(n log n + 5) and iterating is O(5n + 5). In the general case, finding the m largest numbers is O(n log n + m) with the sorting approach and O(mn + m) with the iteration approach. Which algorithm is better depends on the values of m and n: for m = 5, iterating wins asymptotically only while log n < 5, i.e. for up to 2^5 = 32 numbers, a measly amount. However, a sort does more work per element than a simple scan, so in practice the crossover point will be quite a bit higher.
You can do better in theory by maintaining the m largest numbers seen so far in an ordered structure (binary search to find each insertion point, or a heap), which gives you O(n log m) comparisons, but that again depends on the values of n and m.
Maybe an array isn't the best structure for what you want, especially since you need to re-sort it every time a new value is added. Maybe a linked list is better, with a sort on insert (which is O(N) in the worst case and O(1) in the best); then just discard the last five elements. Also, consider that switching a pointer is considerably faster than reallocating the entire array just to get another element in there.
Why not an AVL tree? Traversing to an element takes O(log N) time, but you have to weigh the cost of rebalancing the tree, and whether the time spent coding it is worth it.
Using a min-heap with its size capped at 5, you can traverse the array and insert an element into the heap whenever it is greater than the heap's minimum.
getMin takes O(1) time and insertion takes O(log k) time, where k is the size of the heap (5 in our case). So in the worst case the complexity of finding the 5 largest elements is O(n log k). Another O(n) pass then produces the excluded list.
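A sketch of that heap approach (my own names; a min-heap of size k whose root is the smallest of the current k largest, tracking indices so duplicate values stay unambiguous):

```python
import heapq

def exclude_k_largest(values, k=5):
    """O(n log k) pass with a size-k min-heap, then O(n) to build the
    list with the k largest excluded, original order preserved."""
    heap = []                            # (value, index) pairs; root = smallest kept
    for i, v in enumerate(values):
        if len(heap) < k:
            heapq.heappush(heap, (v, i))
        elif heap[0][0] < v:             # v outranks the smallest of the k largest
            heapq.heapreplace(heap, (v, i))
    largest = {i for _, i in heap}
    return [v for i, v in enumerate(values) if i not in largest]
```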