Big-O running time of sorting vs inserting - arrays

So if you sort an array with quicksort, you can do it in O(n log n) on average, and once it is sorted, you can insert new elements into the array in O(log n) with a binary-search-esque algorithm.
My question is, is there a way to prove that if you can insert into a sorted array in O(logn) time, then that means that the sorting algorithm would have had to be at least O(nlogn)?
In other words, is there a relationship between the two algorithms' running times?

No: you could use bubble sort (O(n²)) to sort the array. After that, the same insertion algorithm would still run in O(log n) time.

Well, the fact that insertion which maintains order is O(log n) means that a sort operation can be performed in O(n log n) simply by inserting each element in turn into the array. This however is probably the opposite of what you're really asking; it proves that there is an O(n log n) sort, but doesn't disprove the possibility of a faster sort.
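The "insert each element in turn" idea can be sketched in Python as below. This is only an illustration: `sort_by_insertion` is a name made up here, and on a plain Python list the binary search costs O(log n) comparisons per element while `list.insert` still shifts the later elements, so the total work is O(n log n) only if the insertion itself is assumed to be O(log n).

```python
import bisect

def sort_by_insertion(values):
    """Build a sorted list by inserting each value at its binary-search position."""
    result = []
    for v in values:
        # bisect locates the position with O(log n) comparisons; the underlying
        # list insertion still moves the elements that come after that position.
        bisect.insort(result, v)
    return result

print(sort_by_insertion([5, 2, 9, 1, 5, 6]))  # [1, 2, 5, 5, 6, 9]
```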

First off, some sorting algorithms, which require special conditions, have a potentially lower time complexity. For instance, counting sort has a complexity of O(n+k) (where k is the highest number in the array), and radix sort has a complexity of O(n*k) (where k is the maximal number of digits in the array's elements).
The bound of O(nlogn) applies to comparison algorithms.
Second, to answer the question, no (sadly). The relation between insertion and sorting is the other way around, just like davmac said.
The reason for the lower bound on comparison sorts is that, in the worst case, you need on the order of n log n comparisons to determine the right order. See this wiki page on comparison sort algorithms for more details. Just note that the proof has nothing to do with insertion (the proof does not assume any insertions, actually).
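The usual counting argument behind that bound can be sketched as follows. Here h is a symbol introduced for this sketch: the worst-case number of comparisons, i.e. the height of the decision tree. The tree needs at least n! leaves, one per possible ordering of the input, and a binary tree of height h has at most 2^h leaves.

```latex
\begin{aligned}
2^{h} &\ge n! \\
h &\ge \log_2(n!) = \sum_{i=1}^{n} \log_2 i \;\ge\; \frac{n}{2}\log_2\frac{n}{2} = \Omega(n \log n)
\end{aligned}
```

The last inequality keeps only the largest n/2 terms of the sum, each of which is at least log_2(n/2).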

Related

What's the most efficient way to construct a new, sorted, array?

Background
Most questions around sorting talk about sorting an existing unsorted array. Is constructing a new array in a sorted order an equivalent problem or a different one? Here's an example that will clear things up:
Example
I'm generating N random numbers and want to insert them into a new array as I generate them, and I want the final array to be sorted.
Possible Solutions
Insertion Sort
My gut told me that putting each element in the correct place as it's generated would be fastest. This is accomplished by doing a binary search to find the correct point in the array to insert the new element. However, this is an insertion sort, which is known to be less efficient on large lists than other sorting algorithms.
Quicksort
Quicksort is generally thought of as the most efficient 'general' sorting algorithm, where nothing is known about the inputs to the array, and it's more efficient than insertion sort on large lists. Would it, therefore, be best to simply put the random numbers in the array in an unsorted order, and then sort them at the end with quicksort?
Other Solutions
Is there another algorithm I haven't thought of?
Most questions around sorting talk about sorting an existing unsorted array. Is constructing a new array in a sorted order an equivalent problem or a different one? 
It boils down to the same problem for random data, due to efficiency considerations.
Given random data, it's actually more efficient to first generate the random values into an unsorted array, which takes O(n) time, and then sort it with your favorite O(n log n) sort algorithm. The entire operation is then O(n) + O(n log n) = O(n log n) in time and, depending on the sort algorithm used, between O(1) and O(n) in space.
There is no way to beat that approach by "keeping an array sorted as it's constructed" for random data, because any approach requires exactly n generations/insertions of the values and, if it is comparison-based, on the order of n log n comparisons/swaps/shifts at minimum, no matter which method, from the numerous mentioned in comments on the original question, is used. Note, as per a very useful comment on my original answer, the binary insertion sort variant suggested in the original question will likely degrade to O(n^2) time complexity, because each insertion shifts elements, making it an inferior solution to just generating an array of values first and then sorting it.
Using a balanced tree just matches the time complexity of generating an array and then sorting it - but loses in space complexity, as trees have some overhead, compared to an array, to keep track of child nodes, etc. Also of note, trees are heap-allocated, and require a pointer dereference operation for accessing any child node - so even though the Big-O time complexity is equivalent to first generating an array of data and then sorting it, the real performance of the tree solution will be worse, as there's no data locality, and there's extra cost of pointer dereference. An additional consideration on balanced trees is that insertion cost into something like an AVL is quite high - that is, the n in AVL's O(n log n) insertion is not the same cost as n in an in-place sort of an array, due to necessary rotations of tree nodes to achieve balance. Just because Big-O is the same doesn't mean performance is the same. Even if you have an absolute need to be able to grab the data in a sorted order at some point during construction of the array, it might still be cheaper to just sort an array as you need it, unless you need it sorted at each insertion!
Note, this answer pertains to random data - it is possible, and even likely, to come up with a more efficient approach for "keeping an array sorted as it's constructed" if both the size and characteristics of the data are known, and follow some mathematical pattern, other than randomness; however, such approach would necessarily be overfit for the specific data set it relates to, rather than a general solution.
I recommend the Heapsort or Mergesort.
Heapsort is a comparison-based algorithm that uses a binary heap data structure to sort elements. It divides its input into a sorted and an unsorted region, and it iteratively shrinks the unsorted region by extracting the largest element and moving that to the sorted region.
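As a rough illustration of that description, here is a minimal in-place heapsort sketch in Python. The names `heapsort` and `sift_down` are chosen for this example only; it uses a max-heap stored in the list itself, with the sorted region growing from the right.

```python
def heapsort(items):
    """Sort a list in place using a binary max-heap; returns the same list."""
    n = len(items)

    def sift_down(root, end):
        # Move items[root] down until neither child (index < end) is larger.
        while True:
            child = 2 * root + 1
            if child >= end:
                return
            if child + 1 < end and items[child + 1] > items[child]:
                child += 1  # pick the larger of the two children
            if items[root] >= items[child]:
                return
            items[root], items[child] = items[child], items[root]
            root = child

    # Build the max-heap bottom-up.
    for start in range(n // 2 - 1, -1, -1):
        sift_down(start, n)

    # Shrink the unsorted region by moving its largest element to the end.
    for end in range(n - 1, 0, -1):
        items[0], items[end] = items[end], items[0]
        sift_down(0, end)
    return items

print(heapsort([3, 1, 4, 1, 5, 9, 2, 6]))  # [1, 1, 2, 3, 4, 5, 6, 9]
```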
Mergesort is a comparison-based algorithm that focuses on how to merge together two pre-sorted arrays such that the resulting array is also sorted.
If you want a true O(n log n) and a collection that is sorted "as it is constructed", I would recommend using a proper tree-based data structure instead of an array. You can use a self-balancing binary search tree, such as an AVL tree.

Understanding the Big O for squaring elements in an array

I was working on a problem where you have to square the numbers in a sorted array on leetcode. Here is the original problem
Given an array of integers A sorted in non-decreasing order, return an array of the squares of each number, also in sorted non-decreasing order.
I am trying to understand the big O for my code and for the code that was given in the solution.
This is my code
def sortedSquare(A):
    new_A = []
    for num in A:
        num = num*num
        new_A.append(num)
    return sorted(new_A)
print(sortedSquare([-4, -1, 0, 3, 10]))
Here is the code from the solution:
def sortedSquares(self, A):
    return sorted(x*x for x in A)
For the solution, the Big O is O(N log N), where N is the length of the array. I don't understand why it would be N log N and not just N for the Big O?
For my solution, I am seeing it as Big O of N because I am just iterating through the entire array.
Also, is my solution a good solution compared to the solution that was given?
Your solution does the exact same thing as the given solution. Both solutions square all the elements and then sort the resultant array, with the leetcode solution being a bit more concise.
The reason why both these solutions are O(NlogN) is because of the use of sorted(). Python's builtin sort is timsort which sorts the array in O(NlogN) time. The use of sorted(), not squaring, is what provides the dominant factor in your time complexity (O(NlogN) + O(N) = O(NlogN)).
Note though that this problem can be solved in O(N) using two pointers or by using the merge step in mergesort.
Edit:
David Eisenstat brought up a very good point on timsort. Timsort detects runs that are already non-decreasing or strictly decreasing and merges them. Since the resultant squared array will be first decreasing and then increasing, timsort will actually reverse the decreasing run and then merge the two runs in O(N), at least when ties among the squares do not break up the decreasing run.
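The two-pointer approach mentioned above can be sketched like this. The function name `sorted_squares` is chosen for this example; the idea is that the largest remaining square is always at one of the two ends of the sorted input, so the result can be filled from the back.

```python
def sorted_squares(a):
    """Return the sorted squares of a non-decreasing list in O(N) time."""
    result = [0] * len(a)
    left, right = 0, len(a) - 1
    # Fill the result from the largest square down to the smallest.
    for pos in range(len(a) - 1, -1, -1):
        if abs(a[left]) > abs(a[right]):
            result[pos] = a[left] * a[left]
            left += 1
        else:
            result[pos] = a[right] * a[right]
            right -= 1
    return result

print(sorted_squares([-4, -1, 0, 3, 10]))  # [0, 1, 9, 16, 100]
```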
The way complexity works is that the overall complexity for the whole program is the worst complexity for any one part. So, in your case, you have the part that squares the numbers and you have the part that sorts the numbers. So which part is the one that determines the overall complexity?
The squaring part is O(n) because you only touch each element once in order to square it.
What about the sorting part? Generally it depends on what sorting function you use:
Most sort routines have O(n*log(n)) because they use a divide and conquer algorithm.
Some (like bubble sort) have O(n^2)
Some (like counting sort) have O(n + k), which is O(n) when the range of keys k is small
In your case, they say that the given solution is O(n*log(n)) and since the squaring part is O(n) then the sorting part must be O(n*log(n)). And since your code uses the same sorting function as the given solution your sort must also be O(n*log(n))
So your squaring part is O(n) and your sorting part is O(n*log(n)) and the overall complexity is the worst of those: O(n*log(n))
If extra storage space is allowed (like in your solution), the whole process can be performed in time O(N). The initial array is already sorted. You can split it into two subsequences, one with the negative values and one with the non-negative values.
Square all elements (O(N)) and reverse the negative subsequence (O(N) at worst), so that both sequences are sorted. If one of the subsequences is empty, you are done.
Otherwise, merge the two sequences, in time O(N) (this is the step that uses extra O(N) space).
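A minimal sketch of the split, reverse and merge steps described above, assuming Python's standard `bisect` and `heapq` modules. The function name `sorted_squares_by_merge` is made up for this example, and `heapq.merge` stands in for a hand-written two-way merge.

```python
from bisect import bisect_left
from heapq import merge

def sorted_squares_by_merge(a):
    """Square a non-decreasing list and return the squares in sorted order."""
    split = bisect_left(a, 0)  # index of the first non-negative value
    # Squares of the negatives are decreasing, so reverse them to get an ascending run.
    negatives = [x * x for x in reversed(a[:split])]
    non_negatives = [x * x for x in a[split:]]
    # Merging two sorted runs is linear in their combined length.
    return list(merge(negatives, non_negatives))

print(sorted_squares_by_merge([-4, -1, 0, 3, 10]))  # [0, 1, 9, 16, 100]
```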

Given an infinitely large array with priority associated with every element then sort the given array according to increasing priority in O(n)

I was asked this question in an interview. I couldn't do better than O(NlogN). I was sorting every time.
Counting sort or radix sort can be used here if an algorithm faster than O(n log n) is required. You can achieve O(n) time complexity (for radix sort it is O(n*k), where k is the number of digits, so a little more than linear) at the cost of O(n) extra space.
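As a hedged sketch of the counting sort idea, assume each element is a `(priority, value)` pair and that priorities are integers in a known range `0..max_priority`. Both of these are assumptions made for this example, not details given in the question.

```python
def counting_sort_by_priority(items, max_priority):
    """Stable sort of (priority, value) pairs; priorities are ints in [0, max_priority]."""
    counts = [0] * (max_priority + 1)
    for priority, _ in items:
        counts[priority] += 1

    # Convert counts into the starting output index for each priority.
    total = 0
    for p in range(max_priority + 1):
        counts[p], total = total, total + counts[p]

    output = [None] * len(items)
    for priority, value in items:
        output[counts[priority]] = (priority, value)
        counts[priority] += 1
    return output

tasks = [(2, "c"), (0, "a"), (2, "d"), (1, "b")]
print(counting_sort_by_priority(tasks, 2))
# [(0, 'a'), (1, 'b'), (2, 'c'), (2, 'd')]
```

The running time is O(n + k) for k possible priorities, so it is linear only when k does not grow faster than n.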

Sorting a partially sorted array in O(n)

Hey so I'm just really stuck on this question.
I need to devise an algorithm (no need for code) that sorts a certain partially sorted array into a fully sorted array. The array has N real numbers and the first N-[N\sqrt(N)] (the [] denotes the floor of this number) elements are sorted, while the rest are not. There are no special properties to the unsorted numbers at the end; in fact, I'm told nothing about them other than that they're obviously real numbers like the rest.
The kicker is time complexity for the algorithm needs to be O(n).
My first thought was to try and sort only the unsorted numbers and then use a merge algorithm, but I can't figure out any sorting algorithm that would work here in O(n). So I'm thinking about this all wrong, any ideas?
This is not possible in the general case using a comparison-based sorting algorithm. You are most likely missing something from the question.
Imagine the partially sorted array [1, 2, 3, 4564, 8481, 448788, 145, 86411, 23477]. It contains 9 elements, the first 3 of which are sorted (note that floor(N/sqrt(N)) = floor(sqrt(N)) assuming you meant N/sqrt(N), and floor(sqrt(9)) = 3). The problem is that the unsorted elements are all in a range that does not contain the sorted elements. It makes the sorted part of the array useless to any sorting algorithm, since they will stay there anyway (or be moved to the very end in the case where they are greater than the unsorted elements).
With this kind of input, you still need to sort, independently, N - floor(sqrt(N)) elements. And as far as I know, N - floor(sqrt(N)) ~ N (the ~ basically means "is the same complexity as"). So you are left with an array of approximately N elements to sort, which takes O(N log N) time in the general case.
Now, I specified "using a comparison-based sorting algorithm", because sorting real numbers (in some range, like the usual floating-point numbers stored in computers) can be done in amortized O(N) time using a hash sort (similar to a counting sort), or maybe even a modified radix sort if done properly. But the fact that a part of the array is already sorted doesn't help.
In other words, this means there are sqrt(N) unsorted elements at the end of the array. You can sort them with an O(n^2) algorithm which will give a time of O(sqrt(N)^2) = O(N); then do the merge you mentioned which will also run in O(N). Both steps together will therefore take just O(N).
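The two steps can be sketched as below, assuming the unsorted tail has exactly floor(sqrt(N)) elements. The function name `sort_with_unsorted_tail` is made up for this example, and insertion sort is used as the O(n^2) algorithm for the tail.

```python
import math

def sort_with_unsorted_tail(a):
    """Sort a list whose first N - floor(sqrt(N)) elements are already sorted."""
    n = len(a)
    tail_len = math.isqrt(n)
    head, tail = a[:n - tail_len], a[n - tail_len:]

    # Insertion sort on the tail: O(tail_len^2) = O(N) operations.
    for i in range(1, len(tail)):
        key = tail[i]
        j = i - 1
        while j >= 0 and tail[j] > key:
            tail[j + 1] = tail[j]
            j -= 1
        tail[j + 1] = key

    # Standard two-way merge of the sorted head and sorted tail: O(N).
    merged = []
    i = j = 0
    while i < len(head) and j < len(tail):
        if head[i] <= tail[j]:
            merged.append(head[i])
            i += 1
        else:
            merged.append(tail[j])
            j += 1
    merged.extend(head[i:])
    merged.extend(tail[j:])
    return merged

print(sort_with_unsorted_tail([1, 3, 5, 7, 9, 11, 8, 2, 4]))
# [1, 2, 3, 4, 5, 7, 8, 9, 11]
```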

Find the one non-repeating element in array?

I have an array of n elements in which only one element is not repeated; all the other numbers appear more than once. There is no limit on the range of the numbers in the array.
Some solutions are:
Using a hash table, which gives linear time complexity but very poor space complexity
Sorting the list using MergeSort O(nlogn) and then finding the element which doesn't repeat
Is there a better solution?
One general approach is to implement a bucketing technique (of which hashing is one example) to distribute the elements into different "buckets" using their identity (say index) and then find the bucket with the smallest size (1 in your case). This problem, I believe, is also known as the minority element problem. There will be as many buckets as there are unique elements in your set.
Doing this by hashing is problematic because of collisions and how your algorithm might handle that. Certain associative array approaches such as tries and extendable hashing don't seem to apply as they are better suited to strings.
One application of the above is to the Union-Find data structure. Your sets will be the buckets and you'll need to call MakeSet() and Find() for each element in your array for a cost of $O(\alpha(n))$ per call, where $\alpha(n)$ is the extremely slow-growing inverse Ackermann function. You can think of it as being effectively a constant.
You'll have to do Union when an element already exists. With some changes to keep track of the set with minimum cardinality, this solution should work. The time complexity of this solution is $O(n\alpha(n))$.
Your problem also appears to be loosely related to the Element Uniqueness problem.
Try a multi-pass scanning if you have strict space limitation.
Say the input has n elements and you can only hold m elements in your memory. If you use a hash-table approach, in the worst case you need to handle n/2 unique numbers, so you want m>n/2. If you don't have that big an m, you can partition the n elements into k=(max(input)-min(input))/(2m) groups, and then scan the n input elements k times (in the worst case):
1st run: you only hash-get/put/mark/whatever elements with value < min(input)+m*2; because in the range (min(input), min(input)+m*2) there are at most m unique elements and you can handle that. If you are lucky you already find the unique one, otherwise continue.
2nd run: only operate on elements with value in range (min(input)+m*2, min(input)+m*4), and
so on, so forth
In this way, you compromise the time complexity to O(kn), but you get a space complexity bound of O(m).
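A rough sketch of the multi-pass scan, assuming integer values. The function name `find_unique_multipass` and the parameter `width` (the size of the value window handled per pass) are choices made for this example. Each pass counts only values inside one window, so the dictionary never holds more than `width` distinct keys.

```python
def find_unique_multipass(values, width):
    """Find the value that occurs exactly once, scanning one value window per pass."""
    lo, hi = min(values), max(values)
    start = lo
    while start <= hi:
        counts = {}
        for v in values:
            # Only count values that fall inside the current window.
            if start <= v < start + width:
                counts[v] = counts.get(v, 0) + 1
        for v, c in counts.items():
            if c == 1:
                return v
        start += width  # move on to the next window of values
    return None

print(find_unique_multipass([7, 3, 3, 9, 9, 7, 20, 20, 5], width=4))  # 5
```

Note that the number of passes depends on the range of the values, not on n, so this trade-off only pays off when the values are reasonably dense.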
Two ideas come to my mind:
A smoothsort may be a better alternative than the cited mergesort for your needs, given that it's O(1) in memory usage, O(nlogn) in the worst case like merge sort, but O(n) in the best case;
Based on the (reverse) idea of the splay tree, you could make a type of tree that would push the leaves toward the bottom once they are used (instead of upward as in the splay tree). This would still give you an O(nlogn) implementation of the sort, but the advantage would be the O(1) step of finding the unique element: it would be the root. The sorting algorithm is the sum of O(nlogn) + O(n), and this algorithm would be O(nlogn) + O(1)
Otherwise, as you stated, using a hash based solution (like hash-implemented set) would result in a O(n) algorithm (O(n) to insert and add a counting reference to it and O(n) to traverse your set to find the unique element) but you seemed to dislike the memory usage, though I don't know why. Memory is cheap, these days...
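For reference, the hash-based solution can be sketched with Python's `collections.Counter`. The function name `find_unique` is made up for this example. It takes one pass to count and one pass over the distinct values to find the one with count 1.

```python
from collections import Counter

def find_unique(values):
    """Return the value that occurs exactly once, or None if there is none."""
    counts = Counter(values)  # expected O(n) time, one counter per distinct value
    for value, count in counts.items():
        if count == 1:
            return value
    return None

print(find_unique([4, 7, 4, 9, 9, 4]))  # 7
```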

Resources