Given an array of size n where:
1/2 of the array holds a single (unknown) value,
1/4 of the array holds a single (unknown) different value,
and so on for 1/8, 1/16, 1/32, ...
give an algorithm to sort the array.
You cannot use the median-finding algorithm.
So what I figured is:
There are only log n different values.
There is a simple solution using a binary heap in O(n log log n).
It looks like a question that needs to be solved in O(n).
Here is one possible approach:
scan the array and store element frequencies (there are log n distinct elements) in a hash table in amortized O(n) time; this is doable because we can do insertions in amortized O(1) time;
now run a classic sorting algorithm on these log n elements: this is doable in deterministic O(log n log log n) time using, say, heap sort or merge sort;
now expand the sorted array---or create a new one and fill it using the sorted array and the hash table---using frequencies from the hash table; this is doable in O(n) amortized time.
The whole algorithm thus runs in amortized O(n) time, i.e., it is dominated by eliminating duplicates and expanding the sorted array. The space complexity is O(n).
This is essentially optimal because you need to "touch" all the elements to print the sorted array, which means we have a matching lower bound of Omega(n) on the running time.
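As a rough Python sketch of this answer (using collections.Counter for the frequency table; the function name and the tiny example are mine):

from collections import Counter

def sort_few_distinct(a):
    # Count frequencies of the (roughly log n) distinct values: O(n) expected time.
    freq = Counter(a)
    # Sort the distinct values: O(log n * log log n) with any comparison sort.
    distinct = sorted(freq)
    # Expand back into a full sorted array: O(n).
    out = []
    for v in distinct:
        out.extend([v] * freq[v])
    return out

print(sort_few_distinct([3, 1, 3, 2, 3, 3, 2, 1]))  # [1, 1, 2, 2, 3, 3, 3, 3]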
The idea is to use the majority-vote algorithm, which takes O(n): discover the "half" value, delete all its occurrences from the array, and then repeat on the new, smaller array.
n + n/2 + n/4 + n/8 + ... < 2n => O(n)
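A hedged Python sketch of this idea (the helper names are mine; because the biggest remaining group can be exactly half rather than a strict majority, the sketch also checks the last element as a second candidate and keeps whichever occurs more often):

def half_value(a):
    # Boyer-Moore majority vote. The largest group may be exactly half,
    # so the true value is either the surviving candidate or the last
    # element; counting both resolves the tie.
    candidate, count = None, 0
    for x in a:
        if count == 0:
            candidate = x
        count += 1 if x == candidate else -1
    return max((candidate, a[-1]), key=a.count)

def sort_by_repeated_majority(a):
    remaining, runs = list(a), []
    while remaining:
        v = half_value(remaining)             # O(len(remaining))
        runs.append((v, remaining.count(v)))  # size of this group
        remaining = [x for x in remaining if x != v]
    out = []
    for v, k in sorted(runs):                 # only ~log n runs to sort
        out.extend([v] * k)
    return out

print(sort_by_repeated_majority([3, 1, 3, 2, 3, 3, 2, 1]))  # [1, 1, 2, 2, 3, 3, 3, 3]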
Go over the array once, keeping a hash map of the values seen (with their counts).
Like you said, there are only log(n) different values.
Now you have a list of all the different values - sorting them takes log(n)*log(log(n)).
Once you have the sorted unique list, it's easy to construct the sorted array: the most frequent value fills n/2 cells, the 2nd fills n/4, and so on.
The total run time is O(n + log(n)*log(log(n)) + n), which is O(n).
I am trying index mapping in hashing to search an element in an array. A linear search would take O(n) to search an element in an array of size n. In hashing, what we're doing is basically reducing the search time to O(1) by creating a 2D matrix of zeros (say hash[1000][2]) and setting hash[a[i]][0] to 1 if a[i] is positive and hash[-a[i]][1] to 1 if a[i] is negative. Here a is the array in which we are supposed to search for an element.
for (i = 0; i < n; i++)
{
    if (a[i] >= 0)
        hash[a[i]][0] = 1;   /* mark positive value a[i] as present */
    else
        hash[-a[i]][1] = 1;  /* mark negative value a[i] as present */
}
How much time does the above code take to execute?
Even with hashing, don't we have a time complexity of O(n), just like linear search? Isn't the time consumed to assign n ones in a 2D array of zeros equal to the time taken to search for an element linearly?
A typical thing to do here would be to maintain an array sorted by hash key, and then use a binary search to locate elements, giving worst-case time complexity of O(log n).
You can maintain the sort order by using the same binary search for inserting new elements as you use for finding existing elements.
That last point is important: as noted in comments, sorting before each search degrades search time to the point where brute-force linear search is faster for small datasets. But that overhead can be eliminated; the array never needs to be sorted if you maintain sort order when inserting new elements.
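A rough Python illustration of that scheme (bisect plays the role of the binary search; note that inserting into a Python list still shifts elements, which this sketch does not try to avoid):

import bisect

keys = []  # kept sorted at all times

def insert(x):
    # Binary search for the insertion point, then insert there,
    # so the list stays sorted.
    bisect.insort(keys, x)

def contains(x):
    # Binary search: O(log n) comparisons.
    i = bisect.bisect_left(keys, x)
    return i < len(keys) and keys[i] == x

for v in [42, 7, 19, 7]:
    insert(v)
print(keys, contains(19), contains(100))  # [7, 7, 19, 42] True False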
Hey so I'm just really stuck on this question.
I need to devise an algorithm (no need for code) that sorts a certain partially sorted array into a fully sorted array. The array has N real numbers, and the first N - [N\sqrt(N)] elements (the [] denotes the floor of this number) are sorted, while the rest are not. There are no special properties to the unsorted numbers at the end; in fact, I'm told nothing about them other than that they're real numbers like the rest.
The kicker is that the time complexity of the algorithm needs to be O(n).
My first thought was to sort only the unsorted numbers and then use a merge algorithm, but I can't figure out any sorting algorithm that would work here in O(n). So I'm thinking about this all wrong; any ideas?
This is not possible in the general case using a comparison-based sorting algorithm. You are most likely missing something from the question.
Imagine the partially sorted array [1, 2, 3, 4564, 8481, 448788, 145, 86411, 23477]. It contains 9 elements, the first 3 of which are sorted (note that floor(N/sqrt(N)) = floor(sqrt(N)) assuming you meant N/sqrt(N), and floor(sqrt(9)) = 3). The problem is that the unsorted elements are all in a range that does not contain the sorted elements. It makes the sorted part of the array useless to any sorting algorithm, since they will stay there anyway (or be moved to the very end in the case where they are greater than the unsorted elements).
With this kind of input, you still need to sort, independently, N - floor(sqrt(N)) elements. And as far as I know, N - floor(sqrt(N)) ~ N (the ~ basically means "is the same complexity as"). So you are left with an array of approximately N elements to sort, which takes O(N log N) time in the general case.
Now, I specified "using a comparison-based sorting algorithm", because sorting real numbers (in some range, like the usual floating-point numbers stored in computers) can be done in amortized O(N) time using a hash sort (similar to a counting sort), or maybe even a modified radix sort if done properly. But the fact that a part of the array is already sorted doesn't help.
In other words, there are only sqrt(N) unsorted elements at the end of the array. You can sort them with an O(k^2) algorithm such as insertion sort, which on sqrt(N) elements takes O(sqrt(N)^2) = O(N); then do the merge you mentioned, which also runs in O(N). Both steps together therefore take just O(N).
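A small Python sketch of that plan, assuming the last floor(sqrt(N)) elements are the unsorted part (the function name is mine):

import math

def sort_mostly_sorted(a):
    n = len(a)
    k = math.isqrt(n)            # ~sqrt(N) unsorted elements at the end
    head, tail = a[:n - k], a[n - k:]

    # Insertion sort on the sqrt(N)-element tail: O(sqrt(N)^2) = O(N).
    for i in range(1, len(tail)):
        x, j = tail[i], i - 1
        while j >= 0 and tail[j] > x:
            tail[j + 1] = tail[j]
            j -= 1
        tail[j + 1] = x

    # Standard linear merge of the two sorted runs: O(N).
    out, i, j = [], 0, 0
    while i < len(head) and j < len(tail):
        if head[i] <= tail[j]:
            out.append(head[i]); i += 1
        else:
            out.append(tail[j]); j += 1
    return out + head[i:] + tail[j:]

print(sort_mostly_sorted([1, 3, 5, 7, 9, 11, 2, 10, 4]))  # [1, 2, 3, 4, 5, 7, 9, 10, 11]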
Given an array of integers, what is the worst-case time complexity of finding a pair of integers that are the same?
I think this can be done in O(n) by using counting sort or by using XOR.
Am I right?
The question is not worried about space complexity, and the given answer says O(n log n).
Counting sort
If the input allows you to use counting sort, then all you have to do is sort the input array in O(n) time and then look for duplicates, also in O(n) time. This algorithm can be improved (although not in complexity), since you don't actually need to sort the array. You can create the same auxiliary array that counting sort uses, which is indexed by the input integers, and then add these integers one by one until the current one has already been inserted. At this point, the two equal integers have been found.
This solution provides worst-case, average and best-case linear time complexities (O(n)), but requires the input integers to be in a known and ideally small range.
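A hedged Python sketch of that early-exit variant, assuming the values lie in a known range 0..k-1:

def find_duplicate(a, k):
    # seen[v] is True once value v has been encountered: O(k) space.
    seen = [False] * k
    for x in a:
        if seen[x]:          # second occurrence found: stop early
            return x
        seen[x] = True
    return None              # all values distinct

print(find_duplicate([3, 1, 4, 1, 5], k=10))  # 1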
Hashing
If you cannot use counting sort, then you could fall back on hashing and use the same solution as before (without sorting), with a hash table instead of the auxiliary array. The issue with hash tables is that the worst-case time complexity of their operations is linear, not constant. Indeed, due to collisions and rehashing, insertions are done in O(n) time in the worst case.
Since you need O(n) insertions, that makes the worst-case time complexity of this solution quadratic (O(n²)), even though its average and best-case time complexities are linear (O(n)).
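The same idea with a hash set in place of the counting array (a sketch; the worst-case caveat above still applies):

def find_duplicate_hashed(a):
    seen = set()
    for x in a:
        if x in seen:        # average O(1) lookup, worst case O(n)
            return x
        seen.add(x)
    return None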
Sorting
Another solution, in case counting sort is not applicable, is to use another sorting algorithm. The worst-case time complexity for comparison-based sorting algorithms is, at best, O(n log n). The solution would be to sort the input array and look for duplicates in O(n) time.
This solution has worst-case and average time complexities of O(n log n), and depending on the sorting algorithm, a best-case linear time complexity (O(n)).
Following is the pseudo code for Counting Sort:
# input -- the array of items to be sorted; key(x) returns the key for item x
# n -- the length of the input
# k -- a number such that all keys are in the range 0..k-1
# count -- an array of numbers, with indexes 0..k-1, initially all zero
# output -- an array of items, with indexes 0..n-1
# x -- an individual input item, used within the algorithm
# total, oldCount, i -- numbers used within the algorithm

# calculate the histogram of key frequencies:
for x in input:
    count[key(x)] += 1

# calculate the starting index for each key:
total = 0
for i in range(k):   # i = 0, 1, ... k-1
    oldCount = count[i]
    count[i] = total
    total += oldCount

# copy to output array, preserving order of inputs with equal keys:
for x in input:
    output[count[key(x)]] = x
    count[key(x)] += 1

return output
As you can observe, all the keys are in the range 0 ... k-1. In your case the number itself is the key, and it has to be in a certain known range for counting sort to be applicable. Only then can it be done in O(n) time with O(k) space.
Otherwise, the solution is O(n log n) using any comparison-based sort.
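For completeness, a sketch of that comparison-based fallback in Python (sort, then compare adjacent elements):

def find_duplicate_by_sorting(a):
    b = sorted(a)                     # O(n log n)
    for prev, cur in zip(b, b[1:]):   # O(n) scan of adjacent pairs
        if prev == cur:
            return cur
    return None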
If you subscribe to integer sorts being O(n), then by all means this is O(n) by sorting + iterating until two adjacent elements compare equal.
Hashing is actually O(n^2) in the worst case (you have the world's worst hashing algorithm that hashes everything to the same index). Although in practice using a hash table to get counts will give you linear time performance (average case).
In reality, linear time integer sorts "cheat" by fixing the number of bits used to represent an integer as some constant k that can then be ignored later. (In practice, though, these are great assumptions and integer sorts can be really fast!)
Comparison-based sorts like merge sort will give you O(n log n) complexity in the worst case.
The XOR solution you speak of is for finding a single unique "extra" item between two otherwise identical lists of integers.
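As a small illustration of that XOR trick (finding the single extra item between two otherwise identical lists; the example values are mine):

def find_extra(shorter, longer):
    # XOR of all elements: matching pairs cancel out, leaving the extra one.
    acc = 0
    for x in shorter:
        acc ^= x
    for x in longer:
        acc ^= x
    return acc

print(find_extra([4, 9, 2], [2, 7, 4, 9]))  # 7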
Asking because of interest;
Assuming the comparison of elements in the array takes O(n),
would it be possible to sort the array in O(n) if 99% of the elements are the same?
And what if an element in the array appears more than n/2 times?
Is O(n) still possible?
No: once you find the element that occurs 99% of the time, you still need to sort the rest of the array, which contains n/100 elements. We know that comparison-based sorting has a lower bound of Omega(n log n), so sorting the remaining n/100 elements can't be done faster than O(n/100 log(n/100)), which is still O(n log n).
I was reading some practice interview questions and I have a question about this one. Given a list of random integers, each between 1 and 100, compute the sum of the k largest integers. Discuss space and time complexity, and whether the approach changes if each integer is between 1 and m, where m varies.
My first thought is to sort the array and compute the sum of the k largest numbers. Then I thought about using a binary tree structure where I could start looking from the bottom right of the tree. I am not sure whether my approach would change depending on whether the numbers are 1 to 100 or 1 to m. Any thoughts on the most efficient approach?
The most efficient way might be to use something like randomized quickselect. It doesn't run the sorting step to completion; instead it does just the partition step from quicksort. If you don't need the k largest integers in any particular order, this is the way I'd go. It takes linear time, but the analysis is not straightforward. m would have little impact on this. Also, you can write the code so that the sum is computed as you partition the array.
Time: O(n)
Space: O(1)
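A hedged Python sketch of that quickselect idea (the function name is mine; this version copies sublists for clarity, so it does not achieve the O(1) space claimed above):

import random

def sum_k_largest(a, k):
    # Randomized quickselect: partition around a random pivot, recurse only
    # on the side that still matters, summing the rest. Expected O(n) time.
    if k <= 0:
        return 0
    if k >= len(a):
        return sum(a)
    pivot = random.choice(a)
    greater = [x for x in a if x > pivot]
    equal = [x for x in a if x == pivot]
    if k <= len(greater):
        return sum_k_largest(greater, k)
    if k <= len(greater) + len(equal):
        return sum(greater) + pivot * (k - len(greater))
    smaller = [x for x in a if x < pivot]
    return sum(greater) + sum(equal) + sum_k_largest(smaller, k - len(greater) - len(equal))

print(sum_k_largest([5, 1, 9, 3, 7, 7], 3))  # 9 + 7 + 7 = 23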
The alternative is to sort using something like counting sort, which has a linear-time guarantee. As you say, the values are integers in a fixed range, so it would work quite well. As m increases, the space requirement goes up, but computing the sum within the buckets is quite efficient.
Time: O(m) in the worst case (see comments for the argument)
Space: O(m)
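A sketch of the counting approach, assuming the values are integers in 1..m (walk the buckets from the top until k elements have been taken):

def sum_k_largest_counting(a, k, m):
    # Bucket counts for values 1..m: O(n + m) time, O(m) space.
    count = [0] * (m + 1)
    for x in a:
        count[x] += 1
    total, remaining = 0, k
    for v in range(m, 0, -1):        # walk buckets from the largest value down
        take = min(count[v], remaining)
        total += v * take
        remaining -= take
        if remaining == 0:
            break
    return total

print(sum_k_largest_counting([5, 1, 9, 3, 7, 7], k=3, m=100))  # 23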
I'd say sorting is probably unnecessary. If k is small, then all you need to do is maintain a sorted list that truncates elements beyond the kth largest element.
Each step should be O(k) in the worst case, where the added element is the new maximum. However, the average case is much better: after a certain number of elements, most candidates will simply be smaller than the last element in the list, and the operation will be O(log(k)).
One way is to use a min-heap (implemented as a binary tree) of maximum size k. Checking whether a new element belongs in the heap is only O(1), since it's a min-heap and retrieving the minimum element is a constant-time operation. Each insertion step (or non-insertion, in the case of an element that is too small to be inserted) along the list of n elements is O(log k). The final tree traversal and summation step is O(k).
Total complexity:
O(n log k + k) = O(n log k)
Unless you have multiple cores (in which case parallel computing is an option), summation should only be done at the end. Computing the sum on the fly adds extra computation steps without actually reducing the time complexity (you will actually have more computations to do). You always have to sum k elements anyway, so why not avoid the additional addition and subtraction steps?
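A small Python sketch of the min-heap approach using heapq (heap of size at most k; the sum is taken once at the end, as suggested above):

import heapq

def sum_k_largest_heap(a, k):
    heap = []                       # min-heap holding the k largest seen so far
    for x in a:
        if len(heap) < k:
            heapq.heappush(heap, x)         # O(log k)
        elif x > heap[0]:                   # O(1) peek at the current minimum
            heapq.heapreplace(heap, x)      # O(log k) pop-and-push
    return sum(heap)                        # O(k), done once at the end

print(sum_k_largest_heap([5, 1, 9, 3, 7, 7], 3))  # 23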