I wrote/used this kind of sorting. I'm just wondering, does it have a name, or is it similar to any existing sorting algorithm? Also, is it even efficient/worth using or not really?
int s = 20;
int unsorted_array[s];
//(adding numbers to unsorted_array)
//We assume that every number is different to avoid any difficulties.
int i, i2, pos;
int sorted_array[s];
//!!! sorting algo starts here:
for(i=0; i<s; i++){
    pos = 0;
    for(i2=0; i2<s; i2++){
        if(unsorted_array[i] > unsorted_array[i2]){
            pos += 1;
        }
    }
    sorted_array[pos] = unsorted_array[i];
}
So what do you think? Is it slower/faster than other kinds of sorting methods? I'm still learning. Thanks for any replies!
Is it slower/faster than other kinds of sorting methods?
Let's analyze the time complexity of this function. How much work will it need to do as the size of the unsorted list grows?
The important part is the loops. They tell us how many times you need to do the thing, and that's what matters. Your loops can be broken down to this:
for(1 to s){
    for(1 to s){
        do that thing
    }
}
For each element it has to recheck every element. If there are 2 items, that means you do the thing 4 times. 3 items, 9 times. 4 items, 16 times. We say the time complexity is n^2 (n is the convention for the size) because the number of steps grows with the square of the size. That means the time it takes grows quadratically as the size increases. At 10 items it takes 100 steps. At 100 items, 10,000. At 1,000, 1,000,000. n^2 is to be avoided if possible.
Most sorting algorithms can do their work in n * log(n), or quasilinear, time. As the size increases, the time grows by n * log(n): faster than linear, but far slower than quadratic. (log(n) here is the natural logarithm, ln(n); the base only changes a constant factor.) At 10 items it takes about 23 steps. At 100, about 460. At 1,000, about 6,900. So your algorithm is slower.
Algorithms above n * log(n) grow so fast it's necessary to distort the vertical time scale to meaningfully fit them on the same graph with better performing algorithms.
As you can guess, for large numbers of items it's more important to have a better performing algorithm than to do the thing faster. An n^2 algorithm that does the thing 100 times faster per step than an n log n algorithm will still lose at about 600 items:
n^2 = 100 * n * ln(n)
n = 100 * ln(n)
n / ln(n) = 100
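As a quick sanity check of that break-even estimate (my own sketch, not part of the answer above), you can solve n / ln(n) = 100 numerically; the crossover lands just under 650 items, which matches the "about 600" figure:

import math

# Find where n / ln(n) reaches 100, i.e. where the n^2 algorithm that is
# 100x faster per step starts losing to the n * log(n) algorithm.
n = 2
while n / math.log(n) < 100:
    n += 1
print(n)  # 648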
It looks to me like some kind of reverse selection sort. A selection sort would say "what element goes in position 0?" and then find that element. Your sort seems to be saying "where does the element currently in position 0 go?", which is an equally valid question.
In terms of complexity, it's definitely O(n^2), which puts it on par with the other "inefficient" schemes like insertion, selection, bubble, etc. and puts it below the more sophisticated "better" ones like merge or quick. The main concerns I have on top of that are that you're actually iterating n^2 times, whereas algorithms like insertion or selection can get away with n (n + 1) / 2 (triangular numbers, as opposed to square ones), which is the same complexity class but a smaller number overall. Also, your sort requires us to come up with a new array in memory, whereas a lot of the existing ones (insertion and selection, in particular, since they're sort of close to yours) can be done in constant space without allocating any more arrays.
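For contrast, here is a minimal in-place selection sort in Python (my own illustration, not code from the question): it allocates no second array, and the counter shows the triangular-number count mentioned above (roughly n^2/2) rather than the full n^2.

def selection_sort(arr):
    # In-place: repeatedly move the smallest remaining element to the front.
    comparisons = 0
    n = len(arr)
    for i in range(n):
        smallest = i
        for j in range(i + 1, n):
            comparisons += 1
            if arr[j] < arr[smallest]:
                smallest = j
        arr[i], arr[smallest] = arr[smallest], arr[i]
    return comparisons

data = [5, 2, 9, 1, 7]
print(selection_sort(data), data)  # 10 comparisons for n=5, i.e. n*(n-1)/2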
This works by finding, for each element in the array, the number of elements less than it. That count is exactly the element's final position.
It is O(n*n), which is slow (there are decent O(n*lg(n)) algorithms). But there are plenty of O(n*n) sorting algorithms out there, so you're in good company.
It also only works with unique values, which is a pretty severe limitation.
It also requires a second array to copy into, which most sorting algorithms don't, so that's another minus point.
On the upside it does zero swaps and only a very small number of actual copies (exactly n, in fact), which in some cases may be a good thing, but that's a pretty minor plus point.
Related
I was wondering what the time complexity of this piece of code would be:
last = 0
ans = 0
array = [1, 2, 3, 3, 3, 4, 5, 6]
for number in array:
    if number != last:
        ans += 1
    last = number
return ans
I'm thinking O(n^2), as we look at all the array elements twice: once while executing the for loop and another time when comparing the two subsequent values. But I am not sure if my guess is correct.
While processing each array element you make just one comparison, based on which you update ans and last. The complexity of the algorithm is therefore O(n), not O(n^2).
The answer is actually O(1) for this case, and I will explain why after explaining why a similar algorithm would be O(n) and not O(n^2).
Take a look at the following example:
def do_something(array):
    last = 0
    ans = 0
    for number in array:
        if number != last:
            ans += 1
        last = number
    return ans
We go through each item in the array once, and do two operations with it.
The rule of thumb for time complexity is that you take the largest term and drop its constant factor.
If we actually wanted to calculate the exact number of operations, we might try something like:
for number in array:
    if number != last:    # n times
        ans += 1
    last = number         # n times
return ans                # 1 time
# total number of instructions = 2 * n + 1
Now, Python is a high-level language, so some of these operations are actually multiple operations put together, and that instruction count is not accurate. Instead, when discussing complexity we just take the largest contributing term (2 * n) and drop the coefficient to get n. Big-O is used when discussing the worst case, so we call this O(n).
I think you're confused because the algorithm you provided looks at two numbers at a time. The distinction you need to understand is that your code only looks at two numbers at a time, once for each item in the array. It does not look at every possible pair of numbers in the array. Even if your code looked at half of all possible pairs, it would still be O(n^2), because the 1/2 factor would be dropped.
Consider code that does. Here is an example of an O(n^2) algorithm:
for n1 in array:
    for n2 in array:
        print(n1 + n2)
In this example, we are looking at each pair of numbers. How many pairs are there? There are n^2 pairs of numbers. Contrast this with your question: we look at each number individually and compare it with last. How many (number, last) pairs are there? At most n, one per loop iteration, which we call O(n).
I hope this clears up why this would be O(n) and not O(n^2). However, as I said at the beginning of my answer this is actually O(1). This is because the length of the array is specifically 8, and not some arbitrary length n. Every time you execute this code it will take the same amount of time, it doesn't vary with anything and so there is no n. n in my example was the length of the array, but there is no such length term provided in your example.
Given an array of size n, we need to write an algorithm which checks whether there is a number that appears at least n/loglogn times.
I've understood that there's a way of doing it in O(n*logloglogn), which goes something like this:
Find the median using a selection algorithm and count how many times it appears. If it appears at least n/loglogn times, we return true. This takes O(n).
Partition the array around the median. This takes O(n).
Apply the algorithm recursively to both sides of the partition (two n/2 arrays).
If we reach a subarray of size less than n/loglogn, stop and return false.
Questions:
Is this algorithm correct?
The recurrence is T(n) = 2T(n/2) + O(n) and the base case is T(n/loglogn) = O(1). Now, the depth of the recursion tree is O(logloglogn), and since each level does O(n) work in total, the time complexity is O(n*logloglogn). Is that correct?
The suggested solution works, and the complexity is indeed O(n*logloglog(n)).
Let's say a "pass i" is the running of all recursive calls of depth i. Note that each pass requires O(n) time, since while each call is much less than O(n), there are several calls - and overall, each element is processed once in each "pass".
Now, we need to find the number of passes. This is done by solving the equation:
n/log(log(n)) = n / 2^x
<->
n/log(log(n)) * 2^x = n
And the idea is that each call halves the array until you get down to the predefined size of n/log(log(n)).
Solving this for x gives x = log(log(log(n))), as you can see in Wolfram Alpha, and thus the complexity is indeed O(n*log(log(log(n)))).
As for correctness: if an element repeats more than the required number of times, it must be in some subarray with size greater than or equal to the required size, and by constantly halving the array you will at some point arrive at a subarray where #repeats <= size(array) <= 2 * #repeats. At that point the element fills at least half of the subarray, so you are going to find it as the median, and find out it's indeed a "frequent item".
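Here is a minimal Python sketch of the recursion described in the question (my own transcription, not the answerer's code). It finds the median by sorting, so this particular sketch is O(n log n) per level; a real implementation would plug in a linear-time selection such as median-of-medians to get the bound discussed above.

import math

def has_frequent_item(arr):
    # Checks whether some value appears at least n / log(log(n)) times.
    n = len(arr)
    if n < 3:
        return n > 0                      # degenerate sizes: the threshold is not well defined
    threshold = n / math.log(math.log(n))

    def recurse(sub):
        if len(sub) < threshold:          # base case from the question: subarray too small
            return False
        # Stand-in for a linear-time selection of the median.
        median = sorted(sub)[len(sub) // 2]
        if sub.count(median) >= threshold:
            return True
        left = [x for x in sub if x < median]
        right = [x for x in sub if x > median]
        return recurse(left) or recurse(right)

    return recurse(arr)

print(has_frequent_item([7] * 90 + list(range(10))))  # True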
Another approach, in O(n*log(log(n))) time but with large constants, is suggested by Karp, Papadimitriou and Shenker, and is based on filling a table with "candidates" while processing the array.
Can we find the mode of an array in O(n) time, without using additional O(n) space or a hash table, given that the data is not sorted?
The problem is not easier than the element distinctness problem (1), so basically, without the additional space, the problem's complexity is Theta(nlogn) at best (and since it can be done in Theta(nlogn), that is indeed the case).
So basically, if you cannot use extra space for a hash table, the best you can do is sort and iterate, which is Theta(nlogn).
(1) Given an algorithm A that runs in O(f(n)) for this problem, it is easy to see that one can run A and then verify with one extra iteration that the resulting element repeats more than once, solving the element distinctness problem in O(f(n) + n).
Under the right circumstances, yes. Just for example, if your data is amenable to a radix sort, then you can sort with only constant extra space in linear time, followed by a linear scan through the sorted data to find the mode.
If your data requires comparison-based sorting, then I'm pretty sure O(N log N) is about as well as you can do in the general case.
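A minimal sketch of the sort-then-scan idea in Python (my own illustration; sorted() here is a stand-in, so this is not strictly constant extra space, but the scan shows why no hash table is needed once the data is ordered):

def mode_of(arr):
    # Sort, then walk the runs of equal values, keeping the longest run seen.
    if not arr:
        return None
    arr = sorted(arr)   # an in-place heapsort would give O(1) extra space
    best_val, best_count = arr[0], 0
    run_val, run_count = arr[0], 0
    for x in arr:
        if x == run_val:
            run_count += 1
        else:
            run_val, run_count = x, 1
        if run_count > best_count:
            best_val, best_count = run_val, run_count
    return best_val

print(mode_of([3, 1, 3, 2, 3, 1]))  # 3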
Just count the frequencies. This is not O(n) space, it is O(k), with k being the number of possible values in the range; that is independent of n, so in that sense it is constant space.
Time is clearly linear, O(n):
//init
counts = array[k]          // one counter per possible value
for i = 0 to k - 1
    counts[i] = 0
maxCnt = 0
maxVal = vals[0]

for val in vals
    counts[val]++
    if (counts[val] > maxCnt)
        maxCnt = counts[val]
        maxVal = val
The main problem here is that while k may be a constant, it may also be very, very large. Then again, k could also be small. Regardless, this does properly answer your question, even if it isn't practical.
Given an array of numbers and a value S, I need to find the minimum number of elements whose sum exceeds S. I know this can be done by sorting the array and taking the largest numbers until the required condition is met. That would take at least nlog(n) sorting time.
Is there any improvement over nlog(n)?
We can assume all numbers are positive.
Here is an algorithm that is O(n + size(smallest subset) * log(n)). If the smallest subset is much smaller than the array, this will be O(n).
Read http://en.wikipedia.org/wiki/Heap_%28data_structure%29 if my description of the algorithm is unclear (it is light on details, but the details are all there).
Turn the array into a max-heap, so that the biggest element is available; building the heap takes O(n).
Repeatedly extract the biggest element from the heap until the running sum is large enough. This takes O(size(smallest subset) * log(n)).
This is almost certainly the answer they were hoping for, though not getting it shouldn't be a deal breaker.
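A minimal Python sketch of that heap approach (my own; it uses heapq with negated values to simulate a max-heap, and assumes the goal is the smallest count of elements whose sum exceeds a target s):

import heapq

def smallest_subset_size(arr, s):
    # Max-heap via negation: heapify is O(n), each pop is O(log n).
    heap = [-x for x in arr]
    heapq.heapify(heap)
    total, count = 0, 0
    while heap and total <= s:
        total += -heapq.heappop(heap)
        count += 1
    return count if total > s else None  # None: even the whole array does not exceed s

print(smallest_subset_size([2, 7, 1, 4], 8))  # 2, because 7 + 4 > 8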
Edit: Here is another variant that is often faster, but can be slower.
Walk through the elements until the sum of the first few exceeds S. Keep that running total as current_sum.
Copy those elements into an array.
Heapify that array so that the minimum is easy to find; remember the minimum.
For each remaining element in the main array:
if min(in our heap) < element:
insert element into heap
increase current_sum by element
while S + min(in our heap) < current_sum:
current_sum -= min(in our heap)
remove min from heap
If we get to reject most of the array without manipulating our heap, this can be up to twice as fast as the previous solution. But it is also possible to be slower, such as when the last element in the array happens to be bigger than S.
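A Python sketch of this variant (my own transcription; note that I also trim the heap once right after building the initial prefix, since the first few elements may already include small values that are not needed; otherwise the steps are as described above):

import heapq

def smallest_subset_size_streaming(arr, s):
    # Build the initial prefix whose sum exceeds s.
    heap, current_sum, i = [], 0, 0
    while i < len(arr) and current_sum <= s:
        current_sum += arr[i]
        heap.append(arr[i])
        i += 1
    if current_sum <= s:
        return None                       # even the whole array does not exceed s
    heapq.heapify(heap)                   # min-heap over the kept elements
    # Extra trim of the initial prefix (my addition, see lead-in).
    while s + heap[0] < current_sum:
        current_sum -= heapq.heappop(heap)
    # Process the remaining elements as described above.
    for element in arr[i:]:
        if heap[0] < element:
            heapq.heappush(heap, element)
            current_sum += element
            while s + heap[0] < current_sum:
                current_sum -= heapq.heappop(heap)
    return len(heap)

print(smallest_subset_size_streaming([2, 7, 1, 4], 8))  # 2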
Assuming the numbers are integers, you can improve upon the usual n lg(n) complexity of sorting because in this case we have the extra information that the values are between 0 and S (for our purposes, integers larger than S are the same as S).
Because the range of values is finite, you can use a non-comparative sorting algorithm such as Pigeonhole Sort or Radix Sort to go below n lg(n).
Note that these methods are dependent on some function of S, so if S gets large enough (and n stays small enough) you may be better off reverting to a comparative sort.
Here is an O(n) expected time solution to the problem. It's somewhat like Moron's idea but we don't throw out the work that our selection algorithm did in each step, and we start trying from an item potentially in the middle rather than using the repeated doubling approach.
Alternatively, it's really just quickselect with a little additional bookkeeping for the remaining sum.
First, it's clear that if you had the elements in sorted order, you could just pick the largest items first until you exceed the desired sum. Our solution is going to be like that, except we'll try as hard as we can not to discover ordering information, because sorting is slow.
You want to be able to determine whether a given value is the cut-off: if we include that value and everything greater than it we meet or exceed S, but if we remove it we fall below S. If that holds, we are golden.
Here is the pseudocode. I didn't test it for edge cases, but this gets the idea across.
import random

def Solve(arr, s):
    # Returns the size of the smallest set of elements whose sum meets or exceeds s.
    # We could get rid of the worst-case O(n^2) behavior (which basically never
    # happens) by selecting the median here deterministically, but in practice
    # the constant factor of the algorithm would be much worse.
    i = random.randrange(len(arr))
    p = arr[i]
    rest = arr[:i] + arr[i + 1:]
    left_arr = [x for x in rest if x < p]    # p is in neither left_arr...
    right_arr = [x for x in rest if x >= p]  # ...nor right_arr
    right_sum = sum(right_arr)
    if right_sum + p >= s:
        if right_sum < s:
            # Solved it: p forms the cut-off.
            return len(right_arr) + 1
        # Took too much; at least we eliminated left_arr and p.
        return Solve(right_arr, s)
    else:
        # Didn't take enough yet: take all of right_arr and p,
        # and recurse on left_arr for the remaining sum.
        return len(right_arr) + 1 + Solve(left_arr, s - right_sum - p)
One improvement (asymptotically) over Theta(nlogn) you can do is to get an O(n log K) time algorithm, where K is the required minimum number of elements.
Thus if K is constant, or say log n, this is better (asymptotically) than sorting. Of course if K is n^epsilon, then this is not better than Theta(n logn).
The way to do this is to use selection algorithms, which can tell you the ith largest element in O(n) time.
Now search for K by starting with i = 1 (the largest element), doubling i at each turn, and binary searching between the last two bounds once you overshoot.
At each step you find the ith largest element, compute the sum of the i largest elements, and check whether it is greater than S.
This way, you would run O(log K) runs of the selection algorithm (which is O(n)) for a total running time of O(n log K).
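A rough Python sketch of that idea (my own; heapq.nlargest stands in for a true linear-time selection routine, so this sketch does not literally achieve O(n log K), but it shows the doubling-then-binary-search structure):

import heapq

def min_elements_exceeding(arr, s):
    def top_sum(i):
        # Stand-in for "select the i-th largest, then sum the i largest" in O(n).
        return sum(heapq.nlargest(i, arr))
    if sum(arr) <= s:
        return None
    # Double i until the i largest elements are enough...
    i = 1
    while top_sum(i) <= s:
        i *= 2
    # ...then binary search between the last two bounds.
    lo, hi = i // 2 + 1, i
    while lo < hi:
        mid = (lo + hi) // 2
        if top_sum(mid) > s:
            hi = mid
        else:
            lo = mid + 1
    return lo

print(min_elements_exceeding([2, 7, 1, 4], 8))  # 2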
Scan the numbers once: if you find some number >= S, you're done (a single element already suffices). Otherwise every number is < S.
Pigeonhole sort the numbers (all < S).
Sum elements from highest to lowest in the sorted order until you exceed S.
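A small Python sketch of that pigeonhole idea (my own, assuming non-negative integers, as in the answers above that treat the values as lying between 0 and S):

def min_count_pigeonhole(arr, s):
    # Any single value greater than s already answers the question.
    if any(x > s for x in arr):
        return 1
    # Pigeonhole counting: all remaining values lie in [0, s].
    counts = [0] * (s + 1)
    for x in arr:
        counts[x] += 1
    total, taken = 0, 0
    for value in range(s, -1, -1):        # walk values from highest to lowest
        for _ in range(counts[value]):
            total += value
            taken += 1
            if total > s:
                return taken
    return None                           # the whole array does not exceed s

print(min_count_pigeonhole([2, 7, 1, 4], 8))  # 2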
I was asked this interview question recently:
You're given an array that is almost sorted, in that each of the N elements may be misplaced by no more than k positions from the correct sorted order. Find a space-and-time efficient algorithm to sort the array.
I have an O(N log k) solution as follows.
Let's denote arr[0..N) to mean the elements of the array from index 0 (inclusive) to N (exclusive).
Sort arr[0..2k)
Now we know that arr[0..k) are in their final sorted positions...
...but arr[k..2k) may still be misplaced by k!
Sort arr[k..3k)
Now we know that arr[k..2k) are in their final sorted positions...
...but arr[2k..3k) may still be misplaced by k
Sort arr[2k..4k)
....
Until you sort arr[ik..N), then you're done!
This final step may be cheaper than the other steps when you have less than 2k elements left
In each step, you sort at most 2k elements in O(k log k), putting at least k elements in their final sorted positions at the end of each step. There are O(N/k) steps, so the overall complexity is O(N log k).
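A short Python sketch of the blockwise idea described above (my own transcription of the steps; sorted() on each 2k-window stands in for any O(k log k) sort):

def sort_k_misplaced(arr, k):
    # Sort overlapping windows of 2k elements; after each window,
    # its first k elements are guaranteed to be in their final positions.
    n = len(arr)
    if n == 0 or k <= 0:
        return
    start = 0
    while True:
        end = min(start + 2 * k, n)
        arr[start:end] = sorted(arr[start:end])
        if end == n:
            break
        start += k

a = [2, 1, 4, 3, 6, 5, 8, 7]   # every element at most 1 position out of place
sort_k_misplaced(a, 1)
print(a)                        # [1, 2, 3, 4, 5, 6, 7, 8]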
My questions are:
Is O(N log k) optimal? Can this be improved upon?
Can you do this without (partially) re-sorting the same elements?
As Bob Sedgewick showed in his dissertation work (and follow-ons), insertion sort absolutely crushes the "almost-sorted array". In this case your asymptotics look good but if k < 12 I bet insertion sort wins every time. I don't know that there's a good explanation for why insertion sort does so well, but the place to look would be in one of Sedgewick's textbooks entitled Algorithms (he has done many editions for different languages).
I have no idea whether O(N log k) is optimal, but more to the point, I don't really care: if k is small, it's the constant factors that matter, and if k is large, you may as well just sort the array.
Insertion sort will nail this problem without re-sorting the same elements.
Big-O notation is all very well for algorithm class, but in the real world, constants matter. It's all too easy to lose sight of this. (And I say this as a professor who has taught Big-O notation!)
If using only the comparison model, O(n log k) is optimal. Consider the case when k = n.
To answer your other question, yes it is possible to do this without sorting, by using heaps.
Use a min-heap of 2k elements. Insert 2k elements first, then remove min, insert next element etc.
This guarantees O(n log k) time and O(k) space and heaps usually have small enough hidden constants.
Since k is apparently supposed to be pretty small, an insertion sort is probably the most obvious and generally accepted algorithm.
In an insertion sort on random elements, you have to scan through N elements, and you have to move each one an average of N/2 positions, giving ~N*N/2 total operations. The "/2" constant is ignored in a big-O (or similar) characterization, giving O(N^2) complexity.
In the case you're proposing, the expected number of operations is ~N*k/2 -- but since k is a constant, the whole k/2 term is ignored in a big-O characterization, so the overall complexity is O(N).
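For reference, a plain insertion sort in Python (my own illustration): on an input where nothing is more than k positions out of place, the inner loop shifts each element at most k times, which is where the ~N*k/2 estimate above comes from.

def insertion_sort(arr):
    # Shift each element left until it reaches its place; on a k-misplaced
    # input the while loop runs at most k times per element.
    for i in range(1, len(arr)):
        value = arr[i]
        j = i - 1
        while j >= 0 and arr[j] > value:
            arr[j + 1] = arr[j]
            j -= 1
        arr[j + 1] = value

a = [2, 1, 4, 3, 6, 5]
insertion_sort(a)
print(a)  # [1, 2, 3, 4, 5, 6]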
Your solution is a good one if k is large enough. There is no better solution in terms of time complexity: each element might be out of place by k positions, which means you need to learn log2(k) bits of information to place it correctly, which means you need at least on the order of log2(k) comparisons per element. So it has to have a complexity of at least O(N log k).
However, as others have pointed out, if k is small, the constant terms are going to kill you. Use something that's very fast per operation, like insertion sort, in that case.
If you really wanted to be optimal, you'd implement both methods, and switch from one to the other depending on k.
It was already pointed out that one of the asymptotically optimal solutions uses a min heap and I just wanted to provide code in Java:
import java.util.PriorityQueue;

public void sortNearlySorted(int[] nums, int k) {
    PriorityQueue<Integer> minHeap = new PriorityQueue<>();
    // Seed the heap with the first k elements (guarding against k > nums.length).
    for (int i = 0; i < Math.min(k, nums.length); i++) {
        minHeap.add(nums[i]);
    }
    for (int i = 0; i < nums.length; i++) {
        if (i + k < nums.length) {
            minHeap.add(nums[i + k]);
        }
        // The smallest of the current k+1 candidates belongs at position i.
        nums[i] = minHeap.remove();
    }
}