I want to sort the elements in an array by their frequency.
Input: 2 5 2 8 5 6 8 8
Output: 8 8 8 2 2 5 5 6
Now one solution to this would be:
Sort the elements using Quick sort or Merge sort. O(nlogn)
Construct a 2D array of element and count by scanning the sorted array. O(n)
Sort the constructed 2D array according to count. O(nlogn)
Among the other possible methods that I have read about, one uses a Binary Search Tree and the other uses Hashing.
Could anyone suggest a better algorithm? I know the complexity can't be reduced, but I want to avoid so many traversals.
You can perform one pass over the array without sorting it, counting in a separate structure how many times each element appears. This could be a separate array, if you know the range of values you'll find, or a hash table, if you don't. In either case this process is O(n). Then you sort the second structure you generated (the one holding the counts), using the count associated with each element as the sort key. This second step is, as you said, O(nlogn) if you choose a proper algorithm.
For this second phase I would recommend Heap sort, by means of a priority queue. You can tell the queue to order the elements by the count attribute (the one calculated in step one), and then just add the elements one by one. When you finish adding, the queue is already sorted, and the algorithm has the desired complexity. To retrieve your elements in order you just have to start popping.
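A minimal C++ sketch of that approach, assuming integer input: one hash-table pass for the counts, then a priority queue ordered by count (ties between equally frequent values may come out in a different order than the sample output above):

```cpp
#include <cstdio>
#include <queue>
#include <unordered_map>
#include <utility>
#include <vector>

int main() {
    std::vector<int> a = {2, 5, 2, 8, 5, 6, 8, 8};

    // Pass 1: count occurrences in a hash table -- O(n).
    std::unordered_map<int, int> count;
    for (int x : a) ++count[x];

    // Pass 2: push (count, value) pairs into a max-heap ordered by count.
    std::priority_queue<std::pair<int, int>> pq;
    for (const auto& kv : count) pq.push({kv.second, kv.first});

    // Pop in order of decreasing frequency and emit each value count times.
    while (!pq.empty()) {
        auto [c, v] = pq.top();
        pq.pop();
        for (int i = 0; i < c; ++i) std::printf("%d ", v);
    }
    std::printf("\n");  // e.g. 8 8 8 5 5 2 2 6
}
```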
I have an initial array (random values, duplicates allowed) which needs to be sorted first and then "unsorted" back to the original array with the original order preserved.
Ex:
orig_array: char[512] = { 1, 2, 4, 1, 3, 5, a, b, c, ........ }
Sorted array: char[512] = { 1, 1, 2, 2, 3, ..... } (using sorting algorithms)
Unsorting back to original order: { 1, 2, 4, 1, 3, ...... }
I am stuck on how to implement the unsorting back to the original order in which the elements were created.
Some background on why I'm asking:
Unsorting the array is done completely on a different system. The sorted array is sent out to the network and is received by the host. The original data must be preserved.
I have a compression module on both ends to compress and decompress data. The compression module is CPU-expensive. My idea was that compressing sorted data sequences shouldn't consume much CPU and should be pretty quick to compress and decompress on both ends, which decreases the overall latency.
If you sort "in-place" - that is, reorder elements in the actual array, you cannot unsort. Sorry.
You could, instead, sort "out-of-place", or sort an array of pointers to elements in the array (using the array data for comparison) - keeping the original array as-is.
See also: In-place algorithms on Wikipedia
You cannot reverse a sort without retaining additional data. A trivial way to do that is to retain a copy of the data in their original order, for instance by performing an out-of-place sort. That's fast and simple, but it's not the only alternative.
Another way you could provide for reversing the process (per @lurker) is to make a list of the changes (swaps / permutations) performed on the array, in the order they are performed, and to reverse by going backward through that list and reversing each individual step. That's fairly simple, and approximately as fast as the original sort, but in the average-case asymptotic limit it requires more memory than just keeping a copy does.
Yet another way is to track the element permutation performed by the sort. You can do this by creating an auxiliary array of element indices, one per element to be sorted, initially in order. As the sort proceeds, it mirrors each element move with a corresponding move in the auxiliary array. When the sort is finished, the auxiliary array tells you what the original index of each element was, and you can use that information to return the elements to the original order. This is no less memory efficient than keeping a copy of the original data, and the order restoration can be performed in linear time.
Alternatively, you can avoid sorting the original data at all. If you instead sort an array of pointers to the data or, equivalently, indices of the data then you can find the sorted order without losing the original order in the first place.
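As a small sketch of that last idea, assuming plain char data as in the question: sort an array of indices with a comparator that looks into the data, and the original array is never disturbed:

```cpp
#include <algorithm>
#include <cstdio>
#include <numeric>
#include <vector>

int main() {
    std::vector<char> data = {1, 2, 4, 1, 3, 5};

    // Sort an array of indices instead of the data itself -- O(n log n).
    std::vector<size_t> idx(data.size());
    std::iota(idx.begin(), idx.end(), 0);          // 0, 1, 2, ...
    std::sort(idx.begin(), idx.end(),
              [&](size_t a, size_t b) { return data[a] < data[b]; });

    // Walk idx to visit the data in sorted order; data itself is untouched,
    // so the original order never has to be reconstructed.
    for (size_t i : idx) std::printf("%d ", data[i]);
    std::printf("\n");  // 1 1 2 3 4 5
}
```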
You say you want to sort and unsort because the data needs to be compressed (which takes time) before being sent over a network, and compressing sorted data "should" be faster than compressing unsorted data.
This sounds like a case of premature optimization. "Should" be faster sounds like a guess. You should test compressing sorted data vs. compressing unsorted data to see if there's any difference. You would also need to take the sorting and unsorting time into account. There's a good chance that the cost of sorting would be much more than any possible gain from compressing a sorted list.
Even if you find that compressing sorted data is faster, it may not matter.
Here's the problem with unsorting.
Forget about software for a moment. Suppose I give you the following sorted list:
2 7 9 10 15 18 19 20 26 29
Now unsort it.
How would you do it? Without any additional information, there's no way to know what the "proper" order is. It may very well have been sorted to begin with, in which case it's already "unsorted".
At a minimum, you would need to know what position each value was in when the list was unsorted. So if the unsorted list looked like this:
10 9 26 29 18 2 20 7 19 15
You would need another list giving the original index of each value in the sorted list:
5 7 1 0 9 4 8 6 2 3
But now you have two lists of numbers, so you've doubled the size of the data you need to compress and send. On top of that, one of those lists is unsorted. So you're back to where you started.
The only possible way you could gain anything is if the items in the original list are not just numbers but a large aggregate data type, so that the list of indexes is small compared to the original list. In that case, you could add the original list index to the datatype to simplify keeping track of it. Then you would again have to test if compressing a sorted list is faster than an unsorted list. But again, the chance that compressing a sorted list is faster is small, even more so if the list contains complex data types.
My recommendation: just compress and send the unsorted list.
You can create a copy of your original array before you sort it, or you can create another array of pointers and sort the pointers...
Memory is not the problem here, but time is; don't lose more time looking for a magic function that doesn't exist...
Suppose you have an array of integers (e.g. [1 5 3 4 6]). The elements are rearranged according to the following rule. Every element can hop forward (towards the left), sliding down the elements in the indices over which it hopped. The process starts with the element at the second index (i.e. 5). It has a choice to hop over element 1 or it can stay in its own position. If it does choose to hop, element 1 slides down to index 2. Let us assume it does choose to hop; our resultant array will then be [5 1 3 4 6]. Element 3 can now hop over 1 or 2 positions and the process repeats. If 3 hops over one position the array will be [5 3 1 4 6], and if it hops over two positions it will be [3 5 1 4 6].
It is very easy to show that every permutation of the elements can be produced in this way. Also, any final configuration can be reached by a unique sequence of hops.
The question is, given a final array and a source array, find the total number of hops required to arrive at the final array from the source. An O(N^2) implementation is easy to find, but I believe this can be done in O(N) or O(NlogN). If it is not possible to do better than O(N^2), that would also be good to know.
For example if the final array is [3,5,1,4,6] and the source array [1,5,3,4,6], the answer will be 3.
My O(N^2) algorithm is like this: you loop over the positions of the source array from the end, since we know the last element is the last one to move. Here that is 6, and we check its position in the final array. We calculate the number of hops necessary, then rearrange the final array to put that element back in its original position in the source array. The rearranging step goes over all the elements of the array, and the process loops over all the elements, hence O(N^2). Using a hashmap or map can help with the searching, but the map needs to be updated after every step, which keeps it at O(N^2).
P.S. I am trying to model correlation between two permutations in a Bayesian way and this is a sub-problem of that. Any ideas on modifying the generative process to make the problem simpler is also helpful.
Edit: I have found my answer. This is exactly what Kendall Tau distance does. There is an easy merge sort based algorithm to find this out in O(NlogN).
Consider the target array as an ordering. A target array [2 5 4 1 3] can be seen as [1 2 3 4 5], just by relabeling. You only have to know the mapping to be able to compare elements in constant time. In this instance, to compare 4 and 5 you check: index[4]=2 > index[5]=1 (in the target array), and so 4 > 5 (meaning: 4 must be to the right of 5 at the end).
So what you really have is just a vanilla sorting problem. The ordering is just different from the usual numerical ordering. The only thing that changes is your comparison function. The rest is basically the same. Sorting can be achieved in O(nlgn), or even O(n) (radix sort). That said, you have some additional constraints: you must sort in-place, and you can only swap two adjacent elements.
A strong and simple candidate would be selection sort, which will do just that in O(n^2) time. On each iteration, you identify the "leftiest" remaining element in the "unplaced" portion and swap it until it lands at the end of the "placed" portion. It can improve to O(nlgn) with the use of an appropriate data structure (priority queue for identifying the "leftiest" remaining element in O(lgn) time). Since nlgn is a lower bound for comparison based sorting, I really don't think you can do better than that.
Edit: So you're not interested in the sequence of swaps at all, only the minimum number of swaps required. This is exactly the number of inversions in the array (adapted to your particular needs: "non natural ordering" comparison function, but it doesn't change the maths). See this answer for a proof of that assertion.
One way to find the number of inversions is to adapt the Merge Sort algorithm. Since you have to actually sort the array to compute it, it turns out to be still O(nlgn) time. For an implementation, see this answer or this (again, remember that you'll have to adapt).
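A hedged sketch of that adaptation: relabel each source element by its index in the final array, then count inversions with a merge sort. The example uses the [1 5 3 4 6] → [3 5 1 4 6] pair from the question:

```cpp
#include <algorithm>
#include <cstdio>
#include <vector>

// Count inversions while merge-sorting a[lo..hi) -- O(n log n).
// An "inversion" is a pair (i < j) with a[i] > a[j]; to count hops, first
// relabel the source array by each element's position in the final array.
long long countInversions(std::vector<int>& a, int lo, int hi) {
    if (hi - lo <= 1) return 0;
    int mid = lo + (hi - lo) / 2;
    long long inv = countInversions(a, lo, mid) + countInversions(a, mid, hi);

    std::vector<int> merged;
    merged.reserve(hi - lo);
    int i = lo, j = mid;
    while (i < mid && j < hi) {
        if (a[i] <= a[j]) {
            merged.push_back(a[i++]);
        } else {
            inv += mid - i;          // a[i..mid) are all greater than a[j]
            merged.push_back(a[j++]);
        }
    }
    while (i < mid) merged.push_back(a[i++]);
    while (j < hi) merged.push_back(a[j++]);
    std::copy(merged.begin(), merged.end(), a.begin() + lo);
    return inv;
}

int main() {
    // Source [1 5 3 4 6], final [3 5 1 4 6]: relabel each source element by its
    // index in the final array -> [2 1 0 3 4], which has 3 inversions (= 3 hops).
    std::vector<int> relabeled = {2, 1, 0, 3, 4};
    std::printf("%lld\n", countInversions(relabeled, 0, (int)relabeled.size()));  // 3
}
```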
From your answer I assume the number of hops is the total number of adjacent swaps needed to transform the original array into the final array.
I suggest using something like insertion sort, but without the insertion part - the data in the arrays will not be altered.
You can maintain a queue t of stalled hoppers as a balanced binary search tree with counters (the number of elements in each subtree).
You can add an element to t, remove an element from t, rebalance t, and find an element's position in t, all in O(log C) time, where C is the number of elements in t.
A few words on finding an element's position: it is a binary search that accumulates the counts of the skipped left subtrees (plus 1 for each passed branch element, if you keep elements on branches).
A few words on balancing/addition/removal: you have to traverse upward from the removed/added element or subtree and update the counters. The overall number of operations still holds at O(log C) for insert+balance and remove+balance.
Let t be that (balanced-search-tree) queue, p the current index into the original array a, and q the current index into the final array f.
Now run one loop starting from the left side (p = 0, q = 0):
If a[p] == f[q], then the original array element hops over the whole queue. Add t.count to the answer, increment p, increment q.
If a[p] != f[q] and f[q] is not in t, then insert a[p] into t and increment p.
If a[p] != f[q] and f[q] is in t, then add f[q]'s position in queue to answer, remove f[q] from t and increment q.
I like the magic that ensures this process moves p and q to the ends of the arrays at the same time if the arrays really are permutations of one array. Nevertheless, you should probably check p and q for overflow to detect incorrect data, as we have no really faster way to prove the data is correct.
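Below is a small sketch of this loop (assuming the arrays are permutations of each other with distinct elements), checked against the [1 5 3 4 6] → [3 5 1 4 6] example. For brevity, t is a plain vector kept in insertion order, so membership, position and erase are O(C) each; the balanced BST with subtree counters described above brings those operations down to O(log C):

```cpp
#include <algorithm>
#include <cstdio>
#include <vector>

long long countHops(const std::vector<int>& a, const std::vector<int>& f) {
    std::vector<int> t;              // stalled hoppers, in insertion order
    long long answer = 0;
    size_t p = 0, q = 0;
    while (q < f.size()) {
        auto it = std::find(t.begin(), t.end(), f[q]);
        if (p < a.size() && a[p] == f[q]) {
            answer += t.size();        // a[p] hops over every stalled element
            ++p; ++q;
        } else if (it != t.end()) {
            answer += it - t.begin();  // hops over stalled elements inserted before it
            t.erase(it);
            ++q;
        } else {
            // Assumes a and f really are permutations of each other,
            // otherwise p could run past the end (the overflow note above).
            t.push_back(a[p]);         // a[p] stalls until it shows up in f
            ++p;
        }
    }
    return answer;
}

int main() {
    std::printf("%lld\n", countHops({1, 5, 3, 4, 6}, {3, 5, 1, 4, 6}));  // 3
}
```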
Is there a way to find the median of an unsorted array:
1- without sorting it.
2- without using the selection algorithm or the median of medians?
I found a lot of other questions similar to mine, but most of the solutions, if not all of them, discussed the SelectProblem and the MedianOfMedians.
You can certainly find the median of an array without sorting it. What is not easy is doing that efficiently.
For example, you could just iterate over the elements of the array; for each element, count the number of elements less than and equal to it, until you find a value with the correct count. That will be O(n^2) time but only O(1) space.
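A short sketch of that counting idea; it returns the lower median for even-length input, and the "correct count" test is less <= k < less + equal with k = (n-1)/2:

```cpp
#include <cstdio>
#include <vector>

// For each candidate x, count how many elements are less than it and how many
// equal it; x is the k-th smallest (k = (n-1)/2, the lower median) exactly when
// less <= k < less + equal. O(n^2) time, O(1) extra space.
int medianByCounting(const std::vector<int>& a) {
    const int k = (a.size() - 1) / 2;
    for (int x : a) {
        int less = 0, equal = 0;
        for (int y : a) {
            if (y < x) ++less;
            else if (y == x) ++equal;
        }
        if (less <= k && k < less + equal) return x;
    }
    return 0;  // unreachable for a non-empty array
}

int main() {
    std::printf("%d\n", medianByCounting({7, 1, 3, 9, 5}));  // 5
}
```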
Or you could use a min heap whose size is just over half the size of the array. (That is, if the array has 2k or 2k+1 elements, then the heap should have k+1 elements.) Build the heap using the first array elements, using the standard heap-building algorithm (which is O(N)). Then, for each remaining element x, if x is greater than the heap's minimum, replace the min element with x and sift it down (which is O(log N)). At the end, the median is either the heap's minimum element (if the original array's size was odd) or the average of the two smallest elements in the heap. So that's a total of O(n log n) time, and O(n) space if you cannot rearrange array elements. (If you can rearrange array elements, you can do this in-place.)
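A sketch of the heap approach using the standard library heap functions (the pop-and-push pair plays the role of the replace-and-sift-down step):

```cpp
#include <algorithm>
#include <cstdio>
#include <functional>
#include <vector>

// Keep a min-heap holding the largest k+1 elements seen so far (n = 2k or 2k+1).
// After one pass, the heap's minimum is the upper median; for even n the median
// is the average of the two smallest heap elements. O(n log n) time, O(n) space.
double medianByHeap(const std::vector<int>& a) {
    const size_t n = a.size(), h = n / 2 + 1;
    std::vector<int> heap(a.begin(), a.begin() + h);
    std::make_heap(heap.begin(), heap.end(), std::greater<int>());  // min-heap, O(h)

    for (size_t i = h; i < n; ++i) {
        if (a[i] > heap.front()) {            // a[i] belongs among the largest h
            std::pop_heap(heap.begin(), heap.end(), std::greater<int>());
            heap.back() = a[i];
            std::push_heap(heap.begin(), heap.end(), std::greater<int>());
        }
    }
    if (n % 2 == 1) return heap.front();
    int lo = heap.front();                    // smallest element of the heap
    std::pop_heap(heap.begin(), heap.end(), std::greater<int>());
    heap.pop_back();
    return (lo + heap.front()) / 2.0;         // average of the two smallest
}

int main() {
    std::printf("%.1f\n", medianByHeap({7, 1, 3, 9, 5}));      // 5.0
    std::printf("%.1f\n", medianByHeap({7, 1, 3, 9, 5, 11}));  // 6.0
}
```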
There is a randomized algorithm able to accomplish this task in O(n) steps (average case scenario), but it does involve sorting some subsets of the array. And, because of its random nature, there is no guarantee it will actually ever finish (though this unfortunate event should happen with vanishing probability).
I will leave the main idea here. For a more detailed description and for the proof of why this algorithm works, check here.
Let A be your array and let n = |A|. Let's assume all elements of A are distinct. The algorithm goes like this:
Randomly select t = n^(3/4) elements from A.
Let T be the "set" of the selected elements. Sort T.
Set pl = T[t/2-sqrt(n)] and pr = T[t/2+sqrt(n)].
Iterate through the elements of A and determine how many elements are less than pl (denoted by l) and how many are greater than pr (denoted by r). If l > n/2 or r > n/2, go back to step 1.
Let M be the set of elements in A in between pl and pr. M can be determined in step 4, just in case we reach step 5. If the size of M is no more than 4t, sort M. Otherwise, go back to step 1.
Return m = M[n/2-l] as the median element.
The main idea behind the algorithm is to obtain two elements (pl and pr) that enclose the median element (i.e. pl < m < pr) and are very close to each other in the ordered version of the array (and to do this without actually sorting the array). With high probability, all six steps only need to execute once (i.e. you will get pl and pr with these "good" properties from the first and only pass through steps 1-5, so no going back to step 1). Once you find two such elements, you can simply sort the elements in between them and find the median element of A.
Step 2 and Step 5 do involve some sorting (which might be against the "rules" you've mysteriously established :p). If sorting a sub-array is on the table, you should use some sorting method that does this in O(slogs) steps, where s is the size of the array you are sorting. Since T and M are significantly smaller than A the sorting steps take "less than" O(n) steps. If it is also against the rules to sort a sub-array, then take into consideration that in both cases the sorting is not really needed. You only need to find a way to determine pl, pr and m, which is just another selection problem (with respective indices). While sorting T and M does accomplish this, you could use any other selection method (perhaps something rici suggested earlier).
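For completeness, here is a hedged sketch of the six steps, assuming distinct elements; the sample is drawn with replacement for simplicity, and the retry check is written to guard the index into M directly:

```cpp
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <random>
#include <vector>

// Randomized median: expected O(n) time, retries when the sampled
// fences pl, pr turn out not to enclose the median.
int randomizedMedian(const std::vector<int>& A) {
    const long long n = A.size();
    std::mt19937 rng(std::random_device{}());
    std::uniform_int_distribution<long long> pick(0, n - 1);

    while (true) {
        // Steps 1-2: sample t = n^(3/4) elements (with replacement) and sort them.
        long long t = std::llround(std::pow((double)n, 0.75));
        long long s = std::llround(std::sqrt((double)n));
        std::vector<int> T(t);
        for (auto& x : T) x = A[pick(rng)];
        std::sort(T.begin(), T.end());

        // Step 3: fences around the middle of the sample.
        int pl = T[std::max(0LL, t / 2 - s)];
        int pr = T[std::min(t - 1, t / 2 + s)];

        // Step 4: count elements below pl and above pr, collecting M in between.
        long long l = 0, r = 0;
        std::vector<int> M;
        for (int x : A) {
            if (x < pl) ++l;
            else if (x > pr) ++r;
            else M.push_back(x);
        }
        if (n / 2 < l || n / 2 >= n - r) continue;   // median's rank falls outside M
        if ((long long)M.size() > 4 * t) continue;   // sample was unlucky

        // Steps 5-6: sort the small middle slice and read off the median.
        std::sort(M.begin(), M.end());
        return M[n / 2 - l];
    }
}

int main() {
    std::vector<int> A = {12, 3, 5, 7, 4, 19, 26, 23, 8};  // median is 8
    std::printf("%d\n", randomizedMedian(A));
}
```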
A non-destructive routine selip() is described at http://www.aip.de/groups/soe/local/numres/bookfpdf/f8-5.pdf. It makes multiple passes through the data, at each stage making a random choice of items within the current range of values and then counting the number of items to establish the ranks of the random selection.
Is there any easy method to calculate time complexity, not using searching and sorting?
For example: an array of size n is initialized with 0. Write code which inserts the value 3k at position 3k in the array, where k = 0, 1, …
Can you make this a bit more clear?
The way I am understanding this right now: are you asking how, for example, given the array
[1 2 2 1 4 5 6 3 2 4 ...]
multiply the value at k by 3?
If that is so, that would be a matter of indexing into the array.
If you are trying to find the value 3 in the array, there are multiple ways to go about it.
You could simply traverse the array, which is going to take at most O(n) time, but if it is sorted, just do a binary search.
edit:
By time complexity, the first one would be O(1) and the binary search would be O(log n)
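A tiny illustration of the two lookups, assuming a sorted int array:

```cpp
#include <algorithm>
#include <cstdio>
#include <vector>

int main() {
    std::vector<int> a = {1, 1, 2, 2, 3, 4, 5, 6};      // already sorted

    // Linear scan: O(n) regardless of ordering.
    bool foundLinear = std::find(a.begin(), a.end(), 3) != a.end();

    // Binary search: O(log n), but only valid because the array is sorted.
    bool foundBinary = std::binary_search(a.begin(), a.end(), 3);

    std::printf("%d %d\n", foundLinear, foundBinary);    // 1 1
}
```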
I have an array of structs called struct Test testArray[25].
The Test struct contains a member called int size.
What is the fastest way to get another array of Test structs that contains everything from the original excluding the 5 largest, based on the member size? WITHOUT modifying the original array.
NOTE: The number of items in the array can be much larger; I was just using this size for testing, and the values could be dynamic. I just wanted a smaller subset for testing.
I was thinking of making a copy of the original testArray and then sorting that array. Then return an array of Test structs that did not contain the top 5 or bottom 5 (depending on asc or desc).
OR
Iterating through the testArray looking for the largest 5 and then making a copy of the original array excluding the largest 5. This way seems like it would iterate through the array too many times, comparing against the array of the 5 largest found so far.
Follow up question:
Here is what I am doing now; let me know what you think.
Considering the number of largest elements I am interested in is going to remain the same, I am iterating through the array, getting the largest element and swapping it to the front of the array. Then I skip the first element and look for the largest after that and swap it into the second index, and so on, until I have the first 5 largest. Then I stop sorting and just copy everything from the sixth index to the end into a new array.
This way, no matter what, I only iterate through the array 5 times, and I do not have to sort the whole thing.
Partial Sorting with a linear time selection algorithm will do this in O(n) time, where sorting would be O(nlogn).
To quote the Partial Sorting page:
The linear-time selection algorithm described above can be used to find the k smallest or the k largest elements in worst-case linear time O(n). To find the k smallest elements, find the kth smallest element using the linear-time median-of-medians selection algorithm. After that, partition the array with the kth smallest element as pivot. The k smallest elements will be the first k elements.
You can find the k largest items in O(n), although making a copy of the array or an array of pointers to each element (smarter) will cost you some time as well, but you have to do that regardless.
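As a sketch of the selection-on-an-array-of-indices idea (the struct layout and function name here are hypothetical; std::nth_element is introselect, average O(n) rather than worst-case linear, so substitute median-of-medians if the worst-case guarantee matters):

```cpp
#include <algorithm>
#include <cstdio>
#include <numeric>
#include <vector>

struct Test { int size; /* other members... */ };

// Copy everything except the k structs with the largest `size`, leaving the
// original array untouched. Selection is done on an array of indices.
std::vector<Test> withoutFiveLargest(const Test* arr, size_t n, size_t k = 5) {
    std::vector<size_t> idx(n);
    std::iota(idx.begin(), idx.end(), 0);

    // Partition the indices so the k largest sizes end up in idx[0..k).
    std::nth_element(idx.begin(), idx.begin() + k, idx.end(),
                     [&](size_t a, size_t b) { return arr[a].size > arr[b].size; });

    std::vector<bool> excluded(n, false);
    for (size_t i = 0; i < k; ++i) excluded[idx[i]] = true;

    std::vector<Test> out;
    out.reserve(n - k);
    for (size_t i = 0; i < n; ++i)
        if (!excluded[i]) out.push_back(arr[i]);   // original order preserved
    return out;
}

int main() {
    Test testArray[25];
    for (int i = 0; i < 25; ++i) testArray[i].size = (i * 7) % 26;  // sample data
    std::vector<Test> rest = withoutFiveLargest(testArray, 25);
    std::printf("%zu structs kept\n", rest.size());  // 20 structs kept
}
```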
If you'd like me to give a complete explanation of the algorithm involved, just comment.
Update:
Regarding your follow up question, which basically suggests iterating over the list five times... that will work. But it iterates over the list more times than you need to. Finding the k largest elements in one pass (using an O(n) selection algorithm) is much better than that. That way you iterate once to make your new array, and once more to do the selection (if you use median-of-medians, you will not need to iterate a third time to remove the five largest items as you can just split the working array into two parts based on where the 5th largest item is), rather than iterating once to make your new array and then an additional five times.
As stated, sorting is O(nlogn) and iterating is O(5n). In the general case, finding the m largest numbers is O(nlogn) with the sorting algorithm and O(mn) with the iteration algorithm. Which algorithm is better depends on the values of m and n: ignoring constant factors, iterating wins once log n exceeds m, i.e. for m = 5 once the array has more than 2 to the 5th = 32 elements. On top of that, sorting operations are more expensive than the simple comparisons done while iterating, so sorting has to overcome that constant factor as well before it is actually faster.
You can do better theoretically by keeping a sorted array of the m largest numbers seen so far and using binary search to maintain its order, which gives O(nlogm), but again that depends on the values of n and m.
Maybe an array isn't the best structure for what you want, especially since you need to sort it every time a new value is added. Maybe a linked list is better, with a sort on insert (which is O(N) in the worst case and O(1) in the best), then just discard the last five elements. Also, you have to consider that just switching a pointer is considerably faster than reallocating the entire array just to get another element in there.
Why not an AVL tree? Lookup and insertion are O(log2 N), but you have to consider the time spent rebalancing the tree, and whether the time spent coding that is worth it.
Using a min-heap data structure with its size capped at 5, you can traverse the array and, whenever the heap's minimum element is less than the current array element, insert that element into the heap (evicting the minimum).
getMin takes O(1) time and insertion takes O(log(k)) time, where k is the size of the heap (in our case 5). So in the worst case we have complexity O(n*log(k)) to find the 5 largest elements. Another O(n) pass produces the excluded list.
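A short sketch of that selection pass on plain integer sizes (hypothetical sample values); after the loop the heap holds the five values to exclude, and a second O(n) pass would copy every other struct into the result:

```cpp
#include <cstdio>
#include <functional>
#include <queue>
#include <vector>

int main() {
    std::vector<int> sizes = {14, 3, 27, 9, 31, 5, 22, 18, 40, 11};

    // Min-heap capped at k = 5: after the pass it holds the 5 largest sizes.
    const size_t k = 5;
    std::priority_queue<int, std::vector<int>, std::greater<int>> heap;
    for (int s : sizes) {
        if (heap.size() < k) heap.push(s);
        else if (s > heap.top()) {      // s beats the smallest of the current top 5
            heap.pop();
            heap.push(s);
        }
    }

    // These are the values a second pass would exclude when copying the structs.
    while (!heap.empty()) { std::printf("%d ", heap.top()); heap.pop(); }
    std::printf("\n");  // 18 22 27 31 40
}
```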