Sorting and merging two arrays in an efficient way?

We have two unsorted arrays, of capacity n and n+m. The first array holds n elements. The second array holds m elements, with n places reserved for more.
The goal is to merge the two arrays and store the result in the second array in sorted order, without using extra space.
Currently I sort both arrays with quicksort and then combine them with the merge step of merge sort. Is there a more efficient way to achieve this?

You can explore merge sort.
https://www.google.com/search?q=mergesort&ie=UTF-8&oe=UTF-8&hl=en&client=safari#itp=open0
Or, depending on the size, you can quicksort each array and then combine them using the merge technique from merge sort (or merge first, then quicksort).
I would go with mergesort; it basically works by sorting each part individually and then putting the parts together in order.
You're looking at O(n log n) for mergesort and O(n log n) on average for quicksort, but a possible O(n^2) worst case with quicksort.

Clearly the best thing to do is to copy the contents of N into the free space in the N+M array and quicksort the N+M array.
By doing two quicksorts and then a merge you are just making the entire operation less efficient.
Here is a mental exercise: if you had to sort an array of length M, would you split it into two arrays, M1 and M2, sort each, and then merge them together? No. If you did that, you would just be limiting the information available to each call of quicksort, slowing down the process.
So why would you keep your two starting arrays separate?
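A minimal sketch of this copy-then-sort approach in Java (the method name is mine; it assumes the larger array holds its m elements in positions 0..m-1, as in the question, and that the n extra elements arrive in a separate array):

import java.util.Arrays;

class CopyThenSort {
    // b has capacity m + n, with its m elements in positions 0..m-1;
    // a holds the n extra elements. One array copy, one sort.
    static void mergeBySorting(int[] a, int[] b, int m) {
        System.arraycopy(a, 0, b, m, a.length); // fill b's free space with a
        Arrays.sort(b, 0, m + a.length);        // single sort over all m + n elements
    }
}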

If I also wanted to guarantee O(n log n) behaviour, I would use a modified version of heapsort that uses both arrays as the base for the heap and stores the sorted data in the additional part of the array.
This might also be faster than two quicksorts, because it does not require the additional merge operation. Quicksort is also terribly slow when small arrays are being sorted (the size of the problem is not mentioned in the statement of the question).

If you are storing the result in a second array, then you are using extra space, but you can minimize that by helping the GC like this:
join both arrays in a second array
set both previous variables to null so they are eligible to be garbage collected
sort the second array, Arrays.sort(...) - O(n log(n))
Look at the javadoc for this method:
/**
* Sorts the specified array into ascending numerical order.
*
* <p>Implementation note: The sorting algorithm is a Dual-Pivot Quicksort
* by Vladimir Yaroslavskiy, Jon Bentley, and Joshua Bloch. This algorithm
* offers O(n log(n)) performance on many data sets that cause other
* quicksorts to degrade to quadratic performance, and is typically
* faster than traditional (one-pivot) Quicksort implementations.
*
* @param a the array to be sorted
*/
public static void sort(int[] a)
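For illustration, a compact sketch of the join-then-sort steps above (the names are placeholders; note that nulling the parameters only clears the local references, so callers must drop theirs too for the originals to become collectable):

import java.util.Arrays;

class JoinAndSort {
    static int[] joinAndSort(int[] first, int[] second) {
        int[] joined = new int[first.length + second.length];
        System.arraycopy(first, 0, joined, 0, first.length);
        System.arraycopy(second, 0, joined, first.length, second.length);
        first = null;        // clear local references; callers must also
        second = null;       // drop theirs for GC to reclaim the originals
        Arrays.sort(joined); // dual-pivot quicksort, O(n log(n)) on typical data
        return joined;
    }
}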

Let's call the smaller array the N array and the other one the M array. I'm assuming the elements of the M array are initially in locations 0 through m-1. Sort both arrays using your favorite technique, which may depend on other criteria such as stability or limiting worst-case behavior.
if min(N) > max(M):
    copy N's elements over, starting at location m              [O(n) time]
else:
    move M's elements to the end of the M array, last down to first   [O(m) time]
    if min(M) > max(N):
        copy N's elements over, starting at location 0          [O(n) time]
    else:
        perform a classic merge: the minimum of the remaining m's and n's
        gets migrated to the next available space in M          [O(m+n) time]
Overall this is dominated by the initial sorting time; the merge phase is all linear. Migrating the m's to the end of the M array guarantees no space collisions, so you don't need extra side storage, as per your specification.
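Here is one way this could look in Java, as a sketch under the stated layout: a holds the sorted n elements, b has capacity n + m with its sorted m elements in positions 0..m-1, and both arrays are non-empty.

class InPlaceMerge {
    static void mergeInto(int[] a, int[] b, int m) {
        int n = a.length;
        if (a[0] > b[m - 1]) {                   // min(N) > max(M): just append N
            System.arraycopy(a, 0, b, m, n);
            return;
        }
        for (int k = m - 1; k >= 0; k--)         // shift M's elements to the end,
            b[k + n] = b[k];                     // last down to first
        if (b[n] > a[n - 1]) {                   // min(M) > max(N): N goes in front
            System.arraycopy(a, 0, b, 0, n);
            return;
        }
        int i = 0, j = n, k = 0;                 // classic merge from the front;
        while (i < n && j < n + m)               // the write index k never catches
            b[k++] = a[i] <= b[j] ? a[i++] : b[j++]; // up with the read index j
        while (i < n) b[k++] = a[i++];           // leftover N elements
        // leftover M elements are already in their final positions
    }
}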

Related

Array with specific values

Given an array of size n where:
half of the array holds a single (unknown) value;
a quarter of the array holds a single (unknown) different value;
and so on for 1/8, 1/16, 1/32, ...
Give an algorithm to sort the array.
You cannot use the find median algorithm
So what I figured is:
there are only log n different values;
there is a simple solution using a binary heap in O(n log log n);
it looks like a question that needs to be solved in O(n).
Here is one possible approach:
scan the array and store element frequencies (there are log n distinct elements) in a hash table in amortized O(n) time; this is doable because we can do insertions in amortized O(1) time;
now run a classic sorting algorithm on these log n elements: this is doable in deterministic O(log n log log n) time using, say, heap sort or merge sort;
now expand the sorted array---or create a new one and fill it using the sorted array and the hash table---using frequencies from the hash table; this is doable in O(n) amortized time.
The whole algorithm thus runs in amortized O(n) time, i.e., it is dominated by eliminating duplicates and expanding the sorted array. The space complexity is O(n).
This is essentially optimal because you need to "touch" all the elements to print the sorted array, which means we have a matching lower bound of Omega(n) on the running time.
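A sketch of this pipeline in Java (hash the frequencies, sort only the distinct values, then expand; the class and method names are mine):

import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

class FrequencySort {
    static void sort(int[] a) {
        Map<Integer, Integer> freq = new HashMap<>();
        for (int x : a) freq.merge(x, 1, Integer::sum);   // O(n) amortized
        int[] distinct = freq.keySet().stream()
                             .mapToInt(Integer::intValue).toArray();
        Arrays.sort(distinct);                  // log n values: O(log n log log n)
        int k = 0;
        for (int v : distinct)                  // expand by frequency: O(n)
            for (int c = freq.get(v); c > 0; c--) a[k++] = v;
    }
}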
Another idea is to use the majority algorithm, which takes O(n), to discover the "half" value, delete it from the array, and then repeat on the new, smaller array:
n + n/2 + n/4 + n/8 + ... < 2n  =>  O(n)
Go over the array once, keeping a hash map of the values seen.
As you said, there are only log(n) different values.
Now you have a list of all the different values; sorting them takes log(n)*log(log(n)).
Once you have the sorted unique list, it's easy to construct the sorted array: each value fills as many cells as its frequency (n/2 for one value, n/4 for another, and so on).
The total running time is O(n + log(n)*log(log(n)) + n), which is O(n).

Comparing two String arrays in the most efficient way

This problem is about searching for strings in a master array (which contains the list of all UIDs). The second array contains all the strings to be searched for.
For example:
First array(Master List) contains: UID1 UID2 UID3... UID99
Second array contains: UID3 UID144 UID50
If a match is found in the first array then 1 is returned, otherwise 0 is returned. So the output for the above example should be 101.
What would be the most efficient approach (targeting C) to solve the above, keeping in mind that the traditional way of dealing with this would be O(n^2)?
sort the master string array and do binary search.
Efficient in terms of what?
I would go with @Trying's suggestion as a good compromise between decent running speed, low memory usage, and very (very!) low implementation complexity.
Just use qsort() to sort the first master array in place, then use bsearch() to search it.
Assuming n elements in the master array and m in the second array, this should give O(m*log n) time complexity which seems decent.
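The question targets C (hence qsort()/bsearch()), but the same idea reads naturally in Java, shown here for illustration (the method name is mine):

import java.util.Arrays;

class MasterSearch {
    // Sort the master list once, then binary-search each query:
    // O(n log n) for the sort plus O(m log n) for the searches.
    static String matches(String[] master, String[] queries) {
        Arrays.sort(master);
        StringBuilder out = new StringBuilder();
        for (String q : queries)
            out.append(Arrays.binarySearch(master, q) >= 0 ? '1' : '0');
        return out.toString();   // "101" for the example above
    }
}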
Another option is to build a hash table for the strings in the master list. That's a single O(M) pass (assuming the string lengths are O(1)). Then, assuming the hash distributes evenly, searching a single element takes O(M/S) on average, with S being the size of the hash table (even distribution means this is the average number of elements mapping to the same hash entry). You can further control the size to fine-tune the trade-off between space and efficiency.
There are mainly two good approaches for this problem:
Use a binary search: a binary search requires the UIDs in the first array to be sorted and allows you to find a solution in O(log n) where n is the number of elements in the master array. The total complexity would be O(m log n) with m the number of elements to be searched.
Use a hashmap: You can store the elements of the master array in a hashmap (O(n)) and then check whether your elements of the second array are in the hashmap (O(m)). The total complexity would be O(n+m).
While the complexity of the second approach looks better, keep in mind that if your hash is bad it could degrade to O(m*n) in the worst case (though that is very unlikely). It also uses more memory, and its individual operations are slower. In your case, I would use the first approach.
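For comparison, a sketch of the hash-based variant in Java, where a HashSet plays the role of the hashmap (names are mine):

import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class HashLookup {
    static String matches(String[] master, String[] queries) {
        Set<String> set = new HashSet<>(Arrays.asList(master)); // O(n) build
        StringBuilder out = new StringBuilder();
        for (String q : queries)                 // O(1) expected per lookup
            out.append(set.contains(q) ? '1' : '0');
        return out.toString();
    }
}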

Algorithm for sorting d sorted arrays

Please help me understand the running time of the following approach.
I have d already-sorted arrays (every array has more than 1 element) with n elements in total.
I want to end up with one sorted array of size n.
If I am not mistaken, insertion sort runs in linear time on partially sorted arrays.
So if I concatenate these d arrays into one n-element array and sort it with insertion sort,
isn't that a partially sorted array, and won't the running time of insertion sort on it be O(n)?
Insertion sort is O(n²), even when the original array is a concatenation of several presorted arrays. You probably need to use the merge step of mergesort to combine the several sorted arrays into one sorted array. This will give you O(n·log(d)) performance.
No, this will take quadratic time. Insertion sort is only linear if each element is at most a constant distance k away from the point where it would be in a sorted array, in which case it takes O(nk) time -- that's what's meant by partially sorted. You don't have that guarantee.
You can do this in linear time only under the assumption that the number of subarrays is guaranteed to be a small constant. In that case, you can use a k-way merge.
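A sketch of such a d-way merge in Java, using a binary heap of the current heads; each of the n elements costs O(log d) heap work, so the whole merge is O(n log d). The class name and array-of-arrays layout are my assumptions:

import java.util.PriorityQueue;

class KWayMerge {
    static int[] merge(int[][] arrays) {
        int n = 0;
        for (int[] a : arrays) n += a.length;
        // heap entries: {value, array index, element index}
        PriorityQueue<int[]> heap =
            new PriorityQueue<>((x, y) -> Integer.compare(x[0], y[0]));
        for (int i = 0; i < arrays.length; i++)
            if (arrays[i].length > 0) heap.add(new int[]{arrays[i][0], i, 0});
        int[] out = new int[n];
        for (int k = 0; k < n; k++) {
            int[] head = heap.poll();            // smallest remaining head
            out[k] = head[0];
            int i = head[1], j = head[2] + 1;    // refill from the same array
            if (j < arrays[i].length) heap.add(new int[]{arrays[i][j], i, j});
        }
        return out;
    }
}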
Insertion sort behaves fairly linearly for small values of N. If N is large then your performance will more likely be N^2.
The fact that the sub-arrays are sorted won't, I believe, help that much if N is sufficiently large.
Timsort is a good candidate for partially sorted arrays
If the arrays are known to be sorted, it's a simple matter of treating each array as a queue, sorting the "heads", selecting the smallest of the heads to put into the new array, then "popping" the selected value from its array.
If D is small then a simple bubble sort works well for sorting the heads, otherwise you should use some sort of insertion sort, since only one element needs to be placed into the order.
This is basically a "merge sort", I believe. Very useful when the list to be sorted exceeds working storage, since you can sort smaller lists first, without thrashing, then combine using very little working storage.

Find common elements in two sorted arrays [duplicate]

Possible Duplicate:
The intersection of two sorted arrays
We have two sorted arrays A and B. Besides comparing each element of one array with all the elements of the other, how do we design a better algorithm to find their common elements?
Hold two pointers: one for each array.
i <- 0, j <- 0
repeat while i < length(arr1) and j < length(arr2):
    if arr1[i] > arr2[j]: increase j
    else if arr1[i] < arr2[j]: increase i
    else: output arr1[i], increase both pointers
The idea: since the data is sorted, if the current element of one array is "too big" for the other array's current element, then the smaller element cannot match anything left in the bigger element's array, so its pointer can safely advance.
This solution requires a single traversal of the data: O(n), with good constants as well.
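A direct Java translation of the pseudocode above:

import java.util.ArrayList;
import java.util.List;

class SortedIntersection {
    static List<Integer> common(int[] arr1, int[] arr2) {
        List<Integer> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < arr1.length && j < arr2.length) {
            if (arr1[i] > arr2[j]) j++;           // arr2[j] can match nothing left
            else if (arr1[i] < arr2[j]) i++;      // arr1[i] can match nothing left
            else { out.add(arr1[i]); i++; j++; }  // common element found
        }
        return out;
    }
}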
If the lengths of the two arrays (say, A has N elements and B has M elements) are similar, then the best approach is to perform a linear search for one array's elements in the other array. Of course, since the arrays are sorted, each search should begin where the previous search stopped. This is the classic principle used in the "sorted array merge" algorithm. The complexity is O(N + M).
If the lengths are significantly different (say, M << N), then a much more optimal approach would be to iterate through elements of the shorter array and use binary search to look for these values in the longer array. The complexity is O(M * log N) in that case.
As you can see O(M * log N) is better than O(N + M) if M is much smaller than N, and worse otherwise.
The difference in array sizes that should trigger the switch from one approach to the other depends on practical considerations. It should be chosen based on practical experiments with your data.
These two approaches (linear and binary searches) can be "blended" into a single algorithm. Let's assume M <= N. In that case let's choose step value S = [N / M]. You take first element from array A and perform a straddled linear search for that element in array B with step S, meaning that you check elements B[0], B[S], B[2*S], B[3*S], ... and so on. Once you find the index range [S*i, S*(i+1)] that potentially contains the element you are searching for, you switch to binary search inside that segment of array B. Done. The straddled linear search for the next element of A begins where the previous search left off. (As a side note, it might make sense to choose the value of S equal to a power of 2).
This "blended" algorithm is the most asymptotically optimal search/merge algorithm for two sorted arrays in existence. However, in practice the more simple approach with choosing either binary or linear search depending on relative sizes of the arrays works perfectly well.
besides compare one with all the elements in other array
You will have to compare A[] to B[] in order to know that they are the same -- unless you know a lot about what kind of data they can hold. The nature of the comparison probably has many solutions and can be optimized as required.
If the arrays are very strictly created, i.e. only sequential values of a known pattern that always start from a known point, you could just look at the length of each array and know whether or not all items are common.
This unfortunately doesn't sound like a very realistic or useful kind of array, so you are back to checking for A[i] in B[].

Sort an array which is partially sorted

I am trying to sort an array with the following property:
it increases up to some point, then starts decreasing, then increases again, then decreases, and so on. Is there any algorithm that can sort this in less than O(n log n) complexity by making use of its being partially ordered?
Example array: 14, 19, 34, 56, 36, 22, 20, 7, 45, 56, 50, 32, 31, 45, ... up to n
Thanks in advance
Any sequence of numbers will go up and down and up and down again etc., unless it is already fully sorted (it may start with a down, of course). You could run through the sequence noting the points where it changes direction, then merge-sort the resulting runs together (reading the backward runs in reverse).
In general the complexity is N log N, because we don't know in advance how sorted the input is. If it is moderately well sorted, i.e. there are fewer changes of direction, it will take fewer comparisons.
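A sketch of the run-detection pass in Java: it records where each monotonic run starts and reverses the descending runs in place, after which the runs can be combined with a d-way merge like the heap-based sketch earlier on this page. Names are mine:

import java.util.ArrayList;
import java.util.List;

class RunFinder {
    static List<Integer> runStarts(int[] a) {
        List<Integer> starts = new ArrayList<>();
        int i = 0;
        while (i < a.length) {
            starts.add(i);
            int j = i + 1;
            if (j < a.length && a[j] < a[i]) {   // descending run: extend it...
                while (j + 1 < a.length && a[j + 1] <= a[j]) j++;
                for (int lo = i, hi = j; lo < hi; lo++, hi--) { // ...and reverse
                    int t = a[lo]; a[lo] = a[hi]; a[hi] = t;
                }
            } else {                             // ascending run: extend it
                while (j < a.length && a[j] >= a[j - 1]) j++;
                j--;                             // last index of the run
            }
            i = j + 1;                           // next run starts here
        }
        return starts;
    }
}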
You could find the change/partition points and perform a merge sort between pairs of partitions. This takes advantage of the existing ordering, since a normal merge sort starts from pairs of single elements.
Edit: Just trying to figure out the complexity here. Merge sort is n log(n), where the log(n) factor is the number of merge passes: first every pair of elements, then every pair of pairs, and so on until you reach the size of the array. Here you have n elements in p partitions, where p < n; each pass over all n elements halves the number of partitions, so there are log(p) passes, giving O(n log(p)).
See Topological sorting
If you know for a fact that the data is "almost sorted" and the set size is reasonably small (say an array that can be indexed by a 16-bit integer), then Shellsort is probably your best bet. Yes, it has a basic time complexity of O(n^2) (which the choice of gap sequence can reduce to a current best worst case of O(n·log^2(n))), but its performance improves with the sortedness of the input, down to a best case of O(n) on an already-sorted set. Using Sedgewick's gap sequence will give the best performance on those occasions when the input is not as sorted as you expected it to be.
Strand Sort might be close to what you're looking for. O(n sqrt(n)) in the average case, O(n) best case (list already sorted), O(n^2) worst case (list sorted in reverse order).
Share and enjoy.
