Finding Common Duplicates in Arrays

I have two integer arrays and no information about the range of their elements. All I know is the lengths of the two arrays, m and n. Both arrays contain some duplicates. I want to find the lowest common duplicate of the two arrays. Assuming I have limited memory, how do I solve this?

Maybe the merge sort algorithm can help you solve this problem. Merge sort is primarily an algorithm for sorting the elements of a list, but its crux is the divide-and-conquer approach. Since you mention that memory is limited, divide and conquer seems a reasonable way to attack the problem.
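
To make the sorting idea concrete: once both arrays are sorted, equal values sit next to each other, so one parallel pass can find the smallest value that repeats in both. The sketch below assumes "lowest common duplicate" means the smallest value occurring at least twice in each array, and it uses Python's built-in in-place sort rather than a hand-written merge sort; the function name is made up for illustration.

```python
def lowest_common_duplicate(a, b):
    """Return the smallest value that occurs at least twice in both lists, or None."""
    # Assumption: sorting the inputs in place is acceptable. With very tight
    # memory, an in-place sort such as heapsort could be swapped in here.
    a.sort()
    b.sort()
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] < b[j]:
            i += 1
        elif a[i] > b[j]:
            j += 1
        else:
            value = a[i]
            # Measure the run of equal values in each array.
            i_end = i
            while i_end < len(a) and a[i_end] == value:
                i_end += 1
            j_end = j
            while j_end < len(b) and b[j_end] == value:
                j_end += 1
            if i_end - i >= 2 and j_end - j >= 2:
                return value  # first hit is the lowest, since both are sorted
            i, j = i_end, j_end
    return None
```

Apart from whatever the sort itself needs, the scan keeps only a handful of indices, which is the appeal when memory is limited.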

One possible method (NB: there are probably more efficient ways to do this, both in memory and in performance):
Create two map instances, A and B, for the first and second array. The key is the integer and the value is the count of that integer in the respective array.
Count the number of occurrences of each integer in both arrays and store them in maps A and B accordingly.
In each of A and B, remove the (key, value) pairs whose value is less than 2.
Take the key sets of A and B and find their intersection.
Find the lowest key in the intersection, which is the answer.
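
The steps above can be sketched roughly as follows. This is a minimal illustration that uses `collections.Counter` as the map; the function name is hypothetical, and it returns `None` when no value is duplicated in both arrays.

```python
from collections import Counter

def lowest_common_duplicate_maps(a, b):
    """Map-based version: smallest value occurring at least twice in both lists."""
    counts_a = Counter(a)  # map A: value -> occurrences in the first array
    counts_b = Counter(b)  # map B: value -> occurrences in the second array
    # Keep only the keys whose count is at least 2 in each map.
    dup_a = {value for value, count in counts_a.items() if count >= 2}
    dup_b = {value for value, count in counts_b.items() if count >= 2}
    common = dup_a & dup_b  # intersection of the two key sets
    return min(common) if common else None
```

Note that the maps can grow as large as the number of distinct values, so this trades memory for simplicity.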

Related

sort array in fewest operations with k operations

I was at an interview, and the interviewer asked me to make a function that would find a list of operations that would sort an array in the fewest operations.
The allowed operations were swapping any two numbers in the array.
For example, given the array [0,3,4,1], two possible answers (with 0-based indices) would be
[swap(1,2), swap(1,3)]
[swap(2,3), swap(1,2)]
I had already seen a question like that here, so I solved it with a modified version of the solution there.
However, after that, the interviewer changed the question a bit.
The goal was still to find the shortest list of operations to sort the array, but now the allowed operations were rotating the array to the left, rotating the array to the right, and swapping the numbers at indices 0 and 1.
I tried to solve it with backtracking and a hash table, but my implementation did not work, and I also feel that there is a general solution to this problem given any k operations. For example, what if we were allowed to swap the two numbers in the middle or to rotate the array by two elements to the left but only one element to the right?
How would you solve this?
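
One general way to handle an arbitrary set of k operations is breadth-first search over array states, with a hash table of visited states. The sketch below assumes each operation is a function from a tuple to a tuple; the `OPERATIONS` table and the function name are hypothetical and model the rotate/swap variant described above. The state space grows factorially, so this is only practical for small arrays.

```python
from collections import deque

# Hypothetical operation set matching the second variant of the question.
OPERATIONS = {
    "rotate_left": lambda t: t[1:] + t[:1],
    "rotate_right": lambda t: t[-1:] + t[:-1],
    "swap_0_1": lambda t: (t[1], t[0]) + t[2:],
}

def shortest_sort_sequence(values, operations=OPERATIONS):
    """Return a shortest list of operation names that sorts values, or None."""
    start = tuple(values)
    goal = tuple(sorted(values))
    parent = {start: None}  # visited states -> (previous state, operation name)
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if state == goal:
            # Walk the parent links back to the start to recover the path.
            path = []
            while parent[state] is not None:
                state, name = parent[state]
                path.append(name)
            return path[::-1]
        for name, op in operations.items():
            nxt = op(state)
            if nxt not in parent:
                parent[nxt] = (state, name)
                queue.append(nxt)
    return None  # the goal is unreachable with the given operations
```

Because BFS explores states in order of distance from the start, the first time the sorted state is reached the recovered path is a shortest one. Passing a different dictionary of operations covers the "any k operations" generalization.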

Fast way to count smaller/equal/larger elements in array

I need to optimize my algorithm for counting how many numbers in an unsorted array are larger than, smaller than, or equal to a given number.
I have to do this many times, and the array can have thousands of elements.
The array doesn't change; only the number changes.
Example:
array: 1,2,3,4,5
n = 3
Number of <: 2
Number of >: 2
Number of ==:1
First thought:
Iterate through the array and check whether each element is >, <, or == n.
O(n*k) for k queries
Possible optimization:
O((n+k) * log n)
First sort the array (I'm using C qsort), then use binary search to find an equal number, and then somehow count the smaller and larger values. But how do I do that?
If the element exists (bsearch returns a pointer to it), I also need to check whether the array contains duplicates of it (so I need to look before and after the element while the neighbors are equal to it), and then use some pointer arithmetic to count the larger and smaller values.
How do I get the number of larger/smaller values given a pointer to an equal element?
And what do I do if I don't find the value (bsearch returns null)?

If the array is unsorted, and the numbers in it have no other useful properties, there is no way to beat the O(n) approach of walking the array once and counting items into the three buckets.
Sorting the array followed by a binary search would be no better than O(n), even assuming that you employ a sort algorithm that is linear in time (e.g. a radix sort). For comparison-based sorts, such as quicksort, the time would increase to O(n log n).
On the other hand, sorting would help if you need to run multiple queries against the same set of numbers. The time for k queries against n numbers would go from O(n*k) for k linear scans to O(n + k log n) assuming a linear-time sort, or O((n+k) log n) with a comparison-based sort. Given a sufficiently large k, the average query time would go down.
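
As a rough sketch of the sort-once, query-many idea: two binary searches per query give the lower and upper bounds of the run of equal values, and the three counts fall out of those two indices. This uses Python's `bisect` module for brevity (the question itself uses C); `make_counter` is a made-up name.

```python
from bisect import bisect_left, bisect_right

def make_counter(values):
    """Sort once, then answer (smaller, equal, larger) queries in O(log n) each."""
    ordered = sorted(values)

    def count(x):
        lo = bisect_left(ordered, x)   # index of the first element >= x
        hi = bisect_right(ordered, x)  # index of the first element > x
        return lo, hi - lo, len(ordered) - hi

    return count

count = make_counter([1, 2, 3, 4, 5])
print(count(3))  # (2, 1, 2)
```

This also covers the case where the value is absent: both bounds land on the same insertion point, so the "equal" count is simply 0.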

Since the array is (apparently?) not changing, presort it. This allows a binary search, which is O(log n) per lookup.
a.) Implement your own version of bsearch (it will be less code anyhow):
you can do it inline using indices rather than pointers;
you won't need a function pointer to a specialized comparison function.
b.) Since you say that you want to count the number of matches, you imply that the array can contain multiple entries with the same value (otherwise you would have used a boolean has_n).
This means you'll need to do a linear search for the beginning and end of the run of values equal to n.
From those two positions you can calculate the number of values less than n and greater than n.
It appears that you have some unwritten rule for choosing these (for n=3 your example seems to count the values greater than and less than 2, and equal to 1), so there is no way to give specific code.
c.) For further optimization (at the expense of memory) you can sort the data into a binary search tree of structs that hold not just the value, but also its count and the number of values before and after it. This may not use more memory at all if you have a lot of repeated values, but it is hard to tell without the dataset.
That's as much as I can help without code that describes your hidden algorithms and data, or at least a sufficient description (aside from recommending a course or courses in data structures and algorithms).
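
Option c.) can be approximated without an explicit tree. The sketch below swaps the binary search tree for parallel sorted arrays of distinct values, their counts, and the number of elements before each value, which is a simpler layout with the same lookup cost. The names `build_table` and `query` are hypothetical.

```python
from bisect import bisect_left
from itertools import groupby

def build_table(values):
    """Precompute distinct keys, their counts, and how many elements precede each key."""
    keys, counts, before = [], [], []
    seen = 0
    for key, run in groupby(sorted(values)):
        run_length = sum(1 for _ in run)
        keys.append(key)
        counts.append(run_length)
        before.append(seen)  # number of elements strictly smaller than key
        seen += run_length
    return keys, counts, before, seen

def query(table, x):
    """Return (smaller, equal, larger) for x using one binary search."""
    keys, counts, before, total = table
    i = bisect_left(keys, x)
    found = i < len(keys) and keys[i] == x
    smaller = before[i] if i < len(keys) else total
    equal = counts[i] if found else 0
    return smaller, equal, total - smaller - equal
```

With many repeated values the table is shorter than the original array, and no linear scan over a run of duplicates is needed at query time.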

Merge operation on two small unsorted arrays to produce one big sorted array

I was asked a question recently in an interview.
We are given an array A of size n+m with first n places filled with elements in random order (and m empty places at the end). Also, we have an array B with m elements in random order.
Write a merge function so that array A is filled with (n+m) elements in sorted order.
I was able to give an O((n+m)log(n+m)) solution.
Is there a better solution to this problem?

No, there is no better solution. Let t = max(m, n); then the complexity is O(t log t). How do we go about proving that there is no better solution?
Well, if there were a better solution to this problem when nothing is known about the data, then given any sufficiently large array of size N we could split it into arrays of sizes n and m and sort it in less than N log N time, which contradicts the lower bound for comparison-based sorting.
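
For reference, a solution within that bound might look like the sketch below: sort each part separately, then merge from the back of A so that no element is overwritten before it has been read. It assumes the empty slots of A hold placeholders such as `None`; `merge_into` is a made-up name.

```python
def merge_into(a, n, b):
    """Fill a (length n + len(b), first n slots used) with all elements in sorted order."""
    m = len(b)
    a[:n] = sorted(a[:n])  # sort the filled prefix of A
    b = sorted(b)          # sort a copy of B
    i, j, k = n - 1, m - 1, n + m - 1
    # Merge from the largest element down, writing into the free tail of A.
    while j >= 0:
        if i >= 0 and a[i] > b[j]:
            a[k] = a[i]
            i -= 1
        else:
            a[k] = b[j]
            j -= 1
        k -= 1
    return a

print(merge_into([5, 1, 3, None, None], 3, [4, 2]))  # [1, 2, 3, 4, 5]
```

The cost is O(n log n + m log m) for the two sorts plus O(n + m) for the merge, which stays within O(t log t).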

Finding Median in Three Sorted Arrays in O(logn)

By googling for minutes, I know the basic idea.
1. Let A, B, and C be sorted arrays containing n elements each.
2. Pick the median of each array and call them medA, medB, and medC.
3. Without loss of generality, suppose that medA > medB > medC.
4. The elements bigger than medA in array A cannot become the median of the three arrays. Likewise, the elements smaller than medC in array C cannot, so such elements will be ignored.
5. Repeat steps 2-4 recursively.
My question is, what is the base case?
Trying many candidate base cases, I tested the algorithm by hand for hours, but I was not able to find a correct one.
Also, the lengths of the three arrays will differ after each recursive step. Does step 4 work even if the lengths of the three arrays are different?

This algorithm works for two sorted arrays of the same size, but not for three. After one iteration, you eliminate half of the elements in A and C but leave B unchanged, so the arrays no longer have the same number of elements and the method no longer applies. For arrays of different sizes, if you apply the same method, you will be removing different numbers of elements from the lower half and the upper half, so the median of the remaining elements is not the same as the median of the original arrays.
That being said, you can modify the algorithm to eliminate the same number of elements at both ends in each iteration, although this could be inefficient when some of the arrays are very small and some are very large. You can also turn this into the problem of finding the k-th element: track the number of elements thrown away and adjust the value of k at each iteration. Either way, this is much trickier than the two-array situation.
There is another post that discusses the general case: Median of 5 sorted arrays

I think you can use the selection algorithm, slightly modified to handle more arrays.
You're looking for the median, which is the p=[n/2]th element.
Pick the median of the largest array, and find the splitting point for that value in the other two arrays (binary search, log(n)). Now you know that the selected number is the kth (k = sum of the positions).
If k > p, discard the elements above it in the 3 arrays; if k is smaller, discard those below it (discarding can be implemented by maintaining lower and upper indexes for each array separately). If it was smaller, also update p = p - k.
Repeat until k=p.
Oops, I think this is log(n)^2, let me think about it...
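
A minimal sketch of this selection idea, assuming 0-based ranks and a valid rank k; the function name and the per-array `lo`/`hi` window lists are illustrative choices. Each round picks the middle element of the largest remaining window as the pivot, counts by binary search how many remaining elements fall below it, and discards one side.

```python
from bisect import bisect_left, bisect_right

def kth_smallest(arrays, k):
    """Return the k-th smallest (0-based) element across several sorted lists.

    Assumes 0 <= k < total number of elements.
    """
    lo = [0] * len(arrays)          # lower index of each remaining window
    hi = [len(a) for a in arrays]   # upper (exclusive) index of each window
    while True:
        # Pivot: middle element of the largest remaining window.
        big = max(range(len(arrays)), key=lambda i: hi[i] - lo[i])
        pivot = arrays[big][(lo[big] + hi[big]) // 2]
        # Split points of the pivot inside every window.
        below = [bisect_left(a, pivot, lo[i], hi[i]) for i, a in enumerate(arrays)]
        upto = [bisect_right(a, pivot, lo[i], hi[i]) for i, a in enumerate(arrays)]
        n_below = sum(b - l for b, l in zip(below, lo))  # remaining elements < pivot
        n_upto = sum(u - l for u, l in zip(upto, lo))    # remaining elements <= pivot
        if k < n_below:
            hi = below       # answer is smaller: drop the pivot and everything above
        elif k >= n_upto:
            k -= n_upto      # answer is larger: drop the pivot and everything below
            lo = upto
        else:
            return pivot

a, b, c = [1, 4, 7], [2, 5, 8], [3, 6, 9]
print(kth_smallest([a, b, c], (len(a) + len(b) + len(c)) // 2))  # 5
```

Every round at least halves the largest window and costs one binary search per array, which is consistent with the log(n)^2 estimate above. The arrays may have different lengths, so no special base case for equal sizes is needed.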

How do I find common elements from n arrays

I am thinking of sorting and then doing binary search. Is that the best way?

I advocate for hashes in such cases: the running time will be proportional to the combined size of the arrays.
Since most major languages offer a hash table in their standard libraries, I hardly need to show you how to implement such a solution.

Iterate through each one and use a hash table to store counts. The key is the value of the integer and the value is the count of appearances.

It depends. If one set is substantially smaller than the other, or for some other reason you expect the intersection to be quite sparse, then a binary search may be justified. Otherwise, it's probably easiest to step through both at once. If the current element in one is smaller than in the other, advance to the next item in that array. When/if you get to equal elements, you send that as output and advance to the next item in both arrays. (This assumes that, as you proposed, you've already sorted both, of course.)
This is an O(N+M) operation, where N is the size of one array and M the size of the other. Using a binary search, you get O(N log M) instead, which can be lower complexity if one array is a lot smaller than the other, but is likely to be a net loss if they're close to the same size.
Depending on what you need/want, the versions that attempt to just count occurrences can cause a pretty substantial problem: if there are multiple occurrences of a single item in one array, they will still count that as two occurrences of that item, indicating an intersection that doesn't really exist. You can prevent this, but doing so renders the job somewhat less trivial -- you insert items from one array into your hash table, but always set the count to 1. When that's finished, you process the second array by setting the count to 2 if and only if the item is already present in the table.
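
Both ideas from this answer can be sketched briefly for the two-array case. The first function is the parallel step-through over already-sorted inputs; the second is the hash-table variant that stores 1 for items from the first array and sets 2 only for items also seen in the second. Function names are made up for illustration.

```python
def intersect_sorted(a, b):
    """Step through two sorted lists at once, emitting the elements they share."""
    out = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] < b[j]:
            i += 1
        elif a[i] > b[j]:
            j += 1
        else:
            out.append(a[i])  # equal elements: output, then advance both
            i += 1
            j += 1
    return out

def intersect_hashed(a, b):
    """Hash variant: repeats within one array cannot fake an intersection."""
    seen = {x: 1 for x in a}  # always 1, however often x occurs in a
    for x in b:
        if x in seen:
            seen[x] = 2       # set to 2 only if x was already present
    return sorted(x for x, flag in seen.items() if flag == 2)

print(intersect_sorted([1, 2, 4, 7], [2, 3, 4, 8]))  # [2, 4]
print(intersect_hashed([3, 3, 1], [3, 5]))           # [3]
```

The step-through needs no extra memory beyond the output but requires sorted input; the hash variant works on unsorted input at the cost of a table as large as the first array's distinct values.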

Define "best".
If you want to do it fast, you can do it in O(n) by iterating through each array and keeping a count for each unique element. Details of how to count the unique elements depend on the alphabet of things that can be in the array, e.g., is it sparse or dense?
Note that this is O(n) in the number of arrays, but O(nm) overall for n arrays of length m.

The best way is probably to hash all the values and keep a count of occurrences, culling all that have not occurred i times when you examine array i where i = {1, 2, ..., n}. Unfortunately, no deterministic algorithm can get you less than an O(n*m) running time, since it's impossible to do this without examining all the values in all the arrays if they're unsorted.
A faster algorithm would need either to accept some probability of error (Monte Carlo) or to rely on some known property of the lists that lets it examine only a subset of the elements (i.e. you only care about elements that have occurred in all i-1 previous lists when considering the ith list, but in an unsorted list it's non-trivial to search for elements).
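
The counting-with-culling idea might be sketched like this, assuming the elements are hashable; `common_to_all` is a hypothetical name. A value's count is bumped to i only if it stood at i-1, so repeats inside a single array are not counted twice, and anything that has not reached i after array i is dropped.

```python
def common_to_all(arrays):
    """Return the set of values that appear in every one of the given lists."""
    if not arrays:
        return set()
    counts = {x: 1 for x in arrays[0]}  # value -> number of arrays seen in so far
    for i, arr in enumerate(arrays[1:], start=2):
        for x in arr:
            # Only values present in all i-1 previous arrays can advance.
            if counts.get(x) == i - 1:
                counts[x] = i
        # Cull everything that did not occur in array i.
        counts = {x: c for x, c in counts.items() if c == i}
    return set(counts)

print(common_to_all([[1, 2, 3, 2], [2, 3, 4], [3, 2, 2, 9]]))  # {2, 3}
```

Every element of every array is examined once, so the running time is O(n*m) for n arrays of length m, matching the bound stated above.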