Finding common elements in two arrays of different size - arrays

I have to find the best way to get the common elements of two arrays of different sizes.
The arrays are unordered; the common elements appear at different positions, but in the same relative order (if common element b comes after a in array A, the same holds in array B), and at most N positions apart.
I can't use more than O(N) additional space.
Currently I extract N elements from array A, sort them with mergesort, and perform a binary search using N elements of array B. Then I take the next N elements, starting from the position of the match I found, and run another cycle.
The cost of this, using m as the length of array B, should be O(m N log N).
I have tried using a hash table, but to manage collisions I would have to implement a list, and efficiency goes down.
Is there a better way?

Assuming you can have "holes" in your matched sequence (A = [1,3,2] and B = [1,4,2] gives MatchSet = {1,2}):
Maybe I am wrong, but you could try this pseudocode:
i <- 0; j <- 0; jHit <- -1
matchSet <- Empty
While i < Length(A) AND j < Length(B):
    If A[i] == B[j] Then
        matchSet.add(A[i])
        i <- i+1
        jHit <- j
    End If
    j <- j+1
    If j == Length(B) Then
        i <- i+1
        j <- jHit+1
    End If
End Loop
The first index (i) points to the next element of A not yet found in B, whereas (j) is used to look for the next element of B (after the last element found in A).
This would give you a time complexity of O(mN) and space usage of O(N).
Here you have an implementation in Python:
def match(A, B):
    i = 0
    j = 0
    jHit = -1
    res = []
    while i < len(A) and j < len(B):
        if A[i] == B[j]:
            res.append(A[i])
            i += 1
            jHit = j
        j += 1
        if j == len(B):
            i += 1
            j = jHit + 1
    return res

Related

Given a sorted array and a positive integer k, find the number of integers in the intervals (100(i-1)/k, 100i/k] for 1 <= i <= k

Given a sorted array A[1..n] and a positive integer k, count the number of integers in the intervals (100(i-1)/k, 100i/k] for 1 <= i <= k, storing the counts in another array G[1..k].
Assume array G is already created (it is not an input to the algorithm) and every element of G is initialized to 0.
Also, there is a helper function Increase(i, count) that finds the interval G[?] that A[i] corresponds to and increases the value of G[?] by count.
For example, for the sorted array [1,11,25,34,46,62,78,90,99] and k = 4,
the result should be G[1] = 3, G[2] = 2, G[3] = 1, G[4] = 3,
where G[1] is the interval (0,25], G[2] -> (25,50], G[3] -> (50,75], G[4] -> (75,100].
Is there a divide-and-conquer algorithm to solve this problem, rather than solving it linearly?
More advanced:
What if we cannot directly access the elements of array A, and there is only a function Compare(x, y) that returns true if A[x] and A[y] are in the same interval?
How do we solve it then? Can I make each group call Increase at most log n times, so that with k groups the running time is O(k log n)?
My observation at this point:
if A[i] and A[y] are in the same interval, where i < y, then any element A[j] with i < j < y is also in that interval.
The easiest sublinear approach (assuming k << n) is to do (k+1) binary searches, one for each boundary value, yielding an approximately (k lg n)-comparison algorithm.
This can be lowered to approximately (k (1 + lg (n/k))) by combining probes together intelligently.
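For concreteness, here is a minimal Python sketch of the binary-search idea using the standard bisect module, one search per boundary (the function name interval_counts is mine):
from bisect import bisect_right

def interval_counts(A, k):
    # G[i-1] counts the elements of A in (100*(i-1)/k, 100*i/k].
    # One binary search per boundary: O(k log n) comparisons overall.
    G = []
    prev = 0
    for i in range(1, k + 1):
        pos = bisect_right(A, 100 * i / k)  # first index with A[pos] > boundary
        G.append(pos - prev)
        prev = pos
    return G

print(interval_counts([1, 11, 25, 34, 46, 62, 78, 90, 99], 4))  # [3, 2, 1, 3]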

Find the maximum length subarray condition 2 * min > max

This was an interview question I was recently asked at Adobe:
In an array, find the maximum length subarray with the condition 2 * min > max, where min is the minimum element of the subarray, and max is the maximum element of the subarray.
Does anyone have an approach better than O(n^2)?
Of course, we can't sort, as a subarray is required.
Below is my O(n^2) approach:
int maxLength = Integer.MIN_VALUE;
for (int i = 0; i < A.length - 1; i++) {
    for (int j = i + 1; j < A.length; j++) {
        int min = findMin(A, i, j);
        int max = findMax(A, i, j);
        if (2 * min > max) {
            if (j - i + 1 > maxLength)
                maxLength = j - i + 1;
        }
    }
}
Does anybody know an O(n) solution?
Let A[i…j] be the subarray consisting of A[i], A[i+1], … A[j].
Observations:
If A[i…j] doesn't satisfy the criterion, then neither does A[i…(j+1)], because 2·min(A[i…(j+1)]) ≤ 2·min(A[i…j]) ≤ max(A[i…j]) ≤ max(A[i…(j+1)]). So you can abort your inner loop as soon as you find a j for which the condition is not satisfied.
If we've already found a subarray of length L that meets the criterion, then there's no need to consider any subarray with length ≤ L. So you can start your inner loop with j = i + maxLength rather than j = i + 1. (Of course, you'll need to initialize maxLength to 0 rather than Integer.MIN_VALUE.)
Combining the above, we have:
int maxLength = 0;
for (int i = 0; i < A.length; ++i) {
    for (int j = i + maxLength; j < A.length; ++j) {
        if (findMin(A,i,j) * 2 > findMax(A,i,j)) {
            // success -- now let's look for a longer subarray:
            maxLength = j - i + 1;
        } else {
            // failure -- keep looking for a subarray this length:
            break;
        }
    }
}
It may not be obvious at first glance, but the inner loop now goes through a total of only O(n) iterations, because j can take each value at most once. (For example, if i is 3 and maxLength is 5, then j starts at 8. If A[3…8] meets the criterion, we increment maxLength until we find a subarray that doesn't meet the criterion. Once that happens, we progress from A[i…(i+maxLength)] to A[(i+1)…((i+1)+maxLength)], which means the next inner loop starts with a greater j than where the previous one left off.)
We can make this more explicit by refactoring a bit to model A[i…j] as a sliding-and-potentially-expanding window: incrementing i removes an element from the left edge of the window, incrementing j adds an element to the right edge of the window, and there's never any need to increment i without also incrementing j:
int maxLength = 0;
int i = 0, j = 0;
while (j < A.length) {
    if (findMin(A,i,j) * 2 > findMax(A,i,j)) {
        // success -- now let's look for a longer subarray:
        maxLength = j - i + 1;
        ++j;
    } else {
        // failure -- keep looking for a subarray this length:
        ++i;
        ++j;
    }
}
or, if you prefer:
int maxLength = 0;
int i = 0;
for (int j = 0; j < A.length; ++j) {
    if (findMin(A,i,j) * 2 > findMax(A,i,j)) {
        // success -- now let's look for a longer subarray:
        maxLength = j - i + 1;
    } else {
        // failure -- keep looking for a subarray this length:
        ++i;
    }
}
Since in your solution the inner loop iterates a total of O(n^2) times, and you've stated that your solution runs in O(n^2) time, we could argue that, since the above has the inner loop iterate only O(n) times, the above must run in O(n) time.
The problem is, that premise is really very questionable; you haven't indicated how you would implement findMin and findMax, but the straightforward implementation would take O(j−i) time, so your solution actually runs in O(n^3) rather than O(n^2). So if we reduce the number of inner-loop iterations from O(n^2) to O(n), that just brings the total time complexity down from O(n^3) to O(n^2).
But, as it happens, it is possible to calculate the min and max of these subarrays in amortized O(1) time and O(n) extra space, using "Method 3" at https://www.geeksforgeeks.org/sliding-window-maximum-maximum-of-all-subarrays-of-size-k/. (Hat-tip to גלעד ברקן for pointing this out.)

The way it works is, you maintain two deques, minseq for calculating min and maxseq for calculating max. (I'll only explain minseq; maxseq is analogous.) At any given time, the first element (head) of minseq is the index of the min element in A[i…j]; the second element of minseq is the index of the min element after the first element; and so on. (So, for example, if the subarray is [80,10,30,60,50] starting at index #2, then minseq will be [3,4,6], those being the indices of the subsequence [10,30,50].)

Whenever you increment i, you check whether the old value of i is the head of minseq (meaning that it's the current min); if so, you remove the head. Whenever you increment j, you repeatedly check whether the tail of minseq is the index of an element greater than or equal to the element at j; if so, you remove the tail and repeat. Once you've removed all such tail elements, you add j to the tail. Since each index is added to and removed from the deque at most once, this bookkeeping has a total cost of O(n).
That gives you overall O(n) time, as desired.
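To make that concrete, here is a minimal Python sketch of the final O(n) algorithm: the last loop above, with the two deques standing in for findMin/findMax (one possible organization of the bookkeeping, not the only one):
from collections import deque

def max_subarray_length(A):
    minseq, maxseq = deque(), deque()  # indices; heads are window min/max
    maxLength = 0
    i = 0
    for j in range(len(A)):
        # extend the window to the right, maintaining the deque invariants
        while minseq and A[minseq[-1]] >= A[j]:
            minseq.pop()
        minseq.append(j)
        while maxseq and A[maxseq[-1]] <= A[j]:
            maxseq.pop()
        maxseq.append(j)
        if 2 * A[minseq[0]] > A[maxseq[0]]:
            maxLength = j - i + 1  # success: the window grew by one
        else:
            if minseq[0] == i:     # failure: slide the window right
                minseq.popleft()
            if maxseq[0] == i:
                maxseq.popleft()
            i += 1
    return maxLength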
There's a simple O(n log n) time and O(n) space solution, since we know the length of the window is bounded: binary search on the window size. For each candidate window size we iterate over the array once, and we make O(log n) such traversals. If the window is too large, we won't find a solution and try a window half the size; otherwise we try a window halfway between this and the last successful window size. (To update the min and max in the sliding window we can use method 3 described here.)
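Assuming the same deque machinery, a sketch of that binary search might look like this (helper names are mine; the search is valid because any subwindow of a valid window is itself valid, so the feasible window sizes form a prefix):
from collections import deque

def window_works(A, L):
    # Is there a window of length L with 2*min > max?
    minq, maxq = deque(), deque()
    for j, v in enumerate(A):
        while minq and A[minq[-1]] >= v:
            minq.pop()
        minq.append(j)
        while maxq and A[maxq[-1]] <= v:
            maxq.pop()
        maxq.append(j)
        if minq[0] <= j - L:  # evict the index that left the window
            minq.popleft()
        if maxq[0] <= j - L:
            maxq.popleft()
        if j >= L - 1 and 2 * A[minq[0]] > A[maxq[0]]:
            return True
    return False

def longest_window(A):
    lo, hi, best = 1, len(A), 0
    while lo <= hi:
        mid = (lo + hi) // 2
        if window_works(A, mid):
            best, lo = mid, mid + 1
        else:
            hi = mid - 1
    return best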
Here's an algorithm in O(n lg k) time, where n is the length of the array and k the length of the maximum subarray having 2 * min > max.
Let A be the array. Let's start with the following invariant: for j between 0 and length(A), SA(j) is either empty or satisfies 2 * min > max. It is extremely easy to initialize: take the empty subarray from indices 0 to 0. (Note that SA(j) may be empty, because A[j] may be zero or negative: then you can't have 2 * min > max, since min >= 2 * min > max is impossible.)
The algorithm is: for each j, we set SA(j) = SA(j-1) + A[j]. But if A[j] >= 2 * min(SA(j-1)), then the invariant is broken. To restore the invariant, we have to remove all the elements e from SA(j) that meet A[j] >= 2 * e. In the same way, the invariant is broken if 2 * A[j] <= max(SA(j-1)). To restore the invariant, we have to remove all the elements e from SA(j) that meet 2 * A[j] <= e.
On the fly, we keep track of the longest SA(j) found and return it.
Hence the algorithm:
SA(0) <- A[0..1]  # 1 excluded -> empty subarray
ret <- SA(0)
for j in 1..length(A):
    if A[j] >= 2 * min(SA(j-1)):
        i <- the last index having A[j] >= 2 * A[i]
        SA(j) <- A[i+1..j+1]
    else if 2 * A[j] <= max(SA(j-1)):
        i <- the last index having 2 * A[j] <= A[i]
        SA(j) <- A[i+1..j+1]
    else:
        SA(j) <- SA(j-1) + A[j]  # invariant already holds
    if length(SA(j)) > length(ret):
        ret <- SA(j)
return ret
The question is: how do we find the last index i having A[j] >= 2 * A[i]? If we iterate over SA(j-1), we need at most k steps, and then the time complexity becomes O(n k) (we start from j-1 and look for the last value that keeps the invariant).
But there is a better solution. Imagine we have a min-heap that stores the elements of SA(j-1) along with their positions. The first element is the minimum of SA(j-1); let i0 be its index. We can remove all elements from the start of SA(j-1) up to and including i0. Now, is the invariant restored for all remaining indices i, that is, A[j] < 2 * A[i]? Not necessarily: there may be more elements that are too small. Hence we remove the elements one after the other until the invariant is restored.
We'll need a max-heap too, to handle the other situation, 2 * A[j] <= max(SA(j-1)).
The easiest approach is to create an ad hoc queue that has the following operations:
add(v): add an element v to the queue
remove_until_min_gt(v): remove elements from start of the queue until the minimum is greater than v
remove_until_max_lt(v): remove elements from start of the queue until the maximum is less than v
maximum: get the maximum of the queue
minimum: get the minimum of the queue
With two heaps, maximum and minimum are O(1), but the other operations are O(lg k).
Here is a Python implementation that keeps track of the start and end indices of the queue:
import heapq

class Queue:
    def __init__(self):
        self._i = 0  # start in A
        self._j = 0  # end in A
        self._minheap = []
        self._maxheap = []

    def add(self, value):
        # store the value and the index in both heaps
        heapq.heappush(self._minheap, (value, self._j))
        heapq.heappush(self._maxheap, (-value, self._j))
        # update the index in A
        self._j += 1

    def remove_until_min_gt(self, v):
        return self._remove_until(self._minheap, lambda x: x > v)

    def remove_until_max_lt(self, v):
        return self._remove_until(self._maxheap, lambda x: -x < v)

    def _remove_until(self, heap, check):
        while heap and not check(heap[0][0]):
            j = heapq.heappop(heap)[1]
            if self._i < j + 1:
                self._i = j + 1  # update the start index
        # remove front elements before the start index:
        # there may remain elements before the start index in the heaps,
        # but the first element must be after the start index.
        while self._minheap and self._minheap[0][1] < self._i:
            heapq.heappop(self._minheap)
        while self._maxheap and self._maxheap[0][1] < self._i:
            heapq.heappop(self._maxheap)

    def minimum(self):
        return self._minheap[0][0]

    def maximum(self):
        return -self._maxheap[0][0]

    def __repr__(self):
        ns = [v for v, _ in self._minheap]
        return f"Queue({ns})"

    def __len__(self):
        return self._j - self._i

    def from_to(self):
        return self._i, self._j
def find_min_twice_max_subarray(A):
    queue = Queue()
    best_len = 0
    best = (0, 0)
    for v in A:
        queue.add(v)
        if 2 * v <= queue.maximum():
            # restore the invariant: every element e with e >= 2*v must go
            queue.remove_until_max_lt(2 * v)
        elif v >= 2 * queue.minimum():
            # restore the invariant: every element e with e <= v/2 must go
            queue.remove_until_min_gt(v / 2)
        if len(queue) > best_len:
            best_len = len(queue)
            best = queue.from_to()
    return best
Every element of A is pushed onto this queue once and popped at most once, and each heap operation costs O(lg k); hence the O(n lg k) time complexity.
Here's a test.
import random
A = [random.randint(-10, 20) for _ in range(25)]
print(A)
# [18, -4, 14, -9, 8, -6, 12, 13, -7, 7, -2, 14, 7, 9, -9, 9, 20, 19, 14, 13, 14, 14, 2, -8, -2]
print(A[slice(*find_min_twice_max_subarray(A))])
# [20, 19, 14, 13, 14, 14]
Obviously, if there were a way to find the start index that restores the invariant in O(1), we would have an overall time complexity of O(n). (This reminds me of how the KMP algorithm finds the best new start in a string-matching problem, but I don't know whether it is possible to create something similar here.)

Why is the last element of the list not shuffled in the Fisher-Yates algorithm?

Wikipedia gives the amended Fisher-Yates shuffle algorithm as:
-- To shuffle an array a of n elements (indices 0..n-1):
for i from 0 to n−2 do
    j ← random integer such that i ≤ j < n
    exchange a[i] and a[j]
Why is the last element of the array not shuffled?
Putting @CollinD's comments as an answer:
If i == n-1, then it can only possibly be swapped with j = n-1 so why bother running an extra iteration?
At a given i, there is a 1/(n-i) chance to swap with any element in [i, n) (including i itself)
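For illustration, here is a minimal Python version of the pseudocode above (the function name is mine):
import random

def fisher_yates_shuffle(a):
    # In-place Fisher-Yates shuffle. The loop stops at n-2 because when
    # i == n-1 the only possible partner is j == n-1, a no-op swap.
    n = len(a)
    for i in range(n - 1):          # i from 0 to n-2
        j = random.randrange(i, n)  # random j with i <= j < n
        a[i], a[j] = a[j], a[i]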

Given a number X, find two elements in two sorted arrays such that A[i]+B[j] = X in O(n+m)

Given the following problem, I'd appreciate any corrections, since I have no solution
to this question (taken from one of my professor's exams!!!):
Remark: this is not homework!
Problem:
Given two sorted arrays A (of length n) and B (of length m), where each
element (in both arrays) is a real number, and a number X (also a real number),
we'd like to know whether there exist a ∈ A and b ∈ B such that
a + b = X, in O(n+m) running time.
Solution:
First, we start checking from the end of both arrays, since we don't need the numbers that are bigger than X:
i = n
k = m
while A[i] > X, i = i - 1
while B[k] > X, k = k - 1
Define j = 0.
Now start running from the current i in A and from j in B:
while i > 0, j < k:
    if A[i] + B[j] == X, then return both cells
    else j = j + 1, i = i - 1
In the end we'd either have the two elements, or we'd have run out of bounds in one
or both of the arrays, which means that no two elements with a + b = X exist.
Any remarks would be much appreciated.
Thanks
You shouldn't adjust i and j at the same time. If the sum is too big, you should decrease i. If it is too small, increase j.
This problem is a special case of the following question:
Find number in sorted matrix (Rows n Columns) in O(log n)
Consider a matrix filled with C[i,j] = A[i] + B[j]; then, starting from one of the corners, you decrease i if the sum is too big and increase j if it is too small.
Of course you don't have to create and/or fill this matrix C in your program; just assume you know any element of it: it's A[i] + B[j], which you can compute immediately at any moment. The resulting algorithm is O(m+n).
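A minimal Python sketch of that two-pointer walk (the function name is mine):
def find_pair_with_sum(A, B, X):
    # Walk A from its largest element and B from its smallest: O(n+m).
    i, j = len(A) - 1, 0
    while i >= 0 and j < len(B):
        s = A[i] + B[j]
        if s == X:
            return i, j  # a = A[i], b = B[j] with a + b = X
        elif s > X:
            i -= 1       # sum too big: move to a smaller element of A
        else:
            j += 1       # sum too small: move to a bigger element of B
    return None          # no such pair exists
For example, find_pair_with_sum([1, 3, 7], [2, 4, 9], 10) returns (0, 2), since 1 + 9 = 10.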
I had the same question for homework, and worked it out before checking the internet.
Here is my solution (Python); I hope someone sees it and helps me improve it:
# 1.3. Given two sorted arrays a and b and number S, find two elements
# such that a[i] + b[j] = S. Time O(n).
def find_elem_sum(alist, blist, S):  # O(n)
    if alist is None or alist == [] or blist is None or blist == []:
        return None
    # find the numbers which are less than S in the lists
    # pretend it is done
    a_length = len(alist)
    b_length = len(blist)
    a_index, b_index = 0, 0
    count = 0
    while alist and a_index < a_length and b_index < b_length:
        count += 1
        if alist[a_index] + blist[b_index] < S:
            if a_index < a_length - 1:
                a_index += 1
            if a_index == a_length - 1:
                b_index += 1
            continue
        elif alist[a_index] + blist[b_index] > S:
            alist = alist[:a_index]
            a_length = len(alist)
            a_index = a_length - 1
        elif alist[a_index] + blist[b_index] == S:
            return [a_index, b_index, count]
    return None

Suggest an Efficient Algorithm

Given an array arr of size 100000, where each element satisfies 0 <= arr[i] < 100 (not sorted, may contain duplicates).
Find out how many triplets (i,j,k) are present such that arr[i] ^ arr[j] ^ arr[k] == 0.
Note: ^ is the XOR operator; also 0 <= i <= j <= k < 100000.
I have a feeling I have to calculate the frequencies and do some calculation using the frequencies, but I just can't seem to get started.
Any algorithm better than the obvious O(n^3) is welcome. :)
It's not homework. :)
I think the key is that you don't need to identify the i,j,k triples, just count how many there are.
Initialise an array of size 100.
Loop through arr, counting how many of each value there are - O(n).
Loop through the non-zero elements of the small array, working out which triples meet the condition - assuming the counts of the three numbers involved are A, B and C, the number of combinations in the original arr is (A+B+C)!/(A!B!C!) - at most 100**3 operations, but that's still O(1), assuming the 100 is a fixed value.
So, O(n).
Possible O(n^2) solution, if it works: maintain a variable count and two arrays, single[100] and pair[128] (128 because the XOR of two values below 100 can reach 127). Iterate over arr, and for each element of value n:
update count: count += pair[n]
update pair: iterate over array single, and for each index x with single[x] != 0, do pair[x^n] += single[x]
update single: single[n]++
In the end count holds the result; a sketch follows below.
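If I read the scheme right, a minimal Python sketch might look like this (it counts strict triples i < j < k; whether equal indices should also count depends on how the problem statement is read):
def count_xor_triples_stream(arr):
    # count/pair/single scheme from above; values assumed in [0, 100)
    count = 0
    single = [0] * 128  # single[v]: occurrences of value v seen so far
    pair = [0] * 128    # pair[t]: pairs (i, j), i < j, arr[i]^arr[j] == t
    for n in arr:
        count += pair[n]  # n closes every earlier pair whose XOR is n
        for x in range(128):
            if single[x]:
                pair[x ^ n] += single[x]  # new pairs ending at n
        single[n] += 1
    return count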
Possible O(100 * n) = O(n) solution. It solves the problem with i <= j <= k.
As you know, A ^ B = 0 <=> A = B, so:
long long calcTripletsCount( const vector<int>& sourceArray )
{
    long long res = 0;
    vector<int> count(128);
    vector<int> countPairs(128);
    for (int i = 0; i < sourceArray.size(); i++)
    {
        // count[t] contains the count of value t in sourceArray[0..i]
        count[sourceArray[i]]++;
        // countPairs[t] contains the count of pairs (p1, p2), p1 <= p2,
        // where t = sourceArray[p1] ^ sourceArray[p2]
        for (int j = 0; j < count.size(); j++)
            countPairs[j ^ sourceArray[i]] += count[j];
        // a ^ b ^ c == 0 iff a ^ b == c: add the count of pairs (p1, p2)
        // where sourceArray[p1] ^ sourceArray[p2] == sourceArray[i];
        // it is easy to see that the order p1 <= p2 <= i is kept
        res += countPairs[sourceArray[i]];
    }
    return res;
}
Sorry for my bad English...
I have a (simple) O(n^2 log n) solution which takes into account the fact that i, j and k refer to indices, not integers.
A simple first pass allows us to build an array A of 100 entries mapping value -> list of indices; we keep each list sorted for later use. O(n log n)
For each pair i,j such that i <= j, we compute X = arr[i]^arr[j]. We then perform a binary search in A[X] to count the indices k such that k >= j. O(n^2 log n)
I could not find any way to leverage sorting / counting algorithms, because they destroy the index information.
Sort the array, keeping a map from new indices to the originals. O(n lg n)
Loop over i,j with i < j. O(n^2)
Calculate x = arr[i] ^ arr[j].
Since x ^ arr[k] == 0 means arr[k] = x, binary search for x among indices k > j. O(lg n)
For all found k, print the mapped i,j,k.
O(n^2 lg n)
Start with a frequency count of the number of occurrences of each number between 1 and 100, as Paul suggests. This produces an array freq[] of length 100.
Next, instead of looping over triples A,B,C from that array and testing the condition A^B^C=0,
loop over pairs A,B with A < B. For each A,B, calculate C=A^B (so that now A^B^C=0), and verify that A < B < C < 100. (Any triple will occur in some order, so this doesn't miss triples; but see below.) The running total will look like:
Sum+=freq[A]*freq[B]*freq[C]
The work is O(n) for the frequency count, plus about 5000 for the loop over A < B.
Since every triple of three different numbers A,B,C must occur in some order, this finds each such triple exactly once. Next you'll have to look for triples in which two numbers are equal. But if two numbers are equal and the xor of three of them is 0, the third number must be zero. So this amounts to a secondary linear search for B over the frequency count array, counting occurrences of (A=0, B=C < 100). (Be very careful with this case, and especially careful with the case B=0. The count is not just freq[B] ** 2 or freq[0] ** 3. There is a little combinatorics problem hiding there.)
Hope this helps!
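To make the combinatorics concrete, here is a minimal Python sketch along these lines (the function name is mine); it counts triples of distinct indices i < j < k and handles the equal-value cases separately, as the answer warns:
from math import comb
from collections import Counter

def count_xor_triples(arr):
    freq = Counter(arr)
    total = 0
    values = sorted(freq)
    # distinct values A < B < C with A ^ B ^ C == 0: each such value
    # triple is found exactly once, via its two smallest members
    for ai, A in enumerate(values):
        for B in values[ai + 1:]:
            C = A ^ B
            if B < C and C in freq:
                total += freq[A] * freq[B] * freq[C]
    # two equal values force the third to be 0: triples (0, B, B), B > 0
    for B in values:
        if B != 0:
            total += freq[0] * comb(freq[B], 2)
    # all three equal: only 0 ^ 0 ^ 0 == 0
    total += comb(freq[0], 3)
    return total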
