Intersection of two unsorted integer arrays - arrays

Given two arrays, A and B, both of size (m). The numbers in the arrays are in range of [-n,n]. I need to find an algorithm that returns the intersection of A and B in O(m).
For example:
Assume that
A={1,2,14,14,5}
and
B={2,2,14,3}
the algorithm needs to return 2 and 14.
I've tried to define two Arrays with size (n), one stands for the positive numbers and the other for the negative numbers, and each index of the arrays represents the number.
I thought I could scan one array of A and B and sign each element with 1 in the arrays, and check directly the elements of the other array.
But it turns out that I only can use the arrays when I initialize them - which takes O(n).
What can I do to improve the algorithm?

This can be done with a set:
A_set = set(A)
print([b for b in B if b in A_set])
The construction of the set happens in O(m), checking each element of B needs O(m) time, so the total runtime complexity is O(m).
You will also need O(m) space to store the set.

If you can create an array or bit vector of 2n+1 Booleans in O(m) time or better, then you don't have to initialize the array before you use it:
Create an array S of 2n+1 elements.
For each element a in A, set S[a+n] = false
For each element b in B, set S[b+n] = true
Return the elements a of A for which S[a+n] == true
If you can't create that array in that time, then maybe you can use #fafl's answer, although that is only expected O(m) time unless the set implementation is very special.

The thing to be noticed is that there are duplicates in both the sets (I know that's not what set is, sets have unique entries, but the stated example in the question prompt to me that, say it's a multiset). Expecting the n <= 10^6. Build an array from -n to n, which can function as a hash map. '0' be the index for '-n', then 'i' be the index for 'i-n'. Iterate over array A, and on encountering a value, say c, hash[c+n]++. Now take an empty set for the result. Iterate over the array B. For each value c in B, if hash[c+n] > 0, then put it in the result set and decrement the count. Like this one can find the intersection in O(m).

It is recommended to manually implement a bit-set to solve the problem.
For instance:
min(Array A, Array B) = 1 AND
max(Array A, Array B) = 14 —
therefore, the bit-set is from min = 1 to max = 14 and Array A = :11001000000001 and Array B = :01100000000001 — *(Array A) &= *(Array B) = :01000000000001 which is 2 and 14.

Related

given 3 arrays check if there is any common number

**I have 3 arrays a[1...n] b[1...n] c[1....n] which contain integers.
It is not mentioned if the arrays are sorted or if each array has or has not duplicates.
The task is to check if there is any common number in the given arrays and return true or false.
For example : these arrays a=[3,1,5,10] b=[4,2,6,1] c=[5,3,1,7] have one common number : 1
I need to write an algorithm with time complexity O(n^2).
I let the current element traversed in a[] be x, in b[] be y and in c[] be z and have following cases inside the loop : If x, y and z are same, I can simply return true and stop the program,something like:
for(x=1;x<=n;x++)
for(y=1;y<=n;y++)
for(z=1;z<=n;z++)
if(a[x]==b[y]==c[z])
return true
But this algorithm has time complexity O(n^3) and I need O(n^2).Any suggestions?
There is a pretty simple and efficient solution for this.
Sort a and b. Complexity = O(NlogN)
For each element in c, use binary search to check if it exists in both a and b. Complexity = O(NlogN).
That'll give you a total complexity of O(NlogN), better than O(N^2).
Create a new array, and save common elements in a and b arrays. Then find common elements in this array with c array.
python solution
def find_d(a, b, c):
for i in a:
for j in b:
if i==j:
d.append(i)
def findAllCommon(c, d):
for i in c:
for j in d:
if i==j:
e.append(i)
break
a = [3,1,5,10]
b = [4,2,6,1]
c = [5,3,1,7]
d = []
e = []
find_d(a, b, c)
findAllCommon(c, d)
if len(e)>0:
print("True")
else:
print("False")
Since I haven't seen a solution based on sets, so I suggest looking for how sets are implemented in your language of choice and do the equivalent of this:
set(a).intersection(b).intersection(c) != set([])
This evaluates to True if there is a common element, False otherwise. It runs in O(n) time.
All solutions so far either require O(n) additional space (creating a new array/set) or change the order of the arrays (sorting).
If you want to solve the problem in O(1) additional space and without changing the original arrays, you indeed can't do better than O(n^2) time:
foreach(var x in a) { // n iterations
if (b.Contains(x) && c.Contains(x)) return true; // max. 2n
} // O(n^2)
return false;
A suggestion:
Combine into a single array(z) where z = sum of the entries in each array. Keep track of how many entries there were in Array 1, Array 2, Array 3.
For each entry Z traverse the array to see how many duplicates there are within the combined array and where they are. For those which have 2 or more (ie there are 3 or more of the same number), check that the location of those duplicates correspond to having been in different arrays to begin with (ruling our duplicates within the original arrays). If your number Z has 2 or more duplicates and they are all in different arrays (checked through their position in the array) then store that number Z in result array.
Report result array.
You will traverse the entire combined array once and then almost (no need to check if Z is a duplicate of itself) traverse it again for each Z, so n^2 complexity.
Also worth noting that the time complexity will now be a function of total number of entries and not of number of arrays (your nested loops would become n^4 with 4 arrays - this will stay as n^2)
You could make it more efficient by always checking the already found duplicates before checking for a new Z - if the new Z is already found as a duplicate to an earlier Z you need not traverse to check for that number again. This will make it more efficient the more duplicates there are - with few duplicates the reduction in number of traverses is probably not worth the extra complexity.
Of course you could also do this without actually combining the values into a single array - you would just need to make sure that your traversing routine looks through the arrays and keeps track of what it finds the in the right order.
Edit
Thinking about it, the above is doing way more than you want. It would allow you to report on doubles, quads etc. as well.
If you just want triples, then it is much easier/quicker. Since a triple needs to be in all 3 arrays, you can start by finding those numbers which are in any of the 2 arrays (if they are different lengths, compare the 2 shortest arrays first) and then to check any doublets found against the third array. Not sure what that brings the complexity down to but it will be less than n^2...
there are many ways to solve this here few selected ones sorted by complexity (descending) assuming n is average size of your individual arrays:
Brute force O(n^3)
its basicaly the same as you do so test any triplet combination by 3 nested for loops
for(x=1;x<=n;x++)
for(y=1;y<=n;y++)
for(z=1;z<=n;z++)
if(a[x]==b[y]==c[z])
return true;
return false;
slightly optimized brute force O(n^2)
simply check if each element from a is in b and if yes then check if it is also in c which is O(n*(n+n)) = O(n^2) as the b and c loops are not nested anymore:
for(x=1;x<=n;x++)
{
for(ret=false,y=1;y<=n;y++)
if(a[x]==b[y])
{ ret=true; break; }
if (!ret) continue;
for(ret=false,z=1;z<=n;z++)
if(a[x]==c[z])
{ ret=true; break; }
if (ret) return true;
}
return false;
exploit sorting O(n.log(n))
simply sort all arrays O(n.log(n)) and then just traverse all 3 arrays together to test if each element is present in all arrays (single for loop, incrementing the smallest element array). This can be done also with binary search like one of the other answers suggest but that is slower still not exceeding n.log(n). Here the O(n) traversal:
for(x=1,y=1,z=1;(x<=n)&&(y<=n)&&(z<=n);)
{
if(a[x]==b[y]==c[z]) return true;
if ((a[x]<b[y])&&(a[x]<c[z])) x++;
else if ((b[y]<a[x])&&(b[y]<c[z])) y++;
else z++;
}
return false;
however this needs to change the contents of arrays or need additional arrays for index sorting instead (so O(n) space).
histogram based O(n+m)
this can be used only if the range of elements in your array is not too big. Let say the arrays can hold numbers 1 .. m then you add (modified) histogram holding set bit for each array where value is presen and simply check if value is present in all 3:
int h[m]; // histogram
for(i=1;i<=m;i++) h[i]=0; // clear histogram
for(x=1;x<=n;x++) h[a[x]]|=1;
for(y=1;y<=n;y++) h[b[y]]|=2;
for(z=1;z<=n;z++) h[c[z]]|=4;
for(i=1;i<=m;i++) if (h[i]==7) return true;
return false;
This needs O(m) space ...
So you clearly want option #2
Beware all the code is just copy pasted yours and modified directly in answer editor so there might be typos or syntax error I do not see right now...

Displacement of unsorted shifted array

We have one unsorted array with distinct entries a_1, a_2, ... a_n, and we also know a shifted array a_(n-k), ...a_n, a_1, a_2, ... The goal is to find the displacement k given these two arrays. Of course there is a worst case linear algorithm O(n). But can we do better than this?
There is a hint that the answer has something to do with the k distribution. If k is distributed uniformly between 0 and n, then we have to do it within O(n). If k is distributed in otherway there might be some better way.
If there are no duplicates in the array (distinct entries) I would do this with a while loop and incrementing an index value k starting from 0 and comparing two items at once one from the beginning and one from the end. Such as array1[k] === array2[0] or array1[n-k] === array[0] and the index value k should be the displacement once the above comparison returns true.
There is an O(sqrt(n)) solution, as the op figured out based on #greybeard's hint.
From the first list, hash the first sqrt(n) elements. For the second list, look at the elements advancing by sqrt(n) elements at each time.
However, we might ask if there is a solution that might be close to O(k) (or less!) if k is small and n is large. In fact, I claim there is an O(sqrt(k)) solution.
For that, I propose an incremental process of increasing the step size. So the algorithm looks like this:
First, grab 2 elements from the first list - hash those values (and keep position of values as lookup value, so this should be thought of as a HashMap with key being elements of the list and values being positions).
Compare those elements with the first and third element from the second list.
Hash the values from the second list as well.
Next, look at the third element from the first list - hashing the value. In the process, see if it matches either of the elements found in the second list. Next, advance 3 elements in the second list, and compare its value - remember that values as well.
Continue like this:
increase the prefix length from the first list, and at each point, increasing the step size of the second list. Whenever you grab a new element for the first list, you have to compare it with values in the second list, but that's fine because it does not significantly affect performance.
Notice that when your prefix length is p, you have already checked the first p*(p+1)/2 elements in the second list. So for a given value of k, this process will require that prefix length p is approximately sqrt(2k), which is O(sqrt(k)) as required.
Basically, if we know that a[0] does not equal b[0], we do not need to check if a[1] equals b[1]. Extending this idea and hashing the a's, checks can go as follows:
a[0] == b[0] or b[0] in hash? => known k's: 0
a[1] == b[2] or b[2] in hash? => known k's: 0,1,2
a[2] == b[5] or b[5] in hash? => known k's: 0,1,2,3,4,5
a[3] == b[9] or b[9] in hash? => known k's: 0,1,2,3,4,5,6,7,8,9
a[4] == b[14] or b[14] in hash? => known k's: 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14
...
(I think that's O(sqrt n) time and space worst case complexity.)
maybe if you incorporate them into a hashtable. then the access and compare time for a(n-k) in the original array will be O(1).

Does the array “sum and/or sub” to `x`?

Goal
I would like to write an algorithm (in C) which returns TRUE or FALSE (1 or 0) depending whether the array A given in input can “sum and/or sub” to x (see below for clarification). Note that all values of A are integers bounded between [1,x-1] that were randomly (uniformly) sampled.
Clarification and examples
By “sum and/or sub”, I mean placing "+" and "-" in front of each element of array and summing over. Let's call this function SumSub.
int SumSub (int* A,int x)
{
...
}
SumSub({2,7,5},10)
should return TRUE as 7-2+5=10. You will note that the first element of A can also be taken as negative so that the order of elements in A does not matter.
SumSub({2,7,5,2},10)
should return FALSE as there is no way to “sum and/or sub” the elements of array to reach the value of x. Please note, this means that all elements of A must be used.
Complexity
Let n be the length of A. Complexity of the problem is of order O(2^n) if one has to explore all possible combinations of pluses and minus. However, some combinations are more likely than others and therefore are worth being explored first (hoping the output will be TRUE). Typically, the combination which requires substracting all elements from the largest number is impossible (as all elements of A are lower than x). Also, if n>x, it makes no sense to try adding all the elements of A.
Question
How should I go about writing this function?
Unfortunately your problem can be reduced to subset-sum problem which is NP-Complete. Thus the exponential solution can't be avoided.
The original problem's solution is indeed exponential as you said. BUT with the given range[1,x-1] for numbers in A[] you can make the solution polynomial. There is a very simple dynamic programming solution.
With the order:
Time Complexity: O(n^2*x)
Memory Complexity: O(n^2*x)
where, n=num of elements in A[]
You need to use dynamic programming approach for this
You know the min,max range that can be made in in the range [-nx,nx]. Create a 2d array of size (n)X(2*n*x+1). Lets call this dp[][]
dp[i][j] = taking all elements of A[] from [0..i-1] whether its possible to make the value j
so
dp[10][3] = 1 means taking first 10 elements of A[] we CAN create the value 3
dp[10][3] = 0 means taking first 10 elements of A[] we can NOT create the value 3
Here is a kind of pseudo code for this:
int SumSub (int* A,int x)
{
bool dp[][];//set all values of this array 0
dp[0][0] = true;
for(i=1;i<=n;i++) {
int val = A[i-1];
for(j=-n*x;j<=n*x;j++) {
dp[i][j]=dp[ i-1 ][ j + val ] | dp[ i-1 ][ j - val ];
}
}
return dp[n][x];
}
Unfortunately this is NP-complete even when x is restricted to the value 0, so don't expect a polynomial-time algorithm. To show this I'll give a simple reduction from the NP-hard Partition Problem, which asks whether a given multiset of positive integers can be partitioned into two parts having equal sums:
Suppose we have an instance of the Partition Problem consisting of n positive integers B_1, ..., B_n. Create from this an instance of your problem in which A_i = B_i for each 1 <= i <= n, and set x = 0.
Clearly if there is a partition of B into two parts C and D having equal sums, then there is also a solution to the instance of your problem: Put a + in front of every number in C, and a - in front of every number in D (or the other way round). Since C and D have equal sums, this expression must equal 0.
OTOH, if the solution to the instance of your problem that we just created is YES (TRUE), then we can easily create a partition of B into two parts having equal sums: just put all the positive terms in one part (say, C), and all the negative terms (without the preceding - of course) in the other (say, D). Since we know that the total value of the expression is 0, it must be that the sum of the (positive) numbers in C is equal to the (negated) sum of the numbers in D.
Thus a YES to either problem instance implies a YES to the other problem instance, which in turn implies that a NO to either problem instance implies a NO to the other problem instance -- that is, the two problem instances have equal solutions. Thus if it were possible to solve your problem in polynomial time, it would be possible to solve the NP-hard Partition Problem in polynomial time too, by constructing the above instance of your problem, solving it with your poly-time algorithm, and reporting the result it gives.

Largest 3 numbers c language [duplicate]

I have an array
A[4]={4,5,9,1}
I need it would give the first 3 top elements like 9,5,4
I know how to find the max element but how to find the 2nd and 3rd max?
i.e if
max=A[0]
for(i=1;i<4;i++)
{
if (A[i]>max)
{
max=A[i];
location=i+1;
}
}
actually sorting will not be suitable for my application because,
the position number is also important for me i.e. I have to know in which positions the first 3 maximum is occurring, here it is in 0th,1th and 2nd position...so I am thinking of a logic
that after getting the max value if I could put 0 at that location and could apply the same steps for that new array i.e.{4,5,0,1}
But I am bit confused how to put my logic in code
Consider using the technique employed in the Python standard library. It uses an underlying heap data structure:
def nlargest(n, iterable):
"""Find the n largest elements in a dataset.
Equivalent to: sorted(iterable, reverse=True)[:n]
"""
if n < 0:
return []
it = iter(iterable)
result = list(islice(it, n))
if not result:
return result
heapify(result)
for elem in it:
heappushpop(result, elem)
result.sort(reverse=True)
return result
The steps are:
Make an n length fixed array to hold the results.
Populate the array with the first n elements of the input.
Transform the array into a minheap.
Loop over remaining inputs, replacing the top element of the heap if new data element is larger.
If needed, sort the final n elements.
The heap approach is memory efficient (not requiring more memory than the target output) and typically has a very low number of comparisons (see this comparative analysis).
You can use the selection algorithm
Also to mention that the complexity will be O(n) ie, O(n) for selection and O(n) for iterating, so the total is also O(n)
What your essentially asking is equivalent to sorting your array in descending order. The fastest way to do this is using heapsort or quicksort depending on the size of your array.
Once your array is sorted your largest number will be at index 0, your second largest will be at index 1, ...., in general your nth largest will be at index n-1
you can follw this procedure,
1. Add the n elements to another array B[n];
2. Sort the array B[n]
3. Then for each element in A[n...m] check,
A[k]>B[0]
if so then number A[k] is among n large elements so,
search for proper position for A[k] in B[n] and replace and move the numbers on left in B[n] so that B[n] contains n large elements.
4. Repeat this for all elements in A[m].
At the end B[n] will have the n largest elements.

Compare two integer arrays with same length

[Description] Given two integer arrays with the same length. Design an algorithm which can judge whether they're the same. The definition of "same" is that, if these two arrays were in sorted order, the elements in corresponding position should be the same.
[Example]
<1 2 3 4> = <3 1 2 4>
<1 2 3 4> != <3 4 1 1>
[Limitation] The algorithm should require constant extra space, and O(n) running time.
(Probably too complex for an interview question.)
(You can use O(N) time to check the min, max, sum, sumsq, etc. are equal first.)
Use no-extra-space radix sort to sort the two arrays in-place. O(N) time complexity, O(1) space.
Then compare them using the usual algorithm. O(N) time complexity, O(1) space.
(Provided (max − min) of the arrays is of O(Nk) with a finite k.)
You can try a probabilistic approach - convert the arrays into a number in some huge base B and mod by some prime P, for example sum B^a_i for all i mod some big-ish P. If they both come out to the same number, try again for as many primes as you want. If it's false at any attempts, then they are not correct. If they pass enough challenges, then they are equal, with high probability.
There's a trivial proof for B > N, P > biggest number. So there must be a challenge that cannot be met. This is actually the deterministic approach, though the complexity analysis might be more difficult, depending on how people view the complexity in terms of the size of the input (as opposed to just the number of elements).
I claim that: Unless the range of input is specified, then it is IMPOSSIBLE to solve in onstant extra space, and O(n) running time.
I will be happy to be proven wrong, so that I can learn something new.
Insert all elements from the first array into a hashtable
Try to insert all elements from the second array into the same hashtable - for each insert to element should already be there
Ok, this is not with constant extra space, but the best I could come up at the moment:-). Are there any other constraints imposed on the question, like for example to biggest integer that may be included in the array?
A few answers are basically correct, even though they don't look like it. The hash table approach (for one example) has an upper limit based on the range of the type involved rather than the number of elements in the arrays. At least by by most definitions, that makes the (upper limit on) the space a constant, although the constant may be quite large.
In theory, you could change that from an upper limit to a true constant amount of space. Just for example, if you were working in C or C++, and it was an array of char, you could use something like:
size_t counts[UCHAR_MAX];
Since UCHAR_MAX is a constant, the amount of space used by the array is also a constant.
Edit: I'd note for the record that a bound on the ranges/sizes of items involved is implicit in nearly all descriptions of algorithmic complexity. Just for example, we all "know" that Quicksort is an O(N log N) algorithm. That's only true, however, if we assume that comparing and swapping the items being sorted takes constant time, which can only be true if we bound the range. If the range of items involved is large enough that we can no longer treat a comparison or a swap as taking constant time, then its complexity would become something like O(N log N log R), were R is the range, so log R approximates the number of bits necessary to represent an item.
Is this a trick question? If the authors assumed integers to be within a given range (2^32 etc.) then "extra constant space" might simply be an array of size 2^32 in which you count the occurrences in both lists.
If the integers are unranged, it cannot be done.
You could add each element into a hashmap<Integer, Integer>, with the following rules: Array A is the adder, array B is the remover. When inserting from Array A, if the key does not exist, insert it with a value of 1. If the key exists, increment the value (keep a count). When removing, if the key exists and is greater than 1, reduce it by 1. If the key exists and is 1, remove the element.
Run through array A followed by array B using the rules above. If at any time during the removal phase array B does not find an element, you can immediately return false. If after both the adder and remover are finished the hashmap is empty, the arrays are equivalent.
Edit: The size of the hashtable will be equal to the number of distinct values in the array does this fit the definition of constant space?
I imagine the solution will require some sort of transformation that is both associative and commutative and guarantees a unique result for a unique set of inputs. However I'm not sure if that even exists.
public static boolean match(int[] array1, int[] array2) {
int x, y = 0;
for(x = 0; x < array1.length; x++) {
y = x;
while(array1[x] != array2[y]) {
if (y + 1 == array1.length)
return false;
y++;
}
int swap = array2[x];
array2[x] = array2[y];
array2[y] = swap;
}
return true;
}
For each array, Use Counting sort technique to build the count of number of elements less than or equal to a particular element . Then compare the two built auxillary arrays at every index, if they r equal arrays r equal else they r not . COunting sort requires O(n) and array comparison at every index is again O(n) so totally its O(n) and the space required is equal to the size of two arrays . Here is a link to counting sort http://en.wikipedia.org/wiki/Counting_sort.
given int are in the range -n..+n a simple way to check for equity may be the following (pseudo code):
// a & b are the array
accumulator = 0
arraysize = size(a)
for(i=0 ; i < arraysize; ++i) {
accumulator = accumulator + a[i] - b[i]
if abs(accumulator) > ((arraysize - i) * n) { return FALSE }
}
return (accumulator == 0)
accumulator must be able to store integer with range = +- arraysize * n
How 'bout this - XOR all the numbers in both the arrays. If the result is 0, you got a match.

Resources