Given 3 arrays, check if there is any common number

I have 3 arrays a[1...n], b[1...n], c[1...n] which contain integers.
It is not stated whether the arrays are sorted or whether each array contains duplicates.
The task is to check whether there is any number common to all three arrays and return true or false.
For example: the arrays a=[3,1,5,10], b=[4,2,6,1], c=[5,3,1,7] have one common number: 1.
I need to write an algorithm with time complexity O(n^2).
Let the current element traversed in a[] be x, in b[] be y, and in c[] be z. Inside the loops, if x, y and z are the same, I can simply return true and stop the program. Something like:
for (x=1; x<=n; x++)
 for (y=1; y<=n; y++)
  for (z=1; z<=n; z++)
   if (a[x]==b[y]==c[z])
    return true
But this algorithm has time complexity O(n^3) and I need O(n^2). Any suggestions?

There is a pretty simple and efficient solution for this.
Sort a and b. Complexity = O(NlogN)
For each element in c, use binary search to check if it exists in both a and b. Complexity = O(NlogN).
That'll give you a total complexity of O(NlogN), better than O(N^2).
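A minimal Python sketch of this approach (the names are mine; bisect provides the binary search):

from bisect import bisect_left

def contains(sorted_arr, x):
    # binary search: True if x occurs in sorted_arr
    i = bisect_left(sorted_arr, x)
    return i < len(sorted_arr) and sorted_arr[i] == x

def has_common(a, b, c):
    a, b = sorted(a), sorted(b)                                # O(N log N)
    return any(contains(a, x) and contains(b, x) for x in c)  # N lookups of O(log N) each

print(has_common([3, 1, 5, 10], [4, 2, 6, 1], [5, 3, 1, 7]))  # True, since 1 is common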

Create a new array and store in it the elements common to a and b. Then find the elements this new array has in common with c.
Python solution:
def find_d(a, b):
    # collect elements common to a and b into the global list d
    for i in a:
        for j in b:
            if i == j:
                d.append(i)
def findAllCommon(c, d):
    # collect elements common to c and d into the global list e
    for i in c:
        for j in d:
            if i == j:
                e.append(i)
                break
a = [3, 1, 5, 10]
b = [4, 2, 6, 1]
c = [5, 3, 1, 7]
d = []
e = []
find_d(a, b)
findAllCommon(c, d)
if len(e) > 0:
    print("True")
else:
    print("False")

Since I haven't seen a solution based on sets, I suggest looking up how sets are implemented in your language of choice and doing the equivalent of this:
set(a).intersection(b).intersection(c) != set([])
This evaluates to True if there is a common element, False otherwise. It runs in O(n) time on average, since set operations are hash-based.
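For example, in Python with the arrays from the question (just a usage sketch of the one-liner above):

a, b, c = [3, 1, 5, 10], [4, 2, 6, 1], [5, 3, 1, 7]
print(bool(set(a) & set(b) & set(c)))  # True, since 1 appears in all three arrays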

All solutions so far either require O(n) additional space (creating a new array/set) or change the order of the arrays (sorting).
If you want to solve the problem in O(1) additional space and without changing the original arrays, you indeed can't do better than O(n^2) time:
foreach (var x in a) {                                // n iterations
    if (b.Contains(x) && c.Contains(x)) return true;  // at most 2n checks
}                                                     // O(n^2) total
return false;

A suggestion:
Combine the three arrays into a single array z, whose length is the sum of the lengths of the individual arrays. Keep track of how many entries came from array 1, array 2, and array 3.
For each entry Z, traverse the combined array to see how many duplicates of it there are and where they are. For any value with 2 or more duplicates (i.e. there are 3 or more of the same number), check that the locations of those duplicates correspond to their having been in different arrays to begin with (ruling out duplicates within the original arrays). If your number Z has 2 or more duplicates and they are all in different arrays (checked through their positions in the combined array), then store that number Z in a result array.
Report the result array.
You will traverse the entire combined array once, and then almost traverse it again for each Z (no need to check whether Z is a duplicate of itself), so n^2 complexity.
Also worth noting that the time complexity will now be a function of the total number of entries and not of the number of arrays (your nested loops would become n^4 with 4 arrays, while this stays n^2).
You could make it more efficient by always checking the already-found duplicates before checking a new Z: if the new Z has already been found as a duplicate of an earlier Z, you need not traverse the array for that number again. This helps more the more duplicates there are; with few duplicates, the reduction in the number of traversals is probably not worth the extra complexity.
Of course you could also do this without actually combining the values into a single array - you would just need to make sure that your traversal routine looks through the arrays and keeps track of what it finds in the right order.
Edit
Thinking about it, the above is doing way more than you want. It would allow you to report on doubles, quads etc. as well.
If you just want triples, then it is much easier/quicker. Since a triple needs to be in all 3 arrays, you can start by finding those numbers which are in both of 2 of the arrays (if they are different lengths, compare the 2 shortest arrays first) and then check any doublets found against the third array. Not sure what that brings the complexity down to, but it will be less than n^2...
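A rough Python sketch of that last idea, using sets as one way to realize it (the function name is mine):

def has_triple(a, b, c):
    # compare the two shortest arrays first, then check the doublets against the third
    x, y, z = sorted([a, b, c], key=len)
    doublets = set(x) & set(y)             # values present in both of the two shortest arrays
    return any(v in doublets for v in z)   # is any doublet also present in the third array?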

There are many ways to solve this. Here are a few selected ones, sorted by complexity (descending), assuming n is the average size of your individual arrays:
Brute force O(n^3)
It's basically the same as what you do: test every triplet combination with 3 nested for loops:
for (x=1; x<=n; x++)
 for (y=1; y<=n; y++)
  for (z=1; z<=n; z++)
   if ((a[x]==b[y]) && (b[y]==c[z]))
    return true;
return false;
slightly optimized brute force O(n^2)
Simply check whether each element from a is in b, and if yes, check whether it is also in c. This is O(n*(n+n)) = O(n^2), as the b and c loops are not nested anymore:
for (x=1; x<=n; x++)
{
 for (ret=false, y=1; y<=n; y++)
  if (a[x]==b[y]) { ret=true; break; }
 if (!ret) continue;
 for (ret=false, z=1; z<=n; z++)
  if (a[x]==c[z]) { ret=true; break; }
 if (ret) return true;
}
return false;
exploit sorting O(n.log(n))
Simply sort all three arrays, O(n.log(n)), and then just traverse them together, testing whether the current element is present in all arrays (a single for loop that always advances the array whose current element is smallest). This can also be done with binary search, as one of the other answers suggests, but that is slower while still not exceeding n.log(n). Here is the O(n) traversal:
for (x=1, y=1, z=1; (x<=n) && (y<=n) && (z<=n); )
{
 if ((a[x]==b[y]) && (b[y]==c[z])) return true;
 // advance the array whose current element is smallest
 if ((a[x]<=b[y]) && (a[x]<=c[z])) x++;
 else if ((b[y]<=a[x]) && (b[y]<=c[z])) y++;
 else z++;
}
return false;
However, this needs to change the contents of the arrays, or needs additional arrays for index sorting instead (so O(n) space).
histogram based O(n+m)
This can be used only if the range of elements in your arrays is not too big. Let's say the arrays can hold numbers 1..m; then you build a (modified) histogram holding a set bit for each array in which the value is present, and simply check whether a value is present in all 3:
int h[m+1]; // histogram, indexed by value 1..m
for(i=1;i<=m;i++) h[i]=0; // clear histogram
for(x=1;x<=n;x++) h[a[x]]|=1;
for(y=1;y<=n;y++) h[b[y]]|=2;
for(z=1;z<=n;z++) h[c[z]]|=4;
for(i=1;i<=m;i++) if (h[i]==7) return true;
return false;
This needs O(m) space ...
So you clearly want option #2
Beware: all the code is just your code, copy-pasted and modified directly in the answer editor, so there might be typos or syntax errors I do not see right now...

Related

Limit input data to achieve a better Big O complexity

You are given an unsorted array of n integers, and you would like to find if there are any duplicates in the array (i.e. any integer appearing more than once).
Describe an algorithm (implemented with two nested loops) to do this.
The question that I am stuck at is:
How can you limit the input data to achieve a better Big O complexity? Describe an algorithm for handling this limited data to find if there are any duplicates. What is the Big O complexity?
Your help will be greatly appreciated. This is not related to my coursework or assignments; it's from a previous year's exam paper and I am doing some self-study, but I seem to be stuck on this question. The only possible solution that I could come up with is:
If we limit the data and use nested loops to find duplicates, the complexity would be O(n), simply because the time the operations take is proportional to the data size.
If my answer makes no sense, then please ignore it, and if you could, please suggest possible solutions / working-out for this answer.
If someone could help me solve this, I would be grateful, as I have attempted countless possible solutions, none of which seems to be correct.
Edit, again: another possible solution (if effective!):
We could implement a loop to sort the array from lowest to highest integer, so that the duplicates end up right next to each other, making them easier and faster to identify.
The Big O complexity would still be O(n^2).
Since this is linear in structure, the first loop would iterate n-1 times, taking the value at the current index of the array (in the first iteration it could be, for instance, index 1) and storing it in a variable named 'current'.
The outer loop updates 'current' by one position each iteration; within it, another loop compares the current number to the next number, and if they are equal we can report it with a printf statement; otherwise we return to the outer loop, advance 'current' to the next value in the array, and update the 'next' variable to hold the value after 'current'.
You can do it in linear time (O(n)) for any input if you use hash tables (which have constant look-up time).
However, this is not what you are being asked about.
By limiting the possible values in the array, you can achieve linear performance.
E.g., if your integers have range 1..L, you can allocate a bit array of length L, initialize it to 0, and iterate over your input array, checking and flipping the appropriate bit for each input.
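A small sketch of that bit-array idea in Python (a plain boolean list stands in for the bit array):

def has_duplicate(arr, L):
    # assumes every value in arr lies in the range 1..L
    seen = [False] * (L + 1)   # one flag per possible value
    for v in arr:
        if seen[v]:            # flag already set -> duplicate found
            return True
        seen[v] = True
    return False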
A variant of bucket sort will do. This will give you a complexity of O(n), where 'n' is the number of input elements.
But there is one restriction: the max value. You should know the maximum value your integer array can take; let's call it m.
The idea is to create a bool array of size m (all initialized to false). Then iterate over your array. As you visit each element, set bucket[element] to true. If it is already true, then you've encountered a duplicate.
Java code:
// Alternatively, you can iterate over the array to find maxVal, which again is O(n).
public boolean findDup(int[] arr, int maxVal)
{
    // Java by default assigns false to all the values.
    boolean bucket[] = new boolean[maxVal + 1]; // +1 so that the value maxVal itself is a valid index
    for (int elem : arr)
    {
        if (bucket[elem])
        {
            return true; // a duplicate found
        }
        bucket[elem] = true;
    }
    return false;
}
But the constraint here is the space. You need O(maxVal) space.
Nested loops get you O(N*M) or O(N*log(M)); for O(N) you cannot use nested loops!
I would do it with a histogram instead:
DWORD in[N] = { ... };  // input data ... values are from < 0 , M )
DWORD his[M] = { ... }; // histogram of in[]
int i, j;
// compute histogram O(N)
for (i=0; i<M; i++) his[i]=0;     // this can be done also by memset ...
for (i=0; i<N; i++) his[in[i]]++; // if the range of values does not start at 0 then shift it ...
// remove duplicates O(N)
for (i=0, j=0; i<N; i++)
{
    his[in[i]]--;           // count down duplicates
    in[j] = in[i];          // copy item
    if (his[in[i]]<=0) j++; // if not a duplicate then keep it
}
// now j holds the new in[] array size
[Notes]
If the value range is too big, with sparse areas, then you need to convert his[] to a dynamic list with two values per item: one is the value from in[] and the second is its occurrence count. But then you need a nested loop -> O(N*M), or with binary search -> O(N*log(M)).

Convert a given array of integers to a sorted one, by deleting the minimum number of elements

I am working on the following problem:
I have to convert a given array of integers to a sorted one, by deleting the minimum number of elements.
For example: [3,5,2,10,11] will be sorted by deleting '2': [3,5,10,11].
Or [3,6,2,4,5,7,7] will be sorted by deleting '3','6': [2,4,5,7,7], OR by deleting '6','2': [3,4,5,7,7] (both ways I remove 2 elements, that's why they are both correct).
My thought was to keep a counter for each element to see how many conflicts it has with other elements.
What I mean by conflicts: in the first example, numbers '3' and '5' have 1 conflict each (with the number '2'), and number '2' has 2 conflicts (with numbers '3' and '5').
So, after calculating the array of conflicts, I remove from the original array the element that has the max number of conflicts, and I repeat for the remaining array until all elements have 0 conflicts.
This is not an efficient way though (and in some cases I haven't thought of it might also produce wrong results), so I was wondering if anyone could think of a better solution.
I believe this is just a cleverly disguised version of the longest increasing subsequence problem. If you delete the minimum number of elements to have a sorted sequence, what you're left with is the longest increasing subsequence of the original array. Accordingly, you can do the following:
Find the longest increasing subsequence (O(n log n) algorithms exist for this), then
Delete everything not in that subsequence.
Hope this helps!
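A short sketch of the LIS computation in Python, using patience sorting with bisect; since equal neighbours such as 7,7 are allowed in the example, bisect_right is used so the subsequence is non-decreasing. The minimum number of deletions is then len(arr) - lis_length(arr).

from bisect import bisect_right

def lis_length(arr):
    # length of the longest non-decreasing subsequence, O(n log n)
    tails = []  # tails[k] = smallest possible tail of such a subsequence of length k+1
    for x in arr:
        i = bisect_right(tails, x)
        if i == len(tails):
            tails.append(x)
        else:
            tails[i] = x
    return len(tails)

print(len([3, 5, 2, 10, 11]) - lis_length([3, 5, 2, 10, 11]))          # 1 deletion ('2')
print(len([3, 6, 2, 4, 5, 7, 7]) - lis_length([3, 6, 2, 4, 5, 7, 7]))  # 2 deletions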
You can build a DAG based on the elements in the array:
Each element a[n] is a vertex.
For any pair of elements (m,n) where (m < n) and (a[m] <= a[n]) add a directed edge.
Optimization: you can build simple chains for sorted subarrays. For example, if a[m]<=a[m+1]<=a[m+2]<=a[m+3]>a[m+4], you can skip adding the edges (m,m+2) and (m,m+3) for vertex m.
The goal now is to find the longest path in the graph, which has a linear time solution for directed acyclic graphs.
An algorithm is described in the aforementioned Wikipedia page and also here.
I would do it with recursive programming. Here is my pseudocode:
/**
 * sortedArray : an array that is already sorted.
 * leftToSort  : an unsorted array that needs to be sorted/merged with sortedArray.
 * The first call to this function must be sortArrayByRemoval([], arrayToSort);
 **/
public Integer[] sortArrayByRemoval(Integer[] sortedArray, Integer[] leftToSort) {
    if (leftToSort.size == 0) {
        return sortedArray; // end of recursion
    }
    Integer candidate = leftToSort[0];
    if (candidate >= sortedArray[last]) { // append candidate to the sorted array
        return sortArrayByRemoval(sortedArray.append(candidate), leftToSort.removeFirst());
    } else {
        // either we skip it
        Integer[] result1 = sortArrayByRemoval(sortedArray, leftToSort.removeFirst());
        // or we do backtracking
        Integer[] result2 = sortArrayByRemoval(sortedArray.removeLast(), leftToSort);
        // and finally we return the best choice (i.e. the longest array)
        return biggestArray(result1, result2);
    }
}
Maybe not the most efficient, but I think it gives you the correct answer.
If you are not stuck on the idea of literally deleting things from the original array, then what you want is the longest increasing subsequence of your original sequence of numbers. This is a well-known, classic problem and you can find many examples in the literature or in textbooks.
(If you are stuck on deleting, well... find the LIS and delete everything not in it.)

(Algorithm) Find if two unsorted arrays have any common elements in O(n) time without sorting?

We have two unsorted arrays and each array has a length of n. These arrays contain random integers in the range of 0-n100. How to find if these two arrays have any common elements in O(n)/linear time? Sorting is not allowed.
A hashtable will save you. Really, it's like a Swiss Army knife for algorithms.
Just put all the values from the first array into it, and then check whether any value from the second array is present.
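A minimal Python sketch of this (a set plays the role of the hash table; lookups are O(1) on average):

def have_common_element(a, b):
    seen = set(a)                      # O(n) expected to build
    return any(x in seen for x in b)   # O(n) expected to scan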
You have not defined the model of computation. Assuming you can only read O(1) bits in O(1) time (anything else would be a rather exotic model of computation), there can be no algorithm solving the problem in O(n) worst case time complexity.
Proof Sketch:
Each number in the input takes O(log(n^100)) = O(100 log n) = O(log n) bits. The entire input is therefore O(n log n) bits, which cannot be read in O(n) time. Any O(n) algorithm therefore cannot read the entire input, and hence cannot react if the unread bits matter.
Answering Neil:
Since you know at the start what your N is (two arrays of size N), you can create a hash with an array size of 2*N*some_ratio (for example: some_ratio = 1.5). With this size, almost all simple hash functions will give you a good spread of the entries.
You can also implement find_or_insert to search for an existing entry or insert a new one in the same action; this reduces the number of hash-function and comparison calls. (C++ STL's find_or_insert is not good enough since it doesn't tell you whether the item was there before or not.)
Linearity Test
Using Mathematica's hash function and arbitrary-length integers.
Tested up to n=2^20, generating random numbers up to (2^20)^100 (approx. 10^602).
Just in case ... the program is:
k = {};
For[t = 1, t < 21, t++,
  i = 2^t;
  Clear[a, b];
  Table[a[RandomInteger[i^100]] = 1, {i}];
  b = Table[RandomInteger[i^100], {i}];
  Contains = False;
  AppendTo[k,
    {i, First@Timing@For[j = 2, j <= i, j++,
       Contains = Contains || (NumericQ[a[b[[j]]]]);
       ]}]];
ListLinePlot[k, PlotRange -> All, AxesLabel -> {"n", "Time(secs)"}]
Put the elements of the first array in a hash table, and check for existence while scanning the second array. This gives you a solution in O(N) average case.
If you want a truly O(N) worst-case solution, then instead of using a hash table use a linear array over the range 0..n^100. Note that you need just a single bit per entry.
If storage is not important, then scratch the hash table in favor of an array of length n. Flag an entry true when you come across that number in the first array. In the pass through the second array, if you find any of them to be true, you have your answer. O(n).
Define largeArray(n)
// First pass
for (element i in firstArray)
    largeArray[i] = true;
// Second pass
Define hasFound = false;
for (element i in secondArray)
    if (largeArray[i] == true)
        hasFound = true;
        break;
Have you tried a counting sort? It is simple to implement, uses O(n) space and also has a Θ(n) time complexity.
Based on the ideas posted to date: we can store the integer elements of one array in a hash map. The maximum number of distinct integers can be stored in RAM. The hash map will hold only unique integer values; duplicates are ignored.
Here is an implementation in Perl:
#!/usr/bin/perl
use strict;
use warnings;

# Prints the elements common to two unsorted arrays, passed as array references.
sub find_common_elements {
    my ($arr1, $arr2) = @_;
    my %hash;   # hash map storing the values of the first array

    # Preparing the hash map is O(n). Duplicate elements simply overwrite each other,
    # so the hash holds only unique values and fits in RAM.
    foreach my $ele (@$arr1) {
        $hash{$ele} = 1;
    }

    # O(n) to traverse the second array and find the common elements.
    # Each hash lookup is O(1), even if all integers of the array are the same.
    foreach my $ele (@$arr2) {
        if (exists $hash{$ele}) {
            print "$ele is common to both arrays\n";
        }
    }
}
I hope it helps.

Compare two integer arrays with same length

[Description] Given two integer arrays with the same length. Design an algorithm which can judge whether they're the same. The definition of "same" is that, if these two arrays were in sorted order, the elements in corresponding position should be the same.
[Example]
<1 2 3 4> = <3 1 2 4>
<1 2 3 4> != <3 4 1 1>
[Limitation] The algorithm should require constant extra space, and O(n) running time.
(Probably too complex for an interview question.)
(You can use O(N) time to check the min, max, sum, sumsq, etc. are equal first.)
Use no-extra-space radix sort to sort the two arrays in-place. O(N) time complexity, O(1) space.
Then compare them using the usual algorithm. O(N) time complexity, O(1) space.
(Provided (max − min) of the arrays is O(N^k) for some finite k.)
You can try a probabilistic approach - convert the arrays into a number in some huge base B and mod by some prime P, for example sum B^a_i for all i mod some big-ish P. If they both come out to the same number, try again for as many primes as you want. If it's false at any attempts, then they are not correct. If they pass enough challenges, then they are equal, with high probability.
There's a trivial proof for B > N, P > biggest number. So there must be a challenge that cannot be met. This is actually the deterministic approach, though the complexity analysis might be more difficult, depending on how people view the complexity in terms of the size of the input (as opposed to just the number of elements).
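A rough Python sketch of that probabilistic check; the base B and the primes here are arbitrary choices of mine, and nonnegative values are assumed:

def probably_equal(a, b):
    # probabilistic multiset-equality check: compare sum of B^x mod P over a few large primes P
    B = 1_000_003
    for P in (2_147_483_647, 1_000_000_007, 998_244_353):
        if sum(pow(B, x, P) for x in a) % P != sum(pow(B, x, P) for x in b) % P:
            return False   # a mismatch on any prime means the arrays definitely differ
    return True            # same on every prime: equal with high probability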
I claim that: unless the range of the input is specified, it is IMPOSSIBLE to solve in constant extra space and O(n) running time.
I will be happy to be proven wrong, so that I can learn something new.
Insert all elements from the first array into a hashtable
Try to insert all elements from the second array into the same hashtable - for each insert, the element should already be there
Ok, this is not with constant extra space, but it is the best I could come up with at the moment :-). Are there any other constraints imposed on the question, like, for example, the biggest integer that may be included in the array?
A few answers are basically correct, even though they don't look like it. The hash table approach (for one example) has an upper limit based on the range of the type involved rather than the number of elements in the arrays. At least by most definitions, that makes the upper limit on the space a constant, although the constant may be quite large.
In theory, you could change that from an upper limit to a true constant amount of space. Just for example, if you were working in C or C++, and it was an array of char, you could use something like:
size_t counts[UCHAR_MAX + 1]; // one counter per possible char value
Since UCHAR_MAX is a constant, the amount of space used by the array is also a constant.
Edit: I'd note for the record that a bound on the ranges/sizes of items involved is implicit in nearly all descriptions of algorithmic complexity. Just for example, we all "know" that Quicksort is an O(N log N) algorithm. That's only true, however, if we assume that comparing and swapping the items being sorted takes constant time, which can only be true if we bound the range. If the range of items involved is large enough that we can no longer treat a comparison or a swap as taking constant time, then its complexity would become something like O(N log N log R), where R is the range, so log R approximates the number of bits necessary to represent an item.
Is this a trick question? If the authors assumed integers to be within a given range (2^32 etc.) then "extra constant space" might simply be an array of size 2^32 in which you count the occurrences in both lists.
If the integers are unranged, it cannot be done.
You could add each element into a hashmap<Integer, Integer>, with the following rules: Array A is the adder, array B is the remover. When inserting from Array A, if the key does not exist, insert it with a value of 1. If the key exists, increment the value (keep a count). When removing, if the key exists and is greater than 1, reduce it by 1. If the key exists and is 1, remove the element.
Run through array A followed by array B using the rules above. If at any time during the removal phase array B does not find an element, you can immediately return false. If after both the adder and remover are finished the hashmap is empty, the arrays are equivalent.
Edit: The size of the hashtable will be equal to the number of distinct values in the array; does this fit the definition of constant space?
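A small Python sketch of these add/remove rules, with a dict standing in for the hashmap<Integer, Integer>:

def same_multiset(a, b):
    counts = {}
    for x in a:                            # array A is the adder
        counts[x] = counts.get(x, 0) + 1
    for x in b:                            # array B is the remover
        if x not in counts:
            return False                   # removal phase could not find the element
        counts[x] -= 1
        if counts[x] == 0:
            del counts[x]
    return not counts                      # empty map means the arrays are equivalent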
I imagine the solution will require some sort of transformation that is both associative and commutative and guarantees a unique result for a unique set of inputs. However I'm not sure if that even exists.
public static boolean match(int[] array1, int[] array2) {
    int x, y = 0;
    for (x = 0; x < array1.length; x++) {
        y = x;
        while (array1[x] != array2[y]) {
            if (y + 1 == array1.length)
                return false;
            y++;
        }
        int swap = array2[x];
        array2[x] = array2[y];
        array2[y] = swap;
    }
    return true;
}
For each array, use the counting sort technique to build the count of elements less than or equal to a particular element. Then compare the two auxiliary arrays at every index: if they are equal, the arrays are equal, otherwise they are not. Counting sort requires O(n), and the array comparison at every index is again O(n), so the total is O(n); the space required is equal to the size of the two arrays. Here is a link to counting sort: http://en.wikipedia.org/wiki/Counting_sort
Given that the ints are in the range -n..+n, a simple way to check for equality may be the following (pseudocode):
// a & b are the arrays
accumulator = 0
arraysize = size(a)
for (i = 0; i < arraysize; ++i) {
    accumulator = accumulator + a[i] - b[i]
    if abs(accumulator) > ((arraysize - i) * n) { return FALSE }
}
return (accumulator == 0)
The accumulator must be able to store integers in the range ±(arraysize * n).
How 'bout this - XOR all the numbers in both the arrays. If the result is 0, you got a match.

Find duplicate entry in array of integers

As a homework question, the following task had been given:
You are given an array with integers between 1 and 1,000,000. One integer is in the array twice. How can you determine which one? Can you think of a way to do it using little extra memory?
My solutions so far:
Solution 1
Have a hash table
Iterate through the array and store its elements in the hash table
As soon as you find an element which is already in the hash table, it is the dup element
Pros
It runs in O(n) time and with only 1 pass
Cons
It uses O(n) extra memory
Solution 2
Sort the array using merge sort (O(n log n) time)
Scan again, and if you see an element twice you've got the dup.
Pros
It doesn't use extra memory
Cons
Running time is greater than O(n)
Can you guys think of any better solution?
The question is a little ambiguous; when the request is "which one," does it mean return the value that is duplicated, or the position in the sequence of the duplicated one? If the former, any of the following three solutions will work; if the latter, the first is the only one that will help.
Solution #1: assumes array is immutable
Build a bitmap; set the nth bit as you iterate through the array. If the bit is already set, you've found a duplicate. It runs in linear time, and will work for any size array.
The bitmap would be created with as many bits as there are possible values in the array. As you iterate through the array, you check the nth bit in the array. If it is set, you've found your duplicate. If it isn't, then set it. (Logic for doing this can be seen in the pseudo-code in this Wikipedia entry on Bit arrays or use the System.Collections.BitArray class.)
Solution #2: assumes array is mutable
Sort the array, and then do a linear search until the current value equals the previous value. Uses the least memory of all. Bonus points for altering the sort algorithm to detect the duplicate during a comparison operation and terminating early.
Solution #3: (assumes array length = 1,000,001)
Sum all of the integers in the array.
From that, subtract the sum of the integers 1 through 1,000,000 inclusive.
What's left will be your duplicated value.
This takes almost no extra memory and can be done in one pass if you calculate the sums at the same time.
The disadvantage is that you need to do the entire loop to find the answer.
The advantages are simplicity, and a high probability it will in fact run faster than the other solutions.
Assuming all the numbers from 1 to 1,000,000 are in the array, the sum of all numbers from 1 to 1,000,000 is (1,000,000)*(1,000,000 + 1)/2 = 500,000 * 1,000,001 = 500,000,500,000.
So just add up all the numbers in the array, subtract 500,000,500,000, and you'll be left with the number that occurred twice.
O(n) time, and O(1) memory.
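As a quick sketch (this assumes the array really does contain every number from 1 to n plus one extra copy, so it has n+1 elements):

def find_duplicate_by_sum(arr):
    n = len(arr) - 1                   # array holds 1..n plus one duplicate
    return sum(arr) - n * (n + 1) // 2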
If the assumption isn't true, you could try using a Bloom Filter - they can be stored much more compactly than a hash table (since they only store fact of presence), but they do run the risk of false positives. This risk can be bounded though, by our choice of how much memory to spend on the bloom filter.
We can then use the bloom filter to detect potential duplicates in O(n) time and check each candidate in O(n) time.
This Python code is a modification of quicksort:
def findDuplicate(arr):
    orig_len = len(arr)
    if orig_len <= 1:
        return None
    pivot = arr.pop(0)
    greater = [i for i in arr if i > pivot]
    lesser = [i for i in arr if i < pivot]
    if len(greater) + len(lesser) != orig_len - 1:
        return pivot
    else:
        return findDuplicate(lesser) or findDuplicate(greater)
It finds a duplicate in O(n log n), I think. It uses extra memory on the stack, but it can be rewritten to use only one copy of the original data, I believe:
def findDuplicate(arr):
    orig_len = len(arr)
    if orig_len <= 1:
        return None
    pivot = arr.pop(0)
    greater = [arr.pop(i) for i in reversed(range(len(arr))) if arr[i] > pivot]
    lesser = [arr.pop(i) for i in reversed(range(len(arr))) if arr[i] < pivot]
    if len(arr):
        return pivot
    else:
        return findDuplicate(lesser) or findDuplicate(greater)
The list comprehensions that produce greater and lesser destroy the original with calls to pop(). If arr is not empty after removing greater and lesser from it, then there must be a duplicate and it must be pivot.
The code suffers from the usual stack overflow problems on sorted data, so either a random pivot or an iterative solution which queues the data is necessary:
def findDuplicate(full):
    import copy
    q = [full]
    while len(q):
        arr = copy.copy(q.pop(0))
        orig_len = len(arr)
        if orig_len > 1:
            pivot = arr.pop(0)
            greater = [arr.pop(i) for i in reversed(range(len(arr))) if arr[i] > pivot]
            lesser = [arr.pop(i) for i in reversed(range(len(arr))) if arr[i] < pivot]
            if len(arr):
                return pivot
            else:
                q.append(greater)
                q.append(lesser)
    return None
However, now the code needs to take a deep copy of the data at the top of the loop, changing the memory requirements.
So much for computer science. The naive algorithm clobbers my code in Python, probably because of Python's sorting algorithm:
def findDuplicate(arr):
    arr = sorted(arr)
    prev = arr.pop(0)
    for element in arr:
        if element == prev:
            return prev
        else:
            prev = element
    return None
Rather than sorting the array and then checking, I would suggest writing an implementation of a comparison sort function that exits as soon as the dup is found, leading to no extra memory requirement (depending on the algorithm you choose, obviously) and a worst case O(n log n) time (again, depending on the algorithm), rather than a best (and average, depending...) case O(n log n) time.
E.g. An implementation of in-place merge sort.
http://en.wikipedia.org/wiki/Merge_sort
Hint: Use the property that A XOR A == 0, and 0 XOR A == A.
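A sketch of what that hint leads to, under the same assumption as the sum trick above (the array holds 1..n with exactly one value repeated): XOR-ing all array elements together with all numbers 1..n cancels everything except the duplicate.

def find_duplicate_by_xor(arr):
    n = len(arr) - 1          # array holds 1..n plus one duplicate
    acc = 0
    for x in arr:
        acc ^= x              # A XOR A == 0 cancels paired values
    for v in range(1, n + 1):
        acc ^= v
    return acc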
As a variant of your solution (2), you can use radix sort. No extra memory, and it will run in linear time. You can argue that the time is also affected by the size of the number representation, but you have already given bounds for that: radix sort runs in time O(k n), where k is the number of digits you can sort at each pass. That makes the whole algorithm O(7n) for sorting plus O(n) for checking the duplicated number -- which is O(8n) = O(n).
Pros:
No extra memory
O(n)
Cons:
Need eight O(n) passes.
And how about the problem of finding ALL duplicates? Can this be done in less than O(n ln n) time? (Sort & scan.) (If you want to restore the original array, carry along the original index and reorder after the end, which can be done in O(n) time.)
from functools import reduce  # needed in Python 3; reduce is a builtin in Python 2

def singleton(array):
    return reduce(lambda x, y: x ^ y, array)
Sort the integers by placing each one at the position where it should be. If you get a "collision", then you have found the duplicated number.
Space complexity is O(1) (the same space is just overwritten).
Time complexity is less than O(n) on average, because you will statistically find the collision before reaching the end.
