Fastest Search algorithm in 2D array - c

So, I have a 2D array, int a[X][Y];
X can go up to 10 000 000 and Y is at most 6.
Given an array int v[Z] (Z <= Y), I have to check whether some line of a contains all the elements of v.
What would be the fastest algorithm for this problem, and how would you implement it?
I have already tried the classic method of going through a line by line and searching with two nested for loops, one over the elements of v and one over the elements of the line, but it takes too long.
What would be the best (fastest) approach ?
int check()
{
    int nrfound;
    for (int l = 0; l < lines_counter; l++) // for each line in a array
    {
        nrfound = 0;
        for (int i = 0; i < n; i++) { // for each element in v array (n == Z)
            for (int j = 0; j < m; j++) // for each element in a[l] line (m == Y)
                if (v[i] == a[l][j])
                    nrfound++;
            if (nrfound == Z)
                return 0; // found a line containing v
        }
    }
    return 1; // no such line
}

I see three things to consider:
1. Using threads.
2. If possible, when constructing the int a[X][Y] table, I would create an additional array int[6][Y] containing the lists of indexes of the rows that contain 1, 2, 3 ... 6 elements. This allows you to narrow the search.
3. For each row of a, compute an order-independent hash of its values, then compute the hash of the values of v, and compare hash codes instead of each separate value. (A plain hash comparison only works when v has to match a whole row, not a proper subset of it.)

For the case of reusing the same array a[] with multiple different v[]:
1. Sort every line of a[][] as a preliminary step (executed once).
2. Sort v[].
3. Use a single loop (instead of two nested ones) to intersect the sorted v[] with every sorted line of a[], the same way as the merge procedure of merge sort:
index_v = 0
index_a = 0
while index_v < length_v and index_a < length_a:
    if v[index_v] == a[index_a]:
        index_v++, index_a++
    else if v[index_v] < a[index_a]:
        break    # v[index_v] cannot appear later in the sorted line
    else:
        index_a++
if index_v == length_v:
    return OK, a[] line contains all v elements

Sorting 1e7 arrays of size 6 can easily be parallelized using a fixed sorting network, with or without SIMD/multithreading.
Then sort v and compare it with each row using the same principle as merging two sorted lists.
The overall worst-case cost is between 13e7 and 24e7 comparisons (a sorting network for 6 elements requires 12 conditional swaps, and merging v with a[n] requires 1 to 12 comparisons).

As you're working in C, the available data structures are limited.
I would suggest:
1. Start N threads, divide the X rows of the matrix into N buckets, and search each bucket in parallel.
2. Depending on the type of the 2D input array, you can save some time with boundary conditions if you want all elements of the query array to keep their order. You can also use the fact that Z <= Y: compare the length of each line first, since a line shorter than Z cannot match.
3. Sorting the array adds extra work, so it is better to avoid it.

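Your algorithm has a flaw if there are duplicate elements in the a[i][] subarrays: a matching element of v will be counted multiple times, and the count may equal Z by coincidence. For example, with v = {1, 2} and a line such as {1, 1, 3, 4, 5, 6}, nrfound reaches 2 even though 2 is not in the line.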
Here is a corrected version:
int check(int X, int Y, int Z, int a[X][Y], int v[Z]) {
    for (int x = 0; x < X; x++) {
        // for each line in array a
        int mask = 0;  // bit y set: a[x][y] already matched
        int z;
        for (z = 0; z < Z; z++) {
            // for each element in array v
            int y, m;
            for (y = 0, m = 1; y < Y; y++, m <<= 1) {
                // for each element in line a[x]
                if (v[z] == a[x][y] && !(mask & m)) {
                    mask |= m;
                    break;
                }
            }
            if (y == Y)
                break;  // v[z] not found in a[x]: try the next line
        }
        if (z == Z)
            return 0; // found a match
    }
    return 1; // no match
}
Unfortunately, the above code might be even slower than the posted one, but it is worth testing, as the search of a line stops as soon as an element of v is not found in a[x].


How can I find all unique sets of positions of elements in a matrix in C?

I need to solve the following problem for a 5×5 matrix, but to explain I will use an example with a 3×3 matrix:
A = { { 1, 3, 2 }
,{ 3, 2, 3 }
,{ 0, 4, 5 } };
I need to find all distinct sets of 3 (because the matrix is 3x3) positions sharing no row or column with the others, compute the sum of elements of A for each set of positions, and print the minimum of these sums.
Position = (0,0),(1,1),(2,2) sum = 1+2+5 = 8
(0,0),(1,2),(2,1) sum = 1+3+4 = 8
(0,1),(1,0),(2,2) sum = 3+3+5 = 11
(0,1),(1,2),(2,0) sum = 3+3+0 = 6
(2,0),(1,1),(0,2) sum = 0+2+2 = 4
...
(I think you understand the main principle.)
So the output must include: (2,0),(1,1),(0,2) minimal sum = 4
Remember: I actually need to do it for a 5×5 matrix.
A functional, albeit naive, way to do this is to use 6 for loops (5 of them nested). Each loop runs from 0 to 2. The outermost loop stores its iteration number in an int (r1 in the code below, a first row), and the second loop similarly stores the first column (c1). The third loop stores the second row (r2), so you'll need to continue when r2 == r1. Each of the remaining loops must likewise skip the indices already used by the other coordinates. In the innermost loop, call your sum function with the 3 coordinate pairs.
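#include <limits.h>

/* Sum of the three selected elements of the 3x3 matrix A */
int testCoords(int A[3][3], const int *arr1, const int *arr2, const int *arr3)
{
    return A[arr1[0]][arr1[1]] + A[arr2[0]][arr2[1]] + A[arr3[0]][arr3[1]];
}

/* algorithm written out for n = 3 */
int mySearch(int A[3][3], int n)
{
    int coord1[2], coord2[2], coord3[2]; /* assumes 3 by 3 */
    int minSum = INT_MAX, obsSum;
    for (int r1 = 0; r1 < n; r1++)
    {
        coord1[0] = r1;
        for (int c1 = 0; c1 < n; c1++)
        {
            coord1[1] = c1;
            for (int r2 = 0; r2 < n; r2++)
            {
                if (r2 == r1)
                    continue;            /* row already used */
                coord2[0] = r2;
                for (int c2 = 0; c2 < n; c2++)
                {
                    if (c2 == c1)
                        continue;        /* column already used */
                    coord2[1] = c2;
                    for (int r3 = 0; r3 < n; r3++)
                    {
                        if (r3 == r1 || r3 == r2)
                            continue;
                        coord3[0] = r3;
                        for (int c3 = 0; c3 < n; c3++)
                        {
                            if (c3 == c1 || c3 == c2)
                                continue;
                            coord3[1] = c3;
                            obsSum = testCoords(A, coord1, coord2, coord3);
                            if (obsSum < minSum)
                                minSum = obsSum;
                        }
                    }
                }
            }
        }
    }
    return minSum;
}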
This will be fine for small arrays such as n=3 or n=5, but the number of iterations quickly gets ridiculous, as it is n^(2n). For example, even with a 5x5 matrix you'll do about 10 million iterations (5^10 = 9 765 625), not to mention a long-winded algorithm. A smarter algorithm, or perhaps a tree-based one, is probably a better fit here. For example, a recursive approach could pick one index pair (which eliminates a row and a column) and then call itself on the resulting (n-1)*(n-1) 2D array, like so:
#include <limits.h>
#include <stdio.h>

/* Copy matrix into out, leaving out row r and column c */
void reduce(int n, int matrix[n][n], int r, int c, int out[n - 1][n - 1])
{
    for (int i = 0, oi = 0; i < n; i++) {
        if (i == r)
            continue;
        for (int j = 0, oj = 0; j < n; j++) {
            if (j != c)
                out[oi][oj++] = matrix[i][j];
        }
        oi++;
    }
}

/* Minimal sum of n elements of matrix, no two sharing a row or column */
int coordSearch(int n, int matrix[n][n])
{
    if (n == 1)
        return matrix[0][0];
    int minSum = INT_MAX;
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            int updatedMatrix[n - 1][n - 1]; /* row i and column j removed */
            reduce(n, matrix, i, j, updatedMatrix);
            int obsSum = matrix[i][j] + coordSearch(n - 1, updatedMatrix);
            if (obsSum < minSum)
                minSum = obsSum;
        }
    }
    return minSum;
}

int main(void)
{
    int A[3][3] = { { 1, 3, 2 }, { 3, 2, 3 }, { 0, 4, 5 } };
    printf("minimal sum = %d\n", coordSearch(3, A)); /* top level: all 9 pairs */
    return 0;
}
For a 3x3 2D array, the recursive approach looks at the 9 coordinate pairs at the top level, then at the next level deals with a 2x2 array and so considers only 4 coordinate pairs, and at the bottom level just returns whichever value resides in the 1x1 "2D array". The complexity is n^2 * (n-1)^2 * ... * 1 = (n!)^2. Keep in mind, though, that each step requires building a reduced matrix, which is an operation-dense procedure. (Since the order in which the pairs are picked does not matter, each set of positions is visited n! times; looping only over the columns of the first remaining row at each level would bring this down to n!.)
Here's another suggestion: all of the sets of locations in the matrix that you want to use can be represented as permutations of an identity matrix whose "1" entries tell you which matrix elements to add up. You then take the minimum over the set of sums for all of the permutations. You can represent a permutation with a simple array since there are only N elements equal to 1 in a permutation of the NxN identity matrix. So call that array p where p(i) tells you which column on the i'th row to use.
So the fundamental observation here is that you want all permutations of the NxN identity matrix, and you can represent these as permutations of (0,1,...,N-1).
Pseudocode might look like:
Given: an NxN matrix (2-D array), M, for which you want the minimal sum of N
       elements, no two of which share a row or column
minsum = N * (max entry in M)   # just initialized to guarantee >= the min sum sought
foreach permutation p of (0,1,...,N-1):
    sum = 0
    for i = 0:N-1:
        sum += M(i,p(i))
        if sum >= minsum: break   # (if we already know this isn't a new min, move on)
    if sum < minsum: minsum = sum
print("minimum sum = ", minsum)
Adding a bit of code to remember a particular set of locations that add up to the minimum is left here as an exercise for the reader. Note that this gives up on any permutation as soon as it's not going to be a new minimum sum, which assumes the entries of M are non-negative (with negative entries, a partial sum could still drop later).
For an NxN array, there are N! permutations, so in practice this gets expensive fast for large N (not your current problem at N = 5). At that point, deeper dynamic programming techniques to quit early on partial results or avoid recomputing subset sums by using, say, memoization would be applicable and desirable.
Most other algorithms are going to do the same basic work in some way that may or may not look obviously similar in code. I like this approach because it has a nice mapping onto a fairly straight-forward understanding in mathematical terms and you can readily identify that what makes it get expensive quickly as N grows is the need to calculate a minimum over a rapidly-expanding set of permutations.
Algorithms to compute all permutations of an array are pretty easy to come by and you get one for free in C++ in the function next_permutation, which is part of the STL. My recommendation is to google "list all permutations" and if you need to work in a particular programming language, add that to the query as well. The algorithm isn't terribly complicated and exists in both recursive and iterative forms. And hey, for the 5x5 case you could probably statically list all 120 permutations anyway.

Find location of numbers in 2d array

I have two arrays, A and B. I need to find where in array B the sequence from array A is located; in particular, I need the location of its last number, and I don't know how to get it.
A[4]={6,3,3,2};
B[10][18]={
{5,3,6,5,6,1,6,1,4,4,5,4,4,6,3,3,1,3},
{6,2,3,6,3,3,2,4,3,1,5,5,3,4,4,1,6,5},
{6,4,3,1,6,2,2,5,3,4,3,2,6,4,5,5,1,4},
{5,3,5,6,6,4,3,2,6,5,1,2,5,6,5,2,3,1},
{1,2,5,2,6,3,1,5,4,6,4,4,4,2,2,2,3,3},
{4,1,4,2,3,2,3,6,4,1,6,2,3,4,4,1,1,4},
{5,3,3,2,6,2,5,2,3,1,2,6,5,1,6,4,1,3},
{4,5,2,1,2,5,2,6,4,3,3,2,3,3,3,1,5,1},
{1,3,5,5,2,1,3,3,3,1,3,3,6,3,3,3,6,5},
{4,5,2,4,2,3,4,2,5,6,5,2,6,3,5,4,5,2}
};
For example: the sequence 6,3,3,2 starts in the second row at the fourth column and ends in the seventh column. I need to get the location of the number 2. My result should be:
Row = 2,
Column= 7
The sequence isn't always in a row. It can be in a column too. For example:
3,2,4,3, and I need to know the location of the number 4.
I know how to search for one number in a one-dimensional array, but in this case I don't have a solution.
The language is C.
You can compare blocks using memcmp: for each row i, and for each start column j at which the whole block still fits in the row, compare the block with memcmp(A, &B[i][j], sizeof(A)). The complete program:
#include <stdio.h>
#include <string.h>
int main(void)
{
int A[4] = {6,3,3,2};
int B[10][18] = {
{5,3,6,5,6,1,6,1,4,4,5,4,4,6,3,3,1,3},
{6,2,3,6,3,3,2,4,3,1,5,5,3,4,4,1,6,5},
{6,4,3,1,6,2,2,5,3,4,3,2,6,4,5,5,1,4},
{5,3,5,6,6,4,3,2,6,5,1,2,5,6,5,2,3,1},
{1,2,5,2,6,3,1,5,4,6,4,4,4,2,2,2,3,3},
{4,1,4,2,3,2,3,6,4,1,6,2,3,4,4,1,1,4},
{5,3,3,2,6,2,5,2,3,1,2,6,5,1,6,4,1,3},
{4,5,2,1,2,5,2,6,4,3,3,2,3,3,3,1,5,1},
{1,3,5,5,2,1,3,3,3,1,3,3,6,3,3,3,6,5},
{4,5,2,4,2,3,4,2,5,6,5,2,6,3,5,4,5,2}
};
size_t i, j, size, rows, cols;
int founded = 0;
size = sizeof(A) / sizeof(A[0]);
rows = sizeof(B) / sizeof(B[0]);
cols = sizeof(B[0]) / sizeof(B[0][0]);
for (i = 0; i < rows; i++) {
for (j = 0; j <= cols - size; j++) { /* last start column at which the block fits */
if (memcmp(A, &B[i][j], sizeof(A)) == 0) {
founded = 1;
break;
}
}
if (founded) break;
}
if (founded) printf("Row: %zu Col: %zu\n", i + 1, j + size);
return 0;
}
The problem is not the language. The problem you face is that you need to come up with the algorithm first.
Actually this can easily be done by just looking at the first number of the 1D array. In your example it is 6 from (6,3,3,2).
Look for 6 in your 2D array.
Once 6 is found, use a loop that runs 4 times (because there are 4 numbers to look for: 6,3,3,2).
In the loop, check whether the subsequent numbers are 3,3,2.
If they are, return the location.
Else continue the process to look for 6.
Done!
It will look like this:
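/* searches along rows; for columns, step x instead of y */
found = 0;
for (x = 0; x < rows && !found; x++)
    for (y = 0; y + array1DSize <= cols && !found; y++)
    {
        if (matrix[x][y] != array1D[0])
            continue;
        for (z = 1; z < array1DSize; z++)
            if (matrix[x][y + z] != array1D[z])   /* compare the following cells */
                break;
        if (z == array1DSize)                     /* all numbers matched */
        {
            found = 1;
            row = x;
            location = y + array1DSize - 1;       /* column of the last number */
        }
    }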
If you know how to do it with a one-dimensional array, you can do it the same way in C with multidimensional arrays too!
For instance, say you have a two dimensional array like so:
int array[5][5]; // 5x5 array of ints
You can actually access it in linear fashion, by doing:
(*array)[linear offset]
So that means if you want to access the 2nd column of the 2nd row, you can do:
(*array)[6]
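This is because the 2nd row starts at index 5 and the 2nd column is at offset 1, so 5+1 gives 6. Likewise, the 3rd row starts at index 10, so the 2nd column of the third row is at 10+1.
Knowing that, you can take your original algorithm and adapt it to access the multidimensional array in a linear fashion. Note that this also lets a match "wrap around" from the end of one row to the start of the next, so check for that if you don't want it. (Strictly speaking, indexing past the end of the first row through (*array) is undefined behavior in standard C, although it works with common compilers because the rows are stored contiguously.)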

Function in C that computes the mathematical union of two int arrays into another one

The language is ANSI C. I have 2 arrays of int: A and B. A has a size called m and B a size called n. The assignment says that m MUST BE different from n, so the arrays must have different sizes. I have already coded this. A is sorted in ascending order, while B is sorted in descending order. I have to write a function that computes the mathematical union of the two arrays into another one called C. If an element is in both arrays, only one copy must go into the union (array C).
My code does not work well: the last element is not ordered, and the output ends with a very big number that I do not know where it comes from.
int index_c=index_m+index_n; //the index of array c
// is obtained by the sum of two indexes of the array A and B
int c[index_c];
int k=0;
for (i=0; i < index_m; i++)
{
for (j=0; j < index_n; j++)
{
if (a[i]==b[j])
{
c[k]=a[i]; //put only one time if is repeated more time in the two arrays
}
else
{
c[k]=a[i]; //put the a[i] element in the array c
c[k+1]=b[j]; //the element of the other array next to
}
}
k++;
}
printf("Elements in array C are: \n");
for (i=0; i<index_c; i++)
printf("element %d\n", c[i]);
It doesn't matter if the array C is not sorted, I will sort after the union. Any suggestions?
I am trying the suggestion to do k++ when I add one element and k += 2 when I add two elements to array C. Now it works a bit better, but not fully: the output no longer has big numbers, but one of the output values (the 3rd) is the same as the first.
Example: 3 9 3 2 5. The second 3 is wrong, and a number is missing because the second 3 took its place.
Another example: 2 4 2 1 9
I spot two immediate logical errors which should be fixed at the very least:
You either store one number in c, when both inputs are the same, and increase k by 1, or you store two numbers into c, in which case you should increase k by 2. In the code you have now, you only have to add another +1, but consider putting these increments inside the if..else blocks for clarity. Currently, you are overwriting the last value stored.
You print the result from 0 to index_c, the sum of the lengths of the two input arrays. That is not logical, because you are throwing out numbers. Hence you get 'random' numbers as output; those are merely uninitialized, i.e. never written to. Print from 0 to k, as that is the valid range of your output.
So far none of the answers exploit the fact that the arrays are both sorted. Here is an implementation which is almost identical to a merge as suggested in the comments. The complexity of the merge is O(m + n).
I have assumed that each array has no duplicates (no [0, 1, 1, 3]), but you could add checks like if (k == 0 || k > 0 && C[k - 1] != A[i]) to fix this if I assumed wrong.
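The function returns the length of C. C is sorted in increasing order. To have C sorted in decreasing order instead, walk A from its end and B from its start and take the larger of the two current elements first (simply changing if (A[i] < B[j]) to if (A[i] > B[j]) is not enough, because i and j would still start at the smallest elements).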
int union_merge(const int *A, int m, const int *B, int n, int *C) {
int i = 0, j = n - 1, k = 0;
while (i < m && j >= 0) {
if (A[i] < B[j]) {
C[k++] = A[i++];
} else if (A[i] == B[j]) {
C[k++] = A[i++];
--j;
} else {
C[k++] = B[j--];
}
}
while (j >= 0) {
C[k++] = B[j--];
}
while (i < m) {
C[k++] = A[i++];
}
return k;
}
Let's say that you have two arrays A and B, and a union array C. You can copy both arrays A and B into one array, sort that array, and then iterate over it, adding a value to array C (the union array) only if you didn't already add it. The total complexity is O(N log N), where N = n + m. Look at the code:
#include <stdio.h>
#include <stdlib.h>
#define MAX 100000
int a[2*MAX+3], c[2*MAX+3];
int cmp(const void *x, const void *y) {
    int u = *(const int *)x, v = *(const int *)y;
    return (u > v) - (u < v); // -1, 0 or 1; every path returns a value
}
int main(void) {
    int i, k;
    int n, m;
    if (scanf("%d%d", &n, &m) != 2) return 1; // size of the first array and size of the second array
    n += m;
    if (n <= 0 || n > 2*MAX) return 1;
    for(i = 0; i < n; ++i) // O(N), input both arrays into one array
        if (scanf("%d", &a[i]) != 1) return 1;
    qsort(a, n, sizeof(int), cmp); // O(N * log(N)), sort the combined array
    c[0] = a[0];
    for(i = 1, k = 1; i < n; ++i) // O(N)
        if(c[k - 1] != a[i]) // if the last element added to the union array differs from the current element, add the current element
            c[k++] = a[i];
    for(i = 0; i < k; ++i) // O(K)
        printf("%d ", c[i]);
    return 0;
}

Leetcode: Four Sum

Problem: Given an array S of n integers, are there elements a, b, c, and d in S such that a + b + c + d = target? Find all unique quadruplets in the array that give the sum of target.
Note:
Elements in a quadruplet (a,b,c,d) must be in non-descending order. (ie, a ≤ b ≤ c ≤ d)
The solution set must not contain duplicate quadruplets.
For example, given array S = {1 0 -1 0 -2 2}, and target = 0.
A solution set is:
(-1, 0, 0, 1)
(-2, -1, 1, 2)
(-2, 0, 0, 2)
I know there's an O(n^3) solution to this problem, but I was wondering if there's a faster algorithm. I googled a lot and found that many people gave an O(n^2 log n) solution, which fails to deal correctly with cases where several pairs in S have the same sum (like here and here). I hope someone can give me a correct version of an O(n^2 log n) algorithm, if one really exists.
Thanks!
The brute-force algorithm takes time O(n^4): Use four nested loops to form all combinations of four items from the input, and keep any that sum to the target.
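A simple improvement takes time O(n^3): Use three nested loops to form all combinations of three items from the input, and for each one look up (for example in a hash set) whether the remaining value, the target minus the sum of the three items, is present as a different element of the input.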
The best algorithm I know is a meet-in-the-middle algorithm that operates in O(n^2) expected time with a hash table: Use two nested loops to form all combinations of two items from the input, storing the pairs and totals in some kind of dictionary (hash table, balanced tree) indexed by total. Then use two more nested loops to again form all combinations of two items from the input, and for any pair whose sum equals the target minus a total in the dictionary, keep the two items from the nested loops plus the two items from the dictionary, provided no item is used twice. This decides whether a solution exists in O(n^2) expected time; listing every quadruplet can take longer when many pairs share the same total.
I have code at my blog.
IMHO, for O(n^2lgn) algorithm, the problem of duplicates can be solved when creating the aux[] array. (I'm using the name in the second link you provided). The basic idea is first sort the elements in the input, and then while processing the array, skip the duplicates.
vector<int> createAuxArray(vector<int> input) { // input is a copy, so sorting it is safe
    int len = input.size();
    vector<int> aux;
    sort(input.begin(), input.end());
    for (int i = 0; i < len; ++i) {
        if (i != 0 && input[i] == input[i - 1]) continue; // skip when encountering a duplicate
        for (int j = i + 1; j < len; ++j) {
            if (j != i + 1 && input[j] == input[j - 1]) continue; // same idea
            aux.push_back(createAuxElement(input[i], input[j])); // createAuxElement: helper from the linked post
        }
    }
    return aux;
}
Complexity for this module is O(n lg n) + O(n^2) = O(n^2), which doesn't affect the overall performance. Once we have created the aux array, we can plug it into the code mentioned in the post. Be careful, though: skipping duplicates this way also drops pairs that are needed when a value occurs three or more times (for example, S = {0, 0, 0, 0} with target 0 yields a single aux pair), so test such inputs.
Note that a BST or hash table can be used to replace the sorting, but in general it doesn't decrease the complexity, since you still have to insert/query (O(lg N) for a BST) inside the two nested loops.
This is a modified version of the geeksforgeeks solution which handles duplicates of pair sums as well. I noticed that some of the pairs were missing because the hash table was overwriting the old pairs whenever it found a new pair with the same sum. Thus, the fix is to avoid overwriting by storing them in a vector of pairs. Hope this helps!
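#include <algorithm>
#include <set>
#include <unordered_map>
#include <utility>
#include <vector>
using namespace std;

vector<vector<int> > fourSum(vector<int> &a, int t) {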
unordered_map<int, vector<pair<int,int> > > twoSum;
set<vector<int> > ans;
int n = a.size();
for (int i = 0; i < n; i++) for (int j = i + 1; j < n; j++) twoSum[a[i] + a[j]].push_back(make_pair(i, j));
for (int i = 0; i < n; i++) {
for (int j = i + 1; j < n; j++) {
if (twoSum.find(t - a[i] - a[j]) != twoSum.end()) {
for (auto comp : twoSum[t - a[i] - a[j]]) {
if (comp.first != i and comp.first != j and comp.second != i and comp.second != j) {
vector<int> row = {a[i], a[j], a[comp.first], a[comp.second]};
sort(row.begin(), row.end());
ans.insert(row);
}
}
}
}
}
vector<vector<int> > ret(ans.begin(), ans.end());
return ret;
}

How to find a duplicate element in an array of shuffled consecutive integers?

I recently came across a question somewhere:
Suppose you have an array of 1001 integers. The integers are in random order, but you know each of the integers is between 1 and 1000 (inclusive). In addition, each number appears only once in the array, except for one number, which occurs twice. Assume that you can access each element of the array only once. Describe an algorithm to find the repeated number. If you used auxiliary storage in your algorithm, can you find an algorithm that does not require it?
What I am interested in to know is the second part, i.e., without using auxiliary storage. Do you have any idea?
Just add them all up, and subtract from that the total you would expect if only the numbers 1 to 1000 were used (each once).
Eg:
Input: 1,2,3,2,4 => 12
Expected: 1,2,3,4 => 10
Input - Expected => 2
Update 2: Some people think that using XOR to find the duplicate number is a hack or trick. To which my official response is: "I am not looking for a duplicate number, I am looking for a duplicate pattern in an array of bit sets. And XOR is definitely suited better than ADD to manipulate bit sets". :-)
Update: Just for fun before I go to bed, here's "one-line" alternative solution that requires zero additional storage (not even a loop counter), touches each array element only once, is non-destructive and does not scale at all :-)
printf("Answer : %d\n",
array[0] ^
array[1] ^
array[2] ^
// continue typing...
array[999] ^
array[1000] ^
1 ^
2 ^
// continue typing...
999^
1000
);
Note that the compiler will actually calculate the second half of that expression at compile time, so the "algorithm" will execute in exactly 1002 operations.
And if the array element values are known at compile time as well, the compiler will optimize the whole statement to a constant. :-)
Original solution, which does not meet the strict requirements of the question, even though it finds the correct answer: it uses one additional integer to keep the loop counter, and it accesses each array element three times (twice to read it and write it at the current iteration and once to read it for the next iteration).
Well, you need at least one additional variable (or a CPU register) to store the index of the current element as you go through the array.
Aside from that one, though, here's a destructive algorithm that can safely scale for any N up to INT_MAX.
for (int i = 1; i < 1001; i++)
{
array[i] = array[i] ^ array[i-1] ^ i;
}
printf("Answer : %d\n", array[1000]);
I will leave the exercise of figuring out why this works to you, with a simple hint :-):
a ^ a = 0
0 ^ a = a
A non-destructive version of the solution by Franci Penov.
This can be done by making use of the XOR operator.
Lets say we have an array of size 5: 4, 3, 1, 2, 2
Which are at the index: 0, 1, 2, 3, 4
Now do an XOR of all the elements and all the indices. We get 2, which is the duplicate element. This happens because 0 plays no role in the XOR. The remaining n-1 indices pair up with the same n-1 values in the array, and the only unpaired element in the array is the duplicate.
int i;
int dupe = 0;
for(i = 0; i < N; i++) {
dupe = dupe ^ arr[i] ^ i;
}
// dupe has the duplicate.
The best feature of this solution is that it does not suffer from the overflow problems seen in the addition-based solution.
Since this is an interview question, it would be best to start with the addition based solution, identify the overflow limitation and then give the XOR based solution :)
This makes use of an additional variable so does not meet the requirements in the question completely.
Add all the numbers together. The final sum will be the 1+2+...+1000+duplicate number.
To paraphrase Franci Penov's solution:
The (usual) problem is: given an array of integers of arbitrary length in which every value is repeated an even number of times except for one value, which is repeated an odd number of times, find that value.
The solution is:
acc = 0
for i in array: acc = acc ^ i
Your current problem is an adaptation. The trick is that you have to find the element that is repeated twice, so you need to adapt the solution to compensate for this quirk:
acc = 0
for i in range(len(array)): acc = acc ^ i ^ array[i]
Which is what Franci's solution does in the end, although it destroys the whole array (by the way, it could destroy only the first or last element...)
But since you need extra-storage for the index, I think you'll be forgiven if you also use an extra integer... The restriction is most probably because they want to prevent you from using an array.
It would have been phrased more accurately if they had required O(1) space (1000 can be seen as N since it's arbitrary here).
Add all numbers. The sum of integers 1..1000 is (1000*1001)/2. The difference from what you get is your number.
One-line solution in Python (3):
from functools import reduce
arr = [1,3,2,4,2]
print(reduce(lambda acc, ix: acc ^ ix[0] ^ ix[1], enumerate(arr), 0))
# -> 2
The explanation of why it works is in Matthieu M.'s answer.
If you know that we have the exact numbers 1-1000, you can add up the results and subtract 500500 (sum(1, 1000)) from the total. This will give the repeated number because sum(array) = sum(1, 1000) + repeated number.
Well, there is a very simple way to do this... each of the numbers between 1 and 1000 occurs exactly once except for the number that is repeated.... so, the sum from 1....1000 is 500500. So, the algorithm is:
sum = 0
for each element of the array:
sum += that element of the array
number_that_occurred_twice = sum - 500500
public static void main(String[] args) {
int start = 1;
int end = 10;
int arr[] = {1, 2, 3, 4, 4, 5, 6, 7, 8, 9, 10};
System.out.println(findDuplicate(arr, start, end));
}
static int findDuplicate(int arr[], int start, int end) {
int sumAll = 0;
for(int i = start; i <= end; i++) {
sumAll += i;
}
System.out.println(sumAll);
int sumArrElem = 0;
for(int e : arr) {
sumArrElem += e;
}
System.out.println(sumArrElem);
return sumArrElem - sumAll;
}
No extra storage requirement (apart from loop variable).
int length = (sizeof array) / (sizeof array[0]);   // 1001 elements holding 1..1000
for(int i = 1; i < length; i++) {
    array[0] += array[i];
}
printf(
    "Answer : %d\n",
    ( array[0] - ((length - 1) * length) / 2 )     // subtract 1 + 2 + ... + 1000
);
Do arguments and callstacks count as auxiliary storage?
int sumRemaining(int* remaining, int count) {
if (!count) {
return 0;
}
return remaining[0] + sumRemaining(remaining + 1, count - 1);
}
printf("duplicate is %d", sumRemaining(array, 1001) - 500500);
Edit: tail call version
int sumRemaining(int* remaining, int count, int sumSoFar) {
if (!count) {
return sumSoFar;
}
return sumRemaining(remaining + 1, count - 1, sumSoFar + remaining[0]);
}
printf("duplicate is %d", sumRemaining(array, 1001, 0) - 500500);
public int duplicateNumber(int[] A) {
int count = 0;
for(int k = 0; k < A.Length; k++)
count += A[k];
return count - (A.Length * (A.Length - 1) >> 1);
}
A triangle number T(n) is the sum of the n natural numbers from 1 to n. It can be represented as n(n+1)/2. Thus, knowing that among given 1001 natural numbers, one and only one number is duplicated, you can easily sum all given numbers and subtract T(1000). The result will contain this duplicate.
For a triangular number T(n), if n is a power of 10, there is also a neat way to find T(n) from its base-10 representation: T(n) is n/2 written twice (T(1000) = 500500).
n = 1000
s = sum(GivenList)
r = str(n // 2)              # "500"
duplicate = s - int(r + r)   # int("500500") == T(1000)
I support adding all the elements and then subtracting the sum of all the indices, but this won't work if the number of elements is very large, because it will cause an integer overflow. So I have devised this algorithm, which should greatly reduce the chance of an integer overflow:
dup = 0
for i=0 to n-1
begin:
    diff = a[i]-i;
    dup = dup + diff;
end
// where dup is the duplicate element..
But by this method I won't be able to find out the index at which the duplicate element is present!
For that I need to traverse the array another time which is not desirable.
Improvement of Franci's answer based on the property of XORing consecutive values:
int result = xor_sum(N);
for (i = 0; i < N+1; i++)
{
result = result ^ array[i];
}
Where:
// Compute (((1 xor 2) xor 3) .. xor value)
int xor_sum(int value)
{
    int modulo = value % 4;
    if (modulo == 0)
        return value;
    else if (modulo == 1)
        return 1;
    else if (modulo == 2)
        return value + 1;
    else
        return 0;
}
Or in pseudocode/math lang f(n) defined as (optimized):
if n mod 4 = 0 then X = n
if n mod 4 = 1 then X = 1
if n mod 4 = 2 then X = n+1
if n mod 4 = 3 then X = 0
And in canonical form f(n) is:
f(0) = 0
f(n) = f(n-1) xor n
My answer to question 2:
Find the sum and product of the numbers from 1 to N; call them SUM and PROD.
Find the sum and product of the given numbers, i.e. 1 to N without x and y (assume x and y are the missing ones); call them mySum and myProd.
Thus:
SUM = mySum + x + y;
PROD = myProd* x*y;
Thus:
x*y = PROD/myProd; x+y = SUM - mySum;
We can find x and y by solving these equations.
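Solving that system explicitly: x and y are the two roots of a quadratic whose coefficients are the known sum S and product P.

```latex
\begin{aligned}
S &= x + y = \mathit{SUM} - \mathit{mySum}, \qquad P = xy = \frac{\mathit{PROD}}{\mathit{myProd}} \\
(t - x)(t - y) &= t^2 - S\,t + P = 0 \\
x, y &= \frac{S \pm \sqrt{S^2 - 4P}}{2}
\end{aligned}
```

For example, with N = 5 and the numbers 1, 3, 5 given: S = 15 - 9 = 6 and P = 120 / 15 = 8, so x, y = (6 ± 2) / 2, i.e. 4 and 2. Since PROD = N! no longer fits in a signed 64-bit integer once N exceeds 20, this is practical only for small N or with big-number arithmetic.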
In the aux version, you first set all the values of the aux array to -1, and as you iterate, check whether you have already inserted the value into it. If not (the slot is still -1), insert it. If you have, that value is the duplicate.
In the version without aux, you take an element from the list and check whether the rest of the list contains that value. If it does, you've found the duplicate (note that this reads elements more than once, so it does not meet the access-once rule).
private static int findDuplicated(int[] array) {
if (array == null || array.length < 2) {
System.out.println("invalid");
return -1;
}
int[] checker = new int[array.length];
Arrays.fill(checker, -1);
for (int i = 0; i < array.length; i++) {
int value = array[i];
int checked = checker[value];
if (checked == -1) {
checker[value] = value;
} else {
return value;
}
}
return -1;
}
private static int findDuplicatedWithoutAux(int[] array) {
if (array == null || array.length < 2) {
System.out.println("invalid");
return -1;
}
for (int i = 0; i < array.length; i++) {
int value = array[i];
for (int j = i + 1; j < array.length; j++) {
int toCompare = array[j];
if (value == toCompare) {
return array[i];
}
}
}
return -1;
}
