Find 2 repeating elements in given array

Given an array with n+2 elements, all elements in the array are in the range 1 to n and all elements occur only once except two elements which occur twice.
Find those 2 repeating numbers. For example, if the array is [4, 2, 4, 5, 2, 3, 1], then n is 5, there are n+2 = 7 elements with all elements occurring only once except 2 and 4.
So my question is how to solve the above problem using XOR operation. I have seen the solution on other websites but I'm not able to understand it. Please consider the following example:
arr[] = {2, 4, 7, 9, 2, 4}
XOR every element. xor = 2^4^7^9^2^4 = 14 (1110)
Get a number which has only one set bit of the xor. Since we can easily get the rightmost set bit, let us use it.
set_bit_no = xor & ~(xor-1) = (1110) & ~(1101) = 0010. Now set_bit_no has only one bit set: the rightmost set bit of xor.
Now divide the elements in two sets and do xor of elements in each set, and we get the non-repeating elements 7 and 9.

Yes, you can solve it with XORs. This answer expands on Paulo Almeida's great comment.
The algorithm works as follows:
Since we know that the array contains every element in the range [1 .. n], we start by XORing every element in the array together and then XOR the result with every element in the range [1 .. n]. Because of the XOR properties, the unique elements cancel out and the result is the XOR of the duplicated elements (because the duplicate elements have been XORed 3 times in total, whereas all the others were XORed twice and canceled out). This is stored in xor_dups.
Next, find a bit in xor_dups that is a 1. Again, due to XOR's properties, a bit set to 1 in xor_dups means that this bit differs in the binary representations of the two duplicate numbers. Any bit that is a 1 can be picked for the next step; my implementation chooses the least significant one. This is stored in diff_bit.
Now, split the array elements into two groups: one group contains the numbers that have a 0 bit on the position of the 1-bit that we picked from xor_dups. The other group contains the numbers that have a 1-bit instead. Since this bit is different in the numbers we're looking for, they can't both be in the same group. Furthermore, both occurrences of each number go to the same group.
So now we're almost done. Consider the group for the elements with the 0-bit. XOR them all together, then XOR the result with all the elements in the range [1..n] that have a 0-bit on that position, and the result is the duplicate number of that group (because there's only one number repeated inside each group, all the non-repeated numbers canceled out because each one was XORed twice except for the repeated number which was XORed three times).
Rinse, repeat: for the group with the 1-bit, XOR them all together, then XOR the result with all the elements in the range [1..n] that have a 1-bit on that position, and the result is the other duplicate number.
Here's an implementation in C:
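#include <assert.h>
#include <stddef.h>

void find_two_repeating(int arr[], size_t arr_len, int *a, int *b) {
    assert(arr_len > 3);
    size_t n = arr_len - 2;
    size_t i;

    /* XOR of the array with 1..n leaves only dup1 ^ dup2. */
    int xor_dups = 0;
    for (i = 0; i < arr_len; i++)
        xor_dups ^= arr[i];
    for (i = 1; i <= n; i++)
        xor_dups ^= (int)i;

    /* Isolate the least significant set bit: the duplicates differ there. */
    int diff_bit = xor_dups & -xor_dups;

    /* XOR each group (bit set / bit clear) with the matching part of 1..n. */
    *a = 0;
    *b = 0;
    for (i = 0; i < arr_len; i++)
        if (arr[i] & diff_bit)
            *a ^= arr[i];
        else
            *b ^= arr[i];
    for (i = 1; i <= n; i++)
        if ((int)i & diff_bit)
            *a ^= (int)i;
        else
            *b ^= (int)i;
}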
arr_len is the total length of the array arr (the value of n+2), and the repeated entries are stored in *a and *b (these are so-called output parameters).

Related

Given an array A[] of N numbers. Now, you need to find and print the Summation of the bitwise OR of all possible subsets of this array

For [1, 2, 3], all possible subsets are {1}, {2}, {3}, {1,2}, {1,3}, {2,3}, {1,2,3}
The sum of the ORs of these subsets is 1 + 2 + 3 + 3 + 3 + 3 + 3 = 18.
My approach is to generate all possible subsets, find the OR of each and sum them, but the time complexity is O(2^n), and I need a solution with O(n log n) or less.

As you have 3 elements, 2^3 = 8 subsets will be generated (including the empty one, which contributes 0). You need to OR the elements of each subset and print the sum over all subsets. With the following logic you can get the result you require:
public class AndOfSubSetsOfSet {
    public static void main(String[] args) {
        findSubsets(new int[]{1, 2, 3});
    }

    private static void findSubsets(int array[]) {
        int numOfSubsets = 1 << array.length;
        int a = 0; // running sum of the ORs of all subsets
        for (int i = 0; i < numOfSubsets; i++) {
            int pos = array.length - 1;
            int bitmask = i;
            int temp = 0;
            int count = 0;
            while (bitmask > 0) {
                if ((bitmask & 1) == 1) {
                    if (count == 0) {
                        temp = array[pos];
                    } else
                        temp = array[pos] | temp;
                    count++;
                }
                // shift the mask right so that its lowest bit is dropped
                bitmask >>= 1;
                pos--;
            }
            count = 0;
            a += temp;
            temp = 0;
        }
        System.out.println(a);
    }
}

One approach is to use three loops: the outer loop selects the number of elements in the group (2, 3, 4, ... up to n), and the two inner loops select the elements according to the outer loop. In the inner loop you can use bitwise OR to get the answer.
Here the time complexity is better than exponential.
If there is any problem, I can give you the code.

Let's find the solution by calculating bitwise values. Consider the following points first. We will formulate the algorithm based on these points
For N numbers, there can be 2^N-1 such subsets.
For N numbers, where the maximum number of bits is k, what is the maximum possible output? Obviously it occurs when the OR of every subset is all 1's (i.e., every subset has a 1 in each of the k bit positions). So calculate this MAX. In your example k = 2 and N = 3, so the MAX is reached when the OR of every subset is 11 (i.e., 3). So MAX = (2^N-1)*(2^k-1) = 21.
Note that a bit of a subset's OR is 0 only when that bit is 0 in every element of the subset. So for every bit, first calculate how many subsets have a 0 in that bit. Then multiply that number by the corresponding value (2^bit_position) and deduct it from MAX. In your case, for the rightmost position (i.e., position 0), there is only one element with a 0 (the number 2). So in 2^1-1 = 1 subset ({2}) the OR has a 0 at position 0, and we deduct 1*1 from MAX. Similarly for position 1, only the number 1 has a 0 there, so there is only 1 subset ({1}) whose OR has a 0 at position 1, and we deduct 1*2 from MAX, leaving 21 - 1 - 2 = 18. For every bit, calculate this value and keep deducting; the final MAX will be the result. If you work with 16-bit integers and you don't know the maximum k, then calculate using k = 16.
Let's consider another example with the array {1,4}. The subsets are {1},{4},{1,4}, and the result is 1+4+5 = 10.
Here k = 3 and N = 2, so MAX = (2^k-1)*(2^N-1) = 21.
For bit 0, there is only a single 0 (in 4), so deduct 1*1 from MAX. The new MAX = 21 - 1 = 20.
For bit 1, both 1 and 4 have a 0, so deduct (2^2-1)*2 from MAX. The new MAX = 20 - 6 = 14.
For bit 2, there is only a single 0 (in 1), so deduct 1*4 from MAX. The new MAX = 14 - 4 = 10.
As we have now handled every bit position, the final result is 10.
Time Complexity
The first and second steps can be calculated in constant time.
In the third step, the main task is to count the 0 bits at each position. For N numbers this takes O(k*N) in total. As k is a constant, the overall complexity is O(N).
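The steps above can be sketched in C roughly as follows. The function name or_sum_of_subsets and the limits in the assert are assumptions made for this illustration, not part of the answer; the sketch also assumes every value is below 2^k.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sum of the bitwise OR over all non-empty subsets of a[0..n-1].
 * k is the number of bit positions to consider; every a[i] is assumed
 * to be smaller than 2^k. The limits below keep the result inside 64 bits. */
uint64_t or_sum_of_subsets(const unsigned a[], size_t n, unsigned k)
{
    assert(n < 32 && k <= 16);

    uint64_t all_subsets = ((uint64_t)1 << n) - 1;            /* 2^N - 1 */
    uint64_t total = all_subsets * (((uint64_t)1 << k) - 1);  /* MAX */

    for (unsigned bit = 0; bit < k; bit++) {
        /* Count the elements that have a 0 at this position. */
        size_t zeros = 0;
        for (size_t i = 0; i < n; i++)
            if (((a[i] >> bit) & 1u) == 0)
                zeros++;

        /* Non-empty subsets built only from those elements have a 0 here. */
        uint64_t zero_subsets = ((uint64_t)1 << zeros) - 1;
        total -= zero_subsets << bit;   /* deduct count * 2^bit */
    }
    return total;
}
```

Called with {1, 2, 3} and k = 2 this follows the first walkthrough (21 - 1 - 2), and with {1, 4} and k = 3 the second one (21 - 1 - 6 - 4).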

Algorithms for selecting a number uniformly randomly from a subset of integers

Suppose an int array with a[i]=i, i=0, 1, 2, ..., N-1, where each integer is associated with a bit. I need an algorithm to select, uniformly at random, an integer from the subset of integers whose associated bit is 0. It is assumed that the total number of integers whose associated bit is 0 is already given in a variable T.
One straightforward solution I have is to generate r=rand() % T, and then find the r-th integer whose associated bit is 0 (by testing i=0,1,...). However, I wonder whether there are any better algorithms for doing this. Also, if the associated bits are stored in some long int variables (which is true in my case), finding the r-th integer whose associated bit is 0 would not be an easy task.
Thanks for your inputs.

If the associated bits are irregular, i.e. cannot be deduced from the value of i by a simple formula, then it is just impossible to locate the r-th '0' bit without enumerating those that precede, unless preprocessing is allowed.
A good solution is to precompute a table that will store the indexes of the '0' bit entries contiguously, and lookup this table for the r-th entry. (Instead of an index table, you can as well fill another array with the elements from the subset only.)
Indexing a packed bit array is not such a big deal. Assuming 64 bits long ints, the bit at index i is found by the expression
(PackedBits[i >> 6] >> (i & 63)) & 1
(The 6 because 64 == (1 << 6).)
In case you really want to find the r-th '0' sequentially, you can speed-up the search a little (x 64) by precomputing the number of '0's in every long int so that you can skip 64 entries in a single go.
And if you really really don't want to precompute anything, you can still speed-up the search by processing the bits 8 by 8, using a static table that relates every byte value (among 256) to the number of '0' bits in it. (Or even 16 by 16 if you can afford using a table of 65536 numbers.)
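As a rough illustration of the sequential search with whole-word skipping, here is a sketch in C. The names count_zeros64 and find_rth_zero are made up for this example, the bit array is assumed to be stored in 64-bit words, and the per-word zero counts are recomputed on the fly (they could equally be cached in a table, as suggested above).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of 0 bits in a 64-bit word. */
static unsigned count_zeros64(uint64_t w)
{
    unsigned ones = 0;
    while (w) {          /* each step clears the lowest set bit */
        w &= w - 1;
        ones++;
    }
    return 64u - ones;
}

/* Returns the index of the r-th (0-based) zero bit among the first n_bits
 * bits of packed[], or -1 if there are not that many zero bits. */
long find_rth_zero(const uint64_t packed[], size_t n_bits, size_t r)
{
    assert(packed != NULL);
    size_t n_words = (n_bits + 63) / 64;

    for (size_t w = 0; w < n_words; w++) {
        unsigned zeros = count_zeros64(packed[w]);
        if (r >= zeros) {            /* target is not in this word: skip 64 bits */
            r -= zeros;
            continue;
        }
        for (size_t b = 0; b < 64; b++) {
            size_t i = w * 64 + b;
            if (i >= n_bits)
                return -1;           /* only padding bits are left */
            if (((packed[w] >> b) & 1u) == 0) {
                if (r == 0)
                    return (long)i;
                r--;
            }
        }
    }
    return -1;
}
```

Combined with r = rand() % T from the question, the returned index is the selected integer, since a[i] = i.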

You can speed this up by trading memory for speed.
T must be an array, that stores in T[n] the number of integers in a[] that have bit n cleared, and this needs to be precomputed at some point. So, while you are calculating that, store the indices of all the integers that have a given bit cleared in another 2 dimensional array, indexed by the bit number and r.
In C for example:
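#define BITS (64)
#define N (100)

long int a[N];
int T[BITS];          /* T[j] = number of entries of a[] with bit j cleared */
int index[BITS][N];   /* index[j][r] = position in a[] of the r-th such entry */

void init(void)
{
    int i, j;
    // clear T:
    for (j = 0; j < BITS; j++)
        T[j] = 0;
    // compute T and the indices for each:
    for (i = 0; i < N; i++)
    {
        for (j = 0; j < BITS; j++)
        {
            // test in unsigned 64-bit arithmetic: 1 << j overflows an int for j >= 31
            if (((unsigned long long)a[i] & (1ULL << j)) == 0)
            {
                // increment T and store the index
                index[j][T[j]++] = i;
            }
        }
    }
}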
Then you can find your random number like this:
long number = a[index[bit][rand() % T[bit]]];
You could make this more memory-efficient by using a less wasteful data structure that only stores as many indices for each bit as there are actual values in a[] that have the bit cleared.

If T is sufficiently large relative to N, the most efficient solution is going to be to randomly select an integer in the range [0, N) and repeat until one is found whose associated bit is 0. On average this takes N/T attempts.
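A minimal sketch of this retry loop in C, assuming the bits are packed into 64-bit words as in the earlier answer. The name pick_random_zero is hypothetical, and rand() % n_bits carries the same slight modulo bias as rand() % T in the question.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Picks a random index i in [0, n_bits) whose associated bit is 0 by
 * drawing candidates until one qualifies (rejection sampling).
 * Assumes at least one 0 bit exists; otherwise this loops forever. */
size_t pick_random_zero(const uint64_t packed[], size_t n_bits)
{
    assert(n_bits > 0 && n_bits <= (size_t)RAND_MAX);
    for (;;) {
        size_t i = (size_t)rand() % n_bits;
        /* Same packed-bit lookup as (PackedBits[i >> 6] >> (i & 63)) & 1 */
        if (((packed[i >> 6] >> (i & 63)) & 1u) == 0)
            return i;
    }
}
```

No preprocessing or extra memory is needed, but the loop gets slow when only a few of the N bits are 0, which is when the table-based approaches pay off.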

Repeated and missing number in an array using xor

How to find the repeated number and missing number as well using xor?
For eg: actual = [1,2,3] input_received = [3,2,3]. Here the missing number is 1 and the repeated number is 3. I found a quite interesting solution while surfing,
#include <stdio.h>

void missing_and_repeating(int a[], int n, int size){
    int xor = 0;
    int i;
    int x = 0, y = 0;
    // XOR all array elements together with all numbers 1..n
    for(i=0; i<size; i++)
        xor = xor^a[i];
    for(i=1; i<=n; i++)
        xor = xor^i;
    // Get the rightmost bit which is set
    int set_bit_no = xor & ~(xor -1);
    // XOR numbers in two buckets
    for(i=0; i<size; i++){
        if(a[i] & set_bit_no){
            x = x^a[i];
        }
        else
            y = y^a[i];
    }
    for(i=1; i<=n; i++){
        if(i & set_bit_no)
            x = x^i;
        else
            y = y^i;
    }
    printf("\n %d %d ", x, y);
}
The 'actual' array is XORed and the 'input_received' array is XORed.
set_bit_no is assigned and both arrays are split into two halves according to set_bit_no.
So we go back to our array and the numbers from 1 to n and XOR the numbers into two buckets: one bucket contains the result of XORing all numbers with the given bit set, and the other bucket contains the result of XORing all numbers with the given bit clear.
I could not understand what set_bit_no is and why they are taking it, and how the array is split according to it. Someone please help me with a short example.

I assume the precondition is that exactly one array element is missing and has been replaced by a duplicate of another element. So, actual = {a1, a2, ..., an} and input_received = {a2, a2,..., an} (we can always rearrange the elements so that a1 is missing and a2 is repeated).
So when you xor all elements from both arrays you get xored = a1 ^ a2.
Now we have to decompose that number to find a1 and a2. You take one of the non-zero bits (it doesn't matter which one; the easiest way is to take the least significant one, as is done in the code, and you always have one if a1 != a2). This bit is set in only one of the numbers a1 or a2. So you xor all numbers which have this bit set, and they annihilate each other, leaving only a1 or a2, and you xor all the other numbers, so the result is the other number, a2 or a1 respectively.
It's important to note that this algorithm doesn't tell which of these numbers is missing and which is repeated. This is demonstrated by the following code:
int main() {
int arr1[] = {1, 1};
missing_and_repeating(arr1, 2, 2);
int arr2[] = {2, 2};
missing_and_repeating(arr2, 2, 2);
}
Output:
1 2
1 2
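If it matters which of the two numbers is the repeated one, one extra pass over the input settles it: the repeated value occurs in the array, the missing one does not. A small sketch, where pick_repeated is a hypothetical helper and x and y are the two values printed by the code above:

```c
#include <assert.h>
#include <stddef.h>

/* Given the two candidates x and y from the XOR split, returns the one
 * that is repeated, i.e. the one that actually occurs in a[].
 * The other candidate is then the missing number. */
int pick_repeated(const int a[], size_t size, int x, int y)
{
    assert(size > 0);
    for (size_t i = 0; i < size; i++)
        if (a[i] == x)
            return x;   /* x is present, so x is repeated and y is missing */
    return y;           /* x never occurs, so x is missing and y is repeated */
}
```

This keeps the overall O(n) time and O(1) extra memory.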

Checksum for an integer array?

I have an array of size 4, 9, 16 or 25 (depending on the input). The numbers in the array start at 0 and go up to one less than the size (if the array size is 9, the biggest element in the array is 8).
I would like an algorithm that generates some sort of checksum for the array, so that I can check whether 2 arrays are equal without looping through the whole array and comparing the elements one by one.
Where can I get this sort of information? I need something that is as simple as possible. Thank you.
edit: just to be clear on what I want:
-All the numbers in the array are distinct, so [0,1,1,2] is not valid because there is a repeated element (1)
-The position of the numbers matter, so [0,1,2,3] is not the same as [3,2,1,0]
-The array will contain the number 0, so this should also be taken into consideration.
EDIT:
Okay I tried to implement the Fletcher's algorithm here:
http://en.wikipedia.org/wiki/Fletcher%27s_checksum#Straightforward
int fletcher(int array[], int size){
    int i;
    int sum1=0;
    int sum2=0;
    for(i=0;i<size;i++){
        sum1=(sum1+array[i])%255;
        sum2=(sum2+sum1)%255;
    }
    return (sum2 << 8) | sum1;
}
To be honest, I have no idea what the return line does, and unfortunately the algorithm does not work.
For arrays [2,1,3,0] and [1,3,2,0] I get the same checksum.
EDIT2:
okay here's another one, the Adler checksum
http://en.wikipedia.org/wiki/Adler-32#Example_implementation
#define MOD 65521
unsigned long adler(int array[], int size){
    int i;
    unsigned long a=1;
    unsigned long b=0;
    for(i=0;i<size;i++){
        a=(a+array[i])%MOD;
        b=(b+a)%MOD;
    }
    return (b <<16) | a;
}
This also does not work.
Arrays [2,0,3,1] and [1,3,0,2] generate same checksum.
I'm losing hope here, any ideas?

Let's take the case of your array of 25 integers. You explain that it can contain any permutation of the unique integers 0 to 24. There are 25! (25 factorial) possible permutations, that is 15511210043330985984000000, which is far more than a 32-bit integer can hold (it takes 84 bits to number them all).
The conclusion is that you will have collisions, no matter how hard you try.
Now, here is a simple algorithm that accounts for position:
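#include <stdint.h>

uint32_t checksum(const int array[], int size) {
    uint32_t c = 0;  /* unsigned, so wrap-around and the rotation are well defined */
    for (int i = 0; i < size; i++) {
        c += (uint32_t)array[i];
        c = (c << 3) | (c >> (32 - 3)); // rotate left by 3 bits
        c ^= 0xFFFFFFFFu; // invert just for fun
    }
    return c;
}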

I think what you want is in the answer of the following thread:
Fast permutation -> number -> permutation mapping algorithms
You just take the number your permutation is mapped to and take that as your Checksum. As there is exactly one Checksum per permutation there can't be a smaller Checksum that is collision free.

How about the checksum of weighted sum? Let's take an example for [0,1,2,3]. First pick a seed and limit, let's pick a seed as 7 and limit as 10000007.
a[4] = {0, 1, 2, 3}
limit = 10000007, seed = 7
result = 0
result = ((result + a[0]) * seed) % limit = ((0 + 0) * 7) % 10000007 = 0
result = ((result + a[1]) * seed) % limit = ((0 + 1) * 7) % 10000007 = 7
result = ((result + a[2]) * seed) % limit = ((7 + 2) * 7) % 10000007 = 63
result = ((result + a[3]) * seed) % limit = ((63 + 3) * 7) % 10000007 = 462
Your checksum is 462 for that [0, 1, 2, 3].
The reference is http://www.codeabbey.com/index/wiki/checksum
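The same recurrence as a short C sketch. The function name weighted_checksum is an assumption for this example, and the elements are assumed to be non-negative, as in the question.

```c
#include <assert.h>
#include <stddef.h>

/* Position-dependent checksum: result = ((result + a[i]) * seed) % limit,
 * applied to the elements from left to right. Elements are assumed to be
 * non-negative. */
unsigned long weighted_checksum(const int a[], size_t n,
                                unsigned long seed, unsigned long limit)
{
    assert(limit > 0);
    unsigned long long result = 0;
    for (size_t i = 0; i < n; i++)
        result = ((result + (unsigned long long)a[i]) * seed) % limit;
    return (unsigned long)result;
}
```

Because each step multiplies the running value by the seed, the same numbers in a different order generally give a different checksum: [3, 2, 1, 0] yields 7938 instead of 462 with the same seed and limit. Collisions are still possible, since the result is reduced modulo the limit.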

For an array of N unique integers from 1 to N, just adding up the elements will always give N*(N+1)/2. Therefore the only difference is in the ordering. If by "checksum" you imply that you tolerate some collisions, then one way is to sum the differences between consecutive numbers. So for example, the delta checksum for {1,2,3,4} is 1+1+1=3, but the delta checksum for {4,3,2,1} is -1+-1+-1=-3. Keep in mind that these differences telescope to (last element - first element), so any two arrays with the same first and last elements collide.
No requirements were given for collision rates or computational complexity, but if the above doesn't suit, then I recommend a position-dependent checksum.

From what I understand, your array contains a permutation of the numbers from 0 to N-1. One checksum that will be useful is the rank of the array in lexicographic ordering. What does that mean? Given 0, 1, 2
You have the possible permutations
1: 0, 1, 2
2: 0, 2, 1
3: 1, 0, 2
4: 1, 2, 0
5: 2, 0, 1
6: 2, 1, 0
The checksum is the number in the first column, computed when you create the array. There are solutions proposed in
Find the index of a given permutation in the list of permutations in lexicographic order
which can be helpful, although it seems the best algorithm there was of quadratic complexity. To speed it up, you should cache the values of the factorials beforehand.
The advantage? ZERO collision.
EDIT: Computation
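The value is like the evaluation of a polynomial where factorials are used as the weights instead of powers (the factorial number system). So the function is
f(X0, ..., Xn-1) = X0 * 0! + X1 * 1! + X2 * 2! + ... + Xn-1 * (n-1)!
Note that the digits Xi cannot be the raw array values: plugging the values in directly gives collisions (for example 0,1,2 and 1,0,2 would both map to 5). Each Xi must be a count between 0 and i, for instance the number of smaller elements to the right of position n-1-i (the Lehmer code), so that the leftmost position gets the largest factorial.
The idea is to use each value to narrow down a sub-range of permutations, and with enough values you pinpoint a unique permutation.
Now for the implementation (like the one for a polynomial):
precompute 0! ... (n-1)! at the beginning of the program
each time you set an array, use f(elements) to compute its checksum
compare in O(1) using this checksum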
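A sketch of the rank computation in C, using the straightforward quadratic counting of smaller elements to the right. The name permutation_rank is hypothetical, and the rank is 0-based, so it is one less than the numbers in the table above. The sketch is limited to n <= 20 because 21! no longer fits in 64 bits; the 25-element case from the question would need a wider integer type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 0-based lexicographic rank of perm[0..n-1], which must be a permutation
 * of 0..n-1. Limited to n <= 20 so that every value fits in 64 bits. */
uint64_t permutation_rank(const int perm[], size_t n)
{
    assert(n <= 20);
    uint64_t rank = 0;
    uint64_t fact = 1;   /* weight (n-1-i)! of the position being processed */

    /* Walk from right to left so the factorial can be built incrementally. */
    for (size_t k = 1; k <= n; k++) {
        size_t i = n - k;

        /* Digit for position i: how many smaller elements lie to its right. */
        size_t smaller = 0;
        for (size_t j = i + 1; j < n; j++)
            if (perm[j] < perm[i])
                smaller++;

        rank += smaller * fact;
        fact *= k;       /* now k!, the weight of the next position to the left */
    }
    return rank;
}
```

Two arrays of the same size are equal exactly when their ranks are equal, so the comparison itself is a single integer comparison.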

How to find a 2 unpaired elements in array?

You have an array with n=2k+2 elements where 2 elements have no pair. Example for an 8-element array: 1 2 3 47 3 1 2 0. "47" and "0" have no pair in the array. If I have an array where only 1 element has no pair, I can solve the problem with XOR. But I have 2 unpaired elements! What can I do? The solution should run in O(n) time and use O(1) additional memory.

Some hints...
It will take 2 passes. First, go through the list and XOR all elements together. See what you get. Proceed from there.
Edit: The key observation about the result of the first pass should be that it shows you the set of bits in which the 2 unpaired elements differ.

Use INT_MAX/8 bytes of memory. Walk the array. XOR the bit corresponding to each value with 1. If there are 0 or 2 instances the bit will end up 0. If there is only one instance, it will be set. O(1) mem, O(N) time.

Scan the Array and put each number and count in hash.
Rescan and find out the items with count=1.
This is O(n).

You can try this. It will take O(n) time.
int xor = arr[0];
int set_bit_no;
int i;
int x = 0; // first unpaired number
int y = 0; // second unpaired number
for (i = 1; i < n; i++)
    xor ^= arr[i];
set_bit_no = xor & ~(xor-1); // get the rightmost set bit in set_bit_no
for (i = 0; i < n; i++)
{
    if (arr[i] & set_bit_no) {
        // XOR of first set
        x = x ^ arr[i];
    }
    else
    {
        // XOR of second set
        y = y ^ arr[i];
    }
}
Explanation...
arr[] = {2, 4, 7, 9, 2, 4}
1) Get the XOR of all the elements.
xor = 2^4^7^9^2^4 = 14 (1110)
2) Get a number which has only one set bit of the xor.
Since we can easily get the rightmost set bit, let us use it.
set_bit_no = xor & ~(xor-1) = (1110) & ~(1101) = 0010
Now set_bit_no has only one bit set: the rightmost set bit of xor.
3) Now divide the elements in two sets and do xor of
elements in each set, and we get the non-repeating
elements 7 and 9.
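For completeness, the same idea wrapped into a self-contained function. This is a sketch: the name find_two_unpaired and the use of unsigned values are choices made for this example. The accumulator is called xor_all because xor is an alternative operator token in C++ and a macro in C's <iso646.h>.

```c
#include <assert.h>
#include <stddef.h>

/* Finds the two elements of arr[0..n-1] that have no pair, assuming every
 * other value occurs exactly twice. Which of the two ends up in *x and
 * which in *y depends on the bit that separates them. */
void find_two_unpaired(const unsigned arr[], size_t n, unsigned *x, unsigned *y)
{
    assert(n >= 2);

    /* Pass 1: paired values cancel, leaving x ^ y. */
    unsigned xor_all = 0;
    for (size_t i = 0; i < n; i++)
        xor_all ^= arr[i];

    /* Rightmost set bit: a position where x and y differ. */
    unsigned set_bit = xor_all & ~(xor_all - 1);

    /* Pass 2: split by that bit; each group holds exactly one unpaired value. */
    *x = 0;
    *y = 0;
    for (size_t i = 0; i < n; i++) {
        if (arr[i] & set_bit)
            *x ^= arr[i];
        else
            *y ^= arr[i];
    }
}
```

For {2, 4, 7, 9, 2, 4} the separating bit is 0010, so 7 lands in *x and 9 in *y. For the array from the question, 1 2 3 47 3 1 2 0, the separating bit is the lowest one and the results are 47 and 0.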