Find the first missing positive integer - arrays

Given an array of integers, find the first missing positive integer in linear time and constant space. In other words, find the lowest positive integer that does not exist in the array. The array can contain duplicates and negative numbers as well.
For example, the input [3, 4, -1, 1] should give 2. The input [1, 2, 0] should give 3.
I attempted this but could not get anywhere, then searched on Google and found an answer on GeeksforGeeks, but I could not understand it. Can anyone provide the logic for this using simple concepts? I have just started competitive programming.

One way to find the solution is to rearrange the array and then find the first misplaced number:
int find_missing(std::vector<int>& v)
{
    for (std::size_t i = 0; i != v.size(); ++i) {
        std::size_t e = i;
        while (0 < v[e]                          // Correct range
               && std::size_t(v[e]) <= v.size()  // Correct range
               && std::size_t(v[e]) != e + 1     // Correct place
               && v[e] != v[v[e] - 1]            // Duplicate
               ) {
            std::swap(v[e], v[v[e] - 1]);
        }
    }
    // Now the array looks like
    // {1, 2, 3, x, 5, 6, x}
    // Find the first misplaced number
    for (std::size_t i = 0; i != v.size(); ++i) {
        if (std::size_t(v[i]) != i + 1) {
            return i + 1;
        }
    }
    // All are correctly placed, so the first missing positive is one past the end:
    return v.size() + 1;
}

If a bitmap (an extension of a bitmask) is acceptable, then we could use one bit per positive integer and just scan the array. The bitmap is initialized with all bits set to 0. As we scan the array, we ignore negatives and turn the nth bit on when we encounter n: when we find, for example, 13, we set the 13th bit to 1 (likewise the number 1 would set the first bit to 1). Then we scan the bitmap and report the position of the first zero bit. Done.
However, this might not count as constant space at all, since when the max positive int is MAXINT, we need the bitmap to be MAXINT bits large. Too bad. In theory, though, this is correct. Also, O(2N) = O(N).
So we have to store some information in the array or this is impossible to solve in O(N) in a single go.
Another solution consists of mapping array indices to integers and storing information using the sign. If the array size is L, the missing int will be less than or equal to L+1 (it is L+1 when the array is full, like [1,2,3,4], unless this case counts as no element missing). Thanks Jarod for the hint on this.
Considering that O(3N) is still O(N), how about:
Step 1: scan the array and swap negatives and zeroes towards the beginning. Turn everything non-positive that was swapped this way into 1. The authentic positives will start at some index j.
Step 2: the whole array is now positive, but the true data lies from j to the end of the array. Scan the subarray with authentic data, and when you find, say, the number H, turn the Hth element of the whole array negative; if H is greater than the array size, skip it. When you find 2, for example, turn arr[1] (the second element) negative.
Step 3: scan the array again, checking for the first positive number. Based on its index you know what the first missing positive integer is.
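A rough C++ sketch of those three steps (my own illustration of the idea, not Jarod's code; the function name is made up):
#include <algorithm>
#include <cstdlib>
#include <iostream>
#include <vector>

int first_missing_positive(std::vector<int> v)   // takes a copy, so the caller's array survives
{
    const int n = int(v.size());
    // Step 1: move non-positives to the front and neutralise them with 1.
    auto first_pos = std::partition(v.begin(), v.end(), [](int x) { return x <= 0; });
    std::fill(v.begin(), first_pos, 1);
    const int j = int(first_pos - v.begin());    // authentic positives start here
    // Step 2: each authentic value H in [1, n] marks index H-1 by making it negative.
    for (int i = j; i < n; ++i) {
        int h = std::abs(v[i]);                  // the slot may already have been flipped
        if (h <= n && v[h - 1] > 0)              // flip at most once, so duplicates are harmless
            v[h - 1] = -v[h - 1];
    }
    // Step 3: the first index that stayed positive gives the answer.
    for (int i = 0; i < n; ++i)
        if (v[i] > 0)
            return i + 1;
    return n + 1;                                // every value 1..n is present
}

int main()
{
    std::cout << first_missing_positive({3, 4, -1, 1}) << "\n";   // 2
    std::cout << first_missing_positive({1, 2, 0}) << "\n";       // 3
}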


Number of subarrays with same 'degree' as the array

So this problem was asked in a quiz and the problem goes like:
You are given an array 'a' with elements ranging from 1 to 10^6, and the size of the array can be at most 10^5. We are asked to find the number of subarrays with the same 'degree' as the original array. The degree of an array is defined as the frequency of the most frequently occurring element in the array. Multiple elements could have the same frequency.
I was stuck in this problem for like an hour but couldn't think of any solution. How do I solve it?
Sample Input:
first input: 1,2,2,3,1
first output: 2
second input: 1,1,2,1,2,2
second output: 4
The element that occurs most frequently is called the mode; this problem defines degree as the frequency count. Your tasks are:
Identify all of the mode values.
For each mode value, find the index range of that value. For instance, in the array
[1, 1, 2, 1, 3, 3, 2, 4, 2, 4, 5, 5, 5]
You have three modes (1 2 5) with a degree of 3. The index ranges are
1 - 0:3
2 - 2:8
5 - 10:12
You need to count all index ranges (subarrays) that include at least one of those three ranges.
I've tailored this example to have both basic cases: modes that overlap, and those that do not. Note that containment is a moot point: if you have an array where one mode's range contains another:
[0, 1, 1, 1, 0, 0]
You can ignore the outer one altogether: any subarray that contains 0 will also contain 1.
ANALYSIS
A subarray is defined by two numbers, the starting and ending indices. Since we must have 0 <= start <= end < len(array), this is the "handshake" problem between array bounds. We have N(N+1)/2 possible subarrays.
For 10**5 elements, you could just brute-force the problem from here: for each pair of indices, check to see whether that range contains any of the mode ranges. However, you can easily cut that down with interval recognition.
ALGORITHM
Step through the mode ranges, left to right. First, count all subranges that include the first mode range [0:3]. There is only 1 possible start [0] and 10 possible ends [3:12]; that's 10 subarrays.
Now move to the second mode range, [2:8]. You need to count subarrays that include this, but exclude those you've already counted. Since there's an overlap, you need a starting point later than 0, or an ending point before 3. This second clause is not possible with the given range.
Thus, you consider start [1:2], end [8:12]. That's 2 * 5 more subarrays.
For the third range [10:12] (no overlap), you need a starting point that does not include any other subrange. This means that any starting point in [3:10] will do. Since there's only one possible endpoint, you have 8*1, or 8 more subarrays.
Can you turn this into something formal?
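Here is one way to make it formal, as a C++ sketch (my own code, so treat the names as illustrative): collect the first/last index of every mode value, drop ranges that contain another mode range, then sweep left to right, charging each subarray to the first mode range it covers.
#include <algorithm>
#include <iostream>
#include <unordered_map>
#include <vector>

long long countDegreeSubarrays(const std::vector<int>& a)
{
    const int n = int(a.size());
    std::unordered_map<int, int> first, last, freq;
    for (int i = 0; i < n; ++i) {
        if (!first.count(a[i])) first[a[i]] = i;
        last[a[i]] = i;
        ++freq[a[i]];
    }
    int degree = 0;
    for (const auto& kv : freq) degree = std::max(degree, kv.second);

    // [first, last] index range of every mode value, sorted by start index
    std::vector<std::pair<int, int>> ranges;
    for (const auto& kv : freq)
        if (kv.second == degree)
            ranges.push_back({first[kv.first], last[kv.first]});
    std::sort(ranges.begin(), ranges.end());

    // Drop any range that contains another one (the inner range suffices).
    std::vector<std::pair<int, int>> kept;
    for (const auto& r : ranges) {
        while (!kept.empty() && kept.back().second >= r.second)
            kept.pop_back();                     // kept.back() contains r
        kept.push_back(r);
    }

    // Charge each subarray to the leftmost mode range it covers:
    // start in (prevLeft, left], end in [right, n-1].
    long long total = 0;
    long long prevLeft = -1;
    for (const auto& r : kept) {
        total += (r.first - prevLeft) * (long long)(n - r.second);
        prevLeft = r.first;
    }
    return total;
}

int main()
{
    std::vector<int> a = {1, 1, 2, 1, 3, 3, 2, 4, 2, 4, 5, 5, 5};
    std::cout << countDegreeSubarrays(a) << "\n";   // 10 + 10 + 8 = 28
}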
Taking reference from the LeetCode solution at https://leetcode.com/problems/degree-of-an-array/solution/:
class Solution {
    public int findShortestSubArray(int[] nums) {
        Map<Integer, Integer> left = new HashMap(),
            right = new HashMap(), count = new HashMap();
        for (int i = 0; i < nums.length; i++) {
            int x = nums[i];
            if (left.get(x) == null) left.put(x, i);
            right.put(x, i);
            count.put(x, count.getOrDefault(x, 0) + 1);
        }
        int ans = nums.length;
        int degree = Collections.max(count.values());
        for (int x: count.keySet()) {
            if (count.get(x) == degree) {
                ans = Math.min(ans, right.get(x) - left.get(x) + 1);
            }
        }
        return ans;
    }
}

Find a unique integer in an array

I am looking for an algorithm to solve the following problem: we are given an integer array of size n which contains k (0 < k < n) elements that occur exactly once; every other integer occurs an even number of times in the array. The output should be any one of the k unique numbers. k is a fixed number and not part of the input.
An example would be the input [1, 2, 2, 4, 4, 2, 2, 3] with both 1 and 3 being a correct output.
Most importantly, the algorithm should run in O(n) time and require only O(1) additional space.
edit: There has been some confusion regarding whether there is only one unique integer or multiple. I apologize for this. The correct problem is that there is an arbitrary but fixed amount. I have updated the original question above.
"Dante." gave a good answer for the case that there are at most two such numbers. This link also provides a solution for three. "David Eisenstat" commented that it is also possible to do for any fixed k. I would be grateful for a solution.
There is a standard algorithm to solve such problems using XOR operator:
Time Complexity = O(n)
Space Complexity = O(1)
Suppose your input array contains only one element that occurs an odd number of times and the rest occur an even number of times. We take advantage of the following fact:
an XOR expression evaluates to 0 whenever it contains an even number of 1 bits; the 0 bits do not matter, and the order does not matter.
That is,
0^1^....... = 0 as long as the number of 1s is even,
and the 0s and 1s can occur in any order.
For every bit position, the numbers that occur an even number of times contribute an even number of 1s, so they cancel out, and only the bits of the number that occurs once are left over when we take the XOR of all elements of the array, because
0 (from numbers occurring an even number of times) ^ 1 (from the number occurring once) = 1
0 (from numbers occurring an even number of times) ^ 0 (from the number occurring once) = 0
As you can see, only the bits of the number occurring once are preserved.
This means that when you take the XOR of all the elements of such an array, the result is the number which occurs only once.
So the algorithm for array of length n is:
result = array[0]^array[1]^.....array[n-1]
Different Scenario
As the OP mentioned, the input can also be an array in which two numbers occur only once and the rest occur an even number of times.
This is solved using the same logic as above, with a small difference.
Idea of the algorithm:
If you take the XOR of all the elements, the bits of the elements occurring an even number of times cancel to 0, which means:
the result has a 1 exactly at those bit positions where the two numbers occurring only once differ.
We will use the above idea.
Now we focus on any one bit of the resulting XOR that is 1 and clear the rest. The result is a mask that lets us differentiate between the two required numbers:
because the bit is 1, the two numbers differ at this position, so one of them has a 0 there and the other a 1, which means one of them gives 0 when ANDed with the mask and the other does not.
Since it is easy to isolate the rightmost set bit, we extract it from the XOR result as
B = result & ~(result - 1)
Now traverse the array once; if array[i] & B is nonzero, fold the element into number_1 as
number_1 = number_1 ^ array[i]
otherwise
number_2 = number_2 ^ array[i]
Because the remaining numbers occur an even number of times, their bits automatically cancel out.
So the algorithm is:
1. Take the XOR of all elements; call it xor.
2. Isolate the rightmost set bit of xor and store it in B.
3. Do the following:
number_1 = 0, number_2 = 0;
for (i = 0 to n-1)
{
    if (array[i] & B)
        number_1 = number_1 ^ array[i];
    else
        number_2 = number_2 ^ array[i];
}
number_1 and number_2 are the required numbers.
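Here is a compact, self-contained C++ version of the two-numbers case (my own code following the steps above; the names are made up):
#include <iostream>
#include <utility>
#include <vector>

std::pair<int, int> twoOddOccurrences(const std::vector<int>& a)
{
    int all = 0;
    for (int x : a) all ^= x;            // = number_1 ^ number_2
    int B = all & ~(all - 1);            // isolate the rightmost set bit: the two numbers differ here
    int number_1 = 0, number_2 = 0;
    for (int x : a) {
        if (x & B) number_1 ^= x;        // group whose bit is set
        else       number_2 ^= x;        // group whose bit is clear
    }
    return {number_1, number_2};
}

int main()
{
    std::vector<int> a = {1, 2, 2, 4, 4, 2, 2, 3};    // 1 and 3 occur an odd number of times
    auto p = twoOddOccurrences(a);
    std::cout << p.first << " " << p.second << "\n";  // 3 and 1, in some order
}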
Here's a Las Vegas algorithm that, given k, the exact number of elements that occur an odd number of times, reports all of them in expected time O(n k) (read: linear-time when k is O(1)) and space O(1) words, assuming that "give me a uniform random word" and "give me the number of 1 bits set in this word (popcount)" are constant-time operations. I'm pretty sure that I'm not the first person to come up with this algorithm (and I'm not even sure that I'm remembering all of the refinements), but I've reached the limits of my patience trying to find it.
The central technique is called random restrictions. Essentially what we do is to filter the input randomly by value, in the hope that we retain exactly one odd-count element. We apply the classic XOR algorithm to the filtered array and check the result; if it succeeded, then we pretend to add it to the array, to make it even-count. Repeat until all k elements are found.
The filtration process goes like this. Treat each input word x as a binary vector of length w (doesn't matter what w is). Compute a random binary matrix A of size w by ceil(1 + lg k) and a random binary vector b of length ceil(1 + lg k). We filter the input by retaining those x such that Ax = b, where the left-hand side is a matrix multiplication mod 2. In implementation, A is represented as ceil(1 + lg k) vectors a1, a2, .... We compute the bits of Ax as popcount(a1 ^ x), popcount(a2 ^ x), .... (This is convenient because we can short-circuit the comparison with b, which shaves a factor lg k from the running time.)
The analysis is to show that, in a given pass, we manage with constant probability to single out one of the odd-count elements. First note that, for some fixed x, the probability that Ax = b is 2^(-ceil(1 + lg k)) = Θ(1/k). Given that Ax = b, for all y ≠ x, the probability that Ay = b is less than 2^(-ceil(1 + lg k)). Thus, the expected number of elements that accompany x is less than 1/2, so with probability more than 1/2, x is unique in the filtered input. Sum over all k odd-count elements (these events are disjoint), and the probability is Θ(1).
Here's a deterministic linear-time algorithm for k = 3. Let the odd-count elements be a, b, c. Accumulate the XOR of the array, which is s = a ^ b ^ c. For each bit i, observe that, if a[i] == b[i] == c[i], then s[i] == a[i] == b[i] == c[i]. Make another pass through the array, accumulate the XOR of the lowest bit set in s ^ x. The even-count elements contribute nothing again. Two of the odd-count elements contribute the same bit and cancel each other out. Thus, the lowest bit set in the XOR is where exactly one of the odd-count elements differs from s. We can use the restriction method above to find it, then the k = 2 method to find the others.
The question title says "the unique integer", but the question body says there can be more than one unique element.
If there is in fact only one non-duplicate: XOR all the elements together. The duplicates all cancel, because they come in pairs (or higher multiples of 2), so the result is the unique integer.
See Dante's answer for an extension of this idea that can handle two unique elements. It can't be generalized to more than that.
Perhaps for k unique elements, we could use k accumulators to track sum(a[i]**k), i.e. a[i], a[i]**2, etc. This probably only works for "Faster algorithm to find unique element between two arrays?", not this case where the duplicates are all in one array. IDK if an XOR of squares, cubes, etc. would be any use for resolving things.
Track the counts for each element and only return the elements with a count of 1. This can be done with a hash map. The example below tracks the result using a hash set while it's still building the counts map. It's still O(n) but less efficient; I think it's slightly more instructive, though.
Javascript with jsfiddle http://jsfiddle.net/nmckchsa/
function findUnique(arr) {
    var uniq = new Map();
    var result = new Set();
    // iterate through array
    for (var i = 0; i < arr.length; i++) {
        var v = arr[i];
        // add value to map that contains counts
        if (uniq.has(v)) {
            uniq.set(v, uniq.get(v) + 1);
            // count is greater than 1 remove from set
            result.delete(v);
        } else {
            uniq.set(v, 1);
            // add a possibly uniq value to the set
            result.add(v);
        }
    }
    // set to array O(n)
    var a = [], x = 0;
    result.forEach(function(v) { a[x++] = v; });
    return a;
}
alert(findUnique([1,2,3,0,1,2,3,1,2,3,5,4,4]));
EDIT: Since the non-unique numbers appear an even number of times, #PeterCordes suggested a more elegant set toggle.
Here's how that would look.
function findUnique(arr) {
    var result = new Set();
    // iterate through array
    for (var i = 0; i < arr.length; i++) {
        var v = arr[i];
        if (result.has(v)) { // even occurrences
            result.delete(v);
        } else { // odd occurrences
            result.add(v);
        }
    }
    // set to array O(n)
    var a = [], x = 0;
    result.forEach(function(v) { a[x++] = v; });
    return a;
}
JSFiddle http://jsfiddle.net/hepsyqyw/
Assuming you have an input array: [2,3,4,2,4]
Output: 3
In Ruby, you can do something as simple as this:
[2,3,4,2,4].inject(0) {|xor, v| xor ^ v}
Create an array counts that has INT_MAX slots, with each element initialized to zero.
For each element in the input list, increment counts[element] by one. (edit: actually, you will need to do counts[element] = (counts[element] + 1) % 2, or else you might overflow the value for really ridiculously large values of N. It's acceptable to do this kind of modular counting because all duplicate items appear an even number of times.)
Iterate through counts until you find a slot that contains "1". Return the index of that slot.
Step 2 is O(N) time. Steps 1 and 3 take up a lot of memory and a lot of time, but neither one is proportional to the size of the input list, so they're still technically O(1).
(note: this assumes that integers have a minimum and maximum value, as is the case for many programming languages.)

Minimum value of numbers in char array

I recently ran into this problem in an interview, and I was curious what the best way to solve it would be. The question: given a char array containing the ASCII characters '0' to '9', make one swap such that the resulting array forms the lowest possible value. Neither the input array nor the resultant array will have leading 0s.
So here is an example: char a[] = {'1','0', '9','7','6'}
The solution: char b[] = { '1','0', '6', '7', '9'}
Another example: char a[] = {'9','0', '7','6','1'}
The solution: char b[] = {'1','0', '7','6','9'}
I am looking for the best solution in terms of performance. Since only one swap is allowed, I assumed that sorting is not allowed; I did not clarify that, though. So we are looking for the lowest possible value that can be obtained using just one swap. It would help if you could provide the complexity of the solution as well.
Algorithm:
Note that since there can't be a leading 0, 0's should be catered for separately.
Go from the right, keeping track of the minimum non-zero number. Also record the first 0.
Whenever you find a number larger than the recorded non-zero number, record those 2 as the best possible swap so far.
Once you've found a zero, record any non-zero non-leading character from here as the best possible swap for the zero.
Note that you're not doing any comparisons between the quality of either of the swaps above, we simply replace the current best with the new one, as it's always better to swap with a more-left position.
When done, compare the target positions of the best swap for the zero and the best swap for the non-zero and pick the left-most position, or the zero if they're the same.
If no possible swap was found, the array is already the minimum permutation of the given numbers, thus don't do anything. Or, if we have to, swap the two right-most characters.
Running time:
O(n).
Example:
Input: 10976
Processing    6     7     9     0     1
Minimum       6     6     6     6     1
Best swap     -     6+7   6+9   6+9   6+9
Zero?         No    No    No    Yes   Yes
Best 0 swap   -     -     -     -     -
So the best swap is 6/9, giving us 10679
Input: 3601
Processing    1     0     6     3
Minimum       1     1     1     3
Best swap     -     -     1+6   1+3
Zero?         No    Yes   Yes   Yes
Best 0 swap   -     -     0+6   0+6
Here possible swaps are 1/3 and 0/6.
For the 1/3 swap, the target position is 0 (0-indexed).
For the 0/6 swap, the target position is 1.
So we pick the 1/3 swap giving us 1603.
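For reference, here is a straightforward C++ sketch of the same goal (my own code, not the answerer's single right-to-left pass): take the leftmost digit that can be improved and swap it with the smallest allowed digit to its right, preferring the rightmost occurrence so that the displaced larger digit lands as far right as possible. The inner scan makes this quadratic in the worst case; the bookkeeping described above gets it down to O(n).
#include <iostream>
#include <string>
#include <utility>

std::string minimiseWithOneSwap(std::string s)
{
    const int n = int(s.size());
    for (int i = 0; i < n; ++i) {
        char lowest = (i == 0) ? '1' : '0';        // no leading zero allowed
        int best = -1;
        for (int j = i + 1; j < n; ++j)            // <= keeps the rightmost occurrence on ties
            if (s[j] >= lowest && s[j] < s[i] && (best == -1 || s[j] <= s[best]))
                best = j;
        if (best != -1) {                          // first improvable position: swap once and stop
            std::swap(s[i], s[best]);
            break;
        }
    }
    return s;
}

int main()
{
    std::cout << minimiseWithOneSwap("10976") << "\n";   // 10679
    std::cout << minimiseWithOneSwap("90761") << "\n";   // 10769
}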
Depends on what you mean by best.
Take out the lowest number except '0' as the first item in b, then sort the rest into ascending order.
Start at the leftmost digit. While the current digit is valued the lowest (non-zero for the first digits, equal to the number of zeroes in the set) of the remaining set of numbers, advance to the next number. Then swap the current value with the lowest remaining value.
You can keep a sorted copy of the array to help you in your decision making process. You'd need it for knowing how many there are of your current lowest number(s). You can make it even more efficient if you store their indices as well.
It's not necessary to have any additional storage, but it would likely make things faster. In a single pass of the first array, you can get the number of 0s as well as the index to the next lowest number, but in an array like 1, 0, 6, 9, 7, you would then have to go through the array 4 different times.
EDIT - a slightly more fleshed-out algorithm
Copy the array into a separate one, called c, and sort c. (You'll use this to make your decision faster, though you could simply repeatedly analyze the array.)
if c has zeros, find the value of its first non-zero value, called x
if a[0] != x, swap a[0] with x.
set index to 1
while a[index] == c[index], ++index
swap a[index] with c[index]
If you do it this way, it costs you another array, but is done with one sort and one pass through the array.
If you don't do it that way, you'll have to pass through the remainder of the array each time to find the minimums. I'm not great at complexity, but I believe that's n log n, since you'll be starting from higher indices each iteration.
Without using the array, you'd have to do something like
Find the lowest non-zero value
if a[0] isn't this value, swap with this value
index = 1
find lowest value starting at a[index]
if they're not equal, swap the values, done. Otherwise, increment index
That's still n log n. I think sorting can make it more efficient.
I think that you clearly have to go through the array at least once. You need to do this to find the smallest value.
You also need to find the value which will satisfy the conditions. The poster says that the input array will not have '0' as the leading elements.
We loop over the array one position at a time. We are looking for another position which has the minimum value (non zero for the first spot, anything for the other spots) and is smaller than any other seen.
for (pos = 0; pos < array_size - 1; pos++) {
    low_index = pos;
    min_value = (pos == 0) ? '1' : '0';   // a leading '0' is not allowed
    for (i = pos + 1; i < array_size; i++) {
        // smallest valid digit wins; on ties prefer the rightmost occurrence,
        // so the displaced larger digit lands as far right as possible
        if ((min_value <= array[i]) && (array[i] < array[pos]) && (array[i] <= array[low_index])) {
            low_index = i;
        }
    }
    if (low_index != pos) {
        low_value = array[pos];
        array[pos] = array[low_index];
        array[low_index] = low_value;
        break;
    }
}

Puzzle : finding out repeated element in an Array

The size of an array is n. All elements in the array are distinct and in the range [0, n-1], except two elements (one value occurs twice). Find the repeated element without using an extra temporary array, with constant time complexity.
I tried this O(n) approach:
a[] = {1, 0, 0, 2, 3};
b[] = {-1, -1, -1, -1, -1};
i = 0;
int required;
while (i < n)
{
    b[a[i]]++;
    if (b[a[i]] == 1)
        required = a[i];
    i++;
}
print required;
If there is no constraint on the range of the numbers, i.e. out-of-range values are also allowed, is it possible to get an O(n) solution without a temporary array?
XOR all the elements together, then XOR the result with XOR([0..n-1]).
This gives you missing XOR repeat; since missing!=repeat, at least one bit is set in missing XOR repeat.
Pick one of those set bits. Iterate over all the elements again, and only XOR elements with that bit set. Then iterate from 1 to n-1 and XOR those numbers that have that bit set.
Now, the value is either the repeated value or the missing value. Scan the elements for that value. If you find it, it's the repeated element. Otherwise, it's the missing value so XOR it with missing XOR repeat.
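A C++ sketch of that idea (my own code; the function name is made up):
#include <iostream>
#include <vector>

int repeatedElement(const std::vector<int>& a)
{
    const int n = int(a.size());
    int mr = 0;                                   // ends up as missing ^ repeat
    for (int x : a) mr ^= x;
    for (int v = 0; v < n; ++v) mr ^= v;

    int bit = mr & ~(mr - 1);                     // a bit where missing and repeat differ
    int candidate = 0;
    for (int x : a) if (x & bit) candidate ^= x;
    for (int v = 0; v < n; ++v) if (v & bit) candidate ^= v;

    for (int x : a)                               // candidate is either the repeat or the missing value
        if (x == candidate) return candidate;     // present in the array, so it is the repeat
    return candidate ^ mr;                        // absent, so it was the missing value
}

int main()
{
    std::vector<int> a = {1, 0, 0, 2, 3};         // 0 occurs twice, 4 is missing
    std::cout << repeatedElement(a) << "\n";      // 0
}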
Look at what the first and last numbers are.
Calculate the sum SUM(1) of the array elements without the duplicate (e.g. the sum of 1...5 = 1+2+3+4+5 = 15; call it SUM(1)). As AaronMcSmooth pointed out, the formula is Sum(1, n) = (n+1)n/2.
Calculate the sum SUM(2) of the elements in the array that is given to you.
Subtract: SUM(2) - SUM(1). Whoa! The result is the duplicate number (e.g. if the given array is 1, 2, 3, 4, 5, 3, then SUM(2) is 18, and 18 - 15 = 3, so 3 is the duplicate).
Good luck coding!
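A tiny C++ sketch of this, under the same assumption as the example (the array holds 1..n plus exactly one extra duplicate):
#include <iostream>
#include <vector>

int duplicateBySum(const std::vector<int>& a)
{
    const long long n = (long long)a.size() - 1;   // values are 1..n plus one duplicate
    long long sum1 = n * (n + 1) / 2;              // SUM(1)
    long long sum2 = 0;
    for (int x : a) sum2 += x;                     // SUM(2)
    return int(sum2 - sum1);                       // the duplicated value
}

int main()
{
    std::cout << duplicateBySum({1, 2, 3, 4, 5, 3}) << "\n";   // 3
}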
Pick two distinct random indexes. If the array values at those indexes are the same, return true.
This operates in constant time. As a bonus, you get the right answer with probability 2/n * 1/(n-1).
O(n) without the temp array.
a[] = {1, 0, 0, 2, 3};
i = 0;
int required;
while (i < n)
{
    a[a[i] % n] += n;
    if (a[a[i] % n] >= 2 * n)
        required = a[i] % n;
    i++;
}
print required;
(Assuming of course that n < MAX_INT - 2n)
This example could be useful for int, char, and string.
char[] ch = { 'A', 'B', 'C', 'D', 'F', 'A', 'B' };
Dictionary<char, int> result = new Dictionary<char, int>();
foreach (char c in ch)
{
    if (result.Keys.Contains(c))
    {
        result[c] = result[c] + 1;
    }
    else
    {
        result.Add(c, 1);
    }
}
foreach (KeyValuePair<char, int> pair in result)
{
    if (pair.Value > 1)
    {
        Console.WriteLine(pair.Key);
    }
}
Console.Read();
Build a lookup table. Lookup. Done.
Non-temporary array solution:
Build lookup into gate array hardware, invoke.
The best I can do is O(n log n) in time and O(1) in space:
The basic idea is to perform a binary search of the values 0 through n-1, passing over the whole array of n elements at each step.
Initially, let i=0, j=n-1 and k=(i+j)/2.
On each run through the array, sum the elements whose values are in the range i to k, and count the number of elements in this range.
If the count of elements in this range is greater than k-i+1, the range contains the duplicate; if it is less than k-i+1, the range contains the omitted value but not the duplicate. If the count is exactly k-i+1, the range contains either both or neither of them: it contains both exactly when the sum differs from (k-i)*(k-i+1)/2 + i*(k-i+1).
Whenever the range i through k contains the duplicate, replace j by k; otherwise replace i by k+1. In both cases, replace k by the new value of (i+j)/2.
If i!=j, goto 2.
The algorithm terminates with i==j and both equal to the duplicate element.
(Note: I edited this to simplify it. The old version could have found either the duplicate or the omitted element, and had to use Vlad's difference trick to find the duplicate if the initial search turned up the omitted value instead.)
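A C++ sketch of this value-range binary search (my own code, following the count-then-sum test above, so that a duplicated 0, which adds nothing to the sum, is still routed to the correct half):
#include <iostream>
#include <vector>

int findDuplicate(const std::vector<int>& a)
{
    const int n = int(a.size());
    int lo = 0, hi = n - 1;                        // binary search over values, not indices
    while (lo < hi) {
        int mid = lo + (hi - lo) / 2;
        long long cnt = 0, sum = 0;
        for (int x : a)
            if (lo <= x && x <= mid) { ++cnt; sum += x; }
        long long width = mid - lo + 1;
        long long expected = (long long)(lo + mid) * width / 2;   // sum of the distinct values lo..mid
        if (cnt > width || (cnt == width && sum != expected))
            hi = mid;                              // the duplicate's value lies in [lo, mid]
        else
            lo = mid + 1;                          // the duplicate's value lies in [mid+1, hi]
    }
    return lo;
}

int main()
{
    std::vector<int> a = {1, 0, 0, 2, 3};          // 0 repeated, 4 missing
    std::cout << findDuplicate(a) << "\n";         // 0
}
Each pass over the array is O(n) and there are O(log n) passes, so this is O(n log n) time and O(1) space, as stated above.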
Lazy solution: Put the elements to java.util.Set one by one by add(E) until getting add(E)==false.
Sorry no constant-time. HashMap:O(N), TreeSet:O(lgN * N).
Based on #sje's answer. Worst case is 2 passes through the array, no additional storage, non destructive.
O(n) without the temp array.
a[] = {1, 0, 0, 2, 3};
i = 0;
int required;
while (a[a[i] % n] < n)
    a[a[i++] % n] += n;
required = a[i] % n;
while (i-- > 0)
    a[a[i] % n] -= n;
print required;
(Assuming of course that n < MAX_INT/2)

How can I find a number which occurs an odd number of times in a SORTED array in O(n) time?

I have a question and I tried to think it over again and again... but got nothing, so I'm posting the question here. Maybe I could get some viewpoints from others to try and make it work...
The question is: we are given a SORTED array, which consists of a collection of values occurring an EVEN number of times, except one, which occurs ODD number of times. We need to find the solution in log n time.
It is easy to find the solution in O(n) time, but it looks pretty tricky to perform in log n time.
Theorem: Every deterministic algorithm for this problem probes Ω(log² n) memory locations in the worst case.
Proof (completely rewritten in a more formal style):
Let k > 0 be an odd integer and let n = k². We describe an adversary that forces (log₂ (k + 1))² = Ω(log² n) probes.
We call the maximal subsequences of identical elements groups. The adversary's possible inputs consist of k length-k segments x1 x2 … xk. For each segment xj, there exists an integer bj ∈ [0, k] such that xj consists of bj copies of j - 1 followed by k - bj copies of j. Each group overlaps at most two segments, and each segment overlaps at most two groups.
Group boundaries
| | | | |
0 0 1 1 1 2 2 3 3
| | | |
Segment boundaries
Wherever there is an increase of two, we assume a double boundary by convention.
Group boundaries
| || | |
0 0 0 2 2 2 2 3 3
Claim: The location of the jth group boundary (1 ≤ j ≤ k) is uniquely determined by the segment xj.
Proof: It's just after the ((j - 1) k + bj)th memory location, and xj uniquely determines bj. //
We say that the algorithm has observed the jth group boundary in case the results of its probes of xj uniquely determine xj. By convention, the beginning and the end of the input are always observed. It is possible for the algorithm to uniquely determine the location of a group boundary without observing it.
Group boundaries
| X | | |
0 0 ? 1 2 2 3 3 3
| | | |
Segment boundaries
Given only 0 0 ?, the algorithm cannot tell for sure whether ? is a 0 or a 1. In context, however, ? must be a 1, as otherwise there would be three odd groups, and the group boundary at X can be inferred. These inferences could be problematic for the adversary, but it turns out that they can be made only after the group boundary in question is "irrelevant".
Claim: At any given point during the algorithm's execution, consider the set of group boundaries that it has observed. Exactly one consecutive pair is at odd distance, and the odd group lies between them.
Proof: Every other consecutive pair bounds only even groups. //
Define the odd-length subsequence bounded by the special consecutive pair to be the relevant subsequence.
Claim: No group boundary in the interior of the relevant subsequence is uniquely determined. If there is at least one such boundary, then the identity of the odd group is not uniquely determined.
Proof: Without loss of generality, assume that each memory location not in the relevant subsequence has been probed and that each segment contained in the relevant subsequence has exactly one location that has not been probed. Suppose that the jth group boundary (call it B) lies in the interior of the relevant subsequence. By hypothesis, the probes to xj determine B's location up to two consecutive possibilities. We call the one at odd distance from the left observed boundary odd-left and the other odd-right. For both possibilities, we work left to right and fix the location of every remaining interior group boundary so that the group to its left is even. (We can do this because they each have two consecutive possibilities as well.) If B is at odd-left, then the group to its left is the unique odd group. If B is at odd-right, then the last group in the relevant subsequence is the unique odd group. Both are valid inputs, so the algorithm has uniquely determined neither the location of B nor the odd group. //
Example:
Observed group boundaries; relevant subsequence marked by […]
[ ] |
0 0 Y 1 1 Z 2 3 3
| | | |
Segment boundaries
Possibility #1: Y=0, Z=2
Possibility #2: Y=1, Z=2
Possibility #3: Y=1, Z=1
As a consequence of this claim, the algorithm, regardless of how it works, must narrow the relevant subsequence to one group. By definition, it therefore must observe some group boundaries. The adversary now has the simple task of keeping open as many possibilities as it can.
At any given point during the algorithm's execution, the adversary is internally committed to one possibility for each memory location outside of the relevant subsequence. At the beginning, the relevant subsequence is the entire input, so there are no initial commitments. Whenever the algorithm probes an uncommitted location of xj, the adversary must commit to one of two values: j - 1, or j. If it can avoid letting the jth boundary be observed, it chooses a value that leaves at least half of the remaining possibilities (with respect to observation). Otherwise, it chooses so as to keep at least half of the groups in the relevant interval and commits values for the others.
In this way, the adversary forces the algorithm to observe at least log₂ (k + 1) group boundaries, and in observing the jth group boundary, the algorithm is forced to make at least log₂ (k + 1) probes.
Extensions:
This result extends straightforwardly to randomized algorithms by randomizing the input, replacing "at best halved" (from the algorithm's point of view) with "at best halved in expectation", and applying standard concentration inequalities.
It also extends to the case where no group can be larger than s copies; in this case the lower bound is Ω(log n log s).
A sorted array suggests a binary search. We have to redefine equality and comparison. Equality simply means an odd number of elements. We can do the comparison by observing the index of the first or last element of the group. The first element will be at an even index (0-based) before the odd group, and at an odd index after the odd group. We can find the first and last elements of a group using binary search. The total cost is O((log N)²).
PROOF OF O((log N)²)
T(2) = 1 //to make the summation nice
T(N) = log(N) + T(N/2) //log(N) is finding the first/last elements
For some N=2^k,
T(2^k) = (log 2^k) + T(2^(k-1))
= (log 2^k) + (log 2^(k-1)) + T(2^(k-2))
= (log 2^k) + (log 2^(k-1)) + (log 2^(k-2)) + ... + (log 2^2) + 1
= k + (k-1) + (k-2) + ... + 1
= k(k+1)/2
= (k² + k)/2
= (log(N)² + log(N))/ 2
= O(log(N)²)
Look at the middle element of the array. With a couple of appropriate binary searches, you can find its first and last appearance in the array. E.g., if the middle element is 'a', you need to find i and j as shown below:
[* * * * a a a a * * *]
         ^     ^
         |     |
         i     j
Is j - i an even number? You are done! Otherwise (and this is the key here), the question to ask is: is i an even or an odd number? Do you see what this piece of knowledge implies? Then the rest is easy.
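In case it helps, here is the rest spelled out as a C++ sketch (my own code, not the answerer's): binary-search the first and last occurrence of the middle element; if its count is odd we are done, otherwise the parity of its first index tells us which side holds the odd-count value.
#include <algorithm>
#include <iostream>
#include <vector>

int oddOccurringElement(const std::vector<int>& a)
{
    int lo = 0, hi = int(a.size()) - 1;            // [lo, hi] always holds whole groups, incl. the odd one
    while (lo < hi) {
        int v = a[(lo + hi) / 2];
        // i and j for the group of v, each found by an O(log n) binary search
        auto range = std::equal_range(a.begin() + lo, a.begin() + hi + 1, v);
        int i = int(range.first - a.begin());
        int j = int(range.second - a.begin()) - 1;
        if ((j - i) % 2 == 0) return v;            // odd count: this is the answer
        if (i % 2 == 0) lo = j + 1;                // even first index: the odd group is to the right
        else            hi = i - 1;                // odd first index: the odd group is to the left
    }
    return a[lo];
}

int main()
{
    std::vector<int> a = {1, 1, 2, 2, 2, 3, 3, 4, 4};
    std::cout << oddOccurringElement(a) << "\n";   // 2
}
Each round discards at least half of the remaining range and costs O(log n), so the total is O(log² n).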
This answer is in support of the answer posted by "throwawayacct". He deserves the bounty. I spent some time on this question and I'm totally convinced that his proof is correct that you need Ω(log(n)^2) queries to find the number that occurs an odd number of times. I'm convinced because I ended up recreating the exact same argument after only skimming his solution.
In the solution, an adversary creates an input to make life hard for the algorithm, but also simple for a human analyzer. The input consists of k pages that each have k entries. The total number of entries is n = k^2, and it is important that O(log(k)) = O(log(n)) and Ω(log(k)) = Ω(log(n)). To make the input, the adversary makes a string of length k of the form 00...011...1, with the transition in an arbitrary position. Then each symbol in the string is expanded into a page of length k of the form aa...abb...b, where on the ith page, a=i and b=i+1. The transition on each page is also in an arbitrary position, except that the parity agrees with the symbol that the page was expanded from.
It is important to understand the "adversary method" of analyzing an algorithm's worst case. The adversary answers queries about the algorithm's input, without committing to future answers. The answers have to be consistent, and the game is over when the adversary has been pinned down enough for the algorithm to reach a conclusion.
With that background, here are some observations:
1) If you want to learn the parity of a transition in a page by making queries in that page, you have to learn the exact position of the transition and you need Ω(log(k)) queries. Any collection of queries restricts the transition point to an interval, and any interval of length more than 1 has both parities. The most efficient search for the transition in that page is a binary search.
2) The most subtle and most important point: There are two ways to determine the parity of a transition inside a specific page. You can either make enough queries in that page to find the transition, or you can infer the parity if you find the same parity in both an earlier and a later page. There is no escape from this either-or. Any set of queries restricts the transition point in each page to some interval. The only restriction on parities comes from intervals of length 1. Otherwise the transition points are free to wiggle to have any consistent parities.
3) In the adversary method, there are no lucky strikes. For instance, suppose that your first query in some page is toward one end instead of in the middle. Since the adversary hasn't committed to an answer, he's free to put the transition on the long side.
4) The end result is that you are forced to directly probe the parities in Ω(log(k)) pages, and the work for each of these subproblems is also Ω(log(k)).
5) Things are not much better with random choices than with adversarial choices. The math is more complicated, because now you can get partial statistical information, rather than a strict yes you know a parity or no you don't know it. But it makes little difference. For instance, you can give each page length k^2, so that with high probability, the first log(k) queries in each page tell you almost nothing about the parity in that page. The adversary can make random choices at the beginning and it still works.
Start at the middle of the array and walk backward until you get to a value that's different from the one at the center. Check whether the number above that boundary is at an odd or even index. If it's odd, then the number occurring an odd number of times is to the left, so repeat your search between the beginning and the boundary you found. If it's even, then the number occurring an odd number of times must be later in the array, so repeat the search in the right half.
As stated, this has both a logarithmic and a linear component. If you want to keep the whole thing logarithmic, instead of just walking backward through the array to a different value, you want to use a binary search instead. Unless you expect many repetitions of the same numbers, the binary search may not be worthwhile though.
I have an algorithm which works in log(N/C)*log(K), where K is the length of maximum same-value range, and C is the length of range being searched for.
The main difference of this algorithm from most posted before is that it takes advantage of the case where all same-value ranges are short. It finds boundaries not by binary-searching the entire array, but by first quickly finding a rough estimate by jumping back by 1, 2, 4, 8, ... (log(K) iterations) steps, and then binary-searching the resulting range (log(K) again).
The algorithm is as follows (written in C#):
// Finds the start of the range of equal numbers containing the index "index",
// which is assumed to be inside the array
//
// Complexity is O(log(K)) with K being the length of range
static int findRangeStart (int[] arr, int index)
{
int candidate = index;
int value = arr[index];
int step = 1;
// find the boundary for binary search:
while(candidate>=0 && arr[candidate] == value)
{
candidate -= step;
step *= 2;
}
// binary search:
int a = Math.Max(0,candidate);
int b = candidate+step/2;
while(a+1!=b)
{
int c = (a+b)/2;
if(arr[c] == value)
b = c;
else
a = c;
}
return b;
}
// Finds the index after the only "odd" range of equal numbers in the array.
// The result should be in the range (start; end]
// The "end" is considered to always be the end of some equal number range.
static int search(int[] arr, int start, int end)
{
if(arr[start] == arr[end-1])
return end;
int middle = (start+end)/2;
int rangeStart = findRangeStart(arr,middle);
if((rangeStart & 1) == 0)
return search(arr, middle, end);
return search(arr, start, rangeStart);
}
// Finds the index after the only "odd" range of equal numbers in the array
static int search(int[] arr)
{
return search(arr, 0, arr.Length);
}
Take the middle element e. Use binary search to find the first and last occurrence. O(log(n))
If it is odd return e.
Otherwise, recurse onto the side that has an odd number of elements [....]eeee[....]
Runtime will be log(n) + log(n/2) + log(n/4).... = O(log(n)^2).
AHhh. There is an answer.
Do a binary search and as you search, for each value, move backwards until you find the first entry with that same value. If its index is even, it is before the oddball, so move to the right.
If its array index is odd, it is after the oddball, so move to the left.
In pseudocode (this is the general idea, not tested...):
private static int FindOddBall(int[] ary)
{
    int l = 0,
        r = ary.Length - 1;
    int n = (l + r) / 2;
    while (r > l + 2)
    {
        n = (l + r) / 2;
        while (ary[n] == ary[n - 1])
            n = FindBreakIndex(ary, l, n);
        if (n % 2 == 0) // even index we are on or to the left of the oddball
            l = n;
        else            // odd index we are to the right of the oddball
            r = n - 1;
    }
    return ary[l];
}

private static int FindBreakIndex(int[] ary, int l, int n)
{
    var t = ary[n];
    var r = n;
    while (ary[n] != t || ary[n] == ary[n - 1])
        if (ary[n] == t)
        {
            r = n;
            n = (l + r) / 2;
        }
        else
        {
            l = n;
            n = (l + r) / 2;
        }
    return n;
}
You can use this algorithm:
int GetSpecialOne(int[] array, int length)
{
    int specialOne = array[0];
    for (int i = 1; i < length; i++)
    {
        specialOne ^= array[i];
    }
    return specialOne;
}
Solved with the help of a similar question which can be found here on http://www.technicalinterviewquestions.net
We don't have any information about the distribution of the lengths of the runs inside the array, or of the array as a whole, right?
So the array length might be 1, 11, 101, 1001 or so on, at least 1 with no upper bound, and it must contain at least 1 distinct element ('number') and up to (length-1)/2 + 1 distinct elements; for total sizes of 1, 11, 101 that is 1, 1 to 6, and 1 to 51 elements, and so on.
Shall we assume every possible size to be equally probable? This would lead to a mean sublist length of size/4, wouldn't it?
An array of size 5 could be divided into 1, 2 or 3 sublists.
What seems to be obvious is not that obvious, if we go into details.
An array of size 5 can be 'divided' into one sublist in just one way, with arguable right to call it 'dividing'. It's just a list of 5 elements (aaaaa). To avoid confusion let's assume the elements inside the list to be ordered characters, not numbers (a,b,c, ...).
Divided into two sublist, they might be (1, 4), (2, 3), (3, 2), (4, 1). (abbbb, aabbb, aaabb, aaaab).
Now let's look back at the claim made before: Shall the 'division' (5) be assumed the same probability as those 4 divisions into 2 sublists? Or shall we mix them together, and assume every partition as evenly probable, (1/5)?
Or can we calculate the solution without knowing the probability of the length of the sublists?
The clue is you're looking for log(n). That's less than n.
Stepping through the entire array, one at a time? That's n. That's not going to work.
We know the first two indexes in the array (0 and 1) should be the same number. Same with 50 and 51, if the odd number in the array is after them.
So find the middle element in the array, compare it to the element right after it. If the change in numbers happens on the wrong index, we know the odd number in the array is before it; otherwise, it's after. With one set of comparisons, we figure out which half of the array the target is in.
Keep going from there.
Use a hash table
For each element E in the input set
    if E is in the hash table
        increment its count
    else
        add E to the hash table with count 1
For each key K in the hash table
    if count(K) % 2 = 1
        return K
As this algorithm is 2n it belongs to O(n)
Try this:
int getOddOccurrence(int ar[], int ar_size)
{
    int i;
    int res = 0;
    for (i = 0; i < ar_size; i++)
        res = res ^ ar[i];
    return res;
}
XOR cancels out every time you XOR with the same number, so 1^1 = 0 but 1^1^1 = 1; every pair cancels out, leaving the odd one out.
Assume indexing start at 0. Binary search for the smallest even i such that x[i] != x[i+1]; your answer is x[i].
edit: due to public demand, here is the code
int f(int *x, int min, int max) {
    int size = max;
    min /= 2;
    max /= 2;
    while (min < max) {
        int i = (min + max) / 2;
        if (i == 0 || x[2*i - 1] == x[2*i])
            min = i + 1;
        else
            max = i - 1;
    }
    if (2*max == size || x[2*max] != x[2*max + 1])
        return x[2*max];
    return x[2*min];
}
