Please help me understand a while condition in a binary search function - c

I am learning about sorting arrays and while loops in C. Some code from the book I'm using to study is the following:
// function to perform binary search of an array
size_t binarySearch(const int b[], int searchKey, size_t low, size_t high)
{
    // loop until low index is greater than high index
    while (low <= high) {
        // determine middle element of subarray being searched
        size_t middle = (low + high) / 2;

        // display subarray used in this iteration
        printRow(b, low, middle, high);

        // if searchKey matched middle element, return middle
        if (searchKey == b[middle]) {
            return middle;
        }
        // if searchKey is less than middle element, set new high
        else if (searchKey < b[middle]) {
            high = middle - 1; // search low end of array
        }
        else {
            low = middle + 1; // search high end of the array
        }
    } // end while

    return -1; // searchKey not found
}
The problem is that I can't figure out how the initial while condition works: while (low <= high). I mean it seems like low can never be greater than high. Can anyone tell me under what situation low would be greater than high and therefore terminate the loop?
I tried to write down and visualize how the algorithm might work, but I cannot understand it.

There are two situations in which the value of low becomes greater than the value of high:

1) When the element is not in the list: the subarray keeps shrinking until low, middle and high are all equal. If the search key is greater than that one remaining element, low = middle + 1 executes and low ends up greater than high (if it is smaller, high = middle - 1 has the same effect).

2) When the call passes in a low that is already greater than high:

binarySearch(arr, search_value, 4, 3)

Try these inputs:

Input:
array = {2, 3, 4, 10, 40}
to_search = 11

Output (low, high) per iteration:
0, 4
3, 4
4, 4
4, 3 <--- low > high

Well, don't treat the statement as something from everyday conversation. Normally, of course, there is no question of 5 somehow being greater than, say, 8.
But in code, with loops and conditions, things are different, because code is dynamic. If you follow the code, you'll see the statement low = middle + 1; (in the last branch). At the "opposite" end of the spectrum, there is high = middle - 1; above it. And there is the one that controls middle: middle = (low + high) / 2;.
So low has a chance (whenever its branch is hit) to increase, and high has a chance (whenever its branch is hit) to decrease.
Going back to our random example of 5 and 8: the first number might eventually go up from 5 to 7, and 8 might come down to 4. In this code, of course, you need to follow the actual flow, watch the values, and see how and when they hit those branches.
The bottom line is that low and high are never static. This is somewhat unusual, as most loops modify only the first variable (in our case low). But that is how one pattern of loops works: you start off at one value or one condition, then you progress in the code until you break that condition.
A small caveat: because control flow is man-made, if one does not pay attention they can end up with an infinite loop (meaning the stop condition is never triggered). The program then goes around the loop and basically "sits still" while still using CPU time (and, depending on what it does in the loop, even memory, until the machine runs out of memory).
What you'll see in some loops as a precaution is something like this:
int i = 0;
while (some condition)
{
    if (i++ > some number)
        break;

    ... some control flow code
}
What that does is a failsafe stop: a counter is incremented on each pass until it reaches a number the programmer feels is safe enough, then the loop is exited. break is how you exit loops. Of course, this is just a failsafe; as I said, normally you need to make sure that the code inside the loop is properly handled so that no infinite loop can happen.
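As a concrete C version of that pattern (a minimal sketch; the loop body is a placeholder that always makes progress):

#include <stdio.h>

int main(void)
{
    int low = 0, high = 100;
    int i = 0;                      /* the failsafe counter */

    while (low <= high)
    {
        if (i++ > 1000)             /* a bound the programmer deems safe */
        {
            puts("failsafe tripped: likely an infinite loop");
            break;
        }
        int middle = (low + high) / 2;
        /* ... the real work would go here ... */
        low = middle + 1;           /* placeholder progress step */
    }
    return 0;
}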

Let's assume that searchKey is not in your array to make the explanation easier. Here is an example:
b is { 1, 2, 3 }
searchKey is 4
low is 0
high is 2
Initially low is less than high, so we enter the while loop. middle is (0 + 2) / 2, which is 1. In our particular case we hit the else statement, which means we set low to middle + 1, which is 2. That means low is now equal to high, so the loop continues. This time middle is (2 + 2) / 2, which is 2. We hit the else statement again, setting low to 3, making low greater than high and ending the loop.
If instead searchKey was -1, then we would have set high to middle - 1, which is 0. That would have meant high is now equal to low. On the following iteration of the loop we would set high again, this time to -1, ending the loop.
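If it helps to watch this happen, here is a minimal harness around the book's binarySearch from the question (compile the two together; this printRow is a stand-in I made up, since the book's version isn't shown):

#include <stdio.h>

size_t binarySearch(const int b[], int searchKey, size_t low, size_t high);

/* assumed stand-in for the book's printRow, just to watch low/high move */
void printRow(const int b[], size_t low, size_t mid, size_t high)
{
    printf("low=%zu mid=%zu high=%zu\n", low, mid, high);
}

int main(void)
{
    int b[] = {1, 2, 3};
    /* searchKey 4 drives the low = middle + 1 branch until low > high */
    size_t result = binarySearch(b, 4, 0, 2);
    printf("result: %zu\n", result);   /* the -1 wraps to SIZE_MAX for size_t */
    /* caution: a key smaller than b[0] would make high = middle - 1
       underflow, because high is an unsigned size_t */
    return 0;
}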

Related

A probability theory problem in a skiplist's C implementation

These days I am looking at the skiplist code in Algorithms in C, Parts 1-4, and inserting a new value into a skiplist is more complex than I thought. During insert, the code should ensure that the new value is inserted at level i with probability 1/2^i, and this is implemented by the code below:
static int Rand()
{
    int i, j = 0;
    uint32_t t = rand();

    for (i = 1, j = 2; i < lg_n_max; i++, j += j)
        if (t > RANDMAX / j)
            break;

    if (i > lg_n)
        lg_n = i;

    return i;
}
I don't know how the Rand function ensures this; can you explain it to me? Thank you.
Presumably RANDMAX is intended to be RAND_MAX.
Neglecting rounding issues, half the return values of rand are above RAND_MAX / 2, and therefore half the time, the loop exits with i = 1.
If the loop continues, it updates i to 2 and j to 4. Then half of the remaining return values (a quarter of the total) are above RAND_MAX / 4, so, one-quarter of the time, the loop exits with i = 2.
Further iterations continue in the same manner, each iteration exiting with a portion of return values that is half the previous, until the lg_n_max limit is reached.
Thus, neglecting rounding issues and the final limit, the routine returns 1 half the time, 2 one-quarter of the time, 3 one-eighth the time, and so on.
lg_n is not defined in the routine. It appears to be a record of the greatest value returned by the routine so far.
Thanks Eric Postpischil very much for his answer; I now understand how the probability is ensured. Here is a restatement that I find easier to follow:
t is a random value between 0 and RANDMAX, and we assume that the loop runs 2 times. In the first iteration, the value of t is smaller than RANDMAX/2^1, meaning that t falls into the range from 0 to RANDMAX/2; the probability of this is 1/2. In the second iteration, t is smaller than RANDMAX/2^2, meaning that t falls into the range from 0 to RANDMAX/2^2; the probability of this is also 1/2, because the range (0, RANDMAX/2^2) is only half of the first iteration's range, and the first iteration already showed that t lies in (0, RANDMAX/2^1). Notice that the probability of the second iteration is a conditional probability: it is based on the first iteration, so the overall probability is 1/2 * 1/2 = 1/4.
In short, every additional iteration multiplies the previous iteration's probability by 1/2.
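If you want to convince yourself numerically, here is a small self-contained harness (my own sketch: it uses the standard RAND_MAX, fixes lg_n_max as a constant, and drops the lg_n bookkeeping) that tallies how often each level comes back:

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

#define LG_N_MAX 16

/* same loop as the book's Rand(), minus the lg_n record-keeping */
static int skiplist_level(void)
{
    int i, j;
    uint32_t t = rand();
    for (i = 1, j = 2; i < LG_N_MAX; i++, j += j)
        if (t > RAND_MAX / j)
            break;
    return i;
}

int main(void)
{
    enum { TRIALS = 1000000 };
    long counts[LG_N_MAX + 1] = {0};

    srand(42);
    for (long n = 0; n < TRIALS; n++)
        counts[skiplist_level()]++;

    /* each observed frequency should sit near 1/2^i */
    for (int i = 1; i <= 6; i++)
        printf("level %d: %.4f (expected %.4f)\n",
               i, (double)counts[i] / TRIALS, 1.0 / (1 << i));
    return 0;
}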

binary search nearest match with last occurrence

I am implementing an efficient algorithm to search for the last occurrence of a key (or of its nearest match, the upper bound).
So far, I have this:
long bin_search_closest_match_last_occurance(long *lArray, long sizeArray, long lnumber)
{
    long left, right, mid, last_occur;

    left = 0;
    right = sizeArray - 1;
    last_occur = -1;

    while (left <= right)
    {
        mid = (left + right) / 2;

        if (lArray[mid] == lnumber)
        {
            last_occur = mid;
            left = mid + 1;
        }

        if (lArray[mid] > lnumber)
            right = mid - 1;
        else
            left = mid + 1;
    }

    return last_occur != -1 ? last_occur : mid;
}
Let's have the array {0,0,1,5,9,9,9,9} and the key 6.
The function should return index 7, but mine returns 4.
Please note that I do not want to iterate linearly to the last matching index.
The solution I have in mind is to change the function's parameters (add start and end indexes) and do another binary search, within the function, from the found upper bound to the end of the array (only if I don't find an exact match, i.e. last_occur == -1).
I want to ask if there's a better/cleaner solution to implement this.
n.m.'s 2-search approach will work, and it keeps the optimal time complexity, but it's likely to increase the constant factor by around 2, or by around 1.5 if you begin the second search from where the first search ended.
If instead you take an "ordinary" binary search that finds the first instance of lnumber (or, if it doesn't exist, a lower bound), and change it so that the algorithm logically "reverses" the array by changing every array access lArray[x] to lArray[sizeArray - 1 - x] (for any expression x), and also "reverse" the ordering by changing the > lnumber test to < lnumber, then only a single binary search is needed.

The only array accesses this algorithm actually performs are two lookups to lArray[mid], which an optimising compiler is very likely to evaluate only once if it can prove that nothing will change the value in between the accesses (this might require adding restrict to the declaration of long * lArray; alternatively, you could just load the element into a local variable and test it twice instead). Either way, if only a single array lookup per iteration is needed, then changing the index from mid to sizeArray - 1 - mid will add just 2 extra subtractions per iteration (or just 1 if you --sizeArray before entering the loop), which I expect will not increase the constant nearly as much as n.m.'s approach.

Of course, as with anything, if performance is critical then test it; and if it's not, then don't worry too much about saving microseconds.
You will also need to "reverse" the return value:
return last_occur != -1 ? last_occur : sizeArray - 1 - mid;
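For concreteness, here is my sketch of that transformation applied to a first-occurrence search (names are mine, and this is an illustration of the index-mirroring, not a tested drop-in). When lnumber is present it returns the index of its last occurrence; when it is absent, the fallback is only as rough as the original's return mid and may still need the follow-up adjustment:

/* a "first occurrence" search run on the logically reversed array, so it
   lands on the LAST occurrence in the real array: every access lArray[x]
   became lArray[sizeArray - 1 - x] and the '>' test became '<' */
long bin_search_last_occur_mirrored(const long *lArray, long sizeArray, long lnumber)
{
    long left = 0, right = sizeArray - 1, mid = 0, occur = -1;

    while (left <= right)
    {
        mid = left + (right - left) / 2;
        long v = lArray[sizeArray - 1 - mid];   /* one lookup per iteration */

        if (v == lnumber)
        {
            occur = mid;        /* remember the match, keep going "left"    */
            right = mid - 1;    /* in reversed space = further right in a   */
        }
        else if (v < lnumber)   /* reversed ordering */
            right = mid - 1;
        else
            left = mid + 1;
    }
    /* map the reversed index back to a real one */
    return occur != -1 ? sizeArray - 1 - occur : sizeArray - 1 - mid;
}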

Writing a recursive binary search in C

I've found the following code online,
int binary_search(int a[], int low, int high, int target) {
    if (high < low)
        return -1;

    int middle = (low + high) / 2;

    if (target < a[middle])
        return binary_search(a, low, middle - 1, target);
    else if (target > a[middle])
        return binary_search(a, middle + 1, high, target);
    else  // target == a[middle]
        return middle;
}
My function has a specified prototype (meaning that it has a set number of arguments that cannot be altered). This is what I have so far:
bool search(int value, int array[], int n) {
    if (array[n/2] == value)
        return 1;
    else if (array[n/2] < value)
        return search(value, &array[n/2], (n)/2);
    else
        // how do I "return" the other half?
}
Does my implementation look correct so far? I can't seem to figure out how to implement the final else statement.
high and low represent the bounds of the subarray in which to continue the search. If you analyze the code you'll notice that if target is smaller than a[middle], you have to continue the search in the first half of the array (in fact it calls binary_search passing the same low bound but, as the upper bound, middle - 1). On the other side, if target is greater than a[middle], you have to continue the search in the second half of the array (from middle + 1 to high). Of course, if target is equal to a[middle], you've finished.
The trick to writing a recursive anything:
Figure out how it should end.
Figure out how it should look one step before it ends.
Figure out how it should look two steps before it ends, and how moving from #2 to #1 is exactly the same as moving from #3 to #2.
Step #1:
If the number at the beginning of the search range is the desired number, return true.
If the end of the search range is the same as the beginning of the search range, and the number in the search range is not the desired number, return false.
Step #2:
If the search range has a length of two, split it into two one element search ranges, and search the range that might contain the required number.
Step #3:
If the search range has a length of more than two, split it into two roughly equal search ranges, and search the range that might contain the required number.
(which combining the two would look like)
If the search range has a length of two or more elements, split it into two roughly equal ranges and check the highest (last) number in the "lower" range: if the number you are looking for is equal to or less than that number, search the lower range; otherwise, search the higher range.
This technique will not return you an optimum solution unless you select an optimum way to solve the problem; however, it will return you a correct solution (provided you do not make any true blunders).
Now the code:
bool search(int value, int array[], int lowIndex, int highIndex) {
    if (array[lowIndex] == value) {
        return true;
    } else if (lowIndex == highIndex) {
        return false;
    }

    int middleIndex = (lowIndex + highIndex) / 2;

    if (value <= array[middleIndex]) {
        return search(value, array, lowIndex, middleIndex);
    } else {
        return search(value, array, middleIndex + 1, highIndex);
    }
}
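A quick way to exercise it (a sketch of my own, asserting the behaviour the steps above describe):

#include <stdbool.h>
#include <stdio.h>

bool search(int value, int array[], int lowIndex, int highIndex);

int main(void)
{
    int a[] = {1, 4, 9, 9, 16, 25};
    printf("%d\n", search(16, a, 0, 5));  /* 1: present           */
    printf("%d\n", search(5, a, 0, 5));   /* 0: absent            */
    printf("%d\n", search(25, a, 5, 5));  /* 1: one-element range */
    return 0;
}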
When reading code online, you have a big disadvantage. You don't do any of the above three steps, so you really have to go about solving the problem backwards. It is akin to saying, I have a solution, but now I have to figure out how someone else solved it (assuming that they didn't make any mistakes, and assuming that they had the exact same requirements as you).
The high and low variables represent the current range you are searching. You usually start with the beginning and end of the array, and then determine if the value is in the first or second half, or exactly in the middle. If it is in the middle, you return that point. If it is below the middle, you search again (recursively), but now only in the lower half. If it is above the middle, you search the upper half. And you repeat this, each time dividing up and narrowing the range. If you find the value, you return; otherwise, if the range is so narrow that it is empty (the low and high indexes are the same), you didn't find it.
High and low are upper and lower bounds on the candidate indices of the array. In other words, they define the portion of the subarray in which it is possible for the search target to exist. Since the size of the subarray is cut in half each iteration, it is easy to see that the algorithm is O(log n).
return search(value, &array[(n)/2], (n)/2);
On your current code, first of all, n should not be in parentheses (it doesn't make a difference, but it confuses me).
Next up, if it's meant to be returning the index in the array, your code doesn't do that, it returns 1. Judging by the prototype, you might consider a non-recursive approach, but this can work fine if you add the right values on to each return.
You can figure out the other statement. Just draw a picture, figure out where the pointers should be, and code them up. Here's a start:
new array if > n/2
v-----------v
0, 1, 2, 3, 4, 5, 6, 7
^
n/2
Actually, you probably don't want to be including your middle value. Finally, make sure to take into account lists of length zero, one, two and three. And please write unit tests: this is probably one of the most often incorrectly implemented algorithms.
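For instance, a few hypothetical checks against the question's 3-argument prototype, covering exactly those edge lengths (they test whatever implementation you finish, not any particular one):

#include <assert.h>
#include <stdbool.h>

bool search(int value, int array[], int n);

int main(void)
{
    int one[] = {5};
    int two[] = {3, 7};
    int three[] = {1, 4, 9};

    assert(!search(5, one, 0));                      /* length zero */
    assert(search(5, one, 1) && !search(6, one, 1));
    assert(search(3, two, 2) && search(7, two, 2) && !search(4, two, 2));
    assert(search(4, three, 3) && !search(2, three, 3));
    return 0;
}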
I have tried to solve your problem, and the code below really works. But what is the condition to escape the recursion if the value being searched for does not lie in the array?
int search(int a[], int size, int value)
{
    if (size <= 0)
        return -1;                /* escape: value is not in the array */
    if (value == a[size/2])
        return size/2;
    if (value < a[size/2])
        return search(a, size/2, value);              /* lower half */
    /* upper half: skip the middle element and re-offset the result */
    int r = search(a + size/2 + 1, size - size/2 - 1, value);
    return r == -1 ? -1 : r + size/2 + 1;
}
int *binSearch(int *arr, int size, int num2Search)
{
    if (size <= 0)
        return NULL;
    if (*(arr + (size/2)) < num2Search)
        /* the upper half holds size - size/2 - 1 elements, not size/2 */
        return binSearch(arr + (size/2) + 1, size - (size/2) - 1, num2Search);
    if (*(arr + (size/2)) == num2Search)
        return arr + (size/2);
    return binSearch(arr, size/2, num2Search);
}

Is it possible to have only one comparison per iteration of a binary search algorithm?

In the binary search algorithm we have two comparisons per iteration:
if (key == a[mid]) then found;
else if (key < a[mid]) then binary_search(a[],left,mid-1);
else binary_search(a[],mid+1,right);
Is there a way by which I can have only one comparison instead of the above two?
--
Thanks
Alok.Kr.
See:
http://en.wikipedia.org/wiki/Binary_search_algorithm#Single_comparison_per_iteration
Taken from wiki:
low = 0
high = N
while (low < high) {
    mid = low + ((high - low) / 2)
    if (A[mid] < value)
        low = mid + 1;
    else
        // can't be high = mid - 1: here A[mid] >= value,
        // so high can't be < mid if A[mid] == value
        high = mid;
}
// high == low, using high or low depends on taste
if ((low < N) && (A[low] == value))
    return low;  // found
else
    return -1;   // not found
Pros/cons from wiki:
"This approach foregoes the possibility of early termination on discovery of a match, thus successful searches have log2(N) iterations instead of an expected log2(N) − 1 iterations. On the other hand, this implementation makes fewer comparisons: log2(N) is less than the expected number of comparisons for the two-test implementations of 1·5(log2(N) − 1), for N greater than eight."
Yes. Just don't eliminate mid from the recursive call.
if (left == right) return NULL;
if (left + 1 == right) return key == a[left] ? &a[left] : NULL;

mid = left + (right - left) / 2;

if (key < a[mid]) return binary_search(a, left, mid - 1);
else return binary_search(a, mid, right); // include `mid` in next round
You only need to eliminate half of the set with each recursion to achieve O(logN) performance. You're going above and beyond by eliminating half+1.
If you only use < during recursion, the algorithm will find the least element which is not less than key (but may be greater than key). Finish off by performing a single equality test.
In assembler, you could:
cmp key,a[mid]
beq found
bge else
So if your compiler is really good at peephole optimizations, it might already do this for you.
This is a recursive algorithm. The first comparison is the stop criterion and the second is the actual search, so you cannot remove them.
The first asks whether you have already found the element, and the second asks in which part of the array to look for it. So you cannot make those decisions based on only one comparison.
First things first: do you need to optimize the program? Have you measured to know where you need to do it? Is it in this function?
For primitive types the second comparison is as fast an operation as it gets. The higher cost of the comparison is loading the element into the appropriate register, and that is needed for the first comparison. Once that comparison is executed, the value is already in a register and the second operation takes a single processor instruction plus the possible cost of the branch misprediction.
Assuming integral types, the cost in processor time of the algorithm is most probably dominated by the cost of the recursive calls if the compiler is not able to perform tail-recursion optimization. If you really need to optimize this, try compiling with all the optimization flags on and analyze the assembler to identify whether the tail-recursion optimization is being applied. If not, manually convert the algorithm from recursive to iterative.
This will have two effects: obscuring the code (avoid modifying a clean solution unless you really need to) and avoiding the function calls.
If you are speaking of C++, and the type is complex and the overloaded comparison operators are expensive, the fastest boost in performance is implementing a compare method that returns a negative number for less-than, 0 for equal, and a positive number for greater-than. Then precompute the result before the comparisons and perform integer-only checks. That reduces the overall cost of the algorithm to a single processing of the real objects with the expensive comparison, and puts you back in the original assumption.
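In C the same idea is the qsort-style three-way comparator: pay for the expensive comparison once per iteration, cache the integer, and branch on it twice. A sketch with invented names (it behaves much like the standard bsearch):

#include <stddef.h>

typedef int (*cmp_fn)(const void *key, const void *elem); /* <0, 0, >0 */

const void *bsearch_3way(const void *key, const void *base, size_t n,
                         size_t size, cmp_fn cmp)
{
    size_t low = 0, high = n;               /* half-open range [low, high) */
    while (low < high) {
        size_t mid = low + (high - low) / 2;
        const char *elem = (const char *)base + mid * size;
        int c = cmp(key, elem);             /* the one expensive comparison */
        if (c == 0)
            return elem;                    /* integer-only checks from here */
        else if (c < 0)
            high = mid;
        else
            low = mid + 1;
    }
    return NULL;
}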
for (step = 1; step < n; step <<= 1);  // smallest power of two >= n; step must start at 1, not 0, or it never grows
for (i = 0; step; step >>= 1)
    if (i + step < n && v[i + step] <= x)
        i += step;                     // one comparison per iteration; i ends at the last index with v[i] <= x
OK, this was an interview question at Adobe, and I was just trying to figure out how to do it.
Now I have got a solution to it, so I'm posting it:
void binary_search(int a[], int low, int high, int key)
{
    if (low > high) {
        printf("Number Not Found\n");      /* escape when key is absent */
        return;
    }

    int mid = (low + high) / 2;

    if (key == a[mid]) {
        printf("Number Found\n");
        return;
    }
    else {
        /* sign is 1 when key < a[mid], 0 otherwise
           (beware: key - a[mid] can overflow for extreme values) */
        int sign = Calc_sign(key - a[mid]);
        low  = low * sign + (1 - sign) * (mid + 1);
        high = (mid - 1) * sign + (1 - sign) * high;
        binary_search(a, low, high, key);
    }
}

int Calc_sign(int a)
{
    return (int)(((unsigned int)a & 0x80000000u) >> 31);  /* the sign bit */
}
So in the code there is only one comparison, which checks whether the key value is equal to the middle element.
--
Thanks
Alok Kr.

How can I find a number which occurs an odd number of times in a SORTED array in O(n) time?

I have a question that I tried to think over again and again... but got nothing, so I am posting it here. Maybe I could get some viewpoints from others to try and make it work...
The question is: we are given a SORTED array, which consists of a collection of values occurring an EVEN number of times, except one, which occurs an ODD number of times. We need to find the solution in O(log n) time.
It is easy to find the solution in O(n) time, but it looks pretty tricky to do it in O(log n) time.
Theorem: Every deterministic algorithm for this problem probes Ω(log² n) memory locations in the worst case.
Proof (completely rewritten in a more formal style):
Let k > 0 be an odd integer and let n = k². We describe an adversary that forces (log₂(k + 1))² = Ω(log² n) probes.
We call the maximal subsequences of identical elements groups. The adversary's possible inputs consist of k length-k segments x_1 x_2 … x_k. For each segment x_j, there exists an integer b_j ∈ [0, k] such that x_j consists of b_j copies of j - 1 followed by k - b_j copies of j. Each group overlaps at most two segments, and each segment overlaps at most two groups.
Group boundaries
| | | | |
0 0 1 1 1 2 2 3 3
| | | |
Segment boundaries
Wherever there is an increase of two, we assume a double boundary by convention.
Group boundaries
| || | |
0 0 0 2 2 2 2 3 3
Claim: The location of the jth group boundary (1 ≤ j ≤ k) is uniquely determined by the segment x_j.
Proof: It's just after the ((j - 1)k + b_j)th memory location, and x_j uniquely determines b_j. //
We say that the algorithm has observed the jth group boundary in case the results of its probes of x_j uniquely determine x_j. By convention, the beginning and the end of the input are always observed. It is possible for the algorithm to uniquely determine the location of a group boundary without observing it.
Group boundaries
| X | | |
0 0 ? 1 2 2 3 3 3
| | | |
Segment boundaries
Given only 0 0 ?, the algorithm cannot tell for sure whether ? is a 0 or a 1. In context, however, ? must be a 1, as otherwise there would be three odd groups, and the group boundary at X can be inferred. These inferences could be problematic for the adversary, but it turns out that they can be made only after the group boundary in question is "irrelevant".
Claim: At any given point during the algorithm's execution, consider the set of group boundaries that it has observed. Exactly one consecutive pair is at odd distance, and the odd group lies between them.
Proof: Every other consecutive pair bounds only even groups. //
Define the odd-length subsequence bounded by the special consecutive pair to be the relevant subsequence.
Claim: No group boundary in the interior of the relevant subsequence is uniquely determined. If there is at least one such boundary, then the identity of the odd group is not uniquely determined.
Proof: Without loss of generality, assume that each memory location not in the relevant subsequence has been probed and that each segment contained in the relevant subsequence has exactly one location that has not been probed. Suppose that the jth group boundary (call it B) lies in the interior of the relevant subsequence. By hypothesis, the probes to x_j determine B's location up to two consecutive possibilities. We call the one at odd distance from the left observed boundary odd-left and the other odd-right. For both possibilities, we work left to right and fix the location of every remaining interior group boundary so that the group to its left is even. (We can do this because they each have two consecutive possibilities as well.) If B is at odd-left, then the group to its left is the unique odd group. If B is at odd-right, then the last group in the relevant subsequence is the unique odd group. Both are valid inputs, so the algorithm has uniquely determined neither the location of B nor the odd group. //
Example:
Observed group boundaries; relevant subsequence marked by […]
[ ] |
0 0 Y 1 1 Z 2 3 3
| | | |
Segment boundaries
Possibility #1: Y=0, Z=2
Possibility #2: Y=1, Z=2
Possibility #3: Y=1, Z=1
As a consequence of this claim, the algorithm, regardless of how it works, must narrow the relevant subsequence to one group. By definition, it therefore must observe some group boundaries. The adversary now has the simple task of keeping open as many possibilities as it can.
At any given point during the algorithm's execution, the adversary is internally committed to one possibility for each memory location outside of the relevant subsequence. At the beginning, the relevant subsequence is the entire input, so there are no initial commitments. Whenever the algorithm probes an uncommitted location of x_j, the adversary must commit to one of two values: j - 1, or j. If it can avoid letting the jth boundary be observed, it chooses a value that leaves at least half of the remaining possibilities (with respect to observation). Otherwise, it chooses so as to keep at least half of the groups in the relevant interval and commits values for the others.
In this way, the adversary forces the algorithm to observe at least log₂(k + 1) group boundaries, and in observing the jth group boundary, the algorithm is forced to make at least log₂(k + 1) probes.
Extensions:
This result extends straightforwardly to randomized algorithms by randomizing the input, replacing "at best halved" (from the algorithm's point of view) with "at best halved in expectation", and applying standard concentration inequalities.
It also extends to the case where no group can be larger than s copies; in this case the lower bound is Ω(log n log s).
A sorted array suggests a binary search. We have to redefine equality and comparison. Equality simply means an odd number of elements. We can do the comparison by observing the index of the first or last element of the group. The first element will be at an even index (0-based) before the odd group, and at an odd index after the odd group. We can find the first and last elements of a group using binary search. The total cost is O((log N)²).
PROOF OF O((log N)²)
T(2) = 1                  // to make the summation nice
T(N) = log(N) + T(N/2)    // log(N) is finding the first/last elements
For some N = 2^k,
T(2^k) = (log 2^k) + T(2^(k-1))
       = (log 2^k) + (log 2^(k-1)) + T(2^(k-2))
       = (log 2^k) + (log 2^(k-1)) + (log 2^(k-2)) + ... + (log 2^2) + 1
       = k + (k-1) + (k-2) + ... + 1
       = k(k+1)/2
       = (k² + k)/2
       = (log(N)² + log(N))/2
       = O(log(N)²)
Look at the middle element of the array. With a couple of appropriate binary searches, you can find its first and its last appearance in the array. E.g., if the middle element is 'a', you need to find i and j as shown below:
[* * * * a a a a * * *]
^ ^
| |
| |
i j
Is j - i an even number? You are done! Otherwise (and this is the key here), the question to ask is: is i an even or an odd number? Do you see what this piece of knowledge implies? Then the rest is easy.
This answer is in support of the answer posted by "throwawayacct". He deserves the bounty. I spent some time on this question and I'm totally convinced that his proof is correct that you need Ω(log(n)^2) queries to find the number that occurs an odd number of times. I'm convinced because I ended up recreating the exact same argument after only skimming his solution.
In the solution, an adversary creates an input to make life hard for the algorithm, but also simple for a human analyzer. The input consists of k pages that each have k entries. The total number of entries is n = k^2, and it is important that O(log(k)) = O(log(n)) and Ω(log(k)) = Ω(log(n)). To make the input, the adversary makes a string of length k of the form 00...011...1, with the transition in an arbitrary position. Then each symbol in the string is expanded into a page of length k of the form aa...abb...b, where on the ith page, a=i and b=i+1. The transition on each page is also in an arbitrary position, except that the parity agrees with the symbol that the page was expanded from.
It is important to understand the "adversary method" of analyzing an algorithm's worst case. The adversary answers queries about the algorithm's input, without committing to future answers. The answers have to be consistent, and the game is over when the adversary has been pinned down enough for the algorithm to reach a conclusion.
With that background, here are some observations:
1) If you want to learn the parity of a transition in a page by making queries in that page, you have to learn the exact position of the transition and you need Ω(log(k)) queries. Any collection of queries restricts the transition point to an interval, and any interval of length more than 1 has both parities. The most efficient search for the transition in that page is a binary search.
2) The most subtle and most important point: There are two ways to determine the parity of a transition inside a specific page. You can either make enough queries in that page to find the transition, or you can infer the parity if you find the same parity in both an earlier and a later page. There is no escape from this either-or. Any set of queries restricts the transition point in each page to some interval. The only restriction on parities comes from intervals of length 1. Otherwise the transition points are free to wiggle to have any consistent parities.
3) In the adversary method, there are no lucky strikes. For instance, suppose that your first query in some page is toward one end instead of in the middle. Since the adversary hasn't committed to an answer, he's free to put the transition on the long side.
4) The end result is that you are forced to directly probe the parities in Ω(log(k)) pages, and the work for each of these subproblems is also Ω(log(k)).
5) Things are not much better with random choices than with adversarial choices. The math is more complicated, because now you can get partial statistical information, rather than a strict yes you know a parity or no you don't know it. But it makes little difference. For instance, you can give each page length k^2, so that with high probability, the first log(k) queries in each page tell you almost nothing about the parity in that page. The adversary can make random choices at the beginning and it still works.
Start at the middle of the array and walk backward until you get to a value that's different from the one at the center. Check whether the number above that boundary is at an odd or even index. If it's odd, then the number occurring an odd number of times is to the left, so repeat your search between the beginning and the boundary you found. If it's even, then the number occurring an odd number of times must be later in the array, so repeat the search in the right half.
As stated, this has both a logarithmic and a linear component. If you want to keep the whole thing logarithmic, instead of just walking backward through the array to a different value, you want to use a binary search instead. Unless you expect many repetitions of the same numbers, the binary search may not be worthwhile though.
I have an algorithm which works in log(N/C)*log(K), where K is the length of maximum same-value range, and C is the length of range being searched for.
The main difference of this algorithm from most posted before is that it takes advantage of the case where all same-value ranges are short. It finds boundaries not by binary-searching the entire array, but by first quickly finding a rough estimate by jumping back by 1, 2, 4, 8, ... (log(K) iterations) steps, and then binary-searching the resulting range (log(K) again).
The algorithm is as follows (written in C#):
// Finds the start of the range of equal numbers containing the index "index",
// which is assumed to be inside the array
//
// Complexity is O(log(K)) with K being the length of range
static int findRangeStart (int[] arr, int index)
{
    int candidate = index;
    int value = arr[index];
    int step = 1;

    // find the boundary for binary search:
    while (candidate >= 0 && arr[candidate] == value)
    {
        candidate -= step;
        step *= 2;
    }

    // binary search:
    int a = Math.Max(0, candidate);
    int b = candidate + step/2;
    while (a + 1 != b)
    {
        int c = (a + b)/2;
        if (arr[c] == value)
            b = c;
        else
            a = c;
    }
    return b;
}

// Finds the index after the only "odd" range of equal numbers in the array.
// The result should be in the range (start; end]
// The "end" is considered to always be the end of some equal number range.
static int search(int[] arr, int start, int end)
{
    if (arr[start] == arr[end-1])
        return end;

    int middle = (start + end)/2;
    int rangeStart = findRangeStart(arr, middle);

    if ((rangeStart & 1) == 0)
        return search(arr, middle, end);
    return search(arr, start, rangeStart);
}

// Finds the index after the only "odd" range of equal numbers in the array
static int search(int[] arr)
{
    return search(arr, 0, arr.Length);
}
Take the middle element e. Use binary search to find the first and last occurrence. O(log(n))
If it is odd return e.
Otherwise, recurse onto the side that has an odd number of elements [....]eeee[....]
Runtime will be log(n) + log(n/2) + log(n/4).... = O(log(n)^2).
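Here is that recipe in C (a sketch I assembled from the steps above; first_of and last_of are the two inner binary searches, and find_odd_count keeps the invariant that the current window has odd length and run-aligned edges):

#include <stdio.h>

/* first index in [lo, hi] holding v (v is known to occur there) */
static int first_of(const int a[], int lo, int hi, int v)
{
    while (lo < hi) {
        int mid = lo + (hi - lo) / 2;
        if (a[mid] < v) lo = mid + 1;
        else hi = mid;
    }
    return lo;
}

/* last index in [lo, hi] holding v */
static int last_of(const int a[], int lo, int hi, int v)
{
    while (lo < hi) {
        int mid = lo + (hi - lo + 1) / 2;  /* round up so lo = mid progresses */
        if (a[mid] > v) hi = mid - 1;
        else lo = mid;
    }
    return lo;
}

/* value occurring an odd number of times in sorted a[0..n-1], n odd */
int find_odd_count(const int a[], int n)
{
    int lo = 0, hi = n - 1;
    for (;;) {
        int mid = lo + (hi - lo) / 2;
        int f = first_of(a, lo, hi, a[mid]);
        int l = last_of(a, lo, hi, a[mid]);
        if ((l - f + 1) % 2 == 1)
            return a[mid];          /* the middle run itself is odd */
        if ((f - lo) % 2 == 1)
            hi = f - 1;             /* odd count of elements to the left */
        else
            lo = l + 1;             /* otherwise it must be to the right */
    }
}

int main(void)
{
    int a[] = {1, 1, 1, 2, 2, 2, 2, 3, 3};
    printf("%d\n", find_odd_count(a, 9));   /* prints 1 */
    return 0;
}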
Ahhh. There is an answer.
Do a binary search and as you search, for each value, move backwards until you find the first entry with that same value. If its index is even, it is before the oddball, so move to the right.
If its array index is odd, it is after the oddball, so move to the left.
In pseudocode (this is the general idea, not tested...):
private static int FindOddBall(int[] ary)
{
    int l = 0,
        r = ary.Length - 1;
    int n = (l + r) / 2;

    while (r > l + 2)
    {
        n = (l + r) / 2;
        while (ary[n] == ary[n-1])
            n = FindBreakIndex(ary, l, n);

        if (n % 2 == 0) // even index: we are on or to the left of the oddball
            l = n;
        else            // odd index: we are to the right of the oddball
            r = n - 1;
    }
    return ary[l];
}

private static int FindBreakIndex(int[] ary, int l, int n)
{
    var t = ary[n];
    var r = n;

    while (ary[n] != t || ary[n] == ary[n-1])
        if (ary[n] == t)
        {
            r = n;
            n = (l + r) / 2;
        }
        else
        {
            l = n;
            n = (l + r) / 2;
        }
    return n;
}
You can use this algorithm:
int GetSpecialOne(int[] array, int length)
{
    int specialOne = array[0];

    for (int i = 1; i < length; i++)
    {
        specialOne ^= array[i];
    }
    return specialOne;
}
Solved with the help of a similar question which can be found here on http://www.technicalinterviewquestions.net
We don't have any information about the distribution of the lengths of the runs inside the array, or of the array as a whole, right?
So the array length might be 1, 11, 101, 1001 or whatever (at least 1, with no upper bound), and it must contain at least 1 distinct value ('number') and at most (length-1)/2 + 1 distinct values; for total sizes of 1, 11, 101 that is 1, 1 to 6, and 1 to 51 distinct values, and so on.
Shall we assume every possible size has equal probability? This would lead to a mean subarray length of size/4, wouldn't it?
An array of size 5 could be divided into 1, 2 or 3 sublists.
What seems to be obvious is not that obvious, if we go into details.
An array of size 5 can be 'divided' into one sublist in just one way, with arguable right to call it 'dividing'. It's just a list of 5 elements (aaaaa). To avoid confusion, let's assume the elements inside the list are ordered characters, not numbers (a, b, c, ...).
Divided into two sublists, they might be (1, 4), (2, 3), (3, 2), (4, 1). (abbbb, aabbb, aaabb, aaaab).
Now let's look back at the claim made before: shall the 'division' (5) be assumed to have the same probability as those 4 divisions into 2 sublists? Or shall we mix them together and assume every partition is equally probable (1/5)?
Or can we calculate the solution without knowing the probability of the lengths of the sublists?
The clue is you're looking for log(n). That's less than n.
Stepping through the entire array, one at a time? That's n. That's not going to work.
We know the first two indexes in the array (0 and 1) should be the same number. Same with 50 and 51, if the odd number in the array is after them.
So find the middle element in the array, compare it to the element right after it. If the change in numbers happens on the wrong index, we know the odd number in the array is before it; otherwise, it's after. With one set of comparisons, we figure out which half of the array the target is in.
Keep going from there.
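For the restricted case where every value appears exactly twice except the odd one out (note that the Ω(log² n) argument earlier in this thread shows the shortcut does not survive longer even runs), the pair-check search sketched here makes the idea concrete:

#include <stdio.h>

/* assumes every value occurs exactly twice except one; n is odd */
int find_single(const int a[], int n)
{
    int lo = 0, hi = n - 1;
    while (lo < hi) {
        int mid = (lo + hi) / 2;
        mid -= mid & 1;            /* align mid to the start of a pair */
        if (a[mid] == a[mid + 1])
            lo = mid + 2;          /* pairs intact so far: go right */
        else
            hi = mid;              /* pairing already broken: go left */
    }
    return a[lo];
}

int main(void)
{
    int a[] = {1, 1, 3, 3, 5, 7, 7};
    printf("%d\n", find_single(a, 7));   /* prints 5 */
    return 0;
}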
Use a hash table.
For each element E in the input set
    if E is in the hash table
        increment its count
    else
        insert E into the hash table with a count of 1
For each key K in the hash table
    if K's count % 2 = 1
        return K
As this algorithm is 2n, it belongs to O(n).
Try this:
int getOddOccurrence(int ar[], int ar_size)
{
    int i;
    int xor = 0;

    for (i = 0; i < ar_size; i++)
        xor = xor ^ ar[i];

    return xor;
}
XOR will cancel out every time you XOR with the same number, so 1^1 = 0 but 1^1^1 = 1; every pair cancels out, leaving the odd one out.
Assume indexing starts at 0. Binary search for the smallest even i such that x[i] != x[i+1]; your answer is x[i].
edit: due to public demand, here is the code
int f(int *x, int min, int max) {
    int size = max;

    min /= 2;
    max /= 2;

    while (min < max) {
        int i = (min + max) / 2;
        if (i == 0 || x[2*i - 1] == x[2*i])
            min = i + 1;
        else
            max = i - 1;
    }

    if (2*max == size || x[2*max] != x[2*max + 1])
        return x[2*max];
    return x[2*min];
}
