Writing a recursive binary search in C

I found the following code online:
int binary_search(int a[], int low, int high, int target) {
    if (high < low)
        return -1;                         /* empty range: not found */
    int middle = low + (high - low) / 2;   /* avoids overflow of low + high */
    if (target < a[middle])
        return binary_search(a, low, middle - 1, target);
    else if (target > a[middle])
        return binary_search(a, middle + 1, high, target);
    else
        return middle;                     /* target == a[middle] */
}
My function has a specified prototype (meaning that it has a fixed set of arguments that cannot be altered). This is what I have so far:
bool search(int value, int array[], int n) {
    if (array[n/2] == value)
        return 1;
    else if (array[n/2] < value)
        return search(value, &array[n/2], (n)/2);
    else
        // how do I "return" the other half?
}
Does my implementation look correct so far? I can't seem to figure out how to implement the final else statement.

high and low represent the bounds of the subarray in which to continue the search. If you analyze the code, you'll notice that if target is smaller than a[middle], you have to continue the search in the first half of the array (in fact, it calls binary_search passing the same low bound but, as the upper bound, middle-1). On the other hand, if target is greater than a[middle], you have to continue the search in the second half of the array (from middle+1 to high). Of course, if target is equal to a[middle], you're finished.

The trick to writing anything recursive:
1. Figure out how it should end.
2. Figure out how it should look one step before it ends.
3. Figure out how it should look two steps before it ends, and how moving from #2 to #1 is exactly the same as moving from #3 to #2.
Step #1:
If the number at the beginning of the search range is the desired number, return true.
If the end of the search range is the same as the beginning of the search range, and the number in the search range is not the desired number, return false.
Step #2:
If the search range has a length of two, split it into two one-element search ranges, and search the range that might contain the required number.
Step #3:
If the search range has a length of more than two, split it into two roughly equal search ranges, and search the range that might contain the required number.
Combining steps #2 and #3 gives:
If the search range has two or more elements, split it into two roughly equal ranges and check the highest (last) number in the "lower" range: if the desired number is equal to or less than that number, search the lower range; otherwise, search the higher range.
This technique will not necessarily give you an optimal solution unless you choose an optimal way to solve the problem; however, it will give you a correct solution (provided you do not make any real blunders).
Now the code:
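#include <stdbool.h>

bool search(int value, int array[], int lowIndex, int highIndex) {
    if (array[lowIndex] == value) {
        return true;
    } else if (lowIndex == highIndex) {
        return false;                       /* one element left, not a match */
    }
    /* (highIndex - lowIndex) / 2 keeps the midpoint inside the range;
       lowIndex + highIndex / 2 would divide only highIndex */
    int middleIndex = lowIndex + (highIndex - lowIndex) / 2;
    /* array[middleIndex] is the last number of the "lower" range */
    if (value <= array[middleIndex]) {
        return search(value, array, lowIndex, middleIndex);
    } else {
        return search(value, array, middleIndex + 1, highIndex);
    }
}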
When reading code online, you are at a big disadvantage. You haven't done any of the above three steps, so you really have to solve the problem backwards. It is akin to saying, "I have a solution, but now I have to figure out how someone else solved it" (assuming that they didn't make any mistakes, and that they had exactly the same requirements as you).

The high and low variables represent the current range you are searching. You usually start with the beginning and end of the array, and then determine whether the value is in the first half, the second half, or exactly in the middle. If it is in the middle, you return that index. If it is below the middle, you search again (recursively), but only in the lower half. If it is above the middle, you search the upper half. You repeat this, halving the range each time. If you find the value, you return it; otherwise, once the range becomes empty (high drops below low), you didn't find it.

High and low are the upper and lower bounds on the candidate indices of the array. In other words, they define the portion of the array in which the search target can possibly be. Since the size of the subarray is cut in half with each call, it is easy to see that the algorithm is O(log n).
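To see the halving concretely, here is a minimal iterative sketch of the same low/high narrowing. The name `binary_search_iter` and the `probes` out-parameter are hypothetical additions for illustration: `probes` counts how many array elements get compared, which never exceeds floor(log2 n) + 1.

```c
/* Iterative binary search over a sorted array a[0..n-1].
   Returns the index of target, or -1 if it is absent.
   *probes (hypothetical, for illustration) counts element comparisons. */
int binary_search_iter(const int a[], int n, int target, int *probes) {
    int low = 0, high = n - 1;          /* inclusive bounds */
    *probes = 0;
    while (low <= high) {               /* range is non-empty */
        int middle = low + (high - low) / 2;
        ++*probes;
        if (target < a[middle])
            high = middle - 1;          /* keep the lower half */
        else if (target > a[middle])
            low = middle + 1;           /* keep the upper half */
        else
            return middle;
    }
    return -1;                          /* range became empty */
}
```

For a 7-element array, every search finishes within 3 probes, whether or not the target is present.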

return search(value, &array[(n)/2], (n)/2);
In your current code, first of all, n should not be in parentheses (it doesn't make a difference, but it's confusing).
Next, if the function is meant to return the index in the array, your code doesn't do that; it returns 1. Judging by the prototype, you might consider a non-recursive approach, but recursion can work fine if you add the right offset to each return.
You can figure out the other statement yourself. Just draw a picture, work out where the pointers should be, and code them up. Here's a start:
new array if > n/2
            v--------v
0, 1, 2, 3, 4, 5, 6, 7
            ^
           n/2
Actually, you probably don't want to include your middle value. Finally, make sure to account for lists of length zero, one, two and three. And please write unit tests: this is probably one of the most often incorrectly implemented algorithms.
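A minimal sketch that completes the question's three-argument prototype, assuming the array is sorted in ascending order. Each recursive call gets a pointer to the start of one half plus that half's length, and the middle element is left out of both halves, so every call that misses shrinks n and the `n <= 0` check ends the recursion.

```c
#include <stdbool.h>

/* Recursive binary search with the fixed prototype from the question.
   The subarray is described by its first element (array) and length n. */
bool search(int value, int array[], int n) {
    if (n <= 0)
        return false;                           /* empty range: not found */
    int mid = n / 2;
    if (array[mid] == value)
        return true;
    if (array[mid] < value)                     /* right half, excluding mid */
        return search(value, array + mid + 1, n - mid - 1);
    return search(value, array, mid);           /* left half, excluding mid */
}
```

Because the right-half call shifts the base pointer by `mid + 1`, a variant that returned an index would need to add `mid + 1` to whatever that call returns. The tests below exercise lengths zero through three, hits and misses.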

I have tried to solve your problem, and the code below does work. But what is the condition for escaping the recursion if the value being searched for is not in the array?
if(value==a[size/2]) return size/2;
if( value<a[size/2]) {
search(a,size/2,value);
} else if (value>a[size/2] && a[size/2]<a[(a.length-1)/2]) {
search(a,size/2+size,value);
} else {
search(a,size/2+a.length-1,value);
}

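#include <stddef.h>

/* Returns a pointer to an element equal to num2Search, or NULL if absent.
   The subarray is described by its first element arr and its length size. */
int *binSearch(int *arr, int size, int num2Search)
{
    if (size <= 0)
        return NULL;                          /* empty range: not found */
    int half = size / 2;
    if (arr[half] < num2Search)               /* elements after arr[half] */
        return binSearch(arr + half + 1, size - half - 1, num2Search);
    if (arr[half] == num2Search)
        return arr + half;
    return binSearch(arr, half, num2Search);  /* elements before arr[half] */
}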

Related

Can someone suggest a better algorithm than this to check whether there is at least one duplicate value in an array?

An unsorted integer array nums and its size numsSize are given as arguments to the function containsDuplicate, and we have to return true if there is at least one duplicate value, otherwise false.
For this task I chose to compare every element with each of the elements after it, stopping at the second-to-last element; if any pair is equal I return true, otherwise false.
bool containsDuplicate(int* nums, int numsSize){
    for(int i =0 ;i< numsSize-1;i++)
    {
        for(int j = i+1;j < numsSize; j++)
        {
            if(nums[i] == nums[j])
            {
                return true;
            }
        }
    }
    return false;
}
To minimize run time, I return as soon as a duplicate is found, but my code still does not perform well on large arrays. I'm hoping for an algorithm with O(n) time complexity, if possible. Also, is there any way to skip values that are duplicates of previously examined values?
I've seen all the other solutions, but I couldn't find a better one in C.

Your algorithm is O(n^2). But if you sort first, which can be done in O(n log n), then determining whether there is a duplicate in the array is O(n), because any duplicates end up next to each other.
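A sketch of the sort-first approach using the standard library's `qsort`; the function name `containsDuplicateSorted` and the comparator are my own. It sorts `nums` in place, so copy the array first if the caller needs the original order.

```c
#include <stdbool.h>
#include <stdlib.h>

/* qsort comparator for ints, written to avoid the overflow of a - b. */
static int cmp_int(const void *pa, const void *pb) {
    int a = *(const int *)pa, b = *(const int *)pb;
    return (a > b) - (a < b);
}

/* O(n log n) overall: after sorting, duplicates are adjacent.
   Note: reorders nums in place. */
bool containsDuplicateSorted(int *nums, int numsSize) {
    qsort(nums, (size_t)numsSize, sizeof nums[0], cmp_int);
    for (int i = 1; i < numsSize; i++)
        if (nums[i] == nums[i - 1])
            return true;
    return false;
}
```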
You could maintain a lookup table recording whether each value has been seen before, which runs in O(n) time, but unless the range of values stored in the array is relatively small, its memory usage is prohibitive.
For instance, if you know the values in the array will range from 0 to 127:
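#include <stddef.h>

/* Returns 1 if arr contains a duplicate, 0 otherwise.
   Assumes every value is in the range 0..127. */
int contains_dupes(const int *arr, size_t n) {
    char seen[128] = {0};
    for (size_t i = 0; i < n; i++) {
        if (seen[arr[i]]) return 1;   /* value seen before: duplicate */
        seen[arr[i]] = 1;
    }
    return 0;
}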
But if we assume int is 4 bytes, the values in the array can be any int, and we use a char per entry in our lookup table, then the lookup table would have to be 4GB in size.

O(n) time, O(n) space: use a set or map. Parse your array, checking each element in turn for membership in your set or map. If it's present then you've found a duplicate; if not, then add it.
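C has no built-in set, so here is a minimal sketch of the set-based approach using a small open-addressing hash table with linear probing. The function name, the odd multiplier 2654435761 used to scramble bits, and the load factor of at most 0.5 are illustrative choices, not part of the answer. It returns 1 for a duplicate, 0 for none, and -1 if allocation fails.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Expected O(n) time, O(n) space. Returns 1 if nums has a duplicate,
   0 if not, -1 if memory allocation fails. */
int containsDuplicateHash(const int *nums, int numsSize) {
    size_t cap = 1;
    while (cap < 2 * (size_t)numsSize)      /* keep the load factor <= 0.5 */
        cap <<= 1;                          /* power of two, so we can mask */
    int *keys = malloc(cap * sizeof *keys);
    bool *used = calloc(cap, sizeof *used); /* all slots start empty */
    if (keys == NULL || used == NULL) {
        free(keys);
        free(used);
        return -1;
    }
    int found = 0;
    for (int i = 0; i < numsSize && !found; i++) {
        /* scramble bits with an odd multiplier, then mask to the table size;
           collisions are resolved by probing, so correctness does not
           depend on hash quality */
        size_t h = ((unsigned)nums[i] * 2654435761u) & (cap - 1);
        while (used[h] && keys[h] != nums[i])
            h = (h + 1) & (cap - 1);        /* linear probing */
        if (used[h]) {
            found = 1;                      /* value already in the set */
        } else {
            used[h] = true;                 /* insert it */
            keys[h] = nums[i];
        }
    }
    free(keys);
    free(used);
    return found;
}
```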
If O(n) space is too expensive, you can get away with far less by doing a first pass with a cuckoo filter (a probabilistic structure built on cuckoo hashing), which is space efficient and guarantees no false negatives, but can have false positives. Use the same approach as above, but with the cuckoo filter instead of a set or map. Any duplicates you detect may be false positives, so they need to be checked.
Then parse the array a second time, using the approach described in the first paragraph, but skip anything that isn't in your set of candidates.
This is still O(n) time.
https://en.wikipedia.org/wiki/Cuckoo_hashing

Magic Array Index Time/Space Complexity

I've been looking at the following problem:
Magic Index: A magic index in an array A[0...n-1] is defined to be an index i such that A[i] = i. Given a sorted array of integers that may contain duplicates (non-distinct values), write a method to find a magic index, if one exists.
Here is my solution:
static int magicNonDistinct(int[] array, int start, int end) {
    if (end < start) return -1;
    int mid = start + (end - start) / 2;
    if (mid < 0 || mid >= array.length) return -1;
    int v = array[mid];
    if (v == mid) return mid;
    int leftEnd = Math.min(v, mid - 1);
    int leftRes = magicNonDistinct(array, start, leftEnd);
    if (leftRes != -1) return leftRes;
    int rightStart = Math.max(v, mid + 1);
    int rightRes = magicNonDistinct(array, rightStart, end);
    return rightRes;
}
It works just fine and is the recommended solution from the book Cracking the Coding Interview, 6th Edition, problem 8.3 follow-up (sorry for spoiling).
However, when running this on a distinct array with no magic index, it visits all the elements, yielding a worst-case running time of O(n).
Since it is recursive, it takes O(n) memory in the worst case.
Why would this solution be preferable to just iterating over the array? I would argue this solution (my own) is better:
static int magicNonDistinctV2(int[] array) {
    for (int i = 0; i < array.length; ++i) {
        int v = array[i];
        if (v == i) return v;
        if (v >= array.length) return -1;
        else if (v > i) i = v - 1;
    }
    return -1;
}
O(n) running time and O(1) space, always?
Could somebody derive a better time complexity for the initial algorithm? I've been considering whether it is O(d), where d is the number of distinct elements; however, that is also wrong, since the min/max only works in one direction (think about v = 5, mid = 4 and the lower part of the array being all fives).
EDIT:
OK, people think I'm bananas and scream O(log(n)) as soon as they see something that looks like binary search. Sorry for being unclear, folks.
Let's talk about the code in my original post (the CTCI solution):
If we have an array like [-1, 0, 1, 2, 3, 4, 5, 6, 7, 8], or in general [-1, ..., n-2] of size n, we know that no element can match. However, the algorithm will visit all the elements, because it is written for arrays whose elements aren't unique. I dare you, run it: it cannot divide the search space by 2 as a regular binary search does. Please tell me what is wrong with my reasoning.
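The claim is easy to check empirically. Below is a direct C port of the CTCI routine above, with a hypothetical global counter `visits` incremented every time an element is read. On the shifted identity array [-1, 0, ..., n-2], every call sees v == mid - 1, so the left range ends at mid - 1 and the right one starts at mid + 1: nothing is pruned and all n elements are read. On { -10, -9, -8, -7, -6, -5 }, by contrast, only 3 of the 6 elements are read.

```c
/* Hypothetical instrumentation: number of array elements read. */
static int visits;

/* C port of magicNonDistinct; length replaces Java's array.length. */
static int magic_non_distinct(const int *array, int length, int start, int end) {
    if (end < start) return -1;
    int mid = start + (end - start) / 2;
    if (mid < 0 || mid >= length) return -1;
    int v = array[mid];
    visits++;                                      /* one element read */
    if (v == mid) return mid;
    int left_end = v < mid - 1 ? v : mid - 1;      /* min(v, mid - 1) */
    int left_res = magic_non_distinct(array, length, start, left_end);
    if (left_res != -1) return left_res;
    int right_start = v > mid + 1 ? v : mid + 1;   /* max(v, mid + 1) */
    return magic_non_distinct(array, length, right_start, end);
}
```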

No, in my opinion the first solution is not O(log n) as other answers state; it is really O(n) in the worst case (in the worst case it still needs to go through all the elements; consider the identity array shifted by one, as also mentioned by the author).
The reason it is not O(log n) is that it needs to search both sides of the middle (binary search only checks one side of the middle, which is why it is O(log n)).
It allows skipping items if you're lucky; however, your second, iterative solution also skips items it doesn't need to look at (because you know there cannot be a magic index in such a range, as the array is sorted), so in my opinion the second solution is better (the same complexity, plus it is iterative, i.e. better space complexity and no recursive calls, which are relatively expensive).
EDIT: However, when I thought about the first solution again, it does allow "skipping backwards" when possible, which the iterative solution does not. Consider, for example, an array like { -10, -9, -8, -7, -6, -5 }: the iterative solution would need to check all the elements, because it starts at the beginning and the values do not allow it to skip forward, whereas when starting from the middle, the algorithm can skip the first half entirely, then the first half of the second half, etc.

You are correct, the worst case complexity is O(n). You may have to visit all the elements of your array.
There is only one reason not to visit the array elements [mid, end], and that is when array[mid] > end (because in that case, the magic index is surely absent from the [mid, end] elements).
Similarly, there is only one reason not to visit the array elements [start, mid], and that is when array[start] > mid.
So there is hope that you may not have to visit all the elements; this is an optimization that may pay off.
Thus, this binary-search-like method seems better than iterating over the entire array linearly, but in the worst case you will still hit O(n).
PS: I've assumed that array is sorted in ascending order.

It looks like you misunderstood the time complexity of the required solution. The worst case is not O(n), it is O(log(n)). This is because on each pass you search only half of the array.
Here is a C++ example; check that for the whole array of 11 elements, it takes only 3 checks.

Binary search for multiple distinct numbers in a large array in minimum number of comparisons

I have a large array of size n (say n = 1000000) with monotonically non-decreasing values. I have a set of k key values (say k = { 1, 23, 39, 55, ... }). Assume the key values are sorted. I have to find the index of these key values in the large array using the minimum number of comparisons. How do I use binary search to search for multiple unique values? Doing it separately for each key value takes a lot of comparisons. Can I somehow reuse knowledge learned in one search when I search for another element in the same big array?

1. Sort the needles (the values you will search for).
2. Create an array of the same length as the needles, with each element being a pair of indexes. Initialize each pair with {0, len(haystack)}. These pairs represent all the knowledge we have of the possible locations of the needles.
3. Look at the middle value in the haystack. Now do a binary search for that value in your needles. For all lesser needles, set the upper bound (in the array from step 2) to the current haystack index. For all greater needles, set the lower bound.
4. While you were doing step 3, keep track of which needle now has the largest range remaining. Bisect it and use this as your new middle value to repeat step 3. If the largest range is singular, you're done: all needles have been found (or, if not found, their prospective location in the haystack is now known).
There may be some slight complication here when you have duplicate values in the haystack, but I think once you have the rest sorted out this should not be too difficult.
I was curious if NumPy implemented anything like this. The Python name for what you're doing is numpy.searchsorted(), and once you get through the API layers it comes to this:
/*
* Updating only one of the indices based on the previous key
* gives the search a big boost when keys are sorted, but slightly
* slows down things for purely random ones.
*/
if (#TYPE#_LT(last_key_val, key_val)) {
max_idx = arr_len;
}
else {
min_idx = 0;
max_idx = (max_idx < arr_len) ? (max_idx + 1) : arr_len;
}
So they do not do a full-blown optimization like the one I described, but they do track when the current needle is greater than the last needle, in which case they can avoid searching the haystack below where the last needle was found. This is a simple and elegant improvement over the naive implementation, and as the comment shows, it must be kept simple and fast because the function does not require the needles to be sorted in the first place.
By the way: my proposed solution aimed for something like theoretical optimality in big-O terms, but if you have a large number of needles, the fastest thing to do is probably to sort the needles then iterate over the entire haystack and all the needles in tandem: linear-search for the first needle, then resume from there to look for the second, etc. You can even skip every second item in the haystack by recognizing that if a needle is greater than A and less than C, it must belong at position B (assuming you don't care about the left/right insertion order for needles not in the haystack). You can then do about len(haystack)/2 comparisons and the entire thing will be very cache-friendly (after sorting the needles, of course).
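A minimal sketch of the tandem scan, assuming both arrays are sorted ascending. The name `tandem_search` is hypothetical, and it reports lower-bound positions (the first index whose value is not less than the needle). It leaves out the skip-every-second-item refinement for clarity, so it does at most about n + k comparisons.

```c
/* hay[0..n) and keys[0..k) both sorted ascending.
   out[i] = first index j with hay[j] >= keys[i], or n if there is none. */
static void tandem_search(const int *hay, int n, const int *keys, int k, int *out) {
    int j = 0;
    for (int i = 0; i < k; i++) {
        while (j < n && hay[j] < keys[i])   /* resume where the last needle stopped */
            j++;
        out[i] = j;
    }
}
```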

One way to reuse knowledge from previous steps is like others suggested: once you have located a key, you can restrict the search ranges for the smaller and larger keys.
Assuming N=2^n, K=2^k and lucky outcomes:
After finding the middle key (n comparisons), you have two subarrays of size N/2. Perform 2 searches for the "quartile" keys (n-1 comparisons each), reducing to N/4 subarrays...
In total, n + 2(n-1) + 4(n-2) + ... + 2^(k-1)(n-k+1) comparisons. After a bit of math, this equals roughly K.n-K.k = K.(n-k).
This is a best-case scenario, and the savings are not very significant compared to independent searches (K.n comparisons). Anyway, the worst case (all searches resulting in imbalanced partitions) is no worse than independent searches.
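Spelling out the "bit of math" for the sum n + 2(n-1) + 4(n-2) + ... + 2^(k-1)(n-k+1): with K = 2^k and the standard identity that the sum of i*2^i over i = 0..k-1 equals (k-2)2^k + 2, it evaluates exactly to

```latex
\sum_{i=0}^{k-1} 2^i\,(n-i)
  = n\,(2^k - 1) - \sum_{i=0}^{k-1} i\,2^i
  = n\,(K-1) - \bigl((k-2)K + 2\bigr)
  = K\,(n-k) + 2K - n - 2
```

For example, k = 2 gives 3n - 2 = n + 2(n-1). The leftover 2K - n - 2 is small next to K(n-k) once n - k is reasonably large, which is where the K.(n-k) estimate comes from.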
UPDATE: this is an instance of the Minimum Comparison Merging problem
Finding the locations of the K keys in the array of N values is the same as merging the two sorted sequences.
From Knuth Vol. 3, Section 5.3.2, we know that at least ceiling(lg(C(N+K,K))) comparisons are required (because there are C(N+K,K) ways to intersperse the keys in the array). When K is much smaller than N, this is close to lg(N^K/K!), or roughly K lg(N) - K lg(K) = K.(n-k).
This bound cannot be beaten by any comparison-based method, so any such algorithm needs about K.(n-k) = K lg(N/K) comparisons, i.e. time that grows essentially in proportion to the number of keys.
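A minimal recursive sketch of the "locate the middle key, then restrict the ranges for the smaller and larger keys" idea from the start of this answer, assuming both arrays are sorted ascending. The names `lower_bound` and `multi_search` are hypothetical; each key's result is its lower-bound position (the first index whose value is not less than the key), and the haystack range [lo, hi) passed down is exactly the restriction learned from the previous search.

```c
/* First index in hay[lo, hi) whose value is >= key (hi if none). */
static int lower_bound(const int *hay, int lo, int hi, int key) {
    while (lo < hi) {
        int mid = lo + (hi - lo) / 2;
        if (hay[mid] < key) lo = mid + 1;
        else hi = mid;
    }
    return lo;
}

/* Locate sorted keys[klo, khi) within hay[lo, hi): search the middle key
   first, then recurse on each side with the narrowed haystack range.
   out[i] receives the lower-bound position of keys[i]. */
static void multi_search(const int *hay, int lo, int hi,
                         const int *keys, int klo, int khi, int *out) {
    if (klo >= khi) return;
    int kmid = klo + (khi - klo) / 2;
    int pos = lower_bound(hay, lo, hi, keys[kmid]);
    out[kmid] = pos;
    multi_search(hay, lo, pos, keys, klo, kmid, out);       /* smaller keys */
    multi_search(hay, pos, hi, keys, kmid + 1, khi, out);   /* larger keys */
}
```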

1. Sort the needles.
2. Search for the first needle.
3. Update the lower bound of the haystack with the search result.
4. Search for the last needle.
5. Update the upper bound of the haystack with the search result.
6. Go to 2.
While not optimal, it is much easier to implement.
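A sketch of those steps, assuming both arrays are sorted ascending and again reporting lower-bound positions; `two_ended_search` and `lower_bound` are hypothetical names. Once the smallest remaining needle is located, every later needle lies at or after it, and once the largest is located, every earlier needle lies at or before it, so the range [lo, hi) only ever shrinks.

```c
/* First index in hay[lo, hi) whose value is >= key (hi if none). */
static int lower_bound(const int *hay, int lo, int hi, int key) {
    while (lo < hi) {
        int mid = lo + (hi - lo) / 2;
        if (hay[mid] < key) lo = mid + 1;
        else hi = mid;
    }
    return lo;
}

/* keys[0..k) sorted. Alternately search the smallest and the largest
   remaining key, tightening the haystack range after each search. */
static void two_ended_search(const int *hay, int n, const int *keys, int k, int *out) {
    int lo = 0, hi = n;
    int first = 0, last = k - 1;
    while (first <= last) {
        out[first] = lower_bound(hay, lo, hi, keys[first]);
        lo = out[first++];                 /* update the lower bound */
        if (first > last) break;
        out[last] = lower_bound(hay, lo, hi, keys[last]);
        hi = out[last--];                  /* update the upper bound */
    }
}
```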

If you have an array of ints and want the minimum number of comparisons, I suggest interpolation search, from Knuth (Vol. 3, Section 6.2.1). Where binary search requires about log(N) iterations (and comparisons), interpolation search requires only about log(log(N)) on average, provided the values are roughly uniformly distributed; in the worst case it degrades to O(N).
For details and a code sample, see:
http://en.wikipedia.org/wiki/Interpolation_search
http://xlinux.nist.gov/dads//HTML/interpolationSearch.html
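A minimal sketch of interpolation search on a sorted int array; the function name is hypothetical. Instead of always probing the middle, it estimates the key's position by linear interpolation between a[lo] and a[hi]. The 64-bit arithmetic keeps the interpolation from overflowing, and the equal-endpoints check avoids division by zero on runs of identical values.

```c
/* Returns an index of key in the sorted array a[0..n-1], or -1. */
int interpolation_search(const int *a, int n, int key) {
    int lo = 0, hi = n - 1;
    while (lo <= hi && key >= a[lo] && key <= a[hi]) {
        if (a[hi] == a[lo])                    /* flat range: avoid / 0 */
            return a[lo] == key ? lo : -1;
        long long num = ((long long)key - a[lo]) * (hi - lo);
        long long den = (long long)a[hi] - a[lo];
        int pos = lo + (int)(num / den);       /* always within [lo, hi] */
        if (a[pos] == key)
            return pos;
        if (a[pos] < key)
            lo = pos + 1;
        else
            hi = pos - 1;
    }
    return -1;
}
```

On evenly spaced data such as 10, 20, ..., 100 the first probe lands directly on any present key.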

I know the question was about C, but I just did an implementation of this in JavaScript that I thought I'd share. It's not intended to work if you have duplicate elements in the array; I think it will just return any of the possible indexes in that case. For an array with 1 million elements where you search for each element, it's about 2.5x faster. If you also search for elements that are not contained in the array, it's even faster; on one data set I threw at it, it was several times faster. For small arrays it's about the same.
singleSearch = function(array, num) {
    return this.singleSearch_(array, num, 0, array.length);
};
singleSearch_ = function(array, num, left, right) {
    // Lower-bound binary search over [left, right)
    while (left < right) {
        var middle = (left + right) >> 1;
        var midValue = array[middle];
        if (num > midValue) {
            left = middle + 1;
        } else {
            right = middle;
        }
    }
    return left;
};
multiSearch = function(array, nums) {
    var numsLength = nums.length;
    var results = new Int32Array(numsLength);
    this.multiSearch_(array, nums, 0, array.length, 0, numsLength, results);
    return results;
};
multiSearch_ = function(array, nums, left, right, numsLeft, numsRight, results) {
    var middle = (left + right) >> 1;
    var midValue = array[middle];
    var numsMiddle = this.singleSearch_(nums, midValue, numsLeft, numsRight);
    if ((numsRight - numsLeft) > 1) {
        if (middle + 1 < right) {
            var newLeft = middle;
            var newRight = middle;
            if ((numsRight - numsMiddle) > 0) {
                this.multiSearch_(array, nums, newLeft, right, numsMiddle, numsRight, results);
            }
            if (numsMiddle - numsLeft > 0) {
                this.multiSearch_(array, nums, left, newRight, numsLeft, numsMiddle, results);
            }
        } else {
            for (var i = numsLeft; i < numsRight; i++) {
                var result = this.singleSearch_(array, nums[i], left, right);
                results[i] = result;
            }
        }
    } else {
        var result = this.singleSearch_(array, nums[numsLeft], left, right);
        results[numsLeft] = result;
    }
};

// A recursive binary-search-based function. It returns the index of x in
// the given array arr[l..r] if present, otherwise -1.
int binarySearch(int arr[], int l, int r, int x)
{
    if (r >= l)
    {
        int mid = l + (r - l) / 2;
        // If the element is present at one of the middle 3 positions
        if (arr[mid] == x) return mid;
        if (mid > l && arr[mid - 1] == x) return (mid - 1);
        if (mid < r && arr[mid + 1] == x) return (mid + 1);
        // If the element is smaller than arr[mid], it can only be present
        // in the left subarray
        if (arr[mid] > x) return binarySearch(arr, l, mid - 2, x);
        // Else the element can only be present in the right subarray
        return binarySearch(arr, mid + 2, r, x);
    }
    // We reach here when the element is not present in the array
    return -1;
}

Algorithm to find maximum sum of elements array such that not more than k elements are adjacent

Hi,
I came across this question: given an array containing only positive values, find the maximum sum that can be obtained by adding its elements, with the condition that you cannot pick more than k adjacent elements. My simple solution is this:
http://pastebin.com/s4KxjQRN
This solution does not produce the correct output in all cases. I am not able to figure out why.
Can anyone help? Thank you.

In your code, you just skip every (k+1)-th element. Sometimes it's better to skip more elements, but you have to choose them wisely (skip the lowest numbers, etc.).
Edit: here is a simple recursive solution (it's not efficient, but it works):
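/* Among the first k+1 elements at least one must be skipped; try each
   choice i of the first skipped element. Exponential time. */
long maxsum(int n, int k, long *profits) {
    long sum = 0, max = 0, cur;
    int i;
    if (n <= k) {                       /* short enough to take everything */
        for (i = 0; i < n; i++) sum += profits[i];
        return sum;
    }
    for (i = 0; i <= k; i++) {
        /* take profits[0..i-1], skip profits[i], solve the rest */
        cur = sum + maxsum(n - i - 1, k, profits + i + 1);
        if (cur > max) max = cur;
        sum += profits[i];
    }
    return max;
}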
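A more efficient alternative, sketched as a dynamic program in O(n*k) time instead of the exponential recursion above (the name `maxsum_dp` and this formulation are mine, not the answerer's). best[i] is the maximum sum using only the first i elements: in an optimal choice for that prefix, the last j elements (0 <= j <= k) are taken and the element just before them, if any, is skipped, which is the same case split the recursive version makes at the front of the array.

```c
#include <stdlib.h>

/* Maximum sum of profits[0..n-1] picking no more than k adjacent elements.
   O(n*k) time, O(n) extra space. Returns -1 if allocation fails
   (valid answers are always >= 0). */
long maxsum_dp(int n, int k, const long *profits) {
    long *prefix = malloc((size_t)(n + 1) * sizeof *prefix);
    long *best = malloc((size_t)(n + 1) * sizeof *best);
    if (prefix == NULL || best == NULL) {
        free(prefix);
        free(best);
        return -1;
    }
    prefix[0] = 0;                              /* prefix[i] = sum of first i */
    for (int i = 0; i < n; i++)
        prefix[i + 1] = prefix[i] + profits[i];
    best[0] = 0;
    for (int i = 1; i <= n; i++) {
        long b = best[i - 1];                   /* j = 0: skip element i-1 */
        for (int j = 1; j <= k && j <= i; j++) {
            long run = prefix[i] - prefix[i - j];   /* take the last j elements */
            /* element i-j-1 (just before the run) is skipped, if it exists */
            long cand = (j == i) ? run : best[i - j - 1] + run;
            if (cand > b)
                b = cand;
        }
        best[i] = b;
    }
    long result = best[n];
    free(prefix);
    free(best);
    return result;
}
```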

Find n-th smallest element in array without sorting?

I want to write a program to find the n-th smallest element without using any sorting technique.
Can we do it recursively, divide and conquer style like quick-sort?
If not, how?

You can find information about that problem here: Selection algorithm.

What you are referring to is the selection algorithm, as previously noted. Specifically, your reference to quicksort suggests you are thinking of partition-based selection.
Here's how it works:
Like in quicksort, you start by picking a good pivot: something that you think is nearly half-way through your list. Then you go through your entire list of items, swapping things back and forth until all the items less than your pivot are at the beginning of the list, and all things greater than your pivot are at the end. Your pivot goes into the leftover spot in the middle.
Normally in a quicksort you'd recurse on both sides of the pivot, but for the selection algorithm you only recurse on the side that contains the index you are interested in. So, if you want to find the 3rd lowest value, recurse on whichever side contains index 2 (because index 0 is the 1st lowest value).
You can stop recursing when you've narrowed the region to just the one index. At the end, you'll have one unsorted list of the m-1 smallest objects, and another unsorted list of the n-m largest objects. The mth object will be in between.
This algorithm is also good for finding a sorted list of the highest m elements: just select the mth largest element, and sort the list above it. Or, for an algorithm that is a little faster, run quicksort but decline to recurse into regions that do not overlap the region whose sorted values you want.
The really neat thing about this is that it normally runs in O(n) time. The first pass sees the entire list; the first recursion sees about half of it, then about a quarter, etc. So it looks at about 2n elements in total, and therefore runs in O(n) time. Unfortunately, as in quicksort, if you consistently pick a bad pivot, it runs in O(n^2) time.
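A minimal sketch of partition-based selection (quickselect) in C, written as a loop rather than recursion since only one side is ever revisited. The names are hypothetical; it uses a Lomuto partition with the middle element as pivot, a simple choice that still has the O(n^2) worst case mentioned above. m is 0-based (m = 2 gives the 3rd lowest value), 0 <= m < n is assumed, and the array is reordered in place.

```c
static void swap_int(int *x, int *y) { int t = *x; *x = *y; *y = t; }

/* Returns the m-th smallest element (0-based) of a[0..n-1]. */
int quickselect(int *a, int n, int m) {
    int lo = 0, hi = n - 1;                 /* m always stays in [lo, hi] */
    while (lo < hi) {
        /* move the middle element to hi and use it as the pivot */
        swap_int(&a[lo + (hi - lo) / 2], &a[hi]);
        int pivot = a[hi], store = lo;
        for (int i = lo; i < hi; i++)       /* smaller items to the front */
            if (a[i] < pivot)
                swap_int(&a[i], &a[store++]);
        swap_int(&a[store], &a[hi]);        /* pivot into its final slot */
        if (m == store)
            return a[store];
        if (m < store)
            hi = store - 1;                 /* continue on the left side only */
        else
            lo = store + 1;                 /* continue on the right side only */
    }
    return a[lo];
}
```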

This task can be completed in roughly O(n) time (n being the length of the list) by using a heap structure (specifically, a priority queue based on a Fibonacci heap, which gives O(1) insertion time and O(log n) amortized removal time).
Consider the task of retrieving the m-th smallest element from the list. By simply looping over the list and adding each item to the priority queue, you can build a queue of all the items in O(n) time (possibly less with some optimisations, though I'm not sure this is exceedingly helpful). Then it is a straightforward matter of removing the highest-priority element (the smallest item) m times, which takes O(m log n) time in total, and you're finished.
So overall, the time complexity of the algorithm is O(n + m log n); when m is small compared with n (for example, a constant), this reduces to simply O(n). I don't think you'll be able to get anything significantly more efficient than this in the general case.

You can use a binary heap if you don't want to use a Fibonacci heap.
Algorithm:
Construct a min binary heap from the array; this takes O(n) time.
Since this is a min binary heap, the element at the root is the minimum value.
Keep removing the element at the root until you get your kth minimum value; reading the root is an O(1) operation.
After every removal, restore the heap; that costs O(log n) per removal, so O(k log n) in total.
So the running time is O(n) + O(k log n), i.e. O(n + k log n).
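A sketch of those steps with an array-based binary min-heap; the names are hypothetical, k is 1-based (1 <= k <= n is assumed), and the array is reordered in place. Bottom-up heapify costs O(n), and each of the k - 1 removals costs O(log n).

```c
/* Restore the min-heap property below index i in a[0..n-1]. */
static void sift_down(int *a, int n, int i) {
    for (;;) {
        int smallest = i, l = 2 * i + 1, r = 2 * i + 2;
        if (l < n && a[l] < a[smallest]) smallest = l;
        if (r < n && a[r] < a[smallest]) smallest = r;
        if (smallest == i) return;
        int t = a[i]; a[i] = a[smallest]; a[smallest] = t;
        i = smallest;
    }
}

/* Returns the k-th smallest element of a[0..n-1]. */
int kth_smallest_heap(int *a, int n, int k) {
    for (int i = n / 2 - 1; i >= 0; i--)   /* bottom-up heapify: O(n) */
        sift_down(a, n, i);
    for (int popped = 0; popped < k - 1; popped++) {
        a[0] = a[--n];                      /* remove root: move last leaf up */
        sift_down(a, n, 0);                 /* restore the heap: O(log n) */
    }
    return a[0];                            /* root is now the k-th smallest */
}
```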

Two stacks can be used like this to locate the Nth smallest number in one pass:
Start with empty Stack-A and Stack-B.
PUSH the first number into Stack-A.
From the next number onwards, choose to PUSH into Stack-A only if the number is smaller than its top.
When you have to PUSH into Stack-A, run through these steps:
While the TOP of Stack-A is larger than the new number, POP the TOP of Stack-A and push it into Stack-B.
When Stack-A goes empty or its TOP is smaller than the new number, PUSH in the new number and restore the contents of Stack-B over it.
At this point you have inserted the new number into its correct (sorted) place in Stack-A, and Stack-B is empty again.
If Stack-A's depth is now sufficient, you have reached the end of your search.
I generally agree with Noldorin's optimization analysis.
This stack solution is a simple scheme that will work (with relatively more data movement across the two stacks). The heap scheme reduces the fetch for the Nth smallest number to a tree traversal (log m).
If your target is an optimal solution (say, for a large set of numbers, or for a programming assignment where optimization and demonstrating it are critical), you should use the heap technique.
The stack solution's space requirements can be reduced by implementing the two stacks within the same space of K elements (where K is the size of your data set). So the downside is just the extra stack movement as you insert.

Here is an answer that finds the Kth smallest element of an array without sorting (each distinct value is counted once):
#include <climits>
#include <iostream>
using namespace std;

// Returns the Nthsmall-th smallest distinct value (1-based), or INT_MAX
// if the array has fewer distinct values than that.
int GetNthSmall(const int numbers[], int NoOfElements, int Nthsmall);

int main()
{
    int size;
    cout << "Enter size of array\n";
    cin >> size;
    int *arr = new int[size];
    cout << "\nEnter array elements\n";
    for (int i = 0; i < size; i++)
        cin >> arr[i];
    cout << "\n";
    for (int i = 0; i < size; i++)
        cout << arr[i] << " ";
    cout << "\n";
    int result = GetNthSmall(arr, size, 3);
    cout << "Result = " << result << "\n";
    delete[] arr;
    return 0;
}

int GetNthSmall(const int numbers[], int NoOfElements, int Nthsmall)
{
    // First pass: find the minimum.
    int current = numbers[0];
    for (int i = 1; i < NoOfElements; i++)
        if (numbers[i] < current)
            current = numbers[i];
    // Each further pass finds the smallest value strictly greater than the
    // previous one. Starting the candidate at INT_MAX guarantees it is
    // larger than the previous value.
    for (int pass = 1; pass < Nthsmall; pass++) {
        int next = INT_MAX;
        for (int i = 0; i < NoOfElements; i++)
            if (numbers[i] > current && numbers[i] < next)
                next = numbers[i];
        current = next;
    }
    return current;
}

The simplest way to find the nth largest element in an array without using any sorting method (this assumes the elements are distinct):
public static void kthLargestElement() {
    int[] a = { 5, 4, 3, 2, 1, 9, 8 };
    int n = 3;
    // Find the overall minimum and maximum.
    int max = a[0], min = a[0];
    for (int i = 0; i < a.length; i++) {
        if (a[i] < min) {
            min = a[i];
        }
        if (a[i] > max) {
            max = a[i];
        }
    }
    // The n-th largest is the minimum itself when n == a.length.
    if (n == a.length) {
        System.out.println(min);
        return;
    }
    int max1 = max, c = 0;
    for (int i = 0; i < a.length; i++) {
        // Advance min to the smallest value greater than the current min.
        for (int j = 0; j < a.length; j++) {
            if (a[j] > min && a[j] < max) {
                max = a[j];
            }
        }
        min = max;
        max = max1;
        c++;
        if (c == (a.length - n)) {
            System.out.println(min);
        }
    }
}
