What algorithm gives O(log d) search in a sorted array?

The question is: "Suggest an algorithm that takes a sorted array and a value X and returns the index of X in the array, or -1 if X is not in the array. The time complexity of the algorithm should be O(log d), where d is the number of elements that are smaller than X."
I can't think of anything other than looking at the middle index, comparing it with X, and then repeating recursively on one half, but I don't think that is O(log d). I have homework to submit and I don't know what to do.

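Exponential search is O(log d).
Starting at upper = 0, compare array[upper] to value. While it is less than value (and upper is still inside the array), update upper = (upper + 1) * 2. If array[upper] equals value, return upper; otherwise perform a binary search over [upper / 2, upper), clamped to the array length. The gallop stops after O(log d) doublings and the remaining range holds O(d) elements, so the whole search is O(log d).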
In JavaScript it would look like this:
function exponentialSearch (array, value) {
  const n = array.length;
  let upper = 0;
  // exponential gallop: stop at the first probe >= value, or past the end
  while (upper < n && array[upper] < value) upper = (upper + 1) * 2;
  if (upper < n && array[upper] === value) return upper;
  // binary search over [upper / 2, min(upper, n))
  let lower = upper / 2;
  upper = Math.min(upper, n);
  while (lower < upper) {
    const bisect = lower + Math.floor((upper - lower) / 2);
    if (array[bisect] > value) upper = bisect;
    else if (array[bisect] < value) lower = bisect + 1; // + 1 guarantees progress
    else return bisect;
  }
  return -1;
}
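For comparison, here is a minimal Python sketch of the same idea. It is not the answer's code: the name `exponential_search` is made up for illustration, and the final binary search is delegated to the standard-library `bisect_left`, which accepts `lo` and `hi` bounds.

```python
from bisect import bisect_left

def exponential_search(arr, x):
    """Return an index of x in the sorted list arr, or -1 if x is absent."""
    n = len(arr)
    bound = 1
    # Gallop: double bound until arr[bound - 1] >= x or we run past the end.
    while bound <= n and arr[bound - 1] < x:
        bound *= 2
    lo = bound // 2        # every element before lo is known to be < x
    hi = min(bound, n)     # clamp the window to the array
    i = bisect_left(arr, x, lo, hi)
    return i if i < n and arr[i] == x else -1

print(exponential_search([1, 3, 5, 7, 9, 11], 7))
print(exponential_search([1, 3, 5, 7, 9, 11], 4))
```

The gallop takes about log2(d) steps and the window handed to `bisect_left` has at most about d elements, which is where the O(log d) bound comes from.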

Minimum number of operations to make pair sums of array equal

You are given a list of integers nums of even length. Consider an operation where you pick any number in nums and update it with a value between [1, max(nums)]. Return the number of operations required such that for every i, nums[i] + nums[n - 1 - i] equals to the same number. The problem can be solved greedily.
Note: n is the size of the array and max(nums) is the maximum element in nums.
For example: nums = [1,5,4,5,9,3] the expected operations are 2.
Explanation: The maxnums is 9, so I can change any element of nums to any number between [1, 9] which costs one operation.
Choose 1 at index 0 and change it to 6
Choose 9 at index 4 and change it to 4.
Now this makes the nums[0] + nums[5] = nums[1] + nums[4] = nums[2] + nums[3] = 9. We had changed 2 numbers and it cost us 2 operations which is the minimum for this input.
The approach that I've used is to find the median of the sums and use that to find the number of operations greedily.
Let us find the all the sums of the array based on the given condition.
Sums can be calculated by nums[i] + nums[n-1-i].
Let i = 0, nums[0] + nums[6-1-0] = 4.
i = 1, nums[1] + nums[6-1-1] = 14.
i = 2, nums[2] + nums[6-1-2] = 9.
Store these sums in an array and sort it.
sums = [4,9,14] after sorting. Now find the median from sums which is 9 as it is the middle element.
Now I use this median to equalize the sums and we can find the number of operations. I've also added the code that I use to calculate the number of operations.
int operations = 0;
for(int i=0; i<nums.size()/2; i++) {
    if(nums[i] + nums[nums.size()-1-i] == mid)
        continue;
    if(nums[i] + nums[nums.size()-1-i] > mid) {
        if(nums[i] + 1 <= mid || 1 + nums[nums.size()-1-i] <= mid) {
            operations++;
        } else {
            operations += 2;
        }
    } else if (maxnums + nums[nums.size()-1-i] >= mid || nums[i] + maxnums >= mid) {
        operations++;
    } else {
        operations += 2;
    }
}
The total operations for this example is 2 which is correct.
The problem here is that, for some cases choosing the median gives the wrong result. For example, the nums = [10, 7, 2, 9, 4, 1, 7, 3, 10, 8] expects 5 operations but my code gives 6 if the median (16) was chosen.
Is choosing the median not the most optimal approach? Can anyone help provide a better approach?
I think the following should work:
iterate pairs of numbers
for each pair, calculate the sum of that pair, as well as the min and max sum that can be achieved by changing just one of the values
update a dictionary/map with -1 when starting a new "region" requiring one fewer change, and +1 when that region is over
iterate the boundaries in that dictionary and update the total changes needed to find the sum that requires the fewest updates
Example code in Python, giving 9 as the best sum for your example, requiring 5 changes.
from collections import defaultdict

nums = [10, 7, 2, 9, 4, 1, 7, 3, 10, 8]
m = max(nums)
pairs = [(nums[i], nums[-1-i]) for i in range(len(nums)//2)]
print(pairs)
score = defaultdict(int)
for a, b in map(sorted, pairs):
    low = a + 1
    high = m + b
    score[low] -= 1
    score[a+b] -= 1
    score[a+b+1] += 1
    score[high+1] += 1
print(sorted(score.items()))
cur = best = len(nums)
num = None
for i in sorted(score):
    cur += score[i]
    print(i, cur)
    if cur < best:
        best, num = cur, i
print(best, num)
The total complexity of this should be O(nlogn), needing O(n) to create the dictionary, O(nlogn) for sorting, and O(n) for iterating the sorted values in that dictionary. (Do not use an array or the complexity could be much higher if max(nums) >> len(nums))
(Updated after receiving additional information)
The optimal sum must be one of the following:
a sum of a pair -> because you can keep both numbers of that pair
the min value of a pair + 1 -> because it is the smallest possible sum you only need to change 1 of the numbers for that pair
the max value of a pair + the max overall value -> because it is the largest possible sum you only need to change 1 of the numbers for that pair
Hence, there are order N possible sums.
The total number of operations for this optimal sum can be calculated in various ways.
The O(N²) approach is quite trivial, and you can implement it easily if you want to confirm that other solutions work.
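A minimal Python sketch of that O(N²) check, useful as a reference when testing faster solutions. The function name `min_operations_bruteforce` is made up for illustration, and the sketch assumes a non-empty list of even length whose values are all at least 1. It tries each of the O(N) candidate sums listed above and counts the changes every pair would need.

```python
def min_operations_bruteforce(nums):
    n = len(nums)
    m = max(nums)
    # Each pair as [smaller, larger].
    pairs = [sorted((nums[i], nums[n - 1 - i])) for i in range(n // 2)]
    # Candidate sums: the pair sum, smaller + 1, larger + max(nums).
    candidates = set()
    for lo, hi in pairs:
        candidates.update((lo + hi, lo + 1, hi + m))
    best = n  # changing every element always works
    for target in candidates:
        ops = 0
        for lo, hi in pairs:
            if lo + hi == target:
                continue          # pair already has the target sum
            elif lo + 1 <= target <= hi + m:
                ops += 1          # keep one element, change the other
            else:
                ops += 2          # both elements must change
        best = min(best, ops)
    return best

print(min_operations_bruteforce([1, 5, 4, 5, 9, 3]))
print(min_operations_bruteforce([10, 7, 2, 9, 4, 1, 7, 3, 10, 8]))
```

The one-change test relies on the fact that keeping an element c lets the pair reach any sum from c + 1 to c + max(nums), and the two ranges for the smaller and larger element always overlap or touch.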
Making it O(N log N)
getting all possible optimal sums O(N)
for each possible sum you can calculate occ the number of pairs having that exact sum and thus don't require any manipulation. O(N)
For all other pairs you just need to know whether reaching that sum takes 1 or 2 operations. It takes 2 when the smaller element of the pair is too big to reach the sum even when paired with the smallest possible number, or when the larger element of the pair is too small to reach the sum even when paired with the largest possible number. Many data structures could be used for that (BIT, tree, ...). I just used a sorted list and applied binary search (not exhaustively tested though). O(N log N)
Example solution in java:
int[] nums = new int[] {10, 7, 2, 9, 4, 1, 7, 3, 10, 8};
// preprocess pairs: O(N)
int min = 1
, max = nums[0];
List<Integer> minList = new ArrayList<>();
List<Integer> maxList = new ArrayList<>();
Map<Integer, Integer> occ = new HashMap<>();
for (int i=0;i<nums.length/2;i++) {
int curMin = Math.min(nums[i], nums[nums.length-1-i]);
int curMax = Math.max(nums[i], nums[nums.length-1-i]);
min = Math.min(min, curMin);
max = Math.max(max, curMax);
minList.add(curMin);
maxList.add(curMax);
// create all pair sums
int pairSum = nums[i] + nums[nums.length-1-i];
int currentOccurences = occ.getOrDefault(pairSum, 0);
occ.put(pairSum, currentOccurences + 1);
}
// sorting O(N log N)
Collections.sort(minList);
Collections.sort(maxList);
// border cases
for (int a : minList) {
occ.putIfAbsent(a + max, 0);
}
for (int a : maxList) {
occ.putIfAbsent(a + min, 0);
}
// loop over all candidates O(N log N)
int best = (nums.length-2);
int med = max + min;
for (Map.Entry<Integer, Integer> entry : occ.entrySet()) {
int sum = entry.getKey();
int count = entry.getValue();
int requiredChanges = (nums.length / 2) - count;
if (sum > med) {
// border case where max of pair is too small to be changed to pair of sum
requiredChanges += countSmaller(maxList, sum - max);
} else if (sum < med) {
// border case where having a min of pair is too big to be changed to pair of sum
requiredChanges += countGreater(minList, sum - min);
}
System.out.println(sum + " -> " + requiredChanges);
best = Math.min(best, requiredChanges);
}
System.out.println("Result: " + best);
}
// O(log N)
private static int countGreater(List<Integer> list, int key) {
int low=0, high=list.size();
while(low < high) {
int mid = (low + high) / 2;
if (list.get(mid) <= key) {
low = mid + 1;
} else {
high = mid;
}
}
return list.size() - low;
}
// O(log N)
private static int countSmaller(List<Integer> list, int key) {
int low=0, high=list.size();
while(low < high) {
int mid = (low + high) / 2;
if (list.get(mid) < key) {
low = mid + 1;
} else {
high = mid;
}
}
return low;
}
Just to offer some theory: we can easily show that the upper bound for needed changes is n / 2, where n is the number of elements. This is because each pair can be made, with one change, to sum to anything between 1 + C and max(nums) + C, where C is either of the two elements in the pair. For the smallest C, the top of that range is at least max(nums) + 1; and for the largest C, the bottom of that range is at most 1 + max(nums).
Since those two bounds meet at max(nums) + 1 in the worst case, we are guaranteed there is some solution with at most n / 2 changes that leaves at least one C (array element) unchanged.
From that we conclude that an optimal solution either (1) has at least one pair where neither element is changed and the rest require only one change per pair, or (2) our optimal solution has n / 2 changes as discussed above.
We can therefore proceed to test each existing pair's single or zero change possibilities as candidates. We can iterate over a sorted list of two to three possibilities per pair, labeled with each cost and index. (Other authors on this page have offered similar ways and code.)

O(log n) algorithm to find best insert position in sorted array

I'm trying to make an algorithm that finds the best position to insert the target into the already sorted array.
The goal is to either return the position of the item if it exists in the list, else return the position it would go into to keep the list sorted.
So say I have a list:
  0   1   2   3   4    5     6
---------------------------------
| 1 | 2 | 4 | 9 | 10 | 39 | 100 |
---------------------------------
And my target item is 14
It should return an index position of 5
Pseudo-code I currently have:
array = generateSomeArrayOfOrderedNumbers()
number findBestIndex(target, start, end)
    mid = abs(end - start) / 2
    if (mid < 2)
        // Not really sure what to put here
        return start + 1 // ??
    if (target < array[mid])
        // The target belongs on the left side of our list //
        return findBestIndex(target, start, mid - 1)
    else
        // The target belongs on the right side of our list //
        return findBestIndex(target, mid + 1, end)
I'm not really sure what to put at this point. I tried to take a binary search approach to this, but this is the best I could come up with after 5 rewrites or so.
There are several problems with your code:
mid = abs(end - start) / 2
This is not the middle between start and end, it's half the distance between them (rounded down to an integer). Later you use it like it was indeed a valid index:
findBestIndex(target, start, mid - 1)
Which it is not. You probably meant to use mid = (start + end) // 2 or something here.
You also miss a few indices because you skip over the mid:
return findBestIndex(target, start, mid - 1)
...
return findBestIndex(target, mid + 1, end)
Your base case must now be expressed a bit differently as well. A good candidate is the condition
if start == end
Because now you definitely know you're finished searching. Note that you also should consider the case where all the array elements are smaller than target, so you need to insert it at the end.
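Putting those fixes together, a recursive version could look like the following Python sketch. It is one possible reading of the advice above rather than the answerer's own code: the name `find_best_index` and the half-open range convention `[start, end)` are assumptions made for illustration.

```python
def find_best_index(array, target, start=0, end=None):
    """Index of target in sorted array, or the index where it would be inserted."""
    if end is None:
        end = len(array)              # search the half-open range [start, end)
    if start == end:                  # base case: the range is empty
        return start
    mid = (start + end) // 2          # a real index between start and end
    if target <= array[mid]:
        # mid itself may be the answer, so keep it reachable as the new end
        return find_best_index(array, target, start, mid)
    return find_best_index(array, target, mid + 1, end)

numbers = [1, 2, 4, 9, 10, 39, 100]
print(find_best_index(numbers, 14))
print(find_best_index(numbers, 9))
```

Because `end` starts at `len(array)`, a target larger than every element falls out naturally as an insert at the end.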
I don't often search binary, but if I do, this is how
Binary search is something that is surprisingly hard to get right if you've never done it before. I usually use the following pattern if I do a binary search:
lo, hi = 0, n // [lo, hi] is the search range, but hi will never be inspected.
while lo < hi:
    mid = (lo + hi) // 2
    if check(mid): hi = mid
    else: lo = mid + 1
Under the condition that check is a monotone binary predicate (it is always false up to some point and true from that point on), after this loop, lo == hi will be the first number in the range [0..n] with check(lo) == true. check(n) is implicitly assumed to be true (that's part of the magic of this approach).
So what is a monotone predicate that is true for all indices including and after our target position and false for all positions before?
If we think about it, we want to find the first number in the array that is larger than our target, so we just plug that in and we're good to go:
lo, hi = 0, n
while lo < hi:
    mid = (lo + hi) // 2
    if (a[mid] > target): hi = mid
    else: lo = mid + 1
return lo;
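The pattern can be written once and reused with different predicates. The sketch below is illustrative only; `first_true`, `insert_position` and `position_after_equal` are made-up names. Note that the predicate `a[mid] > target` used above returns the slot just after any elements equal to the target, while `a[mid] >= target` returns the index of an existing item, which is what the question asks for.

```python
def first_true(n, check):
    """Smallest i in [0, n] with check(i) true; check(n) is treated as true."""
    lo, hi = 0, n
    while lo < hi:
        mid = (lo + hi) // 2
        if check(mid):
            hi = mid
        else:
            lo = mid + 1
    return lo

def insert_position(a, target):
    # First index whose element is >= target (index of the item if present).
    return first_true(len(a), lambda i: a[i] >= target)

def position_after_equal(a, target):
    # First index whose element is > target (slot after any equal items).
    return first_true(len(a), lambda i: a[i] > target)

a = [1, 2, 4, 9, 10, 39, 100]
print(insert_position(a, 14), position_after_equal(a, 14))  # 5 5
print(insert_position(a, 9), position_after_equal(a, 9))    # 3 4
```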
this is the code I have used:
int binarySearch( float arr[] , float x , int low , int high )
{
int mid;
while( low < high ) {
mid = ( high + low ) / 2;
if( arr[mid]== x ) {
break;
}
else if( arr[mid] > x ) {
high=mid-1;
}
else {
low= mid+1;
}
}
mid = ( high + low ) / 2;
if (x<=arr[mid])
return mid;
else
return mid+1;
}
The point is that even when low becomes equal to high, you still have to check that element.
Consider this example:
0.5 -> 0.75
Suppose you are looking for the correct position of 0.7 or of 1.
In both cases, on leaving the while loop, low = high = 1,
but one of them should be placed in position 1 and the other in position 2.
You are on the right track.
First, you do not need abs in mid = abs(end - start) / 2
I assume abs here means absolute value. Since end should never be less than start unless there is a mistake in your code, abs never helps here; it can only hide a problem and make it harder to debug.
You do not need the if (mid < 2) section either; there is nothing special about mid being smaller than two.
array = generateSomeArrayOfOrderedNumbers()
int start = 0;
int end = array.size(); // half-open search range [start, end)
int findBestIndex(target, start, end){
    if (start == end){ // nothing left to search: this is the position to insert
        return start;
    }
    mid = (start + end) / 2
    // find correct position
    if(target == array[mid]) return mid;
    if (target < array[mid])
    {
        // The target belongs on the left side of our list //
        return findBestIndex(target, start, mid)
    }
    else
    {
        // The target belongs on the right side of our list //
        return findBestIndex(target, mid + 1, end)
    }
}
I solved this by counting the number of elements that are strictly smaller (<) than the key to insert. The retrieved count is the insert position. Here is a ready to use implementation in Java:
int binarySearchCount(int array[], int left, int right, int key) {
if(left > right) {
return -1; // or throw exception
}
int mid = -1; //init with arbitrary value
while (left <= right) {
// Middle element
mid = (left + right) / 2;
// If the search key on the left half
if (key < array[mid]) {
right = mid - 1;
}
// If the search key on the right half
else if (key > array[mid]) {
left = mid + 1;
}
// We found the key
else {
// handle duplicates
while(mid > 0 && array[mid-1] == array[mid]) {
--mid;
}
break;
}
}
// return the number of elements that are strictly smaller (<) than the key
return key <= array[mid] ? mid : mid + 1;
}
Below is code that looks up each value of a target array in a sorted array that contains duplicate values.
It returns an array with the position at which each target value can be inserted.
I hope this code helps. Any suggestions are welcome.
static int[] climbingLeaderboard(int[] scores, int[] alice) {
int[] noDuplicateScores = IntStream.of(scores).distinct().toArray();
int[] rank = new int[alice.length];
for (int k = 0; k < alice.length; k++) {
int i=0;
int j = noDuplicateScores.length-1;
int pos=0;
int target = alice[k];
while(i<=j) {
int mid = (j+i)/2;
if(target < noDuplicateScores[mid]) {
i = mid +1;
pos = i;
}else if(target > noDuplicateScores[mid]) {
j = mid-1;
pos = j+1;
}else {
pos = mid;
break;
}
}
rank[k] = pos+1;
}
return rank;
}
Here is a solution that tweaks binary search, in Python.
def func(x, y):
    start = 0
    end = len(x)  # half-open search range [start, end)
    while start < end:
        mid = (start + end) // 2
        if x[mid] < y:
            start = mid + 1  # insertion point is to the right of mid
        else:
            end = mid  # x[mid] >= y, so mid is still a candidate
    return start
print(func([1,2,4,5], 3))
Solution with a slightly modified binary search in Java:
int findInsertionIndex(int[] arr, int t) {
    int s = 0, e = arr.length - 1;
    if (e < 0 || t < arr[s]) return 0; // empty array, or t goes first
    if (t > arr[e]) return e + 1;      // t goes after the last element
    while (s < e) {
        int mid = (s + e) / 2;
        if (arr[mid] >= t) {
            e = mid - 1;
        }
        if (arr[mid] < t) {
            s = mid + 1;
        }
    }
    return arr[s] < t ? s + 1 : s;
}
The above code covers these scenarios:
If arr[mid] > target, the insertion index lies in the left half, so the search continues there.
If arr[mid] < target, the insertion index lies in the right half; when the loop ends on an element smaller than the target, that element's index + 1 is returned as the insertion index.
If arr[mid] == target, the search also continues to the left, so the first occurrence of the target is found and its index returned.

Algorithm for finding a combination of integers greater than a specified value

I've been trying to develop an algorithm that would take an input array and return an array such that the integers contained within are the combination of integers with the smallest sum greater than a specified value (limited to a combination of size k).
For instance, if I have the array [1,4,5,10,17,34] and I specified a minimum sum of 31, the function would return [1,4,10,17]. Or, if I wanted it limited to a max array size of 2, it would just return [34].
Is there an efficient way to do this? Any help would be appreciated!
Something like this? It returns the value, but could easily be adapted to return the sequence.
Algorithm: assuming sorted input, test the k-length combinations for the smallest sum greater than min, stop after the first array element greater than min.
JavaScript:
var roses = [1,4,5,10,17,34]
function f(index,current,k,best,min,K)
{
if (roses.length == index)
return best
for (var i = index; i < roses.length; i++)
{
var candidate = current + roses[i]
if (candidate == min + 1)
return candidate
if (candidate > min)
best = best < 0 ? candidate : Math.min(best,candidate)
if (roses[i] > min)
break
if (k + 1 < K)
{
var nextCandidate = f(i + 1,candidate,k + 1,best,min,K)
if (nextCandidate > min)
best = best < 0 ? nextCandidate : Math.min(best,nextCandidate)
if (best == min + 1)
return best
}
}
return best
}
Output:
console.log(f(0,0,0,-1,31,3))
32
console.log(f(0,0,0,-1,31,2))
34
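To return the combination itself rather than just its sum, a brute-force reference is easy to write. The Python sketch below is not an adaptation of the code above; it simply enumerates every combination of up to `max_size` elements with `itertools.combinations`, so it is exponential and only suitable for small inputs or for checking a faster solution. The name `smallest_combo_above` is made up for illustration.

```python
from itertools import combinations

def smallest_combo_above(nums, minimum, max_size):
    """Combination of at most max_size numbers with the smallest sum > minimum."""
    best = None
    for size in range(1, max_size + 1):
        for combo in combinations(nums, size):
            total = sum(combo)
            # Keep the first combination that achieves the smallest valid sum.
            if total > minimum and (best is None or total < sum(best)):
                best = combo
    return list(best) if best is not None else None

roses = [1, 4, 5, 10, 17, 34]
print(smallest_combo_above(roses, 31, 3))
print(smallest_combo_above(roses, 31, 2))
```

When several combinations share the smallest sum, this sketch returns the one found first, which is the one with the fewest elements.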
This is more of a hybrid solution, with Dynamic Programming and Back Tracking. We can use Back Tracking alone to solve this problem, but then we have to do exhaustive searching (2^N) to find the solution. The DP part optimizes the search space in Back Tracking.
import sys
from collections import OrderedDict

MinimumSum = 31
MaxArraySize = 4
InputData = sorted([1,4,5,10,17,34])
# Input part is over
Target = MinimumSum + 1
Previous, Current = OrderedDict({0:0}), OrderedDict({0:0})
for Number in InputData:
    for CurrentNumber, Count in Previous.items():
        if Number + CurrentNumber in Current:
            Current[Number + CurrentNumber] = min(Current[Number + CurrentNumber], Count + 1)
        else:
            Current[Number + CurrentNumber] = Count + 1
    Previous = Current.copy()
FoundSolution = False
for Number, Count in Previous.items():
    if (Number >= Target and Count < MaxArraySize):
        MaxArraySize = Count
        Target = Number
        FoundSolution = True
        break
if not FoundSolution:
    print("Not possible")
    sys.exit(0)
else:
    print(Target, MaxArraySize)
FoundSolution = False
Solution = []
def Backtrack(CurrentIndex, Sum, MaxArraySizeUsed):
    global FoundSolution
    if (MaxArraySizeUsed <= MaxArraySize and Sum == Target):
        FoundSolution = True
        return
    if (CurrentIndex == len(InputData) or MaxArraySizeUsed > MaxArraySize or Sum > Target):
        return
    for i in range(CurrentIndex, len(InputData)):
        Backtrack(i + 1, Sum, MaxArraySizeUsed)
        if (FoundSolution): return
        Backtrack(i + 1, Sum + InputData[i], MaxArraySizeUsed + 1)
        if (FoundSolution):
            Solution.append(InputData[i])
            return
Backtrack(0, 0, 0)
print(sorted(Solution))
Note: As per the examples given by you in the question, Minimum sum and Maximum Array Size are strictly greater and lesser than the values specified, respectively.
For this input
MinimumSum = 31
MaxArraySize = 4
InputData = sorted([1,4,5,10,17,34])
Output is
[5, 10, 17]
where as, for this input
MinimumSum = 31
MaxArraySize = 3
InputData = sorted([1,4,5,10,17,34])
Output is
[34]
Explanation
Target = MinimumSum + 1
Previous, Current = OrderedDict({0:0}), OrderedDict({0:0})
for Number in InputData:
    for CurrentNumber, Count in Previous.items():
        if Number + CurrentNumber in Current:
            Current[Number + CurrentNumber] = min(Current[Number + CurrentNumber], Count + 1)
        else:
            Current[Number + CurrentNumber] = Count + 1
    Previous = Current.copy()
This part of the program finds, for every sum that can be formed from the input data (up to the sum of all the input data), the minimum number of input numbers required to make that sum. It's a dynamic programming solution to the knapsack problem, which you can read about on the internet.
FoundSolution = False
for Number, Count in Previous.items():
    if (Number >= Target and Count < MaxArraySize):
        MaxArraySize = Count
        Target = Number
        FoundSolution = True
        break
if not FoundSolution:
    print("Not possible")
    sys.exit(0)
else:
    print(Target, MaxArraySize)
This part of the program finds the Target value that satisfies the MaxArraySize criterion.
def Backtrack(CurrentIndex, Sum, MaxArraySizeUsed):
    global FoundSolution
    if (MaxArraySizeUsed <= MaxArraySize and Sum == Target):
        FoundSolution = True
        return
    if (CurrentIndex == len(InputData) or MaxArraySizeUsed > MaxArraySize or Sum > Target):
        return
    for i in range(CurrentIndex, len(InputData)):
        Backtrack(i + 1, Sum, MaxArraySizeUsed)
        if (FoundSolution): return
        Backtrack(i + 1, Sum + InputData[i], MaxArraySizeUsed + 1)
        if (FoundSolution):
            Solution.append(InputData[i])
            return
Backtrack(0, 0, 0)
Now that we know a solution exists, we want to reconstruct it. We use the backtracking technique here. You can easily find a lot of good tutorials about this on the internet as well.

Find the first element in a sorted array that is greater than the target

In a general binary search, we are looking for a value which appears in the array. Sometimes, however, we need to find the first element which is either greater or less than a target.
Here is my ugly, incomplete solution:
// Assume all elements are positive, i.e., greater than zero
int bs (int[] a, int t) {
int s = 0, e = a.length;
int firstlarge = 1 << 30;
int firstlargeindex = -1;
while (s < e) {
int m = (s + e) / 2;
if (a[m] > t) {
// how can I know a[m] is the first larger than
if(a[m] < firstlarge) {
firstlarge = a[m];
firstlargeindex = m;
}
e = m - 1;
} else if (a[m] < /* something */) {
// go to the right part
// how can i know is the first less than
}
}
}
Is there a more elegant solution for this kind of problem?
One way of thinking about this problem is to think about doing a binary search over a transformed version of the array, where the array has been modified by applying the function
f(x) = 1 if x > target
       0 otherwise
Now, the goal is to find the very first place that this function takes on the value 1. We can do that using a binary search as follows:
int low = 0, high = numElems; // numElems is the size of the array, i.e. arr.size()
while (low != high) {
    int mid = (low + high) / 2; // Or a fancy way to avoid int overflow
    if (arr[mid] <= target) {
        /* This index, and everything below it, must not be the first element
         * greater than what we're looking for because this element is no greater
         * than the target.
         */
        low = mid + 1;
    }
    else {
        /* This element is greater than the target, so anything after it can't
         * be the first element that's greater.
         */
        high = mid;
    }
}
/* Now, low and high both point to the element in question. */
To see that this algorithm is correct, consider each comparison being made. If we find an element that's no greater than the target element, then it and everything below it can't possibly match, so there's no need to search that region. We can recursively search the right half. If we find an element that is larger than the element in question, then anything after it must also be larger, so they can't be the first element that's bigger and so we don't need to search them. The middle element is thus the last possible place it could be.
Note that on each iteration we drop off at least half the remaining elements from consideration. If the top branch executes, then the elements in the range [low, (low + high) / 2] are all discarded, causing us to lose floor((low + high) / 2) - low + 1 >= (low + high) / 2 - low = (high - low) / 2 elements.
If the bottom branch executes, then the elements in the range [(low + high) / 2 + 1, high] are all discarded. This loses us high - floor((low + high) / 2) >= high - (low + high) / 2 = (high - low) / 2 elements.
Consequently, we'll end up finding the first element greater than the target in O(lg n) iterations of this process.
Here's a trace of the algorithm running on the array 0 0 1 1 1 1 with target 0.
Initially, we have
0 0 1 1 1 1
L = 0       H = 6
So we compute mid = (0 + 6) / 2 = 3 and inspect the element at position 3, which has value 1. Since 1 > 0, we set high = mid = 3. We now have
0 0 1
L     H
We compute mid = (0 + 3) / 2 = 1, so we inspect element 1. Since this has value 0 <= 0, we set low = mid + 1 = 2. We're now left with L = 2 and H = 3:
0 0 1
    L H
Now, we compute mid = (2 + 3) / 2 = 2. The element at index 2 is 1, and since 1 > 0, we set H = mid = 2, at which point we stop, and indeed we're looking at the first element greater than 0.
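The same loop, transcribed into Python as a small sketch so the trace can be reproduced. The function name `first_greater` is chosen here for illustration; the standard-library `bisect.bisect_right` computes the same index and is used only as a cross-check.

```python
from bisect import bisect_right

def first_greater(arr, target):
    """Index of the first element > target, or len(arr) if there is none."""
    low, high = 0, len(arr)
    while low != high:
        mid = (low + high) // 2
        if arr[mid] <= target:
            low = mid + 1   # mid and everything before it is <= target
        else:
            high = mid      # mid might be the answer; nothing after it can be
    return low

data = [0, 0, 1, 1, 1, 1]
print(first_greater(data, 0))                          # 2, as in the trace
print(first_greater(data, 0) == bisect_right(data, 0))  # True
```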
You can use std::upper_bound if the array is sorted (assuming n is the size of array a[]):
int* p = std::upper_bound( a, a + n, x );
if( p == a + n )
std::cout << "No element greater";
else
std::cout << "The first element greater is " << *p
<< " at position " << p - a;
After many years of teaching algorithms, my approach for solving binary search problems is to set the start and the end on the elements, not outside of the array. This way I can feel what's going on and everything is under control, without feeling magic about the solution.
The key point in solving binary search problems (and many other loop-based solutions) is a set of good invariants. Choosing the right invariant makes problem-solving a piece of cake. It took me many years to grasp the invariant concept, although I first learned it in college many years ago.
Even if you want to solve binary search problems by choosing start or end outside of the array, you can still achieve it with a proper invariant. That being said, my choice is stated above to always set a start on the first element and end on the last element of the array.
So to summarize, so far we have:
int start = 0;
int end = a.length - 1;
Now the invariant. The array right now we have is [start, end]. We don't know anything yet about the elements. All of them might be greater than the target, or all might be smaller, or some smaller and some larger. So we can't make any assumptions so far about the elements. Our goal is to find the first element greater than the target. So we choose the invariants like this:
Any element to the right of the end is greater than the target. Any
element to the left of the start is smaller than or equal to the
target.
We can easily see that our invariant is correct at the start (ie before going into any loop). All the elements to the left of the start (no elements basically) are smaller than or equal to the target, same reasoning for the end.
With this invariant, when the loop finishes, the first element after the end will be the answer (remember the invariant that the right side of the end are all greater than the target?). So answer = end + 1.
Also, we need to note that when the loop finishes, the start will be one more than the end. ie start = end + 1. So equivalently we can say start is the answer as well (invariant was that anything to the left of the start is smaller than or equal to the target, so start itself is the first element larger than the target).
So everything being said, here is the code.
public static int find(int a[], int target) {
    int st = 0;
    int end = a.length - 1;
    while(st <= end) {
        int mid = (st + end) / 2; // or the overflow-safe st + (end - st) / 2
        if (a[mid] <= target) {
            st = mid + 1;
        } else { // a[mid] > target
            end = mid - 1;
        }
    }
    return st; // or return end + 1
}
A few extra notes about this way of solving binary search problems:
This type of solution always shrinks the size of subarrays by at least 1. This is obvious in the code. The new start or end are either +1 or -1 in the mid. I like this approach better than including the mid in both or one side, and then reason later why the algo is correct. This way it's more tangible and more error-free.
The condition for the while loop is st <= end. Not st < end. That means the smallest size that enters the while loop is an array of size 1. And that totally aligns with what we expect. In other ways of solving binary search problems, sometimes the smallest size is an array of size 2 (if st < end), and honestly I find it much easier to always address all array sizes including size 1.
So hope this clarifies the solution for this problem and many other binary search problems. Treat this solution as a way to professionally understand and solve many more binary search problems without ever wobbling whether the algorithm works for edge cases or not.
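To make the invariant concrete, here is a Python sketch of the same loop with the invariant written as assertions. The assertions scan the array, so they are for illustration and testing only and would be removed in real code; the function name `find_checked` is made up here.

```python
def find_checked(a, target):
    """Index of the first element > target, checking the invariant each step."""
    st, end = 0, len(a) - 1
    while st <= end:
        # Invariant: everything left of st is <= target,
        # everything right of end is > target.
        assert all(v <= target for v in a[:st])
        assert all(v > target for v in a[end + 1:])
        mid = st + (end - st) // 2
        if a[mid] <= target:
            st = mid + 1
        else:
            end = mid - 1
    # On exit the two bounds have just crossed.
    assert st == end + 1
    return st

print(find_checked([1, 3, 3, 5, 8], 3))
print(find_checked([1, 3, 3, 5, 8], 9))
```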
How about the following recursive approach:
public static int minElementGreaterThanOrEqualToKey(int A[], int key,
int imin, int imax) {
// Return -1 if the maximum value is less than the minimum or if the key
// is great than the maximum
if (imax < imin || key > A[imax])
return -1;
// Return the first element of the array if that element is greater than
// or equal to the key.
if (key < A[imin])
return imin;
// When the minimum and maximum values become equal, we have located the element.
if (imax == imin)
return imax;
else {
// calculate midpoint to cut set in half, avoiding integer overflow
int imid = imin + ((imax - imin) / 2);
// if key is in upper subset, then recursively search in that subset
if (A[imid] < key)
return minElementGreaterThanOrEqualToKey(A, key, imid + 1, imax);
// if key is in lower subset, then recursively search in that subset
else
return minElementGreaterThanOrEqualToKey(A, key, imin, imid);
}
}
public static int search(int target, int[] arr) {
if (arr == null || arr.length == 0)
return -1;
int lower = 0, higher = arr.length - 1, last = -1;
while (lower <= higher) {
int mid = lower + (higher - lower) / 2;
if (target == arr[mid]) {
last = mid;
lower = mid + 1;
} else if (target < arr[mid]) {
higher = mid - 1;
} else {
lower = mid + 1;
}
}
return (last > -1 && last < arr.length - 1) ? last + 1 : -1;
}
If we find target == arr[mid], then any previous element would be either less than or equal to the target. Hence, the lower boundary is set as lower=mid+1. Also, last is the last index of 'target'. Finally, we return last+1 - taking care of boundary conditions.
My implementation uses condition bottom <= top which is different from the answer by templatetypedef.
int FirstElementGreaterThan(int n, const vector<int>& values) {
int B = 0, T = values.size() - 1, M = 0;
while (B <= T) { // B strictly increases, T strictly decreases
M = B + (T - B) / 2;
if (values[M] <= n) { // all values at or before M are not the target
B = M + 1;
} else {
T = M - 1;// search for other elements before M
}
}
return T + 1;
}
Here is a modified binary search in Java with time complexity O(log n) that:
returns index of element to be searched if element is present
returns index of next greater element if searched element is not present in array
returns -1 if an element greater than the largest element of array is searched
public static int search(int arr[],int key) {
int low=0,high=arr.length,mid=-1;
boolean flag=false;
while(low<high) {
mid=(low+high)/2;
if(arr[mid]==key) {
flag=true;
break;
} else if(arr[mid]<key) {
low=mid+1;
} else {
high=mid;
}
}
if(flag) {
return mid;
}
else {
if(low>=arr.length)
return -1;
else
return low;
// here low == high; low - 1 (when >= 0) is the index of the next smaller element
}
}
public static void main(String args[]) throws IOException {
BufferedReader br=new BufferedReader(new InputStreamReader(System.in));
//int n=Integer.parseInt(br.readLine());
int arr[]={12,15,54,221,712};
int key=71;
System.out.println(search(arr,key));
br.close();
}
kind = 0 : exact match
kind = 1 : just greater than x
kind = -1 : just smaller than x
It returns -1 if no match is found.
#include <iostream>
#include <algorithm>
using namespace std;
int g(int arr[], int l , int r, int x, int kind){
switch(kind){
case 0: // for exact match
if(arr[l] == x) return l;
else if(arr[r] == x) return r;
else return -1;
break;
case 1: // for just greater than x
if(arr[l]>=x) return l;
else if(arr[r]>=x) return r;
else return -1;
break;
case -1: // for just smaller than x
if(arr[r]<=x) return r;
else if(arr[l] <= x) return l;
else return -1;
break;
default:
cout << "please give \"kind\" as 0, -1, 1 only" << endl;
return -1; // invalid kind: report no match
}
}
int f(int arr[], int n, int l, int r, int x, int kind){
if(l==r) return g(arr, l, r, x, kind); // single element: still apply the 'kind' rule
if(l>r) return -1;
int m = l+(r-l)/2;
while(m>l){
if(arr[m] == x) return m;
if(arr[m] > x) r = m;
if(arr[m] < x) l = m;
m = l+(r-l)/2;
}
int pos = g(arr, l, r, x, kind);
return pos;
}
int main()
{
int arr[] = {1,2,3,5,8,14, 22, 44, 55};
int n = sizeof(arr)/sizeof(arr[0]);
sort(arr, arr+n);
int tcs;
cin >> tcs;
while(tcs--){
int l = 0, r = n-1, x = 88, kind = -1; // you can modify these values
cin >> x;
int pos = f(arr, n, l, r, x, kind);
// kind = 0: exact match, kind = 1: just greater than x, kind = -1: just smaller than x
cout << "position " << pos << " Value ";
if(pos >= 0) cout << arr[pos];
cout << endl;
}
return 0;
}
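The three kinds can also be sketched compactly in Python with the standard bisect module. This is an illustrative rewrite, not the code above: the name find is hypothetical, and it follows the comparisons used in g, where kind = 1 and kind = -1 also accept an element equal to x.

```python
from bisect import bisect_left, bisect_right
from typing import Sequence


def find(arr: Sequence[int], x: int, kind: int) -> int:
    """Search a sorted sequence; return an index, or -1 if nothing qualifies."""
    if kind == 0:
        # exact match: the lower bound must point at x itself
        i = bisect_left(arr, x)
        return i if i < len(arr) and arr[i] == x else -1
    if kind == 1:
        # first element >= x
        i = bisect_left(arr, x)
        return i if i < len(arr) else -1
    if kind == -1:
        # last element <= x; this is -1 when every element is greater than x
        return bisect_right(arr, x) - 1
    raise ValueError("kind must be 0, 1 or -1")
```

With the array from main, find(arr, 4, 1) lands on the value 5 and find(arr, 4, -1) lands on the value 3.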

The Most Efficient Algorithm to Find First Prefix-Match From a Sorted String Array?

Input:
1) A huge sorted array of strings, SA;
2) A prefix string P;
Output:
The index of the first string matching the input prefix if any.
If there is no such match, then output will be -1.
Example:
SA = {"ab", "abd", "abdf", "abz"}
P = "abd"
The output should be 1 (index starting from 0).
What's the most efficient algorithm for this kind of job?
If you only want to do this once, use binary search. If, on the other hand, you need to do it for many different prefixes on the same string array, building a radix tree can be a good idea: once the tree is built, each lookup will be very fast.
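As a rough illustration of the many-lookups case, here is a sketch that uses a plain (uncompressed) trie rather than a true radix tree, to keep it short. The names TrieNode, build_trie and first_with_prefix are hypothetical. Each node remembers the smallest array index of any word that passes through it, so a lookup costs one step per prefix character.

```python
from typing import Dict, List


class TrieNode:
    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        # smallest index in the array of a word passing through this node
        self.first_index = -1


def build_trie(words: List[str]) -> TrieNode:
    root = TrieNode()
    for idx, word in enumerate(words):
        node = root
        if node.first_index == -1:
            node.first_index = idx
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
            # words are visited in index order, so the first visit is the smallest index
            if node.first_index == -1:
                node.first_index = idx
    return root


def first_with_prefix(root: TrieNode, prefix: str) -> int:
    node = root
    for ch in prefix:
        child = node.children.get(ch)
        if child is None:
            return -1  # no word starts with this prefix
        node = child
    return node.first_index
```

A real radix tree merges chains of single-child nodes to save memory; the lookup logic stays the same.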
This is just a modified bisection search:
Only check as many characters in each element as are in the search string; and
If you find a match, keep searching backwards (either linearly or by further bisection searches) until you find a non-matching result and then return the index of the last matching result.
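One way to sketch this in Python, assuming the array is sorted in Python's default string order: a prefix sorts before every longer string that starts with it, so a lower-bound search (bisect_left) lands directly on the first match and the backward scan becomes unnecessary. The function name first_prefix_match is only illustrative.

```python
from bisect import bisect_left
from typing import Sequence


def first_prefix_match(sa: Sequence[str], prefix: str) -> int:
    """Return the index of the first string in sorted sa that starts with prefix, or -1."""
    # index of the first string >= prefix; any string starting with prefix is >= prefix
    i = bisect_left(sa, prefix)
    if i < len(sa) and sa[i].startswith(prefix):
        return i
    return -1
```

For the example in the question, first_prefix_match(["ab", "abd", "abdf", "abz"], "abd") gives 1. Each comparison inspects at most k characters, so the cost is O(k log n).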
It can be done in linear time using a Suffix Tree. Building the suffix tree takes linear time.
The FreeBSD kernel uses a radix tree for its routing table; you should check that.
Here is a possible solution (in Python), which has O(k.log(n)) time complexity and O(1) additional space complexity (considering n strings and k prefix length).
The rationale behind it is to perform a binary search that only considers a given character index of the strings. If strings with the prefix character at that index are present, continue to the next character index. If any of the prefix characters cannot be found in any string, it returns immediately.
from typing import List
def first(items: List[str], prefix: str, i: int, c: str, left: int, right: int):
    result = -1
    while left <= right:
        mid = left + ((right - left) // 2)
        if ( i >= len(items[mid]) ):
            left = mid + 1
        elif (c < items[mid][i]):
            right = mid - 1
        elif (c > items[mid][i]):
            left = mid + 1
        else:
            result = mid
            right = mid - 1
    return result
def last(items: List[str], prefix: str, i: int, c: str, left: int, right: int):
    result = -1
    while left <= right:
        mid = left + ((right - left) // 2)
        if ( i >= len(items[mid]) ):
            left = mid + 1
        elif (c < items[mid][i]):
            right = mid - 1
        elif (c > items[mid][i]):
            left = mid + 1
        else:
            result = mid
            left = mid + 1
    return result
def is_prefix(items: List[str], prefix: str):
    left = 0
    right = len(items) - 1
    for i in range(len(prefix)):
        c = prefix[i]
        left = first(items, prefix, i, c, left, right)
        right = last(items, prefix, i, c, left, right)
        if (left == -1 or right == -1):
            return False
    return True
# Test cases
a = ['ab', 'abjsiohjd', 'abikshdiu', 'ashdi','abcde Aasioudhf', 'abcdefgOAJ', 'aa', 'aaap', 'aas', 'asd', 'bbbbb', 'bsadiojh', 'iod', '0asdn', 'asdjd', 'bqw', 'ba']
a.sort()
print(a)
print(is_prefix(a, 'abcdf'))
print(is_prefix(a, 'abcde'))
print(is_prefix(a, 'abcdef'))
print(is_prefix(a, 'abcdefg'))
print(is_prefix(a, 'abcdefgh'))
print(is_prefix(a, 'abcde Aa'))
print(is_prefix(a, 'iod'))
print(is_prefix(a, 'ZZZZZZiod'))
This gist is available at https://gist.github.com/lopespm/9790d60492aff25ea0960fe9ed389c0f
My current idea is, instead of searching for the prefix itself, to search for a "virtual prefix".
For example, if the prefix is "abd", try to find the virtual prefix "abc(255)", where (255) just represents the maximum character value. After locating the position of "abc(255)", the next word should be the first word matching "abd", if any.
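A hedged Python sketch of the virtual-prefix idea follows. It assumes the array is sorted in Python's default string order and that no string in it contains the maximum code point, which stands in for (255) here. The names MAX_CHAR and first_prefix_match_virtual are hypothetical.

```python
from bisect import bisect_right
from typing import Sequence

# Assumption: this character never occurs in the data; it plays the role of (255).
MAX_CHAR = chr(0x10FFFF)


def first_prefix_match_virtual(sa: Sequence[str], prefix: str) -> int:
    """Return the index of the first string starting with prefix, or -1."""
    if not prefix:
        return 0 if sa else -1
    last = prefix[-1]
    if last == chr(0):
        # no smaller last character exists, so the shorter stem is the predecessor
        virtual = prefix[:-1]
    else:
        # e.g. "abd" -> "abc" + MAX_CHAR, which sorts just below "abd"
        virtual = prefix[:-1] + chr(ord(last) - 1) + MAX_CHAR
    # index of the first string greater than the virtual prefix
    i = bisect_right(sa, virtual)
    if i < len(sa) and sa[i].startswith(prefix):
        return i
    return -1
```

The final startswith check is still needed, because the word after the virtual prefix may belong to a later prefix such as "abz".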
Are you in a position to precalculate all possible prefixes?
If so, you can do that, then use a binary search to find the prefix in the precalculated table. Store the index of the desired value alongside each prefix.
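A minimal sketch of such a precalculated table in Python. It swaps the binary search over the table for a dict (hash table), which serves the same purpose with a direct lookup. The name build_prefix_table is hypothetical.

```python
from typing import Dict, List


def build_prefix_table(sa: List[str]) -> Dict[str, int]:
    """Map every non-empty prefix of every string to the first index where it occurs."""
    table: Dict[str, int] = {}
    for idx, word in enumerate(sa):
        for end in range(1, len(word) + 1):
            # setdefault keeps the earliest index, since words are visited in order
            table.setdefault(word[:end], idx)
    return table
```

A lookup is then table.get(prefix, -1). The table stores every prefix of every word, so memory grows with the total length of all prefixes, which may be prohibitive for a huge array.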
My solution:
Used binary search.
private static int search(String[] words, String searchPrefix) {
if (words == null || words.length == 0) {
return -1;
}
int low = 0;
int high = words.length - 1;
int result = -1;
// assumes words is sorted in natural String order (String.compareTo)
while (low <= high) {
int mid = low + (high - low) / 2;
String word = words[mid];
if (word.startsWith(searchPrefix)) {
// match: remember it and keep searching to the left for an earlier one
result = mid;
high = mid - 1;
} else if (word.compareTo(searchPrefix) > 0) {
high = mid - 1;
} else {
low = mid + 1;
}
}
return result;
}

Resources