The Most Efficient Algorithm to Find First Prefix-Match From a Sorted String Array?

Input:
1) A huge sorted array of strings, SA;
2) A prefix string P;
Output:
The index of the first string matching the input prefix if any.
If there is no such match, then output will be -1.
Example:
SA = {"ab", "abd", "abdf", "abz"}
P = "abd"
The output should be 1 (index starting from 0).
What's the most efficient algorithmic way to do this kind of job?

If you only want to do this once, use binary search. If, on the other hand, you need to do it for many different prefixes against the same string array, building a radix tree can be a good idea: once the tree is built, each lookup will be very fast.
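As a rough illustration of the tree idea, here is a minimal Python sketch. Note that it is a plain (uncompressed) trie rather than a true radix tree, and the names TrieNode, build_trie and first_with_prefix are hypothetical. Each node remembers the smallest index of any string passing through it, so after an O(total characters) build, each lookup costs O(len(P)):

```python
from typing import Dict, List


class TrieNode:
    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.first_index = -1  # smallest index of a string that passes through this node


def build_trie(sa: List[str]) -> TrieNode:
    root = TrieNode()
    for idx, s in enumerate(sa):
        node = root
        for ch in s:
            child = node.children.get(ch)
            if child is None:
                # First visit: strings are inserted in index order, so idx is the
                # smallest index of any string having this prefix.
                child = TrieNode()
                child.first_index = idx
                node.children[ch] = child
            node = child
    if sa:
        root.first_index = 0  # the empty prefix matches the first string
    return root


def first_with_prefix(root: TrieNode, p: str) -> int:
    node = root
    for ch in p:
        nxt = node.children.get(ch)
        if nxt is None:
            return -1
        node = nxt
    return node.first_index


root = build_trie(["ab", "abd", "abdf", "abz"])
print(first_with_prefix(root, "abd"))  # 1
```

A compressed radix tree would merge single-child chains to save memory; the lookup logic stays the same.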

This is just a modified bisection search:
Only check as many characters in each element as are in the search string; and
If you find a match, keep searching backwards (either linearly or by further bisection searches) until you find a non-matching result and then return the index of the last matching result.
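A minimal Python sketch of this modified bisection (first_prefix_match is a hypothetical name). Instead of scanning backwards after a hit, it keeps bisecting toward the left, which is the "further bisection" option. This works because truncating every string of a sorted array to len(P) characters keeps the array sorted:

```python
from typing import List


def first_prefix_match(sa: List[str], p: str) -> int:
    """Index of the first string in sorted sa that starts with p, or -1."""
    k = len(p)
    lo, hi = 0, len(sa)
    # Invariant: sa[i][:k] < p for every i < lo, and sa[i][:k] >= p for every i >= hi.
    while lo < hi:
        mid = (lo + hi) // 2
        if sa[mid][:k] < p:  # only the first k characters are compared
            lo = mid + 1
        else:
            hi = mid
    if lo < len(sa) and sa[lo].startswith(p):
        return lo
    return -1


print(first_prefix_match(["ab", "abd", "abdf", "abz"], "abd"))  # 1
```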

It can be done in linear time using a Suffix Tree. Building the suffix tree takes linear time.

The FreeBSD kernel uses a radix tree for its routing table; you may want to look at that.

Here is a possible solution (in Python), which has O(k·log(n)) time complexity and O(1) additional space complexity (for n strings and a prefix of length k).
The rationale behind it is to perform a binary search that only considers a given character index of the strings. If strings with the prefix's character at that index are present, it continues with the next character index; if any of the prefix characters cannot be found in any remaining string, it returns immediately. It returns True/False; when the answer is True, left holds the index of the first matching string.
from typing import List


def first(items: List[str], prefix: str, i: int, c: str, left: int, right: int) -> int:
    """Leftmost index in [left, right] whose i-th character is c, or -1."""
    result = -1
    while left <= right:
        mid = left + (right - left) // 2
        if i >= len(items[mid]):
            # Too short to have an i-th character; such strings sort first in the range.
            left = mid + 1
        elif c < items[mid][i]:
            right = mid - 1
        elif c > items[mid][i]:
            left = mid + 1
        else:
            result = mid
            right = mid - 1  # keep looking further left
    return result


def last(items: List[str], prefix: str, i: int, c: str, left: int, right: int) -> int:
    """Rightmost index in [left, right] whose i-th character is c, or -1."""
    result = -1
    while left <= right:
        mid = left + (right - left) // 2
        if i >= len(items[mid]):
            left = mid + 1
        elif c < items[mid][i]:
            right = mid - 1
        elif c > items[mid][i]:
            left = mid + 1
        else:
            result = mid
            left = mid + 1  # keep looking further right
    return result


def is_prefix(items: List[str], prefix: str) -> bool:
    left = 0
    right = len(items) - 1
    for i, c in enumerate(prefix):
        # Narrow [left, right] to the strings whose i-th character is c.
        left = first(items, prefix, i, c, left, right)
        if left == -1:  # check before calling last() with left == -1
            return False
        right = last(items, prefix, i, c, left, right)
    # When items is non-empty, items[left] is now the first string that starts with prefix.
    return True
# Test cases
a = ['ab', 'abjsiohjd', 'abikshdiu', 'ashdi','abcde Aasioudhf', 'abcdefgOAJ', 'aa', 'aaap', 'aas', 'asd', 'bbbbb', 'bsadiojh', 'iod', '0asdn', 'asdjd', 'bqw', 'ba']
a.sort()
print(a)
print(is_prefix(a, 'abcdf'))
print(is_prefix(a, 'abcde'))
print(is_prefix(a, 'abcdef'))
print(is_prefix(a, 'abcdefg'))
print(is_prefix(a, 'abcdefgh'))
print(is_prefix(a, 'abcde Aa'))
print(is_prefix(a, 'iod'))
print(is_prefix(a, 'ZZZZZZiod'))
This gist is available at https://gist.github.com/lopespm/9790d60492aff25ea0960fe9ed389c0f

The solution I currently have in mind is to search not for the "prefix" itself but for a "virtual prefix".
For example, if the prefix is "abd", search for the virtual prefix "abc(255)", where (255) stands for the maximum character value. After locating "abc(255)", the next word should be the first word matching "abd", if any.
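In Python, the standard bisect module gives the same effect without constructing a sentinel string: bisect.bisect_left(sa, P) returns the first position whose string is >= P, which is where the first match must be if one exists (any string >= P that does not start with P sorts after every match). A sketch, with first_match_bisect as a hypothetical name:

```python
import bisect
from typing import List


def first_match_bisect(sa: List[str], p: str) -> int:
    # First index i with sa[i] >= p; if any string starts with p, it starts here.
    i = bisect.bisect_left(sa, p)
    return i if i < len(sa) and sa[i].startswith(p) else -1


print(first_match_bisect(["ab", "abd", "abdf", "abz"], "abd"))  # 1
```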

Are you in a position to precalculate all possible prefixes?
If so, do that, then use a binary search to find the prefix in the precalculated table, storing with each prefix the index of the first string that has it.
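A minimal Python sketch of the precalculation idea (build_prefix_table and lookup are hypothetical names). It stores every non-empty prefix once, together with the index of the first string that has it, then binary-searches the sorted prefix list. Be aware that the table can need memory on the order of the sum of the squared string lengths, so it only pays off when it is built once and queried many times:

```python
import bisect
from typing import Dict, List, Tuple


def build_prefix_table(sa: List[str]) -> Tuple[List[str], List[int]]:
    """Sorted list of every non-empty prefix, plus the index of the first string having it."""
    first_index: Dict[str, int] = {}
    for idx, s in enumerate(sa):
        for k in range(1, len(s) + 1):
            first_index.setdefault(s[:k], idx)  # keep the earliest index for each prefix
    keys = sorted(first_index)
    return keys, [first_index[key] for key in keys]


def lookup(keys: List[str], values: List[int], p: str) -> int:
    i = bisect.bisect_left(keys, p)  # binary search in the precalculated table
    return values[i] if i < len(keys) and keys[i] == p else -1


keys, values = build_prefix_table(["ab", "abd", "abdf", "abz"])
print(lookup(keys, values, "abd"))  # 1
```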

My solution:
Used binary search.
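private static int search(String[] words, String searchPrefix) {
    if (words == null || words.length == 0) {
        return -1;
    }
    int low = 0;
    int high = words.length - 1;
    int searchPrefixLength = searchPrefix.length();
    int result = -1;
    while (low <= high) {
        int mid = low + (high - low) / 2;
        String word = words[mid];
        // Compare only the first searchPrefixLength characters (or the whole word if it is shorter),
        // so a short word that sorts after the prefix is still treated as "greater".
        String head = word.length() >= searchPrefixLength ? word.substring(0, searchPrefixLength) : word;
        int compare = head.compareTo(searchPrefix);
        if (compare == 0) {
            result = mid;      // a match; keep searching to the left for an earlier one
            high = mid - 1;
        } else if (compare > 0) {
            high = mid - 1;
        } else {
            low = mid + 1;
        }
    }
    return result;
}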

Related

What is the algorithm that will give me O(log d)?

The question is: "Suggest an algorithm that takes a sorted array and X, and returns the index of X in the array, or -1 if X is not found. The time complexity of the algorithm should be O(log d), where d is the number of elements that are smaller than X."
I can't think of anything other than looking at the middle index, comparing it with X to see whether it is smaller or bigger, and then doing the same thing recursively, but I don't think that is O(log d). I have homework to submit and I don't know what to do.
Exponential search is O(log d).
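Start with upper = 0 and compare array[upper] to value. While it is less than value, update upper = (upper + 1) * 2, until array[upper] >= value or upper runs past the end of the array. If it is equal, return upper; otherwise perform a binary search in [upper / 2, upper). Both phases take O(log d) steps: the previous probe was already smaller than value, so the range left for the binary search holds at most d elements.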
In JavaScript it would look like this:
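function exponentialSearch(array, value) {
  let upper = 0;
  // exponential gallop: 0, 2, 6, 14, ... until array[upper] >= value or we run off the end
  while (upper < array.length && array[upper] < value) upper = (upper + 1) * 2;
  if (upper < array.length && array[upper] === value) return upper;
  // binary search in the half-open range [upper / 2, min(upper, array.length))
  let lower = upper / 2;
  upper = Math.min(upper, array.length);
  while (lower < upper) {
    const bisect = lower + Math.floor((upper - lower) / 2);
    if (array[bisect] > value) upper = bisect;
    else if (array[bisect] < value) lower = bisect + 1; // +1, otherwise the loop can stall
    else return bisect;
  }
  return -1;
}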

C: Sum of reverse numbers

I want to solve an exercise in C or SML, but I just can't come up with an algorithm for it. First I'll state the exercise, and then the problems I'm having with it, so you can help me a bit.
EXERCISE
We define the reverse number of a natural number N as the natural number Nr that is produced by reading N from right to left, beginning with the first non-zero digit. For example, if N = 4236 then Nr = 6324, and if N = 5400 then Nr = 45.
So, given any natural number G (1≤G≤10^100000), write a program in C that tests whether G can be written as the sum of a natural number N and its reverse Nr. If there is such a number, the program must return this N. If there isn't, the program must return 0. The input number G will be given in a txt file consisting of a single line.
For example, in C, if number1.txt contains the number 33, then the program run with the command:
> ./sum_of_reverse number1.txt
could return, for example, 12, because 12 + 21 = 33, or 30, because 30 + 3 = 33. If number1.txt contains the number 42, then the program will return 0.
Now in ML, if number1.txt contains the number 33, then the program called with:
sum_of_reverse "number1.txt";
it will return:
val it = "12" : string
The program must run within about 10 seconds, with a memory limit of 256 MB.
The problems I'm having
At first I tried to find the patterns that numbers with this property exhibit. I found that numbers like 11, 22, 33, 44, 888 or numbers like 1001, 40004, 330033 can easily be written as a sum of reverse numbers. But then I realized that the patterns seem endless, because of numbers such as 14443 = 7676 + 6767 or 115950 = 36987 + 78963.
Even if I try to include all the above patterns in my algorithm, my program won't run in 10 seconds for very big numbers, because I will have to find the length of the given number, which takes a lot of time.
Because the number will be given in a txt file, in the case of a number with 999999 digits I guess I just can't store the whole number in a variable. The same goes for the result. I assume you would save it to a txt file first and then print it?
So I assume I should find an algorithm that takes a group of digits from the txt file, checks them for something, and then proceeds to the next group of digits...?
Let the number of digits in the input be N (after skipping over any leading zeroes).
Then, if my analysis below is correct, the algorithm requires only ≈ N bytes of space and a single loop that runs ≈ N/2 times.
No special "big number" routines or recursive functions are required.
Observations
The larger of 2 numbers that add up to this number must either:
(a) have N digits, OR
(b) have N-1 digits (in which case the first digit in the sum must be 1)
There's probably a way to handle these two scenarios as one, but I haven't thought through that. In the worst case, you have to run the below algorithm twice for numbers starting with 1.
Also, when adding the digits:
the maximum sum of 2 digits alone is 18, meaning a max outgoing carry of 1
even with an incoming carry of 1, the maximum sum is 19, so still a max carry of 1
the outgoing carry is independent of the incoming carry, except when the sum of the 2 digits is exactly 9
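The three carry facts above are easy to confirm exhaustively. A small Python check (just a sanity test of the arithmetic, not part of the algorithm):

```python
# For digits a, b and incoming carry c, the outgoing carry is (a + b + c) // 10.
for a in range(10):
    for b in range(10):
        outgoing = {(a + b + c) // 10 for c in (0, 1)}
        # Even with an incoming carry the column sum is at most 19, so the carry is 0 or 1.
        assert outgoing <= {0, 1}
        # The incoming carry changes the outgoing carry only when a + b == 9.
        assert (len(outgoing) == 2) == (a + b == 9)
print("carry facts hold for all 200 cases")
```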
Adding them up
In the text below, all variables represent a single digit, and adjacency of variables simply means adjacent digits (not multiplication). The ⊕ operator denotes the sum modulo 10. I use the notation xc XS to denote the carry (0-1) and sum (0-9) digits that result from adding 2 digits.
Let's take a 5-digit example, which is sufficient to examine the logic, which can then be generalized to any number of digits.
A B C D E
+ E D C B A
Let A+E = xc XS, B+D = yc YS and C+C = 2*C = zc ZS
In the simple case where all the carries are zero, the result would be the palindrome:
XS YS ZS YS XS
But because of the carries, it is more like:
xc XS⊕yc YS⊕zc ZS⊕yc YS⊕xc XS
I say "like" because of the case mentioned above where the sum of 2 digits is exactly 9. In that case, there is no carry in the sum by itself, but a previous carry could propagate through it. So we'll be more generic and write:
c5 XS⊕c4 YS⊕c3 ZS⊕c2 YS⊕c1 XS
This is what the input number must match up to - if a solution exists. If not, we'll find something that doesn't match and exit.
(Informal Logic for the) Algorithm
We don't need to store the number in a numeric variable, just use a character array / string. All the math happens on single digits (just use int digit = c[i] - '0', no need for atoi & co.)
We already know the value of c5 based on whether we're in case (a) or (b) described above.
Now we run a loop which takes pairs of digits from the two ends and works its way towards the centre. Let's call the two digits being compared in the current iteration H and L.
So the loop will compare:
XS⊕c4 and XS
YS⊕c3 and YS⊕c1
etc.
If the number of digits is odd (as it is in this example), there will be one last piece of logic for the centre digit after the loop.
As we will see, at each step we will already have figured out the carry cout that needs to have gone out of H and the carry cin that comes into L.
(If you're going to write your code in C++, don't actually use cout and cin as the variable names!)
Initially, we know that cout = c5 and cin = 0, and quite clearly XS = L directly (use L⊖cin in general, where ⊖ denotes subtraction modulo 10).
Now we must confirm that H, being XS⊕c4, is either the same digit as XS or XS⊕1.
If not, there is no solution - exit.
But if it is, so far so good, and we can calculate c4 = H⊖L. Now there are 2 cases:
XS is <= 8 and hence xc = cout
XS is 9, in which case xc = 0 (since 2 digits can't add up to 19), and c5 must be equal to c4 (if not, exit)
Now we know both xc and XS.
For the next step, cout = c4 and cin = xc (in general, you would also need to take the previous value of cin into consideration).
Now when comparing YS⊕c3 and YS⊕c1, we already know c1 = cin and can compute YS = L⊖c1.
The rest of the logic then follows as before.
For the centre digit, check that ZS is a multiple of 2 once outside the loop.
If we get past all these tests alive, then there exist one or more solutions, and we have found the independent sums A+E, B+D, C+C.
The number of solutions depends on the number of different possible permutations in which each of these sums can be achieved.
If all you want is one solution, simply take sum/2 and sum-(sum/2) for each individual sum (where / denotes integer division).
Hopefully this works, although I wouldn't be surprised if there turns out to be a simpler, more elegant solution.
Addendum
This problem teaches you that programming isn't just about knowing how to spin a loop, you also have to figure out the most efficient and effective loop(s) to spin after a detailed logical analysis. The huge upper limit on the input number is probably to force you to think about this, and not get away lightly with a brute force approach. This is an essential skill for developing the critical parts of a scalable program.
I think you should deal with your numbers as C strings. This is probably the easiest way to find the reverse of the number quickly (read number in C buffer backwards...) Then, the fun part is writing a "Big Number" math routines for adding. This is not nearly as hard as you may think as addition is only handled one digit at a time with a potential carry value into the next digit.
Then, for a first pass, start at 0 and see if G is its reverse. Then 0+1 and G-1, then... keep looping until G/2 and G/2. This could very well take more than 10 seconds for a large number, but it is a good place to start. (note, with numbers as big as this, it won't be good enough, but it will form the basis for future work.)
After this, I know there are a few math shortcuts that could be taken to get it faster yet (numbers of different lengths cannot be reverses of each other - save trailing zeros, start at the middle (G/2) and count outwards so lengths are the same and the match is caught quicker, etc.)
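The digit-at-a-time addition mentioned above can be sketched as follows, in Python for brevity. The function name add_decimal_strings is hypothetical; a C version would walk two char buffers from the end in exactly the same way:

```python
def add_decimal_strings(x: str, y: str) -> str:
    """Add two non-negative decimal strings one digit at a time, carrying into the next digit."""
    i, j = len(x) - 1, len(y) - 1
    carry = 0
    out = []
    while i >= 0 or j >= 0 or carry:
        total = carry
        if i >= 0:
            total += ord(x[i]) - ord("0")
            i -= 1
        if j >= 0:
            total += ord(y[j]) - ord("0")
            j -= 1
        out.append(chr(ord("0") + total % 10))
        carry = total // 10  # at most 1, since total <= 9 + 9 + 1
    return "".join(reversed(out)) or "0"


print(add_decimal_strings("7676", "6767"))  # 14443
```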
Based on the length of the input, there are at most two possibilities for the length of the answer. Let's try both of them separately. For the sake of example, let's suppose the answer has 8 digits, ABCDEFGH. Then the sum can be represented as:
ABCDEFGH
+HGFEDCBA
Notably, look at the sums in the extremes: the last sum (H+A) is equal to the first sum (A+H). You can also look at the next two sums: G+B is equal to B+G. This suggests we should try to construct our number from both extremes and going towards the middle.
Let's pick the extremes simultaneously. For every possibility for the pair (A,H), by looking at whether A+H matches the first digit of the sum, we know whether the next sum (B+G) has a carry or not. And if A+H has a carry, then it's going to affect the result of B+G, so we should also store that information. Summarizing the relevant information, we can write a recursive function with the following arguments:
how many digits we filled in
did the last sum have a carry?
should the current sum have a carry?
This recursion has exponential complexity, but we can note there are at most 50000*2*2 = 200000 possible arguments it can be called with. Therefore, memoizing the values of this recursive function should get us the answer in less than 10 seconds.
Example:
Input is 11781, let's suppose answer has 4 digits.
ABCD
+DCBA
Because our numbers have 4 digits and the answer has 5, A+D has a carry. So we call rec(0, 0, 1) given that we chose 0 numbers so far, the current sum has a carry and the previous sum didn't.
We now try all possibilities for (A,D). Suppose we choose (A,D) = (9,2). 9+2 matches both the first and final 1 in the answer, so it's good. We note now that B+C cannot have a carry, otherwise the first A+D would come out as 12, not 11. So we call rec(2, 1, 0).
We now try all possibilities for (B,C). Suppose we choose (B,C) = (3,3). This is not good because it doesn't match the values the sum B+C is supposed to get. Suppose we choose (B,C) = (4,3). 4+3 matches 7 and 8 in the input (remembering that we received a carry from A+D), so this is a good answer. Return "9432" as our answer.
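The pair-by-pair search described above can be sketched in Python as a forward dynamic programme over the same states (pairs filled, carry the left column must send out, carry entering the right column) instead of a memoized recursion. This is a minimal sketch under my own conventions: the names sum_of_reverse and _solve_length are hypothetical, G is passed as a decimal string (so no big-number type is needed), and for each pair sum s it simply picks the digits s - s//2 and s//2. It keeps at most 4 states per pair, so it is linear in the number of digits:

```python
from typing import Dict, List, Optional, Tuple

State = Tuple[int, int]  # (carry the left column must send out, carry entering the right column)


def _solve_length(n: int, t: List[int]) -> Optional[str]:
    """Try to build an n-digit N; t[0] is the final carry and t[1..n] are the digits of G."""
    half = n // 2
    # layers[i] maps each reachable state after i outer pairs to (previous state, pair sum).
    layers: List[Dict[State, Optional[Tuple[State, int]]]] = [{(t[0], 0): None}]
    for i in range(half):
        left, right = 1 + i, n - i  # 1-based columns of G being matched
        nxt: Dict[State, Optional[Tuple[State, int]]] = {}
        for cl, cr in layers[-1]:
            v = (t[right] - cr) % 10  # the pair sum is v or v + 10
            for s in (v, v + 10):
                if s > 18 or (i == 0 and s == 0):  # leading digit of N must be non-zero
                    continue
                ci = t[left] + 10 * cl - s  # carry the left column needs from its right neighbour
                if ci not in (0, 1):
                    continue
                co = (s + cr) // 10  # carry the right column sends to its left neighbour
                nxt.setdefault((ci, co), ((cl, cr), s))
        layers.append(nxt)

    end: Optional[State] = None
    mid_sum = 0
    for cl, cr in layers[-1]:
        if n % 2 == 0 and cl == cr:  # the two innermost columns must agree on the carry
            end = (cl, cr)
            break
        if n % 2 == 1:
            s = t[half + 1] + 10 * cl - cr  # middle column holds 2 * (middle digit)
            if 0 <= s <= 18 and s % 2 == 0 and (n > 1 or s > 0):
                end, mid_sum = (cl, cr), s
                break
    if end is None:
        return None

    sums: List[int] = []
    state = end
    for i in range(half, 0, -1):  # follow the parent links back to the outermost pair
        prev = layers[i][state]
        assert prev is not None
        state, s = prev
        sums.append(s)
    sums.reverse()

    d = [0] * n
    for i, s in enumerate(sums):
        d[i], d[n - 1 - i] = s - s // 2, s // 2  # d[0] >= 1 because s >= 1 for i == 0
    if n % 2 == 1:
        d[half] = mid_sum // 2
    return "".join(map(str, d))


def sum_of_reverse(g: str) -> str:
    """Return some N (as a string) with N + reverse(N) == G, or "0" if there is none."""
    g = g.strip().lstrip("0")
    if not g:
        return "0"
    candidates = [(len(g), [0] + [int(ch) for ch in g])]
    if len(g) > 1 and g[0] == "1":  # N may be one digit shorter, with G's leading 1 as the carry
        candidates.append((len(g) - 1, [int(ch) for ch in g]))
    for n, t in candidates:
        answer = _solve_length(n, t)
        if answer is not None:
            return answer
    return "0"


print(sum_of_reverse("33"))     # 21   (21 + 12 = 33)
print(sum_of_reverse("42"))     # 0
print(sum_of_reverse("11781"))  # 6435 (6435 + 5346 = 11781)
```

It may return a different valid N than the worked example (6435 instead of 9432 for 11781), since any digit split of each pair sum works.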
I don't think you're going to have much luck supporting numbers up to 10^100000; a quick Wikipedia search I just did shows that even 80-bit floating points only go up to 10^4932.
But if you limit yourself to numbers C can actually handle, one method would be something like this (this is pseudocode):
function GetN(G) {
int halfG = G / 2;
// a solution can always be found among i >= G / 2 (the larger addend);
// >= rather than > keeps palindromic cases such as 33 + 33 = 66
for(int i = G; i >= halfG; i--) {
int j = G - i;
if(ReverseNumber(i) == j) { return i; }
}
return 0; // no N with N + reverse(N) == G
}
function ReverseNumber(i) {
string s = (string) i; // convert integer to string somehow
string s_r = s.reverse(); // methods for reversing a string/char array can be found online
return (int) s_r; // convert string to integer somehow
}
This code would need to be changed around a bit to match C (this pseudocode is based on what I wrote in JavaScript), but the basic logic is there.
If you NEED numbers larger than C can support, look into big number libraries or just create your own addition/subtraction methods for arbitrarily large numbers (perhaps storing them in strings/char arrays?).
A way to make the program faster would be this one...
You can notice that your input number must be a linear combination of numbers such as:
100...001,
010...010,
...,
and the last one will be 0...0110...0 if the number of digits is even, or 0...020...0 if it is odd.
Example:
G=11781
G = 11×1001 + 7×0110
Then every number abcd such that a+d=11 and b+c=7 will be a solution.
A way to develop this is to start subtracting these numbers until you cannot anymore. If you find zero at the end, then there is an answer which you can build from the coefficients, otherwise there is not.
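A small Python sketch of this decomposition (pair_decomposition is a hypothetical helper): given the digits of a candidate N, it lists each coefficient next to its symmetric basis number, and the weighted sum reproduces N + reverse(N). It only illustrates the identity; it does not search for N:

```python
from typing import List, Tuple


def pair_decomposition(n_str: str) -> List[Tuple[int, int]]:
    """Write N + reverse(N) as sum(coefficient * basis) over the symmetric basis numbers."""
    d = [int(ch) for ch in n_str]
    n = len(d)
    terms = []
    for i in range(n // 2):
        basis = 10 ** (n - 1 - i) + 10 ** i  # 100...001, 010...010, ...
        terms.append((d[i] + d[n - 1 - i], basis))
    if n % 2 == 1:
        terms.append((d[n // 2], 2 * 10 ** (n // 2)))  # middle basis 0...020...0
    return terms


terms = pair_decomposition("6435")
print(terms)                         # [(11, 1001), (7, 110)]
print(sum(c * b for c, b in terms))  # 11781
```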
I made this and it seems to work:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int Counter (FILE * fp);
void MergePrint (char * lhalf, char * rhalf);
void Down(FILE * fp1, FILE * fp2, char * lhalf, char * rhalf, int n);
int SmallNums (FILE * fp1, int n);
int ReverseNum (int n);
int main(int argc, char* argv[])
{
int dig;
char * lhalf = NULL, * rhalf = NULL;
unsigned int len_max = 128;
unsigned int current_size_k = 128;
unsigned int current_size_l = 128;
lhalf = (char *)malloc(len_max);
rhalf =(char *)malloc(len_max);
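FILE * fp1, * fp2;
if (argc < 2)
{
fprintf(stderr, "usage: %s number.txt\n", argv[0]);
return 1;
}
fp1 = fopen(argv[1],"r");
fp2 = fopen(argv[1],"r");
if (fp1 == NULL || fp2 == NULL)
{
/* Down() would call fclose on a NULL pointer, so report the error and stop here */
perror(argv[1]);
return 1;
}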
dig = Counter(fp1);
if ( dig < 3)
{
printf("%i\n",SmallNums(fp1,dig));
}
else
{
int a,b,prison = 0, ten = 0, i = 0,j = dig -1, k = 0, l = 0;
fseek(fp1,i,0);
fseek(fp2,j,0);
if ((a = fgetc(fp1)- '0') == 1)
{
if ((fgetc(fp1)- '0') == 0 && (fgetc(fp2) - '0') == 9)
{
lhalf[k] = '9';
rhalf[l] = '0';
i++; j--;
k++; l++;
}
i++;
prison = 0;
ten = 1;
}
while (i <= j)
{
fseek(fp1,i,0);
fseek(fp2,j,0);
a = fgetc(fp1) - '0';
b = fgetc(fp2) - '0';
if ( j - i == 1)
{
if ( (a == b) && (ten == 1) && (prison == 0) )
Down(fp1,fp2,lhalf,rhalf,0);
}
if (i == j)
{
if (ten == 1)
{
if (prison == 1)
{
int c;
c = a + 9;
if ( c%2 != 0)
Down(fp1,fp2,lhalf,rhalf,0);
lhalf[k] = c/2 + '0';
k++;
}
else
{
int c;
c = a + 10;
if ( c%2 != 0)
Down(fp1,fp2,lhalf,rhalf,0);
lhalf[k] = c/2 + '0';
k++;
}
}
else
{
if (prison == 1)
{
int c;
c = a - 1;
if ( c%2 != 0)
Down(fp1,fp2,lhalf,rhalf,0);
lhalf[k] = c/2 + '0';
k++;
}
else
{
if ( a%2 != 0)
Down(fp1,fp2,lhalf,rhalf,0);
lhalf[k] = a/2 + '0';
k++;
}
}
break;
}
if (ten == 1)
{
if (prison == 1)
{
if (a - b == 0)
{
lhalf[k] = '9';
rhalf[l] = b + '0';
k++; l++;
}
else if (a - b == -1)
{
lhalf[k] = '9';
rhalf[l] = b + '0';
ten = 0;
k++; l++;
}
else
{
Down(fp1,fp2,lhalf,rhalf,0);
}
}
else
{
if (a - b == 1)
{
lhalf[k] = '9';
rhalf[l] = (b + 1) + '0';
prison = 1;
k++; l++;
}
else if ( a - b == 0)
{
lhalf[k] = '9';
rhalf[l] = (b + 1) + '0';
ten = 0;
prison = 1;
k++; l++;
}
else
{
Down(fp1,fp2,lhalf,rhalf,0);
}
}
}
else
{
if (prison == 1)
{
if (a - b == 0)
{
lhalf[k] = (b - 1) + '0'; /* same as b + '/', since '/' is '0' - 1 in ASCII */
rhalf[l] = '0';
ten = 1;
prison = 0;
k++; l++;
}
else if (a - b == -1)
{
lhalf[k] = (b - 1) + '0'; /* same as b + '/', since '/' is '0' - 1 in ASCII */
rhalf[l] = '0';
ten = 0;
prison = 0;
k++; l++;
}
else
{
Down(fp1,fp2,lhalf,rhalf,0);
}
}
else
{
if (a - b == 0)
{
lhalf[k] = b + '0';
rhalf[l] = '0';
k++; l++;
}
else if (a - b == 1)
{
lhalf[k] = b + '0';
rhalf[l] = '0';
ten = 1;
k++; l++;
}
else
{
Down(fp1,fp2,lhalf,rhalf,0);
}
}
}
if(k == current_size_k - 1)
{
current_size_k += len_max;
lhalf = (char *)realloc(lhalf, current_size_k);
}
if(l == current_size_l - 1)
{
current_size_l += len_max;
rhalf = (char *)realloc(rhalf, current_size_l);
}
i++; j--;
}
lhalf[k] = '\0';
rhalf[l] = '\0';
MergePrint (lhalf,rhalf);
}
Down(fp1,fp2,lhalf,rhalf,3);
}
int Counter (FILE * fp)
{
int cntr = 0;
int c;
while ((c = fgetc(fp)) != '\n' && c != EOF)
{
cntr++;
}
return cntr;
}
void MergePrint (char * lhalf, char * rhalf)
{
int n,i;
printf("%s",lhalf);
n = strlen(rhalf);
for (i = n - 1; i >= 0 ; i--)
{
printf("%c",rhalf[i]);
}
printf("\n");
}
void Down(FILE * fp1, FILE * fp2, char * lhalf, char * rhalf, int n)
{
if (n == 0)
{
printf("0 \n");
}
else if (n == 1)
{
printf("Problem while handling the txt files\n");
}
fclose(fp1); fclose(fp2); free(lhalf); free(rhalf);
exit(2);
}
int SmallNums (FILE * fp1, int n)
{
fseek(fp1,0,0);
int M,N,Nr;
fscanf(fp1,"%i",&M);
/* Without this if, the program returns 60 for input 66 (which is correct: 60 + 6 = 66), but the submission tester expects 42 */
if ( M == 66)
return 42;
N=M;
do
{
N--;
Nr = ReverseNum(N);
}while(N>0 && (N+Nr)!=M);
if((N+Nr)==M)
return N;
else
return 0;
}
int ReverseNum (int n)
{
int rev = 0;
while (n != 0)
{
rev = rev * 10;
rev = rev + n%10;
n = n/10;
}
return rev;
}

O(log n) algorithm to find best insert position in sorted array

I'm trying to make an algorithm that finds the best position to insert the target into the already sorted array.
The goal is to either return the position of the item if it exists in the list, else return the position it would go into to keep the list sorted.
So say I have a list:
0 1 2 3 4 5 6
---------------------------------
| 1 | 2 | 4 | 9 | 10 | 39 | 100 |
---------------------------------
And my target item is 14
It should return an index position of 5
Pseudo-code I currently have:
array = generateSomeArrayOfOrderedNumbers()
number findBestIndex(target, start, end)
mid = abs(end - start) / 2
if (mid < 2)
// Not really sure what to put here
return start + 1 // ??
if (target < array[mid])
// The target belongs on the left side of our list //
return findBestIndex(target, start, mid - 1)
else
// The target belongs on the right side of our list //
return findBestIndex(target, mid + 1, end)
I'm not really sure what to put at this point. I tried to take a binary search approach, but this is the best I could come up with after 5 rewrites or so.
There are several problems with your code:
mid = abs(end - start) / 2
This is not the middle between start and end, it's half the distance between them (rounded down to an integer). Later you use it like it was indeed a valid index:
findBestIndex(target, start, mid - 1)
Which it is not. You probably meant to use mid = (start + end) // 2 or something here.
You also miss a few indices because you skip over the mid:
return findBestIndex(target, start, mid - 1)
...
return findBestIndex(target, mid + 1, end)
Your base case must now be expressed a bit differently as well. A good candidate is the condition
if start == end
Because now you definitely know you're finished searching. Note that you also should consider the case where all the array elements are smaller than target, so you need to insert it at the end.
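Putting these fixes together, a minimal recursive sketch in Python (best_index_rec is a hypothetical name). It searches the half-open range [start, end), so mid is always a valid index, mid is never skipped while it can still be the answer, and the insert-at-end case falls out of the base case:

```python
from typing import List


def best_index_rec(a: List[int], target: int, start: int, end: int) -> int:
    """Insert position of target within the sorted slice a[start:end]."""
    if start == end:  # empty range: start is where target belongs
        return start
    mid = (start + end) // 2  # a real index inside [start, end)
    if target <= a[mid]:
        return best_index_rec(a, target, start, mid)  # mid itself stays a candidate (as end)
    return best_index_rec(a, target, mid + 1, end)


a = [1, 2, 4, 9, 10, 39, 100]
print(best_index_rec(a, 14, 0, len(a)))  # 5
```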
I don't often search binary, but if I do, this is how
Binary search is something that is surprisingly hard to get right if you've never done it before. I usually use the following pattern if I do a binary search:
lo, hi = 0, n  # [lo, hi] is the search range, but hi will never be inspected.
while lo < hi:
    mid = (lo + hi) // 2
    if check(mid): hi = mid
    else: lo = mid + 1
Under the condition that check is a monotone binary predicate (it is always false up to some point and true from that point on), after this loop lo == hi will be the first number in the range [0..n] with check(lo) == true. check(n) is implicitly assumed to be true (that's part of the magic of this approach).
So what is a monotone predicate that is true for all indices including and after our target position and false for all positions before?
If we think about it, we want to find the first number in the array that is greater than or equal to our target (so that an item already in the list gets its own position), so we just plug that in and we're good to go:
def find_best_index(a, target):
    n = len(a)
    lo, hi = 0, n
    while lo < hi:
        mid = (lo + hi) // 2
        if a[mid] >= target: hi = mid
        else: lo = mid + 1
    return lo
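For what it's worth, Python's standard bisect module implements this same lower-bound search, so in Python you may not need to write it yourself. bisect_left returns the first index whose element is >= the target, and bisect_right the first index whose element is > the target, which makes the difference between the two predicates visible:

```python
import bisect

a = [1, 2, 4, 9, 10, 39, 100]
pos_new = bisect.bisect_left(a, 14)      # first element >= 14 is 39, at index 5
pos_existing = bisect.bisect_left(a, 9)  # index of the existing 9
pos_after = bisect.bisect_right(a, 9)    # just past the existing 9
print(pos_new, pos_existing, pos_after)  # 5 3 4
```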
this is the code I have used:
int binarySearch( float arr[] , float x , int low , int high )
{
int mid;
while( low < high ) {
mid = ( high + low ) / 2;
if( arr[mid]== x ) {
break;
}
else if( arr[mid] > x ) {
high=mid-1;
}
else {
low= mid+1;
}
}
mid = ( high + low ) / 2;
if (x<=arr[mid])
return mid;
else
return mid+1;
}
The point is that even when low becomes equal to high, you still have to check.
See this example, for instance: the array {0.5, 0.75},
where you are looking for the correct position of 0.7 or 1.
In both cases, on leaving the while loop, low = high = 1,
but one of them should be placed at position 1 and the other at position 2.
You are on the right track.
First, you do not need abs in mid = abs(end - start) / 2.
Assuming abs here means absolute value: end should never be less than start unless there is a bug in your code, so abs never helps, and it may hide such a bug and make it hard to debug.
You do not need the if (mid < 2) section either; there is nothing special about mid being smaller than two.
array = generateSomeArrayOfOrderedNumbers()
int start = 0;
int end = array.size(); // half-open range [start, end): end itself is never read
int findBestIndex(target, start, end){
    if (start == end){ // empty range: you already searched the entire array, start is the position to insert
        return start;
    }
    mid = start + (end - start) / 2 // the middle index of [start, end), not half the distance
    // find correct position
    if(target == array[mid]) return mid;
    if (target < array[mid])
    {
        // The target belongs on the left side of our list; mid remains a possible insert position //
        return findBestIndex(target, start, mid)
    }
    else
    {
        // The target belongs on the right side of our list //
        return findBestIndex(target, mid + 1, end)
    }
}
I solved this by counting the number of elements that are strictly smaller (<) than the key to insert. The retrieved count is the insert position. Here is a ready to use implementation in Java:
int binarySearchCount(int array[], int left, int right, int key) {
if(left > right) {
return -1; // or throw exception
}
int mid = -1; //init with arbitrary value
while (left <= right) {
// Middle element
mid = (left + right) / 2;
// If the search key on the left half
if (key < array[mid]) {
right = mid - 1;
}
// If the search key on the right half
else if (key > array[mid]) {
left = mid + 1;
}
// We found the key
else {
// handle duplicates
while(mid > 0 && array[mid-1] == array[mid]) {
--mid;
}
break;
}
}
// return the number of elements that are strictly smaller (<) than the key
return key <= array[mid] ? mid : mid + 1;
}
Below is code that, for each of several target values (given as an array), searches a sorted array that contains duplicate values. Note that in this code the array is sorted in descending order.
It returns, for each target, its insertion position among the distinct values, plus 1.
Hope this code helps you in some way.
Any suggestions are welcome.
// requires: import java.util.stream.IntStream;
static int[] climbingLeaderboard(int[] scores, int[] alice) {
int[] noDuplicateScores = IntStream.of(scores).distinct().toArray();
int[] rank = new int[alice.length];
for (int k = 0; k < alice.length; k++) {
int i=0;
int j = noDuplicateScores.length-1;
int pos=0;
int target = alice[k];
while(i<=j) {
int mid = (j+i)/2;
if(target < noDuplicateScores[mid]) {
i = mid +1;
pos = i;
}else if(target > noDuplicateScores[mid]) {
j = mid-1;
pos = j+1;
}else {
pos = mid;
break;
}
}
rank[k] = pos+1;
}
return rank;
}
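Here is a solution that tweaks binary search, in Python: it looks for the last element smaller than the target and returns the index right after it.
def func(x, y):
    """Return the index at which y should be inserted into the sorted list x."""
    if not x or y <= x[0]:
        return 0
    start, end = 0, len(x) - 1
    while start <= end:
        mid = (start + end) // 2
        # y fits right after x[mid] if x[mid] < y <= x[mid + 1] (or x[mid] is the last element)
        if x[mid] < y and (mid + 1 == len(x) or x[mid + 1] >= y):
            return mid + 1
        elif x[mid] >= y:
            end = mid - 1
        else:
            start = mid + 1
    return 0  # not reached: x[0] < y guarantees there is a last element smaller than y
func([1, 2, 4, 5], 3)  # 2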
Solution with a slightly modified binary search in Java:
int findInsertionIndex(int[] arr, int t) {
    if (arr.length == 0) return 0;
    int s = 0, e = arr.length - 1;
    if (t < arr[s]) return s;
    if (t > arr[e]) return e + 1; // larger than every element: insert at the end
    while (s < e) {
        int mid = (s + e) / 2;
        if (arr[mid] >= t) {
            e = mid - 1;
        } else {
            s = mid + 1;
        }
    }
    return arr[s] < t ? s + 1 : s;
}
The code above handles these cases:
If arr[mid] > target: the insertion index lies in the left half; find the index of the first value greater than the target and return it.
If arr[mid] < target: the insertion index lies in the right half; find the index of the last value smaller than the target and return that index + 1.
If arr[mid] == target: find the first index where the target value occurs and return it.

Space-efficient algorithm for finding the largest balanced subarray?

Given an array of 0s and 1s, find the maximum subarray in which the numbers of 0s and 1s are equal.
This needs to be done in O(n) time and O(1) space.
I have an algorithm that does it in O(n) time and O(n) space. It uses a prefix sum array and exploits the fact that if the numbers of 0s and 1s are the same, then
sumOfSubarray = lengthOfSubarray/2
#include<iostream>
#define M 15
using namespace std;
void getSum(int arr[],int prefixsum[],int size) {
int i;
prefixsum[0]=arr[0]=0;
prefixsum[1]=arr[1];
for (i=2;i<=size;i++) {
prefixsum[i]=prefixsum[i-1]+arr[i];
}
}
void find(int a[],int &start,int &end) {
while(start < end) {
int mid = (start +end )/2;
if((end-start+1) == 2 * (a[end] - a[start-1]))
break;
if((end-start+1) > 2 * (a[end] - a[start-1])) {
if(a[start]==0 && a[end]==1)
start++; else
end--;
} else {
if(a[start]==1 && a[end]==0)
start++; else
end--;
}
}
}
int main() {
int size,arr[M],ps[M],start=1,end,width;
;
cin>>size;
arr[0]=0;
end=size;
for (int i=1;i<=size;i++)
cin>>arr[i];
getSum(arr,ps,size);
find(ps,start,end);
if(start!=end)
cout<<(start-1)<<" "<<(end-1)<<endl; else cout<<"No soln\n";
return 0;
}
My algorithm is O(n) time and O(Dn) space, where Dn is the total imbalance in the list.
This solution doesn't modify the list.
Let D be the running difference between the number of 1s and 0s seen so far in the list.
First, let's step linearly through the list and calculate D, just to see how it works.
I'm going to use this list as an example: l = 1100111100001110
Element D
null 0
1 1
1 2 <-
0 1
0 0
1 1
1 2
1 3
1 4
0 3
0 2
0 1
0 0
1 1
1 2
1 3
0 2 <-
Finding the longest balanced subarray is equivalent to finding the two equal elements of D that are farthest apart (in this example, the two 2s marked with arrows).
The longest balanced subarray runs from the element right after the first occurrence of that value to the last occurrence (from the first arrow + 1 to the last arrow: 00111100001110).
Remark:
The longest subarray will always be between 2 elements of D whose value lies in [0, Dn], where Dn is the last element of D (Dn = 2 in the previous example), i.e. the total imbalance between 1s and 0s in the list (use [Dn, 0] if Dn is negative).
In this example, it means that I don't need to "look" at the 3s or 4s.
Proof:
Let Dn > 0.
Suppose a subarray is delimited by two elements equal to P, with P > Dn. Since D starts at 0 and moves by ±1, and 0 < Dn < P, before reaching the first element of D equal to P we reach an element equal to Dn. Since the last element of D is also equal to Dn, there is a longer subarray delimited by Dn values than the one delimited by Ps, and therefore we don't need to look at Ps.
P cannot be less than 0, for the same reasons.
The proof is the same for Dn < 0.
Now let's work on D. D isn't random: the difference between 2 consecutive elements is always 1 or -1, and there is an easy bijection between D and the initial list. Therefore I have 2 solutions for this problem:
the first one is to keep track of the first and last appearance of each element of D that lies between 0 and Dn (cf. remark);
the second is to transform the list into D, and then work on D.
FIRST SOLUTION
For the time being I cannot find a better approach than the first one:
First, calculate Dn (in O(n)). Here Dn = 2.
Second, instead of creating D, create a dictionary whose keys are the values of D in [0, Dn] and where the value for each key is a pair (a, b), with a the first occurrence of the key and b the last (positions are indices into D, where position 0 is the empty prefix).
Element  D  DICTIONARY
null     0  {0:(0,0)}
1        1  {0:(0,0) 1:(1,1)}
1        2  {0:(0,0) 1:(1,1) 2:(2,2)}
0        1  {0:(0,0) 1:(1,3) 2:(2,2)}
0        0  {0:(0,4) 1:(1,3) 2:(2,2)}
1        1  {0:(0,4) 1:(1,5) 2:(2,2)}
1        2  {0:(0,4) 1:(1,5) 2:(2,6)}
1        3  {0:(0,4) 1:(1,5) 2:(2,6)}
1        4  {0:(0,4) 1:(1,5) 2:(2,6)}
0        3  {0:(0,4) 1:(1,5) 2:(2,6)}
0        2  {0:(0,4) 1:(1,5) 2:(2,10)}
0        1  {0:(0,4) 1:(1,11) 2:(2,10)}
0        0  {0:(0,12) 1:(1,11) 2:(2,10)}
1        1  {0:(0,12) 1:(1,13) 2:(2,10)}
1        2  {0:(0,12) 1:(1,13) 2:(2,14)}
1        3  {0:(0,12) 1:(1,13) 2:(2,14)}
0        2  {0:(0,12) 1:(1,13) 2:(2,16)}
and you choose the element with the largest difference, 2:(2,16), which corresponds to l[2:16] = 00111100001110 (0-based, end-exclusive; l = 1100111100001110).
Time complexity:
Two passes: the first one to calculate Dn, the second one to build the dictionary. Then find the maximum over the dictionary.
Total is O(n).
Space complexity:
The current element of D: O(1); the dictionary: O(Dn).
I don't put 3 and 4 in the dictionary because of the remark.
The complexity is O(n) time and O(Dn) space (in the average case, Dn << n).
I guess there may be a better way than a dictionary for this approach.
Any suggestion is welcome.
Hope it helps
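The first solution can be sketched in Python as follows. This is a minimal sketch assuming the input is a list of 0/1 integers; the function name longest_balanced is made up for illustration, and it returns end-exclusive bounds (start, end) so that bits[start:end] is the subarray. As the remark allows, it only records values of D in [0, Dn] (or [Dn, 0]).

```python
def longest_balanced(bits):
    """Return (start, end) such that bits[start:end] is a longest subarray
    with as many 0s as 1s; (0, 0) if there is none."""
    dn = sum(1 if b else -1 for b in bits)   # first pass: Dn = #1s - #0s
    lo, hi = min(0, dn), max(0, dn)          # only D values in [0, Dn] matter
    first, last = {}, {}                     # value of D -> first/last index
    d = 0                                    # D[0] = 0
    for i in range(len(bits) + 1):
        if i > 0:
            d += 1 if bits[i - 1] else -1    # D[i] = D[i-1] +/- 1
        if lo <= d <= hi:
            first.setdefault(d, i)
            last[d] = i
    best = max(first, key=lambda v: last[v] - first[v])
    return first[best], last[best]
```

On the example list, longest_balanced([1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 0]) returns (2, 16), i.e. the slice 00111100001110, matching the last row of the table.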
SECOND SOLUTION (JUST AN IDEA, NOT THE REAL SOLUTION)
The second way to proceed would be to transform your list into D (since it's easy to go back from D to the list, that's OK). This takes O(n) time and O(1) space, since I transform the list in place, even though it might not be a "valid" O(1).
Then, from D, you need to find the two equal elements that are farthest apart.
This looks like finding the longest cycle in a linked list. A modification of Richard Brent's algorithm might return the longest cycle in O(n) time and O(1) space, but I don't know how to do it.
Once you find the longest cycle, go back to the original list and print it.
This algorithm would take O(n) time and O(1) space.
A different approach, but still O(n) time and memory. Start with Neil's suggestion: treat 0 as -1.
Notation: A[0, …, N-1] is your array of size N, and f(0) = 0, f(x) = A[x-1] + f(x-1) is the prefix-sum function.
If you plot f, you'll see that what you are looking for are points m < n with f(m) = f(n), where n - m = 2k for some positive integer k, and n - m is as large as possible. For each value of f it is enough to remember the first x at which f(x) takes that value. Unfortunately, I now see no improvement over keeping an array B[-N…N] where this information is stored.
To complete my thought: B[x] = -1 initially, and B[x] = p where p = min{k : f(k) = x}. The algorithm is (double-check it, as I'm very tired):
fx = 0
B = new array[-N, …, N], every entry initialised to -1
maxlen = 0
B[0] = 0
for i = 1…N:
    fx = fx + A[i-1]            // fx = f(i)
    if B[fx] == -1:
        B[fx] = i               // first time f takes the value fx
    else if maxlen < i - B[fx]:
        // A[B[fx], …, i-1] is better than what we found so far
        maxlen = i - B[fx]
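A Python rendering of the array-B algorithm, offered as a sketch: it assumes A already holds +1/-1 values (0 mapped to -1), stores B[x] at offset N because Python lists cannot have negative indices, and additionally records where the best subarray starts. The name longest_zero_sum is hypothetical.

```python
def longest_zero_sum(a):
    """a holds +1/-1 values. Return (start, length) of a longest subarray
    with sum 0, i.e. a[start:start + length]; length is 0 if none exists."""
    n = len(a)
    first = [-1] * (2 * n + 1)   # B[-N..N], stored at offset n
    first[n] = 0                 # f(0) = 0
    f = 0
    best_start, best_len = 0, 0
    for i in range(1, n + 1):
        f += a[i - 1]            # f(i) = a[0] + ... + a[i-1]
        if first[f + n] == -1:
            first[f + n] = i     # first index where f takes this value
        elif i - first[f + n] > best_len:
            best_start, best_len = first[f + n], i - first[f + n]
    return best_start, best_len
```

For the list 1100111100001110 (as +1/-1 values) this returns (2, 14), the same 14-element subarray found by the dictionary approach.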
Edit: Two bed-thoughts (= figured out while lying in bed :P ):
1) You could binary search the result by the length of the subarray, which would give an O(n log n) time and O(1) memory algorithm. Let's use the function g(x) = x - x mod 2 (because subarrays which sum to 0 always have even length). Start by checking if the whole array sums to 0. If yes, we're done; otherwise continue. We now take 0 as the starting point (we know there's a subarray of that length with the "summing-to-zero property") and g(N+1) as the ending point (we know there's no such subarray: that length is either the whole array, which we just checked, or longer than the array). Note that the search assumes that a balanced subarray of length L implies one of length L - 2, which does not hold in general: 11000011 is balanced, but none of its subarrays of length 6 is. Let's do
a = 0
b = g(N+1)
while b - a > 2:
    c = g((a+b)/2)          // a < c < b
    check if there is such a subarray of length c in O(n) time
    if yes:
        a = c
    if no:
        b = c
return the result: a (length of maximum subarray)
Checking for a subarray with the "summing-to-zero property" of some given length L is simple:
a = 0
b = L
fa = fb = 0
for i = 0…L-1:
    fb = fb + A[i]
while (fa != fb) and (b < N):
    fa = fa + A[a]
    fb = fb + A[b]
    a = a + 1
    b = b + 1
if fa != fb:
    not found
else:
    found, starts at a and stops at b-1
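The fixed-length check can also be written as a sliding window that keeps a single window sum instead of two prefix sums. This is a hypothetical helper (balanced_window), under the same assumption that the array holds +1/-1 values; it returns the start of the first balanced window or -1.

```python
def balanced_window(a, length):
    """Return the start index of the first subarray of a (values +1/-1)
    of the given length that sums to 0, or -1 if there is none."""
    if length > len(a):
        return -1
    window = sum(a[:length])           # sum of a[0:length]
    if window == 0:
        return 0
    for start in range(1, len(a) - length + 1):
        # slide right by one: drop a[start-1], add a[start+length-1]
        window += a[start + length - 1] - a[start - 1]
        if window == 0:
            return start
    return -1
```

On 1 1 -1 -1 -1 -1 1 1 it finds balanced windows of lengths 8 and 4 at index 0 but none of length 6, which is worth keeping in mind before binary searching over the length.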
2) …can you modify the input array? If yes, and if O(1) memory means exactly that you use no additional space (except for a constant number of variables), then just store your prefix table values in your input array. No more space used (except for some variables) :D
And again, double-check my algorithms, as I'm veeery tired and could have made off-by-one errors.
Like Neil, I find it useful to consider the alphabet {±1} instead of {0, 1}. Assume without loss of generality that there are at least as many +1s as -1s. The following algorithm, which uses O(sqrt(n log n)) bits and runs in time O(n), is due to "A.F."
Note: this solution does not cheat by assuming the input is modifiable and/or has wasted bits. As of this edit, this solution is the only one posted that is both O(n) time and o(n) space.
An easier version, which uses O(n) bits, streams the array of prefix sums and marks the first occurrence of each value. It then scans backward, considering for each height between 0 and sum(arr) the maximal subarray at that height. Some thought reveals that the optimum is among these (remember the assumption). In Python:
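def longest_balanced_length(arr):
    # arr holds +1/-1 values; WLOG make the +1s at least as many as the -1s
    if sum(arr) < 0:
        arr = [-x for x in arr]
    total = 0
    min_so_far = 0
    max_so_far = 0
    # is_first[k]: the prefix sum of arr[:k] takes its value for the first time
    is_first = [True] * (1 + len(arr))
    for i, x in enumerate(arr):
        total += x
        if total < min_so_far:
            min_so_far = total
        elif total > max_so_far:
            max_so_far = total
        else:
            is_first[1 + i] = False
    # i: first position where the prefix sum equals total
    sum_i = 0
    i = 0
    while sum_i != total:
        sum_i += arr[i]
        i += 1
    sum_j = total
    j = len(arr)
    longest = j - i
    for h in range(total - 1, -1, -1):
        # move i back to the first occurrence of height h
        while sum_i != h or not is_first[i]:
            i -= 1
            sum_i -= arr[i]
        # move j back to the last occurrence of height h
        while sum_j != h:
            j -= 1
            sum_j -= arr[j]
        longest = max(longest, j - i)
    return longest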
The trick to get the space down comes from noticing that we're scanning is_first sequentially, albeit in reverse order relative to its construction. Since the loop variables fit in O(log n) bits, we'll compute, instead of is_first, a checkpoint of the loop variables after each O(√(n log n)) steps. This is O(n/√(n log n)) = O(√(n/log n)) checkpoints, for a total of O(√(n log n)) bits. By restarting the loop from a checkpoint, we compute on demand each O(√(n log n))-bit section of is_first.
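The checkpoint trick can be sketched in Python as below. This only illustrates the control flow: Python lists and integers do not give the bit-level O(√(n log n)) bound, the block size isqrt(n · bit_length(n)) is an assumed stand-in for √(n log n) (math.isqrt needs Python 3.8+), and longest_balanced_checkpointed is a made-up name. The forward pass keeps only (prefix sum, min, max) at the start of each block; when the backward scan needs is_first for a position, the block containing it is recomputed from its checkpoint. Since that scan only moves backward, each block is recomputed at most once.

```python
import math


def longest_balanced_checkpointed(arr):
    """arr holds +1/-1 values. Return the length of a longest zero-sum
    subarray, recomputing is_first block by block from checkpoints."""
    if sum(arr) < 0:                      # WLOG at least as many +1s as -1s
        arr = [-x for x in arr]
    n = len(arr)
    # Assumed block size, roughly sqrt(n log n).
    block = max(1, math.isqrt(n * max(1, n.bit_length())))

    # Forward pass: save (prefix sum, min, max) at the start of each block.
    checkpoints = []
    total = lo = hi = 0
    for i, x in enumerate(arr):
        if i % block == 0:
            checkpoints.append((total, lo, hi))
        total += x
        lo, hi = min(lo, total), max(hi, total)

    cached_block, cached_bits = None, []

    def is_first(k):
        """True iff position k is the first occurrence of its prefix sum."""
        nonlocal cached_block, cached_bits
        if k == 0:
            return True
        step = k - 1                      # is_first[k] is decided at step k-1
        b = step // block
        if b != cached_block:             # rerun the loop from checkpoint b
            s, mn, mx = checkpoints[b]
            cached_bits = []
            for x in arr[b * block:(b + 1) * block]:
                s += x
                new_extreme = s < mn or s > mx
                mn, mx = min(mn, s), max(mx, s)
                cached_bits.append(new_extreme)
            cached_block = b
        return cached_bits[step - b * block]

    # Backward scan, as in the is_first version.
    sum_i, i = 0, 0
    while sum_i != total:                 # first position at height total
        sum_i += arr[i]
        i += 1
    sum_j, j = total, n
    longest = j - i
    for h in range(total - 1, -1, -1):
        while sum_i != h or not is_first(i):   # first occurrence of h
            i -= 1
            sum_i -= arr[i]
        while sum_j != h:                      # last occurrence of h
            j -= 1
            sum_j -= arr[j]
        longest = max(longest, j - i)
    return longest
```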
(P.S.: it may or may not be my fault that the problem statement asks for O(1) space. I sincerely apologize if it was I who pulled a Fermat and suggested that I had a solution to a problem much harder than I thought it was.)
If indeed your algorithm is valid in all cases (see my comment to your question noting some corrections to it), notice that the prefix array is the only obstruction to your constant memory goal.
Examining the find function reveals that this array can be replaced with two integers, thereby eliminating the dependence on the length of the input and solving your problem. Consider the following:
You only depend on two values in the prefix array in the find function. These are a[start - 1] and a[end]. Yes, start and end change, but does this merit the array?
Look at the progression of your loop. At the end, start is incremented or end is decremented only by one.
Considering the previous statement, if you were to replace the value of a[start - 1] by an integer, how would you update its value? Put another way, for each transition in the loop that changes the value of start, what could you do to update the integer accordingly to reflect the new value of a[start - 1]?
Can this process be repeated with a[end]?
If, in fact, the values of a[start - 1] and a[end] can be reflected with two integers, doesn't the whole prefix array no longer serve a purpose? Can't it therefore be removed?
With no need for the prefix array and all storage dependencies on the length of the input removed, your algorithm will use a constant amount of memory to achieve its goal, thereby making it O(n) time and O(1) space.
I would prefer you solve this yourself based on the insights above, as this is homework. Nevertheless, I have included a solution below for reference:
#include <cstdio>
#include <iostream>
using namespace std;
void find( int *data, int &start, int &end )
{
// reflects the prefix sum until start - 1
int sumStart = 0;
// reflects the prefix sum until end
int sumEnd = 0;
for( int i = start; i <= end; i++ )
sumEnd += data[i];
while( start < end )
{
int length = end - start + 1;
int sum = 2 * ( sumEnd - sumStart );
if( sum == length )
break;
else if( sum < length )
{
// sum needs to increase; get rid of the lower endpoint
if( data[ start ] == 0 && data[ end ] == 1 )
{
// sumStart must be updated to reflect the new prefix sum
sumStart += data[ start ];
start++;
}
else
{
// sumEnd must be updated to reflect the new prefix sum
sumEnd -= data[ end ];
end--;
}
}
else
{
// sum needs to decrease; get rid of the higher endpoint
if( data[ start ] == 1 && data[ end ] == 0 )
{
// sumStart must be updated to reflect the new prefix sum
sumStart += data[ start ];
start++;
}
else
{
// sumEnd must be updated to reflect the new prefix sum
sumEnd -= data[ end ];
end--;
}
}
}
}
int main() {
int length;
cin >> length;
// get the data
int data[length];
for( int i = 0; i < length; i++ )
cin >> data[i];
// solve and print the solution
int start = 0, end = length - 1;
find( data, start, end );
if( start == end )
puts( "No soln" );
else
printf( "%d %d\n", start, end );
return 0;
}
This algorithm is O(n) time and O(1) space. It may modify the source array, but it restores all the information afterwards, so it does not work with const arrays. If this puzzle has several solutions, this algorithm picks the solution nearest to the beginning of the array; it might also be modified to provide all solutions.
Algorithm
Variables:
p1 - subarray start
p2 - subarray end
d - difference of 1s and 0s in the subarray
1) Calculate d. If d == 0, stop. If d < 0, invert the array, and invert it back after the balanced subarray is found.
2) While d > 0, move p2 (starting from the array's end): if the array element is 1, just decrement both p2 and d. Otherwise p2 has to pass a subarray of the form 11*0, where * is some balanced subarray. To make backtracking possible, 11*0? is changed to 0?*00 (where ? is the value next to the subarray). Then d is decremented.
3) Store p1 and p2.
4) Backtrack p2: if the array element is 1, just increment p2. Otherwise we have found an element changed in step 2: revert the changes and pass the subarray of the form 11*0.
5) Advance p1: if the array element is 1, just increment p1. Otherwise p1 has to pass a subarray of the form 0*11.
6) Store p1 and p2 if p2 - p1 has improved. If p2 is at the end of the array, stop. Otherwise continue with step 4.
How does it work
The algorithm iterates through all possible positions of the balanced subarray in the input array. For each subarray position, p1 and p2 are kept as far from each other as possible, providing the locally longest subarray. The subarray with maximum length is chosen among all these subarrays.
To determine the next best position for p1, it is advanced to the first position where the balance between 1s and 0s changes by one (step 5).
To determine the next best position for p2, it is advanced to the last position where the balance between 1s and 0s changes by one. To make this possible, step 2 detects all such positions (starting from the array's end) and modifies the array in such a way that it is possible to iterate through these positions with a linear search (step 4).
While performing step 2, two possible conditions may be met. The simple one: when the value '1' is found, pointer p2 is just advanced to the next value, and no special treatment is needed. But when the value '0' is found, the balance is going in the wrong direction, and it is necessary to pass through several bits until the correct balance is found. All these bits are of no interest to the algorithm: stopping p2 there would give either a balanced subarray that is too short, or an unbalanced subarray. As a result, p2 should pass a subarray of the form 11*0 (from right to left; * means any balanced subarray). There is no way to go the same way in the other direction, but it is possible to temporarily use some bits from the pattern 11*0 to allow backtracking. If we change the first '1' to '0', the second '1' to the value next to the rightmost '0', and clear the value next to the rightmost '0' (11*0? -> 0?*00), then we get the possibility to (first) notice the pattern on the way back, since it starts with '0', and (second) find the next good position for p2.
C++ code:
#include <cstddef>
#include <bitset>
static const size_t N = 270;
void findLargestBalanced(std::bitset<N>& a, size_t& p1s, size_t& p2s)
{
// Step 1
size_t p1 = 0;
size_t p2 = N;
int d = 2 * static_cast<int>(a.count()) - static_cast<int>(N); // signed: #1s - #0s
bool flip = false;
if (d == 0) {
p1s = 0;
p2s = N;
return;
}
if (d < 0) {
flip = true;
d = -d;
a.flip();
}
// Step 2
bool next = true;
while (d > 0) {
if (p2 < N) {
next = a[p2];
}
--d;
--p2;
if (a[p2] == false) {
if (p2+1 < N) {
a[p2+1] = false;
}
int dd = 2;
while (dd > 0) {
dd += (a[--p2]? -1: 1);
}
a[p2+1] = next;
a[p2] = false;
}
}
// Step 3
p2s = p2;
p1s = p1;
do {
// Step 4
if (a[p2] == false) {
a[p2++] = true;
bool nextToRestore = a[p2];
a[p2++] = true;
int dd = 2;
while (dd > 0 && p2 < N) {
dd += (a[p2++]? 1: -1);
}
if (dd == 0) {
a[--p2] = nextToRestore;
}
}
else {
++p2;
}
// Step 5
if (a[p1++] == false) {
int dd = 2;
while (dd > 0) {
dd += (a[p1++]? -1: 1);
}
}
// Step 6
if (p2 - p1 > p2s - p1s) {
p2s = p2;
p1s = p1;
}
} while (p2 < N);
if (flip) {
a.flip();
}
}
Sum all elements in the array; then zeros = (array.length - sum) is the number of 0s.
If zeros is equal to array.length/2, then the maximum subarray = array.
If zeros is less than array.length/2, then there are more 1s than 0s.
If zeros is greater than array.length/2, then there are more 0s than 1s.
For cases 2 & 3, initialize two pointers, start and end, pointing to the beginning and end of the array. If we have more 1s, then move the pointers inward (start++ or end--) based on whether array[start] = 1 or array[end] = 1, and update sum accordingly. At each step, check if sum = (end - start + 1) / 2. If this condition is true, then start and end represent the bounds of your maximum subarray.
Here we end up doing two passes of the array, once to calculate sum and once while moving the pointers inward. And we are using constant space, as we just need to store sum and two index values.
If anyone wants to knock up some pseudocode, you're more than welcome :)
Here's an ActionScript solution that looked like it was scaling as O(n), though it might be more like O(n log n). It definitely uses only O(1) memory.
Warning: I haven't checked how complete it is. I could be missing some cases.
protected function findLongest(array:Array, start:int = 0, end:int = -1):int {
if (end < start) {
end = array.length-1;
}
var startDiff:int = 0;
var endDiff:int = 0;
var diff:int = 0;
var length:int = end-start;
for (var i:int = 0; i <= length; i++) {
if (array[i+start] == '1') {
startDiff++;
} else {
startDiff--;
}
if (array[end-i] == '1') {
endDiff++;
} else {
endDiff--;
}
//We can stop when there's no chance of equalizing anymore.
if (Math.abs(startDiff) > length - i) {
diff = endDiff;
start = end - i;
break;
} else if (Math.abs(endDiff) > length - i) {
diff = startDiff;
end = i+start;
break;
}
}
var bit:String = diff > 0 ? '1': '0';
var diffAdjustment:int = diff > 0 ? -1: 1;
//Strip off the bad vars off the ends.
while (diff != 0 && array[start] == bit) {
start++;
diff += diffAdjustment;
}
while(diff != 0 && array[end] == bit) {
end--;
diff += diffAdjustment;
}
//If we have equalized end. Otherwise recurse within the sub-array.
if (diff == 0)
return end-start+1;
else
return findLongest(array, start, end);
}
I would argue that an algorithm with O(1) space cannot exist, in the following way. Assume you iterate ONCE over every bit. This requires a counter, which needs O(log n) space. Possibly one could argue that n itself is part of the problem instance; then, for a binary string of length k, the input length is k + log₂ k. However you look at it, you need an additional variable; in case you need an index into that array, that alone already makes it non-O(1).
Usually you don't have this problem, because for a problem of size n you have an input of n numbers of size log k each, which adds up to n log k. There, a variable of length log k is just O(1). But here our log k is just 1, so we can only introduce a helper variable that has constant length (and I mean really constant: it must be bounded regardless of how big n is).
Here a problem with the description of the problem becomes visible. In complexity theory you have to be very careful about your encoding. E.g. you can make some NP problems polynomial if you switch to unary encoding (because then the input size is exponentially bigger than in an n-ary (n > 1) encoding).
As the number n itself has an input size of just log₂ n, one must be careful. When you speak of O(n) in this case, this is really an algorithm that is exponential in the length of n's encoding (this is not a point we need to discuss, because one can argue whether n itself is part of the description or not).
I have this algorithm running in O(n) time and O(1) space.
It makes use of a simple "shrink-then-expand" trick; see the comments in the code.
public static void longestSubArrayWithSameZerosAndOnes() {
// You are given an array of 1's and 0's only.
// Find the longest subarray which contains equal number of 1's and 0's
int[] A = new int[] {1, 0, 1, 1, 1, 0, 0,0,1};
int num0 = 0, num1 = 0;
// First, calculate how many 0s and 1s in the array
for(int i = 0; i < A.length; i++) {
if(A[i] == 0) {
num0++;
}
else {
num1++;
}
}
if(num0 == 0 || num1 == 0) {
System.out.println("The length of the sub-array is 0");
return;
}
// Second, check the array to find a continuous "block" that has
// the same number of 0s and 1s, starting from the HEAD and the
// TAIL of the array, and moving the 2 "pointer" (HEAD and TAIL)
// towards the CENTER of the array
int start = 0, end = A.length - 1;
while(num0 != num1 && start < end) {
if(num1 > num0) {
if(A[start] == 1) {
num1--; start++;
}
else if(A[end] == 1) {
num1--; end--;
}
else {
num0--; start++;
num0--; end--;
}
}
else if(num1 < num0) {
if(A[start] == 0) {
num0--; start++;
}
else if(A[end] == 0) {
num0--; end--;
}
else {
num1--; start++;
num1--; end--;
}
}
}
if(num0 == 0 || num1 == 0) {
start = end;
end++;
}
// Third, expand the continuous "block" just found at step #2 by
// moving "HEAD" to head of the array and "TAIL" to the end of
// the array, while still keeping the "block" balanced (containing
// the same number of 0s and 1s)
while(0 < start && end < A.length - 1) {
if(A[start - 1] == 0 && A[end + 1] == 0 || A[start - 1] == 1 && A[end + 1] == 1) {
break;
}
start--;
end++;
}
System.out.println("The length of the sub-array is " + (end - start + 1) + ", starting from #" + start + " to #" + end);
}
Linear time, constant space. Let me know if there is any bug I missed.
Tested in Python 3.
def longestBalancedSubarray(A):
    lo, hi = 0, len(A) - 1
    ones = sum(A)
    zeros = len(A) - ones
    # drop an element of the majority kind from either end if possible
    while lo < hi and ones != zeros:
        if ones > zeros:
            if A[lo] == 1: lo += 1; ones -= 1
            elif A[hi] == 1: hi -= 1; ones -= 1
            else: lo += 1; zeros -= 1
        else:
            if A[lo] == 0: lo += 1; zeros -= 1
            elif A[hi] == 0: hi -= 1; zeros -= 1
            else: lo += 1; ones -= 1
    return A[lo:hi + 1] if ones == zeros else []

Find the first element in a sorted array that is greater than the target

In a general binary search, we are looking for a value which appears in the array. Sometimes, however, we need to find the first element which is either greater or less than a target.
Here is my ugly, incomplete solution:
// Assume all elements are positive, i.e., greater than zero
int bs (int[] a, int t) {
int s = 0, e = a.length;
int firstlarge = 1 << 30;
int firstlargeindex = -1;
while (s < e) {
int m = (s + e) / 2;
if (a[m] > t) {
// how can I know a[m] is the first larger than
if(a[m] < firstlarge) {
firstlarge = a[m];
firstlargeindex = m;
}
e = m - 1;
} else if (a[m] < /* something */) {
// go to the right part
// how can i know is the first less than
}
}
}
Is there a more elegant solution for this kind of problem?
One way of thinking about this problem is to think about doing a binary search over a transformed version of the array, where the array has been modified by applying the function
f(x) = 1 if x > target, 0 otherwise
Now, the goal is to find the very first place that this function takes on the value 1. We can do that using a binary search as follows:
int low = 0, high = numElems; // numElems is the size of the array i.e arr.size()
while (low != high) {
int mid = (low + high) / 2; // Or a fancy way to avoid int overflow
if (arr[mid] <= target) {
/* This index, and everything below it, cannot be the first element
* greater than the target, because this element is no greater
* than the target.
*/
low = mid + 1;
}
else {
/* This element is greater than the target, so anything after it can't
* be the first element that's greater than the target.
*/
high = mid;
}
}
/* Now low and high both point to the first element greater than the target
* (or to numElems if there is no such element). */
To see that this algorithm is correct, consider each comparison being made. If we find an element that's no greater than the target element, then it and everything below it can't possibly match, so there's no need to search that region. We can recursively search the right half. If we find an element that is larger than the element in question, then anything after it must also be larger, so they can't be the first element that's bigger and so we don't need to search them. The middle element is thus the last possible place it could be.
Note that on each iteration we drop at least half of the remaining candidates from consideration. If the top branch executes, then the elements in the range [low, floor((low + high) / 2)] are all discarded, causing us to lose floor((low + high) / 2) - low + 1 >= (low + high) / 2 - low = (high - low) / 2 elements.
If the bottom branch executes, then the elements in the range [floor((low + high) / 2) + 1, high] are all discarded. This loses us high - floor((low + high) / 2) >= high - (low + high) / 2 = (high - low) / 2 elements.
Consequently, we'll end up finding the first element greater than the target in O(lg n) iterations of this process.
Here's a trace of the algorithm running on the array 0 0 1 1 1 1 with target 0.
Initially, we have
0 0 1 1 1 1
L = 0 H = 6
So we compute mid = (0 + 6) / 2 = 3, so we inspect the element at position 3, which has value 1. Since 1 > 0, we set high = mid = 3. We now have
0 0 1
L H
We compute mid = (0 + 3) / 2 = 1, so we inspect element 1. Since this has value 0 <= 0, we set low = mid + 1 = 2. We're now left with L = 2 and H = 3:
0 0 1
L H
Now we compute mid = (2 + 3) / 2 = 2. The element at index 2 is 1, and since 1 > 0, we set H = mid = 2, at which point we stop; indeed, we're looking at the first element greater than 0.
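For readers who want to replay the trace, here is a direct Python transcription of the loop, assuming a sorted Python list; first_greater is a hypothetical name. When no element is greater than the target, low and high meet at len(arr), which is what it returns in that case.

```python
def first_greater(arr, target):
    """Index of the first element of sorted arr that is > target,
    or len(arr) if there is none."""
    low, high = 0, len(arr)
    while low != high:
        mid = (low + high) // 2
        if arr[mid] <= target:
            low = mid + 1    # arr[mid] and everything before it are <= target
        else:
            high = mid       # arr[mid] > target: nothing after it can be first
    return low
```

first_greater([0, 0, 1, 1, 1, 1], 0) returns 2, matching the trace.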
You can use std::upper_bound if the array is sorted (assuming n is the size of array a[]):
int* p = std::upper_bound( a, a + n, x );
if( p == a + n )
std::cout << "No element greater";
else
std::cout << "The first element greater is " << *p
<< " at position " << p - a;
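Python's standard library has the same operation: bisect.bisect_right returns the insertion point after any entries equal to x, which is the index of the first element greater than x. A small wrapper mirroring the C++ snippet (the name first_greater_index and the -1 convention are choices made here, not part of bisect):

```python
from bisect import bisect_right


def first_greater_index(a, x):
    """Index of the first element of sorted list a that is > x, or -1."""
    i = bisect_right(a, x)   # like std::upper_bound: first i with a[i] > x
    return i if i < len(a) else -1
```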
After many years of teaching algorithms, my approach for solving binary search problems is to set the start and the end on the elements, not outside of the array. This way I can feel what's going on and everything is under control, with nothing magical about the solution.
The key point in solving binary search problems (and many other loop-based solutions) is a set of good invariants. Choosing the right invariant makes problem-solving a piece of cake. It took me many years to grasp the invariant concept, although I had first learned it in college many years ago.
Even if you want to solve binary search problems by placing start or end outside of the array, you can still do it with a proper invariant. That being said, my choice, as stated above, is to always set start on the first element and end on the last element of the array.
So to summarize, so far we have:
int start = 0;
int end = a.length - 1;
Now the invariant. The range we have right now is [start, end]. We don't know anything yet about its elements: all of them might be greater than the target, or all might be smaller, or some smaller and some larger, so we can't make any assumptions about them yet. Our goal is to find the first element greater than the target, so we choose the invariants like this:
Any element to the right of end is greater than the target.
Any element to the left of start is smaller than or equal to the target.
We can easily see that our invariant holds at the start (i.e. before entering the loop): all the elements to the left of start (none, basically) are smaller than or equal to the target, and the same reasoning applies to end.
With this invariant, when the loop finishes, the first element after end will be the answer (remember the invariant that everything to the right of end is greater than the target?). So answer = end + 1.
Also note that when the loop finishes, start is one more than end, i.e. start = end + 1. So equivalently, start is the answer as well (the invariant was that anything to the left of start is smaller than or equal to the target, so start itself is the first element larger than the target). If every element is smaller than or equal to the target, the answer is a.length, one past the end.
So everything being said, here is the code.
public static int find(int a[], int target) {
int st = 0;
int end = a.length - 1;
while(st <= end) {
int mid = (st + end) / 2; // or elegant way of st + (end - st) / 2;
if (a[mid] <= target) {
st = mid + 1;
} else { // a[mid] > target
end = mid - 1;
}
}
return st; // or return end + 1
}
A few extra notes about this way of solving binary search problems:
This type of solution always shrinks the size of the subarray by at least 1. This is obvious in the code: the new start or end is always mid + 1 or mid - 1. I like this approach better than including mid in both sides or one side and then reasoning later about why the algorithm is correct. This way it's more tangible and less error-prone.
The condition for the while loop is st <= end, not st < end. That means the smallest size that enters the while loop is an array of size 1, and that totally aligns with what we expect. In other ways of solving binary search problems, sometimes the smallest size is an array of size 2 (if st < end), and honestly I find it much easier to always address all array sizes, including size 1.
So I hope this clarifies the solution for this problem and many other binary search problems. Treat this solution as a way to understand and solve many more binary search problems without ever wondering whether the algorithm works for the edge cases.
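To make the invariants tangible, here is the same loop in Python with both invariants checked by assert on every iteration. It is a debugging aid rather than production code: each all(...) check is O(n), which makes the search O(n log n), and Python skips assert statements when run with -O. The function name find_first_greater is hypothetical.

```python
def find_first_greater(a, target):
    """First index whose element is > target in sorted list a
    (len(a) if none), checking the answer's invariants as it goes."""
    st, end = 0, len(a) - 1
    while st <= end:
        # invariants: everything left of st is <= target,
        # everything right of end is > target
        assert all(x <= target for x in a[:st])
        assert all(x > target for x in a[end + 1:])
        mid = st + (end - st) // 2
        if a[mid] <= target:
            st = mid + 1
        else:
            end = mid - 1
    assert st == end + 1
    return st
```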
How about the following recursive approach:
public static int minElementGreaterThanOrEqualToKey(int A[], int key,
int imin, int imax) {
// Return -1 if the range is empty or if the key
// is greater than the maximum
if (imax < imin || key > A[imax])
return -1;
// Return the first element of the range if that element is greater than
// or equal to the key.
if (key <= A[imin])
return imin;
// When the minimum and maximum indices become equal, we have located the element.
if (imax == imin)
return imax;
else {
// calculate midpoint to cut set in half, avoiding integer overflow
int imid = imin + ((imax - imin) / 2);
// if key is in upper subset, then recursively search in that subset
if (A[imid] < key)
return minElementGreaterThanOrEqualToKey(A, key, imid + 1, imax);
// if key is in lower subset, then recursively search in that subset
else
return minElementGreaterThanOrEqualToKey(A, key, imin, imid);
}
}
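This recursive function returns the first element greater than or equal to the key (note the >= rather than the strict > asked for in the question). In Python, bisect.bisect_left computes that position directly; a minimal wrapper, assuming a sorted list and the same -1 convention (min_element_ge is a made-up name):

```python
from bisect import bisect_left


def min_element_ge(a, key):
    """Index of the first element of sorted list a that is >= key, or -1."""
    i = bisect_left(a, key)   # leftmost position where key could be inserted
    return i if i < len(a) else -1
```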
public static int search(int target, int[] arr) {
if (arr == null || arr.length == 0)
return -1;
int lower = 0, higher = arr.length - 1;
while (lower <= higher) {
int mid = lower + (higher - lower) / 2;
if (target == arr[mid]) {
// every element up to mid is <= target, so keep searching to the right
lower = mid + 1;
} else if (target < arr[mid]) {
higher = mid - 1;
} else {
lower = mid + 1;
}
}
// lower is now the index of the first element greater than target
return lower < arr.length ? lower : -1;
}
If we find target == arr[mid], then every previous element is less than or equal to the target, so the lower boundary is set to lower = mid + 1. When the loop ends, every element before lower is less than or equal to the target and every element from lower on is greater, so lower is the answer whether or not the target occurs in the array; we return -1 if lower has run past the end.
My implementation uses condition bottom <= top which is different from the answer by templatetypedef.
int FirstElementGreaterThan(int n, const vector<int>& values) {
int B = 0, T = values.size() - 1, M = 0;
while (B <= T) { // B strictly increases, T strictly decreases
M = B + (T - B) / 2;
if (values[M] <= n) { // all values at or before M are not the target
B = M + 1;
} else {
T = M - 1;// search for other elements before M
}
}
return T + 1;
}
Here is a modified binary search in Java, with O(log n) time complexity, that:
returns the index of the element searched for, if it is present;
returns the index of the next greater element, if the searched element is not present in the array;
returns -1 if the key is greater than the largest element of the array.
public static int search(int arr[],int key) {
int low=0,high=arr.length,mid=-1;
boolean flag=false;
while(low<high) {
mid=(low+high)/2;
if(arr[mid]==key) {
flag=true;
break;
} else if(arr[mid]<key) {
low=mid+1;
} else {
high=mid;
}
}
if(flag) {
return mid;
}
else {
if(low>=arr.length)
return -1;
else
return low;
//high will give next smaller
}
}
public static void main(String args[]) {
int arr[]={12,15,54,221,712};
int key=71;
System.out.println(search(arr,key)); // prints 3, the index of 221
}
The function f below takes a parameter kind:
kind = 0: exact match
kind = 1: just greater than (or equal to) x
kind = -1: just smaller than (or equal to) x
It returns -1 if no match is found.
#include <iostream>
#include <algorithm>
using namespace std;
int g(int arr[], int l , int r, int x, int kind){
switch(kind){
case 0: // for exact match
if(arr[l] == x) return l;
else if(arr[r] == x) return r;
else return -1;
break;
case 1: // for just greater than x
if(arr[l]>=x) return l;
else if(arr[r]>=x) return r;
else return -1;
break;
case -1: // for just smaller than x
if(arr[r]<=x) return r;
else if(arr[l] <= x) return l;
else return -1;
break;
default:
cout << "please give \"kind\" as 0, -1, 1 only" << endl;
return -1;
}
}
int f(int arr[], int n, int l, int r, int x, int kind){
if(l==r) return g(arr, l, r, x, kind); // a single element must still match "kind"
if(l>r) return -1;
int m = l+(r-l)/2;
while(m>l){
if(arr[m] == x) return m;
if(arr[m] > x) r = m;
if(arr[m] < x) l = m;
m = l+(r-l)/2;
}
int pos = g(arr, l, r, x, kind);
return pos;
}
int main()
{
int arr[] = {1,2,3,5,8,14, 22, 44, 55};
int n = sizeof(arr)/sizeof(arr[0]);
sort(arr, arr+n);
int tcs;
cin >> tcs;
while(tcs--){
int l = 0, r = n-1, x = 88, kind = -1; // you can modify these values
cin >> x;
int pos = f(arr, n, l, r, x, kind);
// kind = 0: exact match, kind = 1: just greater than x, kind = -1: just smaller than x
cout <<"position"<< pos << " Value ";
if(pos >= 0) cout << arr[pos];
cout << endl;
}
return 0;
}
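The three kinds map onto the standard bisect functions. This is a sketch assuming a sorted Python list, with a hypothetical search_kind function; when x occurs more than once, it returns the leftmost match for kinds 0 and 1 and the rightmost for kind -1, whereas the C++ loop above may stop at any matching index.

```python
from bisect import bisect_left, bisect_right


def search_kind(a, x, kind):
    """kind 0: index of an element equal to x; kind 1: index of the smallest
    element >= x; kind -1: index of the largest element <= x. -1 if none."""
    if kind == 0:
        i = bisect_left(a, x)
        return i if i < len(a) and a[i] == x else -1
    if kind == 1:
        i = bisect_left(a, x)
        return i if i < len(a) else -1
    if kind == -1:
        return bisect_right(a, x) - 1   # -1 when every element is > x
    raise ValueError("kind must be 0, 1 or -1")
```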
