find an element in infinite sorted array - arrays

I got this as an interview question.
We are given an infinite array that is sorted, and from some position onward (we don't know which) it contains only the special symbol '$'. We need to find an element in that array.
My solution was to find the first occurrence of $ and then do a binary search on the part before it.
To find the first occurrence of $, I grow the search window at each step, as in (i, 2i).
The code I gave is:
#include<stdio.h>
int first(int *arr,int start,int end,int index)
{
int mid=(start+end)/2;
if((mid==start||arr[mid-1] != '$') && arr[mid]=='$')
return mid;
if(arr[mid]=='$')
return first(arr,start,mid-1,index);
else
{
if(arr[end] =='$')
return first(arr,mid+1,end,index);
else
return first(arr,end+1,(1<<index),index+1);
}
}
int binsearch(int *arr,int end ,int n)
{
int low,high,mid;
high=end-1;
low=0;
while(low<= high)
{
mid=(low+high)/2;
if(n<arr[mid])
high=mid-1;
else if (n >arr[mid])
low=mid+1;
else
return mid;
}
return -1;
}
int main()
{
int arr[20]={1,2,3,4,5,6,7,8,9,10,'$','$','$','$','$','$','$','$','$','$'};
int i =first(arr,0,2,2);
printf("first occurance of $ is %d\n",i);
int n=20;//n is required element to be found
if(i==0||arr[i-1]<n)
printf(" element %d not found",n);
else{
int p=binsearch(arr,i,n);
if(p != -1)
printf("element %d is found at index %d",n,p);
else
printf(" element %d not found",n);
}
return 0;
}
Is there a better way to solve this problem?
I would also like to know why, when looking for the first occurrence of $, we should grow the window in powers of 2. Why not 3, as in (i, 3i)?
Can someone please shed some light on the recurrence relation?

Seems like a fine way to do it to me. As a small optimization, you can stop your first routine when you reach any number bigger than the one you're searching for (not just $).
Growing the window by powers of 2 means you'll find the end in log_2(n) iterations. Growing by factors of 3 means you'll find it in log_3(n) iterations, which is smaller. But not asymptotically smaller, as O(log_2(n)) == O(log_3(n)). And your binary search is going to take log_2(n) steps anyway, so making the first part faster is not going to help your big-O running time.
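To make the two phases concrete, here is a minimal Python sketch of the doubling step followed by a binary search. It is only an illustration under a few assumptions: the "infinite" array is simulated by a finite list, reading past its end is treated like hitting '$', and the names SENTINEL, is_past and find are made up for this example. It also applies the optimization mentioned above by stopping at any value that is not smaller than the target, not just at '$'.

```python
SENTINEL = "$"  # assumed marker that fills the array after the last real value


def is_past(arr, i, target):
    """True if index i is out of range, holds the sentinel, or holds a value >= target."""
    return i >= len(arr) or arr[i] == SENTINEL or arr[i] >= target


def find(arr, target):
    """Return the index of target in the sorted prefix of arr, or -1."""
    # Phase 1: double the probe index until it lands at or past the target.
    hi = 1
    while not is_past(arr, hi, target):
        hi *= 2
    lo = hi // 2
    # Phase 2: binary search in arr[lo..hi] for the first "past" position.
    while lo < hi:
        mid = (lo + hi) // 2
        if is_past(arr, mid, target):
            hi = mid
        else:
            lo = mid + 1
    if lo < len(arr) and arr[lo] == target:
        return lo
    return -1


if __name__ == "__main__":
    data = list(range(1, 11)) + [SENTINEL] * 10
    print(find(data, 7))   # index of 7
    print(find(data, 20))  # -1, 20 is not present
```

Both phases take a logarithmic number of probes in the position of the target, which is why the base of the growth factor does not change the big-O bound.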

An iterative version of the efficient part of the first function would be:
private int searchNum(int[] arr, int num, int start, int end) {
    // end is unused; it is kept so the call shown below still compiles.
    // Grow the step in powers of two (1, 2, 4, ...) until the window
    // [start, start + step] brackets num, then binary-search that window.
    for (int index = 0; ; index++) {
        int step = 1 << index;
        if (start + step < arr.length) {
            if (arr[start] <= num && arr[start + step] >= num) {
                return bsearch(arr, num, start, start + step);
            }
            start = start + step;
        } else {
            // ran off the end: search whatever is left
            return bsearch(arr, num, start, arr.length - 1);
        }
    }
}
This won't return the first occurrence of $. Instead it looks for the number directly, because your version overlooks that the number itself can be found before the $ symbol is ever reached. The worst-case complexity is O(log n) and the best case is O(1).
After that, the bracketed window is passed to:
private int bsearch(int[] array, int search, int first, int last) {
int middle = (first + last) / 2;
while (first <= last) {
if (array[middle] < search)
first = middle + 1;
else if (array[middle] == search) {
System.out.println(search + " found at location "
+ (middle + 1) + ".");
return middle;
} else
last = middle - 1;
middle = (first + last) / 2;
}
if (first > last)
System.out.println(search + " is not present in the list.\n");
return -1;
}
calling function
if ((pos = searchNum(arr, num, 0, 2)) != -1) {
System.out.println("found # " + pos);
} else {
System.out.println("not found");
}

This is a Python solution.
import sys

arr = [3,5,7,9,10,90,100,130,140,160,170,171,172,173,174,175,176]
elm = 171
k = 0
while True:
    try:
        i = (1 << k) - 1  # same as 2**k - 1, e.g. 0, 1, 3, 7, 15
        if arr[i] == elm:
            print("found at " + str(i))
            sys.exit()
        elif arr[i] > elm:
            break
    except IndexError:
        break
    k = k + 1
begin = (1 << k) >> 1  # go back to the previous power of 2 (0 when k == 0)
end = 2**k - 1
# Binary search
while begin <= end:
    mid = begin + (end - begin) // 2
    try:
        if arr[mid] == elm:
            print("found at " + str(mid))
            sys.exit()
        elif arr[mid] > elm:
            end = mid - 1
        else:
            begin = mid + 1
    except IndexError:
        # mid can lie past the end of the array; in that case set end to mid-1
        end = mid - 1
print("Element not found")

Related

The longest sub-array with switching elements

An array is called "switching" if all the elements in even positions are equal to each other and all the elements in odd positions are equal to each other.
Example:
[2,4,2,4] is a switching array because the members in even positions (indexes 0 and 2) and odd positions (indexes 1 and 3) are equal.
If A = [3,7,3,7, 2, 1, 2], the switching sub-arrays are:
== > [3,7,3,7] and [2,1,2]
Therefore, the longest switching sub-array is [3,7,3,7] with length = 4.
As another example, if A = [1,5,6,0,1,0], the only switching sub-array longer than two elements is [0,1,0].
Another example: A= [7,-5,-5,-5,7,-1,7], the switching sub-arrays are [7,-1,7] and [-5,-5,-5].
Question:
Write a function that receives an array and find its longest switching sub-array.
I would like to know how you solve this problem and which strategies you use to solve this with a good time complexity?
I am assuming that the array is zero-indexed.
if arr.size <= 2
    return arr.size
else
    ans = 2
    temp_ans = 2 // We will update ans when temp_ans > ans;
    for i = 2; i < arr.size ; ++i
        if arr[i] == arr[i-2]
            temp_ans = temp_ans + 1;
        else
            temp_ans = 2;
        ans = max(temp_ans , ans);
    return ans;
I think this should work: a run stays switching exactly as long as each element equals the one two positions before it, so a single pass is enough.
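The pseudocode above can be written as a short runnable Python sketch. The function name longest_switching is just a label chosen for this example; the logic is the same single pass that compares each element with the one two positions earlier.

```python
def longest_switching(arr):
    """Length of the longest contiguous run where arr[i] == arr[i - 2]."""
    if len(arr) <= 2:
        return len(arr)
    best = run = 2  # any two adjacent elements form a switching run
    for i in range(2, len(arr)):
        # Extend the current run, or restart it at the last two elements.
        run = run + 1 if arr[i] == arr[i - 2] else 2
        best = max(best, run)
    return best


if __name__ == "__main__":
    print(longest_switching([3, 7, 3, 7, 2, 1, 2]))
    print(longest_switching([7, -5, -5, -5, 7, -1, 7]))
```

It runs in O(n) time with O(1) extra space.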
Example Code
private static int solve(int[] arr){
if(arr.length == 1) return 1;
int even = arr[0],odd = arr[1];
int start = 0,max_len = 0;
for(int i=2;i<arr.length;++i){
if(i%2 == 0 && arr[i] != even || i%2 == 1 && arr[i] != odd){
max_len = Math.max(max_len,i - start);
start = i-1;
if(i%2 == 0){
even = arr[i];
odd = arr[i-1];
}else{
even = arr[i-1];
odd = arr[i];
}
}
}
return Math.max(max_len,arr.length - start);
}
It's like a sliding window problem.
We keep track of the values at even and odd positions with two variables, even and odd.
Whenever a condition is not met, that is, an even index whose value differs from even or an odd index whose value differs from odd, we:
1. Record the length so far in max_len.
2. Reset start to i-1, which is needed in case all elements are equal.
3. Reset even and odd from arr[i] and arr[i-1], according to the parity of the current index i.
Demo: https://ideone.com/iUQti7
I didn't analyse the time complexity, just wrote a solution that uses recursion and it works (I think):
public class Main
{
public static int switching(int[] arr, int index, int end)
{
try
{
if (arr[index] == arr[index+2])
{
end = index+2;
return switching(arr, index+1, end);
}
} catch (Exception e) {}
return end;
}
public static void main(String[] args)
{
//int[] arr = {3,2,3,2,3};
//int[] arr = {3,2,3};
//int[] arr = {4,4,4};
int[] arr = {1,2,3,4,5,4,4,7,9,8,10};
int best = -1;
for (int i = 0; i < arr.length; i++)
best = Math.max(best, (switching(arr, i, 0) - i));
System.out.println(best+1); // It returns, in this example, 3
}
}
int switchingSubarray(vector<int> &arr, int n) {
if(n==1||n==2) return n;
int i=0;
int ans=2;
int j=2;
while(j<n)
{
if(arr[j]==arr[j-2]) j++;
else
{
ans=max(ans,j-i);
i=j-1;
j++;
}
}
ans=max(ans,j-i);
return ans;
}
This uses the sliding window technique, since the elements at j and j-2 need to be the same.
Try a dry run on paper and you will surely get it.
# A list is switching if all numbers in even positions are equal and all numbers
# in odd positions are equal; find the length of the longest switching contiguous sub-array
def check(t):
    even = []
    odd = []
    i = 0
    while i < len(t):
        if i % 2 == 0:
            even.append(t[i])
        else:
            odd.append(t[i])
        i += 1
    if len(set(even)) == 1 and len(set(odd)) == 1:
        return True
    else:
        return False
def solution(A):
    maxval = 0
    if len(A) == 1:
        return 1
    for i in range(0, len(A)):
        for j in range(0, len(A)):
            if check(A[i:j+1]) == True:
                val = len(A[i:j+1])
                print(A[i:j+1])
                if val > maxval:
                    maxval = val
    return maxval
A = [3,2,3,2,3]
A = [7,4,-2,4,-2,-9]
A=[4]
A = [7,-5,-5,-5,7,-1,7]
print(solution(A))

Search of an element on a unsorted array recursively

This is an exercise that I took from an exam. It asks to write a function that receives an unsorted array v[] and a number X and the function will return 1 if X is present in v[] or 0 if X is not present in v[]. The function must be recursive and must work in this manner:
1. Compares X with the element in the middle of v[];
2. The function calls itself (recursion!!) on upper half and on the lower half of v[];
So I've written this function:
int occ(int *p,int dim,int X){
int pivot,a,b;
pivot=(dim)/2;
if(dim==0) //end of array
return 0;
if(*(p+pivot)==X) //verify if the element in the middle is X
return 1;
a=occ(p,pivot,X); //call on lower half
b=occ(p+pivot,dim-pivot,X); //call on upper half
if(a+b>=1) //if X is found return 1 else 0
return 1;
else{
return 0;
}
}
I tried to simulate it on a sheet of paper and it seems to be correct (even though I'm not sure), then I wrote it on ideone and the program won't run!
Here is the link: https://ideone.com/ZwwpAW
Is my code actually wrong (probably!) or is it a problem related to ideone. Can someone help me? Thank you in advance!!!
The problem is with b=occ(p+pivot,dim-pivot,X); when pivot is 0, i.e. when dim is 1.
The next function call then becomes occ(p,1,X);, which again leads to the call occ(p,1,X); and so on in an endless recursion.
It can be fixed by adding a condition to the call, as shown in the code below.
int occ(int *p,int dim,int X){
int pivot,a=0,b=0;
pivot=(dim)/2;
if(dim==0){
return 0;
}
if(*(p+pivot)==X)
return 1;
if (pivot != 0)
{
a=occ(p,pivot,X);
b=occ(p+pivot,dim-pivot,X);
}
if(a+b>=1)
return 1;
else{
return 0;
}
}
The implementation causes a stack overflow, as the recursion does not terminate when the input contains only one element. This can be fixed as follows.
int occ(int *p, int dim, int X)
{
int pivot, a, b;
pivot = (dim) / 2;
if (dim == 0)
{
return 0;
}
if (*(p + pivot) == X)
{
return 1;
}
if (dim == 1)
{
if (*(p + pivot) == X)
{
return 1;
}
else
{
return 0;
}
}
a = occ(p, pivot, X);
b = occ(p + pivot, dim - pivot, X);
if (a + b >= 1)
{
return 1;
}
else
{
return 0;
}
}
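It's enough to add one check after the comparison with the middle element to avoid the endless loop with occ(p,1,X):
if(*(p+pivot)==X) //verify if the element in the middle is X
return 1;
if(dim==1) //a single element has just been compared, nothing left to split
return 0;
Note that testing pivot == 0 in place of dim == 0 would not work: it returns before a one-element range is ever compared with X, so the dim == 0 test has to stay.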
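For comparison, here is a small Python sketch of the same divide-and-check idea. It is not the exam's reference solution; the name occurs and the lo/hi index parameters are choices made for this example. The middle element is left out of both halves, so every recursive call works on a strictly smaller range and the one-element case needs no special handling.

```python
def occurs(values, x, lo=0, hi=None):
    """Return 1 if x is in values[lo:hi], else 0 (values need not be sorted)."""
    if hi is None:
        hi = len(values)
    if lo >= hi:  # empty range
        return 0
    mid = (lo + hi) // 2
    if values[mid] == x:
        return 1
    # Lower half is values[lo:mid], upper half is values[mid + 1:hi].
    if occurs(values, x, lo, mid) or occurs(values, x, mid + 1, hi):
        return 1
    return 0


if __name__ == "__main__":
    print(occurs([4, 9, 1, 7], 7))
    print(occurs([4, 9, 1, 7], 5))
```

Because the array is unsorted, both halves may have to be visited, so the running time is O(n) in the worst case.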

Infinite recursion: binary search & asserts

I'm writing an implementation of binary search in C and I'm getting infinite recursion for no apparent (to me) reason. Here's my code:
/*Orchestrate program*/
int main(int argc, char* argv){
int array[] = {1,2,3,3,3,6,7,8,9,9,20,21,22};
int length = 13;
int key = 23;
binary_search(key, array, 0, length - 1);
return 0;
}
int binary_search(int key, int array[], int first_index, int last_index){
int middle;
middle = (first_index + last_index)/2;
if (first_index == last_index){
if (array[middle] == key){
printf("%d has been found at position %d\n", key, middle+1);
}
printf("item not found");
}
else if (key > array[middle]){
binary_search(key, array, middle, last_index);
}
else if (key < array[middle]){
binary_search(key, array, first_index, middle);
}
}
Based on the value of my key in main, I guess the problem lies in the first else if, but I'm not sure why. If I were to remove the first_index == last_index line, the algorithm works fine but only when the item is in the array. If the item isn't in the array, I naturally get infinite recursion.
Also, I tried to fix this problem by removing the first_index == last_index line and placing a return -1; at the end of the function, but I get the same problem that I am getting now.
EDIT:
Putting together pieces of advice I received from a few different users, I came to the following solution (fixed off by one errors and un-nested decisions):
void binary_search(int key, int array[], int first_index, int last_index){
int middle;
middle = (first_index + last_index)/2;
if (array[middle] == key){
printf("%d has been found at position %d\n", key, middle+1);
}
if (first_index == last_index){
printf("item not found");
}
else if (key > array[middle]){
binary_search(key, array, middle + 1, last_index);
}
else if (key < array[middle]){
binary_search(key, array, first_index, middle - 1);
}
}
I have a follow-up question: Could there have been a way to use asserts to assist me in finding this solution myself? (I'm just learning about asserts so I'm wondering where I can apply them)
You search ever smaller ranges of a sorted array. The boundaries of your range are inclusive.
The base case of your recursion is: If the range is empty, the key is not found. Or, in code:
if (first_index > last_index){
printf("Not found\n");
}
You should calculate and compare the middle element of your range only after you have established that the range is not empty. In that case, you have three outcomes:
The middle element is the key: bingo!
The middle element is smaller than the key: Search the right half of the array and exclude the middle element, which we have already checked.
The middle element is larger than the key: Ditto, but with the left half.
Putting this all together:
void binary_search(int key, int array[], int first_index, int last_index)
{
    if (first_index > last_index){
        printf("Not found\n");
    } else {
        int middle = (first_index + last_index) / 2;
        if (array[middle] == key){
            // found: stop here instead of searching on
            printf("%d at index %d\n", key, middle);
        } else if (key > array[middle]){
            binary_search(key, array, middle + 1, last_index);
        } else {
            binary_search(key, array, first_index, middle - 1);
        }
    }
}
This function still has two things that nag me:
A function that prints the index is of little practical use. The printing should be done by the client code, i.e. by the code that calls the function. Return the found index or a special value for "not found" instead.
The range has inclusive bounds. That's not very C-like. In C, a range is usually described by an inclusive lower and an exclusive upper bound. That's how array indices and for loops work. Following this convention means that your client code doesn't have to do the awkward length - 1 calculation.
So here's a variant that returns the index or -1 if the key is not in the array:
int binary_search1(int key, int array[], int first_index, int last_index)
{
if (first_index < last_index){
int middle = (first_index + last_index) / 2;
if (array[middle] == key) return middle;
if (key > array[middle]){
return binary_search1(key, array, middle + 1, last_index);
} else {
return binary_search1(key, array, first_index, middle);
}
}
return -1;
}
and test it with:
int main()
{
int arr[6] = {3, 4, 6, 8, 12, 13};
int i;
for (i = 0; i < 20; i++) {
int ix = binary_search1(i, arr, 0, 6);
if (ix < 0) {
printf("%d not found.\n", i);
} else {
printf("%d at index %d.\n", i, ix);
}
}
return 0;
}
Note that your original array has duplicate entries. This is okay, but you will get the index of any of the duplicate values, not necessarily the first one.
Your function should look like this:
void binary_search(int key, int array[], int first_index, int last_index){
    int middle;
    if (first_index > last_index) {
        // empty range: middle + 1 or middle - 1 stepped past the other bound
        printf("item not found");
        return;
    }
    middle = (first_index + last_index)/2;
    if (array[middle] == key){
        printf("%d has been found at position %d\n", key, middle+1);
    }
    else if (first_index == last_index) {
        printf("item not found");
    }
    else if (key > array[middle]){
        binary_search(key, array, middle + 1, last_index);
    }
    else {
        //assert (key < array[middle]); // feel free to uncomment this one and include the assert library if you want
        binary_search(key, array, first_index, middle - 1);
    }
}
In other words, increment or decrement middle appropriately in the recursive call.
This is important, because, for example, when you reduce to size 2 for your search and middle is your first element, then effectively you are not changing the dimension of the array in the recursive calls.
I also changed your function to void since you are not returning anything.
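On the follow-up question about asserts: one way they could have helped is by stating, as checks, the two things the recursion silently relies on, namely that middle lies inside the range and that every recursive call gets a strictly smaller range. The Python sketch below is only an illustration of that idea (the variable names new_first and new_last are invented here); the same two checks can be written with assert from <assert.h> in the C version.

```python
def binary_search(key, array, first, last):
    """Search array[first..last] (inclusive bounds); return an index or -1."""
    if first > last:
        return -1
    middle = (first + last) // 2
    # Precondition the code relies on implicitly.
    assert first <= middle <= last, "middle must lie inside the range"
    if array[middle] == key:
        return middle
    if key > array[middle]:
        new_first, new_last = middle + 1, last
    else:
        new_first, new_last = first, middle - 1
    # Progress check: the next range must be strictly smaller than this one.
    # Recursing on (middle, last) or (first, middle) instead would trip this
    # assert on a two-element range rather than recursing forever.
    assert new_last - new_first < last - first, "range did not shrink"
    return binary_search(key, array, new_first, new_last)


if __name__ == "__main__":
    data = [1, 2, 3, 3, 3, 6, 7, 8, 9, 9, 20, 21, 22]
    print(binary_search(23, data, 0, len(data) - 1))
    print(binary_search(20, data, 0, len(data) - 1))
```

With the original calls, the second assert fails as soon as the range stops shrinking, which points directly at the off-by-one in the recursive arguments.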

Program works when compiled with clang, but not with gcc on Windows

I have this search function that works when I compile on Linux using clang, but on Windows using MinGW gcc I do not get the right answer. The code includes an array that clearly contains the value I'm looking for, so the output should be "Found it!". Does anyone know what the issue might be on Windows?
#include <stdio.h>
#include <stdbool.h>
bool re_search(int value, int values[], int first, int last);
int main(void)
{
int value = 12;
int values[] = {2,4,5,12,23,34};
int n = 6;
if (re_search(value, values, 0, n-1))
{
printf("Found it!\n");
}
else
{
printf("Did not find it\n");
}
return 0;
}
bool re_search(int value, int values[], int first, int last)
{
last = last-1;
int middle = (first+last)/2;
while (first <= last)
{
if (value == values[middle])
{
return true;
}
else if (value > values[middle])
{
first = middle + 1;
middle = (first + last) / 2;
return re_search(value, &values[middle], first, last);
}
else
{
last = middle - 1;
middle = (first + last) / 2;
return re_search(value, &values[first], first, last);
}
}
return false;
}
Your recursive call return re_search(value, &values[middle], first, last); is passing in both an array which starts at the midpoint, and a new value of first which counts from the whole array's start. You want to do one or the other; not both.
That is, you first call with:
values == {2,4,5,12,23,34}
first == 0
last == 5
The function first decrements last to 4, so in the first iteration you try middle == 2, and values[middle] is 5, which is less than 12. You then recurse with
values == {12,23,34}
first == 3
last == 4
And - oh dear! - even values[first] is now out of range. Chances are, on Linux you got (un)lucky and hit the value you were searching for past the end of the array.
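To illustrate the "one or the other" point, here is a hedged Python sketch of the two consistent conventions. The function names search_absolute and search_relative are made up for this example. The first always receives the whole list together with indices into it; the second receives only the remaining slice, so its indices implicitly restart at 0.

```python
def search_absolute(value, values, first, last):
    """Whole list plus absolute, inclusive indices."""
    if first > last:
        return False
    middle = (first + last) // 2
    if values[middle] == value:
        return True
    if value > values[middle]:
        return search_absolute(value, values, middle + 1, last)
    return search_absolute(value, values, first, middle - 1)


def search_relative(value, values):
    """Only the remaining slice is passed on; no index arguments at all."""
    if not values:
        return False
    middle = len(values) // 2
    if values[middle] == value:
        return True
    if value > values[middle]:
        return search_relative(value, values[middle + 1:])
    return search_relative(value, values[:middle])


if __name__ == "__main__":
    data = [2, 4, 5, 12, 23, 34]
    print(search_absolute(12, data, 0, len(data) - 1))
    print(search_relative(12, data))
```

Mixing the two, as the original code does with &values[middle] plus an absolute first, shifts the indices twice.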
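It does not matter whether you use GCC or Windows; the search logic itself is wrong. Here are a corrected recursive version and an iterative one: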
bool re_search(int value, int values[], int first, int last){
if (first <= last){
int middle = (first+last)/2;
if (value == values[middle]){
return true;
} else if (value > values[middle]){
return re_search(value, values, middle + 1, last);
} else {
return re_search(value, values, first, middle - 1);
}
}
return false;
}
bool re_search(int value, int values[], int first, int last){
while (first <= last){
int middle = (first+last)/2;
if (value == values[middle]){
return true;
} else if (value > values[middle]){
first = middle + 1;
} else {
last = middle - 1;
}
}
return false;
}

Find length of smallest window that contains all the characters of a string in another string

I was interviewed recently and didn't do well because I got stuck on the following question.
Suppose a sequence is given: A D C B D A B C D A C D
and the search sequence is: A C D
The task was to find the start and end index of the shortest window in the given sequence that contains all the characters of the search sequence, preserving their order.
Output, assuming indexes start from 1:
start index 10
end index 12
Explanation:
1. The start/end indexes are not 1/3, because although that window contains the characters, their order is not maintained.
2. The start/end indexes are not 1/5, because although that window contains the characters in order, its length is not optimal.
3. The start/end indexes are not 6/9, because although that window contains the characters in order, its length is not optimal.
Please see "How to find smallest substring which contains all characters from a given string?".
That question is different, however, since it does not require the order to be maintained. I'm still struggling to keep track of the indexes. Any help would be appreciated. Thanks.
I tried to write some simple C code to solve the problem:
Update:
I wrote a search function that looks for the required characters in the correct order, returning the length of the window and storing the window's start point in int * startAt. The function processes the sub-sequence of the given hay from the specified start point int start to its end.
The rest of the algorithm is located in main, where all possible sub-sequences are tested with a small optimisation: we start looking for the next window right after the start point of the previous one, so we skip some unnecessary turns. During the process we keep track of the best solution so far.
Complexity is O(n^2)
Update2:
Unnecessary dependencies have been removed, and unnecessary subsequent calls to strlen(...) have been replaced by size parameters passed to search(...)
#include <stdio.h>
// search for single occurrence
int search(const char hay[], int haySize, const char needle[], int needleSize, int start, int * startAt)
{
int i, charFound = 0;
// search from start to end
for (i = start; i < haySize; i++)
{
// found a character ?
if (hay[i] == needle[charFound])
{
// is it the first one?
if (charFound == 0)
*startAt = i; // store starting position
charFound++; // and go to next one
}
// are we done?
if (charFound == needleSize)
return i - *startAt + 1; // success
}
return -1; // failure
}
int main(int argc, char **argv)
{
char hay[] = "ADCBDABCDACD";
char needle[] = "ACD";
int resultStartAt, resultLength = -1, i, haySize = sizeof(hay) - 1, needleSize = sizeof(needle) - 1;
// search all possible occurrences
for (i = 0; i <= haySize - needleSize; i++) // <= so that a window at the very end of hay is also tried
{
int startAt, length;
length = search(hay, haySize, needle, needleSize, i, &startAt);
// found something?
if (length != -1)
{
// check if it's the first result, or a one better than before
if ((resultLength == -1) || (resultLength > length))
{
resultLength = length;
resultStartAt = startAt;
}
// skip unnecessary steps in the next turn
i = startAt;
}
}
printf("start at: %d, length: %d\n", resultStartAt, resultLength);
return 0;
}
Start from the beginning of the string.
If you encounter an A, then mark the position and push it on a stack. After that, keep checking the characters sequentially until
1. If you encounter an A, update the A's position to current value.
2. If you encounter a C, push it onto the stack.
After you encounter a C, again keep checking the characters sequentially until,
1. If you encounter a D, erase the stack containing A and C and mark the score from A to D for this sub-sequence.
2. If you encounter an A, then start another Stack and mark this position as well.
2a. If now you encounter a C, then erase the earlier stacks and keep the most recent stack.
2b. If you encounter a D, then erase the older stack and mark the score and check if it is less than the current best score.
Keep doing this till you reach the end of the string.
The pseudo code can be something like:
Initialize stack = empty;
Initialize bestLength = mainString.size() + 1; // a large value for the subsequence.
Initialize currentLength = 0;
for ( int i = 0; i < mainString.size(); i++ ) {
if ( stack is empty ) {
if ( mainString[i] == 'A' ) {
start a new stack and push A on it.
mark the startPosition for this stack as i.
}
continue;
}
For each of the stacks ( there can be at most two stacks prevailing,
one of size 1 and other of size 2 ) {
if ( stack size == 1 ) // only A in it {
if ( mainString[i] == 'A' ) {
update the startPosition for this stack as i.
}
if ( mainString[i] == 'C' ) {
push C on to this stack.
}
} else if ( stack size == 2 ) // A & C in it {
if ( mainString[i] == 'C' ) {
if there is a stack with size 1, then delete this stack;// the other one dominates this stack.
}
if ( mainString[i] == 'D' ) {
mark the score from startPosition till i and update bestLength accordingly.
delete this stack.
}
}
}
}
I modified my previous suggestion to use a single queue; I now believe this algorithm runs in O(N*m) time:
FindSequence(char[] sequenceList)
{
    queue startSeqQueue;
    int i = 0, k;
    int minSequenceLength = sequenceList.length + 1;
    int startIdx = -1, endIdx = -1;
    // collect the positions of every 'A' that can still start a window
    for (i = 0; i < sequenceList.length - 2; i++)
    {
        if (sequenceList[i] == 'A')
        {
            startSeqQueue.enqueue(i);
        }
    }
    while (!startSeqQueue.isEmpty())
    {
        i = startSeqQueue.dequeue();
        k = i + 1;
        while (k < sequenceList.length && sequenceList[k] != 'C')
        {
            // a later 'A' before the first 'C' gives a shorter window
            if (sequenceList[k] == 'A' && !startSeqQueue.isEmpty()) i = startSeqQueue.dequeue();
            k++;
        }
        while (k < sequenceList.length && sequenceList[k] != 'D')
            k++;
        if (k < sequenceList.length && minSequenceLength > k - i + 1)
        {
            startIdx = i;
            endIdx = k;
            minSequenceLength = k - i + 1;
        }
    }
    return (startIdx, endIdx);
}
My previous (O(1) memory) suggestion:
FindSequence(char[] sequenceList)
{
    int i = 0, k;
    int minSequenceLength = sequenceList.length + 1;
    int startIdx = -1, endIdx = -1;
    for (i = 0; i < sequenceList.length - 2; i++)
    {
        if (sequenceList[i] != 'A')
            continue;
        k = i + 1;
        while (k < sequenceList.length && sequenceList[k] != 'C')
            k++;
        while (k < sequenceList.length && sequenceList[k] != 'D')
            k++;
        if (k < sequenceList.length && minSequenceLength > k - i + 1)
        {
            startIdx = i;
            endIdx = k;
            minSequenceLength = k - i + 1;
        }
    }
    return (startIdx, endIdx);
}
Here's my version. It keeps track of possible candidates for an optimum solution. For each character in the hay, it checks whether this character is the next one expected by each candidate. It then selects the shortest candidate. Quite straightforward.
class ShortestSequenceFinder
{
public class Solution
{
public int StartIndex;
public int Length;
}
private class Candidate
{
public int StartIndex;
public int SearchIndex;
}
public Solution Execute(string hay, string needle)
{
var candidates = new List<Candidate>();
var result = new Solution() { Length = hay.Length + 1 };
for (int i = 0; i < hay.Length; i++)
{
char c = hay[i];
for (int j = candidates.Count - 1; j >= 0; j--)
{
if (c == needle[candidates[j].SearchIndex])
{
if (candidates[j].SearchIndex == needle.Length - 1)
{
int candidateLength = i - candidates[j].StartIndex + 1; // inclusive window length
if (candidateLength < result.Length)
{
result.Length = candidateLength;
result.StartIndex = candidates[j].StartIndex;
}
candidates.RemoveAt(j);
}
else
{
candidates[j].SearchIndex += 1;
}
}
}
if (c == needle[0])
candidates.Add(new Candidate { SearchIndex = 1, StartIndex = i });
}
return result;
}
}
It runs in O(n*m).
Here is my solution in Python. It returns the indexes assuming 0-indexed sequences. Therefore, for the given example it returns (9, 11) instead of (10, 12). Obviously it's easy to mutate this to return (10, 12) if you wish.
def solution(s, ss):
    S, E = [], []
    for i in range(len(s)):
        if s[i] == ss[0]:
            S.append(i)
        if s[i] == ss[-1]:
            E.append(i)
    # candidate (start, end) windows, shortest first
    candidates = sorted([(start, end) for start in S for end in E
                         if start <= end and end - start >= len(ss) - 1],
                        key=lambda c: c[1] - c[0])
    for cand in candidates:
        i, j = cand[0], 0
        while i <= cand[-1]:
            if s[i] == ss[j]:
                j += 1
            i += 1
        if j == len(ss):
            return cand
Usage:
>>> from so import solution
>>> s = 'ADCBDABCDACD'
>>> solution(s, 'ACD')
(9, 11)
>>> solution(s, 'ADC')
(0, 2)
>>> solution(s, 'DCCD')
(1, 8)
>>> solution(s, s)
(0, 11)
>>> s = 'ABC'
>>> solution(s, 'B')
(1, 1)
>>> print(solution(s, 'gibberish'))
None
I think the time complexity is O(p log(p)), where p is the number of pairs of indexes in the sequence that refer to search_sequence[0] and search_sequence[-1] such that the index for search_sequence[0] is less than the index for search_sequence[-1], because it sorts these p pairings using an O(n log n) algorithm. But then again, my substring iteration at the end could totally overshadow that sorting step. I'm not really sure.
It probably has a worst-case time complexity which is bounded by O(n*m) where n is the length of the sequence and m is the length of the search sequence, but at the moment I cannot think of an example worst-case.
Here is my O(m*n) algorithm in Java:
class ShortestWindowAlgorithm {
Multimap<Character, Integer> charToNeedleIdx; // Character -> indexes in needle, from rightmost to leftmost | Multimap is a class from Guava
int[] prefixesIdx; // prefixesIdx[i] -- rightmost index in the hay window that contains the shortest found prefix of needle[0..i]
int[] prefixesLengths; // prefixesLengths[i] -- shortest window containing needle[0..i]
public int shortestWindow(String hay, String needle) {
init(needle);
for (int i = 0; i < hay.length(); i++) {
for (int needleIdx : charToNeedleIdx.get(hay.charAt(i))) {
if (firstTimeAchievedPrefix(needleIdx) || foundShorterPrefix(needleIdx, i)) {
prefixesIdx[needleIdx] = i;
prefixesLengths[needleIdx] = getPrefixNewLength(needleIdx, i);
forgetOldPrefixes(needleIdx);
}
}
}
return prefixesLengths[prefixesLengths.length - 1];
}
private void init(String needle) {
charToNeedleIdx = ArrayListMultimap.create();
prefixesIdx = new int[needle.length()];
prefixesLengths = new int[needle.length()];
for (int i = needle.length() - 1; i >= 0; i--) {
charToNeedleIdx.put(needle.charAt(i), i);
prefixesIdx[i] = -1;
prefixesLengths[i] = -1;
}
}
private boolean firstTimeAchievedPrefix(int needleIdx) {
int shortestPrefixSoFar = prefixesLengths[needleIdx];
return shortestPrefixSoFar == -1 && (needleIdx == 0 || prefixesLengths[needleIdx - 1] != -1);
}
private boolean foundShorterPrefix(int needleIdx, int hayIdx) {
int shortestPrefixSoFar = prefixesLengths[needleIdx];
int newLength = getPrefixNewLength(needleIdx, hayIdx);
return newLength <= shortestPrefixSoFar;
}
private int getPrefixNewLength(int needleIdx, int hayIdx) {
return needleIdx == 0 ? 1 : (prefixesLengths[needleIdx - 1] + (hayIdx - prefixesIdx[needleIdx - 1]));
}
private void forgetOldPrefixes(int needleIdx) {
if (needleIdx > 0) {
prefixesLengths[needleIdx - 1] = -1;
prefixesIdx[needleIdx - 1] = -1;
}
}
}
It works on every input and also can handle repeated characters etc.
Here are some examples:
public class StackOverflow {
public static void main(String[] args) {
ShortestWindowAlgorithm algorithm = new ShortestWindowAlgorithm();
System.out.println(algorithm.shortestWindow("AXCXXCAXCXAXCXCXAXAXCXCXDXDXDXAXCXDXAXAXCD", "AACD")); // 6
System.out.println(algorithm.shortestWindow("ADCBDABCDACD", "ACD")); // 3
System.out.println(algorithm.shortestWindow("ADCBDABCD", "ACD")); // 4
}
}
I haven't read every answer here, but I don't think anyone has noticed that this is just a restricted version of local pairwise sequence alignment, in which we are only allowed to insert characters (and not delete or substitute them). As such it will be solved by a simplification of the Smith-Waterman algorithm that considers only 2 cases per vertex (arriving at the vertex either by matching a character exactly, or by inserting a character) rather than 3 cases. This algorithm is O(n^2).
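As a rough illustration of the dynamic-programming view, here is a Python sketch that keeps one value per prefix of the search sequence instead of a full alignment matrix. It is not Smith-Waterman itself, only a simplified single-row variant written for this example, and the names shortest_window and start are assumptions. start[j] holds the latest start index of a window seen so far that contains needle[:j + 1] in order.

```python
def shortest_window(hay, needle):
    """Return (start, end), 0-based inclusive, of the shortest window of hay
    that contains needle as a subsequence, or None if there is none."""
    m = len(needle)
    if m == 0:
        return None
    start = [-1] * m  # -1 means "this prefix has not been matched yet"
    best = None
    for i, ch in enumerate(hay):
        # Walk j downwards so start[j - 1] still describes characters before i.
        for j in range(m - 1, -1, -1):
            if ch != needle[j]:
                continue
            start[j] = i if j == 0 else start[j - 1]
        if ch == needle[-1] and start[m - 1] != -1:
            cand = (start[m - 1], i)
            if best is None or cand[1] - cand[0] < best[1] - best[0]:
                best = cand
    return best


if __name__ == "__main__":
    print(shortest_window("ADCBDABCDACD", "ACD"))
```

The two nested loops give O(n*m) time, and the extra space is O(m).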
Here's my solution. It follows one of the pattern matching solutions. Please comment/correct me if I'm wrong.
Given the input string as in the question
A D C B D A B C D A C D. Let's first compute the indices where A occurs. Assuming a zero based index this should be [0,5,9].
Now the pseudo code is as follows.
Store the indices of A in a list, say *orders*. // orders=[0,5,9]
globalMinStart=-1, globalMinEnd=-1;
for (index: orders)
{
    int localMinStart=index, localMinEnd=-1;
    Stack chars=new Stack(); // to store the characters
    int i=index+1;
    while(i< length of input string)
    {
        if(str.charAt(i)=='C') // we've already seen A, so we look for C
            chars.push(str.charAt(i));
        else if(str.charAt(i)=='D' and chars is not empty) // chars only ever holds C's
        {
            localMinEnd=i; // we have a match! so record where it ends
            break;
        }
        else if(str.charAt(i)=='A' and chars is empty) // seen the next A before any C
            break;
        i++;
    }
    if (localMinEnd!=-1 and (globalMinEnd==-1 or localMinEnd-localMinStart<globalMinEnd-globalMinStart))
    {
        globalMinEnd=localMinEnd;
        globalMinStart=localMinStart;
    }
}
return [globalMinStart,globalMinEnd]
P.S.: this is pseudocode and a rough idea. I'd be happy to correct it if there's something wrong.
AFAICT the time complexity is O(n*k), where k is the number of A's, since each scan can run to the end of the string. Space complexity is O(n) for the list of indices.

Resources