Counting and removing duplicates in an array in C

Let's say I have an array with [2,4,6,7, 7, 4,4]
I want a program that can iterate through, and then print out something like this:
Value: Count:
2 1
4 3
6 1
7 2
I don't want it to print, for example, 4 three times.
What I have so far:
for (int i = 0; i < numberOfInts; i++)
{
dub[i] = 0;
for (int y = 0; y < numberOfInts; y++)
{
if (enarray[i] == enarray[y])
{
dub[i]++;
}
}
}
So basically I check each element in the array against all the elements, and for every duplicate I add one to the index in the new array dub[].
So if I ran this code with the example array above and then printed dub[], I'd get something like this:
1,3,1,2,2,3,3. These numbers are pretty confusing, because I don't really know which values they belong to, especially once I randomize the numbers in the array. And then I have to remove numbers so I only have one of each. Does anyone have a better solution?

You can iterate through the array and, for each element, count how many times it is repeated (the counting loop only checks values ahead of the current one, which saves processing time). This lets you accomplish what you need without creating any extra buffer array or structure.
The flag 'bl' prevents repeated printing.
#include <stdio.h>
int main() {
int arr[] = { 2, 4, 6, 7, 7, 4, 4 };
int size = (sizeof(arr) / sizeof(int));
printf("Value:\tCount\n");
for (int i = 0; i < size; i++) {
int count = 0, bl = 1; //or 'true' for print
//check elements ahead and increment count if repeated value is found
for (int j = i; j < size; j++) {
if (arr[i] == arr[j]) {
count++;
}
}
//check if it has been printed already
for (int j = i-1; j >= 0; j--) {
if (arr[i] == arr[j]) {
bl = 0; //print 'false'
}
}
if (bl) { printf("%d\t\t%d\n", arr[i], count); }
}
return 0;
}

Given the char array only contains '0' to '9', you may utilize a trivial lookup table like this:
#include <stdio.h>
typedef struct
{
char c;
int num;
} TSet;
TSet my_set[] =
{
{ '0', 0 },
{ '1', 0 },
{ '2', 0 },
{ '3', 0 },
{ '4', 0 },
{ '5', 0 },
{ '6', 0 },
{ '7', 0 },
{ '8', 0 },
{ '9', 0 },
};
int main()
{
char a[] = {'2','4','6','7','7', '4','4'};
int i;
for( i = 0; i < sizeof(a) / sizeof(char); i++ )
{
my_set[ a[i] - '0' ].num++;
}
printf( "%-10s%-10s\n", "Value:", "Count:" );
for( i = 0; i < sizeof(my_set) / sizeof(TSet); i++ )
{
if( my_set[i].num != 0 )
{
printf( "%-10c%-10d\n", my_set[i].c, my_set[i].num );
}
}
}
Output:
Value: Count:
2 1
4 3
6 1
7 2

I don't understand the complexity here. I think there are two approaches that are performant and easy to implement:
1. Counting sort
   - requires an int array with one slot per possible value (biggest element + 1), so it only suits non-negative values with a modest maximum
   - overall complexity O(n + m), where m is the biggest element in your array
2. qsort and enumeration
   - qsort typically runs in O(n * log(n)) and gives you a sorted array
   - once the array is sorted, you can simply iterate over it and count
   - overall complexity O(n * log(n))
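A minimal sketch of the counting-table idea, written in Python for brevity (the same loops port directly to C). The function name count_with_table is made up for illustration, and the sketch assumes non-negative integers with a reasonably small maximum.

```python
def count_with_table(values):
    """Tally non-negative ints with a table indexed by value (counting-sort style)."""
    if not values:
        return []
    if min(values) < 0:
        # Assumption of this sketch: a negative value cannot index the table.
        raise ValueError("this sketch assumes non-negative integers")
    table = [0] * (max(values) + 1)  # one slot per possible value 0..max
    for v in values:
        table[v] += 1
    # Walking the table in index order yields the values sorted and de-duplicated.
    return [(v, c) for v, c in enumerate(table) if c > 0]


if __name__ == "__main__":
    print("Value:\tCount:")
    for value, count in count_with_table([2, 4, 6, 7, 7, 4, 4]):
        print(f"{value}\t{count}")
```

The table costs O(m) memory, which is why this only pays off when the largest value is small compared with the input size.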

1. Sort the array, typically with the qsort() function.
2. Iterate over the elements, counting runs of equal elements; when the next different element is detected, print the count of the previous one (and print the final run after the loop ends).
This works for any number of different elements. Also, no second array is needed.
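The sort-then-scan steps can be sketched as follows. This is an illustrative Python version, not the answerer's code; count_sorted_runs is a hypothetical name, and sorted() stands in for qsort().

```python
def count_sorted_runs(values):
    """Sort, then count each run of equal neighbours."""
    runs = []  # each entry is [value, count]
    for v in sorted(values):
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1      # same value as the previous element: extend the run
        else:
            runs.append([v, 1])   # a different value starts a new run
    return [(v, c) for v, c in runs]


if __name__ == "__main__":
    for value, count in count_sorted_runs([2, 4, 6, 7, 7, 4, 4]):
        print(value, count)
```

Unlike the counting table, this handles negative or very large values, at the cost of the O(n log n) sort.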

You have the general idea. In addition to your input array, I would suggest three more arrays:
a used array that keeps track of which entries in the input have already been counted.
a value array that keeps track of the distinct numbers in the input array.
a count array that keeps track of how many times a number appears.
For example, after processing the 2 and the 4 in the input array, the array contents would be
input[] = { 2,4,6,7,7,4,4 };
used[] = { 1,1,0,0,0,1,1 }; // all of the 2's and 4's have been used
value[] = { 2,4 }; // unique numbers found so far are 2 and 4
count[] = { 1,3 }; // one '2' and three '4's
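One way the three helper arrays might fit together, sketched in Python (the function name is an assumption; the list names used, value and count mirror the arrays described above). It keeps the values in first-seen order rather than sorted order.

```python
def count_in_first_seen_order(values):
    """Count duplicates using 'used', 'value' and 'count' helper lists."""
    n = len(values)
    used = [False] * n      # used[i] is True once values[i] has been counted
    value, count = [], []   # distinct numbers and how often each appears
    for i in range(n):
        if used[i]:
            continue        # already counted as a duplicate of an earlier entry
        value.append(values[i])
        count.append(0)
        for j in range(i, n):
            if values[j] == values[i]:
                used[j] = True
                count[-1] += 1
    return value, count


if __name__ == "__main__":
    print(count_in_first_seen_order([2, 4, 6, 7, 7, 4, 4]))
```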

Put a print statement in the outer for loop to print each value and its repetition count:
for (int i = 0; i < numberOfInts; i++)
{
dub[i] = 0;
for (int y = 0; y < numberOfInts; y++)
{
if (enarray[i] == enarray[y])
{
dub[i]++;
}
}
printf("%d %d\n", enarray[i], dub[i]);
}

What you're asking for is strange. Normally, I'd create a struct with 2 members, like 'number' and 'count'. But let's try exactly what you're asking for (a one-dimensional array with each number followed by its count):
int
i,
numberOfInts = 7,
numberOfDubs = 0,
enarray[7] = {2,4,6,7,7,4,4},
dub[14]; // 2 * numberOfInts => maximum size of dub (if there are no duplicates)
// For every number on enarray
for(i = 0; i < numberOfInts; i++)
{
int jump = 0;
// Check if we have already counted it
// Only check even indexes: odd indexes hold the counters
for(int d = 0; d < numberOfDubs && !jump; d += 2)
{
if(dub[d] == enarray[i])
{
jump = 1;
}
}
// If not found, count it
if(!jump)
{
// Assign the new number
dub[numberOfDubs] = enarray[i];
dub[numberOfDubs + 1] = 1;
// We can begin from 'i + 1'
for(int y = i + 1; y < numberOfInts; y++)
{
if(enarray[i] == enarray[y])
{
dub[numberOfDubs + 1]++;
}
}
// Advance dub's index by 2: the number and its counter
numberOfDubs += 2;
}
}
// Show results
for(i = 0; i < numberOfDubs; i += 2)
{
printf("%d repeated %d time%s\n", dub[i], dub[i + 1], (dub[i + 1] == 1 ? "" : "s"));
}

Related

CUnit test: invalid read/write of size 8

An invalid read and write of size 8 is happening in modify_tab_size().
What am I doing wrong? I've tried almost everything and I don't understand it.
// Function being tested.
int erase_repeated(int *nb_words, char **words) {
for (int i = 0; i < *nb_words; ++i) {
if (words[i] != 0) {
for (int b = 0; b < *nb_words; ++b) {
if (strcmp(words[i], words[b]) == 0 && b != i)
modify_tab_size(&b, nb_words, words);
}
}
}
return *nb_words;
}
void modify_tab_size(int *b, int *nb_words_update, char **words) {
free(words[*b]);
for (int k = *b; k < *nb_words_update; k++) {
words[k] = words[k + 1]; // <-- read error
words[*nb_words_update + 1] = 0; // <-- write error
}
(*nb_words_update)--;
(*b)--;
}
The problem is that k+1 and *nb_words_update + 1 can walk off the array, and they do. Add printf("k:%d, k+1:%d, *nb_words_update + 1: %d\n", k, k+1, *nb_words_update + 1); into the loop to see.
k:1, k+1:2, *nb_words_update + 1: 4
k:2, k+1:3, *nb_words_update + 1: 4
You've only allocated three slots, so indexes 3 and 4 walk off the end of the array.
Since nb_words_update starts as the length of the array, words[*nb_words_update + 1] = 0; is always going to be too large. words[*nb_words_update] = 0; is also too large.
What you seem to be trying to do is delete an element from an array by shifting everything after it to the left.
void delete_element(char **words, int *b, int *size) {
// Free the string to be deleted.
free(words[*b]);
// Only go up to the second to last element to avoid walking off the array.
for (int i = *b; i < *size-1; i++) {
// Shift everything to the left.
words[i] = words[i+1];
}
// Null out the last element.
// Don't use 0 for NULL, it's confusing.
words[*size-1] = NULL;
// Decrement the size of the array.
(*size)--;
// Redo the check with the newly shifted element.
(*b)--;
}
This sort of thing is better done with a linked list.
Note that your code has a bug. The result is an array of two elements, but one of them is blank. In addition to the return value of erase_repeated, also test its side effect which is to modify words. Test that words contains what you think it does.

The longest sub-array with switching elements

An array is called "switching" if all of its elements at even positions are equal to each other and all of its elements at odd positions are equal to each other.
Example:
[2,4,2,4] is a switching array because the members in even positions (indexes 0 and 2) and odd positions (indexes 1 and 3) are equal.
If A = [3,7,3,7, 2, 1, 2], the switching sub-arrays are:
== > [3,7,3,7] and [2,1,2]
Therefore, the longest switching sub-array is [3,7,3,7] with length = 4.
As another example, if A = [1,5,6,0,1,0], the only switching sub-array is [0,1,0].
Another example: A= [7,-5,-5,-5,7,-1,7], the switching sub-arrays are [7,-1,7] and [-5,-5,-5].
Question:
Write a function that receives an array and finds its longest switching sub-array.
I would like to know how you would solve this problem, and which strategies give a good time complexity.
I am assuming that the array is zero-indexed.
if arr.size <= 2
    return arr.size
else
    ans = 2
    temp_ans = 2 // We will update ans when temp_ans > ans;
    for i = 2; i < arr.size ; ++i
        if arr[i] == arr[i-2]
            temp_ans = temp_ans + 1;
        else
            temp_ans = 2;
        ans = max(temp_ans , ans);
    return ans;
I think this should work, and I don't think it needs any kind of explanation.
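For reference, the scan above might look like this as runnable Python. It is a sketch under the same zero-indexed assumption; longest_switching is an illustrative name.

```python
def longest_switching(arr):
    """Length of the longest contiguous run in which arr[i] == arr[i - 2]."""
    if len(arr) <= 2:
        return len(arr)          # 0, 1 or 2 elements are trivially switching
    best = current = 2
    for i in range(2, len(arr)):
        if arr[i] == arr[i - 2]:
            current += 1         # the window ending at i-1 extends to i
        else:
            current = 2          # restart with the window [arr[i-1], arr[i]]
        best = max(best, current)
    return best


if __name__ == "__main__":
    print(longest_switching([3, 7, 3, 7, 2, 1, 2]))
```

It makes a single pass, so the running time is O(n) with O(1) extra memory.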
Example Code
private static int solve(int[] arr){
if(arr.length <= 1) return arr.length; // also covers the empty array
int even = arr[0],odd = arr[1];
int start = 0,max_len = 0;
for(int i=2;i<arr.length;++i){
if(i%2 == 0 && arr[i] != even || i%2 == 1 && arr[i] != odd){
max_len = Math.max(max_len,i - start);
start = i-1;
if(i%2 == 0){
even = arr[i];
odd = arr[i-1];
}else{
even = arr[i-1];
odd = arr[i];
}
}
}
return Math.max(max_len,arr.length - start);
}
It's like a sliding window problem.
We keep track of even and odd equality with 2 variables, even and odd.
Whenever the condition is not met (the index is even but the element differs from the even variable, or likewise for odd), we:
1. Record the length so far in max_len.
2. Reset start to i-1, which is needed in case all elements are equal.
3. Reset even and odd to arr[i] and arr[i-1], according to the parity of the current index i.
Demo: https://ideone.com/iUQti7
I didn't analyse the time complexity, just wrote a solution that uses recursion and it works (I think):
public class Main
{
public static int switching(int[] arr, int index, int end)
{
try
{
if (arr[index] == arr[index+2])
{
end = index+2;
return switching(arr, index+1, end);
}
} catch (Exception e) {}
return end;
}
public static void main(String[] args)
{
//int[] arr = {3,2,3,2,3};
//int[] arr = {3,2,3};
//int[] arr = {4,4,4};
int[] arr = {1,2,3,4,5,4,4,7,9,8,10};
int best = -1;
for (int i = 0; i < arr.length; i++)
best = Math.max(best, (switching(arr, i, 0) - i));
System.out.println(best+1); // It returns, in this example, 3
}
}
int switchingSubarray(vector<int> &arr, int n) {
if(n<=2) return n; // also covers n == 0
int i=0;
int ans=2;
int j=2;
while(j<n)
{
if(arr[j]==arr[j-2]) j++;
else
{
ans=max(ans,j-i);
i=j-1;
j++;
}
}
ans=max(ans,j-i);
return ans;
}
This just uses the sliding window technique, since the elements at j and j-2 need to be the same.
Try a dry run on paper and it should become clear.
# An array is switching if all numbers in even positions are equal and all numbers in odd positions are equal.
# Find the length of the longest switching contiguous sub-array (brute force).
def check(t):
    even = []
    odd = []
    i = 0
    while i < len(t):
        if i % 2 == 0:
            even.append(t[i])
        else:
            odd.append(t[i])
        i += 1
    if len(set(even)) == 1 and len(set(odd)) == 1:
        return True
    else:
        return False
def solution(A):
    maxval = 0
    if len(A) == 1:
        return 1
    for i in range(0, len(A)):
        for j in range(0, len(A)):
            if check(A[i:j+1]) == True:
                val = len(A[i:j+1])
                print(A[i:j+1])
                if val > maxval:
                    maxval = val
    return maxval
A = [3,2,3,2,3]
A = [7,4,-2,4,-2,-9]
A=[4]
A = [7,-5,-5,-5,7,-1,7]
print(solution(A))

Transform an array to another array by shifting value to adjacent element

I am given 2 arrays, Input and Output Array. The goal is to transform the input array to output array by performing shifting of 1 value in a given step to its adjacent element. Eg: Input array is [0,0,8,0,0] and Output array is [2,0,4,0,2]. Here 1st step would be [0,1,7,0,0] and 2nd step would be [0,1,6,1,0] and so on.
What can be the algorithm to do this efficiently? I was thinking of performing BFS but then we have to do BFS from each element and this can be exponential. Can anyone suggest solution for this problem?
I think you can do this simply by scanning in each direction, tracking the cumulative value (in that direction) in the current array and in the desired output array, and pushing values along ahead of you as necessary:
1. Scan from the left, looking for the first cell where
   cumulative value > cumulative value in desired output;
   while that holds, move 1 from that cell to the next cell to the right.
2. Scan from the right, looking for the first cell where
   cumulative value > cumulative value in desired output;
   while that holds, move 1 from that cell to the next cell to the left.
For your example the steps would be:
FWD:
[0,0,8,0,0]
[0,0,7,1,0]
[0,0,6,2,0]
[0,0,6,1,1]
[0,0,6,0,2]
REV:
[0,1,5,0,2]
[0,2,4,0,2]
[1,1,4,0,2]
[2,0,4,0,2]
I think BFS could actually work.
Notice that n*O(n+m) = O(n^2+nm), which is not exponential.
You could also use the Floyd-Warshall algorithm or Johnson’s algorithm, with a weight of 1 for a "flat" graph, or even connect the vertices in a new way by their actual distance and potentially save some iterations.
Hope it helped :)
void transform(int[] in, int[] out, int size)
{
int[] state = in.clone();
report(state);
while (true)
{
int minPressure = 0;
int indexOfMinPressure = 0;
int maxPressure = 0;
int indexOfMaxPressure = 0;
int pressureSum = 0;
for (int index = 0; index < size - 1; ++index)
{
int lhsDiff = state[index] - out[index];
int rhsDiff = state[index + 1] - out[index + 1];
int pressure = lhsDiff - rhsDiff;
if (pressure < minPressure)
{
minPressure = pressure;
indexOfMinPressure = index;
}
if (pressure > maxPressure)
{
maxPressure = pressure;
indexOfMaxPressure = index;
}
pressureSum += pressure;
}
if (minPressure == 0 && maxPressure == 0)
{
break;
}
boolean shiftLeft;
if (Math.abs(minPressure) > Math.abs(maxPressure))
{
shiftLeft = true;
}
else if (Math.abs(minPressure) < Math.abs(maxPressure))
{
shiftLeft = false;
}
else
{
shiftLeft = (pressureSum < 0);
}
if (shiftLeft)
{
++state[indexOfMinPressure];
--state[indexOfMinPressure + 1];
}
else
{
--state[indexOfMaxPressure];
++state[indexOfMaxPressure + 1];
}
report(state);
}
}
A simple greedy algorithm will work and do the job in minimum number of steps. The function returns the total numbers of steps required for the task.
int shift(std::vector<int>& a,std::vector<int>& b){
int n = a.size();
int sum1=0,sum2=0;
for (int i = 0; i < n; ++i){
sum1+=a[i];
sum2+=b[i];
}
if (sum1!=sum2)
{
return -1;
}
int operations=0;
int j=0;
for (int i = 0; i < n;)
{
if (a[i]<b[i])
{
// the donor cell must lie to the right of i; cells left of i already match
if (j<=i) j=i+1;
while(j<n and a[j]==0){
j++;
}
if(a[j]<b[i]-a[i]){
operations+=(j-i)*a[j];
a[i]+=a[j];
a[j]=0;
}else{
operations+=(j-i)*(b[i]-a[i]);
a[j]-=(b[i]-a[i]);
a[i]=b[i];
}
}else if (a[i]>b[i])
{
a[i+1]+=(a[i]-b[i]);
operations+=(a[i]-b[i]);
a[i]=b[i];
}else{
i++;
}
}
return operations;
}
Here -1 is a special value meaning that given array cannot be converted to desired one.
Time Complexity: O(n).
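If only the number of steps is needed, a running surplus is enough. The following Python sketch assumes that one step moves a single unit between two adjacent cells; min_shift_steps is a made-up name, and -1 mirrors the "impossible" value used above. Every unit of surplus (or deficit) accumulated to the left of a boundary has to cross that boundary once, so the absolute prefix differences add up to the minimum step count.

```python
def min_shift_steps(src, dst):
    """Minimum number of single-unit moves between neighbours turning src into dst."""
    if len(src) != len(dst) or sum(src) != sum(dst):
        return -1                # totals differ, so no sequence of shifts can work
    steps = 0
    surplus = 0                  # net amount that must flow across the boundary after cell i
    for a, b in zip(src, dst):
        surplus += a - b
        steps += abs(surplus)    # each unit crossing a boundary costs one step
    return steps


if __name__ == "__main__":
    print(min_shift_steps([0, 0, 8, 0, 0], [2, 0, 4, 0, 2]))
```

For the example in the question this gives 8, matching the four forward and four reverse moves listed in the scanning answer.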

merge two arrays in ascending order

I wanted to put two sorted arrays into one array in ascending order, but I don't know what I did wrong.
It won't put them in order; it just combines the two arrays together.
int [] merged = new int[count1 + count2];
int merg1 = 0, merg2 = 0, index = 0;
while (merg1 < count1 && merg2 < count2) {
if (ary1[merg1] <= ary2[merg2]) {
merged[index++] = ary1[merg1++];
}
else {
merged[index++] = ary2[merg2++];
}
while (merg1 < count1) {
merged[index++] = ary1[merg1++];
}
while (merg2 < count2) {
merged[index++] = ary2[merg2++];
}
for (int i = 0; i < index; i++) {
System.out.print(merged[i] + " ");
}
What are the last two while loops for? You may not need them. It seems that you sort the first two numbers then the while loops copy the first array then the second without sorting them.
Can you share how the results look for the following?
arry1 = [1,2,6,7]
arry2 = [2,3,4,8]
I agree with Paradox, without the two additional while loops, your logic above should be enough. ex:
while (merg1 < count1 && merg2 < count2) {
if (ary1[merg1] <= ary2[merg2]) {
merged[index++] = ary1[merg1++];
}
else {
merged[index++] = ary2[merg2++];
}
}
Wouldn't the error come from the
while (merg1 < count1 && merg2 < count2) {
...
}
It terminates when one of the numbers goes past its limit. After the while loop, I would add another loop that adds the rest of the remaining array to the merged array.
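A sketch of that structure in Python: the comparison loop runs while both inputs still have elements, and the leftovers are copied only after it has finished. The function name merge_sorted is an assumption for illustration.

```python
def merge_sorted(left, right):
    """Merge two already-sorted lists into one sorted list."""
    merged = []
    i = j = 0
    # Compare the two fronts while both lists still have elements.
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    # Only one of these has anything left; copy it AFTER the loop above ends.
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


if __name__ == "__main__":
    print(merge_sorted([1, 2, 6, 7], [2, 3, 4, 8]))
```

The key point is placement: if the two tail-copy steps sit inside the comparison loop, the first pass copies both arrays wholesale and the result is just a concatenation.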
Can you just use the array utilities (java.util.Arrays)? For example:
int[] array1 = {1,2,6,7};
int[] array2 = {2,3,4,8};
int[] mergedarray = new int[array1.length + array2.length];
System.arraycopy(array1, 0, mergedarray, 0, array1.length);
System.arraycopy(array2, 0, mergedarray, array1.length, array2.length);
Arrays.sort(mergedarray);
That should give you the results you are looking for.

Find length of smallest window that contains all the characters of a string in another string

I was interviewed recently. I didn't do well because I got stuck on the following question.
Suppose a sequence is given: A D C B D A B C D A C D
and the search sequence is: A C D
The task was to find the start and end index of the shortest window in the given string that contains all the characters of the search string, preserving their order.
Output, assuming indexes start from 1:
start index 10
end index 12
Explanation:
1. start/end index are not 1/3, because although that window contains the characters, their order is not maintained
2. start/end index are not 1/5, because although that window contains the characters in order, its length is not optimal
3. start/end index are not 6/9, because although that window contains the characters in order, its length is not optimal
Please go through "How to find smallest substring which contains all characters from a given string?".
But the above question is different, since there the order is not maintained. I'm still struggling to maintain the indexes. Any help would be appreciated. Thanks.
I tried to write some simple C code to solve the problem:
Update:
I wrote a search function that looks for the required characters in the correct order, returning the length of the window and storing the window's start point in int *startAt. The function processes a sub-sequence of the given hay from the specified start point int start to its end.
The rest of the algorithm is located in main, where all possible sub-sequences are tested with a small optimisation: we start looking for the next window right after the start point of the previous one, so we skip some unnecessary turns. During the process we keep track of the best solution found so far.
Complexity is O(n*n/2), i.e. O(n^2).
Update2:
Unnecessary dependencies have been removed, and unnecessary repeated calls to strlen(...) have been replaced by size parameters passed to search(...).
#include <stdio.h>
// search for single occurrence
int search(const char hay[], int haySize, const char needle[], int needleSize, int start, int * startAt)
{
int i, charFound = 0;
// search from start to end
for (i = start; i < haySize; i++)
{
// found a character ?
if (hay[i] == needle[charFound])
{
// is it the first one?
if (charFound == 0)
*startAt = i; // store starting position
charFound++; // and go to next one
}
// are we done?
if (charFound == needleSize)
return i - *startAt + 1; // success
}
return -1; // failure
}
int main(int argc, char **argv)
{
char hay[] = "ADCBDABCDACD";
char needle[] = "ACD";
int resultStartAt = -1, resultLength = -1, i, haySize = sizeof(hay) - 1, needleSize = sizeof(needle) - 1;
// search all possible occurrences
// (<= so that the last possible start position is tried as well)
for (i = 0; i <= haySize - needleSize; i++)
{
int startAt, length;
length = search(hay, haySize, needle, needleSize, i, &startAt);
// found something?
if (length != -1)
{
// check if it's the first result, or a one better than before
if ((resultLength == -1) || (resultLength > length))
{
resultLength = length;
resultStartAt = startAt;
}
// skip unnecessary steps in the next turn
i = startAt;
}
}
printf("start at: %d, length: %d\n", resultStartAt, resultLength);
return 0;
}
Start from the beginning of the string.
If you encounter an A, then mark the position and push it on a stack. After that, keep checking the characters sequentially until
1. If you encounter an A, update the A's position to current value.
2. If you encounter a C, push it onto the stack.
After you encounter a C, again keep checking the characters sequentially until,
1. If you encounter a D, erase the stack containing A and C and mark the score from A to D for this sub-sequence.
2. If you encounter an A, then start another Stack and mark this position as well.
2a. If now you encounter a C, then erase the earlier stacks and keep the most recent stack.
2b. If you encounter a D, then erase the older stack and mark the score and check if it is less than the current best score.
Keep doing this till you reach the end of the string.
The pseudo code can be something like:
Initialize stack = empty;
Initialize bestLength = mainString.size() + 1; // a large value for the subsequence.
Initialize currentLength = 0;
for ( int i = 0; i < mainString.size(); i++ ) {
if ( stack is empty ) {
if ( mainString[i] == 'A' ) {
start a new stack and push A on it.
mark the startPosition for this stack as i.
}
continue;
}
For each of the stacks ( there can be at most two stacks prevailing,
one of size 1 and other of size 2 ) {
if ( stack size == 1 ) // only A in it {
if ( mainString[i] == 'A' ) {
update the startPosition for this stack as i.
}
if ( mainString[i] == 'C' ) {
push C on to this stack.
}
} else if ( stack size == 2 ) // A & C in it {
if ( mainString[i] == 'C' ) {
if there is a stack with size 1, then delete this stack;// the other one dominates this stack.
}
if ( mainString[i] == 'D' ) {
mark the score from startPosition till i and update bestLength accordingly.
delete this stack.
}
}
}
}
I modified my previous suggestion using a single queue, now I believe this algorithm runs with O(N*m) time:
FindSequence(char[] sequenceList)
{
queue startSeqQueue;
int i = 0, k;
int minSequenceLength = sequenceList.length + 1;
int startIdx = -1, endIdx = -1;
// collect every position where a window could start
for (i = 0; i < sequenceList.length - 2; i++)
{
if (sequenceList[i] == 'A')
{
startSeqQueue.enqueue(i);
}
}
while (!startSeqQueue.isEmpty())
{
i = startSeqQueue.dequeue();
k = i + 1;
while (k < sequenceList.length && sequenceList[k] != 'C')
{
// a later 'A' before the 'C' gives a shorter window; it is the next queued start
if (sequenceList[k] == 'A' && !startSeqQueue.isEmpty()) i = startSeqQueue.dequeue();
k++;
}
while (k < sequenceList.length && sequenceList[k] != 'D')
k++;
if (k < sequenceList.length && minSequenceLength > k - i + 1)
{
startIdx = i;
endIdx = k;
minSequenceLength = k - i + 1;
}
}
return (startIdx, endIdx);
}
My previous (O(1) memory) suggestion:
FindSequence(char[] sequenceList)
{
int i = 0, k;
int minSequenceLength = sequenceList.length + 1;
int startIdx = -1, endIdx = -1;
for (i = 0; i < sequenceList.length - 2; i++)
{
if (sequenceList[i] != 'A')
continue;
k = i+1;
while (k < sequenceList.length && sequenceList[k] != 'C')
k++;
while (k < sequenceList.length && sequenceList[k] != 'D')
k++;
if (k < sequenceList.length && minSequenceLength > k - i + 1)
{
startIdx = i;
endIdx = k;
minSequenceLength = k - i + 1;
}
}
return (startIdx, endIdx);
}
Here's my version. It keeps track of possible candidates for an optimum solution. For each character in the hay, it checks whether this character is the next one needed by each candidate. It then selects the shortest candidate. Quite straightforward.
class ShortestSequenceFinder
{
public class Solution
{
public int StartIndex;
public int Length;
}
private class Candidate
{
public int StartIndex;
public int SearchIndex;
}
public Solution Execute(string hay, string needle)
{
var candidates = new List<Candidate>();
var result = new Solution() { Length = hay.Length + 1 };
for (int i = 0; i < hay.Length; i++)
{
char c = hay[i];
for (int j = candidates.Count - 1; j >= 0; j--)
{
if (c == needle[candidates[j].SearchIndex])
{
if (candidates[j].SearchIndex == needle.Length - 1)
{
int candidateLength = i - candidates[j].StartIndex + 1; // inclusive window length
if (candidateLength < result.Length)
{
result.Length = candidateLength;
result.StartIndex = candidates[j].StartIndex;
}
candidates.RemoveAt(j);
}
else
{
candidates[j].SearchIndex += 1;
}
}
}
if (c == needle[0])
candidates.Add(new Candidate { SearchIndex = 1, StartIndex = i });
}
return result;
}
}
It runs in O(n*m).
Here is my solution in Python. It returns the indexes assuming 0-indexed sequences. Therefore, for the given example it returns (9, 11) instead of (10, 12). Obviously it's easy to mutate this to return (10, 12) if you wish.
def solution(s, ss):
    S, E = [], []
    for i in xrange(len(s)):
        if s[i] == ss[0]:
            S.append(i)
        if s[i] == ss[-1]:
            E.append(i)
    candidates = sorted([(start, end) for start in S for end in E
                         if start <= end and end - start >= len(ss) - 1],
                        lambda x,y: (x[1] - x[0]) - (y[1] - y[0]))
    for cand in candidates:
        i, j = cand[0], 0
        while i <= cand[-1]:
            if s[i] == ss[j]:
                j += 1
            i += 1
        if j == len(ss):
            return cand
Usage:
>>> from so import solution
>>> s = 'ADCBDABCDACD'
>>> solution(s, 'ACD')
(9, 11)
>>> solution(s, 'ADC')
(0, 2)
>>> solution(s, 'DCCD')
(1, 8)
>>> solution(s, s)
(0, 11)
>>> s = 'ABC'
>>> solution(s, 'B')
(1, 1)
>>> print solution(s, 'gibberish')
None
I think the time complexity is O(p log(p)), where p is the number of pairs of indexes in the sequence that refer to search_sequence[0] and search_sequence[-1] such that the index for search_sequence[0] is less than the index for search_sequence[-1], because it sorts these p pairings using an O(n log n) algorithm. But then again, my substring iteration at the end could totally overshadow that sorting step. I'm not really sure.
It probably has a worst-case time complexity which is bounded by O(n*m) where n is the length of the sequence and m is the length of the search sequence, but at the moment I cannot think of an example worst-case.
Here is my O(m*n) algorithm in Java:
class ShortestWindowAlgorithm {
Multimap<Character, Integer> charToNeedleIdx; // Character -> indexes in needle, from rightmost to leftmost | Multimap is a class from Guava
int[] prefixesIdx; // prefixesIdx[i] -- rightmost index in the hay window that contains the shortest found prefix of needle[0..i]
int[] prefixesLengths; // prefixesLengths[i] -- shortest window containing needle[0..i]
public int shortestWindow(String hay, String needle) {
init(needle);
for (int i = 0; i < hay.length(); i++) {
for (int needleIdx : charToNeedleIdx.get(hay.charAt(i))) {
if (firstTimeAchievedPrefix(needleIdx) || foundShorterPrefix(needleIdx, i)) {
prefixesIdx[needleIdx] = i;
prefixesLengths[needleIdx] = getPrefixNewLength(needleIdx, i);
forgetOldPrefixes(needleIdx);
}
}
}
return prefixesLengths[prefixesLengths.length - 1];
}
private void init(String needle) {
charToNeedleIdx = ArrayListMultimap.create();
prefixesIdx = new int[needle.length()];
prefixesLengths = new int[needle.length()];
for (int i = needle.length() - 1; i >= 0; i--) {
charToNeedleIdx.put(needle.charAt(i), i);
prefixesIdx[i] = -1;
prefixesLengths[i] = -1;
}
}
private boolean firstTimeAchievedPrefix(int needleIdx) {
int shortestPrefixSoFar = prefixesLengths[needleIdx];
return shortestPrefixSoFar == -1 && (needleIdx == 0 || prefixesLengths[needleIdx - 1] != -1);
}
private boolean foundShorterPrefix(int needleIdx, int hayIdx) {
int shortestPrefixSoFar = prefixesLengths[needleIdx];
int newLength = getPrefixNewLength(needleIdx, hayIdx);
return newLength <= shortestPrefixSoFar;
}
private int getPrefixNewLength(int needleIdx, int hayIdx) {
return needleIdx == 0 ? 1 : (prefixesLengths[needleIdx - 1] + (hayIdx - prefixesIdx[needleIdx - 1]));
}
private void forgetOldPrefixes(int needleIdx) {
if (needleIdx > 0) {
prefixesLengths[needleIdx - 1] = -1;
prefixesIdx[needleIdx - 1] = -1;
}
}
}
It works on every input and also can handle repeated characters etc.
Here are some examples:
public class StackOverflow {
public static void main(String[] args) {
ShortestWindowAlgorithm algorithm = new ShortestWindowAlgorithm();
System.out.println(algorithm.shortestWindow("AXCXXCAXCXAXCXCXAXAXCXCXDXDXDXAXCXDXAXAXCD", "AACD")); // 6
System.out.println(algorithm.shortestWindow("ADCBDABCDACD", "ACD")); // 3
System.out.println(algorithm.shortestWindow("ADCBDABCD", "ACD")); // 4
}
}
I haven't read every answer here, but I don't think anyone has noticed that this is just a restricted version of local pairwise sequence alignment, in which we are only allowed to insert characters (and not delete or substitute them). As such it will be solved by a simplification of the Smith-Waterman algorithm that considers only 2 cases per vertex (arriving at the vertex either by matching a character exactly, or by inserting a character) rather than 3 cases. This algorithm is O(n^2).
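The order-preserving window can also be found with a small dynamic program over needle prefixes. The Python sketch below is not taken from any answer here; the function name and the starts table are illustrative assumptions. For each needle prefix it remembers the latest position at which a window containing that prefix in order could start, which gives O(n*m) time and O(m) memory.

```python
def min_window_subsequence(hay, needle):
    """Return 0-based (start, end) of the shortest window of hay that contains
    needle as an in-order subsequence, or None if there is no such window."""
    m = len(needle)
    if m == 0:
        return None
    # starts[j]: latest start of a window seen so far that contains needle[:j+1] in order.
    starts = [-1] * m
    best = None
    for i, ch in enumerate(hay):
        # Go right to left so starts[j-1] still describes positions before i.
        for j in range(m - 1, -1, -1):
            if ch != needle[j]:
                continue
            if j == 0:
                starts[0] = i
            elif starts[j - 1] != -1:
                starts[j] = starts[j - 1]
        if ch == needle[-1] and starts[m - 1] != -1:
            s = starts[m - 1]
            if best is None or i - s < best[1] - best[0]:
                best = (s, i)
    return best


if __name__ == "__main__":
    print(min_window_subsequence("ADCBDABCDACD", "ACD"))
```

For the question's example this reports the 0-based pair (9, 11), which corresponds to the 1-based answer 10/12.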
Here's my solution. It follows one of the pattern matching solutions. Please comment/correct me if I'm wrong.
Given the input string as in the question
A D C B D A B C D A C D. Let's first compute the indices where A occurs. Assuming a zero based index this should be [0,5,9].
Now the pseudo code is as follows.
Store the indices of A in a list say *orders*.// orders=[0,5,9]
globalminStart, globalminEnd=0,localMinStart=0,localMinEnd=0;
for (index: orders)
{
int i =index;
Stack chars=new Stack();// to store the characters
i=localminStart;
while(i< length of input string)
{
if(str.charAt(i)=='C') // we've already seen A, so we look for C
st.push(str.charAt(i));
i++;
continue;
else if(str.charAt(i)=='D' and st.peek()=='C')
localminEnd=i; // we have a match! so assign value of i to len
i+=1;
break;
else if(str.charAt(i)=='A' )// seen the next A
break;
}
if (globalMinEnd-globalMinStart<localMinEnd-localMinStart)
{
globalMinEnd=localMinEnd;
globalMinStart=localMinStart;
}
}
return [globalMinstart,globalMinEnd]
}
P.S.: this is pseudocode and a rough idea. I'd be happy to correct it if there's something wrong.
AFAICT, time complexity is O(n) and space complexity is O(n).

Resources