Binary Search in C

I implemented a binary search function that returns the index of the element found in a list. I'm curious whether there is a way, using binary search, to print all occurrences of the element in the list.
Below is the code:
int Binary_Search(int *array, int chave, int N) {
    int inf = 0;
    int sup = N-1;
    int meio;
    while (inf <= sup) {
        meio = inf + (sup-inf)/2;
        if (chave == array[meio])
            return meio;
        else if (chave < array[meio])
            sup = meio-1;
        else
            inf = meio+1;
    }
    return -1;
}
Part of the other source:
How could I make this code snippet print only the duplicated occurrences?
else {
    Imprime_Struct(Tabinvertida_Fornecedor[aux]->info);
    aux=aux+1;
    while (aux != i) {
        if (strcmp(chave, TabName[aux]->info.name)==0)
            Print_Struct(TabName[aux]->info);
        aux++;
    }
}

You can implement binary search in two ways:
1) so that it finds the first element not smaller than the given value
2) so that it finds the last element not greater than the given value
By combining these two implementations, you can easily determine the number of copies of each element.
If your array contains only integers, you don't even have to use both: just pick one and search for
1) n and n+1
2) n-1 and n
respectively.
That gives you logarithmic complexity.
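A minimal C sketch of the two variants described above, assuming the array is sorted in ascending order. The names lower_bound, upper_bound and count_occurrences are illustrative, not from the answer. Following the usual half-open convention, upper_bound here returns one past the last element not greater than the key, so the number of copies is simply the difference of the two results.

```c
/* Assumes a[0..n) is sorted in ascending order. */

/* Index of the first element not smaller than key (n if there is none). */
int lower_bound(const int *a, int n, int key)
{
    int lo = 0, hi = n;              /* the answer always lies in [lo, hi] */
    while (lo < hi) {
        int mid = lo + (hi - lo) / 2;
        if (a[mid] < key)
            lo = mid + 1;            /* a[mid] too small: answer is right of mid */
        else
            hi = mid;                /* a[mid] >= key: mid is still a candidate */
    }
    return lo;
}

/* One past the last element not greater than key, i.e. the index of the
   first element greater than key (n if there is none). */
int upper_bound(const int *a, int n, int key)
{
    int lo = 0, hi = n;
    while (lo < hi) {
        int mid = lo + (hi - lo) / 2;
        if (a[mid] <= key)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

/* Number of copies of key using two O(log n) searches. The matching
   indices are lower_bound(...) .. upper_bound(...) - 1. */
int count_occurrences(const int *a, int n, int key)
{
    return upper_bound(a, n, key) - lower_bound(a, n, key);
}
```

For the integer-only shortcut from the answer, lower_bound(a, n, key + 1) returns the same index as upper_bound(a, n, key), provided key + 1 does not overflow. Printing all occurrences is then a loop from the first result up to (but not including) the second.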

Your function assumes that the array is sorted in ascending order. You can modify it to find the location of the first match, if any, and list all matches:
#include <stdio.h>
void list_all_matches(const int *array, int N, int chave) {
    int meio, inf = 0, sup = N;
    /* find the first index whose element is not smaller than chave */
    while (inf < sup) {
        meio = inf + (sup - inf) / 2;
        if (chave <= array[meio])
            sup = meio;
        else
            inf = meio + 1;
    }
    /* print the index of every match, starting at the first one */
    while (sup < N && array[sup] == chave)
        printf("%d\n", sup++);
}

Once you get the index of the element, you can just scan forwards and backwards checking for that element. As the array is sorted, all the duplicates will be adjacent. In the worst case, when all the elements are the same, this method takes O(n).
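A sketch of this scan-outward approach in C, reusing the question's Binary_Search (taken as const int * here, since it only reads the array). The helper find_run and its output parameters are hypothetical names for illustration.

```c
/* Binary_Search from the question: ascending array, returns some index
   of chave, or -1 if it is absent. */
int Binary_Search(const int *array, int chave, int N)
{
    int inf = 0, sup = N - 1;
    while (inf <= sup) {
        int meio = inf + (sup - inf) / 2;
        if (chave == array[meio])
            return meio;
        else if (chave < array[meio])
            sup = meio - 1;
        else
            inf = meio + 1;
    }
    return -1;
}

/* Widen the hit found by Binary_Search to the whole run of equal elements.
   Stores the run's inclusive bounds in *first and *last and returns the
   number of occurrences, or 0 if chave is absent. Worst case O(n). */
int find_run(const int *array, int N, int chave, int *first, int *last)
{
    int pos = Binary_Search(array, chave, N);
    if (pos < 0)
        return 0;
    int lo = pos, hi = pos;
    while (lo > 0 && array[lo - 1] == chave)      /* scan backwards */
        lo--;
    while (hi < N - 1 && array[hi + 1] == chave)  /* scan forwards */
        hi++;
    *first = lo;
    *last = hi;
    return hi - lo + 1;
}
```

To print every occurrence, a caller could run for (int i = first; i <= last; i++) printf("%d\n", i); after a non-zero return.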

Related

Find all unsorted pairs in partially sorted array

I have to find (or at least count) all pairs of (not necessarily adjacent) unsorted elements in a partially sorted array.
If we assume the sorting to be ascending, the array [1 4 3 2 5] has the following unsorted pairs: (4, 3), (3, 2) and (4, 2).
I'm thinking of an algorithm that works along the lines of insertion sort, as insertion sort tends to compare every new element with all elements which are misplaced with respect to the new element.
Edit: While posting the question, I didn't realise that finding the pairs would have a higher time complexity than counting them. Is there a better possible algorithm that just counts how many such pairs exist?
It depends a little on what exactly you mean by "partially sorted"; one could argue that every array is partially sorted to some degree.
Since listing all pairs has worst-case complexity O(n^2) anyway (an input sorted in descending order has n(n-1)/2 such pairs), you might as well go down the straightforward route:
def unsorted_pairs(array):
    ret = []
    for i in range(len(array)):
        for j in range(i + 1, len(array)):
            if array[i] > array[j]:
                ret.append((array[i], array[j]))
    return ret
This works very well for random arrays.
However, I suppose what you have in mind is something more like this: there are longer stretches inside the array where the numbers are sorted, but the array as a whole is not.
In that case, you can save a bit of time over the naive approach above by first identifying those stretches, which can be done in a single linear pass. Once you have them, you only have to compare the stretches with each other, and you can use binary search for that (since each stretch is sorted).
Here's a Python implementation of what I have in mind:
import bisect
def unsorted_pairs_by_stretches(array):
    # find all sorted stretches
    stretches = []
    begin = 0
    for i in range(1, len(array)):
        if array[i-1] > array[i]:
            stretches.append(array[begin:i])
            begin = i
    if begin < len(array):
        stretches.append(array[begin:])
    # compare stretches
    ret = []
    for i in range(len(stretches)):
        stretchi = stretches[i]
        stretchi_rev = None
        for j in range(i+1, len(stretches)):
            stretchj = stretches[j]
            if stretchi[-1] > stretchj[0]:
                if stretchi_rev is None:
                    stretchi_rev = list(reversed(stretchi))
                hi = len(stretchj)
                # walk stretchi from its largest value down; each x pairs
                # with every element of stretchj[:k] that is smaller than x
                for x in stretchi_rev:
                    k = bisect.bisect_left(stretchj, x, 0, hi)
                    if k == 0:
                        break
                    for y in stretchj[:k]:
                        ret.append((x, y))
                    hi = k
    return ret
For random arrays, this will be slower than the first approach. But if the array is big and the sorted stretches are long enough, this algorithm will at some point start to beat the brute-force search.
As suggested by @SomeDude in the comments, if you just need to count pairs, there's an O(n log n) solution based on building a binary search tree. That bound holds on average; the tree is not balanced, so an input that is already sorted in either direction degrades it to O(n^2). There are some subtleties involved: we need to keep track of the number of duplicates (ic) at each node, and for performance reasons we also keep track of the number of values in its right subtree (rc).
The basic scheme for inserting a value v into the tree rooted at node n is:
def insert(n, v)
    if v < n.data
        count = 1 + n.ic + n.rc
        if n.left is null
            n.left = node(v)
            return count
        return count + insert(n.left, v)
    else if v > n.data
        if n.right is null
            n.right = node(v)
            n.rc = 1
            return 0
        n.rc += 1
        return insert(n.right, v)
    else // v == n.data
        n.ic += 1
        return n.rc
And here's some functioning Java code (Ideone):
static int pairsCount(Integer[] arr) {
    if (arr.length == 0)
        return 0;
    int count = 0;
    Node root = new Node(arr[0]);
    for (int i = 1; i < arr.length; i++)
        count += insert(root, arr[i]);
    return count;
}
static int insert(Node n, int v) {
    if (v < n.value) {
        int count = 1 + n.rc + n.ic;
        if (n.left == null) {
            n.left = new Node(v);
            return count;
        }
        return count + insert(n.left, v);
    }
    else if (v > n.value) {
        if (n.right == null) {
            n.right = new Node(v);
            n.rc = 1;
            return 0;
        }
        n.rc += 1;
        return insert(n.right, v);
    }
    else {
        n.ic += 1;
        return n.rc;
    }
}
static class Node {
    int value;
    Node left, right;
    int rc; // number of values in the right subtree
    int ic; // duplicate count
    Node(int value) {
        this.value = value;
    }
}
Test:
Integer[] arr = {1, 4, 3, 2, 5};
System.out.println(pairsCount(arr));
Output:
3
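If a guaranteed O(n log n) worst case matters for counting, a different, well-known technique (not the one used in the answer above) is to count inversions during a merge sort: whenever an element of the right half is merged ahead of the elements still waiting in the left half, each of those waiting elements forms one unsorted pair with it. A C sketch, assuming int values and a count that fits in long long; count_unsorted_pairs and sort_count are illustrative names.

```c
#include <stdlib.h>
#include <string.h>

/* Sort a[lo, hi) using tmp as scratch space and return the number of pairs
   i < j inside that range with a[i] > a[j]. */
static long long sort_count(int *a, int *tmp, int lo, int hi)
{
    if (hi - lo < 2)
        return 0;
    int mid = lo + (hi - lo) / 2;
    long long count = sort_count(a, tmp, lo, mid) + sort_count(a, tmp, mid, hi);

    int i = lo, j = mid, k = lo;
    while (i < mid && j < hi) {
        if (a[i] <= a[j]) {
            tmp[k++] = a[i++];       /* equal values do not form an unsorted pair */
        } else {
            count += mid - i;        /* a[j] is smaller than every a[i..mid) */
            tmp[k++] = a[j++];
        }
    }
    while (i < mid)
        tmp[k++] = a[i++];
    while (j < hi)
        tmp[k++] = a[j++];
    memcpy(a + lo, tmp + lo, (size_t)(hi - lo) * sizeof *a);
    return count;
}

/* Count unsorted pairs without modifying the caller's array.
   Returns -1 if memory allocation fails. */
long long count_unsorted_pairs(const int *arr, int n)
{
    if (n < 2)
        return 0;
    int *a = malloc((size_t)n * sizeof *a);
    int *tmp = malloc((size_t)n * sizeof *tmp);
    if (a == NULL || tmp == NULL) {
        free(a);
        free(tmp);
        return -1;
    }
    memcpy(a, arr, (size_t)n * sizeof *a);
    long long count = sort_count(a, tmp, 0, n);
    free(a);
    free(tmp);
    return count;
}
```

This only counts the pairs; listing them still needs output-sized work, as discussed above.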

Binary search in C always returns false, even when value is contained in sorted array [duplicate]

New to programming. I'm trying to implement binary search in C, but unfortunately it isn't working properly: my function always returns false, even when the value is in the array. Please help.
The function takes the following inputs:
"value" - integer value to be found in array.
"values" - the sorted array.
"n" - number of integers in array.
bool search(int value, int values[], int n)
{
// recursive implementation of binary search
if (n % 2 == 0)
{
search_even(value, values, n);
}
else
{
search_odd(value, values, n);
}
return false;
}
bool search_even(int value, int values[], int n)
{
// binary search
if (n <= 0)
{
return false;
}
// check middle of array
else if (value == values[n/2])
{
return true;
}
// search left half of sorted array
else if (value < values[n/2])
{
int less_than_arr[n/2];
for (int i = 0; i < n/2; i++)
{
less_than_arr[i] = values[i];
}
search(value, less_than_arr, n/2);
}
// search right half of sorted array
else if (value > values[n/2])
{
int more_than_arr[(n/2) - 1];
for (int i = 0; i < (n/2) - 1; i++)
{
more_than_arr[i] = values[i + 1 + n/2];
}
search(value, more_than_arr, n/2);
}
return false;
}
bool search_odd(int value, int values[], int n)
{
// binary search
if (n <= 0)
{
return false;
}
// check middle of array
else if (value == values[n/2])
{
return true;
}
// search left half of sorted array
else if (value < values[n/2])
{
int less_than_arr[n/2];
for (int i = 0; i < n/2; i++)
{
less_than_arr[i] = values[i];
}
search(value, less_than_arr, n/2);
}
// search right half of sorted array
else if (value > values[n/2])
{
int more_than_arr[n/2];
for (int i = 0; i < n/2; i++)
{
more_than_arr[i] = values[i + 1 + n/2];
}
search(value, more_than_arr, n/2);
}
return false;
}
You (recursively) call search functions but never return the value computed by the calls. Look at:
bool search(int value, int values[], int n)
{
    // recursive implementation of binary search
    if (n % 2 == 0)
    {
        search_even(value, values, n);
    }
    else
    {
        search_odd(value, values, n);
    }
    return false;
}
This function always returns false.
You need, at the very least, to replace each call search*(...) with return search*(...), both here and in the recursive calls inside search_even and search_odd, so that the value determined at the leaves of the recursion is passed back to the original (first) call.
Jean-Baptiste has already pointed out the obvious error in your function, but there are more issues:
You create local copies of the subarrays to search. This is not necessary and will make binary search slower than linear search.
Copying data is usually only necessary when you want to modify it, but retain the original state. Your search function only inspects the data. Strictly speaking, your argument int values[] should probably be const int values[] to reflect that fact.
In C, you must pass the pointer to the first element and the length of an array. Arrays decay into pointers to their first element, so the following:
int val[4] = {2, 4, 7, 12};
search(3, val, 4);
already does that.
But here's a useful idiom: If you want to pass in the subarray that starts at position k, use:
search(3, val + k, 4 - k);
More generally, you can pass the array slice [lo, hi), where lo is the zero-based inclusive lower bound and hi is the exclusive upper bound as:
search(3, val + lo, hi - lo);
In the called function, the indices will then be [0, hi - lo); the original array offset is lost.
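A small illustration of the pointer-plus-length idiom and of the lost offset, using a plain linear search so that the slicing is the only thing going on. Both index_of and index_of_in_slice are hypothetical helpers, not part of the question's code.

```c
/* Hypothetical helper: index of value within values[0..n), or -1.
   Inside this function, indices always start at 0. */
int index_of(int value, const int values[], int n)
{
    for (int i = 0; i < n; i++)
        if (values[i] == value)
            return i;
    return -1;
}

/* Search only the slice [lo, hi) of values and translate the
   slice-relative result back into an index of the whole array. */
int index_of_in_slice(int value, const int values[], int lo, int hi)
{
    int rel = index_of(value, values + lo, hi - lo);  /* pointer + length */
    return rel < 0 ? -1 : lo + rel;                   /* restore the offset */
}
```

With int val[4] = {2, 4, 7, 12};, the call index_of(7, val + 1, 3) sees only {4, 7, 12} and returns 1, while index_of_in_slice(7, val, 1, 4) adds the offset back and returns 2.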
Further, you don't need to distinguish between odd and even n if you calculate the size of the right-hand array as the original size minus the size of the left-hand array, minus one for the middle element:
mid = n / 2
left = [0, mid)
right = [mid + 1, n)
With this, your recursive binary search function will become:
#include <stdbool.h>
bool search(int value, const int values[], int n)
{
    if (n == 0) return false;
    if (value < values[n / 2]) {
        return search(value, values, n / 2);
    }
    if (value > values[n / 2]) {
        return search(value, values + n / 2 + 1, n - n / 2 - 1);
    }
    return true;
}

C Sort Algorithm Not Outputting Correct Values?

New to C here. I am making a program that sorts and searches a list of random ints for learning purposes. I'm trying to implement bubble sort, but I'm getting odd results in my console while debugging.
I have an array like so:
arr[0] = 3
arr[1] = 2
arr[2] = 1
So if I were to sort this list from least to greatest, it should end up in the reverse order. Instead, my sort function seems to be logically flawed and outputs the following:
arr[0] = 0
arr[1] = 1
arr[2] = 2
I'm obviously new at this; someone who knows better will probably spot my mistake very quickly.
find.c
/**
 * Prompts user for as many as MAX values until EOF is reached,
 * then proceeds to search that "haystack" of values for given needle.
 *
 * Usage: ./find needle
 *
 * where needle is the value to find in a haystack of values
 */
#include <cs50.h>
#include <stdio.h>
#include <stdlib.h>
#include "helpers.h"
// maximum amount of hay
const int MAX = 65536;
int main(int argc, string argv[])
{
    // ensure proper usage
    if (argc != 2)
    {
        printf("Usage: ./find needle\n");
        return -1;
    }
    // remember needle
    int needle = atoi(argv[1]);
    // fill haystack
    int size;
    int haystack[MAX];
    for (size = 0; size < MAX; size++)
    {
        // wait for hay until EOF
        printf("\nhaystack[%i] = ", size);
        int straw = get_int();
        if (straw == INT_MAX)
        {
            break;
        }
        // add hay to stack
        haystack[size] = straw;
    }
    printf("\n");
    // sort the haystack
    sort(haystack, size);
    // try to find needle in haystack
    if (search(needle, haystack, size))
    {
        printf("\nFound needle in haystack!\n\n");
        return 0;
    }
    else
    {
        printf("\nDidn't find needle in haystack.\n\n");
        return 1;
    }
}
helpers.c
#include <cs50.h>
#include "helpers.h"
#include <stdio.h>
/**
 * Returns true if value is in array of n values, else false.
 */
bool search(int value, int values[], int n)
{
    // TODO: implement a searching algorithm
    return false;
}
/**
 * Sorts array of n values.
 */
void sort(int values[], int n)
{
    // TODO: implement an O(n^2) sorting algorithm
    int tmp = 0;
    int i = 0;
    bool swapped = false;
    bool sorted = false;
    for (int i = 0; i < n; i++)
    {
        printf("%i\n", values[i]);
    }
    while (!sorted)
    {
        // check if number on left is greater than number on right in sequential order of the array.
        if (values[i] > values[i+1])
        {
            tmp = values[i];
            values[i] = values[i+1];
            values[i+1] = tmp;
            swapped = true;
        }
        if (i >= n - 1)
        {
            if (!swapped)
            {
                // No swaps occurred, meaning I can assume the list is sorted.
                for (int i = 0; i < n; i++)
                {
                    printf("%i\n", values[i]);
                }
                sorted = true;
                break;
            } else {
                // A swap occurred on this pass through the array. Set the flag back to false for the next pass, repeating until no swaps are detected (meaning every number is in its proper place).
                i = 0;
                swapped = false;
            }
        } else {
            i++;
        }
    }
}
The problem is that you do the comparison and swap before you do the test if (i >= n - 1). This means that it will compare values[i] > values[i+1] when i == n-1, so it will access outside the array bounds, which is undefined behavior. In your case, there happens to be 0 in the memory after the array, so this is getting swapped into the array, and then it gets sorted to the beginning of the array.
Change
if (values[i] > values[i+1])
to
if (i < n-1 && values[i] > values[i+1])
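The highest entries you can swap in an array 0..n-1 are n-2 and n-1. So i may not be larger than n-2, so that i+1 accesses at most n-1.
Therefore the comparison values[i] > values[i+1] may only run while i <= n - 2, which means the bounds check
if (i > n - 2)
(equivalent to your i >= n - 1) has to happen before the array is accessed, not after.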
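For comparison, a sketch of the same bubble sort with the bounds test built into the loop condition. It keeps the sort(int values[], int n) signature from helpers.c but drops the debug printing. The inner condition j < n - 1 guarantees that values[j + 1] is always inside the array, and each pass shrinks n because the largest remaining value has bubbled to the end.

```c
#include <stdbool.h>

/* Bubble sort with an early exit when a pass makes no swaps. Only pairs
   (j, j + 1) with j + 1 < n are compared, so nothing past the end of the
   array is ever read. */
void sort(int values[], int n)
{
    bool swapped = true;
    while (swapped && n > 1) {
        swapped = false;
        for (int j = 0; j < n - 1; j++) {
            if (values[j] > values[j + 1]) {
                int tmp = values[j];
                values[j] = values[j + 1];
                values[j + 1] = tmp;
                swapped = true;
            }
        }
        n--;  /* the largest remaining value is now at index n - 1 */
    }
}
```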

Find length of smallest window that contains all the characters of a string in another string

I had an interview recently. I didn't do well, because I got stuck on the following question.
Suppose this sequence is given: A D C B D A B C D A C D
and the search sequence is: A C D
The task was to find the start and end indices of the shortest window in the given string that contains all the characters of the search string, preserving their order.
Output (assuming indices start from 1):
start index 10
end index 12
Explanation:
1. The start/end indices are not 1/3, because although that window contains the characters, their order is not maintained.
2. The start/end indices are not 1/5, because although that window contains the characters in order, its length is not optimal.
3. The start/end indices are not 6/9, because although that window contains the characters in order, its length is not optimal.
Please see How to find smallest substring which contains all characters from a given string?.
That question is different, though, since there the order does not need to be maintained. I'm still struggling to keep track of the indices. Any help would be appreciated. Thanks.
I tried to write some simple C code to solve the problem:
Update:
I wrote a search function that looks for the required characters in the correct order, returning the length of the window and storing the window's start point in int *startAt. The function processes the sub-sequence of the given hay from the specified start point int start to its end.
The rest of the algorithm is in main, where all possible subsequences are tested with a small optimisation: we start looking for the next window right after the start point of the previous one, so we skip some unnecessary iterations. During the process we keep track of the best solution so far.
Complexity is O(n*n/2).
Update2:
Unnecessary dependencies have been removed, and repeated calls to strlen(...) have been replaced by size parameters passed to search(...).
#include <stdio.h>
// search for single occurrence
int search(const char hay[], int haySize, const char needle[], int needleSize, int start, int * startAt)
{
    int i, charFound = 0;
    // search from start to end
    for (i = start; i < haySize; i++)
    {
        // found a character?
        if (hay[i] == needle[charFound])
        {
            // is it the first one?
            if (charFound == 0)
                *startAt = i; // store starting position
            charFound++; // and go to next one
        }
        // are we done?
        if (charFound == needleSize)
            return i - *startAt + 1; // success
    }
    return -1; // failure
}
int main(int argc, char **argv)
{
    char hay[] = "ADCBDABCDACD";
    char needle[] = "ACD";
    int resultStartAt = -1, resultLength = -1, i, haySize = sizeof(hay) - 1, needleSize = sizeof(needle) - 1;
    // search all possible occurrences (the last possible start is haySize - needleSize)
    for (i = 0; i <= haySize - needleSize; i++)
    {
        int startAt, length;
        length = search(hay, haySize, needle, needleSize, i, &startAt);
        // found something?
        if (length != -1)
        {
            // check if it's the first result, or one better than before
            if ((resultLength == -1) || (resultLength > length))
            {
                resultLength = length;
                resultStartAt = startAt;
            }
            // skip unnecessary steps in the next turn
            i = startAt;
        }
    }
    if (resultLength == -1)
        printf("not found\n");
    else
        printf("start at: %d, length: %d\n", resultStartAt, resultLength); // 0-based: start at: 9, length: 3
    return 0;
}
Start from the beginning of the string.
If you encounter an A, mark the position and push it on a stack. After that, keep checking the characters sequentially:
1. If you encounter an A, update the A's position to the current one.
2. If you encounter a C, push it onto the stack.
After you encounter a C, again keep checking the characters sequentially:
1. If you encounter a D, erase the stack containing A and C, and record the score (window length) from A to D for this sub-sequence.
2. If you encounter an A, start another stack and mark this position as well.
2a. If you now encounter a C, erase the earlier stacks and keep the most recent one.
2b. If you encounter a D, erase the older stack, record the score and check whether it is less than the current best score.
Keep doing this until you reach the end of the string.
The pseudocode could look something like this:
Initialize stack = empty;
Initialize bestLength = mainString.size() + 1; // a large value for the subsequence.
Initialize currentLength = 0;
for ( int i = 0; i < mainString.size(); i++ ) {
    if ( stack is empty ) {
        if ( mainString[i] == 'A' ) {
            start a new stack and push A on it.
            mark the startPosition for this stack as i.
        }
        continue;
    }
    For each of the stacks, the one of size 2 first ( there can be at most two stacks prevailing,
                                                       one of size 1 and the other of size 2 ) {
        if ( stack size == 1 ) { // only A in it
            if ( mainString[i] == 'A' ) {
                update the startPosition for this stack as i.
            }
            if ( mainString[i] == 'C' ) {
                push C on to this stack.
            }
        } else if ( stack size == 2 ) { // A & C in it
            if ( mainString[i] == 'A' and there is no stack of size 1 ) {
                start a new stack, push A on it and mark its startPosition as i. // step 2 above
            }
            if ( mainString[i] == 'C' ) {
                if there is a stack with size 1, then delete this stack; // the other one dominates this stack.
            }
            if ( mainString[i] == 'D' ) {
                mark the score from startPosition till i and update bestLength accordingly.
                delete this stack.
            }
        }
    }
}
I modified my previous suggestion to use a single queue; I believe this version runs in O(N*m) time:
FindSequence(char[] sequenceList)
{
    queue startSeqQueue;
    int i, k;
    int minSequenceLength = sequenceList.length + 1;
    int startIdx = -1, endIdx = -1;
    // queue every 'A' that could still start an "ACD" window
    for (i = 0; i < sequenceList.length - 2; i++)
    {
        if (sequenceList[i] == 'A')
        {
            startSeqQueue.enqueue(i);
        }
    }
    while (!startSeqQueue.isEmpty())
    {
        i = startSeqQueue.dequeue();
        k = i + 1;
        // move to the next 'C'; a later 'A' on the way gives a shorter window
        while (k < sequenceList.length && sequenceList[k] != 'C')
        {
            if (sequenceList[k] == 'A' && !startSeqQueue.isEmpty())
                i = startSeqQueue.dequeue();
            k++;
        }
        // then move to the next 'D'
        while (k < sequenceList.length && sequenceList[k] != 'D')
            k++;
        if (k < sequenceList.length && k - i + 1 < minSequenceLength)
        {
            startIdx = i;
            endIdx = k;
            minSequenceLength = k - i + 1;
        }
    }
    return (startIdx, endIdx);
}
My previous (O(1) memory) suggestion:
FindSequence(char[] sequenceList)
{
    int i, k;
    int minSequenceLength = sequenceList.length + 1;
    int startIdx = -1, endIdx = -1;
    for (i = 0; i < sequenceList.length - 2; i++)
    {
        if (sequenceList[i] != 'A')
            continue;
        k = i + 1;
        while (k < sequenceList.length && sequenceList[k] != 'C')
            k++;
        while (k < sequenceList.length && sequenceList[k] != 'D')
            k++;
        if (k < sequenceList.length && k - i + 1 < minSequenceLength)
        {
            startIdx = i;
            endIdx = k;
            minSequenceLength = k - i + 1;
        }
    }
    return (startIdx, endIdx);
}
Here's my version. It keeps track of possible candidates for an optimal solution. For each character in the hay, it checks whether this character is next in the sequence of each candidate. It then selects the shortest candidate. Quite straightforward.
using System.Collections.Generic;
class ShortestSequenceFinder
{
    public class Solution
    {
        public int StartIndex;
        public int Length;
    }
    private class Candidate
    {
        public int StartIndex;
        public int SearchIndex;
    }
    public Solution Execute(string hay, string needle)
    {
        var candidates = new List<Candidate>();
        var result = new Solution() { Length = hay.Length + 1 }; // hay.Length + 1 means "not found"
        for (int i = 0; i < hay.Length; i++)
        {
            char c = hay[i];
            for (int j = candidates.Count - 1; j >= 0; j--)
            {
                if (c == needle[candidates[j].SearchIndex])
                {
                    if (candidates[j].SearchIndex == needle.Length - 1)
                    {
                        int candidateLength = i - candidates[j].StartIndex + 1;
                        if (candidateLength < result.Length)
                        {
                            result.Length = candidateLength;
                            result.StartIndex = candidates[j].StartIndex;
                        }
                        candidates.RemoveAt(j);
                    }
                    else
                    {
                        candidates[j].SearchIndex += 1;
                    }
                }
            }
            if (c == needle[0])
            {
                if (needle.Length == 1)
                {
                    result.Length = 1; // a single character is the shortest possible window
                    result.StartIndex = i;
                    return result;
                }
                candidates.Add(new Candidate { SearchIndex = 1, StartIndex = i });
            }
        }
        return result;
    }
}
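It runs in O(n*m) as long as the candidate list stays short. In the worst case, for example a hay consisting only of A's searched for "AB", the list grows to O(n) candidates and the running time becomes O(n^2).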
Here is my solution in Python. It returns the indices assuming 0-based sequences, so for the given example it returns (9, 11) instead of (10, 12). It's easy to adapt it to return (10, 12) if you wish.
def solution(s, ss):
    S, E = [], []
    for i in range(len(s)):
        if s[i] == ss[0]:
            S.append(i)
        if s[i] == ss[-1]:
            E.append(i)
    # all (start, end) pairs that could hold ss, shortest window first
    candidates = sorted([(start, end) for start in S for end in E
                         if start <= end and end - start >= len(ss) - 1],
                        key=lambda c: c[1] - c[0])
    for cand in candidates:
        i, j = cand[0], 0
        # greedily match ss as a subsequence of s[start..end]
        while i <= cand[-1]:
            if s[i] == ss[j]:
                j += 1
                if j == len(ss):
                    return cand
            i += 1
    return None
Usage:
>>> from so import solution
>>> s = 'ADCBDABCDACD'
>>> solution(s, 'ACD')
(9, 11)
>>> solution(s, 'ADC')
(0, 2)
>>> solution(s, 'DCCD')
(1, 8)
>>> solution(s, s)
(0, 11)
>>> s = 'ABC'
>>> solution(s, 'B')
(1, 1)
>>> print(solution(s, 'gibberish'))
None
I think the time complexity is O(p log(p)), where p is the number of pairs of indices in the sequence that refer to search_sequence[0] and search_sequence[-1] with the index for search_sequence[0] less than the index for search_sequence[-1], because it sorts these p pairings using an O(n log n) algorithm. But then again, my substring iteration at the end could totally overshadow that sorting step. I'm not really sure.
It is not bounded by O(n*m), though: p can be as large as O(n^2) and each candidate check scans up to n characters, so, for example, searching for "ABA" in a long run of A's takes O(n^3) time.
Here is my O(m*n) algorithm in Java:
import com.google.common.collect.ArrayListMultimap;
import com.google.common.collect.Multimap;
class ShortestWindowAlgorithm {
    Multimap<Character, Integer> charToNeedleIdx; // Character -> indexes in needle, from rightmost to leftmost | Multimap is a class from Guava
    int[] prefixesIdx; // prefixesIdx[i] -- rightmost index in the hay window that contains the shortest found prefix of needle[0..i]
    int[] prefixesLengths; // prefixesLengths[i] -- shortest window containing needle[0..i]
    public int shortestWindow(String hay, String needle) {
        init(needle);
        for (int i = 0; i < hay.length(); i++) {
            for (int needleIdx : charToNeedleIdx.get(hay.charAt(i))) {
                if (firstTimeAchievedPrefix(needleIdx) || foundShorterPrefix(needleIdx, i)) {
                    prefixesIdx[needleIdx] = i;
                    prefixesLengths[needleIdx] = getPrefixNewLength(needleIdx, i);
                    forgetOldPrefixes(needleIdx);
                }
            }
        }
        return prefixesLengths[prefixesLengths.length - 1];
    }
    private void init(String needle) {
        charToNeedleIdx = ArrayListMultimap.create();
        prefixesIdx = new int[needle.length()];
        prefixesLengths = new int[needle.length()];
        for (int i = needle.length() - 1; i >= 0; i--) {
            charToNeedleIdx.put(needle.charAt(i), i);
            prefixesIdx[i] = -1;
            prefixesLengths[i] = -1;
        }
    }
    private boolean firstTimeAchievedPrefix(int needleIdx) {
        int shortestPrefixSoFar = prefixesLengths[needleIdx];
        return shortestPrefixSoFar == -1 && (needleIdx == 0 || prefixesLengths[needleIdx - 1] != -1);
    }
    private boolean foundShorterPrefix(int needleIdx, int hayIdx) {
        int shortestPrefixSoFar = prefixesLengths[needleIdx];
        int newLength = getPrefixNewLength(needleIdx, hayIdx);
        return newLength <= shortestPrefixSoFar;
    }
    private int getPrefixNewLength(int needleIdx, int hayIdx) {
        return needleIdx == 0 ? 1 : (prefixesLengths[needleIdx - 1] + (hayIdx - prefixesIdx[needleIdx - 1]));
    }
    private void forgetOldPrefixes(int needleIdx) {
        if (needleIdx > 0) {
            prefixesLengths[needleIdx - 1] = -1;
            prefixesIdx[needleIdx - 1] = -1;
        }
    }
}
It works on every input and also can handle repeated characters etc.
Here are some examples:
public class StackOverflow {
    public static void main(String[] args) {
        ShortestWindowAlgorithm algorithm = new ShortestWindowAlgorithm();
        System.out.println(algorithm.shortestWindow("AXCXXCAXCXAXCXCXAXAXCXCXDXDXDXAXCXDXAXAXCD", "AACD")); // 6
        System.out.println(algorithm.shortestWindow("ADCBDABCDACD", "ACD")); // 3
        System.out.println(algorithm.shortestWindow("ADCBDABCD", "ACD")); // 4
    }
}
I haven't read every answer here, but I don't think anyone has noticed that this is just a restricted version of local pairwise sequence alignment, in which we are only allowed to insert characters (and not delete or substitute them). As such it will be solved by a simplification of the Smith-Waterman algorithm that considers only 2 cases per vertex (arriving at the vertex either by matching a character exactly, or by inserting a character) rather than 3 cases. This algorithm is O(n^2).
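The answer above only describes the idea, so here is a hedged C sketch of a dynamic program in that spirit. It is not Smith-Waterman itself, just a compact O(n*m) recurrence restricted to "match a needle character" and "skip a hay character" moves: for every prefix needle[0..j] it remembers the latest start index from which that prefix can be matched inside hay[start..i], and whenever the whole needle completes at position i, the window [start, i] is a candidate. shortest_window is an illustrative name, and the returned bounds are 0-based.

```c
#include <stdlib.h>
#include <string.h>

/* Shortest window of hay containing needle as a subsequence (order kept,
   extra characters allowed). On success stores the 0-based inclusive
   bounds in *begin and *end and returns the window length; returns -1 if
   there is no such window or allocation fails. O(n*m) time, O(m) space. */
int shortest_window(const char *hay, const char *needle, int *begin, int *end)
{
    size_t n = strlen(hay), m = strlen(needle);
    if (m == 0)
        return -1;
    /* start[j]: latest s such that needle[0..j] is a subsequence of
       hay[s..i] for the current i, or -1 if there is none yet */
    long *start = malloc(m * sizeof *start);
    if (start == NULL)
        return -1;
    for (size_t j = 0; j < m; j++)
        start[j] = -1;

    int best = -1;
    for (size_t i = 0; i < n; i++) {
        /* walk j downwards so start[j - 1] still describes hay[0..i-1] */
        for (size_t j = m; j-- > 0;) {
            if (hay[i] != needle[j])
                continue;
            long s = (j == 0) ? (long)i : start[j - 1];
            if (s < 0)
                continue;                /* needle[0..j-1] not matched yet */
            start[j] = s;
            if (j == m - 1) {            /* whole needle ends at i */
                int len = (int)((long)i - s) + 1;
                if (best < 0 || len < best) {
                    best = len;
                    *begin = (int)s;
                    *end = (int)i;
                }
            }
        }
    }
    free(start);
    return best;
}
```

For the question's example, shortest_window("ADCBDABCDACD", "ACD", &b, &e) returns 3 with b == 9 and e == 11, i.e. positions 10 to 12 in 1-based terms.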
Here's my solution. It follows one of the pattern matching solutions. Please comment/correct me if I'm wrong.
Given the input string as in the question
A D C B D A B C D A C D. Let's first compute the indices where A occurs. Assuming a zero based index this should be [0,5,9].
Now the pseudo code is as follows.
Store the indices of A in a list, say orders. // orders = [0, 5, 9]
globalMinStart = -1, globalMinEnd = -1;
for (index : orders)
{
    int i = index + 1;      // we've already seen A at index
    boolean seenC = false;  // plays the role of the stack: has a C followed this A?
    while (i < length of input string)
    {
        if (str.charAt(i) == 'C')                 // we look for C after A
            seenC = true;
        else if (str.charAt(i) == 'D' && seenC)   // we have a match: window [index, i]
        {
            if (globalMinStart == -1 || i - index < globalMinEnd - globalMinStart)
            {
                globalMinStart = index;
                globalMinEnd = i;
            }
            break;
        }
        else if (str.charAt(i) == 'A' && !seenC)  // the next A gives a shorter window
            break;
        i++;
    }
}
return [globalMinStart, globalMinEnd]
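P.S.: this is pseudocode and a rough idea. I'd be happy to correct it and understand if there's something wrong.
As far as I can tell, the time complexity is O(n) and the space complexity O(n) for typical inputs; for a string such as ACACAC...D, however, every A scans to the final D, which makes it O(n^2) in the worst case.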

Removing Duplicates in an array in C

The question is a little complex. The problem is to get rid of duplicates and save the unique elements of an array into another array, in their original order.
For example:
If the input is b a c a d t
the result should be b a c d t, in the same order as the input.
So sorting the array and then checking for duplicates doesn't work, since I lose the original order. I was advised to use an array of indices, but I don't know how to do that. What would you advise?
For those who are willing to answer the question, I want to add some specific information.
char** finduni(char *words[100], int limit)
{
    //
    // Methods here
    //
}
is my function. The array whose duplicates should be removed and stored in a different array is words[100], so the processing is done on it. I first thought about copying all the elements of words into another array and sorting that array, but it didn't work in my tests. Just a reminder for solvers :).
Well, here is a version for char types. Note it doesn't scale.
#include <stdio.h>
#include <string.h>
void removeDuplicates(char *string)
{
    unsigned char allCharacters[256] = { 0 }; // one "seen" flag per byte value
    size_t length = strlen(string);           // computed once, not on every iteration
    size_t lookAt;
    size_t writeTo = 0;
    for (lookAt = 0; lookAt < length; lookAt++)
    {
        unsigned char c = (unsigned char)string[lookAt];
        if (allCharacters[c] == 0)
        {
            allCharacters[c] = 1; // mark it seen
            string[writeTo++] = string[lookAt]; // copy it
        }
    }
    string[writeTo] = '\0';
}
int main(void)
{
    char word[] = "abbbcdefbbbghasdddaiouasdf";
    removeDuplicates(word);
    printf("Word is now [%s]\n", word);
    return 0;
}
The following is the output:
Word is now [abcdefghsiou]
Is that something like what you want? You can modify the method if there are spaces between the letters, but if you use int, float, double or char * as the types, this method won't scale at all.
EDIT
I posted and then saw your clarification, where it's an array of char *. I'll update the method.
I hope this isn't too much code. I adapted this QuickSort algorithm and basically added index memory to it. The algorithm is O(n log n) on average, since the 3 steps below are additive and that is the average complexity of 2 of them (quicksort's worst case is O(n^2)).
1) Sort the array of strings, but reflect every swap in the index array as well. After this stage, the i'th element of originalIndices holds the original index of the i'th element of the sorted array.
2) Remove duplicate elements in the sorted array by setting them to NULL, and set their index value to elements (the element count), which is larger than any real index.
3) Sort the array of original indices, and make sure every swap is reflected in the array of strings. This gives us back the original array of strings, except that the duplicates are at the end and they are all NULL.
For good measure, I return the new count of elements.
Code:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
void sortArrayAndSetCriteria(char **arr, int elements, int *originalIndices)
{
#define MAX_LEVELS 1000
char *piv;
int beg[MAX_LEVELS], end[MAX_LEVELS], i=0, L, R;
int idx, cidx;
for(idx = 0; idx < elements; idx++)
originalIndices[idx] = idx;
beg[0] = 0;
end[0] = elements;
while (i>=0)
{
L = beg[i];
R = end[i] - 1;
if (L<R)
{
piv = arr[L];
cidx = originalIndices[L];
if (i==MAX_LEVELS-1)
return;
while (L < R)
{
while (strcmp(arr[R], piv) >= 0 && L < R) R--;
if (L < R)
{
arr[L] = arr[R];
originalIndices[L++] = originalIndices[R];
}
while (strcmp(arr[L], piv) <= 0 && L < R) L++;
if (L < R)
{
arr[R] = arr[L];
originalIndices[R--] = originalIndices[L];
}
}
arr[L] = piv;
originalIndices[L] = cidx;
beg[i + 1] = L + 1;
end[i + 1] = end[i];
end[i++] = L;
}
else
{
i--;
}
}
}
int removeDuplicatesFromBoth(char **arr, int elements, int *originalIndices)
{
// now remove duplicates
int i = 1, newLimit = 1;
char *curr = arr[0];
while (i < elements)
{
if(strcmp(curr, arr[i]) == 0)
{
arr[i] = NULL; // free this if it was malloc'd
originalIndices[i] = elements; // place it at the end
}
else
{
curr = arr[i];
newLimit++;
}
i++;
}
return newLimit;
}
void sortArrayBasedOnCriteria(char **arr, int elements, int *originalIndices)
{
#define MAX_LEVELS 1000
int piv;
int beg[MAX_LEVELS], end[MAX_LEVELS], i=0, L, R;
int idx;
char *cidx;
beg[0] = 0;
end[0] = elements;
while (i>=0)
{
L = beg[i];
R = end[i] - 1;
if (L<R)
{
piv = originalIndices[L];
cidx = arr[L];
if (i==MAX_LEVELS-1)
return;
while (L < R)
{
while (originalIndices[R] >= piv && L < R) R--;
if (L < R)
{
arr[L] = arr[R];
originalIndices[L++] = originalIndices[R];
}
while (originalIndices[L] <= piv && L < R) L++;
if (L < R)
{
arr[R] = arr[L];
originalIndices[R--] = originalIndices[L];
}
}
arr[L] = cidx;
originalIndices[L] = piv;
beg[i + 1] = L + 1;
end[i + 1] = end[i];
end[i++] = L;
}
else
{
i--;
}
}
}
int removeDuplicateStrings(char *words[], int limit)
{
int *indices = (int *)malloc(limit * sizeof(int));
int newLimit;
sortArrayAndSetCriteria(words, limit, indices);
newLimit = removeDuplicatesFromBoth(words, limit, indices);
sortArrayBasedOnCriteria(words, limit, indices);
free(indices);
return newLimit;
}
int main()
{
char *words[] = { "abc", "def", "bad", "hello", "captain", "def", "abc", "goodbye" };
int newLimit = removeDuplicateStrings(words, 8);
int i = 0;
for(i = 0; i < newLimit; i++) printf(" Word # %d = %s\n", i, words[i]);
return 0;
}
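Traverse the items in the array: an O(n) operation.
For each item, add it to another sorted array (and to the output array, which keeps the original order).
Before adding it to the sorted array, check with binary search whether the entry already exists: an O(log n) operation.
In total, that is O(n log n) comparisons (inserting into a sorted array also shifts elements, which can cost O(n) per insertion).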
I think that in C you can create a second array, then copy each element from the original array only if it is not already in the second array.
This also preserves the order of the elements.
If you read the elements one by one, you can discard a duplicate before inserting it into the original array, which could speed up the process.
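A sketch of this second-array approach in C for the question's array of strings. It assumes the caller provides an output array with room for limit pointers and that sharing the original string pointers (rather than duplicating the text) is acceptable; copy_unique is a hypothetical name, not the question's finduni.

```c
#include <string.h>

/* Copy each string of words[0..limit) into out[] unless an equal string
   was already copied, so first occurrences keep their original order.
   Returns the number of unique strings; out must hold at least limit
   pointers. O(limit^2) string comparisons in the worst case. */
int copy_unique(char *const words[], int limit, char *out[])
{
    int count = 0;
    for (int i = 0; i < limit; i++) {
        int seen = 0;
        for (int k = 0; k < count; k++) {
            if (strcmp(words[i], out[k]) == 0) {
                seen = 1;
                break;
            }
        }
        if (!seen)
            out[count++] = words[i];  /* copies the pointer, not the text */
    }
    return count;
}
```

For the question's example b a c a d t, out receives b a c d t and the function returns 5.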
As Thomas suggested in a comment, if each element of the array is guaranteed to be from a limited set of values (such as a char) you can achieve this in O(n) time.
Keep an array of 256 bool (or int if your compiler doesn't support bool) or however many different discrete values could possibly be in the array. Initialize all the values to false.
Scan the input array one-by-one.
For each element, if the corresponding value in the bool array is false, add it to the output array and set the bool array value to true. Otherwise, do nothing.
You know how to do it for char type, right?
You can do same thing with strings, but instead of using array of bools (which is technically an implementation of "set" object), you'll have to simulate the "set"(or array of bools) with a linear array of strings you already encountered. I.e. you have an array of strings you already saw, for each new string you check if it is in array of "seen" strings, if it is, then you ignore it (not unique), if it is not in array, you add it to both array of seen strings and output. If you have a small number of different strings (below 1000), you could ignore performance optimizations, and simply compare each new string with everything you already saw before.
With a large number of strings (a few thousand), however, you'll need to optimize things a bit:
1) Every time you add a new string to the array of strings you already saw, sort the array with the insertion sort algorithm. Don't use quicksort, because insertion sort tends to be faster when the data is almost sorted.
2) When checking whether a string is in the array, use binary search.
If the number of different strings is reasonable (i.e. you don't have billions of unique strings), this approach should be fast enough.
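A sketch of this optimisation in C, assuming strings compared with strcmp and a caller-provided output array with room for limit pointers. Instead of re-running insertion sort over the whole "seen" array, it performs the equivalent single insertion step: binary search finds the slot and memmove shifts the tail. unique_in_order and find_slot are illustrative names.

```c
#include <stdlib.h>
#include <string.h>

/* Binary search in the sorted array seen[0..n): returns the index where s
   is, or would be inserted, and sets *found accordingly. */
static int find_slot(char *const seen[], int n, const char *s, int *found)
{
    int lo = 0, hi = n;
    while (lo < hi) {
        int mid = lo + (hi - lo) / 2;
        if (strcmp(seen[mid], s) < 0)
            lo = mid + 1;
        else
            hi = mid;
    }
    *found = lo < n && strcmp(seen[lo], s) == 0;
    return lo;
}

/* Order-preserving de-duplication: out receives first occurrences in input
   order, while a private sorted array of the strings seen so far answers
   "is this a repeat?" in O(log n) comparisons. Returns the number of
   unique strings, or -1 if allocation fails. */
int unique_in_order(char *const words[], int limit, char *out[])
{
    char **seen = malloc((size_t)(limit > 0 ? limit : 1) * sizeof *seen);
    if (seen == NULL)
        return -1;
    int count = 0;
    for (int i = 0; i < limit; i++) {
        int found;
        int pos = find_slot(seen, count, words[i], &found);
        if (found)
            continue;                 /* duplicate: skip it */
        /* one insertion-sort step: shift the tail right, drop the new one in */
        memmove(seen + pos + 1, seen + pos, (size_t)(count - pos) * sizeof *seen);
        seen[pos] = words[i];
        out[count++] = words[i];
    }
    free(seen);
    return count;
}
```

The memmove still costs O(n) per insertion in the worst case, but it only moves pointers, which is usually cheap compared with the string comparisons it saves.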
