Search Algorithm with Incomplete Input

I need an algorithm which will search an array for a string, but the string may not be exactly the same as one of the items in the array.
For example,
Array = {"Stack", "Over", "Flow", "Stake"}
input = "Sta"
It will need to recognize that Stack and Stake both match the parameters and then choose the one which is first in alphabetical order.
How can I do this?

I would keep the strings in a sorted List and run binarySearch on that list.
List<String> arr = new ArrayList<>();
While adding elements, insert each one at its sorted position:
int x = Collections.binarySearch(arr, key);
if (x < 0)
    arr.add(-x - 1, key);
// For n elements, the binary searches take O(n log n) time in total.
Then binary-search the list for the input. If the result of binarySearch is >= 0, the key exists in your list; otherwise (-x-1) is the position where the key would be inserted. Starting from that position, go through each element that begins with the input string.
For example, arr is your sorted list and you are searching for input.
arr = {"Flow", "Over", "Stack", "Stake"}
input = "Sta";
int x = Collections.binarySearch(arr, input);
if (x < 0)
    x = -x - 1;
if (x < arr.size() && arr.get(x).startsWith(input))
    System.out.println(arr.get(x));
else
    System.out.println("there is no element starting with input string");
The lookup takes O(log n) time, where n is the length of the list.
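The same lookup can be sketched in Python with the standard bisect module. This is only an illustration of the idea above, not a translation of the Java code; the function name first_with_prefix is made up for this example, and the list is assumed to be sorted already.

```python
from bisect import bisect_left

def first_with_prefix(sorted_words, prefix):
    """Return the alphabetically first word starting with prefix, or None.

    Assumes sorted_words is already sorted in ascending order.
    """
    # bisect_left gives the position where prefix would be inserted,
    # which is also the position of the first word >= prefix.
    i = bisect_left(sorted_words, prefix)
    if i < len(sorted_words) and sorted_words[i].startswith(prefix):
        return sorted_words[i]
    return None

words = sorted(["Stack", "Over", "Flow", "Stake"])
print(first_with_prefix(words, "Sta"))  # Stack
```

Because every word that starts with the prefix sorts at or after the prefix itself, checking the single element at the insertion point is enough.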

Loop over the sorted array, compute the Levenshtein distance between each string and your target string, and if it is sufficiently small, return.
What constitutes "sufficiently small" is up to you. You'll probably have to do some testing.
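A minimal sketch of that loop in Python, assuming the classic dynamic-programming edit distance. The names levenshtein and closest_match and the max_distance threshold are choices made for this example; the right threshold still has to be found by testing.

```python
def levenshtein(a, b):
    """Edit distance between strings a and b (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        cur = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            cur.append(min(prev[j] + 1,          # delete ca
                           cur[j - 1] + 1,       # insert cb
                           prev[j - 1] + cost))  # substitute or keep
        prev = cur
    return prev[-1]

def closest_match(sorted_words, target, max_distance):
    """Return the first word in sorted order within max_distance, or None."""
    for word in sorted_words:
        if levenshtein(word, target) <= max_distance:
            return word
    return None

print(closest_match(["Flow", "Over", "Stack", "Stake"], "Sta", 2))  # Stack
```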

Simply loop through each element in the array and compare it to the input, determining if the input is contained in the element. Remove any element that does not meet this prerequisite. Finally go through the remaining elements and pick the one that is first alphabetically.
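In Python this could look roughly as follows. It is a sketch that treats "contained in" as a plain substring test; first_containing is a name chosen for this example.

```python
def first_containing(words, fragment):
    """Return the alphabetically first word that contains fragment, or None."""
    # Keep only the words that contain the input somewhere.
    matches = [w for w in words if fragment in w]
    # min() on strings picks the alphabetically first one.
    return min(matches) if matches else None

print(first_containing(["Stack", "Over", "Flow", "Stake"], "Sta"))  # Stack
```

This is O(n) string comparisons and needs no sorting or preprocessing, which is fine for small arrays.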

Loop through all the index values of the array and look for a substring match of the input. Collect all the matches and print the one that comes first alphabetically.
For example, you will find a substring match for Array[0] and Array[3]. Now you have two matches, at 0 and 3. Compare the next letter after the matched substring: at Array[0] the letter after "Sta" is 'c', but at Array[3] it is 'k'. Since 'c' < 'k', the output is Array[0].

You may find Trie data structure useful. It is very efficient to find all words you need.
But memory overhead can be significant if you have many words in the list.
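A small trie sketch in Python, using one dict per node, to show how the prefix lookup would work. TrieNode, build_trie and first_completion are names invented for this example, and the one-dict-per-node layout is exactly where the memory overhead comes from.

```python
class TrieNode:
    def __init__(self):
        self.children = {}    # maps a character to the child TrieNode
        self.is_word = False  # True if a stored word ends at this node

def build_trie(words):
    root = TrieNode()
    for word in words:
        node = root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True
    return root

def first_completion(root, prefix):
    """Return the alphabetically first stored word starting with prefix."""
    node = root
    for ch in prefix:
        node = node.children.get(ch)
        if node is None:
            return None  # no stored word has this prefix
    result = prefix
    # A word that ends here sorts before any longer word, so stop at the
    # first word end; otherwise follow the smallest child character.
    while not node.is_word:
        if not node.children:
            return None  # only reachable when the trie is empty
        ch = min(node.children)
        result += ch
        node = node.children[ch]
    return result

root = build_trie(["Stack", "Over", "Flow", "Stake"])
print(first_completion(root, "Sta"))  # Stack
```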

Related

How to replace a specific character in an array with two characters

So I just came back from a job interview, and one of the questions I faced was:
"Given an array of characters and three characters for example :
Array : [a,b,c,z,s,w,y,z,o]
Char 1: 'z'
Char 2 : 'R'
Char 3 : 'R'
Your goal is to replace each 'z' in the array to become two R characters within O(N) time complexity.
so your input will be Array : [a,b,c,z,s,w,y,z,o]
and your output array will be : [a,b,c,R,R,s,w,y,R,R,o]
assume that there is no 'R' in the array before.
You are not allowed to use other arrays or other variables.
The algorithm should be an in-place algorithm.
Your final array must be a character array."
My solution had O(N^2) time complexity, but there is a solution with O(N) time complexity.
The interview is over but I am still thinking about this problem. Can anyone help me solve it?

First scan the input to count how many occurrences of char 1 exist. This has a linear time complexity.
From that you know that the length of the final array will be the input length + the number of occurrences.
Then extend the array to its new length, leaving the new slots empty (or whatever value). The exact nature of the operation depends on how the array data structure is implemented. This can surely be done with at worst a linear time complexity.
Use two indexes, i and j, where i references the last character of the input array and j references the very last index in the array (potentially to an empty slot).
Start copying from i to j, decreasing both indices by one each time. If the character you copied is the matching letter, write it to j a second time and decrease only j. This again has linear time complexity.
The algorithm will end with both i and j equal to -1.

Do two iterations.
First, count the number of char1s ('z' in your example).
Now you know how long your array should be at the end: array.size() + num_char1s
Then, go from last to first with input and output iterators. If the element is char1, insert to the end iterator the new chars, otherwise - just copy.
Pseudo code:
num_char1s = 0
for x in array:
    if x == char1:
        num_char1s++
// Assuming array has sufficient memory already allocated.
out_iterator = num_char1s + size - 1
in_iterator = size - 1
while (in_iterator >= 0):
    if (array[in_iterator] == char1):
        array[out_iterator--] = char3
        array[out_iterator--] = char2
    else:
        array[out_iterator--] = array[in_iterator]
    in_iterator--
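The pseudo code above can be turned into runnable Python along these lines. It is a sketch: expand_in_place is a name chosen here, and growing the list with extend stands in for "the array has sufficient memory already allocated".

```python
def expand_in_place(chars, char1, char2, char3):
    """Replace every char1 in the list chars with char2, char3, in place."""
    size = len(chars)
    extra = chars.count(char1)    # first pass: one extra slot per match
    chars.extend([None] * extra)  # grow the same list, no second array
    out = size + extra - 1        # write position, starts at the new end
    # Second pass: copy from right to left so unread input is never overwritten.
    for i in range(size - 1, -1, -1):
        if chars[i] == char1:
            chars[out] = char3
            chars[out - 1] = char2
            out -= 2
        else:
            chars[out] = chars[i]
            out -= 1
    return chars

print("".join(expand_in_place(list("abczswyzo"), "z", "R", "R")))  # abcRRswyRRo
```

Both passes are linear, so the whole thing is O(N).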

In your question, two things are very important.
can't use new variable
can't use new array
So we must use the given array.
First, double the size of the given array. Why? Because the new array is at most given_array_size*2 long (if every character equals char 1).
Now shift the contents of the given array n positions to the right, where n = given_array_size.
Then iterate over the shifted data, from i = n to 2*n-1.
Take j = 0 as the write position. If array[i] is char 1, set
array[j++] = char 2 and array[j++] = char 3.
If the character is not char 1, simply copy it: array[j++] = array[i].
At the end, positions 0 to j-1 hold the answer.
Complexity: O(n)
No new variable and array needed

Given an array, find out the last smaller element for each element

Given an array find the last smaller element's index in array for each element.
For example, suppose the given array is {4,2,1,5,3}. Then last smaller element for each element will be as follows.
4->3
2->1
1->Null
5->3
3->Null
Notice for 1st pair 4->3, 3 is the last element in array smaller than 4.
The resultant/output array would have indexes not the elements themselves. Result would be {4,2,-1,4,-1}
I was asked this question in an interview, but I couldn't think of a solution better than the trivial O(n^2) one.
Any help would be highly appreciated.

We need to compute max(index) over all elements with smaller values.
Let's sort pairs (element, index) in lexicographical order and iterate over them keeping track of the largest index seen so far. That's exactly the position of the rightmost smaller element. Here's how one could implement it:
def get_right_smaller(xs):
    res = [-1] * len(xs)
    right_index = -1
    for val, idx in sorted((val, idx) for idx, val in enumerate(xs)):
        res[idx] = right_index if right_index > idx else -1
        right_index = max(right_index, idx)
    return res
This solution works properly even if the input array contains equal numbers, because when two elements have the same value the one with the smaller index is processed first.
The time complexity of this solution is O(N log N + N) = O(N log N) (it does a sort and one linear pass).
If all elements of the array are O(N), you can make this solution linear by using counting sort.

Make a list and add the last element's index to it.
Walk through the array from right to left.
For every element:
    if the value at the list tail is smaller than the current element
        find the first list element with a smaller value (binary search works, because the list values are sorted in decreasing order)
    otherwise
        add the element's index to the list tail and output -1
For the example {4,2,1,5,3,6,2}, the list will contain index 6 (value 2) and index 2 (value 1).
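One way to read the steps above as Python. This is a sketch rather than the answerer's own code: last_smaller_indices is a made-up name, and an index is only appended when its value is strictly smaller than the tail's, so the list values stay strictly decreasing.

```python
def last_smaller_indices(xs):
    """For each element, index of the rightmost smaller element, or -1."""
    res = [-1] * len(xs)
    candidates = []  # indices; their values strictly decrease along the list
    for i in range(len(xs) - 1, -1, -1):
        if candidates and xs[candidates[-1]] < xs[i]:
            # Binary search for the first candidate whose value is < xs[i].
            # Earlier candidates have larger indices, so that is the rightmost.
            lo, hi = 0, len(candidates) - 1
            while lo < hi:
                mid = (lo + hi) // 2
                if xs[candidates[mid]] < xs[i]:
                    hi = mid
                else:
                    lo = mid + 1
            res[i] = candidates[lo]
        elif not candidates or xs[i] < xs[candidates[-1]]:
            # Nothing smaller to the right: xs[i] is a new minimum.
            candidates.append(i)
    return res

print(last_smaller_indices([4, 2, 1, 5, 3]))  # [4, 2, -1, 4, -1]
```

Each element costs at most one binary search, so the total is O(n log n).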

Find longest suffix of string in given array

Given a string and an array of strings, find the longest string in the array that is a suffix of the given string.
for example
string = google.com.tr
array = tr, nic.tr, gov.nic.tr, org.tr, com.tr
returns com.tr
I have tried to use binary search with specific comparator, but failed.
C-code would be welcome.
Edit:
I should have said that I'm looking for a solution where I can do as much work as possible in a preparation step (when I only have the array of suffixes and can sort it any way I like, build any data structure around it, etc.), and then, for a given string, find its suffix in this array as fast as possible. I also know that I can build a trie out of this array, and that this will probably give me the best possible performance, BUT I'm very lazy, and maintaining a trie in raw C in a huge piece of tangled enterprise code is no fun at all. So some binsearch-like approach would be very welcome.

Assuming constant-time addressing of characters within strings, this problem is isomorphic to finding the largest prefix (reverse every string first).
1. Let i = 0.
2. Let S = null.
3. Let c = prefix[i].
4. Remove every string a from A for which a[i] != c; if A becomes empty, stop and return S. Replace S with a if a.Length == i + 1.
5. Increment i.
6. Go to step 3.
Is that what you're looking for?
Example:
prefix = rt.moc.elgoog
array = rt.moc, rt.gro, rt.cin.vog, rt.cin, rt
Pass 0: prefix[0] is 'r' and array[j][0] == 'r' for all j so nothing is removed from the array. i + 1 -> 0 + 1 -> 1 is our target length, but none of the strings have a length of 1, so S remains null.
Pass 1: prefix[1] is 't' and array[j][1] == 't' for all j so nothing is removed from the array. However there is a string that has length 2, so S becomes rt.
Pass 2: prefix[2] is '.' and array[j][2] == '.' for the remaining strings so nothing changes.
Pass 3: prefix[3] is 'm' and array[j][3] != 'm' for rt.gro, rt.cin.vog, and rt.cin so those strings are removed.
etc.

Another naïve, pseudo-answer.
Set boolean "found" to false. While "found" is false, iterate over the array comparing the source string to the strings in the array. If there's a match, set "found" to true and break. If there's no match, use something like strchr() to get to the segment of the string following the first period. Iterate over the array again. Continue until there's a match, or until the last segment of the source string has been compared to all the strings in the array and failed to match.
Not very efficient....
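For illustration, here is the same "strip the leading segment and retry" idea in Python. Note the swap: instead of iterating over the array on every round, this sketch puts the suffixes in a set during the preparation step, so each check is a hash lookup. longest_suffix is a name chosen for this example.

```python
def longest_suffix(host, suffixes):
    """Return the longest entry of suffixes that is a dot-aligned suffix of host."""
    known = set(suffixes)  # preparation step: build once, reuse per query
    candidate = host
    while True:
        # Candidates are tried from longest to shortest, so the first hit wins.
        if candidate in known:
            return candidate
        dot = candidate.find(".")
        if dot == -1:
            return None  # last segment checked, nothing matched
        candidate = candidate[dot + 1:]  # drop the segment before the first period

suffixes = ["tr", "nic.tr", "gov.nic.tr", "org.tr", "com.tr"]
print(longest_suffix("google.com.tr", suffixes))  # com.tr
```

The number of lookups is bounded by the number of dot-separated segments in the input string, not by the size of the suffix array.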

Naive, pseudo-answer:
Sort array of suffixes by length (yes, there may be strings of same length, which is a problem with the question you are asking I think)
Iterate over array and see if suffix is in given string
If it is, exit the loop because you are done! If not, continue.
Alternatively, you could skip the sorting and just iterate, assigning the biggestString if the currentString is bigger than the biggestString that has matched.
Edit 0:
Maybe you could improve this by looking at your array before hand and considering "minimal" elements that need to be checked.
For instance, if .com appears in 20 members you could just check .com against the given string to potentially eliminate 20 candidates.
Edit 1:
On second thought, in order to compare elements in the array you will need to use a string comparison. My feeling is that any gain you get out of an attempt at optimizing the list of strings for comparison might be negated by the expense of comparing them before doing so, if that makes sense. Would appreciate if a CS type could correct me here...

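If your array of strings is something along the following:
char string[STRINGS][MAX_STRING_LENGTH];
strcpy(string[0], "google.com.tr");
strcpy(string[1], "nic.tr");
etc, then you can simply do this (needs <string.h>):
int x, longest = 0;
/* Find the index of the longest string. */
for (x = 1; x < STRINGS; x++) {
    if (strlen(string[x]) > strlen(string[longest])) {
        longest = x;
    }
}
/* Advance x to the first '.' in that string (or to its end). */
x = 0;
while (string[longest][x] != '\0' && string[longest][x] != '.') {
    x++;
}
/* Copy everything after the '.' into output. */
char output[MAX_STRING_LENGTH];
int y = 0;
if (string[longest][x] == '.') {
    x++;
}
while (string[longest][x] != '\0') {
    output[y++] = string[longest][x++];
}
output[y] = '\0';
(The above code may not actually work (errors, etc.), but you should get the general idea.)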

Why don't you use suffix arrays ? It works when you have large number of suffixes.
Complexity, O(n(logn)^2), there are O(nlogn) versions too.
Implementation in c here. You can also try googling suffix arrays.

how to calculate the mode of an unsorted array of integers in O(N)?

...using an iterative procedure (no hash table)?
It's not homework. And by mode I mean the most frequent number (statistical mode). I don't want to use a hash table because I want to know how it can be done iteratively.

OK Fantius, how bout this?
Sort the list with a RadixSort (BucketSort) algorithm (technically O(N) time; the numbers must be integers). Start at the first element, remember its value and start a count at 1. Iterate through the list, incrementing the count, until you reach a different value. If the count for that value is higher than the current high count, remember that value and count as the mode. If you get a tie with the high count, remember both (or all) numbers.
... yeah, yeah, the RadixSort is not an in-place sort, and thus involves something you could call a hashtable (a collection of collections indexed by the current digit). However, the hashtable is used to sort, not to calculate the mode.
I'm going to say that on an unsorted list, it would be impossible to compute the mode in linear time without involving a hashtable SOMEWHERE. On a sorted list, the second half of this algorithm works by just keeping track of the current max count.

Definitely sounds like homework. But, try this: go through the list once, and find the largest number. Create an array of integers with that many elements, all initialized to zero. Then, go through the list again, and for each number, increment the equivalent index of the array by 1. Finally, scan your array and return the index that has the highest value. This will execute in roughly linear time, whereas any algorithm that includes a sort will probably take NlogN time or worse. However, this solution is a memory hog; it'll basically create a bell plot just to give you one number from it.
Remember that many (but not all) languages use arrays that are zero-based, so when converting from a "natural" number to an index, subtract one, and then add one to go from index to natural number.

If you don't want to use a hash, use a modified binary search tree (with a counter per node). Insert each element of the array into the tree. If it already exists in the tree, increment its counter. At the end, find the node with the highest counter.
Of course you can also use a hashmap that maps to a counter variable and will work the same way. I don't understand your complaint about it not being iterative... You iterate through the array, and then you iterate through the members of the hashmap to find the highest counter.

Just use counting sort and look into the array that stores the number of occurrences for each entity.

I prepared two implementations in Python with different space and time complexity:
The first one uses an "occurrence array". It takes O(n + k) time and needs an array of k+1 counters, where k is the greatest number in the input and n is the input size.
data = [1, 2, 3, 8, 4, 6, 1, 3, 7, 9, 6, 1, 9]
def find_max(tab):
    largest = tab[0]
    for i in range(0, len(tab)):
        if tab[i] > largest:
            largest = tab[i]
    return largest
# One counter per possible value 0..k.
C = [0] * (find_max(data) + 1)
print(len(C))
def count_occurences(tab):
    max_occurence = C[0]
    max_occurence_index = 0
    for i in range(0, len(tab)):
        C[tab[i]] = C[tab[i]] + 1
        # Track the most frequent value seen so far.
        if C[tab[i]] > max_occurence:
            max_occurence = C[tab[i]]
            max_occurence_index = tab[i]
    return max_occurence_index
print(count_occurences(data))
NOTE: Consider an unfortunate input such as [1, 10^8, 1, 1, 1]: it needs an array of length k+1 = 100000001.
The second solution sorts the input before searching for the mode. I used radix sort, which has time complexity O(kn), where k is the number of digits in the longest number and n is the size of the input array. Then we iterate over the whole sorted array of size n to find the longest run of equal numbers, which is the mode.
data = [1, 2, 3, 8, 4, 6, 1, 3, 7, 9, 6, 1, 9]
def radix_sort(A):
    # LSD radix sort for non-negative integers, one decimal digit per pass.
    len_A = len(A)
    mod = 10  # together with div, selects the current decimal digit
    div = 1
    while True:
        the_buckets = [[] for _ in range(10)]
        for value in A:
            ldigit = value % mod
            ldigit = ldigit // div
            the_buckets[ldigit].append(value)
        mod = mod * 10
        div = div * 10
        # Every value landed in bucket 0: no digits left, so A is sorted.
        if len(the_buckets[0]) == len_A:
            return the_buckets[0]
        A = []
        for b in the_buckets:
            for i in b:
                A.append(i)
def find_mode_in_sorted(A):
    mode = A[0]
    number_of_occurences = 1
    number_of_occurences_candidate = 1  # length of the current run
    for i in range(1, len(A)):
        if A[i] == A[i-1]:
            number_of_occurences_candidate += 1
        else:
            number_of_occurences_candidate = 1
        if number_of_occurences_candidate > number_of_occurences:
            mode = A[i]
            number_of_occurences = number_of_occurences_candidate
    return mode  # , number_of_occurences
s_input = radix_sort(data)
print(find_mode_in_sorted(s_input))

Using JavaScript:
const mode = (arr) => {
  let numMapping = {};
  let mode;
  let greatestFreq = 0;
  for (let i = 0; i < arr.length; i++) {
    if (numMapping[arr[i]] === undefined) {
      numMapping[arr[i]] = 0;
    }
    numMapping[arr[i]] += 1;
    if (numMapping[arr[i]] > greatestFreq) {
      greatestFreq = numMapping[arr[i]];
      mode = arr[i];
    }
  }
  return parseInt(mode);
};

How do I remove duplicate strings from an array in C?

I have an array of strings in C and an integer indicating how many strings are in the array.
char *strarray[MAX];
int strcount;
In this array, the highest index (where 10 is higher than 0) is the most recent item added and the lowest index is the most distant item added. The order of items within the array matters.
I need a quick way to check the array for duplicates, remove all but the highest index duplicate, and collapse the array.
For example:
strarray[0] = "Line 1";
strarray[1] = "Line 2";
strarray[2] = "Line 3";
strarray[3] = "Line 2";
strarray[4] = "Line 4";
would become:
strarray[0] = "Line 1";
strarray[1] = "Line 3";
strarray[2] = "Line 2";
strarray[3] = "Line 4";
Index 1 of the original array was removed and indexes 2, 3, and 4 slid downwards to fill the gap.
I have one idea of how to do it. It is untested and I am currently attempting to code it but just from my faint understanding, I am sure this is a horrendous algorithm.
The algorithm presented below would be run every time a new string is added to the strarray.
For the interest of showing that I am trying, I will include my proposed algorithm below:
1. Search entire strarray for match to str
2. If no match, do nothing
3. If match found, put str in strarray
4. Now we have a strarray with a max of 1 duplicate entry
5. Add highest index strarray string to lowest index of temporary string array
6. Continue downwards into strarray and check each element
7. If duplicate found, skip it
8. If not, add it to the next highest index of the temporary string array
9. Reverse temporary string array and copy to strarray
Once again, this is untested (I am currently implementing it now). I just hope someone out there will have a much better solution.
The order of items is important and the code must utilize the C language (not C++). The lowest index duplicates should be removed and the single highest index kept.
Thank you!

The typical efficient unique function is to:
Sort the given array.
Collapse each consecutive run of the same item so that only one remains.
I believe you can use qsort in combination with strcmp to accomplish the first part; writing an efficient remove would be all on you though.
Unfortunately I don't have specific ideas here; this is kind of a grey area for me because I'm usually using C++, where this would be a simple:
std::vector<std::string> src;
std::sort(src.begin(), src.end());
src.erase(std::unique(src.begin(), src.end()), src.end());
I know you can't use C++, but the implementation should essentially be the same.
Because you need to save the original order, you can have something like:
typedef struct
{
int originalPosition;
char * string;
} tempUniqueEntry;
Do your first sort with respect to string, remove the duplicate elements from the sorted set, then re-sort with respect to originalPosition. This way you still get O(n lg n) performance, yet you don't lose the original order.
EDIT2:
Simple C implementation example of std::unique:
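tempUniqueEntry* unique ( tempUniqueEntry * first, tempUniqueEntry * last )
{
    if (first == last)
        return last; /* empty range: nothing to do */
    tempUniqueEntry *result = first;
    while (++first != last)
    {
        /* strcmp returns non-zero when the two strings differ */
        if (strcmp(result->string, first->string))
            *(++result) = *first;
    }
    return ++result;
}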

I don't quite understand your proposed algorithm (I don't understand what it means to add a string to an index in step 5), but what I would do is:
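/* n is the number of strings in strarray (strcount in the question). */
unsigned int i;
for (i = n; i > 0; i--)
{
    unsigned int j;
    if (strarray[i - 1] == NULL)
    {
        continue; /* already flagged as a duplicate */
    }
    /* Flag every earlier copy of strarray[i - 1]. */
    for (j = i - 1; j > 0; j--)
    {
        if (strarray[j - 1] != NULL && strcmp(strarray[i - 1], strarray[j - 1]) == 0)
        {
            strarray[j - 1] = NULL;
        }
    }
}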
Then you just need to filter the null pointers out of your array (which I'll leave as an exercise).
A different approach would be to iterate backwards over the array and to insert each item into a (balanced) binary search tree as you go. If the item is already in the binary search tree, flag the array item (such as setting the array element to NULL) and move on. When you've processed the entire array, filter out the flagged elements as before. This would have slightly more overhead and would consume more space, but its running time would be O(n log n) instead of O(n^2).
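The backwards pass can be sketched in a few lines of Python to make the idea concrete. This is not C and not the answer's own code: it swaps the balanced binary search tree for a hash set (a different technique with the same role of "have I seen this string already?"), and dedupe_keep_last is a name chosen for this example.

```python
def dedupe_keep_last(items):
    """Remove duplicates, keeping only the highest-index copy of each string."""
    seen = set()
    kept = []
    # Walk from the most recent item to the oldest; the first time a string
    # shows up in this direction is its highest-index occurrence.
    for s in reversed(items):
        if s not in seen:
            seen.add(s)
            kept.append(s)
    kept.reverse()  # restore the original relative order
    return kept

print(dedupe_keep_last(["Line 1", "Line 2", "Line 3", "Line 2", "Line 4"]))
# ['Line 1', 'Line 3', 'Line 2', 'Line 4']
```

In C the same structure applies: flag or skip items already present in the lookup structure during the backwards pass, then compact the array in a second pass.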

Can you control the input as it is going into the array? If so, just do something like this:
int addToArray(const char * toadd, char * strarray[], int strcount)
{
    const size_t toaddlen = strlen(toadd);
    // Add new string to end.
    // Remember to add one for the \0 terminator.
    strarray[strcount] = malloc(sizeof(char) * (toaddlen + 1));
    strncpy(strarray[strcount], toadd, toaddlen + 1);
    // Search for a duplicate.
    // Note that we are cutting the new array short by one.
    for(int i = 0; i < strcount; ++i)
    {
        if (strncmp(strarray[i], toadd, toaddlen + 1) == 0)
        {
            // Found duplicate.
            // Remove it and compact.
            // Note use of new array size here.
            free(strarray[i]);
            for(int k = i + 1; k < strcount + 1; ++k)
                strarray[k - 1] = strarray[k];
            strarray[strcount] = NULL;
            return strcount;
        }
    }
    // No duplicate found.
    return (strcount + 1);
}
You can always use the above function looping over the elements of an existing array, building a new array without duplicates.
PS: If you are doing this type of operation a lot, you should move away from an array as your storage structure and use a linked list instead. Linked lists are much more efficient for removing elements from a location other than the end.

Sort the array with an algorithm like qsort (run man 3 qsort in the terminal to see how it should be used), then use strcmp to compare neighbouring strings and find duplicates.
If you want to maintain the original order, you could use an O(N^2) algorithm with two nested for loops: the outer loop picks an element to compare against the others, and the inner loop scans the rest of the array to check whether the chosen element is a duplicate.
