'highlighting' a string according to a given pattern - c

I have a bunch of strings such as
1245046126856123
5293812332348977
1552724141123171
7992612370048696
6912394320472896
I give the program a pattern which in this case is '123' and the expected output is
1245046126856[123]
52938[123]32348977
1552724141[123]171
79926[123]70048696
69[123]94320472896
To do this I record the indices where the pattern occurs in an array and then I make an empty array of chars and put '[' and ']' according to the indices. So far it works fine but when I have a string such as
12312312312312312123
I should get
[123][123][123][123][123]12[123]
However, I cannot find a way to record the indices for such a case. I use the Rabin-Karp algorithm for pattern matching, and the section where I calculate the indices at which to put the brackets is as follows
if(j == M){
    index[k] = i; //beginning index
    index[k+1] = i+M+1; //finishing index
    if((k!=0)&&(index[k-1] == index[k])){
        index[k]++;
        index[k+1]++;
    }
    if((k!=0)&&(index[k] < index[k-1])){
        index[k] = index[k-2]+M+1;
        index[k+1] = i-index[k-1]+M+1;
    }
    k += 2;
}
i is the index where the pattern starts to occur,
j is the index where the algorithm terminates the pattern check (last character of the given pattern),
k is the index of the index array.
M is the length of the pattern
This results in a string (where only the brackets are placed) like this
[ ][ ][ ][ ][ ][ ]
but as you can see, there should be two empty spaces between the last two sets of brackets. How can I adjust the way I calculate the indices so that I can place the brackets properly?

EDIT
I thought this was a Python question at first, so this is a Pythonic answer, but it may help as pseudocode.
This piece of code should help you find all the indexes in the string where the pattern occurs.
string = "12312312312312123"
ptrn = "123"
i = 0
indexes = []  # dynamic array (in C, a fixed array of size string length / pattern length, or just string length, also works)
while True:
    i = string.find(ptrn, i)  # next occurrence of the pattern at or after index i
    if i == -1:  # no further occurrence of the pattern
        break
    indexes.append(i)  # record where this occurrence starts
    i += len(ptrn)  # skip past this match so the next one cannot overlap it
print(indexes)
If you would like to record every match, even one that starts inside another match, replace i += len(ptrn) with i += 1. Equivalently, test every start index of the string, for i in range(0, len(string)) (in C: for(int i=0; i<strlen(string); i++)), and check whether the pattern begins at i.
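Since the question is tagged C, here is a minimal sketch of the same non-overlapping scan written directly in C. It swaps Rabin-Karp for the standard strstr and writes the brackets while scanning, so no index array is needed. The function name highlight and its return convention are assumptions made for illustration, not part of the original code.

```c
#include <stddef.h>
#include <string.h>

/* Copies text into out, wrapping each non-overlapping occurrence of pat
 * in '[' and ']'. Returns 0 on success, -1 if pat is empty or out is
 * too small. (Hypothetical helper, not from the original post.) */
int highlight(const char *text, const char *pat, char *out, size_t outsz)
{
    size_t m = strlen(pat);
    size_t o = 0;                       /* write position in out */

    if (m == 0 || outsz == 0)
        return -1;

    while (*text != '\0') {
        const char *hit = strstr(text, pat);
        /* number of unmatched characters before the next match */
        size_t plain = hit ? (size_t)(hit - text) : strlen(text);
        size_t need = plain + (hit ? m + 2 : 0);

        if (o + need + 1 > outsz)       /* +1 keeps room for '\0' */
            return -1;

        memcpy(out + o, text, plain);
        o += plain;
        if (hit == NULL)
            break;                      /* no more matches: done */

        out[o++] = '[';
        memcpy(out + o, hit, m);
        o += m;
        out[o++] = ']';
        text = hit + m;                 /* continue after this match */
    }
    out[o] = '\0';
    return 0;
}
```

Called with "12312312312312312123" and the pattern "123", this should fill the buffer with [123][123][123][123][123]12[123].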

Related

How to delete an element from an array in C?

I've tried shifting elements backwards but it is not making the array completely empty.
for(i=pos;i<N-count;i++)
{
    A[i]=A[i+1];
}
Actually, I've to test for a key value in an input array and if the key value is present in the array then I've to remove it from the array. The loop should be terminated when the array becomes empty. Here "count" represents the number of times before a key value was found and was removed. And, "pos" represents the position of the element to be removed. I think dynamic memory allocation may help but I've not learned it yet.
From your description and code, by "delete" you probably mean shift the values to remove the given element and shorten the list by reducing the total count.
In your example, pos and count would be (or should be) similar, possibly off by one.
The limit for your for loop isn't N - count. It is N - 1.
So, you want:
for (i = pos; i < (N - 1); i++) {
    A[i] = A[i + 1];
}
N -= 1;
To do a general delete, given some criteria (a function/macro that matches on element(s) to delete, such as match_for_delete below), you can do the match and delete in a single pass on the array:
int isrc = 0;
int idst = 0;
for (; isrc < N; ++isrc) {
    if (match_for_delete(A,isrc,...))
        continue;
    if (isrc > idst)
        A[idst] = A[isrc];
    ++idst;
}
N = idst;
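As a concrete instance of the single-pass delete above, here is a small sketch where the match criterion is simply "equals a key value". The function name remove_value is an assumption made for illustration; it returns the new element count instead of updating N in place.

```c
#include <stddef.h>

/* Removes every element equal to key from a[0..n) in one pass,
 * keeping the remaining elements in their original order.
 * Returns the new number of elements. (Hypothetical helper.) */
size_t remove_value(int *a, size_t n, int key)
{
    size_t dst = 0;                     /* next free slot for a kept element */

    for (size_t src = 0; src < n; ++src) {
        if (a[src] == key)
            continue;                   /* drop this element */
        a[dst++] = a[src];              /* keep it, shifted left if needed */
    }
    return dst;
}
```

The loop in the question can then stop once the returned count reaches 0, which is the "array became empty" condition.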

How to replace a specific character in an array with two characters

So I just came back from a job interview and one of the questions I had to face with was :
"Given an array of characters and three characters for example :
Array : [a,b,c,z,s,w,y,z,o]
Char 1: 'z'
Char 2 : 'R'
Char 3 : 'R'
Your goal is to replace each 'z' in the array to become two R characters within O(N) time complexity.
so your input will be Array : [a,b,c,z,s,w,y,z,o]
and your output array will be : [a,b,c,R,R,s,w,y,R,R,o]
assume that there is no 'R' in the array before.
You are not allowed to use other arrays or other variables.
The algorithm should be in-line algorithm.
Your final array must be a characters array."
My solution had O(N^2) time complexity, but there is a solution with O(N) time complexity.
The interview is over but I am still thinking about this problem. Can anyone help me solve it?
First scan the input to count how many occurrences of char 1 exist. This has a linear time complexity.
From that you know that the length of the final array will be the input length + the number of occurrences.
Then extend the array to its new length, leaving the new slots empty (or whatever value). The exact nature of the operation depends on how the array data structure is implemented. This can surely be done with at worst a linear time complexity.
Use two indexes, i and j, where i references the last character of the input and j references the very last index of the extended array (potentially an empty slot).
Start copying from i to j, each time decreasing both indices by one. If the character at i is char 1, do not copy it: write char 3 at j, decrease j, write char 2 at j, and then decrease both indices, so that j moves down twice while i moves down once. This again has linear time complexity.
The algorithm will end with both i and j equal to -1.
Do two iterations.
First, count the number of char1s ('z' in your example).
Now you know how long your array should be at the end: array.size() + num_char1s
Then, go from last to first with input and output iterators. If the element is char1, insert to the end iterator the new chars, otherwise - just copy.
Pseudo code:
num_char1s = 0
for x in array:
    if x == char1:
        num_char1s++
// Assuming array has sufficient memory already allocated.
out_iterator = num_char1s + size - 1
in_iterator = size - 1
while (in_iterator >= 0):
    if (array[in_iterator] == char1):
        array[out_iterator--] = char3
        array[out_iterator--] = char2
    else:
        array[out_iterator--] = array[in_iterator]
    in_iterator--
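The pseudocode above translates almost line for line into C. The following is a minimal sketch, assuming the buffer already has enough capacity for the expanded result; the function name expand_char and the cap parameter are assumptions added for illustration.

```c
#include <stddef.h>

/* Replaces every c1 in buf[0..len) with the pair c2, c3, in place.
 * cap is the total capacity of buf. Returns the new length, or
 * (size_t)-1 if the result would not fit. (Hypothetical helper.) */
size_t expand_char(char *buf, size_t len, size_t cap,
                   char c1, char c2, char c3)
{
    size_t extra = 0;

    /* First pass: count occurrences of c1. */
    for (size_t i = 0; i < len; ++i)
        if (buf[i] == c1)
            ++extra;

    size_t newlen = len + extra;
    if (newlen > cap)
        return (size_t)-1;

    /* Second pass: fill from the back so unread input is never overwritten. */
    size_t in = len, out = newlen;
    while (in > 0) {
        --in;
        if (buf[in] == c1) {
            buf[--out] = c3;
            buf[--out] = c2;
        } else {
            buf[--out] = buf[in];
        }
    }
    return newlen;
}
```

Filling backwards is what keeps this linear: the write index never drops below the read index, so no element has to be shifted more than once.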
In your question, two things are very important:
you can't use a new variable
you can't use a new array
So we must use the given array.
First we double the size of the given array. Why? Because the new array holds at most given_array_size*2 characters (when every character is char 1).
Now we shift the contents of the given array n positions to the right, where n = given_array_size.
Then we iterate over the shifted contents, from i = n to 2*n-1.
We take j = 0 as the write index for the new array. If we find char 1, we
set array[j++] = char 2 and array[j++] = char 3.
If a character is not 'z', we simply copy it: array[j++] = array[i].
At the end, positions 0 to j-1 hold the answer.
Complexity: O(n)
No new array is needed, only the indices i and j.

Find longest suffix of string in given array

Given a string and an array of strings, find the longest string in the array that is a suffix of the given string.
for example
string = google.com.tr
array = tr, nic.tr, gov.nic.tr, org.tr, com.tr
returns com.tr
I have tried to use binary search with specific comparator, but failed.
C-code would be welcome.
Edit:
I should have said that I'm looking for a solution where I can do as much work as possible in a preparation step (when I only have the array of suffixes, and I can sort it any way I like, build any data structure around it, etc.), and then, for a given string, find its suffix in this array as fast as possible. I also know that I can build a trie out of this array, and that this will probably give the best possible performance, BUT I'm very lazy, and keeping a trie in raw C in a huge piece of tangled enterprise code is no fun at all. So some binsearch-like approach would be very welcome.
Assuming constant time addressing of characters within strings this problem is isomorphic to finding the largest prefix.
1. Let i = 0.
2. Let S = null.
3. Let c = prefix[i].
4. Remove each string a from A if a[i] != c. If A is now empty (or prefix has no more characters), stop and return S. Otherwise replace S with a if a.Length == i + 1.
5. Increment i.
6. Go to step 3.
Is that what you're looking for?
Example:
prefix = rt.moc.elgoog
array = rt.moc, rt.gro, rt.cin.vog, rt.cin, rt
Pass 0: prefix[0] is 'r' and array[j][0] == 'r' for all j so nothing is removed from the array. i + 1 -> 0 + 1 -> 1 is our target length, but none of the strings have a length of 1, so S remains null.
Pass 1: prefix[1] is 't' and array[j][1] == 't' for all j so nothing is removed from the array. However there is a string that has length 2, so S becomes rt.
Pass 2: prefix[2] is '.' and array[j][2] == '.' for the remaining strings so nothing changes.
Pass 3: prefix[3] is 'm' and array[j][3] != 'm' for rt.gro, rt.cin.vog, and rt.cin so those strings are removed.
etc.
Another naïve, pseudo-answer.
Set boolean "found" to false. While "found" is false, iterate over the array comparing the source string to the strings in the array. If there's a match, set "found" to true and break. If there's no match, use something like strchr() to get to the segment of the string following the first period. Iterate over the array again. Continue until there's a match, or until the last segment of the source string has been compared to all the strings in the array and failed to match.
Not very efficient....
Naive, pseudo-answer:
Sort the array of suffixes by length, longest first (yes, there may be strings of the same length, which I think is a problem with the question you are asking)
Iterate over array and see if suffix is in given string
If it is, exit the loop because you are done! If not, continue.
Alternatively, you could skip the sorting and just iterate, assigning the biggestString if the currentString is bigger than the biggestString that has matched.
Edit 0:
Maybe you could improve this by looking at your array before hand and considering "minimal" elements that need to be checked.
For instance, if .com appears in 20 members you could just check .com against the given string to potentially eliminate 20 candidates.
Edit 1:
On second thought, in order to compare elements in the array you will need to use a string comparison. My feeling is that any gain you get out of an attempt at optimizing the list of strings for comparison might be negated by the expense of comparing them before doing so, if that makes sense. Would appreciate if a CS type could correct me here...
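The "skip the sorting and just iterate" variant described above is short enough to sketch in C. This is a plain linear scan, not the binary-search approach the question asks about, and the function name longest_suffix is an assumption made for illustration. It compares raw characters, so it does not check that a match starts on a '.' boundary.

```c
#include <stddef.h>
#include <string.h>

/* Returns the longest entry of suffixes[0..n) that is a suffix of s,
 * or NULL if none matches. Empty entries are ignored.
 * (Hypothetical helper, linear in the total length of the entries.) */
const char *longest_suffix(const char *s, const char *const *suffixes, size_t n)
{
    size_t slen = strlen(s);
    const char *best = NULL;
    size_t bestlen = 0;

    for (size_t i = 0; i < n; ++i) {
        size_t len = strlen(suffixes[i]);

        /* Skip entries that cannot fit or cannot beat the current best. */
        if (len > slen || len <= bestlen)
            continue;
        /* Compare the entry against the last len characters of s. */
        if (memcmp(s + slen - len, suffixes[i], len) == 0) {
            best = suffixes[i];
            bestlen = len;
        }
    }
    return best;
}
```

With the array from the question and the string google.com.tr, this should return com.tr.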
If your array of strings is something along the following:
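/* needs <string.h> for strcpy and strlen */
char string[STRINGS][MAX_STRING_LENGTH];
strcpy(string[0], "google.com.tr");
strcpy(string[1], "nic.tr");
etc, then you can simply do this:
size_t x, max = 0, longest = 0;
/* find the longest string, remembering its index as well as its length */
for (x = 0; x < STRINGS; x++) {
    if (strlen(string[x]) > max) {
        max = strlen(string[x]);
        longest = x;
    }
}
/* advance to the first '.' of the longest string */
x = 0;
while (string[longest][x] != '\0' && string[longest][x] != '.') {
    x++;
}
if (string[longest][x] == '.') {
    x++; /* step over the '.' itself */
}
/* copy everything after the '.' into output */
char output[MAX_STRING_LENGTH];
size_t y = 0;
while (string[longest][x] != '\0') {
    output[y++] = string[longest][x++];
}
output[y] = '\0';
(The above code may not be exactly what you need, but you should get the general idea.)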
Why don't you use suffix arrays? They work well when you have a large number of suffixes.
The construction complexity is O(n(log n)^2), and there are O(n log n) versions too.
Implementation in c here. You can also try googling suffix arrays.

Search Algorithm with Incomplete Input

I need an algorithm which will search an array for a string, but the string may not be exactly the same as one of the items in the array.
For example,
Array = {"Stack", "Over", "Flow", "Stake"}
input = "Sta"
It will need to recognize that Stack and Stake both match the parameters and then choose the one which is first in alphabetical order.
How can I do this?
I would use a List and do a binarySearch on that list.
List<String> arr = new ArrayList<>();
Add the elements; while adding them you can keep the list sorted as follows.
int x = Collections.binarySearch(arr, key);
if (x < 0)
    arr.add(-x - 1, key);
// for n elements, the n searches take O(n log n) time in total
You can then binary search the list: if the result of binarySearch is >= 0, the key exists in the list; otherwise (-x-1) is the position where the key would be inserted. From that position, go through each element that begins with the input string.
For example, arr is your array and you are searching for input.
List<String> arr = Arrays.asList("Flow", "Over", "Stack", "Stake"); // must be sorted
String input = "Sta";
int x = Collections.binarySearch(arr, input);
if (x < 0)
    x = -x - 1; // insertion point: first element greater than input
if (x < arr.size() && arr.get(x).startsWith(input))
    System.out.println(arr.get(x));
else
    System.out.println("there is no element starting with input string");
Time complexity is O(logn) where n is array's length.
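For readers working in C rather than Java, the same idea can be sketched with a hand-written lower-bound binary search over a sorted array. The function name first_with_prefix is an assumption made for illustration, and the array is assumed to be sorted with strcmp ordering.

```c
#include <stddef.h>
#include <string.h>

/* sorted[0..n) must be in ascending strcmp order. Returns the
 * alphabetically first element that starts with prefix, or NULL
 * if there is none. (Hypothetical helper.) */
const char *first_with_prefix(const char *const *sorted, size_t n,
                              const char *prefix)
{
    size_t plen = strlen(prefix);
    size_t lo = 0, hi = n;

    /* Lower bound: find the first element that is >= prefix. */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (strcmp(sorted[mid], prefix) < 0)
            lo = mid + 1;
        else
            hi = mid;
    }

    /* If any element starts with prefix, the lower bound is one of them. */
    if (lo < n && strncmp(sorted[lo], prefix, plen) == 0)
        return sorted[lo];
    return NULL;
}
```

For the sorted array {"Flow", "Over", "Stack", "Stake"} and the input "Sta", this should return "Stack". The search itself takes O(log n) string comparisons.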
Loop over the sorted array, compute the Levenshtein distance between each string and your target string, and if it is sufficiently small, return.
What constitutes "sufficiently small" is up to you. You'll probably have to do some testing.
Simply loop through each element in the array and compare it to the input, determining if the input is contained in the element. Remove any element that does not meet this prerequisite. Finally go through the remaining elements and pick the one that is first alphabetically.
Loop through all the index values of the array and find the substring match of the input. Find all the matches and print the one whose index value is the lowest.
For example, you will find the substring match for Array[0] and Array[3]. Now you have two matches, at 0 and 3. Look at the next letter after the substring match. At Array[0] the letter after Sta is 'c' but at Array[3] it is 'k'; here c < k, so the output is Array[0].
You may find Trie data structure useful. It is very efficient to find all words you need.
But memory overhead can be significant if you have many words in the list.

How do I remove duplicate strings from an array in C?

I have an array of strings in C and an integer indicating how many strings are in the array.
char *strarray[MAX];
int strcount;
In this array, the highest index (where 10 is higher than 0) is the most recent item added and the lowest index is the most distant item added. The order of items within the array matters.
I need a quick way to check the array for duplicates, remove all but the highest index duplicate, and collapse the array.
For example:
strarray[0] = "Line 1";
strarray[1] = "Line 2";
strarray[2] = "Line 3";
strarray[3] = "Line 2";
strarray[4] = "Line 4";
would become:
strarray[0] = "Line 1";
strarray[1] = "Line 3";
strarray[2] = "Line 2";
strarray[3] = "Line 4";
Index 1 of the original array was removed and indexes 2, 3, and 4 slid downwards to fill the gap.
I have one idea of how to do it. It is untested and I am currently attempting to code it but just from my faint understanding, I am sure this is a horrendous algorithm.
The algorithm presented below would be run every time a new string is added to the strarray.
For the interest of showing that I am trying, I will include my proposed algorithm below:
1. Search entire strarray for match to str
2. If no match, do nothing
3. If match found, put str in strarray
4. Now we have a strarray with a max of 1 duplicate entry
5. Add highest index strarray string to lowest index of temporary string array
6. Continue downwards into strarray and check each element
7. If duplicate found, skip it
8. If not, add it to the next highest index of the temporary string array
9. Reverse temporary string array and copy to strarray
Once again, this is untested (I am currently implementing it now). I just hope someone out there will have a much better solution.
The order of items is important and the code must utilize the C language (not C++). The lowest index duplicates should be removed and the single highest index kept.
Thank you!
The typical efficient unique function is to:
Sort the given array.
Collapse each run of consecutive equal items so that only one remains.
I believe you can use qsort in combination with strcmp to accomplish the first part; writing an efficient remove would be all on you though.
Unfortunately I don't have specific ideas here; this is kind of a grey area for me because I'm usually using C++, where this would be a simple:
std::vector<std::string> src;
std::sort(src.begin(), src.end());
src.erase(std::unique(src.begin(), src.end()), src.end());
I know you can't use C++, but the implementation should essentially be the same.
Because you need to save the original order, you can have something like:
typedef struct
{
int originalPosition;
char * string;
} tempUniqueEntry;
Do your first sort with respect to string, remove the duplicates from the sorted set, then re-sort with respect to originalPosition. This way you still get O(n lg n) performance, yet you don't lose the original order.
EDIT2:
Simple C implementation example of std::unique:
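tempUniqueEntry* unique ( tempUniqueEntry * first, tempUniqueEntry * last )
{
    /* an empty range has nothing to collapse */
    if (first == last)
        return last;
    tempUniqueEntry *result=first;
    while (++first != last)
    {
        /* strcmp is non-zero when the strings differ: keep the new one */
        if (strcmp(result->string,first->string))
            *(++result)=*first;
    }
    return ++result;
}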
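Putting the sort, unique, re-sort steps described above together, a minimal sketch could look like the following. The names Entry, by_string, by_position and dedupe_sorted are assumptions made for illustration. To satisfy the "keep the highest index" requirement, equal strings are ordered by descending original position, so the copy kept from each run is the most recent one. The removed strings are not freed here; ownership is left to the caller.

```c
#include <stdlib.h>
#include <string.h>

typedef struct {
    size_t pos;   /* index in the original array */
    char *str;
} Entry;

/* Order by string; among equal strings, highest original index first. */
static int by_string(const void *a, const void *b)
{
    const Entry *x = a, *y = b;
    int c = strcmp(x->str, y->str);
    if (c != 0)
        return c;
    return (x->pos < y->pos) - (x->pos > y->pos);
}

/* Order by original index, ascending. */
static int by_position(const void *a, const void *b)
{
    const Entry *x = a, *y = b;
    return (x->pos > y->pos) - (x->pos < y->pos);
}

/* Removes duplicates from arr[0..n), keeping the highest-index copy of
 * each string and the relative order of the survivors.
 * Returns the new count (or n unchanged if allocation fails). */
size_t dedupe_sorted(char **arr, size_t n)
{
    if (n == 0)
        return 0;
    Entry *e = malloc(n * sizeof *e);
    if (e == NULL)
        return n;                        /* leave the array untouched */

    for (size_t i = 0; i < n; ++i) {
        e[i].pos = i;
        e[i].str = arr[i];
    }
    qsort(e, n, sizeof *e, by_string);

    /* Keep the first entry of every run of equal strings. */
    size_t kept = 1;
    for (size_t i = 1; i < n; ++i)
        if (strcmp(e[kept - 1].str, e[i].str) != 0)
            e[kept++] = e[i];

    qsort(e, kept, sizeof *e, by_position);
    for (size_t i = 0; i < kept; ++i)
        arr[i] = e[i].str;
    free(e);
    return kept;
}
```

For the five-line example in the question, this should leave "Line 1", "Line 3", "Line 2", "Line 4" and return 4.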
I don't quite understand your proposed algorithm (I don't understand what it means to add a string to an index in step 5), but what I would do is:
unsigned int i;
for (i = n; i > 0; i--)
{
    unsigned int j;
    if (strarray[i - 1] == NULL)
    {
        continue;
    }
    for (j = i - 1; j > 0; j--)
    {
        if (strcmp(strarray[i - 1], strarray[j - 1]) == 0)
        {
            strarray[j - 1] = NULL;
        }
    }
}
Then you just need to filter the null pointers out of your array (which I'll leave as an exercise).
A different approach would be to iterate backwards over the array and to insert each item into a (balanced) binary search tree as you go. If the item is already in the binary search tree, flag the array item (such as setting the array element to NULL) and move on. When you've processed the entire array, filter out the flagged elements as before. This would have slightly more overhead and would consume more space, but its running time would be O(n log n) instead of O(n^2).
Can you control the input as it is going into the array? If so, just do something like this:
/* needs <stdlib.h> for malloc/free and <string.h> for the str* functions */
int addToArray(const char * toadd, char * strarray[], int strcount)
{
    const size_t toaddlen = strlen(toadd);
    // Add new string to end.
    // Remember to add one for the \0 terminator.
    strarray[strcount] = malloc(sizeof(char) * (toaddlen + 1));
    strncpy(strarray[strcount], toadd, toaddlen + 1);
    // Search for a duplicate.
    // Note that we are cutting the new array short by one.
    for(int i = 0; i < strcount; ++i)
    {
        if (strncmp(strarray[i], toadd, toaddlen + 1) == 0)
        {
            // Found duplicate.
            // Remove it and compact.
            // Note use of new array size here.
            free(strarray[i]);
            for(int k = i + 1; k < strcount + 1; ++k)
                strarray[k - 1] = strarray[k];
            strarray[strcount] = NULL;
            return strcount;
        }
    }
    // No duplicate found.
    return (strcount + 1);
}
You can always use the above function looping over the elements of an existing array, building a new array without duplicates.
PS: If you are doing this type of operation a lot, you should move away from an array as your storage structure and use a linked list instead. Linked lists are much more efficient for removing elements from a location other than the end.
Sort the array with an algorithm like qsort (man 3 qsort in the terminal to see how it should be used) and then use the function strcmp to compare the strings and find duplicates.
If you want to maintain the original order, you could use an O(N^2) algorithm with two nested for loops: the outer loop picks an element to compare against the others, and the inner loop scans the rest of the array to find out whether the chosen element is a duplicate.
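The nested-loop approach can be combined with in-place compaction so that no temporary array is needed. This is a minimal O(N^2) sketch; the function name dedupe_keep_last is an assumption made for illustration, and the dropped strings are not freed, since ownership is left to the caller.

```c
#include <stddef.h>
#include <string.h>

/* Removes duplicates from arr[0..n) in place, keeping only the
 * highest-index copy of each string and preserving the order of the
 * survivors. Returns the new count. (Hypothetical helper.) */
size_t dedupe_keep_last(char **arr, size_t n)
{
    size_t w = 0;                        /* next slot for a kept string */

    for (size_t i = 0; i < n; ++i) {
        int later_copy = 0;

        /* Is there an equal string at a higher index? */
        for (size_t j = i + 1; j < n; ++j) {
            if (strcmp(arr[i], arr[j]) == 0) {
                later_copy = 1;
                break;
            }
        }
        if (!later_copy)
            arr[w++] = arr[i];           /* w <= i, so nothing unread is lost */
    }
    return w;
}
```

Applied to the question's example, this should leave "Line 1", "Line 3", "Line 2", "Line 4" in the first four slots and return 4, which becomes the new strcount.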

Resources