Longest common substring in string itself - c

Given a string like "geekthegeertheregeers", we have to find the longest common substring within the string itself.
In this case, "geer" would be the longest common substring.
Which algorithm applies here? Can LCS be modified to solve this problem?

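Is the question about finding the longest substring that occurs more than once in the string?
If so, shouldn't the result for "geekthegeertheregeers" be "egeer"?
In that case, you can build the suffix array of the input string and then construct the LCP (Longest Common Prefix) array for that suffix array. Both can be done in O(N), where N is the length of the input string. The largest value in the LCP array is the length of the longest repeated substring, and the corresponding pair of adjacent suffixes shows where it occurs.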
Reference:
Suffix Array (http://en.wikipedia.org/wiki/Suffix_array)
LCP array (http://en.wikipedia.org/wiki/LCP_array)
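As a rough illustration of the suffix array idea, here is a minimal sketch in C. It is not the O(N) construction mentioned above: it simply sorts the suffix pointers with qsort and compares adjacent suffixes, which is easy to follow but can be quadratic or worse on highly repetitive input. The names longest_repeated and cmp_suffix are made up for this example, and overlapping occurrences are counted as repeats.

```c
#include <stdlib.h>
#include <string.h>

/* Order two suffixes (stored as pointers into the same string). */
static int cmp_suffix(const void *a, const void *b)
{
    return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/* Returns the length of the longest substring that occurs at least
   twice in s (occurrences may overlap). *start receives a pointer to
   one occurrence. Returns 0 if nothing repeats or allocation fails. */
static size_t longest_repeated(const char *s, const char **start)
{
    size_t n = strlen(s), best = 0;
    *start = s;
    if (n < 2)
        return 0;

    const char **suf = malloc(n * sizeof *suf);
    if (!suf)
        return 0;
    for (size_t i = 0; i < n; i++)
        suf[i] = s + i;                    /* suffix starting at i */
    qsort(suf, n, sizeof *suf, cmp_suffix);

    /* The longest repeat is the longest common prefix of two
       suffixes that are adjacent in sorted order. */
    for (size_t i = 1; i < n; i++) {
        size_t l = 0;
        while (suf[i - 1][l] && suf[i - 1][l] == suf[i][l])
            l++;
        if (l > best) {
            best = l;
            *start = suf[i];
        }
    }
    free(suf);
    return best;
}
```

Calling longest_repeated("geekthegeertheregeers", &p) gives length 5 with p pointing at "egeer". For large inputs a real suffix array construction would replace the qsort step.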

Related

Finding one array in another Array

We have two arrays
char A[]="ABABABABBBABAB";
And the other is
char B[]="BABA";
How can I find B in A, and where it starts and ends for every occurrence?
For example, for this one:
Between 2-5
Between 4-7
Between 10-13
Yes, you can do this using the strstr function.
This function returns a pointer to the first occurrence in haystack of the entire sequence of characters specified in needle, or a null pointer if the sequence is not present in haystack.
So you get a pointer to the beginning of the match. To find the next occurrence, call strstr again with the first argument moved past the start of the previous match. A simple illustration:
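#include <stdio.h>
#include <string.h>
...
char haystack[] = "abismyabnameab";
char needle[] = "ab";
char *ret = strstr(haystack, needle);
while (ret != NULL) {
    /* print the rest of the string, then the 0-based start and end of this match */
    size_t start = (size_t)(ret - haystack);
    printf("%s (%zu,%zu)\n", ret, start, start + strlen(needle) - 1);
    /* resume one character later so that overlapping matches are found too */
    ret = strstr(ret + 1, needle);
}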
The indices come from two facts: strstr tells you where each match starts (ret - haystack for each instance of needle in haystack), and the length of the needle is known, so the match ends at the start plus strlen(needle) - 1. These indices are 0-based; add 1 to get positions like the ones in the question.
Note that the next search resumes one character after the start of the previous match, not after its end. This matters when occurrences can overlap. For example, when BB is searched for in BBBBB, skipping the whole match would miss the occurrences that begin inside it, whereas moving haystack forward by 1 finds an occurrence at every position.
A better solution is to use KMP (Knuth-Morris-Pratt), which precomputes a failure function for the needle and runs in O(n+m). The repeated strstr approach is O(n*m) in the worst case.
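For reference, a minimal sketch of the KMP approach in C might look like the following. The function name kmp_find_all and its interface (an output array of 0-based start offsets) are choices made for this example only.

```c
#include <stdlib.h>
#include <string.h>

/* Finds every occurrence of pat in text, including overlapping ones.
   Up to max 0-based start offsets are written to pos. Returns the
   total number of matches, or (size_t)-1 if allocation fails. */
static size_t kmp_find_all(const char *text, const char *pat,
                           size_t *pos, size_t max)
{
    size_t n = strlen(text), m = strlen(pat), count = 0;
    if (m == 0 || m > n)
        return 0;

    size_t *fail = malloc(m * sizeof *fail);
    if (!fail)
        return (size_t)-1;

    /* fail[i] = length of the longest proper prefix of pat[0..i]
       that is also a suffix of pat[0..i] */
    fail[0] = 0;
    for (size_t i = 1, k = 0; i < m; i++) {
        while (k > 0 && pat[i] != pat[k])
            k = fail[k - 1];
        if (pat[i] == pat[k])
            k++;
        fail[i] = k;
    }

    /* Scan the text once; k is the number of pattern characters
       currently matched. */
    for (size_t i = 0, k = 0; i < n; i++) {
        while (k > 0 && text[i] != pat[k])
            k = fail[k - 1];
        if (text[i] == pat[k])
            k++;
        if (k == m) {
            if (count < max)
                pos[count] = i + 1 - m;
            count++;
            k = fail[k - 1];   /* keep going, allowing overlaps */
        }
    }
    free(fail);
    return count;
}
```

With A = "ABABABABBBABAB" and B = "BABA" from the question, this reports three matches at 0-based offsets 1, 3 and 9, which correspond to the 1-based ranges 2-5, 4-7 and 10-13.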

Find Duplicate characters in String (Solution must be less than O(n^2))

If we traverse the string and compare every character with all the others, we will find the duplicates, but that is O(n^2).
I need an idea for doing it in less than O(n^2).
Let's say the input string is: nice book, then
the output will be: o
Iterate over all characters and store them in a HashMap where the key is the character itself and the value is anything (the character, true, an integer, ...). Before adding a character to the hash map, check whether it already exists: if it does, it is a duplicate; if not, insert it.
Here is some pseudocode:
for character char in String
    if charactersMap.get(char) == null
        charactersMap.put(char, char)
    else
        print char
This solution is O(n), as it iterates over the characters once and looking up a key in a hash map takes constant time on average.
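In C, where the characters are single bytes, the hash map can be replaced by a small lookup table indexed by character value. The sketch below follows the same idea; find_duplicates is a name invented for this example, and unlike the pseudocode it reports each duplicated character only once, however many times it repeats.

```c
#include <limits.h>
#include <stddef.h>

/* Writes every character that occurs more than once in s to out,
   once each, in the order in which the first repetition is seen.
   out is NUL-terminated; at most outsz - 1 characters are stored.
   Returns the number of characters stored. */
static size_t find_duplicates(const char *s, char *out, size_t outsz)
{
    /* 0 = not seen yet, 1 = seen once, 2 = already reported */
    unsigned char seen[UCHAR_MAX + 1] = {0};
    size_t count = 0;

    if (outsz == 0)
        return 0;
    for (; *s; s++) {
        unsigned char c = (unsigned char)*s;
        if (seen[c] == 0) {
            seen[c] = 1;
        } else if (seen[c] == 1) {
            seen[c] = 2;
            if (count + 1 < outsz)
                out[count++] = (char)c;
        }
    }
    out[count] = '\0';
    return count;
}
```

For the input "nice book" the output buffer ends up holding "o". The table has a fixed size, so the whole pass is O(n) time with constant extra space.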

Search the longest substring in two strings

The task is this: find the longest substring common to two strings. The peculiarity of the problem is that the strings are very long (the contents of files, up to 400,000 characters each), while the alphabet they are built from is small: 4 characters.
The strings can be of different lengths.
I devised and implemented the following algorithm:
Read the contents of the first file into a string str1, removing the line breaks.
Read the contents of the second file into a string str2, removing the line breaks.
Consider all substrings of str1, from the longest to the shortest. To do this, use a loop while (i>0) that, after its main body, decreases the substring length by one on each iteration, down to substrings of length 1.
Inside the while loop: all substrings of a given length differ only in their starting position.
Take a string of length N:
There is one substring of length N, starting at position 0.
There are two substrings of length N-1, starting at positions 0 and 1.
There are three substrings of length N-2, starting at positions 0, 1, and 2.
...
There are K+1 substrings of length N-K, starting at positions 0,1,...,K.
The starting position is iterated in a for loop (z=0; z<=g-i; z++), inside which the function getSubstring is called to obtain the substring. Then the standard function strstr is run to search for that substring in str2.
But this algorithm takes too long. Is there a way to make it faster?
P.S. It has to be written in C.
There are at least two classical options for solving longest common substring efficiently:
Build a generalized suffix array or suffix tree of the two strings. One can show that the LCS is a prefix of two adjacent suffixes in the suffix array that have different colors (that is, belong to different strings). I once wrote an answer that describes a simple O(n log n) suffix array construction algorithm.
Build a suffix automaton of one string and feed the other string into it. At every point, check how "deep" you are in the automaton and report the maximum over all those depths. You can find a C++ implementation in my GitHub.
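Since the question asks for C, here is a hedged sketch of the suffix automaton option. It is not the implementation referred to above, just one way the idea could look. It assumes the 4-letter alphabet is "ACGT" (the question does not say which characters are used), and the names LETTERS, State, code and lcs_length are invented for this example.

```c
#include <stdlib.h>
#include <string.h>

#define ALPHA 4
static const char LETTERS[] = "ACGT";   /* assumed alphabet */

typedef struct { int next[ALPHA]; int link; int len; } State;

/* Maps a character to 0..ALPHA-1, or -1 if it is not in LETTERS. */
static int code(char c)
{
    const char *p = c ? strchr(LETTERS, c) : NULL;
    return p ? (int)(p - LETTERS) : -1;
}

/* Returns the length of the longest common substring of s and t.
   *end_in_t receives the index in t just past its last character.
   Returns 0 if s contains a character outside LETTERS or if
   allocation fails. */
static size_t lcs_length(const char *s, const char *t, size_t *end_in_t)
{
    size_t n = strlen(s), best = 0;
    State *st = malloc((2 * n + 2) * sizeof *st);
    int sz = 1, last = 0;

    *end_in_t = 0;
    if (!st)
        return 0;
    st[0].link = -1;
    st[0].len = 0;
    memset(st[0].next, -1, sizeof st[0].next);

    /* Build the suffix automaton of s, one character at a time. */
    for (size_t i = 0; i < n; i++) {
        int c = code(s[i]);
        if (c < 0) { free(st); return 0; }
        int cur = sz++, p = last;
        st[cur].len = st[last].len + 1;
        memset(st[cur].next, -1, sizeof st[cur].next);
        while (p != -1 && st[p].next[c] == -1) {
            st[p].next[c] = cur;
            p = st[p].link;
        }
        if (p == -1) {
            st[cur].link = 0;
        } else {
            int q = st[p].next[c];
            if (st[q].len == st[p].len + 1) {
                st[cur].link = q;
            } else {
                int clone = sz++;          /* split state q */
                st[clone] = st[q];
                st[clone].len = st[p].len + 1;
                while (p != -1 && st[p].next[c] == q) {
                    st[p].next[c] = clone;
                    p = st[p].link;
                }
                st[q].link = st[cur].link = clone;
            }
        }
        last = cur;
    }

    /* Feed t through the automaton; l is the current match depth. */
    int v = 0, l = 0;
    for (size_t i = 0; t[i]; i++) {
        int c = code(t[i]);
        if (c < 0) { v = 0; l = 0; continue; }
        while (v != 0 && st[v].next[c] == -1) {
            v = st[v].link;
            l = st[v].len;
        }
        if (st[v].next[c] != -1) {
            v = st[v].next[c];
            l++;
        }
        if ((size_t)l > best) {
            best = (size_t)l;
            *end_in_t = i + 1;
        }
    }
    free(st);
    return best;
}
```

The automaton has at most about 2n states with 4 transitions each, so for 400,000 characters it stays within a few tens of megabytes, and both the construction and the scan are linear in the input lengths. The common substring itself is t[end_in_t - length .. end_in_t - 1].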

Regex for this string

The buffer string char *bufferString points to the first element of the following string:
BER Berman, Jane 06/29/91 Photography;Dance;Music\n
I'd like to parse only the last field, the list of topics, and store each item.
What I've tried:
#define REGEX_TOPIC "^[a-zA-Z].*^[0-9/0-90-9/0-90-9+]"
char *topic;
topic = strstr(bufferString, REGEX_TOPIC);
Could you help me here?
The strstr() function locates the first occurrence of the null-terminated
string s2 in the null-terminated string s1. It does not handle regular expressions.
For using regular expressions in C, see the answers to Regular expressions in C: examples?.
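As one possible illustration, the sketch below uses the POSIX <regex.h> functions (regcomp, regexec, regfree), which are available on POSIX systems but are not part of standard C. It assumes the topics are whatever follows a date of the form NN/NN/NN on the line; the function name extract_topics and the pattern are made up for this example.

```c
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Copies the text that follows the NN/NN/NN date (up to the end of
   the line) into out. Returns 0 on success, -1 if the line does not
   match or the pattern cannot be compiled. */
static int extract_topics(const char *line, char *out, size_t outsz)
{
    regex_t re;
    regmatch_t m[2];   /* m[0] = whole match, m[1] = topics group */
    int rc = -1;

    if (outsz == 0)
        return -1;
    if (regcomp(&re, "[0-9]{2}/[0-9]{2}/[0-9]{2}[ \t]+([^\n]*)",
                REG_EXTENDED) != 0)
        return -1;

    if (regexec(&re, line, 2, m, 0) == 0) {
        size_t len = (size_t)(m[1].rm_eo - m[1].rm_so);
        if (len >= outsz)
            len = outsz - 1;               /* truncate to fit */
        memcpy(out, line + m[1].rm_so, len);
        out[len] = '\0';
        rc = 0;
    }
    regfree(&re);
    return rc;
}
```

For the line in the question this leaves "Photography;Dance;Music" in out, which can then be split on ';' with strtok or a manual loop to store the individual topics.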

in C: reading input string, finding it in char array

I'm writing another program. It reads a txt file and stores all the letter characters and spaces (as \0) in a char array, ignoring everything else. This part works.
Now what I need it to do is read a user-inputted string, search for that string in the array, and print the word every time it appears. I'm terrible at I/O in C. How do you read a string and then find it in a char array?
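#include <stdio.h>
#include <string.h>
...
char str[80];
printf("Enter your word: ");
if (scanf("%79s", str) == 1) {   /* width limit prevents overflowing str */
    /* note: strstr stops at the first '\0' it meets in fileData */
    char *pch = strstr(fileData, str);
    while (pch != NULL) {
        printf("found at %td\n", pch - fileData + 1);   /* 1-based position */
        pch = strstr(pch + 1, str);
    }
}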
Read the user-inputted string into a char array as well (strings in C are basically char arrays anyway).
Use a string matching algorithm like Boyer-Moore or Knuth-Morris-Pratt (more popularly known as KMP). Search for C implementations of these if you like; they are neat, tried and tested ways of searching strings for substrings and pattern matches.
For each match found, print the position where the word occurs, or, if you prefer, the number of occurrences.
Generally, the C string functions, listed here, are named str* or strn*, depending on requirements.
One for-loop inside another for-loop (called nested loop). Go through all the letters in your array, and for each letter go through all the letters in your input string and find out if that part of the array matches with the input string. If it does, print it.
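A minimal sketch of that nested loop in C could look like this. Because the question stores spaces as \0 inside the array, the sketch takes the array length explicitly instead of relying on a terminating \0, so it keeps searching past the embedded \0 bytes. The name find_all and its parameters are invented for this example.

```c
#include <stddef.h>
#include <string.h>

/* Searches data[0..len-1] for word using two nested loops.
   Up to max 0-based start offsets are written to pos.
   Returns the total number of occurrences found. */
static size_t find_all(const char *data, size_t len, const char *word,
                       size_t *pos, size_t max)
{
    size_t wlen = strlen(word), count = 0;

    if (wlen == 0 || wlen > len)
        return 0;
    /* Outer loop: every position where the word could start. */
    for (size_t i = 0; i + wlen <= len; i++) {
        size_t j = 0;
        /* Inner loop: compare the word letter by letter. */
        while (j < wlen && data[i + j] == word[j])
            j++;
        if (j == wlen) {
            if (count < max)
                pos[count] = i;
            count++;
        }
    }
    return count;
}
```

For an array holding "the\0cat\0the" (11 bytes) and the word "the", this finds two occurrences, at offsets 0 and 8. It also matches the word inside longer words; checking that the bytes before and after the match are \0 would restrict it to whole words.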

Resources