Match exact string with strstr - c

Suppose I have the following string:
in the interior of the inside is an inner inn
and I want to count the occurrences of the word "in" (how often "in" appears).
In my program, I've used strstr to do so, but it returns false positives. It will return:
- in the interior of the inside is an inner inn
- interior of the inside is an inner inn
- inside is an inner inn
- inner inn
- inn
So my program thinks "in" appears 5 times, which is obviously wrong.
How should I proceed in order to search exclusively for the word "in"?

Try the following:
#include <stdio.h>
#include <string.h>
#include <ctype.h>
int main(void)
{
    const char *s = "in the interior of the inside is an inner inn";
    const char *t = "in";
    size_t n = strlen( t );
    size_t count = 0;
    const char *p = s;
    while ( ( p = strstr( p, t ) ) != NULL )
    {
        const char *q = p + n;
        if ( p == s || isblank( ( unsigned char ) *( p - 1 ) ) )
        {
            if ( *q == '\0' || isblank( ( unsigned char ) *q ) ) ++count;
        }
        p = q;
    }
    printf( "There are %zu string \"%s\"\n", count, t );
    return 0;
}
The output is
There are 1 string "in"
You can also add a check with ispunct if the source string can contain punctuation.

Search for " in "; note the spaces. Then consider the edge cases of a sentence starting with "in " and ending with " in".

One more way to do it:
Use strtok() on the whole sentence with a space as the delimiter. Note that strtok() modifies the string it splits, so it needs a writable copy of the sentence.
Then you can compare each token against "in".

Add an isdelimiter() function to check the characters just before and just after each result of strstr().
#include <stdio.h>
#include <string.h>
// Adjust as needed.
int isdelimiter(char ch) {
    return (ch == ' ') || (ch == '\0');
}
// needle must be non-empty
int MatchAlex(const char *haystack, const char *needle) {
    int match = 0;
    const char *h = haystack;
    const char *m;
    size_t len = strlen(needle);
    while ((m = strstr(h, needle)) != NULL) {
        if ((m == haystack || isdelimiter(m[-1])) && isdelimiter(m[len])) {
            // printf("'%s'",m);
            match++;
            h = m + len;  // resume the search after this match
        } else {
            h = m + 1;    // resume just after the rejected position
        }
    }
    return match;
}
int main(void) {
    printf("%d\n",
           MatchAlex("in the interior of the inside is an inner inn xxin", "in"));
    return 0;
}

Related

using binary search to find the first capital letter in a sorted string [closed]

I wrote the following code to find the first capital letter in a string using binary search:
char first_capital(const char str[], int n)
{
    int begin = 0;
    int end = n - 1;
    int mid;
    while (begin <= end)
    {
        mid = (begin + end) / 2;
        if (mid == 0 && isupper(str[mid]))
        {
            return mid;
        }
        else if (mid > 0 && isupper(str[mid]) && islower(str[mid - 1]))
        {
            return mid;
        }
        if (islower(str[mid]))
        {
            begin = mid + 1;
        }
        else
        {
            end = mid - 1;
        }
    }
    return 0;
}
Currently my code isn't working as expected when I test it. If anyone can point out where I went wrong, it would help a lot.
NOTE: The input string will already be sorted (all lower case letters appear before upper case letters). const char str[] is the string and int n is the length of the string.
EDIT: for example: first_capital("abcBC", 5) should return 'B'.
Your logic is correct, but you return the wrong value: the index of the character instead of the character itself.
char first_capital(const char str[], int n)
{
    int begin = 0;
    int end = n - 1;
    int mid;
    while (begin <= end)
    {
        mid = (begin + end) / 2;
        if(mid == 0 && isupper(str[mid]))
        {
            return mid; // Here the index is returned, not the character
        }
        else if (mid > 0 && isupper(str[mid]) && islower(str[mid-1]))
        {
            return mid; // Same goes here
        }
        if(islower(str[mid]))
        {
            begin = mid+1;
        }
        else
        {
            end = mid - 1;
        }
    }
    return 0;
}
The driver code
int main(){
    printf("%d\n", first_capital("abcabcabcabcabcZ", 16));
}
prints 15, which is the index of the character Z.
If you want the character to be returned, replace return mid with return str[mid], and 'Z' will be returned.
#include <stdio.h>
#include <string.h>
#include <ctype.h>
/* This will find and return the first UPPERCASE character in txt
 * provided that txt is zero-or-more lowercase letters,
 * followed by zero-or-more uppercase letters.
 * If it is all lower-case letters, it will return \0 (end of string)
 * If it is all upper-case letters, it will return the first letter (txt[0])
 * If there are non-alpha characters in the string, all bets are off.
 */
char findFirstUpper(const char* txt)
{
    size_t lo = 0;
    size_t hi = strlen(txt);
    while(hi-lo > 1)
    {
        size_t mid = lo + (hi-lo)/2;
        /* an uppercase mid becomes the new hi, a lowercase mid the new lo */
        *(isupper((unsigned char)txt[mid])? &hi : &lo) = mid;
    }
    return isupper((unsigned char)txt[lo])? txt[lo] : txt[hi];
}
int main(void)
{
    char answer = findFirstUpper("abcBC");
    printf("Found char %c\n", answer);
    return 0;
}
If the function deals with strings, then the second parameter should be removed.
The function should return a pointer to the first upper case letter, or a null pointer if the string contains no such letter. That is, the function declaration and behavior should be similar to those of the standard string function strchr. The only difference is that your function does not need a second parameter of type char, because the searched character is implicitly defined by the condition that it be an upper case character.
On the other hand, although your function has the return type char, it returns an integer that specifies the position of the found character. Also, your function does not distinguish between the case where no upper case character is found and the case where the string contains an upper case character in its first position: both return 0.
Also, your function has too many if-else statements.
The function can be declared and defined as shown in the demonstration program below.
#include <stdio.h>
#include <string.h>
#include <ctype.h>
char * first_capital( const char s[] )
{
    const char *first = s;
    const char *last = s + strlen( s );
    while ( first < last )
    {
        const char *middle = first + ( last - first ) / 2;
        if ( islower( ( unsigned char )*middle ) )
        {
            first = middle + 1;
        }
        else
        {
            last = middle;
        }
    }
    return ( char * )( isupper( ( unsigned char )*first ) ? first : NULL );
}
int main(void)
{
    const char *tests[] = { "", "a", "A", "abcdefA", "abAB" };
    for ( size_t i = 0; i < sizeof( tests ) / sizeof( *tests ); i++ )
    {
        const char *s = tests[i];
        char *result = first_capital( s );
        if ( result )
        {
            printf( "%c at %zu\n", *result, ( size_t )( result - s ) );
        }
        else
        {
            printf( "The string \"%s\" does not contain an upper case letter.\n", s );
        }
    }
    return 0;
}
The program output is
The string "" does not contain an upper case letter.
The string "a" does not contain an upper case letter.
A at 0
A at 6
A at 2

C: Problem in comparing two strings in a function

Good morning everyone, I have to emulate the strstr() function with a function written by me.
In the code I copy a sliding window of the original string into a temporary string and then compare it with the string to look for; if they are equal, it should return 1.
But even when the strings are equal and of the same length, the code never enters the if block and therefore never returns 1.
My code:
int *strstr_new(char *s7, char *s8) {
    int length_s7 = strlen(s7);
    int length_s8 = strlen(s8);
    char search_string[length_s8];
    printf("%d\n", length_s8);
    for(int i=0; i<length_s7; i++) {
        for(int j=0; j<length_s8; j++) {
            search_string[j] = s7[i+j];
            search_string[j+1] = '\0';
        }
        printf("%s\n", s8);
        printf("%s\n", search_string);
        printf("%d\n", length_s8);
        printf("%d\n", strlen(search_string));
        //search_string[length_s8+1] = '\0';
        if(search_string == s8) {
            return(1);
        }
    }
    if(search_string != s8) {
        return(NULL);
    }
}
Does someone have an idea of where I'm wrong?
Thanks!
The function declaration
int *strstr_new(char *s7, char *s8);
looks very strange.
For example why is the return type int *?
Why are function parameters named s7 and s8 instead of for example s1 and s2?
Why are not the function parameters qualified with const?
Creating a variable length array within the function is inefficient and redundant and can lead to stack exhaustion.
char search_string[length_s8];
This loop
for(int j=0; j<length_s8; j++) {
    search_string[j] = s7[i+j];
    search_string[j+1] = '\0';
}
invokes undefined behavior because this statement
search_string[j+1] = '\0';
writes beyond the array when j is equal to length_s8 - 1.
In this statement
if(search_string == s8) {
two pointers are compared, and they are obviously unequal because they point to different arrays.
Without using any standard C functions except strlen (which could also be written by hand), the function can be declared and defined the following way:
#include <stdio.h>
#include <string.h>
char * strstr_new( const char *s1, const char *s2 )
{
    char *p = NULL;
    size_t n1 = strlen( s1 );
    size_t n2 = strlen( s2 );
    if ( !( n1 < n2 ) )
    {
        for ( size_t i = 0, n = n1 - n2 + 1; p == NULL && i < n; i++ )
        {
            size_t j = 0;
            while ( j < n2 && s1[i + j] == s2[j] ) ++j;
            if ( j == n2 ) p = ( char * )( s1 + i );
        }
    }
    return p;
}
int main( void )
{
    const char *s1 = "strstr_new";
    const char *s2 = "str";
    for ( const char *p = s1; ( p = strstr_new( p, s2 ) ) != NULL; ++p )
    {
        puts( p );
    }
}
The program output is
strstr_new
str_new
If you are allowed to use other standard string functions besides strlen, then the loop inside strstr_new can be simplified with memcmp (also declared in <string.h>):
for ( size_t i = 0, n = n1 - n2 + 1; p == NULL && i < n; i++ )
{
    if ( memcmp( s1 + i, s2, n2 ) == 0 ) p = ( char * )( s1 + i );
}
The biggest problem in your code is comparing strings with the == operator. In that comparison both search_string (an array, which decays to a pointer to its first element) and s8 are char pointers, which means you're comparing the addresses of different objects, so the result is always false. Try adding another for loop that compares each char in search_string with the corresponding char in s8 (using the dereferencing operator *).
Your string comparisons won't work because you are comparing the addresses of those strings instead of the strings themselves; you'd want to use something like strcmp or memcmp to compare two strings.
Your return type is also incompatible with what you return, particularly when the strings match. I'd return 1 if the string is found and 0 if it's not; for that you need to change the return type to plain int.
The second string comparison is unneeded: you already test for the existence of the substring inside the loop, so you just need to return 0 if the loop finds its way to the end.
Lastly, the temporary string is too short and allows access outside its bounds inside the loop.
For example, if length_s8 is 4, the loop writes to search_string[4], the 5th element, which is out of the bounds of the array.
#include <stdio.h>
#include <string.h>
int strstr_new(const char *s7, const char *s8) //return 1 for found, 0 for not found
{
    int length_s7 = strlen(s7);
    int length_s8 = strlen(s8);
    char search_string[length_s8 + 1]; //room for the terminating '\0', avoids buffer overflow
    //only start positions where a whole window fits, so s7 is never read past its end
    for (int i = 0; i <= length_s7 - length_s8; i++)
    {
        for (int j = 0; j < length_s8; j++)
        {
            search_string[j] = s7[i + j];
        }
        search_string[length_s8] = '\0';
        if (!strcmp(search_string, s8))
        {
            return 1; //if the string is found return 1 immediately
        }
    }
    return 0; //if it reaches this point, no match was found
}
A couple of tests:
printf("%d\n", strstr_new("this is my string", "this i"));
printf("%d\n", strstr_new("this is my string", "ringo"));
printf("%d\n", strstr_new("this is my string", "ring"));
printf("%d\n", strstr_new("this is my strin", "ths"));
Output:
1
0
1
0

Print the last word of String inside an array with only one loop

The program exits the loop: while debugging I can see that it reaches NULL, although it should keep advancing to the following letters in the string.
Thanks to everyone who helps.
void main()
{
    char string[2][10] = { "lior","king" };
    int words, letter;
    for (words=0,letter = 0;words<2 , string[words][letter] != NULL;)
    {
        letter++;
        if (string[words][letter] = NULL)
        {
            printf("%c\n", string[words][letter - 1]);
            words++;
        }
    }
}
The goal is that when it reaches the end of the first word, it prints that word's last letter and advances to the next string.
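This condition in the loop
words<2 , string[words][letter] != NULL;
is wrong. Because of the comma operator, words<2 is evaluated and its value discarded, so only string[words][letter] != NULL decides whether the loop continues (and a char should be compared with '\0', not with NULL). It seems you mean just
words<2
The first statement in the body of the loop
letter++;
is also wrong because it skips index 0.
Moreover, if (string[words][letter] = NULL) uses assignment instead of comparison: it writes '\0' into the string, so the condition is always false, and the loop condition then sees that '\0' and stops.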
If I have understood correctly, what you need is the following:
#include <stdio.h>
int main(void)
{
    enum { N = 10 };
    char string[][N] = { "lior","king" };
    const size_t M = sizeof( string ) / sizeof( *string );
    for ( size_t word = 0, letter = 0; word < M; )
    {
        if (string[word][letter] == '\0' )
        {
            if ( letter != 0 ) printf( "%c\n", string[word][letter - 1] );
            letter = 0;
            ++word;
        }
        else
        {
            ++letter;
        }
    }
    return 0;
}
The program output is
r
g

How can I move all vowels from a string to a different array

I am trying to write a function in C that takes 2 parameters (char *string_1, char *string_2), which will move all vowels from string_1 to string_2. I wrote a function to do this, but the output is not what I expected. The contents of string_2 seem to be quite random and sometimes contain non-vowels.
I wrote a while loop to iterate over string_1, with an if statement to check for vowels. Using pointers, I tried to set the contents of string_2 + i to the vowel. Then I set the spot where the vowel used to be in string_1 to ' '. I tried several different algorithms, and traced them too, but I'm not sure where the problem is. Also, I am not permitted to use the string library. Any help is greatly appreciated.
#include<stdio.h>
#define MAX_STR_LEN 2048
void moveVowels(char *string_1, char *string_2)
{
    int i;
    i=0;
    while (*(string_1 + i) != '\0')
    {
        if (*(string_1 + i) == 'a' || *(string_1 + i) == 'A' ||
            *(string_1 + i) == 'e' || *(string_1 + i) == 'E' ||
            *(string_1 + i) == 'i' || *(string_1 + i) == 'I' ||
            *(string_1 + i) == 'o' || *(string_1 + i) == 'O' ||
            *(string_1 + i) == 'u' || *(string_1 + i) == 'U')
        {
            *(string_2 + i) = *(string_1 + i);
            *(string_1 + i) = ' ';
        }
        i++;
    }
}
int main()
{
    char stringy[MAX_STR_LEN]="but they also had a secret, and their greatest fear";
    char vowels[MAX_STR_LEN];
    printf("%s\n",stringy);
    moveVowels(&stringy[0],&vowels[0]);
    printf("%s\n", stringy);
    printf("%s\n", vowels);
    return 0;
}
Expected Output:
but they also had a secret, and their greatest fear
b t th y ls h d s cr t, nd th r gr t st f r
ueaoaaeeaeieaeea
Actual Output:
but they also had a secret, and their greatest fear
b t th y ls h d s cr t, nd th r gr t st f r
\u]
We, beginners, should help each other.:)
Here you are.
#include <stdio.h>
#include <string.h>
#include <ctype.h>
#define MAX_STR_LEN 2048
char * moveVowels( char *s1, const char *s2 )
{
    const char *vowels = "aeiou";
    size_t n = strlen( s2 );
    for ( char *p = s1; *s2 != '\0'; ++s2 )
    {
        if ( strchr( vowels, tolower( ( unsigned char )*s2 ) ) != NULL )
        {
            s1[n++] = *s2;
            *p++ = ' ';
        }
        else
        {
            *p++ = *s2;
        }
    }
    s1[n] = '\0';
    return s1;
}
int main(void)
{
    char s1[MAX_STR_LEN] =
    {
        "but they also had a secret, and their greatest fear"
    };
    char s2[MAX_STR_LEN];
    puts( s1 );
    puts( moveVowels( s2, s1 ) );
    return 0;
}
The program output is
but they also had a secret, and their greatest fear
b t th y ls h d s cr t, nd th r gr t st f rueaoaaeeaeieaeea
Take into account that the destination character array must have enough space to accommodate the vowels after the copy of the source string.
If you want to insert a newline character between the modified string and the vowels, then the function can look the following way.
#include <stdio.h>
#include <string.h>
#include <ctype.h>
#define MAX_STR_LEN 2048
char * moveVowels( char *s1, const char *s2 )
{
    const char *vowels = "aeiou";
    size_t n = strlen( s2 );
    s1[n++] = '\n';
    for ( char *p = s1; *s2 != '\0'; ++s2 )
    {
        if ( strchr( vowels, tolower( ( unsigned char )*s2 ) ) != NULL )
        {
            s1[n++] = *s2;
            *p++ = ' ';
        }
        else
        {
            *p++ = *s2;
        }
    }
    s1[n] = '\0';
    return s1;
}
int main(void)
{
    char s1[MAX_STR_LEN] =
    {
        "but they also had a secret, and their greatest fear"
    };
    char s2[MAX_STR_LEN];
    puts( s1 );
    puts( moveVowels( s2, s1 ) );
    return 0;
}
The program output in this case is
but they also had a secret, and their greatest fear
b t th y ls h d s cr t, nd th r gr t st f r
ueaoaaeeaeieaeea
If you are not allowed to use standard C string functions, then you can replace their calls with loops.
For example, the declaration
size_t n = strlen( s2 );
can be replaced with this code snippet
size_t n = 0;
while ( s2[n] ) ++n;
So all you need to do is replace the call to strchr with a loop of your own.
To keep track of the offset in the destination string, you can either introduce a new variable or just increment the pointer itself.
Option 1
In this case you need a separate variable to keep track of the offset in the destination string. In your moveVowels() function, add a new variable
int j = 0;
and change
*(string_2 + i) = *(string_1 + i);
to
*(string_2 + j++) = *(string_1 + i);
At the end of the moveVowels() function, add a null terminator to the destination string:
*(string_2 + j) = '\0';
Option 2
In this case you just increment the pointer every time you add a new vowel:
*(string_2++) = *(string_1 + i);
Again, don't forget the null terminator:
*string_2 = '\0';
The following proposed code:
- cleanly compiles
- does not use any of the string functions
- performs the desired functionality
- avoids printf(), which is heavier than puts() when no formatting is needed
- avoids expressions like *(string_1 + i) in favor of indexing
- makes use of the ctype.h header file that exposes toupper()
And now, the proposed code:
#include <stdio.h>
#include <ctype.h>
#define MAX_STR_LEN 2048
void moveVowels(char *str, char *extractedVowels)
{
    size_t j = 0;
    for( size_t i = 0; str[i]; i++ )
    {
        int temp = toupper( (unsigned char)str[i] );
        if ( temp == 'A'
        ||   temp == 'E'
        ||   temp == 'I'
        ||   temp == 'O'
        ||   temp == 'U')
        {
            extractedVowels[j] = str[i];
            j++;
            str[i] = ' ';
        }
    }
    extractedVowels[j] = '\0';   // terminate the extracted vowels
}
int main( void )
{
    char stringy[MAX_STR_LEN]="but they also had a secret, and their greatest fear";
    char vowels[MAX_STR_LEN] = {'\0'};
    puts( stringy );
    moveVowels(&stringy[0],&vowels[0]);
    puts( stringy );
    puts( vowels );
    return 0;
}
running the proposed code results in:
but they also had a secret, and their greatest fear
b t th y ls h d s cr t, nd th r gr t st f r
ueaoaaeeaeieaeea
However, the OP's code (and the proposed code) do not handle a trailing 'y' that acts as a vowel.

Match sub-string within a string with tolerance of 1 character mismatch

I was going through some Amazon interview questions on CareerCup.com, and I came across this interesting question which I haven't been able to figure out. I have been thinking about it for 2 days. Either I am taking a completely wrong approach, or it's a genuinely hard function to write.
Question is as follows:
Write a function in C that can find if a string is a sub-string of another. Note that a mismatch of one character should be ignored.
A mismatch can be an extra character: 'dog' matches 'xxxdoogyyyy'
A mismatch can be a missing character: 'dog' matches 'xxxdgyyyy'
A mismatch can be a different character: 'dog' matches 'xxxdigyyyy'
The return value wasn't mentioned in the question, so I assume the signature of the function can be something like this:
char * MatchWithTolerance(const char * str, const char * substr);
If there is a match according to the given rules, return a pointer to the beginning of the matched substring within the string; otherwise, return NULL.
Bonus
If someone can also figure out a generic way of making the tolerance to n instead of 1, then that would be just brilliant.
In that case the signature would be:
char * MatchWithTolerance(const char * str, const char * substr, unsigned int tolerance = 1);
This seems to work; let me know if you find any errors and I'll try to fix them:
#include <stdio.h>
/* mustMatch becomes 1 once the single allowed mismatch has been used.
 * C has no default arguments, so every caller passes it explicitly. */
int findHelper(const char *str, const char *substr, int mustMatch)
{
    if ( *substr == '\0' )
        return 1;
    if ( *str == '\0' )
        return 0;
    if ( *str == *substr )
        return findHelper(str + 1, substr + 1, mustMatch);
    else
    {
        if ( mustMatch )
            return 0;
        if ( *(str + 1) == *substr )             // extra character in str
            return findHelper(str + 1, substr, 1);
        else if ( *str == *(substr + 1) )        // missing character in str
            return findHelper(str, substr + 1, 1);
        else if ( *(str + 1) == *(substr + 1) )  // different character
            return findHelper(str + 1, substr + 1, 1);
        else if ( *(substr + 1) == '\0' )        // only the last character differs
            return 1;
        else
            return 0;
    }
}
int find(const char *str, const char *substr)
{
    int ok = 0;
    while ( *str != '\0' )
        ok |= findHelper(str++, substr, 0);
    return ok;
}
int main()
{
    printf("%d\n", find("xxxdoogyyyy", "dog"));
    printf("%d\n", find("xxxdgyyyy", "dog"));
    printf("%d\n", find("xxxdigyyyy", "dog"));
    return 0;
}
Basically, I make sure only one character can differ, and run the function that does this for every suffix of the haystack.
This is related to a classic problem in computer science, the Levenshtein (edit) distance.
See Wikibooks for a bunch of implementations in different languages.
This is slightly different from the earlier solution, but I was intrigued by the problem and wanted to give it a shot. Optimize it if desired; I just wanted a working solution.
#include <string.h>
char *match(char *str, char *substr, int tolerance)
{
    if (! *substr) return str;
    if (! *str) return NULL;
    while (*str)
    {
        char *str_p;
        char *substr_p;
        char *m;
        str_p = str;
        substr_p = substr;
        /* skip the part that matches exactly */
        while (*str_p && *substr_p && *str_p == *substr_p)
        {
            str_p++;
            substr_p++;
        }
        if (! *substr_p) return str;
        if (tolerance <= 0)
        {
            str++;
            continue;
        }
        /* the rest of substr is short enough to be dropped entirely */
        if (strlen(substr_p) <= (size_t) tolerance) return str;
        /* missed due to a missing letter */
        m = match(str_p, substr_p + 1, tolerance - 1);
        if (m == str_p) return str;
        if (*str_p)  /* str_p + 1 is only valid if str_p is not the terminator */
        {
            /* missed due to a mismatch of letters */
            m = match(str_p + 1, substr_p + 1, tolerance - 1);
            if (m == str_p + 1) return str;
            /* missed due to an extra letter */
            m = match(str_p + 1, substr_p, tolerance - 1);
            if (m == str_p + 1) return str;
        }
        str++;
    }
    return NULL;
}
Is the problem to do this efficiently?
The naive solution is to loop over every substring of str that has the same length as substr, from left to right, and return true if the current substring differs from substr in at most one character. This covers the "different character" case only; an extra or missing character shifts the alignment and needs a different approach.
Let n = size of str
Let m = size of substr
There are O(n) such substrings in str, and comparing each one takes O(m) time. Ergo, the naive solution runs in time
O(n*m)
With an arbitrary number of tolerance levels.
It worked for all the test cases I could think of. Loosely based on |/|ad's solution.
#include<stdio.h>
#include<string.h>
void report (int x, const char *str, const char *sstr, int t) {
    if ( x )
        printf( "%s is a substring of %s for a tolerance[%d]\n", sstr, str, t );
    else
        printf( "%s is NOT a substring of %s for a tolerance[%d]\n", sstr, str, t );
}
int find_with_tolerance (const char *str, const char *sstr, int tol) {
    if ( (*sstr) == '\0' )                  //end of substring, and match
        return 1;
    if ( (*str) == '\0' ) {                 //end of string
        if ( (size_t) tol >= strlen(sstr) ) //but tol saves the day (tol is never negative here)
            return 1;
        else                                //there's nothing even the poor tol can do
            return 0;
    }
    if ( *sstr == *str ) {                  //current char match, smooth
        return find_with_tolerance ( str+1, sstr+1, tol );
    } else {
        if ( tol <= 0 )                     //that's it. no more patience
            return 0;
        size_t len = strlen(str);           //str[i] is only read while i < len
        for(int i=1; i<=tol; i++) {
            if ( (size_t) i < len && str[i] == *sstr )      //insertion of a foreign character
                return find_with_tolerance ( str+i+1, sstr+1, tol-i );
            if ( *str == sstr[i] )                          //deal with deletion
                return find_with_tolerance ( str+1, sstr+i+1, tol-i );
            if ( (size_t) i < len && str[i] == sstr[i] )    //deal with replacement
                return find_with_tolerance ( str+i+1, sstr+i+1, tol-i );
            if ( sstr[i] == '\0' )                          //substr ends, thanks to tol & this loop
                return 1;
        }
        return 0;                           //when all fails
    }
}
int find (const char *str, const char *sstr, int tol ) {
    int w = 0;
    while (*str!='\0')
        w |= find_with_tolerance ( str++, sstr, tol );
    return (w) ? 1 : 0;
}
int main() {
    enum { n = 3 };                     //no of test cases (a constant, so str[n] can be initialized)
    const char *sstr = "dog";           //the substr
    const char *str[n] = { "doox",      //those cases
                           "xxxxxd",
                           "xxdogxx" };
    int t[] = {1,1,0};                  //tolerance levels for those cases
    for(int i = 0; i < n; i++) {
        report( find ( *(str+i), sstr, t[i] ), *(str+i), sstr, t[i] );
    }
    return 0;
}
