Why isn't this case insensitive version of strstr() function working? - c

In an effort to solve a textbook problem, I'm trying to write a case-insensitive version of the C function strstr(). So far, I've run into two problems. The first is that my earlier case-insensitive version worked, but it didn't stop at the first matching string and kept returning characters even when they didn't match.
strstr() is supposed to find the first run of matching characters, up to the count n specified, and then stop. For example, if I wrote "Xehanort" in string A and "Xemnas" in string B and specified 4 as the number, it would return Xe.
The idea behind the case-insensitive version is that I can write "Xehanort" in one string and "xemnas" in the next string and have it return Xe.
However, I've run into a new problem in the new code I've tried: the function doesn't seem to want to run at all. I've tested this, and it turns out the function seems to crash, and I'm not sure how to stop that.
I've tried editing the code and using different for loops, but figured that the code doesn't need to be too sophisticated yet. I've also tried entirely different code from what you are going to read, but that resulted in the problem mentioned earlier.
#include <ctype.h>
#include <stdio.h>
#include <string.h>
#include <limits.h>
#define MAX 100
char *stristr4(const char *p1, const char *p2, size_t num);
int main() {
char c[MAX], d[MAX];
printf("Please enter the string you want to compare.");
gets(c);
printf("Please enter the next string you want to compare.");
gets(d);
printf("The first string to be obtained from \n%s, and \n%s is \n%s",
c, d, stristr4(c, d, MAX));
}
char *stristr4(const char *p1, const char *p2, size_t num) {
const char *str1 = p1;
const char *str2 = p2;
char *str3;
int counter = 0;
for (int i = 0; i < num; i++) {
for (int j = 0; j < num; j++) {
if (tolower(str1[i]) == tolower(str2[j])) {
str3[i] = str1[i];
counter++;
} else {
if (counter > 0) {
break;
} else
continue;
}
}
}
return str3;
}
The code you see will ask for the strings you want to input. Ideally, it should return the input.
Then it should do the stristr function and return the first instance of matching string with case insensitivity.
However, the function I've created doesn't even seem to run.

Your code has undefined behavior (in this case causing a segmentation fault), because you try to store the resulting string via an uninitialized pointer str3.
Standard function strstr returns a pointer to the matching subsequence, you should do the same. The third argument is useless if the first and second arguments are proper C strings.
Here is a modified version:
char *stristr4(const char *p1, const char *p2) {
for (;; p1++) {
for (size_t i = 0;; i++) {
if (p2[i] == '\0')
return (char *)p1;
if (tolower((unsigned char)p1[i]) != tolower((unsigned char)p2[i]))
break;
}
if (*p1 == '\0')
return NULL;
}
}
Notes:
The function tolower(), like the other functions from <ctype.h>, takes an int argument that must have the value of an unsigned char or the special negative value EOF. char arguments must be converted to unsigned char to avoid undefined behavior for negative char values. char can be signed or unsigned by default, depending on the platform and the compiler's settings.
You should never use gets(). This function is obsolete (it was removed from the language in C11) and cannot be used safely with uncontrolled input. Use fgets() and strip the trailing newline:
if (fgets(c, sizeof c, stdin)) {
c[strcspn(c, "\n")] = '\0'; // strip the trailing newline if any
...
}
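To make the pointer-returning convention a bit more concrete, here is a minimal sketch of the same search under a hypothetical name, find_ci(). The name and the sample strings in the note below are assumptions for illustration, not part of the original code.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: case-insensitive search for needle in haystack.
 * Returns a pointer into haystack at the first match, or NULL if none. */
char *find_ci(const char *haystack, const char *needle) {
    for (;; haystack++) {
        for (size_t i = 0;; i++) {
            if (needle[i] == '\0')
                return (char *)haystack;   /* whole needle matched here */
            if (tolower((unsigned char)haystack[i]) !=
                tolower((unsigned char)needle[i]))
                break;                     /* mismatch: try the next start */
        }
        if (*haystack == '\0')
            return NULL;                   /* reached the end, no match */
    }
}
```

With this sketch, find_ci("Xehanort", "HAN") should point at "hanort" inside the first string, while find_ci("Xehanort", "xemnas") should be NULL because the whole second string never occurs in the first.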

A third string could be passed to the function, which then fills that string with the matching characters.
Use fgets instead of gets.
#include <ctype.h>
#include <stdio.h>
#include <string.h>
#define MAX 100
int stristr4(const char* p1, const char *p2, char *same);
int main( void)
{
int comp = 0;
char c[MAX] = "", d[MAX] = "", match[MAX] = "";//initialize to all zero
printf ( "Please enter the string you want to compare. ");
fflush ( stdout);//printf has no newline so make sure it prints
fgets ( c, MAX, stdin);
c[strcspn ( c, "\n")] = 0;//remove newline
printf ( "Please enter the next string you want to compare. ");
fflush ( stdout);//printf has no newline so make sure it prints
fgets ( d, MAX, stdin);
d[strcspn ( d, "\n")] = 0;//remove newline
comp = stristr4 ( c, d, match);
printf ( "Comparison of \n%s, and \n%s is \n%d\n", c, d, comp);
if ( *match) {
printf ( "The matching string to be obtained from \n%s, and \n%s is \n%s\n"
, c, d, match);
}
return 0;
}
int stristr4 ( const char *p1,const char *p2, char *same)
{
//pointers not pointing to zero and tolower values are equal
while ( *p1 && *p2 && tolower ( (unsigned char)*p1) == tolower ( (unsigned char)*p2))
{
*same = tolower ( (unsigned char)*p1);//store the matching character in lowercase
same++;//increment to next character
*same = 0;//zero terminate
p1++;
p2++;
}
return *p1 - *p2;//return difference
}

Related

C using tolower case-insensitive

In my code, I used the tolower function so that letters are removed regardless of case (case insensitive), but my problem is that if my first input is "HELLO" and my second is "hi", the output is "ello" in lowercase letters instead of "ELLO". Is there any way to fix this? Should I not use the tolower function?
#include <stdio.h>
#include <conio.h>
void main()
{
char s1[20],s2[20];
int i,j;
printf("\nEnter string 1:- ");
gets(s1);
printf("\nEnter the string for matching:- ");
gets(s2);
for(int i = 0; s1[i]; i++)
{
s1[i] = tolower(s1[i]);
}
for(int i = 0; s2[i]; i++)
{
s2[i] = tolower(s2[i]);
}
for (i=0;(i<20&&s1[i]!='\0');i++)
{
for (j=0;(j<20&&s2[j]!='\0');j++)
{
if (s1[i]==s2[j])
s1[i]=' ';
}
}
printf("\nString 1 after deletion process is %s",s1);
printf("\nIts compressed form is ");
for (i=0;(i<20&&s1[i]!='\0');i++)
{
if (s1[i]!=' ')
printf("%c",s1[i]);
}
getch();
}
Write a function
Compare the results of tolower() directly — don’t change the strings themselves
Do not use gets() and scanf("%s") — both have no bounds checking
EDIT: sorry, this function simply compares two strings. It is meant to give you an idea of how to use tolower() effectively, not do your work for you. :-)
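#include <iso646.h>   // and, or
#include <stdbool.h>  // bool, true, false
#include <ctype.h>    // tolower
bool is_equal( const char * a, const char * b )
{
    while (*a and *b)
    {
        // cast to unsigned char: tolower() is undefined for negative char values
        if (tolower( (unsigned char)*a ) != tolower( (unsigned char)*b ))
            return false;
        ++a;
        ++b;
    }
    if (*a or *b) return false; // one string is longer than the other
    return true;
}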
Now you can call the function directly.
if (is_equal( "HELLO", "hello" )) ...
Getting a string input from the user in C is always a pain, but you can use fgets() for that.
char s[100]; // the target string (array)
fgets( s, 100, stdin ); // get max 99 characters with null terminator
char * p = strchr( s, '\n' ); // find the Enter key press
if (p) *p = '\0'; // and remove it
puts( s ); // print the string obtained from user
You can always wrap all the annoying stuff for getting strings into a function.
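As a rough sketch of such a wrapper, something like the following could work. The name read_line() and its parameters are made up for this example; it only combines fgets() with newline stripping.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read one line into buf (at most size - 1 characters),
 * strip the trailing newline if there is one, and return buf.
 * Returns NULL on end of file or on a read error. */
char *read_line(char *buf, size_t size, FILE *fp) {
    if (size == 0 || fgets(buf, (int)size, fp) == NULL)
        return NULL;
    buf[strcspn(buf, "\n")] = '\0';   /* no-op if no newline was stored */
    return buf;
}
```

A call such as read_line(s, sizeof s, stdin) then replaces the fgets()/strchr() pair. Note that an overlong line is simply split across calls in this sketch.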
Is there any way to fix this? Should I not use tolower function?
Instead of changing s1[] to lower case, leave s1[] "as is" and change the compare. Still good to change s2[].
// if (s1[i]==s2[j]) s1[i]=' ';
if (tolower(((unsigned char*)s1)[i]) == s2[j]) s1[i]=' ';
tolower(int ch) is well defined for all int values in the unsigned char range and EOF. Since a char may be negative and string processing is best done as unsigned char, use the cast. Also in the s2[] processing.
Also do not use gets(). Research fgets().
Your code has security vulnerabilities because you're using gets() (if the text input by the user is larger than 19 bytes, you'll have buffer overflows on variables s1 and s2). This function is bugged, it's not fixable and should never be used. Instead use, for example, fgets(s1, sizeof(s1), stdin).
The main idea of the problem is that you must preserve the strings, so remove the loops that modify them. In this case the correct predicate for checking if each compared character is the same without regard to case would become:
if (tolower((unsigned char)s1[i]) == tolower((unsigned char)s2[j]))
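Putting that predicate to work, a minimal sketch of the whole deletion step might look like this. The function name remove_matching() is an assumption for illustration, and it compacts s1 in place instead of writing spaces first.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: delete from s1 every character that also occurs
 * in s2, ignoring case, while keeping the case of what remains. */
void remove_matching(char *s1, const char *s2) {
    size_t out = 0;                       /* next write position in s1 */
    for (size_t i = 0; s1[i] != '\0'; i++) {
        int found = 0;
        for (size_t j = 0; s2[j] != '\0'; j++) {
            if (tolower((unsigned char)s1[i]) ==
                tolower((unsigned char)s2[j])) {
                found = 1;
                break;
            }
        }
        if (!found)
            s1[out++] = s1[i];            /* keep the original character */
    }
    s1[out] = '\0';
}
```

With s1 holding "HELLO" and s2 holding "hi", s1 should end up as "ELLO", since neither string is lowercased in place.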

how to change single char in string array?

Having this:
#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>
char *up(char *);
int main() {
char initstr[20];
printf("enter string\n");
fgets(initstr, 20, stdin);
char *str = up(initstr);
printf("%s\n", str);
}
char *up(char *in) {
char *ret;
for (ret = in;
*in != '\n';
*(ret++) = toupper(*(in++))
);
return ret;
}
Run it as:
$./a.out
enter string
abc
#only new line from `printf("%s\n",str);`
From debugger
Hardware watchpoint 3: in
Old value = 0x7fffffffdc20 "abc\n"
New value = 0x7fffffffdc21 "bc\n"
Hardware watchpoint 2: ret
Old value = 0x7fffffffdc20 "abc\n"
New value = 0x7fffffffdc21 "bc\n"
Hardware watchpoint 3: in
Old value = 0x7fffffffdc21 "bc\n"
New value = 0x7fffffffdc22 "c\n"
Hardware watchpoint 2: ret
Old value = 0x7fffffffdc21 "bc\n"
New value = 0x7fffffffdc22 "c\n"
...
I can see that both variables are reducing, but I wanted to change the ret inline, char by char. But at the end (after loop), the ret is reduced to nothing, and the program will only output \n. So how can I achieve this in the loop head?
EDIT:
Thanks to answer below, having in mind I have to return first address of pointer, I can implement loop_head-only function by this:
char *up(char *in){
char *ret;
size_t size=strlen(in);
for(ret=in;
*in!='\n';
*(ret++)=toupper(*(in++))
);
return (ret-size+1);
}
The bug in up is that you increment ret all the way to the newline (\n) and return ret pointing to this character in the string. You should instead return a pointer to the initial character.
It is simpler to write this function using an index.
Packing all the logic into the for clauses with an empty body is hard to read and error prone.
Note also that the string might not contain a newline, so it is safer to stop at the null terminator; the newline will not be changed by toupper().
Finally, you should not pass char values to toupper() because this function and all functions from <ctype.h> are only defined for values of type unsigned char and the special negative value EOF. On platforms where char is signed by default, the string might contain negative char values, which may cause undefined behavior when passed to toupper(). Cast these as (unsigned char) to avoid this issue.
Here is a modified version:
#include <ctype.h>
#include <stdio.h>
char *up(char *s) {
for (size_t i = 0; s[i] != '\0'; i++) {
s[i] = toupper((unsigned char)s[i]);
}
return s;
}
int main() {
char initstr[20];
printf("enter string\n");
if (fgets(initstr, sizeof initstr, stdin)) {
char *str = up(initstr);
printf("%s\n", str);
}
return 0;
}
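If you prefer to keep walking the string with a pointer, as in the question, a minimal sketch is to save the start before the loop and return that. The name up_ptr() is only used here to keep it apart from the versions above.

```c
#include <ctype.h>

/* Pointer-walking variant: remembers the first character so the caller
 * gets back the start of the string, not the position where the loop ended. */
char *up_ptr(char *in) {
    char *start = in;
    for (; *in != '\0'; in++)
        *in = (char)toupper((unsigned char)*in);
    return start;
}
```

For a buffer holding "abc\n", this should return the buffer's own address with the contents changed to "ABC\n".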

My Loop Won't Check All Elements Of Array in C

#define _CRT_SECURE_NO_WARNINGS
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
void displayString (const char *sPtr);
void getString (char *[]);
int determinIfConvert (char);
int main ()
{
char originalString[11] = { 0 };
char convertedString[11];
getString (originalString);
displayString (originalString);
// this loop runs through the "originalString" to check for the char: 'a'
for (int i = 0; i < 11; i++) {
determinIfConvert (originalString[i]);
}
system ("pause");
}
void getString (char *a[]) // this function gets a string
{
printf ("enter 11 char string: \n");
scanf ("%s", a);
}
// this program displays the inputstring
void displayString (const char *sPtr)
{
for (; (*sPtr != '\0'); ++sPtr) {
printf ("%c", *sPtr);
}
}
int determinIfConvert (char *a)
{
if (a == 97) // this is a test condition. The goal is to
// check for all lowercase, but now i'm
// only entering "aaaaa"
{
printf ("Works"); // if it prints multiple"works"
// then i can continue my program
// but it only prints ONE "works" and freezes.
}
}
At the moment I have a problem with my For Loop in main() not finishing. The goal is to enter a string of characters, and then check for lowercase ones. This will be done with the function DeterminIfConvert(char). However, when I run through the loop element by element, it freezes after the second element. My test data is "aaaa" and it prints the "aaaa," so I know that my first two functions work just fine. I get to the loop, it goes through the first element, prints "works" and then freezes. :/
Multiple mistakes
void getString(char *a[])
should be
void getString(char a[])
Since you're sending the base address of an array of char, not an array of pointer to char
char *a[]; // array of pointer to char
char a[]; // array of char
int determinIfConvert(char *a)
should be
int determinIfConvert(char a)
Since you're sending a char, not a pointer to char
char * a; // pointer to char
char a; // char
NOTE:
Use the standard definition of main()
int main(void) //if no command line arguments.
If you are inputting an 11-char string, then you should be doing:
char originalString[12] = { 0 };
This is because you need 1 more character to store the null character '\0'.
That is probably why in your function getString(...), the pointer exceeds the array bounds and might invoke undefined behavior.
Finally, your function prototype for getString(...) should be
void getString(char a[]); //without the *
In addition to the other answers, you have several other areas where you can improve your code.
Avoid using magic numbers in your code (e.g. 11). Instead define a constant for the maximum characters in your string #define MAXC 11 or you can use an enum instead enum { MAXC = 11 };
As it currently sits, you do not protect against overflowing your 11 character array (which means your user can enter no more than 10 characters plus room for the nul-terminating character). To protect against the user entering something more than 10, you should use a field-width specifier with scanf:
scanf ("%10s", a);
That doesn't solve all your problems with scanf. You must check the return value every time to ensure the expected number of conversions takes place, e.g.:
if (scanf ("%10s", a) != 1) {
fprintf (stderr, " -> error: invalid input.\n");
exit (EXIT_FAILURE);
}
That's better, but using %s, you cannot read a string containing whitespace, and you are still leaving a trailing '\n' in the input buffer. If the user enters "my dog", you store "my" only. To fix part of the problem you can use a format specifier of "%10[^\n]%*c". However, you must protect against an endless loop if the user presses [Enter] without other input. To resolve all issues, and prevent leaving the trailing newline in the input buffer, you can use something like:
int getString (char *a) // this function gets a string
{
int c, rtn = 0;
printf ("enter string (10 char or less): ");
while ((rtn = scanf ("%10[^\n]%*c", a)) != 1) {
if (rtn == EOF)
break;
fprintf (stderr, " -> error: invalid input, try again..\n");
printf ("enter string (10 char or less): ");
/* flush input buffer - to avoid endless loop */
while ((c = getchar()) != '\n' && c != EOF) {}
}
return rtn;
}
All of which expose the difficulties using scanf for user input. A better approach may be to use fgets (or getline) to read the complete line of input.
Regardless of whether you use scanf or fgets, etc., you must take a bit of time and care in writing your input handlers to ensure you try to cover all the ways a user could muck up input. Below, fgets is used just to present an alternative. You should also choose a return type that allows you to tell whether you have successfully received input or not. It might as well be a useful return value, such as the length of the input taken.
The remainder of your level of pointer indirection issues have been addressed by other answers. Putting it all together, you could do something like:
#define _CRT_SECURE_NO_WARNINGS
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define MAXC 11
void displayString (const char *sPtr);
int getString (char *);
int determinIfConvert (char);
int main (void)
{
char originalString [MAXC] = "";
// char convertedString[MAXC] = ""; /* currently unused */
if (!getString (originalString)) {
fprintf (stderr, "error: getString failed.\n");
return 1;
}
displayString (originalString);
// this loop runs through the "originalString" to check for the char: 'a'
for (int i = 0; i < MAXC; i++) {
determinIfConvert (originalString[i]);
}
system ("pause");
return 0; /* main() is type 'int' and returns a value */
}
int getString (char *a) // this function gets a string
{
char *p = a;
int c;
size_t len = 0;
printf ("enter string (10 char or less): ");
for (;;) {
p = fgets (a, MAXC, stdin);
if (!p) break; /* handle [CTRL+D] */
if (*p == '\n') { /* handle empty str */
fprintf (stderr, " -> error: invalid input, try again..\n");
printf ("enter string (10 char or less): ");
continue;
}
/* trim newline/flush input buffer */
len = strlen (p);
if (len && a[len - 1] == '\n')
a[--len] = 0;
else /* user entered more than 10 chars */
while ((c = getchar()) != '\n' && c != EOF) {}
break;
}
return (int) len;
}
// this program displays the inputstring
void displayString (const char *sPtr)
{
for (; *sPtr; sPtr++) {
printf ("%c", *sPtr);
}
putchar ('\n');
}
int determinIfConvert (char a)
{
if (a == 97)
printf ("Works\n");
return 0;
}
Example Use/Output
$ ./bin/getdispstr
enter string (10 char or less): my dog has fleas
my dog has
Works
$ ./bin/getdispstr
enter string (10 char or less):
-> error: invalid input, try again..
enter string (10 char or less): my dog has fleas, my cat has none.
my dog has
Works
With CTRL+D (EOF)
$ ./bin/getdispstr
enter string (10 char or less): error: getString failed.
There are many ways to do this, this is just an example. Look over all the answers and let me know if you have questions.
This
char originalString[11] = { 0 };
followed by this
for (int i = 0; i < 11; i++)
{
determinIfConvert(originalString[i]);
}
is causing the problem. The initializer sets every element of the array to 0, so unless getString() fills it correctly, there is nothing but '\0' after index 0 for the loop to examine. I believe that with
getString(originalString); you want to read originalString from user input, which is not done correctly in your case.
You pass an object of type char to a function accepting char *:
char originalString[11] = { 0 };
determinIfConvert(originalString[i]);
int determinIfConvert(char *a)
A string is nothing but a null-terminated sequence of characters, so if you wish to have 11 characters in your string, you should be allocating 12 bytes for your
array, i.e. you may change:
char originalString[11] = { 0 };
to
char originalString[12] = "";
/* Here the string is empty, but because you use double quotes the
* compiler understands that you are initializing a string, so '\0' is
* stored to mark the end of the string (and the remaining bytes are zeroed).
*/
The same applies to convertedString[11]; change it to
char convertedString[12] = "";
Change
void getString(char *a[]);
to
void getString(char a[]); //char *a is also fine
Change
int determinIfConvert(char *a)
to
int determinIfConvert(char a) // You wish to check a character
You may wish to replace
scanf("%s", a);
with
fgets(a,12,stdin);
because scanf with a plain %s can't check for overflows but fgets can. Here you can have up to 11 characters in the string. If the user types more, fgets stops after 11 characters, stores '\0' in the 12th byte, and leaves the rest of the input unread in the input buffer.
You may wish to use the islower function to check if a character is lowercase. So you may change
if (a == 97)
to
if (islower((unsigned char)a)) // check if a character is lowercase.
Remember to include the ctype.h header to use islower()
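As a small sketch of how the check could be applied to the whole string, the loop below stops at the terminator instead of a fixed count. The function name count_lower() is hypothetical and not part of the original program.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: count the lowercase letters in a string,
 * stopping at the null terminator rather than at a fixed array size. */
size_t count_lower(const char *s) {
    size_t n = 0;
    for (; *s != '\0'; s++) {
        if (islower((unsigned char)*s))
            n++;
    }
    return n;
}
```

For the test input "aaaa" this should report 4, one for each character, which is the behaviour the original loop in main() was aiming for.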

Compilation error in strncpy in my code to find and print the longest word

I have written a program to find the longest word and to print it.
My code is:
#include <stdio.h>
#include <string.h>
#include <ctype.h>
int MaxWord(char text[],char[]);
int main (void){
char text[1000];
char word[1000];
int max;
printf("geben Sie den Text bitte : ");
gets(text);
max=MaxWord(text,word);
printf("ist mit %d Zeichen das laengste wort im Text\n\n\n",max);
return 0;
}
int MaxWord(char text[], char word[])
{
char i;
int ctr=0;
int max=0;
int len;
char begin=0;
len=strlen(text);
for(i=0;i<len+1;i++)
{
if(isalpha(text[i]))
{
if(ctr==0)
{
begin=i;
}
ctr++;
}
else
{
if(ctr>max)
{
max=ctr;
}
ctr=0;
}
}
strncpy(word,begin,max);
printf("%s ",word);
return max;
}
and the error is:
error #2140: Type error in argument 2 to 'strncpy'; expected 'const char * restrict' but found 'char'.
How can I fix this?
Firstly, you should not be using the gets() function. Use fgets() instead.
Also see
http://www.cplusplus.com/reference/cstring/strncpy/
The function strncpy expects a const char* ( so that you are assured the function will not modify the source string ) and you are passing it a char. Hence the error. Please modify your function to pass in a char pointer.
You would need to recheck your logic and fix your strncpy call by passing in the right source string.
Your logic in MaxWord is flawed: you always attempt to copy the last word encountered, using the length of the longest one. The type char is inappropriate for i and begin as these are offsets in text potentially larger than 127.
Furthermore, strncpy does not do what you think it does: it is an error-prone function that may not null terminate the destination buffer. Do not use this function.
Do not use gets either because it cannot be used safely: overlong input will cause a buffer overflow.
Here is a corrected version:
int MaxWord(const char *text, char *word) {
    int i, ctr = 0, max = 0, len, begin = 0, best = 0;
    len = strlen(text);
    /* i <= len: the null terminator ends the last word like any separator */
    for (i = 0; i <= len; i++) {
        if (isalpha((unsigned char)text[i])) {
            if (ctr == 0) {
                begin = i;      /* start of the current word */
            }
            ctr++;
        } else {
            if (ctr > max) {
                best = begin;   /* remember where the longest word starts */
                max = ctr;
            }
            ctr = 0;
        }
    }
    memcpy(word, text + best, max);
    word[max] = '\0';
    printf("%s ", word);
    return max;
}
It may seem surprising to cast text[i] as (unsigned char), but isalpha() is defined as taking an int argument with the value of an unsigned char or the constant EOF (usually defined as -1). If your compiler considers char to be a signed type, characters in text with the high bit set will be considered negative and will be sign extended when passed to isalpha potentially invoking incorrect or undefined behavior.
Another problem with your code is the determination of word boundaries: if you type in laengste correctly as längste, isalpha() may incorrectly consider the character or characters encoding the ä as a separator instead of a letter. Welcome to the intricate world of character encodings!
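Coming back to the strncpy remark: when the source has at least n characters, strncpy writes no terminator at all. One possible alternative is a small helper that always terminates the destination. This is only a sketch, and the name copy_prefix() and its parameters are assumptions.

```c
#include <string.h>

/* Hypothetical helper: copy at most n characters of src into dst,
 * truncating to fit dstsize, and always null terminate dst.
 * Returns the number of characters actually copied. */
size_t copy_prefix(char *dst, size_t dstsize, const char *src, size_t n) {
    if (dstsize == 0)
        return 0;
    size_t len = 0;
    while (len < n && src[len] != '\0')   /* never read past the end of src */
        len++;
    if (len > dstsize - 1)
        len = dstsize - 1;                /* leave room for the terminator */
    memcpy(dst, src, len);
    dst[len] = '\0';
    return len;
}
```

Unlike strncpy, this takes the size of the destination as a separate argument, so a word longer than the buffer is truncated instead of overflowing it.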

Using Pointers and strtok()

I'm building a linked list and need your assistance please as I'm new to C.
I need to input a string that looks like this: (word)_#_(year)_#_(DEFINITION(UPPER CASE))
Ex: Enter a string
Input: invest_#_1945_#_TRADE
Basically I'm looking to build a function that scans the DEFINITION and gives me back the word it relates to.
Enter a word to search in the dictionary
Input: TRADE
Output: Found "TRADE" in the word "invest"
So far I've managed to come up with something using the strtok() function, but right now I'm not sure how to print the first word.
Here's what I could come up with:
char split(char words[99],char *p)
{
p=strtok(words, "_#_");
while (p!=NULL)
{
printf("%s\n",p);
p = strtok(NULL, "_#_");
}
return 0;
}
int main()
{
char hello[99];
char *s = NULL;
printf("Enter a string you want to split\n");
scanf("%s", hello);
split(hello,s);
return 0;
}
Any ideas on what should I do?
I reckon that your problem is how to extract the three bits of information from your formatted string.
The function strtok does not work as you think it does: The second argument is not a literal delimiting string, but a string that serves as a set of characters that are delimiters.
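To see the "set of characters" behaviour in isolation, here is a small sketch that just counts the tokens strtok produces. The helper name count_tokens() is made up for this illustration.

```c
#include <string.h>
#include <stddef.h>

/* Hypothetical helper: count the tokens strtok() yields for s.
 * Note that strtok() modifies s, so s must be a writable array. */
size_t count_tokens(char *s, const char *delims) {
    size_t n = 0;
    for (char *tok = strtok(s, delims); tok != NULL; tok = strtok(NULL, delims))
        n++;
    return n;
}
```

With the delimiter string "_#_", a writable copy of "up_to#date_#_2001" should give four tokens (up, to, date, 2001) rather than two, because every single '_' or '#' ends a token. A word or definition that itself contains an underscore would therefore be split.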
In your case, sscanf seems to be the better choice:
#include <stdlib.h>
#include <stdio.h>
int main()
{
const char *line = "invest_#_1945 _#_TRADE ";
char word[41]; /* %40[^_] stores up to 40 characters plus '\0' */
int year;
char def[41]; /* %40s stores up to 40 characters plus '\0' */
int n;
n = sscanf(line, "%40[^_]_#_%d_#_%40s", word, &year, def);
if (n == 3) {
printf("word: %s\n", word);
printf("year: %d\n", year);
printf("def'n: %s\n", def);
} else {
printf("Unrecognized line.\n");
}
return 0;
}
The function sscanf examines a given string according to a given pattern. Roughly, that pattern consists of format specifiers that begin with a percent sign, of spaces which denote any amount of white-space characters (including none), and of other characters that have to be matched verbatim. The format specifiers yield a result, which has to be stored. Therefore, for each specifier, a result variable must be given after the format string.
In this case, there are several chunks:
%40[^_] reads up to 40 characters that are not the underscore into a char array. This is a special case of reading a string. Strings in sscanf are really words and may not contain white space. The underscore, however, would be part of a string, so in order not to eat up the underscore of the first delimiter, you have to use the notation [^(chars)], which means: Any sequence of chars that do not contain the given chars. (The caret does the negation here, [(chars)] would mean any sequence of the given chars.)
_#_ matches the first delimiter literally, i.e. only if the next chars are underscore, hash mark, underscore.
%d reads a decimal number into an integer. Note that the address of the integer has to be given here with &.
_#_ matches the second delimiter.
%40s reads a string of up to 40 non-whitespace characters into a char array.
The function returns the number of matched results, which should be three if the line is valid. The function sscanf can be cumbersome, but is probably your best bet here for quick and dirty input.
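If you want to reuse the pattern for every line of your dictionary, one option is to wrap it in a helper that fills a record. This is only a sketch: the struct entry type and the name parse_entry() are assumptions, and the buffers are sized at 41 bytes so that a 40-character field plus its terminator fits.

```c
#include <stdio.h>

/* Hypothetical record type for one dictionary line. */
struct entry {
    char word[41];   /* up to 40 characters plus the terminator */
    int  year;
    char def[41];    /* up to 40 characters plus the terminator */
};

/* Returns 1 if line has the form word_#_year_#_DEFINITION, else 0.
 * On failure the contents of *e are unspecified. */
int parse_entry(const char *line, struct entry *e) {
    return sscanf(line, "%40[^_]_#_%d_#_%40s",
                  e->word, &e->year, e->def) == 3;
}
```

After a successful call on "invest_#_1945_#_TRADE", e->word should hold "invest" and e->def should hold "TRADE", so a search for a definition can compare against e->def and print e->word.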
#include <stdio.h>
#include <string.h>
char *strtokByWord_r(char *str, const char *word, char **store){
char *p, *ret;
if(str != NULL){
*store = str;
}
if(*store == NULL) return NULL;
p = strstr(ret=*store, word);
if(p){
*p='\0';
*store = p + strlen(word);
} else {
*store = NULL;
}
return ret;
}
char *strtokByWord(char *str, const char *word){
static char *store = NULL;
return strtokByWord_r(str, word, &store);
}
int main(){
char input[]="invest_#_1945_#_TRADE";
char *array[3];
char *p;
int i, size = sizeof(array)/sizeof(char*);
for(i=0, p=input;i<size;++i){
if(NULL!=(p=strtokByWord(p, "_#_"))){
array[i]=p;//strdup(p);
p=NULL;
} else {
array[i]=NULL;
break;
}
}
for(i = 0;i<size;++i)
printf("array[%d]=\"%s\"\n", i, array[i]);
/* result
array[0]="invest"
array[1]="1945"
array[2]="TRADE"
*/
return 0;
}

Resources