Finding the longest common word in two strings in C?

I have had a problem with finding the longest common word in two strings for some time now. First I had an idea of doing it with "isspace" function, but couldn't figure out how to find a common word. Then "strcmp" came to my mind, but so far I was only able to compare two strings. I was thinking of some way to incorporate strcmp and isspace in order to find the different words and then use a temp value to find the longest one, but I couldn't think of the correct code to do so.
#include <stdio.h>
int strcmp(char s[],char t[]);
void main()
{
char s[20],t[20];
printf("Type in a string s.\n");
gets(s);
printf("Type in a string t.\n");
gets( t );
printf("The result of comparison=%d\n",strcmp(s,t));
return 0;
}
int strcmp(char s[],char t[])
{
int i;
for(i=0;s[i]==t[i];i++)
if(s[i]=='\0')
return( 0 );
return(s[i]-t[i]);
}
Please help me with this one. All ideas (and code) are welcomed and appreciated. Thank you in advance!
Edit::
I have been battling with this one for a while and I think I have the solution, however it's a very rigid method. The program has a bug, probably with array "ptrArray1", but I cannot fix it.
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
int returnArrayOfWords (char* str4Parsing, char* arrayParsed[])
{
// returns the length of array
char seps[] = " \t\n"; // separators
char *token = NULL;
char *next_token = NULL;
int i = 0;
// Establish string and get the first token:
token = strtok( str4Parsing, seps);
// While there are tokens in "str4Parsing"
while ((token != NULL))
{
// Get next token:
arrayParsed[i] = token;
//printf( " %s\n", arrayParsed[i] );//to be commented
token = strtok( NULL, seps);
i++;
}
return i;
}
void printArr(char *arr[], int n)
{
int i;
for ( i = 0; i < n; i++)
{
printf("Element %d is %s \n", i, arr[i]);
}
}
void findLargestWord(char *ptrArray1[], int sizeArr1, char *ptrArray2[], int sizeArr2)
{
int maxLength = 0;
char *wordMaxLength = NULL ;
int i = 0, j = 0;
char *w1 = NULL, *w2 = NULL; /*pointers*/
int currLength1 = 0, currLength2 = 0 ;
//printArr(&ptrArray1[0], sizeArr1);
//printArr(&ptrArray2[0], sizeArr2);
for (i = 0; i < sizeArr1; i++)
{
// to find the largest word in the array
w1 = (ptrArray1[i]); // value of address (ptrArray1 + i)
currLength1 = strlen(w1);
//printf("The word from the first string is: %s and its length is : %d \n", w1, currLength1); // check point
for (j = 0; j < sizeArr2; j++)
{
w2 = (ptrArray2[j]); // value of address (ptrArray2 + j)
currLength2 = strlen(w2);
//printf("The word from the second string is : %s and its length is : %d \n", w2, currLength2); // check point
if (strcoll(w1, w2) == 0 && currLength1 == currLength2)
// compares the strings
{
if (currLength2 >= maxLength)
// in the variable maxLength -> the length of the longest word
{
maxLength = currLength2;
wordMaxLength = w2;
printf("The largest word for now is : %s and its length is : %d \n", wordMaxLength, maxLength); // check point
}
}
}
}
printf("The largest word is: %s \n", wordMaxLength);
printf("Its length is: %d \n", maxLength);
}
int main ()
{
int n = 80; /*max number of words in string*/
char arrS1[80], arrS2[80];
char *ptrArray1 = NULL, *ptrArray2 = NULL;
int sizeArr1 = 0, sizeArr2 = 0;
// to allocate memory:
ptrArray1 = (char*)calloc(80, sizeof(char));
if(ptrArray1 == NULL)
{
printf("Error! Memory for Pointer 1 is not allocated.");
exit(0);
}
ptrArray2 = (char*)calloc(80, sizeof(char));
if(ptrArray2 == NULL)
{
printf("Error! Memory for Pointer 2 is not allocated.");
exit(0);
}
printf("Type your first string: ");
fgets(arrS1, 80, stdin);
sizeArr1 = returnArrayOfWords (arrS1, &ptrArray1); // sizeArr1 = number of elements in array 1
printf("Type your second string: ");
fgets(arrS2, 80, stdin);
sizeArr2 = returnArrayOfWords (arrS2, &ptrArray2); // sizeArr2 = number of elements in array 2
findLargestWord(&ptrArray1, sizeArr1, &ptrArray2, sizeArr2);
free(ptrArray1);
free(ptrArray2);
return 0;
}
I also tried to use the latter two posted solutions, but I have trouble working with them, as stated below.
Any help with my code, fixing my problems with the latter solutions or coming up with new solutions is welcomed. Thank you all in advance!
PS. I'm sorry if my code is poorly placed. I'm still not very good with using the placement.

There are a large number of ways to approach the problem. Below, a pointer to each character in one of the strings is used to search the other for matching characters using strchr. After matching characters are found, a comparison loop runs, advancing each of the pointers to determine the length of the common substring present, if any.
The routine (match characters, check substring length, repeat) continues so long as strchr returns a valid pointer. Each time a longer substring is found, the max length is updated for return and the substring is copied to r with strncpy and nul-terminated, so that the text of the longest common substring is available to the calling function, main here.
This is a rather brute force method, and there may be a few additional tweaks to improve efficiency. The function itself is:
/** return length of longest common substring in 'a' and 'b'.
* by searching through each character in 'a' for each match
* in 'b' and comparing substrings present at each match. the
* size of the longest substring is returned, the text of the
* longest common substring is copied to 'r' and made available
* in the calling function. (the lengths should also be passed
* for validation, but that is left as an exercise)
*/
size_t maxspn (const char *a, const char *b, char *r)
{
if (!a||!b||!*a||!*b) return 0; /* validate parameters */
char *ap = (char *)a; /* pointer to a */
size_t max = 0; /* max substring char */
for (; *ap; ap++) { /* for each char in a */
char *bp = (char *)b; /* find match in b with strchr */
for (; *bp && (bp = strchr (bp, *ap)); bp++) {
char *spa = ap, *spb = bp; /* search ptr initialization */
size_t len = 0; /* find substring len */
for (; *spa && *spb && *spa == *spb; spa++, spb++) len++;
if (len > max) { /* if max, copy to r */
strncpy (r, ap, (max = len));
r[max] = 0; /* nul-terminate r */
}
}
}
return max;
}
The length max is returned, and the updates made to r during execution leave r holding the text of the longest common substring.
Another improvement is replacing gets, which was removed from the standard library in C11 because of its security risk. It should no longer be used by any sane coder (that should cover about 40% of us). Putting the remaining bits together, a small bit of test code could be:
#include <stdio.h>
#include <string.h>
#define MAXC 128
size_t maxspn (const char *a, const char *b, char *r);
void rmlf (char *s);
int main (void) {
char res[MAXC] = "", s[MAXC] = "", t[MAXC] = "";
printf ("Type in a string 's': ");
if (!fgets (s, MAXC, stdin)) { /* validate 's' */
fprintf (stderr, "error: invalid input for 's'.\n");
return 1;
}
rmlf (s); /* remove trailing newline */
printf ("Type in a string 't': ");
if (!fgets (t, MAXC, stdin)) { /* validate 't' */
fprintf (stderr, "error: invalid input for 't'.\n");
return 1;
}
rmlf (t); /* remove trailing newline */
/* obtain longest common substring between 's' and 't' */
printf ("\nThe longest common string is : %zu ('%s')\n",
maxspn (s, t, res), res);
return 0;
}
/** return length of longest common substring in 'a' and 'b'.
* by searching through each character in 'a' for each match
* in 'b' and comparing substrings present at each match. the
* size of the longest substring is returned, the text of the
* longest common substring is copied to 'r' and made available
* in the calling function. (the lengths should also be passed
* for validation, but that is left as an exercise)
*/
size_t maxspn (const char *a, const char *b, char *r)
{
if (!a||!b||!*a||!*b) return 0; /* validate parameters */
char *ap = (char *)a; /* pointer to a */
size_t max = 0; /* max substring char */
for (; *ap; ap++) { /* for each char in a */
char *bp = (char *)b; /* find match in b with strchr */
for (; *bp && (bp = strchr (bp, *ap)); bp++) {
char *spa = ap, *spb = bp;
size_t len = 0; /* find substring len */
for (; *spa && *spb && *spa == *spb; spa++, spb++) len++;
if (len > max) { /* if max, copy to r */
strncpy (r, ap, (max = len));
r[max] = 0; /* nul-terminate r */
}
}
}
return max;
}
/** remove trailing newline from 's'. */
void rmlf (char *s)
{
if (!s || !*s) return;
for (; *s && *s != '\n'; s++) {}
*s = 0;
}
Example Use/Output
$ ./bin/strspn
Type in a string 's': a string with colors123456789 all blue
Type in a string 't': a string without colors1234567890 all red
The longest common string is : 16 (' colors123456789')
or, another that may be easier to visualize:
$ ./bin/strspn
Type in a string 's': green eel
Type in a string 't': cold blue steel
The longest common string is : 3 ('eel')
Look over the code and compare it with the other answers. Let me know if you have any additional questions. There are a few other validations that should be added to ensure text is not written beyond the ends of buffers, etc. Hopefully this will provide a bit of help or an alternative approach.
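The buffer validation mentioned above (passing the size of r so that nothing is written past its end) could look something like the sketch below. The name maxspn_n and the rsize parameter are assumptions made for illustration and are not part of the original function; the search itself is the same brute-force approach.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical bounded variant of maxspn: 'rsize' is the capacity of 'r'.
 * Returns the full length of the longest common substring; at most
 * rsize - 1 characters of it are copied to 'r', which is always
 * nul-terminated. */
size_t maxspn_n (const char *a, const char *b, char *r, size_t rsize)
{
    if (!r || !rsize) return 0;              /* nowhere to store a result */
    *r = 0;                                  /* empty result by default */
    if (!a || !b || !*a || !*b) return 0;    /* validate parameters */

    size_t max = 0;
    for (const char *ap = a; *ap; ap++) {    /* for each char in a */
        const char *bp = b;
        for (; (bp = strchr (bp, *ap)); bp++) {   /* each match in b */
            size_t len = 0;
            while (ap[len] && ap[len] == bp[len]) /* common run length */
                len++;
            if (len > max) {                 /* longer than before: copy */
                size_t n = len < rsize - 1 ? len : rsize - 1;
                memcpy (r, ap, n);
                r[n] = 0;
                max = len;
            }
        }
    }
    return max;
}
```

A caller would pass sizeof res as the last argument, e.g. maxspn_n (s, t, res, sizeof res). If the returned length is not smaller than the buffer size, the text in r was truncated.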
Additional Substrings
Just to make sure you and I are seeing the same thing, I have included additional examples of use below. There is no error, and the code performs as intended. If you are having trouble modifying the code, please let me know what you are attempting to do and I can help. Each of the pointer increments in my code above is validated. If you change anything regarding the pointer increment or nul-termination, the code will not work unless you account for the changes in the validations as well.
$ ./bin/strspn
Type in a string 's': 1
Type in a string 't':
The longest common string is : 0 ('')
$ ./bin/strspn
Type in a string 's': A man a plan a canal panama
Type in a string 't': a man a plan a river panama
The longest common string is : 14 (' man a plan a ')
$ ./bin/strspn
Type in a string 's': this is my favorite string
Type in a string 't': this is my favoritist string
The longest common string is : 18 ('this is my favorit')
$ ./bin/strspn
Type in a string 's': not the same until here
Type in a string 't': cant be equal till here
The longest common string is : 6 ('l here')
$ ./bin/strspn
Type in a string 's': some str with ten in the middle
Type in a string 't': a string often ignorded
The longest common string is : 5 ('ten i')
Longest Common Word
OK, now that I finally understand what you are trying to accomplish: you can select the longest common word between the two strings 's' and 't' by tokenizing each string with strtok, saving a pointer to each word in each string in separate pointer arrays, and then simply iterating over the pointer arrays to select the longest common word (the first one, if several common words have the same length). Something as simple as the following is all you need.
NOTE strtok modifies the strings 's' and 't', so make a copy if you need to preserve the originals.
/** return length of longest common word in 'a' and 'b'.
* by tokenizing each word in 'a' & 'b' and iterating over
* each, returning the length of the longest match, and updating
* 'r' to contain the longest common word.
*/
size_t maxspnwhole (char *a, char *b, char *r)
{
if (!a||!b||!*a||!*b) return 0; /* validate parameters */
char *arra[MAXC] = {NULL}, *arrb[MAXC] = {NULL};
char *ap = a, *bp = b; /* pointers to a & b */
char *delim = " .,-;\t\n"; /* word delimiters */
size_t i, j, len, max, na, nb; /* len, max, n-words */
len = max = na = nb = 0;
/* tokenize both strings into pointer arrays */
for (ap = strtok (a, delim); ap; ap = strtok (NULL, delim))
arra[na++] = ap;
for (bp = strtok (b, delim); bp; bp = strtok (NULL, delim))
arrb[nb++] = bp;
for (i = 0; i < na; i++) /* select longest common word */
for (j = 0; j < nb; j++)
if (*arra[i] == *arrb[j]) /* 1st chars match */
if (!strcmp (arra[i], arrb[j])) { /* check word */
len = strlen (arra[i]);
if (len > max) { /* if longest */
max = len; /* update max */
strcpy (r, arra[i]); /* copy to r */
}
}
return max;
}
Integrating it with the other code, you can compare the results like this:
#include <stdio.h>
#include <string.h>
#define MAXC 128
size_t maxspn (const char *a, const char *b, char *r);
size_t maxspnwhole (char *a, char *b, char *r);
void rmlf (char *s);
int main (void) {
char res[MAXC] = "", s[MAXC] = "", t[MAXC] = "";
printf ("Type in a string 's': ");
if (!fgets (s, MAXC, stdin)) { /* validate 's' */
fprintf (stderr, "error: invalid input for 's'.\n");
return 1;
}
rmlf (s); /* remove trailing newline */
printf ("Type in a string 't': ");
if (!fgets (t, MAXC, stdin)) { /* validate 't' */
fprintf (stderr, "error: invalid input for 't'.\n");
return 1;
}
rmlf (t); /* remove trailing newline */
/* obtain longest common substring between 's' and 't' */
printf ("\nThe longest common string is : %zu ('%s')\n",
maxspn (s, t, res), res);
/* obtain longest common word between 's' and 't' */
printf ("\nThe longest common word is : %zu ('%s')\n",
maxspnwhole (s, t, res), res);
return 0;
}
/** return length of longest common word in 'a' and 'b'.
* by tokenizing each word in 'a' & 'b' and iterating over
* each, returning the length of the longest match, and updating
* 'r' to contain the longest common word.
*/
size_t maxspnwhole (char *a, char *b, char *r)
{
if (!a||!b||!*a||!*b) return 0; /* validate parameters */
char *arra[MAXC] = {NULL}, *arrb[MAXC] = {NULL};
char *ap = a, *bp = b; /* pointers to a & b */
char *delim = " .,-;\t\n"; /* word delimiters */
size_t i, j, len, max, na, nb; /* len, max, n-words */
len = max = na = nb = 0;
/* tokenize both strings into pointer arrays */
for (ap = strtok (a, delim); ap; ap = strtok (NULL, delim))
arra[na++] = ap;
for (bp = strtok (b, delim); bp; bp = strtok (NULL, delim))
arrb[nb++] = bp;
for (i = 0; i < na; i++)
for (j = 0; j < nb; j++)
if (*arra[i] == *arrb[j])
if (!strcmp (arra[i], arrb[j])) {
len = strlen (arra[i]);
if (len > max) {
max = len;
strcpy (r, arra[i]);
}
}
return max;
}
/** return length of longest common substring in 'a' and 'b'.
* by searching through each character in 'a' for each match
* in 'b' and comparing substrings present at each match. the
* size of the longest substring is returned, the text of the
* longest common substring is copied to 'r' and made available
* in the calling function. (the lengths should also be passed
* for validation, but that is left as an exercise)
*/
size_t maxspn (const char *a, const char *b, char *r)
{
if (!a||!b||!*a||!*b) return 0; /* validate parameters */
char *ap = (char *)a; /* pointer to a */
size_t max = 0; /* max substring char */
for (; *ap; ap++) { /* for each char in a */
char *bp = (char *)b; /* find match in b with strchr */
for (; *bp && (bp = strchr (bp, *ap)); bp++) {
char *spa = ap, *spb = bp;
size_t len = 0; /* find substring len */
for (; *spa && *spb && *spa == *spb; spa++, spb++) len++;
if (len > max) { /* if max, copy to r */
strncpy (r, ap, (max = len));
r[max] = 0; /* nul-terminate r */
}
}
}
return max;
}
/** remove trailing newline from 's'. */
void rmlf (char *s)
{
if (!s || !*s) return;
for (; *s && *s != '\n'; s++) {}
*s = 0;
}
Example Use/Output
$ ./bin/strlongestcmn
Type in a string 's': I have a huge boat.
Type in a string 't': I have a small boat.
The longest common string is : 9 ('I have a ')
The longest common word is : 4 ('have')
Look it over and let me know if you have any further questions.
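Regarding the note that strtok modifies 's' and 't': one way to keep the originals intact is to tokenize private copies instead. The following is only a sketch under a few assumptions. The names copystr and common_word_copy, the MAXWORDS cap, and the rsize parameter are all made up for illustration, and words that do not fit in r are simply skipped.

```c
#include <stdlib.h>
#include <string.h>

#define MAXWORDS 64   /* assumed cap on the number of words per string */

/* heap copy of 's' (hypothetical helper, freed by the caller) */
static char *copystr (const char *s)
{
    size_t n = strlen (s) + 1;
    char *p = malloc (n);
    if (p) memcpy (p, s, n);
    return p;
}

/* Longest common word of 'a' and 'b', copied into 'r' (capacity rsize).
 * Tokenizes private copies, so 'a' and 'b' are left untouched.
 * Returns the word length, or 0 if there is none or allocation fails. */
size_t common_word_copy (const char *a, const char *b, char *r, size_t rsize)
{
    const char *delim = " .,-;\t\n";         /* word delimiters */
    char *wa[MAXWORDS], *wb[MAXWORDS];
    size_t na = 0, nb = 0, max = 0;

    if (!a || !b || !r || !rsize) return 0;
    *r = 0;

    char *ca = copystr (a), *cb = copystr (b);
    if (ca && cb) {
        /* tokenize the copies, never more than MAXWORDS words each */
        for (char *p = strtok (ca, delim); p && na < MAXWORDS; p = strtok (NULL, delim))
            wa[na++] = p;
        for (char *p = strtok (cb, delim); p && nb < MAXWORDS; p = strtok (NULL, delim))
            wb[nb++] = p;
        for (size_t i = 0; i < na; i++)
            for (size_t j = 0; j < nb; j++) {
                size_t len = strlen (wa[i]);
                /* longer than the best so far, fits in r, and equal */
                if (len > max && len < rsize && !strcmp (wa[i], wb[j])) {
                    max = len;
                    memcpy (r, wa[i], len + 1);
                }
            }
    }
    free (ca);                               /* free (NULL) is harmless */
    free (cb);
    return max;
}
```

Because the inputs are const and never written to, this version can also be called with string literals, and the order in which it is called relative to maxspn no longer matters.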

This example prints the longest common substring (or substrings, if several share the maximum length) of two input strings.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int *lcommon(char *str1, char *str2) {
int strlen1 = (unsigned) strlen(str1);
int strlen2 = (unsigned) strlen(str2);
int i, j, k;
int longest = 0;
int **ptr = malloc(2 * sizeof(int *));
static int *ret;
ret = calloc((unsigned) strlen1 + 1, sizeof(int));
for (i = 0; i < 2; i++)
ptr[i] = calloc((unsigned) strlen2, sizeof(int));
k = 0;
for (i = 0; i < strlen1; i++) {
memcpy(ptr[0], ptr[1], strlen2 * sizeof(int));
for (j = 0; j < strlen2; j++) {
if (str1[i] == str2[j]) {
if (i == 0 || j == 0) {
ptr[1][j] = 1;
} else {
ptr[1][j] = ptr[0][j-1] + 1;
}
if (ptr[1][j] > longest) {
longest = ptr[1][j];
k = 0;
ret[k++] = longest;
}
if (ptr[1][j] == longest) {
ret[k++] = i;
ret[k] = -1;
}
} else {
ptr[1][j] = 0;
}
}
}
for (i = 0; i < 2; i++)
free(ptr[i]);
free(ptr);
ret[0] = longest;
return ret;
}
int main(int argc, char *argv[]) {
int i, longest, *ret;
if (argc != 3) {
printf("usage: longest-common-substring string1 string2\n");
exit(1);
}
ret = lcommon(argv[1], argv[2]);
if ((longest = ret[0]) == 0) {
printf("There is no common substring\n");
exit(2);
}
i = 0;
while (ret[++i] != -1) {
printf("%.*s\n", longest, &argv[1][ret[i]-longest+1]);
}
exit(0);
}
Test
$ ./a.out computerprogramming javaprogrammer
programm
You can read more about the problem here.
You can also use an interactive program where you write the strings in the console:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int *lcommon(char *str1, char *str2) {
int strlen1 = (unsigned) strlen(str1);
int strlen2 = (unsigned) strlen(str2);
int i, j, k;
int longest = 0;
int **ptr = malloc(2 * sizeof(int *));
static int *ret;
ret = calloc((unsigned) strlen1 + 1, sizeof(int));
for (i = 0; i < 2; i++)
ptr[i] = calloc((unsigned) strlen2, sizeof(int));
k = 0;
for (i = 0; i < strlen1; i++) {
memcpy(ptr[0], ptr[1], strlen2 * sizeof(int));
for (j = 0; j < strlen2; j++) {
if (str1[i] == str2[j]) {
if (i == 0 || j == 0) {
ptr[1][j] = 1;
} else {
ptr[1][j] = ptr[0][j - 1] + 1;
}
if (ptr[1][j] > longest) {
longest = ptr[1][j];
k = 0;
ret[k++] = longest;
}
if (ptr[1][j] == longest) {
ret[k++] = i;
ret[k] = -1;
}
} else {
ptr[1][j] = 0;
}
}
}
for (i = 0; i < 2; i++)
free(ptr[i]);
free(ptr);
ret[0] = longest;
return ret;
}
int main(int argc, char *argv[]) {
int i, longest, *ret;
if (argc != 3) {
//printf("usage: longest-common-substring string1 string2\n");
char s[20], t[20];
printf("Type in a string s.\n");
fgets(s, 20, stdin);
printf("Type in a string t.\n");
fgets(t, 20, stdin);
ret = lcommon(s, t);
if ((longest = ret[0]) == 0) {
printf("There is no common substring\n");
exit(2);
}
i = 0;
while (ret[++i] != -1) {
printf("%.*s\n", longest, &s[ret[i] - longest + 1]);
}
//printf("The result of comparison=%d\n", strcmp(s, t));
exit(0);
} else { }
ret = lcommon(argv[1], argv[2]);
if ((longest = ret[0]) == 0) {
printf("There is no common substring\n");
exit(2);
}
i = 0;
while (ret[++i] != -1) {
printf("%.*s\n", longest, &argv[1][ret[i] - longest + 1]);
}
exit(0);
}
Test
Type in a string s.
string1
Type in a string t.
string2
string

Related

How to print a certain character of a string in a certain position

Is there a way to print a specified character of a string in a certain position?
For example:
char str[]="Hello";
Output:
o
I have to print the letter "o" in the position indicated by its index (in this case "4");
You can replace 4 below with any number from 0 to 4.
char kth = *(str + 4);
printf("\n%c", kth);
output:
o
This will print out the nth character from the string, padded on the left and right with spaces
void print_nth_padded(const char *str, size_t n) {
    size_t len = strlen(str);            /* compute the length once */
    for (size_t i = 0; i < n; i++)
        putchar(' ');                    /* puts(" ") would also print a newline */
    putchar(str[n]);
    for (size_t i = n + 1; i < len; i++)
        putchar(' ');
}
So for example, print_nth_padded("Hello", 4); will print out 4 spaces, followed by o.
I hope this code can help.
It simply verifies that the given index makes sense (that it is within the bounds of the array), and prints spaces before it prints the actual character.
#include <stdio.h>
#include <string.h>
void print_char(const char * string, size_t index)
{
if (index >= strlen(string))
{
printf("The index is not correct\n");
return;
}
size_t i = 0;
while (i < index)
{
printf(" ");
i++;
}
printf("%c\n",string[i]);
}
Note: calling printf this many times is not optimized, but I thought to leave the code like that for clarity. A more optimised version would first construct the output array of characters and once ready print it.
So we could use malloc for this need:
void print_char(const char * string, size_t index)
{
size_t buffer_size = strlen(string);
if (index >= strlen(string))
{
printf("The index is not correct\n");
return;
}
char * buf = malloc(buffer_size + 1); /* needs #include <stdlib.h> */
if (buf == NULL)
{
printf("Memory allocation failed\n");
return;
}
size_t i = 0;
while (i < index)
{
buf[i] = ' ';
i++;
}
buf[i] = string[i];
buf[i+1] = '\0';
//At this point your buffer is constructed
printf("%s\n",buf);
free(buf);
}
You can have a function that prints the whitespace character index times. When index is reached, the character in that position is displayed.
void display(const char* str, const int index)
{
for (size_t i = 0; i < index; i++)
putc(' ', stdout);
putc(str[index], stdout);
}
Note that it is your responsibility to verify index does not cross the size of the character array. For example,
const char str[] = "Hello";
size_t str_len = sizeof(str) / sizeof(str[0]) - 1; /* exclude the terminating '\0' */
const int index = 7;
if (index < 0 || (size_t)index >= str_len) { /* index 0 is valid */
fprintf(stderr, "error: Attempted to access out-of-bound!\n");
return EXIT_FAILURE;
}
If we consider the full version of the program, we have:
#include <stdio.h>
#include <stdlib.h> /* EXIT_SUCCESS, EXIT_FAILURE */
void display(const char* str, const int index)
{
for (size_t i = 0; i < index; i++)
putc(' ', stdout);
putc(str[index], stdout);
}
int main(void)
{
const char str[] = "Hello";
size_t str_len = sizeof str / sizeof str[0] - 1; // exclude the terminating '\0'
const int index = 4;
// Verification of the validity of the array index (index 0 is valid).
if (index < 0 || (size_t)index >= str_len) {
fprintf(stderr, "error: Attempted to access out-of-bound!\n");
return EXIT_FAILURE;
}
// Calling the function safely.
display(str, index);
return EXIT_SUCCESS;
}
Now, we have the following output:
$ gcc -std=c99 -g 1.c && ./a.out
o
Use the specified character to find its position in the string.
char str[]="Hello";
char specified_character = 'o';
char *position = strchr(str, specified_character);
If found, print the character with left side ' ' padding by passing in a width.
if (position) {
int padding = position - str;
int width = padding + 1;
printf("%*c\n", width, str[padding]);
}
Output:
o
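The same field-width idea also works when the index is given directly rather than found with strchr. Below is a minimal sketch that formats into a buffer so the result can be checked before printing. The function name format_char_at and its parameters are assumptions for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Write str[index], right-aligned in a field of index + 1 columns, into
 * 'out' (capacity 'outsz'). Returns 0 on success, -1 if the index is out
 * of range or the result does not fit. (Hypothetical helper name.) */
int format_char_at (const char *str, size_t index, char *out, size_t outsz)
{
    if (!str || !out || index >= strlen (str))
        return -1;
    /* "%*c" takes the field width from the argument list */
    int n = snprintf (out, outsz, "%*c", (int)index + 1, str[index]);
    return (n < 0 || (size_t)n >= outsz) ? -1 : 0;
}
```

For "Hello" and index 4 the buffer receives four spaces followed by o, which can then be printed with puts. An index of 5 or more is rejected instead of reading past the end of the string.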

Not getting output from string array function in c

I was making a split function in C to use its return value in some programs. But when I checked its value using printf, I discovered that there are some errors but I was unable to fix them myself. I fixed most of the errors I could.
The code I wrote is:
#include <stdio.h>
#include <string.h>
char **split(char *token, char *delimiter, int *a[], int *size_of_a) {
int i = 0;
char **final_result;
char *str = strtok(token, delimiter);
while (str != NULL) {
*a[i] = strlen(str); //maybe one of the errors but I don't know how to fix it
//even after removing a[i] by backslash and changing it in loop and main, there is still no output received in main
getch();
for (int j = 0; j < *a[i]; j++) {
final_result[i][j] = str[j];
}
str = strtok(NULL, delimiter);
i++;
}
*size_of_a = i;
return final_result;
}
int main() {
char *parameter_1;
char *parameter_2;
int *size_1;
int size_2;
printf("Enter the token: ");
scanf("%s", &parameter_1);
printf("\nEnter the delimiter: ");
scanf("%s", &parameter_2);
char **result_2 = split(parameter_1, parameter_2, &size_1, &size_2);
printf("\nThe result is:");
for (int x = 0; x < size_2; x++) {
printf('\n');
for (int y = 0; y < size_1[x]; y++) {
printf("%c", result_2[x][y]);
}
}
getch();
return 0;
}
How can I fix the output error?
There are multiple problems in the code:
You do not allocate space for the array of pointers: final_result is uninitialized, storing anything via dereferencing it has undefined behavior, most likely a segmentation fault.
You should use strcspn() and strspn() to compute the number of tokens, allocate the array (with or without an extra slot for a NULL terminator), and then perform a second phase that splits the tokens and stores the pointers in the array. You might want to store copies of the tokens to avoid modifying the original string, which may be constant or go out of scope.
printf('\n'); is invalid: you must pass a string, not a character constant.
scanf("%s", &parameter_1); also has undefined behavior: you pass the address of a pointer instead of a pointer to an array of char.
Here is a modified version:
#ifdef _MSC_VER
#define _CRT_SECURE_NO_WARNINGS
#endif
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#ifdef _MSC_VER
// define POSIX function strndup if not available
char *strndup(const char *s, size_t n) {
size_t len;
for (len = 0; len < n && s[len]; len++)
continue;
char *ptr = malloc(len + 1);
if (ptr) {
memcpy(ptr, s, len);
ptr[len] = '\0';
}
return ptr;
}
#endif
char **split(const char *str, const char *delimiters, int **a, int *size_of_a) {
int i, count, len;
char **final_result;
const char *p;
// phase 1: count the number of tokens
p = str + strspn(str, delimiters);
for (count = 0; *p; count++) {
p += strcspn(p, delimiters);
p += strspn(p, delimiters);
}
// phase 2: allocate the arrays
final_result = calloc(count + 1, sizeof(*final_result));
if (final_result == NULL)
return NULL;
if (a) {
*a = calloc(count, sizeof(**a));
}
if (size_of_a) {
*size_of_a = count;
}
// phase 3: copy the tokens
p = str;
for (i = 0; i < count; i++) {
p += strspn(p, delimiters); // skip the delimiters
len = strcspn(p, delimiters); // count the token length
if (a) {
(*a)[i] = len;
}
final_result[i] = strndup(p, len); // duplicate the token
p += len;
}
final_result[count] = 0;
return final_result;
}
// read and discard the rest of the user input line
int flush_input(void) {
int c;
while ((c = getchar()) != EOF && c != '\n')
continue;
return c;
}
int main() {
char buf[256];
char delimiters[20];
printf("Enter the string: ");
if (scanf("%255[^\n]", buf) != 1)
return 1;
flush_input();
printf("\nEnter the delimiters: ");
if (scanf("%19[^\n]", delimiters) != 1)
return 1;
flush_input();
int *sizes;
int count;
char **array = split(buf, delimiters, &sizes, &count);
printf("\nThe result is:\n");
for (int x = 0; x < count; x++) {
for (int y = 0; y < sizes[x]; y++) {
putchar(array[x][y]);
}
printf("\n");
}
getchar();
return 0;
}
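Since this version of split() stores heap copies of the tokens, the caller eventually has to release them. A small cleanup helper might look like the sketch below. The name free_tokens is hypothetical and not part of the answer; it relies on the array being NULL-terminated, as final_result is above.

```c
#include <stdlib.h>

/* Free a NULL-terminated array of heap-allocated strings and the array
 * itself, returning how many tokens were released.
 * (free_tokens is a hypothetical helper, not part of the original answer.) */
size_t free_tokens (char **tokens)
{
    size_t n = 0;
    if (!tokens) return 0;
    for (; tokens[n]; n++)      /* stop at the NULL terminator */
        free (tokens[n]);
    free (tokens);
    return n;
}
```

In main above this would be called as free_tokens(array), followed by free(sizes) for the separately allocated array of token lengths.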

Error when extracting sub-string from the start of source string in C

This is from an exercise in Chapter 9 of Programming in C 4th Edition. The programme is to read in characters into a string and extract a portion of the string into a sub-string by specifying a start position and number of characters.
The programme compiles and runs well except when the zeroth position of the source is stated as the start. Nothing is then displayed.
This is my code.
/* Programme to extract a portion from a string using function
sub-string (source, start, count, result) ex9.4.c
ALGORITHM
Get text input into a char array (declare to be fixed size);
Determine length of source string;
Prepare result array to be dynamic length using desired count + 1;
Copy from source array into result array
*/
#include <stdio.h>
#include <stdbool.h>
#define MAX 501
void read_Line (char buffer[]);
int string_Length (char string[]);
void sub_String (char source[], int start, int count, char result[]);
int main(void)
{
char strSource[MAX];
bool end_Of_Text = false;
int strCount = 0;
printf("This is a programme to extract a sub-string from a source string.\n");
printf("\nType in your text (up to 500 characters).\n");
printf("When you are done, press 'RETURN or ENTER'.\n\n");
while (! end_Of_Text)
{
read_Line(strSource);
if (strSource[0] == '\0')
{
end_Of_Text = true;
}
else
{
strCount += string_Length(strSource);
}
}
// Declare variables to store sub-string parameters
int subStart, subCount;
char subResult[MAX];
printf("Enter start position for sub-string: ");
scanf(" %i", &subStart);
getchar();
printf("Enter number of characters to extract: ");
scanf(" %i", &subCount);
getchar();
// Call sub-string function
sub_String(strSource, subStart, subCount, subResult);
return 0;
}
// Function to get text input
void read_Line (char buffer[])
{
char character;
int i = 0;
do
{
character = getchar();
buffer[i] = character;
++i;
}
while (character != '\n');
buffer[i - 1] = '\0';
}
// Function to determine the length of a string
int string_Length (char string[])
{
int len = 0;
while (string[len] != '\0')
{
++len;
}
return len;
}
// Function to extract substring
void sub_String (char source[], int start, int count, char result[])
{
int i, j, k;
k = start + count;
for (i = start, j = 0; i < k || i == '\0'; ++i, ++j)
{
result[j] = source[i];
}
result[k] = '\0';
printf("%s\n", result);
}
I am using Code::Blocks on Linux Mint.
Having only recently started learning programming with CS50 and the 'Programming in C' book, I did not know how to set up the debugger in Code::Blocks. But thanks to the push by @paulsm4, I managed to get the debugger working. Using the watches window of the debugger, I could see that the while loop in the main function was overwriting the first character in the source array with a null character. The fix was to add a break statement. Thanks to @WhozCraig and @Pascal Getreuer for pointing out other errors that I had missed. This is the corrected code now:
/* Programme to extract a portion from a string using function
sub-string (source, start, count, result) ex9.4.c
ALGORITHM
Get text input into a char array (declare to be fixed size);
Determine length of source string;
Prepare result array to be dynamic length using desired count + 1;
Copy from source array into result array
*/
#include <stdio.h>
#include <stdbool.h>
#define MAX 501
void read_Line (char buffer[]);
int string_Length (char string[]);
void sub_String (char source[], int start, int count, char result[]);
int main(void)
{
char strSource[MAX];
bool end_Of_Text = false;
int strCount = 0;
printf("This is a programme to extract a sub-string from a source string.\n");
printf("\nType in your text (up to 500 characters).\n");
printf("When you are done, press 'RETURN or ENTER'.\n\n");
while (! end_Of_Text)
{
read_Line(strSource);
if (strSource[0] == '\0')
{
end_Of_Text = true;
}
else
{
strCount += string_Length(strSource);
}
break;
}
// Declare variables to store sub-string parameters
int subStart, subCount;
char subResult[MAX];
printf("Enter start position for sub-string: ");
scanf(" %i", &subStart);
getchar();
printf("Enter number of characters to extract: ");
scanf(" %i", &subCount);
getchar();
// Call sub-string function
sub_String(strSource, subStart, subCount, subResult);
return 0;
}
// Function to get text input
void read_Line (char buffer[])
{
char character;
int i = 0;
do
{
character = getchar();
buffer[i] = character;
++i;
}
while (character != '\n');
buffer[i - 1] = '\0';
}
// Function to determine the length of a string
int string_Length (char string[])
{
int len = 0;
while (string[len] != '\0')
{
++len;
}
return len;
}
// Function to extract substring
void sub_String (char source[], int start, int count, char result[])
{
int i, j, k;
k = start + count;
// Stop at the end of the source string in case count exceeds its length
for (i = start, j = 0; i < k && source[i] != '\0'; ++i, ++j)
{
result[j] = source[i];
}
result[j] = '\0';
printf("%s\n", result);
}
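For comparison, here is a sketch of the extraction step that also guards against a start position beyond the end of the source. The name substring_n and its return value are assumptions for illustration. Like the code above, it assumes result has room for count + 1 characters.

```c
#include <stddef.h>

/* Copy up to 'count' characters of 'source', starting at 'start', into
 * 'result' and nul-terminate it. Returns the number of characters copied.
 * Assumes 'result' has room for count + 1 characters.
 * (substring_n is a hypothetical name.) */
int substring_n (const char source[], int start, int count, char result[])
{
    int len = 0, j = 0;

    while (source[len] != '\0')     /* length of source */
        ++len;

    /* copy only if start lies inside the string; stop at count
       characters or at the end of source, whichever comes first */
    if (start >= 0 && start < len)
        for (int i = start; j < count && source[i] != '\0'; ++i, ++j)
            result[j] = source[i];

    result[j] = '\0';
    return j;
}
```

With this shape, a start of 0 is handled like any other position, and a count that runs past the end of the source just yields a shorter result.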

get line in the text for each word

The following program prints the word frequency in a file. For each word in the text file, I am trying to save which line or lines it appeared on and how many times it appeared in the whole file. The count works, but I can't get the line numbers of the txt file.
There is a problem with int lines[]. The output of the program is:
segmentation fault
#define MAXWORDS 10000
#define MAXSTRING 100
/* structure holding word frequency information */
typedef struct _word {
char s[MAXSTRING]; /* the word */
int count; /* number of times word occurs */
int lines[1000];
} word;
void insert_word (word *words, int *n, char *s, int no) {
int i;
/* linear search for the word */
for (i=0; i<*n; i++) if (strcmp (s, words[i].s) == 0) {
/* found it? increment and return. */
words[i].count++;
words[i].lines[words[i].count]=no;
printf("%d", no);
return;
}
/* error conditions... */
if (strlen (s) >= MAXSTRING) {
fprintf (stderr, "word too long!\n");
exit (1);
}
if (*n >= MAXWORDS) {
fprintf (stderr, "too many words!\n");
exit (1);
}
/* copy the word into the structure at the first available slot,
* i.e., *n
*/
strcpy (words[*n].s, s);
/* this word has occurred once up to now, so count = 1 */
words[*n].count = 1;
words[*n].lines[words[*n].count]=no;
/* one more word */
(*n)++;
}
...........
int main () {
word words[MAXWORDS];
char s[1000];
int i, n, m;
n = 0;
FILE* file = fopen("test.txt", "r");
/* read all the words in the file... */
int no=1;
while (fgets(s, sizeof(s), file)) {
scanf ("%s", s);
insert_word (words, &n, s,no);
no=no+1;
}
}
fclose(file);
qsort((void *) words, n, sizeof (word),
(int (*) (const void *, const void *)) wordcmp);
if (n < 20)
m = n;
else
m = 20;
for (i=0; i<m; i++)
printf ("%s\t[%d] {%d} \n", words[i].s, words[i].count, words[i].lines);
}
Here's a solution with dynamic memory allocation:
typedef struct _word {
char *s; /* the word */
int count; /* number of times word occurs */
int *line_numbers; // Array of line numbers
int num_line_numbers; // Size of the array of line numbers
} word;
// Creating a struct to hold the data. I find it's easier
typedef struct {
word *words; // The array of word structs
int num_words; // The size of the array
} word_list;
void insert_word (word_list *words, char *s, int line_number)
{
/* linear search for the word */
for (int i = 0; i < words->num_words; i++) {
if (strcmp (s, words->words[i].s) == 0) {
/* found it? increment and return. */
words->words[i].count++;
// See if it already appeared in this line
if (words->words[i].line_numbers[words->words[i].num_line_numbers - 1] == line_number) {
return;
}
// New line number. Increase the line number array by one
int *tmp = realloc(words->words[i].line_numbers, sizeof(int) * (words->words[i].num_line_numbers + 1));
if (NULL == tmp) exit(EXIT_FAILURE); // out of memory
words->words[i].line_numbers = tmp;
// Add the line number to the array
words->words[i].line_numbers[words->words[i].num_line_numbers] = line_number;
words->words[i].num_line_numbers += 1;
return;
}
}
/* error conditions... */
....
// Increase the size of the word array by one.
word *tmp = realloc(words->words, sizeof(word) * (words->num_words + 1));
if (tmp == NULL) exit(EXIT_FAILURE); // out of memory
words->words = tmp;
/* copy the word into the structure at the first available slot,
* i.e., index words->num_words
*/
words->words[words->num_words].s = malloc(strlen(s) + 1);
if (words->words[words->num_words].s == NULL) exit(EXIT_FAILURE);
strcpy(words->words[words->num_words].s, s);
/* this word has occurred once up to now, so count = 1 */
words->words[words->num_words].count = 1;
words->words[words->num_words].line_numbers = malloc(sizeof(int));
if (words->words[words->num_words].line_numbers == NULL) exit(EXIT_FAILURE);
words->words[words->num_words].line_numbers[0] = line_number;
words->words[words->num_words].num_line_numbers = 1;
words->num_words += 1;
}
bool remove_word(word_list *words, const char *word_to_delete)
{
for (int i = 0; i < words->num_words; i++) {
if (0 == strcmp(words->words[i].s, word_to_delete)) {
// Calc number of words after found word
int number_of_words_to_right = words->num_words - i - 1;
// Free mem
free(words->words[i].s);
free(words->words[i].line_numbers);
// Shift the remaining words left; the regions overlap, so use memmove, not memcpy
memmove(&words->words[i], &words->words[i + 1], sizeof(word) * number_of_words_to_right);
words->num_words -= 1;
// Special case: that was the only word, so release the array itself
if (words->num_words == 0) {
free(words->words);
words->words = NULL;
return true;
}
// Shrink the array (optional; on failure the old, larger block is still valid)
word *tmp = realloc(words->words, sizeof(word) * words->num_words);
if (tmp != NULL) words->words = tmp;
return true;
}
}
return false;
}
And in main()
word_list *words = malloc(sizeof(word_list));
if (NULL == words) exit(EXIT_FAILURE); // out of memory
memset(words, 0, sizeof(word_list));
....
/* read all the words in the file... */
char s[1000];
int line_number = 1;
while (fgets(s, sizeof(s), file)) {
char *word = strtok(s, " ");
while (word != NULL) {
size_t len = strlen(word);
if (len > 0 && word[len - 1] == '\n') word[--len] = 0;
insert_word(words, word, line_number);
word = strtok(NULL, " ");
}
line_number += 1;
}
fclose(file);
for (int i = 0; i < words->num_words; i++) {
printf("%s\t\t[%d] {", words->words[i].s, words->words[i].count);
for (int j = 0; j < words->words[i].num_line_numbers; j++) {
if (j != 0) printf(",");
printf("%d", words->words[i].line_numbers[j]);
}
printf("}\n");
}
// It's good practice to always free memory, although it is not critical
// in this app since the OS reclaims it when the process exits
for (int i = 0; i < words->num_words; i++) {
free(words->words[i].s);
free(words->words[i].line_numbers);
}
free(words->words);
free(words);
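For context on the crash itself: a likely cause of the segmentation fault in the question's code (an inference from the declarations, not something the asker confirmed) is the local array word words[MAXWORDS]. With a 4-byte int, each word struct takes roughly 4 KB, so the array needs about 41 MB of stack, far more than the 1 to 8 MB that many systems allow by default. The sketch below recomputes that size and shows one way to move the array to the heap; words_array_bytes and alloc_words are hypothetical helper names, not part of the original program.

```c
#include <stdio.h>
#include <stdlib.h>

#define MAXWORDS 10000
#define MAXSTRING 100

/* Same layout as the struct in the question */
typedef struct _word {
    char s[MAXSTRING];   /* the word */
    int count;           /* number of times word occurs */
    int lines[1000];
} word;

/* Bytes needed for the whole array, i.e. sizeof(word[MAXWORDS]) */
size_t words_array_bytes(void)
{
    return sizeof(word) * MAXWORDS;
}

/* Hypothetical helper: put the array on the heap instead of the stack.
   calloc zero-initialises it; the caller must check for NULL and free it. */
word *alloc_words(void)
{
    return calloc(MAXWORDS, sizeof(word));
}
```

Two further problems in the question's code are worth checking as well: lines[count] is written without a bounds check, so a word occurring 1000 times or more overruns the array, and printf cannot print the whole lines array with a single %d.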

How to find all occurrences and all positions of a substring in a string?

I need to find all occurrences and output all positions of a substring in a string.
For example: if my string is abaaab and my substring is aa, the positions are 3 and 4, because aa occurs twice (overlapping) in aaa.
At the end, I want the positions to be printed from right to left, and after the positions I want the number of occurrences of my substring.
I tried to do it and I have this:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(){
char *str, c;
int x = 0, y = 1;
str = (char*)malloc(sizeof(char));
printf("Inserisci stringa principale : ");
while (c != '\n') {
// read the input from keyboard standard input
c = getc(stdin);
// re-allocate (resize) memory for character read to be stored
str = (char*)realloc(str, y * sizeof(char));
// store read character by making pointer point to c
str[x] = c;
x++;
y++;
}
str[x] = '\0'; // at the end append null character to mark end of string
printf("\nLa stringa inserita : %s", str);
char *sub, b;
int w = 0, z = 1;
sub = (char*)malloc(sizeof(char));
printf("Immetti sottostringa da cercare : ");
while (b != '\n') {
// read the input from keyboard standard input
b = getc(stdin);
// re-allocate (resize) memory for character read to be stored
sub = (char*)realloc(sub, z * sizeof(char));
// store read character by making pointer point to c
sub[w] = b;
w++;
z++;
}
sub[w] = '\0'; // at the end append null character to mark end of string
char *p1, *p2, *p3;
int i=0,j=0,flag=0;
p1 = str;
p2 = sub;
for(i = 0; i<strlen(str); i++)
{
if(*p1 == *p2)
{
p3 = p1;
for(j = 0;j<strlen(sub);j++)
{
if(*p3 == *p2)
{
p3++;p2++;
}
else
break;
}
p2 = sub;
if(j == strlen(sub))
{
flag = 1;
printf("\nSottostringa trovata all'indice : %d\n",i);
}
}
p1++;
}
if(flag==0)
{
printf("Sottostringa non trovata");
}
free(str);
free(sub);
return (0);
}
But it only shows me the position of the first occurrence, and not the number of occurrences.
There are multiple problems in your code:
Your string reallocation scheme is incorrect: the space allocated is one byte too short for the string, and you never test for memory allocation failure. You could use getline() if your system supports it, or at least write a function to factor out the duplicated code.
c is uninitialized the first time the loop tests c != '\n': this has undefined behavior.
Your matching algorithm is too complicated: you use both index values and moving pointers. Use one or the other.
Here is a simplified version:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
/* read an allocated string from stream.
stop at newline, not included in string.
Return NULL upon EOF
*/
char *my_getline(FILE *stream) {
char *line = NULL;
size_t pos = 0;
int c;
while ((c = getc(stream)) != EOF) {
char *newp = realloc(line, pos + 2);
if (newp == NULL) {
free(line);
return NULL;
}
line = newp;
if (c == '\n')
break;
line[pos++] = (char)c;
}
if (line) {
line[pos] = '\0';
}
return line;
}
int main(void) {
char *str, *sub;
size_t len1, len2, i, count = 0;
// type the main string
printf("Inserisci stringa principale :\n");
str = my_getline(stdin);
// type the substring to search for
printf("Immetti sottostringa da cercare :\n");
sub = my_getline(stdin);
if (str && sub) {
len1 = strlen(str);
len2 = strlen(sub);
for (i = 0; i + len2 <= len1; i++) {
if (!memcmp(str + i, sub, len2)) {
count++;
// substring found at offset
printf("Sottostringa trovata all'indice : %zu\n", i);
}
}
if (count == 0) {
// substring not found
printf("Sottostringa non trovata\n");
}
}
free(str);
free(sub);
return 0;
}
Notes:
The above code finds matches for the empty substring at every offset in the search string. Whether matches should be found or not is a question of specification, but this behavior is consistent with that of strstr().
You could also use the standard function strstr() to locate the matches.
Here is a version of the main loop using strstr():
if (str && sub) {
for (char *p = str; (p = strstr(p, sub)) != NULL; p++) {
count++;
// substring found at offset
printf("Sottostringa trovata all'indice : %td\n", p - str);
if (*p == '\0') /* special case for the empty string */
break;
}
if (count == 0) {
// substring not found
printf("Sottostringa non trovata\n");
}
}
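I've checked your code, and the problem seems to be in the line
if(j == strlen(sub))
Your input loops store the trailing newline in both str and sub, so strlen(sub) is one more than the number of characters you typed, and the inner loop normally breaks with j one less than strlen(sub). Change your code to
if(j+1 == strlen(sub))
and it should solve your problem, except for a match at the very end of the string, where the newline matches as well; stripping the newline from both inputs and keeping the original test avoids that case.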
To get the number of occurrences, you need another variable that is incremented whenever the substring matches. Modify the if block as follows:
if(j+1 == strlen(sub))
{
flag = 1;
occurrences+=1; //declare variable occurrences and initialize it to 0
printf("\nSottostringa trovata all'indice : %d\n",i);
}
Then, after the end of the loop, just print occurrences to get the desired result.
This is also not an efficient way to solve the problem; for better approaches, see
https://www.topcoder.com/community/data-science/data-science-tutorials/introduction-to-string-searching-algorithms/
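As an illustration of what a more efficient approach can look like, here is a minimal sketch of Knuth-Morris-Pratt (KMP) matching, a commonly taught linear-time algorithm. Whether it is the approach the linked tutorial recommends is an assumption; kmp_find_all and its parameters are hypothetical names. It reports overlapping matches as 0-based offsets and, by choice, treats an empty pattern as having no matches.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: stores up to max_out match offsets in out[] and
   returns the total number of matches (which may exceed max_out).
   Returns 0 for an empty pattern and also if the table cannot be allocated. */
size_t kmp_find_all(const char *text, const char *pat, size_t *out, size_t max_out)
{
    size_t n = strlen(text), m = strlen(pat), count = 0;
    if (m == 0 || m > n)
        return 0;
    size_t *fail = malloc(m * sizeof *fail);
    if (fail == NULL)
        return 0;
    /* fail[i] = length of the longest proper prefix of pat[0..i]
       that is also a suffix of pat[0..i] */
    fail[0] = 0;
    for (size_t i = 1, k = 0; i < m; i++) {
        while (k > 0 && pat[i] != pat[k])
            k = fail[k - 1];
        if (pat[i] == pat[k])
            k++;
        fail[i] = k;
    }
    /* Scan the text once; k is the number of pattern characters matched so far */
    for (size_t i = 0, k = 0; i < n; i++) {
        while (k > 0 && text[i] != pat[k])
            k = fail[k - 1];
        if (text[i] == pat[k])
            k++;
        if (k == m) {                  /* full match ending at index i */
            if (count < max_out)
                out[count] = i + 1 - m;
            count++;
            k = fail[k - 1];           /* keep going so overlaps are found */
        }
    }
    free(fail);
    return count;
}
```

For the question's example, calling kmp_find_all("abaaab", "aa", pos, 8) fills pos with offsets 2 and 3 and returns 2; add 1 to each offset if 1-based positions are wanted.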
A simple way of finding each occurrence is to call strstr in a loop. After each match, restart the search one position past the start of that match:
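#include <stdio.h>
#include <string.h>
#include <stddef.h>
int findRecursive(const char* str, const char* toSearch, ptrdiff_t pos, int nrOfOccurences);
int main(void) {
const char *string = "abaaab";
const char *toSearch = "aa";
int nrOfOccurences = 0;
printf("searching for occurrences of '%s' in string '%s':\n", toSearch, string);
const char* pos = string;
while (pos) {
pos = strstr(pos, toSearch);
if (pos) {
printf("found occurrence at position %td\n", pos-string);
nrOfOccurences++;
pos++; // skip one character
}
}
// Alternative: print the positions last-to-first and get the count (defined below)
nrOfOccurences = findRecursive(string, toSearch, 0,0);
printf("nr of occurrences: %d\n", nrOfOccurences);
return 0;
}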
If you need, as the question seems to ask, to print the occurrences starting from the last one, you could use a recursive function like the following. The main function above shows how to call it:
int findRecursive(const char* str, const char* toSearch, ptrdiff_t pos, int nrOfOccurences) {
const char *next = strstr(str, toSearch);
if (next) {
// Parenthesized so the pointer difference is computed first
ptrdiff_t foundPos = pos + (next - str);
nrOfOccurences = findRecursive(next+1, toSearch, foundPos+1, nrOfOccurences+1);
printf("occurrence found at position %td\n", foundPos);
}
return nrOfOccurences;
}
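If recursion depth is a concern for long inputs, an iterative alternative is to collect the offsets first and then print them back to front. This is only a sketch under stated assumptions: collect_matches and print_matches_reversed are hypothetical names, the buffer is capped at 64 positions for brevity, and an empty substring is treated as having no matches.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: stores up to max_out 0-based offsets in out[] and
   returns the total number of (possibly overlapping) matches. */
size_t collect_matches(const char *str, const char *sub, size_t *out, size_t max_out)
{
    size_t count = 0;
    if (*sub == '\0')
        return 0;                      /* assumption: empty substring never matches */
    for (const char *p = str; (p = strstr(p, sub)) != NULL; p++) {
        if (count < max_out)
            out[count] = (size_t)(p - str);
        count++;
    }
    return count;
}

/* Prints the stored positions from the last match to the first,
   followed by the total number of occurrences. */
void print_matches_reversed(const char *str, const char *sub)
{
    size_t pos[64];
    size_t total = collect_matches(str, sub, pos, 64);
    size_t stored = total < 64 ? total : 64;
    for (size_t i = stored; i-- > 0; )
        printf("found at position %zu\n", pos[i]);
    printf("number of occurrences: %zu\n", total);
}
```

Calling print_matches_reversed("abaaab", "aa") reports offset 3, then offset 2, then a count of 2. A dynamically grown array would remove the 64-position cap.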

Resources