How to find unique word number of occurrences? - c

Can't figure out the bug in my code. Every time I input a sentence, the count does increment but the word adds the first letter of the previous word and increments one letter every time. How do I fix this?
void numberOfWordOccurrences(char str[MAX_CHAR]) {
int count = 0, i = 0, j = 0;
char uniqueToken[99][999];
int tokenCount[99] = {0};
while(str[i] != '\0') {
char token[999];
while(str[i] != ' ' && str[i] != '\0') {
token[j++] = str[i++];
}
if(token[j - 1] == ':' || token[j - 1] == ',' || token[j - 1] == '.' || token[j - 1] == ';' || token[j - 1] == '?' || token[j - 1] == '!') {
token[j - 1] = '\0';
}
//null
token[j] = '\0';
//flag
int flag = -1;
for(j = 0; j < count; j++) {
if(strcmp(uniqueToken[j], token) == 0) {
//if flag is valid, then...
flag = j;
tokenCount[flag] = token[flag] + 1;
break;
}
}
if(flag <= 1) {
tokenCount[count] = tokenCount[count] + 1;
strcpy(uniqueToken[count++], token);
}
i++;
}
}

First, reset j to 0 at the top of the main while loop. The search loop for(j = 0; j < count; j++) leaves j at a nonzero value, so for the next word token[j++] = str[i++]; does not start writing at token[0]. That is why letters from the previous word show up.
Second, I believe the condition if(flag <= 1) should be if(flag == -1). If, for example, the first and fifth words are the same, flag is 0 and the word would be copied into uniqueToken a second time.
Also pay attention to the end of the string: the inner loop stops at '\0', then the i++ at the bottom of the outer loop steps past it, so while(str[i] != '\0') tests the character after the terminator. Testing the previous character instead, with while(i == 0 || str[i-1] != '\0'), catches it (the i == 0 part avoids reading str[-1] on the first pass). Before calling the function, also check that the string is not empty (str[0] == '\0').
Here is the result:
void numberOfWordOccurrences(char str[]) {
int count = 0, i = 0, j = 0;
char uniqueToken[99][999];
int tokenCount[99] = { 0 };
while (i == 0 || str[i - 1] != '\0') { // i == 0 guards against reading str[-1]
j = 0;
char token[999];
while (str[i] != ' ' && str[i] != '\0') {
token[j++] = str[i++];
}
if (token[j - 1] == ':' || token[j - 1] == ',' || token[j - 1] == '.' || token[j - 1] == ';' || token[j - 1] == '?' || token[j - 1] == '!') {
token[j - 1] = '\0';
}
//null
token[j] = '\0';
//flag
int flag = -1;
for (j = 0; j < count; j++) {
if (strcmp(uniqueToken[j], token) == 0) {
//if flag is valid, then...
flag = j;
tokenCount[flag] = tokenCount[flag] + 1; // increment the stored count, not a character of token
break;
}
}
if (flag == -1) {
tokenCount[count] = tokenCount[count] + 1;
strcpy(uniqueToken[count++], token);
strcpy(uniqueToken[count], "\0");
}
i++;
}
}
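For comparison, here is a minimal sketch of the same idea written as a testable function. The names count_words, MAX_WORDS and MAX_LEN are assumptions made for this example and are not in the original code. It resets the token index for every word, skips runs of spaces, guards the punctuation check against an empty token, and increments the stored count instead of reading from token.

```c
#include <string.h>

#define MAX_WORDS 99    /* assumed limits, mirroring the array sizes in the question */
#define MAX_LEN   999

/* Stores each distinct word of str in words[] and its number of
   occurrences in counts[]; returns the number of distinct words. */
int count_words(const char *str, char words[][MAX_LEN], int counts[])
{
    int n = 0;
    size_t i = 0;

    while (str[i] != '\0') {
        char token[MAX_LEN];
        size_t j = 0;                        /* restart the token for every word */

        while (str[i] == ' ')                /* skip separators */
            i++;
        while (str[i] != ' ' && str[i] != '\0') {
            if (j < MAX_LEN - 1)             /* silently truncate overlong words */
                token[j++] = str[i];
            i++;
        }
        /* drop one trailing punctuation mark, only if the token is not empty */
        if (j > 0 && strchr(":,.;?!", token[j - 1]) != NULL)
            j--;
        token[j] = '\0';
        if (j == 0)
            continue;                        /* nothing left, e.g. trailing spaces */

        int found = -1;
        for (int k = 0; k < n; k++) {
            if (strcmp(words[k], token) == 0) {
                found = k;
                break;
            }
        }
        if (found >= 0) {
            counts[found]++;                 /* seen before: bump its count */
        } else if (n < MAX_WORDS) {
            strcpy(words[n], token);         /* first occurrence */
            counts[n++] = 1;
        }
    }
    return n;
}
```

The comparison is case sensitive, so "the" and "The" are counted as different words, as in the original code.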

Related

Counting words in string with c

the push to vaccinate children has taken on fresh urgency amid concerns that the new omicron variant of the virus first identified in southern africa and hong kong in late november will spread quickly in the united states causing a surge in infections already back on the rise from the easily transmitted delta variant given the pervasiveness of delta and prospects of new variants spreading in the united states having as much immunity in the population as possible is critical said dr amesh adalja senior scholar at the johns hopkins center for health security
This is my assignment:
Replace multiple spaces between words with a single space and delete unnecessary spaces at the beginning and the end.
Count the words.
Print the edited string.
Don't use a new string, just edit in place.
I can't find the problem. It should count the words, but it doesn't. Please help.
//Counting words program C
#include <stdio.h>
#define N 5000
int main(void) {
FILE *fp;
char text[N];
int k, d, leng, spacecount = 0;
int m, j, z, i, p, n;
if ((fp = fopen("soru.txt", "r")) == NULL) {
printf("File open error!");
return 1;
}
fgets(text, N - 1, fp);
while (k < N && text[k] != '\0') {
leng++;
k++;
}
z = leng;
for (i = 0; i < leng; i++) {
if (i = 0 && text[i] == ' ') {
z--;
for (m = 0; m < leng; m++) {
text[m] = text[m + 1];
}
i--;
text[z] == '\0';
} else
if (text[i] ==' ' && text[i + 1] == ' ') {
z--;
for (j = i; j < leng; j++) {
text[j + 1] = text[j + 2];
}
i--;
text[z] == '\0';
} else
if (text[i] == ' ' && text[i + 1] == '\0') {
z--;
for (j = i; j < leng; j++) {
text[j] = text[j + 1];
}
i--;
text[z] == '\0';
} else
if (text[i] == '\0') {
break;
}
}
while (text[d] != '\0') {
if (text[d] == ' ')
spacecount++;
d++;
}
printf("word count: %d" , spacecount + 1);
printf("\n output:%s ", text);
fclose(fp);
return 0;
}
I think the problem is in this loop:
for(i=0; i < leng; i++) {
if(i=0 && text[i]== ' '){
z--;
for(m=0; m< leng; m++ ){
text[m] = text [m+1];}
i--;
}
else if(1<i<z && text[i] ==' ' && text[i+1] == ' ' ){
z--;
for(j=i; j<leng ; j++) {
text[j+1] = text [j+2];}
i--;
}
else if(i=z && text[i] ==' ' && text[i+1] == '\0' ){
z--;
for(j=i; j<leng ; j++) {
text[j] = text [j+1]; }
i--;
}
} // I think the problem is here: endless loop
Your code is too complicated. You can solve the problem with two index variables: one to read characters from the input line, and one to write the characters you keep back into the same buffer.
Keep track of the previous character, starting with a space. A word begins wherever the current character is not a space and the previous one was. That lets you count the words and write a single space before each word except the first on a line.
Here is a modified version:
//Counting words program C
#include <stdio.h>
#include <string.h>
#define N 5000
int main(void) {
FILE *fp;
char text[N];
int total_words = 0;
if ((fp = fopen("soru.txt", "r")) == NULL) {
printf("File open error!\n");
return 1;
}
while (fgets(text, N, fp) != NULL) {
int len = strlen(text);
int word_count = 0;
char c, lastc = ' ';
int i, j;
// strip the trailing newline
if (len > 0 && text[len - 1] == '\n') {
text[--len] = '\0';
}
for (i = j = 0; i < len; i++) {
c = text[i];
if (c != ' ') {
if (lastc == ' ') {
if (word_count > 0) {
// output a space between words
text[j++] = ' ';
}
word_count++;
}
text[j++] = c; // copy the non space character
}
lastc = c;
}
text[j] = '\0'; // set the null terminator
printf("word count: %d\n", word_count);
printf("output: %s\n", text);
total_words += word_count;
}
fclose(fp);
printf("\ntotal word count: %d\n", total_words);
return 0;
}
Note a silly bug in your code: if (i = 0 && text[i] == ' ') is parsed as if ((i = (0 && (text[i] == ' '))) != 0), which is always false and sets i to 0. C expression syntax is very powerful but somewhat error prone and confusing. I advise you to compile with -Wall -Wextra (or -Weverything with clang) to let the compiler warn about potential mistakes.
Similarly, you should not write if (1<i<z && ...: 1<i<z is parsed as (1<i)<z, which compares the result of 1<i (0 or 1) with z and is therefore true whenever z is greater than 1. You must write 1 < i && i < z, or more idiomatically i > 1 && i < z.
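A small sketch makes the difference visible. The helper names chained and in_range are made up for this illustration. The first one spells out with explicit parentheses what the compiler does with 1 < i < z, and the second one is the intended range test.

```c
#include <stdbool.h>

/* What `1 < i < z` computes: (1 < i) yields 0 or 1,
   and that 0 or 1 is then compared with z. */
bool chained(int i, int z)
{
    return (1 < i) < z;
}

/* The intended test: i lies strictly between 1 and z. */
bool in_range(int i, int z)
{
    return i > 1 && i < z;
}
```

With i = 50 and z = 10, chained is true (1 < 10) although 50 is clearly outside the range, while in_range is false.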

Program to find how many words in a text don't contain a specific character

The following program runs without printing anything on the screen, maybe because the loop goes over the null character. I can't understand why and how this happens, and how to fix it.
//program to find how many word in the text doesn't contain p char
#include<stdio.h>
#include<stdbool.h>
#define space ' '
void find_word(char s[]) {
bool wordfound = false;
int i, j = 0, word = 0;
i = 0;
while (s[i]) { //s[i]!='\0' does not
if (s[i] != 'p' && s[i + 1] != space) { //for the first word
wordfound = true;
word++;
}
wordfound = false;
if (s[i] == space && s[i + 1] != space) { //for more than one word in the text
for (j = i + 1; s[j] != space; j++)
if (s[j] != 'p' && s[j + 1] != space)
wordfound = true;
}
if (wordfound) {
word++;
}
wordfound = false;
i = j;
i++;
} //end while loop
printf("Number of words not contain p character%d\n\n", word);
}
int main(void) {
char s[] = {"pppp zzzz ppp ssss dfg sfsfdsf"};
find_word(s);
return 0;
}
There are a few problems with this code, but the main one is that at the end of each iteration you assign j to i. Whenever the inner for loop has not run, j still holds an old value, so i jumps backwards, s[i] never reaches the terminator, and the while loop never ends. Why not keep it simple, like this:
//program to find how many word in the text doesn't contain p char
#include<stdio.h>
#include<stdbool.h>
#define space ' '
void find_word(char s[]) {
bool is_in = false;
short words_count = 0, i = 0;
while (s[i]) {
if (s[i] == 'p') { // if this letter is a 'p', mark the word
is_in = true;
}
if (s[i] == space) { // if it's end of word
if (!is_in) { // check if 'p' is present and increase the count
words_count++;
}
is_in = false;
}
i++;
}
if (!is_in) { // check if the last word has 'p'
words_count++;
}
printf("no. of words without p is %d\n", words_count);
}
int main(void) {
char s[] = {"pppp zzzz ppp ssss dfg sfsfdsf"};
find_word(s);
return 0;
}
The terminating condition of your for loop can never be satisfied for the last word of your input.
if (s[i] == space && s[i + 1] != space) { //for more than one word in the text
for (j = i + 1; s[j] != space; j++)
if (s[j] != 'p' && s[j + 1] != space)
wordfound = true;
}
Here you are checking for a space followed by a non-space, that is, the start of the next word. If you find one, you advance j until you reach another space. What if your string doesn't end with a space? Then the loop runs past the terminating null character.
Instead, stop the loop at either the null character or a space:
if (s[i] == space && s[i + 1] != space) { //for more than one word in the text
for (j = i + 1; s[j] != '\0' && s[j] != space; j++)
if (s[j] != 'p' && s[j + 1] != space)
wordfound = true;
}
And then you set:
wordfound = false;
i = j;
i++;
} //end while loop
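This keeps resetting your loop index. Whenever the for loop above has not run, j still holds an old value, so i is moved backwards and the while loop runs indefinitely. I'm not clear on your reasoning for this assignment.
If you make these edits (stop at the null character and drop i = j), your code terminates, although the count it prints is still not the number of words without 'p' (that would be 4 for this input):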
#include<stdio.h>
#include<stdbool.h>
#define space ' '
void find_word(char s[]) {
bool wordfound = false;
int i, j = 0, word = 0;
i = 0;
while (s[i]) { //s[i]!='\0' does not
if (s[i] != 'p' && s[i + 1] != space) { //for the first word
wordfound = true;
word++;
}
wordfound = false;
if (s[i] == space && s[i + 1] != space) { //for more than one word in the text
for (j = i + 1; s[j] && s[j] != space; j++)
if (s[j] != 'p' && s[j + 1] != space)
wordfound = true;
}
if (wordfound) {
word++;
}
wordfound = false;
i++;
} //end while loop
printf("Number of words not contain p character%d\n\n", word);
}
int main(void) {
char s[] = {"pppp zzzz ppp ssss dfg sfsfdsf"};
find_word(s);
return 0;
}
Output:
Number of words not contain p character24
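Both versions above assume exactly one space between words. As a hedged sketch, the function below tracks whether it is currently inside a word, so leading, trailing and repeated spaces do not produce phantom words. The name count_words_without and its parameters are assumptions made for this example.

```c
#include <stdbool.h>

/* Counts the words in s that do not contain the character ch.
   Words are separated by one or more spaces. */
int count_words_without(const char *s, char ch)
{
    int count = 0;
    bool in_word = false;   /* are we currently inside a word? */
    bool has_ch = false;    /* has the current word contained ch so far? */

    for (;; s++) {
        if (*s == ' ' || *s == '\0') {
            if (in_word && !has_ch)
                count++;            /* a word just ended and ch was absent */
            in_word = false;
            has_ch = false;
            if (*s == '\0')
                break;              /* the terminator also ends the last word */
        } else {
            in_word = true;
            if (*s == ch)
                has_ch = true;
        }
    }
    return count;
}
```

Because the terminator is handled in the same branch as a space, the last word needs no special case after the loop.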

Calculating the % of comment text in a file

I'm trying to calculate the percentage of comment text in a file but I can't figure out what's wrong with my calculation method.
#include <stdio.h>
#include<stdlib.h>
int main()
{
int k, commNum1 = 0, commNum2 = 0, Nbrackets1 = 0, Nbrackets2 = 0, Cbrackets1 = 0, Cbrackets2 = 0, tabs = 0, spaces = 0;
char str[10000];
char ch, file_name[75];
FILE *fp;
char writtenText[2000];
printf("Enter the name of file you wish to see with extension .c or .txt\n");
gets(file_name);
fp = fopen(file_name, "a"); // reads the file
if (fp == NULL)
{
perror("Error while opening the file.\n");
_getche();
exit(EXIT_FAILURE);
}
printf("Enter a sentence:\n");
gets(writtenText);
fprintf(fp, "%s", writtenText);
fclose(fp);
fp = fopen(file_name, "r");
printf("The contents of %s file are :\n\n", file_name);
int i = 0;
while ((ch = fgetc(fp)) != EOF) {
// printf("%c", ch);
str[i] = ch; //printing and storing process
i++;
}
int fsize = i;
for (k = 0; k < fsize; k++) {
if (str[k] == '(')
Nbrackets1++;
}
for (k = 0; k < fsize; k++) {
if (str[k] == ')')
Nbrackets2++;
}
for (k = 0; k < fsize; k++) {
if (str[k] == '{')
Cbrackets1++;
}
for (k = 0; k < fsize; k++) {
if (str[k] == '}')
Cbrackets2++;
}
for (k = 0; k < fsize; k++) {
if (str[k] == '\t')
tabs++;
}
for (k = 0; k < fsize; k++) {
if (str[k] == ' ')
spaces++;
}
for (k = 0; k < fsize; k++) {
if (str[k] == '/' && str[k + 1] == '*') {
while (str[k] != '*' && str[k + 1] != '/') {
commNum1++;
if (str[k] == ' ') {
commNum1--;
}
// printf("commNum1 = %d\n",commNum1); //just to test if my calculations are correct
k++;
}
}
}
for (k = 0; k < fsize; k++) {
if (str[k] == '/' && str[k + 1] == '/') {
while (str[k] != '\n') {
commNum2++;
if (str[k] == ' ') {
commNum2--;
}
// printf("commNum2 = %d\n",commNum2); //just to test if my calculations are correct
k++;
}
}
}
double commAVG = (commNum1 + commNum2) / fsize * 100;
double avgTAS = (tabs + spaces) / 2;
printf("\n\nOccurence of character ( : %d", Nbrackets1);
printf("\nOccurence of character ) : %d", Nbrackets2);
printf("\nOccurence of character { : %d ", Cbrackets1);
printf("\nOccurence of character } : %d ", Cbrackets2);
printf("\nAverage number of spaces and tabulations: %2.f", avgTAS);
printf("\nPercentage of comment text in the file: %2.f%%", commAVG);
fclose(fp);
return 0;
}
My view is that the for loop goes through the whole array in which the text is stored. If it meets a specific set of characters (/* or //), it starts adding 1 to an int. While adding, if it finds a space, it subtracts 1. If it meets another specific character or set of characters (*/ or \n), it stops adding, and the for loop takes over and finishes searching through the rest of the array. The problem is that it's calculating something else and I can't figure out the flaw in my method. Thanks!
Let's do a little walkthrough (the kind of thing you should do with your debugger):
for (k = 0; k < fsize; k++) {
if (str[k] == '/' && str[k + 1] == '*') {
while (str[k] != '*' && str[k + 1] != '/') {
commNum1++;
if (str[k] == ' ') {
commNum1--;
}
// printf("commNum1 = %d\n",commNum1); //just to test if my calculations are correct
k++;
}
}
}
Consider the text "/* abc */"
if (str[0] == '/' && str[1] == '*') // true
while (str[0] != '*' && str[1] != '/') // true
commNum1++;
k++;
while (str[1] != '*' && str[2] != '/') // false, cause str[1] == '*'
End of story.
You should first advance k past the comment opener and then change the while condition:
while (str[k] != '*' || str[k + 1] != '/') // instead of &&
Also, in loops where you use look-ahead, adjust your bounds
for (k = 0; k < (fsize - 1); k++) // instead of k < fsize
Maybe you have more errors, but this is the obvious one.
Edit:
Since you mentioned the 400% problem:
You potentially count the same comment in both commNum1 and commNum2 if it is written like //* comment text or /*// comment text */
Also, your inner while loops don't check k < fsize, so for a comment on the last line of the file the scan can run past the end of the array. That is undefined behavior, and it may keep counting characters beyond the file's contents until 400% is reached.
Things I'm not going to address further:
/\
* comment starts here, because a backslash at the end of a line splices it with the next one, so the two lines merge into /*
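Putting the points above together, here is a hedged sketch of a single-pass counter. The names comment_chars and comment_percent are assumptions made for this example. It steps past the opener before scanning, checks the array bound on every look-ahead, and visits each comment only once, so //* text is not counted twice. It deliberately ignores string literals and backslash-newline splicing. The percentage is computed in floating point, because (commNum1 + commNum2) / fsize in the question is an integer division.

```c
#include <stddef.h>

/* Counts the characters inside comments in buf[0..n-1], excluding the
   comment delimiters and excluding ' ' (as in the question).
   Simplifying assumption: string literals and line splicing are ignored. */
size_t comment_chars(const char *buf, size_t n)
{
    size_t count = 0;
    size_t k = 0;

    while (k < n) {
        if (k + 1 < n && buf[k] == '/' && buf[k + 1] == '*') {
            k += 2;                               /* step past the opener */
            while (k < n && !(k + 1 < n && buf[k] == '*' && buf[k + 1] == '/')) {
                if (buf[k] != ' ')
                    count++;
                k++;
            }
            k += 2;                               /* step past the closer, if any */
        } else if (k + 1 < n && buf[k] == '/' && buf[k + 1] == '/') {
            k += 2;
            while (k < n && buf[k] != '\n') {     /* a line comment ends at newline */
                if (buf[k] != ' ')
                    count++;
                k++;
            }
        } else {
            k++;                                  /* ordinary character */
        }
    }
    return count;
}

/* Percentage computed in floating point to avoid integer division. */
double comment_percent(size_t comment, size_t total)
{
    return total == 0 ? 0.0 : 100.0 * (double)comment / (double)total;
}
```

For the text "/* abc */" the function returns 3, and an unterminated comment simply ends at the end of the buffer.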

Why is there an Invalid Write here (Valgrind)

I am coding a shell. When I run it like this: cat /dev/urandom | valgrind ./myshell to test whether I get a segfault or other errors, valgrind sometimes tells me that I have an Invalid Write in the function my_wordcpy at this line: tab[++j] = str[*i];
It doesn't happen every time, but it does happen, and I just can't see why. Here is my code:
static int count_words(char *str, char *sep)
{
int quote;
int words;
int i;
i = -1;
if (count_quotes(str) == -1)
return (0);
words = 0;
quote = 0;
while (str[++i] != '\0')
{
if (str[i] == '"')
{
if (quote == 0)
quote = 1;
else
quote = 0;
}
if (quote == 0
&& (is_cinside(sep, str[i]) == 0 && str[i] != '\t' &&
(is_cinside(sep, str[i + 1]) == 1 ||
str[i + 1] == '\t' || str[i + 1] == '\0')))
++words;
}
return (words);
}
static int my_wordlen(char *str, int *i, char *sep)
{
int quote;
int j;
j = 0;
quote = 0;
while (str[++(*i)] != '\0')
if (str[*i] == '"' && quote == 0)
quote = 1;
else if (quote == 1 || (quote == 0 && is_cinside(sep, str[*i]) == 0 &&
str[*i] != '\t'))
{
++j;
if ((quote == 1 && str[*i + 1] == '"') ||
(quote == 0 && (is_cinside(sep, str[*i + 1]) == 1 ||
str[*i + 1] == '\t' ||
str[*i + 1] == '\0')))
{
if (quote == 1 && str[*i + 1] == '"')
++(*i);
return (j);
}
}
return (-1);
}
static char *my_wordcpy(char *tab, char *str, int *i, char *sep)
{
int quote;
int j;
j = -1;
quote = 0;
while (str[++(*i)] != '\0')
if (str[*i] == '"' && quote == 0)
quote = 1;
else if (quote == 1 || (quote == 0 &&
is_cinside(sep, str[*i]) == 0 && str[*i] != '\t'))
{
tab[++j] = str[*i]; /* here is the invalid write. */
if ((quote == 1 && str[*i + 1] == '"') ||
(quote == 0 && (is_cinside(sep, str[*i + 1]) == 1 ||
str[*i + 1] == '\t' || str[*i + 1] == '\0')))
{
if (quote == 1 && str[*i + 1] == '"')
++(*i);
tab[++j] = '\0';
return (tab);
}
}
return (NULL);
}
char **my_quotetowordtab(char *str, char *sep)
{
char **tab;
int words;
int i;
int j;
int k;
i = -1;
j = -1;
k = -1;
if (str == NULL)
return (NULL);
words = count_words(str, sep);
if ((tab = malloc(sizeof(char *) * (words + 1))) == NULL)
return (NULL);
while (++i < words)
{
if ((tab[i] = malloc(sizeof(char) * (my_wordlen(str, &j, sep) + 1)))
== NULL)
return (NULL);
tab[i] = my_wordcpy(tab[i], str, &k, sep);
}
tab[i] = NULL;
return (tab);
}
my_wordlen can return -1, and you don't check for this before passing the result to malloc. In that case 0 bytes are allocated (-1 + 1), so a heap buffer overflow occurs in my_wordcpy.
What happens if str contains a single " or an odd number of " quote characters? It seems your code won't check for \0 in that case, so it could write past the end of tab. I think you need to move your NUL character check outside of the second if clause to catch both cases.
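As a minimal sketch of the first point, the length can be validated before it reaches malloc. The helper alloc_word is hypothetical and not part of the original code. It only shows the guard, not the full tokenizer.

```c
#include <stdlib.h>

/* Allocates room for a word of len characters plus the terminator.
   Returns NULL when len is negative (the "no word found" result)
   or when malloc itself fails. */
char *alloc_word(int len)
{
    if (len < 0)
        return NULL;
    return malloc((size_t)len + 1);
}
```

In my_quotetowordtab, the value returned by my_wordlen would go through a check like this, and the function would stop (freeing what it has allocated so far) instead of calling my_wordcpy on a zero-byte buffer.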

How to remove quotes from a string in C

I am trying to remove all quotes in a given line, except when a quote is preceded by a backslash.
This is what I have done:
for (int i = 0; i < lineLength; i ++) {
if (line[i] == '"' ) {
if (line[i-1] == '\\') // if \" is used
line[i-1] = '"'; // then print \
line[i] = '\0'; // or 0
}
}
This removes all characters in the line. What can I do to remove only the quotes?
Any help would be appreciated.
Your problem is line[i] = '\0'; - it terminates the string.
If you want to remove characters from a C string, you need to hold two indices - one for reading and one for writing, loop over the read index reading each character, and write only the ones you want to keep using the second index.
Something along the lines of:
int j = 0;
for (int i = 0; i < lineLength; i ++) {
if (line[i] != '"' && line[i] != '\\') {
line[j++] = line[i];
} else if (line[i+1] == '"' && line[i] == '\\') {
line[j++] = '"';
} else if (line[i+1] != '"' && line[i] == '\\') {
line[j++] = '\\';
}
}
// terminate the shortened string, even when nothing was kept
line[j] = '\0';
You are setting the first " character you find to the null character, terminating the string.
Also an aside, but line[i-1] could cause a segmentation fault when i == 0, or it could happen to contain \ in which case the first quote wouldn't be stripped.
Something like this will do what you want:
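size_t len = strlen(line);
char *lineWithoutQuotes = malloc(len + 1); // +1 for the terminator
size_t i, j = 0;
if (len > 0 && line[0] != '"')
    lineWithoutQuotes[j++] = line[0];
for (i = 1; i < len; i++) {
    if (line[i] == '"' && line[i-1] != '\\')
        continue; // skip a quote that is not preceded by a backslash
    lineWithoutQuotes[j++] = line[i];
}
lineWithoutQuotes[j] = '\0';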
The normal technique using indexes is:
int j = 0;
for (int i = 0; i < lineLength; i++)
{
if (line[i] == '\\')
{
line[j++] = line[i++];
line[j++] = line[i];
if (line[i] == '\0')
break;
}
else if (line[i] != '"')
line[j++] = line[i];
}
line[j] = '\0';
Using pointers (and not needing lineLength), it is:
char *dst = line;
char *src = line;
char c;
while ((c = *src++) != '\0')
{
if (c == '\\')
{
*dst++ = c;
if ((c = *src++) == '\0')
break;
*dst++ = c;
}
else if (c != '"')
*dst++ = c;
}
*dst = '\0';
Or minor variations on those themes...
int newPos = 0;
for (int oldPos = 0; oldPos < lineLength; oldPos++) {
    // skip a quote unless it is preceded by a backslash
    if (!(line[oldPos] == '"' && (oldPos == 0 || line[oldPos-1] != '\\'))) {
        line[newPos] = line[oldPos];
        newPos++;
    }
}
line[newPos] = 0;
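To make the read-index/write-index idea easy to check, here is a sketch wrapped in a function. The name strip_quotes is an assumption made for this example. It follows the variants that copy a backslash together with the character after it, so \" survives unchanged and every other quote is dropped.

```c
#include <stddef.h>

/* Removes every unescaped '"' from line in place and returns the new length.
   A backslash and the character that follows it are copied unchanged. */
size_t strip_quotes(char *line)
{
    size_t r = 0;   /* read index */
    size_t w = 0;   /* write index, never ahead of r */

    while (line[r] != '\0') {
        if (line[r] == '\\' && line[r + 1] != '\0') {
            line[w++] = line[r++];   /* the backslash */
            line[w++] = line[r++];   /* the escaped character */
        } else if (line[r] == '"') {
            r++;                     /* drop the quote */
        } else {
            line[w++] = line[r++];   /* keep everything else */
        }
    }
    line[w] = '\0';                  /* always terminate the shortened string */
    return w;
}
```

Because the write index never overtakes the read index, no second buffer is needed, and the terminator is written even when every character was removed.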
