How do I get character frequency and highest character frequency? - c

So this is my function. My main focus is to get the character frequencies and the highest character frequency.
The function below (get_letter_frequencies) is supposed to take a string (for example "I am a big boy") and return the character frequencies and the highest character frequency.
The function should return:
i - 2
a - 2
m - 1
b - 2
g - 1
o - 1
y - 1
The letters with the highest frequency would be "iab".
My problem is with the get_letter_frequencies function. What should I change in the function in order to return the above output?
void get_letter_frequencies(const char *text, size_t len, int freq[26], int *max_freq)
{
for(int i = 0; i<len; i++)
{
if(text[i] != ' ' || !(is_sentence_terminator(text[i]))) //this condition is set in order to ignore the spaces and the sentence terminators (! ? .)
{
if(text[i] >= 'a' && text[i] <= 'z')
{
freq[text[i] - 'a']++;
}
}
}
for(int j = 0; j < 26; j++)
{
if(freq[j] >= 1)
{
*max_freq = freq[j];
}
}
}
The function below (is_sentence_terminator) checks whether a character is a sentence terminator ("!", "?" or "."). The idea is that if a sentence does not finish with one of these terminators, it is not a sentence and is ignored.
int is_sentence_terminator(char ch)
{
if(ch == 33 || ch == 46 || ch == 63)
{
return 1;
}else
{
return 0;
}
}

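There are some issues in your code:
There is no need to test for spaces or sentence terminators: comparing text[i] to 'a' and 'z' already skips them, which is sufficient on ASCII systems. Your test text[i] != ' ' || !(is_sentence_terminator(text[i])) is always true anyway, because no character is both a space and a terminator.
Your expected output counts the capital I as i, so uppercase letters must be counted too.
freq is never cleared inside the function, so unless every caller zeroes it first, the counts start from garbage or from a previous call.
In the second loop, you should update *max_freq only if freq[j] is greater than its current value, not whenever freq[j] is at least 1. *max_freq should be initialized to 0 before the loop.
In the calling code, you would also:
print the letters whose frequency is non-zero;
print all letters with the maximum frequency using one final loop.
Here is a modified version:
void get_letter_frequencies(const char *text, size_t len, int freq[26], int *max_freq) {
for (int i = 0; i < 26; i++)
freq[i] = 0;
for (size_t i = 0; i < len; i++) { // size_t matches the type of len
if (text[i] >= 'a' && text[i] <= 'z') {
freq[text[i] - 'a']++; // assuming ASCII
} else if (text[i] >= 'A' && text[i] <= 'Z') {
freq[text[i] - 'A']++; // count uppercase as lowercase
}
}
*max_freq = 0;
for (int i = 0; i < 26; i++) {
if (*max_freq < freq[i]) {
*max_freq = freq[i];
}
}
}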
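The calling-code steps described in this answer (print the non-zero counts, then print every letter that has the maximum count in one final loop) can be sketched as follows. The sketch repeats get_letter_frequencies so it compiles on its own, counting uppercase letters as lowercase, which the expected "i - 2" requires. letters_with_max and print_letter_report are hypothetical helper names, not part of the question.

```c
#include <stdio.h>
#include <string.h>

/* Counts letters case-insensitively (assumes ASCII) and reports the highest count. */
void get_letter_frequencies(const char *text, size_t len, int freq[26], int *max_freq)
{
    for (int i = 0; i < 26; i++)
        freq[i] = 0;
    for (size_t i = 0; i < len; i++) {
        if (text[i] >= 'a' && text[i] <= 'z')
            freq[text[i] - 'a']++;
        else if (text[i] >= 'A' && text[i] <= 'Z')
            freq[text[i] - 'A']++;
    }
    *max_freq = 0;
    for (int i = 0; i < 26; i++)
        if (freq[i] > *max_freq)
            *max_freq = freq[i];
}

/* Hypothetical helper: writes every letter whose count equals max_freq
   into out (at most 26 letters plus '\0') and returns how many it wrote. */
int letters_with_max(const int freq[26], int max_freq, char out[27])
{
    int n = 0;
    if (max_freq > 0)                  /* no letters at all: empty result */
        for (int i = 0; i < 26; i++)
            if (freq[i] == max_freq)
                out[n++] = (char) ('a' + i);
    out[n] = '\0';
    return n;
}

/* Hypothetical calling code: one loop prints the non-zero counts,
   the final loop (inside letters_with_max) collects the maximum letters. */
void print_letter_report(const char *text)
{
    int freq[26], max_freq;
    char best[27];

    get_letter_frequencies(text, strlen(text), freq, &max_freq);
    for (int i = 0; i < 26; i++)
        if (freq[i] != 0)
            printf("%c - %d\n", 'a' + i, freq[i]);
    letters_with_max(freq, max_freq, best);
    printf("Highest character frequency: %s\n", best);
}
```

Calling print_letter_report("I am a big boy") lists the letters alphabetically (a - 2, b - 2, g - 1, i - 2, m - 1, o - 1, y - 1) followed by abi. The question lists letters in order of first appearance instead, which would need a different printing loop.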

Related

Trying to print a string, but getting "##" in place of data

I am trying to get a string from the user, then remove all characters in the string except letters.
The string contains whitespace.
I entered a string, but the output is only "##".
I don't know what's happening.
//C program
//Program to remove all characters in a string, except alphabet
#include <stdio.h>
#include <string.h>
int main()
{
char str[150], copy[150];
int i = 0, j = 0;
printf("\nEnter a string : ");
fgets(str,150,stdin);
for (i = 0; i < 150; i++)
{
if ((str[i] >= 'a' && str[i] <= 'z') ||
(str[i] >= 'A' && str[i] <= 'Z') || (str[i] == '\0') )
{
*(copy + j) == *(str+i);
j++;
}
}
printf("\nResultant String : ");
for (int i = 0; i < strlen(str); i++)
{
printf("%c",copy[i]);
}
printf("\n");
return 0;
}
Here is the program running in the terminal; I input "1 2 3 4 get on the dance floor".
Enter a string : 1 2 3 4 get on the dance floor
Resultant String : ##
The main problem is you're not doing an assignment here:
*(copy + j) == *(str+i);
The == is for comparison. You want =:
*(copy + j) = *(str+i);
Your loop conditions are also incorrect:
for (i = 0; i < 150; i++)
...
for (int i = 0; i < strlen(str); i++)
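For the first loop, you're reading all bytes in the source array instead of just the ones that were set, and in the second one you're using the length of the source string instead of the result string to print the result. Since the corrected first loop no longer copies str's terminating '\0', you also need to terminate copy yourself before calling strlen on it. These should be:
for (i = 0; i < strlen(str); i++)
...
copy[j] = '\0';
for (int i = 0; i < strlen(copy); i++)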
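Putting these fixes together, the copy step could look like the following minimal sketch. keep_letters is a hypothetical helper name; it stops at the source string's '\0' instead of scanning all 150 bytes, uses = rather than ==, and always terminates the result.

```c
#include <string.h> /* size_t */

/* Hypothetical helper: copies only ASCII letters from src into dst and
   null-terminates dst. dst must have room for strlen(src) + 1 chars. */
void keep_letters(const char *src, char *dst)
{
    size_t j = 0;
    for (size_t i = 0; src[i] != '\0'; i++) {
        if ((src[i] >= 'a' && src[i] <= 'z') ||
            (src[i] >= 'A' && src[i] <= 'Z')) {
            dst[j++] = src[i];   /* assignment (=), not comparison (==) */
        }
    }
    dst[j] = '\0';               /* terminate so strlen / printf("%s") work */
}
```

In main, after fgets(str, sizeof str, stdin) succeeds, keep_letters(str, copy); followed by printf("%s\n", copy); prints getonthedancefloor for the input above. The trailing newline that fgets keeps is not a letter, so it is dropped as well.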

CS50-pset2: Readability. Incorrect outcome

When I run my code, the output does not match the expected results for the texts given in the problem set. It does show a grade, but the result is incorrect. The text is: "Would you like them here or there? I would not like them here or there. I would not like them anywhere." (Grade 2)
The expected result for this text is "Grade 2". However, it prints a whole series of grades instead of a single one.
int main(void)
{
string s = get_string("Text: ");
printf("%s\n",s);
int count_letters = 0; //To count letters (uppercase & lowercase)
int count_words = 1; //To count words
int count_sentences = 0; //To count sentences
for (int i = 0; i < strlen(s); i++)
if (isalpha(s[i]))
{
if ((s[i] >= 'a' && s[i] <= 'z' )||( s[i] >= 'A' && s[i] <= 'Z'))
{
count_letters++;
}
if (s[i] == ' ')
{
count_words++;
}
if (s[i] == '.' || s[i] =='!' || s[i] == '?')
{
count_sentences++;
}
//printf("%i count_letter(s)\n", count_letters);
//printf("%i count_words(s)\n", count_words);
//printf("%i sentence(s)\n", count_sentences);
//Coleman-Liau index
float L = (count_letters / (float) count_words) * 100;
float S = (count_sentences / (float) count_words) * 100;
int grade = round (0.0588 * L - 0.296 * S -15.8);
if (grade < 1)
{
printf("Before Grade 1\n");
}
else if (grade >= 16)
{
printf("Grade 16+\n");
}
else
{
printf("Grade %.d\n", grade);
}
}
}
Is there any problem with my code? How can I fix it to get the exact outcome? I've been working on this problem set for almost 2 days :'/. Thanks in advance.
Calculate the number of letters, sentences, and words inside the loop, and calculate the Coleman-Liau index outside of it.
Don't calculate something in a loop and also try to produce specific output from within it; it never ends well. In short: calculate your values in the loop and do everything else outside of it.
Also note that a space or a '.' is not alphabetic, so the word and sentence checks must not sit inside if (isalpha(s[i])), or they will never match:
int count_letters = 0; //To count letters (uppercase & lowercase)
int count_words = 1; //To count words
int count_sentences = 0; //To count sentences
for (int i = 0, n = strlen(s); i < n; i++){
// get the amounts in the loop; a character is at most one of these
if (isalpha(s[i]))
{
count_letters++;
}
else if (s[i] == ' ')
{
count_words++;
}
else if (s[i] == '.' || s[i] == '!' || s[i] == '?')
{
count_sentences++;
}
}
//Calculate Coleman-Liau outside of the loop and get the correct grade from your if statements
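The part that runs after the loop could be sketched as two hypothetical helpers. The question uses round() from <math.h>; the small round_nearest helper below is a stand-in so the sketch links without -lm, which some toolchains need for round().

```c
#include <stdio.h>

/* Rounds to the nearest integer, halves away from zero (like round()). */
static int round_nearest(double x)
{
    return x >= 0 ? (int) (x + 0.5) : -(int) (-x + 0.5);
}

/* Coleman-Liau index from the counts gathered in the loop. The (float)
   casts keep the divisions from being truncated to integers. */
int coleman_liau_grade(int count_letters, int count_words, int count_sentences)
{
    float L = count_letters / (float) count_words * 100;   /* letters per 100 words */
    float S = count_sentences / (float) count_words * 100; /* sentences per 100 words */
    return round_nearest(0.0588 * L - 0.296 * S - 15.8);
}

/* Called once, after the loop, so exactly one line is printed. */
void print_grade(int grade)
{
    if (grade < 1)
        printf("Before Grade 1\n");
    else if (grade >= 16)
        printf("Grade 16+\n");
    else
        printf("Grade %i\n", grade);
}
```

For the question's text, the corrected loop should count 80 letters, 21 words and 3 sentences; print_grade(coleman_liau_grade(80, 21, 3)) then prints "Grade 2" once.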
Since you have three distinct categories to count, I would create a function for each of these.
For example, based on your code, you could create a function to count letters (no isalpha call is needed here, since the explicit range checks do the filtering; note that this version also counts digits):
int get_letters_count(char *text_str)
{
int count_letters = 0;
int text_len = strlen(text_str);
for (int i = 0; i < text_len; i++) {
if ( (text_str[i] >= 'a' && text_str[i] <= 'z')
|| (text_str[i] >= 'A' && text_str[i] <= 'Z')
|| (text_str[i] >= '0' && text_str[i] <= '9')) {
count_letters++;
}
}
return count_letters;
}
This approach of breaking down your program will make it much easier to develop.
Here is a very crude implementation based on the Coleman–Liau index Wikipedia page:
#include <stdio.h>
#include <string.h>
#include <stdbool.h>
int get_letters_count(char *text_str)
{
int count_letters = 0;
int text_len = strlen(text_str);
for (int i = 0; i < text_len; i++) {
if ( (text_str[i] >= 'a' && text_str[i] <= 'z')
|| (text_str[i] >= 'A' && text_str[i] <= 'Z')
|| (text_str[i] >= '0' && text_str[i] <= '9')) {
count_letters++;
}
}
return count_letters;
}
int get_words_count(char *text_str)
{
int count_words = 0;
int text_len = strlen(text_str);
for (int i = 0; i < text_len; i++) {
if (text_str[i] == ' ') {
count_words++;
}
}
if (count_words)
count_words++;
return count_words;
}
bool word_is_acronym(char *word)
{
bool ret = true;
for (; *word && *word != ' '; word++) {
if ( *word != '.'
&& (*word < 'A' || *word > 'Z')) {
ret = false;
}
}
return ret;
}
int get_sentences_count(char *text_str)
{
int count_sentences = 0;
int text_len = strlen(text_str);
char *last_word = &text_str[0];
for (int i = 0; i < text_len; i++) {
if ( text_str[i] == ' '
&& i < (text_len - 1)) {
last_word = &text_str[i + 1];
}
bool end_mark = text_str[i] == '.'
|| text_str[i] == '!'
|| text_str[i] == '?';
if ( end_mark
&& word_is_acronym(last_word) == false) {
count_sentences++;
}
}
return count_sentences;
}
int main(void)
{
char text_str[] = "Existing computer programs that measure readability are based "
"largely upon subroutines which estimate number of syllables, "
"usually by counting vowels. The shortcoming in estimating syllables "
"is that it necessitates keypunching the prose into the computer. "
"There is no need to estimate syllables since word length in letters "
"is a better predictor of readability than word length in syllables. "
"Therefore, a new readability formula was computed that has for its "
"predictors letters per 100 words and sentences per 100 words. "
"Both predictors can be counted by an optical scanning device, and "
"thus the formula makes it economically feasible for an organization "
"such as the U.S. Office of Education to calibrate the readability of "
"all textbooks for the public school system.";
int count_letters = get_letters_count(text_str);
int count_words = get_words_count(text_str);
int count_sentences = get_sentences_count(text_str);
if ( count_letters > 0
&& count_words > 0
&& count_sentences > 0) {
float L = (count_letters * 100.0f) / count_words; // floating-point division, not integer
float S = (count_sentences * 100.0f) / count_words;
float grade = 0.0588 * L - 0.296 * S - 15.8;
printf("grade = %.01f\n", grade);
} else {
printf("bad input\n");
}
}
Output:
$ gcc main.c && ./a.out
grade = 14.5
Parsing text can be very tricky, though.
Once you get a first version working with a known input such as this, try to expand your data set and keep improving your program.
This program is also far from computationally efficient. If that becomes a bottleneck, you could optimize the functions, or reduce the number of loops by grouping the functions into a single loop.
In most cases, it's better to start with a crude working solution and improve it from there than to attempt a more sophisticated or complete solution right from the beginning.
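As one example of such an improvement: get_words_count above counts spaces, so "a  b" (two spaces) gives 3 words, and a one-word text such as "Hello." gives 0, because the final increment only happens when at least one space was found. A hypothetical get_words_count_runs that counts the starts of words instead might look like this (only ' ' is treated as a separator, matching the original).

```c
#include <stdbool.h>

/* Counts words as maximal runs of non-space characters, so repeated,
   leading and trailing spaces and one-word texts are handled.
   Only ' ' is a separator here; tabs and newlines are not. */
int get_words_count_runs(const char *text_str)
{
    int count_words = 0;
    bool in_word = false;
    for (const char *p = text_str; *p != '\0'; p++) {
        if (*p == ' ') {
            in_word = false;
        } else if (!in_word) {
            in_word = true;      /* first character of a new word */
            count_words++;
        }
    }
    return count_words;
}
```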

Mistake in CS50 Readability project with only 1 input, all others work

I seem to have made a mistake in my code but I can't find it.
All reading grades give me the correct grade, except for grade 7 which results in grade 8.
I assume it is a rounding error of some sort?
I tested the following piece of code with and without the round() in the last function.
Without it most of the grade levels are off; with the round() in there, I only get a mistake at the grade 7 level.
Where is my mistake?
#include <cs50.h>
#include <stdio.h>
#include <ctype.h>
#include <math.h>
#include <string.h>
//Prototypes
int count_letters(string text);
int count_words(string text);
int count_sentences(string text);
int get_score (string text);
//Main function
int main(void)
{
//Get user input
string text = get_string("Text: ");
//Grade user text
int i = get_score(text);
if(i < 1)
{
printf("Before Grade 1\n");
}
else if (i > 1 && i < 16)
{
printf("Grade %i\n", i);
}
else
{
printf("Grade 16+\n");
}
}
// Extra functions
int count_letters(string text)
{
// variables
int letters = strlen(text);
int total_letters = 0;
int characters = 0;
// Loop through text and count all non-letters
for(int i = 0; i < letters; i++)
{
if((text[i] < 65 || text[i] > 95) && (text[i] < 97 || text[i] > 122))
{
characters++;
}
}
// substract all non-letters from total chars and return.
total_letters = letters - characters;
return total_letters;
}
int count_words(string text)
{
// variables
int letters = strlen(text);
int spaces = 1;
// Loop through text and count all spaces
for(int i = 0; i < letters; i++)
{
if(text[i] == ' ')
{
spaces++;
}
}
return spaces;
}
int count_sentences(string text)
{
// variables
int letters = strlen(text);
int sentence = 0;
// Loop through text and count all sentences
for(int i = 0; i < letters; i++)
{
if(text[i] == 46 || text[i] == 33 || text[i] == 63)
{
sentence++;
}
}
return sentence;
}
int get_score (string text)
{
//variables
int letters = count_letters(text);
int words = count_words(text);
int sentence = count_sentences(text);
float index = 0;
// letters divided by words * 100
float L = 100 * letters / words;
// sentences divided by words *100
float S = 100 * sentence / words;
index = round(0.0588 * L - 0.296 * S - 15.8);
return index;
}
if((text[i] < 65 || text[i] > 95) && (text[i] < 97 || text[i] > 122)) is almost certainly a bug. You probably meant to be implementing isalpha, but you did it incorrectly. You meant to write:
if((text[i] < 'A' || text[i] > 'Z') && (text[i] < 'a' || text[i] > 'z')), which would have avoided the typo in which 95 was used instead of 90. Instead of this, though, you should just use the standard library and write:
if( ! isalpha(text[i]) ) ...
Using literals like 'A' instead of the magic number 65 makes the code more readable and helps avoid trivial mistakes like this.
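Following that advice, count_letters could be written with isalpha() from <ctype.h>, which the question already includes. This sketch takes const char * instead of the CS50 string type so it compiles without cs50.h. The cast to unsigned char is there because passing a negative char value (possible for non-ASCII bytes) to isalpha is undefined behaviour.

```c
#include <ctype.h>
#include <stddef.h>

/* Counts alphabetic characters with the standard isalpha(). */
int count_letters(const char *text)
{
    int letters = 0;
    for (size_t i = 0; text[i] != '\0'; i++)
    {
        if (isalpha((unsigned char) text[i]))
        {
            letters++;
        }
    }
    return letters;
}
```

Unlike the 65..95 range, this does not count '[', '\\', ']', '^' or '_' as letters.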
There are quite a few issues with your code:
As @IrAM has mentioned in a comment, your if does not handle a score of 1: since i > 1 is false for it, a score of 1 falls through to "Grade 16+". Moreover, you can simplify your if checks if you start from the other end, i.e. first check for 16 or more (your original else branch also printed "Grade 16+" for exactly 16):
int main(void)
{
//Get user input
string text = get_string("Text: ");
//Grade user text
int i = get_score(text);
if(i >= 16)
{
printf("Grade 16+\n");
}
else if (i > 0)
{
printf("Grade %i\n", i);
}
else
{
printf("Before Grade 1\n");
}
}
As @Gerhardh mentions, you are dividing two integers, which truncates the result to an integer before it is stored in the float. If at least one of the operands is cast to float, the other is converted too and the division is done in floating point:
// letters divided by words * 100
float L = 100 * (float) letters / words;
// sentences divided by words *100
float S = 100 * (float) sentence / words;
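To see how the truncation can move a grade by one, here is a sketch comparing both forms on made-up counts (72 letters, 17 words, 1 sentence). The function names are hypothetical, and round_half_away stands in for round() so the sketch needs no -lm.

```c
/* Rounds half away from zero, like round() from <math.h>. */
static int round_half_away(double x)
{
    return x >= 0 ? (int) (x + 0.5) : -(int) (-x + 0.5);
}

/* As in the question: 100 * letters / words is evaluated in int,
   so the fractional part is discarded before the formula runs. */
int score_int_division(int letters, int words, int sentences)
{
    float L = 100 * letters / words;
    float S = 100 * sentences / words;
    return round_half_away(0.0588 * L - 0.296 * S - 15.8);
}

/* Fixed: casting one operand makes both divisions floating-point. */
int score_float_division(int letters, int words, int sentences)
{
    float L = 100 * (float) letters / words;
    float S = 100 * (float) sentences / words;
    return round_half_away(0.0588 * L - 0.296 * S - 15.8);
}
```

With these counts the integer version computes L = 423 and S = 5, giving about 7.59 and grade 8, while the floating-point version computes about 423.53 and 5.88, giving about 7.36 and grade 7. Truncating S removes up to 0.296 from the subtracted term, which is why the integer version tends to come out high.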
Optimizations
You have three different functions for counting words, sentences and letters. Why three loops when you can do it in one loop? On top of that, each function calls strlen(), which is one more pass over the string. Write a Count struct like this (the typedef lets you write Count instead of struct Count):
typedef struct Count
{
int letters;
int words;
int sentences;
int length;
} Count;
Then have one function that returns this struct. Like @WilliamPursell mentions, using character literals instead of ASCII values makes code much more readable:
Count get_count(string text)
{
Count result = {0, 1, 0, 0};
result.length = strlen(text);
int characters = 0;
// Loop through text and count all non-letters
for(int i = 0; i < result.length; i++)
{
if((text[i] < 'A' || text[i] > 'Z') && (text[i] < 'a' || text[i] > 'z'))
{
characters++;
}
if(text[i] == ' ')
{
result.words++;
}
if(text[i] == '.' || text[i] == '!' || text[i] == '?')
{
result.sentences++;
}
}
// subtract all non-letters from total chars and return.
result.letters = result.length - characters;
return result;
}
This is what get_score() will change to:
int get_score (string text)
{
//variables
Count result = get_count(text);
float index = 0;
// letters divided by words * 100
float L = 100 * (float) result.letters / result.words;
// sentences divided by words *100
float S = 100 * (float) result.sentences / result.words;
index = round(0.0588 * L - 0.296 * S - 15.8);
return index;
}
Side note: a '.' does not always mean the end of a sentence. It has other uses, as in an abbreviation ("U.S.") or an ellipsis ("...").
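One way to handle the ellipsis case is to count a run of consecutive terminators as a single sentence end. The hypothetical count_sentences_runs below does that; it still miscounts abbreviations such as "U.S.", and it deviates from the simple one-terminator-one-sentence rule used in the question's code, so a checker built around that rule may disagree with it.

```c
/* Counts a run of consecutive terminators ("...", "?!") as one sentence end. */
int count_sentences_runs(const char *text)
{
    int sentences = 0;
    int prev_was_end = 0;
    for (const char *p = text; *p != '\0'; p++) {
        int is_end = (*p == '.' || *p == '!' || *p == '?');
        if (is_end && !prev_was_end)
            sentences++;         /* first terminator of a run */
        prev_was_end = is_end;
    }
    return sentences;
}
```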

Palindrome/mini-Palindrome in string

I need to check whether a given string is a palindrome or contains a mini-palindrome.
The palindrome's length must be 2 or more, and the function needs to ignore spaces and the difference between uppercase and lowercase letters.
If the string contains a palindrome, the function passes back the indexes of its start and end and returns 1; otherwise it returns 0.
Example 1: "My gym": the function sets low=0, high=5 and returns 1.
Example 2: "I Love ANNA": the function sets low=7, high=10 and returns 1.
Example 3: "I love Pasta": the function returns 0.
Also, I can't use functions from libraries other than string.h, stdlib.h and stdio.h.
I tried writing it like this:
int i;
int size = strlen(str);
i = 0;
while (str[i] != '\0')
{
if (str[i] == ' ')
{
i++;
continue;
}
//-------------------
if (str[i] >= 'a' && str[i] <= 'z')
str[i] = str[i] - 32;
if (str[size-1] >= 'a' && str[size-1] <= 'z')
str[size-1] = str[size-1] - 32;
//-------------------
if (str[i] == str[size-1])
{
*low = i;
*high = size-1;
return 1;
}
else
{
size--;
i++;
}
}
return 0;
But it isn't working well, and I can't figure out how to handle example 2.
Here goes. Will this help you?
#include <string.h>
#define LOWER(a) (((a) >= 'A' && (a) <= 'Z') ? ((a) - 'A' + 'a') : (a))
#define MYCMP(a,b) (LOWER(a) == LOWER(b))
int is_palidrome(char *s) {
int start = 0;
int end = (int) strlen(s) - 1;
for (; s[start] // Not end of line
&& end >= 0 // Not run over the front of the line
&& start < end // Still not got to the middle
&& MYCMP(s[start], s[end]) == 1; // They are still equal
start++, end--) {
// Nothing to do: the loop condition does all the work
}
return (start >= end);
}
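The macro version above only answers whether the whole string is a palindrome, and it does not skip spaces (so "My gym" fails), which means it cannot report the indexes for example 2. A sketch that follows the question's requirements, using only <string.h>, could try every start index from the left and, for each, every end index from the right. It therefore reports the leftmost palindrome and, for that start, the longest one; which palindrome to report when there are several is an assumption, since the question does not say. find_palindrome and its helpers are hypothetical names.

```c
#include <string.h>

/* Lowercases ASCII letters without <ctype.h>, which the question may not use. */
static char to_lower_ascii(char c)
{
    return (c >= 'A' && c <= 'Z') ? (char) (c - 'A' + 'a') : c;
}

/* Returns 1 if str[lo..hi] reads the same both ways, ignoring spaces and case. */
static int is_palindrome_range(const char *str, int lo, int hi)
{
    while (lo < hi) {
        if (str[lo] == ' ') { lo++; continue; }
        if (str[hi] == ' ') { hi--; continue; }
        if (to_lower_ascii(str[lo]) != to_lower_ascii(str[hi]))
            return 0;
        lo++;
        hi--;
    }
    return 1;
}

/* Finds a palindrome of at least two non-space characters. On success,
   stores its first and last index in *low and *high and returns 1. */
int find_palindrome(const char *str, int *low, int *high)
{
    int size = (int) strlen(str);
    for (int i = 0; i < size; i++) {
        if (str[i] == ' ')
            continue;                     /* a palindrome never starts on a space */
        for (int j = size - 1; j > i; j--) {
            if (str[j] == ' ')
                continue;
            if (is_palindrome_range(str, i, j)) {
                *low = i;
                *high = j;
                return 1;
            }
        }
    }
    return 0;
}
```

With the question's examples this sets low=0, high=5 for "My gym", low=7, high=10 for "I Love ANNA", and returns 0 for "I love Pasta". It is O(n^3) in the worst case, which is fine for short inputs.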
I made a program. It works only if the string contains letters and spaces. You can modify it to work for other characters.
#include <stdio.h>
#include <string.h>
#define SIZE 100
int isPalindrome( char *s, size_t l );
int main() {
char str[SIZE];
size_t i, j, len, pldrm = 0;
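if (fgets(str, SIZE, stdin) == NULL) return 1;
str[strcspn(str, "\n")] = '\0'; // fgets keeps the newline, which the check below would reject
len = strlen(str);
for(i = 0; i < len; i++) if( str[i] != ' ' && !((str[i] >= 'a' && str[i] <= 'z') || (str[i] >= 'A' && str[i] <= 'Z')) ) goto the_end;
for(i = 0; i + 1 < len; i++) { // i + 1 < len avoids size_t underflow when len == 0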
if( str[i] != ' ' ) {
for(j = i+1; j < len; j++) {
if( (pldrm = isPalindrome(&str[i], j-i+1)) ) {
str[j+1] = '\0';
goto the_end;
}
}
}
}
the_end:
pldrm ? printf("A palindrome has been found from the position %zu till the position %zu.\n\nThe palindrome is: %s\n", i, j, &str[i]) : puts("No palindromes");
return 0;
}
int isPalindrome( char *s, size_t l )
{
static const char az[26] = "abcdefghijklmnopqrstuvwxyz", AZ[26] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
int isPldrm = 1, spc = 0; // used to skip spaces within the palindrome
for(size_t i = 0; i < l/2; i++) {
for(size_t j = 0; j < 26; j++) {
if( s[i] == az[j] || s[i] == AZ[j] ) {
while( s[l-1-i-spc] == ' ' ) ++spc;
if( s[l-1-i-spc] != az[j] && s[l-1-i-spc] != AZ[j] ) {
isPldrm = 0;
goto thats_it;
}
break;
}
}
}
thats_it:
return isPldrm;
}
Also, it finds only the first palindrome in the input; it doesn't look for further ones.

Sorting words out in a string array

My program lets the user input a string and outputs the number of occurrences of each letter and word. It also sorts the words alphabetically.
My issue was: I output the words seen (first unsorted) and their occurrences as a table, and I don't want duplicates in that table. (SOLVED)
For example, if the word "to" was seen twice, I want "to" to appear only once in my table, with its number of occurrences.
How can I fix this? Also, why can't I simply write string[i] == delim to test against every delimiter, rather than checking each delimiter separately?
Edit: I fixed my output error. But how can I write a condition that is true when string[i] equals any of the delimiters in my code, rather than only the space? For example, if I enter "you, you", the output shows "you, you" rather than just "you". How can I remove the comma so that both occurrences count as the same word?
Any help is appreciated. My code is below:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
const char delim[] = ", . - !*()&^%$##<> ? []{}\\ / \"";
#define SIZE 1000
void occurrences(char s[], int count[]);
void lower(char s[]);
int main()
{
char string[SIZE], words[SIZE][SIZE], temp[SIZE];
int i = 0, j = 0, k = 0, n = 0, count;
int c = 0, cnt[26] = { 0 };
printf("Enter your input string:");
fgets(string, 256, stdin);
string[strlen(string) - 1] = '\0';
lower(string);
occurrences(string, cnt);
printf("Number of occurrences of each letter in the text: \n");
for (c = 0; c < 26; c++){
if (cnt[c] != 0){
printf("%c \t %d\n", c + 'a', cnt[c]);
}
}
/*extracting each and every string and copying to a different place */
while (string[i] != '\0')
{
if (string[i] == ' ')
{
words[j][k] = '\0';
k = 0;
j++;
}
else
{
words[j][k++] = string[i];
}
i++;
}
words[j][k] = '\0';
n = j;
printf("Unsorted Frequency:\n");
for (i = 0; i < n; i++)
{
strcpy(temp, words[i]);
for (j = i + 1; j <= n; j++)
{
if (strcmp(words[i], words[j]) == 0)
{
for (a = j; a <= n; a++)
strcpy(words[a], words[a + 1]);
n--;
}
} //inner for
}
i = 0;
/* find the frequency of each word */
while (i <= n) {
count = 1;
if (i != n) {
for (j = i + 1; j <= n; j++) {
if (strcmp(words[i], words[j]) == 0) {
count++;
}
}
}
/* count - indicates the frequecy of word[i] */
printf("%s\t%d\n", words[i], count);
/* skipping to the next word to process */
i = i + count;
}
printf("ALphabetical Order:\n");
for (i = 0; i < n; i++)
{
strcpy(temp, words[i]);
for (j = i + 1; j <= n; j++)
{
if (strcmp(words[i], words[j]) > 0)
{
strcpy(temp, words[j]);
strcpy(words[j], words[i]);
strcpy(words[i], temp);
}
}
}
i = 0;
while (i <= n) {
count = 1;
if (i != n) {
for (j = i + 1; j <= n; j++) {
if (strcmp(words[i], words[j]) == 0) {
count++;
}
}
}
printf("%s\n", words[i]);
i = i + count;
}
return 0;
}
void occurrences(char s[], int count[]){
int i = 0;
while (s[i] != '\0'){
if (s[i] >= 'a' && s[i] <= 'z')
count[s[i] - 'a']++;
i++;
}
}
void lower(char s[]){
int i = 0;
while (s[i] != '\0'){
if (s[i] >= 'A' && s[i] <= 'Z'){
s[i] = (s[i] - 'A') + 'a';
}
i++;
}
}
I have the solution to your problem, and its name is Wall. No, not the kind you bang your head against when you hit a problem you can't seem to solve, but the Warnings you want your compiler to emit: ALL OF THEM (as in gcc's -Wall option).
If you compile C code without -Wall, you can commit all the errors that people say make C so dangerous. Once you enable warnings, the compiler will point out many of them.
I get 4 warnings for your program:
for (c; c< 26; c++) { That first c doesn't do anything; this could be written for (; c < 26; c++) { or, perhaps better, for (c = 0; c < 26; c++) {
words[i] == NULL "Statement with no effect". Well that probably isn't what you wanted to do. The compiler tells you that that line doesn't do anything.
"Unused variable 'text'." That is pretty clear too: you have defined text as a variable but then never used it. Perhaps you meant to or perhaps it was a variable you thought you needed. Either way it can go now.
"Control reaches end of non-void function". In C main is usually defined as int main, i.e. main returns an int. Standard practice is to return 0 if the program successfully completed and some other value on error. Adding return 0; at the end of main will work.
You can simplify your delimiters. Anything that is not a-z (after lowercasing it) is a delimiter. You don't [need to] care which one it is; it's the end of a word. Rather than specifying delimiters, specify the chars that are word chars (e.g. if words were C symbols, the word chars would be A-Z, a-z, 0-9 and _). But it looks like you only want a-z.
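As for why string[i] == delim doesn't work: delim is an array, so in that comparison it decays to a pointer to its first character, and the code compares a char with an address (compilers warn about this). If you would rather keep an explicit delimiter list than list the word characters, strchr() from <string.h> performs the "is this character any of these" test. A minimal sketch with hypothetical helpers is_delim and next_word; note that strchr() also finds the terminating '\0', so it is excluded explicitly.

```c
#include <string.h>

/* Hypothetical helper: nonzero if c is one of the characters in delims. */
int is_delim(char c, const char *delims)
{
    return c != '\0' && strchr(delims, c) != NULL;
}

/* Hypothetical helper: copies the next word of s, starting at *pos, into
   out (truncated to out_size - 1 chars), skipping delimiters.
   Returns 0 when no words are left. */
int next_word(const char *s, size_t *pos, const char *delims, char *out, size_t out_size)
{
    size_t i = *pos, k = 0;
    while (s[i] != '\0' && is_delim(s[i], delims))
        i++;                               /* skip leading delimiters */
    if (s[i] == '\0') {
        *pos = i;
        return 0;
    }
    while (s[i] != '\0' && !is_delim(s[i], delims)) {
        if (k + 1 < out_size)
            out[k++] = s[i];
        i++;
    }
    out[k] = '\0';
    *pos = i;
    return 1;
}
```

Splitting "you, you" with the delimiters " ,.!?" yields "you" twice, so both occurrences compare equal with strcmp.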
Here are some [untested] examples:
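#include <string.h>
#define STRMAX 100 // max word length, including '\0'
#define WORDMAX 1000 // max number of distinct words
int char_histogram[256]; // letter counts, indexed by the character itself
char words[WORDMAX][STRMAX]; // distinct words seen so far
int word_histogram[WORDMAX]; // occurrences of words[i]
int word_count; // number of entries used in words
void count_string(char *str);
void
scanline(char *buf)
{
int chr;
char *lhs;
char *rhs;
char tmp[STRMAX];
lhs = tmp;
for (rhs = buf; *rhs != 0; ++rhs) {
chr = *rhs;
if ((chr >= 'A') && (chr <= 'Z'))
chr = (chr - 'A') + 'a';
if ((chr >= 'a') && (chr <= 'z')) {
if (lhs < tmp + STRMAX - 1) // truncate overly long words
*lhs++ = chr;
char_histogram[chr] += 1;
continue;
}
// any other character ends the current word
*lhs = 0;
if (lhs > tmp)
count_string(tmp);
lhs = tmp;
}
if (lhs > tmp) {
*lhs = 0;
count_string(tmp);
}
}
void
count_string(char *str)
{
int idx;
int match;
match = -1;
for (idx = 0; idx < word_count; ++idx) {
if (strcmp(words[idx],str) == 0) {
match = idx;
break;
}
}
if (match < 0) {
if (word_count >= WORDMAX) // table full: ignore new words
return;
match = word_count++;
strcpy(words[match],str);
}
word_histogram[match] += 1;
}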
Using separate arrays is ugly. Using a struct might be better:
#define STRMAX 100 // max string length
#define WORDMAX 1000 // max number of strings
struct word {
int word_hist; // histogram value
char word_string[STRMAX]; // string value
};
int word_count; // number of elements in wordlist
struct word wordlist[WORDMAX]; // list of known words
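With the struct, the lookup-or-append step and the alphabetical sort from the question both get simpler, because each count travels with its word. A sketch, repeating the definitions above so it compiles on its own; count_word (in place of count_string) and compare_words are hypothetical names, and sorting uses the standard qsort().

```c
#include <stdlib.h>
#include <string.h>

#define STRMAX 100   // max string length
#define WORDMAX 1000 // max number of strings

struct word {
    int word_hist;             // histogram value
    char word_string[STRMAX];  // string value
};

int word_count;                  // number of elements in wordlist
struct word wordlist[WORDMAX];   // list of known words

/* Bumps the count of str, or appends it with a count of 1.
   Returns its index, or -1 if the list is full or str is too long. */
int count_word(const char *str)
{
    for (int idx = 0; idx < word_count; ++idx) {
        if (strcmp(wordlist[idx].word_string, str) == 0) {
            wordlist[idx].word_hist += 1;   /* already known */
            return idx;
        }
    }
    if (word_count >= WORDMAX || strlen(str) >= STRMAX)
        return -1;
    strcpy(wordlist[word_count].word_string, str);
    wordlist[word_count].word_hist = 1;
    return word_count++;
}

/* qsort comparator: alphabetical order by word_string. */
int compare_words(const void *a, const void *b)
{
    const struct word *wa = a;
    const struct word *wb = b;
    return strcmp(wa->word_string, wb->word_string);
}
```

After counting, qsort(wordlist, word_count, sizeof wordlist[0], compare_words) sorts the table alphabetically, and each entry still carries its own count, so no separate duplicate-removal pass is needed.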
