Finding the shortest word in a C string - c

I'm trying to find the shortest word in a given C string, but it fails if there is only one word or if the shortest word is the last one.
I tried starting at the null character and counting backwards until I hit " ", then counting the steps I took, but it did not work properly.
Also, is there a better or faster way to iterate through a string while finding the shortest or longest word?
#include <sys/types.h>
#include <string.h>
#include <stdio.h>
ssize_t find_short(const char *s) {
int min = 100;
int count = 0;
for (int i = 0 ; i < strlen(s); i++) {
if (s[i] != ' ') {
count++;
} else {
if (count < min) {
min = count;
}
count = 0;
}
if (s[i] == '\0') {
count = 0;
while (s[i] != ' ') {
--i;
count++;
if (s[i] == ' ' && count < min) {
min = count;
}
}
}
}
return min;
}

Your idea was correct; you just overcomplicated it a little. Let us break down your code:
int min = 100;
First, you should initialize min to INT_MAX, which you can get from #include <limits.h>. Otherwise the result is wrong whenever all words are longer than 100 characters.
for (int i = 0 ; i < strlen(s); i++)
Instead of calling strlen in the loop condition, you can test for the C string terminator '\0':
for(int i = 0; s[i] != '\0'; i++)
The if and else part:
if (s[i] != ' ')
{
count++;
}
else
{
if (count < min)
{
min = count;
}
count = 0;
}
This is almost correct, but you need to add count != 0 to the condition count < min. Otherwise, if the string starts with spaces, the empty run before the first word is counted as the smallest word.
This part can be removed:
if (s[i] == '\0')
{
count = 0;
while(s[i] != ' ')
{
--i;
count++;
if(s[i] == ' ' && count < min)
{
min = count;
}
}
}
Instead, check the last word outside the loop. Your code would then look like the following:
#include <limits.h>
#include <sys/types.h>
ssize_t find_short(const char *s)
{
    int min = INT_MAX;
    int count = 0;
    // Iterate over the string
    for(int i = 0; s[i] != '\0'; i++){
        if(s[i] != ' '){ // Still inside a word: increment its length
            count++;
        }
        else{ // A space ends the current word
            if(count != 0 && count < min){ // Check if the current word is the smallest so far
                min = count; // Update the minimum
            }
            count = 0; // Reset the counter for the next word
        }
    }
    // Check the last word, which is followed by '\0' rather than a space.
    // count is 0 here if the string ends with a space, so ignore it in that case.
    return (count != 0 && count < min) ? count : min;
}

There are multiple issues in your code:
mixing int, size_t and ssize_t for the index and string lengths and return value is confusing and incorrect as these types have a different range.
int min = 100; produces an incorrect return value if the shortest word is longer than that.
for (int i = 0 ; i < strlen(s); i++) is potentially very inefficient as the string length may be recomputed at every iteration. for (size_t i = 0 ; s[i] != '\0'; i++) is a better alternative.
find_short returns 100 for the empty string instead of 0.
Scanning backwards from the end is tricky and not necessary: to avoid this special case, omit the test in the for loop and detect the end of word by comparing the character with space or the null byte, breaking from the loop in the latter case after potentially updating the minimum length.
The initial value for min should be 0 to account for the case where the string is empty or contains only whitespace. Whenever a word has been found, min should be updated if it is 0 or if the word length is non zero and less than min.
Here is an implementation using <ctype.h> to test for whitespace:
#include <stddef.h>
#include <ctype.h>
size_t find_short(const char *s) {
size_t min = 0, len = 0;
for (;;) {
unsigned char c = *s++;
if (isspace(c) || c == '\0') {
if (min == 0 || (len > 0 && len < min))
min = len;
if (c == '\0') // test for loop termination
break;
len = 0;
} else {
len++;
}
}
return min;
}
Here is a more general alternative using the functions strcspn() and strspn() from <string.h> where you can define the set of word separators:
#include <string.h>
size_t find_short(const char *s) {
size_t min = 0;
const char *seps = " \t\r\n"; // you could add dashes and punctuation
while (*s) {
s += strspn(s, seps);
if (*s) {
size_t len = strcspn(s, seps);
if (min == 0 || (len > 0 && len < min))
min = len;
s += len;
}
}
return min;
}

As you said, your program fails if the shortest word is the last one. That's because the last i for which the loop runs is i == len-1, the index of the last character of the string. In this last iteration count is incremented, but you never check whether the count of the last word is smaller than the min found so far.
Assuming that you receive a null-terminated string, you could extend the loop to i <= len (where len = strlen(s)) and adjust the if condition to
if( s[i] != ' ' && s[i] )
That means: if s[i] is neither a space nor the terminating null character.
You can also remove the if (s[i] == '\0') block.
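A minimal sketch of that suggestion, in case it helps to see it in one piece. The name find_short_inclusive, the count != 0 guard for runs of spaces, and returning 0 when there is no word at all are assumptions of this sketch, not part of the original code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Sketch (hypothetical name): the loop also visits the terminating '\0',
   so the last word goes through the same comparison as every other word. */
size_t find_short_inclusive(const char *s)
{
    size_t len = strlen(s);          /* computed once, not per iteration */
    size_t min = SIZE_MAX;           /* "no word seen yet" marker */
    size_t count = 0;

    for (size_t i = 0; i <= len; i++) {      /* i == len reaches the '\0' */
        if (s[i] != ' ' && s[i] != '\0') {
            count++;                         /* still inside a word */
        } else {
            if (count != 0 && count < min)   /* assumption: skip empty runs */
                min = count;
            count = 0;
        }
    }
    return (min == SIZE_MAX) ? 0 : min;      /* assumption: 0 means no words */
}
```

With this shape a single-word string such as "one" is handled by the same branch as any other word, because the '\0' plays the role of the trailing space.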
As for faster algorithms, I don't think it's possible to do better.
If you want, you can automate the count increment with an empty inner for loop that runs until it finds a space; the outer for loop then checks how far the inner one ran.
I once wrote a program for the same problem that uses an inner for loop. I show it only for the algorithm; don't take the style of the code as an example, since I was trying to use as few lines as possible, which is not good practice.
ssize_t find_short(const char *s)
{
ssize_t min = 99, i = 0;
for( --s; !i || (min > 1 && *s); s += i) {
for(i = 1; *(s+i) != ' ' && *(s+i); i++);
if( min > i-1 ) min = i-1;
}
return min;
}
Oh, one improvement I just noticed in my code could be to return the min when it reaches 1 because you know you are not going to find shorter words.

I'd suggest that you use strtok to split your string into an array of strings, using the space character as the delimiter. You can then process each string in the resulting array to determine which is the "word" that you want.
See Split strings into tokens and save them in an array
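A rough sketch of the strtok idea, under a few assumptions: the helper name find_short_strtok is made up, it works on a heap copy because strtok writes '\0' bytes into the string it scans (and the original parameter is const), and it tracks the minimum on the fly instead of storing the tokens in an array.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: length of the shortest space-separated word,
   or 0 if there are no words or the copy cannot be allocated. */
size_t find_short_strtok(const char *s)
{
    size_t n = strlen(s) + 1;
    char *copy = malloc(n);          /* strtok modifies its input */
    if (copy == NULL)
        return 0;
    memcpy(copy, s, n);

    size_t min = 0;
    for (char *tok = strtok(copy, " "); tok != NULL; tok = strtok(NULL, " ")) {
        size_t len = strlen(tok);    /* strtok never returns an empty token */
        if (min == 0 || len < min)
            min = len;
    }

    free(copy);
    return min;
}
```

Note that strtok keeps internal state between calls, so this is not suitable for code that tokenizes two strings at once or runs in several threads.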

Related

Counting occurrences of words within an inputted string in c

I'm currently struggling with counting the occurrences of the words within an inputted string. I believe it is just my logic that is off but I've been scratching my head for a while and I've just hit a wall.
The problems I'm currently yet to solve are:
With longer inputs, the end of the string is sometimes cut off.
Incrementing the counter for each word when it is repeated.
I know the code has things that may not be the most ideal way for it to work but I'm fairly new to C so any pointers are really helpful.
To sum it up I'm looking for pointers to help solve the issues I'm facing above
#include <stdlib.h>
#include <string.h>
#include <stdio.h>
#include <ctype.h>
#define MAX_WORDS 1000
int main(void) {
int i,j,isUnique,uniqueLen;
char word[MAX_WORDS];
char words[200][30];
char uniqueWords[200][30];
int count[200];
char *p = strtok(word, " ");
int index=0;
//read input until EOF is reached
scanf("%[^EOF]", word);
//initialize count array
for (i = 0; i < 200; i++) {
count[i] = 0;
}
//convert lower case letters to upper
for (i = 0; word[i] != '\0'; i++) {
if (word[i] >= 'a' && word[i] <= 'z') {
word[i] = word[i] - 32;
}
}
//Split work string into an array and save each token into the array words
p = strtok(word, " ,.;!\n");
while (p != NULL)
{
strcpy(words[index], p);
p = strtok(NULL, " ,.;!\n");
index++;
}
/*
Check each string in the array word for occurances within the uniqueWords array. If it is unique then
copy the string from word into the unique word array. Otherwise the counter for the repeated word is incremented.
*/
uniqueLen = 0;
for (i = 0; i < index; i++) {
isUnique = 1;
for (j = 0; j < index; j++) {
if (strcmp(uniqueWords[j],words[i])==0) {
isUnique = 0;
break;
}
else {
}
}
if (isUnique) {
strcpy(uniqueWords[uniqueLen], words[i]);
count[uniqueLen] += 1;
uniqueLen++;
}
else {
}
}
for (i = 0; i < uniqueLen; i++) {
printf("%s => %i\n", uniqueWords[i],count[i]);
}
}
This is the code I ended up using. The problem turned out to be mainly in how I used the scanf function. Placing it in a while loop made it much easier to process the words as they are input.
Thank you for all the help :)
#include <stdlib.h>
#include <string.h>
#include <stdio.h>
#include <ctype.h>
int main(void) {
// Create all variables
int i, len, isUnique, index;
char word[200];
char uniqueWords[200][30];
int count[200];
// Initialize the count array
for (i = 0; i < 200; i++) {
count[i] = 0;
}
// Set the value for index to 0
index = 0;
// Read all words inputted until the EOF marker is reached
while (scanf("%s", word) != EOF) {
/*
For each word being read if the characters within it are lowercase
then each are then incremented into being uppercase values.
*/
for (i = 0; word[i] != '\0'; i++) {
if (word[i] >= 'a' && word[i] <= 'z') {
word[i] = word[i] - 32;
}
}
/*
We use len to find the length of the word being read. This is then used
to access the final character of the word and remove it if it is not an
alphabetic character.
*/
len = strlen(word);
if (ispunct(word[len - 1]))
word[len - 1] = '\0';
/*
The next part removes the non alphabetic characters from within the words.
This happens by incrementing through each character of the word and by
using the isalpha and removing the characters if they are not alphabetic
characters.
*/
size_t pos = 0;
for (char *p = word; *p; ++p)
if (isalpha(*p))
word[pos++] = *p;
word[pos] = '\0';
/*
We set the isUnique value to 1 as upon comparing the arrays later we
change this value to 0 to show the word is not unique.
*/
isUnique = 1;
/*
For each word through the string we use a for loop when the counter i
is below the index and while the isUnique value is 1.
*/
for (i = 0; i < index && isUnique; i++)
{
/*
Using the strcmp function we are able to check if the word in
question is in the uniqueWords array. If it is found we then
change the isUnique value to 0 to show that the value is not
unique and prevent the loop happening again.
*/
if (strcmp(uniqueWords[i], word) == 0)
isUnique = 0;
}
/* If word is unique then add it to the uniqueWords list
and increment index. Otherwise increment occurrence
count of current word.
*/
if (isUnique)
{
strcpy(uniqueWords[index], word);
count[index]++;
index++;
}
else
{
count[i - 1]++;
}
}
/*
For each item in the uniqueWords list we iterate through the words
and print them out in the correct format with the word and the following count of them.
*/
for (i = 0; i < index; i++)
{
printf("%s => %d\n", uniqueWords[i], count[i]);
}
}
I don't know whether you are working under specific requirements, but for all its limitations in terms of standard library functions, C does have one that would make your job much easier, strstr, e.g.:
#include <stdio.h>
#include <string.h>
int main() {
const char str[] = "stringstringdstringdstringadasstringipoistring";
const char* substr = "string";
const char* orig = str;
const char* temp = substr;
int length = 0;
while(*temp++){length++;} // length of substr
int count = 0;
char *ret = strstr(orig, substr);
while (ret != NULL){
count++;
//check next occurrence
ret = strstr(ret + length, substr);
}
printf("%d", count);
}
The output should be 6.
Regarding user3121023's comment, scanf("%999[^\n]", word); parses all characters until it finds a \n or it reaches the width limit, and I agree fgets ( word, sizeof word, stdin); is better.

Checking for duplicate words in a string in C [duplicate]

I have written code in C to search for duplicate words in a string. It just appends every word in the string to a 2D string array, but it reports 0 for both the number of rows and the number of duplicate strings. What is the problem with the code?
int main() {
char str[50] = "C code find duplicate string";
char str2d[10][50];
int count = 0;
int row = 0, column = 0;
for (int i = 0; str[i] != '\0'; i++) {
if (str[i] != '\0' || str[i] != ' ') {
str2d[row][column] = str[i];
column += 1;
} else {
str2d[row][column] = '\0';
row += 1;
column = 0;
}
}
for (int x = 0; x <= row; x++) {
for (int y = x + 1; y <= row; y++) {
if (strcmp(str2d[x], str2d[y]) == 0 && (strcmp(str2d[y], "0") != 0)) {
count += 1;
}
}
}
printf("%i %i", row, count);
return 0;
}
There are multiple problems in your code:
the 2D array might be too small: there could be as many as 25 words in a 50 byte string, and even more if you consider sequences of spaces to embed empty words.
the test if (str[i] != '\0' || str[i] != ' ') is always true.
the last word in the string is not null terminated in the 2D array.
the word at str2d[row] is uninitialized if the string ends with a space
sequences of spaces cause empty words to be stored into the 2D array.
there is no point in testing strcmp(str2d[y], "0"). This might be a failed attempt at ignoring empty words, which could be tested with strcmp(str2d[y], "").
Here is a modified version:
#include <stdio.h>
#include <string.h>
int main() {
char str[50] = "C code find duplicate string";
char str2d[25][50];
int count = 0, row = 0, column = 0;
for (int i = 0;;) {
// skip initial spaces
while (str[i] == ' ')
i++;
if (str[i] == '\0')
break;
// copy characters up to the next space or the end of the string
column = 0; // each word starts at the beginning of its own row
while (str[i] != ' ' && str[i] != '\0')
str2d[row][column++] = str[i++];
str2d[row][column] = '\0';
row++;
}
for (int x = 0; x < row; x++) {
for (int y = x + 1; y < row; y++) {
if (strcmp(str2d[x], str2d[y]) == 0)
count += 1;
}
}
printf("%i %i\n", row, count);
return 0;
}
The problems are:
if (str[i] != '\0' || str[i] != ' ') should be if (str[i] != '\0' && str[i] != ' '). With the logical or, the condition is always true, so the else branch is never reached.
if (strcmp(str2d[x], str2d[y]) == 0 && (strcmp(str2d[y], "0") != 0)) should be if (strcmp(str2d[x], str2d[y]) == 0). Otherwise, your code will not count duplicates when the word is "0".
a. To avoid confusion, use something like printf("Number of rows = %d, Number of duplicates = %d\n", row+1, count);. Since C arrays start at index 0, row holds the index of the last row, so the number of rows is row + 1.
b. If you haven't realised by now, there are no duplicates in your str variable: char str[50] = "C code find duplicate string";. So your code returns a correct value of 0. Change it to char str[50] = "C code find duplicate duplicate"; (for example), your code will correctly return 1.

How do I get character frequency and highest character frequency?

This is my function. My main focus is to get the character frequencies and the highest character frequency.
The function below (get_letter_frequencies) is supposed to take a string, for example "I am a big boy", and return the character frequencies and the highest character frequency.
The Function should return
i - 2
a - 2
m - 1
b - 2
g - 1
o - 1
y - 1
Highest character frequency would be " iab "
My problem is with the get_letter_frequencies function. What should I arrange from the function in order to return the above output?
void get_letter_frequencies(const char *text, size_t len, int freq[26], int *max_freq)
{
for(int i = 0; i<len; i++)
{
if(text[i] != ' ' || !(is_sentence_terminator(text[i]))) //this condition is set in order to ignore the spaces and the sentence terminators (! ? .)
{
if(text[i] >= 'a' && text[i] <= 'z')
{
freq[text[i] - 'a']++;
}
}
}
for(int j = 0; j < 26; j++)
{
if(freq[j] >= 1)
{
*max_freq = freq[j];
}
}
The function below (is_sentence_terminator) checks whether the sentence finishes with "!", "?" or ".". If it does not finish with one of these terminators, it is not a sentence and is ignored.
int is_sentence_terminator(char ch)
{
if(ch == 33 || ch == 46 || ch == 63)
{
return 1;
}else
{
return 0;
}
}
There are some issues in your code:
there is no need to test for special characters, comparing text[i] to 'a' and 'z' is sufficient for ASCII systems.
in the second loop, you should update *max_freq only if freq[j] is greater than the current value, not 1. *max_freq should be initialized to 0 before the loop.
In the calling code, you would also
print the letters whose frequency is non 0.
print all letters with the maximum frequency using one final loop.
Here is a modified version:
void get_letter_frequencies(const char *text, size_t len, int freq[26], int *max_freq) {
for (int i = 0; i < 26; i++)
freq[i] = 0;
for (size_t i = 0; i < len; i++) { // size_t matches the type of len
if (text[i] >= 'a' && text[i] <= 'z') {
freq[text[i] - 'a']++; // assuming ASCII
}
}
*max_freq = 0;
for (int i = 0; i < 26; i++) {
if (*max_freq < freq[i]) {
*max_freq = freq[i];
}
}
}
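The calling code mentioned above is not shown in the answer, so here is one possible sketch of it. Both helper names (print_letter_counts and letters_with_max) are hypothetical, and the letters come out in alphabetical order rather than in the order they first appear in the text. Like the function above, it assumes ASCII.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: prints one "letter - count" line for every
   letter whose frequency is non-zero. */
void print_letter_counts(const int freq[26])
{
    for (int i = 0; i < 26; i++) {
        if (freq[i] != 0)
            printf("%c - %d\n", 'a' + i, freq[i]);
    }
}

/* Hypothetical helper: copies every letter whose count equals max_freq
   into out (null-terminated, so out needs room for 27 chars) and
   returns how many letters were copied. */
size_t letters_with_max(const int freq[26], int max_freq, char out[27])
{
    size_t n = 0;
    for (int i = 0; i < 26; i++) {
        if (max_freq > 0 && freq[i] == max_freq)
            out[n++] = (char)('a' + i);
    }
    out[n] = '\0';
    return n;
}
```

A caller would run get_letter_frequencies first, then pass freq to print_letter_counts, and freq together with the returned maximum to letters_with_max.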

Longest Substring Palindrome issue

I feel like I've got it almost down, but for some reason my second test is coming up with a shorter palindrome instead of the longest one. I've marked where I feel the error may be coming from, but at this point I'm kind of at a loss. Any direction would be appreciated!
#include <stdio.h>
#include <string.h>
/*
* Checks whether the characters from position first to position last of the string str form a palindrome.
* If it is palindrome it returns 1. Otherwise it returns 0.
*/
int isPalindrome(int first, int last, char *str)
{
int i;
for(i = first; i <= last; i++){
if(str[i] != str[last-i]){
return 0;
}
}
return 1;
}
/*
* Find and print the largest palindrome found in the string str. Uses isPalindrome as a helper function.
*/
void largestPalindrome(char *str)
{
int i, last, pStart, pEnd;
pStart = 0;
pEnd = 0;
int result;
for(i = 0; i < strlen(str); i++){
for(last = strlen(str); last >= i; last--){
result = isPalindrome(i, last, str);
//Possible error area
if(result == 1 && ((last-i)>(pEnd-pStart))){
pStart = i;
pEnd = last;
}
}
}
printf("Largest palindrome: ");
for(i = pStart; i <= pEnd; i++)
printf("%c", str[i]);
return;
}
/*
* Do not modify this code.
*/
int main(void)
{
int i = 0;
/* you can change these strings to other test cases but please change them back before submitting your code */
//str1 working correctly
char *str1 = "ABCBACDCBAAB";
char *str2 = "ABCBAHELLOHOWRACECARAREYOUIAMAIDOINEVERODDOREVENNGGOOD";
/* test easy example */
printf("Test String 1: %s\n",str1);
largestPalindrome(str1);
/* test hard example */
printf("\nTest String 2: %s\n",str2);
largestPalindrome(str2);
return 0;
}
Your code in isPalindrome doesn't work properly unless first is 0.
Consider isPalindrome(6, 10, "abcdefghhgX"):
i = 6;
last - i = 4;
comparing str[i] (aka str[6] aka 'g') with str[last-i] (aka str[4] aka 'e') is comparing data outside the range that is supposed to be under consideration.
It should be comparing with str[10] (or perhaps str[9] — depending on whether last is the index of the final character or one beyond the final character).
You need to revisit that code. Note, too, that your code will test each pair of characters twice where once is sufficient. I'd probably use two index variables, i and j, set to first and last. The loop would increment i and decrement j, and only continue while i is less than j.
for (int i = first, j = last; i < j; i++, j--)
{
if (str[i] != str[j])
return 0;
}
return 1;
In isPalindrome, replace the line if(str[i] != str[last-i]){ with if(str[i] != str[first+last-i]){.
Here's your problem:
for(i = first; i <= last; i++){
if(str[i] != str[last-i]){
return 0;
}
}
Should be:
for(i = first; i <= last; i++, last--){
if(str[i] != str[last]){
return 0;
}
}
Also, this:
for(last = strlen(str); last >= i; last--){
Should be:
for(last = strlen(str) - 1; last >= i; last--){
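Putting the fixes from these answers together, a brute-force version might look roughly like this. It is only a sketch: the names is_palindrome_range and longest_palindrome are made up, it returns the position and length instead of printing, and it treats last as the index of the final character (inclusive).

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if str[first..last] (both ends inclusive) is a palindrome.
   Two indices move towards each other, so each pair is compared once. */
int is_palindrome_range(const char *str, size_t first, size_t last)
{
    while (first < last) {
        if (str[first] != str[last])
            return 0;
        first++;
        last--;
    }
    return 1;
}

/* Hypothetical helper: returns the length of the longest palindromic
   substring and stores its starting index in *start (0 if str is empty). */
size_t longest_palindrome(const char *str, size_t *start)
{
    size_t n = strlen(str);
    size_t best_len = 0;
    *start = 0;

    for (size_t i = 0; i < n; i++) {
        /* Try the longest candidate starting at i first; only lengths
           that would beat the current best are worth checking. */
        for (size_t len = n - i; len > best_len; len--) {
            if (is_palindrome_range(str, i, i + len - 1)) {
                best_len = len;
                *start = i;
                break;
            }
        }
    }
    return best_len;
}
```

The result can be printed with printf("%.*s\n", (int)len, str + start), where len is the returned length.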

Print the longest word of a string

I wrote the following code
#include<stdio.h>
int main(void)
{
int i, max = 0, count = 0, j;
char str[] = "I'm a programmer";
for(i = 0; i < str[i]; i++)
{
if (str[i] != ' ')
count++;
else
{
if (max < count)
{
j = i - count;
max = count;
}
count = 0;
}
}
for(i = j; i < j + max; i++)
printf("%c", str[i]);
return 0;
}
The intention is to find and print the longest word, but it does not work when the longest word is the last one: for "I'm a programmer" it prints I'm instead of programmer.
How can I solve this problem? Could someone give me a hand?
The terminating condition of your for loop is wrong. It should be:
for(i = 0; i < strlen(str) + 1; i++)
and also, since at the end of string you don't have a ' ', but you have a '\0', you should change:
if (str[i] != ' ')
to:
if (str[i] != ' ' && str[i] != '\0')
The issue should be rather obvious. You only update your longest found word when the character you are inspecting is a space. But there is no space after the longest word in your test string, so the updating code is never executed for it.
You can repeat the same update check right after the loop, and that should do the trick.
Note, however, that you could have found this easily by adding a few printf calls showing the progress of the loop.
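For illustration, the "repeat the check after the loop" fix could look like the sketch below. The function name longest_word and the start output parameter are assumptions made here so the logic can be reused; the loop body is the same as in the question, with the loop condition changed to stop at '\0'.

```c
#include <stddef.h>

/* Hypothetical helper: returns the length of the longest space-separated
   word in str and stores the index where it starts in *start. */
size_t longest_word(const char *str, size_t *start)
{
    size_t max = 0, count = 0, i;
    *start = 0;

    for (i = 0; str[i] != '\0'; i++) {
        if (str[i] != ' ') {
            count++;                 /* still inside a word */
        } else {
            if (max < count) {       /* word just ended at this space */
                *start = i - count;
                max = count;
            }
            count = 0;
        }
    }

    /* Same check once more: the last word ends at '\0', not at a space. */
    if (max < count) {
        *start = i - count;
        max = count;
    }
    return max;
}
```

The word itself can then be printed with printf("%.*s\n", (int)len, str + start), where len is the returned length.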

Resources