Checking for duplicate words in a string in C [duplicate] - arrays

I have written code in C to search for duplicate words in a string. It just appends every word in the string to a 2D string array, but it prints 0 for both the number of rows and the number of duplicate strings. What is the problem with the code?
int main() {
char str[50] = "C code find duplicate string";
char str2d[10][50];
int count = 0;
int row = 0, column = 0;
for (int i = 0; str[i] != '\0'; i++) {
if (str[i] != '\0' || str[i] != ' ') {
str2d[row][column] = str[i];
column += 1;
} else {
str2d[row][column] = '\0';
row += 1;
column = 0;
}
}
for (int x = 0; x <= row; x++) {
for (int y = x + 1; y <= row; y++) {
if (strcmp(str2d[x], str2d[y]) == 0 && (strcmp(str2d[y], "0") != 0)) {
count += 1;
}
}
}
printf("%i %i", row, count);
return 0;
}

There are multiple problems in your code:
the 2D array might be too small: there could be as many as 25 words in a 50 byte string, and even more if you consider sequences of spaces to embed empty words.
the test if (str[i] != '\0' || str[i] != ' ') is always true.
the last word in the string is not null terminated in the 2D array.
the word at str2d[row] is uninitialized if the string ends with a space
sequences of spaces cause empty words to be stored into the 2D array.
there is no point in testing strcmp(str2d[y], "0"). This might be a failed attempt at ignoring empty words, which could be tested with strcmp(str2d[y], "").
Here is a modified version:
#include <stdio.h>
#include <string.h>
int main() {
char str[50] = "C code find duplicate string";
char str2d[25][50];
int count = 0, row = 0, column = 0;
for (int i = 0;;) {
// skip initial spaces
while (str[i] == ' ')
i++;
if (str[i] == '\0')
break;
// copy characters up to the next space or the end of the string
while (str[i] != ' ' && str[i] != '\0')
str2d[row][column++] = str[i++];
str2d[row][column] = '\0';
row++;
column = 0; // start the next word at the beginning of its row
}
for (int x = 0; x < row; x++) {
for (int y = x + 1; y < row; y++) {
if (strcmp(str2d[x], str2d[y]) == 0)
count += 1;
}
}
printf("%i %i\n", row, count);
return 0;
}
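The same count can also be sketched without the 2D array, by comparing words in place. This is only an illustration: count_duplicate_pairs is a hypothetical helper name, and it assumes, like the nested loops above, that every pair of identical words counts once and that words are separated by spaces only.

```c
#include <stddef.h>
#include <string.h>

/* Count pairs of identical space-separated words in s.
 * Hypothetical helper: the name and the "pairs" definition mirror the
 * nested loops above; they are not part of the original post. */
int count_duplicate_pairs(const char *s)
{
    int count = 0;
    const char *a = s;

    for (;;) {
        a += strspn(a, " ");               /* skip spaces before word a */
        if (*a == '\0')
            break;
        size_t alen = strcspn(a, " ");     /* length of word a */

        const char *b = a + alen;
        for (;;) {
            b += strspn(b, " ");           /* skip spaces before word b */
            if (*b == '\0')
                break;
            size_t blen = strcspn(b, " "); /* length of word b */
            if (alen == blen && memcmp(a, b, alen) == 0)
                count++;                   /* same length, same bytes */
            b += blen;
        }
        a += alen;
    }
    return count;
}
```

Because nothing is copied, there is no fixed limit on the number or length of words; the cost is that each word is rescanned for every comparison.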

The problems are:
if (str[i] != '\0' || str[i] != ' ') should be if (str[i] != '\0' && str[i] != ' '). With the logical OR the condition is always true, because no character can equal both '\0' and ' ' at once, so the else branch is never reached and row stays 0.
if (strcmp(str2d[x], str2d[y]) == 0 && (strcmp(str2d[y], "0") != 0)) should be if (strcmp(str2d[x], str2d[y]) == 0). Otherwise, your code will not count duplicates when the word is "0".
a. To avoid confusion, use something like printf("Number of rows = %d, Number of duplicates = %d\n", row+1, count);. Since C arrays start at index 0, row holds the index of the last word, so the number of rows is row + 1.
b. If you haven't realised by now, there are no duplicates in your str variable: char str[50] = "C code find duplicate string";. So a count of 0 is the correct result for it. Change it to char str[50] = "C code find duplicate duplicate"; (for example), and the corrected code will report 1.
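The difference between the two conditions can be checked in isolation. The two function names below are hypothetical and exist only for this comparison:

```c
#include <stdbool.h>

/* The test from the question: true for every possible character,
 * because no character equals both '\0' and ' ' at the same time. */
bool part_of_word_or(char ch)
{
    return ch != '\0' || ch != ' ';
}

/* The corrected test: true only for characters inside a word. */
bool part_of_word_and(char ch)
{
    return ch != '\0' && ch != ' ';
}
```

With the OR version a space is treated as part of a word, which is why the else branch that starts a new row never runs.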

Related

Finding the shortest word in a C string

I'm trying to find the shortest word in a given C string, but somehow it fails if there is only one word or if the shortest word is the last one.
I tried starting at the null character and counting backwards until I hit " ", then counting the steps I took, but it did not work properly.
Also, is there a better or faster way to iterate through a string while finding the shortest or longest word?
#include <sys/types.h>
#include <string.h>
#include <stdio.h>
ssize_t find_short(const char *s) {
int min = 100;
int count = 0;
for (int i = 0 ; i < strlen(s); i++) {
if (s[i] != ' ') {
count++;
} else {
if (count < min) {
min = count;
}
count = 0;
}
if (s[i] == '\0') {
count = 0;
while (s[i] != ' ') {
--i;
count++;
if (s[i] == ' ' && count < min) {
min = count;
}
}
}
}
return min;
}
Your idea was correct; you just made it a little more complicated than necessary. Let us break down your code:
int min = 100;
First, you should initialize min to INT_MAX, which is defined in <limits.h>; otherwise the result is wrong when every word is longer than 100 characters.
for (int i = 0 ; i < strlen(s); i++)
Instead of calling strlen, you can test for the C string terminator '\0':
for(int i = 0; s[i] != '\0'; i++)
The if and else part:
if (s[i] != ' ')
{
count++;
}
else
{
if (count < min)
{
min = count;
}
count = 0;
}
This is almost correct, but you need to add count != 0 to the condition count < min, so that leading or repeated spaces are not counted as a word of length 0.
This part can be removed:
if (s[i] == '\0')
{
count = 0;
while(s[i] != ' ')
{
--i;
count++;
if(s[i] == ' ' && count < min)
{
min = count;
}
}
}
Instead, check the last word after the loop. Hence, your code would look like the following:
#include <limits.h>     // INT_MAX
#include <sys/types.h>  // ssize_t
ssize_t find_short(const char *s)
{
    int min = INT_MAX;
    int count = 0;
    // Iterate over the string
    for (int i = 0; s[i] != '\0'; i++) {
        if (s[i] != ' ') { // Inside a word: extend its length
            count++;
        }
        else { // A space ends the current word, if there is one
            if (count != 0 && count < min) { // Check if the word is the shortest so far
                min = count; // Update the minimum
            }
            count = 0; // Reset the counter for the next word
        }
    }
    // Check the size of the last word, which has no space after it
    return (count != 0 && count < min) ? count : min;
}
There are multiple issues in your code:
mixing int, size_t and ssize_t for the index and string lengths and return value is confusing and incorrect as these types have a different range.
int min = 100; produces an incorrect return value if the shortest word is longer than that.
for (int i = 0 ; i < strlen(s); i++) is potentially very inefficient as the string length may be recomputed at every iteration. for (size_t i = 0 ; s[i] != '\0'; i++) is a better alternative.
find_short returns 100 for the empty string instead of 0.
Scanning backwards from the end is tricky and not necessary: to avoid this special case, omit the test in the for loop and detect the end of word by comparing the character with space or the null byte, breaking from the loop in the latter case after potentially updating the minimum length.
The initial value for min should be 0 to account for the case where the string is empty or contains only whitespace. Whenever a word has been found, min should be updated if it is 0 or if the word length is non zero and less than min.
Here is an implementation using <ctype.h> to test for whitespace:
#include <stddef.h>
#include <ctype.h>
size_t find_short(const char *s) {
size_t min = 0, len = 0;
for (;;) {
unsigned char c = *s++;
if (isspace(c) || c == '\0') {
if (min == 0 || (len > 0 && len < min))
min = len;
if (c == '\0') // test for loop termination
break;
len = 0;
} else {
len++;
}
}
return min;
}
Here is a more general alternative using the functions strcspn() and strspn() from <string.h> where you can define the set of word separators:
#include <string.h>
size_t find_short(const char *s) {
size_t min = 0;
const char *seps = " \t\r\n"; // you could add dashes and punctuation
while (*s) {
s += strspn(s, seps);
if (*s) {
size_t len = strcspn(s, seps);
if (min == 0 || (len > 0 && len < min))
min = len;
s += len;
}
}
return min;
}
As you said, your program fails if the shortest word is the last one. That is because the last i for which the loop runs is i == len-1, the index of the last letter of the string. In that last iteration count is incremented, but you never check whether the count of the last word is smaller than the min you had so far.
Assuming that you receive a null-terminated string, you could extend the loop to i <= len (where len = strlen(s)) and adjust the if condition to
if( s[i] != ' ' && s[i] )
That means: if s[i] is neither a space nor the terminating null character.
You can then also remove the if (s[i] == '\0') block.
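Put together, the change described above could look like the following sketch. The name find_short_inclusive is hypothetical, and two details are assumptions added here rather than part of the description: the count != 0 guard against repeated spaces, and returning 0 when the string contains no word at all.

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Sketch of "run the loop up to and including the terminator".
 * Returns the length of the shortest word, or 0 if s has no words. */
int find_short_inclusive(const char *s)
{
    size_t len = strlen(s);
    int min = INT_MAX;
    int count = 0;

    for (size_t i = 0; i <= len; i++) {
        if (s[i] != ' ' && s[i]) {
            count++;                 /* still inside a word */
        } else {
            if (count != 0 && count < min)
                min = count;         /* a word just ended */
            count = 0;
        }
    }
    return (min == INT_MAX) ? 0 : min;
}
```

Because the terminator goes through the same else branch as a space, the last word needs no special handling after the loop.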
About faster algorithms, I don't think it's possible to do better.
If you want, you can automate the count increment with an empty inner for loop that runs until it finds a space, and then check in the outer for how long the inner loop ran.
I once wrote a program for the same problem that uses such an inner for. I show it only for the algorithm; do not take the style of the code as an example, because I was trying to use as few lines as possible, and that is not good practice.
ssize_t find_short(const char *s)
{
ssize_t min = 99, i = 0;
for( --s; !i || (min > 1 && *s); s += i) {
for(i = 1; *(s+i) != ' ' && *(s+i); i++);
if( min > i-1 ) min = i-1;
}
return min;
}
Oh, one improvement I just noticed in my code could be to return the min when it reaches 1 because you know you are not going to find shorter words.
I'd suggest that you use strtok to split your string into tokens, using the space character as the delimiter. You can then process each resulting token to determine which is the "word" that you want.
See Split strings into tokens and save them in an array
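A minimal sketch of the strtok approach follows. It processes each token as it is produced instead of saving the tokens in an array first, which is a simplification of the suggestion above. The name shortest_word_strtok is hypothetical, and returning 0 for a string without words is an assumption.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: length of the shortest space-separated word in s,
 * or 0 if there is none (or if the working copy cannot be allocated). */
size_t shortest_word_strtok(const char *s)
{
    size_t n = strlen(s) + 1;
    char *copy = malloc(n);          /* strtok modifies its argument */
    if (copy == NULL)
        return 0;
    memcpy(copy, s, n);

    size_t min = 0;
    for (char *tok = strtok(copy, " "); tok != NULL; tok = strtok(NULL, " ")) {
        size_t len = strlen(tok);    /* tokens are never empty */
        if (min == 0 || len < min)
            min = len;
    }
    free(copy);
    return min;
}
```

The copy is needed because strtok writes '\0' bytes into the string it scans, so it cannot be used directly on a const char * argument or on a string literal.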

Program to count length of each word in string in C

I'm writing a program to count the length of each word in an array of characters. I was wondering if you could help me, because I've been struggling with it for at least two hours now and I don't know how to do it properly.
It should go like that:
(number of letters) - (number of words with this many letters)
2 - 1
3 - 4
5 - 1
etc.
char tab[1000];
int k = 0, x = 0;
printf("Enter text: ");
fgets(tab, 1000, stdin);
for (int i = 2; i < (int)strlen(tab); i++)
{
for (int j = 0; j < (int)strlen(tab); j++)
{
if (tab[j] == '\0' || tab[j]=='\n')
break;
if (tab[j] == ' ')
k = 0;
else k++;
if (k == i)
{
x++;
k = 0;
}
}
if (x != 0)
{
printf("%d - %d\n", i, x);
x = 0;
k = 0;
}
}
return 0;
By using two for loops, you're doing len**2 character scans. For example, for a buffer of length 1000, instead of 1000 character comparisons, you're doing 1,000,000 comparisons.
This can be done in a single for loop if we use a word length histogram array.
The basic algorithm is the same as your inner loop.
When we have a non-space character, we increment a current length value. When we see a space, we increment the histogram cell (indexed by the length value) by 1. We then set the length value to 0.
Here's some code that works:
#include <stdio.h>
int
main(void)
{
int hist[1000] = { 0 }; // sized like buf, so every word length fgets can deliver has a cell
char buf[1000];
char *bp;
int chr;
int curlen = 0;
printf("Enter text: ");
fflush(stdout);
fgets(buf,sizeof(buf),stdin);
bp = buf;
for (chr = *bp++; chr != 0; chr = *bp++) {
if (chr == '\n')
break;
// end of word -- increment the histogram cell
if (chr == ' ') {
hist[curlen] += 1;
curlen = 0;
}
// got an alpha char -- increment the length of the word
else
curlen += 1;
}
// catch the final word on the line
hist[curlen] += 1;
for (curlen = 1; curlen < sizeof(hist) / sizeof(hist[0]); ++curlen) {
int count = hist[curlen];
if (count > 0)
printf("%d - %d\n",curlen,count);
}
return 0;
}
UPDATE:
and i don't really understand pointers. Is there any simpler method to do this?
Pointers are a very important [essential] tool in the C arsenal, so I hope you get to them soon.
However, it is easy enough to convert the for loop (Removing the char *bp; and bp = buf;):
Change:
for (chr = *bp++; chr != 0; chr = *bp++) {
Into:
for (int bufidx = 0; ; ++bufidx) {
chr = buf[bufidx];
if (chr == 0)
break;
The rest of the for loop remains the same.
Here's another loop that, unless the compiler optimizes it, fetches the char twice per iteration:
for (int bufidx = 0; buf[bufidx] != 0; ++bufidx) {
chr = buf[bufidx];
Here is a single line version. Note this is not recommended practice because of the embedded assignment of chr inside the loop condition clause, but is for illustration purposes:
for (int bufidx = 0; (chr = buf[bufidx]) != 0; ++bufidx) {
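For reference, here is the index-based loop assembled into one function, as a sketch rather than a drop-in replacement. The names word_length_hist and MAX_WORD are hypothetical. Unlike the program above, it skips zero-length "words" produced by repeated spaces and clamps overlong words into the last cell; both are choices made for this example.

```c
#include <stddef.h>

#define MAX_WORD 1000   /* number of histogram cells (hypothetical constant) */

/* Fill hist[n] with the number of words of length n found in buf.
 * Scanning stops at '\n' or '\0'. Words of MAX_WORD - 1 characters or
 * more are all counted in the last cell. */
void word_length_hist(const char *buf, int hist[MAX_WORD])
{
    int curlen = 0;

    for (size_t i = 0; i < MAX_WORD; i++)
        hist[i] = 0;

    for (size_t bufidx = 0; ; ++bufidx) {
        char chr = buf[bufidx];
        if (chr == ' ' || chr == '\n' || chr == '\0') {
            if (curlen > 0)
                hist[curlen] += 1;   /* end of a word */
            curlen = 0;
            if (chr != ' ')
                break;               /* end of the line */
        } else if (curlen < MAX_WORD - 1) {
            curlen += 1;             /* inside a word */
        }
    }
}
```

Treating the end of the line like a space means the final word is counted inside the loop, so no extra statement is needed after it.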

Folding input lines every nth column (K&R 1-22) in C

Write a program to "fold" long input lines into two or more shorter lines after the last non-blank character that occurs before the n-th column of input. Make sure your program does something intelligent with very long lines, and if there are no blanks or tabs before the specified column.
The algorithm I decided to follow for this was as follows:
If length of input line < maxcol (the column after which one would have to fold), then print the line as it is.
If not, starting from maxcol, I check to its left and to its right to find the closest non-space character, and save them as 'first' and 'last'. I then print the character array from line[0] to line[first], and the rest of the array, from line[last] to line[len], becomes the new line array.
Here's my code:
#include <stdio.h>
#define MAXCOL 5
int getline1(char line[]);
int main()
{
char line[1000];
int len, i, j, first, last;
len = getline1(line);
while (len > 0) {
if (len < MAXCOL) {
printf("%s\n", line);
break;
}
else {
for (i = MAXCOL - 1; i >= 0; i--) {
if (line[i] != ' ') {
first = i;
break;
}
}
for (j = MAXCOL - 1; j <= len; j++) {
if (line[j] != ' ') {
last = j;
break;
}
}
//printf("first %d last %d\n", first, last);
for (i = 0; i <= first; i++)
putchar(line[i]);
putchar('\n');
for (i = 0; i < len - last; i++) {
line[i] = line[last + i];
}
len -= last;
first = last = 0;
}
}
return 0;
}
int getline1(char line[])
{
int c, i = 0;
while ((c = getchar()) != EOF && c != '\n')
line[i++] = c;
if (c == '\n')
line[i++] = '\n';
line[i] = '\0';
return i;
}
Here are the problems:
It does not do something intelligent with very long lines (this is fine, as I can add it as an edge case).
It does not do anything for tabs.
I cannot understand a part of the output.
For example, with the input:
asd de def deffff
I get the output:
asd
de
def
defff //Expected until here
//Unexpected lines below
ff
fff
deffff
deffff
deffff
Question 1 - Why do the unexpected lines print? How do I make my program/algorithm better?
Eventually, after spending quite some time with this question, I gave up and decided to check the clc-wiki for solutions. Every program here did NOT work, save one (The others didn't work because they did not cover certain edge cases). The one that worked was the largest one, and it did not make any sense to me. It did not have any comments, and neither could I properly understand the variable names, and what they represented. But it was the ONLY program in the wiki that worked.
#include <stdio.h>
#define YES 1
#define NO 0
int main(void)
{
int TCOL = 8, ch, co[3], i, COL = 19, tabs[COL - 1];
char bls[COL - 1], bonly = YES;
co[0] = co[1] = co[2] = 0;
while ((ch = getchar()) != EOF)
{
if (ch != '\t') {
++co[0];
++co[2];
}
else {
co[0] = co[0] + (TCOL * (1 + (co[2] / TCOL)) - co[2]);
i = co[2];
co[2] = TCOL + (co[2] / TCOL) * TCOL;
}
if (ch != '\n' && ch != ' ' && ch != '\t')
{
if (co[0] >= COL) {
putchar('\n');
co[0] = 1;
co[1] = 0;
}
else
for (i = co[1]; co[1] > 0; --co[1])
{
if (bls[i - co[1]] == ' ')
putchar(bls[i - co[1]]);
else
for (; tabs[i - co[1]] != 0;)
if (tabs[i - co[1]] > 0) {
putchar(' ');
--tabs[i - co[1]];
}
else {
tabs[i - co[1]] = 0;
putchar(bls[i - co[1]]);
}
}
putchar(ch);
if (bonly == YES)
bonly = NO;
}
else if (ch != '\n')
{
if (co[0] >= COL)
{
if (bonly == NO) {
putchar('\n');
bonly = YES;
}
co[0] = co[1] = 0;
}
else if (bonly == NO) {
bls[co[1]] = ch;
if (ch == '\t') {
if (TCOL * (1 + ((co[0] - (co[2] - i)) / TCOL)) -
(co[0] - (co[2] - i)) == co[2] - i)
tabs[co[1]] = -1;
else
tabs[co[1]] = co[2] - i;
}
++co[1];
}
else
co[0] = co[1] = 0;
}
else {
putchar(ch);
if (bonly == NO)
bonly = YES;
co[0] = co[1] = co[2] = 0;
}
}
return 0;
}
Question 2 - Can you help me make sense of this code and how it works?
It fixes all the problems with my solution, and also works by reading character to character, and therefore seems more efficient.
Question 1 - Why do the unexpected lines print? How do I make my program/algorithm better?
You are getting the unexpected lines in the output because, after printing the first part of the array, you are not terminating the new line array with the null character \0.
Here you copy len - last characters, starting from index last, to the front of the array, creating the new line array:
for (i = 0; i < len - last; i++) {
line[i] = line[last + i];
}
You have copied the characters but the null terminating character is still at its original position. Assume the input string is:
asd de def deffff
So, initially the content of line array will be:
"asd de def deffff\n"
                    ^
                    |
          null character is here
Now, after printing asd, you copy len - last characters, starting from index last, into the line array itself, starting at index 0. So, after copying, the content of the line array will be:
"de def deffff\n deffff\n"
                |_______|
                    \/
    This is causing the unexpected output
    (null character is still at the previous location)
So, after for loop you should add the null character just after the last character copied, like this:
line [len - last] = '\0';
With this the content of line array that will be processed in the next iteration of while loop will be:
"de def deffff\n"
One more thing: in the line array you can see the \n (newline) character at the end. If you want to remove it before processing the input, you can do:
line[strcspn(line, "\n")] = 0;
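The copy loop and the explicit terminator discussed above can also be combined into a single memmove call that moves the '\0' along with the characters. This is a sketch: shift_left is a hypothetical helper name, and it assumes 0 <= last <= len.

```c
#include <stddef.h>
#include <string.h>

/* Drop the first `last` characters of line, which currently holds `len`
 * characters followed by a terminator. Returns the new length. */
int shift_left(char *line, int len, int last)
{
    /* len - last characters remain; the + 1 also moves the '\0' */
    memmove(line, line + last, (size_t)(len - last) + 1);
    return len - last;
}
```

memmove is used rather than memcpy because the source and destination ranges overlap.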
Improvements that you can make in your program:
1. One very obvious improvement is to use a pointer into the input string while processing it. With a pointer you do not need to copy the unprocessed remainder of the array back to its start on every pass. Initialize the pointer to the start of the input string and, in every iteration, move it to the position where processing should resume.
2. Since you read the whole input into a buffer before processing it, consider fgets() for reading the input. It gives better control over the input from the user.
3. Add a check for overflow of the line array in case of very long input. With fgets() you can specify the maximum number of characters to be copied into the line array from the input stream.
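The pointer idea from the first improvement can be sketched as follows. This is not the asker's algorithm: fold_line is a hypothetical function that writes into a caller-supplied buffer, breaks at the last blank within the first maxcol + 1 characters of the remaining text, falls back to a hard break inside a word when there is no blank, and ignores tabs. It assumes maxcol >= 1 and an input line whose trailing newline has already been removed.

```c
#include <stddef.h>
#include <string.h>

/* Fold `in` into `out` so that no output line exceeds maxcol characters.
 * Every output line ends with '\n'. Returns 0 on success, -1 if `out`
 * (outsz bytes) is too small. */
int fold_line(const char *in, size_t maxcol, char *out, size_t outsz)
{
    const char *p = in;              /* start of the text not yet emitted */
    size_t used = 0;

    if (outsz == 0)
        return -1;
    while (*p != '\0') {
        size_t rem = strlen(p);
        size_t take = rem;           /* characters to emit on this line */

        if (rem > maxcol) {
            take = maxcol;           /* default: hard break inside a word */
            for (size_t k = maxcol; k > 0; k--) {
                if (p[k] == ' ') {   /* last blank at or before column maxcol */
                    take = k;
                    break;
                }
            }
        }
        if (used + take + 2 > outsz) /* chunk + '\n' + '\0' must fit */
            return -1;
        memcpy(out + used, p, take);
        used += take;
        out[used++] = '\n';

        p += take;                   /* advance instead of shifting the array */
        while (*p == ' ')            /* drop the blanks at the break */
            p++;
    }
    out[used] = '\0';
    return 0;
}
```

With the input asd de def deffff and maxcol set to 5, this produces the lines asd, de, def, defff and f. Nothing is copied back to the start of the input, so there is no terminator to forget.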
Question 2 - Can you help me make sense of this code and how it works?
The program is very simple, try to understand it at least once by yourself. Either use a debugger or take a pen and paper, dry run it once for small size input and check the output. Increase the input size and add some variations like multiple space characters and check the program code path and output. This way you can understand it very easily.
Here's another (and I think better) solution to this exercise :
#include <stdio.h>
#define MAXCOL 10
void my_flush(char buf[]);
int main()
{
int c, prev_char, i, j, ctr, spaceleft, first_non_space_buf;
char buf[MAXCOL+2] = {0}; // zero-initialized so the first buf[0] test reads a defined value
prev_char = -1;
i = first_non_space_buf = ctr = 0;
spaceleft = MAXCOL;
printf("Just keep typing once the output has been printed");
while ((c = getchar()) != EOF) {
if (buf[0] == '\n') {
i = 0;
my_flush(buf);
}
//printf("Prev char = %c and Current char = %c and i = %d and fnsb = %d and spaceleft = %d and j = %d and buf = %s \n", prev_char, c, i, first_non_space_buf, spaceleft, j, buf);
if ((((prev_char != ' ') && (prev_char != '\t') && (prev_char != '\n')) &&
((c == ' ') || (c == '\t') || (c == '\n'))) ||
(i == MAXCOL)) {
if (i <= spaceleft) {
printf("%s", buf);
spaceleft -= i;
}
else {
putchar('\n');
spaceleft = MAXCOL;
for (j = first_non_space_buf; buf[j] != '\0'; ++j) {
putchar(buf[j]);
++ctr;
}
spaceleft -= ctr;
}
i = 0;
my_flush(buf);
buf[i++] = c;
first_non_space_buf = j = ctr = 0;
}
else {
if (((prev_char == ' ') || (prev_char == '\t') || (prev_char == '\n')) &&
((c != ' ') && (c != '\t') && (c != '\n'))) {
first_non_space_buf = i;
}
buf[i++] = c;
buf[i] = '\0';
}
prev_char = c;
}
printf("%s", buf);
return 0;
}
void my_flush(char buf[])
{
int i;
for (i = 0; i < MAXCOL; ++i)
buf[i] = '\0';
}
Below is my solution. I know the thread is no longer active, but my code might help someone who is struggling to follow the code snippets already presented.
Explanation
Keep reading input until it contains '\n' or '\t', or until at least MAXCOL characters have been read.
In case of '\t', use expandTab to replace it with the required spaces; expandTab calls printLine depending on whether the expanded line still fits within MAXCOL.
In case of '\n', use printLine directly and reset the index.
If index reaches MAXCOL:
find the last blank using findBlank and get a new index.
use printLine to print the current line.
get the new index, either 0 or the length of the text carried over to the start of the array, using the newIndex function.
Code
/* fold long lines after last non-blank char */
#include <stdio.h>
#define MAXCOL 10 /* maximum column of input */
#define TABSIZE 8 /* tab size */
char line[MAXCOL + 1]; /* input line; the spare slot is read by findBlank at index MAXCOL */
int expandTab(int index);
int findBlank(int index);
int newIndex(int index);
void printLine(int index);
int main(void) {
int c, index;
index = 0;
while((c = getchar()) != EOF) {
line[index] = c; /* store current char */
if (c == '\t')
index = expandTab(index);
else if (c == '\n') {
printLine(index); /* print current input line */
index = 0;
} else if (++index == MAXCOL) {
index = findBlank(index);
printLine(index);
index = newIndex(index);
}
}
}
/* expand tab into blanks */
int expandTab(int index) {
line[index] = ' '; /* tab is atleast one blank */
for (++index; index < MAXCOL && index % TABSIZE != 0; ++index)
line[index] = ' ';
if (index < MAXCOL) /* line not full yet: keep reading */
return index;
else {
printLine(index);
return 0;
}
}
/* find last blank position */
int findBlank(int index) {
while( index > 0 && line[index] != ' ')
--index;
if (index == 0)
return MAXCOL;
else
return index + 1; /* position just after the blank */
}
/* re-arrange line with new position */
int newIndex(int index) {
int i, j;
if (index <= 0 || index >= MAXCOL)
return 0;
else {
i = 0;
for (j = index; j < MAXCOL; ++j) {
line[i] = line[j];
++i;
}
return i;
}
}
/* print line until passed index */
void printLine(int index) {
int i;
for(i = 0; i < index; ++i)
putchar(line[i]);
if (index > 0)
putchar('\n');
}

Copying a string into a new array

I'm trying to read a string in an array, and if a character is not any of the excluded characters int a = ('a'||'e'||'i'||'o'||'u'||'y'||'w'||'h'); it should copy the character into a new array, then print it.
The code reads as:
void letter_remover (char b[])
{
int i;
char c[MAX];
int a = ('a'||'e'||'i'||'o'||'u'||'y'||'w'||'h');
for (i = 0; b[i] != '\0'; i++)
{
if (b[i] != a)
{
c[i] = b[i];
}
i++;
}
c[i] = '\0';
printf("New string without forbidden characters: %s\n", c);
}
However it only prints New string without forbidden characters: h, if the inputted array is, for example hello. I'd like the output of this to be ll (with h, e and o removed).
Use this:
if (b[i] != 'a' && b[i] != 'e' && b[i] != 'i' && b[i] != 'o' && b[i] != 'u' && b[i] != 'y' && b[i] != 'w' && b[i] != 'h')
The boolean OR operator just returns 0 or 1, it doesn't create an object that automatically tests against all the parameters to the operator.
You could also use the strchr() function to search for a character in a string.
char a[] = "aeiouywh";
int j = 0; /* next free position in c */
for (i = 0; b[i] != '\0'; i++)
{
    if (!strchr(a, b[i]))
    {
        c[j++] = b[i]; /* advance j only when a character is kept */
    }
}
c[j] = '\0';
int a = ('a'||'e'||'i'||'o'||'u'||'y'||'w'||'h');
...has an entirely different meaning than you expect. When you Boolean-OR together all those characters, a becomes 1. Since b[] contains no character with the value 1, no characters will be excluded. Also, your c[] would have empty slots even if the test were correct, because it is indexed with i.
You can use strcspn() to find each run of characters that are not forbidden and copy it in one go. For example...
// snip
size_t i = 0, j = 0;
const char *a = "aeiouywh";
while (b[i])
{
    size_t idx = strcspn(&b[i], a); // length of the run of allowed characters
    memcpy(&c[j], &b[i], idx);
    j += idx;
    i += idx;
    if (b[i]) // stopped on a forbidden character, not on the terminator
        i++;
}
c[j] = '\0';
// etc...
Also, you must be sure c[] is large enough to contain all the characters that might be copied.
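One way to make sure of that is to pass the size of the destination and stop copying before it overflows. The sketch below uses strchr() for the membership test; copy_without is a hypothetical helper name, and truncating silently when the destination is full is an assumption made for brevity.

```c
#include <stddef.h>
#include <string.h>

/* Copy src into dst, leaving out every character listed in banned.
 * At most dstsz - 1 characters are copied, and dst is always terminated
 * when dstsz > 0. Returns the number of characters written. */
size_t copy_without(const char *src, const char *banned, char *dst, size_t dstsz)
{
    size_t j = 0;                        /* next free slot in dst */

    if (dstsz == 0)
        return 0;
    for (size_t i = 0; src[i] != '\0' && j + 1 < dstsz; i++) {
        if (strchr(banned, src[i]) == NULL)
            dst[j++] = src[i];           /* advance j only on a copy */
    }
    dst[j] = '\0';
    return j;
}
```

For example, calling copy_without("hello", "aeiouywh", out, sizeof out) with a 16-byte out leaves "ll" in out and returns 2.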

Print the longest word of a string

I wrote the following code
#include<stdio.h>
int main(void)
{
int i, max = 0, count = 0, j;
char str[] = "I'm a programmer";
for(i = 0; i < str[i]; i++)
{
if (str[i] != ' ')
count++;
else
{
if (max < count)
{
j = i - count;
max = count;
}
count = 0;
}
}
for(i = j; i < j + max; i++)
printf("%c", str[i]);
return 0;
}
The intention is to find and print the longest word, but it does not work when the longest word is the last one. For "I'm a programmer" it prints I'm instead of programmer.
How can I solve this problem? Could someone give me a hand?
The terminating condition of your for loop is wrong. It should be:
for(i = 0; i < strlen(str) + 1; i++)
and also, since at the end of string you don't have a ' ', but you have a '\0', you should change:
if (str[i] != ' ')
to:
if (str[i] != ' ' && str[i] != '\0')
The issue should be rather obvious. You only update your longest found word when the character you are inspecting is a space. But there is no space after the longest word in your test string, so the updating code is never executed for it.
You can repeat the updating code (the if (max < count) block) after the loop, and that should do the trick.
Note, however, that you could have found this easily by adding a few printfs showing the progress of the loop.
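Repeating the update after the loop could look like this sketch. It wraps the logic in a function so the result can be checked; longest_word is a hypothetical name, and the loop condition str[i] != '\0' replaces the i < str[i] test from the question.

```c
#include <stddef.h>

/* Return a pointer to the first longest space-separated word in str and
 * store its length in *len (0 if str contains no words). */
const char *longest_word(const char *str, size_t *len)
{
    size_t max = 0, count = 0, start = 0, i;

    for (i = 0; str[i] != '\0'; i++) {
        if (str[i] != ' ') {
            count++;
        } else {
            if (max < count) {
                start = i - count;
                max = count;
            }
            count = 0;
        }
    }
    /* same check once more: the last word ends at '\0', not at ' ' */
    if (max < count) {
        start = i - count;
        max = count;
    }
    *len = max;
    return str + start;
}
```

The word can then be printed with printf("%.*s\n", (int)len, word), since the returned pointer is not terminated at the end of the word.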
