How do I check if a pattern exists in an entered string? - c

I have an assignment where the user enters a string and then a pattern in one function, and then has to check if the pattern exists in the string and how many times it appears and at what offset. I'm stumped and my classmates keep giving me cryptic hints. Below is my get function
int getNums()
{
printf("Please enter a number: "); //Initial printf
int count, patcount;
int torf;
char len_num[31]; //The character array for the initial entered string
char pat_num[6]; //The character array for the entered pattern after initial string
char *lenptr = len_num; //pointer to the address of the first element of len_num
char *patptr = pat_num; //pointer to the address of the first element of len_num
scanf("%s", len_num); //Where the user scans in their wanted number, which is treated as a string
printf("\n");
printf("%s\n", lenptr);
int len = stringLength(lenptr); //Checks how long string is
int valid = isValid(len_num); //Checks if string is valid
for(count=0; count<len_num[count]; count++) //Checks if length of string is within appropriate range
{
if(len>=10 && len<=30) //Continues to pattern get if within range
{
torf=1;
}
else //Denies continuation if string is outside of range
{
torf=0;
printf("Not within range! Try again!\n");
return (1);
}
}
printf("Please enter a pattern: "); //Initial entry statement for pattern
scanf("%s", pat_num); //User scans in pattern
printf("\n");
printf("%s\n", pat_num);
len = stringPattern(patptr); //Check how long pattern is
valid = isValid(pat_num); //Checks if pattern is valid
for(patcount=0; patcount<pat_num[patcount]; patcount++) //Checks if length of pattern is within appropriate range
{
if(len>=2 && len<=5) //Continues to pattern check if within range
{
torf=1;
}
else //Denies continuation if pattern is outside of range
{
torf=0;
printf("Pattern not within range! Try again!\n");
return (1);
}
}
checkPattern();
}
I don't know how I should start my check function. Not to mention I have to pass by reference with pointers and I'm stuck with that too

Since you have asked for the pattern matching function, I did not check your string input function. You may use this simple driver code to test my solution:
#include <stdio.h>
void findPattern(char* input, char* pattern);
int main()
{
char input[31], pattern[6];
printf("Enter the string: ");
scanf("%s", input);
printf("Enter the pattern: ");
scanf("%s", pattern);
findPattern(input, pattern);
return 0;
}
I prefer the name findPattern over checkPattern; rename it as you see fit. As per your requirement, I have not used any library functions apart from those in stdio.h. The following is my take on this task, with the logic explained in the comments. Basically, it iterates over the input string, checking for a match with the first character of the pattern. When it finds one, it marks the offset and searches further down the pattern for a complete match.
void findPattern(char* input, char* pattern)
{
int i = 0; // iterator for input
int j = 0; // iterator for pattern
// solution variables
int offset = 0;
int occurrence = 0;
// Search the entire input string
while (input[i] != '\0')
{
// Mark the offset whenever the first character of the pattern matches
if (input[i] == pattern[j])
{
offset = i;
// I didn't quite get the relativity of your offset
// Maybe you need: offset = i + 1;
}
// Search for complete pattern match
while (input[i] != '\0' && pattern[j] == input[i])
{
// Go for the next character in the pattern
++j;
// The pattern matched successfully if the entire pattern was searched
if (pattern[j] == '\0')
{
// Display the offset
printf("\nPattern found at offset %d", offset);
// Increment the occurrence
++occurrence;
// There are no more characters left in the pattern
break;
}
else
{
// Go for the next character in the input
// only if there are more characters left to be searched in the pattern
++i;
}
}
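// A partial match has moved i past characters that could start another match,
// so go back to the marked offset before stepping forward
if (j > 0 && pattern[j] != '\0')
{
i = offset;
}
// Reset the pattern iterator to search for a new match
j = 0;
// Increment the input iterator to search further down the string
++i;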
}
// Display the occurrence of the pattern in the input string
printf("\nThe pattern has occurred %d times in the given string", occurrence);
}
I have to pass by reference with pointers and I'm stuck with that too
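If that's the case, you are already doing it: findPattern takes char* parameters, and an array name converts to a pointer to its first element when passed to a function, so the call stays as:
findPattern(input, pattern);
Writing findPattern(&input, &pattern); would instead pass pointers to the whole arrays (types char (*)[31] and char (*)[6]), which do not match the char* parameters.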

You may be way overthinking the solution. You have a string input in which you want to count the number of multi-character matches of pattern. One nice thing about strings is that you do not need to know how long they are to iterate over them, because by definition a string in C ends with the nul-terminating character.
This allows you to simply keep an index within your findpattern function and you increment the index each time the character from input matches the character in pattern (otherwise you zero the index). If you reach the point where pattern[index] == '\0' you have matched all characters in your pattern.
You should declare a function with a return type that gives a meaningful indication of success/failure of whatever operation the function carries out whenever the remainder of your code depends on it (if the function just prints output, then void is fine).
Here, you need a sane return type to indicate whether (and how many) matches of pattern were found in input. A simple int will do (which limits the number of matches that can be returned to INT_MAX, 2147483647 with a 32-bit int, which should be more than adequate).
Putting those pieces together, you could simplify your function to something similar to:
int findpattern (const char *input, const char *ptrn)
{
int n = 0, idx = 0; /* match count and pattern index */
while (*input) { /* loop over each char in input */
if (*input == ptrn[idx]) /* if current matches pattern char */
idx++; /* increment pattern index */
else { /* otherwise, a partial match has failed */
input -= idx; /* back up to where the partial match began */
idx = 0; /* zero pattern index */
}
if (!ptrn[idx]) { /* if end of pattern - match found */
n++; /* increment match count */
idx = 0; /* zero index for next match */
}
input++; /* increment pointer */
}
return n; /* return match count */
}
Adding a short example program that allows you to enter the pattern and input as the first two arguments to the program (or uses the defaults shown if one or both are not provided):
int main (int argc, char **argv) {
char *pattern = argc > 1 ? argv[1] : "my",
*input = argc > 2 ? argv[2] : "my dog has fleas, my cat has none";
int n;
if ((n = findpattern (input, pattern)))
printf ("'%s' occurs %d time(s) in '%s'\n", pattern, n, input);
else
puts ("pattern not found");
}
Note how providing a meaningful return allows you to both (1) validate whether or not a match was found; and (2) provides the number of matches found through the return. The complete code just needs the header stdio.h, e.g.
#include <stdio.h>
int findpattern (const char *input, const char *ptrn)
{
int n = 0, idx = 0; /* match count and pattern index */
while (*input) { /* loop over each char in input */
if (*input == ptrn[idx]) /* if current matches pattern char */
idx++; /* increment pattern index */
else { /* otherwise, a partial match has failed */
input -= idx; /* back up to where the partial match began */
idx = 0; /* zero pattern index */
}
if (!ptrn[idx]) { /* if end of pattern - match found */
n++; /* increment match count */
idx = 0; /* zero index for next match */
}
input++; /* increment pointer */
}
return n; /* return match count */
}
int main (int argc, char **argv) {
char *pattern = argc > 1 ? argv[1] : "my",
*input = argc > 2 ? argv[2] : "my dog has fleas, my cat has none";
int n;
if ((n = findpattern (input, pattern)))
printf ("'%s' occurs %d time(s) in '%s'\n", pattern, n, input);
else
puts ("pattern not found");
}
Example Use/Output
Check for multiple matches:
$ ./bin/findpattern
'my' occurs 2 time(s) in 'my dog has fleas, my cat has none'
A single match:
$ ./bin/findpattern fleas
'fleas' occurs 1 time(s) in 'my dog has fleas, my cat has none'
Pattern not found
$ ./bin/findpattern gophers
pattern not found
All the same pattern:
$ ./bin/findpattern my "mymymy"
'my' occurs 3 time(s) in 'mymymy'
Output From Function Itself
While it would be better to provide a return to indicate the number of matches (which would allow the function to be reused in a number of different ways), if you did just want to make this an output function that outputs the results each time it is called, then simply move the output into the function and declare another pointer to input so input is preserved for printing at the end.
The changes are minimal, e.g.
#include <stdio.h>
void findpattern (const char *input, const char *ptrn)
{
const char *p = input; /* pointer to input */
int n = 0, idx = 0; /* match count and pattern index */
while (*p) { /* loop over each char in input */
if (*p == ptrn[idx]) /* if current matches pattern char */
idx++; /* increment pattern index */
else { /* otherwise, a partial match has failed */
p -= idx; /* back up to where the partial match began */
idx = 0; /* zero pattern index */
}
if (!ptrn[idx]) { /* if end of pattern - match found */
n++; /* increment match count */
idx = 0; /* zero index for next match */
}
p++; /* increment pointer */
}
if (n) /* output results */
printf ("'%s' occurs %d time(s) in '%s'\n", ptrn, n, input);
else
puts ("pattern not found");
}
int main (int argc, char **argv) {
char *pattern = argc > 1 ? argv[1] : "my",
*input = argc > 2 ? argv[2] : "my dog has fleas, my cat has none";
findpattern (input, pattern);
}
(use and output are the same as above)
Look things over and let me know if you have further questions.

Related

Writing a C program that removes every occurrence of a char except the last one

Im trying to write a C program that removes all occurrences of repeating chars in a string except the last occurrence.For example if I had the string
char word[]="Hihxiivaeiavigru";
output should be:
printf("%s",word);
hxeavigru
What I have so far:
#include <stdio.h>
#include <string.h>
int main()
{
char word[]="Hihxiiveiaigru";
for (int i=0;i<strlen(word);i++){
if (word[i+1]==word[i]);
memmove(&word[i], &word[i + 1], strlen(word) - i);
}
printf("%s",word);
return 0;
}
I am not sure what I am doing wrong.
With short strings, any algorithm will do. OP's attempt is O(n*n) (as are the other working answers, including the one from @David C. Rankin that identified OP's shortcomings).
But what if the string was thousands or millions of characters long?
Consider the following algorithm (@paulsm4):
Form a `bool` array used[CHAR_MAX - CHAR_MIN + 1] and set each element to false.
i = unique = n - 1;
From the end of the string (n-1 to 0) to the front:
if (character never seen yet) { // used[] look-up
array[unique] = array[i];
unique--;
}
Mark used[array[i]] as true (index from CHAR_MIN)
i--;
Shift the string "to the left" (unique - i) places
Solution is O(n)
Coding goal is too fun to just post a fully coded answer.
I would first write a function to determine if a char ch at a given position p is the last occurrence of ch in a given char * string. Like,
bool isLast(char *word, char ch, int p) {
p++;
ch = tolower(ch);
while (word[p] != '\0') {
if (tolower(word[p]) == ch) {
return false;
}
p++;
}
return true;
}
Then you can use that to iteratively emit your desired characters like
int main() {
char *word = "Hihxiivaeiavigru";
for (int i = 0; word[i] != '\0'; i++) {
if (isLast(word, word[i], i)) {
putchar(word[i]);
}
}
putchar('\n');
}
And (for completeness) I used
#include <stdio.h>
#include <ctype.h>
#include <stdbool.h>
Outputs (as requested)
hxeavigru
Additional areas where you are currently hurting yourself.
Your for loop must NOT increment the index unconditionally, e.g. for (int i=0; word[i];). This is because when you memmove() by 1, every later character shifts down by one index, so the next character to examine is already at i. That also means the index to save for last is now i - 1.
There should only be one call to strlen() in the program. You can simply subtract one from the length each time memmove() is called.
Only increment your loop counter variable when memmove() is not called.
Additionally, avoid hardcoding strings. You shouldn't have to recompile your code just to test the results of "Hihxiivaeiaigrui" instead of "Hihxiivaeiaigru". You shouldn't have to recompile just to remove all but the last 'a' instead of the 'i'. Either pass the string and character to find as arguments to your program (that's what int argc, char **argv are for), or prompt the user for input.
Putting it all together, you could do (presuming word is 1023 characters or fewer):
#include <stdio.h>
#include <string.h>
#define MAXC 1024
int main (int argc, char **argv) {
char word[MAXC]; /* storage for word */
strcpy (word, argc > 1 ? argv[1] : "Hihxiivaeiaigru"); /* copy to word */
int find = argc > 2 ? *argv[2] : 'i', /* character to find */
last = -1; /* last index where find found */
size_t len = strlen (word); /* only compute strlen once */
printf ("%s (removing all but last %c)\n", word, find);
for (int i=0; word[i];) { /* loop over each char -- do NOT increment */
if (word[i] == find) { /* is this my character to find? */
if (last != -1) { /* if last is set */
/* overwrite last with rest of word */
memmove (&word[last], &word[last + 1], (int)len - last);
last = i - 1; /* last now i - 1 (we just moved it) */
len = len - 1;
}
else { /* last not set */
last = i; /* set it */
i++; /* increment loop counter */
}
}
else /* all other chars */
i++; /* just increment loop counter */
}
puts (word); /* output result -- no need for printf (no conversions) */
}
Example Use/Output
$ ./bin/rm_all_but_last_occurrence
Hihxiivaeiaigru (removing all but last i)
Hhxvaeaigru
What if you want to use "Hihxiivaeiaigrui"? Just pass it as the 1st argument:
$ ./bin/rm_all_but_last_occurrence Hihxiivaeiaigrui
Hihxiivaeiaigrui (removing all but last i)
Hhxvaeagrui
What if you want to use "Hihxiivaeiaigrui" and remove duplicate 'a' characters? Just pass the string to search as the 1st argument and the character to find as the second:
$ ./bin/rm_all_but_last_occurrence Hihxiivaeiaigrui a
Hihxiivaeiaigrui (removing all but last a)
Hihxiiveiaigrui
Nothing removed if only one of the characters:
$ ./bin/rm_all_but_last_occurrence Hihxiivaeiaigrui H
Hihxiivaeiaigrui (removing all but last H)
Hihxiivaeiaigrui
Let me know if you have further questions.
Im trying to write a C program that removes all occurrences of repeating chars in a string except the last occurrence.
Process the string (or word) from the last character toward the first. Seen that way, the problem becomes removing every occurrence of a character except the first one encountered. Because the string is processed from back to front, the characters that remain are collected at the end of the buffer; once the whole string has been processed, and only if duplicate characters were found, they are moved to the start of the string. The complexity of this algorithm is O(n).
Implementation:
#include <stdio.h>
#include <string.h>
#include <ctype.h>
#define INDX(x) (tolower(x) - 'a')
void remove_dups_except_last (char str[]) {
int map[26] = {0}; /* to keep track of a character processed */
size_t len = strlen (str);
char *p = str + len; /* pointer pointing to null character of input string */
size_t i = 0;
for (i = len; i != 0; --i) {
if (map[INDX(str[i - 1])] == 0) {
map[INDX(str[i - 1])] = 1;
*--p = str[i - 1];
}
}
/* copy only if there were duplicate characters */
if (p != str) {
for (i = 0; *p; ++i) {
str[i] = *p++;
}
str[i] = '\0';
}
}
int main(int argc, char* argv[])
{
if (argc != 2) {
printf ("Invalid number of arguments\n");
return -1;
}
char str[1024] = {0};
/* Assumption: the input string/word will contain characters A-Z and a-z
* only and size of input will not be more than 1023.
*
* Leaving it up to you to check the valid characters in input string/word
*/
strcpy (str, argv[1]);
printf ("Original string : %s\n", str);
remove_dups_except_last (str);
printf ("Removed duplicated characters except the last one, modified string : %s\n", str);
return 0;
}
Testcases output:
# ./a.out Hihxiivaeiavigru
Original string : Hihxiivaeiavigru
Removed duplicated characters except the last one, modified string : hxeavigru
# ./a.out aa
Original string : aa
Removed duplicated characters except the last one, modified string : a
# ./a.out a
Original string : a
Removed duplicated characters except the last one, modified string : a
# ./a.out TtYyuU
Original string : TtYyuU
Removed duplicated characters except the last one, modified string : tyU
You can iterate over each character of your string and copy it to a new string if it is not 'i', or if it is the last occurrence of 'i'.
#include <stdio.h>
#include <string.h>
int main() {
char word[]="Hihxiiveiaigru";
char newword[10000];
char* ptr = strrchr(word, 'i');
int index=0;
int index2=0;
while (index < strlen(word)) {
if (word[index]!='i' || index ==(ptr - word)) {
newword[index2]=word[index];
index2++;
}
index++;
}
newword[index2] = '\0'; /* terminate the new string before printing it */
printf("%s",newword);
return 0;
}

Printf not printing - returns NULL

Beginner here. I'm trying to write some code that takes a sentence and returns the longest word. When I debug the program everything looks correct, as I'd expect, including the char array. However, when I come to print the output I invariably get a NULL...
I've put in the entire code because I think one of the loops must be affecting the array string pointer in some way?
#include <stdio.h>
#include <string.h>
void LongestWord(char sen1[500]) {
/*
steps:
1. char pointer. Each Byte holds array position of each space or return value Note space = 32 & return = 10.
2. Once got above asses biggest word. Biggest word stored in short int (starting position)
3. Once got biggest word start - move to sen using strncpy
*/
char sen[500];
char *ptr = sen;
int i = 0;
int space_position[500];
int j = 0;
int k = 0;
int word_size_prior_to_each_position[500];
int l = 0;
int largest = 0;
int largest_end_position = 0;
int largest_start_position =0;
memset(&sen[0], 0, 500);
memset(&space_position[0], 0, 2000);
memset(&word_size_prior_to_each_position[0], 0, 2000);
while (i < 500) { //mark out where the spaces or final return is
if ((sen1[i] == 0b00100000) ||
(sen1[i] == 0b00001010))
{
space_position[j] = i;
j = j+1;
}
i = i+1;
}
while (k < 500) {
if (k == 0) {
word_size_prior_to_each_position[k] = (space_position[k]);
}
//calculate word size at each position
if ((k > 0) && (space_position[k] != 0x00)) {
word_size_prior_to_each_position[k] = (space_position[k] - space_position[k-1]) -1;
}
k = k+1;
}
while (l < 500) { //find largest start position
if (word_size_prior_to_each_position[l] > largest) {
largest = word_size_prior_to_each_position[l];
largest_end_position = space_position[l];
largest_start_position = space_position[l-1];
}
l = l+1;
}
strncpy(ptr, sen1+largest_start_position+1, largest);
printf("%s", *ptr);
return 0;
}
int main(void) {
char stringcapture[500];
fgets(stringcapture, 499, stdin);
LongestWord(stringcapture); //this grabs input and posts into the longestword function
return 0;
}
In the function LongestWord replace
printf("%s", *ptr);
with
printf("%s\n", ptr);
*ptr denotes a single character, but you want to print a string (see %s specification), so you must use ptr instead. It makes sense to also add a line break (\n).
Also remove the
return 0;
there, because it's a void function.
Returning the longest word
To return the longest word from the function as pointer to char, you can change the function signature to
char *LongestWord(char sen1[500])
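Since your pointer ptr points to a local array in LongestWord, returning it directly would leave a dangling pointer as soon as the function returns.
Therefore you need to return a copy that outlives the function, something like this (strdup is declared in <string.h>; it is a POSIX function that was added to standard C in C23):
return strdup(ptr);
Then in main you can change your code as follows (free is declared in <stdlib.h>):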
char *longest_word = LongestWord(stringcapture);
printf("%s\n", longest_word);
free(longest_word);
Some more Hints
You have a declaration
int space_position[500];
Then you are calling:
memset(&space_position[0], 0, 2000);
Here you are assuming that an int is 4 bytes. That assumption leads to non-portable code.
You should rather use:
memset(&space_position[0], 0, sizeof(space_position));
You can even write:
memset(space_position, 0, sizeof(space_position));
since the array name converts to a pointer to its first element anyway.
Applied to your memsets, it would look like this:
memset(sen, 0, sizeof(sen));
memset(space_position, 0, sizeof(space_position));
memset(word_size_prior_to_each_position, 0, sizeof(word_size_prior_to_each_position));
Instead of using some binary numbers for space and return, you can alternatively use the probably more readable notation of ' ' and '\n', so that you could e.g. write:
if ((sen1[i] == ' ') ||
(sen1[i] == '\n'))
instead of
if ((sen1[i] == 0b00100000) ||
(sen1[i] == 0b00001010))
The variable largest_end_position is assigned but never used anywhere, so it can be removed.
The following line
strncpy(ptr, sen1 + largest_start_position + 1, largest);
would omit the first letter of the word if the first word were also the longest. It seems largest_start_position is the position of the space, but in case of the first word (largest_start_position == 0) you start to copy from index 1. This special case needs to be handled.
You have a local array in main that is not initialized, yet LongestWord reads all 500 elements regardless of how many characters fgets actually stored.
So instead of
char stringcapture[500];
you must write
char stringcapture[500];
memset(stringcapture, 0, sizeof(stringcapture));
alternatively you could use:
char stringcapture[500] = {0};
Finally in this line:
largest_start_position = space_position[l - 1];
You access the array outside the boundaries if l==0 (space_position[-1]). So you have to write:
if (l > 0) {
largest_start_position = space_position[l - 1];
}
else {
largest_start_position = 0;
}
While Stephan has provided you with a good answer addressing the problems you were having with your implementation of your LongestWord function, you may be over-complicating what you are doing to find the longest word.
To be useful, think about what you need to know when getting the longest word from a sentence. You want to know (1) what the longest word is; and (2) how many characters does it contain? You can always call strlen again when the function returns, but why? You will have already handled that information in the function, so you might as well make that information available back in the caller.
You can write your function in a number of ways to either return the length of the longest word, or a pointer to the longest word itself, etc. If you want to return a pointer to the longest word, you can either pass an array of sufficient size as a parameter to the function for filling within the function, or you can dynamically allocate storage within the function so that the storage survives the function return (allocated storage duration versus automatic storage duration). You can also declare an array static and preserve storage that way, but that will limit you to one use of the function in any one expression. If returning a pointer to the longest word, to also make the length available back in the caller, you can pass a pointer as a parameter and update the value at that address within your function, making the length available back in the calling function.
So long as you are simply looking for the longest word, the longest word in the unabridged dictionary (non-medical) is 29-characters (taking 30-characters storage total), or for medical terms the longest word is 45-character (taking 46-characters total). So it may make more sense to simply pass an array to fill with the longest word as a parameter since you already know what the max-length needed will be (an array of 64-chars will suffice -- or double that to not skimp on buffer size, your call).
Rather than using multiple arrays, a simple loop and a pair of pointers is all you need to walk down your sentence buffer bracketing the beginning and end of each word to pick out the longest one. (and the benefit there, as opposed to using a strtok, etc. is the original sentence is left unchanged allowing it to be passed as const char * allowing the compiler to further optimize the code)
A longest_word function that takes the sentence and the word to fill as parameters and returns the length of the longest word is fairly straightforward to write with a single loop. This is loosely referred to as a State Loop, where you use a simple flag to keep track of your read state, i.e. whether you are in a word within the sentence or in whitespace before, between or after the words in the sentence. A simple In/Out state flag.
Then you simply use a pointer p to locate the beginning of each word, and an end-pointer ep to advance down the sentence to locate the end of each word, checking for the word with the max-length as you go. You can use the isspace() macro provided in ctype.h to locate the spaces between each word.
The loop itself does nothing more than loop continually while you keep track of each pointer and then check which word is the longest by the simple pointer difference ep - p when the end of each word is found. If a word is longer than the previous max, then copy that to your longest word array and update max with the new max-length.
A short implementation could be similar to:
size_t longest_word (const char *sentence, char *word)
{
const char *p = sentence, *ep = p; /* pointer & end-pointer */
size_t in = 0, max = 0; /* in-word flag & max len */
if (!sentence || !*sentence) /* if NULL or empty, set word empty */
return (*word = 0);
for (;;) { /* loop continually */
if (isspace (*ep) || !*ep) { /* check whitespace & end of string */
if (in) { /* if in-word */
size_t len = ep - p; /* get length */
if (len > max) { /* if greater than max */
memcpy (word, p, len); /* copy to word */
word[len] = 0; /* nul-terminate word */
max = len; /* update max */
}
p = ep; /* update pointer to end-pointer */
in = 0; /* zero in-word flag */
}
if (!*ep) /* if end of string, bail */
break;
}
else { /* non-space character */
if (!in) { /* if not in-word */
p = ep; /* update pointer to end-pointer */
in = 1; /* set in-word flag */
}
}
ep++; /* advance end-pointer */
}
return max; /* return max length */
}
A complete example taking the sentence to be read as user-input could be similar to:
#include <stdio.h>
#include <string.h>
#include <ctype.h>
#define MAXWRD 64 /* longest word size */
#define MAXC 2048 /* max characters in sentence */
size_t longest_word (const char *sentence, char *word)
{
const char *p = sentence, *ep = p; /* pointer & end-pointer */
size_t in = 0, max = 0; /* in-word flag & max len */
if (!sentence || !*sentence) /* if NULL or empty, set word empty */
return (*word = 0);
for (;;) { /* loop continually */
if (isspace (*ep) || !*ep) { /* check whitespace & end of string */
if (in) { /* if in-word */
size_t len = ep - p; /* get length */
if (len > max) { /* if greater than max */
memcpy (word, p, len); /* copy to word */
word[len] = 0; /* nul-terminate word */
max = len; /* update max */
}
p = ep; /* update pointer to end-pointer */
in = 0; /* zero in-word flag */
}
if (!*ep) /* if end of string, bail */
break;
}
else { /* non-space character */
if (!in) { /* if not in-word */
p = ep; /* update pointer to end-pointer */
in = 1; /* set in-word flag */
}
}
ep++; /* advance end-pointer */
}
return max; /* return max length */
}
int main (void) {
char buf[MAXC], word[MAXWRD];
size_t len;
if (!fgets (buf, MAXC, stdin)) {
fputs ("error: user canceled input.\n", stderr);
return 1;
}
len = longest_word (buf, word);
printf ("longest: %s (%zu-chars)\n", word, len);
return 0;
}
Example Use/Output
Entered string has 2-character leading whitespace as well as 2-characters trailing whitespace:
$ ./bin/longest_word
1234 123 12 123456 1234 123456789 12345678 1 1234
longest: 123456789 (9-chars)
This isn't intended to be a substitute for Stephan's answer helping with the immediate issues in your implementation, rather this is an example providing you with an alternative way to think about approaching the problem. Generally the simpler you can keep any coding task, the less error prone it will be. Look it over and let me know if you have any further questions about the approach.

Strange letter print out from array c programming

I am trying to delete the blank space in a string and print out the first word with the isalpha() function.
When I print it out, only the first letter is printed. For example, with "hello big panda" I get "hhhhh", but I want the whole word "hello" instead.
int main()
{
char inputString[]={"hello big panda"};
int k=0;
int i=0;
do
{
inputString[i];
i++;
}
while (inputString[i]=isalpha(inputString[i]));
for(i=0; inputString[i] !='\0' ;i++)
{
for (k=i; inputString[k] != '\0'; k++)
{
inputString[k] =inputString[i];
}
}
printf("%s", inputString);
return 0;
}
done this:
int printfirstword(char sentence[])
{
int k=0;
int i=0;
while (isalpha(sentence[i])) //checking for the first blank space
{
i++;
}
sentence[i] = '\0';
printf("%s\n", sentence);
return 0;
}
int main()
{
char sentence[100];
int wordNumber;
char answer;
printfirstword("Hello there")
return0;
}
But I don't want to change the string that is passed to it
What you can simply do is use a while loop instead of your do-while. You can simply increment i until you find the index of first blank space. Then using the value of i you can insert '\0' in your string. Output it and you are done. :)
#include <stdio.h>
#include<ctype.h>
int main()
{
char inputString[]={"hello big panda"};
int k=0;
int i=0;
while (isalpha(inputString[i])) //checking for the first blank space
{
i++;
}
inputString[i] = '\0';
printf("%s", inputString);
return 0;
}
If you would like to keep the original string then you could simply make a new string say newStr and then
while (isalpha(inputString[i])) //checking for the first blank space
{ newStr[i]=inputString[i]; //copy first word into newStr
i++;
}
newStr[i] = '\0';
printf("%s", newStr);
Your function must do 3 things as it works through the sentence finding words. (1) always check for the end of the string to prevent an attempted read beyond the end of the sentence; and (2) locate and print the requested word at the index given; and (3) handle the condition where the user requests a word index greater than that available.
(you should always test the sentence you are passed in the function to make sure the pointer isn't a NULL pointer, and that the contents of the sentence isn't simply the '\0' character indicating an empty-string)
An easy way to do this (after you have tested the input string) is to set up a continual loop that repeatedly reads the characters of a word, checks if it is the word to print (if so, it prints it), then reads and discards all the non-alpha characters before the next word, and then repeats.
Something simple like the following works. It takes the sentence (or updated position within the sentence) and the zero-based index of the word to print, e.g. (0, 1, 2, ...), and then loops as described above.
(note: you can change the zero-index scheme to a 1, 2, 3, ... word-number scheme by initializing n=1; instead of 0 -- but since everything in C is zero indexed, that is left to you)
#include <stdio.h>
#include <ctype.h>
int prnword (const char *s, int nwrd)
{
int n = 0; /* word counter */
const char *p = s; /* pointer to s (const, to match the parameter) */
if (!s || !*s) { /* test s not NULL and not empty */
fprintf (stderr, "error: string NULL, empty or at end.\n");
return 0;
}
for (;;) { /* loop continually until exit condition reached */
while (*p && isalpha(*p)) { /* loop over chars in s */
if (n == nwrd) /* if requested index */
putchar (*p); /* print all chars */
p++; /* increment pointer */
}
while (*p && !isalpha(*p)) /* iterate find next alpha */
p++;
if (++n > nwrd) /* if past our word, break */
break;
if (!*p) /* if end reached, break */
break;
}
if (n <= nwrd) { /* check request exceeds available words */
fprintf (stderr, "error: request word '%d' "
"exceeds available wprds indexes.\n", nwrd);
return 0;
}
putchar ('\n'); /* tidy up with new line */
return p - s; /* return number of chars to next alpha */
}
int main (void) {
char str[] = "hello big panda";
int nchars = 0;
/* example -- all words in order
* passing update string position
*/
nchars = prnword (str, 0);
nchars += prnword (str + nchars, 0);
nchars += prnword (str + nchars, 0);
putchar ('\n');
/* request exceed available zero-based word indexes */
nchars = 0;
nchars += prnword (str, 3);
putchar ('\n');
/* print 2nd word only */
nchars = 0;
nchars = prnword (str, 1);
putchar ('\n');
return 0;
}
Example Use/Output
Note the first block of calls to prnword print each of the words in the sentence, saving the number of characters returned by prior calls and using that to start the function reading the 1st character of the desired word, meaning you are always looking for word index 0.
The second call intentionally gives an index one past the last word to force handling the error.
And finally, the last call simply says "Go print word 2" (index 1) starting from scratch.
Example Use/Output
$ ./bin/words
hello
big
panda
error: request word '3' exceeds available wprds indexes.
big
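If you would rather have the word handed back than printed, the same scan can fill a caller-supplied buffer. The following is only a sketch along those lines; copy_word and its parameters are hypothetical names, not part of the code above.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Copy the zero-indexed word 'nwrd' of 's' into 'out' (capacity 'outsz').
 * Returns the word length, or -1 if the word does not exist or does not fit.
 * Hypothetical helper, written for illustration only. */
int copy_word (const char *s, int nwrd, char *out, size_t outsz)
{
    int n = 0;                  /* index of the word currently being scanned */
    size_t len = 0;

    assert (out != NULL && outsz > 0);
    out[0] = '\0';
    if (!s || nwrd < 0)
        return -1;

    /* skip any leading non-alpha characters */
    while (*s && !isalpha ((unsigned char)*s))
        s++;

    while (*s) {                /* *s is the first char of word 'n' here */
        if (n == nwrd) {        /* requested word: copy its characters */
            while (*s && isalpha ((unsigned char)*s)) {
                if (len + 1 >= outsz) {     /* no room for char + '\0' */
                    out[0] = '\0';
                    return -1;
                }
                out[len++] = *s++;
            }
            out[len] = '\0';
            return (int)len;
        }
        while (*s && isalpha ((unsigned char)*s))   /* skip this word */
            s++;
        while (*s && !isalpha ((unsigned char)*s))  /* skip separators */
            s++;
        n++;
    }
    return -1;                  /* ran out of words before reaching nwrd */
}
```

Called as copy_word (str, 1, buf, sizeof buf) on "hello big panda", this should leave "big" in buf and return 3.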
Look things over and let me know if you have questions.

C : How to sort words from variable number of files with frequency # and in alphabetical order

I'm new to C and I'm having trouble writing a C program that takes a variable number of files via command line arguments, sorts the words in (ASCII) alphabetical order and prints only unique words, but includes the frequencies. I managed to get as far as sorting words from user input in alphabetical order, but I don't know how to properly write the code to take file input, and I also have no clue how to print each unique word only once with its frequency.
here's what I got so far, which takes stdin rather than file and lacks frequency count:
#include <stdio.h>
#include <string.h>
int main(void) {
char a[2048][2048];
int i = 0,
j = 0,
k = 0,
n;
while(i < 2048 && fgets(a[i], 2048, stdin) != NULL)
{
n = strlen(a[i]);
if(n > 0 && a[i][n-1] == '\n')
a[i][n -1] = '\0';
i++;
}
for(j = 0; j < i; j++)
{
char max[2048];
strcpy (max,a[j]);
for(k = j + 1; k < i; k++)
{
if(strcmp(a[k], max) < 0)
{
char temp[2048];
strcpy(temp, a[k]);
strcpy(a[k], max);
strcpy(max, temp);
}
}
strcpy(a[j],max);
}
for( j = 0; j < i; j++){
printf("%s\n", a[j]);
}
return 0;
}
Reading words from a file into an array that holds only unique words, while keeping track of the number of times each word is seen, can be done in a couple of ways. An easy and straightforward approach is to keep 2 separate arrays: the first, a 2D character array of sufficient size to hold the number of words anticipated, and the second, a numeric array (unsigned int or size_t) that holds the number of times each word is seen, at the same index as the word is stored in the character array.
The only challenge while reading words from the file is to determine if a word has been seen before, if not, the new word is added to the seen character array at a given index and the frequency array freq is then updated at that index to reflect the word has been seen 1 time (e.g. freq[index]++;).
If while checking against your list of words in seen, you find the current word already appears at index X, then you skip adding the word to seen and simply update freq[X]++;.
Below is a short example that does just that. Give it a try and let me know if you have any questions:
#include <stdio.h>
#include <string.h>
#define MAXW 100
#define MAXC 32
int main (int argc, char **argv) {
/* initialize variables & open file or stdin for reading */
char seen[MAXW][MAXC] = {{ 0 }};
char word[MAXC] = {0};
size_t freq[MAXW] = {0};
size_t i, idx = 0;
FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;
if (!fp) {
fprintf (stderr, "error: file open failed '%s'.\n", argv[1]);
return 1;
}
/* read 1st word into 'seen' array, update index 'idx' */
if (fscanf (fp, " %31[^ ,.\t\n]%*c", word) == 1) { /* width MAXC-1 leaves room for '\0' */
strcpy (seen[idx], word);
freq[idx]++;
idx++;
}
else {
fprintf (stderr, "error: file read error.\n");
return 1;
}
/* read each word in file */
while (fscanf (fp, " %31[^ ,.\t\n]%*c", word) == 1) { /* width MAXC-1 leaves room for '\0' */
/* check against all words in seen */
for (i = 0; i < idx; i++) {
/* if word already in 'seen', update 'freq' count */
if (strcmp (seen[i], word) == 0) {
freq[i]++;
goto skipdup; /* skip adding word to 'seen' */
}
} /* add word to 'seen', update freq & 'idx' */
strcpy (seen[idx], word);
freq[idx]++;
idx++;
skipdup:
if (idx == MAXW) { /* check 'idx' against MAXW */
fprintf (stderr, "warning: MAXW words exceeded.\n");
break;
}
}
if (fp != stdin) fclose (fp);
printf ("\nthe occurrence of words are:\n\n");
for (i = 0; i < idx; i++)
printf (" %-28s : %zu\n", seen[i], freq[i]);
return 0;
}
Compile
gcc -Wall -Wextra -O3 -o bin/file_words_occur file_words_occur.c
Input
$ cat dat/words.txt
the quick brown fox jumps over the lazy dog. the fox jumps over the dog to avoid the squirrel.
Output
$ ./bin/file_words_occur <dat/words.txt
the occurrence of words are:
the : 8
quick : 1
brown : 1
fox : 2
jumps : 2
over : 2
lazy : 1
dog : 2
to : 1
avoid : 1
squirrel : 2
was : 1
in : 1
path : 1
of : 1
captain : 1
jack : 1
sparrow : 1
a : 1
pirate : 1
so : 1
brave : 1
on : 1
seven : 1
seas : 1
Note: the longest word in the abridged dictionaries is 28 chars long (Antidisestablishmentarianism). It requires space for the nul-terminating character for a total of 29 chars. The choice of MAXC of 32 should accommodate all normal words.
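The lookup-then-add step on the two parallel arrays can also be pulled into a small helper, which avoids the goto. This is only a sketch of that idea; add_word is a hypothetical name, and it assumes the same MAXW and MAXC limits as the example.

```c
#include <assert.h>
#include <string.h>

#define MAXW 100   /* same limits as the example above */
#define MAXC 32

/* Record one occurrence of 'word' in the parallel arrays 'seen' and 'freq'.
 * '*n' is the number of slots already in use and is incremented when a new
 * word is stored. Returns the index of the word, or -1 if the table is full
 * or the word is too long for a slot. */
int add_word (char seen[][MAXC], size_t *freq, size_t *n, const char *word)
{
    size_t i;

    assert (seen && freq && n && word);

    for (i = 0; i < *n; i++)            /* already seen? bump its count */
        if (strcmp (seen[i], word) == 0) {
            freq[i]++;
            return (int)i;
        }

    if (*n == MAXW || strlen (word) >= MAXC)
        return -1;                      /* no room for a new entry */

    strcpy (seen[*n], word);            /* new word goes in the next slot */
    freq[*n] = 1;
    (*n)++;
    return (int)(*n - 1);
}
```

The read loop then reduces to calling add_word (seen, freq, &idx, word) for each word and stopping when it returns -1.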
Handle Multiple Files + Sorting Words/Occurrences Alphabetically
As noted in the comments, handling multiple files can be done with the existing code simply by utilizing the code's ability to read from stdin. All you need to do is cat file1 file2 file3 | ./prog_name. Updating the code to handle multiple files as arguments is not difficult either. (you could just wrap the existing body with a for (j = 1; j < argc; j++) loop and open/close each filename provided; some other slight tweaks to the fp declaration are also needed)
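As a rough sketch of that wrap-it-in-a-loop idea, the following opens each named file in turn and totals the words read. The function names are made up for illustration, and a plain "%63s" read stands in for the word-splitting format used above.

```c
#include <assert.h>
#include <stdio.h>

/* Count whitespace-separated words on an already open stream. */
size_t count_words (FILE *fp)
{
    char word[64];
    size_t n = 0;

    assert (fp != NULL);
    while (fscanf (fp, "%63s", word) == 1)  /* width 63 leaves room for '\0' */
        n++;
    return n;
}

/* Open each named file in turn and total the words found.
 * Files that cannot be opened are reported and skipped. */
size_t count_words_in_files (int nfiles, char **names)
{
    size_t total = 0;
    int j;

    for (j = 0; j < nfiles; j++) {
        FILE *fp = fopen (names[j], "r");
        if (!fp) {
            fprintf (stderr, "error: file open failed '%s'.\n", names[j]);
            continue;
        }
        total += count_words (fp);
        fclose (fp);                        /* close before the next file */
    }
    return total;
}
```

From main you would call it as count_words_in_files (argc - 1, argv + 1) so that the program name in argv[0] is skipped.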
But what's the fun in that? Whenever you think about doing the same thing more than once in your program, the "I should make that a function" lightbulb should wink on. That is the proper way to think about handling repetitive processes in your code. (arguably, since there is just one thing we are doing more than once, and since we could simply wrap that in a for loop, we could get by without a function in this case -- but where is the learning in that?)
OK, so we know we are going to move the file-read/frequency-count code to a function, but what about the sort requirement? That's where we need to change the data handling from 2-arrays to an array of struct. Why go from 2-arrays to handling the data in a struct?
When you sort the words alphabetically, you must maintain the relationship between the seen array and the freq array so after the sort, you have the right number of occurrences with the right word. You cannot independently sort the arrays and keep that relationship. However, if we put both the word and the occurrences of that word in a struct, then we could sort an array of structs by the word and the right number of occurrences remains associated with the right word. e.g. something like the following would work:
typedef struct {
char seen[MAXC];
size_t freq;
} wfstruct;
(wfstruct is just a semi-descriptive name for word-frequency struct, it can be anything that makes sense to you)
Which in your program you will declare as an array of wfstruct with something like:
wfstruct words[MAXW];
(you will actually want to initialize each member to zero -- that is done in the actual code below)
How to sort an array of that? qsort is your friend. qsort will sort a collection of anything so long as you can pass qsort (1) the array, (2) how many elements to sort, (3) the size of the elements, and (4) a compare function that takes a const void pointer to the elements it will compare. This always gives new C programmers fits because you have to figure out (a) how to pass the element of your array-of-whatever as a pointer, and (b) then how to handle getting the data you need back out of the pointer in the function to compare.
The declaration for a comparison function for qsort is:
int compare (const void *a, const void *b);
To write the compare function, all you need to ask yourself is "What do I need to compare to sort my collection the way I want it sorted?" In this case you know you want to sort the array of structs by the word seen in each element of the array of wfstruct. You know seen will be a simple character string, so you can sort using strcmp.
Then the final thing you need to ask yourself is "How in the heck do I get my seen string out of const void *a (and *b) so I can feed it to strcmp?" Here you know the const void *a must represent the basic element of what you will be sorting, which is a wfstruct. So you know that const void *a is a pointer to wfstruct. Since it is a pointer, you know you must use the -> operator to access the seen member of the struct (e.g. the seen member is accessed as mystruct->seen).
But "what is the rule regarding dereferencing a void pointer?" (Answer: "you can't dereference a void pointer") How do you handle this? Simple, you just declare a pointer of type wfstruct in your compare function and typecast a to (wfstruct *). Example:
wfstruct *ap = (wfstruct *)a;
Now you have a good-ole pointer to struct wfstruct (or simply pointer to wfstruct since we included the typedef for wfstruct in its declaration). You do the same thing for b and now you can pass ap->seen and bp->seen to strcmp and sort your array of struct:
int compare (const void *a, const void *b)
{
wfstruct *ap = (wfstruct *)a;
wfstruct *bp = (wfstruct *)b;
return (strcmp (ap->seen, bp->seen));
}
The call to qsort in your program is nothing more than:
/* sort words alphabetically */
qsort (words, idx, sizeof *words, compare);
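For something you can compile on its own, here is a minimal sketch of the same sort. It assumes the wfstruct layout shown above; compare_words and sort_words are hypothetical names. The pointers are kept const here, which also lets the void pointer convert without an explicit cast.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAXC 32

typedef struct {
    char seen[MAXC];    /* the word */
    size_t freq;        /* how many times it was seen */
} wfstruct;

/* qsort comparison: order two wfstruct elements by their word */
int compare_words (const void *a, const void *b)
{
    const wfstruct *ap = a;     /* void * converts implicitly in C */
    const wfstruct *bp = b;

    return strcmp (ap->seen, bp->seen);
}

/* sort the first 'n' elements of 'words' alphabetically */
void sort_words (wfstruct *words, size_t n)
{
    assert (words != NULL || n == 0);
    if (n > 1)
        qsort (words, n, sizeof *words, compare_words);
}
```

Because each element is moved as a whole struct, the frequency stays attached to its word after the sort.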
With the basics out of the way, you can now move the needed code to a function to allow you to read multiple files as arguments, keep a total of the number of words seen between files (as well as their frequency) and then sort the resulting array of structs alphabetically.
note: to keep track of the total number of words between multiple files (calls to your function), you can either return the number of words gathered for each file as the return from your read function, and keep a total that way, or you can simply pass a pointer to your total to the read function and have it updated directly in the function. We will take the second approach below.
Putting the pieces together, you get:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define MAXW 100
#define MAXC 32
typedef struct {
char seen[MAXC];
size_t freq;
} wfstruct;
int get_word_freq (wfstruct *words, size_t *idx, FILE *fp);
int compare (const void *a, const void *b);
int main (int argc, char **argv) {
/* initialize variables & open file or stdin for reading */
wfstruct words[MAXW] = {{{ 0 }, 0}};
size_t i, idx = 0;
FILE *fp = NULL;
if (argc < 2) { /* read from stdin */
get_word_freq (words, &idx, stdin);
}
else {
/* read each file given on command line */
for (i = 1; i < (size_t)argc; i++)
{ /* check 'idx' against MAXW before opening another file */
if (idx == MAXW) break;
/* open file for reading */
if (!(fp = fopen (argv[i], "r"))) {
fprintf (stderr, "error: file open failed '%s'.\n",
argv[i]);
continue;
}
get_word_freq (words, &idx, fp); /* closes fp when done */
}
}
/* sort words alphabetically */
qsort (words, idx, sizeof *words, compare);
printf ("\nthe occurrence of words are:\n\n");
for (i = 0; i < idx; i++)
printf (" %-28s : %zu\n", words[i].seen, words[i].freq);
return 0;
}
int get_word_freq (wfstruct *words, size_t *idx, FILE *fp)
{
char word[MAXC] = {0};
size_t i;
/* read 1st word into array, update index 'idx' */
if (*idx == 0) {
if (fscanf (fp, " %31[^ ,.\t\n]%*c", word) == 1) { /* width MAXC-1 leaves room for '\0' */
strcpy (words[*idx].seen, word);
words[*idx].freq++;
(*idx)++;
}
else {
fprintf (stderr, "error: file read error.\n");
return 1;
}
}
/* read each word in file */
while (fscanf (fp, " %31[^ ,.\t\n]%*c", word) == 1) { /* width MAXC-1 leaves room for '\0' */
/* check against all words in struct */
for (i = 0; i < *idx; i++) {
/* if word already 'seen', update 'words[i]. freq' count */
if (strcmp (words[i].seen, word) == 0) {
words[i].freq++;
goto skipdup; /* skip adding word to 'words[i].seen' */
}
} /* add to 'words[*idx].seen', update words[*idx].freq & '*idx' */
strcpy (words[*idx].seen, word);
words[*idx].freq++;
(*idx)++;
skipdup:
if (*idx == MAXW) { /* check 'idx' against MAXW */
fprintf (stderr, "warning: MAXW words exceeded.\n");
break;
}
}
fclose (fp);
return 0;
}
/* qsort compare function */
int compare (const void *a, const void *b)
{
wfstruct *ap = (wfstruct *)a;
wfstruct *bp = (wfstruct *)b;
return (strcmp (ap->seen, bp->seen));
}
Output
$ ./bin/file_words_occur_multi dat/words.txt dat/words.txt
the occurrence of words are:
a : 2
avoid : 2
brave : 2
brown : 2
captain : 2
dog : 4
fox : 4
in : 2
jack : 2
jumps : 4
lazy : 2
of : 2
on : 2
over : 4
path : 2
pirate : 2
quick : 2
seas : 2
seven : 2
so : 2
sparrow : 2
squirrel : 4
the : 16
to : 2
was : 2
Passing Index (idx) as Non-Pointer
As mentioned above, there are two ways to keep track of the number of unique words seen across multiple files: (1) pass the index and keep the total in main, or (2) pass a pointer to the index and update its value directly in the function. The example above passes a pointer. Since the additional syntax required to dereference and properly use the pointer value can be challenging for those new to C, here is an example of passing idx as a simple variable and keeping track of the total in main.
(note: you are required to pass the index either way, it's your choice whether you pass idx as a regular variable and work with a copy of the variable in the function, or whether you pass idx as a pointer and operate on the value directly in the function)
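Stripped of the file handling, the difference between the two choices looks roughly like this. It is a sketch only; add_by_value and add_by_pointer are hypothetical names, and nfound stands in for however many new words a call discovers.

```c
#include <assert.h>
#include <stddef.h>

/* Option 1: work on a copy of the count and hand the new total back. */
size_t add_by_value (size_t idx, size_t nfound)
{
    idx += nfound;          /* changes only the local copy */
    return idx;             /* the caller must store the result */
}

/* Option 2: receive the address of the count and update it in place. */
void add_by_pointer (size_t *idx, size_t nfound)
{
    assert (idx != NULL);
    *idx += nfound;         /* changes the caller's variable directly */
}
```

With the first form the caller writes idx = add_by_value (idx, n); and with the second add_by_pointer (&idx, n);. Forgetting the assignment in the first form silently loses the update, which is the usual stumbling block.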
Here are the simple changes to get_word_freq and the changes required in main follow (note: size_t is chosen as the type rather than int because the array index can never be negative):
size_t get_word_freq (wfstruct *words, size_t idx, FILE *fp)
{
char word[MAXC] = {0};
size_t i;
/* read 1st word into array, update index 'idx' */
if (idx == 0) {
if (fscanf (fp, " %31[^ ,.\t\n]%*c", word) == 1) { /* width MAXC-1 leaves room for '\0' */
strcpy (words[idx].seen, word);
words[idx].freq++;
idx++;
}
else {
fprintf (stderr, "error: file read error.\n");
return idx;
}
}
/* read each word in file */
while (fscanf (fp, " %31[^ ,.\t\n]%*c", word) == 1) { /* width MAXC-1 leaves room for '\0' */
/* check against all words in struct */
for (i = 0; i < idx; i++) {
/* if word already 'seen', update 'words[i]. freq' count */
if (strcmp (words[i].seen, word) == 0) {
words[i].freq++;
goto skipdup; /* skip adding word to 'words[i].seen' */
}
} /* add to 'words[idx].seen', update words[idx].freq & 'idx' */
strcpy (words[idx].seen, word);
words[idx].freq++;
idx++;
skipdup:
if (idx == MAXW) { /* check 'idx' against MAXW */
fprintf (stderr, "warning: MAXW words exceeded.\n");
break;
}
}
fclose (fp);
return idx;
}
The changes required in main:
...
if (argc < 2) { /* read from stdin */
idx = get_word_freq (words, idx, stdin);
}
else {
/* read each file given on command line */
for (i = 1; i < (size_t)argc; i++)
{ /* open file for reading */
...
/* check 'idx' against MAXW */
if ((idx = get_word_freq (words, idx, fp)) == MAXW)
break;
}
}
...
Let me know if you have further questions.
There are still many things to add to your program!
Loop over input files given on command line. A simple C way could be:
int main(int argc, char *argv[]) {
FILE *fd;
...
while (*(++argv) != NULL) { /* skip argv[0], stop at the terminating NULL */
if (strcmp(*argv, "-") == 0) { /* allow - to stand for stdin */
fd = stdin;
}
else {
fd = fopen(*argv, "r");
if (fd == NULL) {
/* process error condition */
...
}
}
/* process file */
...
if (fd != stdin) fclose(fd); /* don't forget to close */
}
return 0;
}
Split the files into words
char word[64];
int cr;
while ((cr = fscanf(fd, "%63s", word)) == 1) {
filter(word); /* optionally convert to lower case, remove punctuation... */
/* process word */
...
}
Store the words in a container and count their occurrences. At the simplest level, you can use an array with linear search, but a tree would be much better.
unsigned int maxWord = 2048, totWord = 0, nWord = 0;
typedef struct {
char *word;
int count;
} stat;
stat * st = calloc(maxWord, sizeof(stat));
and later
void add(stat *st, const char * word) {
unsigned int i;
totWord += 1;
for (i=0; i<nWord; i++) {
if (strcmp(word, st[i].word) == 0) {
st[i].count += 1;
return;
}
}
if (nWord < maxWord) {
st[nWord].word = strdup(word);
st[nWord].count += 1;
nWord += 1;
}
}
You now have to glue the above together, sort the st array (with qsort), and compute the frequency of each word as ((float) st[i].count) / totWord
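A possible sketch of the sort and frequency steps follows. The struct is renamed word_stat here (my choice, to avoid clashing with the POSIX stat name), and ordering by count is only one option; return just the strcmp result if you want purely alphabetical output.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    char *word;
    int count;
} word_stat;    /* hypothetical name, same layout as the struct above */

/* qsort comparison: highest count first, ties broken alphabetically */
int by_count_desc (const void *a, const void *b)
{
    const word_stat *x = a, *y = b;

    if (x->count != y->count)
        return (x->count < y->count) - (x->count > y->count);
    return strcmp (x->word, y->word);
}

/* share of all words read that entry 'e' accounts for, in [0, 1] */
double rel_freq (const word_stat *e, unsigned int totWord)
{
    assert (e != NULL);
    return totWord ? (double)e->count / totWord : 0.0;
}
```

The sort itself is then qsort (st, nWord, sizeof *st, by_count_desc); followed by a loop printing st[i].word with rel_freq (&st[i], totWord).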

How to constantly define and then empty a string in a loop with C?

Suppose that I need to write a program in C that takes in a name of an edible plant
and then print if it is a fruit or a vegetable. You know input will be one of three
things; apple, orange, or cabbage.
Here is the pseudocode for how I would do it:
while input != quit:
read input
if input is orange or apple(or some other fruit then print fruit)
else if input is cabbage print vegetable
The way I would do it in C is by using a while loop and storing input as
a string. Then compare input to a defined set of elements (fruit set or vegetable set) and output results. Pretty straightforward.
This way input is read and initialized to e.g. string S.
Then S is compared to the set and accordingly output is given or program quits.
But this means string S is rewritten with every new iteration. Since we don't know
the length of input string for the next iteration we need to 'clear' the entire length
of S so that a new input can be rewritten to it correctly.
But this means for each input string
of length n, the program has to 'insert' a character into string S n times and when done must
'clear' a character entered also n times before the next iteration. I am wondering
if there is a more efficient way to carry out this task.
P.S. A full explanation of a possible solution is appreciated. I kindly request refraining
from referencing a function from a standard library(e.g. string.h) or another without
indicating how it works.
This is one problem where you can really make it just as difficult as you want to. It all depends on your needs. If you have a static set of fruits and vegetables (or stuff), then you can use something as simple as a comma separated string literal to hold all values in one category or another. (the comma is just for readability)
If, on the other hand, you need to be able to insert and delete values in fruits or vegetables, then you will need a data structure that allows for easy insertion/deletion and searching. Which data structure will depend on how many and how often inserts/deletes are done. (any will work, but e.g. a bst is less efficient on insert/delete, but more efficient on search -- it's a tradeoff)
However, regardless of how simple or complex your implementation, it all boils down to a reliable way to get input and a reliable test that is flexible enough to handle what the user inputs. The following is just one example of a simple implementation with consideration given to conversion of all input to lowercase, a reasonable input routine and a simple comparison with strstr against string literals. You can compare with other answers and get a flavor for the different approaches that you can take:
# include <stdio.h>
# include <stdlib.h>
# include <string.h>
#define MAXS 255
char *str2lower (char *str); /* convert string to lowercase */
int main () {
char line[MAXS] = {0}; /* array of 255 chars for input set to 0 */
int match = 0; /* var to test scanf result */
char *fruit = "orange,apple,banana"; /* if fruits are known, a simple string is fine */
char *veg = "cabbage,lettuce"; /* same for vegetables */
int cnt = 1; /* simple counter */
while (strcmp (line, "quit") != 0) /* loop until quit or match failure */
{
if (cnt == 1) printf ("\nEnter produce (quit to end): ");
else printf (" next: ");
match = scanf ("%254[^\n]%*c", line); /* match = 1 on success (1 match), at most 254 chars + nul */
if (match != 1) /* empty line, read failure or EOF */
strcpy (line, "quit"); /* set quit to exit (could use break;) */
str2lower(line); /* convert input to lower case before test */
if (strstr (fruit, line)) /* use simple strstr test to categorize */
printf (" entry [%2d] %-10s -> fruit\n", cnt, line);
else if (strstr (veg, line))
printf (" entry [%2d] %-10s -> vegetable\n", cnt, line);
else if (strcmp (line, "quit") != 0)
printf (" entry [%2d] %-10s -> no match\n", cnt, line);
cnt++;
}
return 0;
}
char *str2lower (char *str)
{
if (!str) return NULL;
char *p = str;
for (;*p;p++)
if ('A' <= *p && *p <= 'Z')
*p += 32;
return str;
}
output:
$ ./bin/produce
Enter produce (quit to end): cherry
entry [ 1] cherry -> no match
next: apple
entry [ 2] apple -> fruit
next: lettuce
entry [ 3] lettuce -> vegetable
next: banana
entry [ 4] banana -> fruit
next: quit
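One caveat with the plain strstr test is that it matches any substring, so partial input such as "app" would also be reported as a fruit. If that matters, a whole-entry comparison against the comma separated list could look something like the sketch below (in_list is a hypothetical helper, not part of the program above).

```c
#include <assert.h>
#include <string.h>

/* Return 1 if 'item' is exactly one of the comma-separated entries
 * of 'list', otherwise 0. */
int in_list (const char *list, const char *item)
{
    size_t len;

    assert (list != NULL && item != NULL);
    len = strlen (item);
    if (len == 0)
        return 0;

    while (*list) {
        const char *end = strchr (list, ',');   /* end of current entry */
        size_t n = end ? (size_t)(end - list) : strlen (list);

        if (n == len && memcmp (list, item, len) == 0)
            return 1;                           /* same length, same bytes */
        if (!end)
            break;                              /* that was the last entry */
        list = end + 1;                         /* step past the comma */
    }
    return 0;
}
```

Here strchr returns a pointer to the first ',' at or after list (or NULL if there is none), and memcmp compares exactly len bytes, so an entry only matches when both its length and its contents agree with the input.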
Would it be cheating to store some extra information along with the plant names? Something like
#include <stdio.h>
#include <string.h>
#define PLANT_COUNT 3
int main(int argc, char**argv)
{
char line[10 + 1] = {'\0'};
const char *plants[PLANT_COUNT][2] = { {"apple", "fruit"}, {"orange", "fruit"}, {"cabbage", "vegetable"} }; /* name, category */
unsigned idx = 0;
while (fgets(line, sizeof line, stdin) != NULL)
{
/* Gets rid of the \n placed by fgets */
strtok(line, "\n");
for (idx = 0; idx < PLANT_COUNT; idx++)
{
if (strcmp(line, plants[idx][0]) == 0)
{
printf("Found %s. It is a %s\n", line, plants[idx][1]);
}
}
}
return 0;
}
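On the original worry about having to 'clear' the string between iterations: a C string ends at its first '\0', so writing a new, shorter string over an old one needs no clearing at all; whatever bytes remain after the terminator are never looked at. Below is a small sketch with hand-written helpers in place of the string.h calls; all three function names are made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hand-written equivalent of strcmp(a, b) == 0: walk both strings
 * until they differ or one of them ends. */
int str_equal (const char *a, const char *b)
{
    assert (a != NULL && b != NULL);
    while (*a && *a == *b) {
        a++;
        b++;
    }
    return *a == *b;        /* equal only if both stopped at '\0' */
}

/* Overwrite 'dst' with 'src' (at most cap-1 chars) and terminate it.
 * Nothing is cleared first: bytes after the '\0' are simply ignored. */
void set_string (char *dst, size_t cap, const char *src)
{
    size_t i = 0;

    assert (dst != NULL && src != NULL && cap > 0);
    while (i + 1 < cap && src[i]) {
        dst[i] = src[i];
        i++;
    }
    dst[i] = '\0';
}

/* Map a plant name to its category. */
const char *classify (const char *name)
{
    if (str_equal (name, "apple") || str_equal (name, "orange"))
        return "fruit";
    if (str_equal (name, "cabbage"))
        return "vegetable";
    return "unknown";
}
```

Storing "cabbage" and then "apple" in the same buffer leaves a stray 'e' after the new terminator, yet classify still sees only "apple". The cost per iteration is therefore one pass over the new input, not an insert pass plus a clear pass.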

Resources