traversing C string: get the last word of a string - c

How would you get the last word of a string, scanning from the terminating '\0' character back to the rightmost space? For example, I could have something like this, where str is assigned a string:
char str[80];
str = "my cat is yellow";
How would I get yellow?

Something like this:
char *p = strrchr(str, ' ');
if (p && *(p + 1))
printf("%s\n", p + 1);
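A slightly fuller sketch of the same idea, wrapped in a helper. The name last_word is made up for illustration. It assumes words are separated by single spaces and that a string with no space at all is itself the last word; a trailing space gives back an empty string.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: returns a pointer to the last space-delimited
   word inside str (no copy is made). */
const char *last_word(const char *str)
{
    assert(str != NULL);
    const char *p = strrchr(str, ' ');   /* rightmost space, or NULL */
    return p ? p + 1 : str;              /* no space: whole string   */
}
```

Because the result points into the original string, it stays valid only as long as str does.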

In case you don't want to use the strrchr function, here is a solution (write() is declared in <unistd.h>).
int i = 0;
char *last_word = NULL;
if (str[0] > 32) /* the string may start with a word */
    last_word = str;
while (str[i] != '\0')
{
    if (str[i] <= 32 && str[i + 1] > 32)
        last_word = &str[i + 1];
    i++;
}
i = 0;
while (last_word && last_word[i] > 32)
{
    write(1, &last_word[i], 1);
    i++;
}

I would use function strrchr()

The best way to do this is to take advantage of existing solutions. One such solution (to a much more general problem) is Perl Compatible Regular Expressions (PCRE), an open-source regular expression library for C. You can match the string "my cat is yellow" against the regular expression \b(\w+)$ (written in C as "\\b(\\w+)$", since each backslash must be escaped in a string literal) and keep the first captured group, which is "yellow".

(heavy sigh) The original code is WRONG in standard / K&R / ANSI C! It does NOT initialize the string (the character array named str)! I'd be surprised if the example compiled. What your program segment really needs is
if (strcpy(str, "my cat is yellow"))
{
/* strcpy returns str, so this branch always runs once the copy is done. */
}
or, if you promised not to try to manipulate the string, you could use a char pointer to the hard-coded "my cat is yellow" string in your source code.
If, as stated, a "word" is bounded by a space character or a NULL character, then it would be faster to declare a character pointer and walk backwards from the character just before the NULL. Obviously, you'd first have to be sure that there was a non-empty string....
#define NO_SPACE 20
#define ZERO_LENGTH -1
int iLen;
char *cPtr;
if (iLen=strlen(str) ) /* get the number of characters in the string */
{ /* there is at least one character in the string */
cPtr = (char *)(str + iLen); /* point to the NULL ending the string */
cPtr--; /* back up one character */
while (cPtr != str)
{ /* make sure there IS a space in the string
and that we don't walk too far back! */
if (' ' == *cPtr)
{ /* found a space */
/* Notice that we put the constant on the left?
That's insurance; the compiler would complain if we'd typed = instead of ==
*/
break;
}
cPtr--; /* walk back toward the beginning of the string */
}
if (cPtr != str)
{ /* found a space */
/* display the word and exit with the success code */
printf("The word is '%s'.\n", cPtr + 1);
exit (0);
}
else
{ /* oops. no space found in the string */
/* complain and exit with an error code */
fprintf(stderr, "No space found.\n");
exit (NO_SPACE);
}
}
else
{ /* zero-length string. complain and exit with an error code. */
fprintf(stderr, "Empty string.\n");
exit (ZERO_LENGTH);
}
Now you could argue that any non-alphabetic character should mark a word boundary, such as "Dogs-chase-cats" or "my cat:yellow". In that case, it'd be easy to say
if (!isalpha((unsigned char)*cPtr) )
in the loop instead of looking for just a space....
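For what it's worth, here is how that backwards walk might look as a small function that treats any non-alphabetic character as a boundary. This is only a sketch: the name last_alpha_word is invented here, and it returns a pointer into the original string, so anything after the word (trailing punctuation, say) is still attached to it.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: walk backwards from the end of str and return a
   pointer to the start of the last run of alphabetic characters.
   Returns NULL if str contains no alphabetic character at all. */
const char *last_alpha_word(const char *str)
{
    assert(str != NULL);
    const char *end = str + strlen(str);        /* points at the '\0' */

    /* skip trailing non-letters such as spaces or punctuation */
    while (end > str && !isalpha((unsigned char)end[-1]))
        end--;
    if (end == str)
        return NULL;                            /* no letters found */

    /* back up to the first letter of that run */
    const char *start = end;
    while (start > str && isalpha((unsigned char)start[-1]))
        start--;
    return start;
}
```

The casts to unsigned char keep isalpha() within the argument range the standard requires, even on platforms where plain char is signed.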

Related

Split a user-inputted string at a specific letter in C

I am trying to write an if/else statement that looks at a user's input and then splits it after index[1] if the string includes the letter b, or after index[0] if it doesn't. How would I approach that? I'm pretty new to C, so I'm not too sure.
This is what I have so far. I think I'm on the right path and am trying to figure out how to finish it off so it does what I want it to do.
int split_note_and_chord(char* string, char* note, char* chord)
{
for(user input doesnt have b in it)
{
if(i = 0; i <index; i++)
{
note[i] = string[i];
}
note[index] = 0;
else{ if(i = 0; i < index; i++)
{
note[i] = strlen(string[2]);
}
}
}
A C string is nothing but a char array.
string.h provides handy functions to check the string contents.
You can use an if condition with the strstr and strchr functions for your logic.
For example
#include <stdio.h>
#include <string.h>
int main () {
const char *input = "backwards";
char *ret;
ret = strstr(input, "b");
if( ret != NULL ) {
} else {
}
}
strstr will return NULL if the b does not exist.
You can also use strchr if you want the second argument to be a single char: strchr(input, 'b');
There are a number of ways to approach splitting your input string after the 2nd character if the input contains 'b' or after the 1st character otherwise. Since you are dealing with either a 1 or 2, all you need to do is determine if 'b' is present. The easiest way to do that is with strchr() which will search a given string for the first occurrence of a character, returning a pointer to that character if found, or NULL otherwise. See man 3 strchr
So you can use strchr to test if 'b' is present, if the return isn't NULL split the string after the 2nd character, if it is NULL, split it after the first.
A simple implementation using a ternary to set the split-after size for input read into a buffer buf would be:
char part2[MAXC]; /* buffer to hold 2nd part */
size_t split; /* number of chars to split */
/* if buf contains 'b', set split at 2, otherwise set at 1 */
split = strchr(buf, 'b') ? 2 : 1;
strcpy (part2, buf + split); /* copy part2 from buf */
buf[split] = 0; /* nul-terminate buf at split */
A quick implementation that lets you enter as many strings as you like and splits each after the 1st or 2nd character, depending on the absence or presence of 'b', would be:
#include <stdio.h>
#include <string.h>
#define MAXC 1024 /* if you need a constant, #define one (or more) */
int main (void) {
char buf[MAXC]; /* buffer to hold line of input */
fputs ("Enter a string to split (or [Enter] alone to exit)\n\n"
"string: ", stdout);
while (fgets (buf, MAXC, stdin)) { /* loop reading each line */
char part2[MAXC]; /* buffer to hold 2nd part */
size_t split; /* number of chars to split */
if (*buf == '\n') /* if [Enter] alone, exit */
break;
/* if buf contains 'b', set split at 2, otherwise set at 1 */
split = strchr(buf, 'b') ? 2 : 1;
strcpy (part2, buf + split); /* copy part2 from buf */
buf[split] = 0; /* nul-terminate buf at split */
printf (" part1: %s\n part2: %s\nstring: ", buf, part2);
}
}
(note: if you are unfamiliar with the ternary operator, it is simply (test) ? if_true : if_false. Above it is just shorthand for if (strchr (buf, 'b') != NULL) split = 2; else split = 1;)
Example Use/Output
$ ./bin/splitb
Enter a string to split (or [Enter] alone to exit)
string: look out
part1: l
part2: ook out
string: look out below
part1: lo
part2: ok out below
string:
Let me know if this is what you intended. If not, I'm happy to help further. Also, if you have any questions, just let me know.
Edit Based on Comment
It is still unclear what your list of notes are in your header file, but you can simply use a string constant to contain the letters of the notes, e.g.
#define NOTES "abcdefg" /* (that can be a string constant as well) */
(you can add upper case if needed or you can convert the input to lower -- whatever works for you)
If you simply need to find the first occurrence of one of the letters in the NOTES string, then strpbrk() will allow you to do just that returning a pointer to the first character of NOTES found in your string. (you must have some way to handle the user entering, e.g. "the note cflat", which would split on the first 'e' instead of 'c', but you will need to provide further specifics there)
Another consideration is how long note can be. If it is always 1 character, then you can simplify by just comparing against the first character in the string using strchr (NOTES, buf[0]) (which turns the way you normally think about using strchr() around: the first argument is the string NOTES and the second is the first char read from user input).
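A minimal sketch of that turned-around strchr() check, assuming note is always a single leading character. The helper name starts_with_note is made up for illustration; NOTES is the same constant as above.

```c
#include <assert.h>
#include <string.h>

#define NOTES "abcdefg"   /* assumed note letters, as defined above */

/* Hypothetical helper: 1 if the first character of s is one of NOTES,
   0 otherwise. */
int starts_with_note(const char *s)
{
    assert(s != NULL);
    /* strchr also matches the terminating '\0' of NOTES, so the
       empty string has to be ruled out explicitly */
    return s[0] != '\0' && strchr(NOTES, s[0]) != NULL;
}
```

The explicit empty-string test matters: without it, strchr (NOTES, '\0') would return a non-NULL pointer and an empty line would count as a note.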
Taking a general approach that would break "---cflat---" into "---c" and "flat---", your function could be similar to:
int split_note_and_chord (char *string, char *note, char *chord)
{
char *p = strpbrk (string, NOTES); /* pointer to first of NOTES in string */
if (p != NULL) { /* if found */
strcpy (note, string); /* copy string to note */
note[p - string + 1] = 0; /* nul-terminate after note */
strcpy (chord, p + 1); /* copy rest to chord */
return 1; /* return success */
}
*note = 0; /* make note and chord empty-string */
*chord = 0;
return 0; /* return failure */
}
(note: if no char in NOTES is found, then note and chord are made the empty-string by nul-terminating at the first character before returning zero to indicate that no note was found.)
A quick implementation similar to the first could be:
#include <stdio.h>
#include <string.h>
#define MAXC 1024 /* if you need a constant, #define one (or more) */
#define NOTES "abcdefg" /* (that can be a string constant as well) */
int split_note_and_chord (char *string, char *note, char *chord)
{
...
}
int main (void) {
char buf[MAXC], /* buffer to hold line of input */
note[MAXC], /* buffer for node */
chord[MAXC]; /* buffer for chord */
fputs ("Enter a string with node and chord (or [Enter] alone to exit)\n\n"
"string: ", stdout);
/* loop reading each line until [Enter] alone */
while (fgets (buf, MAXC, stdin) && *buf != '\n') {
if (split_note_and_chord (buf, note, chord))
printf (" note : %s\n chord : %s\n", note, chord);
else
fputs ("\nerror: note not found in string.\n\n", stderr);
fputs ("string: ", stdout);
}
return 0;
}
(note: that using fgets() will read and include the '\n' resulting from the user pressing Enter in buf and thus it will also be included in the remainder copied to chord. You can use buf[strcspn (buf, "\n")] = 0; to trim it from buf -- or from chord by substituting chord for buf in the call using strcspn() as the index to nul-terminate at.)
(also note: you can adjust MAXC to fit your needs -- which is why you declare a constant in the first place -- to make it a simple change of one line at the top of your file)
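The strcspn() trim mentioned in the note above can be wrapped in a tiny helper. This is a sketch; the name trim_newline is not from the question or from any library.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: cut s at the first '\n', if there is one. */
char *trim_newline(char *s)
{
    assert(s != NULL);
    /* strcspn gives the index of the first '\n', or the index of the
       terminating '\0' when there is none, so this is always safe */
    s[strcspn(s, "\n")] = '\0';
    return s;
}
```

Calling trim_newline (buf) right after a successful fgets() keeps the newline out of both note and chord.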
Example Use/Output
Using your function to split various input would result in the following:
$ ./bin/splitb3
Enter a string with node and chord (or [Enter] alone to exit)
string: ---cflat---
note : ---c
chord : flat---
string: asharp
note : a
chord : sharp
string: bflat
note : b
chord : flat
string: hook
error: note not found in string.
string: c
note : c
chord :
There are many, many different ways to do this, and how best to approach it will depend on how you have your notes and chords defined in your header, as well as what, if any, limitations you put on the format you require the user to enter. If you need more help, please edit your question and add the contents of your header so we will know how they are declared and defined, as well as listing any constraints you want to place on what the user can enter.

Replacing a whole word and not substrings in a string in C

I am trying to replace a whole word in a C array of characters and skip the substrings. I did some research and ended up with really complicated solutions, while I think I have a better idea if someone can give me a hand.
Let's say I have the string:
char sentence[100]= "apple tree house";
And I would like to replace tree with the number 12:
"apple 12 house"
I know that the words are delimited by spaces, so my idea is to:
1. Tokenize the string with white space as the delimiter.
2. In the while loop, check with the library function strcmp whether the search string is equal to the token and, if it is, replace it.
The problem for me comes when I try to replace the string, as I couldn't make it work.
void wordreplace(char string[], char search[], char replace[]) {
// Tokenize
char * token = strtok(string, " ");
while (token != NULL) {
if (strcmp(search, token) == 0) {
REPLACE SEARCH STRING WITH REPLACE STRING
}
token = strtok(NULL, " ");
}
printf("Sentence : %s", string);
}
Any suggestions on what I can use? I guess it might be really simple, but I am a beginner. Much appreciated :)
[EDIT]: Spaces are the only delimiters and usually the string to be replaced is not longer than the original.
I would avoid strtok in this case (because it will modify the string as a side effect of tokenizing it), and approach this by looking at the string essentially character-by-character and maintaining a "read" and "write" index. Because the output can never be longer than the input, the write index will never get ahead of the read one, and you can "write-back" and make the change within the same string.
To visualize this, I find it useful to write out the input in boxes and draw arrows to current read and write indexes and track through the process so you can verify that you have a system that will do what you want it to do and that your loops and indexes all work like you expect.
Here is one implementation that matches how my own mind tends to approach this sort of algorithm. It walks the string and looks ahead to try matching from the current character. If it finds a match, it copies the replace onto the current spot, and increments both indexes accordingly.
void wordreplace(char * string, const char * search, const char * replace) {
// This is required to be true since we're going to do the replace
// in-place:
assert(strlen(replace) <= strlen(search));
// Get ourselves set up
int r = 0, w = 0;
int str_len = strlen(string);
int search_len = strlen(search);
int replace_len = strlen(replace);
// Walk through the input character by character.
while (r < str_len) {
// Is this character the start of a matching token? It is
// if we are at the start of a word (start of string or just
// after a space) and we see the search string followed by a
// space or end of string.
if ((r == 0 || string[r-1] == ' ') &&
strncmp(&string[r], search, search_len) == 0 &&
(string[r+search_len] == ' ' || string[r+search_len] == '\0')) {
// We matched the search token. Copy the replace token.
memcpy(&string[w], replace, replace_len);
// Update our indexes.
w += replace_len;
r += search_len;
} else {
// Otherwise just copy this character.
string[w++] = string[r++];
}
}
// Be sure to terminate the final version of the string.
string[w] = '\0';
}
(Note that I tweaked your function signature to use the more idiomatic pointer notation rather than char arrays, and per flu's comment below, I marked the search and replace tokens as "const" which is a way of the function advertising that it will not modify those strings.)
To do what you want to do becomes a little more involved because you need to handle the scenarios where:
replacement is shorter than original -- so you will need to move the remainder of line to follow the replacement text to avoid leaving empty space;
replacement is same length as original -- trivial case, just overwrite original with replacement; and finally
replacement is longer than original -- where you must validate the original string plus the replacement length difference will still fit in the storage for the original string, you must copy the end of line to a temporary buffer before making the replacement, and then add the rest of the line in the temporary buffer to the end.
strtok has some disadvantages here because it makes changes to the original string during the tokenizing process. (You can just make a copy, but if you want an in-place replacement, you need to look further.) A combination of strstr and strcspn allows you to operate on the original string in a more efficient manner when looking for a specific search string within the original.
strcspn can be used like strtok with the set of delimiters to provide the length of the current token found (to ensure strstr didn't match your search term as a lesser-included-substring of a longer word, like tree in trees) Then it becomes a simple matter of looping with strstr and validating the length of the token with strcspn and then just applying one of the three cases above.
A short example implementation with comments included in-line to help you follow along could be:
#include <stdio.h>
#include <string.h>
#define MAXLIN 100
void wordreplace (char *str, const char *srch,
const char *repl, const char *delim)
{
char *p = str; /* pointer to str */
size_t lenword, /* length of word found */
lenstr = strlen (str), /* length of total string */
lensrch = strlen (srch), /* length of search word */
lenrepl = strlen (repl); /* length of replace word */
while ((p = strstr (p, srch))) { /* srch exist in rest of string? */
lenword = strcspn (p, delim); /* get length of word found */
if (lenword == lensrch) { /* word len match search len */
if (lenrepl == lensrch) /* if replace is same len */
memcpy (p, repl, lenrepl); /* just copy over */
else if (lenrepl > lensrch) { /* if replace is longer */
/* check that additional length will fit in str */
if (lenstr + lenrepl - lensrch > MAXLIN - 1) {
fputs ("error: replaced length would exceed size.\n",
stderr);
return;
}
if (!p[lenword]) { /* if no following char */
memcpy (p, repl, lenrepl); /* just copy replace */
p[lenrepl] = 0; /* and nul-terminate */
}
else { /* store rest of line in buffer, replace, add end */
char endbuf[MAXLIN]; /* temp buffer for end */
size_t lenend = strlen (p + lensrch); /* end length */
memcpy (endbuf, p + lensrch, lenend + 1); /* copy end */
memcpy (p, repl, lenrepl); /* make replacement */
memcpy (p + lenrepl, endbuf, lenend); /* add end after */
}
}
else { /* otherwise replace is shorter than search */
size_t lenend = strlen (p + lenword); /* get end length */
memcpy (p, repl, lenrepl); /* copy replace */
/* move end to after replace */
memmove (p + lenrepl, p + lenword, lenend + 1);
}
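lenstr = lenstr + lenrepl - lensrch; /* keep total length current */
p += lenrepl; /* resume search after the replacement */
}
else /* not a whole word: skip it, or strstr finds it again forever */
p += lenword ? lenword : 1;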
}
}
int main (int argc, char **argv) {
char str[MAXLIN] = "apple tree house in the elm tree";
const char *search = argc > 1 ? argv[1] : "tree",
*replace = argc > 2 ? argv[2] : "12",
*delim = " \t\n";
wordreplace (str, search, replace, delim);
printf ("str: %s\n", str);
}
Example Use/Output
Your replace "tree" with "12" example in "apple tree house in the elm tree":
$ ./bin/wordrepl_strstr_strcspn
str: apple 12 house in the elm 12
A simple same-length replacement of "tree" with "core", e.g.
$ ./bin/wordrepl_strstr_strcspn tree core
str: apple core house in the elm core
The "longer than" replacement of "tree" with "bobbing":
$ ./bin/wordrepl_strstr_strcspn tree bobbing
str: apple bobbing house in the elm bobbing
There are many different ways you can approach this problem, so no one way is the right way. The key is to make it understandable and reasonably efficient. Look things over and let me know if you have further questions.
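Whichever approach you pick, the part that decides "whole word or substring" is the boundary test on both sides of the match. Here is a rough sketch of that test on its own. The function name find_word is invented for this example, and it assumes the search word is non-empty and contains no delimiter characters.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: return a pointer to the first occurrence of
   word in str that is bounded on both sides by the start/end of str
   or by a character from delim; NULL if there is none. */
char *find_word(char *str, const char *word, const char *delim)
{
    assert(str != NULL && word != NULL && delim != NULL);
    size_t len = strlen(word);
    if (len == 0)
        return NULL;

    for (char *p = str; (p = strstr(p, word)) != NULL; p++) {
        int left_ok = (p == str) || strchr(delim, p[-1]) != NULL;
        /* strchr treats the '\0' as part of delim, so the end of the
           string counts as a boundary too */
        int right_ok = strchr(delim, p[len]) != NULL;
        if (left_ok && right_ok)
            return p;
    }
    return NULL;
}
```

With this, "apple" is not found inside "pineapple" and "tree" is not found inside "trees", while a standalone "tree" is. The p++ in the loop header is what guarantees the search always moves forward after a rejected match.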

How to check if an index contains a symbol?

I want to check to make sure that a given string contained in an array called secretWord has no symbols in it (e.g. $ % & #). If it does have a symbol in it, I make the user re-enter the string. It takes advantage of recursion to keep asking until they enter a string that does not contain a symbol.
The only symbol I do accept is the NULL symbol (the symbol represented by the ASCII value of zero). This is because I fill all the empty space in the array with NULL symbols.
My function is as follows:
void checkForSymbols(char *array, int arraysize){ //Checks for symbols in the array and if there are any it recursively calls this function until it gets input without them.
for (int i = 0; i < arraysize; i++){
if (!isdigit(array[i]) && !isalpha(array[i]) && array[i] != (char) 0){
flushArray(array, arraysize);
printf("No symbols are allowed in the word. Please try again: ");
fgets(secretWord, sizeof(secretWord) - 1, stdin);
checkForSymbols(secretWord, sizeof(secretWord));
}//end if (!isdigit(array[i]) && !isalpha(array[i]) && array[i] != 0)
else
continue;
}//end for(i = 0; i < sizeof(string[]); i++){
}//end checkForSymbols
The problem: When I enter any input (see example below), the if statement runs (it prints No symbols are allowed in the word. Please try again: and asks for new input).
I assume the problem obviously stems from the statement if (!isdigit(array[i]) && !isalpha(array[i]) && array[i] != (char) 0). But I have tried changing the (char) 0 part to '\0' and 0 as well and neither change had any effect.
How do I compare if what is in the index is a symbol, then? Why are strings without symbols setting this if statement off?
And if any of you are wondering what the "flushArray" method I used was, here it is:
void flushArray(char *array, int arraysize){ //Fills in the entire passed array with NULL characters
for (int i = 0; i < arraysize; i++){
array[i] = 0;
}
}//end flushArray
This function is called on the third line of my main() method, right after a print statement on the first line that asks users to input a word, and an fgets() statement on the second line that gets the input that this checkForSymbols function is used on.
As per request, an example would be if I input "Hello" as the secretWord string. The program then runs the function on it, and the if statement is for some reason triggered, causing it to
Replace all values stored in the secretWord array with the ASCII value of 0. (AKA NULL)
Prints No symbols are allowed in the word. Please try again: to the console.
Waits for new input that it will store in the secretWord array.
Calls the checkForSymbols() method on these new values stored in secretWord.
And no matter what you input as new secretWord, the checkForSymbols() method's if statement fires and it repeats steps 1 - 4 all over again.
Thank you for being patient and understanding with your help!
You can do something like this to find symbols in your code, put the code at proper location
#include <stdio.h>
#include <string.h>
int main () {
char invalids[] = "#.<#>";
char * temp;
temp=strchr(invalids,'s');//is s an invalid character?
if (temp!=NULL) {
printf ("Invalid character");
} else {
printf("Valid character");
}
return 0;
}
This checks whether s is a valid entry or not. Similarly, if your array of invalid characters is not null-terminated, you can do something like this:
#include <string.h>
char invalid[] = { '#', '#', '&', '$', '<' }; // note last element isn't '\0'
if (memchr(invalid, 'a', sizeof(invalid))) {
// do stuff
}
memchr is used if your array is not null terminated.
As suggested by @David C. Rankin, you can also use strpbrk, like this:
#include <stdio.h>
#include <string.h>
int main () {
const char str1[] = ",*##_$&+.!";
const char str2[] = "##"; //input string
char *ret;
ret = strpbrk(str1, str2);
if(ret) {
printf("First matching character: %c\n", *ret);
} else {
printf("Continue");
}
return(0);
}
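Since the question only wants letters and digits, the check can also be turned around into a whitelist with strspn(). This is just a sketch: ALLOWED and only_allowed are names made up here, and it assumes plain ASCII letters and digits are the only accepted characters.

```c
#include <assert.h>
#include <string.h>

/* assumed whitelist: the characters the question accepts */
#define ALLOWED "abcdefghijklmnopqrstuvwxyz" \
                "ABCDEFGHIJKLMNOPQRSTUVWXYZ" \
                "0123456789"

/* Hypothetical helper: 1 if every character of s is in ALLOWED.
   strspn returns the length of the leading run of allowed characters,
   so s is clean exactly when that run reaches the terminating '\0'. */
int only_allowed(const char *s)
{
    assert(s != NULL);
    return s[strspn(s, ALLOWED)] == '\0';
}
```

Note that a line read with fgets still carries its '\n', which is not in ALLOWED, so the newline has to be removed before calling this.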
The only symbol I do accept is the NULL symbol (the symbol represented by the ASCII value of zero). This is because I fill all the empty space in the array with NULL symbols.
NULL is a pointer; if you want a character value 0, you should use 0 or '\0'. I assume you're using memset or strncpy to ensure the trailing bytes are zero? Nope... What a shame, your MCVE could be so much shorter (and complete). :(
void checkForSymbols(char *array, int arraysize){
/* ... */
if (!isdigit(array[i]) && !isalpha(array[i]) /* ... */
As per section 7.4p1 of the C standard, ...
In all cases the argument is an int, the value of which shall be representable as an unsigned char or shall equal the value of the macro EOF. If the argument has any other value, the behavior is undefined.
Not all char values are representable as an unsigned char or equal to EOF, and so it's possible (and highly likely given the nature of this question) that the code above invokes undefined behaviour.
As you haven't completed your question (by providing an MCVE, and describing what errors are occurring) I'm assuming that the question you're trying to ask might be a duplicate of this question, this question, this question, this question and probably a whole lot of others... If so, did you try Googling the error message? That's probably the first thing you should've done. Should that fail in the future, ask a question about the error message!
As per request, an example would be if I input "Hello" as the secretWord string.
I assume secretWord is declared as char secretWord[] = "Hello"; in your example, and not char *secretWord = "Hello";. The two types are distinct, and your book should clarify that. If not, which book are you reading? I can probably recommend a better book, if you'd like.
Any attempt to modify a string literal (i.e. char *array = "Hello"; flushArray(array, ...)) is undefined behaviour, as explained by answers to this question (among many others, I'm sure).
It seems a solution to this problem might be available by using something like this...
In response to your comment, you are probably making it a bit tougher on yourself than it needs to be. You have two issues to deal with (one you are not seeing). The first being to check the input to validate only a-zA-Z0-9 are entered. (you know that). The second being you need to identify and remove the trailing '\n' read and included in your input by fgets. (that one may be tripping you up)
You don't show how the initial array is filled, but given your use of fgets on secretWord[1], I suspect you are also using fgets for array. Which is exactly what you should be using. However, you need to remove the '\n' included at the end of the buffer filled by fgets before you call checkforsymbols. Otherwise you have character 0xa (the '\n') at the end, which, of course, is not a-zA-Z0-9 and will cause your check to fail.
To remove the trailing '\n', all you need to do is check the last character in your buffer. If it is a '\n', then simply overwrite it with the nul-terminating character (either 0 or the equivalent character representation '\0' -- your choice). You simply need the length of the string (which you get with strlen from string.h) and then check if (string[len - 1] == '\n'). For example:
size_t len = strlen (str); /* get length of str */
if (len && str[len - 1] == '\n') /* check for trailing '\n' */
str[--len] = 0; /* overwrite with nul-byte */
A third issue, important, but not directly related to the comparison, is to always choose a type for your function that will return an indication of Success/Failure as needed. In your case the choice of void gives you nothing to check to determine whether there were any symbols found or not. You can choose any type you like int, char, char *, etc.. All will allow the return of a value to gauge success or failure. For testing strings, the normal choice is char *, returning a valid pointer on success or NULL on failure.
A fourth issue when taking input is that you always need to handle the case where the user chooses to cancel input by generating a manual EOF with either ctrl+d on Linux or ctrl+z on Windows. The return of NULL by fgets gives you that ability. But with it (and every other input function), you have to check the return and make use of the return information in order to validate the user input. Simply check whether fgets returns NULL on your request for input, e.g.
if (!fgets (str, MAXS, stdin)) { /* read/validate input */
fprintf (stderr, "EOF received -> user canceled input.\n");
return 1; /* change as needed */
}
For your specific case where you only want a-zA-Z0-9, all you need to do is iterate down the string the user entered, checking each character to make sure it is a-zA-Z0-9 and return failure if anything else is encountered. This is made easy given that every string in C is nul-terminated. So you simply assign a pointer to the start of your string (e.g. char *p = str;) and then use either a for or while loop to check each character, e.g.
for (; *p != 0; p++) { do stuff }
that can be written in shorthand:
for (; *p; p++) { do stuff }
or use while:
while (*p) { do stuff; p++; }
Putting all of those pieces together, you could write your function to take a string as its only parameter and return NULL if a symbol is encountered, or return a pointer to your original string on success, e.g.
char *checkforsymbols (char *s)
{
if (!s || !*s) return NULL; /* validate string and not empty */
char *p = s; /* pointer to iterate over string */
for (; *p; p++) /* for each char in s */
if ((*p < 'a' || *p > 'z') && /* char is not a-z */
(*p < 'A' || *p > 'Z') && /* char is not A-Z */
(*p < '0' || *p > '9')) { /* char is not 0-9 */
fprintf (stderr, "error: '%c' not allowed in input.\n", *p);
return NULL; /* indicate failure */
}
return s; /* indicate success */
}
A short complete test routine could be:
#include <stdio.h>
#include <string.h>
#define MAXS 256
char *checkforsymbols (char *s);
int main (void) {
char str[MAXS] = "";
size_t len = 0;
for (;;) { /* loop until str w/o symbols */
printf (" enter string: "); /* prompt for user input */
if (!fgets (str, MAXS, stdin)) { /* read/validate input */
fprintf (stderr, "EOF received -> user canceled input.\n");
return 1;
}
len = strlen (str); /* get length of str */
if (len && str[len - 1] == '\n') /* check for trailing '\n' */
str[--len] = 0; /* overwrite with nul-byte */
if (checkforsymbols (str)) /* check for symbols */
break;
}
printf (" valid str: '%s'\n", str);
return 0;
}
char *checkforsymbols (char *s)
{
if (!s || !*s) return NULL; /* validate string and not empty */
char *p = s; /* pointer to iterate over string */
for (; *p; p++) /* for each char in s */
if ((*p < 'a' || *p > 'z') && /* char is not a-z */
(*p < 'A' || *p > 'Z') && /* char is not A-Z */
(*p < '0' || *p > '9')) { /* char is not 0-9 */
fprintf (stderr, "error: '%c' not allowed in input.\n", *p);
return NULL; /* indicate failure */
}
return s; /* indicate success */
}
Example Use/Output
$ ./bin/str_chksym
enter string: mydoghas$20worthoffleas
error: '$' not allowed in input.
enter string: Baddog!
error: '!' not allowed in input.
enter string: Okheisagood10yearolddog
valid str: 'Okheisagood10yearolddog'
or if the user cancels user input:
$ ./bin/str_chksym
enter string: EOF received -> user canceled input.
footnote 1.
C generally prefers the use of all lower-case variable names, while reserving all upper-case for macros and defines. Leave MixedCase or camelCase variable names for C++ and Java. However, since this is a matter of style, this is completely up to you.

K&R - Recursive descent parser - strcat

What would be the reason for out[0] = '\0'; on the main() function?
It does seem to be working without it.
Code
#include <stdio.h>
#include <string.h>
#include <ctype.h>
#define MAXTOKEN 100
enum { NAME, PARENS, BRACKETS };
int tokentype;
char token[MAXTOKEN]; /*last token string */
char name[MAXTOKEN]; /*identifier name */
char datatype[MAXTOKEN]; /*data type = char, int, etc. */
char out[1000];
void dcl(void);
void dirdcl(void);
int gettoken(void);
/*
Grammar:
dcl: optional * direct-dcl
direct-dcl: name
(dcl)
direct-dcl()
direct-dcl[optional size]
*/
int main() /* convert declaration to words */
{
while (gettoken() != EOF) { /* 1st token on line */
/* 1. gettoken() gets the datatype from the token */
strcpy(datatype, token);
/* 2. Init out to end of the line? */
/* out[0] = '\0'; */
/* parse rest of line */
dcl();
if (tokentype != '\n')
printf("syntax error\n");
printf("%s: %s %s\n", name, out, datatype);
}
return 0;
}
int gettoken(void) /* return next token */
{
int c, getch(void);
void ungetch(int);
char *p = token;
/* Skip blank spaces and tabs */
while ((c = getch()) == ' ' || c == '\t')
;
if (c == '(') {
if ((c = getch()) == ')') {
strcpy(token, "()");
return tokentype = PARENS;
} else {
ungetch(c);
return tokentype = '(';
}
} else if (c == '[') {
for (*p++ = c; (*p++ = getch()) != ']'; )
;
*p = '\0';
return tokentype = BRACKETS;
} else if (isalpha(c)) {
/* Reads the next character of input */
for (*p++ = c; isalnum(c = getch()); ) {
*p++ = c;
}
*p = '\0';
ungetch(c); /* Get back the space, tab */
return tokentype = NAME;
} else
return tokentype = c;
}
/* dcl: parse a declarator */
void dcl(void)
{
int ns;
for (ns = 0; gettoken() == '*'; ) /* count *'s */
ns++;
dirdcl();
while (ns-- > 0)
strcat(out, " pointer to");
}
/* dirdcl: parse a direct declarator */
void dirdcl(void)
{
    int type;
    if (tokentype == '(') {
        dcl();
        if (tokentype != ')')
            printf("error: missing )\n");
    }
    else if (tokentype == NAME) { /* variable name */
        strcpy(name, token);
        printf("token: %s\n", token);
    }
    else
        printf("error: expected name or (dcl)\n");
    while ((type = gettoken()) == PARENS || type == BRACKETS) {
        if (type == PARENS)
            strcat(out, " function returning");
        else {
            strcat(out, " array");
            strcat(out, token);
            strcat(out, " of");
        }
    }
}
You need out[0] to be zero in order for strcat to work.
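This line
out[0] = '\0';
is still needed at the top of the loop whenever more than one declaration is read, because otherwise each new result is appended to the text left in out by the previous line. It is not needed just to make the first strcat call work, though, because objects with static storage duration, such as out[], are initialized to all zeros.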
According to initialization rules of C99,
...
if it has arithmetic type, it is initialized to (positive or unsigned) zero.
if it is an aggregate, every member is initialized (recursively) according to these rules.
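A minimal sketch of both effects, assuming a hypothetical helper append_phrase() that stands in for one pass of the loop in main (the name and the reset flag are not part of the original program):

```c
#include <string.h>

static char out[1000];   /* static storage duration: starts out as all zeros */

/* Hypothetical stand-in for one pass of the main loop: appends one phrase
   to out, optionally resetting out first the way out[0] = '\0' would. */
const char *append_phrase(const char *phrase, int reset)
{
    if (reset)
        out[0] = '\0';       /* make out an empty string again */
    strcat(out, phrase);     /* appends at the first '\0' found in out */
    return out;
}
```

The first call works without a reset because out starts zeroed; a later call without the reset keeps appending to whatever the previous call left behind.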
It resets the char array (the string) to an empty string, discarding any junk values.
It is like writing:
int i = 0;
before doing something like:
i += 1;
so that a junk value does not get added in.
A '\0' at index 0 tells strcat that the string is empty, so it starts appending at index 0 and overwrites whatever junk is in the rest of the array.
If the program works without the reset, that is because out has static storage duration and the language zero-initializes it, not because of anything your IDE does; it is still good practice to reset it.
In short: In this particular case it's not strictly necessary, but in many other cases that look suspiciously similar, it is, so most people do it as "good style". So why would it be necessary?
There is no such thing as "empty" memory. There is no such thing as a "length". Unless you explicitly keep track of it, or define your own.
Memory is just bytes, which are numbers from 0 to 255. Since 0 is just as valid a number as 255, there is no way to tell whether a byte is used or not. You can "add up" several bytes if you need larger numbers, but everything is built out of bytes, in the end. Text is simply mapped to a number. A couple decades ago it was decided which number represents which character. So if you see a byte with the value 32, it could be a 32. Or it could be the 32nd letter in the computer's alphabet (which is the space character).
When you receive a string and you don't know how much text you will be dealing with, what you usually do is you reserve a large block of bytes. This is what char out[1000]; above does. But how do you tell where the text ends? How much of the 1000 bytes you've already used?
Well, in the old days, some people would just declare another variable, say, int length; and keep track of how many bytes they've used so far. The designers of C went a different route. They decided to pick a very rare character and use that as a marker. They picked the character with the value 0 for that (That is not the character '0'. The character '0' actually is the 48th letter of a computer's alphabet).
So you can just look at all the bytes in your string from the start, and if a character is not 0, you know it is used. If you reach a 0 character, you know this is the end of your string. There are various advantages to either approach. An int typically uses 4 bytes, an additional 0-character only 1. On the other hand, if you use an int, a string can also contain a 0-character, it's just another character, nobody cares.
Whenever you write "foo" in C, what C actually does is reserve room for 4 bytes, for 'f', 'o', 'o' and for the 0 to indicate the end. When you write "" in C, what it does is reserve room for a single byte, the 0. So that you can tell that the string is empty.
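As a rough illustration of the last two paragraphs, here is a small sketch. The function names are made up for this example; count_until_zero() only mimics the scan that the standard strlen performs.

```c
#include <stddef.h>

/* Hypothetical re-implementation of the scan described above:
   count bytes until the terminating 0 is reached. */
size_t count_until_zero(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0')   /* compare against 0; char may be signed, so "> 0" is not safe */
        n++;
    return n;
}

/* Bytes the compiler reserves for two string literals, terminator included. */
size_t bytes_for_foo(void)   { return sizeof "foo"; }
size_t bytes_for_empty(void) { return sizeof ""; }
```

Note that the scan stops at the first 0 it meets, so a 0 in the middle of the data cuts the string short.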
So, what is memory filled with before you put something into it at startup? Well, in most cases, it is just garbage. Whatever was in that memory the last time it was used (after all, you have limited RAM, so when you quit one application on your computer, its memory can get re-used for the next app you launch after that). These will be random numbers, often outside of the range of common characters.
So, if you want strcat to see out as an empty string, you need to give it a block of memory that starts with this 0 value character. If you just leave memory like it is, there might be some random characters in it. Your buffer might contain "jbhasugaudq7e1723876123798dbkda0skno§§^^%$#-9H0HWDZmwus0/usr/local/bin"
or whatever was in that memory before. If you now appended some text to it, it would think the stuff before the first 0 (which is just randomly in this place) was a valid string, and append it to that. It will only know that this string is supposed to be empty, if you put a 0 right at the start.
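To see this without relying on real garbage (reading uninitialized memory is undefined behavior), the sketch below fills the buffer with known leftover text first. The helper name and the leftover text are assumptions for illustration only.

```c
#include <string.h>

/* Hypothetical helper: puts leftover-looking text into buf (a stand-in for
   whatever might already be in memory), then appends text with or without
   first writing the 0 at index 0. buf must hold at least 32 bytes. */
const char *append_to_used_buffer(char *buf, const char *text, int reset_first)
{
    strcpy(buf, "jbhasug");      /* simulated leftover bytes, ending in a 0 */
    if (reset_first)
        buf[0] = '\0';           /* now strcat sees an empty string */
    strcat(buf, text);
    return buf;
}
```

Without the reset the appended text lands after the leftover bytes; with it, the result is just the appended text.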
So why did I say it is "not strictly necessary"? Well, because in your case, out is a global variable, and global variables are special because they automatically get cleared to 0 when your application starts up (or assigned any value that you assign them when you declare them).
However, this is only true for variables with static storage duration: globals (with or without static) and local variables declared static. Ordinary local variables start out with indeterminate contents. So many programmers make it a habit to always initialize their blocks of bytes. That way, if someone later decides to change a global into a local variable, or copy-and-pastes the code to another spot to use with a local variable, they do not have to worry about forgetting to add this statement.
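A sketch of the local-variable case, assuming a hypothetical helper pointer_chain() that is not part of the original program. Here the initializing statement is required, because tmp is an ordinary local array:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes " pointer to" n times into dst (dstsize bytes),
   building the text in a local scratch buffer first. */
void pointer_chain(char *dst, size_t dstsize, int n)
{
    char tmp[100];           /* local array: contents are indeterminate here */
    tmp[0] = '\0';           /* required before the first strcat */
    while (n-- > 0 && strlen(tmp) + sizeof " pointer to" <= sizeof tmp)
        strcat(tmp, " pointer to");
    snprintf(dst, dstsize, "%s", tmp);   /* bounded copy to the caller's buffer */
}
```

Writing char tmp[100] = ""; at the declaration would have the same effect as the explicit assignment.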
This is especially useful as random memory often contains 0 characters. So depending on what program you previously used, you might not notice you forgot the initial 0 because there happened to be one already in there. And only later, when one of your users runs this application, they get garbage at the start of their string.
Does that clarify things a bit?

How to sscanf only the last word from a string? [duplicate]

How would you get the last word of a string, working back from the '\0' null character to the rightmost space? For example, I could have something like this where str could be assigned a string:
char str[80];
str = "my cat is yellow";
How would I get yellow?
Something like this:
char *p = strrchr(str, ' ');
if (p && *(p + 1))
printf("%s\n", p + 1);
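The same idea wrapped in a function might look like the sketch below. The name last_word() and the choice to return the whole string when there is no space are assumptions made for this example, not part of the answer above.

```c
#include <string.h>

/* Hypothetical helper: returns a pointer to the text after the last space
   in s, or s itself when s contains no space at all. */
const char *last_word(const char *s)
{
    const char *p = strrchr(s, ' ');   /* last ' ' in s, or NULL */
    return p ? p + 1 : s;
}
```

If the string ends in a space, the result is an empty string, which is the case the *(p + 1) test in the snippet above guards against.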
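In case you don't want to use the strrchr function, here is a solution.
int i = 0;
char *last_word = NULL; /* stays NULL if str contains no word */
if (str[0] > 32) /* a word may start at the very first character */
    last_word = str;
while (str[i] != '\0')
{
    /* a word starts where a blank or control character (<= 32) is followed by a printable one */
    if (str[i] <= 32 && str[i + 1] > 32)
        last_word = &str[i + 1];
    i++;
}
i = 0;
while (last_word && last_word[i] > 32)
{
    write(1, &last_word[i], 1); /* write() is POSIX, declared in <unistd.h> */
    i++;
}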
I would use the strrchr() function.
The best way to do this is to take advantage of existing solutions. One such solution (to a much more general problem) is Perl Compatible Regular Expressions (PCRE), an open-source regular expression library for C. You can match the string "my cat is yellow" against the regular expression \b(\w+)$ (written as the C string literal "\\b(\\w+)$", since each backslash must be escaped) and keep the first captured group, which is "yellow".
(heavy sigh) The original code is WRONG in standard / K&R / ANSI C! It declares the character array str and then tries to assign a string literal to it, which C does not allow for arrays, so the example will not compile. What your program segment really needs is
strcpy(str, "my cat is yellow"); /* copies the text, including the terminating '\0'; strcpy returns str, so there is nothing to test */
or, if you promised not to try to manipulate the string, you could use a char pointer to the hard-coded "my cat is yellow" string in your source code.
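A short sketch of the options, wrapped in a made-up function three_ways() so that it can be compiled and checked:

```c
#include <string.h>

/* Hypothetical check: returns 1 if the two arrays could be modified while
   the text reached through the pointer is still as written in the source. */
int three_ways(void)
{
    char copied[80];
    char initialized[80] = "my cat is yellow";   /* initializer: allowed for arrays */
    const char *literal = "my cat is yellow";    /* pointer to the literal: read-only */

    strcpy(copied, literal);                     /* copy into an existing array */
    copied[0] = 'M';                             /* arrays are writable */
    initialized[0] = 'M';
    return strcmp(copied, "My cat is yellow") == 0
        && strcmp(initialized, copied) == 0
        && strcmp(literal, "my cat is yellow") == 0;
}
```

Declaring the pointer as const char * lets the compiler reject accidental writes to the literal.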
If, as stated, a "word" is bounded by a space character or the null character ('\0'), then it would be faster to declare a character pointer and walk backwards from the character just before the terminating null. Obviously, you'd first have to be sure that there was a non-empty string....
/* needs <stdio.h>, <stdlib.h> and <string.h> */
#define NO_SPACE 20
#define ZERO_LENGTH -1
int iLen;
char *cPtr;
if ((iLen = (int)strlen(str)) != 0) /* get the number of characters in the string */
{ /* there is at least one character in the string */
    cPtr = str + iLen; /* point to the '\0' ending the string */
    cPtr--; /* back up one character */
    while (cPtr != str)
    { /* make sure there IS a space in the string
         and that we don't walk too far back! */
        if (' ' == *cPtr)
        { /* found a space */
            /* Notice that we put the constant on the left?
               That's insurance; the compiler would complain if we'd typed = instead of ==
            */
            break;
        }
        cPtr--; /* walk back toward the beginning of the string */
    }
    if (' ' == *cPtr)
    { /* found a space (testing the character, not the pointer, also catches a space at str[0]) */
        /* display the word and exit with the success code */
        printf("The word is '%s'.\n", cPtr + 1);
        exit (0);
    }
    else
    { /* oops. no space found in the string */
        /* complain and exit with an error code */
        fprintf(stderr, "No space found.\n");
        exit (NO_SPACE);
    }
}
else
{ /* zero-length string. complain and exit with an error code. */
    fprintf(stderr, "Empty string.\n");
    exit (ZERO_LENGTH);
}
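Now you could argue that any non-alphabetic character should mark a word boundary, as in "Dogs-chase-cats" or "my cat:yellow". In that case, it'd be easy to say
if (!isalpha((unsigned char)*cPtr))
in the loop instead of looking for just a space (isalpha is declared in <ctype.h>, and the cast keeps it well defined for characters with negative values)....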
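One way that variant could look as a complete function is sketched below. The name last_alpha_word(), the output-buffer interface, and the decision to skip trailing non-letters are all assumptions of this example rather than part of the answer above.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: copies the last run of alphabetic characters in s
   into dst (at most dstsize - 1 characters) and returns its length. */
size_t last_alpha_word(const char *s, char *dst, size_t dstsize)
{
    size_t end = strlen(s);
    size_t start, len;

    if (dstsize == 0)
        return 0;
    /* skip trailing non-letters such as a final '.' or ' ' */
    while (end > 0 && !isalpha((unsigned char)s[end - 1]))
        end--;
    /* walk back over the letters of the last word */
    start = end;
    while (start > 0 && isalpha((unsigned char)s[start - 1]))
        start--;
    len = end - start;
    if (len >= dstsize)
        len = dstsize - 1;       /* truncate to fit the destination */
    memcpy(dst, s + start, len);
    dst[len] = '\0';
    return len;
}
```

Using indices instead of a pointer that walks below the start of the string avoids any question about stepping before str.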

Resources