Tokenizing a string - c

I am writing a C program that parses a string and tokenizes it by breaking it into words that are separated by white space. My question concerns what happens when I run my current program:
#include <stdio.h>
#include <stdlib.h>
int main()
{
char input[20];
printf("Please enter your word:\n");
scanf("%c", &input);
printf("%c", input[1]);
return 0;
}
If I enter the word "This", I would expect to get back "h" when I run the program, but instead I get a downwards pointing arrow. However, when the program is set to print out input[0], I get back a "T".
Edit: I have modified my code so that it now prints out the whole string, as shown below:
int main()
{
char input[20];
printf("Please enter your words:\n");
scanf("%s", input);
printf("%s", input);
return 0;
}
My goal is to break that string into chars that I can search through to find whitespace, so that I can isolate the words. For example, if my input was "This is bad", I'd like the code to print out
This
is
bad
Edit:
I have modified my code to fit one of these answers, but the problem I run into now is that it won't compile:
int main()
{
char input[20];
printf("Please enter your words:\n");
size_t offset = 0;
do
{
scanf("%c", input + offset);
offset++;
}
while(offset < sizeof(input) && input[offset - 1] != '\n');
}
printf("%c", input[]);
return 0;

Problems:
1) scanf("%c", input); only sets the first element of the array input.
2) printf("%c", input[1]); prints the second element of the array input, which is still uninitialized.
Solution:
Small state machine. No limit on string size like 20.
#include <ctype.h>
#include <stdio.h>
int main() {
int ch = fgetc(stdin);
while (ch != EOF) {
while (isspace(ch)) {
// If only 1 line of input allowed, then add
if (ch == '\n') return 0;
ch = fgetc(stdin);
}
if (ch != EOF) {
do {
fputc(ch, stdout);
ch = fgetc(stdin);
} while (ch != EOF && !isspace(ch));
fputc('\n', stdout);
}
}
return 0;
}

scanf("%c", &input); does not do what you think it does.
First of all, %c scans only a single character: http://www.cplusplus.com/reference/cstdio/scanf/
Second, an array's name already converts to a pointer to its first element in most expressions, so input alone is what %c expects. Writing &input instead gives a pointer to the whole array (type char (*)[20]). It holds the same address, so the character usually still lands in input[0], but the type does not match what %c requires, which is technically undefined behavior.
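To make the relationship between input and &input concrete, here is a minimal sketch in standard C. The function names are made up for illustration. It shows that both expressions yield the same address but point to objects of different sizes, which is why their types differ.

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if input and &input hold the same address. */
int same_address(void)
{
    char input[20];
    return (void *)input == (void *)&input;
}

/* Size of the object that input (converted to char *) points to: one char. */
size_t pointee_size_of_input(void)
{
    char input[20];
    return sizeof *input;
}

/* Size of the object that &input (a char (*)[20]) points to: the whole array. */
size_t pointee_size_of_address(void)
{
    char input[20];
    return sizeof *&input;
}
```

The first function returns 1, while the two sizes are 1 and 20 respectively, so pointer arithmetic on &input would step over the entire array rather than a single character.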
If you really want to use scanf, I recommend a loop:
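size_t offset = 0;
do
{
if (scanf("%c", input + offset) != 1)
break; /* stop on end of input or read error */
offset++;
}
while(offset < sizeof(input) - 1 && input[offset - 1] != '\n');
input[offset] = '\0'; /* terminate so input can be used as a string */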
Using scanf("%s", input) leaves you vulnerable to buffer overflow attacks if the word is longer than 19 characters, since the 20-byte array must also hold the terminating null character: http://en.wikipedia.org/wiki/Buffer_overflow
In my example I assumed that you want to finish your word with a newline character.
EDIT: In scanf documentation is also a good example:
scanf("%19s", input);
It scans no more than 19 characters, which also prevents buffer overflow. But if you want to change the size of input, you have to change it in two places.
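One common way to keep the field width and the array size in sync is to build the format string from a macro with the preprocessor's stringification operator. The sketch below is only an illustration: INPUT_MAX, STR and read_word are names invented here, and it uses sscanf() on a string so the effect can be checked without typing input.

```c
#include <stdio.h>

#define INPUT_MAX 19            /* longest word accepted (assumed value) */
#define STR_(x) #x
#define STR(x) STR_(x)          /* expands INPUT_MAX before stringifying it */

/* Hypothetical helper: reads one whitespace-delimited word from src into
   dest, which must have room for INPUT_MAX + 1 bytes.
   Returns 1 on success, otherwise 0 or EOF. */
int read_word(const char *src, char *dest)
{
    /* The format string becomes "%19s" after macro expansion. */
    return sscanf(src, "%" STR(INPUT_MAX) "s", dest);
}
```

With the buffer declared as char input[INPUT_MAX + 1];, the same format works with scanf("%" STR(INPUT_MAX) "s", input), and changing INPUT_MAX updates both the array size and the width.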

You can use
char * strtok ( char * str, const char * delimiters );
to tokenize your string. If you have your input in the input[] array and want to tokenize the string according to the whitespace character, you can do the following:
char *ptr;
ptr = strtok(input, " ");
while(ptr != NULL) {
printf("%s\n", ptr);
ptr = strtok(NULL, " ");
}
Only the first call to strtok() requires the character array as input. Specifying NULL in the next calls means that it will operate on the same character array.
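As a slightly fuller sketch of the same idea, the helper below collects the words into an array instead of printing them. The name split_words and its parameters are hypothetical. It assumes the whole line was read with fgets(), which keeps the trailing newline, so '\n' and '\t' are included in the delimiter set.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: splits line in place on spaces, tabs and newlines.
   Stores up to max pointers to the words in words[] and returns how many
   were stored. strtok() overwrites the delimiters in line with '\0', so
   line must be a modifiable array, not a string literal. */
size_t split_words(char *line, char *words[], size_t max)
{
    size_t count = 0;
    char *ptr = strtok(line, " \t\n");

    while (ptr != NULL && count < max) {
        words[count++] = ptr;
        ptr = strtok(NULL, " \t\n");
    }
    return count;
}
```

A caller would typically fill a buffer with fgets(buffer, sizeof buffer, stdin), pass it to split_words(), and then print each words[i] on its own line. Note that scanf("%s", ...) is not suitable for reading the line here, because %s stops at the first whitespace character.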

Your scanf only picks up the first character, so input[1] contains random garbage. Use scanf("%19s", input) instead.

Related

Is there an elegant way to handle the '\n' that gets read by input functions (getchar(), fgets(), scanf()) in C?

I am trying a simple exercise from K&R to append string2 at the end of string1 using pointers. In case of overflow i.e. buffer of string1 can't contain all of string2 I want to prompt the user to re-enter string2 or exit.
I have written the following code:
#include<stdio.h>
#include<string.h>
#define MAXLINE 1000
int get_input(char *s);
int str_cat(char *s, char *t);
void main()
{
char input1[MAXLINE], input2[MAXLINE], c;
get_input(input1);
check:
get_input(input2);
if((strlen(input1) + strlen(input2) + 2) <= MAXLINE)
{
str_cat(input1, input2);
printf("%s\n", input1);
}
else
{
input2[0] = '\0';
printf("String overflow\n Press: \n 1: Re-enter string. \n 2: Exit.\n");
scanf(" %d", &c);
if(c == 1){
input2[0] = '\0';
get_input(input2);
goto check;
}
}
}
int get_input(char *arr)
{
int c;
printf("Enter the string: \n");
while(fgets(arr, MAXLINE, stdin))
{
break;
}
}
int str_cat(char *s, char *t)
{
while(*s != '\0')
{
s++;
}
while((*s++ = *t++) != '\0')
{
;
}
*s = '\0';
}
Initially, I was using the standard getchar() function mentioned in the book to read the input in get_input() which looked like this:
int get_input(char *arr)
{
int c;
printf("Enter the string: \n");
while((c = getchar()) != '\n' && c != EOF)
{
*arr++ = c;
}
*arr = '\0';
}
I am new and I read this and understood my mistake. I understand that one isn't supposed to use different input functions to read stdin and the '\n' is left in the input stream which is picked by the getchar() causing my condition to fail.
So, I decided to use fgets() to read the input and replaced scanf("%d", &c) with scanf(" %d", &c), as mentioned in the thread. This does work (kinda) but gives rise to behaviors that I do not want.
So, I have a few questions:
What's a better way to stop fgets() from reading the input on encountering '\n' than the one I have used?
while(fgets(arr, MAXLINE, stdin))
{
break;
}
fgets() stops reading the line and stores it as an input once it either encounters a '\n' or EOF. But, it ends up storing the '\n' at the end of the string. Is there a way to prevent this or do I have to over-write the '\n' manually?
Even though I used the modified version of scanf(" %d", &c), my output looks like this: (https://imgur.com/a/RaC2Kc6). I get Enter the string: twice when prompted to re-enter the second string in case of an overflow situation. Is the modified scanf() messing with my input? And how do I correct it?
In general, do not mix fgets with scanf. Although it may be a bit bloaty, you will avoid many problems by being consistent with reading input with fgets and then parse it with sscanf. (Note the extra s)
A good way to remove the newline is buffer[strcspn(buffer, "\n")] = 0
Example:
// Read line and handle error if it occurs
if(!fgets(buffer, buffersize, stdin)) {
// Handle error
}
// Remove newline (if you want, not necessarily something you need)
buffer[strcspn(buffer, "\n")] = 0;
// Parse and handle error
int val;
if(sscanf(buffer, "%d", &val) != 1) {
// Handle error
}
// Now you can use the variable val
There is one thing here that might be dangerous in certain situations, and that is if buffer is not big enough to hold a complete line. fgets will not read more than buffersize - 1 characters, since it reserves one byte for the terminating null character. If the line is longer, the remaining part will be left in stdin.
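One way to deal with an over-long line is to detect that no newline was stored and then discard the rest of the line. The sketch below is an assumption about how that could look, not part of the original answer; read_line and its return codes are invented for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads one line from fp into buf (size bytes).
   Returns  1 if the whole line fit (the newline is removed),
            0 if the line was too long (the excess is read and discarded),
           -1 on end-of-file or read error with nothing stored. */
int read_line(FILE *fp, char *buf, int size)
{
    if (fgets(buf, size, fp) == NULL)
        return -1;

    size_t len = strcspn(buf, "\n");
    if (buf[len] == '\n') {
        buf[len] = '\0';          /* complete line: strip the newline */
        return 1;
    }

    /* No newline stored: either the line was truncated or the input ended
       without a final newline. Drain up to the next newline or EOF. */
    int c;
    int discarded = 0;
    while ((c = fgetc(fp)) != '\n' && c != EOF)
        discarded = 1;

    return discarded ? 0 : 1;
}
```

Calling read_line(stdin, buffer, sizeof buffer) before sscanf() lets the program reject or re-prompt on truncated input instead of silently parsing the leftover characters on the next read.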

Issues with scanf() and accepting user input

I am trying to take in user input with spaces and store it in an array of characters.
After, I want to take in a single character value and store it as a char.
However, when I run my code, the prompt for the character gets ignored and a space is populated instead. How can I take in an array of chars and still be allowed to prompt for a single character after?
void main()
{
char userIn[30];
char findChar;
printf("Please enter a string: ");
scanf("%[^\n]s", userIn);
printf("Please enter a character to search for: ");
scanf("%c", &findChar);
//this was put here to see why my single char wasnt working in a function I had
printf("%c", findChar);
}
scanf("%c", &findChar); reads the next character pending in the input stream. This character will be the newline entered by the user that stopped the previous conversion, so findChar will be set to the value '\n', without waiting for any user input and printf will output this newline without any other visible effect.
Modify the call as scanf(" %c", &findChar) to ignore pending white space and get the next character from the user, or more reliably write a loop to read and ignore the rest of the input line.
Note also that scanf("%[^\n]s", userIn); is incorrect:
scanf() may store bytes beyond the end of userIn if the user types more than 29 bytes of input.
the s after the ] is a bug, the conversion format for character classes is not a variation of the %s conversion.
Other problems:
void is not a proper type for the return value of the main() function.
the <stdio.h> header is required for this code.
Here is a modified version:
#include <stdio.h>
int main() {
char userIn[30];
int c;
char findChar;
int i, found;
printf("Please enter a string: ");
if (scanf("%29[^\n]", userIn) != 1) {
fprintf(stderr, "Input failure\n");
return 1;
}
/* read and ignore the rest of input line */
while ((c = getchar()) != EOF && c != '\n')
continue;
printf("Please enter a character to search for: ");
if (scanf("%c", &findChar) != 1) {
fprintf(stderr, "Input failure\n");
return 1;
}
printf("Searching for '%c'\n", findChar);
found = 0;
for (i = 0; userIn[i] != '\0'; i++) {
if (userIn[i] == findChar) {
found++;
printf("found '%c' at offset %d\n", findChar, i);
}
}
if (!found) {
printf("character '%c' not found\n", findChar);
}
return 0;
}
scanf("%[^\n]s", userIn); is a bit weird. The s is guaranteed not to match, since that character will always be \n. Also, you should use a width modifier to avoid a buffer overflow. Use scanf("%29[^\n]", userIn); That alone will not solve the problem, since the next scanf is going to consume the newline. There are a few options. You could consume the newline in the first scanf with:
scanf("%29[^\n]%*c", userIn);
or discard all whitespace in the next call with
scanf(" %c", &findChar);
The behavior will differ on lines of input that exceed 29 characters in length or when the user attempts to assign whitespace to findChar, so which solution you use will depend on how you want to handle those situations.
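The difference between the two options can be reproduced without a keyboard by running the same formats through sscanf() on a fixed string. This is only a sketch; the function names are hypothetical, and the two scanf() calls are merged into one format string for brevity.

```c
#include <stdio.h>

/* Option 1: %*c swallows exactly one character after the line (normally
   the newline), so the following %c can still read a space. */
int parse_keep_space(const char *src, char line[30], char *ch)
{
    return sscanf(src, "%29[^\n]%*c%c", line, ch);
}

/* Option 2: the blank before %c skips any run of whitespace, so ch can
   never end up holding a space, tab or newline. */
int parse_skip_space(const char *src, char line[30], char *ch)
{
    return sscanf(src, "%29[^\n] %c", line, ch);
}
```

Given the input "hello world\n x", both functions store "hello world" in line and return 2, but the first stores a space in ch while the second stores 'x'.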

Function fgets skips user input?

When I use the function fgets, the program skips the user input, affecting the rest of the program. An example program with this effect is:
#include <stdio.h>
int main() {
char firstDigit[2];
char secondDigit[2];
printf("Enter your first digit: ");
fgets(firstDigit, 1, stdin);
printf("\nEnter your second digit: ");
fgets(secondDigit, 1, stdin);
printf("\n\nYour first digit is %s and your second digit is %s.\n", firstDigit, secondDigit);
}
I then thought that maybe the problem was that fgets might be writing the newline, so I changed the code to account for that:
#include <stdio.h>
int main() {
char firstDigit[3];
char secondDigit[3];
printf("Enter your first digit: ");
fgets(firstDigit, 2, stdin);
printf("\nEnter your second digit: ");
fgets(secondDigit, 2, stdin);
printf("\n\nYour first digit is %c and your second digit is %c.\n", firstDigit[0], secondDigit[0]);
}
This time, the first input works properly, but the second input is skipped.
What am I doing incorrectly?
char firstDigit[2] and char secondDigit[2] are not large enough to hold a digit, a newline character, and a null-terminator:
char firstDigit[3];
char secondDigit[3];
Then, the calls to fgets() need to specify the size of the buffer arrays:
fgets(firstDigit, sizeof firstDigit, stdin);
/* ... */
fgets(secondDigit, sizeof secondDigit, stdin);
When instead fgets(firstDigit, 2, stdin); is used, fgets() stores at most two characters, including the \0 character, in firstDigit[]. This means that the \n character is still in the input stream, and this interferes with the second call to fgets().
In answer to OP's comment, How would you remove the unread characters from the input stream?, a good start would be to use more generous allocations for firstDigit[] and secondDigit[]. For example, char firstDigit[100], or even char firstDigit[1000] will be large enough that any expected input will be taken in by fgets(), leaving no characters behind in the input stream. To be more certain that the input stream is empty, a portable solution is to use the idiomatic loop:
int c;
while ((c = getchar()) != '\n' && c != EOF) {
continue;
}
Note here that it is necessary to check for EOF, since getchar() may return this value if the user signals end-of-file from the keyboard, or if stdin has been redirected, or in the unlikely event of an input error. But also note that this loop should only be used if there is at least a \n character still in the input stream. Before attempting to clear the input stream with this method, the input buffer should be checked for a newline; if it is present in the buffer, the input stream is empty and the loop should not be executed. In the code below, strchr() is used to check for the newline character. This function returns a null pointer if the sought-for character is not found in the input string.
#include <stdio.h>
#include <string.h> // for strchr()
int main(void)
{
char firstDigit[3]; // more generous allocations would also be good
char secondDigit[3]; // e.g., char firstDigit[1000];
printf("Enter your first digit: ");
fgets(firstDigit, sizeof firstDigit, stdin);
/* Clear input stream if not empty */
if (strchr(firstDigit, '\n') == NULL) {
int c;
while ((c = getchar()) != '\n' && c != EOF) {
continue;
}
}
putchar('\n');
printf("Enter your second digit: ");
fgets(secondDigit, sizeof secondDigit, stdin);
/* Clear input stream if not empty */
if (strchr(secondDigit, '\n') == NULL) {
int c;
while ((c = getchar()) != '\n' && c != EOF) {
continue;
}
}
puts("\n");
printf("Your first digit is %c and your second digit is %c.\n",
firstDigit[0], secondDigit[0]);
return 0;
}
It may be even better to use a single buffer[] to store lines of input, and then to store individual characters in chars. You could also write a function to clear the input stream, instead of rewriting the same loop each time it is needed:
#include <stdio.h>
#include <string.h> // for strchr()
void clear_stdin(void);
int main(void)
{
char buffer[1000];
char firstDigit;
char secondDigit;
printf("Enter your first digit: ");
fgets(buffer, sizeof buffer, stdin);
firstDigit = buffer[0];
/* Clear input stream if not empty */
if (strchr(buffer, '\n') == NULL) {
clear_stdin();
}
putchar('\n');
printf("Enter your second digit: ");
fgets(buffer, sizeof buffer, stdin);
secondDigit = buffer[0];
/* Clear input stream if not empty */
if (strchr(buffer, '\n') == NULL) {
clear_stdin();
}
puts("\n");
printf("Your first digit is %c and your second digit is %c.\n",
firstDigit, secondDigit);
return 0;
}
void clear_stdin(void)
{
int c;
while ((c = getchar()) != '\n' && c != EOF) {
continue;
}
}
For the first case, fgets(firstDigit, 1, stdin); cannot read anything from the input because the buffer has a size of only 1 byte, and fgets() must store a null terminator into the destination.
For the second case: fgets(firstDigit, 2, stdin); reads 1 byte from stdin, the digit that you typed, and cannot read the newline because the destination array is already full, allowing for the null terminator. The second fgets() reads the pending newline from the first entry and returns immediately for the same reason, not letting you type the second input.
You must allow fgets() to read at least 2 bytes by providing a buffer size of at least 3:
#include <stdio.h>
int main(void) {
char firstDigit[3];
char secondDigit[3];
printf("Enter your first digit: ");
if (!fgets(firstDigit, sizeof firstDigit, stdin))
return 1;
printf("\nEnter your second digit: ");
if (!fgets(secondDigit, sizeof secondDigit, stdin))
return 1;
printf("\n\nYour first digit is %s and your second digit is %s.\n",
firstDigit, secondDigit);
return 0;
}
Note that if you type more than a single character before the enter key, the program will still behave in an unexpected way.
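The skipped second prompt can be reproduced on any stream. The sketch below is an illustration with an invented function name: it calls fgets() twice with a size of 2 on a FILE * supplied by the caller, so the behavior can be observed with a file as well as with stdin.

```c
#include <stdio.h>

/* Hypothetical demo: calls fgets() twice with a size of 2, so each call can
   store at most one character plus the terminating '\0'.
   Returns 1 if both calls succeeded, 0 otherwise. */
int read_two_small(FILE *fp, char first[2], char second[2])
{
    if (fgets(first, 2, fp) == NULL)
        return 0;
    if (fgets(second, 2, fp) == NULL)
        return 0;
    return 1;
}
```

If the stream contains "7\n8\n", first receives "7" and second receives the leftover "\n" rather than "8", which is exactly why the second prompt appears to be skipped.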
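This is a buffer problem. When you press Enter, the newline character is stored in the stdin buffer along with the digit.
Calling fflush(stdin) after fgets(...) is sometimes suggested, but fflush() is only defined for output streams; on an input stream the behavior is undefined, so it is not portable even where it appears to work.
A portable approach is to read and discard the rest of the line (strchr() requires <string.h>):
printf("Enter your first digit: ");
fgets(firstDigit, sizeof firstDigit, stdin);
if (strchr(firstDigit, '\n') == NULL) {
int c;
while ((c = getchar()) != '\n' && c != EOF)
continue;
}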

Replace the usage of gets with getchar

I currently have a homework assignment and I used gets.
The professor said I should be using getchar instead.
What is the difference?
How would I change my code to use getchar? I can't seem to get it right.
code:
#include <stdio.h>
#include <string.h>
#include <strings.h>
#define STORAGE 255
int main() {
int c;
char s[STORAGE];
for(;;) {
(void) printf("n=%d, s=[%s]\n", c = getword(s), s);
if (c == -1) break;
}
}
int getword(char *w) {
char str[255];
int i = 0;
int charCount = 0;
printf("enter your sentence:\n"); //user input
gets(str);
for(i = 0; str[i] != '\0' && str[i] !=EOF; i++){
if(str[i] != ' '){
charCount++;
} else {
str[i] = '\0'; //Terminate str
i = -1; //idk what this is even doing?
break; //Break out of the for-loop
}
}
printf("your string: '%s' contains %d of letters\n", str, charCount); //output
strcpy(w, str);
// return charCount;
return strlen(w); //not sure what i should be returning.... they both work
}
gets() was supposed to get a string from the input and store it into the supplied argument. However, due to lack of preliminary validation on the input length, it is vulnerable to buffer overflow.
A better choice is fgets().
However, coming to the usage of getchar() part, it reads one char at a time. So basically, you have to keep reading from the standard input one by one, using a loop, until you reach a newline (or EOF) which marks the end of expected input.
As you read a character (with optional validation), you can keep on storing them in str so that, when the input loop ends, you have the input string ready in str.
Don't forget to null terminate str, just in case.
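The loop described above might look like the following sketch. The name read_line_chars is hypothetical, and it takes a FILE * and uses getc() so that it can be exercised on any stream; getchar() is equivalent to getc(stdin), so passing stdin gives the getchar() behavior the assignment asks for.

```c
#include <stdio.h>

/* Hypothetical replacement for gets(): reads one line from in, one character
   at a time, storing at most size - 1 characters in str and always
   null-terminating it. Extra characters on a long line are read and dropped.
   Returns the number of characters stored, or -1 if end-of-file (or an
   error) occurs before any character is read. size must be at least 1. */
int read_line_chars(FILE *in, char *str, int size)
{
    int c;              /* int, not char, so that EOF can be detected */
    int len = 0;
    int got_any = 0;

    while ((c = getc(in)) != EOF) {
        got_any = 1;
        if (c == '\n')
            break;                  /* end of the line: stop reading */
        if (len < size - 1)
            str[len++] = (char)c;   /* store only while there is room */
    }
    str[len] = '\0';
    return got_any ? len : -1;
}
```

Unlike gets(), this version cannot write past the end of str, because the size of the destination is passed in and checked before every store.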

C: Clearing STDIN

Basically, in Code::Blocks for Windows I have "fflush(stdin);" before each printf, which works. When I copied my code to Linux, it stopped working, and neither do any of the alternatives to "fflush(stdin);" that I've found. No matter which way I do it, either the input buffer isn't being cleared or something in my code is incorrect.
#include <stdio.h>
#include <math.h>
#include <limits.h>
#include <ctype.h>
int main()
{
char pbuffer[10], qbuffer[10], kbuffer[10];
int p=0, q=0, k=0;
int r, i, Q, count, sum;
char a[3];
a[0]='y';
while(a[0]=='y' || a[0]=='Y')
{
printf("Enter a p value: \n");
fgets(pbuffer, sizeof(pbuffer), stdin);
p = strtol(pbuffer, (char **)NULL, 10);
printf("Enter a q value: \n");
fgets(qbuffer, sizeof(qbuffer), stdin);
q = strtol(qbuffer, (char **)NULL, 10);
printf("Enter a k value: \n");
fgets(kbuffer, sizeof(kbuffer), stdin);
k = strtol(kbuffer, (char **)NULL, 10);
while(p<q+1)
{
Q=p;
sum=0;
count=0;
while(Q>0)
{
count++;
r = Q%10;
sum = sum + pow(r,k);
Q = Q/10;
}
if ( p == sum && i>1 && count==k )
{
printf("%d\n",p);
}
p++;
a[0]='z';
}
while((a[0]!='y') && (a[0]='Y') && (a[0]!='n') && (a[0]!='N'))
{
printf("Would you like to run again? (y/n) ");
fgets(a, sizeof(a), stdin);
}
}
return 0;
}
Calling fflush(stdin) is not standard: the C standard defines fflush() only for output streams, so the behavior on stdin is undefined.
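Rather than calling fflush on stdin, you could call scanf, passing format strings instructing the function to read everything up to and including the newline '\n' character, like this:
scanf("%*[^\n]");
scanf("%*1[\n]");
The asterisk tells scanf to ignore the result. Two calls are used because scanf stops at the first directive that fails to match: on an empty line %*[^\n] matches nothing, and a single combined format would then leave the newline unread.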
Another problem is calling scanf to read a character into variable a with the format specifier of " %s": when the user enters a non-empty string, null terminator creates buffer overrun, causing undefined behavior (char a is a buffer of one character; string "y" has two characters - {'y', '\0'}, with the second character written past the end of the buffer). You should change a to a buffer that has several characters, and pass that limit to scanf:
char a[2];
do {
printf("Would you like to run again? (y/n) \n");
scanf("%1s", a);
} while(a[0] !='y' && a[0] !='Y' && a[0]!='n' && a[0]!='N' );
I think what you are trying to do is more difficult than it seems.
My interpretation of what you are trying to do is disable type ahead so that if the user types some characters while your program is processing other stuff, they don't appear at the prompt. This is actually quite difficult to do because it is an OS level function.
You could do a non blocking read on the device before printing the prompt until you get EWOULDBLOCK in errno. Or the tcsetattr function family might help. It looks like there is a way to drain input for a file descriptor in there, but it might interact badly with fgets/fscanf
A better idea is not to worry about it at all. Unix users are used to having type ahead and what you want would be unexpected behaviour for them.
Drop the need for flushing the input buffer.
OP is on the right track using fgets() rather than scanf() for input, OP should continue that approach with:
char a = 0; // Initialize so the first loop test is well defined
while(a !='y' && a !='Y' && a!='n' && a!='N' ) {
printf("Would you like to run again? (y/n) \n");
if (fgets(kbuffer, sizeof(kbuffer), stdin) == NULL)
Handle_EOForIOerror();
int cnt = sscanf(kbuffer, " %c", &a); // Use %c, not %s
if (cnt != 1)
continue; // Only white-space entered
}
Best to not use scanf() as it tries to handle user IO and parsing in one shot and does neither that well.
Certainly, part of OP's present woes stem from calling fgets() after scanf(" %s", &a); (which is UB, as it should be scanf(" %c", &a);). Mixing scanf() with fgets() typically has the problem that scanf(" %c", &a); leaves the Enter or '\n' in the input buffer, obliging the code to flush the input buffer before the next fgets(). Otherwise that fgets() gets the stale '\n' and not a new line of info.
By only using fgets() for user IO, the need for flushing is eliminated.
Sample fgets() wrapper
// Requires <stdio.h> and <string.h>
char *prompt_fgets(const char *prompt, char *dest, int size) {
fputs(prompt, stdout);
char *retval = fgets(dest, size, stdin);
if (retval != NULL) {
size_t len = strlen(dest);
if (len > 0 && dest[len-1] == '\n') { // Consume trailing \n
dest[--len] = '\0';
}
else if (len + 1 == (size_t)size) { // Buffer full: consume extra chars
int ch;
do {
ch = fgetc(stdin);
} while (ch != '\n' && ch != EOF);
}
}
return retval;
}
