Ignoring whitespace while reading input files in C

I'm trying to write code that registers the first word of every line as a command, but I want to be able to read the word regardless of whether there are spaces in front of it. I'm currently using fgets() and comparing the first x characters of each line with strncmp(), but that doesn't work for an arbitrary amount of whitespace. I have tried using sscanf() inside the fgets() loop to store the first word of each line in a variable, but it seems to skip lines and read them incorrectly. I would rather not post the code, as it is quite lengthy, but it is basically this:
while( fgets(Line, BUFFER, input) != NULL )
{
if(strncmp(Line, "Word", 4) == 0) { // strncmp() returns 0 on a match
//DO SOMETHING
}
}
There are many strncmps and I would like for each of them to ignore an arbitrary amount of preceding spaces.

You can use isspace to skip over whitespace:
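#include <ctype.h>
#include <string.h>
while( fgets(Line, BUFFER, input) != NULL )
{
char *p = Line;
while (isspace((unsigned char)*p)) // skip whitespace (isspace() needs a non-negative value, hence the cast)
p++;
if(strncmp(p, "Word", 4) == 0) { // strncmp() returns 0 when the prefixes match
//DO SOMETHING
}
}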
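Because the question also mentions sscanf(), here is a sketch of that route, assuming the goal is an exact match on the first word. The "%s" conversion already skips leading whitespace, so sscanf() can pull out the first word directly. The names first_word() and count_command(), the 32-byte word limit and the BUFFER value are assumptions made for this illustration.

```c
#include <stdio.h>
#include <string.h>

#define BUFFER 256  /* assumed line-buffer size, standing in for the question's BUFFER */

/* Hypothetical helper: copy the first whitespace-delimited word of line
   into word (at most 31 characters plus '\0'). "%31s" skips leading
   spaces, tabs and newlines by itself. Returns 1 if a word was found,
   0 for a blank line. */
int first_word(const char *line, char word[32])
{
    return sscanf(line, "%31s", word) == 1;
}

/* Hypothetical driver: count the lines of in whose first word is exactly
   cmd. A real program would dispatch to the command's handler here. */
int count_command(FILE *in, const char *cmd)
{
    char line[BUFFER], word[32];
    int count = 0;

    while (fgets(line, sizeof line, in) != NULL) {
        if (first_word(line, word) && strcmp(word, cmd) == 0)
            count++;
    }
    return count;
}
```

Unlike strncmp(p, "Word", 4), the strcmp() comparison does not also accept "Wordy". Lines longer than BUFFER - 1 characters are split by fgets(), so their remainder would be treated as a new line.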

Related

Usage of scanf ... getchar

Is the following pattern ok in C to get a string up until a newline?
int n = scanf("%40[^\n]s", title);
getchar();
It seems to work in being a quick way to strip off the trailing newline, but I'm wondering if there are shortcomings I'm not seeing here.
The posted code has multiple problems:
the s in the format string is not what you think it is: the specification is %40[^\n], and the s will try to match a literal s in the input stream, which may occur after 40 bytes have been stored into title.
scanf() will fail to convert anything if the pending input is a newline, leaving title unchanged and potentially uninitialized.
getchar() will not necessarily read the newline: if more than 40 characters are present on the line, it will just read the next character.
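The first point can be checked with sscanf() and the "%n" directive, which records how many characters have been consumed so far. The function below is a hypothetical demonstration, not part of the original code. Because [^\n] already accepts s, the literal s can only match once the 40-character width limit has been reached.

```c
#include <stdio.h>

/* Hypothetical demo of the stray s in "%40[^\n]s". After the scanset stops,
   the s is an ordinary literal that must match the next input character;
   the "%n" after it is only processed if that match succeeded.
   Returns the number of characters consumed in that case, -1 if the s did
   not match (the usual case, since the next character is normally '\n'),
   or -2 if nothing could be stored in title at all. */
int chars_consumed(const char *input)
{
    char title[41];
    int pos = -1;

    if (sscanf(input, "%40[^\n]s%n", title, &pos) != 1)
        return -2;
    return pos;
}
```

For a line of 40 a's followed by "sx", the call returns 41: the s was eaten even though it was never part of the title.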
If you want to read a line, up to 40 bytes and ignore the rest of the line up to and including the newline, use this:
char title[41];
*title = '\0';
if (scanf("%40[^\n]", title) == EOF) {
// end of file reached before reading anything, handle this case
} else {
scanf("%*[^\n]"); // discard the rest of the line, if any
getchar(); // discard the newline if any (or use scanf("%*1[\n]"); the * goes before the width)
}
It might be more readable to write:
char title[41];
int c, len = 0;
while ((c = getchar()) != EOF && c != '\n') {
if (len < 40)
title[len++] = c;
}
title[len] = '\0';
if (c == EOF && len == 0) {
// end of file reached before reading a line
} else {
// possibly empty line of length len was read in title
}
You can also use fgets():
char title[41];
if (fgets(title, sizeof title, stdin)) {
char *p = strchr(title, '\n');
if (p != NULL) {
// strip the newline
*p = '\0';
} else {
// no newline found: discard remaining characters and the newline if any
int c;
while ((c = getchar()) != EOF && c != '\n')
continue;
}
} else {
// at end of file: nothing was read in the title array
}
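The fgets() version above can be wrapped in a reusable helper. This is a sketch: the name read_line() and its return convention are assumptions, and it reads from any FILE * rather than only stdin.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read one line from in into buf (size bytes),
   strip the '\n', and discard whatever did not fit.
   Returns 1 if a line was read, 0 at end of file or on error. */
int read_line(char *buf, size_t size, FILE *in)
{
    char *p;
    int c;

    if (fgets(buf, (int)size, in) == NULL)
        return 0;                        /* nothing read */

    p = strchr(buf, '\n');
    if (p != NULL) {
        *p = '\0';                       /* whole line fit: strip '\n' */
    } else {
        /* line too long (or last line without '\n'): drop the rest */
        while ((c = getc(in)) != EOF && c != '\n')
            continue;
    }
    return 1;
}
```

A loop such as while (read_line(title, sizeof title, stdin)) { ... } then handles one line at a time with the newline already removed and any excess characters thrown away.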
As noted previously, the s should be removed. It is not part of the specifier: scanf() treats it as a literal character that must match the next input character. That character is normally the newline (or the 41st character of a long line), so the match usually just fails and scanf() returns; but if the 41st character happens to be an s, it is silently consumed.
To answer your question: using a single getchar() is not the best approach. You can use this common routine to clear the buffer:
int n = scanf(" %40[^\n]", title);
int c;
while((c = getchar()) != '\n' && c != EOF){}
if(c == EOF){
// in the rare cases this can happen, it may be unrecoverable
// it's best to just abort
return EXIT_FAILURE;
}
//...
Why is this useful? It reads and discards all the characters remaining in the stdin buffer up to and including the next newline, regardless of what they are.
If the entered line has, say, 45 characters, this approach clears the rest of the line, whereas a single getchar() only removes 1 character.
Note that I added a space before the specifier. This is useful because it discards all whitespace (newlines, spaces, tabs, etc.) before the first parseable character. This is usually the desired behavior: if you hit Enter, or space then Enter, scanf() discards those and keeps waiting for input. If you want to parse empty lines, remove the space, or use fgets() instead.
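To see what the leading space changes, both formats can be compared on a string with sscanf(), which parses the same way but stops at the end of the string instead of waiting for more input. scan_title() and its flag are hypothetical names used only for this comparison.

```c
#include <stdio.h>

/* Hypothetical comparison helper: returns the sscanf() result for
   " %40[^\n]" (skip_leading_ws != 0) or "%40[^\n]" (skip_leading_ws == 0).
   The space directive skips any amount of whitespace, newlines included,
   before the scanset starts; without it, input that begins with '\n'
   gives a matching failure (0). title must hold at least 41 bytes. */
int scan_title(const char *input, char title[41], int skip_leading_ws)
{
    return skip_leading_ws ? sscanf(input, " %40[^\n]", title)
                           : sscanf(input, "%40[^\n]", title);
}
```

With the space, "\n\n  Title here\n" yields "Title here"; without it, "\nTitle\n" converts nothing. A string containing only whitespace gives EOF with the space, the string equivalent of scanf() waiting for more input.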
There are a number of problems with your code, such as n never being used and the wrong specifier for scanf.
The better approach is to use fgets. fgets will also read the newline character (if present before the buffer is full) but it's easy to remove.
See Removing trailing newline character from fgets() input

Reading String Until Blank Space

I am working on a C program that will read user input as strings separated by spaces. I wanted a way to read these strings up until the blank spaces. I know that scanf does this with ease, but I was wondering if it was possible to do the same with gets() or sscanf() or some combination of the two. I am very new to C, having only just started it, and I come from a Java background. Thank you all in advance, I really appreciate any and all help! Below is the code I have so far, as well as a sample input. (The wanted functionality is achieved if the scanf() portion is uncommented, but the goal is to achieve the same functionality as scanf() using only gets() and sscanf().)
int main()
{
printf("Enter words separated by spaces:(. or EOF to stop):\n");
do
{
char s[255];
//scanf("%s",s);
gets(s);
if(strcmp(s,".")==0 || strcmp(s,"EOF")==0)
{
insert_dictionary_order(s);//adding to list
break;
}
else
{
insert_dictionary_order(s);//adding to list
}
}
while(1);
//now printing list
print_list();
return 0;
}
This is a sample text.
The file will be terminated by a single dot: .
The program continues processing the lines because the dot (.)
did not appear at the beginning.
. even though this line starts with a dot, it is not a single dot.
The program stops processing lines right here.
.
You won't be able to feed any more lines to the program.
Edit: The answer MUST implement either gets() or sscanf() or both.
First, Never, Ever, Ever use gets(). It is so insecure and prone to exploit by buffer overflow that it has been completely removed from the C library beginning with C11. See Why gets() is so dangerous it should never be used! (using scanf() with the "%s" or "%[..]" conversion specifiers without the optional field-width modifier is just as bad)
fgets() is what you are looking for. To read your text file, stopping when '.' appears alone as the first character in a line with nothing following (aside from the '\n' included in the buffer by fgets()), you simply read each line into a sufficiently sized buffer (character array) and check the first character in the line to see if it is a '.' and check the second character to see if it is '\n', and if so, exit your read-loop.
For example:
#include <stdio.h>
#include <string.h>
#define MAXC 1024 /* if you need a constant, #define one (or more) */
int main (void) {
char line[MAXC]; /* buffer to hold line */
while (fgets (line, MAXC, stdin)) { /* read each line */
/* if 1st char '.' and either EOF or next is '\n', done */
if (*line == '.' && (!line[1] || line[1] == '\n'))
break;
line[strcspn (line, "\n")] = 0; /* trim \n from end of line */
puts (line); /* output line */
}
}
(note: above, strcspn (line, "\n") returns the number of characters in line up to the '\n', allowing you to overwrite the '\n' with the nul-terminating character to remove it)
Generally, checking that the second character in a line beginning with '.' is a newline would be sufficient, but a number of editors do not write POSIX-compliant files (meaning there is no '\n' after the final line). If you didn't also check for the end of the string (!line[1]), then for files with non-POSIX file endings you would output the final '.' if it appeared on the last line.
Example Use/Output
With your example input in dat/stop_on_dot.txt redirected to the program on stdin as input, you would receive the following:
$ ./bin/stop_on_dot < dat/stop_on_dot.txt
This is a sample text.
The file will be terminated by a single dot: .
The program continues processing the lines because the dot (.)
did not appear at the beginning.
. even though this line starts with a dot, it is not a single dot.
The program stops processing lines right here.
Look things over and let me know if you have any questions.
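Since the question's edit requires sscanf(), the lines read by fgets() can be split into words with sscanf() and the "%n" directive, which reports how many characters each call consumed so the next call can resume after the previous word. This is a sketch under assumptions: split_words(), MAXW and the 63-character word limit are invented for illustration, and a longer word would be split into several pieces.

```c
#include <stdio.h>

#define MAXW 64   /* assumed maximum word size, including the '\0' */

/* Hypothetical helper: copy up to max whitespace-separated words from line
   into words[]. "%63s" skips leading blanks and stops at the next blank;
   "%n" stores how many characters that call consumed.
   Returns the number of words stored. */
int split_words(const char *line, char words[][MAXW], int max)
{
    int n = 0, used;

    while (n < max && sscanf(line, "%63s%n", words[n], &used) == 1) {
        line += used;   /* continue right after the word just read */
        n++;
    }
    return n;
}
```

Each words[i] could then be passed to the question's insert_dictionary_order(), while the check for a lone '.' stays as in the loop above.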
Never use gets(); it does no bounds checking.
As for your concern, using just getchar() is enough: check whether each input character is a whitespace character (' ', '\t', '\n').
I have just written a small program to take input until the first space.
We can write it more robustly; let me know if this solves your problem.
#include <stdio.h>
int main()
{
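int ch; // int, not char, so that EOF can be told apart from a real character
while((ch = getchar()) != EOF) // stop at end of input instead of looping forever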
{
if(ch != ' ' && ch != '\t' && ch != '\n')
putchar(ch);
else
break;
}
return 0;
}
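A somewhat more robust variant along the same lines, sketched as a reusable function: it skips leading whitespace with isspace(), stores at most size - 1 characters, and reports end of input. The name read_word() and its return values are assumptions; size must be at least 1.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper: read one whitespace-delimited word from in into buf
   (size bytes, always nul-terminated). Leading whitespace is skipped;
   characters beyond size - 1 are read and dropped, and the whitespace
   character that ends the word is consumed. Returns the number of
   characters stored, or -1 if EOF came before any word character. */
int read_word(FILE *in, char *buf, size_t size)
{
    int c;
    size_t len = 0;

    while ((c = getc(in)) != EOF && isspace(c))
        ;                               /* skip leading whitespace */
    if (c == EOF)
        return -1;

    do {
        if (len + 1 < size)
            buf[len++] = (char)c;       /* keep it if there is room */
    } while ((c = getc(in)) != EOF && !isspace(c));

    buf[len] = '\0';
    return (int)len;
}
```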

Trying to stop my program when two lines are empty using fgets()

Basically I'm making a short program that reads two lines and, if they are different, writes both of them. If they are the same sentence, the program should write only one. The program should stop when it encounters two empty lines, and that's the problem: I can't figure out how to do it. I've already tried to use the strcmp() function, but that didn't work either. Here's my code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main()
{
char sent1[6000], sent2[6000];
while(1){
fgets(sent1, sizeof(sent1), stdin);
fgets(sent2, sizeof(sent2), stdin);
if(strcmp(sent1, sent2)!=0)
printf("%s%s", sent1, sent2);
else
printf("%s", sent1);
if((sent1[0] == '/0') && (sent2[0] == '/0'))
break;
}
return 0;
}
fgets reads in a line including the new line character. So if your file contains empty lines, then such a line may have a '\n' at position 0, not a '\0'.
Try:
if( ((sent1[0] == '\0') || (sent1[0] == '\n'))
&& ((sent2[0] == '\0') || (sent2[0] == '\n')) )
Further, note that you should check the return value of fgets, which is NULL if end of file has been reached. In such a case, sent1 or sent2 would not be altered any more, and you might run into an infinite loop.
Your lines are not empty because fgets() reads in the newlines ('\n'). So you'd need to check for the newline character instead.
if((sent1[0] == '\n') && (sent2[0] == '\n'))
break;
Not so relevant anymore, but your comparison is wrong because the null character is '\0', not '/0'.
As said before, fgets() reads in the newline character if there's space in the buffer. In your case, since you're comparing the lines, you need to be aware of that. You probably need to strip out the newline character too, e.g.,
char *p;
if ((p = strchr(sent1, '\n')) != NULL)
*p = '\0';
(same with sent2 as well)
Another case to consider is what if the fgets() calls fail. You need to check their return value for failures. fgets() returns NULL on failure.
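Putting these suggestions together (check fgets(), strip the newline, test for empty strings), a corrected loop might look like this sketch. The function name print_pairs() and the FILE * parameters are assumptions added so it can be tested. Like the original, it reads lines in pairs, so it only stops when both lines of one pair are empty, and a final unpaired line is dropped.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical rewrite of the question's loop: read lines from in two at a
   time, write both to out if they differ or only one if they are equal,
   and stop on a pair of empty lines or when fgets() fails. */
void print_pairs(FILE *in, FILE *out)
{
    char sent1[6000], sent2[6000];

    for (;;) {
        if (fgets(sent1, sizeof sent1, in) == NULL ||
            fgets(sent2, sizeof sent2, in) == NULL)
            break;                           /* EOF or read error */

        sent1[strcspn(sent1, "\n")] = '\0';  /* strip the newlines so an */
        sent2[strcspn(sent2, "\n")] = '\0';  /* empty line becomes ""    */

        if (sent1[0] == '\0' && sent2[0] == '\0')
            break;                           /* two empty lines: stop */

        if (strcmp(sent1, sent2) != 0)
            fprintf(out, "%s\n%s\n", sent1, sent2);
        else
            fprintf(out, "%s\n", sent1);
    }
}
```

Calling print_pairs(stdin, stdout) gives the behavior the question asks for.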

how to make fscanf, scan a single word instead of whole line

There is a text file called 1.txt like below that contains certain names, I want to add them to a linked list however the code doesn't scan only one name but the whole line, how can I make fscanf so that it only scans a single name?
Example 1.txt:
ana,bob,max,dolores
My code:
FILE *fp = fopen("1.txt", "rt");
while (!feof(fp)) {
char name_in[100];
fscanf(fp, "%s,", name_in);
printf("%s", name_in);
addnewnode(head, name_in);
}
fclose(fp);
The problem is that with the "%s" format, scanf will not stop scanning until it hits the end of the input or a whitespace. That means you can't use scanf alone to parse your input.
Instead I suggest you read your whole line into a buffer, for example using fgets. Once you have it then you can use strtok in a loop to tokenize the line.
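A sketch of that suggestion, assuming names are separated only by commas: split_names() is a hypothetical helper that tokenizes one fgets() line in place with strtok(), treating the line ending as a delimiter too so the last name does not keep its '\n'.

```c
#include <string.h>

/* Hypothetical helper: split a line read by fgets() into comma-separated
   names, in place. Pointers to the names are stored in names[] (at most
   max of them). "\r\n" are delimiters as well, which drops the line
   ending. Returns the number of names found. */
int split_names(char *line, char *names[], int max)
{
    int n = 0;
    char *tok = strtok(line, ",\r\n");

    while (tok != NULL && n < max) {
        names[n++] = tok;
        tok = strtok(NULL, ",\r\n");
    }
    return n;
}
```

Inside while (fgets(line, sizeof line, fp) != NULL), each names[i] could be handed to addnewnode(head, names[i]). The pointers point into line, so addnewnode() must copy the string if line is reused. strtok() also treats ",," as a single delimiter, so empty fields are silently skipped.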
Not using scanf also sidesteps a problem with your format string: the trailing comma does not do what you expect. Because "%s" has already swallowed the commas along with the names, the literal , is compared against whatever follows the word (here the newline), the match fails, and fscanf() returns 1 after storing the whole line. On the next pass fscanf() finds nothing but the end of the file and returns EOF without touching name_in, so the loop processes the stale or indeterminate contents of name_in once more. Checking what scanf returns is crucial.
Also, I strongly suggest you read Why is “while ( !feof (file) )” always wrong?.
What's in a name?
A name is usually thought of as containing letters, maybe spaces and some other characters. Code needs to be told which characters make up a name, which are valid delimiters, and how to handle other unexpected characters.
"%s" only distinguishes between white-space and non-white-space. It treats , the same as letters.
"%width[A-Za-z' ]" will define a scanset accepting letters, ' and space. It will read/save up to width characters before appending a null character.
It is always a good idea to check the return value of an input function before using the populated objects.
FILE *fp = fopen("1.txt", "rt");
if (fp == NULL) Handle_Error();
// end-of-file signal is not raised until after a read attempt.
// while (!feof(fp)) {
char name_in[100];
char delimiter[2];
int count;
while ((count = fscanf(fp, "%99[A-Za-z' ]%1[,\n]", name_in, delimiter)) == 2) {
printf("<%s>%s", name_in, delimiter);
addnewnode(head, name_in);
if (delimiter[0] == '\n') {
; // Maybe do something special at end-of-line
}
}
// Loop stopped unexpectedly? Check before fclose(): fp must not be used once it is closed
if (count != EOF || !feof(fp)) {
puts("Oops");
}
fclose(fp);
More robust code would read the line with fgets() and then process the string. It could use code similar to the above, but with sscanf().
To include - in a scanset so code can handle hyphenated names, list it first. You may want to allow other characters too.
"%width[-A-Za-z' .]"
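Following the suggestion to read the line with fgets() first, the scanset idea can be applied to the string with sscanf(). In this sketch, parse_names() and the 100-byte name size are assumptions; "%n" records where each name ends so the next call resumes after the comma, and - is listed first in the scanset as described above. It stops at the first empty or invalid field, so the caller can compare the result with what it expected.

```c
#include <stdio.h>

/* Hypothetical helper: parse one line (as read by fgets()) of names made of
   letters, hyphens, apostrophes and spaces, separated by commas. Each name
   is copied into names[i] (up to 99 characters). Returns the number of
   names parsed before the end of the line or the first bad field. */
int parse_names(const char *line, char names[][100], int max)
{
    int n = 0, used;

    while (n < max &&
           sscanf(line, "%99[-A-Za-z' ]%n", names[n], &used) == 1) {
        line += used;      /* move past the name just read */
        n++;
        if (*line != ',')
            break;         /* '\n', end of string, or an unexpected char */
        line++;            /* skip the comma and look for the next name */
    }
    return n;
}
```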

How to stop enter character getting stuck in buffer? C

So in the following bit of code I'm reading an option from the user to then decide whether to perform a given action:
printf("Do you want to execute %s: ", exe);
char input = getchar();
if (input == 'y') {
return execute(exe);
} return 0;
The problem I'm having is that after the line char input = getchar() a character (I'm assuming an enter key or newline) gets stuck in stdin, and a future call to fgets() (reading from stdin) from my execute() function therefore eats up this stuck character and doesn't prompt the user for the required input.
Now I can define the following function to 'eat' any stray characters for me if called before calling execute()...
void iflush() {
int c;
while ((c = getchar()) != EOF && c != '\n') {
continue;
}
}
... and this solves the problem. But I'm left wondering why this is happening in the first place? Is there a more 'proper' way of getting a char from stdin without causing this stray character which then messes up my other methods? I'm very new to C and am keen to be sure I'm learning good habits.
It seems odd that I have to add a helper method, even if simple, to prevent errors in my program just from recording some user input. I've tried using fgetc() also but same problem.
I'm left wondering why this is happening in the first place?
It's because when you read characters with %c (or fgetc()), no whitespace in the input stream is skipped. Typing y and pressing Enter puts "y\n" into stdin; getchar() consumes only the y, and the '\n' stays there for the next read.
Some format specifiers, for example %d, skip any leading whitespace left in the input stream, so for them it's not a problem. But %c doesn't skip it.
Hence, you typically use those "eat" functions.
Think about it: if %c skipped whitespace, how would you be able to scan a whitespace character (such as a space or newline) using scanf()?
Obviously, there are different and better ways such as using fgets().
I'm very new to C and am keen to be sure I'm learning good habits.
My suggestion is to avoid scanf() completely. Instead use fgets() to read a line and then you can parse that line using sscanf(). Please read: Why does everyone say not to use scanf? What should I use instead?
One thing to be aware of with fgets() is that if your input buffer has enough space, it will read the newline character as well,
which you would want to avoid in most user inputs. So, you need to check if the last character is a newline and then remove it.
You can achieve it using strcspn() function such as:
char line[256];
if (fgets(line, sizeof line, stdin) == NULL) {
/* handle failure */
}
line[strcspn(line, "\n")] = 0; /* remove the newline char if present */
If you are mainly interested in reading just one char, then an array of 3 chars would be sufficient (the character, the '\n' and the '\0'). Still, using a larger buffer doesn't hurt:
char line[256];
char ch;
if (fgets(line, sizeof line, stdin) == NULL) {
/* handle failure */
}
if (sscanf(line, "%c", &ch) != 1) {
/* handle failure */
}
/* Now you can use 'ch' to check if it's 'y' */
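The two steps can be combined into a small helper for the question's y/n prompt. This is a sketch: ask_yes_no() is a hypothetical name, treating 'Y' as yes is an assumption, and " %c" (with a leading space) is used so that blanks typed before the answer are ignored. Because fgets() consumes the whole line, including the '\n', a later fgets() in execute() starts on fresh input, as long as the answer fits in the buffer.

```c
#include <stdio.h>

/* Hypothetical helper: read one whole line from in and report whether its
   first non-blank character is 'y' or 'Y'.
   Returns 1 for yes, 0 for anything else (including a blank line),
   -1 on EOF or read error. */
int ask_yes_no(FILE *in)
{
    char line[256];
    char ch;

    if (fgets(line, sizeof line, in) == NULL)
        return -1;
    /* " %c" skips leading whitespace before reading one character */
    if (sscanf(line, " %c", &ch) != 1)
        return 0;
    return ch == 'y' || ch == 'Y';
}
```

With the question's code, the prompt stays as printf("Do you want to execute %s: ", exe); and the test becomes if (ask_yes_no(stdin) == 1) return execute(exe);.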
