Many people say that scanf shouldn't be used in a "more serious program", and the same goes for getline.
I'm starting to feel lost: if people say I shouldn't use any of the input functions I come across, then what should I use? Is there a more "standard" way to get input that I'm not aware of?
Generally, fgets() is considered a good option. It reads whole lines into a buffer, and from there you can do what you need. If you want behavior like scanf(), you can pass the strings you read along to sscanf().
The main advantage of this is that if the string fails to convert, it's easy to recover, whereas with scanf() you're left with input on stdin which you need to drain. Plus, you won't fall into the pitfall of mixing line-oriented input with scanf(), which causes headaches when things like \n get left on stdin, commonly leading new coders to believe the input calls had been ignored altogether.
Something like this might be to your liking:
char line[256];
int i;
if (fgets(line, sizeof(line), stdin)) {
    if (1 == sscanf(line, "%d", &i)) {
        /* i can be safely used */
    }
}
Above you should note that fgets() returns NULL on EOF or error, which is why I wrapped it in an if. The sscanf() call returns the number of fields that were successfully converted.
Keep in mind that fgets() may not read a whole line if the line is larger than your buffer, which in a "serious" program is certainly something you should consider.
For simple input where you can set a fixed limit on the input length, I would recommend reading the data from the terminal with fgets().
This is because fgets() lets you specify the buffer size (as opposed to gets(), which for this very reason should pretty much never be used to read input from humans):
char line[256];
if (fgets(line, sizeof line, stdin) != NULL)
{
    /* Now inspect and further parse the string in line. */
}
Remember that fgets() keeps the trailing newline ('\n') in the buffer, which might be surprising.
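If the newline gets in the way, one common idiom is to cut the string at the first line terminator. The following is only a sketch; strip_newline is a hypothetical helper name, not a library function.

```c
#include <string.h>

/* Remove the line terminator that fgets() stores, if there is one.
 * strip_newline is a hypothetical helper name used for illustration. */
void strip_newline(char *line)
{
    /* strcspn() returns the index of the first '\r' or '\n', or the
     * length of the string if neither occurs, so this is safe either way. */
    line[strcspn(line, "\r\n")] = '\0';
}
```

Call it right after a successful fgets(), before comparing or printing the line.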
UPDATE: As pointed out in a comment, there's a better alternative if you're okay with taking responsibility for tracking the memory: getline(). This is probably the best general-purpose solution for POSIX code, since it doesn't have any static limit on the length of lines to be read.
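A rough sketch of a getline() loop follows, assuming a POSIX.1-2008 system. The function longest_line is a made-up example that just measures lines; the point is the buffer handling, where getline() allocates and grows the buffer and the caller frees it once at the end.

```c
#define _POSIX_C_SOURCE 200809L  /* ask for the POSIX getline() declaration */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>           /* ssize_t */

/* Return the length of the longest line in fp (newline excluded),
 * or -1 if the stream reported a read error.
 * longest_line is a hypothetical helper used only for illustration. */
long longest_line(FILE *fp)
{
    char *line = NULL;   /* getline() allocates and grows this buffer */
    size_t cap = 0;      /* current capacity of that buffer */
    ssize_t len;
    long longest = 0;

    while ((len = getline(&line, &cap, fp)) != -1) {
        if (len > 0 && line[len - 1] == '\n')
            len--;                       /* do not count the newline */
        if (len > longest)
            longest = (long)len;
    }
    free(line);          /* the caller owns the buffer, even on failure */
    return ferror(fp) ? -1 : longest;
}
```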
There are several problems with using scanf:
reading text with a plain %s conversion specifier has the same risk as using gets(); if the user types in a string that's longer than what the target buffer is sized to hold, you'll get a buffer overrun;
if using %d or %f to read numeric input, certain bad patterns cannot be caught and rejected completely -- if you're reading an integer with %d and the user types "12r4", scanf will convert and assign the 12 while leaving r4 in the input stream to foul up the next read;
some conversion specifiers skip leading whitespace, others do not, and failure to take that into account can lead to problems where some input is skipped completely;
Basically, it takes a lot of extra effort to bulletproof reads using scanf.
A good alternative is to read all input as text using fgets(), and then tokenize and convert the input using sscanf or combinations of strtok, strtol, strtod, etc.
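As a sketch of the conversion step, here is one way to accept only a complete, in-range integer with strtol(), so that input like "12r4" is rejected as a whole. The name parse_int and its return convention are assumptions made for this example.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Convert text to an int, accepting only a complete, in-range number.
 * Returns 1 on success and stores the value in *out; returns 0 otherwise.
 * parse_int is a hypothetical helper name. */
int parse_int(const char *text, int *out)
{
    char *end;
    long value;

    errno = 0;
    value = strtol(text, &end, 10);

    if (end == text)
        return 0;                        /* no digits at all */
    if (errno == ERANGE || value < INT_MIN || value > INT_MAX)
        return 0;                        /* does not fit in an int */
    while (*end == ' ' || *end == '\t' || *end == '\n' || *end == '\r')
        end++;                           /* allow trailing whitespace */
    if (*end != '\0')
        return 0;                        /* trailing junk such as "r4" */

    *out = (int)value;
    return 1;
}
```

Pass it the buffer filled by fgets(); on failure the whole line is still available for an error message or another parsing attempt.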
Use fgets to get the data and use sscanf (or another method) to interpret them.
See this page to learn why it is better to use fgets + sscanf rather than scanf
http://c-faq.com/stdio/scanfprobs.html
I have problems with my C program when I try to read / parse input.
Help?
This is a FAQ entry.
StackOverflow has many questions related to reading input in C, with answers usually focussed on the specific problem of that particular user without really painting the whole picture.
This is an attempt to cover a number of common mistakes comprehensively, so this specific family of questions can be answered simply by marking them as duplicates of this one:
Why does the last line print twice?
Why does my scanf("%d", ...) / scanf("%c", ...) fail?
Why does gets() crash?
...
The answer is marked as community wiki. Feel free to improve and (cautiously) extend.
The Beginner's C Input Primer
Text mode vs. Binary mode
Check fopen() for failure
Pitfalls
Check any functions you call for success
EOF, or "why does the last line print twice"
Do not use gets(), ever
Do not use fflush() on stdin or any other stream open for reading, ever
Do not use *scanf() for potentially malformed input
When *scanf() does not work as expected
Read, then parse
Read (part of) a line of input via fgets()
Parse the line in-memory
Clean Up
Text mode vs. Binary mode
A "binary mode" stream is read in exactly as it has been written. However, there might (or might not) be an implementation-defined number of null characters ('\0') appended at the end of the stream.
A "text mode" stream may do a number of transformations, including (but not limited to):
removal of spaces immediately before a line-end;
changing newlines ('\n') to something else on output (e.g. "\r\n" on Windows) and back to '\n' on input;
adding, altering, or deleting characters that are neither printing characters (isprint(c) is true), horizontal tabs, or new-lines.
It should be obvious that text and binary mode do not mix. Open text files in text mode, and binary files in binary mode.
Check fopen() for failure
The attempt to open a file may fail for various reasons -- lack of permissions, or file not found being the most common ones. In this case, fopen() will return a NULL pointer. Always check whether fopen returned a NULL pointer, before attempting to read or write to the file.
When fopen fails, it usually sets the global errno variable to indicate why it failed. (This is technically not a requirement of the C language, but both POSIX and Windows guarantee to do it.) errno is a code number which can be compared against constants in errno.h, but in simple programs, usually all you need to do is turn it into an error message and print that, using perror() or strerror(). The error message should also include the filename you passed to fopen; if you don't do that, you will be very confused when the problem is that the filename isn't what you thought it was.
#include <stdio.h>
#include <string.h>
#include <errno.h>
int main(int argc, char **argv)
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s file\n", argv[0]);
        return 1;
    }
    FILE *fp = fopen(argv[1], "r");
    if (!fp) {
        // alternatively, just `perror(argv[1])`
        fprintf(stderr, "cannot open %s: %s\n", argv[1], strerror(errno));
        return 1;
    }
    // read from fp here
    fclose(fp);
    return 0;
}
Pitfalls
Check any functions you call for success
This should be obvious. But do check the documentation of every function you call for its return value and error handling, and check for those conditions.
These are errors that are easy to fix when you catch the condition early, but lead to lots of head-scratching if you do not.
EOF, or "why does the last line print twice"
The function feof() returns true if EOF has been reached. A misunderstanding of what "reaching" EOF actually means makes many beginners write something like this:
// BROKEN CODE
while (!feof(fp)) {
    fgets(buffer, BUFFER_SIZE, fp);
    printf("%s", buffer);
}
This makes the last line of the input print twice, because when the last line is read (up to the final newline, the last character in the input stream), EOF is not set.
EOF only gets set when you attempt to read past the last character!
So the code above loops once more, fgets() fails to read another line, sets EOF and leaves the contents of buffer untouched, which then gets printed again.
Instead, check whether fgets failed directly:
// GOOD CODE
while (fgets(buffer, BUFFER_SIZE, fp)) {
    printf("%s", buffer);
}
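Once the loop has ended, feof() and ferror() become useful: they tell you why fgets() returned NULL. A minimal sketch, assuming you want to tell a clean end-of-file apart from an I/O error (copy_lines is a hypothetical helper name):

```c
#include <stdio.h>

/* Copy every line from in to out.
 * Returns 0 if the loop ended at end-of-file, -1 on a read or write error.
 * copy_lines is a hypothetical helper name. */
int copy_lines(FILE *in, FILE *out)
{
    char buffer[256];

    while (fgets(buffer, sizeof buffer, in)) {
        if (fputs(buffer, out) == EOF)
            return -1;                   /* write failed */
    }
    /* fgets() returned NULL: find out why. */
    if (ferror(in))
        return -1;                       /* read error, not end-of-file */
    return 0;                            /* plain end-of-file */
}
```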
Do not use gets(), ever
There is no way to use this function safely. Because of this, it has been removed from the language with the advent of C11.
Do not use fflush() on stdin or any other stream open for reading, ever
Many people expect fflush(stdin) to discard user input that has not yet been read. It does not do that. In plain ISO C, calling fflush() on an input stream has undefined behaviour. It does have well-defined behavior in POSIX and in MSVC, but neither of those makes it discard user input that has not yet been read.
Usually, the right way to clear pending input is to read and discard characters up to and including a newline, but not beyond:
int c;
do c = getchar(); while (c != EOF && c != '\n');
Do not use *scanf() for potentially malformed input
Many tutorials teach you to use *scanf() for reading any kind of input, because it is so versatile.
But the purpose of *scanf() is really to read bulk data that can be relied upon, more or less, to be in a predefined format (such as data written by another program).
Even then *scanf() can trip the unobservant:
Using a format string that in some way can be influenced by the user is a gaping security hole.
If the input does not match the expected format, *scanf() immediately stops parsing, leaving any remaining arguments uninitialized.
It will tell you how many assignments it has successfully done -- which is why you should check its return code (see above) -- but not where exactly it stopped parsing the input, making graceful error recovery difficult.
It skips any leading whitespace in the input, except when it does not ([, c, and n conversions). (See next paragraph.)
It has somewhat peculiar behaviour in some corner cases.
When *scanf() does not work as expected
A frequent problem with *scanf() is unread whitespace (' ', '\n', ...) in the input stream that the user did not account for.
Reading a number ("%d" et al.), or a string ("%s"), stops at any whitespace. And while most *scanf() conversion specifiers skip leading whitespace in the input, [, c and n do not. So the newline is still the first pending input character: %c reads that newline instead of the character you expected, and %[ either consumes it or fails to match, depending on the scanset.
You can skip over the newline in the input by explicitly reading it, e.g. via fgetc(), or by adding a whitespace character to your *scanf() format string. (A single whitespace character in the format string matches any amount of whitespace in the input.)
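The effect of that single space can be seen with sscanf() on a fixed string. This is only an illustration; the two function names are made up for the example.

```c
#include <stdio.h>

/* "%d%c": %c does not skip whitespace, so it stores whatever character
 * follows the number. Returns '?' if either conversion failed.
 * read_char_plain is a hypothetical helper name. */
char read_char_plain(const char *text)
{
    int number;
    char c;

    if (sscanf(text, "%d%c", &number, &c) != 2)
        return '?';
    return c;
}

/* "%d %c": the space in the format skips any whitespace, newlines
 * included, before %c stores the next character.
 * read_char_skipping is a hypothetical helper name. */
char read_char_skipping(const char *text)
{
    int number;
    char c;

    if (sscanf(text, "%d %c", &number, &c) != 2)
        return '?';
    return c;
}
```

Given the text "42\ny", the first function returns the newline and the second returns 'y'.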
Read, then parse
We just advised against using *scanf() except when you really, positively, know what you are doing. So, what to use as a replacement?
Instead of reading and parsing the input in one go, as *scanf() attempts to do, separate the steps.
Read (part of) a line of input via fgets()
fgets() takes a size parameter and stores at most one byte fewer than that, plus the terminating '\0', so it cannot overflow your buffer. If the input line fit into your buffer completely, the last character before the terminator will be the newline ('\n'), unless the input ended without one. If it did not all fit, you are looking at a partially-read line.
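One possible way to act on that newline check is sketched below. The helper read_line and its return codes are assumptions for this example, and discarding the rest of an over-long line is just one policy among several (you could also grow the buffer and keep reading).

```c
#include <stdio.h>
#include <string.h>

/* Read one line into buf (capacity size).
 * Returns  1 if a complete line was read (newline stripped),
 *          0 if the line was too long; the rest of it is discarded,
 *         -1 on end-of-file or read error with nothing read.
 * read_line is a hypothetical helper name. */
int read_line(FILE *fp, char *buf, size_t size)
{
    size_t len;
    int c;

    if (!fgets(buf, (int)size, fp))
        return -1;

    len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n') {
        buf[len - 1] = '\0';             /* whole line fit */
        return 1;
    }
    if (feof(fp))
        return 1;                        /* last line had no newline */

    /* The buffer filled up before the newline was seen. */
    c = getc(fp);
    if (c == '\n' || c == EOF)
        return 1;                        /* only the newline was missing */
    while ((c = getc(fp)) != EOF && c != '\n')
        ;                                /* drain the rest of the line */
    return 0;
}
```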
Parse the line in-memory
Especially useful for in-memory parsing are the strtol() and strtod() function families, which provide similar functionality to the *scanf() conversion specifiers d, i, u, o, x, a, e, f, and g.
But they also tell you exactly where they stopped parsing, and have meaningful handling of numbers too large for the target type.
Beyond those, C offers a wide range of string processing functions. Since you have the input in memory, and always know exactly how far you have parsed it already, you can walk back as many times as you like trying to make sense of the input.
And if all else fails, you have the whole line available to print a helpful error message for the user.
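To illustrate how the end pointer lets you walk through a line and report exactly where parsing stopped, here is a small sketch using strtod(). The function sum_numbers is a hypothetical example, not part of any library.

```c
#include <ctype.h>
#include <stdlib.h>

/* Sum the whitespace-separated numbers in line.
 * Returns NULL on success, or a pointer to the first character that could
 * not be parsed, which is handy for pointing at the error in a message.
 * sum_numbers is a hypothetical helper name. */
const char *sum_numbers(const char *line, double *sum)
{
    const char *p = line;

    *sum = 0.0;
    for (;;) {
        char *end;
        double value;

        while (isspace((unsigned char)*p))
            p++;                         /* skip separators */
        if (*p == '\0')
            return NULL;                 /* reached the end cleanly */

        value = strtod(p, &end);
        if (end == p)
            return p;                    /* not a number: report where */
        *sum += value;
        p = end;                         /* continue after this number */
    }
}
```

When the return value is not NULL, subtracting the start of the line from it gives the column of the offending input.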
Clean Up
Make sure you explicitly close any stream you have (successfully) opened. This flushes any as-yet unwritten buffers, and avoids resource leaks.
fclose(fp);
In the book Practical C Programming, I find that the combination of fgets() and sscanf() is used to read input. However, it appears to me that the same objective can be met more easily using just the fscanf() function:
From the book (the idea, not the example):
int main()
{
    int age, weight;
    printf("Enter age and weight: ");
    char line[20];
    fgets(line, sizeof(line), stdin);
    sscanf(line, "%d %d", &age, &weight);
    printf("\nYou entered: %d %d\n", age, weight);
    return 0;
}
How I think it should be:
int main()
{
    int age, weight;
    printf("Enter age and weight: ");
    fscanf(stdin, "%d %d", &age, &weight);
    printf("\nYou entered: %d %d\n", age, weight);
    return 0;
}
Or is there some hidden quirk I'm missing?
There are a few behavior differences in the two approaches. If you use fgets() + sscanf(), you must enter both values on the same line, whereas fscanf() on stdin (or equivalently, scanf()) will read them off different lines if it doesn't find the second value on the first line you entered.
But, probably the most important differences have to do with handling error cases and the mixing of line oriented input and field oriented input.
If you read a line that you're unable to parse with sscanf() after having read it using fgets() your program can simply discard the line and move on. However, fscanf(), when it fails to convert fields, leaves all the input on the stream. So, if you failed to read the input you wanted, you'd have to go and read all the data you want to ignore yourself.
The other subtle gotcha comes in if you want to mix field-oriented (à la scanf()) with line-oriented (e.g. fgets()) calls in your code. When scanf() converts an int, for example, it will leave behind a \n on the input stream (assuming there was one, like from pressing the enter key), which will cause a subsequent call to fgets() to return immediately with only that character in the input. This is a really common issue for new programmers.
So, while you are right that you can just use fscanf() like that, you may be able to avoid some headaches by using fgets() + sscanf().
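The leftover-newline gotcha is easy to reproduce. In this sketch (mixed_read is a hypothetical helper), the stream contains "42\nhello\n", yet the fgets() call that follows fscanf() sees only the newline that "%d" left behind.

```c
#include <stdio.h>

/* Read an int with fscanf(), then a line with fgets(), from the same stream.
 * Returns 1 if both reads succeeded; line receives whatever fgets() saw.
 * mixed_read is a hypothetical helper name. */
int mixed_read(FILE *fp, int *n, char *line, int size)
{
    if (fscanf(fp, "%d", n) != 1)
        return 0;
    /* "%d" stopped right after the digits, so the newline is still pending
     * and fgets() will return a line that contains nothing else. */
    return fgets(line, size, fp) != NULL;
}
```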
The problem with only using fscanf() is, mostly, in error management.
Imagine you input "51 years, 85 Kg" to both programs.
The first program fails in the sscanf() and you still have the line to report errors to the user, to try a different parsing alternative, or to do something else;
The second program fails at "years": age is usable, weight is unusable.
Remember to always check the return value of *scanf() for error checking.
fgets(line, sizeof(line), stdin);
if (sscanf(line, "%d%d", &age, &weight) != 2) /* error with input */;
Edit
With your first program, after the error, the input buffer is clear; with the second program the input buffer starts with " years, 85 Kg".
Recovery in the first case is easy; recovery in the second case has to go through some sort of clearing the input buffer.
There is no difference between fscanf() and fgets()/sscanf() when:
Input data is well-formed.
Two types of errors occur: I/O and format. fscanf() simultaneously handles these two error types in one function but offers few recovery options. The separate fgets() and sscanf() allow logical separation of I/O issues from format ones and thus better recovery.
Only 1 parsing path with fscanf().
Separating I/O from scanning as with fgets/sscanf allows multiple sscanf() options. Should a given scanning of a buffer not realize the desired results, other sscanf() with different formats are available.
No embedded '\0'.
A '\0' rarely occurs in input, but should one occur, sscanf() stops scanning at it, whereas fscanf() continues past it.
In all cases, check results of all three functions.
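The "multiple parsing paths" point can be sketched as follows: because the line is already in memory, a second sscanf() format can be tried when the first one fails. The function name and the two accepted layouts are assumptions chosen to match the "51 years, 85 Kg" example from this thread.

```c
#include <stdio.h>

/* Try two input layouts on the same line: "51 85" and "51 years, 85 Kg".
 * Returns 1 and fills *age and *weight on success, 0 if neither matched.
 * parse_age_weight is a hypothetical helper name. */
int parse_age_weight(const char *line, int *age, int *weight)
{
    int a = 0, w = 0;

    if (sscanf(line, "%d %d", &a, &w) == 2 ||
        sscanf(line, "%d years, %d", &a, &w) == 2) {
        *age = a;
        *weight = w;
        return 1;
    }
    return 0;                            /* caller can report the line */
}
```

With fscanf() the first failed attempt would already have consumed part of the input, so a second attempt would start in the middle of the line.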
I've already got some code to read a text file using fscanf(), and now I need it modified so that fields that were previously whitespace-free need to allow whitespace. The text file is basically in the form of:
title: DATA
title: DATA
etc...
which is basically parsed using fgets(inputLine, 512, inputFile); sscanf(inputLine, "%*s %s", &data);, reading the DATA fields and ignoring the titles, but now some of the data fields need to allow spaces. I still need to ignore the title and the whitespace immediately after it, but then read in the rest of the line including the whitespace.
Is there any way to do this with the sscanf() function?
If not, what is the smallest change I can make to the code to handle the whitespace properly?
UPDATE: I edited the question to replace fscanf() with fgets() + sscanf(), which is what my code is actually using. I didn't really think it was relevant when I first wrote the question which is why I simplified it to fscanf().
If you cannot use fgets(), use the %[ conversion specifier (with the "exclude option"):
char buf[100];
fscanf(stdin, "%*s %99[^\n]", buf);
printf("value read: [%s]\n", buf);
But fgets() is way better.
Edit: version with fgets() + sscanf()
char buf[100], title[100];
fgets(buf, sizeof buf, stdin); /* expect string like "title: TITLE WITH SPACES" */
sscanf(buf, "%*s %99[^\n]", title);
I highly suggest you stop using fscanf() and start using fgets() (which reads a whole line) and then parse the line that has been read.
This will allow you considerably more freedom in regards to parsing non-exactly-formatted input.
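As a sketch of parsing the line yourself once fgets() has read it, the title can be skipped with strspn() and strcspn() instead of a scanf format. The helper name data_field is made up, and it assumes the title is a single whitespace-free word as in the question.

```c
#include <string.h>

/* Given a line such as "title: some data here\n", return a pointer to the
 * data part inside line, with the trailing newline removed. The line is
 * modified in place. data_field is a hypothetical helper name. */
char *data_field(char *line)
{
    char *p = line;

    line[strcspn(line, "\r\n")] = '\0';  /* drop the newline fgets() kept */
    p += strspn(p, " \t");               /* skip leading whitespace */
    p += strcspn(p, " \t");              /* skip the title word */
    p += strspn(p, " \t");               /* skip whitespace after it */
    return p;                            /* rest of the line, spaces kept */
}
```

The returned pointer refers into the original buffer, so copy the data out before reading the next line into the same buffer.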
The simplest thing would be to issue a
fscanf(filePtr, "%*s");
to discard the first part and then just call the fgets:
fgets(str, stringSize, filePtr);
If you insist on using scanf, and assuming that you want newline as a terminator, you can do this:
scanf("%*s %[^\n]", str);
Note, however, that the above, used exactly as written, is a bad idea because there's nothing to guard against str being overflowed (scanf doesn't know its size). You can, of course, set a predefined maximum size, and specify that, but then your program may not work correctly on some valid input.
If the size of the line, as defined by input format, isn't limited, then your only practical option is to use fgetc to read data char by char, periodically reallocating the buffer as you go. If you do that, then modifying it to drop all read chars until the first whitespace is fairly trivial.
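A minimal sketch of that fgetc() approach is shown below. The name read_long_line, the starting capacity of 64 and the doubling strategy are all choices made for this example.

```c
#include <stdio.h>
#include <stdlib.h>

/* Read one line of any length from fp into a malloc'd string.
 * The newline is consumed but not stored. Returns NULL on end-of-file
 * with nothing read, or if memory runs out; the caller frees the result.
 * read_long_line is a hypothetical helper name. */
char *read_long_line(FILE *fp)
{
    size_t cap = 64, len = 0;
    char *buf = malloc(cap);
    int c;

    if (!buf)
        return NULL;
    while ((c = fgetc(fp)) != EOF && c != '\n') {
        if (len + 1 == cap) {            /* keep room for the '\0' */
            char *bigger = realloc(buf, cap * 2);
            if (!bigger) {
                free(buf);
                return NULL;
            }
            buf = bigger;
            cap *= 2;
        }
        buf[len++] = (char)c;
    }
    if (c == EOF && len == 0) {          /* nothing left to read */
        free(buf);
        return NULL;
    }
    buf[len] = '\0';
    return buf;
}
```

On POSIX systems getline() does the same job; this version relies only on standard C.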
A %s specifier in fscanf skips any whitespace on the input, then reads a string of non-whitespace characters up to and not including the next whitespace character.
If you want to read up to a newline, you can use %[^\n] as a specifier. In addition, a ' ' in the format string will skip whitespace on the input. So if you use
fscanf(inputFile, "%*s %[^\n]", str);
it will read the first thing on the line up to the first whitespace ("title:" in your case), and throw it away, then will read whitespace chars and throw them away, then will read all chars up to a newline into str, which sounds like what you want.
Be careful that str doesn't overflow -- you might want to use
fscanf(inputFile, "%*s %100[^\n]", str);
to limit the maximum string length you'll read (100 characters, not counting the terminating NUL, so str needs room for at least 101 bytes).
You're running up against the limits of what the *scanf family is good for. With fairly minimal changes you could try using the string-scanning modules from Dave Hanson's C Interfaces and Implementations. This stuff is a retrofit from the programming language Icon, an extremely simple and powerful string-processing language which Hanson and others worked on at Arizona. The departure from sscanf won't be too severe, and it is simpler, easier to work with, and more powerful than regular expressions. The only down side is that the code is a little hard to follow without the book—but if you do much C programming, the book is well worth having.