Difference between fgets and fscanf? - c

I have a question concerning fgets and fscanf in C. What exactly is the difference between these two? For example:
char str[10];
while(fgets(str,10,ptr))
{
counter++;
...
and the second example:
char str[10];
while(fscanf(ptr,"%s",str))
{
counter++;
...
when having a text file which contains strings which are separated by an empty space, for example: AB1234 AC5423 AS1433. In the first example the "counter" in the while loop will not give the same output as in the second example. When changing the "10" in the fgets function the counter will always give different results. What is the reason for this?
Can somebody please also explain what the fscanf exactly does, how long is the string in each while loop?

The function fgets reads until a newline (and also stores it). fscanf with the %s specifier reads until any whitespace character and doesn't store it...
As a side note, you're not specifying the size of the buffer in fscanf, which is unsafe. Try:
fscanf(ptr, "%9s", str)

fgets reads to a newline. fscanf only reads up to whitespace.

In your example, fgets will read up to a maximum of 9 characters from the input stream and save them to str, along with a 0 terminator. It will not skip leading whitespace. It will stop if it sees a newline (which will be saved to str) or EOF before the maximum number of characters.
fscanf with the %s conversion specifier will skip any leading whitespace, then read all non-whitespace characters, saving them to str followed by a 0 terminator. It will stop reading at the next whitespace character or EOF. Without an explicit field width, it will read as many non-whitespace characters as are in the stream, potentially overrunning the target buffer.
So, imagine the input stream looks like this: "\t abcdef\n<EOF>". If you used fgets to read it, str would contain "\t abcdef\n\0". If you used fscanf, str would contain "abcdef\0" (where \0 indicates the 0 terminator).

fgets reads a whole line. fscanf with %s reads a single word, delimited by whitespace (space, \n, \t, etc.).
Either way, you should not use them unless you are sure that the array you read into is big enough to hold the input.
You wrote: When changing the "10" in the fgets function the counter will always give different results. Note that fgets and fscanf don't know how many bytes they may store; you have to tell them. The "10" tells fgets that str holds 10 bytes, so each call stores at most 9 characters plus the terminator. A line longer than that is delivered in several pieces, one per call, which is why the counter changes when you change the size.

Related

Why is this creating two inputs instead of one

https://i.imgur.com/FLxF9sP.png
As shown in the link above I have to input '<' twice instead of once, why is that? Also it seems that the first input is ignored but the second '<' is the one the program recognizes.
The same thing occurs even without a loop too.
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
int main(){
int randomGen, upper, lower, end, newRandomGen;
char answer;
upper = 100;
lower = 1;
end = 1;
do {
srand(time(0));
randomGen = rand()%(upper + lower);
printf("%d\n", randomGen);
scanf("%s\n", &answer);
}while(answer != '=');
}
Whitespace in scanf format strings, like the \n in "%c\n", tries to match any amount of whitespace, and scanf doesn’t know that there’s no whitespace left to skip until it encounters something that isn’t whitespace (like the second character you type) or the end of input. You provide it with =\n, which fills in the %c and waits until the whitespace is over. Then you provide it with another = and scanf returns. The second time around, the character could be anything and it’d still work.
Skip leading whitespace instead (and use the correct specifier for one character, %c, as has been mentioned):
scanf(" %c", &answer);
Also, it’s good practice to make sure you actually succeeded in reading something, especially when failing to read something means leaving it uninitialized and trying to read it later (another example of undefined behaviour). So check scanf’s return value, which should match the number of conversion specifiers you provided:
if (scanf(" %c", &answer) != 1) {
return EXIT_FAILURE;
}
As has been commented, you should not use the scanf format %s if you want to read a single character. Indeed, you should never use a bare %s for any purpose, because it will read an arbitrary number of characters into the buffer you supply, so you have no way to ensure that your buffer is large enough. So you should always supply a maximum character count. For example, %1s will read only one character. But note: that will still not work with a char variable, since it reads a string and in C, strings are arrays of char terminated with a NUL. (NUL is the character whose value is 0, also sometimes spelled \0. You could just write it as 0, but don't confuse that with the character '0', whose value is 48 on most modern systems.)
So a string containing a single character actually occupies two bytes: the character itself, and a NUL.
If you just want to read a single character, you could use the format %c. %c has a few differences from %s, and you need to be aware of all of them:
The default maximum length read by %s is "unlimited". The default for %c is 1, so %c is identical to %1c.
%s will put a NUL at the end of the characters read (which you need to leave space for), so the result is a C string. %c does not add the NUL, so you only need to leave enough space for the characters themselves.
%s skips whitespace before storing any characters. %c does not ignore whitespace. Note: a newline character (at the end of each line) is considered whitespace.
So, based on the first two rules, you could use either of the following:
char theShortString[2];
scanf("%1s", theShortString);
char theChar = theShortString[0];
or
char theChar;
scanf("%c", &theChar);
Now, when you used
scanf("%s", &theChar);
you will cause scanf to write a NUL (that is, a zero) in the byte following theChar, which quite possibly is part of a different variable. That's really bad. Don't do that. Ever. Even if you get away with it today, it will get you into serious trouble some time soon.
But that's not the problem here. The problem here is with what comes after the %s format code.
Let's take a minute (ok, maybe half an hour) to read the documentation of scanf, by typing man scanf. What we'll see, quite near the beginning, is: (emphasis added)
A directive is one of the following:
A sequence of white-space characters (space, tab, newline, etc.; see isspace(3)). This directive matches any amount of white space, including none, in the input.
So when you use "%s\n", scanf will do the following:
skip over any white-space characters in the input buffer.
read the following word up to but not including the next white-space character, and store it in the corresponding argument, followed by a NUL.
skip over any white-space following the word which it just read.
It does the last step because \n — a newline — is itself white-space, as noted in the quote from the manpage.
Now, what you actually typed was < followed by a newline, so the word read at step 2 will be just the character <. The newline you typed afterwards is white-space, so it will be ignored by step 3. But that doesn't satisfy step 3, because scanf (as documented) will ignore "any amount of white space". It doesn't know that there isn't more white space coming. You might, for example, be intending to type a blank line (that is, just a newline), in which case scanf must skip over that newline as well. So scanf keeps on reading.
Since the input buffer is now empty, the I/O library must now read the next line, which it does. And now you type another < followed by a newline. Clearly, the < is not white-space, so scanf leaves it in the input buffer and returns, knowing that it has done its duty.
Your program then checks the word read by scanf and realises that it is not an =. So it loops again, and the scanf executes again. Now there is already data in the input buffer (the second < which you typed), so scanf can immediately store that word. But it will again try to skip "any amount of white space" afterwards, which by the same logic as above will cause it to read a third line of input, which it leaves in the input buffer.
The end result is that you always need to type the next line before the previous line is passed back to your program. Obviously that's not what you want.
So what's the solution? Simple. Don't put a \n at the end of your format string.
Of course, you do want to skip that newline character. But you don't need to skip it until the next call to scanf. If you used a %1s format code, scanf would automatically skip white-space before returning input, but as we've seen above, %c is far simpler if you only want to read a single character. Since %c does not skip white-space before returning input, you need to insert an explicit directive to do so: a white-space character. It's usual to use an actual space rather than a newline for this purpose, so we would normally write this loop as:
char answer;
srand(time(0)); /* Only call srand once, at the beginning of the program */
do {
randomGen = rand()%(upper + lower); /* This is not right */
printf("%d\n", randomGen);
scanf(" %c", &answer);
} while (answer != '=');
scanf("%s\n", &answer);
Here you used the %s conversion specifier in the format string, which tells scanf to read as many non-whitespace characters as it can into a pre-allocated array of chars and then append a null terminator to make it a C string.
However, answer is a single char. Just writing the terminator is enough to go out of bounds, causing undefined behaviour and strange mishaps.
Instead, you should have used %c. This reads a single character into a char.

What is the difference between these two scanf statements?

I have a question about scanf.
What is the difference between the following two scanf statements?
scanf("%s",buf);
scanf("%[^\n]", buf);
If I put the second scanf in a while loop, it loops forever, because the \n stays in stdin.
The first statement also reads only up to the \n and does not consume it.
Yet the first statement does not loop forever. Why?
Regarding the properties of the %s format specifier, quoting the C11 standard, chapter §7.21.6.2, fscanf():
s Matches a sequence of non-white-space characters.
The newline is a whitespace character, so a lone newline won't be a match for %s.
So, if a newline is left in the buffer, %s does not scan the newline alone; it skips it and waits for the next non-whitespace input to appear on stdin.
The %s format specifier tells scanf() to skip leading whitespace, then read characters from stdin until it encounters the next whitespace character, and stop there. That whitespace ('\n') remains in the stdin buffer until it is consumed by a later read, for example getchar() or the whitespace skipping of the next %s.
%[^\n], by contrast, does not skip leading whitespace, so a '\n' left in the buffer makes it fail immediately, and it fails the same way on every later call.
You can think of scanf as extracting words separated by whitespace from a stream of characters. Imagine reading a file which contains a table of numbers, for example, without worrying about the exact number count per line or the exact space count and nature between numbers.
Whitespace, for the record, is horizontal and vertical (these exist) tabs, carriage returns, newlines, form feeds and, last but not least, actual spaces.
In order to free the user from details, scanf treats all whitespace the same: It normally skips it until it hits a non-whitespace and then tries to convert the character sequence starting there according to the specified input conversion. E.g. with "%d" it expects a sequence of digits, perhaps preceded by a minus sign.
The input conversion "%s" also starts with skipping whitespace (and that is documented more clearly in the Open Group's man page than in the Linux one).
After skipping leading whitespace, "%s" accepts everything until another whitespace is read (and put back in the input, because it isn't made part of the "word" being read). That sequence of non-whitespace chars -- basically a "word" -- is stored in the buffer provided. For example, scanning a string from " a bc " results in skipping 3 spaces and storing "a" in the buffer. (The next scanf would skip the intervening space and put "bc" in the buffer. The next scanf after that would skip the remaining whitespace, encounter the end of file and return EOF.) So if a user is asked to enter three words they could give three words on one line or on three lines or on any number of lines preceded or separated by any number of empty lines, i.e. any number of subsequent newlines. Scanf couldn't care less.
There are a few exceptions to the "skip leading whitespace" strategy. Both concern conversions which usually indicate that the user wants to have more control over the input conversion. One of them is "%c", which just reads the next character. The other one is the "%[" spec, which details exactly which characters are considered part of the next "word" to read. The conversion specification you use, "%[^\n]", reads everything except newline. Input from the keyboard is normally passed to a program line by line, and each line is by definition terminated by a newline. The newline of the first line passed to your program will be the first character from the input stream which does not match the conversion specification. Scanf will read it, inspect it and then put it back in the input stream (with ungetc()) for somebody else to consume. Unfortunately, it will itself be the next consumer, in another loop iteration (I assume). Now the very first character it encounters (the newline) does not match the input conversion (which demands anything but the newline). Scanf therefore gives up immediately, puts the offending character dutifully back in the input for somebody else to consume and returns 0, indicating the failure to perform even the very first conversion in the format string. But alas, it itself will be the next consumer. Yes, machines are stupid.
The first, scanf("%s",buf);, reads only a single word, while the second, scanf("%[^\n]", buf);, reads everything up to (but not including) the newline character.
Let's take a look at these two code snippets :
#include <stdio.h>
int main(void){
char sentence[20] = {'\0'};
scanf("%s", sentence);
printf("\n%s\n", sentence);
return 0;
}
Input : Hello, my name is Claudio.
Output : Hello
#include <stdio.h>
int main(void){
char sentence[20] = {'\0'};
scanf("%[^\n]", sentence);
printf("\n%s\n", sentence);
return 0;
}
Input : Hello, my name is Claudio.
Output : Hello, my name is Claudio.
%[^\n] is a negated scanset, and this is what I personally use, as it allows me to input a sentence with blank spaces in it.
Common
Both expect buf to be a pointer to a character array. Both append a null character to that array if at least 1 character was saved. Both return 1 if something was saved. Both return EOF if end-of-file is detected before saving anything. Both return EOF if an input error is detected. Both may save buf with embedded '\0' characters in it.
scanf("%s",buf);
scanf("%[^\n]", buf);
Differences
"%s" 1) consumes and discards leading white-space including '\n', space, tab, etc. 2) then saves non-white-space to buf until 3) a white-space is detected (which is then put back into stdin). buf will not contain any white-space.
"%[^\n]" 1) does not consume or discard leading white-space. 2) it saves non-'\n' characters to buf until 3) a '\n' is detected (which is then put back into stdin). If the first character read is a '\n', then nothing is saved in buf and 0 is returned. The '\n' remains in stdin and explains OP's infinite loop.
Failure to test the return value of scanf() is a common code oversight. Better code checks the return value of scanf().
IMO: code should never use either:
Both fail to limit the number of characters read. Use fgets().
You can think of %s as %[^\n \t\f\r\v], that is, after skipping any leading whitespace, a group of non-whitespace characters.

Read File: fscanf doesn't read whitespaces?

I have a problem fetching lines from a file pointer using fscanf.
Let's say I want to fetch a line like this:
<123324><sport><DESCfddR><spor ds>
fscanf fetches only this part:
<123324><sport><DESCfddR><spor
Does anybody know how to overcome this problem?
Thanks in advance.
In conclusion, the best way to read lines which contain whitespace is to use fgets:
fgets (currentLine, MAX_LENGTH , filePointer);
With fscanf you are going to run into a lot of problems.
You are probably using %s in the fscanf to read data. From the C11 standard,
7.21.6.2 The fscanf function
[...]
The conversion specifiers and their meanings are:
[...]
s Matches a sequence of non-white-space characters.
[...]
So, %s stops scanning when it encounters a whitespace character or, if a field width is given, when that many characters have been read, whichever occurs first.
How to fix this problem? Use a different format specifier:
fscanf(fp ," %[^\n]", buffer);
The leading space in the format above skips all whitespace characters, if any, up to the first non-whitespace character; then %[^\n] scans everything up to, but not including, the next \n character.
You can further improve security by using
fscanf(fp ," %M[^\n]", buffer);
Replace M with the size of buffer minus one (one byte is reserved for the NUL terminator). M is the maximum field width. Also, checking the return value of fscanf is a good idea.
Using fgets() is a better way though.

What is the behavior of %(limit)[^\n] in scanf? Is it safe from overflow?

Is the format %(limit)[^\n] for the scanf function unsafe? (where (limit) is the length of the string minus 1)
If it is unsafe, why?
And is there a safe way to implement a function that reads strings using just scanf()?
In the Linux Programmer's Manual (typing man scanf in a terminal), the s format says:
Matches a sequence of non-white-space characters; the next pointer must be a pointer to character array that is long enough to hold the input sequence and the terminating null byte ('\0'), which is added automatically. The input string stops at white space or at the maximum field width, whichever occurs first.
Does the input string always stop at the maximum field width? Or is that just on GCC?
Thanks.
%(limit)[^\n] for scanf is usually safe.
In the below example, at most 99 char will be read and saved into buf. If any char are saved, a '\0' will be appended and cnt will be 1.
char buf[100];
int cnt = scanf("%99[^\n]", buf);
This functionality is certainly safe, but what about others?
Problems occur when the input is a lone "\n".
In this case, nothing is saved in buf and 0 is returned. Had the next line of code been the following, the output is Undefined Behavior as buf is not initialized to anything.
puts(buf);
A better following line would be
if (cnt == 1) puts(buf);
else printf("Return count = %d\n", cnt);
Problems because the '\n' was not consumed.
The '\n' is still waiting to be read and another call to scanf("%99[^\n]", buf); will not read the '\n'.
Q: Is there a safe way to implement a function that reads strings using just scanf()?
A: Pedantically: Not easily.
scanf(), fgets(), etc. are best used for reading text, not strings. In C a string is an array of char terminated with a '\0'. Input via scanf(), fgets(), etc. typically have issues reading '\0' and typically that char is not in the input anyways. Usually input is thought of as groups of char terminated by '\n' or other white-space.
If code is reading input terminated with '\n', using fgets() works well and is portable. fgets() too has its weaknesses, which are handled in various ways. getline() (POSIX) is a nice alternative.
A close approximation would be scanf(" %99[^\n]", buf) (note the added " "), but alone that does not solve handling excessively long lines, reading multiple empty lines, embedded '\0' detection, the loss of ability to report the length read (strlen() does not work due to embedded '\0') and its leaving the trailing '\n' in stdin.
Short of using scanf("%c", &ch) with lots of surrounding code (which is silly, just use fgetc()), I see no way to use a single scanf() absolutely safely when reading a line of user input.
Q: Does the input string always stop at the maximum field width?
A: With scanf("%99[^\n]", buf), input stops when 1) a '\n' is encountered (the '\n' is not saved and remains in the file input buffer), 2) 99 char have been read, 3) EOF occurs or 4) an I/O error occurs (rare).
The [^\n] is to make scanf read input until it meets a new line character...while the limit is the maximum number of characters scanf should read...

how to read scanf with spaces

I'm having a weird problem.
I'm trying to read a string from the console with scanf()
like this:
scanf("%[^\n]",string1);
but it doesn't read anything. It just skips the entire scanf.
I'm trying it with the gcc compiler.
Trying to use scanf to read strings with spaces can bring unnecessary problems of buffer overflow and stray newlines staying in the input buffer to be read later. gets() is often suggested as a solution to this, however,
From the manpage:
Never use gets(). Because it is impossible to tell without knowing the data in advance how many characters gets() will read, and because gets() will continue to store characters past the end of the buffer, it is extremely dangerous to use. It has been used to break computer security. Use fgets() instead.
So instead of using gets, use fgets with the stdin stream to read strings from the keyboard.
That should work fine, so something else is going wrong. As hobbs suggests, you might have a newline on the input, in which case this won't match anything. It also won't consume a newline, so if you do this in a loop, the first call will get up to the newline and then the next call will get nothing. If you want to read the newline, you need another call, or use a space in the format string to skip whitespace. It's also a good idea to check the return value of scanf to see if it actually matched any format specifiers.
Also, you probably want to specify a maximum length in order to avoid overflowing the buffer. So you want something like:
char buffer[100];
if (scanf(" %99[^\n]", buffer) == 1) {
    /* read something into buffer */
}
This will skip (ignore) any blank lines and whitespace on the beginning of a line and read up to 99 characters of input up to and not including a newline. Trailing or embedded whitespace will not be skipped, only leading whitespace.
I'll bet your scanf call is inside a loop. I'll bet it works the first time you call it. I'll bet it only fails on the second and later times.
The first time, it will read until it reaches a newline character. The newline character will remain unread. (Odds are that the library internally does read it and calls ungetc to unread it, but that doesn't matter, because from your program's point of view the newline is unread.)
The second time, it will read until it reaches a newline character. That newline character is still waiting at the front of the line and scanf will read all 0 of the characters that are waiting ahead of it.
The third time ... the same.
You probably want this:
if (scanf("%99[^\n]%*c", buffer) == 1) {
Edit: I accidentally copied and pasted from another answer instead of from the question, before inserting the %*c as intended. The resulting line of code will behave strangely if you have a line of input longer than 99 bytes, because the %*c will eat an ordinary byte instead of the newline.
However, notice how dangerous it would be to do this:
scanf("%[^\n]%*c", string1);
because there, if you have a line of input longer than your buffer, the input will walk all over your other variables and stack and everything. This is called buffer overflow (even if the overflow goes onto the stack).
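#include <stdio.h>
#include <stdlib.h>
#include <string.h>
char *text(int n);
int main(void)
{
    char str[10];
    char *fmt = text(9);              /* builds "%9[^\n]" for a 10-byte buffer */
    if (fmt == NULL) return EXIT_FAILURE;
    printf("enter username : ");
    fflush(stdout);
    if (scanf(fmt, str) != 1) str[0] = '\0';   /* empty line or EOF */
    printf("username = %s\n", str);
    free(fmt);
    return 0;
}
/* Returns a newly allocated format string; the caller must free() it. */
char *text(int n)
{
    char buf[50];
    // n == -1 no buffer protection
    if (n != -1) snprintf(buf, sizeof buf, "%%%d[^\n]", n);
    else strcpy(buf, "%[^\n]");
    char *s = malloc(strlen(buf) + 1);
    if (s != NULL) strcpy(s, buf);
    return s;
}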

Resources