General questions about scanf and fscanf in C programming language - c

If I'm not wrong, library function int fscanf(FILE *stream, const char *format, ...) works
exactly the same as function int scanf(const char *format, ...) except that it requires stream selection.
For example if I wanted to read two ints from standard input the code would look something like this.
int first_number;
int second_number;
scanf("%d%d", &first_number, &second_number);
There's no point of me adding newline character in between format specifiers even though the second number is entered in next line of input? Function just looks for next decimal integer right? What happens when I enter two characters instead of ints? Why the function sometimes doesn't work if there's a space between format specifiers?
In addition to that, when reading from file with fscanf(..), let's say the txt file contains the following lines:
P6
255
1920 1080
Do I need to specify next line characters in fscanf(..)? I read it like this.
FILE *input = ..
char type[2];
int tr;
int width; int height;
fscanf(input, "%s\n", &type);
fscanf(input, "%d\n" &tr);
fscanf(input, "%d %d\n", &width, &height)
Is there a need for \n to signal next line?
Can fscanf(..) anyhow affect any other functions for reading files like fread()? Or is it a good practice to just stick to one function through the whole file?

scanf(...) operates like fscanf(stdin, ....).
Unless '\n', ' ', or other white-space characters are inside a "%[...]", they all function the same as part of a format for *scanf(): scanning behaves identically whether ' ', '\t', or '\n' is used. (The same holds for '\v', '\r', '\f'.)
// All function the same.
fscanf(input, "%d\n", &tr);
fscanf(input, "%d ", &tr);
fscanf(input, "%d\t", &tr);
There's no point of me adding newline character in between format specifiers even though the second number is entered in next line of input?
All format specifiers except "%n", "%[...]", "%c" consume optional leading white-spaces. With "%d" there is no need, other than style, to code a leading white-space in the format.
Function just looks for next decimal integer right?
Simply: yes.
What happens when I enter two characters instead of ints?
Scanning stops. The first non-numeric input remains in stdin as well as any following input. The *scanf() return value reflects the incomplete scan.
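A minimal sketch of what "the return value reflects the incomplete scan" means in practice. It uses sscanf() on fixed strings so the outcome is reproducible; the helper name count_ints is made up for illustration.

```c
#include <stdio.h>

/* Hypothetical helper: tries to read two ints from text and returns
   whatever sscanf() reports: 2, 1, 0, or EOF. */
int count_ints(const char *text, int *first, int *second)
{
    return sscanf(text, "%d%d", first, second);
}
```

Here "12\n34" gives 2, "12 xy" gives 1, "xy" gives 0, and an empty string gives EOF. scanf() reports the same values for the same input, with the difference that the unconverted characters stay in stdin for the next read.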
Why the function sometimes doesn't work if there's a space between format specifiers?
Need example. Having spaces between specifiers is not an issue unless the following specifier is one of "%n", "%[...]", "%c".
When reading from file with fscanf(..), .... Do I need to specify next line characters in fscanf(..)?
No. fscanf() is not line orientated. Use fgets() to read lines. fscanf() is challenging to use to read a line. Something like
char buf[100];
int cnt = fscanf(f, "%99[^\n]", buf);
if (cnt == 0) {
buf[0] = 0;
}
if (cnt != EOF) {
cnt = fscanf(f, "%*1[\n]"); /* consume the one trailing '\n', if present */
}
I read it like this. ... fscanf(input, "%s\n", &type); fscanf(input, "%d\n" &tr); ....
"It", as in a line, is not read properly: "%s", "%d", and "\n" each consume 0, 1, 2, ... '\n' and other white-spaces. They do not read a line, nor just the one character written in the format.
Further, "\n" does not complete upon reading 1 '\n', but continues reading all white-spaces until a non-white-space is detected (or end-of-file). Do not append such to the end of a format to read the rest of the line.
To read the trailing '\n', code could use int cnt = fscanf(input, "%d%*1[\n]", &tr);, but code will not know if it succeeded in reading the trailing '\n' after the int. It will have simply read it if it was there. Could use other formats, but really, using fgets() to read a line is better.
Is there a need for \n to signal next line?
No, as a format "\n" reads 0 or more white-spaces, not just new-lines.
Can fscanf(..) anyhow affect any other functions for reading files like fread()?
Yes. All input functions affect what is available next for other input functions. Mixing fread() and fscanf() is challenging to get right.
is it a good practice to just stick to one function through the whole file?
It certainly is simpler. I recommend to use input functions as building blocks for a helper function to handle your file input.
Tip: Read lines with fgets(), then parse. Set fscanf() aside until you understand why it has so much trouble with unexpected input.
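As a sketch of that tip applied to the three header lines from the question (in the order shown there), each line can be read with fgets() and then parsed with sscanf(). The function name read_header, its parameters, and the 128-byte line buffer are assumptions made for this example only.

```c
#include <stdio.h>

/* Hypothetical helper: reads the three header lines shown in the
   question (type, max value, then width and height), one fgets() per
   line. type must have room for 3 bytes ("P6" plus the terminator).
   Returns 1 on success, 0 on any read or format error. */
int read_header(FILE *input, char *type, int *max_value,
                int *width, int *height)
{
    char line[128];

    if (fgets(line, sizeof line, input) == NULL ||
        sscanf(line, "%2s", type) != 1)
        return 0;
    if (fgets(line, sizeof line, input) == NULL ||
        sscanf(line, "%d", max_value) != 1)
        return 0;
    if (fgets(line, sizeof line, input) == NULL ||
        sscanf(line, "%d %d", width, height) != 2)
        return 0;
    return 1;
}
```

Because each fgets() stops right after a newline (assuming the lines fit in the buffer), the stream is left positioned on the first byte after the header, which makes a following fread() of the remaining data easier to reason about than after a format ending in "\n".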

The %d conversion specifier tells scanf and fscanf to skip over any leading whitespace, then read up to the first non-digit character, so you don’t need to put a newline between the two %d in the scanf call. Adding one would not change anything either: a newline in the format matches any amount of whitespace in the input (blanks included), not specifically a newline.
Most conversion specifiers skip over leading whitespace - the only ones that don’t are %c, %[ and %n, so you’ll want to be careful when using them.

Related

Parsing string with sscanf that has a string in it

Project is in C. I need to parse strings that are always formatted the following way: integer, whitespace, plus sign, multi-word string, plus sign, white space, integer, whitespace, integer, end-of-line
Example:
10 +This is 1 string+ 2 -1
I'm having a hard time figuring out what to enter in the formatting of sscanf so that the string surrounded by the '+' signs get parsed correctly, without including the + signs. Assuming sscanf can be used for this case.
I tried "%d +%s+ %d %d" and that didn't work.
You use %s but that reads up to the first white space character. You want to read a string of not-plus-signs, so say that's what sscanf() should do:
"%d +%[^+]+ %d %d"
That's a scan set — see POSIX sscanf(). You should also protect yourself from buffer overflow. If you have:
char buffer[256];
use:
"%d +%255[^+]+ %d %d"
Note the off-by-one in the lengths — this is a design feature of the scanf() family of functions. You could skip leading spaces by putting a space after the first + in the format string. It is not possible to skip trailing spaces before the second + in the data; you'll have to remove those separately.
You ask for 'end of line' after the 3rd number. That's fairly hard. You might use:
"%d +%255[^+]+ %d %d %n"
passing an extra pointer to int argument to hold the number of characters consumed so far. The blank before the %n skips white space, including newlines, so if you read into int nbytes; (passing &nbytes), then you'd check the character at that offset in the string you passed to sscanf(), for example if (line[nbytes] != '\0') { …handle trailing garbage… }, where line is a hypothetical name for that input string. Do this only after checking that you had four successful conversion specifications; %n conversion specifications are not counted in the return value from sscanf() et al. There are other solutions to that; they're all grubby to some extent.
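Putting the pieces of this answer together, a sketch might look like the following. The function name parse_record and its parameter names are invented for the example; the format string is the one given above.

```c
#include <stdio.h>

/* Hypothetical helper: parses "int +text+ int int" and rejects
   trailing garbage. text must have room for 256 bytes.
   Returns 1 on success, 0 otherwise. */
int parse_record(const char *line, int *a, char *text, int *b, int *c)
{
    int nbytes = -1;

    /* %n is not counted in the return value, so 4 means all four
       conversions succeeded. */
    if (sscanf(line, "%d +%255[^+]+ %d %d %n", a, text, b, c, &nbytes) != 4)
        return 0;
    /* nbytes was set only if scanning reached %n; anything left at
       that offset is trailing garbage. */
    return nbytes >= 0 && line[nbytes] == '\0';
}
```

With the sample line "10 +This is 1 string+ 2 -1" this stores 10, "This is 1 string", 2 and -1. A line with extra text after the last number, or one missing a plus sign, is rejected.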

Why is this creating two inputs instead of one

https://i.imgur.com/FLxF9sP.png
As shown in the link above I have to input '<' twice instead of once, why is that? Also it seems that the first input is ignored but the second '<' is the one the program recognizes.
The same thing occurs even without a loop too.
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
int main(){
int randomGen, upper, lower, end, newRandomGen;
char answer;
upper = 100;
lower = 1;
end = 1;
do {
srand(time(0));
randomGen = rand()%(upper + lower);
printf("%d\n", randomGen);
scanf("%s\n", &answer);
}while(answer != '=');
}
Whitespace in scanf format strings, like the \n in "%c\n", tries to match any amount of whitespace, and scanf doesn’t know that there’s no whitespace left to skip until it encounters something that isn’t whitespace (like the second character you type) or the end of input. You provide it with =\n, which fills in the %c and waits until the whitespace is over. Then you provide it with another = and scanf returns. The second time around, the character could be anything and it’d still work.
Skip leading whitespace instead (and use the correct specifier for one character, %c, as has been mentioned):
scanf(" %c", &answer);
Also, it’s good practice to make sure you actually succeeded in reading something, especially when failing to read something means leaving it uninitialized and trying to read it later (another example of undefined behaviour). So check scanf’s return value, which should match the number of conversion specifiers you provided:
if (scanf(" %c", &answer) != 1) {
return EXIT_FAILURE;
}
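The effect of the leading space can be checked without a terminal by scanning a fixed string that begins with a leftover newline. This is only an illustrative sketch; the helper name read_both is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: reads one character from text twice, once with
   "%c" and once with " %c". Returns 1 if both conversions succeeded. */
int read_both(const char *text, char *plain, char *skipped)
{
    return sscanf(text, "%c", plain) == 1 &&
           sscanf(text, " %c", skipped) == 1;
}
```

For the input "\n=", plain receives the newline while skipped receives '='. That is the difference between "%c" and " %c" when a previous line's newline is still waiting in the input.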
As has been commented, you should not use the scanf format %s if you want to read a single character. Indeed, you should never use a bare scanf format %s (one without a maximum width) for any purpose, because it will read an arbitrary number of characters into the buffer you supply, so you have no way to ensure that your buffer is large enough. So you should always supply a maximum character count. For example, %1s will read only one character. But note: that will still not work with a char variable, since it reads a string and in C, strings are arrays of char terminated with a NUL. (NUL is the character whose value is 0, also sometimes spelled \0. You could just write it as 0, but don't confuse that with the character '0', whose value is 48 in most modern systems.)
So a string containing a single character actually occupies two bytes: the character itself, and a NUL.
If you just want to read a single character, you could use the format %c. %c has a few differences from %s, and you need to be aware of all of them:
The default maximum length read by %s is "unlimited". The default for %c is 1, so %c is identical to %1c.
%s will put a NUL at the end of the characters read (which you need to leave space for), so the result is a C string. %c does not add the NUL, so you only need to leave enough space for the characters themselves.
%s skips whitespace before storing any characters. %c does not ignore whitespace. Note: a newline character (at the end of each line) is considered whitespace.
So, based on the first two rules, you could use either of the following:
char theShortString[2];
scanf("%1s", theShortString);
char theChar = theShortString[0];
or
char theChar;
scanf("%c", &theChar);
Now, when you used
scanf("%s", &theChar);
you will cause scanf to write a NUL (that is, a zero) in the byte following theChar, which quite possibly is part of a different variable. That's really bad. Don't do that. Ever. Even if you get away with it today, it will get you into serious trouble some time soon.
But that's not the problem here. The problem here is with what comes after the %s format code.
Let's take a minute (ok, maybe half an hour) to read the documentation of scanf, by typing man scanf. What we'll see, quite near the beginning, is: (emphasis added)
A directive is one of the following:
A sequence of white-space characters (space, tab, newline, etc.; see isspace(3)). This directive matches any amount of white space, including none, in the input.
So when you use "%s\n", scanf will do the following:
skip over any white-space characters in the input buffer.
read the following word up to but not including the next white-space character, and store it in the corresponding argument, followed by a NUL.
skip over any white-space following the word which it just read.
It does the last step because \n — a newline — is itself white-space, as noted in the quote from the manpage.
Now, what you actually typed was < followed by a newline, so the word read at step 2 will be just the character <. The newline you typed afterwards is white-space, so it will be ignored by step 3. But that doesn't satisfy step 3, because scanf (as documented) will ignore "any amount of white space". It doesn't know that there isn't more white space coming. You might, for example, be intending to type a blank line (that is, just a newline), in which case scanf must skip over that newline as well. So scanf keeps on reading.
Since the input buffer is now empty, the I/O library must now read the next line, which it does. And now you type another < followed by a newline. Clearly, the < is not white-space, so scanf leaves it in the input buffer and returns, knowing that it has done its duty.
Your program then checks the word read by scanf and realises that it is not an =. So it loops again, and the scanf executes again. Now there is already data in the input buffer (the second < which you typed), so scanf can immediately store that word. But it will again try to skip "any amount of white space" afterwards, which by the same logic as above will cause it to read a third line of input, which it leaves in the input buffer.
The end result is that you always need to type the next line before the previous line is passed back to your program. Obviously that's not what you want.
So what's the solution? Simple. Don't put a \n at the end of your format string.
Of course, you do want to skip that newline character. But you don't need to skip it until the next call to scanf. If you used a %1s format code, scanf would automatically skip white-space before returning input, but as we've seen above, %c is far simpler if you only want to read a single character. Since %c does not skip white-space before returning input, you need to insert an explicit directive to do so: a white-space character. It's usual to use an actual space rather than a newline for this purpose, so we would normally write this loop as:
char answer;
srand(time(0)); /* Only call srand once, at the beginning of the program */
do {
randomGen = rand()%(upper + lower); /* This is not right */
printf("%d\n", randomGen);
scanf(" %c", &answer);
} while (answer != '=');
scanf("%s\n", &answer);
Here you used the %s conversion specifier in the format string, which tells scanf to skip leading whitespace, read a run of non-whitespace characters into a pre-allocated array of chars, then append a null terminator to make it a C-string.
However, answer is a single char. Just writing the terminator is enough to go out of bounds, causing undefined behaviour and strange mishaps.
Instead, you should have used %c. This reads a single character into a char.

Difference between return values of scanf("%s", str) and scanf("%[^\n]s", str)

while(scanf("%s",a))
{
if(a[0]=='#')
break;
nextpermutation(a);
}
This while loop perfectly works for scanf("%s", a)
but if I put scanf("%[^\n]s", a) in the while it just runs for one time.
I checked the return value of both scanf and they were the same still.
I didn't get why this is happening...
The battle of two losers: "%s" vs. "%[^\n]s":
Both specifiers are bad and do not belong in robust code. Neither limits user input and can easily overrun the destination buffer.
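The s in "%[^\n]s" serves no purpose (@Jonathan Leffler), so only "%[^\n]" is reviewed below. It is still bad.
"%s" consumes leading white-space. "%[^\n]" does not (@Igor Tandetnik).
"%[^\n]" reads nothing if the leading character is a '\n', leaving a unchanged or, rarely, in an unknown state.
"%[^\n]" reads all characters up to, but not including, the next '\n'. That '\n' stays in the input, so the next "%[^\n]" call converts nothing and returns 0, which is what ends a while(scanf(...)) loop after one pass.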
"%s" reads all leading white-space, discards them, then reads and saves all non-white-spaces.
Better to use fgets() to read a line of user input and then parse it.
Example: to read a line of input, including spaces:
char a[100];
if (fgets(a, sizeof a, stdin) == NULL) Handle_EOF_or_Error();
// if the potential trailing \n is not wanted
a[strcspn(a, "\n")] = '\0';
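The difference in return values on the second call can be reproduced with sscanf() on a fixed two-line string. This is a sketch under assumptions: the helper names and the FIRST_CALL_FAILED marker are invented here, and %n is used only to find where the first call stopped.

```c
#include <stdio.h>

enum { FIRST_CALL_FAILED = -100 };  /* hypothetical marker value */

/* Returns what a second "%99[^\n]" reports after a first one ran. */
int second_scanset(const char *text)
{
    char a[100];
    int used = 0;

    if (sscanf(text, "%99[^\n]%n", a, &used) != 1)
        return FIRST_CALL_FAILED;
    /* The next unread character is the '\n' the first call left. */
    return sscanf(text + used, "%99[^\n]", a);
}

/* Returns what a second "%99s" reports after a first one ran. */
int second_word(const char *text)
{
    char a[100];
    int used = 0;

    if (sscanf(text, "%99s%n", a, &used) != 1)
        return FIRST_CALL_FAILED;
    /* %s skips the leftover '\n' before reading the next word. */
    return sscanf(text + used, "%99s", a);
}
```

For "one\ntwo\n", second_scanset() gives 0 and second_word() gives 1. Note also that scanf() returns EOF, which is non-zero, at end of input, so a loop condition such as scanf(...) == 1 is safer than testing the result for truth.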
What is that s doing there in %[^\n]s?
Your scanf("%[^\n]s") contains two directives in its format string: the conversion specifier %[^\n] followed by a lone literal s. This will require the input stream to contain s after the sequence matched by %[^\n]. This, of course, is not very close in behavior to plain %s. Matching sequences for these formats will be very different for that reason alone.
Note that [] is not a modifier for %s format, as you seem to incorrectly believe. [] in scanf is a self-sufficient format specifier. Did you mean just %[^\n] by any chance instead of %[^\n]s?

Read space on fscanf

I have a field that allows whitespace in my text file that is 'citepage'. Is it possible for fscanf to read the field with blanks between tabs and then show it in printf? The citepage is getting data of timestamp.
Example .txt:
1[tab]AAAI[tab]Low-cost Outdoor Robot Platform for the Penn State Abington Mini Grand Challenge[tab]2005[tab]Robert Avanzato[tab]1[tab][space][tab]2013-03-07 16:49:1
My current code:
while (!feof(fp)){
fscanf(fp,"%d\t %19[^\t]\t %300[^\t]\t %d\t %100[^\t]\t %d\t %s\t %19[^\t]\n ",&artigos.id,artigos.sigla,artigos.titulo,&artigos.ano,artigos.autores,&artigos.citacoes,artigos.citepage,artigos.timestamp);
printf("\nid: %d ",artigos.id);
printf("\nsigla: %s ",artigos.sigla);
printf("\ntitulo: %s ",artigos.titulo);
printf("\nano: %d ",artigos.ano);
printf("\nautores: %s ",artigos.autores);
printf("\ncitacoes: %d ",artigos.citacoes);
printf("\ncitepage: %s ",artigos.citepage);
printf("\ntimestamp: %s ",artigos.timestamp);
}
fscanf is not good for separating things based on tabs or newlines as opposed to spaces, because it treats all whitespace as the same -- something to be skipped and ignored. Whenever you have a whitespace character in your format string (doesn't matter if it's a space or tab or newline; they all do the same thing), fscanf will read and throw away whitespace until it finds a non-whitespace character. So in your case, when it gets to the \t after the %d that read the citacoes, it will skip the following \t \t in the input, and the next character to be read will be 2, so that's where it will start reading for citepage.
Now you can use %*1[\t] in the format string to skip a single tab character (rather than all whitespace), but doing so is messy and error prone. It also gets easily confused by incorrect input, making it almost impossible to give the user proper diagnostics about malformed input. But if you want to do that, replace all the tabs in the format string with %*1[\t] and remove all the spaces and it should work.
A much better choice would be to read the entire line into a buffer (with fgets) and then use strsep to split it up on the tab characters.
Also you should never use feof -- it doesn't return true until after you've tried unsuccessfully to read past the end of the file. Always check the return value of the fscanf or fgets call instead.
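A sketch of the loop shape this advice leads to, testing the read call itself rather than feof(). The function name count_reads and the 512-byte buffer are assumptions for illustration.

```c
#include <stdio.h>

/* Hypothetical helper: counts successful fgets() calls. The loop ends
   on the first failed read, so no stale buffer is ever processed. */
size_t count_reads(FILE *fp)
{
    char buf[512];
    size_t n = 0;

    while (fgets(buf, sizeof buf, fp) != NULL)
        n++;
    return n;
}
```

For a file containing "a\nb\n" this counts 2. A while (!feof(fp)) loop around the same fgets() call would run its body a third time, because feof() only becomes true after that third read has already failed.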
The '\t' and ' ' directives do the same thing: skip any white-space. To use a tab as a separator in scanf(), one needs to use "%*1[\t]". strtok() or a simple loop may be easier. See @Chris Dodd
Suggest fgets()/sscanf() for better error handling especially for this complex scan.
Further, the format specifier for artigos.citepage needs adjustment.
char buf[512]; // must be large enough for the widest fields below plus separators
if (fgets(buf, sizeof buf, fp) == NULL) Handle_EOFIOError();
// scan, but do not save 1 `\t`
#define TF "%*1[\t]"
if (8 == sscanf(buf,
"%d" TF "%19[^\t]" TF "%300[^\t]" TF "%d" TF
"%100[^\t]" TF "%d" TF "%19[^\t]" TF "%19[^\t\n]",
&artigos.id, artigos.sigla, artigos.titulo, &artigos.ano,
artigos.autores, &artigos.citacoes, artigos.citepage, artigos.timestamp)) {
Success();
}
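The "simple loop" alternative mentioned above could be sketched as follows, using only standard C (strsep() is not part of the C standard, and strtok() merges adjacent delimiters, which would drop empty fields). The function name split_tabs and its parameters are assumptions for this example.

```c
#include <string.h>

/* Hypothetical helper: splits line in place on tab characters and
   stores up to max_fields pointers in fields. Empty fields are kept.
   A trailing newline is removed first. Returns the number of fields
   stored. */
size_t split_tabs(char *line, char *fields[], size_t max_fields)
{
    size_t count = 0;

    line[strcspn(line, "\n")] = '\0';
    while (count < max_fields) {
        fields[count++] = line;
        line += strcspn(line, "\t");   /* advance to next tab or end */
        if (*line == '\0')
            break;
        *line++ = '\0';                /* terminate field, step past tab */
    }
    return count;
}
```

After splitting, numeric fields can be converted with strtol() or sscanf(), and string fields copied with a length check. A field that holds only a space, like citepage in the sample, comes through unchanged.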

Reading text with sscanf and fgets

So my text file looks similar to this
1. First 1.1
2. Second 2.2
Essentially an integer, string and then a float.
Using sscanf() and fgets(), in theory I should be able to scan this in (I have to do it in this format), but I only get the integer. Can someone help point out what I am doing wrong?
while(!feof(foo))
{
fgets(name, sizeof(name) - 1, foo);
sscanf(name,"%d%c%f", &intarray[i], &chararray[i], &floatarray[i]);
i++;
}
Where intarray, chararray, and floatarray are 1D arrays and i is an int initialized to 0.
The structure of the loop is wrong; you should not use feof() like that and you must always check the status of both fgets() and sscanf(). This code avoids overflowing the input arrays, too.
enum { MAX_ENTRIES = 10 };
int i;
int intarray[MAX_ENTRIES];
float floatarray[MAX_ENTRIES];
char chararray[MAX_ENTRIES][50];
for (i = 0; i < MAX_ENTRIES && fgets(name, sizeof(name), foo) != 0; i++)
{
if (sscanf(name,"%d. %49s %f", &intarray[i], chararray[i], &floatarray[i]) != 3)
...process format error...
}
Note the major changes:
The dot after the integer must be scanned by the format string.
The chararray has to be a 2D array to make any sense. If you read a single character with %c, it would contain the space after the first number, and the subsequent conversion specification (for the float value) would fail because the text that follows (the name) is not a floating point value.
The & in front of chararray[i] is not wanted when it is a 2D array. It would be needed if you were really reading a single character in a 1D array of characters instead of the whole string such as 'First' or 'Second' from the sample data.
The test checks that three values were converted successfully. Any smaller value indicates problems. With sscanf(), you'd only get EOF returned if there was nothing in the string for the first conversion specification to work on (empty string, all white space); you'd get 0 returned if the first non-blank was alphabetic or a punctuation character other than + or -, etc.
If you really want a single character instead of the name, then you'll have to arrange to read the extra characters in the word, maybe using:
if (sscanf(name,"%d. %c%*s %f", &intarray[i], &chararray[i], &floatarray[i]) != 3)
There's a space before the %c which is crucial; it will skip white space in the input, and then the %c will pick up the first non-blank character. The %*s will read more characters, skipping any white space (there won't be any) and then scanning a string of characters up to the next white space. The * suppresses an assignment; the scanned data won't be stored anywhere.
One of the major advantages of the fgets() plus sscanf() paradigm is that when you report the format error, you can report to the user the complete line of input that caused problems. If you use raw fscanf() or scanf(), you can only report on the first character that caused trouble, typically up to the end of the line, and then only if you write code to read that data. It is fiddlier (so the reporting is usually not very careful), and the available information is not as helpful to the user on those rare occasions when the reporting tries to be careful.
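A sketch of that reporting advantage, for a single line in the format from the question. The function name parse_entry, its parameters, and the wording of the message are assumptions made for this example.

```c
#include <stdio.h>

/* Hypothetical helper: parses one line such as "1. First 1.1".
   word must have room for 50 bytes. Returns 1 on success; on failure
   it prints the whole offending line to stderr and returns 0. */
int parse_entry(const char *line, int *number, char *word, float *value)
{
    if (sscanf(line, "%d. %49s %f", number, word, value) != 3) {
        fprintf(stderr, "bad input line: %s", line);
        return 0;
    }
    return 1;
}
```

Called from a loop such as while (fgets(name, sizeof name, foo) != NULL), this lets the program skip or abort on a malformed line while showing the user exactly which line was rejected.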
You need to change your format string to:
"%d. %s %f"
The dot matches the literal dot after the integer in your input data, the spaces are because you have spaces in your input data, and the %s is because you want to read a multi-character string at that point (%c only reads one character); don't worry though, as %s won't read past a space. You'll need to make sure you've got enough space in the target buffer to read the string, of course.
If you only want the first character of the second word, try:
"%d. %c%s %f"
And add an extra (dummy) buffer to receive the string parsed by %s which you want to discard.
Shouldn't it be %s for a string? Otherwise %c will only read a single character, and then the float value might be affected.
Try "%d. %s %f"
%c reads only a single character, so the rest of the word is left in the input and the following %f conversion fails on it. %s does work here, because it stops at the next white space and so cannot swallow the float value. To scan the word by hand instead, you can use a loop (terminated by a space, of course).

Resources