I have a problem fetching lines from File Pointer using fscanf.
Let's say I want to fetch a line like this:
<123324><sport><DESCfddR><spor ds>
fscanf fetches only this part:
<123324><sport><DESCfddR><spor
Does anybody know how to overcome this problem?
Thanks in advance.
In conclusion, the best way to read lines that contain whitespace is to use fgets:
fgets (currentLine, MAX_LENGTH , filePointer);
With fscanf you are going to run into a lot of problems.
You are probably using %s in the fscanf to read data. From the C11 standard,
7.21.6.2 The fscanf function
[...]
The conversion specifiers and their meanings are:
[...]
s Matches a sequence of non-white-space characters.
[...]
So, %s stops scanning when it encounters a whitespace character or, if a maximum field width is given, when that many characters have been read, whichever occurs first.
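To make the stopping rule concrete, here is a minimal sketch that uses sscanf on an in-memory string. The helper name first_token and the width of 5 are made up for illustration.

```c
#include <stdio.h>

/* Hypothetical helper: copies the first token of input into out.
   "%5s" skips leading whitespace, then stops at the next whitespace
   character or after 5 characters, whichever comes first.
   Returns 1 if a token was stored, 0 otherwise. */
int first_token(const char *input, char out[6])
{
    return sscanf(input, "%5s", out) == 1;
}
```

With the line from the question the width is hit first and out holds "<1233"; with "ab cd" the space is hit first and out holds "ab".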
How to fix this problem? Use a different format specifier:
fscanf(fp ," %[^\n]", buffer);
The above fscanf skips all whitespace characters, if any, until the first non-whitespace character(space at the start) and then, %[^\n] scans everything until a \n character.
You can further improve security by using
fscanf(fp ," %M[^\n]", buffer);
Replace M with the size of buffer minus one (one byte is reserved for the NUL terminator). M is the maximum field width. Checking the return value of fscanf is also a good idea.
Using fgets() is a better way though.
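If you do stay with fscanf, the two suggestions above (a field width and a return-value check) might be combined roughly like this. The function name and the 100-byte buffer are assumptions for the sketch, not something from the question.

```c
#include <stdio.h>

/* Hypothetical helper: reads the next non-blank line from fp into buf,
   which must have room for at least 100 bytes.
   The leading space skips whitespace (including a leftover '\n'),
   and 99 is the buffer size minus one for the NUL terminator.
   Returns 1 if a line was stored, 0 on end of file or error. */
int read_line_scanset(FILE *fp, char buf[100])
{
    return fscanf(fp, " %99[^\n]", buf) == 1;
}
```

Lines longer than 99 characters are not rejected; they are returned in 99-character pieces over several calls, which is one more reason fgets is usually easier to reason about.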
I've been trying to scanf multiple consecutive strings.
I know you have to eliminate the newline character and I've also been told that "%[^\n]%*c" is the RIGHT way.
But in my tests, " %[^\n]" works even better because it's simpler and also doesn't go wrong if I try to feed it a newline directly; it keeps waiting for a string.
So far so good.
Is there any case in which "%[^\n]%*c" is the better way?
Thanks a lot!
The format string " %[^\n]" skips leading whitespace, including the newline character '\n' left in the input buffer by a previous call to scanf.
However, if you call fgets after a call to scanf with such a format string, fgets will read a line consisting of only the newline character '\n', because that character is still in the input buffer.
After a call to scanf with the format string "%[^\n]%*c" you may call fgets, because the newline character will have been consumed.
Note that the format strings "%[^\n]%*c" and " %[^\n]%*c" have different effects: the first does not skip leading whitespace, whereas the second does.
To make a call to scanf safer, specify a maximum field width, for example
char s[100];
scanf( " %99[^\n]", s );
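The difference between the two formats when fgets follows can be checked with a small sketch like the one below. The helper name and its parameters are hypothetical; it simply runs one fscanf with the given format and then one fgets.

```c
#include <stdio.h>

/* Hypothetical helper: reads one line with the given scanf format
   (which must contain exactly one assigning conversion with a width
   of at most 99), then calls fgets and stores what it got in rest.
   Returns 1 if both reads succeeded, 0 otherwise. */
int scan_then_fgets(FILE *fp, const char *fmt, char *rest, int rest_size)
{
    char line[100];

    if (fscanf(fp, fmt, line) != 1)
        return 0;
    return fgets(rest, rest_size, fp) != NULL;
}
```

For input "first line\nsecond line\n", the format " %99[^\n]" leaves the newline behind, so rest ends up as just "\n". With "%99[^\n]%*c" the newline is consumed and rest is "second line\n".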
while(scanf("%s",a))
{
if(a[0]=='#')
break;
nextpermutation(a);
}
This while loop works perfectly with scanf("%s", a),
but if I put scanf("%[^\n]s", a) in the while condition it runs only once.
I checked the return values of both scanf calls and they were the same.
I don't get why this is happening...
The battle of two losers: "%s" vs. "%[^\n]s":
Both specifiers are bad and do not belong in robust code. Neither limits user input, so either can easily overrun the destination buffer.
The s in "%[^\n]s" serves no purpose (@Jonathan Leffler), so "%[^\n]" is what is reviewed below. It is still bad.
"%s" consumes leading white-space. "%[^\n]" does not (@Igor Tandetnik).
"%[^\n]" reads nothing if the leading character is a '\n', leaving a unchanged or rarely in an unknown state.
"%[^\n]" reads in all characters except '\n'.
"%s" reads all leading white-space, discards them, then reads and saves all non-white-spaces.
Better to use fgets() to read a line of user input and then parse it.
Example: to read a line of input, including spaces:
char a[100];
if (fgets(a, sizeof a, stdin) == NULL) Handle_EOF_or_Error();
// if the potential trailing \n is not wanted
a[strcspn(a, "\n")] = '\0';
What is that s doing there in %[^\n]s?
Your scanf("%[^\n]s") contains two directives in its format string: the conversion specifier %[^\n] followed by a lone literal s. The literal requires the input stream to contain an s right after the sequence matched by %[^\n]. This is, of course, not very close in behavior to plain %s. The sequences matched by these formats will be very different for that reason alone.
Note that [] is not a modifier for %s format, as you seem to incorrectly believe. [] in scanf is a self-sufficient format specifier. Did you mean just %[^\n] by any chance instead of %[^\n]s?
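One way to see that the trailing s is a separate directive is to put %n after it. The sketch below is only an illustration (the helper name is made up): %n stores the number of characters consumed so far, but only if scanning actually reaches it.

```c
#include <stdio.h>

/* Hypothetical helper: returns 1 if the literal s in "%[^\n]s" was
   matched, 0 if it was not, and -1 if nothing was stored at all.
   end keeps its initial value of -1 unless scanning reaches %n. */
int trailing_s_matched(const char *input)
{
    char buf[100];
    int end = -1;

    if (sscanf(input, "%99[^\n]s%n", buf, &end) != 1)
        return -1;
    return end != -1;
}
```

For ordinary lines this returns 0: %[^\n] has already swallowed everything up to the newline (including any s characters), so the literal s is compared against '\n' or end of input and fails. The return value of sscanf is still 1, which matches the observation that both calls returned the same value.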
I have a doubt:
What is the difference between the following two scanf statements?
scanf("%s",buf);
scanf("%[^\n]", buf);
If I use the second scanf in a while loop, it loops infinitely, because the \n stays in stdin.
The first statement also reads only up to the \n and does not read the \n itself.
But the first statement does not loop infinitely. Why?
Regarding the properties of the %s format specifier, quoting the C11 standard, chapter §7.21.6.2, fscanf():
s Matches a sequence of non-white-space characters.
A newline is a whitespace character, so a newline on its own is not a match for %s.
So, if a newline is left in the buffer, %s does not scan the newline alone; it skips it and waits for the next non-whitespace input to appear on stdin.
The %s format specifier specifies that scanf() should read all characters in the standard input buffer stdin until it encounters the first whitespace character, and then stop there. The whitespace ('\n') remains in the stdin buffer until consumed by another function, like getchar().
In the second case, %[^\n] stops only at a newline, and it does not skip a newline that is already at the front of the input.
You can think of scanf as extracting words separated by whitespace from a stream of characters. Imagine reading a file which contains a table of numbers, for example, without worrying about the exact number count per line or the exact space count and nature between numbers.
Whitespace, for the record, is horizontal and vertical (these exist) tabs, carriage returns, newlines, form feeds and, last but not least, actual spaces.
In order to free the user from details, scanf treats all whitespace the same: It normally skips it until it hits a non-whitespace and then tries to convert the character sequence starting there according to the specified input conversion. E.g. with "%d" it expects a sequence of digits, perhaps preceded by a minus sign.
The input conversion "%s" also starts with skipping whitespace (and that's more clearly documented in the opengroup's man page than in the Linux one).
After skipping leading whitespace, "%s" accepts everything until another whitespace is read (and put back in the input, because it isn't made part of the "word" being read). That sequence of non-whitespace chars -- basically a "word" -- is stored in the buffer provided. For example, scanning a string from "   a bc " results in skipping 3 spaces and storing "a" in the buffer. (The next scanf would skip the intervening space and put "bc" in the buffer. The next scanf after that would skip the remaining whitespace, encounter the end of file and return EOF.) So if a user is asked to enter three words they could give three words on one line or on three lines or on any number of lines preceded or separated by any number of empty lines, i.e. any number of subsequent newlines. Scanf couldn't care less.
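The word-by-word behavior described above can be sketched with sscanf on a string. The helper below is hypothetical; it relies on %n, which reports how many characters the call consumed, to continue where the previous call stopped.

```c
#include <stdio.h>

/* Hypothetical helper: counts the whitespace-separated words in text
   by calling sscanf with "%s" repeatedly. Each call skips leading
   whitespace, stores one word (at most 99 characters), and reports
   through %n how far it got. */
int count_words(const char *text)
{
    char word[100];
    int used = 0, count = 0;

    while (sscanf(text, "%99s%n", word, &used) == 1) {
        count++;
        text += used;   /* resume right after the word just read */
    }
    return count;
}
```

A string with three leading spaces, then "a bc" and a trailing space yields 2, and it makes no difference whether the words are separated by spaces, tabs, or any number of newlines.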
There are a few exceptions to the "skip leading whitespace" strategy. They concern conversions which usually indicate that the user wants to have more control over the input conversion. One of them is "%c" which just reads the next character. Another is the "%[" spec which details exactly which characters are considered part of the next "word" to read.
The conversion specification you use, "%[^\n]", reads everything except newline. Input from the keyboard is normally passed to a program line by line, and each line is by definition terminated by a newline. The newline of the first line passed to your program will be the first character from the input stream which does not match the conversion specification. Scanf will read it, inspect it and then put it back in the input stream (with ungetc()) for somebody else to consume. Unfortunately, it will itself be the next consumer, in another loop iteration (I assume). Now the very first character it encounters (the newline) does not match the input conversion (which demands anything but the newline). Scanf therefore gives up immediately, puts the offending character dutifully back in the input for somebody else to consume and returns 0, indicating the failure to even perform the very first conversion in the format string. But alas, it itself will be the next consumer. Yes, machines are stupid.
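The stuck-on-the-newline behavior can be observed without an endless loop if the loop condition compares the return value against 1. This is only a sketch with a made-up helper name; the format passed in must contain one assigning conversion with a width of at most 99.

```c
#include <stdio.h>

/* Hypothetical helper: counts how many times fscanf stores a line
   using fmt. The loop ends as soon as fscanf returns anything other
   than 1, so a failed match (return value 0) cannot spin forever. */
int count_reads(FILE *fp, const char *fmt)
{
    char line[100];
    int count = 0;

    while (fscanf(fp, fmt, line) == 1)
        count++;
    return count;
}
```

On a file containing "one\ntwo\nthree\n", the format "%99[^\n]" gives 1, because the second call meets the leftover newline and returns 0. The format " %99[^\n]" gives 3, because the leading space consumes that newline first.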
The first, scanf("%s",buf);, scans only a single word, but the second, scanf("%[^\n]", buf);, reads a string until the user enters a newline character.
Let's take a look at these two code snippets :
#include <stdio.h>

int main(void){
    char sentence[100] = {'\0'};
    /* %99s stores at most 99 characters plus the terminating '\0' */
    if (scanf("%99s", sentence) != 1)
        return 1;
    printf("\n%s\n", sentence);
    return 0;
}
Input : Hello, my name is Claudio.
Output : Hello
#include <stdio.h>

int main(void){
    /* the sample input is 26 characters long, so 20 bytes would overflow */
    char sentence[100] = {'\0'};
    if (scanf("%99[^\n]", sentence) != 1)
        return 1;
    printf("\n%s\n", sentence);
    return 0;
}
Input : Hello, my name is Claudio.
Output : Hello, my name is Claudio.
%[^\n] is a negated scanset, and this is how I personally use it, as it allows me to input a sentence with blank spaces in it.
Common
Both expect buf to be a pointer to a character array. Both append a null character to that array if at least 1 character was saved. Both return 1 if something was saved. Both return EOF if end-of-file is detected before saving anything. Both return EOF if an input error is detected. Both may save buf with embedded '\0' characters in it.
scanf("%s",buf);
scanf("%[^\n]", buf);
Differences
"%s" 1) consumes and discards leading white-space including '\n', space, tab, etc. 2) then saves non-white-space to buf until 3) a white-space is detected (which is then put back into stdin). buf will not contain any white-space.
"%[^\n]" 1) does not consume or discard leading white-space. 2) it saves non-'\n' characters to buf until 3) a '\n' is detected (which is then put back into stdin). If the first character read is a '\n', then nothing is saved in buf and 0 is returned. The '\n' remains in stdin and explains OP's infinite loop.
Failure to test the return value of scanf() is a common code oversight. Better code checks the return value of scanf().
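The three possible return values (1, 0, EOF) can be compared side by side with sscanf. The two helper names below are hypothetical, and a field width is added so the sketch cannot overrun its buffer.

```c
#include <stdio.h>

/* Hypothetical helpers: return the raw sscanf result so the outcomes
   can be compared for the two formats. */
int status_percent_s(const char *input)
{
    char buf[100];
    return sscanf(input, "%99s", buf);
}

int status_scanset(const char *input)
{
    char buf[100];
    return sscanf(input, "%99[^\n]", buf);
}
```

For the input "\nword", status_percent_s returns 1 (the newline is skipped) while status_scanset returns 0 (the newline blocks the match). For an input that is only "\n", status_percent_s returns EOF, since nothing but whitespace is left.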
IMO: code should never use either:
Both fail to limit the number of characters read. Use fgets().
You can think of %s as %[^\n \t\f\r\v], that is, after skipping any leading whitespace, a group of non-whitespace characters.
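That equivalence can be checked with a short sketch. The leading space in the second format stands in for the whitespace skipping that %s does on its own; the function name is made up.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical check: returns 1 if "%s" and the spelled-out scanset
   (with a leading space to skip whitespace) give the same return
   value and extract the same token from input. */
int same_token(const char *input)
{
    char a[100] = "", b[100] = "";
    int ra = sscanf(input, "%99s", a);
    int rb = sscanf(input, " %99[^\n \t\f\r\v]", b);

    return ra == rb && strcmp(a, b) == 0;
}
```

The check is expected to hold for inputs such as "  \t hello world", and also for an all-whitespace string, where both calls return EOF.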
I have a question concerning fgets and fscanf in C. What exactly is the difference between these two? For example:
char str[10];
while(fgets(str,10,ptr))
{
counter++;
...
and the second example:
char str[10];
while(fscanf(ptr,"%s",str))
{
counter++;
...
when having a text file which contains strings which are separated by an empty space, for example: AB1234 AC5423 AS1433. In the first example the "counter" in the while loop will not give the same output as in the second example. When changing the "10" in the fgets function the counter will always give different results. What is the reason for this?
Can somebody please also explain what the fscanf exactly does, how long is the string in each while loop?
The function fgets reads until a newline (and also stores it). fscanf with the %s specifier reads until any whitespace and doesn't store it...
As a side note, you're not specifying the size of the buffer in fscanf, which is unsafe. Try:
fscanf(ptr, "%9s", str)
fgets reads to a newline. fscanf only reads up to whitespace.
In your example, fgets will read up to a maximum of 9 characters from the input stream and save them to str, along with a 0 terminator. It will not skip leading whitespace. It will stop if it sees a newline (which will be saved to str) or EOF before the maximum number of characters.
fscanf with the %s conversion specifier will skip any leading whitespace, then read all non-whitespace characters, saving them to str followed by a 0 terminator. It will stop reading at the next whitespace character or EOF. Without an explicit field width, it will read as many non-whitespace characters as are in the stream, potentially overrunning the target buffer.
So, imagine the input stream looks like this: "\t abcdef\n<EOF>". If you used fgets to read it, str would contain "\t abcdef\n\0". If you used fscanf, str would contain "abcdef\0" (where \0 indicates the 0 terminator).
fgets reads the whole line. fscanf with %s reads a string delimited by whitespace (space, \n, \t, etc.).
Anyway, you should not use them unless you are sure that the array you read into is big enough to hold the input.
You wrote: When changing the "10" in the fgets function the counter will always give different results. Note that fgets and fscanf don't know how many bytes the buffer can hold; you have to tell them. The "10" tells fgets to store at most 9 characters plus the terminator per call, so changing it changes how many calls it takes to consume a line, and with it your counter.
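To see where the different counter values come from, the two loops from the question can be wrapped in functions and run on the same file. This is a rough sketch: the function names and the 64-byte buffer are assumptions, and the fscanf loop compares against 1 because fscanf returns EOF (a nonzero value) at end of file.

```c
#include <stdio.h>

/* Hypothetical helper: counts fgets calls that return data when at
   most size - 1 characters are stored per call. */
int count_fgets_calls(FILE *fp, int size)
{
    char str[64];
    int counter = 0;

    if (size < 2)
        return 0;                  /* fgets could not store any data */
    if (size > (int)sizeof str)
        size = (int)sizeof str;    /* never exceed the real buffer */
    while (fgets(str, size, fp) != NULL)
        counter++;
    return counter;
}

/* Hypothetical helper: counts whitespace-separated words of up to
   9 characters each (longer words would be split by the width). */
int count_fscanf_words(FILE *fp)
{
    char str[10];
    int counter = 0;

    while (fscanf(fp, "%9s", str) == 1)
        counter++;
    return counter;
}
```

For a file holding the single line "AB1234 AC5423 AS1433\n" (21 characters), fgets with size 10 needs 3 calls, with size 5 it needs 6, and with size 64 just 1, while the fscanf loop counts 3 words regardless.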
I've already got some code to read a text file using fscanf(), and now I need it modified so that fields that were previously whitespace-free need to allow whitespace. The text file is basically in the form of:
title: DATA
title: DATA
etc...
which is basically parsed using fgets(inputLine, 512, inputFile); sscanf(inputLine, "%*s %s", &data);, reading the DATA fields and ignoring the titles, but now some of the data fields need to allow spaces. I still need to ignore the title and the whitespace immediately after it, but then read in the rest of the line including the whitespace.
Is there any way to do this with the sscanf() function?
If not, what is the smallest change I can make to the code to handle the whitespace properly?
UPDATE: I edited the question to replace fscanf() with fgets() + sscanf(), which is what my code is actually using. I didn't really think it was relevant when I first wrote the question which is why I simplified it to fscanf().
If you cannot use fgets() use the %[ conversion specifier (with the "exclude option"):
char buf[100];
fscanf(stdin, "%*s %99[^\n]", buf);
printf("value read: [%s]\n", buf);
But fgets() is way better.
Edit: version with fgets() + sscanf()
char buf[100], title[100];
fgets(buf, sizeof buf, stdin); /* expect string like "title: TITLE WITH SPACES" */
sscanf(buf, "%*s %99[^\n]", title);
I highly suggest you stop using fscanf() and start using fgets() (which reads a whole line) and then parse the line that has been read.
This will allow you considerably more freedom in regards to parsing non-exactly-formatted input.
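As one illustration of parsing the line yourself after fgets, the value could be located with the string.h functions instead of sscanf. The helper below is hypothetical and assumes that the title ends at the first colon, as in the "title: DATA" layout from the question.

```c
#include <string.h>

/* Hypothetical helper: given a line of the form "title: DATA" (as read
   by fgets), returns a pointer to DATA inside line, or NULL if there
   is no colon. The trailing newline, if any, is cut off in place. */
char *field_value(char *line)
{
    char *value;

    line[strcspn(line, "\n")] = '\0';   /* drop the newline fgets keeps */
    value = strchr(line, ':');
    if (value == NULL)
        return NULL;
    value++;                            /* step past the colon */
    value += strspn(value, " \t");      /* skip blanks after the colon */
    return value;
}
```

Because the result points into the original buffer, no second buffer or size limit is needed, and spaces inside DATA are preserved as they are.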
The simplest thing would be to issue a
fscanf(filePtr, "%*s");
to discard the first part and then just call the fgets:
fgets(str, stringSize, filePtr);
If you insist on using scanf, and assuming that you want newline as a terminator, you can do this:
scanf("%*s %[^\n]", str);
Note, however, that the above, used exactly as written, is a bad idea because there's nothing to guard against str being overflowed (as scanf doesn't know its size). You can, of course, set a predefined maximum size, and specify that, but then your program may not work correctly on some valid input.
If the size of the line, as defined by input format, isn't limited, then your only practical option is to use fgetc to read data char by char, periodically reallocating the buffer as you go. If you do that, then modifying it to drop all read chars until the first whitespace is fairly trivial.
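The fgetc-plus-reallocation approach might look roughly like the following. It is a sketch under assumptions: the function name, the starting capacity of 16, and the doubling strategy are all arbitrary choices, and the caller is expected to free the result.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads one line of any length from fp, without
   the trailing newline. Returns a malloc'ed string the caller must
   free, or NULL at end of file or if memory runs out. */
char *read_long_line(FILE *fp)
{
    size_t cap = 16, len = 0;
    char *buf = malloc(cap);
    int c;

    if (buf == NULL)
        return NULL;
    while ((c = fgetc(fp)) != EOF && c != '\n') {
        if (len + 1 == cap) {           /* keep one byte for the '\0' */
            char *bigger = realloc(buf, cap * 2);
            if (bigger == NULL) {
                free(buf);
                return NULL;
            }
            buf = bigger;
            cap *= 2;
        }
        buf[len++] = (char)c;
    }
    if (c == EOF && len == 0) {         /* nothing left to read */
        free(buf);
        return NULL;
    }
    buf[len] = '\0';
    return buf;
}
```

Dropping the title would then be a matter of discarding characters up to the first whitespace before the loop starts storing them.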
A %s specifier in fscanf skips any whitespace on the input, then reads a string of non-whitespace characters up to and not including the next whitespace character.
If you want to read up to a newline, you can use %[^\n] as a specifier. In addition, a ' ' in the format string will skip whitespace on the input. So if you use
fscanf(inputFile, "%*s %[^\n]", str);
it will read the first thing on the line up to the first whitespace ("title:" in your case), and throw it away, then will read whitespace chars and throw them away, then will read all chars up to a newline into str, which sounds like what you want.
Be careful that str doesn't overflow -- you might want to use
fscanf(inputFile, "%*s %100[^\n]", str)
to limit the maximum string length you'll read (100 characters, not counting the terminating NUL, so str needs room for at least 101 bytes).
You're running up against the limits of what the *scanf family is good for. With fairly minimal changes you could try using the string-scanning modules from Dave Hanson's C Interfaces and Implementations. This stuff is a retrofit from the programming language Icon, an extremely simple and powerful string-processing language which Hanson and others worked on at Arizona. The departure from sscanf won't be too severe, and it is simpler, easier to work with, and more powerful than regular expressions. The only down side is that the code is a little hard to follow without the book—but if you do much C programming, the book is well worth having.