Can fscanf() read whitespace? - c

I've already got some code to read a text file using fscanf(), and now I need it modified so that fields that were previously whitespace-free need to allow whitespace. The text file is basically in the form of:
title: DATA
title: DATA
etc...
which is basically parsed using fgets(inputLine, 512, inputFile); sscanf(inputLine, "%*s %s", &data);, reading the DATA fields and ignoring the titles, but now some of the data fields need to allow spaces. I still need to ignore the title and the whitespace immediately after it, but then read in the rest of the line including the whitespace.
Is there any way to do this with the sscanf() function?
If not, what is the smallest change I can make to the code to handle the whitespace properly?
UPDATE: I edited the question to replace fscanf() with fgets() + sscanf(), which is what my code is actually using. I didn't really think it was relevant when I first wrote the question which is why I simplified it to fscanf().

If you cannot use fgets() use the %[ conversion specifier (with the "exclude option"):
char buf[100];
fscanf(stdin, "%*s %99[^\n]", buf);
printf("value read: [%s]\n", buf);
But fgets() is way better.
Edit: version with fgets() + sscanf()
char buf[100], title[100];
fgets(buf, sizeof buf, stdin); /* expect string like "title: TITLE WITH SPACES" */
sscanf(buf, "%*s %99[^\n]", title);
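As a minimal sketch of the fgets() + sscanf() version, the parsing step can be wrapped in a small function that also checks the return value of sscanf(). The name parse_data_field and the 100-byte buffer size are assumptions made for illustration, not part of the original code.

```c
#include <stdio.h>

/* Hypothetical helper: copies everything after the first
 * whitespace-delimited token of `line` into `data`, which must hold
 * at least 100 bytes. Returns 1 on success, 0 if there is no data field. */
int parse_data_field(const char *line, char *data)
{
    data[0] = '\0';
    /* %*s       skip the title token ("title:")
     * ' '       skip the whitespace after it
     * %99[^\n]  copy up to 99 characters, stopping before the newline */
    return sscanf(line, "%*s %99[^\n]", data) == 1;
}
```

A caller would read each line with fgets() and pass it to this function, treating a return value of 0 as a line without a data field.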

I highly suggest you stop using fscanf() and start using fgets() (which reads a whole line) and then parse the line that has been read.
This will allow you considerably more freedom in regards to parsing non-exactly-formatted input.

The simplest thing would be to issue a
fscanf(filePtr, "%*s");
to discard the first part and then just call the fgets:
fgets(str, stringSize, filePtr);

If you insist on using scanf, and assuming that you want newline as a terminator, you can do this:
scanf("%*s %[^\n]", str);
Note, however, that the above, used exactly as written, is a bad idea because there's nothing to guard against str being overflown (as scanf doesn't know its size). You can, of course, set a predefined maximum size, and specify that, but then your program may not work correctly on some valid input.
If the size of the line, as defined by input format, isn't limited, then your only practical option is to use fgetc to read data char by char, periodically reallocating the buffer as you go. If you do that, then modifying it to drop all read chars until the first whitespace is fairly trivial.
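The char-by-char approach with a growing buffer could look roughly like the sketch below. The function name read_long_line, the starting capacity of 64, and the doubling strategy are assumptions chosen for illustration.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads one line of any length from fp.
 * Returns a malloc'd string without the newline, or NULL at end of
 * input or on allocation failure. The caller frees the result. */
char *read_long_line(FILE *fp)
{
    size_t cap = 64, len = 0;
    char *buf = malloc(cap);
    int ch;

    if (buf == NULL)
        return NULL;

    while ((ch = fgetc(fp)) != EOF && ch != '\n') {
        if (len + 1 >= cap) {           /* keep one byte for the terminator */
            char *bigger = realloc(buf, cap * 2);
            if (bigger == NULL) {
                free(buf);
                return NULL;
            }
            buf = bigger;
            cap *= 2;
        }
        buf[len++] = (char)ch;
    }

    if (ch == EOF && len == 0) {        /* nothing left to read */
        free(buf);
        return NULL;
    }
    buf[len] = '\0';
    return buf;
}
```

Dropping the title would then be a matter of skipping characters in the returned string up to and including the first run of whitespace.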

A %s specifier in fscanf skips any whitespace on the input, then reads a string of non-whitespace characters up to and not including the next whitespace character.
If you want to read up to a newline, you can use %[^\n] as a specifier. In addition, a ' ' in the format string will skip whitespace on the input. So if you use
fscanf(inputFile, "%*s %[^\n]", str);
it will read the first thing on the line up to the first whitespace ("title:" in your case), and throw it away, then will read whitespace chars and throw them away, then will read all chars up to a newline into str, which sounds like what you want.
Be careful that str doesn't overflow -- you might want to use
fscanf(inputFile, "%*s %100[^\n]", str);
to limit the maximum string length you'll read (100 characters, not counting the terminating NUL, so str needs room for at least 101 bytes).
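When the buffer size is only known at run time, one option is to build the format string with snprintf() so the width always matches the buffer. This is a sketch under that assumption; scan_field is a hypothetical name, and it works on a line that has already been read (for example with fgets()).

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: skips the first token of `line` and copies the
 * rest of the line into `out`, never writing more than `outsize` bytes.
 * Returns 1 on success, 0 otherwise. */
int scan_field(const char *line, char *out, size_t outsize)
{
    char fmt[32];

    if (outsize < 2)
        return 0;
    out[0] = '\0';
    /* For outsize == 101 this produces "%*s %100[^\n]". */
    snprintf(fmt, sizeof fmt, "%%*s %%%zu[^\n]", outsize - 1);
    return sscanf(line, fmt, out) == 1;
}
```

Input longer than the buffer is silently truncated here, so a caller that must reject long fields needs an additional check.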

You're running up against the limits of what the *scanf family is good for. With fairly minimal changes you could try using the string-scanning modules from Dave Hanson's C Interfaces and Implementations. This stuff is a retrofit from the programming language Icon, an extremely simple and powerful string-processing language which Hanson and others worked on at Arizona. The departure from sscanf won't be too severe, and it is simpler, easier to work with, and more powerful than regular expressions. The only down side is that the code is a little hard to follow without the book—but if you do much C programming, the book is well worth having.

Related

scanf("%[^\n]s",a) with set size of string

So I had a code where I use
scanf("%[^\n]s",a);
and have multiple scanf calls that take different inputs, some of them strings. I understand that scanf("%[^\n]s",a) reads input until a newline is reached. However, suppose my string can only hold up to 10 characters: once it has been filled but the newline hasn't been reached, how can I get rid of the extra input up to the newline? I was thinking of calling getchar() until a newline is reached, but in order to even check whether my 10 spots have been filled I need to use getchar, so doesn't that mess up my next scanf input? Does anybody have another way to do it, still using scanf() and getchar()?
scanf("%[^\n]s",a) is a common mistake; the %[ directive is distinct from the %s directive. What you're asking from scanf is:
A group of non-'\n' characters, followed by...
A literal s character.
Perhaps you intended to write scanf("%[^\n]",a)? Note the deleted s...
You can use the * modifier to suppress assignment for a directive, for example scanf("%10[^\n]", a); followed by scanf("%*[^\n]"); to read and discard up to the next newline and getchar(); to read and discard that newline:
scanf("%10[^\n]", a);
scanf("%*[^\n]"); // read and discard up to the next newline
getchar(); // read and discard that newline
As pointed out, the two format strings could be concatenated to reduce the number of calls to scanf. I wrote my answer this way for the sake of documentation, and I'll leave it as is. Besides, I figure that attempt at optimisation would be negligible; a profiler is likely to indicate much more significant bottlenecks for optimisation in realistic scenarios.
You can use this format to keep the first 10 characters and discard the rest of the line:
scanf("%10[^\n]%*[^\n]",a);
getchar();
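The read-then-discard sequence can be wrapped in a function that also copes with empty lines and end of input. This is only a sketch: read_capped_line is a hypothetical name, and it takes a FILE * (using fscanf and getc instead of scanf and getchar) so it can be used with any stream.

```c
#include <stdio.h>

/* Hypothetical helper: stores at most 10 characters of the next line
 * in `a` (11 bytes) and discards the rest of that line.
 * Returns 1 if a line was read (possibly empty), 0 at end of input. */
int read_capped_line(FILE *in, char a[11])
{
    a[0] = '\0';                       /* %[ stores nothing on an empty line */
    if (fscanf(in, "%10[^\n]", a) == EOF)
        return 0;                      /* nothing left to read */

    if (fscanf(in, "%*[^\n]") == EOF)  /* discard up to the next newline */
        return 1;                      /* input ended without a newline */
    getc(in);                          /* discard the newline itself */
    return 1;
}
```

With stdin as the stream this behaves like the scanf()/getchar() calls above, so a following scanf starts at the beginning of the next line.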

What are options scanf vs gets vs fgets?

I have the following code
while ( a != 5)
scanf("%s", buffer);
This works well but does not accept spaces between words; in other words, scanf stops reading at the first space.
If I use this
while( a != 5)
scanf("%[^\n]", buffer);
It works only once, which is bad.
I never use gets() because I know how nasty it is.
My last option is this
while( a != 5)
fgets(buffer, sizeof(buffer), stdin);
So my questions are
Why the second command is not working inside the loop?
What are the other options I have to scan a string with spaces?
"%[^\n]" will attempt to scan everything until a newline. The next character in the input would be the \n so you should skip over it to get to the next line.
Try: "%[^\n]%*c", the %*c will discard the next character, which is the newline char.
Why the second command is not working inside the loop
Because the first scan reads up to the \n, the \n itself remains in the input buffer. You need to consume (in other words, discard) that newline before the next scan. A loop such as while (getchar() != '\n'); does the job, although a robust version should also stop on EOF.
What are the other options I have to scan a string with spaces?
Well, you're almost there. You need to use fgets(). Using this, you can
Be safe from buffer overrun (Overcome limitation of gets())
Input strings with spaces (Overcome limitation of %s)
However, please keep in mind, fgets() reads and stores the trailing newline, so you may want to get rid of it and you have to do that yourself, manually.
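One common way to drop the trailing newline is strcspn(), which returns the index of the first '\n' (or the string length if there is none). The wrapper name read_trimmed_line below is an assumption for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads one line with fgets() and removes the
 * trailing newline, if any. Returns buffer, or NULL on EOF or error. */
char *read_trimmed_line(FILE *in, char *buffer, int size)
{
    if (fgets(buffer, size, in) == NULL)
        return NULL;
    buffer[strcspn(buffer, "\n")] = '\0';  /* no-op when no newline was stored */
    return buffer;
}
```

In the loop from the question this would replace the bare fgets() call, with stdin passed as the stream.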

Read File: fscanf doesn't read whitespaces?

I have a problem fetching lines from File Pointer using fscanf.
Let's say a want to fetch a line like this:
<123324><sport><DESCfddR><spor ds>
Fscanf fetch only this part:
<123324><sport><DESCfddR><spor
Does anybody know how to overcome this problem?
Thanks in advance.
In conclusion, the best way to read lines that contain whitespace is to use fgets:
fgets (currentLine, MAX_LENGTH , filePointer);
With fscanf you are going to run into a lot of problems.
You are probably using %s in the fscanf to read data. From the C11 standard,
7.21.6.2 The fscanf function
[...]
The conversion specifiers and their meanings are:
[...]
s Matches a sequence of non-white-space characters.
[...]
So %s stops scanning at the first whitespace character or, if a maximum field width is given, after that many characters, whichever comes first.
How to fix this problem? Use a different format specifier:
fscanf(fp ," %[^\n]", buffer);
The above fscanf skips all whitespace characters, if any, until the first non-whitespace character(space at the start) and then, %[^\n] scans everything until a \n character.
You can further improve security by using
fscanf(fp ," %M[^\n]", buffer);
Replace M with the size of buffer minus one (one byte is reserved for the NUL terminator). This is the maximum field width. Checking the return value of fscanf is also a good idea.
Using fgets() is a better way though.
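If the buffer size is a compile-time constant, the width can be pasted into the format string with a stringizing macro so the two cannot drift apart. This is a sketch; FIELD_WIDTH, STR and read_record are hypothetical names introduced here.

```c
#include <stdio.h>

#define FIELD_WIDTH 63                /* assumed maximum record length */
#define STR_(x) #x
#define STR(x) STR_(x)                /* expands FIELD_WIDTH, then stringizes it */

/* Hypothetical helper: reads the next non-empty line into buffer,
 * skipping leading whitespace. Returns 1 on success, 0 on EOF or failure. */
int read_record(FILE *fp, char buffer[FIELD_WIDTH + 1])
{
    /* The format expands to " %63[^\n]". */
    return fscanf(fp, " %" STR(FIELD_WIDTH) "[^\n]", buffer) == 1;
}
```

A record longer than FIELD_WIDTH is split across successive calls, which is one more reason the fgets() route is usually easier to reason about.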

How to limit scanf function in C to print error when input is too long?

I want to limit the scanf function so that when I enter, for example, a string (char array) that has more than 30 characters, it is rejected and my program reports an error.
I got a hint to use [^n] or something like that, but I don't understand how to do it.
I know that I can use scanf("%30s"..), but I don't want the truncated input to be accepted as valid; I just want the error.
Any help would be great.
If you must use scanf then I believe that the best that you can do is use the width specifier with something like: "%31s", as you've already mentioned, then use strlen to check the length of the input, and discard the string and report an error if the input is longer than your limit.
Or possibly skip the strlen by additionally using an %n in your format string, e.g. "%31s%n".
A format string using something like %[^\n] in place of %s simply instructs the function to continue reading until a newline, consuming other whitespace characters along the way. This is useful if you want to allow the input to include whitespace characters.
Review the docs for scanf (here's a copy of the man page).
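The width-plus-strlen idea can be sketched as follows: read one character more than the limit, and treat a result longer than the limit as an error. The names LIMIT and read_limited_word are assumptions, and the function parses an already-read string with sscanf so that it is easy to test.

```c
#include <stdio.h>
#include <string.h>

#define LIMIT 30                      /* maximum accepted length */

/* Hypothetical helper: extracts one word from `input` into `word`
 * (32 bytes). Returns 1 if the word is acceptable, 0 if it is longer
 * than LIMIT, and -1 if there is no word at all. */
int read_limited_word(const char *input, char word[LIMIT + 2])
{
    /* The width 31 is LIMIT + 1: one extra character reveals overlong input. */
    if (sscanf(input, "%31s", word) != 1)
        return -1;
    return strlen(word) <= LIMIT;
}
```

When reading directly from stdin, the unread remainder of an overlong word stays in the stream and still has to be discarded before the next read.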
You could use fgets and sscanf. With fgets you can read a little bit more than 30 characters and then check that you didn't get more than 30 characters.
Or if you really want to use scanf use it with something more than 30 like %32s.
Take a look at this page http://linux.die.net/man/3/sscanf and look for the %n format specifier. I would also recommend looking at the sscanf function's return value, which tells you the number of input items successfully assigned, as well as whether an error occurred.
I've used the %n format specifier to help in parsing a string of parameters:
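ret = sscanf(line, "%d %d %s %d %d %n", &iLoad, &iScreen, filename, &stage, &bitmapType, &offset);
The number of characters consumed by the preceding directives is stored in the variable offset (filename is a char array, so it is passed without &). Note that %n does not count toward the return value.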
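A smaller sketch of the same %n technique: parse a leading integer, then use the recorded offset to locate the rest of the line, spaces included. The function name split_after_id is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: parses a leading integer from `line` and points
 * *rest at the first non-whitespace character after it.
 * Returns 1 on success, 0 if no integer could be parsed. */
int split_after_id(const char *line, int *id, const char **rest)
{
    int offset = -1;                  /* stays -1 if %n is never reached */

    /* %n is not counted in the return value, so success is still 1. */
    if (sscanf(line, "%d %n", id, &offset) != 1 || offset < 0)
        return 0;
    *rest = line + offset;
    return 1;
}
```

Initializing offset to -1 makes it possible to tell whether sscanf got as far as the %n directive.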
You could use getchar in a loop, and count the characters coming in.
int iCharCount = 0;
int ch; /* int rather than char, so that EOF can be detected */
ch = getchar();
while (ch != EOF) {
    iCharCount++;
    if (30 < iCharCount) {
        printf("You have attempted to enter more than 30 characters.\n");
        printf("Aborting.");
        break;
    }
    printf("%c", ch);
    ch = getchar();
}
This is a crude example. If it were up to me, I'd allocate a maximum-sized character array, read the whole line in, and then use string utilities to count it, edit it, and so on.
Well in C you can do:
#include <string.h>
...
if (strlen(array_ptr) > 30) error();
Obviously you need a bigger buffer to first receive the input and then check its length, so the array could be, e.g., 512 bytes. When you copy strings into it, you need to check that they end with a terminating 0.
sscanf is very good for this kind of thing, but a careful scanf can do the trick here too. You'll want to make sure that you're correctly limiting the number of characters the user can enter: %30s reads at most 30 characters and then appends the \0 terminator, so the buffer needs 31 bytes.
What you're preventing is buffer overflow attacks, which can be extremely effective ways to break sloppily written c programs. Here's an excellent article by Aleph One on BO:
http://insecure.org/stf/smashstack.html

Which is the best way to get input from user in C?

Many people said that scanf shouldn't be used in "more serious program", same as with getline.
I'm starting to feel lost: if for every input function I come across people say that I shouldn't use it, then what should I use? Is there a more "standard" way to get input that I'm not aware of?
Generally, fgets() is considered a good option. It reads whole lines into a buffer, and from there you can do what you need. If you want behavior like scanf(), you can pass the strings you read along to sscanf().
The main advantage of this, is that if the string fails to convert, it's easy to recover, whereas with scanf() you're left with input on stdin which you need to drain. Plus, you won't wind up in the pitfall of mixing line-oriented input with scanf(), which causes headaches when things like \n get left on stdin commonly leading new coders to believe the input calls had been ignored altogether.
Something like this might be to your liking:
char line[256];
int i;
if (fgets(line, sizeof(line), stdin)) {
    if (1 == sscanf(line, "%d", &i)) {
        /* i can be safely used */
    }
}
Above you should note that fgets() returns NULL on EOF or error, which is why I wrapped it in an if. The sscanf() call returns the number of fields that were successfully converted.
Keep in mind that fgets() may not read a whole line if the line is larger than your buffer, which in a "serious" program is certainly something you should consider.
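One way to detect that case is to check whether the buffer ends in a newline and, if not, discard the remainder of the line. This is a sketch of that idea; read_checked_line and its return convention are assumptions.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads one line into `line` without the newline.
 * Returns 1 for a complete line, 0 if the line was too long (the excess
 * is discarded), and -1 on EOF or error. */
int read_checked_line(FILE *in, char *line, size_t size)
{
    size_t len;
    int ch;

    if (fgets(line, (int)size, in) == NULL)
        return -1;

    len = strlen(line);
    if (len > 0 && line[len - 1] == '\n') {
        line[len - 1] = '\0';
        return 1;
    }

    /* No newline stored: either only the newline did not fit, the
     * input ended without one, or the line is longer than the buffer. */
    ch = getc(in);
    if (ch == EOF || ch == '\n')
        return 1;
    while ((ch = getc(in)) != EOF && ch != '\n')
        ;                             /* drain the rest of the long line */
    return 0;
}
```

Whether an overlong line should be rejected, truncated, or read in pieces depends on the program; this sketch only reports it.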
For simple input where you can set a fixed limit on the input length, I would recommend reading the data from the terminal with fgets().
This is because fgets() lets you specify the buffer size (as opposed to gets(), which for this very reason should pretty much never be used to read input from humans):
char line[256];
if(fgets(line, sizeof line, stdin) != NULL)
{
/* Now inspect and further parse the string in line. */
}
Remember that it will retain e.g. the linefeed character(s), which might be surprising.
UPDATE: As pointed out in a comment, there's a better alternative if you're okay with getting responsibility for tracking the memory: getline(). This is probably the best general-purpose solution for POSIX code, since it doesn't have any static limit on the length of lines to be read.
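A minimal getline() sketch, assuming a POSIX.1-2008 system; read_line_posix is a hypothetical wrapper name. getline() allocates and grows the buffer itself, and the caller is responsible for freeing it.

```c
#define _POSIX_C_SOURCE 200809L       /* ask for the getline() declaration */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical helper: returns the next line as a malloc'd string
 * without its newline, or NULL on EOF or error. The caller frees it. */
char *read_line_posix(FILE *in)
{
    char *line = NULL;
    size_t cap = 0;
    ssize_t len = getline(&line, &cap, in);

    if (len == -1) {
        free(line);                   /* the buffer may be allocated even on failure */
        return NULL;
    }
    if (len > 0 && line[len - 1] == '\n')
        line[len - 1] = '\0';
    return line;
}
```

In a loop it is more economical to reuse the same line and cap variables across getline() calls rather than allocating a new buffer per line as this wrapper does.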
There are several problems with using scanf:
reading text with a plain %s conversion specifier has the same risk as using gets(); if the user types in a string that's longer than what the target buffer is sized to hold, you'll get a buffer overrun;
if using %d or %f to read numeric input, certain bad patterns cannot be caught and rejected completely -- if you're reading an integer with %d and the user types "12r4", scanf will convert and assign the 12 while leaving r4 in the input stream to foul up the next read;
some conversion specifiers skip leading whitespace, others do not, and failure to take that into account can lead to problems where some input is skipped completely;
Basically, it takes a lot of extra effort to bulletproof reads using scanf.
A good alternative is to read all input as text using fgets(), and then tokenize and convert the input using sscanf or combinations of strtok, strtol, strtod, etc.
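For the numeric case, strtol() makes it possible to reject input such as "12r4" outright, because it reports where conversion stopped. The sketch below assumes the line has already been read with fgets(); parse_int_strict is a hypothetical name.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: converts `text` to an int only if the whole
 * string (apart from trailing whitespace) is a valid number.
 * Returns 1 and stores the value on success, 0 otherwise. */
int parse_int_strict(const char *text, int *out)
{
    char *end;
    long value;

    errno = 0;
    value = strtol(text, &end, 10);
    if (end == text)
        return 0;                     /* no digits at all */
    if (errno == ERANGE || value < INT_MIN || value > INT_MAX)
        return 0;                     /* out of range for int */
    while (*end == ' ' || *end == '\t' || *end == '\n' || *end == '\r')
        end++;
    if (*end != '\0')
        return 0;                     /* trailing junk such as "r4" */
    *out = (int)value;
    return 1;
}
```

Because the whole line is already in memory, a rejected input leaves nothing behind in the stream to interfere with the next read.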
Use fgets to get the data and use sscanf (or another method) to interpret them.
See this page to learn why it is better to use fgets + sscanf rather than scanf
http://c-faq.com/stdio/scanfprobs.html
