Making fscanf Ignore Optional Parameter - c

I am using fscanf to read a file which has lines like
Number <-whitespace-> string <-whitespace-> optional_3rd_column
I wish to extract the number and string out of each column, but ignore the 3rd_column if it exists
Example Data:
12 foo something
03 bar
24 something #randomcomment
I would want to extract 12,foo; 03,bar; 24, something while ignoring "something" and "#randomcomment"
I currently have something like
while(scanf("%d %s %*s",&num,&word)>=2)
{
assign stuff
}
However this does not work with lines with no 3rd column. How can I make it ignore everything after the 2nd string?

The problem is that when there is no third column, the %*s consumes the number on the next line, and the next %d then fails because the following token is not a number. To fix it without resorting to fgets() followed by sscanf(), you can use a character-class (scanset) specifier:
while(scanf("%d %s%*[^\n]", &num, &word) == 2)
{
assign stuff
}
The [^\n] says to match as many characters as possible that aren't newlines, and the * suppresses assignment as before. Also note that you can't put a space between the %s and the %*[^\n], because that space in the format string would match the newline, causing the %*[^\n] to match the entire subsequent line, which is not what you want.
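As a rough sketch of the loop above applied to the example data, the following reads from a FILE * rather than stdin and adds a field width so that %s cannot overrun the buffer. The function name read_records, the 32-byte word size, and the array layout are assumptions made for illustration; they are not part of the original answer.

```c
#include <stdio.h>

#define WORD_LEN 32  /* assumed buffer size; the question does not give one */

/* Reads "number word [ignored rest of line]" records from fp.
 * Stores up to max records and returns how many were read. */
int read_records(FILE *fp, int nums[], char words[][WORD_LEN], int max)
{
    int count = 0;

    /* %31s leaves room for the terminating NUL in a 32-byte buffer.
     * %*[^\n] discards the rest of the line, if there is any. */
    while (count < max &&
           fscanf(fp, "%d %31s%*[^\n]", &nums[count], words[count]) == 2) {
        count++;
    }
    return count;
}
```

The loop stops at end of input or at the first line that does not start with a number and a word.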

It would appear to me that the simplest solution is to scanf("%d %s", &num, &word) and then fgets() to eat the rest of the line.

Use fgets() to read a line at a time and then use sscanf() to look for the two columns you are interested in. This is more robust, and you don't have to do anything special to ignore trailing data.

I often use fgets() (rather than gets(), which cannot limit how much it reads and was removed in C11) followed by an sscanf() on the string you just, er, got.
Bonus: you can separate the test for end-of-input from the parsing.
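A minimal sketch of the line-at-a-time approach, assuming lines no longer than 255 characters and words no longer than 31. The names parse_line and count_good_lines are hypothetical. It shows the end-of-input test (fgets) kept separate from the parsing (sscanf), so a malformed line can be skipped without ending the loop.

```c
#include <stdio.h>

/* Parses one line of the form "number word [anything else]".
 * Returns 1 if both fields were found, 0 otherwise.
 * word must point to at least 32 bytes. */
int parse_line(const char *line, int *num, char *word)
{
    return sscanf(line, "%d %31s", num, word) == 2;
}

/* Counts the lines of fp that parse. Malformed lines are skipped
 * instead of stopping the loop. */
int count_good_lines(FILE *fp)
{
    char line[256];   /* assumed maximum line length */
    char word[32];
    int num;
    int good = 0;

    while (fgets(line, sizeof line, fp) != NULL) {  /* end-of-input test */
        if (parse_line(line, &num, word))           /* parsing */
            good++;
    }
    return good;
}
```

Anything after the second field is simply never looked at, so no special format is needed to ignore it.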

Related

Variable scanf inputs

How exactly would you deal with having a variable amount of scanf inputs?
I'm scanning commands; some of them are one-word commands, but some require a numeric argument. Does scanf allow the following?
scanf(" %s %d", command, argument);
Would that ignore the "argument" if only one value was entered?
The other option I thought of was
scanf(" %s", command);
if (strcmp(command, "somethin") == 0) {
scanf("%d", argument); }
But that would create a newline, right? The terminal has to receive the input in the form of "> command argument".
So, my question is: how do I solve the problem of having a variable number of inputs?
No, it won't "create a newline". scanf is completely unaware of any newlines. scanf treats the input stream as a linear sequence of data separated by whitespace. Newline is just whitespace, no different from any other whitespace. The only scanf format specifiers that can "see" newlines are %c and %[]. Your %s and %d are completely newline-agnostic.
Which means that your second example is doing it right (within the natural limitations of scanf). It won't "create a newline". It will read a single line, if you supply the input in a single line (like somethin 42).
You might actually run into the "opposite" problem: if the user forgets to input the required argument on the same line, the next scanf will wait for it on the next line. And on the next line. And on the next line... until the user finally supplies it. I'm not sure this behavior is desirable for you. If not, then a better idea would be to use dedicated line-based input through fgets and then parse the line manually.
P.S. There's no reason to prepend %s and %d with spaces.
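The line-based alternative mentioned above could look roughly like this. It is a sketch under assumptions: the helper name parse_command and the 32-byte command buffer are made up for illustration, and the caller is expected to have read the line with fgets first. The return value of sscanf tells you whether the optional argument was present.

```c
#include <stdio.h>

/* Parses "command [number]" from one line of input.
 * Returns 2 if both command and argument were found,
 * 1 if only the command was found, 0 if the line was empty. */
int parse_command(const char *line, char cmd[32], int *arg)
{
    int fields = sscanf(line, "%31s %d", cmd, arg);

    /* sscanf returns EOF when the line holds nothing but whitespace */
    return fields < 0 ? 0 : fields;
}
```

Because the parse works on a single line that has already been read, a missing argument is reported immediately instead of making the program wait for more input.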

scanf("%[^\n]s",a) with set size of string

So I had a code where I use
scanf("%[^\n]s",a);
and I have multiple scanf calls taking different inputs, some of them strings. I understand that scanf("%[^\n]s",a) reads input until a newline is reached. However, suppose my string can only hold up to 10 characters: once it has been filled but the newline hasn't been reached, how can I get rid of the extra input up to the newline? I was thinking of calling getchar() until a newline is reached, but to even check whether my 10 spots have been filled I would need getchar(), so doesn't that mess up my next scanf input? Does anybody have another way to do it, still using scanf() and getchar()?
scanf("%[^\n]s",a) is a common mistake; the %[ directive is distinct from the %s directive. What you're asking from scanf is:
A group of non-'\n' characters, followed by...
A literal s character.
Perhaps you intended to write scanf("%[^\n]",a)? Note the deleted s...
You can use the * modifier to suppress assignment for a directive, for example scanf("%10[^\n]", a); followed by scanf("%*[^\n]"); to read and discard up to the next newline and getchar(); to read and discard that newline:
scanf("%10[^\n]", a);
scanf("%*[^\n]"); // read and discard up to the next newline
getchar(); // read and discard that newline
As pointed out, the two format strings could be concatenated to reduce the number of calls to scanf. I wrote my answer this way for the sake of documentation, and I'll leave it as is. Besides, I figure that attempt at optimisation would be negligible; a profiler is likely to indicate much more significant bottlenecks for optimisation in realistic scenarios.
You can use this format to keep the first 10 characters and discard the rest of the line, so that the following lines of input are left intact:
scanf("%10[^\n]%*[^\n]",a);
getchar();
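Put together as a function, the approach from these answers might look like the sketch below. The name read_field10 and the FILE * parameter are assumptions for illustration (the answers use scanf and getchar on stdin). Note that the destination needs 11 bytes for 10 characters, and that %[ stores nothing at all when the line is empty, so the buffer is cleared first.

```c
#include <stdio.h>

/* Reads at most 10 characters of the current line into out (11 bytes)
 * and discards the rest of the line, including its newline.
 * Returns 1 if a line was read, 0 at end of input. */
int read_field10(FILE *fp, char out[11])
{
    int c;

    out[0] = '\0';                     /* %[ stores nothing on an empty line */
    if (fscanf(fp, "%10[^\n]", out) == EOF)
        return 0;

    if (fscanf(fp, "%*[^\n]") == EOF)  /* drop the overflow, if any */
        return 1;                      /* input ended without a newline */

    c = getc(fp);                      /* drop the newline itself */
    (void)c;
    return 1;
}
```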

C fscanf reading in the correct format

I'm totally stuck with the fscanf format string in C.
Alice:(44;69) Bob:(74;68) John:(57;98)
This is what I need to read from a file: Name:(score1;score2). But I failed to construct the correct format string for it:
while(fscanf(f, "%[a-zA-Z]%[;(]%d %d", &buff, &garbage, &s1, &s2)!= EOF){
What am I doing wrong?
First of all, if you check a scanf (and family) reference, you can see that you can add an asterisk to a format code to suppress assignment, so there is no need to pass "garbage" variables.
Secondly, for your problem, the numbers are separated by a semicolon, but you have a space in the format, which matches only whitespace.
In fact, due to the pattern matching built into scanf, you should be able to simplify the format specification to e.g.
fscanf(f, " %[^:]:(%d;%d)", buff, &s1, &s2)
The "%[^:]" format reads everything as a string until it sees a colon. The rest of the format then matches the colon, the left parenthesis, a decimal number, a semicolon, another decimal number and a right parenthesis. I also added a leading space in the format, to skip leading whitespace if there is any.
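A small sketch of that format in a loop, assuming names shorter than 32 characters. The function name sum_scores is hypothetical, and the %31 width is an addition to keep the name from overflowing its buffer. Comparing the return value with 3 (rather than with EOF) also stops the loop on malformed input instead of spinning forever.

```c
#include <stdio.h>

/* Reads "Name:(s1;s2)" entries separated by whitespace.
 * Adds up both score columns and returns the number of entries read. */
int sum_scores(FILE *f, int *sum1, int *sum2)
{
    char name[32];   /* assumed maximum name length of 31 characters */
    int s1, s2;
    int count = 0;

    *sum1 = 0;
    *sum2 = 0;
    while (fscanf(f, " %31[^:]:(%d;%d)", name, &s1, &s2) == 3) {
        *sum1 += s1;
        *sum2 += s2;
        count++;
    }
    return count;
}
```

One limitation: the return value cannot tell you whether the closing parenthesis matched, because literal characters after the last conversion do not affect the count.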

How to limit scanf function in C to print error when input is too long?

I want to limit the scanf function so that when I enter, for example, a string (into a char array) that has more than 30 characters, it is not accepted and my output is an error.
I got a hint to use [^n] or something like that, but I don't understand how to do it.
I know that I can use scanf("%30s"..), but I don't want the truncated input to be treated as valid; I just want the error.
Any help would be great.
If you must use scanf then I believe that the best that you can do is use the width specifier with something like: "%31s", as you've already mentioned, then use strlen to check the length of the input, and discard the string and report an error if the input is longer than your limit.
Or possibly skip the strlen by additionally using an %n in your format string, e.g. "%31s%n".
A format string using something like %[^\n] in place of %s simply instructs the function to continue reading until a newline, consuming other whitespace characters along the way. This is useful if you want to allow the input to include whitespace characters.
Review the docs for scanf (here's a copy of the man page).
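Here is one way the %31s plus %n idea could be sketched. The helper name read_word_max30 is made up, and it works on a string so that it can be tried without typing input. Two %n conversions are used because %n counts every character consumed so far, including any leading whitespace that %s skips.

```c
#include <stdio.h>

/* Extracts the first word of input into out (which must hold 32 bytes).
 * Returns 0 if the word has at most 30 characters,
 *         1 if it is longer (out then holds a truncated copy),
 *        -1 if there is no word at all. */
int read_word_max30(const char *input, char out[32])
{
    int start = 0;
    int end = 0;

    /* The leading space skips whitespace, the first %n records where the
     * word begins, and the second %n records where the conversion stopped. */
    if (sscanf(input, " %n%31s%n", &start, out, &end) != 1)
        return -1;

    return (end - start) > 30 ? 1 : 0;
}
```

Reading up to 31 characters is what makes the check possible: a word of exactly 30 characters stops early, while anything longer fills all 31.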
You could use fgets and sscanf. With fgets you can read a little bit more than 30 characters and then check that you didn't get more than 30 characters.
Or if you really want to use scanf use it with something more than 30 like %32s.
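The fgets variant can be sketched as follows, assuming a limit of 30 characters per line. The function name read_line_30 and its return codes are illustrative only. The buffer has 32 bytes so that a 30-character line still fits together with its newline; if no newline shows up in a full buffer, the line was too long and the remainder is drained.

```c
#include <stdio.h>
#include <string.h>

/* Reads one line of at most 30 characters into out (32 bytes).
 * Returns 1 on success, 0 if the line was too long (the rest of the
 * line is discarded), -1 at end of input. */
int read_line_30(FILE *fp, char out[32])
{
    size_t len;
    int c;

    if (fgets(out, 32, fp) == NULL)
        return -1;

    len = strcspn(out, "\n");        /* length up to the newline, if any */
    if (out[len] == '\n') {
        out[len] = '\0';             /* strip the newline */
        return 1;
    }
    if (len < 31)
        return 1;                    /* last line of input, no newline */

    /* Buffer full and still no newline: more than 30 characters. */
    while ((c = getc(fp)) != EOF && c != '\n')
        ;
    return 0;
}
```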
Take a look at this page http://linux.die.net/man/3/sscanf and look for the %n format specifier. I would also recommend checking sscanf's return value, which tells you the number of input items successfully assigned, or EOF if the input ran out before the first conversion.
I've used the %n format specifier to help in parsing a string of parameters:
ret = sscanf(line, "%d %d %s %d %d %n", &iLoad, &iScreen, &filename, &stage, &bitmapType, &offset);
The number of characters consumed from the input so far is stored in the variable offset. Note that %n does not count toward the return value.
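A reduced sketch of the same technique, with a made-up helper parse_two_ints and only two fields: after the fixed fields are parsed, the offset from %n gives the position where the unparsed remainder of the line starts.

```c
#include <stdio.h>

/* Parses two leading integers from line.
 * Returns a pointer to the unparsed remainder, or NULL if the
 * line does not start with two integers. */
const char *parse_two_ints(const char *line, int *a, int *b)
{
    int offset = 0;

    /* %n is not counted in the return value, so success is 2, not 3.
     * The space before %n skips whitespace in front of the remainder. */
    if (sscanf(line, "%d %d %n", a, b, &offset) != 2)
        return NULL;

    return line + offset;
}
```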
You could use getchar in a loop, and count the characters coming in.
int iCharCount = 0;
int ch = getchar(); /* int rather than char, so EOF can be told apart */
while (ch != EOF && ch != '\n') { /* stop at the end of the line */
    iCharCount++;
    if (iCharCount > 30)
    {
        printf("You have attempted to enter more than 30 characters.\n");
        printf("Aborting.");
        break;
    }
    printf("%c", ch);
    ch = getchar();
}
This is a crude example. If it were up to me, I'd allocate a maximum-sized character array, read the whole line in, and then use string utilities to count it, edit it, and so on.
Well in C you can do:
#include <string.h>
...
if (strlen(array_ptr) > 30) error();
Obviously you need a bigger buffer to first read the input into and then check its length, so the array could be, e.g., 512 bytes. When you copy strings into it, make sure they end up terminated by a 0 byte.
sscanf is very good for this kind of thing, but a careful scanf can do the trick here too. You'll want to make sure that you're correctly limiting the number of characters the user can enter: the width counts characters only, so %30s reads at most 30 chars and needs a 31-byte buffer (30 chars + the \0 null terminator).
What you're preventing is buffer overflow attacks, which can be extremely effective ways to break sloppily written C programs. Here's an excellent article by Aleph One on buffer overflows:
http://insecure.org/stf/smashstack.html

Can fscanf() read whitespace?

I've already got some code to read a text file using fscanf(), and now I need it modified so that fields that were previously whitespace-free need to allow whitespace. The text file is basically in the form of:
title: DATA
title: DATA
etc...
which is basically parsed using fgets(inputLine, 512, inputFile); sscanf(inputLine, "%*s %s", &data);, reading the DATA fields and ignoring the titles, but now some of the data fields need to allow spaces. I still need to ignore the title and the whitespace immediately after it, but then read in the rest of the line including the whitespace.
Is there any way to do this with the sscanf() function?
If not, what is the smallest change I can make to the code to handle the whitespace properly?
UPDATE: I edited the question to replace fscanf() with fgets() + sscanf(), which is what my code is actually using. I didn't really think it was relevant when I first wrote the question which is why I simplified it to fscanf().
If you cannot use fgets() use the %[ conversion specifier (with the "exclude option"):
char buf[100];
fscanf(stdin, "%*s %99[^\n]", buf);
printf("value read: [%s]\n", buf);
But fgets() is way better.
Edit: version with fgets() + sscanf()
char buf[100], title[100];
fgets(buf, sizeof buf, stdin); /* expect string like "title: TITLE WITH SPACES" */
sscanf(buf, "%*s %99[^\n]", title);
I highly suggest you stop using fscanf() and start using fgets() (which reads a whole line) and then parse the line that has been read.
This will allow you considerably more freedom in regards to parsing non-exactly-formatted input.
The simplest thing would be to issue a
fscanf(filePtr, "%*s");
to discard the first part and then just call the fgets:
fgets(str, stringSize, filePtr);
If you insist on using scanf, and assuming that you want newline as a terminator, you can do this:
scanf("%*s %[^\n]", str);
Note, however, that the above, used exactly as written, is a bad idea because there's nothing to guard against str being overflowed (scanf doesn't know its size). You can, of course, set a predefined maximum size and specify that as a field width, but then your program may not work correctly on some valid input.
If the size of the line, as defined by input format, isn't limited, then your only practical option is to use fgetc to read data char by char, periodically reallocating the buffer as you go. If you do that, then modifying it to drop all read chars until the first whitespace is fairly trivial.
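The character-by-character approach could be sketched like this. The function name read_data_field, the initial capacity of 16, and the doubling strategy are all assumptions for illustration. It drops the title, skips the whitespace after it, and then grows the buffer as needed until the end of the line.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

/* Reads one "title: DATA" line from fp and returns DATA as a newly
 * allocated string of any length (the caller must free it).
 * Returns NULL if memory runs out. At end of input the result is
 * an empty string. */
char *read_data_field(FILE *fp)
{
    size_t cap = 16;    /* assumed starting capacity */
    size_t len = 0;
    char *buf;
    int c;

    /* Drop everything up to the first whitespace (the title). */
    while ((c = fgetc(fp)) != EOF && !isspace(c))
        ;
    /* Skip the whitespace after the title, but stay on this line. */
    while (c != EOF && c != '\n' && isspace(c))
        c = fgetc(fp);

    buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    while (c != EOF && c != '\n') {
        if (len + 1 == cap) {               /* keep room for the NUL */
            char *tmp = realloc(buf, cap * 2);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
        buf[len++] = (char)c;
        c = fgetc(fp);
    }
    buf[len] = '\0';
    return buf;
}
```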
A %s specifier in fscanf skips any whitespace on the input, then reads a string of non-whitespace characters up to and not including the next whitespace character.
If you want to read up to a newline, you can use %[^\n] as a specifier. In addition, a ' ' in the format string will skip whitespace on the input. So if you use
fscanf(inputFile, "%*s %[^\n]", str);
it will read the first thing on the line up to the first whitespace ("title:" in your case), and throw it away, then will read whitespace chars and throw them away, then will read all chars up to a newline into str, which sounds like what you want.
Be careful that str doesn't overflow -- you might want to use
fscanf(inputFile, "%*s %100[^\n]", str)
to limit the maximum string length you'll read (100 characters, not counting the terminating NUL, so str needs room for at least 101 bytes).
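Applied to a line that has already been read with fgets, as in the updated question, the format could be wrapped like this. The name extract_data and the 100-byte destination are assumptions for the sketch. Checking the return value matters because a line with a title but no data leaves the output untouched.

```c
#include <stdio.h>

/* Skips the first whitespace-delimited token ("title:") and copies the
 * rest of the line, spaces included, into out (which must hold 100 bytes).
 * Returns 1 if a data field was found, 0 otherwise. */
int extract_data(const char *line, char out[100])
{
    /* %*s drops the title, the space skips the whitespace after it,
     * and %99[^\n] keeps everything up to the newline. */
    return sscanf(line, "%*s %99[^\n]", out) == 1;
}
```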
You're running up against the limits of what the *scanf family is good for. With fairly minimal changes you could try using the string-scanning modules from Dave Hanson's C Interfaces and Implementations. This stuff is a retrofit from the programming language Icon, an extremely simple and powerful string-processing language which Hanson and others worked on at Arizona. The departure from sscanf won't be too severe, and it is simpler, easier to work with, and more powerful than regular expressions. The only down side is that the code is a little hard to follow without the book—but if you do much C programming, the book is well worth having.

Resources