sscanf format string explicit matching of %d - c

I have the following c code
if (sscanf(input, "s %d\n", &value) == 1){}
where it is supposed to parse the following
s -inf
....
s 0
s 1
s 2
....
s inf
but not such things as
s 5junkjunkjunk
Because it shouldn't match as there is something in between %d and the \n. Yet it does work even though it doesn't fit the format string.

This is one of the reasons why one should never use *scanf: it's ridiculously difficult to get it to handle malformed input robustly.
The correct way to parse something like this is: use fgets (or getline if you have it) to read an entire line, manually check for and skip over the leading "s ", then use strtod to parse the number, then check whether strtod set *endp to point to a newline.
If the syntax is even a little more complicated than what you have it may be time to reach for lex and yacc.
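A minimal sketch of the strtod approach described above, assuming the line has already been read into a string (for example by fgets). The helper name parse_s_line is made up for illustration. Note that strtod also accepts fractional and hexadecimal input, so an extra check is needed if only whole numbers and infinities should pass.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: parses a line of the form "s <number>\n".
   Returns true and stores the number in *out only if nothing but a
   newline (or the end of the string) follows the number. */
bool parse_s_line(const char *line, double *out)
{
    if (strncmp(line, "s ", 2) != 0)
        return false;               /* missing the "s " prefix */

    const char *start = line + 2;
    char *endp;
    double value = strtod(start, &endp);

    if (endp == start)
        return false;               /* no number was converted */
    if (*endp != '\n' && *endp != '\0')
        return false;               /* junk after the number */

    *out = value;
    return true;
}
```

With this, "s 5\n" and "s -inf\n" are accepted, while "s 5junkjunkjunk\n" is rejected because endp stops at the 'j'.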

If you read the docs for scanf(), you'll find that every appearance of a whitespace character in the format string matches any number of whitespace characters (including zero) in the input. sscanf() can't be forced to match an entire string -- it "succeeds" if it matches everything in the format string, even if this leaves other characters (junkjunkjunk) unconsumed.
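To make that concrete, here is a small sketch that wraps the question's format string in a function (the name scan_question_format is invented for this example) so the return value can be inspected for different inputs.

```c
#include <stdio.h>

/* Hypothetical helper: returns what sscanf() reports for the
   question's format and stores any converted value in *value. */
int scan_question_format(const char *input, int *value)
{
    return sscanf(input, "s %d\n", value);
}
```

Calling it with "s 5junkjunkjunk" returns 1 with the value 5, because the trailing junk is simply left unconsumed. "s5" also returns 1, since the space in the format matches zero whitespace characters. "s inf" returns 0, because %d cannot convert "inf" at all.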

Posted after an answer was accepted.
inf is nicely read as a floating point.
Agree scanf() is cumbersome, but sscanf() is versatile.
Note: "%n" reports the number of char scanned and does not contribute to sscanf() result.
char buf[100];
if (fgets(buf, sizeof buf, input) == NULL) EOForIOerror();
int n;
double x;
if (sscanf(buf, "s %lf %n", &x, &n) != 1 || buf[n]) ScanFailure();
// if needed
if (!isinf(x) && round(x) != x) NumberIsNotInfinityNorWholeNumber();
// x is good to go
Details:
If no legit number is found, sscanf() does not return 1 and ScanFailure() is called.
Else, if a legit number is found, " %n" scans 0 or more trailing white-spaces such as '\n'. Then it sets the offset of the scan in n. This location will be the first non-white-space after the number. If it is not '\0' then ScanFailure() is called.
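The same "%n" check, packaged as a self-contained function for anyone who wants to try it on strings directly. The function name scan_whole_line is an assumption for this sketch; the format string is the one used above.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper: true only if buf is "s <number>" followed by
   nothing except optional white-space. */
bool scan_whole_line(const char *buf, double *x)
{
    int n = 0;

    /* "%n" does not count toward the return value, so 1 means the
       "%lf" conversion succeeded and n has been set. */
    if (sscanf(buf, "s %lf %n", x, &n) != 1)
        return false;

    /* n is the offset of the first non-white-space after the number. */
    return buf[n] == '\0';
}
```

For "s 5junkjunkjunk\n" the conversion succeeds, but buf[n] is 'j', so the function reports failure.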

Related

General questions about scanf and fscanf in C programming language

If I'm not wrong, library function int fscanf(FILE *stream, const char *format, ...) works
exactly the same as function int scanf(const char *format, ...) except that it requires stream selection.
For example if I wanted to read two ints from standard input the code would look something like this.
int first_number;
int second_number;
scanf("%d%d", &first_number, &second_number);
There's no point of me adding newline character in between format specifiers even though the second number is entered in next line of input? Function just looks for next decimal integer right? What happens when I enter two characters instead of ints? Why the function sometimes doesn't work if there's a space between format specifiers?
In addition to that. When reading from file with fscanf(..), lets says the txt file contains next lines:
P6
255
1920 1080
Do I need to specify next line characters in fscanf(..)? I read it like this.
FILE *input = ..
char type[2];
int tr;
int width; int height;
fscanf(input, "%s\n", &type);
fscanf(input, "%d\n", &tr);
fscanf(input, "%d %d\n", &width, &height);
Is there a need for \n to signal next line?
Can fscanf(..) anyhow affect any other functions for reading files like fread()? Or is it a good practice to just stick to one function through the whole file?
scanf(...) operates like fscanf(stdin, ....).
Unless '\n', ' ', or other white spaces are inside a "%[...]", as part of a format for *scanf(), scanning functions the same as if ' ', '\t', or '\n' was used. (Also for '\v', '\r', '\f'.)
// All function the same.
fscanf(input, "%d\n", &tr);
fscanf(input, "%d ", &tr);
fscanf(input, "%d\t", &tr);
There's no point of me adding newline character in between format specifiers even though the second number is entered in next line of input?
All format specifiers except "%n", "%[...]", "%c" consume optional leading white-spaces. With "%d" there is no need other than style to code a leading white-space in the format.
Function just looks for next decimal integer right?
Simply: yes.
What happens when I enter two characters instead of ints?
Scanning stops. The first non-numeric input remains in stdin as well as any following input. The *scanf() return value reflects the incomplete scan.
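A small sketch of how the return value reflects an incomplete scan. It uses sscanf() on a string instead of scanf() on stdin so it is easy to experiment with; count_ints is a made-up name.

```c
#include <stdio.h>

/* Hypothetical helper: returns the number of ints converted by
   "%d%d", or a negative value (EOF) if the input ran out before
   the first conversion. */
int count_ints(const char *input, int *a, int *b)
{
    return sscanf(input, "%d%d", a, b);
}
```

"12\n34" yields 2, "12 xy" yields 1 (only the first int was assigned), "xy" yields 0, and an empty string yields EOF.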
Why the function sometimes doesn't work if there's a space between format specifiers?
Need example. Having spaces between specifiers is not an issue unless the following specifier is one of "%n", "%[...]", "%c".
When reading from file with fscanf(..), .... Do I need to specify next line characters in fscanf(..)?
No. fscanf() is not line orientated. Use fgets() to read lines. fscanf() is challenging to use to read a line. Something like
char buf[100];
int cnt = fscanf(f, "%99[^\n]", buf);
if (cnt == 0) {
buf[0] = 0;
}
if (cnt != EOF) {
cnt = fscanf(f, "%*1[\n]");
}
I read it like this. ... fscanf(input, "%s\n", &type); fscanf(input, "%d\n" &tr); ....
"it" as in a line is not read properly, as "%s", "%d", and "\n" all consume 0, 1, 2, ... '\n' and other white-spaces. They do not read a line nor just the 1 character of the format.
Further "\n" does not complete upon reading 1 '\n', but continues reading all white-spaces until a non-white-space is detected (or end-of-file). Do not append such to the end of a format to read the rest of the line.
If you want to read the trailing '\n', code could use int cnt = fscanf(input, "%d%*1[\n]", &tr);, but code will not know if it succeeded in reading the trailing '\n' after the int. It will have simply read it if it was there. Could use other formats, but really, using fgets() to read a line is better.
Is there a need for \n to signal next line?
No, as a format "\n" reads 0 or more white-spaces, not just new-lines.
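One way to see this is to record where scanning stops with "%n". This sketch uses sscanf() on a string, and the two function names are invented for the comparison.

```c
#include <stdio.h>

/* Hypothetical helpers: each returns the offset at which scanning
   stopped, or -1 if no int could be converted. */
int offset_after_newline_directive(const char *input)
{
    int value, n = -1;
    if (sscanf(input, "%d\n%n", &value, &n) != 1)
        return -1;
    return n;
}

int offset_after_space_directive(const char *input)
{
    int value, n = -1;
    if (sscanf(input, "%d %n", &value, &n) != 1)
        return -1;
    return n;
}
```

For the input "7 \n\n\t next" both functions return 6, the offset of the 'n' in "next": the "\n" directive ran through the blank, both newlines, the tab, and the second blank, exactly as the " " directive did.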
Can fscanf(..) anyhow affect any other functions for reading files like fread()?
Yes. All input functions affect what is available next for other input functions. Mixing fread() and fscanf() is challenging to get right.
is it a good practice to just stick to one function through the whole file?
It certainly is simpler. I recommend to use input functions as building blocks for a helper function to handle your file input.
Tip: Read lines with fgets(), then parse. Set fscanf() aside until you understand why it has so much trouble with unexpected input.
The %d conversion specifier tells scanf and fscanf to skip over any leading whitespace, then read up to the first non-digit character, so you don't need to put a newline between the two %d in the scanf call. If you do add one, it changes nothing: a newline in the format is a whitespace directive that matches any run of whitespace in the input (blanks, tabs, or newlines), including none.
Most conversion specifiers skip over leading whitespace. The only ones that don't are %c, %[, and %n, so you'll want to be careful when using them.
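A short sketch of why %c needs care, using sscanf() on a string. Both function names are made up; the only difference between them is the space in front of %c.

```c
#include <stdio.h>

/* Hypothetical helpers: read an int followed by one character. */
int scan_char_raw(const char *input, char *c)
{
    int value;
    return sscanf(input, "%d%c", &value, c);    /* %c takes the very next byte */
}

int scan_char_skipping(const char *input, char *c)
{
    int value;
    return sscanf(input, "%d %c", &value, c);   /* the space skips whitespace first */
}
```

With the input "42 x", the first version stores the blank in c and the second stores 'x'. Both return 2, so the return value alone will not reveal the difference.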

Parsing string with sscanf that has a string in it

Project is in C. I need to parse strings that are always formatted the following way: integer, whitespace, plus sign, multi-word string, plus sign, white space, integer, whitespace, integer, end-of-line
Example:
10 +This is 1 string+ 2 -1
I'm having a hard time figuring out what to enter in the formatting of sscanf so that the string surrounded by the '+' signs get parsed correctly, without including the + signs. Assuming sscanf can be used for this case.
I tried "%d +%s+ %d %d" and that didn't work.
You use %s but that reads up to the first white space character. You want to read a string of not-plus-signs, so say that's what sscanf() should do:
"%d +%[^+]+ %d %d"
That's a scan set — see POSIX sscanf(). You should also protect yourself from buffer overflow. If you have:
char buffer[256];
use:
"%d +%255[^+]+ %d %d"
Note the off-by-one in the lengths — this is a design feature of the scanf() family of functions. You could skip leading spaces by putting a space after the first + in the format string. It is not possible to skip trailing spaces before the second + in the data; you'll have to remove those separately.
You ask for 'end of line' after the 3rd number. That's fairly hard. You might use:
"%d +%255[^+]+ %d %d %n"
passing an extra pointer to int argument to hold the offset just past the last character parsed. The blank before the %n skips white space, including newlines, so if you read into int nbytes; (passing &nbytes), then you'd check if (buffer[nbytes] != '\0') { …handle trailing garbage… } (but only after checking that you had four successful conversion specifications; %n conversion specifications are not counted in the return value from sscanf() et al). There are other solutions to that; they're all grubby to some extent.
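Putting the scan set, the width limit, and the %n check together gives something like the sketch below. The struct and function names are assumptions for illustration; the format string is the one from this answer.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical record type for "int +text+ int int". */
struct record {
    int id;
    char text[256];
    int a;
    int b;
};

/* Returns true only if all four fields convert and nothing but
   white space follows the last integer. */
bool parse_record(const char *line, struct record *r)
{
    int nbytes = 0;

    if (sscanf(line, "%d +%255[^+]+ %d %d %n",
               &r->id, r->text, &r->a, &r->b, &nbytes) != 4)
        return false;               /* fewer than four conversions */

    return line[nbytes] == '\0';    /* reject trailing garbage */
}
```

For "10 +This is 1 string+ 2 -1\n" this fills in 10, "This is 1 string", 2, and -1. An empty string between the plus signs is rejected, because a scan set must match at least one character.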

Line wise input in c

how can I enforce following input restriction in c?
First line contains float ,
Second line contains float ,
Third line int,
after pressing enter three times in console, program should be able to read each line and put the contents in respective int,int,float variables.
After three enter key press program should not wait for user input and start validation.
some test cases
line1: 34
line2:4
line3:12
result: ok
line1:
line2:4
line3:12
result: not ok
line1: Hi
line2:4
line3:12
result: not ok
so far I used the basics
scanf("%f",&p);
scanf("%f",&r);
scanf("%d",&t);
it works fine for test case 1 and 3, but fails when I leave an empty line.
You should always check the return value of scanf.
The reason is that the return value is what scanf uses to communicate conversion errors, among other errors. For example, if your program tells scanf to expect a sequence of decimal digits, and scanf encounters something that doesn't match that pattern, the return value will indicate this failure.
The value returned will be the number of items that are successfully assigned to. For example,
char str[128];
int x, y = scanf("%d %127s", &x, str);
If y is 1, then it should be assumed that x is safe to use. If y is 2, then it should be assumed that both x and str are safe to use.
This answers part of your question. The next part is how you can go about ensuring that the input is in the form of lines. scanf doesn't strictly deal with lines; it deals with other units, such as %d being an int encoded as a sequence of decimal digits (and a sign); it'll return once the decimal digit sequence ends... There's no guarantee that the decimal digits will occupy the entirety of the line.
There are actually two problems here: leading and trailing whitespace. All format specifiers, with the exception of [, c, C, and n, will cause leading whitespace to be discarded. If you want to handle leading whitespace differently, you'll need to codify how you expect leading whitespace to be handled.
Consider that discarding user input is almost always (if not always) a bad idea. If you don't care what the remainder of the line contains, you could use something like scanf("%*[^\n]"); getchar(); to discard everything trailing up to and including the '\n' newline character... The first statement would attempt to read as many non-newline characters as possible, and the second would discard the terminating newline character. However, if you want to ensure that the input occupies the entirety of the line, then you need to test the value returned by getchar.
An example using all of these considerations:
/* Test for leading whitespace (including newlines) */
int c = getchar();
if (c != '-' && !isdigit(c)) {
/* Leading whitespace (or some other non-numeric character) found */
}
ungetc(c, stdin);
/* Test for correct data conversion */
int x, y = scanf("%d", &x);
if (y != 1) {
/* Something non-numeric was entered */
}
/* Test for trailing newline */
c = getchar();
if (c != '\n') {
/* Something other than a newline follows the number */
}
Armed with this information, perhaps you can come up with an attempt and update your question with some code if you have any problems...
P.S. I noticed in the code you wrote, you seem to have %f and %d confused; %f is for reading into floats, and %d is for reading into ints, not the other way around...
As soon as I read line wise input, I know that fgets + sscanf must be used instead of direct scanf. Of course you can use getc/getchar as a workaround, but you can get corner cases, where I find fgets + sscanf cleaner. Example to get a float alone on a line:
char line[80], dummy[2];
float val;
if (fgets(line, sizeof(line), stdin) == NULL)...
if (sscanf(line, "%f%1s", &val, dummy) != 1)...
// Ok val was alone on the line with optional ignored blanks before and/or after
You could also add a test for loooong lines :
if ((line[0] != 0) && (line[strlen(line)-1] != '\n'))...
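Here is the "%f%1s" check as a small function that can be run against the question's test cases. The name float_alone_on_line is invented; in a real program the line argument would come from fgets().

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper: true only if the line holds exactly one
   float, with optional blanks before and/or after it. */
bool float_alone_on_line(const char *line, float *val)
{
    char dummy[2];

    /* 1   -> the float converted and no non-blank text followed it
       2   -> something else followed the float
       0   -> the line does not start with a number
       EOF -> the line is empty or all white space */
    return sscanf(line, "%f%1s", val, dummy) == 1;
}
```

This accepts "34\n", and rejects both the empty line "\n" and "Hi\n", which covers the three test cases in the question.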

Taking formatted input : sscanf not ignoring white spaces

I have to find out the input hours and minutes after taking inputs from the user of the form :
( Number1 : Number2 )
eg: ( 12 : 21 )
I should report 12 hours and 21 minutes and then again wait for input. If there is a mismatch in the given format, I should report it as invalid input. I wrote this code :
#include<stdio.h>
int main()
{
int hourInput=0,minutesInput=0;
char *buffer = NULL;
size_t size;
do
{
puts("\nEnter current time : ");
getline ( &buffer, &size, stdin );
if ( 2 == sscanf( buffer, "%d:%d", &hourInput, &minutesInput ) && hourInput >= 0 && hourInput <= 24 && minutesInput >=0 && minutesInput <= 60 )
{
printf("Time is : %d Hours %d Minutes", hourInput, minutesInput );
}
else
{
puts("\nInvalid Input");
}
}
while ( buffer!=NULL && buffer[0] != '\n' );
return 0;
}
Q. if someone gives spaces between the number and :, my program considers it as invalid input, while I should treat it as valid.
Can someone explain why it is happening and any idea to get rid of this issue ? As far as I understand, sscanf should ignore all the white spaces ?
To allow optional spaces before ':', replace
"%d:%d"
with
"%d :%d"
sscanf() ignores white space where its format directives tell it to ignore, not everywhere. A whitespace character in the directive such as ' ' will ignore all white spaces. %d as well as other integer and floating point directives will ignore leading white space. Thus a space before %d is redundant.
C11 7.21.6.2 paragraph 8: Input white-space characters (as specified by the isspace function) are skipped, unless the specification includes a [, c, or n specifier.
Additional considerations include using %u and unsigned as an alternate way to not accept negative numbers. strptime() is a common function used for scanning strings for time info.
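A quick sketch comparing the two format strings on the same inputs. The function names are made up, and the range checks from the question are left out to keep the focus on white space.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper using the question's original format. */
bool parse_time_strict(const char *line, int *h, int *m)
{
    return sscanf(line, "%d:%d", h, m) == 2;
}

/* Hypothetical helper using the format with a space before ':'. */
bool parse_time(const char *line, int *h, int *m)
{
    return sscanf(line, "%d :%d", h, m) == 2;
}
```

"12:21\n" passes both. "12 : 21\n" passes only parse_time: with "%d:%d" the blank after 12 does not match ':', so scanning stops after one conversion. The blank after the colon is never a problem, because %d skips leading white space on its own.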
I think if you put a space before and after the colon, it will ignore any whitespace and still work when they don't put spaces before and after the colon.
Like this:
sscanf( buffer, "%d : %d", &hourInput, &minutesInput )
Use "%d : %d" as the format string. It will work with and without spaces.
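First, take care with buffer before using it: getline() allocates the memory itself when buffer is NULL, but size should be initialized to 0.
Second, is this a C++ program or C? getline is a POSIX function, not a standard C one.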
Check this
int main()
{
int x=0,y=0;
char bff[]="7 8";
sscanf(bff,"%d%d",&x,&y);
printf("%d %d",x,y);
}
Output: 7 8
From the sscanf man page:
RETURN VALUE
These functions return the number of input items assigned. This can be fewer than provided for, or even zero, in the event of a matching failure. Zero indicates that, although there was input available, no conversions were assigned; typically this is due to an invalid input character, such as an alphabetic character for a `%d' conversion. The value EOF is returned if an input failure occurs before any conversion such as an end-of-file occurs. If an error or end-of-file occurs after conversion has begun, the number of conversions which were successfully completed is returned.
Now,
if someone gives spaces between the number and :, my program considers
it as invalid input
Yes, your program should treat it as invalid, because sscanf reads from the buffer exactly as the format %d:%d dictates. If a character in the input conflicts with the format string, the function stops with a matching failure.
Characters outside of conversion specifications are expected to match the sequence of characters in the input stream; the matched characters in the input stream are scanned but not stored.
i.e., sscanf skips whitespace only as part of a conversion that stores a value.
Be careful when comparing the sscanf() return value. In your case, it always depends on the user input: if the user puts spaces in the input, this value changes.

Reading text with sscanf and fgets

So my text file looks similar to this
1. First 1.1
2. Second 2.2
Essentially an integer, string and then a float.
Using sscanf() and fgets(), in theory I should be able to scan this in (I have to do it in this format), but I only get the integer. Can someone help point out what I am doing wrong?
while(!feof(foo))
{
fgets(name, sizeof(name) - 1, foo);
sscanf(name,"%d%c%f", &intarray[i], &chararray[i], &floatarray[i]);
i++;
}
Where intarray, chararray, and floatarray are 1D arrays and i is an int initialized to 0.
The structure of the loop is wrong; you should not use feof() like that and you must always check the status of both fgets() and sscanf(). This code avoids overflowing the input arrays, too.
enum { MAX_ENTRIES = 10 };
int i;
int intarray[MAX_ENTRIES];
float floatarray[MAX_ENTRIES];
char chararray[MAX_ENTRIES][50];
for (i = 0; i < MAX_ENTRIES && fgets(name, sizeof(name), foo) != 0; i++)
{
if (sscanf(name,"%d. %49s %f", &intarray[i], chararray[i], &floatarray[i]) != 3)
...process format error...
}
Note the major changes:
The dot after the integer must be scanned by the format string.
The chararray has to be a 2D array to make any sense. If you read a single character with %c, it would contain the space after the first number, and the subsequent conversion specification (for the float value) would fail because the string name is not a floating point value.
The & in front of chararray[i] is not wanted when it is a 2D array. It would be needed if you were really reading a single character in a 1D array of characters instead of the whole string such as 'First' or 'Second' from the sample data.
The test checks that three values were converted successfully. Any smaller value indicates problems. With sscanf(), you'd only get EOF returned if there was nothing in the string for the first conversion specification to work on (empty string, all white space); you'd get 0 returned if the first non-blank was alphabetic or a punctuation character other than + or -, etc.
If you really want a single character instead of the name, then you'll have to arrange to read the extra characters in the word, maybe using:
if (sscanf(name,"%d. %c%*s %f", &intarray[i], chararray[i], &floatarray[i]) != 3)
There's a space before the %c which is crucial; it will skip white space in the input, and then the %c will pick up the first non-blank character. The %*s will read more characters, skipping any white space (there won't be any) and then scanning a string of characters up to the next white space. The * suppresses an assignment; the scanned data won't be stored anywhere.
One of the major advantages of the fgets() plus sscanf() paradigm is that when you report the format error, you can report to the user the complete line of input that caused problems. If you use raw fscanf() or scanf(), you can only report on the first character that caused trouble, typically up to the end of the line, and then only if you write code to read that data. It is fiddlier (so the reporting is usually not very careful), and the available information is not as helpful to the user on those rare occasions when the reporting tries to be careful.
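A sketch of that paradigm as two small functions, assuming the "integer, dot, word, float" layout from the question. The struct, the function names, and the 256-byte line buffer are choices made for this example only.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical record for one line such as "1. First 1.1". */
struct entry {
    int number;
    char name[50];
    float value;
};

/* Parses one line; true only if all three fields converted. */
bool parse_entry(const char *line, struct entry *e)
{
    return sscanf(line, "%d. %49s %f",
                  &e->number, e->name, &e->value) == 3;
}

/* Reads up to max entries from fp, reporting each bad line in full. */
size_t load_entries(FILE *fp, struct entry *out, size_t max)
{
    char line[256];
    size_t n = 0;

    while (n < max && fgets(line, sizeof line, fp) != NULL) {
        if (parse_entry(line, &out[n]))
            n++;
        else
            fprintf(stderr, "format error in line: %s", line);
    }
    return n;
}
```

Because the whole line is still in the buffer when sscanf() fails, the error message can show exactly what was read, which is the advantage described above.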
You need to change your format string to:
"%d. %s %f"
The dot matches the '.' that follows the integer in your data. The spaces are because you have spaces in your input data, and the %s is because you want to read a multi-character string at that point (%c only reads one character); don't worry though, as %s won't read past a space. You'll need to make sure you've got enough space in the target buffer to read the string, of course.
If you only want the first character of the second word, try:
"%d. %c%s %f"
And add an extra (dummy) buffer to receive the string parsed by %s which you want to discard.
Won't it be %s for the string? Otherwise it will only read a single character with %c, and then the float value might be affected.
Try "%d %s %f"
%s won't help since it may read the float value itself. As far as I know, %c reads a single character; it then searches for a space, which leads to the problem. To scan the word, you can use a loop (terminated by a space, of course).

Resources