Detecting mismatches against constants in scanf format string - c

From the man page of scanf:
A directive is one of the following:
A sequence of white-space characters (space, tab, newline, etc.; see isspace(3)). This directive matches any amount of
white space, including none, in the input.
An ordinary character (i.e., one other than white space or '%'). This character must exactly match the next character of
input. (emphasis mine)
A conversion specification, which commences with a '%' (percent) character. A sequence of characters from the input is
converted according to this specification, and the result is placed in
the corresponding pointer
argument. If the next item of input does not match the conversion specification, the conversion fails—this is a matching
failure.
Now, consider the following code:
#include <stdio.h>
int main(void)
{
const char* fmt = "A %49s B";
char buf[50];
printf("%d\n", sscanf("A foo B", fmt, buf)); // 1
printf("%d\n", sscanf("blah blaaah blah", fmt, buf)); // 0
printf("%d\n", sscanf("A blah blah", fmt, buf)); // 1
return 0;
}
Lines 1 and 3 print 1 because matching "A" with "A" succeeds, as does matching "foo"/"blah" with %s. Line 2 prints 0 because "A" cannot be matched with "blah", so parsing stops there.
This is all fine and logical, but is there any way for me to detect that a matching failure occurred after all conversion specifications have been successfully matched and assigned? In that case, the value returned by scanf will be the number of conversion specifiers in my format string, so I can't use it to tell if matching succeeded till the very end.
In other words: the string fed to sscanf in line 3 is not "valid" in the sense that it's not in the format A [something] B. Can I use scanf to detect this, or is strtok my only option?

Employ a " %n" at the end of the format.
Directives:
" " scans 0 or more white-space. It does not fail.
"%n" saves the count of the number of characters parsed so far (as an int). It does not fail.
Set n to 0 and test to see that it changed. The change would only happen if the entire preceding format succeeded. Also test that the scan ended on a null character, thus detecting trailing unwanted text.
The added " ", though optional, is very useful, as trailing white-space (often a '\n') is typically not offensive. It removes the need to preprocess a scanned line of text to strip its line ending.
#include <stdio.h>
void test(const char *s) {
const char* fmt = "A %49s B %n";
char buf[50];
int n = 0;
int cnt = sscanf(s, fmt, buf, &n);
int success = n > 0 && s[n] == '\0';
printf("sscanf():%2d n:%2d success:%d '%s'\n", cnt, n, success, s);
}
int main(void) {
test("A foo B");
test("blah blaaah blah");
test("A blah blah");
test("A foo B ");
test("A foo B x");
test("");
return 0;
}
Output
sscanf(): 1 n: 7 success:1 'A foo B'
sscanf(): 0 n: 0 success:0 'blah blaaah blah'
sscanf(): 1 n: 0 success:0 'A blah blah'
sscanf(): 1 n: 8 success:1 'A foo B '
sscanf(): 1 n: 8 success:0 'A foo B x'
sscanf():-1 n: 0 success:0 ''
Note that success is determined by n alone. On lack of success, the destination scanned variables like buf should not be used. If a partial result is needed, then use the return value of sscanf().

If you want to parse more complex input, use a proper parser/lexer. Otherwise, have a look at the %n conversion specifier:
No input is consumed. The corresponding argument shall be a pointer to signed integer into which is to be written the number of characters read from the input stream so far by this call to the fscanf function. Execution of a %n directive does not increment the assignment count returned at the completion of execution of the fscanf function. No argument is converted, but one is consumed. If the conversion specification includes an assignment-suppressing character or a field width, the behavior is undefined.
You can use this multiple times: after the last variable conversion and one at the end.
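As a rough sketch of the "use it multiple times" idea, the function below places one %n right after the %49s conversion and another at the very end of the format. The name classify_a_b and its return codes are made up for this illustration; only sscanf and %n come from the answers above.

```c
#include <stdio.h>

/* Hypothetical helper, not part of any answer above.
   Returns 2 if s fully matches "A <word> B" (trailing white space allowed),
           1 if "A <word>" matched but the rest of the format did not,
           0 if not even the word could be read. */
int classify_a_b(const char *s, char word[50])
{
    int after_word = -1;  /* written by the first %n only if the word was read */
    int at_end = -1;      /* written by the second %n only if "B" matched too */

    (void)sscanf(s, "A %49s%n B %n", word, &after_word, &at_end);

    if (at_end >= 0 && s[at_end] == '\0')
        return 2;
    if (after_word >= 0)
        return 1;
    return 0;
}
```

Both counters start at -1, so a value that is still negative afterwards means the corresponding %n was never reached.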

For OP's use case, regex can equally be used to match the pattern.
/* see http://linux.die.net/man/3/regex */
#include <stdio.h>
#include <stdlib.h>
#include <regex.h>
int regexp(const char *);
int main(int argc, char **argv){
printf("%d\n", regexp("A foo B"));
printf("%d\n", regexp("blah blaaah blah"));
printf("%d\n", regexp("A blah blah"));
return EXIT_SUCCESS;
}
int regexp(const char *input_str){
char msgbuf[100]; /* receives the text produced by regerror() */
regex_t regex;
int rcval;
/* compile regexp - see http://linux.die.net/man/3/regcomp */
/* \s is a GNU extension; [[:space:]] is the portable POSIX character class */
rcval = regcomp(&regex, "^A[[:space:]].*[[:space:]]B$", 0);
if (rcval) {
fprintf(stderr, "Could not compile regex\n");
return -1;
}
/* execute regexp - see http://linux.die.net/man/3/regexec */
rcval = regexec(&regex, input_str, 0, NULL, 0);
if (!rcval) {
fprintf(stdout, "Match\n");
regfree(&regex);
return 1;
}else{
if (rcval == REG_NOMATCH) {
fprintf(stdout, "No match\n");
regfree(&regex);
return 0;
}else{
regerror(rcval, &regex, msgbuf, sizeof(msgbuf));
fprintf(stderr, "Regex match failed: %s\n", msgbuf);
regfree(&regex);
return -1;
}
}
return 0; // default to no match
}

If you are only interested in whether the entire format has been scanned, but do not care about trailing text in the stream, you can use this terse method:
int success = 0;
fscanf(stream, "A %49s B%n", buf, &success);
if(!success) handleError();
The trick is that the "%n" conversion does not happen unless the entire format is matched successfully. Since any matching of the format will consume one or more characters from the stream, the value that the "%n" writes will always be non-zero. Thus, success will be set to a truthy value if, and only if, the entire format was matched.
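The same trick can be tried on a string with sscanf instead of a stream. This is only a sketch; starts_with_a_b is a hypothetical name, and the format is the one from the snippet above.

```c
#include <stdio.h>

/* Hypothetical helper: nonzero if s begins with text matching "A <word> B".
   Anything after the B is deliberately ignored. */
int starts_with_a_b(const char *s)
{
    char buf[50];
    int n = 0;  /* stays 0 unless the %n at the end of the format is reached */

    (void)sscanf(s, "A %49s B%n", buf, &n);
    return n != 0;
}
```

Because the literal "A" has to consume at least one character before %n can run, n can never legitimately be 0 after a full match.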

Related

fscanf with whitespaces as separators - what format should I use?

I have a txt file that its lines are as follows
[7 chars string][whitespace][5 chars string][whitespace][integer]
I want to use fscanf() to read all these into memory, and I'm confused about what format should I use.
Here's an example of such line:
hello box 94324
Notice the filling whitespaces in each string, apart from the separating whitespace.
Edit: I know about the recommendation to use fgets() first, I cannot use it here.
Edit: here's my code
typedef struct Product {
char* id; //Product ID number. This is the key of the search tree.
char* productName; //Name of the product.
int currentQuantity; //How many items are there in stock, currently.
} Product;
int main()
{
FILE *initial_inventory_file = NULL;
Product product = { NULL, NULL, 0 };
//open file
initial_inventory_file = fopen(INITIAL_INVENTORY_FILE_NAME, "r");
product.id = malloc(sizeof(char) * 10); //- Product ID: 9 digits exactly. (10 for null character)
product.productName = malloc(sizeof(char) * 11); //- Product name: 10 chars exactly.
//go through each line in inital inventory
while (fscanf(initial_inventory_file, "%9c %10c %i", product.id, product.productName, &product.currentQuantity) != EOF)
{
printf("%9c %10c %i\n", product.id, product.productName, product.currentQuantity);
}
//cleanup...
...
}
Here's a file example: (it's actually 10 chars, 9 chars, and int)
022456789 box-large 1234
023356789 cart-small 1234
023456789 box 1234
985477321 dog food 2
987644421 cat food 5555
987654320 snaks 4444
987654321 crate 9999
987654322 pillows 44
Assuming your input file is well-formed, this is the most straightforward version:
char str1[8] = {0};
char str2[6] = {0};
int val;
...
int result = fscanf( input, "%7s %5s %d", str1, str2, &val );
If result is equal to 3, you successfully read all three inputs. If it's less than 3 but not EOF, then you had a matching failure on one or more of your inputs. If it's EOF, you've either hit the end of the file or there was an input error; use feof( input ) to test for EOF at that point.
If you can't guarantee your input file is well-formed (which most of us can't), you're better off reading in the entire line as text and parsing it yourself. You said you can't use fgets, but there's a way to do it with fscanf:
char buffer[128]; // or whatever size you think would be appropriate to read a line at a time
/**
* " %127[^\n]" tells scanf to skip over leading whitespace, then read
* up to 127 characters or until it sees a newline character, whichever
* comes first; the newline character is left in the input stream.
*/
if ( fscanf( input, " %127[^\n]", buffer ) == 1 )
{
// process buffer
}
You can then parse the input buffer using sscanf:
int result = sscanf( buffer, "%7s %5s %d", str1, str2, &val );
if ( result == 3 )
{
// process inputs
}
else
{
// handle input error
}
or by some other method.
EDIT
Edge cases to watch out for:
Missing one or more inputs per line
Malformed input (such as non-numeric text in the integer field)
More than one set of inputs per line
Strings that are longer than 7 or 5 characters
Value too large to store in an int
EDIT 2
The reason most of us don't recommend fscanf is because it sometimes makes error detection and recovery difficult. For example, suppose you have the input records
foo bar 123r4
blurga blah 5678
and you read it with fscanf( input, "%7s %5s %d", str1, str2, &val );. fscanf will read 123 and assign it to val, leaving r4 in the input stream. On the next call, r4 will get assigned to str1, blurg (the first five characters of blurga, because of the %5s width) will get assigned to str2, and you'll get a matching failure on the leftover a. Ideally you'd like to reject the whole first record, but by the time you know there's a problem it's too late.
If you read it as a string first, you can parse and check each field, and if any of them are bad, you can reject the whole thing.
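A minimal sketch of that idea, assuming the line is already in memory: parse into a temporary, use %n to check that nothing unexpected follows the integer, and only then hand the result to the caller. The struct parsed type and the parse_line name are invented for this example.

```c
#include <stdio.h>

/* Hypothetical record type matching the "%7s %5s %d" layout used above. */
struct parsed {
    char str1[8];
    char str2[6];
    int  val;
};

/* Returns 1 and fills *out only if the whole line is a valid record;
   returns 0 and leaves *out untouched otherwise. */
int parse_line(const char *line, struct parsed *out)
{
    struct parsed tmp;
    int n = 0;

    if (sscanf(line, "%7s %5s %d %n", tmp.str1, tmp.str2, &tmp.val, &n) != 3)
        return 0;               /* a field is missing or malformed */
    if (line[n] != '\0')
        return 0;               /* trailing junk, e.g. the "r4" in "123r4" */

    *out = tmp;
    return 1;
}
```

With this, the record foo bar 123r4 is rejected as a whole, because the scan stops at the r rather than at the end of the line.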
Let's assume the input is
<LWS>* <first> <LWS>+ <second> <LWS>+ <integer>
where <LWS> is any whitespace character, including newlines; <first> has one to seven non-whitespace characters; <second> has one to five non-whitespace characters; <integer> is an optionally signed integer (in hexadecimal if it begins with 0x or 0X, in octal if it begins with 0, or in decimal otherwise); * indicates zero or more of the preceding element; and + indicates one or more of the preceding element.
Let's say you have a structure,
struct record {
char first[8]; /* 7 characters + end-of-string '\0' */
char second[6]; /* 5 characters + end-of-string '\0' */
int number;
};
then you can read the next record from stream in into the structure pointed to by the caller using e.g.
#include <stdlib.h>
#include <stdio.h>
/* Read a record from stream 'in' into *'rec'.
Returns: 0 if success
-1 if invalid parameters
-2 if read error
-3 if non-conforming format
-4 if bug in function
+1 if end of stream (and no data read)
*/
int read_record(FILE *in, struct record *rec)
{
int rc;
/* Invalid parameters? */
if (!in || !rec)
return -1;
/* Try scanning the record. */
rc = fscanf(in, " %7s %5s %d", rec->first, rec->second, &(rec->number));
/* All three fields converted correctly? */
if (rc == 3)
return 0; /* Success! */
/* Only partially converted? */
if (rc > 0)
return -3;
/* Read error? */
if (ferror(in))
return -2;
/* End of input encountered? */
if (feof(in))
return +1;
/* Must be a bug somewhere above. */
return -4;
}
The conversion specifier %7s converts up to seven non-whitespace characters, and %5s up to five; the array (or char pointer) must have room for an additional end-of-string nul byte, '\0', which the scanf() family of functions add automatically.
If you do not specify the length limit and use plain %s, the input can overrun the specified buffer. This is a classic cause of buffer overflow bugs.
The return value from the scanf() family of functions is the number of successful conversions (possibly 0), or EOF if an error occurs. Above, we need three conversions to fully scan a record. If we scan just 1 or 2, we have a partial record. Otherwise, we check if a stream error occurred, by checking ferror(). (Note that you want to check ferror() before feof(), because an error condition may also set feof().) If not, we check if the scanning function encountered end-of-stream before anything was converted, using feof().
If none of the above cases were met, then the scanning function returned zero or a negative value with neither ferror() nor feof() returning true. Because the scanning pattern starts with (whitespace and) a %s conversion, which can fail only at end of input or on a read error, it should never return zero. The only negative return value from the scanf() family of functions is EOF, which should cause feof() or ferror() to return true. So, if none of the above cases were met, there must be a bug in the code, triggered by some odd corner case in the input.
A program that reads structures from some stream into a dynamically allocated buffer typically implements the following pseudocode:
Set ptr = NULL # Dynamically allocated array
Set num = 0 # Number of entries in array
Set max = 0 # Number of entries allocated for in array
Loop:
If (num >= max):
Calculate new max; num + 1 or larger
Reallocate ptr
If reallocation failed:
Report out of memory
Abort program
End if
End if
rc = read_record(stream, ptr + num)
If rc == 1:
Break out of loop
Else if rc != 0:
Report error (based on rc)
Abort program
End if
End Loop
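The pseudocode above might translate to C roughly as follows. This is a sketch under a few assumptions: load_records is a hypothetical name, the scan is inlined rather than calling read_record(), the capacity doubles (starting at 16) when the array is full, and errors are returned to the caller instead of aborting the program.

```c
#include <stdio.h>
#include <stdlib.h>

struct record {
    char first[8];   /* 7 characters + '\0' */
    char second[6];  /* 5 characters + '\0' */
    int  number;
};

/* Hypothetical helper. Reads records from 'in' until end of input.
   Returns 0 on success (array in *out, length in *count; caller frees it),
   -1 on malformed input, read error, or out of memory. */
int load_records(FILE *in, struct record **out, size_t *count)
{
    struct record *ptr = NULL;  /* dynamically allocated array */
    size_t num = 0;             /* entries in use */
    size_t max = 0;             /* entries allocated */

    for (;;) {
        if (num >= max) {
            size_t new_max = max ? 2 * max : 16;
            struct record *tmp = realloc(ptr, new_max * sizeof *tmp);
            if (!tmp) {
                free(ptr);
                return -1;      /* out of memory */
            }
            ptr = tmp;
            max = new_max;
        }

        int rc = fscanf(in, " %7s %5s %d",
                        ptr[num].first, ptr[num].second, &ptr[num].number);
        if (rc == 3) {
            num++;              /* complete record stored */
            continue;
        }
        if (rc == EOF && !ferror(in))
            break;              /* clean end of input */

        free(ptr);
        return -1;              /* partial record or read error */
    }

    *out = ptr;
    *count = num;
    return 0;
}
```

Growing geometrically keeps the number of realloc() calls small; growing by one entry at a time, as the pseudocode permits, would also work.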
The issue in your code using the "%9c ..."-format is that %9c does not write the string terminating character. So your string is probably filled with garbage and not terminated at all, which leads to undefined behaviour when printing it out using printf.
If you set the complete content of the strings to 0 before the first scan, it should work as intended. To achieve this, you can use calloc instead of malloc; this will initialise the memory with 0.
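An alternative to zero-filling the buffers is to write the terminators explicitly after a successful scan. The sketch below does that on a string with sscanf; parse_product is a hypothetical helper, and the field widths (9 and 10) are the ones from the question's code.

```c
#include <stdio.h>

/* Hypothetical helper. Parses "<9 chars> <10 chars> <int>" from line.
   %9c and %10c copy exactly that many characters (padding spaces included)
   and do NOT append '\0', so the terminators are written by hand.
   Returns 1 on success, 0 otherwise. */
int parse_product(const char *line, char id[10], char name[11], int *qty)
{
    if (sscanf(line, "%9c %10c %d", id, name, qty) != 3)
        return 0;

    id[9] = '\0';
    name[10] = '\0';
    return 1;
}
```

The name keeps its padding, so a line such as 985477321 dog food 2 (with the name padded to ten characters) yields a ten-character name that ends in spaces.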
Note that the code also has to consume the newline character somehow, which is done here by an additional fscanf(f,"%*c") statement (the * indicates that the value is consumed, but not stored to a variable). This will work only if there is no other white space between the last digit and the newline character:
int main()
{
FILE *initial_inventory_file = NULL;
Product product = { NULL, NULL, 0 };
//open file
initial_inventory_file = fopen(INITIAL_INVENTORY_FILE_NAME, "r");
product.id = calloc(sizeof(char), 10); //- Product ID: 9 digits exactly. (10 for null character)
product.productName = calloc(sizeof(char), 11); //- Product name: 10 chars exactly.
//go through each line in inital inventory
while (fscanf(initial_inventory_file, "%9c %10c %i", product.id, product.productName, &product.currentQuantity) == 3)
{
printf("%9s %10s %i\n", product.id, product.productName, product.currentQuantity);
fscanf(initial_inventory_file,"%*c");
}
//cleanup...
}
Have you tried the format specifiers?
char seven[8] = {0};
char five[6] = {0};
int myInt = 0;
// loop here
fscanf(fp, "%s %s %d", seven, five, &myInt);
// save to structure / do whatever you want
If you're sure that the formatting and string lengths are always fixed, you could also iterate over the input character by character (using something like fgetc()) and process it manually. The example above could overflow the buffers (and crash) if a string in the file exceeds 7 or 5 characters.
EDIT Manual Scanning Loop:
char seven[8] = {0};
char five[6] = {0};
int myInt = 0;
// loop this part
for (int i = 0; i < 7; i++) {
seven[i] = fgetc(fp);
}
int sep = fgetc(fp); // consume space
assert(sep == ' '); // needs <assert.h>; fgetc() stays outside assert(), which is compiled out under NDEBUG
for (int i = 0; i < 5; i++) {
five[i] = fgetc(fp);
}
sep = fgetc(fp); // consume space
assert(sep == ' ');
fscanf(fp, "%d", &myInt);

Won't read from file to struct

I've been sitting with this problem for 2 days and I can't figure out what I'm doing wrong. I've tried debugging (kind of? Still kind of new), followed this link: https://ericlippert.com/2014/03/05/how-to-debug-small-programs/ And I've tried Google and all kinds of things. Basically I'm reading from a file with this format:
R1 Fre 17/07/2015 18.00 FCN - SDR 0 - 2 3.211
and I have to make the program read this into a struct, but when I try printing the information it comes out all wrong. My code looks like this:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#define MAX_INPUT 198
typedef struct game{
char weekday[4],
home_team[4],
away_team[4];
int round,
hour,
minute,
day,
month,
year,
home_goals,
away_goals,
spectators;}game;
game make_game(FILE *superliga);
int main(void){
int input_number,
number_of_games = 198,
i = 0;
game tied[MAX_INPUT];
FILE *superliga;
superliga = fopen("superliga-2015-2016.txt", "r");
for(i = 0; i < number_of_games; ++i){
tied[i] = make_game(superliga);
printf("R%d %s %d/%d/%d %d.%d %s - %s %d - %d %d\n",
tied[i].round, tied[i].weekday, tied[i].day, tied[i].month,
tied[i].year, tied[i].hour, tied[i].minute, tied[i].home_team,
tied[i].away_team, tied[i].home_goals, tied[i].away_goals,
tied[i].spectators);}
fclose(superliga);
return 0;
}
game make_game(FILE *superliga){
double spect;
struct game game_info;
fscanf(superliga, "R%d %s %d/%d/%d %d.%d %s - %s %d - %d %lf\n",
&game_info.round, game_info.weekday, &game_info.day, &game_info.month,
&game_info.year, &game_info.hour, &game_info.minute, game_info.home_team,
game_info.away_team, &game_info.home_goals, &game_info.away_goals,
&spect);
game_info.spectators = spect * 1000;
return game_info;
}
The problem is in your file. It starts with whitespaces, not with R's as you stated in the control string.
Check the return value of fscanf() and you'll see that it's zero every time.
If you add a leading whitespace to your fscanf() call, your problem will be solved, like this:
fscanf(superliga, " R%d %s %d/%d/%d %d.%d %s - %s %d - %d %lf\n",
&game_info.round, game_info.weekday, &game_info.day, &game_info.month,
&game_info.year, &game_info.hour, &game_info.minute, game_info.home_team,
game_info.away_team, &game_info.home_goals, &game_info.away_goals,
&spect);
If each line in the file is a separate record, you should read each line as a string, then try to parse each string.
(Note that this also has the added feature of speculative parsing: you can try parsing the line in several different formats, and accept the one that parses correctly. I like to use this when I accept e.g. vector inputs, so that the user can use x y z, x, y, z, x/y/z, (x,y,z), [x,y,z], <x y z>, <x,y,z>, and so on, depending on what they like. It's only one additional scanf per format, after all.)
To read lines, you can use fgets() into a local buffer. The local buffer must be long enough. If the program is to run on POSIX.1 machines only (i.e., not on Windows), then you can use getline() instead, which can dynamically reallocate the given buffer as needed, so you're not limited to any specific line length.
To parse the string, use sscanf().
Note that all tabs, spaces, and newlines in the pattern in all of the scanf family of functions are treated exactly the same: they indicate any number of any type of whitespace. In other words, \n does not mean "and then a newline"; it means the same as a space, i.e. "and possibly some whitespace here". However, all conversions except %c, %[ and %n automatically skip any leading whitespace; so, with the exception of a space before one of those, the spaces in the pattern are only meaningful to us humans and have no functional effect on the scanning.
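The whitespace rules can be checked with a few tiny functions. These helpers are hypothetical and exist only for this sketch; each wraps a single sscanf call.

```c
#include <stdio.h>

/* "\n" in the format behaves like any other white space: it matches any
   amount of spaces, tabs, or newlines, including none. */
int scan_two_ints(const char *input, int *a, int *b)
{
    return sscanf(input, "%d\n%d", a, b);
}

/* %c does not skip leading white space ... */
char first_char(const char *s)
{
    char c = '\0';
    (void)sscanf(s, "%c", &c);
    return c;
}

/* ... unless the format asks for it with a space in front. */
char first_nonspace(const char *s)
{
    char c = '\0';
    (void)sscanf(s, " %c", &c);
    return c;
}
```

Here scan_two_ints() accepts "1 2" even though the format contains a newline and the input does not, which is the point made above about \n.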
All scanf family functions return the number of input items successfully converted and assigned. (%n is not counted: the C standard says it does not increment the assignment count, although the Linux man page advises against relying on that.) If end of input or a read error occurs before the first conversion, the functions return EOF. If the input merely fails to match the pattern, they return the number of assignments made so far, which may be zero.
Suppressed conversions are not counted. If you have a word in the input you don't need, you can convert but discard it with %*s, but nothing is assigned, so it does not add to the return value. For example, sscanf(line, " %*d %*s %*d") returns 0 even if the line starts with an integer, followed by a word (a run of non-whitespace characters), followed by an integer; to verify such a match, append %n and check that it was set.
Rather than have the function return the parsed structure, pass a pointer to the structure (and the file handle to read from), and return a status code. I prefer 0 for success, and nonzero for failure, but feel free to change that.
In other words, I'd suggest you change your read function into
#ifndef GAME_LINE_MAX
#define GAME_LINE_MAX 1022
#endif
int read_game(game *one, FILE *in)
{
char buffer[GAME_LINE_MAX + 2]; /* + '\n' + '\0' */
char *line;
/* Sanity check: no NULL pointers accepted! */
if (!one || !in)
return -1;
/* Paranoid check: Fail if read error has already occurred. */
if (ferror(in))
return -1;
/* Read the line */
line = fgets(buffer, sizeof buffer, in);
if (!line)
return -1;
/* Parse the game; pattern from OP's example: */
if (sscanf(line, "R%d %3s %d/%d/%d %d.%d %3s - %3s %d - %d %d\n",
&(one->round), one->weekday,
&(one->day), &(one->month), &(one->year),
&(one->hour), &(one->minute),
one->home_team,
one->away_team,
&(one->home_goals),
&(one->away_goals),
&(one->spectators)) < 12)
return -1; /* Line not formatted like above */
/* Spectators in the file are in units of 1000; convert: */
one->spectators *= 1000;
/* Success. */
return 0;
}
To use the above function in a loop, reading games one after another from standard input (stdin):
game g;
while (!read_game(&g, stdin)) {
/* Do something with current game stats, g */
}
if (ferror(stdin)) {
/* Read error occurred! */
} else
if (!feof(stdin)) {
/* Not all data was read/parsed! */
}
The two if clauses above are to check if there was a real read error (as in, a problem with the hardware or something like that), and whether there was unread/unparsed data (not at end of file), respectively.
There are two differences in the scanning pattern compared to the OP: First, all strings parsed are limited to 3 characters, because the structure has only room for 3+1 each. The one character is reserved for the end of string '\0', which is not counted in the maximum length for %s. Second, I parse the spectator count directly, and just multiply the field by 1000 if successful.
Also note how I used one->weekday, one->home_team, and one->away_team to refer to the character arrays. This works because an array used in an expression is converted to a pointer to its first element. (Given char a[5];, a and &(a[0]) both yield a char * to the first element; &a points to the same address but has type char (*)[5], so it is not the right type for %s.) I like to use this "raw form" when scanning, because it makes it easier to match them to %s conversions, and ensure the pattern matches the parameters.

Input/Output scanset in c

#include<stdio.h>
int main()
{
char str[50]={'\0'};
scanf("%[A-Z]s",str);
printf("%s",str);
return 0;
}
1)
Input:
helloWORLD
output:
2)
Input:
HELLoworlD
output:
HELL
In output 1, I expected the output to be "WORLD", but it didn't print anything.
From output 2, I understood that this works only if the first few characters are in upper case.
Can you please explain how it actually works?
Interpretation of scansets
When it is given helloWORLD, the conversion specification %[A-Z] fails immediately because the h is not an upper-case letter. Therefore, scanf() returns 0, indicating that it did not successfully convert anything. If you tested the return value, you'd know that.
When it is given HELLoworlD, the scanset matches the HELL and stops at the first o. The format string also attempts to match a literal s, but there's no way for scanf() to report that it fails to match that after matching HELL.
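If the goal is the first run of upper-case letters wherever it starts (the "WORLD" the question expected), one possible sketch is to skip the non-matching prefix first and then apply the scanset. first_upper_run is a hypothetical helper; note that the meaning of a range such as A-Z inside a scanset is implementation-defined, although common C libraries treat it as the letters A through Z.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copies the first run of upper-case ASCII letters
   found in s into out. Returns 1 if such a run exists, 0 otherwise. */
int first_upper_run(const char *s, char out[50])
{
    /* strcspn() gives the length of the prefix containing no upper-case
       letter, so s then points at the first one (or at the final '\0'). */
    s += strcspn(s, "ABCDEFGHIJKLMNOPQRSTUVWXYZ");

    return sscanf(s, "%49[A-Z]", out) == 1;
}
```

The width 49 matches the 50-byte buffer from the question, and the return value is checked instead of relying on the buffer having been zeroed.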
Buffer overflow
Note that %[A-Z] is in general dangerous (as is %s) because there is no constraint on the number of characters read. If you have:
char str[50];
then you should use:
if (scanf("%49[A-Z]", str) != 1)
...some problem in the scan...
Also note that there is a 'difference by one' between the declared length of str and the number in the format string. This is awkward; there's no way to provide that number as an argument to scanf() separate from the format string (unlike printf()), so you may end up creating the format string on the fly:
int scan_upper(char *buffer, size_t buflen)
{
char format[16];
if (buflen < 2)
return EOF; // Or other error indication
snprintf(format, sizeof(format), "%%%zu[A-Z]", buflen-1); // Check this too!?
return scanf(format, buffer);
}
When you do
scanf("%[A-Z]s",str);
It takes input as long as you enter upper-case letters.
And since you set all the array to '\0', printf() will stop printing when it meets one.
Therefore, the first input is blank, and the second is printing until the end of the upper-case string.

What does the n stand for in `sscanf(s, "%d %n", &i, &n)`?

The man page states that the signature of sscanf is
sscanf(const char *restrict s, const char *restrict format, ...);
I have seen an answer on SO where a function in which sscanf is used like this to check if an input was an integer.
bool is_int(char const* s) {
int n;
int i;
return sscanf(s, "%d %n", &i, &n) == 1 && !s[n];
}
Looking at !s[n] it seems to suggest that we check if sscanf scanned the character sequence until the termination character \0. So I assume n stands for the index where sscanf will be in the string s when the function ends.
But what about the variable i? What does it mean?
Edit:
To be more explicit: I see the signature of sscanf wants a pointer of type char * as first parameter, a format string as second parameter so it knows how to parse the character sequence, and as many variables as there are conversion specifiers as the remaining parameters. I understand now that i is for holding the parsed integer.
Since there is only one format specifier, I tried to deduce the function of n.
Is my assumption above for n correct?
Looks like the op has his answer already, but since I bothered to look this up for myself and run the code...
From "C The Pocket Reference" (2nd Ed by Herbert Shildt) scanf() section:
%n Receives an integer of value equal to the number of characters read so far
and for the return value:
The scanf() function returns a number equal to the number of fields
that were successfully assigned values
The sscanf() function works the same, it just takes its input from the supplied buffer argument (s in this case). The "== 1" test makes sure that exactly one integer was parsed, and the !s[n] makes sure that nothing but white space follows the parsed integer, i.e. that there's really only one integer in the string.
Running this code, an s value like "32" gives a "true" value (we don't have bool defined as a type on our system), but s as "3 2" gives a "false" value because s[n] in that case is "2" and n has the value 2 ("3 " is consumed before %n stores the count in that case). If s is " 3 " this function will still return true, as all that white space is ignored and n has the value of 3.
Another example input, "3m", gives a "false" value as you'd expect.
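The cases discussed above can be reproduced with a self-contained version of the function. This sketch uses <stdbool.h> for bool and initializes n to 0 as a defensive measure; everything else is the code from the question.

```c
#include <stdbool.h>
#include <stdio.h>

/* True if s holds exactly one decimal integer, optionally surrounded by
   white space, and nothing else. */
bool is_int(const char *s)
{
    int i;
    int n = 0;  /* defensive: only read when sscanf() returned 1 */

    /* %d skips leading white space and converts the integer, the space
       consumes any trailing white space, and %n records how many
       characters were consumed in total. */
    return sscanf(s, "%d %n", &i, &n) == 1 && s[n] == '\0';
}
```

For "3m", %d converts the 3, the space matches nothing, and n becomes 1, so s[n] is 'm' and the result is false.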
Verbatim from sscanf()'s man page:
Conversions
[...]
n
Nothing is expected; instead, the number of characters
consumed thus far from the input is stored through the next pointer,
which must be a pointer to int. This is not a
conversion, although it can be suppressed with the * assignment-suppression character. The C
standard says: "Execution of
a %n directive does not increment the assignment count returned at the completion of
execution" but the Corrigendum seems to contradict this. Probably it is wise not
to make any assumptions on the effect of %n conversions on the return value.
I would like to point out a subtlety in the original code, because it is easy to misread:
bool is_int(char const* s) {
int n;
int i;
return sscanf(s, "%d %n", &i, &n) == 1 && !s[n];
}
First, I will interpret the sscanf format string:
"%d %n"
%d skips leading white space and then scans an optionally signed decimal integer. The space matches zero or more white-space characters (space, \n, \t, and so on), so it never fails. %n then stores the number of characters consumed so far, including that white space, into n; it does not add to the return value.
Second, the uninitialized n:
Given input "1", which is the integer one, sscanf stores 1 into i, the space matches nothing, and %n sets n to 1. sscanf returns 1, meaning 1 field assigned, so the part of the expression
sscanf(s, "%d %n", &i, &n) == 1
is true, and !s[n] checks s[1], the terminating '\0'. n stays uninitialized only when %d itself fails; sscanf then returns 0 or EOF, the == 1 test is false, and && short-circuits, so s[n] is never read. Initializing n to 0 is still a cheap safeguard.
If you only need to know whether the string starts with an integer, this looser check is enough:
static bool is_int(char const* s)
{
int i;
int fld;
return (fld = sscanf(s, "%i", &i)) == 1;
}
int main(int argc, char * argv[])
{
bool ans = false;
ans = is_int("1");
ans = is_int("m");
return 0;
}
This code relies on the return value: if s starts with an integer, sscanf will scan it and fld will be exactly one. If it does not, fld will be zero or -1 (EOF): zero if something else is there, like a word, and -1 if the string is empty.
The variable i receives the integer value that was read.
What are you trying to ask, though? It's not too clear. The code will (try to) read an integer from the string into i.

sscanf function usage in c

I'm trying to parse xxxxxx(xxxxx) format string using sscanf as following:
sscanf(command, "%s(%s)", part1, part2)
but it seems like sscanf does not support this format and as a result, part1 actually contains the whole string.
If anyone has experience with this, please share.
Thank you
Converting your code into a program:
#include <stdio.h>
int main(void)
{
char part1[32];
char part2[32];
char command[32] = "xxxxx(yyyy)";
int n;
if ((n = sscanf(command, "%s(%s)", part1, part2)) != 2)
printf("Problem! n = %d\n", n);
else
printf("Part1 = <<%s>>; Part2 = <<%s>>\n", part1, part2);
return 0;
}
When run, it produces 'Problem! n = 1'.
This is because the first %s conversion specifier skips leading white space and then scans for 'non white-space' characters up to the next white space character (or, in this case, end of string).
You would need to use (negated) character classes or scansets to get the result you want:
#include <stdio.h>
int main(void)
{
char part1[32];
char part2[32];
char command[32] = "xxxxx(yyyy)";
int n;
if ((n = sscanf(command, "%31[^(](%31[^)])", part1, part2)) != 2)
printf("Problem! n = %d\n", n);
else
printf("Part1 = <<%s>>; Part2 = <<%s>>\n", part1, part2);
return 0;
}
This produces:
Part1 = <<xxxxx>>; Part2 = <<yyyy>>
Note the 31's in the format; they prevent overflows.
I'm wondering how %31 works. Does it work like %s and prevent overflow, or does it just prevent overflow?
With the given data, these two lines are equivalent and both safe enough:
if ((n = sscanf(command, "%31[^(](%31[^)])", part1, part2)) != 2)
if ((n = sscanf(command, "%[^(](%[^)])", part1, part2)) != 2)
The %[...] notation is a conversion specification; so is %31[...].
The C standard says:
Each conversion specification is introduced by the character %.
After the %, the following appear in sequence:
An optional assignment-suppressing character *.
An optional decimal integer greater than zero that specifies the maximum field width
(in characters).
An optional length modifier that specifies the size of the receiving object.
A conversion specifier character that specifies the type of conversion to be applied.
The 31 is an example of the (optional) maximum field width. The [...] part is a scanset, which could perhaps be regarded as a special case of the s conversion specifier. The %s conversion specifier is approximately equivalent to %[^ \t\n].
The 31 is one less than the length of the string; the null at the end is not counted in that length. Since part1 and part2 are each an array of 32 char, the %31[^(] or %31[^)] conversion specifiers prevent buffer overflows. If the first string of characters was more than 31 characters before the (, you'd get a return value of 1 because of a mismatch on the literal open parenthesis. Similarly, the second string would be limited to 31 characters, but you'd not easily be able to tell whether the ) was in the correct place or not.
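One way to tell whether the closing parenthesis was really there is to append %n to the format and check where the scan ended. The following is a sketch only; split_command is a hypothetical wrapper around the format string shown above.

```c
#include <stdio.h>

/* Hypothetical helper. Splits "name(args)" into part1 and part2.
   Returns 1 only if the ')' was matched and nothing follows it. */
int split_command(const char *cmd, char part1[32], char part2[32])
{
    int n = 0;  /* written by %n only after the literal ')' has matched */

    if (sscanf(cmd, "%31[^(](%31[^)])%n", part1, part2, &n) != 2)
        return 0;

    return n > 0 && cmd[n] == '\0';
}
```

If the ) is missing, sscanf still returns 2, but %n is never reached and n keeps its initial value of 0.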
If you know exactly how long are the parts of your "command", then the simplest option is:
sscanf(command, "%6s(%5s)", part1, part2);
This assumes that 'part1' is always 6 characters long and 'part2' is always 5 characters long (as in your code sample).
Try this instead:
#include <stdio.h>
int main(void)
{
char str1[20];
char str2[20];
sscanf("Hello(World!)", "%[^(](%[^)])", str1, str2);
printf("str1=\"%s\", str2=\"%s\"\n", str1, str2);
return 0;
}
Output (ideone):
str1="Hello", str2="World!"

Resources