#include <stdio.h>
int main()
{
char * msg = "Internal power 10. power sufficient. total count 10";
char * temp = "Internal power %d. power %s. total count %d";
int v1, v2, ret;
char str1[64];
ret = sscanf(msg, temp, &v1, str1, &v2);
printf("%d\n", ret);
printf("%d %s %d ", v1, str1 , v2);
return 0;
}
I want to understand why sscanf is failing and why it is not able to retrieve the last variable?
%s reads a whitespace-delimited string; that is, it consumes "sufficient." including the trailing dot. The rest of the format, ". total count %d", then does not match the remainder of the input, " total count 10".
Since you're expecting the word to be followed by ., you might as well use %63[^.] i.e. maximum 63 characters that do not include a dot. Or %63[a-z] for maximum 63 ASCII lowercase letters - specifying the width explicitly also ensures that buffer overflow can't happen:
char * temp = "Internal power %d. power %63[^.]. total count %d";
P.S. always check the return value of *scanf - it tells how many specifiers were matched (in this case it should be 3); however, now 2 was returned meaning that the matching failed after the second conversion.
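Putting the two points above together (the %63[^.] scanset and checking the return value), a minimal sketch could look like this. The helper name parse_power_msg is made up for illustration; the format string is the corrected one from this answer.

```c
#include <stdio.h>

/* Hypothetical helper (not from the question): parses a message of the form
 * "Internal power <int>. power <word>. total count <int>".
 * Returns the number of fields sscanf assigned; 3 means full success. */
int parse_power_msg(const char *msg, int *internal, char status[64], int *total)
{
    /* %63[^.] reads at most 63 characters that are not a dot,
     * leaving the dot in the input for the literal '.' that follows. */
    return sscanf(msg, "Internal power %d. power %63[^.]. total count %d",
                  internal, status, total);
}
```

Called as parse_power_msg(msg, &v1, str1, &v2) with the string from the question, this should return 3, with str1 holding "sufficient" (no dot). Anything less than 3 means the input did not match all the way through.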
The problem is this part of the scanf format string: "power %s."
The problem is that scanf format strings are not regular expressions and do not do exact matching. When you have the "%s" format, scanf (and its siblings) will read everything up to the next whitespace.
That means that with the string you have, the "%s" will cause your sscanf call to read "sufficient." including the dot. The call will then try to match the dot, which has already been read into the string; since it's no longer available, the matching fails.
You can use sets as suggested by Jonathan Leffler in his comment. I also suggest you read e.g. this scanf (and family) reference for more details.
I have a file format like this
1.9969199999999998 2.4613199999999997 130.81278270000001 AA
2.4613199999999997 2.5541999999999998 138.59131554109211 BB
2.5541999999999998 2.9953799999999995 146.83238401449094 CC
...........................
I have to read the first three columns as float and the last column as a char array in C. All the columns are tab separated and there is a newline character at the end of each line. Everything works fine with fscanf(fp1, "%f\t%f\t%f\t%s\n", ...) as long as I have some text at the end of each line (the char string part).
There are cases where instead of AA/BB/CC, I have an empty string in the file. How do I handle that case? I have tried fscanf(fp1, "%f\t%f\t%f\t%s[^\n]\n", ...) and many other things, but I am unable to figure out the right way. Can you please help me out here?
Using float rather than double will throw away about half the digits shown. You get 6-7 decimal digits with float; you get 15+ digits with double.
As to your main question: use fgets() (or POSIX getline()) to read lines and then sscanf() to parse the line that is read. This will avoid confusion. When the input is line-based but not regular enough, don't use fscanf() and family to read the data; the file-reading scanf() functions don't care about newlines, even when you do.
Note that sscanf() will return either 3 or 4, indicating whether there was a string at the end of a line or not (or EOF, 0, 1 or 2 if it is given an empty string, or a string which doesn't start with a number, or a string which only contains one or two numbers). Always test the return value from scanf() and friends — but do so carefully. Look for the number of values that you expect (3 or 4 in this example), rather than 'not EOF'.
This leads to roughly:
#include <stdio.h>
int main(void)
{
double d[3];
char text[20];
char line[4096];
while (fgets(line, sizeof(line), stdin) != 0)
{
int rc = sscanf(line, "%lf %lf %lf %19s", &d[0], &d[1], &d[2], &text[0]);
if (rc == 4)
printf("%13.6f %13.6f %13.6f [%s]\n", d[0], d[1], d[2], text);
else if (rc == 3)
printf("%13.6f %13.6f %13.6f -NA-\n", d[0], d[1], d[2]);
else
printf("Format error: return code %d\n", rc);
}
return 0;
}
If given this file as standard input:
1.9969199999999998 2.4613199999999997 130.81278270000001 AA
2.4613199999999997 2.5541999999999998 138.59131554109211 BB
2.5541999999999998 2.9953799999999995 146.83238401449094 CC
19.20212223242525 29.3031323334353637 3940.41424344454647
19.20212223242525 29.3031323334353637 3940.41424344454647 PolyVinyl-PolySaccharide
the output is:
1.996920 2.461320 130.812783 [AA]
2.461320 2.554200 138.591316 [BB]
2.554200 2.995380 146.832384 [CC]
19.202122 29.303132 3940.414243 -NA-
19.202122 29.303132 3940.414243 [PolyVinyl-PolySacch]
You can tweak the output format to suit yourself. Note that the %19s avoids buffer overflow even when the text is longer than 19 characters.
I've been sitting with this problem for 2 days and I can't figure out what I'm doing wrong. I've tried debugging (kind of? Still kind of new), followed this link: https://ericlippert.com/2014/03/05/how-to-debug-small-programs/ And I've tried Google and all kinds of things. Basically I'm reading from a file with this format:
R1 Fre 17/07/2015 18.00 FCN - SDR 0 - 2 3.211
and I have to make the program read this into a struct, but when I try printing the information it comes out all wrong. My code looks like this:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#define MAX_INPUT 198
typedef struct game{
char weekday[4],
home_team[4],
away_team[4];
int round,
hour,
minute,
day,
month,
year,
home_goals,
away_goals,
spectators;}game;
game make_game(FILE *superliga);
int main(void){
int input_number,
number_of_games = 198,
i = 0;
game tied[MAX_INPUT];
FILE *superliga;
superliga = fopen("superliga-2015-2016.txt", "r");
for(i = 0; i < number_of_games; ++i){
tied[i] = make_game(superliga);
printf("R%d %s %d/%d/%d %d.%d %s - %s %d - %d %d\n",
tied[i].round, tied[i].weekday, tied[i].day, tied[i].month,
tied[i].year, tied[i].hour, tied[i].minute, tied[i].home_team,
tied[i].away_team, tied[i].home_goals, tied[i].away_goals,
tied[i].spectators);}
fclose(superliga);
return 0;
}
game make_game(FILE *superliga){
double spect;
struct game game_info;
fscanf(superliga, "R%d %s %d/%d/%d %d.%d %s - %s %d - %d %lf\n",
&game_info.round, game_info.weekday, &game_info.day, &game_info.month,
&game_info.year, &game_info.hour, &game_info.minute, game_info.home_team,
game_info.away_team, &game_info.home_goals, &game_info.away_goals,
&spect);
game_info.spectators = spect * 1000;
return game_info;
}
The problem is in your file. It starts with whitespace, not with an R as your control string expects.
Check the return value of fscanf() and you'll see that it's zero every time.
If you add a leading whitespace to your fscanf() call, your problem will be solved, like this:
fscanf(superliga, " R%d %s %d/%d/%d %d.%d %s - %s %d - %d %lf\n",
&game_info.round, game_info.weekday, &game_info.day, &game_info.month,
&game_info.year, &game_info.hour, &game_info.minute, game_info.home_team,
game_info.away_team, &game_info.home_goals, &game_info.away_goals,
&spect);
If each line in the file is a separate record, you should read each line as a string, then try to parse each string.
(Note that this also has the added feature of speculative parsing: you can try parsing the line in several different formats, and accept the one that parses correctly. I like to use this when I accept e.g. vector inputs, so that the user can use x y z, x, y, z, x/y/z, (x,y,z), [x,y,z], <x y z>, <x,y,z>, and so on, depending on what they like. It's only one additional scanf per format, after all.)
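To illustrate the speculative parsing idea, here is a rough sketch that tries a handful of vector layouts in turn and accepts the first one that consumes the whole line. The function name parse_vec3 and the exact list of formats are my own assumptions, not anything from the original post; the trailing %n is used to check that nothing is left over.

```c
#include <stdio.h>

/* Hypothetical helper: tries several layouts for a 3-component vector.
 * Returns 0 on success (v filled in), -1 if no layout matches the line. */
int parse_vec3(const char *line, double v[3])
{
    static const char *const formats[] = {
        " %lf %lf %lf %n",          /* x y z   */
        " %lf , %lf , %lf %n",      /* x, y, z */
        " %lf / %lf / %lf %n",      /* x/y/z   */
        " ( %lf , %lf , %lf ) %n",  /* (x,y,z) */
        " [ %lf , %lf , %lf ] %n",  /* [x,y,z] */
        " < %lf , %lf , %lf > %n",  /* <x,y,z> */
        " < %lf %lf %lf > %n",      /* <x y z> */
    };
    size_t i;

    for (i = 0; i < sizeof formats / sizeof formats[0]; i++) {
        double t[3];
        int end = -1;  /* stays -1 unless the whole pattern matched */

        /* >= 3 rather than == 3, in case an implementation counts %n. */
        if (sscanf(line, formats[i], &t[0], &t[1], &t[2], &end) >= 3 &&
            end >= 0 && line[end] == '\0') {
            v[0] = t[0];
            v[1] = t[1];
            v[2] = t[2];
            return 0;
        }
    }
    return -1;
}
```

Because end is only set when scanning reaches the %n, a line such as "(1,2,3" with the closing parenthesis missing is rejected even though three numbers were converted.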
To read lines, you can use fgets() into a local buffer. The local buffer must be long enough. If the program is to run on POSIX.1 machines only (i.e., not on Windows), then you can use getline() instead, which can dynamically reallocate the given buffer as needed, so you're not limited to any specific line length.
To parse the string, use sscanf().
Note that all tabs, spaces, and newlines in the pattern in all of the scanf family of functions are treated exactly the same: they indicate any number of any type of whitespace. In other words, \n does not mean "and then a newline"; it means the same as a space, i.e. "and possibly some whitespace here". However, all conversions except %c and %[ automatically skip any leading whitespace; so, with the exception of a space before one of those two, the spaces in the pattern are only meaningful to us humans, they do not have any functional effect in the scanning.
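A small sketch of the whitespace rules described above. Both helper names are made up for illustration; they only wrap sscanf so that different patterns can be compared on the same input.

```c
#include <stdio.h>

/* Hypothetical helper: scans two ints with the given pattern and
 * returns sscanf's result (2 when both were converted). */
int scan_pair(const char *input, const char *format, int *a, int *b)
{
    return sscanf(input, format, a, b);
}

/* Hypothetical helper: scans an int followed by one char using the
 * given pattern; returns that char, or '\0' if the scan failed. */
char char_after_int(const char *input, const char *format)
{
    int n;
    char c = '\0';

    if (sscanf(input, format, &n, &c) != 2)
        return '\0';
    return c;
}
```

With these, scan_pair("1 2", "%d\n%d", &a, &b), scan_pair("1\n\t2", "%d %d", &a, &b) and scan_pair("1 2", "%d%d", &a, &b) should all return 2, since %d skips leading whitespace on its own. The space does matter before %c: char_after_int("7 x", "%d%c") yields the space itself, while char_after_int("7 x", "%d %c") yields 'x'.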
All of the scanf family of functions return the number of successful conversions that were assigned. (The "conversion" %n, which yields the number of characters consumed, is a special case; some implementations include it in the conversion count, and some others do not.) If end of input or a read error occurs prior to the first conversion, the functions return EOF. If the input does not match the pattern, they return the number of conversions assigned up to that point, which may be 0.
If you suppress saving the result of a conversion -- for example, if you have a word in the input you don't need, you can convert but discard it with %*s --, it is not counted in the return value. So, for example sscanf(line, " %*d %*s %*d") returns 0 even if the line starts with an integer, followed by a word (anything that is not a newline nor contains whitespace), followed by an integer; to check whether such a pattern matched, put a %n at the end and test whether its variable was set.
Rather than have the function return the parsed structure, pass a pointer to the structure (and the file handle to read from), and return a status code. I prefer 0 for success, and nonzero for failure, but feel free to change that.
In other words, I'd suggest you change your read function into
#ifndef GAME_LINE_MAX
#define GAME_LINE_MAX 1022
#endif
int read_game(game *one, FILE *in)
{
char buffer[GAME_LINE_MAX + 2]; /* + '\n' + '\0' */
char *line;
/* Sanity check: no NULL pointers accepted! */
if (!one || !in)
return -1;
/* Paranoid check: Fail if read error has already occurred. */
if (ferror(in))
return -1;
/* Read the line */
line = fgets(buffer, sizeof buffer, in);
if (!line)
return -1;
/* Parse the game; pattern from OP's example: */
if (sscanf(line, "R%d %3s %d/%d/%d %d.%d %3s - %3s %d - %d %d\n",
&(one->round), one->weekday,
&(one->day), &(one->month), &(one->year),
&(one->hour), &(one->minute),
one->home_team,
one->away_team,
&(one->home_goals),
&(one->away_goals),
&(one->spectators)) < 12)
return -1; /* Line not formatted like above */
/* Spectators in the file are in units of 1000; convert: */
one->spectators *= 1000;
/* Success. */
return 0;
}
To use the above function in a loop, reading games one after another from standard input (stdin):
game g;
while (!read_game(&g, stdin)) {
/* Do something with current game stats, g */
}
if (ferror(stdin)) {
/* Read error occurred! */
} else
if (!feof(stdin)) {
/* Not all data was read/parsed! */
}
The two if clauses above are to check if there was a real read error (as in, a problem with the hardware or something like that), and whether there was unread/unparsed data (not at end of file), respectively.
There are two differences in the scanning pattern compared to the OP: First, all strings parsed are limited to 3 characters, because the structure has only room for 3+1 each. The one character is reserved for the end of string '\0', which is not counted in the maximum length for %s. Second, I parse the spectator count directly, and just multiply the field by 1000 if successful.
Also note how I used one->weekday, one->home_team, and one->away_team to refer to the character arrays. This works, because an array variable can be used as if it was a pointer to the first element in that array. (Given char a[5];, a and &a and &(a[0]) can all be used to refer to the first element in the array a). I like to use this "raw form" when scanning, because it makes it easier to match them to %s conversions, and ensure the pattern matches the parameters.
With printf it is perfectly normal to do:
int dec = 3;
float n = 4.3232;
printf("%.*f", dec, n);
But in scanf() I want to replace 100
scanf("%100[^~]", string)
with something like:
int a = 100;
scanf("%[***something goes here***][^~]", a, string);
But I didn't manage to do it.
Not sure if it is duplicate, I will delete the question if it is.
Edit: replaced '\n' with ~.
For your stated purpose it's probably better to do this:
fgets(string, a, stdin);
http://linux.die.net/man/3/fgets
Just do a first "pass" using sprintf() where you construct the format string that you then use with scanf():
char fmt[64];
const int a = 100;
sprintf(fmt, "%%%d[^\n]", a);
The first two % signs are parsed as a unit by sprintf(); they cause it to emit a single % into the destination string.
The %d that follows is just the regular code to format a (decimal) integer; it will emit 100.
So the result will be that fmt contains the string "%100[^\n]" (where the \n really means an embedded newline).
Then use fmt with scanf():
const int got = scanf(fmt, string);
As usual, be sure to check the value of got after the call, if it's not 1 then that means scanf() failed to do the requested conversion.
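For completeness, here is the same idea as one small function, using the ~ delimiter from the question's edit. This is only a sketch: the name read_until_tilde is made up, it uses snprintf() instead of sprintf() so the format buffer cannot overflow, and it reads from a string with sscanf() rather than from stdin so it is easy to try out.

```c
#include <stdio.h>

/* Hypothetical helper: copies characters from src into dst up to (not
 * including) the first '~', storing at most bufsize - 1 characters plus
 * the terminating '\0'. Returns 1 on success, like sscanf would. */
int read_until_tilde(const char *src, char *dst, int bufsize)
{
    char fmt[32];

    if (bufsize < 2)
        return 0;  /* no room for even one character plus '\0' */

    /* "%%" emits a literal '%', "%d" emits the width: e.g. "%99[^~]". */
    snprintf(fmt, sizeof fmt, "%%%d[^~]", bufsize - 1);

    return sscanf(src, fmt, dst);
}
```

Note that the width passed to %[ is one less than the buffer size, because the width does not count the terminating '\0'.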
I have an input .txt file that looks like this:
Robert Hill 53000 5
Amanda Trapp 89000 3
Jonathan Nguyen 93000 3
Mary Lou Gilley 17000 1 // Note that the name consists of 3 parts!
Warren Rexroad 72000 7
I need to read those lines and parse them into three different categories: name (which is an array of chars), mileage (int) and years(int).
sscanf(line, "%[^] %d %d ", name, &mileage, &years);
This doesn't work very well for me, any suggestions?
THE PROBLEM
The problem with the current specifier passed to sscanf is that it is ill-formed, and even when fixed it won't do what you want. If you had used [^ ] as the first conversion specifier, sscanf would read as many characters as it can before hitting a space.
If we assume that a name can't contain digits specifying [^0123456789] will read the correct data, but it will also include the trailing space after the name, but before the first mileage entry. This is however easily solved by replacing the last space with a null-byte in name.
To get the number of characters read into name we can use the %n specifier, which tells sscanf to store the number of bytes read so far into the matching argument; we can later use this value to correctly "trim" our buffer.
We should also specify a maximum width of the characters read by %[^0123456789] so that it doesn't cause a buffer-overflow, this is done by specifying the size of our buffer directly after our %.
SAMPLE IMPLEMENTATION
#include <stdio.h>
#include <string.h>
int
main (int argc, char *argv[])
{
char const * line = "Mary Lou Gilley 17000 1";
char name[255];
int mileage, years, name_length;
sscanf(line, "%254[^0123456789]%n %d %d ", name, &name_length, &mileage, &years);
name[name_length-1] = '\0';
printf ("data: '%s', %d, %d", name, mileage, years);
return 0;
}
data: 'Mary Lou Gilley', 17000, 1
If you have a function that finds the position of the first digit like so:
// This function returns the position of the
// space before the first digit (assuming that
// the names don't contain digits and that the
// string does contain a digit). Needs <ctype.h>.
char *digitPos(char *s){
if (isdigit((unsigned char)*(s+1))) return s;
else return digitPos(s+1);
}
You can then just separate the two variables by inserting a '\0' at the right position like so:
pos = digitPos(line); // This is a pointer to the space
*pos = '\0';
strcpy(name, line);
sscanf(pos + 1, "%d %d", &mileage, &years);
This might help you get started. It lacks the intelligence of BLUEPIXY's solution which handles the trailing whitespace a little better than mine ( or you could chop it off yourself).
dan#rachel ~ $ gcc -o t t.c
dan#rachel ~ $ echo "Dan P F 3 21" | ./t
Name: Dan P F ,
Mileage: 3,
Years: 21.
Here's the code.
#include <stdio.h>
#include <string.h>
int main(){
char *buf;
int mileage, years;
while(!feof(stdin) ){
if( fscanf( stdin, "%m[^0-9] %d %d", &buf, &mileage, &years) == 3 ){
fprintf(stderr, "Name:\t %s,\nMileage:\t %d,\nYears:\t %d.\n",
buf, mileage, years
);
}
}
}
You have discovered one of the three reasons *scanf should never be used: it's almost impossible to write a format specification that handles nontrivial input syntax, especially if you have to worry about recovering from malformed input. But there are two even more important reasons:
Many input specifications, including your %[...] construct, are just as happy to overflow buffers as the infamous gets.
Numeric overflow provokes undefined behavior -- the C library is licensed to crash just because someone typed too many digits.
The correct way to parse lines like these is to scan for the first digit with strcspn(line, "0123456789"), or while (*p && !isdigit((unsigned char)*p)) p++;, then use strtoul to convert the numbers that follow.
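A rough sketch of that approach follows. The function name parse_record, its parameters and its error conventions are my own assumptions; it also assumes, as elsewhere in this thread, that names never contain digits.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: splits "Name With Spaces 17000 1" into its parts.
 * Returns 0 on success, -1 on malformed input or numeric overflow. */
int parse_record(const char *line, char *name, size_t name_size,
                 unsigned long *mileage, unsigned long *years)
{
    size_t len = strcspn(line, "0123456789");  /* length of non-digit prefix */
    const char *p = line + len;                /* first digit, or end of line */
    char *end;

    /* Trim the whitespace between the name and the first number. */
    while (len > 0 && isspace((unsigned char)line[len - 1]))
        len--;
    if (len == 0 || len >= name_size || *p == '\0')
        return -1;
    memcpy(name, line, len);
    name[len] = '\0';

    errno = 0;
    *mileage = strtoul(p, &end, 10);
    if (end == p || errno == ERANGE)
        return -1;

    p = end;
    *years = strtoul(p, &end, 10);  /* strtoul skips leading whitespace */
    if (end == p || errno == ERANGE)
        return -1;

    /* Only whitespace may follow the last number. */
    while (isspace((unsigned char)*end))
        end++;
    return *end == '\0' ? 0 : -1;
}
```

Unlike the *scanf versions, overflow is reported through errno instead of being undefined, and the name buffer size is passed in explicitly. One caveat: strtoul also accepts a leading minus sign, so a stricter parser would check for that before the second number.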
int pos;
sscanf(line, "%*[^0-9]%n", &pos);
line[--pos]=';';
sscanf(line, "%[^;]; %d %d ", name, &mileage, &years);
The man page states that the signature of sscanf is
sscanf(const char *restrict s, const char *restrict format, ...);
I have seen an answer on SO where a function in which sscanf is used like this to check if an input was an integer.
bool is_int(char const* s) {
int n;
int i;
return sscanf(s, "%d %n", &i, &n) == 1 && !s[n];
}
Looking at !s[n] it seems to suggest that we check if sscanf scanned the character sequence until the termination character \0. So I assume n stands for the index where sscanf will be in the string s when the function ends.
But what about the variable i? What does it mean?
Edit:
To be more explicit: I see that the signature of sscanf wants a pointer of type char * as the first parameter, a format string as the second parameter so it knows how to parse the character sequence, and as many variables as there are conversion specifiers as the remaining parameters. I understand now that i is for holding the parsed integer.
Since there is only one format specifier, I tried to deduce the function of n.
Is my assumption above for n correct?
Looks like the op has his answer already, but since I bothered to look this up for myself and run the code...
From "C The Pocket Reference" (2nd Ed by Herbert Schildt) scanf() section:
%n Receives an integer of value equal to the number of characters read so far
and for the return value:
The scanf() function returns a number equal to the number of fields
that were successfully assigned values
The sscanf() function works the same; it just takes its input from the supplied buffer argument (s in this case). The "== 1" test makes sure that exactly one integer was parsed, and the !s[n] makes sure that the input string ends right after the parsed integer (and any trailing whitespace), i.e. that there's really only one integer in the string.
Running this code, an s value like "32" gives a "true" value (we don't have bool defined as a type on our system), but s as "3 2" gives a "false" value because s[n] in that case is "2" and n has the value 2 ("3 " is parsed to create the int in that case). If s is " 3 " this function will still return true, as all that whitespace is ignored and n has the value of 3.
Another example input, "3m", gives a "false" value as you'd expect.
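For anyone who wants to repeat the experiment, here is a slightly more defensive variant of the function. The changes are mine, not from the original answer on SO: n starts at -1 so a skipped %n can be detected, and the return value is tested with >= 1 in case an implementation counts %n.

```c
#include <stdbool.h>
#include <stdio.h>

/* Variant of is_int from the question (modifications are assumptions):
 * true if s holds exactly one decimal integer, optionally surrounded
 * by whitespace, and nothing else. */
bool is_int(const char *s)
{
    int i;       /* receives the parsed integer (value not used here) */
    int n = -1;  /* receives the number of characters consumed        */

    /* The space before %n skips any trailing whitespace, so s[n] is the
     * first character that is neither part of the number nor whitespace. */
    return sscanf(s, "%d %n", &i, &n) >= 1 && n >= 0 && s[n] == '\0';
}
```

This should reproduce the results described above: true for "32" and " 3 ", false for "3 2" and "3m". It also returns false for an empty string, where sscanf returns EOF.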
Verbatim from sscanf()'s man page:
Conversions
[...]
n
Nothing is expected; instead, the number of characters
consumed thus far from the input is stored through the next pointer,
which must be a pointer to int. This is not a
conversion, although it can be suppressed with the * assignment-suppression character. The C
standard says: "Execution of
a %n directive does not increment the assignment count returned at the completion of
execution" but the Corrigendum seems to contradict this. Probably it is wise not
to make any assumptions on the effect of %n conversions on the return value.
I would like to point out that the original code is buggy:
bool is_int(char const* s) {
int n;
int i;
return sscanf(s, "%d %n", &i, &n) == 1 && !s[n];
}
I will explain why. And I will interpret the sscanf format string.
First, buggy:
Given input "1", which is the integer one, sscanf will store 1 into i. Then, since there is no white space after, sscanf will not touch n. And n is uninitialized. Because sscanf set i to 1, the value returned by sscanf will be 1, meaning 1 field scanned. Since sscanf returns 1, the part of the expression
sscanf(s, "%d %n", &i, &n) == 1
will be true. Therefore the other part of the && expression will execute. And s[n] will access some random place in memory because n is uninitialized.
Interpreting the format:
"%d %n"
Attempts to scan a number which may be a decimal number or an integer or a scientific notation number. The number is an integer, it must be followed by at least one white space. White space would be a space, \n, \t, and certain other non-printable characters. Only if it is followed by white space will it set n to the number of characters scanned to that point, including the white space.
This code might be what is intended:
static bool is_int(char const* s)
{
int i;
int fld;
return (fld = sscanf(s, "%i", &i)) == 1;
}
int main(int argc, char * argv[])
{
bool ans = false;
ans = is_int("1");
ans = is_int("m");
return 0;
}
This code is based on the following: if s is an integer, then sscanf will scan it and fld will be exactly one. If s is not an integer, then fld will be zero or -1: zero if something else is there, like a word, and -1 if there is nothing but an empty string.
The variable i there holds the integer value that was read.
What are you trying to ask, though? It's not too clear! The code will (try to) read an integer from the string into 'i'.