Using sscanf to check string format in C

I want to compare my string to a given format.
The format that I want to use in the check is:
"xxx://xxx:xxx#xxxxxx" // each xxx has a variable length
so I used sscanf() as follows:
if (sscanf(stin,"%*[^:]://%*[^:]:%*[^#]#") == 0) { ... }
Is it correct to compare the return value of sscanf() to 0 in this case?

You will get zero back (or possibly EOF) whether or not the fields match, so the result won't tell you diddly-squat in practice. The scan might have failed on a colon in the first character and it would still return 0.
You need at least one conversion in there that is counted (%n is not counted), and it must occur at the end so you know that what went before also matched. You can never tell whether trailing context (data after the last conversion specification) matched, and sscanf() won't back up if it has converted data, even if backing up would allow the trailing context to match.
For your scenario, that might be:
char c;
int n;
if (sscanf(stin, "%*[^:]://%*[^:]:%*[^#]#%n%c", &n, &c) == 1)
{
    /* matched: n holds the count of characters up to and including '#' */
}
This requires at least one character after the #. It also tells you how many characters there were up to and including the #.
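As a sketch of the counted-conversion idea, the check can be wrapped in a small helper. The name matches_scheme_format and its out-parameter are hypothetical, not part of the answer. It returns 1 only when %c was assigned, which implies every directive before it matched; as noted above, characters after the first one following # are still not checked.

```c
#include <stdio.h>

/* Hypothetical helper: returns 1 if s has the shape "xxx://xxx:xxx#x..." with at
 * least one character after '#', and stores in *hash_end the number of characters
 * up to and including the '#'. Returns 0 otherwise. */
static int matches_scheme_format(const char *s, int *hash_end)
{
    char c;     /* first character after '#': the one counted conversion */
    int n = 0;  /* set by %n, which is not counted in the return value */

    if (sscanf(s, "%*[^:]://%*[^:]:%*[^#]#%n%c", &n, &c) == 1) {
        *hash_end = n;
        return 1;
    }
    return 0;
}
```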

OP's suggestion is close.
@Jonathan Leffler is correct in that comparing the result of a specifier-less sscanf() against 0 does not distinguish between a match and no-match.
To test against "xxx://xxx:xxx#xxxxxx" (assuming each part shown as "x" must match at least one character), use
int n = 0;
sscanf(stin, "%*[^:]://%*[^:]:%*[^#]#%*c%n", &n);
if (n > 0) {
match();
}
There is an obscure hole in this method when it is used with fscanf(): a stream of data containing a \0 character is a problem.
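To make the difference between the two checks concrete, here is a sketch that runs both formats side by side; the helper names raw_check and full_check are hypothetical. The first returns sscanf()'s raw result, which is 0 both for a full match and for a mismatch on the very first character. The second relies on n staying 0 unless the scan reaches %n.

```c
#include <stdio.h>

/* Hypothetical helper: the original check. With no counted conversions, the
 * result cannot signal success. */
static int raw_check(const char *s)
{
    return sscanf(s, "%*[^:]://%*[^:]:%*[^#]#");
}

/* Hypothetical helper: the %*c%n variant. n is written only after every
 * directive before %n matched (including %*c), so n > 0 means "matched, with
 * at least one character after '#'". */
static int full_check(const char *s)
{
    int n = 0;
    (void)sscanf(s, "%*[^:]://%*[^:]:%*[^#]#%*c%n", &n); /* result deliberately unused */
    return n > 0;
}
```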

Related

How can I use sscanf to analyze string data?

How do I split a string into two strings (array name, index number) only if the string is matching the following string structure: "ArrayName[index]".
The array name can be 31 characters at most and the index 3 at most.
I found the following example, which is supposed to work with "Matrix[index1][index2]". I really couldn't understand how it works, so I can't adapt it to take apart the pieces I need for my strings.
sscanf(inputString, "%32[^[]%*[[]%3[^]]%*[^[]%*[[]%3[^]]", matrixName, index1,index2) == 3
This attempt wasn't a success; what am I missing?
sscanf(inputString, "%32[^[]%*[[]%3[^]]", arrayName, index) == 2
How do I split a string into two strings (array name, index number) only if the string is matching the following string structure: "ArrayName[index]".
With sscanf, you don't. Not if you mean that you can rely on nothing being modified in the event that the input does not match the pattern. This is because sscanf, like the rest of the scanf family, processes its input and format linearly, without backtracking, and by design it stores each field as soon as that field is successfully matched. Thus, if you scan with a format that assigns multiple fields or has trailing literal characters, results may be stored for some fields even though a matching failure occurs later.
But if that's OK with you, then @gsamaras's answer provides a nearly correct approach to parsing and validating a string according to your specified format, using sscanf. That answer also presents a nice explanation of the meaning of the format string. The problem with it is that it provides no way to distinguish between the input fully matching the format and the input failing to match at the final ], or having additional characters after it.
Here is a variation on that code that accounts for those tail-end issues, too:
char array_name[32] = {0}, idx[4] = {0}, c = 0;
int n;
if (sscanf(str, "%31[^[][%3[^]]%c%n", array_name, idx, &c, &n) >= 3
        && c == ']' && str[n] == '\0')
    printf("arrayName = %s\nindex = %s\n", array_name, idx);
else
    printf("Not in the expected format \"ArrayName[idx]\"\n");
The difference in the format is the replacement of the literal terminating ] with a %c directive, which matches any one character, and the addition of a %n directive, which causes the number of characters of input read so far to be stored, without itself consuming any input.
With that, if the return value is at least 3 then we know that the whole format was matched (a %n never produces a matching failure, but docs are unclear and behavior is inconsistent on whether it contributes to the returned field count). In that event, we examine variable c to determine whether there was a closing ] where we expected to find one, and we use the character count recorded in n to verify that all characters of the string were parsed (so that str[n] refers to a string terminator).
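Wrapped as a function so it can be exercised, the variation looks like the sketch below. parse_array_ref is a hypothetical name, and the buffer sizes 32 and 4 follow the question's 31- and 3-character limits. As explained earlier, name and idx may already have been written even when the function returns 0.

```c
#include <stdio.h>

/* Hypothetical wrapper around the "%31[^[][%3[^]]%c%n" check.
 * Returns 1 only if str is exactly "name[idx]" (name <= 31 chars, idx <= 3 chars). */
static int parse_array_ref(const char *str, char name[32], char idx[4])
{
    char c = 0;  /* should receive the closing ']' */
    int n = 0;   /* characters consumed when the whole format matched */

    if (sscanf(str, "%31[^[][%3[^]]%c%n", name, idx, &c, &n) >= 3
        && c == ']' && str[n] == '\0')
        return 1;
    return 0;
}
```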
You may at this point be wondering why all of that is so complicated and cryptic. You would be right to wonder. Parsing structured input is a complicated and tricky proposition, for one thing, but the scanf-family functions are also pretty difficult to use. You would be better off with a regex matcher for cases like yours, or maybe with a machine-generated lexical analyzer (see lex), possibly augmented by a machine-generated parser (see yacc). Even a hand-written parser that works through the input string with string functions and character comparisons might be an improvement. It's still complicated any way around, but those tools can at least make it less cryptic.
Note: the above assumes that the index can be any string of up to three characters. If you meant that it must be numeric, perhaps specifically a decimal number, perhaps specifically non-negative, then the format can be adjusted to serve that purpose.
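For comparison, the regex route mentioned above might look like the following sketch. It uses the POSIX <regex.h> interface (regcomp(), regexec(), regfree()), which is not part of ISO C, so it assumes a POSIX system; the function name regex_array_ref is hypothetical. Unlike sscanf(), it writes to name and idx only after the whole string has matched.

```c
#include <regex.h>
#include <string.h>

/* Hypothetical helper (POSIX, not ISO C): returns 1 and fills name/idx only if
 * str is exactly "name[idx]" with 1-31 name chars and 1-3 index chars. */
static int regex_array_ref(const char *str, char name[32], char idx[4])
{
    regex_t re;
    regmatch_t m[3];   /* m[0]: whole match, m[1]: name, m[2]: index */
    int ok = 0;

    if (regcomp(&re, "^([^[]{1,31})\\[([^]]{1,3})]$", REG_EXTENDED) != 0)
        return 0;
    if (regexec(&re, str, 3, m, 0) == 0) {
        size_t nlen = (size_t)(m[1].rm_eo - m[1].rm_so);
        size_t ilen = (size_t)(m[2].rm_eo - m[2].rm_so);
        memcpy(name, str + m[1].rm_so, nlen);   /* lengths bounded by the regex */
        name[nlen] = '\0';
        memcpy(idx, str + m[2].rm_so, ilen);
        idx[ilen] = '\0';
        ok = 1;
    }
    regfree(&re);
    return ok;
}
```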
A naive example to get you started:
#include <stdio.h>
#include <string.h>
int main(void)
{
char str[] = "myArray[123]";
char array_name[32] = {0}, idx[4] = {0};
if(sscanf(str, "%31[^[][%3[^]]]", array_name, idx) == 2)
printf("arrayName = %s\nindex = %s\n", array_name, idx);
else
printf("Not in the expected format \"ArrayName[idx]\"\n");
return 0;
}
Output:
arrayName = myArray
index = 123
This catches easy not-in-the-expected-format cases, such as "ArrayNameidx]" and "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOP[idx]", but not "ArrayName[idx".
The essence of using sscanf() here is telling it where to stop; otherwise %s would read until the next whitespace.
This negated scanset %[^[] means read until you find an opening bracket.
This negated scanset %[^]] means read until you find a closing bracket.
Note: I used 31 and 3 as the width specifiers, respectively, because the array name is assumed to be at most 31 characters and the index at most 3, and each buffer must reserve its last slot for the null terminator. So the size of each token's array is the maximum allowed length plus one.
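The limitation described above can be checked directly. In this sketch (the wrapper name naive_parse is hypothetical), the return value is 2 both for the well-formed input and for the input missing its final ], because the trailing literal ] comes after the last conversion and its failure is invisible in the count.

```c
#include <stdio.h>

/* Hypothetical wrapper around the naive "%31[^[][%3[^]]]" format.
 * Returns sscanf()'s raw count of assigned fields. */
static int naive_parse(const char *str, char name[32], char idx[4])
{
    return sscanf(str, "%31[^[][%3[^]]]", name, idx);
}
```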
How can I use sscanf to analyze string data?
Use "%n" to detect a completed scan.
array name can be 31 characters at most and the index 3 at most.
For illustration, let us assume the index must be a numeric value in the range [0, 999].
Use string literal concatenation to present the format more clearly.
char array_name[32]; // array name can be 31 characters
#define NAME_FMT "%31[^[]"
char idx[4]; // index can be 3 digits
#define IDX_FMT "%3[0-9]"
int n = 0; // be sure to initialize
sscanf(str, NAME_FMT "[" IDX_FMT "]" "%n", array_name, idx, &n);
// Did scan complete (is `n` non-zero) with no extra text?
if (n && str[n] == '\0') {
  printf("arrayName = %s\nindex = %d\n", array_name, atoi(idx)); // atoi() is in <stdlib.h>
} else {
  printf("Not in the expected format \"ArrayName[idx]\"\n");
}
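Here is the same %n technique as a self-contained function, as a sketch. parse_numeric_ref is a hypothetical name, and converting with atoi() is reasonable here only because IDX_FMT limits the index to at most three decimal digits.

```c
#include <stdio.h>
#include <stdlib.h>

#define NAME_FMT "%31[^[]"
#define IDX_FMT  "%3[0-9]"

/* Hypothetical helper: returns 1 and fills name/*index only when str is
 * exactly "name[digits]" with 1-31 name chars and 1-3 digits. */
static int parse_numeric_ref(const char *str, char name[32], int *index)
{
    char idx[4];
    int n = 0;   /* stays 0 unless the whole format, including ']', matched */

    (void)sscanf(str, NAME_FMT "[" IDX_FMT "]" "%n", name, idx, &n);
    if (n && str[n] == '\0') {
        *index = atoi(idx);   /* at most 3 digits, so no overflow */
        return 1;
    }
    return 0;
}
```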

Sscanf not returning what I want

I have the following problem:
sscanf is not returning the way I want it to.
This is the sscanf:
sscanf(naru,
"%s[^;]%s[^;]%s[^;]%s[^;]%f[^';']%f[^';']%[^;]%[^;]%[^;]%[^;]"
"%[^;]%[^;]%[^;]%[^;]%[^;]%[^;]%[^;]%[^;]%[^;]%[^;]%[^;]%[^;]"
"%[^;]%[^;]%[^;]%[^;]%[^;]%[^;]",
&jokeri, &paiva1, &keskilampo1, &minlampo1, &maxlampo1,
&paiva2, &keskilampo2, &minlampo2, &maxlampo2, &paiva3,
&keskilampo3, &minlampo3, &maxlampo3, &paiva4, &keskilampo4,
&minlampo4, &maxlampo4, &paiva5, &keskilampo5, &minlampo5,
&maxlampo5, &paiva6, &keskilampo6, &minlampo6, &maxlampo6,
&paiva7, &keskilampo7, &minlampo7, &maxlampo7);
The string it's scanning:
const char *str = "city;"
"2014-04-14;7.61;4.76;7.61;"
"2014-04-15;5.7;5.26;6.63;"
"2014-04-16;4.84;2.49;5.26;"
"2014-04-17;2.13;1.22;3.45;"
"2014-04-18;3;2.15;3.01;"
"2014-04-19;7.28;3.82;7.28;"
"2014-04-20;10.62;5.5;10.62;";
All of the variables are declared as char paiva1[22] etc.; however, sscanf isn't storing anything correctly except the city. I've been trying to stop each variable at ;.
Any help how to get it to store the dates etc correctly would be appreciated.
Or if there's a smarter way to do this, I'm open to suggestions.
There are multiple problems, but BLUEPIXY hit the first one: a scan set is a conversion specification of its own, written %[^;], not a suffix to %s.
Your first line of the format is:
"%s[^;]%s[^;]%s[^;]%s[^;]%f[^';']%f[^';']%[^;]%[^;]%[^;]%[^;]"
As it stands, it looks for a whitespace-separated word, followed by a [, a ^, a ;, and a ] (which is self-contradictory: the character after the word must be white space or the end of the string, never a [).
The first fixup would be to use scan-sets properly:
"%[^;]%[^;]%[^;]%[^;]%f[^';']%f[^';']%[^;]%[^;]%[^;]%[^;]"
Now you have a problem: the first %[^;] scans everything up to the end of the string or the first semicolon, leaving nothing for the second %[^;] to match, because the next character is that semicolon. The semicolons have to be matched explicitly:
"%[^;]; %[^;]; %[^;]; %[^;]; %f[^';']%f[^';']%[^;]%[^;]%[^;]%[^;]"
This looks for a string up to a semicolon, then for the semicolon, then optional white space, then repeats for three items. Apart from adding a length to limit the size of string, preventing overflow, these are fine. The %f is OK. The following material looks for an odd sequence of characters again.
However, when the data is looked at, it seems to consist of a city, and then seven sets of 'a date plus three numbers'.
You'd do better with an array of structures (if you've worked with those yet), or a set of 4 parallel arrays, and a loop:
char jokeri[30];
char paiva[7][30];
float keskilampo[7];
float minlampo[7];
float maxlampo[7];
int eoc; // End of conversion
int offset = 0;
char sep;
if (sscanf(str + offset, "%29[^;]%c%n", jokeri, &sep, &eoc) != 2 || sep != ';')
...report error...
offset += eoc;
for (int i = 0; i < 7; i++)
{
if (sscanf(str + offset, "%29[^;];%f;%f;%f%c%n", paiva[i],
&keskilampo[i], &minlampo[i], &maxlampo[i], &sep, &eoc) != 5 ||
sep != ';')
...report error...
offset += eoc;
}
See also How to use sscanf() in loops.
Now you have data that can be managed. The set of 29 separately named variables is a ghastly thought; the code using them will be horrid.
Note that the scan-set conversion specifications limit the string to a maximum length one shorter than the size of jokeri and the paiva array elements.
You might legitimately be wondering why the code uses %c%n and &sep before &eoc. There is a reason, but it is subtle. Suppose that the sscanf() format string is:
"%29[^;];%f;%f;%f;%n"
Further, suppose there's a problem in the data: the semicolon after the third number is missing. The call to sscanf() will report that it made 4 successful conversions, but it doesn't count the %n as an assignment, so you can't tell that sscanf() didn't find a semicolon and therefore did not set eoc at all; the value is left over from a previous call to sscanf(), or simply uninitialized. By using %c to scan a value into sep, we get 5 returned on success, and we can be sure the %n was successful too. The code checks that the value in sep is in fact a semicolon and not something else.
You might want to consider a space before the semicolons, and before the %c. They'll allow some other data strings to be converted that would not be matched otherwise. Spaces in a format string (outside a scan-set) indicate where optional white space may appear.
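Putting the pieces together, here is a self-contained sketch of the loop using an array of structures, one of the options suggested above. The struct day type, its field names, and parse_forecast are hypothetical; the 30-byte buffers and the %c%n check follow the answer's code.

```c
#include <stdio.h>

/* Hypothetical record type for one "date;avg;min;max;" group. */
struct day {
    char  date[30];
    float avg, min, max;
};

/* Hypothetical helper: parses "city;" followed by 7 groups of "date;avg;min;max;".
 * Returns -1 if the city field is bad, otherwise the number of days parsed (0-7). */
static int parse_forecast(const char *str, char city[30], struct day days[7])
{
    int eoc = 0;     /* end of conversion: characters consumed by this call */
    int offset = 0;
    char sep;

    if (sscanf(str, "%29[^;]%c%n", city, &sep, &eoc) != 2 || sep != ';')
        return -1;
    offset += eoc;

    for (int i = 0; i < 7; i++) {
        /* 5 counted conversions: %c proves the %n after it was reached */
        if (sscanf(str + offset, "%29[^;];%f;%f;%f%c%n", days[i].date,
                   &days[i].avg, &days[i].min, &days[i].max, &sep, &eoc) != 5
            || sep != ';')
            return i;
        offset += eoc;
    }
    return 7;
}
```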
I would use the strtok() function to break your string into pieces, using ; as the delimiter. Such a long format string may be a source of problems in the future.
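A minimal sketch of the strtok() approach, assuming the data sits in a writable buffer (strtok() modifies its argument, so a string literal cannot be passed directly). The helper name split_semicolons is hypothetical. One caveat: strtok() treats consecutive delimiters as one, so an empty field such as the one in "a;;b" silently disappears.

```c
#include <string.h>

/* Hypothetical helper: splits buf in place on ';' and stores up to max_fields
 * token pointers in fields[]. Returns the number of tokens stored. */
static int split_semicolons(char *buf, char *fields[], int max_fields)
{
    int count = 0;
    for (char *tok = strtok(buf, ";"); tok != NULL && count < max_fields;
         tok = strtok(NULL, ";"))
        fields[count++] = tok;
    return count;
}
```

Each token can then be converted individually, for example with strtod() for the temperatures.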

Can I use fscanf to get only digits from text that contain chars and ints?

I want to extract digits from a file that contains characters and digits.
For example:
+ 321 chris polanco 23
I want to skip the '+' and get only the 321.
Here's the code I have so far.
while(fscanf(update, "%d", &currentIn->userid) != EOF){
currentIn->index = index;
rootIn = sort(rootIn, currentIn);
index = index + 1;
currentIn = malloc(sizeof(Index));
}
I was thinking that since I used %d, it would get the first digits it finds, but I was wrong. I'm open to better ways of doing this if you have any.
Instead of struggling with fscanf() (and running into format problems later), I recommend using fgets() and sscanf() together to process each line.
If you know the integer you are interested in starts at the 3rd position of each line in the file, then you can pass line+2 to sscanf() to read it. Otherwise, you can adjust the sscanf() format string to match the format of your input file.
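char line[MAX_LINE_LEN + 1]; /* MAX_LINE_LEN: the longest line you expect */
while (fgets(line, sizeof line, update))
{
    /* strlen() (from <string.h>) guards against lines too short to index at +2 */
    if (strlen(line) < 2 || sscanf(line + 2, "%d", &currentIn->userid) != 1)
    {
        /* handle failure */
    }
    ...
}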
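As an alternative to hard-coding the offset line+2, the format string can describe the prefix itself. This is a sketch under the assumption that every line starts with optional white space, a '+', and then the id; parse_id_line is a hypothetical name. In the format, a space matches any amount of white space (including none) and + must match literally.

```c
#include <stdio.h>

/* Hypothetical helper: parses a line of the form "+ <id> <rest...>".
 * Returns 1 and stores the id if the '+' prefix and an integer are present. */
static int parse_id_line(const char *line, int *id)
{
    return sscanf(line, " + %d", id) == 1;
}
```

Inside the fgets() loop you would then call parse_id_line(line, &currentIn->userid) instead of indexing into the line.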
while (fscanf(update, "%*[^0-9]%d", &currentIn->userid) == 1)
{
...
}
This skips over non-digits (that's the %*[^0-9] part) followed by an integer. The suppressed assignment isn't counted, so the == 1 ensures that you got a number.
Unfortunately, it runs into a problem if the first character in the file is a digit, as Chris Dodd pointed out. There are multiple possible solutions to that:
Call ungetc('a', update); before the loop; it gives the scan a non-digit to read first.
Or:
while ((fscanf(update, "%*[^0-9]"), fscanf(update, "%d", &currentIn->userid)) == 1)
Or:
while (fscanf(update, "%*[^0-9]%d", &currentIn->userid) == 1 ||
fscanf(update, "%d", &currentIn->userid) == 1)
{
...
}
Depending on which you think is more likely, you could reverse the order of these two fscanf() policies. With the scanf() family of functions, there's always a problem if the string of digits is so long that the number cannot be represented in an int; you get undefined behaviour. I don't attempt to address that.
This will pick up multiple numbers per line, one per invocation. If you want a single number per line, or otherwise want control over how each line is handled, then use fgets() or POSIX getline() to read the line, and then sscanf() to do the analysis. One advantage of this is that, if you so choose, you can use careful functions like strtol() to convert digits to numbers.
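Following the suggestion to convert with strtol(), here is a sketch that scans a line (for example, one read with fgets()) and extracts every run of decimal digits, reporting overflow instead of invoking undefined behaviour. The function name extract_ints is hypothetical, and like the %*[^0-9] format it treats a '-' as an ordinary non-digit, so "-5" yields 5.

```c
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: stores each run of decimal digits in line into out[]
 * (at most max_out values). Returns the number stored, or -1 if a value does
 * not fit in an int. */
static int extract_ints(const char *line, int out[], int max_out)
{
    int count = 0;
    const char *p = line;

    while (*p != '\0' && count < max_out) {
        if (!isdigit((unsigned char)*p)) {
            p++;                          /* skip non-digits, like %*[^0-9] */
            continue;
        }
        char *end;
        errno = 0;
        long v = strtol(p, &end, 10);     /* p starts with a digit, so v >= 0 */
        if (errno == ERANGE || v > INT_MAX)
            return -1;
        out[count++] = (int)v;
        p = end;
    }
    return count;
}
```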

Ignoring integers that are next to characters using sscanf()

Sorry for the simple question, but I'm trying to find an elegant way to stop my program from seeing input like "14asdf" and accepting it as just 14.
if (sscanf(sInput, "%d", &iAssignmentMarks[0]) != 0)
Is there an easy way to prevent sscanf from pulling integers out of mangled strings like that?
You can't directly stop sscanf() from doing what it is designed and specified to do. However, you can use a little-known and seldom-used feature of sscanf() to make it easy to find out that there was a problem:
int i;
if (sscanf(sInput, "%d%n", &iAssignmentMarks[0], &i) != 1)
...failed to recognize an integer...
else if (!isspace((unsigned char)sInput[i]) && sInput[i] != '\0')
...character after integer was not a space character (including newline) or EOS...
The %n directive reports on the number of characters consumed up to that point, and does not count as a conversion (so there is only one conversion in that format). The %n is standard in sscanf() since C89.
For extracting a single integer, you could also use strtol() - carefully (detecting error conditions with it is surprisingly hard, but it is better than sscanf() which won't report or detect overflows). However, this technique can be used multiple times in a single format, which is often more convenient.
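A sketch of the %n check packaged as a function (the name read_whole_int is hypothetical). It accepts an integer followed by white space or the end of the string, so input read with fgets(), including its trailing newline, still passes. It only inspects the one character after the number, so "14 x" is still accepted.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper: returns 1 if s starts with an integer that is followed
 * by white space or the end of the string; returns 0 for "14asdf" etc. */
static int read_whole_int(const char *s, int *value)
{
    int used = 0;   /* %n: characters consumed so far (not counted) */

    if (sscanf(s, "%d%n", value, &used) != 1)
        return 0;
    return s[used] == '\0' || isspace((unsigned char)s[used]);
}
```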
You want to read integers from strings. It is easier to do this with strtol instead of sscanf. strtol returns, indirectly via endptr, the address just after the last character that was successfully read into the number. If the whole string was a number, endptr points to the string's terminating null, i.e. *endptr == '\0'; also check that endptr != sInput, since an empty string satisfies *endptr == '\0' without containing any digits.
errno = 0; /* <errno.h>: strtol() never resets errno itself */
char *endptr = NULL;
long n = strtol(sInput, &endptr, 10);
bool isNumber = endptr != sInput && *endptr == '\0' && errno == 0; /* bool: <stdbool.h> */
(Initial whitespace is ignored. See a strtol man page for details.)
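As a sketch of that strtol() check in function form (is_long_string is a hypothetical name): besides *endptr == '\0', it compares endptr against the start of the string, because an empty string also leaves *endptr == '\0' even though no digits were read.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper: true only if the whole string is a base-10 long.
 * Leading white space is skipped by strtol(); trailing characters are rejected. */
static bool is_long_string(const char *s, long *out)
{
    char *endptr = NULL;
    errno = 0;                          /* strtol() never clears errno itself */
    long n = strtol(s, &endptr, 10);
    if (endptr == s || *endptr != '\0' || errno == ERANGE)
        return false;
    *out = n;
    return true;
}
```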
This is easy. No fancy C++ required! Just do:
char unusedChar;
if (sscanf(sInput, "%d%c", &iAssignmentMarks[0], &unusedChar) == 1)
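The %d%c trick works because sscanf() returns 2 when there is any character after the number and 1 only when the number ends the string. A small sketch (the function name int_with_nothing_after is hypothetical) makes one consequence visible: a trailing newline, such as the one fgets() leaves in the buffer, also counts as an extra character and causes rejection.

```c
#include <stdio.h>

/* Hypothetical helper: returns 1 only when s is an integer with nothing at all
 * after it. sscanf() returns 1 if %c hit the end of the string, 2 if it found
 * a trailing character. */
static int int_with_nothing_after(const char *s, int *value)
{
    char unused;
    return sscanf(s, "%d%c", value, &unused) == 1;
}
```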
scanf isn't that smart. You'll have to read the input as text and use strtol to convert it. One of the arguments to strtol is a char * that will point to the first character that isn't converted; if that character isn't whitespace or 0, then the input string wasn't a valid integer:
char input[SIZE]; // where SIZE is large enough for the expected values plus
// a sign, newline character, and 0 terminator
...
if (fgets(input, sizeof input, stdin))
{
char *chk;
long val = strtol(input, &chk, 10);
if (chk == input || (!isspace((unsigned char)*chk) && *chk != '\0'))
{
// input wasn't an integer string
}
}
If you can use C++-specific capabilities, there are clearer ways to test input strings using streams.
Check here:
http://www.parashift.com/c++-faq-lite/misc-technical-issues.html#faq-39.2
If you're wondering, yes, this came from another Stack Overflow post, which answers this question:
Other answer

Determine if a string is an integer or a float in ANSI C

Using only ANSI C, what is the best way to determine, with fair certainty, whether a C-style string is an integer or a real number (i.e. float/double)?
Don't use atoi and atof as these functions return 0 on failure. Last time I checked 0 is a valid integer and float, therefore no use for determining type.
Use the strto{l,ul,ull,ll,d} functions, as these set errno on failure, and also report where the converted data ended.
strtoul: http://www.opengroup.org/onlinepubs/007908799/xsh/strtoul.html
This example assumes that the string contains a single value to be converted.
#include <errno.h>
#include <stdlib.h>
char* to_convert = "some string";
char* p = to_convert;
errno = 0;
unsigned long val = strtoul(to_convert, &p, 10);
if (errno != 0) {
    // conversion failed (EINVAL, ERANGE)
}
if (to_convert == p) {
    // conversion failed (no characters consumed)
}
if (*p != 0) {
    // conversion failed (trailing data)
}
Thanks to Jonathan Leffler for pointing out that I forgot to set errno to 0 first.
Using sscanf, you can tell whether a conversion succeeded without having to special-case 0, as you must with the atoi and atof solution.
Here's some example code:
int i;
float f;
if (sscanf(str, "%d", &i) == 1) // It's an int.
...
if (sscanf(str, "%f", &f) == 1) // It's a float.
...
atoi and atof either convert the number or return 0 if they can't.
I agree with Patrick_O that the strto{l,ul,ull,ll,d} functions are the best way to go. There are a couple of points to watch though.
Set errno to zero before calling the functions; no function does that for you.
The Open Group page linked to (which I went to before noticing that Patrick had linked to it too) points out that errno may not be set. It is set to ERANGE if the value is out of range; it may be set (but equally, may not be set) to EINVAL if the argument is invalid.
Depending on the job at hand, I'll sometimes arrange to skip over trailing white space from the end of conversion pointer returned, and then complain (reject) if the last character is not the terminating null '\0'. Or I can be sloppy and let garbage appear at the end, or I can accept optional multipliers like 'K', 'M', 'G', 'T' for kilobytes, megabytes, gigabytes, terabytes, ... or any other requirement based on the context.
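One way to implement the stricter policy described here, as a sketch: parse with strtoull(), accept an optional K, M, G, or T suffix, skip trailing white space, and reject anything else. The function name parse_size and the choice of binary (1024-based) multiples are assumptions; the answer does not say which multiples it uses.

```c
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: parses "<digits>[K|M|G|T][white space]" into *out.
 * Suffixes are treated as powers of 1024 (an assumption). Returns 1 on success. */
static int parse_size(const char *s, unsigned long long *out)
{
    const char *q = s;
    while (isspace((unsigned char)*q))
        q++;
    if (*q == '-')                   /* strtoull() would silently negate */
        return 0;

    char *end;
    errno = 0;                       /* no strto*() function resets errno for you */
    unsigned long long v = strtoull(s, &end, 10);
    if (end == s || errno == ERANGE)
        return 0;

    unsigned long long mult = 1;
    switch (toupper((unsigned char)*end)) {
    case 'K': mult = 1ULL << 10; end++; break;
    case 'M': mult = 1ULL << 20; end++; break;
    case 'G': mult = 1ULL << 30; end++; break;
    case 'T': mult = 1ULL << 40; end++; break;
    default:  break;
    }
    while (isspace((unsigned char)*end))   /* tolerate trailing white space */
        end++;
    if (*end != '\0' || v > ULLONG_MAX / mult)
        return 0;                    /* trailing garbage, or the product overflows */
    *out = v * mult;
    return 1;
}
```

The suffix must follow the digits directly, so "12 K" is rejected while "12K " is accepted.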
I suppose you could step through the string and check if there are any . characters in it. That's just the first thing that popped into my head though, so I'm sure there are other (better) ways to be more certain.
Use strtol/strtoll (not atoi) to check integers.
Use strtof/strtod (not atof) to check doubles.
atoi and atof convert the initial part of the string, but don't tell you whether or not they used all of the string. strtol/strtod tell you whether there was extra junk after the characters converted.
So in both cases, remember to pass in a non-null TAIL parameter, and check that it points to the end of the string (that is, **TAIL == 0). Also check the return value for underflow and overflow (see the man pages or ANSI standard for details).
Note also that strtod/strtol skip initial whitespace, so if you want to treat strings with initial whitespace as ill-formatted, you also need to check the first character.
It really depends on why you are asking in the first place.
If you just want to parse a number and don't know if it is a float or an integer, then just parse a float, it will correctly parse an integer as well.
If you actually want to know the type, maybe for triage, then you should really consider testing the types in the order you consider most relevant. For example, try to parse an integer and, if you can't, then try to parse a float. (Doing it the other way around will just classify more strings as floats, since every integer also parses as a float.)
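Here is a sketch of that ordering (integer first, then floating point), requiring each conversion to consume the whole string. The enum and the function name classify are hypothetical. Note that strtod() also accepts forms such as "1e5", "inf", and "nan", so those classify as real, and an integer too large for long falls through to the floating-point test.

```c
#include <errno.h>
#include <stdlib.h>

enum num_kind { NOT_A_NUMBER, IS_INTEGER, IS_REAL };

/* Hypothetical helper: tries strtol() first and strtod() second; in each case
 * the whole string must be consumed. Leading white space is accepted because
 * both functions skip it. */
static enum num_kind classify(const char *s)
{
    char *end;

    errno = 0;
    (void)strtol(s, &end, 10);
    if (end != s && *end == '\0' && errno == 0)
        return IS_INTEGER;

    errno = 0;
    (void)strtod(s, &end);
    if (end != s && *end == '\0' && errno == 0)
        return IS_REAL;

    return NOT_A_NUMBER;
}
```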
atoi and atof will convert the number even if there are trailing non numerical characters. However, if you use strtol and strtod it will not only skip leading white space and an optional sign, but leave you with a pointer to the first character not in the number. Then you can check that the rest is whitespace.
Well, if you don't feel like using a new function like strtoul, you could just add another strcmp statement to see if the string is 0.
i.e.
if (atof(token) != 0.0 || strcmp(token, "0") == 0)
