Efficient way to extract formatted data in C - c

I have a string of the form
[S{i,j} : this is stack overflow]
I want to extract i,j and this is stack overflow in two separate strings.
sscanf will not work here as strings to be extracted can have spaces.
Can anyone please suggest an efficient way to do this?

If the string cannot contain the ending square bracket, you can absolutely use sscanf():
int i, j;
char text[128];
if( sscanf(input, "[S{%d,%d} : %127[^]]", &i, &j, text) == 3 )
{
    printf("got it all!\n");
}
For more information about the somewhat lesser-known conversion specifier [, see for instance this manual page. Basically, the conversion %[^]] means "all characters except a closing square bracket". It is a special form of the syntax: the ^ negates the set, and a ] placed immediately after the ^ is taken as a member of the set rather than as its end, so the second ] is the one that closes the set.
UPDATE If you really mean "in two separate strings", then of course the above is wrong since it parses out the numbers into int-type variables. To get the pair as a string, use something like:
char ij[32], text[128];
if( sscanf(input, "[S{%31[^}]} : %127[^]]", ij, text) == 2 )
{
}
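Putting the pieces together, a minimal sketch might look like the following. The function name split_record and the buffer sizes are assumptions for illustration, not part of the question. It also adds a literal ] and a %n at the end of the format so that the closing bracket is actually checked, which the snippets above do not do.

```c
#include <stdio.h>

/* Hypothetical helper: splits "[S{i,j} : text]" into two strings.
 * Returns 1 on a full match, 0 otherwise. The buffer sizes (32 and 128)
 * are illustrative and must stay in step with the widths in the format. */
int split_record(const char *input, char ij[32], char text[128])
{
    int end = 0;  /* set by %n only if everything before it matched */

    /* %31[^}]  : up to 31 characters that are not '}'
     * %127[^]] : up to 127 characters that are not ']'
     * ]%n      : require the closing bracket, then record the offset */
    if (sscanf(input, "[S{%31[^}]} : %127[^]]]%n", ij, text, &end) != 2)
        return 0;

    /* end is still 0 if the closing ']' was missing; input[end] must be
     * the terminator if nothing may follow the bracket. */
    return end > 0 && input[end] == '\0';
}
```

Called on the string from the question, this should leave "i,j" in the first buffer and "this is stack overflow" in the second.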

Related

How can I use sscanf to analyze string data?

How do I split a string into two strings (array name, index number) only if the string is matching the following string structure: "ArrayName[index]".
The array name can be 31 characters at most and the index 3 at most.
I found the following example which suppose to work with "Matrix[index1][index2]". I really couldn't understand how it does it in order to take apart the part I need to get my strings.
sscanf(inputString, "%32[^[]%*[[]%3[^]]%*[^[]%*[[]%3[^]]", matrixName, index1,index2) == 3
This try over here wasn't a success, what am I missing?
sscanf(inputString, "%32[^[]%*[[]%3[^]]", arrayName, index) == 2
How do I split a string into two strings (array name, index number) only if the string is matching the following string structure: "ArrayName[index]".
With sscanf, you don't. Not if you mean that you can rely on nothing being modified in the event that the input does not match the pattern. This is because sscanf, like the rest of the scanf family, processes its input and format linearly, without backtracking, and by design it fills input fields as they are successfully matched. Thus, if you scan with a format that assigns multiple fields or has trailing literal characters then it is possible for results to be stored for some fields despite a matching failure occurring.
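A small sketch of that behaviour, using a two-field format for "ArrayName[index]" (the function name and the "?" placeholder values are assumptions for illustration):

```c
#include <stdio.h>
#include <string.h>

/* Illustrative only: shows that a field can be filled in even though the
 * overall match fails. Returns whatever sscanf() returned. */
int scan_array_ref(const char *input, char name[32], char idx[4])
{
    /* Pre-fill both buffers so that any change is visible afterwards. */
    strcpy(name, "?");
    strcpy(idx, "?");
    return sscanf(input, "%31[^[][%3[^]]]", name, idx);
}
```

With the input "myArray" (no brackets at all) the call returns 1 rather than 2, yet name already holds "myArray" while idx still holds "?".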
But if that's OK with you then @gsamaras's answer provides a nearly-correct approach to parsing and validating a string according to your specified format, using sscanf. That answer also presents a nice explanation of the meaning of the format string. The problem with it is that it provides no way to distinguish between the input fully matching the format and the input failing to match at the final ], or including additional characters after it.
Here is a variation on that code that accounts for those tail-end issues, too:
char array_name[32] = {0}, idx[4] = {0}, c = 0;
int n;
if (sscanf(str, "%31[^[][%3[^]]%c%n", array_name, idx, &c, &n) >= 3
        && c == ']' && str[n] == '\0')
    printf("arrayName = %s\nindex = %s\n", array_name, idx);
else
    printf("Not in the expected format \"ArrayName[idx]\"\n");
The difference in the format is the replacement of the literal terminating ] with a %c directive, which matches any one character, and the addition of a %n directive, which causes the number of characters of input read so far to be stored, without itself consuming any input.
With that, if the return value is at least 3 then we know that the whole format was matched (a %n never produces a matching failure, but docs are unclear and behavior is inconsistent on whether it contributes to the returned field count). In that event, we examine variable c to determine whether there was a closing ] where we expected to find one, and we use the character count recorded in n to verify that all characters of the string were parsed (so that str[n] refers to a string terminator).
You may at this point be wondering at how complicated and cryptic that all is. And you would be right to do so. Parsing structured input is a complicated and tricky proposition, for one thing, but also the scanf family functions are pretty difficult to use. You would be better off with a regex matcher for cases like yours, or maybe with a machine-generated lexical analyzer (see lex), possibly augmented by machine-generated parser (see yacc). Even a hand-written parser that works through the input string with string functions and character comparisons might be an improvement. It's still complicated any way around, but those tools can at least make it less cryptic.
Note: the above assumes that the index can be any string of up to three characters. If you meant that it must be numeric, perhaps specifically a decimal number, perhaps specifically non-negative, then the format can be adjusted to serve that purpose.
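For comparison, here is a minimal sketch of the hand-written alternative mentioned above, built on strchr() and memcpy(). The function name is hypothetical, and the limits (a name of 1 to 31 characters, an index of 1 to 3 characters) are taken from the question. Unlike sscanf, it writes to the output buffers only after the whole input has been validated.

```c
#include <string.h>

/* Hypothetical hand-written parser for "ArrayName[idx]".
 * Returns 1 on success, 0 otherwise. On failure the output buffers are
 * left untouched. */
int parse_array_ref(const char *s, char name[32], char idx[4])
{
    const char *open = strchr(s, '[');
    if (open == NULL)
        return 0;

    const char *close = strchr(open + 1, ']');
    if (close == NULL || close[1] != '\0')   /* ']' must be the last char */
        return 0;

    size_t name_len = (size_t)(open - s);
    size_t idx_len = (size_t)(close - open - 1);
    if (name_len < 1 || name_len > 31 || idx_len < 1 || idx_len > 3)
        return 0;

    /* Everything checked out: only now copy into the caller's buffers. */
    memcpy(name, s, name_len);
    name[name_len] = '\0';
    memcpy(idx, open + 1, idx_len);
    idx[idx_len] = '\0';
    return 1;
}
```

It is longer than the one-line format string, but each rule is spelled out as an ordinary comparison.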
A naive example to get you started:
#include <stdio.h>
#include <string.h>
int main(void)
{
    char str[] = "myArray[123]";
    char array_name[32] = {0}, idx[4] = {0};
    if(sscanf(str, "%31[^[][%3[^]]]", array_name, idx) == 2)
        printf("arrayName = %s\nindex = %s\n", array_name, idx);
    else
        printf("Not in the expected format \"ArrayName[idx]\"\n");
    return 0;
}
Output:
arrayName = myArray
index = 123
which will find easy not-in-the-expected format cases, such as "ArrayNameidx]" and "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOP[idx]", but not "ArrayName[idx".
The essence of sscanf() is to tell it where to stop, otherwise %s would read until the next whitespace.
This negated scanset %[^[] means read until you find an opening bracket.
This negated scanset %[^]] means read until you find a closing bracket.
Note: I used 31 and 3 as the width specifiers respectively, since we want to reserve the last slot for the null terminator: the name of the array is assumed to be 31 characters at the most, and the index 3 at the most. The size of the array for each token is the maximum allowed length, plus one.
How can I use sscanf to analyze string data?
Use "%n" to detect a completed scan.
array name can be 31 characters at most and the index 3 at most.
For illustration, let us assume the index is limited to a numeric value [0 - 999].
Use string literal concatenation to present the format more clearly.
char array_name[32]; // array name can be 31 characters
#define NAME_FMT "%31[^[]"
char idx[4]; // index can be 3 digits
#define IDX_FMT "%3[0-9]"
int n = 0; // be sure to initialize
sscanf(str, NAME_FMT "[" IDX_FMT "]" "%n", array_name, idx, &n);
// Did scan complete (is `n` non-zero) with no extra text?
if (n && str[n] == '\0') {
    printf("arrayName = %s\nindex = %d\n", array_name, atoi(idx)); // atoi() needs <stdlib.h>
} else {
    printf("Not in the expected format \"ArrayName[idx]\"\n");
}

Can sscanf be used to match wildcards?

For instance consider an array of C strings, all digits
..."12334", "21335", "24335"...
and I want to know how many of these strings matches this wildcard mask
**33* (where * = any digit 0-9)
Could I use sscanf(str, mask, ...) to accomplish this? The format "%1d%1d%[3]%[3]%1d" seems to match more than I want (when 33 isn't there) and "%1d%1d33%1d" seems to behave weirdly, matching some but not all matching entries.
The context in my code:
if (sscanf(array[i], mask, &a1, &a2, &a3) == 3)
3 being the number of wildcard digits matched.
The format "%1d%1d33%1d" should be correct, assuming your inputs are all numbers. But you haven't told us what specific inputs it's failing on. You should consider that the strings "1 2334" and " 1\n\n233 \t 4" would actually match because %d will eat whitespace until it finds an integer.
Beware that if you were to use "%2d33%1d" this would be even worse, because a 2-character integer can be a single digit with a negative.
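Both pitfalls are easy to reproduce. The two wrappers below are hypothetical helpers written only to expose the return value of sscanf() for each format:

```c
#include <stdio.h>

/* Returns the number of fields sscanf() converts with the question's format. */
int count_fields(const char *s)
{
    int a, b, c;
    return sscanf(s, "%1d%1d33%1d", &a, &b, &c);
}

/* Same idea with the "%2d33%1d" variant; the first field goes to *first. */
int count_fields_2d(const char *s, int *first)
{
    int last;
    return sscanf(s, "%2d33%1d", first, &last);
}
```

Here count_fields("1 2334") yields 3 even though the string is not five consecutive digits, and count_fields_2d("-1334", &first) yields 2 with first set to -1.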
In case it's not already apparent, sscanf is not an appropriate tool for this type of matching. You are better off using a regular-expression library, which excels at this kind of thing.
However, by far the simplest approach, if you just want something quick that works, is to use short-circuit evaluation along with isdigit. You don't even need to check the string length:
#include <ctype.h>
int matches( const char * s )
{
    return s
        && isdigit((unsigned char)s[0])
        && isdigit((unsigned char)s[1])
        && '3' == s[2]
        && '3' == s[3]
        && isdigit((unsigned char)s[4]);
}
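If the mask is not known at compile time, the same idea can be generalized. The sketch below is an assumption-laden illustration, not part of the answer: mask_matches and count_matches are hypothetical names, '*' in the mask stands for any single digit as in the question, and, unlike the fixed check above, the string must have exactly the same length as the mask.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if s matches mask, where '*' in the mask
 * stands for any single digit and every other mask character must match
 * exactly. Trailing characters in s are rejected. */
int mask_matches(const char *s, const char *mask)
{
    if (s == NULL || mask == NULL)
        return 0;

    for (; *mask != '\0'; s++, mask++) {
        if (*mask == '*') {
            /* Cast to unsigned char: isdigit() on a negative char is undefined. */
            if (!isdigit((unsigned char)*s))
                return 0;
        } else if (*s != *mask) {
            return 0;
        }
    }
    return *s == '\0';
}

/* Counts how many of the n strings match the mask. */
size_t count_matches(const char *const strings[], size_t n, const char *mask)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        count += (size_t)mask_matches(strings[i], mask);
    return count;
}
```

For the three sample strings in the question ("12334", "21335", "24335") and the mask "**33*", count_matches should report 3.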

Sscanf not returning what I want

I have the following problem:
sscanf is not returning the way I want it to.
This is the sscanf:
sscanf(naru,
"%s[^;]%s[^;]%s[^;]%s[^;]%f[^';']%f[^';']%[^;]%[^;]%[^;]%[^;]"
"%[^;]%[^;]%[^;]%[^;]%[^;]%[^;]%[^;]%[^;]%[^;]%[^;]%[^;]%[^;]"
"%[^;]%[^;]%[^;]%[^;]%[^;]%[^;]",
&jokeri, &paiva1, &keskilampo1, &minlampo1, &maxlampo1,
&paiva2, &keskilampo2, &minlampo2, &maxlampo2, &paiva3,
&keskilampo3, &minlampo3, &maxlampo3, &paiva4, &keskilampo4,
&minlampo4, &maxlampo4, &paiva5, &keskilampo5, &minlampo5,
&maxlampo5, &paiva6, &keskilampo6, &minlampo6, &maxlampo6,
&paiva7, &keskilampo7, &minlampo7, &maxlampo7);
The string it's scanning:
const char *str = "city;"
"2014-04-14;7.61;4.76;7.61;"
"2014-04-15;5.7;5.26;6.63;"
"2014-04-16;4.84;2.49;5.26;"
"2014-04-17;2.13;1.22;3.45;"
"2014-04-18;3;2.15;3.01;"
"2014-04-19;7.28;3.82;7.28;"
"2014-04-20;10.62;5.5;10.62;";
All of the variables are stored as char paiva1[22] etc; however, the sscanf isn't storing anything except the city correctly. I've been trying to stop each variable at ;.
Any help how to get it to store the dates etc correctly would be appreciated.
Or if there's a smarter way to do this, I'm open to suggestions.
There are multiple problems, but BLUEPIXY hit the first one — the scan-set notation doesn't follow %s.
Your first line of the format is:
"%s[^;]%s[^;]%s[^;]%s[^;]%f[^';']%f[^';']%[^;]%[^;]%[^;]%[^;]"
As it stands, it looks for a space separated word, followed by a [, a ^, a ;, and a ] (which is self-contradictory; the character after the string is a space or end of string).
The first fixup would be to use scan-sets properly:
"%[^;]%[^;]%[^;]%[^;]%f[^';']%f[^';']%[^;]%[^;]%[^;]%[^;]"
Now you have a problem that the first %[^;] scans everything up to the end of string or first semicolon, leaving nothing for the second %[^;] to match, because the next character is the semicolon itself. The format has to consume the semicolons too:
"%[^;]; %[^;]; %[^;]; %[^;]; %f[^';']%f[^';']%[^;]%[^;]%[^;]%[^;]"
This looks for a string up to a semicolon, then for the semicolon, then optional white space, then repeats for three items. Apart from adding a length to limit the size of string, preventing overflow, these are fine. The %f is OK. The following material looks for an odd sequence of characters again.
However, when the data is looked at, it seems to consist of a city, and then seven sets of 'a date plus three numbers'.
You'd do better with an array of structures (if you've worked with those yet), or a set of 4 parallel arrays, and a loop:
char jokeri[30];
char paiva[7][30];
float keskilampo[7];
float minlampo[7];
float maxlampo[7];
int eoc; // End of conversion
int offset = 0;
char sep;
if (sscanf(str + offset, "%29[^;]%c%n", jokeri, &sep, &eoc) != 2 || sep != ';')
    ...report error...
offset += eoc;
for (int i = 0; i < 7; i++)
{
    if (sscanf(str + offset, "%29[^;];%f;%f;%f%c%n", paiva[i],
               &keskilampo[i], &minlampo[i], &maxlampo[i], &sep, &eoc) != 5 ||
        sep != ';')
        ...report error...
    offset += eoc;
}
See also How to use sscanf() in loops.
Now you have data that can be managed. The set of 29 separately named variables is a ghastly thought; the code using them will be horrid.
Note that the scan-set conversion specifications limit the string to a maximum length one shorter than the size of jokeri and the paiva array elements.
You might legitimately be wondering about why the code uses %c%n and &sep before &eoc. There is a reason, but it is subtle. Suppose that the sscanf() format string is:
"%29[^;];%f;%f;%f;%n"
Further, suppose there's a problem in the data: the semicolon after the third number is missing. The call to sscanf() will report that it made 4 successful conversions, but it doesn't count the %n as an assignment, so you can't tell that sscanf() didn't find a semicolon and therefore did not set eoc at all; the value is left over from a previous call to sscanf(), or simply uninitialized. By using the %c to scan a value into sep, we get 5 returned on success, and we can be sure the %n was successful too. The code checks that the value in sep is in fact a semicolon and not something else.
You might want to consider a space before the semi-colons, and before the %c. They'll allow some other data strings to be converted that would not be matched otherwise. Spaces in a format string (outside a scan-set) indicate where optional white space may appear.
I would use the strtok function to break your string into pieces using ; as a delimiter. Such a long format string may be a source of problems in the future.
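A minimal sketch of that strtok() approach follows. The struct layout, the function name parse_forecast, and the buffer sizes are assumptions made for illustration. Two caveats apply: strtok() modifies the buffer it scans, so it needs a writable copy of the data rather than a string literal, and it treats consecutive semicolons as a single delimiter, so empty fields are skipped silently.

```c
#include <stdlib.h>
#include <string.h>

struct day {
    char date[30];
    float avg, min, max;
};

/* Hypothetical sketch: splits "city;date;avg;min;max;date;avg;min;max;..."
 * into a city and up to max_days records. Returns the number of complete
 * records parsed, or -1 if the city is missing. The numeric fields are
 * converted with strtof() and are not validated here. */
int parse_forecast(char *line, char city[30], struct day days[], int max_days)
{
    char *tok = strtok(line, ";");
    if (tok == NULL)
        return -1;
    strncpy(city, tok, 29);
    city[29] = '\0';

    int n = 0;
    while (n < max_days) {
        char *date = strtok(NULL, ";");
        char *avg  = strtok(NULL, ";");
        char *min  = strtok(NULL, ";");
        char *max  = strtok(NULL, ";");
        if (max == NULL)   /* incomplete group of four: stop */
            break;
        strncpy(days[n].date, date, sizeof days[n].date - 1);
        days[n].date[sizeof days[n].date - 1] = '\0';
        days[n].avg = strtof(avg, NULL);
        days[n].min = strtof(min, NULL);
        days[n].max = strtof(max, NULL);
        n++;
    }
    return n;
}
```

To use it with the string from the question, first copy the data into a char array (for example char line[512]) and pass that array together with an array of seven struct day elements.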

using sscanf to check string format

I want to compare my string to a given format.
The format that I want to use in the check is:
"xxx://xxx:xxx#xxxxxx" // all the xxx are with variable length
So I used sscanf() as follows:
if (sscanf(stin,"%*[^:]://%*[^:]:%*[^#]#") == 0) { ... }
is it correct to compare the return of scanf to 0 in this case?
You will get zero back if all the fields match, but you will also get zero if the match fails partway through, so the return value tells you nothing in practice. The scan might have failed on a colon in the first character and it would still return 0.
You need at least one conversion in there that is counted (%n is not counted), and that occurs at the end so you know that what went before also matched. You can never tell whether trailing context (literal text after the last conversion specification) matched, and sscanf() won't back up once it has converted data, even if backing up would allow the trailing context to match.
For your scenario, that might be:
char c;
int n;
if (sscanf(stin, "%*[^:]://%*[^:]:%*[^#]#%n%c", &n, &c) == 1)
This requires at least one character after the #. It also tells you how many characters there were up to and including the #.
OP's suggestion is close.
@Jonathan Leffler is correct in that comparing the result of a specifier-less sscanf() against 0 does not distinguish between a match and no-match.
To test against "xxx://xxx:xxx#xxxxxx", (and assuming any part with "x" needs at least 1 matching), use
int n = 0;
sscanf(stin, "%*[^:]://%*[^:]:%*[^#]#%*c%n", &n);
if (n > 0) {
    match();
}
There is an obscure hole in this method when it is used with fscanf(): a stream of data containing a \0 is a problem.
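Wrapped in a function, the %n technique could look like the sketch below. The function name is a hypothetical choice; the format string is the one shown above.

```c
#include <stdio.h>

/* Hypothetical wrapper around the %n technique: returns 1 if s looks like
 * "xxx://xxx:xxx#xxx" with at least one character in every part. */
int has_expected_format(const char *s)
{
    int n = 0;  /* stays 0 unless the scan reaches the trailing %n */

    /* Every conversion is suppressed with '*', so the return value of
     * sscanf() is of no use here; only n shows how far the scan got. */
    sscanf(s, "%*[^:]://%*[^:]:%*[^#]#%*c%n", &n);
    return n > 0;
}
```

A string such as "abc://host:80#frag" should be accepted, while "abc:/host:80#frag" (one slash missing) and "abc://host:80#" (nothing after the #) should be rejected.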

C, reading simple formula strings from input

For a homework assignment, we need to input simple formulas (such as 3*2, 4+10, 50/16, etc.) and calculate the result (and remainder) using only addition, subtraction, and bit shifting. I could use three successive input reads, but I thought I'd try getting the formula in one pass using fgets() and sscanf(). Here is what I have:
int *v; // value (left side)
int *m; // modifier (right side)
char *o; // operator
int res = sscanf(buffer,"%d%s%d",v,o,m);
But naturally, this does not work, because o gets all the remaining portion of the string, leaving m with nothing (m equals whatever value is where and when it is declared)
Now, what would be the proper way to accomplish this?
NOTE : I'm using a trim function to trim extra spaces.
Try %c instead of %s. If the operator is always a single character, and there are no spaces between the operator and the operands, this should work.
By the way, are you initializing v, m, and o to actually point to something? It would be much better to do this:
int v;
int m;
char o;
int res = sscanf(buffer, "%d%c%d", &v, &o, &m);
As my "Intro to Programming in C" professor used to say: "C is dangerous. Practice safe C!".
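You may use %[-+*/] (with your set of operators) to ensure that the operator string gets only operator characters. Note that %[ stores a null-terminated string, so the destination must be a char array rather than a single char.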
Since you don't have extraneous spaces and operators are all one character long, you could use %c to fill in o.
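Combining these suggestions, a minimal sketch could use a width-limited scanset for the operator. The function name parse_formula and the choice to allow optional white space around the operator are assumptions for illustration.

```c
#include <stdio.h>

/* Hypothetical helper: parses "<int><op><int>" where op is one of + - * /.
 * Returns 1 on success, 0 otherwise. */
int parse_formula(const char *buffer, int *v, char *o, int *m)
{
    char op[2];  /* %1[...] stores one character plus the terminator */

    /* The '-' comes first in the scanset so that it is taken literally
     * rather than as a range. The spaces in the format skip optional
     * white space around the operator. */
    if (sscanf(buffer, "%d %1[-+*/] %d", v, op, m) != 3)
        return 0;
    *o = op[0];
    return 1;
}
```

Checking that the return value is 3 is what distinguishes a complete formula from a partial match such as "3x2".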

Resources