Can sscanf be used to match wildcards? - c

For instance consider an array of C strings, all digits
..."12334", "21335", "24335"...
and I want to know how many of these strings match this wildcard mask
**33* (where * = any digit 0-9)
Could I use sscanf(str, mask, ...) to accomplish this? The format "%1d%1d%[3]%[3]%1d" seems to match more than I want (when 33 isn't there) and "%1d%1d33%1d" seems to behave weirdly, matching some but not all matching entries.
The context in my code:
if (sscanf(array[i], mask, &a1, &a2, &a3) == 3)
3 being the number of wildcard digits matched.

The format "%1d%1d33%1d" should be correct, assuming your inputs are all numbers. But you haven't told us what specific inputs it's failing on. You should consider that the strings "1 2334" and " 1\n\n233 \t 4" would actually match because %d will eat whitespace until it finds an integer.
Beware that if you were to use "%2d33%1d" this would be even worse, because a 2-character integer can be a single digit with a negative.
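To see how permissive these formats are, here is a minimal sketch that applies both of them to a few inputs. The helper names loose_match and looser_match are made up for this illustration and do not come from the question.

```c
#include <stdio.h>

/* Hypothetical helper: applies the question's format "%1d%1d33%1d".
   Returns 1 if all three wildcard digits were converted. */
int loose_match(const char *s)
{
    int a, b, c;
    return sscanf(s, "%1d%1d33%1d", &a, &b, &c) == 3;
}

/* Hypothetical helper: applies "%2d33%1d" and reports the first value read,
   so the caller can see that a sign was accepted as part of the field. */
int looser_match(const char *s, int *first)
{
    int last;
    return sscanf(s, "%2d33%1d", first, &last) == 2;
}
```

With these, loose_match accepts "12334" but also "1 2334" and " 1\n\n233 \t 4", because each %1d skips leading whitespace. looser_match accepts "-2334" and stores -2 in the first value.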
In case it's not already apparent, using sscanf for this type of matching is not appropriate. You are better off using a regular-expression library, which excels at this kind of thing.
However, by far the simplest approach, if you just want something quick that works, is to use short-circuit evaluation along with isdigit. You don't even need to check the string length:
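#include <ctype.h>

/* Nonzero if s starts with two digits, then "33", then one digit.
   The casts avoid undefined behavior when char is signed. */
int matches( const char * s )
{
    return s
        && isdigit((unsigned char)s[0])
        && isdigit((unsigned char)s[1])
        && '3' == s[2]
        && '3' == s[3]
        && isdigit((unsigned char)s[4]);
}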
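As a usage sketch for the original goal of counting matches in an array, the variant below also requires the string to end after the fifth character. That extra check is an assumption about what "matches the mask" should mean; the names matches_mask and count_matches are hypothetical.

```c
#include <ctype.h>
#include <stddef.h>

/* Strict matcher for the mask **33*: two digits, "33", one digit, end of string.
   Short-circuit evaluation stops at the first failing test, so a short string
   is never read past its terminator. */
int matches_mask(const char *s)
{
    return s
        && isdigit((unsigned char)s[0])
        && isdigit((unsigned char)s[1])
        && s[2] == '3'
        && s[3] == '3'
        && isdigit((unsigned char)s[4])
        && s[5] == '\0';
}

/* Counts how many of the n strings match the mask. */
size_t count_matches(const char *const *strings, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        count += matches_mask(strings[i]) ? 1 : 0;
    return count;
}
```

For the array {"12334", "21335", "24335", "12345", "1 2334", "123345"} this counts the first three entries only.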

Related

How can I use sscanf to analyze string data?

How do I split a string into two strings (array name, index number) only if the string is matching the following string structure: "ArrayName[index]".
The array name can be 31 characters at most and the index 3 at most.
I found the following example which suppose to work with "Matrix[index1][index2]". I really couldn't understand how it does it in order to take apart the part I need to get my strings.
sscanf(inputString, "%32[^[]%*[[]%3[^]]%*[^[]%*[[]%3[^]]", matrixName, index1,index2) == 3
This try over here wasn't a success, what am I missing?
sscanf(inputString, "%32[^[]%*[[]%3[^]]", arrayName, index) == 2
How do I split a string into two strings (array name, index number) only if the string is matching the following string structure: "ArrayName[index]".
With sscanf, you don't. Not if you mean that you can rely on nothing being modified in the event that the input does not match the pattern. This is because sscanf, like the rest of the scanf family, processes its input and format linearly, without backtracking, and by design it fills input fields as they are successfully matched. Thus, if you scan with a format that assigns multiple fields or has trailing literal characters then it is possible for results to be stored for some fields despite a matching failure occurring.
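A small sketch of that behavior, using the two-field format for "ArrayName[index]" with a closing ] as trailing literal. The wrapper name scan_array_ref is invented for the illustration.

```c
#include <stdio.h>

/* Hypothetical wrapper: returns the raw sscanf result. name and idx receive
   whatever fields were matched before the scan stopped. */
int scan_array_ref(const char *s, char name[32], char idx[4])
{
    return sscanf(s, "%31[^[][%3[^]]]", name, idx);
}
```

Scanning "abc" returns 1 and overwrites name with "abc" even though the input does not match the pattern, while idx keeps its previous contents. Scanning "arr[12" returns 2, the same value as for a full match, because the missing ] is only trailing literal text.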
But if that's ok with you then @gsamaras's answer provides a nearly-correct approach to parsing and validating a string according to your specified format, using sscanf. That answer also presents a nice explanation of the meaning of the format string. The problem with it is that it provides no way to distinguish between the input fully matching the format and the input failing to match at the final ], or containing additional characters after it.
Here is a variation on that code that accounts for those tail-end issues, too:
char array_name[32] = {0}, idx[4] = {0}, c = 0;
int n;
if (sscanf(str, "%31[^[][%3[^]]%c%n", array_name, idx, &c, &n) >= 3
&& c == ']' && str[n] == '\0')
printf("arrayName = %s\nindex = %s\n", array_name, idx);
else
printf("Not in the expected format \"ArrayName[idx]\"\n");
The difference in the format is the replacement of the literal terminating ] with a %c directive, which matches any one character, and the addition of a %n directive, which causes the number of characters of input read so far to be stored, without itself consuming any input.
With that, if the return value is at least 3 then we know that the whole format was matched (a %n never produces a matching failure, but docs are unclear and behavior is inconsistent on whether it contributes to the returned field count). In that event, we examine variable c to determine whether there was a closing ] where we expected to find one, and we use the character count recorded in n to verify that all characters of the string were parsed (so that str[n] refers to a string terminator).
You may at this point be wondering at how complicated and cryptic that all is. And you would be right to do so. Parsing structured input is a complicated and tricky proposition, for one thing, but also the scanf family functions are pretty difficult to use. You would be better off with a regex matcher for cases like yours, or maybe with a machine-generated lexical analyzer (see lex), possibly augmented by machine-generated parser (see yacc). Even a hand-written parser that works through the input string with string functions and character comparisons might be an improvement. It's still complicated any way around, but those tools can at least make it less cryptic.
Note: the above assumes that the index can be any string of up to three characters. If you meant that it must be numeric, perhaps specifically a decimal number, perhaps specifically non-negative, then the format can be adjusted to serve that purpose.
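For comparison, here is a rough sketch of the hand-written alternative mentioned above, using only strchr and memcpy. The function name split_array_ref is hypothetical, and the sketch assumes the same rules as the sscanf version: a name of 1 to 31 characters and an index of 1 to 3 arbitrary characters.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: splits "Name[idx]" into name and idx.
   Returns 1 on success. On failure it returns 0 and leaves both
   output buffers untouched, which sscanf cannot guarantee. */
int split_array_ref(const char *s, char name[32], char idx[4])
{
    const char *open = strchr(s, '[');
    if (!open)
        return 0;

    const char *close = strchr(open + 1, ']');
    if (!close || close[1] != '\0')      /* ] must be the last character */
        return 0;

    size_t name_len = (size_t)(open - s);
    size_t idx_len = (size_t)(close - open - 1);
    if (name_len < 1 || name_len > 31 || idx_len < 1 || idx_len > 3)
        return 0;

    memcpy(name, s, name_len);
    name[name_len] = '\0';
    memcpy(idx, open + 1, idx_len);
    idx[idx_len] = '\0';
    return 1;
}
```

All validation happens before anything is copied, so a rejected input never leaves partial results behind.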
A naive example to get you started:
#include <stdio.h>
#include <string.h>
int main(void)
{
char str[] = "myArray[123]";
char array_name[32] = {0}, idx[4] = {0};
if(sscanf(str, "%31[^[][%3[^]]]", array_name, idx) == 2)
printf("arrayName = %s\nindex = %s\n", array_name, idx);
else
printf("Not in the expected format \"ArrayName[idx]\"\n");
return 0;
}
Output:
arrayName = myArray
index = 123
which will find easy not-in-the-expected format cases, such as "ArrayNameidx]" and "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOP[idx]", but not "ArrayName[idx".
The essence of sscanf() is to tell it where to stop, otherwise %s would read until the next whitespace.
This negated scanset %[^[] means read until you find an opening bracket.
This negated scanset %[^]] means read until you find a closing bracket.
Note: I used 31 and 3 as the width specifiers respectively, because the name of the array is assumed to be 31 characters at most and the index 3 at most, and the last slot of each buffer must be reserved for the null terminator. The size of each token's array is therefore the maximum allowed length plus one.
How can I use sscanf to analyze string data?
Use "%n" to detect a completed scan.
array name can be 31 characters at most and the index 3 at most.
For illustration, let us assume the index is limited to a numeric value [0 - 999].
Use string literal concatenation to present the format more clearly.
char array_name[32];  // array name can be 31 characters
#define NAME_FMT "%31[^[]"
char idx[4];          // index can be 3 digits
#define IDX_FMT "%3[0-9]"
int n = 0;            // be sure to initialize
sscanf(str, NAME_FMT "[" IDX_FMT "]" "%n", array_name, idx, &n);
// Did scan complete (is `n` non-zero) with no extra text?
if (n && str[n] == '\0') {
    printf("arrayName = %s\nindex = %d\n", array_name, atoi(idx));  // atoi() needs <stdlib.h>
} else {
    printf("Not in the expected format \"ArrayName[idx]\"\n");
}

Is there a better way to achieve this result than using sscanf?

I need to scan varied incoming messages from a serial stream to check if they contain this string:
"Everything: Received: switchX yy(y)"
where X = 1 to 9 and yy(y) is "on" or "off". i.e. "Everything: Received: switch4 on" or "Everything: Received: switch2 off" etc.
I am using the following code on an ATMega328 to do the check and pass the relevant variables to the transmit() function:
valid_data = sscanf(serial_buffer, "Everything: Received: switch%u %s", &switch_number, command);
if(valid_data == 2)
{
if(strcmp(command, "on") == 0)
{
transmit(switch_number, 1);
}
if(strcmp(command, "off") == 0)
{
transmit(switch_number, 0);
}
}
The check is triggered when the serial_buffer input ISR detects "\n". '\0' is appended to the serial stream to terminate the string.
It works and I'm not pushed for space/processing power but I just wondered if this is the best way to achieve the required result?
It's unclear on which criteria you want us to judge, since neither speed nor memory usage is a pressing concern, but in the absence of pressure from those considerations I personally rate code simplicity and clarity as the most important criteria for source code, other than correctness, of course.
From that perspective, a solution based on sscanf() is good, especially with a comparatively simple format string such as you in fact have. The pattern for the lines you want to match is pretty clear in the format string, and the following logic is clear and simple, too. As a bonus, it also should produce small code, since a library function does most of the work, and it is reasonable to hope that the implementation has put some effort into optimizing that function for good performance, so it's probably a win even on those criteria with which you were not too concerned.
There are, however, some possible correctness issues:
sscanf() does not match whitespace literally. A run of one or more whitespace characters in the format string matches any run of zero or more whitespace characters in the input.
sscanf() skips leading whitespace before most fields, and before %u fields in particular.
switch numbers outside the specified range of 1 - 9 can be scanned.
the command buffer can easily be overrun.
sscanf() will ignore anything in the input string after the last field match
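A short sketch that makes several of these issues visible at once. The wrapper name loose_switch_scan is made up, and the %15s width is an added safeguard that the question's code does not have.

```c
#include <stdio.h>

/* Hypothetical wrapper around the question's format. The only change is
   %15s instead of %s, so the 16-byte command buffer cannot be overrun. */
int loose_switch_scan(const char *line, unsigned *number, char command[16])
{
    return sscanf(line, "Everything: Received: switch%u %15s", number, command);
}
```

Given "Everything:Received:switch 42 on and more", this returns 2 with number 42 and command "on": the missing spaces, the space before the number, the out-of-range value and the trailing words all go unnoticed.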
All of those issues can be dealt with, if needed. Here, for instance, is an alternative that handles all of them except the amount of whitespace between words (but including rejecting whitespace between "switch" and the digit):
unsigned char switch_number;
int nchars = 0;
int valid_data = sscanf(serial_buffer, "Everything: Received: switch%c %n",
&switch_number, &nchars);
if (valid_data >= 1 && switch_number - (unsigned) '1' < 9) {
char *command = serial_buffer + nchars;
if (strcmp(command, "on") == 0) {
transmit(switch_number - '0', 1);
} else if (strcmp(command, "off") == 0) {
transmit(switch_number - '0', 0);
} // else not a match
} // else not a match
Key differences from yours include
the switch number is read via a %c directive, which reads a single character without skipping leading whitespace. The validation condition switch_number - (unsigned) '1' < 9 ensures that the character read is between '1' and '9'. It makes use of the fact that unsigned arithmetic wraps around.
instead of reading the command into a separate buffer, the length of the leading substring is captured via a %n directive. This allows testing the whole tail against "on" and "off", thereby removing the need for an extra buffer, and enabling you to reject lines with trailing words.
If you want to check that all the whitespace, too, exactly matches, then the %n can help with that as well. For example,
if (nchars == 30 && serial_buffer[11] == ' ' && serial_buffer[21] == ' '
    && serial_buffer[29] == ' ') // it's OK
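Putting the %c, %n and exact-whitespace checks together, a complete validator might look like the sketch below. The function name parse_switch and its output parameters are hypothetical, and it assumes the buffer holds the line without a trailing newline.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns 1 and fills *number (1-9) and *state
   (1 = on, 0 = off) only if line is exactly
   "Everything: Received: switchN on" or "... switchN off". */
int parse_switch(const char *line, int *number, int *state)
{
    char digit = 0;
    int nchars = 0;   /* stays 0 if the scan never reaches %n */

    if (sscanf(line, "Everything: Received: switch%c %n", &digit, &nchars) < 1)
        return 0;
    if (digit < '1' || digit > '9')
        return 0;
    /* nchars is tested first, so the indexed reads below are in bounds:
       exactly one space after each of the first two words and the digit. */
    if (nchars != 30 || line[11] != ' ' || line[21] != ' ' || line[29] != ' ')
        return 0;

    if (strcmp(line + nchars, "on") == 0)
        *state = 1;
    else if (strcmp(line + nchars, "off") == 0)
        *state = 0;
    else
        return 0;

    *number = digit - '0';
    return 1;
}
```

A caller would then write something like if (parse_switch(serial_buffer, &n, &on)) transmit(n, on);.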

using sscanf to check string format

I want to compare my string to a giving format.
the format that I want to use in the check is :
"xxx://xxx:xxx#xxxxxx" // all the xxx are with variable length
so I used the sscanf() as follow :
if (sscanf(stin,"%*[^:]://%*[^:]:%*[^#]#") == 0) { ... }
is it correct to compare the return of scanf to 0 in this case?
You will get zero back when all the fields match, but you will also get zero when matching fails partway through, so the result won't tell you diddly-squat in practice. The scan might have failed on a colon in the first character and it would still return 0.
You need at least one conversion in there that is counted (%n is not counted), and that occurs at the end so you know that what went before also matched. You can never tell if trailing context (data after the last conversion specification) matched, and sscanf() won't back up if it has converted data, even if backing up would allow the trailing context to match.
For your scenario, that might be:
char c;
int n;
if (sscanf(stin, "%*[^:]://%*[^:]:%*[^#]#%n%c", &n, &c) == 1)
This requires at least one character after the #. It also tells you how many characters there were up to and including the #.
OP's suggestion is close.
@Jonathan Leffler is correct in that comparing the result of a specifier-less sscanf() against 0 does not distinguish between a match and no-match.
To test against "xxx://xxx:xxx#xxxxxx" (assuming every "x" part needs at least one character), use
int n = 0;
sscanf(stin, "%*[^:]://%*[^:]:%*[^#]#%*c%n", &n);
if (n > 0) {
match();
}
There is an obscure hole when using this method with fscanf(): a stream of data that contains a \0 is a problem.
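The %n test above can be wrapped in a small predicate, sketched here with the made-up name has_expected_shape, to check how it behaves on a few inputs.

```c
#include <stdio.h>

/* Hypothetical predicate: 1 if s looks like "xxx://xxx:xxx#x...", with at
   least one character in every variable part, 0 otherwise. */
int has_expected_shape(const char *s)
{
    int n = 0;   /* stays 0 unless the scan gets all the way to %n */
    sscanf(s, "%*[^:]://%*[^:]:%*[^#]#%*c%n", &n);
    return n > 0;
}
```

It accepts "abc://host:port#frag" and rejects "abc://host:port#" (nothing after the #), "://host:port#frag" (empty first part) and "abc://host#frag" (no second colon).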

Efficient way to extract formatted data in C

I have a string of the form
[S{i,j} : this is stack overflow]
I want to extract i,j and this is stack overflow in two separate strings.
sscanf will not work here as strings to be extracted can have spaces.
Can anyone please suggest an efficient way to do this?
If the string cannot contain the ending square bracket, you can absolutely use sscanf():
int i, j;
char text[128];
if( sscanf(input, "[S{%d,%d} : %127[^]]", &i, &j, text) == 3 )
{
printf("got it all!\n");
}
For more information about the somewhat lesser-known conversion specifier [, see for instance this manual page. Basically, the conversion %[^]] means "all characters except a closing square bracket". It is a special form of the syntax: the ^ negates the set, and a ] placed immediately after the ^ is taken as a member of the set rather than as its end, so the second ] is the one that closes it.
UPDATE If you really mean "in two separate strings", then of course the above is wrong since it parses out the numbers into int-type variables. To get the pair as a string, use something like:
char ij[32], text[128];
if( sscanf(input, "[S{%31[^}]} : %127[^]]", ij, text) == 2 )
{
}
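As a sketch of how the two-string version could be used, the function below also verifies the closing bracket with a trailing ]%n, a technique the snippet above does not include. The name parse_message is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: extracts "i,j" and the message text from
   "[S{i,j} : text]". Returns 1 only if the closing ] is present and
   nothing follows it. */
int parse_message(const char *input, char ij[32], char text[128])
{
    int n = 0;   /* set by %n only if the literal ] after the text matched */
    if (sscanf(input, "[S{%31[^}]} : %127[^]]]%n", ij, text, &n) < 2)
        return 0;
    return n > 0 && input[n] == '\0';
}
```

The text may contain spaces because the scanset %[^]] stops only at a closing bracket, unlike %s, which stops at whitespace.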

Switching numbers inside string

I got a string, that inside it has:
2#0.88315#1#1.5005#true#0.112 and it keep going...
I need to switch every number that's 2 or bigger to 1,
so I wrote this :
for (i = 0 ; i < strlen(data) ; i++)
{
if (data[i] >= 50 && data[i] <= 57) // If it's a number
{
data[i] = '1'; // switch it to one
while (data[i] >= 48 && data[i] <= 57)
{
i++;
}
}
}
The problem is that it turns numbers like 0.051511 into 1.111111 too...
Because it doesn't look at a double as one number, but at every digit separately...
How can I do it ?
Thanks
To clarify the question since it is unclear, you want to have the following input:
"2#0.88315#1#1.5005#true#0.112"
To be modified to be the following:
"1#0.88315#1#1#true#0.112"
Your problem is that you need to parse each number into a float value to do any sort of comparison. Either this, or you will need to manually parse it by checking for a '.' character. Doing it manually is rigid, error-prone and unnecessary because the C standard library provides functions which can help you.
Since this is homework, I'll give you some tips on how to approach this problem instead of the actual solution. What you should do is try to write a solution with these steps and if you get stuck, edit the original question with the code you wrote, where it is failing and why you think it is failing.
Your first step is to tokenise the input into the following:
"2"
"0.88315"
"1"
"1.5005"
"true"
"0.112"
This can be done by iterating through the string and either splitting it or using the pointer after which a '#' character occurs. Splitting the string can be done with strtok. However, strtok will split the string by modifying it which is not necessarily needed in our case. The simpler method is simply to iterate through the string and stop each time after a '#' character is reached. The input would then be tokenised to the following:
"2#0.88315#1#1.5005#true#0.112"
"0.88315#1#1.5005#true#0.112"
"1#1.5005#true#0.112"
"1.5005#true#0.112"
"true#0.112"
"0.112"
Some of these substrings do not start with a string which represents a float. You will need to determine which of them do. To do this, you can attempt to parse the front of each string as a float. This can be done with sscanf. After parsing the floats, you will be able to do the comparison you want to.
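As a hint for this one step only, here is a sketch of testing whether a substring starts with a number. The helper name leading_number is invented, and requiring the number to be followed by '#' or the end of the string is an assumption about the input format.

```c
#include <stdio.h>

/* Hypothetical helper: if token starts with a number that runs up to the
   next '#' or the end of the string, stores the value and its length in
   characters and returns 1. Otherwise returns 0. */
int leading_number(const char *token, double *value, int *length)
{
    int n = 0;
    if (sscanf(token, "%lf%n", value, &n) != 1)
        return 0;                 /* e.g. "true#0.112" */
    if (token[n] != '#' && token[n] != '\0')
        return 0;                 /* number followed by other text */
    *length = n;
    return 1;
}
```

The length reported through %n is what tells you how many characters the original number occupied in the string.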
You are trying to modify the string into a different length so when replacing a float value by a '1', you need to check the length of the original value. If it is longer than 1 character, you will have to shift the subsequent characters forward. For example:
"3.423#1"
If you parsed the first token and found it to be 2 or greater, you would replace the first character with a '1'. This results in:
"1.423#1"
You then still need to delete the rest of that token by shifting the rest of the string down to get:
"1#1"
It looks like you're comparing a char and an int in your if statements.
You should figure out why this matters and compensate for it.
You're comparing the characters in the string one at a time. If you need to consider everything between the "#" symbols as one number, this won't work. Try to get these numbers into an array, convert them to double (for example with strtod), and then do your comparison against 2.
