Is there a better way to achieve this result than using sscanf? - c

I need to scan varied incoming messages from a serial stream to check if they contain this string:
"Everything: Received: switchX yy(y)"
where X = 1 to 9 and yy(y) is "on" or "off", e.g. "Everything: Received: switch4 on" or "Everything: Received: switch2 off".
I am using the following code on an ATMega328 to do the check and pass the relevant variables to the transmit() function:
valid_data = sscanf(serial_buffer, "Everything: Received: switch%u %s", &switch_number, command);
if(valid_data == 2)
{
    if(strcmp(command, "on") == 0)
    {
        transmit(switch_number, 1);
    }
    if(strcmp(command, "off") == 0)
    {
        transmit(switch_number, 0);
    }
}
The check is triggered when the serial_buffer input ISR detects "\n". '\0' is appended to the serial stream to terminate the string.
It works and I'm not pushed for space/processing power but I just wondered if this is the best way to achieve the required result?

It works and I'm not pushed for space/processing power but I just wondered if this is the best way to achieve the required result?
It's unclear on which criteria you want us to judge, since neither speed nor memory usage is a pressing concern, but in the absence of pressure from those considerations I personally rate code simplicity and clarity as the most important criteria for source code, other than correctness, of course.
From that perspective, a solution based on sscanf() is good, especially with a comparatively simple format string such as you in fact have. The pattern for the lines you want to match is pretty clear in the format string, and the following logic is clear and simple, too. As a bonus, it also should produce small code, since a library function does most of the work, and it is reasonable to hope that the implementation has put some effort into optimizing that function for good performance, so it's probably a win even on those criteria with which you were not too concerned.
There are, however, some possible correctness issues:
sscanf() does not match whitespace literally. A run of one or more whitespace characters in the format string matches any run of zero or more whitespace characters in the input.
sscanf() skips leading whitespace before most fields, and before %u fields in particular.
switch numbers outside the specified range of 1 - 9 can be scanned.
the command buffer can easily be overrun.
sscanf() will ignore anything in the input string after the last field match
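Most of these leniencies are easy to observe with a small sketch. The helper name scan_like_question is made up for illustration, and a width of 7 has been added to %s (it is not in the question's code) so that the demo cannot itself overrun its buffer.

```c
#include <stdio.h>

/* Hypothetical helper: applies the question's format string to one line.
   The %7s width is an addition so that cmd (at least 8 bytes) cannot be
   overrun; the question's code uses a bare %s. Returns sscanf()'s result. */
int scan_like_question(const char *line, unsigned *sw, char *cmd)
{
    return sscanf(line, "Everything: Received: switch%u %7s", sw, cmd);
}
```

Given the input "Everything:Received:switch  42 on trailing junk", this returns 2 and stores 42 and "on", even though the spacing is wrong, the number is out of range, and there is extra text at the end.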
All of those issues can be dealt with, if needed. Here, for instance, is an alternative that handles all of them except the amount of whitespace between words (but including avoiding whitespace between "switch" and the digit):
unsigned char switch_number;
int nchars = 0;
int valid_data = sscanf(serial_buffer, "Everything: Received: switch%c %n",
                        &switch_number, &nchars);
if (valid_data >= 1 && switch_number - (unsigned) '1' < 9) {
    char *command = serial_buffer + nchars;
    if (strcmp(command, "on") == 0) {
        transmit(switch_number - '0', 1);
    } else if (strcmp(command, "off") == 0) {
        transmit(switch_number - '0', 0);
    } // else not a match
} // else not a match
Key differences from yours include
the switch number is read via a %c directive, which reads a single character without skipping leading whitespace. The validation condition switch_number - (unsigned) '1' < 9 ensures that the character read is between '1' and '9'. It makes use of the fact that unsigned arithmetic wraps around.
instead of reading the command into a separate buffer, the length of the leading substring is captured via a %n directive. This allows testing the whole tail against "on" and "off", thereby removing the need of an extra buffer, and enabling you to reject lines with trailing words.
If you want to check that all the whitespace, too, exactly matches, then the %n can help with that as well. For example,
if (nchars == 30 && serial_buffer[11] == ' ' && serial_buffer[21] == ' '
        && serial_buffer[29] == ' ') // it's OK
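If exact spacing matters, another option is to skip sscanf() for the fixed prefix and compare it directly. This is only a sketch of that idea: parse_switch_line is a made-up name, and it assumes serial_buffer holds the line without the trailing "\n".

```c
#include <string.h>

/* Hypothetical helper: returns 1 and fills *sw (1..9) and *on (1 or 0) only
   if buf is exactly "Everything: Received: switchX on" or "... off".
   Assumes buf does not include the trailing newline. */
int parse_switch_line(const char *buf, int *sw, int *on)
{
    static const char prefix[] = "Everything: Received: switch";
    const size_t plen = sizeof prefix - 1;

    if (strncmp(buf, prefix, plen) != 0)
        return 0;                       /* wrong prefix or wrong spacing */
    char digit = buf[plen];
    if (digit < '1' || digit > '9')
        return 0;                       /* also catches end of string */
    if (buf[plen + 1] != ' ')
        return 0;                       /* exactly one space required */
    const char *command = buf + plen + 2;
    int state;
    if (strcmp(command, "on") == 0)
        state = 1;
    else if (strcmp(command, "off") == 0)
        state = 0;
    else
        return 0;                       /* unknown command or trailing text */
    *sw = digit - '0';
    *on = state;
    return 1;
}
```

The caller would then do something like if (parse_switch_line(serial_buffer, &sw, &on)) transmit(sw, on);. Whether this reads better than the sscanf() version is a matter of taste.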

Related

How can I use sscanf to analyze string data?

How do I split a string into two strings (array name, index number) only if the string is matching the following string structure: "ArrayName[index]".
The array name can be 31 characters at most and the index 3 at most.
I found the following example which suppose to work with "Matrix[index1][index2]". I really couldn't understand how it does it in order to take apart the part I need to get my strings.
sscanf(inputString, "%32[^[]%*[[]%3[^]]%*[^[]%*[[]%3[^]]", matrixName, index1,index2) == 3
This try over here wasn't a success, what am I missing?
sscanf(inputString, "%32[^[]%*[[]%3[^]]", arrayName, index) == 2
How do I split a string into two strings (array name, index number) only if the string is matching the following string structure: "ArrayName[index]".
With sscanf, you don't. Not if you mean that you can rely on nothing being modified in the event that the input does not match the pattern. This is because sscanf, like the rest of the scanf family, processes its input and format linearly, without backtracking, and by design it fills input fields as they are successfully matched. Thus, if you scan with a format that assigns multiple fields or has trailing literal characters then it is possible for results to be stored for some fields despite a matching failure occurring.
But if that's ok with you then @gsamaras's answer provides a nearly-correct approach to parsing and validating a string according to your specified format, using sscanf. That answer also presents a nice explanation of the meaning of the format string. The problem with it is that it provides no way to distinguish between the input fully matching the format and the input failing to match at the final ], or including additional characters after.
Here is a variation on that code that accounts for those tail-end issues, too:
char array_name[32] = {0}, idx[4] = {0}, c = 0;
int n;
if (sscanf(str, "%31[^[][%3[^]]%c%n", array_name, idx, &c, &n) >= 3
        && c == ']' && str[n] == '\0')
    printf("arrayName = %s\nindex = %s\n", array_name, idx);
else
    printf("Not in the expected format \"ArrayName[idx]\"\n");
The difference in the format is the replacement of the literal terminating ] with a %c directive, which matches any one character, and the addition of a %n directive, which causes the number of characters of input read so far to be stored, without itself consuming any input.
With that, if the return value is at least 3 then we know that the whole format was matched (a %n never produces a matching failure, but docs are unclear and behavior is inconsistent on whether it contributes to the returned field count). In that event, we examine variable c to determine whether there was a closing ] where we expected to find one, and we use the character count recorded in n to verify that all characters of the string were parsed (so that str[n] refers to a string terminator).
You may at this point be wondering at how complicated and cryptic that all is. And you would be right to do so. Parsing structured input is a complicated and tricky proposition, for one thing, but also the scanf family functions are pretty difficult to use. You would be better off with a regex matcher for cases like yours, or maybe with a machine-generated lexical analyzer (see lex), possibly augmented by machine-generated parser (see yacc). Even a hand-written parser that works through the input string with string functions and character comparisons might be an improvement. It's still complicated any way around, but those tools can at least make it less cryptic.
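As a rough illustration of the hand-written route, here is a sketch that uses strchr() and memcpy(). The name split_indexed is hypothetical, and I've assumed that both the name and the index must be non-empty. Unlike the sscanf() versions, it writes nothing to the outputs unless the whole string matches.

```c
#include <string.h>

/* Hypothetical helper: splits "Name[idx]" into name (1..31 chars) and
   idx (1..3 chars). Returns 1 on a full match, 0 otherwise; the output
   buffers are left untouched when there is no match. */
int split_indexed(const char *s, char name[32], char idx[4])
{
    const char *open = strchr(s, '[');
    if (open == NULL)
        return 0;                              /* no opening bracket */
    const char *close = strchr(open + 1, ']');
    if (close == NULL || close[1] != '\0')
        return 0;                              /* no ']' or text after it */

    size_t name_len = (size_t)(open - s);
    size_t idx_len = (size_t)(close - open - 1);
    if (name_len < 1 || name_len > 31 || idx_len < 1 || idx_len > 3)
        return 0;                              /* a part is empty or too long */

    memcpy(name, s, name_len);
    name[name_len] = '\0';
    memcpy(idx, open + 1, idx_len);
    idx[idx_len] = '\0';
    return 1;
}
```

It is longer than the one-line format string, but each rejection reason is spelled out on its own line.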
Note: the above assumes that the index can be any string of up to three characters. If you meant that it must be numeric, perhaps specifically a decimal number, perhaps specifically non-negative, then the format can be adjusted to serve that purpose.
A naive example to get you started:
#include <stdio.h>
#include <string.h>
int main(void)
{
    char str[] = "myArray[123]";
    char array_name[32] = {0}, idx[4] = {0};
    if(sscanf(str, "%31[^[][%3[^]]]", array_name, idx) == 2)
        printf("arrayName = %s\nindex = %s\n", array_name, idx);
    else
        printf("Not in the expected format \"ArrayName[idx]\"\n");
    return 0;
}
Output:
arrayName = myArray
index = 123
which will find easy not-in-the-expected format cases, such as "ArrayNameidx]" and "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOP[idx]", but not "ArrayName[idx".
The essence of sscanf() is to tell it where to stop, otherwise %s would read until the next whitespace.
This negated scanset %[^[] means read until you find an opening bracket.
This negated scanset %[^]] means read until you find a closing bracket.
Note: I used 31 and 3 as the width specifiers respectively, because the array name is assumed to be at most 31 characters and the index at most 3, and each buffer needs one extra slot for the null terminator. So the size of each buffer is the maximum allowed length plus one.
How can I use sscanf to analyze string data?
Use "%n" to detect a completed scan.
array name can be 31 characters at most and the index 3 at most.
For illustration, let us assume the index needs to be limited to a numeric value [0 - 999].
Use string literal concatenation to present the format more clearly.
char array_name[32]; // array name can be 31 characters
#define NAME_FMT "%31[^[]"
char idx[4]; // index can be 3 digits
#define IDX_FMT "%3[0-9]"
int n = 0; // be sure to initialize
sscanf(str, NAME_FMT "[" IDX_FMT "]" "%n", array_name, idx, &n);
// Did scan complete (is `n` non-zero) with no extra text?
if (n && str[n] == '\0') {
    printf("arrayName = %s\nindex = %d\n", array_name, atoi(idx));
} else {
    printf("Not in the expected format \"ArrayName[idx]\"\n");
}

compare strings using sscanf, but ignore whitespaces

For a command line application I need to compare the input string to a command pattern. White spaces need to be ignored.
This line should match input strings like " drop all " and " drop all":
int rc = sscanf( input, "drop all");
But what does indicate a successful match here?
Use "%n" to record where the scanning stopped.
Add white space in the format to wherever WS in input needs to be ignored.
int n = -1;
sscanf( input, " drop all %n", &n);
//  v---- Did scanning reach the end of the format?
//  |         v---- Was there additional text in `input`?
if (n >= 0 && input[n] == '\0') Success();
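Wrapped in a function, the %n check might look like the sketch below (is_drop_all is a made-up name). One thing to be aware of: whitespace in a format matches zero or more whitespace characters, so "dropall" is accepted too. That fits "white spaces need to be ignored", but it may not be what you want.

```c
#include <stdio.h>

/* Hypothetical helper: returns 1 if input is "drop all" with any amount of
   whitespace (including none) before, between and after the two words. */
int is_drop_all(const char *input)
{
    int n = -1;                       /* stays -1 unless %n is reached */
    sscanf(input, " drop all %n", &n);
    return n >= 0 && input[n] == '\0';
}
```

With this, is_drop_all(" drop all ") and is_drop_all("dropall") both return 1, while is_drop_all("drop all now") and is_drop_all("drop") return 0.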
Rather than working with dirty data, it's often better to clean it up and then work with it. Cleanup has to only happen once, whereas dirty data adds code complexity every time you have to use it. This step is often referred to as "normalization". Normalize the input to a canonical form before using it.
Clean up the input by trimming whitespace and doing whatever other normalization is necessary (such as folding and normalizing internal whitespace).
You could write your own trim function, but I'd recommend you use a pre-existing function like GLib's g_strstrip(). GLib (the GNOME utility library) brings in all sorts of handy functions.
#include <glib.h>
void normalize_cmd( char *str ) {
g_strstrip(str);
// Any other normalization you might want to do, like
// folding multiple spaces or changing a hard tab to
// a space.
}
Then you can use strcmp on the normalized input.
// This isn't strictly necessary, but it's nice to keep a copy
// of the original around for error messages and such.
char *cmd = g_strdup(input);
normalize_cmd(cmd);
if ( strcmp(cmd, "drop all") == 0) {
puts("yes");
}
else {
puts("no");
}
Putting all the normalization up front reduces the complexity of all downstream code having to work with that input; they don't have to keep repeating the same concerns about dirty data. By putting all the normalization in one place, rather than scattered all over the code, you're sure it's consistent, and the normalization method can be consistently updated.
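If pulling in GLib is not an option, the trimming and folding can be sketched in plain C. normalize_ws below is a hypothetical stand-in for normalize_cmd, not a GLib function; it trims both ends and folds each internal run of whitespace to one space, in place.

```c
#include <ctype.h>

/* Hypothetical helper: trims leading and trailing whitespace and folds
   every internal run of whitespace into a single space, in place. */
void normalize_ws(char *s)
{
    char *out = s;           /* next position to write */
    int pending_space = 0;   /* whitespace seen since the last word? */

    for (const char *in = s; *in != '\0'; ++in) {
        if (isspace((unsigned char)*in)) {
            /* remember the gap, but only once a word has been written */
            pending_space = (out != s);
        } else {
            if (pending_space) {
                *out++ = ' ';
                pending_space = 0;
            }
            *out++ = *in;
        }
    }
    *out = '\0';             /* a trailing gap is never written out */
}
```

After normalize_ws(), an input such as "  drop \t all  " becomes "drop all" and can be compared with strcmp() as before.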

using sscanf to check string format

I want to compare my string to a given format.
The format that I want to use in the check is:
"xxx://xxx:xxx#xxxxxx" // all the xxx are with variable length
So I used sscanf() as follows:
if (sscanf(stin,"%*[^:]://%*[^:]:%*[^#]#") == 0) { ... }
Is it correct to compare the return value of sscanf() to 0 in this case?
You will get zero back if all the fields match, but you will also get zero back when matching fails partway through, so that won't tell you diddly-squat in practice. It might have failed with a colon in the first character and it would still return 0.
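A quick way to convince yourself is the sketch below. The wrapper name bare_scan is made up; it just runs the question's format and hands back whatever sscanf() returns.

```c
#include <stdio.h>

/* Hypothetical wrapper: runs the question's format, which has only
   suppressed conversions, and returns sscanf()'s result unchanged. */
int bare_scan(const char *s)
{
    return sscanf(s, "%*[^:]://%*[^:]:%*[^#]#");
}
```

Both bare_scan("a://b:c#d") and bare_scan(":abc") return 0, although only the first string has the expected shape.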
You need at least one conversion in there that is counted (%n is not counted), and that occurs at the end so you know that what went before also matched. You can never tell if trailing context (data after the last conversion specification) matched, and sscanf() won't back up if it has converted data, even if backing up would allow the trailing context to match.
For your scenario, that might be:
char c;
int n;
if (sscanf(stin, "%*[^:]://%*[^:]:%*[^#]#%n%c", &n, &c) == 1)
This requires at least one character after the #. It also tells you how many characters there were up to and including the #.
OP's suggestion is close.
@Jonathan Leffler is correct in that comparing the result of a specifier-less sscanf() against 0 does not distinguish between a match and no-match.
To test against "xxx://xxx:xxx#xxxxxx", (and assuming any part with "x" needs at least 1 matching), use
int n = 0;
sscanf(stin, "%*[^:]://%*[^:]:%*[^#]#%*c%n", &n);
if (n > 0) {
match();
}
There is an obscure hole using this method with fscanf(). A stream of data with a \0 is a problem.

Can I use fscanf to get only digits from text that contain chars and ints?

I want to extract digits from a file that contains characters and digits.
For example:
+ 321 chris polanco 23
I want to skip the '+' and get only the 321.
Here's the code I have so far.
while(fscanf(update, "%d", &currentIn->userid) != EOF){
currentIn->index = index;
rootIn = sort(rootIn, currentIn);
index = index + 1;
currentIn = malloc(sizeof(Index));
}
I was thinking that since I had %d that it would get the first digits that it finds but I was wrong. I'm open to better ways of doing this if you guys have any.
Instead of struggling with fscanf() (and running into format problems later), I recommend using a combination of fgets() and sscanf() to process each line.
If you know that the integer you are interested in starts at the 3rd position in each line of the file, then you can pass line+2 to sscanf() to read it. Otherwise, you can modify the sscanf() format string according to the format of your input file.
char line[MAX_LINE_LEN + 1];
while ( fgets(line, sizeof line, update) )
{
    if(sscanf(line+2, "%d", &currentIn->userid) != 1)
    {
        /* handle failure */
    }
    ...
}
while (fscanf(update, "%*[^0-9]%d", &currentIn->userid) == 1)
{
...
}
This skips over non-digits (that's the %*[^0-9] part) followed by an integer. The suppressed assignment isn't counted, so the == 1 ensures that you got a number.
Unfortunately, it runs into a problem if the first character in the file is a digit — as pointed out by Chris Dodd. There are multiple possible solutions to that:
ungetc('a', update); will give a non-digit to read first.
while ((fscanf(update, "%*[^0-9]"), fscanf(update, "%d", &currentIn->userid)) == 1)
Or:
while (fscanf(update, "%*[^0-9]%d", &currentIn->userid) == 1 ||
fscanf(update, "%d", &currentIn->userid) == 1)
{
...
}
Depending on which you think is more likely, you could reverse the order of these two fscanf() policies. With the scanf() family of functions, there's always a problem if the string of digits is so long that the number cannot be represented in an int; you get undefined behaviour. I don't attempt to address that.
This will pick up multiple numbers per line, one per invocation. If you want a single number per line, or otherwise want control over how each line is handled, then use fgets() or readline() to read the line, and then sscanf() to do the analysis. One advantage of this is that if you so choose, you can use careful functions like strtol() to convert digits to numbers.
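For the strtol() route, a sketch of the per-line analysis could look like this. The function name first_number is hypothetical, and like the %*[^0-9] scanset it ignores any sign in front of the digits; treat it as one possible shape rather than the definitive one.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: finds the first run of digits in line and converts
   it with strtol(). Returns 1 and stores the value in *out on success,
   0 if there are no digits or the value does not fit in a long. */
int first_number(const char *line, long *out)
{
    while (*line != '\0' && !isdigit((unsigned char)*line))
        ++line;                       /* skip everything that is not a digit */
    if (*line == '\0')
        return 0;                     /* no digits on this line */

    char *end;
    errno = 0;
    long value = strtol(line, &end, 10);
    if (errno == ERANGE)
        return 0;                     /* too large for a long */
    *out = value;
    return 1;
}
```

For the line "+ 321 chris polanco 23" this stores 321. Unlike %d, an overlong digit string is reported as a failure instead of causing undefined behaviour.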

strcmp not working

I know this may be a totally newbie question (I haven't touched C in a long while), but can someone tell me why this isn't working?
printf("Enter command: ");
bzero(buffer,256);
fgets(buffer,255,stdin);
if (strcmp(buffer, "exit") == 0)
return 0;
If I enter "exit" it doesn't enter the if, does it have to do with the length of "buffer"?
Any suggestions?
You want to do this:
strcmp(buffer, "exit\n")
That is, when you enter your string and press "enter", the newline becomes a part of buffer.
Alternately, use strncmp(), which only compares the first n characters of the string.
fgets() is returning the string "exit\n" -- unlike gets(), it preserves newlines.
As others have said, comparing with "exit" is failing because fgets() included the newline in the buffer. One of its guarantees is that the buffer will end with a newline, unless the entered line is too long for the buffer or the input ends before a newline arrives, in which case it does not end with a newline. fgets() also guarantees that the buffer is nul terminated whenever it succeeds, and it stores at most one character fewer than the size you pass, so you don't need to zero the 256 bytes first, and you can pass the full 256 rather than 255.
The easy answer of comparing to exactly "exit\n" requires that the user did not accidentally add whitespace before or after the word. That may not matter if you want to force the user to be careful with the exit command, but might be a source of user annoyance in general.
Using strncmp() potentially allows "exited", "exit42", and more to match where you might not want them. That might work against you, especially if some valid commands are prefix strings of other valid commands.
In the general case, it is often a good idea to separate I/O, tokenization, parsing, and action into their own phases.
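As a small illustration of splitting tokenization from the comparison, here is a sketch that uses strtok(). The name is_exit_command is made up, and it assumes the line may be modified in place.

```c
#include <string.h>

/* Hypothetical helper: returns 1 if line consists of the single word
   "exit", ignoring surrounding whitespace. Note that strtok() modifies
   line and keeps internal state, so it is not reentrant. */
int is_exit_command(char *line)
{
    static const char delims[] = " \t\r\n";
    char *word = strtok(line, delims);
    if (word == NULL || strcmp(word, "exit") != 0)
        return 0;                          /* empty line or another word */
    return strtok(NULL, delims) == NULL;   /* reject extra words */
}
```

This accepts "  exit \n" but rejects "exited\n" and "exit now\n", which covers both the stray-whitespace and the prefix problems mentioned above.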
Agree with Dave. Also you may wish to use strncmp() instead. Then you can set a length for the comparison.
http://www.cplusplus.com/reference/clibrary/cstdio/fgets/
http://www.cplusplus.com/reference/clibrary/cstring/strncmp/
I'd recommend that you strip the \n from the end of the string, like this.
char buf[256];
size_t len;
/* get the string; fgets() itself leaves room for the null byte */
if ( fgets(buf, sizeof(buf), stdin) == NULL )
{
    printf("error\n");
    exit(1);
}
/* absolutely always null-terminate, the easy way */
buf[sizeof(buf) - 1] = '\0';
/* compute the length, and truncate the \n if any */
len = strlen(buf);
while ( len > 0 && buf[len - 1] == '\n' )
{
    buf[len - 1] = '\0';
    --len;
}
That way, if you have to compare the inputted string against several constants, you're not having to add the \n to all of them.
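A more compact way to do the same stripping is the strcspn() idiom, sketched here as a tiny helper (the name chomp is my own, not a standard function).

```c
#include <string.h>

/* Hypothetical helper: cuts s at the first newline, if there is one.
   strcspn() returns the index of the first '\n', or the length of s
   when there is none, so the assignment is always in bounds. */
void chomp(char *s)
{
    s[strcspn(s, "\n")] = '\0';
}
```

After chomp(buf), a buffer holding "exit\n" holds "exit", and a buffer without a newline is left as it was.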
