For a command line application I need to compare the input string to a command pattern. White spaces need to be ignored.
This line should match input strings like " drop all " and " drop all":
int rc = sscanf( input, "drop all");
But what indicates a successful match here?
Use "%n" to record where the scanning stopped.
Add white space in the format to wherever WS in input needs to be ignored.
int n = -1;
sscanf( input, " drop all %n", &n);
//  v---- Did scanning reach the end of the format?
//  |         v---- Was there additional text in `input`?
if (n >= 0 && input[n] == '\0') Success();
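Wrapped up as a function, the check above might look like the following sketch. The name matches_drop_all is made up for illustration; everything else is standard sscanf() behaviour.

```c
#include <stdio.h>

/* Returns 1 if `input` is "drop all" with optional surrounding
   white space, 0 otherwise. The function name is hypothetical. */
int matches_drop_all(const char *input)
{
    int n = -1;                      /* stays -1 unless %n is reached */
    sscanf(input, " drop all %n", &n);
    /* n >= 0: the whole format matched; input[n] == '\0': nothing left over */
    return n >= 0 && input[n] == '\0';
}
```

One caveat: white space in a scanf format matches zero or more white-space characters, so this also accepts "dropall". If that matters, normalizing the input first is likely the simpler route.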
Rather than working with dirty data, it's often better to clean it up and then work with it. Cleanup has to only happen once, whereas dirty data adds code complexity every time you have to use it. This step is often referred to as "normalization". Normalize the input to a canonical form before using it.
Clean up the input by trimming whitespace and doing whatever other normalization is necessary (such as folding and normalizing internal whitespace).
You could write your own trim function, but I'd recommend you use a pre-existing function like GLib's g_strstrip(). GLib (the GNOME utility library) brings in all sorts of handy functions.
#include <glib.h>
void normalize_cmd( char *str ) {
    g_strstrip(str);
    // Any other normalization you might want to do, like
    // folding multiple spaces or changing a hard tab to
    // a space.
}
Then you can use strcmp on the normalized input.
// This isn't strictly necessary, but it's nice to keep a copy
// of the original around for error messages and such.
char *cmd = g_strdup(input);
normalize_cmd(cmd);
if ( strcmp(cmd, "drop all") == 0) {
    puts("yes");
}
else {
    puts("no");
}
Putting all the normalization up front reduces the complexity of all downstream code having to work with that input; they don't have to keep repeating the same concerns about dirty data. By putting all the normalization in one place, rather than scattered all over the code, you're sure it's consistent, and the normalization method can be consistently updated.
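If pulling in GLib is not an option, the trimming and folding can be done in plain C. The following is only a sketch under that assumption: normalize_ws is a hypothetical helper, not a GLib function, and it folds any run of white space (including tabs) into one space.

```c
#include <ctype.h>

/* Hypothetical helper: trims leading/trailing white space in place
   and folds each internal run of white space into a single space. */
void normalize_ws(char *str)
{
    char *dst = str;                 /* write position, never ahead of src */
    int pending_space = 0;

    for (const char *src = str; *src != '\0'; ++src) {
        if (isspace((unsigned char)*src)) {
            /* Remember the gap, but only once something has been written,
               which is what drops leading white space. */
            pending_space = (dst != str);
        } else {
            if (pending_space) {
                *dst++ = ' ';
                pending_space = 0;
            }
            *dst++ = *src;
        }
    }
    *dst = '\0';                     /* a trailing gap is never written */
}
```

After normalize_ws(cmd), an input such as "  drop \t all  " becomes "drop all" and can be compared with strcmp as above.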
I need to create a method that gets commands from users using scanf and runs a function. The command can be as simple as help or list, but it can also be a command that has an argument, like look DIRECTION or take ITEM. What is the best way to go about this? I could just loop through the characters of a single given string and check it manually, but I was wondering if there was a better way of doing this.
scanf("%s %s", command, argument);
This won't work if there's no argument. Is there a way around this?
There is a 'method' that may work. In fact, two come to mind.
Both rely on whitespace characters (in plain English, '\n', ' ' and '\t') separating the arguments, and I assume this is good enough.
1
First, the relatively easy one - using main(int argc,char *argv[]) as most CLI programs do.
Then, run a long chain of if()s/else if()s which check whether the input string matches a valid argument, by testing if strcmp(argv[x],expected_command) returns 0.
You may not yet have been taught how to use this, and it may appear scary, but it's quite easy if you are already familiar with string.h, arrays and pointers.
Google searches and YouTube videos may be of help, and it won't take more than 20 or so minutes.
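As a rough illustration of the argv approach, the chain of strcmp() tests could look like this. The enum, the function name parse_command and the choice of commands are assumptions made for the example.

```c
#include <string.h>

/* Hypothetical command codes for illustration. */
enum command { CMD_UNKNOWN, CMD_HELP, CMD_LIST, CMD_LOOK };

/* Maps argv[1] to a command code with a chain of strcmp() tests. */
enum command parse_command(int argc, char *argv[])
{
    if (argc < 2)
        return CMD_UNKNOWN;          /* no command given */
    if (strcmp(argv[1], "help") == 0)
        return CMD_HELP;
    if (strcmp(argv[1], "list") == 0)
        return CMD_LIST;
    /* "look" needs one argument, e.g. ./myprogram look north */
    if (strcmp(argv[1], "look") == 0 && argc >= 3)
        return CMD_LOOK;
    return CMD_UNKNOWN;
}
```

main(int argc, char *argv[]) would simply pass its own arguments through and switch on the result; for CMD_LOOK the direction is argv[2].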
2
Second, if your program has a real CLI 'UI' and runs in a loop rather than terminating once output is generated (unlike, say, cat or ls), then you take input of 'command' strings within the program.
This means you will have to, apart from and before the if-ed strcmp()s, ensure that you take input with scanf() safely, and that you are able to take multiple strings as input, since you talk of sub-arguments like look DIRECTION.
The way I have done this myself (in the past) is as follows :
1. Declare a command string, say char cmd[21] = ""; and (optionally) initialise it to be empty, since reading an uninitialised string is UB (and the user may enter EOF, in which case scanf() stores nothing).
2. Declare a function (for convenience) to check scanf() say like so:
int handle_scanf(int returned,int expected){
    if(returned==expected)
        return 0;
    if(returned==EOF){
        puts("\n Error : Input Terminated Prematurely.");
        /* you may alternatively do perror() but then
           will have to deal with resetting errno=0 and
           including errno.h */
        return -1;
    }
    else{
        puts("\n Error : Insufficient Input.");
        return -2;
    }
}
This can be used as: if(handle_scanf(scanf(xyz,&xyz),1)==0) {...}
This works because scanf() returns the number of items 'taken' (items that matched the expected format string and were therefore saved), and here only 1 item is expected.
3. Declare a function (for convenience) to clear/flush stdin, so that any unnecessary input left in the input stream (which, if not dealt with, will be passed to the next place where input is taken) can be 'eaten'.
I do it like so :
void eat(void)
{
    int c;
    while ((c = getchar()) != '\n' && c != EOF)
        ; /* discard */
}
This essentially discards input until a newline or EOF is read. Since '\n' marks the end of a line and EOF the end of the input, and since terminal input on stdin is normally delivered a line at a time, it makes sense to stop upon reading them.
EDIT : You may alternatively use a macro, for slightly better performance.
4. Print a prompt and take input, like so :
fputs("\n >>> ",stdout);
int check = handle_scanf(scanf("%20s",cmd),1);
Notice what I did here ?
"%20s" does two things - stops buffer overflow (because more than 20 chars won't be scanned into cmd) and also stops scanning when a whitespace char is encountered. So, your main command must be one-word.
5. Check if the command is valid.
This is to be done with the aforementioned chain of checks of the form strcmp(cmd,"expected_cmd")==0, one for each expected command.
If there is no match, use an else to display an error message and call eat(); (arguments to an invalid command can be ignored), but only if(check != -1).
If check==-1, the user has probably sent an EOF to the program, in which case prompting again and calling eat() within a loop will result in an infinite loop displaying the error message, which you don't want.
6. If there is a match, absorb the separating whitespace char and then scanf() into a char array (if the user entered look DIRECTION, DIRECTION is still in the input stream and will only now be saved to said char array). This can be done like so:
#define SOME_SIZE 100 // use an appropriate size; it must agree with the %99 width below
if(strcmp(cmd,"look")==0 && check==0){ // do if(check==0) before these ifs, done here just for my convenience
    getchar(); // absorb whitespace separator
    char strbuff[SOME_SIZE] = ""; // string buffer of appropriate size
    if(handle_scanf(scanf("%99[^\n]",strbuff),1)==0){
        eat();
        /* look at DIRECTION :) */
    }
    // handle_scanf() generated appropriate error msg if it doesn't return 0
}
Result
All in all, this code handles scanf mostly safely and can indeed be used in a way that the user will only type, say:
$ ./myprogram
>>> look DIRECTION
# output
>>> | #cursor
This works as long as it is all done within a big loop inside main().
Conclusion
In reality, you may end up needing to use both together if your program is complex enough :)
I hope my slightly delayed answer is of help :)
In case of any inaccuracies , or missing details, please comment and I will get back to you ASAP
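A different technique from the steps above, shown here only as a sketch: read the whole line first (for example with fgets()) and then split it with a single sscanf() call, using the return value to tell whether an argument was present. The function name split_command and the buffer sizes are assumptions for this example.

```c
#include <stdio.h>

/* Splits `line` into a command word and an optional argument.
   Returns the number of fields filled: 0, 1 or 2.
   The sizes 21 and 100 are assumptions matching the widths below. */
int split_command(const char *line, char cmd[21], char arg[100])
{
    cmd[0] = arg[0] = '\0';
    /* %20s reads one word; %99[^\n] reads the rest of the line, if any */
    int fields = sscanf(line, "%20s %99[^\n]", cmd, arg);
    return fields < 0 ? 0 : fields;   /* EOF (blank line) counts as 0 */
}
```

With char line[128]; and fgets(line, sizeof line, stdin), a result of 1 means a bare command such as help, and 2 means a command with an argument such as look north. Because the line is consumed by fgets(), nothing is left in stdin to flush afterwards.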
Here's one way to parse an input string using scanf and strtok, with a limit of 99 characters:
#include <stdio.h>
#include <string.h>
char command[100] = ""; // 99 characters plus the terminating '\0'
scanf("%99[^\n]%*c", command); // reads the whole line, spaces included, and discards the newline
char *token;
token = strtok(command, " "); // token = the first word separated by a " ", or NULL if there is none
if (token == NULL) {
    // empty input, nothing to do
}
else if (strcmp(token, "help") == 0) {
    // do function
}
else if (strcmp(token, "go") == 0) { // if the command has an argument, you have to get the next string
    token = strtok(NULL, " "); // this gets the next string separated by a space
    if (token != NULL && strcmp(token, "north") == 0) {
        // do function
    }
}
You can keep calling token = strtok(NULL, " "); until token == NULL, which signifies the end of the string. Always check for NULL before passing token to strcmp.
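The repeated strtok() calls can be wrapped in a loop that collects every word of the line. This is a minimal sketch; split_tokens and its signature are invented for the example.

```c
#include <string.h>

/* Splits `line` in place on spaces and stores up to `max` token
   pointers in `out`. Returns the number of tokens stored.
   The function name and signature are illustrative only. */
size_t split_tokens(char *line, char *out[], size_t max)
{
    size_t count = 0;
    for (char *tok = strtok(line, " ");
         tok != NULL && count < max;
         tok = strtok(NULL, " ")) {
        out[count++] = tok;          /* points into `line`, not a copy */
    }
    return count;
}
```

Note that strtok() modifies the buffer it scans and keeps internal state between calls, so the line must be writable and two such loops cannot be interleaved.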
Say I have a string like this:
Hello World - this is a line of textCOLOR="4"
and this string is stored in buf[1000]
As you can see this string has a color tag in the format of COLOR="n". The number is what needs to be pulled (it can be between 1 and 56) and assigned to an int variable. I'd like for this tag to be able to be anywhere in the string.
I can run the following code to extract the color value:
int colorNumber = 1; //default value
if (sscanf(buf, "%*[^\"]\"%2d[^\"]\"", &colorNumber)) {
// work with number
}
and that works fine, but if the string were to contain a number or quote in it then the scanf would fail to produce the number.
I've tried a few variations of my second scanf argument, but those didn't work. I tried "%*[^\"]COLOR=\"%2d[^\"]\"" but that doesn't seem to work at all.
I've looked through the man pages for scanf, but I couldn't find what I was looking for in there either.
Perhaps scanf is not the right tool for this? I'm willing to try other libraries/functions if necessary.
Try
if (sscanf(buf, "%*[^C]COLOR=\"%d", &colorNum) == 1)
The reason your attempted "%*[^\"]COLOR"... format didn't work is that the %*[^\"] matches everything up to the ", skipping past the COLOR, so it will then fail to match COLOR. The above will fail if there's another C somewhere else in the string before COLOR, however. To avoid that you're probably better off using strstr, possibly in a loop if COLOR might appear in multiple places.
for (const char *p = buf; (p = strstr(p, "COLOR")) != NULL; ++p) {
    if (sscanf(p, "COLOR=\"%d", &colorNum) == 1) {
        // found a color number
    }
}
Note also that trying to match characters after the %d is pointless: regardless of whether they match or not, the returned value will be the same, so there's no way to tell. If you want to ensure there's a " after the number, you need something like
int end = 0;
if (sscanf(p, "COLOR=\"%d\"%n", &colorNum, &end), end > 0)
By checking that %n wrote something to end, you're also checking that everything before the %n matched correctly.
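Putting the strstr loop and the %n check together gives something like the sketch below. The helper name find_color and the fallback parameter are assumptions; the 1 to 56 range comes from the question.

```c
#include <stdio.h>
#include <string.h>

/* Returns the first COLOR="n" value in `buf` with 1 <= n <= 56,
   or `fallback` if no well-formed tag is found. Hypothetical helper. */
int find_color(const char *buf, int fallback)
{
    for (const char *p = buf; (p = strstr(p, "COLOR=\"")) != NULL; ++p) {
        int value;
        int end = 0;     /* set by %n only if the closing quote matched */
        sscanf(p, "COLOR=\"%2d\"%n", &value, &end);
        /* `value` is only read when end > 0, i.e. when %2d succeeded */
        if (end > 0 && value >= 1 && value <= 56)
            return value;
    }
    return fallback;
}
```

Because the search starts from each occurrence of COLOR=" rather than from the start of the string, other quotes, digits or capital Cs earlier in the text no longer interfere.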
I need to scan varied incoming messages from a serial stream to check if they contain this string:
"Everything: Received: switchX yy(y)"
where X = 1 to 9 and yy(y) is "on" or "off". i.e. "Everything: Received: switch4 on" or "Everything: Received: switch2 off" etc.
I am using the following code on an ATMega328 to do the check and pass the relevant variables to the transmit() function:
valid_data = sscanf(serial_buffer, "Everything: Received: switch%u %s", &switch_number, command);
if(valid_data == 2)
{
if(strcmp(command, "on") == 0)
{
transmit(switch_number, 1);
}
if(strcmp(command, "off") == 0)
{
transmit(switch_number, 0);
}
}
The check is triggered when the serial_buffer input ISR detects "\n". '\0' is appended to the serial stream to terminate the string.
It works and I'm not pushed for space/processing power but I just wondered if this is the best way to achieve the required result?
It's unclear on which criteria you want us to judge, since neither speed nor memory usage is a pressing concern, but in the absence of pressure from those considerations I personally rate code simplicity and clarity as the most important criteria for source code, other than correctness, of course.
From that perspective, a solution based on sscanf() is good, especially with a comparatively simple format string such as you in fact have. The pattern for the lines you want to match is pretty clear in the format string, and the following logic is clear and simple, too. As a bonus, it also should produce small code, since a library function does most of the work, and it is reasonable to hope that the implementation has put some effort into optimizing that function for good performance, so it's probably a win even on those criteria with which you were not too concerned.
There are, however, some possible correctness issues:
sscanf() does not match whitespace literally. A run of one or more whitespace characters in the format string matches any run of zero or more whitespace characters in the input.
sscanf() skips leading whitespace before most fields, and before %u fields in particular.
switch numbers outside the specified range of 1 - 9 can be scanned.
the command buffer can easily be overrun.
sscanf() will ignore anything in the input string after the last field match
All of those issues can be dealt with, if needed. Here, for instance, is an alternative that handles all of them except the amount of whitespace between words (but including avoiding whitespace between "switch" and the digit):
unsigned char switch_number;
int nchars = 0;
int valid_data = sscanf(serial_buffer, "Everything: Received: switch%c %n",
                        &switch_number, &nchars);
if (valid_data >= 1 && switch_number - (unsigned) '1' < 9) {
    char *command = serial_buffer + nchars;
    if (strcmp(command, "on") == 0) {
        transmit(switch_number - '0', 1);
    } else if (strcmp(command, "off") == 0) {
        transmit(switch_number - '0', 0);
    } // else not a match
} // else not a match
Key differences from yours include
the switch number is read via a %c directive, which reads a single character without skipping leading whitespace. The validation condition switch_number - (unsigned) '1' < 9 ensures that the character read is between '1' and '9'. It makes use of the fact that unsigned arithmetic wraps around.
instead of reading the command into a separate buffer, the length of the leading substring is captured via a %n directive. This allows testing the whole tail against "on" and "off", thereby removing the need of an extra buffer, and enabling you to reject lines with trailing words.
If you want to check that all the whitespace, too, exactly matches, then the %n can help with that as well. For example,
if (nchars == 30 && serial_buffer[11] == ' ' && serial_buffer[21] == ' '
        && serial_buffer[29] == ' ') // it's OK
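For testing away from the hardware, the same logic can be isolated in a function that reports its result instead of calling transmit(). This is a sketch only: parse_switch and its out-parameters are invented here, and it does not enforce the exact-whitespace check.

```c
#include <stdio.h>
#include <string.h>

/* Parses "Everything: Received: switchX on|off" (X = 1..9).
   On success stores the switch number and state (1 = on, 0 = off)
   and returns 1; otherwise returns 0. Hypothetical wrapper. */
int parse_switch(const char *msg, unsigned *number, int *state)
{
    char digit;
    int nchars = 0;                  /* set by %n once the prefix matched */

    if (sscanf(msg, "Everything: Received: switch%c %n", &digit, &nchars) < 1
        || nchars == 0)
        return 0;
    if (digit < '1' || digit > '9')
        return 0;

    const char *command = msg + nchars;   /* tail after the prefix */
    if (strcmp(command, "on") == 0)
        *state = 1;
    else if (strcmp(command, "off") == 0)
        *state = 0;
    else
        return 0;                    /* trailing words or unknown command */

    *number = (unsigned)(digit - '0');
    return 1;
}
```

The caller would then do transmit(number, state) when the function returns 1.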
Here's some code I found in a very old C library that's trying to eat whitespace from a file...
while(
(line_buf[++line_idx] != ' ') &&
(line_buf[ line_idx] != ' ') &&
(line_buf[ line_idx] != ',') &&
(line_buf[ line_idx] != '\0') )
{
This great thread explains what the problem is, but most of the answers are "just ignore it" or "you should never do this". What I don't see, however, is the canonical solution. Can anyone offer a way to code this test using the "proper way"?
UPDATE: to clarify, the question is "what is the proper way to test for the presence of a string of one or more characters at a given index in another string". Forgive me if I am using the wrong terminology.
Original question
There is no canonical or correct way. Multi-character constants have always been implementation defined. Look up the documentation for the compiler used when the code was written and figure out what was meant.
Updated question
You can match multiple characters using strchr().
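// strchr() also matches the terminating '\0', so this stops at a space,
// a comma, or the end of the string, like the original tests.
while (!strchr( " ,", line_buf[++line_idx] ))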
{
Again, this does not account for that multi-char constant. You should figure out why that was there before simply removing it.
Also, strchr() does not handle Unicode. If you are dealing with a UTF-8 stream, for example, you will need a function capable of handling it.
Finally, if you are concerned about speed, profile. The compiler might get you better results using the three (or four) individual test expressions in the ‘while’ condition.
In other words, the multiple tests might be the best solution!
Beyond that, I smell some uncouth indexing: the way that line_idx is updated depends on the surrounding code to actuate the loop properly. Make sure that you don’t create an off-by-one error when you update stuff.
Good luck!
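Another standard option, offered here as a sketch rather than as what the original library intended, is strcspn() from <string.h>, which measures the run of characters that are not in a given set and so replaces the whole loop. The helper name next_delimiter is hypothetical.

```c
#include <string.h>

/* Returns the index of the first space, comma or terminating '\0'
   at or after `start` in `line`. Hypothetical helper. */
size_t next_delimiter(const char *line, size_t start)
{
    /* strcspn() counts the leading characters that are NOT in " ,". */
    return start + strcspn(line + start, " ,");
}
```

Since the original loop pre-increments line_idx before testing, the equivalent call would be line_idx = next_delimiter(line_buf, line_idx + 1), assuming line_idx + 1 does not point past the terminator. Like strchr(), this does nothing about the multi-character constant.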
UPDATE: to clarify, the question is "what is the proper way to test
for the presence of a string of one or more characters at a given
index in another string". Forgive me if I am using the wrong
terminology.
Well, there are a number of ways, but the standard way is using strspn which has the prototype:
size_t strspn(const char *s, const char *accept);
and it cleverly:
calculates the length (in bytes) of the initial segment of s
which consists entirely of bytes in accept.
This allows you to test for the "the presence of a string of one or more characters at a given index in another string" and tells you how many of the characters from that string were sequentially matched.
For example, if you had another string, say char *s = "somestring";, and wanted to know whether it contained the letters r, s, t (given as char *accept = "rst";) beginning at the 5th character, you could test:
size_t n;
if ((n = strspn (&s[4], accept)) > 0)
printf ("matched %zu chars from '%s' at beginning of '%s'\n",
n, accept, &s[4]);
To compare in order, you can use strncmp (&s[4], accept, strlen (accept)) == 0. You can also simply use nested loops to iterate over s with the characters in accept.
All of the ways are "proper", so long as they do not invoke Undefined Behavior (and are reasonably efficient).
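The difference between the two tests is easy to see side by side. In this sketch the function names are invented, and both assume offset is no larger than strlen(s).

```c
#include <string.h>

/* Number of leading characters of s+offset that belong to `accept`,
   in any order. Assumes offset <= strlen(s). */
size_t run_length_at(const char *s, size_t offset, const char *accept)
{
    return strspn(s + offset, accept);
}

/* 1 if `word` occurs in s exactly at `offset`, in order; else 0. */
int has_word_at(const char *s, size_t offset, const char *word)
{
    return strncmp(s + offset, word, strlen(word)) == 0;
}
```

For "somestring" at offset 4, run_length_at with "rst" gives 3 (the run "str" uses only those letters), while has_word_at is true for "str" but false for "rst", because strncmp cares about order.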
I want to compare my string to a given format.
The format that I want to use in the check is:
"xxx://xxx:xxx#xxxxxx" // all the xxx are with variable length
So I used sscanf() as follows:
if (sscanf(stin,"%*[^:]://%*[^:]:%*[^#]#") == 0) { ... }
Is it correct to compare the return value of sscanf() to 0 in this case?
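You will get zero back whether or not all the fields match, because every conversion in that format is suppressed with * and so nothing is counted; the return value therefore tells you diddly-squat in practice. The scan might have failed with a colon in the first character and it would still return 0.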
You need at least one conversion in there that is counted (%n is not counted), and that occurs at the end so you know that what went before also matched. You can never tell if trailing context (data after the last conversion specification) matched, and sscanf() won't backup if it has converted data, even if backing up would allow the trailing context to match.
For your scenario, that might be:
char c;
int n;
if (sscanf(stin, "%*[^:]://%*[^:]:%*[^#]#%n%c", &n, &c) == 1)
This requires at least one character after the #. It also tells you how many characters there were up to and including the #.
OP's suggestion is close.
@Jonathan Leffler is correct in that comparing the result of a specifier-less sscanf() against 0 does not distinguish between a match and a no-match.
To test against "xxx://xxx:xxx#xxxxxx", (and assuming any part with "x" needs at least 1 matching), use
int n = 0;
sscanf(stin, "%*[^:]://%*[^:]:%*[^#]#%*c%n", &n);
if (n > 0) {
match();
}
There is an obscure hole in this method when used with fscanf(): a stream of data containing a '\0' is a problem.
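As a self-contained sketch of the %n approach for this format (the function name matches_format is made up for the example):

```c
#include <stdio.h>

/* Returns 1 if `s` looks like "xxx://xxx:xxx#xxxxxx" with every
   xxx part non-empty, else 0. Hypothetical helper name. */
int matches_format(const char *s)
{
    int n = 0;                       /* written only if %n is reached */
    /* %*[...] needs at least one character; %*c demands one after '#' */
    sscanf(s, "%*[^:]://%*[^:]:%*[^#]#%*c%n", &n);
    return n > 0;
}
```

This only verifies that at least one character follows the #; anything after that first character is not inspected.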