Using sscanf to parse two strings out - c

I have a semi-XML formatted file that contains lines with the following format:
<param name="Distance" value="1000Km" />
The first char in the string is usually a TAB or spaces.
I've been using the following to try to parse the two strings out (from name and value):
if(sscanf(lineread, "\t<param name=\"%s\" value=\"%s\" />", name, value) == 1)
{
//do something
}
name and value are char*
Now, the result is always the same: name gets parsed (I need to remove the quotes) and value is always empty.
What am I doing wrong?
Thanks, code is appreciated.
Jess.

As usual, a scanset is probably your best answer:
sscanf(lineread, "%*[^\"]\"%[^\"]\"%*[^\"]\"%[^\"]\"", name, value);
Of course, for real code you also want to limit the lengths of the conversions:
#include <stdio.h>
int main() {
char lineread[] = "<param name=\"Distance\" value=\"1000Km\" />";
char name[256], value[256];
sscanf(lineread, "%*[^\"]\"%255[^\"]\"%*[^\"]\"%255[^\"]\"", name, value);
printf("%s\t%s", name, value);
return 0;
}
Edit: BTW, sscanf returns the number of successful conversions, so in your original code, you probably wanted to compare to 2 instead of 1.
Edit2: This much: %*[^\"]\" means "read and ignore characters other than a quote mark, then read and skip across a quote mark". The next %255[^\"]\" means "read up to 255 characters other than a quote mark, then read and skip across a quote mark". That whole pattern is then repeated to read the second string.

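The problem with the original code was that %s stops only at whitespace. Hence, name gets Distance" and not Distance as expected. The literal quote that follows %s in the format then fails to match the space in the input, so the scan stops there and value is never filled.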
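To see the difference concretely, here is a minimal sketch that runs both kinds of format over the sample line. The helper names parse_with_s and parse_with_scanset and the 64-byte buffers are assumptions made for illustration; they are not part of the original code.

```c
#include <stdio.h>

/* Hypothetical helper: uses %s as in the question (with a width added).
   Returns the number of successful conversions reported by sscanf. */
int parse_with_s(const char *line, char name[64], char value[64])
{
    /* The leading space in the format skips any leading tabs or spaces. */
    return sscanf(line, " <param name=\"%63s\" value=\"%63s\" />", name, value);
}

/* Hypothetical helper: same layout, but each string stops at a quote mark. */
int parse_with_scanset(const char *line, char name[64], char value[64])
{
    return sscanf(line, " <param name=\"%63[^\"]\" value=\"%63[^\"]\"", name, value);
}
```

On the sample line, parse_with_s returns 1 and leaves the trailing quote in name, while parse_with_scanset returns 2 with Distance and 1000Km.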

Related

Extract formatted input from user

I have some user input following this format:
Playa Raco#path#5#39.244|-0.257#0-23
The # here acts as a separator, and the | is also a separator for the latitude and longitude. I would like to extract this information. Note that the strings could have spaces.
I tried using the %[^\n]%*c formatter with scanf and adding # and |, but it doesn't work because it matches the whole line.
I would like to keep this as simple as possible, I know that I could do this reading each char, but I'm curious to see best practices and check if there is a scanf or similar alternative for this.
As mentioned in the comments, there are many ways you can parse the information from the string. You can walk a pair of pointers down the string, testing each character and taking the appropriate action. You can use strtok(), but note that strtok() modifies the original string, so it cannot be used on a string literal. You can use sscanf() to parse the values from the string. Or you can use any combination of strcspn(), strspn(), strchr(), etc. and then manually copy each field between a start and end pointer.
However, your question also imposes "I would like to keep this as simple as possible..." and that points directly to sscanf(). You simply need to validate the return and you are done. For example, you could do:
#include <stdio.h>
#define MAXC 16 /* adjust as necessary */
int main (void) {
const char *str = "Playa Raco#path#5#39.244|-0.257#0-23";
char name[MAXC], path[MAXC], last[MAXC];
int num;
double lat, lon;
if (sscanf (str, "%15[^#]#%15[^#]#%d#%lf|%lf#%15[^\n]",
name, path, &num, &lat, &lon, last) == 6) {
printf ("name : %s\npath : %s\nnum : %d\n"
"lat : %f\nlon : %f\nlast : %s\n",
name, path, num, lat, lon, last);
}
else
fputs ("error: parsing values from str.\n", stderr);
}
(note: the %[..] conversion does not consume leading whitespace, so if there is a possibility of leading whitespace or a space following '#' before a string conversion, include a space in the format string, e.g. " %15[^#]# %15[^#]#%d#%lf|%lf# %15[^\n]")
Here each string portion of the input to be split is declared as a 16-character array. Looking at the format string, you will note the read of each string is limited to 15 characters (plus the nul-terminating character) to ensure you do not attempt to store more characters than your arrays can hold (that would invoke Undefined Behavior). Since there are six conversions requested, you validate the conversion by ensuring the return is 6.
Example Use/Output
Taking this approach, the output above would be:
./bin/parse_sscanf
name : Playa Raco
path : path
num : 5
lat : 39.244000
lon : -0.257000
last : 0-23
No one way is necessarily "better" than another so long as you validate the conversions and protect the array bounds for any character arrays filled. However, as far as simple as possible goes, it's hard to beat sscanf() here -- and it doesn't modify your original string, so it is safe to use with string-literals.
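For comparison, the strcspn() route mentioned at the start of this answer might look roughly like the sketch below. The helper name next_field and its signature are assumptions made for illustration only. Like sscanf(), it leaves the source string untouched.

```c
#include <string.h>

/* Hypothetical helper: copies the next field (everything up to any character
   in delims) from *p into out, truncating to outsz - 1 characters, then
   advances *p past the delimiter. outsz must be at least 1.
   Returns the length of the field before truncation. */
size_t next_field(const char **p, const char *delims, char *out, size_t outsz)
{
    size_t len = strcspn(*p, delims);              /* length of this field */
    size_t n = len < outsz - 1 ? len : outsz - 1;  /* what fits in out */

    memcpy(out, *p, n);
    out[n] = '\0';
    *p += len;
    if (**p != '\0')
        (*p)++;                                    /* step over the delimiter */
    return len;
}
```

Calling it repeatedly with delims set to "#|" yields Playa Raco, path, 5, 39.244, -0.257 and 0-23 in turn; the numeric fields can then be converted with strtol() or strtod(). Comparing the return value against the buffer size tells you whether a field was truncated.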

How can I use sscanf to analyze string data?

How do I split a string into two strings (array name, index number) only if the string is matching the following string structure: "ArrayName[index]".
The array name can be 31 characters at most and the index 3 at most.
I found the following example, which is supposed to work with "Matrix[index1][index2]". I really couldn't understand how it takes the string apart, so I could not adapt it to get my two strings.
sscanf(inputString, "%32[^[]%*[[]%3[^]]%*[^[]%*[[]%3[^]]", matrixName, index1,index2) == 3
This try over here wasn't a success, what am I missing?
sscanf(inputString, "%32[^[]%*[[]%3[^]]", arrayName, index) == 2
How do I split a string into two strings (array name, index number) only if the string is matching the following string structure: "ArrayName[index]".
With sscanf, you don't. Not if you mean that you can rely on nothing being modified in the event that the input does not match the pattern. This is because sscanf, like the rest of the scanf family, processes its input and format linearly, without backtracking, and by design it fills input fields as they are successfully matched. Thus, if you scan with a format that assigns multiple fields or has trailing literal characters then it is possible for results to be stored for some fields despite a matching failure occurring.
But if that's ok with you then @gsamaras's answer provides a nearly-correct approach to parsing and validating a string according to your specified format, using sscanf. That answer also presents a nice explanation of the meaning of the format string. The problem with it is that it provides no way to distinguish between the input fully matching the format and the input failing to match at the final ], or including additional characters after it.
Here is a variation on that code that accounts for those tail-end issues, too:
char array_name[32] = {0}, idx[4] = {0}, c = 0;
int n;
if (sscanf(str, "%31[^[][%3[^]]%c%n", array_name, idx, &c, &n) >= 3
&& c == ']' && str[n] == '\0')
printf("arrayName = %s\nindex = %s\n", array_name, idx);
else
printf("Not in the expected format \"ArrayName[idx]\"\n");
The difference in the format is the replacement of the literal terminating ] with a %c directive, which matches any one character, and the addition of a %n directive, which causes the number of characters of input read so far to be stored, without itself consuming any input.
With that, if the return value is at least 3 then we know that the whole format was matched (a %n never produces a matching failure, but docs are unclear and behavior is inconsistent on whether it contributes to the returned field count). In that event, we examine variable c to determine whether there was a closing ] where we expected to find one, and we use the character count recorded in n to verify that all characters of the string were parsed (so that str[n] refers to a string terminator).
You may at this point be wondering at how complicated and cryptic that all is. And you would be right to do so. Parsing structured input is a complicated and tricky proposition, for one thing, but also the scanf family functions are pretty difficult to use. You would be better off with a regex matcher for cases like yours, or maybe with a machine-generated lexical analyzer (see lex), possibly augmented by machine-generated parser (see yacc). Even a hand-written parser that works through the input string with string functions and character comparisons might be an improvement. It's still complicated any way around, but those tools can at least make it less cryptic.
Note: the above assumes that the index can be any string of up to three characters. If you meant that it must be numeric, perhaps specifically a decimal number, perhaps specifically non-negative, then the format can be adjusted to serve that purpose.
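As an illustration of the hand-written alternative mentioned above, here is a rough sketch built on strchr(). The function name split_array_ref is hypothetical, and it keeps the same assumption that the index is any non-empty string of up to three characters. Unlike the sscanf version, it writes to the outputs only after the whole string has been validated.

```c
#include <string.h>

/* Hypothetical helper: returns 1 and fills name and idx only if str is
   exactly "Name[idx]" with 1..31 name characters and 1..3 index characters.
   On failure it returns 0 and leaves name and idx untouched. */
int split_array_ref(const char *str, char name[32], char idx[4])
{
    const char *open = strchr(str, '[');
    if (open == NULL)
        return 0;

    const char *close = strchr(open + 1, ']');
    if (close == NULL || close[1] != '\0')   /* missing ] or trailing text */
        return 0;

    size_t name_len = (size_t)(open - str);
    size_t idx_len = (size_t)(close - open - 1);
    if (name_len == 0 || name_len > 31 || idx_len == 0 || idx_len > 3)
        return 0;

    memcpy(name, str, name_len);
    name[name_len] = '\0';
    memcpy(idx, open + 1, idx_len);
    idx[idx_len] = '\0';
    return 1;
}
```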
A naive example to get you started:
#include <stdio.h>
#include <string.h>
int main(void)
{
char str[] = "myArray[123]";
char array_name[32] = {0}, idx[4] = {0};
if(sscanf(str, "%31[^[][%3[^]]]", array_name, idx) == 2)
printf("arrayName = %s\nindex = %s\n", array_name, idx);
else
printf("Not in the expected format \"ArrayName[idx]\"\n");
return 0;
}
Output:
arrayName = myArray
index = 123
which will catch the easy not-in-the-expected-format cases, such as "ArrayNameidx]" and "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOP[idx]", but not "ArrayName[idx".
The essence of sscanf() is to tell it where to stop, otherwise %s would read until the next whitespace.
This negated scanset %[^[] means read until you find an opening bracket.
This negated scanset %[^]] means read until you find a closing bracket.
Note: I used 31 and 3 as the width specifiers respectively, because the name of the array is assumed to be 31 characters at most and the index 3 at most, and we want to reserve the last slot for the null terminator. The size of the array for each token is the max allowed length, plus one.
How can I use sscanf to analyze string data?
Use "%n" to detect a completed scan.
array name can be 31 characters at most and the index 3 at most.
For illustration, let us assume the index needs to be limited to a numeric value [0 - 999].
Use string literal concatenation to present the format more clearly.
char array_name[32]; // array name can be 31 characters
#define NAME_FMT "%31[^[]"
char idx[4]; // index can be 3 digits
#define IDX_FMT "%3[0-9]"
int n = 0; // be sure to initialize
sscanf(str, NAME_FMT "[" IDX_FMT "]" "%n", array_name, idx, &n);
// Did scan complete (is `n` non-zero) with no extra text?
if (n && str[n] == '\0') {
printf("arrayName = %s\nindex = %d\n", array_name, atoi(idx));
} else {
printf("Not in the expected format \"ArrayName[idx]\"\n");
}

Sscanf not returning what I want

I have the following problem:
sscanf is not returning the way I want it to.
This is the sscanf:
sscanf(naru,
"%s[^;]%s[^;]%s[^;]%s[^;]%f[^';']%f[^';']%[^;]%[^;]%[^;]%[^;]"
"%[^;]%[^;]%[^;]%[^;]%[^;]%[^;]%[^;]%[^;]%[^;]%[^;]%[^;]%[^;]"
"%[^;]%[^;]%[^;]%[^;]%[^;]%[^;]",
&jokeri, &paiva1, &keskilampo1, &minlampo1, &maxlampo1,
&paiva2, &keskilampo2, &minlampo2, &maxlampo2, &paiva3,
&keskilampo3, &minlampo3, &maxlampo3, &paiva4, &keskilampo4,
&minlampo4, &maxlampo4, &paiva5, &keskilampo5, &minlampo5,
&maxlampo5, &paiva6, &keskilampo6, &minlampo6, &maxlampo6,
&paiva7, &keskilampo7, &minlampo7, &maxlampo7);
The string it's scanning:
const char *str = "city;"
"2014-04-14;7.61;4.76;7.61;"
"2014-04-15;5.7;5.26;6.63;"
"2014-04-16;4.84;2.49;5.26;"
"2014-04-17;2.13;1.22;3.45;"
"2014-04-18;3;2.15;3.01;"
"2014-04-19;7.28;3.82;7.28;"
"2014-04-20;10.62;5.5;10.62;";
All of the variables are stored as char paiva1[22] etc; however, the sscanf isn't storing anything except the city correctly. I've been trying to stop each variable at ;.
Any help how to get it to store the dates etc correctly would be appreciated.
Or if there's a smarter way to do this, I'm open to suggestions.
There are multiple problems, but BLUEPIXY hit the first one — the scan-set notation doesn't follow %s.
Your first line of the format is:
"%s[^;]%s[^;]%s[^;]%s[^;]%f[^';']%f[^';']%[^;]%[^;]%[^;]%[^;]"
As it stands, it looks for a space separated word, followed by a [, a ^, a ;, and a ] (which is self-contradictory; the character after the string is a space or end of string).
The first fixup would be to use scan-sets properly:
"%[^;]%[^;]%[^;]%[^;]%f[^';']%f[^';']%[^;]%[^;]%[^;]%[^;]"
Now you have a problem that the first %[^;] scans everything up to the end of string or first semicolon, leaving nothing for the second %[^;] to match. So you need to read the semicolons too:
"%[^;]; %[^;]; %[^;]; %[^;]; %f[^';']%f[^';']%[^;]%[^;]%[^;]%[^;]"
This looks for a string up to a semicolon, then for the semicolon, then optional white space, then repeats for three items. Apart from adding a length to limit the size of string, preventing overflow, these are fine. The %f is OK. The following material looks for an odd sequence of characters again.
However, when the data is looked at, it seems to consist of a city, and then seven sets of 'a date plus three numbers'.
You'd do better with an array of structures (if you've worked with those yet), or a set of 4 parallel arrays, and a loop:
char jokeri[30];
char paiva[7][30];
float keskilampo[7];
float minlampo[7];
float maxlampo[7];
int eoc; // End of conversion
int offset = 0;
char sep;
if (sscanf(str + offset, "%29[^;]%c%n", jokeri, &sep, &eoc) != 2 || sep != ';')
    ...report error...
offset += eoc;
for (int i = 0; i < 7; i++)
{
    if (sscanf(str + offset, "%29[^;];%f;%f;%f%c%n", paiva[i],
               &keskilampo[i], &minlampo[i], &maxlampo[i], &sep, &eoc) != 5 ||
        sep != ';')
        ...report error...
    offset += eoc;
}
See also How to use sscanf() in loops.
Now you have data that can be managed. The set of 29 separately named variables is a ghastly thought; the code using them will be horrid.
Note that the scan-set conversion specifications limit the string to a maximum length one shorter than the size of jokeri and the paiva array elements.
You might legitimately be wondering about why the code uses %c%n and &sep before &eoc. There is a reason, but it is subtle. Suppose that the sscanf() format string is:
"%29[^;];%f;%f;%f;%n"
Further, suppose there's a problem in the data: the semicolon after the third number is missing. The call to sscanf() will report that it made 4 successful conversions, but it doesn't count the %n as an assignment, so you can't tell that sscanf() didn't find a semicolon and therefore did not set eoc at all; the value is left over from a previous call to sscanf(), or simply uninitialized. By using the %c to scan a value into sep, we get 5 returned on success, and we can be sure the %n was successful too. The code checks that the value in sep is in fact a semicolon and not something else.
You might want to consider a space before the semi-colons, and before the %c. They'll allow some other data strings to be converted that would not be matched otherwise. Spaces in a format string (outside a scan-set) indicate where optional white space may appear.
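For the array-of-structures variant mentioned earlier, a minimal sketch might look like this. The struct day layout and the function name parse_forecast are assumptions made for illustration; the two formats are the same ones used in the loop above.

```c
#include <stdio.h>

/* Hypothetical record: one date plus its three temperature readings. */
struct day {
    char  date[30];
    float mean, min, max;
};

/* Parses "city;date;f;f;f;date;f;f;f;..." into city and up to max_days
   records. Returns the number of complete records parsed, or -1 if the
   city field is not followed by a semicolon. */
int parse_forecast(const char *str, char city[30], struct day *days, int max_days)
{
    int eoc = 0;      /* end of conversion, set by %n */
    int offset;
    int count = 0;
    char sep;

    if (sscanf(str, "%29[^;]%c%n", city, &sep, &eoc) != 2 || sep != ';')
        return -1;
    offset = eoc;

    while (count < max_days) {
        struct day *d = &days[count];
        if (sscanf(str + offset, "%29[^;];%f;%f;%f%c%n", d->date,
                   &d->mean, &d->min, &d->max, &sep, &eoc) != 5 || sep != ';')
            break;    /* end of data or malformed record */
        offset += eoc;
        count++;
    }
    return count;
}
```

With the string from the question and max_days set to 7, this should return 7. A return value smaller than expected means the record at that position was malformed or missing.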
I would use strtok function to break your string into pieces using ; as a delimiter. Such a long format string may be a source of problems in future.
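A rough sketch of that idea follows; the helper name split_semicolons is hypothetical. Keep in mind that strtok() writes into the buffer it scans, so it needs a modifiable copy of the data rather than a string literal, and that it treats a run of delimiters as one, so empty fields are silently skipped.

```c
#include <string.h>

/* Hypothetical helper: splits buf in place on ';' and stores a pointer to
   each piece in fields[]. Returns the number of fields stored, which is at
   most max_fields. */
int split_semicolons(char *buf, char *fields[], int max_fields)
{
    int n = 0;

    for (char *tok = strtok(buf, ";"); tok != NULL && n < max_fields;
         tok = strtok(NULL, ";"))
        fields[n++] = tok;
    return n;
}
```

Field 0 would then be the city, and each following group of four fields one date plus three numbers, which can be converted with strtod() or strtof().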

Using scanf to read in certain amount of characters in C?

I am having trouble accepting input from a text file. My program is supposed to read in a string specified by the user and the length of that string is determined at runtime. It works fine when the user is running the program (manually inputting the values) but when I run my teacher's text file, it runs into an infinite loop.
For this example, it fails when I am taking in 4 characters and his input in his file is "ABCDy". "ABCD" is what I am supposed to be reading in and 'y' is supposed to be used later to know that I should restart the game. Instead when I used scanf to read in "ABCD", it also reads in the 'y'. Is there a way to get around this using scanf, assuming I won't know how long the string should be until runtime?
Normally, you'd use something like "%4c" or "%4s" to read a maximum of 4 characters (the difference is that "%4c" reads the next 4 characters, regardless, while "%4s" skips leading whitespace and stops at a whitespace if there is one).
To specify the length at run-time, however, you have to get a bit trickier since you can't use a string literal with "4" embedded in it. One alternative is to use sprintf to create the string you'll pass to scanf:
char buffer[128];
sprintf(buffer, "%%%dc", max_length);
scanf(buffer, your_string);
I should probably add: with printf you can specify the width or precision of a field dynamically by putting an asterisk (*) in the format string, and passing a variable in the appropriate position to specify the width/precision:
int width = 10;
int precision = 7;
double value = 12.345678910;
printf("%*.*f", width, precision, value);
Given that printf and scanf format strings are quite similar, one might think the same would work with scanf. Unfortunately, this is not the case: with scanf an asterisk in the conversion specification indicates a value that should be scanned, but not assigned. That is to say, something that must be present in the input, but its value won't be placed in any variable.
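Putting the run-time format idea into a small function might look like the sketch below. It uses sscanf on a string rather than scanf on stdin purely so it is easy to try out, and the name read_fixed is an assumption. Two details are worth noting: %c with a width does not append a terminator, and an added %n lets you check that the full count was really consumed.

```c
#include <stdio.h>

/* Hypothetical helper: reads exactly count characters from input into out
   using a format string built at run time. out must hold count + 1 bytes.
   Returns 1 on success, 0 otherwise. */
int read_fixed(const char *input, int count, char *out)
{
    char fmt[32];
    int consumed = 0;

    if (count <= 0)
        return 0;
    snprintf(fmt, sizeof fmt, "%%%dc%%n", count);   /* e.g. "%4c%n" */
    if (sscanf(input, fmt, out, &consumed) != 1 || consumed != count)
        return 0;
    out[count] = '\0';   /* %c does not add a terminator itself */
    return 1;
}
```

With the input "ABCDy" and a count of 4, out ends up holding ABCD and the y is left unread.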
Try
scanf("%4s", str)
You can also use fread, where you can set a read limit:
char string[5]={0};
if( fread(string,(sizeof string)-1,1,stdin) )
printf("\nfull readed: %s",string);
else
puts("error");
You might consider simply looping over calls to getc().
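A minimal sketch of such a loop, assuming the length is already known at the time of the call (the name read_chars is made up for this example):

```c
#include <stdio.h>

/* Hypothetical helper: reads up to count characters from fp into out, which
   must hold count + 1 bytes, and terminates the result.
   Returns the number of characters actually read. */
int read_chars(FILE *fp, int count, char *out)
{
    int i = 0;

    while (i < count) {
        int ch = getc(fp);       /* int, so EOF can be told apart from data */
        if (ch == EOF)
            break;
        out[i++] = (char)ch;
    }
    out[i] = '\0';
    return i;
}
```

Called as read_chars(stdin, 4, buf) on the input "ABCDy", it leaves the y in the stream for the next read. Unlike %s, it does not skip whitespace, so a newline left over from earlier input would be read as data.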

How to retrieve the telephone number from an AT CMGL response?

I have an application written in C that reads text messages from a modem using AT commands. A typical AT response from the modem looks like this:
+CMGL: 1,"REC READ","+31612123738",,"08/12/22,11:37:52+04"
The code is currently set up to only retrieve the id from this line, which is the first number, and it does so using the following code:
sscanf(line, "+CMGL: %d,", &entry);
Here, "line" is a character array containing a line from the modem, and "entry" is an integer in which the id is stored. I tried extending this code like this:
sscanf(line, "+CMGL: %d,\"%*s\",\"%s\",", &entry, phonenr);
I figured I would use the %*s to scan for the text in the first pair of quotes and skip it, and read the text in the next pair of quotes (the phone number) into the phonenr character array.
This doesn't work (%*s apparently reads "REC" and the next %s doesn't read anything).
An extra challenge is that the text isn't restricted to "REC READ"; it could in fact be many things, including text without a space in it.
sscanf is not very good for parsing; use strchr instead. Without error handling:
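#include <stdio.h>
#include <string.h>
int main(void)
{
    const char *CGML_text = "+CMGL: 1,\"REC READ\",\"+31612123738\",,\"08/12/22,11:37:52+04\"";
    const char *comma, *phone_number_start, *phone_number_end;
    comma = strchr(CGML_text, ',');        /* comma after the index */
    comma = strchr(comma + 1, ',');        /* comma after the status string */
    phone_number_start = comma + 2;        /* skip the comma and the opening quote */
    phone_number_end = strchr(phone_number_start, '"') - 1;   /* last character of the number */
    /* the precision argument of %.*s must be an int */
    printf("Phone number is '%.*s'\n", (int)(phone_number_end + 1 - phone_number_start), phone_number_start);
    return 0;
}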
(updated with tested, working code)
The way I solved it now is with the following code:
sscanf(line, "+CMGL: %d,\"%*[^\"]\",\"%[^\"]", &entry, phonenr);
This would first scan for a number (%d), then for an arbitrary string of characters that are not double quotes (and skip them, because of the asterisk), and for the phone number it does the same.
However, I'm not sure yet how robust this is.
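One way to tighten that up, sketched here under some assumptions: add a width to the conversion that stores the number, check the return value, and use a trailing %c to confirm that the closing quote really was there. The function name parse_cmgl and the 32-byte buffer size are assumptions, not something the modem documentation prescribes.

```c
#include <stdio.h>

/* Hypothetical helper: extracts the message index and the sender's number
   from a +CMGL line. phonenr must hold at least 32 bytes.
   Returns 1 on success, 0 if the line does not match. */
int parse_cmgl(const char *line, int *entry, char phonenr[32])
{
    char closing = 0;

    /* %*[^\"] skips the status text, %31[^\"] stores the number, and %c
       picks up whatever follows it, which should be the closing quote. */
    if (sscanf(line, "+CMGL: %d,\"%*[^\"]\",\"%31[^\"]%c",
               entry, phonenr, &closing) != 3)
        return 0;
    return closing == '"';
}
```

Note that a scanset must match at least one character, so a line with an empty status or an empty number ("") is rejected by this format; whether that matters depends on what the modem can actually send.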
You can use strchr() to find the position of '+' in the string, and extract the phone number after it. You may also try to use strtok() to split the string with '"', and analyze the 3rd part.
%s in scanf() reads until whitespace.
You're very close to a solution.
To read this:
+CMGL: 1,"REC READ"
You need:
"+CMGL: %d,\"%*s %*s"
