Percent sign as format delimiter in fscanf - c

I am working on this piece of code that reads a file with records delimited by percent signs (%) and then saves the values in a node struct. The input is as follows:
2%c1%d3%33445.000000%2016%4%11
1%c2%d2%234.500000%2016%4%11
0%c1%d1%123.400000%2016%4%11
Each line will be a node containing the data separated by the percent signs. I am using fscanf to read the formatted input and save the values in the specific variables. It works well if the delimiter is any character but '%'.
I tried escaping the percent sign with '%%', but it doesn't work and fscanf returns -1. I have looked everywhere for a way of doing this but can't find anything. Any help would be greatly appreciated. The following is a snippet from my code.
int recordID;
char category[255];
char detail[255];
float amount;
int year;
int month;
int day;
while(fscanf(pFile, "%d%%%s%%%s%%%f%%%d%%%d%%%d", &recordID, category, detail, &amount, &year, &month, &day) == 7) {
struct node* p = (struct node*) malloc(sizeof(struct node));
p->recordID = recordID;
copy_array(category, p->category, 255);
copy_array(detail, p->detail, 255);
p->amount = amount;
p->year = year;
p->month = month;
p->day = day;
add_node(p);
}
The pFile is the file containing the input specified above.
Thank you!

The problem is that %s reads a string, and you're not telling it to stop at the delimiter, so it gobbles up the % and everything past it. Use %[^%] instead of %s:
fscanf(pFile, "%d%%%[^%]%%%[^%]%%%f%%%d%%%d%%%d", ...
If you've never seen the scanf specifier %[, it works like this: %[abc] scans any combination of a's, b's, and c's. %[^abc] scans any string not containing an a, b, or c. You can also use ranges, like %[0-9]. Otherwise it's mostly like %s, writing to a char * destination buffer.
(As an aside, whoever chose % as a delimiter should be shot. I changed all the %'s to |'s, both in your code and your data file, so I could debug it without losing my mind, and then I changed them all back to % at the end, after I got it working.)
Addendum: John Bollinger is absolutely right, you need to worry about buffer overflow, also, as his solution shows.

[f]scanf() does not do pattern matching the way you hoped. In particular, the %s field descriptor matches a whitespace-delimited string, whereas you need to match a string delimited by '%' characters. You might have gotten a clue about this if you had examined the actual scanf() return value (but good on you for at least comparing it with the one you expected!).
You can match a string of characters from a given set via the %[ field descriptor, as @SteveSummit explained in his answer. Moreover, it is a good idea to specify maximum field widths in your format, so as to avoid overrunning the bounds of your arrays. That would be particularly effective with the format you are scanning, as an overlength input field for either of your strings will cause a matching failure to occur with the subsequent delimiter:
fscanf(pFile, "%d%%%254[^%]%%%254[^%]%%%f%%%d%%%d%%%d", &recordID, category,
detail, &amount, &year, &month, &day)

Related

fscanf c programming weird error

I am not new to programming, but I encountered this small problem and I can't seem to get it.
I want to read a file with dates and put them in another file with another format
Input example: 18.08.2015
Output example: 18-08-2015
Here is the code (dat1 has "r" permission and dat2 "w"):
char d[3];
char m[3];
char g[5];
while(fscanf(dat1,"%s.%s.%s\n",&d,&m,&g)==3)
{
fprintf(dat2,"%s-%s-%s\n",d,m,g);
}
On the other hand, this works fine if I use [space] instead of a [dot] in the input file.
(18 08 2015)
What am I missing? The solution has to be as simple as possible and with using fscanf, not fgetc or fgets, to be explained to students that are just beginning to learn C. Thanks.
The %s pattern matches a sequence of non-white-space characters, so the first %s will gobble up the entire string.
Why use char arrays at all, why not int?
int d;
int m;
int g;
while(fscanf(dat1,"%d.%d.%d\n",&d,&m,&g)==3)
{
fprintf(dat2,"%d-%d-%d\n",d,m,g);
}
The %d in fprintf will not output leading zeros though. You'll have to teach your students a little bit extra or leave it for extra credit.
Since the scanf format %s reads up to the next whitespace character, it cannot be used for a string ending with a .. Instead use a character class: %2[0-9] or %2[^.]. (Change the 2 to the maximum number of characters you can handle, and don't forget that the [ format code does not skip whitespace, so if you want to do that, put a space before the format code.)
Change
fscanf(dat1,"%s.%s.%s\n",&d,&m,&g)
to
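fscanf(dat1,"%2[^.].%2[^.].%4[^.\n]\n",d,m,g);
The field widths keep each string within its array (d[3], m[3], g[5]), and excluding \n from the last scanset stops the year from running on into the next line of the file.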

Working of sscanf

There is a big string. I want to store different parts of this in different variables. But it seems that either my understanding is not clear or there is a bug. Please help.
here is my section of code.
char sample[] = "abc,batsman,2,28.0,1800";
char name[10] ,speciality[10];
float batavg;
int pos, runs,j;
j = sscanf(sample,"%s,%s,%d,%f,%d", name, speciality, pos, batavg, runs);
printf("%s,%s,%d,%f,%d", name, speciality, pos, batavg, runs);
printf("\n%d\n",j);
Output
Some garbage values with the value of j = 1 in the above case is shown.
How can I settle this?
The scanf() family of functions requires you to pass pointers to the locations where the scanned fields should be stored. That just works when you're scanning into a char array (field descriptor %s) because the name of a char array is converted to a pointer automatically, but for other kinds of fields, such as your pos, batavg, and runs, you need to use the address-of operator (&).
Additionally, as iharob first observed, the %s descriptor expects fields to be delimited by whitespace. You can get what you want via the %[] descriptor:
j=sscanf(sample,"%[^,],%[^,],%d,%f,%d",name,speciality,&pos,&batavg,&runs);
The "%s" specifier in the *scanf() family of functions scans characters until it reaches white space.
So the first "%s" consumes the whole string, which is why j == 1. You must check the value of j before printing, since all the other variables are still uninitialized at that point.
You need a different format specifier, namely
sscanf(sample, "%[^,],%[^,],%d,%f,%d", name, speciality, &pos, &batavg, &runs);

Sscanf not returning what I want

I have the following problem:
sscanf is not returning the way I want it to.
This is the sscanf:
sscanf(naru,
"%s[^;]%s[^;]%s[^;]%s[^;]%f[^';']%f[^';']%[^;]%[^;]%[^;]%[^;]"
"%[^;]%[^;]%[^;]%[^;]%[^;]%[^;]%[^;]%[^;]%[^;]%[^;]%[^;]%[^;]"
"%[^;]%[^;]%[^;]%[^;]%[^;]%[^;]",
&jokeri, &paiva1, &keskilampo1, &minlampo1, &maxlampo1,
&paiva2, &keskilampo2, &minlampo2, &maxlampo2, &paiva3,
&keskilampo3, &minlampo3, &maxlampo3, &paiva4, &keskilampo4,
&minlampo4, &maxlampo4, &paiva5, &keskilampo5, &minlampo5,
&maxlampo5, &paiva6, &keskilampo6, &minlampo6, &maxlampo6,
&paiva7, &keskilampo7, &minlampo7, &maxlampo7);
The string it's scanning:
const char *str = "city;"
"2014-04-14;7.61;4.76;7.61;"
"2014-04-15;5.7;5.26;6.63;"
"2014-04-16;4.84;2.49;5.26;"
"2014-04-17;2.13;1.22;3.45;"
"2014-04-18;3;2.15;3.01;"
"2014-04-19;7.28;3.82;7.28;"
"2014-04-20;10.62;5.5;10.62;";
All of the variables are stored as char paiva1[22] etc; however, the sscanf isn't storing anything except the city correctly. I've been trying to stop each variable at ;.
Any help how to get it to store the dates etc correctly would be appreciated.
Or if there's a smarter way to do this, I'm open to suggestions.
There are multiple problems, but BLUEPIXY hit the first one — the scan-set notation doesn't follow %s.
Your first line of the format is:
"%s[^;]%s[^;]%s[^;]%s[^;]%f[^';']%f[^';']%[^;]%[^;]%[^;]%[^;]"
As it stands, it looks for a space separated word, followed by a [, a ^, a ;, and a ] (which is self-contradictory; the character after the string is a space or end of string).
The first fixup would be to use scan-sets properly:
"%[^;]%[^;]%[^;]%[^;]%f[^';']%f[^';']%[^;]%[^;]%[^;]%[^;]"
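Now you have a problem: the first %[^;] scans everything up to the first semicolon (or the end of the string), and the second %[^;] then fails because the next character is that semicolon, which nothing in the format consumes. Adding the separators fixes that: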
"%[^;]; %[^;]; %[^;]; %[^;]; %f[^';']%f[^';']%[^;]%[^;]%[^;]%[^;]"
This looks for a string up to a semicolon, then for the semicolon, then optional white space, then repeats for three items. Apart from adding a length to limit the size of string, preventing overflow, these are fine. The %f is OK. The following material looks for an odd sequence of characters again.
However, when the data is looked at, it seems to consist of a city, and then seven sets of 'a date plus three numbers'.
You'd do better with an array of structures (if you've worked with those yet), or a set of 4 parallel arrays, and a loop:
char jokeri[30];
char paiva[7][30];
float keskilampo[7];
float minlampo[7];
float maxlampo[7];
int eoc; // End of conversion
int offset = 0;
char sep;
if (sscanf(str + offset, "%29[^;]%c%n", jokeri, &sep, &eoc) != 2 || sep != ';')
...report error...
offset += eoc;
for (int i = 0; i < 7; i++)
{
if (sscanf(str + offset, "%29[^;];%f;%f;%f%c%n", paiva[i],
&keskilampo[i], &minlampo[i], &maxlampo[i], &sep, &eoc) != 5 ||
sep != ';')
...report error...
offset += eoc;
}
See also How to use sscanf() in loops.
Now you have data that can be managed. The set of 29 separately named variables is a ghastly thought; the code using them will be horrid.
Note that the scan-set conversion specifications limit the string to a maximum length one shorter than the size of jokeri and the paiva array elements.
You might legitimately be wondering about why the code uses %c%n and &sep before &eoc. There is a reason, but it is subtle. Suppose that the sscanf() format string is:
"%29[^;];%f;%f;%f;%n"
Further, suppose there's a problem in the data that the semicolon after the third number is missing. The call to sscanf() will report that it made 4 successful conversions, but it doesn't count the %n as an assignment, so you can't tell that sscanf() didn't find a semicolon and therefore did not set &eoc at all; the value is left over from a previous call to sscanf(), or simply uninitialized. By using the %c to scan a value into sep, we get 5 returned on success, and we can be sure the %n was successful too. The code checks that the value in sep is in fact a semicolon and not something else.
You might want to consider a space before the semi-colons, and before the %c. They'll allow some other data strings to be converted that would not be matched otherwise. Spaces in a format string (outside a scan-set) indicate where optional white space may appear.
I would use the strtok function to break your string into pieces using ; as a delimiter. Such a long format string may become a source of problems in the future.

fprintf prints a new line at the beginning of the file

I'm using a fprintf function to print to a new file
I'm using the following command to write multiple times:
fprintf(fp, "%-25s %d %.2f %d",temp->data.name, temp->data.day, temp->data.temp, temp->data.speed);
The problem is that sometimes the file gets an extra new line as the first character.
Could this be leftovers from some buffer? I don't really know...
typedef struct Data {
char name[26];
int day;
int speed;
float temp;
} Data ;
@spatz you were right. I'm kind of new to format strings, and I was told to write one for an fscanf call that has to expect an undetermined amount of space between the pieces of data. Here is what I came up with; I'm pretty sure it's the source of the problem:
check=fscanf(fp1, "%20c%*[^0-9]%d%*[^0-9]%f%*[^0-9]%d%*[^\n]%*c", name, &day, &temp, &speed);
Only the first line gets read normally; every read after that starts with the newline left over from the previous line.
Can someone please show me the proper way to write this thing?
Rather than calling fscanf() over and over and hoping that the newlines match up the way you want, use fgets() to read one line at a time, parse it with sscanf(), and do error handling on a line-by-line basis. This is less error-prone, and it sounds like it will clear up your problem with no extra effort.
Your problem is that name starts with a newline, and that newline ends up in the file.
In order to properly parse the file I would have to know its format, but for now I assume it's <string> <int> <int> <float> where the number of spaces between each element may vary.
The format string I would start with is simply "%s%d%d%f", and let fscanf() deal with the whitespace. With this format string I was able to properly parse lines like
foo 3 4 7
If this does not satisfy you feel free to elaborate on the format of the file you are parsing and I'll try to come up with solutions.

How to retrieve the telephone number from an AT CMGL response?

I have an application written in C that reads text messages from a modem using AT commands. A typical AT response from the modem looks like this:
+CMGL: 1,"REC READ","+31612123738",,"08/12/22,11:37:52+04"
The code is currently set up to only retrieve the id from this line, which is the first number, and it does so using the following code:
sscanf(line, "+CMGL: %d,", &entry);
Here, "line" is a character array containing a line from the modem, and "entry" is an integer in which the id is stored. I tried extending this code like this:
sscanf(line, "+CMGL: %d,\"%*s\",\"%s\",", &entry, phonenr);
I figured I would use the %*s to scan for the text in the first pair of quotes and skip it, and read the text in the next pair of quotes (the phone number) into the phonenr character array.
This doesn't work (%*s apparently reads "REC" and the next %s doesn't read anything).
An extra challenge is that the text isn't restricted to "REC READ"; it could in fact be many things, including text without a space in it.
Sscanf is not very good for parsing, use strchr rather. Without error handling:
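#include <stdio.h>
#include <string.h>
int main(void)
{
const char *CGML_text = "+CMGL: 1,\"REC READ\",\"+31612123738\",,\"08/12/22,11:37:52+04\"";
const char *comma, *phone_number_start, *phone_number_end;
/* Skip to the second comma, which precedes the quoted phone number. */
comma = strchr(CGML_text, ',');
comma = strchr(comma + 1, ',');
phone_number_start = comma + 2; /* step over the comma and the opening quote */
phone_number_end = strchr(phone_number_start, '"') - 1;
/* The precision for %.*s must be an int, so cast the pointer difference. */
printf("Phone number is '%.*s'\n", (int)(phone_number_end + 1 - phone_number_start), phone_number_start);
return 0;
}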
(updated with tested, working code)
The way I solved it now is with the following code:
sscanf(line, "+CMGL: %d,\"%*[^\"]\",\"%[^\"]", &entry, phonenr);
This would first scan for a number (%d), then for an arbitrary string of characters that are not double quotes (and skip them, because of the asterisk), and for the phone number it does the same.
However, I'm not sure yet how robust this is.
You can use strchr() to find the position of '+' in the string, and extract the phone number after it. You may also try to use strtok() to split the string with '"', and analyze the 3rd part.
%s in scanf() reads until whitespace.
You're very close to a solution.
To read this;
+CMGL: 1,"REC READ"
You need;
"+CMGL: %d,\"%*s %*s\""
