How to parse this input in C


Right now I am doing an assignment but find it very hard to parse the user input in C. Here is the kind of input the user will enter:
INSERT Alice, 25 Norway Drive, Fitzerald, GA, 40204, 6000.60
Here INSERT is the command (to insert into a linked list)
Alice is a name
25 Norway Drive is an address
Fitzerald is a city
GA is a state
40204 is a zip code
6000.60 is a balance
How can I use scanf or any other method in C to properly take this as input? The biggest problem in front of me is how to ignore these "," and store these values in separate variables of appropriate data types.
Thanks everyone, I have solved the issue and here is the solution:
pch = strtok(NULL, ",");
pch = substr(pch, 2, strlen(pch)); // substr is my custom function; as its name suggests, it returns a substring.
strcpy(customer->streetAddress, pch);

Fast easy method:
Use fgets() to get the string from the user;
and strtok() to tokenize it.
Edit
After reading your comment:
Use strtok() with only the comma, and then remove trailing and leading spaces from the result.
Edit2
After a test run, I noticed you will get "INSERT Alice" as the first token. So, after all tokens have been extracted, run strtok() again, this time with a space, on the first token extracted. Or, find the space and somehow identify the command and the name from there.
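The steps in this answer (split on commas, trim each token, then split the first token at its first space) could be sketched as follows. This is only a sketch under assumptions: the struct customer layout, the field sizes, and the helper names trim, copy_field and parse_insert_line are all hypothetical, and the line is assumed to have been read with fgets() already.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record type; the field sizes are arbitrary. */
struct customer {
    char command[16];
    char name[80];
    char street[80];
    char city[80];
    char state[8];
    long zip;
    double balance;
};

/* Strip leading and trailing whitespace in place; returns the new start. */
static char *trim(char *s)
{
    while (isspace((unsigned char)*s))
        s++;
    char *end = s + strlen(s);
    while (end > s && isspace((unsigned char)end[-1]))
        end--;
    *end = '\0';
    return s;
}

/* Copy src into dst (capacity cap), truncating if necessary. */
static void copy_field(char *dst, size_t cap, const char *src)
{
    strncpy(dst, src, cap - 1);
    dst[cap - 1] = '\0';
}

/* Parse one line, modifying it in place. Returns 0 on success, -1 on error. */
int parse_insert_line(char *line, struct customer *c)
{
    char *field[6];
    char *end;
    int n = 0;

    /* Split on commas only, so spaces inside the address survive. */
    for (char *tok = strtok(line, ","); tok != NULL; tok = strtok(NULL, ",")) {
        if (n == 6)
            return -1;                  /* too many fields */
        field[n++] = trim(tok);         /* also drops the newline from fgets() */
    }
    if (n != 6)
        return -1;                      /* too few fields */

    /* The first field is "COMMAND name": split it at the first space. */
    char *space = strchr(field[0], ' ');
    if (space == NULL)
        return -1;
    *space = '\0';
    copy_field(c->command, sizeof c->command, field[0]);
    copy_field(c->name, sizeof c->name, trim(space + 1));
    copy_field(c->street, sizeof c->street, field[1]);
    copy_field(c->city, sizeof c->city, field[2]);
    copy_field(c->state, sizeof c->state, field[3]);

    c->zip = strtol(field[4], &end, 10);
    if (end == field[4] || *end != '\0')
        return -1;                      /* zip is not a whole number */
    c->balance = strtod(field[5], &end);
    if (end == field[5] || *end != '\0')
        return -1;                      /* balance is not a number */
    return 0;
}
```

Note that strtok() skips empty fields (two commas in a row), so a line with a missing value is rejected here as having too few fields rather than stored with an empty string.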

If your input data format is fixed you can use something quick and dirty using [s]scanf().
With input of:
INSERT Alice, 25 Norway Drive, Fitzerald, GA, 40204, 6000.60
You might try, if reading from stdin:
char name[80], addr[80], city[80], state[80];
int zip;
double amt;
int res = scanf("INSERT %79[^,], %79[^,], %79[^,], %79[^,], %d, %lf",
name, addr, city, state, &zip, &amt); /* arrays need no &; a double needs %lf */
Should return the number of items matched (i.e. 6).

scanf() may be a bit tricky in this situation, assuming that different commands with different parameters can be used. I would probably use fgets() to read in the string first, followed by the use of strtok() to read the first token (the command). At that point you can either continue to use strtok() with "," as the delimiter to read the rest of the tokens in the string, or you could use a sscanf() on the rest of the string (now that you know the format that the rest of the input will be in). sscanf() is still going to be a pain due to the fact that it appears that an unspecified number of spaces would be allowed in the address and possibly town fields.
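The two-stage idea described here (identify the command first, then apply a format that fits that command to the rest of the line) might look like the sketch below. The function name parse_insert_args and the 80-byte buffers are assumptions for illustration; only the INSERT command is handled, and the line is assumed to come from fgets().

```c
#include <stdio.h>
#include <string.h>

/*
 * Parse "INSERT name, addr, city, state, zip, amount".
 * Every string buffer must hold at least 80 bytes.
 * Returns 0 on success, -1 if the line is not a well-formed INSERT.
 */
int parse_insert_args(const char *line, char *name, char *addr, char *city,
                      char *state, int *zip, double *amt)
{
    char cmd[16];
    int used = 0;   /* characters consumed by the command word */
    int end = 0;    /* characters consumed by the argument list */

    /* Stage 1: read the command and remember where it stops. */
    if (sscanf(line, "%15s %n", cmd, &used) != 1 || strcmp(cmd, "INSERT") != 0)
        return -1;

    /* Stage 2: the format is now known, so scan the remaining fields.
       %n is stored only if everything before it matched. */
    int got = sscanf(line + used,
                     " %79[^,], %79[^,], %79[^,], %79[^,], %d , %lf %n",
                     name, addr, city, state, zip, amt, &end);
    if (got != 6 || end == 0 || line[used + end] != '\0')
        return -1;  /* missing field or trailing junk */
    return 0;
}
```

Other commands would get their own format in a similar function, selected by comparing cmd. A space before a comma ends up as a trailing space in the preceding string field, so trimming may still be needed.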

Related

How to read two words in one string

I have sample input file like this
1344 Muhammad Ayyubi 1
1344 Muhammad Ali Ayyubi 1
First, last number and surname are separated with tab character. However, a person may have two names. In that case, names are separated with whitespace.
I am trying to read from input file and store them in related variables.
Here is my code that successfully reads when a person has only one name.
fscanf(fp, "%d\t%s\t%s\t%d", &id, firstname, surname, &roomno)
The question is: is there any way to read an input file in which a person may have two first names?
Thanks in advance.

Read the line with fgets() which then saves that as a string.
Then parse the string. Save into adequate sized buffers.
Scanning with "\t", scans any number of white-space - zero or more. Use TABFMT below to scan 1 tab character.
Test results along the way.
This code uses " %n" to see that parsing reached that point and nothing more on the line.
#define LINE_N 100
char line[LINE_N];
int id;
char firstname[LINE_N];
char surname[LINE_N];
int roomno;
if (fgets(line, sizeof line, fp)) {
int n = 0;
#define TABFMT "%*1[\t]"
#define NAMEFMT "%[^\t]"
sscanf(line, "%d" TABFMT NAMEFMT TABFMT NAMEFMT TABFMT "%d %n",
&id, firstname, surname, &roomno, &n);
if (n == 0 || line[n]) {
fprintf(stderr, "Failed to parse <%s>\n", line);
} else {
printf("Success: %d <%s> <%s> %d\n", id, firstname, surname, roomno);
}
}
If the last name or first is empty, this code treats that as an error.
An alternate approach would read the line into a string and then use strcspn(), strchr() or strtok() to look for tabs and split the line into the 4 sub-strings.
The larger issue missed by OP is what to do about ill-formatted input? Error handling is often dismissed with "input will be well formed", yet in real life, bad input does happen and also is the crack the hackers look for. Defensive coding takes steps to validate input. Pedantic code would not use *scanf() at all, but instead fgets(), strcspn(), strspn(), strchr(), strtol() and test, test, test. This answer is a middle-of-the-road testing effort.
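The scanf-free route mentioned above could be sketched like this, using strcspn() to split at single tab characters and strtol() to convert the numbers. The names split_tabs, to_int and parse_room_line are hypothetical; the line is split in place and the name pointers refer into it.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Split line in place at each tab; returns the number of fields (at most max). */
static int split_tabs(char *line, char *field[], int max)
{
    int n = 0;
    line[strcspn(line, "\r\n")] = '\0';     /* drop the newline from fgets() */
    while (n < max) {
        field[n++] = line;
        size_t len = strcspn(line, "\t");
        if (line[len] == '\0')
            break;                          /* that was the last field */
        line[len] = '\0';
        line += len + 1;
    }
    return n;
}

/* Convert a whole string to int; returns 0 on success, -1 otherwise. */
static int to_int(const char *s, int *out)
{
    char *end;
    errno = 0;
    long v = strtol(s, &end, 10);
    if (end == s || *end != '\0' || errno == ERANGE || v < INT_MIN || v > INT_MAX)
        return -1;
    *out = (int)v;
    return 0;
}

/* Expects "id<TAB>first name(s)<TAB>surname<TAB>room". Returns 0 or -1. */
int parse_room_line(char *line, int *id, const char **firstname,
                    const char **surname, int *roomno)
{
    char *field[5];                         /* one spare slot to detect extras */
    if (split_tabs(line, field, 5) != 4)
        return -1;
    if (*field[1] == '\0' || *field[2] == '\0')
        return -1;                          /* empty names are errors here too */
    if (to_int(field[0], id) != 0 || to_int(field[3], roomno) != 0)
        return -1;
    *firstname = field[1];
    *surname = field[2];
    return 0;
}
```

Unlike strtok(), this keeps empty fields visible (two adjacent tabs yield an empty string), which is what lets the code reject them explicitly.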

You can use the %[ specifier to read whitespace in a string:
fscanf(fp, "%d\t%[^\t]\t%[^\t]\t%d", &id, firstname, surname, &roomno)

The answers to the question as stated are reasonable, but the question is wrong.
The end-goal here is to read human-names. Human names come in quite a variety - not always first, [middle,] last. Baking in this assumption is an error in design.
This is a many, many times repeated error. Better not to repeat.
Simplest solution is to re-order the data fields, and make no assumptions about the structure of names. So the input data becomes:
1344 1 Muhammad Ayyubi
1344 1 Muhammad Ali Ayyubi
Scanning code then can pull off the first two numeric fields, and use the remainder of the line for name (making no assumptions about structure).
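For the reordered layout, a sketch of such scanning code might look like this. The function name parse_reordered is an assumption; %n records where the two numbers end so the rest of the line can be taken as the name unchanged.

```c
#include <stdio.h>
#include <string.h>

/*
 * Parse "id room name..." where the name is whatever follows the two numbers.
 * The line is modified in place (newline removed) and *name points into it.
 * Returns 0 on success, -1 on malformed input.
 */
int parse_reordered(char *line, int *id, int *roomno, const char **name)
{
    int pos = 0;    /* offset just past the numbers and following whitespace */

    if (sscanf(line, "%d %d %n", id, roomno, &pos) != 2 || pos == 0)
        return -1;
    line[strcspn(line, "\r\n")] = '\0';
    if (line[pos] == '\0')
        return -1;  /* nothing left for the name */
    *name = line + pos;
    return 0;
}
```

No assumption is made about how many words the name has or what separates them.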
More generally, if you do need to scan fields with embedded whitespace, remember the 32 "control" characters in the ASCII character table, of which ~24 have no assigned semantics (in current use). You can add structure to a file of text, for example with use of the following (from man ascii):
034 28 1C FS (file separator)
035 29 1D GS (group separator)
036 30 1E RS (record separator)
037 31 1F US (unit separator)
There is almost no case where text fields are allowed these characters.

Sscanf not returning what I want

I have the following problem:
sscanf is not returning the way I want it to.
This is the sscanf:
sscanf(naru,
"%s[^;]%s[^;]%s[^;]%s[^;]%f[^';']%f[^';']%[^;]%[^;]%[^;]%[^;]"
"%[^;]%[^;]%[^;]%[^;]%[^;]%[^;]%[^;]%[^;]%[^;]%[^;]%[^;]%[^;]"
"%[^;]%[^;]%[^;]%[^;]%[^;]%[^;]",
&jokeri, &paiva1, &keskilampo1, &minlampo1, &maxlampo1,
&paiva2, &keskilampo2, &minlampo2, &maxlampo2, &paiva3,
&keskilampo3, &minlampo3, &maxlampo3, &paiva4, &keskilampo4,
&minlampo4, &maxlampo4, &paiva5, &keskilampo5, &minlampo5,
&maxlampo5, &paiva6, &keskilampo6, &minlampo6, &maxlampo6,
&paiva7, &keskilampo7, &minlampo7, &maxlampo7);
The string it's scanning:
const char *str = "city;"
"2014-04-14;7.61;4.76;7.61;"
"2014-04-15;5.7;5.26;6.63;"
"2014-04-16;4.84;2.49;5.26;"
"2014-04-17;2.13;1.22;3.45;"
"2014-04-18;3;2.15;3.01;"
"2014-04-19;7.28;3.82;7.28;"
"2014-04-20;10.62;5.5;10.62;";
All of the variables are stored as char paiva1[22] etc; however, the sscanf isn't storing anything except the city correctly. I've been trying to stop each variable at ;.
Any help how to get it to store the dates etc correctly would be appreciated.
Or if there's a smarter way to do this, I'm open to suggestions.

There are multiple problems, but BLUEPIXY hit the first one — the scan-set notation doesn't follow %s.
Your first line of the format is:
"%s[^;]%s[^;]%s[^;]%s[^;]%f[^';']%f[^';']%[^;]%[^;]%[^;]%[^;]"
As it stands, it looks for a space separated word, followed by a [, a ^, a ;, and a ] (which is self-contradictory; the character after the string is a space or end of string).
The first fixup would be to use scan-sets properly:
"%[^;]%[^;]%[^;]%[^;]%f[^';']%f[^';']%[^;]%[^;]%[^;]%[^;]"
Now you have the problem that the first %[^;] scans everything up to the first semicolon (or the end of the string), and the second %[^;] then fails because the next character is that semicolon. So the semicolons have to be matched explicitly:
"%[^;]; %[^;]; %[^;]; %[^;]; %f[^';']%f[^';']%[^;]%[^;]%[^;]%[^;]"
This looks for a string up to a semicolon, then for the semicolon, then optional white space, then repeats for three items. Apart from adding a length to limit the size of string, preventing overflow, these are fine. The %f is OK. The following material looks for an odd sequence of characters again.
However, when the data is looked at, it seems to consist of a city, and then seven sets of 'a date plus three numbers'.
You'd do better with an array of structures (if you've worked with those yet), or a set of 4 parallel arrays, and a loop:
char jokeri[30];
char paiva[7][30];
float keskilampo[7];
float minlampo[7];
float maxlampo[7];
int eoc; // End of conversion
int offset = 0;
char sep;
if (sscanf(str + offset, "%29[^;]%c%n", jokeri, &sep, &eoc) != 2 || sep != ';')
...report error...
offset += eoc;
for (int i = 0; i < 7; i++)
{
if (sscanf(str + offset, "%29[^;];%f;%f;%f%c%n", paiva[i],
&keskilampo[i], &minlampo[i], &maxlampo[i], &sep, &eoc) != 5 ||
sep != ';')
...report error...
offset += eoc;
}
See also How to use sscanf() in loops.
Now you have data that can be managed. The set of 29 separately named variables is a ghastly thought; the code using them will be horrid.
Note that the scan-set conversion specifications limit the string to a maximum length one shorter than the size of jokeri and the paiva array elements.
You might legitimately be wondering about why the code uses %c%n and &sep before &eoc. There is a reason, but it is subtle. Suppose that the sscanf() format string is:
"%29[^;];%f;%f;%f;%n"
Further, suppose there's a problem in the data that the semicolon after the third number is missing. The call to sscanf() will report that it made 4 successful conversions, but it doesn't count the %n as an assignment, so you can't tell that sscanf() didn't find a semicolon and therefore did not set &eoc at all; the value is left over from a previous call to sscanf(), or simply uninitialized. By using the %c to scan a value into sep, we get 5 returned on success, and we can be sure the %n was successful too. The code checks that the value in sep is in fact a semicolon and not something else.
You might want to consider a space before the semi-colons, and before the %c. They'll allow some other data strings to be converted that would not be matched otherwise. Spaces in a format string (outside a scan-set) indicate where optional white space may appear.

I would use strtok function to break your string into pieces using ; as a delimiter. Such a long format string may be a source of problems in future.
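A strtok()-based version of this suggestion could be sketched as below. The struct day type and the function names are hypothetical, and the code assumes the layout from the question: one city followed by seven groups of a date and three numbers. Because strtok() writes into its argument, the data must be in a modifiable buffer, not a string literal.

```c
#include <stdlib.h>
#include <string.h>

#define DAYS 7

/* Hypothetical record for one day of data. */
struct day {
    char date[11];          /* "YYYY-MM-DD" plus the terminator */
    float avg, min, max;
};

/* Fetch the next ';'-separated token and convert it to a float. */
static int next_float(float *out)
{
    char *tok = strtok(NULL, ";");
    char *end;
    if (tok == NULL)
        return -1;
    *out = strtof(tok, &end);
    return (end == tok || *end != '\0') ? -1 : 0;
}

/* buf is modified in place. Returns 0 on success, -1 on malformed data. */
int parse_forecast(char *buf, char *city, size_t city_cap, struct day days[DAYS])
{
    char *tok = strtok(buf, ";");
    if (tok == NULL || strlen(tok) >= city_cap)
        return -1;
    strcpy(city, tok);

    for (int i = 0; i < DAYS; i++) {
        tok = strtok(NULL, ";");
        if (tok == NULL || strlen(tok) >= sizeof days[i].date)
            return -1;
        strcpy(days[i].date, tok);
        if (next_float(&days[i].avg) != 0 ||
            next_float(&days[i].min) != 0 ||
            next_float(&days[i].max) != 0)
            return -1;
    }
    return 0;
}
```

One trade-off compared with the sscanf() loop: strtok() treats consecutive semicolons as a single delimiter, so an empty field shifts the later values instead of being reported where it occurs.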

Using strtok() to tokenize strings (possibly multiple words) between integers, all separated by spaces

The format of the last line, as seen below, is completely random. The first part of the program reads experiment names into *experiments[20], while the data for each experiment is put into data[10][20]. Data input ends when the line "*** END ***" is read from the redirected input. The line after that holds our options, which do the following:
1. Show all data.
2. Calculate the average for a specific experiment (so the name of the experiment HAS to follow 2 in the file).
3. Calculate the total average of all experiments.
4. End the program.
Everything needs to be done through file redirection input.
Main Question: How do I tokenize a string composed of two words, as you can see two lines below?
Ok, so we have stdin file redirection input of this last line of a file:
1 2 Control Group 3 4
There are 4 possible options: 1,2,3,4
2 is always followed by the name of an experiment as it calculates the average of that specific experiment.
We tokenize the line obtained through fgets() by doing this:
token = strtok(str," ");
and then by continuing like this for other integers:
token = strtok(NULL," ");
Each token of a number is scanned into an int var as such:
sscanf (token, "%d", &var);
When var is equal to 2, the switch statement creates a new token, expecting a String to follow. Originally I had written the code as such:
printf("What experiment would you like to use?\n");
token = strtok (NULL," ");
sscanf (token, "%s", &str);
And then I would compare str with my different experiment names in a loop using strcmp. However I only tested it with 1 word experiment names. Now I'm realizing the problem can be written as listed at the beginning of my question.
Is there a simple solution to this?
Thanks.
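One possible approach, sketched under assumptions: keep tokenizing on spaces, and after option 2 join every following token that is not a whole number back into the experiment name, stopping at the next number. The function names are hypothetical, and the sketch assumes that no experiment name contains a word made only of digits and that option 2 appears at most once per line.

```c
#include <stdlib.h>
#include <string.h>

/* Returns 1 if tok is entirely a decimal integer, storing it in *out. */
static int is_int(const char *tok, long *out)
{
    char *end;
    *out = strtol(tok, &end, 10);
    return end != tok && *end == '\0';
}

/*
 * Split an option line such as "1 2 Control Group 3 4" (modified in place).
 * Options go into opts[]; the words after option 2 are joined with single
 * spaces into name. Returns the number of options, or -1 on error.
 */
int parse_options(char *line, int opts[], int max_opts, char *name, size_t cap)
{
    int n = 0;
    int in_name = 0;    /* true while collecting the words after option 2 */

    name[0] = '\0';
    for (char *tok = strtok(line, " \t\n"); tok != NULL;
         tok = strtok(NULL, " \t\n")) {
        long v;
        if (is_int(tok, &v)) {
            if (n == max_opts)
                return -1;              /* too many options */
            opts[n++] = (int)v;
            in_name = (v == 2);
        } else if (in_name) {
            if (strlen(name) + strlen(tok) + 2 > cap)
                return -1;              /* name would not fit */
            if (name[0] != '\0')
                strcat(name, " ");
            strcat(name, tok);
        } else {
            return -1;                  /* a word where a number was expected */
        }
    }
    return n;
}
```

The joined name can then be compared against the stored experiment names with strcmp(), as in the original single-word version.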

How to retrieve the telephone number from an AT CMGL response?

I have an application written in C that reads text messages from a modem using AT commands. A typical AT response from the modem looks like this:
+CMGL: 1,"REC READ","+31612123738",,"08/12/22,11:37:52+04"
The code is currently set up to only retrieve the id from this line, which is the first number, and it does so using the following code:
sscanf(line, "+CMGL: %d,", &entry);
Here, "line" is a character array containing a line from the modem, and "entry" is an integer in which the id is stored. I tried extending this code like this:
sscanf(line, "+CMGL: %d,\"%*s\",\"%s\",", &entry, phonenr);
I figured I would use the %*s to scan for the text in the first pair of quotes and skip it, and read the text in the next pair of quotes (the phone number) into the phonenr character array.
This doesn't work (%*s apparently reads "REC" and the next %s doesn't read anything).
An extra challenge is that the text isn't restricted to "REC READ"; it could in fact be many things, including text without a space in it.

Sscanf is not very good for parsing, use strchr rather. Without error handling:
#include <stdio.h>
#include <string.h>
int main(void)
{
const char *CGML_text = "+CMGL: 1,\"REC READ\",\"+31612123738\",,\"08/12/22,11:37:52+04\"";
char *comma, *phone_number_start, *phone_number_end;
comma = strchr(CGML_text, ',');
comma = strchr(comma + 1, ',');
phone_number_start = comma + 2;
phone_number_end = strchr(phone_number_start, '"') - 1;
printf("Phone number is '%.*s'\n", (int)(phone_number_end + 1 - phone_number_start), phone_number_start);
return 0;
}
(updated with tested, working code)
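With the error handling put back in, the same strchr() idea might look like the sketch below. The function name extract_phone is hypothetical, and the code assumes, like the version above, that the status field contains no comma.

```c
#include <string.h>

/*
 * Copy the second quoted field of a +CMGL line (the phone number) into out.
 * Returns 0 on success, -1 if the line does not have the expected shape
 * or the number does not fit in cap bytes.
 */
int extract_phone(const char *line, char *out, size_t cap)
{
    const char *p = strchr(line, ',');      /* comma after the index */
    if (p == NULL)
        return -1;
    p = strchr(p + 1, ',');                 /* comma after the status field */
    if (p == NULL || p[1] != '"')
        return -1;

    const char *start = p + 2;              /* first character of the number */
    const char *end = strchr(start, '"');   /* closing quote */
    if (end == NULL)
        return -1;

    size_t len = (size_t)(end - start);
    if (len >= cap)
        return -1;
    memcpy(out, start, len);
    out[len] = '\0';
    return 0;
}
```

Every strchr() result is checked before it is used, so a truncated modem line produces an error return instead of a NULL dereference.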

The way I solved it now is with the following code:
sscanf(line, "+CMGL: %d,\"%*[^\"]\",\"%[^\"]", &entry, phonenr);
This would first scan for a number (%d), then for an arbitrary string of characters that are not double quotes (and skip them, because of the asterisk), and for the phone number it does the same.
However, I'm not sure yet how robust this is.
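On the robustness question, a somewhat more defensive variant of the same format is sketched below. The function name parse_cmgl and the 32-byte buffer are assumptions. It adds a field width so the phone number cannot overflow, checks the return value, and uses %n to confirm that the closing quote was actually seen.

```c
#include <stdio.h>

/*
 * Extract the index and phone number from a +CMGL line.
 * phonenr must hold at least 32 bytes.
 * Returns 0 on success, -1 if the line does not match.
 */
int parse_cmgl(const char *line, int *entry, char *phonenr)
{
    int end = 0;    /* set by %n only if the closing quote matched */
    int n = sscanf(line, "+CMGL: %d,\"%*[^\"]\",\"%31[^\"]\"%n",
                   entry, phonenr, &end);
    return (n == 2 && end > 0) ? 0 : -1;
}
```

One remaining limitation: a scan-set must match at least one character, so a line with an empty status ("") or an empty phone number is rejected rather than parsed.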

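You can use strchr() to find the position of '+' in the string, and extract the phone number after it. You may also try to use strtok() to split the string with '"', and analyze the 4th part (the comma between the two quoted fields becomes a token of its own, so the phone number is the 4th token, not the 3rd).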

%s in scanf() reads until whitespace.
You're very close to a solution.
To read this:
+CMGL: 1,"REC READ"
you need:
"+CMGL: %d,\"%*s %*s"

Can scanf identify a format character within a string?

Let's say that I expect a list of items from the standard input which are separated by commas, like this:
item1, item2, item3,...,itemn
and I also want to permit the user to omit white-spaces between items and commas, so this kind of input is legal in my program:
item1,item2,item3,...,itemn
If I use scanf like this:
scanf("%s,%s,%s,%s,...,%s", s1, s2, s3, s4,...,sn);
it will fail when there are no white-spaces (I tested it) because it will refer to the whole input as one string. So how can I solve this problem only with C standard library functions?

The quick answer is never, ever use scanf to read user input. It is intended for reading strictly formatted input from files, and even then isn't much good. At the least, you should be reading entire lines and then parsing them with sscanf(), which gives you some chance to correct errors. At best, you should be writing your own parsing functions.
If you are actually using C++, investigate the use of the C++ string and stream classes, which are much more powerful and safe.

You could have a look at strtok. First read the line into a buffer, then tokenize:
const int BUFFERSIZE = 32768;
char buffer[BUFFERSIZE];
fgets(buffer, sizeof(buffer), stdin);
const char* delimiters = " ,\n";
char* p = strtok(buffer, delimiters);
while (p != NULL)
{
printf("%s\n", p);
p = strtok(NULL, delimiters);
}
However, with strtok you'll need to be aware of the potential issues related to reentrance.

I guess it is better to write your own parsing function for this. But if you still prefer scanf despite its pitfalls, there is a workaround: just substitute %s with %[^, \t\r\n].
The problem is that %s matches a sequence of non-white-space characters, so it swallows the comma too. If you replace %s with %[^, \t\r\n], it will work almost the same (the difference is that %s uses isspace(3) to match space characters, whereas here you explicitly specify which space characters to exclude, and this list is probably not the same as the one isspace uses).
Please note, if you want to allow spaces before and after comma you must add white space to your format string. Format string "%[^, \t\r\n] , %[^, \t\r\n]" matches strings like "hello,world", "hello, world", "hello , world".
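Since the number of items is usually not known in advance, the scan-set can also be applied in a loop over a line that was read with fgets() first. The sketch below is one way to do that; the function name split_items and the ITEM_LEN limit are assumptions, and %n is used to advance through the line.

```c
#include <stdio.h>

#define ITEM_LEN 32

/*
 * Split "item1 , item2,item3" into items[]; spaces around commas are optional.
 * Returns the number of items stored (at most max_items).
 */
int split_items(const char *line, char items[][ITEM_LEN], int max_items)
{
    int count = 0;
    int used = 0;

    /* The leading space in the format skips whitespace before each item. */
    while (count < max_items &&
           sscanf(line, " %31[^, \t\r\n]%n", items[count], &used) == 1) {
        count++;
        line += used;

        /* Continue only if optional whitespace and then a comma follow. */
        used = 0;
        (void)sscanf(line, " ,%n", &used);
        if (used == 0)
            break;
        line += used;
    }
    return count;
}
```

The width 31 in the format is tied to ITEM_LEN, so the two have to be changed together; an item longer than that ends the loop early instead of overflowing.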
