I have a sample input file like this:
1344 Muhammad Ayyubi 1
1344 Muhammad Ali Ayyubi 1
The ID, first name, surname and room number are separated by a tab character. However, a person may have two first names. In that case, the names are separated by a space.
I am trying to read the input file and store the fields in the corresponding variables.
Here is my code, which reads a line successfully when the person has only one first name:
fscanf(fp, "%d\t%s\t%s\t%d", &id, firstname, surname, &roomno);
My question: is there any way to read an input file in which a person may have two first names?
Thanks in advance.
Read the line with fgets(), which saves it as a string.
Then parse the string. Save the fields into adequately sized buffers.
A "\t" in a scanf() format matches any amount of white space, zero or more characters, not just a tab. Use TABFMT below to scan exactly 1 tab character.
Test results along the way.
This code uses " %n" to record how far parsing got, which confirms that the whole format matched and that nothing else follows on the line.
#define LINE_N 100
char line[LINE_N];
int id;
char firstname[LINE_N];
char surname[LINE_N];
int roomno;
if (fgets(line, sizeof line, fp)) {
int n = 0;
#define TABFMT "%*1[\t]"
#define NAMEFMT "%[^\t]"
sscanf(line, "%d" TABFMT NAMEFMT TABFMT NAMEFMT TABFMT "%d %n",
&id, firstname, surname, &roomno, &n);
if (n == 0 || line[n]) {
fprintf(stderr, "Failed to parse <%s>\n", line);
} else {
printf("Success: %d <%s> <%s> %d\n", id, firstname, surname, roomno);
}
}
If the first name or the surname is empty, this code treats that as an error.
An alternate approach would read the line into a string and then use strcspn(), strchr() or strtok() to look for tabs and split the line into the 4 sub-strings.
The larger issue missed by OP is what to do about ill-formatted input? Error handling is often dismissed with "input will be well formed", yet in real life, bad input does happen and also is the crack the hackers look for. Defensive coding takes steps to validate input. Pedantic code would not use *scanf() at all, but instead fgets(), strcspn(), strspn(), strchr(), strtol() and test, test, test. This answer is a middle-of-the-road testing effort.
You can use the %[ specifier to read whitespace in a string:
fscanf(fp, "%d\t%[^\t]\t%[^\t]\t%d", &id, firstname, surname, &roomno);
The answers to the question as stated are reasonable, but the question is wrong.
The end-goal here is to read human-names. Human names come in quite a variety - not always first, [middle,] last. Baking in this assumption is an error in design.
This is a many, many times repeated error. Better not to repeat.
Simplest solution is to re-order the data fields, and make no assumptions about the structure of names. So the input data becomes:
1344 1 Muhammad Ayyubi
1344 1 Muhammad Ali Ayyubi
Scanning code then can pull off the first two numeric fields, and use the remainder of the line for name (making no assumptions about structure).
More generally, if you do need to scan fields with embedded whitespace, remember the 32 "control" characters in the ASCII character table, of which ~24 have no assigned semantics (in current use). You can add structure to a file of text, for example with use of (from man ascii):
034 28 1C FS (file separator)
035 29 1D GS (group separator)
036 30 1E RS (record separator)
037 31 1F US (unit separator)
There is almost no case where text fields are allowed these characters.
Related
Data file:
Newton 30 United Kingdom Scientist
Maxwell 25 United Kingdom Mathematician
Edison 60 United States Engineer
Code to read it:
#define MAX_NAME 50
#define MAX_COUNTRY 25
#define MAX_PROFILE 20
struct person
{
char *name;
int age;
char *country;
char *profile;
};
struct person *pObj = malloc(sizeof *pObj);
pObj->name = malloc(MAX_NAME);
pObj->country = malloc(MAX_COUNTRY);
pObj->profile = malloc(MAX_PROFILE);
fscanf(fPtr,"%s\t%d\t%s\t%s\n",pObj->name,&pObj->age,pObj->country,pObj->profile);
I wrote a program to read tab-delimited records into a structure using fscanf(). I could do the same thing with the strtok() or strsep() functions, but if I use strtok() I am forced to use atoi() to load the age field, and I don't want to use atoi(). So I simply used fscanf() to read the age as an integer directly from the FILE stream buffer. It works fine, but for some records the country field is empty, as below.
Newton 30 United Kingdom Scientist
Maxwell 25 Mathematician
Edison 60 United States Engineer
When I read the second record, fscanf() doesn't store an empty string in the country field; instead, the field is filled with the profile data. We understand that fscanf() works that way. But is there any option to scan the country field even though it is empty in the file? Can I do this without using atoi() for the age, i.e. by reading each field as its respective type rather than reading all the fields as strings?
Original format
The %s conversion specification skips any white space (blanks, tabs, newlines, etc) in the input, and then reads non-white-space up to the next white space character. The \t appearing in the format string causes fscanf() to skip zero or more white space characters (not just tabs).
You have:
fscanf(fPtr,"%s\t%d\t%s\t%s\the n", pObj->name, pObj->age, pObj->country, pObj-profile);
You need to pass a pointer to the age and you need an arrow -> between pObj and profile (please post code that could compile; it doesn't inspire confidence when there are errors like this):
fscanf(fPtr,"%s\t%d\t%s\t%s\the n", pObj->name, &pObj->age, pObj->country, pObj->profile);
Given the first input line:
Newton 30 United Kingdom Scientist
fscanf() will read Newton into pObj->name, 30 into pObj->age, United into pObj->country and Kingdom into pObj->profile. fscanf() and family are very casual about white space, in general. Most conversions skip leading white space.
After the 4 values are assigned, you have \the n" at the end of the format. The tab skips the white space between Kingdom and Scientist, but the data doesn't match he n, so the scanning stops — not that you're any the wiser for that.
The next operation will pick up where this one stopped, so the next pObj->name will be assigned Scientist and then the pObj->age conversion will fail because Maxwell doesn't represent an integer. The conversions stop there on that fscanf().
And so the problems continue. Your claimed output can't be attained with the code you show in the question.
If you're adamant that you must use fscanf(), you'll need to use scan sets such as %24[^\t] to read the country. But you'd do better using fgets() or POSIX function getline() to read whole lines of input, and then perhaps use sscanf() but more likely use strcspn() or strpbrk() from Standard C (or perhaps strtok() or — far better — POSIX strtok_r() or Windows strtok_s(), or non-standard strsep()) to split the line into fields at tabs. Note that strtok_r() et al don't care how many repeats there are of the delimiter (tabs in your case) between the fields; you can't have empty fields with them. You can identify empty fields with strcspn(), strpbrk() and strsep().
Cleaned up format
The format string has been revised to:
fscanf(fPtr,"%s\t%d\t%s\t%s\n", pObj->name, &pObj->age, pObj->country, pObj->profile);
This won't work, but can now be adapted so it will work.
if (fscanf(fPtr," %49[^\t]\t%d\t%24[^\t]\t%19[^\n]", pObj->name, &pObj->age, pObj->country, pObj->profile) != 4)
…handle a format error…
Beware trailing white space in scanf() format strings. The leading blank skips any newline left over from previous lines, and skips any leading white space on a line. The %49[^\t] looks for up to 49 non-tabs; the tab is optional and matches any sequence of white space, but the first character will be a tab unless the name was too long. Then it reads a number, more optional white space (it doesn't have to be a tab, but it will be unless the data is malformatted), then up to 24 non-tabs, white space again (of which the first character will be a tab unless there's a formatting problem), and up to 19 non-tabs. The next character should be a newline, unless there's a formatting problem.
I need to be able to get data from a text file, which is presented in the following format:
Starting Cash: 1500
Turn Limit (-1 for no turn limit): 10
Number of Players Left To End Game: 1
Property Set Multiplier: 2
Number of Houses Before Hotels: 4
Must Build Houses Evenly: Yes
Put Money In Free Parking: No
Auction Properties: No
Salary Multiplier For Landing On Go: 1
To clarify, I need the data presented after the colon. I'm not really sure how to approach this. I was reading other questions and they all said to use fgets, but I don't know how long each line will be and we can't statically allocate C strings, so where would I store the line pointed to by fgets? Also, is it possible to do this using fscanf (we have learned how to do fscanf but not learned fgets)? My idea when approaching this was to get each line, and then scan each line with sscanf (I think that would work) using the string literals that I don't need:
sscanf(str, "Starting Cash: %d", &startingCash);
Would this work?
After opening the file with fopen(), you could do
float cash;
int turnlimit, plyrno, mult, house;
fscanf(fin, "%*[^:]: %f", &cash);
fscanf(fin, "%*[^:]: %d", &turnlimit);
fscanf(fin, "%*[^:]: %d", &plyrno);
where fin is the FILE pointer.
%*[^:] scans up to, but not including, the next :. The * is for assignment suppression, meaning the matched text is read but not stored anywhere.
After reading up to the point before the :, the : itself and the space that follows it must be consumed. So the format string contains a : followed by a space.
See "What is the purpose of using the [^ notation in scanf?".
I would like to write a lottery program in C, that reads the chosen numbers of former weeks into an array. I have got a text file in which there are 5 columns that are separated with tabulators. My questions would be the following:
What should I separate the columns with? (e.g. a comma, a semicolon, a tabulator or something else)
Should I include a kind of EOF in the last row? (e.g. -1, "EOF") Is there any accepted or "official" convention to do this?
Which function should I use for reading the numbers? Is there any proper or "accepted" way of reading data from text files?
I once wrote a C program for a "Who Wants to Be a Billionaire" game. In that one I used a function that read each line into an array that was big enough to hold a whole line. After that I separated its data into variables like this:
line: "text1";"text2";"text3";"text4"endline (-> line loaded into a buffer array)
text1 -> answer1 (until reaching the semicolon)
text2 -> answer2 (until reaching the semicolon)
text3 -> answer3 (until reaching the semicolon)
text4 -> answer4 (until reaching the end of the line)
endline -> start over, that is read a new line and separate its contents into variables.
It worked properly, but I don't know if it was good enough for a programmer. (btw I'm not a programmer yet, I study Computer Science at a university)
Any answers and advice are welcome. Thanks in advance for your kind help!
The scanf() family of functions don't care about newlines, so if you want to process lines, you need to read the lines first and then process the lines with sscanf(). The scanf() family of functions also treats white space — blanks, tabs, newlines, etc. — interchangeably. Using tabs as separators is fine, but blanks will work too. Clearly, if you're reading and processing a line at a time, newlines won't really factor into the scanning.
int lottery[100][5];
int line;
char buffer[4096];
for (line = 0; line < 100 && fgets(buffer, sizeof(buffer), stdin) != NULL; line++)
{
if (sscanf(buffer, "%d %d %d %d %d", &lottery[line][0], &lottery[line][1],
&lottery[line][2], &lottery[line][3], &lottery[line][4]) != 5)
{
fprintf(stderr, "Faulty line: [%s]\n", buffer);
break;
}
}
This stops on EOF, too many lines, and a faulty line (one which doesn't start with 5 numbers; you can check their values etc in the loop if you want to — but what are the tests you need to run?). If you want to validate the white space separators, you have to work harder.
Maybe you want to test for nothing but spaces and newlines after the 5 numbers; that's a bit trickier (it can be done; look up the %n conversion specification in sscanf()).
my question is how can I read specific sections from a file? For instance, if my file was:
454545454 Joe Brown 70 50 40
656565656 David Smith 80 90 100
383838383 George Williams 95 100 80
How could I read the first string (9-Digit #), skip over the name, and then read the 3 sets of numbers?
I think you could treat the white space as your sentinel. Maybe you can read the whole file into a char * buffer and look for this sentinel each time.
Another solution could be to use atoi() (ASCII to int) to check whether a token is a number or a word. You can also read about fread() and fseek().
I think the best way is to mix both solutions: find each sentinel and try to parse the token using atoi().
The main idea is to find some pattern in the file that lets you work out the algorithm.
In C, most of the times you have to solve the logic by yourself.
Hope it helps!
Instead of "reading specific sections", read the file line by line, save the information you want and discard the rest. scanf() reads formatted input from an external source into program variables. Since scanf() returns the number of items successfully converted, you can use that to do some error checking.
char num_string[STR_LEN];
int numbers[3];
char dummy1[STR_LEN], dummy2[STR_LEN];
int num_read = scanf( "%s%s%s%d%d%d", num_string, dummy1, dummy2, &numbers[0], &numbers[1], &numbers[2] );
if( num_read != 6 )
{
// error
}
else
{
// do stuff with num_string, and numbers[0]-numbers[2]
}
Right now I am doing an assignment but find it very hard to parse the user input in C. Here is the kind of input the user will enter:
INSERT Alice, 25 Norway Drive, Fitzerald, GA, 40204, 6000.60
Here INSERT is the command (to insert into a linked list)
Alice is a name
25 Norway Drive is an address
Fitzerald is a city
GA is a state
40204 is a zip code
6000.60 is a balance
How can I use scanf or any other method in C to properly take this as input? The biggest problem in front of me is how to ignore these "," and store these values in separate variables of appropriate data types.
Thanks everyone, I have solved the issue and here is the solution:
pch = strtok(NULL, ",");
pch = substr(pch, 2, strlen(pch)); // substr is my custom function; I believe you can tell from its name what it does.
strcpy(customer->streetAddress, pch);
Fast easy method:
Use fgets() to get the string from the user;
and strtok() to tokenize it.
Edit
After reading your comment:
Use strtok() with only the comma, and then remove trailing and leading spaces from the result.
Edit2
After a test run, I noticed you will get "INSERT Alice" as the first token. So, after all tokens have been extracted, run strtok() again, this time with a space, on the first token extracted. Or, find the space and somehow identify the command and the name from there.
If your input data format is fixed you can use something quick and dirty using [s]scanf().
With input of:
INSERT Alice, 25 Norway Drive, Fitzerald, GA, 40204, 6000.60
You might try, if reading from stdin:
char name[80], addr[80], city[80], state[80];
int zip;
double amt;
int res = scanf("INSERT %79[^,], %79[^,], %79[^,], %79[^,], %d, %lf",
name, addr, city, state, &zip, &amt);
scanf() should return the number of items matched (i.e. 6). Note that amt is a double, so it needs %lf, and the arrays are passed without &.
scanf() may be a bit tricky in this situation, assuming that different commands with different parameters can be used. I would probably use fgets() to read in the string first, followed by the use of strtok() to read the first token (the command). At that point you can either continue to use strtok() with "," as the delimiter to read the rest of the tokens in the string, or you could use a sscanf() on the rest of the string (now that you know the format that the rest of the input will be in). sscanf() is still going to be a pain due to the fact that it appears that an unspecified number of spaces would be allowed in the address and possibly town fields.