Won't read from file to struct - c

I've been sitting with this problem for 2 days and I can't figure out what I'm doing wrong. I've tried debugging (kind of? Still kind of new), followed this link: https://ericlippert.com/2014/03/05/how-to-debug-small-programs/ And I've tried Google and all kinds of things. Basically I'm reading from a file with this format:
R1 Fre 17/07/2015 18.00 FCN - SDR 0 - 2 3.211
and I have to make the program read this into a struct, but when I try printing the information it comes out all wrong. My code looks like this:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#define MAX_INPUT 198
typedef struct game{
char weekday[4],
home_team[4],
away_team[4];
int round,
hour,
minute,
day,
month,
year,
home_goals,
away_goals,
spectators;}game;
game make_game(FILE *superliga);
int main(void){
int input_number,
number_of_games = 198,
i = 0;
game tied[MAX_INPUT];
FILE *superliga;
superliga = fopen("superliga-2015-2016.txt", "r");
for(i = 0; i < number_of_games; ++i){
tied[i] = make_game(superliga);
printf("R%d %s %d/%d/%d %d.%d %s - %s %d - %d %d\n",
tied[i].round, tied[i].weekday, tied[i].day, tied[i].month,
tied[i].year, tied[i].hour, tied[i].minute, tied[i].home_team,
tied[i].away_team, tied[i].home_goals, tied[i].away_goals,
tied[i].spectators);}
fclose(superliga);
return 0;
}
game make_game(FILE *superliga){
double spect;
struct game game_info;
fscanf(superliga, "R%d %s %d/%d/%d %d.%d %s - %s %d - %d %lf\n",
&game_info.round, game_info.weekday, &game_info.day, &game_info.month,
&game_info.year, &game_info.hour, &game_info.minute, game_info.home_team,
game_info.away_team, &game_info.home_goals, &game_info.away_goals,
&spect);
game_info.spectators = spect * 1000;
return game_info;
}

The problem is in your file. It starts with whitespace, not with the R that your format string expects first.
Check the return value of fscanf() and you'll see that it's zero every time.
If you add a leading whitespace to your fscanf() call, your problem will be solved, like this:
fscanf(superliga, " R%d %s %d/%d/%d %d.%d %s - %s %d - %d %lf\n",
&game_info.round, game_info.weekday, &game_info.day, &game_info.month,
&game_info.year, &game_info.hour, &game_info.minute, game_info.home_team,
game_info.away_team, &game_info.home_goals, &game_info.away_goals,
&spect);
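A minimal sketch of that return-value check, using sscanf() on an in-memory string so it is easy to try. The two helper names are made up for illustration, and only the round number is parsed:

```c
#include <stdio.h>

/* Hypothetical helper: no leading space in the format, so the very first
 * input character must be 'R'. Returns 1 on success, 0 if 'R' did not match. */
int parse_round_strict(const char *line, int *round)
{
    return sscanf(line, "R%d", round);
}

/* Hypothetical helper: the leading space in the format skips any amount
 * of whitespace (including none) before the 'R'. */
int parse_round_lenient(const char *line, int *round)
{
    return sscanf(line, " R%d", round);
}
```

With an input such as "  R1 Fre", the strict version reports 0 conversions while the lenient one reports 1 and stores 1 in round.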

If each line in the file is a separate record, you should read each line as a string, then try to parse each string.
(Note that this also has the added feature of speculative parsing: you can try parsing the line in several different formats, and accept the one that parses correctly. I like to use this when I accept e.g. vector inputs, so that the user can use x y z, x, y, z, x/y/z, (x,y,z), [x,y,z], <x y z>, <x,y,z>, and so on, depending on what they like. It's only one additional scanf per format, after all.)
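A sketch of such speculative parsing, under the assumption that the vector has three components. The function name parse_vec3 and the exact list of formats are illustrative only; the closing bracket is not verified, and a failed attempt may leave x and y partially overwritten before the next format is tried:

```c
#include <stdio.h>

/* Try several layouts for a 3-component vector.
 * Returns 0 on success, -1 if none of the formats matched. */
int parse_vec3(const char *s, double *x, double *y, double *z)
{
    static const char *const formats[] = {
        " %lf %lf %lf",          /* x y z    */
        " %lf , %lf , %lf",      /* x, y, z  */
        " %lf / %lf / %lf",      /* x/y/z    */
        " ( %lf , %lf , %lf",    /* (x,y,z)  */
        " [ %lf , %lf , %lf",    /* [x,y,z]  */
        " < %lf , %lf , %lf",    /* <x,y,z>  */
        " < %lf %lf %lf",        /* <x y z>  */
    };
    size_t i;

    for (i = 0; i < sizeof formats / sizeof formats[0]; i++)
        if (sscanf(s, formats[i], x, y, z) == 3)
            return 0;           /* first format that yields three values wins */
    return -1;
}
```

Each additional accepted layout costs one more entry in the table, which is the "one additional scanf per format" mentioned above.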
To read lines, you can use fgets() into a local buffer. The local buffer must be long enough. If the program is to run on POSIX.1 machines only (i.e., not on Windows), then you can use getline() instead, which can dynamically reallocate the given buffer as needed, so you're not limited to any specific line length.
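One way to find out whether the local buffer was long enough is sketched below. The helper read_line and its return convention are assumptions made for this example, not part of the standard library:

```c
#include <stdio.h>
#include <string.h>

/* Read one line into buf (size must be at least 2).
 * Returns  1 if a complete line was stored (newline removed),
 *          0 at end of input or on a read error,
 *         -1 if the line did not fit; the rest of that line is discarded. */
int read_line(FILE *in, char *buf, size_t size)
{
    size_t len;
    int c;

    if (!fgets(buf, (int)size, in))
        return 0;

    len = strcspn(buf, "\n");
    if (buf[len] == '\n') {
        buf[len] = '\0';        /* complete line: strip the newline */
        return 1;
    }
    if (len + 1 < size)
        return 1;               /* final line of the input, without newline */

    /* Buffer is full and holds no newline: see what follows. */
    c = getc(in);
    if (c == EOF || c == '\n')
        return 1;               /* the line fit exactly */
    while ((c = getc(in)) != EOF && c != '\n')
        ;                       /* skip the remainder of the long line */
    return -1;
}
```

A caller can then treat -1 as "record too long" and carry on with the next line instead of parsing a truncated record.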
To parse the string, use sscanf().
Note that all tabs, spaces, and newlines in the pattern in all of the scanf family of functions are treated exactly the same: they indicate any number of any type of whitespace. In other words, \n does not mean "and then a newline"; it means the same as a space, i.e. "and possibly some whitespace here". However, all conversions except %c and %[ automatically skip any leading whitespace; so, with the exception of a space before one of those two, the spaces in the pattern are only meaningful to us humans, they do not have any functional effect in the scanning.
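The whitespace rules can be checked with a small sketch; the two helper names are invented for this example:

```c
#include <stdio.h>

/* The only separator in this format is "\n", yet it matches any run of
 * whitespace in the input (spaces, tabs, newlines, or nothing at all). */
int parse_pair(const char *s, int *a, int *b)
{
    return sscanf(s, "%d\n%d", a, b);
}

/* %c does not skip whitespace on its own; the space before it in the
 * format is what skips the blanks between the number and the character. */
int parse_int_then_char(const char *s, int *n, char *c)
{
    return sscanf(s, "%d %c", n, c);
}
```

Both "1 2" and "3 \t\n 4" parse as a pair, and "5   x" yields the character 'x' rather than a space.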
All functions in the scanf family return the number of successful conversions that were assigned. (The only exception is the "conversion" %n, which yields the number of characters consumed so far; the C standard says it does not increase the count, but some implementations have counted it.) If end of input or a read error occurs before the first conversion, the functions return EOF. If the input stops matching the pattern, they return the number of conversions completed up to that point, which can be 0.
A conversion whose result you suppress (for example, a word in the input you don't need can be converted but discarded with %*s) is not counted, because nothing is assigned. So, for example, sscanf(line, " %*d %*s %*d") returns 0 even if the line starts with an integer, followed by a word (anything that contains no whitespace), followed by an integer. To verify that such a pattern matched all the way, append %n to it and check whether the corresponding int was set.
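A sketch of checking both the return value and, via %n, that nothing unexpected follows the last value. The function name line_matches and the 32-byte word buffer are assumptions for the example:

```c
#include <stdio.h>

/* Returns 1 only if the whole line is "<int> <word> <int>" followed by
 * nothing but whitespace; returns 0 otherwise. */
int line_matches(const char *line)
{
    int a, b, end = -1;
    char word[32];

    /* "< 3" also covers EOF, and stays correct whether or not an
     * implementation counts %n. */
    if (sscanf(line, " %d %31s %d %n", &a, word, &b, &end) < 3)
        return 0;
    if (end < 0 || line[end] != '\0')
        return 0;               /* trailing garbage after the third value */
    return 1;
}
```

Because end starts at -1, it also shows whether scanning reached the %n at all.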
Rather than have the function return the parsed structure, pass a pointer to the structure (and the file handle to read from), and return a status code. I prefer 0 for success, and nonzero for failure, but feel free to change that.
In other words, I'd suggest you change your read function into
#ifndef GAME_LINE_MAX
#define GAME_LINE_MAX 1022
#endif
int read_game(game *one, FILE *in)
{
char buffer[GAME_LINE_MAX + 2]; /* + '\n' + '\0' */
char *line;
/* Sanity check: no NULL pointers accepted! */
if (!one || !in)
return -1;
/* Paranoid check: Fail if read error has already occurred. */
if (ferror(in))
return -1;
/* Read the line */
line = fgets(buffer, sizeof buffer, in);
if (!line)
return -1;
/* Parse the game; pattern from OP's example: */
if (sscanf(line, "R%d %3s %d/%d/%d %d.%d %3s - %3s %d - %d %d\n",
&(one->round), one->weekday,
&(one->day), &(one->month), &(one->year),
&(one->hour), &(one->minute),
one->home_team,
one->away_team,
&(one->home_goals),
&(one->away_goals),
&(one->spectators)) < 12)
return -1; /* Line not formatted like above */
/* Spectators in the file are in units of 1000; convert: */
one->spectators *= 1000;
/* Success. */
return 0;
}
To use the above function in a loop, reading games one after another from standard input (stdin):
game g;
while (!read_game(&g, stdin)) {
/* Do something with current game stats, g */
}
if (ferror(stdin)) {
/* Read error occurred! */
} else
if (!feof(stdin)) {
/* Not all data was read/parsed! */
}
The two if clauses above are to check if there was a real read error (as in, a problem with the hardware or something like that), and whether there was unread/unparsed data (not at end of file), respectively.
There are two differences in the scanning pattern compared to the OP: First, all strings parsed are limited to 3 characters, because the structure has only room for 3+1 each. The one character is reserved for the end of string '\0', which is not counted in the maximum length for %s. Second, I parse the spectator count directly, and just multiply the field by 1000 if successful. Be aware that %d stops at the decimal point, so an input such as 3.211 is stored as 3000; if the fractional part matters, read the value with %lf into a double as in the OP's code.
Also note how I used one->weekday, one->home_team, and one->away_team to refer to the character arrays. This works because an array variable can be used as if it were a pointer to the first element in that array. (Given char a[5];, both a and &(a[0]) yield a pointer to the first element of a; &a holds the same address but has the type char (*)[5], so it is not the right argument for %s.) I like to use this "raw form" when scanning, because it makes it easier to match them to %s conversions, and ensure the pattern matches the parameters.

Related

Read tab separated content line by line with last column empty string

I have a file format like this
1.9969199999999998 2.4613199999999997 130.81278270000001 AA
2.4613199999999997 2.5541999999999998 138.59131554109211 BB
2.5541999999999998 2.9953799999999995 146.83238401449094 CC
...........................
I have to read the first three columns as float and the last column as a char array in C. All the columns are tab separated and there is a newline character at the end of each line. Everything works fine with fscanf(fp1, "%f\t%f\t%f\t%s\n", ...) as long as there is some text at the end of each line (the char string part).
There are cases where, instead of AA/BB/CC, I have an empty string in the file. How do I handle that case? I have tried fscanf(fp1, "%f\t%f\t%f\t%s[^\n]\n", ...) and many other things, but I am unable to figure out the right way. Can you please help me out here?
Using float rather than double will throw away about half the digits shown. You get 6-7 decimal digits with float; you get 15+ digits with double.
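The loss of digits is easy to observe with a sketch that parses the same text both ways; parse_both is a name invented for this example:

```c
#include <stdio.h>

/* Parse the same text once as float (%f) and once as double (%lf).
 * Returns 0 on success, -1 if either conversion fails. */
int parse_both(const char *text, float *f, double *d)
{
    if (sscanf(text, "%f", f) != 1)
        return -1;
    if (sscanf(text, "%lf", d) != 1)
        return -1;
    return 0;
}
```

For the third column of the first sample line, 130.81278270000001, the float result differs from the double result by several millionths, so the digits after roughly the seventh significant one are lost.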
As to your main question: use fgets() (or POSIX getline()) to read lines and then sscanf() to parse the line that is read. This will avoid confusion. When the input is line-based but not regular enough, don't use fscanf() and family to read the data; the file-reading scanf() functions don't care about newlines, even when you do.
Note that sscanf() will return either 3 or 4, indicating whether there was a string at the end of a line or not (or EOF, 0, 1 or 2 if it is given an empty string, or a string which doesn't start with a number, or a string which only contains one or two numbers). Always test the return value from scanf() and friends — but do so carefully. Look for the number of values that you expect (3 or 4 in this example), rather than 'not EOF'.
This leads to roughly:
#include <stdio.h>
int main(void)
{
double d[3];
char text[20];
char line[4096];
while (fgets(line, sizeof(line), stdin) != 0)
{
int rc = sscanf(line, "%lf %lf %lf %19s", &d[0], &d[1], &d[2], &text[0]);
if (rc == 4)
printf("%13.6f %13.6f %13.6f [%s]\n", d[0], d[1], d[2], text);
else if (rc == 3)
printf("%13.6f %13.6f %13.6f -NA-\n", d[0], d[1], d[2]);
else
printf("Format error: return code %d\n", rc);
}
return 0;
}
If given this file as standard input:
1.9969199999999998 2.4613199999999997 130.81278270000001 AA
2.4613199999999997 2.5541999999999998 138.59131554109211 BB
2.5541999999999998 2.9953799999999995 146.83238401449094 CC
19.20212223242525 29.3031323334353637 3940.41424344454647
19.20212223242525 29.3031323334353637 3940.41424344454647 PolyVinyl-PolySaccharide
the output is:
1.996920 2.461320 130.812783 [AA]
2.461320 2.554200 138.591316 [BB]
2.554200 2.995380 146.832384 [CC]
19.202122 29.303132 3940.414243 -NA-
19.202122 29.303132 3940.414243 [PolyVinyl-PolySacch]
You can tweak the output format to suit yourself. Note that the %19s avoids buffer overflow even when the text is longer than 19 characters.

sscanf behaviour is different

#include <stdio.h>
int main()
{
char * msg = "Internal power 10. power sufficient. total count 10";
char * temp = "Internal power %d. power %s. total count %d";
int v1, v2, ret;
char str1[64];
ret = sscanf(msg, temp, &v1, str1, &v2);
printf("%d\n", ret);
printf("%d %s %d ", v1, str1 , v2);
return 0;
}
I want to understand why sscanf is failing and why it is not able to retrieve the last variable?
%s reads a whitespace-delimited string; that is, it consumes "sufficient." including the dot. By the time the format reaches its own literal dot, the rest of the format, ". total count %d", does not match the remaining input, " total count 10".
Since you're expecting the word to be followed by ., you might as well use %63[^.] i.e. maximum 63 characters that do not include a dot. Or %63[a-z] for maximum 63 ASCII lowercase letters - specifying the width explicitly also ensures that buffer overflow can't happen:
char * temp = "Internal power %d. power %63[^.]. total count %d";
P.S. always check the return value of *scanf - it tells how many specifiers were matched (in this case it should be 3); however, now 2 was returned meaning that the matching failed after the second conversion.
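Put together, the corrected call might look like the sketch below. The wrapper name parse_power is made up, and the 64-byte buffer matches str1 in the question:

```c
#include <stdio.h>

/* Parse "Internal power N. power WORD. total count M".
 * Returns the number of values converted; 3 means the whole pattern matched. */
int parse_power(const char *msg, int *power, char status[64], int *count)
{
    /* %63[^.] stops at the dot, leaving it for the literal '.' in the format. */
    return sscanf(msg, "Internal power %d. power %63[^.]. total count %d",
                  power, status, count);
}
```

For the message from the question this returns 3 and stores "sufficient" without the trailing dot.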
The problem is this part of the scanf format string: "power %s."
The cause is that scanf format strings are not regular expressions and do no backtracking. When you have the "%s" format, scanf (and its siblings) will read everything until the next white-space.
That means that, with the string you have, the "%s" will cause your sscanf call to read "sufficient." including the dot. Then the call will try to match the literal dot, which has already been read into the string, and since it's not available anymore the call will fail.
You can use sets as suggested by Jonathan Leffler in his comment. I also suggest you read e.g. this scanf (and family) reference for more details.

Storing String Inside a String?

My problem is when I try to save the string (series[0]) Inside (c[0])
and I display it, it always ignore the last digit.
For Example the value of (series[0]) = "1-620"
So I save this value inside (c[0])
and ask the program to display (c[0]), it displays "1-62" and ignores the last digit which is "0". How can I solve this?
This is my code:
#include <stdio.h>
int main(void)
{
int price[20],i=0,comic,j=0;
char name,id,book[20],els[20],*series[20],*c[20];
FILE *rent= fopen("read.txt","r");
while(!feof(rent))
{
fscanf(rent,"%s%s%s%d",&book[i],&els[i],&series[i],&price[i]);
printf("1.%s %s %s %d",&book[i],&els[i],&series[i],price[i]);
i++;
}
c[0]=series[0];
printf("\n%s",&c[0]);
return 0;
}
The use of fscanf and printf is wrong :
fscanf(rent,"%s%s%s%d",&book[i],&els[i],&series[i],&price[i]);
Should be:
fscanf(rent,"%c%c%s%d",&book[i],&els[i],series[i],&price[i]);
You used the address-of operator on a char pointer where scanf expects the char pointer itself; you also read a string into book and els instead of a single character.
printf("1.%s %s %s %d",&book[i],&els[i],&series[i],price[i]);
Should be:
printf("1.%c %c %s %d",book[i],els[i],series[i],price[i]);
And:
printf("\n%s",&c[0]);
Should be:
printf("\n%s",c[0]);
c is an array of char * so c[i] can point to a string and that is what you want to send to printf function.
*Keep in mind that you have to allocate (using malloc) a place in memory for all the strings you read before sending them to scanf:
e.g:
c[0] = (char*)malloc(sizeof(char)*lengthOfString+1);
and only after this you can read characters in to it.
or you can use a fixed-size two-dimensional character array:
char c[20][10];
Now c is an array of 20 strings that can each be up to 9 characters long (plus the terminating '\0').
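For the malloc route, one possible sketch is a small helper that copies an already-read string into freshly allocated memory. The name copy_string is an assumption for this example; the idea would be to read into a fixed buffer first (for instance with a width-limited %19s) and then store a copy in c[0]:

```c
#include <stdlib.h>
#include <string.h>

/* Returns a heap-allocated copy of s, or NULL if allocation fails.
 * The caller is responsible for calling free() on the result. */
char *copy_string(const char *s)
{
    size_t len = strlen(s) + 1;     /* +1 for the terminating '\0' */
    char *copy = malloc(len);

    if (copy != NULL)
        memcpy(copy, s, len);       /* copies the terminator as well */
    return copy;
}
```

A call such as c[0] = copy_string(buffer); then gives c[0] its own storage, which can be printed with printf("%s\n", c[0]) and released with free(c[0]).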
Amongst other problems, at the end you have:
printf("\n%s",&c[0]);
There are multiple problems there. The serious one is that c[0] is a char *, so you're passing the address of a char * — a char ** — to printf() but the %s format expects a char *. The minor problem is that you should terminate lines of output with newline.
In general, you have a mess with your memory allocation. You haven't allocated space for char *series[20] pointers to point at, so you get undefined behaviour when you use it.
You need to make sure you've allocated enough space to store the data, and it is fairly clear that you have not done that. One minor difficulty is working out what the data looks like, but it seems to be a series of lines each with 3 words and 1 number. This code does that job a bit more reliably:
#include <stdio.h>
int main(void)
{
int price[20];
int i;
char book[20][32];
char els[20][32];
char series[20][20];
const char filename[] = "read.txt";
FILE *rent = fopen(filename, "r");
if (rent == 0)
{
fprintf(stderr, "Failed to open file '%s' for reading\n", filename);
return 1;
}
for (i = 0; i < 20; i++)
{
if (fscanf(rent, "%31s%31s%19s%d", book[i], els[i], series[i], &price[i]) != 4)
break;
printf("%d. %s %s %s %d\n", i, book[i], els[i], series[i], price[i]);
}
printf("%d titles read\n", i);
fclose(rent);
return 0;
}
There are endless ways this could be tweaked, but as written, it ensures no overflow of the buffers (by the counting loop and input conversion specifications including the length), detects when there is an I/O problem or EOF, and prints data with newlines at the end of the line. It checks and reports if it fails to open the file (including the name of the file — very important when the name isn't hard-coded and a good idea even when it is), and closes the file before exiting.
Since you didn't provide any data, I created some random data:
Tixrpsywuqpgdyc Yeiasuldknhxkghfpgvl 1-967 8944
Guxmuvtadlggwjvpwqpu Sosnaqwvrbvud 1-595 3536
Supdaltswctxrbaodmerben Oedxjwnwxlcvpwgwfiopmpavseirb 1-220 9698
Hujpaffaocnr Teagmuethvinxxvs 1-917 9742
Daojgyzfjwzvqjrpgp Vigudvipdlbjkqjm 1-424 4206
Sebuhzgsqpyidpquzjxswbccqbruqf Vuhssjvcjjylcevcisdzedkzlp 1-581 3451
Doeraxdmyqcbbzyp Litbetmttcgfldbhqqfdxqi 1-221 2485
Raqqctfdlhrmhtzusntvgbvotpk Iowdcqlwgljwlfvwhfmw 1-367 3505
Kooqkvabwemxoocjfaa Hicgkztiqvqdjjx 1-466 435
Lowywyzzkkrazfyjuggidsqfvzzqb Qiginniroivqymgseushahzlrywe 1-704 5514
The output from the code above on that data is:
0. Tixrpsywuqpgdyc Yeiasuldknhxkghfpgvl 1-967 8944
1. Guxmuvtadlggwjvpwqpu Sosnaqwvrbvud 1-595 3536
2. Supdaltswctxrbaodmerben Oedxjwnwxlcvpwgwfiopmpavseirb 1-220 9698
3. Hujpaffaocnr Teagmuethvinxxvs 1-917 9742
4. Daojgyzfjwzvqjrpgp Vigudvipdlbjkqjm 1-424 4206
5. Sebuhzgsqpyidpquzjxswbccqbruqf Vuhssjvcjjylcevcisdzedkzlp 1-581 3451
6. Doeraxdmyqcbbzyp Litbetmttcgfldbhqqfdxqi 1-221 2485
7. Raqqctfdlhrmhtzusntvgbvotpk Iowdcqlwgljwlfvwhfmw 1-367 3505
8. Kooqkvabwemxoocjfaa Hicgkztiqvqdjjx 1-466 435
9. Lowywyzzkkrazfyjuggidsqfvzzqb Qiginniroivqymgseushahzlrywe 1-704 5514
10 titles read

How to parse an input line with sscanf?

I have an input .txt file that looks like this:
Robert Hill 53000 5
Amanda Trapp 89000 3
Jonathan Nguyen 93000 3
Mary Lou Gilley 17000 1 // Note that this name consists of 3 parts!
Warren Rexroad 72000 7
I need to read those lines and parse them into three different categories: name (which is an array of chars), mileage (int) and years(int).
sscanf(line, "%[^] %d %d ", name, &mileage, &years);
This doesn't work very well for me, any suggestions?
THE PROBLEM
The problem with the current specifier passed to sscanf is that it is both ill-formed, and even when fixed it won't do what you want. If you would have used [^ ] as the first conversion specifier, sscanf would try to read as many characters as it can before hitting a space.
If we assume that a name can't contain digits specifying [^0123456789] will read the correct data, but it will also include the trailing space after the name, but before the first mileage entry. This is however easily solved by replacing the last space with a null-byte in name.
To get the number of characters read into name we can use the %n specifier, which tells sscanf to store the number of bytes consumed so far in the matching argument; we can later use this value to correctly "trim" our buffer.
We should also specify a maximum width of the characters read by %[^0123456789] so that it doesn't cause a buffer-overflow, this is done by specifying the size of our buffer directly after our %.
SAMPLE IMPLEMENTATION
#include <stdio.h>
#include <string.h>
int
main (int argc, char *argv[])
{
char const * line = "Mary Lou Gilley 17000 1";
char name[255];
int mileage, years, name_length;
sscanf(line, "%254[^0123456789]%n %d %d ", name, &name_length, &mileage, &years);
name[name_length-1] = '\0';
printf ("data: '%s', %d, %d", name, mileage, years);
return 0;
}
data: 'Mary Lou Gilley', 17000, 1
If you have a function that finds the position of the first digit like so (it needs <ctype.h> for isdigit):
// This function returns the position of the
// space before the first digit (assuming that
// the names don't contain digits), or NULL
// if the string contains no digit.
char *digitPos(char *s){
if (*s == '\0') return NULL;
if (isdigit((unsigned char)*(s+1))) return s;
else return digitPos(s+1);
}
You can then just separate the two variables by inserting a '\0' at the right position like so:
pos = digitPos(line); // This is a pointer to the space
*pos = '\0';
strcpy(name, line);
sscanf(pos + 1, "%d %d", &mileage, &years);
This might help you get started. It lacks the intelligence of BLUEPIXY's solution which handles the trailing whitespace a little better than mine ( or you could chop it off yourself).
dan#rachel ~ $ gcc -o t t.c
dan#rachel ~ $ echo "Dan P F 3 21" | ./t
Name: Dan P F ,
Mileage: 3,
Years: 21.
Here's the code.
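#include <stdio.h>
#include <stdlib.h>
int main(){
char *buf;
int mileage, years;
/* %m (POSIX.1-2008) makes fscanf allocate buf for us. */
/* Loop on the return value rather than feof(), so a malformed record ends the loop. */
while( fscanf( stdin, " %m[^0-9] %d %d", &buf, &mileage, &years) == 3 ){
fprintf(stderr, "Name:\t %s,\nMileage:\t %d,\nYears:\t %d.\n",
buf, mileage, years
);
free(buf); /* release the buffer allocated by %m */
}
return 0;
}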
You have discovered one of the three reasons *scanf should never be used: it's almost impossible to write a format specification that handles nontrivial input syntax, especially if you have to worry about recovering from malformed input. But there are two even more important reasons:
Many input specifications, including your %[...] construct, are just as happy to overflow buffers as the infamous gets.
Numeric overflow provokes undefined behavior -- the C library is licensed to crash just because someone typed too many digits.
The correct way to parse lines like these is to scan for the first digit with strcspn(line, "0123456789"), or while (*p && !isdigit((unsigned char)*p)) p++;, then use strtoul to convert the numbers that follow.
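A sketch of that approach, assuming names never contain digits. The function name parse_record, its signature, and the choice to reject lines with a missing second number are all assumptions for this example:

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Split "Name With Spaces 17000 1" into name, mileage and years.
 * Returns 0 on success, -1 on malformed input or if the name does not fit. */
int parse_record(const char *line, char *name, size_t name_size,
                 unsigned long *mileage, unsigned long *years)
{
    size_t len = strcspn(line, "0123456789");   /* length of the name part */
    const char *p = line + len;                 /* first digit, if any */
    char *end;

    if (*p == '\0')
        return -1;                              /* no digits at all */
    while (len > 0 && isspace((unsigned char)line[len - 1]))
        len--;                                  /* trim trailing whitespace */
    if (len == 0 || len >= name_size)
        return -1;
    memcpy(name, line, len);
    name[len] = '\0';

    errno = 0;
    *mileage = strtoul(p, &end, 10);
    if (end == p || errno == ERANGE)
        return -1;                              /* no number, or out of range */

    p = end;
    while (isspace((unsigned char)*p))
        p++;
    if (!isdigit((unsigned char)*p))
        return -1;                              /* second number is missing */
    *years = strtoul(p, &end, 10);
    if (errno == ERANGE)
        return -1;
    return 0;
}
```

Unlike the scanf family, strtoul reports out-of-range input through errno, so an over-long number is rejected rather than causing undefined behavior.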
int pos;
sscanf(line, "%*[^0-9]%n", &pos);
line[--pos]=';';
sscanf(line, "%[^;]; %d %d ", name, &mileage, &years);

Program not working for large files in C

I'm using the following program in C to filter a log file with about 200,000 lines. But the program stops responding after about 12000 lines. Can anyone explain why this happens and how to fix it?
The code is compiled in GCC (windows).
PS: The code is executing properly and giving desired output for small files.
#include<stdio.h>
#include<string.h>
int check(char *url)
{
//some code to filter the data and return either 0 or 1 depending upon input
}
int main()
{
FILE *fpi, *fpo;
fpi=fopen("access.log","r");
fpo=fopen("edited\\filter.txt","w");
char date[11],time[9],ip[16],url[500],temp[3];
while(!feof(fpi))
{
printf(".");
fscanf(fpi," %s %s %s %s %s %s",date,time,temp,ip,temp,url);
if(check(url))
fprintf(fpo,"%s %s %s %s %s %s\n",date,time,temp,ip,temp,url);
}
fclose(fpi);
fclose(fpo);
printf("\n\n\nDONE! :)");
return 0;
}
It is possible that one of the lines in the input file contains a field that is larger than the string variable you pass to fscanf(). It might result in a buffer overflow, which later results in an infinite loop somewhere. Just a speculation. I suggest you delimit %s in the fscanf() format string with the maximum length of the output string variable.
For example, this will make sure that there are no buffer overflows and that the resulting strings are terminated (each width is one less than the size of the array it fills):
fscanf(fpi," %10s %8s %2s %15s %2s %499s", date, time, temp, ip, temp, url);
date[10] = '\0';
time[8] = '\0';
ip[15] = '\0';
temp[2] = '\0';
url[499] = '\0';
Also, you are reading temp twice. The latter read will overwrite the former. Is this what you intended?
Another improvement, assuming that the input file is line-terminated and each log entry is on a separate line, is to use fgets() to read a line and only then use sscanf() on the intermediate buffer. This way you ensure that no formatting errors extend beyond a single line. Also, sscanf returns the number of items read, which in your case should be 6. It would be safer to check the return value.
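A sketch of the per-line parsing step. The struct log_entry, the function name parse_log_line, and the sample line used to try it are assumptions; the field sizes are copied from the declarations in the question:

```c
#include <stdio.h>

/* Field sizes taken from the question's char arrays. */
struct log_entry {
    char date[11], time[9], ip[16], url[500];
};

/* Parse one log line of six whitespace-separated fields, of which the
 * third and fifth are read but not kept.
 * Returns 0 on success, -1 if fewer than six fields were found. */
int parse_log_line(const char *line, struct log_entry *e)
{
    char skip1[3], skip2[3];    /* the two fields the question reads into temp */

    if (sscanf(line, " %10s %8s %2s %15s %2s %499s",
               e->date, e->time, skip1, e->ip, skip2, e->url) != 6)
        return -1;
    return 0;
}
```

In the main loop, each line read with fgets(line, sizeof line, fpi) would be passed to parse_log_line(), and lines for which it returns -1 can be skipped or reported. Note that a field longer than its width is split across the following fields rather than rejected.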
