Reading a file with a different format on each line in C

I just started learning C (an optional school course), and I have been stuck on a small problem for 2 days. I have a bunch of data in a file that I want to extract. However, the data comes in 2 formats, and the first letter on each line determines what action I need to take.
For example, the data in the file looks like this:
S:John,engineer,male,30
S:Alice,teacher,female,40
C:Ford Focus,4-door,25000
C:Chevy Corvette,sports,56000
S:Anna,police,female,36
What I want to do is open the file and read each line. If the first letter is S, then use
fscanf(fp, "%*c:%[^,],%[^,],%[^,],%d%*c",name,job,sex,&age)
to store all the variables so I can pass them to the function people().
But if the first letter is C, then use
fscanf(fp, "%*c:%[^,],%[^,],%d%*c",car,type,&price)
to store them so I can pass them to the function vehicle().
I would really appreciate it if anyone could give me some pointers on how to do this. Thanks.

There are many approaches, but separating IO from parsing is a good first step.
With line-oriented data, it is much cleaner to simply read a whole line first:
FILE *inf = ...;
char buf[100];
if (fgets(buf, sizeof buf, inf) == NULL) Handle_EOForIOError();
Then parse it.
char name[sizeof buf];
char job[sizeof buf];
char sex[sizeof buf];
unsigned age;
char car[sizeof buf];
char type[sizeof buf];
unsigned cost;
int n;
if (sscanf(buf, "S:%[^,],%[^,],%[^,],%u %n", name, job, sex, &age, &n) == 4 &&
buf[n] == '\0')
Good_SRecord();
else if (sscanf(buf, "C:%[^,],%[^,],%u %n", car, type, &cost, &n) == 3 &&
buf[n] == '\0')
Good_CRecord();
else Garbage();
The " %n" trick is really good at making sure all the data parsed as expected without extra junk.
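As a rough sketch of how the pieces above might fit into one function: the struct names person_rec and car_rec and the function parse_line are hypothetical, and the 100-byte fields with %99 widths are an assumption chosen to match the buf[100] used above.

```c
#include <stdio.h>

/* Hypothetical record types for this sketch. */
typedef struct { char name[100]; char job[100]; char sex[100]; unsigned age; } person_rec;
typedef struct { char model[100]; char type[100]; unsigned price; } car_rec;

/* Returns 'S' or 'C' when the whole line parses cleanly, 0 otherwise. */
int parse_line(const char *buf, person_rec *p, car_rec *c) {
    int n = 0;
    /* " %n" skips trailing white-space (the '\n') and records the offset. */
    if (sscanf(buf, "S:%99[^,],%99[^,],%99[^,],%u %n",
               p->name, p->job, p->sex, &p->age, &n) == 4 && buf[n] == '\0')
        return 'S';
    n = 0;
    if (sscanf(buf, "C:%99[^,],%99[^,],%u %n",
               c->model, c->type, &c->price, &n) == 3 && buf[n] == '\0')
        return 'C';
    return 0; /* garbage, or extra junk after the last field */
}
```

The caller would read each line with fgets() and then call people() or vehicle() depending on the return value.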

Here is the rough idea: parse the string in 2 steps. The format
"%c:%[^\n]"
will get you the first character and the rest of the line (%s would stop at the first space, as in "Ford Focus"). Then, based on what you read in the first character, you can continue parsing the remaining part as
"%[^,],%[^,],%[^,],%d%c"
or
"%[^,],%[^,],%d%c"

Related

How to write to file user’s input and read from file

I’m a beginner at programming with C. I’m here because I need help from professional programmers. So, I have to do a simple program called “Lessons”. In this program, I need to use structure, an array of structure and use files to save all records.
For now, I have a struct
typedef struct lessons {
    char LessonName[30];
    char TeacherName[20];
    char TeacherLastName[20];
    int numberOfStudents;
} lessons;
And an array
lessons info[10];
As far as I understand, I have an array called “info” that can hold information for 10 lessons. Right?
And now I face my biggest problem. How should I “play” with all records?
Should I create a txt file and fill it with some information, or should I add the lessons' information from code?
Can anybody explain, give some examples of how should I put the new record to a static array when the user enters all information about the lesson?
And also, how do I scan all records from a (txt or bin) file and show them on the console?
To make work easier I can give examples of records:
Physical education Harry Pleter 32
History Emily Shelton 12
If all string fields of the struct are a single word, you can do it like this:
for (int i = 0; i < sizeof info / sizeof info[0]; i++)
{
scanf("%29s %19s %19s %d", info[i].LessonName, info[i].TeacherName, info[i].TeacherLastName, &info[i].numberOfStudents);
}
This will work perfectly with the string "History Emily Shelton 12", but it will fail with the string "Physical education Harry Pleter 32", because Physical education is 2 words.
Reading more than one word per field
Reading more than 1 word per field can be done very similarly, but you will have to decide which character separates the fields. In this example, I used a comma (,):
for (int i = 0; i < sizeof info / sizeof info[0]; i++)
{
scanf(" %29[^,], %19[^,], %19[^,], %d", info[i].LessonName, info[i].TeacherName, info[i].TeacherLastName, &info[i].numberOfStudents);
}
This will correctly parse the string: "Physical education, Harry, Pleter, 32". The commas are needed because the program needs to know when to stop reading a field and continue with the next one.
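A minimal sketch of the comma-separated idea, using sscanf on a string so it is easy to try out. The function name parse_lesson is hypothetical; the struct is the one from the question. Note that the commas appear literally in the format, and the spaces before each field skip optional white-space.

```c
#include <stdio.h>

typedef struct lessons {
    char LessonName[30];
    char TeacherName[20];
    char TeacherLastName[20];
    int numberOfStudents;
} lessons;

/* Returns 1 if all four comma-separated fields were parsed, 0 otherwise. */
int parse_lesson(const char *text, lessons *out) {
    return sscanf(text, " %29[^,], %19[^,], %19[^,], %d",
                  out->LessonName, out->TeacherName,
                  out->TeacherLastName, &out->numberOfStudents) == 4;
}
```

Checking the return value against 4 is what tells the caller that a space-separated line such as "History Emily Shelton 12" did not match the comma format.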
How should I “play” with all records?
How about 1 line of the file == 1 record?
Code needs 2 functions:
void lessons_write(FILE *destination, const lessons *source);
// return 1: success, EOF: end-of-file, 0: fail
int lessons_read(FILE *source, lessons *destination);
A key consideration: Input may not be formatted correctly and so calling code needs to know when something failed.
Consider comma-separated values (CSV).
void lessons_write(FILE *destination, const lessons *source) {
// Maybe add tests here to detect if the record is sane.
fprintf(destination, "%s, %s, %s, %d\n",
source->LessonName,
source->TeacherName,
source->TeacherLastName,
source->numberOfStudents);
}
int lessons_read(FILE *source, lessons *destination) {
// Use a buffer large enough for any reasonable input.
char buf[sizeof *destination * 2];
// Use fgets to read **1 line**
if (fgets(buf, sizeof buf, source) == NULL) {
return EOF;
}
// Use %n to detect end of scanning. It records the offset of the scan.
int n = 0;
sscanf(buf, " %29[^,], %19[^,], %19[^,], %d %n",
destination->LessonName,
destination->TeacherName,
destination->TeacherLastName,
&destination->numberOfStudents, &n);
// If scan did not complete or some junk at the end ...
if (n == 0 || buf[n] != '\0') {
return 0;
}
// Maybe add tests here to detect if the record is sane.
return 1;
}
Now armed with basics, consider more advanced issues:
Can a course name or teacher's name contain a ',' or '\n', as that will foul up the reading? Perhaps form a bool Lessons_Valid(const lessons *source) test?
Can a course name or teacher's name begin, contain or end with spaces? Or control characters?
What range is valid for # of students? Perhaps 0-9999.
How about long names?
A key to good record handling is the ability to detect garbage input.
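One possible shape for the Lessons_Valid() test suggested above, as a sketch only. The specific rules (no commas, no control characters, no leading or trailing spaces, 0-9999 students) are assumptions taken from the questions in the list, and field_ok is a hypothetical helper.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

typedef struct lessons {
    char LessonName[30];
    char TeacherName[20];
    char TeacherLastName[20];
    int numberOfStudents;
} lessons;

/* Hypothetical helper: accepts a non-empty, NUL-terminated field with no ','
   or control characters (which covers '\n') and no leading/trailing space. */
bool field_ok(const char *s) {
    size_t len = strlen(s);
    if (len == 0) return false;
    if (isspace((unsigned char)s[0]) || isspace((unsigned char)s[len - 1]))
        return false;
    for (size_t i = 0; i < len; i++) {
        unsigned char ch = (unsigned char)s[i];
        if (ch == ',' || iscntrl(ch)) return false;
    }
    return true;
}

/* Assumed policy: every text field passes field_ok, students in 0..9999. */
bool Lessons_Valid(const lessons *source) {
    return field_ok(source->LessonName) &&
           field_ok(source->TeacherName) &&
           field_ok(source->TeacherLastName) &&
           source->numberOfStudents >= 0 &&
           source->numberOfStudents <= 9999;
}
```

Calling this before writing and after reading keeps records that cannot survive a round trip out of the file.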
give some examples of how should I put the new record to a static array when the user enters all information about the lesson?
How about some pseudo code to not take all the learning experience away?
open file to read, successful?
Set N as 10
lessons abc[N]
for up to N times
read record and return success
Was that the last (EOF)?
Was it a bad record (Error message)
else save it and increment read_count
close input
for read_count
print the record to stdout
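For readers who want to compare notes after trying it themselves, here is one way the read loop of the pseudo code could look. lessons_read follows the contract given earlier (1 success, 0 bad record, EOF at end of file); lessons_load is a hypothetical name, and skipping a bad record instead of stopping is an assumption of this sketch.

```c
#include <stdio.h>

typedef struct lessons {
    char LessonName[30];
    char TeacherName[20];
    char TeacherLastName[20];
    int numberOfStudents;
} lessons;

/* return 1: success, EOF: end-of-file, 0: fail */
int lessons_read(FILE *source, lessons *destination) {
    char buf[sizeof *destination * 2];
    if (fgets(buf, sizeof buf, source) == NULL) return EOF;
    int n = 0;
    sscanf(buf, " %29[^,], %19[^,], %19[^,], %d %n",
           destination->LessonName, destination->TeacherName,
           destination->TeacherLastName, &destination->numberOfStudents, &n);
    return (n == 0 || buf[n] != '\0') ? 0 : 1;
}

/* Reads at most max good records into arr; returns how many were saved. */
size_t lessons_load(FILE *in, lessons *arr, size_t max) {
    size_t read_count = 0;
    while (read_count < max) {
        int rc = lessons_read(in, &arr[read_count]);
        if (rc == EOF) break;                    /* that was the last one */
        if (rc == 0) {                           /* bad record */
            fprintf(stderr, "bad record skipped\n");
            continue;
        }
        read_count++;                            /* save it */
    }
    return read_count;
}
```

Printing is then a plain loop over the first read_count elements of the array.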

Check if user input into an array is too long?

I am getting the user to input 4 numbers. They can be input: 1 2 3 4 or 1234 or 1 2 34 , etc. I am currently using
int array[4];
scanf("%1x%1x%1x%1x", &array[0], &array[1], &array[2], &array[3]);
However, I want to display an error if the user inputs too many numbers: 12345 or 1 2 3 4 5 or 1 2 345 , etc.
How can I do this?
I am very new to C, so please explain as much as possible.
//
Thanks for your help.
What I have now tried to do is:
char line[101];
printf("Please input");
fgets(line, 101, stdin);
if (strlen(line)>5)
{
printf("Input is too large");
}
else
{
array[0]=line[0]-'0'; array[1]=line[1]-'0'; array[2]=line[2]-'0'; array[3]=line[3]-'0';
printf("%d%d%d%d", array[0], array[1], array[2], array[3]);
}
Is this a sensible and acceptable way? It compiles and appears to work in Visual Studio. Will it compile and run as standard C?
OP is on the right track, but the code needs adjusting to deal with errors.
The current approach, using scanf(), can detect problems but cannot recover from them well. Instead, use a fgets()/sscanf() combination.
char line[101];
if (fgets(line, sizeof line, stdin) == NULL) HandleEOForIOError();
unsigned arr[4];
char ch; /* %c stores into a char, not an int */
int cnt = sscanf(line, "%1x%1x%1x%1x %c", &arr[0], &arr[1], &arr[2],&arr[3],&ch);
if (cnt == 4) JustRight();
if (cnt < 4) Handle_TooFew();
if (cnt > 4) Handle_TooMany(); // cnt == 5
ch catches any lurking non-whitespace char after the 4 numbers.
Use %1u if looking for 1 decimal digit into an unsigned.
Use %1d if looking for 1 decimal digit into an int.
OP's second approach, array[0]=line[0]-'0'; ..., is not bad, but it has some shortcomings. It does not perform good error checking (non-numeric input), nor does it handle hexadecimal digits as the first approach does. Further, it does not allow for leading or interspersed spaces.
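The counting logic from the fgets()/sscanf() combination can be wrapped in a small function. This is only a sketch: parse_four is a hypothetical name, and the -1/0/+1 return convention is an assumption.

```c
#include <stdio.h>

/* Returns 0 for exactly four single hex digits, -1 for too few,
   +1 when something other than white-space follows the fourth digit. */
int parse_four(const char *line, unsigned arr[4]) {
    char extra; /* catches any lurking non-white-space character */
    int cnt = sscanf(line, "%1x%1x%1x%1x %c",
                     &arr[0], &arr[1], &arr[2], &arr[3], &extra);
    if (cnt == 4) return 0;
    return cnt < 4 ? -1 : 1; /* cnt may also be EOF on an empty line */
}
```

Because each %1x skips leading white-space, "1234", "1 2 3 4" and "1 2 34" are all accepted, while "12345" and "1 2 3 4 5" report extra input.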
Your question might be operating system specific. I am assuming it could be Linux.
You could first read an entire line with getline(3) (or readline(3), or even fgets(3) if you accept an upper limit on the input line size), then parse that line (e.g. with sscanf(3), using the %n format specifier). Don't forget to test the result of sscanf (the number of items read).
So perhaps something like
int a=0,b=0,c=0,d=0;
char* line=NULL;
size_t linesize=0;
int lastpos= -1;
ssize_t linelen=getline(&line,&linesize,stdin);
if (linelen<0) { perror("getline"); exit(EXIT_FAILURE); }
int nbscanned=sscanf(line," %1d%1d%1d%1d %n", &a,&b,&c,&d,&lastpos);
if (nbscanned>=4 && lastpos==linelen) {
// be happy
do_something_with(a,b,c,d);
}
else {
// be unhappy
fprintf(stderr, "wrong input line %s\n", line);
exit(EXIT_FAILURE);
}
free(line); line=NULL;
And once you have the entire line, you could parse it by other means like successive calls of strtol(3).
Then, the issue is what happens if the stdin has more than one line. I cannot guess what you want in that case. Maybe feof(3) is relevant.
I believe that my solution might not be Linux specific, but I don't know. It probably should work on Posix 2008 compliant operating systems.
Be careful about the result of sscanf when having a %n conversion specification. The man page tells that standards might be contradictory on that corner case.
If your operating system is not Posix compliant (e.g. Windows), then you should find another way. If you accept limiting the line size to, e.g., 128, you might code
char line[128];
memset (line, 0, sizeof(line));
fgets(line, sizeof(line), stdin);
int linelen = (int)strlen(line); /* ssize_t is Posix-only; lastpos is an int anyway */
then append the sscanf and following code from the previous (i.e. first) code chunk (but without the last line calling free(line)).
What you are trying to get is 4 digits with or without spaces between them. For that, you can take a string as input and then check that string character by character and count the number of digits(and spaces and other characters) in the string and perform the desired action/ display the required message.
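A small sketch of that character-by-character check, assuming decimal digits only and that the line has already been read with fgets(). The function name read_four_digits and its return convention are hypothetical.

```c
#include <ctype.h>

/* Walks the line once. Returns the number of digits stored (0..4),
   or -1 if a fifth digit or any non-digit, non-space character appears. */
int read_four_digits(const char *line, int out[4]) {
    int count = 0;
    for (const char *p = line; *p != '\0'; p++) {
        unsigned char ch = (unsigned char)*p;
        if (isspace(ch)) continue;          /* spaces and the '\n' are fine */
        if (!isdigit(ch) || count == 4) return -1;
        out[count++] = ch - '0';
    }
    return count;
}
```

The caller can then treat 4 as success, a smaller count as too few digits, and -1 as too many or invalid input.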
You can't do that with scanf. The problem is that there are ways to make scanf look for something after the 4 numbers, but all of them will just sit there and wait for more input if the user does NOT enter more. So you'd need to read the whole line with fgets() (not gets(), which cannot limit the input length and was removed from the language in C11) and parse the string yourself.
It would probably be easier for you to change your program, so that you ask for one number at a time - then you ask 4 times, and you're done with it, so something along these lines, in pseudo code:
i = 0
while i < 4
ask for number
scanf number and save in array at index i
increment i
E.g
#include <stdio.h>
int main(void){
    unsigned array[4];//%x stores into an unsigned
    int ch;
    size_t i, size = sizeof(array)/sizeof(*array);//4
    i = 0;
    while(i < size){
        int rc = scanf("%1x", &array[i]);
        if(rc == EOF){//input ended before 4 digits arrived
            printf("Too few digits !\n");
            return 1;
        }
        if(rc != 1){
            //printf("invalid input");
            scanf("%*[^0123456789abcdefABCDEF]");//or "%*[^0-9A-Fa-f]"
        } else {
            ++i;
        }
    }
    if('\n' != (ch = getchar()) && ch != EOF){
        printf("Extra input !\n");
        scanf("%*[^\n]");//remove extra input
    }
    for(i=0;i<size;++i){
        printf("%x", array[i]);
    }
    printf("\n");
    return 0;
}

A way to fscanf on only the first line

I've been looking for a way to obtain 2 integers, separated by a space, that are located in the first line of a file that I read. I considered using
fscanf(file, "%d %d\n", &wide, &high);
But that read 2 integers that were anywhere in the file, and would give the wrong output if the first line was in the wrong format. I also tried using
char line[1001];
fgets(line, 1000, file);
Which seems like the best bet, except for how clumsy it is. It leaves me with a string that has up to a few hundred blank spaces, from which I must extract my precious integers, never mind checking for errors in formatting.
Surely there is a better option than this? I'll accept any solution, but the most robust solution seems (to me) to be a fscanf on the first line only. Any way to do that?
You can capture the character immediately following the second number in a char, and check that the captured character is '\n', like this:
int wide, high;
char c;
if (fscanf(file, "%d%d%c", &wide, &high, &c) != 3 || c != '\n') {
printf("Incorrect file format: expected two ints followed by a newline.");
}
Which seems like the best bet, except for how clumsy it is.
Nah, it's not clumsy at all (except that you are using the size argument of fgets() in the wrong way...). It's perfectly fine & idiomatic. strtol() does its job pretty well:
char line[LINE_MAX];
fgets(line, sizeof line, file);
char *endp;
int width = strtol(line, &endp, 10);
int height = strtol(endp, NULL, 10);
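If formatting errors matter, the strtol() calls can be checked along the way. The following is a sketch under a few assumptions: the line was already read with fgets(), the two numbers must be separated by white-space, and nothing but white-space may follow. parse_two_ints is a hypothetical name.

```c
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Returns 1 and stores both values if line holds exactly two ints. */
int parse_two_ints(const char *line, int *wide, int *high) {
    char *end;

    errno = 0;
    long a = strtol(line, &end, 10);
    if (end == line || errno == ERANGE || a < INT_MIN || a > INT_MAX)
        return 0;                       /* no digits, or out of range */
    if (!isspace((unsigned char)*end))
        return 0;                       /* need a separator after the first */

    const char *second = end;
    errno = 0;
    long b = strtol(second, &end, 10);
    if (end == second || errno == ERANGE || b < INT_MIN || b > INT_MAX)
        return 0;

    while (isspace((unsigned char)*end)) end++;
    if (*end != '\0') return 0;         /* junk after the second number */

    *wide = (int)a;
    *high = (int)b;
    return 1;
}
```

strtol() reports "no conversion" by leaving the end pointer equal to the start pointer, which is what the end == line and end == second checks rely on.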

using fscanf to read in items

There is a file a.txt that looks like this:
1 abc
2
3 jkl
I want to read each line of this file as an int and a string, like this:
fp = fopen("a.txt", "r");
while (1) {
int num;
char str[10];
int ret =fscanf(fp, "%d%s", &num, str);
if (EOF == ret)
break;
else if (2 != ret)
continue;
// do something with num and str
}
But there is a problem: if a line in a.txt contains just a number and no string (like line 2), then the above code gets stuck on that line.
So is there any way to jump to the next line?
Do it in two steps:
Read a full line using fgets(). This assumes you can put a simple static limit on how long lines you want to support, but that's very often the case.
Use sscanf() to inspect and parse the line.
This works around the problem you ran into, and is generally the better approach for problems like these.
UPDATE: Trying to respond to your comment. The way I see it, the easiest approach is to always read the full line (as above), then parse out the two fields you consider it to consist of: the number and the string.
If you do this with sscanf(), you can use the return value to figure out how it went, like you tried with fscanf() in your code:
const int num_fields = sscanf(line, "%d %9s", &num, str); /* %9s: str holds 10 chars */
if( num_fields == 1 )
; /* we got only the number */
else if( num_fields == 2 )
; /* we got both the number and the string */
else
; /* failed to get either */
Not sure when you wouldn't "need" the string; it's either there or it isn't.
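Put together as a function, the second step might look like the sketch below. parse_num_str is a hypothetical name; the 10-byte str matches the declaration in the question, and folding EOF into 0 is a choice made for this example.

```c
#include <stdio.h>

/* Parses one line that was read with fgets().
   Returns 2 (number and string), 1 (number only) or 0 (neither). */
int parse_num_str(const char *line, int *num, char str[10]) {
    int fields = sscanf(line, "%d %9s", num, str);
    if (fields < 1) return 0;        /* 0, or EOF (-1) on an empty line */
    if (fields == 1) str[0] = '\0';  /* make "no string" easy to test for */
    return fields;
}
```

Since every call works on a complete line, a line such as "2" can no longer pull the string from the following line into str.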
If the first character of the string is \r or \n, the string will be empty, and you can test for that with a comparison. fscanf() is not suitable if the input contains words with spaces in them (or empty lines). In that case it is better to use fgets().
How to solve "using fscanf":
After the int, look for spaces (don't save), then look for char that are not '\n'.
int num;
char str[10];
#define ScanInt "%d"
#define ScanSpacesDontSave "%*[ ]"
#define ScanUntilEOL "%9[^\n]"
int ret = fscanf(fp, ScanInt ScanSpacesDontSave ScanUntilEOL, &num, str);
if (EOF == ret) break;
else if (1 == ret) HandleNum(num);
else if (2 == ret) HandleNumStr(num, str);
else HandleMissingNum();
ret will be 2 if something was scanned into str, else ret will typically be 1. The trick to the above is not to scan in a '\n' after the int. Thus the code cannot use "%s" or " " after the "%d", since both consume all (leading) white-space, newlines included. The next fscanf() call will consume the '\n' as part of leading white-space consumption via "%d".
Minor note: reading the line with fgets() then parsing the buffer is usually a better approach, but coding goals may preclude that.

fscanf() error on this text file?

test.txt:
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
ccccccccccccccccccccccccccccccccccc
The color of the car is blue
code:
FILE *fp;
char var[30];
int result;
if( (fp = fopen( "test.txt", "r")) == NULL) //open for read
printf("ERROR");
result = fscanf( fp, "The color of the car is %s", &var);
After this executes:
opens file (not NULL and was able to execute an append when testing)
result = 0 //zero- in the case of a matching failure....?
errno = 0;
and var is garbage.
I was expecting fscanf() to match "blue".
How should I correctly get blue into var?
Thank You.
How about using fgets instead:
char line[100];
const char *search = "The color of the car is ";
char *p = NULL;
while (fgets(line, sizeof line, fp)) {
    /* Does it have what we want? */
    if ((p = strstr(line, search)) != NULL)
        break;
}
if (p)
    p += strlen(search); /* p now points at the color, e.g. "blue\n" */
fscanf doesn't work like this. It doesn't look around or scan the input for stuff to match. You have to supply input exactly matching the format string, starting at the current file position. So you could do stuff like
result = fscanf(fp, "aaaaaaaaaaaaa\nbbbbbbbbbbb\ncccccccccc\nThe color of the car is %s", var);
to achieve what you want.
You have a bug where you are passing a pointer to an array of char into fscanf, when you should be passing a pointer to char. It so happens that the address of an array equals the address of its first element, so it works by accident. But the wrong type is being passed in.
In addition, you want to find the line which provides you with the match, and you seem to want to store "blue" into var. To do this, you should test to see if a line of input matches your scan pattern. You can use fgets to read a line, and use sscanf to scan the line.
while (fgets(buf, sizeof(buf), fp) != 0) {
result = sscanf(buf, "The color of the car is %s", var);
if (result == 1) break;
}
sscanf will return how many directives it successfully scanned. In this case, there is the one %s directive.
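Doing it this way prevents a scanning error from "jamming" your input parsing. That is, if fscanf stops on a matching failure, the offending input stays in the stream, and every later call fails on it again until you consume it yourself (clearerr() only resets the error and end-of-file indicators; it does not skip input).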
You have another bug here in that if test.txt doesn't exist, your code will go ahead and use fscanf anyway, making your check useless. This would almost certainly cause the program to crash.
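A self-contained sketch of the fgets/sscanf loop, written as a function so the caller can deal with the fopen() failure first. find_color is a hypothetical name, the 256-byte line buffer is an assumption, and %29s matches the 30-byte var from the question.

```c
#include <stdio.h>

/* Reads 'in' line by line until one line matches the pattern.
   Returns 1 and fills color on success, 0 if no line matched. */
int find_color(FILE *in, char color[30]) {
    char buf[256];
    while (fgets(buf, sizeof buf, in) != NULL) {
        if (sscanf(buf, "The color of the car is %29s", color) == 1)
            return 1;
    }
    return 0;
}
```

The caller would only invoke this after checking that fopen() did not return NULL, and would stop with an error message otherwise.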
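Use scansets (scanf's %[...] conversions, which look a little like regular expressions) to skip the first three lines:
char p[100];
char q[100];
char r[100];
fscanf(fp, "%99[^\n]\n%99[^\n]\n%99[^\n]\nThe color of the car is %29s", p, q, r, var);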
Modify it according to your requirement. I know this is not your actual string.

Resources