C copy substring from text file - c

Say I have the following text file -
name:asdfg
address:zcvxz
,
name:qwerwer
address:zxcvzxcvxz
,
And I wanna copy the name (without "name:") to a certain string variable, the address to another and so on.
How do I do so without corrupting memory?
Tried using (example) -
char buf[50];
while (fgets(buf, 50, file) != NULL) {
if (!strncmp(buf, "name", 4))
strncpy(somestring, buf + 5, 20);
//do the same for address, continue looping
but the text lines differ in length, so it seems to copy all sorts of crap from the buffer; the strings aren't null-terminated, so it copies "asdfgcrapcrapcrap".

You are to be commended for using fgets to handle your file I/O as it provides a much more flexible and robust way to read, validate and prepare to parse the lines of data you read. It is generally the recommended way to do line-oriented input (either from a file or from the user). However, this is one of those circumstances where treating multiple records as formatted input does have some advantages.
Let's start with an example that reads your data file and captures the name:... and address:... values in a simple struct holding a 20-char array for each. Each line is read, the length is validated, the trailing '\n' is removed, and then strchr is used to locate the ':' in the line (we don't care about lines without ':'). The label before ':' is copied to tmp and then compared against "name" or "address" to determine which value to store. Once the address data is read, both name and addr values are printed to stdout:
#include <stdio.h>
#include <string.h>
enum { MAXC = 20, MAXS = 256 };
typedef struct {
char name[MAXC],
addr[MAXC];
} data;
int main (int argc, char **argv) {
char buf[MAXS] = "",
*name = "name", /* name/address literals for comparison */
*addr = "address";
data mydata = { .name = "" };
FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;
if (!fp) { /* validate file open for reading */
fprintf (stderr, "error: file open failed '%s'.\n", argv[1]);
return 1;
}
while (fgets (buf, MAXS, fp)) { /* read each line */
char *p = buf, /* pointer to use with strchr */
tmp[MAXC] = ""; /* storage for labels */
size_t len = strlen (buf); /* get buf len */
if (len && buf[len - 1] == '\n') /* validate last char is '\n' */
buf[--len] = 0; /* overwrite with nul-character */
else if (len + 1 == MAXS) { /* handle string too long */
fprintf (stderr, "error: line too long or no '\\n'\n");
return 1;
}
if ((p = strchr (buf, ':'))) { /* find ':' in buf */
size_t labellen = p - buf, /* get length of label */
datalen = strlen (p + 1); /* get length of data */
if (labellen + 1 > MAXC) { /* validate both lengths */
fprintf (stderr, "error: label exceeds '%d' chars.\n", MAXC);
return 1;
}
if (datalen + 1 > MAXC) {
fprintf (stderr, "error: data exceeds '%d' chars.\n", MAXC);
return 1;
}
strncpy (tmp, buf, labellen); /* copy label to temp */
tmp[labellen] = 0; /* nul-terminate */
if (strcmp (name, tmp) == 0) /* is the label "name" ? */
strcpy (mydata.name, p + 1);
else if (strcmp (addr, tmp) == 0) { /* is the label "address" ? */
strcpy (mydata.addr, p + 1);
/* record complete -- output results */
printf ("\nname : %s\naddr : %s\n", mydata.name, mydata.addr);
}
}
}
if (fp != stdin) fclose (fp); /* close file if not stdin */
return 0;
}
(note: there are many ways to structure this logic. The example above just represents a semi-standard method)
Example Use/Output
$./bin/nameaddr <dat/nameaddr.txt
name : asdfg
addr : zcvxz
name : qwerwer
addr : zxcvzxcvxz
Here is where I will have a tough time convincing you that fgets was the way to go for this problem. Why? Here we are essentially reading formatted input that is comprised of 3 lines of data. The format string for fscanf doesn't care how many lines are involved, and can easily be constructed to skip '\n' within the formatted input. This can provide an attractive (though more fragile) alternative for the right input files.
For example, the code above can be reduced to the following using fscanf for a formatted read:
#include <stdio.h>
#define MAXC 20
typedef struct {
char name[MAXC],
addr[MAXC];
} data;
int main (int argc, char **argv) {
data mydata = { .name = "" };
FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;
if (!fp) { /* validate file open for reading */
fprintf (stderr, "error: file open failed '%s'.\n", argv[1]);
return 1;
}
/* read 3-lines at a time separating name and address at once */
while (fscanf (fp, " name:%19s address:%19s ,",
mydata.name, mydata.addr) == 2)
printf ("\nname : %s\naddr : %s\n", mydata.name, mydata.addr);
if (fp != stdin) fclose (fp); /* close file if not stdin */
return 0;
}
(the output is the same)
In the rare case, for the correct data file, fscanf can provide a viable alternative to a line-oriented read with fgets. However, your first choice should remain a line-oriented approach using either fgets or POSIX getline.
Look both over and let me know if you have further questions.
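To see how that format string consumes a whole three-line record, here is a minimal sketch that applies the same conversion to an in-memory string with sscanf. The helper name parse_record and the FIELD constant are assumptions made for illustration; they are not part of the code above.

```c
#include <stdio.h>

#define FIELD 20   /* assumed field size, matching MAXC above */

/* Parse one "name:... / address:... / ," record from text.
 * Returns 1 if both fields were converted, 0 otherwise. */
int parse_record (const char *text, char name[FIELD], char addr[FIELD])
{
    /* whitespace in the format (including the leading space) matches any
     * run of whitespace, newlines included; %19s leaves room for the nul */
    return sscanf (text, " name:%19s address:%19s ,", name, addr) == 2;
}
```

Calling parse_record ("name:asdfg\naddress:zcvxz\n,\n", name, addr) fills both arrays. The fragility shows as soon as a value contains a space, because %s stops at the first whitespace character.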

If the name is 20 characters or longer, strncpy() won't copy the null terminator to the destination string, so you need to add it yourself.
strncpy(somestring, buf + 5, 19);
somestring[19] = '\0';
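The same idea can be wrapped in a small helper so the size and the terminator are handled in one place. This is only a sketch: copy_value is a hypothetical name, and it additionally drops the trailing '\n' that fgets leaves in the buffer, which the two lines above do not do.

```c
#include <string.h>

/* Copy src into dst (capacity dstsize), dropping a trailing newline and
 * always nul-terminating. Returns the number of characters stored. */
size_t copy_value (char *dst, size_t dstsize, const char *src)
{
    size_t len;

    if (dstsize == 0)
        return 0;
    len = strcspn (src, "\n");      /* length up to '\n' or end of string */
    if (len >= dstsize)
        len = dstsize - 1;          /* truncate to fit, leave room for nul */
    memcpy (dst, src, len);
    dst[len] = '\0';
    return len;
}
```

Assuming somestring is declared as an array (so that sizeof gives its capacity), the call in the loop would be copy_value (somestring, sizeof somestring, buf + 5).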

Related

Reading Part of a File in C

I have a file that contain few different sections. All sections have a start section and end section lines to distinguish between sections.
How can I read lines from section-2?
>start Section-1
Some words are here.
>end Section-1
>start Section-2
Other words are also here.
>end Section-2
With my current code, the whole file is printed (all sections except the words separating sections). I understand the issue is that in my fgets I'm reading the file until >end Section-2 and I probably need another while loop to read lines from a specific start section. But I'm not sure how I can change the code so it will only output words inside Section-2.
Expected output:
Other
words
are
also
here.
What I get now:
Some
words
are
here.
Other
words
are
also
here.
My code:
#define MAXSTR 1000
#define END ">end Section-2\n"
#define ENDWORD ">end"
#define STRWORD ">start"
#define SECTION "Section-2"
int main () {
FILE *file;
char lines[MAXSTR];
char delim[2] = " ";
char *words;
if ((file = fopen("sample.txt", "r")) == NULL) {
printf("File empty.\n");
return 0;
}
while (strcmp(fgets(lines, MAXSTR, file), END) != 0) {
words = strtok(lines, delim);
while (words != NULL && strcmp(words, STRWORD) != 0
&& strcmp(words, SECTION) != 0
&& strcmp(words, ENDWORD) != 0) {
printf("%s\n", words);
words = strtok(NULL, delim);
}
}
fclose(file);
return 0;
}
You are thinking along the correct lines. The key is to set a flag when you find the first "Section-X" to read and then while that flag is set, tokenize each line until the closing "Section-X" is found, at which time you exit your read-loop.
You can check for "Section-X" however you like, using the entire line, or just the "Section-X" identifier (which I chose below). To locate the "Section-X" text, just use strrchr() to find the last space in each line, and compare from the next character to the end of line for your section, e.g.
#include <stdio.h>
#include <string.h>
#define MAXC 1024
int main (int argc, char **argv) {
if (argc < 2) { /* validate 1 arg given for filename */
fprintf (stderr, "usage: %s file [\"Section-X\" (default: 2)]\n", argv[0]);
return 1;
}
const char *section = argc > 2 ? argv[2] : "Section-2", /* set section */
*delim = " ";
char line[MAXC];
int found = 0; /* found flag, 0-false, 1-true */
FILE *fp = fopen (argv[1], "r"); /* open file */
if (!fp) { /* validate file open for reading */
perror ("fopen-fp");
return 1;
}
while (fgets (line, MAXC, fp)) { /* read each line */
line[strcspn (line, "\n")] = 0; /* trim \n from end */
char *p = strrchr(line, ' '); /* pointer to last space */
if (p && strcmp (p + 1, section) == 0) { /* compare "Section-X" */
if (found++) /* check/set found flag */
break; /* break loop if 2nd "Section-X" */
continue;
}
if (found) { /* if found set, tokenize each line */
for (p = strtok (line, delim); p; p = strtok (NULL, delim))
puts (p);
}
}
fclose (fp); /* close file */
return 0;
}
Example Use/Output
With your input stored in the file dat/sections.txt and reading default "Section-2":
$ ./bin/read_sections dat/sections.txt
Other
words
are
also
here.
Reading "Section-1":
$ ./bin/read_sections dat/sections.txt "Section-1"
Some
words
are
here.
Look things over and let me know if you have questions.
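If you would rather match the entire marker line than just the trailing "Section-X" identifier, one possible sketch is below. The function name is_marker and the 128-byte scratch buffer are assumptions for illustration; the line is expected to have had its '\n' trimmed already, as in the loop above.

```c
#include <stdio.h>
#include <string.h>

/* Return 1 if line is exactly "<keyword> <section>" (e.g. ">start Section-2"),
 * 0 otherwise. line must already have its trailing '\n' removed. */
int is_marker (const char *line, const char *keyword, const char *section)
{
    char expect[128];   /* assumed to be large enough for any marker line */
    int n = snprintf (expect, sizeof expect, "%s %s", keyword, section);

    if (n < 0 || (size_t)n >= sizeof expect)    /* encoding error or truncated */
        return 0;
    return strcmp (line, expect) == 0;
}
```

With this, the flag could be set on is_marker (line, ">start", section) and the loop left on is_marker (line, ">end", section), which also avoids treating an ordinary line that happens to end in "Section-2" as a marker.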

Parsing INI file in C - how to store sections and its' keys and values?

I'm trying to parse .ini file using only STANDARD Libraries in C.
Input files look like:
[section1]
key1 = value1
key2 = value2
[section2]
key3 = vaule3
key4 = value4
key5 = value5
...
im running that with ./file inputfile.ini section2.key3 and i want to get value of key3 from section2
MY QUESTION is: How to easily store keys and values? - im a total beginner, so I need something simple and easy to implement - maybe struct but how to store all keys and values inside struct if i don't know quantity of keys?
I got stuck here, two strings section and current_section looks equally but in if(section == current_section) they don't pass True, what is the problem?
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char *argv[]) {
FILE * fPointer;
fPointer = fopen(argv[1], "r"); // read from file
char singleLine[30];
char section[30];
char key[30];
int right_section = 0;
char current_section[30];
sscanf(argv[2], "%[a-zA-Z0-9].%[a-zA-Z0-9]", section, key); //split of section.key
while(!feof(fPointer)){
fgets(singleLine, 30, fPointer); //read line by line
printf("%s", singleLine);
char current_key[30];
char current_value[30];
if(singleLine[0]=='['){
sscanf(singleLine, "[%127[^]]", current_section); //strip from []
printf("current section:%s%s", current_section, section); //both look equally
if(current_section == section){ // doesn't work here, current_section == section looks the same but if doesnt work
right_section = 1;
printf("yes, right");
}
}
}
fclose(fPointer);
return 0;
}
You are working down the correct path, but there are a few things you must approach differently if you want to ensure things work correctly. If you take nothing else from this answer, learn that you cannot use any input or parsing function without checking the return (that applies to virtually every function you use, unless the operation of the code that follows does not depend on the result, like just printing values). Also, you should never use while (!feof(fPointer)) to control a read loop, e.g. see: Why is while ( !feof (file) ) always wrong?
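As a small illustration of letting the read function control the loop, here is a sketch that counts lines. The name count_lines and the 256-byte buffer are assumptions. With a while (!feof(fp)) loop and an unchecked fgets, the body runs one extra time after the last line has been read, so the last line is typically processed twice.

```c
#include <stdio.h>

/* Count the lines in fp, letting the return of fgets control the loop.
 * A final line with no trailing '\n' is still counted. */
size_t count_lines (FILE *fp)
{
    char buf[256];
    size_t n = 0;
    int at_line_start = 1;   /* next chunk begins a new line */

    while (fgets (buf, sizeof buf, fp)) {   /* stops at EOF or error */
        size_t i = 0;
        if (at_line_start)
            n++;
        while (buf[i])                      /* find end of this chunk */
            i++;
        at_line_start = (i > 0 && buf[i - 1] == '\n');
    }
    return n;
}
```

The at_line_start flag is there so that a line longer than the buffer, which fgets returns in several chunks, is only counted once.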
Now, how to approach the problem. First, if you need a constant for your array size, then #define a constant or use a global enum. For example, for my sect, inisect, key, inikey and val buffers I would define SPLTC and then for my line buffer, I define MAXC, e.g.
#define SPLTC 128 /* if you need a constant, #define one (or more) */
#define MAXC 256
Depending on whether you need to be -ansi or c89/90 compatible, declare your variables before any operations, e.g.
int main (int argc, char **argv) {
char buf[MAXC], sect[SPLTC], inisect[SPLTC], key[SPLTC], inikey[SPLTC], val[SPLTC];
FILE *fp = NULL;
Then the first thing you will do is validate that sufficient arguments were provided on the command line:
if (argc < 3) { /* validate 2 arguments provided */
fprintf (stderr,
"error: insufficient number of arguments\n"
"usage: %s file.ini section.key\n", argv[0]);
return 1;
}
Next you will open your file and validate that it is open for reading:
/* open/validate file open for reading */
if ((fp = fopen (argv[1], "r")) == NULL) {
fprintf (stderr, "error: file open failed '%s'\n", argv[1]);
perror ("fopen");
return 1;
}
Then split your section.key argument argv[2] into sect and key and validate the separation:
/* split section.key into sect & key */
if (sscanf (argv[2], " %127[^.]. %127s", sect, key) != 2) {
fputs ("error: invalid section.key\n", stderr);
return 1;
}
Now enter your read loop to find your section in the file (you always control the loop with the return of the read function itself):
while (fgets (buf, MAXC, fp)) { /* read each line */
if (buf[0] == '[') { /* is first char '['? */
if (sscanf (buf, " [%127[^]]", inisect) == 1) { /* parse section */
if (strcmp (sect, inisect) == 0) /* does it match 2nd arg? */
break; /* if so break loop */
}
}
}
How do you check that the section was found? You can keep a flag-variable as you have done with right_section, or... think about where you would be in the file if your section wasn't found? You would be at EOF. So now you can correctly check feof(fp), e.g.
if (feof (fp)) { /* if file stream at EOF, section not found */
fprintf (stderr, "error: EOF encountered before section '%s' found.\n",
sect);
return 1;
}
If you haven't exited due to not finding your section (meaning you got to this point in the code), just read each line, validating a separation into inikey and val (if the validation fails, you have read all the key/val pairs in that section without a match). If you find a key match while reading the section, you have succeeded and have your inikey and val. If the separation fails before a match is found, you issue an error, and if you reach EOF without a match, you can again check feof(fp) after the loop, e.g.
while (fgets (buf, MAXC, fp)) { /* continue reading lines */
/* parse key & val from line */
if (sscanf (buf, " %127s = %127s", inikey, val) != 2) { /* if not key & val */
fprintf (stderr, "error: end of section '%s' reached "
"with no matching key found.\n", sect);
return 1;
}
if (strcmp (key, inikey) == 0) { /* does key match? */
printf ("section : %s\n key : %s\n val : %s\n", sect, key, val);
break;
}
}
if (feof (fp)) { /* if file stream at EOF, key not found */
fprintf (stderr, "error: EOF encountered before key '%s' found.\n",
key);
return 1;
}
That's basically it. If you put it all together you have:
#include <stdio.h>
#include <string.h>
#define SPLTC 128 /* if you need a constant, #define one (or more) */
#define MAXC 256
int main (int argc, char **argv) {
char buf[MAXC], sect[SPLTC], inisect[SPLTC], key[SPLTC], inikey[SPLTC], val[SPLTC];
FILE *fp = NULL;
if (argc < 3) { /* validate 2 arguments provided */
fprintf (stderr,
"error: insufficient number of arguments\n"
"usage: %s file.ini section.key\n", argv[0]);
return 1;
}
/* open/validate file open for reading */
if ((fp = fopen (argv[1], "r")) == NULL) {
fprintf (stderr, "error: file open failed '%s'\n", argv[1]);
perror ("fopen");
return 1;
}
/* split section.key into sect & key */
if (sscanf (argv[2], " %127[^.]. %127s", sect, key) != 2) {
fputs ("error: invalid section.key\n", stderr);
return 1;
}
while (fgets (buf, MAXC, fp)) { /* read each line */
if (buf[0] == '[') { /* is first char '['? */
if (sscanf (buf, " [%127[^]]", inisect) == 1) { /* parse section */
if (strcmp (sect, inisect) == 0) /* does it match 2nd arg? */
break; /* if so break loop */
}
}
}
if (feof (fp)) { /* if file stream at EOF, section not found */
fprintf (stderr, "error: EOF encountered before section '%s' found.\n",
sect);
return 1;
}
while (fgets (buf, MAXC, fp)) { /* continue reading lines */
/* parse key & val from line */
if (sscanf (buf, " %127s = %127s", inikey, val) != 2) { /* if not key & val */
fprintf (stderr, "error: end of section '%s' reached "
"with no matching key found.\n", sect);
return 1;
}
if (strcmp (key, inikey) == 0) { /* does key match? */
printf ("section : %s\n key : %s\n val : %s\n", sect, key, val);
break;
}
}
if (feof (fp)) { /* if file stream at EOF, key not found */
fprintf (stderr, "error: EOF encountered before key '%s' found.\n",
key);
return 1;
}
}
Example Use/Output
Finding valid section/key combinations:
$ ./bin/readini dat/test.ini section2.key3
section : section2
key : key3
val : vaule3
$ ./bin/readini dat/test.ini section2.key5
section : section2
key : key5
val : value5
$ ./bin/readini dat/test.ini section1.key2
section : section1
key : key2
val : value2
Attempts to find invalid section/key combinations.
$ ./bin/readini dat/test.ini section1.key3
error: end of section 'section1' reached with no matching key found.
$ ./bin/readini dat/test.ini section2.key8
error: EOF encountered before key 'key8' found.
Look things over and let me know if you have further questions.
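One limitation worth noting: with " %127s = %127s" a line written without spaces, such as key3=value3, is swallowed whole by the first %s and the parse fails. A possible variation, sketched below under the assumption that keys contain no '=' or blanks, uses a scanset instead. The function name parse_key_value is hypothetical.

```c
#include <stdio.h>

#define SPLTC 128   /* same buffer size as above */

/* Split "key = value" or "key=value" into key and val.
 * Returns 1 on success, 0 if the line is not a key/value pair. */
int parse_key_value (const char *line, char key[SPLTC], char val[SPLTC])
{
    /* %127[^= \t\n] stops the key at '=', blank or newline;
     * %127[^\n] takes the rest of the line as the value */
    return sscanf (line, " %127[^= \t\n] = %127[^\n]", key, val) == 2;
}
```

Unlike %127s, the value conversion here keeps embedded spaces (and any trailing ones), so a value like "some text" is stored whole. A section header such as [section2] still fails the parse, so the end-of-section check works the same way.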
As you pointed out, the issue is in the string comparison in your if statement. When a char array (or any array in C) is used in an expression like this, it is converted to a pointer to its first element (this explanation is over-simplified).
So in your if statement you are really comparing the memory addresses of the first elements of the two arrays instead of comparing their contents.
To compare strings correctly there are several options: you could do it manually (iterating over each array and comparing element by element), or you could use a helper from the standard library such as strcmp, which compares up to the nul-terminator. (memcmp over the full array size is not suitable here, because the bytes after the terminator are indeterminate.)
For example you could re-write your if statement as below:
#include <string.h>
if (strcmp (section, current_section) == 0) {
// both arrays hold the same string
}
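A short sketch of why the choice of comparison function matters, assuming both buffers hold nul-terminated strings: strcmp stops at the terminator, while a memcmp over the full array size also compares whatever bytes follow it. The function name differs_only_after_nul and the 'x'/'y' filler are made up for the demonstration.

```c
#include <string.h>

/* Returns 1 if strcmp sees the two buffers as equal strings while a
 * whole-array memcmp does not, i.e. they differ only after the nul. */
int differs_only_after_nul (void)
{
    char a[30], b[30];

    memset (a, 'x', sizeof a);      /* stand-in for leftover bytes */
    memset (b, 'y', sizeof b);
    strcpy (a, "section2");         /* same text in both buffers */
    strcpy (b, "section2");

    return strcmp (a, b) == 0 && memcmp (a, b, sizeof a) != 0;
}
```

In the question's code the arrays are filled by sscanf and never cleared first, so the bytes after the terminator are whatever happened to be in memory.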

Extract numerical values from a string and average them

I have a .txt file that contains data in this format:
xxxx: 0.9467,
yyyy: 0.9489,
zzzz: 0.78973,
hhhh: 0.8874,
yyyy: 0.64351,
xxxx: 0.8743,
and so on...
Let's say that my C program receives, as input, the string yyyy. The program should, simply, return all the instances of yyyy in the .txt file and the average of all their numerical values.
int main() {
FILE *filePTR;
char fileRow[100000];
if (fopen_s(&filePTR, "file.txt", "r") == 0) {
while (fgets(fileRow, sizeof fileRow, filePTR) != NULL) {
if (strstr(fileRow, "yyyy") != NULL) { // Input parameter
printf("%s", fileRow);
}
}
fclose(filePTR);
printf("\nEnd of the file.\n");
} else {
printf("ERROR! Impossible to read the file.");
}
return 0;
}
This is my code right now. I don't know how to:
Isolate the numerical values
actually convert them to double type
average them
I read something about the strtok function (just to start), but I would need some help...
You have started off on the right track and should be commended for using fgets() to read a complete line from the file on each iteration, but your choice of strstr does not ensure the prefix you are looking for is found at the beginning of the line.
Further, you want to avoid hardcoding your search string as well as the file to open. main() takes arguments through argc and argv that let you pass information into your program on startup. See: C11 Standard - §5.1.2.2.1 Program startup(p1). Using the parameters eliminates your need to hardcode values by letting you pass the filename to open and the prefix to search for as arguments to your program (which also eliminates the need to recompile your code simply to read from another filename or search for another string).
For example, instead of hardcoding values, you can use the parameters to main() to open any file and search for any prefix simply using something similar to:
#include <stdio.h>
#include <string.h>
#define MAXC 1024 /* if you need a constant, #define one (or more) */
int main (int argc, char **argv) {
char buf[MAXC] = "", *str = NULL; /* buffer for line and ptr to search str */
size_t n = 0, len = 0; /* counter and search string length */
double sum = 0; /* sum of matching lines */
FILE *fp = NULL; /* file pointer */
if (argc < 3) { /* validate 2 arguments given - filename, search_string */
fprintf (stderr, "error: insufficient number of arguments\n"
"usage: %s filename search_string\n", argv[0]);
return 1;
}
if (!(fp = fopen (argv[1], "r"))) { /* open/validate file open for reading */
perror ("fopen-filename");
return 1;
}
str = argv[2]; /* set pointer to search string */
len = strlen (str); /* get length of search string */
...
At this point in your program, you have opened the file passed as the first argument and have validated that it is open for reading through the file-stream pointer fp. You have passed in the prefix to search for as the second argument, assigned it to the pointer str, and have obtained the length of the prefix and stored it in len.
Next you want to read each line from your file into buf, but instead of attempting to match the prefix with strstr(), you can use strncmp() with len to compare the beginning of the line read from your file. If the prefix is found, you can then use sscanf to parse the double value from the file and add it to sum and increment the number of values stored in n, e.g.
while (fgets (buf, MAXC, fp)) { /* read each line into buf */
if (strncmp (buf, str, len) == 0) { /* if prefix matches */
double tmp; /* temporary double for parse */
/* parse with scanf, discarding prefix with assignment suppression */
if (sscanf (buf, "%*1023[^:]: %lf", &tmp) == 1) {
sum += tmp; /* add value to sum */
n++; /* increment count of values */
}
}
}
(note: above the assignment suppression operator for sscanf(), '*' allows you to read and discard the prefix and ':' without having to store the prefix in a second string)
All that remains is checking if values are contained in sum by checking your count n and if so, output the average for the prefix. Or, if n == 0 the prefix was not found in the file, e.g.:
if (n) /* if values found, output average */
printf ("prefix '%s' avg: %.4f\n", str, sum / n);
else /* output not found */
printf ("prefix '%s' -- not found in file.\n", str);
}
That is basically all you need. With it, you can read from any file you like and search for any prefix simply passing the filename and prefix as the first two arguments to your program. The complete example would be:
#include <stdio.h>
#include <string.h>
#define MAXC 1024 /* if you need a constant, #define one (or more) */
int main (int argc, char **argv) {
char buf[MAXC] = "", *str = NULL; /* buffer for line and ptr to search str */
size_t n = 0, len = 0; /* counter and search string length */
double sum = 0; /* sum of matching lines */
FILE *fp = NULL; /* file pointer */
if (argc < 3) { /* validate 2 arguments given - filename, search_string */
fprintf (stderr, "error: insufficient number of arguments\n"
"usage: %s filename search_string\n", argv[0]);
return 1;
}
if (!(fp = fopen (argv[1], "r"))) { /* open/validate file open for reading */
perror ("fopen-filename");
return 1;
}
str = argv[2]; /* set pointer to search string */
len = strlen (str); /* get length of search string */
while (fgets (buf, MAXC, fp)) { /* read each line into buf */
if (strncmp (buf, str, len) == 0) { /* if prefix matches */
double tmp; /* temporary double for parse */
/* parse with scanf, discarding prefix with assignment suppression */
if (sscanf (buf, "%*1023[^:]: %lf", &tmp) == 1) {
sum += tmp; /* add value to sum */
n++; /* increment count of values */
}
}
}
if (n) /* if values found, output average */
printf ("prefix '%s' avg: %.4f\n", str, sum / n);
else /* output not found */
printf ("prefix '%s' -- not found in file.\n", str);
}
Example Use/Output
Using your data file stored in dat/prefixdouble.txt, you can search for each prefix in the file and obtain the average, e.g.
$ ./bin/prefixaverage dat/prefixdouble.txt hhhh
prefix 'hhhh' avg: 0.8874
$ ./bin/prefixaverage dat/prefixdouble.txt xxxx
prefix 'xxxx' avg: 0.9105
$ ./bin/prefixaverage dat/prefixdouble.txt yyyy
prefix 'yyyy' avg: 0.7962
$ ./bin/prefixaverage dat/prefixdouble.txt zzzz
prefix 'zzzz' avg: 0.7897
$ ./bin/prefixaverage dat/prefixdouble.txt foo
prefix 'foo' -- not found in file.
Much easier than having to recompile each time you want to search for another prefix. Look things over and let me know if you have further questions.
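If you prefer not to use sscanf for the conversion, strtod does the same job and reports where it stopped. The sketch below is one way to do it, not the approach used above. It also requires the ':' to follow the prefix directly, so that searching for yyyy does not match a label such as yyyyz. The name value_for_prefix is an assumption.

```c
#include <stdlib.h>
#include <string.h>

/* If line starts with prefix immediately followed by ':', convert the
 * number after the colon into *out and return 1; otherwise return 0. */
int value_for_prefix (const char *line, const char *prefix, double *out)
{
    size_t len = strlen (prefix);
    const char *start;
    char *end;
    double v;

    if (strncmp (line, prefix, len) != 0 || line[len] != ':')
        return 0;                   /* prefix must be the whole label */
    start = line + len + 1;         /* first character after ':' */
    v = strtod (start, &end);       /* skips leading whitespace itself */
    if (end == start)
        return 0;                   /* no digits were converted */
    *out = v;
    return 1;
}
```

Inside the read loop this would replace the strncmp and sscanf pair: if (value_for_prefix (buf, str, &tmp)) { sum += tmp; n++; }.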

Use fscanf to read strings and empty lines

I have a text file containing keywords and integers and have access to the file stream in order to parse this file.
I am able to parse it by doing
while( fscanf(stream, "%s", word) != -1 ) which gets each word and int in the file for me to parse, but the problem I'm having is that I cannot detect an empty line "\n" which then I need to detect for something. I can see that \n is a character thus not detected by %s. What can I do to modify fscanf to also get EOL characters?
You can do exactly what you wish to do with fscanf, but the number of checks and validations required to do it properly and completely is just painful compared to using a proper line-oriented input function like fgets.
With fgets (or POSIX getline) detecting an empty line requires nothing special, or in addition to, reading a normal line. For example, to read a line of text with fgets, you simply provide a buffer of sufficient size and make a single call to read up to and including the '\n' into buf:
while (fgets (buf, BUFSZ, fp)) { /* read each line in file */
To check whether the line was an empty-line, you simply check if the first character in buf is the '\n' char, e.g.
if (*buf == '\n')
/* handle blank line */
or, in the normal course of things, you will be removing the trailing '\n' by obtaining the length and overwriting the '\n' with the nul-terminating character. In which case, you can simply check if length is 0 (after removal), e.g.
size_t len = strlen (buf); /* get buf length */
if (len && buf[len-1] == '\n') /* check last char is '\n' */
buf[--len] = 0; /* overwrite with nul-character */
(note: if the last character was not '\n', you know the line was longer than the buffer and characters in the line remain unread -- and will be read on the next call to fgets, or you have reached the end of the file with a non-POSIX line ending on the last line)
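When you do not need to know whether the '\n' was actually present, a more compact way to trim it is strcspn. This is a sketch of that alternative, with trim_newline as a hypothetical helper name. It gives up the "line too long" detection described in the note above.

```c
#include <string.h>

/* Remove a trailing '\n' from s (if any) and return the new length.
 * A return of 0 means the line was empty. */
size_t trim_newline (char *s)
{
    size_t len = strcspn (s, "\n");   /* index of '\n', or of the nul */
    s[len] = '\0';                     /* harmless if there was no '\n' */
    return len;
}
```

An empty line is then simply the case where trim_newline (buf) returns 0.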
Putting it all together, an example using fgets that identifies empty lines, and provides for printing complete lines even if the line exceeds the buffer length, could look something like the following:
#include <stdio.h>
#include <string.h>
#define BUFSZ 4096
int main (int argc, char **argv) {
size_t n = 1;
char buf[BUFSZ] = "";
FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;
if (!fp) { /* validate file open for reading */
fprintf (stderr, "error: file open failed '%s'.\n", argv[1]);
return 1;
}
while (fgets (buf, BUFSZ, fp)) { /* read each line in file */
size_t len = strlen (buf); /* get buf length */
if (len && buf[len-1] == '\n') /* check last char is '\n' */
buf[--len] = 0; /* overwrite with nul-character */
else { /* line too long or non-POSIX file end, handle as required */
printf ("line[%2zu] : %s\n", n, buf);
continue;
} /* output line (or "empty" if line was empty) */
printf ("line[%2zu] : %s\n", n++, len ? buf : "empty");
}
if (fp != stdin) fclose (fp); /* close file if not stdin */
return 0;
}
Example Input File
$ cat ../dat/captnjack2.txt
This is a tale

Of Captain Jack Sparrow

A Pirate So Brave

On the Seven Seas.
Example Use/Output
$ ./bin/fgetsblankln ../dat/captnjack2.txt
line[ 1] : This is a tale
line[ 2] : empty
line[ 3] : Of Captain Jack Sparrow
line[ 4] : empty
line[ 5] : A Pirate So Brave
line[ 6] : empty
line[ 7] : On the Seven Seas.
So Why Does Everybody Recommend fgets?
Well, let's take a look at doing the same thing with fscanf and I'll let you be the judge. To begin with, fscanf does not read or include the trailing '\n' with the "%s" format specifier (by default) or when using the character class "%[^\n]" (because it was specifically excluded). So you do not have the ability to read a (1) line with characters and (2) line without characters using the same format string. You either read characters and fscanf succeeds, or you don't and you experience a matching failure.
So as alluded to in the comments, you have to pre-check if the next character in the input buffer is a '\n' character using fgetc (or getc) and then put it back in the input buffer with ungetc if it isn't.
Further adding to your fscanf task, you must independently validate each check, put back, and read every step along the way. This results in quite a number of checks to handle all cases and provide all checks necessary to avoid undefined behavior.
As part of those checks you will need to limit the number of characters you read to one less-than the number of characters in the buffer while capturing the next character to determine if the line was too long to fit. Additional checks are required to handle (without failure) a file with a non-POSIX line end on the final line -- something handled without issue by fgets.
Below is a similar implementation to the fgets code above. Go through and understand why each step is necessary and what each validation protects against. You may be able to rearrange slightly, but it has been whittled down to close to the bare minimum. After going through it, it should become clear why fgets is the preferred method for handling checks for empty lines (as well as for line-oriented input, generally).
#include <stdio.h>
#define BUFSZ 4096
int main (int argc, char **argv) {
int c = 0, r = 0;
size_t n = 1;
char buf[BUFSZ] = "", nl = 0;
FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;
if (!fp) { /* validate file open for reading */
fprintf (stderr, "error: file open failed '%s'.\n", argv[1]);
return 1;
}
for (;;) { /* loop until EOF */
if ((c = fgetc (fp)) == '\n') /* check next char is '\n' */
*buf = 0; /* make buf empty-string */
else {
if (c == EOF) /* check if EOF */
break;
if (ungetc (c, fp) == EOF) { /* ungetc/validate */
fprintf (stderr, "error: ungetc failed.\n");
break;
}
/* read line into buf and '\n' into nl, handle failure */
if ((r = fscanf (fp, "%4095[^\n]%c", buf, &nl)) != 2) {
if (r == EOF) { /* EOF (input failure) */
break;
} /* check next char, if not EOF, non-POSIX eol */
else if ((c = fgetc (fp)) != EOF) {
if (ungetc (c, fp) == EOF) { /* unget it */
fprintf (stderr, "error: ungetc failed.\n");
break;
} /* read line again handling non-POSIX eol */
if (fscanf (fp, "%4095[^\n]", buf) != 1) {
fprintf (stderr, "error: fscanf failed.\n");
break;
}
}
} /* good fscanf, validate nl = '\n' or line too long */
else if (nl != '\n') {
fprintf (stderr, "error: line %zu too long.\n", n);
break;
}
} /* output line (or "empty" for empty line) */
printf ("line[%2zu] : %s\n", n++, *buf ? buf : "empty");
}
if (fp != stdin) fclose (fp); /* close file if not stdin */
return 0;
}
The Use/Output is identical to above. Look things over and let me know if you have any further questions.

C program to copy .csv of integers copies one less element unless element size is set to +1

I'm new to learning the C language and I wanted to write a simple program that would copy an array of integers from one .csv file to a new .csv file. My code works as intended, however when my array size for fread/fwrite is set to the exact number of elements in the .csv array (10 in this case), it only copies nine of the elements.
When the array size is set to +1, it copies all the elements.
#include <stdio.h>
#include <stdlib.h>
#define LISTSIZE 11
//program that copies an array of integers from one .csv to another .csv
int main(int argc, char * argv[])
{
if (argc != 2)
{
fprintf(stderr, "Usage ./file_sort file.csv\n");
return 1;
}
char * csvfile = argv[1];
FILE * input_csvile = fopen(csvfile, "r"); //open .csv file and create file pointer input_csvile
if(input_csvile == NULL)
{
fprintf(stderr, "Error, Could not open\n");
return 2;
}
unsigned int giving_total[LISTSIZE];
if(input_csvile != NULL) //after file opens, read array from .csv input file
{
fread(giving_total, sizeof(int), LISTSIZE, input_csvile);
}
else
fprintf(stderr, "Error\n");
FILE * printed_file = fopen("school_currentfy1.csv", "w");
if (printed_file != NULL)
{
fwrite(giving_total, sizeof(int), LISTSIZE, printed_file); //copy array of LISTSIZE integers to new file
}
else
fprintf(stderr, "Error\n");
fclose(printed_file);
fclose(input_csvile);
return 0;
}
Does this have something to do with the array being 0-indexed and the .csv file being 1-indexed? I also had an output with the LISTSIZE of 11 which had the last (10) element being displayed incorrectly; 480 instead of 4800.
http://imgur.com/lLOozrc Output/input with LISTSIZE of 10
http://imgur.com/IZPGwsA Input/Output with LISTSIZE of 11
Note: as noted in the comment, fread and fwrite are for reading and writing binary data, not text. If you are dealing with a .csv (comma separated values, e.g. as exported from MS Excel or Open/LibreOffice Calc), you will need to use fgets (or any other character/string oriented function) followed by sscanf (or strtol, strtoul) to read the values as text and perform the conversion to int values. To write the values to your output file, use fprintf. (fscanf is also available for input text processing and conversion, but you lose flexibility in handling variations in input format.)
However, if your goal was to read binary data for 10 integers (e.g. 40-bytes of data), then fread and fwrite are fine, but as with all input/output routines, you need to validate the number of bytes read and written to insure you are dealing with valid data within your code. (and that you have a valid output data file when you are done)
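For the binary case, a minimal sketch of what validating the read and write could look like is shown here. The helper names write_ints_bin and read_ints_bin are hypothetical (they are not part of the original code), and the sketch assumes the file was written on the same machine, since binary int data is not portable across differing int sizes or byte orders.

```c
#include <stdio.h>

/* Write n ints to path in binary form; returns 0 on success, -1 on failure.
 * (write_ints_bin and read_ints_bin are hypothetical helper names.) */
int write_ints_bin (const char *path, const int *a, size_t n)
{
    FILE *fp = fopen (path, "wb");
    if (!fp)                        /* validate file open for writing */
        return -1;

    /* fwrite returns the number of items written, not bytes */
    size_t nwritten = fwrite (a, sizeof *a, n, fp);

    /* fclose can report a delayed write error, so check it as well */
    if (fclose (fp) || nwritten != n)
        return -1;

    return 0;
}

/* Read up to max ints from path; returns the number of ints actually read. */
size_t read_ints_bin (const char *path, int *a, size_t max)
{
    FILE *fp = fopen (path, "rb");
    if (!fp)                        /* validate file open for reading */
        return 0;

    /* fread returns the number of complete items read */
    size_t nread = fread (a, sizeof *a, max, fp);
    fclose (fp);

    return nread;
}
```

The caller compares the value returned by read_ints_bin against the count it expected, and only uses that many elements of the array.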
There are many ways to read a .csv file, depending on the format. One generic way is to simply read each line of text with fgets and then repeatedly call sscanf to convert each value. (This has a number of advantages in handling different spacing around the ',' compared to fscanf.) You simply read each line, assign a pointer to the beginning of the buffer filled by fgets, call sscanf (with %n to return the number of characters processed by each call), advance the pointer by that number, and then scan forward in the buffer until the next '-' (for negative values) or digit is encountered. (Using %n and scanning forward allows fscanf to be used in a similar manner.) For example:
/* read each line until LISTSIZE integers read or EOF */
while (numread < LISTSIZE && fgets (buf, MAXC, fp)) {
int nchars = 0; /* number of characters processed by sscanf */
char *p = buf; /* pointer to line */
/* (you should check a whole line is read here) */
/* while chars remain in buf, less than LISTSIZE ints read
* and a valid conversion to int performed by sscanf, update p
* to point to start of next number.
*/
while (*p && numread < LISTSIZE &&
sscanf (p, "%d%n", &giving_total[numread], &nchars) == 1) {
numread++; /* increment the number read */
p += nchars; /* move p nchars forward in buf */
/* find next digit in buf */
while (*p && *p != '-' && (*p < '0' || *p > '9'))
p++;
}
}
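The comment in the loop above notes that you should check that a whole line was read. One possible way to do that, as a sketch only, is to look for the '\n' that fgets stores when the line fits in the buffer. The helper name trim_newline is hypothetical.

```c
#include <string.h>

/* Strip a trailing '\n' from buf (as filled by fgets).
 * Returns 1 if a newline was found (a whole line was read), 0 otherwise.
 * (trim_newline is a hypothetical helper name.) */
int trim_newline (char *buf)
{
    /* index of the first '\n', or of the terminating '\0' if there is none */
    size_t len = strcspn (buf, "\n");

    if (buf[len] == '\n') {
        buf[len] = '\0';    /* overwrite '\n' with the nul-terminator */
        return 1;
    }

    /* no '\n': the line was longer than the buffer, or it is the
     * final line of a file that does not end with a newline */
    return 0;
}
```

If it returns 0 and you have not reached EOF, the rest of the line is still waiting in the stream, so a number may have been split across two reads.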
Now to create your output file, you simply write numread values back out in comma-separated value format. (You can adjust how many you write per line as required.)
for (i = 0; i < numread; i++) /* write in csv format */
fprintf (fp, i ? ",%d" : "%d", giving_total[i]);
fputc ('\n', fp); /* tidy up -- make sure file ends with '\n' */
Then it is just a matter of closing your output file and checking for any stream errors (always check the return of fclose after writing values to a file):
if (fclose (fp)) /* always validate close after write to */
perror("error"); /* validate no stream errors occurred */
Putting it all together, you could do something similar to the following:
#include <stdio.h>
#include <stdlib.h>
#define LISTSIZE 10
#define MAXC 256
int main(int argc, char *argv[])
{
if (argc < 2) { /* outfile is optional, only the input file is required */
fprintf(stderr, "Usage ./file_sort file.csv [outfile]\n");
return 1;
}
int giving_total[LISTSIZE]; /* change to int to handle negative values */
size_t i, numread = 0; /* generic i and number of integers read */
char *csvfile = argv[1],
buf[MAXC] = ""; /* buffer to hold MAXC chars of text */
FILE *fp = fopen (csvfile, "r");
if (fp == NULL) { /* validate csvfile open for reading */
fprintf(stderr, "Error, Could not open input file.\n");
return 2;
}
/* read each line until LISTSIZE integers read or EOF */
while (numread < LISTSIZE && fgets (buf, MAXC, fp)) {
int nchars = 0; /* number of characters processed by sscanf */
char *p = buf; /* pointer to line */
/* (you should check a whole line is read here) */
/* while chars remain in buf, less than LISTSIZE ints read
* and a valid conversion to int performed by sscanf, update p
* to point to start of next number.
*/
while (*p && numread < LISTSIZE &&
sscanf (p, "%d%n", &giving_total[numread], &nchars) == 1) {
numread++; /* increment the number read */
p += nchars; /* move p nchars forward in buf */
/* find next digit in buf */
while (*p && *p != '-' && (*p < '0' || *p > '9'))
p++;
}
}
if (numread < LISTSIZE) /* warn if less than LISTSIZE integers read */
fprintf (stderr, "Warning: only '%zu' integers read from file\n", numread);
fclose (fp); /* close input file */
fp = fopen (argc > 2 ? argv[2] : "outfile.csv", "w"); /* open output file */
if (fp == NULL) { /* validate output file open for writing */
fprintf(stderr, "Error, Could not open output file.\n");
return 3;
}
for (i = 0; i < numread; i++) /* write in csv format */
fprintf (fp, i ? ",%d" : "%d", giving_total[i]);
fputc ('\n', fp); /* tidy up -- make sure file ends with '\n' */
if (fclose (fp)) /* always validate close after write to */
perror("error"); /* validate no stream errors occurred */
return 0;
}
Like I said, there are many, many ways to approach this. The idea is to build as much flexibility into your read as possible so it can handle any variations in the input format without choking. Another very robust way to approach the read is using strtol (or strtoul for unsigned values). Both will advance a pointer for you to the next character following the integer converted, so you can start your scan for the next digit from there.
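As a rough sketch of the strtol approach (not the code used to produce the output shown in this answer), the per-line conversion could look like the following. The function name parse_ints is hypothetical, and the sketch simply discards any value that does not fit in an int.

```c
#include <stdlib.h>
#include <errno.h>
#include <limits.h>

/* Convert up to max integers found in line, storing them in out.
 * Returns the number of values stored. (parse_ints is a hypothetical name.) */
size_t parse_ints (const char *line, int *out, size_t max)
{
    size_t n = 0;
    const char *p = line;

    while (*p && n < max) {
        char *endptr;

        errno = 0;                      /* reset so ERANGE can be detected */
        long v = strtol (p, &endptr, 10);

        if (endptr == p) {              /* no digits converted here */
            p++;                        /* skip one character and retry */
            continue;
        }

        if (errno == 0 && v >= INT_MIN && v <= INT_MAX)
            out[n++] = (int)v;          /* store only values that fit in int */

        p = endptr;                     /* resume right after the number */
    }

    return n;
}
```

You would call it once per line read by fgets, passing &giving_total[numread] and LISTSIZE - numread, and add the return value to numread.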
An example of the read flexibility provided by either of these approaches is shown below: reading a file of any number of lines, with values separated by any separator, and converting each integer encountered to a value in your array, e.g.
Example Input
$ cat ../dat/10int.csv
8572, -2213, 6434, 16330, 3034
12346, 4855, 16985, 11250, 1495
Example Program Use
$ ./bin/fgetscsv ../dat/10int.csv dat/outfile.csv
Example Output File
$ cat dat/outfile.csv
8572,-2213,6434,16330,3034,12346,4855,16985,11250,1495
Look things over and let me know if you have questions. If your intent was to read 40-bytes in binary form, just let me know and I'm happy to help with an example there.
If you want a truly generic read of values in a file, you can tweak the code that finds the number in the input file to scan forward in the file and validate that any '-' is followed by a digit. This allows reading any format and simply picking the integers from the file. For example with the following minor change:
while (*p && numread < LISTSIZE) {
if (sscanf (p, "%d%n", &giving_total[numread], &nchars) == 1) {
numread++; /* increment the number read */
p += nchars; /* move p nchars forward in buf (only after a successful conversion) */
}
/* find next number in buf */
for (; *p; p++) {
if (*p >= '0' && *p <= '9') /* positive value */
break;
if (*p == '-' && *(p+1) >= '0' && *(p+1) <= '9') /* negative */
break;
}
}
You can easily process the following file and obtain the same results:
$ cat ../dat/10intmess.txt
8572,;a -2213,;--a 6434,;
a- 16330,;a
- The Quick
Brown%3034 Fox
12346Jumps Over
A
4855,;*;Lazy 16985/,;a
Dog.
11250
1495
Example Program Use
$ ./bin/fgetscsv ../dat/10intmess.txt dat/outfile2.csv
Example Output File
$ cat dat/outfile2.csv
8572,-2213,6434,16330,3034,12346,4855,16985,11250,1495
