Open multiple files and store them in array of structures - c

I need some help with my task. I have to open an unknown number of files and store them in a data structure at the start of the program. The first file contains the names of two other files, and so on (this is explained in more detail below the example of the first file).
Each file has the same structure:
[Title of file]
[name of file X]
[name of file Y]
[Text]
For example, the first file will look like this:
File 1
file_8.txt
file_25.txt
Text: "this is some example text, lenght is unknown so i will have to use malloc and realloc to dynamicaly store it."
The name of the first file is given by the user as a command-line argument when starting the program
(example: ./task1 page_1.txt)
The first line stores the title of the file. The second and third lines each contain the file name of a next file that I have to read/store. If there are no further file names on the 2nd and 3rd lines, both lines will contain " -\n ". The text starts at the fourth line (and can span multiple lines, as in the example above).
My struct for now:
#include <stdio.h>
#include <stdlib.h>
typedef struct
{
char title[1000]; // should use malloc and realloc and not this way
char file_x[1000]; // dynamically
char file_y[1000]; // dynamically
char text[10000];
} Story;
My main looks like this:
int main (int argc,char *argv[])
{
char c[100];
char buffer[100];
FILE *input = fopen(argv[1], "r");
Story *temp = (Story*) malloc(sizeof(Story) * 8);
if(input)
{
int flag = 0;
while (fgets(c, sizeof(buffer),input) != NULL)
{
if(flag == 0)
{
sscanf(c, "%s", temp->title);
}
else if(flag == 1)
{
sscanf(c, "%s", temp->file_x);
}
else if(flag == 2)
{
sscanf(c, "%s", temp->file_y);
}
else
{
while(!feof(input))
{
fread(temp->text, sizeof(Story),1,input);
}
}
flag++;
}
printf("%s\n%s\n%s\n", temp->title,
temp->file_x, temp->file_y);
}
else if (input == NULL)
{
printf("ERROR MESSAGE HERE \n");
return 1;
}
free(temp);
fclose(input);
return 0;
}
So far I have managed to open the first file and store it in the structure. I need an idea for how to open and store all the other files, and I also have to implement it using dynamic memory allocation.
Any advice is greatly appreciated.

I suspect your lesson is covering recursion as with each element in your array of story you will need to branch an unknown number of times to read file_x and file_y (each of which can contain additional file_x and file_y within). Your procedural option is to follow down the trail of all file_x and then return to each file_y repeating the process until you reach the final files in each chain where file_x and file_y are empty.
Before determining which approach you will take, you simply need a way to read one file, extract the title, file_x, file_y and allocate and store text. This is a fairly straight-forward process where your primary task is to validate each step so you have confidence that you are processing actual data and are not invoking Undefined Behavior by reading from a file that isn't actually open or attempting to write (or read) beyond the bounds of your storage.
Here is a short example that takes a pointer to story to fill and the filename to read from. You will note a repetitive process involved: read the string with fgets, get the length, validate that the last char read was '\n' (indicating you read the whole line), and finally trim the '\n', either by overwriting it with a nul-terminating character so you don't have newlines dangling off the end of your stored string, or by overwriting it with a ' ' (space) in the case of text, where you are concatenating lines together.
note: below, realloc is never called directly on the pointer to text. Instead a tmp pointer is used with realloc to validate realloc succeeds before assigning the new block to text. (otherwise you will lose your pointer to text if realloc fails -- because it returns NULL)
/* read values into struct story 's' from 'filename' */
int read_file (story *s, char *filename)
{
size_t len = 0, /* var for strlen */
text_size = 0, /* total text_size */
nul_char = 0; /* flag for +1 on first allocation */
char buf[TITLE_MAX] = ""; /* read buffer for 'text' */
FILE *fp = fopen (filename, "r"); /* file pointer */
if (!fp) /* validate file open for reading */
return 0; /* or return silently indicating no file_x or file_y */
if (fgets (s->title, TITLE_MAX, fp) == 0) { /* read title */
fprintf (stderr, "error: failed to read title from '%s'.\n",
filename);
fclose(fp);
return 0;
}
len = strlen (s->title); /* get title length */
if (len && s->title[len - 1] == '\n') /* check last char is '\n' */
s->title[--len] = 0; /* overwrite with nul-character */
else { /* handle error if line too long */
fprintf (stderr, "error: title too long, filename '%s'.\n",
filename);
fclose(fp);
return 0;
}
if (fgets (s->file_x, PATH_MAX, fp) == 0) { /* same for file_x */
fprintf (stderr, "error: failed to read file_x from '%s'.\n",
filename);
fclose(fp);
return 0;
}
len = strlen (s->file_x);
if (len && s->file_x[len - 1] == '\n')
s->file_x[--len] = 0;
else {
fprintf (stderr, "error: file_x too long, filename '%s'.\n",
filename);
fclose(fp);
return 0;
}
if (fgets (s->file_y, PATH_MAX, fp) == 0) { /* same for file_y */
fprintf (stderr, "error: failed to read file_y from '%s'.\n",
filename);
fclose(fp);
return 0;
}
len = strlen (s->file_y);
if (len && s->file_y[len - 1] == '\n')
s->file_y[--len] = 0;
else {
fprintf (stderr, "error: file_y too long, filename '%s'.\n",
filename);
fclose(fp);
return 0;
}
while (fgets (buf, TITLE_MAX, fp)) { /* read text in TITLE_MAX chunks */
len = strlen (buf);
if (len && buf[len - 1] == '\n') /* check for '\n' */
buf[len - 1] = ' '; /* overwrite with ' ' for concat */
if (text_size == 0)
nul_char = 1; /* account for space for '\0' when empty, and */
else /* use a flag to set new block to empty-string */
nul_char = 0;
void *tmp = realloc (s->text, text_size + len + nul_char); /* allocate */
if (!tmp) { /* validate realloc succeeded */
fprintf (stderr, "error: realloc failed, filename '%s'.\n",
filename);
break;
}
s->text = tmp; /* assign new block to s->text */
if (nul_char) /* if first concatenation */
*(s)->text = 0; /* initialize s->text to empty-string */
strcat (s->text, buf); /* concatenate buf with s->text */
text_size += len + nul_char; /* update text_size total (bytes allocated) */
}
fclose (fp); /* close file */
return 1;
}
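The read/validate/trim step repeated three times in read_file could be factored into a small helper. The following is only a sketch: read_trimmed is a hypothetical name, not part of the code above, and unlike read_file it accepts a short final line that has no trailing '\n' instead of reporting it as an error.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of the answer's code): read one line from
 * 'fp' into 'dst' (capacity 'size') and strip the trailing '\n'.
 * Returns 1 on success, 0 on EOF or read error, -1 if the line did not fit. */
int read_trimmed (char *dst, size_t size, FILE *fp)
{
    if (!fgets (dst, (int)size, fp))        /* EOF or read error */
        return 0;

    size_t len = strlen (dst);
    if (len && dst[len - 1] == '\n') {      /* whole line was read */
        dst[len - 1] = 0;                   /* overwrite '\n' with '\0' */
        return 1;
    }
    if (len + 1 < size)                     /* short final line without '\n' */
        return 1;
    return -1;                              /* buffer full: line too long */
}
```

With such a helper, each of the title, file_x and file_y reads in read_file would reduce to one call plus a check of the return value.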
With this, you will need to design a way to work through all file_x and file_y filenames. As mentioned above, this likely lends itself to a recursive function, or you can work your way down the file_x tree and circle back and pick up all the file_y additions. Note, you need to account for the new addition of story each time either file_x or file_y are followed.
Below is a short example that follows through all file_x additions and then comes back and follows through only the 1st file_y branch. It is intended to show you how to handle the calling and filling from both a file_x and file_y rather than to write the final code for you. If you add the following above the read_file function, you will have a working example:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <limits.h> /* for PATH_MAX */
enum { STORY_MAX = 12, TITLE_MAX = 1024 };
typedef struct
{
char title[TITLE_MAX],
file_x[PATH_MAX],
file_y[PATH_MAX],
*text;
} story;
int read_file (story *s, char *filename);
int main (int argc, char **argv) {
int n = 0, storycnt = 0;
story stories[STORY_MAX] = {{ .title = "" }};
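if (argc < 2) { /* validate a filename was given */
fprintf (stderr, "usage: %s filename\n", argv[0]);
return 1;
}
char *filename = argv[1];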
/* read all file_x filenames */
while (n < STORY_MAX && read_file (&stories[n], filename)) {
filename = stories[n++].file_x;
}
storycnt = n; /* current story count of all file_x */
for (int i = 0; i < storycnt; i++) /* find all file_y files */
while (n < STORY_MAX && read_file (&stories[n], stories[i].file_y)) {
filename = stories[i++].file_y;
n++;
}
for (int i = 0; i < n; i++) { /* output stories content */
printf ("\ntitle : %s\nfile_x: %s\nfile_y: %s\ntext : %s\n",
stories[i].title, stories[i].file_x,
stories[i].file_y, stories[i].text);
free (stories[i].text); /* don't forget to free memory */
}
return 0;
}
Example Input Files
$ cat file_1.txt
File 1
file_8.txt
file_25.txt
Text: "this is some example text, lenght is unknown
so i will have to use malloc and realloc to
dynamicaly store it."
$ cat file_8.txt
file_8


This is the text from file 8. Not much,
just some text.
$ cat file_25.txt
file_25


This is the text from file 25. Not much,
just some text.
Example Use/Output
$ ./bin/rdstories file_1.txt
title : File 1
file_x: file_8.txt
file_y: file_25.txt
text : Text: "this is some example text, lenght is unknown so i will
have to use malloc and realloc to dynamicaly store it."
title : file_8
file_x:
file_y:
text : This is the text from file 8. Not much, just some text.
title : file_25
file_x:
file_y:
text : This is the text from file 25. Not much, just some text.
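The recursive alternative mentioned earlier, combined with an array that grows through realloc instead of the fixed STORY_MAX, might look like the sketch below. Everything in it is an assumption made for illustration: the names entry, entry_list, get_line, load_entries and NAME_LEN are hypothetical, the text field is left out to keep it short, lines are assumed to fit in NAME_LEN, and the files are assumed not to reference each other in a cycle (a cycle would recurse without end).

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

enum { NAME_LEN = 256 };            /* hypothetical limit for this sketch */

typedef struct {
    char title[NAME_LEN], file_x[NAME_LEN], file_y[NAME_LEN];
} entry;                            /* 'text' omitted to keep the sketch short */

typedef struct {
    entry *items;                   /* dynamically grown array */
    size_t used, cap;
} entry_list;

/* read one line and strip the trailing newline; returns 0 on EOF */
static int get_line (char *dst, FILE *fp)
{
    if (!fgets (dst, NAME_LEN, fp))
        return 0;
    dst[strcspn (dst, "\n")] = 0;
    return 1;
}

/* Load 'filename', then recurse into its file_x and file_y.
 * Returns 1 on success, 0 on failure. */
int load_entries (entry_list *list, const char *filename)
{
    char x[NAME_LEN], y[NAME_LEN];

    if (!*filename || strcmp (filename, "-") == 0)  /* end of this branch */
        return 1;

    FILE *fp = fopen (filename, "r");
    if (!fp)
        return 0;

    if (list->used == list->cap) {                  /* grow array when full */
        size_t newcap = list->cap ? list->cap * 2 : 4;
        void *tmp = realloc (list->items, newcap * sizeof *list->items);
        if (!tmp) {                                 /* old block still valid */
            fclose (fp);
            return 0;
        }
        list->items = tmp;
        list->cap = newcap;
    }

    entry *e = &list->items[list->used];
    int ok = get_line (e->title, fp) && get_line (e->file_x, fp)
                                     && get_line (e->file_y, fp);
    fclose (fp);
    if (!ok)
        return 0;
    list->used++;

    /* copy the names first: the recursive calls may realloc and move items */
    strcpy (x, e->file_x);
    strcpy (y, e->file_y);

    return load_entries (list, x) && load_entries (list, y);
}
```

A caller would start with an empty list (entry_list list = {0};), call load_entries (&list, argv[1]) after checking argc, and free list.items when done. Entries end up in depth-first order, with each file_x branch stored before the matching file_y branch.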
Memory Use/Error Check
In any code you write that dynamically allocates memory, you have 2 responsibilities regarding any block of memory allocated: (1) always preserve a pointer to the starting address for the block of memory so, (2) it can be freed when it is no longer needed.
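The first responsibility is easiest to violate when growing a block with realloc, because assigning the result straight back to the original pointer loses the block if realloc returns NULL. A minimal sketch of the temporary-pointer idiom follows; append_str is a hypothetical helper name used only for illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: append 'src' to the heap string '*dst' (which may be
 * NULL for an empty string). On failure '*dst' is left untouched, so the
 * caller can still free it. Returns 1 on success, 0 on failure. */
int append_str (char **dst, const char *src)
{
    size_t oldlen = *dst ? strlen (*dst) : 0;
    size_t addlen = strlen (src);

    /* realloc into a temporary pointer, never directly into *dst */
    char *tmp = realloc (*dst, oldlen + addlen + 1);
    if (!tmp)
        return 0;               /* original block is still valid */

    memcpy (tmp + oldlen, src, addlen + 1);  /* copy including the '\0' */
    *dst = tmp;                 /* only now replace the caller's pointer */
    return 1;
}
```

Starting from char *text = NULL;, repeated calls build up the string, and a single free (text) releases it at the end.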
For Linux, valgrind is the normal choice. There are similar memory checkers for every platform. They are all simple to use: just run your program through one.
$ valgrind ./bin/rdstories file_1.txt
==9488== Memcheck, a memory error detector
==9488== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==9488== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==9488== Command: ./bin/rdstories file_1.txt
==9488==
title : File 1
file_x: file_8.txt
file_y: file_25.txt
text : Text: "this is some example text, lenght is unknown so i will
have to use malloc and realloc to dynamicaly store it."
title : file_8
file_x:
file_y:
text : This is the text from file 8. Not much, just some text.
title : file_25
file_x:
file_y:
text : This is the text from file 25. Not much, just some text.
==9488==
==9488== HEAP SUMMARY:
==9488== in use at exit: 0 bytes in 0 blocks
==9488== total heap usage: 13 allocs, 13 frees, 3,353 bytes allocated
==9488==
==9488== All heap blocks were freed -- no leaks are possible
==9488==
==9488== For counts of detected and suppressed errors, rerun with: -v
==9488== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Always confirm that you have freed all memory you have allocated and that there are no memory errors.
Look things over and let me know if you have questions.

Related

Freeing copies to allocated memory (by getline() function)

I have written the following function to read lines of a text file and save them into a global array:
/* maximum number of lines allowed in one source file */
#define MAXSRCLNS 100
char *G_source_lines[MAXSRCLNS]; /* source lines */
/* number of source lines saved into source_ */
size_t G_source_lines_count = 0;
/*
* reads the source file 'path' and puts non-empty lines into the
* array G_source_lines. increments the number of lines
* G_source_lines_count for each line saved.
*/
void read_lines(char *path)
{
FILE *stream;
stream = fopen(path, "r");
if (!stream) {
fprintf(stderr, "can't open source '%s'\n", path);
exit(EXIT_FAILURE);
}
char *lnptr = NULL;
size_t n = 0;
while ((getline(&lnptr, &n, stream) != -1)) {
/* throw away empty lines */
if (!isempty(lnptr)) {
/* assert(("line count too large", G_source_lines_count < MAXSRCLNS)); */
G_source_lines[G_source_lines_count++] = lnptr;
} else {
/* I don't save an empty line in the G_source_lines for later
freeing, so free the allocation right here! */
free(lnptr);
}
lnptr = NULL;
}
/* free the lnptr variable defined and allocated on this stack */
/* don't forget to free its copies in G_source_lines when done with it */
free(lnptr);
fclose(stream);
}
void free_source_lines(void)
{
for (size_t ln = 0; ln < G_source_lines_count; ++ln)
free(G_source_lines[ln]);
}
I am not sure whether copying the pointers to the memory allocated by getline (saved in lnptr) into G_source_lines makes it necessary to free those copies too, as the function free_source_lines does when I am done with G_source_lines, or whether it is enough to free lnptr once at the end of read_lines?
Yes, it's necessary to free them in free_source_lines().
Since you're setting lnptr to NULL before each call to getline(), it's setting it to a pointer to a different buffer each time. The call to free(lnptr) at the end only frees the buffer that was allocated during the final, failing call, not any of the buffers that were saved in G_source_lines.
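To make the ownership explicit, here is a hedged sketch of the same idea with a bounds check added. The names collect_lines and release_lines are hypothetical, and the lnptr[0] == '\n' test is only a stand-in for the isempty() function from the question, whose definition is not shown. Unlike the original, a skipped buffer is kept and reused by the next getline() call rather than freed.

```c
#define _POSIX_C_SOURCE 200809L     /* for getline() */
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical variant of read_lines(): stores up to 'max' lines in 'lines',
 * handing ownership of each getline() buffer to the array.
 * Returns the number of lines stored. */
size_t collect_lines (FILE *stream, char **lines, size_t max)
{
    char *lnptr = NULL;
    size_t n = 0, count = 0;

    while (count < max && getline (&lnptr, &n, stream) != -1) {
        if (lnptr[0] == '\n')       /* blank line: keep buffer for reuse */
            continue;
        lines[count++] = lnptr;     /* the array now owns this buffer */
        lnptr = NULL;               /* force getline to allocate a new one */
        n = 0;
    }
    free (lnptr);   /* frees only the last, unsaved buffer (or NULL) */
    return count;
}

/* every stored pointer is a distinct allocation and needs its own free() */
void release_lines (char **lines, size_t count)
{
    for (size_t i = 0; i < count; i++)
        free (lines[i]);
}
```

The final free (lnptr) inside collect_lines never touches a saved line, which is why release_lines (or free_source_lines in the question) is still required.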

A more elegant way to parse

I'm kind of new to C.
I need to write a small function that opens a configuration file that has 3 lines, each line contains a path to files/directories that I need to extract.
I wrote this program and it seem to work:
void readCMDFile(char* cmdFile,char directoryPath[INPUT_SIZE], char inputFilePath[INPUT_SIZE],char outputFilePath [INPUT_SIZE]) {
//open files
int file = open(cmdFile, O_RDONLY);
if (file < 0) {
handleFailure();
}
char buffer[BUFF_SIZE];
int status;
int count;
while((count=read(file,buffer,sizeof(buffer)))>0)
{
int updateParam = UPDATE1;
int i,j;
i=0;
j=0;
for (;i<count;i++) {
if (buffer[i]!='\n'&&buffer[i]!=SPACE&&buffer[i]!='\0') {
switch (updateParam){
case UPDATE1:
directoryPath[j] = buffer[i];
break;
case UPDATE2:
inputFilePath[j] = buffer[i];
break;
case UPDATE3:
outputFilePath[j] = buffer[i];
break;
}
j++;
} else{
switch (updateParam){
case UPDATE1:
updateParam = UPDATE2;
j=0;
break;
case UPDATE2:
updateParam = UPDATE3;
j=0;
break;
}
}
}
}
if (count < 0) {
handleFailure();
}
}
but it is incredibly unintuitive and pretty ugly, so I thought there must be a more elegant way to do it. are there any suggestions?
Thanks!
Update: a config file content will look like that:
/home/bla/dirname
/home/bla/bla/file1.txt
/home/bla/bla/file2.txt
Your question isn't really about parsing the contents of the file; it is simply about reading the lines of the file into adequate storage within a function, in a manner that lets the object containing the stored lines be returned to the calling function. This is fairly standard, but you have a number of ways to approach it.
The biggest consideration is not knowing the length of the lines to be read. You say there are currently 3 lines to be read, but there isn't any need to know beforehand how many lines there are (by knowing, you can avoid realloc, but that is about the only savings).
You want to create as robust and flexible a method as you can for reading the lines and storing them in a way that allocates just enough memory to hold what is read. A good approach is to declare a fixed-size temporary buffer to hold each line read from the file with fgets, and then to call strlen on the buffer to determine the number of characters required (as well as to trim the trailing newline included by fgets). Since you are reading path information, the predefined macro PATH_MAX can be used to size your temporary buffer so that it can hold the maximum-size path usable by the system. You could also use POSIX getline instead of fgets, but we will stick to the C standard library for now.
The basic type that will allow you to allocate storage for multiple lines in your function and return a single pointer you can use in the calling function is char ** (a pointer to pointer to char -- or loosely a dynamic array of pointers). The scheme is simple: you allocate some initial number of pointers (3 in your case) and then loop over the file, reading a line at a time, getting the length of the line, and then allocating length + 1 characters of storage to hold the line. For example, if you allocate 3 pointers with:
#define NPATHS 3
...
char **readcmdfile (FILE *fp, size_t *n)
{
...
char buf[PATH_MAX] = ""; /* temp buffer to hold line */
char **paths = NULL; /* pointer to pointer to char to return */
size_t idx = 0; /* index counter (avoids dereferencing) */
...
paths = calloc (NPATHS, sizeof *paths); /* allocate NPATHS pointers */
if (!paths) { /* validate allocation/handle error */
perror ("calloc-paths");
return NULL;
}
...
while (idx < NPATHS && fgets (buf, sizeof buf, fp)) {
size_t len = strlen (buf); /* get length of string in buf */
...
paths[idx] = malloc (len + 1); /* allocate storage for line */
if (!paths[idx]) { /* validate allocation */
perror ("malloc-paths[idx]"); /* handle error */
return NULL;
}
strcpy (paths[idx++], buf); /* copy buffer to paths[idx] */
...
return paths; /* return paths */
}
(note: you can eliminate the limit of idx < NPATHS, if you include the check before allocating for each string and realloc more pointers, as required)
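A sketch of that variation, assuming a doubling growth policy, could look as follows. The names read_all_lines, free_lines, LINE_MAX_LEN and INITIAL_PTRS are hypothetical, and lines longer than the buffer are simply split here rather than rejected.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

enum { LINE_MAX_LEN = 4096, INITIAL_PTRS = 2 };  /* hypothetical sizes */

/* free 'n' stored lines and the pointer array itself */
void free_lines (char **lines, size_t n)
{
    while (n--)
        free (lines[n]);
    free (lines);
}

/* Read every line from 'fp'; returns NULL on allocation failure.
 * '*n' receives the number of lines stored. */
char **read_all_lines (FILE *fp, size_t *n)
{
    char buf[LINE_MAX_LEN];
    size_t cap = INITIAL_PTRS, idx = 0;
    char **lines = malloc (cap * sizeof *lines);

    *n = 0;
    if (!lines)
        return NULL;

    while (fgets (buf, sizeof buf, fp)) {
        buf[strcspn (buf, "\n")] = 0;           /* trim trailing newline */

        if (idx == cap) {                       /* all pointers used: double */
            void *tmp = realloc (lines, 2 * cap * sizeof *lines);
            if (!tmp) {
                free_lines (lines, idx);        /* release what was stored */
                return NULL;
            }
            lines = tmp;
            cap *= 2;
        }

        lines[idx] = malloc (strlen (buf) + 1); /* storage for this line */
        if (!lines[idx]) {
            free_lines (lines, idx);
            return NULL;
        }
        strcpy (lines[idx++], buf);
    }
    *n = idx;
    return lines;
}
```

The caller owns the result and releases it with free_lines (lines, n) once the paths are no longer needed.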
The remainder is just the handling of opening the file and passing the open file stream to your function. A basic approach is to provide the filename on the command line and open it with fopen (or to read from stdin by default if no filename is given). As with every step in your program, you need to validate the return and handle any error to avoid processing garbage (and invoking Undefined Behavior).
A simple example would be:
int main (int argc, char **argv) {
char **paths; /* pointer to pointer to char for paths */
size_t i, n = 0; /* counter and n - number of paths read */
/* open file given by 1st argument (or read stdin by default) */
FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;
if (!fp) { /* validate file open for reading */
perror ("fopen-failed");
return 1;
}
paths = readcmdfile (fp, &n); /* call function to read file */
/* passing open file pointer */
if (!paths) { /* validate return from function */
fprintf (stderr, "error: readcmdfile failed.\n");
return 1;
}
for (i = 0; i < n; i++) { /* output lines read from file */
printf ("path[%zu]: %s\n", i + 1, paths[i]);
free (paths[i]); /* free memory holding line */
}
free (paths); /* free pointers */
return 0;
}
Putting all the pieces together, adding the code to trim the '\n' read and included in buf by fgets, and adding an additional test to make sure the line you read actually fit in buf, you could do something like this:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <limits.h> /* for PATH_MAX */
#define NPATHS 3
/* read lines from file, return pointer to pointer to char on success
* otherwise return NULL. 'n' will contain number of paths read from file.
*/
char **readcmdfile (FILE *fp, size_t *n)
{
char buf[PATH_MAX] = ""; /* temp buffer to hold line */
char **paths = NULL; /* pointer to pointer to char to return */
size_t idx = 0; /* index counter (avoids dereferencing) */
*n = 0; /* zero the pointer passed as 'n' */
paths = calloc (NPATHS, sizeof *paths); /* allocate NPATHS pointers */
if (!paths) { /* validate allocation/handle error */
perror ("calloc-paths");
return NULL;
}
/* read while index < NPATHS & good read into buf
* (note: instead of limiting to NPATHS - you can simply realloc paths
* when idx == NPATHS -- but that is for later)
*/
while (idx < NPATHS && fgets (buf, sizeof buf, fp)) {
size_t len = strlen (buf); /* get length of string in buf */
if (len && buf[len - 1] == '\n') /* validate last char is '\n' */
buf[--len] = 0; /* overwrite '\n' with '\0' */
else if (len == PATH_MAX - 1) { /* check buffer full - line too long */
fprintf (stderr, "error: path '%zu' exceeds PATH_MAX.\n", idx);
return NULL;
}
paths[idx] = malloc (len + 1); /* allocate storage for line */
if (!paths[idx]) { /* validate allocation */
perror ("malloc-paths[idx]"); /* handle error */
return NULL;
}
strcpy (paths[idx++], buf); /* copy buffer to paths[idx] */
}
*n = idx; /* update 'n' to contain index - no. of lines read */
return paths; /* return paths */
}
int main (int argc, char **argv) {
char **paths; /* pointer to pointer to char for paths */
size_t i, n = 0; /* counter and n - number of paths read */
/* open file given by 1st argument (or read stdin by default) */
FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;
if (!fp) { /* validate file open for reading */
perror ("fopen-failed");
return 1;
}
paths = readcmdfile (fp, &n); /* call function to read file */
/* passing open file pointer */
if (!paths) { /* validate return from function */
fprintf (stderr, "error: readcmdfile failed.\n");
return 1;
}
for (i = 0; i < n; i++) { /* output lines read from file */
printf ("path[%zu]: %s\n", i + 1, paths[i]);
free (paths[i]); /* free memory holding line */
}
free (paths); /* free pointers */
return 0;
}
(note: if you allocate memory -- it is up to you to preserve a pointer to the beginning of each block -- so it can be freed when it is no longer needed)
Example Input File
$ cat paths.txt
/home/bla/dirname
/home/bla/bla/file1.txt
/home/bla/bla/file2.txt
Example Use/Output
$ ./bin/readpaths <paths.txt
path[1]: /home/bla/dirname
path[2]: /home/bla/bla/file1.txt
path[3]: /home/bla/bla/file2.txt
As you can see, the function has simply allocated 3 pointers, read each line of the input file, allocated storage for each line, assigned the address of each block to the corresponding pointer, and then returned a pointer to the collection to main(), where it is assigned to paths. Look things over and let me know if you have further questions.
I recommend looking into regular expressions. That way you read everything, then match with regular expressions and handle your matches.
Regular expressions exist for this purpose: to make parsing elegant.
If I were you, I would create a method for the if/else blocks. They look redundant to me.
switch(updateParam) {
case UPDATE1:
method(); /*do if/else here*/
break;
...............
...............
}
However, you can still leave them in place if you do not need the method elsewhere and you are concerned about performance, since a function call costs more than the equivalent inline instructions.
In your program, you are passing 3 arrays of char to store the 3 lines read from the file. This is inflexible, as the input file may contain more lines and in future you may be required to read more than 3 lines from the file. Instead, you can pass an array of char pointers, allocate memory for each of them, and copy in the content of the lines read from the file. As pointed out by Jonathan (in a comment), if you use standard I/O then you can use a function like fgets() to read lines from the input file.
Read a line from the file, allocate memory for the pointer, and copy the line into it. If the line is too long, you can read the remaining part in consecutive calls to fgets() and use realloc to expand the existing memory the pointer refers to, so that it is large enough to accommodate the remaining part of the line.
Putting these all together, you can do:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define BUF_SZ 100
#define MAX_LINES 3 /* Maximum number of lines to be read from file */
int readCMDFile(const char* cmdFile, char *paths[MAX_LINES]) {
int count, next_line, line_cnt, new_line_found;
char tmpbuf[BUF_SZ];
FILE *fp;
fp = fopen(cmdFile, "r");
if (fp == NULL) {
perror ("Failed to open file");
return -1;
}
next_line = 1; /* Keep track of next line */
count = 1; /* Used to calculate the size of memory, if need to reallocate
* in case when a line in the file is too long to read in one go */
line_cnt = 0; /* Keep track of index of array of char pointer */
new_line_found = 0;
while ((line_cnt < MAX_LINES) && (fgets (tmpbuf, BUF_SZ, fp) != NULL)) {
if (tmpbuf[strlen(tmpbuf) - 1] == '\n') {
tmpbuf[strlen(tmpbuf) - 1] = '\0';
new_line_found = 1;
} else {
new_line_found = 0;
}
if (next_line) {
paths[line_cnt] = calloc (sizeof (tmpbuf), sizeof (char));
if (paths[line_cnt] == NULL) {
perror ("Failed to allocate memory");
return -1;
}
next_line = 0;
count = 1;
} else {
char *ptr = realloc (paths[line_cnt], sizeof (tmpbuf) * (++count));
if (ptr == NULL) {
free (paths[line_cnt]);
perror ("Failed to reallocate memory");
return -1;
} else {
paths[line_cnt] = ptr;
}
}
/* Using strcat to copy the buffer to allocated memory because
* calloc initialize the block of memory with zero, so it will
* be same as strcpy when first time copying the content of buffer
* to the allocated memory and fgets add terminating null-character
* to the buffer so, it will concatenate the content of buffer to
* allocated memory in case when the pointer is reallocated */
strcat (paths[line_cnt], tmpbuf);
if (new_line_found) {
line_cnt++;
next_line = 1;
}
}
if (!next_line) /* count a final line that has no trailing newline */
line_cnt++;
fclose(fp);
return line_cnt;
}
int main(void) {
int lines_read, index;
const char *file_name = "cmdfile.txt";
char *paths[MAX_LINES] = {NULL};
lines_read = readCMDFile(file_name, paths);
if (lines_read < 0) {
printf ("Failed to read file %s\n", file_name);
}
/* Check the output */
for (index = 0; index < lines_read; index++) {
printf ("Line %d: %s\n", index, paths[index]);
}
/* Free the allocated memory */
for (index = 0; index < lines_read; index++) {
free (paths[index]);
paths[index] = NULL;
}
return 0;
}
Output:
$ cat cmdfile.txt
/home/bla/dirname
/home/bla/bla/file1.txt
/home/bla/bla/file2.txt
$ ./a.out
Line 0: /home/bla/dirname
Line 1: /home/bla/bla/file1.txt
Line 2: /home/bla/bla/file2.txt
Note that the above program is not taking care of empty lines in the file as it has not been mentioned in the question. But if you want, you can add that check just after removing the trailing newline character from the line read from the file.
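A blank-line check of that kind can be very small. The following is a sketch only; is_blank is a hypothetical helper name, and treating whitespace-only lines as empty is an assumption about what "empty" should mean here.

```c
#include <ctype.h>

/* Hypothetical helper: returns 1 if 's' is empty or contains only
 * whitespace, 0 otherwise. */
int is_blank (const char *s)
{
    while (*s) {
        if (!isspace ((unsigned char)*s))   /* cast avoids UB for negative char */
            return 0;
        s++;
    }
    return 1;
}
```

In readCMDFile it could be called on tmpbuf right after the newline is stripped, skipping the rest of the iteration when the chunk is a complete line (next_line and new_line_found both set) and is_blank (tmpbuf) returns 1.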

C copy substring from text file

Say I have the following text file -
name:asdfg
address:zcvxz
,
name:qwerwer
address:zxcvzxcvxz
,
And I wanna copy the name (without "name:") to a certain string variable, the address to another and so on.
How do I do so without corrupting memory?
Tried using (example) -
char buf[50];
while (fgets(buf, 50, file) != NULL) {
if (!strncmp(buf, "name", 4))
strncpy(somestring, buf + 5, 20)
//do the same for address, continue looping
but the text lines differ in length, so it seems to copy all sorts of crap from the buffer, as the strings aren't null terminated, so it copies "asdfgcrapcrapcrap".
You are to be commended for using fgets to handle your file I/O as it provides a much more flexible and robust way to read, validate and prepare to parse the lines of data you read. It is generally the recommended way to do line-oriented input (either from a file or from the user). However, this is one of those circumstances where treating multiple records as formatted input does have some advantages.
Let's start with an example reading your data file and capturing the name:.... and address:... data in a simple data structure to hold both the name and address data values in a 20-char array for each. Each line is read, the length is validated, the trailing '\n' is removed and then strchr is used to locate the ':' in the line. (we don't care about lines without ':'). The label before ':' is copied to tmp and then compared against "name" or "address" to determine which value to read. Once the address data is read, both name and addr values are printed to stdout:
#include <stdio.h>
#include <string.h>
enum { MAXC = 20, MAXS = 256 };
typedef struct {
char name[MAXC],
addr[MAXC];
} data;
int main (int argc, char **argv) {
char buf[MAXS] = "",
*name = "name", /* name/address literals for comparison */
*addr = "address";
data mydata = { .name = "" };
FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;
if (!fp) { /* validate file open for reading */
fprintf (stderr, "error: file open failed '%s'.\n", argv[1]);
return 1;
}
while (fgets (buf, MAXS, fp)) { /* read each line */
char *p = buf, /* pointer to use with strchr */
tmp[MAXC] = ""; /* storage for labels */
size_t len = strlen (buf); /* get buf len */
if (len && buf[len - 1] == '\n') /* validate last char is '\n' */
buf[--len] = 0; /* overwrite with nul-character */
else if (len + 1 == MAXS) { /* handle string too long */
fprintf (stderr, "error: line too long or no '\\n'\n");
return 1;
}
if ((p = strchr (buf, ':'))) { /* find ':' in buf */
size_t labellen = p - buf, /* get length of label */
datalen = strlen (p + 1); /* get length of data */
if (labellen + 1 > MAXC) { /* validate both lengths */
fprintf (stderr, "error: label exceeds '%d' chars.\n", MAXC);
return 1;
}
if (datalen + 1 > MAXC) {
fprintf (stderr, "error: data exceeds '%d' chars.\n", MAXC);
return 1;
}
strncpy (tmp, buf, labellen); /* copy label to temp */
tmp[labellen] = 0; /* nul-terminate */
if (strcmp (name, tmp) == 0) /* is the label "name" ? */
strcpy (mydata.name, p + 1);
else if (strcmp (addr, tmp) == 0) { /* is the label "address" ? */
strcpy (mydata.addr, p + 1);
/* record complete -- output results */
printf ("\nname : %s\naddr : %s\n", mydata.name, mydata.addr);
}
}
}
if (fp != stdin) fclose (fp); /* close file if not stdin */
return 0;
}
(note: there are many ways to structure this logic. The example above just represents a semi-standard method)
Example Use/Output
$ ./bin/nameaddr <dat/nameaddr.txt
name : asdfg
addr : zcvxz
name : qwerwer
addr : zxcvzxcvxz
Here is where I will have a tough time convincing you that fgets was the way to go for this problem. Why? Here we are essentially reading formatted input that is composed of 3 lines of data. The format string for fscanf doesn't care how many lines are involved, and can easily be constructed to skip '\n' within the formatted input. This can provide a more fragile, but attractive, alternative for the right input files.
For example, the code above can be reduced to the following using fscanf for a formatted read:
#include <stdio.h>
#define MAXC 20
typedef struct {
char name[MAXC],
addr[MAXC];
} data;
int main (int argc, char **argv) {
data mydata = { .name = "" };
FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;
if (!fp) { /* validate file open for reading */
fprintf (stderr, "error: file open failed '%s'.\n", argv[1]);
return 1;
}
/* read 3-lines at a time separating name and address at once */
while (fscanf (fp, " name:%19s address:%19s ,",
mydata.name, mydata.addr) == 2)
printf ("\nname : %s\naddr : %s\n", mydata.name, mydata.addr);
if (fp != stdin) fclose (fp); /* close file if not stdin */
return 0;
}
(the output is the same)
In the rare case, for the correct data file, fscanf can provide a viable alternative to a line-oriented read with fgets. However, your first choice should remain a line-oriented approach using either fgets or POSIX getline.
Look both over and let me know if you have further questions.
If the name is 20 characters or longer, strncpy() won't copy the null terminator to the destination string, so you need to add it yourself.
strncpy(somestring, buf + 5, 19);
somestring[19] = '\0';
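The same bounded copy can be wrapped in a helper that also drops the label and the trailing newline. This is a sketch under assumptions: copy_field is a hypothetical name, and silently truncating an over-long value (rather than reporting an error) is a design choice made only for the example.

```c
#include <string.h>

/* Hypothetical helper: if 'line' starts with 'label' followed by ':',
 * copy the value into 'dst' (capacity 'size'), without the trailing newline
 * and always nul-terminated. Returns 1 on a match, 0 otherwise. */
int copy_field (char *dst, size_t size, const char *line, const char *label)
{
    size_t lablen = strlen (label);

    if (size == 0 || strncmp (line, label, lablen) != 0 || line[lablen] != ':')
        return 0;

    const char *value = line + lablen + 1;      /* first char after ':' */
    size_t vallen = strcspn (value, "\n");      /* length up to the newline */

    if (vallen >= size)                         /* truncate to fit */
        vallen = size - 1;
    memcpy (dst, value, vallen);
    dst[vallen] = 0;                            /* explicit terminator */
    return 1;
}
```

Inside the fgets loop from the question, copy_field (somestring, sizeof somestring, buf, "name") would then replace the strncmp/strncpy pair, provided somestring is an array rather than a pointer.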

C program to copy .csv of integers copies one less element unless element size is set to +1

I'm new to learning the C language and I wanted to write a simple program that would copy an array of integers from one .csv file to a new .csv file. My code works as intended; however, when my array size for fread/fwrite is set to the exact number of elements in the .csv array (10 in this case), it only copies nine of the elements.
When the array size is set to +1, it copies all the elements.
#include <stdio.h>
#include <stdlib.h>
#define LISTSIZE 11
//program that copies an array of integers from one .csv to another .csv
int main(int argc, char * argv[])
{
if (argc != 2)
{
fprintf(stderr, "Usage ./file_sort file.csv\n");
return 1;
}
char * csvfile = argv[1];
FILE * input_csvile = fopen(csvfile, "r"); //open .csv file and create file pointer input_csvile
if(input_csvile == NULL)
{
fprintf(stderr, "Error, Could not open\n");
return 2;
}
unsigned int giving_total[LISTSIZE];
if(input_csvile != NULL) //after file opens, read array from .csv input file
{
fread(giving_total, sizeof(int), LISTSIZE, input_csvile);
}
else
fprintf(stderr, "Error\n");
FILE * printed_file = fopen("school_currentfy1.csv", "w");
if (printed_file != NULL)
{
fwrite(giving_total, sizeof(int), LISTSIZE, printed_file); //copy array of LISTSIZE integers to new file
}
else
fprintf(stderr, "Error\n");
fclose(printed_file);
fclose(input_csvile);
return 0;
}
Does this have something to do with the array being 0-indexed and the .csv file being 1-indexed? I also had an output with the LISTSIZE of 11 which had the last (10) element being displayed incorrectly; 480 instead of 4800.
http://imgur.com/lLOozrc Output/input with LISTSIZE of 10
http://imgur.com/IZPGwsA Input/Output with LISTSIZE of 11
Note: as noted in the comment, fread and fwrite are for reading and writing binary data, not text. If you are dealing with a .csv (comma-separated values, e.g. as exported from MS Excel or Open/LibreOffice Calc), you will need to use fgets (or any other character/string-oriented function) followed by sscanf (or strtol, strtoul) to read the values as text and convert them to int values. To write the values to your output file, use fprintf. (fscanf is also available for input text processing and conversion, but you lose flexibility in handling variations in input format.)
However, if your goal was to read binary data for 10 integers (e.g. 40 bytes of data, assuming a 4-byte int), then fread and fwrite are fine, but as with all input/output routines, you need to validate the number of items read and written to ensure you are dealing with valid data within your code (and that you have a valid output data file when you are done).
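For the binary case, a minimal sketch of that validation might look like the following. The name copy_binary_ints and its parameters are hypothetical, and it assumes both files hold raw int values written on the same machine. Note that fread and fwrite return a count of items, not bytes.

```c
#include <stdio.h>

/* Hypothetical helper: copy up to max ints stored in binary form from the
 * file named in to the file named out, using buf as storage. Returns the
 * number of ints copied, or -1 on an open, write or close failure. */
long copy_binary_ints (const char *in, const char *out, int *buf, size_t max)
{
    FILE *fp = fopen (in, "rb");
    if (!fp)
        return -1;
    size_t nread = fread (buf, sizeof *buf, max, fp);   /* items, not bytes */
    fclose (fp);

    fp = fopen (out, "wb");
    if (!fp)
        return -1;
    size_t nwritten = fwrite (buf, sizeof *buf, nread, fp);
    if (fclose (fp) != 0 || nwritten != nread)   /* validate write and close */
        return -1;
    return (long) nread;
}
```

Only the nread items actually read are written back, so a short input file never causes uninitialized array elements to be written.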
There are many ways to read a .csv file, depending on the format. One generic way is to read each line of text with fgets and then repeatedly call sscanf to convert each value. (This has a number of advantages over fscanf in handling different spacing around the ','.) You read each line, assign a pointer to the beginning of the buffer filled by fgets, and call sscanf with %n to obtain the number of characters processed by each call. You then advance the pointer by that number and scan forward in the buffer until the next '-' (for negative values) or digit is encountered. (Using %n and scanning forward can allow fscanf to be used in a similar manner.) For example:
/* read each line until LISTSIZE integers read or EOF */
while (numread < LISTSIZE && fgets (buf, MAXC, fp)) {
int nchars = 0; /* number of characters processed by sscanf */
char *p = buf; /* pointer to line */
/* (you should check a whole line is read here) */
/* while chars remain in buf, less than LISTSIZE ints read
* and a valid conversion to int performed by sscanf, update p
* to point to start of next number.
*/
while (*p && numread < LISTSIZE &&
sscanf (p, "%d%n", &giving_total[numread], &nchars) == 1) {
numread++; /* increment the number read */
p += nchars; /* move p nchars forward in buf */
/* find next digit in buf */
while (*p && *p != '-' && (*p < '0' || *p > '9'))
p++;
}
}
Now, to create your output file, you simply write numread values back out in comma-separated value format. (You can adjust how many you write per line as required.)
for (i = 0; i < numread; i++) /* write in csv format */
fprintf (fp, i ? ",%d" : "%d", giving_total[i]);
fputc ('\n', fp); /* tidy up -- make sure file ends with '\n' */
Then it is just a matter of closing your output file and checking for any stream errors (always check the return of fclose after writing to a file):
if (fclose (fp)) /* always validate close after write to */
perror("error"); /* validate no stream errors occurred */
Putting it all together, you could do something similar to the following:
#include <stdio.h>
#include <stdlib.h>
#define LISTSIZE 10
#define MAXC 256
int main(int argc, char *argv[])
{
if (argc < 2) { /* input file required, output file optional */
fprintf(stderr, "Usage ./file_sort file.csv [outfile]\n");
return 1;
}
int giving_total[LISTSIZE]; /* change to int to handle negative values */
size_t i, numread = 0; /* generic i and number of integers read */
char *csvfile = argv[1],
buf[MAXC] = ""; /* buffer to hold MAXC chars of text */
FILE *fp = fopen (csvfile, "r");
if (fp == NULL) { /* validate csvfile open for reading */
fprintf(stderr, "Error, Could not open input file.\n");
return 2;
}
/* read each line until LISTSIZE integers read or EOF */
while (numread < LISTSIZE && fgets (buf, MAXC, fp)) {
int nchars = 0; /* number of characters processed by sscanf */
char *p = buf; /* pointer to line */
/* (you should check a whole line is read here) */
/* while chars remain in buf, less than LISTSIZE ints read
* and a valid conversion to int performed by sscanf, update p
* to point to start of next number.
*/
while (*p && numread < LISTSIZE &&
sscanf (p, "%d%n", &giving_total[numread], &nchars) == 1) {
numread++; /* increment the number read */
p += nchars; /* move p nchars forward in buf */
/* find next digit in buf */
while (*p && *p != '-' && (*p < '0' || *p > '9'))
p++;
}
}
if (numread < LISTSIZE) /* warn if less than LISTSIZE integers read */
fprintf (stderr, "Warning: only '%zu' integers read from file\n", numread);
fclose (fp); /* close input file */
fp = fopen (argc > 2 ? argv[2] : "outfile.csv", "w"); /* open output file */
if (fp == NULL) { /* validate output file open for writing */
fprintf(stderr, "Error, Could not open output file.\n");
return 3;
}
for (i = 0; i < numread; i++) /* write in csv format */
fprintf (fp, i ? ",%d" : "%d", giving_total[i]);
fputc ('\n', fp); /* tidy up -- make sure file ends with '\n' */
if (fclose (fp)) /* always validate close after write to */
perror("error"); /* validate no stream errors occurred */
return 0;
}
Like I said, there are many, many ways to approach this. The idea is to build as much flexibility into your read as possible so it can handle any variations in the input format without choking. Another very robust way to approach the read is using strtol (or strtoul for unsigned values). Both will advance a pointer for you to the next character following the integer converted, so you can start your scan for the next digit from there.
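As a rough sketch of the strtol approach (parse_ints is a hypothetical helper, not part of the program above), the end pointer that strtol fills in replaces the %n bookkeeping used with sscanf:

```c
#include <stdlib.h>
#include <errno.h>
#include <limits.h>

/* Hypothetical helper: convert up to max integers found in the string s
 * into out[], skipping any separators. Returns the number converted. */
size_t parse_ints (const char *s, int *out, size_t max)
{
    size_t n = 0;
    while (*s && n < max) {
        char *end;
        errno = 0;
        long v = strtol (s, &end, 10);
        if (end == s) {          /* no number starts here: skip one char */
            s++;
            continue;
        }
        if (errno == 0 && v >= INT_MIN && v <= INT_MAX)
            out[n++] = (int) v;  /* keep only values that fit in an int */
        s = end;                 /* resume right after the converted number */
    }
    return n;
}
```

Calling parse_ints (buf, giving_total + numread, LISTSIZE - numread) on each line read by fgets and adding the return value to numread would stand in for the inner sscanf loop. Unlike the sscanf version, out-of-range values are detected and skipped instead of being silently converted.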
An example of the read flexibility provided by either of these approaches is shown below: reading a file of any number of lines, with values separated by any separator, and converting each integer encountered to a value in your array, e.g.
Example Input
$ cat ../dat/10int.csv
8572, -2213, 6434, 16330, 3034
12346, 4855, 16985, 11250, 1495
Example Program Use
$ ./bin/fgetscsv ../dat/10int.csv dat/outfile.csv
Example Output File
$ cat dat/outfile.csv
8572,-2213,6434,16330,3034,12346,4855,16985,11250,1495
Look things over and let me know if you have questions. If your intent was to read 40-bytes in binary form, just let me know and I'm happy to help with an example there.
If you want a truly generic read of values in a file, you can tweak the code that finds the number in the input file to scan forward in the file and validate that any '-' is followed by a digit. This allows reading any format and simply picking the integers from the file. For example with the following minor change:
while (*p && numread < LISTSIZE) {
if (sscanf (p, "%d%n", &giving_total[numread], &nchars) == 1)
numread++; /* increment the number read */
p += nchars; /* move p nchars forward in buf */
/* find next number in buf */
for (; *p; p++) {
if (*p >= '0' && *p <= '9') /* positive value */
break;
if (*p == '-' && *(p+1) >= '0' && *(p+1) <= '9') /* negative */
break;
}
}
You can easily process the following file and obtain the same results:
$ cat ../dat/10intmess.txt
8572,;a -2213,;--a 6434,;
a- 16330,;a
- The Quick
Brown%3034 Fox
12346Jumps Over
A
4855,;*;Lazy 16985/,;a
Dog.
11250
1495
Example Program Use
$ ./bin/fgetscsv ../dat/10intmess.txt dat/outfile2.csv
Example Output File
$ cat dat/outfile2.csv
8572,-2213,6434,16330,3034,12346,4855,16985,11250,1495

How to read in two text files and count the amount of keywords?

I have tried looking around, but files are the hardest thing for me to understand so far as I learn C, especially text files; binary files were a bit easier. Basically, I have to read in two text files that both contain words formatted like this: "hard, working,smart, works well, etc.." I am supposed to compare the text files and count the keywords. I would show some code, but honestly I am lost, and apart from the code below all I have is nonsense.
#include <time.h>
#include <stdlib.h>
#include <stdio.h>
#define SIZE 1000
void resumeRater();
int main()
{
int i;
int counter = 0;
char array[SIZE];
char keyword[SIZE];
FILE *fp1, *fp2;
int ch1, ch2;
errno_t result1 = fopen_s(&fp1, "c:\\myFiles\\resume.txt", "r");
errno_t result2 = fopen_s(&fp2, "c:\\myFiles\\ideal.txt", "r");
if (fp1 == NULL) {
printf("Failed to open");
}
else if (fp2 == NULL) {
printf("Failed to open");
}
else {
result1 = fread(array, sizeof(char), 1, fp1);
result2 = fread(keyword, sizeof(char), 1, fp2);
for (i = 0; i < SIZE; i++)
{
if (array[i] == keyword[i])
{
counter++;
}
}
fclose(fp1);
fclose(fp2);
printf("Character match: %d", counter);
}
system("pause");
}
When you need to do the same thing more than once (like reading 2 files), it makes a lot of sense to plan ahead. Rather than muddying the body of main with all the code necessary to read 2 text files, create a function that reads a text file for you and returns an array containing the lines of the file. This lets you concentrate on the logic of what your code needs to do with the lines rather than on getting the lines in the first place. There is nothing wrong with cramming it all into one long main, but from a readability, maintenance, and program structure standpoint, it makes everything more difficult.
If you structure the read function well, you can reduce your main to the following. This reads both text files into arrays of strings and provides the number of lines read in a total of 4 lines (plus the check to make sure you provided two filenames to read):
int main (int argc, char **argv) {
if (argc < 3 ) {
fprintf (stderr, "error: insufficient input, usage: %s <filename1> <filename2>\n", argv[0]);
return 1;
}
size_t file1_size = 0; /* placeholders to be filled by readtxtfile */
size_t file2_size = 0; /* for general use, not needed to iterate */
/* read each file into an array of strings,
number of lines read, returned in file_size */
char **file1 = readtxtfile (argv[1], &file1_size);
char **file2 = readtxtfile (argv[2], &file2_size);
return 0;
}
At that point you have all your data and you can work on your keyword code. Reading from text files is a very simple matter. You just have to get comfortable with the tools available. When reading lines of text, the preferred approach is to use line-oriented input to read an entire line at a time into a buffer. You then parse the buffer to get what you need. The line-input tools are fgets and getline. Once you have read the line, you then have tools like strtok, strsep or sscanf to separate what you want from the line. Both fgets and getline read the newline at the end of each line as part of their input, so you may need to remove the newline to meet your needs.
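For comma-separated keywords like those in the question, one possible sketch of the "remove the newline, then separate" step uses strcspn and strtok. The helper name split_keywords is hypothetical, and it assumes the line can be modified in place:

```c
#include <string.h>

/* Hypothetical helper: split line in place on commas, removing the trailing
 * newline and any leading spaces from each token. Stores up to max pointers
 * into words[] and returns how many were stored. The pointers refer to the
 * (modified) line buffer, so it must outlive them. */
size_t split_keywords (char *line, char **words, size_t max)
{
    size_t n = 0;
    line[strcspn (line, "\r\n")] = '\0';        /* remove trailing newline */
    for (char *tok = strtok (line, ","); tok && n < max;
               tok = strtok (NULL, ",")) {
        tok += strspn (tok, " \t");             /* skip leading whitespace */
        if (*tok)                               /* ignore empty tokens */
            words[n++] = tok;
    }
    return n;
}
```

Trailing whitespace inside a token is left alone in this sketch, and because strtok keeps internal state, the function is not safe to call from more than one thread at a time.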
Storing each line read is generally done by declaring a pointer to pointer to char (e.g. char **file1;). You then allocate memory for some initial number of pointers (NMAX in the example below). You then access the individual lines in the file as file1[n], where n is the line index from 0 to the last line of the file. If you have a large file and exceed the number of pointers you originally allocated, you simply reallocate additional pointers for your array with realloc. (You can set NMAX to 1 to force reallocation to start almost immediately, which is useful for testing.)
What you use to allocate memory and how you reallocate can influence how you make use of the arrays in your program. Using calloc to initially allocate your arrays, and then using memset when you reallocate to set all unused pointers to 0 (null), can really save you time and headaches. Why? Because, to iterate over your array, all you need to do is:
n = 0;
while (file1[n]) {
<do something with file1[n]>;
n++;
}
When you reach the first unused pointer (i.e. the first file1[n] that is 0), the loop stops.
Another very useful function when reading text files is strdup (char *line). strdup will automatically allocate space for line using malloc, copy line to the newly allocated memory, and return a pointer to the new block of memory. This means that all you need to do to allocate space for each pointer and copy the line read by getline to your array is:
file1[n] = strdup (line);
That's pretty much it. You have read your file, filled your array, and know how to iterate over each line in the array. What is left is cleaning up and freeing the memory allocated when you no longer need it. By making sure that your unused pointers are 0, this too is a snap. You simply iterate over your file1[n] pointers again, freeing them as you go, and then free (file1) at the end. You're done.
This is a lot to take in, and there are a few more things to it. On the initial read of the file, if you noticed, we also declare a file1_size = 0; variable, and pass its address to the read function:
char **file1 = readtxtfile (argv[1], &file1_size);
Within readtxtfile, the value at the address of file1_size is incremented by 1 each time a line is read. When readtxtfile returns, file1_size contains the number of lines read. As shown, this is not needed to iterate over the file1 array, but you often need to know how many lines you have read.
To put this all together, I created a short example of the functions to read two text files, print the lines in both and free the memory associated with the file arrays. This explanation ended up longer than I anticipated. So take time to understand how it works, and you will be a step closer to handling textfiles easily. The code below will take 2 filenames as arguments (e.g. ./progname file1 file2) Compile it with something similar to gcc -Wall -Wextra -o progname srcfilename.c:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define NMAX 256
char **readtxtfile (char *fn, size_t *idx);
char **realloc_char (char **p, size_t *n);
void prn_chararray (char **ca);
void free_chararray (char **ca);
int main (int argc, char **argv) {
if (argc < 3 ) {
fprintf (stderr, "error: insufficient input, usage: %s <filename1> <filename2>\n", argv[0]);
return 1;
}
size_t file1_size = 0; /* placeholders to be filled by readtxtfile */
size_t file2_size = 0; /* for general use, not needed to iterate */
/* read each file into an array of strings,
number of lines read, returned in file_size */
char **file1 = readtxtfile (argv[1], &file1_size);
char **file2 = readtxtfile (argv[2], &file2_size);
/* simple print function */
if (file1) prn_chararray (file1);
if (file2) prn_chararray (file2);
/* simple free memory function */
if (file1) free_chararray (file1);
if (file2) free_chararray (file2);
return 0;
}
char** readtxtfile (char *fn, size_t *idx)
{
if (!fn) return NULL; /* validate filename provided */
char *ln = NULL; /* NULL forces getline to allocate */
size_t n = 0; /* max chars to read (0 - no limit) */
ssize_t nchr = 0; /* number of chars actually read */
size_t nmax = NMAX; /* check for reallocation */
char **array = NULL; /* array to hold lines read */
FILE *fp = NULL; /* file pointer to open file fn */
/* open / validate file */
if (!(fp = fopen (fn, "r"))) {
fprintf (stderr, "%s() error: file open failed '%s'.\n", __func__, fn);
return NULL;
}
/* allocate NMAX pointers to char* */
if (!(array = calloc (NMAX, sizeof *array))) {
fprintf (stderr, "%s() error: memory allocation failed.\n", __func__);
fclose (fp); /* don't leak the open stream on failure */
return NULL;
}
/* read each line from fp - dynamicallly allocated */
while ((nchr = getline (&ln, &n, fp)) != -1)
{
/* strip newline or carriage rtn */
while (nchr > 0 && (ln[nchr-1] == '\n' || ln[nchr-1] == '\r'))
ln[--nchr] = 0;
array[*idx] = strdup (ln); /* allocate/copy ln to array */
(*idx)++; /* increment value at index */
if (*idx == nmax) /* if lines exceed nmax, reallocate */
array = realloc_char (array, &nmax);
}
if (ln) free (ln); /* free memory allocated by getline */
if (fp) fclose (fp); /* close open file descriptor */
return array;
}
/* print an array of character pointers. */
void prn_chararray (char **ca)
{
register size_t n = 0;
while (ca[n])
{
printf (" arr[%3zu] %s\n", n, ca[n]);
n++;
}
}
/* free array of char* */
void free_chararray (char **ca)
{
if (!ca) return;
register size_t n = 0;
while (ca[n])
free (ca[n++]);
free (ca);
}
/* realloc an array of pointers to strings setting memory to 0.
* reallocate an array of character arrays setting
* newly allocated memory to 0 to allow iteration
*/
char **realloc_char (char **p, size_t *n)
{
char **tmp = realloc (p, 2 * *n * sizeof *p);
if (!tmp) {
fprintf (stderr, "%s() error: reallocation failure.\n", __func__);
// return NULL;
exit (EXIT_FAILURE);
}
p = tmp;
memset (p + *n, 0, *n * sizeof *p); /* memset new ptrs 0 */
*n *= 2;
return p;
}
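With both files loaded into NULL-terminated arrays, the keyword count the question asks about could be sketched roughly as below. This is an assumption-laden illustration, not part of the program above: count_keywords is a hypothetical helper, it assumes each entry of keys holds a single keyword, and it uses a plain case-sensitive substring match with strstr.

```c
#include <string.h>

/* Hypothetical helper: count how many strings in keys[] occur as a
 * substring of at least one string in text[]. Both arrays must end with a
 * NULL pointer, as the arrays returned by readtxtfile do. */
size_t count_keywords (char **text, char **keys)
{
    size_t count = 0;
    for (size_t k = 0; keys[k]; k++)
        for (size_t t = 0; text[t]; t++)
            if (*keys[k] && strstr (text[t], keys[k])) {
                count++;
                break;          /* count each keyword at most once */
            }
    return count;
}
```

If the keyword file stores several comma-separated keywords per line, each line would first need to be split into individual keywords before being passed in as keys.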
valgrind - Don't Forget To Check For Leaks
Lastly, anytime you allocate memory in your code, make sure you use a memory checker such as valgrind to confirm you have no memory errors and to confirm you have no memory leaks (i.e. allocated blocks you have forgotten to free, or that have become unreachable). valgrind is simple to use, just valgrind ./progname [any arguments]. It can provide a wealth of information. For example, on this read example:
$ valgrind ./bin/getline_readfile_fn voidstruct.c wii-u.txt
==14690== Memcheck, a memory error detector
==14690== Copyright (C) 2002-2012, and GNU GPL'd, by Julian Seward et al.
==14690== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info
==14690== Command: ./bin/getline_readfile_fn voidstruct.c wii-u.txt
==14690==
<snip - program output>
==14690==
==14690== HEAP SUMMARY:
==14690== in use at exit: 0 bytes in 0 blocks
==14690== total heap usage: 61 allocs, 61 frees, 6,450 bytes allocated
==14690==
==14690== All heap blocks were freed -- no leaks are possible
==14690==
==14690== For counts of detected and suppressed errors, rerun with: -v
==14690== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 2 from 2)
Pay particular attention to the lines:
==14690== All heap blocks were freed -- no leaks are possible
and
==14690== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 2 from 2)
You can ignore the (suppressed: 2 from 2), which just indicates I don't have the development files installed for libc.

Resources