Reading text file into an array of lines in C - c

Using C, I would like to read the contents of a text file so that I end up with an array of strings, where the nth string holds the nth line of the file. The lines of the file can be arbitrarily long.
What's an elegant way of accomplishing this? I know of some neat tricks to read a text file directly into a single appropriately sized buffer, but breaking it down into lines makes it trickier (at least as far as I can tell).
Thanks very much!

Breaking it down into lines means parsing the text and replacing all the EOL characters (by EOL I mean \n and \r) with 0.
This way you can reuse your buffer and store just the start of each line in a separate char * array (all in only 2 passes).
You then do one read for the whole file plus 2 passes over the buffer, which will probably improve performance.

It's possible to read the number of lines in the file (loop fgets), then create a 2-dimensional array with the first dimension being the number of lines+1. Then, just re-read the file into the array.
You'll need to define the length of the elements, though. Or, do a count for the longest line size.
Example code:
inFile = fopen(FILENAME, "r");
lineCount = 0;
// line is a char[MAX_LINE] scratch buffer; this assumes that no line
// is longer than MAX_LINE - 1 characters
while (fgets(line, MAX_LINE, inFile) != NULL)
lineCount++;
fclose(inFile);
// One extra row (row 0 stays unused) so that the array index
// matches the line number
char names[lineCount + 1][MAX_LINE];
inFile = fopen(FILENAME, "r");
for (i = 1; i <= lineCount; i++)
fgets(names[i], MAX_LINE, inFile);
fclose(inFile);

For C (as opposed to C++), you'd probably wind up using fgets(). However, you might run into issues due to your arbitrary length lines.
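One way around the fixed-size limit of fgets() is to keep calling it and grow a heap buffer until the newline arrives. Below is a minimal sketch of that idea; read_long_line() is a hypothetical helper name, not a standard function, and the starting capacity of 64 is an arbitrary assumption.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: reads one line of any length from fp.
 * Returns a malloc'd string without the trailing '\n', or NULL at
 * end of file or on allocation failure. The caller must free() it. */
char *read_long_line(FILE *fp)
{
    size_t cap = 64, len = 0;
    char *line = malloc(cap);

    if (line == NULL)
        return NULL;
    while (fgets(line + len, (int)(cap - len), fp) != NULL) {
        len += strlen(line + len);
        if (len > 0 && line[len - 1] == '\n') {
            line[len - 1] = '\0';       /* complete line: strip newline */
            return line;
        }
        /* No newline yet: make room and read the rest of the line. */
        char *tmp = realloc(line, cap * 2);
        if (tmp == NULL) {
            free(line);
            return NULL;
        }
        line = tmp;
        cap *= 2;
    }
    if (len == 0) {                     /* nothing read: end of file */
        free(line);
        return NULL;
    }
    return line;                        /* last line had no newline */
}
```

Calling this in a loop and storing each returned pointer in a growing array of char * gives the array of lines the question asks for.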

Perhaps a Linked List would be the best way to do this?
The compiler won't like having an array with no idea how big to make it. With a Linked List you can have a really large text file, and not worry about allocating enough memory to the array.
Unfortunately, I haven't learned how to do linked lists, but maybe somebody else could help you.

If you have a good way to read the whole file into memory, you are almost there. After you've done that you could scan the buffer twice: once to count the lines, and once to set the line pointers and replace '\n' (and maybe '\r' if the file is read in Windows binary mode) with '\0'. In between scans allocate an array of pointers, now that you know how many you need.
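The two scans described above could look roughly like this. It is only a sketch: split_lines() is a made-up name, and it assumes the file has already been read into a NUL-terminated, writable buffer.

```c
#include <stdlib.h>

/* Hypothetical helper: splits a NUL-terminated buffer into lines in place.
 * Pass 1 counts the lines; pass 2 replaces each '\n' with '\0' and records
 * where each line starts. Returns a malloc'd array of pointers into buf
 * (free only the array, not the individual lines) and stores the line
 * count in *count. Returns NULL on allocation failure. */
char **split_lines(char *buf, size_t *count)
{
    size_t n = 0, i = 0;
    char *p;
    char **lines;

    /* Pass 1: one line per '\n', plus a final line with no newline. */
    for (p = buf; *p != '\0'; p++)
        if (*p == '\n')
            n++;
    if (p != buf && p[-1] != '\n')
        n++;

    lines = malloc((n ? n : 1) * sizeof *lines);
    if (lines == NULL)
        return NULL;

    /* Pass 2: terminate each line and remember where it starts. */
    p = buf;
    while (i < n) {
        lines[i++] = p;
        while (*p != '\0' && *p != '\n')
            p++;
        if (*p == '\n') {
            if (p != buf && p[-1] == '\r')
                p[-1] = '\0';           /* also drop '\r' from "\r\n" */
            *p++ = '\0';
        }
    }
    *count = n;
    return lines;
}
```

The returned pointers point into the original buffer, so the buffer has to stay allocated for as long as the array is in use.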

you can use this way
#include <stdlib.h> /* exit, malloc, realloc, free */
#include <stdio.h> /* fopen, fgetc, fputs, fwrite */
struct line_reader {
/* All members are private. */
FILE *f;
char *buf;
size_t siz;
};
/*
* Initializes a line reader _lr_ for the stream _f_.
*/
void
lr_init(struct line_reader *lr, FILE *f)
{
lr->f = f;
lr->buf = NULL;
lr->siz = 0;
}
/*
* Reads the next line. If successful, returns a pointer to the line,
* and sets *len to the number of characters, at least 1. The result is
* _not_ a C string; it has no terminating '\0'. The returned pointer
* remains valid until the next call to next_line() or lr_free() with
* the same _lr_.
*
* next_line() returns NULL at end of file, or if there is an error (on
* the stream, or with memory allocation).
*/
char *
next_line(struct line_reader *lr, size_t *len)
{
size_t newsiz;
int c;
char *newbuf;
*len = 0; /* Start with empty line. */
for (;;) {
c = fgetc(lr->f); /* Read next character. */
if (ferror(lr->f))
return NULL;
if (c == EOF) {
/*
* End of file is also end of last line,
* unless this last line would be empty.
*/
if (*len == 0)
return NULL;
else
return lr->buf;
} else {
/* Append c to the buffer. */
if (*len == lr->siz) {
/* Need a bigger buffer! */
newsiz = lr->siz + 4096;
newbuf = realloc(lr->buf, newsiz);
if (newbuf == NULL)
return NULL;
lr->buf = newbuf;
lr->siz = newsiz;
}
lr->buf[(*len)++] = c;
/* '\n' is end of line. */
if (c == '\n')
return lr->buf;
}
}
}
/*
* Frees internal memory used by _lr_.
*/
void
lr_free(struct line_reader *lr)
{
free(lr->buf);
lr->buf = NULL;
lr->siz = 0;
}
/*
* Read a file line by line.
* http://rosettacode.org/wiki/Read_a_file_line_by_line
*/
int
main()
{
struct line_reader lr;
FILE *f;
size_t len;
char *line;
f = fopen("foobar.txt", "r");
if (f == NULL) {
perror("foobar.txt");
exit(1);
}
/*
* This loop reads each line.
* Remember that line is not a C string.
* There is no terminating '\0'.
*/
lr_init(&lr, f);
while (line = next_line(&lr, &len)) {
/*
* Do something with line.
*/
fputs("LINE: ", stdout);
fwrite(line, len, 1, stdout);
}
if (!feof(f)) {
perror("next_line");
exit(1);
}
lr_free(&lr);
return 0;
}

Related

How can I write a function That returns all the content of a text file using fgets in C programming?

I want to write a function char* lire(FILE* f) that is supposed to read a text file and return its content. I want to return the content, not just read it and display it.
I would like to use fgets.
This code works but it's not what I want
char *lire(FILE *f)
{
char *content;
content = (char*)malloc((strlen(content) + 1) * sizeof(char));
while (fgets(content,120000, f) )
{
printf("%s", content);
}
return 0;
}
Instead, I tried this to return the text file but it just shows the first line of my text file
char *lire(FILE *f)
{
char *content;
content = (char*)malloc((strlen(content) + 1) * sizeof(char));
while (fgets(content,120000, f) )
{
return content;
}
}
There are several ways to read an entire text file into a single string using fgets(), but all of them will use fgets() iteratively. You can try to determine the size of the file upfront and then allocate sufficient space and read into that space. This works for disk files where you can seek on the file; it doesn't work if the input comes from a pipe, terminal, socket or most other things that are not disk files. There's also a TOCTOU (time of check, time of use) problem; file sizes can change while your code is reading them.
Alternatively, you can simply read into allocated space, reallocating more space when it is needed. This works with any type of input file.
Using fgets() means that the file cannot usefully contain null bytes. That's because fgets() does not report how much data it read, so you have to use strlen() to find out how long the data was, and strlen() stops at the first null byte it encounters.
Using incremental allocation, you might end up with code like this:
/* SO 7028-9928 */
/* Slurp a file using fgets() */
/* #include "slurp.h" */
/* SOF - slurp.h */
#include <stdio.h>
extern char *slurp_file(FILE *fp);
/* EOF - slurp.h */
#include <stdlib.h>
#include <string.h>
#define INIT_ALLOC 1024
#define MIN_SPACE 256
#define MAX_EXCESS 256
char *slurp_file(FILE *fp)
{
size_t offset = 0;
size_t bufsiz = INIT_ALLOC;
char *buffer = malloc(bufsiz);
if (buffer == NULL)
return NULL;
buffer[0] = '\0'; /* Empty input must still yield a valid (empty) string */
while (fgets(buffer + offset, bufsiz - offset, fp) != NULL)
{
/* Assumes data does not contain null bytes */
/* Generic problem using fgets() */
size_t newlen = strlen(buffer + offset);
offset += newlen;
if (bufsiz - offset < MIN_SPACE)
{
size_t new_size = bufsiz * 2;
char *new_data = realloc(buffer, new_size);
if (new_data == NULL)
{
free(buffer);
return NULL;
}
bufsiz = new_size;
buffer = new_data;
}
}
if (bufsiz - offset > MAX_EXCESS)
buffer = realloc(buffer, offset + 1);
return buffer;
}
int main(void)
{
char *data;
if ((data = slurp_file(stdin)) != NULL)
{
printf("Size: %zu\n", strlen(data));
printf("Data: [[%s]]\n", data);
free(data);
}
return 0;
}
When the program is run on its own source code, it produces output like:
Size: 1362
Data: [[/* SO 7028-9928 */
/* Slurp a file using fgets() */
…
return 0;
}
]]
One alternative involves using fread() instead of fgets() — this can handle binary data, but you need to revise the function interface to report both the length of the data and the pointer to the start of the data.
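A sketch of what that fread() variant might look like follows. The name slurp_bytes() and its out-parameter interface are assumptions made for illustration; they are not part of the code above.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical fread() counterpart to slurp_file(): returns a malloc'd
 * buffer holding everything read from fp and stores the byte count in
 * *len. A '\0' is appended for convenience, but the data itself may
 * contain null bytes. Returns NULL on read or allocation failure. */
char *slurp_bytes(FILE *fp, size_t *len)
{
    size_t cap = 1024, used = 0, got;
    char *buf = malloc(cap);

    if (buf == NULL)
        return NULL;
    /* Always leave one byte spare for the terminator. */
    while ((got = fread(buf + used, 1, cap - used - 1, fp)) > 0) {
        used += got;
        if (cap - used - 1 == 0) {      /* buffer full: double it */
            char *tmp = realloc(buf, cap * 2);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
    }
    if (ferror(fp)) {
        free(buf);
        return NULL;
    }
    buf[used] = '\0';
    *len = used;
    return buf;
}
```

Because the length is reported separately, the caller should use *len rather than strlen() when the input may be binary.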

In C, how would you save different lines of a text file to different variables

How would I save different lines from a text File to different variables of different datatypes; all of these variables make up a struct (in my example a flight struct with the following).
struct Flight
{
int flightNum;
char destination[30];
char departDay[15];
};
An example of the information that I would like to add via the text file would be:
111
NYC
Monday
I obviously want to save the words NYC and Monday to char arrays, but I want to save 111 to an integer variable.
So far I have
while (fscanf(flightInfo, "%s", tempName) != EOF)
{
fscanf(flightInfo, "%d\n", &tempNum);
flight.flightNum = tempNum;
fscanf(flightInfo, "%s\n", tempName);
strcpy(flight.destination, tempName);
fscanf(flightInfo, "%s\n", tempName);
strcpy(flight.departDay, tempName);
}
Assume that flightInfo is a FILE pointer to the opened file, tempNum is an integer, and tempName is a char array.
It sounds like you're on the right track.
What about something like this:
#define MAX_FLIGHTS 100
...
struct Flight flights[MAX_FLIGHTS ];
int n_flights = 0;
...
while (!feof(fp) && (n_flights < MAX_FLIGHTS-1))
{
if (fscanf(fp, "%d\n", &flights[n_flights].flightNum) != 1)
error_handler();
if (fscanf(fp, "%29s\n", flights[n_flights].destination) != 1)
error_handler();
if (fscanf(fp, "%14s\n", flights[n_flights].departDay) != 1)
error_handler();
++n_flights;
}
...
ADDENDUM:
Per Chux's suggestion, I've modified the code to mitigate against potential buffer overruns, by setting scanf max string length to 29 (1 less than char[30] buffer size).
Here is a more detailed explanation:
SonarSource: "scanf()" and "fscanf()" format strings should specify a field width for the "%s" string placeholder
The first question you have to answer is this: how important is it for the file to be readable by people, or on other platforms?
If it isn't that important, then I recommend serializing with fwrite() and fread(). That is easier to code for each record, and - as long as your structs are all the same size - allows O(1) access to any record in the file.
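As an illustration of the fwrite()/fread() route, here is a small sketch. The helper names write_flight() and read_flight() are hypothetical, and the stream is assumed to be opened in binary mode (for example "wb+" or "rb+"). Keep in mind that a file written this way depends on the struct layout of the compiler and machine that produced it.

```c
#include <stdio.h>

struct Flight {
    int  flightNum;
    char destination[30];
    char departDay[15];
};

/* Writes one record in binary form at the current file position.
 * Returns 1 on success, 0 on failure. */
int write_flight(FILE *fp, const struct Flight *f)
{
    return fwrite(f, sizeof *f, 1, fp) == 1;
}

/* Reads record number idx (0-based) by seeking straight to its offset,
 * which is what makes access O(1). Returns 1 on success, 0 on failure. */
int read_flight(FILE *fp, long idx, struct Flight *out)
{
    if (fseek(fp, idx * (long)sizeof *out, SEEK_SET) != 0)
        return 0;
    return fread(out, sizeof *out, 1, fp) == 1;
}
```

Since every record has the same size, record idx always starts at byte idx * sizeof(struct Flight), so no scanning is needed to reach it.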
If you do want to store these as individual lines, the best way to read a line in from a file is with fgets()
Pseudocode follows:
typedef struct flight {
int flightNum;
char destination[30];
char departDay[15];
} flight;
typedef struct flightSet {
flight *flights;
size_t n; /* number of flights */
size_t nAlloc; /* number of flights you have space for */
} flightSet;
#define FLIGHTSET_INIT_SIZE 16
#define MAX_LINE_LENGTH 128
#define FILENAME "file.txt"
// Create a new flightSet, calling it F
// Allocate FLIGHTSET_INIT_SIZE number of flight structures for F->flights
// Set F->n to 0
// Set F->nAlloc to FLIGHTSET_INIT_SIZE
/* Set up other variables */
size_t i = 0; // iterator
char buffer[MAX_LINE_LENGTH]; // for reading with fgets()
flight *temp; // for realloc()ing when we have more flights to read
// after reaching nAlloc flights
char *endptr; // for using strtol() to get a number from buffer
FILE *fp; // for reading from the file
// Open FILENAME with fp for reading
//MAIN LOOP
// If i == F->nAlloc, use realloc() to double the allocation of F->flights
// If successful, double F->nAlloc
if (fgets(buffer, MAX_LINE_LENGTH, fp) == NULL) {
// End of file
// Use break to get out of the main loop
}
F->flights[i].flightNum = (int)strtol(buffer, &endptr, 10);
if (endptr == buffer) {
// No digits could be converted at the very beginning of the buffer,
// so this line is not a valid number and your data file is corrupt
// Print out an error message
break;
}
if (fgets(buffer, MAX_LINE_LENGTH, fp) == NULL) {
// End of file when expecting new line; file format error
// Use break to get out of the main loop
} else {
// destination is a fixed-size array, so copy into it rather than assigning a pointer
buffer[strcspn(buffer, "\n")] = '\0'; // drop the newline kept by fgets()
snprintf(F->flights[i].destination, sizeof F->flights[i].destination, "%s", buffer);
}
if (fgets(buffer, MAX_LINE_LENGTH, fp) == NULL) {
// End of file when expecting new line; file format error
// Use break to get out of the main loop
} else {
// departDay is a fixed-size array as well
buffer[strcspn(buffer, "\n")] = '\0'; // drop the newline kept by fgets()
snprintf(F->flights[i].departDay, sizeof F->flights[i].departDay, "%s", buffer);
}
// If you've gotten here so far without errors, great!
// Increment F->n to reflect the number of successful records we have in F.
// Increment i, the loop iterator
//Final cleanup. Should include closing the file, and freeing any allocated
//memory that didn't end up in a valid record.
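For reference, the pseudocode above could be fleshed out along these lines. This is a sketch under assumptions: read_flights() and read_field() are hypothetical names, the flightSet wrapper is dropped in favour of a plain array plus a count, and over-long text fields are silently truncated.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_LINE_LENGTH 128

typedef struct flight {
    int  flightNum;
    char destination[30];
    char departDay[15];
} flight;

/* Reads one line into dst (at most size - 1 characters), dropping the
 * trailing newline. Returns 1 on success, 0 at end of file. */
static int read_field(FILE *fp, char *dst, size_t size)
{
    char buffer[MAX_LINE_LENGTH];
    size_t len;

    if (fgets(buffer, sizeof buffer, fp) == NULL)
        return 0;
    len = strcspn(buffer, "\n");
    if (len >= size)
        len = size - 1;                 /* truncate to fit the field */
    memcpy(dst, buffer, len);
    dst[len] = '\0';
    return 1;
}

/* Hypothetical helper: reads three-line records until end of file or the
 * first malformed record. Returns a malloc'd array (caller frees) and
 * stores the number of complete records in *n. Returns NULL on
 * allocation failure. */
flight *read_flights(FILE *fp, size_t *n)
{
    size_t nAlloc = 16;
    flight *flights = malloc(nAlloc * sizeof *flights);
    char buffer[MAX_LINE_LENGTH], *endptr;

    *n = 0;
    if (flights == NULL)
        return NULL;
    while (fgets(buffer, sizeof buffer, fp) != NULL) {
        if (*n == nAlloc) {             /* out of room: double the array */
            flight *temp = realloc(flights, 2 * nAlloc * sizeof *flights);
            if (temp == NULL) {
                free(flights);
                *n = 0;
                return NULL;
            }
            flights = temp;
            nAlloc *= 2;
        }
        flights[*n].flightNum = (int)strtol(buffer, &endptr, 10);
        if (endptr == buffer)
            break;                      /* first line is not a number */
        if (!read_field(fp, flights[*n].destination,
                        sizeof flights[*n].destination) ||
            !read_field(fp, flights[*n].departDay,
                        sizeof flights[*n].departDay))
            break;                      /* file ended in mid-record */
        ++*n;
    }
    return flights;
}
```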

A more elegant way to parse

I'm kind of new to C.
I need to write a small function that opens a configuration file that has 3 lines, each line contains a path to files/directories that I need to extract.
I wrote this program and it seem to work:
void readCMDFile(char* cmdFile,char directoryPath[INPUT_SIZE], char inputFilePath[INPUT_SIZE],char outputFilePath [INPUT_SIZE]) {
//open files
int file = open(cmdFile, O_RDONLY);
if (file < 0) {
handleFailure();
}
char buffer[BUFF_SIZE];
int status;
int count;
while((count=read(file,buffer,sizeof(buffer)))>0)
{
int updateParam = UPDATE1;
int i,j;
i=0;
j=0;
for (;i<count;i++) {
if (buffer[i]!='\n'&&buffer[i]!=SPACE&&buffer[i]!='\0') {
switch (updateParam){
case UPDATE1:
directoryPath[j] = buffer[i];
break;
case UPDATE2:
inputFilePath[j] = buffer[i];
break;
case UPDATE3:
outputFilePath[j] = buffer[i];
break;
}
j++;
} else{
switch (updateParam){
case UPDATE1:
updateParam = UPDATE2;
j=0;
break;
case UPDATE2:
updateParam = UPDATE3;
j=0;
break;
}
}
}
}
if (count < 0) {
handleFailure();
}
}
But it is incredibly unintuitive and pretty ugly, so I thought there must be a more elegant way to do it. Are there any suggestions?
Thanks!
Update: a config file content will look like that:
/home/bla/dirname
/home/bla/bla/file1.txt
/home/bla/bla/file2.txt
Your question isn't one about parsing the contents of the file; it is simply about reading the lines of the file into adequate storage within a function, in a manner that lets the object containing the stored lines be returned to the calling function. This is fairly standard, but you have a number of ways to approach it.
The biggest consideration is not knowing the length of the lines to be read. You say there are currently 3 lines to be read, but there isn't any need to know beforehand how many lines there are (by knowing, you can avoid realloc, but that is about the only savings).
You want to create as robust and flexible a method as you can for reading the lines and storing them in a way that allocates just enough memory to hold what is read. A good approach is to declare a fixed-size temporary buffer to hold each line read from the file with fgets, and then to call strlen on the buffer to determine the number of characters required (as well as trimming the trailing newline included by fgets). Since you are reading path information, the macro PATH_MAX (from limits.h on POSIX systems) can be used to size your temporary buffer to ensure it can hold the maximum-size path usable by the system. You could also use POSIX getline instead of fgets, but we will stick to the C standard library for now.
The basic type that will allow you to allocate storage for multiple lines in your function and return a single pointer you can use in the calling function is char ** (a pointer to pointer to char, or loosely a dynamic array of pointers). The scheme is simple: you allocate some initial number of pointers (3 in your case) and then loop over the file, reading a line at a time, getting the length of the line, and then allocating length + 1 characters of storage to hold the line. For example, if you allocate 3 pointers with:
#define NPATHS 3
...
char **readcmdfile (FILE *fp, size_t *n)
{
...
char buf[PATH_MAX] = ""; /* temp buffer to hold line */
char **paths = NULL; /* pointer to pointer to char to return */
size_t idx = 0; /* index counter (avoids dereferencing) */
...
paths = calloc (NPATHS, sizeof *paths); /* allocate NPATHS pointers */
if (!paths) { /* validate allocation/handle error */
perror ("calloc-paths");
return NULL;
}
...
while (idx < NPATHS && fgets (buf, sizeof buf, fp)) {
size_t len = strlen (buf); /* get length of string in buf */
...
paths[idx] = malloc (len + 1); /* allocate storage for line */
if (!paths[idx]) { /* validate allocation */
perror ("malloc-paths[idx]"); /* handle error */
return NULL;
}
strcpy (paths[idx++], buf); /* copy buffer to paths[idx] */
...
return paths; /* return paths */
}
(note: you can eliminate the limit of idx < NPATHS, if you include the check before allocating for each string and realloc more pointers, as required)
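To make that note concrete, here is one way the pointer array could grow on demand. It is a sketch only: read_all_lines() is a hypothetical name, the 4096-byte temporary buffer is an arbitrary assumption, and a line longer than that buffer ends up split across several entries.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical variant of readcmdfile() without the NPATHS limit: the
 * pointer array doubles whenever it fills up. On allocation failure
 * part-way through, the lines read so far are still returned. */
char **read_all_lines (FILE *fp, size_t *n)
{
    char buf[4096];
    size_t cap = 3, idx = 0;
    char **lines = malloc (cap * sizeof *lines);

    *n = 0;
    if (!lines)
        return NULL;
    while (fgets (buf, sizeof buf, fp)) {
        size_t len = strcspn (buf, "\n");   /* length without the '\n' */
        if (idx == cap) {                   /* array full: realloc more pointers */
            char **tmp = realloc (lines, 2 * cap * sizeof *lines);
            if (!tmp)
                break;
            lines = tmp;
            cap *= 2;
        }
        lines[idx] = malloc (len + 1);      /* storage for this line */
        if (!lines[idx])
            break;
        memcpy (lines[idx], buf, len);
        lines[idx++][len] = '\0';
    }
    *n = idx;
    return lines;
}
```

The caller frees each lines[i] for i below *n and then frees lines itself, exactly as with the fixed-size version.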
The remainder is just the handling of opening the file and passing the open file stream to your function. A basic approach is to provide the filename on the command line and open it with fopen (or read from stdin by default if no filename is given). As with every step in your program, you need to validate the return and handle any error to avoid processing garbage (and invoking Undefined Behavior).
A simple example would be:
int main (int argc, char **argv) {
char **paths; /* pointer to pointer to char for paths */
size_t i, n = 0; /* counter and n - number of paths read */
/* open file given by 1st argument (or read stdin by default) */
FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;
if (!fp) { /* validate file open for reading */
perror ("fopen-failed");
return 1;
}
paths = readcmdfile (fp, &n); /* call function to read file */
/* passing open file pointer */
if (!paths) { /* validate return from function */
fprintf (stderr, "error: readcmdfile failed.\n");
return 1;
}
for (i = 0; i < n; i++) { /* output lines read from file */
printf ("path[%zu]: %s\n", i + 1, paths[i]);
free (paths[i]); /* free memory holding line */
}
free (paths); /* free pointers */
return 0;
}
Putting all the pieces together, adding the code to trim the '\n' read and included in buf by fgets, and adding an additional test to make sure the line you read actually fit in buf, you could do something like this:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <limits.h> /* for PATH_MAX */
#define NPATHS 3
/* read lines from file, return pointer to pointer to char on success
* otherwise return NULL. 'n' will contain number of paths read from file.
*/
char **readcmdfile (FILE *fp, size_t *n)
{
char buf[PATH_MAX] = ""; /* temp buffer to hold line */
char **paths = NULL; /* pointer to pointer to char to return */
size_t idx = 0; /* index counter (avoids dereferencing) */
*n = 0; /* zero the pointer passed as 'n' */
paths = calloc (NPATHS, sizeof *paths); /* allocate NPATHS pointers */
if (!paths) { /* validate allocation/handle error */
perror ("calloc-paths");
return NULL;
}
/* read while index < NPATHS & good read into buf
* (note: instead of limiting to NPATHS - you can simply realloc paths
* when idx == NPATHS -- but that is for later)
*/
while (idx < NPATHS && fgets (buf, sizeof buf, fp)) {
size_t len = strlen (buf); /* get length of string in buf */
if (len && buf[len - 1] == '\n') /* validate last char is '\n' */
buf[--len] = 0; /* overwrite '\n' with '\0' */
else if (len == PATH_MAX - 1) { /* check buffer full - line too long */
fprintf (stderr, "error: path '%zu' exceeds PATH_MAX.\n", idx);
return NULL;
}
paths[idx] = malloc (len + 1); /* allocate storage for line */
if (!paths[idx]) { /* validate allocation */
perror ("malloc-paths[idx]"); /* handle error */
return NULL;
}
strcpy (paths[idx++], buf); /* copy buffer to paths[idx] */
}
*n = idx; /* update 'n' to contain index - no. of lines read */
return paths; /* return paths */
}
int main (int argc, char **argv) {
char **paths; /* pointer to pointer to char for paths */
size_t i, n = 0; /* counter and n - number of paths read */
/* open file given by 1st argument (or read stdin by default) */
FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;
if (!fp) { /* validate file open for reading */
perror ("fopen-failed");
return 1;
}
paths = readcmdfile (fp, &n); /* call function to read file */
/* passing open file pointer */
if (!paths) { /* validate return from function */
fprintf (stderr, "error: readcmdfile failed.\n");
return 1;
}
for (i = 0; i < n; i++) { /* output lines read from file */
printf ("path[%zu]: %s\n", i + 1, paths[i]);
free (paths[i]); /* free memory holding line */
}
free (paths); /* free pointers */
return 0;
}
(note: if you allocate memory -- it is up to you to preserve a pointer to the beginning of each block -- so it can be freed when it is no longer needed)
Example Input File
$ cat paths.txt
/home/bla/dirname
/home/bla/bla/file1.txt
/home/bla/bla/file2.txt
Example Use/Output
$ ./bin/readpaths <paths.txt
path[1]: /home/bla/dirname
path[2]: /home/bla/bla/file1.txt
path[3]: /home/bla/bla/file2.txt
As you can see, the function has simply allocated 3 pointers, read each line of the input file, allocated storage for each line, and assigned the address of each block to the corresponding pointer. It then returns a pointer to the collection to main(), where it is assigned to paths. Look things over and let me know if you have further questions.
I recommend looking into regular expressions. That way you read everything, then match with regular expressions and handle your matches.
Regular expressions exist for this purpose: to make parsing elegant.
If I were you, I would create a function for the if/else blocks. I feel like they're redundant.
switch(updateParam) {
case UPDATE1:
method(); /*do if/else here*/
break;
...............
...............
}
However, you can still keep them inline if you do not need the function elsewhere and you are concerned about performance, as a function call costs more than the inlined instructions.
In your program, you are passing 3 char arrays to store the 3 lines read from the file. This is inflexible, as the input file may contain more lines and in future you may need to read more than 3 lines from the file. Instead, you can pass an array of char pointers, allocate memory for each of them, and copy in the lines read from the file. As pointed out by Jonathan (in a comment), if you use standard I/O then you can use a function like fgets() to read lines from the input file.
Read a line from the file, allocate memory for the pointer, and copy the line into it. If the line is too long, you can read the remaining part in consecutive calls to fgets() and use realloc to expand the memory the pointer refers to so that it is large enough to hold the rest of the line.
Putting these all together, you can do:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define BUF_SZ 100
#define MAX_LINES 3 /* Maximum number of lines to be read from file */
int readCMDFile(const char* cmdFile, char *paths[MAX_LINES]) {
int count, next_line, line_cnt, new_line_found;
char tmpbuf[BUF_SZ];
FILE *fp;
fp = fopen(cmdFile, "r");
if (fp == NULL) {
perror ("Failed to open file");
return -1;
}
next_line = 1; /* Keep track of next line */
count = 1; /* Used to calculate the size of memory, if need to reallocate
* in case when a line in the file is too long to read in one go */
line_cnt = 0; /* Keep track of index of array of char pointer */
new_line_found = 0;
while ((line_cnt < MAX_LINES) && (fgets (tmpbuf, BUF_SZ, fp) != NULL)) {
if (tmpbuf[strlen(tmpbuf) - 1] == '\n') {
tmpbuf[strlen(tmpbuf) - 1] = '\0';
new_line_found = 1;
} else {
new_line_found = 0;
}
if (next_line) {
paths[line_cnt] = calloc (sizeof (tmpbuf), sizeof (char));
if (paths[line_cnt] == NULL) {
perror ("Failed to allocate memory");
return -1;
}
next_line = 0;
count = 1;
} else {
char *ptr = realloc (paths[line_cnt], sizeof (tmpbuf) * (++count));
if (ptr == NULL) {
free (paths[line_cnt]);
perror ("Failed to reallocate memory");
return -1;
} else {
paths[line_cnt] = ptr;
}
}
/* Using strcat to copy the buffer to allocated memory because
* calloc initialize the block of memory with zero, so it will
* be same as strcpy when first time copying the content of buffer
* to the allocated memory and fgets add terminating null-character
* to the buffer so, it will concatenate the content of buffer to
* allocated memory in case when the pointer is reallocated */
strcat (paths[line_cnt], tmpbuf);
if (new_line_found) {
line_cnt++;
next_line = 1;
}
}
fclose(fp);
return line_cnt;
}
int main(void) {
int lines_read, index;
const char *file_name = "cmdfile.txt";
char *paths[MAX_LINES] = {NULL};
lines_read = readCMDFile(file_name, paths);
if (lines_read < 0) {
printf ("Failed to read file %s\n", file_name);
}
/* Check the output */
for (index = 0; index < lines_read; index++) {
printf ("Line %d: %s\n", index, paths[index]);
}
/* Free the allocated memory */
for (index = 0; index < lines_read; index++) {
free (paths[index]);
paths[index] = NULL;
}
return 0;
}
Output:
$ cat cmdfile.txt
/home/bla/dirname
/home/bla/bla/file1.txt
/home/bla/bla/file2.txt
$ ./a.out
Line 0: /home/bla/dirname
Line 1: /home/bla/bla/file1.txt
Line 2: /home/bla/bla/file2.txt
Note that the above program is not taking care of empty lines in the file as it has not been mentioned in the question. But if you want, you can add that check just after removing the trailing newline character from the line read from the file.

String / char * concatinate, C

I am trying to open a file (Myfile.txt) and concatenate each line into a single buffer, but I am getting unexpected output. The problem is that my buffer is not getting updated with the last concatenated lines. Is anything missing in my code?
Myfile.txt (The file to open and read)
Good morning line-001:
Good morning line-002:
Good morning line-003:
Good morning line-004:
Good morning line-005:
.
.
.
Mycode.c
#include <stdio.h>
#include <string.h>
int main(int argc, const char * argv[])
{
/* Define a temporary variable */
char Mybuff[100]; // (i dont want to fix this size, any option?)
char *line = NULL;
size_t len=0;
FILE *fp;
fp =fopen("Myfile.txt","r");
if(fp==NULL)
{
printf("the file couldn't exist\n");
return;
}
while (getline(&line, &len, fp) != -1 )
{
//Any function to concatinate the strings, here the "line"
strcat(Mybuff,line);
}
fclose(fp);
printf("Mybuff is: [%s]\n", Mybuff);
return 0;
}
Am expecting my output to be:
Mybuff is: [Good morning line-001:Good morning line-002:Good morning line-003:Good morning line-004:Good morning line-005:]
But I am getting a segmentation fault (run-time error) and a garbage value. Anything I can do? Thanks.
Specify MyBuff as a pointer, and use dynamic memory allocation.
#include <stdlib.h> /* for dynamic memory allocation functions */
char *MyBuff = calloc(1,1); /* allocate one character, initialised to zero */
size_t length = 1;
while (getline(&line, &len, fp) != -1 )
{
size_t newlength = length + strlen(line);
char *temp = realloc(MyBuff, newlength);
if (temp == NULL)
{
/* Allocation failed. Have a tantrum or take recovery action */
}
else
{
MyBuff = temp;
length = newlength;
strcat(MyBuff, line); /* append the line just read */
}
}
/* Do whatever is needed with MyBuff */
free(MyBuff);
free(line); /* Also, don't forget to release memory allocated by getline() */
The above will leave newlines in MyBuff for each line read by getline(). I'll leave removing those as an exercise.
Note: getline() is POSIX (available on Linux), not standard C. A function like fgets() is available in standard C for reading lines from a file, although it doesn't allocate memory like getline() does.
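For completeness, a standard C version built on fgets() might look like the sketch below. The helper name join_lines() and the 256-byte chunk size are assumptions for illustration. It also drops the newlines and appends at a tracked offset instead of calling strcat(), so the string is not rescanned on every append.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: concatenates every line of fp into one malloc'd
 * string, dropping the newlines. Returns NULL on allocation failure.
 * The caller must free() the result. */
char *join_lines(FILE *fp)
{
    char chunk[256];
    size_t length = 0;                  /* characters stored so far */
    char *result = calloc(1, 1);        /* start with an empty string */

    if (result == NULL)
        return NULL;
    while (fgets(chunk, sizeof chunk, fp) != NULL) {
        size_t n = strcspn(chunk, "\n");    /* ignore the newline, if any */
        char *temp = realloc(result, length + n + 1);
        if (temp == NULL) {
            free(result);
            return NULL;
        }
        result = temp;
        memcpy(result + length, chunk, n);  /* append at the known end */
        length += n;
        result[length] = '\0';
    }
    return result;
}
```

A line longer than the chunk simply arrives in several pieces, each of which is appended in turn, so no fixed limit on line length is needed.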

Correct way to read a text file into a buffer in C? [duplicate]

I'm dealing with small text files that I want to read into a buffer while I process them, so I've come up with the following code:
...
char source[1000000];
FILE *fp = fopen("TheFile.txt", "r");
if(fp != NULL)
{
while((symbol = getc(fp)) != EOF)
{
strcat(source, &symbol);
}
fclose(fp);
}
...
Is this the correct way of putting the contents of the file into the buffer, or am I abusing strcat()?
I then iterate through the buffer thus:
for(int x = 0; (c = source[x]) != '\0'; x++)
{
//Process chars
}
char source[1000000];
FILE *fp = fopen("TheFile.txt", "r");
if(fp != NULL)
{
while((symbol = getc(fp)) != EOF)
{
strcat(source, &symbol);
}
fclose(fp);
}
There are quite a few things wrong with this code:
It is very slow (you are extracting the buffer one character at a time).
If the filesize is over sizeof(source), this is prone to buffer overflows.
Really, when you look at it more closely, this code should not work at all. As stated in the man pages:
The strcat() function appends a copy of the null-terminated string s2 to the end of the null-terminated string s1, then add a terminating `\0'.
You are appending a character (not a NUL-terminated string!) to a string that may or may not be NUL-terminated. The only time I can imagine this working according to the man-page description is if every character in the file is NUL-terminated, in which case this would be rather pointless. So yes, this is most definitely a terrible abuse of strcat().
The following are two alternatives to consider using instead.
If you know the maximum buffer size ahead of time:
#include <stdio.h>
#define MAXBUFLEN 1000000
char source[MAXBUFLEN + 1];
FILE *fp = fopen("foo.txt", "r");
if (fp != NULL) {
size_t newLen = fread(source, sizeof(char), MAXBUFLEN, fp);
if ( ferror( fp ) != 0 ) {
fputs("Error reading file", stderr);
} else {
source[newLen++] = '\0'; /* Just to be safe. */
}
fclose(fp);
}
Or, if you do not:
#include <stdio.h>
#include <stdlib.h>
char *source = NULL;
FILE *fp = fopen("foo.txt", "r");
if (fp != NULL) {
/* Go to the end of the file. */
if (fseek(fp, 0L, SEEK_END) == 0) {
/* Get the size of the file. */
long bufsize = ftell(fp);
if (bufsize == -1) { /* Error */ }
/* Allocate our buffer to that size. */
source = malloc(sizeof(char) * (bufsize + 1));
/* Go back to the start of the file. */
if (fseek(fp, 0L, SEEK_SET) != 0) { /* Error */ }
/* Read the entire file into memory. */
size_t newLen = fread(source, sizeof(char), bufsize, fp);
if ( ferror( fp ) != 0 ) {
fputs("Error reading file", stderr);
} else {
source[newLen++] = '\0'; /* Just to be safe. */
}
}
fclose(fp);
}
free(source); /* Don't forget to call free() later! */
Yes - you would probably be arrested for your terrible abuse of strcat!
Take a look at fgets(): it reads the data a line at a time but, importantly, it limits the number of characters you read, so you don't overflow the buffer.
strcat is relatively slow because it has to search the entire string for the end on every character insertion.
You would normally keep a pointer to the current end of the string storage and pass that to fgets as the position to read the next line into.
If you're on a linux system, once you have the file descriptor you can get a lot of information about the file using fstat()
http://linux.die.net/man/2/stat
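so you might have
#include <sys/stat.h> /* fstat, struct stat */
#include <fcntl.h> /* open */
#include <unistd.h> /* close */
int main(void)
{
struct stat st;
int fd = open("TheFile.txt", O_RDONLY); /* get file descriptor */
if (fd == -1 || fstat(fd, &st) == -1)
return 1;
/* the size of the file is now in st.st_size */
close(fd);
return 0;
}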
This avoids seeking to the beginning and end of the file.
See this article from JoelOnSoftware for why you don't want to use strcat.
Look at fread for an alternative. Use it with 1 for the size when you're reading bytes or characters.
Why don't you just use the array of chars you have? This ought to do it:
source[i] = getc(fp);
i++;
Not tested, but should work.. And yes, it could be better implemented with fread, I'll leave that as an exercise to the reader.
#define DEFAULT_SIZE 100
#define STEP_SIZE 100
char *buffer=malloc(DEFAULT_SIZE); /* must be heap memory so realloc() can grow it */
size_t buffer_sz=DEFAULT_SIZE;
size_t i=0;
int c; /* int, so that EOF can be told apart from a real character */
if(buffer==NULL) exit(1);
while((c=fgetc(fp))!=EOF){
buffer[i]=(char)c;
i++;
if(i>=buffer_sz){
buffer_sz+=STEP_SIZE;
void *tmp=buffer;
buffer=realloc(buffer,buffer_sz);
if(buffer==NULL){ free(tmp); exit(1);} //ensure we don't have a memory leak
}
}
buffer[i]=0;
Methinks you want fread:
http://www.cplusplus.com/reference/clibrary/cstdio/fread/
Have you considered mmap()? You can read from the file directly as if it were already in memory.
http://beej.us/guide/bgipc/output/html/multipage/mmap.html
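As a rough sketch of the mmap() approach on a POSIX system, the function below maps a file read-only and walks over its bytes. count_newlines() is a hypothetical helper chosen only to show the mapping being used. Note that the mapped bytes are not NUL-terminated, so string functions such as strlen() must not be applied to them directly.

```c
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: maps a file read-only and counts its '\n' bytes.
 * Returns -1 on failure. */
long count_newlines(const char *path)
{
    struct stat st;
    long count = 0;
    int fd = open(path, O_RDONLY);

    if (fd == -1)
        return -1;
    if (fstat(fd, &st) == -1) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {              /* mmap() rejects zero-length maps */
        close(fd);
        return 0;
    }
    const char *data = mmap(NULL, (size_t)st.st_size, PROT_READ,
                            MAP_PRIVATE, fd, 0);
    close(fd);                          /* the mapping stays valid after close */
    if (data == MAP_FAILED)
        return -1;
    for (off_t i = 0; i < st.st_size; i++)
        if (data[i] == '\n')
            count++;
    munmap((void *)data, (size_t)st.st_size);
    return count;
}
```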

Resources