Reading a file into an array of strings - C

I'm new to C and just learning about malloc and realloc, and I'd appreciate help from the community in understanding how to do this. I have a file with paragraphs that I need to read line by line, storing the lines in an array of strings while creating the arrays dynamically.
Initially the maximum number of lines to store is 10; if this is not sufficient, we use realloc to double the memory and print a message indicating that we reallocated memory. So far this is what I have, and I need help to finish it:
int main(int argc, char* argv[])
{
    char* p = malloc(10* sizeof(char));
    while(buffer, sizeof(buffer), stdin)
    {
    }
}

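while(buffer, sizeof(buffer), stdin) does nothing useful: the commas make it a comma expression whose value is just stdin (and buffer is never declared). Use fgets instead: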
data.txt:
one
two
three
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define BUF_LEN 32
/* strdup is POSIX (and C23); declared here in case strict ISO mode hides it */
extern char *strdup(const char *);
int main(void)
{
    char **arr = NULL;
    char buf[BUF_LEN];
    size_t i, n = 0;
    FILE *f;
    f = fopen("data.txt", "r");
    if (f == NULL) {
        perror("fopen");
        exit(EXIT_FAILURE);
    }
    /* fgets reads at most BUF_LEN - 1 chars, so longer lines are split */
    while (fgets(buf, BUF_LEN, f)) {
        /* grow the array by one pointer for the new line */
        arr = realloc(arr, sizeof(*arr) * (n + 1));
        if (arr == NULL) {
            perror("realloc");
            exit(EXIT_FAILURE);
        }
        /* copy the text: buf is overwritten by the next fgets */
        arr[n] = strdup(buf);
        if (arr[n++] == NULL) {
            perror("strdup");
            exit(EXIT_FAILURE);
        }
    }
    fclose(f);
    for (i = 0; i < n; i++) {
        printf("%s", arr[i]);
        free(arr[i]);
    }
    free(arr);
    return 0;
}

You said, and you do, need an array of strings. Come to think of it, a string is a sequence/array of characters, right? So you need an array of arrays of characters.
Now, a char * is capable of pointing to a character and indirectly to the subsequent characters, if there are any. This is what we call a string, and here's how we get one:
char * astring = malloc( 256 * sizeof * astring );
// astring holds an address pointing to a memory location
// with room for 256 objects of type *astring (that is, 256 chars)
// so astring can hold a string of up to 255 characters
// plus the terminating null character '\0' at the end
Now you want 10 of those: 10 char *s. A char ** can point at them, just as a char * can point at chars.
char ** lines = malloc( 10 * sizeof * lines );
for ( int i = 0; i < 10; i++ )
    lines[i] = malloc( 256 );
// sizeof may be omitted for chars, since sizeof(char) is 1 by definition
If you plan to grow beyond 10 lines, it's a good idea to store that count in a variable, double it when needed and reallocate accordingly.
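int numlines = 10;
int linelength = 256;
char ** lines = malloc( numlines * sizeof * lines );
int linenr;
// fgets returns NULL (not EOF) at end of file or on error
for( linenr = 0; fgets( lines[linenr] = malloc( linelength ), linelength, yourfile ) != NULL; linenr++ ) {
    if ( linenr + 1 == numlines ) {
        // the next fgets needs a free slot: double the array first
        numlines *= 2;
        lines = realloc( lines, numlines * sizeof * lines );
        printf( "reallocated memory for %d lines\n", numlines );
    }
}
free( lines[linenr] ); // the last buffer never received a line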
Include the necessary headers, fill in the gaps, and check whether the allocations and fopen succeeded. Make sure 256 is enough and increase it if necessary. You may optionally make the line length adaptive as well, but that will require more code.


Novice C question: Working with a variable-length array of variable-length strings?

I probably got an easy one for the C programmers out there!
I am trying to create a simple C function that executes the system command given in the parameter in and writes the process output to the string buffer out (which should be initialized as an array of strings of length n). The output needs to be formatted in the following way:
Each line written to stdout should be initialized as a string. Each of these strings has variable length. The output should be an array consisting of each string. There is no way to know how many strings will be written, so this array is also technically of variable length (but for my purposes, I just create a fixed-length array outside the function and pass its length as an argument, rather than going for an array that I would have to manually allocate memory for).
Here is what I have right now:
#define MAX_LINE_LENGTH 512
int exec(const char* in, const char** out, const size_t n)
{
    char buffer[MAX_LINE_LENGTH];
    FILE *file;
    const char terminator = '\0';
    if ((file = popen(in, "r")) == NULL) {
        return 1;
    }
    for (char** head = out; (size_t)head < (size_t)out + n && fgets(buffer, MAX_LINE_LENGTH, file) != NULL; head += strlen(buffer)) {
        *head = strcat(buffer, &terminator);
    }
    if (pclose(file)) {
        return 2;
    }
    return 0;
}
and I call it with
#define N 128
int main(void)
{
    const char* buffer[N];
    const char cmd[] = "<some system command resulting in multi-line output>";
    const int code = exec(cmd, buffer, N);
    exit(code);
}
I believe the error the above code results in is a seg fault, but I'm not experienced enough to figure out why or how to fix.
I'm almost positive it is with my logic here:
for (char** head = out; (size_t)head < (size_t)out + n && fgets(buffer, MAX_LINE_LENGTH, file) != NULL; head += strlen(buffer)) {
    *head = strcat(buffer, &terminator);
}
What I thought this does is:
1. Get a mutable reference to out (i.e. the head pointer)
2. Save the current stdout line to buffer (via fgets)
3. Append a null terminator to buffer (because I don't think fgets does this?)
4. Overwrite the data at the head pointer with the value from step 3
5. Move the head pointer strlen(buffer) bytes over (i.e. the number of chars in buffer)
6. Continue until fgets returns NULL or the head pointer has been moved beyond the bounds of the out array
Where am I wrong? Any help appreciated, thanks!
EDIT #1
According to Barmar's suggestions, I edited my code:
#include <stdio.h>
#include <stdlib.h>
#define MAX_LINE_LENGTH 512
int exec(const char* in, const char** out, const size_t n)
{
    char buffer[MAX_LINE_LENGTH];
    FILE *file;
    if ((file = popen(in, "r")) == NULL) return 1;
    for (size_t i = 0; i < n && fgets(buffer, MAX_LINE_LENGTH, file) != NULL; i += 1) out[i] = buffer;
    if (pclose(file)) return 2;
    return 0;
}
#define N 128
int main(void)
{
    const char* buffer[N];
    const char cmd[] = "<system command to run>";
    const int code = exec(cmd, buffer, N);
    for (int i = 0; i < N; i += 1) printf("%s", buffer[i]);
    exit(code);
}
While there were plenty of redundancies with what I wrote that are now fixed, this still causes a segmentation fault at runtime.
Focusing on the edited code, this assignment
out[i] = buffer;
has problems.
In this expression, buffer is implicitly converted to a pointer-to-its-first-element (&buffer[0], see: decay). No additional memory is allocated, and no string copying is done.
buffer is rewritten every iteration. After the loop, each valid element of out will point to the same memory location, which will contain the last line read.
buffer is an array local to the exec function. Its lifetime ends when the function returns, so the array in main contains dangling pointers. Utilizing these values is Undefined Behaviour.
Additionally,
for (int i = 0; i < N; i += 1)
always loops to the maximum storable number of lines, when it is possible that fewer lines than this were read.
A rigid solution uses an array of arrays to store the lines read. Here is a cursory example (see: this answer for additional information on using multidimensional arrays as function arguments).
#include <stdio.h>
#include <stdlib.h>
#define MAX_LINES 128
#define MAX_LINE_LENGTH 512
int exec(const char *cmd, char lines[MAX_LINES][MAX_LINE_LENGTH], size_t *lc)
{
    FILE *stream = popen(cmd, "r");
    *lc = 0;
    if (!stream)
        return 1;
    while (*lc < MAX_LINES) {
        if (!fgets(lines[*lc], MAX_LINE_LENGTH, stream))
            break;
        (*lc)++;
    }
    return pclose(stream) ? 2 : 0;
}
int main(void)
{
    char lines[MAX_LINES][MAX_LINE_LENGTH];
    size_t n;
    int code = exec("ls -al", lines, &n);
    for (size_t i = 0; i < n; i++)
        printf("%s", lines[i]);
    return code;
}
Using dynamic memory is another option. Here is a basic example using strdup(3), lacking robust error handling.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
char **exec(const char *cmd, size_t *length)
{
    FILE *stream = popen(cmd, "r");
    if (!stream)
        return NULL;
    char **lines = NULL;
    char buffer[4096];
    *length = 0;
    while (fgets(buffer, sizeof buffer, stream)) {
        char **reline = realloc(lines, sizeof *lines * (*length + 1));
        if (!reline)
            break;
        lines = reline;
        if (!(lines[*length] = strdup(buffer)))
            break;
        (*length)++;
    }
    pclose(stream);
    return lines;
}
int main(void)
{
    size_t n = 0;
    char **lines = exec("ls -al", &n);
    for (size_t i = 0; i < n; i++) {
        printf("%s", lines[i]);
        free(lines[i]);
    }
    free(lines);
}

How to store each string from getline() in a (dynamic) array of strings?

I'm using the getline() function to get every line of stdin. Every line is a string with different length:
#include <stdio.h>
#include <stdlib.h>
int main() {
    char *line = NULL;
    size_t foo = 0;
    ssize_t reader;
    while ((reader = getline(&line, &foo, stdin)) != -1) { // %zu of reader is length of line
        printf("%s", line);
    }
    free(line);
    return 0;
}
In every iteration, line is a string containing the current line. How can I take each line and store it in an array? I have tried several things, but none of them worked or they just led to memory access failures :(
I hope my question is clear? If it's not, please tell me and I will change it!
Unless you know up front how many lines to expect, you will have to allocate the array dynamically, e.g.:
#define _POSIX_C_SOURCE 200809L // for getline()
#include <stdio.h>
#include <stdlib.h>
int main() {
    char *line = NULL;
    size_t foo = 0;
    ssize_t reader;
    int result = 0;
    int numlines = 0, maxlines = 10;
    char **lines = malloc(sizeof(char*) * maxlines);
    if (!lines) {
        printf("error allocating array\n");
        result = -1;
    }
    else {
        while ((reader = getline(&line, &foo, stdin)) != -1) { // reader is the length of line
            printf("%s", line);
            if (numlines == maxlines) {
                maxlines *= 2; // <-- or use whatever threshold makes sense for you
                char **newlines = realloc(lines, sizeof(char*) * maxlines);
                if (!newlines) {
                    printf("error reallocating array\n");
                    result = -1;
                    break;
                }
                lines = newlines;
            }
            lines[numlines] = line; // the array now owns this buffer
            ++numlines;
            line = NULL; // make getline() allocate a fresh buffer next time
            foo = 0;
        }
        free(line); // <-- in case getline() or realloc() failed...
        // use lines up to numlines as needed
        // free lines
        for(int i = 0; i < numlines; ++i) {
            free(lines[i]);
        }
        free(lines);
    }
    return result;
}
You need to create an array of pointers that gets resized when needed:
#define _POSIX_C_SOURCE 200809L // for getline()
#include <stdio.h>
#include <stdlib.h>
int main()
{
    // start with an array that ends with a NULL pointer
    // (like argv does)
    size_t numLines = 0;
    char **lines = malloc( ( numLines + 1 ) * sizeof( *lines ) );
    if ( !lines )
    {
        return( 1 );
    }
    lines[ numLines ] = NULL;
    // break the loop explicitly - easier to handle and much less
    // bug-prone than putting the assignment into a while statement
    for ( ;; )
    {
        // get the next line
        size_t bytes = 0UL;
        char *line = NULL;
        ssize_t result = getline( &line, &bytes, stdin );
        if ( result < 0 )
        {
            free( line ); // getline() may allocate even when it fails
            break;
        }
        // enlarge the array by one: numLines lines, the new one, and the NULL
        char **tmp = realloc( lines, ( numLines + 2 ) * sizeof( *tmp ) );
        if ( !tmp )
        {
            free( line );
            break;
        }
        lines = tmp;
        // add the new line to the end of the array, then re-terminate it
        lines[ numLines ] = line;
        numLines++;
        lines[ numLines ] = NULL;
    }
    // use lines - then free them
    for ( size_t i = 0; i < numLines; i++ )
    {
        free( lines[ i ] );
    }
    free( lines );
    return( 0 );
}
That can be optimized by doing the realloc() calls in chunks, such as every 32 or 64 lines. But given that you're already effectively calling malloc() once per line, that might not help much.

pointer of pointer of char in c, assignment crashes

I have a pointer to pointer to store the lines I read from a file:
char **lines;
And I'm assigning them like this :
line_no=0;
*(&lines[line_no++])=buffer;
But it crashes. Why?
According to my logic, the & should give the address of the element at index zero, and then *var=value is how you store a value through a pointer. Isn't it?
Here is my current complete code :
void read_file(char const *name,int len)
{
    int line_no=0;
    FILE* file;
    int buffer_length = 1024;
    char buffer[buffer_length];
    file = fopen(name, "r");
    while(fgets(buffer, buffer_length, file)) {
        printf("---%s", buffer);
        ++line_no;
        if(line_no==0)
        {
            lines = (char**)malloc(sizeof(*lines) * line_no);
        }
        else
        {
            lines = (char**)realloc(lines,sizeof(*lines) * line_no);
        }
        lines[line_no-1] = (char*)malloc(sizeof(buffer));
        lines[line_no-1]=buffer;
        printf("-------%s--------\n", *lines[line_no-1]);
    }
    fclose(file);
}
You have just a pointer, nothing more. You need to allocate memory using malloc().
Actually, you need first to allocate memory for pointers, then allocate memory for strings.
N lines, each M characters long:
char** lines = malloc(sizeof(*lines) * N);
for (int i = 0; i < N; ++i) {
    lines[i] = malloc(sizeof(*(lines[i])) * M);
}
You are also taking an address and then immediately dereferencing it: *(&foo) is simply foo, so it makes little to no sense.
For the updated code:
Oh, there is so much wrong with that code...
You need to include stdlib.h to use malloc()
lines is undeclared: a char** lines declaration is missing before the loop
The if in the loop checks whether line_no is 0 and, if it is, allocates lines. The problem is that line_no is then 0, and sizeof(*lines) times 0 is still zero, so it allocates no memory.
But! There is ++line_no at the beginning of the loop, so line_no is never 0 there and malloc() isn't called at all.
lines[line_no-1] = buffer; doesn't copy from buffer to lines[line_no-1], it just assigns a pointer (and leaks the block malloc() returned on the line before). To copy strings in C you need strcpy().
printf("-------%s--------\n", *lines[line_no-1]); passes a single char where %s expects a char *; drop the *.
fgets() keeps the newline character at the end of buffer; you probably want to remove it: buffer[strcspn(buffer, "\n")] = '\0';
Argument len is never used.
char buffer[buffer_length]; is a variable-length array (VLA), because buffer_length is not a constant expression; don't use a VLA here, use a fixed size instead
It would be better to increment line_no at the end of the loop instead of constantly calculating line_no-1
In C, casting the result of malloc() isn't necessary: void * converts implicitly to any object pointer type
There is no check whether opening the file failed
You aren't freeing the memory
Considering all of this, I quickly "corrected" it to such state:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
void read_file(char const* name)
{
    FILE* file = fopen(name, "r");
    if (file == NULL) {
        return;
    }
    char buffer[1024];
    char** lines = NULL;   // realloc(NULL, ...) behaves like malloc()
    int line_no = 0;
    while (fgets(buffer, sizeof buffer, file)) {
        buffer[strcspn(buffer, "\n")] = '\0';   // strip the newline, if any
        printf("---%s\n", buffer);
        char** tmp = realloc(lines, sizeof (*lines) * (line_no+1));
        if (tmp == NULL) {
            break;   // keep the lines read so far; they are freed below
        }
        lines = tmp;
        lines[line_no] = malloc(strlen(buffer) + 1);   // +1 for the '\0'
        if (lines[line_no] == NULL) {
            break;
        }
        strcpy(lines[line_no], buffer);   // copy the characters, not the pointer
        printf("-------%s--------\n", lines[line_no]);
        ++line_no;
    }
    fclose(file);
    for (int i = 0; i < line_no; ++i) {
        free(lines[i]);
    }
    free(lines);
}
Ok, you have a couple of errors here:
lines array is not declared
Your allocation is wrong
I don't understand this block: allocating something multiplied by zero is pointless (and since line_no was already incremented, it never runs anyway):
if( line_no == 0 )
{
    lines = (char**)malloc(sizeof(*lines) * line_no);
}
You shouldn't start with a one-element array and reallocate it for every line. Each realloc() may have to copy the whole array, so growing one element at a time gets slow for large files; grow in bigger steps instead (doubling is the usual choice).
I recommend you check the question "Do I cast the result of malloc?" regarding casting the result of malloc().
You could write something like this:
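#include <stdio.h>
#include <stdlib.h>
#include <string.h>
void read_file(char const *name)
{
    int line_no = 0, arr_size = 10;
    char buffer[1024];
    char **lines;
    FILE* file;
    file = fopen(name, "r");
    if (file == NULL)
        return;
    lines = malloc(sizeof(char*) * arr_size);
    if (lines == NULL) {
        fclose(file);
        return;
    }
    while(fgets(buffer, sizeof buffer, file)) {
        buffer[strcspn(buffer, "\n")] = '\0';   // remove the newline, if any
        printf("---%s\n", buffer);
        if(line_no == arr_size)
        {
            // array is full: grow it, keeping the old block if realloc fails
            char **tmp = realloc(lines, sizeof(char*) * (arr_size + 10));
            if (tmp == NULL)
                break;
            lines = tmp;
            arr_size += 10;
        }
        lines[line_no] = malloc(strlen(buffer) + 1);
        if (lines[line_no] == NULL)
            break;
        strcpy(lines[line_no], buffer);   // copy the string, don't just assign the pointer
        printf("-------%s--------\n", lines[line_no]);
        ++line_no;
    }
    fclose(file);
    for (int i = 0; i < line_no; ++i)
        free(lines[i]);
    free(lines);
}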
PS: fgets() also keeps the '\n' character at the end. To remove it, you can write buffer[strcspn(buffer, "\n")] = '\0'; which, unlike buffer[strlen(buffer)-1] = '\0';, does not cut off the last character of a final line that has no '\n'.

Why is fgets resetting my double pointer element?

I'm trying to store an array of strings in a double pointer, but it doesn't seem to work.
char **pointerA;
char *pointerB;
int count;
FILE* file = fopen("textfile.ini", "r");
pointerA = (char **) malloc (sizeof(*pointerA));
pointerB = (char *) malloc (sizeof(*pointerB));
while(fgets(pointerB, 200, file) !== NULL)
{
    pointerA = (char **)realloc(pointerA, sizeof(char *) * (strlen(pointerB) + 1));
    pointerA[count] = pointerB;
    count++;
}
fclose(file);
I expect every element to store only its own string, but it seems like all the elements store the last string.
You need to allocate each element of pointerA, like so:
int nbLines = 10; //number of lines to read in file
char** pointerA = malloc ( sizeof ( char* ) * nbLines ); //allocate the array of pointers; each element points to another string
int i;
for ( i = 0; i < nbLines; ++i ) {
    char line [ 200 ];
    if ( !fgets ( line , 200 , file ) ) //get a line from the file; stop at end of file
        break;
    pointerA [ i ] = malloc ( strlen ( line ) + 1 ); //allocate a string with the size of that line, plus 1 for '\0'
    if ( !pointerA [ i ] )
        break;
    strcpy ( pointerA [ i ] , line ); //copy the characters: pointerA [ i ] = line would only store the address of line, which is reused every iteration
}
// i is now the number of lines actually stored
pointerB = (char *) malloc (sizeof(*pointerB));
You allocate 1 char, but then let fgets write up to 200 chars into it.
edit
It has to be something like this
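#define MAXSTRING 200
char **pointerA = NULL;
char *pointerB;
size_t nlines = 0;
FILE* file = fopen("textfile.ini", "r");
if (file) {
    // a fresh buffer for every line, so each pointerA element owns its own string
    while ((pointerB = malloc(MAXSTRING)) != NULL && fgets(pointerB, MAXSTRING, file) != NULL)
    {
        char **tmp = realloc(pointerA, sizeof(char *) * (nlines+1));
        if (!tmp)
            break; // pointerB is freed below
        pointerA = tmp;
        pointerA[nlines] = pointerB;
        nlines++;
    }
    free(pointerB); // the buffer whose fgets failed (free(NULL) is harmless)
    fclose(file);
}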
If you want the program to work with n strings you could do something like this:
int cur_lines = 0;
char **pointerA = malloc(sizeof(char *) * (cur_lines + 1));
char line [200];
while(fgets(line, 200, file)) {
    pointerA[cur_lines] = malloc(sizeof(char) * 200);
    strcpy(pointerA[cur_lines],line);
    cur_lines += 1;
    pointerA = realloc(pointerA, sizeof(char *) * (cur_lines + 1));
}
Of course, you should check that the results of malloc and realloc are not NULL before using them. Usually you don't directly overwrite the pointer you pass to realloc: assign the result to a new pointer instead, and overwrite the old one only if it's not NULL, so the original block isn't leaked when realloc fails.
If you want better performance, don't grow pointerA by only 1 on each iteration but by more (usually doubling it), and keep a counter of used slots. Also keep in mind that a 200-char buffer means the maximum line length is actually 199, since the last character is '\0'.
The problem with this approach is that you'll always have one last unused slot in pointerA that you need to take care of later.

Realloc on an array of structs, address boundary error when indexing

I have some code where I'm trying to read lines in from a file and store some information from each line in a struct. Since I don't know how long the file will be, I'm dynamically adjusting the array of structs using realloc.
My issue is that my code seems to work fine for the first 3 (technically 6) lines, and then I receive SIGSEGV (address boundary error). gdb says that this happens when trying to index the array (array[i]->string = (char*) _tmp).
typedef struct {
    char* string;
    int len;
} buffer;
int read_into_array(char *filename, buffer** array) {
    int n;
    size_t size;
    char* buf = NULL;
    FILE *file = fopen(filename, "r");
    int i = 0;
    while (1) {
        buffer *tmp = (buffer*)realloc(*array, sizeof(buffer) * (i + 1));
        if (!tmp)
            printf("Failed realloc\n");
        *array = tmp;
        // First line is ignored, second line is taken as data.
        getline(&buf, &size, file);
        n = getline(&buf, &size, file);
        if (n > 0) {
            void* _tmp = malloc(sizeof(char) * n);
            if (!_tmp)
                printf("Failed malloc\n");
            array[i]->string = (char*) _tmp;
            array[i]->len = n-1;
            strncpy(array[i]->string, buf, n-1);
        }
        i++;
        if (feof(file)) {
            printf("saw end of file, leaving.\n");
            break;
        }
    }
    return i;
}
int main(int argc, char* argv[]) {
    char *filename = argv[1];
    buffer *array = (buffer*) calloc(1, sizeof(buffer));
    int num = read_into_array(filename, &array);
}
Apologies for the somewhat poor formatting, I've been trying to figure this out for a while.
Since it seems to work for the first few lines, my assumption is that I'm going wrong somewhere in the realloc calculation. My other guess is that I'm somehow using/reading the file incorrectly.
Thanks for any help. For posterity, the file looks something like this https://hastebin.com/vinidiyita.sm (the real file is thousands of lines long).
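When you do *array = tmp, you update the single buffer * variable in main that array points to, so only array[0] (which is the same as *array) is valid.
For i > 0, array[i] treats the memory after that variable as more buffer * pointers, so it reads garbage (or 0), and dereferencing it crashes. Index the allocated array instead: (*array)[i].string and (*array)[i].len. Note also that strncpy(array[i]->string, buf, n-1) does not write a terminating '\0'.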
You're confusing two ways to use data.
The first is by using arrays - there's the non-dynamic:
buffer array[x] = {0};   // x: a compile-time constant
int num = read_into_array(filename, array, x);   // array decays to buffer *
then you can use array[i]
and there's the dynamic type:
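buffer **array = calloc(initial_len, sizeof *array);
int num = read_into_array(filename, &array, initial_len);
// C has no references (buffer **&array is C++), so pass a pointer to the pointer
int read_into_array(char *filename, buffer ***array, size_t initial_len)
{
    size_t len = initial_len;
    int i = 0;
    ...
    while (/* another record was read */)
    {
        ...
        if ((size_t)i == len)
        {
            // full: double the pointer array, keeping it intact if realloc fails
            buffer **tmp = realloc(*array, sizeof(buffer *) * len * 2);
            if (!tmp)
                break;
            *array = tmp;
            len *= 2;
        }
        (*array)[i] = calloc(1, sizeof(buffer));
        ...
        i++;
    }
    return i;
}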
