Reading multiple files with different number of lines - arrays

I am trying to read arrays from multiple files, and the array size in each file is different. So what I do is, I try to count the number of lines in the file and then store that as the array size.
For example, I have two .txt files, File_1.txt and File_2.txt which contain the following data:
0.000 300.00
0.054 2623.3
1.000 300.00
0.000 300.00
0.054 2623.3
0.500 1500.0
1.000 300.00
respectively.
Here is the code that I use:
int main()
{
char filter[1024];
char filename[60];
FILE *fp;
double *T_SR, Z_SR;
for (int i = 1; i < 3; i++)
{
sprintf(filename, "File_%d.txt", i);
fp = fopen(filename, "r");
if (fp == NULL)
{
exit(1);
}
int count = 0;
for (int j = getc(fp); j != EOF; j = getc(fp))
{
if (j == '\n')
{
count = count + 1;
}
}
T_SR = (double *)malloc(count * sizeof(double));
Z_SR = (double *)malloc(count * sizeof(double));
for (int rows = 0; rows < count; rows++)
{
fscanf(fp, "%lf %lf", &Z_SR[rows], &T_SR[rows]);
printf("%lf %lf\n", Z_SR[rows], T_SR[rows]);
if (feof(fp))
{
break;
}
}
}
}
But instead of printing the given array as output, it prints this:
0.0000 0.0000
0.0000 0.0000
I checked the value of count, it's good. Maybe the problem is simple, but I am not able to find it. Can someone please help?

After you have read the whole file with getc, the file position indicator is at the end of the file. You must set it back to the beginning before you call fscanf; you can use rewind for that.
rewind(fp); //<--
for (int rows = 0; rows < count; rows++)
{
//...
}
Aside from that, other problems exist, as Jaberwocky pointed out: among others, a memory leak, and the fact that you neither close your files nor check the return value of malloc. Here's how your code could look (with comments):
double *T_SR, *Z_SR; // fix the pointer issue
//...
char line[1024]; // make sure it's larger than the largest line in the file
while (fgets(line, sizeof line, fp)) // fixes the count issue
{
// doesn't count empty lines, if there are any
if (line[0] != '\n')
{
count++;
}
}
if(count > 0)
{
T_SR = malloc(count * sizeof *T_SR);
Z_SR = malloc(count * sizeof *Z_SR);
if(T_SR == NULL || Z_SR == NULL) // check memory allocation
{
perror("malloc");
return EXIT_FAILURE;
}
rewind(fp);
for(int rows = 0; fscanf(fp, "%lf%lf", &Z_SR[rows], &T_SR[rows]) == 2; rows++)
{
printf("%lf %lf\n", Z_SR[rows], T_SR[rows]);
}
free(T_SR); // free the memory, avoids memory leaks
free(Z_SR);
}
fclose(fp); // and close the file
//...

There are several bugs:
The most important one is the rewind issue that has been addressed in anastaciu's answer.
double * T_SR, Z_SR is wrong, it should be double * T_SR, *Z_SR. I actually wonder whether the code you posted is the code you compile.
Your line counting method is flawed. If the last line of the file does not end with a \n, count will be one too low (2 for File_1.txt) and you'll miss the last line.
fscanf returns the number of items read or EOF. If you had checked that, you might have found the problem in your code yourself.
The feof check is done too late: if fscanf encounters an EOF, you still print the values that have not been read due to the EOF condition.
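The line-counting flaw in the list above can be avoided by remembering the last character read. The following is only a sketch under that assumption; count_lines is a hypothetical helper name, not something from the question.

```c
#include <stdio.h>

/* Hypothetical helper: counts the lines in fp, including a final line
   that has no trailing '\n'. The stream is rewound before counting and
   again afterwards, so a later fscanf starts at the beginning. */
size_t count_lines(FILE *fp)
{
    size_t count = 0;
    int ch;
    int last = '\n';            /* an empty file counts as zero lines */

    rewind(fp);
    while ((ch = fgetc(fp)) != EOF) {
        if (ch == '\n')
            count++;
        last = ch;
    }
    if (last != '\n')
        count++;                /* unterminated last line */
    rewind(fp);
    return count;
}
```

Blank lines are still counted here, so the result can exceed the number of value pairs; checking the return value of fscanf remains necessary.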

I try to count the number of lines in the file and then store that as the array size.
Aside from the key rewind() issue, avoid reading the file one way to find the line count and another way to read the doubles. It is far too easy to get a line count that does not match the number of lines that actually yield two doubles.
Use one approach to find both.
size_t read_SR(size_t count, double *Z_SR, double *T_SR, FILE *inf) {
char line[100];
rewind(inf);
size_t rows = 0; // must start at zero before counting
while (fgets(line, sizeof line, inf)) {
double Z, T;
if (sscanf(line, "%lf %lf", &Z, &T) != 2) return rows;
if (rows < count) {
if (Z_SR) Z_SR[rows] = Z;
if (T_SR) T_SR[rows] = T;
}
rows++;
}
return rows;
}
Usage
// First pass, find size
size_t count = read_SR(0, NULL, NULL, inf);
double *T_SR = malloc(sizeof *T_SR * count);
double *Z_SR = malloc(sizeof *Z_SR * count);
// 2nd pass, save data
read_SR(count, Z_SR, T_SR, inf);
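A different way to keep counting and reading in agreement is to read once and grow the arrays as needed. This is a sketch of that alternative, not the approach above; read_pairs and its parameters are hypothetical names chosen for illustration.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads "Z T" pairs from fp until a pair fails to
   parse or the input ends. On success returns 0 and hands back two
   malloc'd arrays (to be freed by the caller) plus the pair count.
   Returns -1 if memory runs out. */
int read_pairs(FILE *fp, double **Z_out, double **T_out, size_t *n_out)
{
    size_t cap = 0, n = 0;
    double *Z = NULL, *T = NULL;
    double z, t;

    while (fscanf(fp, "%lf %lf", &z, &t) == 2) {
        if (n == cap) {
            /* Double the capacity, starting from 8 elements. */
            size_t new_cap = cap ? cap * 2 : 8;
            double *nz = realloc(Z, new_cap * sizeof *nz);
            if (nz == NULL) { free(Z); free(T); return -1; }
            Z = nz;
            double *nt = realloc(T, new_cap * sizeof *nt);
            if (nt == NULL) { free(Z); free(T); return -1; }
            T = nt;
            cap = new_cap;
        }
        Z[n] = z;
        T[n] = t;
        n++;
    }
    *Z_out = Z;
    *T_out = T;
    *n_out = n;
    return 0;
}
```

The trade-off is a few realloc calls in exchange for a single pass over the file and no separate line count to get wrong.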

Related

Not able to print this 2D array (weird output) in C

I am trying to read a text file with 100 numbers like 1 2 45 55 100 (all on a single line) and then put them in a 10x10 array (2D array).
736.2 731.6 829.8 875.8 568.3 292.2 231.1 868.9 66.7 811.9 292.0 967.6 419.3 578.1 322.5 471.7 980.0 378.8 784.1 116.8 900.4 355.3 645.7 603.6 409.1 652.1 144.1 590.6 953.1 954.0 502.0 689.3 685.6 331.9 565.1 253.9 624.1 796.2 122.8 690.7 608.0 414.8 658.3 27.3 992.9 980.8 499.0 972.8 359.7 283.1 89.7 260.1 638.4 735.4 863.6 47.5 387.5 7.7 638.1 340.6 961.7 140.1 29.8 647.3 471.9 594.9 901.2 96.0 391.1 24.0 786.7 999.1 438.7 445.0 26.4 431.6 425.9 525.4 404.4 785.6 808.5 494.1 45.7 447.0 229.5 909.3 494.4 617.0 917.0 132.5 957.5 878.8 272.6 987.4 526.1 744.5 582.3 427.3 840.5 973.3
Here is my code:
#include <stdio.h>
#define NR 10
#define NC 10
int main(void) {
int numbers[9][9];
int i = 0;
int count;
int j = 0;
FILE *file;
file = fopen("numbers.txt", "r");
for (count = 1; count < 101; count++) {
fscanf(file, "%d", &numbers[i][j]);
j++;
if ((count != 1) && (count % 10 == 0)) {
i++;
j = 0;
}
}
fclose(file);
int p = 0;
int q = 0;
for (p = 0; p < NR; p++) {
for (q = 0; q < NC; q++) {
printf("%d", numbers[p][q]);
}
printf("\n");
}
return 0;
}
As SparKot noted in a comment, to read a 10x10 matrix, you need to define the matrix with 10x10 elements:
int numbers[10][10];
That has to be one of the weirder ways of reading a 10x10 matrix that I've ever seen. Why not go for a simple approach of nested loops? Since the data contains floating-point numbers, you need to read them as double (or perhaps float) values.
for (int i = 0; i < 10; i++)
{
for (int j = 0; j < 10; j++)
{
double double_val;
if (fscanf(file, "%lf", &double_val) != 1)
{
fprintf(stderr, "failed to read matrix[%d][%d]\n", i, j);
exit(EXIT_FAILURE);
}
numbers[i][j] = double_val;
}
}
The mess with double_val works around the data containing floating point numbers and your original code trying to read integers. You'll get one valid value; thereafter, fscanf() will return 0 because the . is not a part of a valid integer. This highlights the importance of checking the return value from fscanf() and its relatives.
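The behaviour described above is easy to check with sscanf, which follows the same conversion rules as fscanf. A minimal sketch; ints_converted is a hypothetical name used only for this demonstration.

```c
#include <stdio.h>

/* Hypothetical helper: returns how many of the two %d conversions
   succeed on the text s. */
int ints_converted(const char *s)
{
    int a = 0, b = 0;
    return sscanf(s, "%d %d", &a, &b);
}
```

ints_converted("736.2 731.6") returns 1 because the second %d stops at the '.', whereas ints_converted("736 731") returns 2.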
Frankly, you should be using double numbers[10][10]; for the data from the file. Then you could read directly into the array:
if (fscanf(file, "%lf", &numbers[i][j]) != 1)
But you'd need to check (and probably change) all the rest of the code too.
There are multiple issues in your code:
the matrix is too small, make it numbers[NR][NC].
you do not check for fopen failure: you will have undefined behavior if the file numbers.txt is not in the current directory or cannot be opened for reading.
you read the file contents as integers, but the file contains floating point numbers with a . decimal separator: the second and subsequent fscanf() will get stuck on the . and keep returning 0 without modifying the destination number, leaving the matrix mostly uninitialized. Make the matrix double numbers[NR][NC], read the numbers with %lf and test for conversion failure.
the counting method in the reading loop is weird. Just use 2 nested for loops with proper counter and tests.
printing the matrix contents, you should output at least a space between numbers so the output is readable.
Here is a modified version:
#include <errno.h>
#include <stdio.h>
#include <string.h>
#define NR 10
#define NC 10
int main() {
double numbers[NR][NC];
FILE *file;
file = fopen("numbers.txt", "r");
if (file == NULL) {
fprintf(stderr, "cannot open numbers.txt: %s\n", strerror(errno));
return 1;
}
for (int i = 0; i < NR; i++) {
for (int j = 0; j < NC; j++) {
if (fscanf(file, "%lf", &numbers[i][j]) != 1) {
fprintf(stderr, "error reading number at row %d, col %d\n",
i + 1, j + 1);
fclose(file);
return 1;
}
}
}
fclose(file);
for (int p = 0; p < NR; p++) {
for (int q = 0; q < NC; q++) {
printf(" %5g", numbers[p][q]);
}
printf("\n");
}
return 0;
}

Input in specific format (matrix)

I have an issue with input in my homework. On stdin, I will get a specifically formatted input.
In first line, there will be 2 integers, that determine the size of a matrix (rows and cols). All the lines after represent rows of the matrix.
I essentially want to do something like getline(), but I don't want to use getline(). In fact I can't, it's forbidden in the homework. Therefore I have to scan int by int (or char by char I guess). The issue here is I need it to be bulletproof (almost). Dummy-proof at least.
I'm imagining a big while loop that keeps going until EOF and inside that another loop (perhaps?) which always reads a line, saves it to my allocated matrix and carries on to the next. I'm aware that I'm supposed to be checking for '\n', but I kind of lack the ability to think of a solution today.
Here's what I'm working with: My matrices are a structure.
struct Matrix{
int nrows;
int ncols;
int** matrix;
};
I then have multiple functions.
A function to dynamically allocate space for the matrix of specific size:
struct Matrix init_matrix(int r, int c)
{
struct Matrix mat;
mat.nrows = r;
mat.ncols = c;
mat.matrix = calloc(r, sizeof(int *));
for(int i = 0; i < r; ++i)
{
*(mat.matrix+i) = calloc(c, sizeof(int));
}
return mat;
}
A function to free the previously allocated space:
void free_matrix(struct Matrix mat)
{
int top = mat.nrows;
for(int i = 0; i < top; ++i)
{
free(mat.matrix[i]);
}
free(mat.matrix);
}
Those 2 functions work perfectly fine.
Now I'm trying to make a function create_matrix(void) (at least I think it shouldn't take any args), that will read the input I'm supposed to receive, for example:
3 3
1 2 3
4 5 6
7 8 9
When the function reads the input, it should be able to tell whether the input is incorrect or incorrectly formatted and exit the program with a corresponding exit value (e.g. 100). If the input is correct and correctly formatted, it calls init_matrix() and then saves the input to the matrix.
For your deeper understanding: the whole input I'm supposed to receive is:
matrix A (like above, size in first line, values in lines after)
an operation (+,-,*)
matrix B
Then execute the operation (A*B, A+B etc.). I'm trying to put most things into functions, so that main would be very simple, e.g.:
int main(int argc, char *argv[])
{
struct Matrix mat1 = create_matrix();
char operation = get_operation();
struct Matrix mat2 = create_matrix();
struct Matrix result = compute(mat1,mat2, operation);
return 0;
}
Something along those lines, if you get me. The thing is I want to make the program flexible enough so that I could later edit it to handle a longer sequence (up to 100) of matrices than just two. Right now I could do it the dirty way, make it work for two matrices with one operation, but that's not what I really want.
Well, here's how I solved it. It's nowhere near perfect, but it works: the upload system accepted it and gave it full points, so I'm satisfied.
struct Matrix read_matrix(FILE *fp)
{
struct Matrix mat;
//FIRST LINE
int ch;
int i = 0;
int n = 20;
char* line = calloc(n,sizeof(char));
while((ch = fgetc(fp)) != EOF && ch != '\n')
{
*(line + i++) = ch;
}
*(line + n-1) = '\0';
int r,c;
int k = sscanf(line,"%d %d", &r, &c);
if(k != 2)
{
fprintf(stderr, "Error: Chybny vstup!\n");
exit(100);
}
free(line);
//MATRIX
line = calloc(c, sizeof(int));
mat = init_matrix(r, c);
i = 0;
r = 0;
while(r < mat.nrows && (ch = fgetc(fp)))
{
if(ch == '\n' || ch == EOF)
{
*(line + i) = '\0';
int offset;
char *data = line;
for(int j = 0; j < mat.ncols; ++j)
{
int d = sscanf(data, " %d%n", &mat.matrix[r][j], &offset);
if(d != 1){
fprintf(stderr, "Error: Chybny vstup!\n");
exit(100);
}
data += offset;
}
i = 0;
++r;
if(ch == EOF){
break;
}
} else
{
*(line + i++) = ch;
}
}
free(line);
return mat;
}
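The solution above copies each line into a fixed-size buffer without checking its length, so long rows can overflow it. One way to sidestep the buffer entirely is to validate a row directly on the stream, character by character. This is a sketch under that assumption; read_row is a hypothetical helper and not part of the original solution.

```c
#include <stdio.h>

/* Hypothetical helper: reads exactly ncols integers from the current
   line of fp into row. Returns 1 on success, 0 if the line has too few
   values, too many, or something that is not an integer. */
int read_row(FILE *fp, int *row, int ncols)
{
    for (int j = 0; j < ncols; ++j) {
        int ch;
        /* Skip blanks by hand: fscanf's %d would also skip '\n' and
           silently continue on the next line. */
        while ((ch = fgetc(fp)) == ' ' || ch == '\t' || ch == '\r')
            ;
        if (ch == '\n' || ch == EOF)
            return 0;                   /* row ended early */
        ungetc(ch, fp);
        if (fscanf(fp, "%d", &row[j]) != 1)
            return 0;                   /* not an integer */
    }
    /* Only blanks may remain before the end of the line. */
    int ch;
    while ((ch = fgetc(fp)) != '\n' && ch != EOF) {
        if (ch != ' ' && ch != '\t' && ch != '\r')
            return 0;                   /* extra data on the line */
    }
    return 1;
}
```

A caller would read the size line the same way with ncols set to 2, call init_matrix(r, c), and then call read_row(fp, mat.matrix[i], c) for each row, exiting with code 100 on the first 0 returned.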

Retrieving an array from a file and finding its size in C

I have a file that I have to read some numbers from and put them into an array. The only problem is that I don't know how to find the size of it. I am given the maximum size of the array but the numbers don't fill the array completely. I tried many different ways to make it work but it doesn't read the correct values from the file. Is there any other way to do it without sizeof?
#include<stdio.h>
#define MAX_NUMBER 25
int main(void)
{
int test[];
int size;
FILE* sp_input;
int i;
sp_input = fopen("a20.dat", "r");
if (sp_input == NULL)
printf("\nUnable to open the file a20.dat\n");
else
{
while( fscanf(sp_input, "%d", &test[i])!=EOF)
{
size=sizeof(test)/sizeof(test[0]);
}
for(i = 0; i < size; i++)
printf("\na[%d]=%d has a size of %d\n", i,test[i],size);
fclose(sp_input);
}
return 0;
}
If you increment i each time you successfully do a fscanf, it will serve as a count of the number of items read.
i = 0;
while (fscanf(sp_input, "%d", &test[i]) == 1) {
i = i + 1;
}
// Now, i is the number of items in the list, and test[0] .. test[i-1]
// are the items.
Edit: As @chux pointed out, in this case it's better to compare to 1, the expected number of items scanned, on each call. If bogus input is provided (non-digits), fscanf returns 0 rather than EOF, and you should stop.
Define a maximum size array and continue looping as able.
File input need not fill the array, just populate it as far as it can. Keep track, in i, of how many elements of test[] were used, and be sure not to overfill the array.
#define MAX_NUMBER 25
int test[MAX_NUMBER];
FILE* sp_input = fopen("a20.dat", "r");
...
// Use `size_t` for array indexing
size_t i;
// do not read too many `int`
for (i=0; i<MAX_NUMBER; i++) {
if (fscanf(sp_input, "%d", &test[i]) != 1) {
break;
}
printf("test[%zu]=%d\n", i, test[i]);
}

How to read a 10 GB txt file consisting of tab-separated double data line by line in C

I have a txt file consisting of tab-separated data with type double. The data file is over 10 GB, so I just wish to read the data line-by-line and then do some processing. In particular, the data is laid out as a matrix with, say, 1001 columns and millions of rows. Below is just a fake sample to show the layout.
10.2 30.4 42.9 ... 3232.000 23232.45
...
...
7.234 824.23232 ... 4009.23 230.01
...
For each line I'd like to store the first 1000 values in an array, and the last value in a separate variable. I am new to C, so it would be nice if you could kindly point out major steps.
Update:
Thanks for all valuable suggestions and solutions. I just figured out one simple example where I just read a 3-by-4 matrix row by row from a txt file. For each row, the first 3 elements are stored in x, and the last element is stored in vector y. So x is a n-by-p matrix with n=p=3, y is a 1-by-3 vector.
Below is my data file and my code.
Data file:
1.112272 -0.345324 0.608056 0.641006
-0.358203 0.300349 -1.113812 -0.321359
0.155588 2.081781 0.038588 -0.562489
My code:
#include<math.h>
#include <stdlib.h>
#include<stdio.h>
#include <string.h>
#define n 3
#define p 3
void main() {
FILE *fpt;
fpt = fopen("./data_temp.txt", "r");
char line[n*(p+1)*sizeof(double)];
char *token;
double *x;
x = malloc(n*p*sizeof(double));
double y[n];
int index = 0;
int xind = 0;
int yind = 0;
while(fgets(line, sizeof(line), fpt)) {
//printf("%d\n", sizeof(line));
//printf("%s\n", line);
token = strtok(line, "\t");
while(token != NULL) {
printf("%s\n", token);
if((index+1) % (p+1) == 0) { // the last element in each line;
yind = (index + 1) / (p+1) - 1; // get index for y vector;
sscanf(token, "%lf", &(y[yind]));
} else {
sscanf(token, "%lf", &(x[xind]));
xind++;
}
//sscanf(token, "%lf", &(x[index]));
index++;
token = strtok(NULL, "\t");
}
}
int i = 0;
int j = 0;
puts("Print x matrix:");
for(i = 0; i < n*p; i++) {
printf("%f\n", x[i]);
}
printf("\n");
puts("Print y vector:");
for(j = 0; j < n; j++) {
printf("%f\t", y[j]);
}
printf("\n");
free(x);
fclose(fpt);
}
With the above, hopefully things will work if I replace data_temp.txt with my raw 10 GB data file (of course changing the values of n, p, and some other code wherever necessary).
I have additional questions that I hope you can help me with.
I first initialized char line[] as char line[(p+1)*sizeof(double)] (note not multiplying n). But the line cannot be read completely. How could I assign memory JUST for one single line? What's the length? I assume it's (p+1)*sizeof(double) since there are (p+1) doubles in each line. Should I also assign memory for \t and \n? If so, how?
Does the code look reasonable to you? How could I make it more efficient since this code will be executed over millions of rows?
If I don't know the number of columns or rows in the raw 10 GB file, how could I quickly count rows and columns?
Again, I am new to C, so any comments are much appreciated. Thanks a lot!
1st way
Read file in chunks into preallocated buffer using fread.
2nd way
Map the file into your process memory space using mmap, then move a pointer over the mapped file.
3rd way
Since your file is delimited by lines, open the file with fopen, use setvbuf or similar to set a buffer size greater than about 10 lines or so, then read the file line-by-line using fgets.
To potentially read the file even faster, use open with O_DIRECT (assuming Linux), then use fdopen to get a FILE * for the open file, then use setvbuf to set a page-aligned buffer. Doing that will allow you to bypass the kernel page cache - if your system's implementation works successfully using direct IO that way. (There can be many restrictions to direct IO)
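For the fgets-based approach, note that the buffer must hold the text of a line, which has nothing to do with sizeof(double): each value takes as many characters as its printed form, plus one for each tab and for the newline. Once a line is in memory, strtod can walk along it. This is a sketch only; parse_row is a hypothetical helper name.

```c
#include <stdlib.h>

/* Hypothetical helper: parses nx doubles into x and one final double
   into *y from a single text line. Returns 1 on success, 0 if the line
   has too few values, a malformed value, or trailing non-whitespace. */
int parse_row(const char *line, double *x, size_t nx, double *y)
{
    const char *p = line;
    char *end;

    for (size_t i = 0; i <= nx; ++i) {
        double v = strtod(p, &end);   /* skips leading blanks and tabs */
        if (end == p)
            return 0;                 /* no conversion happened */
        if (i < nx)
            x[i] = v;
        else
            *y = v;
        p = end;
    }
    /* Anything left over must be whitespace. */
    while (*p == ' ' || *p == '\t' || *p == '\r' || *p == '\n')
        ++p;
    return *p == '\0';
}
```

With 1001 columns, a line buffer of a few tens of kilobytes passed to fgets is a reasonable starting guess, but the right size depends on how many digits each value is written with.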
Something to get you started: Reading 1 line
#define COLUMN (1000+1)
double data[COLUMN];
for (int i = 0; i< COLUMN; i++) {
char delim = '\n';
int cnt = fscanf(in_stream, "%lf%c", &data[i], &delim);
if (cnt < 1) {
if (cnt == EOF && i == 0) return 0; // None read, OK as end of file
puts("Missing or bad data");
return -1; // problem
}
if (delim != '\t') {
// If tab not found, should be at end of line
if (delim == '\n' && i == COLUMN-1) {
return COLUMN; // Success
}
puts("Bad delimiter");
return -1;
}
}
puts("Extra data");
return -1;

How to read unlimited characters in C

How to read unlimited characters into a char* variable without specifying the size?
For example, say I want to read the address of an employee that may also take multiple lines.
You have to start by "guessing" the size that you expect, then allocate a buffer that big using malloc. If that turns out to be too small, you use realloc to resize the buffer to be a bit bigger. Sample code:
char *buffer;
size_t num_read;
size_t buffer_size;
buffer_size = 100;
buffer = malloc(buffer_size);
num_read = 0;
while (!finished_reading()) {
int c = getchar(); // int rather than char, so EOF can be told apart from data
if (num_read >= buffer_size) {
char *new_buffer;
buffer_size *= 2; // try a buffer that's twice as big as before
new_buffer = realloc(buffer, buffer_size);
if (new_buffer == NULL) {
free(buffer);
/* Abort - out of memory */
}
buffer = new_buffer;
}
buffer[num_read] = c;
num_read++;
}
This is just off the top of my head, and might (read: will probably) contain errors, but should give you a good idea.
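A self-contained variant of the same doubling idea might look like the following. It is a sketch that assumes "finished" simply means end of input; read_all is a hypothetical name.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads everything up to EOF from fp into a heap
   buffer that doubles in size as needed. Returns a NUL-terminated
   string the caller must free, or NULL if allocation fails. */
char *read_all(FILE *fp)
{
    size_t cap = 16, len = 0;
    char *buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    int ch;                          /* int, so EOF is distinguishable */
    while ((ch = fgetc(fp)) != EOF) {
        if (len + 1 >= cap) {        /* keep room for the terminator */
            char *tmp = realloc(buf, cap * 2);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
        buf[len++] = (char)ch;
    }
    buf[len] = '\0';
    return buf;
}
```

Calling read_all(stdin) would collect a multi-line address until the user signals end of input; a different stop condition (for example, an empty line) would need an extra check inside the loop.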
Just had to answer Ex7.1, pg 330 of Beginning C, by Ivor Horton, 3rd edition. Took a couple of weeks to work out. Allows input of floating numbers without specifying in advance how many numbers the user will enter. Stores the numbers in a dynamic array, and then prints out the numbers, and the average value. Using Code::Blocks with Ubuntu 11.04. Hope it helps.
/*realloc_for_averaging_value_of_floats_fri14Sept2012_16:30 */
#include <stdio.h>
#include <stdlib.h>
#define TRUE 1
int main(int argc, char **argv)
{
float input = 0;
int count=0, n = 0;
float *numbers = NULL;
float *more_numbers;
float sum = 0.0;
while (TRUE)
{
do
{
printf("Enter a floating point value (0 to end): ");
scanf("%f", &input);
count++;
more_numbers = (float*) realloc(numbers, count * sizeof(float));
if ( more_numbers != NULL )
{
numbers = more_numbers;
numbers[count - 1] = input;
}
else
{
free(numbers);
puts("Error (re)allocating memory");
exit(TRUE);
}
} while ( input != 0 );
printf("Numbers entered: ");
while( n < count )
{
printf("%f ", numbers[n]); /* n is always less than count.*/
n++;
}
/*need n++ otherwise loops forever*/
n = 0;
while( n < count )
{
sum += numbers[n]; /*Add numbers together*/
n++;
}
/* Divide sum / count = average.*/
printf("\n Average of floats = %f \n", sum / (count - 1));
}
return 0;
}
/* Success Fri Sept 14 13:29 . That was hard work.*/
/* Always looks simple when working.*/
/* Next step is to use a function to work out the average.*/
/*Anonymous on July 04, 2012*/
/* http://www.careercup.com/question?id=14193663 */
How about just putting a 1KB buffer (or 4KB) on the stack, reading into that until you find the end of the address, and then allocate a buffer of the correct size and copy the data to it? Once you return from the function, the stack buffer goes away and you only have a single call to malloc.
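A rough sketch of that idea, assuming for simplicity that the text to read ends at a newline and fits in the stack buffer; read_line_copy is a hypothetical name.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: reads one line into a stack buffer, then returns
   a heap copy of exactly the right size (without the newline). Returns
   NULL at end of input or if malloc fails. Lines longer than the stack
   buffer are split across calls, which is the limit of this approach. */
char *read_line_copy(FILE *fp)
{
    char tmp[1024];

    if (fgets(tmp, sizeof tmp, fp) == NULL)
        return NULL;

    size_t len = strcspn(tmp, "\n");   /* length up to the newline */
    char *out = malloc(len + 1);       /* the single allocation */
    if (out == NULL)
        return NULL;
    memcpy(out, tmp, len);
    out[len] = '\0';
    return out;
}
```

The caller owns the returned string and must free it.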

Resources