Looking for a more efficient method of reading a CSV file into a 2D array of doubles. My current code works okay but takes 2 minutes to read into the array.
The CSV file being read is as follows. The entire file contains 28,325,381 lines and is 1.8GB.
CoordinateX,CoordinateY,CoordinateZ,Pressure,Temperature,VelocityX,VelocityY,VelocityZ,
0,0,0.0904,33.5797,300,-0.00146382,0.000389435,-0.00147085,
0,0.0003,0.0904,33.5795,300,0.126682,-0.000382509,0.00330599,
0,0.0006,0.0904,33.5793,300,0.250278,-0.00151828,0.0100881,
0,0.0009,0.0904,33.5788,300,0.365407,-0.00287706,0.0184123,
...
The original code...
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main() {
/* Declare file and character pointers */
FILE *pFile_r = NULL;
FILE *pFile_w = NULL;
char *pCell;
int n = 0, p = 0, number_properties = 8;
/* Open CSV file */
char buffer[128];
pFile_r = fopen("G:\\csv\\coarse (19000 - 39000)\\39500.csv","r");
/* Declare array */
int array_total_nodes = 28325340; //number of nodes in array
double **properties_array;
properties_array = malloc(array_total_nodes * sizeof(double*));
for (n=0; n<array_total_nodes; n++) {
properties_array[n] = malloc(number_properties * sizeof(double));
}
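/* Fill array */
n = 0; /* reset the row index: the allocation loop left n == array_total_nodes */
rewind(pFile_r);
fgets(buffer, sizeof buffer, pFile_r); /* skip the header line */
while (n < array_total_nodes && fgets(buffer, sizeof buffer, pFile_r)) {
pCell = strtok(buffer, ",");
for (p=0; p<number_properties && pCell != NULL; p++) {
properties_array[n][p] = strtod(pCell, NULL);
pCell = strtok(NULL, ",");
}
n++;
}
fclose(pFile_r);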
return 0;
}
This method is something simple I came up with. I'm hoping there is a more clever way that's faster. Your help is appreciated.
Update
I have switched from fgets to fscanf as follows:
/* stop as soon as a row does not yield all 8 values, instead of testing feof */
while (fscanf(pFile_r, "%lf,%lf,%lf,%lf,%lf,%lf,%lf,%lf,", &properties_array[n][0], &properties_array[n][1], &properties_array[n][2], &properties_array[n][3],
&properties_array[n][4], &properties_array[n][5], &properties_array[n][6], &properties_array[n][7]) == 8) {
n++;
}
Reading in now takes 70 seconds as opposed to 120 (a 40% decrease). I would like to improve upon this further if possible.
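One direction that may be worth trying is a minimal sketch along these lines: keep fgets for line input, but let strtod walk the line itself through its end pointer, so neither strtok nor fscanf format parsing is involved. The function name parse_csv_row is hypothetical, and whether this is actually faster on your machine is an assumption you would have to measure.

```c
#include <stdlib.h>

/* Hypothetical helper: parses up to max_fields comma-separated doubles
   from line into out and returns how many were converted. */
size_t parse_csv_row(const char *line, double *out, size_t max_fields)
{
    size_t count = 0;
    const char *p = line;

    while (count < max_fields) {
        char *end;
        double value = strtod(p, &end);
        if (end == p)        /* nothing converted: end of line or non-numeric field */
            break;
        out[count++] = value;
        if (*end != ',')     /* no separator after the number: this was the last field */
            break;
        p = end + 1;         /* step over the comma */
    }
    return count;
}
```

If the rows are stored in one contiguous block (a single malloc of array_total_nodes * 8 doubles) instead of one malloc per row, each line could be parsed with parse_csv_row(buffer, &data[n * 8], 8), which also removes the 28 million small allocations.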
Related
I have a text file that lists some groceries and information about them. Looks something like this:
Round_Steak 1kg 17.38 18.50
Chicken 1kg 7.21 7.50
Apples 1kg 4.25 4.03
Carrots 1kg 2.3 2.27
Here's my code that I've used that allows me to reference each individual line:
#include <stdio.h>
#include <string.h>
#define Llimit 100
#define Rlimit 10
int main()
{
//Array Line gets each line from the file by setting a limit for them and printing based on that limit.
char line[Rlimit][Llimit];
FILE *fp = NULL;
int n = 0;
int i = 0;
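fp = fopen("food.txt", "r");
if (fp == NULL)
{
perror("food.txt");
return 1;
}
while(n < Rlimit && fgets(line[n], Llimit, fp))
{
line[n][strcspn(line[n], "\n")] = '\0'; //strip the trailing newline only if one is present
n++;
}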
printf("%s", line[1]);
fclose(fp);
return 0;
}
For instance, if I print line[1], I will get "Chicken 1kg 7.21 7.50". What I need to do however, is separate each string into their individual parts. So if I call something like line[1][0], I will get only "Chicken" as a result. I've tried using strtok(line[i], " ") in some for loops and other things like that, but I'm really stumped about how to apply it to this code.
You can write a function (str_to_word_array) for this.
This is my str_to_word_array function:
https://github.com/la-montagne-epitech/mY_Lib_C/blob/master/my_str_to_word_array.c
It takes a string and a separator (" " in your case), and you store the result in a char **, like this:
char *line; // the string to split
char separator; // the separator character
char **tab = my_str_to_word_array(line, separator);
SOLVED:
With the help of brahimi haroun in the comments, I wrote a separate function to perform the task, and it works great. I thought I would share it here:
/* Returns the requested column of a space-separated line, or NULL if it does not exist.
strtok modifies lines in place, and the returned pointer points into lines. */
char *get_column_item(char *lines, int column)
{
int i = 0;
char *p = strtok(lines, " ");
char *array[4];
while (p != NULL && i < 4)
{
array[i++] = p;
p = strtok(NULL, " ");
}
if (column < 0 || column >= i)
return NULL;
printf("%s\n", array[column]);
return array[column];
}
Now, with my original code, calling get_column_item(line[1], 0); returns the first item in that row, "Chicken".
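If you need every column of a line at once rather than one item per call, a variation could look like the sketch below. It still uses strtok, but fills a caller-provided array of pointers, so nothing is allocated. The name split_fields and its parameters are made up for illustration.

```c
#include <string.h>

/* Hypothetical helper: splits line in place on any character in delims.
   Stores up to max_fields token pointers in fields and returns the count.
   The pointers refer to memory inside line, so line must stay alive. */
size_t split_fields(char *line, const char *delims, char **fields, size_t max_fields)
{
    size_t count = 0;
    char *token = strtok(line, delims);

    while (token != NULL && count < max_fields) {
        fields[count++] = token;
        token = strtok(NULL, delims);
    }
    return count;
}
```

With the question's data, calling split_fields(line[1], " ", fields, 4) would leave fields[0] pointing at "Chicken" and fields[2] at "7.21".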
I'm working on image convolution for a 416 * 416 color image with 3 * 3 * 3 * 16 kernel weights (kernel width 3, kernel height 3, filter channels 3, number of filters 16). I'm trying to do this in C, but first I need to read the image from a text file and store it in memory before working on the convolution function. It seems that C doesn't let me store 416 * 416 * 3 string values in an array. I'm a newbie to C, so I'm trying to figure out the best approach to take here.
Below you can see the code.
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
int main()
{
char line[255];
int fileSize = 416 * 416 * 3;
char image[416 * 416 * 3][255];
FILE *fpointer_1 = fopen("dog_text_image.txt", "r");
for (int i = 0; i < fileSize; i++)
{
fgets(line, 255, fpointer_1);
strcpy(image[i], line);
};
fclose(fpointer_1);
printf("1st value : %s\n", image[0]);
printf("2nd value : %s\n", image[1]);
printf("3rd value : %s\n", image[2]);
return 0;
}
You can alternatively use pointer to array (instead of pointer to pointer) to allocate the entire matrix with a single malloc, and deallocate it with a single free:
#include <stdio.h>
#include <stdlib.h>
int main()
{
/*Pointer to array of length 255*/
char (*image)[255];
/*One malloc to allocate memory*/
if(!(image = malloc(416 * 416 * 3 * sizeof *image))){
perror("Bad allocation!");
return EXIT_FAILURE;
}
/* Do stuff with the matrix*/
image[400][100] = 'a';
/*One free to deallocate memory*/
if(image){
free(image);
}
return 0;
}
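You are likely getting a stack overflow: the array needs 416 * 416 * 3 * 255 bytes (about 132 MB), far more than a typical stack provides. Try allocating the memory on the heap instead, for example by replacing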
char image[416 * 416 * 3][255];
with
//...
char **image;
if(!(image = malloc((fileSize) * sizeof(*image)))){
perror("Bad allocation!");
return EXIT_FAILURE;
}
for(int i = 0; i < fileSize; i++) {
if(!(image[i] = malloc(255))){
perror("Bad allocation!");
return EXIT_FAILURE;
}
}
//...
To free the memory it's the other way around:
for (int i = 0; i < fileSize; i++){
if(image[i])
free(image[i]);
}
if(image)
free(image);
Also, in the code
//...
fgets(line, 255, fpointer_1);
strcpy(image[i], line);
//...
strcpy is unnecessary; you can read directly into image[i] with fgets.
It's possible you can solve the issue with the help of @CraigEstey's comments; try that first.
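To make the "read directly with fgets" idea concrete, here is a minimal sketch that combines it with the pointer-to-array allocation from the other answer. The names read_rows and LINE_LEN are hypothetical. Unlike the question's code, it strips the trailing newline from each row.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define LINE_LEN 255   /* same row width as in the question */

/* Hypothetical helper: reads up to max_rows lines from fp straight into rows
   (no intermediate buffer, no strcpy) and returns the number of lines read. */
size_t read_rows(FILE *fp, char (*rows)[LINE_LEN], size_t max_rows)
{
    size_t n = 0;

    while (n < max_rows && fgets(rows[n], LINE_LEN, fp)) {
        rows[n][strcspn(rows[n], "\n")] = '\0';   /* drop the newline, if any */
        n++;
    }
    return n;
}
```

The caller would allocate once with char (*image)[LINE_LEN] = malloc(fileSize * sizeof *image), check the result, call read_rows(fpointer_1, image, fileSize), and release everything with a single free(image).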
I am working on a programming assignment in C, which is about creating basic automation for cinema halls.
For holding data of halls, I define a structure like this:
typedef struct {
char *hallName;
char *movieName;
seat** hallSeats;
int studentCount;
int fullFareCount;
int totalSum;
int width;
int height;
}Hall;
So I am given a text file with commands, and whenever I come across a specific command, I should create a separate hall. For that, I wrote another function.
Hall makeHall(char **temp) //TEMP HOLDING THE LINES FROM FILE
{
int width = strToInt(temp[3]);
int height = strToInt(temp[4]);
char currentRowLetter = 'A';
int currentRow;
int currentSeat;
seat **hall = malloc(sizeof(seat*) * width );
for (currentRow=0 ; currentRow < width ; currentRow++)
{
hall[currentRow] = malloc(sizeof(seat) * height );
for(currentSeat=0; currentSeat < height ; currentSeat++)
{
hall[currentRow][currentSeat].rowLetter = currentRowLetter;
hall[currentRow][currentSeat].seatNumber = currentSeat + 1;
hall[currentRow][currentSeat].seatTaken = ' ';
}
++currentRowLetter;
}
Hall newHall;
newHall.hallName = temp[1];
newHall.movieName = temp[2];
newHall.hallSeats = hall;
newHall.width = width;
newHall.height = height;
return newHall;
}
Since I will have multiple halls, I created a Hall array in order to access them later.
Hall *allHalls = malloc(sizeof(Hall) * 10); /*Hall placeholder*/
While I iterate over the lines, I check commands and create halls or sell tickets.
Hall *allHalls = malloc(sizeof(Hall) * 10); /*Hall placeholder*/
FILE *f;
f = fopen("input.txt", "rt");
char *line = malloc (sizeof(char) * 200); /*LINE HOLDER*/
int currentLineNumber = 0;
char *tmp;
int hallNumber = 0;
while (1) { /*PARSING FILE*/
if (fgets(line,200, f) == NULL) break; /*FILE END CHECKER*/
currentLineNumber++;
tmp = strtok(line," ");
char **temp = malloc(sizeof(char*) * 6);
int currentWordNumber = 0;
while(tmp != NULL) /*PARSING LINES*/
{
temp[currentWordNumber] = malloc(strlen(tmp) + 1);
strcpy(temp[currentWordNumber],tmp);
tmp = strtok (NULL, " ");
currentWordNumber++;
}
if(!strcmp("CREATEHALL",temp[0]))
{
allHalls[hallNumber] = makeHall(temp); /*<<<<<<<PROBLEM*/
hallNumber++;
printf("%d\n",hallNumber);
}
Now that's the part I am lost at. Whenever I tried to access the array, the program crashes.
I thought it was a memory problem, so increased memory allocated by malloc for allHalls to 40 (even though it should not be a problem, since file only gives 3 different halls) and program no longer crashes, but instead overwrites the previous hall in the array.
I tried multiple solutions but none of them came out any good, so closest I get is this.
I did use java a lot before, so I am still stuck to OOP and pretty new to C.
EDIT
Seat is defined as
typedef struct {
char rowLetter;
int seatNumber;
char seatTaken;
}seat;
Also, an example CREATEHALL command is
CREATEHALL Hall_A Avatar 24 20
where the numbers at the end are the width and height of the hall.
EDIT : CODE
I got the bug:
At the bottom of the while(1) loop in main you do a free(allHalls); so now there are no more halls and you get a segfault...
It was in the code you didn't show us:
while (1) {
...
if(!strcmp("CREATEHALL",temp[0]))
{
allHalls[hallNumber] = makeHall(temp); /*<<<<<<<PROBLEM*/
hallNumber++;
printf("%d\n",hallNumber);
}
....
free(temp);
free(allHalls); // <-- there's your bug
}
fclose(f);
free(line);
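As a rough illustration of the ownership rule behind this fix (data that lives for one line is freed inside the loop, the array that must survive is freed once after it), here is a stripped-down sketch. MiniHall, MAX_HALLS and total_seats are invented for the example and are not the asker's types; the parsing is reduced to a single sscanf.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_HALLS 10

/* Hypothetical, simplified stand-in for the Hall struct in the question */
typedef struct {
    char name[32];
    int width;
    int height;
} MiniHall;

/* Reads CREATEHALL commands from f and returns the total seat count,
   or -1 if the hall array cannot be allocated. */
long total_seats(FILE *f)
{
    MiniHall *halls = malloc(MAX_HALLS * sizeof *halls);   /* allocated once */
    char line[200];
    size_t count = 0;
    long seats = 0;

    if (halls == NULL)
        return -1;

    while (count < MAX_HALLS && fgets(line, sizeof line, f)) {
        char *scratch = malloc(strlen(line) + 1);           /* per-line scratch copy */
        if (scratch == NULL)
            break;
        strcpy(scratch, line);
        /* %*s skips the movie name; 3 conversions means a complete command */
        if (sscanf(scratch, "CREATEHALL %31s %*s %d %d", halls[count].name,
                   &halls[count].width, &halls[count].height) == 3)
            count++;
        free(scratch);        /* per-line data is released every iteration */
    }

    for (size_t i = 0; i < count; i++)
        seats += (long)halls[i].width * halls[i].height;

    free(halls);              /* the array is released once, after the loop */
    return seats;
}
```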
I am new to .ini files, hence this question (which might seem silly). I have created a .ini file and access it from my C program. The ini file looks like this:
[key]
title = A,H,D
The C program accesses it using:
LPCSTR ini = "C:\\conf.ini";
char var[100];
GetPrivateProfileString("key", "title", 0, var, 100, ini);
printf("%s", var);
char* buffer = strtok(var, ", ");
do{
printf("%s", buffer);
if (strcmp(buffer, "A")==0)
printf("Hello");
puts("");
}while ((buffer=strtok(NULL, ", "))!= NULL);
output looks as :
A H D F G IAHello
H
D
F
G
Now what I need to do is use these individual tokens again to form an array with indices within my C program. For example:
char x[A, H, D, F, G]
so that when I refer to the index 2, x[2] should give me 'D'. Could somebody suggest a way to do this. I have never used strtok before and thus very confused. Thank you in advance.
This question is quite similar to others regarding getting external information and storing it in an array.
The problem here is knowing how many elements the array has to store.
You could use a linked list, but for this example I would scan the data once to get the total number of items needed for the array, and then parse the data again, storing the items in the array.
The first loop goes through and counts the items to be stored, as in the example you posted. I will show only the second loop as an example. Please note that it assumes you have already created nTotalItems and stored the item count in it. I am also assuming you want to store strings, not just single chars.
Also note that this is a draft example, done at work, only to show a method of storing the tokens in an array, so there is no error checking, etc.
// nTotalItems has already been calculated via the first loop...
char** strArray = malloc( nTotalItems * sizeof( char* ));
int nIndex = 0;
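// re-setup buffer: reload var first (e.g. call GetPrivateProfileString again),
// because the counting pass with strtok overwrote the delimiters in var
buffer = strtok(var, ", ");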
do {
// allocate the buffer for string and copy...
strArray[ nIndex ] = malloc( strlen( buffer ) + 1 );
strcpy( strArray[ nIndex ], buffer );
printf( "Array %d = '%s'\n", nIndex, strArray[ nIndex ] );
nIndex++;
} while ((buffer=strtok(NULL, ", "))!= NULL);
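For completeness, here is one way the count-then-store idea could be written so that it compiles on its own. Note that this sketch swaps strtok for strspn/strcspn, so the source string is never modified and does not have to be reloaded between the two passes. The names count_tokens and copy_tokens are made up for the example.

```c
#include <stdlib.h>
#include <string.h>

/* First pass: counts the tokens in src without modifying it */
size_t count_tokens(const char *src, const char *delims)
{
    size_t count = 0;

    src += strspn(src, delims);          /* skip leading delimiters */
    while (*src != '\0') {
        count++;
        src += strcspn(src, delims);     /* skip the token itself */
        src += strspn(src, delims);      /* skip the delimiters after it */
    }
    return count;
}

/* Second pass: copies every token into a newly allocated array of strings.
   Returns NULL on allocation failure; the caller frees each string and the array. */
char **copy_tokens(const char *src, const char *delims, size_t *out_count)
{
    size_t total = count_tokens(src, delims);
    char **items = malloc((total ? total : 1) * sizeof *items);
    size_t n = 0;

    if (items == NULL)
        return NULL;

    src += strspn(src, delims);
    while (n < total) {
        size_t len = strcspn(src, delims);
        items[n] = malloc(len + 1);
        if (items[n] == NULL) {          /* undo the partial work on failure */
            while (n > 0)
                free(items[--n]);
            free(items);
            return NULL;
        }
        memcpy(items[n], src, len);
        items[n][len] = '\0';
        n++;
        src += len;
        src += strspn(src, delims);
    }
    *out_count = total;
    return items;
}
```

Called as copy_tokens(var, ", ", &nTotalItems) on a value such as "A,H,D", the result can be indexed like an array, so element 2 is the string "D".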
Just use an INI parser that supports arrays.
INI file:
[my_section]
title = A,H,D
C program:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <confini.h>
#define MY_ARRAY_DELIMITER ','
struct configuration {
char ** title;
size_t title_length;
};
static char ** make_strarray (size_t * arrlen, const char * src, const size_t buffsize, IniFormat ini_format) {
*arrlen = ini_array_get_length(src, MY_ARRAY_DELIMITER, ini_format);
char ** const dest = *arrlen ? (char **) malloc(*arrlen * sizeof(char *) + buffsize) : NULL;
if (!dest) { return NULL; }
memcpy(dest + *arrlen, src, buffsize);
char * iter = (char *) (dest + *arrlen);
for (size_t idx = 0; idx < *arrlen; idx++) {
dest[idx] = ini_array_release(&iter, MY_ARRAY_DELIMITER, ini_format);
ini_string_parse(dest[idx], ini_format);
}
return dest;
}
static int ini_handler (IniDispatch * this, void * v_conf) {
struct configuration * conf = (struct configuration *) v_conf;
if (this->type == INI_KEY && ini_string_match_si("my_section", this->append_to, this->format)) {
if (ini_string_match_si("title", this->data, this->format)) {
/* Save memory (not strictly needed) */
this->v_len = ini_array_collapse(this->value, MY_ARRAY_DELIMITER, this->format);
/* Allocate a new array of strings */
if (conf->title) { free(conf->title); }
conf->title = make_strarray(&conf->title_length, this->value, this->v_len + 1, this->format);
if (!conf->title) { return 1; }
}
}
return 0;
}
static int conf_init (IniStatistics * statistics, void * v_conf) {
*((struct configuration *) v_conf) = (struct configuration) { NULL, 0 };
return 0;
}
int main () {
struct configuration my_conf;
/* Parse the INI file */
if (load_ini_path("C:\\conf.ini", INI_DEFAULT_FORMAT, conf_init, ini_handler, &my_conf)) {
fprintf(stderr, "Sorry, something went wrong :-(\n");
return 1;
}
/* Print the parsed data */
for (size_t idx = 0; idx < my_conf.title_length; idx++) {
printf("my_conf.title[%zu] = %s\n", idx, my_conf.title[idx]);
}
/* Free the parsed data */
if (my_conf.title_length) {
free(my_conf.title);
}
return 0;
}
Output:
my_conf.title[0] = A
my_conf.title[1] = H
my_conf.title[2] = D
I am currently making a small test program for simple file checking. The program writes two small matrices(A and B) to files, closes and reopens them, reads in the matrices from the files, multiplies them and writes the resulting matrix(C) to a new file. It then closes and reopens this file containing the answer and prints it out for me to check if the IO operation proceeded correctly.
My problem is that the result matrix reads differently than expected.
I consider myself a beginner in C and of file input/output operations and this is the code that is causing me trouble. I am using WinXP, Codeblocks and Mingw.
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#define bufferA(i,k) (bufferA[i*cols+k])
#define bufferB(k,j) (bufferB[k*cols+j])
#define bufferC(i,j) (bufferC[i*cols+j])
void printMatrix(int *nMatrixToPrint, int nNumberOfElements, int nDimension) {
// This function prints out the element of an Array. This array represents a matrix in memory.
int nIndex;
printf("\n");
for (nIndex = 0; nIndex < nNumberOfElements; nIndex++) {
if (nIndex % nDimension == 0)
printf("\n");
printf("%d,",nMatrixToPrint[nIndex]);
}
return;
}
int main(int argc, char *argv[]) {
int nElements = 16, nDim = 4;
int A[4][4] = {{1,2,3,1},{2,2,1,2},{4,2,3,1},{5,1,1,3}};
int B[4][4] = {{3,2,1,4},{2,2,3,3},{4,1,3,2},{2,2,5,1}};
// Create files of A and B, delete old ones if present
FILE *fpA = fopen("A.dat", "w+");
FILE *fpB = fopen("B.dat", "w+");
// Write data to them
fwrite(A, sizeof(int), nElements, fpA); // element size is sizeof(int); sizeof(*A) would be a whole row of 4 ints
fwrite(B, sizeof(int), nElements, fpB);
// and close them
fclose(fpA);
fclose(fpB);
// Reopen files
fpA = fopen("A.dat", "r");
fpB = fopen("B.dat", "r");
// Allocate memory
int *bufferA = (int*)malloc(nElements * sizeof(*bufferA));
int *bufferB = (int*)malloc(nElements * sizeof(*bufferB));
int *bufferC = (int*)calloc(nElements, sizeof(*bufferC));
// Read files
fread(bufferA, sizeof(int), nElements, fpA);
fread(bufferB, sizeof(int), nElements, fpB);
printf("\nA");
printMatrix(bufferA, nElements, nDim);
printf("\n\nB");
printMatrix(bufferB, nElements, nDim);
// Matrix multiplication
// Calculate and write to C
int i,j,k = 0; // Loop indices
int n = nDim,l = nDim, m = nDim, cols = nDim;
// multiply
for (i = 0; i < n; i++) { // Columns
for (j = 0; j < m; j++) { // Rows
//C(i,j) = 0;
for (k = 0; k < l; k++) {
bufferC(i,j) += bufferA(i,k) * bufferB(k,j);
}
}
}
printf("\n\nC_buffer");
printMatrix(bufferC, nElements, nDim);
// Create C and write to it
FILE* Cfile = fopen("C.dat", "w");
fwrite(bufferC, sizeof(*bufferC), nElements, Cfile);
// Close files
fclose(fpA);
fclose(fpB);
fclose(Cfile);
// reopen C for reading
Cfile = fopen("C.dat", "r");
// Obtain file size
fseek(Cfile , 0 , SEEK_END);
long lSize = ftell(Cfile);
rewind(Cfile);
printf("\nC file length is: %ld", lSize);
// read data into bufferA (lSize is in bytes, so convert it to an element count)
fread(bufferA, sizeof(int), lSize / sizeof(int), Cfile);
fclose(Cfile);
printf("\n\nC_file");
printMatrix(bufferA, nElements, nDim);
// Free allocated memory and remove dangling pointers
free(bufferA); bufferA = NULL;
free(bufferB); bufferB = NULL;
free(bufferC); bufferC = NULL;
exit(0);
}
Which gives me the following output:
A
1,2,3,1,
2,2,1,2,
4,2,3,1,
5,1,1,3,
B
3,2,1,4,
2,2,3,3,
4,1,3,2,
2,2,5,1,
C_buffer
21,11,21,17,
18,13,21,18,
30,17,24,29,
27,19,26,28,
C file length is: 64
C_file
21,11,21,17,
18,13,21,18,
30,17,24,29,
27,19,1,3,
As you can see, the last two elements in C_file are wrong, instead the output shows the last two elements in A as I was writing the file contents into bufferA. A switch to bufferB would swap the last two characters with the last elements in B which is still erroneous. A filecopy into another project would yield the last two integers as whatever was in ram at that malloc address.
My question is as follows: Why doesn't fwrite write the proper data into the file? Why does it manage the first 14 elements but not the last two? And how does this differ from my previous correct uses of fwrite and fread when I wrote and retrieved the elements of A and B?
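You are writing binary data, so you have to open the files in binary mode; the default is text mode. This makes a difference on Windows but not on *nix, which explains why it works for the other people here. The likely trigger in your case is the value 26 at element 14 of C: its first byte is 0x1A (Ctrl-Z), which Windows text mode treats as end-of-file when reading, so fread stops after 14 elements and the last two slots of bufferA keep their old contents.
For all your fopen calls, include the letter 'b' in the mode argument, e.g. replace "w+" with "w+b", "r" with "rb", and so on.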
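A small round-trip check along these lines should show the difference. It is only a sketch: the function name roundtrip_ints and the file name used in the usage note are placeholders.

```c
#include <stdio.h>

/* Hypothetical helper: writes n ints to path and reads them back into dst,
   opening the file in binary mode both times.
   Returns the number of ints read back, or 0 on failure. */
size_t roundtrip_ints(const char *path, const int *src, int *dst, size_t n)
{
    FILE *fp = fopen(path, "wb");           /* 'b': no text-mode translation */
    if (fp == NULL)
        return 0;
    size_t written = fwrite(src, sizeof *src, n, fp);
    if (fclose(fp) != 0 || written != n)
        return 0;

    fp = fopen(path, "rb");
    if (fp == NULL)
        return 0;
    size_t got = fread(dst, sizeof *dst, n, fp);
    fclose(fp);
    return got;
}
```

Calling it with the last row of C, {27, 19, 26, 28}, and a scratch file such as "roundtrip_test.bin" should return 4 with all four values intact. Checking the fread return value like this is also a quick way to notice a short read in the original program.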
Your program runs just fine on my Mac.
The results would look better if printMatrix() output a final newline. Perhaps the unterminated line is causing some sort of confusion on your system?