I need to write to a file an array whose parts are produced at scattered points in the code.
In short, my code is this:
(...)
FILE *f = fopen(filepath, "wb"); /* "wb" = write binary */
for (unsigned int i = 0; i < num_points; i++) {
    fwrite(points[i].loc, sizeof(float[3]), 1, f);
}
fclose(f);
As you can see, my solution is to append each new part of the array to the end of the file as it is produced.
But does this have a problem with efficiency or memory access? Should I build the entire array in RAM and then write the file in one go?
fwrite will buffer your data until its buffer is full or fflush/fclose/etc. is called, so it won't perform a syscall on each iteration.
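If you want to see or control that buffering yourself, stdio lets you supply your own buffer with setvbuf. A minimal sketch (the file name, buffer size, and dummy data are illustrative, not from the question):

#include <stdio.h>

int main(void)
{
    FILE *f = fopen("points.bin", "wb");
    if (!f)
        return 1;

    /* Ask stdio for a 64 KiB fully-buffered stream; writes are
       batched in this buffer and flushed in large chunks. */
    static char buf[65536];
    setvbuf(f, buf, _IOFBF, sizeof buf);

    float loc[3] = {1.0f, 2.0f, 3.0f};
    for (int i = 0; i < 1000; i++)
        fwrite(loc, sizeof loc, 1, f);   /* no syscall per iteration */

    fclose(f);   /* flushes the remaining buffered data */
    return 0;
}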
In a C program I'm writing I have to read values from a text file and put them in an array for later use.
I don't like my code (snippet shown below) because I use two while loops: the first to count the number of values, after which I create an array as big as that count, and then I read the file again, filling the array.
Also, in the first loop I use a variable x because fscanf() requires it, but I never use it later in the code, and I'd like to avoid it altogether if possible.
int x, n = 0, sum = 0, i = 0;
FILE *fp = fopen("data.txt", "r");
while (fscanf(fp, "%d\n", &x) != EOF) {
    n++;
}
rewind(fp);
int v[n];
while (fscanf(fp, "%d\n", &v[i]) != EOF) {
    sum += v[i];
    i++;
}
So, any advice on how I can improve this code? I figured I could kinda "fix" it by declaring an array big "enough" at the beginning and filling it as needed. But I don't know in advance how many values I have to work with, so I dropped that idea.
This is one scenario where dynamic memory allocation comes in handy. You can follow the general procedure described below:
1. Define a pointer.
2. Open the file with fopen() and read the first element with fscanf(). Error checking should be taken care of, too.
3. If the read is successful, allocate memory dynamically with malloc() and copy the value into it.
4. Read the next element.
4.1. If the read is successful, re-allocate the memory with realloc(), adding one more element's worth of size, and copy the last read value into the newly allocated space.
4.2. If the read fails, check for EOF and stop reading.
5. Continue from step 4.
Also, please keep in mind that memory you allocate dynamically needs to be free()d as well.
As a note, referring to the comment of @szczurcio, this is not an optimized approach, because you have to re-allocate memory on each successful read. To minimize the impact of dynamic memory allocation, we can decide on a threshold capacity to allocate up front and then, when that is exhausted, double the previous amount. This way allocation happens in chunks and the allocation overhead on each read cycle is avoided.
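A minimal sketch of that doubling strategy (the file name and starting capacity are illustrative):

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    FILE *fp = fopen("data.txt", "r");
    if (!fp)
        return 1;

    size_t capacity = 16;                 /* initial threshold, doubled on demand */
    size_t n = 0;
    int *v = malloc(capacity * sizeof *v);
    int x, sum = 0;
    if (!v) { fclose(fp); return 1; }

    while (fscanf(fp, "%d", &x) == 1) {
        if (n == capacity) {              /* exhausted: double the capacity */
            capacity *= 2;
            int *tmp = realloc(v, capacity * sizeof *v);
            if (!tmp) { free(v); fclose(fp); return 1; }
            v = tmp;
        }
        v[n++] = x;
        sum += x;
    }

    printf("read %zu values, sum = %d\n", n, sum);
    free(v);
    fclose(fp);
    return 0;
}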
Minor changes to the code: please note that I've changed v to be an int* and count the number of newlines in the file. I then allocate the correct amount of memory for the array, rewind the file and let your code loop through it again...
int sum = 0;
int c;                                /* int, not char, so EOF can be detected */
int *v;
int i = 0;
FILE *fp = fopen("data.txt", "r");
while ((c = fgetc(fp)) != EOF)        /* count the newlines */
    if (c == '\n')
        ++i;
rewind(fp);
v = malloc(i * sizeof(int));
i = 0;
while (fscanf(fp, "%d\n", &v[i]) != EOF)
{
    sum += v[i];
    i++;
}
As said by Sourav, dynamic memory allocation is definitely the way to go.
That said, you can also change the data structure to one that doesn't require a priori knowledge of N. If you only need sequential access to the values and don't really need random access, a linked list is an option. Moreover, you can always use binary trees, hash tables and so on. It depends on what you want to do with the data; a minimal linked-list sketch follows.
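For illustration, a sketch of the linked-list variant, assuming the same data.txt of integers (prepending keeps the code short, so the list ends up in reverse read order):

#include <stdio.h>
#include <stdlib.h>

struct node {
    int value;
    struct node *next;
};

int main(void)
{
    FILE *fp = fopen("data.txt", "r");
    if (!fp)
        return 1;

    struct node *head = NULL;
    int x;
    while (fscanf(fp, "%d", &x) == 1) {
        struct node *nd = malloc(sizeof *nd);
        if (!nd)
            break;
        nd->value = x;
        nd->next = head;       /* prepend: no need to know N in advance */
        head = nd;
    }
    fclose(fp);

    for (struct node *p = head; p; p = p->next)
        printf("%d\n", p->value);

    while (head) {             /* free the list */
        struct node *next = head->next;
        free(head);
        head = next;
    }
    return 0;
}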
P.S.: sorry, I'd post this as a comment, but I don't have the reputation.
This is the typical scenario in which you would like to know the size of the file before creating the array. Or, better put, the number of lines in the file.
I'm going to suggest something radically different. Since this is a text file, the smallest number will occupy two chars (smallest in a "text" sense): one for the digit, and another for the '\n' (though the '\n' can be one or two bytes; that's OS-dependent).
We can know the size of the file: after fopen'ing it, you can find out through ftell how many bytes it holds. If you divide that number by 2, you get the maximum possible number of lines in the file. So you can create an array of that size, and then record the number of positions actually occupied.
FILE * fp = fopen( "data.txt", "r" );
/* Get file size */
fseek( fp, 0, SEEK_END );
long size = ftell( fp );
fseek( fp, 0, SEEK_SET );
/* Create data */
long MaxNumbers = size / 2;
int * data = (int *) malloc( sizeof( int ) * MaxNumbers );
long lastPos = 0;
/* Read file */
int * next = data;
while( fscanf(fp, "%d\n", next) == 1 ) {
++next;
}
lastPos = next - data;  /* pointer difference is already a count of elements */
/* Close the file */
fclose( fp );
Once you have the data loaded in data, you know the real number of items, so you can copy it into another array of the exact size (maybe through memcpy()), or keep this array. If you want to switch to the exact-size array:
int * v = (int *) malloc( sizeof( int ) * lastPos );
memcpy( v, data, sizeof( int ) * lastPos );
free( data );
Note: this code is a simple demo; it does not check for NULL after calling malloc(), while a real program should.
This code does not waste memory or computation time expanding the array when the data does not fit. However, it a) creates an array at the beginning that is potentially bigger than needed, and b) if you want an array of the exact size, you will temporarily have twice the needed space allocated. We are trading memory for performance, and sometimes that is not a good idea in a constrained environment (e.g., an embedded system).
A big improvement to this strategy would be to control the format of the input file. If you dedicate the same space to each number (say there are always three positions, so a 3 is stored as 003), and you know the maximum number (so you know how many positions each number needs), then the size estimate becomes exact, and you don't need to copy the data into another array at all. With that change, this strategy is simply the best one I can imagine. A sketch of the fixed-width variant follows.
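A minimal sketch, assuming Unix line endings and numbers zero-padded to exactly three digits (both assumptions, not from the original):

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    FILE * fp = fopen( "data.txt", "r" );   /* e.g. "003\n042\n117\n" */
    if( !fp )
        return 1;

    fseek( fp, 0, SEEK_END );
    long size = ftell( fp );
    fseek( fp, 0, SEEK_SET );

    long n = size / 4;                      /* 3 digits + '\n' = 4 bytes per line */
    int * data = (int *) malloc( sizeof( int ) * n );

    for( long k = 0; k < n; ++k )
        fscanf( fp, "%d", &data[k] );

    /* n is exact: no oversized array, no second copy needed */
    free( data );
    fclose( fp );
    return 0;
}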
Hope this helps.
WHAT THE CODE DOES: I read a binary file and sort it. I use a frequency array in order to do so.
UPDATE: it does run the loop, but it doesn't write the numbers correctly...
That is the code. I want to write to the file after reading from it. It will overwrite what is already written, and that is okay. The problem is that I get no error when compiling, but at run time it seems I have an infinite loop.
The file is binary. I read it byte by byte and that's also the way I want to write it.
while(fread(readChar, sizeof(readChar)/2, 1, inFile)){
bit = atoi(readChar);
array[bit] = array[bit] + 1;
}
fseek(inFile, 0, SEEK_SET);
for( i = 0; i < 256; i++)
while(array[i] > 0){
writeChar[0] = array[i]; //do I correctly convert int to char?
fwrite(writeChar, sizeof(readChar)/2, 1, inFile);
array[i] = array[i] -1;
}
The inFile file declaration is:
FILE* inFile = fopen (readFile, "rb+");
It reads from the file, but does not write!
Undefined behavior:
fread() is used to read a binary representation of data. atoi() takes a textual representation of data: a string (a pointer to an array of char that is terminated with a '\0').
Unless the data read into readChar has one of its bytes set to 0, calling atoi() may access data outside readChar.
fread(readChar, sizeof(readChar)/2, 1, inFile);
bit = atoi(readChar);
The code is not reading data "bit by bit". As @Jens comments: "The smallest unit is a byte." and that is at least 8 bits.
The only possible reason for an infinite loop I can see is that your array is not initialized.
After declaration with:
int array[256];
the elements can have any integer value, including very large ones.
So there are no truly infinite loops, but some loops can run for a very large number of iterations.
You should initialize your array with zeros:
int array[256]={0};
I don't know the number of elements in your array, or whether this is how you declare it, but if you declare your array as shown, then ={0} will initialize all members to 0. You can also use a loop:
for(int i=0; i < COUNT_OF_ELEMENTS;i++) array[i] = 0;
EDIT: I forgot to mention that your code can only sort files that contain nothing but digits.
For that, you also have to change the conversion while writing:
char writeChar[2]={0};
for( int i = 0; i < 256; i++)
while(array[i] > 0){
_itoa(i, writeChar, 10);   /* _itoa is nonstandard (MSVC); snprintf is the portable alternative */
fwrite(writeChar, sizeof(char), 1, inFile);
array[i] = array[i] -1;
}
File content before:
12345735280735612385478504873457835489
File content after:
00112223333334444455555556777778888889
Is that what you want?
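For reference, a minimal self-contained sketch of the same counting-sort idea for a digits-only file (the file name is illustrative; this writes bytes directly instead of going through atoi/_itoa):

#include <stdio.h>

int main(void)
{
    long count[256] = {0};              /* zero-initialized frequency table */
    FILE *f = fopen("digits.txt", "rb+");
    int c;

    if (!f)
        return 1;

    while ((c = fgetc(f)) != EOF)       /* count each byte value */
        count[c]++;

    rewind(f);                          /* required between read and write on "rb+" */
    for (int b = 0; b < 256; b++)       /* rewrite the bytes in sorted order */
        while (count[b]-- > 0)
            fputc(b, f);

    fclose(f);
    return 0;
}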
Apologies if the title doesn't make sense, I've been staring at my monitor for 15 minutes trying to come up with one.
I'm using a library function from a C API (in 64-bit Xubuntu 14.04) to move a set number of int16_t values into a buffer and repeat it a set number of times, described here in (sort of) pseudo-code:
int16_t *buffer = calloc(total_values_to_receive, 2 * sizeof(samples[0]));
while (!done){
receive_into_buffer(buffer, num_values_to_receive_per_pass);
fwrite(buffer, 2 * sizeof(samples[0]), num_values_to_receive_per_pass, file);
values_received += num_values_to_receive_per_pass;
if (values_received == total_values_to_receive){
done = true;
}
}
Basically, it receives a set number of values on each pass and writes those values to a file; also note that the same file is appended to each time. E.g. if total_values_to_receive = 100 and num_values_to_receive_per_pass = 10, there would be 10 passes in total.
What I would like to do, mainly to increase speed, is to have the write part occur after all passes have been completed. The library function prototype contains void* samples, size_t num_samples, which, you guessed it, refers to the buffer where the samples need to be stored into and the amount of samples to store.
I'm not completely confident with pointers, but is there a way to write into the buffer on one pass, and then move the pointer by num_values_to_receive_per_pass so that the next time the library function is called, it appends to that buffer (so to speak)? Then the pointer can be moved back to the start of the buffer and fwrite can be called to write the total number of values into the file.
Does that make sense? Any tips on how to actually implement it?
Thanks in advance.
Assuming the second argument of receive_into_buffer is in units of buffer elements, not bytes, the following should work (the factor of 2 mirrors your calloc call, i.e. each received value is assumed to occupy two int16_t slots):
int16_t *buffer = calloc(total_values_to_receive, 2 * sizeof(buffer[0]));
int16_t *temp = buffer;
while (!done){
    receive_into_buffer(temp, num_values_to_receive_per_pass);
    temp += 2 * num_values_to_receive_per_pass;   /* advance past the data just received */
    values_received += num_values_to_receive_per_pass;
    if (values_received == total_values_to_receive){
        done = true;
    }
}
fwrite(buffer, 2 * sizeof(buffer[0]), total_values_to_receive, file);   /* one big write at the end */
Style: a plain for() loop will avoid the indicator variable:
for (values_received = 0; values_received < total_values_to_receive; values_received += num_values_to_receive_per_pass) {
receive_into_buffer(buffer, num_values_to_receive_per_pass);
fwrite(buffer, 2 * sizeof(samples[0]), num_values_to_receive_per_pass, file);
}
I'm trying to implement external sorting in C.
I have to read N integers (fixed depending on main memory) from a file initially so that I can apply quicksort on them and then continue with the merging process.
I can think of these 2 ways:
read N integers one by one from the file and put them in an array, then sort them.
read a bulk of data into a big char array and then parse integers out of it using sscanf.
The 1st method is clearly slow and the 2nd method uses a lot of extra memory (but we have limited main memory).
Is there any better way?
Don't try to be more clever than your OS, it probably supports some clever memory management functions, which will make your life easier, and your code faster.
Assuming you are using a POSIX compliant operating system, then you can use mmap(2).
Map your file into memory with mmap
Sort it
Sync it
This way the OS will handle swapping out data when room is tight, and swap it in when you need it.
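A minimal sketch of that approach, assuming the file holds raw binary ints (the file name is illustrative, not from the question):

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

int main(void)
{
    int fd = open("data.bin", O_RDWR);
    struct stat st;
    if (fd < 0 || fstat(fd, &st) < 0) { perror("open/fstat"); return 1; }

    size_t n = st.st_size / sizeof(int);
    int *p = mmap(NULL, st.st_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) { perror("mmap"); return 1; }

    qsort(p, n, sizeof(int), cmp_int);   /* sort in place; the OS pages data in and out */
    msync(p, st.st_size, MS_SYNC);       /* flush the sorted data back to the file */
    munmap(p, st.st_size);
    close(fd);
    return 0;
}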
Since stdio file operations are buffered, you won't really need to worry about the first option, especially if the file isn't huge. Remember you're not operating directly on a file, but a representation of that file in memory.
For example, if you scan in one number at a time, the system will actually read a much larger block from the file (on my system it's 4096 bytes, or the entire file if it's shorter).
You can use the function below to read ints from the file one by one and continue sorting and merging on the go...
The function takes the file name and an integer index as arguments, and returns the int at that index in the file.
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int read_int (const char *file_name, int count)
{
    int num = 0;
    int fd = open(file_name, O_RDONLY);
    if (fd < 0)
    {
        printf("error opening file\n");
        return (fd);
    }
    ssize_t err = pread(fd, &num, sizeof(int), count * sizeof(int));
    if (err <= 0)    /* pread returns 0 at end of file, -1 on error */
    {
        printf("End of file reached\n");
        close(fd);
        return (-1);
    }
    close(fd);
    return (num);
}
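A hedged usage sketch (the driver loop, the file name, and n are illustrative): note that each call reopens and closes the file, which is simple but costly if you read many values this way.

/* Hypothetical driver: fetch the first n ints one by one. */
for (int k = 0; k < n; k++)
{
    int value = read_int("numbers.bin", k);
    /* ... insert value into your in-memory run, then sort/merge ... */
}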
Sorting at the same time you read is the best way, and saving your data into a linked list instead of an array can make the insertion step more efficient.
You can use fscanf() to read integers one by one from the file, and sort at the moment you read each one. I mean, when you read an integer from the file, put it into the array in the right place, so the array is already sorted when you finish reading.
The following example reads from the file integer by integer and inserts each one in sorted position at the time of reading. The integers are saved into an array, not into a linked list.
#include <stdio.h>

/* Insert x into array[0..len-2], keeping the array sorted in ascending order. */
void sort_insert(int x, int *array, int len)
{
    int i = 0, j;
    for (i = 0; i < (len - 1); i++)
    {
        if (x > array[i])
            continue;
        for (j = (len - 1); j > i; j--)   /* shift the tail right to make room */
            array[j] = array[j - 1];
        break;
    }
    array[i] = x;
}

int main(void)
{
    int x, i;
    int len = 0;
    int array[50];
    FILE *fp = fopen("myfile.txt", "r");
    while (len < 50 && fscanf(fp, " %d", &x) > 0)
    {
        len++;
        sort_insert(x, array, len);
    }
    fclose(fp);
    for (i = 0; i < len; i++)
    {
        printf("array[%d] = %d\n", i, array[i]);
    }
    return 0;
}
I need to write the content of an array into a file. Let's suppose I generate random numbers and put them into an array. How do I copy those values into the output file?
[... previous code/declarations ...]
file = open(filename, O_WRONLY|O_CREAT|O_TRUNC, S_IRWXU|S_IWGRP|S_IWOTH);
buffer = (double *) calloc(d, sizeof(double));
for (i = 0; i < d; i++)
{
double div = (double) (RAND_MAX/0.333);
double r = rand()/div;
(*(buffer+i)) = r;
}
write(file, buffer, sizeof(double));
[ ... ]
If I try to read the file, all I see is memory garbage, nonsense chars all over my screen. Can anyone please help me figure out what I'm doing wrong?
The write call writes raw memory blocks to a file, which will mostly look like garbage if you just try to print the file later. That's because the doubles are stored in their IEEE 754 binary representation, not as text.
If you want a textual representation of the numbers, use something like:
FILE *fh = fopen (filename, "w");
for (i = 0; i < d; i++)
fprintf (fh, "%f\n", buffer[i]);
fclose (fh);
You are writing random binary values to the file. When you read the file, you see the random binary values, interpreted as characters. Where's the problem?
At least this part is wrong:
write(file, buffer, sizeof(double));
You're writing only a single value, not the whole array. Multiply the size by d (which seems to be the number of elements in your array); see the one-liner below.
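Something like this (a hedged fix, reusing the question's own variables):

write(file, buffer, d * sizeof(double));   /* write all d doubles, not just the first */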
Why do you put
double div = (double) (RAND_MAX/0.333);
inside the loop?