C program over-writing file contents in if statement - c

I have a C program that records something called an "avalanche size". Whenever the "delta_energy" generated by the program is less than or equal to zero, I increment the avalanche size with "*avalanche_size = *avalanche_size + 1;". Otherwise the loop keeps running without incrementing the avalanche size.
What I want to do is write the delta energy to a file called delta_energies_GSA only WHEN THE AVALANCHE SIZE IS INCREMENTED (not otherwise), as shown in the code below.
What I find instead is that if I put the fprintf statement inside the if { } block where the avalanche size is incremented, every iteration overwrites all the entries in the file, so I end up with a file containing only the delta energies for one of the iterations. If I move the fprintf statement outside the braces, it records everything, including the delta energies for when the avalanche size is not incremented, and I don't want those.
I thought about a condition like "if the avalanche size is bigger than the previous avalanche size, then fprintf the delta energy", but I'm not sure how to do this since the avalanche size is just an integer, not a vector.
Any help would be really appreciated! Thank you
for (k = 0; k < n_nodes; k++)
{
if (delta_energy <= 0.0)
{
stored_state[i] = new_state;
*avalanche_size = *avalanche_size + 1;
printf("\n\n For k = %d: ",k);
printf("\n\n This is the delta energy with GSA for %d avalanche size:%f", *avalanche_size, delta_energy);
fprintf(delta_energies_GSA,"\n %d\t %d\t %f \n",k, *avalanche_size, delta_energy);
}
I haven't shown the full code because it's a very large function of a very large program.
I have also been very careful about when I open and close the file. The file is opened right at the beginning of the function after I have declared my variables. And I close the file right before the function ends.
This is how the file is opened:
{
double d_energy, q_A_minus_1, one_over_q_A_minus_1, prob_term;
neighbor_inf *np;
extern int generate_new_state();
FILE *delta_energies_GSA;
delta_energies_GSA = fopen("delta_energies_GSA.dat", "w");
if (delta_energies_GSA == NULL)
{
printf("I couldn't open delta_energies_GSA.dat for writing.\n");
exit(0);
}
That is right after my variables are declared. The file is closed just before the function ends:
fclose(delta_energies_GSA);
return(stored_state);
} /* end recover_stored_patterns_GSA() */

fprintf() does exactly what you want: it writes at the current position of the stream, so successive calls add lines to the file. I don't see anything wrong with your code as long as the fopen() is outside the for loop and nothing seeks back to offset 0.
EDIT
My mistake: the mode for appending is "a", not "w+" (use "a" if you don't also need to read the file). Both "w" and "w+" truncate the file every time it is opened, so if the function containing the fopen() is called repeatedly, each call discards what the previous one wrote.
What you need to investigate is why fprintf appears to replace the whole file.
Try this simple test.
#include <stdio.h>
int main(int argc, char **argv) {
FILE *f = fopen("test", "w");
if (f) {
int i;
for (i=0; i<100; i++) {
if (i % 3)
fprintf(f, "%d\n", i);
}
fclose(f);
}
return 0;
}
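One way the symptom in the question can arise, assuming the function that contains the fopen() is called once per iteration (the question does not show the caller, so this is a guess), is that each call reopens the file with "w" and truncates it. The sketch below contrasts "w" and "a" when the file is reopened for every write; log_value and count_lines are hypothetical helper names, not part of the original program.

```c
#include <stdio.h>

/* Hypothetical helper: opens path with the given mode on every call,
   writes one line, and closes the file again. Returns 0 on success. */
int log_value(const char *path, const char *mode, double value)
{
    FILE *f = fopen(path, mode);
    if (f == NULL)
        return -1;
    fprintf(f, "%f\n", value);
    return fclose(f) == 0 ? 0 : -1;
}

/* Counts the newline characters in a file; returns -1 if it cannot be opened. */
int count_lines(const char *path)
{
    FILE *f = fopen(path, "r");
    int c, lines = 0;

    if (f == NULL)
        return -1;
    while ((c = fgetc(f)) != EOF)
        if (c == '\n')
            lines++;
    fclose(f);
    return lines;
}
```

Calling log_value() three times with "w" leaves a single line in the file, because each open discards the previous contents. Starting from a missing file, three calls with "a" leave three lines.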

Related

My program creates a file named date.in but it is not inserting all the numbers

Write a C program that reads from the keyboard a natural number n
with up to 9 digits and creates the text file data.out containing the
number n and all its non-zero prefixes, in a single line, separated by
a space, in decreasing order of value. Example: for n = 10305 the file
data.out will contain the numbers: 10305 1030 103 10 1.
This is what I made:
#include <stdio.h>
int main()
{
int n;
FILE *fisier;
fisier=fopen("date.in","w");
printf("n= \n");
scanf("%d",&n);
fprintf(fisier,"%d",n);
while(n!=0)
{
fisier=fopen("date.in","r");
n=n/10;
fprintf(fisier,"%d",n);
}
fclose(fisier);
}
A few things:
Function calls may fail, and you need to check for that every time.
fisier=fopen("date.in","w");
This should have been followed by an error check. To understand what the function returns, the first thing to do is read its man page; see the man page for fopen(). If the file cannot be opened, fopen() returns NULL and errno is set to a value that indicates which error occurred.
if (NULL == fisier)
{
// Error handling code
;
}
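As a minimal sketch of what the error handling could look like, the wrapper below prints the reason for the failure. It assumes a POSIX system, where fopen() sets errno on failure; open_or_report is a hypothetical name chosen for illustration.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical wrapper: opens a file and, on failure, reports the reason
   that fopen() left in errno. Returns NULL on failure, like fopen(). */
FILE *open_or_report(const char *path, const char *mode)
{
    FILE *fp = fopen(path, mode);

    if (fp == NULL)
        fprintf(stderr, "cannot open %s: %s\n", path, strerror(errno));
    return fp;
}
```

The caller still has to test the result for NULL and decide whether to stop or continue.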
Your next requirement is separating the numbers by a space. There isn't one. The following should do it.
fprintf(fisier, "%d ", n);
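The next major problem is opening the file inside the loop, and in "r" (read-only) mode at that. It's like trying to open a door that is already open. Open the file once, for writing, before the loop:
fisier=fopen("date.in","w");
if(NULL == fisier)
{
// Error handling code
;
}
while(n!=0)
{
// Print first, then drop the last digit, so that 0 is never written
fprintf(fisier,"%d ",n);
n=n/10;
}
fclose(fisier);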
A minor issue is that you aren't checking that the number has no more than 9 digits.
if(n > 999999999)
is an apt check once you have read the number. If you want to deal with negative numbers as well, you can modify this condition the way you want.
In a nutshell, at least to start with, the program should be something similar to this:
#include <stdio.h>
// Need a buffer to read the file into it. 64 isn't a magic number.
// To print a 9 digit number followed by a white space and then a 8 digit number..
// and so on, you need little less than 64 bytes.
// I prefer keeping the memory aligned to multiples of 8.
char buffer[64];
int main(void)
{
size_t readBytes = 0;
int n = 0;
printf("\nEnter a number: ");
scanf("%d", &n);
// Open the file
FILE *pFile = fopen("date.in", "w+");
if(NULL == pFile)
{
// Prefer perror() instead of printf() for printing errors
perror("\nError: ");
return 0;
}
while(n != 0)
{
// Append to the file
fprintf(pFile, "%d ", n);
n = n / 10;
}
// Done, close the file
fclose(pFile);
printf("\nPrinting the file: ");
// Open the file
pFile = fopen("date.in", "r");
if(NULL == pFile)
{
// Prefer perror() instead of printf() for printing errors
perror("\nError: ");
return 0;
}
// Read the file
while((readBytes = fread(buffer, 1, sizeof buffer, pFile)) > 0)
{
// Preferably better way to print the contents of the file on stdout!
fwrite(buffer, 1, readBytes, stdout);
}
// Done reading, close the file
fclose(pFile);
printf("\nExiting..\n\n");
return 0;
}
Remember: the person reading your code may not be aware of all the requirements, so comments are necessary. Secondly, I understand English to a decent level, but I don't know what 'fisier' means. It's recommended to name variables so that their purpose is easy to understand. For example, pFile is a pointer to a file; the p prefix immediately signals that it is a pointer.
Hope this helps!
To draw a conclusion from all the comments:
fopen returns a file handle when successful and NULL otherwise. Opening a file twice might result in an error (it does on my machine), in which case fisier is set to NULL inside the loop. Calling fprintf with a NULL stream is undefined behavior, and even when the second fopen succeeds, mode "r" gives a read-only stream that fprintf cannot write to.
You only need to call fopen once, so remove it from the loop. After that it will work as intended.
It's always good to check whether the fopen succeeded:
FILE *fisier;
fisier=fopen("date.in","w");
if(!fisier) { /* handle error */ }
You print no spaces between the numbers. Maybe that's intended, but maybe
fprintf(fisier,"%d ",n);
would be better.
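Putting the points above together, a minimal sketch of the task could look like this: one fopen() before the loop, an error check, and a space between the numbers. The function name write_prefixes and its return convention are assumptions made for illustration.

```c
#include <stdio.h>

/* Writes n and all of its non-zero prefixes to path on one line,
   separated by single spaces. Returns 0 on success, -1 on failure. */
int write_prefixes(const char *path, int n)
{
    FILE *out;

    if (n < 0 || n > 999999999)      /* at most 9 digits, as the task requires */
        return -1;

    out = fopen(path, "w");          /* opened exactly once, before the loop */
    if (out == NULL)
        return -1;

    while (n != 0) {
        fprintf(out, "%d", n);
        n /= 10;                     /* drop the last digit */
        if (n != 0)
            fputc(' ', out);         /* separator only between numbers */
    }

    return fclose(out) == 0 ? 0 : -1;
}
```

Printing before dividing means the loop never writes the final 0, so only non-zero prefixes end up in the file.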

Two processes writing on the same file

I know this is recipe for disaster. And I actually made it work using shared variables.
But it's homework, and teacher definitely wants us to put many processes writing to the same file using different file pointers. I've been trying all day with little success, but I just can't find why this fails.
I have approached the problem in the following way:
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
int main(int argc, char *argv[])
{
int status;
int process_count = 0;
do
{
int from = (n / np) * (process_count) + 1;
int to = (n / np) * (process_count + 1);
if (fork() == 0)
{
FILE *aux;
aux = fopen("./parciais.txt", "w");
fseek(aux, sizeof(int) * process_count, SEEK_SET);
int sum = 0;
int i = from;
while (i <= to)
{
int square = i * i;
sum += square;
i++;
}
long int where_am_i = ftell(aux);
printf("I am process %i writing %i on byte: %li\n", process_count, sum, where_am_i);
fwrite(&sum, sizeof(int), 1, aux);
fclose(aux);
exit(1);
}
else
{
wait(&status);
process_count++;
}
} while (process_count < np);
FILE *aux;
aux = fopen("./parciais.txt", "r");
int sum;
for (int i = 0; i <= np - 1; i++)
{
fseek(aux, sizeof(int) * i, SEEK_SET);
long int where_am_i = ftell(aux);
int read;
fread(&read, sizeof(int), 1, aux);
printf("I am reading %i at byte: %li\n", read, where_am_i);
sum += read;
}
}
I expected the output to be something such as:
I am process 0 writing 98021 on byte: 0
I am process 1 writing 677369 on byte: 4
I am process 2 writing 1911310 on byte: 8
I am reading 98021 at byte: 0
I am reading 677369 at byte: 4
I am reading 1911310 at byte: 8
But I get:
I am process 0 writing 98021 on byte: 0
I am process 1 writing 677369 on byte: 4
I am process 2 writing 1911310 on byte: 8
I am reading 0 at byte: 0
I am reading 0 at byte: 4
I am reading 1911310 at byte: 8
This means, for some reason, only the last value is written.
I've been banging my head on the wall over this and I just can't find where's the catch... Can someone please lend me a hand?
The problem is due to fopen("./parciais.txt", "w") :
"w" : "Creates an empty file for writing. If a file with the same name already exists, its content is erased and the file is considered as a new empty file."
Try with "a" instead!
("Appends to a file. Writing operations, append data at the end of the file. The file is created if it does not exist.")
As mentioned in another answer, the "a" argument is not enough either. The file must be created once, hence in the main process, and then accessed in "r+b" mode for the fseek to work correctly!
As @B.Go already answered, the main problem is that you are opening the file with mode "w", which truncates it to zero length if it already exists. Each child process does this, clobbering the contents written by the previous one.
You want this combination of behaviors for the file:
it is created if it does not already exist (or I suppose you want this, at least)
it is not truncated upon opening if it does already exist
you may write to it
writes start at the current file offset, as opposed to automatically going to the current end of the file
the file is binary, not subject to any kind of character translation or to tail truncation upon writing to it
Unfortunately, there is no standard mode that provides all of it: the various r modes require that the file already exist, the w modes truncate the file if it does already exist, and the a modes direct all writes to the current end of the file, regardless of the stream's current offset. If you can assume that the file will already exist then mode "r+b", which can also be spelled "rb+", has all the wanted characteristics except creating the file if it doesn't exist:
aux = fopen("./parciais.txt", "r+b");
That permits reading as well, but just because you can read from the file doesn't mean you have to. Additionally, on Linux and other POSIX-conforming systems, there is no distinction between binary and text files, so you can omit the b if you are confident that your program needs to run only on POSIX systems. That you are using fork() suggests that this condition may apply to you.
If you must provide for creating the file, too, then open it once at the very beginning of the program, using any of the w or a modes depending on whether you want to truncate the file, then immediately close it again:
FILE *aux = fopen("./parciais.txt", "a");
if (aux) {
fclose(aux);
} else {
// handle error ...
}
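The following sketch combines both steps under some assumptions: the parent creates the file once, each child reopens it with "r+b" and writes into its own slot, and the parent reads the slots back. The function name sum_squares_via_file is hypothetical, n is assumed to be divisible by np, and unlike the code in the question the children run concurrently and are waited for afterwards.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Sketch: np child processes each write one partial sum of squares into
   their own int-sized slot of the file at path.
   Returns the total read back by the parent, or -1 on error. */
long sum_squares_via_file(const char *path, int n, int np)
{
    FILE *aux = fopen(path, "wb");          /* create/truncate once, in the parent */
    if (aux == NULL)
        return -1;
    fclose(aux);

    for (int p = 0; p < np; p++) {
        pid_t pid = fork();
        if (pid < 0)
            return -1;
        if (pid == 0) {                     /* child process */
            int sum = 0;
            FILE *f = fopen(path, "r+b");   /* no truncation, honours fseek */
            if (f == NULL)
                _exit(1);
            for (int i = (n / np) * p + 1; i <= (n / np) * (p + 1); i++)
                sum += i * i;
            if (fseek(f, (long)sizeof(int) * p, SEEK_SET) != 0 ||
                fwrite(&sum, sizeof sum, 1, f) != 1 ||
                fclose(f) != 0)
                _exit(1);
            _exit(0);
        }
    }

    for (int p = 0; p < np; p++) {          /* wait for every child */
        int status;
        if (wait(&status) < 0 || !WIFEXITED(status) || WEXITSTATUS(status) != 0)
            return -1;
    }

    aux = fopen(path, "rb");
    if (aux == NULL)
        return -1;
    long total = 0;
    for (int p = 0; p < np; p++) {          /* slots are stored back to back */
        int part;
        if (fread(&part, sizeof part, 1, aux) != 1) {
            fclose(aux);
            return -1;
        }
        total += part;
    }
    fclose(aux);
    return total;
}
```

For example, sum_squares_via_file("./parciais.txt", 9, 3) should return 285, the sum of the squares from 1 to 9.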

fscanf while-loop never runs

I'm working on a project, and I can't seem to figure out why a piece of my function for finding prime numbers won't run. Essentially, I want the code to first check the text file log for any previously encountered prime numbers, but no matter what I put as the condition of the while-loop containing fscanf(), it seems like my code never enters it.
int filePrime(int a) {
int hold = 0;
FILE *fp = fopen("primes.txt", "a+");
if (fp == NULL) {
printf("Error while opening file.");
exit(2);
}
/*
the while loop below this block is the one with the issue.
on first run, it should skip this loop entirely, and proceed
to finding prime numbers the old-fashioned way, while populating the file.
instead, it is skipping this loop and proceeding right into generating a
new set of prime numbers and writing them to the file, even if the previous
numbers are already in the file
*/
while (fscanf(fp, "%d", &hold) == 1){
printf("Inside scan loop.");
if (hold >= a) {
fclose(fp);
return 1;
}
if (a % hold == 0) {
fclose(fp);
return 0;
}
}
printf("Between scan and print.\n");
for (; hold <= a; hold++) {
if (isPrime(hold) == 1) {
printf("Printing %d to file\n", hold);
fprintf(fp, "%d\n", hold);
if (hold == a)
return 1;
}
}
fclose(fp);
return 0;
}
I have tried all sorts of changes to the while-loop test.
Ex. != 0, != EOF, cutting off the == 1 entirely.
I just can't seem to get my code to enter the loop using fscanf.
Any help is very much appreciated, thank you so much for your time.
In a comment, I asked where the "a+" mode leaves the current position?
On Mac OS X 10.11.4, using "a+" mode opens the file and positions the read/write position at the end of file.
Demo code (aplus.c):
#include <stdio.h>
int main(void)
{
const char source[] = "aplus.c";
FILE *fp = fopen(source, "a+");
if (fp == NULL)
{
fprintf(stderr, "Failed to open file %s\n", source);
}
else
{
int n;
char buffer[128];
fseek(fp, 0L, SEEK_SET);
while ((n = fscanf(fp, "%127s", buffer)) == 1)
printf("[%s]\n", buffer);
printf("n = %d\n", n);
fclose(fp);
}
return(0);
}
Without the fseek(), fscanf() returns -1 (EOF) immediately, so that is the value of n.
With the fseek(), the data (source code) can be read.
One thing slightly puzzles me: I can't find information in the POSIX fopen() specification (or in the C standard) which mentions the read/write position after opening a file with "a+" mode. It's clear that write operations will always be at the end, regardless of intervening uses of fseek().
POSIX stipulates that the call to open() shall use O_RDWR|O_CREAT|O_APPEND for "a+", and open() specifies:
The file offset used to mark the current position within the file shall be set to the beginning of the file.
However, as chux notes (thanks!), the C standard explicitly says:
Annex J Portability issues
J.3 Implementation-defined behaviour
J.3.12 Library functions
…
Whether the file position indicator of an append-mode stream is initially positioned at
the beginning or end of the file (7.21.3).
…
So the behaviour seen is permissible in the C standard.
The manual page on Mac OS X for fopen() says:
"a+" — Open for reading and writing. The file is created if it does not exist. The stream is positioned at the end of the file. Subsequent writes to the file will always end up at the then current end of file, irrespective of any intervening fseek(3) or similar.
This is allowed by Standard C; it isn't clear it is fully POSIX-compliant.
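Since the initial position is implementation-defined, a portable approach is to seek explicitly before the first read. The sketch below does that for a file of integers such as primes.txt; count_ints_aplus is a hypothetical helper name used only for illustration.

```c
#include <stdio.h>

/* Opens path in "a+" mode, explicitly moves to the start before reading,
   and returns how many integers were read; -1 on failure. */
int count_ints_aplus(const char *path)
{
    FILE *fp = fopen(path, "a+");
    int value, count = 0;

    if (fp == NULL)
        return -1;

    /* The initial position of an append-mode stream is implementation-defined,
       so do not rely on it: seek to the beginning before the first read. */
    if (fseek(fp, 0L, SEEK_SET) != 0) {
        fclose(fp);
        return -1;
    }

    while (fscanf(fp, "%d", &value) == 1)
        count++;

    fclose(fp);
    return count;
}
```

Writes to the same stream would still go to the end of the file, as the manual page quoted above describes.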

Reading and Writing to Files in C

I'm fairly new to C. This is the first program I've written involving reading and writing to files. So far, I was able to read the file, perform the operations I need but I am having trouble with 2 things.
Whatever the file is, it omits the last line when reading. For example if the file has:
3
5
6
It will only read the 3 and 5. But if I leave an empty/blank line at the bottom it'll read all three. Any ideas as why that is?
I now need to take what I did, essentially converting volts to milliVolts, microVolts, etc., and write it back to the file. What I have been doing up until now is reading from the file and working through the console. So essentially, I want to write the output of the last two printf statements to the file. I made some attempts at this, but they didn't work and I couldn't find decent support online. When I tried, it would completely erase what was in the file before.
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char** argv) {
FILE * file = fopen ("bacon.txt", "r");
float voltage = 0, voltageArray[100], voltageToMilli = 0,voltageToMicro = 0, voltageToKilo = 0, voltageToMega = 0;
int i = 1, j = 0;
fscanf (file, "%f", &voltage);
while (!feof (file)) {
printf("Line # %d\n", i);
printf ("The voltage is: %f\n", voltage);
voltageArray[j] = voltage;
fscanf (file, "%f", &voltage);
printf("The voltage in Element %d is: %f Volts",j,voltageArray[j]);
voltageToMilli = voltageArray[j] * 1000;
voltageToMicro = voltageArray[j] * 1000000;
voltageToKilo = voltageArray[j] * 0.001;
voltageToMega = voltageArray[j] *0.000001;
printf("\nThe voltage is %f Volts, which is: %f milliVolts, %f microVolts, %f kiloVolts, %f megaVolts",voltageArray[j],voltageToMilli,voltageToMicro,voltageToKilo,voltageToMega);
printf("\n\n");
i++;
j++;
}
fclose (file);
return (0);
}
Please try to keep explanations clear and simple as I am a beginner in C. Thank you!
For the first issue, the problem is that the loop logic is incorrect. On each iteration it stores the previously read value, reads the next value, and then goes back to the top of the loop, so the value just read is not stored until the following iteration. When nothing follows the last number in the file, reading that number also hits end-of-file, so feof(file) is already true at the next check and the loop exits before the last value is stored. Refer to this question for why checking feof as a loop condition is almost always wrong.
Here is an example of how you could restructure your code to read all the items as intended:
int rval;
while ((rval = fscanf(file, "%f", &voltage)) != EOF) {
if (rval != 1) {
printf("Unexpected input\n");
exit(-1);
}
voltageArray[j] = voltage;
/* Do the rest of your processing here. */
}
The problem is that nothing follows the last number in the file, so after the last number has been read, feof(file) is true and the while loop exits.
The simplest fix is to change the loop to
while(fscanf (file, "%f", &voltage) == 1) {
and remove the other fscanf calls.
This works because fscanf() returns 1 when it is able to read a number, and either 0 or EOF (which is a negative number) otherwise.
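A minimal sketch of that loop as a self-contained function, assuming the values are simply collected into an array; read_voltages and its parameters are hypothetical names chosen for illustration.

```c
#include <stdio.h>

/* Reads up to max floats from path into out and returns how many were
   stored, or -1 if the file cannot be opened. */
int read_voltages(const char *path, float *out, int max)
{
    FILE *file = fopen(path, "r");
    int count = 0;

    if (file == NULL)
        return -1;

    /* The read itself is the loop condition, so a value is stored only after
       it was converted successfully, and the last value is kept whether or
       not a newline follows it. */
    while (count < max && fscanf(file, "%f", &out[count]) == 1)
        count++;

    fclose(file);
    return count;
}
```

The count < max test also keeps the function from writing past the end of a fixed-size array such as voltageArray[100].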

read multiple fasta sequence using external library kseq.h

I am trying to find the fasta sequences for 5 ids/names provided by the user in a big fasta file (containing 80000 fasta sequences) using the external header file kseq.h, as in: http://lh3lh3.users.sourceforge.net/kseq.shtml. When I run the program in a for loop, I have to open/close the big fasta file again and again (commented out in the code), which makes the computation slow. On the other hand, if I open/close it only once outside the loop, the program stops when it encounters an entry that is not present in the big fasta file, i.e. it reaches the end of the file. Can anyone suggest how to get all the sequences without losing computation time? The code is:
#include <zlib.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include "ext_libraries/kseq.h"
KSEQ_INIT(gzFile, gzread)
int main(int argc, char *argv[])
{
char gwidd_ids[100];
kseq_t *seq;
int i=0, nFields=0, row=0, col=0;
int size=1000, flag1=0, l=0, index0=0;
printf("Opening file %s\n", argv[1]);
char **gi_ids=(char **)malloc(sizeof(char *)*size);
for(i=0;i<size;i++)
{
gi_ids[i]=(char *)malloc(sizeof(char)*50);
}
FILE *fp_inp = fopen(argv[1], "r");
while(fscanf(fp_inp, "%s", gwidd_ids) == 1)
{
printf("%s\n", gwidd_ids);
strcpy(gi_ids[index0], gwidd_ids);
index0++;
}
fclose(fp_inp);
FILE *f0 = fopen("xxx.txt", "w");
FILE *f1 = fopen("yyy.txt", "w");
FILE *f2 = fopen("zzz", "w");
FILE *instream = NULL;
instream = fopen("fasta_seq_uniprot.txt", "r");
gzFile fpf = gzdopen(fileno(instream), "r");
for(col=0;col<index0;col++)
{
flag1=0;
// FILE *instream = NULL;
// instream = fopen("fasta_seq_nr_uniprot.txt", "r");
// gzFile fpf = gzdopen(fileno(instream), "r");
kseq_t *seq = kseq_init(fpf);
while((kseq_read(seq)) >= 0 && flag1 == 0)
{
if(strcasecmp(gi_ids[col], seq->name.s) == 0)
{
fprintf(f1, ">%s\n", gi_ids[col]);
fprintf(f2, ">%s\n%s\n", seq->name.s, seq->seq.s);
flag1 = 1;
}
}
if(flag1 == 0)
{
fprintf(f0, "%s\n", gi_ids[col]);
}
kseq_destroy(seq);
// gzclose(fpf);
}
gzclose(fpf);
fclose(f0);
fclose(f1);
fclose(f2);
for(i=0;i<size;i++)
{
free(gi_ids[i]);
}
free(gi_ids);
return 0;
}
A few example entries from the input file (fasta_seq_uniprot.txt):
P21306
MSAWRKAGISYAAYLNVAAQAIRSSLKTELQTASVLNRSQTDAFYTQYKNGTAASEPTPITK
P38077
MLSRIVSNNATRSVMCHQAQVGILYKTNPVRTYATLKEVEMRLKSIKNIEKITKTMKIVASTRLSKAEKAKISAKKMD
-----------
-----------
The user entry file is
P37592\n
Q8IUX1\n
B3GNT2\n
Q81U58\n
P70453\n
Your problem appears a bit different than you suppose. That the program stops after trying to retrieve a sequence that is not present in the data file is a consequence of the fact that it never rewinds the input. Therefore, even for a query list containing only sequences that are present in the data file, if the requested sequence IDs are not in the same relative order as the data file then the program will fail to find some of the sequences (it will pass them by when looking for an earlier-listed sequence, never to return).
Furthermore, I think it likely that the time savings you observe comes from making only a single pass through the file, instead of a (partial) pass for each requested sequence, not so much from opening it only once. Opening and closing a file is a bit expensive, but nowhere near as expensive as reading tens or hundreds of kilobytes from it.
To answer your question directly, I think you need to take these steps:
Move the kseq_init(fpf) call to just before the loop.
Move the kseq_destroy(seq) call to just after the loop.
Put in a call to kseq_rewind(seq) as the last statement in the loop.
That should make your program right again, but it is likely to kill pretty much all your time savings, because you will return to scanning the file from the beginning for each requested sequence.
The library you are using appears to support only sequential access. Therefore, the most efficient way to do the job both right and fast would be to invert the logic: read sequences one at a time in an outer loop, testing each one as you go to see whether it matches any of the requested ones.
Supposing that the list of requested sequences will contain only a few entries, like your example, you probably don't need to do any better testing for matches than just using an inner loop to test each requested sequence id vs. the then-current sequence. If the query lists may be a lot longer, though, then you could consider putting them in a hash table or sorting them into the same order as the data file to make it possible to test more efficiently for matches.
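The inverted logic can be sketched as follows. To keep the example self-contained it does not use kseq.h: it assumes the simplified layout shown in the question (an id line followed by one sequence line) and reads it with fgets() as a stand-in for kseq_read(). The name extract_sequences and its parameters are hypothetical.

```c
#include <stdio.h>
#include <string.h>
#include <strings.h>   /* strcasecmp (POSIX) */

/* Removes a trailing newline (and carriage return) in place. */
static void chomp(char *s)
{
    s[strcspn(s, "\r\n")] = '\0';
}

/* Single pass over the data file: every record is checked against all
   requested ids. found[i] is set to 1 when ids[i] was seen, and matching
   records are written to out. Returns the number of matches, or -1 if the
   data file cannot be opened.
   Assumes each record is an id line followed by one sequence line. */
int extract_sequences(const char *data_path, char *const ids[], int n_ids,
                      int found[], FILE *out)
{
    char name[256], seq[65536];
    int matches = 0;
    FILE *in = fopen(data_path, "r");

    if (in == NULL)
        return -1;
    for (int i = 0; i < n_ids; i++)
        found[i] = 0;

    while (fgets(name, sizeof name, in) && fgets(seq, sizeof seq, in)) {
        chomp(name);
        chomp(seq);
        for (int i = 0; i < n_ids; i++) {
            if (!found[i] && strcasecmp(ids[i], name) == 0) {
                fprintf(out, ">%s\n%s\n", name, seq);
                found[i] = 1;
                matches++;
                break;
            }
        }
        if (matches == n_ids)   /* everything requested has been found */
            break;
    }
    fclose(in);
    return matches;
}
```

After the call, the ids whose found[] entry is still 0 are the ones that the original program writes to xxx.txt. With kseq.h, the outer while loop would call kseq_read(seq) and compare against seq->name.s instead.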
