C code: fprintf prints to a file fewer times than expected

I ran 10 copies of the same code (changing only some parameters) simultaneously in 10 different folders (each program working on a single core).
In each program I have a for loop whose number of iterations, NumSamples, can be 50000, 500000 or 5000000 (the count depends on the execution time of a single iteration; more iterations are performed in the quicker cases). Inside each iteration I compute a certain quantity (a double variable) and then save it to file with (inside the for block):
fprintf(fp_TreeEv, "%f\n", TreeEV);
where TreeEV is the variable computed at each cycle.
To be sure that the code saves the variable right after the fprintf call, I set the buffer to NULL after opening the file:
fp_TreeEv=fopen("data.txt", "a");
setbuf(fp_TreeEv, NULL);
The program ends without errors. I also know that all the NumSamples iterations were performed during the execution (I print a counter initialized to 0 at the beginning of the for loop that increases by one at each cycle).
When I open the txt file at the end of the execution, I see that the file contains fewer rows (data) than expected (NumSamples), for example 4996187 instead of 5000000.
(I've also checked that the csv file is missing the same amount of data with tr -d -c , < filename.csv | wc -c, as suggested by Barmar.)
What could be the source of the problem?
I copy below the for loop of my code (inv_trasf is just a function that generates random numbers):
char NomeFiletxt[64];
char NomeFilecsv[64];
sprintf(NomeFilecsv, "TreeEV%d.csv", N);
sprintf(NomeFiletxt, "TreeEVCol%d.txt", N);
FILE *fp_TreeEv = fopen(NomeFilecsv, "a");
FILE *fp_TreeEvCol = fopen(NomeFiletxt, "a");
setbuf(fp_TreeEv, NULL);
setbuf(fp_TreeEvCol, NULL);
for (ciclo = 0; ciclo < NumSamples; ciclo++) {
    sum = 0;
    sum_2 = 0;
    k_max_int = 0;
    for (i = 0; i < N; i++) {
        deg_seq[i] = inv_trasf(k_max_double_round);
        sum += deg_seq[i];
        sum_2 += (deg_seq[i] * deg_seq[i]);
        if (deg_seq[i] > k_max_int) {
            k_max_int = deg_seq[i];
        }
    }
    if ((sum % 2) != 0) {
        do {
            v = rand_int(N);
        } while (deg_seq[v] == k_max_int);
        deg_seq[v]++;
        sum++;
        sum_2 += (deg_seq[v] * deg_seq[v] - (deg_seq[v] - 1) * (deg_seq[v] - 1));
    }
    TreeEV = ((double)sum_2 / sum) - 1.;
    fprintf(fp_TreeEv, "%f,", TreeEV);
    fprintf(fp_TreeEvCol, "%f\n", TreeEV);
    CycleCount += 1;
}
fclose(fp_TreeEv);
fclose(fp_TreeEvCol);
Could the problem lie in the SSD, which at some moment failed to keep up with the programs and couldn't write the data back (because of the NULL buffer)?
Only the runs with a greater execution time per single-cycle iteration save all the expected data correctly. From man setbuf: "If the argument buf is NULL, only the mode is affected; a new buffer will be allocated on the next read or write operation."
[EDIT]
I noticed that inside one of the "corrupted" txt files there is one invalid value.
I checked that it is the only value with a different number of digits; to do so I first removed all the "." from the txt file with tr -d \. < TreeEVCol102400.txt > test.txt and then sorted the new file with sort -g test.txt > t.dat. After that it is easy to check at the top/end of the t.dat file for values with more/fewer digits.
I've also checked that it is the only value in the file with at least two "." with:
grep -ni '\.*[0-9]*\.[0-9]*\.' TreeEVCol102400.txt
I checked that each corrupted file has just one invalid value of this kind.

When you set 'unbuffered' mode on a FILE, each character might¹ be written individually to the underlying device/file. In that case, if multiple processes are appending to the same file and two of them try to write a number at the same time, they might get their digits interleaved, as you show with your "corrupted" value. So setting _IONBF mode on a file is generally not what you want.
¹It's actually not possible to completely unbuffer a file -- there will still be disk buffers involved in the OS, and there may still be a (small) stdio buffer to deal with some corner cases. But in general, every character might be individually written to the file.


How to get the cursor location during parsing?

I made a minimal example for the PackCC parser generator.
Here, the parser has to recognize float or integer numbers.
I try to print the location of the detected numbers. For simplicity there is no
line/column count, just the number from ftell.
%auxil "FILE*" # The type sent to "pcc_create" for access in "ftell".
test <- line+
/
_ EOL+
line <- num _ EOL
num <- [0-9]+'.'[0-9]+ {printf("Float at %li\n", ftell(auxil));}
/
[0-9]+ {printf("Integer at %li\n", ftell(auxil));}
_ <- [ \t]*
EOL <- '\n' / '\r\n' / '\r'
%%
int main()
{
    FILE *file = fopen("test.txt", "r");
    if (file == NULL) {
        // failed to open.
        puts("File not found");
    }
    else {
        // parse.
        stdin = file;   /* the generated parser reads from stdin */
        pcc_context_t *ctx = pcc_create(file);
        while (pcc_parse(ctx, NULL));
        pcc_destroy(ctx);
    }
    return 0;
}
The file to parse can be
2.0
42
The command can be
packcc test.peg && cc test.c && ./a.out
The problem is that the cursor value is always at the end of the file, whatever the number's
position.
Positions can be retrieved via special variables.
In the example above, ftell must be replaced by $0s or $0e.
$0s is the beginning of the matched pattern; $0e is the end of the matched pattern.
https://github.com/arithy/packcc/blob/master/README.md
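Applied to the grammar above, only the action bodies of the num rule need to change (a sketch based on the README; the printf format assumes the offsets fit in an unsigned long):

```
num <- [0-9]+'.'[0-9]+ {printf("Float at %lu\n", (unsigned long)$0s);}
/
[0-9]+ {printf("Integer at %lu\n", (unsigned long)$0s);}
```

The %auxil argument and the ftell call are then no longer needed for position reporting.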
Without looking more closely at the generated code, it would seem that the parser insists on reading the entire text into memory before executing any of the actions. That seems unnecessary for this grammar, and it is certainly not the way a typical generated lexical scanner would work. It's particularly odd since it seems like the generated scanner uses getchar to read one byte at a time, which is not very efficient if you are planning to read the entire file.
To be fair, you wouldn't be able to use ftell in a flex-generated scanner either, unless you forced the scanner into interactive mode. (The original AT&T lex, which also reads one character at a time, would give you a reasonable value from ftell. But you're unlikely to find a scanner built with it anymore.)
Flex would give you the wrong answer because it deliberately reads its input in chunks the size of its buffer, usually 8k. That's a lot more efficient than character-at-a-time reading. But it doesn't work for interactive environments -- for example, where you are parsing directly from user input -- because you don't want to read beyond the end of the line the user typed.
You'll have to ask whoever maintains packcc what their intended approach for maintaining source position is. It's possible that they have something built in.

C - Append on 2nd line of file

I'm trying to append to the 2nd line of my txt file. The format I want for my txt file is the following:
1 2 3 4 5
1.2 3.5 6.4 1.2 6.5
Basically, I want to append to the first two lines of the file.
void write_stats(int tries, int num_letters, int tries_sucess)
{
    FILE *stats;
    stats = fopen("C:\\Users\\rjmal\\Documents\\CLION PROJECTS\\JogoDaForca\\stats.txt", "a");
    if (stats == NULL)
    {
        printf("can't open file\n");
        exit(0);
    }
    fprintf(stats, " %d\n", tries);
    fprintf(stats, " %f", (float)tries_sucess / num_letters);
    fclose(stats);
}
How do I do that without making a new line in the file every time I run my program?
With the code I made, I get something like:
1
3 1.5
1 2.3
Due to the way files and lines are handled by computers, you cannot print vertically as you desire. Instead, what you can do is store all these numbers (i.e. tries AND (float)tries_sucess/num_letters) in two arrays and print the contents of each array on the same line, in that order. In effect, this would buffer your content before printing and formatting it as you desire.
That way, you can print all the data into two lines, each of which now corresponds to one array.
Alternatively, you can create two char arrays and consider them as actual string buffers and use sprintf to record into them. Then, once you're done, you can print each char array through a single fprintf call.
Assuming you created two sufficiently long char arrays, below is a sample code for new write_stats. It now only serves to record the stats into two buffers.
void write_stats(int tries, int num_letters, int tries_sucess, char *buffer1, char *buffer2)
{
    sprintf(buffer1 + strlen(buffer1), " %d", tries);
    sprintf(buffer2 + strlen(buffer2), " %f", (float)tries_sucess / num_letters);
}
Note that you need to initialize the buffers with 0 to be able to easily use the strlen function as I did. Also, you will eventually (i.e. when you are done calling write_stats) need to call fprintf, in a block where buffer1 and buffer2 are defined, as follows.
FILE *stats;
stats = fopen("C:\\Users\\rjmal\\Documents\\CLION PROJECTS\\JogoDaForca\\stats.txt", "a");
if (stats == NULL)
{
    printf("can't open file\n");
    exit(0);
}
fprintf(stats, "%s\n%s", buffer1, buffer2);
fclose(stats);
Since there are quite a few details to keep in mind, I think it is best if you see this idea at work. See here for a working implementation, with some comments to help elaborate the details. As you may observe, the output given there is horizontal and in 2 lines, as you described and as given below.
1 3 13 55 233
2.000000 1.600000 1.619048 1.617978 1.618037

Strange results with reading binary files in C

I'm working on 64-bit Xubuntu 14.04.
I have a fairly large C program and to test new features, I usually implement them in a separate program to iron out any bugs and whatnot before incorporating them into the main program.
I have a function that takes a const char* as argument, to indicate the path of a file (/dev/rx.bin in this case). It opens the file, reads a specific number of bytes into an array and then does some things before exporting the new data to a different file.
First off I allocate the array:
int16_t *samples = (int16_t *)calloc(rx_length, 2 * sizeof(samples[0]));
Note that rx_length is for example 100 samples (closer to 100 000 in the actual program), and it's calculated from the same constants.
Next I open the file and read from it:
uint32_t num_samples_read;
FILE *in_file = fopen(file, "rb");
if (in_file == NULL){
    perror(file);
    return 1;
}
num_samples_read = fread(samples, 2 * sizeof(samples[0]), rx_length, in_file);
Here's the kicker: the return value from fread is not the same between the test program and the main program, even though the code is identical. For example, when I should be reading 100 000 samples from a 400 kB file (100 000 samples, one int16_t for the real part and one int16_t for the imaginary part, adds up to four bytes per sample), the value returned is 99328 in the main program. For the life of me I cannot figure out why.
I've tested the output of every single variable used in any calculation, and up until fread() everything is identical.
I should also note that the function is in a separate header in my main program, but I figured that since printing every constant/definition gives the expected result, the mistake isn't there.
If there's anything that I might have missed, any input would be greatly appreciated.
Regards.
Thank you chux for reminding me to close and answer.
Closing the file was the problem in my main program, it never occurred within the test environment because the input file was not being modified there.
Once the RX thread has completed its task, make a call to fclose():
rx_task_out:
fclose(p->out_file);
// close device
// free sample buffer
return NULL;
Previously, only an error status with creating the RX thread caused it to close the file.

How to save results of a function into text file in C

This function prints the length of words with '*' (a histogram). How can I save the results into a text file? I tried, but the program does not save the results (no errors).
void histogram(FILE *myinput)
{
    FILE *ptr;
    printf("\nsaving results...\n");
    ptr = fopen("results1.txt", "wt");
    int j, n = 1, i = 0;
    size_t ln;
    char arr[100][10];
    while (n > 0)
    {
        n = fscanf(myinput, "%s", arr[i]);
        i++;
    }
    n = i;
    for (i = 0; i < n - 1; i++)
    {
        ln = strlen(arr[i]);
        fprintf(ptr, "%s \t", arr[i]);
        for (j = 0; j < ln; j++)
            fprintf(ptr, "*");
        fprintf(ptr, "\n");
    }
    fclose(myinput);
    fclose(ptr);
}
I see two ways to take care of this issue:
Open a file in the program and write to it.
If running with command line, change the output location for standard out
$> ./histogram > outfile.txt
Using '>' changes where standard out writes to. The issue with '>' is that it truncates the file and then writes to it. This means that if there was any data in that file before, it is gone. Only the new data written by the program will be there.
If you need to keep the data already in the file, you can have standard out append to the file with '>>', as in the following example:
$> ./histogram >> outfile.txt
Also, there does not have to be a space between '>' and the file name; I just do that by preference. It could look like this:
$> ./histogram >outfile.txt
If you're writing to a file as a one-time thing, redirecting standard out is probably the best way to go. If you are going to do it every time, then add it to the code.
You will need to open another FILE. You can do this in the function or pass it in like you did the file being read from.
Use 'fprintf' to write to the file:
int fprintf(FILE *restrict stream, const char *restrict format, ...);
Your program may have these lines added to write to a file:
FILE *myoutput = fopen("output.txt", "w"); // or "a" if you want to append
fprintf(myoutput, "%s \t",arr[i]);
Answer Complete
There may be some other issues as well that I will discuss now.
Your histogram function originally had no return type. C compilers historically defaulted it to 'int' and then complained that the function does not return a value. From what you provided, I would add 'void' before the function name.
void histogram {
The size of arr's second dimension may be too small. One has to assume that no token in the file you are reading exceeds 10 characters, including the null terminator '\0' at the end of the string. That means at most 9 characters per string; otherwise you will overflow the location and potentially mess your data up.
Edit
The above was written before a change to the provided code that now includes a second file and fprintf statements.
I will point to the line that opens the out file:
ptr=fopen("results1.txt","wt");
I am wondering if you meant to put "w+", where the second character is a plus symbol. According to the man page there are six possibilities:
The argument mode points to a string beginning with one of the
following sequences (possibly followed by additional characters, as
described below):
r Open text file for reading. The stream is positioned at the
beginning of the file.
r+ Open for reading and writing. The stream is positioned at the
beginning of the file.
w Truncate file to zero length or create text file for writing.
The stream is positioned at the beginning of the file.
w+ Open for reading and writing. The file is created if it does
not exist, otherwise it is truncated. The stream is
positioned at the beginning of the file.
a Open for appending (writing at end of file). The file is
created if it does not exist. The stream is positioned at the
end of the file.
a+ Open for reading and appending (writing at end of file). The
file is created if it does not exist. The initial file
position for reading is at the beginning of the file, but
output is always appended to the end of the file.
As such, it appears you are attempting to open the file for reading and writing.

Opening a file in C through a process

I am trying to create a program that does the following actions:
Open a file and read one line.
Open another file and read another line.
Compare the two lines and print a message.
This is my code:
#include <stdio.h>
#include <string.h>
int findWord(char sizeLineInput2[512]);
int main()
{
    FILE *cfPtr2, *cfPtr1;
    int i;
    char sizeLineInput1[512], sizeLineInput2[512];
    cfPtr2 = fopen("mike2.txt", "r");
    // I open the first file
    while (fgets(sizeLineInput2, 512, cfPtr2) != NULL)
    // I read one line from the first file
    {
        if (sizeLineInput2[strlen(sizeLineInput2) - 1] == '\n')
            sizeLineInput2[strlen(sizeLineInput2) - 1] = '\0';
        printf("%s \n", sizeLineInput2);
        i = findWord(sizeLineInput2);
        // I call the procedure that compares the two lines
    }
    getchar();
    return 0;
}
int findWord(char sizeLineInput2[512])
{
    int x;
    char sizeLineInput1[512];
    FILE *cfPtr1;
    cfPtr1 = fopen("mike1.txt", "r");
    // here I open the second file
    while (fgets(sizeLineInput1, 512, cfPtr1) != NULL)
    {
        if (sizeLineInput1[strlen(sizeLineInput1) - 1] == '\n')
            sizeLineInput1[strlen(sizeLineInput1) - 1] = '\0';
        if (strcmp(sizeLineInput1, sizeLineInput2) == 0)
            // Here, I compare the two lines
            printf("the words %s and %s are equal!\n", sizeLineInput1, sizeLineInput2);
        else
            printf("the words %s and %s are not equal!\n", sizeLineInput1, sizeLineInput2);
    }
    fclose(cfPtr1);
    return 0;
}
It seems to have some problems with file-pointer handling. Could someone check it and tell me what corrections I have to make?
Deconstruction and Reconstruction
The current code structure is, to be polite about it, cock-eyed.
You should open the files in the same function - probably main(). There should be two parallel blocks of code. In fact, ideally, you'd do your opening and error handling in a function so that main() simply contains:
FILE *cfPtr1 = file_open("mike1.txt");
FILE *cfPtr2 = file_open("mike2.txt");
If control returns to main(), the files are open, ready for use.
You then need to read a line from each file - in main() again. If either file does not contain a line, then you can bail out with an appropriate error:
if (fgets(buffer1, sizeof(buffer1), cfPtr1) == 0)
...error: failed to read file1...
if (fgets(buffer2, sizeof(buffer2), cfPtr2) == 0)
...error: failed to read file2...
Then you call your comparison code with the two lines:
findWord(buffer1, buffer2);
You need to carefully segregate the I/O operations from the actual processing of data; if you interleave them as in your first attempt, it makes everything very messy. I/O tends to be messy, simply because you have error conditions to deal with - that's why I shunted the open operation into a separate function (doubly so since you need to do it twice).
You could decide to wrap the fgets() call and error handling up in a function, too:
const char *file1 = "mike1.txt";
const char *file2 = "mike2.txt";
read_line(cfPtr1, file1, buffer1, sizeof(buffer1));
read_line(cfPtr2, file2, buffer2, sizeof(buffer2));
That function can trim the newline off the end of the string and deal with anything else that you want it to do - and report an accurate error, including the file name, if anything goes wrong. Clearly, with the variables 'file1' and 'file2' on hand, you'd use those instead of literal strings in the file_open() calls. Note, too, that making them into variables means it is trivial to take the file names from the command line; you simply set 'file1' and 'file2' to point to the argument list instead of the hard-wired defaults. (I actually wrote: const char file1[] = "mike1.txt"; briefly - but then realized that if you handle the file names via the command line, then you need pointers, not arrays.)
Also, if you open a file, you should close the file too. Granted, if your program exits, the o/s cleans up behind you, but it is a good discipline to get into. One reason is that not every program exits (think of the daemons running services on your computer). Another is that you quite often use a resource (file, in the current discussion) briefly and do not need it again. You should not hold resources in your program for longer than you need them.
Philosophy
Polya, in his 1957 book "How To Solve It", has a dictum:
Try to treat symmetrically what is symmetrical, and do not destroy wantonly any natural symmetry.
That is as valid advice in programming as it is in mathematics. And in their classic 1978 book 'The Elements of Programming Style', Kernighan and Plauger make the telling statements:
[The] subroutine call permits us to summarize the irregularities in the argument list [...]
The subroutine itself summarizes the regularities of the code.
In more modern books such as 'The Pragmatic Programmer' by Hunt & Thomas (1999), the dictum is translated into a snappy TLA:
DRY - Don't Repeat Yourself.
If you find your code doing the 'same' lines of code repeated several times, write a subroutine to do it once and call the subroutine several times.
That is what my suggested rewrite is aiming at.
In both main() and findWord(), be careful with sizeLineInputX[strlen(sizeLineInputX) - 1] right after reading with fgets(): if the string read is empty (for example, the file contains a stray '\0' byte), strlen() is 0 and the index underflows the array.
Instead of using fgets, you could use fgetc to read char by char and check for a newline character (and for EOF too).
UPD to your UPD: you compare each line of mike2.txt with each line of mike1.txt - I guess that's not what you want. Open both files outside the while loop in main(), use one loop for both files, and check for newline and EOF on both of them in that loop.