Strange results with reading binary files in C

I'm working on 64-bit Xubuntu 14.04.
I have a fairly large C program and to test new features, I usually implement them in a separate program to iron out any bugs and whatnot before incorporating them into the main program.
I have a function that takes a const char* as argument, to indicate the path of a file (/dev/rx.bin in this case). It opens the file, reads a specific number of bytes into an array and then does some things before exporting the new data to a different file.
First off I allocate the array:
int16_t *samples = (int16_t *)calloc(rx_length, 2 * sizeof(samples[0]));
Note that rx_length is, for example, 100 samples (closer to 100 000 in the actual program), and it's calculated from the same constants in both programs.
Next I open the file and read from it:
uint32_t num_samples_read;
FILE *in_file = fopen(file, "rb");
if (in_file == NULL){
    perror(file); /* ferror(NULL) is undefined; report why fopen failed instead */
    return 1;
}
num_samples_read = fread(samples, 2 * sizeof(samples[0]), rx_length, in_file);
Here's the kicker: the return value from fread is not the same between the test program and the main program, even though the code is identical. For example, when I should be reading 100 000 samples from a 400 kB file (100 000 samples, one int16_t for the real part and one int16_t for the imaginary part, i.e. four bytes per sample), the value returned is 99328 in the main program. For the life of me I cannot figure out why.
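One quick way to narrow down a short read like this is to ask the stream what happened immediately after the fread() call; a minimal sketch, reusing the names above:

if (num_samples_read < rx_length) {
    if (feof(in_file))
        fprintf(stderr, "short read: end of file after %u samples\n",
                (unsigned)num_samples_read);
    else if (ferror(in_file))
        perror("fread"); /* an I/O error rather than a short file */
}

If feof() fires, the file really was shorter than expected at the moment of the read, which is exactly what happens when another thread or process is still writing it, or hasn't flushed and closed it yet.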
I've tested the output of every single variable used in any calculation, and up until fread() everything is identical.
I should also note that the function is in a separate header in my main program, but I figured that since printing every constant / definition gives the expected result, it's not there that I'm making the mistake.
If there's anything that I might have missed, any input would be greatly appreciated.
Regards.

Thank you chux for reminding me to close and answer.
Closing the file was the problem in my main program; it never showed up in the test environment because there the input file was not being modified while it was read.
Once the RX thread has completed its task, make a call to fclose():
rx_task_out:
    fclose(p->out_file);
    // close device
    // free sample buffer
    return NULL;
Previously, only an error status with creating the RX thread caused it to close the file.
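For completeness, fclose() can itself fail, and a failed close can mean the last buffered samples never reached the file, so a slightly more defensive version of that label is worth considering (a sketch):

rx_task_out:
    if (fclose(p->out_file) != 0)
        perror("fclose"); /* buffered data may not have reached the file */
    p->out_file = NULL;   /* guard against a second close elsewhere */
    // close device
    // free sample buffer
    return NULL;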

Related

C code: fprintf prints to a file fewer times than expected

I ran 10 copies of the same code (changing only some parameters) simultaneously in 10 different folders (each program running on a single core).
In each program I have a for loop whose number of iterations, NumSamples, can be 50000, 500000 or 5000000 (the iteration count depends on the execution time of a single iteration; more iterations are performed in the quicker cases). Inside each iteration I compute a certain quantity (a double variable) and then save it to file (inside the for block) with:
fprintf(fp_TreeEv, "%f\n", TreeEv);
where TreeEv is the name of the variable computed at each cycle.
To be sure that the code saves the variable right after the fprintf call, I disabled buffering after opening the file, with:
fp_TreeEv = fopen("data.txt", "a");
setbuf(fp_TreeEv, NULL);
The program ends without error. Also, I know that all the NumSamples iterations were performed during the execution (I print a counter initialized to 0 before the for loop and incremented by one on each cycle).
When I open the txt file at the end of the execution, I see that the file contains fewer rows (data points) than expected (NumSamples), for example 4996187 instead of 5000000.
(I've also checked that the csv file is missing the same amount of data, with tr -d -c , < filename.csv | wc -c, as suggested by Barmar.)
What could be the source of the problem?
I copy below the for loop of my code (inv_trasf is just a function that generates random numbers):
char NomeFiletxt[64];
char NomeFilecsv[64];
sprintf(NomeFilecsv, "TreeEV%d.csv", N);
sprintf(NomeFiletxt, "TreeEVCol%d.txt", N);
FILE* fp_TreeEv;
fp_TreeEv = fopen(NomeFilecsv, "a");
FILE* fp_TreeEvCol;
fp_TreeEvCol = fopen(NomeFiletxt, "a");
setbuf(fp_TreeEv, NULL);
setbuf(fp_TreeEvCol, NULL);
for(ciclo = 0; ciclo < NumSamples; ciclo++){
    sum = 0;
    sum_2 = 0;
    k_max_int = 0;
    for(i = 0; i < N; i++){
        deg_seq[i] = inv_trasf(k_max_double_round);
        sum += deg_seq[i];
        sum_2 += (deg_seq[i] * deg_seq[i]);
        if(deg_seq[i] > k_max_int){
            k_max_int = deg_seq[i];
        }
    }
    if((sum % 2) != 0){
        do{
            v = rand_int(N);
        }while(deg_seq[v] == k_max_int);
        deg_seq[v]++;
        sum++;
        sum_2 += (deg_seq[v] * deg_seq[v] - (deg_seq[v] - 1) * (deg_seq[v] - 1));
    }
    TreeEV = ((double)sum_2 / sum) - 1.;
    fprintf(fp_TreeEv, "%f,", TreeEV);
    fprintf(fp_TreeEvCol, "%f\n", TreeEV);
    CycleCount += 1;
}
fclose(fp_TreeEv);
fclose(fp_TreeEvCol);
Could the problem lie with the SSD, which at some moment failed to keep up with the programs and could not accept the data (because of the NULL buffer)?
Only the programs with longer per-iteration execution times saved all the expected data correctly. From man setbuf: "If the argument buf is NULL, only the mode is affected; a new buffer will be allocated on the next read or write operation."
[EDIT]
I noticed that inside one of the "corrupted" txt files there is one invalid value:
I checked that it is the only value with a different number of digits; to do so I first removed all the "." from the txt file with tr -d \. < TreeEVCol102400.txt > test.txt and then sorted the new file with sort -g test.txt > t.dat. After that it is easy to check the top/bottom of the t.dat file for values with more/fewer digits.
I've also checked that it is the only value in the file with at least two "." with:
grep -ni '\.*[0-9]*\.[0-9]*\.' TreeEVCol102400.txt
I checked that each corrupted file has just one invalid value of this kind.
When you set "unbuffered" mode on a FILE, each character might¹ be written individually to the underlying device/file. In that case, if multiple processes are appending to the same file and two of them try to write a number at the same time, they might get their digits interleaved, as you show with your "corrupted" value. So setting _IONBF mode on a file is generally not what you want.
¹It's actually not possible to completely unbuffer a file -- there will still be disk buffers involved in the OS, and there may still be a (small) stdio buffer to deal with some corner cases. But in general, every character might be individually written to the file.
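If several processes really must append to one shared file, a common workaround is to format the whole line first and emit it with a single write() on a file descriptor opened with O_APPEND; POSIX then serializes each append, so digits from different writers cannot interleave within a line. A minimal sketch (POSIX-only; the file name and helper are illustrative, not from the question):

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* hypothetical helper: emit one value as one complete line, in a single write() */
static void append_value(int fd, double v)
{
    char line[64];
    int len = snprintf(line, sizeof line, "%f\n", v);
    if (len > 0)
        write(fd, line, (size_t)len);  /* O_APPEND lands each write atomically at the end */
}

int main(void)
{
    int fd = open("TreeEVCol.txt", O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0)
        return 1;
    append_value(fd, 1.618037);
    close(fd);
    return 0;
}

With stdio in _IONBF mode, by contrast, each character can go out as its own write, which is exactly the interleaving window described above.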

C - Append on 2nd line of file

I'm trying to append to the 2nd line of my txt file. The format I want for my txt file is the following:
1 2 3 4 5
1.2 3.5 6.4 1.2 6.5
Basically, I want to append to the first two lines of the file.
void write_stats(int tries, int num_letters, int tries_sucess)
{
    FILE *stats;
    stats = fopen("C:\\Users\\rjmal\\Documents\\CLION PROJECTS\\JogoDaForca\\stats.txt", "a");
    if(stats == NULL)
    {
        printf("can't open file\n");
        exit(0);
    }
    fprintf(stats, " %d\n", tries);
    fprintf(stats, " %f", (float)tries_sucess/num_letters);
    fclose(stats);
}
How do I do that without adding a new line to the file every time I run my program?
With the code I made, I get something like:
1
3 1.5
1 2.3
Due to the way files and lines work, you cannot append vertically as you desire. Instead, what you can do is store all these numbers (i.e. tries AND (float)tries_sucess/num_letters) in two arrays and print the contents of each array on one line, in that order. In effect, this buffers your content before printing and formats it as you desire.
That way, you can print all the data into two lines, each of which now corresponds to one array.
Alternatively, you can create two char arrays, treat them as string buffers, and use sprintf to record into them. Then, once you're done, you can print each char array with a single fprintf call.
Assuming you created two sufficiently long char arrays, below is a sample code for new write_stats. It now only serves to record the stats into two buffers.
void write_stats(int tries, int num_letters, int tries_sucess, char* buffer1, char* buffer2)
{
    sprintf(buffer1 + strlen(buffer1), " %d", tries);
    sprintf(buffer2 + strlen(buffer2), " %f", (float)tries_sucess/num_letters);
}
Note that you need to initialize the buffers to zero to be able to make easy use of the strlen function as I did. Also, when you are done calling write_stats, you will eventually need to call fprintf in a block where buffer1 and buffer2 are defined, as follows.
FILE *stats;
stats = fopen("C:\\Users\\rjmal\\Documents\\CLION PROJECTS\\JogoDaForca\\stats.txt", "a");
if(stats == NULL)
{
    printf("can't open file\n");
    exit(0);
}
fprintf(stats, "%s\n%s", buffer1, buffer2);
fclose(stats);
Since there are quite a few details to keep in mind, it is best to see this idea at work. See here for a working implementation, with some comments to elaborate on the details. As you can see, the output given there is horizontal, in 2 lines, as you described and as shown below.
1 3 13 55 233
2.000000 1.600000 1.619048 1.617978 1.618037
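Putting the pieces together, a self-contained sketch (buffer sizes, the loop, and the sample arguments are illustrative assumptions, not from the original):

#include <stdio.h>
#include <string.h>

void write_stats(int tries, int num_letters, int tries_sucess,
                 char* buffer1, char* buffer2)
{
    sprintf(buffer1 + strlen(buffer1), " %d", tries);
    sprintf(buffer2 + strlen(buffer2), " %f", (float)tries_sucess/num_letters);
}

int main(void)
{
    char buffer1[512] = "";  /* zeroed so strlen() starts at 0 */
    char buffer2[512] = "";
    for (int game = 1; game <= 5; game++)      /* illustrative calls */
        write_stats(game, 10, 2 * game, buffer1, buffer2);
    FILE *stats = fopen("stats.txt", "w");  /* "w", not "a": rewrite both lines each run */
    if (stats == NULL)
    {
        printf("can't open file\n");
        return 1;
    }
    fprintf(stats, "%s\n%s\n", buffer1, buffer2);
    fclose(stats);
    return 0;
}

Each run then rewrites stats.txt from scratch, which is what keeps repeated runs from adding extra lines.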

Reading a jpeg file byte by byte

For the CS50 class, I have to read in jpeg files byte by byte from a memory card in order to look at the header information. The program compiles fine, but whenever I execute it, it returns a "segmentation fault (core dumped)" message.
Edit) Okay, now I know why I have to use an "unsigned char" instead of "int*". Can someone tell me how I can store information into files within scope for this particular code? Right now, I am trying to store information outside of an if() condition, and I don't think the fread function is actually accessing the "image" file I opened.
#include <stdio.h>
#include <string.h>
#include <math.h>

FILE * image = NULL;

int main(int argc, char* argv[])
{
    FILE* infile = fopen("card.raw", "r");
    if (infile == NULL)
    {
        printf("Could not open.\n");
        fclose(infile);
        return 1;
    }
    unsigned char storage[512];
    int number = 0;
    int b = floor((number) / 100);
    int c = floor(((number) - (b * 100)) / 10);
    int d = floor(((number) - (b * 100) - (c * 10)));
    int writing = 0;
    char string[5];
    char* extension = ".jpg";
    while (fread(&storage, sizeof(storage), 1, infile))
    {
        if (storage == NULL)
        {
            break;
        }
        if (storage[0] == 0xff && storage[1] == 0xd8 && storage[2] == 0xff)
        {
            if (storage[3] == 0xe0 || storage[3] == 0xe1)
            {
                if (image != NULL)
                {
                    fclose(image);
                }
                sprintf(string, "%d%d%d%s", b, c, d, extension);
                image = fopen(string, "w");
                number++;
                writing = 1;
                if (writing == 1 && storage != NULL)
                {
                    fwrite(storage, sizeof(storage), 1, image);
                }
            }
        }
        if (writing == 1 && storage != NULL)
        {
            fwrite(storage, sizeof(storage), 1, image);
        }
        if (storage == NULL)
        {
            fclose(image);
        }
    }
    fclose(image);
    fclose(infile);
    return 0;
}
This is the problem set, just in case my explanation is not clear.
recover
In anticipation of this problem set, I spent the past several days snapping photos of people I know, all of which were saved by my
digital camera as JPEGs on a 1GB CompactFlash (CF) card. (It’s
possible I actually spent the past several days on Facebook instead.)
Unfortunately, I’m not very good with computers, and I somehow deleted
them all! Thankfully, in the computer world, "deleted" tends not to
mean "deleted" so much as "forgotten." My computer insists that the CF
card is now blank, but I’m pretty sure it’s lying to me.
Write in ~/Dropbox/pset4/jpg/recover.c a program that recovers these photos.
Ummm.
Okay, here’s the thing. Even though JPEGs are more complicated than BMPs, JPEGs have "signatures," patterns of bytes that distinguish
them from other file formats. In fact, most JPEGs begin with one of
two sequences of bytes. Specifically, the first four bytes of most
JPEGs are either
0xff 0xd8 0xff 0xe0 or 0xff 0xd8 0xff 0xe1
from first byte to fourth byte, left to right. Odds are, if you find one of these patterns of bytes on a disk known to store photos
(e.g., my CF card), they demark the start of a JPEG. (To be sure, you
might encounter these patterns on some disk purely by chance, so data
recovery isn’t an exact science.)
Fortunately, digital cameras tend to store photographs contiguously on CF cards, whereby each photo is stored immediately
after the previously taken photo. Accordingly, the start of a JPEG
usually demarks the end of another. However, digital cameras generally
initialize CF cards with a FAT file system whose "block size" is 512
bytes (B). The implication is that these cameras only write to those
cards in units of 512 B. A photo that’s 1 MB (i.e.,
1,048,576 B) thus takes up 1048576 ÷ 512 = 2048 "blocks" on a CF card. But so does a photo that’s, say, one byte smaller (i.e.,
1,048,575 B)! The wasted space on disk is called "slack space."
Forensic investigators often look at slack space for remnants of
suspicious data.
The implication of all these details is that you, the investigator, can probably write a program that iterates over a copy
of my CF card, looking for JPEGs' signatures. Each time you find a
signature, you can open a new file for writing and start filling that
file with bytes from my CF card, closing that file only once you
encounter another signature. Moreover, rather than read my CF card’s
bytes one at a time, you can read 512 of them at a time into a buffer
for efficiency’s sake. Thanks to FAT, you can trust that JPEGs'
signatures will be "block-aligned." That is, you need only look for
those signatures in a block’s first four bytes.
Realize, of course, that JPEGs can span contiguous blocks. Otherwise, no JPEG could be larger than 512 B. But the last byte of a
JPEG might not fall at the very end of a block. Recall the possibility
of slack space. But not to worry. Because this CF card was brand-new
when I started snapping photos, odds are it’d been "zeroed" (i.e.,
filled with 0s) by the manufacturer, in which case any slack space
will be filled with 0s. It’s okay if those trailing 0s end up in the
JPEGs you recover; they should still be viewable.
Now, I only have one CF card, but there are a whole lot of you! And so I’ve gone ahead and created a "forensic image" of the card,
storing its contents, byte after byte, in a file called card.raw . So
that you don’t waste time iterating over millions of 0s unnecessarily,
I’ve only imaged the first few megabytes of the CF card. But you
should ultimately find that the image contains 16 JPEGs. As usual, you
can open the file programmatically with fopen, as in the below.
FILE* file = fopen("card.raw", "r");
Notice, incidentally, that ~/Dropbox/pset4/jpg contains only recover.c, but it’s devoid of any code. (We leave it to you to decide
how to implement and compile recover!) For simplicity, you should
hard-code "card.raw" in your program; your program need not accept any
command-line arguments. When executed, though, your program should
recover every one of the JPEGs from card.raw, storing each as a
separate file in your current working directory. Your program should
number the files it
outputs by naming each ###.jpg, where ### is a three-digit decimal number from 000 on up. (Befriend sprintf.) You need not try to
recover the JPEGs' original names. To
check whether the JPEGs your program spit out are correct, simply double-click and take a look! If each photo appears intact,
your operation was likely a success!
Odds are, though, the JPEGs that the first draft of your code spits out won’t be correct. (If you open them up and don’t see
anything, they’re probably not correct!) Execute the command below to
delete all JPEGs in your current working directory.
rm *.jpg
If you’d rather not be prompted to confirm each deletion, execute the command below instead.
rm -f *.jpg
Just be careful with that -f switch, as it "forces" deletion without prompting you.
int* storage[512];
You define 512 pointers to int, but you don't actually reserve space for the ints themselves (only the pointers).
I suspect you just want
int storage[512];
After this, storage is an array of 512 ints (it decays to a pointer to its first element in most expressions). Though I still think you don't want this. You need 'bytes', not ints. The nearest C has is unsigned char. So the final declaration is:
unsigned char storage[512];
Why? Because fread reads into consecutive bytes. If you read into ints, then you will read 4 bytes into each int (because an int commonly occupies 4 bytes).
There are a number of problems in your program. The first is that you have not opened the file in binary mode.
The second is that you are doing unnecessary pointer arithmetic. Why not—
char buffer[BUFFERSIZE];
....
if (buffer[ii] == WHATEVER)
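Pulling those points together (binary mode, a 512-byte unsigned char buffer, block-aligned signature checks, and %03d names, all per the problem statement), a compact sketch of the recover loop might look like this; it is one way to structure it, not the official solution:

#include <stdio.h>

int main(void)
{
    FILE *infile = fopen("card.raw", "rb");   /* "rb": binary mode matters on Windows */
    if (infile == NULL)
    {
        printf("Could not open card.raw.\n");
        return 1;
    }
    unsigned char block[512];
    FILE *image = NULL;
    int number = 0;
    char name[16];                            /* room for "###.jpg" plus the terminator */
    while (fread(block, sizeof(block), 1, infile) == 1)
    {
        /* signatures are block-aligned, so only the first four bytes matter */
        if (block[0] == 0xff && block[1] == 0xd8 && block[2] == 0xff &&
            (block[3] == 0xe0 || block[3] == 0xe1))
        {
            if (image != NULL)
                fclose(image);                /* a new signature ends the previous photo */
            sprintf(name, "%03d.jpg", number++);
            image = fopen(name, "wb");
        }
        if (image != NULL)
            fwrite(block, sizeof(block), 1, image);
    }
    if (image != NULL)
        fclose(image);
    fclose(infile);
    return 0;
}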

File returns garbage, but writes correctly

I'm writing a struct into a file, but reading it back returns garbage. Here is my code:
ptFile = fopen("funcionarios.dat", "ab+");
fwrite(&novoFunc, sizeof(strFunc), 1, ptFile);
The values of the struct novoFunc, before and after the fwrite, are not garbage.
However, when I return the file values:
ptFile = fopen("funcionarios.dat", "rb+");
[...]
fseek(ptFile, i*sizeof(strFunc), SEEK_SET); //on the loop, i goes from 0 to total structs
fread(&funcionario, sizeof(strFunc), 1, ptFile);
printf("Code: %d; Name: %s; Address: %s; CPF: %d; Sales: %d\n", funcionario.codigo, funcionario.nome, funcionario.endereco, funcionario.cpf, funcionario.numVendas);
Any idea why? The code was working fine, and I don't remember making any significant changes.
Thanks in advance.
Edit: Struct definition
typedef struct func{
int codigo;
char nome[50];
char endereco[100];
int cpf;
int numVendas;
int ativo;
} strFunc;
Edit2: It just got weirder: it works fine on Linux (using NetBeans and the gcc compiler), but it doesn't on Windows (Dev-C++ and Code::Blocks). Well, the entire code is here:
http://pastebin.com/XjDzAQCx
The function cadastraFucionario() registers the user, and when I use listaFuncionarios() to list all the registered data, it returns the garbage. Here is a screenshot of what listaFuncionarios() returns:
http://img715.imageshack.us/img715/3002/asodfadhf.jpg
I'm sorry the code isn't in English.
You say: "The code was working fine, and I don't remember making any significant changes."
When it was working fine, it wrote some structures into your file.
Maybe later it was still working fine, and it appended some additional structures at the end of your file. The original data still remained at the beginning of your file. So when you read the beginning of the file, you read the original data. Maybe.
Are you sure that you read garbage? Are you sure that you didn't just read old data?
In your code:
ptFile = fopen("funcionarios.dat", "ab+");
Appending is the right thing to do for some purposes but not for others. Do you need wb+ instead?
This:
it works fine on linux ... but it doesnt on windows
is a big red flag. Windows has "text" files which are different to "binary" files. On Linux and other Unixes, there is no difference.
Two lines in your source stand out:
fopen("funcionarios.dat", "rb+");
and later
fopen("funcionarios.dat", "r+");
That is, sometimes you open the file in "binary" mode, and sometimes in "text" mode. Make sure you always open any file in "binary" mode (that is, with the b in the mode string) if you ever intend to read or write non-text data.
Here are two problems in your function retornaIndice.
while(!feof(ptFile)){
fseek(ptFile, sizeof(strFunc)*i, SEEK_SET);
fread(&tmpFunc, sizeof(strFunc), 1, ptFile);
You aren't checking the result of fread. After reading the last record, eof has not been reached yet, so you will try another read. That read will reach eof and will return 0, but you aren't checking for that 0, so you will use garbage data and will exit the loop the next time the while statement tests it.
if(codigo != 0 && tmpFunc.ativo){
if(tmpFunc.codigo == codigo){
return i;
}
If you detect a problem at this point, you don't close ptFile. The leaked handle shouldn't cause garbage data to be written to the file, but it doesn't inspire confidence either.
Some of your other functions have the same errors.
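Putting both fixes together, the read loop in retornaIndice might look like this (a sketch: the signature is illustrative, strFunc is as defined above, and the -1 "not found" return is an assumed convention):

int retornaIndice(FILE *ptFile, int codigo)
{
    strFunc tmpFunc;
    int i = 0;
    /* fread returns the number of complete records read: 1 here, or 0 at EOF/error */
    while (fread(&tmpFunc, sizeof tmpFunc, 1, ptFile) == 1)
    {
        if (codigo != 0 && tmpFunc.ativo && tmpFunc.codigo == codigo)
        {
            fclose(ptFile);   /* close on the early-exit path too */
            return i;
        }
        i++;
    }
    fclose(ptFile);
    return -1;                /* assumed: no matching record */
}

Reading sequentially like this also makes the per-record fseek unnecessary.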

Opening a file in C through a process

I am trying to create a program that does the following actions:
Open a file and read one line.
Open another file and read another line.
Compare the two lines and print a message.
This is my code:
#include <stdio.h>
#include <string.h>

int findWord(char sizeLineInput2[512]);

int main()
{
    FILE *cfPtr2, *cfPtr1;
    int i;
    char sizeLineInput1[512], sizeLineInput2[512];
    cfPtr2 = fopen("mike2.txt", "r");
    // I open the first file
    while (fgets(sizeLineInput2, 512, cfPtr2) != NULL)
    // I read one line from the first file
    {
        if (sizeLineInput2[strlen(sizeLineInput2)-1] == '\n')
            sizeLineInput2[strlen(sizeLineInput2)-1] = '\0';
        printf("%s \n", sizeLineInput2);
        i = findWord(sizeLineInput2);
        // I call the procedure that compares the two lines
    }
    getchar();
    return 0;
}

int findWord(char sizeLineInput2[512])
{
    int x;
    char sizeLineInput1[512];
    FILE *cfPtr1;
    cfPtr1 = fopen("mike1.txt", "r");
    // here I open the second file
    while (fgets(sizeLineInput1, 512, cfPtr1) != NULL)
    {
        if (sizeLineInput1[strlen(sizeLineInput1)-1] == '\n')
            sizeLineInput1[strlen(sizeLineInput1)-1] = '\0';
        if (strcmp(sizeLineInput1, sizeLineInput2) == 0)
            // Here, I compare the two lines
            printf("the words %s and %s are equal!\n", sizeLineInput1, sizeLineInput2);
        else
            printf("the words %s and %s are not equal!\n", sizeLineInput1, sizeLineInput2);
    }
    fclose(cfPtr1);
    return 0;
}
It seems to have some problem with file-pointer handling. Could someone check it and tell me what corrections I need to make?
Deconstruction and Reconstruction
The current code structure is, to be polite about it, cock-eyed.
You should open the files in the same function - probably main(). There should be two parallel blocks of code. In fact, ideally, you'd do your opening and error handling in a function so that main() simply contains:
FILE *cfPtr1 = file_open("mike1.txt");
FILE *cfPtr2 = file_open("mike2.txt");
If control returns to main(), the files are open, ready for use.
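A file_open() along those lines might look like this (a sketch; the exit-on-failure policy is one reasonable choice, not the only one):

#include <stdio.h>
#include <stdlib.h>

static FILE *file_open(const char *name)
{
    FILE *fp = fopen(name, "r");
    if (fp == NULL)
    {
        fprintf(stderr, "failed to open file %s\n", name);
        exit(EXIT_FAILURE);   /* error handled here, so main() never sees a NULL */
    }
    return fp;
}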
You then need to read a line from each file - in main() again. If either file does not contain a line, then you can bail out with an appropriate error:
if (fgets(buffer1, sizeof(buffer1), cfPtr1) == 0)
...error: failed to read file1...
if (fgets(buffer2, sizeof(buffer2), cfPtr2) == 0)
...error: failed to read file2...
Then you call your comparison code with the two lines:
findWord(buffer1, buffer2);
You need to carefully segregate the I/O operations from the actual processing of data; if you interleave them as in your first attempt, it makes everything very messy. I/O tends to be messy, simply because you have error conditions to deal with - that's why I shunted the open operation into a separate function (doubly so since you need to do it twice).
You could decide to wrap the fgets() call and error handling up in a function, too:
const char *file1 = "mike1.txt";
const char *file2 = "mike2.txt";
read_line(cfPtr1, file1, buffer1, sizeof(buffer1));
read_line(cfPtr2, file2, buffer2, sizeof(buffer2));
That function can trim the newline off the end of the string and deal with anything else that you want it to do - and report an accurate error, including the file name, if anything goes wrong. Clearly, with the variables 'file1' and 'file2' on hand, you'd use those instead of literal strings in the file_open() calls. Note, too, that making them into variables means it is trivial to take the file names from the command line; you simply set 'file1' and 'file2' to point to the argument list instead of the hard-wired defaults. (I actually wrote: const char file1[] = "mike1.txt"; briefly - but then realized that if you handle the file names via the command line, then you need pointers, not arrays.)
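One possible read_line() in that spirit (a sketch; like file_open() above, it treats failure as fatal, which is an assumption):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static void read_line(FILE *fp, const char *name, char *buffer, size_t size)
{
    if (fgets(buffer, (int)size, fp) == NULL)
    {
        fprintf(stderr, "failed to read a line from %s\n", name);
        exit(EXIT_FAILURE);
    }
    buffer[strcspn(buffer, "\n")] = '\0';   /* trim the newline; safe even on an empty line */
}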
Also, if you open a file, you should close the file too. Granted, if your program exits, the o/s cleans up behind you, but it is a good discipline to get into. One reason is that not every program exits (think of the daemons running services on your computer). Another is that you quite often use a resource (file, in the current discussion) briefly and do not need it again. You should not hold resources in your program for longer than you need them.
Philosophy
Polya, in his 1957 book "How To Solve It", has a dictum:
Try to treat symmetrically what is symmetrical, and do not destroy wantonly any natural symmetry.
That is as valid advice in programming as it is in mathematics. And in their classic 1978 book 'The Elements of Programming Style', Kernighan and Plauger make the telling statements:
[The] subroutine call permits us to summarize the irregularities in the argument list [...]
The subroutine itself summarizes the regularities of the code.
In more modern books such as 'The Pragmatic Programmer' by Hunt & Thomas (1999), the dictum is translated into a snappy TLA:
DRY - Don't Repeat Yourself.
If you find your code doing the 'same' lines of code repeated several times, write a subroutine to do it once and call the subroutine several times.
That is what my suggested rewrite is aiming at.
In both main() and findWord(), be careful with sizeLineInputX[strlen(sizeLineInputX)-1] right after fgets(): fgets() does terminate the buffer with '\0', but if the line it read happens to be empty (for example, when the input contains a stray NUL byte), strlen() is 0 and the [-1] index reads before the start of the buffer.
Instead of using fgets use fgetc to read char by char and check for a newline character (and for EOF too).
UPD to your UPD: you compare each line of mike2.txt with each line of mike1.txt - I guess that's not what you want. Open both files outside the while loop in main(), use one loop for both files, and check for the newline and EOF on both of them in that loop.
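In code, that single-loop structure might look like this (a sketch; error checks on fopen are omitted for brevity):

#include <stdio.h>
#include <string.h>

int main(void)
{
    FILE *cfPtr1 = fopen("mike1.txt", "r");
    FILE *cfPtr2 = fopen("mike2.txt", "r");
    char line1[512], line2[512];
    /* one loop advances both files in lockstep; it stops at the first EOF */
    while (fgets(line1, sizeof line1, cfPtr1) != NULL &&
           fgets(line2, sizeof line2, cfPtr2) != NULL)
    {
        line1[strcspn(line1, "\n")] = '\0';   /* trim trailing newlines */
        line2[strcspn(line2, "\n")] = '\0';
        if (strcmp(line1, line2) == 0)
            printf("the words %s and %s are equal!\n", line1, line2);
        else
            printf("the words %s and %s are not equal!\n", line1, line2);
    }
    fclose(cfPtr1);
    fclose(cfPtr2);
    return 0;
}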
