Read binary file using struct to find records - c

I'm using the following code to read a binary file and use a struct to output the data. However, I know my data consist of only one record, and it seems to be printing out many records. I'm wondering why this might be?
FILE *p;
struct myStruct x;
p=fopen("myfile","rb");
//printf("Rcords of file:\n");
while(1)
{
fread(&x,sizeof(x),1,p);
if(feof(p)!=0)
break;
printf("\n\nID:%ld",x.ID);
}
fclose(p);
return 0;
The struct is fairly normal like this:
struct myStruct
{
int ID;
char name[100];
};

Use %d instead of %ld to print an int.
Also take a look at Why is “while ( !feof (file) )” always wrong?
A struct has a fixed size, so you can use ftell to get the size of the file and divide it by the size of the struct to get the number of records. Also, always check the results of those functions.
Something like:
FILE *file;
long size;
size_t count, records;
file = fopen("myfile", "rb");
if (file == NULL) {
perror("fopen");
return 0;
}
if (fseek(file, 0, SEEK_END) == -1) {
perror("fseek");
return 0;
}
size = ftell(file);
if (size == -1) {
perror("ftell");
return 0;
}
if (fseek(file, 0, SEEK_SET) == -1) {
perror("fseek");
return 0;
}
records = size / sizeof(x);
for (count = 0; count < records; count++) {
if (fread(&x, sizeof(x), 1, file) == 1) {
printf("\n\nID:%d",x.ID); /* %d instead of %ld */
} else {
break;
}
}
But notice that every record is read into the same variable on the stack, so each fread overwrites the previous record.
EDIT:
How do you store the struct in the file?
I'm not storing it, a program is.
If the file is not yours (you don't build the file using the same struct), then you can't know what sizeof(x) is inside the file. Read about structure padding and packing.
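To make the padding point concrete, here is a minimal sketch that prints the layout the current compiler chose. struct padded_example and print_layouts are hypothetical names added for illustration; only struct myStruct comes from the question.

```c
#include <stddef.h>
#include <stdio.h>

/* The struct from the question. */
struct myStruct {
    int ID;
    char name[100];
};

/* Hypothetical struct, not from the question: the compiler usually
   inserts padding after `tag` so that `value` is suitably aligned. */
struct padded_example {
    char tag;
    double value;
};

/* Print the in-memory layout chosen by the compiler in use. */
void print_layouts(void) {
    printf("sizeof(struct myStruct)         = %zu\n",
           sizeof(struct myStruct));
    printf("offsetof(myStruct, name)        = %zu\n",
           offsetof(struct myStruct, name));
    printf("sizeof(struct padded_example)   = %zu\n",
           sizeof(struct padded_example));
    printf("offsetof(padded_example, value) = %zu\n",
           offsetof(struct padded_example, value));
}
```

If the program that wrote the file used a different compiler, different packing options, or a different int size, these numbers can differ from the on-disk record size, and the records/size division above will give a misleading count.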

Use more protection. Test the result of functions.
FILE *p;
struct myStruct x;
p=fopen("myfile","rb");
assert(p); // Ensure file opened
while(1) {
size_t n = fread(&x, sizeof(x), 1, p);
// feof() is insufficient,
// fread() can fail due to input errors too and not set end-of-file condition
// if(feof(p)!=0)
if (n == 0) {
break;
}
// printf("\n\nID:%ld",x.ID);
printf("\n\nID:%d", x.ID); // Use matching specifier
fflush(stdout); // Ensure output occurs promptly
}
fclose(p);
return 0;
Since OP's code had a mismatched printf specifier, either warnings are not fully enabled or OP is using a weak compiler. I suggest fixing that to save time.
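As a sketch of the same idea packaged as a function, the loop can be driven by fread's return value, with ferror consulted afterwards to tell a read error from end of file. count_records is a hypothetical helper name, and it assumes the file was written with the same struct layout.

```c
#include <stdio.h>

/* The struct from the question. */
struct myStruct {
    int ID;
    char name[100];
};

/* Read records until fread() stops delivering whole records.
   Returns the number of complete records, or -1 on a read error. */
long count_records(FILE *fp) {
    struct myStruct rec;
    long count = 0;

    while (fread(&rec, sizeof rec, 1, fp) == 1) {
        printf("ID:%d\n", rec.ID);   /* %d matches int */
        count++;
    }
    /* fread() returned 0: find out why. */
    if (ferror(fp)) {
        perror("fread");
        return -1;
    }
    return count;   /* end of file; a trailing partial record is ignored */
}
```

A caller would open the file with fopen("myfile", "rb"), check the result, pass the stream to count_records, and then fclose it.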

Related

How to read from a file and parse it

I have a file .txt containing some values formatted like this:
0,30,25,10
Now, I open up the file and store it into an array
char imposta_tratt[300];
FILE *fp;
fp = fopen("/home/pi/Documents/imposta_trattamento.txt", "r");
if (fp == 0) return;
fread(imposta_tratt, sizeof(imposta_tratt), 1, fp);
fclose(fp);
Now I expect to have the array filled with my data. I have the values separated by a , so I go on and parse it:
const char delim[2] = ",";
int t=0;
char *token = strtok(imposta_tratt, delim);
while (token!=NULL){
strcpy(tratt[t],token);
token = strtok(NULL, delim);
tratt[t]=token;
t++;
}
Here, referring to what's in the file .txt, I expect to have tratt[0]=0; tratt[1]=30; tratt[2]=25; and so on, but seems like I am missing something since it's not like this.
All I want is to have the values of the txt file stored in single variables. Can someone help?
What you are trying to achieve can simply be done using fgets():
bool read_file_content(const char *filename, const size_t tsizemax, int tratt[tsizemax], size_t *tsize, const char *delim)
{
// Attempt to open filename.
FILE *fp = fopen(filename, "r");
if (!fp) return false; // Return false upon failure.
// Try to read one line. If you have more, you need a while loop.
char imposta_tratt[300];
if (!fgets(imposta_tratt, sizeof imposta_tratt, fp)) {
fclose(fp);
return false;
}
*tsize = 0;
char tmp[300]; // Temporary buffer. Used for conversion into int.
char *token = strtok(imposta_tratt, delim);
while (token && *tsize < tsizemax) {
strncpy(tmp, token, sizeof tmp);
tratt[(*tsize)++] = atoi(tmp);
token = strtok(NULL, delim);
}
fclose(fp);
return true;
}
const char *filename: The file you want to parse.
const size_t tsizemax: The maximum size of your tratt array. It is important to control the size, otherwise your code will have buffer overflow (think of when your file has more than 100 tokens, for example).
int tratt[tsizemax]: The array that will hold the values.
size_t *tsize: The number of tokens read (used in combination with tsizemax).
const char *delim: The delimiter(s), in your case a ,.
This is your main():
int main(void)
{
int tratt[100];
size_t size = 0;
if (!read_file_content("in.txt", 100, tratt, &size, ",")) {
puts("Failed");
return 1;
}
for (size_t i = 0; i < size; ++i)
printf("%d\n", tratt[i]);
}
Output:
0
30
25
10
Suppose "in.txt" has contents
0,30,25,10
The program below uses fscanf to read the integers into the tratt array one by one. After each fscanf call, we check that its return value is as expected. If it is not, the program prints which type of error occurred, closes the file, and exits. Currently the program stops on any error, but you can make it behave differently depending on the error that occurred if you like.
As output, the program prints all of the integers read into the tratt array. The output is
0
30
25
10
Now this program assumes we know the number of elements we want to read into tratt. If we do not, we could allow for dynamically allocating more memory should the array need more elements or perhaps "in.txt" could contain a data structure, say, at the beginning/end of the file that records information about the file, such as the number of numbers in the file and the data type (a binary file would be best suited for this). These are just a couple of the possibilities.
A better approach might be to read characters in one-by-one (say, using getc) and use strtol to convert a sequence of character digits to a long int (I would have taken an approach similar to this).
Nevertheless, this approach is more succinct and should suffice.
#include <stdio.h>
#include <stdlib.h>
#define FILE_NAME "in.txt"
#define MAX_LEN 4
int main(void) {
int i, tratt[MAX_LEN];
FILE *fp = fopen(FILE_NAME, "r"); /* open file for reading */
/* if cannot open file */
if (fp == NULL) {
printf("Cannot open %s\n", FILE_NAME);
exit(EXIT_FAILURE);
}
/* read integer, checking return value of scanf as expected */
if (fscanf(fp, "%d", &tratt[0]) != 1) {
if (ferror(fp))
printf("fscanf: read error\n");
else if (feof(fp))
printf("fscanf: end of file\n");
else
printf("fscanf: matching failure\n");
fclose(fp);
exit(EXIT_FAILURE);
}
for (i = 1; i < MAX_LEN; i++)
/* read comma plus integer, checking return value of scanf */
if (fscanf(fp, ",%d", &tratt[i]) != 1) {
if (ferror(fp))
printf("fscanf: read error\n");
else if (feof(fp))
printf("fscanf: end of file\n");
else
printf("fscanf: matching failure\n");
fclose(fp);
exit(EXIT_FAILURE);
}
fclose(fp); /* close file */
/* print integers stored in tratt */
for (i = 0; i < MAX_LEN; i++)
printf("%d\n", tratt[i]);
return 0;
}
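The strtol-based alternative mentioned above could look roughly like the following sketch. It assumes the line has already been read into memory (for example with fgets) rather than character by character with getc, and parse_int_list is a hypothetical helper name.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Parse a comma-separated list of integers from text into out[].
   Stops at the first field that is not a number, at a range error,
   or when max_out values have been stored. Returns the count stored. */
size_t parse_int_list(const char *text, long *out, size_t max_out) {
    size_t n = 0;
    const char *p = text;

    while (n < max_out) {
        char *end;
        errno = 0;
        long v = strtol(p, &end, 10);
        if (end == p || errno == ERANGE)
            break;                      /* no digits, or out of range */
        out[n++] = v;
        p = end;
        while (*p == ' ' || *p == '\t') /* allow blanks before the comma */
            p++;
        if (*p != ',')
            break;                      /* end of list */
        p++;                            /* skip the comma */
    }
    return n;
}
```

Unlike atoi, strtol reports where parsing stopped through its end pointer, so a field with no digits is detected instead of silently becoming 0.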

Error in making array in C whose size is decided at runtime

I have this code, which first reads a .wav file to find out the number of samples in it. Using this size, I try to make an array of that size. After that, I read the samples from the same .wav file and store them in that array, but out of 762880 samples it reads only about 7500.
#include<stdio.h>
#include<conio.h>
#include<math.h>
#include<stdlib.h>
void main(){
FILE *fp;
long n,i=0;
float *data;
FILE* inp =NULL;
FILE* oup =NULL;
float value =0;
signed short ss;
/* Open file */
fp = fopen("k1s1.wav", "rb");
fseek(fp, 0L, SEEK_END);
n = ftell(fp);
n=n/2-22;
printf("n:%ld",n);
fclose(fp);
data=malloc(n*sizeof(float));
inp = fopen("k1s1.wav","rb");
oup = fopen("cs123.txt","wt");
fseek (inp,44,SEEK_SET);// that didn't help getting the right result !!
i=0;
for(i=0;i<n;i++){
fread(&ss, sizeof(signed short) ,1 , inp);
//Now we have to convert from signed short to float:
value=((float)ss)/(unsigned)0xFFFF;
value= value* 2.0;
value= value*100000;
value= ceil(value);
value= value/100000;
fprintf(oup,"%f\n",value);
data[i]=value;
///printf("%lf\t",value);
}
fclose(inp);
fclose(oup);
printf("done");
}
When I remove the line "data[i]=value;" from the for loop, the program works fine and I can see the output in the file. I need to store these values in an array as well for further computations. What could be the error?
The code looks okay. What you could do is check the results from fseek and fread to confirm they are working as expected. As for fseek, the requirement that it be passed a result from ftell (or zero) rather than something calculated applies to text streams; for a binary stream like this one, a calculated byte offset is fine.
I saw in the comments TurboC was mentioned. If that's the case, make sure to build using "small-code & big-data". If you're building with "small-code & small data", you could have problems like you're seeing with lots of data.
I'd probably write something like this:
#include <stdlib.h>
#include <stdio.h>
#define HEADER 22
#define INCREMENT 10000
#define CONVERSION (2.0 / 65535.0)
int main() {
FILE *inp = NULL, *oup = NULL;
float *data = NULL, value;
signed short header[HEADER], ss;
long i = 0, j = 0;
if ((inp = fopen("k1s1.wav", "rb")) && (oup = fopen("cs123.txt","wt"))) {
if (fread(header, sizeof(header), 1, inp)) {
while (fread(&ss, sizeof(signed short), 1, inp)) {
value = ((float)ss) * CONVERSION;
if (j == i) {
j += INCREMENT;
if (!(data = realloc(data, j * sizeof(float)))) {
fprintf(stderr, "FAILURE: realloc failure at %ld elements\n", j);
return EXIT_FAILURE;
}
}
data[i++] = value;
fprintf(oup,"%.5f\n",value);
}
/* Release any extra memory */
j = i;
if (!(data = realloc(data, j * sizeof(float)))) {
fprintf(stderr, "FAILURE: realloc failure at %ld elements\n", j);
return EXIT_FAILURE;
}
for (i = 0; i < j; i++) {
/* Can analyze data here */
}
} else {
fprintf(stderr, "FAILURE: header not defined\n");
return EXIT_FAILURE;
}
} else {
fprintf(stderr, "FAILURE: files could not be opened\n");
return EXIT_FAILURE;
}
/* free(NULL) is a no-op; both streams are known to be open on this path */
free(data);
fclose(inp);
fclose(oup);
return EXIT_SUCCESS;
}
This solution avoids any uncertainty as to whether fseek, ftell, etc. are working by not using them, and checks for more errors, so you might be able to see what's up if it fails in your test cases. Hope this helps.
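The size arithmetic from the question (n/2 - 22) and the scaling used in both versions can be isolated into two small functions, which makes them easy to check on their own. This is a sketch that assumes a 44-byte header followed by 16-bit samples, as the fseek(inp, 44, SEEK_SET) call and the 22-element header array imply. WAV_HEADER_BYTES, sample_count and sample_to_float are hypothetical names.

```c
#include <stdint.h>

/* Assumption: 44-byte header, as implied by the code above. */
#define WAV_HEADER_BYTES 44L

/* Number of 16-bit samples in a file of file_bytes bytes.
   Equivalent to the question's n/2 - 22 for even sizes. */
long sample_count(long file_bytes) {
    if (file_bytes < WAV_HEADER_BYTES)
        return 0;
    return (file_bytes - WAV_HEADER_BYTES) / (long)sizeof(int16_t);
}

/* Same scaling as CONVERSION above: maps a 16-bit sample to
   roughly the range [-1, 1]. */
float sample_to_float(int16_t s) {
    return (float)s * (2.0f / 65535.0f);
}
```

For the 762880 samples mentioned in the question, the array needs 762880 * sizeof(float) bytes, so the result of malloc should be checked before the loop writes to data[i].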

How do you properly use the fstat() function and what are its limits?

So I am very new to C. I have done a lot of programming in Java and am finding it very difficult to learn C.
Currently I am assigned to read in a file from our terminal window, which will contain a list of integers. From this list we must read the values and calculate the average, I believe I have done this correctly.
My only problem is I do not understand how to use the fstat() correctly, I read the man page but am still having a hard time understanding. In my code below, I am wanting to use fstat() to find the size of the file being read so I can then allocate the correct amount of space for my array where I store the values read off the input file. I just need to know the proper usage and syntax of fstat() and from there I believe I can make significant progress. Thanks in advance!
char *infile;
int fileSize;
int fd;
int i;
int j;
int temp;
int sum = 0;
int average;
/* enforce proper number of arguments
*/
if (argc != 1)
{
fprintf(stderr, "Too many arguments.\n");
exit(1);
}
infile = argv[1];
//open file
fd = open(infile, O_RDONLY);
//exit if open fails
assert (fd > -1);
//find size of file
fileSize = fstat(fd, blksize_t st_blksize);
//fine perfect size for array
temp = malloc(temp * sizeof(int));
//create file of perfect size
int intArray[temp];
//scan integers into our array
for (i = 0; i < fileSize; i++)
{
fscanf(infile, "%d", intArray[i]);
}
fclose(fd);
//add all integers into the array up
for (j = 0; j < fileSize; j++);
{
sum = sum + intArray[j];
}
//calculat average
average = (sum)/fileSize;
printf("Number of numbers averaged: %d\n Average of numbers: %d\n", fileSize, average);
if ( close(fd) == -1 )
{
fprintf(stderr, "error closing file -- quitting");
exit(1);
}
return 0;
}
The library function fstat() does not return the size of the file; it returns 0 if successful. It reports the file size by filling in the struct passed as an argument.
struct stat buf;
if (fstat( fd, &buf))
printf("Bad call\n");
else
printf("File size : %ld\n", (long)buf.st_size);
But as @chux (deleted post) answered, it tells you the file size in bytes, not the number of integers. The function fscanf() reads the data as text, so there is no direct correlation between the file size and the number of fields.
So unfortunately, in answer to your titled question, using fstat() to determine the file size is of no practical use to you. Your secondary implied question is how to allocate enough memory for the array. I posted an answer to that in a different context, where the array size is unknown at the outset: C reading a text file separated by spaces with unbounded word size
But here I use a simpler technique: parse the file twice, the first time only to find out how many textual integers it contains. The program then allocates memory for the array and rewinds the file. In this example, though, the array isn't necessary to calculate the sum and average of the values, and the double file parse isn't necessary either, unless you plan to do more with the values.
#include <stdio.h>
#include <stdlib.h>
void fatal(char *msg) {
printf("%s\n", msg);
exit (1);
}
int main(int argc, char *argv[])
{
FILE *fil;
int *array;
int items = 0;
int sum = 0;
int avg;
int value;
int i;
if (argc < 2) // check args
fatal ("No file name supplied");
if ((fil = fopen (argv[1], "rt")) == NULL) // open file
fatal ("Cannot open file");
while (fscanf(fil, "%d", &value) == 1) // count ints
items++;
printf ("Found %d items\n", items);
if (items == 0)
fatal ("No integers found");
if ((array = malloc(items * sizeof (int))) == NULL) // allocate array
fatal ("Cannot allocate memory");
if (fseek (fil, 0, SEEK_SET)) // rewind file
fatal ("Cannot rewind file");
for (i=0; i<items; i++) {
if (fscanf(fil, "%d", &value) != 1) // check int read
fatal ("Cannot read integer");
array[i] = value;
sum += value;
}
fclose(fil);
printf ("Sum = %d\n", sum);
printf ("Avg = %d\n", (sum+items/2) / items); // allow rounding
free(array);
return 0;
}
Input file:
1 2 3
4 5
6
-1 -2
Program output:
Found 8 items
Sum = 18
Avg = 2
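For comparison, the single-pass approach for an array whose size is unknown at the outset can be sketched by growing the array with realloc as values arrive. This is an illustrative alternative, not the code from the linked answer, and read_all_ints is a hypothetical helper name.

```c
#include <stdio.h>
#include <stdlib.h>

/* Read every integer fscanf can parse from fp into a growing array.
   On success stores the number of values in *count and returns the
   array (caller frees). Returns NULL with *count == 0 if allocation
   fails or if no integers were found. */
int *read_all_ints(FILE *fp, size_t *count) {
    int *items = NULL;
    size_t used = 0, cap = 0;
    int value;

    while (fscanf(fp, "%d", &value) == 1) {
        if (used == cap) {
            /* Double the capacity so the number of reallocs stays small. */
            size_t new_cap = cap ? cap * 2 : 16;
            int *grown = realloc(items, new_cap * sizeof *grown);
            if (!grown) {
                free(items);
                *count = 0;
                return NULL;
            }
            items = grown;
            cap = new_cap;
        }
        items[used++] = value;
    }
    *count = used;
    return items;
}
```

This reads the file once and does not need fseek, so it also works on input that cannot be rewound, such as a pipe.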
You claim to have read the manpage for fstat(), which seems at odds with: fileSize = fstat(fd, blksize_t st_blksize);
You need to declare a struct stat in the function scope, and pass a pointer to it to fstat():
struct stat finfo;
fstat(fd, &finfo);
Then you can read the file size from the struct stat:
off_t filesize = finfo.st_size;
I'd also recommend using size_t instead of int for everything to do with object sizes.
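Put together, the calls above can be wrapped as in the following sketch for a POSIX system. file_size_bytes is a hypothetical helper name, and returning -1 on failure is a choice made for this example.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Return the size of the file at path in bytes, or -1 on failure. */
long long file_size_bytes(const char *path) {
    int fd = open(path, O_RDONLY);
    if (fd == -1) {
        perror("open");
        return -1;
    }

    struct stat finfo;          /* filled in by fstat() */
    long long size = -1;
    if (fstat(fd, &finfo) == 0)
        size = (long long)finfo.st_size;
    else
        perror("fstat");

    close(fd);
    return size;
}
```

The value is a byte count, so for a text file of integers it is only an upper bound on how many numbers the file can hold, not the number itself.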

C, Segmentation fault parsing large csv file

I wrote a simple program that would open a csv file, read it, make a new csv file, and only write some of the columns (I don't want all of the columns and am hoping removing some will make the file more manageable). The file is 1.15GB, but fopen() doesn't have a problem with it. The segmentation fault happens in my while loop shortly after the first progress printf().
I tested on just the first few lines of the csv and the logic below does what I want. The strange section for when index == 0 is due to the last column being in the form (xxx, yyy)\n (the , in a comma separated value file is just ridiculous).
Here is the code, the while loop is the problem:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(int argc, char** argv) {
long size;
FILE* inF = fopen("allCrimes.csv", "rb");
if (!inF) {
puts("fopen() error");
return 0;
}
fseek(inF, 0, SEEK_END);
size = ftell(inF);
rewind(inF);
printf("In file size = %ld bytes.\n", size);
char* buf = malloc((size+1)*sizeof(char));
if (fread(buf, 1, size, inF) != size) {
puts("fread() error");
return 0;
}
fclose(inF);
buf[size] = '\0';
FILE *outF = fopen("lessColumns.csv", "w");
if (!outF) {
puts("fopen() error");
return 0;
}
int index = 0;
char* currComma = strchr(buf, ',');
fwrite(buf, 1, (int)(currComma-buf), outF);
int progress = 0;
while (currComma != NULL) {
index++;
index = (index%14 == 0) ? 0 : index;
progress++;
if (progress%1000 == 0) printf("%d\n", progress/1000);
int start = (int)(currComma-buf);
currComma = strchr(currComma+1, ',');
if (!currComma) break;
if ((index >= 3 && index <= 10) || index == 13) continue;
int end = (int)(currComma-buf);
int endMinusStart = end-start;
char* newEntry = malloc((endMinusStart+1)*sizeof(char));
strncpy(newEntry, buf+start, endMinusStart);
newEntry[end+1] = '\0';
if (index == 0) {
char* findNewLine = strchr(newEntry, '\n');
int newLinePos = (int)(findNewLine-newEntry);
char* modifiedNewEntry = malloc((strlen(newEntry)-newLinePos+1)*sizeof(char));
strcpy(modifiedNewEntry, newEntry+newLinePos);
fwrite(modifiedNewEntry, 1, strlen(modifiedNewEntry), outF);
}
else fwrite(newEntry, 1, end-start, outF);
}
fclose(outF);
return 0;
}
Edit: It turned out the problem was that the csv file had , in places I was not expecting which caused the logic to fail. I ended up writing a new parser that removes lines with the incorrect number of commas. It removed 243,875 lines (about 4% of the file). I'll post that code instead as it at least reflects some of the comments about free():
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(int argc, char** argv) {
long size;
FILE* inF = fopen("allCrimes.csv", "rb");
if (!inF) {
puts("fopen() error");
return 0;
}
fseek(inF, 0, SEEK_END);
size = ftell(inF);
rewind(inF);
printf("In file size = %ld bytes.\n", size);
char* buf = malloc((size+1)*sizeof(char));
if (fread(buf, 1, size, inF) != size) {
puts("fread() error");
return 0;
}
fclose(inF);
buf[size] = '\0';
FILE *outF = fopen("uniformCommaCount.csv", "w");
if (!outF) {
puts("fopen() error");
return 0;
}
int numOmitted = 0;
int start = 0;
while (1) {
char* currNewLine = strchr(buf+start, '\n');
if (!currNewLine) {
puts("Done");
break;
}
int end = (int)(currNewLine-buf);
char* entry = malloc((end-start+2)*sizeof(char));
strncpy(entry, buf+start, end-start+1);
entry[end-start+1] = '\0';
int commaCount = 0;
char* commaPointer = entry;
for (; *commaPointer; commaPointer++) if (*commaPointer == ',') commaCount++;
if (commaCount == 14) fwrite(entry, 1, end-start+1, outF);
else numOmitted++;
free(entry);
start = end+1;
}
fclose(outF);
printf("Omitted %d lines\n", numOmitted);
return 0;
}
You're malloc'ing but never freeing. Possibly you run out of memory, one of your mallocs returns NULL, and the subsequent call to str(n)cpy segfaults.
Adding free(newEntry); and free(modifiedNewEntry); immediately after the respective fwrite calls should solve your memory shortage.
Also note that inside your loop you compute offsets into the buffer buf, which contains the whole file. These offsets are held in variables of type int, whose maximum value on your system may be too small for the numbers you are handling. Adding large ints may also result in a negative value, which is another possible cause of the segfault (negative offsets into buf take you to some address outside the buffer, possibly not even readable).
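One more thing worth checking in the question's loop is newEntry[end+1] = '\0';. The index end is an offset into the whole file, while newEntry holds only endMinusStart+1 bytes, so that write lands outside the allocation. A minimal sketch of a bounds-correct copy follows; copy_span is a hypothetical helper name, and it uses size_t offsets to sidestep the int range concern as well.

```c
#include <stdlib.h>
#include <string.h>

/* Copy buf[start..end) into a new NUL-terminated string.
   Returns NULL if the range is invalid or malloc fails.
   The caller must free() the result. */
char *copy_span(const char *buf, size_t start, size_t end) {
    if (end < start)
        return NULL;

    size_t len = end - start;
    char *out = malloc(len + 1);
    if (!out)
        return NULL;

    memcpy(out, buf + start, len);
    out[len] = '\0';    /* index is relative to out, not to buf */
    return out;
}
```

In the loop, each field would then be obtained with copy_span(buf, start, end), written with fwrite, and released with free before the next iteration.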
The malloc(3) function can (and sometimes does) fail.
At least code something like
char* buf = malloc(size+1);
if (!buf) {
fprintf(stderr, "failed to malloc %ld bytes - %s\n",
size+1, strerror(errno));
exit (EXIT_FAILURE);
}
I also strongly suggest clearing the successful result of a malloc with memset(buf, 0, size+1) (or using calloc instead), not only because the following fread could fail (which you are testing) but also to ease debugging and reproducibility.
Likewise, test every other call to malloc or calloc against failure.
Notice that by definition sizeof(char) is always 1. Hence I removed it.
As others pointed out, you have a memory leak because you don't call free appropriately. A tool like valgrind could help.
You need to learn how to use the debugger (e.g. gdb). Don't forget to compile with all warnings and debugging information (e.g. gcc -Wall -g). And improve your code till you get no warnings.
Knowing how to use a debugger is an essential required skill when programming (particularly in C or C++). That debugging skill (and ability to use the debugger) will be useful in every C or C++ program you contribute to.
BTW, you could read your file line by line with getline(3) (which can also fail and you should test that).
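A line-by-line version of the comma-count filter could be sketched as follows, assuming a POSIX system where getline is available. filter_lines and count_commas are hypothetical names; the caller passes the expected comma count (14 in the second program above).

```c
#define _POSIX_C_SOURCE 200809L  /* for getline() */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Count the commas in a NUL-terminated string. */
static int count_commas(const char *s) {
    int n = 0;
    for (const char *p = strchr(s, ','); p; p = strchr(p + 1, ','))
        n++;
    return n;
}

/* Copy to out every line of in that has exactly wanted_commas commas.
   Returns the number of lines omitted, or -1 on a write error. */
long filter_lines(FILE *in, FILE *out, int wanted_commas) {
    char *line = NULL;   /* getline() allocates and grows this buffer */
    size_t cap = 0;
    ssize_t len;
    long omitted = 0;

    while ((len = getline(&line, &cap, in)) != -1) {
        if (count_commas(line) == wanted_commas) {
            if (fwrite(line, 1, (size_t)len, out) != (size_t)len) {
                omitted = -1;
                break;
            }
        } else {
            omitted++;
        }
    }
    free(line);          /* one buffer, freed once */
    return omitted;
}
```

Only one line is held in memory at a time, so the 1.15GB file never has to fit in a single malloc'd buffer.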

Unable to read file using fread in C

I am trying to read a file "file.raw" 4 bytes at a time into an array and check whether it has the particular 4-byte signature I am looking for. I am having trouble with this. The value of result I get is 0 instead of 4 when using fread.
#include<stdint.h>
#include<stdio.h>
#include<stdlib.h>
#include<string.h>
typedef uint8_t BYTE;
int main(void)
{
size_t result;
FILE *inptr = fopen("file.raw","r+");
//Check if file can be opened.
if (inptr == NULL)
{
printf("File Open Error\n");
return -1;
}
long int x = 0;
while(!feof(inptr))
{
// Make space for reading in to an array
BYTE *array = (BYTE *) malloc(10);
if(array == NULL)
{
printf("Array Initialization Error\n");
return -1;
}
result = fread(array,1,4,inptr);
//Exit if file not read. ** This is where I can't get past.
if(result != 4)
{
printf("File Read Error\n");
printf("%d\n",result);
free(array);
fclose(inptr);
return -1;
}
//Compare strings
if(memcmp(array,"0xffd8ffe0",4)==0)
{
printf("File Start found\n");
printf("Exiting...\n");
printf("%p\n",inptr);
free(array);
fclose(inptr);
return 0;
}
x++;
free(array);
}
printf("%p\n",inptr);
printf("%ld\n",x);
fclose(inptr);
return 0;
}
My guess is that it doesn't fail on the first iteration of the while loop, but rather keeps reading the file until you reach end of the file, at which point fread() returns 0 and your program exits.
The reason it's not finding the signature is this:
memcmp(array,"0xffd8ffe0",4)==0
That memcmp() call is almost certainly not what you want (it's looking for the sequence of ASCII characters '0', 'x', 'f' and 'f').
PS As noted by @Mat in the comments, for maximum portability you should open the file in binary mode ("r+b" instead of "r+").
Try opening the file in binary mode ("r+b") instead of text mode ("r+"). You're probably being undone by unintentional CRLF conversions, messing up your binary data.
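To compare against the actual bytes 0xff 0xd8 0xff 0xe0 rather than the ASCII text "0xff", the signature can be stored as a byte array. The following is a minimal sketch; SIGNATURE, is_signature and find_signature_block are hypothetical names, and the scan steps in 4-byte blocks as the question does.

```c
#include <stdio.h>
#include <string.h>

/* The four raw bytes the question is looking for. */
static const unsigned char SIGNATURE[4] = { 0xff, 0xd8, 0xff, 0xe0 };

/* Return 1 if the first four bytes of block equal the signature. */
int is_signature(const unsigned char *block) {
    return memcmp(block, SIGNATURE, sizeof SIGNATURE) == 0;
}

/* Scan fp in 4-byte blocks. Returns the zero-based index of the first
   block that matches, or -1 if none is found before the data runs out. */
long find_signature_block(FILE *fp) {
    unsigned char block[4];
    long index = 0;

    while (fread(block, 1, sizeof block, fp) == sizeof block) {
        if (is_signature(block))
            return index;
        index++;
    }
    return -1;
}
```

The stream passed in should be opened in binary mode ("rb"), and the loop is driven by fread's return value instead of feof, so a short final read simply ends the scan.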

Resources