fwrite fails to write big data

I find that fwrite fails when I try to write a somewhat large amount of data, as in the following code.
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char* argv[])
{
int size = atoi(argv[1]);
printf("%d\n", size);
FILE* fp = fopen("test", "wb");
char* c = "";
int i = fwrite(c, size, 1, fp);
fclose(fp);
printf("%d\n", i);
return 0;
}
The code is compiled into a binary named tw.
When I try ./tw 10000 it works well. But when I try something like ./tw 12000 it fails (fwrite() returns 0 instead of 1).
What is the reason for that? How can I avoid it?
EDIT: When I do fwrite(c, 1, size, fp) it returns 8192 instead of the larger size I give.
2nd EDIT: When I write a loop that runs size times and calls fwrite(c, 1, 1, fp) each time, it works perfectly.
It seems that when size is too large (as in the first EDIT) it only writes about 8192 bytes.
I guess something limits fwrite to a fixed number of bytes at a time.
3rd EDIT: The above is not clear.
The following fails (space - w_result != 0) when space is large, where space is a value I choose and w_result is the total number of objects written.
w_result = 0;
char* empty = malloc(BLOCKSIZE * sizeof(char));
w_result = fwrite(empty, BLOCKSIZE, space, fp);
printf("%d lost\n", space - w_result);
While this works OK.
w_result = 0;
char* empty = malloc(BLOCKSIZE * sizeof(char));
for(i = 0; i < space; i ++)
w_result += fwrite(empty, BLOCKSIZE, 1, fp);
printf("%d lost\n", space - w_result);
(Every variable has been declared.)
I corrected some errors the answers mentioned. But the first one should work according to you.

With fwrite(c, size, 1, fp); you tell fwrite to write 1 item that is size bytes big, taken from the buffer c.
c is just a pointer to an empty string. It has a size of 1. When you tell fwrite to go looking for more than 1 byte of data in c, you get undefined behavior. You cannot fwrite more than 1 byte from c.
(Undefined behavior means anything could happen: it could appear to work fine when you try with a size of 10000 and not with a size of 12000. The implementation-dependent reason for that is likely that there happens to be some readable memory starting at c and extending 10000 bytes forward, but at e.g. 11000 there is no memory and the read fails or you get a segfault.)

You are reading memory that doesn't belong to your program (and writing it to a file).
Test your program using valgrind to see the errors.

From that snippet of code, it looks like you're trying to write what's at c, which is just a single NUL byte, to the file pointer, and you're doing so "size" times. The fact that it doesn't crash with 10000 is coincidental. What are you trying to do?

As has been stated by others the code is performing an invalid memory read via c.
A possible solution would be to dynamically allocate a buffer that is size bytes in size, initialise it, and fwrite() it to the file, remembering to deallocate the buffer afterwards.
Remember to check return values from functions (fopen() for example).
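The suggestion above could be sketched roughly as follows. This is only an illustration, not the asker's code: the helper name write_zeros and the choice of zero bytes as the payload are assumptions made for the example.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: writes `size` zero bytes to `path`.
 * Returns 0 on success, -1 on any failure. */
int write_zeros(const char *path, size_t size)
{
    assert(path != NULL);

    FILE *fp = fopen(path, "wb");
    if (fp == NULL)
        return -1;

    /* calloc gives a zero-initialised buffer that really is `size` bytes
     * long, unlike the 1-byte string literal "". Ask for at least 1 byte
     * so that a NULL return always means failure. */
    char *buf = calloc(size ? size : 1, 1);
    if (buf == NULL) {
        fclose(fp);
        return -1;
    }

    /* Element size 1, so the return value is the number of bytes written. */
    size_t written = fwrite(buf, 1, size, fp);
    free(buf);

    /* fclose flushes buffered data, so its result matters too. */
    if (fclose(fp) != 0 || written != size)
        return -1;
    return 0;
}
```

Calling write_zeros("test", 12000) from main would then replace the fwrite on the string literal, and a non-zero result tells you that something went wrong.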

Related

How does fread deal with trailing garbage when reaching the end of the file?

I recently started dabbling in C again, a language I'm not particularly proficient at and, in fact, keep forgetting (I mostly code in Python). My idea here is to read data from a hypothetically large file in chunks and then process the data accordingly. For now, I'm simulating this by actually loading the whole file into a buffer of type short with fread. This method will be changed, since it would be a very bad idea for, say, a file that's 1 GB, I'd think. The end goal is to read one chunk, process it, move the cursor, read another chunk and so on.
The file in question is 43 bytes and has the phrase "The quick brown fox jumps over the lazy dog". This size is convenient because it's a prime number, so no matter how many bytes I split it into, there will always be trailing garbage (due to the buffer having leftover space?). Data processing in this case is just printing out the shorts as two chars after byte manipulation (see code below)
#include <stdio.h>
#include <stdlib.h>
#define MAX_BUFF_SIZE 1024
long file_size(FILE *f)
{
if (fseek(f, 0, SEEK_END) != 0) exit(EXIT_FAILURE); // Move cursor to the end
long file_size = ftell(f); // Determine position to get file size
rewind(f);
return file_size;
}
int main(int argc, char* argv[])
{
short buff[MAX_BUFF_SIZE] = {0}; // Initialize to 0 remove trailing garbage
char* filename = argv[1];
FILE* fp = fopen(filename, "r");
if (fp)
{
size_t size = sizeof(buff[0]); // Size in bytes of each chunk. Fixed to 2 bytes
int nmemb = (file_size(fp) + size - 1) / size; // Number of chunks to read from file
// (ceil f_size/size)
printf("Should read at most %d chunks\n", nmemb);
short mask = 0xFF; // Mask to take first or second byte
size_t num_read = fread(buff, size, nmemb, fp);
printf("Read %zu chunks\n\n", num_read); // size_t needs %zu. Seems to have read more? Look into.
for (int i=0; i<nmemb; i++) {
char first_byte = buff[i] & mask;
char second_byte = (buff[i] >> 8) & mask; // Identity for 2 bytes. Keep mask for consistency
printf("Chunk %02d: 0x%04x | %c %c\n", // Remember little endian (bytes reversed)
i, buff[i], first_byte, second_byte);
}
fclose(fp);
} else
{
printf("File %s not found\n", filename);
return 1;
}
return 0;
}
Now yesterday, on printing out the last chunk of data I was getting "Chunk 21: 0xffff9567 | g". The last (first?) byte (0x67) is g, and I did expect some trailing garbage, but I don't understand why it was printing out so many bytes when the variable buff has shorts in it. At that point I was just printing the hex as %x, not %04x, and buff was not initialized to 0. Today, I decided to initialize it to 0 and not only did the garbage disappear, but I can't recreate the problem even after leaving buff uninitialized again.
So here are my questions that hopefully aren't too abstract:
Does fread look beyond the file when reading data and does it remove trailing garbage itself, or is it up to us?
Why was printf showing an int when the buffer is a short? (I assume %x is for ints) and why can't I replicate the behaviour even after leaving buff without initialization?
Should I always initialize the buffer to zero to remove trailing garbage? What's the usual approach here?
I hope these aren't too many, or too vague, questions, and that I was clear enough. Like I said, I don't know much about C but find low-mid level programming very interesting, especially when it comes to direct data bit/byte manipulation.
Hope you have a great day!
EDIT 1:
Some of you wisely suggested I use num_read instead of nmemb in the loop, since that's the return value of fread, but that means I'll discard the rest of the file (nmemb is 22 but num_read is 21). Is that the usual approach? Also, thank you for pointing out that %x was casting to unsigned int, hence the 4 bytes instead of 2.
EDIT 2:
For clarification, and since I misspoke in a comment, I'd like to keep the remaining byte (or data), while discarding the rest, which is undefined. I don't know if this is the usual approach since if I use num_read in the loop, whatever is left over at the end is discarded, data or not. I'm more interested in knowing what the usual approach is: discard leftover data or remove anything that we know is undefined, in this case one of the bytes.
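As a small illustration of the %x point from EDIT 1: a negative short is promoted to int when passed to printf, and the sign-extended upper bits then show up in the output. The sketch below prints only the short's own bits. The helper name format_short_hex is made up for this example, and it assumes the usual 16-bit short.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: formats a short as exactly its own bit pattern in hex.
 * Converting to unsigned short first discards the sign-extended upper bits
 * that appear when a negative short is promoted to int for printf. */
int format_short_hex(short value, char *out, size_t out_size)
{
    assert(out != NULL && out_size > 0);
    unsigned short bits = (unsigned short)value;
    /* %hx tells printf to treat the (promoted) argument as unsigned short. */
    return snprintf(out, out_size, "%04hx", bits);
}
```

With a 16-bit short, the value whose bit pattern is 0x9567 is negative, which is the kind of value that printed as 0xffff9567 with a plain %x.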

Gradual memory allocation strategy

For didactic purposes, I am working on a program that reads a string (array of chars) from standard input. The goal is to allow the program to sequentially increase the memory allocated according to the dimension of the input. I would like your opinion on my approach.
I thought I could allocate one byte of space one by one, for every reading cycle needed. Clearly, it does not work. How could I approach this problem? Is it even worth trying?
Thank you for your patience and support!
#include<stdio.h>
#include<stdlib.h>
#include<ctype.h>
int main(){
char *q;
int flag = 1, j = 0;
printf("\n\nNow let's go for a word of undefined lenght. Type it:\n\n");
do{
q++ = calloc(1, sizeof(char));
flag = ((q[j] = getchar()) != 0); #Until it is valid.
++j;
}while(flag);
return 0;
}

Reading the entire input at once is typically wrong.
The standard approach is to get a buffer of a few kilobytes and read and process the data in chunks of that size, overwriting previously read data that is no longer needed.
In the rare cases where you need to have the entire thing in RAM and are reading from a regular file, you can fstat the file to get its size and allocate accordingly. If the file is big (megabytes in size), you should mmap it.
Finally, in the extremely rare case where you need to read everything and can't know the size in advance, the way is to realloc, doubling the size each time, i.e. size *= 2; new = realloc(p, size); p = new; ....
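A minimal sketch of the doubling strategy from the last point, for illustration only. The helper name read_all and the starting capacity of 16 bytes are assumptions chosen for the example. Overflow of the capacity is not handled.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads everything from `in` into a heap buffer that
 * doubles in capacity whenever it fills up. On success returns the buffer
 * (the caller frees it) and stores the byte count in *len. Returns NULL on
 * allocation failure or read error. */
char *read_all(FILE *in, size_t *len)
{
    assert(in != NULL && len != NULL);

    size_t cap = 16;               /* deliberately small starting capacity */
    size_t used = 0;
    char *p = malloc(cap);
    if (p == NULL)
        return NULL;

    for (;;) {
        used += fread(p + used, 1, cap - used, in);
        if (used < cap)            /* short read: end of file or error */
            break;

        size_t new_cap = cap * 2;  /* buffer is full: double it */
        char *grown = realloc(p, new_cap);
        if (grown == NULL) {       /* on failure the old block is still valid */
            free(p);
            return NULL;
        }
        p = grown;
        cap = new_cap;
    }

    if (ferror(in)) {
        free(p);
        return NULL;
    }
    *len = used;
    return p;
}
```

Doubling keeps the number of realloc calls logarithmic in the input size, whereas growing one byte at a time, as in the question, would reallocate on every character. Note that the result is not NUL-terminated.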

Are element size and count exchangable in an fread call? [duplicate]

Let's say I have a file with a size of 5000 bytes, which I am trying to read from.
I have this code:
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
    const char *file_path = "/path/to/my/file";
    FILE *fp = fopen(file_path, "rb"); // open the file
    if (fp == NULL) return 1;
    fseek(fp, 0, SEEK_END); // seek to end of file
    long fullsize = ftell(fp); // get the file size (5000 for this example)
    fseek(fp, 0, SEEK_SET); // bring the stream back to the beginning
    char *buf = malloc(5000);
    if (buf == NULL) { fclose(fp); return 1; }
    fread(buf, 5000, 1, fp);
    free(buf);
    fclose(fp);
    return 0;
}
I can also replace the fread call with
fread(buf,1000,5,fp);
What is better? And why?
I am asking in terms of optimization; I understand that the return value is different.

If you exchange those two arguments, you still request to read the same number of bytes. However, the behaviour differs in two other respects: what happens if the file is shorter than that amount, and the return value.
Since you should always be checking the return value of fread, this is important :)
If you use the form result = fread(buf, 1, 5000, fp);, i.e. read 5000 units of size 1, but the file size is only 3000, then what will happen is that those 3000 bytes are placed in your buffer, and 3000 is returned.
In other words you can detect a partial read and still use the partial result.
However if you use result = fread(buf, 5000, 1, fp);, i.e. read 1 unit of size 5000, then the contents of the buffer are indeterminate (i.e. the same status as an uninitialized variable), and the return value is 0.
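If a read error occurs, the file position indicator is left in an indeterminate state, i.e. you will need to fseek before doing any further reads. A read that is short only because the end of the file was reached simply leaves the position at the end of the file.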
Using the latter form (i.e. any size other than 1) is probably best used for when you either want to abort if the full size is not available, or if you're reading a file with fixed-size records.
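The two call forms can be compared side by side with a sketch like the one below. The helper names read_bytes and read_record are invented for the example and are not part of any library.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: reads up to `cap` bytes with element size 1, so the
 * return value is the number of bytes actually placed in `buf`. */
size_t read_bytes(FILE *fp, void *buf, size_t cap)
{
    assert(fp != NULL && buf != NULL);
    return fread(buf, 1, cap, fp);
}

/* Hypothetical helper: reads one record of exactly `record_size` bytes.
 * Returns 1 only if the whole record was available. When it returns 0,
 * the buffer contents should not be used. */
int read_record(FILE *fp, void *buf, size_t record_size)
{
    assert(fp != NULL && buf != NULL);
    return fread(buf, record_size, 1, fp) == 1;
}
```

On a 3000-byte file, read_bytes with a capacity of 5000 reports 3000 usable bytes, while read_record asking for 5000 bytes reports failure.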

I've always found it best to use 1 for the element size. If fread() can't read a complete element at the end of the file, it will skip the last, partial element. This is not desirable when the last element is short. On the other hand, using 1 for element size does no harm.
Sample code that prints itself and demonstrates this behavior:
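#include <stdio.h>
#define SIZE 100
#define N 1
int main(void)
{
    FILE *fin;
    size_t ct;
    char buf[SIZE * N + 1];
    fin = fopen("size_n.c", "r"); /* this source file itself */
    if (fin == NULL)
        return 1;
    while (1) {
        /* Returns the number of complete SIZE-byte elements read. */
        ct = fread(buf, SIZE, N, fin);
        if (!ct)
            break; /* a final partial element, if any, is never printed */
        buf[ct * SIZE] = '\0';
        fputs(buf, stdout);
    }
    fclose(fin);
    return 0;
}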

Loss of data when copying a file

int32_t a[MAX];
int main()
{
FILE *f = fopen("New.doc","rb");
FILE *g = fopen("temp.doc","wb");
if (f == NULL || g == NULL)
return 0;
int n;
while ((n = fread(a, 4, MAX, f)) > 0)
fwrite(a, 1, n, g);
fclose(f);
fclose(g);
return 0;
}
Size on disk of "new.doc" in my computer is 20kb, when I ran the code above, I got "temp.doc" with 5kb in size => loss of data. However, when I changed 4 into 1, I got "temp.doc" which is exactly similar to "new.doc". Can anyone explain what happened? Thank you.

size_t fread(void *ptr, size_t size, size_t nmemb, FILE *stream);
On success, fread() and fwrite() return the number of items read or written. This number equals the number of bytes transferred only when size is 1.
In your case
n = fread(a, 4, MAX, f)
would return the number of 4-byte items read, i.e. the file size divided by 4. When you then write n items of size 1, you actually write only 1/4 of the file. You can fix this by giving the fread call a size of 1 instead of 4.

This is happening because you did not pay attention to the semantics of fread() and fwrite(). fread() returns the number of items read, not the number of bytes, and fwrite() writes exactly that many items of the size you give it. In your case, you are reading 4-byte elements and writing the same number of 1-byte elements, so of course you are missing 75% of your file. Use a matching element size in the calls to fread() and fwrite() and all will be well.
Note that if your files do not actually contain 4-byte elements (e.g. they are text files), you had better use an element size of 1 (and I suggest changing a to be of type char to be less confusing, and using sizeof(a[0]) as the element size in the function calls).
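For illustration, a copy loop with matching element sizes might look like the sketch below. The helper name copy_stream and the 4096-byte buffer are arbitrary choices for the example.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: copies everything from `in` to `out`.
 * Returns the number of bytes copied, or -1 on a read or write error. */
long copy_stream(FILE *in, FILE *out)
{
    assert(in != NULL && out != NULL);

    unsigned char buf[4096];
    long total = 0;
    size_t n;

    /* Element size 1 in both calls: n is a byte count on both sides. */
    while ((n = fread(buf, 1, sizeof buf, in)) > 0) {
        if (fwrite(buf, 1, n, out) != n)
            return -1;
        total += (long)n;
    }
    return ferror(in) ? -1 : total;
}
```

Because both calls use an element size of 1, a file whose length is not a multiple of 4 is also copied completely, including its last few bytes.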

Segfault in fread()

I'm trying to read a BMP image (greyscales) with C, save values into an array, and convert this array to a string with values separated with a comma.
My program worked well under Windows 7 64-bit, but I had to move to Windows XP 32-bit because of library compatibility problems.
I have 1,750 images to read, and I want to store all of them in a single string.
When I launch my program it goes fine until the 509th image, then I get a Segmentation Fault caused by fread(). Here's my code:
int i=0,j,k,num,len,length,l;
unsigned char *Buffer;
FILE *fp;
char *string,*finalstring;
char *query;
char tmp2[5],tmp[3];
query = (char *)malloc(sizeof(char)*200000000);
string = (char *)malloc(sizeof(char)*101376);
Buffer = (unsigned char *)malloc(sizeof(unsigned char)*26368);
BITMAPFILEHEADER bMapFileHeader;
BITMAPINFOHEADER bMapInfoHeader;
length = 0;
for (k =1;k<1751;k++)
{
strcpy(link,"imagepath");
//here just indexing the images from 0000 to 1750
sprintf(tmp2,"%.4d",k);
strcat(link,tmp2);
strcat(link,".bmp");
fp = fopen(link, "rb");
num = fread(&bMapFileHeader,sizeof(BITMAPFILEHEADER),1,fp);
num = fread(&bMapInfoHeader,sizeof(BITMAPINFOHEADER),1,fp);
//seek beginning of data in bitmap
fseek(fp,54,SEEK_SET);
//read in bitmap file to data
fread(Buffer,26368,1,fp);
l=0;
for(i=1024;i<26368;i++)
{
itoa(Buffer[i],tmp,10);
len = strlen(tmp);
memcpy(string+l,tmp,len);
memcpy(string+l+len,",",1);
l = l+len+1;
}
memcpy(query,"",1);
memcpy(string,"",1);
printf("%i\n",k);
}
Thanks

Make it tmp[4], for three digits and the terminating 0.
Also: where is the fclose? I suspect that you're running out of file handles.
Check whether fp != 0.
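The last two points could look something like the sketch below: the fopen result is checked, and the stream is closed on every path, so a loop over many images does not exhaust the available file handles. The helper name read_at and its parameters are invented for this example.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: reads `count` bytes starting at byte `offset` of the
 * file `path` into `buf`. Returns 0 on success, -1 on failure. The stream is
 * closed on every path, so calling this in a loop does not leak handles. */
int read_at(const char *path, long offset, unsigned char *buf, size_t count)
{
    assert(path != NULL && buf != NULL);

    FILE *fp = fopen(path, "rb");
    if (fp == NULL)                 /* missing file or no free file handles */
        return -1;

    int ok = fseek(fp, offset, SEEK_SET) == 0
          && fread(buf, 1, count, fp) == count;

    fclose(fp);
    return ok ? 0 : -1;
}
```

In the question's loop, a call such as read_at(link, 54, Buffer, 26368) would stand in for the fopen, fseek and fread sequence, with the loop skipping or aborting when it returns -1.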

Where did you get 101376 from? Each of your bytes takes up at most 5 characters as a decimal number with a comma (e.g. -127,), and 5*26368 is 131840.

Get rid of the casts in malloc calls. And #include <stdlib.h>.
What's the output of this program, in both the 64-bit and 32-bit systems you're using?
#include <stdio.h>
int main(void) {
printf("sizeof (int) is %d\n", (int)(sizeof (int)));
printf("sizeof (int*) is %d\n", (int)(sizeof (int*)));
return 0;
}

Run your program in the debugger. Set a breakpoint at the call to fread and make it conditional on k==507 (this will stop it when you expect the fread to be successful).
When the program hits the breakpoint, examine the variables and check what is about to be passed to fread. The first one or two times you hit the breakpoint, the values will be good.
Then on the 509th time, you will probably see bogus values being passed to fread. Figure out where those bogus values are coming from, possibly by setting a conditional breakpoint on the variable being set to whatever the bogus value is.
