Loss of data when copying a file

#include <stdio.h>
#include <stdint.h>

#define MAX 1024 /* some buffer length; the definition was not shown in the post */

int32_t a[MAX];

int main()
{
    FILE *f = fopen("New.doc", "rb");
    FILE *g = fopen("temp.doc", "wb");
    if (f == NULL || g == NULL)
        return 0;
    int n;
    while ((n = fread(a, 4, MAX, f)) > 0)
        fwrite(a, 1, n, g);
    fclose(f);
    fclose(g);
    return 0;
}
The size on disk of "new.doc" on my computer is 20 kB. When I ran the code above, I got a "temp.doc" of only 5 kB, i.e. a loss of data. However, when I changed the 4 to a 1, I got a "temp.doc" exactly identical to "new.doc". Can anyone explain what happened? Thank you.

size_t fread(void *ptr, size_t size, size_t nmemb, FILE *stream);
On success, fread() and fwrite() return the number of items read or
written. This number equals the number of bytes transferred only when
size is 1.
In your case
    n = fread(a, 4, MAX, f)
returns the file size divided by 4, because each item is 4 bytes. When you then call fwrite(a, 1, n, g), you write only n bytes, i.e. a quarter of the file. You can fix the fread call to use a size of 1 instead of 4.

This is happening because you did not pay attention to the semantics of fread() and fwrite(). Both take an element size and an element count, and fread() returns the number of elements read, not the number of bytes. In your case, you are reading 4-byte elements but writing 1-byte elements, so of course you are missing 75% of your file. Use a matching element size in the calls to fread() and fwrite() and all will be well.
Note that if your files do not actually contain 4-byte elements (e.g. they are text files), you had better use an element size of 1 (and I suggest changing a to type char to be less confusing, and using sizeof(a[0]) as the element size in the function calls).
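As a sketch of the fix (the function name and buffer size here are mine, not from the question): with an element size of 1, the value fread() returns is a byte count, which can be passed straight to fwrite().

```c
#include <stdio.h>

#define BUFSZ 4096

/* Copy src to dst byte-wise. With element size 1, fread's return value n
   is the number of bytes read, so fwrite(buf, 1, n, g) writes exactly
   what was read. Returns 0 on success, -1 on error. */
int copy_file(const char *src, const char *dst)
{
    static unsigned char buf[BUFSZ];
    FILE *f = fopen(src, "rb");
    FILE *g = fopen(dst, "wb");
    if (f == NULL || g == NULL) {
        if (f) fclose(f);
        if (g) fclose(g);
        return -1;
    }
    size_t n;
    while ((n = fread(buf, 1, sizeof buf, f)) > 0)  /* n = bytes read */
        fwrite(buf, 1, n, g);                       /* write n bytes  */
    fclose(f);
    fclose(g);
    return 0;
}
```

Because size and count now match between the two calls, no data is dropped regardless of the file length.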

Using information from an external file, how can I fill my array up correctly using fread?

I need to be able to make sure my array is correctly receiving values from the file card.raw through fread.
I am not confident about using arrays with pointers, so if anybody could help me with the theory here, it would be GREATLY appreciated. Thanks in advance.
The code is supposed to take literally one block of 512 bytes and stick it into the array. Then I am just using a debugger and printf to examine the array's output.
/**
 * recover.c
 *
 * Computer Science 50
 * Problem Set 4
 *
 * Recovers JPEGs from a forensic image.
 */
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

int main(int argc, char* argv[])
{
    // Size of EACH FAT JPEG in bytes
    #define FILESIZE 512

    unsigned char* buffer[FILESIZE];

    /// Step 1: Open jpeg
    FILE* readfrom = fopen("card.raw", "rb");
    if (readfrom == NULL)
    {
        printf("Could not open");
    }

    /// Step 2: Find beginning of JPEG. The first digits will be 255 216 255, then 224 or 225
    fread(&buffer, FILESIZE, 1, readfrom);
    for (int x = 0; x < FILESIZE; x++)
    {
        printf("%d = %c\n", x, buffer[x]);
    }
    fclose(readfrom);
}
Use the return values from input functions. fread() reports how many elements were read; the code might not have read 512. Swap FILESIZE, 1 so the return value is the number of characters/bytes read.
// fread(&buffer, FILESIZE, 1, readfrom);
size_t count = fread(&buffer, 1, FILESIZE, readfrom);

Only print up to the number of elements read. Recommend hexadecimal (and maybe decimal) output rather than character.

for (size_t x = 0; x < count; x++) {
    // printf("%d = %c\n", x, buffer[x]);
    printf("%3zu = %02X %3u\n", x, buffer[x], buffer[x]);
}
If the fopen() failed, it is best not to continue with the for() and fclose().

if (readfrom == NULL) {
    printf("Could not open");
    return -1;
}
The second parameter is the size, in bytes, of each element to be read.
The third parameter is the number of elements, each of the size given by the second parameter.
So, swap your second and third parameters.
Replace unsigned char* buffer[FILESIZE]; with unsigned char buffer[FILESIZE];. Right now you have an array of unsigned char *, when you need an array of unsigned char. Because buffer is an array, it decays to a pointer to its first element, so you don't need to take its address: in the fread call, replace &buffer with buffer.
It must go like this: fread(buffer, 1, FILESIZE, readfrom);
One more thing: add a return with a specific error code after printf("Could not open");, because if the file hasn't been opened, you cannot read from it, can you? And add return 0; at the end of main.
And take your #define out of main.
Read more about fread here: http://www.cplusplus.com/reference/cstdio/fread/
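Putting the answers together, the read step might look like the sketch below. Only the read_block helper and its checks are mine; the "card.raw" name and the 512-byte block size come from the question.

```c
#include <stdio.h>

#define FILESIZE 512

/* Read one FILESIZE-byte block from path into buf.
   Returns the number of bytes actually read, or 0 if the file
   could not be opened (so the caller never reads past failure). */
size_t read_block(const char *path, unsigned char buf[FILESIZE])
{
    FILE *readfrom = fopen(path, "rb");
    if (readfrom == NULL)
        return 0;                               /* don't fread or fclose */
    /* size 1, count FILESIZE: the return value is a byte count */
    size_t count = fread(buf, 1, FILESIZE, readfrom);
    fclose(readfrom);
    return count;
}
```

The caller then prints only `count` entries, which handles both a short final block and an open failure with the same check.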

Are element size and count exchangeable in an fread call? [duplicate]

Let's say I have a file with a size of 5000 bytes, which I am trying to read from.
I have this code:
#include <stdio.h>
#include <stdlib.h>

int main()
{
    const char *file_path = "/path/to/my/file";
    FILE *fp = fopen(file_path, "rb");       // open the file
    fseek(fp, 0, SEEK_END);                  // seek to end of file
    unsigned long fullsize = ftell(fp);      // get the file size (5000 for this example)
    fseek(fp, 0, SEEK_SET);                  // bring the stream back to the beginning
    char *buf = (char*)malloc(5000);
    fread(buf, 5000, 1, fp);
    free(buf);
    return 0;
}
I can also replace the fread call with
fread(buf,1000,5,fp);
What is better? And why?
Optimization aside, I understand that the return value is different.
If you exchange those two arguments, you still request to read the same number of bytes. However the behaviour is different in other respects:
What happens if the file is shorter than that amount
The return value
Since you should always be checking the return value of fread, this is important :)
If you use the form result = fread(buf, 1, 5000, fp);, i.e. read 5000 units of size 1, but the file size is only 3000, then what will happen is that those 3000 bytes are placed in your buffer, and 3000 is returned.
In other words you can detect a partial read and still use the partial result.
However if you use result = fread(buf, 5000, 1, fp);, i.e. read 1 unit of size 5000, then the contents of the buffer are indeterminate (i.e. the same status as an uninitialized variable), and the return value is 0.
In both cases, a partial read leaves the file pointer in an indeterminate state, i.e. you will need to fseek before doing any further reads.
Using the latter form (i.e. any size other than 1) is best reserved for when you either want to abort if the full size is not available, or you're reading a file with fixed-size records.
I've always found it best to use 1 for the element size. If fread()
can't read a complete element at the end of the file, it will skip the
last, partial element. This is not desirable when the last element is
short. On the other hand, using 1 for element size does no harm.
Sample code that prints itself and demonstrates this behavior:
#include <stdio.h>

#define SIZE 100
#define N 1

int main()
{
    FILE *fin;
    int ct;
    char buf[SIZE * N + 1];

    fin = fopen("size_n.c", "r");
    if (fin == NULL)
        return 1;
    while (1) {
        ct = fread(buf, SIZE, N, fin);
        if (!ct)
            break;
        buf[ct * SIZE] = '\0';
        fputs(buf, stdout);
    }
    fclose(fin);
}
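To make the difference in return values concrete, here is a small sketch (the helper name and the file sizes are made up) that reads the same short file both ways:

```c
#include <stdio.h>

/* Read a file shorter than `want` bytes with both fread forms and report
   the two return values: as a byte count (element size 1) and as an
   element count (element size `want`). buf is assumed large enough. */
void compare_forms(const char *path, size_t want,
                   size_t *bytes, size_t *elems)
{
    char buf[8192];
    FILE *fp = fopen(path, "rb");
    *bytes = fread(buf, 1, want, fp);   /* partial read: bytes available */
    rewind(fp);                         /* clear EOF, back to the start  */
    *elems = fread(buf, want, 1, fp);   /* partial read: returns 0       */
    fclose(fp);
}
```

On a 3000-byte file with want = 5000, the first form returns 3000 and the partial data is usable; the second returns 0 and tells you nothing about how much was available.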

While doing fread() into an integer, what happens if file size is not a multiple of 4 bytes?

I am implementing a Feistel cipher with a block size of 16 bits. I read from an input file into a 32-bit integer, encrypt it, and write it to an output file.
unsigned int buf1, buf2;
FILE *fin = fopen("input", "r");
FILE *fout = fopen("output", "w");
while (!feof(fin)) {
    fread(&buf1, 4, 1, fin);
    buf2 = encrypt(buf1);
    fwrite(&buf2, 4, 1, fout);
}
The program is almost done. The only problem is that encryption followed by decryption does not reproduce the source file. The files are almost the same; the difference is only in the last few bytes.
My question is: what happens if the number of bytes in the file is not a multiple of 4? What happens on the last call to fread()?
If the number of bytes in the file is not a multiple of 4, the fread() call that meets the partial tail will return 0: fewer than size bytes remain, so no complete element can be read.
size_t fread(void * restrict ptr, size_t size, size_t nmemb, FILE * restrict stream);
"The fread function returns the number of elements successfully read, which may be less than nmemb if a read error or end-of-file is encountered."
The return value of fread() should be used to detect an incomplete read and EOF. The OP's code, as is, will read once too often.
Also suggest using uint32_t instead of unsigned int for greater portability, and checking fwrite() results.
uint32_t buf1, buf2;
size_t cnt;

while ((cnt = fread(&buf1, sizeof buf1, 1, fin)) == 1) {
    buf2 = encrypt(buf1);
    if (fwrite(&buf2, sizeof buf2, 1, fout) != 1) {
        Handle_WriteError();
    }
}
You need to know the size of the file you are reading in advance (using stat() or equivalent), read the number of complete blocks available and then handle the residual bytes, if any, as a special case, perhaps by padding. If you don't want ciphertext expansion, then look at block stealing modes of operation, which are available for both ECB and CBC modes.
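A sketch of the padding approach, with a placeholder transform() standing in for the asker's encrypt() (which is not shown in the question) and zero-padding applied to the final partial block:

```c
#include <stdio.h>
#include <stdint.h>
#include <string.h>

/* Placeholder for the cipher; NOT the asker's encrypt(). */
static uint32_t transform(uint32_t x) { return x ^ 0xDEADBEEFu; }

/* Process `in` in 4-byte blocks, zero-padding a short final block.
   Returns the number of blocks written, or 0 on error. */
long process_file(const char *in, const char *out)
{
    FILE *fin = fopen(in, "rb");
    FILE *fout = fopen(out, "wb");
    if (!fin || !fout) {
        if (fin) fclose(fin);
        if (fout) fclose(fout);
        return 0;
    }
    long blocks = 0;
    uint32_t buf1, buf2;
    size_t got;
    /* Element size 1: `got` is a byte count, so a partial tail is visible. */
    while ((got = fread(&buf1, 1, sizeof buf1, fin)) > 0) {
        if (got < sizeof buf1)  /* partial final block: pad with zeros */
            memset((unsigned char *)&buf1 + got, 0, sizeof buf1 - got);
        buf2 = transform(buf1);
        if (fwrite(&buf2, sizeof buf2, 1, fout) != 1)
            break;
        blocks++;
    }
    fclose(fin);
    fclose(fout);
    return blocks;
}
```

Note this is ciphertext expansion: a 10-byte input produces three 4-byte blocks, so the decryptor needs the original length (or a padding scheme such as PKCS#7) to strip the tail.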

Combining two files in binary format

I wrote this code to test combining two files:
long getFileSize(char *filename)
{
    FILE* fp = fopen(filename, "rb");
    fseek(fp, 0, SEEK_END);
    long size = ftell(fp);
    fclose(fp);
    return size;
}

long lengthA = getFileSize(argv[1]);
long lengthB = getFileSize(argv[2]);
printf("sizeof %s is:%d\n", argv[1], lengthA);
printf("sizeof %s is %d\n", argv[2], lengthB);

void *pa;
void *pb;

FILE* fp = fopen(argv[1], "rb");
fread(pa, 1, lengthA, fp);
fclose(fp);

FILE* fpn = fopen(argv[2], "rb");
fread(pb, 1, lengthB, fpn);
fclose(fpn);

printf("pointerA is:%p;pointerB is:%p\n", pa, pb);

FILE *ff = fopen("test.pack", "wb");
fwrite(pa, 1, lengthA, ff);
fwrite(pb, 1, lengthB, ff);
fclose(ff);

long lengthFinal = getFileSize("test.pack");
printf("Final size:%i\n", lengthFinal);
However, I don't know if the data read is equal to the value returned from getFileSize; the console output clearly says something is wrong with it, but I can't figure it out:
sizeof a.zip is:465235
sizeof b.zip is 107814
pointerA is:0x80484ec;pointerB is:0x804aff4
Final size:255270
Since I know the length of each file, I can then use fseek to restore them, right? That's the idea I was thinking of.
pa and pb need to point to some memory into which the file contents can be read.
So do a malloc for these two buffers with lengthA*sizeof(char) and lengthB*sizeof(char), and pass the allocated buffers to fread:
pa = malloc(lengthA*sizeof(char));
pb = malloc(lengthB*sizeof(char));
...
fread(pa,sizeof(char),lengthA,fp);
...
fread(pb,sizeof(char),lengthB,fpn);
Furthermore, fread returns the number of items actually read. Also check this!
Excerpt from man fread:
fread() and fwrite() return the number of items successfully read or written (i.e., not the number of characters). If an error occurs, or the end-of-file is reached, the return value is a short item count (or zero).
Note that there's no real reason to load both source files into memory at once. Also, it's potentially very memory-inefficient to do so, since you're really reading all of the files in, and then all you do is write the contents out again.
A better algorithm, in my opinion, would be:
let C = a reasonable buffer size, say 128 KB
let B = a static buffer of C bytes
let R = the output file, opened for binary write
for each input file F:
    open F for binary read
    repeat
        read up to C bytes from F into B
        let N be the number of bytes actually read
        if N > 0
            write the first N bytes of B into R
    until N = 0
    close F
close R
This does away with the need to allocate buffers dynamically: you could just do char B[C] and have #define C (128 << 10).
The above assumes that reading from a file which has no more bytes to deliver returns 0 bytes.
Also note that by doing away with the need to load the entire file, you also no longer need to open each input file an extra time just to seek to the end in order to compute the file's size.
pa and pb are not pointing to valid memory.
char* pa = malloc(lengthA * sizeof(char));
char* pb = malloc(lengthB * sizeof(char));
Remember to free() when no longer required.
Check all return values from functions fopen(), fread(), fwrite(), etc.
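The pseudocode above translates to C roughly as follows (a sketch; the function names and the 128 KB chunk size are mine):

```c
#include <stdio.h>

#define CHUNK (128 * 1024)

/* Append the contents of src onto the already-open output stream out,
   using a fixed-size buffer instead of loading the whole file. */
static int append_file(FILE *out, const char *src)
{
    static unsigned char buf[CHUNK];
    FILE *in = fopen(src, "rb");
    if (in == NULL)
        return -1;
    size_t n;
    /* Element size 1: n is the number of bytes actually read. */
    while ((n = fread(buf, 1, sizeof buf, in)) > 0) {
        if (fwrite(buf, 1, n, out) != n) {  /* check the write, too */
            fclose(in);
            return -1;
        }
    }
    fclose(in);
    return 0;
}

/* Concatenate files a and b into dst. Returns 0 on success, -1 on error. */
int concat_files(const char *a, const char *b, const char *dst)
{
    FILE *out = fopen(dst, "wb");
    if (out == NULL)
        return -1;
    int rc = (append_file(out, a) == 0 && append_file(out, b) == 0) ? 0 : -1;
    fclose(out);
    return rc;
}
```

This never holds more than CHUNK bytes in memory, and it never needs to know either input's size in advance, so the extra open/seek/close pass in getFileSize is unnecessary.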

fwrite fails to write big data

I find that fwrite fails when I am trying to write somewhat big data, as in the following code.
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char* argv[])
{
    int size = atoi(argv[1]);
    printf("%d\n", size);
    FILE* fp = fopen("test", "wb");
    char* c = "";
    int i = fwrite(c, size, 1, fp);
    fclose(fp);
    printf("%d\n", i);
    return 0;
}
The code is compiled into a binary tw.
When I try ./tw 10000 it works well. But when I try something like ./tw 12000 it fails (fwrite() returns 0 instead of 1).
What's the reason for that? How can I avoid this?
EDIT: When I do fwrite(c, 1, size, fp) it returns 8192 instead of the larger size I give.
2nd EDIT: When I write a loop that runs size times and does fwrite(c, 1, 1, fp) each time, it works perfectly OK.
It seems that when size is too large (as in the first EDIT) it only writes about 8192 bytes.
I guess something limits fwrite to writing a fixed number of bytes at a time.
3rd EDIT: The above is not clear.
The following fails (space - w_result != 0) when space is large, where space is a count chosen by me and w_result is the total number of objects written.

w_result = 0;
char* empty = malloc(BLOCKSIZE * sizeof(char));
w_result = fwrite(empty, BLOCKSIZE, space, fp);
printf("%d lost\n", space - w_result);
While this works OK.

w_result = 0;
char* empty = malloc(BLOCKSIZE * sizeof(char));
for (i = 0; i < space; i++)
    w_result += fwrite(empty, BLOCKSIZE, 1, fp);
printf("%d lost\n", space - w_result);
(every variable has been declared.)
(Every variable has been declared.) I corrected some errors the answers mentioned, but the first version should work, according to you.
With fwrite(c, size, 1, fp); you state that fwrite should write 1 item that is size bytes big out of the buffer c.
But c is just a pointer to an empty string, which occupies only 1 byte. When you tell fwrite to go looking for more than 1 byte of data at c, you get undefined behavior. You cannot fwrite more than 1 byte from c.
(Undefined behavior means anything could happen: it could appear to work fine with a size of 10000 and fail with 12000. The implementation-dependent reason is likely that there is some readable memory, perhaps the stack, starting at c and extending 10000 bytes, but at e.g. 11000 there is no mapped memory and you get a fault.)
You are reading memory that doesn't belong to your program (and writing it to a file).
Test your program using valgrind to see the errors.
From that snippet of code, it looks like you're trying to write whatever is at c, which is just a single NUL byte, to the file, while telling fwrite there are size bytes there. The fact that it doesn't crash with 10000 is coincidental. What are you trying to do?
As has been stated by others the code is performing an invalid memory read via c.
A possible solution would be to dynamically allocate a buffer that is size bytes in size, initialise it, and fwrite() it to the file, remembering to deallocate the buffer afterwards.
Remember to check return values from functions (fopen() for example).
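One way to do that, as a sketch (the function name is mine): allocate a buffer that really is size bytes, write from it, and check the return value.

```c
#include <stdio.h>
#include <stdlib.h>

/* Write `size` zero bytes to path from a properly sized heap buffer,
   instead of reading past the end of a 1-byte string literal.
   Returns the number of bytes written, or 0 on failure. */
size_t write_zeros(const char *path, size_t size)
{
    char *c = calloc(size, 1);          /* buffer really is `size` bytes */
    if (c == NULL)
        return 0;
    FILE *fp = fopen(path, "wb");
    if (fp == NULL) {
        free(c);
        return 0;
    }
    size_t n = fwrite(c, 1, size, fp);  /* element size 1: n is bytes */
    fclose(fp);
    free(c);
    return n;
}
```

With a valid source buffer there is no 8192-byte limit: fwrite happily writes 12000 bytes (or far more) in one call; the earlier failures were symptoms of the out-of-bounds read, not of fwrite.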