I am trying to read a 128 KB binary file in chunks of 256 bytes. The first 20-40 bytes of the 256 bytes always seem to be correct, but after that the data is corrupted. I tried reading the file, writing it to another binary file, and comparing the two: more than half of the data is corrupted. Here is my code:
uint8_t buffer[256];
read_bin_file = fopen("vtest.bin", "r");
if (read_bin_file == NULL)
{
printf("Unable to open file\n");
return false;
}
test_file = fopen("test_file.bin", "w");
if (test_file == NULL)
{
printf("Unable to open file\n");
return false;
}
fflush(stdout);
for (i = 0; i <=0x1FF; i++)
{
file_Read_pointer = i * 256;
fseek(read_bin_file, file_Read_pointer, SEEK_SET);
fread(buffer, 256, 1, read_bin_file);
fseek(test_file, file_Read_pointer, SEEK_SET);
fwrite(buffer, 256, 1, test_file);
}
What is it that I am missing?
Also, when I increase the number of bytes read per chunk from 256 to 1024 (i<0x7F), the error seems to decrease significantly: the file matches almost 90%.
If it is binary data you're reading and writing, then you should open the files in binary mode with read_bin_file = fopen("vtest.bin", "rb");. Note the "b" in the mode. This prevents special handling of new line characters.
Your fseek calls are also unnecessary: fread and fwrite advance the file position for you.
From the fread documentation: "The file position indicator for the stream is advanced by the number of characters read."
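A minimal sketch of the copy loop with both suggestions applied (binary mode on both files, no fseek). The helper name copy_in_chunks is made up for illustration, and the error handling is only one reasonable choice:

```c
#include <stdio.h>
#include <stdint.h>

/* Copy src to dst in 256-byte chunks. Returns 0 on success, -1 on error.
   copy_in_chunks is a hypothetical helper, not part of the question. */
int copy_in_chunks(const char *src, const char *dst)
{
    FILE *in = fopen(src, "rb");      /* "b": no newline translation */
    if (in == NULL)
        return -1;

    FILE *out = fopen(dst, "wb");
    if (out == NULL) {
        fclose(in);
        return -1;
    }

    uint8_t buffer[256];
    size_t n;
    int status = 0;

    /* fread and fwrite advance the file position themselves,
       so no fseek is needed between chunks. */
    while ((n = fread(buffer, 1, sizeof buffer, in)) > 0) {
        if (fwrite(buffer, 1, n, out) != n) {
            status = -1;
            break;
        }
    }
    if (ferror(in))
        status = -1;

    fclose(in);
    if (fclose(out) != 0)             /* flush errors surface here */
        status = -1;
    return status;
}
```

Passing 1 as the element size makes fread return a byte count, so a short final chunk is written out with its real length.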
Introduction
I'm writing my own cp program. With the code I currently have I'm able to copy and paste files.
Code
char *buf;
int fd;
int ret;
struct stat sb;
FILE *stream;
/*opening and getting size of file to copy*/
fd = open(argv[1],O_RDONLY);
if(fd == -1)
{
perror("open");
return 1;
}
/*obtaining size of file*/
ret = fstat(fd,&sb);
if(ret)
{
perror("stat");
return 1;
}
/*opening a stream for reading/writing file*/
stream = fdopen(fd,"rb");
if(!stream)
{
perror("fdopen");
return 1;
}
/*allocating space for reading binary file*/
buf = malloc(sb.st_size);
/*reading data*/
if(!fread(buf,sb.st_size,1,stream))
{
perror("fread");
return 1;
}
/*writing file to a duplicate*/
fclose(stream);
stream = fopen("duplicate","wb");
if(!fwrite(buf,sb.st_size,1,stream))
{
perror("fwrite");
return 1;
}
fclose(stream);
close(fd);
free(buf);
return 0;
The problem
I'm unable to copy .zip and .tar.gz files. If I alter the code to give the duplicate an extension (for example 'duplicate.zip' when copying a zip file) and then copy a .zip file,
everything is copied, but the duplicated file does not behave like a zip file: cat outputs nothing, and I get this error when I try to unzip it anyway:
End-of-central-directory signature not found. Either this file is not
a zipfile, or it constitutes one disk of a multi-part archive. In the
latter case the central directory and zipfile comment will be found on
the last disk(s) of this archive.
So how do I go about copying .zip files and .tar.gz files? Any pointers would be helpful, thanks in advance.
You are using malloc() incorrectly. You want to allocate sb.st_size bytes.
malloc(sb.st_size * sizeof buf)
should be
malloc(sb.st_size)
The use of fread() is dubious and you are throwing away the result of fread(). Instead of
if(!fread(buf,sb.st_size,1,stream))
you should have
size_t num_bytes_read = fread (buf, 1, sb.st_size, stream);
if (num_bytes_read < sb.st_size)
You are using strlen() incorrectly. The content of buf is not guaranteed to be a string; and anyway you already know how many bytes you have in buf: sb.st_size. (Because if fread() returned a smaller number of bytes read you got angry and terminated the process.) So instead of
fwrite(buf,strlen(buf),1,stream)
you should have
fwrite (buf, 1, sb.st_size, stream)
In addition to AlexP's notes...
/*obtaining size of file*/
ret = fstat(fd,&sb);
if(ret)
{
perror("stat");
return 1;
}
// ...some code...
/*allocating space for reading binary file*/
buf = malloc(sb.st_size);
/*reading data*/
if(!fread(buf,sb.st_size,1,stream))
{
perror("fread");
return 1;
}
You have a race condition here. If the file size changes between your fstat call and malloc or fread you will read too much or too little of the file.
Fixing this leads us to the next issue: you're slurping the entire file into memory. While this might work for small files, it wastes memory on large ones. A very large file might be too big for a single malloc, and you're not checking whether your malloc succeeds.
Instead, read and write the file a piece at a time. And read until there isn't any more to read.
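uint8_t buffer[4096]; // 4K buffer (an array of bytes, not of pointers)
size_t num_read;
while( (num_read = fread(buffer, sizeof(uint8_t), sizeof(buffer), in)) != 0 ) {
    // fwrite returns the number of items written; fewer than requested is an error
    if( fwrite( buffer, sizeof(uint8_t), num_read, out ) != num_read ) {
        perror("fwrite");
        break;
    }
}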
This avoids the race condition by not having to call fstat in the first place. And it avoids allocating a potentially enormous hunk of memory. Instead it can all be done on the stack.
I've used uint8_t to get a hunk of bytes. It's a standard fixed width integer type from stdint.h. You can also use unsigned char to read bytes, and that's probably what uint8_t really is, but uint8_t makes it explicit.
I am trying to write 100 integers to a binary file. I have tried writing to this file and reading from it. When reading from it I get completely random digits.
Here is the block concerning the write.
Do note I have the file open for write with "wb" mode. I have also closed the file at the end.
for (int i = 0; i < 99; i++) {
fwrite(&i, sizeof(int), 1, file);
}
Here is the block concerning the read.
Do note I do have the file open here in "rb" mode and it is closed.
int num;
for (int i = 0; i < 100; i++) {
int rc = getc(file);
if (rc == EOF) {
fputs("Error occured while reading file", stderr);
return EXIT_FAILURE;
}
fread(&num, sizeof(int), 1, file);
printf("%d", num);
}
My output is like this:
-13421772802147469895-168955699232767012640583688388440-104919389914260634872147467638000128293273683884400-19797114882147440795-168947558432767-1097029212883066888388440148657280313254001912147440795-168942592032767-109702911303445504838844014865730434362077432147440795-168935577632767-1097029063753420883766251486573257-6039796492147440795-168932864032767-109702901326841856838844014865733541270-168949760032767-10970289133241241683884401486573450-1090518913214744079500196944831217016018891752457584192041348617175279241952408940298110176910929517683167731702125413116313304413809989891296126535181930809719192433591818324585127960891517680423011935761967-13421772802147469895-168955699232767012640583688388440-104919389914260634872147467638000128293273683884400-19797114882147440795-168947558432767
So there is something wrong, and I am not sure what exactly. Perhaps I am not sure if I understand the API for reading/writing completely (specifically size_t nitems)? I am not sure how to tell how many bytes I need to read/write from a file.
In the first loop, you are writing 100 integers starting at the address of 'i', 99 times.
That is probably not what you intended.
It should be
fwrite(&i, sizeof(int), 1, file);
Secondly, what mode do you open the file in for writing? It should be opened in binary mode, otherwise it will not save binary data correctly (add 'b' to the fopen mode value).
Do you close and reopen the file for the read (and set the right file mode)? Or, if it was left open, do you fseek back to the beginning of the file before trying to read the values?
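Two further details in the question's code matter here. The write loop runs while i < 99, so it writes 99 integers rather than 100, and the read loop calls getc before every fread, which consumes one byte and leaves each following fread misaligned. A hedged sketch of a round trip that avoids both, using the return value of fread as the only end-of-data check (write_ints and read_ints are hypothetical names):

```c
#include <stdio.h>

/* Write the integers 0..count-1 to path. Returns 0 on success, -1 on error. */
int write_ints(const char *path, int count)
{
    FILE *file = fopen(path, "wb");
    if (file == NULL)
        return -1;
    for (int i = 0; i < count; i++) {
        /* one item of sizeof(int) bytes per call */
        if (fwrite(&i, sizeof i, 1, file) != 1) {
            fclose(file);
            return -1;
        }
    }
    return fclose(file) == 0 ? 0 : -1;
}

/* Read up to max integers from path into out.
   Returns how many were read, or -1 if the file cannot be opened. */
int read_ints(const char *path, int *out, int max)
{
    FILE *file = fopen(path, "rb");
    if (file == NULL)
        return -1;
    int n = 0;
    /* fread returns the number of complete items read, so no getc
       call is needed (or wanted) to detect the end of the data. */
    while (n < max && fread(&out[n], sizeof out[n], 1, file) == 1)
        n++;
    fclose(file);
    return n;
}
```

When printing the values, a separator such as "%d\n" keeps the numbers from running together the way they do in the output shown in the question.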
I am reading a file of unknown size that I know doesn't change size in the meantime, so I intend to use the fstat() function and struct stat. Now I am wondering what the st_size field really means and how I should use it.
If I get the file size this way, then allocate a buffer of that size and read exactly that many bytes, there seems to be one byte left over. I came to this conclusion by using feof() to check whether there is really nothing left in the FILE *. It returns false! So I need to read (st_size + 1) bytes, and only then have all bytes been read and feof() works correctly. Should I always add +1 to this size to read all bytes from a binary file, or is there some hidden reason why this isn't reading to EOF?
struct stat finfo;
fstat(fileno(fp), &finfo);
data_length = finfo.st_size;
I am asking because when I add the +1, the number of bytes returned by fread() is one less than requested, and the last byte in the buffer is 00. Alternatively, before checking with feof(), I could do something like this:
fread(NULL, 1, 1, fp);
Here is the real code; it is a slightly odd situation:
// reading png bytes from file
FILE *fp = fopen("./test/resources/RGBA_8bits.png", "rb");
// get file size from file info
struct stat finfo;
fstat(fileno(fp), &finfo);
pngDataLength = finfo.st_size;
pngData = malloc(sizeof(unsigned char)*pngDataLength);
if( fread(pngData, 1, pngDataLength, fp) != pngDataLength) {
fprintf(stderr, "%s: Incorrect number of bytes read from file!\n", __func__);
fclose(fp);
free(pngData);
return;
}
fread(NULL, 1, 1, fp);
if(!feof(fp)) {
fprintf(stderr, "%s: Not the whole binary file has been read.\n", __func__);
fclose(fp);
free(pngData);
return;
}
fclose(fp);
This behaviour is normal.
feof will return true only once you have tried to read beyond the file's end which you don't do as you read exactly the size of the file.
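A small sketch of that behaviour, assuming the caller already knows the file size. The helper name read_and_probe_eof is hypothetical. It uses fgetc for the extra read attempt because fgetc needs no buffer, whereas passing NULL to fread is not valid:

```c
#include <stdio.h>

/* Reads exactly `size` bytes from fp into buf, then records what feof
   reports before and after one extra read attempt.
   Returns 0 on success, -1 if fewer than `size` bytes were available. */
int read_and_probe_eof(FILE *fp, unsigned char *buf, size_t size,
                       int *eof_before, int *eof_after)
{
    if (fread(buf, 1, size, fp) != size)
        return -1;

    /* No read has failed yet, so the end-of-file indicator is not set. */
    *eof_before = feof(fp) != 0;

    /* Attempt to read past the end; only this sets the indicator. */
    int c = fgetc(fp);
    *eof_after = (c == EOF) && feof(fp) != 0;
    return 0;
}
```

If fp is positioned at the start of a file of exactly size bytes, eof_before comes back 0 and eof_after comes back 1, so no +1 is needed on st_size.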
I am trying to build an antivirus in C.
I do it like this:
Read the data of the virus file and of the picture file to be scanned.
Check whether the virus data appears in the picture data.
I read the data of the scanned file and the virus file like this (I open the file in binary mode because it is a picture, .png):
// open file
file = fopen(filePath, "rb");
if (!file)
{
printf("Error: can't open file.\n");
return 0;
}
// Allocate memory for fileData
char* fileData = calloc(fileLength + 1, sizeof(char));
// Read data of file.
fread(fileData, fileLength, 1, file);
After I read the file data and the virus data, I check whether the virus appears in the file like this:
char* ret = strstr(fileData, virusID);
if (ret != NULL)
printf("Infected file");
It does not work, even though my picture contains the VirusID.
I want to check if the binary data of virus appear in binary data of picture.
For example: my binary data of my virus http://pastebin.com/xZbWA9qu
And the binary data of my picture(with the virus): http://pastebin.com/yjXr84kr
First, note the order of the arguments of fread: fread(void *ptr, size_t size, size_t nmemb, FILE *stream);. To get the number of bytes read, it's better to call fread(fileData, 1, fileLength, file);. Your call returns 0 or 1 depending on whether the file holds enough data for one item of fileLength bytes, not the number of bytes it has read.
Second, strstr searches for strings, not memory blocks: it stops at the first zero byte. To search binary blocks you need to write your own function, or you can use the GNU extension function memmem.
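// memmem is a GNU extension: define _GNU_SOURCE before including <string.h>
// Allocate memory for fileData
char *fileData = malloc(fileLength);
// Read data of file.
size_t nread = fread(fileData, 1, fileLength, file);
// Search only the bytes that were actually read
void *ret = memmem(fileData, nread, virusID, virusLen);
if (ret != NULL)
    printf("Infected file");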
Search for the first byte of the virus signature. If you find it, see whether the next byte is the second byte of the signature, and so on until you have checked and matched all bytes of the signature. Then the file is infected. If not all bytes match, search again for the first byte of the signature, starting one position later.
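That search can be sketched portably with memchr and memcmp, for platforms without memmem. The function name find_bytes is hypothetical, and this is a straightforward scan rather than an optimized matcher:

```c
#include <stddef.h>
#include <string.h>

/* Returns a pointer to the first occurrence of needle in haystack,
   or NULL if it does not occur. Both are raw byte blocks, so zero
   bytes are treated like any other value. */
const unsigned char *find_bytes(const unsigned char *haystack, size_t hay_len,
                                const unsigned char *needle, size_t needle_len)
{
    if (needle_len == 0)
        return haystack;
    if (needle_len > hay_len)
        return NULL;

    const unsigned char *p = haystack;
    size_t remaining = hay_len;

    while (remaining >= needle_len) {
        /* Find the first signature byte among the positions where a
           complete match could still fit. */
        p = memchr(p, needle[0], remaining - needle_len + 1);
        if (p == NULL)
            return NULL;
        /* Compare the whole signature at this candidate position. */
        if (memcmp(p, needle, needle_len) == 0)
            return p;
        /* Mismatch: resume one byte after the candidate. */
        p++;
        remaining = hay_len - (size_t)(p - haystack);
    }
    return NULL;
}
```

In the antivirus case the haystack would be the picture data with the byte count returned by fread, and the needle the virus signature with its own length.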
I am reading from a file, and when I read it, my code takes it line by line and prints it.
What I actually want is an array of char holding all the characters in the file, so that I can print it once.
This is the code I have:
if(strcmp(str[0],"#")==0)
{
FILE *filecomand;
//char fname[40];
char line[100];
int lcount;
///* Read in the filename */
//printf("Enter the name of a ascii file: ");
//fgets(History.txt, sizeof(fname), stdin);
/* Open the file. If NULL is returned there was an error */
if((filecomand = fopen(str[1], "r")) == NULL)
{
printf("Error Opening File.\n");
//exit(1);
}
lcount=0;
int i=0;
while( fgets(line, sizeof(line), filecomand) != NULL ) {
/* Get each line from the infile */
//lcount++;
/* print the line number and data */
//printf("%s", line);
}
fclose(filecomand); /* Close the file */
You need to determine the size of the file. Once you have that, you can allocate an array large enough and read it in a single go.
There are two ways to determine the size of the file.
Using fstat:
struct stat stbuffer;
if (fstat(fileno(filecommand), &stbuffer) != -1)
{
// file size is in stbuffer.st_size;
}
With fseek and ftell:
if (fseek(fp, 0, SEEK_END) == 0)
{
long size = ftell(fp);
if (size != -1)
{
// successfully got size
}
// Go back to start of file
fseek(fp, 0, SEEK_SET);
}
Another solution would be to map the entire file into memory and then treat it as a char array.
On Windows use MapViewOfFile, and on Unix use mmap.
Once you have mapped the file (there are plenty of examples), you get a pointer to the file's beginning in memory. Cast it to char *.
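A hedged sketch of the Unix variant only, assuming a POSIX system. The helper name map_file is made up for illustration, and an empty file is treated as a failure because a zero-length mapping is not allowed:

```c
#define _POSIX_C_SOURCE 200809L   /* request the POSIX declarations */
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map the file read-only and store its length in *len.
   Returns NULL on failure or if the file is empty.
   The caller releases the mapping with munmap((void *)ptr, *len). */
const char *map_file(const char *path, size_t *len)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return NULL;

    struct stat sb;
    if (fstat(fd, &sb) == -1 || sb.st_size == 0) {
        close(fd);
        return NULL;
    }

    void *p = mmap(NULL, (size_t)sb.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                    /* the mapping stays valid after close */
    if (p == MAP_FAILED)
        return NULL;

    *len = (size_t)sb.st_size;
    return p;
}
```

The mapped bytes are not NUL-terminated, so print them with a length, for example fwrite(ptr, 1, len, stdout), rather than with "%s".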
Since you can't assume how big the file is, you need to determine the size and then dynamically allocate a buffer.
I won't post the code, but here's the general scheme. Use fseek() to navigate to the end of the file, ftell() to get the size of the file, and fseek() again to move back to the start of the file. Allocate a char buffer with malloc() using the size you found. Then use fread() to read the file into the buffer. When you are done with the buffer, free() it.
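For reference, that scheme might look roughly like this. It is a sketch, not a definitive implementation: read_whole_file is a hypothetical name, and the extra byte for a terminating NUL is an addition so that the result can be printed with "%s":

```c
#include <stdio.h>
#include <stdlib.h>

/* Reads the whole file into a malloc'd, NUL-terminated buffer and
   stores the number of bytes read in *len. Returns NULL on failure.
   The caller frees the buffer with free(). */
char *read_whole_file(const char *path, size_t *len)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return NULL;

    char *buf = NULL;

    if (fseek(fp, 0, SEEK_END) == 0) {            /* go to the end */
        long size = ftell(fp);                    /* offset = file size */
        if (size >= 0 && fseek(fp, 0, SEEK_SET) == 0) {
            buf = malloc((size_t)size + 1);       /* +1 for the NUL */
            if (buf != NULL) {
                size_t got = fread(buf, 1, (size_t)size, fp);
                buf[got] = '\0';
                *len = got;
            }
        }
    }
    fclose(fp);
    return buf;
}
```

After a successful call the text can be printed in one go with printf("%s", buf), provided the file itself contains no zero bytes.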
Use a different open call, i.e.
fd = open(str[1], O_RDONLY|O_BINARY); /* O_BINARY exists on MS platforms only */
The read call then fills a buffer of bytes.
count = read(fd, buf, bytecount);
This does a binary open and read on the file.