fread Only first 5 bytes of .PNG file - c

I've made a simple resource packer for packing the resources for my game into one file. Everything was going fine until I began writing the unpacker.
I noticed the .txt file - 26 bytes - that I had packed, came out of the resource file fine, without anyway issues, all data preserved.
However when reading the .PNG file I had packed in the resource file, the first 5 bytes were intact while the rest was completely nullified.
I traced this down to the packing process, and I noticed that fread is only reading the first 5 bytes of the .PNG file and I can't for the life of me figure out why. It even triggers 'EOF' indicating that the file is only 5 bytes long, when in fact it is a 787 byte PNG of a small polygon, 100px by 100px.
I even tested this problem by making a separate application to simply read this PNG file into a buffer, I get the same results and only 5-bytes are read.
Here is the code of that small separate application:
#include <cstdio>
int main(int argc, char** argv)
{
char buffer[1024] = { 0 };
FILE* f = fopen("test.png", "r");
fread(buffer, 1, sizeof(buffer), f);
fclose(f); //<- I use a breakpoint here to verify the buffer contents
return 0;
}
Can somebody please point out my stupid mistake?

Can somebody please point out my stupid mistake?
Windows platform, I guess?
Use this:
FILE* f = fopen("test.png", "rb");
instead of this:
FILE* f = fopen("test.png", "r");
See msdn for explanation.

Extending the correct answer from SigTerm, here is some background of why you got the effect you did for opening a PNG file in text mode:
The PNG format explains its 8-byte file header as follows:
The first eight bytes of a PNG file always contain the following values:
(decimal) 137 80 78 71 13 10 26 10
(hexadecimal) 89 50 4e 47 0d 0a 1a 0a
(ASCII C notation) \211 P N G \r \n \032 \n
This signature both identifies the file as a PNG file and provides for immediate detection of common file-transfer problems. The first two bytes distinguish PNG files on systems that expect the first two bytes to identify the file type uniquely. The first byte is chosen as a non-ASCII value to reduce the probability that a text file may be misrecognized as a PNG file; also, it catches bad file transfers that clear bit 7. Bytes two through four name the format. The CR-LF sequence catches bad file transfers that alter newline sequences. The control-Z character stops file display under MS-DOS. The final line feed checks for the inverse of the CR-LF translation problem.
I believe that in text mode, the call to fread() was terminated when it read the sixth byte which contains a Ctrl+Z character. Ctrl+Z was historically used in MSDOS (and in CPM before it) to indicate the end of a file, which was necessary because the file system stored the size of a file as a count of blocks, not a count of bytes.
By reading the file in text mode instead of binary mode, you triggered the protection against accidentally using the TYPE command to display a PNG file.
One thing you could do that would have helped diagnose this error is to use fread() slightly differently. You didn't test the return value from fread(). You should. Further, you should call it like this:
...
size_t nread;
...
nread = fread(buffer, sizeof(buffer), 1, f);
so that nread is a count of the bytes actually written to the buffer. For the PNG file in text mode, it would have told you on the first read that it only read 5 bytes. Since the file cannot be that small, you would have had a clue that something else was going on. The remaining bytes of the buffer were never modified by fread(), which would have been seen if you initialized the buffer to some other fill value.

Related

fread is not reading whole file [duplicate]

What translation occurs when writing to a file that was opened in text mode that does not occur in binary mode? Specifically in MS Visual C.
unsigned char buffer[256];
for (int i = 0; i < 256; i++) buffer[i]=i;
int size = 1;
int count = 256;
Binary mode:
FILE *fp_binary = fopen(filename, "wb");
fwrite(buffer, size, count, fp_binary);
Versus text mode:
FILE *fp_text = fopen(filename, "wt");
fwrite(buffer, size, count, fp_text);
I believe that most platforms will ignore the "t" option or the "text-mode" option when dealing with streams. On windows, however, this is not the case. If you take a look at the description of the fopen() function at: MSDN, you will see that specifying the "t" option will have the following effect:
line feeds ('\n') will be translated to '\r\n" sequences on output
carriage return/line feed sequences will be translated to line feeds on input.
If the file is opened in append mode, the end of the file will be examined for a ctrl-z character (character 26) and that character removed, if possible. It will also interpret the presence of that character as being the end of file. This is an unfortunate holdover from the days of CPM (something about the sins of the parents being visited upon their children up to the 3rd or 4th generation). Contrary to previously stated opinion, the ctrl-z character will not be appended.
In text mode, a newline "\n" may be converted to a carriage return + newline "\r\n"
Usually you'll want to open in binary mode. Trying to read any binary data in text mode won't work, it will be corrupted. You can read text ok in binary mode though - it just won't do automatic translations of "\n" to "\r\n".
See fopen
Additionally, when you fopen a file with "rt" the input is terminated on a Crtl-Z character.
Another difference is when using fseek
If the stream is open in binary mode, the new position is exactly offset bytes measured from the beginning of the file if origin is SEEK_SET, from the current file position if origin is SEEK_CUR, and from the end of the file if origin is SEEK_END. Some binary streams may not support the SEEK_END.
If the stream is open in text mode, the only supported values for offset are zero (which works with any origin) and a value returned by an earlier call to std::ftell on a stream associated with the same file (which only works with origin of SEEK_SET.
Even though this question was already answered and clearly explained, I think it would be interesting to show the main issue (translation between \n and \r\n) with a simple code example. Note that I'm not addressing the issue of the Crtl-Z character at the end of the file.
#include <stdio.h>
#include <string.h>
int main() {
FILE *f;
char string[] = "A\nB";
int len;
len = strlen(string);
printf("As you'd expect string has %d characters... ", len); /* prints 3*/
f = fopen("test.txt", "w"); /* Text mode */
fwrite(string, 1, len, f); /* On windows "A\r\nB" is writen */
printf ("but %ld bytes were writen to file", ftell(f)); /* prints 4 on Windows, 3 on Linux*/
fclose(f);
return 0;
}
If you execute the program on Windows, you will see the following message printed:
As you'd expect string has 3 characters... but 4 bytes were writen to file
Of course you can also open the file with a text editor like Notepad++ and see yourself the characters:
The inverse conversion is performed on Windows when reading the file in text mode.
We had an interesting problem with opening files in text mode where the files had a mixture of line ending characters:
1\n\r
2\n\r
3\n
4\n\r
5\n\r
Our requirement is that we can store our current position in the file (we used fgetpos), close the file and then later to reopen the file and seek to that position (we used fsetpos).
However, where a file has mixtures of line endings then this process failed to seek to the actual same position. In our case (our tool parses C++), we were re-reading parts of the file we'd already seen.
Go with binary - then you can control exactly what is read and written from the file.
In 'w' mode, the file is opened in write mode and the basic coding is 'utf-8'
in 'wb' mode, the file is opened in write -binary mode and it is resposible for writing other special characters and the encoding may be 'utf-16le' or others

How to write bytes individually into a file in C?

I have an array of ints where every cell is a number between 0-255 representing a byte I want to write to the output file.
I have written this simple loop:
for (int i = 0; i < len ; i++)
putc(write_buffer[i], OutputFile);
However after reviewing the output file using hex editor, I found that certain bytes have been duplicated and/or moved around within the file.
For example with the input array:
int write_buffer[10]= {137,80,78,71,13,10,26,10,0,0}
The output file contents (in decimal) are: 137 80 78 71 13 13 10 26 13 10 0 0
Does anyone have a clue as for what could be causing this?
Psychic debugging: You opened the file in text mode on Windows passing fopen with mode "wt" or, when _fmode is set to _O_TEXT, which is the default, "w", which means it's applying line ending translation, converting LF ('\n'/10) to CRLF ('\r' '\n', 13 10).
Change your fopen to pass a mode of "wb", not just "w"/"wt", so the file is operating in binary mode, and the translation won't be performed with any C stdio functions.
pts made a useful note in the comments: If you didn't open the file yourself (using stdout, or a handle opened by code you don't control), you can switch it to binary mode with setmode(fileno(OutputFile), O_BINARY); (for stdout, 1 can replace the first argument).

How to use FILE in binary and text mode in C [duplicate]

What translation occurs when writing to a file that was opened in text mode that does not occur in binary mode? Specifically in MS Visual C.
unsigned char buffer[256];
for (int i = 0; i < 256; i++) buffer[i]=i;
int size = 1;
int count = 256;
Binary mode:
FILE *fp_binary = fopen(filename, "wb");
fwrite(buffer, size, count, fp_binary);
Versus text mode:
FILE *fp_text = fopen(filename, "wt");
fwrite(buffer, size, count, fp_text);
I believe that most platforms will ignore the "t" option or the "text-mode" option when dealing with streams. On windows, however, this is not the case. If you take a look at the description of the fopen() function at: MSDN, you will see that specifying the "t" option will have the following effect:
line feeds ('\n') will be translated to '\r\n" sequences on output
carriage return/line feed sequences will be translated to line feeds on input.
If the file is opened in append mode, the end of the file will be examined for a ctrl-z character (character 26) and that character removed, if possible. It will also interpret the presence of that character as being the end of file. This is an unfortunate holdover from the days of CPM (something about the sins of the parents being visited upon their children up to the 3rd or 4th generation). Contrary to previously stated opinion, the ctrl-z character will not be appended.
In text mode, a newline "\n" may be converted to a carriage return + newline "\r\n"
Usually you'll want to open in binary mode. Trying to read any binary data in text mode won't work, it will be corrupted. You can read text ok in binary mode though - it just won't do automatic translations of "\n" to "\r\n".
See fopen
Additionally, when you fopen a file with "rt" the input is terminated on a Crtl-Z character.
Another difference is when using fseek
If the stream is open in binary mode, the new position is exactly offset bytes measured from the beginning of the file if origin is SEEK_SET, from the current file position if origin is SEEK_CUR, and from the end of the file if origin is SEEK_END. Some binary streams may not support the SEEK_END.
If the stream is open in text mode, the only supported values for offset are zero (which works with any origin) and a value returned by an earlier call to std::ftell on a stream associated with the same file (which only works with origin of SEEK_SET.
Even though this question was already answered and clearly explained, I think it would be interesting to show the main issue (translation between \n and \r\n) with a simple code example. Note that I'm not addressing the issue of the Crtl-Z character at the end of the file.
#include <stdio.h>
#include <string.h>
int main() {
FILE *f;
char string[] = "A\nB";
int len;
len = strlen(string);
printf("As you'd expect string has %d characters... ", len); /* prints 3*/
f = fopen("test.txt", "w"); /* Text mode */
fwrite(string, 1, len, f); /* On windows "A\r\nB" is writen */
printf ("but %ld bytes were writen to file", ftell(f)); /* prints 4 on Windows, 3 on Linux*/
fclose(f);
return 0;
}
If you execute the program on Windows, you will see the following message printed:
As you'd expect string has 3 characters... but 4 bytes were writen to file
Of course you can also open the file with a text editor like Notepad++ and see yourself the characters:
The inverse conversion is performed on Windows when reading the file in text mode.
We had an interesting problem with opening files in text mode where the files had a mixture of line ending characters:
1\n\r
2\n\r
3\n
4\n\r
5\n\r
Our requirement is that we can store our current position in the file (we used fgetpos), close the file and then later to reopen the file and seek to that position (we used fsetpos).
However, where a file has mixtures of line endings then this process failed to seek to the actual same position. In our case (our tool parses C++), we were re-reading parts of the file we'd already seen.
Go with binary - then you can control exactly what is read and written from the file.
In 'w' mode, the file is opened in write mode and the basic coding is 'utf-8'
in 'wb' mode, the file is opened in write -binary mode and it is resposible for writing other special characters and the encoding may be 'utf-16le' or others

How does C store the information into a file really?

In the simple code below, I'm writing an int number (10) into a file and then reading it back to make sure it's done successfully and it is. However, when I open the file (tried both notepad++ and vscode) I see something like this:
???
Here's the code:
int main(){
int var = 10;
FILE* fp = fopen("testfile","w");
rewind(fp);
fwrite(&var,sizeof(int),1,fp);
fflush(fp);
fclose(fp);
int var2 = 0;
fopen("testfile","r+");
fread(&var2,sizeof(int),1,fp);
printf("num: %d\n",var2);
return 0;
}
Of course I thought maybe it's written in a special format which vscode is unable to recognize, but recently I learned coding a simple database, and it used just the same way to save the records in files and when you opened its output file with vscode, it showed both ???s AND the information, however, here it shows only ???s WITHOUT the information. So although it seems be a very basic problem, I can't find the answer to it, so how is 10 really stored in that file? Thanks in advance.
When you write to the file with fwrite, it reads the raw bytes that make up var and writes those to disk. This is the binary representation of the number.
If you use a tool like od, it will print out the bytes the files contains:
[dbush#db-centos7 ~]$ od -tx1 testfile
0000000 0a 00 00 00
0000004
You can see here that the first byte contains the value 10 and the next 3 contain the value 0. This tells us that an int takes up 4 bytes and is stored in little-endian format, meaning the least significant byte comes first.
Had you instead uses fprintf to write the value:
fprintf(fp, "%d\n", var);
It would have written the text representation to the file. The file would then look something like this:
[dbush#db-centos7 ~]$ cat testfile
10
[dbush#db-centos7 ~]$ od -tx1 testfile
0000000 31 30 0a
0000003
We can see here that printing the file shows readable text, and od shows us the ASCII codes for the characters '1' and '0', as well as a newline.
You are writing a binary file. It cannot be read with an editor. The value 10 is probably stored as 0x0000000A or 0x0A000000 something like that, depending on if the system is big or small endian.
But the point is that it is stored in binary format and not text format.
If you open this file in a text editor, it will likely be interpreted as three NULL characters and then a LF (line feed) character.

Reading a jpeg file byte by byte

For the class cs50, I have to read in jpeg files byte by byte from a memory card in order to look at the header information. The file compiles well, but whenever I execute the file, it returns a "segmentation fault(core dumped)" message.
Edit) Okay, now I know why I have to use an "unsigned char" instead of "int*". Can someone tell me how I can store information into files within scope for this particular code? Right now, I am trying to store information outside of an if() condition, and I don't think the fread function is actually accessing the "image" file I opened.
#include <stdio.h>
#include <string.h>
#include <math.h>
FILE * image = NULL;
int main(int argc, char* argv[])
{
FILE* infile = fopen("card.raw", "r");
if (infile == NULL)
{
printf("Could not open.\n");
fclose(infile);
return 1;
}
unsigned char storage[512];
int number = 0;
int b = floor((number) / 100);
int c = floor(((number) - (b * 100))/ 10);
int d = floor(((number) - (b * 100) - (c * 10)));
int writing = 0;
char string[5];
char* extension = ".jpg";
while (fread(&storage, sizeof(storage), 1, infile))
{
if (storage == NULL)
{
break;
}
if (storage[0] == 0xff && storage[1] == 0xd8 && storage[2] == 0xff)
{
if (storage[3] == 0xe0 || storage[3] == 0xe1)
{
if (image != NULL)
{
fclose(image);
}
sprintf(string, "%d%d%d%s", b, c, d, extension);
image = fopen(string, "w");
number++;
writing = 1;
if (writing == 1 && storage != NULL)
{
fwrite(storage, sizeof(storage), 1, image);
}
}
}
if (writing == 1 && storage != NULL)
{
fwrite(storage, sizeof(storage), 1, image);
}
if (storage == NULL)
{
fclose(image);
}
}
fclose(image);
fclose(infile);
return 0;
}
This is the problem set just in case my explanation is not clear.
recover
In anticipation of this problem set, I spent the past several days snapping photos of people I know, all of which were saved by my
digital camera as JPEGs on a 1GB CompactFlash (CF) card. (It’s
possible I actually spent the past several days on Facebook instead.)
Unfortunately, I’m not very good with computers, and I somehow deleted
them all! Thankfully, in the computer world, "deleted" tends not to
mean "deleted" so much as "forgotten." My computer insists that the CF
card is now blank, but I’m pretty sure it’s lying to me.
Write in ~/Dropbox/pset4/jpg/recover.c a program that recovers these photos.
Ummm.
Okay, here’s the thing. Even though JPEGs are more complicated than BMPs, JPEGs have "signatures," patterns of bytes that distinguish
them from other file formats. In fact, most JPEGs begin with one of
two sequences of bytes. Specifically, the first four bytes of most
JPEGs are either
0xff 0xd8 0xff 0xe0 or 0xff 0xd8 0xff 0xe1
from first byte to fourth byte, left to right. Odds are, if you find one of these patterns of bytes on a disk known to store photos
(e.g., my CF card), they demark the start of a JPEG. (To be sure, you
might encounter these patterns on some disk purely by chance, so data
recovery isn’t an exact science.)
Fortunately, digital cameras tend to store photographs contiguously on CF cards, whereby each photo is stored immediately
after the previously taken photo. Accordingly, the start of a JPEG
usually demarks the end of another. However, digital cameras generally
initialize CF cards with a FAT file system whose "block size" is 512
bytes (B). The implication is that these cameras only write to those
cards in units of 512 B. A photo that’s 1 MB (i.e.,
1,048,576 B) thus takes up 1048576 ÷ 512 = 2048 "blocks" on a CF card. But so does a photo that’s, say, one byte smaller (i.e.,
1,048,575 B)! The wasted space on disk is called "slack space."
Forensic investigators often look at slack space for remnants of
suspicious data.
The implication of all these details is that you, the investigator, can probably write a program that iterates over a copy
of my CF card, looking for JPEGs' signatures. Each time you find a
signature, you can open a new file for writing and start filling that
file with bytes from my CF card, closing that file only once you
encounter another signature. Moreover, rather than read my CF card’s
bytes one at a time, you can read 512 of them at a time into a buffer
for efficiency’s sake. Thanks to FAT, you can trust that JPEGs'
signatures will be "block-aligned." That is, you need only look for
those signatures in a block’s first four bytes.
Realize, of course, that JPEGs can span contiguous blocks. Otherwise, no JPEG could be larger than 512 B. But the last byte of a
JPEG might not fall at the very end of a block. Recall the possibility
of slack space. But not to worry. Because this CF card was brand- new
when I started snapping photos, odds are it’d been "zeroed" (i.e.,
filled with 0s) by the manufacturer, in which case any slack space
will be filled with 0s. It’s okay if those trailing 0s end up in the
JPEGs you recover; they should still be viewable.
Now, I only have one CF card, but there are a whole lot of you! And so I’ve gone ahead and created a "forensic image" of the card,
storing its contents, byte after byte, in a file called card.raw . So
that you don’t waste time iterating over millions of 0s unnecessarily,
I’ve only imaged the first few megabytes of the CF card. But you
should ultimately find that the image contains 16 JPEGs. As usual, you
can open the file programmatically with
fopen , as in the below. FILE* file = fopen("card.raw", "r");
Notice, incidentally, that ~/Dropbox/pset4/jpg contains only recover.c, but it’s devoid of any code. (We leave it to you to decide
how to implement and compile recover!) For simplicity, you should
hard-code "card.raw" in your program; your program need not accept any
command-line arguments. When executed, though, your program should
recover every one of the JPEGs from card.raw, storing each as a
separate file in your current working directory. Your program should
number the files it
outputs by naming each , ###.jpg where ### is three-digit decimal number from 000 on up. (Befriend sprintf.) You need not try to
recover the JPEGs' original names. To
check whether the JPEGs your program spit out are correct, simply double-click and take a look! If each photo appears intact,
your operation was likely a success!
Odds are, though, the JPEGs that the first draft of your code spits out won’t be correct. (If you open them up and don’t see
anything, they’re probably not correct!) Execute the command below to
delete all JPEGs in your current working directory.
rm *.jpg
If you’d rather not be prompted to confirm each deletion, execute the command below instead.
rm -f *.jpg
Just be careful with that -f switch, as it "forces" deletion without prompting you.
int* storage[512];
You define a pointer to a memory location for 512 ints, but you don't actually reserve the space (only the pointer.
I suspect you just want
int storage[512];
After this, storage is still a pointer, but now it actually points to 512 ints. Though I still think you don't want this. You need 'bytes' not ints. The nearest C has are unsigned char. So the final declaration is:
unsigned char storage[512];
Why? Because read reads into consecutive bytes. If you read into ints, then you will read 4 bytes into each int (because an int occupies 4 bytes).
There are a number of problems in your program. The first is that you have not opened the file in binary mode.
The second is that you are doing unnecessary pointer arithmetic. Why not—
char buffer [BUFFERSIZE] ;
....
if (buffer [ii] == WHATEVER)

Resources