How to write bytes individually into a file in C?

I have an array of ints where every cell is a number between 0-255 representing a byte I want to write to the output file.
I have written this simple loop:
for (int i = 0; i < len ; i++)
putc(write_buffer[i], OutputFile);
However, after reviewing the output file in a hex editor, I found that certain bytes have been duplicated and/or moved around within the file.
For example with the input array:
int write_buffer[10] = {137, 80, 78, 71, 13, 10, 26, 10, 0, 0};
The output file contents (in decimal) are: 137 80 78 71 13 13 10 26 13 10 0 0
Does anyone have a clue as to what could be causing this?

Psychic debugging: you opened the file in text mode on Windows, either by passing fopen the mode "wt" or by passing "w" while _fmode is set to _O_TEXT (the default). Text mode applies line-ending translation on output, converting each LF ('\n', 10) to CRLF ('\r' '\n', 13 10).
Change your fopen to pass a mode of "wb", not just "w"/"wt", so the file is operating in binary mode, and the translation won't be performed with any C stdio functions.
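A minimal sketch of that fix, using the int array from the question. The helper name write_bytes and its path parameter are made up for illustration:

```c
#include <stdio.h>

/* Writes each int in values[] (expected range 0-255) as one byte.
   Returns 0 on success, -1 on failure. */
int write_bytes(const char *path, const int *values, size_t len)
{
    FILE *out = fopen(path, "wb");   /* "b": no newline translation */
    if (out == NULL)
        return -1;

    for (size_t i = 0; i < len; i++) {
        if (putc(values[i], out) == EOF) {
            fclose(out);
            return -1;
        }
    }
    return fclose(out) == 0 ? 0 : -1;
}
```

Called as write_bytes("out.bin", write_buffer, 10), this should leave exactly 10 bytes in the file, with the 10s no longer preceded by an extra 13.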
pts made a useful note in the comments: if you didn't open the file yourself (for example it is stdout, or a handle opened by code you don't control), you can switch it to binary mode with setmode(fileno(OutputFile), O_BINARY); (for stdout, 1 can replace the first argument). In current Microsoft C runtimes these names are spelled _setmode, _fileno, and _O_BINARY.
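A hedged sketch of that approach, written so it also compiles on platforms that have no text/binary distinction. The wrapper name set_binary_mode is hypothetical; _setmode, _fileno, and _O_BINARY are the Microsoft C runtime spellings:

```c
#include <stdio.h>
#ifdef _WIN32
#include <fcntl.h>   /* _O_BINARY */
#include <io.h>      /* _setmode */
#endif

/* Switches an already-open stream to binary mode.
   Returns 0 on success, -1 on failure. It does nothing on
   platforms that never translate line endings. */
int set_binary_mode(FILE *stream)
{
#ifdef _WIN32
    fflush(stream);   /* flush pending output before changing the mode */
    return _setmode(_fileno(stream), _O_BINARY) == -1 ? -1 : 0;
#else
    (void)stream;     /* nothing to do here */
    return 0;
#endif
}
```

Calling set_binary_mode(stdout) before the first write is the safest ordering, since data already written in text mode has already been translated.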

Related

fread is not reading whole file [duplicate]

What translation occurs when writing to a file that was opened in text mode that does not occur in binary mode? Specifically in MS Visual C.
unsigned char buffer[256];
for (int i = 0; i < 256; i++) buffer[i]=i;
int size = 1;
int count = 256;
Binary mode:
FILE *fp_binary = fopen(filename, "wb");
fwrite(buffer, size, count, fp_binary);
Versus text mode:
FILE *fp_text = fopen(filename, "wt");
fwrite(buffer, size, count, fp_text);
I believe that most platforms will ignore the "t" option or the "text-mode" option when dealing with streams. On Windows, however, this is not the case. If you take a look at the description of the fopen() function on MSDN, you will see that specifying the "t" option will have the following effect:
line feeds ('\n') will be translated to '\r\n' sequences on output
carriage return/line feed sequences will be translated to line feeds on input.
If the file is opened in append mode, the end of the file will be examined for a ctrl-z character (character 26) and that character removed, if possible. It will also interpret the presence of that character as being the end of file. This is an unfortunate holdover from the days of CP/M (something about the sins of the parents being visited upon their children up to the 3rd or 4th generation). Contrary to previously stated opinion, the ctrl-z character will not be appended.
In text mode, a newline "\n" may be converted to a carriage return + newline "\r\n"
Usually you'll want to open in binary mode. Trying to read binary data in text mode won't work: the data will be corrupted. You can read text fine in binary mode, though; it just won't perform the automatic translation of "\r\n" to "\n" on input (or of "\n" to "\r\n" on output).
See fopen
Additionally, when you fopen a file with "rt", the input is terminated at the first Ctrl-Z character.
Another difference is when using fseek:
If the stream is open in binary mode, the new position is exactly offset bytes measured from the beginning of the file if origin is SEEK_SET, from the current file position if origin is SEEK_CUR, and from the end of the file if origin is SEEK_END. Some binary streams may not support SEEK_END.
If the stream is open in text mode, the only supported values for offset are zero (which works with any origin) and a value returned by an earlier call to ftell on a stream associated with the same file (which only works with an origin of SEEK_SET).
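To illustrate the text-mode rule, here is a small sketch that only ever passes fseek a value obtained from ftell on the same stream. The function name reread_first_line is invented for this example:

```c
#include <stdio.h>

/* Records the current position with ftell(), reads a line, seeks back
   to the recorded position, and reads the same line again.
   Returns 0 if both reads succeeded, -1 otherwise. */
int reread_first_line(FILE *text, char *first, char *second, int size)
{
    long mark = ftell(text);              /* opaque value in text mode */
    if (mark == -1L)
        return -1;
    if (fgets(first, size, text) == NULL)
        return -1;
    if (fseek(text, mark, SEEK_SET) != 0) /* ftell value + SEEK_SET only */
        return -1;
    if (fgets(second, size, text) == NULL)
        return -1;
    return 0;
}
```

The value in mark should be treated as a token rather than a byte count. Doing arithmetic on it (for example mark + 10) is exactly what the text-mode rule does not support.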
Even though this question was already answered and clearly explained, I think it would be interesting to show the main issue (translation between \n and \r\n) with a simple code example. Note that I'm not addressing the issue of the Ctrl-Z character at the end of the file.
#include <stdio.h>
#include <string.h>

int main(void) {
    FILE *f;
    char string[] = "A\nB";
    int len = (int)strlen(string);

    printf("As you'd expect string has %d characters... ", len); /* prints 3 */
    f = fopen("test.txt", "w"); /* text mode */
    if (f == NULL) {
        perror("fopen");
        return 1;
    }
    fwrite(string, 1, len, f); /* on Windows "A\r\nB" is written */
    printf("but %ld bytes were written to file\n", ftell(f)); /* prints 4 on Windows, 3 on Linux */
    fclose(f);
    return 0;
}
If you execute the program on Windows, you will see the following message printed:
As you'd expect string has 3 characters... but 4 bytes were written to file
You can also open the file in a text editor such as Notepad++ with end-of-line characters shown and see the extra CR for yourself.
The inverse conversion is performed on Windows when reading the file in text mode.
We had an interesting problem with opening files in text mode where the files had a mixture of line ending characters:
1\n\r
2\n\r
3\n
4\n\r
5\n\r
Our requirement was that we could store our current position in the file (we used fgetpos), close the file, and later reopen it and seek to that position (we used fsetpos).
However, where a file has mixtures of line endings then this process failed to seek to the actual same position. In our case (our tool parses C++), we were re-reading parts of the file we'd already seen.
Go with binary - then you can control exactly what is read and written from the file.
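As a rough sketch of the binary-mode approach (not the original poster's tool), the helper below reads one line starting at a byte offset and reports the offset just past it, so the file can be closed and resumed later. It uses ftell/fseek with plain long offsets instead of fgetpos/fsetpos, and the name read_line_at is hypothetical:

```c
#include <stdio.h>

/* Opens path in binary mode, reads one line starting at byte offset
   start, and stores the offset of the following line in *next.
   Returns 0 on success, -1 on failure or end of file. */
int read_line_at(const char *path, long start, char *line, int size, long *next)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;

    int status = -1;
    if (fseek(f, start, SEEK_SET) == 0 && fgets(line, size, f) != NULL) {
        long pos = ftell(f);   /* a true byte offset in binary mode */
        if (pos != -1L) {
            *next = pos;
            status = 0;
        }
    }
    fclose(f);
    return status;
}
```

Because no translation happens, a line from a CRLF file keeps its '\r' before the '\n', and the caller has to strip it. That is the price of offsets that stay exact regardless of how line endings are mixed.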
In C, "w" opens the file for writing in text mode and "wb" opens it for writing in binary mode. Neither mode by itself selects a character encoding such as UTF-8 or UTF-16LE; the difference is only whether the runtime applies the text-mode translations described above.

How does C store the information into a file really?

In the simple code below, I'm writing an int number (10) into a file and then reading it back to make sure it's done successfully and it is. However, when I open the file (tried both notepad++ and vscode) I see something like this:
???
Here's the code:
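#include <stdio.h>

int main(void){
    int var = 10;
    FILE* fp = fopen("testfile","wb");   /* binary mode: raw bytes, no translation */
    if (fp == NULL) return 1;
    rewind(fp);
    fwrite(&var,sizeof(int),1,fp);
    fflush(fp);
    fclose(fp);
    int var2 = 0;
    fp = fopen("testfile","rb");         /* keep the new handle; the old one is closed */
    if (fp == NULL) return 1;
    fread(&var2,sizeof(int),1,fp);
    fclose(fp);
    printf("num: %d\n",var2);
    return 0;
}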
At first I thought it might be written in a special format that VS Code can't recognize. However, I recently learned to code a simple database that saved its records to files in the same way, and when I opened its output file in VS Code it showed both ???s and the information. Here it shows only ???s, without the information. It seems to be a very basic problem, but I can't find the answer to it: how is 10 really stored in that file? Thanks in advance.
When you write to the file with fwrite, it reads the raw bytes that make up var and writes those to disk. This is the binary representation of the number.
If you use a tool like od, it will print out the bytes the files contains:
[dbush@db-centos7 ~]$ od -tx1 testfile
0000000 0a 00 00 00
0000004
You can see here that the first byte contains the value 10 and the next 3 contain the value 0. This tells us that an int takes up 4 bytes and is stored in little-endian format, meaning the least significant byte comes first.
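The same view can be produced from inside a program. The sketch below formats the in-memory bytes of any object as hex pairs, roughly what od -tx1 prints for the file. The helper name format_bytes is an assumption of this example:

```c
#include <stdio.h>

/* Formats the bytes of an object as space-separated hex pairs, in the
   order they sit in memory. out needs at least 3 * size bytes.
   Returns 0 on success, -1 if the output buffer is too small. */
int format_bytes(const void *object, size_t size, char *out, size_t out_size)
{
    const unsigned char *bytes = object;

    if (out_size == 0 || out_size < size * 3)
        return -1;
    out[0] = '\0';
    for (size_t i = 0; i < size; i++) {
        snprintf(out + i * 3, 3, "%02x", bytes[i]);
        if (i + 1 < size)
            out[i * 3 + 2] = ' ';   /* separator; next pair follows it */
    }
    return 0;
}
```

Passing &var and sizeof var for an int holding 10 would be expected to give "0a 00 00 00" on a little-endian machine with 4-byte ints, and "00 00 00 0a" on a big-endian one.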
Had you instead used fprintf to write the value:
fprintf(fp, "%d\n", var);
It would have written the text representation to the file. The file would then look something like this:
[dbush@db-centos7 ~]$ cat testfile
10
[dbush@db-centos7 ~]$ od -tx1 testfile
0000000 31 30 0a
0000003
We can see here that printing the file shows readable text, and od shows us the ASCII codes for the characters '1' and '0', as well as a newline.
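For completeness, a small sketch of the text route in both directions: fprintf writes the digits and fscanf parses them back. The function name roundtrip_as_text and the path parameter are illustrative only:

```c
#include <stdio.h>

/* Writes value as decimal text, then reads the file and parses it back.
   Returns 0 and stores the parsed number in *result on success,
   -1 on any failure. */
int roundtrip_as_text(const char *path, int value, int *result)
{
    FILE *fp = fopen(path, "w");
    if (fp == NULL)
        return -1;
    int wrote = fprintf(fp, "%d\n", value);   /* e.g. '1' '0' '\n' */
    if (fclose(fp) != 0 || wrote < 0)
        return -1;

    fp = fopen(path, "r");
    if (fp == NULL)
        return -1;
    int converted = fscanf(fp, "%d", result); /* 1 means one int parsed */
    fclose(fp);
    return converted == 1 ? 0 : -1;
}
```

A file written this way is readable in any editor, at the cost of a size that depends on how many digits the number has.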
You are writing a binary file, which a text editor cannot display meaningfully. The value 10 is probably stored as the bytes 0A 00 00 00 or 00 00 00 0A, depending on whether the system is little or big endian.
But the point is that it is stored in binary format and not text format.
If you open this file in a text editor, it will likely be interpreted as an LF (line feed) character followed by three NUL characters on a little-endian system, or as three NUL characters followed by an LF on a big-endian one.

odd append behaviour

If I have a file containing
manual
/lib/plymouth/themes/default.plymouth
/lib/plymouth/themes/spinfinity/spinfinity.plymouth
10
/lib/plymouth/themes/ubuntu-logo/ubuntu-logo.plymouth
100
and then I open it in "a" mode, then do
fprintf(f, "/el/derpito.plymouth\n100\n");
why is the file now containing this?
manual
/lib/plymouth/themes/default.plymouth
/lib/plymouth/themes/spinfinity/spinfinity.plymouth
10
/lib/plymouth/themes/ubuntu-logo/ubuntu-logo.plymouth
100
/el/derpito.plymouth
100
I'd expect the file to be this instead:
manual
/lib/plymouth/themes/default.plymouth
/lib/plymouth/themes/spinfinity/spinfinity.plymouth
10
/lib/plymouth/themes/ubuntu-logo/ubuntu-logo.plymouth
100
/el/derpito.plymouth
100
Perhaps there is a lonely \n or \r stuck at the end of the file before you make the write. I would open it with a hex editor and see.
My first guess would be that the last character of your file (before the appending) is a newline character, after which the append adds the new line.
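If a hex editor is not handy, the same check can be done in code before appending. This is a sketch under the assumption that the file supports seeking from the end; the name ends_with_newline is made up here:

```c
#include <stdio.h>

/* Returns 1 if the last byte of the file is '\n', 0 if it is not
   (or the file is empty), and -1 if the file cannot be opened. */
int ends_with_newline(const char *path)
{
    FILE *f = fopen(path, "rb");   /* binary: look at the raw last byte */
    if (f == NULL)
        return -1;

    int result = 0;
    if (fseek(f, -1L, SEEK_END) == 0) {   /* fails on an empty file */
        int last = getc(f);
        result = (last == '\n');
    }
    fclose(f);
    return result;
}
```

A caller could use the result to decide whether to write a leading "\n" before the appended text.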

fread Only first 5 bytes of .PNG file

I've made a simple resource packer for packing the resources for my game into one file. Everything was going fine until I began writing the unpacker.
I noticed the .txt file - 26 bytes - that I had packed came out of the resource file fine, without any issues, all data preserved.
However when reading the .PNG file I had packed in the resource file, the first 5 bytes were intact while the rest was completely nullified.
I traced this down to the packing process, and I noticed that fread is only reading the first 5 bytes of the .PNG file and I can't for the life of me figure out why. It even triggers 'EOF' indicating that the file is only 5 bytes long, when in fact it is a 787 byte PNG of a small polygon, 100px by 100px.
I even tested this problem by making a separate application that simply reads this PNG file into a buffer. I get the same results: only 5 bytes are read.
Here is the code of that small separate application:
#include <cstdio>
int main(int argc, char** argv)
{
char buffer[1024] = { 0 };
FILE* f = fopen("test.png", "r");
fread(buffer, 1, sizeof(buffer), f);
fclose(f); //<- I use a breakpoint here to verify the buffer contents
return 0;
}
Can somebody please point out my stupid mistake?
Windows platform, I guess?
Use this:
FILE* f = fopen("test.png", "rb");
instead of this:
FILE* f = fopen("test.png", "r");
See msdn for explanation.
Extending the correct answer from SigTerm, here is some background of why you got the effect you did for opening a PNG file in text mode:
The PNG format explains its 8-byte file header as follows:
The first eight bytes of a PNG file always contain the following values:
(decimal) 137 80 78 71 13 10 26 10
(hexadecimal) 89 50 4e 47 0d 0a 1a 0a
(ASCII C notation) \211 P N G \r \n \032 \n
This signature both identifies the file as a PNG file and provides for immediate detection of common file-transfer problems. The first two bytes distinguish PNG files on systems that expect the first two bytes to identify the file type uniquely. The first byte is chosen as a non-ASCII value to reduce the probability that a text file may be misrecognized as a PNG file; also, it catches bad file transfers that clear bit 7. Bytes two through four name the format. The CR-LF sequence catches bad file transfers that alter newline sequences. The control-Z character stops file display under MS-DOS. The final line feed checks for the inverse of the CR-LF translation problem.
I believe that in text mode, the call to fread() was terminated when it reached the seventh byte, which contains a Ctrl+Z character. The CR-LF pair in bytes five and six had already been translated to a single LF, which is why only 5 bytes were delivered. Ctrl+Z was historically used in MS-DOS (and in CP/M before it) to indicate the end of a file, which was necessary because the file system stored the size of a file as a count of blocks, not a count of bytes.
By reading the file in text mode instead of binary mode, you triggered the protection against accidentally using the TYPE command to display a PNG file.
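The signature quoted above also makes a handy self-test for an unpacker: if the first eight bytes read back do not match, the data was probably translated somewhere. A minimal sketch, with has_png_signature as an invented helper name:

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if data starts with the 8-byte PNG signature, else 0. */
int has_png_signature(const unsigned char *data, size_t length)
{
    static const unsigned char signature[8] =
        {137, 80, 78, 71, 13, 10, 26, 10};

    return length >= sizeof signature
        && memcmp(data, signature, sizeof signature) == 0;
}
```

The 5 bytes that a text-mode read delivers (137 80 78 71 10) fail this check on both length and content.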
One thing that would have helped diagnose this error is to test the return value from fread(), which your code ignores. You should capture it like this:
...
size_t nread;
...
nread = fread(buffer, 1, sizeof(buffer), f);
so that nread is a count of the bytes actually written to the buffer (with an element size of 1, the returned element count is a byte count). For the PNG file in text mode, it would have told you on the first read that it only read 5 bytes. Since the file cannot be that small, you would have had a clue that something else was going on. The remaining bytes of the buffer were never modified by fread(), which you would have seen if you had initialized the buffer to some other fill value.
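Putting the two pieces of advice together (binary mode plus a checked return value), a reader might look roughly like this. The function name read_file_bytes and the hit_error out-parameter are assumptions of this sketch:

```c
#include <stdio.h>

/* Reads up to capacity bytes from path in binary mode.
   Returns the number of bytes read. *hit_error is set to 1 if the
   file could not be opened or a read error occurred, else to 0. */
size_t read_file_bytes(const char *path, unsigned char *buffer,
                       size_t capacity, int *hit_error)
{
    *hit_error = 0;
    FILE *f = fopen(path, "rb");
    if (f == NULL) {
        *hit_error = 1;
        return 0;
    }
    /* element size 1, so the return value is a byte count */
    size_t nread = fread(buffer, 1, capacity, f);
    if (nread < capacity && ferror(f))
        *hit_error = 1;   /* short read caused by an error, not EOF */
    fclose(f);
    return nread;
}
```

For the 787-byte PNG from the question, a return value of 787 with no error would confirm that the whole file arrived. A return value of 5 would point straight at the text-mode problem.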

Resources