How does C store the information into a file really? - c

In the simple code below, I'm writing an int number (10) into a file and then reading it back to make sure it's done successfully and it is. However, when I open the file (tried both notepad++ and vscode) I see something like this:
???
Here's the code:
int main(){
int var = 10;
FILE* fp = fopen("testfile","w");
rewind(fp);
fwrite(&var,sizeof(int),1,fp);
fflush(fp);
fclose(fp);
int var2 = 0;
fopen("testfile","r+");
fread(&var2,sizeof(int),1,fp);
printf("num: %d\n",var2);
return 0;
}
Of course I thought maybe it's written in a special format which vscode is unable to recognize, but recently I learned coding a simple database, and it used just the same way to save the records in files and when you opened its output file with vscode, it showed both ???s AND the information, however, here it shows only ???s WITHOUT the information. So although it seems be a very basic problem, I can't find the answer to it, so how is 10 really stored in that file? Thanks in advance.

When you write to the file with fwrite, it reads the raw bytes that make up var and writes those to disk. This is the binary representation of the number.
If you use a tool like od, it will print out the bytes the files contains:
[dbush#db-centos7 ~]$ od -tx1 testfile
0000000 0a 00 00 00
0000004
You can see here that the first byte contains the value 10 and the next 3 contain the value 0. This tells us that an int takes up 4 bytes and is stored in little-endian format, meaning the least significant byte comes first.
Had you instead uses fprintf to write the value:
fprintf(fp, "%d\n", var);
It would have written the text representation to the file. The file would then look something like this:
[dbush#db-centos7 ~]$ cat testfile
10
[dbush#db-centos7 ~]$ od -tx1 testfile
0000000 31 30 0a
0000003
We can see here that printing the file shows readable text, and od shows us the ASCII codes for the characters '1' and '0', as well as a newline.

You are writing a binary file. It cannot be read with an editor. The value 10 is probably stored as 0x0000000A or 0x0A000000 something like that, depending on if the system is big or small endian.
But the point is that it is stored in binary format and not text format.
If you open this file in a text editor, it will likely be interpreted as three NULL characters and then a LF (line feed) character.

Related

How to write bytes individually into a file in C?

I have an array of ints where every cell is a number between 0-255 representing a byte I want to write to the output file.
I have written this simple loop:
for (int i = 0; i < len ; i++)
putc(write_buffer[i], OutputFile);
However after reviewing the output file using hex editor, I found that certain bytes have been duplicated and/or moved around within the file.
For example with the input array:
int write_buffer[10]= {137,80,78,71,13,10,26,10,0,0}
The output file contents (in decimal) are: 137 80 78 71 13 13 10 26 13 10 0 0
Does anyone have a clue as for what could be causing this?
Psychic debugging: You opened the file in text mode on Windows passing fopen with mode "wt" or, when _fmode is set to _O_TEXT, which is the default, "w", which means it's applying line ending translation, converting LF ('\n'/10) to CRLF ('\r' '\n', 13 10).
Change your fopen to pass a mode of "wb", not just "w"/"wt", so the file is operating in binary mode, and the translation won't be performed with any C stdio functions.
pts made a useful note in the comments: If you didn't open the file yourself (using stdout, or a handle opened by code you don't control), you can switch it to binary mode with setmode(fileno(OutputFile), O_BINARY); (for stdout, 1 can replace the first argument).

What does the last byte after doing `fwrite()` mean?

I'm playing around with binary files in C, and I can't work out one thing - why does the file end in this byte 00001010 (that equals a 10)?
My code is in essence the following (simplified).
FILE *test = fopen("file.b", "ab");
int value = 1;
fwrite(&value, sizeof(int), 1, test);
fclose(test);
After running the program, file.b looks like this (with the help of vim :%!xxd -b).
00000001 00000000 00000000 00000000 00001010
The trailing byte happens independent of the type I choose to write.
10 is a newline. vim automatically appends a newline (if the file didn't end in a newline) when you filter it through xxd.
Since you are treating it as a binary file you should tell vim it is a binary file with vim -b so the newline isn't added automatically.
Take a look at :h binary

Wrong fprintf output in c

So here is the code that I am attempting to get to work:
char* inFile = "input.txt";
FILE *out = fopen("output.txt", "a+");
int i = 0
while(i < 5){
int countFound = findWord(inFile, keyWord[i]);//returns count of keywords in given file
fprintf(out, "%s: %d\n", keyWord[i], countFound);
i++;
}
fclose(out);
The output of this code is:
youshouldsee1
: 3
youshouldsee2
: 3
youshouldsee3
: 3
youshouldsee4
: 3
youshouldsee5: 1
Expected output:
youshouldsee1: 3
youshouldsee2: 3
youshouldsee3: 3
youshouldsee4: 3
youshouldsee5: 1
I don't really understand why the output is like that, shouldn't it print the string and the int then a new line? Also note that there is not a newline after the last line and there should be. I did some testing and I noticed that if I changed the fprintf statement to fprintf(out, "%s\n", keyWord[i]); the output is:
youshouldsee1
youshouldsee2
youshouldsee3
youshouldsee4
youshouldsee5
Which is formatted much better. Again note that there is not a newline after the last line and there should be.
I noticed that while doing this with just printf statements I get the exact same problem, but the output is slightly more messed up.
Does anybody know what causes this specific issue? Much appreciated.
The array keyWord[] is a double pointer, I'm not sure if that makes a difference or not, but I thought that I would mention it. It is declared like so char** keyWord;. And it was created as follows:
char *tempWord = "something";
keyWords[x] = strdup(tempWord);
That could be totally irrelevent but I thought it was best to mention it.
You likely have some carriage return characters ('\r' or 0x0D) and/or backspace characters ('\b' or 0x08) in your keyWord[i] strings.
CRs and backspaces don't print ordinarily to terminals and instead move the terminal cursor—CRs move the cursor to the beginning of the current line, and backspaces move the cursor backwards one character. When subsequent characters are printed, they overwrite what was there before. So this code
printf("foobar\rbaz\n");
results in this output
bazbar
and this code
printf("foo\bbar\n");
results in this output
fobar
If you look at the raw bytes of the output without printing it to the terminal, you should see the CRs and backspaces plain as day. I like to use the program hexdump(1) to do that. For example:
$ echo -e 'foobar\rbaz'
bazbar
$ echo -e 'foobar\rbaz' | hexdump -C
00000000 66 6f 6f 62 61 72 0d 62 61 7a 0a |foobar.baz.|
0000000b
So I'd suggest looking at the raw data in your program's output and find out where the pesky characters are, and then figure out how to get rid of them.

fwrite fails to write value 10 (0x0A)

I was experimenting on creating BMP files from scratch, when I found a weird bug I could not explain. I isolated the bug in this minimalist program:
int main()
{
FILE* ptr=NULL;
int success=0,pos=0;
ptr=fopen("test.bin","w");
if (ptr==NULL)
{
return 1;
}
char c[3]={10,11,10};
success=fwrite(c,1,3,ptr);
pos=ftell(ptr);
printf("success=%d, pos=%d\n",success,pos);
return 0;
}
the output is:
success=3, pos=5
with hex dump of the test.bin file being:
0D 0A 0B 0D 0A
In short, whatever value you put instead of 11 (0x0B), fwrite will write it correctly. But for some reason, when fwrite comes across a 10 (0x0A) - and precisely this value - it writes 0D 0A instead, that is, 2 bytes, although I clearly specified 1 byte per write in the fwrite arguments. Thus the 3 bytes written, as can be seen in the success variable, and the mysterious 5 in the ftell result.
Could someone please tell me what the heck is going on here...and why 10, why not 97 or 28??
Thank you very much for your help!
EDIT: oh wait, I think I have an idea...isn't this linked to \n being 0A on Unix, and 0D 0A on windows, and some inner feature of the compiler converting one to the other? how can I force it to write exactly the bytes I want?
Your file was opened in text mode so CRLF translation is being done. Try:
fopen("test.bin","wb");
You must be working on the Windows machine. In Windows, EOL is CR-LF whereas in Unix, it is a single character. Your system is replacing 0A with 0D0A.

fread Only first 5 bytes of .PNG file

I've made a simple resource packer for packing the resources for my game into one file. Everything was going fine until I began writing the unpacker.
I noticed the .txt file - 26 bytes - that I had packed, came out of the resource file fine, without anyway issues, all data preserved.
However when reading the .PNG file I had packed in the resource file, the first 5 bytes were intact while the rest was completely nullified.
I traced this down to the packing process, and I noticed that fread is only reading the first 5 bytes of the .PNG file and I can't for the life of me figure out why. It even triggers 'EOF' indicating that the file is only 5 bytes long, when in fact it is a 787 byte PNG of a small polygon, 100px by 100px.
I even tested this problem by making a separate application to simply read this PNG file into a buffer, I get the same results and only 5-bytes are read.
Here is the code of that small separate application:
#include <cstdio>
int main(int argc, char** argv)
{
char buffer[1024] = { 0 };
FILE* f = fopen("test.png", "r");
fread(buffer, 1, sizeof(buffer), f);
fclose(f); //<- I use a breakpoint here to verify the buffer contents
return 0;
}
Can somebody please point out my stupid mistake?
Can somebody please point out my stupid mistake?
Windows platform, I guess?
Use this:
FILE* f = fopen("test.png", "rb");
instead of this:
FILE* f = fopen("test.png", "r");
See msdn for explanation.
Extending the correct answer from SigTerm, here is some background of why you got the effect you did for opening a PNG file in text mode:
The PNG format explains its 8-byte file header as follows:
The first eight bytes of a PNG file always contain the following values:
(decimal) 137 80 78 71 13 10 26 10
(hexadecimal) 89 50 4e 47 0d 0a 1a 0a
(ASCII C notation) \211 P N G \r \n \032 \n
This signature both identifies the file as a PNG file and provides for immediate detection of common file-transfer problems. The first two bytes distinguish PNG files on systems that expect the first two bytes to identify the file type uniquely. The first byte is chosen as a non-ASCII value to reduce the probability that a text file may be misrecognized as a PNG file; also, it catches bad file transfers that clear bit 7. Bytes two through four name the format. The CR-LF sequence catches bad file transfers that alter newline sequences. The control-Z character stops file display under MS-DOS. The final line feed checks for the inverse of the CR-LF translation problem.
I believe that in text mode, the call to fread() was terminated when it read the sixth byte which contains a Ctrl+Z character. Ctrl+Z was historically used in MSDOS (and in CPM before it) to indicate the end of a file, which was necessary because the file system stored the size of a file as a count of blocks, not a count of bytes.
By reading the file in text mode instead of binary mode, you triggered the protection against accidentally using the TYPE command to display a PNG file.
One thing you could do that would have helped diagnose this error is to use fread() slightly differently. You didn't test the return value from fread(). You should. Further, you should call it like this:
...
size_t nread;
...
nread = fread(buffer, sizeof(buffer), 1, f);
so that nread is a count of the bytes actually written to the buffer. For the PNG file in text mode, it would have told you on the first read that it only read 5 bytes. Since the file cannot be that small, you would have had a clue that something else was going on. The remaining bytes of the buffer were never modified by fread(), which would have been seen if you initialized the buffer to some other fill value.

Resources