How fread() and fwrite() work in C programming

I am working on a flat-file database project in C. I have created a structure with a few members. I am using fwrite() to write them to a binary file and fread() to fetch the data. My two major questions:
1st: can we write a structure to a text file? I have seen no good example. Is it practically wrong to write it in text format? When I write using "w" instead of "wb" I get the text format but with some extra words.
2nd: how do fread() and fwrite() work? They operate on a block of data; how do they get the address of the next block? I mean, we do have the pointer, but the file doesn't have any address, so how does the pointer go to the next block?

1st: can we write a structure to a text file? I have seen no good
example. Is it practically wrong to write it in text format? When I write
using "w" instead of "wb" I get the text format but with some extra
words.
Imagine your structure contains some integers. If you write them using fwrite, these integers will be written to the file in binary format.
If you try to interpret this as text, it won't work: a text editor will try to interpret the binary values as characters, which will most likely not look the way you expect.
E.g. if your structure contains the integer 3, when written using fwrite it will be stored as
00000000 00000000 00000000 00000011 (3 in binary)
assuming a 32-bit int and big-endian byte order. If you try to read the above using a text editor, you will of course not get the desired effect.
And that is before considering the padding bytes which may be inserted into your structure by the compiler.
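To make this concrete, here is a minimal sketch (the helper names int_file_bytes and print_int_file_bytes are made up for illustration) that writes one int with fwrite() to a temporary file and reads the raw bytes back. For the value 3, exactly one byte is 0x03 and the others are zero; which position holds it depends on the machine's byte order.

```c
#include <stdio.h>

/* Hypothetical helper: write `value` with fwrite() to a temporary file,
 * then read the file back byte by byte into `out` (capacity `cap`).
 * Returns the number of bytes stored, or 0 on error. */
size_t int_file_bytes(int value, unsigned char *out, size_t cap)
{
    FILE *fp = tmpfile();            /* opened in binary update mode */
    size_t n = 0;

    if (fp == NULL)
        return 0;
    if (fwrite(&value, sizeof value, 1, fp) == 1) {
        rewind(fp);                  /* back to the start before reading */
        n = fread(out, 1, cap, fp);  /* raw bytes, no interpretation */
    }
    fclose(fp);
    return n;
}

/* Print the stored bytes in hex; the order of the bytes depends on the
 * machine's endianness. */
void print_int_file_bytes(int value)
{
    unsigned char buf[sizeof(int)];
    size_t n = int_file_bytes(value, buf, sizeof buf);

    for (size_t i = 0; i < n; i++)
        printf("%02x ", buf[i]);
    printf("\n");
}
```

Calling print_int_file_bytes(3) shows what a text editor would be handed: a few bytes that are not printable characters, rather than the character '3'.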
2nd: how do fread() and fwrite() work? They operate on a block of
data; how do they get the address of the next block? I mean, we do have the
pointer, but the file doesn't have any address, so how does the pointer go to the next
block?
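Each FILE stream has a file position indicator. Every fread() or fwrite() call advances it by the number of bytes transferred, so the next call continues where the previous one stopped. The C library and the OS keep track of this position for you; you can query it with ftell() and change it with fseek().
PS. I suggest you read more about serialization, and try to understand the difference between text and binary files.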
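A small sketch of how the position in the file moves on its own. The struct record type and the function demo_file_position are invented for this example; it writes three records to a temporary file and then reads them back one by one, recording ftell() after each read.

```c
#include <stdio.h>

struct record {                       /* made-up record type */
    int id;
    double value;
};

/* Writes three records to a temporary file, then reads them back one at a
 * time. positions[i] receives ftell() after the i-th fread(); ids[i]
 * receives the id that was read. Returns 0 on success, -1 on error. */
int demo_file_position(long positions[3], int ids[3])
{
    struct record recs[3] = { {1, 1.5}, {2, 2.5}, {3, 3.5} };
    struct record r;
    FILE *fp = tmpfile();

    if (fp == NULL)
        return -1;
    if (fwrite(recs, sizeof recs[0], 3, fp) != 3) {
        fclose(fp);
        return -1;
    }
    rewind(fp);                       /* position indicator back to 0 */
    for (int i = 0; i < 3; i++) {
        /* No "file address" is passed in: the stream remembers its own
         * position and advances it after each read. */
        if (fread(&r, sizeof r, 1, fp) != 1) {
            fclose(fp);
            return -1;
        }
        ids[i] = r.id;
        positions[i] = ftell(fp);     /* (i + 1) * sizeof(struct record) */
    }
    fclose(fp);
    return 0;
}
```

The same fread() call is issued three times with the same arguments, yet it returns three different records, because the stream's position has moved by sizeof(struct record) each time.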

Related

How do fread and fwrite distinguish between different data (types) in C?

I am working on a program in C (on Ubuntu, with bash) and using it to manipulate binary data files. First of all, when I use fopen(filename, "w") it creates a file without any extension. However, when I use vim filename it opens it up in some binary form.
For this question, when I use fwrite(array, sizeof(some struct), # of structs, filePointer) it writes into the file (in binary, though I am not sure how). When I use fread(anotherArray, sizeof(same struct), same # of structs, anotherFilePointer) it somehow magically knows how to read each struct in binary form and puts it into the array just by knowing its size and how much to read. What happens if I pass a value smaller than the actual number of structs in the # of structs parameter? How would fread know what to read correctly? How does it manage to read data just by looking at the sizes, without knowing what type of data it is?
fwrite writes the bytes of the memory where the object is stored to the output stream and fread reads bytes from the input stream into the memory whose address it gets as an argument. No assumption is made regarding the types and representations of the C objects stored in this memory.
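As an illustration of the count parameter, here is a hedged sketch (struct point and demo_partial_reads are names invented for this example). It writes five structs, then asks fread() for three, and then for five more when only two remain. fread() simply returns how many complete items it transferred.

```c
#include <stdio.h>

struct point { int x; int y; };       /* made-up example type */

/* Writes five points to a temporary file, then reads them back in two calls.
 * *first  = items returned when asking for 3
 * *second = items returned when asking for 5 more (only 2 remain)
 * `out` must have room for at least 8 points.
 * Returns 0 on success, -1 on error. */
int demo_partial_reads(size_t *first, size_t *second, struct point *out)
{
    struct point src[5] = { {0, 0}, {1, 10}, {2, 20}, {3, 30}, {4, 40} };
    FILE *fp = tmpfile();

    if (fp == NULL)
        return -1;
    if (fwrite(src, sizeof src[0], 5, fp) != 5) {
        fclose(fp);
        return -1;
    }
    rewind(fp);
    /* Asking for fewer items than the file holds reads just the first ones. */
    *first = fread(out, sizeof out[0], 3, fp);
    /* Asking for more than is left yields a short count, not an error code. */
    *second = fread(out + 3, sizeof out[0], 5, fp);
    fclose(fp);
    return 0;
}
```

Nothing in either call tells fread() that the bytes are points; it only multiplies the item size by the count and copies that many bytes, if available.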
Hence a number of problems can occur:
the representation of basic types can differ from one compiler to another, one machine to another, one OS to another, possibly even depending on compiler switches. Writing the bytes of the memory representation of basic types makes sense only if you know you will be reading the file back into byte-compatible structures.
the mode for accessing the input and output files matters: as you mention, files must be opened in binary mode to avoid any translation between memory representation and file contents, such as what happens for text files on legacy systems. For example, text mode on MS-Windows causes 0A bytes to be converted to 0D 0A sequences on output and 0D bytes to be stripped on input, resulting in different contents for isolated 0D bytes in the initial content.
if the C structure contains pointers, the bytes written to the output represent the value of these pointers, not what they point to. Reading these values back into memory is highly likely to create invalid pointers and very unlikely to make any sense.
if the C structure has a flexible array member at the end, its contents are not included in the sizeof(T) bytes written by fwrite or read by fread.
the C structure may contain padding between members, causing the output file to contain non-deterministic bytes, which might be a problem in some circumstances.
if the C structure has arrays with only partial meaningful contents, such as char arrays containing C strings, beware that fwrite will write the bytes beyond the null terminator, which should not be meaningful, but might be sensitive information such as password fragments or other meaningful data. Carefully erasing such arrays may avoid this issue, but padding bytes cannot be erased reliably, so this solution is not perfect.
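For the last point, a minimal sketch of erasing an object before filling it in. The struct account type and account_init function are hypothetical. As noted above, this clears the unused tail of the char array, but it is not a guarantee for padding bytes, which the compiler may leave in an unspecified state when members are stored later.

```c
#include <string.h>

struct account {                /* made-up example type */
    char name[32];
    int balance;
};

/* Zero the whole object first, so the bytes after the string terminator do
 * not carry stale memory contents into a file written with fwrite(). */
void account_init(struct account *a, const char *name, int balance)
{
    memset(a, 0, sizeof *a);
    /* Copy at most 31 characters; the last byte stays '\0' from memset. */
    strncpy(a->name, name, sizeof a->name - 1);
    a->balance = balance;
}
```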
For all the above reasons and other ones, reading/writing binary data is to be reserved to very specific cases where the programmer knows exactly what is happening. For other purposes, saving as text files in human readable form is much preferred.
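When a binary format is still wanted, one common way around the representation problem is to write each field byte by byte in a fixed order instead of dumping the struct. The sketch below is only an illustration under that assumption; write_u32_be and read_u32_be are names made up here, and big-endian order is an arbitrary choice.

```c
#include <stdint.h>
#include <stdio.h>

/* Write a 32-bit value as four bytes, most significant byte first,
 * regardless of the byte order of the machine. Returns 0 on success. */
int write_u32_be(FILE *fp, uint32_t v)
{
    unsigned char b[4];

    b[0] = (unsigned char)(v >> 24);
    b[1] = (unsigned char)(v >> 16);
    b[2] = (unsigned char)(v >> 8);
    b[3] = (unsigned char)v;
    return fwrite(b, 1, sizeof b, fp) == sizeof b ? 0 : -1;
}

/* Read four bytes and rebuild the value. Returns 0 on success. */
int read_u32_be(FILE *fp, uint32_t *v)
{
    unsigned char b[4];

    if (fread(b, 1, sizeof b, fp) != sizeof b)
        return -1;
    *v = ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
         ((uint32_t)b[2] << 8)  |  (uint32_t)b[3];
    return 0;
}
```

With these helpers the value 3 is always stored as the bytes 00 00 00 03, so a file written on one machine can be read on another with a different native byte order.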
From the question comments, by @David C. Rankin:
"Well, fread/fwrite read and write bytes (binary data: if you write out, then read in the same number of bytes, you get the same thing back). If you want to read and write text where you need to worry about line breaks, etc., use fgets/fputs or fprintf."
So I guess I can never know what I read in with fread unless I know what I wrote with fwrite?
"Right, look at the type of your buffer in the fwrite(3) Linux man page: it is void *. It's just a starting address for fwrite to use in writing however many bytes you told it to write (obviously you know what it is writing). The same goes for fread: it just reads bytes, and you have to know what you are reading (or at least the format of it). That's what binary I/O is about, it's all just bytes. It's up to you, the programmer, to know what you are writing and reading and how to unpack it. Otherwise, use formatted I/O and lines, words, etc."

Use magic.mgc from another language

I'm currently working on a project which involves reading file's magic files (without bindings). I'd like to know how it would be possible to read the file tests from the compiled binary magic.mgc directly, in another language (like Go), as I'm unsure of how its contents should be interpreted.
According to Christos Zoulas, main contributor of file:
If you want to use them directly you
need to understand the binary format (which changes over time) and load
it in your own data structures. [...] The code that parses the file is in apprentice.c. See check_buffer()
for the reader and apprentice_compile() for the writer. There is
a 4 byte magic number, followed by a 4 byte version number followed
by MAGIC_SETS (2) number of 4 byte counts one for each set (ascii,
binary) followed by an array of 'struct magic' entries, in native
byte format.
So that's the format one should expect! Nevertheless, it has to be parsed just like the raw files.
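As a rough sketch of the first step only, the leading fields described in the quote could be read like this. It is written in C here, but the same steps carry over to Go or any other language. The struct mgc_header type, the MGC_SETS macro and read_mgc_header are hypothetical names, not part of file's source. The sketch assumes the file was compiled on a machine with the same byte order as the reader, and it does not attempt to locate or decode the 'struct magic' entries, whose layout has to be taken from the matching version of file's source.

```c
#include <stdint.h>
#include <stdio.h>

#define MGC_SETS 2              /* hypothetical name; the quote gives 2 sets */

struct mgc_header {             /* hypothetical, for illustration only */
    uint32_t magic;             /* 4 byte magic number */
    uint32_t version;           /* 4 byte version number */
    uint32_t counts[MGC_SETS];  /* one 4 byte count per set */
};

/* Reads the leading fields in the byte order of the machine running this
 * code. Returns 0 on success, -1 on a short read. */
int read_mgc_header(FILE *fp, struct mgc_header *h)
{
    if (fread(&h->magic, sizeof h->magic, 1, fp) != 1)
        return -1;
    if (fread(&h->version, sizeof h->version, 1, fp) != 1)
        return -1;
    if (fread(h->counts, sizeof h->counts[0], MGC_SETS, fp) != MGC_SETS)
        return -1;
    return 0;
}
```

Since the format changes over time, a reader would want to check the version field against the version it was written for before going any further.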

Using fscanf in binary mode

Is using fscanf on a file opened in binary mode bad? I can't seem to find anything reasonable on the Internet. I am trying to open and read a PPM file and I've found this, but I am not sure if using fscanf is okay. And no, using netpbm is not an option.
Reading this with fread seems like a pain.
The scanf and fscanf functions are for reading characters, e.g., "1234", and converting them from a string to an integer. But integers are not stored as strings in a binary file; the actual bytes of the integer itself are stored. These need to be read directly into an integer with fread.
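For the PPM case in the question, the binary (P6) variant has a text header followed by raw pixel bytes, so one possible approach is to use fscanf for the header only and fread for the pixels. The sketch below is an illustration under several assumptions: the function name read_ppm_p6 is made up, the header contains no comment lines, and the maximum colour value is at most 255 (one byte per sample).

```c
#include <stdio.h>
#include <stdlib.h>

/* Reads a binary (P6) PPM from a stream opened with "rb".
 * On success returns a malloc'ed buffer of width * height * 3 bytes that
 * the caller must free; returns NULL on any failure. */
unsigned char *read_ppm_p6(FILE *fp, int *width, int *height)
{
    int maxval;
    unsigned char *pixels;
    size_t size;

    /* The header is text, so fscanf is appropriate for this part.
     * No trailing space in the format: it could swallow pixel bytes. */
    if (fscanf(fp, "P6 %d %d %d", width, height, &maxval) != 3)
        return NULL;
    if (*width <= 0 || *height <= 0 || maxval <= 0 || maxval > 255)
        return NULL;
    /* Exactly one whitespace byte separates the header from the pixels. */
    if (fgetc(fp) == EOF)
        return NULL;

    size = (size_t)*width * (size_t)*height * 3;
    pixels = malloc(size);
    if (pixels == NULL)
        return NULL;
    /* The pixel data is binary: read the bytes as they are. */
    if (fread(pixels, 1, size, fp) != size) {
        free(pixels);
        return NULL;
    }
    return pixels;
}
```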

How to check if fwrite() works in C?

I am trying to read 256 bytes of whatever data is in my input file, build a struct from that information, and then write that struct to my output file. Since I can't simply open and read the output file, what should I do to make sure I have successfully written the struct to my output file?
Check the return value from fwrite, i.e. read the manual page: http://www.cplusplus.com/reference/cstdio/fwrite/
You can also read the file back in, cast the data to your struct and check its values.
You've got several basic issues. Firstly, you need to verify that the data
you have read in is valid. Secondly, having constructed your data structure
in memory you need to write it out to a different file. Thirdly, you want
to guarantee that this has been written out correctly - but you don't say
what you are allowed to do in order to verify the output file's correctness.
Reading the data in is easy - fread() is your friend here. I would read it
all into a buffer (passed as a void *) of the appropriate size.
What I've done in similar data in / data out use-cases in the past is to
include a simple (depending on the application) checksum as the first element
in your output data structure. Use bcopy() or memcpy() to transfer your read-in
data to your output structure, then calculate the checksum on the data and
update the checksum field.
Finally, you fwrite() all that data to the output file, and check the return
value - the number of objects written out. If there is an error (number written
is less than desired), you need to check errno and handle the error case.
In my copy of the manual (Solaris 11.x), the error codes possible for
fwrite(3c) are those for fputc(3c).
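The checksum-then-write steps could look roughly like this. It is only a sketch: struct out_record, simple_checksum and write_record are names invented here, the additive checksum is the simplest possible choice, and the 256-byte payload size is taken from the question.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

struct out_record {             /* hypothetical output structure */
    uint32_t checksum;          /* checksum as the first element */
    unsigned char data[256];
};

/* Simple additive checksum; chosen only for illustration. */
uint32_t simple_checksum(const unsigned char *p, size_t n)
{
    uint32_t sum = 0;

    for (size_t i = 0; i < n; i++)
        sum += p[i];
    return sum;
}

/* Copies the 256 input bytes into the output structure, fills in the
 * checksum and writes the structure. Returns 0 on success, -1 if fwrite()
 * wrote fewer objects than requested. */
int write_record(FILE *fp, const unsigned char in[256])
{
    struct out_record rec;

    memcpy(rec.data, in, sizeof rec.data);
    rec.checksum = simple_checksum(rec.data, sizeof rec.data);
    if (fwrite(&rec, sizeof rec, 1, fp) != 1) {
        perror("fwrite");       /* errno describes the failure */
        return -1;
    }
    return 0;
}
```

A reader of the output file can recompute the checksum over the data bytes and compare it with the stored field to detect a damaged or truncated record.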
Finally finally, you can determine whether sufficient bytes have been written
to the output file by comparing the statbuf from a stat() call immediately
after opening the output file with one taken immediately after your
fwrite() + fclose() has returned.
If you opened the file for reading and writing, you should also be able to fseek to the beginning (offset 0) and re-read the file from there. That's assuming you're ok with opening the file for reading and writing together.
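A minimal sketch of that write-then-re-read check. The function name write_and_verify is made up; it assumes the stream was opened for update in binary mode (for example "w+b"), that writing starts at offset 0, and that size is greater than zero.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Writes `size` bytes from `data`, seeks back to the beginning, reads the
 * bytes again and compares them with the original.
 * Returns 0 if the file content matches, -1 otherwise. */
int write_and_verify(FILE *fp, const void *data, size_t size)
{
    unsigned char *check;
    int ok;

    if (fwrite(data, 1, size, fp) != size)
        return -1;
    /* On an update stream, a positioning call is required between a
     * write and a following read; fseek() to offset 0 provides it. */
    if (fseek(fp, 0L, SEEK_SET) != 0)
        return -1;
    check = malloc(size);
    if (check == NULL)
        return -1;
    ok = fread(check, 1, size, fp) == size &&
         memcmp(check, data, size) == 0;
    free(check);
    return ok ? 0 : -1;
}
```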
Check the return code as chux suggested. Open the input and output files in a hex editor and compare the first 256 bytes. Emacs has one that works well enough.

Writing structure into a file in C

I am reading and writing a structure to a text file, but the file is not readable. I have to write readable data into the file from the structure object.
Here is a little more detail about my code:
I have code which reads and writes a list of item names and codes to a file (file.txt). The code uses a linked list to read and write the data.
The data are stored in a structure object and then written to the file using fwrite.
The code works fine, but I need to write readable data into the text file.
Now file.txt looks like this:
㵅㡸䍏䥔䥆㘸䘠㵅㩃䠀\䵏㵈䑜㵅㡸䍏䥔䥆㘸䘠\㵅㩃䠀䵏㵈䑜㵅㡸䍏䥔䥆㘸䘠㵅㩃䠀䵏㵈\䑜㵅㡸䍏䥔䥆㘸䘠㵅㩃䠀䵏㵈䑜㵅㡸䍏䥔\䥆㘸䘠㵅㩃䠀䵏㵈
I am expecting the file to look like this:
pencil aaaa
Table bbbb
pen cccc
notebook nnnn
Here is the snippet:
struct Item
{
    char itemname[255];
    char dspidc[255];
    struct Item *ptrnext;
};
// Writing into the file
printf("\nEnter Itemname: ");
gets(ptrthis->itemname);
printf("\nEnter Code: ");
gets(ptrthis->dspidc);
fwrite(ptrthis, sizeof(*ptrthis), 1, fp);
// Reading from the file
while (fread(ptrthis, sizeof(*ptrthis), 1, fp) == 1)
{
    printf("\n%s %s", ptrthis->itemname, ptrthis->dspidc);
    ptrthis = ptrthis->ptrnext;
}
Writing the size of an array that is 255 bytes will write 255 bytes to the file (regardless of what you have stuffed into that array). If you want only the 'textual' portion of that array you need to use a facility that handles null terminators (e.g. printf, fprintf, ...).
Reading is then more complicated, as you need to set up a sentinel value that represents the end of a string.
This says nothing of the fact that you are also writing the value of a pointer (initialized or not) that will have no context or validity on the next read. Pointers (i.e. memory locations) are meaningful only within the currently executing process. Trying to use one process' memory address in another is definitely a bad idea.
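A hedged sketch of the text-based alternative: one "name code" pair per line, written with fprintf and read back with fscanf, with the newline and the separating space acting as sentinels. The struct item_text type and the two functions are invented for this example; it assumes neither field contains whitespace, and it deliberately leaves the list pointer out of the file.

```c
#include <stdio.h>

struct item_text {              /* hypothetical: Item's fields, no pointer */
    char itemname[255];
    char dspidc[255];
};

/* Writes one "itemname dspidc" line per item. Returns 0 on success. */
int save_items(FILE *fp, const struct item_text *items, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (fprintf(fp, "%s %s\n", items[i].itemname, items[i].dspidc) < 0)
            return -1;
    return 0;
}

/* Reads up to `max` items and returns how many were read. */
size_t load_items(FILE *fp, struct item_text *items, size_t max)
{
    size_t n = 0;

    /* %254s leaves room for the terminating '\0' in a 255-byte array. */
    while (n < max &&
           fscanf(fp, "%254s %254s", items[n].itemname, items[n].dspidc) == 2)
        n++;
    return n;
}
```

When rebuilding the linked list, the ptrnext links would be set up again in memory as each item is loaded, rather than being stored in the file.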
The code works fine
Not really:
a) You are dumping the raw contents of the struct to a file, including the pointer to another instance of Item. You cannot expect to read a pointer back in from disk and use it as you do with ptrthis = ptrthis->ptrnext (this appears to work in the given snippet, but only because the snippet does nothing meaningful with it).
b) You are writing 2 * 255 bytes of potential garbage to the file. The reason you see these strange-looking "blocks" in your file is that you write all 255 bytes of itemname and all 255 bytes of dspidc to disk, including the terminating \0 characters (which show up as the blocks, depending on your editor). The real "string" is something meaningful at the beginning of either itemname or dspidc, followed by a \0, followed by whatever was in memory before.
The term you need to look up and read about is serialization. There are libraries which already solve the task of dumping data structures to disk (or the network, or anything else) and reading them back in, e.g. tpl.
First of all, I would only serialize the data, not the pointers.
Then, in my opinion, you have 2 choices:
write a parser for your syntax (with yacc, for instance)
use a data dumping format such as the RMI serialization mechanism.
Sorry I can't find online docs, but I know I have the grammar on paper.
Both of those solutions will be platform independent, whether the platform is big endian or little endian.

Resources