writing data structure to a file - c

I know the following approach may not be portable, but that is exactly what I want to find out now.
Imagine I have some data structure
struct student
{
    char name[20];
    int age;
} x;
Now I want to write it to a file like this:
fwrite(&x, sizeof(struct student), 1, filePointer);
Read similarly:
fread(voidPointer, sizeof(struct student), 1, filePointer);
// Now possibly do a memcpy
memcpy(studentObjectPointer, voidPointer, sizeof(struct student));
My question is: Say I don't want to copy this file to another computer, and I will read it from the same computer that created this file.
Will the portability issues (endianness, structure packing) still apply to the above approach, or will it work fine?

If the file is to be copied to other machines, you will have to build your own serializer and deserializer. With the structure you gave, this is quite simple.
You have to define which endianness to adopt when writing numbers (here, the int age).
Here, you could proceed like this:
Open the file in binary mode
Write the 20 bytes of the name string
Write the age in a chosen endianness (say, big-endian)
When reading it back, you will have to convert from that big-endian representation to the local endianness of the machine.
There is still a remaining issue: if sizeof(int) is not the same on the two machines, things get more vicious and the above is not sufficient.
If you really need to be portable across a wide range of machines, consider using the fixed-width types defined in <stdint.h>, such as int32_t.
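To make that concrete, here is a hedged sketch (assuming the age is stored as a fixed-width int32_t, as suggested above, and written in big-endian order; the helper names are made up):

#include <stdint.h>
#include <stdio.h>

struct student
{
    char    name[20];
    int32_t age;   /* fixed width, so writer and reader agree on the size */
};

/* Write one record: the 20 name bytes, then the age as 4 big-endian bytes. */
static int write_student(FILE *fp, const struct student *s)
{
    unsigned char age_be[4];
    age_be[0] = (unsigned char)((uint32_t)s->age >> 24);
    age_be[1] = (unsigned char)((uint32_t)s->age >> 16);
    age_be[2] = (unsigned char)((uint32_t)s->age >> 8);
    age_be[3] = (unsigned char)((uint32_t)s->age);
    return fwrite(s->name, 1, sizeof s->name, fp) == sizeof s->name
        && fwrite(age_be, 1, sizeof age_be, fp) == sizeof age_be;
}

/* Read one record back, rebuilding the age from its big-endian bytes
 * regardless of the local endianness. */
static int read_student(FILE *fp, struct student *s)
{
    unsigned char age_be[4];
    if (fread(s->name, 1, sizeof s->name, fp) != sizeof s->name ||
        fread(age_be, 1, sizeof age_be, fp) != sizeof age_be)
        return 0;
    s->age = (int32_t)((uint32_t)age_be[0] << 24 | (uint32_t)age_be[1] << 16 |
                       (uint32_t)age_be[2] << 8  | (uint32_t)age_be[3]);
    return 1;
}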

Remember, we're living in a 4-dimensional world. You said the saved file will not move along the x, y and z axes; it will be used on the same computer. But you did not mention the 4th dimension, time. If the structure changes, it will fail.
At least, you should put a signature field in the structure. Fill it with a constant value before write(), check it right after read(), and change the constant value when the structure gets modified.
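As a small illustration of that idea (the field name and the magic constant below are made up, not from the original structure):

#include <stdint.h>

#define STUDENT_FILE_MAGIC 0x53545531u  /* change this constant whenever the layout changes */

struct student_file_rec
{
    uint32_t magic;     /* signature: filled before fwrite(), checked right after fread() */
    char     name[20];
    int32_t  age;
};

/* Write: rec.magic = STUDENT_FILE_MAGIC; fwrite(&rec, sizeof rec, 1, fp);
 * Read:  fread(&rec, sizeof rec, 1, fp); reject the file if rec.magic != STUDENT_FILE_MAGIC. */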

Related

C How fread reads different data blocks in a binary file?

I'm porting some C code to C#. I know little about C, but I'm flexible and can learn new programming languages. Still, I wasn't able to figure out the exact behaviour of the code I'm porting.
I've read about fread() on the web.
fread(&(targetObj->data), sizeof(TestObj), 1, file);
Now, file is a big binary file with lots of data in it.
What I want to know is how I can do this in C#.
Let me explain:
I think that line of code does this:
TestObj is an unsigned short
reads, once, a chunk of data the size of TestObj (an unsigned short)
reads it from file (which is a pointer to a binary file on the filesystem) into targetObj->data
What I don't understand is:
I have a big binary file; what does it actually read? Are there headers somewhere that define where an unsigned-short-sized chunk of data is written?
Where in the binary does it take that object from? How can I know how to read it back from the binary file in C#? Maybe C knows where to pick that single unsigned short, but I don't in C#.
For example, if that binary file has 40 unsigned shorts saved in it, does the C code line above read just the first one?
and if I do
fread(&(targetObj->data), sizeof(TestObj), 5, file);
is it expected that targetObj->data is an array of 5 unsigned shorts?
And will the code read the first 5 unsigned shorts that it finds in the whole binary file?
I can't wrap my head around this, but I need to know how C recognizes that unsigned short in a big binary file whose content I don't know, and how I can tell C# to read the first unsigned short from that file.
fread just reads the specified number of bytes from the current file cursor position, and advances the file cursor (or "file pointer", but not to be confused with a C pointer).
So if sizeof(TestObj) is 2, it will read two bytes and place them into the location pointed to by &(targetObj->data), with no bounds checking, and regardless of any differences between your architecture's endianness and the file protocol's endianness. Note that this approach is not a platform-independent way of parsing files containing numbers in binary form, since the number might be stored differently on your machine compared to how it is stored inside the file (by whoever designed the binary protocol you are trying to read).
In C#, you might achieve a similar thing by manually specifying struct packing and field placement, although the code will suffer from the same problems as your C code.
fread reads from the current position in the stream; see also ftell and fseek. The equivalent in C# would be Stream.Read.
From man fread
size_t
fread(void *restrict ptr, size_t size, size_t nitems, FILE *restrict stream);
The function fread() reads nitems objects, each size bytes long, from the stream pointed to by stream, storing them at the location given by ptr.
sizeof(short) is resolved by the compiler, as per https://stackoverflow.com/a/14171152/6204612
And C does not do any pretty conversions for you. What is read is precisely sizeof(short) bytes, and these bytes are put into the target variable. Whether that is correct or not is the implementer's responsibility. You need to manage offsets, collection sizes, etc. on your own.
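To make the cursor behaviour concrete, here is a small hedged C sketch (the file name and count are made up) that reads several unsigned shorts in a row; each fread picks up exactly where the previous one stopped:

#include <stdio.h>

int main(void)
{
    FILE *file = fopen("data.bin", "rb");   /* hypothetical binary file */
    unsigned short values[5];
    size_t got;

    if (!file)
        return 1;

    /* Reads 5 * sizeof(unsigned short) bytes starting at the current file
     * position (the beginning, since the file was just opened) and leaves
     * the cursor right after the last byte read. */
    got = fread(values, sizeof(unsigned short), 5, file);

    /* A further fread would continue from this offset. */
    printf("read %zu shorts, now at offset %ld\n", got, ftell(file));

    fclose(file);
    return 0;
}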

C - Save/Load Pointer Data to File

Firstly, apologies if this question has been asked before or has a glaringly obvious solution that I cannot see. I have found a similar question; however, I believe what I am asking goes a little further than what was previously asked.
I have a structure as follows:
typedef struct {
int id;
char *title;
char *body;
} journal_entry;
Q: How do I write and load the contents of a pointer to memory in C (not C++) without using fixed lengths?
Am I wrong in thinking that by writing title or body to the file I would end up with junk data and not the information I had actually stored? I do not know what size the title or body of a journal entry will be, and the size may vary significantly from entry to entry.
My own reading suggests that I will need to dereference the pointers and fwrite each part of the struct separately. But I'm uncertain how to keep track of the data and the structs without things becoming confused, particularly for larger files. Furthermore, if these are not the only items I intend to store in the file (for example, I may wish to include small images later on), I'm uncertain how I would order the file structure for convenience.
The other (possibly perceived) problem is that I have used malloc to allocate memory for the title and body strings; when loading the data, how will I know how much memory to allocate for each string when I wish to load the entry again? Do I need to expand my struct to include int body_len and int title_len?
Guidance or suggestions would be very gratefully received.
(I am focusing on a Linux point of view, but it could be adapted to other systems)
Serialization
What you want to achieve is often called serialization (citing wikipedia) - or marshalling:
The serialization is the process of translating data structures or object state into a format that can be stored and reconstructed later in the same or another computer
Pointer I/O
It is in principle possible to read and write pointers, e.g. with the %p conversion specification of fprintf(3) & fscanf(3) (you might even write and read a pointer directly, which at the machine level is just some intptr_t integer). However, a given address (e.g. 0x1234F580 ...) is likely to be invalid or to have a different meaning when read again by a different process (e.g. because of ASLR).
Serialization of aggregate data
You might use some textual format like JSON (and I actually recommend doing so) or another format like YAML (or perhaps invent your own, e.g. inspired by s-exprs). It is a well-established habit to prefer textual formats (Unix has had that habit since before 1980) over binary ones (like XDR, ASN.1, ...). And many protocols (HTTP, SMTP, FTP, JSON-RPC, ...) are textual protocols.
Notice that on current systems I/O is much slower than computation, so the relative cost of textual encoding & decoding is tiny compared to network or disk I/O.
The encoding of some aggregate data (e.g. a struct in C) is generally compositional: by composing the encodings of elementary scalar data (numbers, strings, ...), you can encode a higher-level data type.
Serialization libraries
Most formats (notably JSON) have several free software libraries to encode/decode them, e.g. Jansson, JsonCPP, etc.
Suggestion:
Use JSON and format your journal_entry perhaps into a JSON object like
{ "id": 1234,
"title": "Some Title Here",
"body": "Some body string goes here" }
Concretely, you'll use some JSON library and first convert your journal_entry into some JSON type (and vice versa), then use the library to encode/decode that JSON.
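For illustration, here is a rough sketch of that round trip using Jansson (one of the libraries mentioned above); error handling is minimal and the helper names are made up:

#include <jansson.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    int id;
    char *title;
    char *body;
} journal_entry;

/* Encode one entry into a malloc'd JSON string; the caller frees it. */
char *entry_to_json(const journal_entry *e)
{
    json_t *obj = json_pack("{s:i, s:s, s:s}",
                            "id", e->id, "title", e->title, "body", e->body);
    char *text = json_dumps(obj, JSON_INDENT(2));
    json_decref(obj);
    return text;
}

/* Decode a JSON string back into an entry; the strings are duplicated
 * so they remain valid after the temporary JSON object is released. */
int entry_from_json(const char *text, journal_entry *e)
{
    json_error_t err;
    const char *title, *body;
    json_t *obj = json_loads(text, 0, &err);
    if (!obj)
        return -1;
    if (json_unpack(obj, "{s:i, s:s, s:s}",
                    "id", &e->id, "title", &title, "body", &body) != 0) {
        json_decref(obj);
        return -1;
    }
    e->title = strdup(title);
    e->body = strdup(body);
    json_decref(obj);
    return 0;
}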
Databases
You could also consider a database approach (e.g. sqlite, etc...)
PS. Serialization of closures (or anything containing a pointer to code) may be challenging. You'll need to define what exactly that means.
PPS. Some languages provide builtin support for serialization and marshalling. For example, Ocaml has a Marshal module, Python has pickle
You are correct that writing this structure to a file as-is is not a good idea, because once the strings to which your pointers point are gone, there is no way to retrieve them. From a practical point of view, one way is to declare strings of fixed length (if you know that your strings have a length limit):
typedef struct {
    int id;
    char title[MAX_TITLE_LENGTH];
    char body[MAX_BODY_LENGTH];
} journal_entry;
If you need to allocate title and body with malloc, you can have a "header" element that stores the length of the whole structure. When you write your structure to file, you would use this element to figure out how many bytes you need to read.
I.e. to write:
FILE* fp = fopen(<your-file-name>,"wb");
size_t size = sizeof(id)+strlen(title)+1+strlen(body)+1;
fwrite(&size, sizeof(size), 1, fp);
fwrite(&id, sizeof(id), 1, fp);
fwrite(title, sizeof(char), strlen(title)+1, fp);
fwrite(body, sizeof(char), strlen(body)+1, fp);
fclose(fp);
To read (not particularly safe implementation, just to give the idea):
FILE* fp = fopen(<your-file-name>,"rb");
size_t size;
int read_bytes = 0;
struct journal_entry je;
fread(&size, sizeof(size), 1, fp);
void* buf = malloc(size);
fread(buf, size, 1, fp);
fclose(fp);
je.id = *((int*)buf); // might break if you wrote your file on OS with different endingness
read_bytes += sizeof(je.id)
je.title = (char*)(buf+read_bytes);
read_bytes += strlen(je.title)+1;
je.body = (char*)(buf+read_bytes);
// other way would be to malloc je.title and je.body and destroy the buf
In memory you can store strings as pointers to arrays. But in a file on disk you would typically store the data directly. One easy way to do it would be to store a uint32_t containing the size, then store the actual bytes of the string. You could also store null-terminated strings in the file, and simply scan for the null terminator when reading them. The first method makes it easier to preallocate the needed buffer space when reading, without needed to pass over the data twice.
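A minimal sketch of that first method (the helper names write_string/read_string are made up; the length prefix is written in host byte order, so the same-machine caveats discussed earlier still apply):

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Write a uint32_t length prefix followed by the string bytes (no terminator). */
static int write_string(FILE *fp, const char *s)
{
    uint32_t len = (uint32_t)strlen(s);
    return fwrite(&len, sizeof len, 1, fp) == 1
        && fwrite(s, 1, len, fp) == len;
}

/* Read the length prefix, allocate exactly enough space, then read the bytes. */
static char *read_string(FILE *fp)
{
    uint32_t len;
    char *s;
    if (fread(&len, sizeof len, 1, fp) != 1)
        return NULL;
    s = malloc(len + 1);
    if (!s)
        return NULL;
    if (fread(s, 1, len, fp) != len) {
        free(s);
        return NULL;
    }
    s[len] = '\0';
    return s;
}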

Reading a binary file bit by bit

I know the function below:
size_t fread(void *ptr, size_t size_of_elements, size_t number_of_elements, FILE *a_file);
It only reads byte by byte; my goal is to read 12 bits at a time and put them into an array. Any help or pointers would be greatly appreciated!
Adding to the first comment, you can try reading one byte at a time (declare a char variable and read into it), and then use the bitwise operators >> and << to work bit by bit. Read more here: http://www.cprogramming.com/tutorial/bitwise_operators.html
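A hedged sketch of that byte-at-a-time, shift-and-mask approach (assuming the 12-bit values are packed most-significant-bit first; the function name read12 is made up, and the static state limits it to one stream at a time):

#include <stdio.h>

/* Accumulates bytes from fp and hands back 12 bits per call (MSB-first packing).
 * Returns 1 on success, 0 at end of file. */
static int read12(FILE *fp, unsigned *out)
{
    static unsigned buffer = 0; /* leftover bits from the previous byte(s) */
    static int nbits = 0;       /* how many bits buffer currently holds    */
    int c;

    while (nbits < 12) {
        if ((c = fgetc(fp)) == EOF)
            return 0;
        buffer = (buffer << 8) | (unsigned)c;   /* append 8 more bits */
        nbits += 8;
    }
    nbits -= 12;
    *out = (buffer >> nbits) & 0xFFFu;  /* take the top 12 bits   */
    buffer &= (1u << nbits) - 1u;       /* keep only the leftover */
    return 1;
}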
Many years ago, I wrote some I/O routines in C for a Huffman encoder. This needs to be able to read and write on the granularity of bits rather than bytes. I created functions analogous to read(2) and write(2) that could be asked to (say) read 13 bits from a stream. To encode, for example, bytes would be fed into the coder and variable numbers of bits would emerge the other side. I had a simple structure with a bit pointer into the current byte being read or written. Every time it went off the end, it flushed the completed byte out and reset the pointer to zero. Unfortunately that code is long gone, but it might be an idea to pull apart an open-source Huffman coder and see how the problem was solved there. Similarly, base64 coding takes 3 bytes of data and turns them into 4 (or vice versa).
I've implemented a couple of methods to read/write files bit by bit. Here they are. Whether they are viable for your use case is for you to decide. I've tried to make the most readable, optimized code I could, not being a seasoned C developer (for now).
Internally, it uses a "bitCursor" to store information about previous bits that don't yet fill a full byte. It has two data fields: d stores the data and s stores the size, i.e. the number of bits currently held in the cursor.
You have four functions:
newBitCursor(), which returns a bitCursor object with default values {0,0}. Such a cursor is needed at the beginning of a sequence of read/write operations to or from a file.
fwriteb(void *ptr, size_t sizeb, size_t rSizeb, FILE *fp, bitCursor *c), which writes the sizeb rightmost bits of the value stored in ptr to fp.
fcloseb(FILE *fp, bitCursor *c), which writes a remaining byte if the previous writes did not exactly fill whole bytes, which is probably almost always the case...
freadb(void *ptr, size_t sizeb, size_t rSizeb, FILE *fp, bitCursor *c), which reads sizeb bits and bitwise ORs them into *ptr. (It is therefore your responsibility to initialize *ptr to 0.)
More info is provided in the comments. Have Fun!
Edit: It has come to my attention today that when I made this I assumed little-endian! :P Oops! It's always nice to realize how much of a noob I still am ;D.
Edit: GNU's Binary File Descriptors.
Read the first two bytes from your a_file file pointer and check the bits in the lowest or highest byte, depending on the endianness of your platform (x86 is little-endian), using bit-shift operators.
You can't really put bits into an array, as there isn't a datatype for bits. Rather than keeping 1's and 0's in an array, which is inefficient, it seems cheaper just to keep the two bytes in a two-element array (say, of type unsigned char *) and write functions to map those two bytes to one of 4096 (2^12) values-of-interest.
As a complication, on subsequent reads, if you want to fread through the pointer every 12 bits, you would read only one byte, using the left-over bits from the previous read to build a new 12-bit value. If there are no leftovers, you would need to read two bytes.
Your mapping functions would need to address the second case where bits were used from previous read, because the two bytes would need different mapping. To do this efficiently, a modulus on a read-counter could be used to swap between two mappings.
Read 2 bytes and apply bitwise operations to extract the 12 bits; for the next value, continue from the following bytes and apply the bitwise operations again to get back what you expect.
For your problem, see the demo program below, which reads 2 bytes even though the actual information is only 12 bits; a bit-field is used for this kind of access.
fread()/fwrite() are standard library functions whose size argument is in whole bytes, so an exact 12-bit read is not possible. If you create the file yourself, create and read it as below and that will solve your problem.
If the file is a special file not written by you, then follow the standard provided for that file format when reading it; I think they write it in a similar way. Or you can provide the exact format and I may be able to help.
#include <stdio.h>

/* A 12-bit bit-field; note that sizeof(struct node) is still a whole
 * number of bytes (typically 4), so fwrite/fread move whole bytes. */
struct node
{
    unsigned int data:12;
} NODE;

int main(void)
{
    FILE *fp;

    fp = fopen("t", "wb");
    NODE.data = 1024;
    printf("%d\n", NODE.data);
    fwrite(&NODE, sizeof(NODE), 1, fp);
    NODE.data = 2048;
    printf("%d\n", NODE.data);
    fwrite(&NODE, sizeof(NODE), 1, fp);
    fclose(fp);

    fp = fopen("t", "rb");
    fread(&NODE, sizeof(NODE), 1, fp);
    printf("%d\n", NODE.data);
    fread(&NODE, sizeof(NODE), 1, fp);
    printf("%d\n", NODE.data);
    fclose(fp);
    return 0;
}

In C, is there a cross-platform way to store what a variable might contain for quick reloading of its contents?

The idea is that an application may contain a struct of large arrays that are filled via a slow external library. So, what if that could easily be stored to a file for fast reloading, at least after it has been run once? If it can't be done easily in a cross-platform way, can it at least be done easily locally, 'after a first run'?
It depends on the way the structure is filled. If the structure has a fixed size (that is, it does not contain any dynamically allocated pointers) and is self-contained (it does not contain pointers to memory outside the structure itself), then you can dump the struct directly to a file using the standard library file operations. Something along these lines:
#include <stdio.h>
FILE *file;
file = fopen( "filename", "wb" );
fwrite( &your_struct, sizeof(your_struct), 1, file );
fclose( file );
(note: error checking omitted for clarity and conciseness)
Reloading looks something like this:
file = fopen( "filename", "rb" );
fread( &your_struct, sizeof(your_struct), 1, file );
fclose( file );
This code will compile and run on all platforms.
However, this method is not strictly cross-platform, since the resulting file cannot be ported between machines of different endianness (for example, old Macintoshes used to store the bytes composing an int in a different order than an IBM PC); the resulting file can only be used on platforms with the same architecture as the computer that produced the file.
Now, if the struct is not self-contained (it contains a pointer referencing memory outside the struct) or uses dynamically allocated memory, then you will need something more elaborate...
Regarding the endianness problem, the standard BSD socket implementation, which exists on almost every platform, defines a set of functions to convert from network byte order to host byte order (and their inverse), which are really handy, since network byte order is strictly cross-platform. Have a look at htons() and ntohs(), htonl() and ntohl(). Unfortunately, you have to call those functions for each field of the structure, which is quite cumbersome if the structure is large.
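For illustration, a hedged sketch of that per-field conversion, using a made-up two-field struct:

#include <arpa/inet.h>  /* htonl / ntohl */
#include <stdint.h>
#include <stdio.h>

/* Hypothetical structure with two 32-bit fields. */
struct sample {
    uint32_t count;
    uint32_t checksum;
};

/* Convert each field to network byte order before writing... */
void save_sample(FILE *file, const struct sample *s)
{
    struct sample tmp = { htonl(s->count), htonl(s->checksum) };
    fwrite(&tmp, sizeof tmp, 1, file);
}

/* ...and convert back to host byte order after reading. */
void load_sample(FILE *file, struct sample *s)
{
    struct sample tmp;
    fread(&tmp, sizeof tmp, 1, file);
    s->count = ntohl(tmp.count);
    s->checksum = ntohl(tmp.checksum);
}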
Maybe you can store the data in an XML file. With that you avoid the problems Adrian mentioned, you have no problem with language-specific character sets, and you even have the opportunity to read, write and handle the data in completely different programming languages.

Writing structure into a file in C

I am reading and writing a structure to a text file, and the result is not human-readable. I need to write readable data into the file from the structure object.
Here is little more detail of my code:
I have code that reads and writes a list of item names and codes to a file (file.txt). The code uses a linked list to read and write the data.
The data are stored in a structure object and then written to the file using fwrite.
The code works fine, but I need to write readable data into the text file.
Right now file.txt looks like this:
㵅㡸䍏䥔䥆㘸䘠㵅㩃䠀\䵏㵈䑜㵅㡸䍏䥔䥆㘸䘠\㵅㩃䠀䵏㵈䑜㵅㡸䍏䥔䥆㘸䘠㵅㩃䠀䵏㵈\䑜㵅㡸䍏䥔䥆㘸䘠㵅㩃䠀䵏㵈䑜㵅㡸䍏䥔\䥆㘸䘠㵅㩃䠀䵏㵈
I am expecting the file to look like this:
pencil aaaa
Table bbbb
pen cccc
notebook nnnn
Here is the snippet:
struct Item
{
char itemname[255];
char dspidc[255];
struct Item *ptrnext;
};
// Writing into the file
printf("\nEnter Itemname: ");
gets(ptrthis->itemname);
printf("\nEnter Code: ");
gets(ptrthis->dspidc);
fwrite(ptrthis, sizeof(*ptrthis), 1, fp);
// Reading from the file
while(fread(ptrthis, sizeof(*ptrthis), 1, fp) ==1)
{
printf("\n%s %s", ptrthis->itemname,ptrthis->dspidc);
ptrthis = ptrthis->ptrnext;
}
Writing the size of an array that is 255 bytes will write 255 bytes to file (regardless of what you have stuffed into that array). If you want only the 'textual' portion of that array you need to use a facility that handles null terminators (i.e. printf, fprintf, ...).
Reading is then more complicated as you need to set up the idea of a sentinel value that represents the end of a string.
This speaks nothing of the fact that you are writing the value of a pointer (initialized or not) that will have no context or validity on the next read. Pointers (i.e. memory locations) have application only within the currently executing process. Trying to use one process' memory address in another is definitely a bad idea.
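For illustration, a minimal sketch of writing and reading the text with fprintf/fscanf instead of raw fwrite/fread (the helper names are made up, error handling is kept minimal, and item names are assumed to contain no spaces):

#include <stdio.h>

struct Item
{
    char itemname[255];
    char dspidc[255];
    struct Item *ptrnext;
};

/* Write only the text, one "itemname code" record per line, not raw struct bytes. */
void write_item(FILE *fp, const struct Item *it)
{
    fprintf(fp, "%s %s\n", it->itemname, it->dspidc);
}

/* Read one record back; returns 1 on success, 0 at end of file. */
int read_item(FILE *fp, struct Item *it)
{
    return fscanf(fp, "%254s %254s", it->itemname, it->dspidc) == 2;
}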
The code works fine
not really:
a) You are dumping the raw contents of the struct to a file, including the pointer to another instance of Item. You cannot expect to read a pointer back in from disk and use it as you do with ptrthis = ptrthis->ptrnext (I mean, this works as you "use" it in the given snippet, but only because that snippet does nothing meaningful at all).
b) You are writing 2 * 255 bytes of potential garbage to the file. The reason you see these strange-looking "blocks" in your file is that you write all 255 bytes of itemname and 255 bytes of dspidc to disk, including the terminating \0 (which shows up as those blocks, depending on your editor). The real "string" is something meaningful at the beginning of either itemname or dspidc, followed by a \0, followed by whatever was in memory before.
The term you need to look up and read about is serialization; there are libraries out there already which solve the task of dumping data structures to disk (or network, or anything else) and reading them back in, e.g. tpl.
First of all, I would only serialize the data, not the pointers.
Then, in my opinion, you have 2 choices:
write a parser for your syntax (with yacc for instance)
use a data-dumping format such as the RMI serialization mechanism.
Sorry I can't find online docs, but I know I have the grammar on paper.
Both of those solutions will be platform independent, whether the machines are big-endian or little-endian.
