Firstly apologies if this question has been asked before or has a glaring obvious solution that I cannot see. I have found a similar question however I believe what I am asking goes a little further than what was previously asked.
I have a structure as follows:
typedef struct {
int id;
char *title;
char *body;
} journal_entry;
Q: How do I write and load the contents of a pointer to memory in C (not C++) without using fixed lengths?
Am I wrong in thinking that by writing title or body to file I would endup with junk data and not actually the information I had stored? I do not know the size that the title or body of a journal entry would be and the size may vary significantly from entry to entry.
My own reading suggests that I will need to dereference pointers and fwrite each part of the struct separately. But I'm uncertain how to keep track of the data and the structs without things becoming confused particularly for larger files. Furthermore if these are not the only items I intend to store in the file (for example I may wish to include small images later on I'm uncertain how I would order the file structure for convenience.
The other (possibly perceived) problem is that I have used malloc to allocate memory for the string for the body / entry when loading the data how will I know how much memory to allocate for the string when I wish to load the entry again? Do I need to expand my struct to include int body_len and int title_len?
Guidance or suggestions would be very gratefully received.
(I am focusing on a Linux point of view, but it could be adapted to other systems)
Serialization
What you want to achieve is often called serialization (citing wikipedia) - or marshalling:
The serialization is the process of translating data structures or object state into a format that can be stored and reconstructed later in the same or another computer
Pointer I/O
It is in principle possible to read and write pointers, e.g. the %p conversion specification for fprintf(3) & fscanf(3) (and you might directly write and read a pointer, which is like at the machine level some intptr_t integer. However, a given address (e.g. 0x1234F580 ...) is likely to be invalid or have a different meaning when read again by a different process (e.g. because of ASLR).
Serialization of aggregate data
You might use some textual format like JSON (and I actually recommend doing so) or other format like YAML (or perhaps invent your own, e.g. inspired by s-exprs). It is a well established habit to prefer textual format (and Unix had that habit since before 1980) to binary ones (like XDR, ASN/1, ...). And many protocols (HTTP, SMTP, FTP, JSONRPC ....) are textual protocols
Notice that on current systems, I/O is much slower than computation, so the relative cost of textual encoding & decoding is tiny w.r.t. network or disk IO (see table of Answers here)
The encoding of a some aggregate data (e.g. a struct in C) is generally compositional, and by composing the encoding of elementary scalar data (numbers, strings, ....) you can encode some higher-level data type.
serialization libraries
Most formats (notably JSON) have several free software libraries to encode/decode them, e.g. Jansson, JsonCPP, etc..
Suggestion:
Use JSON and format your journal_entry perhaps into a JSON object like
{ "id": 1234,
"title": "Some Title Here",
"body": "Some body string goes here" }
Concretely, you'll use some JSON library and first convert your journal_entry into some JSON type (and vice versa), then use the library to encode/decode that JSON
databases
You could also consider a database approach (e.g. sqlite, etc...)
PS. Serialization of closures (or anything containing pointer to code) may be challenging. You'll need to define what exactly that means.
PPS. Some languages provide builtin support for serialization and marshalling. For example, Ocaml has a Marshal module, Python has pickle
You are correct that storing this structure in memory is not a good idea, because once the strings to which your pointers point are gone, there is no way to retrieve them. From the practical point of view, one way is to declare strings of finite length (if you know that your strings have a length limit):
typedef struct {
int id;
char title[MAX_TITLE_LEGNTH];
char body[MAX_BODY_LENGTH];
} journal_entry;
If you need to allocate title and body with malloc, you can have a "header" element that stores the length of the whole structure. When you write your structure to file, you would use this element to figure out how many bytes you need to read.
I.e. to write:
FILE* fp = fopen(<your-file-name>,"wb");
size_t size = sizeof(id)+strlen(title)+1+strlen(body)+1;
fwrite(&size, sizeof(size), 1, fp);
fwrite(&id, sizeof(id), 1, fp);
fwrite(title, sizeof(char), strlen(title)+1, fp);
fwrite(body, sizeof(char), strlen(body)+1, fp);
fclose(fp);
To read (not particularly safe implementation, just to give the idea):
FILE* fp = fopen(<your-file-name>,"rb");
size_t size;
int read_bytes = 0;
struct journal_entry je;
fread(&size, sizeof(size), 1, fp);
void* buf = malloc(size);
fread(buf, size, 1, fp);
fclose(fp);
je.id = *((int*)buf); // might break if you wrote your file on OS with different endingness
read_bytes += sizeof(je.id)
je.title = (char*)(buf+read_bytes);
read_bytes += strlen(je.title)+1;
je.body = (char*)(buf+read_bytes);
// other way would be to malloc je.title and je.body and destroy the buf
In memory you can store strings as pointers to arrays. But in a file on disk you would typically store the data directly. One easy way to do it would be to store a uint32_t containing the size, then store the actual bytes of the string. You could also store null-terminated strings in the file, and simply scan for the null terminator when reading them. The first method makes it easier to preallocate the needed buffer space when reading, without needed to pass over the data twice.
Related
Suppose we have following structure:
struct Something {
int i;
};
If I want to write in a file any data of this type(dynamically allocated), I do this:
struct Something *object = malloc(sizeof(struct Something));
object->i = 0; // set member some value
FILE *file = fopen("output_file", "wb");
fwrite(object, sizeof(struct Something), 1 file);
fclose(file);
Now, my questions:
How we do this with a structure what contains pointers? I tested using same method, it worked fine, data could been read, but I want to know if there are any risks?
What you want is called serialization. See also XDR (a portable binary data format) & libs11n (a C++ binary serialization library); you often care about data portability: being able to read the data on some different computer.
"serialization" means to "convert" some complex data structure (e.g. a list, a tree, a vector or even your Something...) into a (serial) byte stream (e.g. a file, a network connection, etc...), and backwards. Dealing with circular data structures or shared sub-components may be tricky.
You don't want to write raw pointers inside a file (but you could), because the written address probably won't make any sense at the next execution of your program (e.g. because of ASLR), i.e. when you'll read the data again.
Read also about application checkpointing and persistence.
For pragmatic reasons (notably ease of debugging and resilience w.r.t. small software evolution) it is often better to use some textual data format (like e.g. JSON or Yaml) to store such persistent data.
You might also be interested in databases. Look first into sqlite, and also into DBMS ("relational" -or SQL based- ones like PostGreSQL, NoSQL ones like e.g. MongoDB)
The issue is not writing a single dynamically allocated struct (since you want mostly to write the data content, not the pointer, so it is the same to fwrite a malloc-ed struct or a locally allocated one), it is to serialize complex data structures which use lots of weird internal pointers!
Notice that copying garbage collectors use algorithms similar to serialization algorithms (since both need to scan a complex graph of references).
Also, on today's computers, disk -or network- IO is a lot (e.g. a million times) slower than the CPU, so it makes sense to do some significant computation before writing files.
I know following approach may not be portable but that is exactly what I want to find out now.
Imagine I have some data structure
struct student
{
char name[20];
int age;
} x;
Now I want to write it to a file like this:
fwrite(&x, sizeof(student), 1, filePointer);
Read similarly:
fread(voidPointer, sizeof(student), 1, filePointer);
// Now possibly do a memcpy
memcpy(studentObjectPointer, voidPointer, sizeof(student));
My question is: Say I don't want to copy this file to another computer, and I will read it from the same computer that created this file.
Will the portability (endianness, packed data structure) issues still apply to above approach? Or it will work fine?
If the file would be copied on other machines, you will have to build your own serializer and deserializer. With the structure you gave it is quite simple.
You have to define which endianness to adopt when writing numbers (here, the int age).
Here, you could process like this :
Open the file in binary mode
Write the 20 bytes of the name string
Write the age in a CHOSEN endianness (big endian)
When reading it back, you will certainly have to convert the Big-Endianness to the local endianness of the machine.
There is still a remaining issue : if sizeof (int) is not the same between the two machines. In that case things get more vicious and I think the above is not sufficient.
If you really need to be portable across a wide range of machines, consider the use of specific length types defined in #include <stdint.h> such as int32_t for instance.
Remember, we're living in a 4-dimension world. You said, the saved file will not move in the x, y and z axis, it will be used on the same computer. But you did not mentioned the 4th dimension, the time. If the structure changes, it will fail.
At least, you should put a signature field in the structure. Fill it with a constant value before write(), check it right after read(), and change the constant value when the structure gets modified.
I have a complex structure in a C program which has many members that are allocated memory, dynamically. How do I write this structure to a text / binary file? How will I be able to recreate the entire structure from the data read from the file.
struct parseinfo{
int varcount;
int termcount;
char **variables;
char **terminals;
char ***actions;
};
The members variables, terminals and actions are all dynamically allocated and I need to write this structure to a file so that I could reconstruct the structure later.
You must write to this structure Serialize and Deserialize functions, you can't write this structure to file as raw data because you have allocated pointers on a heap and this not make sense to save this values on file.
Short: Not in an automated way.
In essence, it depends highly on the semantics of your structure.
If the data fields inside specify the length of certain arraiys inside it,
you can reconstruct the struct.
But you have to be careful to "beleive" the values. (possible cause of stack overflow (nice word)) if you believe that there are 2^34 entries in an array.
But otherwise it is just going tru every member (pain)
You could search a little about ASN.1 and TLV-structs.
Here are some suggestions for binary serialization:
You can serialize a string either a la C (write the terminating '\0') or
by writing the size (say, an int) followed by the contents.
You can serialize an array of strings by writing the length of the array
followed by the strings.
If it's possible that you deserialize the file on a different
architecture (different int size, different endianness...), then take
care to carefully specify the binary format of the file. In such a case
you may want to take a look at the XDR serialiaztion standard.
For ASCII serialization, I like the JSON format.
The way I would suggest to do it is to think about what does it take to create the items in the first place?
When it comes to the char **variables; char **terminals; char **actions you're obviously going to have to figure out how to declare those and read them in but I don't think you can inject a /0 into a file (EOF character??)
How would you like to see it written to the file? Can you provide a sample output how you think it should be stored? Perhaps one item per line in a file? Does it need to be a binary file?
The idea is that an application may contain a struct of large arrays that are filled up via a slow external library. So, what if that could be easily stored to a file for fast reference, at least after it has been run once? If it's not possible to be done easily in a cross platform way, is it easy to be done locally 'after a first run'?
it depends of the way the structure is filled. if the structure has a fixed size (that is, it does not contain any dynamically allocated pointer) and is self-contained (it does not contain pointers to memory outside the structure itself) then you can dump the struct directly to a file using standard library file operation. something along that way:
#include <stdio.h>
FILE *file;
file = fopen( "filename", "w" );
fwrite( &your_struct, sizeof(your_struct), 1, file )
fclose( file );
(note: error checking ommited for clarity and conciseness)
reloading looks something like this:
file = fopen( "filename", "r" );
fread( &your_struct, sizeof(your_struct), 1, file );
fclose( file );
this method will work on all platforms.
however, this method is not strictly cross-platform, since the resulting file cannot be ported between machines of different endianness (for example, old Macintosh'es used to store the bytes composing an int in a different order than an IBM PC); the resulting file can only be used on platforms of the same architecture than the computer which produced the file.
now if the struct is not self-contained (it contains a pointer referencing memory outside the struct) or uses dynamically allocated memory, then you will need something more elaborate...
regarding the endianness problem, the standard BSD socket implementation, which exists on almost every platform, defines a set of functions to convert from network byte order to host byte order (and their inverse), which are really handy, since the network byte order is strictly cross-platform. have a look at htons() and ntohs(), htonl() and ntohl(). unfortunately, you have to call those functions for each field of the structure, which is quite cumbersome if the structure is large.
maybe you can store the data in XML-Format-File. With that you can avoid the problems Adrian told, and you also have no problem with language specific character codesets, and you even have the opportunity to read and write and handle the data in completly different programming languages
I am reading and writting a structure into a text file which is not readable. I have to write readable data into the file from the structure object.
Here is little more detail of my code:
I am having the code which reads and writes a list of itemname and code into a file (file.txt). The code uses linked list concept to read and write data.
The data are stored into a structure object and then writen into a file using fwrite.
The code works fine. But I need to write a readable data into the text file.
Now the file.txt looks like bellow,
㵅㡸䍏䥔䥆㘸䘠㵅㩃䠀\䵏㵈䑜㵅㡸䍏䥔䥆㘸䘠\㵅㩃䠀䵏㵈䑜㵅㡸䍏䥔䥆㘸䘠㵅㩃䠀䵏㵈\䑜㵅㡸䍏䥔䥆㘸䘠㵅㩃䠀䵏㵈䑜㵅㡸䍏䥔\䥆㘸䘠㵅㩃䠀䵏㵈
I am expecting the file should be like this,
pencil aaaa
Table bbbb
pen cccc
notebook nnnn
Here is the snippet:
struct Item
{
char itemname[255];
char dspidc[255];
struct Item *ptrnext;
};
// Writing into the file
printf("\nEnter Itemname: ");
gets(ptrthis->itemname);
printf("\nEnter Code: ");
gets(ptrthis->dspidc);
fwrite(ptrthis, sizeof(*ptrthis), 1, fp);
// Reading from the file
while(fread(ptrthis, sizeof(*ptrthis), 1, fp) ==1)
{
printf("\n%s %s", ptrthis->itemname,ptrthis->dspidc);
ptrthis = ptrthis->ptrnext;
}
Writing the size of an array that is 255 bytes will write 255 bytes to file (regardless of what you have stuffed into that array). If you want only the 'textual' portion of that array you need to use a facility that handles null terminators (i.e. printf, fprintf, ...).
Reading is then more complicated as you need to set up the idea of a sentinel value that represents the end of a string.
This speaks nothing of the fact that you are writing the value of a pointer (initialized or not) that will have no context or validity on the next read. Pointers (i.e. memory locations) have application only within the currently executing process. Trying to use one process' memory address in another is definitely a bad idea.
The code works fine
not really:
a) you are dumping the raw contents of the struct to a file, including the pointer to another instance if "Item". you can not expect to read back in a pointer from disc and use it as you do with ptrthis = ptrthis->ptrnext (i mean, this works as you "use" it in the given snippet, but just because that snippet does nothing meaningful at all).
b) you are writing 2 * 255 bytes of potential crap to the file. the reason why you see this strange looking "blocks" in your file is, that you write all 255 bytes of itemname and 255 bytes of dspidc to the disc .. including terminating \0 (which are the blocks, depending on your editor). the real "string" is something meaningful at the beginning of either itemname or dspidc, followed by a \0, followed by whatever is was in memory before.
the term you need to lookup and read about is called serialization, there are some libraries out there already which solve the task of dumping data structures to disc (or network or anything else) and reading it back in, eg tpl.
First of all, I would only serialize the data, not the pointers.
Then, in my opinion, you have 2 choices:
write a parser for your syntax (with yacc for instance)
use a data dumping format such as rmi serialization mechanism.
Sorry I can't find online docs, but I know I have the grammar on paper.
Both of those solution will be platform independent, be them big endian or little endian.