How to write dynamically allocated structure to file - c

I have a complex structure in a C program which has many members whose memory is allocated dynamically. How do I write this structure to a text / binary file? How will I be able to recreate the entire structure from the data read from the file?
struct parseinfo{
int varcount;
int termcount;
char **variables;
char **terminals;
char ***actions;
};
The members variables, terminals and actions are all dynamically allocated and I need to write this structure to a file so that I could reconstruct the structure later.

You need to write Serialize and Deserialize functions for this structure. You can't write the structure to a file as raw data, because its members are pointers to heap allocations, and saving those pointer values to a file makes no sense.

Short: Not in an automated way.
In essence, it depends highly on the semantics of your structure.
If data fields inside the structure specify the lengths of the arrays it contains,
you can reconstruct the struct.
But you have to be careful about "believing" those values: if the file claims that there are 2^34 entries in an array, trusting it is a possible cause of memory exhaustion or a buffer overflow.
Otherwise it is just a matter of going through every member (a pain).
You could search a little for ASN.1 and TLV structures.

Here are some suggestions for binary serialization:
You can serialize a string either a la C (write the terminating '\0') or
by writing the size (say, an int) followed by the contents.
You can serialize an array of strings by writing the length of the array
followed by the strings.
If it's possible that you deserialize the file on a different
architecture (different int size, different endianness...), then take
care to carefully specify the binary format of the file. In such a case
you may want to take a look at the XDR serialization standard.
For ASCII serialization, I like the JSON format.
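The length-prefixed layout suggested above could be sketched roughly as follows for one of the char ** members. The function names and the two MAX_ limits are made up for illustration, and the lengths are written in the host's native byte order, so this sketch only covers the same-architecture case.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sanity limits so a corrupt file cannot request a huge allocation. */
#define MAX_STRINGS    100000u
#define MAX_STRING_LEN 65536u

/* Write one string as a 32-bit length followed by its bytes (no '\0'). */
int write_string(FILE *fp, const char *s)
{
    size_t n = strlen(s);
    if (n > MAX_STRING_LEN) return -1;
    uint32_t len = (uint32_t)n;
    if (fwrite(&len, sizeof len, 1, fp) != 1) return -1;
    if (len > 0 && fwrite(s, 1, len, fp) != len) return -1;
    return 0;
}

/* Read one string written by write_string(); returns malloc'ed memory or NULL. */
char *read_string(FILE *fp)
{
    uint32_t len;
    if (fread(&len, sizeof len, 1, fp) != 1 || len > MAX_STRING_LEN) return NULL;
    char *s = malloc((size_t)len + 1);
    if (s == NULL) return NULL;
    if (len > 0 && fread(s, 1, len, fp) != len) { free(s); return NULL; }
    s[len] = '\0';
    return s;
}

/* Write an array of strings as its element count followed by each string. */
int write_string_array(FILE *fp, char **items, uint32_t count)
{
    if (count > MAX_STRINGS) return -1;
    if (fwrite(&count, sizeof count, 1, fp) != 1) return -1;
    for (uint32_t i = 0; i < count; i++)
        if (write_string(fp, items[i]) != 0) return -1;
    return 0;
}

/* Read an array written by write_string_array(); stores the count in *count. */
char **read_string_array(FILE *fp, uint32_t *count)
{
    uint32_t n;
    if (fread(&n, sizeof n, 1, fp) != 1 || n > MAX_STRINGS) return NULL;
    char **items = calloc(n ? n : 1, sizeof *items);
    if (items == NULL) return NULL;
    for (uint32_t i = 0; i < n; i++) {
        items[i] = read_string(fp);
        if (items[i] == NULL) {            /* undo partial work on failure */
            while (i > 0) free(items[--i]);
            free(items);
            return NULL;
        }
    }
    *count = n;
    return items;
}
```

For struct parseinfo, variables and terminals could each be written with one call like this. actions would need one more level of nesting, whose shape the question does not specify.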

The way I would suggest to do it is to think about what does it take to create the items in the first place?
When it comes to char **variables, char **terminals and char ***actions, you're obviously going to have to figure out how to declare those and read them in. Note that a '\0' byte is not an end-of-file character: it can be written to a binary file, but it does not belong in a text file.
How would you like to see it written to the file? Can you provide a sample output how you think it should be stored? Perhaps one item per line in a file? Does it need to be a binary file?

Related

Writing data to file in C [Serialization]

I have a struct that I want to save in a data file.
typedef struct Photo {
char name[20];
char description[100];
} photo;
I'd like to be able to save many of these structs in a file, much like a database. The only ways I can see of doing that would be through fwrite(), which has problems when it comes to portability across platforms, or to just write photo->name and photo->description as ASCII. I would really like to avoid having to parse all of the raw text data. Is there another way to do it?
I couldn't really understand what you are asking but you can also write to files using fprintf.
And if you are asking how to store the data in the file for faster lookup, use a hash table: at the start of the file, keep a table that stores, for each structure, a hash value (of the name field) along with the file offset where that structure is stored.
When querying the file all you have to do is match the hash value and then jump to that specific location.
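A rough sketch of that layout, under some assumptions: the index here is a flat table scanned linearly rather than a bucketed hash table, the hash is djb2 (any string hash would do), and index_entry, hash_name, save_photos and find_photo are names invented for this example. Records are written with fwrite, so the portability caveats from the question still apply.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef struct Photo {
    char name[20];
    char description[100];
} photo;

/* One index slot: hash of the name and the record's byte offset in the file. */
struct index_entry {
    uint32_t hash;
    uint32_t offset;
};

/* djb2 string hash, a common simple choice. */
uint32_t hash_name(const char *s)
{
    uint32_t h = 5381;
    while (*s) h = h * 33u + (unsigned char)*s++;
    return h;
}

/* File layout: count, then the index table, then the records themselves. */
int save_photos(FILE *fp, const photo *items, uint32_t count)
{
    if (fwrite(&count, sizeof count, 1, fp) != 1) return -1;
    uint32_t first = (uint32_t)(sizeof count + count * sizeof(struct index_entry));
    for (uint32_t i = 0; i < count; i++) {
        struct index_entry e = { hash_name(items[i].name),
                                 first + i * (uint32_t)sizeof(photo) };
        if (fwrite(&e, sizeof e, 1, fp) != 1) return -1;
    }
    if (count > 0 && fwrite(items, sizeof(photo), count, fp) != count) return -1;
    return 0;
}

/* Scan the index for a matching hash, then jump to the record. Returns 0 if found. */
int find_photo(FILE *fp, const char *name, photo *out)
{
    uint32_t count, want = hash_name(name);
    rewind(fp);
    if (fread(&count, sizeof count, 1, fp) != 1) return -1;
    for (uint32_t i = 0; i < count; i++) {
        struct index_entry e;
        long slot = (long)(sizeof count + i * sizeof e);
        if (fseek(fp, slot, SEEK_SET) != 0) return -1;
        if (fread(&e, sizeof e, 1, fp) != 1) return -1;
        if (e.hash != want) continue;
        if (fseek(fp, (long)e.offset, SEEK_SET) != 0) return -1;
        if (fread(out, sizeof *out, 1, fp) != 1) return -1;
        /* Compare the real name too, since two names can share a hash. */
        if (strncmp(out->name, name, sizeof out->name) == 0) return 0;
    }
    return -1;
}
```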

C - Save/Load Pointer Data to File

Firstly, apologies if this question has been asked before or has a glaringly obvious solution that I cannot see. I have found a similar question; however, I believe what I am asking goes a little further than what was previously asked.
I have a structure as follows:
typedef struct {
int id;
char *title;
char *body;
} journal_entry;
Q: How do I write and load the contents of a pointer to memory in C (not C++) without using fixed lengths?
Am I wrong in thinking that by writing title or body to file I would end up with junk data and not actually the information I had stored? I do not know the size that the title or body of a journal entry would be and the size may vary significantly from entry to entry.
My own reading suggests that I will need to dereference pointers and fwrite each part of the struct separately. But I'm uncertain how to keep track of the data and the structs without things becoming confused, particularly for larger files. Furthermore, if these are not the only items I intend to store in the file (for example, I may wish to include small images later on), I'm uncertain how I would order the file structure for convenience.
The other (possibly perceived) problem is that I have used malloc to allocate memory for the string for the body / entry. When loading the data, how will I know how much memory to allocate for the string when I wish to load the entry again? Do I need to expand my struct to include int body_len and int title_len?
Guidance or suggestions would be very gratefully received.
(I am focusing on a Linux point of view, but it could be adapted to other systems)
Serialization
What you want to achieve is often called serialization (citing wikipedia) - or marshalling:
The serialization is the process of translating data structures or object state into a format that can be stored and reconstructed later in the same or another computer
Pointer I/O
It is in principle possible to read and write pointers, e.g. with the %p conversion specification for fprintf(3) & fscanf(3) (and you might directly write and read a pointer, which at the machine level is like some intptr_t integer). However, a given address (e.g. 0x1234F580 ...) is likely to be invalid or have a different meaning when read again by a different process (e.g. because of ASLR).
Serialization of aggregate data
You might use some textual format like JSON (and I actually recommend doing so) or another format like YAML (or perhaps invent your own, e.g. inspired by s-exprs). It is a well established habit to prefer textual formats (and Unix had that habit since before 1980) to binary ones (like XDR, ASN.1, ...). And many protocols (HTTP, SMTP, FTP, JSONRPC, ...) are textual protocols.
Notice that on current systems, I/O is much slower than computation, so the relative cost of textual encoding & decoding is tiny w.r.t. network or disk IO (see table of Answers here)
The encoding of some aggregate data (e.g. a struct in C) is generally compositional: by composing the encodings of elementary scalar data (numbers, strings, ...) you can encode a higher-level data type.
serialization libraries
Most formats (notably JSON) have several free software libraries to encode/decode them, e.g. Jansson, JsonCPP, etc.
Suggestion:
Use JSON and format your journal_entry perhaps into a JSON object like
{ "id": 1234,
"title": "Some Title Here",
"body": "Some body string goes here" }
Concretely, you'll use some JSON library and first convert your journal_entry into some JSON type (and vice versa), then use the library to encode/decode that JSON
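To give a feel for the encoding side without committing to any particular library's API, here is a hand-rolled sketch that writes a journal_entry as one JSON object per line. It assumes the strings are valid UTF-8 and only escapes what JSON requires. write_json_string and write_journal_entry_json are names made up for this example, and decoding is deliberately left out, since that is where a real JSON library earns its keep.

```c
#include <stdio.h>

typedef struct {
    int id;
    char *title;
    char *body;
} journal_entry;

/* Write s as a JSON string literal, escaping the characters JSON requires. */
void write_json_string(FILE *fp, const char *s)
{
    fputc('"', fp);
    for (; *s; s++) {
        unsigned char c = (unsigned char)*s;
        switch (c) {
        case '"':  fputs("\\\"", fp); break;
        case '\\': fputs("\\\\", fp); break;
        case '\n': fputs("\\n", fp);  break;
        case '\r': fputs("\\r", fp);  break;
        case '\t': fputs("\\t", fp);  break;
        default:
            if (c < 0x20) fprintf(fp, "\\u%04x", c);  /* other control characters */
            else fputc(c, fp);
        }
    }
    fputc('"', fp);
}

/* Write one entry as a single-line JSON object. */
void write_journal_entry_json(FILE *fp, const journal_entry *je)
{
    fprintf(fp, "{ \"id\": %d, \"title\": ", je->id);
    write_json_string(fp, je->title);
    fputs(", \"body\": ", fp);
    write_json_string(fp, je->body);
    fputs(" }\n", fp);
}
```

Because the newline inside a body is escaped, each entry stays on one physical line, which makes the file easy to read back entry by entry.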
databases
You could also consider a database approach (e.g. sqlite, etc...)
PS. Serialization of closures (or anything containing pointer to code) may be challenging. You'll need to define what exactly that means.
PPS. Some languages provide builtin support for serialization and marshalling. For example, Ocaml has a Marshal module, Python has pickle
You are correct that writing this structure to a file as it sits in memory is not a good idea, because only the pointer values would be saved, and once the strings to which your pointers point are gone, there is no way to retrieve them. From the practical point of view, one way is to declare strings of finite length (if you know that your strings have a length limit):
typedef struct {
int id;
char title[MAX_TITLE_LENGTH];
char body[MAX_BODY_LENGTH];
} journal_entry;
If you need to allocate title and body with malloc, you can have a "header" element that stores the length of the whole structure. When you read your structure back from the file, you would use this element to figure out how many bytes you need to read.
I.e. to write:
FILE* fp = fopen(<your-file-name>,"wb");
size_t size = sizeof(id)+strlen(title)+1+strlen(body)+1;
fwrite(&size, sizeof(size), 1, fp);
fwrite(&id, sizeof(id), 1, fp);
fwrite(title, sizeof(char), strlen(title)+1, fp);
fwrite(body, sizeof(char), strlen(body)+1, fp);
fclose(fp);
To read (not particularly safe implementation, just to give the idea):
FILE* fp = fopen(<your-file-name>,"rb");
size_t size;
size_t read_bytes = 0;
journal_entry je;
fread(&size, sizeof(size), 1, fp);
char* buf = malloc(size); // char* so that the pointer arithmetic below is standard C
fread(buf, size, 1, fp);
fclose(fp);
memcpy(&je.id, buf, sizeof(je.id)); // might break if you wrote your file on a machine with different endianness
read_bytes += sizeof(je.id);
je.title = buf+read_bytes;
read_bytes += strlen(je.title)+1;
je.body = buf+read_bytes;
// je.title and je.body point into buf, so buf must stay allocated while je is in use
// other way would be to malloc je.title and je.body and destroy the buf
In memory you can store strings as pointers to arrays. But in a file on disk you would typically store the data directly. One easy way to do it would be to store a uint32_t containing the size, then store the actual bytes of the string. You could also store null-terminated strings in the file, and simply scan for the null terminator when reading them. The first method makes it easier to preallocate the needed buffer space when reading, without needing to pass over the data twice.
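A minimal sketch of the size-then-bytes method for journal_entry might look like this. put_field, get_field, save_entry, load_entry and the MAX_FIELD_LEN cap are illustrative names and values, not anything from the question, and the integers are written in native byte order.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    int id;
    char *title;
    char *body;
} journal_entry;

/* Hypothetical cap so a damaged file cannot trigger a giant malloc. */
#define MAX_FIELD_LEN (1u << 20)

/* Write a uint32_t length, then the string bytes without the terminator. */
int put_field(FILE *fp, const char *s)
{
    size_t n = strlen(s);
    if (n > MAX_FIELD_LEN) return -1;
    uint32_t len = (uint32_t)n;
    if (fwrite(&len, sizeof len, 1, fp) != 1) return -1;
    if (len > 0 && fwrite(s, 1, len, fp) != len) return -1;
    return 0;
}

/* Read the length first, so the buffer can be allocated at exactly the right size. */
char *get_field(FILE *fp)
{
    uint32_t len;
    if (fread(&len, sizeof len, 1, fp) != 1 || len > MAX_FIELD_LEN) return NULL;
    char *s = malloc((size_t)len + 1);
    if (s == NULL) return NULL;
    if (len > 0 && fread(s, 1, len, fp) != len) { free(s); return NULL; }
    s[len] = '\0';
    return s;
}

int save_entry(FILE *fp, const journal_entry *je)
{
    int32_t id = (int32_t)je->id;
    if (fwrite(&id, sizeof id, 1, fp) != 1) return -1;
    if (put_field(fp, je->title) != 0) return -1;
    return put_field(fp, je->body);
}

/* On success je owns two malloc'ed strings that the caller must free. */
int load_entry(FILE *fp, journal_entry *je)
{
    int32_t id;
    if (fread(&id, sizeof id, 1, fp) != 1) return -1;
    char *title = get_field(fp);
    if (title == NULL) return -1;
    char *body = get_field(fp);
    if (body == NULL) { free(title); return -1; }
    je->id = id;
    je->title = title;
    je->body = body;
    return 0;
}
```

Entries can be appended one after another and read back by calling load_entry until it fails at end of file, so no body_len or title_len members are needed in the struct itself.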

Print a structure in a String format in C

In one of my assignments, I have a task to print the whole structure below in a string format.
struct test
{
int a;
char char1, char2;
};
output should be: Structure is a=10,char1=b,char2=c;
I know it is very simple by using
printf("Structure is a=%d,char1=%c, char2= %c", s.a,s.char1,s.char2);
But in practice, I have a lot of big structures and I cannot write printf statements with format specifiers for each element of every structure. Is there any other way to print the whole structure by just specifying the structure variable, or some other approach?
There's no way to do this in pure C. Some languages support this via a concept called reflection, but it's not available in C.
Code-that-writes-code is your best bet. Write a script that finds all your structs and builds functions to printf them.
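A related trick that stays inside the C preprocessor, instead of an external script, is the so-called X-macro: list the fields once and let macros expand that list into both the struct definition and the printing code. This is a different technique from the script suggested above, shown only as a sketch. TEST_FIELDS and test_to_string are names invented here, and each field has to carry its own printf conversion.

```c
#include <stdio.h>

/* Single source of truth for the fields: type, name, printf conversion. */
#define TEST_FIELDS(X) \
    X(int,  a,     "%d") \
    X(char, char1, "%c") \
    X(char, char2, "%c")

/* Expand the list into the member declarations. */
struct test {
#define X(type, name, fmt) type name;
    TEST_FIELDS(X)
#undef X
};

/* Expand the same list into one snprintf call per field.
   Returns the number of characters written (excluding the '\0'). */
int test_to_string(const struct test *s, char *buf, size_t size)
{
    int used = snprintf(buf, size, "Structure is ");
    const char *sep = "";
#define X(type, name, fmt) \
    if (used >= 0 && (size_t)used < size) { \
        used += snprintf(buf + used, size - (size_t)used, \
                         "%s" #name "=" fmt, sep, s->name); \
        sep = ","; \
    }
    TEST_FIELDS(X)
#undef X
    return used;
}
```

Adding a member then means adding one line to TEST_FIELDS. The cost is that the struct definition becomes harder to read and every struct needs its own field list.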
One possible solution I can think of is to use the fwrite function, with which you can save the whole content of the structure at once into, say, a temporary file. Using:
fwrite(&STRUCTURE_OBJECT, sizeof(YOUR_STRUCTURE), 1, FILE_POINTER);
Where STRUCTURE_OBJECT is the name of a variable of your structure type.
And then use Linux commands like "cat" with piping for a quick glance at the output.

Writing structure into a file in C

I am reading and writing a structure to a text file, but the file is not readable. I have to write readable data into the file from the structure object.
Here is a little more detail about my code:
I have code which reads and writes a list of item names and codes to a file (file.txt). The code uses a linked list to read and write the data.
The data are stored in a structure object and then written to a file using fwrite.
The code works fine. But I need to write readable data into the text file.
Now the file.txt looks like below,
㵅㡸䍏䥔䥆㘸䘠㵅㩃䠀\䵏㵈䑜㵅㡸䍏䥔䥆㘸䘠\㵅㩃䠀䵏㵈䑜㵅㡸䍏䥔䥆㘸䘠㵅㩃䠀䵏㵈\䑜㵅㡸䍏䥔䥆㘸䘠㵅㩃䠀䵏㵈䑜㵅㡸䍏䥔\䥆㘸䘠㵅㩃䠀䵏㵈
I am expecting the file should be like this,
pencil aaaa
Table bbbb
pen cccc
notebook nnnn
Here is the snippet:
struct Item
{
char itemname[255];
char dspidc[255];
struct Item *ptrnext;
};
// Writing into the file
printf("\nEnter Itemname: ");
gets(ptrthis->itemname);
printf("\nEnter Code: ");
gets(ptrthis->dspidc);
fwrite(ptrthis, sizeof(*ptrthis), 1, fp);
// Reading from the file
while(fread(ptrthis, sizeof(*ptrthis), 1, fp) ==1)
{
printf("\n%s %s", ptrthis->itemname,ptrthis->dspidc);
ptrthis = ptrthis->ptrnext;
}
Writing the size of an array that is 255 bytes will write 255 bytes to file (regardless of what you have stuffed into that array). If you want only the 'textual' portion of that array you need to use a facility that handles null terminators (i.e. printf, fprintf, ...).
Reading is then more complicated as you need to set up the idea of a sentinel value that represents the end of a string.
This speaks nothing of the fact that you are writing the value of a pointer (initialized or not) that will have no context or validity on the next read. Pointers (i.e. memory locations) have application only within the currently executing process. Trying to use one process' memory address in another is definitely a bad idea.
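As a sketch of the fprintf route, the list could be written one node per line, leaving the pointer out entirely. This assumes, as in the expected output shown in the question, that neither the item name nor the code contains whitespace, so a space and a newline can serve as the sentinels. write_items and read_item are names invented for this example.

```c
#include <stdio.h>

struct Item {
    char itemname[255];
    char dspidc[255];
    struct Item *ptrnext;
};

/* Write each node as "itemname code\n"; the ptrnext pointer is never written. */
int write_items(FILE *fp, const struct Item *head)
{
    for (const struct Item *p = head; p != NULL; p = p->ptrnext)
        if (fprintf(fp, "%s %s\n", p->itemname, p->dspidc) < 0) return -1;
    return 0;
}

/* Read one "itemname code" pair into item. Returns 1 on success, 0 at end of
   input or on a malformed line. The field widths keep fscanf inside the arrays. */
int read_item(FILE *fp, struct Item *item)
{
    item->ptrnext = NULL;   /* links are rebuilt by the caller, not read from disk */
    return fscanf(fp, "%254s %254s", item->itemname, item->dspidc) == 2;
}
```

When loading, the caller would allocate a fresh node for every successful read_item call and link the nodes itself.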
The code works fine
not really:
a) You are dumping the raw contents of the struct to a file, including the pointer to another instance of "Item". You cannot expect to read a pointer back in from disk and use it as you do with ptrthis = ptrthis->ptrnext (I mean, this works as you "use" it in the given snippet, but only because that snippet does nothing meaningful at all).
b) You are writing 2 * 255 bytes of potential garbage to the file. The reason you see these strange-looking "blocks" in your file is that you write all 255 bytes of itemname and all 255 bytes of dspidc to disk, including the terminating \0 (how those bytes are displayed depends on your editor). The real "string" is something meaningful at the beginning of either itemname or dspidc, followed by a \0, followed by whatever was in memory before.
The term you need to look up and read about is serialization. There are already some libraries that solve the task of dumping data structures to disk (or the network, or anything else) and reading them back in, e.g. tpl.
First of all, I would only serialize the data, not the pointers.
Then, in my opinion, you have 2 choices:
write a parser for your syntax (with yacc for instance)
use a data dumping format such as rmi serialization mechanism.
Sorry I can't find online docs, but I know I have the grammar on paper.
Both of those solutions will be platform independent, whether the platform is big endian or little endian.
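If a binary format is preferred over text, the byte-order problem can also be avoided by fixing the order in the file format itself and converting with shifts rather than writing an int's memory directly. A small sketch, with write_u32_be and read_u32_be as made-up names and big endian chosen arbitrarily:

```c
#include <stdint.h>
#include <stdio.h>

/* Write v as four bytes, most significant first, whatever the host byte order. */
int write_u32_be(FILE *fp, uint32_t v)
{
    unsigned char b[4] = {
        (unsigned char)(v >> 24), (unsigned char)(v >> 16),
        (unsigned char)(v >> 8),  (unsigned char)v
    };
    return fwrite(b, 1, sizeof b, fp) == sizeof b ? 0 : -1;
}

/* Read four bytes and reassemble them in the same fixed order. */
int read_u32_be(FILE *fp, uint32_t *out)
{
    unsigned char b[4];
    if (fread(b, 1, sizeof b, fp) != sizeof b) return -1;
    *out = ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8)  |  (uint32_t)b[3];
    return 0;
}
```

Since the shifts operate on values and not on memory layout, the same code produces and consumes identical files on big-endian and little-endian hosts.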

Passing variable-length structures between MPI processes

I need to MPI_Gatherv() a number of int/string pairs. Let's say each pair looks like this:
struct Pair {
int x;
unsigned s_len;
char s[1]; // variable-length string of s_len chars
};
How to define an appropriate MPI datatype for Pair?
In short, it's theoretically impossible to send one message of variable size and receive it into a buffer of the perfect size. You'll either have to send a first message with the sizes of each string and then a second message with the strings themselves, or encode that metainfo into the payload and use a static receiving buffer.
If you must send only one message, then I'd forgo defining a datatype for Pair: instead, I'd create a datatype for the entire payload and dump all the data into one contiguous, untyped package. Then at the receiving end you could iterate over it, allocating the exact amount of space necessary for each string and filling it up. Let me whip up an ASCII diagram to illustrate. This would be your payload:
|..x1..|..s_len1..|....string1....|..x2..|..s_len2..|.string2.|..x3..|..s_len3..|.......string3.......|...
You send the whole thing as one unit (e.g. an array of MPI_BYTE), then the receiver would unpack it something like this:
while (buffer is not empty)
{
read x;
read s_len;
allocate s_len characters;
move s_len characters from buffer to allocated space;
}
Note however that this solution only works if the data representation of integers and chars is the same on the sending and receiving systems.
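The packing and unpacking steps above can be sketched in plain C as follows, with no MPI calls involved: the resulting byte buffer is what would be handed to the send and receive routines as MPI_BYTE. pack_pair and unpack_pair are names invented for this example, and the caller is assumed to have made the buffer large enough before packing.

```c
#include <stdlib.h>
#include <string.h>

/* Append x, the string length and the string bytes to buf.
   Returns the new number of bytes used; buf must have room for them. */
size_t pack_pair(unsigned char *buf, size_t used, int x, const char *s)
{
    unsigned s_len = (unsigned)strlen(s);
    memcpy(buf + used, &x, sizeof x);          used += sizeof x;
    memcpy(buf + used, &s_len, sizeof s_len);  used += sizeof s_len;
    memcpy(buf + used, s, s_len);              used += s_len;
    return used;
}

/* Read one pair starting at *pos (which must not exceed size) and advance *pos.
   Returns a malloc'ed, NUL-terminated copy of the string, or NULL if the
   buffer is exhausted or truncated. */
char *unpack_pair(const unsigned char *buf, size_t size, size_t *pos, int *x)
{
    unsigned s_len;
    if (size - *pos < sizeof *x + sizeof s_len) return NULL;
    memcpy(x, buf + *pos, sizeof *x);          *pos += sizeof *x;
    memcpy(&s_len, buf + *pos, sizeof s_len);  *pos += sizeof s_len;
    if (size - *pos < s_len) return NULL;
    char *s = malloc((size_t)s_len + 1);       /* exactly the space needed */
    if (s == NULL) return NULL;
    memcpy(s, buf + *pos, s_len);
    s[s_len] = '\0';
    *pos += s_len;
    return s;
}
```

The receiver would call unpack_pair in a loop until it returns NULL. memcpy is used instead of pointer casts because the integers are not necessarily aligned inside the packed buffer.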
I don't think you can do quite what you want with MPI. I'm a Fortran programmer, so bear with me if my understanding of C is a little shaky. You want, it seems, to pass a data structure consisting of 1 int and 1 string (which you pass by passing the location of the first character in the string) from one process to another? I think that what you are going to have to do is pass a fixed length string, which would therefore have to be as long as any of the strings you really want to pass. The reception area for the gathering of these strings will have to be large enough to receive all the strings together with their lengths.
You'll probably want to declare a new MPI datatype for your structs; you can then gather these and, since the gathered data includes the length of the string, recover the useful parts of the string at the receiver.
I'm not certain about this, but I've never come across truly variable message lengths as you seem to want to use, and it does sort of feel un-MPI-like. But it may be something implemented in the latest version of MPI that I've just never stumbled across, though looking at the documentation online it doesn't seem so.
MPI implementations do not inspect or interpret the actual contents of a message. Provided that you know the size of the data structure, you can represent that size in some number of char's or int's. The MPI implementation will not know or care about the actual internal details of the data.
There are a few caveats...both the sender and receiver need to agree on the interpretation of the message contents, and the buffer that you provide on the sending and receiving side needs to fit into some definable number of char's or int's.
