Serialize Data Structures in C

I'd like a C library that can serialize my data structures to disk, and then load them again later. It should accept arbitrarily nested structures, possibly with circular references.
I presume that this tool would need a configuration file describing my data structures. The library is allowed to use code generation, although I'm fairly sure it's possible to do this without it.
Note I'm not interested in data portability. I'd like to use it as a cache, so I can rely on the environment not changing.
Thanks.
Results
Someone suggested Tpl which is an awesome library, but I believe that it does not do arbitrary object graphs, such as a tree of Nodes that each contain two other Nodes.
Another candidate is Eet, which is a project of the Enlightenment window manager. Looks interesting but, again, seems not to have the ability to serialize nested structures.

Check out tpl. From the overview:
Tpl is a library for serializing C data. The data is stored in its natural binary form. The API is small and tries to stay "out of the way". Compared to using XML, tpl is faster and easier to use in C programs. Tpl can serialize many C data types, including structures.

I know you're asking for a library. If you can't find one (::boggle::, you'd think this was a solved problem!), here is an outline for a solution:
You should be able to write a code generator[1] to serialize trees/graphs without (run-time) pre-processing fairly simply.
You'll need to parse the node structure (typedef handling?), and write the included data values in a straight ahead fashion, but treat the pointers with some care.
For pointers to other objects (e.g. char *name;) which you know are singly referenced, you can serialize the target data directly.
For objects that might be multiply referenced, and for other nodes of your tree, you'll have to represent the pointer structure. Each object gets assigned a serialization number, which is what is written out in place of the pointer. Maintain a translation structure between current memory position and serialization number. On encountering a pointer, see if it has already been assigned a number; if not, give it one and queue that object up for serialization.
Reading back also requires a node-#/memory-location translation step, and might be easier to do in two passes: regenerate the nodes with the node numbers in the pointer slots (bad pointer, be warned) to find out where each node gets put, then walk the structure again fixing the pointers.
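The numbering scheme and the two-pass read described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the node type, the fixed-size table, and the names id_of, save_graph and load_graph are all made up for the example, and the file is assumed to be a trusted cache written by the same program on the same machine.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_NODES 64      /* fixed table size keeps the sketch short */
#define NO_NODE   (-1)    /* serialization number written for NULL */

typedef struct node {     /* hypothetical node type */
    int value;
    struct node *left, *right;
} node;

/* Translation table: index = serialization number, entry = address. */
typedef struct { node *addr[MAX_NODES]; int count; } id_table;

/* Return the number for p, assigning the next free one on first sight. */
static int id_of(id_table *t, node *p) {
    if (p == NULL) return NO_NODE;
    for (int i = 0; i < t->count; i++)
        if (t->addr[i] == p) return i;
    assert(t->count < MAX_NODES);
    t->addr[t->count] = p;
    return t->count++;
}

/* Write every node reachable from root; pointers become numbers.
 * Returns the number of nodes written, or -1 on I/O error. */
static int save_graph(FILE *f, node *root) {
    id_table t = { {0}, 0 };
    id_of(&t, root);
    /* The table doubles as the work queue: entries past i are pending. */
    for (int i = 0; i < t.count; i++) {
        node *n = t.addr[i];
        int rec[3] = { n->value, id_of(&t, n->left), id_of(&t, n->right) };
        if (fwrite(rec, sizeof rec, 1, f) != 1) return -1;
    }
    return t.count;
}

/* Rebuild the graph; returns the root (node number 0) or NULL. */
static node *load_graph(FILE *f) {
    node *nodes[MAX_NODES];
    int links[MAX_NODES][2];
    int rec[3], count = 0;
    /* Pass 1: allocate nodes in file order and remember the link numbers. */
    while (count < MAX_NODES && fread(rec, sizeof rec, 1, f) == 1) {
        nodes[count] = malloc(sizeof(node));
        if (nodes[count] == NULL) return NULL; /* sketch: earlier nodes leak */
        nodes[count]->value = rec[0];
        links[count][0] = rec[1];
        links[count][1] = rec[2];
        count++;
    }
    /* Pass 2: turn numbers back into addresses (file contents are trusted). */
    for (int i = 0; i < count; i++) {
        nodes[i]->left  = links[i][0] == NO_NODE ? NULL : nodes[links[i][0]];
        nodes[i]->right = links[i][1] == NO_NODE ? NULL : nodes[links[i][1]];
    }
    return count ? nodes[0] : NULL;
}
```

Because a node is only numbered once, shared and circular references are written a single time and come back pointing at the same restored object. Keeping the link numbers in a side array during pass 1 avoids ever storing a number in a pointer slot.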
I don't know anything about tpl, but you might be able to piggy-back on it.
The on-disk/network format should probably be framed with some type information. You'll need a name-mangling scheme.
[1] ROOT uses this mechanism to provide very flexible serialization support in C++.
Late addition: It occurs to me that this is not always as easy as I implied above. Consider the following (contrived and badly designed) declaration:
enum {
mask_none = 0x00,
mask_something = 0x01,
mask_another = 0x02,
/* ... */
mask_all = 0xff
};
typedef struct mask_map {
int mask_val;
char *mask_name;
} mask_map_t;
mask_map_t mask_list[] = {
{mask_something, "mask_something"},
{mask_another, "mask_another"},
/* ... */
};
struct saved_setup {
char* name;
/* various configuration data */
char* mask_name;
/* ... */
};
and assume that we initialize our struct saved_setup items so that mask_name points at mask_list[foo].mask_name.
When we go to serialize the data, what do we do with struct saved_setup.mask_name?
You will need to take care in designing your data structures and/or bring some case-specific intelligence to the serialization process.

This is my solution. It uses my own implementations of malloc and free, together with the mmap and munmap system calls. Follow the example code given here: http://amscata.blogspot.com/2013/02/serialize-your-memory.html
In my approach I create a char array as my own RAM space, with functions to allocate and free memory inside it. After creating the data structure, I use mmap to write the char array to a file.
Whenever you want to load it back into memory, there is a function which uses munmap to put the data structure back into the char array. Since the array holds virtual addresses for your pointers, you can reuse your data structure. That means you can create a data structure, save it, load it, edit it again, and save it again.

You can take a look at eet, a library from the Enlightenment project for storing C data types (including nested structures). Although nearly all libs of the Enlightenment project are in a pre-alpha state, eet is already released. I'm not sure, however, whether it can handle circular references. Probably not.

http://s11n.net/c11n/
HTH

You should check out gwlib. The serializer/deserializer is extensive, and there are extensive tests available to look at. http://gwlib.com/

I'm assuming you are talking about storing a graph structure, if not then disregard...
If you're storing a graph, I personally think the best idea would be to implement a function that converts your graph into an adjacency matrix. You can then write a function that converts an adjacency matrix back into your graph data structure.
This has three benefits (that may or may not matter in your application):
An adjacency matrix is a very natural way to create and store a graph.
You can create adjacency matrices elsewhere and import them into your application.
You can store and read your data in a meaningful way.
I used this method during a CS project, and it is definitely how I would do it again.
You can read more about adjacency matrix here: http://en.wikipedia.org/wiki/Modified_adjacency_matrix
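As a rough sketch of the two conversion functions, assuming a small fixed number of nodes held in an array (the gnode type, the constant N and both function names are invented for this example):

```c
#include <assert.h>
#include <string.h>

#define N 4   /* hypothetical fixed node count */

typedef struct gnode {
    int id;
    struct gnode *out[N];   /* outgoing edges as pointers */
    int n_out;
} gnode;

/* Graph -> matrix: m[i][j] is 1 when node i has an edge to node j. */
static void graph_to_matrix(const gnode nodes[N], unsigned char m[N][N]) {
    memset(m, 0, N * N);
    for (int i = 0; i < N; i++)
        for (int k = 0; k < nodes[i].n_out; k++)
            m[i][nodes[i].out[k] - nodes] = 1;   /* pointer -> array index */
}

/* Matrix -> graph: rebuild the pointer lists inside the nodes array. */
static void matrix_to_graph(unsigned char m[N][N], gnode nodes[N]) {
    for (int i = 0; i < N; i++) {
        nodes[i].id = i;
        nodes[i].n_out = 0;
        for (int j = 0; j < N; j++)
            if (m[i][j])
                nodes[i].out[nodes[i].n_out++] = &nodes[j];
    }
}
```

The matrix is a flat block of N*N bytes with no pointers in it, so it can be written and read back with a single fwrite/fread. It only records which nodes are connected; any per-node payload would have to be saved alongside it, and the storage grows with the square of the node count.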

Another option is Avro C, an implementation of Apache Avro in C.

Here is an example using the Binn library (my creation):
binn *obj;
// create a new object
obj = binn_object();
// add values to it
binn_object_set_int32(obj, "id", 123);
binn_object_set_str(obj, "name", "Samsung Galaxy Charger");
binn_object_set_double(obj, "price", 12.50);
binn_object_set_blob(obj, "picture", picptr, piclen);
// send over the network
send(sock, binn_ptr(obj), binn_size(obj));
// release the buffer
binn_free(obj);
If you don't want to use strings as keys you can use a binn_map which uses integers as keys.
There is also support for lists, and all these structures can be nested:
binn *list;
// create a new list
list = binn_list();
// add values to it
binn_list_add_int32(list, 123);
binn_list_add_double(list, 2.50);
// add the list to the object
binn_object_set_list(obj, "items", list);
// or add the object to the list
binn_list_add_object(list, obj);

In theory YAML should do what you want http://code.google.com/p/yaml-cpp/
Please let me know if it works for you.

Related

How to encode JSON buffer in C?

I have need of some advice.
I gather data from sensors on the analogue ports and I maintain data on the readings.
I then format this data into JSON, which I send to the cloud.
The code I have for formatting the various values builds the JSON in a character array using int sprintf ( char * str, const char * format, ... );.
Here is my routine that uses this code:
void StackData() {
char buff[256];
sprintf(buff, "{\"id\":\"stat\",\"minHour\":%1i,\"maxHour\":%2i,\"minDay\":%3i,\"maxDay\":%4i,\"inHour\":%5lu,\"iinDay\":%6lu,\"inWeek\":%7lu}",
minHour, maxHour, minDay, maxDay, AmpsHour, AmpsDay, AmpsWeek);
}
I would like to see how others might do this differently. Is there another way, perhaps using a specific library?
PS: I have successfully used coreJSON library to parse JSON input

What you have is reasonable, although an alternative might be some sort of result builder:
char buff[256] = { 0 };
jsonObjectOpen(buff);
jsonObjectInteger(buff,"minHour", minHour);
jsonObjectInteger(buff,"maxHour", maxHour);
jsonObjectClose(buff);
Basically each function appends the necessary JSON elements to the buffer. You'd need to implement a function for each data type (string, int, float) and, of course, make sure you call them in the correct order.
I don't think this is more succinct, but if you are doing it more than a few times, especially for more complex structures, you might find it more readable and maintainable.
It's entirely possible there is an existing library that will help with this type of approach. Whatever you use, be mindful of ensuring that the buffer space isn't exceeded during the building process.
In other languages that have type detection, this is a lot easier. I suppose you could always have a single function that takes a void pointer and a 'type' enum, but that could be more error prone for the sake of a marginally simpler API.
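One way such a builder could look, with the buffer limit checked on every append, is sketched below. The function names follow the outline above, but the json_builder struct and the extra parameters are assumptions added for the example, and string values are not escaped.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

typedef struct {
    char  *buf;       /* caller-supplied storage, cap must be > 0 */
    size_t cap;       /* total size of buf */
    size_t len;       /* characters written so far */
    int    members;   /* members emitted, decides on a leading comma */
    int    overflow;  /* set once something did not fit */
} json_builder;

/* Append formatted text; never writes past the end of the buffer. */
static void jb_appendf(json_builder *jb, const char *fmt, ...) {
    if (jb->overflow) return;
    va_list ap;
    va_start(ap, fmt);
    int n = vsnprintf(jb->buf + jb->len, jb->cap - jb->len, fmt, ap);
    va_end(ap);
    if (n < 0 || (size_t)n >= jb->cap - jb->len) {
        jb->overflow = 1;
        jb->buf[jb->len] = '\0';   /* drop the partial element */
    } else {
        jb->len += (size_t)n;
    }
}

static void jsonObjectOpen(json_builder *jb, char *buf, size_t cap) {
    jb->buf = buf; jb->cap = cap; jb->len = 0;
    jb->members = 0; jb->overflow = 0;
    jb_appendf(jb, "{");
}

static void jsonObjectInteger(json_builder *jb, const char *key, long v) {
    jb_appendf(jb, "%s\"%s\":%ld", jb->members++ ? "," : "", key, v);
}

/* Note: val is copied as-is; quotes or backslashes in it are not escaped. */
static void jsonObjectString(json_builder *jb, const char *key, const char *val) {
    jb_appendf(jb, "%s\"%s\":\"%s\"", jb->members++ ? "," : "", key, val);
}

/* Returns 0 if the whole object fit, -1 if the buffer was too small. */
static int jsonObjectClose(json_builder *jb) {
    jb_appendf(jb, "}");
    return jb->overflow ? -1 : 0;
}
```

Tracking the overflow in the builder means the caller only has to check one return value, from jsonObjectClose, rather than every append.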

It might be a good idea to separate building the JSON object from encoding it.
One existing JSON C library (the calls below match Jansson's API) does it the following way:
json_t *item = json_object();
json_object_set_new(item, "id", json_string("stat"));
json_object_set_new(item, "minHour", json_integer(minHour));
json_object_set_new(item, "maxHour", json_integer(maxHour));
...
// Dump to console
json_dumpf(item, stdout, JSON_INDENT(4) | JSON_SORT_KEYS);
// Dump to file
json_dumpf(item, file, JSON_COMPACT);
// Free allocated resources
json_decref(item);
The separation gives some benefits.
For example, the output formatting can be selected in one place.
And the same object can easily be encoded in several ways (as in the example).

Shared pointer in rust arrays

I have two arrays:
struct Data {
all_objects: Vec<Rc<dyn Drawable>>,
selected_objects: Vec<Rc<dyn Drawable>>,
}
selected_objects is guaranteed to be a subset of all_objects. I want to be able to add or remove mutable references to selected objects.
I can add the objects easily enough to selected_objects:
Rc::get_mut(selected_object).unwrap().select(true);
self.selected_objects.push(selected_object.clone());
However, if I later try:
for obj in self.selected_objects.iter_mut() {
Rc::get_mut(obj).unwrap().select(false);
}
This gives a runtime error, which matches the documentation for get_mut: "Returns None otherwise, because it is not safe to mutate a shared value."
However, I really want to be able to access and call arbitrary methods on both arrays, so I can efficiently perform operations on the selection, while also being able to still perform operations for all objects.
It seems Rc does not support this. RefMut is missing a clone() that would allow me to put it into multiple arrays, and it does not seem to support dyn types. Box is also missing a clone(). So my question is: how do you store writable pointers in multiple arrays? Is there another type of smart pointer for this purpose? Do I need to nest them? Is there some other data structure more suitable? Is there a way to give up the writable reference?

OK, it took me a bit of trial and error, but I have an ugly solution:
struct Data {
all_objects: Vec<Rc<RefCell<dyn Drawable>>>,
selected_objects: Vec<Rc<RefCell<dyn Drawable>>>,
}
The Rc allows you to store multiple references to an object. RefCell makes these references mutable. Now the only thing I have to do is call .borrow() (or .borrow_mut() when mutating) every time I use an object.
While this seems to work and be reasonably versatile, I'm still open to cleaner solutions.

Writing nested struct to disk in C

So I have a struct that looks something like this (more or less):
typedef struct AST_STRUCT
{
enum {
AST_OBJECT,
AST_REFERENCE,
AST_VARIABLE,
AST_VARIABLE_DEFINITION,
AST_VARIABLE_ASSIGNMENT,
AST_VARIABLE_MODIFIER,
AST_FUNCTION_DEFINITION,
AST_FUNCTION_CALL,
AST_NULL,
AST_STRING,
AST_CHAR,
AST_FLOAT,
AST_LIST,
AST_BOOLEAN,
AST_INTEGER,
AST_COMPOUND,
AST_TYPE,
AST_BINOP,
AST_NOOP,
AST_BREAK,
AST_RETURN,
AST_IF,
AST_ELSE,
AST_WHILE,
AST_ATTRIBUTE_ACCESS,
AST_LIST_ACCESS,
AST_NEW
} type;
struct AST_STRUCT* variable_value;
}
Now I would like to write this struct, serialized to the disk into a .dat file.
The problem is that as you can see, it has a field called variable_value .
I am using this function to write it to disk:
https://www.geeksforgeeks.org/readwrite-structure-file-c/
I am also using the other function in that article to read it from the disk.
It appears as if the variable_value field on the struct is not loaded properly.
How would I write the entire struct to disk and maintain the data of the variable_value field?
I was first thinking about dumping the variable_value field into a separate file and then sort of "link it back" into the struct once I load it, but maybe there is another way of doing this?

You are trying to serialize a linked structure (I assume a tree, from the name "AST").
If we imagine for a second that you successfully did that and wrote it to disk, when you load it back, you'll allocate memory for the links in the tree, but the addresses (values of pointers) of these memory chunks are not guaranteed to be the same as the old ones. You will not be able to reconstruct the tree.
So you can't use the value of the addresses as links on the disk. You'll need to use some other method. These days, the popular method is the JSON format, which would work for you assuming that you don't have cross-links or back-links in the tree.
So what you need is a JSON C library. I've never used one, but here are a couple I found in 2 seconds:
json-c
cJSON

JSON and XML are perfectly good formats to use if you want to save your data as text. They do have issues with restoring multiple references to the same object, but even those can be resolved.
If the data is a simple linked list, you can simply restore the objects in order and repair the links yourself as you go. If the data structure is more complex than that, you need a more generic solution.
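For the simple linked-list case, a minimal sketch might look like the following. It assumes the AST nodes form a plain chain through variable_value with no cycles or shared nodes, that an int stands in for the enum from the question, and that the file is read back on the same machine; ast_write_chain and ast_read_chain are names made up for the example.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct AST_STRUCT {
    int type;                           /* stands in for the enum */
    struct AST_STRUCT *variable_value;  /* next link in the chain */
} AST;

/* Write a node count, then each node's scalar data in chain order.
 * The pointer itself is never written. Returns the count or -1. */
static int ast_write_chain(FILE *f, const AST *head) {
    int count = 0;
    for (const AST *n = head; n != NULL; n = n->variable_value)
        count++;
    if (fwrite(&count, sizeof count, 1, f) != 1) return -1;
    for (const AST *n = head; n != NULL; n = n->variable_value)
        if (fwrite(&n->type, sizeof n->type, 1, f) != 1) return -1;
    return count;
}

/* Read the nodes back in order, relinking them as they are created.
 * On a short read the nodes restored so far are returned. */
static AST *ast_read_chain(FILE *f) {
    int count;
    if (fread(&count, sizeof count, 1, f) != 1) return NULL;
    AST *head = NULL;
    AST **tail = &head;               /* where the next node gets linked */
    for (int i = 0; i < count; i++) {
        AST *n = calloc(1, sizeof *n);
        if (n == NULL || fread(&n->type, sizeof n->type, 1, f) != 1) {
            free(n);
            break;
        }
        *tail = n;
        tail = &n->variable_value;
    }
    return head;
}
```

The links are implied by the order of the records, so no addresses or serial numbers need to be stored. This stops working as soon as two nodes can point at the same child, which is where the fix-up scheme comes in.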
If you really want to save the data as binary, then you need to fix up pointers when you reload. The main way to do this is to keep a map of saved addresses vs newly allocated addresses. If you really don't like the idea of emitting the saved addresses, you can use a serial number to represent each unique address you find.
For each object you save, you have to record its scalar address and the type of object before you save the object itself.
For each pointer you save, you need to save the scalar address, and "remember" to later save that referenced object.
On restoring, when you restore an object you need to load the object based on its type, and create a mapping entry that shows how the saved address has turned into a restored address.
If an object contains any pointers, the address stored in those pointers is found by applying that mapping. However, you will often find that the object has not yet been loaded for that pointer, so you will also need to record a mapping from the saved scalar address to the address of the pointer. You can then either fix up all these unfinished pointers after you finish loading, or you can do it for a particular referenced object after that object is loaded.
You need to put in a little extra care to handle objects that support multiple inheritance, by noting that the pointer does not point to the root of the object.
But otherwise, this is just about all you need, plus considerations of versioning, endianness, padding - if you care about the longevity of the data you are saving.

Ansi C dynamic include

I was assigned to edit part of an ANSI C application, but my knowledge of pure C covers just the basics. The current situation is that I have map1_data1.h, map1_data2.h, map2_data1.h and map2_data2.h, and the variables in those files are always tied to the map name: map1_structure in map1_data1.h, and so on.
In app there is #include for each file and in code then something like
if (game->map == 1) {
mapStructure = map1_structure;
} else {
mapStructure = map2_structure;
}
I have to extend this to be able to load the map dynamically, so something like
void loadMap(int mapId){
mapStructure = map*mapId*_structure // just short for what i want to achieve
}
My first idea was to remove the map name from the variable names in map1_data1.h and just have a structure variable in there. That requires only one header file to be loaded at a time, and that's where I'm stuck. I haven't found any clues on how to do this on Google.
I would like it to be as flexible as possible, so something like #include "map*mapId*_data1.h", but it would be OK to have one switch in one place in the whole app to decide which map to load.
One more thing, the app keeps running for more than 1 game = it will load various maps in one run.

Judging from the comments, you have a single type, call it Map, which is a structure type containing a collection of different data types, including 3D arrays and points and so on. You need to have some maps built into the program; later on, you will need to load new maps at runtime.
You have two main options for loading the maps at runtime:
1. Map in shared object (shared library, dynamically loaded library, aka DLL).
2. Map in data file.
Of these two, you will choose the data file over the shared object because it is, ultimately, simpler and more flexible.
Shared Object
With option 1, only someone who can compile a shared library can create the new maps. You'd have a 'library' consisting of one or more data objects, which can be looked up by name. On most Unix-like systems, you'd end up using dlopen() to load the library, and then dlsym() to find the symbol name in that library (specifying the name via a string). If it is present in the library, dlsym() will return you a pointer.
In outline:
typedef void *SO_Handle;
const char *path_to_library = "/usr/local/lib/your_game/libmap32.so";
const char *symbol_name = "map32_structure";
SO_Handle lib = dlopen(path_to_library, RTLD_NOW);
if (lib == 0)
...bail out...
map_structure = dlsym(lib, symbol_name);
if (map_structure == 0)
...bail out...
You have to have some way of generating the library name based on where the software is installed and where extensions are downloaded. You also have to have some way of knowing the name of the symbol to look for. The simplest system is to use a single fixed name (map_structure), but you are not constrained to do that.
After this, you have your general map_structure ready for use. You can invent endless variations on the theme.
Data file
This is the more likely way you'll do it. You arrange to serialize the map structure into a disk file that can be read by your program. This will contain a convenient representation of the data. You should consider the TLV (type-length-value) encoding scheme: the type tells you what sort of data follows, the length tells you how much of it there is, and the value is the data itself. You can do this with binary data or with text data. It is easier to debug text data because you can look at it and see what's going on. The chances are that the difference in performance between binary and text is small enough (swamped by the I/O time) that using text is the correct way to go.
With a text description of the map, you'd have information to identify the file as being a map file for your game (perhaps with a map format version number). Then you'd have sections describing each of the main elements in the Map structure. You'd allocate the Map (malloc() et al), and then load the data from the file into the structure.
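To make the TLV idea concrete, here is a small sketch of the binary variant. The tag numbers, the Map fields and the function names are hypothetical, and the lengths and integers are written in the machine's native byte order, so the file is only readable on the same kind of machine.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

enum { TAG_NAME = 1, TAG_WIDTH = 2, TAG_HEIGHT = 3 };  /* hypothetical tags */

typedef struct { char name[32]; int32_t width, height; } Map;

/* Write one record: 1-byte type, 4-byte length, then the value bytes. */
static int tlv_write(FILE *f, uint8_t type, const void *val, uint32_t len) {
    if (fwrite(&type, 1, 1, f) != 1) return -1;
    if (fwrite(&len, sizeof len, 1, f) != 1) return -1;
    if (len != 0 && fwrite(val, 1, len, f) != len) return -1;
    return 0;
}

/* Read one record. Returns 1 on success, 0 at clean EOF, -1 on error. */
static int tlv_read(FILE *f, uint8_t *type, void *buf, uint32_t cap,
                    uint32_t *len) {
    if (fread(type, 1, 1, f) != 1) return feof(f) ? 0 : -1;
    if (fread(len, sizeof *len, 1, f) != 1) return -1;
    if (*len > cap) return -1;                 /* value too big for buf */
    if (*len != 0 && fread(buf, 1, *len, f) != *len) return -1;
    return 1;
}

/* Fill *m from the records in f. Returns 0 on success, -1 on error. */
static int map_load(FILE *f, Map *m) {
    uint8_t type;
    uint32_t len;
    unsigned char buf[64];
    int rc;
    memset(m, 0, sizeof *m);
    while ((rc = tlv_read(f, &type, buf, sizeof buf, &len)) == 1) {
        switch (type) {
        case TAG_NAME:
            if (len >= sizeof m->name) return -1;
            memcpy(m->name, buf, len);         /* memset left it terminated */
            break;
        case TAG_WIDTH:
            if (len != sizeof m->width) return -1;
            memcpy(&m->width, buf, len);
            break;
        case TAG_HEIGHT:
            if (len != sizeof m->height) return -1;
            memcpy(&m->height, buf, len);
            break;
        default:
            break;   /* unknown tag: ignore it so newer files still load */
        }
    }
    return rc;
}
```

Because every record carries its own length, the reader can skip tags it does not recognize, which is what makes the format easy to extend. A text version would follow the same shape with a keyword and a value per line.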

Common datastructure library in C

Hello, I have started writing a common data structure library in C, similar to the STL.
Here is the link: http://code.google.com/p/cstl/
I struggled a lot over whether to go ahead with void* as the basic element for the data structures, and ended up with a structure which has two elements:
typedef struct __c_lib__object {
void* raw_data;
size_t size;
} clib_object, *clib_object_ptr;
This approach allows me to store any element, but it requires a lot of memory allocation when saving an element to the container and returning it back.
Can anybody please review this and let me know if there is any other approach?
Thanks
Avinash

Names starting with double-underscore are reserved to 'the implementation' and should be avoided in user code.
Personally, I dislike typedefs for pointers; I'd rather use clib_object *x; than clib_object_ptr x;.
Why do you need to record the size of the object?