In C, is there a cross-platform way to store what a variable might contain for quick reloading of its contents? - c

The idea is that an application may contain a struct of large arrays that are filled up via a slow external library. So, what if that struct could be easily stored to a file for fast reloading, at least after it has been computed once? If it can't be done easily in a cross-platform way, can it at least be done locally, "after a first run"?

It depends on the way the structure is filled. If the structure has a fixed size (that is, it does not contain any dynamically allocated pointers) and is self-contained (it does not contain pointers to memory outside the structure itself), then you can dump the struct directly to a file using standard library file operations, something along these lines:
#include <stdio.h>
FILE *file;
file = fopen( "filename", "wb" ); /* binary mode */
fwrite( &your_struct, sizeof(your_struct), 1, file );
fclose( file );
(note: error checking omitted for clarity and conciseness)
Reloading looks something like this:
file = fopen( "filename", "rb" );
fread( &your_struct, sizeof(your_struct), 1, file );
fclose( file );
This method works on any single platform. However, it is not strictly cross-platform, since the resulting file cannot be ported between machines of different endianness (for example, old Macintoshes stored the bytes composing an int in a different order than an IBM PC); the resulting file can only be used on platforms with the same architecture as the computer that produced it.
Now, if the struct is not self-contained (it contains a pointer referencing memory outside the struct) or uses dynamically allocated memory, then you will need something more elaborate...
Regarding the endianness problem, the standard BSD socket implementation, which exists on almost every platform, defines a set of functions to convert from network byte order to host byte order (and their inverses), which are really handy, since network byte order is strictly cross-platform. Have a look at htons() and ntohs(), htonl() and ntohl(). Unfortunately, you have to call these functions for each field of the structure, which is quite cumbersome if the structure is large.

Maybe you can store the data in an XML file. That way you avoid the problems Adrian mentioned, you have no problem with language-specific character sets, and you even have the opportunity to read, write, and handle the data in completely different programming languages.

Related

Virtualizing fopen with some malloc-ed memory instead of using a file

I have a piece of code using a FILE* file with a fwrite:
test = fwrite(&object,sizeof(object),1,file);
I want to serialize an internal data structure with an indexing structure (so I'm using neither Google's Protobuf nor Cap'n Proto, since this is a custom data structure with some specific indexing requirements). Now, inside my project, I want to use Google Test to test the serialization, checking that what has been serialized can be deserialized and retrieved correctly. In the testing phase, I want to pass fwrite a FILE* object which is not a file but a handle to some allocated main memory, so that no file is produced and I can directly check the main memory for the results of the serialization. Is it possible to virtualize the FILE* and write directly into main memory? I would like to keep fwrite for writing data structures for performance reasons, without being forced to write two different serialization methods (sometimes I'm writing on the fly, with no further memory occupation for transcoding). Thanks in advance.
One way is to create a dynamic library with all those fopen/fwrite functions (that would do something for your magic filename and fall back to the original ones otherwise) and load it with LD_PRELOAD. To fall back to the originals, resolve them with "dlsym" and RTLD_NEXT.
Another way is to include a special header at the top of the library/test which would have a statement like "#define fopen my_fopen". Inside the file with the implementation of "my_fopen" you need to put "#undef fopen" before including original "stdio.h". This approach will only work for your source code files that include the header but will not hook the functions for the binary libraries that you link.
fopencookie did the job I was looking for.
http://man7.org/linux/man-pages/man3/fopencookie.3.html

C - Save/Load Pointer Data to File

Firstly apologies if this question has been asked before or has a glaring obvious solution that I cannot see. I have found a similar question however I believe what I am asking goes a little further than what was previously asked.
I have a structure as follows:
typedef struct {
    int id;
    char *title;
    char *body;
} journal_entry;
Q: How do I write and load the contents of a pointer to memory in C (not C++) without using fixed lengths?
Am I wrong in thinking that by writing title or body to a file I would end up with junk data (the pointer values) and not the information I had stored? I do not know what size the title or body of a journal entry will be, and the size may vary significantly from entry to entry.
My own reading suggests that I will need to dereference the pointers and fwrite each part of the struct separately. But I'm uncertain how to keep track of the data and the structs without things becoming confused, particularly for larger files. Furthermore, if these are not the only items I intend to store in the file (for example, I may wish to include small images later on), I'm uncertain how I would order the file structure for convenience.
The other (possibly perceived) problem is that I have used malloc to allocate memory for the title and body strings; when loading the data, how will I know how much memory to allocate for each string? Do I need to expand my struct to include int body_len and int title_len?
Guidance or suggestions would be very gratefully received.
(I am focusing on a Linux point of view, but it could be adapted to other systems)
Serialization
What you want to achieve is often called serialization (citing wikipedia) - or marshalling:
The serialization is the process of translating data structures or object state into a format that can be stored and reconstructed later in the same or another computer
Pointer I/O
It is in principle possible to read and write pointers, e.g. with the %p conversion specification of fprintf(3) & fscanf(3) (and you might even write and read a pointer directly, since at the machine level it is just some intptr_t integer). However, a given address (e.g. 0x1234F580) is likely to be invalid, or to have a different meaning, when read again by a different process (e.g. because of ASLR).
Serialization of aggregate data
You might use some textual format like JSON (which I actually recommend) or another format like YAML (or perhaps invent your own, e.g. inspired by s-exprs). It is a well-established habit to prefer textual formats (Unix has had that habit since before 1980) over binary ones (like XDR, ASN.1, ...), and many protocols (HTTP, SMTP, FTP, JSONRPC, ...) are textual.
Notice that on current systems, I/O is much slower than computation, so the relative cost of textual encoding & decoding is tiny compared to network or disk I/O.
The encoding of some aggregate data (e.g. a struct in C) is generally compositional: by composing the encodings of elementary scalar data (numbers, strings, ...) you can encode a higher-level data type.
serialization libraries
Most formats (notably JSON) have several free software libraries to encode/decode them, e.g. Jansson, JsonCPP, etc..
Suggestion:
Use JSON and format your journal_entry perhaps into a JSON object like
{ "id": 1234,
  "title": "Some Title Here",
  "body": "Some body string goes here" }
Concretely, you'll use some JSON library, first converting your journal_entry into the library's JSON type (and vice versa), then using the library to encode/decode that JSON.
databases
You could also consider a database approach (e.g. sqlite, etc...)
PS. Serialization of closures (or anything containing pointer to code) may be challenging. You'll need to define what exactly that means.
PPS. Some languages provide builtin support for serialization and marshalling. For example, OCaml has a Marshal module, and Python has pickle.
You are correct that storing this structure as-is is not a good idea, because once the strings to which your pointers point are gone, there is no way to retrieve them. From a practical point of view, one way is to declare strings of finite length (if you know that your strings have a length limit):
typedef struct {
    int id;
    char title[MAX_TITLE_LENGTH];
    char body[MAX_BODY_LENGTH];
} journal_entry;
If you need to allocate title and body with malloc, you can have a "header" element that stores the length of the whole structure. When you write your structure to file, you would use this element to figure out how many bytes you need to read.
I.e. to write:
FILE* fp = fopen(<your-file-name>,"wb");
size_t size = sizeof(id)+strlen(title)+1+strlen(body)+1;
fwrite(&size, sizeof(size), 1, fp);
fwrite(&id, sizeof(id), 1, fp);
fwrite(title, sizeof(char), strlen(title)+1, fp);
fwrite(body, sizeof(char), strlen(body)+1, fp);
fclose(fp);
To read (not particularly safe implementation, just to give the idea):
FILE* fp = fopen(<your-file-name>, "rb");
size_t size;
size_t read_bytes = 0;
journal_entry je;
fread(&size, sizeof(size), 1, fp);
char* buf = malloc(size); // char*, not void*, so pointer arithmetic below is valid
fread(buf, size, 1, fp);
fclose(fp);
memcpy(&je.id, buf, sizeof(je.id)); // might break if the file was written on an OS with different endianness
read_bytes += sizeof(je.id);
je.title = buf + read_bytes;
read_bytes += strlen(je.title) + 1;
je.body = buf + read_bytes;
// the other way would be to malloc je.title and je.body separately and free buf
In memory you can store strings as pointers to arrays, but in a file on disk you would typically store the data directly. One easy way is to store a uint32_t containing the size, followed by the actual bytes of the string. You could also store null-terminated strings in the file and simply scan for the null terminator when reading them. The first method makes it easier to preallocate the needed buffer space when reading, without needing to pass over the data twice.

Write dynamically allocated structure to file

Suppose we have following structure:
struct Something {
    int i;
};
If I want to write in a file any data of this type(dynamically allocated), I do this:
struct Something *object = malloc(sizeof(struct Something));
object->i = 0; // set the member to some value
FILE *file = fopen("output_file", "wb");
fwrite(object, sizeof(struct Something), 1, file);
fclose(file);
Now, my questions:
How do we do this with a structure that contains pointers? I tested using the same method and it worked fine, the data could be read, but I want to know if there are any risks.
What you want is called serialization. See also XDR (a portable binary data format) & libs11n (a C++ binary serialization library); you often care about data portability: being able to read the data on some different computer.
"serialization" means to "convert" some complex data structure (e.g. a list, a tree, a vector or even your Something...) into a (serial) byte stream (e.g. a file, a network connection, etc...), and backwards. Dealing with circular data structures or shared sub-components may be tricky.
You don't want to write raw pointers inside a file (but you could), because the written address probably won't make any sense at the next execution of your program (e.g. because of ASLR), i.e. when you'll read the data again.
Read also about application checkpointing and persistence.
For pragmatic reasons (notably ease of debugging and resilience w.r.t. small software evolution) it is often better to use some textual data format (like e.g. JSON or Yaml) to store such persistent data.
You might also be interested in databases. Look first into sqlite, and also into DBMS ("relational" -or SQL based- ones like PostGreSQL, NoSQL ones like e.g. MongoDB)
The issue is not writing a single dynamically allocated struct (since you want mostly to write the data content, not the pointer, so it is the same to fwrite a malloc-ed struct or a locally allocated one), it is to serialize complex data structures which use lots of weird internal pointers!
Notice that copying garbage collectors use algorithms similar to serialization algorithms (since both need to scan a complex graph of references).
Also, on today's computers, disk -or network- IO is a lot (e.g. a million times) slower than the CPU, so it makes sense to do some significant computation before writing files.

Use fopen to open file repeatedly in C

I have a question about "fopen" function.
FILE *pFile1, *pFile2;
pFile1 = fopen(fileName,"rb+");
pFile2 = fopen(fileName,"rb+");
Can I say that pFile1==pFile2? Besides, can FILE type be used as a key of map?
Thanks!
Can I say that pFile1 == pFile2?
No. pFile1 and pFile2 are pointers to two distinct FILE structures, returned by the two different function calls.
Give it a try!!
To add further:
Note opening a file that is already open has implementation-defined behavior, according to the C Standard:
FIO31-C. Do not open a file that is already open
subclause 7.21.3, paragraph 8 [ISO/IEC 9899:2011]:
Functions that open additional (nontemporary) files require a file
name, which is a string. The rules for composing valid file names are
implementation-defined. Whether the same file can be simultaneously
open multiple times is also implementation-defined.
Some platforms may forbid a file being opened multiple times simultaneously, while others may allow it. Therefore, portable code cannot depend on what happens if this rule is violated. This isn't a problem on POSIX-compliant systems, though: many applications open a file multiple times to read it concurrently (of course, if you also want write operations, you may need a concurrency control mechanism, but that's a different matter).
Can I say that pFile1==pFile2?
(edited after reading the pertinent remark of Grijesh Chauhan)
You can say that pFile1 != pFile2, because two things can happen:
the system forbids opening the file twice, in which case pFile2 will be NULL
the system allows a second opening, in which case pFile2 will point to a different context.
This is one more reason among thousands to check system calls, by the way.
Assuming the second call succeeded, you can, for instance, seek to a given position with pFile1 while you read from another position with pFile2.
As a side note, since you will eventually access the same physical disk, it is rarely a good idea to do so unless you know exactly what you're doing. Seeking back and forth like crazy between two different parts of a big file could eventually force the disk driver to wobble between two physical parts of the disk, reducing your I/O performance dramatically (unless the disk is a non-seeking device like an SSD).
can FILE type be used as a key of map?
No, because
it would not make any sense to use an unknown structure of an unknown size whose lifetime you have no direct control of as a key
the FILE class does not implement the necessary comparison operator
You could use a FILE *, though, since any pointer can be used as a map key.
However, it is pretty dangerous to do so. For one thing, the pointer is just like a random number to you: it comes from some memory allocation within the stdio library, and you have no control over it.
Second, if for some reason you deallocate the file handle (i.e. you close the file), you will keep using an invalid pointer as a key unless you also remove the entry from the map. This is doable, but both awkward and dangerous IMHO.

writing data structure to a file

I know following approach may not be portable but that is exactly what I want to find out now.
Imagine I have some data structure
struct student
{
    char name[20];
    int age;
} x;
Now I want to write it to a file like this:
fwrite(&x, sizeof(struct student), 1, filePointer); // in C, the struct keyword is required in sizeof
Read similarly:
fread(voidPointer, sizeof(struct student), 1, filePointer);
// Now possibly do a memcpy
memcpy(studentObjectPointer, voidPointer, sizeof(struct student));
My question is: Say I don't want to copy this file to another computer, and I will read it from the same computer that created this file.
Will the portability (endianness, packed data structure) issues still apply to above approach? Or it will work fine?
If the file were to be copied to other machines, you would have to build your own serializer and deserializer. With the structure you gave, this is quite simple.
You have to define which endianness to adopt when writing numbers (here, the int age).
Here, you could process like this :
Open the file in binary mode
Write the 20 bytes of the name string
Write the age in a CHOSEN endianness (big endian)
When reading it back, you will certainly have to convert from big-endian to the local endianness of the machine.
There is still a remaining issue: if sizeof(int) is not the same on the two machines, things get more vicious, and I think the above is not sufficient.
If you really need to be portable across a wide range of machines, consider the use of specific length types defined in #include <stdint.h> such as int32_t for instance.
Remember, we're living in a four-dimensional world. You said the saved file will not move along the x, y and z axes; it will be used on the same computer. But you did not mention the fourth dimension: time. If the structure changes, reading an old file will fail.
At the very least, you should put a signature field in the structure: fill it with a constant value before write(), check it right after read(), and change the constant whenever the structure is modified.
