I am trying to read mixed data into a C struct
usually, I do something like this
typedef struct {
uint32_t value;
float x,y,z;
} __attribute__((__packed__)) data;
and read it in like so:
data x;
fread(&x, 1, sizeof(data), filePointer);
and that works just fine for fixed-length data. However, I need to load an ASCIIZ string, which is variable length, and I was wondering if there is an easy way to read that into a struct.
Sorry, but there is no built-in serialization for C. This has been asked on SO before with some very good answers.
If that doesn't give you what you want, then search for C serialize or C serialization in your favorite search engine.
There are two ways you could be storing your ASCIIZ string in the structure, exemplified by:
struct asciiz_1
{
char asciiz[32];
};
struct asciiz_2
{
size_t buflen;
char *buffer;
};
The first (struct asciiz_1) can be treated the same way as your struct data; even though the string may be of variable length with garbage after the null (zero) byte, the structure as a whole is a fixed size and can be handled safely with fread() and fwrite().
The second (struct asciiz_2) cannot be read or written with a single fread() or fwrite() call. You have to allocate the extra space to receive the string (presumably after reading the length), and the pointer value should not be written to the file (it won't have any meaning to the reading process). So, you have to handle this differently.
Your data structure - your choice.
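If the file stores the string as raw bytes ending in a NUL with no length in front, one way to receive it into a pointer member is to read byte by byte into a growing buffer. This is only a sketch: read_asciiz is a hypothetical helper name, and it assumes the fixed-size part of the record has already been read with fread().

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Reads bytes from fp up to and including the terminating NUL.
 * Returns a malloc'd string the caller must free, or NULL on
 * allocation failure or if EOF arrives before a NUL byte. */
char *read_asciiz(FILE *fp)
{
    size_t cap = 16, len = 0;
    char *buf;
    int c;

    assert(fp != NULL);
    buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    while ((c = fgetc(fp)) != EOF) {
        if (len == cap) {               /* grow before storing */
            char *tmp = realloc(buf, cap * 2);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
        buf[len++] = (char)c;
        if (c == '\0')
            return buf;                 /* terminator stored, done */
    }
    free(buf);                          /* hit EOF without a NUL */
    return NULL;
}
```

The result can be assigned to the buffer member of struct asciiz_2, with buflen set from strlen() plus one.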
I'd like to output a struct's data to a binary file, but without any padding bits between each variable's information. For example:
struct s {
int i1;
short s1;
char c1;
};
struct s example[2];
If I use fwrite(&example, sizeof(struct s), 2, file), the binary file still has the padding bits between, for example, s1 and c1, and also from c1 to i1 (of the 2nd struct).
What would be a good approach to remove those padding bits from the output file ?
Thanks! Any help is appreciated
I would just suggest manually reading/writing the members of the struct individually. Packing using your compiler directives can cause inefficiency and portability issues with unaligned data access. And if you have to deal with endianness, it's easy to support that later when your read operations break down to field members rather than whole structs.
Another thing, and this relates more to future maintenance concerns, is that you don't want your serialization code or the files people have saved so far to break if you change the structure a bit (add new elements or even change the order as a cache line optimization, e.g.). So you'll potentially run into a lot less pain with code that provides a bit more breathing room than dumping the memory contents of the struct directly into a file, and it'll often end up being worth the effort to serialize your members individually.
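To illustrate the endianness point: once you write field by field, you can fix the byte order in the file yourself. A minimal sketch, assuming you pick little-endian as the file format; write_u32_le and read_u32_le are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Writes v as four bytes, least significant first, whatever the host byte order is. */
int write_u32_le(FILE *fp, uint32_t v)
{
    unsigned char b[4];
    b[0] = (unsigned char)(v & 0xFF);
    b[1] = (unsigned char)((v >> 8) & 0xFF);
    b[2] = (unsigned char)((v >> 16) & 0xFF);
    b[3] = (unsigned char)((v >> 24) & 0xFF);
    return fwrite(b, 1, sizeof b, fp) == sizeof b ? 0 : -1;
}

/* Reads four little-endian bytes and rebuilds the value. Returns 0 on success. */
int read_u32_le(FILE *fp, uint32_t *out)
{
    unsigned char b[4];
    assert(out != NULL);
    if (fread(b, 1, sizeof b, fp) != sizeof b)
        return -1;
    *out = (uint32_t)b[0]
         | ((uint32_t)b[1] << 8)
         | ((uint32_t)b[2] << 16)
         | ((uint32_t)b[3] << 24);
    return 0;
}
```

A file written this way should read back the same on big-endian and little-endian machines, which a raw fwrite of the struct cannot promise.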
If you want to generalize a pattern and reduce the amount of boilerplate you write, you can do something like this as a basic example to start and build upon:
enum { max_fields = 16 }; /* capacity; 16 is an arbitrary choice */
struct Fields
{
int num;
void* ptrs[max_fields];
int sizes[max_fields];
};
void field_push(struct Fields* fields, void* ptr, int size)
{
assert(fields->num < max_fields);
fields->ptrs[fields->num] = ptr;
fields->sizes[fields->num] = size;
++fields->num;
}
struct Fields s_fields(struct s* inst)
{
struct Fields new_fields;
new_fields.num = 0;
field_push(&new_fields, &inst->i1, sizeof inst->i1);
field_push(&new_fields, &inst->s1, sizeof inst->s1);
field_push(&new_fields, &inst->c1, sizeof inst->c1);
return new_fields;
}
Now you can use this Fields structure with general-purpose functions to read and write members of any struct, like so:
void write_fields(FILE* file, struct Fields* fields)
{
int j=0;
for (; j < fields->num; ++j)
fwrite(fields->ptrs[j], fields->sizes[j], 1, file);
}
This is generally a bit easier to work with than some functional for_each_field kind of approach accepting a callback.
Now all you have to worry about when you create some new struct, S, is to define a single function to output struct Fields from an instance to then enable all those general functions you wrote that work with struct Fields to now work with this new S type automatically.
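The reading side can be sketched the same way. read_fields is a hypothetical counterpart to write_fields, and max_fields = 16 is an assumed capacity; the other definitions are repeated so the example compiles on its own.

```c
#include <assert.h>
#include <stdio.h>

enum { max_fields = 16 };   /* assumed capacity */

struct Fields {
    int num;
    void *ptrs[max_fields];
    int sizes[max_fields];
};

struct s { int i1; short s1; char c1; };

void field_push(struct Fields *fields, void *ptr, int size)
{
    assert(fields->num < max_fields);
    fields->ptrs[fields->num] = ptr;
    fields->sizes[fields->num] = size;
    ++fields->num;
}

struct Fields s_fields(struct s *inst)
{
    struct Fields new_fields;
    new_fields.num = 0;
    field_push(&new_fields, &inst->i1, sizeof inst->i1);
    field_push(&new_fields, &inst->s1, sizeof inst->s1);
    field_push(&new_fields, &inst->c1, sizeof inst->c1);
    return new_fields;
}

void write_fields(FILE *file, struct Fields *fields)
{
    int j;
    for (j = 0; j < fields->num; ++j)
        fwrite(fields->ptrs[j], (size_t)fields->sizes[j], 1, file);
}

/* Reads each field in order; returns how many fields were read completely. */
int read_fields(FILE *file, struct Fields *fields)
{
    int j;
    for (j = 0; j < fields->num; ++j)
        if (fread(fields->ptrs[j], (size_t)fields->sizes[j], 1, file) != 1)
            break;
    return j;
}
```

Because the same s_fields function drives both directions, the field order in the file cannot drift apart between the writer and the reader.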
Many compilers accept a command line parameter which means "pack structures". In addition, many accept a pragma:
#pragma pack(1)
where 1 means byte alignment, 2 means 16-bit word alignment, 4 means 32-bit word alignment, etc.
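As a rough illustration, the push/pop form of the pragma (accepted by GCC, Clang and MSVC, though not part of standard C) lets you limit packing to a single struct. The struct and function names here are made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

struct s_padded {           /* default layout, compiler may insert padding */
    int i1;
    short s1;
    char c1;
};

#pragma pack(push, 1)       /* byte alignment from here ... */
struct s_packed {
    int i1;
    short s1;
    char c1;
};
#pragma pack(pop)           /* ... back to the previous setting */

size_t padded_size(void) { return sizeof(struct s_padded); }
size_t packed_size(void) { return sizeof(struct s_packed); }

/* With byte alignment, c1 sits directly after i1 and s1. */
void check_packed_layout(void)
{
    assert(offsetof(struct s_packed, c1) == sizeof(int) + sizeof(short));
}
```

On a typical platform with a 4-byte int, padded_size() would be 8 and packed_size() would be 7, but the exact numbers depend on the compiler and target.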
To make your solution platform independent, you can create a function that writes each field of the struct one at a time, and then call the function to write as many of the structs as needed.
size_t writeStruct(struct s* obj, size_t count, FILE* file)
{
size_t i = 0;
for ( ; i < count; ++i )
{
// Make sure to add error checking code.
fwrite(&(obj[i].i1), sizeof(obj[i].i1), 1, file);
fwrite(&(obj[i].s1), sizeof(obj[i].s1), 1, file);
fwrite(&(obj[i].c1), sizeof(obj[i].c1), 1, file);
}
// Return the number of structs written to file successfully.
return i;
}
Usage:
struct s example[2];
writeStruct(example, 2, file);
I have to write a structure to a file so I can read it in later. The struct is:
struct prog{
char* title;
char* channel;
struct tm* start;
struct tm* end;
float review;
enum soort sort;
union type *type;
};
union type{
struct serie *ep;
struct film *mov;
};
struct serie{
int seznum;
int epnum;
char* eptitle;
};
struct film{
char* reg;
char* act;
char* genre;
};
enum soort { film, serie };
So you can see that there are pointers to strings and to struct tm (from time.h). I can't figure out how to write it to a binary file. All I can do is write it string by string, but there must be a more efficient way to write the previous struct.
I think the problem starts with the char* and the pointers to the tm structs, because the program is now going to write the address of the start of a string or the address of the tm struct. I want to write the string and not the address to a file. I tried with record I/O but it writes the addresses, so I can't read them later.
Thanks!
You may want to look at time_t or other alternatives which can store time in a single integer timestamp. You can also use a database or a simple file storage class.
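A minimal sketch of the time_t idea, assuming the struct tm holds local time and that the file is read back on a machine with the same byte order. write_time and read_time are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* Stores a broken-down local time as one 64-bit timestamp. Returns 0 on success. */
int write_time(FILE *fp, const struct tm *when)
{
    struct tm copy;
    time_t t;
    int64_t stamp;

    assert(when != NULL);
    copy = *when;                   /* mktime may normalize its argument */
    t = mktime(&copy);
    if (t == (time_t)-1)
        return -1;
    stamp = (int64_t)t;
    return fwrite(&stamp, sizeof stamp, 1, fp) == 1 ? 0 : -1;
}

/* Reads the timestamp back and expands it into a struct tm. Returns 0 on success. */
int read_time(FILE *fp, struct tm *out)
{
    int64_t stamp;
    time_t t;
    struct tm *local;

    if (fread(&stamp, sizeof stamp, 1, fp) != 1)
        return -1;
    t = (time_t)stamp;
    local = localtime(&t);
    if (local == NULL)
        return -1;
    *out = *local;                  /* copy out of localtime's static buffer */
    return 0;
}
```

This replaces each struct tm pointer with eight bytes in the file, so start and end no longer need any pointer chasing.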
Otherwise you can't really improve this with native C functions. You could remove pointers and use fixed character arrays like so:
struct T_film
{
char text[50];
int i;
};
struct T_prog
{
char title[50];
struct tm start;
struct tm end;
struct T_film film;
};
struct T_prog data;//initialize...
fwrite(&data, 1, sizeof(struct T_prog), fout);
fseek(fin, 0, SEEK_SET);
fread(&data, 1, sizeof(struct T_prog), fin);
Now the structure has a fixed layout and everything can be stored in binary. BUT, this can be even more inefficient. The fixed size of text will be too short for some strings and too long for others, and the unused space still has to be written and read. It's not portable either. So you may want to stick to pointers and read/write the members with your own functions.
You can also separate fixed sized data and put them in a separate structure. Write the fixed sized data in one block, then write the character strings one by one.
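That last suggestion could look something like the sketch below. struct prog_fixed, write_prog and read_prog are hypothetical names, only two of the strings are shown, and each string is stored as a 32-bit length followed by its bytes (an assumed format, not something the question specifies).

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical fixed-size part of a programme record. */
struct prog_fixed {
    int64_t start;      /* timestamps instead of struct tm pointers */
    int64_t end;
    float review;
};

static int write_string(FILE *fp, const char *s)
{
    uint32_t len = (uint32_t)strlen(s);
    if (fwrite(&len, sizeof len, 1, fp) != 1)
        return -1;
    if (len > 0 && fwrite(s, 1, len, fp) != len)
        return -1;
    return 0;
}

/* Returns a malloc'd, NUL-terminated string, or NULL on failure. */
static char *read_string(FILE *fp)
{
    uint32_t len;
    char *s;
    if (fread(&len, sizeof len, 1, fp) != 1)
        return NULL;
    s = malloc((size_t)len + 1);
    if (s == NULL)
        return NULL;
    if (len > 0 && fread(s, 1, len, fp) != len) {
        free(s);
        return NULL;
    }
    s[len] = '\0';
    return s;
}

int write_prog(FILE *fp, const struct prog_fixed *fixed,
               const char *title, const char *channel)
{
    assert(fixed != NULL && title != NULL && channel != NULL);
    if (fwrite(fixed, sizeof *fixed, 1, fp) != 1)   /* one block */
        return -1;
    if (write_string(fp, title) != 0)               /* then the strings */
        return -1;
    return write_string(fp, channel);
}

int read_prog(FILE *fp, struct prog_fixed *fixed, char **title, char **channel)
{
    if (fread(fixed, sizeof *fixed, 1, fp) != 1)
        return -1;
    *title = read_string(fp);
    if (*title == NULL)
        return -1;
    *channel = read_string(fp);
    if (*channel == NULL) {
        free(*title);
        *title = NULL;
        return -1;
    }
    return 0;
}
```

The caller owns the two strings returned by read_prog and has to free() them.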
This question is really about how to use variable-length types in the Python/C API (PyObject_NewVar, PyObject_VAR_HEAD, PyTypeObject.tp_basicsize and .tp_itemsize), but I can ask this question without bothering with the details of the API. Just assume I need to use an array inside a struct.
I can create a list data structure in one of two ways. (I'll just talk about char lists for now, but it doesn't matter.) The first uses a pointer and requires two allocations. Ignoring #includes and error handling:
struct listptr {
size_t elems;
char *data;
};
struct listptr *listptr_new(size_t elems) {
size_t basicsize = sizeof(struct listptr), itemsize = sizeof(char);
struct listptr *lp;
lp = malloc(basicsize);
lp->elems = elems;
lp->data = malloc(elems * itemsize);
return lp;
}
The second way to create a list uses array notation and one allocation. (I know this second implementation works because I've tested it pretty thoroughly.)
struct listarray {
size_t elems;
char data[1];
};
struct listarray *listarray_new(size_t elems) {
size_t basicsize = offsetof(struct listarray, data), itemsize = sizeof(char);
struct listarray *la;
la = malloc(basicsize + elems * itemsize);
la->elems = elems;
return la;
}
In both cases, you then use lp->data[index] to access the array.
My question is why does the second method work? Why do you declare char data[1] instead of any of char data[], char data[0], char *data, or char data? In particular, my intuitive understanding of how structs work is that the correct way to declare data is char data with no pointer or array notation at all. Finally, are my calculations of basicsize and itemsize correct in both implementations? In particular, is this use of offsetof guaranteed to be correct for all machines?
Update
Apparently this is called a struct hack: In C99, you can use a flexible array member:
struct listarray2 {
size_t elems;
char data[];
};
with the understanding that you'll malloc enough space for data at runtime. Before C99, the data[1] declaration was common. So my question now is why declare char data[1] or char data[] instead of char *data or char data?
The reason you'd declare char data[1] or char data[] instead of char *data or char data is to keep your structure directly serializable and deserializable. This is important in cases where you'll be writing these sorts of structures to disk or over a network socket, etc.
Take for example your first code snippet that requires two allocations. Your listptr type is not directly serializable. i.e. listptr.elems and the data pointed to by listptr.data are not in a contiguous piece of memory. There is no way to read/write this structure to/from disk with a generic function. You need a custom function that is specific to your struct listptr type to do it. i.e. On serialize you'd have to first write elems to disk, and then write the data pointed to by the data pointer. On deserialization you'd have to read elems, allocate the appropriate space to listptr.data and then read the data from disk.
Using a flexible array member solves this problem because listarray.elems and listarray.data reside in a contiguous memory space. So to serialize it you can simply write out the total allocated size for the structure and then the structure itself. On deserialization you first read the allocated size, allocate the needed space, and then read your listarray struct into that space.
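A sketch of that size-then-struct scheme, using the C99 flexible array member form. listarray_write and listarray_read are hypothetical names, and the file is assumed to be read back by a program built for the same platform, since size_t is written raw.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct listarray {
    size_t elems;
    char data[];            /* C99 flexible array member */
};

static size_t listarray_size(size_t elems)
{
    return offsetof(struct listarray, data) + elems * sizeof(char);
}

struct listarray *listarray_new(size_t elems)
{
    struct listarray *la = malloc(listarray_size(elems));
    if (la != NULL)
        la->elems = elems;
    return la;
}

/* Writes the total size, then the whole struct in one go. Returns 0 on success. */
int listarray_write(FILE *fp, const struct listarray *la)
{
    size_t total;
    assert(la != NULL);
    total = listarray_size(la->elems);
    if (fwrite(&total, sizeof total, 1, fp) != 1)
        return -1;
    return fwrite(la, 1, total, fp) == total ? 0 : -1;
}

/* Reads the size, allocates that much, then reads the struct into it. */
struct listarray *listarray_read(FILE *fp)
{
    size_t total;
    struct listarray *la;
    if (fread(&total, sizeof total, 1, fp) != 1)
        return NULL;
    if (total < offsetof(struct listarray, data))
        return NULL;                    /* too small to hold the header */
    la = malloc(total);
    if (la == NULL)
        return NULL;
    if (fread(la, 1, total, fp) != total ||
        listarray_size(la->elems) != total) {   /* header must match the size */
        free(la);
        return NULL;
    }
    return la;
}
```

Neither function needs to know what the bytes in data mean, which is the point of keeping the header and payload contiguous.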
You may wonder why you'd ever really need this, but it can be an invaluable feature. Consider a data stream of heterogeneous types. Provided you define a header that identifies which type you have and its size, and precede each item in the stream with this header, you can generically serialize and deserialize the data stream very elegantly and efficiently.
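One possible shape for such a header is sketched below. struct record_header and the three helper functions are hypothetical, and the type codes are whatever your application assigns.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

struct record_header {
    uint32_t type;      /* application-defined type code */
    uint32_t size;      /* number of payload bytes that follow */
};

/* Writes one header followed by its payload. Returns 0 on success. */
int write_record(FILE *fp, uint32_t type, const void *payload, uint32_t size)
{
    struct record_header h;
    assert(payload != NULL || size == 0);
    h.type = type;
    h.size = size;
    if (fwrite(&h, sizeof h, 1, fp) != 1)
        return -1;
    if (size > 0 && fwrite(payload, 1, size, fp) != size)
        return -1;
    return 0;
}

/* Reads the next header. Returns 0 on success, -1 at end of file or on error. */
int next_record(FILE *fp, struct record_header *h)
{
    return fread(h, sizeof *h, 1, fp) == 1 ? 0 : -1;
}

/* Skips over a payload the reader does not understand. */
int skip_record(FILE *fp, const struct record_header *h)
{
    return fseek(fp, (long)h->size, SEEK_CUR);
}
```

A reader loops on next_record, dispatches on h.type, and calls skip_record for anything it does not recognize, so older readers can step over record types added later.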
The only reason I know of for choosing char data[1] over char data[] is if you are defining an API that needs to be portable between C99 and C++ since C++ does not have support for flexible array members.
Also, I wanted to point out that with the char data[1] declaration you can do the following to get the total needed structure size:
size_t totalsize = offsetof(struct listarray, data[elems]);
You also ask why you wouldn't use char data instead of char data[1] or char data[]. While technically possible to use just plain old char data, it would be (IMHO) morally shunned. The two main issues with this approach are:
You wanted an array of chars, but now you can't access the data member directly as an array. You need to point a pointer to the address of data to access it as an array. i.e.
char *as_array = &listarray.data;
Your structure definition (and your code's use of the structure) would be totally misleading to anyone reading the code. Why declare a single char when you really meant an array of char?
Given these two things, I don't know why anyone would use char data in favor of char data[1]. It just doesn't benefit anyone given the alternatives.
I wanted a simple string table that will store a bunch of constants and I thought "Hey! Lua does that, let me use some of their functions!"
This is mainly in the lstring.h/lstring.c files (I am using 5.2).
I will show the code I am curious about first. It's from lobject.h
/*
** Header for string value; string bytes follow the end of this structure
*/
typedef union TString {
L_Umaxalign dummy; /* ensures maximum alignment for strings */
struct {
CommonHeader;
lu_byte reserved;
unsigned int hash;
size_t len; /* number of characters in string */
} tsv;
} TString;
/* get the actual string (array of bytes) from a TString */
#define getstr(ts) cast(const char *, (ts) + 1)
/* get the actual string (array of bytes) from a Lua value */
#define svalue(o) getstr(rawtsvalue(o))
As you see, the data is stored outside of the structure. To get the byte stream, you take the TString pointer and add 1, which advances it by sizeof(TString) bytes, and you have the char* pointer.
Isn't this bad coding though? It's been DRILLED into me in my C classes to make clearly defined structures. I know I might be stirring a nest here, but do you really lose that much speed/space defining a structure as a header for data rather than defining a pointer value for that data?
The idea is probably that you allocate the header and the data in one big chunk of data instead of two:
TString *str = (TString*)malloc(sizeof(TString) + <length_of_string>);
In addition to having just one call to malloc/free, you also reduce memory fragmentation and improve memory locality.
But answering your question, yes, these kinds of hacks are usually bad practice, and should be done with extreme care. And if you do use them, you'll probably want to hide them under a layer of macros/inline functions.
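A small sketch of what hiding the hack behind functions might look like. struct hstring and its helpers are made-up names for illustration, not Lua's API.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct hstring {
    size_t len;             /* the string bytes follow this header */
};

/* The only place that knows the bytes live right after the header. */
static inline char *hstring_data(struct hstring *hs)
{
    return (char *)(hs + 1);
}

/* One allocation holds the header, the characters and the terminating NUL. */
struct hstring *hstring_new(const char *src)
{
    size_t len;
    struct hstring *hs;

    assert(src != NULL);
    len = strlen(src);
    hs = malloc(sizeof *hs + len + 1);
    if (hs == NULL)
        return NULL;
    hs->len = len;
    memcpy(hstring_data(hs), src, len + 1);
    return hs;
}
```

Callers only ever use hstring_new, hstring_data and a single free(), so the layout trick stays in one spot.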
As rodrigo says, the idea is to allocate the header and string data as a single chunk of memory. It's worth pointing out that you also see the non-standard hack
struct lenstring {
unsigned length;
char data[0];
};
but C99 added flexible array members so it can be done in a standard compliant way as
struct lenstring {
unsigned length;
char data[];
};
If Lua's string were done in this way it'd be something like
typedef union TString {
L_Umaxalign dummy;
struct {
CommonHeader;
lu_byte reserved;
unsigned int hash;
size_t len;
const char data[];
} tsv;
} TString;
#define getstr(ts) ((ts)->tsv.data)
It relates to the complications arising from the more limited C language. In C++, you would just define a base class called GCObject which contains the garbage collection variables, then TString would be a subclass and by using a virtual destructor, both the TString and its accompanying const char * blocks would be freed properly.
When it comes to writing the same kind of functionality in C, it's a bit more difficult as classes and virtual inheritance do not exist.
What Lua is doing is implementing garbage collection by inserting the header required to manage the garbage collection status of the part of memory following it. Remember that free(void *) does not need to know anything other than the address of the memory block.
#define CommonHeader GCObject *next; lu_byte tt; lu_byte marked
Lua keeps a linked list of these "collectable" blocks of memory, in this case an array of characters, so that it can then free the memory efficiently without knowing the type of object it is pointing to.
If your TString pointed to another block of memory where the character array was, then it would require the garbage collector to determine the object's type, then delve into its structure to also free the string buffer.
The pseudo code for this kind of garbage collection would be something like this:
GCHeader *next, *prev = NULL;
GCHeader *current = firstObject;
while(current)
{
next = current->next;
if (/* current is ready for deletion */)
{
// unlink current from the singly-linked list before freeing it
if (prev)
prev->next = next;
else
firstObject = next; // the head itself was removed
free(current);
}
else
prev = current; // store previous undeleted object
current = next;
}
I have something like this, in fact more complex struct than this:
typedef struct _test test;
typedef struct _test_2 test_2;
typedef struct _sample {
unsigned char type;
char *name;
test *first;
} sample;
struct _test {
test *prev;
test *next;
char *name;
int total;
test_2 **list;
};
struct _test_2 {
char *name;
unsigned int blabla;
};
sample *sample_var;
I want to back up this struct into a file and restore it afterwards.
I also tried fwrite(sample_var, sizeof(sample), 1, file_handle); but the real problem is that sizeof(sample) returns the wrong size, not the real size of the variable.
Is there a way to save it into a file and restore it without knowing the size?
You are trying to serialize, or marshal the structure. You can't just fwrite the data (having pointers is the most obvious stopper). The sizeof problem is really minor when compared to storing pointers in a file (a pointer is meaningless outside the program where it originated).
You will have to define your own serialization / deserialization functions. You could either use your own simple format or use JSON, XML, XDR or something like that.
Personally I would go with JSON, since it's all the rage these days anyway.
As an aside, here is a C FAQ vaguely linked to your own question (though it discusses interoperability issues).
There is no easy approach to save such a structure into a file. For instance, even the sample.name field has a size of 4 (depending on architecture), while what you probably want to save is the content of the memory pointed to by sample.name.
Here is a sample code that will do such a thing. You will have to duplicate the process to save the entire structure.
void saveToFile(FILE *fh, sample s)
{
fwrite(&s.type, sizeof s.type, 1, fh);
size_t nameSize = strlen(s.name); // get the length of the name field
fwrite(&nameSize, sizeof nameSize, 1, fh); // write the length of the name field
fwrite(s.name, sizeof(char), nameSize, fh); // write the content of the name field
// continue with other fields
}
The idea is to store the size of the next structure and then write the content. To get the information from the file, you read the size, and then get the data.
sizeof(sample) is not incorrect: it returns the size of a char followed by two pointers (plus any padding). If you need to save such a recursive data type, you have to manually follow (dereference) the pointers.
It seems like what you really want to do is store the struct and what its pointers are referring to, not the pointers themselves.
You will need to write some logic to determine the size of the data being pointed at, and write that data to the file instead of the pointers.
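For a linked structure, following the pointers could look roughly like this. It is a simplified sketch with a hypothetical singly linked struct node rather than the full test/test_2 layout from the question: the file holds a node count, then each node's name length, name bytes and total, and the pointers are rebuilt on load.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct node {
    struct node *next;
    char *name;
    int total;
};

/* Writes the pointed-to data, never the pointer values. Returns 0 on success. */
int save_list(FILE *fp, const struct node *head)
{
    const struct node *n;
    uint32_t count = 0;

    assert(fp != NULL);
    for (n = head; n != NULL; n = n->next)
        ++count;
    if (fwrite(&count, sizeof count, 1, fp) != 1)
        return -1;
    for (n = head; n != NULL; n = n->next) {
        uint32_t len = (uint32_t)strlen(n->name);
        if (fwrite(&len, sizeof len, 1, fp) != 1 ||
            fwrite(n->name, 1, len, fp) != len ||
            fwrite(&n->total, sizeof n->total, 1, fp) != 1)
            return -1;
    }
    return 0;
}

void free_list(struct node *head)
{
    while (head != NULL) {
        struct node *next = head->next;
        free(head->name);
        free(head);
        head = next;
    }
}

/* Rebuilds the list with fresh allocations. Returns NULL on error
 * (and also for an empty list, which this sketch does not distinguish). */
struct node *load_list(FILE *fp)
{
    uint32_t count, i;
    struct node *head = NULL, **tail = &head;

    if (fread(&count, sizeof count, 1, fp) != 1)
        return NULL;
    for (i = 0; i < count; ++i) {
        uint32_t len;
        struct node *n = malloc(sizeof *n);
        if (n == NULL)
            goto fail;
        n->next = NULL;
        n->name = NULL;
        *tail = n;                  /* link first so free_list can clean up */
        tail = &n->next;
        if (fread(&len, sizeof len, 1, fp) != 1)
            goto fail;
        n->name = malloc((size_t)len + 1);
        if (n->name == NULL)
            goto fail;
        if (fread(n->name, 1, len, fp) != len)
            goto fail;
        n->name[len] = '\0';
        if (fread(&n->total, sizeof n->total, 1, fp) != 1)
            goto fail;
    }
    return head;
fail:
    free_list(head);
    return NULL;
}
```

A doubly linked list like test can be restored the same way by also setting prev while appending each node.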