How to output a binary file in C without padding bits - c

I'd like to output a struct's data to a binary file, but without any padding bits between each variable's information. For example:
struct s {
int i1;
short s1;
char c1;
};
struct s example[2];
If I use fwrite(&example, sizeof(struct s), 2, file), the binary file still has the padding bits between, for example, s1 and c1, and also from c1 to i1 (of the 2nd struct).
What would be a good approach to remove those padding bits from the output file ?
Thanks! Any help is appreciated

I would just suggest manually reading/writing the members of the struct individually. Packing using your compiler directives can cause inefficiency and portability issues with unaligned data access. And if you have to deal with endianness, it's easy to support that later when your read operations break down to field members rather than whole structs.
Another consideration, related to future maintenance, is that you don't want your serialization code, or the files people have saved so far, to break if you change the structure a bit (adding new elements, or even changing the order as a cache-line optimization, for example). Code that serializes members individually gives you more breathing room than dumping the memory contents of the struct directly into a file, and it will often be worth the effort.
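To make the endianness point concrete, here is a minimal sketch that packs one struct s into a byte buffer with a fixed little-endian layout. The helper names (put_u32_le, put_u16_le, pack_s) are made up for illustration, and the sketch assumes int fits in 32 bits and short in 16 bits, which the C standard does not guarantee.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct s {
    int   i1;
    short s1;
    char  c1;
};

/* Store v as 4 bytes, least significant byte first. */
static size_t put_u32_le(unsigned char *out, uint32_t v)
{
    out[0] = (unsigned char)(v & 0xFFu);
    out[1] = (unsigned char)((v >> 8) & 0xFFu);
    out[2] = (unsigned char)((v >> 16) & 0xFFu);
    out[3] = (unsigned char)((v >> 24) & 0xFFu);
    return 4;
}

/* Store v as 2 bytes, least significant byte first. */
static size_t put_u16_le(unsigned char *out, uint16_t v)
{
    out[0] = (unsigned char)(v & 0xFFu);
    out[1] = (unsigned char)((v >> 8) & 0xFFu);
    return 2;
}

/* Assumed on-disk size: 4 + 2 + 1 bytes, no padding. */
enum { S_PACKED_SIZE = 7 };

/* Hypothetical helper: packs one struct s and returns the bytes used. */
size_t pack_s(const struct s *in, unsigned char out[S_PACKED_SIZE])
{
    size_t n = 0;
    assert(in != NULL && out != NULL);
    n += put_u32_le(out + n, (uint32_t)in->i1);
    n += put_u16_le(out + n, (uint16_t)in->s1);
    out[n++] = (unsigned char)in->c1;
    return n;
}
```

The caller would then pass the buffer to fwrite(buf, 1, n, file). Because the byte order is fixed by the shifts rather than by the CPU, the same file can be read back on a machine with a different endianness.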
If you want to generalize a pattern and reduce the amount of boilerplate you write, you can do something like this as a basic example to start and build upon:
enum { max_fields = 16 }; /* assumed capacity; pick a value that suits your structs */
struct Fields
{
int num;
void* ptrs[max_fields];
int sizes[max_fields];
};
void field_push(struct Fields* fields, void* ptr, int size)
{
assert(fields->num < max_fields);
fields->ptrs[fields->num] = ptr;
fields->sizes[fields->num] = size;
++fields->num;
}
struct Fields s_fields(struct s* inst)
{
struct Fields new_fields;
new_fields.num = 0;
field_push(&new_fields, &inst->i1, sizeof inst->i1);
field_push(&new_fields, &inst->s1, sizeof inst->s1);
field_push(&new_fields, &inst->c1, sizeof inst->c1);
return new_fields;
}
Now you can use this Fields structure with general-purpose functions to read and write members of any struct, like so:
void write_fields(FILE* file, struct Fields* fields)
{
int j=0;
for (; j < fields->num; ++j)
fwrite(fields->ptrs[j], fields->sizes[j], 1, file);
}
This is generally a bit easier to work with than some functional for_each_field kind of approach accepting a callback.
Now all you have to worry about when you create some new struct, S, is to define a single function to output struct Fields from an instance to then enable all those general functions you wrote that work with struct Fields to now work with this new S type automatically.
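As a sketch of how the reading side could mirror write_fields, the block below repeats the Fields pattern in self-contained form and adds a read_fields function. The capacity of 16, the size_t sizes, the const qualifier and the returned field counts are my own assumptions, not part of the answer above.

```c
#include <assert.h>
#include <stdio.h>

enum { max_fields = 16 };   /* assumed capacity */

struct Fields {
    int    num;
    void  *ptrs[max_fields];
    size_t sizes[max_fields];
};

struct s {
    int   i1;
    short s1;
    char  c1;
};

void field_push(struct Fields *fields, void *ptr, size_t size)
{
    assert(fields->num < max_fields);
    fields->ptrs[fields->num] = ptr;
    fields->sizes[fields->num] = size;
    ++fields->num;
}

struct Fields s_fields(struct s *inst)
{
    struct Fields f;
    f.num = 0;
    field_push(&f, &inst->i1, sizeof inst->i1);
    field_push(&f, &inst->s1, sizeof inst->s1);
    field_push(&f, &inst->c1, sizeof inst->c1);
    return f;
}

/* Returns the number of fields written completely. */
int write_fields(FILE *file, const struct Fields *fields)
{
    int j;
    for (j = 0; j < fields->num; ++j)
        if (fwrite(fields->ptrs[j], fields->sizes[j], 1, file) != 1)
            break;
    return j;
}

/* Returns the number of fields read completely. */
int read_fields(FILE *file, struct Fields *fields)
{
    int j;
    for (j = 0; j < fields->num; ++j)
        if (fread(fields->ptrs[j], fields->sizes[j], 1, file) != 1)
            break;
    return j;
}
```

A caller compares the return value with fields->num to detect a short read or write. Since the pointers in struct Fields refer to the instance's own members, read_fields fills the struct in place.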

Many compilers accept a command line parameter which means "pack structures". In addition, many accept a pragma:
#pragma pack(1)
where 1 means byte alignment, 2 means 16-bit word alignment, 4 means 32-bit word alignment, etc.
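A small sketch of the pragma in use, assuming a compiler that supports the push/pop form (GCC, Clang and MSVC do). The struct names and the helper functions are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

/* Default layout: the compiler may insert padding. */
struct s_default {
    int   i1;
    short s1;
    char  c1;
};

/* Packed layout: members are placed back to back. */
#pragma pack(push, 1)
struct s_packed {
    int   i1;
    short s1;
    char  c1;
};
#pragma pack(pop)   /* restore the previous packing for later declarations */

size_t default_size(void) { return sizeof(struct s_default); }
size_t packed_size(void)  { return sizeof(struct s_packed); }

/* Sanity check: c1 sits directly after i1 and s1 in the packed struct. */
void check_packed_layout(void)
{
    assert(offsetof(struct s_packed, c1) == sizeof(int) + sizeof(short));
}
```

With the packed version, fwrite(&obj, sizeof obj, 1, file) writes no padding bytes, but the file still depends on the machine's byte order and type sizes, and unaligned member access can be slow or unsupported on some targets.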

To make your solution platform independent, you can create a function that writes each field of the struct one at a time, and then call the function to write as many of the structs as needed.
size_t writeStruct(struct s* obj, size_t count, FILE* file)
{
size_t i = 0;
for ( ; i < count; ++i )
{
// Make sure to add error checking code.
fwrite(&(obj[i].i1), sizeof(obj[i].i1), 1, file);
fwrite(&(obj[i].s1), sizeof(obj[i].s1), 1, file);
fwrite(&(obj[i].c1), sizeof(obj[i].c1), 1, file);
}
// Return the number of structs written to file successfully.
return i;
}
Usage:
struct s example[2];
writeStruct(example, 2, file);
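The comment in the function asks for error checking. One possible shape for it, as a sketch: stop at the first struct that cannot be written completely and return how many were written in full. The names write_one and write_structs_checked are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

struct s {
    int   i1;
    short s1;
    char  c1;
};

/* Writes one struct field by field; returns 1 on success, 0 on a short write. */
static int write_one(const struct s *obj, FILE *file)
{
    return fwrite(&obj->i1, sizeof obj->i1, 1, file) == 1
        && fwrite(&obj->s1, sizeof obj->s1, 1, file) == 1
        && fwrite(&obj->c1, sizeof obj->c1, 1, file) == 1;
}

/* Returns the number of structs written completely (count on full success). */
size_t write_structs_checked(const struct s *obj, size_t count, FILE *file)
{
    size_t i;
    assert(obj != NULL && file != NULL);
    for (i = 0; i < count; ++i)
        if (!write_one(&obj[i], file))
            break;
    return i;
}
```

If the return value is smaller than count, the file may end with a partially written struct, so a caller would typically treat the whole file as invalid in that case.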

Related

Get the index of an element declared in a struct

I'd like to associate each element of my struct (or something else, may not be possible with a struct) with an incrementing index, kind of like an enum but associate with value fields.
Let's say that I have this data struct :
typedef struct
{
int8_t value_at_index0; // start with index 0 for this field
int8_t value_at_index1; // then index = 1, etc
int8_t value_at_index2;
int8_t value_at_index3;
int8_t value_at_index4;
} data;
I want to get the index of a field, and the index should mirror the position where the field was declared in the struct.
EDIT: My problem is that I want to write values in an external memory, but I don't want to figure at which index is the value. For example I'd like to do that: write(index.value_at_index2, data.value_at_index2)
Thanks
OK, I'm assuming you either have badly written code that you don't want to fix, or a very specific problem.
Personally I would just use an array or allocate some memory for it. I haven't checked the manual, but as far as I remember a struct is always laid out as one contiguous block (its members aren't scattered, unlike separate memory allocations). Working with structs like this is not correct behavior, but I did a test and it worked.
Use __attribute__((packed, aligned(1))). This sets the alignment to 1, meaning that no padding bytes will be added; that is, if you have a single-byte member, the compiler won't add 3 more blank bytes to fill the gap.
This doesn't mean that every member shrinks to one byte, and the trick stops working as soon as you mix members of different sizes: advancing the pointer by one while it is walking through a 4-byte integer will cause unexpected behaviour.
#include <stdint.h>
#include <stdio.h>
typedef struct __attribute__((packed, aligned(1))) //like so
{
int8_t value_at_index0; // start with index 0 for this field
int8_t value_at_index1; // then index = 1, etc
int8_t value_at_index2;
int8_t value_at_index3;
int8_t value_at_index4;
} data;
int main(int argc, char **argv){
data d;
int8_t *hey=NULL;
d.value_at_index0=5;
d.value_at_index1=3;
d.value_at_index2=2;
d.value_at_index3=5;
d.value_at_index4=10;
size_t i;
hey = (int8_t *)&d; // cast needed: &d has type data *, not int8_t *
for (i = 0; i < 5; i++){
printf("hey its working %d\n", *hey);
hey+=1;
}
return 0;
}
Now this is not correct behavior. Judging by your question, I would advise you to stick with memory allocation, since it allows for dynamic sizes; and if you want a static size, you could just use arrays.
If you want an even sketchier version of the above, the most recent C compilers offer a few tricks to make it dynamic, although I will tell you that you're going to get a bunch of memory leaks, core dumps, and unexpected behavior.
TL;DR: this is not correct behavior, and I'm just posting it in case you can't change the code and must use structs.
Now for the clean version. I read in the question that you are going to write a struct; assuming that the struct is basic and you just want to write every static element in it, you can simply do:
FILE *fp;
if ((fp = fopen("lets.txt", "wb")) == NULL){
return 1;
}
fwrite(&data, sizeof(data), 1, fp);
if (fclose(fp) != 0){
return 1;
}
since when you pass the address of a struct as the first parameter and provide its size, fwrite will just write every member inside the struct. This also applies to pointers, although there you have to specify the size of each element in the size parameter and the number of elements in the next parameter.
Again, this assumes all your members are well defined and static, since structs have a lot of flags.
You should consider using arrays/pointers for this purpose. To make this work you need to make sure that all values are the same type, and temporarily remove memory padding for that struct. The following code illustrates how it is done.
/*Remove struct padding*/
#pragma pack(push, 1)
typedef struct
{
int8_t a;
int8_t b;
int8_t c;
int8_t d;
} data;
/*Restore struct padding; some structs from other libraries malfunction without it.*/
#pragma pack(pop)
int main()
{
/*Define a clean data structure*/
data tmp = { 0x00, 0x00, 0x00, 0x00 };
((int8_t *)&tmp)[0] = 0x01; /*data.a = 0x01*/
((int8_t *)&tmp)[1] = 0x03; /*data.b = 0x03*/
((int8_t *)&tmp)[2] = 0x05; /*data.c = 0x05*/
((int8_t *)&tmp)[3] = 0x07; /*data.d = 0x07*/
return 0;
}
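One option the answers above do not show is the standard offsetof macro, which can turn a member name into its position without any pointer casting. The sketch below assumes every member is an int8_t (so the byte offset equals the declaration index in practice) and uses a plain array, ext_mem, as a hypothetical stand-in for the external memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    int8_t value_at_index0;
    int8_t value_at_index1;
    int8_t value_at_index2;
    int8_t value_at_index3;
    int8_t value_at_index4;
} data;

/* Index of a member, derived from its byte offset inside the struct. */
#define FIELD_INDEX(field) (offsetof(data, field) / sizeof(int8_t))

enum { EXT_MEM_SIZE = 5 };
static int8_t ext_mem[EXT_MEM_SIZE];   /* hypothetical external memory */

void ext_write(size_t index, int8_t value)
{
    assert(index < EXT_MEM_SIZE);
    ext_mem[index] = value;
}

int8_t ext_read(size_t index)
{
    assert(index < EXT_MEM_SIZE);
    return ext_mem[index];
}
```

A call then looks like ext_write(FIELD_INDEX(value_at_index2), d.value_at_index2), which is close to the write(index.value_at_index2, data.value_at_index2) form asked for in the question. If members of other sizes were added, the offset would no longer equal the declaration index.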

Could this use of malloc lead to an "optional" struct field?

I was implementing a structure in which I needed (at runtime) to have an optional field.
So I thought about this:
//...
#include <stdlib.h>
struct test {
int x; // Must be
int y; // Optional (Must be the last field..(?))
};
int main(int argc, char **argv) {
// With the optional field
struct test *a = malloc(sizeof(*a));
a->x = 11;
a->y = 55;
// Without the optional field
struct test *b = malloc(sizeof(*b) - sizeof(int));
b->x = 22;
// ...
free(a);
free(b);
return 0;
}
Could this code do what I ask?
Possibly adding a bit field to check if there is the optional field or not.
Also, if the proposed solution works, if this were implemented for a list of multiple items (> 100000), would it be better to do it to save memory?
Could this code do what I ask?
Well, it could, but you cannot rely on that. Do not do this; it is not a way to write correct programs.
When you write b->x = 22;, the compiler is entitled to behave as if there were a whole struct test at b. You may be thinking, “I am just putting 22 in the bytes for the member x,” but the compiler may use a “store eight bytes” instruction:
Consider some architecture where memory is organized into eight-byte groups. The bus can only read and write whole eight-byte chunks.
Since there is no way to write four bytes in hardware, writing four bytes to memory requires reading eight bytes, manipulating them in processor registers to insert the desired values in four of the bytes, and writing eight bytes back to memory.
The compiler wants to optimize b->x = 22;, and it knows y has not been set yet, so it is allowed to have any value. So, instead of using an inefficient write-four-byte sequence, the compiler generates an eight-byte store that puts 22 in b->x and 0 in b->y.
Then this fails because the compiler has just written 0 to memory that might be in use for something else because it is not part of the space you allocated for b.
“If you lie to the compiler, it will get its revenge.” — Henry Spencer
What you're attempting doesn't conform to the C standard because you're attempting to use an object of type struct test that doesn't have enough memory allocated for it, even though you're only accessing the fields for which memory was allocated. It might work but you can't rely on that.
What you can do is make use of a flexible array member:
struct test {
int x;
int y[];
};
In a struct like this, sizeof(struct test) doesn't include the last member. You can use such a struct by allocating space for the struct plus as many array elements of the last member that you want. For example:
struct test *b = malloc(sizeof(*b) + sizeof(int));
b->x = 1;
b->y[0] = 2;
You'll need to use array indexing to access the last member, but this is a way to do what you want in a standard-conforming manner.
Then in the case you don't want the last member, you do this:
struct test *b = malloc(sizeof(*b));
b->x = 1;
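Putting the two cases together, a constructor could hide the size arithmetic and record whether the optional element exists. This is only a sketch: the has_y flag and the names test_new and test_get_y are my additions, not part of the answer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct test {
    int  x;
    bool has_y;   /* assumed flag telling readers whether y[0] exists */
    int  y[];     /* flexible array member: zero or one element here */
};

/* Pass a pointer to the optional value, or NULL to omit it. */
struct test *test_new(int x, const int *y)
{
    size_t extra = (y != NULL) ? sizeof(int) : 0;
    struct test *t = malloc(sizeof *t + extra);
    if (t == NULL)
        return NULL;
    t->x = x;
    t->has_y = (y != NULL);
    if (y != NULL)
        t->y[0] = *y;
    return t;
}

/* Returns y when present, otherwise the caller's fallback value. */
int test_get_y(const struct test *t, int fallback)
{
    assert(t != NULL);
    return t->has_y ? t->y[0] : fallback;
}
```

Note that the flag itself takes space, and with typical alignment it can cost as much as the int it makes optional, so this layout only saves memory when the optional part is larger than the flag plus its padding.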
I think your proposed solution is dangerous. Use two different structs:
struct test_x {
int x;
};
struct test_xy {
int x;
int y;
};
Either have two arrays or store a void * to either along with a discriminator (a tagged pointer, for instance). The other option is to use a pointer for the optional element, but sizeof(int *) is at least as large as sizeof(int) on my box, so that only makes things larger.
Consider a column layout if all the y members are optional, or you can sort the data so all the xy elements come first:
struct test_column {
int *x;
int *y;
};
struct test_column t = {
.x = malloc(100000 * sizeof(int)),
.y = 0
};
It doesn't help in your case, but unions are the standard way to make two structs share memory, so the size of each element is
max(sizeof(test_xy), sizeof(test_x)) instead of sizeof(test_xy) + sizeof(test_x).
Finally, consider compression especially if you use the test_column format.

Why does internal Lua strings store the way they do?

I wanted a simple string table that will store a bunch of constants and I thought "Hey! Lua does that, let me use some of their functions!"
This is mainly in the lstring.h/lstring.c files (I am using 5.2)
I will show the code I am curious about first. It's from lobject.h
/*
** Header for string value; string bytes follow the end of this structure
*/
typedef union TString {
L_Umaxalign dummy; /* ensures maximum alignment for strings */
struct {
CommonHeader;
lu_byte reserved;
unsigned int hash;
size_t len; /* number of characters in string */
} tsv;
} TString;
/* get the actual string (array of bytes) from a TString */
#define getstr(ts) cast(const char *, (ts) + 1)
/* get the actual string (array of bytes) from a Lua value */
#define svalue(o) getstr(rawtsvalue(o))
As you see, the data is stored outside of the structure. To get the byte stream, you take the TString pointer and add 1, which yields the address just past the header, and that is your char* pointer.
Isn't this bad coding, though? It's been DRILLED into me in my C classes to make clearly defined structures. I know I might be stirring a nest here, but do you really lose that much speed/space defining a structure as a header for data rather than defining a pointer value for that data?
The idea is probably that you allocate the header and the data in one big chunk of data instead of two:
TString *str = (TString*)malloc(sizeof(TString) + <length_of_string>);
In addition to having just one call to malloc/free, you also reduce memory fragmentation and increase memory localization.
But to answer your question: yes, this kind of hack is usually bad practice and should be done with extreme care. And if you do it, you'll probably want to hide it under a layer of macros/inline functions.
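As an illustration of the single-allocation idea, here is a stripped-down sketch. StrHeader, str_data and str_new are invented names and this is not Lua's actual code; the hash field is left at zero because hashing is out of scope here.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    size_t       len;
    unsigned int hash;
} StrHeader;

/* The characters live directly after the header, as with Lua's getstr. */
#define str_data(h) ((char *)((h) + 1))

/* One malloc covers the header, the characters and the terminating NUL. */
StrHeader *str_new(const char *src)
{
    size_t len;
    StrHeader *h;
    assert(src != NULL);
    len = strlen(src);
    h = malloc(sizeof *h + len + 1);
    if (h == NULL)
        return NULL;
    h->len = len;
    h->hash = 0;                          /* hashing omitted in this sketch */
    memcpy(str_data(h), src, len + 1);    /* copies the NUL as well */
    return h;
}
```

A single free(h) releases both the header and the string, which is the practical benefit described above.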
As rodrigo says, the idea is to allocate the header and string data as a single chunk of memory. It's worth pointing out that you also see the non-standard hack
struct lenstring {
unsigned length;
char data[0];
};
but C99 added flexible array members so it can be done in a standard compliant way as
struct lenstring {
unsigned length;
char data[];
};
If Lua's string were done in this way it'd be something like
typedef union TString {
L_Umaxalign dummy;
struct {
CommonHeader;
lu_byte reserved;
unsigned int hash;
size_t len;
const char data[];
} tsv;
} TString;
#define getstr(ts) ((ts)->tsv.data)
It relates to the complications arising from the more limited C language. In C++, you would just define a base class called GCObject which contains the garbage collection variables, then TString would be a subclass and, by using a virtual destructor, both the TString and its accompanying const char * blocks would be freed properly.
When it comes to writing the same kind of functionality in C, it's a bit more difficult as classes and virtual inheritance do not exist.
What Lua is doing is implementing garbage collection by inserting the header required to manage the garbage collection status of the part of memory following it. Remember that free(void *) does not need to know anything other than the address of the memory block.
#define CommonHeader GCObject *next; lu_byte tt; lu_byte marked
Lua keeps a linked list of these "collectable" blocks of memory, in this case an array of characters, so that it can then free the memory efficiently without knowing the type of object it is pointing to.
If your TString pointed to another block of memory where the character array was, then the garbage collector would have to determine the object's type and delve into its structure to free the string buffer as well.
The pseudo code for this kind of garbage collection would be something like this:
GCHeader *next, *prev = NULL;
GCHeader *current = firstObject;
while(current)
{
next = current->next;
if (/* current is ready for deletion */)
{
// unlink current from the singly-linked list before freeing it
if (prev)
prev->next = next;
else
firstObject = next; // current was the head of the list
free(current);
}
else
prev = current; // store previous undeleted object
current = next;
}
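A compilable take on the same sweep, under simplified assumptions: the header has only a next pointer and a marked flag, unmarked blocks are treated as garbage, and the payload is allocated directly behind the header. gc_new and gc_sweep are hypothetical names, not Lua functions.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct GCHeader {
    struct GCHeader *next;
    unsigned char    marked;   /* nonzero means "still in use" */
} GCHeader;

/* Allocates header + payload in one block and links it at the list head. */
GCHeader *gc_new(GCHeader **head, size_t payload)
{
    GCHeader *h = malloc(sizeof *h + payload);
    if (h == NULL)
        return NULL;
    h->marked = 0;
    h->next = *head;
    *head = h;
    return h;
}

/* Frees every unmarked block, clears the marks on survivors,
   and returns the number of blocks freed. */
size_t gc_sweep(GCHeader **head)
{
    size_t freed = 0;
    GCHeader **link = head;   /* the pointer that points at the current block */
    assert(head != NULL);
    while (*link != NULL) {
        GCHeader *current = *link;
        if (!current->marked) {
            *link = current->next;   /* unlink, whether head or interior */
            free(current);
            ++freed;
        } else {
            current->marked = 0;
            link = &current->next;
        }
    }
    return freed;
}
```

Using a pointer to the link instead of a separate prev variable removes the special case for the list head. The sweep never looks at the payload, which is the point made above about free only needing the block's address.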

How to save a dynamic struct to file

I have something like this, in fact more complex struct than this:
typedef struct _sample {
unsigned char type;
char *name;
test *first;
} sample;
typedef struct _test {
test *prev;
test *next;
char *name;
int total;
test_2 **list;
} test;
typedef struct _test_2 {
char *name;
unsigned int blabla;
} test_2;
sample *sample_var;
I want to back up this struct into a file and restore it later.
I also tried fwrite(sample_var, sizeof(sample), 1, file_handle); but the real problem is that sizeof(sample) returns the wrong size, not the real size of the variable.
Is there a way to save it into a file and restore it without knowing the size?
You are trying to serialize, or marshal the structure. You can't just fwrite the data (having pointers is the most obvious stopper). The sizeof problem is really minor when compared to storing pointers in a file (a pointer is meaningless outside the program where it originated).
You will have to define your own serialization / deserialization functions. You could either use your own simple format or use JSON, XML, XDR or something like that.
Personally I would go with JSON, since it's all the rage these days anyway.
As an aside, here is a C FAQ vaguely linked to your own question (though it discusses interoperability issues).
There is no easy approach to save such a structure into a file. For instance, even the sample.name field has a size of 4 (depending on architecture), while what you probably want to save is the content of the memory pointed by sample.name.
Here is a sample code that will do such a thing. You will have to duplicate the process to save the entire structure.
void saveToFile(FILE *fh, sample s)
{
fwrite(&s.type, sizeof s.type, 1, fh); // fwrite takes a pointer, an element size and a count
size_t nameSize = strlen(s.name); // get the length of the name field
fwrite(&nameSize, sizeof nameSize, 1, fh); // write the length of the name field
fwrite(s.name, sizeof(char), nameSize, fh); // write the content of the name field
// continue with other fields
}
The idea is to store the size of the next piece of data and then write its content. To get the information back from the file, you read the size, and then read that many bytes of data.
sizeof(sample) is not incorrect: it returns the size of a char followed by two pointers (plus any padding). If you need to save such a recursive data type, you have to follow and dereference the pointers manually.
It seems like what you really want to do is store the struct and what its pointers refer to, not the pointers themselves.
You will need to write some logic to determine the size of the data being pointed at, and write that data to the file instead of the pointers.
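For the char *name members, the "write the size, then the content" idea could look like the sketch below. The function names are hypothetical, the 32-bit length prefix is an arbitrary choice, and the prefix is written in the machine's native byte order, so the file is not portable across architectures as it stands.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Writes a 32-bit length followed by the characters (no NUL).
   Returns 1 on success, 0 on failure. */
int write_string(FILE *fh, const char *str)
{
    uint32_t len;
    assert(fh != NULL && str != NULL);
    len = (uint32_t)strlen(str);
    if (fwrite(&len, sizeof len, 1, fh) != 1)
        return 0;
    if (len > 0 && fwrite(str, 1, len, fh) != len)
        return 0;
    return 1;
}

/* Reads a string written by write_string into a new malloc'd buffer.
   Returns NULL on failure; the caller frees the result. */
char *read_string(FILE *fh)
{
    uint32_t len;
    char *str;
    assert(fh != NULL);
    if (fread(&len, sizeof len, 1, fh) != 1)
        return NULL;
    str = malloc((size_t)len + 1);
    if (str == NULL)
        return NULL;
    if (len > 0 && fread(str, 1, len, fh) != len) {
        free(str);
        return NULL;
    }
    str[len] = '\0';
    return str;
}
```

The linked test nodes could be saved the same way: write a count or an end marker, then each node's fields in a fixed order, and rebuild the prev/next pointers while loading rather than storing them.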

getting a substruct out of a big struct in C

I'm having a very big struct in an existing program. This struct includes a great number of bitfields.
I wish to save a part of it (say, 10 fields out of 150).
An example code I would use to save the subclass is:
typedef struct {int a;int b;char c;} bigstruct;
typedef struct {int a;char c;} smallstruct;
void substruct(smallstruct *s,bigstruct *b) {
s->a = b->a;
s->c = b->c;
}
int save_struct(bigstruct *bs) {
smallstruct s;
substruct(&s,bs);
save_struct(s);
}
I also wish that selecting which part of it wouldn't be too much hassle, since I wish to change it every now and then. The naive approach I presented before is very fragile and unmaintainable. When scaling up to 20 different fields, you have to change fields both in the smallstruct, and in the substruct function.
I thought of two better approaches. Unfortunately both require me to use some external CIL-like tool to parse my structs.
The first approach is automatically generating the substruct function. I'll just set the struct of smallstruct, and have a program that would parse it and generate the substruct function according to the fields in smallstruct.
The second approach is building (with C parser) a meta-information about bigstruct, and then write a library that would allow me to access a specific field in the struct. It would be like ad-hoc implementation of Java's class reflection.
For example, assuming no struct-alignment, for struct
struct st {
int a;
char c1:5;
char c2:3;
long d;
}
I'll generate the following meta information:
int field2distance[] = {0,sizeof(int),sizeof(int),sizeof(int)+sizeof(char)};
int field2size[] = {sizeof(int),1,1,sizeof(long)};
int field2bitmask[] = {0,0x1F,0xE0,0};
char *fieldNames[] = {"a","c1","c2","d"};
I'll get the ith field with this function:
long getFieldData(void *strct,int i) {
int distance = field2distance[i];
int size = field2size[i];
int bitmask = field2bitmask[i];
void *ptr = ((char *)strct + distance);
long result;
switch (size) {
case 1: //char
result = *(char*)ptr;
break;
case 2: //short
result = *(short*)ptr;
...
}
if (bitmask == 0) return result;
return (result & bitmask) >> num_of_trailing_zeros(bitmask);
}
Both methods require extra work, but once the parser is in your makefile, changing the substruct is a breeze.
However I'd rather do that without any external dependencies.
Does anyone have a better idea? Were my ideas any good, and is there some available implementation of them on the internet?
From your description, it looks like you have access to and can modify your original structure. I suggest you refactor your substructure into a complete type (as you did in your example), and then make that structure a field on your big structure, encapsulating all of those fields in the original structure into the smaller structure.
Expanding on your small example:
typedef struct
{
int a;
char c;
} smallstruct;
typedef struct
{
int b;
smallstruct mysub;
} bigstruct;
Accessing the smallstruct info would be done like so:
/* stack-based allocation */
bigstruct mybig;
mybig.mysub.a = 1;
mybig.mysub.c = '1';
mybig.b = 2;
/* heap-based allocation */
bigstruct * mybig = (bigstruct *)malloc(sizeof(bigstruct));
mybig->mysub.a = 1;
mybig->mysub.c = '1';
mybig->b = 2;
But you could also pass around pointers to the small struct:
void dosomething(smallstruct * small)
{
small->a = 3;
small->c = '3';
}
/* stack based */
dosomething(&(mybig.mysub));
/* heap based */
dosomething(&((*mybig).mysub));
Benefits:
No Macros
No external dependencies
No memory-order casting hacks
Cleaner, easier-to-read and use code.
If changing the order of the fields isn't out of the question, you can rearrange the bigstruct fields in such a way that the smallstruct fields are together, and then its simply a matter of casting from one to another (possibly adding an offset).
Something like:
typedef struct {int a;char c;int b;} bigstruct;
typedef struct {int a;char c;} smallstruct;
int save_struct(bigstruct *bs) {
save_struct((smallstruct *)bs);
}
Macros are your friend.
One solution would be to move the big struct out into its own include file and then have a macro party.
Instead of defining the structure normally, come up with a selection of macros, such as BEGIN_STRUCTURE, END_STRUCTURE, NORMAL_FIELD, SUBSET_FIELD
You can then include the file a few times, redefining those macros for each pass. The first pass will turn the defines into a normal structure, with both types of field being output as normal. The second would define NORMAL_FIELD as nothing and would create your subset. The third would create the appropriate code to copy the subset fields over.
You'll end up with a single definition of the structure, that lets you control which fields are in the subset and automatically creates suitable code for you.
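A compact variant of this idea, sketched in one file: instead of including a header several times, the field list is itself a macro that takes the two field macros as parameters (often called an X-macro). All macro names below are illustrative, and the fields are the ones from the question's small example.

```c
#include <assert.h>
#include <stddef.h>

/* Single source of truth: every field, tagged as NORMAL or SUBSET. */
#define BIG_FIELDS(NORMAL, SUBSET) \
    SUBSET(int,  a)                \
    NORMAL(int,  b)                \
    SUBSET(char, c)

#define DECLARE_FIELD(type, name) type name;
#define SKIP_FIELD(type, name)
#define COPY_FIELD(type, name)    dst->name = src->name;

/* Pass 1: both kinds of field become members of the big struct. */
typedef struct { BIG_FIELDS(DECLARE_FIELD, DECLARE_FIELD) } bigstruct;

/* Pass 2: only the SUBSET fields become members of the small struct. */
typedef struct { BIG_FIELDS(SKIP_FIELD, DECLARE_FIELD) } smallstruct;

/* Pass 3: generate one assignment per SUBSET field. */
void substruct(smallstruct *dst, const bigstruct *src)
{
    assert(dst != NULL && src != NULL);
    BIG_FIELDS(SKIP_FIELD, COPY_FIELD)
}
```

Moving a field into or out of the subset then means changing one word in BIG_FIELDS. Bit-fields would need an extra macro parameter for the width, which this sketch leaves out.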
Just to help you in getting your metadata, you can refer to the offsetof() macro, which also has the benefit of taking care of any padding you may have
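A sketch of what an offsetof-based field table might look like, assuming ordinary (non-bit-field) members; offsetof cannot be applied to bit-fields, so those would still need manual handling. The FIELD macro, the table and pack_selected are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    int  a;
    int  b;
    char c;
    long d;
} bigstruct;

struct field_info {
    const char *name;
    size_t      offset;
    size_t      size;
};

/* Builds one table entry; the compiler supplies offset and size. */
#define FIELD(type, member) \
    { #member, offsetof(type, member), sizeof(((type *)0)->member) }

/* The fields to save; edit this list to change the subset. */
static const struct field_info saved_fields[] = {
    FIELD(bigstruct, a),
    FIELD(bigstruct, c),
};

#define NUM_SAVED (sizeof saved_fields / sizeof saved_fields[0])

/* Copies the selected fields back to back into out; returns bytes used. */
size_t pack_selected(const bigstruct *in, unsigned char *out)
{
    size_t n = 0, i;
    assert(in != NULL && out != NULL);
    for (i = 0; i < NUM_SAVED; ++i) {
        memcpy(out + n,
               (const unsigned char *)in + saved_fields[i].offset,
               saved_fields[i].size);
        n += saved_fields[i].size;
    }
    return n;
}
```

Because the offsets come from the compiler, padding inside bigstruct is accounted for automatically, unlike the hand-computed field2distance table in the question.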
I suggest to take this approach:
1. Curse the guy who wrote the big structure. Get a voodoo doll and have some fun.
2. Mark each field of the big structure that you need somehow (macro or comment or whatever)
3. Write a small tool which reads the header file and extracts the marked fields. If you use comments, you can give each field a priority or something to sort them.
4. Write a new header file for the substructure (using a fixed header and footer).
5. Write a new C file which contains a function createSubStruct which takes a pointer to the big struct and returns a pointer to the substruct
6. In the function, loop over the fields collected and emit ss.field = bs.field (i.e. copy the fields one by one).
7. Add the small tool to your makefile and add the new header and C source file to your build
I suggest to use gawk, or any scripting language you're comfortable with, as the tool; that should take half an hour to build.
[EDIT] If you really want to try reflection (which I suggest against; it'll be a whole lot of work to get that working in C), then the offsetof() macro is your friend. This macro returns the offset of a field in a structure (which is most often not the sum of the sizes of the fields before it). See this article.
[EDIT2] Don't write your own parser. To get your own parser right will take months; I know since I've written lots of parsers in my life. Instead mark the parts of the original header file which need to be copied and then rely on the one parser which you know works: The one of your C compiler. Here are a couple of ideas how to make this work:
struct big_struct {
/**BEGIN_COPY*/
int i;
int j : 3;
int k : 2;
char * str;
/**END_COPY*/
...
struct x y; /**COPY_STRUCT*/
}
Just have your tool copy anything between /**BEGIN_COPY*/ and /**END_COPY*/.
Use special comments like /**COPY_STRUCT*/ to instruct your tool to generate a memcpy() instead of an assignment, etc.
This can be written and debugged in a few hours. It would take as long to set up a parser for C without any functionality; that is you'd just have something which can read valid C but you'd still have to write the part of the parser which understands C, and the part which does something useful with the data.

Resources