Struct padding influence in C struct serialization (saving to file)

I have the following structs in C:
typedef struct sUser {
char name[nameSize];
char nickname[nicknameSize];
char mail[mailSize];
char address[addressSize];
char password[passwordSize];
int totalPoints;
PlacesHistory history;
DynamicArray requests;
}User;
typedef struct sPlacesHistory {
HistoryElement array[HistorySize];
int occupied;
int last;
}PlacesHistory;
and the functions:
void serializeUser( User * user, FILE * fp ) {
fwrite( user, nameSize + nicknameSize + mailSize + addressSize + passwordSize + sizeof( int ) + sizeof( PlacesHistory ), 1, fp );
serializeDynamicArray( user -> requests, fp );
}
User * loadUser( FILE * fp ) {
User * user = malloc( sizeof( User ) );
fread( user, nameSize + nicknameSize + mailSize + addressSize + passwordSize + sizeof( int ) + sizeof( PlacesHistory ), 1, fp );
user -> requests = loadDynamicArray( fp );
return user;
}
When I load the struct User and print that user (loaded from file), the field "last" of PlacesHistory has the value 255 or -1, depending on the order of the fields in the PlacesHistory structure. But the User I saved had -1 in that member.
So when I get 255, it is obviously wrong.
I suspect this has to do with struct padding.
How can I do this in such a way that the order of fields in the structure doesn't matter?
Or which criteria do I need to follow to make things work right?
Do I need to fwrite/fread one member at a time? (I would like to avoid this for efficiency reasons.)
Do I need to serialize to an array first instead of a file? (I hope not, because this implies knowing the size of all my structures beforehand in order to malloc the array, which means extra work creating a function for every non-simple structure to compute its size.)
Note: *Size are defined constants
Note2: DynamicArray is a pointer to another structure.

Yes, it probably has to do with padding in front of either totalPoints or history.
You can just write out sizeof(User) - sizeof(DynamicArray) and read back in the same. Of course this will only be compatible as long as your struct definitions and compiler don't change. If you don't need serialized data from one version of your program to be compatible with another version, then the above should work.

Why are you adding up all elements individually? That's just adding a lot of room for error. Whenever you change your structure, your code might break if you forgot to change all the places where you add the size up (in fact, why do you add it up each time?).
And, as you suspected, your code doesn't account for structure padding either, so you may be missing up to three bytes at the end of your data block (if your largest element is 4 bytes).
Why not sizeof(User) to get the size of the amount of data you're reading/writing? If you don't want parts of it saved (like requests), then use a struct inside a struct. (EDIT: Or, like rlibby suggested, just subtract the sizeof of the part you don't want to read.)
My guess is that your string sizes are not divisible by 4, so you are 3 bytes short. You were supposed to read the four bytes ff ff ff ff (= -1) but read only the first of them, leaving ff 00 00 00 in memory (= 255 on a little-endian machine, assuming that your structure was zeroed out initially).
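Both answers above suggest deriving the byte count from the struct itself instead of adding up member sizes by hand. A minimal sketch of that idea follows, using offsetof to get the number of bytes that precede the pointer member, padding included. The shrunken User and PlacesHistory types, the USER_FLAT_SIZE macro and the save_user_flat/load_user_flat names are hypothetical stand-ins, not the asker's real definitions, and the resulting file is only readable by a program built with the same struct layout and compiler settings.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, shrunken stand-ins for the structs in the question. */
typedef struct {
    int occupied;
    int last;
} PlacesHistory;

typedef struct {
    char name[5];            /* odd size on purpose, so padding follows it */
    int totalPoints;
    PlacesHistory history;
    void *requests;          /* stands in for DynamicArray; never written raw */
} User;

/* Bytes from the start of User up to (not including) requests.
   offsetof counts any padding the compiler inserted between members. */
#define USER_FLAT_SIZE offsetof(User, requests)

/* Writes the flat part of the struct; returns 0 on success, -1 on error. */
int save_user_flat(const User *u, FILE *fp)
{
    return fwrite(u, USER_FLAT_SIZE, 1, fp) == 1 ? 0 : -1;
}

/* Reads the flat part back; returns 0 on success, -1 on error. */
int load_user_flat(User *u, FILE *fp)
{
    if (fread(u, USER_FLAT_SIZE, 1, fp) != 1)
        return -1;
    u->requests = NULL;      /* the pointed-to data must be loaded separately */
    return 0;
}
```

Because the size is computed by the compiler, reordering or resizing members does not require touching the read and write calls, as long as requests stays the last member.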

Padding may be your problem because
nameSize + nicknameSize + mailSize + addressSize + passwordSize + sizeof( int ) + sizeof( PlacesHistory ) != sizeof( User )
so the last member ("last", which also comes last in the struct) remains uninitialized. To check this, do a memset( user, 0, sizeof( User ) ) before reading from the file.
To fix this, use #pragma pack(push,1) before the struct definitions and #pragma pack(pop) after them. Note that this pragma is a compiler extension (MSVC, GCC and Clang support it), not standard C.

Related

How to allocate memory for a struct with a member of unknown variable size?

Somebody kindly helped me a couple days ago with a large part of this in another SO question. At that time, the size of struct base was known in advance. I'd like to alter member path to be a character array the size of which isn't known until right before allocating memory for a new instance of base in the base_new() function.
In the previous version, all files were required to be stored in the same directory and only the file name was added; and it was limited to length 256. Now I'd like to permit user added subdirectories under the ../../databases directory and not limit the length.
Is it possible to set the size of path before or after the db[i] = malloc( sizeof ( struct base ) ) in base_new()?
Or, perhaps, I should simply ask, how can this be accomplished?
Thank you.
/* Global declaration */
struct base {
...
char path[];
};
struct base **db;
/* in main() */
db = malloc( n * sizeof *db );
for (size_t i = 0; i < n; ++i)
db[i] = NULL;
/* Function to assign pointer db[i] to newly allocated struct base */
int base_new( void )
{
/* declarations */
// Assign pointer to beginning of memory allocation for the new instance of struct base.
if ( ( db[i] = malloc( sizeof ( struct base ) ) ) == NULL )
{
printf( "Error attempting to malloc space for new base.\n" );
return 1;
}
// When size of path was fixed, just truncated name to 256.
l = sizeof( db[i]->path );
rc = snprintf( db[i]->path, l, "%s%.*s", "../../databases/", 256, name );
if ( rc > l - 1 || rc < 0 )
{
// Not enough space; truncate and add the '\0' at l.
db[i]->path[l] = '\0';
}
// When size of path is variable and writing path.
l = sizeof( db[i]->path ) - 16;
rc = snprintf( db[i]->path, l, "%s%s", "../../databases/", path );
if ( rc > l - 1 || rc < 0 )
{
db[i]->path[l] = '\0';
}
}
I got a message at the top of my question asking if an existing question answers this one. It is closely related and helpful, but the answer I received here is better I think and discusses a few other related points. I don't know how it is supposed to work but I picked No because this answer is better, or at least I can understand it better. This answer shows how to malloc the variable member of the struct and discusses freeing the memory of the struct by member before freeing the pointer to the struct. The other question is a bit more general but still helpful. Thanks.
Instead of an array, you can have a pointer inside your struct:
struct base {
...
char *path;
};
Later, allocate memory to this pointer whenever you need it:
db[i]->path = malloc(n * sizeof(char)); // n is the variable size you will set before
Since you allocate memory dynamically now, don't forget to free it to avoid memory leaks. In C, every struct must have a fixed byte length, so that, for example, sizeof(struct base) can be evaluated at compile time. In your case, the array's size cannot be determined at compile time, so it is illegal to do something like char path[l] inside the struct, where l is unknown at compile time.
Btw, a correction regarding
l = sizeof( db[i]->path );
First of all, even if path were declared as an array, this wouldn't give you its length in elements; sizeof returns the total number of bytes occupied by the array, so in general you have to divide by the size of one element (sizeof(char), which is always 1) to get the length. Now that you have declared it as a pointer, sizeof(db[i]->path) is just the size of the pointer itself, so keep the allocated length in a variable instead.
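To make the pointer approach concrete, here is a minimal sketch that sizes the path at run time and frees the member before the struct. The struct is reduced to the path member only, and base_create and base_destroy are hypothetical names rather than the asker's base_new(). It relies on the C99 behavior of snprintf, which returns the length that would have been written when called with a NULL buffer and size 0.

```c
#include <stdio.h>
#include <stdlib.h>

struct base {
    char *path;              /* heap-allocated, sized at run time */
};

/* Hypothetical helper: builds "../../databases/<name>" with no length limit.
   Returns NULL if any allocation fails. */
struct base *base_create(const char *name)
{
    static const char prefix[] = "../../databases/";

    struct base *b = malloc(sizeof *b);
    if (b == NULL)
        return NULL;

    /* First pass: ask snprintf how many characters are needed. */
    int needed = snprintf(NULL, 0, "%s%s", prefix, name);
    if (needed < 0) {
        free(b);
        return NULL;
    }

    size_t len = (size_t)needed + 1;     /* +1 for the terminating '\0' */
    b->path = malloc(len);
    if (b->path == NULL) {
        free(b);
        return NULL;
    }

    /* Second pass: write the string into the exactly sized buffer. */
    snprintf(b->path, len, "%s%s", prefix, name);
    return b;
}

/* Frees the member first, then the struct itself. */
void base_destroy(struct base *b)
{
    if (b != NULL) {
        free(b->path);
        free(b);
    }
}
```

Since len is known at allocation time, it can also be stored in the struct if later code needs the buffer size.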

Allocate Pointer and pointee at once

If I want to reduce malloc()s (especially if the data is small and allocated often), I would like to allocate the pointer and pointee at once.
If you assume something like the following:
struct entry {
size_t buf_len;
char *buf;
int something;
};
I would like to allocate memory in the following way (don't care about error checking here):
size_t buf_len = 4; // size of the buffer
struct entry *e = NULL;
e = malloc( sizeof(*e) + buf_len ); // allocate struct and buffer
e->buf_len = buf_len; // set buffer size
e->buf = (char *)(e + 1); // the buffer lies behind the struct
This could even be extended, so that a whole array is allocated at once.
How would you assess such a technique with regard to:
Portability
Maintainability / Extendability
Performance
Readability
Is this reasonable? If it is ok to use, are there any ideas on how to design a possible interface for that?
You could use a flexible array member instead of a pointer:
struct entry {
size_t buf_len;
int something;
char buf[];
};
// ...
struct entry *e = malloc(sizeof *e + buf_len);
e->buf_len = buf_len;
Portability and performance are fine. Readability: not perfect but good enough.
Extendability: you can't use this for more than one member at a time, you'd have to fall back to your explicit pointer version. Also, the explicit pointer version means that you have to muck around to ensure correct alignment if you use it with a type that doesn't have an alignment of 1.
If you are seriously thinking about this I'd consider revisiting your entire data structure's design to see if there is another way of doing it. (Maybe this way is actually the best way, but have a good think about it first).
As to portability, I am unaware of any issues, as long as the sizes are found via suitable calls to sizeof(), as in your code.
Regarding maintainability, extendability and readability, you should certainly wrap allocation and de-allocation in a well-commented function. Calls to...
struct entry *allocate_entry_with_buffer(size_t buf_len);
void deallocate_entry_with_buffer(struct entry **entry_with_buffer);
...do not need to know implementation details of how the memory actually gets handled. People use stranger things like custom allocators and memory pools quite frequently.
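One possible shape for such a wrapper pair is sketched below, assuming the flexible array member layout from the earlier answer. The buf_len parameter and the choice to zero the buffer and reset the caller's pointer are assumptions made for illustration.

```c
#include <stdlib.h>
#include <string.h>

struct entry {
    size_t buf_len;
    int something;
    char buf[];              /* flexible array member, must be last */
};

/* Allocates the struct and its buffer in a single malloc.
   Returns NULL on allocation failure. */
struct entry *allocate_entry_with_buffer(size_t buf_len)
{
    struct entry *e = malloc(sizeof *e + buf_len);
    if (e == NULL)
        return NULL;

    e->buf_len = buf_len;
    e->something = 0;
    memset(e->buf, 0, buf_len);          /* start with a zeroed buffer */
    return e;
}

/* Frees the entry and clears the caller's pointer so it cannot be
   used again by accident. */
void deallocate_entry_with_buffer(struct entry **entry_with_buffer)
{
    if (entry_with_buffer != NULL) {
        free(*entry_with_buffer);
        *entry_with_buffer = NULL;
    }
}
```

Callers only see the two functions, so the layout can later change (for example back to a separate pointer) without touching them.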
As for speed, this is certainly faster than making lots of small allocations. I used to allocate whole 2D matrices with a similar strategy...
It should work, but in fact you are using a pointer for a useless indirection. Windows API (for example) uses another method for variable size structs : the variable size buffer is last in struct and is declared to be char buf[1].
Your struct would become :
struct entry {
size_t buf_len;
int something;
char buf[1];
};
The allocation is (still no error checking) :
size_t buf_len = 4; // size of the buffer
struct entry *e;
e = malloc( sizeof(*e) + buf_len - 1); // struct already has room for 1 char
e->buf_len = buf_len; // set buffer size
That's all: e->buf can now be used as a char array of size buf_len.
This also ensures that even if the variable part were not a character array but an array of int, long, or anything else, the alignment would be correct, because the last member is an array of the proper type with size 1.
For starters, the line:
e->buf = e + sizeof(*e); // the buffer lies behind the struct
Should be:
e->buf = (char *)(e + 1); // the buffer lies behind the struct
This is because pointer arithmetic is scaled by the size of the pointed-to type, so e + 1 is the address just past the end of the structure. As you have it, e + sizeof(*e) points sizeof(*e) whole structures past e, far beyond the memory you allocated.
And, yes, it's reasonable. However, I prefer this approach:
struct entry {
size_t buf_len;
int something;
char buf[1];
};
This way, you don't mess with the pointers. Just append as many bytes as needed, and they will grow the size of your buf array.
Note: I wrote a text editor using an approach similar to this but used a Microsoft c++ extension that allowed me to declare the last member as char buf[]. So it was an empty array that was exactly as long as the number of extra bytes I allocated.
seems fine to me - put comments in though
Or you could do this - which is quite common
struct entry {
size_t buf_len;
int something;
char buf;
};
i.e., make the struct itself variable length, and do
size_t buf_len = 4; // size of the buffer
struct entry *e = NULL;
// check that it packs right
e = malloc(sizeof(size_t) + sizeof(int) + buf_len ); // allocate struct and buffer
e->buf_len = buf_len; // set buffer size
...... later
printf("%s", &e->buf);

allocate struct and memory for elements in one malloc

I am sure this is a basic question, but I haven't been able to find out whether this is a legitimate memory allocation strategy. I am reading in data from a file and filling in a struct. The sizes of the members vary on each read, so my struct elements are pointers, like so:
struct data_channel{
char *chan_name;
char *chan_type;
char *chan_units;
};
So before reading, I figure out the size of each string so I can allocate memory for them. My question is: can I allocate the memory for the struct and the strings all in one malloc and then fill the pointers in?
Say the size of chan_name is 9, chan_type 10, and chan_units 5. So I would allocate the memory and do something like this.
struct data_channel *chan;
chan = malloc(sizeof(struct data_channel) + 9 + 10 + 5);
chan->chan_name = chan[1];
chan->chan_type = chan->chan_name + 9;
chan->chan_units = chan->chan_type + 10;
So I read a couple of articles on memory alignment, but I don't know whether doing the above is a problem or what kind of unintended consequences it could have. I have already implemented it in my code and it seems to work fine. I just don't want to have to keep track of all those pointers, because in reality each of my structs has 7 elements and I could have upwards of 100 channels. That of course means 700 pointers plus the pointers for each struct, so 800 in total. Then I also have to devise a way to free them all. I also want to apply this strategy to arrays of strings, for which I then need an array of pointers. I don't have any structures right now that mix data types, but I might later; could that be a problem?
If chan_name is a 8 character string, chan_type is a 9 character string and chan_units is a 4 character string, then yes it will work fine when you fix the compilation error you have when assigning to chan_name.
If you allocate enough memory for the structure plus all the strings (including their string terminator) then it's okay to use such a method. Maybe not recommended by all, but it will work.
It depends in part on the element types. You will certainly be able to do it with character strings; with some other types, you have to worry about alignment and padding issues.
struct data_channel
{
char *chan_name;
char *chan_type;
char *chan_units;
};
struct data_channel *chan;
size_t name_size = 9;
size_t type_size = 10;
size_t unit_size = 5;
chan = malloc(sizeof(struct data_channel) + name_size + type_size + unit_size);
if (chan != 0)
{
chan->chan_name = (char *)chan + sizeof(*chan);
chan->chan_type = chan->chan_name + name_size;
chan->chan_units = chan->chan_type + type_size;
}
This will work OK in practice; it was being done for ages before C was standardized. I can't immediately see why the standard would disallow this.
What gets trickier is if you needed to allocate an array of int, say, as well as two strings. Then you have to worry about alignment issues.
struct data_info
{
char *info_name;
int *info_freq;
char *info_unit;
};
size_t name_size = 9;
size_t freq_size = 10;
size_t unit_size = 5;
size_t nbytes = sizeof(struct data_info) + name_size + freq_size * sizeof(int) + unit_size;
struct data_info *info = malloc(nbytes);
if (info != 0)
{
info->info_freq = (int *)((char *)info + sizeof(*info));
info->info_name = (char *)info->info_freq + freq_size * sizeof(int);
info->info_unit = info->info_name + name_size;
}
This has adopted the simple expedient of allocating the most stringently aligned type (the array of int) first, then allocating the strings afterwards. This part is, however, where you have to make judgement calls about portability. I'm confident that the code is portable in practice.
C11 has alignment facilities (_Alignof and _Alignas and <stdalign.h>, plus max_align_t in <stddef.h>) that could alter this answer (but I've not studied them sufficiently so I'm not sure how, yet), but the techniques outlined here will work in any version of C provided you are careful about the alignment of data.
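As a hedged illustration of how the C11 facilities could be used here, the sketch below places the strings first and then rounds the offset of the int array up to a multiple of _Alignof(int). This is a variation on the layout above (which puts the int array first), not the answer's own code. The helper names align_up and alloc_data_info are hypothetical, and align_up assumes the alignment is a power of two, which holds for the fundamental types on common platforms.

```c
#include <stddef.h>
#include <stdlib.h>

struct data_info {
    char *info_name;
    int  *info_freq;
    char *info_unit;
};

/* Rounds offset up to the next multiple of align.
   Assumes align is a power of two. */
static size_t align_up(size_t offset, size_t align)
{
    return (offset + align - 1) & ~(align - 1);
}

/* One malloc for the struct, both strings and the int array.
   The strings come first; the int array starts at the next offset
   that satisfies the alignment requirement of int. */
struct data_info *alloc_data_info(size_t name_size, size_t freq_count,
                                  size_t unit_size)
{
    size_t name_off = sizeof(struct data_info);
    size_t unit_off = name_off + name_size;
    size_t freq_off = align_up(unit_off + unit_size, _Alignof(int));
    size_t nbytes   = freq_off + freq_count * sizeof(int);

    struct data_info *info = malloc(nbytes);
    if (info != NULL) {
        char *base = (char *)info;       /* malloc result is maximally aligned */
        info->info_name = base + name_off;
        info->info_unit = base + unit_off;
        info->info_freq = (int *)(base + freq_off);
    }
    return info;
}
```

A single free(info) releases everything, and the padding cost is at most _Alignof(int) - 1 bytes per allocation.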
Note that if you have a single array in the structure, then C99 provides an alternative to the older 'struct hack' called a flexible array member (FAM). This allows you to have an array explicitly as the last element of the structure.
struct data_info
{
char *info_name;
char *info_units;
int info_freq[];
};
size_t name_size = 9;
size_t freq_size = 10;
size_t unit_size = 5;
size_t nbytes = sizeof(struct data_info) + name_size + freq_size * sizeof(int) + unit_size;
struct data_info *info = malloc(nbytes);
if (info != 0)
{
info->info_name = ((char *)info + sizeof(*info) + freq_size * sizeof(int));
info->info_units = info->info_name + name_size;
}
Note that there was no step to initialize the FAM, info_freq in this example. You cannot have multiple arrays like this.
Note that the techniques outlined cannot readily be applied to arrays of structures (at least, arrays of the outer structure). If you go to considerable effort, you can make it work. Also, beware of realloc(); if you reallocate space, you have to fix up the pointers if the data has moved.
One other point: especially on 64-bit machines, if the sizes of the strings are uniform enough, you'd probably do better allocating the arrays in the structure, instead of using the pointers.
struct data_channel
{
char chan_name[16];
char chan_type[16];
char chan_units[8];
};
This occupies 40 bytes. On a 64-bit machine, the original data structure would occupy 24 bytes for the three pointers and another 24 bytes for the (9 + 10 + 5) bytes of data, for a total of 48 bytes allocated.
I know there is a sure way to do this when you have ONE array at the end of a structure, but since all your arrays have the same type, you may be in luck. The sure method is:
#include <stddef.h>
#include <stdlib.h>
struct StWithArray
{
int blahblah;
float arr[1];
};
struct StWithArray * AllocWithArray(size_t nb)
{
size_t size = nb*sizeof(float) + offsetof(struct StWithArray, arr);
return malloc(size);
}
The use of an actual array in the structure guarantees alignment is respected.
Now to apply it to your case:
#include <stddef.h>
#include <stdlib.h>
struct data_channel
{
char *chan_name;
char *chan_type;
char *chan_units;
char actualCharArray[1];
};
struct data_channel * AllocDataChannel(size_t nb)
{
size_t size = nb*sizeof(char) + offsetof(struct data_channel, actualCharArray);
return malloc(size);
}
struct data_channel * CreateDataChannel(size_t length1, size_t length2, size_t length3)
{
struct data_channel * pt = AllocDataChannel(length1 + length2 + length3);
if(pt != NULL)
{
pt->chan_name = &pt->actualCharArray[0];
pt->chan_type = &pt->actualCharArray[length1];
pt->chan_units = &pt->actualCharArray[length1+length2];
}
return pt;
}
Joachim's and Jonathan's answers are nice. The only addition I would like to mention is this.
Separate mallocs and frees buy you some basic protection against buffer overruns, access after free, etc. I mean basic protection, not Valgrind-like features. Allocating one single chunk and internally doling it out loses this protection.
In the future, if the mallocs are for totally different sizes, then separate mallocs may buy you the efficiency of coming from different allocation buckets inside the malloc implementation, especially if you are going to free them at different times.
The last thing you have to consider is how frequently you are calling malloc. If it is frequent, then the cost of multiple mallocs can add up.

Odd behaviour using flexible array member

I tried to replace a void* member of a struct with a flexible array member using the more accepted idiom:
typedef struct Entry {
int counter;
//void* block2; // This used to be what I had
unsigned char block[1];
} Entry;
I then add entries into a continuous memory block:
void *memPtr = mmap(NULL, someSize*1024, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
as such:
int number = 0;
int AddEntry(void *data) {
Entry *entry;
entry = malloc(sizeof(Entry) + ((SECTOR_SIZE-1) * sizeof(unsigned char)));
entry->counter = 1;
memcpy(entry->block, data, SECTOR_SIZE);
// make sure number doesn't overflow space, etc...
memcpy(&memPtr[number], entry, sizeof(Entry) + ((SECTOR_SIZE-1) * sizeof(unsigned char)));
number++;
return 0;
}
The problem is unpacking this data once I need it. For example, if I do:
void * returnBlock(int i) {
Entry * entry = &memPtr[i];
printf("Entry counter is %d\n", entry->counter); // returns 1, reliably
return entry->block; // Gives me gibberish but not if I uncomment void* block2.
}
Is there a reason this could be? I don't necessarily think I'm stomping on stuff anywhere, and it used to work with the void* approach. The weird thing is that if I put a dummy void* back into the struct, it works. It doesn't work if I put in a dummy int.
Edit: actually, it also fails if number in AddEntry is not 0. What am I stepping on, if anything?
Your problem is here:
&memPtr[number]
Since memPtr is a void * pointer, this isn't actually allowed in C at all. Some compilers do allow arithmetic on void * pointers as a language extension - however they treat it as if it were a char * pointer.
That means that &memPtr[number] is likely indexing only number bytes into your memory block - so the second Entry structure copied in will overlap the first one, and so on.
Your allocation line appears to be assuming 1024 bytes per Entry (if someSize is a number of Entry structures), so you probably want something like:
((char *)memPtr + number * 1024)
(and similar in the returnBlock() function).
However, if you do this you will notice that there is no point in using the flexible array member - because you're creating a contiguous array of these structures, and don't have a separate index, you have to assume each one is a fixed size. This means that you might as well make each one a fixed size:
typedef struct Entry {
int counter;
unsigned char block[1024 - sizeof(int)];
} Entry;
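Putting the two points together, a sketch of slot-based access into the mapped region might look as follows. ENTRY_SIZE, entry_at and add_entry are hypothetical names, the region is assumed to be suitably aligned (as memory from mmap or malloc is), and the offset is computed in bytes through a char pointer instead of indexing the void pointer.

```c
#include <string.h>

#define ENTRY_SIZE 1024      /* assumed slot size, matching the 1024 above */

typedef struct Entry {
    int counter;
    unsigned char block[ENTRY_SIZE - sizeof(int)];
} Entry;

/* Returns a pointer to slot i. The byte offset is computed explicitly,
   so slot i starts i * sizeof(Entry) bytes into the region. */
static Entry *entry_at(void *mem, size_t i)
{
    return (Entry *)((char *)mem + i * sizeof(Entry));
}

/* Fills slot i in place; no temporary malloc'd Entry is needed.
   Data longer than the block is truncated. */
void add_entry(void *mem, size_t i, const void *data, size_t len)
{
    Entry *e = entry_at(mem, i);

    e->counter = 1;
    if (len > sizeof e->block)
        len = sizeof e->block;
    memcpy(e->block, data, len);
}
```

Reading back uses the same entry_at helper, so writer and reader cannot disagree about where a slot begins.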

How do I create my own packet to send via UDP?

I'm making my own client-server application in C that implements the TFTP protocol. After reading the TFTP RFC and getting a simple socket client-server app working, I'm now a little confused about how to create the specific packets that the TFTP protocol requires.
For example, the WRQ packet has to be this way:
 2 bytes     string    1 byte     string   1 byte
 ------------------------------------------------
| Opcode |  Filename  |   0  |    Mode    |   0  |
 ------------------------------------------------
which is extracted from the official RFC.
I have a .h file in which I define all the structures for the packets, but I'm not sure if I'm doing it correctly, and I'm not having any luck finding information on the web.
The struct I created for this packet is:
struct WRQ {
signed short int opcode; //2 bytes
char * filename; // N bytes
char zero_0; // 1 byte
char * mode; // N Bytes
char zero_1; // 1 byte
};
I have two questions:
a) When I take sizeof(struct WRQ), it returns 20 bytes, which is not the size I want to get. Why does this happen?
b) How do I have to define the strings? I want the server to receive the string itself, and I think that this way it will receive the pointer to the string on the client machine.
I hope that all is clear and that you can help me, because I'm stuck at the moment!
The following code (with no error checking) is one possible way of building up the packet and sending it. Note that it assumes that both filename and mode are not NULL. I don't know if that is a valid assumption or not. Even if it is, it would be wise to have a check for NULL before using them in real code:
struct WRQ *p;
int packetLen;
char *buf;
char *pos;
int ret;
// compute packet length: the opcode plus both strings, each with its
// terminating 0 (those terminators are the zero_0 and zero_1 bytes on the wire,
// so they must not be counted a second time)
packetLen = sizeof( p->opcode );
// This assumes (possibly incorrectly) that filename and mode are not null
packetLen += strlen( p->filename ) + 1;
packetLen += strlen( p->mode ) + 1;
// allocate the buffer
buf = malloc( packetLen );
pos = buf;
// and start filling it in
// I am assuming (but didn't study the RFC) that it should be network byte order
*(signed short int*)pos = htons( p->opcode );
pos += sizeof( p->opcode );
// strcpy also copies the terminating 0, which serves as zero_0
strcpy( pos, p->filename );
pos += strlen( p->filename ) + 1;
// likewise, the terminator of mode serves as zero_1
strcpy( pos, p->mode );
pos += strlen( p->mode ) + 1;
ret = send( s, buf, packetLen, flags );
free( buf );
In your code, char * filename; is a pointer to a char. This means that filename only occupies 4 bytes. Even if your string is 1000 bytes long, since filename is simply a pointer, it just contains the 4-byte memory address of the first character of your string.
So you have two solutions: use char filename[MAX_LENGTH] to declare a string of size MAX_LENGTH and always pass that. Or, you can include another field, say, "filename_length", which tells you how many bytes to expect when you read the filename field.
(The above is actually false. filename may not be 4 bytes long. filename will be sizeof(char*) bytes long. This is probably 4 bytes on your computer. But pointers are not always 4 bytes, and nowadays people are getting into a lot of trouble for assuming a 4-byte pointer on 64-bit architectures. So I'm just sayin'. Don't downvote me.)
To answer your second question: why is sizeof() 20 bytes? The compiler pads the struct with extra bytes so that each member starts on a suitable boundary, here a 4-byte boundary for the pointers. With 4-byte pointers, that is 2 (opcode) + 2 (padding) + 4 (filename) + 1 (zero_0) + 3 (padding) + 4 (mode) + 1 (zero_1) + 3 (padding) = 20. The compiler can work efficiently with "words" rather than weird-sized structures. (Again, the "4" just depends on the architecture. Each architecture has its own "word" length.) This is known as alignment. There is an excellent SO thread which gives more detail: Why isn't sizeof for a struct equal to the sum of sizeof of each member?
You can't just put a char* in there and expect it to work, because that's a pointer and you need the actual character data to appear in the packet (a pointer passed between two programs will almost never work!). Since the filename portion of the packet is variable-length, you can't represent the whole packet as a struct as you are trying to do. Instead, you probably should dynamically generate the packet by concatenating the pieces on demand, such as with a function having this signature:
vector<char> makePacket(uint16_t opcode, const char* filename, const char* mode);
C structures can't have a variable size. You'll have to define "filename" and "mode" as char field_name [preset size]; or construct the structure by hand in a memory buffer at run time.
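Constructing the packet by hand in a memory buffer could look like the sketch below. The function name build_wrq and its buffer and capacity parameters are hypothetical. The opcode value 2 for a write request and the layout (opcode, filename, 0, mode, 0) follow the diagram quoted in the question, and the opcode bytes are written most significant byte first so the code does not depend on the host's byte order.

```c
#include <stddef.h>
#include <string.h>

#define TFTP_OP_WRQ 2        /* write request opcode */

/* Hypothetical helper: serializes a WRQ packet into buf.
   Returns the packet length in bytes, or 0 if it does not fit in cap. */
size_t build_wrq(unsigned char *buf, size_t cap,
                 const char *filename, const char *mode)
{
    size_t flen  = strlen(filename) + 1;   /* string plus its 0 byte */
    size_t mlen  = strlen(mode) + 1;       /* string plus its 0 byte */
    size_t total = 2 + flen + mlen;        /* 2 bytes of opcode first */

    if (total > cap)
        return 0;

    /* Opcode in network byte order (big-endian), written byte by byte. */
    buf[0] = (unsigned char)((TFTP_OP_WRQ >> 8) & 0xFF);
    buf[1] = (unsigned char)(TFTP_OP_WRQ & 0xFF);

    /* The copied terminators are the two 1-byte zero fields of the packet. */
    memcpy(buf + 2, filename, flen);
    memcpy(buf + 2 + flen, mode, mlen);
    return total;
}
```

The returned length is what would be passed to sendto() together with buf. No struct is involved, so padding and pointer sizes never reach the wire.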
Finally, if what you need is a TFTP implementation in C, you can bet someone has written it already. E.g. there's a BSD-licensed tftp-hpa used in a variety of Linux distributions.

Resources