So I have some structs containing data that I want to send to another process using a unix socket. This process may not be compiled using the same compiler version, or even be written in C for that matter. This is the struct (note that some stuff is commented out):
struct nested_struct {
uint8_t a;
uint8_t b;
uint16_t c;
} /*__attribute__((packed))*/;
struct my_struct {
uint32_t num_nested_structs;
/* uint8_t padding[3];*/
uint8_t x;
uint16_t y;
uint16_t z;
struct nested_struct nested[];
} /*__attribute__((packed))*/;
For convenience and performance, I'd like to get away with something like
write(socket, &data.x, data.num_nested_structs * sizeof(struct nested_struct) + 5)
or something -- but I doubt this would be safe, given that struct my_struct is not nicely aligned. But how about if we un-comment the packed attribute? This feels like it should work, but I've read that referencing fields in __packed__ structs by address can be dangerous.
What if we instead uncomment the uint8_t padding[3]; field? Now both structs are word size-aligned (on a system with WORD_BIT = 32). Is it safe to assume that the compiler won't add any padding in this case? If so, is this enough to ensure that accessing 5 + 4*num_nested_structs bytes of memory starting from &my_struct.x is safe?
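Whether the compiler inserts padding can be checked at compile time rather than assumed. The following is a minimal sketch, assuming a C11 compiler and the variant of the struct with the padding[3] field enabled; the helper name wire_size is hypothetical and not part of the original code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct nested_struct {
    uint8_t a;
    uint8_t b;
    uint16_t c;
};

struct my_struct {
    uint32_t num_nested_structs;
    uint8_t padding[3];
    uint8_t x;
    uint16_t y;
    uint16_t z;
    struct nested_struct nested[];
};

/* Compilation fails if the compiler lays the fields out differently. */
static_assert(sizeof(struct nested_struct) == 4, "nested_struct has padding");
static_assert(offsetof(struct my_struct, x) == 7, "unexpected offset of x");
static_assert(offsetof(struct my_struct, y) == 8, "unexpected offset of y");
static_assert(offsetof(struct my_struct, z) == 10, "unexpected offset of z");
static_assert(offsetof(struct my_struct, nested) == 12, "unexpected offset of nested");

/* Hypothetical helper: number of bytes from &s->x to the end of the last
   nested element, i.e. 5 + 4 * num_nested_structs when the asserts hold. */
static size_t wire_size(const struct my_struct *s)
{
    assert(s != NULL);
    return offsetof(struct my_struct, nested) - offsetof(struct my_struct, x)
         + (size_t)s->num_nested_structs * sizeof(struct nested_struct);
}
```

These checks only pin down the layout for the compiler that builds this file. Byte order and the layout expected by the receiving process still have to be agreed on separately.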
Related
I have the following struct definition:
typedef struct mb32_packet_t {
union {
struct {
uint16_t preamble;
uint8_t system_id;
uint8_t message_id;
uint8_t reserved;
uint32_t paylen;
};
uint8_t header[9];
};
uint8_t *payload;
uint16_t checksum;
} __attribute__((packed)) mb32_packet_t;
Now I would like to have another union, so that I can get an uint8_t body[] pointer to the entire packet object. Something like this:
typedef struct mb32_packet_t {
union {
struct {
union {
struct {
uint16_t preamble;
uint8_t system_id;
uint8_t message_id;
uint8_t reserved;
uint32_t paylen;
};
uint8_t header[9];
};
uint8_t *payload;
uint16_t checksum;
};
uint8_t body[?];
};
} __attribute__((packed)) mb32_packet_t;
The problem is that the payload field size is dynamically determined at runtime. Is there another way to accomplish this other than making payload fixed sized?
I basically want to send objects of this type through a network socket, so I need a uint8_t pointer that points to an object of this type. At the time of sending the object, I know the size of the entire object in bytes.
Introduction
The question is unclear, so I will discuss three apparent possibilities.
Fixed-length header followed by variable-length payload
A typical way to define a packet for a networking or messaging service is to have a fixed-length header followed by a variable-length payload. In modern C, the variable-length payload may be defined using a flexible array member, which is an array with no dimension at the end of a structure:
typedef struct
{
uint16_t preamble;
uint8_t system_id;
uint8_t message_id;
uint8_t reserved;
uint32_t paylen;
uint8_t payload[];
} mb32_packet_t;
Memory for such a structure is allocated using the base size provided by sizeof plus additional memory for the payload:
mb32_packet_t *MyPacket = malloc(sizeof *MyPacket + PayloadLength);
When you pass such an object to a routine that requires a char * or uint8_t * or similar type for its argument, you can simply convert the pointer:
SendMyMessage(…, (uint8_t *) MyPacket,…);
That cast, (uint8_t *) MyPacket, provides the pointer to the first byte of the packet requested in the question. There is no need to wedge another member into the structure or layer on a union or other declaration.
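Putting the allocation and the cast together, a minimal sketch might look like the following. The helper names packet_create and packet_size are hypothetical. Note that without a packed attribute the header in memory probably contains padding after reserved, so the raw bytes may not match a 9-byte wire header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct
{
    uint16_t preamble;
    uint8_t  system_id;
    uint8_t  message_id;
    uint8_t  reserved;
    uint32_t paylen;
    uint8_t  payload[];   /* flexible array member */
} mb32_packet_t;

/* Hypothetical helper: allocate a packet and copy len payload bytes into it.
   Returns NULL on allocation failure; the caller frees the result. */
static mb32_packet_t *packet_create(const void *data, uint32_t len)
{
    assert(data != NULL || len == 0);
    mb32_packet_t *p = malloc(sizeof *p + len);
    if (p == NULL)
        return NULL;
    memset(p, 0, sizeof *p);          /* zero the fixed-length header */
    p->paylen = len;
    if (len > 0)
        memcpy(p->payload, data, len);
    return p;
}

/* Hypothetical helper: in-memory size of header plus payload, which is the
   byte count to pass along with the (uint8_t *) cast of the packet. */
static size_t packet_size(const mb32_packet_t *p)
{
    assert(p != NULL);
    return sizeof *p + p->paylen;
}
```

A caller would then pass (uint8_t *) p together with packet_size(p) to whatever routine sends the bytes.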
Prior to the introduction of flexible array members in C 1999, people would use one of two workarounds to create structures with variable amounts of data. One, they might just define a member array with one element and adjust the space calculations accordingly:
typedef struct
{
…
unsigned char payload[1];
} mb32_packet_t;
mb32_packet_t *MyPacket = malloc(sizeof *MyPacket + PayloadLength - 1);
Technically, that violated the C standard, since the structure contained an array of only one element even though more space was allocated for it. However, compilers were not as aggressive in their analysis of program semantics and their optimization as they are now, so it generally worked. So you may still see old code using that method.
Two, GCC had its own pre-standard implementation of flexible array members, just using an array dimension of zero instead of omitting a dimension:
typedef struct
{
…
unsigned char payload[0];
} mb32_packet_t;
Again, you may see old code using that, but new code should use the standard flexible array member.
Fixed-length header with pointer to variable-length payload
The payload-after-header form shown above is the form of packet I would most expect in a messaging packet, because it matches what the hardware has to put “on the wire” when sending bytes across a network: It writes the header bytes followed by the data bytes. So it is convenient to have them arranged that way in memory.
However, your code shows another option: The data is not in the packet but is pointed to by a pointer in the packet, with uint8_t *payload;. I would suspect that is a mistake, that the network or messaging service really wants a flexible array member, but you show it followed by another member, uint16_t checksum. A flexible array member must be the last member in a structure, so the fact that there is another member after the payload suggests this definition with a pointer may be correct for the messaging service you are working with.
However, if that is the case, it is not possible to get a pointer to the complete packet object, because the object is in two pieces. One contains the header, and the other, at some unrelated location in memory, contains the data.
As above, you can produce a uint8_t * pointer to the start of the packet with (uint8_t *) MyPacket. If the messaging system knows about the pointer in the structure, that should work. If you have mistaken what the packet structure must be, it will fail.
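If the payload really does live behind a pointer and the receiver expects one contiguous run of bytes, the pieces have to be gathered into a single buffer before sending. This is a minimal sketch under that assumption; the helper packet_flatten and the constants HEADER_BYTES and CHECKSUM_BYTES are hypothetical, and the fields are copied in host byte order.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    uint16_t preamble;
    uint8_t  system_id;
    uint8_t  message_id;
    uint8_t  reserved;
    uint32_t paylen;
    uint8_t *payload;    /* points at data stored elsewhere */
    uint16_t checksum;
} mb32_packet_t;

enum { HEADER_BYTES = 9, CHECKSUM_BYTES = 2 };

/* Hypothetical helper: copy header fields, payload and checksum into one
   newly allocated buffer. Stores the total length in *out_len and returns
   NULL on allocation failure; the caller frees the result. */
static uint8_t *packet_flatten(const mb32_packet_t *p, size_t *out_len)
{
    assert(p != NULL && out_len != NULL);
    size_t total = HEADER_BYTES + (size_t)p->paylen + CHECKSUM_BYTES;
    uint8_t *buf = malloc(total);
    if (buf == NULL)
        return NULL;

    size_t off = 0;
    memcpy(buf + off, &p->preamble, sizeof p->preamble);
    off += sizeof p->preamble;
    buf[off++] = p->system_id;
    buf[off++] = p->message_id;
    buf[off++] = p->reserved;
    memcpy(buf + off, &p->paylen, sizeof p->paylen);
    off += sizeof p->paylen;
    if (p->paylen > 0)
        memcpy(buf + off, p->payload, p->paylen);
    off += p->paylen;
    memcpy(buf + off, &p->checksum, sizeof p->checksum);
    off += sizeof p->checksum;

    assert(off == total);
    *out_len = total;
    return buf;
}
```

Copying field by field also removes any dependence on struct padding, at the cost of one extra allocation per packet.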
Fixed-length header followed by fixed-length payload space
Code elsewhere on Stack Overflow shows a struct mb32_packet_t with a fixed amount of space for a payload:
typedef struct mb32_packet_t {
uint8_t compid;
uint8_t servid;
uint8_t payload[248];
uint8_t checksum;
} __attribute__((packed)) mb32_packet_s;
In this form, the packet is always a fixed size, although the amount of space used for the payload could vary. Again, you would obtain a uint8_t * pointer to the packet by a cast. There is no need for a special member for that.
This is possible, but not with a struct or union, because all parts of a struct or union need to have a known size. You can still use a struct for the header.
Because the body starts at a known location, there's a trick you can use to access it as if it was part of the structure. You can declare it with no size at all (a "flexible array member") or as 0 bytes (a GCC extension that predates the standard). The compiler will not allocate any space for it, but it will still let you use the name to refer to the end of the struct. The trick is that you can malloc extra bytes after the end of the struct, and then use body to refer to them.
typedef struct mb32_packet_t {
union {
struct {
uint16_t preamble;
uint8_t system_id;
uint8_t message_id;
uint8_t reserved;
uint32_t paylen;
};
uint8_t header[9];
};
uint8_t body[]; // flexible array member
} __attribute__((packed)) mb32_packet_t;
// This is not valid. The body is 0 bytes long, so the write is out of bounds.
mb32_packet_t my_packet;
my_packet.body[0] = 1;
// This is valid though!
mb32_packet_t *my_packet2 = malloc(sizeof(*my_packet2) + 50);
my_packet2->body[49] = 1;
// Alternative way to calculate size
mb32_packet_t *my_packet3 = malloc(offsetof(mb32_packet_t, body[50]));
my_packet3->body[49] = 1;
The flexible array member must be last. To access the checksum, you will need to allocate an extra 2 bytes, and use pointer arithmetic. Fortunately, this is just for the checksum, and not the entire header.
mb32_packet_t *my_packet = malloc(sizeof(*my_packet) + body_size + 2);
uint16_t *pchecksum = (uint16_t*)&my_packet->body[body_size];
// or
uint16_t *pchecksum = (uint16_t*)(my_packet->body + body_size);
After you fill in the header, body and checksum, then because they are contiguous in memory, a pointer to the header is also a pointer to the entire packet object.
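Casting an arbitrary byte address to uint16_t * can yield a misaligned pointer on some platforms, so a more conservative way to store the trailing checksum is memcpy. This is a minimal sketch; it uses a plain 9-byte header array so that no packing attribute is needed, and the helper names packet_alloc, packet_set_checksum and packet_get_checksum are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    uint8_t header[9];
    uint8_t body[];     /* flexible array member; the checksum follows the body */
} mb32_packet_t;

/* Allocate zeroed room for header, body and a 2-byte trailing checksum. */
static mb32_packet_t *packet_alloc(size_t body_size)
{
    return calloc(1, sizeof(mb32_packet_t) + body_size + sizeof(uint16_t));
}

/* Store the checksum right after the body without forming a uint16_t pointer. */
static void packet_set_checksum(mb32_packet_t *p, size_t body_size, uint16_t checksum)
{
    assert(p != NULL);
    memcpy(p->body + body_size, &checksum, sizeof checksum);
}

/* Read the checksum back the same way. */
static uint16_t packet_get_checksum(const mb32_packet_t *p, size_t body_size)
{
    uint16_t checksum;
    assert(p != NULL);
    memcpy(&checksum, p->body + body_size, sizeof checksum);
    return checksum;
}
```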
I usually do it this way:
typedef struct
{
size_t payload_size;
double x;
char y[45];
/* another members */
unsigned char payload[];
}my_packet_t;
or if your compiler does not support FAMs
typedef struct
{
size_t payload_size;
double x;
char y[45];
/* another members */
unsigned char payload[0];
}my_packet_t;
So the payload can sit at the end of the header structure.
I have a little data-hiding module that looks like this:
/** mydata.h */
struct _mystruct_t;
typedef struct _mystruct_t mystruct;
mystruct *newMystruct();
void freeMystruct( mystruct** p );
/** mydata.c */
#include "mydata.h"
struct _mystruct_t {
int64_t data1;
int16_t data2;
int16_t data3;
};
// ... related definitions ... //
For the most part, this is what I want; although simple, the struct has strict consistency requirements and I really don't want to provide access to the data members.
The problem is that in client code I would like to include the struct in another struct which I would like to allocate on the stack. Right now I am jumping through hoops to free the mystruct*s in some client code. Since a) mystruct is pretty small and I really don't think it's going to get big anytime soon and b) it's not a problem that client code has to recompile if I ever change mystruct, I would like to make the size of mystruct public (i.e. in the header).
Two possibilities I've considered:
/** mydata.h */
typedef struct {
// SERIOUSLY DON'T ACCESS THESE MEMBERS
int64_t data1;
int16_t data2;
int16_t data3;
} mystruct;
I think the drawbacks here speak for themselves.
OR
/** mydata.h */
#define SIZEOF_MYSTRUCT (sizeof(int64_t)+sizeof(int16_t)+sizeof(int16_t))
// everything else same as before...
/** mydata.c */
// same as before...
_Static_assert (SIZEOF_MYSTRUCT == sizeof(mystruct), "SIZEOF_MYSTRUCT is incorrect");
Of course this seems non-ideal since I have to update this value manually and I don't know if/how alignment of the struct could actually cause this to be incorrect (I thought of the static assert while writing this question, it partially addresses this concern).
Is one of these preferred? Or even better, is there some clever trick to provide the actual struct definition in the header while later somehow hiding the ability to access the members?
You can create a different .h file, distributed to the end user, that defines your secret structure as just a byte array (you can't hide data without crypto/checksumming any more than by saying "here are some bytes"):
typedef struct {
unsigned char data[12];
} your_struct;
You just have to make sure that both structures have the same size for all compilers and options, for instance by using __declspec(align()) (for VC) in your library code:
// Client side
__declspec(align(32)) typedef struct {
int64_t data1;
int16_t data2;
int16_t data3;
} mystruct;
This is to prevent the structure from being 16 bytes long instead of the commonly expected 12 bytes. Alternatively, just use the /Zp compiler option.
I would stay with a configure-time generated #define describing the size of mystruct, and possibly a typedef char opaque_mystruct[SIZEOF_MYSTRUCT] to simplify the creation of placeholders for mystruct.
Likely the idea of configure time actions deserves some explanations. The general idea is to
place the definition of the mystruct into a private, non-exported but nevertheless distributed header,
create a small test application being built and executed before the library. The test application would #include the private header, and print actual sizeof (mystruct) for a given compiler and compile options
create an appropriate script which would create a library config.h with #define SIZEOF_MYSTRUCT <calculated_number> and possibly definition of opaque_mystruct.
It's convenient to automate these steps with a decent build system, for example cmake, gnu autotools or any other with support for a configure stage. In fact, all the systems mentioned have built-in facilities that reduce the whole task to invoking a few predefined macros.
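The test application from the steps above can be very small. This is a minimal sketch, assuming the struct definition shown here stands in for the one in the private header; the helper name format_config is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Stand-in for the definition that would come from the private header. */
struct _mystruct_t {
    int64_t data1;
    int16_t data2;
    int16_t data3;
};

/* Hypothetical helper: format the generated config.h line into buf.
   Returns the number of characters written, as snprintf does. */
static int format_config(char *buf, size_t cap)
{
    assert(buf != NULL && cap > 0);
    return snprintf(buf, cap, "#define SIZEOF_MYSTRUCT %zu\n",
                    sizeof(struct _mystruct_t));
}
```

The generator's main would print this line to stdout, and the build script would redirect the output into config.h before the library itself is compiled.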
I've been researching and thinking and took one of my potential answers and took it to the next level; I think it addresses all of my concerns. Please critique.
/** in mydata.h */
typedef const struct { const char data[12]; } mystruct;
mystruct createMystruct();
int16_t exampleMystructGetter( mystruct *p );
// other func decls operating on mystruct ...
/** in mydata.c */
typedef union {
mystruct public_block;
struct mystruct_data_s {
int64_t d1;
int16_t d2;
int16_t d3;
} data;
} mystruct_data;
// Optionally use '==' instead of '<=' to force minimal space usage
_Static_assert (sizeof(struct mystruct_data_s) <= sizeof(mystruct), "mystruct not big enough");
mystruct createMystruct(){
static mystruct_data mystruct_blank = { .data = { .d1 = 1, .d2 = 2, .d3 = 3 } };
return mystruct_blank.public_block;
}
int16_t exampleMystructGetter(mystruct *p) {
mystruct_data *a = (mystruct_data*)p;
return a->data.d2;
}
Under gcc 4.7.3 this compiles without warnings. A simple test program to create and access via the getter also compiles and works as expected.
In the embedded world we often have data structures that are passed around via fixed-length buffers. These are relatively easy to handle using something like this:
#define TOTAL_BUFFER_LENGTH 4096
struct overlay {
uint16_t field1;
uint16_t field2;
uint8_t array1[ARY1_LEN];
};
static_assert(sizeof(struct overlay) <= TOTAL_BUFFER_LENGTH);
struct overlay* overlay = malloc(TOTAL_BUFFER_LENGTH);
That is, we use a data structure as an overlay to allow easy access to the part of the buffer that is currently being used.
We have a number of buffer formats, however, that also use the last few bytes of the buffer to store things like checksums. We currently use constructions like this:
struct overlay {
uint16_t field1;
uint16_t field2;
uint8_t array1[ARY1_LEN];
char reserved[TOTAL_BUFFER_LENGTH -
sizeof(uint16_t) - sizeof(uint16_t) -
(sizeof(uint8_t) * ARY1_LEN) -
sizeof(uint32_t)];
uint32_t crc;
};
As ugly as this looks for this simple data structure, it's an absolute monstrosity when the structure grows to have dozens of fields. It's also a maintainability nightmare, as adding or removing a structure field means that the size calculation for reserved must be updated at the same time.
When the end of the structure only contains one item (like a checksum), then we sometimes use a helper function for reading/writing the value. That keeps the data structure clean and maintainable, but it doesn't scale well when the end of the buffer has multiple fields.
It would help greatly if we could do something like this instead:
struct overlay {
uint16_t field1;
uint16_t field2;
uint8_t array1[ARY1_LEN];
char reserved[TOTAL_BUFFER_LENGTH -
offsetof(struct overlay, reserved) -
sizeof(uint32_t)];
uint32_t crc;
};
Unfortunately, offsetof only works on complete object types and since this is in the middle of the definition of struct overlay, that type isn't yet complete.
Is there a cleaner, more maintainable way to do this sort of thing? I essentially need a fixed-length structure with fields at the beginning and at the end, with the remaining space in the middle reserved/unused.
In your situation, I think I'd probably do things this way:
typedef struct overlay_head
{
uint16_t field1;
uint16_t field2;
uint8_t array1[ARY1_LEN];
} overlay_head;
typedef struct overlay_tail
{
uint32_t crc;
} overlay_tail;
enum { OVERLAY_RSVD = TOTAL_BUFFER_LENGTH - sizeof(overlay_head)
- sizeof(overlay_tail) };
typedef struct overlay
{
overlay_head h;
uint8_t reserved[OVERLAY_RSVD];
overlay_tail t;
} overlay;
You can then work almost as before, except that where you used to write p->field1
you now write p->h.field1, and where you used to write p->crc you now write p->t.crc.
Note that this handles arbitrarily large tail structures quite effectively, as long as the head and tail both fit inside the overall size.
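The layout can be verified at compile time so that a later change to the head or tail cannot silently shift the CRC. This is a minimal sketch, assuming a C11 compiler; the value chosen for ARY1_LEN is an assumption, since the original does not give one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOTAL_BUFFER_LENGTH 4096
#define ARY1_LEN 16   /* assumed value for illustration */

typedef struct overlay_head
{
    uint16_t field1;
    uint16_t field2;
    uint8_t  array1[ARY1_LEN];
} overlay_head;

typedef struct overlay_tail
{
    uint32_t crc;
} overlay_tail;

enum { OVERLAY_RSVD = TOTAL_BUFFER_LENGTH - sizeof(overlay_head)
                                          - sizeof(overlay_tail) };

typedef struct overlay
{
    overlay_head h;
    uint8_t      reserved[OVERLAY_RSVD];
    overlay_tail t;
} overlay;

/* Fail the build if padding changes the total size or moves the tail. */
static_assert(sizeof(overlay) == TOTAL_BUFFER_LENGTH,
              "overlay does not fill the buffer exactly");
static_assert(offsetof(overlay, t) == TOTAL_BUFFER_LENGTH - sizeof(overlay_tail),
              "tail is not at the end of the buffer");
```

If the head's size is not a multiple of the tail's alignment, the compiler may add padding before t and the first assertion will fail, which is the signal to adjust the reserved size.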
You could define a structure that simply has the buffer with a CRC field at the end:
struct checked_buffer {
char data[TOTAL_BUFFER_LENGTH - sizeof(uint32_t)];
uint32_t crc;
};
and then place your "overlays" on its data field. You're presumably already casting pointers to "convert" a raw buffer's char* into an overlay*, so it shouldn't be a big deal to cast from overlay* to checked_buffer* when you want to access the CRC field.
But if you want to have a field in a consistent position across a bunch of structures, it'd be easier to put it at the beginning of each structure. That way you can declare it directly in each structure without needing to do anything strange, and you don't need any pointer casts to access it.
How about that?
union a256
{
struct
{
int field_a;
int field_b;
char name[16];
//
int crcshadow;
};
struct
{
char buff[256-sizeof(int)];
int crc;
};
} ;
static_assert(offsetof(union a256, crcshadow) <= offsetof(union a256, crc), "data too big");
The first struct contains the data; the second defines the fixed size of the union.
I want the size of a C struct to be multiple of 16 bytes (16B/32B/48B/..).
It does not matter which size it gets to; it only needs to be multiple of 16 bytes.
How could I enforce the compiler to do that?
For Microsoft Visual C++ (#pragma pack only lowers member alignment and does not pad the size, so use the alignment declspec instead):
__declspec(align(16)) struct _some_struct
{
...
};
For GCC:
struct _some_struct { ... } __attribute__ ((aligned (16)));
Example:
#include <stdio.h>
struct test_t {
int x;
int y;
} __attribute__((aligned(16)));
int main()
{
printf("%zu\n", sizeof(struct test_t));
return 0;
}
Compiled with gcc -o main main.c, this outputs 16. Clang accepts the same attribute; other compilers need their own extension.
The size of a C struct depends on its members: their types and how many of them there are. There is really no standard way to force the compiler to make a struct's size a multiple of some value. Some compilers provide a pragma that lets you set the alignment boundary, but that is really a different thing. Some may also have a setting or pragma for exactly this.
However, if you insist on this, one method would be to allocate the struct dynamically and round the allocation size up to the next multiple of 16 bytes.
So if you had a struct like this.
struct _simpleStruct {
int iValueA;
int iValueB;
};
Then you could do something like the following.
{
struct _simpleStruct *pStruct = 0;
pStruct = malloc ((sizeof(*pStruct) + 15) / 16 * 16);
// use the pStruct for whatever
free(pStruct);
}
What this would do is round the size up to the next multiple of 16 bytes so far as you were concerned (a size that is already a multiple of 16 stays unchanged). However, the memory allocator may or may not give you a block that is actually that size. The block of memory may actually be larger than your request.
If you are going to do something special with this, for instance writing this struct to a file where you need to know the block size, then you would have to do the same calculation used in the malloc() rather than using the sizeof() operator to calculate the size of the struct.
So the next thing would be to write your own sizeof() operator using a macro such as:
#define SIZEOF16(x) ((sizeof(x) + 15) / 16 * 16)
As far as I know there is no dependable method for pulling the size of an allocated block from a pointer. Normally a pointer will have a memory allocation block that is used by the memory heap management functions that will contain various memory management information such as the allocated block size which may actually be larger than the requested amount of memory. However the format for this block and where it is located relative to the actual memory address provided will depend on the C compiler's run time.
This depends entirely on the compiler and other tools since alignment is not specified that deeply in the ISO C standard (it specifies that alignment may happen at the compiler's behest but does not go into detail as to how to enforce it).
You'll need to look into the implementation-specific stuff for your compiler toolchain. It may provide a #pragma pack (or align or some other thing) that you can add to your structure definition.
It may also provide this as a language extension. For example, gcc allows you to add attributes to a definition, one of which controls alignment:
struct mystruct { int val[7]; } __attribute__ ((aligned (16)));
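Since C11 the language also has a standard alignment specifier. This is a minimal sketch, assuming a C11 compiler: aligning the first member to 16 bytes raises the alignment of the whole struct, and because sizeof is always a multiple of the alignment, the size becomes a multiple of 16. The struct name padded16 is hypothetical.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

struct padded16 {
    alignas(16) int val[7];   /* 28 bytes of data when int is 4 bytes wide */
};

/* Both checks happen at compile time. */
static_assert(alignof(struct padded16) >= 16, "struct is not 16-byte aligned");
static_assert(sizeof(struct padded16) % 16 == 0, "size is not a multiple of 16");
```

With 4-byte int this typically gives a size of 32. Memory returned by malloc is not guaranteed to meet an extended alignment requirement, so heap objects of such a type are better obtained from aligned_alloc.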
You could perhaps do a double struct, wrapping your actual struct in a second one that can add padding:
struct payload {
int a; /*Your actual fields. */
float b;
char c;
double d;
};
struct payload_padded {
struct payload p;
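char padding[16 * ((sizeof (struct payload) + 15) / 16) - sizeof (struct payload)]; /* fills up to the next multiple of 16; assumes the size is not already one */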
};
Then you can work with the padded struct:
struct payload_padded a;
a.p.d = 43.3;
Of course, you can make use of the fact that the first member of a structure starts 0 bytes from where the structure starts, and treat a pointer to struct payload_padded as if it's a pointer to a struct payload (because it is):
double d_plus_2(const struct payload *p)
{
return p->d + 2;
}
/* ... */
struct payload_padded b;
const double dp2 = d_plus_2((struct payload *) &b);
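A union is another way to get the same effect without computing a padding length, and it also covers the case where the payload size is already a multiple of 16. This is a minimal sketch, assuming a C11 compiler; the union name payload_block is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct payload {
    int a;
    float b;
    char c;
    double d;
};

union payload_block {
    struct payload p;
    /* Sized to the smallest multiple of 16 that holds struct payload. */
    unsigned char bytes[16 * ((sizeof (struct payload) + 15) / 16)];
};

static_assert(sizeof(union payload_block) % 16 == 0,
              "size is not a multiple of 16");

/* Functions keep working on the plain payload type. */
static double d_plus_2(const struct payload *p)
{
    assert(p != NULL);
    return p->d + 2;
}
```

Passing &block.p to such a function needs no cast, because p is an ordinary member of the union.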
I am using a library that has a function that takes an array of structs. That struct and function have the following layout:
struct TwoInt32s
{
int32_t a;
int32_t b;
};
void write(struct TwoInt32s *buffer, int len);
My initial tests suggest that an array of such structs has the same memory layout as an array of int32_t so I can do something like this:
int32_t *buffer = malloc(2 * len * sizeof(int32_t));
/* fill in the buffer */
write((struct TwoInt32s*)buffer, len);
However I'm wondering if this is universally true or not. Using an array of int32_t greatly simplifies my code.
EDIT: I forgot the sizeof
From what I read, C guarantees a few things about struct padding:
members will NOT be reordered
padding will only be added between members with different alignments or at the end of the struct
a pointer to a struct points to the same memory location as a pointer to its first member
each member is aligned in a manner appropriate for its type
there may be unnamed holes in the struct as necessary to achieve alignment
From this I can extrapolate that a and b have no padding between them. However it's possible that the struct will have padding at the end. I doubt this since it's word-aligned on both 32 and 64 bit systems. Does anyone have additional information on this?
The implementation is free to pad structs - there may be unused bytes in between a and b. It is guaranteed that the first member isn't offset from the beginning of the struct though.
Typically you manage such layout with a compiler-specific pragma, e.g:
#pragma pack(push)
#pragma pack(1)
struct TwoInt32s
{
int32_t a;
int32_t b;
};
#pragma pack(pop)
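Whether padding is actually present can be checked at compile time instead of guessed. This is a minimal sketch, assuming a C11 compiler:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct TwoInt32s
{
    int32_t a;
    int32_t b;
};

/* If either assertion fails, the struct is not laid out like two
   adjacent int32_t values on this compiler. */
static_assert(offsetof(struct TwoInt32s, b) == sizeof(int32_t),
              "padding between a and b");
static_assert(sizeof(struct TwoInt32s) == 2 * sizeof(int32_t),
              "padding at the end of the struct");
```

These assertions only establish the layout. Whether the buffer may be accessed through both types is a separate question governed by the aliasing rules.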
malloc allocates bytes. Why did you choose "2*len"?
You could simply use "sizeof":
int32_t *buffer = malloc(len * sizeof(struct TwoInt32s));
/* fill in the buffer */
write((struct TwoInt32s*)buffer, len);
and as Erik mentioned, it would be a good practice to pack the struct.
It's safest to not cast, but convert -- i.e., create a new array and fill it with the values found in the struct, then kill the struct.
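The conversion amounts to one copy loop. This is a minimal sketch in the direction the question needs, building the struct array from a flat int32_t array just before calling the library; the helper name pairs_from_flat is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct TwoInt32s
{
    int32_t a;
    int32_t b;
};

/* Hypothetical helper: build len structs from 2 * len consecutive
   int32_t values. Returns NULL on allocation failure; the caller
   frees the result. */
static struct TwoInt32s *pairs_from_flat(const int32_t *flat, size_t len)
{
    assert(flat != NULL || len == 0);
    struct TwoInt32s *out = malloc((len ? len : 1) * sizeof *out);
    if (out == NULL)
        return NULL;
    for (size_t i = 0; i < len; i++) {
        out[i].a = flat[2 * i];
        out[i].b = flat[2 * i + 1];
    }
    return out;
}
```

The copy costs one extra pass over the data, but it makes no assumption about padding or aliasing.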
You could allocate structures but treat their members as a sort of virtual array:
struct TwoInt32s *buffer = malloc(len * sizeof *buffer);
#define BUFFER(i) (*((i)%2 ? &buffer[(i)/2].b : &buffer[(i)/2].a))
/* fill in the buffer, e.g. */
for (int i = 0; i < len * 2; i++)
BUFFER(i) = i;
Unfortunately, neither GCC nor Clang currently "get" this code.