Can we count on C structs having a well-behaved layout?

Can we predict how a C struct will be implemented by the compiler?
If I write the (very badly aligned) struct:
struct {
uint16_t a;
uint32_t b;
uint8_t c;
} s;
char *p = (char*)&s;
can I guarantee that p[6] is the same as s.c? Are the struct fields allocated in this most obvious and canonical way, so we can predict where each field will be in memory?
Edit: Will struct __attribute__ ((__packed__)) {...} s; get me this behavior in GCC?

No, you cannot. Don't do that.
You are guaranteed only the order of the members, and that the same compiler with the same settings will always produce the same layout.
If you need such a thing, consult your compiler's documentation for how to enable byte packing (an extension most compilers offer, but not part of standard C) and add the padding yourself.

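The fields have to be allocated in ascending order, but the compiler is free to insert padding between fields as it sees fit, so there's no guarantee of what value of n in p[n] will refer to s.c. On the other hand, you can obtain the correct offset with the offsetof macro from <stddef.h>. It takes a type and a member name rather than an object, so the struct needs a tag or typedef: offsetof(struct tag, c).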
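As a minimal sketch of that approach: the tag name `sample` and the helper `read_c_through_bytes` are made up for illustration, since the struct in the question is anonymous. The helper reads member c through a byte pointer using whatever offset the compiler actually chose.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The tag `sample` is an assumption; the struct in the question has no tag. */
struct sample {
    uint16_t a;
    uint32_t b;
    uint8_t  c;
};

/* The only offset the standard fixes: the first member is at offset 0. */
static_assert(offsetof(struct sample, a) == 0, "first member is at offset 0");

/* Read member c through a byte pointer, using the offset the compiler chose. */
uint8_t read_c_through_bytes(const struct sample *s)
{
    const unsigned char *p = (const unsigned char *)s;
    return p[offsetof(struct sample, c)];
}
```

Calling read_c_through_bytes(&s) should return s.c regardless of how much padding the compiler inserted, which is the portable replacement for hard-coding p[6].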

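Even with the __packed__ attribute, you may not get usable behavior everywhere because of architectural restrictions. GCC does place the uint32_t at offset 2 in a packed struct and generates code that copes with the unaligned access, but on a target where uint32_t requires 4-byte alignment, taking a pointer to that member and dereferencing it elsewhere can fault. Other compilers may handle packing differently.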
If you need to assume a particular alignment, put in a static check that will prevent the code compiling with a different alignment.
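One way such a static check might look, assuming a C11 compiler with the GCC/Clang packed attribute. The tag `wire_sample` is a hypothetical name, and the expected offsets are simply the layout the question hopes for. If the compiler lays the struct out differently, the build stops instead of silently reading the wrong byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* GCC/Clang extension; the tag `wire_sample` is hypothetical. */
struct __attribute__((__packed__)) wire_sample {
    uint16_t a;
    uint32_t b;
    uint8_t  c;
};

/* Compilation fails if the layout differs from what the code assumes. */
static_assert(offsetof(struct wire_sample, a) == 0, "a must be at byte 0");
static_assert(offsetof(struct wire_sample, b) == 2, "b must follow a directly");
static_assert(offsetof(struct wire_sample, c) == 6, "c must be at byte 6");
static_assert(sizeof(struct wire_sample) == 7, "no trailing padding expected");
```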

For a given version of a compiler on a given version of the operating system, and with the same build options: yes.
But don't!

See #pragma pack(packed) and #pragma pack(reset). It has the same impact as the GCC attribute __packed__ you mentioned.

Related

How can size of a structure be a non-multiple of 4?

I'm new to structures and was learning how to find the size of structures. I'm aware of how padding comes in to play in order to properly align the memory. From what I've understood, the alignment is done so that the size in memory comes out to be a multiple of 4.
I tried the following piece of code on GCC.
struct books{
short int number;
char name[3];
}book;
printf("%zu",sizeof(book)); /* %zu is the conversion for size_t */
Initially I had thought that the short int would occupy 2 bytes, followed by the character array starting at the third memory location from the beginning. The character array then would need a padding of 3 bytes which would give a size of 8. Something like this, where each word represents a byte in memory.
short short char char
char padding padding padding
However on running it gives a size of 6, which confuses me.
Any help would be appreciated, thanks!
Generally, padding is inserted to allow for aligned access of the internal elements of the structure, not to allow the entire structure to be a size of multiple words. Alignment is a compiler implementation issue, not a requirement of the C standard.
So the char elements, which are 3 bytes in total, need no alignment because they are byte elements.
It is preferred, though not always required, that the short element be aligned on a short boundary, which means an even address. By aligning it on a short boundary, the compiler can issue a single load-short instruction rather than having to load a word, mask, and then shift.
In this case, the padding is probably, but not necessarily, happening at the end rather than in the middle. You will have to write code to dump the address of the elements to determine where padding is taking place.
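A small sketch of such a dump, using offsetof rather than raw addresses. The function names are invented for illustration, and the numbers it prints depend on the compiler and target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

struct books {
    short int number;
    char name[3];
};

/* The total size is always a whole multiple of the struct's alignment. */
static_assert(sizeof(struct books) % _Alignof(struct books) == 0,
              "size is a multiple of the alignment");

/* Bytes of padding after the last member (hypothetical helper). */
size_t books_tail_padding(void)
{
    struct books b;
    return sizeof b - (offsetof(struct books, name) + sizeof b.name);
}

/* Print where each member lives and how much padding follows (hypothetical helper). */
void print_books_layout(void)
{
    struct books b;
    printf("number: offset %zu, size %zu\n",
           offsetof(struct books, number), sizeof b.number);
    printf("name:   offset %zu, size %zu\n",
           offsetof(struct books, name), sizeof b.name);
    printf("sizeof = %zu, alignment = %zu, tail padding = %zu\n",
           sizeof b, (size_t)_Alignof(struct books), books_tail_padding());
}
```

On a typical GCC target where short is 2 bytes, this would be expected to report name at offset 2 and a single byte of tail padding, which accounts for the size of 6.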
EDIT: As @Euguen Sh mentions, even if you discover the padding scheme that the compiler is using for the structure, a different version of the compiler could change it.
It is unwise to count on the padding scheme of the compiler. There are always methods to access the elements in such a way that you do not guess at alignments.
The sizeof() operator is used to allow you to see how much memory is used AND to know how much will be added to a ptr to the structure if that pointer is incremented by 1 (ptr++).
EDIT 2, Packing: Structures may be packed to prevent padding using the __packed__ attribute. When designing a structure, it is wise to use elements that naturally pack. This is especially important when sending data over a communications link. A carefully designed structure avoids the need for padding in the middle of the structure. A poorly designed structure which is then compiled with the __packed__ attribute may have internal elements that are not naturally aligned. One might do this to ensure that the structure will transmit across a wire as it was originally designed. This type of effort has diminished with the introduction of JSON for transmission of data over a wire.
#include <stdalign.h>
#include <assert.h>
The size of a struct is always divisible by the maximum alignment of the members (which must be a power of two).
If you have a struct with a char and a short, its alignment is 2 because the alignment of short is 2 (on typical platforms); a struct made only of chars has an alignment of 1.
There are multiple ways to manipulate the alignment:
alignas(4) char storage[4]; // placeholder name; aligned strictly enough to hold a 32-bit int
Packing is nonstandard, but available in most compilers (GCC, Clang, ...):
struct A {
char a;
short b;
};
struct __attribute__((packed)) B {
char a;
short b;
};
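/* Before C23, static_assert requires a message argument. */
static_assert(sizeof(struct A) == 4, "A is padded to 4 bytes");
static_assert(alignof(struct A) == 2, "A takes the alignment of short");
static_assert(sizeof(struct B) == 3, "B has no padding");
static_assert(alignof(struct B) == 1, "B is byte-aligned");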
Compilers usually follow the ABI of the target architecture.
The ABI defines the alignment of structures and primitive data types, which determines the padding needed and the size of structures. Because the alignment is a multiple of 4 on many architectures, the sizes of many structures are too.
Compilers may offer attributes or options for changing alignment more or less directly.
For example, gcc and clang offer: __attribute__ ((packed))

Misalignment of members in structures [duplicate]

In C, certain members of a structure sometimes end up at misaligned offsets, as in the case of this thread in the HPUX community.
In such a case, one is advised to use a zero-width bit-field to align the next (misaligned) member.
Under what circumstances does misalignment of structure members happen? Is it not the job of the compiler to align the offsets of members at word boundaries?
"Misalignment" of a structure member can only occur if the alignment requirements of the structure member are deliberately hidden. (Or if some implementation-specific mechanism is used to suppress alignment, such as gcc's packed attribute.)
For example, in the referenced problem, the issue is that there is a struct:
struct {
// ... stuff
int val;
unsigned char data[DATA_SIZE];
// ... more stuff
};
and the programmer attempts to use data as though it were a size_t:
*(size_t*)s->data
However, the programmer has declared data as unsigned char and the compiler therefore only guarantees that it is aligned for use as an unsigned char.
As it happens, data follows an int and is therefore also aligned for an int. On some architectures this would work, but on the target architecture a size_t is bigger than an int and requires a stricter alignment.
Obviously the compiler cannot know that you intend to use a structure member as though it were some other type. If you do that and compile for an architecture which requires proper alignment, you are likely to experience problems.
The referenced thread suggests inserting a zero-length size_t bit-field before the declaration of the unsigned char array in order to force the array to be aligned for size_t. While that solution may work on the target architecture, it is not portable and should not be used in portable code. There is no guarantee that a 0-length bit-field will occupy 0 bits, nor is there any guarantee that a bit-field based on size_t will actually be stored in a size_t or be appropriately aligned for any non bit-field use.
A better solution would be to use an anonymous union:
// ...
int val;
union {
size_t dummy;
unsigned char data[DATA_SIZE];
};
// ...
With C11, you can specify a minimum alignment explicitly:
// ...
int val;
_Alignas(size_t) unsigned char data[DATA_SIZE];
// ...
In this case, if you #include <stdalign.h>, you can spell _Alignas in a way which will also work with C++11:
int val;
alignas(size_t) unsigned char data[DATA_SIZE];
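Here is a compilable sketch of the alignas variant with a compile-time check. The tag `record`, the DATA_SIZE value, and the helper `record_first_word` are assumptions added for illustration. The helper also shows an alternative that does not depend on alignment at all: copying the bytes out with memcpy instead of casting the pointer.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <string.h>

#define DATA_SIZE 16  /* assumed value; the original does not give one */

struct record {       /* hypothetical tag; only val and data come from the thread */
    int val;
    alignas(size_t) unsigned char data[DATA_SIZE];
};

/* Fail the build if data could ever land on an address unsuitable for size_t. */
static_assert(offsetof(struct record, data) % alignof(size_t) == 0,
              "data must sit at a size_t-aligned offset");
static_assert(alignof(struct record) >= alignof(size_t),
              "the struct itself must be aligned for size_t");

/* Alignment-independent alternative: copy the bytes instead of casting. */
size_t record_first_word(const struct record *r)
{
    size_t v;
    memcpy(&v, r->data, sizeof v);
    return v;
}
```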
Q: Why does misalignment happen? Is it not the job of the compiler to align the offsets of members at word boundaries?
You are probably aware that structure fields are aligned to specific boundaries to improve performance. A properly aligned field may require only a single memory fetch by the CPU, whereas a misaligned field may require two or more, and some architectures do not permit the unaligned access at all.
As you indicated, it is the compiler's job to align structure fields for the fastest CPU access, unless the programmer overrides the compiler's default behavior.
Then the question might be: why would the programmer override the compiler's default alignment of structure fields?
One example of why a programmer would want to override the default alignment is when sending a structure 'over the wire' to another computer. Generally, a programmer wants to pack as much data as possible into the fewest number of bytes.
Hence, the programmer will disable the default alignment when structure density is more important than the CPU performance of accessing structure fields.

typecast array to struct in c

I have a structure like this
struct packet
{
int seqnum;
char type[1];
float time1;
float pri;
float time2;
unsigned char data[512];
};
I am receiving packet in an array
char buf[529];
I want to extract seqnum, data, and everything else separately. Does the following typecast work? It gives me junk values.
struct packet *pkt;
pkt=(struct packet *)buf;
printf(" %d",pkt->seqnum);
No, that likely won't work and is generally a bad and broken way of doing this.
You must use compiler-specific extensions to make sure there's no invisible padding between your struct members, for something like that to work. With gcc, for instance, you do this using the __attribute__() syntax.
It is, thus, not a portable idea.
It's much better to be explicit about it, and unpack each field. This also gives you a chance to have a well-defined endianness in your network protocol, which is generally a good idea for interoperability's sake.
No, that isn't generally valid code. You should create the struct first and then memcpy the data into it:
struct packet p;
memcpy(&p.seqnum, buf + 0, 4);
memcpy(&p.type[0], buf + 4, 1);
memcpy(&p.time1, buf + 5, 4);
And so forth.
You must take great care to get the type sizes and endianness right.
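To make the byte-order point concrete, here is one possible field-by-field decoder. It is only a sketch under stated assumptions: the wire format is taken to be the six fields back to back with no padding (4 + 1 + 4 + 4 + 4 + 512 = 529 bytes), multi-byte fields are assumed to be big-endian, and floats are assumed to be 32-bit IEEE 754. None of that is confirmed by the question. The helper names are invented.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct packet {
    int seqnum;
    char type[1];
    float time1;
    float pri;
    float time2;
    unsigned char data[512];
};

static_assert(sizeof(float) == 4, "sketch assumes a 32-bit float");

/* Assemble a 32-bit value from four bytes, most significant byte first. */
static uint32_t load_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Reinterpret a 32-bit pattern as a float without any pointer cast. */
static float load_be_float(const unsigned char *p)
{
    uint32_t bits = load_be32(p);
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Decode a 529-byte wire buffer into the struct, one field at a time.
   Offsets assume a packed wire format (an assumption, see above). */
void packet_unpack(struct packet *out, const unsigned char *buf)
{
    out->seqnum  = (int)load_be32(buf + 0);  /* values above INT_MAX convert in an implementation-defined way */
    out->type[0] = (char)buf[4];
    out->time1   = load_be_float(buf + 5);
    out->pri     = load_be_float(buf + 9);
    out->time2   = load_be_float(buf + 13);
    memcpy(out->data, buf + 17, sizeof out->data);
}
```

Because every field is read at an explicit offset, padding inside struct packet no longer matters, and changing the assumed byte order only means swapping the loader.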
First of all, you cannot know in advance where the compiler will insert padding bytes in your structure (to satisfy alignment requirements and keep access fast), since this is platform-dependent, unless, of course, you only ever build the app on your own platform.
Anyway, in your case it seems like you are getting data from somewhere (network?) and it is highly probable that the data has been compacted (no padding bytes between fields).
If you really want to typecast your array to a struct pointer, you can still tell the compiler to remove the padding bytes it might add. Note that this depends on the compiler you use and is not standard C. With gcc, you can add this attribute at the end of your structure definition:
struct my_struct {
int blah;
/* Blah ... */
} __attribute__((packed));
Note that it will affect the performance of member access, copying, and so on.
Unless you have a very good reason to do so, don't ever use __attribute__((packed))!
The other solution, which is much more advisable, is to do the parsing yourself. You just allocate an appropriate structure and fill its fields by picking the right bytes out of your buffer. A sequence of memcpy calls is likely to do the trick here (see Kerrek's answer).

How do I determine the memory layout of a structure?

Suppose I have the following structure (in C):
struct my_struct {
int foo;
float bar;
char *baz;
};
If I now have a variable, say
struct my_struct a_struct;
How can I find out how the fields of that structure are going to be laid out in memory? In other words, I need to know what the addresses of a_struct.foo, a_struct.bar and a_struct.baz are going to be. And I cannot do that programmatically, because I am actually cross-compiling to another platform.
CLARIFICATION
Thanks for the answers so far, but I cannot do this programmatically (i.e. with the offsetof macro, or with a small test program) because I am cross-compiling and I need to know how the fields are going to be aligned on the target platform. I know this is implementation-dependent, that's the whole point of my question. I am using GCC to compile, targeting an ARM architecture.
What I need in the end is to be able to dump the memory from the target platform and parse it with other tools, such as Python's struct library. But for that I need to know how the fields were laid out.
In general, this is implementation specific. It depends on things like the compiler, compiler settings, the platform you are compiling on, word-size, etc. Here's a previous SO thread on the topic: C struct memory layout?
If you are cross-compiling, I'd imagine the specific layout will be different depending on which platform you compile for. I'd consult references for your compiler and platform.
There's a program called pahole (Poke-A-Hole) in the dwarves package that will produce a report showing the structures of your program along with markers showing where and how large padding is.
I think you have two options.
The first one is to use __attribute__((packed)) after the struct declaration. This will ensure that each member will be allocated exactly the amount of memory that its type requires.
The other one is to examine your structure and use the alignment rules (n-byte basic type variable has to be n-byte aligned) to figure out the layout.
In your example, assuming a 32-bit ARM target where int, float, and pointers are each 4 bytes, each member takes 4 bytes in either case and the structure occupies 12 bytes in memory.
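The alignment rule can be turned into a small calculation that runs on the host, with the target's sizes and alignments typed in by hand. This is only a sketch of the usual natural-alignment scheme: `member_info` and `compute_layout` are made-up names, and the numbers you feed it have to come from the target's ABI documentation.

```c
#include <assert.h>
#include <stddef.h>

/* Size and alignment of one member, as documented for the target ABI. */
struct member_info {
    size_t size;
    size_t align;   /* must be a power of two */
};

/* Round n up to the next multiple of align. */
static size_t round_up(size_t n, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (n + align - 1) / align * align;
}

/* Fill offsets[] for each member and return the total struct size,
   assuming each member is placed at its natural alignment. */
size_t compute_layout(const struct member_info *m, size_t count, size_t *offsets)
{
    size_t pos = 0;
    size_t max_align = 1;
    for (size_t i = 0; i < count; i++) {
        pos = round_up(pos, m[i].align);   /* padding before the member */
        offsets[i] = pos;
        pos += m[i].size;
        if (m[i].align > max_align)
            max_align = m[i].align;
    }
    return round_up(pos, max_align);       /* padding after the last member */
}
```

With three members of size 4 and alignment 4 (the assumed 32-bit ARM values for int, float, and char *), this yields offsets 0, 4, 8 and a total size of 12.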
One hacky way to see the memory view of what's inside it would be to cast a struct pointer to a char pointer, then print out all the chars, something like:
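struct my_struct s;
s.foo = INT_MAX; /* from <limits.h> */
s.bar = 1.0f;
s.baz = "hello";
for (size_t i = 0; i < sizeof(s); i++) {
/* unsigned char avoids sign extension when the byte is printed */
const unsigned char *c = ((const unsigned char *)&s) + i;
printf("byte %zu: 0x%02x\n", i, *c);
}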
That doesn't explicitly show you the boundaries, you'd have to infer that from the dump.
Comments made by others about packing still apply; you'll also want to use explicitly sized types like uint32_t instead of unsigned int.

Reading binary data into memory structures, weird effects

I've been at this for a while now and it really puzzles me. This is a very distilled code fragment that reproduces the problem:
uint8_t dataz[] = { 1, 2, 3, 4, 5, 6 };
struct mystruct {
uint8_t dummy1[1];
uint16_t very_important_data;
uint8_t dummy2[3];
} *mystruct = (void *) dataz;
printf("%x\n", mystruct -> very_important_data);
What do you expect the output to be? I'd say 302, but no: it gives me 403. The same as if using this structure:
struct mystruct {
uint8_t dummy1[2];
uint16_t very_important_data;
uint8_t dummy2[2];
} *mystruct = (void *) dataz;
How would you explain that?
As others have mentioned, unless your compiler is set to byte alignment, your structure is likely to have "holes" in it. The compiler does this because it speeds up memory access.
If you're using gcc, there is a "packed" attribute which will cause the struct to be byte-aligned, and so remove the "holes":
struct __attribute((__packed__)) mystruct {
uint8_t dummy1[1];
uint16_t very_important_data;
uint8_t dummy2[3];
} *mystruct = (void *) dataz;
However, this will not necessarily fix the problem. The 16-bit value may not be set to what you think it should be, depending on the endianness of your machine. You will have to swap the bytes in any multi-byte integers in the struct. There is no general function to do this, as it would require information on the layout of the structure at run-time, which C does not provide.
Mapping structures to binary data is generally non-portable, even if you get it to work on your machine, right now.
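A portable alternative is to skip the struct overlay and assemble the value from individual bytes with an explicit byte order. The helper names below are invented for illustration. Which of the two orders is correct depends on how the data was produced, which the question does not say.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a 16-bit value stored least significant byte first. */
uint16_t load_le16(const uint8_t *data, size_t offset)
{
    assert(data != NULL);
    return (uint16_t)(data[offset] | ((uint16_t)data[offset + 1] << 8));
}

/* Read a 16-bit value stored most significant byte first. */
uint16_t load_be16(const uint8_t *data, size_t offset)
{
    assert(data != NULL);
    return (uint16_t)(((uint16_t)data[offset] << 8) | data[offset + 1]);
}
```

With the dataz array from the question, load_le16(dataz, 1) gives 0x302 on any machine, independent of both struct padding and host endianness.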
Packing. There is no guarantee how members of a struct are physically located inside the struct. They may be word-aligned, leaving gaps.
There are pragmas in some versions of C to explicitly control packing.
Most likely, the compiler has added a byte of padding between dummy1 and very_important_data to align very_important_data on a 16-bit boundary.
In general, the alignment and padding of fields in a struct is implementation-dependent, so you shouldn't rely on it. If you absolutely need a particular behavior, many compilers offer #pragma or other directives to control this. Check your compiler's documentation.
It depends on the compiler, but usually a compiler aligns each member to its natural alignment. In the case you ran into, very_important_data is a uint16_t which probably has a natural alignment of 2 bytes.

Resources