How do I determine the memory layout of a structure?

How do I determine the memory layout of a structure? - c

Suppose I have the following structure (in C):
struct my_struct {
int foo;
float bar;
char *baz;
};
If I now have a variable, say
struct my_struct a_struct;
How can I find out how the fields of that structure are going to be laid out in memory? In other words, I need to know what the address of a_struct.foo, of a_struct.bar and a_struct.baz are going to be. And I cannot do that programatically, because I am actually cross-compiling to another platform.
CLARIFICATION
Thanks the answers so far, but I cannot do this programatically (i.e. with the offsetof macro, or with a small test program) because I am cross-compiling and I need to know how the fields are going to be aligned on the target platform. I know this is implementation-dependent, that's the whole point of my question. I am using GCC to compile, targeting an ARM architecture.
What I need in the end is to be able to dump the memory from the target platform and parse it with other tools, such as Python's struct library. But for that I need to know how the fields were laid out.

In general, this is implementation specific. It depends on things like the compiler, compiler settings, the platform you are compiling on, word-size, etc. Here's a previous SO thread on the topic: C struct memory layout?
If you are cross-compiling, I'd imagine the specific layout will be different depending on which platform you compile for. I'd consult references for your compiler and platform.

There's a program called pahole (Poke-A-Hole) in the dwarves package that will produce a report showing the structures of your program along with markers showing where and how large padding is.

I think you have two options.
The first one is to use __attribute__((packed)) after the struct declaration. This will ensure that each member will be allocated exactly the amount of memory that its type requires.
The other one is to examine your structure and use the alignment rules (n-byte basic type variable has to be n-byte aligned) to figure out the layout.
In your example, in either case each member variable will take 4 bytes and the structure will occupe 12 bytes in memory.

One hacky way to see the memory view of what's inside it would be to cast a struct pointer to a char pointer, then print out all the chars, something like:
struct my_struct s;
s.foo = MAX_INT;
s.bar = 1.0;
s.baz = "hello";
for (int i = 0; i < sizeof(s); i++) {
char *c = ((char*)&s) + i;
printf("byte %d: 0x%02x\n", i, *c);
}
That doesn't explicitly show you the boundaries, you'd have to infer that from the dump.
Comments made by others about packing still apply; you'll also want to use explicitly sized types like uint32 instead of unsigned int

Related

What is there to be gained by deterministic field ordering in the memory layout?

Members of a structure are allocated within the structure in the order of their appearance in the declaration and have ascending addresses.
I am faced with the following dilemma: when I need to declare a structure, do I
(1) group the fields logically, or
(2) in decreasing size order, to save RAM and ROM size?
Here is an example, where the largest data member should be at the top, but also should be grouped with the logically-connected colour:
struct pixel{
int posX;
int posY;
tLargeType ColourSpaceSecretFormula;
char colourRGB[3];
}
The padding of a structure is non-deterministic (that is, is implementation-dependent), so we cannot reliably do pointer arithmetic on structure elements (and we shouldn't: imagine someone reordering the fields to his liking: BOOM, the whole code stops working).
-fpack-structs solves this in gcc, but bears other limitations, so let's leave compiler options out of the question.
On the other hand, code should be, above all, readable. Micro optimizations are to be avoided at all cost.
So, I wonder, why are structures' members ordered by the standard, making me worry about the micro-optimization of ordering struct member in a specific way?

The compiler is limited by several traditional and practical limitations.
The pointer to the struct after a cast (the standard calls it "suitably converted") will be equal to the pointer to the first element of the struct. This has often been used to implement overloading of messages in message passing. In that case a struct has the first element that describes what type and size the rest of the struct is.
The last element can be a dynamically resized array. Even before official language support this has been often used in practice. You allocate sizeof(struct) + length of extra data and can access the last element as a normal array with as many elements that you allocated.
Those two things force the compiler to have the first and last elements in the struct in the same order as they are declared.
Another practical requirement is that every compilation must order the struct members the same way. A smart compiler could make a decision that since it sees that some struct members are always accessed close to each other they could be reordered in a way that makes them end up in a cache line. This optimization is of course impossible in C because structs often define an API between different compilation units and we can't just reorder things differently on different compilations.
The best we could do given the limitations is to define some kind of packing order in the ABI to minimize alignment waste that doesn't touch the first or last element in the struct, but it would be complex, error prone and probably wouldn't buy much.

If you couldn't rely on the ordering, then it would be much harder to write low-level code which maps structures onto things like hardware registers, network packets, external file formats, pixel buffers, etc.
Also, some code use a trick where it assumes that the last member of the structure is the highest-addressed in memory to signify the start of a much larger data block (of unknown size at compile time).

Reordering fields of structures can sometime yield good gains in data size and often also in code size, especially in 64 bit memory model. Here an example to illustrate (assuming common alignment rules):
struct list {
int len;
char *string;
bool isUtf;
};
will take 12 bytes in 32 bit but 24 in 64 bit mode.
struct list {
char *string;
int len;
bool isUtf;
};
will take 12 bytes in 32 bit but only 16 in 64 bit mode.
If you have an array of these structures you gain 50% in the data but also in code size, as indexing on a power of 2 is simpler than on other sizes.
If your structure is a singleton or not frequent, there's not much point in reordering the fields. If it is used a lot, it's a point to look at.
As for the other point of your question. Why doesn't the compiler do this reordering of fields, it is because in that case, it would be difficult to implement unions of structures that use a common pattern. Like for example.
struct header {
enum type;
int len;
};
struct a {
enum type;
int len;
bool whatever1;
};
struct b {
enum type;
int len;
long whatever2;
long whatever4;
};
struct c {
enum type;
int len;
float fl;
};
union u {
struct h header;
struct a a;
struct b b;
struct c c;
};
If the compiler rearranged the fields, this construct would be much more inconvenient, as there would be no guarantee that the type and len fields were identical when accessing them via the different structs included in the union.
If I remember correctly the standard even mandates this behaviour.

typecast array to struct in c

I have a structure like this
struct packet
{
int seqnum;
char type[1];
float time1;
float pri;
float time2;
unsigned char data[512];
}
I am receiving packet in an array
char buf[529];
I want to take the seqnum,data everything separately.Does the following typecast work.. It is giving junk value for me.
struct packet *pkt;
pkt=(struct packet *)buf;
printf(" %d",pkt->seqnum)

No, that likely won't work and is generally a bad and broken way of doing this.
You must use compiler-specific extensions to make sure there's no invisible padding between your struct members, for something like that to work. With gcc, for instance, you do this using the __attribute__() syntax.
It is, thus, not a portable idea.
It's much better to be explicit about it, and unpack each field. This also gives you a chance to have a well-defined endianness in your network protocol, which is generally a good idea for interoperability's sake.

No, that isn't generally valid code. You should make the struct first and then memcopy stuff into it:
packet p;
memcpy(&p.seqnum, buf + 0, 4);
memcpy(&p.type[0], buf + 4, 1);
memcpy(&p.time1, buf + 5, 4);
And so forth.
You must take great care to get the type sizes and endianness right.

First of all, you cannot know in advance where the compiler will insert padding bytes in your structure for performance optimization (cache line alignment, integer alignment etc) since this is platform-dependent. Except, of course, if you are considering building the app only on your platform.
Anyway, in your case it seems like you are getting data from somewhere (network ?) and it is highly probable that the data has been compacted (no padding bytes between fields).
If you really want to typecast your array to a struct pointer, you can still tell the compiler to remove the padding bytes it might add. Note that this depends on the compiler you use and is not a standard C implementation. With gcc, you might add this statement at the end of your structure definition :
struct my_struct {
int blah;
/* Blah ... */
} __attribute__((packed));
Note that it will affect the performance for member access, copy etc ...
Unless you have a very good reason to do so, don't ever use the __attribute__((packed)) thing !
The other solution, which is much more advisable is to make the parsing on your own. You just allocate an appropriate structure and fill its fields by seeking the good information from your buffer. A sequence of memcpy instructions is likely to do the trick here (see Kerrek's answer)

Can we count on C structs well behaved format?

Can we predict how a C struct will be implemented by the compiler?
If I write the (very badly aligned) struct:
struct {
uint16_t a;
uint32_t b;
uint8_t c;
} s;
char *p = (char*)&s;
can I guarantee that p[6] is the same as s.c? Are the struct fields allocated in this most obvious and canonical way, so we can predict where each field will be in memory?
Edit: Will struct __attribute__ ((__packed__)) {...} s; get me this behavior in GCC?

No you cannot. Don't do that.
You are guaranteed only the order and the same compiler will always do the same layout.
If you need such a thing consult your compiler's documentation for how to enable byte packing (always available) and pad yourself.

The fields have to be allocated in ascending order, but the compiler is free to insert padding between fields as it sees fit, so there's no guarantee of what value of n in p[n] will refer to s.c. OTOH, you can obtain the correct offset using offsetof(s,c).

Even with the __packed__ attribute, it may be impossible to get this alignment due to architectural restrictions. For example, if uint32_t requires 4-byte alignment, it will be at offset 4 even with __packed__.
If you need to assume a particular alignment, put in a static check that will prevent the code compiling with a different alignment.

For a given version of a compiler on a given version of the operating system - and with the same build options = yes
But don't !

See #pragma pack(packed) and #pragma pack(reset). It has the same impact as the GCC attribute __packed__ you mentioned.

Accessing array as a struct *

This is one of those I think this should work, but it's best to check questions. It compiles and works fine on my machine.
Is this guaranteed to do what I expect (i.e. allow me to access the first few elements of the array with a guarantee that the layout, alignment, padding etc of the struct is the same as the array)?
struct thingStruct
{
int a;
int b;
int c;
};
void f()
{
int thingsArray[5];
struct thingStruct *thingsStruct = (struct thingStruct *)&thingsArray[0];
thingsArray[0] = 100;
thingsArray[1] = 200;
thingsArray[2] = 300;
printf("%d", thingsStruct->a);
printf("%d", thingsStruct->b);
printf("%d", thingsStruct->c);
}
EDIT: Why on earth would I want to do something like this? I have an array which I'm mmapping to a file. I'm treating the first part of the array as a 'header', which stores various pieces of information about the array, and the rest of it I'm treating as a normal array. If I point the struct to the start of the array I can access the pieces of header data as struct members, which is more readable. All the members in the struct would be of the same type as the array.

While I have seen this done frequently, you cannot (meaning it is not legal, standard C) make assumptions about the binary layout of a structure, as it may have padding between fields.
This is explained in the comp.lang.c faq: http://c-faq.com/struct/padding.htmls

Although it's likely to work in most places, it's still a bit iffy. If you want to give symbolic names to parts of the header, why not just do:
enum { HEADER_A, HEADER_B, HEADER_C };
/* ... */.
printf("%d", thingsArray[HEADER_A]);
printf("%d", thingsArray[HEADER_B]);
printf("%d", thingsArray[HEADER_C]);

As Evan commented on the question, this will probably work in most cases (again, probably best if you use #pragma pack to ensure their is no padding) assuming all the types in your struct are the same type as your array. Given the rules of C, this is legal.
My question to you is "why?" This isn't a particularly safe thing to do. If a float gets thrown into the middle of the struct, this all falls apart. Why not just use the struct directly? This really ins't a technique that I'd recommend in most cases.

Another solution for representing a header and the rest of file data is using a structure like this:
struct header {
long headerData1;
int headerData2;
int headerData3;
int fileData[ 1 ]; // <- data begin here
};
Then you allocate the memory block with a file contents and cast it as struct header *myFileHeader (or map the memory block on a file) and access all your file data with
myFileHeader->fileData[ position ]
for arbitrary big position. The language imposes no restriction on the index value, so it's only your responsibility to keep your arbitrary big posistion within the actual size of the memory block you allocated (or the mapped file's size).
One more important note: apart from switching off the struct members padding, which has been already described by others, you should carefully choose data types for the header members, so that they fit the actual file data layout despite compiler you use (say, int won't change from 32 to 64 bits...)

Is a struct of pointers guaranteed to be represented without padding bits?

I have a linked list, which stores groups of settings for my application:
typedef struct settings {
struct settings* next;
char* name;
char* title;
char* desc;
char* bkfolder;
char* srclist;
char* arcall;
char* incfold;
} settings_row;
settings_row* first_profile = { 0 };
#define SETTINGS_PER_ROW 7
When I load values into this structure, I don't want to have to name all the elements. I would rather treat it like a named array -- the values are loaded in order from a file and placed incrementally into the struct. Then, when I need to use the values, I access them by name.
//putting values incrementally into the struct
void read_settings_file(settings_row* settings){
char* field = settings + sizeof(void*);
int i = 0;
while(read_value_into(field[i]) && i++ < SETTINGS_PER_ROW);
}
//accessing components by name
void settings_info(settings_row* settings){
printf("Settings 'profile': %s\n", settings.title);
printf("Description: %s\n", settings.desc);
printf("Folder to backup to: %s\n", settings.bkfolder);
}
But I wonder, since these are all pointers (and there will only ever be pointers in this struct), will the compiler add padding to any of these values? Are they guaranteed to be in this order, and have nothing between the values? Will my approach work sometimes, but fail intermittently?
edit for clarification
I realize that the compiler can pad any values of a struct--but given the nature of the struct (a struct of pointers) I thought this might not be a problem. Since the most efficient way for a 32 bit processor to address data is in 32 bit chunks, this is how the compiler pads values in a struct (ie. an int, short, int in a struct will add 2 bytes of padding after the short, to make it into a 32 bit chunk, and align the next int to the next 32 bit chunk). But since a 32 bit processor uses 32 bit addresses (and a 64 bit processor uses 64 bit addresses (I think)), would padding be totally unnecessary since all of the values of the struct (addresses, which are efficient by their very nature) are in ideal 32 bit chunks?
I am hoping some memory-representation / compiler-behavior guru can come shed some light on whether a compiler would ever have a reason to pad these values

Under POSIX rules, all pointers (both function pointers and data pointers) are all required to be the same size; under just ISO C, all data pointers are convertible to 'void *' and back without loss of information (but function pointers need not be convertible to 'void *' without loss of information, nor vice versa).
Therefore, if written correctly, your code would work. It isn't written quite correctly, though! Consider:
void read_settings_file(settings_row* settings)
{
char* field = settings + sizeof(void*);
int i = 0;
while(read_value_into(field[i]) && i++ < SETTINGS_PER_ROW)
;
}
Let's assume you're using a 32-bit machine with 8-bit characters; the argument is not all that significantly different if you're using 64-bit machines. The assignment to 'field' is all wrong, because settings + 4 is a pointer to the 5th element (counting from 0) of an array of 'settings_row' structures. What you need to write is:
void read_settings_file(settings_row* settings)
{
char* field = (char *)settings + sizeof(void*);
int i = 0;
while(read_value_into(field[i]) && i++ < SETTINGS_PER_ROW)
;
}
The cast before addition is crucial!
C Standard (ISO/IEC 9899:1999):
6.3.2.3 Pointers
A pointer to void may be converted to or from a pointer to any incomplete or object
type. A pointer to any incomplete or object type may be converted to a pointer to void
and back again; the result shall compare equal to the original pointer.
[...]
A pointer to a function of one type may be converted to a pointer to a function of another
type and back again; the result shall compare equal to the original pointer. If a converted
pointer is used to call a function whose type is not compatible with the pointed-to type,
the behavior is undefined.

In many cases pointers are natural word sizes, so the compiler is unlikely to pad each member, but that doesn't make it a good idea. If you want to treat it like an array you should use an array.
I'm thinking out loud here so there's probably many mistakes but perhaps you could try this approach:
enum
{
kName = 0,
kTitle,
kDesc,
kBkFolder,
kSrcList,
kArcAll,
kIncFold,
kSettingsCount
};
typedef struct settings {
struct settings* next;
char *settingsdata[kSettingsCount];
} settings_row;
Set the data:
settings_row myRow;
myRow.settingsData[kName] = "Bob";
myRow.settingsData[kDescription] = "Hurrrrr";
...
Reading the data:
void read_settings_file(settings_row* settings){
char** field = settings->settingsData;
int i = 0;
while(read_value_into(field[i]) && i++ < SETTINGS_PER_ROW);
}

It's not guaranteed by the C standard. I've a sneaking suspicion, that I don't have time to check right now either way, that it guarantees no padding between the char* fields, i.e. that consecutive fields of the same type in a struct are guaranteed to be layout-compatible with an array of that type. But even if so, you're on your own between the settings* and the first char*, and also between the last char* and the end of the struct. But you could use offsetof to deal with the first issue, and I don't think the second affects your current code.
However, what you want is almost certainly guaranteed by your compiler, which somewhere in its documentation will set out its rules for struct layout, and will almost certainly say that all pointers to data are word sized, and that a struct can be the size of 8 words without additional padding. But if you want to write highly portable code, you have to use only the guarantees in the standard.
The order of fields is guaranteed. I also don't think you'll see intermittent failure - AFAIK the offset of each field in that struct will be consistent for a given implementation (meaning the combination of compiler and platform).
You could assert that sizeof(settings*) == sizeof(char*) and sizeof(settings_row) == sizeof(char*)*8. If both those hold, there is no room for any padding in the struct, since fields are not allowed to "overlap". If you ever hit a platform where they don't hold, you'll find out.
Even so, if you want an array, I'd be inclined to say use an array, with inline accessor functions or macros to get the individual fields. Whether your trick works or not, it's even easier not to think about it at all.

Although not a duplicate, this probably answers your question:
Why isn't sizeof for a struct equal to the sum of sizeof of each member?
It's not uncommon for applications to write an entire struct into a file and read it back out again. But this suffers from the possibility that one day the file will need to be read back on another platform, or by another version of the compiler that packs the struct differently. (Although this can be dealt with by specially-written code that understands the original packing format).

Technically, you can rely only on the order; the compiler could insert padding. If different pointers were of different size, or if the pointer size wasn't a natural word size, it might insert padding.
Practically speaking, you could get away with it. I wouldn't recommend it; it's a bad, dirty trick.
You could achieve your goal with another level of indirection (what doesn't that solve?), or by using a temporary array initialized to point to the various members of the structure.

It's not guaranteed, but it will work fine in most cases. It won't be intermittent, it will either work or not work on a particular platform with a particular build. Since you're using all pointers, most compilers won't mess with any padding.
Also, if you wanted to be safer, you could make it a union.

You can't do that the way you are trying. The compiler is allowed to pad any and all members of the struct. I do not believe it is allowed to reorder the fields.
Most compilers have an attribute that can be applied to the struct to pack it (ie to turn it into a collection of tightly packed storage with no padding), but the downside is that this generally affects performance. The packed flag will probably allow you to use the struct the way you want, but it may not be portable across various platforms.
Padding is designed to make field access as efficient as possible on the target architecture. It's best not to fight it unless you have to (ie, the struct goes to a disk or over a network.)

It seems to me that this approach creates more problems than it solves.
When you read this code six months from now, will you still be aware of all the subtleties of how the compiler pads a struct?
Would someone else, who didn't write the code?
If you must use the struct, use it in the canonical way and just write a function which
assigns values to each field separately.
You could also use an array and create macros to give field names to indices.
If you get too "clever" about optimizing your code, you will end up with slower code anyway, since the compiler won't be able to optimize it as well.