Is a struct of pointers guaranteed to be represented without padding bits? - c

I have a linked list, which stores groups of settings for my application:
typedef struct settings {
struct settings* next;
char* name;
char* title;
char* desc;
char* bkfolder;
char* srclist;
char* arcall;
char* incfold;
} settings_row;
settings_row* first_profile = { 0 };
#define SETTINGS_PER_ROW 7
When I load values into this structure, I don't want to have to name all the elements. I would rather treat it like a named array -- the values are loaded in order from a file and placed incrementally into the struct. Then, when I need to use the values, I access them by name.
//putting values incrementally into the struct
void read_settings_file(settings_row* settings){
char* field = settings + sizeof(void*);
int i = 0;
while(read_value_into(field[i]) && i++ < SETTINGS_PER_ROW);
}
//accessing components by name
void settings_info(settings_row* settings){
printf("Settings 'profile': %s\n", settings->title);
printf("Description: %s\n", settings->desc);
printf("Folder to backup to: %s\n", settings->bkfolder);
}
But I wonder, since these are all pointers (and there will only ever be pointers in this struct), will the compiler add padding to any of these values? Are they guaranteed to be in this order, and have nothing between the values? Will my approach work sometimes, but fail intermittently?
edit for clarification
I realize that the compiler can pad any values of a struct--but given the nature of the struct (a struct of pointers) I thought this might not be a problem. Since the most efficient way for a 32 bit processor to address data is in 32 bit chunks, this is how the compiler pads values in a struct (ie. an int, short, int in a struct will add 2 bytes of padding after the short, to make it into a 32 bit chunk, and align the next int to the next 32 bit chunk). But since a 32 bit processor uses 32 bit addresses (and a 64 bit processor uses 64 bit addresses (I think)), would padding be totally unnecessary since all of the values of the struct (addresses, which are efficient by their very nature) are in ideal 32 bit chunks?
I am hoping some memory-representation / compiler-behavior guru can come shed some light on whether a compiler would ever have a reason to pad these values

Under POSIX rules, all pointers (both function pointers and data pointers) are required to be the same size; under ISO C alone, all data pointers are convertible to 'void *' and back without loss of information (but function pointers need not be convertible to 'void *' without loss of information, nor vice versa).
Therefore, if written correctly, your code would work. It isn't written quite correctly, though! Consider:
void read_settings_file(settings_row* settings)
{
char* field = settings + sizeof(void*);
int i = 0;
while(read_value_into(field[i]) && i++ < SETTINGS_PER_ROW)
;
}
Let's assume you're using a 32-bit machine with 8-bit characters; the argument is not significantly different on 64-bit machines. The assignment to 'field' is all wrong, because settings + 4 is a pointer to the element at index 4 (the fifth element) of an array of 'settings_row' structures. What you need to write is:
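void read_settings_file(settings_row* settings)
{
    /* Cast to char * first so the addition counts bytes; then view the slots as char * members. */
    char** field = (char **)((char *)settings + sizeof(void*));
    int i = 0;
    /* Check the bound before reading so field[SETTINGS_PER_ROW] is never touched. */
    while(i < SETTINGS_PER_ROW && read_value_into(field[i]))
        i++;
}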
The cast before addition is crucial!
C Standard (ISO/IEC 9899:1999):
6.3.2.3 Pointers
A pointer to void may be converted to or from a pointer to any incomplete or object
type. A pointer to any incomplete or object type may be converted to a pointer to void
and back again; the result shall compare equal to the original pointer.
[...]
A pointer to a function of one type may be converted to a pointer to a function of another
type and back again; the result shall compare equal to the original pointer. If a converted
pointer is used to call a function whose type is not compatible with the pointed-to type,
the behavior is undefined.

In many cases pointers are natural word sizes, so the compiler is unlikely to pad each member, but that doesn't make it a good idea. If you want to treat it like an array you should use an array.
I'm thinking out loud here, so there are probably many mistakes, but perhaps you could try this approach:
enum
{
kName = 0,
kTitle,
kDesc,
kBkFolder,
kSrcList,
kArcAll,
kIncFold,
kSettingsCount
};
typedef struct settings {
    struct settings* next;
    char *settingsData[kSettingsCount];
} settings_row;
Set the data:
settings_row myRow;
myRow.settingsData[kName] = "Bob";
myRow.settingsData[kDesc] = "Hurrrrr";
...
Reading the data:
void read_settings_file(settings_row* settings){
    char** field = settings->settingsData;
    int i = 0;
    /* Stop at the array bound, which is the enum's count value. */
    while(i < kSettingsCount && read_value_into(field[i]))
        i++;
}

It's not guaranteed by the C standard. I've a sneaking suspicion, that I don't have time to check right now either way, that it guarantees no padding between the char* fields, i.e. that consecutive fields of the same type in a struct are guaranteed to be layout-compatible with an array of that type. But even if so, you're on your own between the settings* and the first char*, and also between the last char* and the end of the struct. But you could use offsetof to deal with the first issue, and I don't think the second affects your current code.
However, what you want is almost certainly guaranteed by your compiler, which somewhere in its documentation will set out its rules for struct layout, and will almost certainly say that all pointers to data are word sized, and that a struct can be the size of 8 words without additional padding. But if you want to write highly portable code, you have to use only the guarantees in the standard.
The order of fields is guaranteed. I also don't think you'll see intermittent failure - AFAIK the offset of each field in that struct will be consistent for a given implementation (meaning the combination of compiler and platform).
You could assert that sizeof(settings*) == sizeof(char*) and sizeof(settings_row) == sizeof(char*)*8. If both those hold, there is no room for any padding in the struct, since fields are not allowed to "overlap". If you ever hit a platform where they don't hold, you'll find out.
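The two size checks described above can be done at compile time. This is a minimal sketch, assuming a C11 compiler for `_Static_assert`; the helper `settings_layout_is_packed` is a hypothetical name added here for illustration, and the struct is copied from the question.

```c
#include <assert.h>
#include <stddef.h>

typedef struct settings {
    struct settings *next;
    char *name;
    char *title;
    char *desc;
    char *bkfolder;
    char *srclist;
    char *arcall;
    char *incfold;
} settings_row;

#define SETTINGS_PER_ROW 7

/* Compile-time checks (C11): if either one fails, the build stops. */
_Static_assert(sizeof(struct settings *) == sizeof(char *),
               "struct pointer and char pointer differ in size");
_Static_assert(sizeof(settings_row) == sizeof(char *) * (SETTINGS_PER_ROW + 1),
               "settings_row contains padding");

/* Hypothetical run-time cross-check: returns 1 when the string members
   occupy consecutive char * sized slots right after `next`. */
int settings_layout_is_packed(void)
{
    return offsetof(settings_row, name) == sizeof(char *)
        && offsetof(settings_row, incfold) == sizeof(char *) * SETTINGS_PER_ROW;
}
```

If the build passes on a given platform, the layout assumption holds there; it says nothing about other platforms, which is the point of keeping the checks in the source.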
Even so, if you want an array, I'd be inclined to say use an array, with inline accessor functions or macros to get the individual fields. Whether your trick works or not, it's even easier not to think about it at all.

Although not a duplicate, this probably answers your question:
Why isn't sizeof for a struct equal to the sum of sizeof of each member?
It's not uncommon for applications to write an entire struct into a file and read it back out again. But this suffers from the possibility that one day the file will need to be read back on another platform, or by another version of the compiler that packs the struct differently. (Although this can be dealt with by specially-written code that understands the original packing format).

Technically, you can rely only on the order; the compiler could insert padding. If different pointers were of different size, or if the pointer size wasn't a natural word size, it might insert padding.
Practically speaking, you could get away with it. I wouldn't recommend it; it's a bad, dirty trick.
You could achieve your goal with another level of indirection (what doesn't that solve?), or by using a temporary array initialized to point to the various members of the structure.
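The temporary-array idea might look roughly like the sketch below. The function `load_settings` and its parameters are hypothetical names, not part of the original code; the struct is the one from the question.

```c
#include <assert.h>
#include <stddef.h>

typedef struct settings {
    struct settings *next;
    char *name;
    char *title;
    char *desc;
    char *bkfolder;
    char *srclist;
    char *arcall;
    char *incfold;
} settings_row;

#define SETTINGS_PER_ROW 7

/* Hypothetical loader: stores up to SETTINGS_PER_ROW strings from values[]
   into the named members, in declaration order. Returns how many were set. */
size_t load_settings(settings_row *row, char *const values[], size_t count)
{
    size_t i;
    assert(row != NULL);
    {
        /* Table of member addresses: no assumption about struct layout. */
        char **slots[SETTINGS_PER_ROW] = {
            &row->name, &row->title, &row->desc, &row->bkfolder,
            &row->srclist, &row->arcall, &row->incfold
        };
        for (i = 0; i < count && i < SETTINGS_PER_ROW; i++)
            *slots[i] = values[i];
    }
    return i;
}
```

The members are still accessed by name everywhere else, and the only place that knows their order is the initializer of `slots`.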

It's not guaranteed, but it will work fine in most cases. It won't be intermittent, it will either work or not work on a particular platform with a particular build. Since you're using all pointers, most compilers won't mess with any padding.
Also, if you wanted to be safer, you could make it a union.

You can't do that the way you are trying. The compiler is allowed to pad any and all members of the struct. I do not believe it is allowed to reorder the fields.
Most compilers have an attribute that can be applied to the struct to pack it (ie to turn it into a collection of tightly packed storage with no padding), but the downside is that this generally affects performance. The packed flag will probably allow you to use the struct the way you want, but it may not be portable across various platforms.
Padding is designed to make field access as efficient as possible on the target architecture. It's best not to fight it unless you have to (ie, the struct goes to a disk or over a network.)

It seems to me that this approach creates more problems than it solves.
When you read this code six months from now, will you still be aware of all the subtleties of how the compiler pads a struct?
Would someone else, who didn't write the code?
If you must use the struct, use it in the canonical way and just write a function which
assigns values to each field separately.
You could also use an array and create macros to give field names to indices.
If you get too "clever" about optimizing your code, you will end up with slower code anyway, since the compiler won't be able to optimize it as well.

Related

struct xyz a[0]; What does this mean? [duplicate]

I am working on refactoring some old code and have found a few structs containing zero-length arrays (below). The warnings are suppressed by a pragma, of course, but I have failed to create, with "new", structures containing such structures (error 2233). The array 'byData' is used as a pointer, but why not use a pointer instead? Or an array of length 1? And of course, no comments were added to make me enjoy the process...
Any causes to use such thing? Any advice in refactoring those?
struct someData
{
int nData;
BYTE byData[0];
};
NB It's C++, Windows XP, VS 2003
Yes this is a C-Hack.
To create an array of any length:
struct someData* mallocSomeData(int size)
{
struct someData* result = (struct someData*)malloc(sizeof(struct someData) + size * sizeof(BYTE));
if (result)
{
    result->nData = size;
}
return result;
}
Now you have an object of someData with an array of a specified length.
There are, unfortunately, several reasons why you would declare a zero length array at the end of a structure. It essentially gives you the ability to have a variable length structure returned from an API.
Raymond Chen did an excellent blog post on the subject. I suggest you take a look at this post because it likely contains the answer you want.
Note in his post, it deals with arrays of size 1 instead of 0. This is the case because zero length arrays are a more recent entry into the standards. His post should still apply to your problem.
http://blogs.msdn.com/oldnewthing/archive/2004/08/26/220873.aspx
EDIT
Note: Even though Raymond's post says zero-length arrays are legal in C99, they are in fact still not legal in C99. Instead of a zero-length array, you should use a length-1 array here.
This is an old C hack to allow flexibly sized arrays.
In the C99 standard this is not necessary, as it supports the arr[] syntax.
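For comparison, here is a minimal sketch of the C99 arr[] form (a flexible array member). It assumes a C99 compiler; it is not standard C++, so it may not apply directly to the VS 2003 code in the question. `unsigned char` stands in for BYTE, and `some_data_new` is a hypothetical helper name.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct someData {
    int nData;
    unsigned char byData[];   /* C99 flexible array member: no size given */
};

/* Allocates a struct someData with room for `size` zero-filled bytes in
   byData. Returns NULL if the allocation fails. */
struct someData *some_data_new(int size)
{
    struct someData *p;
    assert(size >= 0);
    /* sizeof *p does not count byData, so the payload is added explicitly. */
    p = malloc(sizeof *p + (size_t)size);
    if (p) {
        p->nData = size;
        memset(p->byData, 0, (size_t)size);
    }
    return p;
}
```

The caller releases the whole object with a single free(), exactly as with the older hack.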
Your intuition about "why not use an array of size 1" is spot on.
The code is doing the "C struct hack" wrong, because declarations of zero length arrays are a constraint violation. This means that a compiler can reject your hack right off the bat at compile time with a diagnostic message that stops the translation.
If we want to perpetrate a hack, we must sneak it past the compiler.
The right way to do the "C struct hack" (which is compatible with C dialects going back to 1989 ANSI C, and probably much earlier) is to use a perfectly valid array of size 1:
struct someData
{
int nData;
unsigned char byData[1];
};
Moreover, instead of sizeof struct someData, the size of the part before byData is calculated using:
offsetof(struct someData, byData);
To allocate a struct someData with space for 42 bytes in byData, we would then use:
struct someData *psd = (struct someData *) malloc(offsetof(struct someData, byData) + 42);
Note that this offsetof calculation is in fact the correct calculation even in the case of the array size being zero. You see, sizeof the whole structure can include padding. For instance, if we have something like this:
struct hack {
unsigned long ul;
char c;
char foo[0]; /* assuming our compiler accepts this nonsense */
};
The size of struct hack is quite possibly padded for alignment because of the ul member. If unsigned long is four bytes wide, then quite possibly sizeof (struct hack) is 8, whereas offsetof(struct hack, foo) is almost certainly 5. The offsetof method is the way to get the accurate size of the preceding part of the struct just before the array.
So that would be the way to refactor the code: make it conform to the classic, highly portable struct hack.
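A small sketch of the offsetof-based allocation described above, using a size-1 array so that it stays valid C. The helper names `hack_size` and `hack_new` are hypothetical, and clamping the allocation to at least sizeof (struct hack) is an extra precaution chosen for this sketch, not something the answer requires.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct hack {
    unsigned long ul;
    char c;
    char foo[1];   /* classic struct hack: really n bytes long */
};

/* Bytes needed for a struct hack whose foo array holds n bytes. */
size_t hack_size(size_t n)
{
    size_t bytes = offsetof(struct hack, foo) + n;
    assert(bytes >= n);   /* guards against size_t wrap-around */
    return bytes;
}

/* Allocates a struct hack with room for n bytes in foo, or returns NULL. */
struct hack *hack_new(size_t n)
{
    size_t bytes = hack_size(n);
    /* Precaution: never allocate less than the declared struct. */
    if (bytes < sizeof(struct hack))
        bytes = sizeof(struct hack);
    return malloc(bytes);
}
```

Printing sizeof (struct hack) next to offsetof(struct hack, foo) on your own platform shows whether trailing padding is present there.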
Why not use a pointer? Because a pointer occupies extra space and has to be initialized.
There are other good reasons not to use a pointer, namely that a pointer requires an address space in order to be meaningful. The struct hack is externalizeable: that is to say, there are situations in which such a layout conforms to external storage such as areas of files, packets or shared memory, in which you do not want pointers because they are not meaningful.
Several years ago, I used the struct hack in a shared memory message passing interface between kernel and user space. I didn't want pointers there, because they would have been meaningful only to the original address space of the process generating a message. The kernel part of the software had a view to the memory using its own mapping at a different address, and so everything was based on offset calculations.
It's worth pointing out IMO the best way to do the size calculation, which is used in the Raymond Chen article linked above.
struct foo
{
    size_t count;
    int data[1];
};
size_t foo_size_from_count(size_t count)
{
    /* A run-time index inside offsetof is widely accepted by compilers. */
    return offsetof(struct foo, data[count]);
}
The offset of the first entry past the end of the desired allocation is also the size of the desired allocation. IMO it's an extremely elegant way of doing the size calculation. It does not matter what the element type of the variable-size array is. The offsetof (or FIELD_OFFSET or UFIELD_OFFSET in Windows) is always written the same way. No sizeof() expressions to accidentally mess up.

What is there to be gained by deterministic field ordering in the memory layout?

Members of a structure are allocated within the structure in the order of their appearance in the declaration and have ascending addresses.
I am faced with the following dilemma: when I need to declare a structure, do I
(1) group the fields logically, or
(2) in decreasing size order, to save RAM and ROM size?
Here is an example, where the largest data member should be at the top, but also should be grouped with the logically-connected colour:
struct pixel{
int posX;
int posY;
tLargeType ColourSpaceSecretFormula;
char colourRGB[3];
};
The padding of a structure is implementation-dependent (it is not fixed by the standard), so we cannot reliably do pointer arithmetic on structure elements (and we shouldn't: imagine someone reordering the fields to his liking: BOOM, the whole code stops working).
-fpack-struct solves this in gcc, but bears other limitations, so let's leave compiler options out of the question.
On the other hand, code should be, above all, readable. Micro optimizations are to be avoided at all cost.
So, I wonder, why are structures' members ordered by the standard, making me worry about the micro-optimization of ordering struct member in a specific way?
The compiler is limited by several traditional and practical limitations.
The pointer to the struct after a cast (the standard calls it "suitably converted") will be equal to the pointer to the first element of the struct. This has often been used to implement overloading of messages in message passing. In that case a struct has the first element that describes what type and size the rest of the struct is.
The last element can be a dynamically resized array. Even before official language support this has been often used in practice. You allocate sizeof(struct) + length of extra data and can access the last element as a normal array with as many elements that you allocated.
Those two things force the compiler to have the first and last elements in the struct in the same order as they are declared.
Another practical requirement is that every compilation must order the struct members the same way. A smart compiler could make a decision that since it sees that some struct members are always accessed close to each other they could be reordered in a way that makes them end up in a cache line. This optimization is of course impossible in C because structs often define an API between different compilation units and we can't just reorder things differently on different compilations.
The best we could do given the limitations is to define some kind of packing order in the ABI to minimize alignment waste that doesn't touch the first or last element in the struct, but it would be complex, error prone and probably wouldn't buy much.
If you couldn't rely on the ordering, then it would be much harder to write low-level code which maps structures onto things like hardware registers, network packets, external file formats, pixel buffers, etc.
Also, some code uses a trick where it assumes that the last member of the structure is the highest-addressed in memory, to signify the start of a much larger data block (of unknown size at compile time).
Reordering fields of structures can sometimes yield good gains in data size and often also in code size, especially in a 64-bit memory model. Here is an example to illustrate (assuming common alignment rules):
struct list {
int len;
char *string;
bool isUtf;
};
will take 12 bytes in 32 bit but 24 in 64 bit mode.
struct list {
char *string;
int len;
bool isUtf;
};
will take 12 bytes in 32 bit but only 16 in 64 bit mode.
If you have an array of these structures, you save a third of the data size in 64-bit mode (16 bytes per element instead of 24), and you also gain in code size, as indexing by a power of 2 is simpler than by other sizes.
If your structure is a singleton or not frequent, there's not much point in reordering the fields. If it is used a lot, it's a point to look at.
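To see the effect on a particular compiler, the two layouts can be compared directly. This is only a sketch: the struct tags and the function `bytes_saved_per_element` are hypothetical names, and the actual sizes depend on the ABI (the 24 and 16 quoted above assume common 64-bit alignment rules).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Fields in "logical" order, as in the first example. */
struct list_logical {
    int len;
    char *string;
    bool isUtf;
};

/* Same fields, widest member first. */
struct list_reordered {
    char *string;
    int len;
    bool isUtf;
};

/* Bytes saved per array element by the reordered layout (0 if none). */
size_t bytes_saved_per_element(void)
{
    size_t a = sizeof(struct list_logical);
    size_t b = sizeof(struct list_reordered);
    return a > b ? a - b : 0;
}
```

Multiplying the result by the number of elements gives the saving for an array, which is where the reordering starts to matter.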
As for the other point of your question, why the compiler doesn't do this reordering of fields: if it did, it would be difficult to implement unions of structures that share a common pattern. For example:
enum type { TYPE_A, TYPE_B, TYPE_C }; /* enumerator names are placeholders */
struct header {
    enum type type;
    int len;
};
struct a {
    enum type type;
    int len;
    bool whatever1; /* needs <stdbool.h> */
};
struct b {
    enum type type;
    int len;
    long whatever2;
    long whatever4;
};
struct c {
    enum type type;
    int len;
    float fl;
};
union u {
    struct header header;
    struct a a;
    struct b b;
    struct c c;
};
If the compiler rearranged the fields, this construct would be much more inconvenient, as there would be no guarantee that the type and len fields were identical when accessing them via the different structs included in the union.
If I remember correctly the standard even mandates this behaviour.
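The guarantee being recalled here is, as far as I know, the "common initial sequence" rule for unions in the C standard (section 6.5.2.3). Below is a minimal sketch of how such a tagged union is typically read; all type and function names (`msg_type`, `msg`, `make_b`, `msg_type_of`) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum msg_type { MSG_A, MSG_B };

struct header { enum msg_type type; int len; };
struct msg_a  { enum msg_type type; int len; int flag; };
struct msg_b  { enum msg_type type; int len; long v1; long v2; };

union msg {
    struct header header;
    struct msg_a a;
    struct msg_b b;
};

/* Reads the tag through the header member, whichever member was stored. */
enum msg_type msg_type_of(const union msg *m)
{
    assert(m != NULL);
    return m->header.type;
}

/* Builds a message through the `b` member. */
union msg make_b(long v1, long v2)
{
    union msg m;
    m.b.type = MSG_B;
    m.b.len = (int)sizeof m.b;
    m.b.v1 = v1;
    m.b.v2 = v2;
    return m;
}
```

Because `type` and `len` come first and have the same types in every struct, inspecting them through `header` is well defined while the union holds any of the structs.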

Typecasting of pointers in C

I know a pointer to one type may be converted to a pointer of another type. I have three questions:
What should be kept in mind while typecasting pointers?
What exceptions/errors may occur in the resulting pointer?
What are best practices to avoid exceptions/errors?
A program well written usually does not use much pointer typecasting. There could be a need to use ptr typecast for malloc for instance (declared (void *)malloc(...)), but it is not even necessary in C (while a few compilers may complain).
int *p = malloc(sizeof(int)); // no need of (int *)malloc(...)
However, in system applications, sometimes you want to use a trick to perform binary or other specific operations, and C, a language close to the machine structure, is convenient for that. For instance, say you want to analyze the binary structure of a double (that follows the IEEE 754 format), and working with binary elements is simpler; you may declare
typedef unsigned char byte;
double d = 0.9;
byte *p = (byte *)&d;
size_t i;
for (i = 0; i < sizeof(double); i++) { ... work with p[i] ... }
You may also use a union; this is just one example.
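The byte-wise inspection of a double can be sketched as a pair of small functions. The names `double_bytes` and `double_from_bytes` are hypothetical; the byte order you see depends on the platform's endianness.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copies the object representation of d into out[]; returns bytes written. */
size_t double_bytes(double d, unsigned char *out, size_t cap)
{
    /* Accessing any object through unsigned char * is always permitted. */
    const unsigned char *p = (const unsigned char *)&d;
    size_t i;
    assert(out != NULL && cap >= sizeof d);
    for (i = 0; i < sizeof d; i++)
        out[i] = p[i];
    return sizeof d;
}

/* Rebuilds a double from bytes produced by double_bytes. */
double double_from_bytes(const unsigned char *in)
{
    double d;
    assert(in != NULL);
    memcpy(&d, in, sizeof d);
    return d;
}
```

Going from the double to bytes with a character pointer is safe; going the other way is done with memcpy here rather than by casting the byte pointer to double *.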
A more complex use could be simulating C++ polymorphism, which requires storing the "class" (structure) hierarchy somewhere to remember what is what, and performing pointer typecasts so that, for instance, a parent "class" pointer variable can point at some time to a derived class (see the C++ link also)
CRectangle rect;
CPolygon *p = (CPolygon *)&rect;
p->whatami = POLY_RECTANGLE; // a way to simulate polymorphism ...
process_poly ( p );
But in this case, maybe it's better to directly use C++!
Pointer typecast is to be used carefully for well determined situations that are part of the program analysis - before development starts.
Pointer typecast potential dangers
using them when it's not necessary - that is error-prone and complicates the program
pointing to an object of different size that may lead to an access overflow, wrong result...
pointer to two different structures like s1 *p = (s1 *)&s2; : relying on their size and alignment may lead to an error
(But to be fair, a skilled C programmer wouldn't commit the above mistakes...)
Best practice
use them only if you do need them, and comment the part well that explains why it is necessary
know what you are doing - again a skilled programmer may use tons of pointer typecasts without fail, i.e. don't try and see, it may work on such system / version / OS, and may not work on another one
In plain C you can cast any pointer type to any other pointer type. If you cast a pointer to or from an incompatible type, and incorrectly write the memory, you may get a segmentation fault or unexpected results from your application.
Here is a sample code of casting structure pointers:
struct Entity {
    int type;
};
struct DetailedEntity1 {
    int type;
    short val1;
};
struct DetailedEntity2 {
    int type;
    long val;
    long val2;
};
// random code:
struct Entity* ent = (struct Entity*)ptr;
//bad: the cast is done without checking ent->type first
struct DetailedEntity1* ent1 = (struct DetailedEntity1*)ent;
int a = ent1->val1; // may be an error here, invalid read
ent1->val1 = 117; // possible invalid write
//OK:
if (ent->type == DETAILED_ENTITY_1) {
    ((struct DetailedEntity1*)ent)->val1;
} else if (ent->type == DETAILED_ENTITY_2) {
    ((struct DetailedEntity2*)ent)->val2;
}
As for function pointers - you should always use functions which exactly fit the declaration. Otherwise you may get unexpected results or segfaults.
When casting from pointer to pointer (structure or not) you must ensure that the memory is aligned in the exact same way. When casting entire structures the best way to ensure it is to use the same order of the same variables at the start, and differentiating structures only after the "common header". Also remember, that memory alignment may differ from machine to machine, so you can't just send a struct pointer as a byte array and receive it as byte array. You may experience unexpected behaviour or even segfaults.
When casting smaller to larger variable pointers, you must be very careful. Consider this code:
char* ptr = malloc (16);
ptr++;
uint64_t* uintPtr = (uint64_t*)ptr; // may cause an error, memory is not properly aligned
And also, there is the strict aliasing rule that you should follow.
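One common way to stay clear of both the alignment problem and the strict aliasing rule is to copy the bytes with memcpy instead of dereferencing a cast pointer. This is a minimal sketch; `load_u64` and `store_u64` are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reads a uint64_t stored at an arbitrary (possibly misaligned) address. */
uint64_t load_u64(const void *src)
{
    uint64_t v;
    assert(src != NULL);
    memcpy(&v, src, sizeof v);   /* byte copy: no alignment requirement */
    return v;
}

/* Writes a uint64_t to an arbitrary (possibly misaligned) address. */
void store_u64(void *dst, uint64_t v)
{
    assert(dst != NULL);
    memcpy(dst, &v, sizeof v);
}
```

With these, the `ptr + 1` case from the snippet above can be handled as `store_u64(ptr, value)` and `load_u64(ptr)` without ever forming a misaligned uint64_t pointer.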
You probably need a look at ... the C-faq maintained by Steve Summit (which used to be posted in the newsgroups, which means it was read and updated by a lot of the best programmers at the time, sometimes the designers of the language itself).
There is an abridged version too, which is maybe more palatable and still very, very, very, very useful. Reading the whole abridged is, I believe, mandatory if you use C.

Questions about typecasting

I have questions about typecasting. This is just a dummy program shown here. The actual code is too big to be posted.
typedef struct abc
{
int a;
}abc_t;
main()
{
abc_t *MY_str;
char *p;
MY_str = (abc_t *)p;
}
Whenever I run the quality analysis check tool, I get a level 2 warning:
Casting to different object pointer type. REFERENCE - ISO:C90-6.3.4 Cast Operators - Semantics <next> Msg(3:3305) Pointer cast to stricter alignment. <next>
Can anyone please tell me how to resolve this issue?
Simple - your static analysis tool (which, btw?) has decided that a char* does not have a particular alignment requirement (it could point anywhere in memory) whereas an abc_t* likely has a word alignment requirement (int must be on a 4/8 byte boundary).
In reality, as the char* is on the stack, it will be word aligned on most architectures. Your tool cannot see this.
In your implementation (and probably many others) each int must be at an address that is divisible by sizeof int, which is often 4.
On the other hand, a char can be at any address.
It's like assigning 3.25 to an int variable. That's also not possible.
So when you have a bad pointer, you will probably get an exception from your machine, and technically this code invokes undefined behavior.
a char* can be aligned on any byte boundary, which means if you cast it to a structure, the alignment requirements of that struct might not be met (such as 16 byte boundaries required for SIMD types).
Your code is invalid C. If you find yourself doing something like this, it's probably the result of a greater misunderstanding. For instance I'm guessing you want to read an abc_t object from a file/socket/etc. and you're used to passing a char pointer to the read/recv/whatever function. Instead you should just declare an object of type abc_t and pass its address to whatever reading function you're using.
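The suggestion to read straight into an object of the right type might look like the sketch below. `struct source` and `source_read` are hypothetical stand-ins for whatever read/recv-style function is actually in use; only `abc_t` comes from the question.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct abc {
    int a;
} abc_t;

/* Hypothetical in-memory byte source standing in for a file or socket. */
struct source {
    const unsigned char *data;
    size_t len;
    size_t pos;
};

/* Copies up to n bytes into dst, like read(); returns the bytes copied. */
size_t source_read(struct source *s, void *dst, size_t n)
{
    size_t avail;
    assert(s != NULL && dst != NULL);
    avail = s->len - s->pos;
    if (n > avail)
        n = avail;
    memcpy(dst, s->data + s->pos, n);
    s->pos += n;
    return n;
}

/* Reads one abc_t directly into a properly aligned object. Returns 1 on success. */
int read_abc(struct source *s, abc_t *out)
{
    return source_read(s, out, sizeof *out) == sizeof *out;
}
```

The destination is a real abc_t, so no char * is ever cast to abc_t * and the alignment warning has nothing to complain about.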

Can C arrays contain padding in between elements?

I heard a rumor that, in C, arrays that are contained inside structs may have padding added in between elements of the array. Now obviously, the amount of padding could not vary between any pair of elements or calculating the next element in an array is not possible with simple pointer arithmetic.
This rumor also stated that arrays which are not contained in structures are guaranteed to contain no padding. I know at least that part is true.
So, in code, the rumor is:
{
// Given this:
struct { int values[20]; } foo;
int values[20];
// This may be true:
sizeof(values) != sizeof(foo.values);
}
I'm pretty certain that sizeof(values) will always equal sizeof(foo.values). However, I have not been able to find anything in the C standard (specifically C99) that explicitly confirms or denies this.
Does anyone know if this rumor is addressed in any C standard?
edit: I understand that there may be padding between the end of the array foo.values and the end of the struct foo and that the standard states that there will be no padding between the start of foo and the start of foo.values. However, does anyone have a quote from or reference to the standard where it says there is no padding between the elements of foo.values?
No, there will never be padding in between elements of an array. That is specifically not allowed. The C99 standard describes array types this way: "An array type describes a contiguously allocated nonempty set of objects...". By contrast, a structure is "sequentially", not "contiguously", allocated.
There might be padding before or after an array within a structure; that is another animal entirely. The compiler might do that to aid alignment of the structure, but the C standard doesn't say anything about that.
Careful here. Padding may be added at the end of the struct, but will not be added between the elements of the array as you state in your question. Arrays will always reference contiguous memory, though an array of structures may have padding added to each element as part of the struct itself.
In your example, the values and foo.values arrays will have the same size. Any padding will be part of the struct foo instead.
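The claim can be checked mechanically. A small sketch, assuming a C11 compiler for `_Static_assert`; `struct wrapper` and `element_stride` are hypothetical names for the struct and helper.

```c
#include <assert.h>
#include <stddef.h>

struct wrapper {
    int values[20];
};

/* The member array has exactly the size of 20 ints: no padding between elements. */
_Static_assert(sizeof(((struct wrapper *)0)->values) == 20 * sizeof(int),
               "array member contains padding between elements");

/* Distance in bytes between consecutive elements of the member array. */
size_t element_stride(const struct wrapper *w)
{
    assert(w != NULL);
    return (size_t)((const char *)&w->values[1] - (const char *)&w->values[0]);
}
```

Any padding that does exist shows up in sizeof (struct wrapper), not in the size of the array member.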
Here's the explanation as to why a structure may need padding between its members or even after its last member, and why an array doesn't:
Different types might have different alignment requirements. Some types need to be aligned on word boundaries, others on double or even quad word boundaries. To accomplish this, a structure may contain padding bytes between its members. Trailing padding bytes might be needed because the memory location directly after a structure must also conform to the structure's alignment requirements, i.e. if bar is of type struct foo *, then
(struct foo *)((char *)bar + sizeof(struct foo))
yields a valid pointer to struct foo (ie doesn't fail due to mis-alignment).
As each 'member' of an array has the same alignment requirement, there's no reason to introduce padding. This holds true for arrays contained in structures as well: If an array's first element is correctly aligned, so are all following elements.
Yes, sort of. Variables are often aligned to some boundary, depending on the variable. Take the following, for instance:
typedef struct
{
double d;
char c;
} a_type_t;
double and char are 8 and 1 bytes, on my system, respectively. Total of 9. That structure, however, will be 16 bytes, so that the doubles will always be 8-byte aligned. If I had just used ints, chars, etc, then the alignment might be 1, 2, 4, or 8.
For some type T, sizeof(T) may or may not equal sizeof(T.a) + sizeof(T.b) + sizeof(T.c) ... etc.
Generally, this is entirely compiler and architecture dependent. In practice, it never matters.
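To see where the extra bytes go on a given implementation, offsetof and sizeof can be combined. A minimal sketch; `trailing_padding` is a hypothetical helper name, and the concrete numbers depend on the compiler and target.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    double d;
    char c;
} a_type_t;

/* Bytes of padding after the last member of a_type_t. */
size_t trailing_padding(void)
{
    size_t end_of_c = offsetof(a_type_t, c) + sizeof(char);
    assert(sizeof(a_type_t) >= end_of_c);
    return sizeof(a_type_t) - end_of_c;
}
```

On a system like the one described above (8-byte double, 16-byte struct), this would report the bytes added so that the next element of an array of a_type_t keeps its double aligned.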
Consider:
struct {
short s;
int i;
} s;
Assuming shorts are 16 bits and you're on a 32-bit platform, the size will probably be 8 bytes, as each struct member tends to be aligned to a word (32-bit in this case) boundary. I say "probably" because it is implementation-specific behaviour that can be varied by compiler flags and the like.
It's worth stressing that this is implementation behaviour not necessarily defined by the C standard. Much like the size of shorts, ints and longs (the C standard simply says shorts won't be larger than ints and longs won't be smaller than ints, which can end up as 16/32/32, 16/32/64, 32/32/64 or a number of other configurations).
