What are the semantics of overlapping objects in C?

What are the semantics of overlapping objects in C? - c

Consider the following struct:
struct s {
int a, b;
};
Typically1, this struct will have size 8 and alignment 4.
What if we create two struct s objects (more precisely, we write into allocated storage two such objects), with the second object overlapping the first?
char *storage = malloc(3 * sizeof(struct s));
struct s *o1 = (struct s *)storage; // offset 0
struct s *o2 = (struct s *)(storage + alignof(struct s)); // offset 4
// now, o2 points half way into o1
*o1 = (struct s){1, 2};
*o2 = (struct s){3, 4};
printf("o2.a=%d\n", o2->a);
printf("o2.b=%d\n", o2->b);
printf("o1.a=%d\n", o1->a);
printf("o1.b=%d\n", o1->b);
Is anything about this program undefined behavior? If so, where does it become undefined? If it is not UB, is it guaranteed to always print the following:
o2.a=3
o2.b=4
o1.a=1
o1.b=3
In particular, I want to know what happens to the object pointed to by o1 when o2, which overlaps it, is written. Is it still allowed to access the unclobbered part (o1->a)? Is accessing the clobbered part o1->b simply the same as accessing o2->a?
How does effective type apply here? The rules are clear enough when you are talking about non-overlapping objects and pointers that point to the same location as the last store, but when you start talking about the effective type of portions of objects or overlapping objects it is less clear.
Would anything change if the the second write was of a different type? If the members were say int and short rather than two ints?
Here's a godbolt if you want to play with it there.
1 This answer applies to platforms where this isn't the case too: e.g., some might have size 4 and alignment 2. On a platform where the size and alignment were the same, this question wouldn't apply since aligned, overlapping objects would be impossible, but I'm not sure if there is any platform like that.

Basically this is all grey area in the standard; the strict aliasing rule specifies basic cases and leaves the reader (and compiler vendors) to fill in the details.
There have been efforts to write a better rule but so far they haven't resulted in any normative text and I'm not sure what the status of this is for C2x.
As mentioned in my answer to your previous question, the most common interpretation is that p->q means (*p).q and the effective type applies to all of *p, even though we then go on to apply .q .
Under this interpretation, printf("o1.a=%d\n", o1->a); would cause undefined behaviour as the effective type of the location *o1 is not s (since part of it has been overwritten).
The rationale for this interpretation can be seen in a function like:
void f(s* s1, s* s2)
{
s2->a = 5;
s1->b = 6;
printf("%d\n", s2->a);
}
With this interpretation the last line could be optimised to puts("5"); , but without it, the compiler would have to consider that the function call may have been f(o1, o2); and therefore lose all benefits that are purportedly provided by the strict aliasing rule.
A similar argument applies to two unrelated struct types that both happen to have an int member at different offset.

Related

What is the point of "pointer types" when you dynamically allocate memory?

Why do we have pointer types? eg
int *ptr;
I know its for type safety, eg to dereference 'ptr', the compiler needs to know that its dereferencing the ptr to type int, not to char or long, etc, but as others outlined here Why to specify a pointer type? , its also because "we should know how many bytes to read. Dereferencing a char pointer would imply taking one byte from memory while for int it could be 4 bytes." That makes sense.
But what if I have something like this:
typedef struct _IP_ADAPTER_INFO {
struct _IP_ADAPTER_INFO* Next;
DWORD ComboIndex;
char AdapterName[MAX_ADAPTER_NAME_LENGTH + 4];
char Description[MAX_ADAPTER_DESCRIPTION_LENGTH + 4];
UINT AddressLength;
BYTE Address[MAX_ADAPTER_ADDRESS_LENGTH];
DWORD Index;
UINT Type;
UINT DhcpEnabled;
PIP_ADDR_STRING CurrentIpAddress;
IP_ADDR_STRING IpAddressList;
IP_ADDR_STRING GatewayList;
IP_ADDR_STRING DhcpServer;
BOOL HaveWins;
IP_ADDR_STRING PrimaryWinsServer;
IP_ADDR_STRING SecondaryWinsServer;
time_t LeaseObtained;
time_t LeaseExpires;
} IP_ADAPTER_INFO, *PIP_ADAPTER_INFO;
PIP_ADAPTER_INFO pAdapterInfo = (IP_ADAPTER_INFO *)malloc(sizeof(IP_ADAPTER_INFO));
What would be the point of declaring the type PIP_ADAPTER_INFO here? After all, unlike the previous example, we've already allocated enough memory for the pointer to point at (using malloc), so isn't defining the type here redundant? We will be reading as much data from memory as there has been allocated.
Also, side note: Is there any difference between the following 4 declarations or is there a best practice?
PIP_ADAPTER_INFO pAdapterInfo = (IP_ADAPTER_INFO *)malloc(sizeof(IP_ADAPTER_INFO));
or
PIP_ADAPTER_INFO pAdapterInfo = (PIP_ADAPTER_INFO)malloc(sizeof(IP_ADAPTER_INFO));
or
IP_ADAPTER_INFO *pAdapterInfo = (IP_ADAPTER_INFO *)malloc(sizeof(IP_ADAPTER_INFO));
or
IP_ADAPTER_INFO *pAdapterInfo = (PIP_ADAPTER_INFO)malloc(sizeof(IP_ADAPTER_INFO));

You’re kind of asking two different questions here - why have different pointer types, and why hide pointers behind typedefs?
The primary reason for distinct pointer types comes from pointer arithmetic - if p points to an object of type T, then the expression p + 1 points to the next object of that type. If p points to an 4-byte int, then p + 1 points to the next int. If p points to a 128-byte struct, then p + 1 points to the next 128-byte struct, and so on. Pointers are abstractions of memory addresses with additional type semantics.
As for hiding pointers behind typedefs...
A number of us (including myself) consider hiding pointers behind typedefs to be bad style if the user of the type still has to be aware of the type’s “pointer-ness” (i.e., if you ever have to dereference it, or if you ever assign the result of malloc/calloc/realloc to it, etc.). If you’re trying to abstract away the “pointer-ness” of something, you need to do it in more than just the declaration - you need to provide a full API that hides all the pointer operations as well.
As for your last question, best practice in C is to not cast the result of malloc. Best practice in C++ is to not use malloc at all.

I think this is more a question of type definition style than of dynamic memory allocation.
Old-school C practice is to describe structs by their tags. You say
struct foo {
...
};
and then
struct foo foovar;
or
struct foo *foopointer = malloc(sizeof(struct foo));
But a lot of people don't like having to type that keyword struct all the time. (I guess I can't fault then; C has always favored terseness, sometimes seemingly just to reduce typing.) So a form using typedef became quite popular (and it either influenced, or was influenced by, C++):
typedef struct {
...
} Foo;
and then
Foo foovar;
or
Foo *foopointer = malloc(sizeof(Foo));
But then, for reasons that are less clear, it became popular to throw the pointerness into the typedef, too, like this:
typedef struct {
...
} Foo, *Foop;
Foop foopointer = malloc(sizeof(*Foop));
But this is all a matter of style and personal preference, in the service of what someone imagines to be clarity or convenience or usefulness. (But of course opinions on clarity and convenience, like opinions on style, can legitimately vary.) I've seen the pointer typedefs disparaged as being a misleading or Microsoftian practice, but I'm not sure I can fault them right now.
You also asked about the casts, and we could also dissect various options for the sizeof call as the argument to malloc.
It doesn't really matter whether you say
Foop foopointer = (Foop)malloc(sizeof(*Foop));
or
Foop foopointer = (Foo *)malloc(sizeof(*Foop));
The first one may be clearer, in that you don't have to go back and check that Foop and Foo * are the same thing. But they're both poor practice in C, and in at least some circles they've been deprecated since the 1990's. Those casts are are considered distracting and unnecessary in straight C -- although of course they're necessary in C++, or I suppose if you're using a C++ compiler to compile C code. (If you were writing straight C++ code, of course, you'd typically use new instead of malloc.)
But then what should you put in the sizeof()? Which is better,
Foop foopointer = malloc(sizeof(*Foop));
or
Foop foopointer = malloc(sizeof(Foo));
Again, the first one can be easier to read, since you don't have to go back and check that Foop and Foo * are the same thing. But by the same token, there's a third form that can be even clearer:
Foop foopointer = malloc(sizeof(*foopointer));
Now you know that, whatever type foopointer points at, you're allocating the right amount of space for it. This idiom works best, though, if it's maximally clear that foopiinter is in fact a pointer that points at some type, meaning that the variants
Foo *foopointer = malloc(sizeof(*foopointer));
or even
struct foo *foopointer = malloc(sizeof(*foopointer));
can be considered clearer still -- and this may be one of the reasons people consider the pointer typedef to be less than perfectly useful.
Bottom line, if you're still with me: If you don't find PIP_ADAPTER_INFO useful, don't use it -- use IP_ADAPTER_INFO (along with explicit *'s when you need them) instead. Someone thought PIP_ADAPTER_INFO might be useful, which is why it's there, but the arguments in favor of its use aren't too compelling.

What is the point of “pointer types” when you dynamically allocate memory?
At least for the example you show there is none.
So the follow up question would be if there were situations where typedefing a pointer made sense.
And the answer is: Yes.
It definitely makes sense if one is in the need of an opaque data type.
A nice example is the pthread_t type which defines a handle to a POSIX thread.
Depending on the implementation it is defined as
typedef struct bla pthread_t;
typedef struct foo * pthread_t;
typedef long pthread_t;
and with this abstracts away the kind of implementation, as it is of no interest to the user, which probably is not the intention with the struct you show in your question.

Why do we have pointer types?
To accommodate architectures where the size and encoding may differ for various types. C ports well to many platforms, even novel ones.
It is not unusual today that pointers to functions have a different size than pointers to objects. An object pointer coverts to a void *, yet a function pointer may not.
A pointer to char need not be the same size as a pointer to an int or union or struct. This is uncommon today. The spec details follow (my emphasis):
A pointer to void shall have the same representation and alignment requirements as a
pointer to a character type. Similarly, pointers to qualified or unqualified versions of compatible types shall have the same representation and alignment requirements. All
pointers to structure types shall have the same representation and alignment requirements
as each other. All pointers to union types shall have the same representation and
alignment requirements as each other. Pointers to other types need not have the same
representation or alignment requirements. C11dr §6.2.5 28

Is it safe to memcpy to a dynamic storage struct?

Context:
I was reviewing some code that receives data from an IO descriptor into a character buffer, does some control on it and then use part of the received buffer to populate a struct, and suddenly wondered whether a strict aliasing rule violation could be involved.
Here is a simplified version
#define BFSZ 1024
struct Elt {
int id;
...
};
unsigned char buffer[BFSZ];
int sz = read(fd, buffer, sizeof(buffer)); // correctness control omitted for brievety
// search the beginning of struct data in the buffer, and process crc control
unsigned char *addr = locate_and_valid(buffer, sz);
struct Elt elt;
memcpy(&elt, addr, sizeof(elt)); // populates the struct
// and use it
int id = elt.id;
...
So far, so good. Provide the buffer did contain a valid representation of the struct - say it has been produced on same platform, so without endianness or padding problem - the memcpy call has populated the struct and it can safely be used.
Problem:
If the struct is dynamically allocated, it has no declared type. Let us replace last lines with:
struct Elt *elt = malloc(sizeof(struct Element)); // no declared type here
memcpy(elt, addr, sizeof(*elt)); // populates the newly allocated memory and copies the effective type
// and use it
int id = elt->id; // strict aliasing rule violation?
...
Draft n1570 for C language says in 6.5 Expressions §6
The effective type of an object for an access to its stored value is the declared type of the
object, if any.87) If a value is stored into an object having no declared type through an
lvalue having a type that is not a character type, then the type of the lvalue becomes the
effective type of the object for that access and for subsequent accesses that do not modify
the stored value. If a value is copied into an object having no declared type using
memcpy or memmove, or is copied as an array of character type, then the effective type
of the modified object for that access and for subsequent accesses that do not modify the
value is the effective type of the object from which the value is copied, if it has one.
buffer does have an effective type and even a declared type: it is an array of unsigned char. That is the reason why the code uses a memcpy instead of a mere aliasing like:
struct Elt *elt = (struct Elt *) addr;
which would indeed be a strict aliasing rule violation (and could additionaly come with alignment problems). But if memcpy has given an effective type of an unsigned char array to the zone pointed by elt, everything is lost.
Question:
Does memcpy from an array of character type to a object with no declared type give an effective type of array of character?
Disclaimer:
I know that it works without a warning with all common compilers. I just want to know whether my understanding of standard is correct
In order to better show my problem, let us considere a different structure Elt2 with sizeof(struct Elt2)<= sizeof(struct Elt), and
struct Elt2 actual_elt2 = {...};
For static or automatic storage, I cannot reuse object memory:
struct Elt elt;
struct Elt2 *elt2 = &elt;
memcpy(elt2, &actual_elt2, sizeof(*elt2));
elt2->member = ... // strict aliasing violation!
While it is fine for dynamic one (question about it there):
struct Elt *elt = malloc(sizeof(*elt));
// use elt
...
struct Elt2 *elt2 = elt;
memcpy(elt2, &actual_elt2, sizeof(*elt2));
// ok, memory now have struct Elt2 effective type, and using elt would violate strict aliasing rule
elt2->member = ...; // fine
elt->id = ...; // strict aliasing rule violation!
What could make copying from a char array different?

The code is fine, no strict aliasing violation. The pointed-at data has an effective type, so the bold cited text does not apply. What applies here is the part you left out, last sentence of 6.5/6:
For all other accesses to an object having no declared type, the effective type of the object is simply the type of the lvalue used for the access.
So the effective type of the pointed-at object becomes struct Elt. The returned pointer of malloc does indeed point to an object with no delcared type, but as soon as you point at it, the effective type becomes that of the struct pointer. Otherwise C programs would not be able to use malloc at all.
What makes the code safe is also that you are copying data into that struct. Had you instead just assigned a struct Elt* to point at the same memory location as addr, then you would have a strict aliasing violation and UB.

Lundin's answer is correct; what you are doing is fine (so long as the data is aligned and of same endianness).
I want to note this is not so much a result of the C language specification as it is a result of how the hardware works. As such, there's not a single authoritative answer. The C language specification defines how the language works, not how the language is compiled or implemented on different systems.
Here is an interesting article about memory alignment and strict aliasing on a SPARC versus Intel processor (notice the exact same C code performs differently, and gives errors on one platform while working on another):
https://askldjd.com/2009/12/07/memory-alignment-problems/
Fundamentally, two identical structs, on the same system with the same endian and memory alignment, must work via memcpy. If it didn't then the computer wouldn't be able to do much of anything.
Finally, the following question explains more about memory alignment on systems, and the answer by joshperry should help explain why this is a hardware issue, not a language issue:
Purpose of memory alignment

Using an allocated space to store multiple arrays

Is the following correct code?
Assume it is known that all object pointer types have equal size and alignment, with size not greater than 8.
// allocate some space to A, and set *A and **A to different regions of that space
char*** A = malloc(92);
*A = (char**)( (char*)A + 2*sizeof(char**) );
**A = (char*)*A + 4*sizeof(char*);
// initialize the second char** object
A[1] = *A + 2;
// write four strings further out in the space
strcpy(A[0][0],"string-0-0");
A[0][1] = A[0][0] + strlen(A[0][0]) + 1;
strcpy(A[0][1],"string-0-1");
A[1][0] = A[0][1] + strlen(A[0][1]) + 1;
strcpy(A[1][0],"string-1-0");
A[1][1] = A[1][0] + strlen(A[1][0]) + 1;
strcpy(A[1][1],"string-1-1");
I find stuff like this useful in situations where it may not be straightforward how to deallocate the object. For example, say A[1][1] may or may not be reassigned to the address of a string literal. Either way you just free A. Also, the number of calls to malloc are minimized.
My concern that this may not be correct code is based on the following. Using a draft version of the standard, I have:
7.22.3 Memory management functions
...The pointer returned if the allocation succeeds is suitably aligned so that it may be assigned to a pointer to any type of object with a fundamental alignment requirement and then used to access such an object or an array of such objects in the space allocated (until the space is explicitly deallocated)...
So I am garanteed to be able to use the space as an array of a single type. I can not find any garantee that I can use it as an array of two distinct types (char* and char**). Note that the use of some of the space as char arrays is unique given that any object can be accessed as a character type array.
The rules for effective type are consistent with this approach, as no individual byte is ever used as part of two different types.
While the above indicates there seems to be no explicit violation of the standard, the standard does not explicitly permit the behavior either, and we have paragraph 2 of chapter 4 (emphasis added):
If a ‘‘shall’’ or ‘‘shall not’’ requirement that appears outside of a constraint or runtime constraint is violated, the behavior is undefined. Undefined behavior is otherwise indicated in this International Standard by the words ‘‘undefined behavior’’ or by the omission of any explicit definition of behavior. There is no difference in emphasis among these three; they all describe ‘‘behavior that is undefined’’.
That is a little vague when the memory model itself is so vague. The use of space returned by malloc() to store an array of any (one) type is apparently in need of explicit allowance, which allowance I quoted above. So one could argue that the use of that space for disjoint arrays of distinct types also requires explicit allowance and, without it, is left as undefined behavior per chapter 4.
So, to be specific, if the code example is correct, what is wrong with the argument that it is not explicitly defined and therefore undefined, per the part of chapter 4 from the standard that is above quoted?

Provided that the offset of any object from the start of the allocated region is a multiple of the alignment, and provided that no piece of the memory is used as more than one type within the lifetime of the allocation, there will be no problem.
One nasty gotcha is that while there are some algorithms (e.g. for hash tables) that will work very nicely with a table that is initially filled with arbitrary values (given a value that may or may not be correct, code may be able to determine whether the value is correct much more quickly--O(1) vs O(N)--than it could find the correct value without an initial guess), such behavior may not be reliable when using gcc or clang. The way they interpret the Standard, writing memory as one type and reading as another non-character type yields Undefined Behavior even if the destination type would have no trap representations, even if pointers are converted which are converted to a new type are never used (in their original form) after that, and even if code would work correctly for any value the read might have yielded in cases where the data had not yet been written as the new type.
Given:
float *fp;
uint32_t *ip;
*fp = 1.0f;
*ip = 23;
Behavior would be defined if fp and ip identify the same storage, and the
storage would be required to hold 23 afterward. On the other hand, given:
float *fp;
uint32_t *ip;
*fp = 1.0f;
uint32_t temp = *ip;
*ip = 23;
a compiler could take advantage of the Undefined Behavior to reorder the
operations so the write to *fp occurred after the write to *ip. If the data is getting reused for something like a hash table it's unlikely that a compiler would be able to usefully apply such optimization, but a "smart" compiler might reorder writes of useless data past the writes of useful data. Such "optimizations" aren't terribly likely to break things, but there is no Standard-defined way to prevent them except by either freeing and reallocating the storage (if one is using a hosted system and can tolerate the performance hit and possible fragmentation) or else clearing the storage before re-use (which may turn O(1) operations into O(N)).

Typecasting of pointers in C

I know a pointer to one type may be converted to a pointer of another type. I have three questions:
What should kept in mind while typecasting pointers?
What are the exceptions/error may come in resulting pointer?
What are best practices to avoid exceptions/errors?

A program well written usually does not use much pointer typecasting. There could be a need to use ptr typecast for malloc for instance (declared (void *)malloc(...)), but it is not even necessary in C (while a few compilers may complain).
int *p = malloc(sizeof(int)); // no need of (int *)malloc(...)
However in system applications, sometimes you want to use a trick to perform binary or specific operation - and C, a language close to the machine structure, is convenient for that. For instance say you want to analyze the binary structure of a double (that follows thee IEEE 754 implementation), and working with binary elements is simpler, you may declare
typedef unsigned char byte;
double d = 0.9;
byte *p = (byte *)&d;
int i;
for (i=0 ; i<sizeof(double) ; i++) { ... work with b ... }
You may also use an union, this is an exemple.
A more complex utilisation could be the simulation of the C++ polymorphism, that requires to store the "classes" (structures) hierarchy somewhere to remember what is what, and perform pointer typecasting to have, for instance, a parent "class" pointer variable to point at some time to a derived class (see the C++ link also)
CRectangle rect;
CPolygon *p = (CPolygon *)&rect;
p->whatami = POLY_RECTANGLE; // a way to simulate polymorphism ...
process_poly ( p );
But in this case, maybe it's better to directly use C++!
Pointer typecast is to be used carefully for well determined situations that are part of the program analysis - before development starts.
Pointer typecast potential dangers
use them when it's not necessary - that is error prone and complexifies the program
pointing to an object of different size that may lead to an access overflow, wrong result...
pointer to two different structures like s1 *p = (s1 *)&s2; : relying on their size and alignment may lead to an error
(But to be fair, a skilled C programmer wouldn't commit the above mistakes...)
Best practice
use them only if you do need them, and comment the part well that explains why it is necessary
know what you are doing - again a skilled programmer may use tons of pointer typecasts without fail, i.e. don't try and see, it may work on such system / version / OS, and may not work on another one

In plain C you can cast any pointer type to any other pointer type. If you cast a pointer to or from an uncompatible type, and incorrectly write the memory, you may get a segmentation fault or unexpected results from your application.
Here is a sample code of casting structure pointers:
struct Entity {
int type;
}
struct DetailedEntity1 {
int type;
short val1;
}
struct DetailedEntity2 {
int type;
long val;
long val2;
}
// random code:
struct Entity* ent = (struct Entity*)ptr;
//bad:
struct DetailedEntity1* ent1 = (struct DetailedEntity1*)ent;
int a = ent->val; // may be an error here, invalid read
ent->val = 117; // possible invali write
//OK:
if (ent->type == DETAILED_ENTITY_1) {
((struct DetailedEntity1*)ent)->val1;
} else if (ent->type == DETAILED_ENTITY_2) {
((struct DetailedEntity2*)ent)->val2;
}
As for function pointers - you should always use functions which exactly fit the declaration. Otherwise you may get unexpected results or segfaults.
When casting from pointer to pointer (structure or not) you must ensure that the memory is aligned in the exact same way. When casting entire structures the best way to ensure it is to use the same order of the same variables at the start, and differentiating structures only after the "common header". Also remember, that memory alignment may differ from machine to machine, so you can't just send a struct pointer as a byte array and receive it as byte array. You may experience unexpected behaviour or even segfaults.
When casting smaller to larger variable pointers, you must be very careful. Consider this code:
char* ptr = malloc (16);
ptr++;
uint64_t* uintPtr = ptr; // may cause an error, memory is not properly aligned
And also, there is the strict aliasing rule that you should follow.

You probably need a look at ... the C-faq maintained by Steve Summit (which used to be posted in the newsgroups, which means it was read and updated by a lot of the best programmers at the time, sometimes the conceptors of the langage itself).
There is an abridged version too, which is maybe more palatable and still very, very, very, very useful. Reading the whole abridged is, I believe, mandatory if you use C.

Is a struct of pointers guaranteed to be represented without padding bits?

I have a linked list, which stores groups of settings for my application:
typedef struct settings {
struct settings* next;
char* name;
char* title;
char* desc;
char* bkfolder;
char* srclist;
char* arcall;
char* incfold;
} settings_row;
settings_row* first_profile = { 0 };
#define SETTINGS_PER_ROW 7
When I load values into this structure, I don't want to have to name all the elements. I would rather treat it like a named array -- the values are loaded in order from a file and placed incrementally into the struct. Then, when I need to use the values, I access them by name.
//putting values incrementally into the struct
void read_settings_file(settings_row* settings){
char* field = settings + sizeof(void*);
int i = 0;
while(read_value_into(field[i]) && i++ < SETTINGS_PER_ROW);
}
//accessing components by name
void settings_info(settings_row* settings){
printf("Settings 'profile': %s\n", settings.title);
printf("Description: %s\n", settings.desc);
printf("Folder to backup to: %s\n", settings.bkfolder);
}
But I wonder, since these are all pointers (and there will only ever be pointers in this struct), will the compiler add padding to any of these values? Are they guaranteed to be in this order, and have nothing between the values? Will my approach work sometimes, but fail intermittently?
edit for clarification
I realize that the compiler can pad any values of a struct--but given the nature of the struct (a struct of pointers) I thought this might not be a problem. Since the most efficient way for a 32 bit processor to address data is in 32 bit chunks, this is how the compiler pads values in a struct (ie. an int, short, int in a struct will add 2 bytes of padding after the short, to make it into a 32 bit chunk, and align the next int to the next 32 bit chunk). But since a 32 bit processor uses 32 bit addresses (and a 64 bit processor uses 64 bit addresses (I think)), would padding be totally unnecessary since all of the values of the struct (addresses, which are efficient by their very nature) are in ideal 32 bit chunks?
I am hoping some memory-representation / compiler-behavior guru can come shed some light on whether a compiler would ever have a reason to pad these values

Under POSIX rules, all pointers (both function pointers and data pointers) are all required to be the same size; under just ISO C, all data pointers are convertible to 'void *' and back without loss of information (but function pointers need not be convertible to 'void *' without loss of information, nor vice versa).
Therefore, if written correctly, your code would work. It isn't written quite correctly, though! Consider:
void read_settings_file(settings_row* settings)
{
char* field = settings + sizeof(void*);
int i = 0;
while(read_value_into(field[i]) && i++ < SETTINGS_PER_ROW)
;
}
Let's assume you're using a 32-bit machine with 8-bit characters; the argument is not all that significantly different if you're using 64-bit machines. The assignment to 'field' is all wrong, because settings + 4 is a pointer to the 5th element (counting from 0) of an array of 'settings_row' structures. What you need to write is:
void read_settings_file(settings_row* settings)
{
char* field = (char *)settings + sizeof(void*);
int i = 0;
while(read_value_into(field[i]) && i++ < SETTINGS_PER_ROW)
;
}
The cast before addition is crucial!
C Standard (ISO/IEC 9899:1999):
6.3.2.3 Pointers
A pointer to void may be converted to or from a pointer to any incomplete or object
type. A pointer to any incomplete or object type may be converted to a pointer to void
and back again; the result shall compare equal to the original pointer.
[...]
A pointer to a function of one type may be converted to a pointer to a function of another
type and back again; the result shall compare equal to the original pointer. If a converted
pointer is used to call a function whose type is not compatible with the pointed-to type,
the behavior is undefined.

In many cases pointers are natural word sizes, so the compiler is unlikely to pad each member, but that doesn't make it a good idea. If you want to treat it like an array you should use an array.
I'm thinking out loud here so there's probably many mistakes but perhaps you could try this approach:
enum
{
kName = 0,
kTitle,
kDesc,
kBkFolder,
kSrcList,
kArcAll,
kIncFold,
kSettingsCount
};
typedef struct settings {
struct settings* next;
char *settingsdata[kSettingsCount];
} settings_row;
Set the data:
settings_row myRow;
myRow.settingsData[kName] = "Bob";
myRow.settingsData[kDescription] = "Hurrrrr";
...
Reading the data:
void read_settings_file(settings_row* settings){
char** field = settings->settingsData;
int i = 0;
while(read_value_into(field[i]) && i++ < SETTINGS_PER_ROW);
}

It's not guaranteed by the C standard. I've a sneaking suspicion, that I don't have time to check right now either way, that it guarantees no padding between the char* fields, i.e. that consecutive fields of the same type in a struct are guaranteed to be layout-compatible with an array of that type. But even if so, you're on your own between the settings* and the first char*, and also between the last char* and the end of the struct. But you could use offsetof to deal with the first issue, and I don't think the second affects your current code.
However, what you want is almost certainly guaranteed by your compiler, which somewhere in its documentation will set out its rules for struct layout, and will almost certainly say that all pointers to data are word sized, and that a struct can be the size of 8 words without additional padding. But if you want to write highly portable code, you have to use only the guarantees in the standard.
The order of fields is guaranteed. I also don't think you'll see intermittent failure - AFAIK the offset of each field in that struct will be consistent for a given implementation (meaning the combination of compiler and platform).
You could assert that sizeof(settings*) == sizeof(char*) and sizeof(settings_row) == sizeof(char*)*8. If both those hold, there is no room for any padding in the struct, since fields are not allowed to "overlap". If you ever hit a platform where they don't hold, you'll find out.
Even so, if you want an array, I'd be inclined to say use an array, with inline accessor functions or macros to get the individual fields. Whether your trick works or not, it's even easier not to think about it at all.

Although not a duplicate, this probably answers your question:
Why isn't sizeof for a struct equal to the sum of sizeof of each member?
It's not uncommon for applications to write an entire struct into a file and read it back out again. But this suffers from the possibility that one day the file will need to be read back on another platform, or by another version of the compiler that packs the struct differently. (Although this can be dealt with by specially-written code that understands the original packing format).

Technically, you can rely only on the order; the compiler could insert padding. If different pointers were of different size, or if the pointer size wasn't a natural word size, it might insert padding.
Practically speaking, you could get away with it. I wouldn't recommend it; it's a bad, dirty trick.
You could achieve your goal with another level of indirection (what doesn't that solve?), or by using a temporary array initialized to point to the various members of the structure.

It's not guaranteed, but it will work fine in most cases. It won't be intermittent, it will either work or not work on a particular platform with a particular build. Since you're using all pointers, most compilers won't mess with any padding.
Also, if you wanted to be safer, you could make it a union.

You can't do that the way you are trying. The compiler is allowed to pad any and all members of the struct. I do not believe it is allowed to reorder the fields.
Most compilers have an attribute that can be applied to the struct to pack it (ie to turn it into a collection of tightly packed storage with no padding), but the downside is that this generally affects performance. The packed flag will probably allow you to use the struct the way you want, but it may not be portable across various platforms.
Padding is designed to make field access as efficient as possible on the target architecture. It's best not to fight it unless you have to (ie, the struct goes to a disk or over a network.)

It seems to me that this approach creates more problems than it solves.
When you read this code six months from now, will you still be aware of all the subtleties of how the compiler pads a struct?
Would someone else, who didn't write the code?
If you must use the struct, use it in the canonical way and just write a function which
assigns values to each field separately.
You could also use an array and create macros to give field names to indices.
If you get too "clever" about optimizing your code, you will end up with slower code anyway, since the compiler won't be able to optimize it as well.

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight

What are the semantics of overlapping objects in C? - c

Related

What is the point of "pointer types" when you dynamically allocate memory?

Is it safe to memcpy to a dynamic storage struct?

Using an allocated space to store multiple arrays

Typecasting of pointers in C

Is a struct of pointers guaranteed to be represented without padding bits?

Categories

Resources