Is casting incomplete struct pointers undefined behavior? - c

I was recently reading about the strict aliasing rules, and I was wondering whether casting a pointer to an incomplete struct is undefined behavior.
Example 1:
#include <stdlib.h>
struct abc;
int main(int argc, char *argv[])
{
struct abc *mystruct;
char *buf;
buf = malloc(100);
mystruct = (struct abc*)buf;
// and then mystruct could be passed to a function, where it is
// cast back to a "char *"; "struct abc" will never be completed.
return 0;
}
Example 2:
struct abc1;
struct abc2;
int foo(struct abc1 *mystruct1)
{
struct abc2 *mystruct2;
mystruct2 = (struct abc2 *)mystruct1;
// and then mystruct2 could be passed to a function, where it is
// cast to a "char *"; both structs stay incomplete.
return 0;
}
So, my question: is casting pointers to incomplete structs, as in these two examples, prohibited by the C11 standard, and if so, which part of the standard forbids it?

One key relevant part of the standard is C11 §6.2.5 Types ¶28:
28 A pointer to void shall have the same representation and alignment requirements as a pointer to a character type.48) Similarly, pointers to qualified or unqualified versions of compatible types shall have the same representation and alignment requirements. All pointers to structure types shall have the same representation and alignment requirements as each other. All pointers to union types shall have the same representation and alignment requirements as each other. Pointers to other types need not have the same representation or alignment requirements.
48) The same representation and alignment requirements are meant to imply interchangeability as arguments to functions, return values from functions, and members of unions.
Another is §6.3 Conversions and particularly §6.3.2.3 Pointers ¶7:
7 A pointer to an object type may be converted to a pointer to a different object type. If the resulting pointer is not correctly aligned68) for the referenced type, the behavior is undefined. Otherwise, when converted back again, the result shall compare equal to the original pointer. When a pointer to an object is converted to a pointer to a character type, the result points to the lowest addressed byte of the object. Successive increments of the result, up to the size of the object, yield pointers to the remaining bytes of the object.
68) In general, the concept ''correctly aligned'' is transitive: if a pointer to type A is correctly aligned for a pointer to type B, which in turn is correctly aligned for a pointer to type C, then a pointer to type A is correctly aligned for a pointer to type C.
Thus, my understanding is that there is no problem with either example code fragment shown in the question. The structure types are incomplete, but that is not a problem for the operations shown. The 'and then' sections of the code need not be problematic — it depends on what is actually there.
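As a minimal sketch of the first example's round trip, assuming the storage comes from malloc() so that alignment is never an issue, the conversions can be checked like this. The names abc_create, abc_destroy and abc_roundtrip_ok are hypothetical helpers, not part of the question.

```c
#include <stdlib.h>

struct abc;   /* incomplete here and never completed */

/* Hypothetical helper: hands out raw storage behind an opaque handle. */
struct abc *abc_create(size_t size)
{
    char *buf = malloc(size);      /* suitably aligned for any object type */
    return (struct abc *)buf;      /* conversion only; nothing is dereferenced */
}

/* Hypothetical helper: recovers the byte pointer and releases the storage. */
void abc_destroy(struct abc *handle)
{
    char *buf = (char *)handle;    /* compares equal to the pointer malloc returned */
    free(buf);
}

/* Returns 1 if char * -> struct abc * -> char * preserves the pointer. */
int abc_roundtrip_ok(void)
{
    char *buf = malloc(100);
    struct abc *handle;
    int same;

    if (buf == NULL)
        return 0;
    handle = (struct abc *)buf;
    same = ((char *)handle == buf);
    free(buf);
    return same;
}
```

Nothing here ever needs the size or layout of struct abc, which is why the incomplete type is sufficient.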

Related

Is it legal C to obtain the pointer to a struct from the pointer to its 2nd member?

I'm wondering if the line preceded by the comment "Is this legal C?" (in the function dumpverts() at the bottom) is legal C or not:
#include <stdio.h>
#include <stdlib.h>
#include <stddef.h>
struct stvertex
{
double x;
double y;
char tag;
};
struct stmesh
{
size_t nverts;
struct stvertex verts[]; /* flexible array member */
};
void dumpverts(struct stvertex *ptr);
int main(int argc, char **argv)
{
size_t f;
size_t usr_nverts=5; /* this would come from the GUI */
struct stmesh *m = malloc(sizeof(struct stmesh) + usr_nverts*sizeof(struct stvertex));
if(m==NULL) return EXIT_FAILURE;
m->nverts=usr_nverts;
for(f=0;f<m->nverts;f++)
{
m->verts[f].x = f*10.0; /* dumb values just for testing */
m->verts[f].y = f*7.0;
m->verts[f].tag = 'V';
}
dumpverts( &(m->verts[0]) );
return EXIT_SUCCESS;
}
void dumpverts(struct stvertex *ptr) /* Here is where the juice is */
{
size_t f;
/* Is this legal C? */
struct stmesh *themesh = (struct stmesh *)((char *)ptr - offsetof(struct stmesh, verts));
for(f=0;f<themesh->nverts;f++)
{
printf("v[%zu] = (%g,%g) '%c'\n", f, themesh->verts[f].x, themesh->verts[f].y, themesh->verts[f].tag);
}
fflush(stdout);
}
I tend to believe it's legal, but I'm not 100% sure if the strict aliasing rule would permit the cast from char * to struct stmesh * like the interesting line in the dumpverts() function body is doing.
Basically, that line is obtaining the pointer to the struct stmesh from the pointer to its second member. I don't see any alignment-related potential issues, because the memory for the whole struct stmesh came from malloc(), so the beginning of the struct is "suitably aligned". But I'm not sure about the strict aliasing rule, as I said.
If it breaks strict aliasing, can it be made compliant without changing the prototype of the dumpverts() function?
If you wonder what I want this for, it's mainly for learning where the limits of offsetof() are. Yes, I know dumpverts() should be receiving a pointer to struct stmesh instead. But I'm wondering if obtaining the struct stmesh pointer programmatically would be possible in a legal way.
Yes, it's valid. You can convert any non-function pointer to and from char *: there's an explicit part of the standard allowing that:
C17, section 6.3.2.3, clause 7:
When a pointer to an object is converted to a pointer to a character type, the result points to the lowest addressed byte of the object. Successive increments of the result, up to the size of the object, yield pointers to the remaining bytes of the object.
The reason this is allowed is exactly so you can do tricks like the one you're showing. Note, however, that this is only valid if the pointer comes from a struct stmesh in the first place (even if you don't have that struct in scope when you're doing that).
Sidenote: you never need offsetof(struct stmesh, nverts), the offset of the first member: it's guaranteed to be zero. (The offset of verts, which your example uses, is not.) Section 6.7.2.1, clause 15:
A pointer to a structure object, suitably converted, points to its initial member (or if that member is a bit-field, then to the unit in which it resides), and vice versa. There may be unnamed padding within a structure object, but not at its beginning.
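The first-member guarantee quoted above can be illustrated with a small check. This is only a sketch that reuses the question's struct definitions; first_member_at_offset_zero is a hypothetical name.

```c
#include <stddef.h>
#include <stdlib.h>

struct stvertex { double x; double y; char tag; };
struct stmesh   { size_t nverts; struct stvertex verts[]; };

/* Returns 1 if the first-member guarantees hold for an allocated mesh. */
int first_member_at_offset_zero(void)
{
    struct stmesh *m = malloc(sizeof *m + 2 * sizeof(struct stvertex));
    int ok;

    if (m == NULL)
        return 0;
    ok = offsetof(struct stmesh, nverts) == 0                /* no padding at the start */
      && (void *)m == (void *)&m->nverts                     /* same address either way */
      && offsetof(struct stmesh, verts) >= sizeof(size_t);   /* verts lies after nverts */
    free(m);
    return ok;
}
```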
Pedantically, there is nothing in the C standard explicitly stating that the code is well-defined. I'd say that it's somewhere between questionable and undefined behavior.
Strict aliasing concerns: not a problem. To de-reference some address through a pointer to struct is fine as far as strict aliasing goes, as long as what's actually stored at that location is of the correct effective type (C17 6.5 §6 and §7).
Character pointer conversion: questionable. Any type in C may be inspected byte by byte through the use of a character pointer. This is in line with "Strict aliasing" C17 6.5 §7 and also the pointer conversion rules in C17 6.3.2.3, emphasis mine:
A pointer to an object type may be converted to a pointer to a different object type. If the
resulting pointer is not correctly aligned for the referenced type, the behavior is
undefined. Otherwise, when converted back again, the result shall compare equal to the
original pointer. When a pointer to an object is converted to a pointer to a character type,
the result points to the lowest addressed byte of the object. Successive increments of the
result, up to the size of the object, yield pointers to the remaining bytes of the object.
Your pointer does not point to the lowest addressed byte in the surrounding struct type. Nor do you use successive increments. Alignment is another issue but I don't think it will be a problem in your case.
Pointer arithmetic: questionable. Pointer arithmetic is defined by the additive operators in C17 6.5.6, which strictly speaking only allow pointer arithmetic within arrays, where a single struct variable may be regarded as an array of 1 such struct. To make sense of the previously quoted 6.3.2.3 in terms of pointer arithmetic, I think the struct must be interpreted as a character array of sizeof(the_struct) bytes. Decrementing a character pointer that points into the middle of a struct is not covered by the rules of pointer arithmetic; strictly speaking it falls under §8, "...otherwise, the behavior is undefined".
Initial struct member/initial common sequence rules: do not apply. There's a special rule allowing us to convert between a struct pointer and a pointer to its first element (C17 6.7.2.1 §15) but that does not apply here. There is also a special rule for the "common initial sequence" of two structs in a union, which does not apply here either.
This might be a more well-defined version:
dumpverts( (uintptr_t) &(m->verts[0]) );
...
void dumpverts (uintptr_t ptr)
{
struct stmesh* themesh = (struct stmesh *)(ptr - offsetof(struct stmesh, verts));
This is plain integer arithmetic. Your only concerns here are alignment and strict aliasing, which should be OK. Integer to/from pointer conversions with uintptr_t are otherwise fine (implementation-defined), C17 6.3.2.3 §5 and §6.
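A self-contained sketch of the uintptr_t variant follows. It assumes the implementation provides uintptr_t (the type is optional) and that subtracting a byte offset from the converted integer yields the address of the enclosing object, which is implementation-defined but holds on common flat-address platforms. The helper names mesh_from_verts and recovered_nverts are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct stvertex { double x; double y; char tag; };
struct stmesh   { size_t nverts; struct stvertex verts[]; };

/* Recover the enclosing mesh from the address of its verts member,
   using integer arithmetic on uintptr_t instead of char * arithmetic.
   Only meaningful if addr really came from &mesh->verts[0]. */
struct stmesh *mesh_from_verts(uintptr_t addr)
{
    return (struct stmesh *)(addr - offsetof(struct stmesh, verts));
}

/* Returns the vertex count read back through the recovered pointer,
   or 0 on allocation failure or if the pointers disagree. */
size_t recovered_nverts(size_t n)
{
    struct stmesh *m = malloc(sizeof *m + n * sizeof(struct stvertex));
    struct stmesh *back;
    size_t result;

    if (m == NULL)
        return 0;
    m->nverts = n;
    back = mesh_from_verts((uintptr_t)&m->verts[0]);
    result = (back == m) ? back->nverts : 0;
    free(m);
    return result;
}
```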

Is casting from char * to other compatibly-aligned pointer types defined behavior in C89?

Are explicit casts from char * to other pointer types fully defined behavior according to ANSI C89 if the pointer is guaranteed to meet the alignment requirements of the type you're casting to? Here's an example of what I mean:
/* process.c */
#include <assert.h>
#include <stdlib.h>
void *process(size_t elem_size, size_t cap) {
void *arr;
assert(cap > 5);
arr = malloc(elem_size * cap);
/* set id of element 5 to 0xffffff */
*(long *)((char *)arr + elem_size*5) = 0xffffff;
/* rest of the code omitted */
return arr;
}
/* main.c */
#include <stdio.h>
#include <stdlib.h>
void *process(size_t elem_size, size_t cap);
struct some_struct { long id; /* other members omitted */ };
struct other_struct { long id; /* other members omitted */ };
int main(int argc, char **argv) {
struct some_struct *s = process(sizeof(struct some_struct), 40);
printf("%lx\n", s[5].id);
return 0;
}
This code compiles without warnings and works as expected on my machine but I'm not fully sure if these kinds of casts are defined behavior.
C89 draft, section 4.10.3 (Memory management functions):
The pointer returned if the allocation succeeds is suitably aligned so that it may be assigned to a pointer to any type of object and then used to access such an object in the space allocated (until the space is explicitly freed or reallocated).
C89 draft, section 3.3.4 (Cast operators):
A pointer to an object or incomplete type may be converted to a pointer to a different object type or a different incomplete type. The resulting pointer might not be valid if it is improperly aligned for the type pointed to. It is guaranteed, however, that a pointer to an object of a given alignment may be converted to a pointer to an object of the same alignment or a less strict alignment and back again; the result shall compare equal to the original pointer.
This clearly specifies what happens if you cast from struct some_struct * to char * and back to struct some_struct * but in my case the code responsible for allocation doesn't have access to the full struct definition so it can't initially specify the pointer type to be struct some_struct * so I'm not sure if the rule still applies.
If the code I posted is technically UB, is there another standards-compliant way to modify array elements without knowing their full type? Are there real-world implementations where you would expect it to do something other than ((struct some_struct *)arr)[5].id = 0xffffff;?
This code compiles without warnings and works as expected on my machine but I'm not fully sure if these kinds of casts are defined behavior.
In general, the casts have defined behavior, but that behavior can be that the result is not a valid pointer. Thus dereferencing the result of the cast may produce UB.
Considering only function process(), then, it is possible that the result of its evaluation of (long *)((char *)arr + elem_size*5) is an invalid pointer. If and only if it is invalid, the attempt to use it to assign a value to the object it hypothetically points to produces UB.
However, main()'s particular usage of that function is fine:
In process(), the pointer returned by malloc and stored in arr is suitably aligned for a struct some_struct (and for any other type).
The compiler must choose a size and layout for struct some_struct such that every element and every member of every element of an array of such structures is properly aligned.
Arrays are composed of contiguous objects of the array's element type, without gaps (though the element types themselves may contain padding if they are structures or unions).
Therefore, (char *)arr + n * sizeof(struct some_struct) must be suitably aligned for a struct some_struct, for any integer n such that the result points within or just past the end of the allocated region. This computation is closely related to the computations involved in accessing an array of struct some_struct via the indexing operator.
struct some_struct has a long as its first member, and that member must appear at offset 0 from the beginning of the struct. Therefore, every pointer that is suitably aligned for a struct some_struct must also be suitably aligned for a long.
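The alignment argument above can be exercised in a single translation unit. This is a sketch under the same assumptions as the question: the element type starts with a long member and the storage comes from malloc(). The member other and the helper id_of_element_5 are hypothetical additions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct some_struct { long id; long other; };  /* "other" is a placeholder member */

/* Knows only the element size, not the element type. */
void *process(size_t elem_size, size_t cap)
{
    void *arr;

    assert(cap > 5);
    arr = malloc(elem_size * cap);
    if (arr == NULL)
        return NULL;
    /* Element 5 starts at a multiple of elem_size from a malloc'd base,
       and its first member (a long) sits at offset 0 of that element. */
    *(long *)((char *)arr + elem_size * 5) = 0xffffffL;
    return arr;
}

/* Returns the id seen through the real struct type, or -1 on failure. */
long id_of_element_5(void)
{
    struct some_struct *s = process(sizeof(struct some_struct), 40);
    long id;

    if (s == NULL)
        return -1;
    id = s[5].id;
    free(s);
    return id;
}
```

If process() were called with an elem_size that is not a multiple of the alignment of long, the cast result could be misaligned, which is the case the answer warns about.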
Determining whether an operation upholds or violates "type aliasing" constraints requires being able to answer two questions:
For purposes of the aliasing rules, when does a region of storage hold an "object" of a particular type?
For purposes of the aliasing rules, is a particular access performed "by" an lvalue expression of a particular type?
For your question, the second issue above is the most relevant: if a pointer or lvalue of type T1 is used to derive a value of type T2* which is dereferenced to access storage, the access may sometimes need to be regarded for purposes of aliasing as though performed by an lvalue of type T1, and sometimes as though performed by one of type T2, but the Standard fails to offer any guidance as to which interpretation should apply when. Constructs like yours would be processed predictably by implementations that don't abuse the Standard as an excuse to behave nonsensically, but could be processed nonsensically by conforming but obtuse implementations that do abuse the Standard in such fashion.
The authors of C89 didn't expect anyone to care about the precise boundaries between constructs whose behavior was defined by the Standard, versus those which all implementations were expected to process identically but which the Standard didn't actually define, and thus saw no need to define the terms "object" and "by" with sufficient precision to unambiguously answer the above questions in ways that would yield defined program behavior in all sensible cases.

Reading through a char array passed as void* with pointer incrementation and later read as chars and other datatypes?

To clear up any misunderstanding from the title (I wasn't sure how to phrase the question): I want to read from a file (a char array) and pass it as a void* so I can read it independently of data type by incrementing the pointer. Here's a simple example of what I want to do in C code:
char input[] = "D\0\0Ckjh\0";
char* pointer = &input[0]; //let's say 0x00000010
char type1 = *pointer; //should be 'D'
pointer += sizeof(char); //0x00000011
uint16_t value1 = *(uint16_t*)pointer; //should be 0
pointer += sizeof(uint16_t); //0x00000013
char type2 = *pointer; //should be 'C'
pointer += sizeof(char); //0x00000014
uint32_t value2 = *(uint32_t*)pointer; //should be 1802135552
This is just for educational purposes, so I would like to know whether it is possible, or whether there is a way to achieve the same or a similar goal. It would also be nice to know about speed: would it be faster to keep the array and bit-shift the chars as I read them, or is this approach actually faster?
Edit: updated the C code and changed void* to char*.
This is wrong in two ways:
void is an incomplete type that cannot be completed. An incomplete type is a type without a known size. In order to do pointer arithmetic, the size must be known. The same is true for dereferencing a pointer. Some compilers treat void as having the size of a char, but that's an extension you should never rely on. Incrementing a pointer to void is wrong and can't work.
What you have is an array of char. Accessing this array through a pointer of a different type violates strict aliasing; you're not allowed to do that.
That's actually not what your current code does -- looking at this line:
uint32_t value2 = (int)*pointer; //should be 1802135552
You're just converting the single byte (assuming your pointer points to char, see my first point) to a uint32_t. What you probably meant is
uint32_t value2 = *(uint32_t *)pointer; //should be 1802135552
which might do what you expect, but is technically undefined behavior.
The relevant reference for this second point is e.g. in §6.5 p7 in N1570, the latest draft for C11:
An object shall have its stored value accessed only by an lvalue expression that has one of
the following types:
— a type compatible with the effective type of the object,
— a qualified version of a type compatible with the effective type of the object,
— a type that is the signed or unsigned type corresponding to the effective type of the
object,
— a type that is the signed or unsigned type corresponding to a qualified version of the
effective type of the object,
— an aggregate or union type that includes one of the aforementioned types among its
members (including, recursively, a member of a subaggregate or contained union), or
— a character type.
The reasoning for this very strict rule is for example that it enables compilers to do optimizations based on the assumption that two pointers of different types (except char *) can never alias. Other reasons include alignment restrictions on some platforms.
Even if you fix your code to cast the pointer to the correct type (like int *) before dereferencing it, you might have problems with alignment. For example, on some architectures you simply cannot read a 4-byte int if it is not aligned to a 4-byte boundary.
A solution which would definitely work is to use something like this:
int result;
memcpy(&result, pointer, sizeof(result));
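Applied to the question's buffer, the memcpy approach might look like the sketch below. The struct parsed layout and the function parse_record are hypothetical names introduced for illustration; the multi-byte values come out in the host's byte order, so value2 differs between little-endian and big-endian machines.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical record layout, taken from the question's buffer:
   1-byte tag, 2-byte value, 1-byte tag, 4-byte value. */
struct parsed {
    char     type1;
    uint16_t value1;
    char     type2;
    uint32_t value2;
};

/* Copies each field out of the byte buffer. No misaligned or
   type-punned access happens, because memcpy works byte by byte. */
struct parsed parse_record(const char *p)
{
    struct parsed r;

    r.type1 = *p;
    p += sizeof r.type1;
    memcpy(&r.value1, p, sizeof r.value1);
    p += sizeof r.value1;
    r.type2 = *p;
    p += sizeof r.type2;
    memcpy(&r.value2, p, sizeof r.value2);
    return r;
}
```

Compilers commonly turn such small fixed-size memcpy calls into plain loads where the target allows it, so this form need not be slower than the cast. Explicit shifts on unsigned char values remain the option when a fixed byte order is required.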
UPDATE:
in the updated code in the question
uint16_t value1 = *(uint16_t*)pointer;
exactly violates strict aliasing. It's invalid code.
For more details, read the rest of the answer.
Initial version:
Technically, you are not allowed to dereference a void pointer in the first place.
Quoting C11, chapter §6.5.3.2
[...] If the operand points to a function, the result is
a function designator; if it points to an object, the result is an lvalue designating the
object. If the operand has type ‘‘pointer to type’’, the result has type ‘‘type’’. [...]
but, a void is a forever-incomplete type, so the storage size is not known, hence the dereference is not possible.
A gcc extension allows you to perform arithmetic on void pointers, treating them as aliases for char pointers, but it is better not to rely on this. Please cast the pointer to either a character type or the actual type (or a compatible one) and then go ahead with the dereference.
That said, if you cast the pointer to a type that is neither a character type nor compatible with the object's actual type, and then dereference it, you'll violate the strict aliasing rule.
As mentioned in chapter §6.5,
An object shall have its stored value accessed only by an lvalue expression that has one of
the following types
— a type compatible with the effective type of the object,
— a qualified version of a type compatible with the effective type of the object,
— a type that is the signed or unsigned type corresponding to the effective type of the
object,
— a type that is the signed or unsigned type corresponding to a qualified version of the
effective type of the object,
— an aggregate or union type that includes one of the aforementioned types among its
members (including, recursively, a member of a subaggregate or contained union), or
— a character type.
and, chapter §6.3.2.3
[....] When a pointer to an object is converted to a pointer to a character type,
the result points to the lowest addressed byte of the object. Successive increments of the
result, up to the size of the object, yield pointers to the remaining bytes of the object.
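The character-type exception in the two quotes can be illustrated with a byte walk over an arbitrary object. This is a sketch assuming 8-bit bytes; byte_sum and byte_sum_of_u32 are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Sums the bytes of any object by walking it through an unsigned char
   pointer, which the aliasing rule permits for every object type. */
unsigned long byte_sum(const void *obj, size_t size)
{
    const unsigned char *bytes = obj;  /* points at the lowest addressed byte */
    unsigned long sum = 0;
    size_t i;

    for (i = 0; i < size; i++)
        sum += bytes[i];               /* successive increments stay inside obj */
    return sum;
}

/* The byte sum of a uint32_t does not depend on the host's byte order. */
unsigned long byte_sum_of_u32(uint32_t v)
{
    return byte_sum(&v, sizeof v);
}
```

The reverse direction, reading a char array through a uint32_t lvalue, is the one the aliasing rule does not cover.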

Is pointer conversion through a void pointer well defined?

Suppose I have some structures defined like:
struct foo { int a; };
struct bar { struct foo r; int b; };
struct baz { struct bar z; int c; };
Does the C standard guarantee that the following code is strictly conforming?
struct baz x;
struct foo *p = (void *)&x;
assert(p == &x.z.r);
The motivation for this construct is to provide a consistent programming idiom for casting to a pointer type that is known to be compatible.
Now, this is what C says about how structures and its initial members are convertible:
Within a structure object, the non-bit-field members and the units in which bit-fields reside have addresses that increase in the order in which they are declared. A pointer to a structure object, suitably converted, points to its initial member (or if that member is a bit-field, then to the unit in which it resides), and vice versa. There may be unnamed padding within a structure object, but not at its beginning.
C11 §6.7.2.1¶15
This is what it says about void pointer conversions:
A pointer to void may be converted to or from a pointer to any object type. A pointer to any object type may be converted to a pointer to void and back again; the result shall compare equal to the original pointer.
C11 §6.3.2.3¶1
And this is what it says about converting between object pointer types:
A pointer to an object type may be converted to a pointer to a different object type. If the resulting pointer is not correctly aligned68) for the referenced type, the behavior is undefined. Otherwise, when converted back again, the result shall compare equal to the original pointer.
68) In general, the concept ‘‘correctly aligned’’ is transitive: if a pointer to type A is correctly aligned for a pointer to type B, which in turn is correctly aligned for a pointer to type C, then a pointer to type A is correctly aligned for a pointer to type C.
C11 §6.3.2.3¶7
My understanding from the above is that converting an object pointer to an object pointer of a different type via a void * conversion is perfectly fine. But, I got a comment that suggests otherwise.
Your example is strictly conforming.
The 2nd sentence from §6.7.2.1 ¶15 (A pointer to a structure object, suitably converted, points to its initial member ... and vice versa.) guarantees the following equalities:
(struct bar *) &x == &(x.z)
(struct foo *) &(x.z) == &(x.z.r)
As you are at the beginning of the struct, no padding can occur, and my understanding of the standard is that the address of a struct and of its first element are the same.
So struct foo *p = (void *) &x; is correct as would be struct foo *p = (struct foo *) &x;
In that particular case, the alignment is guaranteed to be correct per §6.7.2.1 ¶15. And it is always allowed to pass via a void *, but it is not necessary, because §6.3.2.3 ¶7 allows the conversion between pointers to different objects, provided there is no alignment problem.
And it should be noted that §6.3.2.3 ¶7 also says: When a pointer to an object is converted to a pointer to a character type, the result points to the lowest addressed byte of the object. That means that all those pointers in fact point to the lowest addressed byte of x.z.r.a. So you could also pass via pointers to char, because we also have:
(char *) &x == (char *) &(x.z) == (char *) &(x.z.r) == (char *) &(x.z.r.a)
To complete the analysis, it is necessary to see how pointer equality is defined:
... If one operand is a pointer to an object type and the other is a pointer to a qualified or unqualified version of void, the former is converted to the type of the latter.
Two pointers compare equal if and only if both are null pointers, both are pointers to the same object (including a pointer to an object and a subobject at its beginning) or function, both are pointers to one past the last element of the same array object, or one is a pointer to one past the end of one array object and the other is a pointer to the start of a different array object that happens to immediately follow the first array object in the address space.
C11 §6.5.9¶5-6
So, here is an argument that it is well-defined:
(void *)&x == (void *)&x.z §6.7.2.1¶15, §6.5.9¶5-6
(void *)&x.z == &x.z.r §6.7.2.1¶15, §6.5.9¶5-6
(void *)&x == &x.z.r transitive equality
(struct foo *)(void *)&x == (void *)&x §6.3.2.3¶1, §6.5.9¶5-6
(struct foo *)(void *)&x == &x.z.r transitive equality
The last step above is the essence of initializing p and the assertion from the code in the question.
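The chain of equalities can also be checked in code. This is a sketch using the question's three structs; initial_member_chain_ok is a hypothetical name.

```c
struct foo { int a; };
struct bar { struct foo r; int b; };
struct baz { struct bar z; int c; };

/* Returns 1 if every route to the innermost first member agrees. */
int initial_member_chain_ok(void)
{
    struct baz x = { { { 1 }, 2 }, 3 };
    struct foo *p = (void *)&x;          /* via void *, no explicit target cast */
    struct foo *q = (struct foo *)&x;    /* direct conversion */

    return p == &x.z.r
        && q == &x.z.r
        && (void *)&x == (void *)&x.z
        && (char *)&x == (char *)&x.z.r.a
        && p->a == 1;                    /* reads x.z.r.a through p */
}
```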
IMHO yes. Whatever a strict reading of the standard might make questionable, a given compiler allocates objects in memory by one rule: supply an address suitable for the object being placed there. Because each structure starts with its first member, transitivity means the structure itself is aligned to an address that also suits that member. This settles the doubt about nested structures having different addresses: the address cannot change between conversions, because it is defined by the same rules each time. That is of course not true for the later members of a structure, which may be non-contiguous in order to satisfy their own alignment requirements.
If you operate on the first element of a structure, it is guaranteed to have the same address as the structure itself.
Now have a look at one of the most widely used pieces of software around: the Independent JPEG Group's JPEGlib.
The whole library, compiled on many processors and machines, uses a technique that resembles C++ object management: it passes structures whose beginning is always the same, but which hold many other, different substructures and fields between calls.
This code compiles and runs on anything from toys to PCs to tablets, etc.
Yes, in terms of the language standard, your example is strictly conforming and thus perfectly legal. This essentially follows from two of the quotes you provided. The first one:
A pointer to void may be converted to or from a pointer to any object type.
This means that the assignment in your code performs a successful conversion from a struct baz pointer to a void pointer and then a successful conversion from the void pointer to a struct foo pointer, because both pointers are aligned equally. If that were not the case, we would have undefined behaviour under the 6.3.2.3 text that you quoted.
And the second one:
68) In general, the concept ‘‘correctly aligned’’ is transitive: if a pointer to type A is correctly aligned for a pointer to type B, which in turn is correctly aligned for a pointer to type C, then a pointer to type A is correctly aligned for a pointer to type C.
And this one is more important. It does not state (nor should it) that types A and C must be the same, which in turn allows them to differ. The only restriction is the alignment.
That's pretty much it.
Of course, however, such manipulations are unsafe for obvious reasons.

Is Cast to void** needed?

I have the compiler complaining (warning) about the following.
Am I missing something? I thought this didn't need a cast.
char* CurrentCh = some ptr value;
int size;
size = func(&CurrentCh);
with func defined like this
int func(void** ptr);
Compiler warning:
passing argument 1 of 'func'
from incompatible pointer type
Thx
In C you can pass any pointer type to a function that expects a void*. What it says is "I need a pointer to something, it doesn't matter what it points to". Whereas void** says "I need a pointer to a void*, not a pointer to another pointer type".
In C, void * is the generic pointer type. But void ** is not a generic pointer-to-pointer type! If you want to be able to pass a pointer to a pointer in a generic way, you should use void * anyway:
#include <stdio.h>
void func(void *ptr)
{
char **actual = ptr;
const char *data = *actual;
printf("%s\n", data);
}
int main(void)
{
char *test = "Hello, world";
func(&test);
return 0;
}
The cast is necessary as what you do is a form of type punning: You reinterpret the memory which is pointed to from char * to void *.
For these types, the C standard guarantees that this actually works as char * and void * have the same representation. For other type combinations, this may not be the case.
The relevant parts of the standard are section 6.2.5, §27
A pointer to void shall have the same representation and alignment requirements as a pointer to a character type. Similarly, pointers to qualified or unqualified versions of compatible types shall have the same representation and alignment requirements. All pointers to structure types shall have the same representation and alignment requirements as each other. All pointers to union types shall have the same representation and alignment requirements as each other. Pointers to other types need not have the same representation or alignment requirements.
and less relevant (but perhaps also interesting) section 6.3.2.3, §7
A pointer to an object or incomplete type may be converted to a pointer to a different object or incomplete type. If the resulting pointer is not correctly aligned for the pointed-to type, the behavior is undefined. Otherwise, when converted back again, the result shall compare equal to the original pointer. When a pointer to an object is converted to a pointer to a character type, the result points to the lowest addressed byte of the object. Successive increments of the result, up to the size of the object, yield pointers to the remaining bytes of the object.
Anything beyond that is implementation-specific.
In C, any object pointer converts implicitly to void*, but not to void**. You will need an explicit cast.
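An alternative that avoids casting char ** to void ** at all is to route the pointer through a real void * object. The sketch below assumes func() only reads and updates the pointer it is given; this func() body is a hypothetical stand-in, since the question does not show the real one, and advance_char_ptr and second_char are illustrative names.

```c
#include <stddef.h>

/* Hypothetical stand-in for the question's func(): it advances *ptr
   by one byte and reports how far it moved. */
int func(void **ptr)
{
    *ptr = (char *)*ptr + 1;
    return 1;
}

/* Calls func() on a char * without casting char ** to void **:
   the pointer is routed through a real void * object. */
int advance_char_ptr(char **current)
{
    void *tmp = *current;     /* char * -> void * is an implicit conversion */
    int size = func(&tmp);    /* &tmp really is a void ** */

    *current = tmp;           /* void * -> char * is implicit as well */
    return size;
}

/* Returns the character reached after one call on "Hello". */
char second_char(void)
{
    char text[] = "Hello";
    char *p = text;

    advance_char_ptr(&p);
    return *p;
}
```

The same pattern works for pointer types whose representation differs from void *, where the cast would not.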
