I always wondered why warnings such as "passing argument 1 of 'foo' makes pointer from integer without a cast" and the like are only warnings and not errors.
Actually these warnings are almost always errors.
Does somebody know what's the idea behind this?
Is it mostly to allow prehistoric code to be compiled without errors?
Or just to comply with the standard? Then the latter maybe needs some fixing.
Example:
int foo(int *bar)
{
*bar = 42;
}
void bar()
{
int n = 0;
foo(n); // this is obviously an error
...
}
Per 6.5.2.2 Function Calls, ¶ 7:
If the expression that denotes the called function has a type that does include a prototype, the arguments are implicitly converted, as if by assignment, to the types of the corresponding parameters, taking the type of each parameter to be the unqualified version of its declared type
The relevant text in 6.5.16.1 Simple Assignment is:
Constraints
One of the following shall hold:
the left operand has atomic, qualified, or unqualified arithmetic type, and the right has arithmetic type;
the left operand has an atomic, qualified, or unqualified version of a structure or union type compatible with the type of the right;
the left operand has atomic, qualified, or unqualified pointer type, and (considering the type the left operand would have after lvalue conversion) both operands are pointers to qualified or unqualified versions of compatible types, and the type pointed to by the left has all the qualifiers of the type pointed to by the right;
the left operand has atomic, qualified, or unqualified pointer type, and (considering the type the left operand would have after lvalue conversion) one operand is a pointer to an object type, and the other is a pointer to a qualified or unqualified version of void, and the type pointed to by the left has all the qualifiers of the type pointed to by the right;
the left operand is an atomic, qualified, or unqualified pointer, and the right is a null pointer constant; or
the left operand has type atomic, qualified, or unqualified _Bool, and the right is a pointer.
None of these allow the left operand as a pointer and the right operand as an integer. Thus, such an assignment (and by the first quoted text above, the function call) is a constraint violation. This means the compiler is required by the standard to "diagnose" it. However it's up to the compiler what it does beyond that. Yes, an error would be highly preferable, but just printing a warning is a low-quality way to satisfy the requirement to "diagnose" constraint violations like this.
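To make the constraint concrete, here is a minimal sketch of the question's example with the call corrected. The names set_answer and demo_call are made up for illustration; the commented-out line is the form that matches none of the bullets quoted above.

```c
#include <assert.h>

/* Hypothetical variant of the question's foo(): stores 42 through the pointer. */
void set_answer(int *out)
{
    *out = 42;
}

/* Returns the value written through the pointer. */
int demo_call(void)
{
    int n = 0;

    /* set_answer(n);  constraint violation: an int is passed where an int *
                       is expected, so a diagnostic is required */
    set_answer(&n);    /* int * to int *: satisfies the pointer bullet */

    assert(n == 42);
    return n;
}
```

Calling demo_call() from main returns 42; uncommenting the marked line is what produces the "makes pointer from integer without a cast" diagnostic.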
Does somebody know what's the idea behind this?
Is it mostly to allow prehistoric code to be compiled without errors?
Or just to comply with the standard? Then the latter maybe needs some fixing.
It is to comply with the standard in the sense that the standard requires conforming implementations to diagnose such issues, as @R.. describes in his answer. Implementations are not required to reject programs on account of such issues, however. As for why some compilers instead accept such programs, that would need to be evaluated on a per-implementation basis, but this quotation from the first edition of K&R may shed a bit of light:
5.6 Pointers are not Integers
You may notice in older C programs a rather cavalier attitude toward
copying pointers. It has generally been true that on most machines a
pointer may be assigned to an integer and back again; no scaling or
conversion takes place, and no bits are lost. Regrettably, this has
led to the taking of liberties with routines that return pointers
which are then merely passed to other routines -- the requisite
pointer declarations are often left out.
(Kernighan & Ritchie, The C Programming Language, 1st ed., 1978)
Notice in the first place that this long predates even C89. I'm a bit amused today that the authors were then talking about "older" C programs. But note too that even at that time, the C language as defined by K&R did not formally permit implicit conversion between pointers and integers (though it did permit casting between them).
Nevertheless, there were programs that relied on implicit conversion anyway, apparently because it happened to work on the targeted implementations. It was attractive, by some people's standards at the time, in conjunction with primordial C's implicit typing rules. One could let a variable or function intended to return or store a pointer default to type int by omitting its declaration altogether, and as long as it was interpreted as a pointer wherever it ultimately was used, everything usually happened to work as intended.
I'm inclined to guess that everything continuing to work as intended, thereby supporting backwards compatibility, was a consideration for compiler developers in continuing to accept implicit conversions, so that's "allow[ing] prehistoric code to be compiled." I note, however, that these days code with implicit conversions of this kind is much less likely to work as intended than it used to be, for many machines these days have 64-bit pointers but only 32-bit ints.
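The size mismatch mentioned above can be checked directly. The following is only a sketch with made-up function names; it assumes the implementation provides uintptr_t, which the standard makes optional but which is widely available.

```c
#include <assert.h>
#include <stdint.h>

/* Round-trips a pointer through uintptr_t, an integer type intended to hold
   a converted object pointer. The casts are explicit, as the language requires. */
int *roundtrip_through_uintptr(int *p)
{
    uintptr_t bits = (uintptr_t)(void *)p;
    return (int *)(void *)bits;
}

/* Returns 1 if int is too narrow to hold a pointer on this implementation,
   which is the case on typical 64-bit platforms (an assumption, not a rule). */
int int_is_narrower_than_pointer(void)
{
    return sizeof(int) < sizeof(void *);
}

int demo_roundtrip(void)
{
    int value = 7;
    int *back = roundtrip_through_uintptr(&value);
    assert(back == &value);   /* the round trip preserves the pointer */
    return *back;
}
```

Where int_is_narrower_than_pointer() returns 1, letting a pointer pass through a plain int, as old code did via implicit typing, would drop the upper bits.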
The behaviour of assigning an arithmetic type to a pointer is not well formed in the C standard. (See the answer provided by R.. for relevant sections.)
Your compiler (or the settings you're using) has decided to treat that as a warning.
Compilers have default settings and often support language extensions and those may be quite liberal.
Note that for anything outside the language specification, it's up to the implementers of the compiler to decide whether it's an error or whether they're going to interpret it as a language extension and (hopefully) issue a warning that the code is off the official piste.
I agree that's not ideal. My recommendation would be to treat it as an error, because it almost certainly is one, and casting an int to a pointer is the standard-supported way of being explicit and getting the same result (e.g. (int *)n).
I think you're using GCC and it's notorious for "helpfully" compiling things that it could better serve you by rejecting and making you use standard constructs.
Enable all warnings (-Wall on the gcc command-line) and make sure you understand and address them all appropriately.
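If you want this particular diagnostic to stop the build, recent GCC and Clang releases let you promote it individually, either with -Werror=int-conversion on the command line or per file with a pragma. A minimal sketch, assuming a compiler that knows the -Wint-conversion option name (store_42 and demo_strict are illustrative names):

```c
#include <assert.h>

/* GCC and Clang both honour this pragma; other compilers may ignore it.
   After this line, int-to-pointer conversions without a cast are hard errors. */
#pragma GCC diagnostic error "-Wint-conversion"

void store_42(int *out)
{
    *out = 42;
}

int demo_strict(void)
{
    int n = 0;
    /* store_42(n);   would now fail to compile instead of merely warning */
    store_42(&n);
    assert(n == 42);
    return n;
}
```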
Related
I've read various posts on Stack Overflow RE: the "dereferencing type-punned pointer" error. My understanding is that the error is essentially the compiler warning of the danger of accessing an object through a pointer of a different type (though an exception appears to be made for char*), which is an understandable and reasonable warning.
My question is specific to the code below: why does casting the address of a pointer to a void** qualify for this warning (promoted to error via -Werror)?
Moreover, this code is compiled for multiple target architectures, only one of which generates the warning/error - might this imply that it is legitimately a compiler version-specific deficiency?
// main.c
#include <stdlib.h>
typedef struct Foo
{
int i;
} Foo;
void freeFunc( void** obj )
{
if ( obj && * obj )
{
free( *obj );
*obj = NULL;
}
}
int main( int argc, char* argv[] )
{
Foo* f = calloc( 1, sizeof( Foo ) );
freeFunc( (void**)(&f) );
return 0;
}
If my understanding, stated above, is correct, then a void**, being still just a pointer, should be a safe target for this cast.
Is there a workaround not using lvalues that would pacify this compiler-specific warning/error? I.e. I understand that and why this will resolve the issue, but I would like to avoid this approach because I want to take advantage of freeFunc() NULLing an intended out-arg:
void* tmp = f;
freeFunc( &tmp );
f = NULL;
Problem compiler (one of one):
user@8d63f499ed92:/build$ /usr/local/crosstool/x86-fc3/bin/i686-fc3-linux-gnu-gcc --version && /usr/local/crosstool/x86-fc3/bin/i686-fc3-linux-gnu-gcc -Wall -O2 -Werror ./main.c
i686-fc3-linux-gnu-gcc (GCC) 3.4.5
Copyright (C) 2004 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
./main.c: In function `main':
./main.c:21: warning: dereferencing type-punned pointer will break strict-aliasing rules
user@8d63f499ed92:/build$
Not-complaining compiler (one of many):
user@8d63f499ed92:/build$ /usr/local/crosstool/x86-rh73/bin/i686-rh73-linux-gnu-gcc --version && /usr/local/crosstool/x86-rh73/bin/i686-rh73-linux-gnu-gcc -Wall -O2 -Werror ./main.c
i686-rh73-linux-gnu-gcc (GCC) 3.2.3
Copyright (C) 2002 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
user@8d63f499ed92:/build$
Update: I've further discovered the warning appears to be generated specifically when compiled with -O2 (still with the noted "problem compiler" only)
A value of type void** is a pointer to an object of type void*. An object of type Foo* is not an object of type void*.
There is an implicit conversion between values of type Foo* and void*. This conversion may change the representation of the value. Similarly, you can write int n = 3; double x = n; and this has the well-defined behavior of setting x to the value 3.0, but double *p = (double*)&n; has undefined behavior (and in practice will not set p to a “pointer to 3.0” on any common architecture).
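The difference between converting a value and reinterpreting its bytes can be sketched as follows. The function names are made up, and memcpy is used instead of a pointer cast so that the example itself stays well defined.

```c
#include <assert.h>
#include <string.h>

/* Value conversion: the int 3 becomes the double 3.0. */
double convert_value(int n)
{
    return n;
}

/* Representation copy: the bytes of the int are copied into a zeroed double. */
double copy_representation(int n)
{
    double d = 0.0;
    _Static_assert(sizeof n <= sizeof d, "sketch assumes int fits in double");
    memcpy(&d, &n, sizeof n);
    return d;
}

int demo_conversion(void)
{
    assert(convert_value(3) == 3.0);
    /* On IEEE 754 platforms the copied bytes do not spell 3.0. */
    return copy_representation(3) != 3.0;
}
```

Passing a Foo** where a void** is expected is the pointer analogue of copy_representation: the callee reads and writes the bytes of a Foo* as if they were a void*, with no conversion step in between.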
Architectures where different types of pointers to objects have different representations are rare nowadays, but they are permitted by the C standard. There are (rare) old machines with word pointers which are addresses of a word in memory and byte pointers which are addresses of a word together with a byte offset in this word; Foo* would be a word pointer and void* would be a byte pointer on such architectures. There are (rare) machines with fat pointers which contain information not only about the address of the object, but also about its type, its size and its access control lists; a pointer to a definite type might have a different representation from a void* which needs additional type information at runtime.
Such machines are rare, but permitted by the C standard. And some C compilers take advantage of the permission to treat type-punned pointers as distinct to optimize code. The risk of pointers aliasing is a major limitation to a compiler's ability to optimize code, so compilers tend to take advantage of such permissions.
A compiler is free to tell you that you're doing something wrong, or to quietly do what you didn't want, or to quietly do what you wanted. Undefined behavior allows any of these.
You can make freeFunc a macro:
#define FREE_SINGLE_REFERENCE(p) (free(p), (p) = NULL)
This comes with the usual limitations of macros: lack of type safety, p is evaluated twice. Note that this only gives you the safety of not leaving dangling pointers around if p was the single pointer to the freed object.
A void * is treated specially by the C standard in part because it references an incomplete type. This treatment does not extend to void ** as it does point to a complete type, specifically void *.
The strict aliasing rules say you can't convert a pointer of one type to a pointer of another type and subsequently dereference that pointer because doing so means reinterpreting the bytes of one type as another. The only exception is when converting to a character type which allows you to read the representation of an object.
You can get around this limitation by using a function-like macro instead of a function:
#define freeFunc(obj) (free(obj), (obj) = NULL)
Which you can call like this:
freeFunc(f);
This does have a limitation however, because the above macro will evaluate obj twice. If you're using GCC, this can be avoided with some extensions, specifically the typeof keyword and statement expressions:
#define freeFunc(obj) ({ typeof (&(obj)) ptr = &(obj); free(*ptr); *ptr = NULL; })
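The double evaluation in the simple form of the macro can be made visible with an argument that has a side effect. This is only an illustrative sketch: FREE_AND_NULL repeats the plain macro under a different name, and next_slot is a hypothetical accessor that counts its calls.

```c
#include <assert.h>
#include <stdlib.h>

#define FREE_AND_NULL(p) (free(p), (p) = NULL)

static int *slot;        /* the single pointer being managed */
static int lookups;      /* how many times next_slot() ran */

/* Hypothetical accessor with a visible side effect. */
int **next_slot(void)
{
    ++lookups;
    return &slot;
}

int demo_double_evaluation(void)
{
    lookups = 0;
    slot = malloc(sizeof *slot);
    FREE_AND_NULL(*next_slot());   /* expands to two calls of next_slot() */
    assert(slot == NULL);
    return lookups;                /* 2, not 1 */
}
```

Here both calls return the same address, so nothing goes wrong; with an argument such as arr[i++] the free and the assignment would hit different elements.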
Dereferencing a type punned pointer is UB and you can't count on what will happen.
Different compilers generate different warnings, and for this purpose different versions of the same compiler can be considered as different compilers. This seems a better explanation for the variance you see than a dependence on the architecture.
A case which may help you understand why type punning in this case can be bad is that your function won't work on an architecture for which sizeof(Foo*) != sizeof(void*). That is authorized by the standard although I don't know any current one for which this is true.
A workaround would be to use a macro instead of a function.
Note that free accepts null pointers.
This code is invalid per the C Standard, so it might work in some cases, but is not necessarily portable.
The "strict aliasing rule" for accessing a value via a pointer that has been cast to a different pointer type is found in 6.5 paragraph 7:
An object shall have its stored value accessed only by an lvalue expression that has one of the following types:
a type compatible with the effective type of the object,
a qualified version of a type compatible with the effective type of the object,
a type that is the signed or unsigned type corresponding to the effective type of the object,
a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,
an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
a character type.
In your *obj = NULL; statement, the object has effective type Foo* but is accessed by the lvalue expression *obj with type void*.
In 6.7.5.1 paragraph 2, we have
For two pointer types to be compatible, both shall be identically qualified and both shall be pointers to compatible types.
So void* and Foo* are not compatible types or compatible types with qualifiers added, and certainly don't fit any of the other options of the strict aliasing rule.
Although not the technical reason the code is invalid, it's also relevant to note section 6.2.5 paragraph 26:
A pointer to void shall have the same representation and alignment requirements as a pointer to a character type. Similarly, pointers to qualified or unqualified versions of compatible types shall have the same representation and alignment requirements. All pointers to structure types shall have the same representation and alignment requirements as each other. All pointers to union types shall have the same representation and alignment requirements as each other. Pointers to other types need not have the same representation or alignment requirements.
As for the differences in warnings, this is not a case where the Standard requires a diagnostic message, so it's just a matter of how good the compiler or its version is at noticing potential issues and pointing them out in a helpful way. You noticed optimization settings can make a difference. This is often because more information is internally generated about how various pieces of the program actually fit together in practice, and that extra information is therefore also available for warning checks.
On top of what the other answers have said, this is a classic anti-pattern in C, and one which should be burned with fire. It appears in:
(1) Free-and-null-out functions like the one you've found the warning in.
(2) Allocation functions that shun the standard C idiom of returning void * (which doesn't suffer from this issue because it involves a value conversion instead of type punning), instead returning an error flag and storing the result via a pointer-to-pointer.
For another example of (1), there was a longstanding infamous case in ffmpeg/libavcodec's av_free function. I believe it was eventually fixed with a macro or some other trick, but I'm not sure.
For (2), both cudaMalloc and posix_memalign are examples.
In neither case does the interface inherently require invalid usage, but it strongly encourages it, and admits correct usage only with an extra temporary object of type void * that defeats the purpose of the free-and-null-out functionality, and makes allocation awkward.
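For the free-and-null-out case, the correct usage with a temporary looks roughly like this. It is a sketch built on the question's freeFunc and Foo; the inner null check is dropped because free(NULL) is a no-op, and demo_free_with_temporary is an illustrative name.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct Foo { int i; } Foo;

/* Same interface as the question's freeFunc(). */
void freeFunc(void **obj)
{
    if (obj) {
        free(*obj);
        *obj = NULL;
    }
}

int demo_free_with_temporary(void)
{
    Foo *f = calloc(1, sizeof *f);

    void *tmp = f;     /* value conversion Foo * -> void *, no punning */
    freeFunc(&tmp);    /* &tmp really is a void ** */
    f = tmp;           /* copy the NULL back by value */

    assert(tmp == NULL);
    return f == NULL;
}
```

The two extra lines around the call are exactly the awkwardness described above: the function can only null out the temporary, so the caller has to copy the result back.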
Although C was designed for machines which use the same representation for all pointers, the authors of the Standard wanted to make the language usable on machines that use different representations for pointers to different types of objects. Therefore, they did not require that machines which use different pointer representations for different kinds of pointers support a "pointer to any kind of pointer" type, even though many machines could do so at zero cost.
Before the Standard was written, implementations for platforms that used the same representation for all pointer types would unanimously allow a void** to be used, at least with suitable casting, as a "pointer to any pointer". The authors of the Standard almost certainly recognized that this would be useful on platforms that supported it, but since it couldn't be universally supported they declined to mandate it. Instead, they expected that quality implementations would process such constructs as what the Rationale would describe as a "popular extension", in cases where doing so would make sense.
Is the behavior of this code well-defined?
#include <stdatomic.h>
#include <stdio.h>
const int test = 42;
const int * _Atomic atomic_int_ptr;
int main(void)
{
atomic_init(&atomic_int_ptr, &test);
const int ** int_ptr_ptr = &atomic_int_ptr;
printf("int = %d\n", **int_ptr_ptr); //prints int = 42
}
I assigned a pointer to atomic type to a pointer to non-atomic type (the types are the same). Here are my thoughts of this example:
The Standard explicitly distinguishes the const, volatile and restrict qualifiers from the _Atomic qualifier in 6.2.5(p27):
this Standard explicitly uses the phrase ‘‘atomic, qualified or
unqualified type’’ whenever the atomic version of a type is permitted
along with the other qualified versions of a type. The phrase
‘‘qualified or unqualified type’’, without specific mention of atomic,
does not include the atomic types.
Also, the compatibility of qualified types is defined in 6.7.3(p10):
For two qualified types to be compatible, both shall have the
identically qualified version of a compatible type; the order of
type qualifiers within a list of specifiers or qualifiers does
not affect the specified type.
Combining the quotes cited above, I concluded that atomic and non-atomic types are compatible types. So, applying the rule for simple assignment, 6.5.16.1(p1) (emp. mine):
the left operand has atomic, qualified, or unqualified pointer
type, and (considering the type the left operand would have
after lvalue conversion) both operands are pointers to qualified
or unqualified versions of compatible types, and the type pointed to by
the left has all the qualifiers of the type pointed to by the right;
So I concluded that the behavior is well defined (even in spite of assigning atomic type to a non-atomic type).
The problem with all that is that, applying the rules above, we can also conclude that simple assignment of a non-atomic type to an atomic type is also well defined, which is obviously not true since we have a dedicated generic atomic_store function for that.
6.2.5p27:
Further, there is the _Atomic qualifier. The presence of the _Atomic
qualifier designates an atomic type. The size, representation, and
alignment of an atomic type need not be the same as those of the
corresponding unqualified type. Therefore, this Standard explicitly
uses the phrase ''atomic, qualified or unqualified type'' whenever the
atomic version of a type is permitted along with the other qualified
versions of a type. The phrase ''qualified or unqualified type'',
without specific mention of atomic, does not include the atomic types.
I think this should make it clear that atomic-qualified types are not deemed compatible with qualified or unqualified versions of the types they're based on.
C11 allows _Atomic T to have a different size and layout than T, e.g. if it's not lock-free. (See @PSkocik's answer.)
For example, the implementation could choose to put a mutex inside each atomic object, and put it first. (Most implementations instead use the address as an index into a table of locks, as discussed in "Where is the lock for a std::atomic?", rather than bloating each instance of an _Atomic or std::atomic<T> object that isn't guaranteed lock-free at compile time.)
Therefore _Atomic T* is not compatible with T* even in a single-threaded program.
Merely assigning a pointer might not be UB (sorry I didn't put on my language lawyer hat), but dereferencing certainly can be.
I'm not sure if it's strictly UB on implementations where _Atomic T and T do share the same layout and alignment. Probably it violates strict aliasing, if _Atomic T and T are considered different types regardless of whether or not they share the same layout.
alignof(T) might be different from alignof(_Atomic T), but other than an intentionally perverse implementation (Deathstation 9000), _Atomic T will be at least as aligned as plain T, so that's not an issue for casting pointers to objects that already exist. An object that's more aligned than it needs to be is not a problem, just a possible missed-optimization if it stops a compiler from using a single wider load.
Fun fact: creating an under-aligned pointer is UB in ISO C, even without dereference. (Most implementations don't complain, and Intel's _mm_loadu_si128 intrinsic even requires compilers to support doing so.)
In practice on real implementations, _Atomic T* and T* use the same layout / object representation and alignof(_Atomic T) >= alignof(T). A single-threaded or mutex-guarded part of a program could do non-atomic access to an _Atomic object, if you can work around the strict-aliasing UB. Maybe with memcpy.
On real implementations, _Atomic may increase the alignment requirement, e.g. a struct {int a,b;} on most ABIs for most 64-bit ISAs would typically only have 4-byte alignment (max of the members), but _Atomic would give it natural alignment = 8 to allow loading/storing it with a single aligned 64-bit load/store. This of course doesn't change the layout or alignment of the members relative to the start of the object, just the alignment of the object as a whole.
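You can ask your own compiler what it does with the struct from this paragraph. The sketch below uses made-up helper names and only queries size and alignment, so it performs no atomic operations; the relation it asserts is what is described above for real implementations, not something the standard guarantees.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

struct pair { int a, b; };

size_t plain_alignment(void)  { return alignof(struct pair); }
size_t atomic_alignment(void) { return alignof(_Atomic struct pair); }
size_t plain_size(void)       { return sizeof(struct pair); }
size_t atomic_size(void)      { return sizeof(_Atomic struct pair); }

/* Returns 1 if the atomic version is at least as aligned and at least as
   large as the plain one. */
int demo_atomic_layout(void)
{
    int ok = atomic_alignment() >= plain_alignment()
          && atomic_size() >= plain_size();
    assert(ok);
    return ok;
}
```

Printing the four values from main shows whether your ABI raises the alignment of the atomic version, as described for most 64-bit ISAs.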
The problem with all that is that, applying the rules above, we can also conclude that simple assignment of a non-atomic type to an atomic type is also well defined, which is obviously not true since we have a dedicated generic atomic_store function for that.
No, that reasoning is flawed.
atomic_store(&my_atomic, 1) is equivalent to my_atomic=1;. In the C abstract machine, they both do an atomic store with memory_order_seq_cst.
You can also see this from looking at the code-gen for real compilers on any ISA; e.g. x86 compilers will use an xchg instruction, or mov+mfence. Similarly, shared_var++ compiles to an atomic RMW (with mo_seq_cst).
IDK why there's an atomic_store generic function. Maybe just for contrast / consistency with atomic_store_explicit, which lets you do atomic_store_explicit(&shared_var, 1, memory_order_release) or memory_order_relaxed to do a release or relaxed store instead of a sequentially consistent one. (On x86, just a plain store. Or on weakly-ordered ISAs, some fencing but not a full barrier.)
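A small sketch of the forms discussed here, using only standard C11 <stdatomic.h> (the variable and function names are illustrative):

```c
#include <assert.h>
#include <stdatomic.h>

_Atomic int shared_var;

int demo_atomic_stores(void)
{
    shared_var = 1;                      /* seq_cst store, same as atomic_store */
    assert(atomic_load(&shared_var) == 1);

    atomic_store(&shared_var, 2);        /* also seq_cst */
    assert(shared_var == 2);             /* plain read is a seq_cst load */

    /* Weaker ordering has to be requested explicitly. */
    atomic_store_explicit(&shared_var, 3, memory_order_release);
    shared_var++;                        /* atomic read-modify-write */

    return atomic_load_explicit(&shared_var, memory_order_acquire);
}
```

Comparing the generated assembly for the first two stores is an easy way to confirm they compile to the same instructions on your target.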
For types that are lock-free, where the object representation of _Atomic T and T are identical, there's no problem in practice accessing an atomic object through a non-atomic pointer in a single-threaded program. I suspect it's still UB, though.
C++20 introduces std::atomic_ref<T>, which lets you do atomic operations on a non-atomic variable. (With no UB as long as no threads are potentially doing non-atomic access to it during the time window of being written.) This is basically a wrapper around the __atomic_* builtins in GCC for example, that std::atomic<T> is implemented on top of.
(This presents some problems, like if atomic<T> needs more alignment than T, e.g. for long long or double on i386 System V. Or a struct of 2x int on most 64-bit ISAs. You should use alignas(_Atomic T) T foo when declaring non-atomic objects you want to be able to do atomic operations on.)
Anyway, I'm not aware of any standards-compliant way to do similar things in portable ISO C11, but it's worth mentioning that real C compilers very much do support doing atomic operations on objects declared without _Atomic. But only using stuff like GNU C atomic builtins.
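A minimal sketch of what that looks like, using the __atomic builtins provided by GCC and Clang. These are compiler extensions, not ISO C, and the variable and function names are made up.

```c
#include <assert.h>

int plain_counter;   /* declared without _Atomic */

int demo_gnu_builtins(void)
{
    /* Each builtin takes the address of an ordinary object plus a memory order. */
    __atomic_store_n(&plain_counter, 10, __ATOMIC_SEQ_CST);
    __atomic_fetch_add(&plain_counter, 5, __ATOMIC_RELAXED);

    int result = __atomic_load_n(&plain_counter, __ATOMIC_ACQUIRE);
    assert(result == 15);
    return result;
}
```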
See "Casting pointers to _Atomic pointers and _Atomic sizes": apparently casting a T* to _Atomic T* is not recommended even in GNU C, although we don't have a definitive answer that it's actually UB.
The C11 standard ISO/IEC 9899:2011 (E) states the following constraints for simple assignments in §6.5.16.1/1:
One of the following shall hold:
the left operand has atomic, qualified, or unqualified arithmetic type, and the right has
arithmetic type;
the left operand has an atomic, qualified, or unqualified version of a structure or union
type compatible with the type of the right;
the left operand has atomic, qualified, or unqualified pointer type, and (considering
the type the left operand would have after lvalue conversion) both operands are
pointers to qualified or unqualified versions of compatible types, and the type pointed
to by the left has all the qualifiers of the type pointed to by the right;
the left operand has atomic, qualified, or unqualified pointer type, and (considering
the type the left operand would have after lvalue conversion) one operand is a pointer
to an object type, and the other is a pointer to a qualified or unqualified version of
void, and the type pointed to by the left has all the qualifiers of the type pointed to
by the right;
the left operand is an atomic, qualified, or unqualified pointer, and the right is a null
pointer constant; or
the left operand has type atomic, qualified, or unqualified _Bool, and the right is a pointer.
I am interested in the case in which both sides are pointers to incompatible types different from void. If I understand correctly, this should at the very least invoke UB, as it violates this constraint. One example for incompatible types should be (according to §6.2.7 and §6.7.2) int and double.
Therefore the following program should be in violation:
int main(void) {
int a = 17;
double* p;
p = &a;
(void)p;
}
Both gcc and clang warn about "-Wincompatible-pointer-types", but do not abort compilation (compilation with -std=c11 -Wall -Wextra -pedantic).
Similarly, the following program only leads to a "-Wint-conversion" warning, while compiling just fine.
int main(void) {
int a;
double* p;
p = a;
(void)p;
}
Coming from C++, I expected that either of those test cases would require a cast to compile. Is there any reason why either of the programs would be standards-legal? Or, are there at least significant historic reasons for supporting this code style even when disabling the entertaining GNU C extensions by explicitly using -std=c11 instead of -std=gnu11?
Is there any reason why either of the programs would be standards-legal?
These programs are not "standards-legal". They contain constraint violations and you already quoted the right text from the standard.
The compilers conform to the standard by producing a diagnostic for constraint violation. The standard does not require compilation to abort in the case of a constraint violation or other erroneous program.
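For contrast, the pointer-to-void bullet of the quoted constraint is what allows cast-free pointer assignment in C. A short sketch (demo_void_pointer is an illustrative name); the commented-out line is the form from the question that matches no bullet:

```c
#include <assert.h>

int demo_void_pointer(void)
{
    int a = 17;

    void *generic = &a;   /* int * -> void *: allowed by the void bullet */
    int *back = generic;  /* void * -> int *: also allowed, no cast needed */

    /* double *p = &a;    int * -> double *: no bullet matches, so a
                          diagnostic is required */

    assert(back == &a);
    return *back;
}
```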
It doesn't say in as many words, but the only reasonable conclusion is that any executable generated as a result of a program containing a constraint violation has completely undefined behaviour. (I have seen people try to argue otherwise though).
Speculation follows: C (and C++) are used for many purposes; sometimes people want "high level assembler" for their machine and don't care about portability or standards. Presumably the compiler vendors set the defaults to what they think their target audience would prefer.
The compiler flag (both gcc and clang) to request checks for strict standards conformance and to refuse to compile nonconformant code is -pedantic-errors:
$ gcc -std=c11 -pedantic-errors x.c
x.c: In function ‘main’:
x.c:3:15: error: initialization from incompatible pointer type [-Wincompatible-pointer-types]
double* p = &a;
^
Clang:
$ clang -std=c11 -pedantic-errors x.c
x.c:3:11: error: incompatible pointer types initializing 'double *' with an
expression of type 'int *' [-Werror,-Wincompatible-pointer-types]
double* p = &a;
^ ~~
1 error generated.
A significant proportion (to say the least) of typical C code in the wild is nonconformant, so -pedantic-errors would cause most C programs and libraries to fail to compile.
Your code example and your citation of the standard do not match. The example is an initialization, and 6.5.16 talks about assignment.
Confusingly, the matching-type requirement is in a constraints section (6.5.16) for assignment, but "only" in the semantics section (6.7.9) for initialization. So the compilers have the "right" not to issue a diagnostic for initialization.
In C, constraint violations only require "diagnostics", the compiler may well continue compilation, but there is no guarantee that the resulting executable is valid.
On my platform, a Debian testing, both compilers give me a diagnostic without any option, so I guess your installation must be quite old and obsolete.
No, Yes
It's really very simple.
No, not standards legal. Yes, significant historical reasons.
C did not originally even have casts. As a system programming language, using it as an ultra-powerful glorified assembler was not only reasonable, it was best-practice and really "only-practice", back in the day.
A key piece of information should shine a light on things: it really is not the compiler's job to either implement or enforce the specification. The specification is, actually, literally, only a suggestion. The compiler's actual job is to compile all the C code that was ever written, including pre-C11, pre-C99, pre-C89, and even pre-K&R. This is why there are so many optional restrictions. Projects with modern code styles turn on strictly conforming modes.
There is the way C is defined in standards, and, there is the way C is used in practice. So you can see, the compiler simply can't refuse to build the code.
Over decades, developers have been shifting to portable, strictly conforming code, but when C first appeared, it was used a bit like a really amazingly powerful assembler, and it was kind of open season on address arithmetic and type punning. Programs in those days were mostly written for one architecture at a time.
The Question
The question of whether all pointers derived from pointers to structure types are the same, is not easy to answer. I find it to be a significant question for the following two primary reasons.
A. The lack of a pointer to pointer to 'any' incomplete or object type, imposes a limitation on convenient function interfaces, such as:
int allocate(ANY_TYPE **p,
size_t s);
int main(void)
{
int *p;
int r = allocate(&p, sizeof *p);
}
[Complete code sample]
The existing pointer to 'any' incomplete or object type is explicitly described as:
C99 / C11 §6.3.2.3 p1:
A pointer to void may be converted to or from a pointer to any incomplete or object type. [...]
A pointer derived from the existing pointer to 'any' incomplete or object type, pointer to pointer to void, is strictly a pointer to pointer to void, and is not required to be convertible with a pointer derived from a pointer to 'any' incomplete or object type.
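With the types the language does provide, the interface from point A can be written with a void ** parameter and called through a void * temporary. This is a sketch only: allocate follows the hypothetical signature above with ANY_TYPE replaced by void, and demo_allocate is an illustrative caller.

```c
#include <assert.h>
#include <stdlib.h>

/* Stores the new block through p and returns 0, or returns -1 on failure. */
int allocate(void **p, size_t s)
{
    void *mem = malloc(s);
    if (mem == NULL)
        return -1;
    *p = mem;
    return 0;
}

int demo_allocate(void)
{
    int *p = NULL;
    void *tmp = NULL;              /* the object allocate() actually writes to */

    if (allocate(&tmp, sizeof *p) != 0)
        return -1;

    p = tmp;                       /* void * -> int * by value conversion */
    *p = 7;
    int value = *p;
    assert(value == 7);
    free(p);
    return value;
}
```

Passing &p directly would require a cast to void ** and would make allocate() write an int * object through a void * lvalue, which is the conversion the paragraph above says is not required to work.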
B. It is not uncommon for programmers to utilize conventions based on assumptions that are not required, related to the generalization of pointers, knowingly or unknowingly, while depending on their experience with their specific implementations. Assumptions such as being convertible, being representable as integers, or sharing a common property: object size, representation, or alignment.
The words of the standard
According to C99 §6.2.5 p27 / C11 §6.2.5 p28:
[...] All pointers to structure types shall have the same representation and alignment requirements as each other. [...]
Followed by C99 TC3 Footnote 39 / C11 Footnote 48:
The same representation and alignment requirements are meant to imply interchangeability as arguments to functions, return values from functions, and members of unions.
Although the standard says "All pointers to structure types" rather than "A pointer to a structure type", it doesn't explicitly specify whether the requirement applies to a recursive derivation of such pointers. On other occasions where the standard mentions special properties of pointers, it likewise doesn't mention recursive pointer derivation, which means that the 'type derivation' either applies or doesn't, but the text never says which.
And although the phrasing "All pointers to" while referring to types is used only twice, (for structure and union types), as opposed to the more explicit phrasing: "A pointer to" which is used throughout the standard, we can't conclude whether it applies to a recursive derivation of such pointers.
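The first-level requirement quoted above can at least be spot-checked at compile time. The following is a minimal sketch assuming a C11 compiler. The struct names are made up for illustration, and equal size and alignment are necessary for "same representation" but do not prove it.

```c
#include <assert.h>   /* static_assert (C11) */

/* Hypothetical types, used only for illustration. */
struct small { char c; };
struct large { double d[16]; };
struct opaque;                     /* never completed in this translation unit */

/* Pointers to structure types: required to agree by the quoted paragraph. */
static_assert(sizeof(struct small *) == sizeof(struct large *),
              "pointer sizes differ");
static_assert(sizeof(struct small *) == sizeof(struct opaque *),
              "pointer sizes differ");
static_assert(_Alignof(struct small *) == _Alignof(struct large *),
              "pointer alignments differ");
static_assert(_Alignof(struct small *) == _Alignof(struct opaque *),
              "pointer alignments differ");

/* Run-time view of the same first-level comparison. */
int first_level_sizes_agree(void)
{
    return sizeof(struct small *) == sizeof(struct large *)
        && sizeof(struct large *) == sizeof(struct opaque *);
}

/* One level of derivation further: this is the case the question is about,
   so it is only reported here, not asserted. */
int second_level_sizes_agree(void)
{
    return sizeof(struct small **) == sizeof(struct large **);
}
```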
Background
The assumption that the standard implicitly requires all pointers to structure types (complete, incomplete, compatible and incompatible) to have the same representation and alignment requirements began with C89, many years before the standard required it explicitly. The reasoning behind it was the compatibility of incomplete types in separate translation units. According to the C standards committee, the original intent was to allow the compatibility of an incomplete type with its completed variation, but the actual words of the standard did not describe it. This was amended in the second Technical Corrigendum to C89, which made the original assumption concrete.
Compatibility and Incomplete Types
Reading the rules related to compatibility and incomplete types (thanks to Matt McNabb), we gain further insight into the original C89 assumption.
Pointer derivation of object and incomplete types
C99 / C11 §6.2.5 p1:
Types are partitioned into object types, function types, and incomplete types.
C99 / C11 §6.2.5 p20:
A pointer type may be derived from a function type, an object type, or an incomplete type, called the referenced type.
C99 / C11 §6.2.5 p22:
A structure or union type of unknown content is an incomplete type. It is completed, for all declarations of that type, by declaring the same structure or union tag with its defining content later in the same scope.
This means that pointers may be derived from both object types and incomplete types. The standard does not say that incomplete types need not be completed, but in the past the committee responded on this matter and stated that the lack of a prohibition is sufficient and there's no need for a positive statement.
In the following sample, 'p' is a pointer to the incomplete type 'struct never_completed', which is never completed:
int main(void)
{
struct never_completed *p;
p = malloc(1024);
}
[Complete code sample]
Compatible types of separate translation units
C99 / C11 §6.7.2.3 p4:
All declarations of structure, union or enumerated types that have the same scope and use the same tag declare the same type.
C99 / C11 §6.2.7 p1:
Two types have compatible type if their types are the same. Two structure types declared in separate translation units are compatible if their tags (are) the same tag. [trimmed quote] [...]
This paragraph is of great significance, so allow me to summarize it: two structure types declared in separate translation units are compatible if they use the same tag. If both of them are completed, their members have to be the same (according to the specified requirements).
Compatibility of pointers
C99 §6.7.5.1 p2 / C11 §6.7.6.1 p2:
For two pointer types to be compatible, both shall be identically qualified and both shall be pointers to compatible types.
If the standard mandates that two structures under specified conditions, are to be compatible in separate translation units whether being incomplete or complete, it means that the pointers derived from these structures are compatible just as well.
C99 / C11 §6.2.5 p20:
Any number of derived types can be constructed from the object, function, and incomplete types
These methods of constructing derived types can be applied recursively.
And because pointer derivation is recursive, pointers derived from pointers to compatible structure types are compatible with each other as well.
Representation of compatible types
C99 §6.2.5 p27 / C11 §6.2.5 p28:
pointers to qualified or unqualified versions of compatible types shall have the same representation and alignment requirements.
C99 / C11 §6.3 p2:
Conversion of an operand value to a compatible type causes no change to the value or the representation.
C99 / C11 §6.2.5 p26:
The qualified or unqualified versions of a type are distinct types that belong to the same type category and have the same representation and alignment requirements.
This means that a conforming implementation can't choose the representation and alignment requirements of such a pointer according to whether the structure type it is derived from is incomplete or complete. A separate translation unit might declare a compatible type, which has to share the same representation and alignment requirements, so the same choice must apply to both the incomplete and the complete variation of the same structure type.
The following pointer to pointer to incomplete 'struct complete_incomplete':
struct complete_incomplete **p;
Is compatible and shares the same representation and alignment requirements as the following pointer to pointer to complete 'struct complete_incomplete':
struct complete_incomplete { int i; } **p;
C89 related
If we wonder about the premise concerning C89, defect report #059 of Jun '93 questioned:
Both sections do not explicitly require that an incomplete type eventually must be completed, nor do they explicitly allow incomplete types to remain incomplete for the whole compilation unit. Since this feature is of importance for the declaration of true opaque data types, it deserves clarification.
Considering mutual referential structures defined and implemented in different compilation units makes the idea of an opaque data type a natural extension of an incomplete data type.
The response of the committee was:
Opaque data types were considered, and endorsed, by the Committee when drafting the C Standard.
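For readers unfamiliar with the idiom, here is a rough sketch of an opaque data type. Both halves are shown in one file so that it compiles on its own. In practice the first half would live in a header and the second in a separate translation unit. The 'counter' type and its functions are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>

/* --- What a hypothetical header would expose --- */
struct counter;                       /* incomplete: callers never see the members */
struct counter *counter_create(void);
void counter_increment(struct counter *c);
int counter_value(const struct counter *c);
void counter_destroy(struct counter *c);

/* --- What the implementation file would contain --- */
struct counter { int n; };            /* the type is completed only here */

struct counter *counter_create(void)
{
    struct counter *c = malloc(sizeof *c);

    if (c != NULL)
        c->n = 0;
    return c;
}

void counter_increment(struct counter *c)
{
    assert(c != NULL);
    c->n++;
}

int counter_value(const struct counter *c)
{
    assert(c != NULL);
    return c->n;
}

void counter_destroy(struct counter *c)
{
    free(c);
}
```

Client code only ever holds a 'struct counter *', which is why the representation of that pointer cannot depend on whether the type has been completed.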
Compatibility versus Interchangeability
We have covered the representation and alignment requirements of recursive pointer derivation of pointers to structure types. Now we face a matter mentioned by a non-normative footnote, 'interchangeability':
C99 TC3 §6.2.5 p27 Footnote 39 / C11 §6.2.5 p28 Footnote 48:
The same representation and alignment requirements are meant to imply interchangeability as arguments to functions, return values from functions, and members of unions.
The standard says that the notes, footnotes, and examples are non-normative and are "for information only".
C99 FOREWORD p6 / C11 FOREWORD p8:
[...] this foreword, the introduction, notes, footnotes, and examples are also for information only.
It's unfortunate that this confusing footnote was never changed. At best, the footnote is specifically about the types in the paragraphs that refer to it, so phrasing it as if the properties of "representation and alignment requirements" stood apart from those specific types makes it easy to read as a general rule for all types that share a representation and alignment. If the footnote is to be interpreted without the context of specific types, then it's obvious that the normative text of the standard doesn't imply it, even without the need to debate the interpretation of the term 'interchangeable'.
Compatibility of pointers to structure types
C99 / C11 §6.7.2.3 p4:
All declarations of structure, union or enumerated types that have the same scope and use the same tag declare the same type.
C99 / C11 §6.2.7 p1:
Two types have compatible type if their types are the same.
C99 §6.7.5.1 p2 / C11 §6.7.6.1 p2:
For two pointer types to be compatible, both shall be identically qualified and both shall be pointers to compatible types.
This leads to the obvious conclusion: different structure types are indeed different types, and because they are different they are incompatible. Therefore, two pointers to two different and incompatible types are incompatible as well, regardless of their representation and alignment requirements.
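One way to see this incompatibility from within the language is a C11 generic selection, which requires that no two of its association types be compatible. The sketch below uses two hypothetical structure types with identical members. It compiles only because the two pointer types are distinct and incompatible.

```c
#include <assert.h>

/* Hypothetical types: same members, different tags, hence different types. */
struct apple  { int i; };
struct orange { int i; };

/* A generic selection may not list two compatible types, so the fact that
   this macro is accepted shows the pointer types are incompatible. */
#define STRUCT_KIND(p) _Generic((p), struct apple *: 1, struct orange *: 2)

int kind_of_apple_pointer(void)
{
    struct apple a = { 0 };
    return STRUCT_KIND(&a);
}

int kind_of_orange_pointer(void)
{
    struct orange o = { 0 };
    return STRUCT_KIND(&o);
}

/* Usage: each pointer selects its own association. */
void demo_struct_kinds(void)
{
    assert(kind_of_apple_pointer() == 1);
    assert(kind_of_orange_pointer() == 2);
}
```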
Effective types
C99 / C11 §6.5 p7:
An object shall have its stored value accessed only by an lvalue expression that has one of the following types:
a type compatible with the effective type of the object
C99 / C11 §6.5 p6:
The effective type of an object for an access to its stored value is the declared type of the object, if any.
Incompatible pointers are not 'interchangeable' as arguments to functions, nor as return values from functions. Implicit conversions and specified special cases are the exceptions, and these types are not part of any such exception. Even if we decide to add an unrealistic requirement for said 'interchangeability', and say that an explicit conversion is required to make it applicable, accessing the stored value of an object through an lvalue whose type is incompatible with its effective type still breaks the effective type rules. To make it a reality we need a new property that the standard currently doesn't have. Therefore sharing the same representation and alignment requirements, and being convertible, is simply not enough.
This leaves us with being interchangeable 'as members of unions', and although they are indeed interchangeable as members of unions, it bears no special significance.
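A rough sketch of what interchangeability 'as members of unions' looks like in practice follows. The union type and function names are hypothetical. It relies on reading a union member other than the one last stored, which reinterprets the stored bytes, together with the shared representation discussed above.

```c
#include <assert.h>
#include <stdlib.h>

struct generic;                 /* never completed */
struct s1 { int i; };

/* Hypothetical union: every participating pointer type must be listed. */
union struct_ptr {
    struct generic *g;
    struct s1 *s;
};

int allocate_struct(union struct_ptr *u, size_t size)
{
    assert(u != NULL);
    u->g = malloc(size);        /* stored as a struct generic * */
    return u->g != NULL ? 0 : -1;
}

/* Returns 42 on success, -1 if the allocation failed. */
int demo_union(void)
{
    union struct_ptr u;
    int value;

    if (allocate_struct(&u, sizeof *u.s) != 0)
        return -1;
    u.s->i = 42;                /* same bytes, read back as a struct s1 * */
    value = u.s->i;
    free(u.s);
    return value;
}
```

Because the union has to name every structure type in advance, this does not give a generic interface.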
Official interpretations
1. The first 'official' interpretation belongs to a member of the C standards committee. His interpretation for: "are meant to imply interchangeability", is that it doesn't actually imply that such an interchangeability exists, but actually makes a suggestion for it.
As much as I would like it to become a reality, I wouldn't consider an implementation that took a suggestion from a non-normative footnote (an unreasonably vague one at that) while contradicting normative text to be a conforming implementation. This obviously renders a program that depends on such a 'suggestion' a non-strictly conforming one.
2. The second 'official' interpretation belongs to a member/contributor to the C standards committee. By his interpretation the footnote doesn't introduce a suggestion, and because the (normative) text of the standard doesn't imply it, he considers it to be a defect in the standard. He even suggested changing the effective type rules to address this matter.
3. The third 'official' interpretation is from defect report #070 of Dec '93. It asked, within the context of C89, whether a program that passes an 'unsigned int' where the type 'int' is expected, as an argument to a function with a non-prototype declarator, introduces undefined behavior.
In C89 there's the very same footnote, with the same implied interchangeability as arguments to functions, attached to:
C89 §3.1.2.5 p2:
The range of nonnegative values of a signed integer type is a subrange of the corresponding unsigned integer type, and the representation of the same value in each type is the same.
The committee responded that they encourage implementors to allow this interchangeability to work, but since it's not a requirement, it renders the program to be a non-strictly conforming one.
The following code sample is not strictly conforming. The type of '&s1' and 'struct generic **' share the same representation and alignment requirements, but they are nevertheless incompatible. According to the effective type rules, we are accessing the stored value of the object 's1' through an lvalue of an incompatible type, a pointer to 'struct generic', while its declared type, and therefore its effective type, is a pointer to 'struct s1'. To overcome this limitation we could have used the pointers as members of a union, but that convention defeats the goal of being generic.
int allocate_struct(void *p,
size_t s)
{
struct generic **p2 = p;
if ((*p2 = malloc(s)) == NULL)
return -1;
return 0;
}
int main(void)
{
struct s1 { int i; } *s1;
if (allocate_struct(&s1, sizeof *s1) != 0)
return EXIT_FAILURE;
}
[Complete code sample]
The following code sample is strictly conforming. To overcome both issues, effective types and being generic, we take advantage of: 1. a pointer to void, 2. the representation and alignment requirements of all pointers to structs, and 3. accessing the pointer's byte representation 'generically', using memcpy to copy the representation without affecting its effective type.
int allocate_struct(void *pv,
size_t s)
{
struct generic *pgs;
if ((pgs = malloc(s)) == NULL)
return -1;
memcpy(pv, &pgs, sizeof pgs);
return 0;
}
int main(void)
{
struct s1 { int i; } *s1;
if (allocate_struct(&s1, sizeof *s1) != 0)
return EXIT_FAILURE;
}
[Complete code sample]
The Conclusion
The conclusion is that a conforming implementation must have the same representation and alignment requirements, respectively, for all recursively derived pointers to structure types, whether they are incomplete or complete, and whether they are compatible or incompatible. Whether the types are compatible or incompatible is significant, but due to the mere possibility of a compatible type, they must share the fundamental properties of representation and alignment. It would have been preferable if we could access pointers that share representation and alignment directly, but unfortunately the current effective type rules do not allow it.
My answer is "no."
There is no wording in any standard of C that I'm aware of which suggests otherwise. The fact that all pointers to structure types have the same representation and alignment requirements has no bearing on any derived type.
This makes complete sense and any other reality would seem to be inconsistent. Consider the alternative:
Let's call the alignment and representation requirements for pointers to structure types "A". Suppose that any "recursively derived type" shares the requirements "A".
Let's call the alignment and representation requirements for pointers to union types "B". Suppose that any "recursively derived type" shares the requirements "B".
Let's suppose that "A" and "B" are not the same[1]. Furthermore, let's suppose that they cannot be satisfied at the same time. (A 4-byte representation and an 8-byte representation, for example.)
Now derive a type from both:
A type with requirements "A"
A type with requirements "B"
Now you have a type whose requirements are impossible to satisfy, because it must satisfy "A" and "B", but they cannot both be satisfied at once.
Perhaps you're thinking of derived types as having a flat lineage all the way back to a single ancestor, but that's not so. Derived types can have many ancestors. The standard definition of "derived types" discusses this.
[1] While it might seem unreasonable, unlikely and silly, it's allowed.
I work on compilers for a couple of embedded platforms. A user has recently complained about the following behaviour from one of our compilers. Given code like this:
extern volatile int MY_REGISTER;
void Test(void)
{
(void) (MY_REGISTER = 1);
}
The compiler generates this (in pseudo-assembler):
Test:
move regA, 1
store regA, MY_REGISTER
load regB, MY_REGISTER
That is, it not only writes to MY_REGISTER, but reads it back afterwards. The extra load upset him for performance reasons. I explained that this was because according to the standard "An assignment expression has the value of the left operand after the assignment, [...]".
Strangely, removing the cast-to-void changes the behaviour: the load disappears. The user's happy, but I'm just confused.
So I also checked this out in a couple of versions of GCC (3.3 and 4.4). There, the compiler never generates a load, even if the value is explicitly used, e.g.
int TestTwo(void)
{
return (MY_REGISTER = 1);
}
Turns into
TestTwo:
move regA, 1
store regA, MY_REGISTER
move returnValue, 1
return
Does anyone have a view on which is a correct interpretation of the standard? Should the read-back happen at all? Is it correct or useful to add the read only if the value is used or cast to void?
The relevant paragraph in the standard is this
An assignment operator stores a value
in the object designated by the left
operand. An assignment expression has
the value of the left operand after
the assignment, but is not an lvalue.
The type of an assignment expression
is the type of the left operand unless
the left operand has qualified type,
in which case it is the unqualified
version of the type of the left
operand. The side effect of updating the stored value of the left operand shall
occur between the previous and the next sequence point.
So this clearly makes the difference between "the value of the left operand" and the update of the stored value. Also note that the return is not an lvalue (so there is no reference to the variable in the return of the expression) and all qualifiers are lost.
So I read this as gcc doing the right thing when it returns the value that it knowingly has to store.
Edit:
The upcoming standard plans to clarify that by adding a footnote:
The implementation is permitted to
read the object to determine the value
but is not required to, even when the
object has volatile-qualified type.
Edit 2:
Actually there is another paragraph about expression statements that might shed a light on that:
The expression in an expression
statement is evaluated as a void
expression for its side effects. (Footnote: Such as assignments, and function calls which have side effects.)
Since this implies that the effect of returning a value is not wanted for such a statement, this strongly suggests that the value may only be loaded from the variable if the value is used.
As a summary, your customer really is rightly upset when he sees that the variable is loaded. This behavior might be in accordance with the standard if you stretch the interpretation of it, but it clearly is on the borderline of being acceptable.
Reading back seems to be nearer to the standard (especially considering that reading a volatile variable can result in a different value than the one written), but I'm pretty sure it isn't what is expected by most code using volatile, especially in contexts where reading or writing a volatile variable triggers some other effects.
volatile in general isn't very well defined -- "What constitutes an access to an object that
has volatile-qualified type is implementation-defined."
Edit: If I had to write a compiler, I think I wouldn't read back the variable if the value isn't used, and would reread it if it is, but with a warning. Then should a cast to void count as a use?
(void) v;
should surely be one, and considering that, I don't see any reason for
(void) (v = exp);
not to be. But in any case, I'd give a warning explaining how to get the other effect.
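Whatever the compiler decides, the programmer can spell out either intent so that no interpretation is needed. Here is a small sketch. An ordinary volatile object stands in for the memory-mapped register (an assumption made so the code can run on a hosted system), and the function names are made up.

```c
#include <assert.h>

/* Assumption: a plain volatile int replaces the real hardware register. */
volatile int MY_REGISTER;

/* Write only: the assignment is a full expression statement and its value
   is discarded, so nothing in the source asks for a read. */
void write_only(int value)
{
    MY_REGISTER = value;
}

/* Write followed by an explicit read-back: the read is a separate
   expression, so it is an access in its own right. */
int write_then_read(int value)
{
    MY_REGISTER = value;
    return MY_REGISTER;
}

/* Usage on a hosted system, where the read-back returns what was written. */
void demo_register_access(void)
{
    write_only(1);
    assert(MY_REGISTER == 1);
    assert(write_then_read(2) == 2);
}
```

On real hardware the value read back may differ from the value written, which is the reason to keep the two intents apart.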
BTW, if you work on a compiler, you probably have someone in contact with the C committee; filing a formal defect report will get you a binding interpretation (well, there is the risk of the DR being classified "Not A Defect" without any hint about what they want...)
The language in the standard says nothing about reading the volatile variable, only what the value of the assignment expression is, which a) is defined by C semantics, not by the content of the variable and b) isn't used here, so need not be calculated.