Related
For example, can this
unsigned f(float x) {
unsigned u = *(unsigned *)&x;
return u;
}
cause unpredictable results on a platform where,
unsigned and float are both 32-bit
a pointer has a fixed size for all types
unsigned and float can be stored to and loaded from the same part of memory.
I know about strict aliasing rules, but most examples showing problematic cases of violating strict aliasing is like the following.
static int g(int *i, float *f) {
*i = 1;
*f = 0;
return *i;
}
int h() {
int n;
return g(&n, (float *)&n);
}
In my understanding, the compiler is free to assume that i and f are implicitly restrict. The return value of h could be 1 if the compiler thinks *f = 0; is redundant (because i and f can't alias), or it could be 0 if it puts into account that the values of i and f are the same. This is undefined behaviour, so technically, anything else can happen.
However, the first example is a bit different.
unsigned f(float x) {
unsigned u = *(unsigned *)&x;
return u;
}
Sorry for my unclear wording, but everything is done "in-place". I can't think of any other way the compiler might interpret the line unsigned u = *(unsigned *)&x;, other than "copy the bits of x to u".
In practice, all compilers for various architectures I tested in https://godbolt.org/ with full optimization produce the same result for the first example, and varying results (either 0 or 1) for the second example.
I know it's technically possible that unsigned and float have different sizes and alignment requirements, or should be stored in different memory segments. In that case even the first code won't make sense. But on most modern platforms where the following holds, is the first example still undefined behaviour (can it produce unpredictable results)?
unsigned and float are both 32-bit
a pointer has a fixed size for all types
unsigned and float can be stored to and loaded from the same part of memory.
In real code, I do write
unsigned f(float x) {
unsigned u;
memcpy(&u, &x, sizeof(x));
return u;
}
The compiled result is the same as using pointer casting, after optimization. This question is about interpretation of the standard about strict aliasing rules for code such as the first example.
Is it always undefined behaviour to copy the bits of a variable through an incompatible pointer?
Yes.
The rule is https://port70.net/~nsz/c/c11/n1570.html#6.5p7 :
An object shall have its stored value accessed only by an lvalue expression that has one of
the following types:
a type compatible with the effective type of the object,
a qualified version of a type compatible with the effective type of the object,
a type that is the signed or unsigned type corresponding to the effective type of the
object,
a type that is the signed or unsigned type corresponding to a qualified version of the
effective type of the object,
an aggregate or union type that includes one of the aforementioned types among its
members (including, recursively, a member of a subaggregate or contained union), or
a character type.
The effective type of the object x is float - it is defined with that type.
unsigned is not compatible with float,
unsigned is not a qualified version of float,
unsigned is not a signed or unsigned type of float,
unsigned is not a signed or unsigned type corresponding to qualified version of float,
unsigned is not an aggregate or union type
and unsigned is not a character type.
The "shall" is violated, it is undefined behavior (see https://port70.net/~nsz/c/c11/n1570.html#4p2 ). There is no other interpretation.
We also have https://port70.net/~nsz/c/c11/n1570.html#J.2 :
The behavior is undefined in the following circumstances:
An object has its stored value accessed other than by an lvalue of an allowable type (6.5).
As Kamil explains, it's UB. Even int and long (or long and long long) aren't alias-compatible even when they're the same size. (But interestingly, unsigned int is compatible with int)
It's nothing to do with being the same size, or using the same register-set as suggested in a comment, it's mainly a way to let compilers assume that different pointers don't point to overlapping memory when optimizing. They still have to support C99 union type-punning, not just memcpy. So for example a dst[i] = src[i] loop doesn't need to check for possible overlap when unrolling or vectorizing, if dst and src have different types.1
If you're accessing the same integer data, the standard requires that you use the exact same type, modulo only things like signed vs. unsigned and const. Or that you use (unsigned) char*, which is like GNU C __attribute__((may_alias)).
The other part of your question seems to be why it appears to work in practice, despite the UB.
Your godbolt link forgot to link the actual compilers you tried.
https://godbolt.org/z/rvj3d4e4o shows GCC4.1, from before GCC went out of its way to support "obvious" local compile-time-visible cases like this, to sometimes not break people's buggy code using non-portable idioms like this.
It loads garbage from stack memory, unless you use -fno-strict-aliasing to make it movd to that location first. (Store/reload instead of movd %xmm0, %eax is a missed-optimization bug that's been fixed in later GCC versions for most cases.)
f: # GCC4.1 -O3
movl -4(%rsp), %eax
ret
f: # GCC4.1 -O3 -fno-strict-aliasing
movss %xmm0, -4(%rsp)
movl -4(%rsp), %eax
ret
Even that old GCC version warns warning: dereferencing type-punned pointer will break strict-aliasing rules which should make it obvious that GCC notices this and does not consider it well-defined. Later GCC that do choose to support this code still warn.
It's debatable whether it's better to sometimes work in simple cases, but break other times, vs. always failing. But given that GCC -Wall does still warn about it, that's probably a good tradeoff between convenience for people dealing with legacy code or porting from MSVC. Another option would be to always break it unless people use -fno-strict-aliasing, which they should if dealing with codebases that depend on this behaviour.
Being UB doesn't mean required-to-fail
Just the opposite; it would take tons of extra work to actually trap on every signed overflow in the C abstract machine, for example, especially when optimizing stuff like 2 + c - 3 into c - 1. That's what gcc -fsanitize=undefined tries to do, adding x86 jo instructions after additions (except it still does constant-propagation so it's just adding -1, not detecting temporary overflow on INT_MAX. https://godbolt.org/z/WM9jGT3ac). And it seems strict-aliasing is not one of the kinds of UB it tries to detect at run time.
See also the clang blog article: What Every C Programmer Should Know About Undefined Behavior
An implementation is free to define behaviour the ISO C standard leaves undefined
For example, MSVC always defines this aliasing behaviour, like GCC/clang/ICC do with -fno-strict-aliasing. Of course, that doesn't change the fact that pure ISO C leaves it undefined.
It just means that on those specific C implementations, the code is guaranteed to work the way you want, rather than happening to do so by chance or by de-facto compiler behaviour if it's simple enough for modern GCC to recognize and do the more "friendly" thing.
Just like gcc -fwrapv for signed-integer overflows.
Footnote 1: example of strict-aliasing helping code-gen
#define QUALIFIER // restrict
void convert(float *QUALIFIER pf, const int *pi) {
for(int i=0 ; i<10240 ; i++){
pf[i] = pi[i];
}
}
Godbolt shows that with the -O3 defaults for GCC11.2 for x86-64, we get just a SIMD loop with movdqu / cvtdq2ps / movups and loop overhead. With -O3 -fno-strict-aliasing, we get two versions of the loop, and an overlap check to see if we can run the scalar or the SIMD version.
Is there actual cases where strict aliasing helps better code generation, in which the same cannot be achieved with restrict
You might well have a pointer that might point into either of two int arrays, but definitely not at any float variable, so you can't use restrict on it. Strict-aliasing will let the compiler still avoid spill/reload of float objects around stores through the pointer, even if the float objects are global vars or otherwise aren't provably local to the function. (Escape analysis.)
Or a struct node * that definitely isn't the same type as the payload in a tree.
Also, most code doesn't use restrict all over the place. It could get quite cumbersome. Not just in loops, but in every function that deals with pointers to structs. And if you get it wrong and promise something that's not true, your code's broken.
The Standard was never intended to fully, accurately, and unambiguously partition programs that have defined behavior and those that don't(*), but instead relies upon compiler writers to exercise a certain amount of common sense.
(*) If it was intended for that purpose, it fails miserably, as evidenced by the amount of confusion stemming from it.
Consider the following two code snippets:
/* Assume suitable declarations of u are available everywhere */
union test { uint32_t ww[4]; float ff[4]; } u;
/* Snippet #1 */
uint32_t proc1(int i, int j)
{
u.ww[i] = 1;
u.ff[j] = 2.0f;
return u.ww[i];
}
/* Snippet #2, part 1, in one compilation unit */
uint32_t proc2a(uint32_t *p1, float *p2)
{
*p1 = 1;
*p2 = 2.0f;
return *p1;
}
/* Snippet #2, part 2, in another compilation unit */
uint32_t proc2(int i, int j)
{
return proc2a(u.ww+i, u.ff+j);
}
It is clear that the authors of the Standard intended that the first version of the code be processed meaningfully on platforms where that would make sense, but it's also clear that at least some of the authors of C99 and later versions did not intend to require that the second version be processed likewise (some of the authors of C89 may have intended that the "strict aliasing rule" only apply to situations where a directly named object would be accessed via pointer of another type, as shown in the example given in the published Rationale; nothing in the Rationale suggests a desire to apply it more broadly).
On the other hand, the Standard defines the [] operator in such a fashion that proc1 is semantically equivalent to:
uint32_t proc3(int i, int j)
{
*(u.ww+i) = 1;
*(u.ff+j) = 2.0f;
return *(u.ww+i);
}
and there's nothing in the Standard that would imply that proc() shouldn't have the same semantics. What gcc and clang seem to do is special-case the [] operator as having a different meaning from pointer dereferencing, but nothing in the Standard makes such a distinction. The only way to consistently interpret the Standard is to recognize that the form with [] falls in the category of actions which the Standard doesn't require that implementations process meaningfully, but relies upon them to handle anyway.
Constructs such as yours example of using a directly-cast pointer to access storage associated with an object of the original pointer's type fall in a similar category of constructs which at least some authors of the Standard likely expected (and would have demanded, if they didn't expect) that compilers would handle reliably, with or without a mandate, since there was no imaginable reason why a quality compiler would do otherwise. Since then, however, clang and gcc have evolved to defy such expectations. Even if clang and gcc would normally generate useful machine code for a function, they seek to perform aggressive inter-procedural optimizations that make it impossible to predict what constructs will be 100% reliable. Unlike some compilers which refrain from applying potential optimizing transforms unless they can prove that they are sound, clang and gcc seek to perform transforms that can't be proven to affect program behavior.
#include <stdio.h>
int main()
{
const int a = 12;
int *p;
p = &a;
*p = 70;
}
Will it work?
It's "undefined behavior," meaning that based on the standard you can't predict what will happen when you try this. It may do different things depending on the particular machine, compiler, and state of the program.
In this case, what will most often happen is that the answer will be "yes." A variable, const or not, is just a location in memory, and you can break the rules of constness and simply overwrite it. (Of course this will cause a severe bug if some other part of the program is depending on its const data being constant!)
However in some cases -- most typically for const static data -- the compiler may put such variables in a read-only region of memory. MSVC, for example, usually puts const static ints in .text segment of the executable, which means that the operating system will throw a protection fault if you try to write to it, and the program will crash.
In some other combination of compiler and machine, something entirely different may happen. The one thing you can predict for sure is that this pattern will annoy whoever has to read your code.
It's undefined behaviour. Proof:
/* program.c */
int main()
{
const int a = 12;
int* p;
p = &a;
*p = 70;
printf("%d\n", a);
return 0;
}
gcc program.c
and run it. Output will be 70 (gcc 4.3)
Then compile it like this:
gcc -O2 program.c
and run it. The output will be 12. When it does optimisation, the compiler presumably loads 12 into a register and doesn't bother to load it again when it needs to access a for the printf because it "knows" that a can't change.
Modifying a const qualified object through a pointer invokes undefined behaviour, and such is the result. It may be something you'd expect from a particular implementation, e.g. the previous value unchanged, if it has been placed in .text, etc.
It does indeed work with gcc. It didn't like it though:
test.c:6: warning: assignment discards qualifiers from pointer target type
But the value did change when executed. I won't point out the obvious no-no...
yes, you can make it done by using such code. but the code do not apply when when a is global (a gcc-compiled program gave me segmentation fault.)
generally speaking, in beloved C, you can almost always find someway to hack things that are not supposed to be changed or exposed. const here being a example.
But thinking about the poor guy(maybe myself after 6 months) maintains our code, I often choose not do so.
Here the type of pointer p is int*, which is being assigned the value of type const int* (&a => address of a const int variable).
Implicit cast eliminates the constness, though gcc throws a warning (please note this largely depends on the implementation).
Since the pointer is not declared as a const, value can be changed using such pointer.
if the pointer would be declared as const int* p = &a, you won't be able to do *p = 70.
This code contains a constraint violation:
const int a = 12;
int *p;
p = &a;
The constraint violated is C11 6.5.16.1/1 "Simple assignment"; if both operands are pointers then the type pointed to by the left must have all the qualifiers of the type pointed to by the right. (And the types, sans qualifiers, must be compatible).
So the constraint is violated because &a has type const int *, which has const as a qualifier; but that qualifier does not appear in the type of p which is int *.
The compiler must emit a diagnostic and might not generate an executable. The behaviour of any executable would be completely undefined, since the program does not comply with the rules of the language.
You cannot change the value of a constant variable by using a pointer pointing to it. This type of pointer is called as Pointer to a constant.
There is also another concept called Constant Pointer. It means that once a pointer points to a memory location you cannot make it point to the another location.
Bad, BAD idea.
Also, the behavior is platform- and implementation-specific. If you're running on a platform where the constant is stored in non-writable memory, this obviously won't work.
And, why on earth would you want to? Either update the constant in your source, or make it a variable.
The problem with changing the value of const variable is that the compiler will not expect that to happen. Consider this code:
const int a = 12;
int * p = &a;
*p = 70;
printf("%d\n", a);
Why would the compiler read a in the last statement? The compiler knows that a is 12 and since it is const, it will never change. So the optimizer may transform the code above into this:
const int a = 12;
int * p = &a;
*p = 70;
printf("%d\n", 12);
This can lead to strange issues. E.g. the code might work as desired in debug builds without optimization but it will fail in release builds with optimization.
Actually a good optimizer might transform the entire code to this:
printf("%d\n", 12);
As all other code before has no effect in the eye of the compiler. Leaving out code that has no effect will also have no effect on the overall program.
On the other hand, a decent compiler will recognize, that your code is faulty and warn you, since
int * p = &a;
is actually wrong. Correct would be:
const int * p = &a;
as p is not a pointer to int, it is a pointer to const int and when declared like that, the next line will cause a hard compile error.
To get rid of the warning, you have to cast:
int * p = (int *)&a;
And an even better compiler will recognize that this cast breaks the const promise and instruct the optimizer to not treat a as const.
As you can see, the quality, capabilities and settings of the compilerwill decide in the end what behavior you can expect. This implies that the same code may show different behavior on different platforms or when using different compilers on the same platform.
If the C standard had defined a behavior for that case, all compilers would have to implement it and no matter what the standard had defined, it would have been hard to implement, putting a huge burden on everyone who wants to write a compiler. Even if the standard had just said "This is forbidden", all compilers would have to perform complex data flow analysis to enforce this rule. So the standard just doesn't define it. It defines that const values cannot be changed and if you find a way to change them anyway, there is no behavior you can rely on.
Yes, you can change the value of a constant variable.
Try this code:
#include <stdio.h>
int main()
{
const int x=10;
int *p;
p=(int*)&x;
*p=12;
printf("%d",x);
}
#include <stdio.h>
int main()
{
const int a = 12;
int *p;
p = &a;
*p = 70;
}
Will it work?
It's "undefined behavior," meaning that based on the standard you can't predict what will happen when you try this. It may do different things depending on the particular machine, compiler, and state of the program.
In this case, what will most often happen is that the answer will be "yes." A variable, const or not, is just a location in memory, and you can break the rules of constness and simply overwrite it. (Of course this will cause a severe bug if some other part of the program is depending on its const data being constant!)
However in some cases -- most typically for const static data -- the compiler may put such variables in a read-only region of memory. MSVC, for example, usually puts const static ints in .text segment of the executable, which means that the operating system will throw a protection fault if you try to write to it, and the program will crash.
In some other combination of compiler and machine, something entirely different may happen. The one thing you can predict for sure is that this pattern will annoy whoever has to read your code.
It's undefined behaviour. Proof:
/* program.c */
int main()
{
const int a = 12;
int* p;
p = &a;
*p = 70;
printf("%d\n", a);
return 0;
}
gcc program.c
and run it. Output will be 70 (gcc 4.3)
Then compile it like this:
gcc -O2 program.c
and run it. The output will be 12. When it does optimisation, the compiler presumably loads 12 into a register and doesn't bother to load it again when it needs to access a for the printf because it "knows" that a can't change.
Modifying a const qualified object through a pointer invokes undefined behaviour, and such is the result. It may be something you'd expect from a particular implementation, e.g. the previous value unchanged, if it has been placed in .text, etc.
It does indeed work with gcc. It didn't like it though:
test.c:6: warning: assignment discards qualifiers from pointer target type
But the value did change when executed. I won't point out the obvious no-no...
yes, you can make it done by using such code. but the code do not apply when when a is global (a gcc-compiled program gave me segmentation fault.)
generally speaking, in beloved C, you can almost always find someway to hack things that are not supposed to be changed or exposed. const here being a example.
But thinking about the poor guy(maybe myself after 6 months) maintains our code, I often choose not do so.
Here the type of pointer p is int*, which is being assigned the value of type const int* (&a => address of a const int variable).
Implicit cast eliminates the constness, though gcc throws a warning (please note this largely depends on the implementation).
Since the pointer is not declared as a const, value can be changed using such pointer.
if the pointer would be declared as const int* p = &a, you won't be able to do *p = 70.
This code contains a constraint violation:
const int a = 12;
int *p;
p = &a;
The constraint violated is C11 6.5.16.1/1 "Simple assignment"; if both operands are pointers then the type pointed to by the left must have all the qualifiers of the type pointed to by the right. (And the types, sans qualifiers, must be compatible).
So the constraint is violated because &a has type const int *, which has const as a qualifier; but that qualifier does not appear in the type of p which is int *.
The compiler must emit a diagnostic and might not generate an executable. The behaviour of any executable would be completely undefined, since the program does not comply with the rules of the language.
You cannot change the value of a constant variable by using a pointer pointing to it. This type of pointer is called as Pointer to a constant.
There is also another concept called Constant Pointer. It means that once a pointer points to a memory location you cannot make it point to the another location.
Bad, BAD idea.
Also, the behavior is platform- and implementation-specific. If you're running on a platform where the constant is stored in non-writable memory, this obviously won't work.
And, why on earth would you want to? Either update the constant in your source, or make it a variable.
The problem with changing the value of const variable is that the compiler will not expect that to happen. Consider this code:
const int a = 12;
int * p = &a;
*p = 70;
printf("%d\n", a);
Why would the compiler read a in the last statement? The compiler knows that a is 12 and since it is const, it will never change. So the optimizer may transform the code above into this:
const int a = 12;
int * p = &a;
*p = 70;
printf("%d\n", 12);
This can lead to strange issues. E.g. the code might work as desired in debug builds without optimization but it will fail in release builds with optimization.
Actually a good optimizer might transform the entire code to this:
printf("%d\n", 12);
As all other code before has no effect in the eye of the compiler. Leaving out code that has no effect will also have no effect on the overall program.
On the other hand, a decent compiler will recognize, that your code is faulty and warn you, since
int * p = &a;
is actually wrong. Correct would be:
const int * p = &a;
as p is not a pointer to int, it is a pointer to const int and when declared like that, the next line will cause a hard compile error.
To get rid of the warning, you have to cast:
int * p = (int *)&a;
And an even better compiler will recognize that this cast breaks the const promise and instruct the optimizer to not treat a as const.
As you can see, the quality, capabilities and settings of the compilerwill decide in the end what behavior you can expect. This implies that the same code may show different behavior on different platforms or when using different compilers on the same platform.
If the C standard had defined a behavior for that case, all compilers would have to implement it and no matter what the standard had defined, it would have been hard to implement, putting a huge burden on everyone who wants to write a compiler. Even if the standard had just said "This is forbidden", all compilers would have to perform complex data flow analysis to enforce this rule. So the standard just doesn't define it. It defines that const values cannot be changed and if you find a way to change them anyway, there is no behavior you can rely on.
Yes, you can change the value of a constant variable.
Try this code:
#include <stdio.h>
int main()
{
const int x=10;
int *p;
p=(int*)&x;
*p=12;
printf("%d",x);
}
#include <stdio.h>
int main()
{
const int a = 12;
int *p;
p = &a;
*p = 70;
}
Will it work?
It's "undefined behavior," meaning that based on the standard you can't predict what will happen when you try this. It may do different things depending on the particular machine, compiler, and state of the program.
In this case, what will most often happen is that the answer will be "yes." A variable, const or not, is just a location in memory, and you can break the rules of constness and simply overwrite it. (Of course this will cause a severe bug if some other part of the program is depending on its const data being constant!)
However in some cases -- most typically for const static data -- the compiler may put such variables in a read-only region of memory. MSVC, for example, usually puts const static ints in .text segment of the executable, which means that the operating system will throw a protection fault if you try to write to it, and the program will crash.
In some other combination of compiler and machine, something entirely different may happen. The one thing you can predict for sure is that this pattern will annoy whoever has to read your code.
It's undefined behaviour. Proof:
/* program.c */
int main()
{
const int a = 12;
int* p;
p = &a;
*p = 70;
printf("%d\n", a);
return 0;
}
gcc program.c
and run it. Output will be 70 (gcc 4.3)
Then compile it like this:
gcc -O2 program.c
and run it. The output will be 12. When it does optimisation, the compiler presumably loads 12 into a register and doesn't bother to load it again when it needs to access a for the printf because it "knows" that a can't change.
Modifying a const qualified object through a pointer invokes undefined behaviour, and such is the result. It may be something you'd expect from a particular implementation, e.g. the previous value unchanged, if it has been placed in .text, etc.
It does indeed work with gcc. It didn't like it though:
test.c:6: warning: assignment discards qualifiers from pointer target type
But the value did change when executed. I won't point out the obvious no-no...
yes, you can make it done by using such code. but the code do not apply when when a is global (a gcc-compiled program gave me segmentation fault.)
generally speaking, in beloved C, you can almost always find someway to hack things that are not supposed to be changed or exposed. const here being a example.
But thinking about the poor guy(maybe myself after 6 months) maintains our code, I often choose not do so.
Here the type of pointer p is int*, which is being assigned the value of type const int* (&a => address of a const int variable).
Implicit cast eliminates the constness, though gcc throws a warning (please note this largely depends on the implementation).
Since the pointer is not declared as a const, value can be changed using such pointer.
if the pointer would be declared as const int* p = &a, you won't be able to do *p = 70.
This code contains a constraint violation:
const int a = 12;
int *p;
p = &a;
The constraint violated is C11 6.5.16.1/1 "Simple assignment"; if both operands are pointers then the type pointed to by the left must have all the qualifiers of the type pointed to by the right. (And the types, sans qualifiers, must be compatible).
So the constraint is violated because &a has type const int *, which has const as a qualifier; but that qualifier does not appear in the type of p which is int *.
The compiler must emit a diagnostic and might not generate an executable. The behaviour of any executable would be completely undefined, since the program does not comply with the rules of the language.
You cannot change the value of a constant variable by using a pointer pointing to it. This type of pointer is called as Pointer to a constant.
There is also another concept called Constant Pointer. It means that once a pointer points to a memory location you cannot make it point to the another location.
Bad, BAD idea.
Also, the behavior is platform- and implementation-specific. If you're running on a platform where the constant is stored in non-writable memory, this obviously won't work.
And, why on earth would you want to? Either update the constant in your source, or make it a variable.
The problem with changing the value of const variable is that the compiler will not expect that to happen. Consider this code:
const int a = 12;
int * p = &a;
*p = 70;
printf("%d\n", a);
Why would the compiler read a in the last statement? The compiler knows that a is 12 and since it is const, it will never change. So the optimizer may transform the code above into this:
const int a = 12;
int * p = &a;
*p = 70;
printf("%d\n", 12);
This can lead to strange issues. E.g. the code might work as desired in debug builds without optimization but it will fail in release builds with optimization.
Actually a good optimizer might transform the entire code to this:
printf("%d\n", 12);
As all other code before has no effect in the eye of the compiler. Leaving out code that has no effect will also have no effect on the overall program.
On the other hand, a decent compiler will recognize, that your code is faulty and warn you, since
int * p = &a;
is actually wrong. Correct would be:
const int * p = &a;
as p is not a pointer to int, it is a pointer to const int and when declared like that, the next line will cause a hard compile error.
To get rid of the warning, you have to cast:
int * p = (int *)&a;
And an even better compiler will recognize that this cast breaks the const promise and instruct the optimizer to not treat a as const.
As you can see, the quality, capabilities and settings of the compilerwill decide in the end what behavior you can expect. This implies that the same code may show different behavior on different platforms or when using different compilers on the same platform.
If the C standard had defined a behavior for that case, all compilers would have to implement it and no matter what the standard had defined, it would have been hard to implement, putting a huge burden on everyone who wants to write a compiler. Even if the standard had just said "This is forbidden", all compilers would have to perform complex data flow analysis to enforce this rule. So the standard just doesn't define it. It defines that const values cannot be changed and if you find a way to change them anyway, there is no behavior you can rely on.
Yes, you can change the value of a constant variable.
Try this code:
#include <stdio.h>
int main()
{
const int x=10;
int *p;
p=(int*)&x;
*p=12;
printf("%d",x);
}
I used the following piece of code to read data from files as part of a larger program.
double data_read(FILE *stream,int code) {
char data[8];
switch(code) {
case 0x08:
return (unsigned char)fgetc(stream);
case 0x09:
return (signed char)fgetc(stream);
case 0x0b:
data[1] = fgetc(stream);
data[0] = fgetc(stream);
return *(short*)data;
case 0x0c:
for(int i=3;i>=0;i--)
data[i] = fgetc(stream);
return *(int*)data;
case 0x0d:
for(int i=3;i>=0;i--)
data[i] = fgetc(stream);
return *(float*)data;
case 0x0e:
for(int i=7;i>=0;i--)
data[i] = fgetc(stream);
return *(double*)data;
}
die("data read failed");
return 1;
}
Now I am told to use -O2 and I get following gcc warning:
warning: dereferencing type-punned pointer will break strict-aliasing rules
Googleing I found two orthogonal answers:
Concluding: there's no need to worry; gcc tries to be more law obedient than
the actual law.
vs
So basically if you have an int* and a float* they are not allowed to point to the same memory location. If your code does not respect this, then the compiler's optimizer will most likely break your code.
In the end I don't want to ignore the warnings. What would you recommend?
[update] I substituted the toy example with the real function.
The problem occurs because you access a char-array through a double*:
char data[8];
...
return *(double*)data;
But gcc assumes that your program will never access variables though pointers of different type. This assumption is called strict-aliasing and allows the compiler to make some optimizations:
If the compiler knows that your *(double*) can in no way overlap with data[], it's allowed to all sorts of things like reordering your code into:
return *(double*)data;
for(int i=7;i>=0;i--)
data[i] = fgetc(stream);
The loop is most likely optimized away and you end up with just:
return *(double*)data;
Which leaves your data[] uninitialized. In this particular case the compiler might be able to see that your pointers overlap, but if you had declared it char* data, it could have given bugs.
But, the strict-aliasing rule says that a char* and void* can point at any type. So you can rewrite it into:
double data;
...
*(((char*)&data) + i) = fgetc(stream);
...
return data;
Strict aliasing warnings are really important to understand or fix. They cause the kinds of bugs that are impossible to reproduce in-house because they occur only on one particular compiler on one particular operating system on one particular machine and only on full-moon and once a year, etc.
It looks a lot as if you really want to use fread:
int data;
fread(&data, sizeof(data), 1, stream);
That said, if you do want to go the route of reading chars, then reinterpreting them as an int, the safe way to do it in C (but not in C++) is to use a union:
union
{
char theChars[4];
int theInt;
} myunion;
for(int i=0; i<4; i++)
myunion.theChars[i] = fgetc(stream);
return myunion.theInt;
I'm not sure why the length of data in your original code is 3. I assume you wanted 4 bytes; at least I don't know of any systems where an int is 3 bytes.
Note that both your code and mine are highly non-portable.
Edit: If you want to read ints of various lengths from a file, portably, try something like this:
unsigned result=0;
for(int i=0; i<4; i++)
result = (result << 8) | fgetc(stream);
(Note: In a real program, you would additionally want to test the return value of fgetc() against EOF.)
This reads a 4-byte unsigned from the file in little-endian format, regardless of what the endianness of the system is. It should work on just about any system where an unsigned is at least 4 bytes.
If you want to be endian-neutral, don't use pointers or unions; use bit-shifts instead.
Using a union is not the correct thing to do here. Reading from an unwritten member of the union is undefined - i.e. the compiler is free to perform optimisations that will break your code (like optimising away the write).
This doc summarizes the situation: http://dbp-consulting.com/tutorials/StrictAliasing.html
There are several different solutions there, but the most portable/safe one is to use memcpy(). (The function calls may be optimized out, so it's not as inefficient as it appears.) For example, replace this:
return *(short*)data;
With this:
short temp;
memcpy(&temp, data, sizeof(temp));
return temp;
Basically you can read gcc's message as guy you are looking for trouble, don't say I didn't warn ya.
Casting a three byte character array to an int is one of the worst things I have seen, ever. Normally your int has at least 4 bytes. So for the fourth (and maybe more if int is wider) you get random data. And then you cast all of this to a double.
Just do none of that. The aliasing problem that gcc warns about is innocent compared to what you are doing.
The authors of the C Standard wanted to let compiler writers generate efficient code in circumstances where it would be theoretically possible but unlikely that a global variable might have its value accessed using a seemingly-unrelated pointer. The idea wasn't to forbid type punning by casting and dereferencing a pointer in a single expression, but rather to say that given something like:
int x;
int foo(double *d)
{
x++;
*d=1234;
return x;
}
a compiler would be entitled to assume that the write to *d won't affect x. The authors of the Standard wanted to list situations where a function like the above that received a pointer from an unknown source would have to assume that it might alias a seemingly-unrelated global, without requiring that types perfectly match. Unfortunately, while the rationale strongly suggests that authors of the Standard intended to describe a standard for minimum conformance in cases where a compiler would otherwise have no reason to believe that things might alias, the rule fails to require that compilers recognize aliasing in cases where it is obvious and the authors of gcc have decided that they'd rather generate the smallest program it can while conforming to the poorly-written language of the Standard, than generate code which is actually useful, and instead of recognizing aliasing in cases where it's obvious (while still being able to assume that things that don't look like they'll alias, won't) they'd rather require that programmers use memcpy, thus requiring a compiler to allow for the possibility that pointers of unknown origin might alias just about anything, thus impeding optimization.
Apparently the standard allows sizeof(char*) to be different from sizeof(int*) so gcc complains when you try a direct cast. void* is a little special in that everything can be converted back and forth to and from void*.
In practice I don't know many architecture/compiler where a pointer is not always the same for all types but gcc is right to emit a warning even if it is annoying.
I think the safe way would be
int i, *p = &i;
char *q = (char*)&p[0];
or
char *q = (char*)(void*)p;
You can also try this and see what you get:
char *q = reinterpret_cast<char*>(p);