Does strict aliasing apply when using pointers to struct members? - c

Does test_func in the following snippet trigger undefined behavior under the strict aliasing rules when the two arguments partially overlap?
That is, when the second argument points to a member of the first:
#include <stdio.h>
typedef struct
{
//... Other fields
int x;
//... Other fields
} A;
int test_func(A *a, int *x)
{
a->x = 0;
*x = 1;
return a->x;
}
int main()
{
A a = {0};
printf("%d\n", test_func(&a, &a.x));
return 0;
}
Is the compiler allowed to assume that test_func will just return 0, on the grounds that A* and int* will not alias, so that the write through x cannot overwrite the member?

Strict aliasing refers to when a pointer is converted to another pointer type, after which the contents are accessed. Strict aliasing means that the involved pointed-at types must be compatible. That does not apply here.
There is, however, the more general term pointer aliasing, meaning that two pointers can refer to the same memory. The compiler is not allowed to assume that this cannot happen here. If it wants to do optimizations like the one you describe, it would have to add machine code that compares the pointers with each other to determine whether they refer to the same object, which in itself would make the function slightly slower.
To help the compiler optimize such code, you can declare the pointers as restrict, which tells the compiler that the programmer guarantees that the pointers are not pointing at the same memory.
Your function compiled with gcc -O3 results in this machine code:
0x00402D09 mov $0x1,%edx
Which basically means that the whole function call was inlined and replaced with the constant 1, that is, the value a.x holds after the write through x.
But if I rewrite your function as
int test_func(A* restrict a, int* restrict x)
{
a->x = 0;
*x = 1;
return a->x;
}
and compile with gcc -O3, it does return 0, because I have now told the compiler that a->x and x do not point at the same memory. It can therefore assume that *x = 1; does not affect the result, and either skip the line *x = 1; or sequence it before the line a->x = 0;.
The optimized machine code of the restrict version actually skips the whole function call, since the compiler knows that the value is already 0 as per your initialization.
This is of course a bug, but the programmer is to blame for it, for careless use of restrict.

This is not a violation of strict aliasing. The strict aliasing rule says (simplified) that you can access the value of an object only using an lvalue expression of a compatible type. In this case, the object you're accessing is the member x of main's a variable. This member has type int. And the expression you use to access it (*x) also has type int. So there's no problem.
You may be confusing strict aliasing with restrict. If you had used the restrict keyword in the declaration of one of the pointer parameters, the code would be invalid because restrict prevents you from using different pointers to access the same object - but this is a different issue than strict aliasing.

Related

Why is it allowed to modify a constant using a pointer in C? [duplicate]

#include <stdio.h>
int main()
{
const int a = 12;
int *p;
p = &a;
*p = 70;
}
Will it work?
It's "undefined behavior," meaning that based on the standard you can't predict what will happen when you try this. It may do different things depending on the particular machine, compiler, and state of the program.
In this case, what will most often happen is that the answer will be "yes." A variable, const or not, is just a location in memory, and you can break the rules of constness and simply overwrite it. (Of course this will cause a severe bug if some other part of the program is depending on its const data being constant!)
However, in some cases -- most typically for const static data -- the compiler may put such variables in a read-only region of memory. MSVC, for example, usually puts const static ints in the .text segment of the executable, which means that the operating system will raise a protection fault if you try to write to it, and the program will crash.
In some other combination of compiler and machine, something entirely different may happen. The one thing you can predict for sure is that this pattern will annoy whoever has to read your code.
It's undefined behaviour. Proof:
/* program.c */
#include <stdio.h>
int main()
{
const int a = 12;
int* p;
p = &a;
*p = 70;
printf("%d\n", a);
return 0;
}
gcc program.c
and run it. Output will be 70 (gcc 4.3)
Then compile it like this:
gcc -O2 program.c
and run it. The output will be 12. When it does optimisation, the compiler presumably loads 12 into a register and doesn't bother to load it again when it needs to access a for the printf because it "knows" that a can't change.
Modifying a const qualified object through a pointer invokes undefined behaviour, and such is the result. It may be something you'd expect from a particular implementation, e.g. the previous value unchanged, if it has been placed in .text, etc.
It does indeed work with gcc. It didn't like it though:
test.c:6: warning: assignment discards qualifiers from pointer target type
But the value did change when executed. I won't point out the obvious no-no...
Yes, you can do it with code like this, but it does not work when a is global (a gcc-compiled program gave me a segmentation fault).
Generally speaking, in beloved C you can almost always find some way to hack things that are not supposed to be changed or exposed; const is one example.
But thinking about the poor guy (maybe myself after 6 months) who has to maintain our code, I often choose not to do so.
Here the type of pointer p is int*, which is being assigned the value of type const int* (&a => address of a const int variable).
The implicit conversion discards the const qualifier, though gcc emits a warning (please note this largely depends on the implementation).
Since the pointer is not declared as a pointer to const, the value can be changed through it as far as the type system is concerned.
If the pointer were declared as const int* p = &a, you would not be able to do *p = 70.
This code contains a constraint violation:
const int a = 12;
int *p;
p = &a;
The constraint violated is C11 6.5.16.1/1 "Simple assignment"; if both operands are pointers then the type pointed to by the left must have all the qualifiers of the type pointed to by the right. (And the types, sans qualifiers, must be compatible).
So the constraint is violated because &a has type const int *, which has const as a qualifier; but that qualifier does not appear in the type of p which is int *.
The compiler must emit a diagnostic and might not generate an executable. The behaviour of any executable would be completely undefined, since the program does not comply with the rules of the language.
You cannot change the value of a constant variable through a pointer declared as pointing to a const type (const int *p). This kind of pointer is called a pointer to a constant.
There is also another concept called a constant pointer (int *const p). It means that once the pointer points to a memory location, you cannot make it point to another location.
Bad, BAD idea.
Also, the behavior is platform- and implementation-specific. If you're running on a platform where the constant is stored in non-writable memory, this obviously won't work.
And, why on earth would you want to? Either update the constant in your source, or make it a variable.
The problem with changing the value of const variable is that the compiler will not expect that to happen. Consider this code:
const int a = 12;
int * p = &a;
*p = 70;
printf("%d\n", a);
Why would the compiler read a in the last statement? The compiler knows that a is 12 and since it is const, it will never change. So the optimizer may transform the code above into this:
const int a = 12;
int * p = &a;
*p = 70;
printf("%d\n", 12);
This can lead to strange issues. E.g. the code might work as desired in debug builds without optimization but it will fail in release builds with optimization.
Actually a good optimizer might transform the entire code to this:
printf("%d\n", 12);
As all other code before has no effect in the eye of the compiler. Leaving out code that has no effect will also have no effect on the overall program.
On the other hand, a decent compiler will recognize that your code is faulty and warn you, since
int * p = &a;
is actually wrong. Correct would be:
const int * p = &a;
as p is not a pointer to int, it is a pointer to const int and when declared like that, the next line will cause a hard compile error.
To get rid of the warning, you have to cast:
int * p = (int *)&a;
And an even better compiler might recognize that this cast breaks the const promise and instruct the optimizer not to treat a as const.
As you can see, the quality, capabilities and settings of the compiler will decide in the end what behavior you can expect. This implies that the same code may show different behavior on different platforms or when using different compilers on the same platform.
If the C standard had defined a behavior for that case, all compilers would have to implement it and no matter what the standard had defined, it would have been hard to implement, putting a huge burden on everyone who wants to write a compiler. Even if the standard had just said "This is forbidden", all compilers would have to perform complex data flow analysis to enforce this rule. So the standard just doesn't define it. It defines that const values cannot be changed and if you find a way to change them anyway, there is no behavior you can rely on.
Yes, you can change the value of a constant variable.
Try this code:
#include <stdio.h>
int main()
{
const int x=10;
int *p;
p=(int*)&x;
*p=12;
printf("%d",x);
}

Is the C restrict qualifier transitive?

While there are many examples [1][2][3] that address how the restrict keyword works, I am not completely sure if the restrictified relation is transitive on the pointers that it can point to. For instance, the following code declares a structure that contains an integer and a pointer to an integer.
typedef struct container_s {
int x;
int *i;
} container_s;
int bar(container_s *c, int *i) {
int* tmp = c->i;
*tmp = 5;
*i = 4;
return *tmp;
}
int main(){
return 0;
}
Does the compiler need an extra load instruction for the last access of tmp (the returned value) because it cannot infer that *i and *tmp do not alias?
If so, would this new definition fix that load?
int bar(container_s *c, int* restrict i) { ... }
EDIT
This case int bar(container_s *c, int * restrict i) { ... } removes the extra load when I produce LLVM IR (clang -S -O3 -emit-llvm). However, I do not understand why the next two modifications do not remove that final load when:
I update the definition of the function (is the restrict transitively considered for c->i?) to:
int bar(container_s * restrict c, int *i) { ... }
I update the structure as below (Why cannot the compiler infer that there is no need for an extra load?):
typedef struct container_s {
int x;
int * restrict i;
} container_s;
int bar(container_s *c, int *i) { ... }
I'm having trouble seeing how transitivity applies here, but I can speak to your example.
Does the compiler need an extra load instruction for the last access of tmp (the returned value) because it cannot infer that *i and *tmp do not alias?
The compiler indeed cannot safely infer that *i and *tmp do not alias in your original code, as you have aptly demonstrated. It does not follow that the compiler specifically needs to emit the load instruction implied by the abstract machine semantics of the * operator, but it does need to take care to deal with the aliasing issue somehow.
If so, would [restrict-qualifying parameter i] fix that load?
Adding restrict-qualification to parameter i in the function definition places the following additional requirement on the behavior of the program (derived from the text of C2011, 6.7.3.1/4): during each execution of bar(), because i is (trivially) based on i, and *i is used to access the object it designates, and that designated object is modified during the execution of bar() (via *i at least), every other lvalue used to access the object designated by *i shall also have its address based on i.
*tmp is accessed, and its address, tmp, is not based on i. Therefore, if i == tmp (that is, if on some call, i == c->i) then the program fails to conform. Its behavior is undefined in that case. The compiler is free to emit code that assumes the program conforms, so in particular, in the restrict-qualified case it can emit code that assumes both that the statement
*i = 4;
does not modify *tmp, and that the statement
*tmp = 5;
does not modify *i. Indeed, it seems consistent with the definition and express intent of restrict that compilers be free to make precisely those assumptions.
In particular, if the compiler chooses to handle the possibility of aliasing in the original code by performing the possibly-redundant load of *tmp, then in the restrict-qualified version it might choose to optimize by omitting that load. However, the resulting machine code is by no means required to differ in any way between the two cases. That is, you cannot, in general, rely on the compiler to make use of all the optimizations available to it.
Update:
The followup questions ask why clang does not perform a particular optimization under particular circumstances. Before anything else, it is essential to reiterate that C compilers have no responsibility whatever for performing any particular optimization that may be possible for given source code, except as they themselves document. Therefore, one generally cannot draw any conclusions from the fact that a given optimization is not performed, and it is rarely useful to ask why a given optimization was not performed.
About as far as you can go -- and I am interpreting the questions in this light -- is to ask whether the optimization in question is one that a conforming compiler could have performed. In this case the standard underscores that by taking the unusual step of clarifying that restrict imposes no optimization obligation on implementations:
A translator is free to ignore any or all aliasing implications of uses of restrict.
(C2011, 6.7.3.1/6)
With that said, on to the questions.
In the first variant (container_s * restrict c), *tmp is an lvalue whose address is based on restrict-qualified pointer c. The object it designates is accessed via that lvalue within the scope of the function, and also modified within that scope (via *tmp, so the compiler can certainly see it). The address of *i is not based on c, so the compiler is free to assume that *i does not alias *tmp, just as in the original question.
The second variant (the restrict-qualified struct member) is different. Although it is permitted to restrict-qualify struct members, restrict has effect only when it qualifies an ordinary identifier (C2011, 6.7.3.1/1), which struct member names are not (C2011, 6.2.3). In this case, restrict has no effect, and to ensure conforming behavior, the compiler must account for the possibility that c->i and *i (and *tmp) are aliases.
"would this new header fix that load?", --> No. The restrict refers to i, and accessing its fields:
... requires that all accesses to that object use, directly or indirectly, the value of that particular pointer... C11 §6.7.3 8
but does not extend to qualify those fields when they in turn are used to access other data.
#include<stdio.h>
typedef struct container_s {
int x;
int *i;
} container_s;
int bar(container_s * c, int* restrict i) {
int* tmp = c->i;
*tmp = 5;
*i = 4;
return *tmp;
}
int main(void) {
int i = 42;
container_s s = { 1, &i };
printf("%d\n", bar(&s, &i));
printf("%d\n", i);
printf("%d\n", *(s.i));
}
Output
4
4
4

Meaning of volatile for arrays and typecasts

Folks,
Consider this (abominable) piece of code:
volatile unsigned long a[1];
unsigned long T;
void main(void)
{
a[0] = 0x6675636b; /* first access of a */
T = *a;
*(((char *)a) + 3) = 0x64; /* second access of a */
T = *a;
}
...the question: is ((char *)a) volatile or non-volatile?
This begs a larger question: should there be a dependence between the two accesses of a? That is, human common sense says there is, but the C99 standard says that volatile things don't alias non-volatile things -- so if ((char *)a) is non-volatile, then the two accesses don't alias, and there isn't a dependence.
More correctly, C99 6.7.3 (para 5) reads:
"If an attempt is made to refer to an object defined with a volatile-qualified type through use of an lvalue with non-volatile-qualified type, the behavior is undefined."
So when we typecast a, does the volatile qualifier apply?
When in doubt, run some code :) I whipped up some similar (slightly less abominable) test code (win32 C++ app in msvs 2k10)
int _tmain(int argc, _TCHAR* argv[]) {
int a = 0;
volatile int b = 0;
a = 1; //breakpoint 1
b = 2; //breakpoint 2
*(int *) &b = 0; //breakpoint 3
*(volatile int *) &b = 0; //breakpoint 4
return 0;
}
When compiled for release, I am allowed to breakpoint at 2 and 4, but not 1 and 3.
My conclusion is that the typecast determines the behavior and 1 and 3 were optimized away. Intuition supports this - otherwise compiler would have to keep some type of list of all locations of memory listed as volatile and check on every access (hard, ugly), rather than just associating it with the type of the identifier (easier and more intuitive).
I also suspect it's compiler specific (and possibly flag specific even within a compiler) and would test on any platform before depending on this behavior.
Actually scratch that, I would simply try to not depend on this behavior :)
Also, I know you were asking specifically about arrays, but I doubt that makes a difference. You can easily whip up similar test code for arrays.
Like you said, it's "undefined", which means demons can come out of your nose. Please stick to the "defined" behaviours as much as possible. A volatile specifier asks the compiler not to optimize accesses to the value, since it is an "important" and critical value that might cause problems if changed by different optimization mechanisms. But that's all it can do.
