C -- Modify const through aliased non-const pointer

Is it allowed in standard C for a function to modify an int given as const int * using an aliased int *? To put it another way, is the following code guaranteed to always print 42 and 1 in standard C?
#include <stdio.h>
void foo(const int *a, int *b)
{
printf("%d\n", *a);
*b = 1;
printf("%d\n", *a);
}
int main(void)
{
int a = 42;
foo(&a, &a);
return 0;
}

In your example code, you have an integer. You take a const pointer to it, and a non-const pointer to it. Modifying the integer via the non-const pointer is legal and well-defined, of course.
Since both pointers are pointers to integers, and the const pointer need not point to a const object, then the compiler should expect that the value read from the const pointer could have changed, and is required to reload the value.
Note that this would not be the case if you had used the restrict keyword, because it specifies that a pointer argument does not alias any other pointer argument, so then the compiler could optimise the reload away.
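As a rough sketch of the restrict case: the function name read_twice_restrict below is made up for illustration, and the function returns the sum of the two reads instead of printing them. With restrict on both parameters, the caller promises that a and b refer to different objects, so the compiler is free to reuse the first value read from *a.

```c
/* Hypothetical variant of foo from the question. restrict is a promise by
   the caller that a and b do not refer to the same object. */
int read_twice_restrict(const int *restrict a, int *restrict b)
{
    int first = *a;
    *b = 1;
    int second = *a;   /* the compiler may reuse `first` here instead of reloading */
    return first + second;
}
```

Calling this with two distinct ints is fine. Calling it as read_twice_restrict(&x, &x) breaks the promise and is undefined behavior, unlike the original foo(&a, &a).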

Yes and yes. Your program is defined.
The fact that you point to a non-const int variable with a pointer to const int doesn't make that variable const; it may still be modified through a pointer to int or by using the original variable name.
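To make that concrete, here is a small sketch. The helper name observe and the out array are my own additions, used so the two reads can be inspected instead of printed.

```c
/* Hypothetical helper: records what *a yields before and after a write
   through b. a and b are allowed to point to the same int. */
void observe(const int *a, int *b, int out[2])
{
    out[0] = *a;
    *b = 1;
    out[1] = *a;   /* has to be read again: b may have changed the object */
}
```

With int x = 42; and a call observe(&x, &x, out), the two recorded values are 42 and 1, matching what the question's program prints.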

Yes you can do this (if you know you can get away with it).
One reason you may not be able to get away with it is that if the destination memory you are writing to is in a read-only protected area (such as constant data), you will get an access violation. This applies, for example, to any compile-time consts that end up in read-only data sections of the executable. Most platforms support protecting such sections from being written to at runtime.
Basically don't do it.
There are other issues with your example that probably don't make it the best demonstration, such as needing a reload of *a in the 2nd printf, which the compiler may optimize out! (It knows 'a' did not change, it knows 'a' points to a const, therefore it does not need to reload memory by performing a memory load for the 2nd '*a' expression; it can reuse the value it probably has in a register from the 1st time it loaded '*a'.) Now if you add in a memory barrier in between, then your example has a chance of working better.
https://en.wikipedia.org/wiki/Memory_barrier
GCC ? asm volatile ("" : : : "memory"); // might work before 2nd printf
But as for the principle of the actual question you asked: yes, you can do it if you know what you are doing about other things like that.

Yes, it is guaranteed to always print 42 and 1.
const int *a means the value pointed to is constant as far as pointer a is concerned.
Try assigning through a (*a = 10;) in the function and you will get an error.
The pointer a however is not constant. You can do a = b for example.
b can point to the same address as a and/or modify the value, as you did in your example.
If you declared b as a pointer to const as well (const int *b), the assignment *b = 1 would produce an error.
I try to memorize it like this:
const int *a - a points to an object of type int, which a is not allowed to modify (any other pointer to that object can do what it wants, depending on its declaration/definition).
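The rules above can be put into one short sketch. The function name const_pointer_rules and the variables x and y are only for illustration.

```c
/* Hypothetical demonstration of what a pointer to const allows. */
int const_pointer_rules(void)
{
    int x = 42, y = 7;
    const int *a = &x;
    int *b = &x;

    /* *a = 10;   would not compile: a is a pointer to const int */
    *b = 10;          /* fine: b is a plain int * to the same object */
    int seen = *a;    /* reads the updated value of x */

    a = &y;           /* fine: the pointer a itself is not const */
    return seen + *a; /* value of x after the write plus value of y */
}
```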

Related

Misunderstanding in particular use case of pointers and double-pointers

I'm dealing with pointers, double pointers and arrays, and I think I'm getting a bit mixed up. I've been reading about it, but my particular use case is confusing me, and I'd appreciate it if someone could clear things up. This is a small piece of code I've built to show my misunderstanding:
#include <stdio.h>
#include <stdint.h>
void fnFindValue_vo(uint8_t *vF_pu8Msg, uint8_t vF_u8Length, uint8_t **vF_ppu8Match, uint8_t vF_u8Value)
{
for(int i=0; i<vF_u8Length; i++)
{
if(vF_u8Value == vF_pu8Msg[i])
{
*vF_ppu8Match = &vF_pu8Msg[i];
break;
}
}
}
int main()
{
uint8_t u8Array[]={0,0,0,1,2,4,8,16,32,64};
uint8_t *pu8Reference = &u8Array[3];
/*
* Purpose: Find the index of a value in u8Array from a reference
* Reference: First non-zero value
* Condition: using the function with those input arguments
*/
// WAY 1
uint8_t *pu8P2 = &u8Array[0];
uint8_t **ppu8P2 = &pu8P2;
fnFindValue_vo(u8Array,10,ppu8P2,16); // Should be diff=4
uint8_t u8Diff1 = *ppu8P2 - pu8Reference;
printf("Diff1: %u\n", u8Diff1);
// WAY 2
uint8_t* ppu8Pos; // Why this does not need to be initialized and ppu8P2 yes
fnFindValue_vo(u8Array,10,&ppu8Pos,64); // Should be diff=6
uint8_t u8Diff2 = ppu8Pos - pu8Reference;
printf("Diff2: %u\n", u8Diff2);
}
Suppose the function fnFindValue_vo and its arguments cannot be changed. So my purpose is to find the relative index of a value in the array taking as reference the first non-zero value (no need to find it, can be hard-coded).
In the first way, I've done it following my logic and understanding of pointers. I have pu8P2, which contains the address of the first member of u8Array, and ppu8P2, which contains the address of pu8P2. So after calling the function, I just need to subtract the pointers 'pointing' into u8Array to get the relative index.
Anyway, I tried another method. I just created a pointer and passed its address to the function, without initializing the pointer. So later I just need to subtract those two pointers and I also get the relative index.
My confusion comes with this second method.
Why does ppu8Pos not have to be initialized, while ppu8P2 does? I.e. why couldn't I declare it as uint8_t **ppu8P2;? (It gives me a segmentation fault.)
Which of the two methods is more practical/better practice for coding?
Why is it possible to give the address of a pointer when the function's argument is a double pointer?
Why ppu8Pos does not have to be initialized, and ppu8P2 yes
You are not using the value of ppu8Pos right away. Instead, you pass its address to another function, where it gets assigned by reference. On the other hand, ppu8P2 is itself the value you pass to the function, which dereferences it to store the result, so it must be initialised to point at a real pointer variable (here pu8P2).
Which of the two methods is more practical/better practice for coding
They are identical for all intents and purposes, for exactly the same reason these two fragments are identical:
// 1
double t = sin(x)/cos(x);
// 2
double s = sin(x), c = cos(x);
double t = s/c;
In one case, you use a variable initialised to a value. In the other case, you use a value directly. The type of the value doesn't really matter. It could be a double, or a pointer, or a pointer to a pointer.
Why is it possible to give the address to a pointer when the function's argument is a double pointer?
These two things you mention, an address to a pointer and a double pointer, are one and the same thing. They are not two very similar things, or virtually indistinguishable, or any weak formulation like that. No, the two wordings mean exactly the same, to all digits after the decimal point.
The address of a pointer (like e.g. &pu8P2) is a pointer to a pointer.
The result of &pu8P2 is a pointer to the variable pu8P2.
And since pu8P2 is of the type uint8_t * then a pointer to such a type must be uint8_t **.
Regarding ppu8Pos, it doesn't need to be initialized, because that happens in the fnFindValue_vo function with the assignment *vF_ppu8Match = &vF_pu8Msg[i].
But there is a trap here: If the condition vF_u8Value == vF_pu8Msg[i] is never true then the assignment never happens and ppu8Pos will remain uninitialized. So that initialization of ppu8Pos is really needed after all.
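One way to sidestep that trap without touching fnFindValue_vo is to initialise the pointer to NULL and check it afterwards. The wrapper find_relative_index below is a made-up helper, not part of the original code; take it as a sketch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* The function from the question, unchanged. */
void fnFindValue_vo(uint8_t *vF_pu8Msg, uint8_t vF_u8Length, uint8_t **vF_ppu8Match, uint8_t vF_u8Value)
{
    for (int i = 0; i < vF_u8Length; i++)
    {
        if (vF_u8Value == vF_pu8Msg[i])
        {
            *vF_ppu8Match = &vF_pu8Msg[i];
            break;
        }
    }
}

/* Hypothetical wrapper: stores the index of `value` relative to `ref` in *out
   and returns true, or returns false if the value is not in the array. */
bool find_relative_index(uint8_t *arr, uint8_t len, const uint8_t *ref,
                         uint8_t value, ptrdiff_t *out)
{
    uint8_t *match = NULL;   /* stays NULL when nothing is found */
    fnFindValue_vo(arr, len, &match, value);
    if (match == NULL)
        return false;
    *out = match - ref;      /* ptrdiff_t can also hold a negative offset */
    return true;
}
```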
The "practicality" of each solution is more an issue of personal opinion I believe, so I leave that unanswered.
For starters, the function fnFindValue_vo can be a source of undefined behavior, because it does not set the pointer *vF_ppu8Match when the target value is not found in the array.
Also, it is very strange that the size of the array is specified by an object of the type uint8_t. This does not make sense.
The function should be declared at least the following way
void fnFindValue_vo( const uint8_t *vF_pu8Msg, size_t vF_u8Length, uint8_t **vF_ppu8Match, uint8_t vF_u8Value )
{
const uint8_t *p = vF_pu8Msg;
while ( p != vF_pu8Msg + vF_u8Length && *p != vF_u8Value ) ++p;
*vF_ppu8Match = ( uint8_t * )p;
}
The difference between the two approaches used in your question is that in the first code snippet, if the target element is not found, the pointer will still point to the first element of the array
uint8_t *pu8P2 = &u8Array[0];
And this expression
uint8_t u8Diff1 = *ppu8P2 - pu8Reference;
will yield some confusing positive value (due to the type uint8_t) because the difference *ppu8P2 - pu8Reference will be negative.
In the second code snippet in this case you will get undefined behavior due to this statement
uint8_t u8Diff2 = ppu8Pos - pu8Reference;
because the pointer ppu8Pos was not initialized.
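To illustrate the "confusing positive value": storing a negative pointer difference in a uint8_t reduces it modulo 256, while ptrdiff_t keeps the sign. The helper name diff_both_ways is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: stores the same pointer difference once as ptrdiff_t
   and once narrowed to uint8_t, as the question's code does. */
void diff_both_ways(const uint8_t *p, const uint8_t *ref,
                    uint8_t *narrow, ptrdiff_t *wide)
{
    *wide = p - ref;                /* signed result, may be negative */
    *narrow = (uint8_t)(p - ref);   /* reduced modulo 256 */
}
```

For p = &u8Array[0] and ref = &u8Array[3], the ptrdiff_t result is -3 and the uint8_t result is 253.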
Honestly, I have not tried to understand your code completely, but my advice is: do not overcomplicate it.
I would start with one fact which helped me untangle this:
if you have int a[10]; then a, used in an expression, converts to a pointer to its first element; in fact
int x = a[2] is exactly the same as int x = *(a+2) - you can try it.
So let's have
int a[10]; //this is an array
//a converts to a pointer to the beginning of the array
a[2] is an int and it is the third value in that array, stored at memory location a plus the size of two ints;
&a[2] is a pointer to that third value
*(a) is the first value in the array a
*(a+1) is the same as a[1] and it is the second int value in array a
and finally
**a is the same as *(*a), which means: *a takes the first int value in the array a (the same as above), and the second asterisk means "take that int, pretend it is a pointer and fetch the value from that location". For an int array the compiler rejects this, because an int cannot be dereferenced; forcing it with a cast would most likely read garbage.
https://stackoverflow.com/questions/42118190/dereferencing-a-double-pointer
Only when you have int a[5][5]; would a[0] be the first row (usable as a pointer to its first element) and a[1] the second row, so that **a would be the same as a[0][0].
https://beginnersbook.com/2014/01/2d-arrays-in-c-example/
Drawing it on paper as suggested in the comments helps, but what helped me a lot was learning to use a debugger and breakpoints. Put a breakpoint at the first line and then go through the program step by step. In the "watches" put all variants like
pu8P2,&pu8P2,*pu8P2,**pu8P2 and see what is going on.
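If you prefer checking the indexing identities above in code rather than in a debugger, a small sketch could look like this. The function name and the array contents are arbitrary choices of mine.

```c
/* Hypothetical check: returns 1 if all the identities listed above hold. */
int indexing_identities_hold(void)
{
    int a[10] = {10, 11, 12, 13};
    int a2[5][5] = {{0}};
    a2[0][0] = 99;

    return a[2] == *(a + 2)      /* indexing is pointer arithmetic plus dereference */
        && &a[2] == a + 2        /* address of the third element */
        && *a == a[0]            /* first element */
        && *(a + 1) == a[1]      /* second element */
        && **a2 == a2[0][0];     /* double dereference only makes sense for a 2D array */
}
```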

Why is it Illegal to create a Pointer to a Pointer that points to a const in C?

This is the source of the common "discarded const qualifier at assignment" error. However, I don't understand why it is illegal to do so?
Consider this code,
int sum_array(const int *a, int n){
int sum, *i;
sum = 0;
for(i = a; i<a+n; i++)
sum += *i;
return sum;
}
Obviously I can do the same operation using i = 0 and comparing a + i < a+n; however, it doesn't make sense to me why simply copying the address of a variable into another variable is illegal?
Typically the const variable indicates that the value of a variable cannot be changed. E.g. const int x = 7, here we declare that the value of x should not be changed from 7.
However, with pointers to const, even creating the "potential" to change the variable is illegal. I.e. i = a does not change the value of what a points to; what would truly be "illegal" would be *i = 0. Still, i = a is rejected because it gives us the potential to change what a points to.
I know you can just answer me with "because the language was created that way", I'm just wondering if there's anything I'm missing here.
The primary purpose of const is to enlist the compiler's help in ensuring that you don't accidentally modify a given object. It uses the type system for this. You can actually circumvent it if you want, by writing an explicit cast; but usually you don't want that, because if you wanted that then that usually means you didn't really want the const to begin with.
const serves two purposes:
in the contract of the function call (here int sum_array(const int *a, int n)), it means that the argument a will not be used by the function to modify the contents pointed to by a (a is not constant; it is a pointer to things considered constant). The caller then knows that the data will not be modified through the call, and the compiler must enforce this as far as it can;
inside the function definition, the implementor is protected against accidental modification of the contents pointed to by a, i.e. if he tries to write something like *a = someValue; the compiler will complain that the contract is violated. But it also verifies that this contract is respected in every way. Thus, any variable derived from a that may give access to the same data must be considered const, or the contract will be violated. Writing int *p = a is a clear violation of that, because p is not a pointer to const data, which would let you write *p = someValue to modify the data.
Be aware that const doesn't mean that data is const, only that through the pointer it is forbidden to modify the data. Then:
const int *a...
int *p = a;
is forbidden but
int *a...
const int *p = a;
is correct because in that case you have a pointer which lets you modify data through it, and you construct a derived one that restricts access to the same data through the derived pointer. Data is modifiable through a but not through p. Restricting is never dangerous; opening could be.
Of course you can force the dangerous derivation by using a cast (which lets you remove const), but I would not recommend doing so. My advice: if you are tempted to do so, please refrain and think again; if after that you really want to do so, let things rest for a full night and think about it again the day after... The recipe is:
const int *a...;
int *p = (int *)a; // "seriously refrain" advice
The question proposes that, given const int *a, int *i = a; could be allowed and a subsequent *i = 0; would be disallowed (“illegal”).
This is not feasible because it requires the compiler to track information about the source of the data in i. At the point where *i = 0; appears, the compiler has to know that i contains a value resulting from the initialization from a.
We could do that in very simple code. But consider that the code may be very complicated, with loops, branches, and function calls. At the point where *i = 0; appears, the compiler generally cannot know whether i still holds the address from a or has been changed to something else. (I expect this is equivalent to the Halting Problem and hence is logically impossible in general).
C uses types to manage this. When a is a pointer to const, it may only be assigned to a pointer to const (unless overridden by explicit cast). So i is required to be a pointer to const, which enables the compiler to know that the thing it points to should not be modified.
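Applied to the function from the question, that means declaring the loop pointer as a pointer to const. A minimal sketch of the corrected version:

```c
/* sum_array from the question, with the iterator declared as const int *
   so that assigning a to it discards no qualifier. */
int sum_array(const int *a, int n)
{
    int sum = 0;
    for (const int *i = a; i < a + n; i++)
        sum += *i;   /* reading through a pointer to const is fine */
    return sum;
}
```

The pointer i itself is still modifiable (i++ works); only writes through it are rejected.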

Is it undefined behavior to modify a value that I also have a const pointer pointing to

Does the following scenario have undefined behavior?
void do_stuff(const int *const_pointer, int *pointer) {
printf("%i\n", *const_pointer);
*pointer = 1;
}
int data = 0;
do_stuff(&data, &data);
If this is undefined behavior it could probably cause problems if the compiler assumes that the value that const_pointer points to never changes. In this case it might reorder both instructions in do_stuff and thereby change the behavior from the intended printf("0") to printf("1").
If the compiler can prove that the value pointed to by a pointer to const will not change, then it will not need to reload the value or keep the ordering.
In this case this cannot be done because the two pointers may alias, so the compiler cannot assume the pointers don't point to the same object. (The call to the function might be done from a different translation unit, the compiler doesn't know what is passed to the function.)
There is no undefined behavior in your example, and you are guaranteed that the printf will output 0.
A pointer to const such as const_pointer merely means that the data it points to cannot be changed through that pointer.
When you pass a pointer to your function, it's entirely up to you to decide whether it should be const or not. const is a protection tool that helps you prevent unwanted changes to your data. But without const, a function can still work. For example, your own version of strcpy can be written either as:
char *strcpy( char *s, const char *t );
or,
char *strcpy( char *s, char *t ); // without const, still works, just not as good as the first version
So there shouldn't be anything unexpected in your code: you cannot modify data via const_pointer, but you can modify it via pointer (even when the two pointers are pointing to the same location).
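For illustration, here is what such an own version with a const source might look like. The name my_strcpy is made up to avoid clashing with the library function.

```c
/* Hypothetical own version of strcpy. The const on t documents, and lets the
   compiler check, that the source string is never written through t. */
char *my_strcpy(char *s, const char *t)
{
    char *ret = s;
    while ((*s++ = *t++) != '\0')
        ;   /* copy characters up to and including the terminator */
    return ret;
}
```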

What is C-equivalent of reference to a pointer "*&"

Could someone please let me know the C-equivalent of reference to a pointer "*&"?
In other word, if my function is like this in C++:
void func(int* p, int*& pr)
{
p++;
pr++;
}
How would I change the second argument when converting it to C?
UPDATE:
@MikeDeSimone: Please let me know if I understood the translated code properly?
Let me start by initializing variable:
int i = 10;
int *p1 = &i;
int **pr= &p1;
So, when you performed (*pr)++ , that is basically equivalent to:
(p1)++
However, I fail to understand how would that look from inside main()?
Question 2: what would I do if I have code snippet like this?
void pass_by_reference(int*& p)
{
//Allocate new memory in p: this change would reflect in main
p = new int;
}
You use a pointer to a pointer.
void func(int* p, int** pr)
{
p++;
(*pr)++;
}
See, for example, the second parameter to strtoul, which the function uses to return the point at which parsing stopped.
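A short sketch of that strtoul pattern: the wrapper parse_prefix and its rest parameter are hypothetical names; strtoul itself is the standard function from <stdlib.h>.

```c
#include <stdlib.h>

/* Hypothetical wrapper: parses a leading decimal number and reports, through
   the pointer-to-pointer rest, where parsing stopped. */
unsigned long parse_prefix(const char *text, const char **rest)
{
    char *end;
    unsigned long value = strtoul(text, &end, 10);  /* strtoul writes to end */
    *rest = end;
    return value;
}
```

For the input "123abc" the returned value is 123 and *rest points at the 'a'.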
Sorry for the late update...
Please let me know if I understood the translated code properly? Let me start by initializing variable:
int i = 10;
int *p1 = &i;
int **pr= &p1;
So, when you performed (*pr)++ , that is basically equivalent to:
(p1)++
Yes.
However, I fail to understand how would that look from inside main()?
I don't understand how main comes into this; we were talking about func. For this discussion, main would be a function like any other. Variables declared within a function only exist during execution of that function.
Question 2: what would I do if I have code snippet like this?
void pass_by_reference(int*& p)
{
//Allocate new memory in p: this change would reflect in main
p = new int;
}
The thing to remember about references passed into functions is that they are just saying "this parameter is a reference to the parameter passed to the function, and changing it changes the original. It is not a local copy like non-reference parameters."
Reviewing references in practice:
If your function is declared void func(int foo); and called with int k = 0; func(k); then a copy of k is made that func sees as foo.
If func changes foo, k does not change. You will often see functions "trash" their passed-in-by-copy parameters like this.
If your function is declared void func(int& foo); and called with int k = 0; func(k); then a reference to k is made that func sees as foo.
If func changes foo, it is actually changing k.
This is often done to "pass back" more values than just the return value, or when the function needs to persistently modify the object somehow.
Now the thing is that C doesn't have references. But, to be honest, C++ references are C pointers under the hood. The difference is that references cannot be NULL, should not be taken as pointing to the start of a C array, and references hide the pointer operations to make it look like you're working on the variable directly.
So every time you see a reference in C++, you need to convert it to a pointer in C. The referred-to type does not matter; even if it's a pointer, the reference turns into a pointer itself, so you have a pointer-to-pointer.
So what's a pointer, anyway? Remember that memory is just a big array of bytes, with every byte having an address. A pointer is a variable that contains an address. C and C++ give pointers types so the language can determine what kind of data the pointer is pointing to. Thus an int is an integer value, and an int* is a pointer to an integer value (as opposed to a pointer to a character, or structure, or whatever).
This means you can do two general things with a pointer: you can operate on the pointer itself, or you can operate on the object the pointer is pointing to. The latter is what happens when you use unary prefix * (e.g. *pr) or -> if the pointer points to a structure. (a->b is really just shorthand for (*a).b.)
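Putting this together for the second question, a possible C translation of pass_by_reference might look as follows. It uses malloc in place of new, which is my substitution, and it repeats func so the two cases can be compared.

```c
#include <stdlib.h>

/* C translation sketch: the C++ reference int*& becomes int **. */
void pass_by_reference(int **p)
{
    /* Allocate new memory; the caller's pointer is updated through *p.
       malloc may return NULL, which the caller should check. */
    *p = malloc(sizeof **p);
}

void func(int *p, int **pr)
{
    p++;       /* changes only the local copy of the pointer */
    (*pr)++;   /* changes the caller's pointer */
}
```

From inside main (or any other caller) you would write int *q = NULL; pass_by_reference(&q); and afterwards q holds the newly allocated address, to be released with free(q) when done.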

Compiler optimization about elimination of pointer operation on inline function in C?

If this function Func1 is inlined,
inline int Func1 (int* a)
{
return *a + 1;
}
int main ()
{
int v = GetIntFromUserInput(); // Unknown at compile-time.
return Func1(&v);
}
Can I expect a smart compiler to eliminate the pointer operations? (&v and *a)
As I guess, the function will be transformed into something like this,
int main ()
{
int v = GetIntFromUserInput(); // Unknown at compile-time.
int* a = &v;
return *a + 1;
}
and finally,
int main ()
{
int v = GetIntFromUserInput(); // Unknown at compile-time.
return v + 1;
}
Pointer operations look like they are easily eliminated. But I heard that pointer operations are something special and cannot be optimized.
Yes the compiler, as said by Wallyk, is able to remove useless operations in this case.
However you must remember that when you specify a function signature something is lost in the translation from your problem domain to C. Consider the following function:
void transform(const double *xyz, // Source point
double *txyz, // Transformed points
const double *m, // 4x3 transformation matrix
int n) // Number of points to transform
{
for (int i=0; i<n; i++) {
txyz[0] = xyz[0]*m[0] + xyz[1]*m[3] + xyz[2]*m[6] + m[9];
txyz[1] = xyz[0]*m[1] + xyz[1]*m[4] + xyz[2]*m[7] + m[10];
txyz[2] = xyz[0]*m[2] + xyz[1]*m[5] + xyz[2]*m[8] + m[11];
txyz += 3; xyz += 3;
}
}
I think that the intent is clear; however, the compiler must be paranoid and consider that the generated code must behave exactly as described by the C semantics, even in cases that are of course not part of the original problem of transforming an array of points, like:
txyz and xyz are pointing to the same memory address, or maybe they are pointing to adjacent doubles in memory
m is pointing inside the txyz area
This means that for the above function the C compiler is forced to assume that after each write to txyz any of xyz or m could change, and so those values cannot be loaded in free order. The resulting code consequently will not be able to take advantage of parallel execution, for example of the computations of the three coordinates, even if the CPU would allow it.
This case of aliasing was so common that C99 introduced a specific keyword to be able to tell the compiler that nothing so strange was intended. Putting the restrict keyword in the declaration of txyz and m reassures the compiler that the pointed-to memory is not accessible using other ways and the compiler is then allowed to generate better code.
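A sketch of what the restrict-qualified version could look like. I named it transform_restrict and, as an assumption on my part, qualified all three pointers; qualifying txyz alone already tells the compiler that writes through it cannot touch what xyz and m point to.

```c
/* Hypothetical restrict variant of transform. The caller promises that the
   three arrays do not overlap, so loads from xyz and m may be freely
   reordered around the stores to txyz. */
void transform_restrict(const double *restrict xyz,  /* Source points */
                        double *restrict txyz,       /* Transformed points */
                        const double *restrict m,    /* 4x3 transformation matrix */
                        int n)                       /* Number of points to transform */
{
    for (int i = 0; i < n; i++) {
        txyz[0] = xyz[0]*m[0] + xyz[1]*m[3] + xyz[2]*m[6] + m[9];
        txyz[1] = xyz[0]*m[1] + xyz[1]*m[4] + xyz[2]*m[7] + m[10];
        txyz[2] = xyz[0]*m[2] + xyz[1]*m[5] + xyz[2]*m[8] + m[11];
        txyz += 3; xyz += 3;
    }
}
```

Passing overlapping arrays to this version would be undefined behavior, which is exactly the freedom the optimizer gains.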
However this "paranoid" behavior is still necessary for all operations to ensure correctness and so for example if you write code like
char *s = malloc(...);
char *t = malloc(...);
... use s and t ...
the compiler has no way to know that the two memory areas will be non-overlapping or, to say it better, there is no way to define a signature in the C language to express the concept that returned values from malloc are "non overlapping". This means that the paranoid compiler (unless some non-standard declarations are present for malloc and the compiler has a special handling for it) will think in the subsequent code that any write to something pointed by s will possibly overwrite data pointed by t (even when you're not getting past the size passed to malloc I mean ;-) ).
In your example case even a paranoid compiler is allowed to assume that
no one will know the address of a local variable unless getting it as a parameter
no unknown external code is executed between the reading and computation of addition
If both those points are lost then the compiler must think to strange possibilities; for example
int *a = malloc(sizeof(int));
*a = 1;
printf("Hello, world.\n");
// Here *a could have been changed
This crazy thought is necessary because malloc knows the address stored in a; so it could have passed this information to printf, which after printing the string could use that address to change the content of the location. This seems clearly absurd and maybe the library function declaration could contain some special unportable trick, but it's necessary for correctness in general (imagine malloc and printf being two user defined functions instead of library ones).
What does all this blurb mean? That yes, in your case the compiler is allowed to optimize, but it's very easy to remove this possibility; for example
inline int Func1 (int* a) {
printf("pointed value is %i\n", *a);
return *a + 1;
}
int main () {
int v = GetIntFromUserInput(); // Assume input value is non-determinable.
printf("Address of v is %p\n", &v);
return Func1(&v);
}
is a simple variation of your code, but in this case the compiler cannot avoid assuming that the second printf call could have changed the pointed memory even if it's passed just the pointed value and not the address (because the first call to printf was passed the address and so the compiler must assume that potentially that function could have stored the address to use it later to alter the variable).
A very common misconception in C and C++ is that liberal use of the keyword const with pointers or (in C++) references will help the optimizer generating better code.
This is completely false:
In the declaration const char *s, nothing is said about the pointed-to character being constant; it simply says that it is an error to change the pointed-to character using that pointer. In other words, const in this case simply means that the pointer is "read-only"; it does not rule out that, for example, other pointers could be used to change the very same memory pointed to by s.
It is legal in C (and C++) to "cast away" const-ness from a pointer (or reference) to constant. So the paranoid compiler must assume that even if a function has been handed only a const int *, the function could store that pointer and later use it to change the memory pointed to.
The const keyword with pointers (and C++ references) is only meant as an aid for the programmer to avoid unintentional writes through a pointer that was meant to be used only for reading. Once this check is performed, the const keyword is simply forgotten by the optimizer because it has no implications for the semantics of the language.
Sometimes you may find another silly use of the const keyword with parameters that tells that the value of the parameter cannot be changed; for example void foo(const int x).
This kind of use has no real philosophical meaning for the signature and simply puts some little annoyance on the implementation of the called function: a parameter is a copy of a value and caller shouldn't care if the called function is going to change that copy or not... the called function can still make a copy of the parameter and change that copy so nothing is gained anyway.
To recap... when the compiler sees
void foo(const int * const x);
it must still assume that foo will potentially store away a copy of the passed pointer and can use this copy to change the memory pointed to by x, immediately or later when you call any other unknown function.
This level of paranoia is required because of how the language semantic is defined.
It is very important to understand this "aliasing" problem (there can be different ways to alter the same writable area of memory), especially with C++ where there is a common anti-pattern of passing around const references instead of values even when logically the function should accept a value. See this answer if you are also using C++.
All these are the reasons for which when dealing with pointers or references the optimizer has much less freedom than with local copies.
It is reasonable that it might occur. For example, gcc -O3 does so:
.globl main
.type main, #function
main:
pushl %ebp
movl %esp, %ebp
andl $-16, %esp
call GetIntFromUserInput
movl %ebp, %esp
popl %ebp
addl $1, %eax
ret
Notice that it takes the return value from the function, adds one, and returns.
Interestingly, it also compiled a Func1, probably since inline seems like it should have the meaning of static, but an external function (like GetIntFromUserInput) ought to be able to call it. If I add static (and leave inline), it does remove the function's code.
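For reference, the static inline variant I mean looks roughly like this. The caller call_func1 is a made-up stand-in for main, taking the user input as a parameter.

```c
/* With static, Func1 has internal linkage, so the compiler does not have to
   emit a separate callable copy once every call has been inlined. */
static inline int Func1(int *a)
{
    return *a + 1;
}

/* Hypothetical caller standing in for main; v plays the role of the
   value obtained from user input. */
int call_func1(int v)
{
    return Func1(&v);
}
```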

Resources