assignment expressions and volatile - c

I seem to have a reasonable understanding of volatiles in general, but there's one seemingly obscure case, in which I'm not sure how things are supposed to work per the standard. I've read the relevant parts of C99 and a dozen or more related posts on SO, but can't find the logic in this case or a place where this case is explained.
Suppose we have this piece of code:
int a, c;
volatile int b;
a = b = 1;
c = b += 1; /* or equivalently c = ++b; */
Should a be evaluated like this:
b = 1;
a = b; // volatile is read
or like this:
b = 1;
a = 1; // volatile isn't read
?
Similarly, should c be evaluated like this:
int tmp = b;
tmp++;
b = tmp;
c = b; // volatile is read
or like this:
int tmp = b;
tmp++;
b = tmp;
c = tmp; // volatile isn't read
?
In simple cases like a = b; c = b; things are clear. But how about the ones above?
Basically, the question is: what exactly does "expression has the value of the left operand after the assignment" mean in 6.5.16p3 of C99 when the object is volatile?
An assignment operator stores a value in the object designated by the left operand. An assignment expression has the value of the left operand after the assignment, but is not an lvalue.
Does it imply an extra read of the volatile to produce the value of the assignment expression?
UPDATE:
So, here's the dilemma.
If "the value of the object after the assignment" is not obtained from the extra read of the volatile object, then the compiler makes the assumption that the volatile object b:
- is capable of holding an arbitrary int value that gets written into it, which it may not be (say, bit 0 is hardwired to 0, which is not an unusual thing with hardware registers, for which we are supposed to use volatiles)
- cannot change between the point when the assigning write has occurred and the point when the expression value is obtained (and again it can be a problem with hardware registers)
And because of all that, the expression value, if not obtained from the extra read of the volatile object, does not yield the value of the volatile object, which the standard claims should be the case.
Both of these assumptions don't seem to fit well with the nature of volatile objects.
If, OTOH, "the value of the object after the assignment" is obtained from the extra implied read of said volatile object, then the side effects of evaluating assignment expressions with volatile left operands depend on whether the expression value is used or not or are completely arbitrary, which would be an odd, unexpected and poorly documented behavior.
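The dilemma can be made concrete with a small model. This is only a sketch: the register is simulated with two hypothetical helpers (reg_write and reg_read) in which bit 0 is hardwired to 0, no real volatile hardware is involved, and the two assign_* functions simply spell out the two candidate evaluations of a = b = v.

```c
#include <assert.h>

/* Simulated register storage (hypothetical model, not real hardware). */
static unsigned int reg_bits;

/* Writing: bit 0 is hardwired to 0, so it never sticks. */
static void reg_write(unsigned int v) { reg_bits = v & ~1u; }

/* Reading returns whatever the simulated hardware currently holds. */
static unsigned int reg_read(void) { return reg_bits; }

/* Interpretation 1: the value of the assignment comes from an extra read. */
unsigned int assign_with_readback(unsigned int v)
{
    reg_write(v);
    return reg_read();
}

/* Interpretation 2: the value of the assignment is the value that was written. */
unsigned int assign_without_readback(unsigned int v)
{
    reg_write(v);
    return v;
}

/* Usage: the two interpretations disagree whenever bit 0 of v is set. */
void demo_register_model(void)
{
    assert(assign_with_readback(3u) == 2u);
    assert(assign_without_readback(3u) == 3u);
    assert(assign_with_readback(4u) == assign_without_readback(4u));
}
```

With an odd value the two functions return different results, which is exactly the case where the choice between the two readings of the standard becomes visible.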

C11 clarifies that this is unspecified.
In the final draft of C11, the second sentence you quoted now refers to footnote 111:
An assignment operator stores a value in the object designated by the left operand. An assignment expression has the value of the left operand after the assignment,111) but is not an lvalue.
Footnote 111 says this:
The implementation is permitted to read the object to determine the value but is not required to, even when the object has volatile-qualified type.
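Since the footnote leaves the read-back up to the implementation, code that cares can state the intended accesses explicitly instead of chaining. A minimal sketch, assuming an ordinary volatile int named reg stands in for b from the question (reg and the function names are made up for illustration):

```c
#include <assert.h>

/* Hypothetical stand-in for the volatile object b from the question. */
static volatile int reg;

/* Exactly one write and no read-back: a = b = v without the ambiguity. */
int store_only(int v)
{
    reg = v;
    return v;
}

/* One write followed by one explicit read. */
int store_then_read(int v)
{
    reg = v;
    return reg;
}

/* c = ++b spelled out as one read and one write; the result comes from the temporary. */
int increment_once(void)
{
    int tmp = reg;   /* single volatile read */
    tmp += 1;
    reg = tmp;       /* single volatile write */
    return tmp;
}

/* Usage: on plain memory all three behave like the chained forms. */
void demo_explicit_accesses(void)
{
    assert(store_only(1) == 1);
    assert(store_then_read(1) == 1);
    assert(increment_once() == 2);
}
```

On a real hardware register store_only and store_then_read may return different values, which is why it seems safer to pick one explicitly rather than rely on what a = b = 1 happens to compile to.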

From common sense I'd argue like this:
If b = (whatever) and whatever can be stored in a register, there's no reason for the compiler to re-evaluate the expression for assignment.
Also because it cannot be more recent than the value in the register.
Consider f(x) vs. r = f(x): Once the result of f(x) is known, it can be assigned.
So for a = b = 1 there should be no reason for assigning 1 to b a second time, just to be able to assign to a.
Also assume you write a = ++b:
Obviously b cannot be incremented a second time; otherwise basic C semantics would be broken.

Related

Question on "array objects" and undefined behavior

In C, suppose for a pointer p we do *p++ = 0. If p points to an int variable, is this defined behavior?
You can do arithmetic resulting in pointing one past the end of an "array object" per the standard, but I am unable to find a really precise definition of "array object" in the standard. I don't think in this context it means just an object explicitly defined as an array, because p=malloc(sizeof(int)); ++p; pretty clearly is intended to be defined behavior.
If a variable does not qualify as an "array object", then as far as I can tell *p++ = 0 is undefined behavior.
I am using the C23 draft, but an answer citing the C11 standard would probably answer the question too.
Yes it is well-defined. Pointer arithmetic is defined by the additive operators so that's where you need to look.
C17 6.5.6/7
For the purposes of these operators, a pointer to an object that is not an element of an array behaves the same as a pointer to the first element of an array of length one with the type of the object as its element type.
That is, int x; is to be regarded as equivalent to int x[1]; for the purpose of determining valid pointer arithmetic.
Given int x; int* p = &x; *p++ = 0; then it is fine to point 1 item past it but not to de-reference that item:
C17 6.5.6/8
If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined. If the result points one past the last element of the array object, it shall not be used as the operand of a unary * operator that is evaluated.
This behavior has not changed in the various revisions of the standard. It's the very same from C90 to C23.
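As a small illustration of the two quoted paragraphs, here is a sketch (the function name is made up) that performs *p++ = 0 on a plain int and then uses the resulting one-past pointer only in ways that do not dereference it:

```c
#include <assert.h>

int store_and_step_past(void)
{
    int x = 5;
    int *p = &x;

    *p++ = 0;               /* stores 0 into x, then p points one past x */

    assert(p == &x + 1);    /* forming and comparing the one-past pointer is fine */
    assert(p - &x == 1);    /* so is subtracting within the same one-element "array" */
    /* *p must not be evaluated here: p no longer points at an object */

    return x;
}
```

The function returns 0, the value stored through p before the increment took effect.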
There are two separate questions: 1. What constructs does the Standard specify that correct conforming implementations should process meaningfully, and 2. What constructs do clang and gcc actually process meaningfully. The clear intention of the Standard is to define the behavior of a pointer "one past" an array object and a pointer to the start of another array object that happens to immediately follow it. The actual behavior of clang and gcc tells another story, however.
Given the source code:
#include <stdint.h>

extern int x[], y[];

int test1(int *p)
{
    y[0] = 1;
    if (p == x+1)
        *p = 2;
    return y[0];
}

int test2(int *p)
{
    y[0] = 1;
    uintptr_t p1 = 3*(uintptr_t)(x+1);
    uintptr_t p2 = 5*(uintptr_t)p;
    if (5*p1 == 3*p2)
        *p = 2;
    return y[0];
}
both clang and gcc will recognize in both functions that the *p=2 assignment will only run if p happens to be equal to a one-past pointer to x, and will conclude as a consequence that it would be impossible for p to equal y. Construction of an executable example where clang and gcc would erroneously make this assumption is difficult without the ability to execute a program containing two compilation units, but examination of the generated machine code at https://godbolt.org/z/x78GMqbrv will reveal that every ret instruction is immediately preceded by mov eax,1, which loads the return value with 1.
Note that the code in test2 doesn't compare pointers, nor even compare integers that are directly formed from pointers, but the fact that clang and gcc are able to show that the numbers being compared can only be equal if the pointers happened to be equal is sufficient for test2() to, as perceived by clang or gcc, invoke UB if the function is passed a pointer to y, and y happens to equal x+1.

Volatile and sequence point

Given the following code:
unsigned int global_flag = 0;

void exception_handle()
{
    global_flag = 1;
}

void func()
{
    /* access will cause exception which will assign global_flag = 1
       then execution continues */
    volatile unsigned int x = *(unsigned int *)(0x60000000U); /* memory protection unit configured to raise exception upon accessing this address */
    if (global_flag == 1)
    {
        /* some code */
    }
}
Given the fact that volatile must not be reordered across sequence points:
The minimum requirement is that at a sequence point all previous accesses to volatile objects have stabilized and no subsequent accesses have occurred
And given the following about sequence points:
sequence points occur in the following places ... (1) .. (2) .. (3) At the end of a full expression. This category includes expression statements (such as the assignment a=b;), return statements, the controlling expressions of if, switch, while, or do-while statements, and all three expressions in a for statement.
Is it promised that volatile unsigned int x = *(unsigned int *)(0x60000000U); will take place before if (global_flag == 1) (in the binary asm, the CPU out-of-order execution is not relevant here) ?
According to the citations above, volatile unsigned int x = *(unsigned int *)(0x60000000U); must be evaluated before the next sequence point, and that declaration ends in a sequence point itself, so does that mean every volatile assignment is evaluated at the point of assignment?
If the answer to the above question is no, then the next sequence point is at the end of the if; does that mean something like this could be executed:
if (global_flag == 1)
{
    volatile unsigned int x = *(unsigned int *)(0x60000000U);
    /* some code */
}
System is an embedded one- ARM cortex m0, single core, single thread application.
In your snippet the variable global_flag is not volatile, so nothing prevents the compiler from moving the access to global_flag across sequence points or to remove it entirely if circumstances allow it. It does not make sense to talk about the order of the access to x and the access to global_flag because the latter is not an observable event, only the former is.
(Also note that there is no volatile qualifier in the expression *(unsigned int *)(0x60000000U). I think it is really that expression that you wish to treat specially, but your code does not do that. The compiler is allowed to produce code that evaluates *(unsigned int *)(0x60000000U) well in advance, then does a ton of other stuff it has on its plate, then assigns the value that was obtained to x and this would satisfy the constraints that the C standards place on volatile lvalues.)
If your snippet had unsigned int volatile global_flag = 0; and *(volatile unsigned int *)(0x60000000U) then the answer to the question “Is it promised that …” would be an unambiguous “yes”.
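A sketch of what that corrected version might look like. To keep it runnable on a host machine, the protected address is passed in as a parameter rather than hard-coded; probe_then_check and demo_probe are hypothetical names, and on the real target the call would presumably be probe_then_check((const volatile unsigned int *)0x60000000U).

```c
#include <assert.h>

/* volatile: every access to the flag is now an observable event */
static volatile unsigned int global_flag = 0;

void exception_handle(void)
{
    global_flag = 1;
}

/* addr points to volatile, so the load itself is a volatile access */
int probe_then_check(const volatile unsigned int *addr)
{
    unsigned int x = *addr;   /* volatile read; on the target this would fault */
    (void)x;                  /* only the access matters, not the value */
    return global_flag == 1;  /* volatile read, sequenced after the load above */
}

/* Host-side usage: an ordinary variable stands in for the protected address. */
void demo_probe(void)
{
    unsigned int ordinary = 7;
    assert(probe_then_check(&ordinary) == 0);
    exception_handle();       /* pretend the fault handler ran */
    assert(probe_then_check(&ordinary) == 1);
}
```

Both accesses are now volatile and separated by a sequence point, so the compiler has to emit them in program order; whether the hardware raises the exception synchronously is a separate question that the C standard does not answer.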
Is it promised that volatile unsigned int x = *(unsigned int *)(ILLEGAL_ADDRESS); will take place before if (global_flag == 1)
From informative C11 AnnexC (added newlines/formatting for readability):
The following are the sequence points described in 5.1.2.3:
...
- Between the evaluation of a full expression and the next full expression to be evaluated.
- The following are full expressions:
- an initializer that is not part of a compound literal (6.7.9);
- the expression in an expression statement (6.8.3);
- the controlling expression of a selection statement (if or switch) (6.8.4);
- the controlling expression of a while or do statement (6.8.5);
- each of the (optional) expressions of a for statement (6.8.5.3);
- the (optional) expression in a return statement (6.8.6.4).
As *(unsigned int *)(ILLEGAL_ADDRESS) is an initializer (an assignment expression) and is not part of a compound literal, it is a full expression. The next full expression is the controlling expression of the if, so there is a sequence point between the initialization of x and the if.
And from the famous C11 5.1.2.3p6:
The least requirements on a conforming implementation are:
Accesses to volatile objects are evaluated strictly according to the rules of the abstract machine.
...
As x is a volatile object, it is initialized strictly according to the abstract machine, so after the sequence point it has to hold the value resulting from the *(unsigned int *)(ILLEGAL_ADDRESS) operation.
So yes, the initialization of x object must happen before the control expression inside the if.
On undefined behavior, there's the good quote from C11 6.5.3.2p4:
If an invalid value has been assigned to the pointer, the behavior of the unary * operator is undefined.
As you commented:
accessing address 0x60000000 is not permitted in my system memory model
one can deduce that (unsigned int*)0x60000000 is an invalid pointer, so the unary * operator should spawn dragons.

How the multiple assignment worked in the below case?

I have just written some sample code to try this out. Surprisingly, I did not get any compilation failure. As per C, a declaration should come before initialization or use. Kindly explain.
#include <stdio.h>

int main(void) {
    int a = a = 1; //Why it compiles??
    printf("%d",a);
    return 0;
}
Above code is compiled successfully and outputs 1. Please explain and also provide any input from standard which allows this.
Each assignment expression like a = 1 has - besides the "side effect" of assigning the value 1 to a - a result value, which is the value of a after the assignment (cf, for example, cppreference/assignment):
Assignment also returns the same value as what was stored in lhs (so that expressions such as a = b = c are possible).
Hence, if you write, for example, int a; printf("%d",(a=1)), the output will be 1.
Chained assignments work the same way: int a; a = a = 1 is equivalent to int a; a = (a=1), and since the result of (a=1) is 1, the result of a = (a=1) is 1, too.
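The point about assignment expressions having a value can be checked with a short sketch (the function name is made up; it uses separate, already-declared variables rather than the self-initializing declaration from the question):

```c
#include <assert.h>

int chained_value(void)
{
    int a, b;
    int c = 7;

    a = b = c;                  /* parsed as a = (b = c) */
    assert(a == 7 && b == 7);

    int d;
    int shown = (d = 1);        /* the assignment expression itself has the value 1 */

    return a + b + shown + d;   /* 7 + 7 + 1 + 1 */
}
```

The function returns 16, confirming that both the chained assignment and the parenthesized (d = 1) yield the stored value.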
The definition
int a = a = 1;
is equal to
int a = (a = 1);
and is also roughly equivalent to
int a;
a = (a = 1);
When you use a in the initialization, it has already been defined, it exists and can be assigned to. And more importantly, since it's defined then it can be used as a source for its own initialization.
The C standard does not define the behavior in this case, not because of the rule about unsequenced effects or explicit statement but rather because it fails to address the situation.
C 2011 (unofficial draft N1570) clause 6.7, paragraph 1, shows us the overall grammar of declarations. In this grammar, int a = a = 1; is a declaration in which:
int is a declaration-specifiers which consists solely of the type-specifier int.
a = a = 1 is an init-declarator, in which a is a declarator and a = 1 is an initializer. The declarator consists solely of the identifier a.
6.7.6 3 defines a full declarator to be a declarator that is not part of another declarator, and it says the end of a full declarator is a sequence point. However, these are not necessary for our analysis.
6.7.9 8 says “An initializer specifies the initial value stored in an object.”
6.7.9 11 says “The initializer for a scalar shall be a single expression, optionally enclosed in braces. The initial value of the object is that of the expression (after conversion); the same type constraints and conversions as for simple assignment apply, taking the type of the scalar to be the unqualified version of its declared type.”
So, on one hand, the initializer, which has the value 1, specifies the initial value stored in a. On the other hand, the expression a = 1 has the side effect of storing 1 in a. I do not see anything in the C standard that says which occurs first. The rules about sequencing within expressions apply only to the evaluation of the initializer; they do not tell us the order of giving the “initial value” to a and the side effect of assigning to it.
It is reasonable to conclude that, whether a is given the initial value 1 or is assigned the value 1, it ends up with the value 1, so the behavior is defined. However, the standard famously makes it undefined behavior to modify the value of an object twice in an unsequenced way, even if the value being written is the same. The explicit statement of that rule is in 6.5 2, which applies to expressions, and hence does not apply in this situation where we have a conflict between an initialization and an expression. However, we might interpret the spirit of the standard to be:
In order to afford an implementation opportunity to do whatever it needs to do to store (or modify) a new value in an object, a sequencing for the store relative to other stores (or modifications) must be defined.
The standard fails to define a sequence for the initialization and the assignment side effect, and therefore it fails to afford the implementation this needed constraint.
Thus, the standard fails to specify the behavior in a way that guarantees an implementation will produce defined behavior.
Additionally, we can consider int a = 2 + (a = 1). In this case, the value of the initializer is 3, but the side effect assigns 1 to a. For this declaration, the standard does not say which value prevails (except that one might interpret “initial value” literally, thus implying that 3 must be assigned first, so the side effect must be later).

Difference between a simple variable i and *(&i);

I have the following C program:
#include <stdio.h>

int main(void)
{
    int i = 5;
    printf("Simple value of i = %d", i);
    printf("\nPointer value of i = %d", *(&i));
    return 0;
}
Both of the printf() will print the same thing, which is 5. As per my understanding & is being used for address value and * is used to pick the value on that address.
My question is: Why do we need *(&i) if the same thing can be achieved by a simple i variable?
My question is why we need *(&i) if same thing can be achieved with simple i variable?
Well, you don't need it.
The expression *(&i) is equivalent to i.
6.5.3.2 Address and indirection operators says:
The unary * operator denotes indirection. If the operand points to a function, the result is a function designator; if it points to an object, the result is an lvalue designating the object. If the operand has type ''pointer to type'', the result has type ''type''. If an invalid value has been assigned to the pointer, the behavior of the unary * operator is undefined.102)
And the footnote:
Thus, &*E is equivalent to E (even if E is a null pointer), and &(E1[E2]) to ((E1)+(E2)). [..]
The C standard allows a compiler to transform the code in any way as long as the observable behaviour (see: 5.1.2.3 Program execution) is the same.
So the statement:
printf("\nPointer value of i = %d", *(&i));
can be, in theory, transformed into:
printf("\nPointer value of i = %d", i);
by a compiler without violating C standard.
*(&i) is almost exactly the same as i.
A compiler is allowed to optimise it out, but do note that it is not allowed to do that if there is a side-effect of the address of i being taken (it can no longer be stored solely in a CPU register for example).
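A short sketch of the equivalence (the function name is made up for illustration): *(&i) has the same value as i, designates the same object, and can be assigned through because it is an lvalue.

```c
#include <assert.h>

int deref_of_address(void)
{
    int i = 5;

    assert(*(&i) == i);     /* same value */
    assert(&*(&i) == &i);   /* same object */

    *(&i) = 6;              /* *(&i) is an lvalue, so this writes to i */
    return i;
}
```

The function returns 6, since the store through *(&i) modifies i itself.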

Restricted pointer questions

I'm a little confused about the rules regarding restricted pointers. Maybe someone out there can help me out.
Is it legal to define nested restricted pointers as follows:
int* restrict a;
int* restrict b;
a = malloc(sizeof(int));
// b = a; <-- assignment here is illegal, needs to happen in child block
// *b = rand();
while(1)
{
    b = a; // Is this legal? Assuming 'b' is not modified outside the while() block
    *b = rand();
}
Is it legal to derive a restricted pointer value as follows:
int* restrict c;
int* restrict d;
c = malloc(sizeof(int)*101);
d = c;
for(int i = 0; i < 100; i++)
{
    *d = i;
    d++;
}
c = d; // c is now set to the 101 element, is this legal assuming d isn't accessed?
*c = rand();
Thanks!
Andrew
For reference, here's the restrict qualifier's rather convoluted definition (from C99 6.7.3.1 "Formal definition of restrict"):
Let D be a declaration of an ordinary identifier that provides a means of designating an object P as a restrict-qualified pointer to type T.
If D appears inside a block and does not have storage class extern, let B denote the block. If D appears in the list of parameter declarations of a function definition, let B denote the associated block. Otherwise, let B denote the block of main (or the block of whatever function is called at program startup in a freestanding environment).
In what follows, a pointer expression E is said to be based on object P if (at some sequence point in the execution of B prior to the evaluation of E) modifying P to point to a copy of the array object into which it formerly pointed would change the value of E. Note that "based" is defined only for expressions with pointer types.
During each execution of B, let L be any lvalue that has &L based on P. If L is used to access the value of the object X that it designates, and X is also modified (by any means), then the following requirements apply: T shall not be const-qualified. Every other lvalue used to access the value of X shall also have its address based on P. Every access that modifies X shall be considered also to modify P, for the purposes of this subclause. If P is assigned the value of a pointer expression E that is based on another restricted pointer object P2, associated with block B2, then either the execution of B2 shall begin before the execution of B, or the execution of B2 shall end prior to the assignment. If these requirements are not met, then the behavior is undefined.
Here an execution of B means that portion of the execution of the program that would correspond to the lifetime of an object with scalar type and automatic storage duration associated with B.
My reading of the above means that in your first question, a cannot be assigned to b, even inside a "child" block - the result is undefined. Such an assignment could be made if b were declared in that 'sub-block', but since b is declared at the same scope as a, the assignment cannot be made.
For question 2, the assignments between c and d also result in undefined behavior (in both cases).
The relevant bit from the standard (for both questions) is:
If P is assigned the value of a pointer expression E that is based on another restricted pointer object P2, associated with block B2, then either the execution of B2 shall begin before the execution of B, or the execution of B2 shall end prior to the assignment.
Since the restricted pointers are associated with the same block, it's not possible for block B2 to begin before the execution of B, or for B2 to end prior to the assignment (since B and B2 are the same block).
The standard gives an example that makes this pretty clear (I think - the clarity of the restrict definition's 4 short paragraphs is on par with C++'s name resolution rules):
EXAMPLE 4:
The rule limiting assignments between restricted pointers does not distinguish between a function call and an equivalent nested block. With one exception, only "outer-to-inner" assignments between restricted pointers declared in nested blocks have defined behavior.
{
    int * restrict p1;
    int * restrict q1;
    p1 = q1; // undefined behavior
    {
        int * restrict p2 = p1; // valid
        int * restrict q2 = q1; // valid
        p1 = q2; // undefined behavior
        p2 = q2; // undefined behavior
    }
}
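Following the "outer-to-inner" pattern from that example, the two snippets from the question could be rearranged roughly as below. This is a sketch based on my reading of the rule, not something the standard spells out for this exact code: fill_inner declares the second restricted pointer inside the nested block, and fill_array avoids a second restricted pointer altogether by walking the buffer with a plain pointer that is based on c. The function names and the value 1234 are made up.

```c
#include <assert.h>
#include <stdlib.h>

/* Question 1, rearranged: b is declared in the inner block (outer-to-inner). */
int fill_inner(void)
{
    int *restrict a = malloc(sizeof *a);
    if (a == NULL)
        return -1;

    for (int i = 0; i < 3; i++) {
        int *restrict b = a;   /* same shape as the "valid" lines in EXAMPLE 4 */
        *b = i;
    }

    int last = *a;             /* read after the inner block has ended */
    free(a);
    return last;
}

/* Question 2, rearranged: d is a plain pointer based on c, and c is never reassigned. */
int fill_array(void)
{
    int *restrict c = malloc(101 * sizeof *c);
    if (c == NULL)
        return -1;

    int *d = c;
    for (int i = 0; i < 100; i++)
        *d++ = i;
    *d = 1234;                 /* the 101st element, index 100 */

    int result = c[99] + c[100];
    free(c);
    return result;
}

/* Usage, assuming both allocations succeed. */
void demo_restrict(void)
{
    assert(fill_inner() == 2);
    assert(fill_array() == 1333);
}
```

In both functions every access to the allocated memory goes through a pointer based on the outer restricted pointer, which is what the formal definition asks for.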
The restrict type qualifier is an indication to the compiler that, if the memory addressed by the restrict-qualified pointer is modified, no other pointer will access that same memory. The compiler may choose to optimize code involving restrict-qualified pointers in a way that might otherwise result in incorrect behavior. It is the responsibility of the programmer to ensure that restrict-qualified pointers are used as they were intended to be used. Otherwise, undefined behavior may result.
As you can see from the above description, both of your assignments are illegal: they may work in executables produced by some compilers but break in others. Don't expect the compiler itself to emit errors or warnings, as restrict just gives it an opportunity to perform certain optimizations, which it can choose not to perform, like in the case of volatile.
