How does multiple assignment work in the case below? - C

I have just written a sample program to try it out, and surprisingly I did not get any compilation failure. As per C, we should have a declaration followed by initialization or use. Kindly explain.
#include <stdio.h>
int main(void) {
int a = a = 1; // Why does this compile?
printf("%d",a);
return 0;
}
Above code is compiled successfully and outputs 1. Please explain and also provide any input from standard which allows this.

Each assignment expression like a = 1 has - besides the "side effect" of assigning the value 1 to a - a result value, which is the value of a after the assignment (cf, for example, cppreference/assignment):
Assignment also returns the same value as what was stored in lhs (so that expressions such as a = b = c are possible).
Hence, if you write, for example, int a; printf("%d", (a = 1));, the output will be 1.
If you now chain assignments, as in int a; a = a = 1;, then this is equivalent to int a; a = (a = 1);, and - as the result of (a = 1) is 1 - the result of a = (a = 1) is 1, too.
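A minimal, compilable sketch of the result-value rule (the variable names are illustrative):

#include <stdio.h>

int main(void) {
    int a, b;
    printf("%d\n", (a = 1)); /* the assignment expression itself has the value 1 */
    b = (a = 2);             /* chaining: b receives the value of (a = 2) */
    printf("%d %d\n", a, b); /* prints 2 2 */
    return 0;
}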

The definition
int a = a = 1;
is equal to
int a = (a = 1);
and is also roughly equivalent to
int a;
a = (a = 1);
When you use a in the initializer, it has already been defined: it exists and can be assigned to. More importantly, since it is defined, it can be used as a source for its own initialization.

The C standard does not define the behavior in this case, not because of the rule about unsequenced side effects or any explicit statement, but because it simply fails to address the situation.
C 2011 (unofficial draft N1570) clause 6.7, paragraph 1, shows us the overall grammar of declarations. In this grammar, int a = a = 1; is a declaration in which:
int is a declaration-specifiers which consists solely of the type-specifier int.
a = a = 1 is an init-declarator, in which a is a declarator and a = 1 is an initializer. The declarator consists solely of the identifier a.
6.7.6 3 defines a full declarator to be a declarator that is not part of another declarator, and it says the end of a full declarator is a sequence point. However, these are not necessary for our analysis.
6.7.9 8 says “An initializer specifies the initial value stored in an object.”
6.7.9 11 says “The initializer for a scalar shall be a single expression, optionally enclosed in braces. The initial value of the object is that of the expression (after conversion); the same type constraints and conversions as for simple assignment apply, taking the type of the scalar to be the unqualified version of its declared type.”
So, on one hand, the initializer, which has the value 1, specifies the initial value stored in a. On the other hand, the expression a = 1 has the side effect of storing 1 in a. I do not see anything in the C standard that says which occurs first. The rules about sequencing within expressions apply only to the evaluation of the initializer; they do not tell us the order of giving the “initial value” to a and the side effect of assigning to it.
It is reasonable to conclude that, whether a is given the initial value 1 or is assigned the value 1, it ends up with the value 1, so the behavior is defined. However, the standard famously makes it undefined behavior to modify the value of an object twice in an unsequenced way, even if the value being written is the same. The explicit statement of that rule is in 6.5 2, which applies to expressions, and hence does not apply in this situation where we have a conflict between an initialization and an expression. However, we might interpret the spirit of the standard to be:
In order to afford an implementation opportunity to do whatever it needs to do to store (or modify) a new value in an object, a sequencing for the store relative to other stores (or modifications) must be defined.
The standard fails to define a sequence for the initialization and the assignment side effect, and therefore it fails to afford the implementation this needed constraint.
Thus, the standard fails to specify the behavior in a way that guarantees an implementation will produce defined behavior.
Additionally, we can consider int a = 2 + (a = 1). In this case, the value of the initializer is 3, but the side effect assigns 1 to a. For this declaration, the standard does not say which value prevails (except that one might interpret “initial value” literally, thus implying that 3 must be assigned first, so the side effect must be later).
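A sketch of that declaration for experimenting (what it prints is precisely what the standard leaves unresolved, so neither output can be relied upon):

#include <stdio.h>

int main(void) {
    int a = 2 + (a = 1); /* initializer value is 3; the side effect stores 1 */
    printf("%d\n", a);   /* a given implementation may print 3 or 1 */
    return 0;
}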

Related

Is reading an uninitialized value always an undefined behaviour? Or are there exceptions to it?

An obvious example of undefined behavior (UB), when reading a value, is:
int a;
printf("%d\n", a);
What about the following examples?
int i = i; // `i` is not initialized when we are reading it by assigning it to itself.
int x; x = x; // Is this the same as above?
int y; int z = y;
Are all three examples above also UB, or are there exceptions to it?
Each of the three lines triggers undefined behavior. The key part of the C standard that explains this is section 6.3.2.1p2 regarding conversions:
Except when it is the operand of the sizeof operator, the _Alignof operator, the unary & operator, the ++ operator, the -- operator, or the left operand of the . operator or an assignment operator, an lvalue that does not have array type is converted to the value stored in the designated object (and is no longer an lvalue); this is called lvalue conversion. If the lvalue has qualified type, the value has the unqualified version of the type of the lvalue; additionally, if the lvalue has atomic type, the value has the non-atomic version of the type of the lvalue; otherwise, the value has the type of the lvalue. If the lvalue has an incomplete type and does not have array type, the behavior is undefined. If the lvalue designates an object of automatic storage duration that could have been declared with the register storage class (never had its address taken), and that object is uninitialized (not declared with an initializer and no assignment to it has been performed prior to use), the behavior is undefined.
In each of the three cases, an uninitialized variable is used as the right-hand side of an assignment or initialization (which for this purpose is equivalent to an assignment) and undergoes lvalue conversion. The final sentence of the quoted passage applies here, as the objects in question have not been initialized and have never had their address taken.
This also applies to the int i = i; case, as the lvalue on the right side has not (yet) been initialized.
There was debate in a related question about whether the right side of int i = i; is UB because the lifetime of i has not yet begun. However, that is not the case. From section 6.2.4 p5 and p6:
5 An object whose identifier is declared with no linkage and without the storage-class specifier static has automatic storage duration, as do some compound literals. The result of attempting to indirectly access an object with automatic storage duration from a thread other than the one with which the object is associated is implementation-defined.
6 For such an object that does not have a variable length array type, its lifetime extends from entry into the block with which it is associated until execution of that block ends in any way. (Entering an enclosed block or calling a function suspends, but does not end, execution of the current block.) If the block is entered recursively, a new instance of the object is created each time. The initial value of the object is indeterminate. If an initialization is specified for the object, it is performed each time the declaration or compound literal is reached in the execution of the block; otherwise, the value becomes indeterminate each time the declaration is reached.
So in this case the lifetime of i begins before the declaration is encountered. int i = i; is still undefined behavior, but not for this reason.
The final sentence of 6.3.2.1p2 does, however, open the door for use of an uninitialized variable not being undefined behavior: namely, when the variable in question has had its address taken. For example:
int a;
printf("%p\n", (void *)&a);
printf("%d\n", a);
In this case it is not undefined behavior if:
The implementation does not have trap representations for the given type, OR
The value chosen for a happens to not be a trap representation.
In that case, the value of a is unspecified. In particular, this will be the case with GCC and Microsoft Visual C++ (MSVC) in this example, as these implementations do not have trap representations for integer types.
Use of uninitialized objects with automatic storage duration invokes UB.
Use of uninitialized objects with static storage duration is defined, as they are zero-initialized:
#include <stdio.h>

int a;

void foo(void)
{
    static int b;
    int c;
    int d = d;          // UB: d is read while still uninitialized
    static int e = e;   // constraint violation in C: a static initializer must be a constant expression
    printf("%d\n", a);  // OK: static storage duration, zero-initialized
    printf("%d\n", b);  // OK: static storage duration, zero-initialized
    printf("%d\n", c);  // UB: c is uninitialized
}
In cases where an action on an object of some type might have unpredictable consequences on platforms where the type has trap representations, but at-least-somewhat predictable behavior on platforms where it doesn't, the Standard seeks to avoid distinguishing platforms that do or don't define the behavior by throwing everything into the catch-all category of "Undefined Behavior".
With regard to the behavior of uninitialized or partially-initialized objects, I don't think there's ever been a consensus over exactly which corner cases must be treated as though objects were initialized with Unspecified bit patterns, and which cases need not be treated in such fashion.
For example, given something like:
#include <string.h>

struct ztstr15 { char dat[16]; } x, y;

void test(void)
{
    struct ztstr15 hey;
    strcpy(hey.dat, "Hey");
    x = hey;
    y = hey;
}
Depending upon how x and y will be used, there are at least four ways it might be useful to have an implementation process the above code:
Squawk if an attempt is made to copy any automatic-duration object that isn't fully initialized. This could be very useful in cases where one must avoid leakage of confidential information.
Zero-fill all unused portions of hey. This would prevent leakage of confidential information on the stack, but wouldn't flag code that might cause such leakage if the data weren't zero-filled.
Ensure that all parts of x and y are identical, without regard for whether the corresponding members of hey were written.
Write the first four bytes of x and y to match those of hey, but leave some or all of the remaining portions holding whatever they held before test() was called.
I don't think the Standard was intended to pass judgment as to whether some of those approaches would be better or worse than others, but it would have been awkward to write the Standard in a manner that would define the behavior of test() while allowing for option #3. The optimizations facilitated by #3 would only be useful if programmers could safely write code like the above in cases where client code wouldn't care about the contents of x.dat[4..15] and y.dat[4..15]. If the only way to guarantee anything about the behavior of that function were to ensure that all portions of hey were written, including those whose values are irrelevant to program behavior, that would nullify any optimization advantage approach #3 could have offered.

Using pointed-to content in assignment of a pointer

It has always been my understanding that the lack of a sequence point after the reading of the right expression in an assignment makes an example like the following produce undefined behavior:
void f(void)
{
int *p;
/*...*/
p = (int [2]){*p};
/*...*/
}
// p is assigned the address of the first element of an array of two ints, the
// first having the value previously pointed to by p and the second, zero. The
// expressions in this compound literal need not be constant. The unnamed object
// has automatic storage duration.
However, this is EXAMPLE 2 under "6.5.2.5 Compound literals" in the committee draft for the C11 standard, the version identified as n1570, which I understand to be the final draft (I don't have access to the final version).
So, my question: Is there something in the standard that gives this defined and specified behavior?
EDIT
I would like to expound on exactly what I see as the problem, in response to some of the discussion that has come up.
We have two conditions under which an assignment is explicitly stated to have undefined behavior, as per 6.5p2 of the standard quoted in the answer given by dbush:
1) A side effect on a scalar object is unsequenced relative to a different side effect on the same scalar object.
2) A side effect on a scalar object is unsequenced relative to a value computation using the value of the same scalar object.
An example of item 1 is i = ++i + 1. In this case, the side effect of writing the value i+1 into i due to ++i is unsequenced relative to the side effect of assigning the RHS to the LHS. There is a sequence point between the value computations of each side and the assignment of RHS to LHS, as described in 6.5.16.1, given in the answer by Jens Gustedt below. However, the modification of i due to ++i is not subject to that sequence point; otherwise the behavior would be defined.
In the example I give above, we have a similar situation. There is a value computation, which involves the creation of an array and the conversion of that array to a pointer to its first element. There is also a side effect of writing a value to part of that array: *p is written to the first element.
So I don't see what guarantees we have in the standard that the modification of the otherwise uninitialized first element of the array will be sequenced before the writing of the array address to p. How is this modification (writing *p to the first element) different from the modification of writing i+1 to i?
To put it another way, suppose an implementation looked at the statement of interest in the example as three tasks: 1st, allocate space for the compound literal object; 2nd: assign a pointer to said space to p; 3rd: write *p to the first element in the newly allocated space. The value computation for both RHS and LHS would be sequenced before the assignment, as computing the value of the RHS only requires the address. In what way is this hypothetical implementation not standard compliant?
You need to look at the definition of the assignment operator in 6.5.16.1
The side effect of updating the stored value of the left operand is sequenced after the value computations of the left and right operands. The evaluations of the operands are unsequenced.
So here you clearly see that first it evaluates the expressions on both sides in any order or even concurrently, and then stores the value of the right into the object designated by the left.
Additionally, you should know that the LHS and RHS of an assignment are evaluated differently. The citations are a bit too long, so here is a summary:
For the LHS, the evaluation leaves "lvalues", that is, objects such as p, untouched. In particular it doesn't look at the contents of the object.
For the RHS there is "lvalue conversion": any object that is found there (e.g. *p) has its contents loaded.
If the RHS contains an lvalue of array type, that array is converted to a pointer to its first element. This is what is happening to your compound literal.
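A small sketch of these rules in action (the value 42 and the surrounding scaffolding are illustrative assumptions, not from the question):

#include <stdio.h>

int main(void) {
    int n = 42;
    int *p = &n;
    /* RHS: *p undergoes lvalue conversion (yielding 42) to initialize the
       compound literal; the array is then converted to a pointer to its
       first element. LHS: p itself is only assigned, never read. */
    p = (int [2]){*p};
    printf("%d %d\n", p[0], p[1]); /* prints 42 0: the second element is zero-initialized */
    return 0;
}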
Edit: You added another question
How is this modification (writing *p to the first element) different from the modification of writing i+1 to i?
The difference is simply that i is the LHS of the assignment and thus has to be updated. The array from the compound literal is not the LHS and thus is of no concern for the update.
Section 6.5p2 of the C standard details why this is valid:
If a side effect on a scalar object is unsequenced relative to either a different side effect on the same scalar object or a value computation using the value of the same scalar object, the behavior is undefined. If there are multiple allowable orderings of the subexpressions of an expression, the behavior is undefined if such an unsequenced side effect occurs in any of the orderings. 84)
And footnote 84 states:
84) This paragraph renders undefined statement expressions such as
i = ++i + 1;
a[i++] = i;
while allowing
i = i + 1;
a[i] = i;
The posted snippet from 6.5.2.5 falls under the latter, as there is no side effect.
In (int [2]){*p}, *p provides an initial value for the compound literal. This is not an assignment, and it is not a side effect. The initial value is part of the object when the object is created. There is no moment when the array exists and it is not initialized.
In p = (int [2]){*p}, we know the side effect of updating p is sequenced after the computation of the right side because C 2011 [N1570] 6.5.16 3 says “The side effect of updating the stored value of the left operand is sequenced after the value computations of the left and right operands.”

C order of evaluation of assignment statement

I've encountered a case where cross-platform code behaved differently on a basic assignment statement.
One compiler evaluated the Lvalue first, the Rvalue second, and then did the assignment.
Another compiler evaluated the Rvalue first, the Lvalue second, and then did the assignment.
This may have an impact when the Lvalue influences the value of the Rvalue, as in the following case:
#include <stdio.h>
#include <stdlib.h>

struct MM {
    int m;
};

int helper(struct MM **ppmm) {
    (*ppmm) = (struct MM *) malloc(sizeof(struct MM));
    (*ppmm)->m = 1000;
    return 100;
}

int main(void) {
    struct MM mm = {500};
    struct MM *pmm = &mm;
    pmm->m = helper(&pmm);
    printf(" %d %d ", mm.m, pmm->m);
    return 0;
}
In the example above, the line pmm->m = helper(&pmm); depends on the order of evaluation. If the Lvalue is evaluated first, then pmm->m is equivalent to mm.m; if the Rvalue is calculated first, then pmm->m refers to the MM instance allocated on the heap.
My question is whether the C standard determines the order of evaluation (I didn't find anything), or whether each compiler can choose what to do.
Are there any other similar pitfalls I should be aware of?
The semantics for evaluation of an = expression include that
The side effect of updating the stored value of the left operand is sequenced after the value computations of the left and right operands. The evaluations of the operands are unsequenced.
(C2011, 6.5.16/3; emphasis added)
The emphasized provision explicitly permits your observed difference in the behavior of the program when compiled by different compilers. Moreover, unsequenced means, among other things, that it is permissible for the evaluations to occur in different order even in different runs of the very same build of the program. If the function in which the unsequenced evaluations appear were called more than once, then it would be permissible for the evaluations to occur in different order during different calls within the same execution of the program.
That already answers the question, but it's important to see the bigger picture. Modifying an object or calling a function that does so is a side effect (C2011, 5.1.2.3/2). This key provision therefore comes into play:
If a side effect on a scalar object is unsequenced relative to either a different side effect on the same scalar object or a value computation using the value of the same scalar object, the behavior is undefined.
(C2011, 6.5/2)
The called function has the side effect of modifying the value stored in main()'s variable pmm, evaluation of the left-hand operand of the assignment involves a value computation using the value of pmm, and these are unsequenced, therefore the behavior is undefined.
Undefined behavior is to be avoided at all costs. Because your program's behavior is undefined, it is not limited to the two alternatives you observed (in case that wasn't bad enough). The C standard places no limitations whatever on what it may do. It might instead crash, zero out your hard drive's partition table, or, if you have suitable hardware, summon nasal demons. Or anything else. Most of these are unlikely, but the best viewpoint is that if your program has undefined behavior then your program is wrong.
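One way to steer clear of the unsequenced access in the question's code is to sequence the call and the lvalue evaluation explicitly with a temporary; a minimal sketch of the rewritten line (one possible fix, not the only one):

int tmp = helper(&pmm); /* the side effect on pmm completes here */
pmm->m = tmp;           /* pmm now deterministically points to the heap object */

With this ordering the program always prints 500 100: mm.m is never modified, and the heap object's member m is overwritten with 100.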
When using the simple assignment operator: =, the order of evaluation of operands is unspecified. There is also no sequence point in between the evaluations.
For example, if you have two functions, Get() (returning, say, float *) and logf():
*Get() = logf(2.0f);
It is not specified in which order the two are called, and yet this behavior is completely defined.
A function call will introduce a sequence point. It will happen after the evaluation of the arguments and before the actual call. The operator ; will also introduce a sequence point. This is important because an object must not be modified twice without an intervening sequence point, otherwise the behavior is undefined.
Your example is particularly complicated due to unspecified behavior, and may have different results depending on whether the left or right operand is evaluated first.
The left operand is evaluated first.
The left operand is evaluated, and the pointer pmm points to the struct mm. Then the function is called, and a sequence point occurs. It modifies the pointer pmm by pointing it at the allocated memory, followed by a sequence point because of the operator ;. Then it stores the value 1000 in the member m, followed by another sequence point because of ;. The function returns 100, which is assigned through the left operand; but since the left operand was evaluated first, the value 100 is stored into the object mm, more specifically its member m.
mm.m has the value 100 and pmm->m has the value 1000. This is defined behavior; no object is modified twice between sequence points.
The right operand is evaluated first.
The function is called first, and a sequence point occurs. It modifies the pointer pmm by pointing it at the newly allocated struct, followed by a sequence point. Then it stores the value 1000 in the member m, followed by a sequence point. Then the function returns. Then the left operand is evaluated: pmm->m now refers to the member m of the newly allocated struct, which is modified by assigning it the value 100.
mm.m will have the value 500, since it was never modified, and pmm->m will have the value 100. No object was modified twice between sequence points. The behavior is defined.

Why can this C code run correctly? [duplicate]

This question already has answers here: Why does sizeof(x++) not increment x?
The C code looks like this:
#include <stdio.h>
#include <unistd.h>
#define DIM(a) (sizeof(a)/sizeof(a[0]))
struct obj
{
int a[1];
};
int main()
{
struct obj *p = NULL;
printf("%d\n",DIM(p->a));
return 0;
}
The object pointer p is NULL, so I think p->a is illegal. But I have tested this code on Ubuntu 14.04 and it executes correctly, so I want to know why...
Note: the original code had int a[0] above, but I've changed that to int a[1] since everyone seems to be hung up on that rather than the actual question, which is:
Is the expression sizeof(p->a) valid when p is equal to NULL?
Because sizeof is a compile-time construct, it does not depend on evaluating its operand. sizeof(p->a) is evaluated based solely on the declared type of the member a of struct obj, and becomes a constant in the executable. So the fact that p is a null pointer makes no difference.
The runtime value of p plays absolutely no role in the expression sizeof(p->a).
In C and C++, sizeof is an operator, not a function. It can be applied to either a type-id or an expression. Except in the case where the operand is an expression of variable-length array type (new in C99, as pointed out by paxdiablo), the operand is not evaluated and the result is the same as if you had taken sizeof of the type of that expression instead. (Cf. the C11 references due to paxdiablo below; C++14 working draft 5.3.3.1.)
First up, if you want truly portable code, you shouldn't be attempting to create an array of size zero [1], as you did in your original question, now fixed. But, since it's not really relevant to your question of whether sizeof(p->a) is valid when p == NULL, we can ignore it for now.
From C11 section 6.5.3.4 The sizeof and _Alignof operators (my bold):
2/ The sizeof operator yields the size (in bytes) of its operand, which may be an expression or the parenthesized name of a type. The size is determined from the type of the operand. The result is an integer. If the type of the operand is a variable length array type, the operand is evaluated; otherwise, the operand is not evaluated and the result is an integer constant.
Therefore no evaluation of the operand is done unless it's a variable length array (which your example is not). Only the type itself is used to figure out the size.
[1] For the language lawyers out there, C11 states in 6.7.6.2 Array declarators (my bold):
1/ In addition to optional type qualifiers and the keyword static, the [ and ] may delimit an expression or *. If they delimit an expression (which specifies the size of an array), the expression shall have an integer type. If the expression is a constant expression, it shall have a value greater than zero.
However, since that's in the constraints section (where shall and shall not do not involve undefined behaviour), it simply means the program itself is not strictly conforming. It's still covered by the standard itself.
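The non-evaluation rule is easy to demonstrate; here is a minimal sketch (the same effect is behind the linked duplicate about sizeof(x++)):

#include <stdio.h>

int main(void) {
    int x = 0;
    printf("%zu\n", sizeof(x++)); /* operand is not evaluated; yields sizeof(int) */
    printf("%d\n", x);            /* prints 0: x was never incremented */
    return 0;
}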
This code contains a constraint violation in ISO C because of:
struct obj
{
int a[0];
};
Zero-sized arrays are not permitted anywhere. Therefore the C standard does not define the behaviour of this program (although there seems to be some debate about that).
The code can only "run correctly" if your compiler implements a non-standard extension to allow zero-sized arrays.
Extensions must be documented (C11 4/8), so hopefully your compiler's documentation defines its behaviour for struct obj (a zero-sized struct?) and the value of sizeof p->a, and whether or not sizeof evaluates its operand when the operand denotes a zero-sized array.
sizeof() doesn't care a thing about the content of anything; it merely looks at the resulting type of the expression.
Since C99 and variable length arrays, it is computed at run time when a variable length array is part of the expression in the sizeof operand. Otherwise, the operand is not evaluated and the result is an integer constant.
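A short sketch of the one exception (the length n is an arbitrary illustrative value):

#include <stdio.h>

int main(void) {
    int n = 4;
    int v[n];                   /* variable length array */
    printf("%zu\n", sizeof(v)); /* computed at run time: n * sizeof(int), e.g. 16 */
    return 0;
}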
Zero-size array declarations within structs were never permitted by any C standard, but some older compilers allowed them before it became standard for compilers to allow incomplete array declarations with empty brackets (flexible array members).

assignment expressions and volatile

I seem to have a reasonable understanding of volatiles in general, but there's one seemingly obscure case, in which I'm not sure how things are supposed to work per the standard. I've read the relevant parts of C99 and a dozen or more related posts on SO, but can't find the logic in this case or a place where this case is explained.
Suppose we have this piece of code:
int a, c;
volatile int b;
a = b = 1;
c = b += 1; /* or equivalently c = ++b; */
Should a be evaluated like this:
b = 1;
a = b; // volatile is read
or like this:
b = 1;
a = 1; // volatile isn't read
?
Similarly, should c be evaluated like this:
int tmp = b;
tmp++;
b = tmp;
c = b; // volatile is read
or like this:
int tmp = b;
tmp++;
b = tmp;
c = tmp; // volatile isn't read
?
In simple cases like a = b; c = b; things are clear. But how about the ones above?
Basically, the question is: what exactly does "expression has the value of the left operand after the assignment" mean in 6.5.16p3 of C99 when the object is volatile?
An assignment operator stores a value in the object designated by the left operand. An assignment expression has the value of the left operand after the assignment, but is not an lvalue.
Does it imply an extra read of the volatile to produce the value of the assignment expression?
UPDATE:
So, here's the dilemma.
If "the value of the object after the assignment" is not obtained from the extra read of the volatile object, then the compiler makes the assumption that the volatile object b:
is capable of holding an arbitrary int value that gets written into it, which it may not be (say, bit 0 is hardwired to 0, which is not an unusual thing with hardware registers, for which we are supposed to use volatiles)
cannot change between the point when the assigning write has occurred and the point when the expression value is obtained (and again it can be a problem with hardware registers)
And because of all that, the expression value, if not obtained from the extra read of the volatile object, does not yield the value of the volatile object, which the standard claims should be the case.
Both of these assumptions don't seem to fit well with the nature of volatile objects.
If, OTOH, "the value of the object after the assignment" is obtained from the extra implied read of said volatile object, then the side effects of evaluating assignment expressions with volatile left operands depend on whether the expression value is used or not or are completely arbitrary, which would be an odd, unexpected and poorly documented behavior.
C11 clarifies that this is unspecified.
You can find the final draft of C11 here. The second sentence you quoted now refers to footnote 111:
An assignment operator stores a value in the object designated by the left operand. An assignment expression has the value of the left operand after the assignment,111) but is not an lvalue.
Footnote 111 says this:
The implementation is permitted to read the object to determine the value but is not required to, even when the object has volatile-qualified type.
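A practical consequence: if the code needs a guaranteed read of the volatile object, it should not rely on the value of the assignment expression but perform the read in a separate statement. A sketch:

#include <stdio.h>

volatile int b; /* stand-in for, e.g., a hardware register */

int main(void) {
    int a;
    b = 1; /* write to the volatile object */
    a = b; /* separate statement: a guaranteed volatile read */
    printf("%d\n", a);
    return 0;
}

This makes the access pattern independent of whether the implementation exercises the latitude granted by footnote 111.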
From common sense I'd argue like this:
If b = (whatever) and whatever can be stored in a register, there's no reason for the compiler to re-evaluate the expression for assignment.
Also because it cannot be more recent than the value in the register.
Consider f(x) vs. r = f(x): Once the result of f(x) is known, it can be assigned.
So for a = b = 1 there should be no reason to read b back just to be able to assign to a.
Also assume you write a = ++b:
Obviously b cannot be incremented a second time; otherwise basic C semantics would be broken.
