I am trying to decipher a note that led to a change between C99 and C11.
The change proposed in that note ended up in C11's 6.2.4:8, namely:
A non-lvalue expression with structure or union type, where the
structure or union contains a member with array type (including,
recursively, members of all contained structures and unions) refers
to an object with automatic storage duration and temporary lifetime.
Its lifetime begins when the expression is evaluated and its initial
value is the value of the expression. Its lifetime ends when the
evaluation of the containing full expression or full declarator ends.
Any attempt to modify an object with temporary lifetime results in
undefined behavior.
I understand why the change was needed (some discussion can be found here. Note that the discussion goes back to before C11). However, what I don't understand is a side remark that Clark Nelson made in writing his note:
Please note that this approach additionally declares an example like
this, which was conforming under C99, to be non-conforming:
struct X { int a[5]; } f();
int *p = f().a;
printf("%p\n", p);
I understand why this example is non-conforming under C11. What I specifically fail to understand is how it is conforming under C99. And, if it is defined under C99, what is it supposed to do then, definedly print the value of a dangling pointer?
My understanding is that in C99, the finest grain of lifetime for an object is the block. Thus, while 6.5.2.2 (and some other § mentioned in the note you refer to) specifically says that you can't access the returned value after the next sequence point, technically its address is not indeterminate until after you have left the enclosing block (the reason why you should have some storage reserved for an inaccessible object is left as an exercise for the reader, though). Thus, something like
struct X { int a[5]; } f();
int *p;
{ p = f().a; }
printf("%p\n", p);
is undefined in C99 as well as in C11. In C11, the notion of "temporary lifetime", which does not exist in C99, makes it possible to say that the pointer becomes indeterminate as soon as the full expression ends.
An obvious example of undefined behavior (UB), when reading a value, is:
int a;
printf("%d\n", a);
What about the following examples?
int i = i; // `i` is not initialized when we are reading it by assigning it to itself.
int x; x = x; // Is this the same as above?
int y; int z = y;
Are all three examples above also UB, or are there exceptions to it?
Each of the three lines triggers undefined behavior. The key part of the C standard, that explains this, is section 6.3.2.1p2 regarding Conversions:
Except when it is the operand of the sizeof operator, the
_Alignof operator, the unary & operator, the ++ operator, the
-- operator, or the left operand of the . operator or an
assignment operator, an lvalue that does not have array type
is converted to the value stored in the designated object
(and is no longer an lvalue); this is called lvalue
conversion. If the lvalue has qualified type, the value has
the unqualified version of the type of the lvalue; additionally,
if the lvalue has atomic type, the value has the non-atomic version
of the type of the lvalue; otherwise, the value has the
type of the lvalue. If the lvalue has an incomplete type and does
not have array type, the behavior is undefined. If the lvalue
designates an object of automatic storage duration that could
have been declared with the register storage class (never had its
address taken), and that object is uninitialized (not declared
with an initializer and no assignment to it has been
performed prior to use), the behavior is undefined.
In each of the three cases, an uninitialized variable is used as the right-hand side of an assignment or initialization (which for this purpose is equivalent to an assignment) and undergoes lvalue conversion. The final sentence of the quoted paragraph applies here, as the objects in question have not been initialized.
This also applies to the int i = i; case as the lvalue on the right side has not (yet) been initialized.
There was debate in a related question that the right side of int i = i; is UB because the lifetime of i has not yet begun. However, that is not the case. From section 6.2.4 p5 and p6:
5 An object whose identifier is declared with no linkage and without the storage-class specifier static has automatic
storage duration, as do some compound literals. The result of
attempting to indirectly access an object with automatic storage
duration from a thread other than the one with which the object is
associated is implementation-defined.
6 For such an object that does not have a variable length array type, its lifetime extends from entry into the block
with which it is associated until execution of that block ends in any
way. (Entering an enclosed block or calling a function
suspends, but does not end, execution of the current block.) If
the block is entered recursively, a new instance of the object is
created each time. The initial value of the object is
indeterminate. If an initialization is specified for the
object, it is performed each time the declaration or compound
literal is reached in the execution of the block; otherwise,
the value becomes indeterminate each time the declaration is reached.
So in this case the lifetime of i begins before the declaration is encountered. So int i = i; is still undefined behavior, but not for this reason.
The final sentence of 6.3.2.1p2 does, however, open the door for use of an uninitialized variable not being undefined behavior, namely when the variable in question has had its address taken. For example:
int a;
printf("%p\n", (void *)&a);
printf("%d\n", a);
In this case it is not undefined behavior if:
The implementation does not have trap representations for the given type, OR
The value chosen for a happens to not be a trap representation.
In which case the value of a is unspecified. In particular, this will be the case with GCC and Microsoft Visual C++ (MSVC) in this example as these implementations do not have trap representations for integer types.
Using an uninitialized object with automatic storage duration invokes UB.
Using an uninitialized object with static storage duration is defined, because such objects are initialized to zero.
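#include <stdio.h>
int a;
int foo(void)
{
    static int b;
    int c;
    int d = d; // UB: d is read before it has a value
    static int e; // OK: implicitly 0 ("static int e = e;" would not compile, since a static initializer must be a constant expression)
    printf("%d\n", a); // OK
    printf("%d\n", b); // OK
    printf("%d\n", c); // UB
    return e;
}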
In cases where an action on an object of some type might have unpredictable consequences on platforms where the type has trap representations, but have at-least-somewhat predictable behavior for types that don't, the Standard will seek to avoid distinguishing platforms that do or don't define the behavior by throwing everything into the catch-all category of "Undefined Behavior".
With regard to the behavior of uninitialized or partially-initialized objects, I don't think there's ever been a consensus over exactly which corner cases must be treated as though objects were initialized with Unspecified bit patterns, and which cases need not be treated in such fashion.
For example, given something like:
#include <string.h>
struct ztstr15 { char dat[16]; } x,y;
void test(void)
{
    struct ztstr15 hey; /* same tag as the declaration above */
    strcpy(hey.dat, "Hey");
    x=hey;
    y=hey;
}
Depending upon how x and y will be used, there are at least four ways it might be useful to have an implementation process the above code:
Squawk if an attempt is made to copy any automatic-duration object that isn't fully initialized. This could be very useful in cases where one must avoid leakage of confidential information.
Zero-fill all unused portions of hey. This would prevent leakage of confidential information on the stack, but wouldn't flag code that might cause such leakage if the data weren't zero-filled.
Ensure that all parts of x and y are identical, without regard for whether the corresponding members of hey were written.
Write the first four bytes of x and y to match those of hey, but leave some or all of the remaining portions holding whatever they held before test() was called.
I don't think the Standard was intended to pass judgment as to whether some of those approaches would be better or worse than others, but it would have been awkward to write the Standard in a manner that would define behavior of test() while allowing for option #3. The optimizations facilitated by #3 would only be useful if programmers could safely write code like the above in cases where client code wouldn't care about the contents of x.dat[4..15] and y.dat[4..15]. If the only way to guarantee anything about the behavior of that function were to write all portions of hey, including those whose values would be irrelevant to program behavior, that would nullify any optimization advantage approach #3 could have offered.
I have the following example
#include <stdlib.h>
#include <stdio.h>
#include <stddef.h>
typedef struct test{
int a;
long b;
int c;
} test;
int main()
{
test *t = (test*) malloc(offsetof(test, c));
t -> b = 100;
}
It works fine, but I'm not sure about it. I think I have UB here. We have a pointer to an object of a structure type, but the object of the structure type is not really valid.
I went through the standard and could not find any definition of this behavior. The only section I could find close to this one is 6.5.3.2:
If an invalid value has been assigned to the pointer, the behavior of
the unary * operator is undefined
But this is not really relevant since the pointer returned by malloc is completely valid.
Is there a reference in the standard explaining such a behavior? I'm using C11 N1570.
From C2011, paragraph 6.2.6.1/4:
Values stored in non-bit-field objects of any other object type consist of n × CHAR_BIT bits, where n is the size of an object of that type, in bytes.
Therefore, since the allocated object in your code is smaller than the size of a struct test, it cannot contain a value of an object of that type.
Now consider your expression t -> b = 100. C2011, paragraph 6.5.2.3/4 defines the behavior of the -> operator:
A postfix expression followed by the -> operator and an identifier designates a member of a structure or union object. The value is that of the named member of the object to which the first expression points [...].
(Emphasis added.) We've established that your t does not (indeed, cannot) point to a struct test, however, so the best we can say about 6.5.2.3/4 is that it does not apply to your case. There being no other definition of the behavior of the -> operator, we are left with paragraph 4/2 (emphasis added):
If a ''shall'' or ''shall not'' requirement that appears outside of a constraint or runtime-constraint is violated, the behavior is undefined. Undefined behavior is otherwise indicated in this International Standard by the words ''undefined behavior'' or by the omission of any explicit definition of behavior.
So there you are. The behavior of your code is undefined.
since the pointer returned by malloc is completely valid.
No the pointer is not "completely valid". Not at all.
Why do you think the pointer is "completely valid"? You didn't allocate enough bytes to hold an entire struct test - the pointer is not "completely valid" as there isn't a valid struct test object for you to access.
There's no such thing as a partial object in C. That's why you can't find it in the C standard.
It works fine
No, it doesn't.
"I didn't observe it blowing up." is not the same as "It works fine."
Your code doesn't do anything observable. Per the as-if rule the compiler is free to elide the entire thing and just return zero from main().
This question is about the definedness or otherwise of assigning an uninitialised automatic variable to another one of the same type.
Consider
typedef struct
{
int s1;
int s2;
} Foo;
typedef union
{
int u1;
Foo u2;
} Bar;
int main()
{
{
int a;
int b = a; // (1)
}
{
Foo a;
Foo b = a; // (2)
}
{
Bar a;
a.u1 = 0;
Bar b = a; // (3)
}
}
Referring to the comments in main:
(1) is undefined since a is uninitialised. That much I know.
But what about (2)? The struct members s1 and s2 are uninitialised.
Furthermore, what about (3)? The memory u2.s2 is uninitialised, so reading it is undefined behaviour, no?
The behavior is undefined in (1) and (2).
Per the C standard, the value of an object with automatic storage duration that is not initialized is indeterminate (C 2011 [N1570] 6.7.9 10). Nominally, this means it has some value, but we do not know what it is while writing the program.
However, the standard also says “If the lvalue designates an object of automatic storage duration that could have been declared with the register storage class (never had its address taken), and that object is uninitialized (not declared with an initializer and no assignment to it has been performed prior to use), the behavior is undefined” (6.3.2.1 2). In your sample code, the address of a is never taken, and it is not initialized, and using it in an expression is an lvalue. Therefore, the behavior is undefined.
(This passage, 6.3.2.1 2, was designed to accommodate processors that can detect use of an uninitialized register. Nonetheless, the rule in the C standard applies to all implementations.)
(3) is not clearly addressed by the C standard. Although a member of the union has been assigned a value, and hence is not uninitialized for purposes of 6.3.2.1 2, the object being used in b = a is the union, not its member. Obviously, our intuitive notion is that, if a member of a union is assigned a value, the union has a value. However, I do not see this specified in the C standard.
We can infer 6.3.2.1 2 is not intended to consider a union or structure to be uninitialized, at least if part of it has been assigned a value, because:
Structures can have unnamed members, such as unnamed bit fields.
Per C 6.7.9 9, unnamed members of structures have indeterminate value, even after initialization (of the structures).
If 6.3.2.1 2 applied to structures in which not every member had been assigned a value, then b = a would always be undefined if a were a structure with an unnamed member and had automatic storage duration.
That seems unreasonable and not what the standard intended.
However, there is some wiggle room here. The standard could have specified that a structure is not uninitialized only if it were initialized or all of its named members have been assigned values. In that case (3) would be undefined if a were a structure in which only one member had been assigned a value. I do not think this wiggle room exists with a union; if a member of the union has been assigned a value, it is only reasonable to consider the union not to be uninitialized.
In general, assigning from an uninitialized object isn't undefined behavior, it only makes the result unspecified.
But the code you show indeed has undefined behavior -- for a different reason than you assume. Citing N1570 (latest C11 draft), §6.3.2.1 p2 here:
[...] If
the lvalue designates an object of automatic storage duration that could have been
declared with the register storage class (never had its address taken), and that object
is uninitialized (not declared with an initializer and no assignment to it has been
performed prior to use), the behavior is undefined.
Explaining this a bit: The C standard is prepared to handle values that aren't stored in an addressable location. This is typically the case when they are held in one of the CPU's registers. Explicitly giving an object the register storage class is only a hint to the compiler that it should, if sensible, hold that object in a register. The other way around, a compiler is free to hold any object with automatic storage duration in a register as long as the code doesn't need to address it (by taking a pointer).
In your code, you have uninitialized objects with automatic storage duration that never have their address taken, so the compiler would be free to place them in registers. This means there is no value for the object (not even an unspecified one) before it is initialized. Therefore, using this potentially non-existent value to initialize another object (or, for other purposes) is undefined behavior.
If your code would take a pointer to the respective a in all these examples, the result of the assignment would be unspecified (of course), but the behavior would be defined.
It's worth adding that structs and unions have nothing to do with the answer to your question. The rules are the same for all kinds of objects with automatic storage duration. That said, in your third example, a isn't uninitialized any more after you assign one member of the union. So for your third example, the behavior is well-defined. It doesn't matter what's in the other member of the union; a union can only hold a value for one of its members at a time.
At C99§6.5.2.2p5 there's this little gem; its final sentence (bolded in the original post) is the subject of this question:
If the expression that denotes the called function has type pointer to function returning an object type, the function call expression has the same type as that object type, and has the value determined as specified in 6.8.6.4. Otherwise, the function call has type void. If an attempt is made to modify the result of a function call or to access it after the next sequence point, the behavior is undefined.
This allowed us to return structs, for example:
struct foo { int foo;
char bar[2]; };
struct foo get_foo() {
struct foo return_value = { .foo = 42,
.bar = "x" };
return return_value;
}
... and assign that return value somewhere else from within the caller, for example:
int main(void) {
struct foo bar = get_foo(); /* Well defined because the return value
* is copied -before- the sequence point
* that terminates its storage duration */
printf("%s\n", bar.bar);
printf("%d\n", get_foo().foo); /* Again, well defined because the access
* occurs before the next sequence point
* (the function call). */
}
... whilst rendering examples like the following invalid:
int main(void) {
printf("%s\n", get_foo().bar); /* UB because there's a sequence point
* between the evaluation of the sub-
* expression `get_foo().bar` and the
* entrance to the function `printf` */
get_foo().bar[0]++; /* UB because an attempt is made to modify the
* result of a function call */
}
--
C11§6.5.2.2p5, however, is essentially the same paragraph but without that final sentence.
If the expression that denotes the called function has type pointer to function returning an object type, the function call expression has the same type as that object type, and has the value determined as specified in 6.8.6.4. Otherwise, the function call has type void.
Are those examples above that are undefined behaviour in C99 still undefined in C11? If so, which paragraphs invalidate them? If not, I gather there must be some extension of the storage duration of automatic values/objects returned; which section of the standard specifies that extension of storage duration?
Are those examples above that are undefined behaviour in C99 still undefined in C11?
The examples posed above that are well defined are still well defined.
The temporary lifetime of the object in this one "ends when the evaluation of the containing full expression or declarator ends", so this previously undefined example is now well defined:
printf("%s\n", get_foo().bar);
This example is still undefined behaviour, because an attempt is made to modify an object that has temporary lifetime:
get_foo().bar[0]++;
If so, which paragraphs invalidate them? If not, I gather there must be some extension of the storage duration of automatic values/objects returned; which section of the standard specifies that extension of storage duration?
As pointed out by Jens Gustedt in a comment, C11§6.2.4p8 seems to convey a slightly different meaning to the sentence that C99§6.5.2.2p5 contains which C11§6.5.2.2p5 omitted:
A non-lvalue expression with structure or union type, where the structure or union contains a member with array type (including, recursively, members of all contained structures and unions) refers to an object with automatic storage duration and temporary lifetime.36) Its lifetime begins when the expression is evaluated and its initial value is the value of the expression. Its lifetime ends when the evaluation of the containing full expression or full declarator ends. Any attempt to modify an object with temporary lifetime results in undefined behavior.
36) The address of such an object is taken implicitly when an array member is accessed.
It seems that a little reorganisation was performed; the "storage duration extension" sentence in C99 was changed and moved from the "function call" section to the "storage duration" section, where it fits better.
The only question remaining is whether or not the result of a function call is considered an lvalue. For every operator that produces an lvalue, it seems as though it is explicitly mentioned that the operator produces an lvalue. For example, C11§6.5.3.2p6 states that the unary * operator produces an lvalue providing its operand points at an object.
The function call operator, however, says nothing about producing an lvalue, so we must assume that it doesn't produce an lvalue. If that's not good enough, then consider C11§6.5.2.3p3 and p7, which say:
A postfix expression followed by the . operator and an identifier designates a member of a structure or union object. The value is that of the named member,95) and is an lvalue if the first expression is an lvalue.
If f is a function returning a structure or union, and x is a member of that structure or union, f().x is a valid postfix expression but is not an lvalue.
We can also deduce from these two paragraphs that the result of a function is not an lvalue, thus meeting the criteria for C11§6.2.4p8 (quoted above).
Footnote 95 is interesting but tangential to the discussion on hand:
95) If the member used to read the contents of a union object is not the same as the member last used to
store a value in the object, the appropriate part of the object representation of the value is reinterpreted
as an object representation in the new type as described in 6.2.6 (a process sometimes called ‘‘type
punning’’). This might be a trap representation.
The following compiles and prints "string" as an output.
#include <stdio.h>
struct S { int x; char c[7]; };
struct S bar() {
struct S s = {42, "string"};
return s;
}
int main()
{
printf("%s", bar().c);
}
Apparently this invokes undefined behavior according to
C99 6.5.2.2/5 If an attempt is made to modify the result of a function
call or to access it after the next sequence point, the behavior is
undefined.
I don't understand where it says about "next sequence point". What's going on here?
You've run into a subtle corner of the language.
An expression of array type is, in most contexts, implicitly converted to a pointer to the first element of the array object. The exceptions, none of which apply here, are:
When the array expression is the operand of a unary & operator (which yields the address of the entire array);
When it's the operand of a unary sizeof or (as of C11) _Alignof operator (sizeof arr yields the size of the array, not the size of a pointer); and
When it's a string literal in an initializer used to initialize an array object (char str[6] = "hello"; doesn't convert "hello" to a char*.)
(The N1570 draft incorrectly adds _Alignof to the list of exceptions. In fact, for reasons that are not clear, _Alignof can only be applied to a type name, not to an expression.)
Note that there's an implicit assumption: that the array expression refers to an array object in the first place. In most cases, it does (the simplest case is when the array expression is the name of a declared array object) -- but in this one case, there is no array object.
If a function returns a struct, the struct result is returned by value. In this case, the struct contains an array, giving us an array value with no corresponding array object, at least logically. So the array expression bar().c decays to a pointer to the first element of ... er, um, ... an array object that doesn't exist.
The 2011 ISO C standard addresses this by introducing "temporary lifetime", which applies only to "A non-lvalue expression with structure or union type, where the structure or union
contains a member with array type" (N1570 6.2.4p8). Such an object may not be modified, and its lifetime ends at the end of the containing full expression or full declarator.
So as of C2011, your program's behavior is well defined. The printf call gets a pointer to the first element of an array that's part of a struct object with temporary lifetime; that object continues to exist until the printf call finishes.
But as of C99, the behavior is undefined -- not necessarily because of the clause you quote (as far as I can tell, there is no intervening sequence point), but because C99 doesn't define the array object that would be necessary for the printf to work.
If your goal is to get this program to work, rather than to understand why it might fail, you can store the result of the function call in an explicit object:
const struct S result = bar();
printf("%s", result.c);
Now you have a struct object with ordinary automatic lifetime rather than temporary lifetime, so it exists during and after the execution of the printf call.
The sequence point occurs at the end of the full expression, i.e., when printf returns in this example. There are other cases where sequence points occur.
Effectively, this rule states that function temporaries do not live beyond the next sequence point, which in this case occurs well after its use, so your program has quite well-defined behaviour.
Here's a simple example of not well-defined behaviour:
char* c = bar().c; *c = 5; // UB
Here, the sequence point is met after c is created, and the memory it points to is destroyed, but we then attempt to access c, resulting in UB.
In C99 there is a sequence point at the call to a function, after the arguments have been evaluated (C99 6.5.2.2/10).
So, when bar().c is evaluated, it results in a pointer to the first element in the char c[7] array in the struct returned by bar(). However, that pointer gets copied into an argument (a nameless argument as it happens) to printf(), and by the time the call is actually made to the printf() function the sequence point mentioned above has occurred, so the member that the pointer was pointing to may no longer be alive.
As Keith Thompson mentions, C11 (and C++) make stronger guarantees about the lifetime of temporaries, so the behavior under those standards would not be undefined.