Is a C implementation required to ignore undefined behaviors occurring during the evaluation of void expressions as if the evaluation itself never took place?
Considering C11, 6.3.2.2 §1:
If an expression of any other type is evaluated as a void expression, its value or designator is discarded. (A void expression is evaluated for its side effects.)
This is related to the common idiom used to prevent compilers from warning about unused variables:
void f() {
    int a;
    (void)a;
}
But what if we have undefined behavior, such as:
void f() {
    int a;
    (void)(1/0);
}
Can I safely claim that this program contains no undefined behavior? The standard says that "its value or designator is discarded", but the "expression (...) is evaluated (...)", so the evaluation does seem to take place.
GCC/Clang do report the undefined behavior, since it is obvious in this case, but in a more subtle example they don't:
int main() {
    int a = 1;
    int b = 0;
    (void)(a/b);
    return 0;
}
Even with -O0, neither GCC nor Clang evaluates 1/0. But the same happens even without the cast to void, so it isn't representative.
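One way to make the example representative (a sketch of my own, not from the original question) is to make the divisor volatile, so the compiler cannot fold the division away at compile time. Both compilers then emit a real division at -O0, showing that the (void) cast by itself does not suppress the evaluation:

#include <stdio.h>

int main(void) {
    volatile int b = 0;   /* volatile: the compiler must actually read b */
    (void)(1 / b);        /* evaluated for its side effects; division by zero
                             is UB, and typically traps (SIGFPE on x86) */
    puts("not reached on most platforms");
    return 0;
}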
Pushing the argument to its extreme, wouldn't the simple evaluation of (void)a in my first example (where a is uninitialized) systematically trigger undefined behavior?
ISO C11 6.3.2.1 §2 does mention that:
If the lvalue designates an object of automatic storage duration that could have been declared with the register storage class (never had its address taken), and that object is uninitialized (not declared with an initializer and no assignment to it has been performed prior to use), the behavior is undefined.
However, in the Annex J.2 Undefined behavior, the phrasing is slightly different:
The behavior is undefined in the following circumstances:
(...)
An lvalue designating an object of automatic storage duration that could have been declared with the register storage class is used in a context that requires the value of the designated object, but the object is uninitialized. (6.3.2.1).
This annex does lead to the interpretation that a void expression containing undefined behavior during its evaluation is not actually evaluated, but since it's just an annex, I'm not sure of its argumentative weight.
This is related to the common idiom used to prevent compilers from
warning about unused variables:
void f() {
    int a;
    (void)a;
}
Yes and no. I'd argue that that idiom turns an unused variable into a used one -- it appears in an expression -- with the cast to void serving to prevent compilers from complaining about the result of that expression going unused. But in the technical, language-lawyer sense, that particular expression of the idiom produces UB because the sub-expression a is subject to lvalue conversion when a's value is indeterminate. You've already quoted the relevant text of the standard.
But what if we have undefined behavior, such as:
void f() {
    int a;
    (void)(1/0);
}
Can I safely claim that this program contains no undefined behavior?
No.
The standard says that "its value or designator is discarded", but the
"expression (...) is evaluated (...)", so the evaluation does seem to
take place.
Yes, just as the expression a in your earlier example is also evaluated, also producing UB. UB arises from evaluation of the inner sub-expression. The conversion to type void is a separable consideration, exactly as a conversion to any other type would be.
GCC/Clang do report the undefined behavior, since it is obvious in
this case, but in a more subtle example they don't:
Compiler behavior cannot be taken as indicative here. C does not require compilers to diagnose most undefined behaviors, not even those that could, in principle, be detected at compile time. Indeed, it is important to recognize that UB arising from incorrect code happens first and foremost at compile time, though of course it follows that if an executable is produced then it exhibits UB, too.
Pushing the argument to its extreme, wouldn't the simple evaluation of
(void)a in my first example (where a is uninitialized) systematically
trigger undefined behavior?
Yes, as I already remarked. But that does not mean that programs containing such constructions are obligated to misbehave. As a quality-of-implementation matter, I think it reasonable to hope that the expression statement (void)a; will be accepted by the compiler and will have no corresponding runtime behavior at all. But I cannot rely on the language standard to back me up on that.
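As an aside, if one wants the unused-variable idiom without even this theoretical UB, a small sketch of my own is to take the variable's address instead: the operand of unary & is exempt from lvalue conversion, so the indeterminate value is never read:

void f(void) {
    int a;
    (void)&a;   /* &a does not read a: no lvalue conversion, hence no UB */
}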
This annex does lead to the interpretation that a void expression
containing undefined behavior during its evaluation is not actually
evaluated, but since it's just an annex, I'm not sure of its
argumentative weight.
The plain wording of the normative text of the standard is quite sufficient here. The annex is not normative, but if there is any question about how the normative text is meant to be interpreted, then informative sections of the standard, such as Annex J, are one of the sources taken into account in sorting that out (but they are still only informative).
Related
An obvious example of undefined behavior (UB), when reading a value, is:
int a;
printf("%d\n", a);
What about the following examples?
int i = i; // `i` is not initialized when we are reading it by assigning it to itself.
int x; x = x; // Is this the same as above?
int y; int z = y;
Are all three examples above also UB, or are there exceptions to it?
Each of the three lines triggers undefined behavior. The key part of the C standard that explains this is section 6.3.2.1 p2, regarding conversions:
Except when it is the operand of the sizeof operator, the
_Alignof operator, the unary & operator, the ++ operator, the
-- operator, or the left operand of the . operator or an
assignment operator, an lvalue that does not have array type
is converted to the value stored in the designated object
(and is no longer an lvalue); this is called lvalue
conversion. If the lvalue has qualified type, the value has
the unqualified version of the type of the lvalue; additionally,
if the lvalue has atomic type, the value has the non-atomic version
of the type of the lvalue; otherwise, the value has the
type of the lvalue. If the lvalue has an incomplete type and does
not have array type, the behavior is undefined. If the lvalue
designates an object of automatic storage duration that could
have been declared with the register storage class (never had its
address taken), and that object is uninitialized (not declared
with an initializer and no assignment to it has been
performed prior to use), the behavior is undefined.
In each of the three cases, an uninitialized variable is used as the right-hand side of an assignment or initialization (which for this purpose is equivalent to an assignment) and undergoes lvalue conversion. The final sentence of that passage applies here, as the objects in question have not been initialized.
This also applies to the int i = i; case as the lvalue on the right side has not (yet) been initialized.
There was debate in a related question over whether the right side of int i = i; is UB because the lifetime of i has not yet begun. However, that is not the case. From section 6.2.4 p5 and p6:
5 An object whose identifier is declared with no linkage and without the storage-class specifier static has automatic
storage duration, as do some compound literals. The result of
attempting to indirectly access an object with automatic storage
duration from a thread other than the one with which the object is
associated is implementation-defined.
6 For such an object that does not have a variable length array type, its lifetime extends from entry into the block
with which it is associated until execution of that block ends in any
way. (Entering an enclosed block or calling a function
suspends, but does not end, execution of the current block.) If
the block is entered recursively, a new instance of the object is
created each time. The initial value of the object is
indeterminate. If an initialization is specified for the
object, it is performed each time the declaration or compound
literal is reached in the execution of the block; otherwise,
the value becomes indeterminate each time the declaration is reached.
So in this case the lifetime of i begins before the declaration is encountered. So int i = i; is still undefined behavior, but not for this reason.
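A small sketch of my own construction illustrating the lifetime-versus-scope distinction: jumping over a declaration skips its initializer, yet the object's storage already exists from block entry:

#include <stdio.h>

void g(void) {
    goto later;                    /* jumps over the declaration below */
    int i = 42;                    /* this initializer is never executed */
later:
    printf("%p\n", (void *)&i);    /* valid: i's lifetime began at block entry,
                                      though its value is indeterminate here */
}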
The final sentence of 6.3.2.1 p2 does, however, open the door for a use of an uninitialized variable that is not undefined behavior: namely, if the variable in question has had its address taken. For example:
int a;
printf("%p\n", (void *)&a);
printf("%d\n", a);
In this case it is not undefined behavior if:
The implementation does not have trap representations for the given type, OR
The value chosen for a happens to not be a trap representation.
In that case, the value of a is unspecified. In particular, this will be the case with GCC and Microsoft Visual C++ (MSVC) in this example, as these implementations do not have trap representations for integer types.
Use of uninitialized automatic storage duration objects invokes UB.
Use of uninitialized static storage duration objects is well-defined, as they are initialized to zero:
#include <stdio.h>

int a;

void foo(void)
{
    static int b;
    int c;
    int d = d;            // UB: d is read while its value is indeterminate
    /* static int e = e;     constraint violation in C: an initializer for an
                             object with static storage duration must be a
                             constant expression */
    printf("%d\n", a);    // OK: a has static storage duration, zero-initialized
    printf("%d\n", b);    // OK: b has static storage duration, zero-initialized
    printf("%d\n", c);    // UB: c is uninitialized
}
In cases where an action on an object of some type might have unpredictable consequences on platforms where the type has trap representations, but have at-least-somewhat predictable behavior for types that don't, the Standard will seek to avoid distinguishing platforms that do or don't define the behavior by throwing everything into the catch-all category of "Undefined Behavior".
With regard to the behavior of uninitialized or partially-initialized objects, I don't think there's ever been a consensus over exactly which corner cases must be treated as though objects were initialized with Unspecified bit patterns, and which cases need not be treated in such fashion.
For example, given something like:
#include <string.h>

struct zstr15 { char dat[16]; } x, y;

void test(void)
{
    struct zstr15 hey;
    strcpy(hey.dat, "Hey");
    x = hey;
    y = hey;
}
Depending upon how x and y will be used, there are at least four ways it might be useful to have an implementation process the above code:
1. Squawk if an attempt is made to copy any automatic-duration object that isn't fully initialized. This could be very useful in cases where one must avoid leakage of confidential information.
2. Zero-fill all unused portions of hey. This would prevent leakage of confidential information on the stack, but wouldn't flag code that might cause such leakage if the data weren't zero-filled.
3. Ensure that all parts of x and y are identical, without regard for whether the corresponding members of hey were written.
4. Write the first four bytes of x and y to match those of hey, but leave some or all of the remaining portions holding whatever they held before test() was called.
I don't think the Standard was intended to pass judgment as to whether some of those approaches would be better or worse than others, but it would have been awkward to write the Standard in a manner that would define the behavior of test() while still allowing for option #3. The optimizations facilitated by #3 would only be useful if programmers could safely write code like the above in cases where client code wouldn't care about the contents of x.dat[4..15] and y.dat[4..15]. If the only way to guarantee anything about the behavior of that function were to write all portions of hey, including those whose values would be irrelevant to program behavior, that would nullify any optimization advantage approach #3 could have offered.
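For completeness, a hedged sketch (my own, not from the answer above) of how a programmer can make the copies fully defined regardless of which approach an implementation takes, by zero-initializing the automatic object first:

#include <string.h>

struct zstr15 { char dat[16]; } x, y;

void test(void)
{
    struct zstr15 hey = {0};   /* zero-initialize: every byte of hey is defined */
    strcpy(hey.dat, "Hey");
    x = hey;                   /* copying a fully initialized object: well-defined */
    y = hey;
}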
In the C89 standard, I found the following section:
3.2.2.1 Lvalues and function designators
Except when it is the operand of the sizeof operator, the unary & operator, the ++ operator, the -- operator, or the left operand of the . operator or an assignment operator, an lvalue that does not have array type is converted to the value stored in the designated object (and is no longer an lvalue). If the lvalue has qualified type, the value has the unqualified version of the type of the lvalue; otherwise the value has the type of the lvalue. If the lvalue has an incomplete type and does not have array type, the behavior is undefined.
If I read it correctly, it allows us to create an lvalue and apply certain operators to it, in a way that compiles but can cause undefined behavior at runtime.
The problem is that I can't think of an example of "an lvalue with incomplete type" that can pass the compiler's semantic check and trigger undefined behavior.
Consider that an lvalue is
An lvalue is an expression (with an object type or an incomplete type other than void) that designates an object.
and that incomplete type is
Types are partitioned into object types (types that describe objects), function types (types that describe functions), and incomplete types (types that describe objects but lack information needed to determine their sizes).
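For concreteness, a few examples of incomplete types (my own illustrations, not from the question):

struct S;            /* structure type declared but not defined: size unknown */
extern int arr[];    /* array of unknown size: incomplete until a size is given */
/* void is also an incomplete type, and one that can never be completed */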
A failed program I tried:
struct i_am_incomplete;

int main(void)
{
    struct i_am_incomplete *p;
    *(p + 1);
    return 0;
}
and got the following error:
error: arithmetic on a pointer to an incomplete type 'struct i_am_incomplete'
*(p + 1);
~ ^
Can anyone think of an example of this? An example of "an lvalue with incomplete type" which can pass the compiler's semantic check and trigger undefined behavior.
UPDATE:
As @algrid said in the answer, I misunderstood undefined behavior, which includes a compile-time error as an option.
Maybe I'm splitting hairs, but I still wonder about the underlying motivation to prefer undefined behavior over simply disallowing an lvalue to have an incomplete type.
I believe this program demonstrates the case:
struct S;
struct S *s, *f();

int main(void)
{
    s = f();
    if (0)
        *s; // here
}

struct S { int x; };
struct S *f() { static struct S y; return &y; }
On the marked line, *s is an lvalue of incomplete type, and it does not fall under any of the "Except..." cases in your quote of 3.2.2.1 (which is 6.3.2.1/2 in the current standard). Therefore it is undefined behaviour.
I tried my program in gcc and clang and they both rejected it with the error that a pointer to incomplete type cannot be dereferenced; but I cannot find anywhere in the Standard which would make that a constraint violation, so I believe the compilers are incorrect to reject the program. Or possibly the standard is defective by omitting such a constraint, which would make sense.
(Since the code is inside an if(0), that means the compiler cannot reject it merely on the basis of it being undefined behaviour).
Some build systems may have been designed in a way that would allow code like:
extern struct foo x;
extern void use_foo(struct foo x); // pass by value
...
use_foo(x);
to be processed successfully without the compiler having to know or care
about the actual representation of struct foo [for example, some systems may process pass-by-value by having the caller pass the address of an object and requiring the called function to make a copy if it's going to modify it].
Such a facility may be useful on systems that could support it, and I don't think the authors of the Standard wanted to imply that code which used that feature was "broken", but they also didn't want to mandate that all C implementations support such a feature. Making the behavior undefined would allow implementations to support it when practical, without requiring that they do so.
"Undefined behavior" term includes compilation error as an option. From the C89 standard:
Undefined behavior - behavior, upon use of a nonportable or erroneous program construct, of erroneous data, or of indeterminately-valued objects, for which the Standard imposes no requirements. Permissible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message).
As you can see "terminating a translation" is ok.
In this case I believe the compilation error you get for your sample code is an example of "undefined behavior" implemented as a compile-time error.
Sure, array types can be that:
extern double A[];
...
A[0] = 1; // lvalue conversion of A
This has well-defined behavior, even if the definition of A is not visible to the compiler. So inside this TU the array type is never completed.
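To complete the picture, a hedged sketch of the usual setup: some other translation unit would provide the definition that completes the type (the file name and array size here are hypothetical):

/* other.c -- hypothetical second translation unit */
double A[10];   /* defines A; its array type is completed here, and only here */

In the first TU, A[0] = 1 needs only the element type double to compute the address, which is why the incomplete array type never causes a problem there.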
Difference of two pointers of the same type is always one.
#include<stdio.h>
#include<string.h>
int main(){
int a = 5,b = 10,c;
int *p = &a,*q = &b;
c = p - q;
printf("%d" , c);
return 0;
}
Output is 1.
I don't get the reasoning behind it.
The behavior is undefined.
C99 6.5.6 paragraph 9 says:
When two pointers are subtracted, both shall point to elements of the
same array object, or one past the last element of the array object;
the result is the difference of the subscripts of the two array
elements.
Paragraph 7 in the same section says:
For the purposes of these operators, a pointer to an object that is
not an element of an array behaves the same as a pointer to the first
element of an array of length one with the type of the object as its
element type.
Section 4 paragraph 2 says:
If a "shall" or "shall not" requirement that appears outside of a
constraint is violated, the behavior is undefined. Undefined behavior
is otherwise indicated in this International Standard by the words
"undefined behavior" or by the omission of any explicit definition of
behavior. There is no difference in emphasis among these three; they
all describe "behavior that is undefined".
3.4.3 defines the term "undefined behavior" as:
behavior, upon use of a nonportable or erroneous program construct or
of erroneous data, for which this International Standard imposes no
requirements
NOTE Possible undefined behavior ranges from ignoring the situation
completely with unpredictable results, to behaving during translation
or program execution in a documented manner characteristic of the
environment (with or without the issuance of a diagnostic message), to
terminating a translation or execution (with the issuance of a
diagnostic message).
Given the declaration:
int a = 5, b = 10, c;
it's likely that evaluating &b - &a will yield a result that seems reasonable, such as 1 or -1. (Reasonable results are always a possible symptom of undefined behavior; it's undefined, not required to crash.) But the compiler is under no obligation to place a and b at any particular locations in memory relative to each other, and even if it does so, the subtraction is not guaranteed to be meaningful. An optimizing compiler is free to transform your program in ways that assume that its behavior is well defined, resulting in code that can behave in arbitrarily bad ways if that assumption is violated.
By writing &b - &a, you are in effect promising the compiler that that's a meaningful operation. As Henry Spencer famously said, "If you lie to the compiler, it will get is revenge."
Note that it's not just the result of the subtraction that's undefined, it's the behavior of the program that evaluates it.
Oh, did I mention that the behavior is undefined?
In fact, the behaviour of your program is undefined. The output happens to be 1 on your platform/compiler, but it could just as easily be something else.
This code exhibits undefined behaviour because pointer subtraction is only defined when both operands point to elements of the same array object (or one past its last element).
The value of 1 comes about because the compiler has placed a and b next to each other in memory. Another compiler could do something different. Indeed the same compiler could do something different the next time you change the code.
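For contrast, a minimal sketch of the well-defined case, with both pointers into the same array object:

#include <stdio.h>

int main(void)
{
    int arr[2] = {5, 10};
    int *p = &arr[1], *q = &arr[0];
    printf("%td\n", p - q);   /* well-defined: prints 1; the result has type ptrdiff_t */
    return 0;
}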
The C compiler knows the size of each type. For example, suppose P is an int pointer holding the address 0x0010 on a platform where int is 4 bytes. If you increment P by 1 (P++ or P = P + 1), the value of P becomes 0x0014.
As for your question: a and b are declared one after the other, and in this case the compiler happened to place them adjacent in memory, 4 bytes apart. When you subtract two int pointers, the compiler divides the difference in addresses by 4 (the size of int), giving 1. Note, though, that nothing guarantees this layout, and as the other answers explain, the subtraction itself is undefined behavior.
I was answering a question and made this test program.
#include <stdio.h>
int main()
{
    volatile const int v = 5;
    int * a = &v;
    *a = 4;
    printf("%d\n", v);
    return 0;
}
Without the volatile keyword, the compiler (Apple clang 4.2, compiled with -O3) optimizes the modification of the variable away; with it, the code works as expected and the const variable is modified correctly.
I was wondering if a more experienced C developer knows if there is a part of the standard that says this is unsafe or UB.
UPDATE: @EricPostpischil gave me this standard quote:
A program may not modify its own object defined with a const-qualified type, per C 2011 (N1570) 6.7.3 6: “If an attempt is made to modify an object defined with a const-qualified type through use of an lvalue with non-const-qualified type, the behavior is undefined.” An external agent may modify an object that has volatile-qualified type, per 6.7.3 7: “An object that has volatile-qualified type may be modified in ways unknown to the implementation or have other unknown side effects.”
My program breaks the first rule but I thought that the second rule may exempt a program from the first.
UPDATE 2:
An object that has volatile-qualified type may be modified in ways unknown to the implementation or have other unknown side effects. Therefore any expression referring to such an object shall be evaluated strictly according to the rules of the abstract machine, as described in 5.1.2.3. Furthermore, at every sequence point the value last stored in the object shall agree with that prescribed by the abstract machine, except as modified by the unknown factors mentioned previously.134) What constitutes an access to an object that has volatile-qualified type is implementation-defined.
If you look at this quote, you can see the variable must be evaluated strictly according to the rules of the abstract machine. I haven't read through all of section 5.1.2.3, but I believe it may shed some light on the issue.
It is unsafe because the same behavior cannot be guaranteed across other compilers. Your code is compiler-dependent, and may even be compiler-switch dependent. That's why it's a bad idea.
This line:
int * a = &v;
is a constraint violation. The compiler must produce a diagnostic message, and may reject the program. If the compiler produces an executable anyway, then that executable has completely undefined behaviour (i.e. the C Standard no longer covers the program at all).
The constraint violated is that neither volatile nor const may be implicitly converted away.
To comply with the C standard, the pointer's pointed-to type must have the same or stronger qualifiers than the object being pointed to, e.g.:
int const volatile *a = &v;
after which you will find that the line *a = 4; causes a compilation error.
A possible attempt might be:
int *a = (int *)&v;
This line must compile, but then it causes undefined behaviour to read or write via *a. The undefined behaviour is specified by C11 6.7.3/6 (C99 and C89 had similar text):
If an attempt is made to modify an object defined with a const-qualified type through use of an lvalue with non-const-qualified type, the behavior is undefined. If an attempt is made to refer to an object defined with a volatile-qualified type through use of an lvalue with non-volatile-qualified type, the behavior is undefined.
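As an aside, a hedged sketch of the situation volatile const is actually designed for: a read-only, memory-mapped hardware register that an "external agent" (6.7.3 7) updates behind the program's back. The register name and address here are hypothetical:

/* hypothetical memory-mapped, read-only status register */
#define STATUS_REG (*(volatile const unsigned int *)0x40001000u)

unsigned int read_status(void)
{
    return STATUS_REG;   /* volatile: every call performs a real load;
                            const: the program itself must never write it */
}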