Modifying a const variable with the volatile keyword - c

I was answering a question and made this test program.
#include <stdio.h>
int main()
{
volatile const int v = 5;
int * a = &v;
*a =4;
printf("%d\n", v);
return 0;
}
Without the volatile keyword the code optimizes (compiled with -O3 apple clang 4.2) the change of the var away, with it works as expected and the const variable is modified correctly.
I was wondering if a more experienced C developer knows if there is a part of the standard that says this is unsafe or UB.
UPDATE: #EricPostpischil gave me this standards quote
A program may not modify its own object defined with a const-qualified type, per C 2011 (N1570) 6.7.3 6: “If an attempt is made to modify an object defined with a const-qualified type through use of an lvalue with non-const-qualified type, the behavior is undefined.” An external agent may modify an object that has volatile-qualified type, per 6.7.3 7: “An object that has volatile-qualified type may be modified in ways unknown to the implementation or have other unknown side effects
My program breaks the first rule but I thought that the second rule may exempt a program from the first.
UPDATE 2:
An object that has volatile-qualified type may be modified in ways unknown to the implementation or have other unknown side effects. Therefore any expression referring to such an object shall be evaluated strictly according to the rules of the abstract machine, as described in 5.1.2.3. Furthermore, at every sequence point the value last stored in the object shall agree with that prescribed by the abstract machine, except as modified by the unknown factors mentioned previously.134) What constitutes an access to an object that has volatile-qualified type is implementation-defined.
If you look at this quote you can see the var must be evaluated according to certain rules, I haven't read through all of section 5.1.2.3 but I believe that this may shed some light on the issue.

It is unsafe because the same behavior cannot be guaranteed for use in other compilers. So your code is compiler-dependent and may even be compiler switch dependent. That's why it's a bad idea.

This line:
int * a = &v;
is a constraint violation. The compiler must produce a diagnostic message, and may reject the program. If the compiler produces an executable anyway, then that executable has completely undefined behaviour (i.e. the C Standard no longer covers the program at all).
The constraints violated are that volatile nor const may not be implicitly converted away.
To comply with the C standard, the pointer must have its pointed-to type having the same or stronger qualifiers as the object being pointed to, e.g.:
int const volatile *a = &v;
after which you will find that the line *a = 4; causes a compilation error.
A possible attempt might be:
int *a = (int *)&v;
This line must compile, but then it causes undefined behaviour to read or write via *a. The undefined behaviour is specified by C11 6.7.3/6 (C99 and C89 had similar text):
If an attempt is made to modify an object defined with a const-qualified type through use of an lvalue with non-const-qualified type, the behavior is undefined. If an attempt is made to refer to an object defined with a volatile-qualified type through use of an lvalue with non-volatile-qualified type, the behavior is undefined.

Related

Is reading an uninitialized value always an undefined behaviour? Or are there exceptions to it?

An obvious example of undefined behavior (UB), when reading a value, is:
int a;
printf("%d\n", a);
What about the following examples?
int i = i; // `i` is not initialized when we are reading it by assigning it to itself.
int x; x = x; // Is this the same as above?
int y; int z = y;
Are all three examples above also UB, or are there exceptions to it?
Each of the three lines triggers undefined behavior. The key part of the C standard, that explains this, is section 6.3.2.1p2 regarding Conversions:
Except when it is the operand of the sizeof operator, the
_Alignof operator, the unary & operator, the ++ operator, the
-- operator, or the left operand of the . operator or an
assignment operator, an lvalue that does not have array type
is converted to the value stored in the designated object
(and is no longer an lvalue); this is called lvalue
conversion. If the lvalue has qualified type, the value has
the unqualified version of the type of the lvalue; additionally,
if the lvalue has atomic type, the value has the non-atomic version
of the type of the lvalue; otherwise, the value has the
type of the lvalue. If the lvalue has an incomplete type and does
not have array type, the behavior is undefined. If the lvalue
designates an object of automatic storage duration that could
have been declared with the register storage class (never had its
address taken), and that object is uninitialized (not declared
with an initializer and no assignment to it has been
performed prior to use), the behavior is undefined.
In each of the three cases, an uninitialized variable is used as the right-hand side of an assignment or initialization (which for this purpose is equivalent to an assignment) and undergoes lvalue to rvalue conversion. The part in bold applies here as the objects in question have not been initialized.
This also applies to the int i = i; case as the lvalue on the right side has not (yet) been initialized.
There was debate in a related question that the right side of int i = i; is UB because the lifetime of i has not yet begun. However, that is not the case. From section 6.2.4 p5 and p6:
5 An object whose identifier is declared with no linkage and without the storage-class specifier static has automatic
storage duration, as do some compound literals. The result of
attempting to indirectly access an object with automatic storage
duration from a thread other than the one with which the object is
associated is implementation-defined.
6 For such an object that does not have a variable length array type, its lifetime extends from entry into the block
with which it is associated until execution of that block ends in any
way. (Entering an enclosed block or calling a function
suspends, but does not end,execution of the current block.) If
the block is entered recursively, a new instance of the object is
created each time. The initial value of the object is
indeterminate. If an initialization is specified for the
object, it is performed each time the declaration or compound
literal is reached in the execution of the block; otherwise,
the value becomes indeterminate each time the declaration is reached
So in this case the lifetime of i begins before the declaration in encountered. So int i = i; is still undefined behavior, but not for this reason.
The bolded part of 6.3.2.1p2 does however open the door for use of an uninitialized variable not being undefined behavior, and that is if the variable in question had it's address taken. For example:
int a;
printf("%p\n", (void *)&a);
printf("%d\n", a);
In this case it is not undefined behavior if:
The implementation does not have trap representations for the given type, OR
The value chosen for a happens to not be a trap representation.
In which case the value of a is unspecified. In particular, this will be the case with GCC and Microsoft Visual C++ (MSVC) in this example as these implementations do not have trap representations for integer types.
Use of the not initialized automatic storage duration objects invokes UB.
Use of the not initialized static storage duration objects is defined as they are initialized to 0s
int a;
int foo(void)
{
static int b;
int c;
int d = d; //UB
static int e = e; //OK
printf("%d\n", a); //OK
printf("%d\n", b); //OK
printf("%d\n", c); //UB
}
In cases where an action on an object of some type might have unpredictable consequences on platforms where the type has trap representations, but have at-least-somewhat predictable behavior for types that don't, the Standard will seek to avoid distinguishing platforms that do or don't define the behavior by throwing everything into the catch-all category of "Undefined Behavior".
With regard to the behavior of uninitialized or partially-initialized objects, I don't think there's ever been a consensus over exactly which corner cases must be treated as though objects were initialized with Unspecified bit patterns, and which cases need not be treated in such fashion.
For example, given something like:
struct ztstr15 { char dat[16]; } x,y;
void test(void)
{
struct zstr15 hey;
strcpy(hey.dat, "Hey");
x=hey;
y=hey;
}
Depending upon how x and y will be used, there are at least four ways it might be useful to have an implementation process the above code:
Squawk if an attempt is made to copy any automatic-duration object that isn't fully initialized. This could be very useful in cases where one must avoid leakage of confidential information.
Zero-fill all unused portions of hey. This would prevent leakage of confidential information on the stack, but wouldn't flag code that might cause such leakage if the data weren't zero-filled.
Ensure that all parts of x and y are identical, without regard for whether the corresponding members of hey were written.
Write the first four bytes of x and y to match those of hey, but leave some or all of the remaining portions holding whatever they held before test() was called.
I don't think the Standard was intended to pass judgment as to whether some of those approaches would be better or worse than others, but it would have been awkward to write the Standard in a manner that would define behavior of test() while allowing for option #3. The optimizations facilitated by #3 would only be useful if programmers could safely write code like the above in cases where client code wouldn't care about the contents of x.dat[4..15] and y.dat[4..15]. If the only way to guarantee anything about the behavior of that function would be to write all portions of hey were written, including those whose values would be irrelevant to program behavior, that would nullify any optimization advantage approach #3 could have offered.

Undefined behavior inside void expressions

Is a C implementation required to ignore undefined behaviors occurring during the evaluation of void expressions as if the evaluation itself never took place?
Considering C11, 6.3.2.2 §1:
If an expression of any other type is evaluated as a void expression, its value or designator is discarded. (A void expression is evaluated for its side effects.)
This is related to the common idiom used to prevent compilers from warning about unused variables:
void f() {
int a;
(void)a;
}
But what if we have undefined behavior, such as:
void f() {
int a;
(void)(1/0);
}
Can I safely claim that this program contains no undefined behavior? The standard says that "its value or designator is discarded", but the "expression (...) is evaluated (...)", so the evaluation does seem to take place.
GCC/Clang do report the undefined behavior, since it is obvious in this case, but in a more subtle example they don't:
int main() {
int a = 1;
int b = 0;
(void)(a/b);
return 0;
}
Even with -O0, neither GCC nor Clang evaluate 1/0. But that also happens even without the cast to void, so it's not representative.
Pushing the argument to its extreme, wouldn't the simple evaluation of (void)a in my first example (where a is uninitialized) systematically trigger undefined behavior?
ISO C11 6.3.2.1 §2 does mention that:
If the lvalue designates an object of automatic storage duration that could have been declared with the register storage class (never had its address taken), and that object is uninitialized (not declared with an initializer and no assignment to it has been performed prior to use), the behavior is undefined.
However, in the Annex J.2 Undefined behavior, the phrasing is slightly different:
The behavior is undefined in the following circumstances:
(...)
An lvalue designating an object of automatic storage duration that could have been declared with the register storage class is used in a context that requires the value of the designated object, but the object is uninitialized. (6.3.2.1).
This annex does lead to the interpretation that a void expression containing undefined behavior during its evaluation is not actually evaluated, but since it's just an annex, I'm not sure of its argumentative weight.
This is related to the common idiom used to prevent compilers from
warning about unused variables:
void f() {
int a;
(void)a;
}
Yes and no. I'd argue that that idiom turns an unused variable into a used one -- it appears in an expression -- with the cast to void serving to prevent compilers from complaining about the result of that expression going unused. But in the technical, language-lawyer sense, that particular expression of the idiom produces UB because the sub-expression a is subject to lvalue conversion when a's value is indeterminate. You've already quoted the relevant text of the standard.
But what if we have undefined behavior, such as:
void f() {
int a;
(void)(1/0);
}
Can I safely claim that this program contains no undefined behavior?
No.
The standard says that "its value or designator is discarded", but the
"expression (...) is evaluated (...)", so the evaluation does seem to
take place.
Yes, just as the expression a in your earlier example is also evaluated, also producing UB. UB arises from evaluation of the inner sub-expression. The conversion to type void is a separable consideration, exactly as a conversion to any other type would be.
GCC/Clang do report the undefined behavior, since it is obvious in
this case, but in a more subtle example they don't:
Compiler behavior cannot be taken as indicative here. C does not require compilers to diagnose most undefined behaviors, not even those that could, in principle, be detected at compile time. Indeed, it is important to recognize that UB arising from incorrect code happens first and foremost at compile time, though of course it follows that if an executable is produced then it exhibits UB, too.
Pushing the argument to its extreme, wouldn't the simple evaluation of
(void)a in my first example (where a is uninitialized) systematically
trigger undefined behavior?
Yes, as I already remarked. But that does not mean that programs containing such constructions are obligated to misbehave. As a quality-of-implementation matter, I think it reasonable to hope that the expression statement (void)a; will be accepted by the compiler and will have no corresponding runtime behavior at all. But I cannot rely on the language standard to back me up on that.
This annex does lead to the interpretation that a void expression
containing undefined behavior during its evaluation is not actually
evaluated, but since it's just an annex, I'm not sure of its
argumentative weight.
The plain wording of the normative text of the standard is quite sufficient here. The annex is not normative, but if there is any question about how the normative text is meant to be interpreted, then informative sections of the standard, such as Annex J, are one of the sources taken into account in sorting that out (but they are still only informative).

In C language, is it semantically possible to create an lvalue with incomplete type?

In the C89 standard, I found the following section:
3.2.2.1 Lvalues and function designators
Except when it is the operand of the sizeof operator, the unary & operator, the ++ operator, the -- operator, or the left operand of the . operator or an assignment operator, an lvalue that does not have array type is converted to the value stored in the designated object (and is no longer an lvalue). If the lvalue has qualified type, the value has the unqualified version of the type of the lvalue; otherwise the value has the type of the lvalue. If the lvalue has an incomplete type and does not have array type, the behavior is undefined.
If I read it correctly, it allows us to create an lvalue and applies some operators on it, which compiles and can cause undefined behavior during runtime.
Problem is that, I can't think of an example of "an lvalue with incomplete type" which can pass compiler's semantic check and triggers undefined behavior.
Consider that an lvalue is
An lvalue is an expression (with an object type or an incomplete type other than void) that designates an object.
and that incomplete type is
Types are partitioned into object types (types that describe objects), function types (types that describe functions), and incomplete types (types that describe objects but lack information needed to determine their sizes).
A failed program I tried:
struct i_am_incomplete;
int main(void)
{
struct i_am_incomplete *p;
*(p + 1);
return 0;
}
and got the following error:
error: arithmetic on a pointer to an incomplete type 'struct i_am_incomplete'
*(p + 1);
~ ^
Anyone can think of an example on this ? An example of "an lvalue with incomplete type" which can pass compiler's semantic check and triggers undefined behavior.
UPDATE:
As #algrid said in the answer, I misunderstood undefined behavior, which contains compile error as an option.
Maybe I'm splitting hairs, I still wonder the underlying motivation here to prefer undefined behavior over disallowing an lvalue to have an incomplete type.
I believe this program demonstrates the case:
struct S;
struct S *s, *f();
int main(void)
{
s = f();
if ( 0 )
*s; // here
}
struct S { int x; };
struct S *f() { static struct S y; return &y; }
On the marked line, *s is an lvalue of incomplete type, and it does not fall under any of the "Except..." cases in your quote of 3.2.2.1 (which is 6.3.2.1/2 in the current standard). Therefore it is undefined behaviour.
I tried my program in gcc and clang and they both rejected it with the error that a pointer to incomplete type cannot be dereferenced; but I cannot find anywhere in the Standard which would make that a constraint violation, so I believe the compilers are incorrect to reject the program. Or possibly the standard is defective by omitting such a constraint, which would make sense.
(Since the code is inside an if(0), that means the compiler cannot reject it merely on the basis of it being undefined behaviour).
Some build systems may have been designed in a way would allow code like:
extern struct foo x;
extern use_foo(struct foo x); // Pass by value
...
use_foo(x);
to be processed successfully without the compiler having to know or care
about the actual representation of struct foo [for example, some systems may process pass-by-value by having the caller pass the address of an object and requiring the called function to make a copy if it's going to modify it].
Such a facility may be useful on systems that could support it, and I don't think the authors of the Standard wanted to imply that code which used that feature was "broken", but they also didn't want to mandate that all C implementations support such a feature. Making the behavior undefined would allow implementations to support it when practical, without requiring that they do so.
"Undefined behavior" term includes compilation error as an option. From the C89 standard:
Undefined behavior - behavior, upon use of a nonportable or erroneous program construct, of erroneous data, or of indeterminately-valued objects, for which the Standard imposes no requirements. Permissible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message).
As you can see "terminating a translation" is ok.
In this case I believe the compilation error you get for you sample code is an example of "undefined behavior" implemented as compile time error.
Sure, array types can be that:
extern double A[];
...
A[0] = 1; // lvalue conversion of A
This has well defined behavior, even if the definition of A is not visible to the compiler. So inside this TU the array type is never completed.

Is this behavior of clang standard compliant?

This is going to be a long, language lawyerish question, so I'd like to quickly state why I find it relevant. I am working on a project where strict standard compliance is crucial (writing a language that compiles to C). The example I am going to give seems like a standard violation on the part of clang, and so, if this is the case, I'd like to confirm it.
gcc says that a conditional with a pointer to a restrict qualified pointer can not co-inhabit a conditional statement with a void pointer. On the other hand, clang compiles such things fine. Here is an example program:
#include <stdlib.h>
int main(void){
int* restrict* A = malloc(8);
A ? A : malloc(8);
return 0;
}
For gcc, the options -std=c11 and -pedantic may be included or not in any combination, likewise for clang and the options -std=c11 and -Weverything. In any case, clang compiles with no errors, and gcc gives the following:
tem-2.c: In function ‘main’:
tem-2.c:7:2: error: invalid use of ‘restrict’
A ? A : malloc(8);
^
The c11 standard says the following with regard to conditional statements, emphasis added:
6.5.15 Conditional operator
...
One of the following shall hold for the second and third operands:
— both operands have arithmetic type;
— both operands have the same structure or union type;
— both operands have void type;
— both operands are pointers to qualified or unqualified versions of compatible types;
— one operand is a pointer and the other is a null pointer constant; or
— one operand is a pointer to an object type and the other is a pointer to a qualified or unqualified version of void.
...
If both the second and third operands are pointers or one is a null pointer constant and the
other is a pointer, the result type is a pointer to a type qualified with all the type qualifiers
of the types referenced by both operands. Furthermore, if both operands are pointers to
compatible types or to differently qualified versions of compatible types, the result type is
a pointer to an appropriately qualified version of the composite type; if one operand is a
null pointer constant, the result has the type of the other operand; otherwise, one operand
is a pointer to void or a qualified version of void, in which case the result type is a
pointer to an appropriately qualified version of void.
...
The way I see it, the first bold portion above says that the two types can go together, and the second bold portion defines the result to be a pointer to a restrict qualified version of void. However, as the following states, this type can not exist, and so the expression is correctly identified as erroneous by gcc:
6.7.3 Type qualifiers, paragraph 2
Types other than pointer types whose referenced type is an object type shall not be restrict-qualified.
Now, the problem is that a "shall not" condition is violated by this example program, and so is required to produce an error, by the following:
5.1.1.3 Diagnostics, paragraph 1
A conforming implementation shall produce at least one diagnostic message (identified in
an implementation-defined manner) if a preprocessing translation unit or translation unit
contains a violation of any syntax rule or constraint, even if the behavior is also explicitly
specified as undefined or implementation-defined. Diagnostic messages need not be
produced in other circumstances.
It seems clang is not standard compliant by treating an erroneous type silently. That makes me wonder what else clang does silently.
I am using gcc version 5.4.0 and clang version 3.8.0, on an x86-64 Ubuntu machine.
Yes it looks like a bug.
Your question more briefly: can void be restrict qualified? Since void is clearly not a pointer type, the answer is no. Because this violates a constraint, the compiler should give a diagnostic.
I was able to trick clang to confess its sins by using a _Generic expression
puts(_Generic(A ? A : malloc(8), void* : "void*"));
and clang tells me
static.c:24:18: error: controlling expression type 'restrict void *' not compatible with any generic association type
puts(_Generic(A ? A : malloc(8), void* : "void*"));
which shows that clang here really tries to match a nonsense type restrict void*.
Please file them a bug report.
While a compiler could satisfy all obligations surrounding restrict by ignoring the qualifier altogether, a compiler which wants to keep track of what it is or is not allowed to do needs to keep track of which pointers hold copies of restrict pointers. Given something like:
int *foo;
int *bar;
int wow(int *restrict p)
{
foo = p;
...
*p = 123;
*foo = 456;
*p++;
*bar = 890;
return *p;
}
since foo is derived from p, a compiler must allow for accesses made via
foo to alias accesses via p. A compiler need not make such allowances
for accesses made via bar, since that is known not to hold an address derived from p.
The rules surrounding restrict get murky in cases where a pointer may or
may not be derived from another. A compiler would certainly be allowed to
simply ignore a restrict qualifier in cases where it can't track all of
the pointers derived from a pointer; I'm not sure if any such cases would
invoke UB even if nothing ever modifies the storage identified by the
pointer. If a syntactic construct is structurally guaranteed to invoke
UB, having a compiler squawk may be more useful than having it act in an
arbitrary fashion (though having a compiler simply ignore any restrict
qualifiers it can't fully handle might be more useful yet).

Pass integer address in method expecting unsigned char pointer

Recently I had code (in C) where I passed the address of an int to a function expecting a pointer to unsigned char. Is this not valid? Is this UB or what?
e.g.,
void f(unsigned char*p)
{
// do something
}
// Call it somewhere
int x = 0; // actually it was uint32_t if it makes difference
f(&x);
I did get a warning though ... Compiled in Xcode
int * and unsigned char * are not considered compatible types, so implicit conversion will issue a diagnostic. However, the standard does allow explicit casting between different pointers, subject to two rules (C11 section 6.3.2.3):
Converting a type "pointer to A" to type "pointer to B" and back to "pointer to A" shall result in the same original pointer. (i.e., if p is of type int *, then (int *)(double *)p will yield p)
Converting any pointer to a char * will point to the lowest-addressable byte of the object.
So, in your case, an explicit (unsigned char *) cast will yield a conforming program without any undefined behavior.
The cast is required, see C11 (n1570) 6.5.2.2 p.2:
[…] Each argument shall have a type such that its value may be assigned to an object with the unqualified version of the type of its corresponding parameter.
This refers to the rules for assignment, the relevant part is (ibid. 6.5.16.1 p.1)
One of the following shall hold:
[…]
the left operand has atomic, qualified, or unqualified pointer type, and (considering the type the left operand would have after lvalue conversion) both operands are pointers to qualified or unqualified versions of compatible types, and the type pointed to by the left has all the qualifiers of the type pointed to by the right.
[…]
And unsigned char isn’t compatible to int.
These rules both appear in a “constraint” section, where “shall” means that the compiler has to give a “diagnostic message” (cf. C11 5.1.1.3) and may stop compiling (or whatever, everything beyond that diagnostic is, strictly speaking, out of the scope of the C standard). Your code is an example of a constraint violation.
Other examples of constraint violations are calling a (prototyped and non-variadic) function with the wrong number of arguments, using bitwise operators on doubles, or redeclaring an identifier with an incompatible type in the same scope, ibid. 5.1.1.3 p.2:
Example
An implementation shall issue a diagnostic for the translation unit:
char i;
int i;
because in those cases where wording in this International Standard describes the behavior for a construct as being both a constraint error and resulting in undefined behavior, the constraint error shall be diagnosed.
Syntax violations are treated equally.
So, strictly speaking, your program is as invalid as
int foo(int);
int main() {
It's my birthday!
foo(0.5 ^ 42, 12);
}
which a conforming implementation very well may compile, maybe to a program having undefined behavior, as long as it gives at least one diagnostic (e.g. a warning).
For e.g. gcc, a warning is a diagnostic (you can turn syntax and constraint violations into errors with -pedantic-errors).
The term ill-formed may be used to refer to either a syntax or a constraint violation, the C standard doesn't use this term, but cf. C++11 (n3242):
1.3.9
ill-formed program
program that is not well formed
1.3.26
well-formed program
C++ program constructed according to the syntax rules, diagnosable semantic rules, and the One Definition Rule.
The language-lawyer attitude aside, your code will probably always either be not compiled at all (which should be reason enough to do the cast), or show the expected behavior.
C11, §6.5.2.2:
2 Each argument shall have a type such that its value may be assigned to an object with the unqualified version of the type of its corresponding parameter.
§6.5.16.1 describes assignment in terms of a list of constraints, including
the left operand has atomic, qualified, or unqualified pointer type, and (considering the type the left operand would have after lvalue conversion) both operands are pointers to qualified or unqualified versions of compatible types, and the type pointed to by the left has all the qualifiers of the type pointed to by the right
int and unsigned char are not compatible types, so the program is not well-formed and the Standard doesn't even guarantee that it will compile.
Although some would say "it is undefined behavior according to the standard", here is what happens de-facto (answering by an example):
Safe:
void f(char* p)
{
char r, w = 0;
r = p[0]; // read access
p[0] = w; // write access
}
...
int x = 0;
f((char*)&x); // the casting is just in order to emit the compilation warning
This code is safe as long as you access memory with p[i], where 0 <= i <= sizeof(int)-1.
Unsafe:
void f(int* p)
{
int r, w = 0;
r = p[0]; // read access
p[0] = w; // write access
}
...
char x[sizeof(int)] = {0};
f((int*)&x); // the casting is just in order to emit the compilation warning
This code is unsafe because although the allocated variable is large enough to accommodate an int, its address in memory is not necessarily a multiple of sizeof(int). As a result, unless the compiler (as well as the underlying HW architecture) supports unaligned load/store operations, a memory access violation will occur during runtime if the address of this variable in memory is indeed not properly aligned.

Resources