Consider the following C code:
#include <signal.h>

static sig_atomic_t x;
static sig_atomic_t y;

void foo(void)
{
    x = 1;
    y = 2;
}
First question: can the C compiler decide to "optimize" the code for foo to y = 2; x = 1 (in the sense that the memory location for y is changed before the memory location for x)? This would be equivalent, except when multiple threads or signals are involved.
If the answer to the first question is "yes": what should I do if I really want the guarantee that x is stored before y?
Yes, the compiler may change the order of the two assignments, because the reordering is not "observable" as defined by the C standard, e.g., there are no side-effects to the assignments (again, as defined by the C standard, which does not consider the existence of an outside observer).
In practice you need some kind of barrier/fence to guarantee the order, e.g., use the services provided by your multithreading environment, or possibly C11 stdatomic.h if available.
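For instance, a minimal C11 sketch of that last option (assuming <stdatomic.h> is available; the relaxed/release pairing shown here is just one possible choice):

#include <stdatomic.h>

static atomic_int x;
static atomic_int y;

void foo(void)
{
    atomic_store_explicit(&x, 1, memory_order_relaxed);
    /* release store: a thread that loads y with acquire semantics and
       sees the value 2 is guaranteed to also see the store to x */
    atomic_store_explicit(&y, 2, memory_order_release);
}

A reader would pair this with atomic_load_explicit(&y, memory_order_acquire) before reading x.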
The C standard defines a term called observable behavior. This means that, at a minimum, the compiler/system has a few restrictions: it is not allowed to re-order expressions containing volatile-qualified operands, nor is it allowed to re-order input/output.
Apart from those special cases, anything is fair game. It may execute y before x, it may execute them in parallel. It might optimize the whole code away as there are no observable side-effects in the code. And so on.
Please note that thread-safety and order of execution are different things. Threads are created explicitly by the programmer/libraries. A context switch may interrupt any variable access which is not atomic. That's another issue, and the solution is to use a mutex, the _Atomic qualifier, or similar protection mechanisms.
If the order matters, you should volatile-qualify the variables. In that case, the following guarantees are made by the language:
C17 5.1.2.3 § 6 (the definition of observable behavior):
Accesses to volatile objects are evaluated strictly according to the rules of the abstract machine.
C17 5.1.2.3 § 4:
In the abstract machine, all expressions are evaluated as specified by the semantics.
Where "semantics" is pretty much the whole standard, for example the part that specifies that a ; consists of a sequence point. (In this case, C17 6.7.6 "The end of a full
declarator is a sequence point." The term "sequenced before" is specified in C17 5.1.2.3 §3).
So given this:
volatile int x = 1;
volatile int y = 1;
then the order of initialization is guaranteed to be x before y, as the ; of the first line guarantees the sequencing order, and volatile guarantees that the program strictly follows the evaluation order specified in the standard.
Now as it happens in the real world, volatile does not guarantee memory barriers on many compiler implementations for multi-core systems. Those implementations are not conforming.
Opportunistic compilers might claim that the programmer must use system-specific memory barriers to guarantee order of execution. But in the case of volatile, that is not true, as proven above. They just want to dodge their responsibility and hand it over to the programmers. The C standard doesn't care whether the CPU has 57 cores, branch prediction, or instruction pipelining.
Related
Assume the following code
struct a {
    unsigned cntr;
};

void boo(struct a *v) {
    v->cntr++;
    while (v->cntr > 1);
}
I wonder if the compiler is allowed to omit the while loop inside boo() due to the following statement in the C11 standard:
An iteration statement whose controlling expression is not a constant expression,156) that performs no input/output operations, does not access volatile objects, and performs no synchronization or atomic operations in its body, controlling expression, or (in the case of a for statement) its expression-3, may be assumed by the implementation to terminate.157)
157)This is intended to allow compiler transformations such as removal of empty loops even when termination cannot be proven.
Can v->cntr, in the controlling expression, be considered as a synchronization since v may be a pointer to a global structure which can be modified externally (for example by another thread)?
Additional question.
Is the compiler allowed not to re-read v->cntr on each iteration if v is not defined as volatile?
Can v->cntr, in the controlling expression, be considered as a synchronization
No.
From https://port70.net/~nsz/c/c11/n1570.html#5.1.2.4p5 :
The library defines a number of atomic operations (7.17) and operations on mutexes (7.26.4) that are specially identified as synchronization operations.
So basically, functions from stdatomic.h and the mtx_* functions from threads.h are synchronization operations.
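For illustration only (a sketch, not the original code): if cntr were made an atomic object, the load in the controlling expression would be an atomic operation, so the quoted permission to assume termination would no longer apply:

#include <stdatomic.h>

struct a {
    atomic_uint cntr;    /* now an atomic object */
};

void boo(struct a *v) {
    atomic_fetch_add(&v->cntr, 1);
    /* the controlling expression performs an atomic operation, so the
       implementation may not assume the loop terminates, and cntr is
       re-read on every iteration */
    while (atomic_load(&v->cntr) > 1)
        ;
}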
since v may be a pointer to a global structure which can be modified externally (for example by another thread)?
Does not matter. Assumptions like that sound to me like they would disallow many sane optimizations; I wouldn't want my compiler to assume that.
If v->cntr were modified by another thread without synchronization, the conflicting accesses would be unsequenced; that is a data race and results in undefined behavior https://port70.net/~nsz/c/c11/n1570.html#5.1.2.4p25 .
Is the compiler allowed not to re-read v->cntr on each iteration if v is not defined as volatile?
Yes.
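If the only goal is to force a fresh read of v->cntr on each iteration (with no atomicity or synchronization implied), volatile-qualifying the member is the usual approach; a sketch:

struct a {
    volatile unsigned cntr;   /* volatile: each access must actually happen */
};

void boo(struct a *v) {
    v->cntr++;
    while (v->cntr > 1)       /* cntr is re-read on every iteration */
        ;
}

Note that this still does not make concurrent modification by another thread well-defined; that requires atomics or a mutex.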
From the Procedure Call Standard for the ARM architecture (§7.1.5):
a compiler may ignore a volatile qualification of an automatic
variable whose address is never taken unless the function calls
setjmp().
Does that mean that in the following code:
volatile int x = 8;
if (x == 1)
{
    printf("can be optimised away??");
}
the whole if block can be optimised away?
This just contradicts the standard: for starters, volatile accesses are part of the observable behaviour and must be performed as in the abstract machine:
§5.1.2.3:
The least requirements on a conforming implementation are:
Accesses to volatile objects are evaluated strictly according to the
rules of the abstract machine.
And also §6.7.3:
An object that has volatile-qualified type may be modified in ways
unknown to the implementation or have other unknown side effects.
Therefore any expression referring to such an object shall be
evaluated strictly according to the rules of the abstract machine
Is there a contradiction? And if so, how is it legitimate that a PCS contradicts the C standard?
So I reached out to ARM's toolchain support group, and according to them the ARM PCS is an independent standard that is not bound to the C standard, so a compiler can choose to comply with one or both of them. In their own words:
In a way it's not really a contradiction
the APCS permits a compiler to respect or ignore local volatile
the C standard requires a compiler to respect local volatile
so a compiler that is compatible with both will respect local volatile.
Armclang has elected to follow the C standard which makes it compatible with both
So if a compiler chooses to perform this non-C-conforming optimization, it is still an ARM-PCS-conforming implementation, but not a C-conforming compiler.
To conclude, a C-conforming compiler for ARM architecture which implements ARM PCS will never perform this optimization.
From ISO/IEC 9899:201x section 5.1.2.3 Program execution paragraph 4:
In the abstract machine, all expressions are evaluated as specified by
the semantics. An actual implementation need not evaluate part of an
expression if it can deduce that its value is not used and that no
needed side effects are produced (including any caused by calling a
function or accessing a volatile object).
What exactly is the allowed optimization here regarding the volatile object? Can someone give an example of a volatile access that CAN be optimized away?
Since volatile accesses are observable behaviour (described in paragraph 6), it seems that no optimization can take place regarding volatiles, so I'm curious to know what optimization is allowed by paragraph 4.
Reformatting a little:
An actual implementation need not evaluate part of an expression if:
a) it can deduce that its value is not used; and
b) it can deduce that no needed side effects are produced (including any
caused by calling a function or accessing a volatile object).
Reversing the logic without changing the meaning:
An actual implementation must evaluate part of an expression if:
a) it can't deduce that its value is not used; or
b) it can't deduce that no needed side effects are produced (including
any caused by calling a function or accessing a volatile object).
Simplifying to focus on the volatile part:
An actual implementation must evaluate part of an expression if needed
side effects are produced (including accessing a volatile object).
Accesses to volatile objects must be evaluated. The phrase “including any…” modifies “side effects.” It does not modify “if it can deduce…” It has the same meaning as:
An actual implementation need not evaluate part of an expression if it can deduce that its value is not used and that no needed side effects (including any caused by calling a function or accessing a volatile object) are produced.
This means “side effects” includes side effects that are caused by accessing a volatile object. In order to decide it cannot evaluate part of an expression, an implementation must deduce that no needed side effects, including any caused by calling a function or accessing a volatile object, are produced.
It does not mean that an implementation can discard evaluation of part of an expression even if that expression includes accesses to a volatile object.
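To illustrate the distinction with a sketch (reading the paragraph as above): a non-volatile read whose result cannot matter may be elided, but the volatile read may not:

int a;            /* ordinary object */
volatile int v;   /* volatile object */

void demo(void)
{
    int b = a * 0;   /* the result is 0 regardless of a, and reading a has
                        no needed side effects, so the read may be elided */
    int c = v * 0;   /* the result is also 0, but reading v is a volatile
                        access, i.e. a needed side effect, so the read must
                        still be performed */
    (void)b; (void)c;
}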
can someone give an example of a volatile access that CAN be optimized away?
I think that you misinterpreted the text; IMO this paragraph means that:
volatile unsigned int bla = whatever();
if (bla < 0) // bla is unsigned, so this is always false: the code is not evaluated even if a volatile is involved
Adding another example that fits into this in my understanding:
volatile int vol_a;
....
int b = vol_a * 0; // vol_a is not evaluated
In cases where an access to a volatile object would affect system behavior in a way that would be necessary to make a program achieve its purpose, such an access must not be omitted. If the access would have no effect whatsoever on system behavior, then the operation could be "performed" on the abstract machine without having to execute any instructions. It would be rare, however, for a compiler writer to know with certainty that the effect of executing instructions to perform the accesses would be the same as the effect of pretending to do those instructions on the abstract machine while skipping them on the real one.
In the much more common scenario where a compiler writer would have no particular knowledge of any effect that a volatile access might have, but also have no particular reason to believe that such accesses couldn't have effects the compiler writer doesn't know about (e.g. because of hardware which is triggered by operations involving certain addresses), a compiler writer would have to allow for the possibility that such accesses might have "interesting" effects by performing them in the specified sequence, without regard for whether the compiler writer knows of any particular reason that the sequence of operations should matter.
The C99 standard, 5.1.2.3 §2, says
Accessing a volatile object, modifying an object, modifying a file, or calling a function
that does any of those operations are all side effects, 12) which are changes in the state of
the execution environment. Evaluation of an expression in general includes both value
computations and initiation of side effects. Value computation for an lvalue expression
includes determining the identity of the designated object.
I guess that in a lot of cases the compiler can't inline and possibly eliminate the functions doing I/O since they live in a different translation unit. And the parameters to functions doing I/O are often pointers, further hindering the optimizer.
But link-time-optimization gives the compiler "more to chew on".
And even though the paragraph I quoted says that "modifying an object" (that's standard-speak for memory) is a side-effect, stores to memory are not automatically treated as side effects when the optimizer kicks in. Here's an example from John Regehr's Nine Ways to Break your Systems Software using Volatile where the message store is reordered relative to the volatile ready variable.
volatile int ready;
int message[100];

void foo (int i) {
    message[i/10] = 42;
    ready = 1;
}
How does a C compiler determine whether a statement operates on a file? In a free-standing embedded environment I declare registers as volatile, thus preventing the compiler from optimizing the accesses away or from swapping the order of I/O operations.
Is that the only way to tell the compiler that we're doing I/O? Or does the C standard dictate that these N calls in the standard library do I/O and thus must receive special treatment? But then, what if someone created their own system call wrapper for, say, read?
As C has no statement dedicated to IO, only function calls can modify files. So if the compiler sees no function call in a sequence of statements, it knows that this sequence has not modified any file.
If only functions from the standard library are called, and if the environment is hosted, the compiler could know what they do and use that to guess what will happen.
But what is really important is that the compiler only needs to respect side effects. When it does not know, it is perfectly allowed to assume that a function call could involve side effects and act accordingly. That is not a violation of the standard even if no side effects are actually involved; it just means a possible optimization is missed.
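A sketch of that last point (external_io is a hypothetical function defined in some other translation unit):

void external_io(const char *s);   /* body not visible here */

void report(void)
{
    /* The compiler cannot see what external_io does, so it must assume
       each call may modify a file or access a volatile object; it
       therefore keeps both calls and their relative order. */
    external_io("first");
    external_io("second");
}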
In standards C99 and C11 an expression like the following is U.B. (Undefined Behaviour):
int x = 2;
int ans = x++ + x++;
In Bash the increment/decrement operators are defined, and the official documentation on gnu.org says that the conventions of standard C are followed.
In addition, Bash is mostly POSIX-conforming, and in the POSIX standard (http://pubs.opengroup.org/onlinepubs/9699919799/) it is said that, for arithmetic operations, the C standard is assumed unless stated otherwise.
Since I cannot find more information, my conclusion is that in Bash we also have Undefined Behaviour with increment operators:
x=2
echo $(( x++ + x++ ))
I need to be sure if my conclusion is right or, on the contrary, if there exists some convention in Bash that supersedes the C standard.
Additional note: trying it on my system (Ubuntu 14.04, Bash 4.3.11), it seems that left-to-right evaluation is performed, with the increment applied immediately where the ++ operator appears.
You seem to be seeking to find a defined standard for bash, as there is for C. Unfortunately, there isn't one.
There are really only two guides:
Volume 3 (Shell and Utilities) of the Open Group Base Specifications, commonly known as "Posix", which is deliberately underspecified. It does state that arithmetic evaluation "be equivalent to that described in Section 6.5, Expressions, of the ISO C standard," but that standard does not specify any value for an expression which contains both a mutator and another use of the same variable (such as x + x++ or x++ + x++).
The actual behaviour of and manual for a particular version of bash, neither of which qualify as a formal specification, and neither of which are in any way certified or blessed by any standards organization.
Consequently, I would have to say that no document defines the result in bash of arithmetic evaluation of an expression like x++ + x++. Even if the bash manual for the current bash version specified it (which it doesn't), and even if it were possible to deduce the behaviour from examination of the source code of the current bash version (which it is, but not necessarily easily), that would not be in any sense of the word a formal specification.
That makes the result "undefined" in the intuitive sense of the word: no result is defined.
There is certainly no law requiring that every programming language be fully specified, and many are not. Indeed, while C and C++ both enjoy exhaustive definitions in the form of ISO standards, there are many usages which the standards deliberately do not define, partly because doing so would require forms of error-detection which would impede performance. These decisions have been and will continue to be controversial, and I have no intention of taking sides.
I will simply observe that in the context of a formal specification, the following are not the same:
The requirement that the result of evaluating a particular program construct be detected as an error.
The explicit statement that the value of a particular construct is implementation-defined.
The failure to specify the value of a particular construct, or the explicit statement that it is unspecified.
The explicit statement that the result of evaluating a particular construct is undefined.
The first two of these are definitely formal specifications, in that they imply that the result of an evaluation is well-defined (in the second case, the definition should/must appear in the implementation manual). Such constructs are definitely usable, although of course the implementation-specific constructs will render the program non-portable.
The third does not accurately define the value of the construct it describes, although it will often provide a list of possibilities. The fourth, which is a C/C++ specialty, specifies that the construct is not valid and that it is the programmer's responsibility to avoid it because the standard imposes no requirements whatsoever on an implementation. Such constructs should never be used.
Examples of the four cases taken from specific specifications:
Error detection. (Java) "if the value of the divisor in an integer division is 0, then an ArithmeticException is thrown."
Implementation-specific. (Posix shell) "Open files are represented by decimal numbers starting with zero. The largest possible value is implementation-defined; however, all implementations shall support at least 0 to 9, inclusive, for use by the application."
Unspecified behaviour. (C/C++) "An example of unspecified behavior is the order in which the arguments to a function are evaluated." (from the definitions section of the C99 standard).
Undefined behaviour (C) "In both operations [/ and %], if the value of the second operand is zero, the behavior is undefined." (Contrast with Java, above.)
The last two categories seem to trigger a kind of outrage amongst certain programmers; disbelief that it is possible that a specification leave behaviour unspecified, and even that the absence of a specification is some kind of conspiracy to hide the truth (which must, therefore, be set free). And this in turn leads to random experimentation with particular language implementations, which must be futile precisely because the standard does not bind all implementations to do the same thing, or even for a particular implementation to consistently do the same thing.
It's also important to avoid thinking of "undefined behaviour" as a specific behaviour. If a computation with UB had a specific behaviour, it wouldn't be undefined. Even if the computation had a range of possible specific behaviours, it would be merely unspecified.
"Undefined behaviour" is not a specification or an attribute. You cannot detect "undefined behaviour" because the lack of definition means that any behaviour is possible, including behaviour which is the defined result of some other construct.
In particular, "undefined behaviour" is not the same as a detected error because the implementation is under no obligation to detect it.
Looking at the bash code (bash 4.3), source expr.c, I see the following:
/* post-increment or post-decrement */
if (stok == POSTINC || stok == POSTDEC)
  {
    /* restore certain portions of EC */
    tokstr = ec.tokstr;
    noeval = ec.noeval;
    curlval = ec.lval;
    lasttok = STR;    /* ec.curtok */

    v2 = val + ((stok == POSTINC) ? 1 : -1);
    vincdec = itos (v2);
    if (noeval == 0)
      {
#if defined (ARRAY_VARS)
        if (curlval.ind != -1)
          expr_bind_array_element (curlval.tokstr, curlval.ind, vincdec);
        else
#endif
          expr_bind_variable (tokstr, vincdec);
      }
    free (vincdec);
    curtok = NUM;    /* make sure x++=7 is flagged as an error */
  }
As you can see, the post increment is not implemented with the C post-increment:
v2 = val + ((stok == POSTINC) ? 1 : -1);
IMHO, given that code, and the fact that a line is processed token by token from left to right, I would say the behaviour of this bash version is well defined.