What do the different classifications of undefined behavior mean? - c

I was reading through the C11 standard. As per the C11 standard undefined behavior is classified into four different types. The parenthesized numbers refer to the subclause of the C Standard (C11) that identifies the undefined behavior.
Example 1: The program attempts to modify a string literal (6.4.5). This undefined behavior is classified as: Undefined Behavior (information/confirmation needed)
Example 2: An lvalue does not designate an object when evaluated (6.3.2.1). This undefined behavior is classified as: Critical Undefined Behavior
Example 3: An object has its stored value accessed other than by an lvalue of an allowable type (6.5). This undefined behavior is classified as: Bounded Undefined Behavior
Example 4: The string pointed to by the mode argument in a call to the fopen function does not exactly match one of the specified character sequences (7.21.5.3). This undefined behavior is classified as: Possible Conforming Language Extension
What is the meaning of the classifications? What do these classifications convey to the programmer?

I only have access to a draft of the standard, but from what I’m reading, it seems like this classification of undefined behavior isn’t mandated by the standard and only matters from the perspective of compilers and environments that specifically indicate that they want to create C programs that can be more easily analyzed for different classes of errors. (These environments have to define a special symbol __STDC_ANALYZABLE__.)
It seems like the key idea here is an “out of bounds write,” which is defined as a write operation that modifies data that isn’t otherwise allocated as part of an object. For example, if you clobber the bytes of an existing variable accidentally, that’s not an out of bounds write, but if you jumped to a random region of memory and decorated it with your favorite bit pattern you’d be performing an out of bounds write.
A specific behavior is bounded undefined behavior if the result is undefined, but won’t ever do an out of bounds write. In other words, the behavior is undefined, but you won’t jump to a random address not associated with any objects or allocated space and put bytes there. A behavior is critical undefined behavior if you get undefined behavior that cannot promise that it won’t do an out-of-bounds write.
The standard then goes on to talk about what can lead to critical undefined behavior. By default, undefined behaviors are bounded undefined behaviors, but there are exceptions for UB that results from memory errors like accessing deallocated memory or using an uninitialized pointer, which have critical undefined behavior. Remember, though, that these classifications only exist and have meaning in the context of implementations of C that choose to specifically separate out these sorts of behaviors. Unless your C environment guarantees it's analyzable, all undefined behaviors can potentially do absolutely anything!
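To make the distinction concrete, here is an illustrative sketch of my own (not taken from the standard): signed overflow is a bounded undefined behavior, while writing through a freed pointer falls on Annex L's critical list.

#include <stdlib.h>

int bounded_example(int a, int b)
{
    return a * b;      /* may overflow: bounded UB - at worst a meaningless
                          value or a trap, never an out-of-bounds store */
}

void critical_example(void)
{
    int *p = malloc(sizeof *p);
    free(p);
    *p = 42;           /* use of a pointer to deallocated space: critical UB -
                          this store could land anywhere */
}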
My guess is that this is intended for environments like building drivers or kernel plugins where you’d like to be able to analyze a piece of code and say “well, if you're going to shoot someone in the foot, it had better be your foot that you’re shooting and not mine!” If you compile a C program with these constraints, the runtime environment can instrument the very few operations that are allowed to be critical undefined behavior and have those operations trap to the OS, and assume that all other undefined behaviors will at most destroy memory that’s specifically associated with the program itself.

All of these are cases where the behaviour is undefined, i.e. the standard "imposes no requirements". Traditionally, within undefined behaviour and considering one implementation (i.e. C compiler + C standard library), one could see two kinds of undefined behaviour:
constructs for which the behaviour would not be documented, or would be documented to cause a crash, or the behaviour would be erratic,
constructs that the standard left undefined but for which the implementation defines some useful behaviour.
Sometimes these can be controlled by compiler switches. E.g. example 1 usually causes bad behaviour - a trap, a crash, or modification of a shared value. Earlier versions of GCC allowed one to have modifiable string literals with -fwritable-strings; therefore, if that switch was given, the implementation defined the behaviour in that case.
C11 added an optional orthogonal classification: bounded undefined behaviour and critical undefined behaviour. Bounded undefined behaviour is that which does not perform an out-of-bounds store, i.e. it cannot cause values being written in arbitrary locations in memory. Any undefined behaviour that is not bounded undefined behaviour is critical undefined behaviour.
If __STDC_ANALYZABLE__ is defined, the implementation will conform to Annex L, which has this definitive list of critical undefined behaviour:
An object is referred to outside of its lifetime (6.2.4).
A store is performed to an object that has two incompatible declarations (6.2.7).
A pointer is used to call a function whose type is not compatible with the referenced type (6.2.7, 6.3.2.3, 6.5.2.2).
An lvalue does not designate an object when evaluated (6.3.2.1).
The program attempts to modify a string literal (6.4.5).
The operand of the unary * operator has an invalid value (6.5.3.2).
Addition or subtraction of a pointer into, or just beyond, an array object and an integer type produces a result that points just beyond the array object and is used as the operand of a unary * operator that is evaluated (6.5.6).
An attempt is made to modify an object defined with a const-qualified type through use of an lvalue with non-const-qualified type (6.7.3).
An argument to a function or macro defined in the standard library has an invalid value or a type not expected by a function with variable number of arguments (7.1.4).
The longjmp function is called with a jmp_buf argument where the most recent invocation of the setjmp macro in the same invocation of the program with the corresponding jmp_buf argument is nonexistent, or the invocation was from another thread of execution, or the function containing the invocation has terminated execution in the interim, or the invocation was within the scope of an identifier with variably modified type and execution has left that scope in the interim (7.13.2.1).
The value of a pointer that refers to space deallocated by a call to the free or realloc function is used (7.22.3).
A string or wide string utility function accesses an array beyond the end of an object (7.24.1, 7.29.4).
For the bounded undefined behaviour, the standard imposes no requirements other than that an out-of-bounds write is not allowed to happen.
Example 1, modification of a string literal, is also classified as critical undefined behaviour. Example 4 is critical undefined behaviour too - the value is not one expected by the standard library.
For example 4, the standard hints that while the behaviour is undefined in the case of a mode string that is not defined by the standard, there are implementations that might define behaviour for other flags. For example, glibc supports many more mode flags, such as c, e, m and x, and allows setting the character encoding of the input with the ,ccs=charset modifier (putting the stream into wide mode right away).
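As a sketch of what using such extensions looks like (glibc-specific flags, so undefined as far as the standard is concerned; the file names are made up):

#include <stdio.h>

void open_with_glibc_extensions(void)
{
    FILE *f = fopen("data.txt", "re");          /* glibc: "e" adds O_CLOEXEC  */
    FILE *g = fopen("out.txt", "w,ccs=UTF-8");  /* glibc: wide stream, UTF-8  */
    if (f) fclose(f);
    if (g) fclose(g);
}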

Some programs are intended solely for use with input that is known to be valid, or at least comes from trustworthy sources. Others are not. Certain kinds of optimizations which might be useful when processing only trusted data are stupid and dangerous when used with untrusted data. The authors of Annex L unfortunately wrote it excessively vaguely, but the clear intention is to allow compilers to guarantee that they won't perform certain kinds of "optimizations" that are stupid and dangerous when using data from untrustworthy sources.
Consider the function (assume "int" is 32 bits):
int32_t triplet_may_be_interesting(int32_t a, int32_t b, int32_t c)
{
    return a*b > c;
}
invoked from the context:
#define SCALE_FACTOR 123456

int my_array[20000];

int32_t foo(uint16_t x, uint16_t y)
{
    if (x < 20000)
        my_array[x]++;
    if (triplet_may_be_interesting(x, SCALE_FACTOR, y))
        return examine_triplet(x, SCALE_FACTOR, y);
    else
        return 0;
}
When C89 was written, the most common way a 32-bit compiler would process that code would have been to do a 32-bit multiply and then do a signed comparison with y. A few optimizations are possible, however, especially if a compiler inlines the function invocation:
On platforms where unsigned compares are faster than signed compares, a compiler could infer that since none of a, b, or c can be negative, the arithmetical value of a*b is non-negative, and it may thus use an unsigned compare instead of a signed comparison. This optimization would be allowable even if __STDC_ANALYZABLE__ is non-zero.
A compiler could likewise infer that if x is non-zero, the arithmetical value of x*123456 will be greater than every possible value of y, and if x is zero, then x*123456 won't be greater than any. It could thus replace the second if condition with simply if (x). This optimization is also allowable even if __STDC_ANALYZABLE__ is non-zero.
A compiler whose authors either intend it for use only with trusted data, or else wrongly believe that cleverness and stupidity are antonyms, could infer that since any value of x larger than 17394 would result in an integer overflow, x may be safely presumed to be 17394 or less. It could thus perform my_array[x]++; unconditionally. A compiler may not define __STDC_ANALYZABLE__ with a non-zero value if it would perform this optimization. It is this latter kind of optimization which Annex L is designed to address. If an implementation can guarantee that the effect of overflow will be limited to yielding a possibly-meaningless value, it may be cheaper and easier for code to deal with the possibility of the value being meaningless than to prevent the overflow. If overflow could instead cause objects to behave as though their values were corrupted by future computations, however, there would be no way a program could deal with things like overflow after the fact, even in cases where the result of the computation would end up being irrelevant.
In this example, if the effect of integer overflow would be limited to yielding a possibly-meaningless value, and if calling examine_triplet() unnecessarily would waste time but would otherwise be harmless, a compiler may be able to usefully optimize triplet_may_be_interesting in ways that would not be possible if it were written to avoid integer overflow at all costs. Aggressive "optimization" will thus result in less efficient code than would be possible with a compiler that instead used its freedom to offer some loose behavioral guarantees.
Annex L would be much more useful if it allowed implementations to offer specific behavioral guarantees (e.g. overflow will yield a possibly-meaningless result, but have no other side-effects). No single set of guarantees would be optimal for all programs, but the amount of text Annex L spent on its impractical proposed trapping mechanism could have been better spent specifying macros to indicate what guarantees various implementations could offer.

According to cppreference:
Critical undefined behavior
Critical UB is undefined behavior that might perform a memory write or a volatile memory read out of bounds of any object. A program that has critical undefined behavior may be susceptible to security exploits.
Only the following undefined behaviors are critical:
access to an object outside of its lifetime (e.g. through a dangling pointer)
write to an object whose declarations are not compatible
function call through a function pointer whose type is not compatible with the type of the function it points to
lvalue expression is evaluated, but does not designate an object
attempted modification of a string literal
dereferencing an invalid (null, indeterminate, etc) or past-the-end pointer
modification of a const object through a non-const pointer
call to a standard library function or macro with an invalid argument
call to a variadic standard library function with unexpected argument type (e.g. call to printf with an argument of the type that doesn't match its conversion specifier)
longjmp where there is no setjmp up the calling scope, across threads, or from within the scope of a VM type.
any use of the pointer that was deallocated by free or realloc
any string or wide string library function accesses an array out of bounds
Bounded undefined behavior
Bounded UB is undefined behavior that cannot perform an illegal memory write, although it may trap and may produce or store indeterminate values.
All undefined behavior not listed as critical is bounded, including
multithreaded data races
use of indeterminate values with automatic storage duration
strict aliasing violations
misaligned object access
signed integer overflow
unsequenced side-effects modify the same scalar or modify and read the same scalar
floating-to-integer or pointer-to-integer conversion overflow
bitwise shift by a negative or too large bit count
integer division by zero
use of a void expression
direct assignment or memcpy of inexactly-overlapped objects
restrict violations
etc. - ALL undefined behavior that's not in the critical list.
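To make the contrast concrete, here is a small sketch of my own (not part of the cppreference list): using an uninitialized automatic variable is bounded UB.

int bounded_read(void)
{
    int x;          /* automatic storage, never initialized */
    return x + 1;   /* indeterminate value: bounded UB - a garbage result or
                       a trap is possible, but not an out-of-bounds write */
}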

"I was reading through the C11 standard. As per the C11 standard undefined behavior is classified into four different types."
I wonder what you were actually reading. The 2011 ISO C standard does not mention these four different classifications of undefined behavior. In fact it's quite explicit in not making any distinctions among different kinds of undefined behavior.
Here's ISO C11 section 4 paragraph 2:
If a "shall" or "shall not" requirement that appears outside of a
constraint or runtime-constraint is violated, the behavior is
undefined. Undefined behavior is otherwise indicated in this
International Standard by the words "undefined behavior" or by the
omission of any explicit definition of behavior. There is no
difference in emphasis among these three; they all describe "behavior
that is undefined".
All the examples you cite are undefined behavior, which, as far as the Standard is concerned, means nothing more or less than:
behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements
If you have some other reference, that discusses different kinds of undefined behavior, please update your question to cite it. Your question would then be about what that document means by its classification system, not (just) about the ISO C standard.
Some of the wording in your question appears similar to some of the information in C11 Annex L, "Analyzability" (which is optional for conforming C11 implementations), but your first example refers to "Undefined Behavior (information/confirmation needed)", and the word "confirmation" appears nowhere in the ISO C standard.

Related

Can volatile variables be read multiple times between sequence points?

I'm making my own C compiler to try to learn as many details about C as possible. I'm now trying to understand exactly how volatile objects work.
What is confusing is that every read access in the code must strictly be executed (C11, 6.7.3p7):
An object that has volatile-qualified type may be modified in ways unknown to the implementation or have other unknown side effects. Therefore any expression referring to such an object shall be evaluated strictly according to the rules of the abstract machine, as described in 5.1.2.3. Furthermore, at every sequence point the value last stored in the object shall agree with that prescribed by the abstract machine, except as modified by the unknown factors mentioned previously.134) What constitutes an access to an object that has volatile-qualified type is implementation-defined.
Example: in a = volatile_var - volatile_var;, the volatile variable must be read twice and thus the compiler can't optimise it to a = 0;
At the same time, the order of evaluation between sequence points is not determined (C11, 6.5p3):
The grouping of operators and operands is indicated by the syntax. Except as specified later, side effects and value computations of subexpressions are unsequenced.
Example: in b = (c + d) - (e + f), the order in which the additions are evaluated is unspecified, as they are unsequenced.
But when unsequenced evaluations create side effects on the same object (with volatile, for instance), the behaviour is undefined (C11, 6.5p2):
If a side effect on a scalar object is unsequenced relative to either a different side effect on the same scalar object or a value computation using the value of the same scalar object, the behavior is undefined. If there are multiple allowable orderings of the subexpressions of an expression, the behavior is undefined if such an unsequenced side effect occurs in any of the orderings.
Does this mean that expressions like x = volatile_var - (volatile_var + volatile_var) are undefined? Should my compiler throw a warning if this occurs?
I've tried to see what Clang and GCC do. Neither throws an error nor a warning. The output asm shows that the variables are NOT read in the execution order, but left to right instead, as shown in the RISC-V asm below:
const int volatile thingy = 0;

int main()
{
    int new_thing = thingy - (thingy + thingy);
    return new_thing;
}
main:
lui a4,%hi(thingy)
lw a0,%lo(thingy)(a4)
lw a5,%lo(thingy)(a4)
lw a4,%lo(thingy)(a4)
add a5,a5,a4
sub a0,a0,a5
ret
Edit: I am not asking "Why do compilers accept it", I am asking "Is it undefined behavior if we strictly follow the C11 standard". The standard seems to state that it is undefined behaviour, but I need more precision about it to interpret that correctly.
Reading the standard (ISO 9899:2018) literally, it is undefined behavior.
C17 5.1.2.3/2 - definition of side effects:
Accessing a volatile object, modifying an object, modifying a file, or calling a function that does any of those operations are all side effects
C17 6.5/2 - sequencing of operands:
If a side effect on a scalar object is unsequenced relative to either a different side effect on the same scalar object or a value computation using the value of the same scalar object, the behavior is undefined. If there are multiple allowable orderings of the subexpressions of an expression, the behavior is undefined if such an unsequenced side effect occurs in any of the orderings.
Thus, when reading the standard literally, volatile_var - volatile_var is definitely undefined behavior - UB twice over, actually, since both of the quoted sentences apply.
Please also note that this text changed quite a bit in C11. Previously C99 said, 6.5/2:
Between the previous and next sequence point an object shall have its stored value modified at most once by the evaluation of an expression. Furthermore, the prior value shall be read only to determine the value to be stored.
That is, the behaviour was previously unspecified in C99 (unspecified order of evaluation) but was made undefined by the changes in C11.
That being said, other than re-ordering the evaluation as it pleases, a compiler doesn't really have any reason to do wild and crazy things with this expression since there isn't much that can be optimized, given volatile.
As a quality of implementation, mainstream compilers seem to maintain the previous "merely unspecified" behavior from C99.
Per C11, this is undefined behavior.
Per 5.1.2.3 Program execution, paragraph 2 (bolding mine):
Accessing a volatile object, modifying an object, modifying a file, or calling a function that does any of those operations are all side effects ...
And 6.5 Expressions, paragraph 2 (again, bolding mine):
If a side effect on a scalar object is unsequenced relative to either a different side effect on the same scalar object or a value computation using the value of the same scalar object, the behavior is undefined.
Note that, as this is your compiler, you are free to define the behavior should you wish.
As other answers have pointed out, accessing a volatile-qualified variable is a side effect, and side effects are interesting, and having multiple side effects between sequence points is especially interesting, and having multiple side effects that affect the same object between sequence points is undefined.
As an example of how/why it's undefined, consider this (wrong) code for reading a two-byte big-endian value from an input stream ifs:
uint16_t val = (getc(ifs) << 8) | getc(ifs); /* WRONG */
This code imagines (in order to implement big-endianness, that is) that the two getc calls happen in left-to-right order, but of course that's not at all guaranteed, which is why this code is wrong.
Now, one of the things the volatile qualifier is for is input registers. So if you've got a volatile variable
volatile uint8_t inputreg;
and if every time you read it you get the next byte coming in on some device — that is, if merely accessing the variable inputreg is like calling getc() on a stream — then you might write this code:
uint16_t val = (inputreg << 8) | inputreg; /* ALSO WRONG */
and it's just about exactly as wrong as the getc() code above.
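One conventional fix (a sketch, and equally applicable to the getc() version above) is to split the reads into separate statements, so each read is sequenced before the next:

uint8_t hi = inputreg;                     /* first read */
uint8_t lo = inputreg;                     /* second read, sequenced after the first */
uint16_t val = ((uint16_t)hi << 8) | lo;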
The Standard has no terminology more specific than "Undefined Behavior" to describe actions which should be unambiguously defined on some implementations, or even the vast majority of them, but may behave unpredictably on others, based upon Implementation-Defined criteria. If anything, the authors of the Standard go out of their way to avoid saying anything about such behaviors.
The term is also used as a catch-all for situations where a potentially useful optimization might observably affect program behavior in some cases, to ensure that such optimizations will not affect program behavior in any defined situations.
The Standard specifies that the semantics of volatile-qualified accesses are "Implementation Defined", and there are platforms where certain kinds of optimizations involving volatile-qualified accesses might be observable if more than one such access occurs between sequence points. As a simple example, some platforms have read-modify-write operations whose semantics may be observably distinct from doing discrete read, modify, and write operations. If a programmer were to write:
void x(int volatile *dest, int volatile *src)
{
*dest = *src | 1;
}
and the two pointers were equal, the behavior of such a function might depend upon whether a compiler recognized that the pointers were equal and replaced discrete read and write operations with a combined read-modify-write.
To be sure, such distinctions would be unlikely to matter in most cases, and would be especially unlikely to matter in cases where an object is read twice. Nonetheless, the Standard makes no attempt to distinguish situations where such optimizations would actually affect program behavior, much less those where they would affect program behavior in any way that actually mattered, from those where it would be impossible to detect the effects of such optimization. The notion that the phrase "non-portable or erroneous" excludes constructs which would be non-portable but correct on the target platform would lead to an interesting irony that compiler optimizations such as read-modify-write merging would be completely useless on any "correct" programs.
No diagnostic is required for programs with Undefined Behaviour, except where specifically mentioned. So it's not wrong to accept this code.
In general, it's not possible to know whether the same volatile storage is being accessed multiple times between sequence points (consider a function taking two volatile int* parameters, without restrict, as the simplest example where analysis is impossible).
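A minimal sketch of that case (a hypothetical function, not taken from the question):

int diff(volatile int *p, volatile int *q)
{
    /* If p and q point to the same volatile object, the two reads are
       unsequenced side effects on the same scalar - UB under the literal
       reading discussed above - but whether p == q can ever happen is
       generally undecidable at compile time. */
    return *p - *q;
}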
That said, when you are able to detect a problematic situation, users might find it helpful, so I encourage you to work on getting a diagnostic out.
IMO it is legal but very bad.
int new_thing = thingy - (thingy + thingy);
Multiple uses of volatile variables in one expression are allowed and no warning is needed. But from the programmer's point of view, it is a very bad line of code.
Does this mean that expressions like x = volatile_var - (volatile_var + volatile_var) are undefined? Should my compiler throw a warning if this occurs?
No, as the C standard does not say anything about how those reads have to be ordered. It is left to the implementations. All implementations known to me do it the easiest way for them, as in this example: https://godbolt.org/z/99498141d


What is undefined behavior (UB) in C and C++? What about unspecified behavior and implementation-defined behavior? What is the difference between them?
Undefined behavior is one of those aspects of the C and C++ language that can be surprising to programmers coming from other languages (other languages try to hide it better). Basically, it is possible to write C++ programs that do not behave in a predictable way, even though many C++ compilers will not report any errors in the program!
Let's look at a classic example:
#include <iostream>

int main()
{
    char* p = "hello!\n"; // yes I know, deprecated conversion
    p[0] = 'y';
    p[5] = 'w';
    std::cout << p;
}
The variable p points to the string literal "hello!\n", and the two assignments below try to modify that string literal. What does this program do? According to section 2.14.5 paragraph 11 of the C++ standard, it invokes undefined behavior:
The effect of attempting to modify a string literal is undefined.
I can hear people screaming "But wait, I can compile this no problem and get the output yellow" or "What do you mean undefined, string literals are stored in read-only memory, so the first assignment attempt results in a core dump". This is exactly the problem with undefined behavior. Basically, the standard allows anything to happen once you invoke undefined behavior (even nasal demons). If there is a "correct" behavior according to your mental model of the language, that model is simply wrong; the C++ standard has the only vote, period.
Other examples of undefined behavior include accessing an array beyond its bounds, dereferencing the null pointer, accessing objects after their lifetime ended or writing allegedly clever expressions like i++ + ++i.
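Condensed into one deliberately broken sketch (my own illustration, not from the standard):

void examples_of_ub(void)
{
    int a[4];
    a[4] = 0;            /* write past the end of a: undefined behavior */

    int *p = 0;
    *p = 42;             /* dereferencing a null pointer: undefined behavior */

    int i = 0;
    int j = i++ + ++i;   /* i modified twice without sequencing: undefined behavior */
    (void)j;
}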
Section 1.9 of the C++ standard also mentions undefined behavior's two less dangerous brothers, unspecified behavior and implementation-defined behavior:
The semantic descriptions in this International Standard define a parameterized nondeterministic abstract machine.
Certain aspects and operations of the abstract machine are described in this International Standard as implementation-defined (for example, sizeof(int)). These constitute the parameters of the abstract machine. Each implementation shall include documentation describing its characteristics and behavior in these respects.
Certain other aspects and operations of the abstract machine are described in this International Standard as unspecified (for example, order of evaluation of arguments to a function). Where possible, this International Standard defines a set of allowable behaviors. These define the nondeterministic aspects of the abstract machine.
Certain other operations are described in this International Standard as undefined (for example, the effect of dereferencing the null pointer). [ Note: this International Standard imposes no requirements on the behavior of programs that contain undefined behavior. —end note ]
Specifically, section 1.3.24 states:
Permissible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message).
What can you do to avoid running into undefined behavior? Basically, you have to read good C++ books by authors who know what they're talking about. Avoid internet tutorials. Avoid bullschildt.
Well, this is basically a straight copy-paste from the standard
3.4.1
1 implementation-defined behavior
unspecified behavior where each implementation documents how the choice is made
2 EXAMPLE An example of implementation-defined behavior is the propagation of the high-order bit when a signed integer is shifted right.
3.4.3
1 undefined behavior
behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements
2 NOTE Possible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message).
3 EXAMPLE An example of undefined behavior is the behavior on integer overflow.
3.4.4
1 unspecified behavior
use of an unspecified value, or other behavior where this International Standard provides two or more possibilities and imposes no further requirements on which is chosen in any instance
2 EXAMPLE An example of unspecified behavior is the order in which the arguments to a function are evaluated.
Maybe simpler wording could be easier to understand than the rigorous definition of the standards.
implementation-defined behavior:
The language says that we have data types. The compiler vendors specify what sizes they shall use, and provide documentation of what they did.
undefined behavior:
You are doing something wrong. For example, you have a very large value in an int that doesn't fit in a char. How do you put that value in a char? Actually, there is no way! Anything could happen, but the most sensible thing would be to take the first byte of that int and put it in the char. It is just wrong to assign only the first byte, but that's what happens under the hood.
unspecified behavior:
Which of these two functions is executed first?
void fun(int n, int m);

int fun1() {
    std::cout << "fun1";
    return 1;
}

int fun2() {
    std::cout << "fun2";
    return 2;
}

//...
fun(fun1(), fun2()); // which one is executed first?
The language doesn't specify the order of evaluation, left to right or right to left! So an unspecified behavior may or may not result in undefined behavior, but your program certainly should not rely on unspecified behavior.
@eSKay I think your question is worth editing the answer to clarify more :)
for fun(fun1(), fun2()); isn't the behaviour "implementation defined"? The compiler has to choose one or the other course, after all?
The difference between implementation-defined and unspecified, is that the compiler is supposed to pick a behavior in the first case but it doesn't have to in the second case. For example, an implementation must have one and only one definition of sizeof(int). So, it can't say that sizeof(int) is 4 for some portion of the program and 8 for others. Unlike unspecified behavior, where the compiler can say: "OK I am gonna evaluate these arguments left-to-right and the next function's arguments are evaluated right-to-left." It can happen in the same program, that's why it is called unspecified. In fact, C++ could have been made easier if some of the unspecified behaviors were specified. Take a look here at Dr. Stroustrup's answer for that:
It is claimed that the difference between what can be produced giving the compiler this freedom and requiring "ordinary left-to-right evaluation" can be significant. I'm unconvinced, but with innumerable compilers "out there" taking advantage of the freedom and some people passionately defending that freedom, a change would be difficult and could take decades to penetrate to the distant corners of the C and C++ worlds. I am disappointed that not all compilers warn against code such as ++i+i++. Similarly, the order of evaluation of arguments is unspecified.
IMO far too many "things" are left undefined, unspecified, that's easy to say and even to give examples of, but hard to fix. It should also be noted that it is not all that difficult to avoid most of the problems and produce portable code.
From the official C Rationale Document
The terms unspecified behavior, undefined behavior, and implementation-defined behavior are used to categorize the result of writing programs whose properties the Standard does not, or cannot, completely describe. The goal of adopting this categorization is to allow a certain variety among implementations which permits quality of implementation to be an active force in the marketplace as well as to allow certain popular extensions, without removing the cachet of conformance to the Standard. Appendix F to the Standard catalogs those behaviors which fall into one of these three categories.
Unspecified behavior gives the implementor some latitude in translating programs. This latitude does not extend as far as failing to translate the program.
Undefined behavior gives the implementor license not to catch certain program errors that are difficult to diagnose. It also identifies areas of possible conforming language extension: the implementor may augment the language by providing a definition of the officially undefined behavior.
Implementation-defined behavior gives an implementor the freedom to choose the appropriate approach, but requires that this choice be explained to the user. Behaviors designated as implementation-defined are generally those in which a user could make meaningful coding decisions based on the implementation definition. Implementors should bear in mind this criterion when deciding how extensive an implementation definition ought to be. As with unspecified behavior, simply failing to translate the source containing the implementation-defined behavior is not an adequate response.
Undefined Behavior vs. Unspecified Behavior has a short description of it.
Their final summary:
To sum up, unspecified behavior is usually something you shouldn't worry about, unless your software is required to be portable. Conversely, undefined behavior is always undesirable and should never occur.
Implementation-defined -
Implementors' choice; should be well documented; the standard gives choices, but the program is sure to compile.
Unspecified -
Same as implementation-defined, but not documented.
Undefined -
Anything might happen; take care of it.
Historically, both Implementation-Defined Behavior and Undefined Behavior represented situations in which the authors of the Standard expected that people writing quality implementations would use judgment to decide what behavioral guarantees, if any, would be useful for programs in the intended application field running on the intended targets. The needs of high-end number-crunching code are quite different from those of low-level systems code, and both UB and IDB give compiler writers flexibility to meet those different needs. Neither category mandates that implementations behave in a way that's useful for any particular purpose, or even for any purpose whatsoever. Quality implementations that claim to be suitable for a particular purpose, however, should behave in a manner befitting such purpose whether the Standard requires it or not.
The only difference between Implementation-Defined Behavior and Undefined Behavior is that the former requires that implementations define and document a consistent behavior even in cases where nothing the implementation could possibly do would be useful. The dividing line between them is not whether it would generally be useful for implementations to define behaviors (compiler writers should define useful behaviors when practical whether the Standard requires them to or not) but whether there might be implementations where defining a behavior would be simultaneously costly and useless. A judgment that such implementations might exist does not in any way, shape, or form, imply any judgment about the usefulness of supporting a defined behavior on other platforms.
Unfortunately, since the mid 1990s compiler writers have started to interpret the lack of behavioral mandates as a judgment that behavioral guarantees aren't worth the cost even in application fields where they're vital, and even on systems where they cost practically nothing. Instead of treating UB as an invitation to exercise reasonable judgment, compiler writers have started treating it as an excuse not to do so.
For example, given the following code:
int scaled_velocity(int v, unsigned char pow)
{
    if (v > 250)
        v = 250;
    if (v < -250)
        v = -250;
    return v << pow;
}

a two's-complement implementation would not have to expend any effort whatsoever to treat the expression v << pow as a two's-complement shift without regard for whether v was positive or negative.
The preferred philosophy among some of today's compiler writers, however, would suggest that because v can only be negative if the program is going to engage in Undefined Behavior, there's no reason to have the program clip the negative range of v. Even though left-shifting of negative values used to be supported on every single compiler of significance, and a large amount of existing code relies upon that behavior, modern philosophy would interpret the fact that the Standard says that left-shifting negative values is UB as implying that compiler writers should feel free to ignore that.
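If one wanted the clipping to survive such a compiler, one portable rewrite (a sketch of my own, assuming pow is less than the width of int, as the original presumably intends) does the shift on an unsigned copy:

int scaled_velocity_portable(int v, unsigned char pow)
{
    if (v > 250)
        v = 250;
    if (v < -250)
        v = -250;
    /* Left-shifting the unsigned copy is defined; converting the result back
       to int is implementation-defined rather than undefined, and on common
       two's-complement implementations it matches the old behaviour. */
    return (int)((unsigned)v << pow);
}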
C++ standard n3337 § 1.3.10
implementation-defined behavior
behavior, for a well-formed program construct and correct data, that
depends on the implementation and that each implementation documents
Sometimes the C++ Standard doesn't impose particular behavior on some constructs but says instead that a particular, well-defined behavior has to be chosen and described by the particular implementation (version of the library). So the user can still know exactly how the program will behave even though the Standard doesn't describe it.
C++ standard n3337 § 1.3.24
undefined behavior
behavior for which this International Standard imposes no requirements
[ Note: Undefined behavior may be expected when this International Standard omits any explicit definition of behavior or when a program uses an erroneous construct or erroneous data. Permissible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message). Many erroneous program constructs do not engender undefined behavior; they are required to be diagnosed. — end note ]
When the program encounters a construct that is not defined according to the C++ Standard, it is allowed to do whatever it wants to do (maybe send an email to me, or maybe send an email to you, or maybe ignore the code completely).
C++ standard n3337 § 1.3.25
unspecified behavior
behavior, for a well-formed program construct and correct data, that depends on the implementation [ Note: The implementation is not required to document which behavior occurs. The range of possible behaviors is usually delineated by this International Standard. — end note ]
The C++ Standard doesn't impose particular behavior on some constructs but says instead that a particular, well-defined behavior has to be chosen (but not necessarily described) by the particular implementation (version of the library). So when no description has been provided, it can be difficult for the user to know exactly how the program will behave.
Undefined behavior is ugly -- as in, "The good, the bad, and the ugly".
Good: a program that compiles and works, for the right reasons.
Bad: a program that has an error, of a kind that the compiler can detect and complain about.
Ugly: a program that has an error, that the compiler cannot detect and warn about, meaning that the program compiles, and may seem to work correctly some of the time, but also fails bizarrely some of the time. That's what undefined behavior is.
Some program languages and other formal systems try hard to limit the "gulf of undefinedness" -- that is, they try to arrange things so that most or all programs are either "good" or "bad", and that very few are "ugly". It's a characteristic feature of C, however, that its "gulf of undefinedness" is quite wide.

Does i=i++ cause undefined behavior even if not executed? [duplicate]

The code that invokes undefined behavior (in this example, division by zero) will never get executed; is the program still undefined behavior?
int main(void)
{
    int i;
    if (0)
    {
        i = 1/0;
    }
    return 0;
}
I think it still is undefined behavior, but I can't find any evidence in the standard to support or deny me.
So, any ideas?
Let's look at how the C standard defines the terms "behavior" and "undefined behavior".
References are to the N1570 draft of the ISO C 2011 standard; I'm not aware of any relevant differences in any of the three published ISO C standards (1990, 1999, and 2011).
Section 3.4:
behavior
external appearance or action
Ok, that's a bit vague, but I'd argue that a given statement has no "appearance", and certainly no "action", unless it's actually executed.
Section 3.4.3:
undefined behavior
behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements
It says "upon use" of such a construct. The word "use" is not defined by the standard, so we fall back to the common English meaning. A construct is not "used" if it's never executed.
There's a note under that definition:
NOTE Possible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message).
So a compiler is permitted to reject your program at compile time if its behavior is undefined. But my interpretation of that is that it can do so only if it can prove that every execution of the program will encounter undefined behavior. Which implies, I think, that this:
if (rand() % 2 == 0) {
    i = i / 0;
}
which certainly can have undefined behavior, cannot be rejected at compile time.
As a practical matter, programs have to be able to perform runtime tests to guard against invoking undefined behavior, and the standard has to permit them to do so.
Your example was:
if (0) {
    i = 1/0;
}
which never executes the division by 0. A very common idiom is:
int x, y;
/* set values for x and y */
if (y != 0) {
    x = x / y;
}
The division certainly has undefined behavior if y == 0, but it's never executed if y == 0. The behavior is well defined, and for the same reason that your example is well defined: because the potential undefined behavior can never actually happen.
(Unless INT_MIN < -INT_MAX && x == INT_MIN && y == -1 (yes, integer division can overflow), but that's a separate issue.)
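If that corner case matters, the guard can be extended along these lines (a sketch; INT_MIN comes from <limits.h>):

if (y != 0 && !(x == INT_MIN && y == -1)) {
    x = x / y;   /* divisor is nonzero and the quotient is representable */
}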
In a comment (since deleted), somebody pointed out that the compiler may evaluate constant expressions at compile time. Which is true, but not relevant in this case, because in the context of
i = 1/0;
1/0 is not a constant expression.
A constant-expression is a syntactic category that reduces to conditional-expression (which excludes assignments and comma expressions). The production constant-expression appears in the grammar only in contexts that actually require a constant expression, such as case labels. So if you write:
switch (...) {
case 1/0:
...
}
then 1/0 is a constant expression -- and one that violates the constraint in 6.6p4: "Each constant expression shall evaluate to a constant that is in the range of representable values for its type.", so a diagnostic is required. But the right hand side of an assignment does not require a constant-expression, merely a conditional-expression, so the constraints on constant expressions don't apply. A compiler can evaluate any expression that it's able to at compile time, but only if the behavior is the same as if it were evaluated during execution (or, in the context of if (0), not evaluated during execution).
(Something that looks exactly like a constant-expression is not necessarily a constant-expression, just as, in x + y * z, the sequence x + y is not an additive-expression because of the context in which it appears.)
Which means the footnote in N1570 section 6.6 that I was going to cite:
Thus, in the following initialization,
static int i = 2 || 1 / 0;
the expression is a valid integer constant expression with value one.
isn't actually relevant to this question.
Finally, there are a few things that are defined to cause undefined behavior that aren't about what happens during execution. Annex J, section 2 of the C standard (again, see the N1570 draft) lists things that cause undefined behavior, gathered from the rest of the standard. Some examples (I don't claim this is an exhaustive list) are:
A nonempty source file does not end in a new-line character which is not immediately preceded by a backslash character or ends in a partial preprocessing token or comment
Token concatenation produces a character sequence matching the syntax of a universal character name
A character not in the basic source character set is encountered in a source file, except in an identifier, a character constant, a string literal, a header name, a comment, or a preprocessing token that is never converted to a token
An identifier, comment, string literal, character constant, or header name contains an invalid multibyte character or does not begin and end in the initial shift state
The same identifier has both internal and external linkage in the same translation unit
These particular cases are things that a compiler could detect. I think their behavior is undefined because the committee didn't want to, or couldn't, impose the same behavior on all implementations, and defining a range of permitted behaviors just wasn't worth the effort. They don't really fall into the category of "code that will never be executed", but I mention them here for completeness.
This article discusses this question in section 2.6:
int main(void) {
    guard();
    5 / 0;
}
The authors consider that the program is defined when guard() does not terminate. They also find themselves distinguishing notions of “statically undefined” and “dynamically undefined”, e.g.:
The intention behind the standard11 appears to be that, in general, situations are made statically undefined if it is not easy to generate code for them. Only when code can be generated, then the situation can be undefined dynamically.
11) Private correspondence with committee member.
I would recommend looking at the entire article. Taken together, it paints a consistent picture.
The fact that the authors of the article had to discuss the question with a committee member confirms that the standard is currently fuzzy on the answer to your question.
In this case the undefined behavior is the result of executing the code. So if the code is not executed, there is no undefined behavior.
Non-executed code could invoke undefined behavior if the undefined behavior were the result of solely the declaration of the code (e.g. if some case of variable shadowing were undefined).
I'd go with the last paragraph of this answer: https://stackoverflow.com/a/18384176/694576
... UB is a runtime issue, not a compiletime issue ...
So, no, there is no UB invoked.
Only if the standard makes breaking changes and your code suddenly is no longer "never executed". But I don't see any logical way in which this can cause 'undefined behaviour'. It's not causing anything.
On the subject of undefined behaviour it is often hard to separate the formal aspects from the practical ones. This is the definition of undefined behaviour in the 1989 standard (I don't have a more recent version at hand, but I don't expect this to have changed substantially):
1 undefined behavior
behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements
2 NOTE Possible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message).
From a formal point of view I'd say your program does invoke undefined behaviour, which means that the standard places no requirement whatsoever on what it will do when run, just because it contains division by zero.
On the other hand, from a practical point of view I'd be surprised to find a compiler that didn't behave as you intuitively expect.
The standard says, if I remember right, that it's allowed to do anything from the moment a rule is broken. Maybe there are some special events with a kind of global flavour (but I never heard or read about something like that)... So I would say: no, this can't be UB, because as long as the behavior is well defined, 0 is always false, so the rule can't be broken at runtime.
I think it still is undefined behavior, but I can't find any evidence in the standard to support or deny me.
I think the program does not invoke undefined behavior.
Defect Report #109 addresses a similar question and says:
Furthermore, if every possible execution of a given program would result in undefined behavior, the given program is not strictly conforming.
A conforming implementation must not fail to translate a strictly conforming program simply because some possible execution of that program would result in undefined behavior. Because foo might never be called, the example given must be successfully translated by a conforming implementation.
It depends on how the expression "undefined behavior" is defined, and whether "undefined behavior" of a statement is the same as "undefined behavior" for a program.
This program looks like C, so a deeper analysis of the C standard used by the compiler (as some answers did) is appropriate.
In the absence of a specified standard, the correct answer is "it depends". In some languages, compilers after the first error try to guess what the programmer might mean and still generate some code, according to the compiler's guess. In other, purer languages, once something is undefined, the undefinedness propagates to the whole program.
Other languages have a concept of "bounded errors". For some limited kinds of errors, these languages define how much damage an error can produce. In particular, languages with implicit garbage collection frequently make a distinction between errors that invalidate the typing system and those that do not.

Can an implementation specify undefined behavior

3.4.1
1 implementation-defined behavior
unspecified behavior where each implementation documents how the choice is made
Can an implementation specify that implementation-defined behavior is undefined behavior in cases where undefined behavior is a possible outcome?
For example:
6.3.1.3 Signed and unsigned integers
3 Otherwise, the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised.
So as long as it is documented, can this result be undefined by the implementation and cause undefined behavior, or must it have a defined result for that implementation?
No, semantically this is not possible. Undefined behavior is what the term states: behavior that is not defined by the standard. If the standard requests implementation-defined behavior, it explicitly requests the implementation to specify what it does when a certain error occurs.
undefined behavior
behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements
So an implementation could say "under such and such circumstances this implementation raises a blurb signal", but it can't say "under such and such circumstances we don't tell you what we are doing".
Don't mystify undefined behavior as something that will happen or even as something that can be defined.
You can find a more detailed explanation of what "implementation defined" is intended to mean in the C Rationale - not a standard document per se, but a good reference.
In chapter 1.6 Definition of terms:
Implementation-defined behavior gives an implementor the freedom to choose the appropriate approach, but requires that this choice be explained to the user. Behaviors designated as implementation-defined are generally those in which a user could make meaningful coding decisions based on the implementation definition. Implementors should bear in mind this criterion when deciding how extensive an implementation definition ought to be. As with unspecified behavior, simply failing to translate the source containing the implementation-defined behavior is not an adequate response.
You can't make a "reasonable coding decision" based on undefined behavior.
The C FAQ (again, not a standard, but a well-known reference) is quite clear too:
implementation-defined: The implementation must pick some behavior; it may not fail to compile the program. (The program using the construct is not incorrect.) The choice must be documented. The Standard may specify a set of allowable behaviors from which to choose, or it may impose no particular requirements
Neither unspecified nor implementation-defined behavior is an error - the compiler can't fail the translation. They are meant to give the implementation options to produce optimal code for the targeted environment.
The assignment of integers to smaller types is something of an oddball, since the standard clearly recognizes that some implementations may trap, but--uniquely--it requires that the trapping abide by the rules of signals; the decision to impose that requirement here but not elsewhere is somewhat curious, since in many cases it would impede what would otherwise be a straightforward optimization--replacing a signed integer variable whose type is shorter than int, and whose address is never taken, with an int.
Nonetheless, for whatever reasons, the authors of the standard have gone out of their way to forbid that optimization. [Note: If I were in charge of the standard, I would specify that an explicit cast to a shorter integer type would yield a value which when cast to an unsigned type of the same size would yield the same result as casting the value directly whenever such a value exists, but that a store of an oversized value directly to an lvalue without a cast would not be thus constrained; I didn't write the standard, though].
It's ironic, actually: given:
uint64_t signedpow(int32_t n, uint32_t p) {
    uint64_t result = 0;
    while (p--) { n *= n; result += n; }
    return result;
}

uint64_t unsignedpow(uint32_t n, uint32_t p) {
    uint64_t result = 0;
    while (p--) { n *= n; result += n; }
    return result;
}
On a platform where int is 32 bits, the latter would have defined semantics for all values of n and p, while the former would not, but on a platform where int is 64 bits, the reverse would be true. A compiler for a typical 64-bit platform which didn't want to waste code on some other defined behavior would be required by the standard to mask off and sign-extend the signed n after each multiplication, but with some unsigned values the compiler could do anything it wanted, including going back in time and pretending that no implementation would ever promise to always perform half-sized unsigned multiplies in a fashion consistent with modular arithmetic.
As per C11 standard, chapter 3.4.1,
implementation-defined behavior
unspecified behavior where each implementation documents how the choice is made
So, each implementation has to make a choice. Otherwise, it won't be conforming. Thus, we can say it must have a defined result for that implementation. That can be either of the below:
a defined behavior, specific to that implementation, which will be exercised if the instruction is encountered.
a defined signal which will be raised if the instruction is encountered. (Mostly, saying that it cannot be handled.)
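For instance (a sketch of my own; the printed value assumes the common documented choice of modular wraparound on a two's-complement implementation):

#include <stdio.h>

int main(void)
{
    int big = 300;
    signed char c = (signed char)big;   /* result is implementation-defined,
                                           or an implementation-defined signal
                                           is raised (C11 6.3.1.3p3) */
    printf("%d\n", c);                  /* typically prints 44, i.e. 300 - 256 */
    return 0;
}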
Related:
From C99 Rationale Document, Chapter 3, (emphasis mine)
Implementation-defined behavior gives an implementor the freedom to choose the appropriate approach, but requires that this choice be explained to the user. Behaviors designated as implementation-defined are generally those in which a user could make meaningful coding decisions based on the implementation’s definition. Implementors should bear in mind this criterion when deciding how extensive an implementation definition ought to be. As with unspecified behavior, simply failing to translate the source containing the implementation defined behavior is not an adequate response.
Now, nobody can take "meaningful coding decisions" based on undefined behavior.
From C FAQ, Question 11.33,
implementation-defined: The implementation must pick some behavior; it may not fail to compile the program. (The program using the construct is not incorrect.) The choice must be documented. The Standard may specify a set of allowable behaviors from which to choose, or it may impose no particular requirements.
The first sentence contains must, just as I mentioned earlier in my answer.

Can code that will never be executed invoke undefined behavior?

The code that invokes undefined behavior (in this example, division by zero) will never get executed; does the program still have undefined behavior?
int main(void)
{
    int i;
    if (0)
    {
        i = 1/0;
    }
    return 0;
}
I think it still is undefined behavior, but I can't find any evidence in the standard to support or deny me.
So, any ideas?
Let's look at how the C standard defines the terms "behavior" and "undefined behavior".
References are to the N1570 draft of the ISO C 2011 standard; I'm not aware of any relevant differences in any of the three published ISO C standards (1990, 1999, and 2011).
Section 3.4:
behavior
external appearance or action
Ok, that's a bit vague, but I'd argue that a given statement has no "appearance", and certainly no "action", unless it's actually executed.
Section 3.4.3:
undefined behavior
behavior, upon use of a nonportable or erroneous program construct or of erroneous data,
for which this International Standard imposes no requirements
It says "upon use" of such a construct. The word "use" is not defined by the standard, so we fall back to the common English meaning. A construct is not "used" if it's never executed.
There's a note under that definition:
NOTE Possible undefined behavior ranges from ignoring the situation
completely with unpredictable results, to behaving during translation
or program execution in a documented manner characteristic of the
environment (with or without the issuance of a diagnostic message), to
terminating a translation or execution (with the issuance of a
diagnostic message).
So a compiler is permitted to reject your program at compile time if its behavior is undefined. But my interpretation of that is that it can do so only if it can prove that every execution of the program will encounter undefined behavior. Which implies, I think, that this:
if (rand() % 2 == 0) {
    i = i / 0;
}
which certainly can have undefined behavior, cannot be rejected at compile time.
As a practical matter, programs have to be able to perform runtime tests to guard against invoking undefined behavior, and the standard has to permit them to do so.
Your example was:
if (0) {
    i = 1/0;
}
which never executes the division by 0. A very common idiom is:
int x, y;
/* set values for x and y */
if (y != 0) {
    x = x / y;
}
The division certainly has undefined behavior if y == 0, but it's never executed if y == 0. The behavior is well defined, and for the same reason that your example is well defined: because the potential undefined behavior can never actually happen.
(Unless INT_MIN < -INT_MAX && x == INT_MIN && y == -1 (yes, integer division can overflow), but that's a separate issue.)
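A fully defensive division therefore guards both cases; here is a minimal sketch (the helper name safe_div is my own):

#include <limits.h>

/* Guard against both undefined cases: division by zero, and INT_MIN / -1
   (whose mathematical result, -INT_MIN, is not representable on
   two's-complement machines). */
int safe_div(int x, int y, int *ok) {
    if (y == 0 || (x == INT_MIN && y == -1)) {
        *ok = 0;
        return 0;              /* the caller decides how to handle the failure */
    }
    *ok = 1;
    return x / y;
}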
In a comment (since deleted), somebody pointed out that the compiler may evaluate constant expressions at compile time. Which is true, but not relevant in this case, because in the context of
i = 1/0;
1/0 is not a constant expression.
A constant-expression is a syntactic category that reduces to conditional-expression (which excludes assignments and comma expressions). The production constant-expression appears in the grammar only in contexts that actually require a constant expression, such as case labels. So if you write:
switch (...) {
case 1/0:
    ...
}
then 1/0 is a constant expression -- and one that violates the constraint in 6.6p4: "Each constant expression shall evaluate to a constant that is in the range of representable values for its type.", so a diagnostic is required. But the right hand side of an assignment does not require a constant-expression, merely a conditional-expression, so the constraints on constant expressions don't apply. A compiler can evaluate any expression that it's able to at compile time, but only if the behavior is the same as if it were evaluated during execution (or, in the context of if (0), not evaluated during execution).
(Something that looks exactly like a constant-expression is not necessarily a constant-expression, just as, in x + y * z, the sequence x + y is not an additive-expression because of the context in which it appears.)
Which means the footnote in N1570 section 6.6 that I was going to cite:
Thus, in the following initialization,
static int i = 2 || 1 / 0;
the expression is a valid integer constant expression with value one.
isn't actually relevant to this question.
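To make the contrast concrete, here is a small sketch of my own (not from the answer): the same expression 1/0 is harmless in an assignment that is never executed, but would violate a constraint if it appeared where a constant expression is required.

int main(void) {
    int i = 0;
    if (0) {
        i = 1 / 0;             /* the right-hand side of an assignment only needs to be a
                                  conditional-expression, so no constant-expression
                                  constraint applies; the compiler may fold it only "as if"
                                  executed, and here it is never executed */
    }

    /* static int j = 1 / 0;      by contrast, uncommenting this requires a diagnostic:
                                  an initializer for a static object must be a constant
                                  expression (6.7.9p4), and 1/0 violates 6.6p4 */
    return i;
}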
Finally, there are a few things that are defined to cause undefined behavior that aren't about what happens during execution. Annex J, section 2 of the C standard (again, see the N1570 draft) lists things that cause undefined behavior, gathered from the rest of the standard. Some examples (I don't claim this is an exhaustive list) are:
A nonempty source file does not end in a new-line character which is not immediately preceded by a backslash character or ends in a partial preprocessing token or comment
Token concatenation produces a character sequence matching the syntax of a universal character name
A character not in the basic source character set is encountered in a source file, except in an identifier, a character constant, a string literal, a header name, a comment, or a preprocessing token that is never converted to a token
An identifier, comment, string literal, character constant, or header name contains an invalid multibyte character or does not begin and end in the initial shift state
The same identifier has both internal and external linkage in the same translation unit
These particular cases are things that a compiler could detect. I think their behavior is undefined because the committee didn't want to, or couldn't, impose the same behavior on all implementations, and defining a range of permitted behaviors just wasn't worth the effort. They don't really fall into the category of "code that will never be executed", but I mention them here for completeness.
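For instance, the last item can arise without anything ever being executed; here is a rough sketch of my own (the names counter and touch are made up):

static int counter = 0;         /* file scope: internal linkage */

void touch(void) {
    int counter = 1;            /* block scope, no linkage; hides the file-scope name */
    (void)counter;
    {
        extern int counter;     /* the visible prior declaration has no linkage, so this
                                   one gets external linkage; "counter" now has both
                                   internal and external linkage in the same translation
                                   unit -- undefined behavior (C11 6.2.2p7), whether or
                                   not touch() is ever called */
    }
}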
This article discusses this question in section 2.6:
int main(void){
    guard();
    5 / 0;
}
The authors consider that the program is defined when guard() does not terminate. They also find themselves distinguishing notions of “statically undefined” and “dynamically undefined”, e.g.:
The intention behind the standard [11] appears to be that, in general, situations are made statically undefined if it is not easy to generate code for them. Only when code can be generated, then the situation can be undefined dynamically.
11) Private correspondence with committee member.
I would recommend looking at the entire article. Taken together, it paints a consistent picture.
The fact that the authors of the article had to discuss the question with a committee member confirms that the standard is currently fuzzy on the answer to your question.
In this case the undefined behavior is the result of executing the code. So if the code is not executed, there is no undefined behavior.
Non-executed code could invoke undefined behavior if the undefined behavior was the result solely of the declaration of the code (e.g. if some case of variable shadowing were undefined).
I'd go with the last paragraph of this answer: https://stackoverflow.com/a/18384176/694576
... UB is a runtime issue, not a compiletime issue ...
So, no, there is no UB invoked.
Only if the standard makes breaking changes and your code suddenly is no longer "never gets executed". But I don't see any logical way in which this can cause 'undefined behaviour'. It's not causing anything.
On the subject of undefined behaviour it is often hard to separate the formal aspects from the practical ones. This is the definition of undefined behaviour in the 1989 standard (I don't have a more recent version at hand, but I don't expect this to have changed substantially):
1 undefined behavior
behavior, upon use of a nonportable or erroneous program construct or of
erroneous data, for which this International Standard imposes no requirements
2 NOTE Possible undefined behavior ranges from ignoring the situation completely
with unpredictable results, to behaving during translation or program execution
in a documented manner characteristic of the environment (with or without the
issuance of a diagnostic message), to terminating a translation or
execution (with the issuance of a diagnostic message).
From a formal point of view I'd say your program does invoke undefined behaviour, which means that the standard places no requirement whatsoever on what it will do when run, just because it contains division by zero.
On the other hand, from a practical point of view I'd be surprised to find a compiler that didn't behave as you intuitively expect.
The standard says, as I remember it, that the implementation is allowed to do anything from the moment a rule gets broken. Maybe there are some special events with a kind of global flavour (but I have never heard or read about anything like that)... So I would say: no, this can't be UB, because as long as the behavior is well defined, 0 is always false, so the rule can't get broken at runtime.
I think it still is undefined behavior, but I can't find any evidence in the standard to support or deny me.
I think the program does not invoke undefined behavior.
Defect Report #109 addresses a similar question and says:
Furthermore, if every possible execution of a given program would result in undefined behavior, the given program is not strictly conforming.
A conforming implementation must not fail to translate a strictly conforming program simply because some possible execution of that program would result in undefined behavior. Because foo might never be called, the example given must be successfully translated by a conforming implementation.
It depends on how the expression "undefined behavior" is defined, and whether "undefined behavior" of a statement is the same as "undefined behavior" for a program.
This program looks like C, so a deeper analysis of the C standard used by the compiler (as some answers have done) is appropriate.
In the absence of a specified standard, the correct answer is "it depends". In some languages, compilers try to guess, after the first error, what the programmer might have meant and still generate some code according to that guess. In other, more pure languages, once something is undefined, the undefinedness propagates to the whole program.
Other languages have a concept of "bounded errors". For some limited kinds of errors, these languages define how much damage an error can produce. In particular, languages with implicit garbage collection frequently distinguish whether an error invalidates the typing system or not.
