Does this C code produce "undefined behavior"?

I'm reading an interesting article, "A Guide to Undefined Behavior in C and C++, Part 1". Often I do the following in my code:
int i = 10;
i = (++i) % 7;
Does this produce undefined behavior? On x86? ARM? Perhaps it depends on the compiler?

It's undefined behavior because i is modified more than once without an intervening sequence point.
It depends on the compiler only in the sense that there are no requirements about what the code will do, so every compiler can do something different. To be clear: even though you may sometimes get results that seem to make sense, the code is a bug.
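If the intent is to advance i by one and wrap modulo 7, the same result can be written without modifying i twice in one expression. A minimal sketch; the helper name next_mod7 is made up for illustration and is not from the question:

```c
/* Hypothetical helper: computes (i + 1) % 7 without modifying i twice
   between sequence points, so the behavior is well defined
   (assuming i + 1 does not overflow). */
int next_mod7(int i)
{
    return (i + 1) % 7;
}
```

With int i = 10; followed by i = next_mod7(i); the value of i is 4 on every conforming compiler.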

Yes - as per standard ISO C.
Is this undefined C behaviour?
What are all the common undefined behaviours that a C++ programmer should know about?
Though in practice a given compiler usually produces a consistent result, nothing in the standard requires it to.

Related

Does i=i++ cause undefined behavior even if not executed? [duplicate]

The code that invokes undefined behavior (in this example, division by zero) will never get executed, is the program still undefined behavior?
int main(void)
{
    int i;
    if(0)
    {
        i = 1/0;
    }
    return 0;
}
I think it still is undefined behavior, but I can't find any evidence in the standard to support or deny me.
So, any ideas?
Let's look at how the C standard defines the terms "behavior" and "undefined behavior".
References are to the N1570 draft of the ISO C 2011 standard; I'm not aware of any relevant differences in any of the three published ISO C standards (1990, 1999, and 2011).
Section 3.4:
behavior
external appearance or action
Ok, that's a bit vague, but I'd argue that a given statement has no "appearance", and certainly no "action", unless it's actually executed.
Section 3.4.3:
undefined behavior
behavior, upon use of a nonportable or erroneous program construct or of erroneous data,
for which this International Standard imposes no requirements
It says "upon use" of such a construct. The word "use" is not defined by the standard, so we fall back to the common English meaning. A construct is not "used" if it's never executed.
There's a note under that definition:
NOTE Possible undefined behavior ranges from ignoring the situation
completely with unpredictable results, to behaving during translation
or program execution in a documented manner characteristic of the
environment (with or without the issuance of a diagnostic message), to
terminating a translation or execution (with the issuance of a
diagnostic message).
So a compiler is permitted to reject your program at compile time if its behavior is undefined. But my interpretation of that is that it can do so only if it can prove that every execution of the program will encounter undefined behavior. Which implies, I think, that this:
if (rand() % 2 == 0) {
i = i / 0;
}
which certainly can have undefined behavior, cannot be rejected at compile time.
As a practical matter, programs have to be able to perform runtime tests to guard against invoking undefined behavior, and the standard has to permit them to do so.
Your example was:
if (0) {
i = 1/0;
}
which never executes the division by 0. A very common idiom is:
int x, y;
/* set values for x and y */
if (y != 0) {
x = x / y;
}
The division certainly has undefined behavior if y == 0, but it's never executed if y == 0. The behavior is well defined, and for the same reason that your example is well defined: because the potential undefined behavior can never actually happen.
(Unless INT_MIN < -INT_MAX && x == INT_MIN && y == -1 (yes, integer division can overflow), but that's a separate issue.)
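The guard idiom, including the overflow case from the parenthetical remark, can be sketched as a small function. This is only an illustration; the name safe_div and its out-parameter convention are assumptions, not something from the standard or the question:

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: stores x / y in *out and returns true only when
   the division is well defined; otherwise leaves *out untouched. */
bool safe_div(int x, int y, int *out)
{
    if (y == 0) {
        return false;   /* division by zero is never executed */
    }
    if (INT_MIN < -INT_MAX && x == INT_MIN && y == -1) {
        return false;   /* quotient would not fit in int */
    }
    *out = x / y;
    return true;
}
```

The potentially undefined division is still present in the source, but every path that reaches it has already ruled out the problematic operands.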
In a comment (since deleted), somebody pointed out that the compiler may evaluate constant expressions at compile time. Which is true, but not relevant in this case, because in the context of
i = 1/0;
1/0 is not a constant expression.
A constant-expression is a syntactic category that reduces to conditional-expression (which excludes assignments and comma expressions). The production constant-expression appears in the grammar only in contexts that actually require a constant expression, such as case labels. So if you write:
switch (...) {
case 1/0:
...
}
then 1/0 is a constant expression -- and one that violates the constraint in 6.6p4: "Each constant expression shall evaluate to a constant that is in the range of representable values for its type.", so a diagnostic is required. But the right hand side of an assignment does not require a constant-expression, merely a conditional-expression, so the constraints on constant expressions don't apply. A compiler can evaluate any expression that it's able to at compile time, but only if the behavior is the same as if it were evaluated during execution (or, in the context of if (0), not evaluated during execution).
(Something that looks exactly like a constant-expression is not necessarily a constant-expression, just as, in x + y * z, the sequence x + y is not an additive-expression because of the context in which it appears.)
Which means the footnote in N1570 section 6.6 that I was going to cite:
Thus, in the following initialization,
static int i = 2 || 1 / 0;
the expression is a valid integer constant expression with value one.
isn't actually relevant to this question.
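To put the two contexts side by side, here is a small sketch. The names flag and never_divides are invented for illustration, and a compiler may still warn about the division by zero in either place:

```c
/* Context that requires a constant expression: a static initializer.
   The right operand of || is not evaluated because the left operand
   is nonzero, so this is valid and flag is 1 (this is the footnote's
   example). */
static int flag = 2 || 1 / 0;

/* Context that does not require a constant expression: an ordinary
   assignment in a branch that is never taken. The division is never
   evaluated. */
int never_divides(void)
{
    int i = 0;
    if (0) {
        i = 1 / 0;
    }
    return i + flag;
}
```

Writing case 1/0: in a switch, by contrast, puts the same tokens where a constant expression is required, and that is the case where a diagnostic is mandatory.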
Finally, there are a few things that are defined to cause undefined behavior that aren't about what happens during execution. Annex J, section 2 of the C standard (again, see the N1570 draft) lists things that cause undefined behavior, gathered from the rest of the standard. Some examples (I don't claim this is an exhaustive list) are:
A nonempty source file does not end in a new-line character which is not immediately preceded by a backslash character or ends in a partial preprocessing token or comment
Token concatenation produces a character sequence matching the syntax of a universal character name
A character not in the basic source character set is encountered in a source file, except in an identifier, a character constant, a string literal, a header name, a comment, or a preprocessing token that is never converted to a token
An identifier, comment, string literal, character constant, or header name contains an invalid multibyte character or does not begin and end in the initial shift state
The same identifier has both internal and external linkage in the same translation unit
These particular cases are things that a compiler could detect. I think their behavior is undefined because the committee didn't want to, or couldn't, impose the same behavior on all implementations, and defining a range of permitted behaviors just wasn't worth the effort. They don't really fall into the category of "code that will never be executed", but I mention them here for completeness.
This article discusses this question in section 2.6:
int main(void){
guard();
5 / 0;
}
The authors consider that the program is defined when guard() does not terminate. They also find themselves distinguishing notions of “statically undefined” and “dynamically undefined”, e.g.:
The intention behind the standard [11] appears to be that, in general, situations are made statically undefined if it is not easy to generate code for them. Only when code can be generated, then the situation can be undefined dynamically.
11) Private correspondence with committee member.
I would recommend looking at the entire article. Taken together, it paints a consistent picture.
The fact that the authors of the article had to discuss the question with a committee member confirms that the standard is currently fuzzy on the answer to your question.
In this case the undefined behavior is the result of executing the code. So if the code is not executed, there is no undefined behavior.
Non-executed code could invoke undefined behavior if the undefined behavior resulted solely from the declaration of the code (e.g. if some case of variable shadowing was undefined).
I'd go with the last paragraph of this answer: https://stackoverflow.com/a/18384176/694576
... UB is a runtime issue, not a compiletime issue ...
So, no, there is no UB invoked.
Only if the standard made a breaking change so that your code suddenly did get executed. Otherwise I don't see any logical way in which this can cause 'undefined behaviour'. It's not causing anything.
On the subject of undefined behaviour it is often hard to separate the formal aspects from the practical ones. This is the definition of undefined behaviour in the 1989 standard (I don't have a more recent version at hand, but I don't expect this to have changed substantially):
1 undefined behavior
behavior, upon use of a nonportable or erroneous program construct or of
erroneous data, for which this International Standard imposes no requirements
2 NOTE Possible undefined behavior ranges from ignoring the situation completely
with unpredictable results, to behaving during translation or program execution
in a documented manner characteristic of the environment (with or without the
issuance of a diagnostic message), to terminating a translation or
execution (with the issuance of a diagnostic message).
From a formal point of view I'd say your program does invoke undefined behaviour, which means that the standard places no requirement whatsoever on what it will do when run, just because it contains division by zero.
On the other hand, from a practical point of view I'd be surprised to find a compiler that didn't behave as you intuitively expect.
The standard says, if I remember correctly, that an implementation is allowed to do anything from the moment a rule is broken. Maybe there are some special cases with a kind of global flavour (but I have never heard or read about anything like that). So I would say: no, this can't be UB, because as long as the behavior is well defined, 0 is always false, so the rule can't be broken at run time.
I think it still is undefined behavior, but I can't find any evidence in the standard to support or deny me.
I think the program does not invoke undefined behavior.
Defect Report #109 addresses a similar question and says:
Furthermore, if every possible execution of a given program would result in undefined behavior, the given program is not strictly conforming.
A conforming implementation must not fail to translate a strictly conforming program simply because some possible execution of that program would result in undefined behavior. Because foo might never be called, the example given must be successfully translated by a conforming implementation.
It depends on how the expression "undefined behavior" is defined, and whether "undefined behavior" of a statement is the same as "undefined behavior" for a program.
This program looks like C, so a deeper analysis of what the C standard used by the compiler says (as some answers did) is appropriate.
In the absence of a specified standard, the correct answer is "it depends". In some languages, compilers try to guess after the first error what the programmer might have meant and still generate some code according to that guess. In other, purer languages, once something is undefined, the undefinedness propagates to the whole program.
Other languages have a concept of "bounded errors". For some limited kinds of errors, these languages define how much damage an error can produce. In particular languages with implied garbage collection frequently make a difference whether an error invalidates the typing system or does not.

What is the difference between "undefined behaviour" and "implementation defined behaviour", or why even distinguish between them? [duplicate]

The C standard (AFAIK) uses both terms. I have trouble understanding where the difference between the two is.
If I have any given, syntactically correct C statement, there can be no way that a compiler will not issue some machine instructions. Of course, it could choose to not issue any instruction at all, but even that would be "implementation dependent".
A more concrete example: Overflow of integer values. Now we have two types of overflow: arithmetic, and memory-wise. If the overflow of signed integers is UB according to the standard, what does that mean? Could an implementation simply spill an overflowing bit into the adjacent byte to the MSB? (Never seen that, but would it be ok?)
It appears to me that "undefined behaviour" always is implementation dependent. Or, to put it differently, there seems to be no way a compiler could handle any "undefined behaviour" without introducing "implementation defined" behaviour.
So why even distinguish between the two?
The main difference is that implementation-defined behavior is defined. That is, for every requirement in the Standard that says "implementation-defined", a C implementation is supposed to come with an explanation of what that behavior is.
For example, here is the GCC documentation on implementation-defined behavior for the C language.
Also, in many cases, "implementation-defined" allows for a decision for one of a number of particular possible behaviors. But "undefined behavior" always allows an implementation to do anything at all, either at compile time or at run time.
Read also Lattner's blog What Every Programmer Should Know About Undefined Behavior.
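For the signed-overflow example from the question, the usual way out is to test before adding, so the undefined operation is never executed. A minimal sketch; checked_add and shift_minus_one are illustrative names, not standard functions:

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: adds a and b only when the sum is representable
   in int, so signed overflow (undefined behavior) never happens. */
bool checked_add(int a, int b, int *out)
{
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b)) {
        return false;
    }
    *out = a + b;
    return true;
}

/* Implementation-defined rather than undefined: right-shifting a
   negative value yields a result that each implementation must
   choose and document. */
int shift_minus_one(void)
{
    return -1 >> 1;
}
```

The first function has to avoid the operation entirely because nothing can be assumed about it; the second can be used as long as you consult the implementation's documentation for the result.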

Why does the following code give different results when compiling with gcc and g++?

#include <stdio.h>
int main()
{
    const int a=1;
    int *p=(int *)&a;
    (*p)++;
    printf("%d %d\n",*p,a);
    if(a==1)
        printf("No\n");//"No" in g++.
    else
        printf("Yes\n");//"Yes" in gcc.
    return 0;
}
The above code gives No as output in g++ compilation and Yes in gcc compilation. Can anybody please explain the reason behind this?
Your code triggers undefined behaviour because you are modifying a const object (a). It doesn't have to produce any particular result, not even on the same platform, with the same compiler.
Although the exact mechanism for this behaviour isn't specified, you may be able to figure out what is happening in your particular case by examining the assembly produced for the code (you can see that by using the -S flag). Note that compilers are allowed to make aggressive optimizations by assuming that the code has well-defined behaviour. For instance, a could simply be replaced by 1 wherever it is used.
From the C++ Standard (1.9 Program execution)
4 Certain other operations are described in this International
Standard as undefined (for example, the effect of attempting to
modify a const object). [ Note: This International Standard imposes
no requirements on the behavior of programs that contain undefined
behavior. —end note ]
Thus your program has undefined behaviour.
In your code, notice following two lines
const int a=1; // a is of type constant int
int *p=(int *)&a; // p is of type int *
you are putting the address of a const int variable to an int * and then trying to modify the value, which should have been treated as const. This is not allowed and invokes undefined behaviour.
For your reference, as mentioned in chapter 6.7.3, C11 standard, paragraph 6
If an attempt is made to modify an object defined with a const-qualified type through use
of an lvalue with non-const-qualified type, the behavior is undefined. If an attempt is
made to refer to an object defined with a volatile-qualified type through use of an lvalue
with non-volatile-qualified type, the behavior is undefined
So, to cut a long story short, you cannot rely on the outputs for comparison. They are the result of undefined behaviour.
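Note that the quoted rule is about objects defined with a const-qualified type. For contrast, here is a sketch of the well-defined case, where only the pointer type is const and the object itself is not. The function name is made up for illustration:

```c
/* Well-defined contrast to the question's code: a is NOT defined with
   a const-qualified type, so writing to it through a pointer obtained
   by casting away const is allowed. */
int bump_through_const_view(void)
{
    int a = 1;
    const int *view = &a;   /* read-only view of a modifiable object */
    int *p = (int *)view;   /* cast away const from the pointer type */
    (*p)++;
    return a;               /* a is now 2 */
}
```

Change the definition to const int a = 1; and the same cast and increment become the undefined behaviour discussed above.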
Okay, we have here 'identical' code passed to "the same" compiler, but once with a C flag and the other time with a C++ flag. As far as any reasonable user is concerned nothing has changed. The code should be interpreted identically by the compiler because nothing significant has happened.
Actually, that's not true. While I would be hard pressed to point to it in a standard, the precise interpretation of 'const' has slight differences between C and C++. In C it's very much an add-on: the 'const' flag says that this normal variable 'a' should not be written to by the code round here, but there is a possibility that it will be written to elsewhere. With C++ the emphasis is much more on the immutable constant concept, and the compiler knows that this constant is more akin to an 'enum' than a normal variable.
So I expect this slight difference means that slightly different parse trees are generated, which eventually leads to different assembler.
This sort of thing is actually fairly common: code that's in the C/C++ subset does not always compile to exactly the same assembler even with 'the same' compiler. It tends to be caused by other language features meaning that there are some things you can't prove about the code in one of the languages but can in the other.
Usually C is the performance winner (as was re-discovered by the Linux kernel devs) because it's a simpler language, but in this example C++ would probably turn out faster (unless the C dev switches to a macro or enum and catches the unreasonable act of taking the address of an immutable constant).

Compiling C code using multiple compilers

I have just recently decided to learn C. I notice there are multiple compilers I can download. If I write C code for one compiler, it should work for all of the compilers, correct?
Short answer: yes
Long answer:
Yes, but only if (and not limited to):
Your code doesn't use compiler specific stuff that's not available on the other compiler
The libraries your code relies on are available and set up correctly on the other compiler
Your code doesn't invoke/rely on undefined or implementation-defined behavior
The other compiler targets roughly the same C standard as your current compiler.
I'll add more to the list as I think of them.
In the C standard there are two types of 'compiler-dependent' issues defined:
Implementation-defined behavior: The behavior may vary from compiler to compiler, but the compiler must provide some sort of consistent behavior, and must document this behavior.
An example, straight from the standard: "An example of implementation-defined behavior is the propagation of the high-order bit when a signed integer is shifted right.". In other words, the result of -1 >> 1 may vary between compilers, but the compiler has to be consistent about it.
Undefined behavior: The moment you hit undefined behavior, anything - and I do mean anything can happen.
You also need to watch out for violations of "shall" requirements that appear outside a Constraints section. Often the standard specifies things like "[main] shall be defined with a return type of int [...]" (§5.1.2.2.1/1). This is equivalent to, "If main is declared with a return type other than int, the program's behavior is undefined." (see §4/2, where the standard explicitly endorses this interpretation).
Note that some implementation-defined behavior has limits - e.g., the value of sizeof(int) is implementation-defined, but you know that int can represent at least -32767 to 32767, and that its range is at least as wide as that of short and no wider than that of long - so just having any implementation-defined behavior doesn't mean you can't say anything about what the program does.
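Bounds like these can be checked at compile time. A minimal sketch, assuming a C11 compiler for _Static_assert; it checks value ranges from <limits.h>, since ranges are what the standard bounds directly, and the helper name is invented for illustration:

```c
#include <limits.h>

/* Guarantees that hold on every conforming implementation. */
_Static_assert(INT_MAX >= 32767, "int has at least a 16-bit range");
_Static_assert(SHRT_MAX <= INT_MAX && INT_MAX <= LONG_MAX,
               "ranges are ordered short <= int <= long");

/* Hypothetical helper: a value that differs between implementations
   but is fixed and documented on each one. */
unsigned long int_size_in_bytes(void)
{
    return (unsigned long)sizeof(int);
}
```

Code that depends on something stronger, such as int being exactly 32 bits, can state that assumption with one more _Static_assert so that a port to a different compiler fails loudly instead of silently misbehaving.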

Is there specific documentation for the behavior of "i=i--" in gcc?

Once again, our best loved "i=i--" -like issues. In C99 we have:
6.5 Expressions #2: Between the previous and next sequence point an object shall have its stored value modified at most once
70) This paragraph renders !!undefined!! statement expressions such as
i = ++i + 1;
But for undefined behavior there can be variants from random output to "program execution in a documented manner" (C99 3.4.3)
So, the question:
Does gcc document the behavior for i=i++, i=i--, and so on statements?
Actual code is
int main(){int i=2;i=i--;return i;}
GCC does not document this behaviour. The Warning Options page mentions sequence point issues under -Wsequence-point, but does not hint at well-defined semantics for violations.
GCC does have a nice list of C Implementation Defined Behaviour, but I could not find any reference to this issue here either.
It's left to the back-end implementation to decide what it does. You can use -S and inspect the generated code to determine the exact sequence of events.
It is not documented, but even if it were, I wouldn't want to read it. You should never rely on what a particular implementation does when running into undefined behavior.
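Since no outcome is promised for i=i--, the portable fix is to write whichever result was intended as separate statements. A sketch of the two plausible readings, with made-up function names:

```c
/* Reading 1: the decrement is what was meant. */
int decrement(int i)
{
    i--;
    return i;
}

/* Reading 2: the assignment of the old value is what was meant,
   so i ends up unchanged. */
int keep_old_value(int i)
{
    int old = i;
    i--;
    i = old;
    return i;
}
```

Starting from i = 2, the first reading gives 1 and the second gives 2; the original one-liner is not required to produce either.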
Why on earth would you want to do that? Seriously. I'm curious.
