Can addresses of unmodified locals wind up corrupted in setjmp/longjmp? - c

If one winds up in the situation of being stuck using setjmp/longjmp (don't ask), then there are lots of nice warnings from the compiler about when you might be doing something wrong.
But with a -Wall -Wextra -pedantic build while using Address Sanitizer in Clang, I wound up with a case roughly parallel to:
void outer() {
    jmp_buf buf;
    ERR error;
    if (setjmp(buf) ? helper(&error) : FALSE) {
        // process whatever helper decided to write into error
        return;
    }
    // do the stuff you wanted to guard that may longjmp.
    // error is never modified
}
On a longjmp, looking into the helper stack frame, the error pointer is null. If I look in the outer() frame, it says error has been "optimized out".
That's puzzling because I'm compiling with -O0, so "optimized out" is a weird thing for it to be saying. But as with most things longjmp-y, I wonder what keeps the compiler from deciding ahead of time which register it's going to put the error address in... and then having that decision be invalidated.
Is address sanitizer punking me, or do I actually have to write something like:
void outer() {
    jmp_buf buf;
    ERR error;
    volatile ERR* error_ptr = &error;
    if (setjmp(buf) ? helper(error_ptr) : FALSE) {
        // process whatever helper decided to write into error
        return;
    }
    // do the stuff you wanted to guard that may longjmp.
    // error is never modified
}
As I research this, I've noticed that jmp_bufs are not locals in any of the examples I see. Is that something you can't do? :-/
NOTE: See AnT's answer and comments below for the "language-lawyer" issue with the setjmp() ? ... : ... construct. But what I actually had going on here turned out to be a broken longjmp call that happened after the function containing the setjmp had exited. Per the longjmp() docs (also: common sense), that's definitely broken; I just didn't realize that's what had happened:
If the function that called setjmp has exited, the behavior is undefined (in other words, only long jumps up the call stack are allowed)

Is there a reason the helper call is "embedded" into the controlling expression of the if through the ?: operator? This is actually a violation of the language requirements, which say:
7.13.1.1 The setjmp macro
4 An invocation of the setjmp macro shall appear only in one of the following contexts:
— the entire controlling expression of a selection or iteration statement;
— one operand of a relational or equality operator with the other operand an integer constant expression, with the resulting expression being the entire controlling expression of a selection or iteration statement;
— the operand of a unary ! operator with the resulting expression being the entire controlling expression of a selection or iteration statement; or
— the entire expression of an expression statement (possibly cast to void).
5 If the invocation appears in any other context, the behavior is undefined.
The whole point of that requirement is to ensure that the "unpredictable" return from setjmp, triggered by longjmp, does not land in the middle of an expression evaluation, i.e. in an unsequenced context. In your specific example it is rather obvious that, from the point of view of the abstract C language, the variable error cannot possibly be changed by the setjmp call, which opens the door for many optimizations.
It is hard to say what happened here, since helper receives a pointer &error, not error's direct value. On the surface everything seems fine from a practical point of view. But formally the behavior is undefined.
In your case, you should not try to fix things by making variables volatile, but rather should simplify the context setjmp is used in so that it conforms with the above requirements. Something along the lines of:
if (setjmp(buf) != 0) {
    helper(&error);
    ...
    return;
}
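Putting the pieces together, a minimal conforming sketch might look like the following. ERR, helper, and guarded_work are hypothetical stand-ins for the question's code; the jmp_buf is made static here only to keep the sketch short (a local jmp_buf is equally valid, as long as the longjmp happens before the function that called setjmp exits):

```c
#include <setjmp.h>

/* Hypothetical error record standing in for the ERR type in the question. */
typedef struct { int code; } ERR;

static jmp_buf buf;

/* Stand-in for the guarded work: always bails out via longjmp. */
static void guarded_work(void) { longjmp(buf, 1); }

/* Stand-in for the helper that fills in the error record. */
static void helper(ERR *e) { e->code = 42; }

int outer(void) {
    ERR error;
    /* Conforming: setjmp is the entire controlling expression of the if;
     * helper runs inside the taken branch, not inside the expression.
     * error needs no volatile because it is not modified between the
     * setjmp call and the longjmp. */
    if (setjmp(buf) != 0) {
        helper(&error);
        return error.code;   /* recovery path */
    }
    guarded_work();          /* longjmps back to the if above */
    return 0;                /* normal path (unreached in this sketch) */
}
```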

Related

Are noreturn attributes on exiting functions necessary?

Are noreturn attributes on never-returning functions necessary, or is this just an optimization (arguably premature? -- at least for exits, I can't imagine what there is to optimize)?
It was explained to me that in a context such as
_Noreturn void myexit(int s) {
    exit(s);
}
// ...
if (!p) { myexit(1); }
f(*p);
// ...
noreturn prevents the !p branch from being optimized out.
But is it really permissible for a compiler to optimize out that branch?
I realize the rationale for optimizing it out would be: "Undefined behavior can't happen. If p == NULL, dereferencing it is UB, therefore p can never be NULL in this context, therefore the !p branch does not trigger". But can't the compiler resolve the problem just as well by assuming that myexit could be a function that doesn't return (even if it's not explicitly marked as such)?
This allows for several optimizations. First, the call itself may allow for a simplified setup: not all registers have to be saved, and a jmp instruction can be used instead of a call, or similar. Then the code after the call can also be optimized, because there is no branching back to the normal flow.
So yes, usually _Noreturn is a valuable information to the compiler.
But as a direct answer to your question, no, this is a property for optimization, so it is not necessary.
Axiom: The standard is the definite resource on what's well-defined in C.
The standard specifies assert, therefore using assert is well-defined.
assert conditionally calls abort, a _Noreturn function, therefore that's allowed.
Every usage of assert is inside a function. Therefore functions may or may not return.
The standard has this example:
_Noreturn void g (int i) { // causes undefined behavior if i <= 0
    if (i > 0) abort();
}
Therefore functions conditionally returning must not be _Noreturn.
This means:
For externally defined functions, the compiler has to assume the function might not return, and so isn't free to optimize out the if-branch.
For "internally" defined functions, the compiler can check whether the function does in fact always return, and optimize out the branch accordingly.
In both cases, compiled program behavior aligns with what a non-optimizing abstract C machine would do and the 'as-if' rule is observed.
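As a sketch of the syntax point buried in the question's snippet: _Noreturn is a function specifier and precedes the declarator (the convenience macro noreturn from <stdnoreturn.h> works the same way). my_exit and checked_deref are hypothetical names:

```c
#include <stdlib.h>
#include <stdnoreturn.h>

/* Correct placement: the specifier precedes the declarator,
 * unlike the `void myexit(int s) _Noreturn` form in the question. */
noreturn void my_exit(int s) {
    exit(s);
}

int checked_deref(int *p) {
    /* Because my_exit is declared noreturn, the compiler knows control
     * never falls through this branch to the dereference below, so the
     * branch is meaningful and must be kept. */
    if (!p) { my_exit(1); }
    return *p;
}
```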

Why does using the assignment operator inside an if not cause an error?

I am learning the basics of the C language and I do not understand how this code works. It seems it should give an error, because I am using the assignment operator instead of the equality operator (==) in the if condition:
#include <stdio.h>
int main()
{
    int i = 4;
    if (i = 5) {
        printf("Yup");
    }
    else
        printf("Nope");
}
While it might not seem intuitive, an assignment is actually a valid expression, with the assigned value being the value of the expression.
So when you see this:
if(i=5){
It is effectively:
if(5){
So why is this behavior allowed? A classic example is that it allows you to call a function, save the return value, and check the return value in one shot:
FILE *fp;
if ((fp = fopen("filename","r")) == NULL) {
    perror("fopen failed");
    exit(1);
}
// use fp
Here, the return value of fopen is assigned to fp, then fp is checked to see if it is NULL, i.e. if fopen failed.
An assignment operator used inside an if statement will not give any error.
The assignment i = 5 takes place, and the if statement is evaluated according to the result of the expression, which is the value of the right side of the =. In this case that is 5.
The expression i = 5 evaluates to a non-zero value, hence the if() condition is true.
I believe this question can be interpreted in two different ways. The first is the most literal: "Why does a C compiler allow this syntax?" The second is probably more vague: "Why was C designed to allow such syntax to be legal?"
The answer to the first can be found in The C Programming Language (a highly recommended book, if you do not already have it) and comes down to "because the language says so"; it's just the way the language is defined.
In the book you can refer to Appendix A to find a description of how the grammar is broken down. Specifically A7. Expressions, and A9. Statements.
A9.4 Selection Statements states:
selection-statement:
if ( expression ) statement
if ( expression ) statement else statement
switch ( expression ) statement
Meaning that any valid expression, of which assignment is one, is legal as the 'argument' to the selection statement, with a minor caveat (emphasis is my own):
In both forms of the if statement, the expression, which must have arithmetic or pointer type, is evaluated, including all side effects, and if it compares unequal to 0, the first substatement is executed.
This might seem odd if you are coming from a language like Java, which requires the result of an expression used in a conditional to be expressly 'boolean' in nature, an attempt to reduce runtime errors that result from typographical mistakes (i.e. using = instead of ==).
As for why C's syntax is like this, I am not sure. A quick Google search returns nothing immediately, but I offer this conjecture (and I stress that I have found nothing to back up this claim, and my experience with assembly languages is minimal):
C was designed to be a low-level language that mapped closely to assembly-level mechanisms, making it easier to implement a compiler for, and to translate to assembly.
In assembly-level languages, branches are the result of instructions that examine a register and decide whether to jump. What previously placed the value in the register is of no concern. Decrementing a counter is not a boolean operation, but testing the resulting value in the register is. Allowing a general expression possibly made implementations of C easier to write. The original compiler written by Dennis Ritchie simply spat out assembly files that then had to be assembled separately.
In C, the assignment operator = is just that: an operator. You can use it everywhere where an expression is expected,† including in the control expression of an if statement. Modern compilers typically warn about this, make sure to turn on this warning.
 † Except where a constant expression is expected as an expression involving the = operator is not a constant expression.
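A sketch of the idiom in action (next_value and sum_values are hypothetical names): the value of an assignment expression is the assigned value, and the extra parentheses around the assignment are the conventional signal, to both compilers and readers, that the assignment is intentional:

```c
/* Hypothetical producer: yields 3, 2, 1, then 0 to signal "done". */
static int next_value(void) {
    static int remaining = 3;
    return remaining > 0 ? remaining-- : 0;
}

int sum_values(void) {
    int sum = 0, v;
    /* Assign and test in one shot; the doubled parentheses silence
     * "assignment used as condition" warnings such as -Wparentheses. */
    while ((v = next_value()) != 0) {
        sum += v;
    }
    return sum;
}
```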

What does section 5.1.2.3, paragraph 4 (in n1570.pdf) mean for null operations?

I have been advised many times that accesses to volatile objects can't be optimised away, however it seems to me as though this section, present in the C89, C99 and C11 standards advises otherwise:
... An actual implementation need not evaluate part of an expression if it can deduce that its value is not used and that no needed side effects are produced (including any caused by calling a function or accessing a volatile object).
If I understand correctly, this sentence is stating that an actual implementation can optimise away part of an expression, providing these two requirements are met:
"its value is not used", and
"that no needed side effects are produced (including any caused by calling a function or accessing a volatile object)"...
It seems to me that many people are confusing the meaning of "including" with the meaning of "excluding".
Is it possible for a compiler to distinguish between a side effect that's "needed", and a side effect that isn't? If timing is considered a needed side effect, then why are compilers allowed to optimise away null operations like do_nothing(); or int unused_variable = 0;?
If a compiler is able to deduce that a function does nothing (eg. void do_nothing() { }), then is it possible that the compiler might have justification to optimise calls to that function away?
If a compiler is able to deduce that a volatile object isn't mapped to anything crucial (i.e. perhaps it's mapped to /dev/null to form a null operation), then is it possible that the compiler might also have justification to optimise that non-crucial side-effect away?
If a compiler can perform optimisations to eliminate unnecessary code such as calls to do_nothing() in a process called "dead code elimination" (which is quite the common practice), then why can't the compiler also eliminate volatile writes to a null device?
As I understand it, because of 5.1.2.3p4 either the compiler can optimise away both calls to empty functions and volatile accesses, or it can optimise away neither.
I think the "including any" applies to the "needed side-effects", whereas you seem to be reading it as applying to "part of an expression".
So the intent was to say:
... An actual implementation need not evaluate part of an expression if it can deduce that its value is not used and that no needed side effects are produced.
Examples of needed side-effects include:
Needed side-effects caused by a function which this expression calls
Accesses to volatile variables
Now, the term needed side-effect is not defined by the Standard. Paragraph 4 is not attempting to define it either -- it's trying (and not succeeding very well) to provide examples.
I think the only sensible interpretation is to treat it as meaning observable behaviour which is defined by 5.1.2.3/6. So it would have been a lot simpler to write:
An actual implementation need not evaluate part of an expression if it can deduce that its value is not used and that no observable behaviour would be caused.
Your questions in the edit are answered by 5.1.2.3/6, sometimes known as the as-if rule, which I'll quote here:
The least requirements on a conforming implementation are:
— Accesses to volatile objects are evaluated strictly according to the rules of the abstract machine.
— At program termination, all data written into files shall be identical to the result that execution of the program according to the abstract semantics would have produced.
— The input and output dynamics of interactive devices shall take place as specified in 7.21.3. The intent of these requirements is that unbuffered or line-buffered output appear as soon as possible, to ensure that prompting messages actually appear prior to a program waiting for input.
This is the observable behaviour of the program.
Answering the specific questions in the edit:
Is it possible for a compiler to distinguish between a side effect that's "needed", and a side effect that isn't? If timing is considered a needed side effect, then why are compilers allowed to optimise away null operations like do_nothing(); or int unused_variable = 0;?
Timing isn't a side-effect. A "needed" side-effect presumably here means one that causes observable behaviour.
If a compiler is able to deduce that a function does nothing (eg. void do_nothing() { }), then is it possible that the compiler might have justification to optimise calls to that function away?
Yes, these can be optimized out because they do not cause observable behaviour.
If a compiler is able to deduce that a volatile object isn't mapped to anything crucial (i.e. perhaps it's mapped to /dev/null to form a null operation), then is it possible that the compiler might also have justification to optimise that non-crucial side-effect away?
No, because accesses to volatile objects are defined as observable behaviour.
If a compiler can perform optimisations to eliminate unnecessary code such as calls to do_nothing() in a process called "dead code elimination" (which is quite the common practice), then why can't the compiler also eliminate volatile writes to a null device?
Because volatile accesses are defined as observable behaviour and empty functions aren't.
I believe this:
(including any caused by calling a function or accessing a volatile
object)
is intended to be read as
(including:
any side-effects caused by calling a function; or
accessing a volatile variable)
This reading makes sense because accessing a volatile variable is a side-effect.
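A small sketch of the distinction the answers draw (poll_twice and status_reg are hypothetical names; status_reg is plain memory here, but volatile obliges the compiler to treat every access as observable behaviour, exactly as it would for a hardware register):

```c
/* Plain memory, but declared volatile: every read and write below is
 * observable behaviour and must be performed as written. */
volatile int status_reg = 0;

static void do_nothing(void) { }   /* no observable behaviour: removable */

int poll_twice(void) {
    do_nothing();         /* a conforming compiler may elide this call */
    int a = status_reg;   /* this read may NOT be elided               */
    int b = status_reg;   /* nor folded into the read above            */
    status_reg = 1;       /* nor may this store be dropped as "dead"   */
    return a + b;
}
```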

Is a goto in alloca's function scope valid?

The C standard prohibits a goto into the scope of an identifier with a variably modified type (a VLA).
A VLA and a call to the alloca function should have the same result at a low level.
(I could be wrong, as I'm a C programmer rather than a low-level programmer, but that is how I imagine it.)
So will the following snippet also have undefined behaviour?
int main()
{
    char *p;
    goto label1;
    {
        p = _alloca(1);
label1:
        p = NULL;
    }
}
Of course I can't reference p, but what about the behaviour?
Actually, the rule 6.8.6.1 states:
A goto statement is not allowed to jump past any declarations of objects
with variably modified types.
In your code, there is no object with a variably modified type. alloca does not declare an object (that the compiler has to care about). Thus, there is nothing like a scope for alloca, and no reason for undefined behavior in the sense of rule 6.8.6.1.
EDIT
To elaborate on the answer a bit: the "undefinedness" of the behavior in the VLA case is due to the promise of a declaration that an object is "known" within its scope (at the language level). In general, a declaration sets a context for code execution; there is no need for it to be executed at runtime. However, this is not true in the case of a VLA: here this promise is implemented partly at runtime, breaking C's static declaration approach. To avoid conflicts that would lead to a dynamic typing system, rule 6.8.6.1 forbids such jumps.
In contrast, at language level alloca is simply a function; its call does not constitute any scope. It makes only a promise about its run-time behavior in case it is called. If it isn't called, we do not "expect" anything from a function. Thus, its pure existence does not raise any conflict: both cases (bypassing or non-bypassing) have a well defined semantic.
A VLA and the call to alloca function should have the same result on low level.
There are still a few differences. A VLA object is discarded when the scope where it is declared ends and the memory object allocated by alloca is discarded when the function returns.
This makes a difference because the requirement in C99 6.8.6.1p1 ("A goto statement shall not jump from outside the scope of an identifier having a variably modified type to inside the scope of that identifier") is concerned with the runtime allocation/deallocation of objects with variably modified types. Here the alloca call is not executed when the goto is taken, so I do not think the goto invokes undefined behavior.
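For contrast, a sketch of what the VLA rule does and does not forbid (vla_scope_demo is a hypothetical name). A goto that jumps forward past the entire block containing the VLA is fine; only a jump from outside the VLA's scope to a label inside it is rejected:

```c
int vla_scope_demo(int n) {
    int total = 0;
    if (n <= 0)
        goto done;        /* OK: jumps past the whole VLA scope */
    {
        int vla[n];       /* variably modified type: its scope starts here */
        for (int i = 0; i < n; i++)
            vla[i] = i;
        for (int i = 0; i < n; i++)
            total += vla[i];
        /* A goto from before this block to a label placed here would
         * violate 6.8.6.1p1 and be rejected at compile time. */
    }                     /* VLA deallocated when its scope ends */
done:
    return total;
}
```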
The C Standard has nothing to say about the behavior of alloca(). Some compilers use the stack in a very predictable fashion, and access automatic variables using a largely-redundant frame pointer. On such compilers, it's possible to reserve space on the stack by simply subtracting a value from the stack pointer, without the compiler having to know or care about the reservation in question. Such code will break badly, however, if the compiler uses the stack in ways that the application wasn't expecting.
I don't think that something like:
int *p = 0;
if (!foo(1,2,3))
    goto skip_alloca;
p = alloca(46);
skip_alloca:
bar(4,5,6);
...
is apt to be any more dangerous than:
int *p = 0;
if (foo(1,2,3))
    p = alloca(46);
bar(4,5,6);
...
If there is no residue on the stack from the function call at the time the alloca() is performed, either operation would likely be safe. If there is residue on the stack at the time of the alloca (e.g. because the compiler opts to defer cleanup of foo's arguments until after the call to bar) that would make alloca() misbehave badly. Using the goto version of the code might actually be safer than the version with if, because it would be harder for a compiler to identify that deferring the cleanup from foo might be advantageous.

Is return an operator or a function?

This is too basic I think, but how do both of these work?
return true; // 1
and
return (true); // 2
Similar: sizeof, exit
My guess: if return were a function, 1 would be erroneous. So return should be a unary operator that can also take brackets... pretty much like unary minus: -5 and -(5) are both okay.
Is that what it is - a unary operator?
return is a keyword that manipulates control flow. In that it's similar to if, for etc. It can be used with or without an expression (return; returns from a void function). Of course, as with all expressions, extra parentheses are allowed. (So return (42); is similar to int i = (4*10+2);, in both cases the parentheses are redundant, but allowed.)
sizeof is a keyword that is an operator, similar to new, delete, +, ->, ::, etc.
std::exit() is an identifier that denotes a function of the C standard library (which never returns to the caller).
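To make the point concrete, both forms below are equivalent; the parentheses are ordinary, redundant grouping (the function names are hypothetical):

```c
/* The parentheses in the second form are redundant grouping,
 * exactly as they would be in any other expression. */
int forty_two(void)      { return 42; }
int also_forty_two(void) { return (42); }
```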
return is just a language/control flow construct. It's certainly not a function, since it's syntactically irreducible, and it's not really an operator either, since it has no return value.
return is not an operator and is not a function. return is a keyword that forms a return statement, which belongs to the category of jump statements. In that regard it has absolutely no similarities with either sizeof or exit.
The requirement to put () around the argument of return existed in ancient pre-standard versions of C (CRM C, for example), but was quickly eliminated, even though the quirky habit to wrap the argument of return in superfluous () can be seen from time to time even today.
return is a control-flow keyword, just like goto, break, continue, if, else... Do not think of it as an operator, because it does not alter the value behind it. The () just group an expression to be evaluated, and the result of the evaluated expression is passed along to the calling function (how, exactly, depends on the compiler implementation and calling convention).
It is also certainly not a function; just think about it: how would you return from return?
"return" is neither a routine nor an operator.
It translates to a well-known assembler instruction. For example, on the x86 architecture it translates to "ret", and on the PowerPC architecture it translates to "blr".
For the value it returns, the compiler moves that value into the appropriate register(s) prior to issuing the return instruction. On the x86 architecture, this is typically EAX (and EDX if necessary); the registers change slightly for x86-64. On PPC, if memory serves, it is r3 -- others may correct me if I am wrong on that detail.
Hope this helps.
'true' is an expression, and '(true)' is an expression.
return can always be followed by an expression, but for the return to type-check, the expression must have the same type as the return type of the function.
Hence you can generalize it as:
return Expression;
(In a function with a void return type, return may not be followed by an expression; a bare return simply exits the function.)
"The requirement to put () around the argument of return existed in ancient pre-standard versions of C (CRM C, for example), but was quickly eliminated, even though the quirky habit to wrap the argument of return in superfluous () can be seen from time to time even today."
Yeah, you know you are looking at some old code or someone thinks return is a function when you see them using parens with it all the time.
My college instructor did that and it annoyed me all the time.
Oh well at least he was consistent.
