This is a question about the rules of standard C11 concerning side effects when function arguments are evaluated in an expression.
I am trying to define a macro in standard C that emulates the "method"-like syntax of an OOP language, in a rudimentary way.
I have designed a solution, whose main ideas I will present here, but I have some doubts about its conformance to C11.
I need to give the exposition first; the specific question, which concerns the evaluation of expressions involving function calls, comes at the end. Sorry for the long post.
So, given a struct, or rather a struct * object x, I would be happy if I could make a "method" call in this way:
x->foo_method();
The typical way to solve this problem is something like this:
Define the "class" by means of a struct declaration:
typedef struct foo_s { void (*foo_method)(struct foo_s *this); } *foo_class;
foo_class x = malloc(sizeof(struct foo_s));
x->foo_method = some_function_defined_over_there___;
Then make the call by repeating the object in the ``this'' parameter:
x->foo_method(x);
One can try to define some kind of "method-call" macro:
#define call(X, M) ((X)->M(X))
However, this approach is bad, since the evaluation of X can duplicate side effects (this is the well-known pitfall of using a macro parameter twice).
[By using tricky macros I can handle the case of an arbitrary number of parameters for the method M, for example, by using __VA_ARGS__ and a few intermediate macro hacks.]
To solve the problem of repetition of macro arguments, I decided to implement a global stack, maybe hidden as a static array in a function:
void *my_stack(void *x, const char *operation)
{
    static void *stack[100] = { NULL };
    // 'operation' selects the "push" or "pop" operation on the stack.
    // ...
    // If 'operation' is "push" (compared with strcmp, not ==), 'x' itself is returned.
}
So now I avoid the duplication of side effects in the macro by writing `X' only once:
#define call(X, M) (((foo_class)my_stack((X), "push")) -> M (my_stack(0,"pop")))
As you can see, my intent is that the function-like macro be treated by the C compiler as an expression whose value is the value returned by the method M.
I have written the parameter X only once inside the macro body, and its value is stored in the stack. Since this value is needed to access the "method" member of X itself, the function my_stack returns the value of x: I need to reuse it immediately as part of the same expression that pushed x onto the stack.
Although this idea seems to solve the problem of duplicating X in the call(X,M) macro easily, more issues appear.
One can have "methods" whose arguments are also objects stored in the stack by using the same macro call().
Even more, we could have arguments in the "method" whose values are obtained as the result of evaluating other "methods".
Finally, other functions or methods appearing as arguments of a given "method" could modify the stack, because they are, probably, functions modifying the stack by using the call() macro.
I want my macro to be consistent in all of these cases. For example, let us suppose that x1, x2, x3 are foo_class objects.
On the other hand, let us suppose that we have, in foo_class, the following "method" member:
int (*meth)(foo_class this, int, int);
Finally, we could make the "method" call:
call(x1, meth, (call (x2, 2, 2), call(x3, 3, 3)) ) ;
[The real syntax for the macro is not necessarily as shown here. I hope the main idea is understood.]
The intent is to emulate this function call:
x1->meth(x1, x2->meth(x2,2,2), x3->meth(x3,3,3));
The problem here is that I'm using a stack to emulate the following duplications of objects in the call: x1->meth(x1,....), x2->meth(x2,...), x3->meth(x3,...).
For example: ((foo_class)(my_stack(x2,"push"))) -> meth (my_stack(0,"pop"), ...).
MY QUESTION IS: Can I always be sure that the pairing of "push"/"pop" in any possible expression (that consistently uses the call() macro) always gives the expected pair of objects?
For example, if I am "pushing" x2, it would be completely wrong that the x3 be "popped".
MY CONJECTURE IS: The answer would be YES, but only after a close analysis of the ISO C11 standard on the topic of sequence points.
There is a sequence point between the expression that produces the "method" (actually, the "function") to be called, and the expressions of the arguments to be passed to it.
Thus, for example, x1 is stored in the stack before the meth method is considered to be called.
There is a sequence point after the evaluation of all the arguments passed to the function and before the actual function call.
Thus, for example, if new objects x4, x5, etc. are "pushed"/"popped" in the stack while the call to x1->meth(x1...x2...x3) happens, these objects x4 and x5 will appear and disappear in the stack after x2, x3, have already gone from the stack.
There is no sequence point between the arguments in a function call.
Thus, the following expressions could interleave when they are evaluated (when they are arguments of the function call shown above, involving x1, x2, x3):
my_stack(x2,"push") -> meth(my_stack(0,"pop"),2,2)
my_stack(x3,"push") -> meth(my_stack(0,"pop"),3,3)
It could happen that, after the objects x2 and x3 have been "pushed" onto the stack, the "pop" operations are wrongly paired: x3 could be "popped" in the meth(...,2,2) line, and x2 could be "popped" in the meth(...,3,3) line, contrary to what is desired.
This situation is very unlikely, and it seems that under standard C99 there is no formal solution.
However, in C11 we have the concept of indeterminately sequenced side effects.
We have that:
When a function is called, all its side effects are indeterminately sequenced with respect to any other expression around the expression that makes the function call. [See paragraph (12) here: sequence points].
Since the side effects in the function call of meth involved in the expression:
my_stack(x2,"push") -> meth(my_stack(0,"pop"),2,2)
have to be resolved "completely before" or "completely after" the side effects in:
my_stack(x3,"push") -> meth(my_stack(0,"pop"),3,3)
I conclude that the "push" and "pop" operations are well paired.
Is my interpretation of the standard OK? I will cite it, just in case:
[C11, 6.5.2.2/10] There is a sequence point after the evaluations of the function designator and the actual arguments but before the actual call. Every evaluation in the calling function (including function calls) that is not otherwise specifically sequenced before or after the execution of the body of the called function is indeterminately sequenced with respect to the execution of the called function.94)
[Footnote 94]: In other words, function executions do not ‘‘interleave’’ with each other.
That is, although the order of evaluation of arguments in a function call cannot be predicted, I think that one can, anyway, be sure that the "sequencing" rules established in ISO C11 are enough to ensure that the "push" and "pop" operations work well in this case.
So, a "method"-like syntax can be used in C, in order to emulate a rudimentary but consistent OOP capability of "methods as members of objects".
No, I don't think you can be guaranteed that this does what you want. Let us decompose your expression
my_stack(x2,"push") -> meth(my_stack(0,"pop"),2,2)
<<<<<< A >>>>>>>>>>         <<<<<<< B >>>>>>
<<<<<<<<<<<<< C >>>>>>>>>>>
<<<<<<<<<<<<<<<<<<<<<<< D >>>>>>>>>>>>>>>>>>>>>>>>
The evaluations of B and C are completely independent and must both be done before the function call D. The arguments of a function and the function designator are not much different for that.
Because A and B are function calls, they are in fact sequenced, but indeterminately so: you don't know which one comes first and which second.
I think you would be far better off by making your call an inline function. If you really need versions of call for different types, you could go to select the function with a _Generic expression. But as someone already said in the comments, you are really at the limits of what you should reasonably do in C.
I am trying to work with the stdatomic.h functions, specifically atomic_flag_test_and_set. I am not seeing any errors, but want to know if what I am doing is always safe. I have a struct like the following:
typedef struct Mystruct {
int somedata;
atomic_flag flag;
} Mystruct;
Later, when I create a mystruct and use its instance of the flag, I do so like this:
if(atomic_flag_test_and_set(&mystructInstance->flag)) {
// do something
}
Is the evaluation of &mystructInstance->flag always completed before the check for the atomic operation? I would assume so since it should be one processor instruction (or something that emulates one processor instruction), but I want to make sure.
Is the evaluation of &mystructInstance->flag always completed before the check for the atomic operation?
The answer to this question can be found in the section on "Function calls" in the C standard.
6.5.2.2 Function calls
...
4. An argument may be an expression of any complete object type. In preparing for the call to a function, the arguments are evaluated, and each parameter is assigned the value of the corresponding argument.
Also note that if a function takes more than one parameter, the order of evaluation of the arguments passed to it is unspecified. This also is mentioned in the same section in the standard.
10. There is a sequence point after the evaluations of the function designator and the actual arguments but before the actual call. Every evaluation in the calling function (including other function calls) that is not otherwise specifically sequenced before or after the execution of the body of the called function is indeterminately sequenced with respect to the execution of the called function.
I know that how arguments are passed to functions is not part of the C standard, and is dependent on the hardware architecture and calling convention.
I also know that an optimizing compiler may automatically inline functions to save on call overhead, and omit code that has no "side effects".
But, I have a question about a specific case:
Let's say there is a non-trivial function that cannot be inlined or removed, and must be called, that is declared to take no arguments:
int veryImportantFunc() {
/* do some important stuff */
return result;
}
But this function is called with arguments:
int result = veryImportantFunc(1, 2, 3);
Is the compiler allowed to call the function without passing these arguments?
Or is there some standard or technical limitation that would prevent this kind of optimization?
Also, what if argument evaluation has side effects:
int counter = 1;
int result = veryImportantFunc(1, ++counter, 3);
Is the compiler obligated to evaluate even without passing the result, or would it be legal to drop the evaluation leaving counter == 1?
And finally, what about extra arguments:
char* anotherFunc(int answer) {
/* Do stuff */
return question;
}
if this function is called like this:
char* question = anotherFunc(42, 1);
Can the 1 be dropped by the compiler based on the function declaration?
EDIT: To clarify: I have no intention of writing the kind of code that is in my examples, and I did not find this in any code I am working on.
This question is to learn about how compilers work and what the relevant standards say, so to all of you who advised me to stay away from this kind of code: thank you, but I already know that.
To begin with, "declared to take no arguments" is wrong. int veryImportantFunc() is a function accepting any arguments. This is obsolete C style and shouldn't be used. For a function taking no arguments, use (void).
Is the compiler allowed to call the function without passing these arguments?
If the actual function definition does not match the number of arguments, the behavior is undefined.
Also, what if argument evaluation has side effects
Doesn't matter, since arguments are evaluated (in unspecified order) before the function is called.
Is the compiler obligated to evaluate even without passing the result, or would it be legal to drop the evaluation leaving counter == 1?
It will evaluate the arguments and then invoke undefined behavior. Anything can happen.
And finally, what about extra arguments:
Your example won't compile, as it isn't valid C.
The following quotes from the C standard are relevant to your different questions:
6.5.2.2 Function calls
...
2. If the expression that denotes the called function has a type that includes a prototype, the number of arguments shall agree with the number of parameters.
...
4. An argument may be an expression of any complete object type. In preparing for the call to a function, the arguments are evaluated, and each parameter is assigned the value of the corresponding argument.
...
6. If the expression that denotes the called function has a type that does not include a prototype, the integer promotions are performed on each argument, and arguments that have type float are promoted to double. These are called the default argument promotions. If the number of arguments does not equal the number of parameters, the behavior is undefined. If the function is defined with a type that includes a prototype, and either the prototype ends with an ellipsis (, ...) or the types of the arguments after promotion are not compatible with the types of the parameters, the behavior is undefined. If the function is defined with a type that does not include a prototype, and the types of the arguments after promotion are not compatible with those of the parameters after promotion, the behavior is undefined.
...
10. There is a sequence point after the evaluations of the function designator and the actual arguments but before the actual call. Every evaluation in the calling function (including other function calls) that is not otherwise specifically sequenced before or after the execution of the body of the called function is indeterminately sequenced with respect to the execution of the called function.
Let's say there is a non-trivial function that cannot be inlined or removed, and must be called, that is declared to take no arguments:
int veryImportantFunc() {
/* do some important stuff */
return result;
}
But this function is called with arguments:
There are two possibilities:
the function is declared with a "full" prototype such as
int veryImportantFunc(void);
in this case the call with extra arguments won't compile, as the number of parameters and arguments must match;
the function is declared as taking an unspecified number of arguments, i.e. the declaration visible to the call site is
int veryImportantFunc();
in this case, the call is undefined behavior, as the usage doesn't match the actual function definition.
All the other considerations about optimization aren't particularly interesting, as what you are trying to do is illegal however you look at it.
We can stretch this and imagine a situation where passing extra useless arguments is legal, for example a variadic function never using the extra arguments.
In this case, as always, the compiler is free to perform any such optimization as long as the observable behavior isn't impacted, and proceeds "as if" performed according to the C abstract machine.
Given that the details of argument passing aren't observable [1], the compiler could in principle optimize away the argument passing, while the argument evaluation may still need to be done if it has some observable impact on the program state.
That being said, I have a hard time imagining how such an optimization may be implemented in the "classical linking model", but with LTCG it shouldn't be impossible.
[1] The only observable effects according to the C standard are IO and reads/writes on volatile variables.
Following Pascal's theory, it is better to be wrong in believing the compiler can make this optimisation than be right in believing it doesn’t. It serves no purpose to wrongly define a function; if you really must, you can always put a stub in front of it:
int RealSlimShady(void) {
return Dufus;
}
int MaybeSlimShady(int Mathew, int Mathers) {
Used(Mathew);
Used(Mathers);
return RealSlimShady();
}
Everyone is happy, and if your compiler is worth its salt, there will be 0 code overhead.
In this recent question, some code was shown to have undefined behavior:
a[++i] = foo(a[i-1], a[i]);
because even though the actual call of foo() is a sequence point, the assignment is unsequenced, so you don't know whether the function is called after the side-effect of ++i took place or before that.
Thinking further about this, the sequence point at a function call only guarantees that side effects from evaluating the function arguments are carried out once the function is entered, e.g.
int y = 1;
int func1(int x) { return x + y; }
int main(void)
{
int result = func1( y++ ); // guaranteed to be 3
}
But looking at the standard, there's also §7.1.4 p3 (in the chapter about the standard library):
There is a sequence point immediately before a library function returns.
My question here is: What's the consequence of this paragraph? Why does it only concern library functions and what kind of code would actually rely on that?
Simple ideas like (nonsensical code to follow)
errno = 0;
long result = ftell(file) * errno;
would still be undefined as this time, the multiplication is unsequenced. I'm looking for an example that makes use of this special guarantee §7.1.4 p3 makes for library functions.
Regarding the suggested duplicate, Sequence point after a return statement?, this is indeed closely related and I found it before asking this question. It's not a duplicate, because
it asks about normative text stating there is a sequence point immediately after a return, without asking about the consequences when there is one.
it only mentions the special rule for library functions this question is about, without further elaborating on it.
Consequently, my questions here are not answered over there. The accepted answer uses a return value in an unsequenced expression (in this case an addition) and explains how the result depends on the sequencing of this addition, only finding that if you knew the sequencing of the addition, the whole result would be defined with a sequence point immediately after return. It doesn't show an example of code that is actually defined because of this rule, and it doesn't say anything about how/why library functions are special.
Library functions don't have the code that implements them covered by the standard (they might not even be implemented in C). The standard only specifies their behaviour. So the provision about return statements does not apply to implementation of library functions.
The purpose of this clause (in combination with there being a sequence point on entry of a library function) is to say that any side-effects of the library functions are sequenced either before or after any other evaluations that might be in the code which calls the library function.
So the example in your question is not undefined behaviour (unless the multiplication overflows!): the read of errno is either sequenced before or after the modification by ftell, it's unspecified which.
I am studying about undefined behavior in C and I came to a statement that states that
there is no particular order of evaluation of function arguments
but then what about the standard calling conventions like _cdecl and _stdcall, whose definition (in a book) said that arguments are evaluated from right to left?
Now I am confused by these two statements: one, concerning UB, says something different from the other, which follows from the definition of the calling convention. Please reconcile the two.
As Graznarak's answer correctly points out, the order in which arguments are evaluated is distinct from the order in which arguments are passed.
An ABI typically applies only to the order in which arguments are passed, for example which registers are used and/or the order in which argument values are pushed onto the stack.
What the C standard says is that the order of evaluation is unspecified. For example (remembering that printf returns an int result):
some_func(printf("first\n"), printf("second\n"));
the C standard says that the two messages will be printed in some order (evaluation is not interleaved), but explicitly does not say which order is chosen. It can even vary from one call to the next, without violating the C standard. It could even evaluate the first argument, then evaluate the second argument, then push the second argument's result onto the stack, then push the first argument's result onto the stack.
An ABI might specify which registers are used to pass the two arguments, or exactly where on the stack the values are pushed, which is entirely consistent with the requirements of the C standard.
But even if an ABI actually requires the evaluation to occur in a specified order (so that, for example, printing "second\n" followed by "first\n" would violate the ABI) that would still be consistent with the C standard.
What the C standard says is that the C standard itself does not define the order of evaluation. Some secondary standard is still free to do so.
Incidentally, this does not by itself involve undefined behavior. There are cases where the unspecified order of evaluation can lead to undefined behavior, for example:
printf("%d %d\n", i++, i++); /* undefined behavior! */
Argument evaluation and argument passing are related but different problems.
Arguments tend to be passed left to right, often with some arguments passed in registers rather than on the stack. This is what is specified by the ABI and _cdecl and _stdcall.
The order of evaluation of arguments before placing them in the locations that the function call requires is unspecified. It can evaluate them left to right, right to left, or some other order. This is compiler dependent and may even vary depending on optimization level.
_cdecl and _stdcall merely specify that the arguments are pushed onto the stack in right-to-left order, not that they are evaluated in that order. Think about what would happen if calling conventions like _cdecl, _stdcall, and pascal changed the order that the arguments were evaluated.
If evaluation order were modified by calling convention, you would have to know the calling convention of the function you're calling in order to understand how your own code would behave. That's a leaky abstraction if I've ever seen one. Somewhere, buried in a header file someone else wrote, would be a cryptic key to understanding just that one line of code; but you've got a few hundred thousand lines, and the behavior changes for each one? That would be insanity.
I feel like much of the undefined behavior in C89 arose from the fact that the standard was written after multiple conflicting implementations existed. They were maybe more concerned with agreeing on a sane baseline that most implementers could accept than they were with defining all behavior. I like to think that all undefined behavior in C is just a place where a group of smart and passionate people agreed to disagree, but I wasn't there.
I'm tempted now to fork a C compiler and make it evaluate function arguments as if they're a binary tree that I'm running a breadth-first traversal of. You can never have too much fun with undefined behavior!
Check the book you mentioned for any references to "Sequence points", because I think that's what you're trying to get at.
Basically, a sequence point is a point at which you can be certain that all preceding expressions have been fully evaluated and that all of their side effects are complete.
For example, the end of an initializer is a sequence point. This means that after:
bool foo = !(i++ > j);
You are sure that i will be equal to i's initial value +1, and that foo has been assigned true or false. Another example:
int bar = i++ > j ? i : j;
This is perfectly predictable. It reads as follows: compare the current value of i with j, then increment i (the question mark in the ternary operator is a sequence point, so the increment is complete once the comparison has been made). If the comparison was true, assign i (its NEW value) to bar; otherwise assign j.
All sequence points listed in the C99 standard (Annex C) are:
The following are the sequence points described in 5.1.2.3:
— The call to a function, after the arguments have been evaluated (6.5.2.2).
— The end of the first operand of the following operators: logical AND && (6.5.13);
logical OR || (6.5.14); conditional ? (6.5.15); comma , (6.5.17).
— The end of a full declarator: declarators (6.7.5);
— The end of a full expression: an initializer (6.7.8); the expression in an expression
statement (6.8.3); the controlling expression of a selection statement (if or switch)
(6.8.4); the controlling expression of a while or do statement (6.8.5); each of the
expressions of a for statement (6.8.5.3); the expression in a return statement
(6.8.6.4).
— Immediately before a library function returns (7.1.4).
— After the actions associated with each formatted input/output function conversion
specifier (7.19.6, 7.24.2).
— Immediately before and immediately after each call to a comparison function, and
also between any call to a comparison function and any movement of the objects
passed as arguments to that call (7.20.5).
What this means, in essence, is that any expression that is not followed by a sequence point can invoke undefined behaviour, like, for example:
printf("%d, %d and %d\n", i++, i++, i--);
In this statement, the sequence point that applies is "The call to a function, after the arguments have been evaluated". After the arguments are evaluated. If we then look at the semantics, in the same standard under 6.5.2.2, point ten, we see:
10 The order of evaluation of the function designator, the actual arguments, and
subexpressions within the actual arguments is unspecified, but there is a sequence point
before the actual call.
That means for i = 1, the values that are passed to printf could be:
1, 2, 3//left to right
But equally valid would be:
1, 0, 1//evaluated i-- first
//or
1, 2, 1//evaluated i-- second
What you can be sure of is that the new value of i after this call will be 2.
But all of the values listed above are, theoretically, equally valid, and 100% standard compliant.
But the appendix on undefined behaviour explicitly lists this as being code that invokes undefined behaviour, too:
Between two sequence points, an object is modified more than once, or is modified
and the prior value is read other than to determine the value to be stored (6.5).
In theory, your program could crash; instead of printing 1, 2 and 3, the output "666, 666 and 666" would be possible, too.
So finally I found it.
It is because the arguments are passed after they are evaluated, so passing arguments is a completely different story from evaluating them. C compilers, which are traditionally built to maximize speed and optimization, can evaluate the expressions in any order.
So argument passing and evaluation are different stories altogether.
Since the C standard does not specify any order for evaluating parameters, every compiler implementation is free to adopt one. That's one reason why coding something like foo(i, i++) is complete insanity: you may get different results when switching compilers.
One other important thing which has not been highlighted here - if your favorite ARM compiler evaluates parameters left to right, it will do so for all cases and for all subsequent versions. Reading order of parameters for a compiler is merely a convention...
I used to think that in C99, even if the side-effects of functions f and g interfered, and although the expression f() + g() does not contain a sequence point, f and g would contain some, so the behavior would be unspecified: either f() would be called before g(), or g() before f().
I am no longer so sure. What if the compiler inlines the functions (which the compiler may decide to do even if the functions are not declared inline) and then reorders instructions? May one get a result different from the above two? In other words, is this undefined behavior?
This is not because I intend to write this kind of thing, this is to choose the best label for such a statement in a static analyzer.
The expression f() + g() contains a minimum of 4 sequence points; one before the call to f() (after all zero of its arguments are evaluated); one before the call to g() (after all zero of its arguments are evaluated); one as the call to f() returns; and one as the call to g() returns. Further, the two sequence points associated with f() occur either both before or both after the two sequence points associated with g(). What you cannot tell is which order the sequence points will occur in - whether the f-points occur before the g-points or vice versa.
Even if the compiler inlined the code, it has to obey the 'as if' rule - the code must behave the same as if the functions were not interleaved. That limits the scope for damage (assuming a non-buggy compiler).
So the sequence in which f() and g() are evaluated is unspecified. But everything else is pretty clean.
In a comment, supercat asks:
I would expect function calls in the source code remain as sequence points even if a compiler decides on its own to inline them. Does that remain true of functions declared "inline", or does the compiler get extra latitude?
I believe the 'as if' rule applies and the compiler doesn't get extra latitude to omit sequence points because it uses an explicitly inline function. The main reason for thinking that (being too lazy to look for the exact wording in the standard) is that the compiler is allowed to inline or not inline a function according to its rules, but the behaviour of the program should not change (except for performance).
Also, what can be said about the sequencing of (a(),b()) + (c(),d())? Is it possible for c() and/or d() to execute between a() and b(), or for a() or b() to execute between c() and d()?
Clearly, a executes before b, and c executes before d. I believe it is possible for c and d to be executed between a and b, though it is fairly unlikely that the compiler would generate the code like that; similarly, a and b could be executed between c and d. And although I used 'and' in 'c and d', that could be an 'or' - that is, any of these sequences of operation meet the constraints:
Definitely allowed
abcd
cdab
Possibly allowed (preserves a ≺ b, c ≺ d ordering)
acbd
acdb
cadb
cabd
I believe that covers all possible sequences. See also the chat between Jonathan Leffler and AnArrayOfFunctions — the gist is that AnArrayOfFunctions does not think the 'possibly allowed' sequences are allowed at all.
If such a thing would be possible, that would imply a significant difference between inline functions and macros.
There are significant differences between inline functions and macros, but I don't think the ordering in the expression is one of them. That is, any of the functions a, b, c or d could be replaced with a macro, and the same sequencing of the macro bodies could occur. The primary difference, it seems to me, is that with the inline functions, there are guaranteed sequence points at the function calls - as outlined in the main answer - as well as at the comma operators. With macros, you lose the function-related sequence points. (So, maybe that is a significant difference...) However, in so many ways the issue is rather like questions about how many angels can dance on the head of a pin - it isn't very important in practice. If someone presented me with the expression (a(),b()) + (c(),d()) in a code review, I would tell them to rewrite the code to make it clear:
a();
c();
x = b() + d();
And that assumes there is no critical sequencing requirement on b() vs d().
See Annex C for a list of sequence points. Function calls (the point between all arguments being evaluated and execution passing to the function) are sequence points. As you've said, it's unspecified which function gets called first, but each of the two functions will either see all the side effects of the other, or none at all.
@dmckee
Well, that won't fit inside a comment, but here is the thing:
First, you write a correct static analyzer. "Correct", in this context, means that it won't remain silent if there is anything dubious about the analyzed code, so at this stage you merrily conflate undefined and unspecified behaviors. They are both bad and unacceptable in critical code, and you warn, rightly, for both of them.
But you only want to warn once for one possible bug, and also you know that your analyzer will be judged in benchmarks in terms of "precision" and "recall" when compared to other, possibly not correct, analyzers, so you mustn't warn twice about one same problem... Be it a true or false alarm (you don't know which. you never know which, otherwise it would be too easy).
So you want to emit a single warning for
*p = x;
y = *p;
Because as soon as p is a valid pointer at the first statement, it can be assumed to be a valid pointer at the second statement. And not inferring this will lower your score on the precision metric.
So you teach your analyzer to assume that p is a valid pointer as soon as you have warned about it the first time in the above code, so that you don't warn about it the second time. More generally, you learn to ignore values (and execution paths) that correspond to something you have already warned about.
Then, you realize that not many people are writing critical code, so you make other, lightweight analyses for the rest of them, based on the results of the initial, correct analysis. Say, a C program slicer.
And you tell "them": You don't have to check about all the (possibly, often false) alarms emitted by the first analysis. The sliced program behaves the same as the original program as long as none of them is triggered. The slicer produces programs that are equivalent for the slicing criterion for "defined" execution paths.
And users merrily ignore the alarms and use the slicer.
And then you realize that perhaps there is a misunderstanding. For instance, most implementations of memmove (you know, the one that handles overlapping blocks) actually invoke unspecified behavior when called with pointers that do not point to the same block (comparing addresses that do not point to the same block). And your analyzer ignores both execution paths, because both are unspecified, but in reality both execution paths are equivalent and all is well.
So there shouldn't be any misunderstanding on the meaning of alarms, and if one intends to ignore them, only unmistakable undefined behaviors should be excluded.
And this is how you end up with a strong interest in distinguishing between unspecified behavior and undefined behavior. No-one can blame you for ignoring the latter. But programmers will write the former without even thinking about it, and when you say that your slicer excludes "wrong behaviors" of the program, they will not feel that this concerns them.
And this is the end of a story that definitely did not fit in a comment. Apologies to anyone who read that far.