What is the definition of a "valid program"? - c

ISO/IEC 9899:202x (E) working draft — December 11, 2020 N2596, footnote 9:
... an implementation is free to produce any number of diagnostic messages, often referred to as warnings, as long as a valid program is still correctly translated. It can also successfully translate an invalid program.
Searching the standard for a definition of "valid / invalid program" gives no results. In fact, footnote 9 is the only place where "valid / invalid program" is mentioned.
Note: yes:
In ISO standards, notes are without exception non-normative.
Source: https://www.iso.org/schema/isosts/v1.0/doc/n-6ew0.html.
However, people do frequently use the term "valid / invalid program".
Can someone please help to suggest / deduce the definition (relative to the standard) of the term "valid program"?
The question may look silly at first glance. However, there are cases where people have different understandings of the term "valid program", and misinterpretations occur as a result.
My guess: valid program -- a program which does not violate any syntax rule or constraint.
Note: "semantics rule" is intentionally not included in this definition because per Rice's theorem "non-trivial semantic properties of programs are undecidable".
Is such a definition appropriate? If not, what is the appropriate definition?

At least in older versions of the Standard, a Conforming C Program is any source text which is accepted by at least one Conforming C Implementation somewhere in the universe. Given that conforming implementations are allowed to extend the language to accept almost any arbitrary source text, including programs that contain constraint violations, provided that they only accept the latter after having issued at least one diagnostic, the question of whether any particular source text is a Conforming C Program is determined by the existence or non-existence of implementations that accept it, rather than by any trait of the source text itself.

Your assumption that a valid program may not violate any constraints is correct. So is your assumption that correctness is practically impossible to prove via static analysis and can only be attested for a specific execution.
It's the definition of the "invalid program" which is fuzzy. A program can still be valid for a limited set of inputs, so you can't label the program invalid entirely. Only programs which are invalid for every possible input are invalid as a whole. Likewise, only a program which is valid for every possible input is "truly valid". In reality, there is hardly any non-trivial program which would not have edge cases where it's still invalid.
To sum that up into a formal definition:
A program is valid if there is at least a single possible input for which no constraints are violated.
A program is invalid only if it violates constraints for all possible inputs.
And please don't confuse valid/invalid with correct/incorrect. The criterion for the latter is correctness for all possible inputs.

Related

Why is the C preprocessor a subject of undefined behavior?

I can understand that:
One of the origins of the UB is a performance increase (e.g. by removing never executed code, such as if (i+1 < i) { /* never_executed_code */ }; UPD: if i is a signed integer).
UB can be triggered at compile time because C does not clearly distinguish between compile time and run time. The "whole language is based on the (rather unhelpful) concept of an 'abstract machine'".
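As a minimal sketch of the `i+1 < i` point above (the function names are hypothetical and not taken from any answer): for a signed `int`, the naive comparison overflows exactly in the case it is trying to detect, so a compiler may assume it is never true. A check against `INT_MAX` avoids the overflow altogether.

```c
#include <limits.h>
#include <stdbool.h>

/* Naive check: when i == INT_MAX, i + 1 overflows a signed int, which is
   undefined behavior, so a compiler is allowed to treat this as always false. */
bool next_wraps_naive(int i)
{
    return i + 1 < i;
}

/* Well-defined alternative: compare against the limit before doing
   any arithmetic, so no overflow can occur. */
bool next_would_overflow(int i)
{
    return i == INT_MAX;
}
```

Calling `next_wraps_naive(INT_MAX)` is exactly the undefined case, so only `next_would_overflow` is safe to call with that value.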
However, I cannot yet understand why the C preprocessor is subject to undefined behavior. It is known that preprocessing directives are executed at compile time.
Consider C11, 6.10.3.3 The ## operator, 3:
If the result is not a valid preprocessing token, the behavior is undefined.
Why not make it a constraint? For example:
The result shall be a valid preprocessing token.
The same question goes for all the other "the behavior is undefined" in 6.10 Preprocessing directives.
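For illustration, here is a small sketch of the `##` rule quoted above. The macro and identifier names are made up for this example. The first use pastes two tokens into a valid identifier; the second, left in a comment, is the kind of paste whose result is not a single preprocessing token and therefore falls under "the behavior is undefined" (many compilers happen to report an error for it, but they are not required to).

```c
/* Two-level macro so that arguments are macro-expanded before pasting. */
#define PASTE_RAW(a, b) a##b
#define PASTE(a, b) PASTE_RAW(a, b)

/* Valid: counter_ and 1 paste into the single identifier counter_1. */
static int PASTE(counter_, 1) = 7;

int read_counter(void)
{
    return counter_1;
}

/* Undefined per C11 6.10.3.3p3, so kept out of the build: pasting + and /
   would have to yield "+/", which is not a single preprocessing token.

   int bad = 1 PASTE_RAW(+, /) 2;
*/
```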
Why is the C preprocessor a subject of undefined behavior?
When the C standard was created, there were some existing C preprocessors and there was some imaginary ideal C preprocessor in the minds of standardization committee members.
So there were these gray areas, where committee members weren't completely sure what they wanted to do and/or existing C preprocessor implementations differed from each other in behavior.
So, these cases are not defined behavior. Because the C committee members are not completely sure what the behavior actually should be. So there is no requirement on what it should be.
One of the origins of the UB
Yes, one of.
UB may exist to make the language easier to implement. For example, in the case of the preprocessor, the preprocessor writers don't have to care about what happens when ## produces an invalid preprocessing token.
Or UB may exist to reconcile existing implementations with different behaviors, or as a point for extensions. So a preprocessor that segfaults in case of UB, a preprocessor that accepts and works in case of UB, and a preprocessor that formats your hard drive in case of UB can all be standard conformant (but I wouldn't want to work on the one that formats your drive).
Suppose a file which is read in via include directive ends with the partial line:
#define foo bar
Depending upon the design of the preprocessor, it's possible that the partial token bar might be concatenated to whatever appears at the start of the line following the #include directive, or that whatever appears on that line will behave as though it were placed on the line with the #define directive, but with a whitespace separating it from the token bar, and it would hardly be inconceivable that a build script might rely upon such behaviors. It's also possible that implementations might behave as though a newline were inserted at the end of the included file, or might ignore the last partial line of such a file.
Any code which relied upon one of the former behaviors would clearly have been non-portable, but if code exploited such behavior to do something that would otherwise not be practical, such code would hardly be "erroneous", and the authors of the Standard would not have wanted to forbid an implementation that would process it usefully from continuing to do so.
When the Standard uses the phrase "non-portable or erroneous", that does not mean "non-portable, therefore erroneous". Prior to the publication of C89, C implementations defined many useful constructs, but none of them were defined by "the C Standard" since there wasn't one. If an implementation defined the behavior of some construct, some didn't, and the Standard left the construct as "Undefined", that would simply preserve the status quo where implementations that chose to define a useful behavior would do so, those that chose not to wouldn't, and programs that relied upon such behaviors would be "non-portable", working correctly on implementations that supported the behaviors, but not on those that didn't.
Without getting into specifics, my guess is, there exist several preprocessor implementations which have bugs, but the Standard doesn't want to declare them non-conforming, for compatibility reasons.
In human language: if you write a program which has X in it, preprocessor does weird stuff.
In standardese: the behavior of program with X is undefined.
If the standard says something like "The result shall be a valid preprocessing token", it might be unclear what "shall" means in this context.
The programmer shall write the program so this condition holds? If so, the wording with "undefined behavior" is clearer and more uniform (it appears in other places too)
The preprocessor shall make sure this condition holds? If so, this requires dedicated logic which checks the condition; may be impractical to implement.

What is the rationale for "semantics violation does not require diagnostics"?

Follow-up question for: If "shall / shall not" requirement is violated, then does it matter in which section (e.g. Semantics, Constraints) such requirement is located?.
ISO/IEC 9899:202x (E) working draft— December 11, 2020 N2596, 5.1.1.3 Diagnostics, 1:
A conforming implementation shall produce at least one diagnostic message (identified in an
implementation-defined manner) if a preprocessing translation unit or translation unit contains a
violation of any syntax rule or constraint, even if the behavior is also explicitly specified as undefined or implementation-defined. Diagnostic messages need not be produced in other circumstances.
Consequence: semantics violation does not require diagnostics.
Question: what is the (possible) rationale for "semantics violation does not require diagnostics"?
A possible rationale is given by Rice's theorem : non-trivial semantic properties of programs are undecidable
For example, division by zero is a semantics violation; and you cannot decide, by static analysis alone of the C source code, that it won't happen...
A standard cannot require total detection of such undefined behavior, even if of course some tools (e.g. Frama-C) are sometimes capable of detecting them.
See also the halting problem. You should not expect a C compiler to solve it!
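A minimal sketch of the division example, assuming a hypothetical helper named `checked_div`: whether `a / b` is defined depends on values that are only known at run time, so the translator cannot be required to diagnose the bare expression. The program has to guard the operation itself.

```c
#include <limits.h>
#include <stdbool.h>

/* Stores a / b in *out and returns true when the division is defined;
   returns false, leaving *out untouched, in the two undefined cases. */
bool checked_div(int a, int b, int *out)
{
    if (b == 0) {
        return false;               /* division by zero */
    }
    if (a == INT_MIN && b == -1) {
        return false;               /* quotient not representable in int */
    }
    *out = a / b;
    return true;
}
```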
The C99 rationale v5.10 gives this explanation:
5.1.1.3 Diagnostics
By mandating some form of diagnostic message for any program containing a syntax error or
constraint violation, the Standard performs two important services. First, it gives teeth to the
concept of erroneous program, since a conforming implementation must distinguish such a program from a valid one. Second, it severely constrains the nature of extensions permissible to
a conforming implementation.
The Standard says nothing about the nature of the diagnostic message, which could simply be
“syntax error”, with no hint of where the error occurs. (An implementation must, of course,
describe what translator output constitutes a diagnostic message, so that the user can recognize it as such.) The C89 Committee ultimately decided that any diagnostic activity beyond this level is
an issue of quality of implementation, and that market forces would encourage more useful
diagnostics. Nevertheless, the C89 Committee felt that at least some significant class of errors
must be diagnosed, and the class specified should be recognizable by all translators.
This happens because the grammar of the C language is context-sensitive, and for all languages that are defined with context-free or more complex grammars on the Chomsky hierarchy one must make a trade-off between the semantics of the language and its power.
C designers chose to allow much power for the language and this is why the problem of undecidability is omnipresent in C.
There are languages like Coq that try to cut out the undecidable situations and they restrict the semantics of the recursive functions (they allow only sigma(primitive) recursivity).
The question of whether an implementation provides any useful diagnostics in any particular situation is a Quality of Implementation issue outside the Standard's jurisdiction. If an implementation were to unconditionally output "Warning: this program does not output any useful diagnostics" or even "Warning: water is wet", such output would fully satisfy all of the Standard's requirements with regard to diagnostics even if the implementation didn't output any other diagnostics.
Further, the authors of the Standard characterized as "Undefined Behavior" many actions which they expected would be processed in a meaningful and useful fashion by many if not most implementations. According to the published Rationale document, Undefined Behavior among other things "identifies areas of conforming language extension", since implementations are allowed to specify how they will behave in cases that are not defined by the Standard.
Having implementations issue warnings about constructs which were non-portable, but which they would process in a useful fashion would have been annoying.
Prior to the Standard, some implementations would usefully accept constructs like:
struct foo {
int *p;
char pad [4-sizeof (int*)];
int q,r;
};
for all sizes of pointer up to four bytes (8-byte pointers weren't a thing back then), rather than squawking if pointers were exactly four bytes, but some people on the Committee were opposed to the idea of accepting declarations for zero-sized arrays. Thus, a compromise was reached where compilers would squawk about such things, programmers would ignore the useless warnings, and the useful constructs would remain usable on implementations that supported them.
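As a hedged aside, one way to get a comparable "at least four bytes up front" layout without ever declaring a zero-sized array is a union. This is only a sketch of a hypothetical alternative (the member names `head` and `reserve` are invented here); it is not what the pre-Standard code did, and access to `p` changes to `head.p`.

```c
#include <stddef.h>

struct foo {
    /* Occupies at least 4 bytes and at least sizeof(int *) bytes,
       so no array dimension can ever evaluate to zero. */
    union {
        int *p;
        char reserve[4];
    } head;
    int q, r;
};

/* Offset of q, exposed so the layout assumption can be checked. */
size_t foo_q_offset(void)
{
    return offsetof(struct foo, q);
}
```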
While there was a vague attempt to distinguish between constructs that should produce warnings that programmers could ignore, versus constructs that might be used so much that warnings would be annoying, the fact that issuance of useful diagnostics was a Quality of Implementation issue outside the Standard's jurisdiction meant there was no real need to worry too much about such distinctions.

Is running a binary generated from a code with "constraint violation" actually undefined behaviour?

Reference: This question
Context: A return statement at the end of main() with no expression, like
return;
As I read the standard, it appears to be undefined behavior and so I wrote an answer.
However, another answer says, it's not UB.
Where did I go wrong reading or understanding the standard language?
Note: I have seen this thread already, but I cannot conclude.
Let's assume this context:
int main(void) {
return;
}
and a hosted implementation.
Note that void main() is not strictly forbidden (an implementation is permitted to permit it), and return; would be perfectly valid within such a definition of main.
First off, this return; is clearly a constraint violation. It violates N1570 6.8.6.4p1:
A return statement without an expression shall only appear in a function whose return type is void.
(It was not a constraint violation in C90; that was changed in C99.)
The standard requires a compiler to issue at least one diagnostic for any program containing a violation of a constraint or syntax rule. Once it's done so, it can then proceed to generate an executable (if the diagnostic is a non-fatal warning).
It's not clear whether there's a general rule that programs with constraint violations have undefined behavior. The standard doesn't say so explicitly. It defines a "constraint" as a
restriction, either syntactic or semantic, by which the exposition of language elements is to be interpreted
My own interpretation is that if a constraint is violated, there is no valid interpretation, and thus the behavior is undefined, but there are some who disagree. I'd like to see an explicit statement to this effect in a future standard.
I believe the wording of the C Standard is genuinely ambiguous on this point.
Without such a general rule, it's not clear that the behavior is undefined. After stating the constraint that this code violates, the standard goes on to define the semantics of the return statement:
A return statement terminates execution of the current function
and returns control to its caller.
That description could still make sense even in the presence of this constraint violation. It's reasonable to assume, though it's not quite stated, that return; is equivalent to reaching the closing }, which is equivalent to return 0; (this is a special-case rule for main; N1570 5.1.2.2.3).
On the other hand, it's not 100% clear that the description is unambiguous. The argument that return; is equivalent to reaching the closing } is particularly weak. So even in the absence of a general rule, one could argue that the behavior is undefined by omission (there's no definition of the behavior).
The return; is clearly a constraint violation, and in my opinion it has undefined behavior if the implementation produces an executable. Of course a conforming compiler is also permitted to reject the program altogether; in that case, it has no behavior at all.
Bottom line: It's much much much easier to change the return; to return 0; than to answer this question.
I encourage other language lawyers to refute this answer. I've made enough internally inconsistent arguments here that it shouldn't be too difficult.
Answering the title:
Is running a binary generated from a code with “constraint violation” actually undefined behaviour?
I say no because of the definition of undefined behaviour in the standard, emphasis mine:
If a ‘‘shall’’ or ‘‘shall not’’ requirement that appears outside of a
constraint or runtime constraint is violated, the behavior is undefined.
Undefined behavior is otherwise indicated in this International Standard
by the words ‘‘undefined behavior’’ or by the omission of any explicit
definition of behavior. There is no difference in emphasis among
these three; they all describe ‘‘behavior that is undefined’’
§ 4/2
This assumes that violating a constraint means breaking a "shall" or "shall not" requirement that is inside a section marked as "constraint", and that it is not otherwise indicated that the violation leads to UB.
As per the context case I agree with Keith Thompson's answer.
The standard doesn't define the behaviour of code containing a constraint violation. Therefore the behaviour is undefined.
Saying that code containing a constraint violation is well-defined is tantamount to saying that all constraints in the Standard can be ignored, which is absurd.
As a real-world scenario where a program with a constraint violation could cause
Undefined Behavior, consider a build system which doesn't erase output files
unless/until it has something to replace them with. When fed a valid program
it creates an executable file with a certain name, and when fed an invalid
program it leaves the old file unmodified. Since the source text that was used to generate the leftover executable may bear no relation to the current source text, the behavior of the executable may be arbitrarily different from anything that's in the current source file.
Ideally an implementation would try to prevent accidental execution of such left-over code, but if a C compiler is just a small portion of a larger build system the compiler might not have control over such issues, and I doubt that the authors of the Standard wanted to say that build systems that didn't let a compiler delete or invalidate an executable (that it might know nothing about) should be unable to host conforming C implementations.

What are the Constraints in Standard C?

C standards talk about constraints, e. g. ISO/IEC 9899:201x defines the term
constraint
restriction, either syntactic or semantic, by which the
exposition of language elements is to be interpreted
and says in chapter Conformance
If a ‘‘shall’’ or ‘‘shall not’’ requirement that appears outside of a
constraint or runtime-constraint is violated, the behavior is
undefined.
In chapter Environment, Subsection Diagnostics it is said
A conforming implementation shall produce at least one diagnostic
message (identified in an implementation-defined manner) if a
preprocessing translation unit or translation unit contains a
violation of any syntax rule or constraint, even if the behavior is
also explicitly specified as undefined or implementation-defined.
So, it is important to know what are the constraints in C, for example for compiler writers to judge when diagnostics are required, or for C programmers when diagnostics rather than just undefined behaviour can be expected.
Now, there are sections all over the standard document with the title Constraints, but I cannot find definitive wording as to what exactly the term constraint covers in the standard.
Are the constraints everything that appears in the sections titled Constraints?
Is every requirement that is stated outside of those sections not a constraint?
Is there a comprehensive description of constraint in the standard that I missed?
Are the constraints everything that appears in the sections titled Constraints?
In the sense of n1570 3.8 (a restriction imposed on programs which requires a conforming implementation to issue a compile-time diagnostic message when violated), I think yes.
Is every requirement that is stated outside of those sections not a constraint?
In the sense of 3.8, I think yes, but for a more circular reason: The standard's structure is fairly formal. Whenever applicable there seems to be an explicit Constraints section. Therefore I understand that by definition anything which is not in a Constraints section is not a constraint in the sense of 3.8.
There are a few "shall" clauses outside Constraints sections which appear completely compile-time enforceable, cf. below for a few examples. They are often in adjacent Semantics sections. I may be missing subtleties which prevent compile-time detection in the general case (so that a diagnosis cannot be made mandatory), or perhaps the standard is not completely consistent. But I would think that a compiler could simply translate a violating program, exactly because the requirements are not in a Constraints section.
Is there a comprehensive description of constraint in the standard that I missed?
I think 3.8 is all you get. I try to explore the term below and agree that the definition is unsatisfying.
I looked deeper into the standard to find that out. Here is my research.
The term constraint
Let's start with the basics. The definition of "constraint" in 3.8 which you quote is surprisingly hard to understand, at least without context ("restriction, either syntactic or semantic, by which the exposition of language elements is to be interpreted"). "Restriction" and "constraint" are synonyms, so that the rewording doesn't add much; and what is meant by "exposition of language elements"?? Exposition is a word with several meanings; let's take "writing or speech primarily intended to convey information" from Dictionary.com, and let's assume they mean the standard with that. Then it means basically that a constraint in this standard is a constraint of what is said in this standard. Wow, I wouldn't have guessed that.
Constraints as per 3.8
Pragmatically just examining the actual Constraints sections in the standard shows that they list compile time restrictions imposed on conforming programs. This makes sense because only compile-time constraints can be checked at compile time.
These additional restrictions are those which cannot be expressed in the C syntax.1
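A small sketch of a compile-time restriction that lives in a Constraints section, assuming a C11 or later compiler: C11 6.7.10 lists "the constant expression shall compare unequal to 0" as a constraint on `_Static_assert`, so a failing assertion is something a conforming implementation has to diagnose. The function name below is hypothetical and exists only so the block has something observable.

```c
#include <limits.h>

/* Both conditions hold on every conforming implementation, so these
   assertions pass; flipping either one would be a constraint violation
   and would require a diagnostic. */
_Static_assert(CHAR_BIT >= 8, "CHAR_BIT is at least 8");
_Static_assert(sizeof(int) * CHAR_BIT >= 16, "int has at least 16 bits");

int bits_in_int(void)
{
    return (int)(sizeof(int) * CHAR_BIT);
}
```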
Constraints outside Constraints sections
Most uses of "shall" outside of Constraints sections impose restrictions on a conforming implementation. Example: "All objects with static storage duration shall be initialized (set to their
initial values) before program startup", a job of a conforming implementation.
There are a few "shall" clauses imposing restrictions on a program (not the implementation) outside of Constraints sections though. I would argue that most fall in the same category as the "runtime constraints [...] on a program when calling a library function" mentioned in 3.18. They seem to be run time constraints which are not generally detectable at compile time (so that diagnostics can not be mandatory).
Here are a few examples.
In 6.5/7 n1570 details the much-debated aliasing rules:
An object shall have its stored value accessed only
by an lvalue expression that has one of
the following types:
a type compatible with the effective type of the object
a qualified version of a type compatible
with the effective type of the object,
[...]
In 6.5.16.1, "Simple Assignment":
If the value being stored in an object is read from another object that overlaps in any way
the storage of the first object, then the overlap shall be exact [...]
Other examples concern pointer arithmetic (6.5.6/8).
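To make the aliasing rule a little more concrete, here is a sketch that stays inside it. The list in 6.5/7 (elided above) also includes "a character type", so inspecting an object's bytes through `unsigned char *` is permitted. The function name is hypothetical, and the sketch assumes the implementation provides `uint32_t`.

```c
#include <stdint.h>

/* Permitted by 6.5/7: the stored value of a uint32_t object is read
   through an lvalue of character type. */
unsigned lowest_addressed_byte(const uint32_t *value)
{
    const unsigned char *bytes = (const unsigned char *)value;
    return bytes[0];
}

/* By contrast, reading the same object through a float * would violate
   the "shall" in 6.5/7. That requirement is outside a Constraints section,
   so the result is undefined behavior with no diagnostic required. */
```

Which byte comes back depends on the byte order of the target, which is why the function only reports the value instead of interpreting it.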
Shall clauses which could be in Constraints sections
But then there are other shall clauses whose violation should be detectable at compile time; I would not have blinked if they had appeared in the respective Constraints section.
6.6/6, "Cast operators in an integer constant
expression shall only convert arithmetic types to integer types" (under "Semantics"); what can you detect at compile time if you cannot detect types of constants and casts?
6.7/7, "If an identifier for an object is declared with no linkage, the type for the object shall be complete by the end of its declarator" (under "Semantics"). To me it seems to be a basic compiler task to detect whether a type is complete at some point in the code. But of course, I have never written a C compiler.
There are a few more examples. But as I said, I would think that an implementation is not required to diagnose violations. A violating program which manages to sneak past the compiler simply exhibits undefined behavior.
1 For example, I understand that the syntax doesn't deal with types -- it only has generic "expressions". Therefore every operator has a Constraints section detailing the permissible types of its arguments. Example for shift operators: "Each of the operands shall have integer type." A program which is trying to shift the bits of a float is violating this constraint, and the implementation must issue a diagnostic.
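A sketch of the footnote's shift example, under the loudly stated assumption that `float` is a 32-bit IEEE 754 value (the static assertion checks only the size). Writing `f >> 31` with a `float` operand violates the constraint and must be diagnosed; copying the representation into an integer first gives the shift an operand of integer type. The function name is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Assumption of this sketch: float and uint32_t have the same size. */
_Static_assert(sizeof(float) == sizeof(uint32_t), "sketch assumes 32-bit float");

/* Returns 1 for negative values (sign bit set), 0 otherwise. */
unsigned float_sign_bit(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);  /* copy the representation */
    return (unsigned)(bits >> 31);   /* shift an integer, not a float */
}
```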
The C committee addressed this issue in the response to Defect Report # 033. The question in that defect report was:
Is a conforming implementation required to diagnose all violations of ''shall'' and ''shall not'' statements in the standard, even if those statements occur outside of a section labeled Constraints?
The author of that defect report suggested a couple of possible alternative ways of interpreting the language of the standard. The second alternative he listed said (in part):
Syntax rules are those items listed in the Syntax sections of the standard. Constraints are those items listed in the Constraints sections of the standard.
Part of the committee's response was:
Suggested Interpretation #2 is the correct one.
I believe that covers your questions fairly completely, but just to state answers to your questions more directly:
Are the constraints everything that appears in the sections titled Constraints?
Is every requirement that is stated outside of those sections not a constraint?
A "constraint" is a requirement that is stated in a section explicitly marked "Constraints". Any requirement stated outside such a section is not a constraint.
Is there a comprehensive description of constraint in the standard that I missed?
At least as far as I know, the standard itself doesn't contain a more specific statement about what is or isn't a constraint, but the linked defect report does.
Are the constraints everything that appears in the sections titled Constraints?
It appears they mostly are (there are some cases which are not, e.g. it's stated that "Incrementing is equivalent to adding 1" in one of the constraint sections).
Is every requirement that is stated outside of those sections not a constraint?
I haven't seen a "constraint" outside those sections.
Is there a comprehensive description of constraint in the standard that I missed?
Probably not, if there were an authoritative such it would be in the standard and probably be the "constraint" sections (and explicitly mentioned that these are all "constraints").
My interpretation is that the chapter 3 should be interpreted so that every use of the defined terms would have the meaning defined in that section. Especially everywhere the term "constraint" is used it should be understood according to your first quote.
Your second quote is no exception. It's noted in the definition of the term "constraint" that there is no requirement that the constraint is explicitly termed a constraint. This means that you have to determine whether something is a "constraint" by checking whether it is such a restriction.
However, there seem to be quite a few examples of "shall" and "shall not" that could be taken to be such restrictions without being explicitly termed as such. That would leave all the other occurrences of "shall" and "shall not" as mandating or prohibiting a certain behavior of the implementation, and if these are not fulfilled, then yes, the behavior may be undefined (since you're using an implementation that doesn't conform to the standard).
It looks like everything that fits the definition of "constraint" seems to occur under a "constraint" section, and everything in the "constraint" sections seems to be a "constraint".
Are the constraints everything that appears in the sections titled Constraints?
Yes. Every syntactic and semantic restriction mentioned in the standard is a constraint.
For example, a constraint on Constant expressions (C11-6.6/3):
Constant expressions shall not contain assignment, increment, decrement, function-call, or comma operators, except when they are contained within a subexpression that is not evaluated.115)
Therefore, the constant expressions
3 = 5;
10++;
each show a constraint violation.
Note that in this case both the shall requirement and the constraint are violated.
Is every requirement that is stated outside of those sections not a constraint?
For standard conforming C, yes. A shall requirement on integer constant expression (C11-6.6/6):
An integer constant expression117) shall have integer type [...]
For example, an integer constant expression is required for size of a non-variable length array. Therefore,
int arr[5+1.5];
violates the shall requirement. The type of the expression 5+1.5 is not an integer type. This shall requirement appears outside a Constraints section.
It should be noted that a shall requirement may be a constraint too.
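For comparison, a sketch of a size expression that does satisfy 6.6/6: a floating constant is allowed in an integer constant expression when it is the immediate operand of a cast to an integer type. The names `arr` and `arr_length` are only illustrative.

```c
/* 5 + 1.5 has type double, so it is not an integer constant expression.
   Casting the floating constant first keeps every operand an integer:
   (int)1.5 is 1, so the array has 5 + 1 == 6 elements. */
int arr[5 + (int)1.5];

unsigned long arr_length(void)
{
    return (unsigned long)(sizeof arr / sizeof arr[0]);
}
```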
In my work in requirements engineering, the words "constraint" and "requirement" have different scope. It is important, also for the standard, to define those explicitly. I searched the word "constraint" in the standard and it seems I may draw the following conclusion:
A constraint is a limitation of either the input (pre-condition) or the output (post-condition) of the behavior the section of the standard describes. For input it means the input must adhere to the constraint (e.g. argc shall be positive). For output it means it must satisfy the constraint for any following unit of the standard to have a well-defined input (its pre-condition).
A requirement is part of the specification of the behavior of the section of the standard. "Shall" is a positive description of what is required; "shall not" is generally a limitation, but not a constraint, though it may participate in meeting a constraint on its output.
Constraints and requirements can be seen as "external interfaces" (the constraints) and "system behavior/processing" (the requirements).
Shall generally denotes a requirement (a phrase without "shall" is hence not a requirement). "Shall" used in a constraint is then either used to define the input or output (e.g. argc shall be positive) or specifies behavior concerning validating the constraint (e.g. "...shall give a diagnostic message").
Strictly speaking, "shall" used in specifying behavior of validating an input constraint should not be listed in the constraint section (should not be listed in the interface specification) but in a processing section (behavior section).
Note that there can be no validation of an output constraint, as the output should adhere to the specification; only a following unit can check those constraints, if they are among its input constraints.
This may be a personal view but it seems to fit the uses of these words in the standard.
constraint
restriction, either syntactic or semantic, by which the
exposition of language elements is to be interpreted
This means that every explicit restriction on program logic or syntax set by the C standard in any way is a constraint. This includes syntactic constraints (e.g. statements must be terminated with a ;) and semantic constraints (e.g. you shall not use a variable before initializing it), basically everything that is either syntactically (notation-wise) or semantically (usage of correct notation-wise) not allowed or defined as not allowed (undefined behaviour).
Is every requirement that is stated outside of those sections not a
constraint?
I do think that all explicit requirements for the programming in the C language fall either under a syntactic or semantic constraint.
Is there a comprehensive description of constraint in the standard
that I missed?
Not to my knowledge.
The purpose of constraints in the Standard is to specify conditions where a conforming implementation would be required to issue a diagnostic, or to allow implementations to process a program in ways contrary to what would be required absent the constraint, in cases where doing so might be more useful than the otherwise-specified behavior. Although Strictly Conforming C Programs are not allowed to violate constraints (no program that violates a constraint is a Strictly Conforming C Program), no such restriction applies to programs that are intended to be Conforming but not Strictly Conforming.
The C Standard was written as a compromise among multiple overlapping factions, including
those who thought that it should discourage programmers from writing code that wouldn't work on all platforms interchangeably
those who thought it should allow programmers who were targeting known platforms to exploit features that were common to all of the platforms they'd need to support, even if they wouldn't be supportable on all platforms
those who thought that compilers should be allowed to diagnose constructs and actions which would be performed more often by accident than deliberate intent
those who thought that it should allow programmers to do things like perform address calculations which would appear erroneous, but which would, if performed precisely as specified, yield the address of the object the programmer was expecting.
In order to achieve a consensus among these groups, the Standard imposed limits on what could be done within Strictly Conforming C Programs, but also wrote the definition of Conforming C Program broadly enough that almost no useful programs would be branded non-conforming, no matter how obscure the extensions upon which they rely. If a source-code construct would violate a diagnosable constraint, but an implementation's customers would find it useful anyhow, then the implementation could output a diagnostic which its customers could ignore (even an unconditional: "Warning: This implementation doesn't bother outputting diagnostics its author thinks are silly, other than this one" would suffice) and everybody could get on with life.

Strict ISO C Conformance Test

I am currently working on a C project that needs to be fairly portable among different building environments. The project targets POSIX-compliant systems on a hosted C environment.
One way to achieve a good degree of portability is to code under conformance to a chosen standard, but it is difficult to determine whether a given translation unit is strict-conformant to ISO C. For example, it might violate some translation limits, or it might be relying on an undefined behavior, without any diagnostic message from the compilation environment. I am not even sure whether it is possible to check for strict conformance of large projects.
With that in mind, is there any compiler, tool or method to test for strict ISO C conformance under a given standard (for example, C89 or C99) of a translation unit?
Any help is appreciated.
It is not possible in general to find undefined run-time behavior. For example, consider
void foo(int *p, int *q)
{
    *p = (*q)++;
    ...
which is undefined if p == q. Whether that can happen can't be determined ahead of time without solving the halting problem.
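Since the aliasing case cannot in general be ruled out ahead of time, one option is to detect it at run time. The following is only a minimal sketch of that idea; the wrapper name `assign_post_increment` is hypothetical and not part of the original answer.

```c
#include <stdbool.h>

/* Hypothetical wrapper (an assumption for illustration): performs
 * *p = (*q)++ only when p and q point to different objects.
 * Returns false, leaving both objects untouched, when p == q,
 * because that case would be undefined behavior. */
bool assign_post_increment(int *p, int *q)
{
    if (p == q) {
        return false;   /* same object modified twice, unsequenced */
    }
    *p = (*q)++;        /* well defined: two distinct objects */
    return true;
}
```

A caller would check the return value and handle the refused case itself. This moves the problem to run time; it does not make the original `foo` statically checkable.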
(Edited to fix mistake caf pointed out. Thanks, caf.)
Not really. The C standard doesn't set any absolute minimum limits on translation units that must be accepted. As such, a perfectly accurate checker would be trivial to write, but utterly useless in practice:
#include <stdio.h>

int main(int argc, char **argv) {
    int i;
    /* Report every file named on the command line as over the limit. */
    for (i = 1; i < argc; i++)
        fprintf(stderr, "`%s`: Translation limit (potentially) exceeded.\n", argv[i]);
    return 0;
}
Yes, this rejects everything, no matter how trivial. That is in accordance with the standard. As I said, it's utterly useless in practice. Unfortunately, you can't really do a whole lot better -- when you decide to port to a different implementation, you could run into some oddball resource limit you've never seen before, so any code you write (up to and including "hello world") could potentially exceed a resource limit despite being allowed by dozens or even hundreds of compilers on/for much smaller systems.
Edit:
Why a "hello world" program isn't strictly conforming
First, it's worth re-stating the definition of "strictly conforming": "A strictly conforming program shall use only those features of the language and library specified in this International Standard. It shall not produce output dependent on any unspecified, undefined, or implementation-defined behavior, and shall not exceed any minimum implementation limit."
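As a clear-cut illustration of the "output dependent on ... implementation-defined behavior" clause, here is a small sketch. The function name `int_storage_bits` is made up for this example; the value it returns is implementation-defined because `sizeof(int)` is.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper (an assumption for illustration): number of bits
 * occupied by an int, including any padding bits. The result differs
 * between implementations, so a program that prints it produces output
 * that depends on implementation-defined behavior. */
size_t int_storage_bits(void)
{
    return sizeof(int) * CHAR_BIT;
}
```

A program that prints this value can be conforming, but by the definition quoted above it would not be strictly conforming. The only portable guarantee is that the result is at least 16.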
There are actually a number of reasons "Hello, World" isn't strictly conforming. First, as implied above, the minimum requirements for implementation limits are completely meaningless -- although there has to be some program that meets certain limits that will be accepted, no other program has to be accepted, even if it doesn't even come close to any of those limits. Given the way the requirement is stated, it's open to question (at best) whether there is any such thing as a program that doesn't exceed any minimum implementation limit, because the standard doesn't really define any minimum implementation limits.
Second, during phase 1 of translation: "Physical source file multibyte characters are mapped, in an implementation-defined manner, to the source character set ... " (§5.1.1.2/1). Since "Hello, World!" (or whatever variant you prefer) is supplied as a string literal in the source file, it can be (is) mapped in an implementation-defined manner to the source character set. An implementation is free to decide that (for an idiotic example) string literals will be ROT13 encoded, and as long as that fact is properly documented, it's perfectly legitimate.
Third, the output is normally written via stdout. stdout is a text stream. According to the standard: "Characters may have to be added, altered, or deleted on input and output to conform to differing conventions for representing text in the host environment. Thus, there need not be a one-to-one correspondence between the characters in a stream and those in the external representation." (§7.19.2/2) As such, an implementation could (for example) do Huffman compression on the output (on Monday, Wednesday, or Friday).
So, we have (at least) three distinct points at which the output from a "Hello, World!" depends on implementation-defined characteristics -- any one of which would prevent it from fitting the definition of a strictly conforming program.
gcc has warning levels that will attempt to pin down various aspects of ANSI conformance. But that's only a starting point.
You might start with gcc -std=c99, or gcc -ansi -pedantic.
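Whichever mode is selected, a translation unit can at least see which standard revision the compiler claims to implement through the standard `__STDC_VERSION__` macro. This is only a sketch; the function name `claimed_c_version` is hypothetical, and the macro reports the compiler's claim, not the conformance of the code.

```c
/* Hypothetical helper (an assumption for illustration): returns the
 * value of __STDC_VERSION__ (for example 199901L for C99), or 0 when
 * the macro is not defined, as in a C89/C90 mode. */
long claimed_c_version(void)
{
#if defined(__STDC_VERSION__)
    return __STDC_VERSION__;
#else
    return 0L;
#endif
}
```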
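Good luck with that. Try to avoid signed integers, because:
int f(int x)
{
    return -x;
}
can invoke UB: when x is INT_MIN on a two's complement implementation, -x is not representable in int, so the negation overflows.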
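If signed negation cannot be avoided, the overflowing case can be tested for before negating. This is a minimal sketch, assuming a C99 or later compiler; the name `checked_negate` is hypothetical.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper (an assumption for illustration): stores -x in
 * *out and returns true when the result is representable in int;
 * returns false and leaves *out untouched otherwise. */
bool checked_negate(int x, int *out)
{
    /* -x exceeds INT_MAX exactly when x < -INT_MAX, which on two's
     * complement happens only for x == INT_MIN. */
    if (x < -INT_MAX) {
        return false;
    }
    *out = -x;
    return true;
}
```

The comparison against `-INT_MAX` rather than `INT_MIN` keeps the check independent of how negative numbers are represented.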

Resources