Does "strictly conforming program" + no extensions mean "no diagnostics emitted"? - c

Follow-up question for: clang: <string literal> + <expression returning int> leads to confusing warning: adding 'int' to a string does not append to the string.
Does "strictly conforming program" + no extensions mean "no diagnostics emitted"?
Reason: better understanding of the term "strictly conforming program".

An implementation may generate diagnostics even if a program is conforming.
Section 5.1.1.3p1 of the C standard regarding diagnostics states:
A conforming implementation shall produce at least one diagnostic
message (identified in an implementation-defined manner) if a
preprocessing translation unit or translation unit contains a
violation of any syntax rule or constraint, even if the behavior is
also explicitly specified as undefined or implementation-defined.
Diagnostic messages need not be produced in other
circumstances.9)
The intent is that an implementation should identify the
nature of, and where possible localize, each violation. Of
course, an implementation is free to produce any number of
diagnostics as long as a valid program is still correctly
translated. It may also successfully translate an invalid program
The portion in bold in footnote 9 states that additional diagnostics may be produced.

Does "strictly conforming program" + no extensions == no diagnostics emitted?
No.
The only things for which the language specification requires diagnostics to be emitted are invalid syntax and constraint violations:
A conforming implementation shall produce at least one diagnostic
message (identified in an implementation-defined manner) if a
preprocessing translation unit or translation unit contains a
violation of any syntax rule or constraint, even if the behavior is
also explicitly specified as undefined or implementation-defined.
Diagnostic messages need not be produced in other circumstances.
(C2017, 5.1.1.3/1; emphasis added)
By definition, a strictly conforming program exhibits only valid syntax and does not contain any constraint violations, therefore the specification does not require a conforming implementation to emit any diagnostics when presented with such a program.
HOWEVER, the specification does not forbid implementations to emit diagnostics other than those that are required, and most implementations do, under some circumstances, emit diagnostics that are not required. The specification allows this, as clarified by footnote 9, which says, in part:
Of course, an
implementation is free to produce any number of diagnostics as long as
a valid program is still correctly translated.
Note also that "'strictly conforming program' + no extensions" is redundant. A program that makes use of any language extensions may conform, but it does not strictly conform:
A strictly conforming program shall use only those features of the
language and library specified in this International Standard. It
shall not produce output dependent on any unspecified, undefined,or
implementation-defined behavior, and shall not exceed any minimum
implementation limit.
(C2017 4/5; emphasis added)

Related

What is the rationale for "semantics violation does not require diagnostics"?

Follow-up question for: If "shall / shall not" requirement is violated, then does it matter in which section (e.g. Semantics, Constraints) such requirement is located?.
ISO/IEC 9899:202x (E) working draft— December 11, 2020 N2596, 5.1.1.3 Diagnostics, 1:
A conforming implementation shall produce at least one diagnostic message (identified in an
implementation-defined manner) if a preprocessing translation unit or translation unit contains a
violation of any syntax rule or constraint, even if the behavior is also explicitly specified as undefined or implementation-defined. Diagnostic messages need not be produced in other circumstances.
Consequence: semantics violation does not require diagnostics.
Question: what is the (possible) rationale for "semantics violation does not require diagnostics"?
A possible rationale is given by Rice's theorem : non-trivial semantic properties of programs are undecidable
For example, division by zero is a semantics violation; and you cannot decide, by static analysis alone of the C source code, that it won't happen...
A standard cannot require total detection of such undefined behavior, even if of course some tools (e.g. Frama-C) are sometimes capable of detecting them.
See also the halting problem. You should not expect a C compiler to solve it!
The C99 rationale v5.10 gives this explanation:
5.1.1.3 Diagnostics
By mandating some form of diagnostic message for any program containing a syntax error or
constraint violation, the Standard performs two important services. First, it gives teeth to the
concept of erroneous program, since a conforming implementation must distinguish such a program from a valid one. Second, it severely constrains the nature of extensions permissible to
a conforming implementation.
The Standard says nothing about the nature of the diagnostic message, which could simply be
“syntax error”, with no hint of where the error occurs. (An implementation must, of course,
describe what translator output constitutes a diagnostic message, so that the user can recognize it as such.) The C89 Committee ultimately decided that any diagnostic activity beyond this level is
an issue of quality of implementation, and that market forces would encourage more useful
diagnostics. Nevertheless, the C89 Committee felt that at least some significant class of errors
must be diagnosed, and the class specified should be recognizable by all translators.
This happens because the grammar of the C language is context-sensitive and for all the languages that are defined with context-free or more complex grammars on the Chomsky hierarchy one must do a tradeoff between the semantics of the language and its power.
C designers chose to allow much power for the language and this is why the problem of undecidability is omnipresent in C.
There are languages like Coq that try to cut out the undecidable situations and they restrict the semantics of the recursive functions (they allow only sigma(primitive) recursivity).
The question of whether an implementation provides any useful diagnostics in any particular situation is a Quality of Implementation issue outside the Standard's jurisdiction. If an implementation were to unconditionally output "Warning: this program does not output any useful diagnostics" or even "Warning: water is wet", such output would fully satisfy all of the Standard's requirements with regard to diagnostics even if the implementation didn't output any other diagnostics.
Further, the authors of the Standard characterized as "Undefined Behavior" many actions which they expected would be processed in a meaningful and useful fashion by many if not most implementations. According to the published Rationale document, Undefined Behavior among other things "identifies areas of conforming language extension", since implementations are allowed to specify how they will behave in cases that are not defined by the Standard.
Having implementations issue warnings about constructs which were non-portable, but which they would process in a useful fashion would have been annoying.
Prior to the Standard, some implementations would usefully accept constructs like:
struct foo {
int *p;
char pad [4-sizeof (int*)];
int q,r;
};
for all sizes of pointer up to four bytes (8-byte pointers weren't a thing back then), rather than squawking if pointers were exactly four bytes, but some people on the Committee were opposed to the idea of accepting declarations for zero-sized arrays. Thus, a compromise was reached where compilers would squawk about such things, programmers would ignore the useless warnings, and the useful constructs would remain usable on implementations that supported them.
While there was a vague attempt to distinguish between constructs that should produce warnings that programmers could ignore, versus constructs that might be used so much that warnings would be annoying, the fact that issuance of useful diagnostics was a Quality of Implementation issue outside the Standard's jurisdiction meant there was no real need to worry too much about such distinctions.

Can extension cancel the existing standard requirements?

Follow-up question for Why do conforming implementations behave differently w.r.t. incomplete array types with internal linkage?.
Context: in both gcc and clang (conforming implementations) by default the requirement C11,6.9.2p3 [1] is cancelled, which is positioned as an extension.
Question: can an extension cancel the existing standard requirements while keeping the implementation conforming?
[1] C11, 6.9.2 External object definitions, 3:
If the declaration of an identifier for an object is a tentative definition and has internal linkage, the declared type shall not be an incomplete type.
UPD. Yes. In other words: the standard says: "we do not support this, the diagnostics is required". The extension says: "we do support this (hence, the standard required diagnostics is irrelevant)".
It's not so much that an implementation "cancels" a requirement with an extension, but that extensions add features that the standard doesn't otherwise support. The main requirement is that extensions doesn't alter any strictly conforming programs.
The definition of a conforming implementation is as follows from section 4p6 of the C11 standard:
The two forms of conforming implementation are hosted and
freestanding. A conforming hosted implementation shall accept
any strictly conforming program. A conforming freestanding implementation shall accept any strictly conforming program in which the use of the features specified in the library clause (clause 7) is confined to the contents of the standard headers [ ... omitted for brevity ... ]. A conforming
implementation may have extensions (including additional library
functions), provided they do not alter the behavior of any
strictly conforming program
Where a strictly conforming program is defined in section 4p5:
A strictly conforming program shall use only those features
of the language and library specified in this International
Standard. It shall not produce output dependent on any
unspecified, undefined, or implementation-defined behavior, and
shall not exceed any minimum implementation limit
And a conforming program is defined in section 4p7:
A conforming program is one that is acceptable to a conforming implementation.
So given the case of the program from your prior question:
static int arr[ ];
int main( void )
{
return arr[ 0 ];
}
static int arr[ ] = { 0 };
This is not a strictly conforming program because it violates 6.9.2p3. However some implementations such as gcc allows this as an extension. Supporting such a feature doesn't prevent a similar strictly conforming program such as this
static int arr[1];
int main( void )
{
return arr[ 0 ];
}
static int arr[ ] = { 0 };
From behaving any differently. Therefore an implementation supporting this feature still qualifies as a conforming implementation. This also means that the first program, while not a strictly conforming program, is a conforming program because it will run in a well defined manner on a conforming implementation.
The Standard requires that if a program violates a constraint which is in a constraints section, an implementation must issue at least one diagnostic. There is no requirement as to whether the document be meaningful or have any relation to the constraint violation. An implementation that unconditionally output "Warning: this implementation makes no attempt to enforce constraints its author views as stupid" would suffice. Likewise an implementation that includes a command-line option to output such a message and documents that it may not be conforming unless that option is specified.
Note that even that requirement has a loophole: if a program exceeds an implementation's translation limits, the implementation may behave in any manner whatsoever, without limitation, and without having to issue any sort of diagnostic. Although the Standard requires for each implementation there must exist at least one source program that at least nominally exercises the translation limits given in the Standard without causing the implementation to malfunction, an implementation may impose arbitrary restrictions on how the translation limits interact, e.g. allowing a program to either contain one identifier of up to 63 character, or a larger number of identifiers that are no more than three characters long. There are very few circumstances where anything an otherwise-conforming implementation might do with a particular source text would render it non-conforming.

Is providing the ability to violate "shall" requirement w/o generation of a diagnostic message a compiler bug / defect or feature?

Context: The C standard does not classify diagnostic messages as "warnings" or "errors".
Question: By treating certain "diagnostic messages" as "warnings" and by giving the ability to disable generation of warnings, certain compiler implementations allow to the end user to violate "shall" requirements of the C standard w/o generation of a diagnostic messages. Is this allowance a compiler bug / defect? If not, then how to correctly interpret this case? As a "compiler feature that allows to violate "shall" requirement w/o generation of a diagnostic message"?
Example:
#pragma warning( disable : 34 )
typedef int T[];
int main()
{
return sizeof(T);
}
$ cl t28.c /Za
<no diagnostic messages, the "shall" requirement [1] is silently violated>
[1] ISO/IEC 9899:1990:
The sizeof operator shall not be applied to an expression that has function type or an incomplete type.
UPD.
If /Za (Disable Language Extensions) is specified, then __STDC__ is defined with definition 1.
According to ANSI Conformance page (https://learn.microsoft.com/en-us/cpp/c-language/ansi-conformance?view=msvc-160):
Microsoft C conforms to the standard for the C language as set forth in the 9899:1990 edition of the ANSI C standard.
However, cl gives to the end user the ability to disable "shall requirement originated" warnings. Is it a compiler bug / defect or feature? Need to to correctly interpret this case.
C 2018 6.10.6 discusses the #pragma directive. Paragraph 1 says:
… causes the implementation to behave in an implementation-defined manner. The behavior might cause translation to fail or cause the translator or the resulting program to behave in a non-conforming manner…
That largely licenses the implementation to do anything it wants, as long as it documents it. If #pragma warning( disable : 34 ) is documented to disable the warning, and that is what it does, then that is conforming.
Note in particular that the #pragma “might … cause the translator … to behave in a non-conforming manner.” So, doing something that is otherwise non-conforming because a pragma told you to is conforming.
(I think the original text should say that the #pragma may cause the translator or program to behave in an otherwise non-conforming manner. Because, as currently written, behaving in this documented non-conforming manner is conforming, not non-conforming.)
"shall" (and "shall not") requirements in the standard come in two distinct kinds: restrictions on the program and restrictions on the implementation.
Restrictions on the implemention are things the implementation must (or must not) do -- these may have mandatory diagnostics associated with them.
Restrictions on the program are in fact freedoms for the implementation -- they are things that -- if the program does them -- cause undefined behavior, so the implementation can do anything with them and still be conforming.
The example you have above "The sizeof operator shall not be applied to an expression that "... is a restriction on the program. So a program that does that is not conforming and an implementation can do anything it wants (including treating it as an extension without any requirement for a flag or pragma) and still be conforming.

Is an implementation that allows you turn off diagnostics required by the standard still conforming?

Consider the following:
typedef int;
int main () { return 0; }
If I compile this with clang with no warning specifications I get
warning: typedef requires a name [-Wmissing-declarations]
typedef int;
That's to be expected; typedef int is illegal per section 6.7 of the C11 standard, and per section 5.1.1.3,
A conforming implementation shall produce at least one diagnostic message if a preprocessing translation unit or translation unit contains a violation of any syntax rule or constraint.
If I compile this using clang -Wno-missing-declarations, it compiles clean, without any diagnostic messages.
My question:
Does this mark clang as a non-conforming implementation, or is it okay to provide the ability to disable what would otherwise be mandatory diagnostics?
From the draft C11 standard section 4 Conformance we see that it is not strictly conforming:
A strictly conforming program shall use only those features of the
language and library specified in this International Standard.3) It
shall not produce output dependent on any unspecified, undefined, or
implementation-defined behavior, and shall not exceed any minimum
implementation limit.
but it is a conforming implementation since a conforming implementation is allowed to have extensions as long as they don't break a strictly conforming program:
[...]A conforming implementation may have extensions (including
additional library functions), provided they do not alter the behavior
of any strictly conforming program.4)
The C-FAQ says:
[...]There are very few realistic, useful, strictly conforming programs. On the other hand, a merely conforming program can make use of any compiler-specific extension it wants to.

Does C Standard guarantee diagnostic message for #error directive?

I have some trouble understanding semantics of 5.1.1.3/1 Diagnostics subclause from N1570 C11 draft (emphasis mine):
A conforming implementation shall produce at least one diagnostic message (identified in an implementation-defined manner) if a preprocessing translation unit or translation unit contains a violation of any syntax rule or constraint, even if the behavior is also explicitly specified as undefined or implementation-defined. Diagnostic messages need not be produced in other circumstances.9
I understand that the intent was to exclude (non-constraint) undefined behavior (thus that are no diagnostics on e.g. buffer overflow), but what about #error directive? As in 6.10.5/1 Error directive:
A preprocessing directive of the form
# error pp-tokensopt new-line
causes the implementation to produce a diagnostic message that includes the specified sequence of preprocessing tokens.
Does these both subclauses are not mutually exclusive?
For some other reference see also DR#176.
C99 went a step further than the suggested resolution for that DR. Instead of requiring a diagnostic, they require treating it as an error.
4. Conformance
4 The implementation shall not successfully translate a preprocessing translation unit
containing a #error preprocessing directive unless it is part of a group skipped by
conditional inclusion.
Now, strictly speaking, perhaps you're right that an implementation could choose to refuse to compile a program containing an #error directive without issuing a diagnostic, claiming that 5.1.1.3 allows it to ignore the semantics for #error. However, an implementation that goes to such lengths to being as useless as possible within the bounds set by the standard, would easily work around any attempt to require a diagnostic: the implementation could simply dump the complete preprocessor output (including anything following #error), and follow that by "there's an error in there somewhere". Because of that, it effectively doesn't matter whether the standard requires a diagnostic. There's no excuse for an implementation not to do so, and very little that the standard could to do force unwilling implementors.
As someone who served on the committee, our response to a question like this would often begin with a phrase along the lines of "A careful reading of the standard ...", which is not quite as glib as it sounds. the language used is very specific. Consider the first clause
A conforming implementation shall produce at least one diagnostic
message (identified in an implementation-defined manner) if a
preprocessing translation unit or translation unit contains a
violation of any syntax rule or constraint, even if the behavior is
also explicitly specified as undefined or implementation-defined.
Diagnostic messages need not be produced in other circumstances.9
This clause is explicitly referring to a "violation of any syntax rule or constraint". A well formed #error directive does not, therefore, trigger this clause. Thus it does not apply, and mutual-exclusion is moot.
Also note the final sentence, where it says that diagnostics " need not be produced ". The term "need not" does not imply "must not". It simply means the implementation has the option whether or not to issue a diagnostic for other conditions (for example, style concerns). But again this entire clause is irrelevant for #error
Your second quote simply states exactly what an implementation must do for a well-formed #error directive.

Resources