When compiling this code with the latest version of gcc (or clang) using -std=c17 -pedantic-errors -Wall -Wextra
static const int y = 1;
static int x = y;
then I get no compiler diagnostic message even though I'm fairly sure that this is not valid C, but a constraint violation. We can prove that it is non-conforming by taking a look at C17 6.7.9/4:
Constraints
...
All the expressions in an initializer for an object that has static or thread storage duration shall be constant expressions or string literals.
Then the definition of constant expressions, in this case an integer constant expression (6.6):
An integer constant expression shall have integer type and shall only have operands that are integer constants, enumeration constants, character constants, sizeof expressions whose results are integer constants, _Alignof expressions, and floating constants that are the immediate operands of casts.
And then finally the definition of integer constants (6.4.4.1/2):
An integer constant begins with a digit, but has no period or exponent part. It may have a prefix that specifies its base and a suffix that specifies its type.
Thus a const int variable is not an integer constant nor is it an integer constant expression. And therefore not a valid initializer. This has been discussed before (for example here) and I think it's already established that this is non-conforming. However, my question is:
Why did gcc choose to be non-compliant even in strict mode?
clang has apparently always been non-compliant, but gcc changed from being compliant in version 7.3 to non-compliant in version 8.0 and above. gcc 7.3 and earlier gives "error: initializer element is not constant" even in default mode without -pedantic-errors.
Some sort of active, conscious decision seems to have been made regarding this message. Why was it removed entirely in gcc and why didn't they leave it as it was when compiling in strict mode -std=c17 -pedantic-errors?
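For comparison, here is a minimal sketch of initializer forms built from true integer constant expressions; as far as I can tell, every version of gcc and clang accepts these silently (the identifiers are illustrative):

enum { Y = 1 };           /* an enumeration constant is an integer constant expression */
static int x1 = Y;
static int x2 = 1;        /* an integer constant */
static int x3 = (int)1.0; /* a floating constant as the immediate operand of a cast */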
Why did gcc choose to be non-compliant even in strict mode?
Inasmuch as the question as posed is directed to the motivation of the developers, the only information we have to go on as third parties comes from the public development artifacts, such as GCC bugzilla, repository commit messages, and actual code. As was pointed out in comments, the matter is discussed in the Bugzilla comment thread associated with the change.
The Bugzilla discussion appears to show that the developers considered the standard's requirements in this area, albeit in a somewhat perfunctory manner. See in particular comments 9 and 10. They raise paragraph 6.6/10 of the language specification:
An implementation may accept other forms of constant expressions.
They do not subject this to any particular scrutiny, and I read the comments more as seeking a justification for the change than as a thoughtful inquiry into GCC conformance considerations.
Thus, they made the change because they wanted to implement the feature request, and they found sufficient (for them) justification in the language of the standard to consider the altered behavior to be consistent with language constraints, therefore not requiring a diagnostic.
There is also an implied question of whether recent GCC's silent acceptance of the declaration forms presented in fact violates conforming processors' obligation to diagnose constraint violations.
Although it is possible to interpret 6.6/10 as allowing implementations to accept any expressions they choose as conforming to the requirements for any kind of constant expression, that seems fraught. Whether a given piece of code satisfies the language's constraints should not be implementation dependent. Either of these points of interpretation, if accepted, would resolve that problem:
6.6/10 should be interpreted as expressing a specific case of the general rule that a conforming implementation may accept non-conforming code, without implying that doing so entitles the processor to treat the code as conforming.
6.6/10 should be interpreted as permitting processors to interpret more expressions as "constant expressions" than those described in the preceding paragraphs, but that has no bearing on the definitions of the specific kinds of constant expressions defined in those paragraphs ("integer constant expressions" and "arithmetic constant expressions").
Those are not mutually exclusive. I subscribe to the latter, as I have written previously, and I tend to favor the former as well.
Related
I don't understand the Undefined Behaviours in C99 related to constant expressions.
For example:
An expression that is required to be an integer constant expression does not have an integer type; has operands that are not integer constants, enumeration constants, character constants, sizeof expressions whose results are integer constants, or immediately-cast floating constants; or contains casts (outside operands to sizeof operators) other than conversions of arithmetic types to integer types (6.6).
I can't find an example of such UB?
Furthermore, I don't understand why a constant expression (evaluated at translation time) does not simply become an expression evaluated at runtime (instead of being UB).
This is quoted from the informative Annex J. To find the actual normative text you have to go to the section that Annex J points to, in this case the definition of an integer constant expression, C99 6.6:
An integer constant expression shall have integer type and shall only have operands that are integer constants, enumeration constants, character constants, sizeof expressions whose results are integer constants, and floating constants that are the immediate operands of casts.
That text is pretty self-explanatory IMO. That is: whenever syntax or normative text elsewhere requires an integer constant expression, whatever you place at such a location must fulfil the above quoted part, or it is not an integer constant expression but undefined behavior. (Violating a "shall" requirement in normative ISO C text is always UB.)
I'd expect compilers to be good at giving errors for this since it's compile-time UB.
For example, this is invalid, since an array declaration with static storage duration requires the size to be an integer constant expression:
int a = 1;
static int x[a];
Similarly, int x[1 + 1.0]; would be invalid, but int x[1 + (int)1.0]; is OK.
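Putting these cases together in one compilable sketch (a hedged illustration; the commented-out declarations are the constraint violations):

static int ok[1 + (int)1.0];  /* valid: (int)1.0 is a floating constant immediately cast */
/* static int bad1[1 + 1.0];     invalid: 1.0 is not the immediate operand of a cast */
/* int a = 1;
   static int bad2[a];           invalid: 'a' is not an integer constant expression */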
According to N1570 6.6p10, "An implementation may accept other forms of constant expressions." In general, situations where an implementation would be allowed to reject a program, but would also be allowed to accept it, are classified as Undefined Behavior. While it might be helpful to specify that an implementation given something like (at file scope):
int x,y;
int sz = (uintptr_t)&y - (uintptr_t)&x;
would be required to either reject the program, or else behave as though sz is initialized to a value matching what would be computed if the indicated conversions and subtraction would be performed at runtime, such constructs would often require linker support, and a compiler may have no way of knowing for certain what constructs the linker would support, or what it would do if code uses an unsupportable construct.
The Standard does not use the term "Undefined Behavior" purely to refer to erroneous constructs, but also applies it to non-portable ones which might be unsupportable or erroneous on some implementations but correct on others. The authors of the Standard note that Undefined Behavior, among other things, identifies potential areas of "conforming language extension" by allowing implementations to define behaviors beyond those mandated by the Standard. Viewed in that light, classifying the processing of non-standard forms of integer constant expressions as Undefined Behavior allows compilers to support such constructs when practical and useful, without imposing requirements on the behavior of such constructs that some implementations might be unable to meet.
Returning to the earlier example, a compiler might compute the difference between &y and &x as the difference between the two objects' offsets within their respective data sections. Such a computation might only be useful if the objects happened to be defined in the same translation unit, and might yield a meaningless value, without necessarily issuing a diagnostic, if they're not. A compiler, however, would have no way of knowing whether the objects are defined in the same translation unit, and the Standard would have no concept of code whose behavior would be meaningfully defined if two externally-defined objects are defined in the same compilation unit, but not if they aren't. The Standard term for behavior that implementations would define in some cases, but not in others, based upon criteria outside the Standard's jurisdiction, is "Undefined Behavior".
Are there features / semantics introduced, or removed, in C99 which would make a well-defined program written in C89 either
invalid (i.e. not compiling anymore, according to the C99 standard)
compiling, but having different semantics.
My findings so far, concerning plainly invalid programs:
implicit int (C89 §3.5.2)
implicit function declaration (C89 §3.3.2.2)
not returning from a function expecting a return value (C89 §3.6.6.4)
using new keywords as identifiers (for example restrict, inline, etc.)
hacks involving //, which are now treated as comments. However, nearly never encountered in production code.
Subtle changes, making the same code having different semantics:
Integer division has been made well defined: for example, -3 / 2 now has to truncate towards zero (C99 §6.5.5/6) instead of being implementation-defined (C89 §3.3.5/6). (A sketch follows this list.)
strtod gained the ability to parse hexadecimal numbers in C99, by parsing 0x or 0X
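A small sketch of the division change from the first item above; under C99 both results are fixed, while a C89 implementation was free to round the quotient toward negative infinity:

#include <stdio.h>

int main(void)
{
    /* C99 requires truncation toward zero, so this prints -1 and -1.
       A C89 compiler could legally have printed -2 and 1 instead,
       since (a/b)*b + a%b == a holds either way. */
    printf("%d\n", -3 / 2);
    printf("%d\n", -3 % 2);
    return 0;
}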
What have I missed?
There are a lot of programs which would have been considered valid under C89, prior to the publication of C99, but which some people insist were never valid. C89 includes a rule that an object of any type may only be accessed using a pointer of that type, a related type, or a character type. Prior to the publication of C99, this rule was generally interpreted as applying only to "named" objects (variables of static or automatic duration which are accessed directly by name), and only in situations where the object in question didn't have its address taken immediately before it was used as a different pointer type. Such an interpretation was motivated by a number of factors:
1. One of the stated goals of the Standard was to fit with what existing compilers and programs were doing, and while it would have been rare for existing programs to access discrete named variables using pointers of different types other than in cases where the variable's address was taken immediately before such use, many other usages of pointer type punning were quite common.
2. The rationale for the Standard includes as its sole example a function which receives a pointer of one primitive type to write a global variable of another primitive type in such a way that a compiler would have no particular reason to expect aliasing. Being able to keep global variables in registers is clearly a useful optimization, and the stated purpose of the rule is to allow such optimizations in cases where a compiler would have no reason to expect aliasing to occur. Outlawing constructs like *(int*)&foo = 23; does nothing to aid such optimizations, since the fact that code is taking foo's address and dereferencing it should make it abundantly clear to any compiler that isn't being deliberately obtuse that the code is going to modify foo.
3. There are many kinds of code which semantically require the ability to use memory bits as various types, and nothing in the Standard indicates that the rules were intended to make programmers jump through hoops (e.g. by using memcpy) to achieve semantics that could have been easily obtained in the absence of the rules, especially considering that using memcpy would prevent the compiler from keeping global variables in registers across the pointer accesses (thus defeating the purpose for which the rules were written in the first place).
4. If structure types V and W have a common initial sequence, U is any union type containing both, and p is a V* which identifies the V within a U, then (W*)(U*)p may be used to access those common members, and will be equivalent to (W*)p. Unless a compiler could show that p couldn't possibly be a pointer to a member of some union containing W, it would be required to allow (W*)p to access the common members; it was more helpful to simply treat such common member access as being legitimate regardless of whether or where U might exist than to search for excuses to deny it. (A sketch of this construct follows the list.)
5. Nothing in the C89 rules makes clear how the "type" of a region of allocated storage is defined, or how storage which holds things of one type that are no longer needed might be re-purposed to hold things of another.
6. Keeping track of registers allocated to named variables was easier than keeping track of registers allocated to other pointer expressions, and code which was interested in minimizing the number of loads and stores via pointers would often copy things to named variables and work on them there.
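As promised in item 4, a hedged sketch of the common-initial-sequence construct (the type names are illustrative):

#include <stdio.h>

/* V and W share a common initial sequence (the int tag), and U contains both. */
struct V { int tag; int   v_payload; };
struct W { int tag; float w_payload; };
union  U { struct V v; struct W w; };

int main(void)
{
    union U u;
    struct V *p = &u.v;                      /* p identifies the V within a U       */
    struct W *q = (struct W *)(union U *)p;  /* reaches the same common member      */
    u.v.tag = 42;                            /* write through the V member          */
    printf("%d\n", q->tag);                  /* read the common member through a W* */
    return 0;
}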
C99 added "effective type" rules which are explicitly applicable to allocated storage. Some people insist those were merely "clarifications" of rules which already existed in C89, but for the above reasons I find that viewpoint untenable. It's fashionable to claim that the only reasons compilers didn't apply aliasing rules to unnamed objects are #5 and #6, but objections #1-#4 are equally significant (and continue to apply to C99 just as much as C89). Still, since C99 added the effective type rules, many constructs which would have been treated as legitimate by most common interpretations of the C89 rules are clearly forbidden.
As an element of contrast and comparison, the git/git codebase remains strictly conformant to C89 and does not use C99 initializers, or features from newer C standards.
This is detailed in Git 2.23 (Q3 2019) in Git Coding Guidelines.
This answer illustrates post-C89 features that might still be acceptable to C89 compilers.
See commit cc0c429 (16 Jul 2019) by Junio C Hamano (gitster).
(Merged by Junio C Hamano -- gitster -- in commit fe9dc6b, 25 Jul 2019)
CodingGuidelines: spell out post-C89 rules
Even though we have been sticking to C89, there are a few handy features we borrow from more recent C language in our codebase after trying them in weather balloons and saw that nobody screamed.
Spell them out.
While at it, extend the existing variable declaration rule a bit to read better with the newly spelled out rule for the for loop.
The coding guidelines now include the following (a consolidated sketch follows the list):
You should not use features from newer C standard, even if your compiler groks them.
There are a few exceptions to this guideline:
since early 2012 with e1327023ea (Git v1.7.9.2), we have been using an enum definition whose last element is followed by a comma.
This, like an array initializer that ends with a trailing comma, can be used to reduce the patch noise when adding a new identifier at the end.
since mid 2017 with cbc0f81d (Git v2.15.0-rc0), we have been using designated initializers for struct (e.g. "struct t v = { .val = 'a' };")
There are certain C99 features that might be nice to use in our code base, but we've hesitated to do so in order to avoid breaking compatibility with older compilers.
But we don't actually know if people are even using pre-C99 compilers these days.
If this patch can survive a few releases without complaint, then we can feel more confident that designated initializers are widely supported by our user base.
It also is an indication that other C99 features may be supported, but not a guarantee (e.g., gcc had designated initializers before C99 existed).
since mid 2017 with 512f41cf (Git v2.15.0-rc0), we have been using designated initializers for array (e.g. "int array[10] = { [5] = 2 }").
This is another test balloon to see if we get complaints from people whose compilers do not support designated initializer for arrays.
These used to be forbidden, but we have not heard any breakage report, and they are assumed to be safe.
Variables have to be declared at the beginning of the block, before the first statement (i.e. -Wdeclaration-after-statement).
Declaring a variable in the for loop "for (int i = 0; i < 10; i++)" is still not allowed in this codebase.
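Pulling those rules together, a hedged sketch of what the guidelines permit and forbid (all identifiers are illustrative):

enum test_balloon {
    FIRST,
    SECOND,
    THIRD,                       /* trailing comma after the last enumerator: allowed */
};

struct t { char val; };
static struct t v = { .val = 'a' };   /* designated struct initializer: allowed */
static int array[10] = { [5] = 2 };   /* designated array initializer: allowed  */

static void walk(void)
{
    int i;                       /* declarations at the start of the block      */
    for (i = 0; i < 10; i++)     /* "for (int i = 0; ...)" is still not allowed */
        array[i] += v.val;
}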
For this rule you have to go to ISO/IEC 9899:1990 Appendix G and study each case of implementation-defined behavior in order to document them.
It's difficult to determine which manual checks of the code this requires.
Is there some kind of list of manual checks to do because of this rule?
MISRA-C is primarily concerned with avoiding unpredictable behavior in the C language: those "traps and pitfalls" (such as undefined and unspecified behavior) that all C developers should be aware of but that a compiler will not always warn you about. This includes implementation-defined behavior, where the C standard allows the behavior of certain constructs to vary from one compiler to another. These tend to be less critical from a safety point of view, provided the compiler documentation describes its intended behavior, as required by the standard.
That is, for each specific compiler the behavior is well-defined, but the concern is to assure the developers have verified this, including documenting language extensions, known bugs in the compiler (and build chain) and workarounds.
Although it is possible to manually check C code fully for MISRA-C compliance, it is not recommended. The guidelines were developed with static analysis tools in mind. Not all guidelines can be fully checked by tools, but the better MISRA-C tools (be careful in your evaluations; there are not many "good" ones) will at least assist by automatically identifying where code relies on implementation-specific behavior. This includes the checks required by Rule 3.1; where implementation-defined behavior cannot be completely checked by a tool, a manual review will be required.
Also, if you are starting a new MISRA-C project, I highly recommend referring to MISRA-C:2012, even if you are required to be MISRA-C:2004 compliant. Having MISRA-C:2012 around helps, because it has clarified many of the guidelines, including additional rationale, explanations and examples. The standard (which can be obtained at misra-c.com) lists the C90 and C99 implementation-defined behaviors that are considered to have the potential to cause unintended behavior. This may or may not overlap with guidelines that address implementation-defined behaviors that MISRA-C is specifically concerned about.
First of all, the standard definition of implementation-defined behavior is: specific behavior which the compiler must document. So you can always refer to the compiler documentation whenever there is a need to document how a certain implementation-defined behavior is implemented.
What's left for you to do, then, is to document where the code relies on implementation-defined behavior. This is preferably done in source code comments (see the sketch after the list below).
Off the top of my head, here are the most important things to look for in the code. The list does not include cases that are already covered by other MISRA rules (for example, the signedness of char).
The size of all the integer types. The size of int is the most important, as it determines which type is given to integer literals, C "boolean" expressions, implicitly promoted integers, etc.
Obscure integer formats that aren't standard two's complement.
Any reliance on endianness.
The enum type format.
The floating point format.
Pointer to integer conversions, in case they are obscure on the given system.
Behavior of function inlining and the register keyword, if these are used.
Alignment issues including struct padding. Reliance on the size of a struct/union.
#include paths, in case they are obscure. Particularly if they are absolute and not relative.
Bitwise operators mixed with signed types (in most cases this is a bug or design mistake).
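As an illustration of the comment-based documentation suggested above, a hedged sketch; the specific assumptions shown are examples, not requirements:

#include <limits.h>

/* Implementation-defined behavior this translation unit relies on,
   documented per the rule:
   - int is assumed to be 32 bits wide; see the compiler manual.
   - right shift of a negative signed value is assumed to be
     arithmetic (sign-extending), per the compiler documentation. */

/* C90-compatible compile-time check: the array size becomes negative,
   and compilation fails, if the size assumption does not hold. */
typedef char int_is_32_bits_check[(INT_MAX == 0x7FFFFFFF) ? 1 : -1];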
My CCS 6.1 ARM compiler (for LM3Sxxxx Stellaris) throws a warning:
"MISRA Rule 12.2. The value of an expression shall be the same under any order of evaluation that the standard permits"
for following code:
typedef struct {
...
uint32_t bufferCnt;
uint8_t buffer[100];
...
} DIAG_INTERFACE_T;
static DIAG_INTERFACE_T diagInterfaces[1];
...
DIAG_INTERFACE_T * diag = &diagInterfaces[0];
uint8_t data = 0;
diag->bufferCnt = 0;
diag->buffer[diag->bufferCnt++] = data; // line where warning is issued
...
I don't see a problem in my code. Is it false positive or my bug?
Put diag->bufferCnt++ in a separate statement (as it is also advised by Hans in OP comments) and the warning should not appear.
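For example, a rewrite along those lines:

/* Increment separated into its own statement, so no statement
   mixes ++ with another side effect: */
diag->buffer[diag->bufferCnt] = data;
diag->bufferCnt++;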
But regarding MISRA rule 12.2, I see no violation of 12.2 in your program (bufferCnt is modified only once, and the value of the expression is the same under any permitted order of evaluation), and I think it's a bug in your MISRA software.
For information there is also an advisory 12.13 rule in MISRA that says:
(MISRA-C:2004, 12.13) "The increment (++) and decrement (--) operators should not be mixed with other operators in an expression"
The problem with MISRA is that its use of terminology is far from perfect: for 12.13, while -> and = are C operators, the explanation then seems to talk only about arithmetic operators...
Although you don't indicate it, this is MISRA-C:2004 Rule 12.2, which is now MISRA-C:2012 Rule 13.2. As oauh says, this has nothing to do with "order of evaluation".
I highly recommend referring to MISRA-C:2012 even if you are required to be MISRA-C:2004 compliant, having MISRA-C:2012 around helps, because it has clarified many of the guidelines, including additional rationale, explanations and examples.
You should not be using a compiler solely to check for MISRA-C compliance. It's nice, but a compiler's #1 goal is not to warn you about all the traps and pitfalls of the language; it is dedicated to taking advantage of them (optimization). They're not very precise either, as in this case. Also, there are many undefined behaviors across translation units that compilers cannot warn about. It's best to also use a dedicated MISRA static analysis tool, one that is not compiler-specific, but that warns about all unpredictable constructs from the point of view of the ISO C standard, not of a particular implementation.
As oauh also said, this is a violation of MISRA-C:2004 Rule 12.13, which is now MISRA-C:2012 Rule 13.3; the latter has been relaxed to permit ++ and -- to be mixed with other operators, provided that the ++ or -- is the only source of side effects (in your case the assignment is also a side effect, in C terminology).
The rule is not critical, i.e. the behavior is well defined, but the different values resulting from the prefix version and the postfix version can cause confusion; thus it is "advisory", meaning no formal deviation is required (again, a decent MISRA-C tool would allow you to suppress this particular violation).
GCC 4.9 and 5.1 reject this simple C99 declaration at global scope. Clang accepts it.
const int a = 1, b = a; // error: initializer element is not constant
How could such a basic feature be missing? It seems very straightforward.
C99 [1] section 6.6 Constant expressions is the controlling section. It states in subsections 6 and 7:
6/ An integer constant expression shall have integer type and shall only have operands that are integer constants, enumeration constants, character constants, sizeof expressions whose results are integer constants, and floating constants that are the immediate operands of casts.
Cast operators in an integer constant expression shall only convert arithmetic types to integer types, except as part of an operand to the sizeof operator.
The definition of integer and floating point constants is specified in 6.4.4 of the standard, and it's restricted to actual values (literals) rather than variables.
7/ More latitude is permitted for constant expressions in initializers. Such a constant expression shall be, or evaluate to, one of the following: (a) an arithmetic constant expression, (b) a null pointer constant, (c) an address constant, or (d) an address constant for an object type plus or minus an integer constant expression.
Since a is none of those things in either subsection 6 or 7, it is not considered a constant expression as per the standard.
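For contrast, a hedged sketch of file-scope initializers that do fall under subsection 7 (the names are illustrative):

static int arr[10];
static double d  = 1.0 + 2.0;  /* arithmetic constant expression                  */
static int  *np  = 0;          /* null pointer constant                           */
static int  *ap  = arr;        /* address constant                                */
static int  *ap2 = arr + 3;    /* address constant plus an integer constant
                                  expression                                      */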
The real question, therefore, is not why gcc rejects it but why clang accepts it, and that appears to be buried in subsection 10 of that same section:
10/ An implementation may accept other forms of constant expressions.
In other words, the standard states what an implementation must allow for constant expressions but doesn't limit implementations to allowing only that.
[1] C11 is much the same, other than minor things like allowing _Alignof as well as sizeof.
This is just the rules of C. It has always been that way. At file scope, initializers must be constant expressions. The definition of a constant expression does not include variables declared with const qualifier.
The rationale behind requiring initializers computable at compile time was so that the compiler could just put all of the initialized static data as a block in the executable file; at load time that block is loaded into memory as a whole and voilà, the global variables all have their correct initial values without any code needing to be executed.
In fact, if you could have executable code as an initializer for global variables, it would introduce quite a lot of complication regarding the order in which that code should run. (This is still a problem in modern C++.)
In K&R C, there was no const. They could have had a rule that if a global variable is initialized by a constant expression, then that variable also counts as a constant expression. And when const was added in C89, they could have also added a rule that const int a = 5; leads to a constant expression.
However they didn't. I don't know for sure why, but it seems likely that it has to do with keeping the language simple. Consider this:
extern const int a, b = a;
with const int a = 5; being in another translation unit. Whether or not you want to allow this, it adds considerable complication for the compiler, and some more arbitrary decisions.
If you look at the current C++ rules for constant expressions (which still are not settled to everyone's satisfaction!) you'll see that each time you add support for one more "obvious" thing then there are two other "obvious" things that are next in line and it is never-ending.
In the early days of C, in the 1970s, keeping the compiler simple was important so it may have been that making the compiler support this meant the compiler used too many system resources, or something. (Hopefully a coder from that era can step in and comment more on this!)
Finally, the C89 standardization was quite a contentious process since there were so many different C compilers that had each gone their own way with language evolution. Demanding that a compiler vendor who doesn't support this, change their compiler to support it might be met with opposition, lowering the uptake of the standard.
Because const doesn't make a constant expression -- it makes a variable that can't be assigned to (only initialized). You need constexpr to make a named constant expression, which is only available in C++. For integer values C99 offers enumeration constants; beyond that, its only tool is a macro, which is sort-of a name, but not really an expression at all.
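For reference, a sketch of the usual C99 workarounds (the names are illustrative):

#define A 1          /* macro: textual substitution, usable wherever an
                        integer constant expression is required */
enum { B = 2 };      /* enumeration constant: a genuine named integer
                        constant expression */
static int x = A;    /* both are valid file-scope initializers */
static int y = B;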