GCC doesn't support simple integer constant expression? - c

GCC 4.9 and 5.1 reject this simple C99 declaration at global scope. Clang accepts it.
const int a = 1, b = a; // error: initializer element is not constant
How could such a basic feature be missing? It seems very straightforward.

C99 [1] section 6.6, Constant expressions, is the controlling section. It states in subsections 6 and 7:
6/ An integer constant expression shall have integer type and shall only have operands that are integer constants, enumeration constants, character constants, sizeof expressions whose results are integer constants, and floating constants that are the immediate operands of casts.
Cast operators in an integer constant expression shall only convert arithmetic types to integer types, except as part of an operand to the sizeof operator.
The definition of integer and floating point constants is specified in 6.4.4 of the standard, and it's restricted to actual values (literals) rather than variables.
7/ More latitude is permitted for constant expressions in initializers. Such a constant expression shall be, or evaluate to, one of the following (a) an arithmetic constant expression, (b) a null pointer constant, (c) an address constant, or (d) an address constant for an object type plus or minus an integer constant expression.
Since a is none of those things in either subsection 6 or 7, it is not considered a constant expression as per the standard.
The real question, therefore, is not why gcc rejects it but why clang accepts it, and that appears to be buried in subsection 10 of that same section:
10/ An implementation may accept other forms of constant expressions.
In other words, the standard states what an implementation must allow for constant expressions but doesn't limit implementations to allowing only that.
[1] C11 is much the same, other than minor things like allowing _Alignof as well as sizeof.
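As a rough illustration of the four forms listed in subsection 7, here is a minimal sketch of file-scope initializers that every conforming compiler has to accept. The object names (target, scale, none, first, third) are made up for this example and do not come from the question.

```c
#include <stddef.h>

/* File-scope objects have static storage duration, so each initializer
 * below must be a constant expression. All names are hypothetical. */
int target[4];

/* (a) arithmetic constant expression: only literals and operators */
const double scale = 2.0 * 1.5;

/* (b) null pointer constant */
int *const none = NULL;

/* (c) address constant: the array name decays to a pointer to an
 *     object of static storage duration */
int *const first = target;

/* (d) address constant plus an integer constant expression */
int *const third = target + 2;

/* Not one of the four forms, so a compiler may reject it, or accept it
 * under 6.6/10:
 *     const int a = 1, b = a;
 */
```

The commented-out declaration is the one from the question: a is a const-qualified object, not a literal, so it falls outside forms (a) through (d).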

This is just the rules of C. It has always been that way. At file scope, initializers must be constant expressions. The definition of a constant expression does not include variables declared with const qualifier.
The rationale behind requiring initializers computable at compile-time was so that the compiler could just put all of the initialized static data as a block in the executable file, and then at load time that block is loaded into memory as a whole and voila, the global variables all have their correct initial values without any code needing to be executed.
In fact if you could have executable code as initializer for global variables, it introduces quite a lot of complication regarding which order that code should be run in. (This is still a problem in modern C++).
In K&R C, there was no const. They could have had a rule that if a global variable is initialized by a constant expression, then that variable also counts as a constant expression. And when const was added in C89, they could have also added a rule that const int a = 5; leads to a constant expression.
However, they didn't. I don't know why for sure, but it seems likely that it has to do with keeping the language simple. Consider this:
extern const int a, b = a;
with const int a = 5; being in another unit. Whether or not you want to allow this, it is considerably more complication for the compiler, and some more arbitrary decisions.
If you look at the current C++ rules for constant expressions (which still are not settled to everyone's satisfaction!) you'll see that each time you add support for one more "obvious" thing then there are two other "obvious" things that are next in line and it is never-ending.
In the early days of C, in the 1970s, keeping the compiler simple was important so it may have been that making the compiler support this meant the compiler used too many system resources, or something. (Hopefully a coder from that era can step in and comment more on this!)
Finally, the C89 standardization was quite a contentious process since there were so many different C compilers that had each gone their own way with language evolution. Demanding that a compiler vendor who didn't support this change their compiler to support it might have met with opposition, lowering the uptake of the standard.

Because const doesn't make a constant expression -- it makes a variable that can't be assigned to (only initialized). You need constexpr to make a constant expression, and C99 does not have it (it comes from C++; C23 later added a limited constexpr for objects). In C99 the only ways to name a constant expression are an enumeration constant, which is limited to values of type int, or a macro, which is sort-of a named constant but not really an expression at all.
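A minimal sketch of the named-constant options that do work in C99, assuming the value fits in an int. The names A and A_MACRO are invented for this example.

```c
/* Enumeration constant: an integer constant expression of type int
 * (enumeration constants are listed in 6.6/6). Hypothetical name. */
enum { A = 1 };

/* Macro: replaced by the literal 1 during preprocessing, so the
 * initializer the compiler sees contains only an integer constant. */
#define A_MACRO 1

const int a = A;            /* a itself is still not a constant expression */
const int b = A;            /* portable replacement for "b = a" */
const int c = A_MACRO + A;  /* both forms can be mixed freely */
```

Either form keeps the declaration portable across compilers that do not extend the set of constant expressions.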

Related

Why are const qualified variables accepted as initializers on gcc?

When compiling this code in the latest version of gcc (or clang) with -std=c17 -pedantic-errors -Wall -Wextra
static const int y = 1;
static int x = y;
then I get no compiler diagnostic message even though I'm fairly sure that this is not valid C but a constraint violation. We can prove that it is non-conforming by taking a look at C17 6.7.9/4:
Constraints
...
All the expressions in an initializer for an object that has static or thread storage duration shall be constant expressions or string literals.
Then the definition about constant expressions, in this case an integer constant expression (6.6):
An integer constant expression shall have integer type and shall only have operands that are integer constants, enumeration constants, character constants, sizeof expressions whose results are integer constants, _Alignof expressions, and floating constants that are the immediate operands of casts.
And then finally the definition about integer constants (6.4.4.1/2):
An integer constant begins with a digit, but has no period or exponent part. It may have a prefix that specifies its base and a suffix that specifies its type.
Thus a const int variable is not an integer constant nor is it an integer constant expression. And therefore not a valid initializer. This has been discussed before (for example here) and I think it's already established that this is non-conforming. However, my question is:
Why did gcc choose to be non-compliant even in strict mode?
clang has apparently always been non-compliant, but gcc changed from being compliant in version 7.3 to non-compliant in version 8.0 and above. gcc 7.3 and earlier gives "error: initializer element is not constant" even in default mode without -pedantic-errors.
Some sort of active, conscious decision seems to have been made regarding this message. Why was it removed entirely in gcc and why didn't they leave it as it was when compiling in strict mode -std=c17 -pedantic-errors?
Why did gcc choose to be non-compliant even in strict mode?
Inasmuch as the question as posed is directed to the motivation of the developers, the only information we have to go on as third parties comes from the public development artifacts, such as GCC bugzilla, repository commit messages, and actual code. As was pointed out in comments, the matter is discussed in the Bugzilla comment thread associated with the change.
The Bugzilla discussion appears to show that the developers considered the standard's requirements in this area, albeit in a somewhat perfunctory manner. See in particular comments 9 and 10. They raise paragraph 6.6/10 of the language specification:
An implementation may accept other forms of constant expressions.
They do not subject this to any particular scrutiny, and I read the comments more as seeking a justification for the change than as a thoughtful inquiry into GCC conformance considerations.
Thus, they made the change because they wanted to implement the feature request, and they found sufficient (for them) justification in the language of the standard to consider the altered behavior to be consistent with language constraints, therefore not requiring a diagnostic.
There is also an implied question of whether recent GCC's silent acceptance of the declaration forms presented in fact violates conforming processors' obligation to diagnose constraint violations.
Although it is possible to interpret 6.6/10 as allowing implementations to accept any expressions they choose as conforming to the requirements for any kind of constant expression, that seems fraught. Whether a given piece of code satisfies the language's constraints should not be implementation dependent. Either of these points of interpretation, if accepted, would resolve that problem:
6.6/10 should be interpreted as expressing a specific case of the general rule that a conforming implementation may accept non-conforming code, without implying that doing so entitles the processor to treat the code as conforming.
6.6/10 should be interpreted as permitting processors to interpret more expressions as "constant expressions" than those described in the preceding paragraphs, but that has no bearing on the definitions of the specific kinds of constant expressions defined in those paragraphs ("integer constant expressions" and "arithmetic constant expressions").
Those are not mutually exclusive. I subscribe to the latter, as I have written previously, and I tend to favor the former as well.
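To make the second interpretation concrete, here is a small sketch of contexts where the standard asks for an integer constant expression specifically, not just a constant expression. It assumes a C11 or later compiler for _Static_assert; the names Y, flags, classify and sum are hypothetical. An enumeration constant is used because it satisfies 6.6/6 on every implementation, whereas a const int would only work where the compiler extends the language.

```c
enum { Y = 1 };                 /* hypothetical stand-in for y */

static const int y = Y;
static int x = Y;               /* portable: Y is an integer constant expression */

/* Bit-field widths must be integer constant expressions. */
struct flags { unsigned ready : Y; };

/* So must the first operand of _Static_assert (C11). */
_Static_assert(Y == 1, "Y must be an integer constant expression");

/* And case labels. */
int classify(int v)
{
    switch (v) {
    case Y:  return 1;
    default: return 0;
    }
}

/* Run-time use of the const object is always fine. */
int sum(void)
{
    return x + y;
}
```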

Meaning and example of Undefined behaviours related to constant expression in C99

I don't understand the Undefined Behaviours in C99 related to constant expression.
For example:
An expression that is required to be an integer constant expression
does not have an integer type; has operands that are not integer
constants, enumeration constants, character constants, sizeof
expressions whose results are integer constants, or immediately-cast
floating constants; or contains casts (outside operands to sizeof
operators) other than conversions of arithmetic types to integer types
(6.6).
I can't find an example of such UB?
Furthermore I don't understand why a constant expression (evaluated at translation time) does not become an expression evaluated at runtime (instead of being UB).
This is quoted from the informative Annex J. To find the actual normative text you have to go to the section that Annex J points at, in this case the definition of integer constant expression in C99 6.6:
An integer constant expression shall have integer type and shall only have operands
that are integer constants, enumeration constants, character constants, sizeof
expressions whose results are integer constants, and floating constants that are the
immediate operands of casts.
That text is pretty self-explanatory IMO. That is: whenever syntax or normative text elsewhere requires an integer constant expression, whatever you place at such a location must fulfil the above quoted part, or it is not an integer constant expression but undefined behavior. (Violating a "shall" requirement that appears outside of a Constraints section in normative ISO C text is always UB.)
I'd expect compilers to be good at giving errors for this since it's compile-time UB.
For example, this is invalid since an array declaration with static storage duration requires the size to be an integer constant expression:
int a=1;
static int x [a];
Similarly, int x [1 + 1.0]; would be invalid but int x[1 + (int)1.0]; is ok.
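For contrast, a short sketch of array sizes that do meet the 6.6 wording. The array names and the ROWS enumerator are made up for illustration; the rejected forms from above are kept as comments so the block still compiles.

```c
enum { ROWS = 2 };                    /* hypothetical enumeration constant */

/* File-scope arrays have static storage duration, so every size below
 * must be an integer constant expression. */
int ok_literal[1 + 1];                /* integer constants only */
int ok_cast[1 + (int)1.0];            /* floating constant as the immediate operand of a cast */
int ok_enum[ROWS * 3];                /* enumeration constant */
int ok_sizeof[sizeof(char) + 1];      /* sizeof expression whose result is an integer constant */

/* Not integer constant expressions:
 *     int a = 1;
 *     static int bad_var[a];         operand is a variable
 *     int bad_float[1 + 1.0];        expression has type double
 */
```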
According to N1570 6.6p10, "An implementation may accept other forms of constant expressions." In general, situations where an implementation would be allowed to reject a program, but would also be allowed to accept it, are classified as Undefined Behavior. While it might be helpful to specify that an implementation given something like (at file scope):
int x,y;
int sz = (uintptr_t)&y - (uintptr_t)&x;
would be required to either reject the program, or else behave as though sz is initialized to a value matching what would be computed if the indicated conversions and subtraction would be performed at runtime, such constructs would often require linker support, and a compiler may have no way of knowing for certain what constructs the linker would support, or what it would do if code uses an unsupportable construct.
The Standard does not use the term "Undefined Behavior" purely to refer to erroneous constructs, but also applies it to non-portable ones which might be unsupportable or erroneous on some implementations but correct on others. The authors of the Standard note that Undefined Behavior, among other things, identifies potential areas of "conforming language extension" by allowing implementations to define behaviors beyond those mandated by the Standard. Viewed in that light, classifying the processing of non-standard forms of integer constant expressions as Undefined Behavior allows compilers to support such constructs when practical and useful, without imposing requirements on the behavior of such constructs that some implementations might be unable to meet.
Returning to the earlier example, a compiler might compute the difference between &y and &x as the difference between the two objects' offsets within their respective data sections. Such a computation might only be useful if the objects happened to be defined in the same translation unit, and might yield a meaningless value, without necessarily issuing a diagnostic, if they're not. A compiler, however, would have no way of knowing whether the objects are defined in the same translation unit, and the Standard would have no concept of code whose behavior would be meaningfully defined if two externally-defined objects are defined in the same compilation unit, but not if they aren't. The Standard term for behavior that implementations would define in some cases, but not in others, based upon criteria outside the Standard's jurisdiction, is "Undefined Behavior".
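If the two objects can be placed in one struct, the distance between them can be expressed without relying on an extension, because offsetof is specified to expand to an integer constant expression. This is a sketch under that restructuring assumption, not a description of what any particular compiler does with the original code; struct pair, xy and gap are hypothetical names.

```c
#include <stddef.h>

/* Hypothetical restructuring: x and y become members of one object. */
struct pair { int x; int y; };
struct pair xy;

/* offsetof yields an integer constant expression of type size_t, so
 * this file-scope initializer does not depend on 6.6/10 or on what
 * the linker supports. */
const size_t gap = offsetof(struct pair, y) - offsetof(struct pair, x);
```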

Why bitwise-or doesn't result in a constant expression, but addition does

In one of my C files, I'm declaring an array foo. Then I'm assigning the address of that variable to an integer type, and I want to bitmask it with 3 to set the lowest two bits. However, the bitmask fails during compiling but adding +3 seems to work. Why?
uint64_t foo[1];
uint64_t bar = (uint64_t)foo | 3;
This fails with:
main.c:6:16: error: initializer element is not constant
uint64_t bar = (uint64_t)foo | 3;
But this works:
uint64_t foo[1];
uint64_t bar = (uint64_t)foo + 3;
As I understand it, the location of foo is not known at compile time because it's global (will be in the .data or .bss section). However, an entry is put into the relocation section so that the linker can patch the address in while linking.
How is it handling the bitwise-or and the addition? Why does one work while the other doesn't?
Initial values for static objects must be constant expressions or string literals. (C 2018 6.7.9 4: “All the expressions in an initializer for an object that has static or thread storage duration shall be constant expressions or string literals.”)
6.6 7 specifies forms of constant expressions for initializers:
More latitude is permitted for constant expressions in initializers. Such a constant expression shall be, or evaluate to, one of the following:
— an arithmetic constant expression,
— a null pointer constant,
— an address constant, or
— an address constant for a complete object type plus or minus an integer constant expression.
Consider uint64_t bar = (uint64_t)foo + 3;. foo is nominally the static array declared earlier, which is automatically converted to a pointer to its first element. This qualifies as an address constant (6.6 9: “An address constant is … a pointer to an lvalue designating an object of static storage duration, …”). However, once it is cast to uint64_t, the result no longer qualifies as an address constant, an address constant plus or minus an integer constant expression, or a null pointer constant.
Is it an arithmetic constant expression? 6.6 8 excludes it:
… Cast operators in an arithmetic constant expression shall only convert arithmetic types to arithmetic types,…
Thus, (uint64_t)foo + 3 does not qualify as any form of constant expression required by the C standard. However, 6.6 10 says:
An implementation may accept other forms of constant expressions.
So a C implementation may accept (uint64_t) foo + 3 or (uint64_t) foo | 3 as a constant expression. Our question is then why does your C implementation accept the former but not the latter.
A common feature of linkers and object module formats is that the object module can record placeholders for certain expressions, and the linkers can evaluate these expressions and replace the placeholders with calculated values. A primary purpose of this feature is to allow for code in a program to refer to places in data or other code whose locations are not completely known during compilation but that will be decided (at least relative to some base reference point) during linking.
Places in data or code are measured relative to symbols (names) defined in the object modules (or relative to the starts of sections or segments). Thus, a place may be described, in effect, as “34 bytes after the start of routine bar” or “8 bytes after the start of object baz”. So the object module has support for placeholders that are composed of a displacement and a symbol name. After the linker assigns addresses to symbols, it reviews each placeholder, adds the displacement to the assigned address, and replaces the placeholder with the calculated result.
It appears your compiler, in spite of the uint64_t cast, is able to recognize that (uint64_t) foo is still the address of foo, and therefore (uint64_t) foo + 3 may be implemented by the regular use of one of these placeholders.
In contrast, the bitwise OR operator is not supported for use in these placeholders, and therefore the compiler is unable to implement (uint64_t) foo | 3. It cannot evaluate the expression itself (because it does not know the final address for foo), and it cannot write a placeholder for the expression. So it does not accept this as a constant expression.
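Since the linker placeholder cannot express the OR, one way around it is to do the OR at run time instead. The sketch below assumes uintptr_t is available and that some startup code calls the hypothetical init_bar() before bar is read.

```c
#include <stdint.h>

uint64_t foo[1];
uint64_t bar;   /* zero-initialized at load time; set by init_bar() */

/* Hypothetical helper: call once during program startup. At run time
 * the address of foo is known, so any arithmetic on it is allowed. */
void init_bar(void)
{
    bar = (uint64_t)(uintptr_t)foo | 3;
}
```

The cost is that bar can no longer be a const object initialized at load time.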
When you say
sometype *p = f(x);
where p is a global variable (or one with static duration) and where f(x) is not an actual function call but rather, some sequence of compile-time operations involving the address of another symbol x which won't be known until link time, the compiler obviously can't compute the initial value immediately. It actually emits an assembly language directive which causes the assembler to construct a relocation record which causes the linker to evaluate f(x) once the final location of the symbol x is known.
So f(x) (whatever sequence of operations it actually is) has to be, in effect, a function that the linker knows how to evaluate (and that there's a relocation record for, and if necessary an assembly language directive for). And while conventional linkers are good at performing addition and subtraction (because they do it all the time), they don't necessarily know how to perform other kinds of arithmetic.
So in consequence of all this, there are some additional rules on what kinds of arithmetic you can do while constructing pointer constants.
I'm in a hurry this morning and don't have time to dig through the Standard, but I'm pretty sure there's a sentence in there somewhere stating that among other restrictions on constant expressions, when you're initializing a pointer, you're limited to an address plus or minus an integer constant expression (since that's all the C Standard is willing to assume the linker is going to know how to do).
Your question has the additional complication that you're not actually initializing a pointer variable, but rather, an integer. In that case you get, in effect, the worst of both worlds: you're either not allowed to do it at all, or if the compiler lets you, the initializer on the right (since it involves an address/pointer), is limited to the kinds of arithmetic you can do while constructing pointer constants, as described above. You don't get to do the arbitrary arithmetic you'd be able to get away with (perhaps with confounding casts) in an integer expression at run time.
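For comparison, here is a sketch of the form the Standard does guarantee: keep the initialized object a pointer, so the initializer is an address constant plus an integer constant expression (6.6/7 and 6.6/9 permit pointer casts here). The name bar_ptr is made up for this example.

```c
#include <stdint.h>

uint64_t foo[1];

/* Address constant (with a pointer cast) plus an integer constant
 * expression: a form every conforming compiler must accept, and one
 * that maps directly onto a symbol-plus-displacement relocation. */
unsigned char *bar_ptr = (unsigned char *)foo + 3;
```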
According to the standard, the result of casting a pointer to an integer type is not a constant expression. So both of your examples may be rejected by a conforming compiler.
However there is the clause C11 6.6/10:
An implementation may accept other forms of constant expressions.
which unfortunately means that any particular compiler could accept none, one, or both of your examples.

Why can't a static initialization expression in C use an element of a constant array?

The following (admittedly contrived) C program fails to compile:
int main() {
    const int array[] = {1,2,3};
    static int x = array[1];
}
When compiling the above C source file with gcc (or Microsoft's CL.EXE), I get the following error:
error: initializer element is not constant
static int x = array[1];
^
Such simple and intuitive syntax is certainly useful, so this seems like it should be legal, but clearly it is not. Surely I am not the only person frustrated with this apparently silly limitation. I don't understand why this is disallowed-- what problem is the C language trying to avoid by making this useful syntax illegal?
It seems like it may have something to do with the way a compiler generates the assembly code for the initialization, because if you remove the "static" keyword (such that the variable "x" is on the stack), then it compiles fine.
However, another strange thing is that it compiles fine in C++ (even with the static keyword), but not in C. So, the C++ compiler seems capable of generating the necessary assembly code to perform such an initialization.
Edit:
Credit to Davislor-- in an attempt to appease the SO powers-that-be, I would seek the following types of factual information to answer the question:
Is there any legacy code that supporting these semantics would break?
Have these semantics ever been formally proposed to the standards committee?
Has anyone ever given a reason for rejecting the allowance of these semantics?
Objects with static storage duration (read: variables declared at file scope or with the static keyword) must be initialized by compile time constants.
Section 6.7.9 of the C standard regarding Initialization states:
4 All the expressions in an initializer for an object that has static or thread storage duration shall be constant expressions or
string literals.
Section 6.6 regarding Constant Expressions states:
7 More latitude is permitted for constant expressions in initializers. Such a constant
expression shall be, or evaluate to, one of the following:
an arithmetic constant expression,
a null pointer constant,
an address constant, or
an address constant for a complete object type plus or minus an integer constant expression.
8 An arithmetic constant expression shall have arithmetic type and shall only have operands that are integer constants, floating
constants, enumeration constants, character constants, sizeof
expressions whose results are integer constants, and _Alignof
expressions. Cast operators in an arithmetic constant expression shall
only convert arithmetic types to arithmetic types, except as part of
an operand to a sizeof or
_Alignof operator.
9 An address constant is a null pointer, a pointer to an lvalue designating an object of static storage duration, or a pointer to a
function designator; it shall be created explicitly using the unary &
operator or an integer constant cast to pointer type, or implicitly by
the use of an expression of array or function type. The
array-subscript [] and member-access . and -> operators, the address &
and indirection * unary operators, and pointer casts may be used in
the creation of an address constant, but the value of an object shall
not be accessed by use of these operators.
By the above definition, a const variable does not qualify as a constant expression, so it can't be used to initialize a static object. C++ on the other hand does treat const variables as true constants and thus allows them to initialize static objects.
If the C standard allowed this, then compilers would have to know what is in arrays. That is, the compiler would have to have a compile-time model of the array contents. Without this, the compiler has a small amount of work to do for each array: It needs to know its name and type (including its size), and a few other details such as its linkage and storage duration. But, where the initialization of the array is specified in the code, the compiler can just write the relevant information to the object file it is growing and then forget about it.
If the compiler had to be able to fetch values out of the array at compile time, it would have to remember that data. As arrays can be very large, that imposes a burden on the C compiler that the committee likely did not desire, as C is intended to operate in a wide variety of environments, including those with constrained resources.
The C++ committee made a different decision, and C++ is much more burdensome to translate.
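A common workaround in C is to name the values with enumeration constants and use the same name both in the array and in the static initializer, so the compiler never has to look inside the array. A minimal sketch, with hypothetical names FIRST, SECOND, THIRD and next_value:

```c
enum { FIRST = 1, SECOND = 2, THIRD = 3 };   /* hypothetical names */

int next_value(void)
{
    const int array[] = { FIRST, SECOND, THIRD };

    /* SECOND is an integer constant expression, so it is a valid
     * initializer for an object with static storage duration. */
    static int x = SECOND;

    int result = x;
    x += array[0];   /* reading the array at run time is always allowed */
    return result;
}
```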

Need clarification about constant expressions

K&R C, 2nd edition (section 2.3) mentions
A constant expression is an expression that involves only constants. Such expressions may be evaluated during compilation rather than run-time, and accordingly may be used in any place that a constant can occur
however, I have several doubts regarding it:
1. Will this expression be considered as constant expression?
const int x=5;
const int y=6;
int z=x+y;
i.e. using const keyword is considered constant expression or not?
2. Is there any technique by which we can check whether an expression was evaluated during compilation or during run-time?
3. Are there any cases where compile time evaluation produces different result than run-time evaluation?
4. Should I even care about it? (maybe I use it to optimize my programs)
1. Perhaps. A compiler can add more forms of constant expressions, so if it can prove to itself that the variable references are constant enough it can compute the expression at compile-time.
2. You can (of course) disassemble the code and see what the compiler did.
3. Not if the compiler is standards-compliant, no. The standard says "The semantic rules for the evaluation of a constant expression are the same as for nonconstant expressions" (§6.6 11 in the C11 draft).
4. Not very much, no. :) But do use const for code like that anyway!
using const keyword is considered constant expression or not?
>> No, it is not a constant. The variable using const is called const qualified, but not a compile time constant.
Is there any technique by which we can check whether an expression was evaluated during compilation or during run-time?
>> (as mentioned in Mr. Unwind's answer) Disassemble the code.
Are there any cases where compile time evaluation produces different result than run-time evaluation?
>> No, it will not. Refer to §6.6 11 of the C11 standard.
FWIW, the operand of sizeof is not evaluated (unless it is a variable length array), so an expression such as sizeof *p is fine even when p is a null pointer. Actually dereferencing a null pointer invokes undefined behaviour.
Should I even care about it? (maybe I use it to optimize my programs)
>> Opinion-based, so won't answer.
1. x and y are const; z is not. The compiler will probably substitute the values of x and y, but it will not treat z as a constant. It will probably also compute 5 + 6 itself and assign the result to z directly.
2. Not sure. You could check the generated assembler code, but I do not know how this can be done.
3. No. Compile time means the expression has already been calculated by the time the program runs.
4. I care :) but it applies only when you need fast execution.
1. In C, the const qualifier is just a guarantee given by the programmer to the compiler that he will not change the object. Otherwise it does not have special meanings as in C++. The initializer for such objects at file scope has to be a constant expression.
2. As an extension, gcc has a builtin function (int __builtin_constant_p (exp)) to determine if a value is constant.
3. No, it shall not - unless you exploit implementation defined or undefined behaviour and compiler and target behave differently. [1]
4. As constant expressions are evaluated at compile-time, they save processing time and often code space and possibly data space. Also, in some places (e.g. global initializers), only constant expressions are allowed. See the standard.
[1]: One example is right shifting a signed negative integer constant, e.g. -1 >> 24. As that is implementation defined, the compiler might yield a different result from a program run using a variable which holds the same value:
int i = -1;
(-1 >> 24) == (i >> 24)
    ^            ^--- run-time evaluated by target
    +--- compile-time evaluated by compiler
The comparison might fail.
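On the question of checking whether something is a constant expression without disassembling: one portable approach is to put the expression somewhere the language demands an integer constant expression and see whether it compiles. This is a sketch assuming a C11 compiler for _Static_assert; the enumerators X, Y and Z are hypothetical stand-ins for the x and y in the question. Note that it checks what the language requires, not whether an optimizer happens to fold a run-time expression.

```c
enum { X = 5, Y = 6 };   /* hypothetical stand-ins for x and y */

/* Compiles only if the first operand is an integer constant expression. */
_Static_assert(X + Y == 11, "X + Y is evaluated during translation");

/* The value of an enumeration constant must itself be an integer
 * constant expression, so this works as a check in C99 as well. */
enum { Z = X + Y };

int z = Z;   /* file-scope initializer: must be a constant expression */
```

Replacing X and Y with const int objects would make both checks non-portable: a compiler would be free to reject them.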
