Details of what constitutes a constant expression in C?

C defines at least 3 levels of "constant expression":
constant expression (unqualified)
arithmetic constant expression
integer constant expression
6.6 paragraph 3 reads:
Constant expressions shall not contain assignment, increment, decrement, function-call,
or comma operators, except when they are contained within a subexpression that is not
evaluated.
So does this mean 1,2 is not a constant expression?
Paragraph 8 reads:
An arithmetic constant expression shall have arithmetic type and shall only have
operands that are integer constants, floating constants, enumeration constants, character
constants, and sizeof expressions. Cast operators in an arithmetic constant expression
shall only convert arithmetic types to arithmetic types, except as part of an operand to a
sizeof operator whose result is an integer constant.
What are the operands in (union { uint32_t i; float f; }){ 1 }.f? If 1 is the operand, then this is presumably an arithmetic constant expression, but if { 1 } is the operand, then it's clearly not.
Edit: Another interesting observation: 7.17 paragraph 3 requires the result of offsetof to be an integer constant expression of type size_t, but the standard implementations of offsetof, as far as I can tell, are not required to be integer constant expressions by the standard. This is of course okay since an implementation is allowed (under 6.6 paragraph 10) to accept other forms of constant expressions, or implement the offsetof macro as __builtin_offsetof rather than via pointer subtraction. The essence of this observation, though, is that if you want to use offsetof in a context where an integer constant expression is required, you really need to use the macro provided by the implementation and not roll your own.

Based on your reading, 1,2 isn't a constant expression. I don't know why it isn't, just that I agree with you that it isn't (despite the fact that it probably should be).
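The exception in 6.6 paragraph 3 for subexpressions that are not evaluated can be seen in a small sketch. The name SIZE_VIA_COMMA is made up for illustration, and the block assumes a C11 compiler (for static_assert from <assert.h>):

```c
#include <assert.h>

/* The comma operator sits inside the operand of sizeof, which is not
   evaluated here, so the constraint in 6.6 paragraph 3 is not violated.
   The comma expression has the type of its right operand, long. */
enum { SIZE_VIA_COMMA = sizeof((void)0, 2L) };

static_assert(SIZE_VIA_COMMA == sizeof(long),
              "the comma expression has type long");

/* By contrast, an evaluated comma operator is outside the guaranteed forms:
 *
 *     enum { REJECTED = (1, 2) };
 *
 * A compiler may diagnose this, or accept it as an extension under
 * 6.6 paragraph 10. */
```

The commented-out declaration is left out of the translation unit on purpose, since whether it compiles depends on the implementation.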
6.5.2 specifies compound literals as a postfix operator. So in
(union { uint32_t i; float f; }){ 1 }.f
The operands are (union { uint32_t i; float f; }){ 1 } and f to the . operator. It is not an arithmetic constant expression, since the first argument is a union type, but it is a constant expression.
UPDATE: I was basing this on a different interpretation of the standard.
My previous reasoning was that (union { uint32_t i; float f; }){ 1 }.f met the criteria for a constant expression, and was therefore a constant expression. I still think it meets the criteria for a constant expression (6.6 paragraph 3) but that it is not any of the standard types of constant expressions (integer, arithmetic, or address) and is therefore only subject to being a constant expression by 6.6 paragraph 10, which allows implementation-defined constant expressions.
I'd also been meaning to get to your edit. I was going to argue that the "hack" implementation of offsetof was a constant expression, but I think it's the same as above: it meets the criteria for a constant expression (and possibly an address constant) but is not an integer constant expression, and is therefore invalid outside of 6.6 paragraph 10.
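As a rough illustration of the point about offsetof, the sketch below uses the macro from <stddef.h> in places where an integer constant expression is required. The struct packet type and the other names are hypothetical, and a C11 compiler is assumed for static_assert:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical structure, used only for this illustration. */
struct packet {
    char tag;
    int  payload;
};

/* Enumeration values and file-scope array sizes both require an integer
   constant expression; the offsetof from <stddef.h> is guaranteed to be one. */
enum { PAYLOAD_OFFSET = offsetof(struct packet, payload) };

static unsigned char header_copy[offsetof(struct packet, payload)];

static_assert(offsetof(struct packet, tag) == 0,
              "the first member is at offset 0");

/* Returns the size of the array, which equals the payload offset. */
size_t header_copy_size(void)
{
    return sizeof header_copy;
}
```

A hand-rolled macro based on a null pointer cast might be accepted in the same places by a particular compiler, but only as one of the additional forms permitted by 6.6 paragraph 10.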

If 1,2 were a constant expression, this would allow code like this to compile:
{                     // How the compiler interprets it:
    int a[10, 10];    // int a[10];
    a[5, 8] = 42;     // a[8] = 42;
}
I don't know whether it is the real reason, but I can imagine that emitting an error for this (common?) mistake was considered more important than turning 1,2 into a constant expression.
UPDATE: As R. points out in a comment, the code above is no longer a compiler error since the introduction of VLAs.

Related

What types of expressions are evaluated at compile time?

I had this problem because I was learning conditional compilation.
#if is followed by constant-expression in conditional compilation and constant-expression should be evaluated at compile time.
I want to learn what kind of expression can be constant-expression and can follow #if.
#if x>0 || defined(ABC) && defined(BCD)
Is this right, especially x>0?
It's not trivial to write a complete and correct answer to the question, since a lot of things are computed at compile time in C. To answer, one would need to go rather deep into the C standard, with lots of the "language lawyer" terms it uses internally.
First there is the whole pre-processing part as described in translation phases (C17 5.1.1.2). Included in those pre-processing translation phases is for example the #if directive, which has the formal syntax like:
# if constant-expression new-line
Where constant expression is another term for expressions that are always evaluated at compile-time. C defines such expressions in C17 6.6:
A constant expression can be evaluated during translation rather than runtime, and
accordingly may be used in any place that a constant may be.
Constraints
Constant expressions shall not contain assignment, increment, decrement, function-call, or comma operators, except when they are contained within a subexpression that is not evaluated.
Each constant expression shall evaluate to a constant that is in the range of representable
values for its type.
It then categorizes constant expressions into the following types:
Integer constant expressions
An integer constant expression shall have integer type and shall only have operands
that are integer constants, enumeration constants, character constants, sizeof
expressions whose results are integer constants, _Alignof expressions, and floating
constants that are the immediate operands of casts. Cast operators in an integer constant expression shall only convert arithmetic types to integer types, except as part of an operand to the sizeof or _Alignof operator.
Arithmetic constant expressions
Nearly identical definition as per above except it also allows floating point types. So an integer constant expression is also an arithmetic constant expression. (The arithmetic types in C are all integer and all floating point types.)
Address constants
An address constant is a null pointer, a pointer to an lvalue designating an object of static storage duration, or a pointer to a function designator; it shall be created explicitly using the unary & operator or an integer constant cast to pointer type, or implicitly by the use of an expression of array or function type. The array-subscript [] and member-access . and -> operators, the address & and indirection * unary operators, and pointer casts may be used in the creation of an address constant, but the value of an object shall not be
accessed by use of these operators.
Implementation-defined forms (compiler-specific language extensions).
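For the #if line in the question, a minimal sketch may help. It relies on one further rule (C17 6.10.1 paragraph 4): after macro expansion, any identifier left in the controlling expression is replaced with 0. The macro names X and ABC and the helper functions are assumptions made up for this example:

```c
#define X 3
#define ABC 1
/* BCD is deliberately left undefined. */

/* X expands to 3, so the left side of || is already true. */
#if X > 0 || (defined(ABC) && defined(BCD))
#define FIRST_TEST 1
#else
#define FIRST_TEST 0
#endif

int x = 5;   /* an object, not a macro: the preprocessor knows nothing about it */

/* The identifier x is not a macro, so it is replaced with 0 and the
   test becomes 0 > 0, which is false regardless of the object's value. */
#if x > 0
#define SECOND_TEST 1
#else
#define SECOND_TEST 0
#endif

int first_test(void)  { return FIRST_TEST; }
int second_test(void) { return SECOND_TEST; }
```

So x>0 is syntactically fine after #if, but it only does what the question expects when x is a macro that expands to an integer constant expression.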

Why const pointer difference cannot be used as initializer for a static variable?

Compiling the following piece of C code (using MSVC):
char * const p1;
char * const p2;
static size_t sz = p2 - p1;
results in "initializer is not a constant" error for definition of sz.
As pointers are const (tried also with arrays, same error), why is pointer diff not constant?
Per C 2018 6.6 7, for constants used in initializers, the C standard only requires implementations to support an arithmetic constant expression, a null pointer constant, an address constant, or an address constant for a complete object type plus or minus an integer constant expression. None of these include the subtraction of two addresses, as shown by their definitions below.
A C implementation might be able to resolve the subtraction of two addresses of symbols of the same kind, especially if the compiler can see they will be placed in the same program segment, and the C standard permits an implementation to do this. However, the standard does not require a C implementation to support this, and that is at least in part because the subtraction of two symbols may involve various difficulties. One is that two symbols might refer to objects in different program segments, such as one in a constant read-only section and another in an uninitialized data section. The compiler could not know the relative difference between these sections because it depends on contributions from other object modules linked into the program, and the object module format might not support any way of expressing this difference as something to be resolved by the linker. Even within one section, some object module and symbol schemes may allow the linker to rearrange things, to optimize for alignment issues.
Per 6.6 8:
An arithmetic constant expression shall have arithmetic type and shall only have operands that are integer constants, floating constants, enumeration constants, character constants, sizeof expressions whose results are integer constants, and _Alignof expressions. Cast operators in an arithmetic constant expression shall only convert arithmetic types to arithmetic types, except as part of an operand to a sizeof or _Alignof operator.
Per 6.6 9:
An address constant is a null pointer, a pointer to an lvalue designating an object of static storage duration, or a pointer to a function designator; it shall be created explicitly using the unary & operator or an integer constant cast to pointer type, or implicitly by the use of an expression of array or function type. The array-subscript [] and member-access . and -> operators, the address & and indirection * unary operators, and pointer casts may be used in the creation of an address constant, but the value of an object shall not be accessed by use of these operators.
Per 6.3.2.3 3:
An integer constant expression with the value 0, or such an expression cast to type void *, is called a null pointer constant.…
Per 6.6 6:
An integer constant expression shall have integer type and shall only have operands that are integer constants, enumeration constants, character constants, sizeof expressions whose results are integer constants, _Alignof expressions, and floating constants that are the immediate operands of casts. Cast operators in an integer constant expression shall only convert arithmetic types to integer types, except as part of an operand to the sizeof or _Alignof operator.
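To make the distinction concrete, here is a small sketch of initializers that do fall within the required forms, with the pointer difference deferred to run time. The buffer and the index names are hypothetical, since the question does not show what p1 and p2 point to:

```c
#include <stddef.h>

/* Hypothetical buffer and indices, chosen only for this illustration. */
enum { P1_INDEX = 8, P2_INDEX = 40 };

static char buffer[64];

/* Address constants: created with unary & and [] applied to an object
   of static storage duration, with an integer constant expression index. */
static char *const p1 = &buffer[P1_INDEX];
static char *const p2 = &buffer[P2_INDEX];

/* static size_t sz = p2 - p1;
   Not one of the forms listed in 6.6 paragraph 7, so a compiler may reject it. */

/* An arithmetic constant expression is always acceptable. */
static const size_t sz = P2_INDEX - P1_INDEX;

/* The pointer subtraction itself is fine when done at run time. */
size_t span_at_runtime(void)
{
    return (size_t)(p2 - p1);
}

/* Exposes the statically initialized value. */
size_t span_static(void)
{
    return sz;
}
```

When both pointers refer into the same known array, keeping the indices as enumeration constants is one way to get the difference as a constant without relying on an extension.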

Not a constant initializer element?

I encountered a confusing case when I was doing semantic analysis for my compiler course.
#include <stdio.h>
int a = "abcd"[2];
int main()
{
char b = "abcd"[2];
printf("%d\n%c\n", a, b);
return 0;
}
GCC says "error: initializer element is not constant" for variable "a".
Why?
The C language requires initializers for global variables to be constant expressions. The motivation behind this is for the compiler to be able to compute the expression at compile time and write the computed value into the generated object file.
The C standard provides specific rules for what is a constant expression:
An integer constant expression shall have integer type and shall only have operands that are integer constants, enumeration constants, character constants, sizeof expressions whose results are integer constants, _Alignof expressions, and floating constants that are the immediate operands of casts. Cast operators in an integer constant expression shall only convert arithmetic types to integer types, except as part of an operand to the sizeof or _Alignof operator.
More latitude is permitted for constant expressions in initializers. Such a constant
expression shall be, or evaluate to, one of the following:
an arithmetic constant expression,
a null pointer constant,
an address constant, or
an address constant for a complete object type plus or minus an integer constant
expression.
As you can see, none of the cases include an array access expression or a pointer dereference. So "abcd"[2] does not qualify as a constant expression per the standard.
Now the standard also says:
An implementation may accept other forms of constant expressions.
So it would not violate the standard to allow "abcd"[2] as a constant expression, but it's also not guaranteed to be allowed.
So it's up to you whether or not to allow it in your compiler. It will be standard compliant either way (though allowing it is more work as you need another case in your isConstantExpression check and you need to actually be able to evaluate the expression at compile time, so I'd go with disallowing it).
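A short sketch of alternatives that stay within the guaranteed forms, assuming the goal is simply to get the character 'c' into a file-scope variable. The names TEXT_LENGTH and third_char are invented for this example:

```c
/* A character constant is a valid operand of an integer constant
   expression, so this file-scope initializer is always accepted. */
int a = 'c';

/* sizeof does not evaluate its operand, so applying it to a string
   literal still yields an integer constant (5, including the '\0'). */
enum { TEXT_LENGTH = sizeof "abcd" - 1 };

/* Inside a function, an initializer for an automatic variable need not
   be a constant expression, so the subscript is fine here. */
int third_char(void)
{
    char b = "abcd"[2];
    return b;
}
```

The original `int a = "abcd"[2];` may still compile with some compilers, but only as one of the "other forms" an implementation is allowed to accept.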
int a = "abcd"[2];
a is a global variable initialized at compile time, but "abcd"[2] is computed at run time.
char b = "abcd"[2];
Here b is a local variable, and it is initialized at run time, after "abcd"[2] has been computed.

Using sizeof() in array declarations in C89

I was under the impression that variable-size array declarations were not possible in C89. But, when compiling with clang -ansi I am able to run the following code:
double array[] = { 0.0, 1.0, 2.0, 3.0, 4.0 };
double other_array[sizeof(array)] = { 0.0 };
What is going on here? Is that not considered a variable-size array declaration?
In ANSI C89 a.k.a. ISO C90, the sizeof operator yields an integer constant, which is suitable for array dimensions. Function calls, for example, are not.
I'd like to add another remark, since I believe the code as-is has a problem that might get overlooked.
If the other_array is declared as
double other_array[sizeof(array)];
it will neither have the same number of elements, nor the same size (that would only be true for array of char) as array[]. If the intent is to declare a second array with the same number of elements (regardless of type), use this:
double other_array[sizeof(array)/sizeof(*array)];
That is because the result of the sizeof operator is a constant expression, so the declaration does not qualify as a VLA, just like the following declaration:
int other_array[5];
cannot be variable length array either. From C11 (N1570) §6.6/p6 Constant expressions (emphasis mine going forward):
An integer constant expression shall have integer type and shall
only have operands that are integer constants, enumeration constants,
character constants, sizeof expressions whose results are integer
constants, _Alignof expressions, and floating constants that are the
immediate operands of casts.
For the sake of completeness, the sizeof operator does not always result in a constant expression, though this only affects post-C89 standards (in C11 VLAs were made optional). Referring to §6.5.3.4/p2 The sizeof and _Alignof operators:
If the type of the operand is a variable length array type, the
operand is evaluated; otherwise, the operand is not evaluated and the
result is an integer constant.
First, let's see the criteria for an array (not being) a VLA. C11 doc, chapter §6.7.6.2,
[...] If the size is an integer constant expression
and the element type has a known constant size, the array type is not a variable length
array type; [...]
Coming to your case, sizeof is a compile-time operator, so it produces a value that is considered compile time constant expression. An array definition, whose size is specified as a compile time constant expression is not a VLA. So, in your code,
double other_array[sizeof(array)]
is not a VLA.
Regarding the sizeof operator result, from C11, chapter §6.5.3.4, (emphasis mine)
The sizeof operator yields the size (in bytes) of its operand, which may be an expression or the parenthesized name of a type. [...] otherwise, the operand is not evaluated and the result is an integer constant.
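The two points made in the answers above (sizeof gives a constant size, and sizeof(array) counts bytes rather than elements) can be checked with a small sketch. The array names byte_sized and same_length and the helper functions are made up for this illustration:

```c
#include <stddef.h>

double array[] = { 0.0, 1.0, 2.0, 3.0, 4.0 };

/* Accepted at file scope, where a VLA would not be allowed: the size is
   an integer constant expression. It is a size in bytes, so this array
   has 5 * sizeof(double) elements, not 5. */
double byte_sized[sizeof(array)];

/* Dividing by the element size gives the element count, 5. */
double same_length[sizeof(array) / sizeof(*array)];

/* Number of elements in byte_sized. */
size_t byte_sized_count(void)
{
    return sizeof byte_sized / sizeof *byte_sized;
}

/* Number of elements in same_length. */
size_t same_length_count(void)
{
    return sizeof same_length / sizeof *same_length;
}
```

Both declarations are valid in C89 as well, since neither size depends on a value computed at run time.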

Operators indirectly forbidden (or not?) in defining integer constant expressions (in C)

In standard C (C99/C11) we have the so-called integer constant expressions, which are constant expressions whose operands are all constant integers.
The following definition applies:
Standard C99, Section 6.6(par.6):
An integer constant expression shall have integer type and shall
only have operands that are integer constants, enumeration constants,
character constants, sizeof expressions whose results are integer
constants, and floating constants that are the immediate operands of
casts.
Standard C99
This appears after the definition of the more general constant expression.
(Since integer constant expressions are defined after constant expressions, I assume that the former are a particular case of the latter.)
On the other hand, conditional expressions are considered constant expressions, constrained by the following rule:
Standard C99, Section 6.6:
Constant expressions shall not contain assignment, increment,
decrement, function-call, or comma operators, except when they are
contained within a subexpression that is not evaluated.
By unrolling the meaning of conditional expression we can fall down to postfix expressions and/or unary expressions.
Now, if we apply these constraints to integer constant expressions, we roughly obtain that they consist of conditional expressions restricted in such a way that every operand is integer/enumeration/character constants (or floating constant immediately preceded by a cast), and such that there are no assignment, increment, decrement, function-call or comma operators.
For simplicity, let us suppose that E is such an expression, without any sizeof operator and without non-evaluated operands.
MY QUESTION IS:
Are the following operators indirectly forbidden in E:
& (address),
* (indirection),
[] (array-subscript),
. (struct member),
-> (pointer to struct members).
In addition, are compound literals also forbidden?
Additional note: I am interested in answering this question for strictly conforming programs (C99/C11).
I think that they cannot be in any subexpression of E, but I am not sure if this is completely true. My quick reasoning is as follows:
If F is an integer constant subexpression of E, then F has, by definition, an integer type T.
If the unary operator & appears before F in E, then &F is an operand having type "pointer to T", which is not allowed in E (quite apart from the fact that F is not an object but only an integer value, so & cannot be applied to it). Thus & cannot appear in any E.
Since F does not have pointer type, the expression *F makes no sense.
A subscript operator [] is used to indicate an element inside an array. This means that we would have in E something like A[N]. Here, N must be an integer constant expression. However we note that A is also an operand, but it is an object of type array, which is not allowed in E. This implies that the array-subscript operator cannot appear in E.
If we have the operators . and -> in E, they must be used as follows: S.memb or pS->memb. Thus we have the operand S, whose type is struct or union, and pS, which is a pointer to struct or pointer to union. But these kinds of "operands" are not allowed in E.
Compound literals are not allowed in E, because they are lvalues, which implies they will have an address when the program runs. Since such an address cannot be known by the compiler, the expression involving a compound literal is not considered a constant.
Do you think that my reasoning is right?
Do you know of exceptional cases in which some of these operators or expressions can be [part of] an integer constant expression (as in the restricted case that I denoted E)?
An ICE only has to have values (rvalues in the jargon) as primary expressions that constitute it, and no objects (lvalues).
If you build up from there to exclude operators you see that
none of the operators that need an lvalue as operand can be used (assignment, increment, decrement, unary &)
none of the operators that produce an lvalue can be used either (unary *, array subscript [], member ->)
nor can the . operator, which needs a struct as its operand, since
there are no literals for struct types
Compound literals are a misnomer, they are objects.
Function calls are not allowed either.
Some of these operators can appear in places where they are not evaluated (or are not supposed to be), in particular in _Alignof, the macro offsetof, and some appearances of sizeof.
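As a sketch of that last remark, the operators * and -> can appear inside the operand of sizeof without disqualifying the whole expression, because the operand is not evaluated and the result is an integer constant. The struct point type and the enumerator names are hypothetical, and a C11 compiler is assumed for static_assert:

```c
#include <assert.h>

/* Hypothetical structure, used only for this illustration. */
struct point {
    int    x;
    double y;
};

enum {
    /* Unary * applied to a null pointer: never evaluated, only its type matters. */
    INT_SIZE = sizeof(*(int *)0),

    /* Member access through a null pointer: again only the type is used. */
    Y_SIZE = sizeof(((struct point *)0)->y)
};

static_assert(INT_SIZE == sizeof(int), "size of the pointed-to type");
static_assert(Y_SIZE == sizeof(double), "size of the member y");
```

The pointer casts are permitted here because the cast restriction for integer constant expressions does not apply inside the operand of sizeof.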
