Address vs. expression on address in static initializer in C - c

(This question was inspired by investigating an earlier question)
I have a code sample that initializes two global static variables: one is a pointer to extern variable, another is an expressioon computed from that pointer:
#include <stdint.h>
#define UNCACHE_MASK 0xABCDEF12UL // Value of the mask to apply
extern int memory_area;
const void * virtual_address = &memory_area;
const uintptr_t int_address = ((uintptr_t)&memory_area) | UNCACHE_MASK;
When I compile I get the following:
$ gcc -c test.c
test.c:6:1: error: initializer element is not computable at load time
const uintptr_t int_address = ((uintptr_t)&memory_area) | UNCACHE_MASK;
^
I am not much of an expert in C, but it seems that if &memory area is good for initializing virtual_address, it should also be good for initializing int_address.
What am I missing?
(gcc version 4.8.2, Cygwin on Win 7)

The formal definition of constant expressions of C language allow converting integer values to pointer types (to form address constants), but not the other way around (to form arithmetic constant expressions). It explicitly states that "cast operators in an arithmetic constant expression shall only convert arithmetic types to arithmetic types". For this reason the (uintptr_t) &memory_area bit violates the requirements imposed on arithmetic constant expressions. The expression is formally not a constant expression and therefore cannot be used in initializer for an object with static storage duration.
So, in short, &memory_area is an address constant, but (uintptr_t) &memory_area is not an arithmetic constant expression.
But it is indeed strange to see that GCC does not allow it as an extension.

Related

Why cannot a const struct field be used as an initializer in C? [duplicate]

I thought C had no more surprises for me, but this surprised me.
const int NUM_FOO = 5;
....
int foo[NUM_FOO];
==>error C2057: expected constant expression
My C++ experience has made me internally deprecate #define as much as possible. So this one was a real surprise. VS2019, compiled with /TC. I thought C99 allowed variable size arrays anyway.
Can anybody explain why the rejection occurs, since the compiler for sure knows at compile time the size of the array?
Is it not the case that C99 allows variable size arrays?
In C, this declaration:
const int NUM_FOO = 5;
doesn't make NUM_FOO a constant expression.
The thing to remember (and yes, this is a bit counterintuitive) is that const doesn't mean constant. A constant expression is, roughly, one that can be evaluated at compile time (like 2+2 or 42). The const type qualifier, even though its name is obviously derived from the English word "constant", really means "read-only".
Consider, for example, that these are a perfectly valid declarations:
const int r = rand();
const time_t now = time(NULL);
The const just means that you can't modify the value of r or now after they've been initialized. Those values clearly cannot be determined until execution time.
(C++ has different rules. It does make NUM_FOO a constant expression, and a later version of the language added constexpr for that purpose. C++ is not C.)
As for variable length arrays, yes, C added them in C99 (and made them optional in C11). But as jamesdlin's answer pointed out, VS2019 doesn't support C99 or C11.
(C++ doesn't support VLAs. This: const int NUM_FOO = 5; int foo[NUM_FOO]; is legal in both C99 and C++, but for different reasons.)
If you want to define a named constant of type int, you can use an enum:
enum { NUM_FOO = 5 };
or an old-fashioned macro (which isn't restricted to type int):
#define NUM_FOO 5
jamesdlin's answer and dbush's answer are both correct. I'm just adding a bit more context.
A variable with the const qualifier does not qualify as a constant expression.
Section 6.6p6 of the C11 standard regarding Constant Expressions states
An integer constant expression shall have integer type and
shall only have operands that are integer constants,
enumeration constants, character constants, sizeof expressions
whose results are integer constants, _Alignof expressions, and
floating constants that are the immediate operands of casts. Cast
operators in an integer constant expression shall only convert
arithmetic types to integer types, except as part of an
operand to the sizeof or _Alignof operator
Note that const qualified integer objects are not included.
This means that int foo[NUM_FOO]; is a variable length array, defined as follows from section 6.7.6.2p4:
If the size is not present, the array type is an incomplete type. If
the size is * instead of being an expression, the array type
is a variable length array type of unspecified size, which can
only be used in declarations or type names with function prototype
scope; such arrays are nonetheless complete types. If the size is
an integer constant expression and the element type has a known
constant size, the array type is not a variable length array
type; otherwise, the array type is a variable length array
type.
As for the error you're getting, that is because Visual Studio is not fully compliant with C99 and does not support variable length arrays.
const in C does not declare a compile-time constant. You can use an enum constant instead if you want to avoid using #define and want a symbolic name that can appear in a debugger.
C99 does support VLAs. However, VS2019 does not support C99.
On top of the existing answers which are all good, the reason a const-qualified object fundamentally can't in general be a constant expression, in the sense of "one that can be evaluated at compile time" (as mentioned in Keith's answer), is that it can have external linkage. For example, you can have in foo.c
const int NUM_FOO = 5;
and in bar.c:
extern int NUM_FOO;
...
int foo[NUM_FOO];
In this example, the value of NUM_FOO cannot be known when compiling bar.c; it is not known until you choose to link foo.o and bar.o.
C's model of "constant expression" is closely tied to properties that allow translation units (source files) to be translated (compiled) independently to a form that requires no further high-level transformations to link. This is also why you can't use addresses in constant expressions except for address constant expressions which are limited to essentially the address of an object plus a constant.

Initialize pointer inside struct as compile-time constant

I would like to know what the legal way of defining a constant struct that has a pointer as one of it's elements.
Looking at this post (Initializer element is not constant in C) I can see the following:
6.6 Constant expressions
(7) More latitude is permitted for constant expressions in initializers. Such a constant expression shall be, or evaluate to, one of the following:
— an arithmetic constant expression,
— a null pointer constant,
— an address constant, or
— an address constant for an object type plus or minus an integer constant expression.
(8) An arithmetic constant expression shall have an arithmetic type and shall only have operands that are integer constants, floating constants, enumeration constants, character constants, and sizeof expressions. Cast operators in an arithmetic constant expression shall only convert arithmetic types to arithmetic types, except as part of an operand to a sizeof operator whose result is an integer constant.
My question is if the following is well defined according to the (intersection of the C89 and C99) standard.
The contents of test.h:
#ifndef TEST_H
#define TEST_H
typedef struct _poly {
unsigned long int degree;
signed long int *coeffs;
} poly;
extern const poly my_poly;
extern void print_poly(poly p);
#endif
Contents of test.c:
#include <stdio.h>
#include "test.h"
static signed long int coeffs[3] = {1L, 2L, 3L};
const poly my_poly = {2UL, coeffs};
void print_poly(poly p)
{
/* Irrelevant mumbo-jumbo goes here. */
}
Contents of main.c:
#include "test.h"
int main(void)
{
print_poly(my_poly);
return 0;
}
Compiling on Debian 11 with gcc (and -Wall -Wextra -Wpedantic -std=c89 enabled), clang (with -Weverything -std=c89 enabled), tcc (with -Wall -std=c89 enabled), and pcc (with -std=c89 enabled) produces no warnings and no errors and runs as expected. Is the code:
static signed long int coeffs[3] = {1L, 2L, 3L};
const poly my_poly = {2UL, coeffs};
the correct way to initialize a constant struct that has a pointer as one of its member so that it is a compiler-time constant? It seems to follow the rule that it be an address constant, but I'm not sure.
As the question notes, an initializer may be an address constant.
C 2018 6.6 9 says “An address constant is a null pointer, a pointer to an lvalue designating an object of static storage duration, or a pointer to a function designator; it shall be created explicitly using the unary & operator or an integer constant cast to pointer type, or implicitly by the use of an expression of array or function type…”
In const poly my_poly = {2UL, coeffs};, coeffs is an array of static storage duration, and it is implicitly converted to a pointer to its first element (per C 2018 6.3.2.1 3). Therefore, the result of the conversion, effectively &coeffs[0], is an address constant and may be used as an initializer.

clang/gcc cannot set global variables to an address constant minus another address constant

The program below compiles without errors.
#include <stdio.h>
char addr_a[8];
char addr_b[8];
unsigned long my_addr = (unsigned long)addr_b - 8; // PASS
// unsigned long my_addr = (unsigned long)addr_b - (unsigned long)addr_a; // FAIL (error: initializer element is not constant)
int main() {
printf("%lx\n", my_addr);
return 0;
}
Interestingly, when I set unsigned long my_addr = (unsigned long)addr_b - (unsigned long)addr_a the compiler throws "error: initializer element is not constant."
I know globals can only be initialized with a constant expression. I also know that the types of constant expressions that can be used in an initializer for a global are specified in section 6.6p7 of the C standard:
More latitude is permitted for constant expressions in initializers. Such a constant expression shall be, or evaluate to, one of the following:
an arithmetic constant expression,
a null pointer constant,
an address constant, or
an address constant for a complete object type plus or minus an integer constant expression.
Note that an address constant minus an integer constant is allowed, but not an address constant minus another address constant.
Question:
Why does the C standard restrict the ways you can initialize global variables? What is stopping the C standard from accepting unsigned long my_addr = (unsigned long)addr_b - (unsigned long)addr_a?
Why would you want this?
Suppose addr_a and addr_b represent the start and end of the .text section respectively. A program may want to map the .text section, which has size (unsigned long)addr_b - (unsigned long)addr_a. The trusted-firmware-a project does this in Boot Loader stage 2 (BL2). See BL_CODE_END - BL_CODE_BASE, which is used in arm_bl2_setup.c.
Objects with static storage duration (i.e. globals, plus locals defined as static) can only be initialized with a constant expression.
The types of constant expression that can be used in an initializer for such an object is specified in section 6.6p7 of the C standard:
More latitude is permitted for constant expressions in
initializers. Such a constant expression shall be, or evaluate to,
one of the following:
an arithmetic constant expression,
a null pointer constant,
an address constant, or
an address constant for a complete object type plus or minus an integer constant expression.
Note that an address constant plus an integer constant is allowed, but not an address constant plus another address constant.
Granted this still isn't exactly what you have, as you have address constants casted to integer type. So let's check 6.6p6 which defines an integer constant expression:
An integer constant expression shall have integer type and
shall only have operands that are integer constants, enumeration
constants, character constants, sizeof expressions whose results
are integer constants, _Alignof expressions, and floating
constants that are the immediate operands of casts. Cast operators in
an integer constant expression shall only convert arithmetic
types to integer types, except as part of an operand to the
sizeof or _Alignof operator.
This paragraph doesn't allow for casting an address constant to an integer type as part of an integer constant expression, but apparently this seems to be supported as an extension.
What is stopping the C standard from accepting unsigned long my_addr = (unsigned long)addr_a + (unsigned long)addr_b?
The underlying reason is "Because why would anyone want that?" It's not meaningful to add two absolute addresses together; the result isn't the address of anything in particular.
It's thus a sort of chicken-and-egg thing. The language doesn't support it because it's useless, but also because existing linkers and object file formats don't support such a relocation. For instance, for ELF on x86-64, see the psABI Table 4.9 for a list of supported relocations, and note there is no S+S. And the linkers don't support it because it's useless, and because the language doesn't require it to be supported.
I guess originally, the tools probably came before the language (the earliest C compilers would have used linkers designed for assembly programs). So the original tools probably didn't support this, the language saw no need to demand that they do so, and over time, neither one ever saw a need to add it.

Why is const int x = 5; not a constant expression in C?

I thought C had no more surprises for me, but this surprised me.
const int NUM_FOO = 5;
....
int foo[NUM_FOO];
==>error C2057: expected constant expression
My C++ experience has made me internally deprecate #define as much as possible. So this one was a real surprise. VS2019, compiled with /TC. I thought C99 allowed variable size arrays anyway.
Can anybody explain why the rejection occurs, since the compiler for sure knows at compile time the size of the array?
Is it not the case that C99 allows variable size arrays?
In C, this declaration:
const int NUM_FOO = 5;
doesn't make NUM_FOO a constant expression.
The thing to remember (and yes, this is a bit counterintuitive) is that const doesn't mean constant. A constant expression is, roughly, one that can be evaluated at compile time (like 2+2 or 42). The const type qualifier, even though its name is obviously derived from the English word "constant", really means "read-only".
Consider, for example, that these are a perfectly valid declarations:
const int r = rand();
const time_t now = time(NULL);
The const just means that you can't modify the value of r or now after they've been initialized. Those values clearly cannot be determined until execution time.
(C++ has different rules. It does make NUM_FOO a constant expression, and a later version of the language added constexpr for that purpose. C++ is not C.)
As for variable length arrays, yes, C added them in C99 (and made them optional in C11). But as jamesdlin's answer pointed out, VS2019 doesn't support C99 or C11.
(C++ doesn't support VLAs. This: const int NUM_FOO = 5; int foo[NUM_FOO]; is legal in both C99 and C++, but for different reasons.)
If you want to define a named constant of type int, you can use an enum:
enum { NUM_FOO = 5 };
or an old-fashioned macro (which isn't restricted to type int):
#define NUM_FOO 5
jamesdlin's answer and dbush's answer are both correct. I'm just adding a bit more context.
A variable with the const qualifier does not qualify as a constant expression.
Section 6.6p6 of the C11 standard regarding Constant Expressions states
An integer constant expression shall have integer type and
shall only have operands that are integer constants,
enumeration constants, character constants, sizeof expressions
whose results are integer constants, _Alignof expressions, and
floating constants that are the immediate operands of casts. Cast
operators in an integer constant expression shall only convert
arithmetic types to integer types, except as part of an
operand to the sizeof or _Alignof operator
Note that const qualified integer objects are not included.
This means that int foo[NUM_FOO]; is a variable length array, defined as follows from section 6.7.6.2p4:
If the size is not present, the array type is an incomplete type. If
the size is * instead of being an expression, the array type
is a variable length array type of unspecified size, which can
only be used in declarations or type names with function prototype
scope; such arrays are nonetheless complete types. If the size is
an integer constant expression and the element type has a known
constant size, the array type is not a variable length array
type; otherwise, the array type is a variable length array
type.
As for the error you're getting, that is because Visual Studio is not fully compliant with C99 and does not support variable length arrays.
const in C does not declare a compile-time constant. You can use an enum constant instead if you want to avoid using #define and want a symbolic name that can appear in a debugger.
C99 does support VLAs. However, VS2019 does not support C99.
On top of the existing answers which are all good, the reason a const-qualified object fundamentally can't in general be a constant expression, in the sense of "one that can be evaluated at compile time" (as mentioned in Keith's answer), is that it can have external linkage. For example, you can have in foo.c
const int NUM_FOO = 5;
and in bar.c:
extern int NUM_FOO;
...
int foo[NUM_FOO];
In this example, the value of NUM_FOO cannot be known when compiling bar.c; it is not known until you choose to link foo.o and bar.o.
C's model of "constant expression" is closely tied to properties that allow translation units (source files) to be translated (compiled) independently to a form that requires no further high-level transformations to link. This is also why you can't use addresses in constant expressions except for address constant expressions which are limited to essentially the address of an object plus a constant.

Initializer with constant expression having possible overflow in C99

Is this valid C99 code? If so, does it define an implementation-defined behavior?
int a;
unsigned long b[] = {(unsigned long)&a+1};
From my understanding of the C99 standard, from §6.6 in the ISO C99 standard, this might be valid:
An integer constant expression shall have integer type and shall only have operands that are integer constants (...) Cast operators in an integer constant expression shall only convert arithmetic types to integer types, except as part of an operand to the sizeof operator.
More latitude is permitted for constant expressions in initializers. Such a constant expression shall be, or evaluate to, one of the following:
an arithmetic constant expression,
(...)
an address constant for an object type plus or minus an integer constant expression.
However, because there is the possibility of the addition overflowing, this might not be considered a constant expression and therefore not valid C99 code.
Could someone please confirm if my reasoning is correct?
Note that both GCC and Clang accept this code without warnings, even when using -std=c99 -pedantic. However, when casting to unsigned int instead of unsigned long, that is, using the following code:
int a;
unsigned long b[] = {(unsigned int)&a+1};
Then both compilers complain that the expression is not a compile-time constant.
From this clang developers thread on a similar issue: Function pointer is compile-time constant when cast to long but not int? the rationale is that the standard does not require the compiler to support this(this scenario is not included in any of bullets in 6.6p7) and although it is allowed to support this supporting truncated addresses would be burdensome:
I assume that sizeof(int) < sizeof(void(*)()) == sizeof(long) on
your target. The problem is that the tool chain almost certainly
can't express a truncated address as a relocation.
C only requires the implementation to support initializer values that
are either (1) constant binary data, (2) the address of some object, or
(3) or an offset added to the address of some object. We're allowed,
but not required, to support more esoteric things like subtracting two
addresses or multiplying an address by a constant or, as in your
case, truncating the top bits of an address away. That kind of
calculation would require support from the entire tool chain from
assembler to loader, including various file formats along the way.
That support generally doesn't exist.
Your case, which is casting a pointer to a integer type does not fit any of the cases under 6.6 paragraph 7:
More latitude is permitted for constant expressions in initializers.
Such a constant expression shall be, or evaluate to, one of the
following:
an arithmetic constant expression,
anull pointer constant,
an address constant, or
an address constant for an object type plus or minus an integer constant expression.
but as mentioned in the post compiler are allowed to support other forms of constant expression:
An implementation may accept other forms of constant expressions.
but neither clang nor gcc accept this.
This code is not required to be accepted by a conforming implementation. You quoted the relevant passage in your question:
More latitude is permitted for constant expressions in initializers. Such a constant expression shall be, or evaluate to, one of the following:
an arithmetic constant expression,
a null pointer constant,
an address constant, or
an address constant for an object type plus or minus an integer constant expression.
(unsigned long)&x is none of those things. It's not an arithmetic constant because of C11 6.6/8:
Cast operators in an arithmetic constant expression shall only convert
arithmetic types to arithmetic types
(pointer types are not arithmetic types, 6.2.5/18); and it is not an address constant because all address constants are pointers (6.6/9). Finally a pointer plus or minus an ICE is another pointer, so it is not that either.
However 6.6/10 says that an implementation may accept other forms of constant expressions. I'm not sure whether this means the original code should be called ill-formed or not (ill-formed code requires a diagnostic). Clearly your compiler is accepting some other constant expressions here.
The next issue is that casting from pointer to integer is implementation-defined. It may also be undefined if there is no integer representation corresponding to the particular pointer. (6.3.2.3/6)
Finally, the + 1 on the end makes no difference. unsigned long arithmetic is well-defined on addition and subtraction, so it is OK if and only if (unsigned long)&x is OK.
First of all, your initializer is not necessarily a constant expression. If a has local scope, then it is assigned an address in run-time, when it gets pushed on the stack. C11 6.6/7 says that in order for a pointer to be a constant expression, it has to be an address constant, which is defined in 6.6/9 as:
An address constant is a null pointer, a pointer to an lvalue
designating an object of static storage duration, or a pointer to a
function designator; it shall be created explicitly using the unary &
operator or an integer constant cast to pointer type, or implicitly by
the use of an expression of array or function type.
(Emphasis mine)
As for whether your code is standard C, yes it is. Pointer conversions to integers are allowed, although they may come with various forms of poorly specified behavior. Specified in 6.5/6:
Any pointer type may be converted to an integer type. Except as
previously specified, the result is implementation-defined. If the
result cannot be represented in the integer type, the behavior is
undefined. The result need not be in the range of values of any
integer type.
To safely ensure that the pointer can fit into the integer, you need to use uintptr_t. But I don't think pointer to integer conversion was the reason you posted this question.
Regarding whether an integer overflow would prevent it from being a compile time constant, I'm not sure where you got that idea from. I don't believe your reasoning is correct, for example (INT_MAX + INT_MAX) is a compile time constant and it is also guaranteed to overflow. (GCC gives you a warning.) In case it overflows, it will invoke undefined behavior.
As for why you get errors about the expression not being a compile-time constant, I don't know. I can't reproduce it on gcc 4.9.1. I tried declaring a with both static and automatic storage duration but no difference.
Sounds like you somehow accidentally compiled as C90, in which case gcc will tell you "error: initializer element is not computable at load time". Or maybe there was a compiler bug which has been fixed in my version of gcc.

Resources