Why can't a constant address be computed in a bitfield? - c

I'm trying to implement a constant linked DMA descriptor list (in ROM) on a Silabs EFR32BG22, where the last descriptor links to another descriptor located in RAM.
I'm using arm-none-eabi-gcc 10.2 (Cortex M33).
I want to use the address of the descriptor in the .data section (desc3) in a constant initializer. However, this does not work when the descriptor struct is defined as a bitfield. desc1 fails to compile with the error initializer element is not computable at load time.
But as I understand it, all required information should be available at link time. And when I use the equivalent hack in desc2, my code compiles and works as expected.
Why can't the compiler figure out the initializer of the first struct, which should be a constant expression as well?
typedef struct {
uint32_t linkMode : 1;
uint32_t link : 1;
int32_t linkAddr : 30;
} DMA_Descriptor1_t;
typedef struct {
int32_t linkAddr;
} DMA_Descriptor2_t;
/* descriptor in RAM */
static DMA_Descriptor1_t desc3;
/* fails */
static const DMA_Descriptor1_t desc1 =
{
.linkMode = 0, // bit 0
.link = 1, // bit 1
.linkAddr = ((uint32_t) &desc3) // bits 31..2
};
/* works */
static const DMA_Descriptor2_t desc2 =
{
.linkAddr = ((uint32_t) &desc3 + (0x1uL << 1))
};

.linkAddr = ((uint32_t) &struct3)
This does not provide a constant expression for an initializer for a static object.
C 2018 6.7.9 4 says “All the expressions in an initializer for an object that has static or thread storage duration shall be constant expressions or string literals.”
C 2018 6.6 7 specifies what is permitted for constant expressions in initializers:
… Such a constant expression shall be, or evaluate to, one of the following:
— an arithmetic constant expression,
— a null pointer constant,
— an address constant, or
— an address constant for a complete object type plus or minus an integer constant expression.
((uint32_t) &struct3) is not an arithmetic constant expression because it does not only have operands that are “integer constants, floating constants, enumeration constants, character constants, sizeof expressions whose results are integer constants, and _Alignof expressions” (6.6 8), since struct3 is not any of those operands.
It is clearly not a null pointer constant.
&struct3 is an address constant (per 6.6 9, “a null pointer, a pointer to an lvalue designating an object of static storage duration, or a pointer to a function designator”), but (uint32_t) &struct3 is an address constant cast to another type, so it is not an address constant.
And it is not an address constant plus or minus an integer constant expression.
That is the rule-based reason why the initializer is not satisfactory according to the C standard. A practical reason is that address constants and address constants with displacements are implemented by “fix up” operations in the linker or program loader. When an object module is created, the compiler (or other creator) may put references to symbols (such as struct3) in it. Those references must occur in certain forms, such as whole words or known fields in instruction encodings. After the linker or program loader determines the memory address of a symbol, it completes the references by filling in the locations where they were left unresolved. The software written for this only updates the specified forms of reference: it can patch a whole word with an address but cannot do arbitrary calculations, not even the masking and shifting needed to put part of the address into a bit-field.
When we look at .linkAddr = ((uint32_t) &desc3 + (0x1uL << 1)), we see the initializer is not a constant expression as defined above. However, C 2018 6.6 10 says “An implementation may accept other forms of constant expressions,” and this expression is very similar to the fourth option above, an address constant plus an integer constant expression. This expression has an address constant converted to uint32_t. For a C implementation that uses a “flat” 32-bit address space with natural pointers, the conversion to uint32_t is functionally a no-op, just a type change, and then adding to it is equivalent to adding to a char *; it is a simple addition that is supported in the relocation fix-up operations.
In contrast, .linkAddr = ((uint32_t) &desc3) requires that initializer be converted to the 30-bit field of linkAddr. This is a different operation from addition, likely not supported by the relocation fix-up operations. Or, looking at the initialization of the structure as a whole, we see the goal is to initialize the entire 32 bits with the address with bits 0 and 1 set to 0 and 1, respectively. That could be done with a mask and an OR, but, again, the relocation fix-up operations do not support these operations.
Testing GCC on Compiler Explorer, we can see it accepts an addition with an address but does not accept an OR operation. We can reasonably conclude that GCC (for some platforms) accepts addresses (optionally cast to integers of the same width) with additions or subtractions but not with AND or OR operations.
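To make the "address plus constant" form concrete, here is a minimal sketch of the desc2-style workaround with the flag bits named. The type DMA_DescriptorRaw_t, the macros, and the helper dma_link_target are hypothetical names introduced for illustration; they are not from the Silabs headers. It assumes uintptr_t has the same width as a pointer (32 bits on the Cortex-M33 target) and relies on the GCC/Clang extension discussed above, so it is not guaranteed by the C standard.

```c
#include <stdint.h>

/* Hypothetical raw view of the descriptor's link word: one integer
   instead of three bit-fields.
   bit 0: linkMode, bit 1: link, remaining bits: link address. */
typedef struct {
    uintptr_t linkWord;
} DMA_DescriptorRaw_t;

#define DMA_LINK_MODE_BIT ((uintptr_t)1 << 0)
#define DMA_LINK_BIT      ((uintptr_t)1 << 1)
#define DMA_FLAG_MASK     (DMA_LINK_MODE_BIT | DMA_LINK_BIT)

/* Descriptor in RAM. Its natural alignment (that of uintptr_t, at least
   4 bytes on the assumed targets) keeps the two low address bits zero,
   so adding a flag bit has the same effect as OR-ing it in. */
static DMA_DescriptorRaw_t ram_desc;

/* Address plus an integer constant: a form GCC and Clang accept as an
   extension, and one that a relocation with an addend can express. */
static const DMA_DescriptorRaw_t rom_desc = {
    .linkWord = (uintptr_t)&ram_desc + DMA_LINK_BIT
};

/* Recover the link target by clearing the flag bits. */
const DMA_DescriptorRaw_t *dma_link_target(const DMA_DescriptorRaw_t *d)
{
    return (const DMA_DescriptorRaw_t *)(d->linkWord & ~DMA_FLAG_MASK);
}
```

Calling dma_link_target(&rom_desc) should give back &ram_desc, while the low two bits of rom_desc.linkWord carry the flags. Replacing the + with | in the initializer would be the natural way to write it, but that is the form GCC rejects.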

Related

clang/gcc cannot set global variables to an address constant minus another address constant

The program below compiles without errors.
#include <stdio.h>
char addr_a[8];
char addr_b[8];
unsigned long my_addr = (unsigned long)addr_b - 8; // PASS
// unsigned long my_addr = (unsigned long)addr_b - (unsigned long)addr_a; // FAIL (error: initializer element is not constant)
int main() {
printf("%lx\n", my_addr);
return 0;
}
Interestingly, when I set unsigned long my_addr = (unsigned long)addr_b - (unsigned long)addr_a the compiler throws "error: initializer element is not constant."
I know globals can only be initialized with a constant expression. I also know that the types of constant expressions that can be used in an initializer for a global are specified in section 6.6p7 of the C standard:
More latitude is permitted for constant expressions in initializers. Such a constant expression shall be, or evaluate to, one of the following:
an arithmetic constant expression,
a null pointer constant,
an address constant, or
an address constant for a complete object type plus or minus an integer constant expression.
Note that an address constant minus an integer constant is allowed, but not an address constant minus another address constant.
Question:
Why does the C standard restrict the ways you can initialize global variables? What is stopping the C standard from accepting unsigned long my_addr = (unsigned long)addr_b - (unsigned long)addr_a?
Why would you want this?
Suppose addr_a and addr_b represent the start and end of the .text section respectively. A program may want to map the .text section, which has size (unsigned long)addr_b - (unsigned long)addr_a. The trusted-firmware-a project does this in Boot Loader stage 2 (BL2). See BL_CODE_END - BL_CODE_BASE, which is used in arm_bl2_setup.c.
Objects with static storage duration (i.e. globals, plus locals defined as static) can only be initialized with a constant expression.
The types of constant expression that can be used in an initializer for such an object are specified in section 6.6p7 of the C standard:
More latitude is permitted for constant expressions in
initializers. Such a constant expression shall be, or evaluate to,
one of the following:
an arithmetic constant expression,
a null pointer constant,
an address constant, or
an address constant for a complete object type plus or minus an integer constant expression.
Note that an address constant plus an integer constant is allowed, but not an address constant plus another address constant.
Granted, this still isn't exactly what you have, as you have address constants cast to integer type. So let's check 6.6p6, which defines an integer constant expression:
An integer constant expression shall have integer type and
shall only have operands that are integer constants, enumeration
constants, character constants, sizeof expressions whose results
are integer constants, _Alignof expressions, and floating
constants that are the immediate operands of casts. Cast operators in
an integer constant expression shall only convert arithmetic
types to integer types, except as part of an operand to the
sizeof or _Alignof operator.
This paragraph doesn't allow for casting an address constant to an integer type as part of an integer constant expression, but it appears to be supported as an extension.
What is stopping the C standard from accepting unsigned long my_addr = (unsigned long)addr_a + (unsigned long)addr_b?
The underlying reason is "Because why would anyone want that?" It's not meaningful to add two absolute addresses together; the result isn't the address of anything in particular.
It's thus a sort of chicken-and-egg thing. The language doesn't support it because it's useless, but also because existing linkers and object file formats don't support such a relocation. For instance, for ELF on x86-64, see the psABI Table 4.9 for a list of supported relocations, and note there is no S+S. And the linkers don't support it because it's useless, and because the language doesn't require it to be supported.
I guess originally, the tools probably came before the language (the earliest C compilers would have used linkers designed for assembly programs). So the original tools probably didn't support this, the language saw no need to demand that they do so, and over time, neither one ever saw a need to add it.
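For the use case in the question (the size of a region bounded by two addresses), one way around the restriction is to keep the two bounds as static initializers, which are permitted forms, and compute the difference at run time. The sketch below uses a hypothetical array named image as a stand-in for a section; in a real project the two bounds would typically come from linker-defined symbols instead.

```c
#include <stddef.h>

/* Hypothetical stand-in for a region such as .text. */
static char image[64];

/* Both initializers are permitted forms under 6.6p7:
   an address constant, and an address constant plus an
   integer constant expression. */
static char *const image_base = image;
static char *const image_end  = image + sizeof image;

/* The difference of two addresses is not a permitted static
   initializer, so it is computed when the function is called. */
size_t image_size(void)
{
    return (size_t)(image_end - image_base);
}
```

Here image_size() returns 64. The subtraction is well defined because both pointers refer to the same array (one of them to one past its end).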

C fixed size array treated as variable size

I have been trying to define a static array with size that should be known at compile time (it's a constant expression). It appears that gcc cannot determine the size of the array when it contains a floating point constant (and I get "storage size of ... isn’t constant").
Here is a minimal example:
int main(void)
{
static int foo[(unsigned)(2 / 0.5)];
return 0;
}
What is the reason for this behavior?
EDIT
I already have the answer I needed. I still don't understand the rationale behind not allowing that kind of expression, but this is a separate question.
I'll explain for the curious how I arrived at the problem.
It's about a game I'm writing as an exercise. Units move on a battlefield and I have divided the movement in steps. I have to remember the position of each unit on each step so that I can display animation later. The number of steps is chosen so that it ensures there will be a step on which units are close enough to fight each other but not so close as to collide. Here are the relevant pieces of code:
#define UNIT_SPEED_LIMIT 12
#define DISTANCE_MELEE 0.25
#define MOVEMENT_STEPS (unsigned)(2 * UNIT_SPEED_LIMIT / DISTANCE_MELEE)
struct position (*movements)[MOVEMENT_STEPS + 1];
Defining DISTANCE_MELEE (maximum distance at which close combat is possible) and using it to calculate the number of steps seems to be the natural way to proceed (more so because I use this constant in multiple contexts). Since I cannot define movements this way, I have to invent a concept like "number of steps for a single unit of distance" and use multiplication by int instead of division by double. I want to avoid dynamic memory allocation in order to keep the code simple.
According to the publicly available C99 draft standard n1256, the syntax for array declaration is described by
6.7.5.2 Array declarators
2
An ordinary identifier (as defined in 6.2.3) that has a variably modified type shall have
either block scope and no linkage or function prototype scope. If an identifier is declared
to be an object with static storage duration, it shall not have a variable length array type.
4
If the size is not present, the array type is an incomplete type. If the size is * instead of being an
expression, the array type is a variable length array type of
unspecified size, which can only be used in declarations with function
prototype scope; 124) such arrays are nonetheless complete types. If
the size is an integer constant expression and the element type has a
known constant size, the array type is not a variable length array
type; otherwise, the array type is a variable length array type.
So the expression in the [] must be an integer constant expression for the array to be declarable with static storage duration. The standard has this to say about integer constant expressions:
6.6 Constant expressions
6
An integer constant expression 99) shall have integer type and shall
only have operands that are integer constants, enumeration constants,
character constants, sizeof expressions whose results are integer
constants, and floating constants that are the immediate operands of
casts. Cast operators in an integer constant expression shall only
convert arithmetic types to integer types, except as part of an
operand to the sizeof operator.
Unfortunately, (unsigned)(2 / 0.5) does not apply the cast immediately to a floating-point constant, but rather to an arithmetic constant expression. This does not constitute an integer constant expression, and is thus not permissible as the size of an array with static storage duration.
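A small sketch of where the line falls, using hypothetical array names. The first two sizes are integer constant expressions under the paragraph quoted above; the commented-out declaration is the form from the question, which is not.

```c
#include <stddef.h>

/* A floating constant as the immediate operand of a cast: allowed. */
static int sized_by_cast[(unsigned)4.0];

/* Integer constants only: allowed. */
static int sized_by_ints[2 * 2];

/* The cast applies to the result of a floating-point division, not
   directly to a floating constant, so this is not an integer constant
   expression:
   static int rejected[(unsigned)(2 / 0.5)];
*/

size_t sized_by_cast_len(void)
{
    return sizeof sized_by_cast / sizeof sized_by_cast[0];
}

size_t sized_by_ints_len(void)
{
    return sizeof sized_by_ints / sizeof sized_by_ints[0];
}
```

Both helper functions return 4.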
OP's primary question is well answered here.
To address OP's higher level problem of how to use values like 0.5 or 0.25 in pre-processing, use fractional arithmetic:
#define UNIT_SPEED_LIMIT 12
// #define DISTANCE_MELEE 0.25
// use 25/100 or 1/4 or ...
#define DISTANCE_MELEE_N 1
#define DISTANCE_MELEE_D 4
// #define MOVEMENT_STEPS (unsigned)(2 * UNIT_SPEED_LIMIT / DISTANCE_MELEE)
#define MOVEMENT_STEPS (2u * UNIT_SPEED_LIMIT * DISTANCE_MELEE_D / DISTANCE_MELEE_N)
struct position (*movements)[MOVEMENT_STEPS + 1];
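As a quick check that the integer-only macro can size a static array, here is a sketch that repeats the macros and adds a compile-time assertion. The layout of struct position and the name movement_log are assumptions made for the example.

```c
#include <stddef.h>

#define UNIT_SPEED_LIMIT 12
#define DISTANCE_MELEE_N 1
#define DISTANCE_MELEE_D 4
#define MOVEMENT_STEPS (2u * UNIT_SPEED_LIMIT * DISTANCE_MELEE_D / DISTANCE_MELEE_N)

/* Hypothetical layout; the question does not show the real one. */
struct position { int x, y; };

/* 2 * 12 / 0.25 == 96, now computed with integers only. */
_Static_assert(MOVEMENT_STEPS == 96u, "MOVEMENT_STEPS should be 96");

/* Static storage duration is fine because the size is an
   integer constant expression. */
static struct position movement_log[MOVEMENT_STEPS + 1];

size_t movement_log_len(void)
{
    return sizeof movement_log / sizeof movement_log[0];
}
```

movement_log_len() returns 97. Multiplying before dividing matters here: with a denominator other than 1, dividing first would truncate early.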

error: initializer element is not constant in assigning structure pointers

I can successfully compile this c code in one IDE (using c99 compiler)
but not under linux using gcc
#define MAX_STRUCT 3
sstructtype SStructrecord1, SStructrecord2, SStructrecord3;
u32 RecordStructAdd[MAX_STRUCT] = {(u32)&SStructrecord1, (u32)&SStructrecord2, (u32)&SStructrecord3};
I have also tried assigning the individual values, but I still get the same error. What am I doing wrong here?
error: initializer element is not constant
error: (near initialization for ‘RecordStructAdd[0]’)
The issue is the cast. If you simply store the pointers as pointers then the compiler won't complain:
#define MAX_STRUCT 3
sstructtype SStructrecord1, SStructrecord2, SStructrecord3;
sstructtype *RecordStructAdd[MAX_STRUCT] = {&SStructrecord1, &SStructrecord2, &SStructrecord3};
Here's a brief explanation:
The compiler generates an executable that contains relocatable code. The addresses in the executable are placeholders that must be fixed up by the loader at runtime. To help the loader do its job, the executable contains a relocation table which specifies which placeholder addresses must be replaced by which actual addresses when the executable is loaded.
So if you store the address of an object in a static variable, then the executable will have an entry in the relocation table that allows the loader to put the correct address in the static variable at load time.
But if an address is 64-bits, and a U32 is 32-bits, then casting the pointer to a U32 only stores a portion of the address, not the full address. The relocation table has no mechanism to fix up partial addresses. Hence, a pointer that you cast to a smaller type is not a compile time constant.
Here's what the C specification has to say in §6.6 paragraph 7:
6.6 Constant expressions
...
7 More latitude is permitted for constant expressions in initializers.
Such a constant expression shall be, or evaluate to, one of the
following:
an arithmetic constant expression,
a null pointer constant,
an address constant, or
an address constant for a complete object type plus or minus an integer constant expression.
An address cast to a U32 is not one of the allowed constant expressions. But an implementation where a pointer and a U32 are the same size may let you get away with it.
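If the table really has to hold integers rather than pointers, a sketch that avoids the width mismatch is to use uintptr_t, which is as wide as a pointer on common platforms. The type record_t and the helper record_at are hypothetical stand-ins for the question's sstructtype. This is still outside the forms listed in 6.6p7, so it depends on the compiler accepting it as an extension, which GCC and Clang are generally observed to do for pointer-width integers.

```c
#include <stdint.h>

#define MAX_STRUCT 3

/* Hypothetical stand-in for the question's sstructtype. */
typedef struct { int value; } record_t;

static record_t record1, record2, record3;

/* uintptr_t matches the pointer width, so each entry is a whole
   address that an ordinary relocation can fill in. */
static const uintptr_t record_addr[MAX_STRUCT] = {
    (uintptr_t)&record1, (uintptr_t)&record2, (uintptr_t)&record3
};

/* Convert a stored integer back to a pointer. */
record_t *record_at(unsigned index)
{
    return (record_t *)record_addr[index];
}
```

record_at(0) yields &record1, and so on. Storing the pointers as pointers, as shown earlier, remains the fully portable choice.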

Initializer with constant expression having possible overflow in C99

Is this valid C99 code? If so, does it define an implementation-defined behavior?
int a;
unsigned long b[] = {(unsigned long)&a+1};
From my understanding of the C99 standard, from §6.6 in the ISO C99 standard, this might be valid:
An integer constant expression shall have integer type and shall only have operands that are integer constants (...) Cast operators in an integer constant expression shall only convert arithmetic types to integer types, except as part of an operand to the sizeof operator.
More latitude is permitted for constant expressions in initializers. Such a constant expression shall be, or evaluate to, one of the following:
an arithmetic constant expression,
(...)
an address constant for an object type plus or minus an integer constant expression.
However, because there is the possibility of the addition overflowing, this might not be considered a constant expression and therefore not valid C99 code.
Could someone please confirm if my reasoning is correct?
Note that both GCC and Clang accept this code without warnings, even when using -std=c99 -pedantic. However, when casting to unsigned int instead of unsigned long, that is, using the following code:
int a;
unsigned long b[] = {(unsigned int)&a+1};
Then both compilers complain that the expression is not a compile-time constant.
From this clang developers thread on a similar issue, “Function pointer is compile-time constant when cast to long but not int?”, the rationale is that the standard does not require the compiler to support this (the scenario is not covered by any of the bullets in 6.6p7), and although an implementation is allowed to support it, supporting truncated addresses would be burdensome:
I assume that sizeof(int) < sizeof(void(*)()) == sizeof(long) on
your target. The problem is that the tool chain almost certainly
can't express a truncated address as a relocation.
C only requires the implementation to support initializer values that
are either (1) constant binary data, (2) the address of some object, or
(3) or an offset added to the address of some object. We're allowed,
but not required, to support more esoteric things like subtracting two
addresses or multiplying an address by a constant or, as in your
case, truncating the top bits of an address away. That kind of
calculation would require support from the entire tool chain from
assembler to loader, including various file formats along the way.
That support generally doesn't exist.
Your case, which is casting a pointer to an integer type, does not fit any of the cases under 6.6 paragraph 7:
More latitude is permitted for constant expressions in initializers.
Such a constant expression shall be, or evaluate to, one of the
following:
an arithmetic constant expression,
a null pointer constant,
an address constant, or
an address constant for an object type plus or minus an integer constant expression.
but, as mentioned in the post, compilers are allowed to support other forms of constant expression:
An implementation may accept other forms of constant expressions.
but neither clang nor gcc accept this.
This code is not required to be accepted by a conforming implementation. You quoted the relevant passage in your question:
More latitude is permitted for constant expressions in initializers. Such a constant expression shall be, or evaluate to, one of the following:
an arithmetic constant expression,
a null pointer constant,
an address constant, or
an address constant for an object type plus or minus an integer constant expression.
(unsigned long)&x is none of those things. It's not an arithmetic constant because of C11 6.6/8:
Cast operators in an arithmetic constant expression shall only convert
arithmetic types to arithmetic types
(pointer types are not arithmetic types, 6.2.5/18); and it is not an address constant because all address constants are pointers (6.6/9). Finally a pointer plus or minus an ICE is another pointer, so it is not that either.
However 6.6/10 says that an implementation may accept other forms of constant expressions. I'm not sure whether this means the original code should be called ill-formed or not (ill-formed code requires a diagnostic). Clearly your compiler is accepting some other constant expressions here.
The next issue is that casting from pointer to integer is implementation-defined. It may also be undefined if there is no integer representation corresponding to the particular pointer. (6.3.2.3/6)
Finally, the + 1 on the end makes no difference. unsigned long arithmetic is well-defined on addition and subtraction, so it is OK if and only if (unsigned long)&x is OK.
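The point about unsigned arithmetic can be checked in isolation. In this sketch (the function names are made up for the example), adding past ULONG_MAX wraps around modulo ULONG_MAX + 1 instead of overflowing.

```c
#include <limits.h>

/* Unsigned addition is reduced modulo ULONG_MAX + 1,
   so it never invokes undefined behavior. */
unsigned long add_wrapping(unsigned long x, unsigned long y)
{
    return x + y;
}

/* Returns 1 when ULONG_MAX + 1 wraps to 0, as the standard requires. */
int max_plus_one_is_zero(void)
{
    return add_wrapping(ULONG_MAX, 1ul) == 0ul;
}
```

So even if (unsigned long)&a happened to be the largest representable value, the + 1 would simply produce 0.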
First of all, your initializer is not necessarily a constant expression. If a has local scope, then it is assigned an address in run-time, when it gets pushed on the stack. C11 6.6/7 says that in order for a pointer to be a constant expression, it has to be an address constant, which is defined in 6.6/9 as:
An address constant is a null pointer, a pointer to an lvalue
designating an object of static storage duration, or a pointer to a
function designator; it shall be created explicitly using the unary &
operator or an integer constant cast to pointer type, or implicitly by
the use of an expression of array or function type.
(Emphasis mine)
As for whether your code is standard C, yes it is. Pointer conversions to integers are allowed, although they may come with various forms of poorly specified behavior. This is specified in 6.3.2.3/6:
Any pointer type may be converted to an integer type. Except as
previously specified, the result is implementation-defined. If the
result cannot be represented in the integer type, the behavior is
undefined. The result need not be in the range of values of any
integer type.
To safely ensure that the pointer can fit into the integer, you need to use uintptr_t. But I don't think pointer to integer conversion was the reason you posted this question.
Regarding whether an integer overflow would prevent it from being a compile time constant, I'm not sure where you got that idea from. I don't believe your reasoning is correct, for example (INT_MAX + INT_MAX) is a compile time constant and it is also guaranteed to overflow. (GCC gives you a warning.) In case it overflows, it will invoke undefined behavior.
As for why you get errors about the expression not being a compile-time constant, I don't know. I can't reproduce it on gcc 4.9.1. I tried declaring a with both static and automatic storage duration but no difference.
Sounds like you somehow accidentally compiled as C90, in which case gcc will tell you "error: initializer element is not computable at load time". Or maybe there was a compiler bug which has been fixed in my version of gcc.

Why does this implementation of offsetof() work?

In ANSI C, offsetof is defined as below.
#define offsetof(st, m) \
((size_t) ( (char *)&((st *)(0))->m - (char *)0 ))
Why won't this throw a segmentation fault since we are dereferencing a NULL pointer? Or is this some sort of compiler hack where it sees that only address of the offset is taken out, so it statically calculates the address without actually dereferencing it? Also is this code portable?
At no point in the above code is anything dereferenced. A dereference occurs when the * or -> is used on an address value to find referenced value. The only use of * above is in a type declaration for the purpose of casting.
The -> operator is used above but it's not used to access the value. Instead it's used to grab the address of the value. Here is a non-macro code sample that should make it a bit clearer
SomeType *pSomeType = GetTheValue();
int* pMember = &(pSomeType->SomeIntMember);
The second line does not actually cause a dereference (implementation dependent). It simply returns the address of SomeIntMember within the pSomeType value.
What you see is a lot of casting between arbitrary types and char pointers. The reason for char is that it's one of the only types (perhaps the only type) in the C89 standard that has an explicit size. The size is 1. By ensuring the size is one, the above code can do the evil magic of calculating the true offset of the value.
Although that is a typical implementation of offsetof, it is not mandated by the standard, which just says:
The following types and macros are defined in the standard header <stddef.h> [...]
offsetof(type,member-designator)
which expands to an integer constant expression that has type size_t, the value of
which is the offset in bytes, to the structure member (designated by member-designator),
from the beginning of its structure (designated by type). The type and member designator
shall be such that given
static type t;
then the expression &(t.member-designator) evaluates to an address constant. (If the specified member is a bit-field, the behavior is undefined.)
Read P J Plauger's "The Standard C Library" for a discussion of it and the other items in <stddef.h> which are all border-line features that could (should?) be in the language proper, and which might require special compiler support.
It's of historic interest only, but I used an early ANSI C compiler on 386/IX (see, I told you of historic interest, circa 1990) that crashed on that version of offsetof but worked when I revised it to:
#define offsetof(st, m) ((size_t)((char *)&((st *)(1024))->m - (char *)1024))
That was a compiler bug of sorts, not least because the header was distributed with the compiler and didn't work.
In ANSI C, offsetof is NOT defined like that. One of the reasons it's not defined like that is that some environments will indeed throw null pointer exceptions, or crash in other ways. Hence, ANSI C leaves the implementation of offsetof( ) open to compiler builders.
The code shown above is typical for compilers/environments that do not actively check for NULL pointers, but fail only when bytes are read from a NULL pointer.
To answer the last part of the question, the code is not portable.
The result of subtracting two pointers is defined and portable only if the two pointers point to objects in the same array or point to one past the last object of the array (7.6.2 Additive Operators, H&S Fifth Edition)
Listing 1: A representative set of offsetof() macro definitions
// Keil 8051 compiler
#define offsetof(s,m) (size_t)&(((s *)0)->m)
// Microsoft x86 compiler (version 7)
#define offsetof(s,m) (size_t)(unsigned long)&(((s *)0)->m)
// Diab Coldfire compiler
#define offsetof(s,memb) ((size_t)((char *)&((s *)0)->memb-(char *)0))
typedef struct
{
int i;
float f;
char c;
} SFOO;
int main(void)
{
printf("Offset of 'f' is %zu\n", offsetof(SFOO, f));
}
The various operators within the macro are evaluated in an order such that the following steps are performed:
((s *)0) takes the integer zero and casts it as a pointer to s.
((s *)0)->m dereferences that pointer to point to structure member m.
&(((s *)0)->m) computes the address of m.
(size_t)&(((s *)0)->m) casts the result to an appropriate data type.
By definition, the structure itself resides at address 0. It follows that the address of the field pointed to (Step 3 above) must be the offset, in bytes, from the start of the structure.
It doesn't segfault because you're not dereferencing it. The pointer address is being used as a number that's subtracted from another number, not used to address memory operations.
It calculates the offset of the member m relative to the start address of the representation of an object of type st.
((st *)(0)) refers to a NULL pointer of type st *.
&((st *)(0))->m refers to the address of member m in this object. Since the start address of this object is 0 (NULL), the address of member m is exactly the offset.
char * conversion and the difference calculates the offset in bytes. According to pointer operations, when you make a difference between two pointers of type T *, the result is the number of objects of type T represented between the two addresses contained by the operands.
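The same byte-offset calculation can be done without a null pointer by using a real object. This sketch reuses the SFOO struct from the listing above and compares the standard offsetof from <stddef.h> with a manual char * difference; the object named sample and the function names are made up for the example.

```c
#include <stddef.h>

typedef struct
{
    int i;
    float f;
    char c;
} SFOO;

/* A real object, so both pointers below point into the same object
   and their difference is well defined. */
static SFOO sample;

/* Offset as reported by the standard macro. */
size_t offset_of_f_standard(void)
{
    return offsetof(SFOO, f);
}

/* Offset computed by hand: the distance in bytes (char units) from
   the start of the object to the member. */
size_t offset_of_f_manual(void)
{
    return (size_t)((char *)&sample.f - (char *)&sample);
}
```

The two functions return the same value, and offsetof(SFOO, i) is 0 because the first member starts at the beginning of the struct. The exact offset of f depends on the platform's int size and padding.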
Quoting the C standard for the offsetof macro:
C standard, section 6.6, paragraph 9
An address constant is a null pointer, a pointer to an lvalue designating an object of static storage duration, or a pointer to a function designator; it shall be created explicitly using the unary & operator or an integer constant cast to pointer type, or implicitly by the use of an expression of array or function type. The array-subscript [] and member-access . and -> operators, the address & and indirection * unary operators, and pointer casts may be used in the creation of an address constant, but the value of an object shall not be accessed by use of these operators.
The macro is defined as
#define offsetof(type, member) ((size_t)&((type *)0)->member)
and the expression comprises the creation of an address constant.
Strictly speaking, though, the result is not an address constant, because it does not point to an object of static storage duration. The rule that the value of an object shall not be accessed still holds, so the integer constant cast to pointer type is never dereferenced.
Also, consider this quote from the C standard:
C standard, section 7.19, paragraph 3
The type and member designator shall be such that given
static type t;
then the expression &(t.member-designator) evaluates to an address constant. (If the
specified member is a bit-field, the behavior is undefined.)
A struct in C is a composite data type (or record) declaration that defines a physically grouped list of variables under one name in a block of memory, allowing the different variables to be accessed via a single pointer or by the struct declared name which returns the same address.
From the compiler perspective, the struct declared name is an address and the member designator is an offset from that address.

Resources