Linux Kernel's __is_constexpr Macro - c

How does the __is_constexpr(x) macro of the Linux Kernel work? What is its purpose? When was it introduced? Why was it introduced?
/*
* This returns a constant expression while determining if an argument is
* a constant expression, most importantly without evaluating the argument.
* Glory to Martin Uecker <Martin.Uecker#med.uni-goettingen.de>
*/
#define __is_constexpr(x) \
(sizeof(int) == sizeof(*(8 ? ((void *)((long)(x) * 0l)) : (int *)8)))
For a discussion on different approaches to solve the same problem, see instead: Detecting Integer Constant Expressions in Macros

Linux Kernel's __is_constexpr macro
Introduction
The __is_constexpr(x) macro can be found in Linux Kernel's include/kernel/kernel.h:
/*
* This returns a constant expression while determining if an argument is
* a constant expression, most importantly without evaluating the argument.
* Glory to Martin Uecker <Martin.Uecker#med.uni-goettingen.de>
*/
#define __is_constexpr(x) \
(sizeof(int) == sizeof(*(8 ? ((void *)((long)(x) * 0l)) : (int *)8)))
It was introduced during the merge window for Linux Kernel v4.17, commit 3c8ba0d61d04 on 2018-04-05; although the discussions around it started a month before.
The macro is notable for taking advantage of subtle details of the C standard: the conditional operator's rules for determining its returned type (6.5.15.6) and the definition of a null pointer constant (6.3.2.3.3).
In addition, it relies on sizeof(void) being allowed (and different than sizeof(int)), which is a GNU C extension.
How does it work?
The macro's body is:
(sizeof(int) == sizeof(*(8 ? ((void *)((long)(x) * 0l)) : (int *)8)))
Let's focus on this part:
((void *)((long)(x) * 0l))
Note: the (long)(x) cast is intended to allow x to have pointer types and to avoid warnings on u64 types on 32-bit platforms. However, this detail is not important for understanding the key points of the macro.
If x is an integer constant expression (6.6.6), then it follows that ((long)(x) * 0l) is an integer constant expression of value 0. Therefore, (void *)((long)(x) * 0l) is a null pointer constant (6.3.2.3.3):
An integer constant expression with the value 0, or such an expression cast to type void *, is called a null pointer constant
If x is not an integer constant expression, then (void *)((long)(x) * 0l) is not a null pointer constant, regardless of its value.
Knowing that, we can see what happens afterwards:
8 ? ((void *)((long)(x) * 0l)) : (int *)8
Note: the second 8 literal is intended to avoid compiler warnings about creating pointers to unaligned addresses. The first 8 literal could simply be 1. However, these details are not important for understanding the key points of the macro.
The key here is that the conditional operator returns a different type depending on whether one of the operands is a null pointer constant (6.5.15.6):
[...] if one operand is a null pointer constant, the result has the type of the other operand; otherwise, one operand is a pointer to void or a qualified version of void, in which case the result type is a pointer to an appropriately qualified version of void.
So, if x was an integer constant expression, then the second operand is a null pointer constant and therefore the type of the expression is the type of the third operand, which is a pointer to int.
Otherwise, the second operand is a pointer to void and thus the type of the expression is a pointer to void.
Therefore, we end up with two possibilities:
sizeof(int) == sizeof(*((int *) (NULL))) // if `x` was an integer constant expression
sizeof(int) == sizeof(*((void *)(....))) // otherwise
According to the GNU C extension, sizeof(void) == 1. Therefore, if x was an integer constant expression, the result of the macro is 1; otherwise, 0.
Moreover, since we are only comparing for equality two sizeof expressions, the result is itself another integer constant expression (6.6.3, 6.6.6):
Constant expressions shall not contain assignment, increment, decrement, function-call, or comma operators, except when they are contained within a subexpression that is not evaluated.
An integer constant expression shall have integer type and shall only have operands that are integer constants, enumeration constants, character constants, sizeof expressions whose results are integer constants, and floating constants that are the immediate operands of casts. Cast operators in an integer constant expression shall only convert arithmetic types to integer types, except as part of an operand to the sizeof operator.
Therefore, in summary, the __is_constexpr(x) macro returns an integer constant expression of value 1 if the argument is an integer constant expression. Otherwise, it returns an integer constant expression of value 0.
Why was it introduced?
The macro came to be during the effort to remove all Variable Length Arrays (VLAs) from the Linux kernel.
In order to facilitate so, it was desirable to enable GCC's -Wvla warning kernel-wide; so that all instances of VLAs were flagged by the compiler.
When the warning was enabled, it turned out that GCC reported many cases of arrays being VLAs which was not intended to be so. For instance in fs/btrfs/tree-checker.c:
#define BTRFS_NAME_LEN 255
#define XATTR_NAME_MAX 255
char namebuf[max(BTRFS_NAME_LEN, XATTR_NAME_MAX)];
A developer may expect that max(BTRFS_NAME_LEN, XATTR_NAME_MAX) was resolved to 255 and therefore it should be treated as a standard array (i.e. non-VLA). However, this depends on what the max(x, y) macro expands to.
The key issue is that GCC generates VLA-code if the array's size is not an (integer) constant expression as defined by the C standard. For instance:
#define not_really_constexpr ((void)0, 100)
int a[not_really_constexpr];
According to the C90 standard, ((void)0, 100) is not a constant expression (6.6), due to the comma operator being used (6.6.3). In this case, GCC opts to issue VLA code, even when it knows the size is a compile-time constant. Clang, in contrast, does not.
Since the max(x, y) macro in the kernel was not a constant expression, GCC triggered the warnings and generated VLA code where kernel developers did not intend it.
Therefore, a few kernel developers tried to develop alternative versions of the max and other macros to avoid the warnings and the VLA code. Some attempts tried to leverage GCC's __builtin_constant_p builtin, but no approach worked with all the versions of GCC that the kernel supported at the time (gcc >= 4.4).
At some point, Martin Uecker proposed a particularly clever approach that did not use builtins (taking inspiration from glibc's tgmath.h):
#define ICE_P(x) (sizeof(int) == sizeof(*(1 ? ((void*)((x) * 0l)) : (int*)1)))
While the approach uses a GCC extension, it was nevertheless well-received and was used as the key idea behind the __is_constexpr(x) macro which appeared in the kernel after a few iterations with other developers. The macro was used then to implement the max macro and other macros that are required to be constant expressions in order to avoid GCC generating VLA code.

Related

Why are there two ways of expressing NULL in C?

According to §6.3.2.3 ¶3 of the C11 standard, a null pointer constant in C can be defined by an implementation as either the integer constant expression 0 or such an expression cast to void *. In C the null pointer constant is defined by the NULL macro.
My implementation (GCC 9.4.0) defines NULL in stddef.h in the following ways:
#define NULL ((void *)0)
#define NULL 0
Why are both of the above expressions considered semantically equivalent in the context of NULL? More specifically, why do there exist two ways of expressing the same concept rather than one?
Let's consider this example code:
#include <stddef.h>
int *f(void) { return NULL; }
int g(int x) { return x == NULL ? 3 : 4; }
We want f to compile without warnings, and we want g to cause an error or a warning (because an int variable x was compared to a pointer).
In C, #define NULL ((void*)0) gives us both (GCC warning for g, clean compile for f).
However, in C++, #define NULL ((void*)0) causes a compile error for f. Thus, to make it compile in C++, <stddef.h> has #define NULL 0 for C++ only (not for C). Unfortunately, this also prevents the warning from being reported for g. To fix that, C++11 uses built-in nullptr instead of NULL, and with that, C++ compilers report an error for g, and they compile f cleanly.
((void *)0) has stronger typing and could lead to better compiler or static analyser diagnostics. For example since implicit conversions between pointers and plain integers aren't allowed in standard C.
0 is likely allowed for historical reasons, from a pre-standard time when everything in C was pretty much just integers and wild implicit conversions between pointers and integers were allowed, though possibly resulting in undefined behavior.
Ancient K&R 1st edition provides some insight (7.14 the assignment operator):
The compilers currently allow a pointer to be assigned to an integer, an integer to a pointer, and a pointer to a pointer of another type. The assignment is a pure copy operation, with no conversion. This usage is nonportable, and may produce pointers which cause addressing exceptions when used. However, it is guaranteed that assignment of the constant 0 to a pointer will produce a null pointer distinguishable from a pointer to any object.
Few things in C are more confusing than null pointers. The C FAQ list devotes an entire section to the topic, and to the myriad misunderstandings that eternally arise. And we can see that those misunderstandings never go away, as some of them are being recycled even in this thread, in 2022.
The basic facts are these:
C has the concept of a null pointer, a distinguished pointer value which points definitively nowhere.
The source code construct by which a null pointer is requested — a null pointer constant — fundamentally involves the token 0.
Because the token 0 has other uses, ambiguity (not to mention confusion) is possible.
To help reduce the confusion and ambiguity, for many years the token 0 as a null pointer constant has been hidden behind the preprocessor macro NULL.
To provide some type safety and further reduce errors, it's attractive to have the macro definition of NULL include a pointer cast.
However, and most unfortunately, enough confusion crept in along the way that properly mitigating it all has become almost impossible. In particular, there is so very much extant code that says things like strbuf[len] = NULL; (in an obvious but basically wrong attempt to null-terminate a string) that it is believed in some circles to be impossible to actually define NULL with an expansion including either the explicit cast or the hypothetical future (or extant in C++) new keyword nullptr.
See also Why not call nullptr NULL?
Footnote (call this point 3½): It's also possible for a null pointer — despite being represented in C source code as an integer constant 0 — to have an internal value that is not all-bits-0. This fact adds massively to the confusion whenever this topic is discussed, but it doesn't fundamentally change the definition.
There is just one way to express NULL in C, it's a single 4-character token.
But hold on, when going into its definition it gets more interesting.
NULL has to be defined as a null pointer constant, meaning an integer constant with value 0 or such cast to void*.
As an integer constant is just an expression of integer type with a few restrictions to guarantee static evaluation, there are infinite possibilities for any wanted value.
Of all those possibilities, only an integer literal with value 0 is also a null pointer constant in C++, for what it's worth.
The reason for such variation is history and precedent (everyone did it differently, void* was late to the party, and existing code/implementations trumps all), reinforced with backwards-compatibility which preserves it.
6.3.2.3 Pointers
[...]
An integer constant expression with the value 0, or such an expression cast to type void *, is called a null pointer constant.
67) If a null pointer constant is converted to a pointer type, the resulting pointer, called a null pointer, is guaranteed to compare unequal to a pointer to any object or function.
[...]
6.6 Constant expressions
[...]
Description
2 A constant expression can be evaluated during translation rather than runtime, and accordingly may be used in any place that a constant may be.
Constraints
3 Constant expressions shall not contain assignment, increment, decrement, function-call, or comma operators, except when they are contained within a subexpression that is not evaluated.117)
4 Each constant expression shall evaluate to a constant that is in the range of representable values for its type.
Semantics
5 An expression that evaluates to a constant is required in several contexts. If a floating expression is evaluated in the translation environment, the arithmetic range and precision shall be at least as
great as if the expression were being evaluated in the execution environment.118)
6 An integer constant expression119) shall have integer type and shall only have operands that are integer constants, enumeration constants, character constants, sizeof expressions whose results are integer constants, _Alignof expressions, and floating constants that are the immediate operands of casts.
Cast operators in an integer constant expression shall only convert arithmetic types to integer types, except as part of an operand to the sizeof or _Alignof operator.
C was originally developed on machines where a null pointer constant and the integer constant 0 had the same representation. Later, some vendors ported the language to mainframes where a different special value triggered a hardware trap when used as a pointer, and wanted to use that value for NULL. These companies discovered that so much existing code type-punned between integers and pointers, they had to recognize 0 as a special constant that could implicitly convert to a null pointer constant. ANSI C incorporated this behavior, at the same time as they introduced the void* as a pointer that implicitly converts to any type of object pointer. This allowed NULL to be used as a safer alternative to 0.
I’ve seen some code that (possibly tongue-in-cheek) detected one of these machines by testing if ((char*)1 == 0).
why do there exist two ways of expressing the same concept rather than one?
History.
NULL started as 0 and later better programming practices encouraged ((void *)0).
First, there are more than 2 ways:
#define NULL ((void *)0)
#define NULL 0
#define NULL 0L
#define NULL 0LL
#define NULL 0u
...
Before void * (Pre C89)
Before void * and void existed, #define NULL some_integer_type_of_zero was used.
It was useful to have the size of that integer type to match the size of object pointers. Consider the below. With 16-bit int and 32-bit long, it is useful for the type of zero used to match the width of an object pointer.
Consider printing pointers.
double x;
printf("%ld\n", &x); // On systems where an object pointer was same size as long
printf("%ld\n", NULL);// Would like to use the same specifier for NULL
With 32-bit object pointers, #define NULL 0L is better.
double x;
printf("%d\n", &x); // On systems where an object pointer was same size as int
printf("%d\n", NULL);// Would like to use the same specifier for NULL
With 16-bit object pointers, #define NULL 0 is better.
C89
After the birth of void, void *, it is natural to have the null pointer constant to be a pointer type. This allowed the bit pattern of (void*)0) to be non-zero. This was useful in some architectures.
printf("%p\n", NULL);
With 16-bit object pointers, #define NULL ((void*)0) works above.
With 32-bit object pointers, #define NULL ((void*)0) works.
With 64-bit object pointers, #define NULL ((void*)0) works.
With 16-bit int, #define NULL ((void*)0) works.
With 32-bit int, #define NULL ((void*)0) works.
We now have independence of the int/long/object pointer size. ((void*)0) works in all cases.
Using #define NULL 0 creates issues when passing NULL as a ... argument, hence the irksome need to do printf("%p\n", (void*)NULL); for highly portable code.
With #define NULL ((void*)0), code like char n = NULL; will more likely raise a warning, unlike ``#define NULL 0`
C99
With the advent of _Generic, we can distinguish, for better or worse, NULL as a void *, int, long, ...
According to §6.3.2.3 ¶3 of the C11 standard, a null pointer constant in C can be defined by an implementation as either the integer constant expression 0 or such an expression cast to void *.
No, that a misleading paraphrase of the language spec. The actual language of the cited paragraph is
An integer constant expression with the value 0, or such an expression cast to type void *, is called a null pointer constant. [...]
Implementations don't get to choose between those alternatives. Both are forms of a null pointer constant in the C language. They can be used interchangeably for the purpose.
Moreover, not only the specific integer constant expression 0 can serve in this role, but any integer constant expression with value 0 can do. For example, 1 + 2 + 3 + 4 - 10 is such an expression.
Additionally, do not confuse null pointer constants generally with the macro NULL. The latter is defined by conforming implementations to expand to a null pointer constant, but that doesn't mean that the replacement text of NULL is the only null pointer constant.
My implementation (GCC 9.4.0) defines NULL in stddef.h in the
following ways:
#define NULL ((void *)0)
#define NULL 0
Not both at the same time, of course.
Why are both of the above expressions considered semantically
equivalent in the context of NULL?
Again with the reversal. It's not "the context of NULL". It's pointer context. There is nothing particularly special about the macro NULL itself to distinguish contexts in which it appears from contexts where its replacement text appears directly.
And I guess you're asking for rationale for paragraph 6.3.2.3/3, as opposed to "because 6.3.2.3/3". There is no published rationale for C11. There is one for C99, which largely serves for C90 as well, but it does not address this issue.
It should be noted, however, that void (and therefore void *) was an invention of the committee that developed the original C language specification ("ANSI C" / C89 / C90). There was no possibility of an "integer constant expression cast to type void *" before then.
More specifically, why do there
exist two ways of expressing the same concept rather than one?
Are there, really?
If we accept an integer constant expression with value 0 as a null pointer constant (a source-code entity), and we want to convert it to a runtime null pointer value, then which pointer type do we choose? Pointers to different object types do not necessarily have the same representation, so this actually matters. Type void * seems the natural choice to me, and that's consistent with the fact that, alone of all pointer types, void * can be converted to other object pointer types without a cast.
But then, in a context where 0 is being interpreted as a null pointer constant, casting it to void * is a no-op, so (void *) 0 expresses exactly the same thing as 0 in such a context.
What's really going on here
At the time the ANSI committee was working, many existing C implementations accepted integer-to-pointer conversions without a cast, and although the meaning of most such conversions was implementation and / or context specific, there was wide acceptance that converting constant 0 to a pointer yielded a null pointer. That use was by far the most common one of converting an integer constant to a pointer. The committee wanted to impose stricter rules on type conversions, but it did not want to break all the existing code that used 0 as a constant representing a null pointer.
So they hacked the spec.
They invented a special kind of constant, the null pointer constant, and provided rules around it that made it compatible with existing use. A null pointer constant, regardless of lexical form, can be implicitly converted to any pointer type, yielding a null pointer (value) of that type. Otherwise, no implicit integer-to-pointer conversions are defined.
But the committee preferred that null pointer constants should actually have pointer type without conversion (which 0 does not, pointer context or no), so they provided for the "cast to type void *" option as part of the definition of a null pointer constant. At the time, that was a forward-looking move, but the general consensus now appears to be that it was the right direction to aim.
And why do we still have the "integer constant expression with value 0"? Backwards compatibility. Consistency with conventional idioms such as {0} as a universal initializer for objects of any type. Resistance to change. Perhaps other reasons as well.
The "why" - it is for historical reasons. NULL was used in various implementations before it was added to a standard. And at the time it was added to a C standard, implementations defined NULL usually as 0, or as 0 cast to some pointer. At that point you wouldn't want to make one of them illegal, because whichever you made illegal, you'd break half the existing code.
The C11 standard allows for a null pointer constant to be defined either as the integer constant expression 0 or as an expression that is cast to void *. The use of the NULL macro makes it easier for programmers to use the null pointer constant in their code, as they don't have to remember which of these definitions the implementation uses.
Using a macro also makes it easier to change the underlying definition of the null pointer constant in the future, if necessary. For example, if the implementation decided to change the definition of NULL to be a different integer constant expression, they could do so by simply modifying the definition of the NULL macro. This would not require any changes to the code that uses the NULL macro, as long as the code uses the NULL macro consistently.
There are two definitions of the NULL macro provided in the example you gave because some systems may define NULL as an expression that is cast to void *, while others may define it as the integer constant expression 0. By providing both definitions, the stddef.h header can be used on a wide range of systems without requiring any modifications.

Why is the ternary operator behaving _weirdly_ in that syntax? [duplicate]

How does the __is_constexpr(x) macro of the Linux Kernel work? What is its purpose? When was it introduced? Why was it introduced?
/*
* This returns a constant expression while determining if an argument is
* a constant expression, most importantly without evaluating the argument.
* Glory to Martin Uecker <Martin.Uecker#med.uni-goettingen.de>
*/
#define __is_constexpr(x) \
(sizeof(int) == sizeof(*(8 ? ((void *)((long)(x) * 0l)) : (int *)8)))
For a discussion on different approaches to solve the same problem, see instead: Detecting Integer Constant Expressions in Macros
Linux Kernel's __is_constexpr macro
Introduction
The __is_constexpr(x) macro can be found in Linux Kernel's include/kernel/kernel.h:
/*
* This returns a constant expression while determining if an argument is
* a constant expression, most importantly without evaluating the argument.
* Glory to Martin Uecker <Martin.Uecker#med.uni-goettingen.de>
*/
#define __is_constexpr(x) \
(sizeof(int) == sizeof(*(8 ? ((void *)((long)(x) * 0l)) : (int *)8)))
It was introduced during the merge window for Linux Kernel v4.17, commit 3c8ba0d61d04 on 2018-04-05; although the discussions around it started a month before.
The macro is notable for taking advantage of subtle details of the C standard: the conditional operator's rules for determining its returned type (6.5.15.6) and the definition of a null pointer constant (6.3.2.3.3).
In addition, it relies on sizeof(void) being allowed (and different than sizeof(int)), which is a GNU C extension.
How does it work?
The macro's body is:
(sizeof(int) == sizeof(*(8 ? ((void *)((long)(x) * 0l)) : (int *)8)))
Let's focus on this part:
((void *)((long)(x) * 0l))
Note: the (long)(x) cast is intended to allow x to have pointer types and to avoid warnings on u64 types on 32-bit platforms. However, this detail is not important for understanding the key points of the macro.
If x is an integer constant expression (6.6.6), then it follows that ((long)(x) * 0l) is an integer constant expression of value 0. Therefore, (void *)((long)(x) * 0l) is a null pointer constant (6.3.2.3.3):
An integer constant expression with the value 0, or such an expression cast to type void *, is called a null pointer constant
If x is not an integer constant expression, then (void *)((long)(x) * 0l) is not a null pointer constant, regardless of its value.
Knowing that, we can see what happens afterwards:
8 ? ((void *)((long)(x) * 0l)) : (int *)8
Note: the second 8 literal is intended to avoid compiler warnings about creating pointers to unaligned addresses. The first 8 literal could simply be 1. However, these details are not important for understanding the key points of the macro.
The key here is that the conditional operator returns a different type depending on whether one of the operands is a null pointer constant (6.5.15.6):
[...] if one operand is a null pointer constant, the result has the type of the other operand; otherwise, one operand is a pointer to void or a qualified version of void, in which case the result type is a pointer to an appropriately qualified version of void.
So, if x was an integer constant expression, then the second operand is a null pointer constant and therefore the type of the expression is the type of the third operand, which is a pointer to int.
Otherwise, the second operand is a pointer to void and thus the type of the expression is a pointer to void.
Therefore, we end up with two possibilities:
sizeof(int) == sizeof(*((int *) (NULL))) // if `x` was an integer constant expression
sizeof(int) == sizeof(*((void *)(....))) // otherwise
According to the GNU C extension, sizeof(void) == 1. Therefore, if x was an integer constant expression, the result of the macro is 1; otherwise, 0.
Moreover, since we are only comparing for equality two sizeof expressions, the result is itself another integer constant expression (6.6.3, 6.6.6):
Constant expressions shall not contain assignment, increment, decrement, function-call, or comma operators, except when they are contained within a subexpression that is not evaluated.
An integer constant expression shall have integer type and shall only have operands that are integer constants, enumeration constants, character constants, sizeof expressions whose results are integer constants, and floating constants that are the immediate operands of casts. Cast operators in an integer constant expression shall only convert arithmetic types to integer types, except as part of an operand to the sizeof operator.
Therefore, in summary, the __is_constexpr(x) macro returns an integer constant expression of value 1 if the argument is an integer constant expression. Otherwise, it returns an integer constant expression of value 0.
Why was it introduced?
The macro came to be during the effort to remove all Variable Length Arrays (VLAs) from the Linux kernel.
In order to facilitate so, it was desirable to enable GCC's -Wvla warning kernel-wide; so that all instances of VLAs were flagged by the compiler.
When the warning was enabled, it turned out that GCC reported many cases of arrays being VLAs which was not intended to be so. For instance in fs/btrfs/tree-checker.c:
#define BTRFS_NAME_LEN 255
#define XATTR_NAME_MAX 255
char namebuf[max(BTRFS_NAME_LEN, XATTR_NAME_MAX)];
A developer may expect that max(BTRFS_NAME_LEN, XATTR_NAME_MAX) was resolved to 255 and therefore it should be treated as a standard array (i.e. non-VLA). However, this depends on what the max(x, y) macro expands to.
The key issue is that GCC generates VLA-code if the array's size is not an (integer) constant expression as defined by the C standard. For instance:
#define not_really_constexpr ((void)0, 100)
int a[not_really_constexpr];
According to the C90 standard, ((void)0, 100) is not a constant expression (6.6), due to the comma operator being used (6.6.3). In this case, GCC opts to issue VLA code, even when it knows the size is a compile-time constant. Clang, in contrast, does not.
Since the max(x, y) macro in the kernel was not a constant expression, GCC triggered the warnings and generated VLA code where kernel developers did not intend it.
Therefore, a few kernel developers tried to develop alternative versions of the max and other macros to avoid the warnings and the VLA code. Some attempts tried to leverage GCC's __builtin_constant_p builtin, but no approach worked with all the versions of GCC that the kernel supported at the time (gcc >= 4.4).
At some point, Martin Uecker proposed a particularly clever approach that did not use builtins (taking inspiration from glibc's tgmath.h):
#define ICE_P(x) (sizeof(int) == sizeof(*(1 ? ((void*)((x) * 0l)) : (int*)1)))
While the approach uses a GCC extension, it was nevertheless well-received and was used as the key idea behind the __is_constexpr(x) macro which appeared in the kernel after a few iterations with other developers. The macro was used then to implement the max macro and other macros that are required to be constant expressions in order to avoid GCC generating VLA code.

Not a constant initializer element?

I encountered a confusing case when I was doing semantic analysis for my compiler course.
#include <stdio.h>
int a = "abcd"[2];
int main()
{
char b = "abcd"[2];
printf("%d\n%c\n", a, b);
return 0;
}
GCC says "error: initializer element is not constant" for variable "a".
Why?
The C language requires initializers for global variables to be constant expressions. The motivation behind this is for the compiler to be able to compute the expression at compile time and write the computed value into the generated object file.
The C standard provides specific rules for what is a constant expression:
An
integer constant expression117)
shall have integer type and shall only have operands
that are integer constants, enumeration constants, character constants,
sizeof
expressions whose results are integer constants,
_Alignof
expressions, and floating
constants that are the immediate operands of casts. Cast operators in an integer constant
expression shall only convert arithmetic types to integer types, except as part of an
operand to the
sizeof
or
_Alignof
operator
.
More latitude is permitted for constant expressions in initializers. Such a constant
expression shall be, or evaluate to, one of the following:
an arithmetic constant expression,
a null pointer constant,
an address constant, or
an address constant for a complete object type plus or minus an integer constant
expression.
As you can see non of the cases include an array access expression or a pointer dereference. So "abcd"[2] does not qualify as a constant expression per the standard.
Now the standard also says:
An implementation may accept other forms of constant expressions.
So it would not violate the standard to allow "abcd"[1] as a constant expression, but it's also not guaranteed to be allowed.
So it's up to you whether or not to allow it in your compiler. It will be standard compliant either way (though allowing it is more work as you need another case in your isConstantExpression check and you need to actually be able to evaluate the expression at compile time, so I'd go with disallowing it).
int a = "abcd"[2];
a is a global variable initilize at compile time but the "abcd"[2] is computed at run time.
char b = "abcd"[2];
here b is local variable and it initilize at run time after "abcd"[2] computed.

Initializer with constant expression having possible overflow in C99

Is this valid C99 code? If so, does it define an implementation-defined behavior?
int a;
unsigned long b[] = {(unsigned long)&a+1};
From my understanding of the C99 standard, from §6.6 in the ISO C99 standard, this might be valid:
An integer constant expression shall have integer type and shall only have operands that are integer constants (...) Cast operators in an integer constant expression shall only convert arithmetic types to integer types, except as part of an operand to the sizeof operator.
More latitude is permitted for constant expressions in initializers. Such a constant expression shall be, or evaluate to, one of the following:
an arithmetic constant expression,
(...)
an address constant for an object type plus or minus an integer constant expression.
However, because there is the possibility of the addition overflowing, this might not be considered a constant expression and therefore not valid C99 code.
Could someone please confirm if my reasoning is correct?
Note that both GCC and Clang accept this code without warnings, even when using -std=c99 -pedantic. However, when casting to unsigned int instead of unsigned long, that is, using the following code:
int a;
unsigned long b[] = {(unsigned int)&a+1};
Then both compilers complain that the expression is not a compile-time constant.
From this clang developers thread on a similar issue: Function pointer is compile-time constant when cast to long but not int? the rationale is that the standard does not require the compiler to support this(this scenario is not included in any of bullets in 6.6p7) and although it is allowed to support this supporting truncated addresses would be burdensome:
I assume that sizeof(int) < sizeof(void(*)()) == sizeof(long) on
your target. The problem is that the tool chain almost certainly
can't express a truncated address as a relocation.
C only requires the implementation to support initializer values that
are either (1) constant binary data, (2) the address of some object, or
(3) or an offset added to the address of some object. We're allowed,
but not required, to support more esoteric things like subtracting two
addresses or multiplying an address by a constant or, as in your
case, truncating the top bits of an address away. That kind of
calculation would require support from the entire tool chain from
assembler to loader, including various file formats along the way.
That support generally doesn't exist.
Your case, which is casting a pointer to a integer type does not fit any of the cases under 6.6 paragraph 7:
More latitude is permitted for constant expressions in initializers.
Such a constant expression shall be, or evaluate to, one of the
following:
an arithmetic constant expression,
anull pointer constant,
an address constant, or
an address constant for an object type plus or minus an integer constant expression.
but as mentioned in the post compiler are allowed to support other forms of constant expression:
An implementation may accept other forms of constant expressions.
but neither clang nor gcc accept this.
This code is not required to be accepted by a conforming implementation. You quoted the relevant passage in your question:
More latitude is permitted for constant expressions in initializers. Such a constant expression shall be, or evaluate to, one of the following:
an arithmetic constant expression,
a null pointer constant,
an address constant, or
an address constant for an object type plus or minus an integer constant expression.
(unsigned long)&x is none of those things. It's not an arithmetic constant because of C11 6.6/8:
Cast operators in an arithmetic constant expression shall only convert
arithmetic types to arithmetic types
(pointer types are not arithmetic types, 6.2.5/18); and it is not an address constant because all address constants are pointers (6.6/9). Finally a pointer plus or minus an ICE is another pointer, so it is not that either.
However 6.6/10 says that an implementation may accept other forms of constant expressions. I'm not sure whether this means the original code should be called ill-formed or not (ill-formed code requires a diagnostic). Clearly your compiler is accepting some other constant expressions here.
The next issue is that casting from pointer to integer is implementation-defined. It may also be undefined if there is no integer representation corresponding to the particular pointer. (6.3.2.3/6)
Finally, the + 1 on the end makes no difference. unsigned long arithmetic is well-defined on addition and subtraction, so it is OK if and only if (unsigned long)&x is OK.
First of all, your initializer is not necessarily a constant expression. If a has local scope, then it is assigned an address in run-time, when it gets pushed on the stack. C11 6.6/7 says that in order for a pointer to be a constant expression, it has to be an address constant, which is defined in 6.6/9 as:
An address constant is a null pointer, a pointer to an lvalue
designating an object of static storage duration, or a pointer to a
function designator; it shall be created explicitly using the unary &
operator or an integer constant cast to pointer type, or implicitly by
the use of an expression of array or function type.
(Emphasis mine)
As for whether your code is standard C, yes it is. Pointer conversions to integers are allowed, although they may come with various forms of poorly specified behavior. Specified in 6.5/6:
Any pointer type may be converted to an integer type. Except as
previously specified, the result is implementation-defined. If the
result cannot be represented in the integer type, the behavior is
undefined. The result need not be in the range of values of any
integer type.
To safely ensure that the pointer can fit into the integer, you need to use uintptr_t. But I don't think pointer to integer conversion was the reason you posted this question.
Regarding whether an integer overflow would prevent it from being a compile time constant, I'm not sure where you got that idea from. I don't believe your reasoning is correct, for example (INT_MAX + INT_MAX) is a compile time constant and it is also guaranteed to overflow. (GCC gives you a warning.) In case it overflows, it will invoke undefined behavior.
As for why you get errors about the expression not being a compile-time constant, I don't know. I can't reproduce it on gcc 4.9.1. I tried declaring a with both static and automatic storage duration but no difference.
Sounds like you somehow accidentally compiled as C90, in which case gcc will tell you "error: initializer element is not computable at load time". Or maybe there was a compiler bug which has been fixed in my version of gcc.

Reliably determine the number of elements in an array

Every C programmer can determine the number of elements in an array with this well-known macro:
#define NUM_ELEMS(a) (sizeof(a)/sizeof 0[a])
Here is a typical use case:
int numbers[] = {2, 3, 5, 7, 11, 13, 17, 19};
printf("%lu\n", NUM_ELEMS(numbers)); // 8, as expected
However, nothing prevents the programmer from accidentally passing a pointer instead of an array:
int * pointer = numbers;
printf("%lu\n", NUM_ELEMS(pointer));
On my system, this prints 2, because apparently, a pointer is twice as large as an integer. I thought about how to prevent the programmer from passing a pointer by mistake, and I found a solution:
#define NUM_ELEMS(a) (assert((void*)&(a) == (void*)(a)), (sizeof(a)/sizeof 0[a]))
This works because a pointer to an array has the same value as a pointer to its first element. If you pass a pointer instead, the pointer will be compared with a pointer to itself, which is almost always false. (The only exception is a recursive void pointer, that is, a void pointer that points to itself. I can live with that.)
Accidentally passing a pointer instead of an array now triggers an error at runtime:
Assertion `(void*)&(pointer) == (void*)(pointer)' failed.
Nice! Now I have a couple of questions:
Is my usage of assert as the left operand of the comma expression valid standard C? That is, does the standard allow me to use assert as an expression? Sorry if this is a dumb question :)
Can the check somehow be done at compile-time?
My C compiler thinks that int b[NUM_ELEMS(a)]; is a VLA. Any way to convince him otherwise?
Am I the first to think of this? If so, how many virgins can I expect to be waiting for me in heaven? :)
Is my usage of assert as the left operand of the comma expression valid standard C? That is, does the standard allow me to use assert as an expression?
Yes, it is valid as the left operand of the comma operator can be an expression of type void. And assert function has void as its return type.
My C compiler thinks that int b[NUM_ELEMS(a)]; is a VLA. Any way to convince him otherwise?
It believes so because the result of a comma expression is never a constant expression (e..g, 1, 2 is not a constant expression).
EDIT1: add the update below.
I have another version of your macro which works at compile time:
#define NUM_ELEMS(arr) \
(sizeof (struct {int not_an_array:((void*)&(arr) == &(arr)[0]);}) * 0 \
+ sizeof (arr) / sizeof (*(arr)))
and which seems to work even also with initializer for object with static storage duration.
And it also work correctly with your example of int b[NUM_ELEMS(a)]
EDIT2:
to address #DanielFischer comment. The macro above works with gcc without -pedantic only because gcc accepts :
(void *) &arr == arr
as an integer constant expression, while it considers
(void *) &ptr == ptr
is not an integer constant expression. According to C they are both not integer constant expressions and with -pedantic, gcc correctly issues a diagnostic in both cases.
To my knowledge there is no 100% portable way to write this NUM_ELEM macro. C has more flexible rules with initializer constant expressions (see 6.6p7 in C99) which could be exploited to write this macro (for example with sizeof and compound literals) but at block-scope C does not require initializers to be constant expressions so it will not be possible to have a single macro which works in all cases.
EDIT3:
I think it is worth mentioning that the Linux kernel has an ARRAY_SIZE macro (in include/linux/kernel.h) that implements such a check when sparse (the kernel static analysis checker) is executed.
Their solution is not portable and make use of two GNU extensions:
typeof operator
__builtin_types_compatible_p builtin function
Basically it looks like something like that:
#define NUM_ELEMS(arr) \
(sizeof(struct {int :-!!(__builtin_types_compatible_p(typeof(arr), typeof(&(arr)[0])));}) \
+ sizeof (arr) / sizeof (*(arr)))
Yes. The left expression of a comma operator is always evaluated as a void expression (C99 6.5.17#2). Since assert() is a void expression, no problem to begin with.
Maybe. While the C preprocessor doesn't know about types and casts and can't compare addresses you can use the same trick as for evaluating sizeof() at compile time, e.g. declaring an array the dimension of which is a boolean expression. When 0 it is a constraint violation and a diagnostic must be issued. I've tried it here, but so far have not been successful... maybe the answer actually is "no".
No. Casts (of pointer types) are not integer constant expressions.
Probably not (nothing new under the Sun these days). An indeterminate number of virgins of indeterminate sex :-)

Resources