Is there any way of determining whether a given expression is an lvalue or an rvalue in C? For instance, does there exist a function or macro is_lvalue with the following sort of behaviour:
int four() {
return 4;
}
int a = 4;
/* Returns true */
is_lvalue(a);
/* Returns false */
is_lvalue(four());
I know equivalent functionality exists in C++, but is there any way of doing this in any standard of C? I'm not particularly interested in GCC-specific extensions.
Thank you for your time.
The C standard does not provide any method for detecting whether an expression is an lvalue or not, either by causing some operation to have different values depending on whether an operand is an lvalue or not or by generating a translation-time diagnostic message or error depending on whether an operand is an lvalue or not.
C implementations may of course define an extension that provides this feature.
About the closest one can get in strictly conforming C is to attempt to take the address of the expression with the address-of operator &. This will produce a diagnostic message (and, in typical C implementations, an error) if its operand is not an lvalue. However, it will also produce a message for lvalues that are bit-fields or that were declared with register. If these are excluded from the cases of interest, then it may serve to distinguish between lvalues and non-lvalues during program translation.
Related
I don't understand the Undefined Behaviours in C99 related to constant expression.
For example:
An expression that is required to be an integer constant expression
does not have an integer type; has operands that are not integer
constants, enumeration constants, character constants, sizeof
expressions whose results are integer constants, or immediately-cast
floating constants; or contains casts (outside operands to sizeof
operators) other than conversions of arithmetic types to integer types
(6.6).
I can't find an example of such UB ?
Furthermore I don't understant why a constant expression (evaluated at translation time) does not become an expression evaluated at runtime (instead of being UB).
This is quoted from the informative annex J. To find the actual normative text you have to go the section that the appendix J points at, in this case the definition of integer constant expression C99 6.6:
An integer constant expression99) shall have integer type and shall only have operands
that are integer constants, enumeration constants, character constants, sizeof
expressions whose results are integer constants, and floating constants that are the
immediate operands of casts.
That text is pretty self-explanatory IMO. That is: whenever syntax or normative text elsewhere requires an integer constant expression, whatever you place at such a location must fulfil the above quoted part, or it is not an integer constant expression but undefined behavior. (Violating a "shall" requirement in normative ISO C text is always UB.)
I'd expect compilers to be good at giving errors for this since it's compile-time UB.
For example, this is invalid since an array declaration with static storage duration requires the size to be integer constant expression:
int a=1;
static int x [a];
Similarly, int x [1 + 1.0]; would be invalid but int x[1 + (int)1.0]; is ok.
According to N1570 6.6p10, "An implementation may accept other forms of constant expressions." In general, situations where an implementation would be allowed to reject a program, but would also be allowed to accept it, are classified as Undefined Behavior. While it might be helpful to specify that an implementation given something like (at file scope):
int x,y;
int sz = (uintptr_t)&y - (uintptr_t)&x;
would be required to either reject the program, or else behave as though sz is initialized to a value matching what would be computed if the indicated conversions and subtraction would be performed at runtime, such constructs would often require linker support, and a compiler may have no way of knowing for certain what constructs the linker would support, or what it would do if code uses an unsupportable construct.
The Standard does not use the term "Undefined Behavior" purely to refer to erroneous constructs, but also applies it to non-portable ones which might be unsupportable or erroneous on some implementations but correct on others. The authors of the Standard note that Undefined Behavior, among other things, identifies potential areas of "conforming language extension" by allowing implementations to define behaviors beyond those mandated by the Standard. Viewed in that light, classifying the processing of non-standard forms of integer constant expressions as Undefined Behavior allows compilers to support such constructs when practical and useful, without imposing requirements on the behavior of such constructs that some implementations might be unable to meet.
Returning to the earlier example, a compiler might compute the difference between &y and &x as the difference between the two objects' offsets within their respective data sections. Such a computation might only be useful if the objects happened to be defined in the same translation unit, and might yield a meaningless value, without necessarily issuing a diagnostic, if they're not. A compiler, however, would have no way of knowing whether the objects are defined in the same translation unit, and the Standard would have no concept of code whose behavior would be meaningfully defined if two externally-defined objects are defined in the same compilation unit, but not if they aren't. The Standard term for behavior that implementations would define in some cases, but not in others, based upon criteria outside the Standard's jurisdiction, is "Undefined Behavior".
When working with hardware it is sometimes required to perform a read from a specific register discarding the actual value (to clear some flags, for example). One way would be to explicitly read and discard the value such as:
int temp = *(volatile int*)0x1234; // 0x1234 is the register address
(void)temp; // To silence the "unused" warning
Another way that seem to work is simply:
*(volatile int*)0x1234;
But this doesn't seem to obviously imply the read access, yet it seems to translate to one on compilers I checked. Is this guaranteed by the standard?
Example for ARM GCC with -O3:
https://arm.godbolt.org/z/9Vmt6n
void test(void)
{
*(volatile int *)0x1234;
}
translates into
test():
mov r3, #4096
ldr r3, [r3, #564]
bx lr
C 2018 6.7.3 8 says:
An object that has volatile-qualified type may be modified in ways unknown to the implementation or have other unknown side effects. Therefore any expression referring to such an object shall be evaluated strictly according to the rules of the abstract machine, as described in 5.1.2.3.…
Since *(volatile int*)0x1234; is an expression referring to an object with volatile-qualified type, evaluating it must access the object. (This presumes that 0x1234 stands for a valid reference to some object in the C implementation, of course.)
Per C 2018 5.1.2.3 4:
In the abstract machine, all expressions are evaluated as specified by the semantics. An actual implementation need not evaluate part of an expression if it can deduce that its value is not used and that no needed side effects are produced (including any caused by calling a function or accessing a volatile object).
Per C 2018 6.5 1:
An expression is a sequence of operators and operands that specifies computation of a value, or that designates an object or a function, or that generates side effects, or that performs a combination thereof.
Thus, an expression specifies computation of a value. Paragraph 5.1.2.3 4 tells us that this evaluation is performed by the abstract machine, and 6.7.3 8 tells us the actual implementation performs this evaluation that the abstraction machine performs.
One caveat is that what constitutes “access” is implementation-defined. “Access” as defined by the C standard includes both reading and writing (C 3.1 1), but the C standard is unable to specify that it means reading from or writing to some particular piece of hardware.
To go further into language-lawyer, territory, C 6.3.2.1 2 tells us:
Except when it is the operand of the sizeof operator, the unary & operator, the ++ operator, the -- operator, or the left operand of the . operator or an assignment operator, an lvalue that does not have array type is converted to the value stored in the designated object (and is no longer an lvalue); this is called lvalue conversion.
Thus, since *(volatile int*)0x1234; is an lvalue, by dint of the * operator, and is not the operand of the listed operators, it is converted to the value stored in the object. Thus, this expression specifies the computation of the value that is stored in the object.
The gcc documentation on volatile tells us that what consititues a volatile access is implementation defined:
C has the concept of volatile objects. These are normally accessed by pointers and used for accessing hardware or inter-thread communication. The standard encourages compilers to refrain from optimizations concerning accesses to volatile objects, but leaves it implementation defined as to what constitutes a volatile access. The minimum requirement is that at a sequence point all previous accesses to volatile objects have stabilized and no subsequent accesses have occurred. Thus an implementation is free to reorder and combine volatile accesses that occur between sequence points, but cannot do so for accesses across a sequence point. The use of volatile does not allow you to violate the restriction on updating objects multiple times between two sequence points.
This is backed up by C11 section 6.7.3 Type qualifiers
p7:
An object that has volatile-qualified type may be modified in ways unknown to the
implementation or have other unknown side effects. Therefore any expression referring
to such an object shall be evaluated strictly according to the rules of the abstract machine,
as described in 5.1.2.3. Furthermore, at every sequence point the value last stored in the
object shall agree with that prescribed by the abstract machine, except as modified by the
unknown factors mentioned previously.134) What constitutes an access to an object that
has volatile-qualified type is implementation-defined.
The gcc document goes on to specify how volatile works for gcc, for the case similar to your says:
A scalar volatile object is read when it is accessed in a void
context:
volatile int *src = somevalue;
*src;
Such expressions are rvalues, and GCC implements this as a read of the
volatile object being pointed to.
The following (admittedly contrived) C program fails to compile:
int main() {
const int array[] = {1,2,3};
static int x = array[1];
}
When compiling the above C source file with gcc (or Microsoft's CL.EXE), I get the following error:
error: initializer element is not constant
static int x = array[1];
^
Such simple and intuitive syntax is certainly useful, so this seems like it should be legal, but clearly it is not. Surely I am not the only person frustrated with this apparently silly limitation. I don't understand why this is disallowed-- what problem is the C language trying to avoid by making this useful syntax illegal?
It seems like it may have something to do with the way a compiler generates the assembly code for the initialization, because if you remove the "static" keyword (such that the variable "x" is on the stack), then it compiles fine.
However, another strange thing is that it compiles fine in C++ (even with the static keyword), but not in C. So, the C++ compiler seems capable of generating the necessary assembly code to perform such an initialization.
Edit:
Credit to Davislor-- in an attempt to appease the SO powers-that-be, I would seek following types of factual information to answer the question:
Is there any legacy code that supporting these semantics would break?
Have these semantics ever been formally proposed to the standards committee?
Has anyone ever given a reason for rejecting the allowance of these semantics?
Objects with static storage duration (read: variables declared at file scope or with the static keyword) must be initialized by compile time constants.
Section 6.7.9 of the C standard regarding Initialization states:
4 All the expressions in an initializer for an object that has static or thread storage duration shall be constant expressions or
string literals.
Section 6.6 regarding Constant Expressions states:
7 More latitude is permitted for constant expressions in initializers. Such a constant
expression shall be, or evaluate to, one of the following:
an arithmetic constant expression,
a null pointer constant,
an address constant, or
an address constant for a complete object type plus or minus an integer constant expression.
8 An arithmetic constant expression shall have arithmetic type and shall only have operands that are integer constants, floating
constants, enumeration constants, character constants, sizeof
expressions whose results are integer constants, and _Alignof
expressions. Cast operators in an arithmetic constant expression shall
only convert arithmetic types to arithmetic types, except as part of
an operand to a sizeof or
_Alignof operator.
9 An address constant is a null pointer, a pointer to an lvalue designating an object of static storage duration, or a pointer to a
function designator; it shall be created explicitly using the unary &
operator or an integer constant cast to pointer type, or implicitly by
the use of an expression of array or function type. The
array-subscript [] and member-access . and -> operators, the address &
and indirection * unary operators, and pointer casts may be used in
the creation of an address constant, but the value of an object shall
not be accessed by use of these operators.
By the above definition, a const variable does not qualify as a constant expression, so it can't be used to initialize a static object. C++ on the other had does treat const variables as true constants and thus allows them to initialize static objects.
If the C standard allowed this, then compilers would have to know what is in arrays. That is, the compiler would have to have a compile-time model of the array contents. Without this, the compiler has a small amount of work to do for each array: It needs to know its name and type (including its size), and a few other details such as its linkage and storage duration. But, where the initialization of the array is specified in the code, the compiler can just write the relevant information to the object file it is growing and then forget about it.
If the compiler had to be able to fetch values out of the array at compile time, it would have to remember that data. As arrays can be very large, that imposes a burden on the C compiler that the committee likely did not desire, as C is intended to operate in a wide variety of environments, including those with constrained resources.
The C++ committee made a different decision, and C++ is much more burdensome to translate.
Looking at C11 6.3.2.1 paragraph 3:
Except when it is the operand of the sizeof operator, the _Alignof operator, or the unary & operator, or is a string literal used to initialize an array, an expression that has type "array of type" is converted to an expression with type "pointer to type" that points to the initial element of the array object and is not an lvalue. If the array object has register storage class, the behavior is undefined.
Undefined behaviour seems like an odd choice for this situation. Undefined behaviour "imposes no requirements" (3.4.3). In other words, according to only the wording of 6.3.2.1, indexing into (or doing a few other things with) an array declared with register is presumably permitted to compile, run and do exactly what the code looks like it does without issuing an error.
register int a[5];
a[0] = 6; // apparently not required to cause an error?
This seems to contradict the spirit of the keyword, which (per 6.5.3.2) prevents an lvalue's address being taken with &. This is not quite the same thing, but it's certainly related, as implicit array->pointer conversion, and & on an lvalue, generate the same kind of result: a pointer to the object's storage.
The footnote to 6.7.1 makes this relationship explicit:
the address of any part of an object declared with storage-class specifier register cannot be computed, either explicitly (by use of the unary & operator as discussed in 6.5.3.2) or implicitly (by converting an array name to a pointer as discussed in 6.3.2.1).
So if it "can't" be done, why is the conversion undefined instead of erroneous, or (for indexing, where there are a few other options) implementation-defined?
It doesn't read like an oversight in 6.3.2.1, since register's meaning is straightforward enough according to the other mentions; I'd assume it to be perfectly well-defined if that sentence didn't say otherwise. What is there to be in doubt about?
Remember that Undefined Behavior allows everything, including "behaving in a way that's expected on the particular platform". I.e. for a platform that has hardware array registers, you'd want it to compile, for a platform which does not you don't want it to compile. Leaving it UB allows both.
IIRC a 6502 had 256 memory-mapped registers at the start of address space.
GCC 4.9 and 5.1 reject this simple C99 declaration at global scope. Clang accepts it.
const int a = 1, b = a; // error: initializer element is not constant
How could such a basic feature be missing? It seems very straightforward.
C991 section 6.6 Constant expressions is the controlling section. It states in subsections 6 and 7:
6/ An integer constant expression shall have integer type and shall only have operands that are integer constants, enumeration constants, character constants, sizeof expressions whose results are integer constants, and floating constants that are the immediate operands of casts.
Cast operators in an integer constant expression shall only convert arithmetic types to integer types, except as part of an operand to the sizeof operator.
The definition of integer and floating point constants is specified in 6.4.4 of the standard, and it's restricted to actual values (literals) rather than variables.
7/ More latitude is permitted for constant expressions in initializers. Such a constant expression shall be, or evaluate to, one of the following (a) an arithmetic constant expression, (b) a null pointer constant, (c) an address constant, or (d) an address constant for an object type plus or minus an integer constant expression.
Since a is none of those things in either subsection 6 or 7, it is not considered a constant expression as per the standard.
The real question, therefore, is not why gcc rejects it but why clang accepts it, and that appears to be buried in subsection 10 of that same section:
10/ An implementation may accept other forms of constant expressions.
In other words, the standard states what an implementation must allow for constant expressions but doesn't limit implementations to allowing only that.
1 C11 is much the same other than minor things like allowing _Alignof as well as sizeof.
This is just the rules of C. It has always been that way. At file scope, initializers must be constant expressions. The definition of a constant expression does not include variables declared with const qualifier.
The rationale behind requiring initializers computable at compile-time was so that the compiler could just put all of the initialized static data as a bloc in the executable file, and then at load time that bloc is loaded into memory as a whole and voila, the global variables all have their correct initial values without any code needing to be executed.
In fact if you could have executable code as initializer for global variables, it introduces quite a lot of complication regarding which order that code should be run in. (This is still a problem in modern C++).
In K&R C, there was no const. They could have had a rule that if a global variable is initialized by a constant expression, then that variable also counts as a constant expression. And when const was added in C89, they could have also added a rule that const int a = 5; leads to a constant expression.
However they didn't. I don't know why sure, but it seems likely that it has to do with keeping the language simple. Consider this:
extern const int a, b = a;
with const int a = 5; being in another unit. Whether or not you want to allow this, it is considerably more complication for the compiler, and some more arbitrary decisions.
If you look at the current C++ rules for constant expressions (which still are not settled to everyone's satisfaction!) you'll see that each time you add support for one more "obvious" thing then there are two other "obvious" things that are next in line and it is never-ending.
In the early days of C, in the 1970s, keeping the compiler simple was important so it may have been that making the compiler support this meant the compiler used too many system resources, or something. (Hopefully a coder from that era can step in and comment more on this!)
Finally, the C89 standardization was quite a contentious process since there were so many different C compilers that had each gone their own way with language evolution. Demanding that a compiler vendor who doesn't support this, change their compiler to support it might be met with opposition, lowering the uptake of the standard.
Because const doesn't make a constant expression -- it makes a variable that can't be assigned to (only initialized). You need constexpr to make a constant expression, which is only available in C++. C99 has no way of making a named constant expression (other than a macro, which is sort-of, but not really an expression at all).