Why is register array indexing undefined?

Looking at C11 6.3.2.1 paragraph 3:
Except when it is the operand of the sizeof operator, the _Alignof operator, or the unary & operator, or is a string literal used to initialize an array, an expression that has type "array of type" is converted to an expression with type "pointer to type" that points to the initial element of the array object and is not an lvalue. If the array object has register storage class, the behavior is undefined.
Undefined behaviour seems like an odd choice for this situation. Undefined behaviour "imposes no requirements" (3.4.3). In other words, according to only the wording of 6.3.2.1, indexing into (or doing a few other things with) an array declared with register is presumably permitted to compile, run and do exactly what the code looks like it does without issuing an error.
register int a[5];
a[0] = 6; // apparently not required to cause an error?
This seems to contradict the spirit of the keyword, which (per 6.5.3.2) prevents an lvalue's address being taken with &. This is not quite the same thing, but it's certainly related, as implicit array->pointer conversion, and & on an lvalue, generate the same kind of result: a pointer to the object's storage.
The footnote to 6.7.1 makes this relationship explicit:
the address of any part of an object declared with storage-class specifier register cannot be computed, either explicitly (by use of the unary & operator as discussed in 6.5.3.2) or implicitly (by converting an array name to a pointer as discussed in 6.3.2.1).
So if it "can't" be done, why is the conversion undefined instead of erroneous, or (for indexing, where there are a few other options) implementation-defined?
It doesn't read like an oversight in 6.3.2.1, since register's meaning is straightforward enough according to the other mentions; I'd assume it to be perfectly well-defined if that sentence didn't say otherwise. What is there to be in doubt about?
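Here is a minimal sketch of the distinction the question draws. The function name register_array_length is hypothetical. sizeof is one of the exceptions listed in 6.3.2.1 3, so sizeof a never converts a register array to a pointer. Subscripting does require that conversion, which is why the a[0] line is left commented out.

```c
#include <stddef.h>

/* Hypothetical helper: counts the elements of a register array without
   ever converting the array to a pointer. */
size_t register_array_length(void)
{
    register int a[5];

    /* OK: as the operand of sizeof, `a` is exempt from the array-to-pointer
       conversion in 6.3.2.1p3, so no address is computed, even implicitly. */
    size_t n = sizeof a / sizeof(int);

    /* a[0] = 6;  a[0] means *(a + 0), which needs exactly the conversion
       that 6.3.2.1p3 makes undefined for an array declared register. */

    return n;
}
```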

Remember that Undefined Behavior allows anything, including "behaving in a way that's expected on the particular platform". That is, on a platform that has hardware array registers you'd want it to compile, and on a platform that does not you'd want it to be rejected. Leaving it UB allows both.
IIRC, the 6502 had 256 memory-mapped registers at the start of the address space.

Related

Detecting if expression is lvalue or rvalue in C

Is there any way of determining whether a given expression is an lvalue or an rvalue in C? For instance, does there exist a function or macro is_lvalue with the following sort of behaviour:
int four() {
return 4;
}
int a = 4;
/* Returns true */
is_lvalue(a);
/* Returns false */
is_lvalue(four());
I know equivalent functionality exists in C++, but is there any way of doing this in any standard of C? I'm not particularly interested in GCC-specific extensions.
Thank you for your time.
The C standard does not provide any method for detecting whether an expression is an lvalue or not, either by causing some operation to have different values depending on whether an operand is an lvalue or not or by generating a translation-time diagnostic message or error depending on whether an operand is an lvalue or not.
C implementations may of course define an extension that provides this feature.
About the closest one can get in strictly conforming C is to attempt to take the address of the expression with the address-of operator &. This will produce a diagnostic message (and, in typical C implementations, an error) if its operand is not an lvalue. However, it will also produce a message for lvalues that are bit-fields or that were declared with register. If these are excluded from the cases of interest, then it may serve to distinguish between lvalues and non-lvalues during program translation.
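The & approach can be packaged as a macro. This is only a sketch: ASSERT_ADDRESSABLE_LVALUE is a hypothetical name, not a standard facility. Wrapping &(e) in sizeof means the operand is never evaluated, so the check has no run-time cost. As noted above, it also rejects bit-fields and register lvalues.

```c
/* Hypothetical macro: the operand of unary & must be an lvalue (6.5.3.2),
   so this translates only if `e` is an addressable lvalue. The operand of
   sizeof is not evaluated here, so `e` is never executed. */
#define ASSERT_ADDRESSABLE_LVALUE(e) ((void)sizeof(&(e)))

static int four(void) { return 4; }

int lvalue_check_demo(void)
{
    int a = 4;
    int arr[3] = {1, 2, 3};

    ASSERT_ADDRESSABLE_LVALUE(a);       /* ok: a is an lvalue */
    ASSERT_ADDRESSABLE_LVALUE(arr[1]);  /* ok: a subscript yields an lvalue */

    /* ASSERT_ADDRESSABLE_LVALUE(four());
       would violate a constraint: a function call result is not an lvalue,
       so the implementation must issue a diagnostic. */

    return a + four();
}
```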

Is `*(volatile T*)0x1234;` guaranteed to translate into read instruction?

When working with hardware it is sometimes required to perform a read from a specific register discarding the actual value (to clear some flags, for example). One way would be to explicitly read and discard the value such as:
int temp = *(volatile int*)0x1234; // 0x1234 is the register address
(void)temp; // To silence the "unused" warning
Another way that seems to work is simply:
*(volatile int*)0x1234;
But this doesn't obviously imply a read access, yet it translates to one on the compilers I checked. Is this guaranteed by the standard?
Example for ARM GCC with -O3:
https://arm.godbolt.org/z/9Vmt6n
void test(void)
{
*(volatile int *)0x1234;
}
translates into
test():
mov r3, #4096
ldr r3, [r3, #564]
bx lr
C 2018 6.7.3 8 says:
An object that has volatile-qualified type may be modified in ways unknown to the implementation or have other unknown side effects. Therefore any expression referring to such an object shall be evaluated strictly according to the rules of the abstract machine, as described in 5.1.2.3.…
Since *(volatile int*)0x1234; is an expression referring to an object with volatile-qualified type, evaluating it must access the object. (This presumes that 0x1234 stands for a valid reference to some object in the C implementation, of course.)
Per C 2018 5.1.2.3 4:
In the abstract machine, all expressions are evaluated as specified by the semantics. An actual implementation need not evaluate part of an expression if it can deduce that its value is not used and that no needed side effects are produced (including any caused by calling a function or accessing a volatile object).
Per C 2018 6.5 1:
An expression is a sequence of operators and operands that specifies computation of a value, or that designates an object or a function, or that generates side effects, or that performs a combination thereof.
Thus, an expression specifies the computation of a value. Paragraph 5.1.2.3 4 tells us that this evaluation is performed by the abstract machine, and 6.7.3 8 tells us that the actual implementation must perform the evaluation that the abstract machine performs.
One caveat is that what constitutes “access” is implementation-defined. “Access” as defined by the C standard includes both reading and writing (C 3.1 1), but the C standard is unable to specify that it means reading from or writing to some particular piece of hardware.
To go further into language-lawyer territory, C 6.3.2.1 2 tells us:
Except when it is the operand of the sizeof operator, the unary & operator, the ++ operator, the -- operator, or the left operand of the . operator or an assignment operator, an lvalue that does not have array type is converted to the value stored in the designated object (and is no longer an lvalue); this is called lvalue conversion.
Thus, since *(volatile int*)0x1234 is an lvalue (by dint of the * operator) and is not the operand of any of the listed operators, it is converted to the value stored in the object. In other words, this expression specifies the computation of the value stored in the object.
The gcc documentation on volatile tells us that what constitutes a volatile access is implementation-defined:
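The sketch below makes the lvalue-conversion argument concrete. So that it can run on a host machine, an ordinary volatile object stands in for the register at 0x1234. The names fake_reg and lvalue_conversion_demo are hypothetical. The code contrasts two of the exceptions listed in 6.3.2.1 2 (sizeof and unary &), which do not read the object, with a plain expression statement, which does.

```c
#include <stddef.h>

/* Stand-in for a hardware register (hypothetical); on a real target this
   would typically be *(volatile int *)0x1234. */
static volatile int fake_reg = 7;

int lvalue_conversion_demo(void)
{
    volatile int *p = &fake_reg;

    /* Operand of sizeof: no lvalue conversion, so *p is not read. */
    size_t n = sizeof *p;

    /* Operand of unary &: no lvalue conversion either; &*p is just p. */
    volatile int *q = &*p;

    /* Expression statement: *p undergoes lvalue conversion, so the
       volatile object is read and the value is discarded. */
    *p;

    /* Ordinary use of the value: another volatile read. */
    return *q + (int)n;
}
```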
C has the concept of volatile objects. These are normally accessed by pointers and used for accessing hardware or inter-thread communication. The standard encourages compilers to refrain from optimizations concerning accesses to volatile objects, but leaves it implementation defined as to what constitutes a volatile access. The minimum requirement is that at a sequence point all previous accesses to volatile objects have stabilized and no subsequent accesses have occurred. Thus an implementation is free to reorder and combine volatile accesses that occur between sequence points, but cannot do so for accesses across a sequence point. The use of volatile does not allow you to violate the restriction on updating objects multiple times between two sequence points.
This is backed up by C11 section 6.7.3 Type qualifiers, p7:
An object that has volatile-qualified type may be modified in ways unknown to the implementation or have other unknown side effects. Therefore any expression referring to such an object shall be evaluated strictly according to the rules of the abstract machine, as described in 5.1.2.3. Furthermore, at every sequence point the value last stored in the object shall agree with that prescribed by the abstract machine, except as modified by the unknown factors mentioned previously.134) What constitutes an access to an object that has volatile-qualified type is implementation-defined.
The gcc documentation goes on to specify how volatile works in gcc. For a case similar to yours, it says:
A scalar volatile object is read when it is accessed in a void context:
volatile int *src = somevalue;
*src;
Such expressions are rvalues, and GCC implements this as a read of the volatile object being pointed to.

Why can't a static initialization expression in C use an element of a constant array?

The following (admittedly contrived) C program fails to compile:
int main() {
const int array[] = {1,2,3};
static int x = array[1];
}
When compiling the above C source file with gcc (or Microsoft's CL.EXE), I get the following error:
error: initializer element is not constant
static int x = array[1];
^
Such simple and intuitive syntax is certainly useful, so this seems like it should be legal, but clearly it is not. Surely I am not the only person frustrated with this apparently silly limitation. I don't understand why this is disallowed. What problem is the C language trying to avoid by making this useful syntax illegal?
It seems like it may have something to do with the way a compiler generates the assembly code for the initialization, because if you remove the "static" keyword (such that the variable "x" is on the stack), then it compiles fine.
However, another strange thing is that it compiles fine in C++ (even with the static keyword), but not in C. So, the C++ compiler seems capable of generating the necessary assembly code to perform such an initialization.
Edit:
Credit to Davislor. In an attempt to appease the SO powers-that-be, I am seeking the following types of factual information to answer the question:
Is there any legacy code that supporting these semantics would break?
Have these semantics ever been formally proposed to the standards committee?
Has anyone ever given a reason for rejecting the allowance of these semantics?
Objects with static storage duration (read: variables declared at file scope or with the static keyword) must be initialized by compile-time constants.
Section 6.7.9 of the C standard regarding Initialization states:
4 All the expressions in an initializer for an object that has static or thread storage duration shall be constant expressions or string literals.
Section 6.6 regarding Constant Expressions states:
7 More latitude is permitted for constant expressions in initializers. Such a constant expression shall be, or evaluate to, one of the following:
an arithmetic constant expression,
a null pointer constant,
an address constant, or
an address constant for a complete object type plus or minus an integer constant expression.
8 An arithmetic constant expression shall have arithmetic type and shall only have operands that are integer constants, floating constants, enumeration constants, character constants, sizeof expressions whose results are integer constants, and _Alignof expressions. Cast operators in an arithmetic constant expression shall only convert arithmetic types to arithmetic types, except as part of an operand to a sizeof or _Alignof operator.
9 An address constant is a null pointer, a pointer to an lvalue designating an object of static storage duration, or a pointer to a function designator; it shall be created explicitly using the unary & operator or an integer constant cast to pointer type, or implicitly by the use of an expression of array or function type. The array-subscript [] and member-access . and -> operators, the address & and indirection * unary operators, and pointer casts may be used in the creation of an address constant, but the value of an object shall not be accessed by use of these operators.
By the above definition, a const variable does not qualify as a constant expression, so it can't be used to initialize a static object. C++, on the other hand, does treat const variables as true constants and thus allows them to initialize static objects.
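The sketch below illustrates 6.6 9 and the usual workaround. It assumes the values are known when the code is written. The names array, second, SECOND_VALUE, x_ok and read_second are hypothetical. Taking the address of an element of a static array is a valid address constant, and an enumeration constant is a valid integer constant expression. Reading array[1], by contrast, is not a constant expression in ISO C.

```c
/* A file-scope array, so it has static storage duration. */
static const int array[] = {1, 2, 3};

/* Allowed: an address constant (6.6p9). [] and & are used to form the
   address, but the stored value of array[1] is never accessed. */
static const int *const second = &array[1];

/* Allowed: an enumeration constant is an integer constant expression. */
enum { SECOND_VALUE = 2 };
static int x_ok = SECOND_VALUE;

/* Not a constant expression in ISO C, because it reads the object's value
   even though array is const:
   static int y = array[1];                                           */

int read_second(void)
{
    return *second + x_ok;
}
```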
If the C standard allowed this, then compilers would have to know what is in arrays. That is, the compiler would have to have a compile-time model of the array contents. Without this, the compiler has a small amount of work to do for each array: It needs to know its name and type (including its size), and a few other details such as its linkage and storage duration. But, where the initialization of the array is specified in the code, the compiler can just write the relevant information to the object file it is growing and then forget about it.
If the compiler had to be able to fetch values out of the array at compile time, it would have to remember that data. As arrays can be very large, that imposes a burden on the C compiler that the committee likely did not desire, as C is intended to operate in a wide variety of environments, including those with constrained resources.
The C++ committee made a different decision, and C++ is much more burdensome to translate.

ANSI C (C90): Can const be changed?

I am confused about whether the ANSI specification allows a variable declared const to be legally modified through its address. Unfortunately I do not have access to the C90 specification, and the sources I found conflict:
The keyword const doesn't turn a variable into a constant! A symbol with the const qualifier merely means that the symbol cannot be used for assignment. This makes the value read-only through that symbol; it does not prevent the value from being modified through some other means internal (or even external) to the program. It is pretty much useful only for qualifying a pointer parameter, to indicate that this function will not change the data that argument points to, but other functions may. (Expert C Programming: Deep C Secrets: Peter van der Linden)
If an attempt is made to modify an object defined with a const-qualified type through use of an lvalue with non-const-qualified type, the behavior is undefined. If an attempt is made to refer to an object defined with a volatile-qualified type through use of an lvalue with non-volatile-qualified type, the behavior is undefined. (http://flash-gordon.me.uk/ansi.c.txt)
I have seen the latter in C99 specification (n1256.pdf).
Can anyone clarify as to which of the above two views is true please?
Edit: Expert C Programming actually gives an example demonstrating that a const variable can be changed through a pointer.
It's the same in C90 (C89) as in C99.
C89 §3.5.3 Type qualifiers
If an attempt is made to modify an object defined with a const-qualified type through use of an lvalue with non-const-qualified type, the behavior is undefined. If an attempt is made to refer to an object defined with a volatile-qualified type through use of an lvalue with non-volatile-qualified type, the behavior is undefined.
Undefined behavior doesn't mean that C prohibits it outright, just that the behavior is, well, not defined. So both of your statements are actually true.
Don't know about C90, but C11 contains this clause, which I imagine has been there since day one (C11, 6.7.3/6):
If an attempt is made to modify an object defined with a const-qualified type through use of an lvalue with non-const-qualified type, the behavior is undefined.
The same is true for volatile-qualified objects.
A const variable can often be changed through a pointer, because it is just a memory location and will usually accept a write made that way.
But since the variable was defined as const, changing its value invokes undefined behaviour.
What does a declaration such as the following mean?
T const *p; // p has type "pointer to const T"
A program can use the expression p to alter the value of the pointer object that p designates, but it can’t use the expression *p to alter the value of any objects that *p might designate. If the program has another expression e of unqualified type that designates an object that *p also designates, the program can still use e to change that object.
Thus, a program might be able to change an object right out from under a const-qualified expression.
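Here is a minimal sketch of this point. The function name change_under_const_view is hypothetical. The object value is not itself defined as const, so modifying it through an unqualified lvalue is well-defined even while a pointer-to-const refers to it. The commented-out lines show the case that the standard makes undefined.

```c
int change_under_const_view(void)
{
    int value = 1;            /* not defined with a const-qualified type */
    const int *view = &value; /* read-only access path */
    int *writer = &value;     /* unqualified access path to the same object */

    *writer = 2;              /* well-defined: the object itself is not const */

    /* By contrast, this would be undefined behavior, because the object
       itself is defined with a const-qualified type:
       const int c = 1;
       *(int *)&c = 2;                                                  */

    return *view;             /* reads the modified object */
}
```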
It is logical that the behavior is undefined if you try to modify a const variable. Consider embedded platforms where code and constants are placed in ROM. There it is simply not possible to change the value, because it is burnt in forever. If everything resides in RAM, on the other hand, the value will likely be changeable. That is why the standard says "undefined behavior" in this case: the behaviour depends on the platform and compiler.
Statements 1 and 2 are not really mutually exclusive.
What gave you the impression that these two statements were mutually exclusive?
const is just a type qualifier; there's nothing magical about it.
Memory can still be altered through external means, or through circumventing the compiler's ability to recognize type restrictions. The second statement merely says that attempting to do this will have undefined results; it may or may not change the value.
The way I see it, there is nothing contradictory about either statement.

In C, if B is volatile, should the expression (void)(B = 1) read B?

I work on compilers for a couple of embedded platforms. A user has recently complained about the following behaviour from one of our compilers. Given code like this:
extern volatile int MY_REGISTER;
void Test(void)
{
(void) (MY_REGISTER = 1);
}
The compiler generates this (in pseudo-assembler):
Test:
move regA, 1
store regA, MY_REGISTER
load regB, MY_REGISTER
That is, it not only writes to MY_REGISTER, but reads it back afterwards. The extra load upset him for performance reasons. I explained that this was because according to the standard "An assignment expression has the value of the left operand after the assignment, [...]".
Strangely, removing the cast-to-void changes the behaviour: the load disappears. The user's happy, but I'm just confused.
So I also checked this out in a couple of versions of GCC (3.3 and 4.4). There, the compiler never generates a load, even if the value is explicitly used, e.g.
int TestTwo(void)
{
return (MY_REGISTER = 1);
}
Turns into
TestTwo:
move regA, 1
store regA, MY_REGISTER
move returnValue, 1
return
Does anyone have a view on which is a correct interpretation of the standard? Should the read-back happen at all? Is it correct or useful to add the read only if the value is used or cast to void?
The relevant paragraph in the standard is this:
An assignment operator stores a value in the object designated by the left operand. An assignment expression has the value of the left operand after the assignment, but is not an lvalue. The type of an assignment expression is the type of the left operand unless the left operand has qualified type, in which case it is the unqualified version of the type of the left operand. The side effect of updating the stored value of the left operand shall occur between the previous and the next sequence point.
So this clearly distinguishes between "the value of the left operand" and the update of the stored value. Also note that the result is not an lvalue (so the value of the expression does not refer back to the variable) and all qualifiers are dropped.
So I read this as gcc doing the right thing when it returns the value that it knowingly has to store.
Edit:
The upcoming standard plans to clarify that by adding a footnote:
The implementation is permitted to read the object to determine the value but is not required to, even when the object has volatile-qualified type.
Edit 2:
Actually there is another paragraph about expression statements that might shed a light on that:
The expression in an expression statement is evaluated as a void expression for its side effects. (Footnote: Such as assignments, and function calls which have side effects.)
Since this implies that the effect of returning a value is not wanted for such a statement, this strongly suggests that the value may only be loaded from the variable if the value is used.
As a summary, your customer really is rightly upset when he sees that the variable is loaded. This behavior might be in accordance with the standard if you stretch the interpretation of it, but it clearly is on the borderline of being acceptable.
Reading back seems to be nearer to the standard (especially considering that reading a volatile variable can result in a different value than the one written), but I'm pretty sure it isn't what is expected by most code using volatile, especially in contexts where reading or writing a volatile variable triggers some other effects.
volatile in general isn't very well defined: "What constitutes an access to an object that has volatile-qualified type is implementation-defined."
Edit: If I had to write a compiler, I think I wouldn't read back the variable if the value isn't used, and would reread it if it is, but with a warning. Should a cast to void then count as a use?
(void) v;
should surely be one, and considering that, I don't see any reason for
(void)(v = exp);
not to be one as well. But in any case, I'd give a warning explaining how to get the other effect.
BTW, if you work on a compiler, you probably have someone in contact with the C committee. Filing a formal defect report will get you a binding interpretation (well, there is the risk of the DR being classified "Not A Defect" without any hint about what they want...)
The language in the standard says nothing about reading the volatile variable, only what the value of the assignment expression is, which a) is defined by C semantics, not by the content of the variable and b) isn't used here, so need not be calculated.
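For code that must behave the same across compilers, one option is to avoid depending on the value of the assignment expression at all. In the sketch below, an ordinary volatile int stands in for MY_REGISTER so that it can run on a host. On the real target, MY_REGISTER would be the extern hardware register. The function names are hypothetical. Whether an implementation reads back in the (void)(MY_REGISTER = 1) case remains implementation-specific, as discussed above.

```c
/* Stand-in for the hardware register (assumption: on the real target this
   is the extern volatile object from the question). */
volatile int MY_REGISTER;

/* Write only: the assignment is an expression statement, so its value
   is not used. */
void write_only(void)
{
    MY_REGISTER = 1;
}

/* If a read-back is genuinely wanted, request it as a separate,
   explicit volatile access. */
int write_then_read(void)
{
    MY_REGISTER = 1;
    return MY_REGISTER;
}

/* If only the written value is needed, reuse the value that was stored;
   this never requires reading the register. */
int write_and_return_value(void)
{
    const int v = 1;
    MY_REGISTER = v;
    return v;
}
```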
