Does every expression in C make a copy of the variables? - c

I'm trying to understand R-values and L-values when it comes to expressions in C and I know that many expressions are not valid L-values because the address of the end result is not known. This is because in many instances, a copy of the variables is used in the expression and so the address of the copy is not known. For example,
char ch = 'a';
char *p = &ch;
p + 1; //This expression can be used as an R-value, but not an L-value
I believe what is happening (please correct me if I'm wrong) in the expression is that a copy of p is created, 1 is added to that copy so that p+1 points at the char after ch, but the address of this new p+1 pointer is not known, so it can't be used as an L-value.
Does this behavior of making a copy of the variables and using the copies in the expression happen for all expressions in C? For example, if I have
int a = 100;
int b = 25;
a - b;
Is a copy of variable a created (and stored at an unknown location) and a copy of variable b created (the copy of b is also stored at an unknown location), the data in the copies is used to subtract and then the result is stored in another unknown location or is the data taken from the original variables, subtracted and then the result is stored at an unknown location?

Does every expression in C make a copy of the variables?
It doesn't really matter.
The compiler optimizes. It is free to do anything with the code as long as the observable behavior (the side effects) stays the same. In your examples, p + 1; and a - b; do nothing, so the compiler can optimize them out entirely and nothing happens.
What happens on an actual machine with the code generated by the compiler is highly machine-specific. The compiler may first load the values of the variables into registers and then compute on those registers, or it may not: it may compute on the values where they are stored, or it may not even reserve any memory for the variables.
I know that many expressions are not valid L-values because the address of the end
result is not known
That could be the underlying rationale, but the rules are more direct. The C standard simply lists the operators and says, for each one, whether its result is a value (as for +) or an lvalue (as for unary *). There is no "generic" rule.
Does this behavior of making a copy of the variables and using the copies in the expression happen for all expressions in C?
The C standard doesn't describe expressions in terms of "variables" or "copies of variables"; it talks about "values". An expression has a defined value, a defined end result. The standard isn't really concerned with how the compiler arrives at that value, so the actual behavior can be anything. If a particular compiler decides to make a copy of the variables, or decides not to, that's fine as long as the end results are correct.
An lvalue is an expression that designates an object; a modifiable lvalue is something that can appear on the left side of the = operator. It is not necessarily a variable: for example, ((char*)0x01)[1] is an lvalue although it is the result of some operators. An rvalue (or just a value) is a value that you can't assign to.
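A minimal sketch of that distinction, assuming a two-element array so that p + 1 points at a real object. The function names store_after and demo_lvalue are hypothetical and only serve the illustration:

```c
#include <assert.h>

/* Writes through *(p + 1), which is an lvalue, and returns the stored char. */
char store_after(char *p, char value)
{
    /* p + 1 = p;      would not compile: the result of + is not an lvalue */
    /* &(p + 1);       would not compile either: & needs an lvalue operand */
    *(p + 1) = value;  /* fine: unary * yields an lvalue designating p[1] */
    return p[1];
}

/* Hypothetical driver: only the pointed-to object changes, never p itself. */
int demo_lvalue(void)
{
    char buf[2] = { 'a', 'b' };
    char *p = buf;
    char stored = store_after(p, 'z');

    assert(p == buf);  /* p + 1 was only a value; p was not modified */
    return stored == 'z' && buf[0] == 'a' && buf[1] == 'z';
}
```

The two commented-out lines are the ones a compiler is expected to reject; everything else compiles because * is one of the operators whose result is an lvalue.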

The implementation of such calculations is not specified by the standard, so it can differ between compilers. The important thing to understand is that R-values represent temporary values, which can live in a register or in some allocated memory.
Typically, calculations are made in the CPU on registers. So the compiler will move the value of a variable into a register and then compute the subtraction.
This is the assembly for such a calculation:
659: c7 45 f8 64 00 00 00 movl $0x64,-0x8(%rbp)
660: c7 45 fc 19 00 00 00 movl $0x19,-0x4(%rbp)
667: 8b 45 f8 mov -0x8(%rbp),%eax
66a: 2b 45 fc sub -0x4(%rbp),%eax
You can see that the values 100 (64 in hex) and 25 (19 in hex) are saved on the stack at addresses relative to the base pointer (-0x8 and -0x4 respectively).
Then the value at -0x8 is moved into the eax register, and the value at -0x4 is subtracted from it, with the result stored in the register itself.
As you can see, the result ends up in a register, so it has no address in memory.

According to the C11 standard(ISO-IEC-9899-2011)--6.3.2.1 Lvalues, arrays and function designators:
Except when it is the operand of the sizeof operator, the unary & operator, the ++
operator, the -- operator, or the left operand of the . operator or an assignment operator,
an lvalue that does not have array type is converted to the value stored in the designated
object (and is no longer an lvalue); this is called lvalue conversion.

Related

Can someone please explain the output of this C program?

Here is the code:
int a=256;
char *x= (char *)&a;
*x++ = 1;
*x =x[0]++;
printf("a=%d\n", a);
The output is:
a=257
I'm going to take the posted code one line at a time.
int a=256;
I presume this is straightforward enough.
char *x= (char *)&a;
This sets a char pointer to point to just one byte of the multi-byte int value. This is a low-level, machine-dependent, and not necessarily meaningful operation. (Also x is a poor, unidiomatic name for a pointer variable.)
*x++ = 1;
This both sets the byte pointed to by x, and increments x to point to the next byte. In general that's a sensible operation — for example, it's how we often fill characters into a text buffer one at a time. In this context, though, it's borderline meaningless, because it's rare to move along the bytes of an int variable one at a time, setting or altering them.
*x =x[0]++;
And then this line is the kicker. I can't explain what it does, in fact no one can explain what it does, because it's undefined. More on this below.
printf("a=%d\n", a);
Obviously this prints the value of a, although after what poor a has been through, it's hard to say what kind of bloody mess might be left of its bits and bytes.
And now let's take a second look at that line
*x =x[0]++;
One thing we can say is that by the rules of pointer arithmetic, the subexpressions *x and x[0] are identical, they do exactly the same thing, they access the value pointed to by x. So whatever value is pointed to by x, this expression tries to modify it twice: once when it says x[0]++, and then a second time when it says *x = … to assign something to *x. And when you have one expression that tries to modify the same thing twice, that's poison: it leads to undefined behavior, and once you're in undefined behavior territory, you can't say — no one can say — what your program does.
In fact, I tried your code under two different compilers, and I got two different answers! Under one compiler the code printed 257, as yours did, but under the other compiler it printed 513. How can that be? What's the right answer? Well, in the case of undefined behavior, since there is no one right answer, it's not wrong — in fact it's more or less expected — for different compilers to give different results.
You can read much more about undefined behavior, and undefined expressions like this one, at the canonical SO question on this topic, Why are these constructs using pre and post-increment undefined behavior? Your expression is equivalent to the classic one i = i++ which is specifically discussed in several of the answers to that other question.
int a=256;
Initializes an integer variable with the value 256. On a little-endian machine the memory layout is:
00 01 00 00 ...
char *x= (char *)&a;
x is a pointer to the least significant byte of a (on a little-endian machine).
00 01 00 00 ...
^ -- x points here
*x++ = 1;
Sets the byte that x points to to 1, then moves x to the next byte. Memory is now:
01 01 00 00 ...
^-- x points here
*x =x[0]++;
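Undefined behaviour: *x and x[0] designate the same object (x[0] is *(x+0)), and that object is modified twice without an intervening sequence point. In your implementation the post-increment appears to be ignored.
UPD: actually it is not ignored but overwritten by the assignment.
x[0]++ increments the second byte: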
01 02 00 00
^ -- x
and then the value taken before the increment (01) is stored in the same place by *x=
01 01 00 00
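One plausible way to account for the two outputs reported above (257 and 513) is to write the two possible orderings out as separate, well-defined statements. This is only a sketch: it works on an explicit byte array instead of an int, so the result does not depend on the machine's endianness, and the function names are hypothetical. A real compiler is under no obligation to pick either ordering.

```c
#include <assert.h>

/* Reassembles the two low-order bytes as a little-endian value. */
static int le_value(const unsigned char *bytes)
{
    return bytes[0] | (bytes[1] << 8);
}

/* Ordering A: apply the increment first, then store the pre-increment value. */
int increment_then_assign(void)
{
    unsigned char bytes[4] = { 0x00, 0x01, 0x00, 0x00 }; /* 256 */
    unsigned char *x = bytes;

    *x++ = 1;                          /* bytes: 01 01 00 00, x -> bytes[1] */
    unsigned char old = x[0];          /* the value of x[0]++ is the old value */
    assert(old == 1);
    x[0] = (unsigned char)(old + 1);   /* side effect of ++: 01 02 00 00 */
    *x = old;                          /* assignment overwrites: 01 01 00 00 */
    return le_value(bytes);
}

/* Ordering B: store the pre-increment value first, then apply the increment. */
int assign_then_increment(void)
{
    unsigned char bytes[4] = { 0x00, 0x01, 0x00, 0x00 }; /* 256 */
    unsigned char *x = bytes;

    *x++ = 1;                          /* bytes: 01 01 00 00, x -> bytes[1] */
    unsigned char old = x[0];
    *x = old;                          /* assignment first: 01 01 00 00 */
    x[0] = (unsigned char)(old + 1);   /* then the increment: 01 02 00 00 */
    return le_value(bytes);
}
```

Ordering A yields 0x0101 (257) and ordering B yields 0x0201 (513). Writing the intended ordering as separate statements like this is the usual way to avoid the undefined expression altogether.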

Operands in "int i = 0"

I would like to ask if this short code:
int i = 0;
has 1 operand or 2? The i is an operand, but is 0 too? According to wikipedia, 0 shouldn't (or maybe I misunderstand). If 0 isn't operand, is it a constant or something?
If it is important, the code is in C99.
In int i = 0;, = is not an operator. It's simply a part of the variable initialization syntax. On the other hand, in int i; i = 0; it would be an operator.
Since = here is not an operator, there are no operands. Instead, 0 is the initializer.
Since you've tagged the question as "C", one way to look at it is by reading the C standard. As pointed out by this answer, initialization (such as int i = 0) is not an expression in itself, and by a strict reading of the standard, the = here is not an operator according to the usage of those terms in the standard.
It is not as clear whether i and 0 are operands, however. On one hand, the C standard does not seem to refer to the parts of the initialization as operands. On the other hand, it doesn't define the term "operand" exhaustively. For example, one could interpret section 6.3 as calling almost any non-operator part of an expression an "operand", and therefore at least the 0 would qualify as one.
(Also note that if the code was int i; i = 0; instead, the latter i and the 0 would definitely both be operands of the assignment operator =. It remains unclear whether the intent of the question was to make a distinction between assignment and initialization.)
Apart from the C standard, you also refer to Wikipedia, in particular:
In computing, an operand is the part of a computer instruction which specifies what data is to be manipulated or operated on, while at the same time representing the data itself.
If we consider the context of a "computer instruction", the C code might naively be translated to assembly code like mov [ebp-4], 0 where the two operands would clearly be the [ebp-4] (a location where the variable called i is stored) and the 0, which would make both i and 0 operands by this definition. Yet, in reality the code is likely to be optimized by the compiler into a different form, such as only storing i in a register, in which case zeroing it might become xor eax, eax where the 0 no longer exists as an explicit operand (but is the result of the operation). Or, the whole 0 might be optimized away and replaced by some different value that inevitably gets assigned. Or, the whole variable might end up being removed, e.g., if it is used as a loop counter and the loop is unrolled.
So, in the end, either it is something of a philosophical question ("does the zero exist as an operand if it gets optimized away"), or just a matter of deciding on the desired usage of the terms (perhaps depending on the context of discussion).
The i is an operand, but is 0 too? According to wikipedia, 0 shouldn't
(or maybe I misunderstand).
The question links to the Wikipedia page describing the term "operand" in a mathematical context. This likely factors in to your confusion about whether the 0 on the right-hand side of the = is an operand, because in a mathematical statement of equality such as appears in the article, it is in no way conventional to consider = an operator. It does not express a computation, so the expressions it relates are not operated upon. I.e. they do not function as operands.
You have placed the question in C context, however, and in C, = can be used as an assignment operator. This is a bona fide operator in that it expresses a (simple) operation that produces a result from two operands, and also has a side effect on the left-hand operand. In an assignment statement such as
i = 0;
, then, both i and 0 are operands.
With respect to ...
If 0 isn't operand, is it a constant or something?
... 0 standing on its own is a constant in C, but that has little to do with whether it serves as an operand in any given context. Being an operand is a way an expression, including simply 0, can be used. More on that in a moment.
Now it's unclear whether it was the intent, but the code actually presented in the question is very importantly different from the assignment statement above. As the other answers also observe,
int i = 0;
is a declaration, not a statement. In that context, the i is the identifier being declared, the 0 is used as its initializer, and just as the = in a mathematical equation is not an operator, the = introducing an initializer in a C declaration is not an operator either. No operation is performed here, in that no value computation is associated with this =, as opposed to when the same symbol is used as an assignment operator. There being no operator and no operation being performed, there also are no operands in this particular line of code.
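A short sketch of why the = in a declaration cannot be the assignment operator: it is accepted in places where assignment is not. The function name init_vs_assign and the variable names are hypothetical.

```c
#include <assert.h>

int init_vs_assign(void)
{
    const int limit = 10;          /* initializer: allowed although limit is const */
    int values[3] = { 1, 2, 3 };   /* initializer: arrays can be initialized with = */
    int i = 0;                     /* declaration with initializer, no operator */

    assert(i == 0);

    /* limit = 11;                 would not compile: assignment to a const object */
    /* values = values;            would not compile: an array is not a modifiable lvalue */

    i = limit;                     /* assignment operator: i and limit are operands */
    return i + values[2];          /* 10 + 3 */
}
```

Both commented-out lines use the assignment operator and are expected to be rejected, while the syntactically similar initializers above them are fine.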

Will char and short be promoted to int before being demoted in assignment expressions?

After doing some research I know in arithmetic expressions char and short will be promoted to int internally. But I am still wondering whether integer promotions like that will occur in assignment internally.
(So please don't give me links only concerning other expressions. I am asking about what happens internally in ASSIGNMENT expressions)
char ch1, ch2 = -1;
ch1 = ch2; // Q
Q: Which of the following will happen internally?
1, The value of ch1 is directly assigned to ch2. Integer promotions won't happen here.
2, The value of ch1 is first promoted to int type (8 bits→32bits), then the 32 bits value is demoted to char type, 8 bits, the final result. Integer promotions happen here.
I have found this book: C Primer Plus and in Page 174 there is:
"...When appearing in an expression, char and short, both signed and unsigned, are automatically converted to int, or if necessary, to unsigned int..."
So I think it should be 2, but someone told me it should be 1, where integer promotions don't happen.
I am really confused. Could you help me please?
Thank you in advance.
From the C99 standard:
6.5.16.1 Simple assignment
2 In simple assignment (=), the value of the right operand is converted to the type of the assignment expression and replaces the value stored in the object designated by the left operand.
In your case since both the LHS and RHS are of the same type, there is no need for any conversion.
The answer is Neither 1 nor 2.
The value of ch2 is directly assigned to ch1. With the assignment operator, the left-hand operand is the target.
There are no promotions; the behaviour is specified by C11 6.5.16.1/2:
In simple assignment (=), the value of the right operand is converted to the type of the assignment expression and replaces the value stored in the object designated by the left operand.
In the previous paragraph it is defined:
The type of an assignment expression is the type the left operand would have
after lvalue conversion.
where "lvalue conversion" means lvalue-to-rvalue conversion, which has the effect of removing const, volatile and _Atomic qualifiers for the type. (It also has an effect on array types, but that is moot here as arrays cannot appear as the left operand of an assignment).
Your quote from "C Primer Plus" is a mistake by the book. Integer promotions do not occur in all expressions. In fact, they occur when the integer is an operand of an operator, and the definition of that operator specifies that its operands undergo the integer promotions.
Most operators do specify that, but the assignment operator, and sizeof for example, do not. You can check the C standard's specification of each operator to see whether that operator promotes its operands.
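One way to observe this from inside a program is sizeof, which reports the size of an expression's type without evaluating the expression. The sketch below assumes sizeof(int) > 1, as on mainstream platforms; the function names are hypothetical, and some compilers warn about the side effect inside sizeof because it is never executed.

```c
#include <assert.h>
#include <stddef.h>

/* Binary + promotes both char operands, so the expression has type int. */
size_t size_of_char_sum(void)
{
    char ch1 = 1, ch2 = 2;
    return sizeof(ch1 + ch2);
}

/* The type of an assignment is the type of its left operand: char. */
size_t size_of_char_assignment(void)
{
    char ch1 = 0, ch2 = -1;
    return sizeof(ch1 = ch2);   /* not evaluated: nothing is actually stored */
}

/* The addition is carried out in int, so there is no wrap-around at 255. */
int promoted_sum(void)
{
    unsigned char a = 255, b = 1;
    assert(sizeof(a) == 1);     /* a itself still occupies one byte */
    return a + b;
}
```

Under the stated assumption, size_of_char_sum() equals sizeof(int), size_of_char_assignment() equals 1, and promoted_sum() returns 256 instead of 0.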
As @chux's comment points out, I don't see how this is a concern. Since everything happens internally, you should treat it as a black box and not rely on its behavior.
However, being curious, I took the snippet and compiled it to assembly. Let's see what's happening exactly!
This is the source C code. I saved it in a file called test.c:
int main() {
char ch1 = -1;
char ch2 = -1;
ch1 = ch2;
}
And this is the assembly generated by gcc. You can generate it yourself by calling gcc -S test.c. Below is the relevant section:
...
movb $-1, -1(%rbp)
movb $-1, -2(%rbp)
movb -2(%rbp), %cl
movb %cl, -1(%rbp)
popq %rbp
...
So basically, we store the value -1 in two stack slots (addressed relative to %rbp), then move the value in the second slot (ch2) into a temporary register, %cl, and finally store it in the first slot, ch1.
Wait, so what is this temporary register business?! Well it turns out %cl is exactly one byte in size! So yes, no conversion takes place.
As an aside: at the assembly level there is no such thing as a type conversion unless there isn't enough space to store a value. For example, were we to change the value from -1 to, say, 65537 (just beyond the range of a 16-bit short), then what we see is:
x = 65537;
becomes:
movl $65537, -4(%rbp)
We are simply using 4 bytes on the stack for the variable x. So internally, promotion simply means using more space. When an integer is demoted to a char, the low-order byte of the integer is taken and stored in a new slot on the stack. So when both are chars to begin with (each is assigned a one-byte slot on the stack), there really need not be a conversion. But of course, this depends on the compiler. You could have a really inefficient compiler that actually pushes ch2 to the stack with size 1 MB, computes the factorial of ch1, sings a song, then assigns ch2 to ch1. As I said in the beginning, you should treat this as a black box, and not count on it!
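At the language level, the "keep the low-order byte" behaviour is only guaranteed for conversion to an unsigned type, where the result is the value modulo one more than the type's maximum. The sketch below therefore uses unsigned char and assumes 8-bit bytes and an int of at least 32 bits; converting an out-of-range value to a signed char is implementation-defined. The function names are hypothetical.

```c
#include <assert.h>
#include <limits.h>

/* Narrowing: the result is value modulo (UCHAR_MAX + 1). */
unsigned char narrow_to_byte(int value)
{
    return (unsigned char)value;
}

/* Widening goes the other way and always preserves the value. */
int widen_from_byte(unsigned char byte)
{
    int wide = byte;
    assert(wide >= 0 && wide <= UCHAR_MAX);
    return wide;
}
```

With 8-bit bytes, narrow_to_byte(65537) gives 1, because 65537 is 0x10001 and only the low byte survives, while narrow_to_byte(-1) gives UCHAR_MAX.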
Whether plain char is signed is implementation-defined, but on many common platforms it is signed, so ch2 = -1 stores a valid value and no integer promotion is needed.
Integer promotion is not triggered by assignment. It applies when an operand narrower than int is used with an operator that promotes its operands; when two different types are mixed, the usual arithmetic conversions then pick a common type, generally the larger of the two.
For a char used in arithmetic, it works as you described in 2: the integer promotion takes place for the calculation, and at store time
the value is truncated to the size of the destination.

Does the C99 standard permit assignment of a variable to itself?

Does the C99 standard allow variables to be assigned to themselves? For instance, are the following valid:
int a = 42;
/* Case 1 */
a = a;
/* Case 2 */
int *b = &a;
a = *b;
While I suspect Case 1 is valid, I'm hesitant to say the same for Case 2.
In the case of an assignment, is the right side completely evaluated before assigning the value to the variable on the left -- or is a race condition introduced when dereferencing a pointer to the variable being assigned?
Both cases are perfectly valid, since the value of a is only used to determine the value that is to be stored, not to determine the object in which this value is to be stored.
In essence in an assignment you have to distinguish three different operations
determine the object to which the value is to be stored
evaluate the RHS
store the determined value in the determined object
The first two of these three operations can be done in any order, even in parallel. The third is obviously a consequence of the two others, so it will come after.
This is perfectly valid: you are only using the previous value to determine the value to be stored. This is covered in the draft C99 standard, section 6.5, paragraph 2, which says:
Between the previous and next sequence point an object shall have its
stored value modified at most once by the evaluation of an
expression. Furthermore, the prior value shall be read only to
determine the value to be stored.
One of the examples of valid code is as follows:
i = i + 1;
The C and C++ section here covers the different places where a sequence point can occur.
C99 6.5.16.1 Simple assignment
3 If the value being stored in an object is read from another object that overlaps in any way
the storage of the first object, then the overlap shall be exact and the two objects shall
have qualified or unqualified versions of a compatible type; otherwise, the behavior is
undefined.
I think the example code satisfies the "overlap" condition: the overlap is exact, and the two objects have qualified or unqualified versions of a compatible type, so the result is valid.
Also 6.5.16 Assignment operators
4 The order of evaluation of the operands is unspecified. If an attempt is made to modify
the result of an assignment operator or to access it after the next sequence point, the
behavior is undefined.
Still, there's no "attempt to modify the result" so the result is valid.
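The cases discussed above can be put into a small sketch. The struct type pair and the function names are hypothetical; some compilers emit a self-assignment warning for a = a, but the code is valid.

```c
#include <assert.h>

struct pair { int first; int second; };

int self_assign_int(int start)
{
    int a = start;
    int *b = &a;

    a = a;      /* Case 1: a is read once and the same value is stored */
    a = *b;     /* Case 2: *b and a overlap exactly and have the same type */
    a = a + 1;  /* the prior value is read only to compute the stored value */
    return a;
}

int self_assign_struct(void)
{
    struct pair p = { 3, 4 };
    struct pair *q = &p;

    p = *q;     /* exact overlap of compatible types, so 6.5.16.1/3 is satisfied */
    assert(p.first == 3);
    return p.first + p.second;
}
```

self_assign_int(42) returns 43: the two self-assignments leave the value untouched and only the final a + 1 changes it.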
Assuming the compiler doesn't optimize the assignment out by simply removing it, there is even a race condition here. On most architectures, if a is stored in memory, a = a will be compiled into two move instructions (mem => reg, reg => mem) and is therefore not atomic.
Here is an example:
int a = 1;
int main()
{ a = a; }
Result on an Intel x86_64 with gcc 4.7.1
4004f0: 8b 05 22 0b 20 00 mov 0x200b22(%rip),%eax # 601018 <a>
4004f6: 89 05 1c 0b 20 00 mov %eax,0x200b1c(%rip) # 601018 <a>
I can't see a C compiler not permitting a = a. Such an assignment may occur serendipitously due to macros without the programmer knowing it. It may not even generate any code, but that is an optimization matter.
#define FOO (a)
...
a = FOO;
Sample code readily compiles and my review of the C standard shows no prohibition.
As to race conditions, @Yu Hao answers that well: no race condition.

PTX arrays as operands not working

The PTX manual (version 2.3) (http://developer.download.nvidia.com/compute/DevZone/docs/html/C/doc/ptx_isa_2.3.pdf) 6.4.2 states:
Array elements can be accessed using an explicitly calculated byte
address, or by indexing into the array using square-bracket notation.
The expression within square brackets is either a constant integer, a
register variable, or a simple “register with constant offset”
expression, where the offset is a constant expression that is either
added or subtracted from a register variable. If more complicated
indexing is desired, it must be written as an address calculation
prior to use.
ld.global.u32 s, a[0];
ld.global.u32 s, a[N-1];
mov.u32 s, a[1]; // move address of a[1] into s
When I try this, I can only get the pointer-plus-byte-offset version to work, i.e. [a+0].
This code fails to load:
.reg .f32 f<1>;
.global .f32 a[10];
ld.global.f32 f0,a[0];
Whereas this loads fine:
.reg .f32 f<1>;
.global .f32 a[10];
ld.global.f32 f0,[a+0];
The problem with the byte offset version is that it really is a byte offset. So, one has to take the underlying size of the type into account, i.e. the second element is [a+4]. Whereas a[1] is supposed to work this out for you.
Ideas what's going wrong?
EDIT
And there is an even more severe issue involved here: the above text states that a register variable can be used to index the array, like:
ld.global.f32 f0,a[u0];
where u0 is probably a .reg.u32 or some other compatible integer.
However, with the pointer plus byte offset method this is not possible. It is illegal to do something like:
mul.u32 u1,u0,4;
ld.global.f32 f0,[a+u1]; // here a reg variable is not allowed.
Now this is a severe limitation. However, one can do another address calculation prior to the load statement, but this complicates things.
This does not seem to fit with the PTX documentation you quoted, but you can add in a multiplier corresponding with the size of the items in your array. For instance, to get the 10th 32-bit word:
ld.const.u32 my_u32, [my_ptr + 10 * 4];
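The index-to-byte-offset arithmetic behind that workaround can be sketched in C (not PTX). This only illustrates the calculation: the function names are hypothetical, and nothing here is taken from the PTX toolchain.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Byte offset of element `index` in an array of elem_size-byte elements. */
size_t element_byte_offset(size_t index, size_t elem_size)
{
    return index * elem_size;
}

/* Loads a float through an explicitly computed byte address,
   mirroring what a [base + offset] operand expresses. */
float load_f32_at(const float *base, size_t index)
{
    assert(base != NULL);
    const unsigned char *bytes = (const unsigned char *)base;
    float value;

    memcpy(&value, bytes + element_byte_offset(index, sizeof(float)), sizeof value);
    return value;
}
```

For a 4-byte element type, index 1 maps to byte offset 4 and index 10 to byte offset 40, which matches the 10 * 4 written by hand in the load above.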

Resources