Here is the code:
int a=256;
char *x= (char *)&a;
*x++ = 1;
*x =x[0]++;
printf("a=%d\n", a);
The output is:
a=257
I'm going to take the posted code one line at a time.
int a=256;
I presume this is straightforward enough.
char *x= (char *)&a;
This sets a char pointer to point to just one byte of the multi-byte int value. This is a low-level, machine-dependent, and not necessarily meaningful operation. (Also x is a poor, unidiomatic name for a pointer variable.)
*x++ = 1;
This both sets the byte pointed to by x, and increments x to point to the next byte. In general that's a sensible operation — for example, it's how we often fill characters into a text buffer one at a time. In this context, though, it's borderline meaningless, because it's rare to move along the bytes of an int variable one at a time, setting or altering them.
*x =x[0]++;
And then this line is the kicker. I can't explain what it does, in fact no one can explain what it does, because it's undefined. More on this below.
printf("a=%d\n", a);
Obviously this prints the value of a, although after what poor a has been through, it's hard to say what kind of bloody mess might be left of its bits and bytes.
And now let's take a second look at that line
*x =x[0]++;
One thing we can say is that by the rules of pointer arithmetic, the subexpressions *x and x[0] are identical, they do exactly the same thing, they access the value pointed to by x. So whatever value is pointed to by x, this expression tries to modify it twice: once when it says x[0]++, and then a second time when it says *x = … to assign something to *x. And when you have one expression that tries to modify the same thing twice, that's poison: it leads to undefined behavior, and once you're in undefined behavior territory, you can't say — no one can say — what your program does.
In fact, I tried your code under two different compilers, and I got two different answers! Under one compiler the code printed 257, as yours did, but under the other compiler it printed 513. How can that be? What's the right answer? Well, in the case of undefined behavior, since there is no one right answer, it's not wrong — in fact it's more or less expected — for different compilers to give different results.
You can read much more about undefined behavior, and undefined expressions like this one, at the canonical SO question on this topic, Why are these constructs using pre and post-increment undefined behavior? Your expression is equivalent to the classic one i = i++ which is specifically discussed in several of the answers to that other question.
int a=256;
An integer variable initialized to 256. On a little-endian machine the memory layout is:
00 01 00 00 ...
char *x= (char *)&a;
x is a pointer to the least significant byte of a (on a little-endian machine).
00 01 00 00 ...
^ -- x points here
*x++ = 1;
Set the byte that x points to to 1, then move x to the next byte. Memory is now:
01 01 00 00 ...
^-- x points here
*x =x[0]++;
Undefined behaviour: *x and x[0] designate the same byte (x[0] is *(x+0)). The post-increment appears to be ignored in your implementation.
UPD: actually it is not ignored but overwritten by the assignment.
x[0]++ increments the second byte:
01 02 00 00
^ -- x
and then the value read before the increment (01) is written back to the same place by the *x = assignment:
01 01 00 00
Related
int *i= (int*)5;
char *p=(char*)'A';
printf("i=%d and p=%c",i,p);
I accidentally tried this and I got the output as i=5 and p=A.
Can someone explain? What has happened?
Converting an integer to a pointer might be ok, provided it doesn't cause misalignment (which it certainly does in the case of 5). On some computers, these pointer conversions by themselves might cause a trap/crash. On other computers they will work fine.
Then when you call printf with the wrong types, you get undefined behavior. One possible outcome of undefined behavior on some system might be "seems to work just fine". For example if sizeof(int) == sizeof(int*), the code might print the result you are getting, although no guarantees.
Regarding passing a char to printf: printf is a variadic function, and those functions come with implicit promotion of the passed arguments to int. That is why one might get the result A too, in case sizeof(int) happened to equal sizeof(char*).
And on a system with 64 bit pointers, you might want to try this:
printf("size of char* is %zu but the lower 32 bits are %X", sizeof(char*), (unsigned int)p);
On a little endian x86 I get the output
size of char* is 8 but the lower 32 bits are 41
Since the pointer format is little endian, the raw pointer value 41 00 00 00 00 00 00 00 can be read as a 32 bit integer 41 00 00 00 to get ASCII 'A'. So it seems likely that printf just peeled off the lowest 32 bits of the pointer.
Silly bonus program for x86 to demonstrate this:
int* i = (int*) (5 | 1145128260ull<<32); // store some secret msg in upper 32 bits
char* p = (char*)('A' | 1178944834ull<<32); // store some secret msg in upper 32 bits
printf("i=%d and p=%c\n",i,p);
printf("%.4s",(char*)&i + 4);
printf("%.4s",(char*)&p + 4);
Output:
i=5 and p=A
DEADBEEF
I'm trying to understand R-values and L-values when it comes to expressions in C and I know that many expressions are not valid L-values because the address of the end result is not known. This is because in many instances, a copy of the variables is used in the expression and so the address of the copy is not known. For example,
char ch = 'a';
char *p = &ch;
p + 1; //This expression can be used as an R-value, but not an L-value
I believe what is happening (please correct me if I'm wrong) in the expression is that a copy of p is created, 1 is added to that copy so that p+1 points at the char after ch, but the address of this new p+1 pointer is not known, so it can't be used as an L-value.
Does this behavior of making a copy of the variables and using the copies in the expression happen for all expressions in C? For example, if I have
int a = 100;
int b = 25;
a - b;
Is a copy of variable a created (and stored at an unknown location) and a copy of variable b created (the copy of b is also stored at an unknown location), the data in the copies is used to subtract and then the result is stored in another unknown location or is the data taken from the original variables, subtracted and then the result is stored at an unknown location?
Does every expression in C make a copy of the variables?
It doesn't really matter.
The compiler optimizes. It is allowed to do anything with the code, as long as the observable side effects are the same. In your examples, p + 1; and a - b; do nothing, so they can be optimized out by the compiler entirely; nothing happens.
What happens on an actual machine with the code generated by the compiler is highly machine-specific. The compiler may first load the values of the variables into registers and then perform the computation on those registers, or it may perform the computation on the values where they are stored, or it may not even reserve any memory for the variables.
I know that many expressions are not valid L-values because the address of the end
result is not known
That could be an underlying justification, but the actual rules are more direct. The C standard simply lists the operators and says, for example, that the result of + is a value, or that the result of unary * is an lvalue. There is no "generic" rule.
Does this behavior of making a copy of the variables and using the copies in the expression happen for all expressions in C?
The C standard doesn't talk about "variables" or "copies of variables"; it talks about "values". An expression has a defined value, a defined end result. The C standard doesn't really concern itself with how the compiler arrives at that value, so the actual behavior can be anything. If a particular compiler decides to make a copy of the variables, or decides not to, that's fine, as long as the end results are correct.
An lvalue is something that can appear on the left side of the = operator. It is not necessarily a variable; for example, ((char*)0x01)[1] is an lvalue even though it is the result of applying operators. An rvalue (or just a value) is something you can't assign to.
The implementation of such calculations is not standardized, so it can differ between compilers. The important thing to understand is that R-values represent temporary values. They can live in a register or in some allocated memory.
Calculations are performed by the CPU on registers. So the compiler will move the values of the variables into registers and then calculate the subtraction.
Here is the assembly for such a calculation:
659: c7 45 f8 64 00 00 00 movl $0x64,-0x8(%rbp)
660: c7 45 fc 19 00 00 00 movl $0x19,-0x4(%rbp)
667: 8b 45 f8 mov -0x8(%rbp),%eax
66a: 2b 45 fc sub -0x4(%rbp),%eax
You can see that the values 100 (0x64) and 25 (0x19) are stored on the stack at addresses relative to the base pointer (-0x8 and -0x4 respectively).
Then the value at -0x8 is moved into the eax register, the value at -0x4 is subtracted from it, and the result is stored in the register itself.
As you can see, the result ends up in a register, so it doesn't have an address in memory.
According to the C11 standard(ISO-IEC-9899-2011)--6.3.2.1 Lvalues, arrays and function designators:
Except when it is the operand of the sizeof operator, the unary & operator, the ++
operator, the -- operator, or the left operand of the . operator or an assignment operator,
an lvalue that does not have array type is converted to the value stored in the designated
object (and is no longer an lvalue); this is called lvalue conversion.
Does the C99 standard allow variables to be assigned to themselves? For instance, are the following valid:
int a = 42;
/* Case 1 */
a = a;
/* Case 2 */
int *b = &a;
a = *b;
While I suspect Case 1 is valid, I'm hesitant to say the same for Case 2.
In the case of an assignment, is the right side completely evaluated before assigning the value to the variable on the left -- or is a race condition introduced when dereferencing a pointer to the variable being assigned?
Both cases are perfectly valid, since the value of a is only used to determine the value that is to be stored, not to determine the object in which that value is to be stored.
In essence in an assignment you have to distinguish three different operations
determine the object to which the value is to be stored
evaluate the RHS
store the determined value in the determined object
The first two of these three operations can be done in any order, even in parallel. The third is obviously a consequence of the other two, so it comes after.
This is perfectly valid: you are only using the previous value to determine the value to be stored. This is covered in the draft C99 standard, section 6.5 paragraph 2, which says:
Between the previous and next sequence point an object shall have its
stored value modified at most once by the evaluation of an
expression. Furthermore, the prior value shall be read only to
determine the value to be stored.
One of the examples of valid code is as follows:
i = i + 1;
The C and C++ section here covers the different places where a sequence point can occur.
C99 6.5.16.1 Simple assignment
3 If the value being stored in an object is read from another object that overlaps in any way
the storage of the first object, then the overlap shall be exact and the two objects shall
have qualified or unqualified versions of a compatible type; otherwise, the behavior is
undefined.
I think the example code satisfies the "overlap" condition: the two objects are the same object, so the overlap is exact, and they have qualified or unqualified versions of a compatible type, so the result is valid.
Also 6.5.16 Assignment operators
4 The order of evaluation of the operands is unspecified. If an attempt is made to modify
the result of an assignment operator or to access it after the next sequence point, the
behavior is undefined.
Still, there's no "attempt to modify the result" so the result is valid.
Assuming the compiler doesn't optimize the assignment away entirely, there is even a potential race condition here in a multithreaded program. On most architectures, if a is stored in memory, a = a will compile to two move instructions (mem => reg, reg => mem) and is therefore not atomic.
Here is an example:
int a = 1;
int main()
{ a = a; }
Result on an Intel x86_64 with gcc 4.7.1
4004f0: 8b 05 22 0b 20 00 mov 0x200b22(%rip),%eax # 601018 <a>
4004f6: 89 05 1c 0b 20 00 mov %eax,0x200b1c(%rip) # 601018 <a>
I can't see a C compiler not permitting a = a. Such an assignment may occur serendipitously, due to macros, without a programmer even knowing it. It may not even generate any code; that is an optimization matter.
#define FOO (a)
...
a = FOO;
Sample code readily compiles and my review of the C standard shows no prohibition.
As to race conditions, @Yu Hao answers that well: no race condition.
As far as I know the output should be like 3.14,3.14,3.14 and 256,256,256 right? But this produces a different output. Can anyone please guide me through this and tell me why this happens?
#include <stdio.h>

int main(void)
{
float a = 3.14;
int b = 256;
char *p, *p1;
p = (char *) &a;
p1 = (char *) &b;
printf("\nFLOAT:");
printf("\nValue of *p=%f",*p);
printf("\nValue of a=%f",a);
printf("\nValue of *p=%f",*p);
printf("\n\nINTEGER:");
printf("\nValue of *p1=%d",*p1);
printf("\nValue of b=%d",b);
printf("\nValue of *p1=%d",*p1);
}
Output:
FLOAT:
Value of *p=0.000000
Value of a=3.140000
Value of *p=3.140001
INTEGER:
Value of *p1=0
Value of b=256
Value of *p1=0
The expression *p has type char. The format specifier %f requires an argument of type float. Passing an argument of the wrong type to a variadic function like printf invokes undefined behavior.
I got slightly different answers than you did (http://ideone.com/RG4uq), which is not really surprising given that the behavior of variadic functions with mixed floating point and integral types is undefined.
Here's what is happening. Assuming memory starts at address 0x50000000, 32-bit machine, little endian
50000000 c3 f5 48 40 (a)
50000004 00 01 00 00 (b)
50000008 00 00 00 50 (p)
5000000c 04 00 00 50 (p1)
Type of a is float, type of b is int, type of p is char*, type of p1 is char*.
For integers, you see
print p1 as an int ==> *p1 is the byte at address 50000004, which is 0, so 0 is printed.
print b as an int ==> well obviously b is 256 because it is an int.
print p1 as an int ==> *p1 is the byte at address 50000004, which is 0, so 0 is printed (as before).
On a little endian machine, you can even try printing p1[1] as an integer and you will see 1 (interesting, eh? See http://ideone.com/daS6d).
For floats, things are different. On many processors, say the x86-64, parameters are passed in registers. You are calling printf three times. Each time the thing that will be printed comes from xmm0 (assuming the x86-64). But notice when you first try to print *p which is a char, nothing is passed in xmm0 (*p is passed in %edi) so whatever junk you had in there gets printed (could be 0, or 0.0234892374 or whatever). But next you pass a real float in xmm0 and print 3.14. But when you come around for the third printf, again you don't pass anything (because *p is a char) and so what is left in xmm0? That's right 3.14. It could have been something else, perhaps, but most likely it hasn't changed. :)
It is not behaving as expected because *p has type char, not float, so it does not match the %f conversion. Converting between the pointer types does not convert the pointed-to value.
The short answer is that you are invoking undefined behavior when you pass a char type to printf using a format specifier of %f, and as such you shouldn't expect anything in particular.
The long answer is implementation dependent but what follows is my observation of what is likely occurring on your platform. When you ask printf to print a double value (as with the %f format specifier), it reads the next sizeof(double) bytes from the stack and interprets it as a floating point value and prints it. In the first printf call, the first time a new stack frame is generated, the data on the stack after the bits that form the char you actually passed equate to a zero floating point value. In the second call to printf, a new stack frame is generated, probably overwriting the same space that the first call did. In this case, a full double value is present and it is printed as expected. The stack frame is "destroyed" when the function returns. As a matter of efficiency, the stack frame is not typically zeroed out when a function returns and the contents remain. In your third call to printf, you again pass a single byte whereas you ask printf to interpret sizeof(double) bytes as a floating point value. The stack frame from the previous call to printf now contains the value of the double passed from the prior call, with one of those bytes overwritten with the byte from the new argument which results in the value printed.
If I change your second printf call to:
printf("\nValue of a=%f", 1.234);
the third call to printf prints (on my system):
Value of *p=1.234000
which seems to validate the logic presented above.
In summary, you are asking printf to read more data from the stack than you actually passed to the function, and as such the results are undefined. In your case, the data read is a remnant of previous calls, which explains the result you get on your specific platform. (As Ray points out, the actual way the argument is passed can vary, as this is implementation-dependent. Some systems pass the value in a register, but the point remains the same.)
I'm trying to convert a struct to a char array to send over the network. However, I get some weird output from the char array when I do.
#include <stdio.h>
struct x
{
int x;
} __attribute__((packed));
int main()
{
struct x a;
a.x=127;
char *b = (char *)&a;
int i;
for (i=0; i<4; i++)
printf("%02x ", b[i]);
printf("\n");
for (i=0; i<4; i++)
printf("%d ", b[i]);
printf("\n");
return 0;
}
Here is the output for various values of a.x (on an X86 using gcc):
127:
7f 00 00 00
127 0 0 0
128:
ffffff80 00 00 00
-128 0 0 0
255:
ffffffff 00 00 00
-1 0 0 0
256:
00 01 00 00
0 1 0 0
I understand the values for 127 and 256, but why do the numbers change when going to 128? Why wouldn't it just be:
80 00 00 00
128 0 0 0
Am I forgetting to do something in the conversion process or am I forgetting something about integer representation?
*Note: This is just a small test program. In a real program I have more in the struct, better variable names, and I convert to little-endian.
*Edit: formatting
What you see is the sign-preserving conversion from char to int. The behavior results from the fact that on your system char is signed (note: char is not signed on all systems). That leads to negative values if a bit pattern yields a negative value for a char. Promoting such a char to an int preserves the sign, so the int is negative too. Note that even if you don't write an (int) cast explicitly, the compiler automatically promotes the character to an int when passing it to printf. The solution is to convert your value to unsigned char first:
for (i=0; i<4; i++)
printf("%02x ", (unsigned char)b[i]);
Alternatively, you can use unsigned char* from the start on:
unsigned char *b = (unsigned char *)&a;
And then you don't need any cast at the time you print it with printf.
The x format specifier by itself says that the argument is an int, and since the number is negative, printf requires eight characters to show all four non-zero bytes of the int-sized value. The 0 modifier tells it to pad the output with zeros, and the 2 modifier says that the minimum output should be two characters long. As far as I can tell, printf doesn't provide a way to specify a maximum width, except for strings.
Now then, you're only passing a char, so bare x tells the function to use the full int that got passed instead, due to default argument promotion for "..." parameters. Try the hh modifier to tell the function to treat the argument as just a char instead:
printf("%02hhx", b[i]);
char is a signed type on your platform; so with two's complement, 0x80 is -128 for an 8-bit integer (i.e. a byte).
Treating your struct as if it were a char array is undefined behavior. To send it over the network, use proper serialization instead. It's a pain in C++ and even more so in C, but it's the only way your app will work independently of the machines reading and writing.
http://en.wikipedia.org/wiki/Serialization#C
Converting your structure to characters or bytes the way you're doing it is going to lead to issues when you try to make it network-neutral. Why not address that problem now? There are a variety of different techniques you can use, all of which are likely to be more "portable" than what you're trying to do. For instance:
Sending numeric data across the network in a machine-neutral fashion has long been dealt with, in the POSIX/Unix world, via the functions htonl, htons, ntohl and ntohs. See, for example, the byteorder(3) manual page on a FreeBSD or Linux system.
Converting data to and from a completely neutral representation like JSON is also perfectly acceptable. The amount of time your programs spend converting the data between JSON and native forms is likely to pale in comparison to the network transmission latencies.
char is a signed type on your platform, so what you are seeing is the two's-complement representation; casting to (unsigned char*) will fix that (Rowland just beat me).
On a side note you may want to change
for (i=0; i<4; i++) {
//...
}
to
for (i=0; i<sizeof(a); i++) {
//...
}
The signedness of the char array is not the root of the problem! (It is -a- problem, but not the only problem.)
Alignment! That's the key word here. That's why you should NEVER try to treat structs like raw memory. Compilers (and various optimization flags), operating systems, and phases of the moon all do strange and exciting things to the actual location in memory of "adjacent" fields in a structure. For example, if you have a struct with a char followed by an int, the whole struct will be EIGHT bytes in memory -- the char, 3 blank, useless bytes, and then 4 bytes for the int. The machine likes to do things like this so structs can fit cleanly on pages of memory, and the like.
Take an introductory course to machine architecture at your local college. Meanwhile, serialize properly. Never treat structs like char arrays.
When you go to send it, just use:
(char*)&CustomPacket
to convert. Works for me.
You may want to convert to a unsigned char array.
Unless you have very convincing measurements showing that every octet is precious, don't do this. Use a readable ASCII protocol like SMTP, NNTP, or one of the many other fine Internet protocols codified by the IETF.
If you really must have a binary format, it's still not safe just to shove out the bytes in a struct, because the byte order, basic sizes, or alignment constraints may differ from host to host. You must design your wire protocol to use well-defined sizes and a well-defined byte order. For your implementation, either use macros like ntohl(3) or use shifting and masking to put bytes into your stream. Whatever you do, make sure your code produces the same results on both big-endian and little-endian hosts.