How does this code work? Regarding union initialization in C

I got the output 0 2 for this program,
but I don't know why.
Please explain. I think only int i is initialized with 512,
so how did ch[1] get the value 2?
#include <stdio.h>
int main()
{
union a /* declared */
{
int i; char ch[2];
};
union a z = { 512 };
printf("%d%d", z.ch[0], z.ch[1]);
return 0;
}

Union declaration means that all its members are allocated the same memory. So your int i and char ch[2] are referencing the same memory space -- in other words, they are aliased. Whenever you change one, you will change the other as well.
Now, assuming your ints are 32-bit wide and you're on a little-endian system like x86, i = 512 (512 == 0x00000200) actually looks like this in memory:
0x00 0x02 0x00 0x00.
with the first two values corresponding directly to the 2-character array:
ch[0] ch[1]
So you get ch[0] == 0x0 and ch[1] == 0x02.
Try setting your i = 0x1234 and see what effect it will have on your character array.
Based on your question, it's possible that you may want to use a struct instead of union -- then its members would be allocated in memory sequentially (one after the other).

512 is 0x200 in hex, so the first byte of your union is 0 and the second is 2. If you don't specify which union member should be initialized, the first one is taken, which is the int in your case.
You get 2 for the second byte of your array because the first byte of ch holds 0 and the second one holds 2.

Simple: 512 = binary 1000000000, so ch[0] will get the 8 zeroes (assuming your system is little endian) and ch[1] will get the 10 part, which, in decimal, is 2.

You are mixing up 'struct' and 'union'. In a union you collect differently typed and named members into one field (with length = maximum (size of the members)), which you can access through any member, and you have to make sure yourself that you read the right data.
Your example allocates memory for max(int, char[2]).
On a little-endian machine, z.i = 32 and z.ch[0] = ' ' both store 0x20 in the first byte of the union (z.i = 32 also zeroes the remaining bytes of i).

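You got 0 2 for good reasons, but the C standard does not guarantee that result. If you write i, the value you read back through ch depends on the implementation's int size and byte order.
However, GCC documents that reading a union member other than the one last written (type punning) is supported, as long as the access goes through the union type.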

Related

Variable assignment disparity inside a union

What I've heard about a union is that it allocates memory for the biggest member within it. Here I'm trying to assign the 'same' value in two different ways, but it's ending up problematic.
First,
union h {
int a;
char b;
};
int main()
{
union h h1;
h1.b = 'X';
printf("%d %c\n",h1.a, h1.b );
return 0;
}
The output would be a large random number followed by 'X'
-1674402216 X
When I tried assigning h1.a also into a number,
union h {
int a;
char b;
};
int main()
{
union h h1;
h1.a = 1;
h1.b = 'X';
printf("%d %c\n",h1.a, h1.b );
return 0;
}
This gives the output
88 X
Can someone help me figure out what exactly is happening here?
Thank you :)
Union members occupy same space in memory.
So your union looks something like this:
N-1 ...
--------
N ||X||a||
N+1 | |a||
N+2 | |a||
N+3 | |a||
... | |
--------
...
(Assuming system with 32 bit integer.)
By assigning X you have also modified one byte of your uninitialized a. Your value (-1674402216) can be interpreted as 9C32A658 in base 16. Your least significant byte is 58 hex, which is the ASCII code of X, and the other three bytes kept their uninitialized values.
In your second case you first initialized the int to 1 (which set all but the least significant byte to 0), then you overwrote the least significant byte with X, so you get 88 (the ASCII code of X) when the union is read as an int, and the original 'X' when looking at the char member.
Not to forget: a layout like this is implementation-defined. The standard does say, as mentioned in the comments on your question, that you should not access a member that was not written last; at the same time, it is common practice to use unions to do exactly this (see these threads: Why do we need C Unions?, What is the strict aliasing rule?).

unable to understand the output of union program in C

I know the basic properties of union in C but still couldn't understand the output, can somebody explain this?
#include <stdio.h>
int main()
{
union uni_t{
int i;
char ch[2];
};
union uni_t z ={512};
printf("%d%d",z.ch[0],z.ch[1]);
return 0;
}
The output when running this program is
02
union a
{
int i;
char ch[2];
};
This declares a type union a, the contents of which (i.e. the memory area of a variable of this type) could be accessed as either an integer (a.i) or a 2-element char array (a.ch).
union a z ={512};
This defines a variable z of type union a and initializes its first member (which happens to be a.i of type int) to the value of 512. (Cantfindname has the binary representation of that.)
printf( "%d%d", z.ch[0], z.ch[1] );
This takes the first character, then the second character from a.ch, and prints their numerical values. Again, Cantfindname talks about endianness and how it affects the results. Basically, you are taking apart an int byte by byte.
And the whole shebang is apparently assuming that sizeof( int ) == 2, which hasn't been true for desktop computers for... quite some time, so you might want to be looking at a more up-to-date tutorial. ;-)
What you get here is the result of endianness (http://en.wikipedia.org/wiki/Endianness).
512 is 0b0000 0010 0000 0000 in binary, which in little endian is stored in memory low-order byte first: 0000 0000, then 0000 0010. ch[0] reads the first byte in memory (0b0000 0000 = 0 in decimal) and ch[1] reads the second byte (0b0000 0010 = 2 in decimal), which gives the output 02.
With a 4-byte int on a little-endian machine the output is the same, because the two low-order bytes still come first in memory (0x00 0x02 0x00 0x00). On a big-endian machine with a 4-byte int, ch[0] and ch[1] would both be 0. Using short int, which is typically 2 bytes, makes the two union members exactly the same size.
A Union is a variable that may hold (at different times) objects of different types and sizes, with the compiler keeping track of size and alignment requirements.
union uni_t
{
short int i;
char ch[2];
};
This code snippet declares a union having two members: an integer and a character array.
The union can be used to hold different values at different times by simply assigning to its members.
union uni_t z ={512};
This defines a variable z of type union uni_t and initializes the integer member ( i ) to the value of 512.
So the value stored in z is 0b0000 0010 0000 0000.
When this value is read through the character array on a little-endian machine, ch[0] refers to the low-order byte (stored first) and ch[1] refers to the high-order byte.
ch[1] = 0b00000010 = 2
ch[0] = 0b00000000 = 0
So printf("%d%d",z.ch[0],z.ch[1]) results to
02

C programming: words from byte array

I have some confusion regarding reading a word from a byte array. The background context is that I'm working on a MIPS simulator written in C for an intro computer architecture class, but while debugging my code I ran into a surprising result that I simply don't understand from a C programming standpoint.
I have a byte array called mem defined as follows:
uint8_t *mem;
//...
mem = calloc(MEM_SIZE, sizeof(uint8_t)); // MEM_SIZE is pre defined as 1024x1024
During some of my testing I manually stored a uint32_t value into four of the blocks of memory at an address called mipsaddr, one byte at a time, as follows:
for(int i = 3; i >=0; i--) {
*(mem+mipsaddr+i) = value;
value = value >> 8;
// in my test, value = 0x1084
}
Finally, I tested trying to read a word from the array in one of two ways. In the first way, I basically tried to read the entire word into a variable at once:
uint32_t foo = *(uint32_t*)(mem+mipsaddr);
printf("foo = 0x%08x\n", foo);
In the second way, I read each byte from each cell manually, and then added them together with bit shifts:
uint8_t test0 = mem[mipsaddr];
uint8_t test1 = mem[mipsaddr+1];
uint8_t test2 = mem[mipsaddr+2];
uint8_t test3 = mem[mipsaddr+3];
uint32_t test4 = (mem[mipsaddr]<<24) + (mem[mipsaddr+1]<<16) +
(mem[mipsaddr+2]<<8) + mem[mipsaddr+3];
printf("test4= 0x%08x\n", test4);
The output of the code above came out as this:
foo= 0x84100000
test4= 0x00001084
The value of test4 is exactly as I expect it to be, but foo seems to have reversed the order of the bytes. Why would this be the case? In the case of foo, I expected the uint32_t* pointer to point to mem[mipsaddr], and since it's 32-bits long, it would just read in all 32 bits in the order they exist in the array (which would be 00001084). Clearly, my understanding isn't correct.
I'm new here, and I did search for the answer to this question but couldn't find it. If it's already been posted, I apologize! But if not, I hope someone can enlighten me here.
It is (among others) explained here: http://en.wikipedia.org/wiki/Endianness
When storing data larger than one byte into memory, the order in which the bytes are stored depends on the architecture (that is, the CPU). Either the most significant byte is stored first and the least significant byte last, or vice versa. When you read back the individual bytes through byte access operations and then merge them to form the original value again, you need to consider the endianness of your particular system.
In your for-loop, you are storing your value byte-wise, starting with the most significant byte (counting down the index is a bit misleading ;-). Your memory looks like this afterwards: 0x00 0x00 0x10 0x84.
You are then reading the word back with a single 32-bit (four-byte) access. Depending on your architecture, this will become either 0x00001084 (big endian) or 0x84100000 (little endian). Since you get the latter, you are working on a little-endian system.
In your second approach, you are using the same order in which you stored the individual bytes (most significant first), so you get back the same value which you stored earlier.
It seems to be a problem of endianness, which shows up when you cast the (uint8_t *) to (uint32_t *) and read all four bytes in one access.

Casting int pointer to char pointer causes loss of data in C?

I have the following piece of code:
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char *argv[])
{
int n = 260;
int *p = &n;
char *pp = (char*)p;
*pp = 0;
printf("n = %d\n", n);
system("PAUSE");
return 0;
}
The output of the program is n = 256.
I may understand why it is, but I am not really sure.
Can anyone give me a clear explanation, please?
Thanks a lot.
The int 260 (= 256 * 1 + 4) will look like this in memory - note that this depends on the endianness of the machine - also, this is for a 32-bit (4 byte) int:
0x04 0x01 0x00 0x00
By using a char pointer, you point to the first byte and change it to 0x00, which changes the int to 256 (= 256 * 1 + 0).
You're apparently working on a little-endian machine. What's happening is that you're starting with an int that takes up at least two bytes. The value 260 is 256+4. The 256 goes in the second byte, and the 4 in the first byte. When you write 0 to the first byte, you're left with only the 256 in the second byte.
In C a pointer references a block of bytes whose size is based on the type associated with the pointer. So in your case the integer pointer refers to a block 4 bytes in size, while a char is only one byte long. When you set the char to 0, it only changes the first byte of the integer value. Because of how numbers are stored in memory on most modern machines (effectively in reverse order from how you would write them), you are overwriting the least significant byte (which was 4), and you are left with 256 as the value.
I understood what exactly happens by changing value:
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char *argv[])
{
int n = 260;
int *p = &n;
char *pp = (char*)p;
*pp = 20;
printf("pp = %d\n", (int)*pp);
printf("n = %d\n", (int)n);
system("PAUSE");
return 0;
}
The output values are
pp = 20
and
n = 276
So basically the problem is not that you lose data. The char pointer points only to the first byte of the int, so it changes only that byte and leaves the other bytes unchanged, which is why you get those odd values. (On an Intel processor the first byte is the least significant, which is why you change the "smallest" part of the number.)
Your problem is the assignment
*pp = 0;
You're dereferencing pp which points to n, and changing n.
However, pp is a char pointer so it doesn't change all of n
which is an int. This causes the binary complications in the other answers.
In terms of the C language, the description for what you are doing is modifying the representation of the int variable n. In C, all types have a "representation" as one or more bytes (unsigned char), and it's legal to access the underlying representation by casting a pointer to char * or unsigned char * - the latter is better for reasons that would just unnecessarily complicate things if I went into them here.
As schnaader answered, on a little endian, twos complement implementation with 32-bit int, the representation of 260 is:
0x04 0x01 0x00 0x00
and overwriting the first byte with 0 yields:
0x00 0x01 0x00 0x00
which is the representation for 256 on such an implementation.
C allows implementations which have padding bits and trap representations (which raise a signal/abort your program if they're accessed), so in general overwriting part but not all of an int in this way is not safe to do. Nonetheless, it does work on most real-world machines, and if you instead used the type uint32_t, it would be guaranteed to work (although the ordering of the bytes would still be implementation-dependent).
Considering 32 bit systems,
260 will be represented like this.
00000000 (Byte-3) 00000000 (Byte-2) 00000001(Byte-1) 00000100(Byte-0)
Now when p is cast to a char pointer, the label on the pointer changes, but the memory contents don't. Earlier p could access 4 bytes, as it was an integer pointer, but now it can only access 1 byte, as it is a char pointer. So only the least significant byte gets changed to zero, not all 4 bytes.
And it becomes
00000000 (Byte-3) 00000000 (Byte-2) 00000001(Byte-1) 00000000(Byte-0)
Hence, the o/p is 256.

Pointer Dereferencing = Program Crash

unsigned int *pMessageLength, MessageLength;
char *pszParsePos;
...
//DATA into pszParsePos
...
printf("\nMessage Length\nb1: %d\nb2: %d\nb3: %d\nb4: %d\n",
pszParsePos[1],pszParsePos[2],pszParsePos[3],pszParsePos[4]);
pMessageLength= (unsigned int *)&pszParsePos[1];
MessageLength = *((unsigned int *)&pszParsePos[1]);
//Program Dies
Output:
Message Length
b1: 0
b2: 0
b3: 0
b4: 1
I don't understand why this is crashing my program. Could someone explain it, or at least suggest an alternative method that won't crash?
Thanks for your time!
Bus error means that you're trying to access data with incorrect alignment. Specifically, it seems the processor requires an int to be aligned on a suitable boundary rather than at an arbitrary address, and if pszParsePos itself is aligned on, say, an int boundary (which depends on how you initialize it, but will happen, e.g., if you use malloc), then &pszParsePos[1] certainly isn't.
One way to fix this would be constructing MessageLength explicitly, i.e., something like
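MessageLength = ((unsigned int)(unsigned char)pszParsePos[1] << 24) | ((unsigned int)(unsigned char)pszParsePos[2] << 16) |
                ((unsigned int)(unsigned char)pszParsePos[3] << 8) | (unsigned char)pszParsePos[4]; /* casts avoid sign extension of bytes >= 0x80 */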
(or the other way around if it's supposed to be little-endian). If you really want to type-pun, make sure that the pointer you're accessing is properly aligned.
Here's what I think is going wrong:
You added in a comment that you are running on the Blackfin processor. I looked this up on some web sites and they claim that the Blackfin requires what are called aligned accesses. That is, if you are reading or writing a 32-bit value to/from memory, then the physical address must be a multiple of 4 bytes.
Arrays in C are indexed beginning with [0], not [1]. A 4-byte array of char ends with element [3].
In your code, you have a 4-byte array of char which:
You treat as though it began at index 1.
You convert via pointer casts to a DWORD via 32-bit memory fetch.
I suspect your 4-char array is aligned to a 4-byte boundary, but as you are beginning your memory access at position +1 byte, you get a misalignment of data bus error.

Resources