How do memory gets shared in union - c

for following code
#include <stdio.h>
#include <string.h>
union share
{
int num;
char str[3];
}share1;
int main()
{
strcpy(share1.str,"ab");
printf("str is %s and num is %d", share1.str, share1.num );
return 0;
}
I get output as "str is ab and num is 25185".
str is printed as it is but how do I get 25185.

Unions overlap in memory. That means the 2 bytes representing your int share the same memory location as the first 2 chars (bytes) of your string. Changing the characters automatically changes the int, since by definition they're the SAME thing... just being treated differently because you can access those two bytes as a char OR an int.
a -> 0x61
b -> 0x62
25185 -> 0x6261
^--- a
^----- b

the actual hex coding for "ab" is: 0x616200
Of importance is that the instance of the union is in file global space
so the memory is 'pre-set' to all 0x00.
depending on the architecture, (little or big Endian) that value
"ab" will be read as either:
0x00006261 or as 0x61620000
given the small magnitude of the printed number, it is obvious that
the integer representation is 0x00006261. (little Endian)
0x00006261 (hex) is 25185 (decimal)

Related

C structure with bits printing in hexadecimal

I have defined a structure as below
struct {
UCHAR DSatasetMGMT : 1;
UCHAR AtriburDeallocate : 1;
UCHAR Reserved6 : 6;
UCHAR Reserved7 : 7;
UCHAR DSatasetMGMTComply : 1;
}DatasetMGMTCMDSupport;
It is a 2 byte structure represented in bits. How should I print the whole 2 bytes of structure in hexadecimal. I tried
"DatasetMGMTCMDSupport : 0x%04X\n"
And
0x%04I64X\n
But not getting expected result.
I am getting 0x3DC18003 with 0x%04X\n while the correct data is 0x8003 "
I am using 64 bit windows system.
I need to know how to print 2 byte structure in hexadecimal.
Try using 0x%04hx\n. This tells printf to print out only the two bytes. You can read more about it here: https://en.wikipedia.org/wiki/Printf_format_string#Length_field
In contrast, the I64 in 0x%04I64X\n tells printf to print out a 64 bit integer, which is 8 bytes, and 0x%04X\n tells it to print out a default-size integer, which might be 4 bytes on your system.
The width 04 specifies a minimum width. Since the value needs more digits, they are printed.
From a C Standard point of view, you cannot rely on a particular layout of bit fields. Hence, any solution will at best have implementation-defined behaviour.
That being said, your expected output can be obtained. The structure fits in 2 bytes and if you print sizeof(DatasetMGMTCMDSupport) it should give the result 2.
The byte representation of DatasetMGMTCMDSupport can be printed and that is what you were attempting, but since your system has integer size 4, two additional bytes are included. To fix this, the following can be done:
#include <stdint.h>
#include <string.h>
#include <stdio.h>
...
uint16_t a;
memcpy(&a, &DatasetMGMTCMDSupport, sizeof(a));
printf("0x%04X", (unsigned)a);
This copies the 2 bytes of DatasetMGMTCMDSupport into a 2-byte integer variable and prints the hexadecimal representation of those 2 bytes only. If you are on a little-endian system, you should see 0x8003.
A more general approach would be to directly print the bytes of DatasetMGMTCMDSupport:
for(unsigned i = 0; i < sizeof(DatasetMGMTCMDSupport); i++)
{
printf("%02X", (unsigned)((unsigned char *)&DatasetMGMTCMDSupport)[i]);
}
This will most likely print 0380 (notice the byte order: first byte printed first).
To reverse the byte order is straightforward:
for(unsigned i = 0; i < sizeof(DatasetMGMTCMDSupport); i++)
{
printf("%02X", (unsigned)((unsigned char *)&DatasetMGMTCMDSupport)[sizeof(DatasetMGMTCMDSupport)-1-i]);
}
which should give 8003.

Does memcpy copy bytes in reverse order?

I am little bit confused on usage of memcpy. I though memcpy can be used to copy chunks of binary data to address we desire. I was trying to implement a small logic to directyl convert 2 bytes of hex to 16 bit signed integer without using union.
#include <stdio.h>
#include <stdint.h>
#include <string.h>
int main()
{ uint8_t message[2] = {0xfd,0x58};
// int16_t roll = message[0]<<8;
// roll|=message[1];
int16_t roll = 0;
memcpy((void *)&roll,(void *)&message,2);
printf("%x",roll);
return 0;
}
This return 58fd instead of fd58
No, memcpy did not reverse the bytes as it copied them. That would be a strange and wrong thing for memcpy to do.
The reason the bytes seem to be in the "wrong" order in the program you wrote is that that's the order they're actually in! There's probably a canonical answer on this somewhere, but here's what you need to understand about byte order, or "endianness".
When you declare a string, it's laid out in memory just about exactly as you expect. Suppose I write this little code fragment:
#include <stdio.h>
char string[] = "Hello";
printf("address of string: %p\n", (void *)&string);
printf("address of 1st char: %p\n", (void *)&string[0]);
printf("address of 5th char: %p\n", (void *)&string[4]);
If I compile and run it, I get something like this:
address of string: 0xe90a49c2
address of 1st char: 0xe90a49c2
address of 5th char: 0xe90a49c6
This tells me that the bytes of the string are laid out in memory like this:
0xe90a49c2 H
0xe90a49c3 e
0xe90a49c4 l
0xe90a49c5 l
0xe90a49c6 o
0xe90a49c7 \0
Here I've shown the string vertically, but if we laid it out horizontally, with addresses increasing from left to right, we would see the characters of the string "Hello" laid out from left to right also, just as we would expect.
But that's for strings, which are arrays of char. But integers of various sizes are not really built out of characters, and it turns out that the individual bytes of an integer are not necessarily laid out in memory in "left-to-right" order as we might expect. In fact, on the vast majority of machines today, the bytes within an integer are laid out in the opposite order. Let's take a closer look at how that works.
Suppose I write this code:
int16_t i2 = 0x1234;
printf("address of short: %p\n", (void *)&i2);
unsigned char *p = &i2;
printf("%p: %02x\n", p, *p);
p++;
printf("%p: %02x\n", p, *p);
This initializes a 16-bit (or "short") integer to the hex value 0x1234, and then uses a pointer to print the two bytes of the integer in "left-to-right" order, that is, with the lower-addressed byte first, followed by the higher-addressed byte.
On my machine, the result is something like:
address of short: 0xe68c99c8
0xe68c99c8: 34
0xe68c99c9: 12
You can clearly see that the byte that's stored at the "front" of the two-byte region in memory is 34, followed by 12. The least-significant byte is stored first. This is referred to as "little endian" byte order, because the "little end" of the integer — its least-significant byte, or LSB — comes first.
Larger integers work the same way:
int32_t i4 = 0x5678abcd;
printf("address of long: %p\n", (void *)&i4);
p = &i4;
printf("%p: %02x\n", p, *p);
p++;
printf("%p: %02x\n", p, *p);
p++;
printf("%p: %02x\n", p, *p);
p++;
printf("%p: %02x\n", p, *p);
This prints:
address of long: 0xe68c99bc
0xe68c99bc: cd
0xe68c99bd: ab
0xe68c99be: 78
0xe68c99bf: 56
There are machines that lay the byes out in the other order, with the most-significant byte (MSB) first. Those are called "big endian" machines, but for reasons I won't go into they're not as popular.
How do you construct an integer value out of individual bytes if you don't know your machine's byte order? The best way is to do it "mathematically", based on the properties of the numbers. For example, let's go back to your original array of bytes:
uint8_t message[2] = {0xfd, 0x58};
Now, you know, because you wrote it, that 0xfd is supposed to be the MSB and 0xf8 is supposed to be the LSB. So one good way of combining them together into an integer is like this:
int16_t roll = message[0] << 8; /* MSB */
roll |= message[1]; /* LSB */
The nice thing about this code is that it works correctly on machines of either endianness. I called this technique "mathematical" because it's equivalent to doing it this other way:
int16_t roll = message[0] * 256; /* MSB */
roll += message[1]; /* LSB */
And, in fact, this suggestion of mine involving roll = message[0] << 8 is very close to something you already tried, but had commented out in the code you posted. The difference is that you don't want to think about it in terms of two bytes next to each other in memory; you want to think about it in terms of the most- and least-significant byte. When you say << 8, you're obviously thinking about the most-significant byte, so that should be message[0].
Does memcpy copy bytes in reverse order?
memcpy does not reverse the order bytes.
This return 58fd instead of fd58
Yes, your computer is little endian, so bytes 0xfd,0x58 in order are interpreted by your computer as the value 0x58fd.

declaring string using pointer to int

I am trying to initialize a string using pointer to int
#include <stdio.h>
int main()
{
int *ptr = "AAAA";
printf("%d\n",ptr[0]);
return 0;
}
the result of this code is 1094795585
could any body explain this behavior and why the code gave this answers ?
I am trying to initialize a string using pointer to int
The string literal "AAAA" is of type char[5], that is array of five elements of type char.
When you assign:
int *ptr = "AAAA";
you actually must use explicit cast (as types don't match):
int *ptr = (int *) "AAAA";
But, still it's potentially invalid, as int and char objects may have different alignment requirements. In other words:
alignof(char) != alignof(int)
may hold. Also, in this line:
printf("%d\n", ptr[0]);
you are invoking undefined behavior (so it might print "Hello from Mars" if compiler likes so), as ptr[0] dereferences ptr, thus violating strict aliasing rule.
Note that it is valid to make transition int * ---> char * and read object as char *, but not the opposite.
the result of this code is 1094795585
The result makes sense, but for that, you need to rewrite your program in valid form. It might look as:
#include <stdio.h>
#include <string.h>
union StringInt {
char s[sizeof("AAAA")];
int n[1];
};
int main(void)
{
union StringInt si;
strcpy(si.s, "AAAA");
printf("%d\n", si.n[0]);
return 0;
}
To decipher it, you need to make some assumptions, depending on your implementation. For instance, if
int type takes four bytes (i.e. sizeof(int) == 4)
CPU has little-endian byte ordering (though it's not really matter, since every letter is the same)
default character set is ASCII (the letter 'A' is represented as 0x41, that is 65 in decimal)
implementation uses two's complement representation of signed integers
then, you may deduce, that si.n[0] holds in memory:
0x41 0x41 0x41 0x41
that is in binary:
01000001 ...
The sign (most-significant) bit is unset, hence it is just equal to:
65 * 2^24 + 65 * 2^16 + 65 * 2^8 + 65 =
65 * (2^24 + 2^16 + 2^8 + 1) = 65 * 16843009 = 1094795585
1094795585 is correct.
'A' has the ASCII value 65, i.e. 0x41 in hexadecimal.
Four of them makes 0x41414141 which is equal to 1094795585 in decimal.
You got the value 65656565 by doing 65*100^0 + 65*100^1 + 65*100^2 + 65*100^3 but that's wrong since a byte1 can contain 256 different values, not 100.
So the correct calculation would be 65*256^0 + 65*256^1 + 65*256^2 + 65*256^3, which gives 1094795585.
It's easier to think of memory in hexadecimal because one hexadecimal digit directly corresponds to half a byte1, so two hex digits is one full byte1 (cf. 0x41). Whereas in decimal, 255 fits in a single byte1, but 256 does not.
1 assuming CHAR_BIT == 8
65656565 this is a wrong representation of the value of "AAAA" you are seprately representing each character and "AAAA" is stored as array.Its converting into 1094795585 because %d identifier prints decimal value. Run this in gdb with following command:
x/8xb (pointer) //this will show you the memory hex value
x/d (pointer) //this will show you the converted decimal value
#zenith gave you the answer you expected, but your code invokes UB. Anyway, you could demonstrate the same in an almost correct way :
#include <stdio.h>
int main()
{
int i, val;
char *pt = (char *) &val; // cast a pointer to any to a pointer to char : valid
for (i=0; i<sizeof(int); i++) pt[i] = 'A'; // assigning bytes of int : UB in general case
printf("%d 0x%x\n",val, val);
return 0;
}
Assigning bytes of an int is UB in the general case because C standard says that [for] signed integer types, the bits of the object representation shall be divided into three groups: value bits, padding bits, and the sign bit. And a remark adds Some combinations of padding bits might generate trap representations, for example, if one padding
bit is a parity bit.
But in common architectures, there are no padding bits and all bits values correspond to valid numbers, so the operation is valid (but implementation dependant) on all common systems. It is still implementation dependant because size of int is not fixed by standard, nor is endianness.
So : on a 32 bit system using no padding bits, above code will produce
1094795585 0x41414141
indepentantly of endianness.

Casting int pointer to char pointer causes loss of data in C?

I have the following piece of code:
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char *argv[])
{
int n = 260;
int *p = &n;
char *pp = (char*)p;
*pp = 0;
printf("n = %d\n", n);
system("PAUSE");
return 0;
}
The output put of the program is n = 256.
I may understand why it is, but I am not really sure.
Can anyone give me a clear explanation, please?
Thanks a lot.
The int 260 (= 256 * 1 + 4) will look like this in memory - note that this depends on the endianness of the machine - also, this is for a 32-bit (4 byte) int:
0x04 0x01 0x00 0x00
By using a char pointer, you point to the first byte and change it to 0x00, which changes the int to 256 (= 256 * 1 + 0).
You're apparently working on a little-endian machine. What's happening is that you're starting with an int that takes up at least two bytes. The value 260 is 256+4. The 256 goes in the second byte, and the 4 in the first byte. When you write 0 to the first byte, you're left with only the 256 in the second byte.
In C a pointer references a block of bytes based on the type associated with the pointer. So in your case the integer pointer refers to a block 4 bytes in size, while a char is only one byte long. When you set the char to 0 it only changes the first byte of the integer value, but because of how numbers are stored in memory on modern machines (effectively in reverse order from how you would write it) you are overwritting the least significant byte (which was 4) you are left w/ 256 as the value
I understood what exactly happens by changing value:
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char *argv[])
{
int n = 260;
int *p = &n;
char *pp = (char*)p;
*pp = 20;
printf("pp = %d\n", (int)*pp);
printf("n = %d\n", (int)n);
system("PAUSE");
return 0;
}
The output value are
20
and
276
So basically the problem is not that you have data loss, is that the char pointer points only to the first byte of the int and so it changes only that, the other bytes are not changed and that's why those weird value (if you are on an INTEL processor the first byte is the least significant, that's why you change the "smallest" part of the number
Your problem is the assignment
*pp = 0;
You're dereferencing pp which points to n, and changing n.
However, pp is a char pointer so it doesn't change all of n
which is an int. This causes the binary complications in the other answers.
In terms of the C language, the description for what you are doing is modifying the representation of the int variable n. In C, all types have a "representation" as one or more bytes (unsigned char), and it's legal to access the underlying representation by casting a pointer to char * or unsigned char * - the latter is better for reasons that would just unnecessarily complicate things if I went into them here.
As schnaader answered, on a little endian, twos complement implementation with 32-bit int, the representation of 260 is:
0x04 0x01 0x00 0x00
and overwriting the first byte with 0 yields:
0x00 0x01 0x00 0x00
which is the representation for 256 on such an implementation.
C allows implementations which have padding bits and trap representations (which raise a signal/abort your program if they're accessed), so in general overwriting part but not all of an int in this way is not safe to do. Nonetheless, it does work on most real-world machines, and if you instead used the type uint32_t, it would be guaranteed to work (although the ordering of the bits would still be implementation-dependent).
Considering 32 bit systems,
256 will be represented in like this.
00000000 (Byte-3) 00000000 (Byte-2) 00000001(Byte-1) 00000100(Byte-0)
Now when p is typecast-ed to a char pointer, the label on the pointer changes, but the memory contents don't. It means earlier p could have access 4 bytes, as it was an integer pointer, but now it can only access 1 byte as it is a char pointer. So, only the LSB gets changes to zero, not all the 4 bytes.
And it becomes
00000000 (Byte-3) 00000000 (Byte-2) 00000001(Byte-1) 00000000(Byte-0)
Hence, the o/p is 256.

Storing an unsigned integer into a struct

I've written this piece of code where I've assigned an unsigned integer to two different structs. In fact they're the same but one of them has the __attribute__((packed)).
#include
#include
struct st1{
unsigned char opcode[3];
unsigned int target;
}__attribute__((packed));
struct st2{
unsigned char opcode[3];
unsigned int target;
};
void proc(void* addr) {
struct st1* varst1 = (struct st1*)addr;
struct st2* varst2 = (struct st2*)addr;
printf("opcode in varst1: %c,%c, %c\n",varst1->opcode[0],varst1->opcode[1],varst1->opcode[2]);
printf("opcode in varst2: %c,%c,%c\n",varst2->opcode[0],varst2->opcode[1],varst2->opcode[2]);
printf("target in varst1: %d\n",varst1->target);
printf("target in varst2: %d\n",varst2->target);
};
int main(int argc,char* argv[]) {
unsigned int* var;
var =(unsigned int*) malloc(sizeof(unsigned int));
*var = 0x11334433;
proc((void*)var);
return 0;
}
The output is:
opcode in varst1: 3,D,3
opcode in varst2: 3,D,3
target in varst1: 17
target in varst2: 0
Given that I'm storing this number
0x11334433 == 00010001001100110100010000110011
I'd like to know why that is the output I get.
This is to do with data alignment. Most compilers will align data on address boundaries that help with general performance. So, in the first case, the struct with the packed attribute, there is an extra byte between the char [3] and the int to align the int on a four byte boundary. In the packed version that padding byte is missing.
byte : 0 1 2 3 4 5 6 7
st1 : opcode[0] opcode[1] opcode[2] padding |----int------|
st2 : opcode[0] opcode[1] opcode[2] |-------int--------|
You allocate an unsigned int and pass that to the function:
byte : 0 1 2 3 4 5 6 7
alloc : |-----------int------------------| |---unallocated---|
st1 : opcode[0] opcode[1] opcode[2] padding |----int------|
st2 : opcode[0] opcode[1] opcode[2] |-------int--------|
If you're using a little endian system then the lowest eight bits (right most) are stored at byte 0 (0x33), byte 1 has 0x44, byte 2 has 0x33 and byte 4 has 0x11. In the st1 structure the int value is mapped to memory beyond the end of the allocated amount and the st2 version the lowest byte of the int is mapped to the byte 4, 0x11. So st1 produces 0 and st2 produces 0x11.
You are lucky that the unallocated memory is zero and that you have no memory range checking going on. Writing to the ints in st1 and st2 in this case could corrupt memory at worst, generate memory guard errors or do nothing. It is undefined and dependant on the runtime implementation of the memory manager.
In general, avoid void *.
Your bytes look like this:
00010001 00110011 01000100 00110011
Though obviously your endianness is wrong and in fact they're like this:
00110011 01000100 00110011 00010001
If your struct is packed then the first three bytes are associated with opcode, and the 4th is target - thats why the packed array has atarget of 17 - 0001001 in binary.
The unpacked array is padded with zeros, which is why target in varst2 is zero.
%c interprets the argument as the ascii code of a character and prints the character
3's ascii code is 0x33
D's ascii code is 0x44
17 is 0x11
an int is stored little endian or big endian depending on the processor architecture -- you can't depend on it going into your struct's fields in order.
The int target in the unpacked version is past the position of the int, so it stays 0.

Resources