Why casting short* to int* shows incorrect value - c

To better learn how malloc and pointers work internally, I created an array of short. On my system, int is double the size of short, so I created another pointer q of type int* and set it to the cast value of p:
#include <stdio.h>
#include <stdlib.h>
#include <assert.h>

int main() {
    short* p = (short*) malloc(2 * sizeof(short));
    int* q = (int*) p;
    assert(sizeof *q == 2 * sizeof *p);
    p[0] = 0;
    p[1] = 1;
    printf("%u\n", *q);
}
When I print *q it shows the number 65536 instead of 1 and I can't figure out why. If I understand correctly, p should be represented as the following (assuming short is 2 bytes and int is 4 bytes):
        p[0]                  p[1]
0000 0000 0000 0000 | 0000 0000 0000 0001
So *q should read 4 bytes hence reading the value 1. Instead it shows 65536 which is represented as:
0000 0000 0000 0001 0000 0000 0000 0000

Most systems you're likely to interact with these days use little-endian byte ordering, which means that the least significant byte comes first.
So the bytes starting at p[1] contain 0x01 0x00, not 0x00 0x01. This also means the bytes starting at p[0] are 0x00 0x00 0x01 0x00. If these bytes are then interpreted as a 4-byte int, it has the value 0x00010000, i.e. 65536 decimal.
Also, reinterpreting bytes in this fashion (i.e. taking a pointer to one type, casting it to another pointer type, and dereferencing), is an aliasing violation and triggers undefined behavior, so there is no guarantee this will always work in this way.
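If the goal is just to look at how those two shorts sit in memory as one int, a strict-aliasing-safe alternative is to copy the bytes with memcpy instead of dereferencing the cast pointer. A minimal sketch, assuming a 2-byte short and a 4-byte int as in the question:

#include <stdio.h>
#include <string.h>

int main(void) {
    short p[2] = {0, 1};
    int q;

    /* Copy the object representation of the two shorts into an int
       instead of dereferencing an int* that aliases short storage. */
    memcpy(&q, p, sizeof q);

    /* On a little-endian machine this still prints 65536: memcpy only
       avoids the undefined behavior, it does not change the byte order. */
    printf("%d\n", q);
    return 0;
}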

This is due to endianness (https://en.wikipedia.org/wiki/Endianness).
This determines which byte comes first in memory. Therefore, if you flip the bytes in your representation, you get exactly what you provided as the representation for 65536.
You seem to be on a little endian machine.
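One quick way to see the byte order on your own machine is to dump the individual bytes through an unsigned char pointer (byte-wise reads through unsigned char* are permitted by the aliasing rules). A small sketch along the lines of the question's setup:

#include <stdio.h>
#include <stdlib.h>

int main(void) {
    short *p = malloc(2 * sizeof *p);
    p[0] = 0;
    p[1] = 1;

    /* unsigned char* may alias any object, so a byte dump is well defined */
    unsigned char *bytes = (unsigned char *)p;
    for (size_t i = 0; i < 2 * sizeof *p; i++)
        printf("%02X ", bytes[i]);
    printf("\n");   /* "00 00 01 00" on a little-endian machine with 2-byte short */

    free(p);
    return 0;
}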

Related

How can I treat a short integer as an array of elements in C?

So I have an integer stored as a short. Let's say:
short i = 3000;
Which in binary is:
0011 0000 0000 0000
I was told I can treat it as an array of two elements where each element is a byte basically, so:
i[0] = 0011 0000
i[1] = 0000 0000
How can I accomplish this?
You could do it like this (assuming short is 2 bytes):
#include <string.h>   /* for memcpy */

short i = 3000;             // 3000 in binary is: 00001011 10111000
unsigned char x[2] = {0};
memcpy(x, &i, 2);           // copy the two bytes of i into the array
Now x[0] will be 10111000 and x[1] will be 00001011 if this code runs on a little-endian machine, and the reverse will hold on a big-endian machine.
Btw, your binary representation of 3000 looks wrong.
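For completeness, here is a minimal self-contained sketch of the same idea that also prints the two bytes, so you can see which order your machine uses (the exact output depends on endianness):

#include <stdio.h>
#include <string.h>

int main(void) {
    short i = 3000;                  /* 0x0BB8: high byte 0x0B, low byte 0xB8 */
    unsigned char x[sizeof i] = {0};

    memcpy(x, &i, sizeof i);         /* copy the object representation of i */

    /* Little endian prints "B8 0B", big endian prints "0B B8" */
    printf("%02X %02X\n", x[0], x[1]);
    return 0;
}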

Initialization of a union in C

I came across this objective question on the C programming language. The output for the following code is supposed to be 0 2, but I don't understand why.
Please explain the initialization process. Here's the code:
#include <stdio.h>
int main()
{
    union a
    {
        int x;
        char y[2];
    };
    union a z = {512};
    printf("\n%d %d", z.y[0], z.y[1]);
    return 0;
}
I am going to assume that you are using a little-endian system where sizeof(int) is 4 bytes (32 bits) and sizeof(char) is 1 byte (8 bits), and one in which integers are represented in two's complement form. A union only has the size of its largest member, and all the members share that same piece of memory.
Now, you are writing to this memory an integer value of 512.
512 in binary is 1000000000.
or in 32 bit two's complement form:
00000000 00000000 00000010 00000000.
Now convert this to its little endian representation and you'll get:
00000000 00000010 00000000 00000000
|______| |______|
    |        |
  y[0]     y[1]
Now see the above what happens when you access it using indices of a char array.
Thus, y[0] is 00000000 which is 0,
and y[1] is 00000010 which is 2.
The memory allocated for the union is the size of the largest type in the union, which is int in this case. Let's say the size of int on your system is 2 bytes; then
512 will be 0x200.
Representation looks like:
 0000 0010   0000 0000
|_________| |_________|
   Byte 1      Byte 0
So the first byte is 0 and the second one is 2 (on little-endian systems).
char is one byte on all systems.
So the accesses z.y[0] and z.y[1] are per-byte accesses.
z.y[0] = 0000 0000 = 0
z.y[1] = 0000 0010 = 2
I am just showing you how the memory is allocated and how the value is stored. You need to consider the points below, since the output depends on them.
Points to be noted:
The output is completely system dependent.
The endianness and sizeof(int) matter, and they vary across systems.
PS: In a union, all members occupy the same memory.
The standard says that
6.2.5 Types:
A union type describes an overlapping nonempty set of member objects, each of which has an optionally specified name and possibly distinct type.
The compiler allocates only enough space for the largest of the members, which overlay each other within this space. In your case, memory is allocated for the int member (assuming 4 bytes). The line
union a z = {512};
will initialize the first member of union z, i.e. x becomes 512. In binary it is represented as 0000 0000 0000 0000 0000 0010 0000 0000 on a 32-bit machine.
Memory representation for this would depend on the machine architecture. On a 32-bit machine it either will be like (store the least significant byte in the smallest address-- Little Endian)
Address   Value
0x1000    0000 0000
0x1001    0000 0010
0x1002    0000 0000
0x1003    0000 0000
or like (store the most significant byte in the smallest address -- Big Endian)
Address   Value
0x1000    0000 0000
0x1001    0000 0000
0x1002    0000 0010
0x1003    0000 0000
z.y[0] will access the content at address 0x1000 and z.y[1] will access the content at address 0x1001, and those contents depend on the representation above.
It seems that your machine uses the little-endian representation, therefore z.y[0] = 0 and z.y[1] = 2 and the output is 0 2.
But, you should note that footnote 95 of section 6.5.2.3 states that
If the member used to read the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called ‘‘type punning’’). This might be a trap representation.
The size of the union is the size needed to hold its largest member, so here it is the size of int.
Assuming 4 bytes per int and 1 byte per char, we can say: sizeof(union a) = 4 bytes.
Now, let's see how it is actually stored in memory:
For example, say an instance of the union, z, is stored at addresses 2000-2003 on a little-endian machine:
2000 -> last (4th / least significant / rightmost) byte of int x, and y[0]
2001 -> 3rd byte of int x, and y[1]
2002 -> 2nd byte of int x
2003 -> 1st byte of int x (most significant)
Now, when you set z.x = 512 (which is what the initializer {512} does):
since z = 0x00000200,
M[2000] = 0x00
M[2001] = 0x02
M[2002] = 0x00
M[2003] = 0x00
So, when you print y[0] and y[1], it prints the data at M[2000] and M[2001], which is 0 and 2 in decimal respectively.
For an automatic (non-static) union object, this initialization is equivalent to the assignment:
union a z;
z.x = 512;
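If you want to see all of the union's bytes rather than just the first two chars, a small self-contained sketch like the one below dumps the whole object through an unsigned char pointer (the exact digits depend on your sizeof(int) and byte order):

#include <stdio.h>

union a {
    int  x;
    char y[2];
};

int main(void) {
    union a z = {512};                   /* initializes the first member, x */
    unsigned char *bytes = (unsigned char *)&z;

    /* Dump every byte of the union object; with a 4-byte little-endian
       int this prints "00 02 00 00". */
    for (size_t i = 0; i < sizeof z; i++)
        printf("%02X ", bytes[i]);
    printf("\n");

    printf("%d %d\n", z.y[0], z.y[1]);   /* 0 2 on that same system */
    return 0;
}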

how to extract lower bytes from a Hex value?

I am making a program to communicate with a serial device. The device gives me the data in hex format. The hex value I am getting is FFFFFF84, but I am only interested in extracting the last two hex digits (the low byte), that is 84. So how can I extract it?
while (1)
{
    int i;
    char receivebuffer[1];
    read(fd, receivebuffer, sizeof receivebuffer);
    for (i = 0; i < sizeof(receivebuffer); i++)
    {
        printf("value of buffer is %X\n\n", (char)receivebuffer[i]);
    }
    return 0;
}
I am getting the data in receivebuffer. Please help thanks.
You want to extract the last byte? You need the bitwise AND operator '&' to extract it:
FFFFFF84 -> 1111 1111 1111 1111 1111 1111 1000 0100
000000FF -> 0000 0000 0000 0000 0000 0000 1111 1111
---------------------------------------------------
after & -> 0000 0000 0000 0000 0000 0000 1000 0100
So the answer is this assignment:
last2 = input & 0xFF;
Hope this answer helps you understand bit operations.
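As a minimal self-contained illustration of the masking (the hard-coded value stands in for the serial data and is only an assumption for the example):

#include <stdio.h>

int main(void) {
    unsigned int input = 0xFFFFFF84;      /* stand-in for the device data */
    unsigned int last2 = input & 0xFF;    /* keep only the lowest 8 bits */

    printf("%X\n", last2);                /* prints 84 */
    return 0;
}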
You're just confused because printf is printing your data as a sign-extended int (this means that char on your system is signed; note that the signedness of plain char is implementation-defined).
Change your printf to:
printf("value of buffer is %#X\n\n", (unsigned char)receivebuffer[i]);
or just make the type of receivebuffer unsigned:
unsigned char receivebuffer[1];
// ...
printf("value of buffer is %#X\n\n", receivebuffer[i]);
Device is giving me the data in hex format.
This contradicts your code. It seems the device gives you the data in binary (raw) format and you convert it to hex for printing. That is a huge difference.
If you do
printf("value of buffer is %X\n\n", (char)receivebuffer[i]);
the char (whose cast is unnecessary, as it is already a char) gets converted to int. Since char is signed on your system, the resulting int is negative, and thus the FFF... at the start.
You can do any of
printf("value of buffer is %X\n\n", receivebuffer[i] & 0xFF);
printf("value of buffer is %X\n\n", (unsigned char)receivebuffer[i]);
printf("value of buffer is %X\n\n", (uint8_t)receivebuffer[i]);
I know this is an old topic but I just want to add another option:
printf("value of buffer is %hhx\n\n", receivebuffer[i]);
%hhx translates to a "short short hex" value, or in other words, an 8-bit hex value.
The device just returns bytes. It is the printf which displays a byte in a certain format (decimal, hex, etc.). To show bytes in hex you should use "0x%02x" format.
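As a small sketch of that last point (the buffer contents here are made up just for the example):

#include <stdio.h>

int main(void) {
    /* pretend these bytes just arrived from the device */
    unsigned char receivebuffer[4] = {0x84, 0x00, 0xFF, 0x12};

    for (size_t i = 0; i < sizeof receivebuffer; i++)
        printf("0x%02x ", receivebuffer[i]);   /* prints 0x84 0x00 0xff 0x12 */
    printf("\n");
    return 0;
}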

using C Pointer with char array

int i=512;
char *c = (char *)&i;
c[0] =1;
printf("%d",i);
this displays "513", it adds 1 to i.
int i=512;
char *c = (char *)&i;
c[1] =1;
printf("%d",i);
whereas this displays 256. Divides it by 2.
Can someone please explain why? thanks a lot
Binary
The 32-bit number 512, expressed in binary, is just:
00000000000000000000001000000000
because 2 to the power of 9 is 512. Conventionally, you read the bits from right-to-left.
Here are some other decimal numbers in binary:
0001 = 1
0010 = 2
0011 = 3
0100 = 4
The Cast: Reinterpreting the Int as an Array of Bytes
When you do this:
int i = 512;
char *c = (char *)&i;
you are interpreting the 4-byte integer as an array of characters (8-bit bytes), as you probably know. If not, here's what's going on:
&i
takes the address of the variable i.
(char *)&i
reinterprets it (or casts it) to a pointer to char type. This means it can now be used like an array. Since you know an int is at least 32-bit on your machine, you can access its bytes using c[0], c[1], c[2], c[3].
Depending on the endianness of the system, the bytes of the number might be laid out: most significant byte first (big endian), or least significant byte first (little endian). x86 processors are little endian. This basically means the number 512 is laid out as in the example above, i.e.:
00000000 00000000 00000010 00000000
  c[3]     c[2]     c[1]     c[0]
I've grouped the bits into separate 8-bit chunks (bytes) corresponding to the way they are laid out in memory. Note, you also read them right-to-left here, so we can keep with conventions for the binary number system.
Consequences
Now setting c[0] = 1 has this effect:
00000000 00000000 00000010 00000001
  c[3]     c[2]     c[1]     c[0]
which is 2^9 + 2^0 == 513 in decimal.
Setting c[1] = 1 has this effect:
00000000 00000000 00000001 00000000
  c[3]     c[2]     c[1]     c[0]
which is 2^8 == 256 in decimal, because you've overwritten the second byte 00000010 with 00000001.
Do note that on a big-endian system the bytes would be stored in the reverse order to a little-endian system. This would mean you'd get totally different results from the ones above if you ran the code on one of those machines.
Remember that char is 8 bits. The bit representation of 512 is
512 = 10 0000 0000
When you do char *c = (char *)&i;, you get (on your little-endian machine):
c[1] = 0000 0010
c[0] = 0000 0000
When you do c[0] = 1, you make it 10 0000 0001, which is 513.
When you do c[1] = 1, you make it 01 0000 0000, which is 256.
Before you wonder why what you're seeing is "odd", consider the platform you're running your code on, and the endianness therein.
Then consider the following
#include <stdio.h>

int main(int argc, char *argv[])
{
    int i = 512;
    printf("%d : ", i);
    unsigned char *p = (unsigned char *)&i;
    for (size_t j = 0; j < sizeof(i); j++)
        printf("%02X", p[j]);
    printf("\n");

    char *c = (char *)&i;
    c[0] = 1;
    printf("%d : ", i);
    for (size_t j = 0; j < sizeof(i); j++)
        printf("%02X", p[j]);
    printf("\n");

    i = 512;
    c[1] = 1;
    printf("%d : ", i);
    for (size_t j = 0; j < sizeof(i); j++)
        printf("%02X", p[j]);
    printf("\n");

    return 0;
}
On my platform (Macbook Air, OS X 10.8, Intel x64 Arch)
512 : 00020000
513 : 01020000
256 : 00010000
Couple what you see above with what you have hopefully read about endianness, and you can clearly see my platform is little endian. So what's yours?
Since you are aliasing an int through a char pointer, and a char is 8 bits wide (a byte), the assignment:
c[1] = 1;
will set the second byte of i to 00000001. Bytes 1, 3 and 4 (if sizeof(int) == 4) will stay unmodified. Previously, that second byte was 00000010 (since I assume you're on an x86-based computer, which is a little-endian architecture), so basically you shifted the only bit that was set one position to the right. That's a division by 2.
On a little-endian machine and a compiler with 32-bit int, you originally had these four bytes in i:
  c[0]     c[1]     c[2]     c[3]
00000000 00000010 00000000 00000000
After the assignment, i was set to:
  c[0]     c[1]     c[2]     c[3]
00000000 00000001 00000000 00000000
and therefore it went from 512 to 256.
Now you should understand why c[0] = 1 results in 513 :-) Think about which byte is set to 1 and that the assignment doesn't change the other bytes at all.
It's because your machine is little endian, meaning the least-significant byte is stored first in memory.
You said int i=512;. 512 is 0x00000200 in hex (assuming a 32-bit OS for simplicity). Let's look at how i would be stored in memory as hexadecimal bytes:
00 02 00 00 // 4 bytes, least-significant byte first
Now we interpret that same memory location as a character array by doing char *c = (char *)&i; - same memory, different interpretation:
 00   02   00   00
c[0] c[1] c[2] c[3]
Now we change c[0] with c[0] =1; and the memory looks like
01 02 00 00
Which means if we look at it as a little endian int again (by doing printf("%d",i);), it's hex 0x00000201, which is 513 decimal.
Now if we go back and change c[1] with c[1] =1;, your memory now becomes:
00 01 00 00
Now we go back and interpret it as a little endian int, it's hex 0x00000100, which is 256 decimal.
Whether the data is stored little endian or big endian depends on the machine; the C language doesn't guarantee either byte order. For more, read about endianness (https://en.wikipedia.org/wiki/Endianness).
512 in binary:
=============================================
0000 0000 | 0000 0000 | 0000 0010 | 0000 0000  ==> 512
=============================================
    12          34          56          78
(0x12345678 is a made-up address for this int; the digit pairs above just label the four byte positions)
After char *c = (char *)&i;, c[0] points either at the byte labelled 0x78 or at the one labelled 0x12, depending on endianness.
Modifying the value using c[0] results in 513 if it points at 0x78:
=============================================
0000 0000 | 0000 0000 | 0000 0010 | 0000 0001  ==> 513
=============================================
or it can be:
=============================================
0000 0001 | 0000 0000 | 0000 0010 | 0000 0000  ==> 2^24 + 512
=============================================
Similarly for 256: your c[1] will hold the address of the 2nd byte from the right, as in the figure below:
=============================================
0000 0000 | 0000 0000 | 0000 0001 | 0000 0000  ==> 256
=============================================
So it comes down to how your system represents and stores numbers.

int to char casting

int i = 259; /* 03010000 in Little Endian ; 00000103 in Big Endian */
char c = (char)i; /* returns 03 in both Little and Big Endian?? */
On my computer it assigns 03 to char c, and I have little endian, but I don't know whether the char cast reads the least significant byte or the byte at i's address (the first byte in memory).
Endianness doesn't actually change anything here. The conversion operates on the value of i, not on one of its stored bytes (MSB, LSB, etc.).
If char is unsigned it will wrap around. Assuming an 8-bit char, 259 % 256 = 3.
If char is signed the result is implementation defined. Thank you pmg: 6.3.1.3/3 in the C99 Standard
Since you're casting from a larger integer type to a smaller one, it takes the least significant part regardless of endianness. If you were casting pointers instead, though, it would take the byte at the address, which would depend on endianness.
So c = (char)i assigns the least-significant byte to c, but c = *((char *)(&i)) would assign the first byte at the address of i to c, which would be the same thing on little-endian systems only.
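A small sketch of the difference described above (the variable names are just for illustration; the pointer version is shown only to contrast, since it is what depends on endianness):

#include <stdio.h>

int main(void) {
    int i = 259;                      /* 0x00000103 */

    char by_value  = (char)i;         /* value conversion: takes the least significant byte, 3 */
    char by_memory = *((char *)&i);   /* first byte in memory: 3 on little endian, 0 on big endian */

    printf("%d %d\n", by_value, by_memory);
    return 0;
}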
If you want to test for little/big endian, you can use a union:
int isBigEndian(void)
{
    union foo {
        size_t i;
        char cp[sizeof(size_t)];
    } u;
    u.i = 1;
    return *u.cp != 1;
}
It works because in little endian, it would look like 01 00 ... 00, but in big endian, it would be 00 ... 00 01 (the ... is made up of zeros). So if the first byte is 0, the test returns true. Otherwise it returns false. Beware, however, that there also exist mixed endian machines that store data differently (some can switch endianness; others just store the data differently). The PDP-11 stored a 32-bit int as two 16-bit words, except the order of the words was reversed (e.g. 0x01234567 was 4567 0123).
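For completeness, a self-contained sketch that calls the test (the function is repeated here so the snippet compiles on its own):

#include <stdio.h>
#include <stddef.h>   /* size_t */

int isBigEndian(void)
{
    union foo {
        size_t i;
        char cp[sizeof(size_t)];
    } u;
    u.i = 1;
    return *u.cp != 1;   /* the first byte is 0 only on a big-endian machine */
}

int main(void) {
    printf("%s endian\n", isBigEndian() ? "big" : "little");
    return 0;
}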
When casting from int (e.g. 4 bytes) to char (1 byte), only the lowest byte of the value is preserved.
Eg:
int x = 0x3F1;                        // 0x3F1 = 0000 0011 1111 0001
char y = (char)x;                     // 1111 0001 --> -15 in decimal (two's complement, signed char)
unsigned char z = (unsigned char)x;   // 1111 0001 --> 241 in decimal
