cast char array to integer - c

#include <stdio.h>

int main() {
    unsigned char a[4] = {1, 2, 3, 4};
    int b = *(int *)&a[0];
    printf("%d\n", b);
    return 0;
}
I just cannot understand why the value of b comes out as 0x04030201.
Could someone help me out?

When you tell the compiler to create an array like this:
unsigned char a[4] = {1, 2, 3, 4};
These numbers are put somewhere in memory in following order:
MemoryAddress0: 0x01 -> a[0]
MemoryAddress1: 0x02 -> a[1]
MemoryAddress2: 0x03 -> a[2]
MemoryAddress3: 0x04 -> a[3]
&a[0] is a char pointer holding MemoryAddress0; it points to a 1-byte value of 0x01.
(int*)&a[0] is the same address, but cast to int*, so it now refers to four consecutive bytes.
Most machines we use in our daily lives are little-endian, which means they store multi-byte values in memory from the least significant byte to the most significant one.
When an int* points to a memory area of four bytes, the first byte it encounters is the least significant byte, the second byte is the second least significant, and so on.
MemoryAddress0: 0x01 -> 2^0 term
MemoryAddress1: 0x02 -> 2^8 term
MemoryAddress2: 0x03 -> 2^16 term
MemoryAddress3: 0x04 -> 2^24 term
Thus the 4-byte integer value becomes 0x01*2^0 + 0x02*2^8 + 0x03*2^16 + 0x04*2^24 which is equal to 0x04030201.
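If you want to build that same value explicitly, independently of the pointer cast, you can assemble it from the bytes with shifts. Here is a minimal sketch (my own illustration, assuming a 4-byte int as in the question):
#include <stdio.h>

int main(void) {
    unsigned char a[4] = {1, 2, 3, 4};
    /* Treat a[0] as the least significant byte, which is what the cast
       does on a little-endian machine; the shifts make that explicit. */
    unsigned int b = (unsigned int)a[0]
                   | ((unsigned int)a[1] << 8)
                   | ((unsigned int)a[2] << 16)
                   | ((unsigned int)a[3] << 24);
    printf("0x%08X\n", b);   /* prints 0x04030201 regardless of endianness */
    return 0;
}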

You are on a little-endian machine, which means that integers larger than a byte store their least significant byte first.
Note that most architectures these days are little-endian, thanks to the ubiquity of x86.

Because your system is little-endian: the first byte of a multi-byte integer is interpreted as the least significant byte on little-endian systems.

Related

Pointers in C with typecasting

#include <stdio.h>

int main()
{
    int a;
    char *x;
    x = (char *) &a;
    a = 512;
    x[0] = 1;
    x[1] = 2;
    printf("%d\n", a);
    return 0;
}
I'm not able to grasp how the output is 513, or why it is machine-dependent. I can sense that the typecast is playing a major role, but what is happening behind the scenes? Can someone help me visualise this problem?
The int a is stored in memory as 4 bytes. The number 512 is represented on your machine as:
0 2 0 0
When you assign to x[0] and x[1], it changes this to:
1 2 0 0
which is the number 513.
This is machine-dependent, because the order of bytes in a multi-byte number is not specified by the C language.
To simplify, assume the following:
size of int is 4 (in bytes)
size of any pointer type is 8
size of char is 1 byte
After x = (char *) &a;, x refers to a as if it were a char: x behaves as if it points to a char and has no idea that a is actually an int.
The line a = 512; is only there to confuse you. Don't let it.
Since x is treated as a pointer to char, x[0] = 1 changes just the first byte of a.
Likewise, x[1] = 2 changes just the second byte of a.
Note that these two assignments override part of the value stored by a = 512;.
The value of a is now 0...0000 0010 0000 0001 (513).
Now when we print a as an int, all 4 bytes are read, as expected.
Let me try to break this down for you in addition to the previous answers:
#include <stdio.h>

int main()
{
    int a;              // declares an integer called a
    char *x;            // declares a pointer to char called x
    x = (char *) &a;    // points x at the first byte of a
    a = 512;            // writes 512 to the int variable
    x[0] = 1;           // writes 1 to the first byte
    x[1] = 2;           // writes 2 to the second byte
    printf("%d\n", a);  // prints the integer
    return 0;
}
Note that I wrote first byte and second byte. Depending on the byte order of your platform and the size of an integer you might not get the same results.
Let's look at the memory layout for 32-bit (4-byte) integers:
Little endian systems
first byte | second byte | third byte | fourth byte
   0x00    |    0x02     |    0x00    |    0x00
Now assigning 1 to the first byte and 2 to the second one leaves us with this:
first byte | second byte | third byte | fourth byte
   0x01    |    0x02     |    0x00    |    0x00
Notice that the first byte gets changed to 0x01 while the second was already 0x02.
This new number in memory is equivalent to 513 on little endian systems.
Big endian systems
Let's look at what would happen if you were trying this on a big-endian platform:
first byte | second byte | third byte | fourth byte
   0x00    |    0x00     |    0x02    |    0x00
This time assigning 1 to the first byte and 2 to the second one leaves us with this:
first byte | second byte | third byte | fourth byte
   0x01    |    0x02     |    0x02    |    0x00
Which is equivalent to 16,908,800 as an integer.
I'm not able to grasp the fact that how the output is 513 or even Machine dependent
The output is implementation-defined. It depends on the order of bytes in CPU's interpretation of integers, commonly known as endianness.
I can sense that typecasting is playing a major role
The code reinterprets the value of a, which is an int, as an array of bytes. It uses two initial bytes, which is guaranteed to work, because an int is at least two bytes in size.
Can someone help me visualise this problem?
An int consists of multiple bytes. They can be addressed as one unit that represents an integer, but they can also be addressed as a collection of bytes. The value of an int depends on the number of bytes that you set, and on the order of these bytes in CPU's interpretation of integers.
It looks like your system stores the least significant byte at the lowest address, so storing 1 and 2 at offsets zero and one produces this layout:
Byte 0   Byte 1   Byte 2   Byte 3
------   ------   ------   ------
  1        2        0        0
The integer value can then be computed as follows:
1 + 2*256 + 0*65536 + 0*16777216
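To see this byte layout for yourself, here is a small sketch (my own, not part of the original answer) that dumps the representation of a after the two assignments:
#include <stdio.h>

int main(void)
{
    int a = 512;
    unsigned char *x = (unsigned char *) &a;  /* view a as raw bytes */
    x[0] = 1;
    x[1] = 2;
    /* Print each byte of a's representation, lowest address first. */
    for (size_t i = 0; i < sizeof a; i++)
        printf("byte %zu: %d\n", i, x[i]);
    printf("a = %d\n", a);  /* 513 on a little-endian machine */
    return 0;
}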
By taking x, which is a char *, and pointing it to the address of a, which is an int, you can use x to modify the individual bytes that represent a.
The output you're seeing suggests that an int is stored in little-endian format, meaning the least significant byte comes first. This can change, however, if you run this code on a different system (e.g., a Sun SPARC machine, which is big-endian).
You first set a to 512. In hex, that's 0x200. So the memory for a, assuming a 32-bit int in little-endian format, is laid out as follows:
-----------------------------
| 0x00 | 0x02 | 0x00 | 0x00 |
-----------------------------
Next you set x[0] to 1, which updates the first byte in the representation of a:
-----------------------------
| 0x01 | 0x02 | 0x00 | 0x00 |
-----------------------------
Then you set x[1] to 2, which updates the second byte in the representation of a (in this case leaving it unchanged, since it was already 0x02):
-----------------------------
| 0x01 | 0x02 | 0x00 | 0x00 |
-----------------------------
Now a has a value of 0x201, which in decimal is 513.
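As a side note, you can get the same byte-level effect without aliasing the int through a char pointer by copying the representation with memcpy. A minimal sketch of that alternative (my own illustration, not the code from the question):
#include <stdio.h>
#include <string.h>

int main(void)
{
    int a = 512;
    unsigned char bytes[sizeof a];

    memcpy(bytes, &a, sizeof a);   /* copy a's representation out */
    bytes[0] = 1;                  /* patch the first two bytes    */
    bytes[1] = 2;
    memcpy(&a, bytes, sizeof a);   /* copy the representation back */

    printf("%d\n", a);             /* 513 on a little-endian machine */
    return 0;
}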

c, hex representation in big-endian system

What is the result of:
int x = 0x00000001;
int y = 0x80000000;
in a big-endian system?
My goal is to define an int that has the first (in memory) bit set, regardless of whether it is the most significant one or the least significant one. I know that with little-endian systems, x would satisfy this requirement, but is it still true in a big-endian system?
I'm pretty sure that the following will work in both systems:
char c[4] = {0x80, 0, 0, 0};
int x = (int) c;
Is that correct? Is there a more elegant method?
(I don't have a big-endian system to experiment on)
What you probably want is this:
int x = 0;
char* p = (char*)&x;
p[0] = 0x01;
The above code will set the least significant bit in the lowest-address byte of an int variable to 1:
On a Big-Endian processor, it will set the LS-bit in the MS-byte to 1 (i.e., x == 0x01000000).
On a Little-Endian processor, it will set the LS-bit in the LS-byte to 1 (i.e., x == 0x00000001).
Having said that, what is your definition of "the first bit"? IMHO it is simply the least significant one, in which case, int x = 0x00000001 is the answer regardless of the Endianness of your processor!!
The following terminology might help you to understand a little better:
Set the least significant bit in an 8-bit byte: 0x01
Set the most significant bit in an 8-bit byte: 0x80
Set the least significant byte in a 4-byte integer: 0x000000FF
Set the most significant byte in a 4-byte integer: 0xFF000000
Set the lowest-address byte in a 4-byte integer on a LE processor: 0x000000FF
Set the lowest-address byte in a 4-byte integer on a BE processor: 0xFF000000
Set the highest-address byte in a 4-byte integer on a LE processor: 0xFF000000
Set the highest-address byte in a 4-byte integer on a BE processor: 0x000000FF
You can try unions:
#include <stdint.h>

union foo
{
    char array[8];
    int64_t num;
};
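As a usage sketch (my own illustration, not part of the original answer), a union like this also lets you check at run time which byte of an integer sits at the lowest address, i.e. detect endianness:
#include <stdio.h>
#include <stdint.h>

union probe {
    unsigned char bytes[sizeof(uint32_t)];
    uint32_t value;
};

int main(void)
{
    union probe p;
    p.value = 0x01020304u;
    /* On a little-endian machine bytes[0] is 0x04; on big-endian it is 0x01. */
    if (p.bytes[0] == 0x04)
        printf("little-endian\n");
    else if (p.bytes[0] == 0x01)
        printf("big-endian\n");
    else
        printf("other/unusual byte order\n");
    return 0;
}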

Unsigned Char pointing to unsigned integer

I don't understand why the following code prints out 7 2 3 0. I expected it to print out 1 9 7 1. Can anyone explain why it is printing 7 2 3 0?
unsigned int e = 197127;
unsigned char *f = (char *) &e;
printf("%ld\n", sizeof(e));
printf("%d ", *f);
f++;
printf("%d ", *f);
f++;
printf("%d ", *f);
f++;
printf("%d\n", *f);
Computers work with binary, not decimal, so 197127 is stored as a binary number, not as a series of separate decimal digits.
197127 (decimal) = 0x00030207 (hexadecimal) = 0011 0000 0010 0000 0111 (binary)
Assuming your system is little-endian, 0x00030207 is stored in memory as the bytes 0x07 0x02 0x03 0x00, which is printed out as 7 2 3 0 when you print each byte, exactly as observed.
Because with your method you print out the internal representation of the unsigned int, not its decimal representation.
Integers, like any other data, are represented as bytes internally. unsigned char is just another term for "byte" in this context. If you had represented your integer as decimal digits inside a string
char E[] = "197127";
and then done an analogous walk through the bytes, you would have seen the character codes of the digits as numbers.
The binary representation of 197127 is 00110000001000000111.
The bytes look like 00000111 (7 decimal), 00000010 (2), and 00000011 (3); the rest is 0.
Why did you expect 1 9 7 1? The hex representation of 197127 is 0x00030207, so on a little-endian architecture, the first byte will be 0x07, the second 0x02, the third 0x03, and the fourth 0x00, which is exactly what you're getting.
The value 197127 in e is not a string representation. It is stored as a 16- or 32-bit integer (depending on the platform). So, in memory, e is allocated, say, 4 bytes on the stack, and is represented as 0x30207 (hex) at that memory location; in binary it looks like 110000001000000111. Note that the byte order in memory is actually reversed on a little-endian machine (see any reference on endianness). So, when you point f to &e, you are referencing the first byte of the numeric value. If you wanted to represent the number as a string, you would need
char *e = "197127";
This has to do with the way the integer is stored, more specifically byte ordering. Your system happens to have little-endian byte ordering, i.e. the first byte of a multi byte integer is least significant, while the last byte is most significant.
You can try this:
printf("%d\n", 7 + (2 << 8) + (3 << 16) + (0 << 24));
This will print 197127.
Read more about byte order endianness here.
The byte layout for the unsigned integer 197127 is [0x07, 0x02, 0x03, 0x00], and your code prints the four bytes.
If you want the decimal digits, then you need to break the number down into digits:
int digits[100];
int c = 0;
while (e > 0) {
    digits[c++] = e % 10;
    e /= 10;
}
while (c > 0) {
    printf("%d\n", digits[--c]);
}
An int typically occupies four bytes. That means 197127 is represented as 00000000 00000011 00000010 00000111 in memory. From the result, your machine is little-endian, which means the low byte 00000111 is stored at the lowest address, then 00000010 and 00000011, and finally 00000000. So when you print *f you get 7; after f++, f points to 00000010, so the output is 2. The rest follows by analogy.
The underlying representation of the number e is binary, and if we convert the value to hex we can see that it would be (assuming a 32-bit unsigned int):
0x00030207
So when you iterate over the contents you are reading byte by byte through the unsigned char *. Each byte contains two 4-bit hex digits, and the byte order of the number is little-endian, since the least significant byte (0x07) comes first. In memory the contents look like this:
0x07 0x02 0x03 0x00
  ^    ^    ^    ^---- Fourth byte
  |    |    |--------- Third byte
  |    |-------------- Second byte
  |------------------- First byte
Note that sizeof returns size_t and the correct format specifier is %zu, otherwise you have undefined behavior.
You also need to fix this line:
unsigned char *f = (char *) &e;
to:
unsigned char *f = (unsigned char *) &e;
^^^^^^^^
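Putting those two fixes together, a corrected version of the program might look like this (a sketch assuming a 4-byte int; on a little-endian machine it prints 4 and then 7 2 3 0):
#include <stdio.h>

int main(void)
{
    unsigned int e = 197127;                  /* 0x00030207 */
    unsigned char *f = (unsigned char *) &e;  /* correct cast type */

    printf("%zu\n", sizeof(e));               /* %zu for size_t */
    printf("%d ", *f); f++;
    printf("%d ", *f); f++;
    printf("%d ", *f); f++;
    printf("%d\n", *f);
    return 0;
}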
Because e is an integer value (probably 4 bytes) and not a string (1 byte per character).
To have the result you expect, you should change the declaration and assignment of e to:
unsigned char e[] = "197127";
unsigned char *f = e;
Or, convert the integer value to a string (using sprintf()) and have f point to that instead:
char s[1000];
sprintf(s, "%u", e);
unsigned char *f = (unsigned char *) s;
Or, use arithmetic to extract single digits from your integer and print them out.
Or, ...

Casting int pointer to char pointer causes loss of data in C?

I have the following piece of code:
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    int n = 260;
    int *p = &n;
    char *pp = (char *) p;
    *pp = 0;
    printf("n = %d\n", n);
    system("PAUSE");
    return 0;
}
The output of the program is n = 256.
I may understand why it is, but I am not really sure.
Can anyone give me a clear explanation, please?
Thanks a lot.
The int 260 (= 256 * 1 + 4) will look like this in memory - note that this depends on the endianness of the machine - also, this is for a 32-bit (4 byte) int:
0x04 0x01 0x00 0x00
By using a char pointer, you point to the first byte and change it to 0x00, which changes the int to 256 (= 256 * 1 + 0).
You're apparently working on a little-endian machine. What's happening is that you're starting with an int that takes up at least two bytes. The value 260 is 256+4. The 256 goes in the second byte, and the 4 in the first byte. When you write 0 to the first byte, you're left with only the 256 in the second byte.
In C a pointer references a block of bytes based on the type associated with the pointer. So in your case the integer pointer refers to a block 4 bytes in size, while a char is only one byte long. When you set the char to 0 it only changes the first byte of the integer value, but because of how numbers are stored in memory on modern machines (effectively in reverse order from how you would write them) you are overwriting the least significant byte (which was 4), so you are left with 256 as the value.
I understood what exactly happens by changing value:
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    int n = 260;
    int *p = &n;
    char *pp = (char *) p;
    *pp = 20;
    printf("pp = %d\n", (int) *pp);
    printf("n = %d\n", (int) n);
    system("PAUSE");
    return 0;
}
The output values are
20
and
276
So basically the problem is not that you have data loss; it is that the char pointer points only to the first byte of the int, so it changes only that byte. The other bytes are not changed, and that's why you get those values (if you are on an Intel processor the first byte is the least significant, which is why you change the "smallest" part of the number).
Your problem is the assignment
*pp = 0;
You're dereferencing pp which points to n, and changing n.
However, pp is a char pointer so it doesn't change all of n
which is an int. This causes the binary complications in the other answers.
In terms of the C language, the description for what you are doing is modifying the representation of the int variable n. In C, all types have a "representation" as one or more bytes (unsigned char), and it's legal to access the underlying representation by casting a pointer to char * or unsigned char * - the latter is better for reasons that would just unnecessarily complicate things if I went into them here.
As schnaader answered, on a little-endian, two's complement implementation with 32-bit int, the representation of 260 is:
0x04 0x01 0x00 0x00
and overwriting the first byte with 0 yields:
0x00 0x01 0x00 0x00
which is the representation for 256 on such an implementation.
C allows implementations which have padding bits and trap representations (which raise a signal/abort your program if they're accessed), so in general overwriting part but not all of an int in this way is not safe to do. Nonetheless, it does work on most real-world machines, and if you instead used the type uint32_t, it would be guaranteed to work (although the ordering of the bits would still be implementation-dependent).
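As an illustration of the safer route hinted at above, here is a sketch (my own, not from the answer) that uses uint32_t and memcpy to modify the lowest-address byte of the representation:
#include <stdio.h>
#include <stdint.h>
#include <string.h>

int main(void)
{
    uint32_t n = 260;                  /* 0x00000104 */
    unsigned char bytes[sizeof n];

    memcpy(bytes, &n, sizeof n);       /* read the representation */
    bytes[0] = 0;                      /* clear the lowest-address byte */
    memcpy(&n, bytes, sizeof n);       /* write it back */

    printf("n = %u\n", (unsigned) n);  /* 256 on a little-endian machine */
    return 0;
}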
Considering 32-bit systems, 260 will be represented like this:
00000000 (Byte-3) 00000000 (Byte-2) 00000001 (Byte-1) 00000100 (Byte-0)
Now when p is cast to a char pointer, the label on the pointer changes, but the memory contents don't. It means that earlier p could access 4 bytes, as it was an integer pointer, but now it can only access 1 byte, as it is a char pointer. So only the LSB gets changed to zero, not all 4 bytes.
And it becomes
00000000 (Byte-3) 00000000 (Byte-2) 00000001 (Byte-1) 00000000 (Byte-0)
Hence, the output is 256.

Storing an unsigned integer into a struct

I've written this piece of code where I've assigned an unsigned integer to two different structs. In fact they're the same but one of them has the __attribute__((packed)).
#include <stdio.h>
#include <stdlib.h>

struct st1 {
    unsigned char opcode[3];
    unsigned int target;
} __attribute__((packed));

struct st2 {
    unsigned char opcode[3];
    unsigned int target;
};

void proc(void *addr) {
    struct st1 *varst1 = (struct st1 *) addr;
    struct st2 *varst2 = (struct st2 *) addr;
    printf("opcode in varst1: %c,%c,%c\n", varst1->opcode[0], varst1->opcode[1], varst1->opcode[2]);
    printf("opcode in varst2: %c,%c,%c\n", varst2->opcode[0], varst2->opcode[1], varst2->opcode[2]);
    printf("target in varst1: %d\n", varst1->target);
    printf("target in varst2: %d\n", varst2->target);
}

int main(int argc, char *argv[]) {
    unsigned int *var;
    var = (unsigned int *) malloc(sizeof(unsigned int));
    *var = 0x11334433;
    proc((void *) var);
    return 0;
}
The output is:
opcode in varst1: 3,D,3
opcode in varst2: 3,D,3
target in varst1: 17
target in varst2: 0
Given that I'm storing this number
0x11334433 == 00010001001100110100010000110011
I'd like to know why that is the output I get.
This is to do with data alignment. Most compilers will align data on address boundaries that help with general performance. So, in st2, the struct without the packed attribute, there is an extra padding byte between the char[3] and the int to align the int on a four-byte boundary. In the packed struct st1 that padding byte is missing.
byte  : 0         1         2         3       4 5 6 7
st1   : opcode[0] opcode[1] opcode[2] |--------int--------|
st2   : opcode[0] opcode[1] opcode[2] padding |----int------|
You allocate an unsigned int and pass that to the function:
byte  : 0         1         2         3       4 5 6 7
alloc : |--------------int---------------|    |---unallocated---|
st1   : opcode[0] opcode[1] opcode[2] |--------int--------|
st2   : opcode[0] opcode[1] opcode[2] padding |----int------|
If you're using a little-endian system then the lowest eight bits (rightmost) are stored at byte 0 (0x33), byte 1 has 0x44, byte 2 has 0x33, and byte 3 has 0x11. In st2 the int value maps entirely to memory beyond the end of the allocated amount, while in st1 the lowest byte of the int maps onto byte 3, which holds 0x11. So st2 produces 0 and st1 produces 0x11 (17).
You are lucky that the unallocated memory is zero and that you have no memory range checking going on. Writing to the ints in st1 and st2 in this case could corrupt memory at worst, generate memory guard errors, or do nothing. It is undefined and dependent on the runtime implementation of the memory manager.
In general, avoid void *.
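To see the layout difference directly, you can print each struct's size and the offset of target in each. A small sketch (my own, assuming a GCC-style compiler as in the question, with a 4-byte int and 4-byte alignment):
#include <stdio.h>
#include <stddef.h>

struct st1 {
    unsigned char opcode[3];
    unsigned int target;
} __attribute__((packed));

struct st2 {
    unsigned char opcode[3];
    unsigned int target;
};

int main(void)
{
    /* Typically prints: st1: size 7, target at offset 3
                         st2: size 8, target at offset 4 */
    printf("st1: size %zu, target at offset %zu\n",
           sizeof(struct st1), offsetof(struct st1, target));
    printf("st2: size %zu, target at offset %zu\n",
           sizeof(struct st2), offsetof(struct st2, target));
    return 0;
}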
Your bytes look like this:
00010001 00110011 01000100 00110011
Though because of endianness they are actually laid out in memory like this:
00110011 01000100 00110011 00010001
If your struct is packed then the first three bytes are associated with opcode, and the fourth byte is the start of target; that's why the packed struct has a target of 17 (00010001 in binary).
The unpacked struct has a padding byte, so its target lies entirely in the (zeroed) memory past the allocated int, which is why target in varst2 is zero.
%c interprets the argument as the ascii code of a character and prints the character
3's ascii code is 0x33
D's ascii code is 0x44
17 is 0x11
An int is stored little-endian or big-endian depending on the processor architecture -- you can't depend on the bytes going into your struct's fields in order.
The int target in the unpacked version lies past the allocated int, so it stays 0.
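If you would rather see raw byte values than characters, printing the opcode bytes with %02x makes the layout obvious. A small sketch (my own, reusing the packed st1 from the question and reading only the bytes that were actually allocated):
#include <stdio.h>
#include <stdlib.h>

struct st1 {
    unsigned char opcode[3];
    unsigned int target;
} __attribute__((packed));

int main(void)
{
    unsigned int *var = malloc(sizeof *var);
    *var = 0x11334433;

    struct st1 *v = (struct st1 *) var;
    /* On a little-endian machine this prints: opcode bytes: 33 44 33 */
    printf("opcode bytes: %02x %02x %02x\n",
           (unsigned) v->opcode[0], (unsigned) v->opcode[1], (unsigned) v->opcode[2]);
    free(var);
    return 0;
}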
