Why does this code produce the output 513? - c

I saw this question on my C language final exam. The output is 513, and I don't know why.
#include <stdio.h>

int main(void){
    char a[4] = {1, 2, 3, 4};
    printf("%d", *(short*)a);
}

Your array of bytes is (in hex):
[ 0x01, 0x02, 0x03, 0x04 ]
If you treat the start of the array not as an array of bytes but as the start of a short, then your short occupies the two bytes 0x01 0x02. Because your processor is "Little Endian", it reads them backwards from how humans would write the number: we would write it as 0x0201, which is 513 in decimal.

If the system this code is being run on meets the following requirements:
Unaligned memory access is permitted (or a is guaranteed to be short-aligned)
Little-endian byte order is used
sizeof(short) == 2
CHAR_BIT == 8
Then dereferencing a short * pointer to the following memory:
| 0x01 | 0x02 | 0x03 | 0x04 |
Will give you 0x0201, or 513 in base 10.
Also, do note that even if all these requirements are met, aliasing a char [] array as a short * violates the strict aliasing rule.
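If you do need that (endianness-dependent) value, you can get it without the aliasing and alignment problems by copying the bytes into an actual short object. A minimal sketch, assuming sizeof(short) == 2:
#include <stdio.h>
#include <string.h>

int main(void)
{
    char a[4] = {1, 2, 3, 4};
    short s;
    memcpy(&s, a, sizeof s);   /* copies the first two bytes; no alignment or aliasing issues */
    printf("%d\n", s);         /* 513 on a little-endian machine */
    return 0;
}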

The code casts your char pointer (the array a decays to one) to a short * and prints the value it points to.
A short in C typically occupies 2 bytes. The first two bytes of your array are 00000001 and 00000010, but because the processor is little endian it treats the first byte as the least significant, giving 00000010 00000001, which is 513 in decimal.
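If instead you want the value 513 regardless of the machine's byte order, you can assemble it from the bytes explicitly. A small sketch of that idea:
#include <stdio.h>

int main(void)
{
    unsigned char a[4] = {1, 2, 3, 4};
    int v = a[0] | (a[1] << 8);   /* a[0] is the low byte by construction, not by memory layout */
    printf("%d\n", v);            /* prints 513 on any byte order */
    return 0;
}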

Related

Pointers in C with typecasting

#include <stdio.h>
int main()
{
    int a;
    char *x;
    x = (char *) &a;
    a = 512;
    x[0] = 1;
    x[1] = 2;
    printf("%d\n", a);
    return 0;
}
I'm not able to grasp how the output is 513, or why it is machine-dependent. I can sense that the typecast is playing a major role, but what is happening behind the scenes? Can someone help me visualise this problem?
The int a is stored in memory as 4 bytes. The number 512 is represented on your machine as:
0 2 0 0
When you assign to x[0] and x[1], it changes this to:
1 2 0 0
which is the number 513.
This is machine-dependent, because the order of bytes in a multi-byte number is not specified by the C language.
For simplicity, assume the following:
size of int is 4 bytes
size of any pointer type is 8 bytes
size of char is 1 byte
In line 3, x references a as a char; this means that x thinks it is pointing to a char (it has no idea that a is actually an int).
Line 4 is meant to confuse you. Don't let it.
Line 5 - since x thinks it is pointing to a char, x[0] = 1 changes just the first byte of a.
Line 6 - once again, x changed just the second byte of a.
Note that the values written in lines 5 and 6 overwrite the low bytes of the value assigned in line 4.
The value of a is now 0...0000 0010 0000 0001 (513).
Now when we print a as an int, all 4 bytes are considered, as expected.
Let me try to break this down for you in addition to the previous answers:
#include <stdio.h>
int main()
{
    int a;               //declares an integer called a
    char *x;             //declares a pointer to a character called x
    x = (char *) &a;     //points x to the first byte of a
    a = 512;             //writes 512 to the int variable
    x[0] = 1;            //writes 1 to the first byte
    x[1] = 2;            //writes 2 to the second byte
    printf("%d\n", a);   //prints the integer
    return 0;
}
Note that I wrote first byte and second byte. Depending on the byte order of your platform and the size of an integer you might not get the same results.
Let's look at the memory for 32-bit (4-byte) integers:
Little endian systems
first byte | second byte | third byte | fourth byte
0x00         0x02          0x00         0x00
Now assigning 1 to the first byte and 2 to the second one leaves us with this:
first byte | second byte | third byte | fourth byte
0x01         0x02          0x00         0x00
Notice that the first byte gets changed to 0x01 while the second was already 0x02.
This new number in memory is equivalent to 513 on little endian systems.
Big endian systems
Let's look at what would happen if you tried this on a big endian platform:
first byte | second byte | third byte | fourth byte
0x00         0x00          0x02         0x00
This time assigning 1 to the first byte and 2 to the second one leaves us with this:
first byte | second byte | third byte | fourth byte
0x01         0x02          0x02         0x00
Which is equivalent to 16,908,800 as an integer.
I'm not able to grasp how the output is 513 or even machine-dependent
The output is implementation-defined. It depends on the order of bytes in the CPU's representation of integers, commonly known as endianness.
I can sense that typecasting is playing a major role
The code reinterprets the value of a, which is an int, as an array of bytes. It uses two initial bytes, which is guaranteed to work, because an int is at least two bytes in size.
Can someone help me visualise this problem?
An int consists of multiple bytes. They can be addressed as one unit that represents an integer, but they can also be addressed as a collection of bytes. The value of an int depends on the values of the bytes that you set, and on the order in which the CPU interprets those bytes.
It looks like your system stores the least significant byte at the lowest address, so storing 1 and 2 at offsets zero and one produces this layout:
Byte 0 Byte 1 Byte 2 Byte 3
------ ------ ------ ------
1 2 0 0
The integer value can be computed as follows:
1 + 2*256 + 0*65536 + 0*16777216
By taking x, which is a char *, and pointing it to the address of a, which is an int, you can use x to modify the individual bytes that represent a.
The output you're seeing suggests that an int is stored in little-endian format, meaning the least significant byte comes first. This can change, however, if you run this code on a different system (e.g. a Sun SPARC machine, which is big-endian).
You first set a to 512. In hex, that's 0x200. So the memory for a, assuming a 32 bit int in little endian format, is laid out as follows:
-----------------------------
| 0x00 | 0x02 | 0x00 | 0x00 |
-----------------------------
Next you set x[0] to 1, which updates the first byte in the representation of a from 0x00 to 0x01:
-----------------------------
| 0x01 | 0x02 | 0x00 | 0x00 |
-----------------------------
Then you set x[1] to 2, which targets the second byte in the representation of a (in this case leaving it unchanged, since it was already 0x02):
-----------------------------
| 0x01 | 0x02 | 0x00 | 0x00 |
-----------------------------
Now a has a value of 0x201, which in decimal is 513.
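If you want to check which byte order your own machine uses, a short sketch of my own (not from the question) that dumps the bytes of a is enough:
#include <stdio.h>

int main(void)
{
    int a = 513;
    unsigned char *x = (unsigned char *) &a;        /* inspecting an object's bytes via unsigned char is allowed */
    for (size_t i = 0; i < sizeof a; i++)
        printf("byte %zu: 0x%02x\n", i, (unsigned) x[i]);   /* 0x01 0x02 0x00 0x00 on a little-endian machine */
    return 0;
}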

Unsigned Char pointing to unsigned integer

I don't understand why the following code prints 7 2 3 0. I expected it to print 1 9 7 1. Can anyone explain why it prints 7 2 3 0?
unsigned int e = 197127;
unsigned char *f = (char *) &e;
printf("%ld\n", sizeof(e));
printf("%d ", *f);
f++;
printf("%d ", *f);
f++;
printf("%d ", *f);
f++;
printf("%d\n", *f);
Computers work with binary, not decimal, so 197127 is stored as a binary number, not as a series of separate decimal digits.
197127 (decimal) = 0x00030207 (hexadecimal) = 0011 0000 0010 0000 0111 (binary)
Assuming your system is little endian, 0x00030207 is stored in memory as the bytes 0x07 0x02 0x03 0x00, which print as 7 2 3 0, exactly as observed, when you print each byte.
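You can confirm the hex value yourself with a one-line sketch:
printf("%#x\n", 197127u);   /* prints 0x30207 */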
Because with your method you print out the internal representation of the unsigned int, not its decimal representation.
Integers, like any other data, are represented as bytes internally. unsigned char is just another term for "byte" in this context. If you had represented your integer as decimal digits inside a string
char E[] = "197127";
and then done an analogous walk through the bytes, you would have seen the character codes of the digits.
The binary representation of 197127 is 00110000001000000111.
The bytes look like 00000111 (7 decimal), 00000010 (2), 00000011 (3); the rest is 0.
Why did you expect 1 9 7 1? The hex representation of 197127 is 0x00030207, so on a little-endian architecture, the first byte will be 0x07, the second 0x02, the third 0x03, and the fourth 0x00, which is exactly what you're getting.
The value 197127 in e is not a string representation. It is stored as a 16- or 32-bit integer (depending on the platform). So, in memory, e is allocated, say, 4 bytes on the stack, and is represented as 0x30207 (hex) at that memory location. In binary, that is 11 0000 0010 0000 0111. Note that on a little-endian machine the byte order in memory is actually backwards from how we write the number. So, when you point f at &e, you are referencing the first byte of the numeric value. If you want to represent the number as a string, you should have
char *e = "197127";
This has to do with the way the integer is stored, more specifically byte ordering. Your system happens to have little-endian byte ordering, i.e. the first byte of a multi-byte integer is the least significant one, while the last byte is the most significant.
You can try this:
printf("%d\n", 7 + (2 << 8) + (3 << 16) + (0 << 24));
This will print 197127.
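Generalizing the shift idea, here is a sketch that rebuilds the value from the bytes f walks over (assuming a little-endian layout and a 4-byte unsigned int):
#include <stdio.h>

int main(void)
{
    unsigned int e = 197127;
    unsigned char *f = (unsigned char *) &e;
    unsigned int v = 0;
    for (size_t i = 0; i < sizeof e; i++)
        v |= (unsigned int) f[i] << (8 * i);   /* byte i is worth 256^i on little-endian */
    printf("%u\n", v);                         /* prints 197127 */
    return 0;
}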
The byte layout for the unsigned integer 197127 is [0x07, 0x02, 0x03, 0x00], and your code prints the four bytes.
If you want the decimal digits, then you need to break the number down into digits:
int digits[100];
int c = 0;
while (e > 0) { digits[c++] = e % 10; e /= 10; }   /* peel off decimal digits, least significant first */
while (c > 0) { printf("%d\n", digits[--c]); }     /* print them back, most significant first */
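With e equal to 197127, this prints the six digits 1 9 7 1 2 7 in order, one per line.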
An int commonly occupies four bytes. That means 197127 is represented as 00000000 00000011 00000010 00000111 in memory. From the result, your machine is little-endian, which means the low byte 00000111 is stored at the lowest address, then 00000010 and 00000011, and finally 00000000. So when you print the first *f, the type cast gives you 7. After f++, f points to 00000010, so the output is 2. The rest follows by analogy.
The underlying representation of the number e is binary, and if we convert the value to hex we can see that it is (assuming a 32-bit unsigned int):
0x00030207
So when you iterate over the contents, you are reading byte by byte through the unsigned char *. Each byte contains two 4-bit hex digits, and the byte order (endianness) of the number is little endian, since the least significant byte (0x07) comes first; so in memory the contents look like this:
0x07020300
^ ^ ^ ^- Fourth byte
| | |-Third byte
| |-Second byte
|-First byte
Note that sizeof returns size_t and the correct format specifier is %zu, otherwise you have undefined behavior.
You also need to fix this line:
unsigned char *f = (char *) &e;
to:
unsigned char *f = (unsigned char *) &e;
^^^^^^^^
Because e is an integer value (probably 4 bytes) and not a string (1 byte per character).
To get the result you expect, you should change the declaration and assignment of e to:
unsigned char e[] = "197127";
unsigned char *f = e;
Or, convert the integer value to a string (using sprintf()) and have f point to that instead:
char s[1000];
sprintf(s, "%u", e);
unsigned char *f = (unsigned char *) s;
Or, use mathematical operations to extract single digits from your integer and print those.
Or, ...

converting little endian hex to big endian decimal in C

I am trying to understand and implement a simple file system based on FAT12. I am currently looking at the following snippet of code and it's driving me crazy:
int getTotalSize(char * mmap)
{
    int *tmp1 = malloc(sizeof(int));
    int *tmp2 = malloc(sizeof(int));
    int retVal;

    *tmp1 = mmap[19];
    *tmp2 = mmap[20];
    printf("%d and %d read\n", *tmp1, *tmp2);
    retVal = *tmp1 + ((*tmp2) << 8);
    free(tmp1);
    free(tmp2);
    return retVal;
}
From what I've read so far, the FAT12 format stores integers in little endian format, and the code above is getting the size of the file system, which is stored in the 19th and 20th bytes of the boot sector.
However, I don't understand why retVal = *tmp1+((*tmp2)<<8); works. Is the bitwise <<8 converting the second byte to decimal, or to big endian format?
Why is it only applied to the second byte and not the first one?
The bytes in question are (in little-endian order):
40 0B
I tried converting them manually by switching the order first to
0B 40
and then converting from hex to decimal, and I get the right output. I just don't understand how adding the first byte to the shifted second byte does the same thing?
Thanks
The use of malloc() here is seriously facepalm-inducing. Utterly unnecessary, and a serious "code smell" (makes me doubt the overall quality of the code). Also, mmap clearly should be unsigned char (or, even better, uint8_t).
That said, the code you're asking about is pretty straight-forward.
Given two byte-sized values a and b, there are two ways of combining them into a 16-bit value (which is what the code is doing): you can either consider a to be the least-significant byte, or b.
Using boxes, the 16-bit value can look either like this:
+---+---+
| a | b |
+---+---+
or like this, if you instead consider b to be the most significant byte:
+---+---+
| b | a |
+---+---+
The way to combine the lsb and the msb into a 16-bit value is simply:
result = (msb * 256) + lsb;
UPDATE: The 256 comes from the fact that that's the "worth" of each successively more significant byte in a multibyte number. Compare it to the role of 10 in a decimal number (to combine two single-digit decimal numbers c and d you would use result = 10 * c + d).
Consider msb = 0x01 and lsb = 0x00, then the above would be:
result = 0x1 * 256 + 0 = 256 = 0x0100
You can see that the msb byte ended up in the upper part of the 16-bit value, just as expected.
Your code is using << 8 to do bitwise shifting to the left, which is the same as multiplying by 2^8, i.e. 256.
Note that result above is a value, i.e. not a byte buffer in memory, so its endianness doesn't matter.
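For comparison, here is a malloc-free sketch of the same computation (the helper name read_le16 is mine, not from the original code):
/* read a 16-bit value stored least-significant byte first */
static unsigned int read_le16(const unsigned char *p)
{
    return (unsigned int) p[0] | ((unsigned int) p[1] << 8);
}

int getTotalSize(const unsigned char *mmap)
{
    return (int) read_le16(mmap + 19);   /* bytes 19 and 20 of the boot sector */
}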
I see no problem combining individual digits or bytes into larger integers.
Let's do decimal with 2 digits: 1 (least significant) and 2 (most significant):
1 + 2 * 10 = 21 (10 is the system base)
Let's now do base-256 with 2 digits: 0x40 (least significant) and 0x0B (most significant):
0x40 + 0x0B * 0x100 = 0x0B40 (0x100=256 is the system base)
The problem, however, likely lies somewhere else: in how 12-bit integers are stored in FAT12.
A 12-bit integer occupies 1.5 8-bit bytes. And in 3 bytes you have 2 12-bit integers.
Suppose, you have 0x12, 0x34, 0x56 as those 3 bytes.
In order to extract the first integer you only need take the first byte (0x12) and the 4 least significant bits of the second (0x04) and combine them like this:
0x12 + ((0x34 & 0x0F) << 8) == 0x412
In order to extract the second integer you need to take the 4 most significant bits of the second byte (0x03) and the third byte (0x56) and combine them like this:
(0x56 << 4) + (0x34 >> 4) == 0x563
If you read the official Microsoft document on FAT (look up fatgen103 online), you'll find all the relevant FAT formulas/pseudocode.
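Putting the two cases together, here is a sketch of a complete FAT12 entry lookup (the name fat12_entry is mine; fat is assumed to point at the start of the FAT):
/* fetch the n-th 12-bit entry from a packed FAT12 table */
unsigned int fat12_entry(const unsigned char *fat, unsigned int n)
{
    unsigned int off = n + n / 2;                       /* each entry occupies 1.5 bytes */
    unsigned int v   = fat[off] | (fat[off + 1] << 8);  /* little-endian 16-bit read */
    return (n & 1) ? (v >> 4) : (v & 0x0FFF);           /* odd entries sit in the high 12 bits */
}
With the example bytes 0x12, 0x34, 0x56 this gives 0x412 for entry 0 and 0x563 for entry 1, matching the formulas above.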
The << operator is the left shift operator. It takes the value to the left of the operator and shifts it by the number of bits given on the right side of the operator.
So in your case, it shifts the value of *tmp2 eight bits to the left, and combines it with the value of *tmp1 to generate a 16-bit value from two eight-bit values.
For example, let's say you have the integer 1. This is, in 16-bit binary, 0000000000000001. If you shift it left by eight bits, you end up with the binary value 0000000100000000, i.e. 256 in decimal.
The presentation (i.e. binary, decimal or hexadecimal) has nothing to do with it. All integers are stored the same way on the computer.

cast char array to integer

#include <stdio.h>

int main()
{
    unsigned char a[4] = {1, 2, 3, 4};
    int b = *(int *)&a[0];
    printf("%d\n", b);
    return 0;
}
I just cannot understand why the result of b is 0x4030201.
Could someone help me out?
When you tell the compiler to create an array like this:
unsigned char a[4] = {1, 2, 3, 4};
These numbers are put somewhere in memory in following order:
MemoryAddress0: 0x01 -> a[0]
MemoryAddress1: 0x02 -> a[1]
MemoryAddress2: 0x03 -> a[2]
MemoryAddress3: 0x04 -> a[3]
&a[0] is a char pointer with the value MemoryAddress0, and it points to a 1-byte value of 0x01.
(int*)&a[0] is the same address cast to int*, so this time it points to four consecutive bytes.
Most machines we use in our daily lives are little endian, which means that they store multibyte values in memory from the least significant byte to the most significant one.
When an int* points to a memory area of four bytes, the first byte it encounters is the least significant byte and the second byte is the second least significant, and so on.
MemoryAddress0: 0x01 -> 2^0 term
MemoryAddress1: 0x02 -> 2^8 term
MemoryAddress2: 0x03 -> 2^16 term
MemoryAddress3: 0x04 -> 2^24 term
Thus the 4-byte integer value becomes 0x01*2^0 + 0x02*2^8 + 0x03*2^16 + 0x04*2^24 which is equal to 0x04030201.
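The same reinterpretation is often written with a union instead of a pointer cast; C allows this kind of type punning through a union (the stored bytes are simply reinterpreted), and the result is still byte-order dependent. A sketch assuming a 4-byte unsigned int:
#include <stdio.h>

union pun {
    unsigned char bytes[4];
    unsigned int  value;        /* assumes unsigned int occupies 4 bytes */
};

int main(void)
{
    union pun u = { .bytes = {1, 2, 3, 4} };
    printf("0x%x\n", u.value);  /* 0x4030201 on a little-endian machine */
    return 0;
}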
You are on a little-endian machine, this means that integers with sizes larger than a byte store the least-significant bytes first.
Note that most architectures these days are little-endian thanks to the commonness of x86.
Because your system is little endian. The first byte in a multi-byte integer is interpreted as the least significant byte in little endian systems.

What actually happens when a pointer to integer is cast to a pointer to char?

int i=40;
char *p;
p=(char *)&i;//What actually happens here?
printf("%d",*p);
What will be the output? Please help!
p=(char *)&i;//What actually happens here?
It takes the address of i and casts it to a char pointer. So the value of *p is now the first byte of i. What that value is, is platform dependent.
Let's start by looking at how the contents of i and p would be laid out in memory (assuming big-endian order):
Item Address 0x00 0x01 0x02 0x03
---- ------- ----------------------
i 0x08000000 0x00 0x00 0x00 0x28
p 0x08000004 0x?? 0x?? 0x?? 0x??
Since p is being declared as an auto variable, it's not initialized to anything and contains a random bit pattern represented by 0x??.
In the line
p = (char *)&i;
the expression &i evaluates to the address of i, or 0x08000000, and its type is pointer to int, or int *. The cast converts the type from int * to char *, and the result is assigned to p.
Here's how things look in memory after the assignment:
Item Address 0x00 0x01 0x02 0x03
---- ------- ----------------------
i 0x08000000 0x00 0x00 0x00 0x28
p 0x08000004 0x08 0x00 0x00 0x00
So the value of p is now the address of i. In the line
printf("%d", *p);
the type of the expression *p is char, and its value is whatever is stored in address 0x08000000, which in this particular case is 0. Since printf is a variadic function, the value of *p is promoted from type char to type int.
So for this particular case, the output is "0". If the order were little-endian, the map would look like
Item Address 0x03 0x02 0x01 0x00
---- ------- ----------------------
i 0x08000000 0x00 0x00 0x00 0x28
p 0x08000004 0x08 0x00 0x00 0x00
and the output would be "40".
Note that this whole example assumes that integer and character pointers have the same size and layout; that's not guaranteed to be true everywhere (see the Online C Standard (n1256 draft), section 6.2.5, paragraph 27), so you can't rely on this working everywhere the way you expect (assuming I'm correct in thinking that int and char are not compatible types as defined by the standard, but I could be wrong on that). Type punning in general is not safe.
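A small sketch of my own that prints every byte of i makes the layout above visible on whichever machine you run it on:
#include <stdio.h>

int main(void)
{
    int i = 40;
    unsigned char *p = (unsigned char *) &i;
    for (size_t k = 0; k < sizeof i; k++)
        printf("%02x ", (unsigned) p[k]);   /* "28 00 00 00" on little-endian, "00 00 00 28" on big-endian */
    printf("\n");
    return 0;
}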
Here you are:
int i = 40; // allocates memory for the integer i and assigns the value 40 to it
char *p = (char*)&i;
Here you define a pointer variable and assign it the address of i after casting that address to char*.
Suppose i is allocated at address 1021; then p holds that address, but with a reach of only 1 byte, so it refers to just the first byte of the representation of 40.
Since 40 fits entirely within that first byte (on a little-endian machine), *p holds the char equivalent of 40, and because you print it with %d it prints 40.
It depends. On Windows, the output will be 40, but this is just because of a lot of coincidences:
First of all, printf does not (and cannot) check the types of its arguments, so since it sees %d in the format string, it assumes the given argument is an int. Although *p is only a char, it is promoted to an int (as is every argument that is not covered by the function prototype).
Second, p points to the memory occupied by the variable i, but since it is a char pointer, it reads only one byte of i's memory. Since Windows/Intel uses the least-significant-byte-first convention, 40 is stored as the byte pattern "40 0 0 0", so *p, which takes the first byte (a char), yields 40. If i held a value of 256 or larger, the result would be incorrect.
What happens when an int pointer is cast to a char pointer?
There is another question marked as a duplicate of this one; let me try to explain it here.
$ cat a.c
#include <stdio.h>
int main(){
int a;
char *x;
x = (char *) &a;
a=512;
x[0]=1;
x[1]=2;
printf("%d\n",a);
return 0;
}
Compile and run:
$ gcc a.c && ./a.out
513
Why is it 513? We can use gdb to see the root cause.
$ gcc a.c -g && gdb ./a.out
(gdb) list
1 #include <stdio.h>
2
3 int main(){
4 int a;
5 char *x;
6 x = (char *) &a;
7 a=512;
8 x[0]=1;
9 x[1]=2;
10 printf("%d\n",a);
Set a breakpoint at line 8 of a.c, and run:
(gdb) b a.c:8
Breakpoint 1 at 0x40113d: file a.c, line 8.
(gdb) run
Once the program stops at the breakpoint, print variable a's memory address:
(gdb) p &a
$2 = (int *) 0x7fffffffd9d4
(gdb) p x
$3 = 0x7fffffffd9d4 ""
Variable a's memory address is 0x7fffffffd9d4, and x holds the same address.
Before looking at the memory contents, let's see what 512 looks like in hex:
00 00 02 00
x86 is little endian, so in memory it is stored lowest byte first:
[lower address] 00 02 00 00 [higher address]
Let's look at the real memory; it matches what we expected:
(gdb) x/4xb 0x7fffffffd9d4
0x7fffffffd9d4: 0x00 0x02 0x00 0x00
Then show the memory addresses of x[0] and x[1]. After the assignments x[0]=1 and x[1]=2 run, those two bytes become 01 and 02, so the memory reads 01 02 00 00, and converting that back to an integer makes it clear why 513 is printed.
(gdb) p &x[0]
$4 = 0x7fffffffd9d4 ""
(gdb) p &x[1]
$5 = 0x7fffffffd9d5 "\002"
