Pointers in C with typecasting

#include <stdio.h>
int main()
{
    int a;
    char *x;
    x = (char *) &a;
    a = 512;
    x[0] = 1;
    x[1] = 2;
    printf("%d\n", a);
    return 0;
}
I'm not able to grasp how the output is 513, or why it is machine dependent. I can sense that typecasting is playing a major role, but what is happening behind the scenes? Can someone help me visualise this problem?

The int a is stored in memory as 4 bytes. The number 512 is represented on your machine as:
0 2 0 0
When you assign to x[0] and x[1], it changes this to:
1 2 0 0
which is the number 513.
This is machine-dependent, because the order of bytes in a multi-byte number is not specified by the C language.
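If you want to see the layout on your own machine, here is a minimal sketch (assuming a 4-byte int and 8-bit chars; the variable names are just for illustration) that dumps each byte of a after the two assignments:

#include <stdio.h>

int main(void)
{
    int a = 512;
    unsigned char *bytes = (unsigned char *) &a;

    bytes[0] = 1;
    bytes[1] = 2;

    /* Print each byte of a in memory order. */
    for (size_t i = 0; i < sizeof a; i++)
        printf("byte %zu: 0x%02X\n", i, bytes[i]);

    printf("a = %d\n", a);
    return 0;
}

On a little endian machine this prints the bytes 0x01 0x02 0x00 0x00 and a = 513; on a big endian machine the byte order, and therefore the printed int, will differ.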

For simplicity, assume the following:
size of int is 4 (in bytes)
size of any pointer type is 8
size of char is 1 byte
In line 3 (x = (char *) &a;), x references a as a char: x thinks it is pointing to a char (it has no idea that a is actually an int).
Line 4 (a = 512;) is meant to confuse you. Don't let it.
Line 5: since x thinks it is pointing to a char, x[0] = 1 changes just the first byte of a.
Line 6: once again, x[1] = 2 changes just the second byte of a.
Note that the bytes written in lines 5 and 6 override the value assigned in line 4.
The value of a is now 0...0000 0010 0000 0001 (513).
Now when we print a as an int, all 4 bytes are read, as expected.

Let me try to break this down for you in addition to the previous answers:
#include <stdio.h>
int main()
{
    int a;              //declares an integer called a
    char *x;            //declares a pointer to a character called x
    x = (char *) &a;    //points x to the first byte of a
    a = 512;            //writes 512 to the int variable
    x[0] = 1;           //writes 1 to the first byte
    x[1] = 2;           //writes 2 to the second byte
    printf("%d\n", a);  //prints the integer
    return 0;
}
Note that I wrote first byte and second byte. Depending on the byte order of your platform and the size of an integer you might not get the same results.
Let's look at the memory for 32-bit (4-byte) integers:
Little endian systems
first byte | second byte | third byte | fourth byte
0x00 0x02 0x00 0x00
Now assigning 1 to the first byte and 2 to the second one leaves us with this:
first byte | second byte | third byte | fourth byte
0x01 0x02 0x00 0x00
Notice that the first byte gets changed to 0x01 while the second was already 0x02.
This new number in memory is equivalent to 513 on little endian systems.
Big endian systems
Let's look at what would happen if you tried this on a big endian platform:
first byte | second byte | third byte | fourth byte
0x00 0x00 0x02 0x00
This time assigning 1 to the first byte and 2 to the second one leaves us with this:
first byte | second byte | third byte | fourth byte
0x01 0x02 0x02 0x00
Which is equivalent to 16,908,800 as an integer.
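If you want to check which of the two layouts your own machine uses, a minimal sketch is to store 1 in an int and look at its first byte (assuming an 8-bit char):

#include <stdio.h>

int main(void)
{
    int one = 1;
    unsigned char first = *(unsigned char *) &one;

    /* On a little endian system the least significant byte comes first,
       so the first byte is 1; on a big endian system it is 0. */
    printf("%s endian\n", first == 1 ? "little" : "big");
    return 0;
}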

I'm not able to grasp how the output is 513, or why it is machine dependent
The output is implementation-defined. It depends on the order of bytes in the CPU's interpretation of integers, commonly known as endianness.
I can sense that typecasting is playing a major role
The code reinterprets the value of a, which is an int, as an array of bytes. It uses the two initial bytes, which is guaranteed to work, because an int is at least two bytes in size.
Can someone help me visualise this problem?
An int consists of multiple bytes. They can be addressed as one unit that represents an integer, but they can also be addressed as a collection of bytes. The value of an int depends on the number of bytes that you set, and on the order of these bytes in the CPU's interpretation of integers.
It looks like your system stores the least significant byte at the lowest address, so storing 1 and 2 at offsets zero and one produces this layout:
Byte 0 Byte 1 Byte 2 Byte 3
------ ------ ------ ------
1 2 0 0
Integer value can be computed as follows:
1 + 2*256 + 0*65536 + 0*16777216
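The same sum can be written as a small loop; this is only a sketch (assuming byte 0 is the least significant, i.e. a little endian machine, and a 4-byte int):

#include <stdio.h>

int main(void)
{
    int a = 512;
    unsigned char *x = (unsigned char *) &a;

    x[0] = 1;
    x[1] = 2;

    /* Weights are 1, 256, 65536, 16777216 for bytes 0..3. */
    unsigned long value = 0;
    unsigned long weight = 1;
    for (size_t i = 0; i < sizeof a; i++) {
        value += x[i] * weight;
        weight *= 256;
    }
    printf("%lu\n", value);  /* prints 513 here */
    return 0;
}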

By taking x, which is a char *, and pointing it to the address of a, which is an int, you can use x to modify the individual bytes that represent a.
The output you're seeing suggests that an int is stored in little endian format, meaning the least significant byte comes first. This can change, however, if you run this code on a different system (e.g. a Sun SPARC machine, which is big-endian).
You first set a to 512. In hex, that's 0x200. So the memory for a, assuming a 32 bit int in little endian format, is laid out as follows:
-----------------------------
| 0x00 | 0x02 | 0x00 | 0x00 |
-----------------------------
Next you set x[0] to 1, which updates the first byte in the representation of a:
-----------------------------
| 0x01 | 0x02 | 0x00 | 0x00 |
-----------------------------
Then you set x[1] to 2, which updates the second byte in the representation of a (in this case leaving it unchanged, since it was already 0x02):
-----------------------------
| 0x01 | 0x02 | 0x00 | 0x00 |
-----------------------------
Now a has a value of 0x201, which in decimal is 513.
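If you prefer not to write through a char * at all, the same byte-level edit can be done by copying the representation out with memcpy, changing it, and copying it back. This is only a sketch, and the result is still endianness-dependent:

#include <stdio.h>
#include <string.h>

int main(void)
{
    int a = 512;
    unsigned char bytes[sizeof a];

    memcpy(bytes, &a, sizeof a);   /* copy the representation of a out */
    bytes[0] = 1;                  /* overwrite the first two bytes */
    bytes[1] = 2;
    memcpy(&a, bytes, sizeof a);   /* copy the edited bytes back into a */

    printf("%d\n", a);             /* 513 on a little endian machine */
    return 0;
}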

Related

Why is the char value increasing by 3?

I'm dealing with pointers in C, declaring an integer with the value 1025. Then I cast the int pointer to a char pointer. When I increase the char pointer by 1, the value increases by 3. How is this happening?
#include <stdio.h>

int main(){
    int a = 1025;
    int *p;
    p = &a;
    char *p0;
    p0 = (char*)p;
    printf("Address = %d, value = %d\n", p0, *p0);
    printf("Address = %d, value = %d\n", p0+1, *(p0+1));
    return 0;
}
When you use a char * to alias an object of another type as you're doing, it allows you to access the byte representation of that object.
The value 1025 can be represented in hex as 0x0401. Your system appears to use little-endian byte ordering to store integers, which means the low order bytes appear first in the representation.
So assuming an int is 32 bits on your system a looks like this in memory:
-----------------------------
a | 0x01 | 0x04 | 0x00 | 0x00 |
-----------------------------
The pointer p0 points to the first byte, so *p0 is 1. Then p0+1 points to the next byte so *(p0+1) is 4.
You are incrementing the pointer to the bytes the integer a is composed of. Incidentally, 1025 is composed of two bytes with values 1 and 4 - you can check: 1025 = 4*256 + 1. So when you move from the byte holding 1 to the byte holding 4, it looks like the value was incremented by 3.
The next byte is accessed by incrementing the pointer, not the value it points to. Incrementing the pointer makes it point to the next address, and so you see the value stored at that address.
Your output shows what is happening:
Address = 1204440828, value = 1
Address = 1204440829, value = 4
Notice that the address of what you are printing changes by one. So you are not adding 1 to a; you are looking at the value stored one byte past the start of a.
UPDATE: As a clarification: the pointer points to an int, but you are printing it as a series of byte values, so it is only a coincidence that the first byte of the value 1025 decodes to 1. If you change a to some other value you will get seemingly unrelated output. Try changing it to int a = 1035 and you will get 11 and 4 instead of 1 and 4.
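A quick way to see this for any value is to print every byte of a in a loop; a minimal sketch (assuming a 4-byte int):

#include <stdio.h>

int main(void)
{
    int a = 1035;   /* 0x040B */
    unsigned char *p0 = (unsigned char *) &a;

    /* On a little endian machine this prints 11 4 0 0. */
    for (size_t i = 0; i < sizeof a; i++)
        printf("byte %zu = %d\n", i, p0[i]);
    return 0;
}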

Why does this code produce the output 513?

I saw this question on my C language final exam; the output is 513 and I don't know why.
#include <stdio.h>

int main(void){
    char a[4] = {1, 2, 3, 4};
    printf("%d", *(short *) a);
}
Your array of bytes is (in hex):
[ 0x01, 0x02, 0x03, 0x04 ]
If you treat the start of the array not as an array of bytes, but as the start of a short, then your short is made of the bytes 0x01 0x02. Because your processor is little endian, it reads them backwards from how humans read them: we would read it as 0x0201, which is 513 in decimal.
If the system this code is being run on meets the following requirements:
Unaligned memory access is permitted (or a is guaranteed to be short-aligned)
Little-endian byte order is used
sizeof(short) == 2
CHAR_BIT == 8
Then dereferencing a short * pointer to the following memory:
| 0x01 | 0x02 | 0x03 | 0x04 |
Will give you 0x0201, or 513 in base 10.
Also, do note that even if all these requirements are met, aliasing a char [] array as a short * violates the strict aliasing rule.
The code casts your char * pointer to a short * one and prints the value it points to.
A short in C is typically represented in 2 bytes. The binary representation of the first two bytes of your array is 00000001 00000010, but because the processor is little endian it reads them as 00000010 00000001, which is 513 in decimal.
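If you want the same two-byte read without the alignment and strict aliasing concerns mentioned above, one common approach is to memcpy the bytes into a short. A sketch (still endianness-dependent, and assuming sizeof(short) == 2):

#include <stdio.h>
#include <string.h>

int main(void)
{
    char a[4] = {1, 2, 3, 4};
    short s;

    memcpy(&s, a, sizeof s);   /* copy the first two bytes of a into s */
    printf("%d\n", s);         /* 513 on a little endian machine */
    return 0;
}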

Adding one byte to a hexadecimal number

I seem to have confused myself so much that this doesn't make sense anymore.
1 byte = 8 bits.
So if I have a memory location such as
0xdeadbeef
3735928559 (base10)
1101 1110 1010 1101 1011 1110 1110 1111
Now if I add one byte to 0xdeadbeef, what is the binary sequence I'm adding? Is it 1000? If I add 1 bit, I get 0xdeadbef0, and if I add 1 bit 8 times, I get 0xdeadbef7. Which is correct?
I remember from microprocessors that the program counter increments as PC += 4, which gives 0xdeadbef3, so I'm not sure which is the right answer.
What I understand from your question is that you are confused between adding a bit and adding a byte to the counter.
Since memory addresses are measured in bytes (in programming languages), any arithmetic done on them is done in bytes.
Adding 1 to the counter moves it to the byte just after the base address: adding 1 to 0xdeadbeef gives 0xdeadbef0.
I'm referring to memory locations.
So 0xdeadbeef is an address. If you increment it by 1 byte, you simply add 1 to it.
i.e. 0xdeadbeef + 1 = 0xdeadbef0
Concluding: it looks like adding 1 to the address moves the pointer forward by 1 byte, because memory is accessed at byte granularity and addresses are counted in bytes. What you actually added is the number 1 (i.e. 0x00000001). If you want to move forward by 4 bytes, you add 4 to the address, because memory is addressed in units of bytes.
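A quick way to convince yourself is to do the arithmetic on the raw number; a minimal sketch using an unsigned long to stand in for the address:

#include <stdio.h>

int main(void)
{
    unsigned long addr = 0xdeadbeef;

    printf("0x%lx\n", addr + 1);   /* 0xdeadbef0: one byte further */
    printf("0x%lx\n", addr + 4);   /* 0xdeadbef3: four bytes further, like PC += 4 */
    return 0;
}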
What is "adding one byte"? A byte needs to have a value.
You need to add a number to a number, not a size (8 bits) to a number.
If the byte you are adding is 0x08 (1000 in binary), then
0xdeadbeef + 0x08 = 0xdeadbef7
If, as I gather, that value is a memory address, it is measured in "number of bytes after the base (zero) address": if you move one byte forward in memory, the memory address is incremented by one.
Think of it like this: if you measure the distance from the start of a road in meters, and you move forward one meter (your unit of measurement), the distance from the start increases by 1.
Be careful, though: pointers in C (and C++) work in a slightly confusing (at first) way: if your pointer is of type T *, each arithmetic operation on it is performed in units of T, so the underlying memory address moves around in steps of sizeof(T).
For example, if you have:
int a[2];
int *ptr=a;
int *ptr2=ptr+1;
printf("Delta in ints: %d", (int)(ptr2-ptr)); // will print 1
char *cptr=(char *)ptr;
char *cptr2=(char *)ptr2;
printf("Delta in chars: %d", (int)(cptr2-cptr)); // will print sizeof(int), typically 4
In computers each memory location holds 8 bits (1 byte) of data. So when we "add one byte" we simply add 1 to the memory address, which moves it one byte forward.
When you add 1 to a word-sized address, the 1 is interpreted as a word-sized string of bits with the decimal value 1.
In your case, 0xdeadbeef is 32 bits or 4 bytes. Adding 1 is like doing 0xdeadbeef + 0x00000001, and adding 300 is like 0xdeadbeef + 0x0000012C.
Just in case you're wondering whether you can add a value wider than the word size to the address: you can't, because the address itself can't be wider than the word size.

converting little endian hex to big endian decimal in C

I am trying to understand and implement a simple file system based on FAT12. I am currently looking at the following snippet of code and it's driving me crazy:
int getTotalSize(char * mmap)
{
    int *tmp1 = malloc(sizeof(int));
    int *tmp2 = malloc(sizeof(int));
    int retVal;

    *tmp1 = mmap[19];
    *tmp2 = mmap[20];
    printf("%d and %d read\n", *tmp1, *tmp2);
    retVal = *tmp1 + ((*tmp2) << 8);
    free(tmp1);
    free(tmp2);
    return retVal;
};
From what I've read so far, the FAT12 format stores integers in little endian format, and the code above is getting the size of the file system, which is stored in the 19th and 20th bytes of the boot sector.
However, I don't understand why retVal = *tmp1 + ((*tmp2) << 8); works. Is the bitwise << 8 converting the second byte to decimal, or to big endian format?
Why is it only doing it to the second byte and not the first one?
The bytes in question are (in little endian format):
40 0B
I tried converting them manually by switching the order first to
0B 40
and then converting from hex to decimal, and I get the right output. I just don't understand how adding the first byte to the shifted second byte does the same thing.
Thanks
The use of malloc() here is seriously facepalm-inducing. Utterly unnecessary, and a serious "code smell" (makes me doubt the overall quality of the code). Also, mmap clearly should be unsigned char (or, even better, uint8_t).
That said, the code you're asking about is pretty straightforward.
Given two byte-sized values a and b, there are two ways of combining them into a 16-bit value (which is what the code is doing): you can either consider a to be the least-significant byte, or b.
Using boxes, the 16-bit value can look either like this:
+---+---+
| a | b |
+---+---+
or like this, if you instead consider b to be the most significant byte:
+---+---+
| b | a |
+---+---+
The way to combine the lsb and the msb into a 16-bit value is simply:
result = (msb * 256) + lsb;
UPDATE: The 256 comes from the fact that that's the "worth" of each successively more significant byte in a multibyte number. Compare it to the role of 10 in a decimal number (to combine two single-digit decimal numbers c and d you would use result = 10 * c + d).
Consider msb = 0x01 and lsb = 0x00, then the above would be:
result = 0x1 * 256 + 0 = 256 = 0x0100
You can see that the msb byte ended up in the upper part of the 16-bit value, just as expected.
Your code is using << 8 to do bitwise shifting to the left, which is the same as multiplying by 2^8, i.e. 256.
Note that result above is a value, i.e. not a byte buffer in memory, so its endianness doesn't matter.
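For reference, here is how the same combination might look without the unnecessary malloc() and with an unsigned buffer; this is only a sketch, not a drop-in replacement for the original function (the parameter name boot is just for illustration):

#include <stdint.h>

/* Combine the little endian byte pair at offsets 19 and 20 into one value. */
int getTotalSize(const uint8_t *boot)
{
    uint8_t lsb = boot[19];   /* least significant byte */
    uint8_t msb = boot[20];   /* most significant byte */

    return lsb + (msb << 8);  /* same as lsb + msb * 256 */
}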
I see no problem combining individual digits or bytes into larger integers.
Let's do decimal with 2 digits: 1 (least significant) and 2 (most significant):
1 + 2 * 10 = 21 (10 is the system base)
Let's now do base-256 with 2 digits: 0x40 (least significant) and 0x0B (most significant):
0x40 + 0x0B * 0x100 = 0x0B40 (0x100=256 is the system base)
The problem, however, is likely lying somewhere else, in how 12-bit integers are stored in FAT12.
A 12-bit integer occupies 1.5 8-bit bytes. And in 3 bytes you have 2 12-bit integers.
Suppose, you have 0x12, 0x34, 0x56 as those 3 bytes.
In order to extract the first integer you only need to take the first byte (0x12) and the 4 least significant bits of the second (0x04) and combine them like this:
0x12 + ((0x34 & 0x0F) << 8) == 0x412
In order to extract the second integer you need to take the 4 most significant bits of the second byte (0x03) and the third byte (0x56) and combine them like this:
(0x56 << 4) + (0x34 >> 4) == 0x563
If you read the official Microsoft's document on FAT (look up fatgen103 online), you'll find all the FAT relevant formulas/pseudo code.
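Putting those two formulas into code, here is a sketch that extracts both 12-bit entries from a 3-byte group (assuming unsigned bytes):

#include <stdio.h>

int main(void)
{
    unsigned char fat[3] = {0x12, 0x34, 0x56};

    /* First entry: the whole first byte plus the low nibble of the second. */
    unsigned int first  = fat[0] + ((fat[1] & 0x0F) << 8);
    /* Second entry: the high nibble of the second byte plus the whole third byte. */
    unsigned int second = (fat[2] << 4) + (fat[1] >> 4);

    printf("0x%03X 0x%03X\n", first, second);   /* prints 0x412 0x563 */
    return 0;
}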
The << operator is the left shift operator. It takes the value to the left of the operator and shifts it by the number on the right side of the operator.
So in your case, it shifts the value of *tmp2 eight bits to the left, and combines it with the value of *tmp1 to generate a 16 bit value from two eight bit values.
For example, let's say you have the integer 1. This is, in 16-bit binary, 0000000000000001. If you shift it left by eight bits, you end up with the binary value 0000000100000000, i.e. 256 in decimal.
The presentation (i.e. binary, decimal or hexadecimal) has nothing to do with it. All integers are stored the same way on the computer.

Split up two byte char into two single byte chars

I have a char with the value 0xB3, say, and I need to split it into two separate chars, so that X = 0xB and Y = 0x3. I've tried the following code:
#include <stdio.h>

int main ()
{
    char addr = 0xB3;
    char *p = &addr;
    printf ("%c, %c\n", p[0], p[1]); //This prints ?, Y
    printf ("%X, %X\n", p[0], p[1]); //This prints FFFFFFB3, 59
    return 0;
}
Just to clarify, I need to take any 2 byte char of value 00 to FF and split the first and second byte into separate char's. Thanks.
Straight from Wikipedia:
#define HI_NIBBLE(b) (((b) >> 4) & 0x0F)
#define LO_NIBBLE(b) ((b) & 0x0F)
So HI_NIBBLE(addr) would be 0xB. However, 0x00 through 0xFF are not "double bytes"; they're single-byte values. A single hex digit can take on 16 values, while a byte can take on 256 = 16² of them, so you need two hex digits to represent arbitrary byte values.
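A minimal usage sketch of those macros:

#include <stdio.h>

#define HI_NIBBLE(b) (((b) >> 4) & 0x0F)
#define LO_NIBBLE(b) ((b) & 0x0F)

int main(void)
{
    unsigned char addr = 0xB3;

    printf("%X and %X\n", HI_NIBBLE(addr), LO_NIBBLE(addr));   /* prints B and 3 */
    return 0;
}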
There are quite a few problems here; let's take a look at your code:
int main ()
{
char addr = 0xB3;   <-- assigns 0xB3 in hex (179 in decimal) to addr
char *p = &addr;    <-- makes the pointer p point to addr
If addr were unsigned, it would now hold 179, the extended ASCII of │ (box drawing character).
A char can hold -127 to +127 if it's signed (commonly -128 to +127), or 0 to 255 if it's unsigned. Here (according to your output) it's signed, so 0xB3 doesn't fit and addr ends up with a negative value.
printf ("%c, %c\n", p[0], p[1]);   <-- prints the char value of what p points to,
and reading p[1] is undefined behaviour
printf ("%X, %X\n", p[0], p[1]);   <-- prints the hex value of what p points to,
and again reading p[1] is undefined behaviour
So your code prints the char value of your out-of-range addr variable, which happens to come out as '?' for you, and its hex value, FFFFFFB3, indicating you have a negative value (the uppermost bit is the sign bit).
This: p[0] is really an "add and dereference" operation, meaning that we take the address stored in p, add 0 to it, then dereference and look at the result:
p ---------+
V
------------------------------------------
| ptr(0xB3) | ? | ? | ... |
-------------------------------------------
0xbfd56c2b 0xbfd56c2C 0xbfd56c2d ...
When you do p[1] this goes one char or one byte past ptr and gives you that result. What's there? Don't know. That's out of your scope:
p+1 -------------------+
V
------------------------------------------
| ptr(0xB3) | ? | ? | ... |
-------------------------------------------
0xbfd56c2b 0xbfd56c2C 0xbfd56c2d ...
Y's ASCII value (in hex) is 0x59, so behind your pointer in memory there happened to be a Y. But it could have been anything; what you get is undefined. A correct way to do this would be:
#include <stdio.h>

int main ()
{
    unsigned char addr = 0xB3;
    char low = addr & 0x0F;
    char high = (addr >> 4) & 0x0F;
    printf("%#x becomes %#x and %#x\n", addr, high, low);
    return 0;
}
This works via:
low nibble:
  0xB3        1011 0011
& 0x0F      & 0000 1111
            -----------
              0000 0011  => 3

high nibble:
  0xB3 >> 4   0000 1011
& 0x0F      & 0000 1111
            -----------
              0000 1011  => B
Why do you need to go through a pointer? Just take the 4 relevant bits, shifting the most significant nibble down when needed:
char lower = value & 0x0F;
char higher = (value >> 4) & 0x0F;
Also, 0xB3 is a single byte, not two bytes. Since a hex digit can have 16 values, two digits can represent 16*16 = 256 values, which is exactly how much a byte can store.
OK, so you're trying to split 0xB3 into 0xB and 0x3. Just for future reference, don't say "byte chars": the two parts of a byte are commonly known as nibbles; a byte is made up of 2 nibbles (each of which is 4 bits).
If you didn't know, char refers to 1 byte.
So here are the problems with your code:
char addr = 0xB3; <---- Creates single byte with value 0xB3 - Good
char *p = &addr; <---- Creates pointer pointing to 0xB3 - Good
printf ("%c, %c\n", p[0], p[1]); <---- p[0], p[1] - Bad
printf ("%X, %X\n", p[0], p[1]); <---- p[0], p[1] - Bad
OK, so when you refer to p[0] and p[1] you're telling your system that the pointer p points to an array of chars (p[0] refers to 0xB3, but p[1] goes to the next byte in memory).
Example: this is roughly what your system memory could look like (real pointers are normally 8 bytes, but this example uses single-byte entries):

       Integer Values Area              Pointers Area
  0x01 0x02 0x03 0x04 0x05 0x06    0x12 0x13 0x14 0x15 0x16
 ------------------------------   -------------------------
  .... .... 0xB3 0x59 .... ....    .... .... 0x03 .... ....
 ------------------------------   -------------------------
            ^    ^                           ^
          addr   |                           p (an example pointer holding
                 |                              the example address 0x03)
    random byte that shows
    up when you read p[1]
So when you tell your system to get p[0] or *p (these do the same thing),
it goes to the address (e.g. 0x03) and reads one byte (because it's a char),
in this case 0xB3.
But when you try p[1] or *(p+1), it goes to that address, skips the first char and reads the next one, giving us 0x59, which belongs to some other variable.
OK, now that we've got that out of the way, how do you get the nibbles?
The problem with getting a nibble is that you generally can't put half a byte in a variable; there's no type that supports just 4 bits.
When you print with %x/%X, leading zero nibbles are not shown, e.g. 0x00230242 prints as just 230242. If you want the leading zeros you need a zero-padded field width: %02X prints one full byte (2 hex digits, including zeros) and %08X prints four full bytes.
So there is no conversion for printing a single nibble, but if you want to extract the nibbles you can do:
char addr = 0xB3;
char addr1 = ((addr >> 4) & 0x0F); // high nibble: 0xB
char addr2 = ((addr >> 0) & 0x0F); // low nibble:  0x3
