I've read several posts about casting int pointers to char pointers but i'm still confused on one thing.
I understand that integers take up four bytes of memory (on most 32 bit machines?) and characters take up on byte of memory. By casting a integer pointer to a char pointer, will they both contain the same address? Does the cast operation change the value of what the char pointer points to? ie, it only points to the first 8 bits of an integers and not all 32 bits ? I'm confused as to what actually changes when I cast an int pointer to char pointer.
By casting a integer pointer to a char pointer, will they both contain the same address?
Both pointers would point to the same location in memory.
Does the cast operation change the value of what the char pointer points to?
No, it changes the default interpretation of what the pointer points to.
When you read from an int pointer in an expression *myIntPtr you get back the content of the location interpreted as a multi-byte value of type int. When you read from a char pointer in an expression *myCharPtr, you get back the content of the location interpreted as a single-byte value of type char.
Another consequence of casting a pointer is in pointer arithmetic. When you have two int pointers pointing into the same array, subtracting one from the other produces the difference in ints, for example
int a[20] = {0};
int *p = &a[3];
int *q = &a[13];
ptrdiff_t diff1 = q - p; // This is 10
If you cast p and q to char, you would get the distance in terms of chars, not in terms of ints:
char *x = (char*)p;
char *y = (char*)q;
ptrdiff_t diff2 = y - x; // This is 10 times sizeof(int)
Demo.
The int pointer points to a list of integers in memory. They may be 16, 32, or possibly 64 bits, and they may be big-endian or little endian. By casting the pointer to a char pointer, you reinterpret those bits as characters. So, assuming 16 bit big-endian ints, if we point to an array of two integers, 0x4142 0x4300, the pointer is reinterpreted as pointing to the string "abc" (0x41 is 'a', and the last byte is nul). However if integers are little endian, the same data would be reinterpreted as the string "ba".
Now for practical purposes you are unlikely to want to reinterpret integers as ascii strings. However its often useful to reinterpret as unsigned chars, and thus just a stream of raw bytes.
Casting a pointer just changes how it is interpreted; no change to its value or the data it points to occurs. Using it may change the data it points to, just as using the original may change the data it points to; how it changes that data may differ (which is likely the point of doing the casting in the first place).
A pointer is a particular variable that stores the memory address where another variable begins. Doesnt matter if the variable is a int or a char, if the first bit has the same position in the memory, then a pointer to that variable will look the same.
the difference is when you operate on that pointer. If your pointer variable is p and it's a int pointer, then p++ will increase the address that it contains of 4 bytes.
if your pointer is p and it's a char pointer, then p++ will increase the address that it contains of 1 byte.
this code example will help you understand:
int main(){
int* pi;
int i;
char* pc;
char c;
pi = &i;
pc = &c;
printf("%p\n", pi); // 0x7fff5f72c984
pi++;
printf("%p\n", pi); // 0x7fff5f72c988
printf("%p\n", pc); // 0x7fff5f72c977
pc++;
printf("%p\n", pc); // 0x7fff5f72c978
}
Related
I am having a problem with my exercise in which I have to explain the running of pointers in C.
Can you explain what is the differences between char *pp and (char*) p and the outputs to me?
#include <stdio.h>
#include <stdlib.h>
/*
*
*/
int main(int argc, char** argv) {
int n=260, *p=&n;
printf("n=%d\n", n);
char *pp=(char*)p;
*pp=0;
printf("n=%d\n",n);
return (EXIT_SUCCESS);
}
n=260
n=256
I'm so sorry for the mistake I've done! Hope you guys can help me.
Your question is a basic question, but one that every new C-programmer wrestles with and is fundamental to understanding C. Understanding pointers. While they are easy to understand once you understand them, getting to that point can be frustrating based on the way the information is presented in many books or tutorials.
Pointer Basics
A pointer is simply a normal variable that holds the address of something else as its value. In other words, a pointer points to the address where something else can be found. Where you normally think of a variable holding an immediate values, such as int n = 260;, a pointer (e.g. int *p = &n;) would simply hold the address where 260 is stored in memory.
If you need to access the value stored at the memory address pointed to by p, you dereference p using the unary '*' operator, (e.g. int j = *p; will initialize j = 260).
If you want to obtain a variables address in memory, you use the & (address of) operator. If you need to pass a variable as a pointer, you simply provide the address of the variable as a parameter.
Since p points to the address where 260 is stored, if you change that value at that address (e.g. *p = 41;) 41 is now stored at the address where 260 was before. Since p points to the address of n and you have changed the value at that address, n now equals 41. However j resides in another memory location and its value was set before you changed the value at the address for n, the value for j remains 260.
Pointer Arithmetic
Pointer arithmetic works the same way regardless of the type of object pointed to because the type of the pointer controls the pointer arithmetic, e.g. with a char * pointer, pointer+1 points to the next byte (next char), for an int * pointer (normal 4-byte integer), pointer+1 will point to the next int at an offset 4-bytes after pointer. (so a pointer, is just a pointer.... where arithmetic is automatically handled by the type)
In your case you create a second pointer of a different type char *pp = (char*)p;. The pointer pp now also holds the address of n but it is interpreted at type char on access instead of type int.
The C standard prohibits access of a value stored at an address though a pointer of a different type. C11 Standard - §6.5 Expressions (p6,7) (known as the strict-aliasing rule). There are exceptions to the rule. One exception (the last point) is that any value may be accessed through a pointer of char type.
What Happens to the Value of n In Your Case?
When you assign:
*pp = 0;
you storing the single-byte 0 (or 00000000 in binary) to the memory location held by pp. Here is where endianess (little-endian, big-endian) come into play. Recall, for little-endian computers (just about all x86 and x86_64 IBM-PC clone type boxes), the values are stored in memory with the Least-Significant Byte first. (big-endian stores values with the Most-Significan Byte first). So your original value of n (10000100in binary) is stored in memory on a little-endian box as
n (little endian) : 00000100-00000001-00000000-00000000 (260)
^
|
p (type int)
The character pointer pp is assigned the address held by p, so both p and pp, hold the same address (the difference being one is a pointer to int the other a pointer to char:
n (little endian) : 00000100-00000001-00000000-00000000 (260)
^
|
p (type int)
pp (type char)
When you dereference pp (e.g. *pp) and assign the value zero (e.g. *pp = 0;), you overwrite the first byte of n in memory with zero. After the assignment, you now have:
n (little endian) : 00000000-00000001-00000000-00000000 (256)
^
|
p (type int)
pp (type char)
Which is the binary value 100000000, (256 or hex 0x0100) and what your code outputs for the value of n. Ask yourself this, if the computer you were using was big-endian, what would be resulting value have been?
Let me know if you have any further questions.
char *pp declares the variable pp as a pointer to char - pp will store the address of a char object.
(char *)p is a cast expression - it means “treat the value of p as a char *”.
p was declared as an int * - it stores the address of an int object (in this case, the address of n). The problem is that the char * and int *types are not compatible - you can’t assign one to the other directly1. You have to use a cast to convert the value to the right type.
Pointers to different types are themselves different types, and do not have to have the same size or representation. The one exception is the void * type - it was introduced specifically to be a “generic” pointer type, and you don’t need to explicitly cast when assigning between void * and other pointer types.
I am from Java back ground.I am learning C in which i gone through a code snippet for type conversion from int to char.
int a=5;
int *p;
p=&a;
char *a0;
a0=(char* )p;
My question is that , why we use (char *)p instead of (char)p.
We are only casting the 4 byte memory(Integer) to 1 byte(Character) and not the value related to it
You need to consider pointers as variable that contains addresses. Their sole purpose is to show you where to look in the memory.
so consider this:
int a = 65;
void* addr = &a;
now the 'addr' contains the address of the the memory where 'a' is located
what you do with it is up to you.
here I decided to "see" that part of the memory as an ASCII character that you could print to display the character 'A'
char* car_A = (char*)addr;
putchar(*car_A); // print: A (ASCII code for 'A' is 65)
if instead you decide to do what you suggested:
char* a0 = (char)addr;
The left part of the assignment (char)addr will cast a pointer 'addr' (likely to be 4 or 8 bytes) to a char (1 byte)
The right part of the assignment, the truncated address, will be assigned as the address of the pointer 'a0'
If you don't see why it doesn't make sense let me clarify with a concrete example
Say the address of 'a' is 0x002F4A0E (assuming pointers are stored on 4 bytes) then
'*addr' is equal to 65
'addr' is equal to 0x002F4A0E
When casting it like so (char)addr this become equal to 0x0E.
So the line
char* a0 = (char)addr;
become
char* a0 = 0x0E
So 'a0' will end up pointing to the address 0x0000000E and we don't know what is in this location.
I hope this clarify your problem
First of all, p is not necessarily 4 bytes since it's architecture-dependent. Second, p is a pointer to an integer, a0 is a pointer to a character, not a character. You're taking a pointer pointing to an integer and casting it to a pointer to a character. There are few good reasons to do this. You could also cast the value to a character, but I can't imagine any reason for doing this either.
Pointers do not provide information whether they point to a single object of first object of an array.
Consider
int *p;
int a[5] = { 1, 2, 3, 4, 5 };
int x = 1;
p = a;
p = &x;
So having a value in the pointer p you can not say whether the value is the address of the first element of the array a or it is the address of the single object x.
It is your responsibility to interpret the address correctly.
In this expression-statement
a0=(char* )p;
the address of the extent of memory pointed to by the pointer p and occupied by an object of the type int (it is unknown whether it is a single object or the first object of an array) is interpreted as an address of an extent of memory occupied by an object of the type char. Whether it is a single object of the type char or the first object of a character array with the size equal to sizeof( int ) depends on your intention that is how you are going to deal with the pointer.
I have the code :
#include<stdio.h>
void main(){
int i;
float a=5.2;
char *ptr;
ptr=(char *)&a;
for(i=0;i<=3;i++)
printf("%d ",*ptr++);
}
I got the output as 102 102 -90 64. I couldn't predict how it came, I get confused with this line ptr=(char *)&a;. Can anyone explain me what it does? And as like other variables the code *ptr++ increments? Or there is any other rule for pointers with this case.
I'm a novice in C, so explain the answer in simple terms. Thanks in advance.
This is called a cast. In C, a cast lets you convert or reinterpret a value from one type to another. When you take the address of the float, you get a float*; casting that to a char* gives you a pointer referring to the same location in memory, but pretending that what lives there is char data rather than float data.
sizeof(float) is 4, so printing four bytes starting from that location gives you the bytes that make up the float, according to IEEE-754 single-precision format. Some of the bytes have their high bits set, so when interpreted as signed char and then converted to int for display, they appear as negative values on account of their two's-complement representation.
The expression *ptr++ is equivalent to *(ptr++), which first increments ptr then dereferences its previous value; you can think of it as simultaneously dereferencing and advancing ptr.
That line casts the address of a, denoted &a, to a char*, i.e. a pointer to characters/bytes. The printf loop then prints the values of the four constituent bytes of a in decimal.
(Btw., it would have been neater if the loop had been
for (i=0; i<sizeof(a); i++)
printf("%d ", ptr[i]);
)
The line ptr=(char *)&a; casts the address of the float variable to a pointer of type char. Thus you are now interpreting the 4 bytes of which the float consists of as single bytes, which values you print with your for loop.
The statement *ptr++ post-increments the pointer after reading its value, which means, you read the value pointed to (the single bytes of the float) and then advance the pointer by the offset of one byte.
&a gets the address of a, that is, it yields a pointer of type float *. This pointer of type float * is then cast to a pointer of type char *.
Because sizeof(char) == 1, a can now be seen through ptr as a sequence of bytes.
This is useful when you want to abstract away the type of a variable and treat it as a finite sequence of bytes, especially applicable in serialisation and hashing.
I've constructed the following sections of code to help myself understand pointer dereferencing and typecasting in C.
char a = 'a';
char * b = &a;
int i = (int) *b;
For the above, I understand that on the 3rd line, I've dereferenced b and got 'a' and (int) will typecast the value of 'a' to its corresponding value of 97 which is stored into i. But for this section of code:
char a = 'a';
char * b = &a;
int i = *(int *)b;
This results in i being some arbitrary large number like 792351. I'm assuming this is a memory address but my question is why? When I typecast b to an integer pointer, does this actually cause b to point to a different area in memory? What is going on?
EDIT: If the above doesn't work, then why would something like this work:
char a = 'a';
void * b = &a;
char c = *(char *)b;
This correctly assigns 'a' to c.
Your int is larger than your char - you get the 'a' value + some random data following it in memory.
E.g, assuming this layout in memory:
'a'
0xFF
0xFF
0xFF
Your char * and int * both point to the 'a'. When you dereference the char *, you get only the first byte, the 'a'. When you dereference the int * (assuming your int is 32-bit) you get the 'a' and the 3 bytes of uninitialized data following it.
EDIT: In response to updated question:
In char c = *(char *)b;, b still points at the 'a' value. You cast it to a char *, and then dereference it, getting the char pointed to by a char *
The last line you're concerned about does a very bad thing. First, it treats b as an int* whereas b is a char*. That is, the memory pointer to by b is assumed as 4 bytes(typically) instead of 1 byte. So when you dereference it, it goes to the 1 byte pointed by the actual b, takes the following 3 bytes too, treats those 4 bytes as a single int, and gives you the result. That's why it's garbage.
In general, casting one pointer type to another pointer type must be done with great caution.
You're casting a char pointer to an int pointer. Characters are (usually) stored as 8 bits. ints, on the other hand, are 32 bits (or 64 on 64-bit systems). So if you look at the other 24 bits of memory next to the 8 bits worth of b, you'll get a bunch of extra bits that weren't initialized. Even the position of *b in i is architecture dependent.
big-endian: **** ****|**** ****|**** ****|0110 0001
little-endian: 0110 0001|**** ****|**** ****|**** ****
When you cast the character stored in the above, all the asterisks become relevant.
Since a char is 1 Byte long, and an int 4, when you read an int from the address of a single character, you're reading the character and 3 more bytes. The content of these bytes is just whatever happens to lie in memory (pointers, the value of b) and could even be unallocated (resulting in a segmentation fault).
When you type cast it to a (int *) type, it will refer to a total of 4 bytes(size if int) in memory.
In the second case, you're treating the same address as if it pointed to an int. Officially, the result is simply undefined behavior.
Realistically, what happens is that whatever happens to be in the four1 bytes starting at that address get interpreted as an int.
1 4 bytes assuming a 32-bit int -- if your implementation has, for example, a 64-bit int, it'll be 8 bytes.
Below is the program to find the size of a structure without using sizeof operator:
struct MyStruct
{
int i;
int j;
};
int main()
{
struct MyStruct *p=0;
int size = ((char*)(p+1))-((char*)p);
printf("\nSIZE : [%d]\nSIZE : [%d]\n", size);
return 0;
}
Why is typecasting to char * required?
If I don't use the char* pointer, the output is 1 - why?
Because pointer arithmetic works in units of the type pointed to. For example:
int* p_num = malloc(10 * sizeof(int));
int* p_num2 = p_num + 5;
Here, p_num2 does not point five bytes beyond p_num, it points five integers beyond p_num. If on your machine an integer is four bytes wide, the address stored in p_num2 will be twenty bytes beyond that stored in p_num. The reason for this is mainly so that pointers can be indexed like arrays. p_num[5] is exactly equivalent to *(p_num + 5), so it wouldn't make sense for pointer arithmetic to always work in bytes, otherwise p_num[5] would give you some data that started in the middle of the second integer, rather than giving you the sixth integer as you would expect.
In order to move a specific number of bytes beyond a pointer, you need to cast the pointer to point to a type that is guaranteed to be exactly 1 byte wide (a char).
Also, you have an error here:
printf("\nSIZE : [%d]\nSIZE : [%d]\n", size);
You have two format specifiers but only one argument after the format string.
If I don't use the char* pointer, the output is 1 - WHY?
Because operator- obeys the same pointer arithmetic rules that operator+ does. You incremented the sizeof(MyStruct) when you added one to the pointer, but without the cast you are dividing the byte difference by sizeof(MyStruct) in the operator- for pointers.
Why not use the built in sizeof() operator?
Because you want the size of your struct in bytes. And pointer arithmetics implicitly uses type sizes.
int* p;
p + 5; // this is implicitly p + 5 * sizeof(int)
By casting to char* you circumvent this behavior.
Pointer arithmetic is defined in terms of the size of the type of the pointer. This is what allows (for example) the equivalence between pointer arithmetic and array subscripting -- *(ptr+n) is equivalent to ptr[n]. When you subtract two pointers, you get the difference as the number of items they're pointing at. The cast to pointer to char means that it tells you the number of chars between those addresses. Since C makes char and byte essentially equivalent (i.e. a byte is the storage necessary for one char) that's also the number of bytes occupied by the first item.