int pointer to void pointer to char array somehow? - c

I'm having trouble understanding the following code:
char arr[] = {0,1,2,3};
void *vp;
int *ip;
vp = arr;
ip = vp;
How is it possible that the int pointer ip can point to an element of arr if the array itself it's char? As I understand, while defining the pointer it must the same type as the data it poins to, but why in this case is it possible?

How is it possible that the int pointer ip can point to an element of arr if the array itself it's char?
It is not possible. The code presented is valid, but the effect is not to store in ip a pointer to any element of array arr. By definition, ip points to an int, and the elements of arr are not ints, so ip does not point to any of them.
As I understand, while defining the pointer it must the same type as the data it poins to, but why in this case is it possible?
Because pointer values are not necessarily valid (for their types, or even generally). The result of the code presented is not that ip points to a char, but that attempting to dererference its value produces undefined behavior. The same would be true after ip = NULL;, though the undefined behavior resulting from evaluating *ip would likely be different in the two cases.
You can characterize that as ip not pointing to anything, though I suppose you will recognize that that's an incomplete description. The effect of ip = vp is generally not the same as that of ip = NULL.
More broadly, it is essential to understand that pointers are their own kind of values, distinct from and not inherently connected to the objects, if any, to which they point.

This code
char arr[] = {0,1,2,3};
void *vp;
int *ip;
vp = arr;
ip = vp;
can invoke undefined behavior as it is.
Per 6.3.2.3 Pointers, paragraph 7 of the (draft) C11 standard (bolding mine):
A pointer to an object type may be converted to a pointer to a different object type. If the resulting pointer is not correctly aligned for the referenced type, the behavior is undefined.
Note that no dereference of the pointer is necessary - merely creating a misaligned pointer invokes undefined behavior.
Since there's no guarantee that arr is properly aligned for an int value, undefined behavior can not be ruled out.
And if you were to actually dereference ip and access arr as if it were an int value, that would be violating strict aliasing and would always be undefined behavior.

This code does "pointer punning". It is one of the way of using the binary representation of one type as another type.
In C it violates the strict alising rules and it is Undefined Behavior.
In your case (assuming 4 bytes int) the content of the char array is going to be interpreted as int value.
int main(void)
{
char arr[] = {0,1,2,3};
int *ip = (int *)arr;
printf("%d - 0x%08x\n", *ip, *ip);
}
output
50462976 - 0x03020100
The hex value is easier to interpret as two digits of the number represent one byte in the memory. As you see the the number is "reversed" comparing to the representation in the memory. It is because PC computers are little endian and least significant byte is store first. https://en.wikipedia.org/wiki/Endianness
To avoid Undefined Behavior you should copy the array into the integer
int main(void)
{
char arr[] = {0,1,2,3};
int ip;
memcpy(&ip, arr, sizeof(ip));
printf("%d - 0x%08x\n", ip, ip);
}
How is it possible that the int pointer ip can point to an element of
arr if the array itself it's char?
In the end, any object (it does not matter what its type is) is just a bunch of chars (bytes) stored in the memory. Your pointer is referencing some location in memory.

Related

Effect of casting in a C program

I somehow understand the idea about casting, but still have the following questions about it, when we cast a variable from one type to another:
Does casting change the actual data type generally? Like eg. if we have char *name ="james" and we cast the char pointer ==> int *name = (int *) name
Do the types of all fields (members) also change in case of a structure data type in C? I.e. if we have a struct student {int id, char *name} and there is a pointer to an instance of struct student and type cast it to another type, do the fields also change?
From another answer here,
casting (also called type coercion) does not change the data - the
underlying bits - but it changes the type, i.e., how those bits are
interpreted.
In other words, a cast tells the compiler:
"You would think the expression has this type, but I say you to use the expression as it had this different type"
At this point the compiler says "Ok" and compile code to do what you ask for with the cast. Take this example (probably there are better ones):
int a,b;
int *p;
a=256; // does not fit in a single byte
p=&a; // a pointer to a
b=*p; // <- now b is 256, the value of a
b=*(char *) p; // <- now b is 0 (probably)
The two assignments to "b" are different. In the first case the compiler nows that p points to integers, so it will move in b 4 or 8 bytes taken from a.
The second assignment, with the cast, tells the compiler to act as p was not a pointer to an integer, but a pointer to a char. And the compiler obeys, and takes only one byte from the value of a (pointed to by p).
Does casting change the actual data type generally? Like eg. if we have char *name ="james" and we cast the char pointer ==> int *name = (int *) name
Casting a pointer does not change the memory the pointer points to.
When a pointer, say p, is dereferenced, as with *p, and used to get a value from memory, the type of the pointer determines how the compiler interprets the memory. If p is a char *, then using *p for its value loads one byte from memory and interprets it as a char value. If p is an int *, then using *p for its value loads as many bytes from memory as an int uses and interprets them as an int value. If p is an int * and we use * (char *) p for its value, then (char *) p is a char *, so * (char *) p loads one byte from memory and interprets it as a char value.
Conversely, when using *p to store a value to memory, the value will be encoded according to the rules for the *p type, and the resulting bytes will be written to memory.
The C standard has rules about which types may be used to access memory that has been established to contain data of another type. If those rules are not followed, the behavior of the program is not defined by the C standard. Using an int * converted from a pointer to a char in an array of char is not one of the defined uses. Nominally, it asks the compiler to interpret the bytes of the array as if they encoded an int value, but, because the code is not following the rules of the C standard, the compiler might or might not do that.
Do the types of all fields (members) also change in case of a structure data type in C? I.e. if we have a struct student {int id, char *name} and there is a pointer to an instance of struct student and type cast it to another type, do the fields also change?
Casting a pointer does not change the memory the pointer points to. If you cast a pointer to one structure type to a pointer to another structure type and attempt to access memory with it, it might not work as you desire.

Difference between type casting (char*) and (size_t)

I was trying to find the difference of two pointers by subtraction, but one is int * and other is char *. As a result it gave me an error, as I expected, because of incompatible pointer type.
int main() {
char * ca="test";
int *ia=malloc(12);
*ia=45;
printf("add char * =%p, add int = %p \n", ca, ia);
printf("add ca-va * =%p\n", ca-ia);
return(0);
}
test3.c:22:35: error: invalid operands to binary - (have ‘char *’
and ‘int *’)
However, when I type cast int* to size_t I was successfully able to subtract the address. Can some explain what exactly size_t did here?
int main() {
char * ca="test";
int *ia=malloc(12);
*ia=45;
printf("add char * =%p, add int = %p \n", ca, ia);
printf("add ca-va * =%p\n", (ca-(size_t)ia));
return(0);
}
You have 2 problems here:
The difference between two pointer values is counted in units of the data type the pointers point to. This cannot work if you have two different data types.
Pointer arithmetics is only allowed within the same data object. You may only subtract pointers that point to the same array or to one block of dynamically allocated memory.
This is not the case in your code.
Subtracting pointers that do not match those criterias doesn't make much sense anyway.
The compiler is right to complain.
This is just pointer arithmetic.
For some pointer ptr and integer offset, ptr - offset means the address offset elements before ptr. Note that this is elements (whatever the pointer points to), not bytes. You can also use addition here. ptr[i] is shorthand for *(ptr + i).
For two pointers of the same type (e.g. both char*), ptr1 - ptr2 means the number of elements between the 2 pointers. e.g. if ptr1 - ptr2 == 5, then ptr1 + 5 == ptr2.
For two pointers of different types (e.g. char* and int*) ptr1 - ptr2 doesn't make any sense.
In your first piece of code the error occurs because you're trying to subtract pointers of different types. The second piece of code works because your cast is causing it to use the ptr - offset version. But this is is certainly not what you actually want because a pointer was converted to an offset and the result is a pointer.
What you probably want is something that Paul Hankin mentioned in a comment:
intptr_t pc = (intptr_t)ca;
intptr_t pa = (intptr_t)ia;
printf("add ca-va = %" PRIdPTR "\n", pc - pa);
This converts the pointers into integer types capable of holding an address and then does the subtraction. You will need to #include <inttypes.h> to get PRIdPTR (inttypes.h internally includes stdint.h which provides intptr_t).
size_t is an integer type. When a pointer is converted to an integer type, the result is implementation-defined (if it it can fit in the destination type; otherwise the behavior is not defined by the C standard).
Per a non-normative note in the C standard, “The mapping functions for converting a pointer to an integer or an integer to a pointer are intended to be consistent with the addressing structure of the execution environment.” On machines with simple memory address schemes, the result of converting a pointer to an integer is typically the memory address. The remainder of this answer will assume we have such a C implementation.
Thus, if ca points to an array of char at address 9678, and ia points to some allocated memory at 4444, the result of converting ia to size_t would be 4444. Then, when 4444 is subtracted from ca, we are not subtracting two pointers but rather are subtracting an integer from a pointer. In general, the behavior of this is not defined by the C standard, because you are only allowed to add and subtract integers to pointers within the bounds of one array, and 4444 is far outside of ca in this example. However, what the compiler may do is simply convert the integer to the size of the pointed-to elements and then subtract the result from the address. Since ca points to char, and the size of char is one byte, converting 4444 to the size of 4444 char elements is simply 4444 bytes. Then 9678−4444 is 5234, so the result is a pointer that points to address 5234.
When you need to convert a pointer to an integer, there is a better type for this, uintptr_t, defined in the <stdint.h> header. (Comments have pointed out intptr_t, but you should use the unsigned version unless there is specific reason to use the signed version.) Then, if you convert both pointers to uintptr_t, as with (uintptr_t) ca - (uintptr_t) ia you will avoid the problem of the first pointer possibly pointing to some type whose size is not one byte. Then result on machines with flat memory address spaces will typically be the difference between the two addresses.
Since implementation-defined and undefined behavior are involved here, this is not something you can rely on, and you should not manipulate pointers this way in normal code.

Why are C pointers recommended to be the type of the data they're pointing to, if they're 8 bytes large regardless?

So I'm learning about C pointers, and I'm a little confused. Pointers just point to specific memory address.
sizeof(char*), sizeof(int*), sizeof(double*) all output 8. So they all take 8 bytes to store a memory address.
However, if I try to compile something like this:
int main(void)
{
char letter = 'A';
int *a = &letter;
printf("letter: %c\n", *a);
}
I get a warning from the compiler (gcc):
warning: initialization from incompatible pointer type [-Wincompatible-pointer-types]
int *a = &letter;
However, char *a = &letter;, doesn't result in a warning.
Why does the type of the pointer matter, if it's 8 bytes long anyway? Why does declaring a pointer with a type different than the type of data it's pointing to yield in a warning?
The issue isn't about the size of the pointer - it's about the type of the pointee.
If you have a pointer to an int, the pointer takes up some number of bytes (seems like you have a 64-bit system, where that pointer takes up eight bytes). However, if you dereference that pointer to read or write what it points to, because the type of the pointer is int*, the read or write will try to manipulate sizeof(int) bytes at the target, and it will try to manipulate them as though they're an int.
If you have a single object of type char, which by definition has size 1, and you try to read or write it through a pointer of type int, which (on many systems) has size 4, then reading the pointer will pull back some garbage data along with the char and writing to the pointer will clobber random regions of memory around that char with unrelated values.
Additionally, C has a rule called the strict aliasing rule that says that you are not allowed to read or write through a pointer of a type that doesn't match the type of what's being pointed at (unless the pointer is of type char *, signed char*, or unsigned char*). Breaking strict aliasing can mess up all sorts of compiler optimizations and lead to code that doesn't behave as expected.
So in short, the size of the pointer really isn't the issue here. It's the semantics about what happens when you try to read or write what's being pointed at.
Think about what you're going to do with the pointer.
int n = 42;
char *p = &n; // BAD
If this compiles (a compiler can reject it outright rather than printing a non-fatal warning), you have a pointer that points to the memory occupied by the int object n. How are you going to get the value of that object? *p gives you a char result, most likely the first byte of n -- which may be the high-order byte or the low-order byte.
Pointer types depend on the type of object they point to so that you can access that object.
(Also, don't make assumptions based on the behavior of your particular implementation. 32-bit systems have 4-byte pointers, and the language doesn't guarantee that all pointers are the same size.)
It's not about bytes length but about the type of what you're pointing to. Char and int are completly different. Moreover, sizeof(char) equal 1 and sizeof(int *) equal 8.

casting int pointer to char pointer

I've read several posts about casting int pointers to char pointers but i'm still confused on one thing.
I understand that integers take up four bytes of memory (on most 32 bit machines?) and characters take up on byte of memory. By casting a integer pointer to a char pointer, will they both contain the same address? Does the cast operation change the value of what the char pointer points to? ie, it only points to the first 8 bits of an integers and not all 32 bits ? I'm confused as to what actually changes when I cast an int pointer to char pointer.
By casting a integer pointer to a char pointer, will they both contain the same address?
Both pointers would point to the same location in memory.
Does the cast operation change the value of what the char pointer points to?
No, it changes the default interpretation of what the pointer points to.
When you read from an int pointer in an expression *myIntPtr you get back the content of the location interpreted as a multi-byte value of type int. When you read from a char pointer in an expression *myCharPtr, you get back the content of the location interpreted as a single-byte value of type char.
Another consequence of casting a pointer is in pointer arithmetic. When you have two int pointers pointing into the same array, subtracting one from the other produces the difference in ints, for example
int a[20] = {0};
int *p = &a[3];
int *q = &a[13];
ptrdiff_t diff1 = q - p; // This is 10
If you cast p and q to char, you would get the distance in terms of chars, not in terms of ints:
char *x = (char*)p;
char *y = (char*)q;
ptrdiff_t diff2 = y - x; // This is 10 times sizeof(int)
Demo.
The int pointer points to a list of integers in memory. They may be 16, 32, or possibly 64 bits, and they may be big-endian or little endian. By casting the pointer to a char pointer, you reinterpret those bits as characters. So, assuming 16 bit big-endian ints, if we point to an array of two integers, 0x4142 0x4300, the pointer is reinterpreted as pointing to the string "abc" (0x41 is 'a', and the last byte is nul). However if integers are little endian, the same data would be reinterpreted as the string "ba".
Now for practical purposes you are unlikely to want to reinterpret integers as ascii strings. However its often useful to reinterpret as unsigned chars, and thus just a stream of raw bytes.
Casting a pointer just changes how it is interpreted; no change to its value or the data it points to occurs. Using it may change the data it points to, just as using the original may change the data it points to; how it changes that data may differ (which is likely the point of doing the casting in the first place).
A pointer is a particular variable that stores the memory address where another variable begins. Doesnt matter if the variable is a int or a char, if the first bit has the same position in the memory, then a pointer to that variable will look the same.
the difference is when you operate on that pointer. If your pointer variable is p and it's a int pointer, then p++ will increase the address that it contains of 4 bytes.
if your pointer is p and it's a char pointer, then p++ will increase the address that it contains of 1 byte.
this code example will help you understand:
int main(){
int* pi;
int i;
char* pc;
char c;
pi = &i;
pc = &c;
printf("%p\n", pi); // 0x7fff5f72c984
pi++;
printf("%p\n", pi); // 0x7fff5f72c988
printf("%p\n", pc); // 0x7fff5f72c977
pc++;
printf("%p\n", pc); // 0x7fff5f72c978
}

Confusion with pointers

I am trying to learn C. The reading I've been doing explains pointers as such:
/* declare */
int *i;
/* assign */
i = &something;
/* or assign like this */
*i = 5;
Which I understand to mean i = the address of the thing stored in something
Or
Put 5, or an internal representation of 5, into the address that *i points to.
However in practice I am seeing:
i = 5;
Should that not cause a mismatch of types?
Edit: Semi-colons. Ruby habits..
Well, yes, in your example setting an int pointer to 5 is a mismatch of types, but this is C, so there's nothing stopping you. This will probably cause faults. Some real hackery could be expecting some relevant data at the absolute address of 5, but you should never do that.
The English equivalents:
i = &something
Assign i equal to the address of something
*i =5
Assign what i is pointing to, to 5.
If you set i = 5 as you wrote in your question, i would contain the address 0x00000005, which probably points to garbage.
Hope this helps explain things:
int *i; /* declare 'i' as a pointer to an integer */
int something; /* declare an integer, and set it to 42 */
something = 42;
i = &something; /* now this contains the address of 'something' */
*i = 5; /* change the value, of the int that 'i' points to, to 5 */
/* Oh, and 'something' now contains 5 rather than 42 */
If you're seeing something along the lines of
int *i;
...
i = 5;
then somebody is attempting to assign the address 0x00000005 to i. This is allowed, although somewhat dangerous (N1256):
6.3.2.3 Pointers
...
3 An integer constant expression with the value 0, or such an expression cast to type
void *, is called a null pointer constant.55) If a null pointer constant is converted to a
pointer type, the resulting pointer, called a null pointer, is guaranteed to compare unequal to a pointer to any object or function.
...
5 An integer may be converted to any pointer type. Except as previously specified, the
result is implementation-defined, might not be correctly aligned, might not point to an
entity of the referenced type, and might be a trap representation.56)
...
55) The macro NULL is defined in <stddef.h> (and other headers) as a null pointer constant; see 7.17.
56) The mapping functions for converting a pointer to an integer or an integer to a pointer are intended to be consistent with the addressing structure of the execution environment.
Depending on the architecture and environment you're working in, 0x00000005 may not be a valid integer address (most architectures I'm familiar with require multibyte types to start with even addresses) and such a low address may not be directly accessible by your code (I don't do embedded work, so take that with a grain of salt).
I understand to mean i = the address of the thing stored in something
Actually i contains an address, which SHOULD be the address of a variable containing an int.
I said should because you can't be sure of that in C:
char x;
int *i;
i = (int *)&x;
if i is a pointer, than assign to it something different to a valid address accessible from you program, is an error an I think could lead to undefined behavior:
int *i;
i = 5;
*i; //undefined behavior..probably segfault
here's some examples:
int var;
int *ptr_to_var;
var = 5;
ptr_to_var = var;
printf("var %d ptr_to_var %d\n", var, *ptr_to_var); //both print 5
printf("value of ptr_to_var %p must be equal to pointed variable var %p \n" , ptr_to_var, &var);
I hope this helps.
This declares a variable name "myIntPointer" which has type "pointer to an int".
int *myIntPointer;
This takes the address of an int variable named "blammy" and stores it in the int pointer named "myIntPointer".
int blammy;
int *myIntPointer;
myIntPointer = &blammy;
This takes an integer value 5 and stores it in the space in memory that is addressed by the int variable named "blammy" by assigning the value through an int pointer named "myIntPointer".
int blammy;
int *myIntPointer;
myIntPointer = &blammy;
*myIntPointer = 5;
This sets the int pointer named "myIntPointer" to point to memory address 5.
int *myIntPointer;
myIntPointer = 5;
assignment of hard-coded addresses, is something that shouldn't be done (even in the embedded world, however there are some cases where it's suitable.)
when declaring a pointer, limit yourself to only assign a value to it with dynamiclly allocated memory(see malloc()) or with the & (the address) of a static (not temporary) variable. this will ensure rebust code, and less chance to get the famous segmentation fault.
good luck with learning c.

Resources