Dereferencing and typecasting - c

I've constructed the following sections of code to help myself understand pointer dereferencing and typecasting in C.
char a = 'a';
char * b = &a;
int i = (int) *b;
For the above, I understand that on the 3rd line, I've dereferenced b and got 'a' and (int) will typecast the value of 'a' to its corresponding value of 97 which is stored into i. But for this section of code:
char a = 'a';
char * b = &a;
int i = *(int *)b;
This results in i being some arbitrary large number like 792351. I'm assuming this is a memory address but my question is why? When I typecast b to an integer pointer, does this actually cause b to point to a different area in memory? What is going on?
EDIT: If the above doesn't work, then why would something like this work:
char a = 'a';
void * b = &a;
char c = *(char *)b;
This correctly assigns 'a' to c.

Your int is larger than your char - you get the 'a' value + some random data following it in memory.
E.g, assuming this layout in memory:
'a'
0xFF
0xFF
0xFF
Your char * and int * both point to the 'a'. When you dereference the char *, you get only the first byte, the 'a'. When you dereference the int * (assuming your int is 32-bit) you get the 'a' and the 3 bytes of uninitialized data following it.
EDIT: In response to updated question:
In char c = *(char *)b;, b still points at the 'a' value. You cast it to a char *, and then dereference it, getting the char pointed to by a char *

The last line you're concerned about does a very bad thing. First, it treats b as an int* whereas b is a char*. That is, the memory pointer to by b is assumed as 4 bytes(typically) instead of 1 byte. So when you dereference it, it goes to the 1 byte pointed by the actual b, takes the following 3 bytes too, treats those 4 bytes as a single int, and gives you the result. That's why it's garbage.
In general, casting one pointer type to another pointer type must be done with great caution.

You're casting a char pointer to an int pointer. Characters are (usually) stored as 8 bits. ints, on the other hand, are 32 bits (or 64 on 64-bit systems). So if you look at the other 24 bits of memory next to the 8 bits worth of b, you'll get a bunch of extra bits that weren't initialized. Even the position of *b in i is architecture dependent.
big-endian: **** ****|**** ****|**** ****|0110 0001
little-endian: 0110 0001|**** ****|**** ****|**** ****
When you cast the character stored in the above, all the asterisks become relevant.

Since a char is 1 Byte long, and an int 4, when you read an int from the address of a single character, you're reading the character and 3 more bytes. The content of these bytes is just whatever happens to lie in memory (pointers, the value of b) and could even be unallocated (resulting in a segmentation fault).

When you type cast it to a (int *) type, it will refer to a total of 4 bytes(size if int) in memory.

In the second case, you're treating the same address as if it pointed to an int. Officially, the result is simply undefined behavior.
Realistically, what happens is that whatever happens to be in the four1 bytes starting at that address get interpreted as an int.
1 4 bytes assuming a 32-bit int -- if your implementation has, for example, a 64-bit int, it'll be 8 bytes.

Related

Incrementing pointer to pointer by one byte

#include <stdio.h>
int main(){
int a = 5;
int *p = &a;
int **pp = &p;
char **cp = (char **)pp;
cp++; // This still moves 8 bytes
return 0;
}
Since the size of a pointer is 64 bits on 64 bit machines, doing a pp++ will always move 8 bytes. Is there a way to make it move only 1 byte?
Is there a way to make it move only 1 byte?
Maybe.
All object pointers can be converted to void * and since char * has the same representation, to char *. ++ increments a char * by 1.
#include <stdio.h>
int main() {
int a = 5;
int *p = &a;
int **pp = &p;
char **cp = (char **)pp;
char *character_pointer = (char *) cp;
character_pointer++; // Increment by 1
Now is the tricky part. Can that incremented pointer convert back to a char **. C allows that unless
If the resulting pointer is not correctly aligned for the referenced type, the behavior is undefined. C17dr § 6.3.2.2 7
cp = (char **) character_pointer;
return 0;
}
Reading *cp can readily cause undefined behavior as cp does not certainly point to a valid char *. Unclear as to OP's goal at this point.
C is not assembly. What you are trying to do is undefined behavior, and compiler might not do what you ask, and the program might do anything, including possibly what you think it should do if C were just "assembly" with different syntax.
That being said, you can do this:
int a = 5;
int *p = &a;
int **pp = &p;
uintptr_t temp;
memcpy(&temp, &pp, sizeof temp);
temp++;
memcpy(&pp, &temp, sizeof temp);
Above code is likely to do what you want, even though that last memcpy already triggers undefined behavior, because it copies invalid value to a pointer (that is enough for it to be UB). Actually using pp, which now has invalid value, has increasing chance of messing things up.
To understand why having any UB is indeed UB: compiler is free to decide that the effect of the code, which can be proven to have UB, is nothing, or is never reached. So if that last memcpy is inside if, and compiler can prove UB occurs if condition is true, it may just assume condition is never true and optimize whole if away. Presumably C programmer knows to write their condition so that it would never result in UB, so this optimization can be made at compile time already.
Yeah, it is a bit crazy. C is not just assembly with different syntax!
Incrementing pointer to pointer by one byte
If you find an implementation where the size of a pointer to pointer variable contains only 8 bits, (i.e. one that uses 1 byte addressing, btw, very unlikely), then it will be doable, and only then would it be safe to do so. Otherwise it would not be considered a practical or safe thing to do.
For an implementation that uses 64 bit addressing, 64 bits are needed to represent each natural pointer location. Note however though _[t]he smallest incremental change is [available as a by-product of] the alignment needs of the referenced type. For performance, this often matches the width of the ref type, yet systems can allow less._ (per #Chux in comments) but de-referencing these locations could, and likely would lead to undefined behavior.
And in this statement
char **cp = (char **)pp; //where pp is defined as int **
the cast, although allowing a compile without complaining, is simply masking a problem. With the exception of void *, pointer variables are created using the same base type of the object they are to point to for the reason that the sizeof different types can be different, so the pointers designed to point to a particular type can represent its locations accurately.
It is also important to note the following:
sizeof char ** == sizeof char * == sizeof char *** !!= sizeof char`
32bit 4 bytes 4 bytes 4 bytes 1 byte
64bit 8 bytes 8 bytes 8 bytes 1 byte
sizeof int ** == sizeof int * == sizeof int *** !!= sizeof int`
32bit 4 bytes 4 bytes 4 bytes 4 bytes (typically)
64bit 8 bytes 8 bytes 8 bytes 4 bytes (typically)
So, unlike the type of a pointer, its size has little to do with it's ability to point to a location containing an object that is smaller, or even larger in size than the pointer used to point to it.
The purpose of a pointer ( eg char * ) is to store an address to an object of the same base type, in this case char. If targeting 32bit addressing, then the size of the pointer indicates it can point to 4,294,967,296 different locations (or if 64 bits to 18,446,744,073,709,551,616 locations.) and because in this case it is designed to point to char, each address differs by one byte.
But this really has nothing to do with your observation that when you increment a pointer to pointer to char that you see 8 bytes, and not 1 byte. It simply has to do with the fact that pointers, in 64bit addressing, require 8 bytes of space, thus the successive printf statements below will always show an increment of 8 bytes between the 1st and 2nd calls:
char **cp = (char **)pp;
size_t size = sizeof(cp);
printf("address of cp before increment: %p\n", cp);
cp++; // This still moves 8 bytes
printf("address of cp after increment: %p\n", cp);
return 0;

Difference between char *pp and (char*) p?

I am having a problem with my exercise in which I have to explain the running of pointers in C.
Can you explain what is the differences between char *pp and (char*) p and the outputs to me?
#include <stdio.h>
#include <stdlib.h>
/*
*
*/
int main(int argc, char** argv) {
int n=260, *p=&n;
printf("n=%d\n", n);
char *pp=(char*)p;
*pp=0;
printf("n=%d\n",n);
return (EXIT_SUCCESS);
}
n=260
n=256
I'm so sorry for the mistake I've done! Hope you guys can help me.
Your question is a basic question, but one that every new C-programmer wrestles with and is fundamental to understanding C. Understanding pointers. While they are easy to understand once you understand them, getting to that point can be frustrating based on the way the information is presented in many books or tutorials.
Pointer Basics
A pointer is simply a normal variable that holds the address of something else as its value. In other words, a pointer points to the address where something else can be found. Where you normally think of a variable holding an immediate values, such as int n = 260;, a pointer (e.g. int *p = &n;) would simply hold the address where 260 is stored in memory.
If you need to access the value stored at the memory address pointed to by p, you dereference p using the unary '*' operator, (e.g. int j = *p; will initialize j = 260).
If you want to obtain a variables address in memory, you use the & (address of) operator. If you need to pass a variable as a pointer, you simply provide the address of the variable as a parameter.
Since p points to the address where 260 is stored, if you change that value at that address (e.g. *p = 41;) 41 is now stored at the address where 260 was before. Since p points to the address of n and you have changed the value at that address, n now equals 41. However j resides in another memory location and its value was set before you changed the value at the address for n, the value for j remains 260.
Pointer Arithmetic
Pointer arithmetic works the same way regardless of the type of object pointed to because the type of the pointer controls the pointer arithmetic, e.g. with a char * pointer, pointer+1 points to the next byte (next char), for an int * pointer (normal 4-byte integer), pointer+1 will point to the next int at an offset 4-bytes after pointer. (so a pointer, is just a pointer.... where arithmetic is automatically handled by the type)
In your case you create a second pointer of a different type char *pp = (char*)p;. The pointer pp now also holds the address of n but it is interpreted at type char on access instead of type int.
The C standard prohibits access of a value stored at an address though a pointer of a different type. C11 Standard - §6.5 Expressions (p6,7) (known as the strict-aliasing rule). There are exceptions to the rule. One exception (the last point) is that any value may be accessed through a pointer of char type.
What Happens to the Value of n In Your Case?
When you assign:
*pp = 0;
you storing the single-byte 0 (or 00000000 in binary) to the memory location held by pp. Here is where endianess (little-endian, big-endian) come into play. Recall, for little-endian computers (just about all x86 and x86_64 IBM-PC clone type boxes), the values are stored in memory with the Least-Significant Byte first. (big-endian stores values with the Most-Significan Byte first). So your original value of n (10000100in binary) is stored in memory on a little-endian box as
n (little endian) : 00000100-00000001-00000000-00000000 (260)
^
|
p (type int)
The character pointer pp is assigned the address held by p, so both p and pp, hold the same address (the difference being one is a pointer to int the other a pointer to char:
n (little endian) : 00000100-00000001-00000000-00000000 (260)
^
|
p (type int)
pp (type char)
When you dereference pp (e.g. *pp) and assign the value zero (e.g. *pp = 0;), you overwrite the first byte of n in memory with zero. After the assignment, you now have:
n (little endian) : 00000000-00000001-00000000-00000000 (256)
^
|
p (type int)
pp (type char)
Which is the binary value 100000000, (256 or hex 0x0100) and what your code outputs for the value of n. Ask yourself this, if the computer you were using was big-endian, what would be resulting value have been?
Let me know if you have any further questions.
char *pp declares the variable pp as a pointer to char - pp will store the address of a char object.
(char *)p is a cast expression - it means “treat the value of p as a char *”.
p was declared as an int * - it stores the address of an int object (in this case, the address of n). The problem is that the char * and int *types are not compatible - you can’t assign one to the other directly1. You have to use a cast to convert the value to the right type.
Pointers to different types are themselves different types, and do not have to have the same size or representation. The one exception is the void * type - it was introduced specifically to be a “generic” pointer type, and you don’t need to explicitly cast when assigning between void * and other pointer types.

Type casting the character pointer

I am from Java back ground.I am learning C in which i gone through a code snippet for type conversion from int to char.
int a=5;
int *p;
p=&a;
char *a0;
a0=(char* )p;
My question is that , why we use (char *)p instead of (char)p.
We are only casting the 4 byte memory(Integer) to 1 byte(Character) and not the value related to it
You need to consider pointers as variable that contains addresses. Their sole purpose is to show you where to look in the memory.
so consider this:
int a = 65;
void* addr = &a;
now the 'addr' contains the address of the the memory where 'a' is located
what you do with it is up to you.
here I decided to "see" that part of the memory as an ASCII character that you could print to display the character 'A'
char* car_A = (char*)addr;
putchar(*car_A); // print: A (ASCII code for 'A' is 65)
if instead you decide to do what you suggested:
char* a0 = (char)addr;
The left part of the assignment (char)addr will cast a pointer 'addr' (likely to be 4 or 8 bytes) to a char (1 byte)
The right part of the assignment, the truncated address, will be assigned as the address of the pointer 'a0'
If you don't see why it doesn't make sense let me clarify with a concrete example
Say the address of 'a' is 0x002F4A0E (assuming pointers are stored on 4 bytes) then
'*addr' is equal to 65
'addr' is equal to 0x002F4A0E
When casting it like so (char)addr this become equal to 0x0E.
So the line
char* a0 = (char)addr;
become
char* a0 = 0x0E
So 'a0' will end up pointing to the address 0x0000000E and we don't know what is in this location.
I hope this clarify your problem
First of all, p is not necessarily 4 bytes since it's architecture-dependent. Second, p is a pointer to an integer, a0 is a pointer to a character, not a character. You're taking a pointer pointing to an integer and casting it to a pointer to a character. There are few good reasons to do this. You could also cast the value to a character, but I can't imagine any reason for doing this either.
Pointers do not provide information whether they point to a single object of first object of an array.
Consider
int *p;
int a[5] = { 1, 2, 3, 4, 5 };
int x = 1;
p = a;
p = &x;
So having a value in the pointer p you can not say whether the value is the address of the first element of the array a or it is the address of the single object x.
It is your responsibility to interpret the address correctly.
In this expression-statement
a0=(char* )p;
the address of the extent of memory pointed to by the pointer p and occupied by an object of the type int (it is unknown whether it is a single object or the first object of an array) is interpreted as an address of an extent of memory occupied by an object of the type char. Whether it is a single object of the type char or the first object of a character array with the size equal to sizeof( int ) depends on your intention that is how you are going to deal with the pointer.

casting int pointer to char pointer

I've read several posts about casting int pointers to char pointers but i'm still confused on one thing.
I understand that integers take up four bytes of memory (on most 32 bit machines?) and characters take up on byte of memory. By casting a integer pointer to a char pointer, will they both contain the same address? Does the cast operation change the value of what the char pointer points to? ie, it only points to the first 8 bits of an integers and not all 32 bits ? I'm confused as to what actually changes when I cast an int pointer to char pointer.
By casting a integer pointer to a char pointer, will they both contain the same address?
Both pointers would point to the same location in memory.
Does the cast operation change the value of what the char pointer points to?
No, it changes the default interpretation of what the pointer points to.
When you read from an int pointer in an expression *myIntPtr you get back the content of the location interpreted as a multi-byte value of type int. When you read from a char pointer in an expression *myCharPtr, you get back the content of the location interpreted as a single-byte value of type char.
Another consequence of casting a pointer is in pointer arithmetic. When you have two int pointers pointing into the same array, subtracting one from the other produces the difference in ints, for example
int a[20] = {0};
int *p = &a[3];
int *q = &a[13];
ptrdiff_t diff1 = q - p; // This is 10
If you cast p and q to char, you would get the distance in terms of chars, not in terms of ints:
char *x = (char*)p;
char *y = (char*)q;
ptrdiff_t diff2 = y - x; // This is 10 times sizeof(int)
Demo.
The int pointer points to a list of integers in memory. They may be 16, 32, or possibly 64 bits, and they may be big-endian or little endian. By casting the pointer to a char pointer, you reinterpret those bits as characters. So, assuming 16 bit big-endian ints, if we point to an array of two integers, 0x4142 0x4300, the pointer is reinterpreted as pointing to the string "abc" (0x41 is 'a', and the last byte is nul). However if integers are little endian, the same data would be reinterpreted as the string "ba".
Now for practical purposes you are unlikely to want to reinterpret integers as ascii strings. However its often useful to reinterpret as unsigned chars, and thus just a stream of raw bytes.
Casting a pointer just changes how it is interpreted; no change to its value or the data it points to occurs. Using it may change the data it points to, just as using the original may change the data it points to; how it changes that data may differ (which is likely the point of doing the casting in the first place).
A pointer is a particular variable that stores the memory address where another variable begins. Doesnt matter if the variable is a int or a char, if the first bit has the same position in the memory, then a pointer to that variable will look the same.
the difference is when you operate on that pointer. If your pointer variable is p and it's a int pointer, then p++ will increase the address that it contains of 4 bytes.
if your pointer is p and it's a char pointer, then p++ will increase the address that it contains of 1 byte.
this code example will help you understand:
int main(){
int* pi;
int i;
char* pc;
char c;
pi = &i;
pc = &c;
printf("%p\n", pi); // 0x7fff5f72c984
pi++;
printf("%p\n", pi); // 0x7fff5f72c988
printf("%p\n", pc); // 0x7fff5f72c977
pc++;
printf("%p\n", pc); // 0x7fff5f72c978
}

Getting the size of a struct value

Example
#include <stdio.h>
struct A {
char *b;
};
int main(int argc, char *argv[]) {
char c[4] = { 'c', 'a', 't', '\0' };
struct A a;
a.b = c;
printf("%s\n", a.b); // cat
printf("%lu\n", sizeof c); // 4
printf("%lu\n", sizeof a.b); // 8 ???
}
Why does sizeof a.b returns 8 and not 4? If I understood correctly, a.b returns the value that was assigned to it, which is c. But shouldn't it return the size of c (which is 4) then?
sizeof() operator gives the number of bytes allocated to the object and in your case the object is a pointer whose size looks like is 8 bytes on your system.
You're calling sizeof() on two different types.
sizeof(a.b) is sizeof(char *), which is 8 on your platform.
sizeof(c) is sizeof(char[4]), which is 4.
We can have pointers point to arrays via array decaying, which you can read about in this other answer: What is array decaying?
First of all sizeof(a.b) is not size of c. It doesn't give size of what it is pointing to, rather it is size of the pointer.
Take an example of char:
size of char a is 1
and char *b is 4. (on 64 bit)
So it is size of the pointer not what it points to. Please note these sizes are platform dependent.
Although don't get confused by int. An int and int * are of same size on some platforms.
If I understood correctly, a.b returns the value that was assigned to it,
Not exactly. a.b is what's called an lvalue. This means that it designates a memory location. However it does not read that memory location yet; that will only happen if we use a.b within a larger context that expects the memory location to be read.
For example:
a.b = something; // does not read a.b
something = a.b; // does read a.b
The case of sizeof is one context where it does not read the memory location. In fact it tells you how many bytes comprise that memory location; it doesn't tell you anything about what is stored there (let alone about some other memory location that might be pointed to by what is stored there, if it is a pointer).
The output is telling you that your system uses 8 bytes to store a pointer.
sizeof() returns the number of bytes of a variable.
In this case sizeof ( char * ) returns 8 bytes which is the number of bytes that compose a pointer.

Resources