Incrementing pointer to pointer by one byte - c

#include <stdio.h>
int main(){
int a = 5;
int *p = &a;
int **pp = &p;
char **cp = (char **)pp;
cp++; // This still moves 8 bytes
return 0;
}
Since the size of a pointer is 64 bits on 64 bit machines, doing a pp++ will always move 8 bytes. Is there a way to make it move only 1 byte?

Is there a way to make it move only 1 byte?
Maybe.
All object pointers can be converted to void * and since char * has the same representation, to char *. ++ increments a char * by 1.
#include <stdio.h>
int main() {
int a = 5;
int *p = &a;
int **pp = &p;
char **cp = (char **)pp;
char *character_pointer = (char *) cp;
character_pointer++; // Increment by 1
Now is the tricky part. Can that incremented pointer convert back to a char **. C allows that unless
If the resulting pointer is not correctly aligned for the referenced type, the behavior is undefined. C17dr § 6.3.2.2 7
cp = (char **) character_pointer;
return 0;
}
Reading *cp can readily cause undefined behavior as cp does not certainly point to a valid char *. Unclear as to OP's goal at this point.

C is not assembly. What you are trying to do is undefined behavior, and compiler might not do what you ask, and the program might do anything, including possibly what you think it should do if C were just "assembly" with different syntax.
That being said, you can do this:
int a = 5;
int *p = &a;
int **pp = &p;
uintptr_t temp;
memcpy(&temp, &pp, sizeof temp);
temp++;
memcpy(&pp, &temp, sizeof temp);
Above code is likely to do what you want, even though that last memcpy already triggers undefined behavior, because it copies invalid value to a pointer (that is enough for it to be UB). Actually using pp, which now has invalid value, has increasing chance of messing things up.
To understand why having any UB is indeed UB: compiler is free to decide that the effect of the code, which can be proven to have UB, is nothing, or is never reached. So if that last memcpy is inside if, and compiler can prove UB occurs if condition is true, it may just assume condition is never true and optimize whole if away. Presumably C programmer knows to write their condition so that it would never result in UB, so this optimization can be made at compile time already.
Yeah, it is a bit crazy. C is not just assembly with different syntax!

Incrementing pointer to pointer by one byte
If you find an implementation where the size of a pointer to pointer variable contains only 8 bits, (i.e. one that uses 1 byte addressing, btw, very unlikely), then it will be doable, and only then would it be safe to do so. Otherwise it would not be considered a practical or safe thing to do.
For an implementation that uses 64 bit addressing, 64 bits are needed to represent each natural pointer location. Note however though _[t]he smallest incremental change is [available as a by-product of] the alignment needs of the referenced type. For performance, this often matches the width of the ref type, yet systems can allow less._ (per #Chux in comments) but de-referencing these locations could, and likely would lead to undefined behavior.
And in this statement
char **cp = (char **)pp; //where pp is defined as int **
the cast, although allowing a compile without complaining, is simply masking a problem. With the exception of void *, pointer variables are created using the same base type of the object they are to point to for the reason that the sizeof different types can be different, so the pointers designed to point to a particular type can represent its locations accurately.
It is also important to note the following:
sizeof char ** == sizeof char * == sizeof char *** !!= sizeof char`
32bit 4 bytes 4 bytes 4 bytes 1 byte
64bit 8 bytes 8 bytes 8 bytes 1 byte
sizeof int ** == sizeof int * == sizeof int *** !!= sizeof int`
32bit 4 bytes 4 bytes 4 bytes 4 bytes (typically)
64bit 8 bytes 8 bytes 8 bytes 4 bytes (typically)
So, unlike the type of a pointer, its size has little to do with it's ability to point to a location containing an object that is smaller, or even larger in size than the pointer used to point to it.
The purpose of a pointer ( eg char * ) is to store an address to an object of the same base type, in this case char. If targeting 32bit addressing, then the size of the pointer indicates it can point to 4,294,967,296 different locations (or if 64 bits to 18,446,744,073,709,551,616 locations.) and because in this case it is designed to point to char, each address differs by one byte.
But this really has nothing to do with your observation that when you increment a pointer to pointer to char that you see 8 bytes, and not 1 byte. It simply has to do with the fact that pointers, in 64bit addressing, require 8 bytes of space, thus the successive printf statements below will always show an increment of 8 bytes between the 1st and 2nd calls:
char **cp = (char **)pp;
size_t size = sizeof(cp);
printf("address of cp before increment: %p\n", cp);
cp++; // This still moves 8 bytes
printf("address of cp after increment: %p\n", cp);
return 0;

Related

Confusion about pointer casts in c

char* p = (char*) malloc(8 * sizeof(char));
strcpy(p, "fungus");
for (int i = 0; i < strlen(p); i++) {
printf("%c", p[i]);
}
char* p2 = (char*) malloc(4 * sizeof(char));
*p2 = *(uint16_t*) p;
printf("\nPointer: %c\n", p2[1]);
So I created a char* p to store the string "fungus".
To my understanding, if I typecast p to a uint16_t* and dereference it, the returned value should be the first 2 bytes pointed to by p. So, p2[1] should be "u", since "u" is the second byte in the string "fungus". However, this isn't the case when I run my program. I've also tried to print p[0] + 1, but this just outputs "g". Is there an error in my logic?
To my understanding, if I typecast p to a uint16_t* and dereference it, the returned value should be the first 2 bytes pointed to by p.
This is not correct, for at least two reasons.
One, a pointer to a char might not have the alignment required for a uint16_t, and then the conversion to uint16_t is not defined by the C standard, per C 2018 6.3.2.3 7:
… If the resulting pointer is not correctly aligned70) for the referenced type, the behavior is undefined…
That will not apply in the example code in the question, since the char * is assigned from malloc, and malloc always returns an address suitably aligned for any fundamental type. However, it may apply to char * created in other ways (including by adding offsets to p, such as attempting to convert p+1 to uint16_t *).
Two, accessing objects defined as char (including those created by writing char values to memory allocated with malloc) as if they were uint16_t violates the aliasing rules in C 2018 6.5 7. It is possible a C implementation might reinterpret the two char as a uint16_t, but it is also possible that optimization by the compiler might transform the undefined behavior into something else.
*p2 = *(uint16_t*) p;
This code is an assignment to the single char *p2, which is p2[0], regardless of the fact that the right-hand side may be a 16-bit value. It does not touch p2[1].
If *(uint16_t*) p; does reinterpret two eight-bit bytes of the string as a uint16_t, then it will produce some 16-bit value that is then assigned to the single char *p2. If char is unsigned, this will store the low eight bits of that value as p2[0], leaving p2[1] untouched. If it is signed, an implementation-defined conversion will be performed, and the result (if a trap does not occur) will be assigned to p2[0], again leaving p2[1] untouched.
Then printf("\nPointer: %c\n", p2[1]); attempts to print a value that has not been initialized, since nothing has put a value in p2[1].
You could try changing *p2 = *(uint16_t*) p; to * (uint16_t *) p2 = * (uint16_t *) p; to copy the uint16_t whole, instead of trying to cram it into a single byte.

casting int pointer to char pointer

I've read several posts about casting int pointers to char pointers but i'm still confused on one thing.
I understand that integers take up four bytes of memory (on most 32 bit machines?) and characters take up on byte of memory. By casting a integer pointer to a char pointer, will they both contain the same address? Does the cast operation change the value of what the char pointer points to? ie, it only points to the first 8 bits of an integers and not all 32 bits ? I'm confused as to what actually changes when I cast an int pointer to char pointer.
By casting a integer pointer to a char pointer, will they both contain the same address?
Both pointers would point to the same location in memory.
Does the cast operation change the value of what the char pointer points to?
No, it changes the default interpretation of what the pointer points to.
When you read from an int pointer in an expression *myIntPtr you get back the content of the location interpreted as a multi-byte value of type int. When you read from a char pointer in an expression *myCharPtr, you get back the content of the location interpreted as a single-byte value of type char.
Another consequence of casting a pointer is in pointer arithmetic. When you have two int pointers pointing into the same array, subtracting one from the other produces the difference in ints, for example
int a[20] = {0};
int *p = &a[3];
int *q = &a[13];
ptrdiff_t diff1 = q - p; // This is 10
If you cast p and q to char, you would get the distance in terms of chars, not in terms of ints:
char *x = (char*)p;
char *y = (char*)q;
ptrdiff_t diff2 = y - x; // This is 10 times sizeof(int)
Demo.
The int pointer points to a list of integers in memory. They may be 16, 32, or possibly 64 bits, and they may be big-endian or little endian. By casting the pointer to a char pointer, you reinterpret those bits as characters. So, assuming 16 bit big-endian ints, if we point to an array of two integers, 0x4142 0x4300, the pointer is reinterpreted as pointing to the string "abc" (0x41 is 'a', and the last byte is nul). However if integers are little endian, the same data would be reinterpreted as the string "ba".
Now for practical purposes you are unlikely to want to reinterpret integers as ascii strings. However its often useful to reinterpret as unsigned chars, and thus just a stream of raw bytes.
Casting a pointer just changes how it is interpreted; no change to its value or the data it points to occurs. Using it may change the data it points to, just as using the original may change the data it points to; how it changes that data may differ (which is likely the point of doing the casting in the first place).
A pointer is a particular variable that stores the memory address where another variable begins. Doesnt matter if the variable is a int or a char, if the first bit has the same position in the memory, then a pointer to that variable will look the same.
the difference is when you operate on that pointer. If your pointer variable is p and it's a int pointer, then p++ will increase the address that it contains of 4 bytes.
if your pointer is p and it's a char pointer, then p++ will increase the address that it contains of 1 byte.
this code example will help you understand:
int main(){
int* pi;
int i;
char* pc;
char c;
pi = &i;
pc = &c;
printf("%p\n", pi); // 0x7fff5f72c984
pi++;
printf("%p\n", pi); // 0x7fff5f72c988
printf("%p\n", pc); // 0x7fff5f72c977
pc++;
printf("%p\n", pc); // 0x7fff5f72c978
}

Size of pointer, pointer to pointer in C

How can I justify the output of the below C program?
#include <stdio.h>
char *c[] = {"Mahesh", "Ganesh", "999", "333"};
char *a;
char **cp[] = {c+3, c+2, c+1, c};
char ***cpp = cp;
int main(void) {
printf("%d %d %d %d ",sizeof(a),sizeof(c),sizeof(cp),sizeof(cpp));
return 0;
}
Prints
4 16 16 4
Why?
Here is the ideone link if you want to fiddle with it.
char *c[] = {"Mahesh", "Ganesh", "999", "333"};
c is an array of char* pointers. The initializer gives it a length of 4 elements, so it's of type char *[4]. The size of that type, and therefore of c, is 4 * sizeof (char*).
char *a;
a is a pointer of type char*.
char **cp[] = {c+3, c+2, c+1, c};
cp is an array of char** pointers. The initializer has 4 elements, so it's of type char **[4]. It size is 4 * sizeof (char**).
char ***cpp = cp;
cpp is a pointer to pointer to pointer to char, or char***. Its size is sizeof (char***).
Your code uses %d to print the size values. This is incorrect -- but it happens to work on your system. Probably int and size_t are the same size. To print a size_t value correctly, use %zu -- or, if the value isn't very large, you can cast it to int and use %d. (The %zu format was introduced in C99; there might still be some implementations that don't support it.)
The particular sizes you get:
sizeof a == 4
sizeof c == 16
sizeof cp == 16
sizeof cpp == 4
are specific to your system. Apparently your system uses 4-byte pointers. Other systems may have pointers of different sizes; 8 bytes is common. Almost all systems use the same size for all pointer types, but that's not guaranteed; it's possible, for example, for char* to be larger than char***. (Some systems might require more information to specify a byte location in memory than a word location.)
(You'll note that I omitted the parentheses on the sizeof expressions. That's legal because sizeof is an operator, not a function; its operand is either an expression (which may or may not be parenthesized) or a type name in parentheses, like sizeof (char*).)
a is an usually pointer, which represents the memory address. On 32-bit operating system, 32bit (4 Byte) unsigned integer is used to represent the address. Therefore, sizeof(a) is 4.
c is an array with 4 element, each element is a pointer, its size is 4*4 = 16
cp is also an array, each element is a pointer (the first *, wich point to another pointer (the second *). The later pointer points to an string in the memory. Therefore its basic element size should represent the size of a pointer. and then sizeof(cp) = 4*4 = 16.
cpp is a pointer's pointer's pointer. It is as well represent the 32bit memory address. therefore its sizeof is also 4.
a is a pointer. cpp is also a pointer just to different type (pointer to pointer to pointer).
Now c is an array. You have 4 elements, each is a pointer so you have 4 * 4 = 16 (it would be different if you would run it on x64).
Similar goes for cp. Try changing type to int and you will see the difference.
So the reason you got 4 16 16 4, is because 'a' is simply a pointer, on its own, which only requires 4 bytes (as a pointer is holding a 32bit address depending on your architecture) and so when you have a **pointer which is == to a *pointer[], your really making an array of pointers, and since you initalized 4 things that created 4 pointers, thus the 4x4 = 16. And for the cpp you may ask "well wouldn't it then be 16 as it was initalized?" and the answer is no, because a ***pointer is its own separate variable and still just a pointer(a pointer to a pointer to a pointer, or a pointer to an array of pointers), and requires only 4bytes of memory.

Malloc, and size on 32 bit machines

#include<stdio.h>
#include<stdlib.h>
int main()
{
int *p;
p = (int *)malloc(20);
printf("%d\n", sizeof(p));
free(p);
return 0;
}
On my 64-bit machine, 4 is printed as the size of p. I'm assuming this is because integers take up 4 bytes in memory, and p is an integer pointer. What if I was running a 32-bit machine? Also, what would happen if I replaced
int *p with double *p
and
(int *)malloc(20) with (double *)malloc(20)?
You are assuming wrong.
In printf("%d\n", sizeof(p)); you are not printing size of integer.You are printing here sizeof 'pointer to an integer' which is 4 bytes in your case.Most probably you will get same result on 32-bit machine.
Now about malloc, It allocates number of bytes and returns pointer to it.So same size of memory will be allocated even if you cast the pointer from int* to double*.
In pointers, It will take four bytes for all pointers.
So while you are checking with sizeof, even it is a character pointer it will give four bytes. If you need the value of that pointer use like this.
printf("%d\n", sizeof(p));// It will give the pointer size.
malloc is allocating the given bytes. And it will give equal to all the pointers. Then don't cast the result of malloc. Refer this link.
You have some misunderstanding let me point them,
p = (int *)malloc(20);
You are allocating memory of 20 bytes and malloc returns a void pointer and but compiler does the casting for you and (int *) is not needed. Even though you have a pointer to a double or an int it takes the same amount of bytes (In a 32bit system this merely for mapping 4GB memory space).
// Should this be 4 or 8?
printf("%d\n", sizeof(p));
This should be 8 on a x64 platform if your executable or the build is x64 only. I assume your build is 32bit and it returns 4.
Above printf has a wrong specifier. sizeof returns size_t and not an int. So correct form should be,
printf("%zu\n", sizeof(p));
Irrespective of whether 64-bit system or 32-bit system, size of a pointer variable is 4 bytes by default since 64-bit build settings also have Debug32 by default.
If we specifically change build settings on 64-bit, then the pointer variable can hold 8 bytes.
I'm assuming this is because integers take up 4 bytes in memory, and p is an integer pointer.
Your assumption is not correct ! To answer your doubt not only integer pointer take up 4 bytes.
int *ptrint;
char *ptrchar;
float *ptrfloat;
double *ptrdouble;
Here all ptrint, ptrchar, ptrfloat, ptrdouble takes 4 bytes of memory since it would be the address stored in that variable.
And if you replace int *p with double *p and (int *)malloc(20) with (double *)malloc(20) , the size would be still 4 bytes. I hope this ans cleared your doubts.

Dereferencing and typecasting

I've constructed the following sections of code to help myself understand pointer dereferencing and typecasting in C.
char a = 'a';
char * b = &a;
int i = (int) *b;
For the above, I understand that on the 3rd line, I've dereferenced b and got 'a' and (int) will typecast the value of 'a' to its corresponding value of 97 which is stored into i. But for this section of code:
char a = 'a';
char * b = &a;
int i = *(int *)b;
This results in i being some arbitrary large number like 792351. I'm assuming this is a memory address but my question is why? When I typecast b to an integer pointer, does this actually cause b to point to a different area in memory? What is going on?
EDIT: If the above doesn't work, then why would something like this work:
char a = 'a';
void * b = &a;
char c = *(char *)b;
This correctly assigns 'a' to c.
Your int is larger than your char - you get the 'a' value + some random data following it in memory.
E.g, assuming this layout in memory:
'a'
0xFF
0xFF
0xFF
Your char * and int * both point to the 'a'. When you dereference the char *, you get only the first byte, the 'a'. When you dereference the int * (assuming your int is 32-bit) you get the 'a' and the 3 bytes of uninitialized data following it.
EDIT: In response to updated question:
In char c = *(char *)b;, b still points at the 'a' value. You cast it to a char *, and then dereference it, getting the char pointed to by a char *
The last line you're concerned about does a very bad thing. First, it treats b as an int* whereas b is a char*. That is, the memory pointer to by b is assumed as 4 bytes(typically) instead of 1 byte. So when you dereference it, it goes to the 1 byte pointed by the actual b, takes the following 3 bytes too, treats those 4 bytes as a single int, and gives you the result. That's why it's garbage.
In general, casting one pointer type to another pointer type must be done with great caution.
You're casting a char pointer to an int pointer. Characters are (usually) stored as 8 bits. ints, on the other hand, are 32 bits (or 64 on 64-bit systems). So if you look at the other 24 bits of memory next to the 8 bits worth of b, you'll get a bunch of extra bits that weren't initialized. Even the position of *b in i is architecture dependent.
big-endian: **** ****|**** ****|**** ****|0110 0001
little-endian: 0110 0001|**** ****|**** ****|**** ****
When you cast the character stored in the above, all the asterisks become relevant.
Since a char is 1 Byte long, and an int 4, when you read an int from the address of a single character, you're reading the character and 3 more bytes. The content of these bytes is just whatever happens to lie in memory (pointers, the value of b) and could even be unallocated (resulting in a segmentation fault).
When you type cast it to a (int *) type, it will refer to a total of 4 bytes(size if int) in memory.
In the second case, you're treating the same address as if it pointed to an int. Officially, the result is simply undefined behavior.
Realistically, what happens is that whatever happens to be in the four1 bytes starting at that address get interpreted as an int.
1 4 bytes assuming a 32-bit int -- if your implementation has, for example, a 64-bit int, it'll be 8 bytes.

Resources