Size of pointer, pointer to pointer in C - c

How can I justify the output of the below C program?
#include <stdio.h>
char *c[] = {"Mahesh", "Ganesh", "999", "333"};
char *a;
char **cp[] = {c+3, c+2, c+1, c};
char ***cpp = cp;
int main(void) {
printf("%d %d %d %d ",sizeof(a),sizeof(c),sizeof(cp),sizeof(cpp));
return 0;
}
Prints
4 16 16 4
Why?

char *c[] = {"Mahesh", "Ganesh", "999", "333"};
c is an array of char* pointers. The initializer gives it a length of 4 elements, so it's of type char *[4]. The size of that type, and therefore of c, is 4 * sizeof (char*).
char *a;
a is a pointer of type char*.
char **cp[] = {c+3, c+2, c+1, c};
cp is an array of char** pointers. The initializer has 4 elements, so it's of type char **[4]. Its size is 4 * sizeof (char**).
char ***cpp = cp;
cpp is a pointer to pointer to pointer to char, or char***. Its size is sizeof (char***).
Your code uses %d to print the size values. This is incorrect -- but it happens to work on your system. Probably int and size_t are the same size. To print a size_t value correctly, use %zu -- or, if the value isn't very large, you can cast it to int and use %d. (The %zu format was introduced in C99; there might still be some implementations that don't support it.)
The particular sizes you get:
sizeof a == 4
sizeof c == 16
sizeof cp == 16
sizeof cpp == 4
are specific to your system. Apparently your system uses 4-byte pointers. Other systems may have pointers of different sizes; 8 bytes is common. Almost all systems use the same size for all pointer types, but that's not guaranteed; it's possible, for example, for char* to be larger than char***. (Some systems might require more information to specify a byte location in memory than a word location.)
(You'll note that I omitted the parentheses on the sizeof expressions. That's legal because sizeof is an operator, not a function; its operand is either an expression (which may or may not be parenthesized) or a type name in parentheses, like sizeof (char*).)

a is an ordinary pointer, which holds a memory address. On a typical 32-bit system an address is a 32-bit (4-byte) value, so sizeof(a) is 4.
c is an array of 4 elements, each of which is a pointer, so its size is 4*4 = 16.
cp is also an array. Each element is a pointer (the first *) that points to another pointer (the second *), and that second pointer points to a string in memory. The element size is therefore the size of a pointer, and sizeof(cp) = 4*4 = 16.
cpp is a pointer to a pointer to a pointer. It too holds a 32-bit memory address, so its size is also 4.

a is a pointer. cpp is also a pointer, just to a different type (pointer to pointer to pointer).
c is an array. It has 4 elements, each of which is a pointer, so you get 4 * 4 = 16 (the result would differ on x64).
The same goes for cp. Try changing the type to int and you will see the difference.

So the reason you got 4 16 16 4 is that a is simply a pointer on its own, which requires only 4 bytes (a pointer holds a 32-bit address on your architecture). When you declare a *pointer[] array, you are really making an array of pointers, and since the initializer has 4 items it creates 4 pointers, hence 4x4 = 16. For cpp you may ask, "Wouldn't it then be 16, since it was initialized from cp?" The answer is no: a ***pointer is its own separate variable and still just a pointer (a pointer to a pointer to a pointer), and it requires only 4 bytes of memory.

Related

Incrementing pointer to pointer by one byte

#include <stdio.h>
int main(){
int a = 5;
int *p = &a;
int **pp = &p;
char **cp = (char **)pp;
cp++; // This still moves 8 bytes
return 0;
}
Since the size of a pointer is 64 bits on 64 bit machines, doing a pp++ will always move 8 bytes. Is there a way to make it move only 1 byte?
Is there a way to make it move only 1 byte?
Maybe.
All object pointers can be converted to void * and since char * has the same representation, to char *. ++ increments a char * by 1.
#include <stdio.h>
int main() {
int a = 5;
int *p = &a;
int **pp = &p;
char **cp = (char **)pp;
char *character_pointer = (char *) cp;
character_pointer++; // Increment by 1
Now comes the tricky part: can that incremented pointer be converted back to a char **? C allows that unless
If the resulting pointer is not correctly aligned for the referenced type, the behavior is undefined. C17dr § 6.3.2.3 7
cp = (char **) character_pointer;
return 0;
}
Reading *cp can readily cause undefined behavior, as cp is no longer certain to point to a valid char *. The OP's goal at this point is unclear.
C is not assembly. What you are trying to do is undefined behavior, and compiler might not do what you ask, and the program might do anything, including possibly what you think it should do if C were just "assembly" with different syntax.
That being said, you can do this:
int a = 5;
int *p = &a;
int **pp = &p;
uintptr_t temp;
memcpy(&temp, &pp, sizeof temp);
temp++;
memcpy(&pp, &temp, sizeof temp);
The above code is likely to do what you want, even though that last memcpy already triggers undefined behavior, because it copies an invalid value into a pointer (that alone is enough for UB). Actually using pp, which now holds an invalid value, increases the chance of messing things up.
To understand why any UB matters: the compiler is free to decide that code which can be proven to have UB has no effect, or is never reached. So if that last memcpy is inside an if, and the compiler can prove that UB occurs when the condition is true, it may assume the condition is never true and optimize the whole if away. The compiler presumes the C programmer wrote the condition so that it never results in UB, so this optimization can be made at compile time.
Yeah, it is a bit crazy. C is not just assembly with different syntax!
Incrementing pointer to pointer by one byte
If you find an implementation where a pointer to pointer occupies only 8 bits (i.e. one that uses 1-byte addresses, which is very unlikely), then it would be doable, and only then would it be safe. Otherwise it is not a practical or safe thing to do.
For an implementation that uses 64-bit addressing, 64 bits are needed to represent each pointer value. Note, however, that _[t]he smallest incremental change is [available as a by-product of] the alignment needs of the referenced type. For performance, this often matches the width of the ref type, yet systems can allow less._ (per @chux in the comments). Dereferencing such locations could, and likely would, lead to undefined behavior.
And in this statement
char **cp = (char **)pp; //where pp is defined as int **
the cast, although allowing a compile without complaining, is simply masking a problem. With the exception of void *, pointer variables are created using the same base type of the object they are to point to for the reason that the sizeof different types can be different, so the pointers designed to point to a particular type can represent its locations accurately.
It is also important to note the following:
         sizeof(char **) == sizeof(char *) == sizeof(char ***) != sizeof(char)
32-bit:  4 bytes            4 bytes           4 bytes             1 byte
64-bit:  8 bytes            8 bytes           8 bytes             1 byte
         sizeof(int **)  == sizeof(int *)  == sizeof(int ***),   independent of sizeof(int)
32-bit:  4 bytes            4 bytes           4 bytes             4 bytes (typically)
64-bit:  8 bytes            8 bytes           8 bytes             4 bytes (typically)
So, unlike the type of a pointer, its size has little to do with its ability to point to a location containing an object that is smaller, or even larger, than the pointer itself.
The purpose of a pointer ( eg char * ) is to store an address to an object of the same base type, in this case char. If targeting 32bit addressing, then the size of the pointer indicates it can point to 4,294,967,296 different locations (or if 64 bits to 18,446,744,073,709,551,616 locations.) and because in this case it is designed to point to char, each address differs by one byte.
But this really has nothing to do with your observation that when you increment a pointer to pointer to char that you see 8 bytes, and not 1 byte. It simply has to do with the fact that pointers, in 64bit addressing, require 8 bytes of space, thus the successive printf statements below will always show an increment of 8 bytes between the 1st and 2nd calls:
char **cp = (char **)pp;
size_t size = sizeof(cp);
printf("value of cp before increment: %p\n", (void *)cp);
cp++; // This still moves 8 bytes
printf("value of cp after increment: %p\n", (void *)cp);
return 0;

What is the size of the following declaration?

My friend and I are arguing over this one.
char **array[2][2];
Is the size that this takes up in memory 8 + 2*2*8, that is, 8 for the pointer to the array of pointers and then 32 for the array of pointers?
Or is it just 8 because we are declaring a pointer to an array of pointers? This declaration doesn't have to allocate space for the array of pointers, just the pointer?
As cdecl could have told you (with a little fiddling), your declaration
char ** array[2][2];
declares array as an array of two arrays of two pointers to pointers to char. That means a total of four elements of type char ** (and nothing else). C does not specify how large pointers are, nor even that pointers to different types have the same size, but it is common on 64-bit implementations for all pointers to be 8 bytes wide. On such an implementation, the size of the declared object is 2 * 2 * 8 == 32 bytes. There is no extra pointer.
If you wanted a pointer to a 2 x 2 array of char *, that would be different:
char * (*array)[2][2];
... and the size of that is indeed the size of one pointer. No storage is reserved in that case for the pointed-to 2D array.
The compiler allocates your whole array (32 bytes). You can investigate these questions with sizeof():
#include <stdio.h>
char ** array[2][2];
int main() {
printf("size = %zu\n", sizeof(array));
printf("size[0] = %zu\n", sizeof(array[0]));
printf("size[0][0] = %zu\n", sizeof(array[0][0]));
return 0;
}
On the x86_64 architecture this prints:
size = 32
size[0] = 16
size[0][0] = 8
Each pointer is 8 bytes long, and your two-dimensional array contains 4 total.

Getting the size of a struct value

Example
#include <stdio.h>
struct A {
char *b;
};
int main(int argc, char *argv[]) {
char c[4] = { 'c', 'a', 't', '\0' };
struct A a;
a.b = c;
printf("%s\n", a.b); // cat
printf("%zu\n", sizeof c); // 4
printf("%zu\n", sizeof a.b); // 8 ???
}
Why does sizeof a.b return 8 and not 4? If I understood correctly, a.b returns the value that was assigned to it, which is c. But shouldn't it return the size of c (which is 4) then?
The sizeof operator gives the number of bytes allocated to the object, and in your case the object is a pointer, whose size appears to be 8 bytes on your system.
You're calling sizeof() on two different types.
sizeof(a.b) is sizeof(char *), which is 8 on your platform.
sizeof(c) is sizeof(char[4]), which is 4.
We can have pointers point to arrays via array decaying, which you can read about in this other answer: What is array decaying?
First of all, sizeof(a.b) is not the size of c. It doesn't give the size of what a.b points to; it gives the size of the pointer itself.
Take char as an example:
the size of char a is 1
and the size of char *b is 8 (on a typical 64-bit platform).
So it is the size of the pointer, not of what it points to. Please note that these sizes are platform dependent.
Don't be misled by int, though: an int and an int * happen to have the same size on some platforms.
If I understood correctly, a.b returns the value that was assigned to it,
Not exactly. a.b is what's called an lvalue. This means that it designates a memory location. However it does not read that memory location yet; that will only happen if we use a.b within a larger context that expects the memory location to be read.
For example:
a.b = something; // does not read a.b
something = a.b; // does read a.b
The case of sizeof is one context where it does not read the memory location. In fact it tells you how many bytes comprise that memory location; it doesn't tell you anything about what is stored there (let alone about some other memory location that might be pointed to by what is stored there, if it is a pointer).
The output is telling you that your system uses 8 bytes to store a pointer.
sizeof() returns the number of bytes of a variable.
In this case sizeof ( char * ) returns 8 bytes which is the number of bytes that compose a pointer.

Interview question about various pointer size under 32bit architecture

char str[] = " http://www.ibegroup.com/";
char *p = str ;
void Foo ( char str[100]){
}
void *p = malloc( 100 );
What is sizeof str, p, str, p in each of the above 4 cases, in turn?
I've tested it under my machine(which seems to be 64bit) with these results:
25 8 8 8
But don't understand the reason yet.
sizeof(char[]) returns the number of bytes in the string, i.e. strlen()+1 for null-terminated C strings filling the entire array. Arrays don't decay to pointers in sizeof. str is an array, and the string has 25 characters plus a null byte, so sizeof(str) should be 26. Did you add a space to the value?
The size of a pointer is determined by the implementation (in practice, the target architecture and ABI), not by what it points to, so both instances of p are typically 8 bytes on 64-bit architectures and 4 bytes on 32-bit architectures.
In function parameters, arrays do decay to pointers, so you get the same result as for a pointer. Therefore, the following declarations are equivalent:
void foo(char s[42]);
void foo(char s[100]);
void foo(char *s);
The first is the sizeof of a built-in array, which for a char array is the number of elements (24 + the null at the end of the string).
The second is the sizeof of a pointer, which is typically the native word size of your system, in your case 64 bits or 8 bytes.
The third is the sizeof of a pointer to the first element of an array, which has the same size as any other pointer on your system. Why a pointer to the first element of an array? Because the size information of an array is lost when it is passed to a function: the array is implicitly converted to a pointer to its first element instead.
The fourth is the sizeof of a pointer which has the same size as any other pointer.
str is an array of 8-bit characters, including null terminator.
p is a pointer, which is typically the size of the machine's native word size (32 bit or 64 bit).
The size taken up by a pointer stays constant, regardless of the size of the memory to which it points.
EDIT
In C and C++, a function parameter declared as an array is adjusted to a pointer to its element type, which is why the second instance of str has sizeof 8.
In the first case, the size of
char str[] = " http://www.ibegroup.com/"
is known to be 25 (24+1), because that much memory is actually allocated.
In the case of
void Foo ( char str[100]){
no memory is allocated for an array, so sizeof gives the size of the pointer parameter.

Why does my homespun sizeof operator need a char* cast?

Below is the program to find the size of a structure without using sizeof operator:
struct MyStruct
{
int i;
int j;
};
int main()
{
struct MyStruct *p=0;
int size = ((char*)(p+1))-((char*)p);
printf("\nSIZE : [%d]\nSIZE : [%d]\n", size);
return 0;
}
Why is typecasting to char * required?
If I don't use the char* pointer, the output is 1 - why?
Because pointer arithmetic works in units of the type pointed to. For example:
int* p_num = malloc(10 * sizeof(int));
int* p_num2 = p_num + 5;
Here, p_num2 does not point five bytes beyond p_num, it points five integers beyond p_num. If on your machine an integer is four bytes wide, the address stored in p_num2 will be twenty bytes beyond that stored in p_num. The reason for this is mainly so that pointers can be indexed like arrays. p_num[5] is exactly equivalent to *(p_num + 5), so it wouldn't make sense for pointer arithmetic to always work in bytes, otherwise p_num[5] would give you some data that started in the middle of the second integer, rather than giving you the sixth integer as you would expect.
In order to move a specific number of bytes beyond a pointer, you need to cast the pointer to point to a type that is guaranteed to be exactly 1 byte wide (a char).
Also, you have an error here:
printf("\nSIZE : [%d]\nSIZE : [%d]\n", size);
You have two format specifiers but only one argument after the format string.
If I don't use the char* pointer, the output is 1 - WHY?
Because operator- obeys the same pointer arithmetic rules that operator+ does. Adding one to the pointer advanced the address by sizeof(struct MyStruct) bytes, but without the cast, subtracting the pointers divides that byte difference by sizeof(struct MyStruct), giving 1.
Why not use the built in sizeof() operator?
Because you want the size of your struct in bytes, and pointer arithmetic implicitly uses type sizes.
int* p;
p + 5; // in bytes, this is the address in p plus 5 * sizeof(int)
By casting to char* you circumvent this behavior.
Pointer arithmetic is defined in terms of the size of the type of the pointer. This is what allows (for example) the equivalence between pointer arithmetic and array subscripting -- *(ptr+n) is equivalent to ptr[n]. When you subtract two pointers, you get the difference as the number of items they're pointing at. The cast to pointer to char means that it tells you the number of chars between those addresses. Since C makes char and byte essentially equivalent (i.e. a byte is the storage necessary for one char) that's also the number of bytes occupied by the first item.
