Pointer layout in memory in C - c

I've recently been messing around with pointers and I would like to know a bit more about them, namely how they are organized in memory after using malloc for example.
So this is my understanding of it so far.
int **pointer = NULL;
Since we explicitly set the pointer to NULL it now points to the address 0x00.
Now let's say we do
pointer = malloc(4*sizeof(int*));
Now we have pointer pointing to an address in memory - let's say pointer points to the address 0x0010.
Let's say we then run a loop:
for (i = 0; i<4; i++) pointer[i] = malloc(3*sizeof(int));
Now, this is where it starts getting confusing to me. If we dereference pointer, by doing *pointer what do we get? Do we get pointer[0]? And if so, what is pointer[0]?
Continuing, now supposedly pointer[i] contains stored in it an address. And this is where it really starts confusing me and I will use images to better describe what I think is going on.
In the image you see, if it is correct, is pointer[0] referring to the box that has the address 0x0020 in it? What about pointer[1]?
If I were to print the contents of pointer would it show me 0x0010? What about pointer[0]? Would it show me 0x0020?
Thank you for taking the time to read my question and helping me understand the memory layout.

Pointer Refresher
A pointer is just a numeric value that holds the address of a value of type T. This means that T can also be a pointer type, thus creating pointers-to-pointers, pointers-to-pointers-to-pointers, and crazy things like char********** - which is simply a pointer (T*) where T is a pointer to something else (T = E*) where E is a pointer to something else (and so on...).
Something to remember here is that a pointer itself is a value and thus takes space. More specifically, it's (usually) as wide as the addresses the CPU supports.
So for example, the 6502 processor (found in old machines like the NES, the Atari consoles, and the Apple II) has a 16-bit address space, and thus its "pointers" were 16 bits in size.
So regardless of the underlying type, a pointer will (usually) be as wide as an address on the machine.
Keep in mind that a pointer doesn't guarantee that it points to valid memory - it's simply a numeric value that happens to specify a location in memory.
Array Refresher
An array is simply a series of T elements in contiguously addressable memory. The fact it's a "double pointer" (or pointer-to-a-pointer) is innocuous - it is still a regular pointer.
For example, allocating an array of 3 T's will result in a memory block that is 3 * sizeof(T) bytes long.
When you malloc(...) that memory, the pointer returned simply points to the first element.
T *array = malloc(3 * sizeof(T));
printf("%d\n", (&array[0] == &(*array))); // 1 (true)
Keep in mind that the subscript operator (the [...]) is basically just syntactic sugar for:
(*(array + n)) // array[n]; the addition is already scaled by sizeof(*array)
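A minimal sketch of that equivalence. The helper names `nth_via_arithmetic` and `byte_offset_of` are made up for illustration. Pointer arithmetic counts in elements, not bytes, so the expression itself contains no `sizeof`; only the distance measured in bytes grows by `sizeof(*array)` per step.

```c
#include <assert.h>
#include <stddef.h>

/* Reads element n the long way: array[n] is defined as *(array + n). */
int nth_via_arithmetic(const int *array, size_t n)
{
    assert(array != NULL);
    return *(array + n);
}

/* Byte distance between element 0 and element n, i.e. n * sizeof(int). */
size_t byte_offset_of(const int *array, size_t n)
{
    return (size_t)((const char *)(array + n) - (const char *)array);
}
```

Calling `nth_via_arithmetic(a, 2)` on an `int a[4]` should give the same value as `a[2]`.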
Arrays of Pointers
To sum all of this up, when you do
E **array = malloc(3 * sizeof(E*));
You're doing the same thing as
T *array = malloc(3 * sizeof(T));
where T is really E*.
Two things to remember about malloc(...):
It doesn't initialize the memory with any specific values (use calloc for that)
It's not guaranteed (nor really even common) for the memory to be contiguous or adjacent to the memory returned by a previous call to malloc
Therefore, when you fill the previously created array-of-pointers with subsequent calls to malloc(), they might be in arbitrarily random places in memory.
All you're doing with your first malloc() call is simply creating the block of memory required to store n pointers. That's it.
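The two allocation steps can be sketched as a pair of helpers. The names `alloc_rows` and `free_rows` are hypothetical, and `calloc` is used for the rows only so that the contents are defined; the question's loop uses `malloc`.

```c
#include <assert.h>
#include <stdlib.h>

/* Allocates one block of `rows` pointers, then one block of `cols` ints
   per row. The row blocks need not be adjacent to each other in memory.
   Returns NULL if any allocation fails. */
int **alloc_rows(size_t rows, size_t cols)
{
    assert(rows > 0 && cols > 0);

    int **table = malloc(rows * sizeof *table);   /* the array of pointers */
    if (table == NULL)
        return NULL;

    for (size_t i = 0; i < rows; i++) {
        table[i] = calloc(cols, sizeof **table);  /* one zero-filled row */
        if (table[i] == NULL) {
            while (i > 0)                         /* undo earlier rows */
                free(table[--i]);
            free(table);
            return NULL;
        }
    }
    return table;
}

/* Frees every row first, then the block of pointers itself. */
void free_rows(int **table, size_t rows)
{
    if (table == NULL)
        return;
    for (size_t i = 0; i < rows; i++)
        free(table[i]);
    free(table);
}
```

With `int **pointer = alloc_rows(4, 3);`, `*pointer` and `pointer[0]` are the same `int *`, and each `pointer[i]` holds the address of a separate block.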
To answer your questions...
If we dereference pointer, by doing *pointer what do we get? Do we get pointer[0]?
Since pointer is just an int**, and remembering that malloc(...) returns the address of the first byte in the block of memory you allocated, *pointer will indeed evaluate to pointer[0].
And if so, what is pointer[0]?
Again, since pointer has the type int**, pointer[0] will return a value of type int* with the numeric contents of the first sizeof(int*) bytes in the memory block pointed to by pointer.
If I were to print the contents of pointer would it show me 0x0010?
If by "printing the contents" you mean printf("%p\n", (void*) pointer), then yes, given the address you assumed in your example.
Since you malloc()'d the memory block that pointer points to, pointer itself is just a value with the size of sizeof(int**), and thus will hold the address (as a numeric value) where the block of memory you malloc()'d resides.
So the above printf() call will simply print that value out.
What about pointer[0]?
Again assuming you mean printf("%p\n", (void*) pointer[0]), then you'll get a slightly different output.
Since pointer[0] is the equivalent of *pointer, and thus causes pointer to be dereferenced, you'll get a value of int* and thus the pointer value that is stored in the first element.
You would need to further dereference that pointer to get the numeric value stored in the first integer that you allocated; for example:
printf("%d\n", **pointer);
// or
printf("%d\n", *pointer[0]);
// or even
printf("%d\n", pointer[0][0]); // though this is arguably less readable,
// since `pointer[0]` isn't an array but
// a pointer to the first of the three
// `int`s allocated for it.

If I dereference pointer, by doing *pointer what do I get? pointer[0]?
Yes.
And if so, what is pointer[0]?
With your definitions: 0x0020.
In the image you see, if it is correct
It seems correct to me.
is pointer[0] referring to the box that has the address 0x0020 in it?
Still yes.
What about pointer[1]?
At this point, I think you can guess that it would show: 0x002c.
To go further
If you want to check how memory is managed and what pointers look like you can use gdb. It allows running a program step by step and performing various operations such as showing the content of variables. Here is the main page for GNU gdb. A quick internet search should let you find numerous gdb tutorials.
You can also show the address held by a pointer in C by using a printf line:
int *plop = NULL;
fprintf(stdout, "%p\n", (void *)plop);
Note: don't forget to include <stdio.h>

Related

How does malloc() know you want to use the block of memory it supplies as an array?

If malloc() returns a pointer to a single block of memory, how can it be used to store multiple values contiguously and allow access to each one using the subscript operator, acting as a pointer to an array?
If I were to try and change the "second element" of an integer by subscripting its address, it would cause undefined behaviour. As malloc() returns the pointer to a single block of memory, shouldn't the pointer it returns refer to the entire block, and thus subscripting it should access the garbage value next to it in memory?
Furthermore, the allocated memory can also be used to store a single value, but only up to the size of the type the pointer is cast to, not to that of the allocated block of memory.
Is all this something to do with the type the pointer is cast to after being returned? Could someone point me in the right direction?
I think your misunderstanding is here:
As malloc() returns the pointer to a single block of memory, shouldn't the pointer it returns refer to the entire block, and thus subscripting it should access the garbage value next to it in memory?
Indeed if you do p = malloc(n) and p has type "pointer to some type of size n", then p[1] is an out-of-bounds array access. However, normally when you do p = malloc(n) to allocate an array, the type of p is not a pointer to the array (of size n), but a pointer to the first element of the array. That is, instead of
char (*p)[500] = malloc(500);
you do:
char *p = malloc(500);
and in this case p[1] is perfectly valid. Note that with the first, unusual, form, you could still do (*p)[1] or p[0][1] and have it be valid.
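The difference between the two declarations shows up in what `sizeof *p` reports. This is only a sketch with hypothetical function names; the `assert` on the `malloc` result is a shortcut for the demo, and real code should handle allocation failure properly.

```c
#include <assert.h>
#include <stdlib.h>

/* p points to a whole char[500], so *p is the entire array. */
size_t pointee_size_array_form(void)
{
    char (*p)[500] = malloc(sizeof *p);
    assert(p != NULL);                /* demo only */
    size_t n = sizeof *p;             /* size of char[500] */
    (*p)[1] = 'x';                    /* valid: second char of the array */
    free(p);
    return n;
}

/* p points to the first char of the block, so *p is a single char. */
size_t pointee_size_element_form(void)
{
    char *p = malloc(500);
    assert(p != NULL);                /* demo only */
    size_t n = sizeof *p;             /* size of char */
    p[1] = 'x';                       /* valid: second char of the block */
    free(p);
    return n;
}
```

The first function returns 500 and the second returns 1, which is why `p[1]` steps past the whole block in the first form but only to the next `char` in the second.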
But be careful: if you call malloc several times, it will return memory from different parts of the heap, so you can't move from one array to another with pointer arithmetic.

Confusion between "int array[int]" and "int *array"

int array[100];
int *array;
I am confused about the differences between int array[100] and int *array.
Essentially, when I do int array[100] (100 it's just an example of an int), I just reserved space in memory for 100 ints, but I can do int * array and I didn't specify any type of size for this array, but I can still do array[9999] = 30 and that will still make sense.
So what's the difference between these two?
A pointer is a pointer, it points somewhere else (like the first element of an array). The compiler doesn't have any information about where it might point or the size of the data it might point to.
An array is, well, an array of a number of consecutive elements of the same type. The compiler knows its size, since it's always specified (although sometimes the size is only implicitly specified).
An array can be initialized, but not assigned to. Arrays also often decay to pointers to their first element.
Array decay example:
int array[10];
int *pointer = array; // Here the symbol array decays to the expression &array[0]
// Now the variable pointer is pointing to the first element of array
Arrays can't naturally be passed to functions. When you declare a function argument like int arr[], the compiler treats it as int *arr.
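Because the parameter is really a pointer, the callee cannot recover the element count, so a common pattern is to pass it alongside. A sketch, with a hypothetical `ARRAY_LEN` macro and `sum_ints` function:

```c
#include <assert.h>
#include <stddef.h>

/* Element count of a true array. Only meaningful where the array type is
   still visible, i.e. not on a pointer parameter. */
#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* `values` is a plain pointer here, so the length is passed separately. */
long sum_ints(const int *values, size_t count)
{
    assert(values != NULL || count == 0);
    long total = 0;
    for (size_t i = 0; i < count; i++)
        total += values[i];
    return total;
}
```

A caller that owns `int data[5]` can write `sum_ints(data, ARRAY_LEN(data))`; the array decays to `&data[0]` at the call.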
All of this information, and more, should be in any good book, tutorial or class.
A non-technical explanation:
A pointer's contents refer to an address (which may or may not be valid). An array has an address (which must be valid for the array to exist).
You can think of a pointer as being like an envelope - you can put any address you want on it, but if you want it sent to somewhere in particular, that address has to be correct.
An array is like your house - it exists somewhere, so it has an address. Things properly addressed get sent there.
In short:
A pointer holds an address.
An array has an address.
So
int *array;
creates a pointer of indeterminate value (it can point anywhere!).
When you then have
array[9999] = 30;
you're trying to set the 9999th int value from where array points to the value of 30. But you don't know where array points because you didn't give it an actual value.
And that's undefined behavior.
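One way to make an assignment like `array[9999] = 30` well-defined is to point `array` at enough storage first. A minimal sketch; `make_int_block` and `set_checked` are names invented for this example:

```c
#include <assert.h>
#include <stdlib.h>

/* Returns a zero-filled block of `count` ints, or NULL on failure.
   The caller owns the block and must free() it. */
int *make_int_block(size_t count)
{
    assert(count > 0);
    return calloc(count, sizeof(int));
}

/* Stores `value` at index i only if i is in range.
   Returns 1 on success, 0 if the store was refused. */
int set_checked(int *block, size_t count, size_t i, int value)
{
    if (block == NULL || i >= count)
        return 0;
    block[i] = value;
    return 1;
}
```

After `int *array = make_int_block(10000);`, index 9999 is the last valid element and index 10000 is out of bounds.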
The difference is that when you do int array[100] as a local variable, a memory block of 100 * sizeof(int) bytes is reserved for it automatically (typically on the stack), but when you do int *array, you need to point it at memory yourself, for example by allocating it dynamically with malloc, before using the array variable. Dynamically allocated memory is on the heap, not the stack.
int array[100] defines a variable array that can hold 100 int values; for a local variable this memory is typically allocated on the stack. The name array then refers to that block, and its base address is the address of the array.
But in the case of int *array, since you are declaring it as a local variable, the pointer variable array holds a garbage address. So if you do array[9999], it could cause a segmentation violation, since you are trying to access a garbage memory location outside your program.
Some points that you can find useful to know:
Via int arr[N] you specify an array of type int which can store N integers. To find out how much memory the array takes you can use the sizeof operator, or just multiply the number of items in the array by the size of the type: N*sizeof(int).
The name of the array decays to a pointer to its first element, e.g. *arr is the same as arr[0]; you may also wonder why a[5] == 5[a].
An uninitialized array of non-static storage duration is filled with indeterminate values.
The size of an array need not be written out: if you write int arr[] = {1, 2}, the size is calculated by the compiler.
Accessing a nonexistent element causes undefined behavior, which means that anything could happen; in many cases you'll get garbage values.
Via int *array you specify a pointer named array that points to int.
Unless a value is assigned, a local pointer holds an indeterminate (garbage) address.
If you use a pointer as an array without allocating memory at all, without allocating enough of it, or while accessing a nonexistent element, you'll get undefined behavior as expected.
Allocated memory should be freed once it is no longer needed.
int array[100]; defines an array of int.
int *array; defines a pointer to an int. This pointer may point to an int variable or to an element of an array of int, or to nothing at all (NULL), or even to an arbitrary, valid or invalid address in memory, which is the case when it is an uninitialized local variable. It is a tad misleading to call this pointer array, but commonly used when naming a function argument that indeed points to an actual array. The compiler cannot determine the size of the array, if any, from the pointer value.
Here is a topographic metaphor:
Think of an array as a street with buildings. It has GPS coordinates (memory address) a name (but not always) and a fixed number of buildings (at a given time, hard to change). The street name together with the building number specifies a precise building. If you specify a number larger than the last number, it is an invalid address.
A pointer is a very different thing: think of it as an address label. It is a small piece of paper that can be used to identify a building. If it is blank (a null pointer), it is useless and if you stick it to a letter and send that, the letter will get lost and discarded (undefined behavior, but it is easy to tell that it is invalid). If you write an invalid address on it, the effect is similar, but might cost much more before failing delivery (undefined behavior and difficult to test for).
If a street is razed (if memory was freed), previously written address labels are not modified, but they no longer point to anything useful (undefined behavior if you send the letter, the difficult kind). If a new street is later named with the name on the label, the letter might get delivered, but probably not as intended (undefined behavior again, memory was freed and some other allocated object happens to be at the same memory address).
If you pass a building to a function, you would usually not unearth it and truck it, but merely pass its street address (a pointer to the n-th building of the street, &array[n]). If you don't specify a building and just name the street, it means go to the beginning of the street. Similarly, when passing an array to a function in C, the function receives a pointer to the beginning of the array; we say that arrays decay to pointers.
Without specifying a size, as in int *array, array[9999] = 30 can cause a segmentation fault, as it may access inaccessible memory.
Basically, an uninitialized int *array points to an arbitrary location. To access element 9999, array must point to a location with enough space for it, but the declaration int *array doesn't create any such space.

Size of 2d pointers

I'm trying to improve my knowledge of pointers by making a pointer that points to another pointer, which is practically a string.
Now I want to get the size, which I could normally get from sizeof(foo[0])/sizeof(foo[0][0])
Pointer form
char** foo;
sizeof(test)/sizeof(*test) doesn't indicate the number of elements anymore with your declaration, because the compiler doesn't know what is the pointer pointing to, because sizeof() is a compile time operation and hence not dynamic.
To find no of elements, you can add a sentinel value:
char **test = (char *[]){"New York", "Paris", "Cairo", NULL}; // C99 compound literal; NULL marks the end
int testLen = -1;
while (test[++testLen] != NULL) {
    // DO NOTHING
}
You will never get the size of a block of memory where a pointer points to... because there can be anything.
test simply points to a place in memory where some other pointers are stored (to the first one). Each pointer will again lead to another place in memory where some character values are stored. So your test variable contains a simple number (the address of a place in memory), and depending on your platform sizeof(test) will typically be 4 or 8 bytes, regardless of the size of the allocated memory.
sizeof() will work as you might have expected when using stack arrays. If test is declared as
char test[10][20];
Then sizeof(test) will in fact return 200.
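For a true two-dimensional array the row and column counts can be recovered with `sizeof` divisions. A sketch with hypothetical macro names `ROWS` and `COLS`; these only work where the array type is visible, not on a `char **`.

```c
#include <assert.h>
#include <stddef.h>

/* Number of rows: whole array divided by one row. */
#define ROWS(a) (sizeof(a) / sizeof((a)[0]))
/* Number of columns: one row divided by one element. */
#define COLS(a) (sizeof((a)[0]) / sizeof((a)[0][0]))

/* Returns the total size in bytes of a char[10][20]. */
size_t demo_total_bytes(void)
{
    char test[10][20];
    assert(ROWS(test) == 10 && COLS(test) == 20);
    return sizeof test;   /* 10 * 20 * sizeof(char) */
}
```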
How can I get its length (=rows)?
You cannot. Read more in How to get the length of dynamically allocated two dimensional arrays in C
Your attempt:
char** foo;
sizeof(foo[0])/sizeof(foo[0][0])
most probably results in 8, right? That's because you are getting the size of a pointer (which is probably 8 in your system) and then divide by the size of a character, which is always 1.
If you are allocating something large you use malloc(), which takes one argument: the size in bytes (e.g. malloc(sizeof(int) * 20)).
malloc returns a void pointer to the allocated memory. In C it converts implicitly to your pointer type on assignment, so no cast is needed.
In other words you can't really get the size. You must store it somewhere and pass it to other functions when its needed.
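One way to "store it somewhere" is to bundle the pointer with its element count. A sketch under that assumption; the `struct string_list` type and `total_length` function are invented here and are not part of any of the answers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A pointer bundled with the element count it cannot report on its own. */
struct string_list {
    const char **items;
    size_t count;
};

/* Sums the lengths of all strings, using the stored count as the bound. */
size_t total_length(struct string_list list)
{
    assert(list.items != NULL || list.count == 0);
    size_t total = 0;
    for (size_t i = 0; i < list.count; i++)
        total += strlen(list.items[i]);
    return total;
}
```

The count has to be set by whoever creates the list, since that is the only place where the real size is known.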
A pointer to pointer (**) is like adding one additional dimension.
[] these are more of a syntax sugar for pointer arithmetic.
a[i] would be the same as *(a+i).
This may vary on your system, but sizeof will give you these values for these types.
int a; //4
int b[5]; //20
int* c; //8
int d[5][5];//100
int** e; //8

Stack vs. heap pointers in C

I'm currently learning C and I'm confused about memory layout and pointers.
In the following code, it is my understanding that the array is allocated on the stack.
#include <stdio.h>
int main () {
int x[4];
x[0] = 3; x[1] = 2; x[2] = 1;
printf("%p\n",x);
printf("%p\n", &x);
}
My question is, why do the two print calls output the same value?
I tried a similar snippet using malloc (allocate on the heap), and the values differ.
#include <stdio.h>
#include <stdlib.h>
int main () {
int *x = malloc(sizeof(int) * 4);
x[0] = 3; x[1] = 2; x[2] = 1;
printf("%p\n",x);
printf("%p\n", &x);
}
The reason is that, contrary to what you were probably taught, arrays are not pointers. Arrays in C decay into pointers (1) under some circumstances. When you pass an array to a function, it decays into a pointer to the first element. The address of that element is the same as the address of the entire array (an address is always to the first byte of an object).
What you get from malloc is not an array, but the address of a chunk of memory. You assign the address to a pointer. But the pointer and the chunk are separate entities. So printing the value of the pointer, as opposed to its address, yields different results.
(1) Decay is a fancy term for a type of implicit type conversion. When an array expression is used in most places (such as being passed as an argument to a function that expects a pointer), it automatically turns into a pointer to its first element. The "decay" is because you lose type information, i.e. the array size.
Your two print calls print the same value because one tries to print the array, which decays to a pointer to its first element, and the other prints the address of the array. The first element sits at the very start of the array, so they're the same value.
In the second case, one prints the value of x, the other prints the address of x. Since x is a pointer to the block of memory you allocated, these must be different values.
So in the first case, all you have is an array (x). In the second case, you have an allocated block of memory (unnamed) and a pointer to that allocated block (x).
It is perhaps surprising that one can indeed take the address of a whole array, partly because one doesn't need to very often. The array in a sense is a single object, which has one address, which is the address of its first byte. Like with all objects, the address is obtained with the address operator, &.
The first element of an array (like all of its elements) has an address, too, which is the address of its first byte. A pointer to its first element is what the array type is "adjusted" to when it is passed as an argument to a function.
These two bytes are identical, and have the same address. But they have different types, which becomes obvious if you add 1 to them and print them again.
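The type difference can be made visible by measuring how far adding 1 moves each pointer. A sketch with hypothetical function names; both results are measured in bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes skipped when adding 1 to the decayed pointer (type int *). */
size_t element_step(void)
{
    int x[4] = {3, 2, 1, 0};
    int *first = x;                        /* x decays to &x[0] */
    return (size_t)((char *)(first + 1) - (char *)first);
}

/* Bytes skipped when adding 1 to the array's address (type int (*)[4]). */
size_t whole_array_step(void)
{
    int x[4] = {3, 2, 1, 0};
    int (*whole)[4] = &x;
    assert((void *)whole == (void *)x);    /* same address, different type */
    return (size_t)((char *)(whole + 1) - (char *)whole);
}
```

`element_step()` gives `sizeof(int)` while `whole_array_step()` gives `4 * sizeof(int)`, even though both pointers start at the same address.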
The pointer y, by contrast, is its own distinct object (probably 4 or 8 bytes in size; enough to store an address in it). Like any object it has an address which can be obtained with the & operator. Perhaps confusingly, it also contains an address, in this case the address of the first byte of the array. The two are of course not equal: the pointer object resides at a different location than the array (typically next to it on the stack).
Minor remark: You use %p for printing pointers, which is good. If you do that, you should, strictly speaking, cast the pointer which you print to a void pointer: printf("%p\n", (void *)x);.

Few doubts about pointers in C

1) to initialize a pointer I use:
int number, *Pnumber;
Pnumber=&number;
number=10;
Am I doing this right?
what about:
int *Pnumber;
*Pnumber=10;
When I compile it, I get:
RUN FAILED (exit value 1, total time: 858ms)
btw. do I need to use free(Pnumber) to free the memory?
Am I doing this right?
Yes, you are.
what about:
int *Pnumber;
*Pnumber=10;
Pnumber is an uninitialized pointer. Dereferencing this pointer leads to undefined behavior. Pnumber must point to allocated memory (either to a variable or to a dynamically allocated memory area).
btw. do I need to use free(Pnumber) to free the memory?
As long as you don't use malloc, don't use free.
In the first version you point the pointer Pnumber to memory that has already been allocated, and thus you may change the value pointed to by the pointer. This version is correct. In the second version you never specify what the pointer is pointing to (it remains uninitialized), and thus trying to access the memory will cause an error. So the second version is not correct.
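Both valid ways of giving Pnumber a target can be sketched side by side. The function names are hypothetical; only the second variant involves `malloc`, so only that one calls `free`.

```c
#include <assert.h>
#include <stdlib.h>

/* Pointer to an existing variable: nothing to free. */
int store_via_address_of(void)
{
    int number = 0;
    int *Pnumber = &number;
    assert(Pnumber == &number);
    *Pnumber = 10;             /* writes into number */
    return number;
}

/* Pointer to heap memory: obtained with malloc, released with free. */
int store_via_malloc(void)
{
    int *Pnumber = malloc(sizeof *Pnumber);
    if (Pnumber == NULL)
        return -1;             /* allocation failed */
    *Pnumber = 10;
    int result = *Pnumber;
    free(Pnumber);
    return result;
}
```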
Your first approach is right.
But this is wrong:
int *Pnumber;
*Pnumber=10;
Because the pointer doesn't point to valid memory whereas in the first approach it does.
The first one is correct
In the second, you never point the pointer at a memory location.
The pointer is an address so
If we have
int *p;
That means p is an address
and *p is the content of the memory address.
so the pointer should be pointed to a memory space before you fill the memory with
*p = 5;
If you use a pointer, you point it at a variable like you did:
int number, *Pnumber;
Pnumber=&number;
number=10;
The advantage of the pointer is that it can save memory in your program: to change the value of number, which is a 32-bit integer, you can use
*Pnumber = 10;
Here you're pointing at a single integer, but an array of them, or of doubles or floats, takes a lot of memory, which is why it can be better to store just the address of the data. On a 32-bit architecture a pointer is always 32 bits, no matter what type you're pointing at.

Resources