Reinitializing Pointers for C Language - c

I'm currently learning C Programming through Dan Gookin's book Beginning C Programming for Dummies.
One of the topics I'm currently reading about is the claim that arrays are in fact pointers. Dan attempted to prove that with the following code:
#include <stdio.h>
int main()
{
    int numbers[10];
    int x;
    int *pn;
    pn = numbers; /* initialize pointer */
    /* Fill array */
    for(x=0;x<10;x++)
    {
        *pn=x+1;
        pn++;
    }
    pn = numbers;
    /* Display array */
    for(x=0;x<10;x++)
    {
        printf("numbers[%d] = %d, address %p\n",
            x+1,*pn,(void *)pn); /* %p expects a void * */
        pn++;
    }
    return(0);
}
My question is really about line 17, the second pn = numbers; statement just before the display loop. I realized that if I do not reinitialize the pointer there, the values displayed through pn in the second for loop are a bunch of garbage that does not make sense. Why is it necessary to reinitialize pn for the code to work as intended?

An array is not a pointer, but C allows you to assign an array to a pointer to its element type, with the effect that the pointer points to the first item in the array. That's what pn = numbers does.
pn is a pointer to an int, not to an array. It points to a single integer. When you increment the pointer, it moves to the next int in memory: the address advances by the size of the pointed-to type, sizeof(int) in this case.
So what does this prove? Not that an array is a pointer, but only that an array is a contiguous block of memory whose size is N times the size of one array item.
If you run the second loop without resetting the pointer, it starts at a piece of memory that doesn't belong to the array anymore, and so you get 'garbage', which is just whatever happens to be stored at that location.
If you want to iterate over the array again by incrementing a pointer, you will have to reinitialize that pointer to the first item. The for loop does only one thing, which is counting to 10. It doesn't know about the array and it doesn't know about the pointer, so the loop isn't going to automatically reset the pointer for you.

Since pn is incremented in the first loop, after the first loop is finished, pn will point to an address beyond the numbers array. Therefore, you must initialize pn to the beginning of the array before the second loop since you use the same pointer for printing the contents.

Because you have changed the address contained in pn in the statement pn++ in the following code snippet.
for(x=0;x<10;x++)
{
*pn=x+1;
pn++;
}

The pn pointer is being used to point into the numbers array.
The first for-loop uses pn to set the values, stepping pn through the data element by element. After the end of the loop, pn points just off the end of numbers (at a nonexistent 11th element).
For the second for-loop to work, i.e. to use pn to loop through numbers again by stepping through the array, pn needs to be moved to the front of the numbers array, otherwise you'll access memory that you shouldn't be looking at (non-allocated memory).

First, arrays are not pointers. They decay to pointers when used in function calls (and in most other expressions) and can be used (almost) the same way.
Some subtle differences
int a[5]; /* array */
int *pa = a; /* pointer */
pa[0] = 5;
printf("%d\n", a[0]); /* ok it is the same here */
printf("address of array %p - address of pointer %p, value of pointer %p\n",
(void *)&a, (void *)&pa, (void *)pa); /* &a is the same as pa not &pa */
printf("size of array %zu - size of pointer %zu\n", sizeof(a), sizeof(pa));
sizeof(a) is here 5 * sizeof(int) whereas sizeof(pa) is the size of a pointer.
Now for your question:
After the first loop, pn points to numbers[10] (one past the last element) and no longer to numbers[0]. That's the reason why you must reset it.

Just to drive the point home, arrays are not pointers. When you declare numbers as int numbers[10], you get the following in memory:
+---+
numbers: | | numbers[0]
+---+
| | numbers[1]
+---+
...
+---+
| | numbers[9]
+---+
There's no storage set aside for a separate pointer to the first element of numbers. What happens is that when the expression numbers appears anywhere, and it isn't the operand of the sizeof or unary & operators, it is converted ("decays") to an expression of type "pointer to int", and the value of the expression is the address of the first element of the array.
What you're doing with pn is setting it to point to the first element of numbers, and then "walking" through the array:
+---+
numbers: | | <------+
+---+ |
| | |
+---+ |
... |
+---+ |
| | |
+---+ |
... |
|
+---+ |
pn: | | -------+
+---+
The expression pn++ advances pn to point to the next integer object, which in this case is the next element of the array:
+---+
numbers: | |
+---+
| | <------+
+---+ |
... |
+---+ |
| | |
+---+ |
... |
|
+---+ |
pn: | | -------+
+---+
Each pn++ advances the pointer until, at the end of the first loop, you have the following:
+---+
numbers: | |
+---+
| |
+---+
...
+---+
| |
+---+
... <------+
|
+---+ |
pn: | | -------+
+---+
At this point, pn is pointing to the object immediately following the end of the array. This is why you have to reset pn before the next loop; otherwise you're walking through the memory immediately following numbers, which can contain pretty much anything, including trap representations (i.e., bit patterns that don't correspond to a legal value for the given type).
Dereferencing a pointer that points one past the end of an array, or even just computing a pointer further past the end than that, invokes undefined behavior, which can mean anything from your code crashing outright to displaying garbage to working as expected.

While the array is being filled, the pointer pn is incremented as each value is stored in the array. The same pointer variable is then used to print the array contents, which is why it is reinitialized first.

Related

Why should we use a pointer to store an address in C?

I was learning C language where I saw that pointers are variables that store the address of other variables. So I ran this code:
int x = 10;
int *p;
p = &x;
printf("%i\n", p);
Result: 6422292
Then I tried to do the same thing without using pointers, just using a variable to store the address:
int z = 10;
int v;
v = &z;
printf("%i", v);
Result: 6422282
Since we can use variables to store other variables' address, why do we use pointers at all?
Pointers are not integers. They may have integral representation, but they do not behave like integers and should not be treated like integers. Note that on platforms like x86_64 an int is not wide enough to store a pointer value.
Pointers are a distinct class of datatypes for storing the location of an object or function - they are an abstraction of a memory address, with additional type information. Remember, a data type isn't just about what values you can store, but also about what operations you can perform on those values. Pointer operations are distinct from integer operations. The + and - operators mean very different things for integer and pointer types. The unary * operator is not defined for integer types. The arithmetic * and / operators are not defined for pointer types.
And so on.
Pointers to different types are themselves different types and are not interchangeable. Pointer arithmetic (the basis of array subscripting) is based on the pointed-to type. That is, if cp is a char * pointing to a char object, then cp + 1 yields the location of the next char object immediately following. If ip is an int * pointing to an int object, then ip + 1 yields the location of the next int object immediately following:
+---+
c: | | <--- cp
+---+
| | <--- cp + 1
+---+
...
+---+
i: | | <--- ip
+---+
| |
+---+
| |
+---+
| |
+---+
| | <-- ip + 1
+---+
| |
+---+
| |
+---+
| |
+---+
...
This is what I mean about pointers not behaving like integers. They have their own distinct semantics.
C expects the operand of the unary * operator to have pointer type. If you try to dereference an integer, even if that integer object stores a valid address value, the compiler will yell at you.
In the case of an integer it looks fine, because an address is itself a number, but try to do this with other data types such as a string, an array, or a struct, and you will see why we need pointers in C.

How does this piece of code determine array size without using sizeof( )?

Going through some C interview questions, I've found a question stating "How to find the size of an array in C without using the sizeof operator?", with the following solution. It works, but I cannot understand why.
#include <stdio.h>
int main() {
int a[] = {100, 200, 300, 400, 500};
int size = 0;
size = *(&a + 1) - a;
printf("%d\n", size);
return 0;
}
As expected, it returns 5.
edit: people pointed out this answer, but the syntax does differ a bit, i.e. the indexing method
size = (&arr)[1] - arr;
so I believe both questions are valid and have a slightly different approach to the problem. Thank you all for the immense help and thorough explanation!
When you add 1 to a pointer, the result is the location of the next object in a sequence of objects of the pointed-to type (i.e., an array). If p points to an int object, then p + 1 will point to the next int in a sequence. If p points to a 5-element array of int (in this case, the expression &a), then p + 1 will point to the next 5-element array of int in a sequence.
Subtracting two pointers (provided they both point into the same array object, or one is pointing one past the last element of the array) yields the number of objects (array elements) between those two pointers.
The expression &a yields the address of a, and has the type int (*)[5] (pointer to 5-element array of int). The expression &a + 1 yields the address of the next 5-element array of int following a, and also has the type int (*)[5]. The expression *(&a + 1) dereferences the result of &a + 1, such that it yields the address of the first int following the last element of a, and has type int [5], which in this context "decays" to an expression of type int *.
Similarly, the expression a "decays" to a pointer to the first element of the array and has type int *.
A picture may help:
int [5] int (*)[5] int int *
+---+ +---+
| | <- &a | | <- a
| - | +---+
| | | | <- a + 1
| - | +---+
| | | |
| - | +---+
| | | |
| - | +---+
| | | |
+---+ +---+
| | <- &a + 1 | | <- *(&a + 1)
| - | +---+
| | | |
| - | +---+
| | | |
| - | +---+
| | | |
| - | +---+
| | | |
+---+ +---+
This is two views of the same storage - on the left, we're viewing it as a sequence of 5-element arrays of int, while on the right, we're viewing it as a sequence of int. I also show the various expressions and their types.
Be aware, the expression *(&a + 1) results in undefined behavior:
...
If the result points one past the last element of the array object, it
shall not be used as the operand of a unary * operator that is evaluated.
C 2011 Online Draft, 6.5.6/9
This line is of most importance:
size = *(&a + 1) - a;
As you can see, it first takes the address of a and adds one to it. Then, it dereferences that pointer and subtracts the original value of a from it.
Pointer arithmetic in C causes this to return the number of elements in the array, or 5. Adding one to &a gives a pointer to the next array of 5 ints after a. The code then dereferences the resulting pointer and subtracts a (an array type that has decayed to a pointer) from it, giving the number of elements in the array.
Details on how pointer arithmetic works:
Say you have a pointer xyz that points to an int type and contains the value (int *)160. When you subtract any number from xyz, C specifies that the actual amount subtracted from xyz is that number times the size of the type that it points to. For example, if you subtracted 5 from xyz, the resulting address would be the one you would get from xyz - (sizeof(*xyz) * 5) if pointer arithmetic didn't apply (that is, in plain byte arithmetic).
As a is an array of 5 int types, the resulting value will be 5. However, this will not work with a pointer, only with an array. If you try this with a pointer, &a + 1 only steps past the pointer variable itself, so the result says nothing about the number of elements it points to.
Here's a little example that shows the addresses and how this is undefined. The left-hand side shows the addresses:
a + 0 | [a[0]] | &a points to this
a + 1 | [a[1]]
a + 2 | [a[2]]
a + 3 | [a[3]]
a + 4 | [a[4]] | end of array
a + 5 | [a[5]] | &a+1 points to this; accessing past array when dereferenced
This means that the code is subtracting a from &a[5] (or a+5), giving 5.
Note that this is undefined behavior, and should not be used under any circumstances. Do not expect the behavior of this to be consistent across all platforms, and do not use it in production programs.
Hmm, I suspect this is something that would not have worked back in the early days of C. It is clever though.
Taking the steps one at a time:
&a gets a pointer to an object of type int[5]
+1 gets the next such object assuming there is an array of those
* effectively converts that address into type pointer to int
-a subtracts the two int pointers, returning the count of int instances between them.
I'm not sure it is completely legal (in this I mean language-lawyer legal - not will it work in practice), given some of the type operations going on. For example you are only "allowed" to subtract two pointers when they point to elements in the same array. *(&a+1) was synthesised by accessing another array, albeit a parent array, so is not actually a pointer into the same array as a.
Also, while you are allowed to synthesise a pointer past the last element of an array, and you can treat any object as an array of 1 element, the operation of dereferencing (*) is not "allowed" on this synthesised pointer, even though it has no behaviour in this case!
I suspect that in the early days of C (K&R syntax, anyone?), an array decayed into a pointer much more quickly, so the *(&a+1) might only return the address of the next pointer of type int**. The more rigorous definitions of modern C++ definitely allow the pointer to array type to exist and know the array size, and probably the C standards have followed suit. All C function code only takes pointers as arguments, so the technical visible difference is minimal. But I am only guessing here.
This sort of detailed legality question usually applies to a C interpreter, or a lint type tool, rather than the compiled code. An interpreter might implement a 2D array as an array of pointers to arrays, because there is one less runtime feature to implement, in which case dereferencing the +1 would be fatal, and even if it worked would give the wrong answer.
Another possible weakness may be that the C compiler might align the outer array. Imagine if this was an array of 5 chars (char arr[5]), when the program performs &a+1 it is invoking "array of array" behaviour. The compiler might decide that an array of array of 5 chars (char arr[][5]) is actually generated as an array of array of 8 chars (char arr[][8]), so that the outer array aligns nicely. The code we are discussing would now report the array size as 8, not 5. I'm not saying a particular compiler would definitely do this, but it might.

What is a free pointer in C?

I learnt that there are two ways of declaring an array in C:
int array[] = {1,2,3};
and:
int* arr = malloc(3*sizeof(int));
Why is arr called a free pointer? And why can't I change the address contained in array while I can do it with arr?
As said in comments, you learned something incorrect, from a bad source.
In the second case, arr is not an array, it's a pointer. A pointer that (if the allocation succeeds) happens to contain the address of a block of memory that can hold three ints, but that's not an array.
This confusion probably comes from the fact that arrays "decay" to pointers in some contexts, but that does not make them equivalent.
Let's look at how the two objects are laid out in memory:
+---+
array: | 1 | array[0]
+---+
| 2 | array[1]
+---+
| 3 | array[2]
+---+
+---+ +---+
arr: | | ---------> | ? | arr[0]
+---+ +---+
| ? | arr[1]
+---+
| ? | arr[2]
+---+
So, one immediate difference - there is no array object that is separate from the array elements themselves, whereas arr is a separate object from the array elements. Only array is an actual array as far as C is concerned - arr is just a pointer to a single object, which may be the first element of a sequence of objects or not.
This is why you can assign a new address value to arr, but not to array - in the second case, there's nothing to assign the new address value to. It's like trying to change the address of a scalar variable - you can't do it, because the operation doesn't make any sense.
It also means that the address of array[0] is the same as the address of array. The expressions &array[0], array, and &array will all yield the same address value, although the types of the expressions will be different (int *, int *, and int (*)[3], respectively). By contrast, the address of arr is not the same as the address of arr[0]; the expressions arr and &arr[0] will yield the same value, but &arr will not, and its type will be int ** instead of int (*)[3].

Why does free work like this?

Given the following code:
typedef struct Tokens {
char **data;
size_t count;
} Tokens;
void freeTokens(Tokens *tokens) {
int d;
for(d = 0;d < tokens->count;d++)
free(tokens->data[d]);
free(tokens->data);
free(tokens);
tokens = NULL;
}
Why do I need that extra:
free(tokens->data);
Shouldn't that be handled in the for loop?
I've tested both against valgrind/drmemory and indeed the top loop correctly deallocates all dynamic memory, however if I remove the identified line I leak memory.
How come?
Let's look at a diagram of the memory you're using in the program:
+---------+ +---------+---------+---------+-----+
| data | --> | char * | char * | char * | ... |
+---------+ +---------+---------+---------+-----+
| count | | | |
+---------+ v v v
+---+ +---+ +---+
| a | | b | | c |
+---+ +---+ +---+
|...| |...| |...|
+---+ +---+ +---+
In C, we can dynamically allocate space for a group (more simply, an array) of elements. However, we can't use an array type to reference that dynamic allocation, and instead use a pointer type. In this case, the pointer just points to the first element of the dynamically allocated array. If you add 1 to the pointer, you'll get a pointer to the second element of the dynamically allocated array, add 2 to get a pointer to the third element, and so on.
In C, the bracket syntax (data[1]) is shorthand for pointer addition followed by a dereference, so data[1] means *(data + 1). Pointers in C can therefore be used like arrays in this way.
In the diagram, data points to the first char * in the dynamically allocated array, which is elsewhere in memory.
Each member of the array pointed to by data is a string, itself dynamically allocated (since the elements are char *s).
So, the loop deallocates the strings ('a...', 'b...', 'c...', etc), free(tokens->data) deallocates the array data points to, and finally, free(tokens) frees the entire struct.
data is a pointer to a pointer. This means data points to a dynamically allocated array of pointers, each of which points to the actual data. The first for loop frees each of the pointers IN the array, but you still need to free the original pointer TO that array, whose elements you have already freed. That's the reason for the line you pointed out.
As a general rule of thumb, every malloc() should have a corresponding call to free(). If you look at the code which allocates the memory in this program, you will very likely see a very strict correspondence with the code you posted here that frees the memory.

The good old beginnerkiller: Pointers

I am currently studying C and reached the point (haha...) where I am learning about pointers. I think I know a bit about them already and I think I get the concept of them.
If I have a pointer named "c" and an integer named "a" with the value of 5 and I do the following:
*c = a;
I set the value of the pointer c (because I am using the asterisk symbol) to the value of a, which is 5. So *c is 5 after that and c is equal to the memory address of a - Correct?
What about the following then:
c = &a;
I just pass the memory address in which the value of a is stored to the pointer.
Are both operations equal? From my point of view they do the same - Is that correct?
*c = a;
You will end up with this:
+---+ +---+ +---+
| A | | C | | ? |
+---+ +---+ +---+
| 5 | | # | | 5 |
+---+ +-+-+ +-+-+
| ^
| |
+--------+
However, with:
c = &a;
you'll end up with:
+---+ +---+
| A | | C |
+---+ +---+
| 5 | | # |
+---+ +-+-+
^ |
| |
+---------+
So in both cases, you'll have *c == 5, but what differs is what c points to.
Almost. Keep in mind that with the first one you are saying: "take the portion of memory that 'c' points to and make it equal to 5". Therefore, 'c' must already point at some portion of memory before you set it to 5. So you actually have two different variables ('a' and the one 'c' points to).
In the second case you are just making 'c' point to 'a'. So yes, you will again point to a value of 5, but now you have only one variable 'a' and a pointer pointing to that same space in memory.
If I have a pointer named "c" and an integer named "a" with the value
of 5 and I do the following:
*c = a; I set the value of the pointer c (because I am using the asterisk symbol) to the value of a, which is 5. So *c is 5 after that
and c is equal to the memory address of a - Correct?
No, the contents of the variable that c is pointing to will get the value of the variable a. The pointer itself is not changed.
What about the following then:
c = &a; I just pass the memory address in which the value of a is
stored to the pointer.
This changes the pointer c itself to point to the variable a.
Are both operations equal? From my point of view they do the same - Is that correct?
No, the first changes the value of whatever c points to, the second one changes what the pointer points to.
An important difference is that the first (*c = a) requires that c is valid (i.e. actually points to an object). The second one makes c a valid pointer, overwriting its previous value.
If you use *c=a, it assigns 5 to the memory address that c points to, and if you later change the value of a, the value c points to is not affected.
Whereas if you use c=&a, c starts pointing to the memory address of a, and if you change the value of a, the value that c points to (*c) changes too.

Resources