Why do both pointers show the same memory address? - c

.....
// Some code
char *options[] = {"\nDATA:","\nSUBJECT:","\nMAILFROM:","\nRCPTO:"};
char *data[3] = {};
I am initializing these arrays of pointers.
But when I try to access each member of the arrays, I can see that
options[0] = data[3]
0x40873b = 0x40873b
so they both point to the same memory location,
even though I have declared the 'options' array before the 'data' array.
How can I resolve this?
How can I be sure that the two arrays are at different memory locations and store their contents properly, without overlapping?

When you write
char *data[3] = {};
the [3] means "allocate space for three elements of the array".
It does not mean that you have just created a pointer named
data[3] (in fact data[3] is not an element of the array at all),
nor that data[3] is part of the memory that was
just allocated; rather, the three elements of the memory allocated for
the array are data[0], data[1], and data[2],
which are at the memory locations data,
data + 1, and data + 2 (each sizeof(char *) bytes after the previous one).
If you write
data[3] == options[0]
then data[3] means whatever is at the memory location data + 3,
which is the first thing after the last allocated element of data.
The compiler happens to have started the memory allocation
for options there, so reading data[3] picks up the value stored in
options[0], which is the pointer 0x40873b.

It looks like the compiler laid your objects out as follows:
+---+
data: | | data[0]
+---+
| | data[1]
+---+
| | data[2] <-- last element of data array
+---+
options: | | options[0], data[3]
+---+
| | options[1], data[4]
+---+
| | options[2], data[5]
+---+
| | options[3], data[6] <-- last element of options array
+---+
Your data array contains 3 elements, indexed from 0 to 2. When you access data[3], you're accessing an object one past the end of the data array, and it just so happens to be the first object of the options array.
Note that attempting to read an object one past the end of an array invokes undefined behavior; C doesn't do any bounds checking on array accesses, so doing this won't raise an OutOfBounds exception or anything like that. In this particular case, you got a reasonable-looking value because the object following the last element of the data array has the same type as that element (char *). You could theoretically iterate through the entire options array using data (as shown above), although that will only "work" in this specific case; if you add another variable or change the code, the compiler could change the order in which things are laid out in memory, and this would suddenly not "work" anymore.

Related

How does this piece of code determine array size without using sizeof( )?

Going through some C interview questions, I've found a question stating "How to find the size of an array in C without using the sizeof operator?", with the following solution. It works, but I cannot understand why.
#include <stdio.h>
int main() {
int a[] = {100, 200, 300, 400, 500};
int size = 0;
size = *(&a + 1) - a;
printf("%d\n", size);
return 0;
}
As expected, it returns 5.
edit: people pointed out this answer, but the syntax does differ a bit, i.e. the indexing method
size = (&arr)[1] - arr;
so I believe both questions are valid and have a slightly different approach to the problem. Thank you all for the immense help and thorough explanation!
When you add 1 to a pointer, the result is the location of the next object in a sequence of objects of the pointed-to type (i.e., an array). If p points to an int object, then p + 1 will point to the next int in a sequence. If p points to a 5-element array of int (in this case, the expression &a), then p + 1 will point to the next 5-element array of int in a sequence.
Subtracting two pointers (provided they both point into the same array object, or one is pointing one past the last element of the array) yields the number of objects (array elements) between those two pointers.
The expression &a yields the address of a, and has the type int (*)[5] (pointer to 5-element array of int). The expression &a + 1 yields the address of the next 5-element array of int following a, and also has the type int (*)[5]. The expression *(&a + 1) dereferences the result of &a + 1, such that it yields the address of the first int following the last element of a, and has type int [5], which in this context "decays" to an expression of type int *.
Similarly, the expression a "decays" to a pointer to the first element of the array and has type int *.
A picture may help:
int [5] int (*)[5] int int *
+---+ +---+
| | <- &a | | <- a
| - | +---+
| | | | <- a + 1
| - | +---+
| | | |
| - | +---+
| | | |
| - | +---+
| | | |
+---+ +---+
| | <- &a + 1 | | <- *(&a + 1)
| - | +---+
| | | |
| - | +---+
| | | |
| - | +---+
| | | |
| - | +---+
| | | |
+---+ +---+
This is two views of the same storage - on the left, we're viewing it as a sequence of 5-element arrays of int, while on the right, we're viewing it as a sequence of int. I also show the various expressions and their types.
Be aware, the expression *(&a + 1) results in undefined behavior:
...
If the result points one past the last element of the array object, it
shall not be used as the operand of a unary * operator that is evaluated.
C 2011 Online Draft, 6.5.6/9
This line is of most importance:
size = *(&a + 1) - a;
As you can see, it first takes the address of a and adds one to it. Then, it dereferences that pointer and subtracts the original value of a from it.
Pointer arithmetic in C causes this to return the number of elements in the array, or 5. Adding one to &a gives a pointer to the next array of 5 ints after a. After that, this code dereferences the resulting pointer and subtracts a (an array type that has decayed to a pointer) from that, giving the number of elements in the array.
Details on how pointer arithmetic works:
Say you have a pointer xyz that points to an int type and contains the value (int *)160. When you subtract any number from xyz, C specifies that the actual amount subtracted from xyz is that number times the size of the type that it points to. For example, if you subtracted 5 from xyz, the resulting address would be 160 - (sizeof(*xyz) * 5), which is 140 if sizeof(int) is 4.
As a is an array of 5 int types, the resulting value will be 5. However, this will not work with a pointer, only with an array. If you try this with a pointer p, &p + 1 points just past the pointer variable itself, so the result says nothing about the number of elements p points to.
Here's a little example that shows the addresses and how this is undefined. The left-hand side shows the addresses:
a + 0 | [a[0]] | &a points to this
a + 1 | [a[1]]
a + 2 | [a[2]]
a + 3 | [a[3]]
a + 4 | [a[4]] | end of array
a + 5 | [a[5]] | &a+1 points to this; accessing past array when dereferenced
This means that the code is subtracting a from &a[5] (or a+5), giving 5.
Note that this is undefined behavior, and should not be used under any circumstances. Do not expect the behavior of this to be consistent across all platforms, and do not use it in production programs.
Hmm, I suspect this is something that would not have worked back in the early days of C. It is clever though.
Taking the steps one at a time:
&a gets a pointer to an object of type int[5]
+1 gets the next such object assuming there is an array of those
* effectively converts that address into type pointer to int
-a subtracts the two int pointers, returning the count of int instances between them.
I'm not sure it is completely legal (in this I mean language-lawyer legal - not will it work in practice), given some of the type operations going on. For example you are only "allowed" to subtract two pointers when they point to elements in the same array. *(&a+1) was synthesised by accessing another array, albeit a parent array, so is not actually a pointer into the same array as a.
Also, while you are allowed to synthesise a pointer past the last element of an array, and you can treat any object as an array of 1 element, the operation of dereferencing (*) is not "allowed" on this synthesised pointer, even though it reads no memory in this case!
I suspect that in the early days of C (K&R syntax, anyone?), an array decayed into a pointer much more quickly, so the *(&a+1) might only return the address of the next pointer of type int**. The more rigorous definitions of modern C++ definitely allow the pointer to array type to exist and know the array size, and probably the C standards have followed suit. All C function code only takes pointers as arguments, so the technical visible difference is minimal. But I am only guessing here.
This sort of detailed legality question usually applies to a C interpreter, or a lint type tool, rather than the compiled code. An interpreter might implement a 2D array as an array of pointers to arrays, because there is one less runtime feature to implement, in which case dereferencing the +1 would be fatal, and even if it worked would give the wrong answer.
Another possible weakness I wondered about is alignment of the outer array. Imagine this was an array of 5 chars (char arr[5]); when the program performs &a+1 it is invoking "array of array" behaviour. If a compiler generated an array of array of 5 chars (char arr[][5]) as an array of array of 8 chars (char arr[][8]) so that the outer array aligned nicely, the code we are discussing would report the array size as 8, not 5. A conforming compiler cannot do this, though: C requires array elements to be contiguous, so sizeof(char[5]) is exactly 5 and &a + 1 is exactly 5 bytes past &a.

What is a free pointer in C?

I learnt that there are two ways of declaring an array in C:
int array[] = {1,2,3};
and:
int* arr = malloc(3*sizeof(int));
Why is arr called a free pointer? And why can't I change the address contained in array while I can do it with arr?
As said in comments, you learned something incorrect, from a bad source.
In the second case, arr is not an array, it's a pointer. A pointer that (if the allocation succeeds) happens to contain the address of a block of memory that can hold three ints, but that's not an array.
This confusion probably comes from the fact that arrays "decay" to pointers in some contexts, but that does not make them equivalent.
Let's look at how the two objects are laid out in memory:
+---+
array: | 1 | array[0]
+---+
| 2 | array[1]
+---+
| 3 | array[2]
+---+
+---+ +---+
arr: | | ---------> | ? | arr[0]
+---+ +---+
| ? | arr[1]
+---+
| ? | arr[2]
+---+
So, one immediate difference - there is no array object that is separate from the array elements themselves, whereas arr is a separate object from the array elements. Only array is an actual array as far as C is concerned - arr is just a pointer to a single object, which may be the first element of a sequence of objects or not.
This is why you can assign a new address value to arr, but not to array - in the second case, there's nothing to assign the new address value to. It's like trying to change the address of a scalar variable - you can't do it, because the operation doesn't make any sense.
It also means that the address of array[0] is the same as the address of array. The expressions &array[0], array, and &array will all yield the same address value, although the types of the expressions will be different (int *, int *, and int (*)[3], respectively). By contrast, the address of arr is not the same as the address of arr[0]; the expressions arr and &arr[0] will yield the same value, but &arr will not, and its type will be int ** instead of int (*)[3].

Why does free work like this?

Given the following code:
typedef struct Tokens {
char **data;
size_t count;
} Tokens;
void freeTokens(Tokens *tokens) {
int d;
for(d = 0;d < tokens->count;d++)
free(tokens->data[d]);
free(tokens->data);
free(tokens);
tokens = NULL;
}
Why do I need that extra:
free(tokens->data);
Shouldn't that be handled in the for loop?
I've tested both against valgrind/drmemory and indeed the top loop correctly deallocates all dynamic memory, however if I remove the identified line I leak memory.
How come?
Let's look at a diagram of the memory you're using in the program:
+---------+ +---------+---------+---------+-----+
| data | --> | char * | char * | char * | ... |
+---------+ +---------+---------+---------+-----+
| count | | | |
+---------+ v v v
+---+ +---+ +---+
| a | | b | | c |
+---+ +---+ +---+
|...| |...| |...|
+---+ +---+ +---+
In C, we can dynamically allocate space for a group (more simply, an array) of elements. However, we can't use an array type to reference that dynamic allocation, and instead use a pointer type. In this case, the pointer just points to the first element of the dynamically allocated array. If you add 1 to the pointer, you'll get a pointer to the second element of the dynamically allocated array, add 2 to get a pointer to the third element, and so on.
In C, the bracket syntax (data[1]) is shorthand for addition and dereferencing to a pointer. So pointers in C can be used like arrays in this way.
In the diagram, data points to the first char * in the dynamically allocated array, which is elsewhere in memory.
Each member of the array pointed to by data is a string, itself dynamically allocated (since the elements are char *s).
So, the loop deallocates the strings ('a...', 'b...', 'c...', etc), free(tokens->data) deallocates the array data points to, and finally, free(tokens) frees the entire struct.
data is a pointer to a pointer. This means data points to a dynamically allocated array of pointers, each of which points to the actual data. The for loop frees what each of the pointers IN the array points to, but you still need to free the array itself through the original pointer TO it. That's the reason for the line you pointed out.
As a general rule of thumb, every malloc() should have a corresponding call to free(). If you look at the code which allocates the memory in this program, you will very likely see a very strict correspondence with the code you posted here that frees the memory.

Reinitializing Pointers for C Language

I'm currently learning C Programming through Dan Gookin's book Beginning C Programming for Dummies.
One of the topics I'm currently reading is on the fact that arrays are in fact pointers. Dan attempted to prove that with the following code:
#include <stdio.h>
int main()
{
int numbers[10];
int x;
int *pn;
pn = numbers; /* initialize pointer */
/* Fill array */
for(x=0;x<10;x++)
{
*pn=x+1;
pn++;
}
pn = numbers;
/* Display array */
for(x=0;x<10;x++)
{
printf("numbers[%d] = %d, address %p\n",
x+1,*pn,pn);
pn++;
}
return(0);
}
My question is really with line 17 (the second pn = numbers; statement). I realized that if I do not reinitialize the pointer as in line 17, the values of pointer pn displayed in the second for loop are a bunch of garbage that does not make sense. Therefore, I would like to know why there is a need to reinitialize the pointer pn for the code to work as intended.
An array is not a pointer, but C allows you to assign an array to a pointer to its element type, with the effect that the pointer points to the first item in the array. That's what pn = numbers does.
pn is a pointer to an int, not to an array. It points to a single integer. When you increment the pointer, it moves to the next int in memory. The distance it moves is the size of the pointed-to type, so sizeof(int) bytes in this case.
So what does this prove? Not that an array is a pointer, but only that an array is a contiguous block of memory that consists of N times the size of the type of your array item.
If you run the second loop without resetting the pointer, it starts at a piece of memory that doesn't belong to the array anymore, and so you get 'garbage' which is just the information which happens to exist at that location.
If you want to iterate over the array again by incrementing a pointer, you will have to reinitialize that pointer to the first item. The for loop does only one thing, which is counting to 10. It doesn't know about the array and it doesn't know about the pointer, so the loop isn't going to automatically reset the pointer for you.
Since pn is incremented in the first loop, after the first loop is finished, pn will point to an address beyond the numbers array. Therefore, you must initialize pn to the beginning of the array before the second loop since you use the same pointer for printing the contents.
Because you have changed the address contained in pn in the statement pn++ in the following code snippet.
for(x=0;x<10;x++)
{
*pn=x+1;
pn++;
}
The pn pointer is being used to point into the numbers array.
The first for-loop uses pn to set the values, stepping pn through the data element by element. After the end of the loop, pn points off the end of numbers (at a non-existent 11th element).
For the second for-loop to work, i.e. to use pn to loop through numbers again by stepping through the array, pn needs to be moved to the front of the numbers array, otherwise you'll access memory that you shouldn't be looking at (non-allocated memory).
First, arrays are not pointers. They decay to pointers in most expressions (for example when passed in function calls) and can then be used (almost) the same.
Some subtle differences
int a[5]; /* array */
int *pa = a; /* pointer */
pa[0] = 5;
printf("%d\n", a[0]); /* ok it is the same here */
printf("address of array %p - address of pointer %p, value of pointer %p\n",
(void *)&a, (void *)&pa, (void *)pa); /* &a is the same as pa not &pa */
printf("size of array %zu - size of pointer %zu\n", sizeof(a), sizeof(pa));
sizeof(a) is here 5 * sizeof(int) whereas sizeof(pa) is the size of a pointer.
Now for your question:
After the first loop, pn points to numbers[10] and no longer to numbers[0]. That's the reason why you must reset it.
Just to drive the point home, arrays are not pointers. When you declare numbers as int numbers[10], you get the following in memory:
+---+
numbers: | | numbers[0]
+---+
| | numbers[1]
+---+
...
+---+
| | numbers[9]
+---+
There's no storage set aside for a separate pointer to the first element of numbers. What happens is that when the expression numbers appears anywhere, and it isn't the operand of the sizeof or unary & operators, it is converted ("decays") to an expression of type "pointer to int", and the value of the expression is the address of the first element of the array.
What you're doing with pn is setting it to point to the first element of numbers, and then "walking" through the array:
+---+
numbers: | | <------+
+---+ |
| | |
+---+ |
... |
+---+ |
| | |
+---+ |
... |
|
+---+ |
pn: | | -------+
+---+
The expression pn++ advances pn to point to the next integer object, which in this case is the next element of the array:
+---+
numbers: | |
+---+
| | <------+
+---+ |
... |
+---+ |
| | |
+---+ |
... |
|
+---+ |
pn: | | -------+
+---+
Each pn++ advances the pointer until, at the end of the first loop, you have the following:
+---+
numbers: | |
+---+
| |
+---+
...
+---+
| |
+---+
... <------+
|
+---+ |
pn: | | -------+
+---+
At this point, pn is pointing to the object immediately following the end of the array. This is why you have to reset pn before the next loop; otherwise you're walking through the memory immediately following numbers, which can contain pretty much anything, including trap representations (i.e., bit patterns that don't correspond to a legal value for the given type).
Trying to read or write through a pointer that points past the end of an array invokes undefined behavior, which can mean anything from your code crashing outright to displaying garbage to working as expected.
While the array is being filled, the pointer pn is incremented as the data is placed in the array. The same pointer variable is then used to print the array contents, which is why it has to be reinitialised.

const char **name VS char *name[]

I know this topic was already discussed several times and I think I basically know the difference between arrays and pointers, but I am interested in how exactly arrays are stored in memory.
for example:
const char **name = {{'a',0},{'b',0},{'c',0},0};
printf("Char: %c\n", name[0][0]); // This does not work
but if it's declared like this:
const char *name[] = {"a","b","c"};
printf("Char: %c\n", name[0][0]); // Works well
everything works out fine.
When you define a variable like
char const* str = "abc";
char const** name = &str;
it looks something like this:
+---+ +---+ +---+---+---+---+
| *-+---->| *-+--->| a | b | c | 0 |
+---+ +---+ +---+---+---+---+
When you define a variable using the form
char const* name[] = { "a", "b", "c" };
You have an array of pointers. This looks something like this:
+---+ +---+---+
| *-+---->| a | 0 |
+---+ +---+---+
| *-+---->| b | 0 |
+---+ +---+---+
| *-+---->| c | 0 |
+---+ +---+---+
What may be confusing is that when you pass this array somewhere, it decays into a pointer and you get this:
+---+ +---+ +---+---+
| *-+---->| *-+---->| a | 0 |
+---+ +---+ +---+---+
| *-+---->| b | 0 |
+---+ +---+---+
| *-+---->| c | 0 |
+---+ +---+---+
That is, you get a pointer to the first element of the array. Incrementing this pointer moves on to the next element of the array.
A string literal converts implicitly to char const*.
The curly braces initializer doesn't.
Not relevant to your example, but worth knowing: up till and including C++03 a string literal could also implicitly convert to char* (no const), for compatibility with old C, but happily in C++11 this unsafe conversion was finally removed.
The reason the first snippet does not work is that the compiler re-interprets the sequence of characters as the value of a pointer, and then ignores the rest of the initializers. In order for the snippet to work, you need to tell the compiler that you are declaring an array, and that the elements of that array are arrays themselves, like this:
const char *name[] = {(char[]){'a',0},(char[]){'b',0},(char[]){'c',0},0};
With this modification in place, your program works and produces the desired output.
Your first example declares a pointer to a pointer to char. The second declares an array of pointers to char. The difference is that there's one more layer of indirection in the first one. It's a bit hard to describe without a drawing.
In a fake assembly style,
char **name = {{'a',0},{'b',0},{'c',0},0};
would translate to something like:
t1: .byte 'a', 0
.align somewhere; possibly somewhere convenient
t2: .byte 'b', 0
.align
t3: .byte 'c', 0
.align
t4: .dword t1, t2, t3, 0
name: .dword t4
while the second one,
char *name[] = {"a","b","c"};
might generate the same code for t1, t2, and t3, but then would do
name: .dword t1, t2, t3
Does that make sense?
Arrays are stored in memory as a contiguous sequence of objects, where the type of that object is the base type of the array. So, in the case of your array:
const char *name[] = {"a","b","c"};
The base type of the array is const char * and the size of the array is 3 (because your initialiser has three elements). It would look like this in memory:
| const char * | const char * | const char * |
Note that the elements of the array are pointers - the actual strings aren't stored in the array. Each one of those strings is a string literal, which is an array of char. In this case, they're all arrays of two chars, so somewhere else in memory you have three unnamed arrays:
| 'a' | 0 |
| 'b' | 0 |
| 'c' | 0 |
The initialiser sets the three elements of your name array to point to the initial elements of these three unnamed arrays. name[0] points to the 'a', name[1] points to the 'b' and name[2] points to the 'c'.
You have to look at what happens when you declare a variable, and where the memory to store the data for the variable goes.
First, what does it mean to simply write:
char x = 42;
you get enough bytes to hold a char on the stack, and those bytes are set to the value 42.
Secondly, what happens when you declare an array:
char x[] = "hello";
you get 6 bytes on the stack, and they are set to the characters h, e, l, l, o, and the value zero.
Now what happens if you declare a character pointer:
const char* x = "hello";
The bytes for "hello" are stored somewhere in static memory, and you get enough bytes to hold a pointer on the stack, and its value is set to the address of the first byte of that static memory that holds the value of the string.
So now what happens when you declare it as in your second example? You get three separate strings stored in static memory, "a", "b", and "c". Then on the stack you get an array of three pointers, each set to the memory location of those three strings.
So what is your first example trying to do? It looks like you want a pointer to an array of pointers, but the question is where will this array of pointers go? This is like my pointer example above, where something should be allocated in static memory. However, it just happens that you cannot declare a two dimensional array in static memory using brace initialisation like that. So you could do what you want by declaring the array as a variable outside of the function:
const char* name_pointers[] = {"a", "b", "c"};
then inside the function:
const char** name = name_pointers;

Resources