I learnt that there are two ways of declaring an array in C:
int array[] = {1,2,3};
and:
int* arr = malloc(3*sizeof(int));
Why is arr called a free pointer? And why can't I change the address contained in array, while I can do it with arr?
As said in comments, you learned something incorrect, from a bad source.
In the second case, arr is not an array, it's a pointer. A pointer that (if the allocation succeeds) happens to contain the address of a block of memory that can hold three ints, but that's not an array.
This confusion probably comes from the fact that arrays "decay" to pointers in some contexts, but that does not make them equivalent.
Let's look at how the two objects are laid out in memory:
       +---+
array: | 1 | array[0]
       +---+
       | 2 | array[1]
       +---+
       | 3 | array[2]
       +---+

     +---+               +---+
arr: |   | ------------> | ? | arr[0]
     +---+               +---+
                         | ? | arr[1]
                         +---+
                         | ? | arr[2]
                         +---+
So, one immediate difference - there is no array object that is separate from the array elements themselves, whereas arr is a separate object from the array elements. Only array is an actual array as far as C is concerned - arr is just a pointer to a single object, which may be the first element of a sequence of objects or not.
This is why you can assign a new address value to arr, but not to array - in the second case, there's nothing to assign the new address value to. It's like trying to change the address of a scalar variable - you can't do it, because the operation doesn't make any sense.
It also means that the address of array[0] is the same as the address of array. The expressions &array[0], array, and &array will all yield the same address value, although the types of the expressions will be different (int *, int *, and int (*)[3], respectively). By contrast, the address of arr is not the same as the address of arr[0]; the expressions arr and &arr[0] will yield the same value, but &arr will not, and its type will be int ** instead of int (*)[3].
This program works in C:
#include <stdio.h>
int main(void) {
char a[10] = "Hello";
char *b = a;
printf("%s",b);
}
There are two things I would expect to be different. One is that the second line in main should be "char *b = &a", so that the program looks like this:
#include <stdio.h>
int main(void) {
char a[10] = "Hello";
char *b = &a;
printf("%s",b);
}
But this does not work. Why is that? Isn't this the correct way to initialize a pointer with an address?
The second problem I have is that the last line should be printf("%s",*b), so that the program looks like this:
#include <stdio.h>
int main(void) {
char a[10] = "Hello";
char *b = a;
printf("%s",*b);
}
But this gives a segmentation fault. Why does this not work? Aren't we supposed to write "*" in front of a pointer to get its value?
There is a special rule in C. When you write
char *b = a;
you get the same effect as if you had written
char *b = &a[0];
That is, you automatically get a pointer to the array's first element. This happens any time you try to take the "value" of an array.
Aren't we supposed to write "*" in front of a pointer to get its value?
Yes, and if you wanted to get the single character pointed to by b, you would therefore need the *. This code
printf("first char: %c\n", *b);
would print the first character of the string. But when you write
printf("whole string: %s\n", b);
you get the whole string. %s prints multiple characters, and it expects a pointer. Down inside printf, when you use %s, it loops over and prints all the characters in the string.
Expanding on Steve's answer (which is the correct one to accept)...
This is the special rule he's talking about:
6.3.2.1 Lvalues, arrays, and function designators
...
3 Except when it is the operand of the sizeof operator, the _Alignof operator, or the
unary & operator, or is a string literal used to initialize an array, an expression that has
type ‘‘array of type’’ is converted to an expression with type ‘‘pointer to type’’ that points
to the initial element of the array object and is not an lvalue. If the array object has
register storage class, the behavior is undefined.
C 2011 Prepublication Draft
Arrays are weird and don't behave like other types. You don't get this "decay to a pointer to the first element" behavior in other aggregate types like struct types. You can't assign the contents of an entire array with the = operator like you can with struct types; for example, you can't do something like
int a[5] = {1, 2, 3, 4, 5};
int b[5];
...
b = a; // not allowed; that's what "is not an lvalue" means
Why are arrays weird?
C was derived from an earlier language named B, and when you declared an array in B:
auto arr[5];
the compiler set aside an extra word to point to the first element of the array:
     +---+
arr: |   | ----------+
     +---+           |
      ...            |
     +---+           |
     |   | arr[0] <--+
     +---+
     |   | arr[1]
     +---+
     |   | arr[2]
     +---+
     |   | arr[3]
     +---+
     |   | arr[4]
     +---+
The array subscript operation arr[i] was defined as *(arr + i) - given the starting address stored in arr, offset i elements from that address and dereference the result. This also meant that &arr would yield a different value from &arr[0].
When he was designing C, Ritchie wanted to keep B's array subscripting behavior, but he didn't want to set aside storage for the separate pointer that behavior required. So instead of storing a separate pointer, he created the "decay" rule. When you declare an array in C:
int arr[5];
the only storage set aside is for the array elements themselves:
+---+
arr: | | arr[0]
+---+
| | arr[1]
+---+
| | arr[2]
+---+
| | arr[3]
+---+
| | arr[4]
+---+
The subscript operation arr[i] is still defined as *(arr + i), but instead of storing a pointer value in arr, a pointer value is computed from the expression arr. This means &arr and &arr[0] will yield the same address value, but the types of the expressions will be different (int (*)[5] vs int *, respectively).
One practical effect of this rule is that you can use the [] operator on pointer expressions as well as array expressions - given your code you can write b[i] and it will behave exactly like a[i].
Another practical effect is that when you pass an array expression as an argument to a function, what the function actually receives is a pointer to the first element. This is why you often have to pass the array size as a separate parameter, because a pointer only points to a single object of the specified type; there's no way to know from the pointer value itself whether you're pointing to the first element of an array, how many elements are in the array, etc.
Arrays carry no metadata around, so there's no way to query an array for its size, or type, or anything else at runtime. The sizeof operator is evaluated at compile time, not runtime (variable-length arrays being the one exception).
#include<stdio.h>
int main(){
int a[] = {1,2,3};
int b[] = {4,5,6};
b = a;
return 0;
}
Result in this error:
array type 'int [3]' is not assignable
I know arrays are lvalues and are not assignable but in this case, all the compiler has to do is
reassign a pointer. b should just point to the address of a. Why isn't this doable?
"I know arrays are lvalues and are not assignable but in this case, all the compiler has to do is reassign a pointer."
"b should just point to the address of a. Why isn't this doable?"
You seem to be confusing something here. b isn't a pointer; it is an array of three int elements.
b = a;
Since b is used here as the lvalue in the assignment, it is treated as having type int [3], not int *. The array-to-pointer decay rule does not apply to b here, only to a, which is used as an rvalue.
You cannot assign to an array (here b) from a pointer to the first element of another array (here a) by writing b = a; in C.
The syntax doesn't allow that.
That's what the error
"array type 'int [3]' is not assignable"
is telling you about b.
Also, you seem to be under the impression that the array-to-pointer decay rule means an array is somehow converted into a pointer object that can store the addresses of different objects.
This is not true. The conversion happens only implicitly, and it is the subject of this SO question:
Is the array to pointer decay changed to a pointer object?
If you want to assign the values from array a to the array b, you can use memcpy():
memcpy(b, a, sizeof(a));
I know arrays are lvalues and are not assignable but in this case, all the compiler has to do is reassign a pointer. b should just point to the address of a. Why isn't this doable?
Because b isn't a pointer. When you declare and allocate a and b, this is what you get:
+---+
| 1 | a[0]
+---+
| 2 | a[1]
+---+
| 3 | a[2]
+---+
...
+---+
| 4 | b[0]
+---+
| 5 | b[1]
+---+
| 6 | b[2]
+---+
No space is set aside for any pointers. There is no pointer object a or b separate from the array elements themselves.
C was derived from an earlier language called B, and in B there was a separate pointer to the first element:
   +---+
b: | +-+--+
   +---+  |
    ...   |
     +----+
     |
     V
   +---+
   |   | b[0]
   +---+
   |   | b[1]
   +---+
    ...
   +---+
   |   | b[N-1]
   +---+
When Dennis Ritchie was developing C, he wanted to keep B's array semantics (specifically, a[i] == *(a + i)), but he didn't want to store that separate pointer anywhere. So instead he created the following rule - unless it is the operand of the sizeof or unary & operators, or is a string literal used to initialize a character array in a declaration, an expression of type "N-element array of T" will be converted ("decay") to an expression of type "pointer to T" and the value of the expression will be the address of the first element of the array, and that value is not an lvalue.
This has several practical effects, the most relevant here being that an array expression may not be the target of an assignment. Array expressions lose their "array-ness" under most circumstances, and simply are not treated like other types.
Edit
Actually, that misstates the case - an array expression may not be the target of an assignment because an array expression is not a modifiable lvalue. The decay rule doesn't come into play. But the statement "arrays are not treated like other types" still holds.
End Edit
The upshot is that you cannot copy the contents of one array to the other using just the = operator. You must either use a library function like memcpy or copy each element individually.
Others already explained what you got wrong. I'm writing this answer to explain that the compiler actually could assign one array to another, and that you can achieve the same effect with a minimal change to your sample code.
Just wrap your array in a structure.
#include <stdio.h>
int main(){
struct Array3 {
int t[3];
};
struct Array3 a = {{1,2,3}};
struct Array3 b = {{4,5,6}};
a = b;
printf("%d %d %d", a.t[0], a.t[1], a.t[2]);
return 0;
}
Once the array is wrapped in a structure, copying the array member of the structure works exactly like copying any other member. In other words, you are copying an array. This trick is useful in some cases, such as when you really want to pass an array to a function by copying it. It's slightly cleaner and safer than using memcpy for that purpose, which obviously would also work.
Hence the reason it is not allowed for top-level arrays is not that the compiler can't do it, but merely that it's not what most programmers usually want to do.
Usually they just want the array to decay to a pointer. Obviously that is what you thought it should do, and direct copying of arrays is likely forbidden specifically to avoid that misunderstanding.
From The C Programming Language:
The array name is the address of the zeroth element.
There is one difference between an array name and a pointer that must be kept in mind. A pointer is a variable. But an array name is not a variable.
My understanding is that the array name is a constant, so it can't be assigned to.
The variable b in your code is allocated on the stack as 3 consecutive ints. You can take the address of its first element and store it in a variable of type int *.
You could assign to it if you instead allocated the array on the heap and stored only a pointer to it on the stack; in that case you could change the value of the pointer to be the same as a.
Here is a sample code. I believe array is an address while *array is the value
int array[7][7];
array == *array
But I found out array same as *array. How is it?
Here
int arr[7][7];
arr is a two-dimensional array containing 7 one-dimensional arrays, and each one-dimensional array contains 7 elements. It looks like
arr[0][0] arr[0][1] arr[0][2] arr[0][3] arr[0][4] arr[0][5] arr[0][6]   ...
    |         |         |         |         |         |         |      ....
---------------------------------------------------------------------  -----  -----
(0x100)                       |                                          |      |
                        arr[0] (0x100)                                arr[1] ... arr[6]
                              |                                          |      |
                              --------------------------------------------------
                                                    |
                                   arr (0x100) - assume the base address is 0x100
arr, arr[0], and the address of arr[0][0] all yield the same address, i.e. arr and *arr yield the same address.
You can't have a value without a type.
123456789 just like that is nothing; you need to know if it's an integer, a double, a pointer, ...
So, I like to think of it as the pair (value, type).
And that value could be (123456789, int), or (123456789, double), or (123456789, char*), ... which are all different pairs (with the same value).
In your case you have (<address>, int(*)[7]), which is not the same as (<address>, int*).
An element of a multidimensional array is an array, which in many cases decays to a pointer to its first element.
So, in both cases you have a pointer to the same location in memory.
As far as C is concerned, they aren't technically the same, because they point to different objects, but because those objects have the same address, and because most implementations store all pointers in the same format, they happen to compare equal.
Going through some C interview questions, I've found a question stating "How to find the size of an array in C without using the sizeof operator?", with the following solution. It works, but I cannot understand why.
#include <stdio.h>
int main() {
int a[] = {100, 200, 300, 400, 500};
int size = 0;
size = *(&a + 1) - a;
printf("%d\n", size);
return 0;
}
As expected, it returns 5.
edit: people pointed out this answer, but the syntax does differ a bit, i.e. the indexing method
size = (&arr)[1] - arr;
so I believe both questions are valid and have a slightly different approach to the problem. Thank you all for the immense help and thorough explanation!
When you add 1 to a pointer, the result is the location of the next object in a sequence of objects of the pointed-to type (i.e., an array). If p points to an int object, then p + 1 will point to the next int in a sequence. If p points to a 5-element array of int (in this case, the expression &a), then p + 1 will point to the next 5-element array of int in a sequence.
Subtracting two pointers (provided they both point into the same array object, or one is pointing one past the last element of the array) yields the number of objects (array elements) between those two pointers.
The expression &a yields the address of a, and has the type int (*)[5] (pointer to 5-element array of int). The expression &a + 1 yields the address of the next 5-element array of int following a, and also has the type int (*)[5]. The expression *(&a + 1) dereferences the result of &a + 1, such that it yields the address of the first int following the last element of a, and has type int [5], which in this context "decays" to an expression of type int *.
Similarly, the expression a "decays" to a pointer to the first element of the array and has type int *.
A picture may help:
    int [5]   int (*)[5]        int    int *

     +---+                     +---+
     |   | <- &a               |   | <- a
     | - |                     +---+
     |   |                     |   | <- a + 1
     | - |                     +---+
     |   |                     |   |
     | - |                     +---+
     |   |                     |   |
     | - |                     +---+
     |   |                     |   |
     +---+                     +---+
     |   | <- &a + 1           |   | <- *(&a + 1)
     | - |                     +---+
     |   |                     |   |
     | - |                     +---+
     |   |                     |   |
     | - |                     +---+
     |   |                     |   |
     | - |                     +---+
     |   |                     |   |
     +---+                     +---+
This is two views of the same storage - on the left, we're viewing it as a sequence of 5-element arrays of int, while on the right, we're viewing it as a sequence of int. I also show the various expressions and their types.
Be aware, the expression *(&a + 1) results in undefined behavior:
...
If the result points one past the last element of the array object, it
shall not be used as the operand of a unary * operator that is evaluated.
C 2011 Online Draft, 6.5.6/9
This line is of most importance:
size = *(&a + 1) - a;
As you can see, it first takes the address of a and adds one to it. Then, it dereferences that pointer and subtracts the original value of a from it.
Pointer arithmetic in C causes this to return the number of elements in the array, or 5. Adding one to &a gives a pointer to the next array of 5 ints after a. After that, this code dereferences the resulting pointer and subtracts a (an array type that has decayed to a pointer) from that, giving the number of elements in the array.
Details on how pointer arithmetic works:
Say you have a pointer xyz that points to an int and holds the address 160. When you subtract a number from xyz, C scales that number by the size of the type it points to. For example, subtracting 5 from xyz moves the address down by sizeof(*xyz) * 5 bytes, so with a 4-byte int the result would be address 140.
As a is an array of 5 ints, the resulting value will be 5. However, this only works with an array, not with a pointer. If a were a pointer, &a + 1 would point just past the pointer object itself rather than past the array, so the result would be meaningless.
Here's a little example that shows the addresses and how this is undefined. The left-hand side shows the addresses:
a + 0 | [a[0]] | &a points to this
a + 1 | [a[1]]
a + 2 | [a[2]]
a + 3 | [a[3]]
a + 4 | [a[4]] | end of array
a + 5 | [a[5]] | &a+1 points to this; accessing past array when dereferenced
This means that the code is subtracting a from &a[5] (or a+5), giving 5.
Note that this is undefined behavior, and should not be used under any circumstances. Do not expect the behavior of this to be consistent across all platforms, and do not use it in production programs.
Hmm, I suspect this is something that would not have worked back in the early days of C. It is clever though.
Taking the steps one at a time:
&a gets a pointer to an object of type int[5]
+1 gets the next such object assuming there is an array of those
* effectively converts that address into type pointer to int
-a subtracts the two int pointers, returning the count of int instances between them.
I'm not sure it is completely legal (in this I mean language-lawyer legal - not will it work in practice), given some of the type operations going on. For example you are only "allowed" to subtract two pointers when they point to elements in the same array. *(&a+1) was synthesised by accessing another array, albeit a parent array, so is not actually a pointer into the same array as a.
Also, while you are allowed to synthesise a pointer past the last element of an array, and you can treat any object as an array of 1 element, the operation of dereferencing (*) is not "allowed" on this synthesised pointer, even though it performs no actual memory access in this case!
I suspect that in the early days of C (K&R syntax, anyone?), an array decayed into a pointer much more quickly, so the *(&a+1) might only return the address of the next pointer of type int**. The more rigorous definitions of modern C++ definitely allow the pointer to array type to exist and know the array size, and probably the C standards have followed suit. All C function code only takes pointers as arguments, so the technical visible difference is minimal. But I am only guessing here.
This sort of detailed legality question usually applies to a C interpreter, or a lint type tool, rather than the compiled code. An interpreter might implement a 2D array as an array of pointers to arrays, because there is one less runtime feature to implement, in which case dereferencing the +1 would be fatal, and even if it worked would give the wrong answer.
Another possible weakness may be that the C compiler might align the outer array. Imagine if this was an array of 5 chars (char arr[5]), when the program performs &a+1 it is invoking "array of array" behaviour. The compiler might decide that an array of array of 5 chars (char arr[][5]) is actually generated as an array of array of 8 chars (char arr[][8]), so that the outer array aligns nicely. The code we are discussing would now report the array size as 8, not 5. I'm not saying a particular compiler would definitely do this, but it might.
In C, you can declare a char array either by
char array[];
or
char *array;
The latter one is a pointer, so why can it be an array?
Pointers and arrays are two completely different animals; a pointer cannot be an array and an array cannot be a pointer.
The confusion comes from two concepts that aren't explained very well in most introductory C texts.
The first is that the array subscript operator [] can be applied to both pointer and array expressions. The expression a[i] is defined as *(a + i); you offset i elements from the address stored in a and dereference the result.
So if you declare a pointer
T *p;
and assign it to point to some memory, like so
p = malloc( N * sizeof *p );
you'll get something like the following:
        +---+
     p: |   | ---+
        +---+    |
         ...     |
        +---+    |
  p[0]: |   |<---+
        +---+
  p[1]: |   |
        +---+
         ...
        +---+
p[N-1]: |   |
        +---+
p stores the base address of the array, so *(p + i) gives you the value stored in the i'th element (not byte) following that address.
However, when you declare an array, such as
T a[N];
what you get in memory is the following:
+---+
a[0]: | |
+---+
a[1]: | |
+---+
...
+---+
a[N-1]: | |
+---+
Storage has only been set aside for the array elements themselves; there's no separate storage set aside for a variable named a to store the base address of the array. So how can the *(a+i) mechanism possibly work?
This brings us to the second concept: except when it is the operand of the sizeof or unary & operators, or is a string literal being used to initialize another array in a declaration, an expression of type "N-element array of T" will be converted ("decay") to an expression of type "pointer to T", and the value of the expression will be the address of the first element of the array.
In other words, when the compiler sees the expression a in the code, it will replace that expression with a pointer to the first element of a, unless a is the operand of sizeof or unary &. So a evaluates to the address of the first element of the array, meaning *(a + i) will work as expected.
Thus, the subscript operator works exactly the same way for both pointer and array expressions. However, this does not mean that pointer objects are the same thing as array objects; they are not, and anyone who claims otherwise is confused.