Understanding the dereference, address-of, and array subscript operators in C - c

I have argv[] defined as a char *. Using the following printf statements:
printf("%s\n",argv[1]); // prints out the entire string
printf("%p\n",&argv[1]); // & -> gets the address
printf("%c\n",argv[1][0]);// prints out the first char of second var
printf("%c\n",*argv[1]); //
It's this last one I don't understand. What does it mean to print *argv[1]? Why isn't that the same as *argv[1][0], and why can't you print it with printf("%s\n",*argv[1]);? Also, why is &*argv[1] a different address than &argv[1]?

The array subscript operation a[i] is defined as *(a + i) - given the address a, offset i elements (not bytes) from that address and dereference the result. Thus, given a pointer p, *p is equivalent to *(p + 0), which is equivalent to p[0].
The type of argv is char **; given that, all of the following are true:
Expression    Type      Value
----------    ----      -----
argv          char **   Pointer to a sequence of strings
*argv         char *    Equivalent to argv[0]
**argv        char      Equivalent to argv[0][0]
argv[i]       char *    Pointer to a single string
*argv[i]      char      Same as argv[i][0]
argv[i][j]    char      j'th character of i'th string
&argv[i]      char **   Address of the pointer to the i'th string
Since the type of argv[i][j] is char, *argv[i][j] is not a valid expression.
Here's a bad visualization of the argv sequence:
     +---+                 +---+                                               +---+
argv |   | ------> argv[0] |   | ---------------------------------> argv[0][0] |   |
     +---+                 +---+                      +---+                    +---+
                   argv[1] |   | --------> argv[1][0] |   |         argv[0][1] |   |
                           +---+                      +---+                    +---+
                            ...            argv[1][1] |   |                     ...
                           +---+                      +---+                    +---+
                argv[argc] |   | ---|||                ...        argv[0][n-1] |   |
                           +---+                      +---+                    +---+
                                         argv[1][m-1] |   |
                                                      +---+
This may help explain the results of different expressions.

char *argv[]
argv is an array(1) of char pointers. It is an ordinary array whose elements happen to be pointers: argv[0] is a pointer, argv[1] is a pointer, and so on.
argv[0] is the first element of the array. Since each element of the array is a char pointer, its value is also a char pointer, as mentioned above.
*argv[1]: here argv[1] is the second element of the array above, and argv[1] is also a char pointer. Applying * dereferences that pointer, and you get the first character of the string that argv[1] points to.
You should use %c to print it, since it is just a single character.
argv[1][0] is already the first character of the second string in the array, so there is nothing left to dereference. It is essentially the same as the previous expression.
(1) Strictly speaking, argv is a pointer to a pointer, but you can "think" of it as an array of pointers. More information here: https://stackoverflow.com/a/39096006/3963067

If argv[1] is a pointer to char, then *argv[1] dereferences that pointer and gets you the first character of the string at argv[1], so it's the same as argv[1][0] and is printed with the "%c" format specifier.
argv[1][0] is a char itself, not a pointer, so it's not dereferenceable.

This is not specific to char *.
You can reduce the question to: what is the difference between *ptr and ptr[0]?
There is no difference, because ptr[0] is syntactic sugar for *(ptr + 0), which is just *ptr since adding 0 changes nothing.
// printf("%p\n", &argv[1]); is wrong: you must cast to (void *)
printf("%p\n", (void *)&argv[1]);
The %p specifier expects a void *. Normally C converts a pointer to void * implicitly, but printf() takes a variable argument list, and variadic arguments only undergo the default argument promotions, which do not convert other pointer types to void *. Since printf() expects a void *, the behavior is undefined unless you cast the argument yourself.

The last line printf("%c\n",*argv[1]); is both dereferencing argv and accessing array index 1. In other words, this is doing argv[1][0], just like the previous line, because the array subscript access [1] has a higher precedence than the dereference operator (*).
However, if you were to parenthesize the expression in the last line to make the dereference operator be processed first, you would do this:
printf("%c\n", (*argv)[1]);
Now, when you run the program, the last line of output would be argv[0][1] instead of argv[1][0], i.e. the second character of the command you used to execute the program.

Related

Printf function %p behave differently when it is given an array compared to a pointer

In the first printf the output is the address of arr and I was expecting this. However in the second one it prints the same address. Isn't it supposed to print the "value" pointed to by arr ( which is the address 0 ) ?
#include <stdio.h>
char * arr[2];
char ** p;
int main(){
    p = arr;
    printf("%p\n",&arr);
    printf("%p\n",arr);
    printf("%p\n",&p);
    printf("%p\n",p);
    /* OUTPUT
    0x10aa71010
    0x10aa71010
    0x10aa71020
    0x10aa71010
    */
}
The variable p is a pointer. It points to the first element of arr. Applying the pointer-to operator & (also known as the address-of operator) to p will create a pointer to the variable p.
Drawing it, it would be something like
+----+ +---+ +--------+
| &p | ---> | p | ---> | arr[0] |
+----+ +---+ +--------+
With arrays it's really the same, applying the pointer-to operator to an array gives you a pointer to the array. But the location of the array is the same as the location of the first element of the array.
Drawing it would be something like this:
+--------+--------+
| arr[0] | arr[1] |
+--------+--------+
^
|
&arr[0] (pointer to the first element, what plain arr decays to)
|
&arr (pointer to the array itself)
From these "drawings" it should hopefully be easier to understand why p and &p would be different, as well as why arr and &arr would be the same.
Very important note: While &arr and &arr[0] point to the same location, the two pointers have different type and are therefore semantically different. Doing pointer arithmetic with the two pointers will not give the same result.
For your example, the type of &arr will be char *(*)[2], while the type of &arr[0] will be char **.
On a technical and kind of nitpicking note, the printf format %p is really for printing void * pointers. And the pointers you pass to printf will have different types.
Mismatching format specifier and argument type leads to undefined behavior. So to be fully correct you need to cast all pointers to void *.
After this assignment
p = arr;
the pointer p points to the first element of the array arr. Note that arrays used in expressions are, with rare exceptions, implicitly converted to pointers to their first elements.
Thus the address of the first element of the array arr is equal to the address of the array itself.
So this call of printf
printf("%p\n",p);
outputs the address of the first element of the array, which is the value stored in the pointer p after the assignment.
On the other hand the array arr and the pointer p occupy different extents of memory.
char * arr[2];
char ** p;
And this call of printf
printf("%p\n",&p);
outputs the address of the pointer p itself.

Arrays and pointers in C, two questions

This program works in C:
#include <stdio.h>
int main(void) {
char a[10] = "Hello";
char *b = a;
printf("%s",b);
}
There are two things I would expect to be different. First, I would expect the second line in main to read "char *b = &a", so that the program looks like this:
#include <stdio.h>
int main(void) {
char a[10] = "Hello";
char *b = &a;
printf("%s",b);
}
But this does not work. Why is that? Isn't this the correct way to initialize a pointer with an address?
The second problem is that I would expect the last line to be printf("%s",*b), so that the program looks like this:
#include <stdio.h>
int main(void) {
char a[10] = "Hello";
char *b = a;
printf("%s",*b);
}
But this gives a segmentation fault. Why does this not work? Aren't we supposed to write "*" in front of a pointer to get its value?
There is a special rule in C. When you write
char *b = a;
you get the same effect as if you had written
char *b = &a[0];
That is, you automatically get a pointer to the array's first element. This happens any time you try to take the "value" of an array.
Aren't we supposed to write "*" in front of a pointer to get its value?
Yes, and if you wanted to get the single character pointed to by b, you would therefore need the *. This code
printf("first char: %c\n", *b);
would print the first character of the string. But when you write
printf("whole string: %s\n", b);
you get the whole string. %s prints multiple characters, and it expects a pointer. Down inside printf, when you use %s, it loops over and prints all the characters in the string.
Expanding on Steve's answer (which is the correct one to accept)...
This is the special rule he's talking about:
6.3.2.1 Lvalues, arrays, and function designators
...
3 Except when it is the operand of the sizeof operator, the _Alignof operator, or the
unary & operator, or is a string literal used to initialize an array, an expression that has
type ‘‘array of type’’ is converted to an expression with type ‘‘pointer to type’’ that points
to the initial element of the array object and is not an lvalue. If the array object has
register storage class, the behavior is undefined.
C 2011 Prepublication Draft
Arrays are weird and don't behave like other types. You don't get this "decay to a pointer to the first element" behavior in other aggregate types like struct types. You can't assign the contents of an entire array with the = operator like you can with struct types; for example, you can't do something like
int a[5] = {1, 2, 3, 4, 5};
int b[5];
...
b = a; // not allowed; that's what "is not an lvalue" means
Why are arrays weird?
C was derived from an earlier language named B, and when you declared an array in B:
auto arr[5];
the compiler set aside an extra word to point to the first element of the array:
+---+
arr: | | ----------+
+---+ |
... |
+---+ |
| | arr[0] <--+
+---+
| | arr[1]
+---+
| | arr[2]
+---+
| | arr[3]
+---+
| | arr[4]
+---+
The array subscript operation arr[i] was defined as *(arr + i) - given the starting address stored in arr, offset i elements from that address and dereference the result. This also meant that &arr would yield a different value from &arr[0].
When he was designing C, Ritchie wanted to keep B's array subscripting behavior, but he didn't want to set aside storage for the separate pointer that behavior required. So instead of storing a separate pointer, he created the "decay" rule. When you declare an array in C:
int arr[5];
the only storage set aside is for the array elements themselves:
+---+
arr: | | arr[0]
+---+
| | arr[1]
+---+
| | arr[2]
+---+
| | arr[3]
+---+
| | arr[4]
+---+
The subscript operation arr[i] is still defined as *(arr + i), but instead of storing a pointer value in arr, a pointer value is computed from the expression arr. This means &arr and &arr[0] will yield the same address value, but the types of the expressions will be different (int (*)[5] vs int *, respectively).
One practical effect of this rule is that you can use the [] operator on pointer expressions as well as array expressions - given your code you can write b[i] and it will behave exactly like a[i].
Another practical effect is that when you pass an array expression as an argument to a function, what the function actually receives is a pointer to the first element. This is why you often have to pass the array size as a separate parameter, because a pointer only points to a single object of the specified type; there's no way to know from the pointer value itself whether you're pointing to the first element of an array, how many elements are in the array, etc.
Arrays carry no metadata around, so there's no way to query an array for its size, or type, or anything else at runtime. The sizeof operator is evaluated at compile time, not runtime (the one exception being variable-length arrays, whose size is evaluated at runtime).

Difference between char **p,char *p[],char p[][]

char *p = "some string"
creates a pointer p pointing to a block containing the string.
char p[] = "some string"
creates a character array with the literal's characters in it.
And the first one is a constant declaration. Is it the same for two-dimensional arrays?
What is the difference between
char **p, char *p[], char p[][]?
I read a bit about this: char **p creates an array of pointers, so it has an overhead compared to char p[][] for storing the pointer values.
The first two declarations create constant arrays. I did not get any run-time error when I tried to modify the contents of argv in main(int argc, char **argv). Is that because they are declared in the function prototype?
Normal Declarations (Not Function Parameters)
char **p; declares a pointer to a pointer to char. It reserves space for the pointer. It does not reserve any space for the pointed-to pointers or any char.
char *p[N]; declares an array of N pointers to char. It reserves space for N pointers. It does not reserve any space for any char. N must be provided explicitly or, in a definition with initializers, implicitly by letting the compiler count the initializers.
char p[M][N]; declares an array of M arrays of N char. It reserves space for M•N char. There are no pointers involved. N must be provided explicitly. M must be provided explicitly or, in a definition with initializers, implicitly by letting the compiler count the initializers.
Declarations in Function Parameters
char **p declares a pointer to a pointer to char. When the function is called, space is provided for that pointer (typically on a stack or in a processor register). No space is reserved for the pointed-to-pointers or any char.
char *p[N] is adjusted to be char **p, so it is the same as above. The value of N is ignored, and N may be absent. (Some compilers may evaluate N, so, if it is an expression with side effects, such as printf("Hello, world.\n"), these effects may occur when the function is called. The C standard is unclear on this.)
char p[M][N] is adjusted to be char (*p)[N], so it is a pointer to an array of N char. The value of M is ignored, and M may be absent. N must be provided. When the function is called, space is provided for the pointer (typically on a stack or in a processor register). No space is reserved for the array of N char.
argv
argv is created by the special software that calls main. It is filled with data that the software obtains from the “environment”. You are allowed to modify the char data in it.
In your definition char *p = "some string";, you are not permitted to modify the data that p points to because the C standard says that characters in a string literal may not be modified. (Technically, what it says is that it does not define the behavior if you try.) In this definition, p is not an array; it is a pointer to the first char in an array, and those char are inside a string literal, and you are not permitted to modify the contents of a string literal.
In your definition char p[] = "some string";, you may modify the contents of p. They are not a string literal. In this case, the string literal effectively does not exist at run-time; it is only something used to specify how the array p is initialized. Once p is initialized, you may modify it.
The data set up for argv is set up in a way that allows you to modify it (because the C standard specifies this).
Some more differences description looking it from memory addressing view as follows,
I. char **p; p is a pointer to a pointer to char
Declaration:
char a = 'g';
char *b = &a;
char **p = &b;
   p                    b                    a
+------+             +------+             +------+
|      |             |      |             |      |
|0x2000|------------>|0x1000|------------>|  g   |
|      |             |      |             |      |
+------+             +------+             +------+
 0x3000               0x2000               0x1000
Figure 1: Typical memory layout assumption
In the declaration above, a is a char containing the character g. The pointer b contains the address of the existing character variable a, so b holds the address 0x1000 and *b is the character g. Finally, the address of b is assigned to p. Therefore a is a character variable, b is a pointer, and p is a pointer to a pointer: a contains a value, b contains an address, and p contains the address of an address, as shown in the diagram above.
Here, sizeof(p) = sizeof(char **), which on typical systems equals sizeof(char *).
II. char *p[M]; p is an array of pointers to strings
Declaration:
char *p[] = {"Monday", "Tuesday", "Wednesday"};
      p
  +------+
  | p[0] |       +----------+
0 | 0x100|------>| Monday\0 |
  |      |       +----------+
  |------|       0x100
  | p[1] |       +-----------+
1 | 0x200|------>| Tuesday\0 |
  |      |       +-----------+
  |------|       0x200
  | p[2] |       +-------------+
2 | 0x300|------>| Wednesday\0 |
  |      |       +-------------+
  +------+       0x300
Figure 2: Typical memory layout assumption
In this declaration, p is an array of 3 pointers to char, which means p can refer to 3 strings. Each string (Monday, Tuesday and Wednesday) is located somewhere in memory (0x100, 0x200 and 0x300), and their addresses are stored in the array p as p[0], p[1] and p[2] respectively. Hence it is an array of pointers.
Notes: char *p[3];
1. p[0], p[1] & p[2] are addresses of strings, of type `char *`.
2. p, p+1 & p+2 are addresses of addresses, of type `char **`.
3. Accessing elements: p[i][j] is char; p[i] is char *; and p (once it decays to a pointer) is char **
Here sizeof(p) = number of elements in the array * sizeof(char *)
III. char p[M][N]; p is array of fixed length strings with dimensions as M x N
Declaration:
char p[][10] = {"Monday", "Tuesday", "Wednesday"};
p  0x1  2  3  4  5  6  7  8  9 10
  +-------------------------------+
0 |  M  o  n  d  a  y \0 \0 \0 \0 |
1 |  T  u  e  s  d  a  y \0 \0 \0 |
2 |  W  e  d  n  e  s  d  a  y \0 |
  +-------------------------------+
Figure 3: Typical memory layout assumption
In this case the array p contains 3 strings, each occupying 10 characters. From the memory layout we can say p is a two-dimensional array of characters with size M x N, which is 3 x 10 in our example. This form suits strings of equal length; when a string is shorter than 10 characters, the unused bytes are wasted. By comparison, the declaration char *p[] wastes no string storage, because no string length is fixed, which makes it useful for strings of unequal length.
Accessing elements is similar to the case above: p[i] is the i'th string and p[i][j] is the j'th character of the i'th string.
Here sizeof(p) = (M rows * N columns) * sizeof(char) for the two-dimensional array.
a in char *a is a pointer to char (typically to the first element of an array of chars); a itself can be modified to point elsewhere.
b in char b[] is an array of chars; b itself cannot be reassigned, although its elements can be modified.
They are sort of compatible: b automatically decays to a pointer like a in assignments and expressions, but not the other way around.
When you use char **p, char *p[] and char p[][], the situation is very similar, just with more levels of indirection.

Notation of **argv in main function [duplicate]

I'm having difficulty understanding the notation used for the general main function declaration, i.e. int main(int argc, char *argv[]). I understand that what is actually passed to the main function is a pointer to a pointer to char, but I find the notation difficult. For instance:
Why does **argv point to the first char and not the whole string? Likewise, why does *argv[0] point to the same thing as the previous example.
Why does *argv point to the whole first string, instead of the first char like the previous example?
This is a little unrelated, but why does *argv + 1 point a string 'minus the first char' instead of pointing to the next string in the array?
Consider a program with argc == 3.
   argv
     |
     v
+---------+         +----------------+
| argv[0] |-------->| program name\0 |
+---------+         +-------------+--+
| argv[1] |-------->| argument1\0 |
+---------+         +-------------+
| argv[2] |-------->| argument2\0 |
+---------+         +-------------+
|    0    |
+---------+
The variable argv points to the start of an array of pointers. argv[0] is the first pointer. It points at the program name (or, if the system cannot determine the program name, then the string for argv[0] will be an empty string; argv[0][0] == '\0'). argv[1] points to the first argument, argv[2] points to the second argument, and argv[3] == 0 (equivalently argv[argc] == 0).
The other detail you need to know, of course, is that array[i] == *(array + i) for any array.
You ask specifically:
Why does **argv point to the first char and not the whole string?
*argv is equivalent to *(argv + 0) and hence argv[0]. It is a char *. When you dereference a char *, you get the 'first' character in the string. And **argv is therefore equivalent to *(argv[0]) or *(argv[0] + 0) or argv[0][0].
(It can be legitimately argued that **argv is a character, not a pointer, so it doesn't 'point to the first char'. It is simply another name for the 'p' of "program name\0".)
Likewise, why does *argv[0] point to the same thing as the previous example.
As noted before, argv[0] is a pointer to the string; therefore *argv[0] must be the first character in the string.
Why does *argv point to the whole first string, instead of the first char like the previous example?
This is a question of convention. *argv points at the first character of the first string. If you interpret it as a pointer to a string, it points to 'the whole string', in the same way that char *pqr = "Hello world\n"; points at 'the whole string'. If you interpret it as a pointer to a single character, it points to the first character of the string. Think of it as like wave-particle duality, only here it is character-string duality.
Why does *argv + 1 point a string 'minus the first char' instead of pointing to the next string in the array?
*argv + 1 is (*argv) + 1. As already discussed, *argv points at the first character of the first string. If you add 1 to a pointer, it points at the next item; since *argv points at a character, *argv+1 points to the next character.
*(argv + 1) points to the (first character of the) next string.
It all falls down to pointer arithmetic.
*argv[0] = *(*(argv + 0)) = **argv
Since [] has higher precedence than unary *.
On the other hand, *argv gives the first cell in the array, an array containing pointers. What does this pointer point to? Why a char array, a string, of course.
*argv + 1 gives what it gives because + has lower precedence than unary *, so first we get a pointer to a string, and then we add 1 to it, thus getting a pointer to the second character in the string.
I understand that what is actually passed to the main function is a pointer to a pointer to char
Strictly speaking that is right: the parameter char *argv[] is adjusted to char **argv. What it points into is an array of char pointers (an array of character strings). Think of it like this, if I give this at the command prompt:
>> ./program hello 456
My program's main will get:
argc == 3
argv[0] == program (the name of the program as a string)
argv[1] == hello (the first parameter as a string)
argv[2] == 456 (the second parameter as a string)
Why does **argv point to the first char and not the whole string?
char *argv[] // as a parameter this is char **argv: a pointer to the first
// element of an array of character pointers
*argv // dereferences that pointer once; equivalent to argv[0], a char *
**argv // dereferences argv[0] in turn; functionally equivalent to
// (argv[0])[0], a char
Likewise, why does *argv[0] point to the same thing as the previous example.
See above.
Why does *argv point to the whole first string, instead of the first char like the previous example?
See above.
This is all because an array expression in C decays to a pointer to its first element. **argv dereferences our pointer to pointer to char twice, giving us a char. *argv[0] is basically saying 'take the first element of the array that argv points into, then dereference the address stored there,' which happens to be the same thing. *argv only dereferences once, so we still have a pointer to char, which points at a char array. *argv + 1 dereferences once, giving us a pointer to the first character string, and then adds 1 to that address, giving us the address of the string's second character. Because a pointer into an array can be indexed like an array starting at that element, we can say that this is the string *argv minus its first character.

Pointer to Pointer with argv

Based on my understanding of pointer to pointer to an array of characters,
% ./pointer one two
argv
+----+ +----+
| . | ---> | . | ---> "./pointer\0"
+----+ +----+
| . | ---> "one\0"
+----+
| . | ---> "two\0"
+----+
From the code:
#include <stdio.h>
int main(int argc, char **argv) {
    printf("Value of argv[1]: %s\n", argv[1]);
    return 0;
}
My question is, Why is argv[1] acceptable? Why is it not something like (*argv)[1]?
My understanding steps:
Take argv, dereference it.
It should return the address of the array of pointers to characters.
Using pointer arithmetics to access elements of the array.
It's more convenient to think of [] as an operator for pointers rather than arrays; it's used with both, but since arrays decay to pointers array indexing still makes sense if it's looked at this way. So essentially it offsets, then dereferences, a pointer.
So with argv[1], what you've really got is *(argv + 1) expressed with more convenient syntax. This gives you the second char * in the block of memory pointed at by argv, since char * is the type argv points to, and [1] offsets argv by sizeof(char *) bytes then dereferences the result.
(*argv)[1] would dereference argv first with * to get the first pointer to char, then offset that by 1 * sizeof(char) bytes, then dereference that to get a char. This gives the second character in the first string of the group of strings pointed at by argv, which is obviously not the same thing as argv[1].
So think of an indexed array variable as a pointer being operated on by an "offset then dereference a pointer" operator.
Because argv is a pointer to pointer to char, it follows that argv[1] is a pointer to char. The printf() format %s expects a pointer to char argument and prints the null-terminated array of characters that the argument points to. Since argv[1] is not a null pointer, there is no problem.
(*argv)[1] is also valid C, but (*argv) is equivalent to argv[0] and is a pointer to char, so (*argv)[1] is the second character of argv[0], which is / in your example.
Indexing a pointer as an array implicitly dereferences it. p[0] is *p, p[1] is *(p + 1), etc.

Resources