Notation of **argv in main function [duplicate] - c

I'm having difficulty understanding the notation used for the general main function declaration, i.e. int main(int argc, char *argv[]). I understand that what is actually passed to the main function is a pointer to a pointer to char, but I find the notation difficult. For instance:
Why does **argv point to the first char and not the whole string? Likewise, why does *argv[0] point to the same thing as the previous example?
Why does *argv point to the whole first string, instead of the first char like the previous example?
This is a little unrelated, but why does *argv + 1 point to a string 'minus the first char' instead of pointing to the next string in the array?

Consider a program with argc == 3.
    argv
    |
    v
    +---------+         +----------------+
    | argv[0] |-------->| program name\0 |
    +---------+         +-------------+--+
    | argv[1] |-------->| argument1\0 |
    +---------+         +-------------+
    | argv[2] |-------->| argument2\0 |
    +---------+         +-------------+
    |    0    |
    +---------+
The variable argv points to the start of an array of pointers. argv[0] is the first pointer. It points at the program name (or, if the system cannot determine the program name, then the string for argv[0] will be an empty string; argv[0][0] == '\0'). argv[1] points to the first argument, argv[2] points to the second argument, and argv[3] == 0 (equivalently argv[argc] == 0).
The other detail you need to know, of course, is that array[i] == *(array + i) for any array.
You ask specifically:
Why does **argv point to the first char and not the whole string?
*argv is equivalent to *(argv + 0) and hence argv[0]. It is a char *. When you dereference a char *, you get the 'first' character in the string. And **argv is therefore equivalent to *(argv[0]) or *(argv[0] + 0) or argv[0][0].
(It can be legitimately argued that **argv is a character, not a pointer, so it doesn't 'point to the first char'. It is simply another name for the 'p' of "program name\0".)
Likewise, why does *argv[0] point to the same thing as the previous example?
As noted before, argv[0] is a pointer to the string; therefore *argv[0] must be the first character in the string.
Why does *argv point to the whole first string, instead of the first char like the previous example?
This is a question of convention. *argv points at the first character of the first string. If you interpret it as a pointer to a string, it points to 'the whole string', in the same way that char *pqr = "Hello world\n"; points at 'the whole string'. If you interpret it as a pointer to a single character, it points to the first character of the string. Think of it like wave-particle duality, only here it is character-string duality.
Why does *argv + 1 point to a string 'minus the first char' instead of pointing to the next string in the array?
*argv + 1 is (*argv) + 1. As already discussed, *argv points at the first character of the first string. If you add 1 to a pointer, it points at the next item; since *argv points at a character, *argv+1 points to the next character.
*(argv + 1) points to the (first character of the) next string.

It all comes down to pointer arithmetic and operator precedence.
*argv[0] = *(*(argv + 0)) = **argv
This holds because [] has higher precedence than unary *.
On the other hand, *argv gives the first cell in the array, an array containing pointers. What does this pointer point to? Why, a char array, a string, of course.
*argv + 1 gives what it gives because + has lower precedence than unary *, so first we get a pointer to a string, and then we add 1 to it, thus getting a pointer to the second character in the string.

I understand that what is actually passed to the main function is a pointer to a pointer to char
That is correct: main receives a char **, which points to the first element of an array of char pointers (an array of character strings). Think of it like this. If I type this at the command prompt:
>> ./program hello 456
My program's main will get:
argc == 3
argv[0] == program (the name of the program as a string)
argv[1] == hello (the first parameter as a string)
argv[2] == 456 (the second parameter as a string)
Why does **argv point to the first char and not the whole string?
char *argv[] // as a parameter, this is adjusted to char **argv
*argv        // dereferences argv once; functionally equivalent to
             // argv[0], which is a char *
**argv       // dereferences argv[0] in turn; functionally
             // equivalent to (argv[0])[0], which is a single char
Likewise, why does *argv[0] point to the same thing as the previous example?
See above.
Why does *argv point to the whole first string, instead of the first char like the previous example?
See above.

This is all because, in C, an array used in an expression is converted to a pointer to its first element. **argv dereferences our pointer to pointer to char twice, giving us a char. *argv[0] is basically saying 'index the array to get its first pointer, then dereference that pointer,' which yields the same char. *argv only dereferences once, so we still have a pointer to char, which we can treat as a string. *argv + 1 dereferences once, giving us a pointer to the first character string, and then adds 1 to that address, giving us the address of the second character. Because a pointer can be indexed like an array, we can say that this is the string *argv minus its first character.

Related

what is the relationship between char *[] and char ** when talking about array of character pointers

In command line arguments in C we can specify the arguments vector as char *argv[] or char **argv
I understand the first one, which is an array of pointers to characters, but what is the relationship between an array of pointers to characters and the second type, which looks like a pointer to a pointer to a character?
The difference between char *argv[] and char **argv is that:
char *argv[] is an array of char * pointers.
char **argv is a pointer to another pointer which points to a char.
char *argv[] can be visualized like this:
p1 -> "hello"
p2 -> "world"
p3 -> "!"
// p1, p2 and p3 are
// pointers to strings
// they have type char *
_________________
| p1| p2 | p3 |
—————————————————
// argv looks like this
// it is an array of all the pointers
When the name of the array argv is used in an expression, it yields a pointer to the first element in the array.
The type of the array name argv when used in an expression is char **. This is because:
The array name argv decays to a pointer to the first element of the array.
The first element also happens to be a pointer, so argv is essentially a pointer to another pointer hence the type is char **
Function parameters that are arrays get implicitly adjusted by the compiler into a pointer to the first item of that array.
In the case of the array char* argv[], it's an array of char*, and a pointer to the first item is therefore a char**. Therefore it doesn't matter whether you type char* argv[] or char** argv; they are equivalent in this specific case.
Also since the char* [] will get adjusted to char**, the size of the array doesn't matter. You could write char* argv [42] and that would be equivalent as well.
Subjectively, char* argv[] could be regarded as the most correct form, since it is 1) self-documenting - we are dealing with an array - and 2) the form used in the C standard 5.1.2.2.1 (hosted systems).

av[1] and av[1][0] Not the same address?

What's up ?
I can't understand this issue...
I know that the first element of an array stores the address of the entire array. But in this situation I can't figure it out.
#include <stdio.h>
int main(int ac, char **av)
{
    /* %p expects a void *, so each pointer is cast explicitly */
    printf("&av[1]= %p\n", (void *)&av[1]);
    printf("&av[1][0]= %p\n", (void *)&av[1][0]);
    return 0;
}
Input
./a.out "Hello"
Output
&av[1]= 0x7ffee0ffe4f0
&av[1][0]= 0x7ffee0ffe778
If somebody told you that char **av declares a two-dimensional array, they did you a disservice. In char **av, av is a pointer to a char *, possibly the first char * of several. So av[1] is a char *—it is a pointer to a char, and &av[1] is the address of that pointer.
av[1][0] is the char that av[1] points to, and &av[1][0] is the address of that char.
The pointer to the char and the char are of course not in the same place, so &av[1] and &av[1][0] are different.
In contrast, if you had an array, such as char av[3][4], then av is an array of 3 arrays of 4 char. In that case, av[1] is an array of 4 char, and &av[1] would be the address of (the start of) that array. &av[1][0] would be the address of the first char in that array. Since the char is at the beginning of the array, the address of the array and the address of the char are the same.
(However, they are not necessarily represented the same way in a C implementation, and printing them with %p can show different results even though they refer to the same place in memory.)
av isn't a two-dimensional array here. It's an array of pointers.
Whereas with a two-dimensional array &av[1][0] would have the same address as &av[1], in an array of pointers &av[1][0] means: take the second (index 1) pointer, dereference it, and give me the address of the first element of what it points at (av[1] is itself a pointer).
&av[1] is the address of the second char*, &av[1][0] is the address of the first character that the second char* points at. Different things.
av points to an array of pointers (each of type char *), and each of those pointers points to a char array that holds one of the strings you pass to the program.
As the following variation of your program shows, av[0] stores the address of av[0][0] (that is, it points to the first character of the first string, which is '.'), and av[1] stores the address of av[1][0] (the first character of the second string, which is 'H').
#include <stdio.h>
int main(int ac, char **av)
{
    /* %p expects a void *, so each pointer is cast explicitly */
    printf("&av[0]= %p\n", (void *)&av[0]);
    printf("&av[1]= %p\n", (void *)&av[1]);
    printf("av[0]= %p\n", (void *)av[0]);
    printf("av[1]= %p\n", (void *)av[1]);
    printf("av[0][0]= %c\n", av[0][0]);
    printf("av[0][1]= %c\n", av[0][1]);
    printf("&av[0][0]= %p\n", (void *)&av[0][0]);
    printf("&av[0][1]= %p\n", (void *)&av[0][1]);
    printf("av[1][0]= %c\n", av[1][0]);
    printf("av[1][1]= %c\n", av[1][1]);
    printf("&av[1][0]= %p\n", (void *)&av[1][0]);
    printf("&av[1][1]= %p\n", (void *)&av[1][1]);
    return 0;
}
which provides the following output:
$ ./a.out "Hello"
&av[0]= 0x7fff53625cf8
&av[1]= 0x7fff53625d00
av[0]= 0x7fff53626ee9
av[1]= 0x7fff53626ef1
av[0][0]= .
av[0][1]= /
&av[0][0]= 0x7fff53626ee9
&av[0][1]= 0x7fff53626eea
av[1][0]= H
av[1][1]= e
&av[1][0]= 0x7fff53626ef1
&av[1][1]= 0x7fff53626ef2
Graphically it is something like this (the leading digits of the addresses are omitted for simplicity):

av:       | 26ee9 | 26ef1 |
              |       |
            av[0]   av[1]
address:    25cf8   25d00

av[0]:    |    .    |    /    | ...
               |         |
           av[0][0]  av[0][1]
address:    26ee9     26eea

av[1]:    |    H    |    e    | ...
               |         |
           av[1][0]  av[1][1]
address:    26ef1     26ef2

I hope this clarifies your doubt.
av is a pointer to the first element of an array of pointers to char. So the second element of that array is, indeed, a pointer, and &av[1] is its address (the address of a pointer, not the address of a character).
On the other hand, av[1][0] is the first character of that second string. Its address is the address of a char, the place where that first character is stored.
The two addresses are addresses of different things, so it's normal that they point to different places.
av[1] and &av[1][0] are, respectively, the pointer value (which points to the first char) and the address of the first char of the second string, so they must show the same value. Likewise, *av[1] and av[1][0] denote the same character.

Does printf only allow taking a string literal as the first argument?

According to the definition of printf, it says that first argument should be an array i.e char* followed by ellipses ... i.e variable arguments after that. If I write:
printf(3+"helloWorld"); //Output is "loWorld"
According to the definition shouldn't it give an error?
Here is the definition of printf:
#include <libioP.h>
#include <stdarg.h>
#include <stdio.h>
#undef printf
/* Write formatted output to stdout from the format string FORMAT. */
/* VARARGS1 */
int __printf(const char *format, ...) {
    va_list arg;
    int done;
    va_start(arg, format);
    done = vfprintf(stdout, format, arg);
    va_end (arg);
    return done;
}
#undef _IO_printf
ldbl_strong_alias(__printf, printf);
/* This is for libg++. */
ldbl_strong_alias(__printf, _IO_printf);
This is not an error.
If you pass "helloWorld" to printf, the string literal is converted to a pointer to the first character.
If you pass 3+"helloWorld", you're adding 3 to a pointer to the first character, which results in a pointer to the 4th character. This is still a valid pointer to a string, it's just not the whole string that was defined.
3+"helloWorld" is of char * type (after conversion in call to printf). In C, the type of a string literal is char []. When passed as an argument to a function, char [] will convert to pointer to its first element (array to pointer conversion rule). Therefore, "helloWorld" will be converted to a pointer to the element h and 3+"helloWorld" will move the pointer to the 4th element of the array "helloWorld".
From Pointer Arithmetic:
If the pointer P points at an element of an array with index I, then
P+N and N+P are pointers that point at an element of the same array with index I+N
P-N is a pointer that points at an element of the same array with index I-N
The behavior is defined only if both the original pointer and the result pointer are pointing at elements of the same array or one past the end of that array. ....
The type of string literal is char[N], where N is size of string (including null terminator).
From C Standard#6.3.2.1p3 [emphasis mine]
3 Except when it is the operand of the sizeof operator, the _Alignof operator, or the unary & operator, or is a string literal used to initialize an array, an expression that has type ''array of type'' is converted to an expression with type ''pointer to type'' that points to the initial element of the array object and is not an lvalue. If the array object has register storage class, the behavior is undefined.
So, in the expression
3+"helloWorld"
"helloWorld", which is of type char [11] (array of characters), is converted to a pointer to char that points to the initial element of the array object.
This means the expression is:
3 + P
where P is a pointer to the initial element of the "helloWorld" string:
----------------------------------------------
| h | e | l | l | o | W | o | r | l | d | \0 |
----------------------------------------------
  ^
  |
  P (pointer pointing to initial element of array)
When 3 is added to the pointer P, the resulting pointer points to the 4th character:
----------------------------------------------
| h | e | l | l | o | W | o | r | l | d | \0 |
----------------------------------------------
              ^
              |
              P (after adding 3, the resulting pointer points to the 4th element of the array)
The pointer that results from the expression 3+"helloWorld" is passed to printf(). Note that the first parameter of printf() is not an array but a pointer to a null-terminated string, and the expression 3+"helloWorld" yields a pointer to the 4th element of the "helloWorld" string. Hence, you get the output "loWorld".
The answers from dbush and haccks are concise and illuminating, and I upvoted the one from haccks in support of the bounty offered by dbush.
The only thing that I find left unsaid is that the way the question title is phrased makes me wonder if the OP would think that e.g. this also should produce an error:
char sometext[] = {'h', 'e', 'l', 'l', 'o', '\0'};
printf (sometext);
since there is no string literal involved at all. The OP needs to understand that one should never think that a function call that takes a char * argument can "only allow taking a string literal as [that] argument".
The answers from dbush and haccks hint at this by mentioning the conversion of a string literal to a char * (and how adding an integer to that evaluates), but I feel that it's worth pointing out explicitly that anything that is treated as a char * can be used, even things not converted from a string literal.
printf(3+"helloWorld"); //Output is "loWorld"
It will not give an error, because a string in C is an array of characters, and an array name yields the base address of the array.
In the case of printf(3+"helloWorld");, 3+"helloWorld" gives the address of the fourth element of the array of characters, i.e. of the string.
This is still a valid pointer to a string, i.e. a char*.
printf only requires a char* as its first argument.
The first argument to printf is declared as const char *format. It means printf should be passed a pointer to char and the characters pointed to by this pointer will not be changed by printf. There are additional constraints on this first argument:
it should point to a proper C string, that is an array of characters terminated by a null byte.
it may contain conversion specifiers, which must be properly constructed and the corresponding arguments must be passed as extra arguments to printf, with the expected types and order as derived from the format string.
Passing a string constant such as "helloWorld" as the format argument is the most common way to invoke printf. String constants are arrays of char terminated by a null byte which should not be modified by the program. Passing them to functions expecting a pointer to char will cause a pointer to their first byte to be passed, as is the case for all arrays in C.
The expression "helloWorld" + 3 or 3 + "helloWorld" evaluates to a pointer to the 4th byte of the string. It is equivalent to the expression 3 + &("helloWorld"[0]), &("helloWorld"[3]) or simply &"helloWorld"[3]. As a matter of fact, it is also equivalent to &3["helloWorld"], but this last form is only used for pathological obfuscation.
printf does not use the bytes that precede the format argument, so passing 3 + "helloWorld" is equivalent to passing "loWorld" and produces the same output.
To be more precise, the first argument to the printf() function is not an array of chars but a pointer to the first char of an array. The difference is similar to the difference between ByVal and ByRef in the VB world.
A pointer can be incremented and decremented using ++ and --, or adjusted with the arithmetic operators + and -.
In your case you are passing a pointer to "helloWorld" incremented by three, so it points to the fourth element of the array of chars "helloWorld".
Let's simplify this in asm pseudo code:
MOV EAX, offset ("helloWorld")
ADD EAX, 3
PUSH EAX
CALL printf
You might think that 3+"hello world" concatenates 3 and "hello world", but C does not concatenate that way. The simplest way to do that is sprintf(buff, "%d%s", 3, "hello world");
It's pretty strange, but not wrong. By doing 3 + you are moving your pointer to a different location.
The same thing works when you initialize a char *:
char *str1 = "Hello";
char *str2 = 2 + str1;
str2 now points to "llo".

Assigning array to pointer confusion

I was confused by a line of code I found in a tutorial on C. Here is the code:
int main(int argc, char *argv[]){
...
char **inputs = argv+1; // This is the confusing line
...
return 0;
}
I can't understand how can you assign an array to a pointer like that. I would be glad if someone could clarify this for me. Thanks ahead!
Say you execute a program like this
C:\Temp>myprog.exe hello world
The operating system takes these strings and puts them together in an array of null-terminated strings:
{ "myprog.exe", "hello", "world", NULL }
Then it calls main() and passes it the number of strings (3) as argc and a pointer to the first element of this array. This pointer is called argv, and is of type char** (char* argv[] is just a syntactic convenience, semantically equivalent inside function signatures).
But you want inputs to hold only the strings "hello" and "world", so you take this pointer, argv, and point at the next element by adding one to it:
char **inputs = argv+1;
now inputs points toward { "hello", "world", NULL } .
argv is an array of pointers to strings; the last pointer is null.
Suppose your executable name is exe and you run it like:
$ exe first second
then argv is:
            +----------+
argv-------►| "exe"    |
            +----------+
argv + 1---►| "first"  |
            +----------+
argv + 2---►| "second" |
            +----------+
argv + 3---►| null     |
            +----------+
* Notice that the last element is null.
So char** input = argv + 1 points to the "first" string, which is the first input argument.
If you print argv[0] with %s, the output is "exe", your executable name; if you print input[0] with %s, the output is the "first" string.
Note: even if you don't pass any arguments, input is still a valid pointer: it points to the terminating null element, so *input is NULL.
(The purpose of this is to point to the input argument strings, that is, to skip the program name "exe".)
My following code example, and its set of outputs will help you to understand.
code.c:
#include <stdio.h>
int main(int argc, char* argv[]){
    char** input = argv + 1;
    while(*input) /* run until *input is NULL */
        printf("%s \n", *input++);
    return 0;
}
outputs:
~$ gcc code.c -Wall -pedantic -o exe
~$ ./exe
~$ ./exe first
first
~$ ./exe first second
first
second
In most expressions, an array name is converted to a pointer to the first element of the array.
MORE:-
http://forum.allaboutcircuits.com/showthread.php?t=56256
Arrays used in function arguments are always converted to pointers.
So char *argv[] is the same as char **argv.
argv[0] contains the program name (or the name used to invoke the program, in the case of a multiply-linked file), so argv+1 is just the program's arguments.
argv is not an array in this context; it is a pointer value. In the context of a function parameter declaration, T a[N] and T a[] are equivalent to T *a; all three declare a as a pointer to T.
However, it would still be possible to make the assignment even if argv were an array. Unless it is the operand of the sizeof or unary & operators, an expression of type "N-element array of T" is converted ("decays") to an expression of type "pointer to T", and the value of the expression is the address of the first element of the array.
Notice that
char *argv[]
is an array of pointers. In a parameter list, an array declaration is adjusted to a pointer, so argv is a pointer. Since here we have an array of pointers, argv is a pointer to a pointer, just like char **inputs, thus
char **inputs = argv+1;
is just saying that inputs is equal to the pointer argv plus one, and since argv+1 is also a pointer to a pointer, the types match.

Pointer to Pointer with argv

Based on my understanding of pointer to pointer to an array of characters,
% ./pointer one two
 argv
+----+      +----+
| .  | ---> | .  | ---> "./pointer\0"
+----+      +----+
            | .  | ---> "one\0"
            +----+
            | .  | ---> "two\0"
            +----+
From the code:
#include <stdio.h>
int main(int argc, char **argv) {
    printf("Value of argv[1]: %s\n", argv[1]);
    return 0;
}
My question is, Why is argv[1] acceptable? Why is it not something like (*argv)[1]?
My understanding steps:
Take argv, dereference it.
It should return the address of the array of pointers to characters.
Use pointer arithmetic to access the elements of the array.
It's more convenient to think of [] as an operator for pointers rather than arrays; it's used with both, but since arrays decay to pointers array indexing still makes sense if it's looked at this way. So essentially it offsets, then dereferences, a pointer.
So with argv[1], what you've really got is *(argv + 1) expressed with more convenient syntax. This gives you the second char * in the block of memory pointed at by argv, since char * is the type argv points to, and [1] offsets argv by sizeof(char *) bytes then dereferences the result.
(*argv)[1] would dereference argv first with * to get the first pointer to char, then offset that by 1 * sizeof(char) bytes, then dereferences that to get a char. This gives the second character in the first string of the group of strings pointed at by argv, which is obviously not the same thing as argv[1].
So think of an indexed array variable as a pointer being operated on by an "offset then dereference a pointer" operator.
Because argv is a pointer to pointer to char, it follows that argv[1] is a pointer to char. The printf() format %s expects a pointer to char argument and prints the null-terminated array of characters that the argument points to. Since argv[1] is not a null pointer, there is no problem.
(*argv)[1] is also valid C, but (*argv) is equivalent to argv[0] and is a pointer to char, so (*argv)[1] is the second character of argv[0], which is / in your example.
Indexing a pointer as an array implicitly dereferences it. p[0] is *p, p[1] is *(p + 1), etc.
