Difference between *argv++ and *argv-- when reaching the limit - c

could someone explain why
int main(int argc, const char * argv[]) {
while (* argv)
puts(* argv++);
return 0 ;
}
is legal, and
int main(int argc, const char * argv[]) {
argv += argc - 1;
while (* argv)
puts(* argv--);
return 0 ;
}
isn't?
In both cases the 'crement inside the while loop will point outside of the bounds of argv. Why is it legal to point to an imaginary higher index, and not to an imaginary lower index?
Best regards.

Because the C standard says you can form a pointer to one past the end of an array, and it will still compare properly to pointers into the array (though you can't dereference it).
The standard does not say anything of the sort for a pointer to an address before the beginning of an array -- even forming such a pointer gives undefined behavior.
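One way to walk argv backwards without ever forming a pointer before the array is to start one past the end and decrement before dereferencing. A minimal sketch; the helper name reverse_args and its out parameter are made up for illustration:

```c
#include <stddef.h>

/* Copies the argc pointers in argv into out[] in reverse order.
 * The cursor p starts at argv + argc (one past the last argument, which is
 * a valid pointer to form) and is decremented before each dereference,
 * so it never goes below argv. */
void reverse_args(int argc, const char *argv[], const char *out[])
{
    size_t i = 0;
    const char **p = argv + argc;
    while (p != argv) {
        --p;            /* now points at a real element */
        out[i++] = *p;
    }
}
```

The loop condition compares against argv itself, so the last decrement lands exactly on the first element and no argv - 1 is ever computed.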

Loop semantics and half-open intervals. The idiomatic way for iterating through an array or list of objects pointed to by a pointer is:
for (T *p = array; p < array + count; p++)
Here, p ends up being out-of-bounds (off by one, pointing one past the end of the array), so it's (not only conceptually) useful to require this not to invoke undefined behavior (the Standard actually imposes this requirement).

The standard forces argv[argc] to be equal to NULL, so dereferencing argv when it's been incremented argc times is legal.
On the other hand nothing is defined about the address preceding argv, so argv - 1 could be anything.
Note that argv is the only array of strings guaranteed to behave this way, as far as I know.
From the standard:
5.1.2.2.1 Program Startup
If they are declared, the parameters to the main function shall obey the following constraints:
argv[argc] shall be a null pointer
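Because of that guarantee, the argument vector can be walked like a null-terminated list without consulting argc. A small sketch (count_args is a hypothetical helper, not something from the standard):

```c
#include <stddef.h>

/* Counts entries by walking until the terminating null pointer,
 * relying on the guarantee that argv[argc] is a null pointer. */
int count_args(char *const argv[])
{
    int n = 0;
    while (argv[n] != NULL)
        n++;
    return n;
}
```

Called with the argv that main receives, this should return the same value as argc.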

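Incrementing with argv++ or ++argv is allowed because argv is not a const pointer: as a function parameter it is really a pointer (const char **), not an array.
If you take a true array like char* arr[10] and try arr++, it will give an error.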

Related

can you explain me why it is possible p[-1]?

int *p[10]={5,663,36,6};
*(p - 1) = 'e';
int c=*(p-1);
printf("%c",c);
i am not able to understand why we use negative number in array index
*(p - 1) = 'e';
For your example it would be undefined behaviour, but there are situations where you might want to use it, notably if your pointer was pointing to somewhere inside an array and you check that you are still inside the bounds of the array.
Like in this example...
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char *argv[]){
char hello[]="worlld";
char *p;
for(p=hello;*p!='\0';p++) {
if ((p>hello) && (p[-1]==p[0])) {
printf("%c\n",p[0]);
}
}
return(0);
}
The language does not prevent you from using negative numbers when indexing an array or a pointer. This does not mean that it is always correct. In your example it would access an array element one position before the beginning of the array; in other words, you access an invalid memory address.
However, in a situation like the following, where p1 points to an element other than the first element of the array, you can use negative indexes:
int p[] = {1,2,3,4};
int *p1 = &p[1];
int x = *(p1-1);
int y = p1[-1]; // equivalent to the previous one
In both cases 'x' and 'y' will become '1';
i am not able to understand why we use negative number in array index
That's because
you apparently think [] is an array operator, but it is not. It is a pointer operator, defined in terms of pointer arithmetic, which, in a general sense, permits subtracting integers from pointers.
you seem to have an expectation of some particular kind of behavior arising from evaluating your example code, but it exhibits undefined behavior on account of performing arithmetic on pointer p that does not produce a result pointing into (or just past the end of) the same object that p points [in]to. "Undefined" means exactly what it says. Although an obvious program failure or error message might be emitted, you cannot rely on that, or on any other particular behavior. Of the entire program.

C variable not where I expect to find it in memory

Can someone explain why printing the pointers to the two ints results in them being placed in different locations in relation to the chars.
The piece of code below should print out the memory address from &a to &c which (I think) should include the two ints defined but it doesn't, however when I try to find out where they're stored in memory (see second code segment) it does print them between the two chars as expected.
Please explain why printing the int pointers affects the ints being stored between the chars in memory.
The two code samples are the same except code 2 has an extra line printf("\n\n%p,%p\n",&i,&j); which prints the pointers of the two ints.
Edit: Yes I know the printf formatting is ugly but the code was only to help me clarify how memory and pointers work, so I didn't need it to be pretty
Code1
#include <stdio.h>
#include <stdlib.h>
int main (int argc, char **argv){
char a='a';
int i=1;
int j=2;
char c='c';
char *pos;
for ( pos=&c; pos<=&a; pos++ ){
printf("%p\t",pos);
}
printf("\n");
for ( pos=&c; pos<=&a; pos++ ){
printf("%i\t\t",*pos);
}
}
Results from Code1
0x7ffde6321e7e 0x7ffde6321e7f
99 97
Code2
#include <stdio.h>
#include <stdlib.h>
int main (int argc, char **argv){
char a='a';
int i=1;
int j=2;
char c='c';
char *pos;
for ( pos=&c; pos<=&a; pos++ ){
printf("%p\t",pos);
}
printf("\n");
for ( pos=&c; pos<=&a; pos++ ){
printf("%i\t\t",*pos);
}
printf("\n\n%p,%p\n",&i,&j);
}
Results from Code2
0x7ffc3575616b 0x7ffc3575616c 0x7ffc3575616d 0x7ffc3575616e 0x7ffc3575616f 0x7ffc35756170 0x7ffc35756171 0x7ffc35756172 0x7ffc35756173 0x7ffc35756174 0x7ffc35756175 0x7ffc35756176 0x7ffc35756177
99 2 0 0 0 1 0 0 0 -4 127 0 97
0x7ffc35756170,0x7ffc3575616c
You're relying on something (Note 1) which is not specified in the C standard. The behaviour cannot be defined. It invokes undefined behavior (Note 2).
That said, you should always cast the argument of %p to void *, as the expected type is void * and there's no default promotion for pointers.
Note 1:
C does not mention or guarantee the order of allocation of variables / objects in a program. There's no guarantee that they will have consecutive memory locations, either increasing or decreasing. They are purely allowed to have random memory locations, so the theory you're believing in,
for ( pos=&c; pos<=&a; pos++ )
does not hold true. An(y) implementation can choose to place (reorder) variable(s) however it does see fit. There's absolutely no guarantee of the order of memory address with respect to their definition in the code.
Note 2:
For relational operators, quoting C11. chapter §6.5.8, (emphasis mine)
When two pointers are compared, the result depends on the relative locations in the
address space of the objects pointed to. If two pointers to object types both point to the
same object, or both point one past the last element of the same array object, they
compare equal. If the objects pointed to are members of the same aggregate object,
pointers to structure members declared later compare greater than pointers to members
declared earlier in the structure, and pointers to array elements with larger subscript
values compare greater than pointers to elements of the same array with lower subscript values. All pointers to members of the same union object compare equal. If the
expression P points to an element of an array object and the expression Q points to the
last element of the same array object, the pointer expression Q+1 compares greater than
P. In all other cases, the behavior is undefined.
So, for your case, the comparison pos<=&a; is an attempt to compare two pointers which are neither
pointing to same object
members of the same aggregate object
pointers to array elements
pointers to members of the same union object
In short, they are not within the defined scope and hence, using them as operand of the relational operator invokes undefined behaviour.
The location of local variables is not specified by the standard. The compiler may put them in any order it deems best.
Making seemingly unrelated code changes such as an extra print statement or changing the optimization level can change how the compiler lays out the variables.
In short, you can't depend on any particular layout of variables in memory.
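If a predictable relative order is actually needed, one option is to group the values in a struct, whose members are laid out in declaration order (possibly with padding between them). A minimal sketch; the struct locals type and the helper name are invented for illustration:

```c
#include <stddef.h>

/* Hypothetical struct mirroring the question's four locals. */
struct locals {
    char a;
    int  i;
    int  j;
    char c;
};

/* Returns 1 if each member sits at a higher offset than the one declared
 * before it. This ordering is guaranteed for struct members, unlike for
 * separate local variables. */
int members_in_declaration_order(void)
{
    return offsetof(struct locals, a) < offsetof(struct locals, i)
        && offsetof(struct locals, i) < offsetof(struct locals, j)
        && offsetof(struct locals, j) < offsetof(struct locals, c);
}
```

Since all the bytes then belong to one object, an unsigned char * can step from the first byte of such a struct to its last without comparing pointers into unrelated objects.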
Local variables are typically placed on the stack (or in registers if possible and if their address is never taken). In your example i and j are the first and second int locals, so the compiler effectively does push i, push j, and the address &j is &i - 1. This layout is common on such implementations but is not guaranteed.

Is argv[n] writable?

C11 5.1.2.2.1/2 says:
The parameters argc and argv and the strings pointed to by the argv array shall
be modifiable by the program, and retain their last-stored values between program
startup and program termination.
My interpretation of this is that it specifies:
int main(int argc, char **argv)
{
if ( argv[0][0] )
argv[0][0] = 'x'; // OK
char *q;
argv = &q; // OK
}
however it does not say anything about:
int main(int argc, char **argv)
{
char buf[20];
argv[0] = buf;
}
Is argv[0] = buf; permitted?
I can see (at least) two possible arguments:
The above quote deliberately mentioned argv and argv[x][y] but not argv[x], so the intent was that it is not modifiable
argv is a pointer to non-const objects, so in the absence of specific wording to the contrary, we should assume they are modifiable objects.
IMO, code like argv[1] = "123"; is UB (using the original argv).
"The parameters argc and argv and the strings pointed to by the argv array shall be modifiable by the program, and retain their last-stored values between program startup and program termination." C11dr & C17dr1 §5.1.2.2.1 2
Recall that const came into C many years after C's creation.
Much like char *s = "abc"; is valid when it should be const char *s = "abc";. const could not be made mandatory, or too much existing code would have been broken by its introduction.
Likewise, even if argv today should be considered char * const argv[] or some other signature with const, the lack of const in the char *argv[] does not completely specify the const-ness needs of the argv, argv[], or argv[][]. The const-ness needs would need to be driven by the spec.
From my reading, since the spec is silent on the issue, yet goes into depth about other assignments of main()'s argv = and argv[i][j] = , it is UB.
"Undefined behavior is otherwise indicated in this International Standard by the words ‘‘undefined behavior’’ or by the omission of any explicit definition of behavior" §4 2
[edit]:
main() is a very special function in C. What is allowable in other functions may or may not be allowed in main(). The C spec details attributes of its parameters that, given the signature int argc, char *argv[], should not otherwise need stating. main(), unlike other functions in C, can have an alternate signature int main(void) and potentially others. main() is not reentrant. As the C spec goes out of its way to detail what can be modified (argc, argv, argv[][]), it is reasonable to question whether argv[] is modifiable, given that the spec omits any assertion that code can modify it.
Given the specialty of main() and the omission of specifying that argv[] as modifiable, a conservative programmer would treat this greyness as UB, pending future C spec clarification.
If argv[i] is modifiable on a given platform, certainly the range of i should not exceed argc-1.
As "argv[argc] shall be a null pointer", assigning argv[argc] to something other than NULL appears to be a violation.
Although the strings are modifiable, code should not exceed the original string's length.
char *newstr = "abc";
if (strlen(newstr) <= strlen(argv[1]))
strcpy(argv[1], newstr);
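The length check above can be wrapped in a small helper. This is only a sketch, and overwrite_if_fits is a made-up name:

```c
#include <string.h>

/* Overwrites the string at dst with src only if src fits within dst's
 * current length; returns 1 on success, 0 if src is too long. */
int overwrite_if_fits(char *dst, const char *src)
{
    if (strlen(src) > strlen(dst))
        return 0;
    strcpy(dst, src);
    return 1;
}
```

Note that once a shorter string has been stored, strlen reports the new, shorter length, so later calls see less room than the original argument had.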
1 No change with C17/18. Since that version was meant to clarify many things, it reinforces that this spec is adequate and not missing an "argv array elements shall be modifiable".
The argv array is not required to be modifiable (but may be in actual implementations). This is an intentional wording which was reaffirmed in the n849 meeting in 1998:
https://www.open-std.org/jtc1/sc22/wg14/www/docs/n849.htm
PUBLIC REVIEW COMMENT #7
[...]
Comment 10.
Category: Request for information/clarification
Committee Draft subsection: 5.1.2.2.1
Title: argc/argv modifiability, part 2
Detailed description:
Is the array of pointers to char pointed to by argv modifiable?
Response Code: Q
This is currently implicitly unspecified and the committee
has chosen to leave it that way.
In addition, two separate proposals were made to, respectively, change and augment the wording. Both were rejected. Interested readers can find them by searching for "argv".
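Since the modifiability of the pointer array is left unspecified, a program that wants to reassign elements can sidestep the question by working on its own copy. A minimal sketch, assuming argc is non-negative; copy_argv is a hypothetical helper:

```c
#include <stdlib.h>

/* Returns a malloc'd, null-terminated copy of the pointer array (the
 * strings themselves are shared, not duplicated), or NULL on failure.
 * The caller owns the copy and may reassign its elements freely. */
char **copy_argv(int argc, char *argv[])
{
    char **copy = malloc(((size_t)argc + 1) * sizeof *copy);
    if (copy == NULL)
        return NULL;
    for (int i = 0; i < argc; i++)
        copy[i] = argv[i];
    copy[argc] = NULL;
    return copy;
}
```

Assignments such as copy[1] = other_string then touch only memory the program allocated itself; the result should be released with free when no longer needed.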
argc is just an int and is modifiable without any restriction.
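argv is a modifiable char **, so assigning to argv itself (argv = x) is valid. The quoted paragraph also makes the strings modifiable, so argv[i][j] = c is fine within the original string. What it does not say is whether the pointer argv[i] is itself modifiable, so argv[i] = x is not guaranteed by the standard.
The getopt function (POSIX, not the C standard library) may reorder the pointers in argv on some implementations but never modifies the actual char arrays.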
The answer is that argv is an array and yes, its contents are modifiable.
The key is earlier in the same section:
If the value of argc is greater than zero, the array members argv[0] through
argv[argc-1] inclusive shall contain pointers to strings, which are given
implementation-defined values by the host environment prior to program startup.
From this it is clear that argv is to be thought of as an array of a specific length (argc). The parameter argv is then a pointer to the first element of that array, the array having decayed to a pointer.
Read in this context, the statement to the effect that 'argv shall be modifiable...and retain its contents' clearly intends that the contents of that array be modifiable.
I concede that there remains some ambiguity in the wording, particularly as to what might happen if argc is modified.
Just to be clear, what I'm saying is that I read this language as meaning:
[the contents of the] argv [array] and the strings pointed to by the argv array shall be modifiable...
So both the pointers in the array and the strings they point to are in read-write memory, no harm is done by changing them, and both preserve their values for the life of the program. I would expect that this behaviour is to be found in all the major C/C++ runtime library implementations, without exception. This is not UB.
The ambiguity is the mention of argc. It is hard to imagine any purpose or any implementation in which the value of argc (which appears to be simply a local function parameter) could not be changed, so why mention it? The standard clearly states that a function can change the value of its parameters, so why treat argc specially in this respect? It is this unexpected mention of argc that has triggered this concern about argv, which would otherwise pass without remark. Delete argc from the sentence and the ambiguity disappears.
It is clearly mentioned that argv and argv[x][x] are modifiable. If argv is modifiable then it can point to the first element of another array of char pointers, and hence argv[x] can point to the first element of some other string. Ultimately argv[x] is modifiable too, and that could be the reason that there is no need to mention it explicitly in the standard.

argv pointer to an array of pointers

I am confused as to how the following passage matches up with the code that follows it:
Since argv is a pointer to an array of pointers, we can manipulate the
pointer rather than index the array. This next variant is based on
incrementing argv, which is a pointer to pointer to char, while argc
is counted down:
#include <stdio.h>
/* echo command-line arguments; 2nd version */
main(int argc, char *argv[])
{
while (--argc > 0)
printf("%s%s", *++argv, (argc > 1) ? " " : "");
printf("\n");
return 0;
}
Isn't char *argv[] just an array of pointers? Wouldn't a pointer to an array of pointers be written as char *(*argv[]) or something similar?
As a side note, is it normal that in general I find declarations that mix arrays and pointers rather confusing?
Such terms as "pointer to array" or "to point to an array" are often treated rather loosely in C terminology. They can mean at least two different things.
In the most strict and pedantic sense of the term, a "pointer to array" has to be declared with "pointer to array" type, as in
int a[10];
int (*p)[10] = &a;
In the above example p is declared as a pointer to array of 10 ints and it is actually initialized to point to such an array.
However, the term is also often used in its less formal meaning. In this example
int a[10];
int *p = a;
p is declared as a mere pointer to int. It is initialized to point to the first element of array a. You can often hear and see people say that p in this case also "points to an array" of ints, even though this situation is semantically different from previous one. "Points to an array" in this case means "provides access to elements of an array through pointer arithmetic", as in p[5] or *(p + 3).
This is exactly what is meant by the phrase "...argv is a pointer to an array of pointers..." you quoted. argv's declaration in parameter list of main is equivalent to char **argv, meaning that argv is actually a pointer to a char * pointer. But since it physically points to the first element of some array of char * pointers (maintained by the calling code), it is correct to say semi-informally that argv points to an array of pointers.
That's exactly what is meant by the text you quoted.
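The difference between the two kinds of pointer shows up in pointer arithmetic. The following sketch (both function names are made up for illustration) measures how many bytes p + 1 advances in each case:

```c
#include <stddef.h>

/* Bytes advanced by +1 on a pointer to the whole array. */
size_t step_pointer_to_array(void)
{
    int a[10];
    int (*pa)[10] = &a;
    return (size_t)((char *)(pa + 1) - (char *)pa);
}

/* Bytes advanced by +1 on a pointer to the first element. */
size_t step_pointer_to_element(void)
{
    int a[10];
    int *p = a;
    return (size_t)((char *)(p + 1) - (char *)p);
}
```

The first steps over all ten ints at once, the second over a single int, which is why only the second form is useful for walking through the elements.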
Where C functions claim to accept arrays, strictly they accept pointers instead. The language does not distinguish between void fn(int *foo) {} and void fn(int foo[]). It doesn't even care if you have void fn(int foo[100]) and then pass that an array of int [10].
int main(int argc, char *argv[])
is the same as
int main(int argc, char **argv)
Consequently, argv points to the first element of an array of char pointers, but it is not itself an array type and it does not (formally) point to a whole array. But we know that array is there, and we can index into it to get the other elements.
In more complex cases, like accepting multi-dimensional arrays, it is only the first [] which drops back to a pointer (and which can be left unsized). The others remain as part of the type that is being pointed to, and they have an influence on pointer arithmetic.
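As a small illustration of that multi-dimensional case (sum_grid is a hypothetical function, and the column count of 3 is arbitrary):

```c
/* The first dimension decays to a pointer; the second (3) stays part of
 * the pointed-to type, so grid[r] advances by 3 ints per row.
 * Equivalent declaration: int sum_grid(int rows, int (*grid)[3]) */
int sum_grid(int rows, int grid[][3])
{
    int total = 0;
    for (int r = 0; r < rows; r++)
        for (int c = 0; c < 3; c++)
            total += grid[r][c];
    return total;
}
```

The caller has to pass the row count separately, because that first dimension is not part of the parameter's type.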
The array-pointer equivalence holds true only for function parameters, so while void fn(const char* argv[]) and void fn(const char** argv) are equivalent, it doesn't hold true when it comes to the variables you might want to pass TO the function.
Consider
void fn(const char** argv)
{
...
}
int main(int argc, const char* argv[])
{
fn(argv); // acceptable.
const char* meats[] = { "Chicken", "Cow", "Pizza" };
// "meats" is an array of const char* pointers, just like argv, so
fn(meats); // acceptable.
const char** meatPtr = meats;
fn(meatPtr); // acceptable, because the previous call was implicitly converted to this.
// an array of character arrays.
const char vegetables[][10] = { "Avocado", "Pork", "Pepperoni" };
fn(vegetables); // does not compile.
return 0;
}
"vegetables" is not a pointer to a pointer, it points directly to the first character in a 3*10 contiguous character sequence. Replace fn(vegetables) in the above to get
int main(int argc, const char* argv[])
{
// an array of character arrays.
const char vegetables[][10] = { "Avocado", "Pork", "Pepperoni" };
printf("*vegetables = %c\n", *(const char*)vegetables);
return 0;
}
and the output is "A": vegetables itself is pointing directly - without indirection - to the characters, and not intermediate pointers.
The vegetables assignment is basically a shortcut for this:
const char* __vegetablesPtr = "Avocado\0\0\0Pork\0\0\0\0\0\0Pepperoni\0";
vegetables = __vegetablesPtr;
and
const char* roni = vegetables[2];
translates to
const char* roni = (const char*)vegetables + (sizeof(vegetables[0][0]) * /*dimension=*/10 * /*index=*/2);
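The row arithmetic can be checked with a function whose parameter has the matching type, a pointer to rows of 10 chars. A sketch only; nth_name is an invented helper:

```c
#include <stddef.h>

/* names points at rows of 10 chars each; names[i] is the row that starts
 * i * 10 bytes after the beginning of the block. */
const char *nth_name(const char names[][10], size_t i)
{
    return names[i];
}
```

Passing vegetables to this function compiles cleanly, whereas passing it to a function expecting const char ** does not, because no intermediate pointers exist.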
Since argv is a pointer to an array of pointers.
This is wrong. argv is an array of pointers.
Since argv is a pointer to an array of pointers,
No, not even close.
Isn't char *argv[] just an array of pointers?
No, it's a pointer to pointers.
"Pointer to the first element of an array" is a common construct. Every string function uses it, including stdio functions that input and output strings. main uses it for argv.
"Pointer to an array" is a rare construct. I can't find any uses of it in the C standard library or POSIX. grepping all the headers I have installed locally (for '([^)]*\*[^)]) *\[') I find exactly 2 legitimate instances of pointer-to-array, one in libjpeg and one in gtk. (Both are struct members, not function parameters, but that's beside the point.)
So if we stick to official language, we have a rare thing with a short name and a similar but much more common thing with a long name. That's the opposite of the way human language naturally wants to work, so there's tension, which gets resolved in all but the most formal situations by using the short name "incorrectly".
The reason we don't just say "pointer to pointer" is that there's another common use of pointers as function parameters, in which the parameter points to a single object that's not a member of an array. For example, in
long strtol(const char *nptr, char **endptr, int base);
endptr is exactly the same type as argv is in main, both are pointer-to-pointer, but they're used in different ways. argv points to the first char * in an array of char *s; inside main you're expected to use it with indexes like argv[0], argv[optind], etc., or step through the array by incrementing it with ++argv.
endptr points to a single char *. Inside strtol, it is not useful to increment endptr or to refer to endptr[n] for any value of n other than zero.
That semantic difference is expressed by the informal usage of "argv is a pointer to an array". The possible confusion with what "pointer to array" means in formal language is ignored, because the natural instinct to use concise language is stronger than the desire to adhere to a formal definition that tells you not to use the most obvious simple phrase because it's reserved for a situation that will almost never happen.
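To make the endptr usage concrete, here is a minimal sketch of a caller that passes the address of a single char * to strtol. The wrapper parse_long and its return convention are assumptions made for illustration:

```c
#include <errno.h>
#include <stdlib.h>

/* Parses s as a base-10 long. Returns 1 and stores the value in *out only
 * if at least one digit was consumed, nothing follows the number, and the
 * value was in range; returns 0 otherwise. */
int parse_long(const char *s, long *out)
{
    char *end;                      /* a single char *, not an array */
    errno = 0;
    long v = strtol(s, &end, 10);   /* &end has type char ** */
    if (end == s || *end != '\0' || errno == ERANGE)
        return 0;
    *out = v;
    return 1;
}
```

Here &end has the same type as argv, yet it refers to exactly one pointer, so indexing it with anything other than zero would be meaningless.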

What is this asterisk for?

I'm learning c programming, and I don't understand what is this asterisk for in the main method.
int main(int argc, char* argv[])
char* a; means that a is a pointer to variable of type char.
In your case argv is a pointer to a pointer (or even several of them; how many is given by argc in your case) to a variable(s) of type char. In other words, it's a pointer to an array (of length argc) of pointers to char variables.
You can even write your code this way: int main(int argc, char** argv) and nothing actually changes, because in a parameter declaration char* a is the same as char a[].
It means that argv is an array of character pointers.
The declaration char *argv[] declares argv as an array (of unknown size) of pointer to char.
For any type T, the declaration
T *p;
declares p as a pointer to T. Note that the * is bound to the identifier, not the type; in the declaration
T *a, b;
only a is declared as a pointer.
It signifies a pointer. char argv[] declares an array of characters. char* argv[] declares an array of character pointers, or pointers to strings.
Those are parameters passed from the command line to your program. This asterisk is a pointer operator.
Basically char argv[] is an array of characters, and char *argv[] is an array of pointers to characters. So it is here to represent multiple strings, to put it simply!
Note that: char *argv[] is equivalent to char * * argv, as char argv[] could be represented as char *argv.
Just to go further, you would be amazed that those two expressions are equivalent:
a[5]
5[a]
This is because a[i] is defined as *(a + i), and in that expression the array a is converted to a pointer to its first element.
So a[1] can be represented as *(a + 1), a[2] as *(a + 2) etc. Which is equivalent to *(1 + a) or *(2 + a).
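A quick way to convince yourself is to compare the addresses that each spelling designates. A small sketch (subscript_forms_agree is a made-up name):

```c
/* Returns 1 if a[i], *(i + a) and i[a] all designate the same element
 * as a + i. */
int subscript_forms_agree(const int *a, int i)
{
    return &a[i] == a + i
        && &*(i + a) == a + i
        && &i[a] == a + i;
}
```

The i[a] form is legal but is a curiosity; real code should stick to a[i].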
Anyway, pointers are like one of the most important and difficult notion to grasp when starting programming in C so I would suggest you taking a serious look at it on Google!
The " * " here specifies a pointer: argv[] holds a variable number of argument values.
You don't know in advance how many parameters the user will pass, which is why there is argc [argument count] alongside argv [argument values]. Because the number is not fixed at compile time, the parameter is declared without a size; it is a pointer that the runtime environment sets to point at the array of arguments before main starts.
Hope this helped, if this didn't I'll be glad to help just let me know :)

Resources