Is argv[n] writable? - c

C11 5.1.2.2.1/2 says:
The parameters argc and argv and the strings pointed to by the argv array shall
be modifiable by the program, and retain their last-stored values between program
startup and program termination.
My interpretation of this is that it specifies:
int main(int argc, char **argv)
{
    if ( argv[0][0] )
        argv[0][0] = 'x'; // OK
    char *q;
    argv = &q; // OK
}
however it does not say anything about:
int main(int argc, char **argv)
{
    char buf[20];
    argv[0] = buf;
}
Is argv[0] = buf; permitted?
I can see (at least) two possible arguments:
The above quote deliberately mentioned argv and argv[x][y] but not argv[x], so the intent was that it is not modifiable
argv is a pointer to non-const objects, so in the absence of specific wording to the contrary, we should assume they are modifiable objects.

IMO, code like argv[1] = "123"; is UB (using the original argv).
"The parameters argc and argv and the strings pointed to by the argv array shall be modifiable by the program, and retain their last-stored values between program startup and program termination." C11dr & C17dr1 §5.1.2.2.1 2
Recall that const came into C many years after C's creation.
Much like char *s = "abc"; is valid when it should be const char *s = "abc";. Using const was not made mandatory, or else too much existing code would have been broken by the introduction of const.
Likewise, even if argv today should be considered char * const argv[] or some other signature with const, the lack of const in the char *argv[] does not completely specify the const-ness needs of the argv, argv[], or argv[][]. The const-ness needs would need to be driven by the spec.
From my reading, since the spec is silent on the issue, yet goes into depth about other assignments of main()'s argv = and argv[i][j] = , it is UB.
"Undefined behavior is otherwise indicated in this International Standard by the words 'undefined behavior' or by the omission of any explicit definition of behavior." §4 2
[edit]:
main() is a very special function in C. What is allowable in other functions may or may not be allowed in main(). The C spec details attributes of its parameters that, given the signature int argc, char *argv[], it shouldn't need to detail. main(), unlike other functions in C, can have an alternate signature int main(void) and potentially others. main() is not reentrant. As the C spec goes out of its way to detail what can be modified (argc, argv, argv[][]), it is reasonable to question whether argv[] is modifiable, given that the spec never asserts that code can modify it.
Given the special nature of main() and the omission of any statement that argv[] is modifiable, a conservative programmer would treat this greyness as UB, pending future C spec clarification.
If argv[i] is modifiable on a given platform, certainly the range of i should not exceed argc-1.
As "argv[argc] shall be a null pointer", assigning argv[argc] something other than NULL appears to be a violation.
Although the strings are modifiable, code should not exceed the original string's length.
char *newstr = "abc";
if (strlen(newstr) <= strlen(argv[1]))
    strcpy(argv[1], newstr);
1 No change with C17/18. Since that version was meant to clarify many things, it reinforces that this spec is adequate and is not missing an "argv array elements shall be modifiable".
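The length check above can be wrapped in a small function. This is only a sketch: overwrite_arg is a hypothetical name, not something from the standard or from this answer, and it assumes the caller passes a pointer to a writable string such as one of the argv strings. Note that strlen measures the string as it currently is, so after a shorter string has been stored the usable room shrinks accordingly.

```c
#include <string.h>

/* Hypothetical helper: overwrite an argument string in place, but only if
   the replacement fits in the bytes the string currently occupies.
   Returns 1 on success, 0 if newstr is too long (arg is left untouched). */
int overwrite_arg(char *arg, const char *newstr)
{
    if (arg == NULL || newstr == NULL)
        return 0;
    if (strlen(newstr) > strlen(arg))
        return 0;               /* would write past the original string */
    strcpy(arg, newstr);
    return 1;
}
```

From main, a call such as overwrite_arg(argv[1], "abc") would be made only after checking argc > 1.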

The argv array is not required to be modifiable (but may be in actual implementations). This is an intentional wording which was reaffirmed in the n849 meeting in 1998:
https://www.open-std.org/jtc1/sc22/wg14/www/docs/n849.htm
PUBLIC REVIEW COMMENT #7
[...]
Comment 10.
Category: Request for information/clarification
Committee Draft subsection: 5.1.2.2.1
Title: argc/argv modifiability, part 2
Detailed description:
Is the array of pointers to char pointed to by argv modifiable?
Response Code: Q
This is currently implictly unspecified and the committee
has chosen to leave it that way.
In addition, two separate proposals were made to, respectively, change and augment the wording. Both were rejected. Interested readers can find them by searching for "argv".
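Since the committee left the modifiability of the pointer array unspecified, one conservative approach is to copy the pointers into an array the program owns and modify only that copy. The following is a minimal sketch under that assumption; copy_argv is a hypothetical helper name, not a standard function.

```c
#include <stdlib.h>

/* Hypothetical helper: build a private, writable array holding the same
   pointers as argv[0] .. argv[argc-1], followed by a null pointer.
   Returns NULL if the arguments are invalid or allocation fails.
   The caller frees the result with free(). */
char **copy_argv(int argc, char **argv)
{
    if (argc < 0 || argv == NULL)
        return NULL;
    char **copy = malloc(((size_t)argc + 1) * sizeof *copy);
    if (copy == NULL)
        return NULL;
    for (int i = 0; i < argc; i++)
        copy[i] = argv[i];      /* share the strings, copy only the pointers */
    copy[argc] = NULL;          /* keep the argv[argc] == NULL convention */
    return copy;
}
```

Assigning to an element of the returned array, for example args[1] = buf, then touches only memory the program allocated itself, so the question of whether the original array is writable never arises.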

argc is just an int and is modifiable without any restriction.
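argv is a modifiable char **, which means that argv = x is valid. The strings are explicitly modifiable as well, so argv[i][j] = c is valid as long as it stays within the original string. But the standard does not say anything about the pointers argv[i] being themselves modifiable, so argv[i] = x is not guaranteed to work.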
The getopt function (which is POSIX, not part of the C standard library) may reorder the pointers in argv in some implementations (GNU getopt permutes them), but it never modifies the actual char arrays.

The answer is that argv is an array and yes, its contents are modifiable.
The key is earlier in the same section:
If the value of argc is greater than zero, the array members argv[0] through
argv[argc-1] inclusive shall contain pointers to strings, which are given
implementation-defined values by the host environment prior to program startup.
From this it is clear that argv is to be thought of as an array of a specific length (argc pointers to strings, followed by a null pointer). The parameter argv is then a pointer to the first element of that array, the array having decayed to a pointer.
Read in this context, the statement to the effect that 'argv shall be modifiable...and retain its contents' clearly intends that the contents of that array be modifiable.
I concede that there remains some ambiguity in the wording, particularly as to what might happen if argc is modified.
Just to be clear, what I'm saying is that I read this language as meaning:
[the contents of the] argv [array] and the strings pointed to by the argv array shall be modifiable...
So both the pointers in the array and the strings they point to are in read-write memory, no harm is done by changing them, and both preserve their values for the life of the program. I would expect that this behaviour is to be found in all the major C/C++ runtime library implementations, without exception. This is not UB.
The ambiguity is the mention of argc. It is hard to imagine any purpose or any implementation in which the value of argc (which appears to be simply a local function parameter) could not be changed, so why mention it? The standard clearly states that a function can change the value of its parameters, so why treat argc specially in this respect? It is this unexpected mention of argc that has triggered this concern about argv, which would otherwise pass without remark. Delete argc from the sentence and the ambiguity disappears.

It is clearly stated that argv and argv[x][y] are modifiable. If argv is modifiable, then it can point to the first element of another array of pointers to char, and hence argv[x] can point to the first element of some other string. Ultimately argv[x] is modifiable too, and that could be the reason there is no need to mention it explicitly in the standard.

Related

If argc is 1, can I still use argv[1], argv[2],... character arrays?

If no argument is passed on the command line, i.e. if argc is 1, can we still allocate memory for argv[1], argv[2], ... and use those buffers for further experiments?
If that is undefined behavior, can I still use it somehow?
No, the C standard does not specify that argv has any elements beyond argv[argc], so they may not exist in C’s object-memory model and the behavior of using them is not defined by the C standard.
C 2018 5.1.2.2.1 2 says:
…
argv[argc] shall be a null pointer.
If the value of argc is greater than zero, the array members argv[0] through argv[argc-1] inclusive shall contain pointers to strings, which are given implementation-defined values by the host environment prior to program startup.
…
That is all there is that defines the extent of the argv array; nothing in the standard says there are more elements.
When argc is one, using argv[1] is defined but using argv[2] is not.
You can store new values to the defined elements because C 2018 5.1.2.2.1 2 also says:
The parameters argc and argv and the strings pointed to by the argv array shall be modifiable by the program, and retain their last-stored values between program startup and program termination.
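The extent rule quoted above (only argv[0] through argv[argc] exist) can be sketched as a bounds-checked accessor. The name arg_or and its fallback parameter are hypothetical, introduced only for illustration.

```c
#include <stddef.h>

/* Hypothetical helper: return argv[i] when i names one of the argument
   strings (0 <= i < argc); otherwise return fallback.
   It never reads any element beyond argv[argc-1]. */
const char *arg_or(int argc, char **argv, int i, const char *fallback)
{
    if (argv == NULL || i < 0 || i >= argc || argv[i] == NULL)
        return fallback;
    return argv[i];
}
```

With argc equal to 1, arg_or(argc, argv, 1, "default") and arg_or(argc, argv, 2, "default") both return the fallback without reading argv[2], which the standard does not guarantee to exist.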
can I access argv[] elements after argv[argc]?
You can ... but ONLY IN THIS CODE
#include <stdio.h>
int main(int argc, char **argv) {
    if (argc == 1) {
        char *foo[] = {"bar", "baz", "quux", NULL, "bingo"};
        main(3, foo);
    } else {
        printf("argc is %d; argv[4] is \"%s\"\n", argc, argv[4]);
    }
    return 0;
}
In any other code, you cannot.
Always argv[argc] is equal to NULL. In the described case where argc is equal to 1 the array argv contains two pointers argv[0] and argv[1] where argv[1] is a null pointer.
You may reassign the pointers, but this does not make much sense because it will make your program unclear. Instead, you could declare your own array if you need one.

What does the "main()" that is put in the beginning of programs in C mean?

I just want to learn the basics thoroughly and what some simple codes refer to.
I was able to find a short description at
https://www.dummies.com/programming/c/looking-at-the-c-language/ but I don't think I fully understand it with the help of just that.
It's the starting point for your program. Per 5.1.2.2.1 Program startup of the C standard:
The function called at program startup is named main. The
implementation declares no prototype for this function. It shall be
defined with a return type of int and with no parameters:
int main(void) { /* ... */ }
or with two parameters (referred to here as argc and argv, though any
names may be used, as they are local to the function in which they are
declared):
int main(int argc, char *argv[]) { /* ... */ }
or equivalent; or in some other implementation-defined manner.
If they are declared, the parameters to the main function shall obey
the following constraints:
The value of argc shall be nonnegative.
argv[argc] shall be a null pointer.
If the value of argc is greater than zero, the array members argv[0] through argv[argc-1] inclusive shall contain pointers to
strings, which are given implementation-defined values by the host
environment prior to program startup. The intent is to supply to the
program information determined prior to program startup from elsewhere
in the hosted environment. If the host environment is not capable of
supplying strings with letters in both uppercase and lowercase, the
implementation shall ensure that the strings are received in
lowercase.
If the value of argc is greater than zero, the string pointed to by argv[0] represents the program name; argv[0][0] shall be the
null character if the program name is not available from the host
environment. If the value of argc is greater than one, the strings
pointed to by argv[1] through argv[argc-1] represent the program
parameters.
The parameters argc and argv and the strings pointed to by the argv array shall be modifiable by the program, and retain their
last-stored values between program startup and program termination.

Why are variables declared as char* cstring = "myString" impossible to modify, while variables defined as char cstring[] = "mystring" can be modified? [duplicate]

Both GCC and Clang do not complain if I assign a string literal to a char*, even when using lots of pedantic options (-Wall -W -pedantic -std=c99):
char *foo = "bar";
while they (of course) do complain if I assign a const char* to a char*.
Does this mean that string literals are considered to be of char* type? Shouldn't they be const char*? It's not defined behavior if they get modified!
And (an uncorrelated question) what about command line parameters (ie: argv): is it considered to be an array of string literals?
They are of type char[N] where N is the number of characters including the terminating \0. So yes you can assign them to char*, but you still cannot write to them (the effect will be undefined).
Wrt argv: It points to an array of pointers to strings. Those strings are explicitly modifiable. You can change them and they are required to hold the last stored value.
For completeness sake the C99 draft standard(C89 and C11 have similar wording) in section 6.4.5 String literals paragraph 5 says:
[...]a byte or code of value zero is appended to each multibyte character sequence that results from a string literal or literals. The multibyte character sequence is then used to initialize an array of static storage duration and length just sufficient to contain the sequence. For character string literals, the array elements have type char, and are initialized with the individual bytes of the multibyte character sequence;[...]
So this says a string literal has static storage duration (it lasts the lifetime of the program), its type is char[] (not char *), and its length is the size of the string literal with an appended zero. Paragraph 6 says:
If the program attempts to modify such an array, the behavior is undefined.
So attempting to modify a string literal is undefined behavior regardless of the fact that they are not const.
With respect to argv in section 5.1.2.2.1 Program startup paragraph 2 says:
If they are declared, the parameters to the main function shall obey the following
constraints:
[...]
-The parameters argc and argv and the strings pointed to by the argv array shall be modifiable by the program, and retain their last-stored values between program
startup and program termination.
So argv is not considered an array of string literals and it is ok to modify the contents of argv.
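The difference between a string literal and writable storage can be illustrated with a function that modifies its argument in place. This is a sketch only; capitalize is a hypothetical helper, and the point is what the caller is allowed to pass to it.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: uppercase the first character of s in place.
   s must point to writable storage (a char array, an argv string),
   never to a string literal. */
void capitalize(char *s)
{
    if (s != NULL && s[0] != '\0')
        s[0] = (char)toupper((unsigned char)s[0]);
}
```

Calling it on char buf[] = "bar"; is fine because buf is an array initialized from the literal, and calling it on an argv string is fine because the standard makes those strings modifiable. Calling capitalize("bar") compiles in C, since the literal has type char[4], but modifying the literal is undefined behavior.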
Using -Wwrite-strings option you will get:
warning: initialization discards qualifiers from pointer target type
Irrespective of that option, GCC will put literals into read-only memory section, unless told otherwise by using -fwritable-strings (however this option has been removed from recent GCC versions).
Command line parameters are not const, they typically live on the stack.
(Sorry, I've only just noticed this question is tagged as c, not c++. Maybe my answer isn't so relevant to this question after all!)
String literals are not quite const or not-const, there is a special strange rule for literals.
(Summary: Literals can be taken by reference-to-array as foo( const char (&)[N]) and cannot be taken as the non-const array. They prefer to decay to const char *. So far, that makes it seem like they are const. But there is a special legacy rule which allows literals to decay to char *. See experiments below.)
(Following experiments done on clang3.3 with -std=gnu++0x. Perhaps this is a C++11 issue? Or specific to clang? Either way, there is something strange going on.)
At first, literals appear to be const:
#include <iostream>
void foo( const char * ) { std::cout << "const char *" << std::endl; }
void foo( char * ) { std::cout << " char *" << std::endl; }
int main() {
    const char arr_cc[3] = "hi";
    char arr_c[3] = "hi";
    foo(arr_cc); // const char *
    foo(arr_c); // char *
    foo("hi"); // const char *
}
The two arrays behave as expected, demonstrating that foo is able to tell us whether the pointer is const or not. Then "hi" selects the const version of foo. So it seems like that settles it: literals are const ... aren't they?
But, if you remove void foo( const char * ) then it gets strange. First, the call to foo(arr_c) fails with an error at compile time. That is expected. But the literal call (foo("hi")) works via the non-const call.
So, literals are "more const" than arr_c (because they prefer to decay to const char *, unlike arr_c). But literals are "less const" than arr_cc because they are willing to decay to char * if needed.
(Clang gives a warning when it decays to char *).
But what about the decaying? Let's avoid it for simplicity.
Let's take the arrays by reference into foo instead. This gives us more 'intuitive' results:
void foo( const char (&)[3] ) { std::cout << "const char (&)[3]" << std::endl; }
void foo( char (&)[3] ) { std::cout << " char (&)[3]" << std::endl; }
As before, the literal and the const array (arr_cc) use the const version, and the non-const version is used by arr_c. And if we delete foo( const char (&)[3] ), then we get errors with both foo(arr_cc); and foo("hi");. In short, if we avoid the pointer-decay and use reference-to-array instead, literals behave as if they are const.
Templates?
In templates, the system will deduce const char * instead of char * and you're "stuck" with that.
template<typename T>
void bar(T *t) { // will deduce const char when a literal is supplied
foo(t);
}
So basically, a literal behaves as const at all times, except in the particular case where you directly initialize a char * with a literal.
Johannes' answer is correct concerning the type and contents. But in addition to that, yes, it is undefined behavior to modify contents of a string literal.
Concerning your question about argv:
The parameters argc and argv and the
strings pointed to by the argv array
shall be modifiable by the program,
and retain their last-stored values
between program startup and program
termination.
In both C89 and C99, string literals are of type char[N] rather than const char[N] (for historical reasons, as I understand it), so they convert to char * without a diagnostic. You are correct that trying to modify one results in undefined behavior. GCC has a specific warning flag, -Wwrite-strings (which is not part of -Wall), that gives string literals the type const char[N] so that assigning one to a plain char * is diagnosed.
As for argv, the arguments are copied into your program's address space, and can safely be modified in your main() function.
EDIT: Whoops, had -Wno-write-strings copied by accident. Updated with the correct (positive) form of the warning flag.
String literals have formal type char [] but semantic type const char []. The purists hate it but this is generally useful and harmless, except for bringing lots of newbies to SO with "WHY IS MY PROGRAM CRASHING?!?!" questions.
They are const char*, but there is a specific exclusion for assigning them to char* for legacy code that existed before const did. And the command line arguments are definitely not literal, they are created at run-time.

what is this pointer to a character array supposed to represent [duplicate]

I am really confused regarding this main function,
int main( int argc, char *argv[] ) {
/*statements*/
}
specifically the
char *argv[ ].
What does that represent exactly? I know that it is a pointer to an array of characters, but how is that array created and how does it work exactly? Also, is that char array the same as a string, since strings are arrays of char?
It holds the command-line arguments.
You can just pass some values during execution of the program like below.
#include <stdio.h>
int main(int count, char *argv[]) {
    int i = 0;
    for (i = 0; i < count; i++)
        printf("\n%s", argv[i]);
    return 0;
}
// save file as arg.c
In command line
C:\tc\bin>arg c JS
Output (after the program name, since the loop starts at argv[0]):
c
JS
It points to the parameters that are passed to your program when you launch it.
Ex:
./a.out toto tata
printf("argv[0]: %s, argv[1]: %s, argv[2]: %s, argv[3]: %s", argv[0], argv[1], argv[2], argv[3]);
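Output (with glibc, which prints (null) for a null pointer passed to %s; strictly, passing argv[3] there is undefined behavior):
argv[0]: ./a.out, argv[1]: toto, argv[2]: tata, argv[3]: (null)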
argc is the number of arguments stored in argv.
You don't have to care about who created it, as it's part of the C standard. Search information about _start function if you really want to know.
argv is an array of strings, and each individual string is itself an array of char.
Sometimes you will see argv declared as char **argv instead.
char *argv[] is syntactic sugar for char **argv. argv is simply an array of pointers to null-terminated strings. The operating system creates the array for you before invoking your main() function.
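To make the "array of pointers to null-terminated strings" layout concrete, here is a small sketch that walks the pointers and measures each string. total_arg_length is a hypothetical helper name chosen for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: sum the lengths of the argument strings
   argv[0] .. argv[argc-1], not counting the terminating null characters.
   Each argv[i] is a char * pointing at the first character of a string. */
size_t total_arg_length(int argc, char **argv)
{
    size_t total = 0;
    for (int i = 0; i < argc; i++)
        total += strlen(argv[i]);
    return total;
}
```

The parameter is written as char **argv here, but declaring it as char *argv[] would give exactly the same function type.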
int argc = Number of arguments/parameters when running the program (including program name)
char *argv[] = Arguments as an array of "strings" when running the program. This is how I think of it.
Example:
C:\> echo hello world
argc = 3
argv[0] = echo
argv[1] = hello
argv[2] = world
It points to the parameters passed by executing the java file. If you have a class called MyClass with a main method, by calling java myclass a b, you will have a and b in this array. Also in c or c++ calling myCommand a b...
The crucial fact is that it's an array of pointers to characters, not a pointer to an array. So you have several pointers to characters, one per "word" in the program's argument list.
char *argv[] is the same as char **argv, and argv[0] to argv[argc-1] are pointers to C-style strings (which are null-terminated). The draft C99 standard actually provides a nice explanation of how it works and what the contents should be. Section 5.1.2.2.1 Program startup, paragraph 2, says:
If they are declared, the parameters to the main function shall obey the following
constraints:
— The value of argc shall be nonnegative.
— argv[argc] shall be a null pointer.
— If the value of argc is greater than zero, the array members argv[0] through
argv[argc-1] inclusive shall contain pointers to strings, which are given
implementation-defined values by the host environment prior to program startup. The
intent is to supply to the program information determined prior to program startup
from elsewhere in the hosted environment. If the host environment is not capable of
supplying strings with letters in both uppercase and lowercase, the implementation
shall ensure that the strings are received in lowercase.
— If the value of argc is greater than zero, the string pointed to by argv[0]
represents the program name; argv[0][0] shall be the null character if the
program name is not available from the host environment. If the value of argc is
greater than one, the strings pointed to by argv[1] through argv[argc-1]
represent the program parameters.
— The parameters argc and argv and the strings pointed to by the argv array shall
be modifiable by the program, and retain their last-stored values between program
startup and program termination.

Difference between *argv++ and *argv-- when reaching the limit

could someone explain why
int main(int argc, const char * argv[]) {
    while (* argv)
        puts(* argv++);
    return 0 ;
}
is legal, and
int main(int argc, const char * argv[]) {
    argv += argc - 1;
    while (* argv)
        puts(* argv--);
    return 0 ;
}
isn't?
In both cases the 'crement inside the while loop will point outside of the bounds of argv. Why is it legal to point to an imaginary higher index, and not to an imaginary lower index?
Best regards.
Because the C standard says you can form a pointer to one past the end of an array, and it will still compare properly to pointers into the array (though you can't dereference it).
The standard does not say anything of the sort for a pointer to an address before the beginning of an array -- even forming such a pointer gives undefined behavior.
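The asymmetry can be sketched with two loops: one that walks forward to the null pointer, and one that walks backward without ever forming a pointer before the first element. Both function names are hypothetical, and the backward loop is just one possible way to avoid the problem (decrement first, then use).

```c
#include <stddef.h>

/* Hypothetical helper: count the strings in a null-terminated pointer
   array by walking forward. The loop stops with p pointing at the null
   pointer element, which is still inside the array. */
size_t count_forward(char **argv)
{
    size_t n = 0;
    for (char **p = argv; *p != NULL; p++)
        n++;
    return n;
}

/* Hypothetical helper: store the strings from last to first in out,
   which must have room for argc pointers. p starts at argv + argc and
   is decremented before each use, so it never goes below argv. */
void reverse_into(int argc, char **argv, char **out)
{
    size_t k = 0;
    for (char **p = argv + argc; p != argv; )
        out[k++] = *--p;
}
```

The backward loop compares p against argv instead of testing *p, because there is no null sentinel before argv[0] and no valid pointer value before the start of the array.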
Loop semantics and half-open intervals. The idiomatic way for iterating through an array or list of objects pointed to by a pointer is:
for (T *p = array; p < array + count; p++)
Here, p ends up being out-of-bounds (off by one, pointing one past the end of the array), so it's (not only conceptually) useful to require this not to invoke undefined behavior (the Standard actually imposes this requirement).
The standard forces argv[argc] to be equal to NULL, so dereferencing argv when it's been incremented argc times is legal.
On the other hand nothing is defined about the address preceding argv, so argv - 1 could be anything.
Note that argv is the only array of strings guaranteed to behave this way, as far as I know.
From the standard:
5.1.2.2.1 Program Startup
If they are declared, the parameters to the main function shall obey the following constraints:
argv[argc] shall be a null pointer
argv++ or ++argv is allowed because argv is a pointer parameter, not an array, and it is not a const pointer.
If you take a real array such as char *arr[10] and try arr++, the compiler will report an error.

Resources