Is it safe to use the argv pointer globally? Or is there a circumstance where it may become invalid?
I.e., is this code safe?
#include <stdio.h>

char **largs;

void function_1(void)
{
    printf("Argument 1: %s\r\n", largs[1]);
}

int main(int argc, char **argv)
{
    largs = argv;
    function_1();
    return 1;
}
Yes, it is safe to use argv globally; you can use it as you would use any char** in your program. The C99 standard even specifies this:
The parameters argc and argv and the strings pointed to by the argv array shall be modifiable by the program, and retain their last-stored values between program startup and program termination.
The C++ standard does not have a similar paragraph, but the same is implied, since no rule says otherwise.
Note that C++ and C are different languages and you should just choose one to ask your question about.
It should be safe as long as the main() function has not returned. A few examples of code that can run after main() returns are:
Destructors of global and static variables
Threads running longer than main()
A stored argv must not be used in those.
The reference doesn't say anything that would give a reason to assume that the lifetimes of the arguments to the main() function differ from the general rules for the lifetimes of function arguments.
As long as the argv pointer itself is valid, the C/C++ runtime must guarantee that the content it points to is valid (unless, of course, something corrupts memory). So it must be safe to use both the pointer and the content for that long. After main() returns, there is no reason for the C/C++ runtime to keep the content valid either, so the same reasoning applies to both the pointer and the content it points to.
is it safe to use the argv pointer globally
This requires a little more clarification. As the C11 standard says in §5.1.2.2.1, "Program startup":
[...] with two parameters (referred to here as argc and argv, though any names may be used, as they are local to the function in which they are declared)
That means the variables themselves have scope limited to main(); they are not global.
Again, the standard says:
The parameters argc and argv and the strings pointed to by the argv array shall be modifiable by the program, and retain their last-stored values between program startup and program termination.
That means these values, and the strings argv points to, are retained from program startup until program termination, not just while main() is executing.
So, if you use a global variable to hold the value from main(), you can safely use that global to access the same data from any other function.
This thread on the comp.lang.c.moderated newsgroup discusses the issue at length from a C standard point of view. It includes a citation showing that the contents of the argv arrays (as opposed to the argv pointer itself, if, e.g., you took the address &argv and stored it) last until "program termination", and an assertion that it is "obvious" that program termination has not yet occurred, in the sense relevant here, while atexit-registered functions are executing:
The program has not terminated during atexit-registered function processing. We thought that was pretty obvious.
(I'm not sure who Douglas A. Gwyn is, but it sounds like "we" means the C standard committee?)
The context of the discussion was mainly storing a copy of the pointer argv[0] (the program name).
The relevant C standard text is 5.1.2.2.1:
The parameters argc and argv and the strings pointed to by the argv array shall be modifiable by the program, and retain their last-stored values between program startup and program termination.
Of course, C++ is not C, and its standard may subtly differ on this issue or not address it.
You can either pass them as parameters, or store them in global variables. As long as you don't return from main and try to process them in an atexit handler or the destructor of a variable at global scope, they still exist and will be fine to access from any scope.
Yes, it is safe in either C or C++, because no thread runs after main has finished.
Related
I often see programs where people put argc and argv in main, but never make any use of these parameters.
int main(int argc, char *argv[]) {
    // never touches any of the parameters
}
Should I do so too? What is the reason for that?
The arguments to the main function can be omitted if you do not need to use them. You can define main this way:
int main(void) {
    // never touches any of the parameters
}
Or simply:
int main() {
    // never touches any of the parameters
}
Regarding why some programmers do that, it could be to conform to local style guides, because they are used to it, or simply a side effect of their IDE's source template system.
When you have a function, it's obviously important that the arguments passed by the caller always match up properly with the arguments expected by the function.
When you define and call one of your own functions, you can pick whatever arguments make sense to you for the function to accept, and then it's your job to call your function with the arguments you've decided on.
When you call a function that somebody else defined — like a standard library function — somebody else picked the arguments that function would accept, and it's your job to pass them correctly. For example, if you call the standard library function strcpy, you just have to pass it a destination string, and a source string, in that order. If you think it would make sense to pass three arguments, like the destination string, and the size of the destination string, and the source string, it won't work. You don't get to make up the way you'll call the function, because it's already defined.
And then there are a few cases where somebody else is going to call a function that you defined, and the way they're going to call it is fixed, such that you don't have any choice in the way you define it. The best example of this (except it turns out it's not such a good example after all, as we'll see) is main(). It's your job to define this function. It's not a standard library function that somebody else is going to define. But, it is a function that somebody else — namely, the C start-up code — is going to call. That code was written a while ago, by somebody else, and you have no control over it. It's going to call your main function in a certain way. So you're constrained to write your main function in a way that's compatible with the way it's going to be called. You can put whatever you want in the body of your main function, but you don't get to pick your own arguments: there are supposed to be two of those, an int and a char **, in that order.
Now, it also turns out that there's a very special exception for main. Even though the caller is going to be calling it with those two predefined arguments, if you're not interested in them, and if you define main with no arguments, instead, like this:
int main()
{
    /* ... */
}
your C implementation is required to set things up so that nothing goes wrong: the caller passing two arguments that your main function doesn't accept will cause no problems.
So, in answer to your question, many programs are written to accept int argc and char **argv because they're complying with the simple rule: those are the arguments the caller is passing, so those are the arguments they believe their main function should be defined as accepting, even if it doesn't actually use them.
Programmers who define main functions that accept argc and argv without using them either haven't heard of, or choose not to make use of, the special exception that says they don't have to. Personally, I don't blame them: that special exception for main is a strange one, and it didn't always exist. Since it's not wrong to define main as taking the two arguments and simply not use them, doing so could be considered "better style".
(Yes, if you define a function that fails to actually use the arguments it defines, your compiler might warn you about this, but that's a separate question.)
I have a bunch of executables written in C that are statically analyzed with Polyspace Code Prover and Bug Finder. Both tools flag my main() functions for violation of MISRA's Guideline 8.4, with the following message:
"A compatible declaration shall be visible when an object or function with external linkage is defined.
Function 'main' has no visible compatible prototype at definition."
Forward declaring main() seems to solve it, but that is very "weird" for me and it introduces problems when documenting the project with Doxygen.
Here's the function:
int main(int argument_counter, char const *arg_vector[])
Also, as you can see, we couldn't use the traditional argc and argv[] parameter names because the tools considered them too similar to some variables found in the external headers, which is also super weird in my opinion.
Is this a code problem or is there something wrong with the tools configuration?
You often get these kinds of false positives from static analysers regarding main when you use an implementation-defined form. But notably, a strictly conforming hosted program shall use this form:
int main(int argc, char *argv[])
The names of the parameters don't matter, but their types do. char *[] is not the same type as const char *[]. The const in your code applies to the characters the pointers point to (each element has type const char *), not to the array of pointers itself. Even if there is rarely a reason to overwrite those strings, this makes the type differ from the standard form.
Also notable, argc and argv must be writable in a strictly conforming program, C17 5.1.2.2.1 §2:
The parameters argc and argv and the strings pointed to by the argv array shall be modifiable by the program, and retain their last-stored values between program startup and program termination.
So you should ideally just change the types to be the ones required by a strictly conforming program.
However, many C programs are not strictly conforming hosted programs, so the static analyser must be able to swallow implementation-defined forms of main too. There's really no harm in forward declaring main either, and you can safely assume that the compiler does not do so (C17 5.1.2.2.1 §1: "The implementation declares no prototype for this function.").
Suppose you have the implementation-defined form void main (void). To silence the tool you can simply write:
void main (void);

void main (void)
{
    /* ... */
}
I strongly suspect the tool warns because it's too blunt to recognize that main is a special case. Similarly, you can get warnings for using int as the return type of main instead of int32_t, which is a false positive, as MISRA C has an explicit exception for the return type of main.
main() is an exception to many rules, both within MISRA and without...
For the avoidance of doubt, MISRA C:2012 Technical Corrigendum 1 adds an explicit exception to Rule 8.4 for main():
The function main need not have a separate declaration.
If a function pointer scopes out before being used in another thread to run, will the pointer be invalid? Or are function pointers always valid since they point to executable code which doesn't "move around"?
I think my real question is whether what the pointer points to (the function) will ever change, or whether that value is static throughout the lifetime of the program.
Pseudo-code:
static void func(void) { printf("hi\n"); }

int main(void)
{
    start_thread();
    {
        void (*f)(void) = func;
        // edit: void run_on_other_thread(void (*f)(void));
        run_on_other_thread(f); // non-blocking.
    }
    join_thread();
}
In the C base language, the values of function pointers never become invalid. They point to functions, and functions exist for the entire time a program is executing. The value of a pointer is valid for the entire program.
An object that contains a pointer may have a limited lifetime. (Note: The question mentioned scope, but scope is where in the source code an identifier is visible. Lifetime is when during program execution an object exists.) In the question's code, void (*f)(void) = func; defines f as an object with automatic storage duration. Once execution of the block it is defined in ends, f no longer exists, and references to it have undefined behavior. However, the value that was assigned to f is still a valid value. For example, if we define int x = 37;, and the lifetime of x ends, that does not mean you can no longer use the value 37 in a program. In this case, the value that f had, which is the address of func, is still valid. The address of func can continue to be used throughout the program's execution.
The situations discussed in Xypron’s answer regarding dynamically linked functions or dynamically created functions would be extensions to the C language. In these situations, it is not the lifetime of the pointer object that is in question but rather the fact that the function itself is being removed from memory that causes the pointer to be no longer a valid pointer to the original function.
Whether a function pointer remains valid depends on its usage.
If it points to a function in the source code of your process it stays valid during the runtime of the process.
If you use a function pointer to point to a function in a dynamic link library, the pointer becomes invalid when unloading the library.
Code can be written that relocates itself. For example, when the Linux kernel starts, it relocates itself, changing the addresses of its functions.
You could call a runtime compiler that creates functions in memory during program execution, possibly reusing that memory when an object goes out of scope.
As said, it depends.
Why is argc given as a parameter in C (i.e. int main(int argc, char **argv)) when we don't actually pass the count of our arguments?
I want to know why the syntax was designed this way, when argc is not something we pass ourselves. Why didn't they design it as a keyword, or a function like length, given that it exists only to tell us the count?
You're right that when one of the exec*() family of functions is called, you do not specify the number of arguments explicitly — that is indicated by the presence of a null pointer at the end of a list of arguments.
The count is passed to the int main(int argc, char **argv) function for convenience, so that the code does not have to step through the entire argument list to determine how many arguments are present. It is only convenience — since argv[argc] == 0 is guaranteed, you can determine the end of the arguments unambiguously.
For the rest, the reason is historical — it was done that way from the start, and there has been no reason to change it (and every reason not to change it).
It isn't clear what you mean by 'a keyword' for the argument count. C has very few keywords, and one for this purpose would be peculiar. Similarly, although there could be a function to do the job, that isn't really necessary: the interface chosen obviates the need for such a function. It might have been useful to have functional access to the argument list (and the environment) so that library code could enumerate the arguments and environment. (Using getenv(), you can find out about environment variables you know about; you can't find out about environment variables you don't know about. On POSIX systems, there is the extern char **environ; variable that can be used to enumerate the contents of the environment, but that's not part of Standard C.)
Just wondering why this
int main(void){}
compiles and links
and so does this:
int main(int argc, char **argv){}
Why isn't it required to be one or the other?
gcc will even compile and link with one argument:
int main(int argc){}
but issue this warning with -Wall:
smallest_1.5.c:3:1: warning: ‘main’ takes only zero or two arguments [-Wmain]
I am not asking this as in "how come they allow this?" but as in "how does the caller and the linker handle multiple possibilities for main?"
I am taking a Linux point of view below.
The main function is very special in the standard definition (for hosted C11 implementations). It is also explicitly known by recent compilers (both GCC and Clang/LLVM), which have specific code to handle main (and to give you this warning). BTW, GCC (with help from GNU libc headers, through function attributes) also has special code for printf. And you could add your own customization to GCC using MELT for your own function attributes.
For the linker, main is often an ordinary symbol, but it is called from crt0 (compile your code using gcc -v to understand what that really means). BTW, the ld(1) linker (like ELF files, e.g. executables or object files) has no notion of types or function signatures and deals only with names (this is why C++ compilers do name mangling).
And the ABI and the calling conventions are so defined that passing unused arguments to a function (like main or even open(2)...) does not do any harm (several arguments get passed in registers). Read the x86-64 System V ABI for details.
See also the references in this answer.
Finally, in practice you really should define your main as int main(int argc, char **argv) and nothing else, and you should handle program arguments through them (at least --help and --version, as mandated by the GNU coding standards). On Linux, I hate programs (and I curse their programmers) that don't do that, so please handle --help and --version.
Because the calling code can, for example, pass arguments in registers or on the stack. The two argument main uses them, while the zero argument main does nothing with them. It's that simple. Linking does not even enter the picture.
If you are worried about stack adjustments in the called code, the main function just needs to make sure the stack pointer is the same when it returns (and often even this is of no importance, e.g. when the ABI states that the caller is responsible for stack management).
Making it work has to do with the binary format of the executable and the OS's loader. The linker doesn't care (well it cares a little: it needs to mark the entry point) and the only caller routine is the loader.
The loader for any system must know how to bring a supported binary format into memory and branch to the entry point. This varies slightly by system and binary format.
If you have a question about a particular OS/binary format, you may want to clarify.
The short answer: if you don't use the parameters, then you can declare main without parameters, in two ways:
int main(void)
or
int main()
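The first means main is a function with no parameters. The second, in C before C23, defines main without a prototype: callers are not told the parameter list, although in a definition the empty list still means main takes no parameters. (In C23 and in C++, () means the same as (void).)
Since you don't access the parameters, both will be fine. Any compiler having "special" code to check the parameters of main is wrong. (But main must have return type int.)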
The function called at program startup is named main. The implementation declares no prototype for this function. It shall be defined with a return type of int and with no parameters:
int main(void) { /* ... */ }
or with two parameters (referred to here as argc and argv, though any names may be used, as they are local to the function in which they are declared):
int main(int argc, char *argv[]) { /* ... */ }
or equivalent; or in some other implementation-defined manner.
Regarding the parameters:
The first counts the arguments supplied to the program and the second is an array of pointers to the strings which are those arguments. These arguments are passed to the program by the command line interpreter.
So, the two possibilities are handled as:
If no parameters are declared: no parameters are expected as input.
If main() declares parameters, then:
argc is nonnegative.
argv[argc] is a null pointer.
If argc is greater than zero, argv[0] through argv[argc-1] are pointers to strings whose meaning will be determined by the program.
If argc is greater than zero, argv[0] points to a string containing the program's name, or to an empty string if the name is not available. The remaining elements of argv represent the arguments supplied to the program. If the host environment cannot supply both upper- and lower-case letters, the contents of these strings are supplied to the program in lower case.
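In memory:
on ABIs that pass arguments on the stack (such as the traditional 32-bit x86 convention), they are placed just above the return address, like any other function arguments.
At machine level:
on ABIs that pass arguments in registers (such as x86-64 System V), they are passed in registers instead; the details depend on the implementation.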