Is there a limit on the number of arguments that we pass to main() in C? As you all know, it is defined as int main(int argc, char *argv[]).
When I call the program, I can pass arguments like so:
$ prog.exe arg1 arg2 arg3.....argn
Is there an upper bound on the number of arguments that we may supply to main() in this way?
According to the POSIX spec for exec, there is a macro ARG_MAX defined in <limits.h> which defines the maximum number of bytes for the arguments + environment variables.
But since C doesn't define anything about that, no, there isn't an inherent cross-platform limit. You have to consult your OS manual if it doesn't define that macro.
No, there is no limit imposed by the ISO C99 standard. If you're using the "blessed" main form (of which there are two):
int main (int argc, char *argv[]);
then you will be limited to the maximum size of a signed integer (implementation-dependent but guaranteed to be at least 2^15-1 or 32,767).
Of course, you could even have more than that since the standard specifically allows for non-blessed main forms (for example, one that takes a long as the count).
The standard mandates how the arguments are stored and things like argv[argc] having to be NULL, but it does not directly limit the quantity.
Of course, there will be a limit in practice but this will depend entirely on the implementation and environment. However, if you have to ask, then you're probably doing something wrong.
Most tools would place a truly large number of arguments into a response file (say args.txt) then pass a single argument like:
my_prog @args.txt
which gets around arbitrary limits on argument quantity and size.
I wouldn't think so. While there may not be a theoretical limit, the computer probably can't handle 1.5 million arguments. Is there any particular reason you need to know this? I wouldn't recommend using command-line arguments for things other than options, file parameters, etc.
There is no explicit limit in C itself. This is an example of behavior defined not by the language but by the implementation. Remember that the language itself is distinct from its implementations, libraries, IDEs, etc.
I'm trying to reconcile the rules I find for creating variadic functions in C. On the one hand I see explicitly stated (for example, here) statements like "just before the ellipses is always an int". On the other hand, I see lots of example programs, including on stackoverflow that make no mention of such a rule (or convention), and in fact work without it. And I see many of the other form (the extra int), that also seem to work. (The most common function, in fact seems to be one defined like: int myFunc(char *format, ...) and is used with sprintf or friends).
I'm trying to wrap my head around how it works so that future efforts are based upon understanding, rather than based on the use of copy/paste. At present, for me, it might as well be a magic wand. So in order to understand how to get the most out of the option, I need to understand the rules. Can you help me understand why I find such conflicting requirements and why both conventions seem to work?
Thanks.
The main rule regarding a variadic function is that you need some way of determining how many arguments you have and what the types of those arguments are, though not necessarily in the way the tutorial says.
Generally, there are two ways: either one of the fixed arguments tells you the number and possibly the type of the variadic arguments, or one of the variadic arguments is a sentinel value which specifies the end of the argument list.
Examples from the standard library and POSIX:
printf and family: The first argument is a format string, and the contents of this format string specify the number and type of each variadic argument.
execl: The second of its two fixed arguments is the first argument of an external program to run. If it is not NULL, variadic arguments are read as type const char * until it finds one that is NULL.
A variation of the first option is as you mentioned: one of the fixed arguments is the number of variadic arguments, where each variadic argument has the same predetermined type. This is the simplest to implement, which is probably why the tutorial you linked suggested it.
Which of these you choose depends entirely on your use case.
Another interesting variation is the open function on Linux and similar systems. The man pages show the following signatures:
int open(const char *pathname, int flags);
int open(const char *pathname, int flags, mode_t mode);
The actual declaration looks something like this:
extern int open (const char *__file, int __oflag, ...) __nonnull ((1));
In this case, one variadic argument is read if the flags parameter includes the value O_CREAT.
There is no rule in the C standard that the parameter just before ... in a function declaration must be an int. The article you link to is merely referring to its particular example: When a function is declared with (int foo, ...), then the first argument passed to that specific function (after conversion from whatever the actual argument is; e.g., a char argument will be converted to int) is always an int.
In general, you can have any types for the parameters before .... The only rule is there must be at least one explicit parameter before the ....
Is the environ variable (as defined by POSIX) available (at least for reading) in major Windows C compilers?
I know that execve is available on Windows: https://en.wikipedia.org/wiki/Exec_(system_call)
But I am unsure if environ is available, too.
environ should be available, but is deprecated and you should use the more secure methods.
The execXX() calls are available, but fork() isn't, so effectively the exec functions are rendered useless.
You can use CreateProcessA for similar effect, and have the ability to set up environments and pipes cleanly.
Just to acknowledge @eryksun's concerns: you do need to consider which character set you are using before calling any Microsoft "A" API, for files or for other OS services. It is simplest if you can write all your code using 16-bit Unicode, as that is the underlying type for NT, Windows 7, and Windows 10. On Unix and Mac, you can assume that UTF-8 is the 8-bit character set of choice, but that has yet to happen on Windows, largely because of backward compatibility. If you are using any of the Unix-like Microsoft APIs, you should already be making the same design decisions, and so should already have an answer.
The following program will print the environment variables.
#include <stdio.h>

int main(int argc, char *argv[], char *env[])
{
    int e = 0;

    /* env is terminated by a null pointer */
    while (env[e] != NULL) {
        printf("%s\n", env[e++]);
    }
    return 0;
}
EDIT: I was wrong; looks like the MSVC runtime library does include support for environ (though deprecated) after all. I will leave my previous answer below if anyone is interested in alternative methods.
Not that I'm aware of, but, if you want to access the environment-variables on Windows, you have some options:
Declare main or wmain with the following signature:
int (w)main(int argc, char/wchar_t *argv[], char/wchar_t *envp[])
This third parameter is listed in the C Standard's Annex J (common extensions) as a pointer to the environment block, where supported:
§ J.5.1:
In a hosted environment, the main function receives a third argument, char *envp[],
that points to a null-terminated array of pointers to char, each of which points to a string
that provides information about the environment for this execution of the program
(5.1.2.2.1).
Use the Windows API function GetEnvironmentVariable(A|W) to get an individual environment variable, or GetEnvironmentStrings to get the entire environment array.
The standard C function getenv.
Why is argc given as a parameter in C (i.e. int main(int argc, char **argv)) when we actually do not pass the count of our arguments?
I want to know why the syntax is written in such a way when argc does not take the parameter passed. Why didn't they design it as a keyword or a function like length when it is written only for us to know the count?
You're right that when one of the exec*() family of functions is called, you do not specify the number of arguments explicitly — that is indicated by the presence of a null pointer at the end of a list of arguments.
The count is passed to the int main(int argc, char **argv) function for convenience, so that the code does not have to step through the entire argument list to determine how many arguments are present. It is only convenience — since argv[argc] == 0 is guaranteed, you can determine the end of the arguments unambiguously.
For the rest, the reason is historical — it was done that way from the start, and there has been no reason to change it (and every reason not to change it).
It isn't clear what you mean by 'a keyword' for the argument count. C has very few keywords, and one for this purpose would be peculiar. Similarly, although there could be a function to do the job, that isn't really necessary; the interface chosen obviates the need for such a function. It might have been useful to have functional access to the argument list (and the environment) so that library code could enumerate the arguments and environment. (Using getenv(), you can find out about environment variables you know about; you can't find out about environment variables which you don't know about. On POSIX systems, there is the extern char **environ; variable that can be used to enumerate the content of the environment, but that's not part of Standard C.)
I would like to know whether the following C code adheres to the C99 and/or C11 standard(s):
#include <stdio.h>

void foo(int bar0, int bar1, int bar2) {
    int *bars = &bar0;
    printf("0: %d\n1: %d\n2: %d\n", bars[0], bars[1], bars[2]);
}

int main(int argc, char **argv) {
    foo(8, 32, 4);
    return 0;
}
This code snippet compiles and runs as expected when using visual studio 2013 and prints:
0: 8
1: 32
2: 4
No, not anywhere near.
The C standard does not guarantee that function arguments are stored in consecutive memory locations (or in any specific order, for that matter). It is up to the compiler and/or the platform (architecture) to decide how arguments are passed to a function.
To add some more clarity, there is not even a guarantee that the arguments are stored in memory (e.g., on the stack) at all. Some or all of them may be passed in hardware registers, where available, to make calls faster. For example,
PowerPC
The PowerPC architecture has a large number of registers so most functions can pass all arguments in registers for single level calls. [...]
MIPS
The most commonly used calling convention for 32 bit MIPS is the O32 ABI which passes the first four arguments to a function in the registers $a0-$a3; subsequent arguments are passed on the stack. [...]
X86
The x86 architecture is used with many different calling conventions. Due to the small number of architectural registers, the x86 calling conventions mostly pass arguments on the stack, while the return value (or a pointer to it) is passed in a register.
and so on. Check the full wiki article here.
So, in your case, bars[0] is a valid access, but whether bars[1] and bars[2] are valid, depends on the underlying environment (platform/compiler), entirely. Best not to rely on the behavior you're expecting.
That said, just to nitpick, in case you don't intend to use the arguments (if any) passed to main() , you can simply reduce the signature to int main(void) {.
No it does not adhere to any published standard. How arguments and local variables are stored, and where, is up to the compiler. What might work in one compiler might not work in another, or even on a different version of the same compiler.
The C specification doesn't even mention a stack, all it specifies are the scoping rules.
No standard supports this. It's extremely naughty.
Array indexing and pointer arithmetic are only valid for arrays. (Note a small exception: you can form a pointer one past the end of an array or a scalar, but you can't dereference it.)
Just wondering why this
int main(void){}
compiles and links
and so does this:
int main(int argc, char **argv){}
Why isn't it required to be one or the other?
gcc will even compile and link with one argument:
int main(int argc){}
but issue this warning with -Wall:
smallest_1.5.c:3:1: warning: ‘main’ takes only zero or two arguments [-Wmain]
I am not asking this as in "how come they allow this?" but as in "how does the caller and the linker handle multiple possibilities for main?"
I am taking a Linux point of view below.
The main function is very special in the standard definition (for hosted C11 implementations). It is also explicitly known by recent compilers (both GCC & Clang/LLVM....) which have specific code to handle main (and to give you this warning). BTW, GCC (with help from GNU libc headers thru function attributes) has also special code for printf. And you could add your own customization to GCC using MELT for your own function attributes.
For the linker, main is often a usual symbol, but it is called from crt0 (compile your code using gcc -v to understand what that really means). BTW, the ld(1) linker (and ELF files, e.g. executables or object files) has no notion of types or function signatures and deals only with names (This is why C++ compilers do some name mangling).
And the ABI and the calling conventions are so defined that passing unused arguments to a function (like main or even open(2)...) does not do any harm (several arguments get passed in registers). Read the x86-64 System V ABI for details.
See also the references in this answer.
At last, you really should practically define your main as int main(int argc, char**argv) and nothing else, and you hopefully should handle program arguments thru them (at least --help & --version as mandated by GNU coding standards). On Linux, I hate programs (and I curse their programmers) not doing that (so please handle --help & --version).
Because the calling code can, for example, pass arguments in registers or on the stack. The two argument main uses them, while the zero argument main does nothing with them. It's that simple. Linking does not even enter the picture.
If you are worried about stack adjustments in the called code, the main function just needs to make sure the stack pointer is the same when it returns (and often even this is of no importance, e.g. when the ABI states that the caller is responsible for stack management).
Making it work has to do with the binary format of the executable and the OS's loader. The linker doesn't care (well it cares a little: it needs to mark the entry point) and the only caller routine is the loader.
The loader for any system must know how to bring supported binary format into memory and branch into the entry point. This varies slightly by system and binary format.
If you have a question about a particular OS/binary format, you may want to clarify.
The short answer: if you don't use the parameters, then you can declare main without parameters, in two ways:
int main(void)
or
int main()
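The first defines main with a prototype stating that it takes no parameters. The second (before C23) defines main without a prototype: it still takes no parameters, but the compiler does not check the arguments passed in calls to it.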
Since you don't access the parameters, both will be fine. Any compiler having "special" code to check the parameters of main is wrong. (But: main must return a value.)
The function called at program startup is named main. The implementation declares no prototype for this function. It shall be defined with a return type of int and with no parameters:
int main(void) { /* ... */ }
or with two parameters (referred to here as argc and argv, though any names may be used, as they are local to the function in which they are declared):
int main(int argc, char *argv[]) { /* ... */ }
or equivalent; or in some other implementation-defined manner.
Regarding the parameters:
The first counts the arguments supplied to the program and the second is an array of pointers to the strings which are those arguments. These arguments are passed to the program by the command line interpreter.
So, the two possibilities are handled as:
If no parameters are declared: no parameters are expected as input.
If main() declares parameters, the following must hold:
argc is greater than zero.
argv[argc] is a null pointer.
argv[0] through to argv[argc-1] are pointers to strings whose meaning will be determined by the program.
argv[0] will be a string containing the program's name or a null string if that is not available. Remaining elements of argv represent the arguments supplied to the program. In cases where there is only support for single-case characters, the contents of these strings will be supplied to the program in lower-case.
In memory:
they will be placed on the stack just above the return address and the saved base pointer (just as any other stack frame).
At machine level:
they will be passed in registers, depending on the implementation.