Just wondering why this
int main(void){}
compiles and links
and so does this:
int main(int argc, char **argv){}
Why isn't it required to be one or the other?
gcc will even compile and link with one argument:
int main(int argc){}
but issue this warning with -Wall:
smallest_1.5.c:3:1: warning: ‘main’ takes only zero or two arguments [-Wmain]
I am not asking this as in "how come they allow this?" but as in "how does the caller and the linker handle multiple possibilities for main?"
I am taking a Linux point of view below.
The main function is very special in the standard (for hosted C11 implementations). Recent compilers (both GCC and Clang/LLVM) also know about it explicitly and have specific code to handle main (and to give you this warning). BTW, GCC (with help from function attributes in the GNU libc headers) also has special code for printf. And you could add your own customization to GCC using MELT for your own function attributes.
For the linker, main is often a usual symbol, but it is called from crt0 (compile your code using gcc -v to understand what that really means). BTW, the ld(1) linker (and ELF files, e.g. executables or object files) has no notion of types or function signatures and deals only with names (This is why C++ compilers do some name mangling).
And the ABI and the calling conventions are so defined that passing unused arguments to a function (like main or even open(2)...) does not do any harm (several arguments get passed in registers). Read the x86-64 System V ABI for details.
See also the references in this answer.
Finally, in practice you really should define your main as int main(int argc, char **argv) and nothing else, and you should handle program arguments through them (at least --help and --version, as mandated by the GNU coding standards). On Linux, I hate programs (and I curse their programmers) that do not do that (so please handle --help and --version).
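As a minimal sketch of that advice, the helper below scans the arguments for --help and --version. The function name handle_standard_options and the PROG_NAME / PROG_VERSION values are made up for illustration; a real program would more likely use getopt_long or similar.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical program name and version, used only for illustration. */
#define PROG_NAME    "myprog"
#define PROG_VERSION "0.1"

/* Scans argv[1] .. argv[argc-1] for --help or --version.
 * Returns 1 if one of them was found and handled (the caller can then
 * exit successfully), 0 if neither option was present. */
int handle_standard_options(int argc, char **argv)
{
    for (int i = 1; i < argc; i++) {
        if (strcmp(argv[i], "--help") == 0) {
            printf("Usage: %s [--help] [--version] [FILE...]\n", PROG_NAME);
            return 1;
        }
        if (strcmp(argv[i], "--version") == 0) {
            printf("%s %s\n", PROG_NAME, PROG_VERSION);
            return 1;
        }
    }
    return 0;
}
```

From int main(int argc, char **argv) one would call it first thing, e.g. if (handle_standard_options(argc, argv)) return 0; and only then do the real work.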
Because the calling code can, for example, pass arguments in registers or on the stack. The two argument main uses them, while the zero argument main does nothing with them. It's that simple. Linking does not even enter the picture.
If you are worried about stack adjustments in the called code, the main function just needs to make sure the stack pointer is the same when it returns (and often even this is of no importance, e.g. when the ABI states that the caller is responsible for stack management).
Making it work has to do with the binary format of the executable and the OS's loader. The linker doesn't care (well it cares a little: it needs to mark the entry point) and the only caller routine is the loader.
The loader for any system must know how to bring a supported binary format into memory and branch to the entry point. This varies slightly by system and binary format.
If you have a question about a particular OS/binary format, you may want to clarify.
The short answer: if you don't use the parameters, then you can declare main without parameters, in two ways:
int main(void)
or
int main()
The first means main is a function with no parameters. The second means main is a function with any number of parameters.
Since you don't access the parameters, both will be fine. Any compiler having "special" code to check the parameters of main is wrong. (But: main must return a value.)
The function called at program startup is named main. The implementation declares no prototype for this function. It shall be defined with a return type of int and with no parameters:
int main(void) { /* ... */ }
or with two parameters (referred to here as argc and argv, though any names may be used, as they are local to the function in which they are declared):
int main(int argc, char *argv[]) { /* ... */ }
or equivalent; or in some other implementation-defined manner.
Regarding the parameters:
The first counts the arguments supplied to the program and the second is an array of pointers to the strings which are those arguments. These arguments are passed to the program by the command line interpreter.
So, the two possibilities are handled as:
If no parameters are declared: no parameters are expected as input.
If main() declares parameters, the following constraints hold:
argc is non-negative (the C standard allows zero, although on most systems it is at least one).
argv[argc] is a null pointer.
argv[0] through to argv[argc-1] are pointers to strings whose meaning will be determined by the program.
argv[0] will be a string containing the program's name or a null string if that is not available. Remaining elements of argv represent the arguments supplied to the program. In cases where there is only support for single-case characters, the contents of these strings will be supplied to the program in lower-case.
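The properties listed above can be checked mechanically. This is only a sketch: argv_looks_valid is a made-up helper, and it accepts any argc of zero or more, which is what the C standard itself requires.

```c
#include <stddef.h>

/* Returns 1 if (argc, argv) have the shape main is promised:
 * argc is non-negative, argv[0] .. argv[argc-1] are non-null,
 * and argv[argc] is a null pointer. Returns 0 otherwise. */
int argv_looks_valid(int argc, char **argv)
{
    if (argc < 0 || argv == NULL)
        return 0;
    for (int i = 0; i < argc; i++) {
        if (argv[i] == NULL)
            return 0;
    }
    return argv[argc] == NULL;
}
```

Calling argv_looks_valid(argc, argv) at the top of main should return 1 on any conforming hosted implementation.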
In memory:
they will be placed on the stack just above the return address and the saved base pointer (just as any other stack frame).
At machine level:
they will be passed in registers, depending on the implementation.
_Generic became available with C11, and before that in C99, tgmath.h included similar functionality using compiler specific hacks.
but how did main have multiple signatures back in K&R C, or C89/C90?
there's at least 2 function signatures for main() that I'm aware of:
1: int main(int argc, const char *argv[]);
2: int main(void);
but how did main have multiple signatures back in K&R C, or C89/C90?
main did not have multiple signatures per se in K&R C. That version had no sense of "signature" as you mean it. Although functions did have expectations about the number and types of their arguments, and their behavior was defined only if those expectations were satisfied, function arguments did not constitute a part of function declarations.
The following quotation from section 5.11 of the first edition of The C Programming Language (Kernighan & Ritchie, 1978) may be illuminating:
When main is called to begin execution, it is called with two arguments.
The statement is unconditional: main is (always) called with two arguments in C as described by K&R. Compilers could do whatever they wanted or needed to deal with cases where those parameters were not declared.
The case is not really different in C90 or any later version of C (all of which, up to C17, still support K&R-style function definitions). Even when main is declared with a prototype, implementations do whatever they want or need to do. For example, maybe they generate code for a standard signature, and perform any necessary patch-up of recursive calls to main() during linking. Or maybe they generate code for whatever (supported) declaration of main() is provided, and deal with it in some kind of OS-specific wrapper. Maybe nothing special is even needed in some implementations.
The C Standard only requires the implementation to support the two signatures given in the question,
1: int main(int argc, const char *argv[]);
2: int main(void);
For calling conventions where the caller pops the arguments off the calling stack, the calling sequence for (1) works fine for (2) -- the caller pushes the arguments onto the stack, the callee (main) never uses them, and the caller removes them from the stack.
For calling conventions where the callee pops the arguments off the calling stack, main would have to be compiled differently depending on which signature is used. This would be a problem in implementations with a fixed piece of startup code in the C runtime, since it doesn't know how main was declared. The easiest way to deal with that is to always use a "caller pops" calling convention for main, and this is in fact how Microsoft's C compiler works -- see, e.g., https://learn.microsoft.com/en-us/cpp/build/reference/gd-gr-gv-gz-calling-convention, which states that other calling conventions are ignored when applied to main.
P.S.
_Generic and tgmath.h had no effect on any of this.
There were no signatures in K&R C, only the names of the arguments and optional type declarations for them, so there was only one possible calling convention for main.
So, none of these language changes over the decades has had any effect on how main is called.
C had and has no munged function signatures. Certainly nothing parameter-specific. Most compilers prepended (and some appended) an underscore ("_") to create a poor-man's linker namespace which made it easy to prevent symbol name collisions.
So the C runtime startup would always have one unambiguous symbol to startup. Most often _main.
start:
;# set up registers
;# set up runtime environment:
;# set up stack, initialize heap, connect stdin, stdout, stderr, etc.
;# obtain environment and format for use with "envp"
;# obtain command line arguments and set up for access with "argv"
push envp
push argv
push argc ; number of arguments in argv
call _main
push r0
call exit
.end start
Why is argc given as a parameter in C (i.e. int main(int argc, char **argv)) when we actually do not pass the count of our arguments?
I want to know why the syntax is written in such a way when argc does not take the parameter passed. Why didn't they design it as a keyword or a function like length when it is written only for us to know the count?
You're right that when one of the exec*() family of functions is called, you do not specify the number of arguments explicitly — that is indicated by the presence of a null pointer at the end of a list of arguments.
The count is passed to the int main(int argc, char **argv) function for convenience, so that the code does not have to step through the entire argument list to determine how many arguments are present. It is only convenience — since argv[argc] == 0 is guaranteed, you can determine the end of the arguments unambiguously.
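To illustrate that the count is recoverable from the null terminator alone, here is a small sketch (count_args is a name invented for this example):

```c
#include <stddef.h>

/* Counts the entries of a null-terminated argument vector, which is
 * how main's argv is laid out. For main's own argv the result equals argc. */
int count_args(char *const *argv)
{
    int n = 0;
    while (argv[n] != NULL)
        n++;
    return n;
}
```

Inside main, count_args(argv) == argc should always hold, which is exactly why argc is a convenience rather than a necessity.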
For the rest, the reason is historical — it was done that way from the start, and there has been no reason to change it (and every reason not to change it).
It isn't clear what you mean by 'a keyword' for the argument count. C has very few keywords, and one for this purpose would be peculiar. Similarly, although there could be a function to do the job, that isn't really necessary; the interface chosen obviates the need for such a function. It might have been useful to have functional access to the argument list (and the environment) so that library code could enumerate the arguments and environment. (Using getenv(), you can find out about environment variables you know about; you can't find out about environment variables which you don't know about. On POSIX systems, there is the extern char **environ; variable that can be used to enumerate the content of the environment, but that's not part of Standard C.)
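For what the environ route looks like in practice, here is a sketch that assumes a POSIX system. The helper print_environment is invented for the example; it takes the array as a parameter so it works on any null-terminated list of "NAME=value" strings.

```c
#include <stdio.h>

/* POSIX, not Standard C: the process environment as a null-terminated
 * array of "NAME=value" strings. */
extern char **environ;

/* Prints every entry of a null-terminated environment array, one per
 * line, and returns how many entries were printed. */
int print_environment(char **env)
{
    int n = 0;
    for (; env[n] != NULL; n++)
        puts(env[n]);   /* puts appends the newline */
    return n;
}
```

Calling print_environment(environ) lists every variable, including ones whose names the program did not know in advance, which getenv() alone cannot do.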
I would like to know whether the following C code adheres to the C99 and/or C11 standard(s):
#include <stdio.h>

void foo(int bar0, int bar1, int bar2) {
int *bars = &bar0;
printf("0: %d\n1: %d\n2: %d\n", bars[0], bars[1], bars[2]);
}
int main(int argc, char **argv) {
foo(8, 32, 4);
return 0;
}
This code snippet compiles and runs as expected when using visual studio 2013 and prints:
0: 8
1: 32
2: 4
No, not anywhere near.
The C standard does not guarantee that function arguments are stored in consecutive memory locations (or in any specific order, for that matter). It is up to the compiler and/or the platform (architecture) to decide how arguments are passed to a function.
To add some more clarity, there is not even a guarantee that the arguments to be passed are stored in memory (e.g., on the stack) at all. The implementation can use hardware registers (whenever applicable) for some or all of the parameters, to make the call faster. For example,
PowerPC
The PowerPC architecture has a large number of registers so most functions can pass all arguments in registers for single level calls. [...]
MIPS
The most commonly used calling convention for 32 bit MIPS is the O32 ABI which passes the first four arguments to a function in the registers $a0-$a3; subsequent arguments are passed on the stack. [...]
X86
The x86 architecture is used with many different calling conventions. Due to the small number of architectural registers, the x86 calling conventions mostly pass arguments on the stack, while the return value (or a pointer to it) is passed in a register.
and so on. Check the full wiki article here.
So, in your case, bars[0] is a valid access, but whether bars[1] and bars[2] are valid, depends on the underlying environment (platform/compiler), entirely. Best not to rely on the behavior you're expecting.
That said, just to nitpick: if you don't intend to use the arguments (if any) passed to main(), you can simply reduce the signature to int main(void).
No, it does not adhere to any published standard. How arguments and local variables are stored, and where, is up to the compiler. What might work in one compiler might not work in another, or even on a different version of the same compiler.
The C specification doesn't even mention a stack, all it specifies are the scoping rules.
No standard supports this. It's extremely naughty.
Array indexing and pointer arithmetic are only valid for arrays. (Note a small exception: you can form a pointer one past an array or a scalar, but you can't dereference it.)
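If the goal is to index the values, the well-defined route is to pass a real array. A sketch, with format_bars as a made-up replacement for foo that writes into a caller-supplied buffer so the result can be inspected:

```c
#include <stdio.h>
#include <stddef.h>

/* Formats the three elements of a genuine array into buf.
 * Indexing bars[0..2] is well defined here because the caller passes
 * an array of at least three ints. Returns snprintf's result. */
int format_bars(char *buf, size_t size, const int bars[3])
{
    return snprintf(buf, size, "0: %d\n1: %d\n2: %d\n",
                    bars[0], bars[1], bars[2]);
}
```

With int bars[3] = {8, 32, 4}; and char out[64]; a call to format_bars(out, sizeof out, bars) followed by fputs(out, stdout) prints the same three lines on every conforming compiler, not just the one that happened to lay the parameters out contiguously.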
Is there a limit on the number of arguments that we pass to main() in C? As you all know, it is defined as int main(int argc, char *argv[]).
When I call the program, I can pass arguments like so:
$ prog.exe arg1 arg2 arg3.....argn
Is there an upper bound on the number of arguments that we may supply to main() in this way?
According to the POSIX spec for exec, there is a macro ARG_MAX defined in <limits.h> which defines the maximum number of bytes for the arguments + environment variables.
But since C doesn't define anything about that, no, there isn't an inherent cross-platform limit. You have to consult your OS manual if it doesn't define that macro.
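On POSIX systems the same limit can also be queried at run time with sysconf, which helps when <limits.h> leaves ARG_MAX undefined. A minimal sketch (query_arg_max is just a wrapper name chosen for this example):

```c
#include <unistd.h>

/* Returns the limit, in bytes, on the combined size of the arguments
 * and environment passed to exec, as reported by the system at run
 * time. Returns -1 if the limit is indeterminate or the query fails. */
long query_arg_max(void)
{
    return sysconf(_SC_ARG_MAX);
}
```

Note that this is a byte limit on arguments plus environment, not a limit on the number of arguments, so the usable count depends on how long the individual strings are.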
No, there is no limit imposed by the ISO C99 standard. If you're using the "blessed" main form (of which there are two):
int main (int argc, char *argv[]);
then you will be limited to the maximum size of a signed integer (implementation-dependent but guaranteed to be at least 2^15-1, or 32,767).
Of course, you could even have more than that since the standard specifically allows for non-blessed main forms (for example, one that takes a long as the count).
The standard mandates how the arguments are stored and things like argv[argc] having to be NULL, but it does not directly limit the quantity.
Of course, there will be a limit in practice but this will depend entirely on the implementation and environment. However, if you have to ask, then you're probably doing something wrong.
Most tools would place a truly large number of arguments into a response file (say args.txt) then pass a single argument like:
my_prog #args.txt
which gets around arbitrary limits on argument quantity and size.
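A response file can be as simple as one argument per line. The sketch below assumes exactly that format; the function name read_response_file and the MAX_ARG_LEN limit are invented for the example, and real tools differ in quoting and comment rules.

```c
#include <stdio.h>
#include <string.h>

#define MAX_ARG_LEN 256   /* hypothetical per-line limit for this sketch */

/* Reads up to max_args arguments from fp, one per line, into args.
 * Blank lines are skipped and line endings are stripped. A line longer
 * than MAX_ARG_LEN - 1 characters is split across several entries.
 * Returns the number of arguments stored. */
size_t read_response_file(FILE *fp, char args[][MAX_ARG_LEN], size_t max_args)
{
    size_t count = 0;
    char line[MAX_ARG_LEN];

    while (count < max_args && fgets(line, sizeof line, fp) != NULL) {
        line[strcspn(line, "\r\n")] = '\0';   /* strip the line ending */
        if (line[0] == '\0')
            continue;                          /* skip blank lines */
        strcpy(args[count], line);             /* fits: same buffer size */
        count++;
    }
    return count;
}
```

The program would open the file named by its single command-line argument with fopen and then treat the returned entries as if they had come from argv.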
I wouldn't think so. While there may not be a theoretical limit, the computer probably can't handle 1.5 million arguments. Is there any particular reason you need to know this? I wouldn't recommend using command line arguments for things other than options, file parameters, etc.
There is no explicit limit in C itself. This is an example of a behavior defined not by the language but by the implementation. Remember that the language itself is distinct from its implementations, libraries, IDEs, etc.
I'm new to programming in general, and C in particular. Every example I've looked at has a "main" function - is this pre-defined in some way, such that the name takes on a special meaning to the compiler or runtime... or is it merely a common idiom among C programmers (like using "foo" and "bar" for arbitrary variable names).
No, you need to define main in your program. Since it's called from the run-time, however, the interface your main must provide is pre-defined (must return an int, must take either zero arguments or two, the first an int, and the second a char ** or, equivalently, char *[]). The C and C++ standards do specify that a function with external linkage named main acts as the entry point for a program [1].
At least as the term is normally used, a predefined function would be one such as sin or printf that's in the standard library so you can use it without having to write it yourself.
[1] If you want to get technical, that's only true for a "hosted" implementation -- i.e., the kind most of us use most of the time that produces programs that run on an operating system. A "free-standing" implementation (one that produces programs that run directly on the "bare metal" with no operating system under it) is free to define the entry point(s) as it sees fit. A free-standing implementation can also leave out most of the normal run-time library, providing only a handful of headers (e.g., <stddef.h>) and virtually no standard library functions.
Yes, main is a predefined function in the general sense of the word "defined". In other words, the C language standard specifies that the function called at program startup shall be named main. It is not merely a convention used by programmers as we have with foo or bar.
The fine print: from the perspective of the technical meaning of the word "defined" in the context of C programming, no the main function is not "predefined" -- the compiler or C library do not supply a predefined function named main. You need to define your own implementation of the main function (and, obviously, you should name it main).
There is typically a piece of code that normal C programs are linked to which does:
extern int main(int argc, char * argv[], char * envp[]);
FILE * stdin;
FILE * stdout;
FILE * stderr;
/* ** setup argv ** */
...
/* ** setup envp ** */
...
/* ** setup stdio ** */
stdin = fdopen(0, "r");
stdout = fdopen(1, "w");
stderr = fdopen(2, "w");
int rc;
rc = main(argc, argv, envp); // envp may not be present here with some systems
exit(rc);
Note that this code is C, not C++, so it expects main to be a C function.
Also note that my code does no error checking and leaves out a lot of other system dependent stuff that probably happens. It also ignores some things that happen with C++, objective C, and various other languages that may be linked to it (notably constructor and destructor calling, and possibly having main be within a C++ try/catch block).
Anyway, this code knows that main is a function and takes arguments. If your main looks like:
int main(void) {
Then it still gets passed arguments, but they are ignored.
This code is specially linked so that it is called when the program starts up.
You are completely free to write your own version of this code on many architectures, but it relies on intimate knowledge of how the operating system starts a new program as well as the C (and C++ and possibly Objective C) run time. It is likely to require some assembly programming and/or use of compiler extensions to build properly.
The C compiler driver (the command you usually call when you call the compiler) passes the object file containing all of this (often called crt0.o, for C Run Time ...) along with the rest of your program to the linker, unless told not to.
When building an operating system kernel or an embedded program you often do not want to use the standard crt*.o file. You also may not want to use it if you are building a normal application in another programming language, or have some other non-standard requirements.
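The idea that the startup code always passes the arguments, whether or not the entry function looks at them, can be modelled in ordinary C. This is only a sketch: fake_startup and the two entry functions are made up, and because portable C requires the parameter lists to match the function pointer type, the "ignoring" version still declares its parameters. Real startup code relies on the platform ABI to get the same effect with int main(void).

```c
/* The one fixed signature that this model of the startup code knows. */
typedef int (*entry_fn)(int argc, char **argv);

/* Hypothetical stand-in for the run-time startup routine: it always
 * passes both arguments and hands back the entry function's return
 * value, which real startup code would pass on to exit(). */
int fake_startup(entry_fn entry, int argc, char **argv)
{
    return entry(argc, argv);
}

/* An entry function that receives the arguments but never uses them. */
int entry_ignoring_args(int argc, char **argv)
{
    (void)argc;
    (void)argv;
    return 7;
}

/* An entry function that does use what it was given. */
int entry_using_args(int argc, char **argv)
{
    (void)argv;
    return argc;
}
```

The caller's side is identical in both cases, e.g. fake_startup(entry_ignoring_args, argc, argv) and fake_startup(entry_using_args, argc, argv); only the callee decides whether the values matter.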
No, or you couldn't define one.
It's not predefined, but its meaning as an entry point, if it is present, is defined.