Consider the following empty C program (since C99, the standard guarantees that reaching the closing } of main acts as an implicit return 0):
int main(int argc, char* argv[]) {}
You can add any logic to this function that manipulates argc and argv. Yet, when main finishes, its assembly code just does a simple ret instead of ret 4. I expected to see a ret 4 because, in my mind, main is the callee of some other function and therefore must clean up its two arguments from the stack: an int and a pointer to an array of char pointers. Why does it not do so?
Most compilers have the caller remove the arguments from the stack (the cdecl convention on 32-bit x86, for example), a tradition that dates from early C compilers and their handling of a variable number of arguments. At the call site, the compiler knows how many arguments it pushed, so it is trivial for it to adjust the stack afterwards.
Also, note that historically main can be declared with 0 to 3 arguments (argc, argv and envp). Again, the caller (e.g. _start) can just provide all 3 and let the implementer choose how many to declare.
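The classic case that needs caller cleanup is a variadic function: the callee only learns at run time how many arguments it received, so it cannot end with a fixed ret N, while every call site knows exactly what it pushed. A minimal sketch (sum_ints is a hypothetical name, not a library function):

```c
#include <stdarg.h>

/* Hypothetical helper: sums `count` int arguments.  The callee discovers how
   many arguments there are only at run time (from `count`), so it could not
   pop a fixed number of bytes itself; under 32-bit x86 cdecl the caller,
   which knows what it pushed, removes them after the call returns. */
int sum_ints(int count, ...)
{
    va_list ap;
    int total = 0;

    va_start(ap, count);
    for (int i = 0; i < count; i++)
        total += va_arg(ap, int);
    va_end(ap);
    return total;
}
```

For example, sum_ints(3, 1, 2, 3) returns 6 and sum_ints(0) returns 0; each call site passes a different number of arguments to the same function.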
Related
So I'm learning about functions in a book.
It says we need to prototype or declare functions so the compiler can understand if they are correctly called or not.
But why does the main function work without a prototype?
I used to write main functions in my learning process like this:
int main(void)
So it does not take any arguments, because of the (void).
I tried to run my program with an argument, for example ./a.out 2, using this:
#include <stdio.h>
int main(int y) {
    printf("%s %d\n", "y is", y);
}
When I run it normally, y is 1; when I run it as ./a.out 1, y is 2; and with each additional argument it increases by one. So it's not the right way, but what causes this?
Declaring y as char shows nothing useful, so my guess is that it works like the return value of scanf(), which returns the number of successful inputs.
A function must be either declared (i.e. a prototype) or defined before it is called. The main function is different from other functions in that it's called by the program startup code and not some other function.
There are however restrictions on what the signature of main can be. On a regular hosted implementation, the C standard says it can be either:
int main(void)
Or:
int main(int argc, char **argv)
The latter form is used to read command-line arguments. argc contains the number of arguments passed, including the program name. argv contains the actual arguments as an array of char *.
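As a small sketch of what the two parameters hold (print_user_args is a hypothetical helper you would call from int main(int argc, char **argv) as print_user_args(argc, argv)):

```c
#include <stdio.h>

/* Hypothetical helper: print each user-supplied argument and return how many
   there were.  argv[0] is the program name (when argc > 0), so the user's
   arguments are argv[1] .. argv[argc - 1]. */
int print_user_args(int argc, char *argv[])
{
    for (int i = 1; i < argc; i++)
        printf("argv[%d] = %s\n", i, argv[i]);
    return argc > 0 ? argc - 1 : 0;
}
```

Run as ./a.out 2, argc is 2, so the helper prints argv[1] = 2 and returns 1.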
Many systems also support:
int main(int argc, char **argv, char **envp)
Where envp contains the environment variables known to the program.
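On systems that support the three-argument form, envp is an array of "NAME=value" strings terminated by a null pointer. A minimal lookup over such an array might look like this (envp_lookup is a hypothetical helper; portable code would call getenv instead):

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: find NAME in an envp-style array ("NAME=value"
   strings ending with a NULL pointer).  Returns a pointer to the value,
   or NULL if NAME is not present. */
const char *envp_lookup(char **envp, const char *name)
{
    size_t len = strlen(name);

    for (; *envp != NULL; envp++)
        if (strncmp(*envp, name, len) == 0 && (*envp)[len] == '=')
            return *envp + len + 1;   /* skip "NAME=" */
    return NULL;
}
```

Checking for '=' right after the name keeps a lookup of "PAT" from matching "PATH=/bin".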
The prototype you're using: int main(int y) is not supported in any implementation I'm aware of, so attempting to use such a prototype for main invokes undefined behavior.
When I run it normally, y is 1; when I run it as ./a.out 1, y is 2; and with each additional argument it increases by one. So it's not the right way, but what causes this?
The standard entry for program startup kind of answers both your questions:
N1570 § 5.1.2.2.1 Program startup
1 The function called at program startup is named main. The implementation declares no prototype for this function. It shall be defined with a return type of int and with no parameters:
int main(void) { /* ... */ }
or with two parameters (referred to here as argc and argv, though any names may be used, as they are local to the function in which they are declared):
int main(int argc, char *argv[]) { /* ... */ }
[...]
As dbush already stated in the accepted answer, these are the only two forms of main that the standard requires every hosted implementation to accept.
When a construct is undefined, the standard imposes no requirements on what the behavior should be; a given implementation may deal with the situation in any way it considers appropriate. This is known as undefined behavior.
What seems to be happening is that your compiler treats y as argc, which is possible because the parameter names don't matter (as the second snippet of the citation above shows). argc stores the number of arguments on the command line, which is consistent with the results you're seeing, but again, this behavior may differ between compilers, systems, or even versions of the same compiler.
In C, main() takes either zero or two arguments. If it takes two, the first argument must be of type int.
int main(int argc, char *argv[])
But I saw the following code when browsing the OpenBSD source:
int main(void *framep){}
Is it valid in C?
GCC gives the following warnings:
prog.c:3:5: warning: first argument of 'main' should be 'int' [-Wmain]
int main(void *p) {
^~~~
prog.c:3:5: warning: 'main' takes only zero or two arguments [-Wmain]
What is the purpose of it?
On Linux, the startup routine _start, supplied by the C runtime rather than written by you, is linked with your program and expects a function main() to be present in your code.
Then traditionally your main is called by _start with int argc, char *argv[], the number of arguments (including program name) and the actual arguments (plus a trailing NULL).
However, on some other implementations there might be no need to call main this way, or it might be called, for performance reasons, with fewer arguments or in a different format.
main() is the starting function of our programs and is passed argc and argv, but after all, it's only a C function and may be passed something else, as long as the convention on that implementation is known and accepted.
Oops, this is not a normal program but a kernel, so the normal rules for main do not really apply. When the program starts, no environment exists to pass argument values, and the return value of main will not be used either, because when the kernel exits, nothing else exists. One comment says that the definition was modified only to satisfy gcc's requirements:
return int, so gcc -Werror won't complain
That is explicit in the N1256 draft of C99 (C11 keeps the same wording) at 5.1.2.1 Freestanding environment:
In a freestanding environment (in which C program execution may take place without any benefit of an operating system), the name and type of the function called at program startup are implementation-defined. Any library facilities available to a freestanding program, other than the minimal set required by clause 4, are implementation-defined.
The effect of program termination in a freestanding environment is implementation-defined.
Since no OS exists yet at kernel startup, the kernel actually runs in a freestanding environment. That probably also means that it needs to be compiled with special flags...
In the link you provide, framep is not used inside the main function.
And no, it's not standard.
GCC issues warnings as you saw already, but it's worth noting that clang throws an error:
error: first parameter of 'main' (argument count) must be of type 'int'
int main(void *framep){}
^
1 error generated.
From the Standard:
5.1.2.2.1 Program startup
1 The function called at program startup is named main. The implementation declares no prototype for this function. It shall be defined with a return type of int and with no parameters:
int main(void) { /* ... */ }
or with two parameters (referred to here as argc and argv, though any names may be used, as they are local to the function in which they are declared):
int main(int argc, char *argv[]) { /* ... */ }
or equivalent; or in some other implementation-defined manner.
You compiled it with g++ to get those warnings; if you compile it with gcc (as C, and without -Wall), you don't get them.
$ gcc test.c
$ g++ test.c
test.c:3:5: warning: first argument of 'int main(void*)' should be 'int' [-Wmain]
int main(void *framep)
^~~~
test.c:3:5: warning: 'int main(void*)' takes only zero or two arguments [-Wmain]
This is important because C traditionally doesn't treat the argument types (or number!) as part of what the linker sees, and an old-style declaration such as int f(); says nothing about them, whereas C++ does encode them. There are various reasons, among them being that in C, the caller cleans up the arguments, so if it specifies too many, it also cleans them up. With a callee-cleans convention (such as stdcall on 32-bit Windows), a callee that cleans up the wrong number of arguments leaves you with a corrupted stack.
On to why you might choose to use int main(void *framep): in the classic 32-bit x86 C calling convention (cdecl), the arguments are pushed onto the stack, and then the call is made, which pushes the return address next. The callee will then typically push the old value of EBP, then move the stack pointer into EBP as the "base pointer" for the new stack frame. Then the stack pointer is moved to allocate space for any automatic (local) variables in the callee. That is, the stack looks like this:
Arg n
Arg n-1
...
Arg 1
Return Addr
Old EBP
Callee locals
Now assume that we'd like to inspect the return address for our function, or read the prior frame pointer (Old EBP) that was pushed. If we were writing in assembly, we'd just dereference relative to the current frame pointer (EBP). But we're writing in C. One way to get at them would be to take the address of the first argument, that is, &framep, which is the place where Arg 1 lives on the stack. Thus (&framep)[-1] should hold the return address, and (&framep)[-2] should be a void * holding the stored prior frame pointer (Old EBP).
(Note: I am assuming Intel architecture, where all pushes to the stack are extended to pointer size by the hardware.)
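With GCC or Clang there is a less fragile way to get the same two values than indexing off a parameter's address: the compiler builtins __builtin_frame_address and __builtin_return_address. These are compiler extensions, not standard C, and the wrapper names below are hypothetical:

```c
/* Sketch using GCC/Clang builtins (extensions, not standard C).
   Level 0 means "the current function".  Unlike (&framep)[-2], this does
   not depend on where a parameter happens to be stored. */
void *current_frame(void)
{
    return __builtin_frame_address(0);   /* this function's frame pointer */
}

void *caller_address(void)
{
    return __builtin_return_address(0);  /* where this function will return */
}
```

Only level 0 should be relied on; GCC documents that nonzero levels, which walk further up the stack, can have unpredictable effects.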
I have a program like this.
#include<stdio.h>
#include<stdlib.h>
int main(int i) { /* i will start from value 1 */
    if (i < 10)
        printf("\n%d", main(++i)); /* printing the values until i becomes 9 */
}
Output:
5
2
2
2
Can anyone explain where this output comes from? What is main(++i) returning on each iteration?
Also, it produces the output 5111 if I remove the \n in the printf call.
Thanks in advance.
First of all, the declaration of main() is supposed to be int main(void) or int main(int argc, char **argv). You cannot change that. Even if your code compiles, the system will call main() the way it is supposed to be called, with the first parameter being the number of arguments to your program (1 if no argument is given). There is no guarantee it will always be 1: if you run your program with additional arguments, this number will increase.
Second, your printf() is attempting to print the return value of main(++i); however, your main() simply doesn't return anything at all. You have to give your function a return value if you expect to see any coherence here.
And finally, although C (unlike C++) does allow main to be called recursively, you shouldn't call your own program's entry point, much less recurse through it. Create a separate function for this.
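Following that advice, here is a sketch of the recursion moved into an ordinary function (print_upto is a hypothetical name) that returns a value on every path:

```c
#include <stdio.h>

/* Hypothetical replacement for the recursive main: print i, i+1, ...,
   limit-1, one per line, and return how many numbers were printed.
   Every path returns a value, so using the result is well defined. */
int print_upto(int i, int limit)
{
    if (i >= limit)
        return 0;
    printf("%d\n", i);
    return 1 + print_upto(i + 1, limit);
}
```

Called from int main(void) as print_upto(1, 10), it prints 1 through 9 and returns 9.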
Here's what the C Draft Standard (N1570) says about main:
5.1.2.2.1 Program startup
1 The function called at program startup is named main. The implementation declares no
prototype for this function. It shall be defined with a return type of int and with no
parameters:
int main(void) { /* ... */ }
or with two parameters (referred to here as argc and argv, though any names may be
used, as they are local to the function in which they are declared):
int main(int argc, char *argv[]) { /* ... */ }
or equivalent; or in some other implementation-defined manner.
Clearly, the main function in your program is neither of the above forms. Unless your platform supports the form you are using, your program is exhibiting undefined behavior.
This program has undefined behavior (UB) all over the place, and if your program contains even a single instance of undefined behavior, you can't safely assume any output or behavior of your program: legally, anything can happen (although in the real world the effects are often somewhat localized near the place of the UB in the code).
The old C90 standard listed more than 100 (if I recall correctly) situations of UB, and on top of those there is an unknown number of UBs for situations the standard simply does not describe. Every C and C++ standard has such a set of UB situations.
In your case (without consulting the standard), the instances of UB include at least:
using the return value of a function that is declared to return a value but did not return one (exception: the first call to main by the startup code; thanks, Jim, for the comments)
defining (and calling) main in a form other than those defined by the standard or documented by your compiler as implementation-defined behavior.
Since there is at least one instance of UB in your program, speculation about the results is somewhat... speculative, and must make assumptions about your compiler, operating system, hardware, and even software running in parallel, which are normally not documented or cannot be known.
You are not initializing i, so its value will be whatever happens to be stored at its location in memory.
This code may produce different garbage output if you run it multiple times, for example after restarting your computer.
The output will also depend on compiler.
I'm surprised that even compiles.
When the operating system actually runs the program and main() gets called, two values (typically 32 or 64 bits wide) are passed to it. You can either ignore them by declaring int main(void), or use them by declaring int main(int argc, char **args).
As the above prototype suggests, the first value passed is a count of the number of command-line arguments that are being passed to the process, and the second is a pointer to where a list of these arguments is stored in memory, likely on the program's local stack.
The value of argc is normally at least 1, because the first string in args is conventionally the name of the program itself, supplied by the OS (strictly, the C standard only requires argc to be nonnegative).
Regarding your unexpected output, I'd say something is not getting pulled off or pushed onto the stack, so variables are getting mixed up. This is either due to the incomplete argument list for main() or the fact that you've declared main to return an int, but haven't returned anything.
I think the problem is that main is calling itself inside main.
When incrementing a variable, i++ yields the value of i before the increment, while ++i increments i first and yields the new value.
You could use this instead:
#include <stdio.h>

int main(void)
{
    int x = 0;

    do
    {
        printf("%d\n", x++);  /* print x, then increment it */
    } while (x < 10);
    return 0;
}
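The difference between the two increment forms can be checked directly; post_and_pre is a hypothetical helper that records the value each expression yields:

```c
/* Hypothetical helper: both forms add 1 to i, but the expression i++
   evaluates to the old value and ++i to the new one. */
void post_and_pre(int *post_result, int *pre_result)
{
    int i = 5;

    *post_result = i++;   /* yields 5, then i becomes 6 */
    *pre_result = ++i;    /* i becomes 7 first, then yields 7 */
}
```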
#include <stdio.h>
int main(int argc, char *argv[])
{
    int i;
    for (i = 1; i < argc; i++)
        printf("%s%s", argv[i], (i < argc - 1) ? " " : "");
    printf("\n");
    return 0;
}
Given above is a simple C program that outputs its command-line arguments. Here argc is the argument count, and argv is said to be an array that contains the arguments. My question is: why is it defined as a pointer to character arrays instead of a normal array? Also, why is its zeroth element (argv[0]) defined as the name by which the program was invoked?
I am a beginner, so please explain it from a high-level perspective.
argv is defined as a pointer rather than as an array because there is no such thing as an array parameter in C.
You can define something that looks like an array parameter, but it's "adjusted" to a pointer type at compile time; for example, these declarations are exactly equivalent:
int foo(int param[]);
int foo(int param[42]); /* the 42 is quietly ignored */
int foo(int *param); /* this is what the above two declarations really mean */
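Because the parameter really is a pointer, the function body can do things that are impossible with a true array, such as incrementing it. A minimal sketch (sum_n is a hypothetical name):

```c
/* Hypothetical helper: `param` is written with array syntax, but the
   compiler adjusts it to `int *param`.  That is why the body may advance
   it with ++, which would be a compile error for a real array. */
int sum_n(int param[], int n)
{
    int total = 0;

    while (n-- > 0)
        total += *param++;
    return total;
}
```

For the same reason, sizeof param inside such a function gives sizeof(int *), not the size of the caller's array, which is why functions like this take the length as a separate argument.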
And the definition of main can be written either as:
int main(int argc, char *argv[]) { /* ... */ }
or as
int main(int argc, char **argv) { /* ... */ }
The two are exactly equivalent (and the second one, IMHO, more clearly expresses what's actually going on).
Array types are, in a sense, second-class types in C. Code that manipulates arrays almost always does so via pointers to the elements, performing pointer arithmetic to traverse the elements.
Section 6 of the comp.lang.c FAQ explains the often confusing relationship between arrays and pointers.
(And if you've been told that arrays are "really" pointers, they're not; arrays and pointers are distinct things.)
As for why argv[0] points to the program name, that's just because it's useful. Some programs print their names in error messages; others may change their behavior depending on the name by which they're invoked. Bundling the program name with the command-line arguments was a fairly arbitrary choice, but it's convenient and it works.
The char *argv[] is a pointer that an array of char * has decayed into. For example, invoking a command like this:
$ ./command --option1 -opt2 input_file
could be viewed as:
char *argv[] = {
"./command",
"--option1",
"-opt2",
"input_file",
NULL,
};
main(4, argv);
So basically there is an array of strings outside main, and it is passed to you in main:
char *argv[]
\----/    ^
  V       |
  |       It was an array
  |
of strings
Regarding argv[0] being the invocation command, the reason is largely historical. I don't know what the person who first came up with it had in mind, but I can think of at least one use for it.
Imagine a program, such as vim or gawk. These programs may install symbolic links (such as vi or awk) which point to the same program. So effectively, running vim or vi (or similarly gawk or awk) could execute the exact same program. However, by inspecting argv[0], these programs can tell how they have been called and possibly adjust accordingly.
As far as I know, gawk does not actually do this, but vim does: invoked as view it opens files read-only, and invoked as ex it starts in Ex mode. Along the same lines, vim called through a symbolic link named vi could turn on some compatibility, or gawk called as awk could turn off some GNU extensions. In the modern world, though, programs that want this would often install scripts that pass the correct options instead.
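Dispatching on argv[0] usually starts by stripping the directory part of the path. A sketch of that idea, assuming a '/'-separated path; invoked_name and wants_compat_mode are hypothetical helpers, and the "vi" policy is only illustrative:

```c
#include <string.h>

/* Hypothetical helper: strip any directory part ("/usr/bin/vi" -> "vi"). */
const char *invoked_name(const char *argv0)
{
    const char *slash = strrchr(argv0, '/');

    return slash != NULL ? slash + 1 : argv0;
}

/* Hypothetical policy: enable a compatibility mode when invoked as "vi". */
int wants_compat_mode(const char *argv0)
{
    return strcmp(invoked_name(argv0), "vi") == 0;
}
```

A program would call wants_compat_mode(argv[0]) early in main (after checking argc > 0) and adjust its defaults accordingly.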
The questions you ask are really best answered by simply saying it's all "by definition", i.e. a set of rules designed and agreed upon by a committee.
Here is what C11 says:
5.1.2.2.1 Program startup
1 The function called at program startup is named main. The implementation declares no
prototype for this function. It shall be defined with a return type of int and with no
parameters:
int main(void) { /* ... */ }
or with two parameters (referred to here as argc and argv, though any names may be
used, as they are local to the function in which they are declared):
int main(int argc, char *argv[]) { /* ... */ }
or equivalent; or in some other implementation-defined manner.
2 If they are declared, the parameters to the main function shall obey the following
constraints:
— The value of argc shall be nonnegative.
— argv[argc] shall be a null pointer.
— If the value of argc is greater than zero, the array members argv[0] through
argv[argc-1] inclusive shall contain pointers to strings, which are given
implementation-defined values by the host environment prior to program startup.
The intent is to supply to the program information determined prior to program startup
from elsewhere in the hosted environment. If the host environment is not capable of
supplying strings with letters in both uppercase and lowercase, the implementation
shall ensure that the strings are received in lowercase.
— If the value of argc is greater than zero, the string pointed to by argv[0]
represents the program name; argv[0][0] shall be the null character if the
program name is not available from the host environment. If the value of argc is
greater than one, the strings pointed to by argv[1] through argv[argc-1]
represent the program parameters.
— The parameters argc and argv and the strings pointed to by the argv array shall
be modifiable by the program, and retain their last-stored values between program
startup and program termination.
It is not defined as a normal (two-dimensional) array because in C the size of array elements has to be known at compile time. The size of a char * is known; the sizes (lengths) of your arguments are not.
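The difference is easy to see by comparing a ragged array of pointers, which is the shape argv has, with a true two-dimensional char array. The data and function names below are hypothetical:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sample data.  Ragged: each element is only a pointer, so
   each string can have its own length (this is the shape argv has). */
char *ragged_args[] = { "./command", "--a-rather-long-option", "x", NULL };

/* A true 2-D array: every row reserves the same compile-time width
   (16 chars here, an arbitrary choice), even the row holding "x". */
char fixed_args[3][16] = { "./command", "--short", "x" };

size_t ragged_element_size(void) { return sizeof ragged_args[0]; }
size_t fixed_element_size(void)  { return sizeof fixed_args[0]; }
```

The ragged array stores one pointer per argument regardless of length, whereas every row of the 2-D array has the same fixed width, so the 22-character option in ragged_args would not fit in a 16-char row.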
argv[0] contains the name the process was invoked by, because a program can be invoked under an arbitrary name: for example, the exec family of calls can pass whatever argv[0] the caller wants, and you can invoke a program via a symlink. argv[0] allows the program to offer different functionality depending on the invocation name.
I was browsing the source code for sudo as provided on this site, and came across this super weird type signature (Bonus question: is there a more C-like term for "type signature"?) for main:
int
main(argc, argv, envp)
int argc;
char **argv;
char **envp;
{
I understand that the style itself is the oldskool K&R. What I'm really interested in is the fact that main is taking a bonus argument, char **envp. Why? sudo is a fairly standard command line tool and is invoked as such. How does the operating system know what to do when it comes across a main function that isn't defined with the usual (int argc, char *argv[])?
Many a time I myself have been lazy and just left off the arguments completely and whatever program I was writing would appear to work just fine (most dangerous thing that can happen with C, I know :p)
Another part of my question is, what cool stuff does all this allow you to do? I have a hunch it helps a heap with embedded programming, but I've sadly had minimal exposure to that and can't really say. I would love to see some concrete examples.
It's just a pointer to the environment, identical to
extern char **environ;
Both have been available in unix since Version 7 (in Version 6 there were no environment variables). The extern named environ got standardized; the third argument to main didn't. There's no reason to use 3-arg main except as some kind of fashion statement.
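Scanning environ directly is straightforward, since it has the same NULL-terminated "NAME=value" layout as envp. A sketch, assuming a POSIX system (environ_lookup and agrees_with_getenv are hypothetical helpers); on many systems no header declares environ by default, so it is declared by hand, as was traditional:

```c
#include <stdlib.h>
#include <string.h>

/* Assumption: POSIX system; declared by hand as traditional Unix code does. */
extern char **environ;

/* Hypothetical helper: find NAME in environ and return its value,
   or NULL if it is not set. */
const char *environ_lookup(const char *name)
{
    size_t len = strlen(name);

    for (char **e = environ; e != NULL && *e != NULL; e++)
        if (strncmp(*e, name, len) == 0 && (*e)[len] == '=')
            return *e + len + 1;   /* skip "NAME=" */
    return NULL;
}

/* The standard getenv() reads the same environment, so both should agree. */
int agrees_with_getenv(const char *name)
{
    const char *mine = environ_lookup(name);
    const char *via_getenv = getenv(name);

    if (mine == NULL || via_getenv == NULL)
        return mine == via_getenv;
    return strcmp(mine, via_getenv) == 0;
}
```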
The process setup code that calls main doesn't need to know whether main expects 3 arguments, because there's no difference at the assembly level between a function that takes 2 arguments and a function that takes 3 arguments but doesn't use the third one. Or between a function that takes no arguments and a function that takes 2 arguments and doesn't use them, which is why int main(void) also works.
Systems with a not-unix-like ABI may need to know which kind of main they're calling.
Put this in one file:
#include <stdio.h>
int foo(int argc, char **argv)
{
    int i;
    for (i = 0; i < argc; ++i)
        puts(argv[i]);
    return 0;
}
And this in another:
extern int foo(int argc, char **argv, char **envp);
int main(int argc, char **argv)
{
    char *foo_args[] = { "foo", "arg", "another arg" };
    char *foo_env[] = { "VAR=val", "VAR2=val2" };
    foo(3, foo_args, foo_env);
    return 0;
}
This is completely wrong from a cross-platform language-lawyer standpoint. We've lied to the compiler about the type of foo and passed it more arguments than it wants. But in unix, it works. The extra argument just harmlessly occupies a slot on the stack that is properly accounted for and cleaned up by the caller after the function returns, or temporarily exists in a register where the callee doesn't expect to find anything in particular, and which the caller expects the callee to clobber so it doesn't mind if the register gets reused for another purpose in the callee.
That's exactly what happens to envp in a normal C program with a 2-arg main. And what happens to argc and argv in a program with int main(void).
But just like envp itself, there's no good reason to take advantage of this in serious code. Type checking is good for you, and knowing you can evade it should go along with knowing that you shouldn't.
http://en.wikipedia.org/wiki/Main_function
From the very first paragraph:
Other platform-dependent formats are also allowed by the C and C++ standards, except that in C++ the return type must always be int;[3] for example, Unix (though not POSIX.1) and Microsoft Windows have a third argument giving the program's environment, otherwise accessible through getenv in stdlib.h:
Google is your friend. Also, the operating system doesn't need to know anything about main in this case; it's the compiler that does the work, and as long as the arguments are valid and accepted by the compiler, there is no problem.