Why does declaring main as an array compile? - c

I saw a snippet of code on CodeGolf that's intended as a compiler bomb, where main is declared as a huge array. I tried the following (non-bomb) version:
int main[1] = { 0 };
It seems to compile fine under Clang and with only a warning under GCC:
warning: 'main' is usually a function [-Wmain]
The resulting binary is, of course, garbage.
But why does it compile at all? Is it even allowed by the C specification? The section that I think is relevant says:
5.1.2.2.1 Program startup
The function called at program startup is named main. The implementation declares no prototype for this function. It shall be defined with a return type of int and with no parameters [...] or with two parameters [...] or in some other implementation-defined manner.
Does "some other implementation-defined manner" include a global array? (It seems to me that the spec still refers to a function.)
If not, is it a compiler extension? Or a feature of the toolchains, that serves some other purpose and they decided to make it available through the frontend?

It's because C allows for "non-hosted" or freestanding environment which doesn't require the main function. This means that the name main is freed for other uses. This is why the language as such allows for such declarations. Most compilers are designed to support both (the difference is mostly how linking is done) and therefore they don't disallow constructs that would be illegal in hosted environment.
The section you refers to in the standard refers to hosted environment, the corresponding for freestanding is:
in a freestanding environment (in which C program execution may take place without any
benefit of an operating system), the name and type of the function called at program
startup are implementation-defined. Any library facilities available to a freestanding
program, other than the minimal set required by clause 4, are implementation-defined.
If you then link it as usual it will go bad since the linker normally has little knowledge about the nature of the symbols (what type it has or even if it's a function or variable). In this case the linker will happily resolve calls to main to the variable named main. If the symbol is not found it will result in link error.
If you're linking it as usual you're basically trying to use the compiler in hosted operation and then not defining main as you're supposed to means undefined behavior as per appendix J.2:
the behavior is undefined in the following circumstances:
...
program in a hosted environment does not define a function named
main
using one
of the specified forms (5.1.2.2.1)
The purpose of the freestanding possibility is to be able to use C in environments where (for example) standard libraries or CRT initialization is not given. This means that the code that is run before main is called (that's the CRT initialization that initializes the C runtime) might not provided and you would be expected to provide that yourself (and you may decide to have a main or may decide not to).

If you are interested how to create program in main array: https://jroweboy.github.io/c/asm/2015/01/26/when-is-main-not-a-function.html. The example source there just contains a char (and later int) array called main which is filled with machine instructions.
The main steps and problems were:
Obtain the machine instructions of a main function from a gdb memory dump and copy it into the array
Tag the data in main[] executable by declaring it const (data is apparently either writable or executable)
Last detail: Change an address for actual string data.
The resulting C code is just
const int main[] = {
-443987883, 440, 113408, -1922629632,
4149, 899584, 84869120, 15544,
266023168, 1818576901, 1461743468, 1684828783,
-1017312735
};
but results in an executable program on a 64 bit PC:
$ gcc -Wall final_array.c -o sixth
final_array.c:1:11: warning: ‘main’ is usually a function [-Wmain]
const int main[] = {
^
$ ./sixth
Hello World!

The problem is that main is not a reserved identifier. The C standard only says that in hosted systems there is usually a function called main. But nothing in the standard prevents you from abusing the same identifier for other sinister purposes.
GCC gives you a smug warning "main is usually a function", hinting that the use of the identifier main for other unrelated purposes isn't a brilliant idea.
Silly example:
#include <stdio.h>
int main (void)
{
int main = 5;
main:
printf("%d\n", main);
main--;
if(main)
{
goto main;
}
else
{
int main (void);
main();
}
}
This program will repeatedly print the numbers 5,4,3,2,1 until it gets a stack overflow and crashes (don't try this at home). Unfortunately, the above program is a strictly conforming C program and the compiler can't stop you from writing it.

main is - after compiling - just another symbol in an object file like many others (global functions, global variables, etc).
The linker will link the symbol main regardless of its type. Indeed, the linker cannot see the type of the symbol at all (he can see, that it isn't in the .text-section however, but he doesn't care ;))
Using gcc, the standard entry point is _start, which in turn calls main() after preparing the runtime environment. So it will jump to the address of the integer array, which usually will result in a bad instruction, segfault or some other bad behaviour.
This all of course has nothing to do with the C-standard.

It only compiles because you don't use the proper options (and works because linkers sometimes only care for the names of symbols, not their type).
$ gcc -std=c89 -pedantic -Wall x.c
x.c:1:5: warning: ISO C forbids zero-size array ‘main’ [-Wpedantic]
int main[0];
^
x.c:1:5: warning: ‘main’ is usually a function [-Wmain]

const int main[1] = { 0xc3c3c3c3 };
This compiles and executes on x86_64... does nothing just return :D

Related

How to understand "main function's prototype cannot be supplied by the program"?

I read main function, and came across following words:
The main function has several special properties:
A prototype for this function cannot be supplied by the program.
Then I wrote a simple program:
# cat foo.c
int main(void);
int main(void)
{
return 0;
}
And compiled it:
# gcc -Wall -Wextra -Wpedantic -Werror foo.c
#
All seems OK! So I am little confused about how to understand "A prototype for this function cannot be supplied by the program". Anyone can give some insights?
The C standard (5.1.2.2.1) just says that the compiler (for hosted systems like PC etc) will not provide a prototype for the main function. So cppreference.com isn't really correct, the C standard doesn't prohibit the application programmer from writing a prototype, although doing so is probably meaningless practice in hosted systems.
In freestanding systems (embedded systems etc), it might be meaningful to declare a prototype for main in case it needs to be called from a reset ISR or from the "C runtime" (CRT).
What's important to realize no matter system is that the compiler specifies which forms of main that are valid. Never the programmer.
main function parameters and the return value is defined by the standard. You shall not provide your own one. But the compiler may accept and compile even non standard types of main.
On the other hand, the main function will not be called by your code so your prototype is not needed at all.

C - One-definition rule for functions

I'm new to C and have read that each function may only be defined once, but I can't seem to reconcile this with what I'm seeing in the console. For example, I am able to overwrite the definition of printf without an error or warning:
#include <stdio.h>
extern int printf(const char *__restrict__format, ...) {
putchar('a');
}
int main() {
printf("Hello, world!");
return 0;
}
So, I tried looking up the one-definition rule in the standard and found Section 6.9 (5) on page 155, which says (emphasis added):
An external definition is an external declaration that is also a definition of a function (other than an inline definition) or an object. If an identifier delared with external linkage is used in an expression [...], somewhere in the entire program there shall be exactly one external definition for that identifier; otherwise, there shall be no more than one.
My understanding of linkage is very shaky, so I'm not sure if this is the relevant clause or what exactly is meant by "entire program". But if I take "entire program" to mean all the stuff in <stdio.h> + my source file, then shouldn't I be prohibited from redefining printf in my source file since it has already been defined earlier in the "entire program" (i.e. in the stdio bit of the program)?
My apologies if this question is a dupe, I couldn't find any existing answers.
The C standard does not define what happens if there is more than one definition of a function.
… shouldn't I be prohibited…
The C standard has no jurisdiction over what you do. It specifies how C programs are interpreted, not how humans may behave. Although some of its rules are written using “shall,” this is not a command to the programmer about what they may or may not do. It is a rhetorical device for specifying the semantics of C programs. C 2018 4 2 tells us what it actually means:
If a “shall” or “shall not” requirement that appears outside of a constraint or runtime-constraint is violated, the behavior is undefined…
So, when you provide a definition of printf and the standard C library provides a definition of printf, the C standard does not specify what happens. In common practice, several things may happen:
The linker uses your printf. The printf in the library is not used.
The compiler has built-in knowledge of printf and uses that in spite of your definition of printf.
If your printf is in a separate source module, and that module is compiled and inserted into a library, then which printf the program uses depends on the order the libraries are specified to the linker.
While the C standard does not define what happens if there are multiple definitions of a function (or an external symbol in general), linkers commonly do. Ordinarily, when a linker processes a library file, its behavior is:
Examine each module in the library. If the module defines a symbol that is referenced by a previously incorporated object module but not yet defined, then include that module in the output the linker is building. If the module does not define any such symbol, do not use it.
Thus, for ordinary functions, the behavior of multiple definitions that appear in library files is defined by the linker, even though it is not defined by the C standard. (There can be complications, though. Suppose a program uses cos and sin, and the linker has already included a module that defines cos when it finds a library module that defines both sin and cos. Because the linker has an unresolved reference to sin, it includes this library module, which brings in a second definition of cos, causing a multiple-definition error.)
Although the linker behavior may be well defined, this still leaves the issue that compilers have built-in knowledge about the standard library functions. Consider this example. Here, I added a second printf, so the program has:
printf("Hello, world!");
printf("Hello, world!\n");
The program output is “aHello, world.\n”. This shows the program used your definition for the first printf call but used the standard behavior for the second printf call. The program behaves as if there are two different printf definitions in the same program.
Looking at the assembly language shows what happens. For the second call, the compiler decided that, since printf("Hello, world!\n"); is printing a string with no conversion specifications and ending with a new-line character, it can use the more-efficient puts routine instead. So the assembly language has call puts for the second printf. The compiler cannot do this for the first printf because it does not end with a new-line character, which puts automatically adds.
Please aware of declaration and definition. The term are totally different.
stdio.h only provide the declaration. And therefore, when you declare/define in your file, as long as the prototype is similar, it is fine with this.
You are free to define in your source file. And if it is available, the final program will link to the yours instead of the one in library.

C program without header

I write "hello world" program in C.
void main()
{ printf("Hello World"); }
// note that I haven't included any header file
The program compiles with warning as
vikram#vikram-Studio-XPS-1645:~$ gcc hello.c
hello.c: In function ‘main’:
hello.c:2:2: warning: incompatible implicit declaration of built-in function ‘printf’
vikram#vikram-Studio-XPS-1645:~$ ./a.out
Hello Worldvikram#vikram-Studio-XPS-1645:~$
How is this possible? How does the OS link a library without including any header?
The compiler builds your source file with a reference to a function called printf(), without knowing what arguments it actually takes or what its return type is. The generated assembly contains a push of the address of the string "Hello World" in the static data area of your program, followed by a call to printf.
When linking your object file into an executable, the linker sees a reference to printf and supplies the C standard library function printf(). By coincidence, the argument you have passed (const char*) is compatible with the declaration of the real printf(), so it functions correctly. However, note that the printf() that your program implicitly declares has return type int (I think), which the standard printf() also has; but if they differed, and you were to assign the result of calling printf() to a variable, you would be in the land of undefined behaviour and you would likely get an incorrect value.
Long story short: #include the correct headers to get the correct declarations for functions you use, because this kind of implicit declaration is deprecated, because it is error-prone.
The printf function is in the C library (libc in your case) which is linked implicitly (actually gcc has a printf builtin but it's outside the point).
Including the header doesn't bring in any functions for the linker, it simply informs the compiler about their declarations (i.e. "what they look like").
Obviously you should always include headers otherwise you force the compiler into making assumptions about what the functions look like.
In C, if you use a standard library function, you have to include the standard header where the function is declared. For printf you have to include stdio.h header file.
In C89 (and GNU C89 which is the language by default on gcc), a function declaration can be sometimes omitted because there is a feature called implicit function declaration: when a function identifier foo is used and the function has not been declared, the implementation would use this declaration:
/* foo is a function with an unspecified number of arguments */
extern int foo();
But this declaration is OK only for functions that return int with an unspecified but fixed number of arguments. If the function accepts a variable number of arguments (like printf) such program would invoke an undefined behavior.
Here is what C89/C90 says:
(C90, 6.7.1) "If a function that accepts a variable number of arguments is defined without a parameter type list that ends with the ellipsis notation, the behavior is undefined.
So gcc is kind enough to compile even in C89 and GNU C89: a compiler could refuse to compile.
Also note that
void main() { ... }
is not a valid definition for main (at least on hosted implementations which is probably your case).
If your main function doesn't take any argument use this valid definition:
int main(void) { ... }
The header usually1 contains only function declarations, symbolic constants, and macro definitions; it doesn't usually include function definitions.
All stdio.h gives you is the prototype declaration for printf:
int printf(const char * restrict format, ...); // as of C99
The implementation of printf is in a separate library file that your code links against.
Your code "works" for two reasons:
Under C89 and earlier versions, if the compiler sees a function call
before a declaration or definition of that function, it will assume
that the function returns int and takes an unspecified number of
parameters;
The implementation of printf returns an int, and you passed in
an argument that just happens to be compatible with what the
implementation of printf expects for the first argument.
And to echo what everyone else says, use int main(void) or int main(int argc, char **argv); unless your compiler documentation explicitly lists void main() as a legal signature, using it will invoke undefined behavior (which means everything from your code running with no apparent issues to crashing on exit to failing to load completely).
I say "usually"; I've run across some headers that contained code, but those were usually written by people who didn't know what they were doing. There may be very rare occasions where putting code in a header is justified, but as a rule it's bad practice.
hello.c:2:2: warning: incompatible implicit declaration of built-in function ‘printf’
To deal with this warning, you should include the header file (stdio.h). You're accidently using an old feature of C that has been deprecated since 1999.
Also, the fact that the link doesn't fail simply means that the standard C library is linked in by default. Whether or not you have included the relevant header is immaterial.

In C, main need not be a function?

This code compiles, but no surprises, it fails while linking (no main found):
Listing 1:
void main();
Link error: \mingw\lib\libmingw32.a(main.o):main.c:(.text+0x106) undefined reference to _WinMain#16'
But, the code below compiles and links fine, with a warning:
Listing 2:
void (*main)();
warning: 'main' is usually a function
Questions:
In listing 1, linker should have
complained for missing "main". Why
is it looking for _WinMain#16?
The executable generated from
listing 2 simply crashes. What is
the reason?
Thanks for your time.
True, main doesn't need to be a function. This has been exploited in some obfuscated programs that contain binary program code in an array called main.
The return type of main() must be int (not void). If the linker is looking for WinMain, it thinks that you have a GUI application.
In most C compilation systems, there is no type information associated with symbols that are linked. You could declare main as e.g.:
char main[10];
and the linker would be perfectly happy. As you noted, the program would probably crash, uless you cleverly initialized the contents of the array.
Your first example doesn't define main, it just declares it, hence the linker error.
The second example defines main, but incorrectly.
Case 1. is Windows-specific - the compiler probably generates _WinMain symbol when main is properly defined.
Case 2. - you have a pointer, but as static variable it's initialized to zero, thus the crash.
On Windows platforms the program's main unit is WinMain if you don't set the program up as a console app. The "#16" means it is expecting 16 bytes of parameters. So the linker would be quite happy with you as long as you give it a function named WinMain with 16 bytes of parameters.
If you wanted a console app, this is your indication that you messed something up.
You declared a pointer-to-function named main, and the linker warned you that this wouldn't work.
The _WinMain message has to do with how Windows programs work. Below the level of the C runtime, a Windows executable has a WinMain.
Try redefining it as int main(int argc, char *argv[])
What you have is a linker error. The linker expects to find a function with that "signature" - not void with no parameters
See http://publications.gbdirect.co.uk/c_book/chapter10/arguments_to_main.html etc
In listing 1, you are saying "There's a main() defined elsewhere in my code --- I promise!". Which is why it compiles. But you are lying there, which is why the link fails. The reason you get the missing WinMain16 error, is because the standard libraries (for Microsoft compiler) contain a definition for main(), which calls WinMain(). In a Win32 program, you'd define WinMain() and the linker would use the library version of main() to call WinMain().
In Listing 2, you have a symbol called main defined, so both the compiler & the linker are happy, but the startup code will try to call the function that's at location "main", and discover that there's really not a function there, and crash.
1.) An (compiler/platform) dependent function is called before code in main is executed and hence your behavior(_init in case of linux/glibc).
2) The code crash in 2nd case is justified as the system is unable to access the contents of the symbol main as a function which actually is a function pointer pointing to arbitrary location.

Why don't we get a compile time error even if we don't include stdio.h in a C program?

How does the compiler know the prototype of sleep function or even printf function, when I did not include any header file in the first place?
Moreover, if I specify sleep(1,1,"xyz") or any arbitrary number of arguments, the compiler still compiles it.
But the strange thing is that gcc is able to find the definition of this function at link time, I don't understand how is this possible, because actual sleep() function takes a single argument only, but our program mentioned three arguments.
/********************************/
int main()
{
short int i;
for(i = 0; i<5; i++)
{
printf("%d",i);`print("code sample");`
sleep(1);
}
return 0;
}
Lacking a more specific prototype, the compiler will assume that the function returns int and takes whatever number of arguments you provide.
Depending on the CPU architecture arguments can be passed in registers (for example, a0 through a3 on MIPS) or by pushing them onto the stack as in the original x86 calling convention. In either case, passing extra arguments is harmless. The called function won't use the registers passed in nor reference the extra arguments on the stack, but nothing bad happens.
Passing in fewer arguments is more problematic. The called function will use whatever garbage happened to be in the appropriate register or stack location, and hijinks may ensue.
In classic C, you don't need a prototype to call a function. The compiler will infer that the function returns an int and takes a unknown number of parameters. This may work on some architectures, but it will fail if the function returns something other than int, like a structure, or if there are any parameter conversions.
In your example, sleep is seen and the compiler assumes a prototype like
int sleep();
Note that the argument list is empty. In C, this is NOT the same as void. This actually means "unknown". If you were writing K&R C code, you could have unknown parameters through code like
int sleep(t)
int t;
{
/* do something with t */
}
This is all dangerous, especially on some embedded chips where the way parameters are passed for a unprototyped function differs from one with a prototype.
Note: prototypes aren't needed for linking. Usually, the linker automatically links with a C runtime library like glibc on Linux. The association between your use of sleep and the code that implements it happens at link time long after the source code has been processed.
I'd suggest that you use the feature of your compiler to require prototypes to avoid problems like this. With GCC, it's the -Wstrict-prototypes command line argument. In the CodeWarrior tools, it was the "Require Prototypes" flag in the C/C++ Compiler panel.
C will guess int for unknown types. So, it probably thinks sleep has this prototype:
int sleep(int);
As for giving multiple parameters and linking...I'm not sure. That does surprise me. If that really worked, then what happened at run-time?
This is to do with something called 'K & R C' and 'ANSI C'.
In good old K & R C, if something is not declared, it is assumed to be int.
So any thing that looks like a function call, but not declared as function
will automatically take return value of 'int' and argument types depending
on the actuall call.
However people later figured out that this can be very bad sometimes. So
several compilers added warning. C++ made this error. I think gcc has some
flag ( -ansic or -pedantic? ) , which make this condition an error.
So, In a nutshell, this is historical baggage.
Other answers cover the probable mechanics (all guesses as compiler not specified).
The issue that you have is that your compiler and linker have not been set to enable every possible error and warning. For any new project there is (virtually) no excuse for not doing so. for legacy projects more excuse - but should strive to enable as many as possible
Depends on the compiler, but with gcc (for example, since that's the one you referred to), some of the standard (both C and POSIX) functions have builtin "compiler intrinsics". This means that the compiler library shipped with your compiler (libgcc in this case) contains an implementation of the function. The compiler will allow an implicit declaration (i.e., using the function without a header), and the linker will find the implementation in the compiler library because you're probably using the compiler as a linker front-end.
Try compiling your objects with the '-c' flag (compile only, no link), and then link them directly using the linker. You will find that you get the linker errors you expect.
Alternatively, gcc supports options to disable the use of intrinsics: -fno-builtin or for granular control, -fno-builtin-function. There are further options that may be useful if you're doing something like building a homebrew kernel or some other kind of on-the-metal app.
In a non-toy example another file may include the one you missed. Reviewing the output from the pre-processor is a nice way to see what you end up with compiling.

Resources