Why does writing main; in C give a segfault?

The following is my demo.c file:
main;
On compiling this, gcc gives a warning:
demo.c:1:1: warning: data definition has no type or storage class [enabled by default]
Running ./a.out gives a Segmentation Fault:
Segmentation fault (core dumped)
Is it because (1) main is not defined anywhere and we are trying to execute it, or (2) we are using an imperative statement outside any function, so it can't execute?
In either case, I still don't understand why it should throw a segfault.
Update: This might look similar to "Is ‘int main;’ a valid C/C++ program?", but it is different: here no type is given at all, and the code still compiles.

Your code is formally illegal in standard C (it is generally "non-compilable"). The diagnostic message you received was intended to tell you exactly that.
However, your compiler apparently accepted it and interpreted it in some implementation-specific way. Apparently it interpreted that main as a definition of an int variable with external linkage (legacy K&R C-specific behavior). It created an object file that exports a single external symbol main (likely mangled in some implementation-specific way). Later, the linker registered that main symbol as your program's main function.
When you run the executable, control eventually passes to the location of that main variable, as if it were the program's main function. The program crashes, since there's no valid executable code at that location. Or, more likely, since the variable lives in a non-executable data section, it is data execution prevention that causes the program to crash.

Related

No symbol for COMDAT section

Been working on my own language for some time now, and I've been implementing implicit and explicit casts. I went back to check what happens when invoking them. Everything went well until I tried to invoke a cast from a double. For some reason, passing doubles to functions results in clang giving me an error when linking the obj file with the cpp file that invokes it, namely:
clang.exe: error: linker command failed with exit code 1143 (use -v to see invocation)
program.obj : fatal error LNK1143: invalid or corrupt file: no symbol for COMDAT section 0x5
I then noticed this also happens with floats, but not with integers and the like. I wrote a short function (it takes a double by value and returns it) to check whether the problem was related to parameters of type float/double. The error only occurs when invoking the function; when the function is defined but not invoked, everything works properly. I also tried simply storing a double literal in memory (alloca), and that worked as well, so the problem has to be with actually passing the argument to the function.
Simple LLVM IR for a function that takes a double by value and returns it, which is then invoked from main (main is named _mainCRTStartup so as not to confuse it with the symbol for the main function in the cpp file that invokes it (extern "C"))
Thanks for any help in advance! :D

Will the address of `foo' here always be non-zero?

I have the following code:
#include <stdio.h>
#include <stdlib.h>
extern void foo(double);
int main(void) {
// printf("address is: %p\n", *foo);
if (foo)
puts("indeed");
else
puts("not");
exit(0);
}
It always compiles and prints "indeed" if line 7 is commented out, but gcc warns me that:
the address of 'foo' will always evaluate as 'true' [-Waddress]
However, it fails to build if line 7 is uncommented:
/tmp/ccWvhcze.o: In function `main':
post.c:(.text.startup+0x5): undefined reference to `foo'
collect2: error: ld returned 1 exit status
Well, if the address is always non-zero according to gcc, why does linking fail? Of course, I would expect it to fail because foo is simply not defined here, but why does gcc claim that the address will always be non-zero? Does the C standard mandate that all identifiers in the translation unit always evaluate to true?
Testing if (pointer_expression) checks whether the pointer is a null pointer or not. The address of every object and function is guaranteed not to be a null pointer. (Note that a null pointer is not necessarily "zero".)
The compiler removes the test in the if () because it's guaranteed to be true. Because of that, the linker has nothing to resolve. Adding a printf of foo reintroduces the need to resolve foo.
You must remember that compilation and linking are two completely separate and independent phases for C programs. The compiler is saying "if (and only if) you link and execute this code, foo will always be non-zero". However, you then go on to attempt to link that object file into an executable without supplying a definition for foo (only a declaration). That's illegal, so linking fails. foo is neither NULL nor non-NULL; it simply doesn't have a value.
Not tested, because I'm writing this from my smartphone, but I guess foo may become zero.
For example, if I'm right, linking with this NASM code will make foo zero:
bits 32
absolute 0
global foo
global _foo
foo:
_foo:
UPDATE:
I tested this code with GCC 4.8.1 and NASM 2.11.08 and got this output:
address is: 00000000
indeed
Also, although it differs from the original, the following code for xv6, built with GCC 4.6.3 on Ubuntu and run in a Vagrant VM,
#include "types.h"
#include "user.h"

void foo(double a) {
    (void)a;
}

int main(void) {
    printf(1, "address is : %p\n", (void*)*foo);
    if (foo)
        printf(1, "indeed\n");
    else
        printf(1, "not\n");
    exit();
}
emitted this output:
address is : 0
indeed
(Remove -Werror from CFLAGS in the xv6 Makefile, or you will get a compile error.)
Also note that *foo evaluates to the address of function foo: foo, as the operand of the unary * operator, is converted to a pointer to function foo; dereferencing that pointer with * yields the function foo again, which is converted to a pointer once more when it is used as a function argument.
As others have pointed out, GCC assumes 'foo' will never be zero. If you have a function 'foo', it will have some address; if function foo is not defined, your program simply will not link and there will be no executable to run. If you link against a 'dll' or 'so' file that had function 'foo' at link time, but the function was removed by the time of execution, your executable will not load. So GCC safely assumes that 'foo' will never be zero.
However, that assumption is correct only in 99.(99)% of cases. As MikeCat pointed out, the function foo could be located at address zero. That will never happen for a user-space executable, and even kernel code does not resort to such tricks. The only practical programs that do play with putting objects at absolute addresses are boot loaders and other things that run directly on the hardware.
They rely on linker scripts to place objects at the correct addresses. So technically yes, it can happen, but it is a very odd thing to do. The only C programs I know of that need to resort to such tricks are boot loaders; here is an example of such a linker script: https://github.com/trini/u-boot/blob/master/arch/arm/mach-at91/armv7/u-boot-spl.lds
But even if boot loaders can do such a thing, GCC still might be correct. Most CPUs or OSes reserve address 0 for special purposes. Often the interrupt vector table is located there; in other cases the OS or the architecture itself reserves address 0 for the null pointer, so any attempt to access it results in an exception. I believe this is the case for user programs on the popular OSes (Linux, Windows) running in x86 protected mode, which leave the page at address 0 unmapped.
So in your case GCC is probably correct to optimize away the if statement, because you are using the x86 version of GCC. If you ever find an architecture where address 0 is not reserved, it will most likely have a version of GCC that does not perform this type of optimization.

Why does declaring main as an array compile?

I saw a snippet of code on CodeGolf that's intended as a compiler bomb, where main is declared as a huge array. I tried the following (non-bomb) version:
int main[1] = { 0 };
It seems to compile fine under Clang and with only a warning under GCC:
warning: 'main' is usually a function [-Wmain]
The resulting binary is, of course, garbage.
But why does it compile at all? Is it even allowed by the C specification? The section that I think is relevant says:
5.1.2.2.1 Program startup
The function called at program startup is named main. The implementation declares no prototype for this function. It shall be defined with a return type of int and with no parameters [...] or with two parameters [...] or in some other implementation-defined manner.
Does "some other implementation-defined manner" include a global array? (It seems to me that the spec still refers to a function.)
If not, is it a compiler extension? Or a feature of the toolchains, that serves some other purpose and they decided to make it available through the frontend?
It's because C allows for a "non-hosted", or freestanding, environment, which doesn't require a main function. This means that the name main is free for other uses, which is why the language as such allows such declarations. Most compilers are designed to support both (the difference is mostly in how linking is done), and therefore they don't disallow constructs that would be illegal in a hosted environment.
The section you refer to in the standard concerns the hosted environment; the corresponding text for a freestanding environment (5.1.2.1) is:
In a freestanding environment (in which C program execution may take place without any benefit of an operating system), the name and type of the function called at program startup are implementation-defined. Any library facilities available to a freestanding program, other than the minimal set required by clause 4, are implementation-defined.
If you then link it as usual, things will go wrong, since the linker normally has little knowledge of the nature of the symbols (what type they have, or even whether they are functions or variables). In this case the linker will happily resolve calls to main to the variable named main. If the symbol is not found, the result is a link error.
If you're linking it as usual, you're basically using the compiler in hosted mode, and not defining main as you're supposed to is undefined behavior per Annex J.2:
The behavior is undefined in the following circumstances:
...
A program in a hosted environment does not define a function named main using one of the specified forms (5.1.2.2.1).
The purpose of the freestanding option is to make it possible to use C in environments where (for example) the standard library or CRT initialization is not available. This means that the code run before main is called (the CRT initialization that sets up the C runtime) might not be provided, and you would be expected to provide it yourself (and you may decide to have a main, or not).
If you are interested in how to create a program in a main array, see https://jroweboy.github.io/c/asm/2015/01/26/when-is-main-not-a-function.html. The example source there just contains a char (and later int) array called main, which is filled with machine instructions.
The main steps and problems were:
Obtain the machine instructions of a main function from a gdb memory dump and copy them into the array
Make the data in main[] executable by declaring it const (data is apparently either writable or executable)
Last detail: change an address so that it points to the actual string data
The resulting C code is just
const int main[] = {
    -443987883, 440, 113408, -1922629632,
    4149, 899584, 84869120, 15544,
    266023168, 1818576901, 1461743468, 1684828783,
    -1017312735
};
but it results in a working executable on a 64-bit PC:
$ gcc -Wall final_array.c -o sixth
final_array.c:1:11: warning: ‘main’ is usually a function [-Wmain]
const int main[] = {
^
$ ./sixth
Hello World!
The problem is that main is not a reserved identifier. The C standard only says that in hosted systems there is usually a function called main. But nothing in the standard prevents you from abusing the same identifier for other sinister purposes.
GCC gives you a smug warning "main is usually a function", hinting that the use of the identifier main for other unrelated purposes isn't a brilliant idea.
Silly example:
#include <stdio.h>

int main (void)
{
  int main = 5;

main:
  printf("%d\n", main);
  main--;

  if(main)
  {
    goto main;
  }
  else
  {
    int main (void);
    main();
  }
}
This program will repeatedly print the numbers 5,4,3,2,1 until it gets a stack overflow and crashes (don't try this at home). Unfortunately, the above program is a strictly conforming C program and the compiler can't stop you from writing it.
After compilation, main is just another symbol in an object file, like many others (global functions, global variables, etc.).
The linker will link the symbol main regardless of its type. Indeed, the linker cannot see the type of the symbol at all (it can see that the symbol isn't in the .text section, but it doesn't care ;))
Using gcc, the standard entry point is _start, which in turn calls main() after preparing the runtime environment. So it will jump to the address of the integer array, which usually results in an illegal instruction, a segfault or some other bad behaviour.
This all of course has nothing to do with the C-standard.
It only compiles because you don't use the proper options (and works because linkers sometimes only care about the names of symbols, not their type).
$ gcc -std=c89 -pedantic -Wall x.c
x.c:1:5: warning: ISO C forbids zero-size array ‘main’ [-Wpedantic]
int main[0];
^
x.c:1:5: warning: ‘main’ is usually a function [-Wmain]
const int main[1] = { 0xc3c3c3c3 };
This compiles and executes on x86_64... it does nothing, just returns (0xc3 is the x86 ret instruction) :D

When is "undefined" fatal or just warning?

test.c(6) : warning C4013: 'add' undefined; assuming extern returning int
Many times I've seen an undefined function cause an error that stops the build process.
Why is it just a warning this time?
Perhaps you normally code in C++, and this is a C program. C++ is stricter than C; it won't let you call undeclared functions.
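In C++, attempting to call a function without a valid declaration in scope is an error, whereas C89 required the compiler to accept it and assume the function returns int. C99 removed implicit function declarations, but many C compilers, including the one that produced your warning, still accept them with only a warning.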
If you have an undefined external at link time (as opposed to compile time), that will also stop the build: if the linker can't find a definition of a function you've called (or tried to call, anyway), it can't produce an executable.
Depending on your compiler, you can instruct it to treat warnings as errors. Although it may be inconvenient, this is often a good thing because the compiler knows more about the details of the code than you do.
In the GNU C suite, -Werror is your friend.

In C, main need not be a function?

This code compiles but, no surprise, fails while linking (no main found):
Listing 1:
void main();
Link error: \mingw\lib\libmingw32.a(main.o):main.c:(.text+0x106) undefined reference to `_WinMain@16'
But, the code below compiles and links fine, with a warning:
Listing 2:
void (*main)();
warning: 'main' is usually a function
Questions:
1. In listing 1, the linker should have complained about a missing "main". Why is it looking for _WinMain@16?
2. The executable generated from listing 2 simply crashes. What is the reason?
Thanks for your time.
True, main doesn't need to be a function. This has been exploited in some obfuscated programs that contain binary program code in an array called main.
The return type of main() must be int (not void). If the linker is looking for WinMain, it thinks that you have a GUI application.
In most C compilation systems, there is no type information associated with symbols that are linked. You could declare main as e.g.:
char main[10];
and the linker would be perfectly happy. As you noted, the program would probably crash, unless you cleverly initialized the contents of the array.
Your first example doesn't define main, it just declares it, hence the linker error.
The second example defines main, but incorrectly.
Case 1 is Windows-specific; the compiler probably generates the _WinMain symbol when main is properly defined.
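Case 2: you have a pointer, and since it is a static variable it is initialized to zero; the startup code jumps to the location of that pointer and tries to execute its zero bytes, hence the crash.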
On Windows platforms, the program's entry function is WinMain if you don't set the program up as a console app. The "@16" means it expects 16 bytes of parameters. So the linker would be quite happy as long as you give it a function named WinMain with 16 bytes of parameters.
If you wanted a console app, this is your indication that you messed something up.
You declared a pointer-to-function named main, and the compiler warned you that this wouldn't work.
The _WinMain message has to do with how Windows programs work. Below the level of the C runtime, a Windows executable has a WinMain.
Try redefining it as int main(int argc, char *argv[])
What you have is a linker error. The linker expects to find a function with that "signature", not void with no parameters.
See http://publications.gbdirect.co.uk/c_book/chapter10/arguments_to_main.html etc
In listing 1, you are saying "There's a main() defined elsewhere in my code, I promise!", which is why it compiles. But you are lying, which is why the link fails. The reason you get the missing WinMain@16 error is that the standard libraries (for the Microsoft compiler) contain a definition of main() that calls WinMain(). In a Win32 program, you'd define WinMain() and the linker would use the library version of main() to call WinMain().
In Listing 2, you have a symbol called main defined, so both the compiler & the linker are happy, but the startup code will try to call the function that's at location "main", and discover that there's really not a function there, and crash.
1) A compiler/platform-dependent function is called before the code in main is executed, hence the behavior you see (_init in the case of Linux/glibc).
2) The crash in the second case happens because the system tries to execute the contents of the symbol main as a function, but main is actually a function pointer (data), not code.
