Why is main method not mangled similar to other methods


In the following program I want to know why main is not mangled similar to other methods:
int main()
{
}
int main1()
{
}
If I check the output of nm I see the main method is not mangled while main1 is. I tried to change the program entry from main to main1 using #pragma entry but it had no effect.
Appreciate your help on this.

If I check the output of nm I see the main method is not mangled while main1 is.
The main symbol is special in several respects:
you can't take its address
you can't call it yourself
it must not be mangled because the standard C runtime library will call it by unmangled main name.
So it's not mangled because the C++ standard requires that. See also this answer.

Related

Why should we include header file of a function prototype in the same file that the function is declared?

Might be a stupid (and really simple) question, but I wanted to ask since I don't know where to find an answer. I'm reading a book, and I started googling something - I was actually kinda curious why, if we have files like these:
file1.c
#include <stdio.h>
#include "file2.h"
int main(void){
printf("%s:%s:%d \n", __FILE__, __FUNCTION__, __LINE__);
foo();
return 0;
}
file2.h
void foo(void);
and
file2.c
#include <stdio.h>
#include "file2.h"
void foo(void) {
printf("%s:%s:%d \n", __FILE__, __func__, __LINE__);
return;
}
compiling it with:
gcc file1.c file2.c -o file -Wall
Why is it good practice to include file2.h, which contains the prototype of foo, in the same file where foo is defined? I totally understand including it in file1.c, since we should use the header file to define the interface of each module rather than writing it "raw", but why include the header with the prototype in the file where the function is defined (file2.c)? The -Wall option also does not say anything if I don't include it, so why do people say it is "the correct way"? Does it help avoid errors, or is it just for clearer code?
Those code samples are taken from this discussion:
Compiling multiple C files in a program
Where some user said it is 'the correct way'.

To answer this question, you should have a basic understanding of the difference between the compiler and the linker. In a nutshell, the compiler compiles each translation unit (C file) on its own, and then it's the linker's job to link all the compiled files together.
For instance, In the above code the linker is the one who is searching where the function foo() called from main() exists and links to it.
The compiler step comes first then the linker.
Let's demonstrate an example where including file2.h in file2.c comes handy:
file2.h
void foo(void);
file2.c
#include <stdio.h>
#include "file2.h"
void foo(int i) {
printf("%s:%s:%d \n", __FILE__, __func__, __LINE__);
return;
}
Here the prototype of foo() is different from its definition.
By including file2.h in file2.c, the compiler can check whether the prototype of the function matches its definition; if not, you will get a compiler error.
What will happen if file2.h is not included in file2.c?
Then the compiler won't find any issue. In C the linker usually won't catch it either, because C symbol names carry no type information: the call to foo() from main() links against the mismatched definition, and the problem only shows up at run time, if at all. (In C++, where names are mangled, the linker would report an undefined reference instead.)
Why bother, if a link-time or run-time failure would reveal the error anyway?
Because in big solutions there might be hundreds of source files that take a long time to compile, so finding the error only at the very end wastes a great amount of time, and in C it may never be reported at all.
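A minimal single-file sketch of the check described above. The first declaration stands in for the contents of a header such as file2.h, and the function name add and its parameters are hypothetical, chosen only for illustration:

```c
/* Stands in for the contents of a header such as "file2.h":
   the prototype that callers in other files rely on. */
int add(int a, int b);

/* Definition, as it would appear in the matching .c file.
   Because the prototype above is visible here, the compiler
   compares the two. Changing this line to `int add(int a)`
   would be rejected at compile time as a conflicting
   declaration, instead of slipping through to the link step. */
int add(int a, int b) {
    return a + b;
}
```

If the prototype were left out of this file, the same mismatch would compile without any diagnostic.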

This is The Only True Reason™:
If the compiler encounters a call of a function without a prototype, it derives one from the call, see standard chapter 6.5.2.2 paragraph 6. If that does not match the real function's interface, it's undefined behavior in most cases. At best it does no harm, but anything can happen.
Compilers emit diagnostics such as warnings or errors for this only at a high enough warning level. That's why you should always use the highest warning level possible, and include the header file in the implementation file. You will not want to miss this chance to have your code checked automatically.

C doesn’t mangle symbols usually (there are some exceptions eg. on Windows). Mangled symbols would carry type information. Without it, the linker trusts that you didn’t make mistakes.
If you don’t include the header, you can declare the symbol to be one thing, but then define it to be whatever else. Eg. in the header you might declare foo to be a function, and then in the source file you can define it to be a totally incompatible function (different calling convention and signature), or even not a function at all – say a global variable. Such a project may link but won’t be functional. The error may be in fact hidden, so if you don’t have solid tests in place, you won’t catch it until a customer lets you know. Or worse, there’s a news article about it.
In C++ the symbol carries information about its type, so if you declare one thing and then define something with same base name but an incompatible type, the linker will refuse to link the project, since a particular symbol is referenced but never defined.
So, in C you include the header to prevent mistakes that the tools can’t catch, that will result in a broken binary. In C++, you do it so that you’ll get perhaps an error during compilation instead of later in the link phase.

Function pointer usage with hierarchical control: extern/namespace C++

Below is a sample usage from an older and newer version of a software stack. How would the function usage and access differ with the hierarchical structuring of the two pieces of code below:
namespace std
{
typedef void (*function)();
extern "C" function fn_ptr(function) throw();
}
And
extern "C++"
{
namespace std
{
typedef void (*function)();
function fn_ptr(function) throw();
}
}
The first one is easy but I wish to access fn_ptr from both C and C++ based files in the 2nd example. Note that it is extern "C++" and there isn't much to find about extern "C++" usage on Stackoverflow or Google.

The second version does not allow direct access from a program written in C.
Of course, nothing stops the C program from calling some other C++ function declared extern "C", which in turn calls std::fn_ptr.
Although this point has been hammered into the ground in comments, it's worth noting that you are not allowed to define your own names in namespace std. Presumably the code you are quoting comes from a library implementation designed to be used in a stand-alone environment. Using namespace std is not relevant to the issue, and is just a distraction from your question.

Here is one approach to accessing a function defined in C++ from C. extern "C++" is the default linkage in the standard.
Let us assume that you have a .c file (FileC.c) and you wish to call a function defined in .cpp (FileC++.cpp). Let us define the function in C++ file as:
void func_in_cpp(void)
{
// whatever you wanna do here doesn't matter what I am gonna say!
}
Do the following steps now (to be able to call the above function from a .c file):
1) With your regular C++ compiler (or www.cpp.sh), write a very simple program that includes your function name (func_in_cpp). Compile your program. E.g.
$ g++ -c FileC++.cpp -o test.o
2) Find the mangled name of your function.
$ nm test.o | grep -i func_in_cpp
[ The result should be "_Z11func_in_cppv" ]
3) Go to your C program and do two things:
void _Z11func_in_cppv(void); // declare the external function at the top of your program. Functions are extern by default in C.
int main(void)
{
_Z11func_in_cppv(); // call your function to access the function defined in .cpp file
}

How to create modules in C

I have an interface with which I want to be able to statically link modules. For example, I want to be able to call all functions (albeit in separate files) called FOO or that match a certain prototype, and ultimately make a call into a function in a file without a header in the other files. Don't say that it is impossible, since I found a hack that can do it, but I want a non-hacked method. (The hack is to use nm to get functions and their prototypes; then I can dynamically call the function.) Also, I know you can do this with dynamic linking; however, I want to statically link the files. Any ideas?

Put a table of all functions into each translation unit:
struct functions MOD1FUNCS[]={
{"FOO", foo},
{"BAR", bar},
{0, 0}
};
Then put a table into the main program listing all these tables:
struct functions* ALLFUNCS[]={
MOD1FUNCS,
MOD2FUNCS,
0
};
Then, at run time, search through the tables, and lookup the corresponding function pointer.
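A minimal sketch of that lookup, collapsed into one file for brevity. The layout of struct functions, the module_fn signature, the bodies of foo and bar, and the lookup helper are all assumptions made for illustration; in a real project each module table would live in its own translation unit and be declared extern in the main program.

```c
#include <stddef.h>
#include <string.h>

/* Assumed signature shared by all registered functions. */
typedef int (*module_fn)(int);

/* Assumed entry layout: a name paired with a function pointer. */
struct functions {
    const char *name;
    module_fn   fn;
};

/* Two illustrative module functions. */
static int foo(int x) { return x + 1; }
static int bar(int x) { return x * 2; }

/* Per-module table, terminated by a {0, 0} sentinel. */
static struct functions MOD1FUNCS[] = {
    {"FOO", foo},
    {"BAR", bar},
    {0, 0}
};

/* Master list of all module tables, terminated by a null pointer. */
static struct functions *ALLFUNCS[] = {
    MOD1FUNCS,
    0
};

/* Search every table for `name`; return NULL if it is not registered. */
module_fn lookup(const char *name) {
    for (size_t t = 0; ALLFUNCS[t] != NULL; t++) {
        for (struct functions *f = ALLFUNCS[t]; f->name != NULL; f++) {
            if (strcmp(f->name, name) == 0) {
                return f->fn;
            }
        }
    }
    return NULL;
}
```

A caller would check the result before using it, for example `module_fn f = lookup("FOO"); if (f) f(41);`.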

This is somewhat common in writing test code, e.g., you want to call all functions that start with test_. So you have a shell script that greps through all your .c files and pulls out the function names that match test_.*. Then that script generates a test.c file that contains a function that calls all the test functions.
e.g., generated program would look like:
int main() {
initTestCode();
testA();
testB();
testC();
}
Another way to do it would be to use some linker tricks. This is what the Linux kernel does for its initialization. Functions that are init code are marked with the qualifier __init. This is defined in linux/init.h as follows:
#define __init __section(.init.text) __cold notrace
This causes the linker to put that function in the section .init.text. The kernel will reclaim memory from that section after the system boots.
For calling the functions, each module will declare an initcall function with some other macros core_initcall(func), arch_initcall(func), et cetera (also defined in linux/init.h). These macros put a pointer to the function into a linker section called .initcall.
At boot-time, the kernel will "walk" through the .initcall section calling all of the pointers there. The code that walks through looks like this:
extern initcall_t __initcall_start[], __initcall_end[], __early_initcall_end[];
static void __init do_initcalls(void)
{
initcall_t *fn;
for (fn = __early_initcall_end; fn < __initcall_end; fn++)
do_one_initcall(*fn);
/* Make sure there is no pending stuff from the initcall sequence */
flush_scheduled_work();
}
The symbols __initcall_start, __initcall_end, etc. get defined in the linker script.
In general, the Linux kernel does some of the cleverest tricks with the GCC pre-processor, compiler and linker that are possible. It's always been a great reference for C tricks.
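The same idea can be sketched in user space. This is not the kernel's code: the macro MY_INITCALL, the section name my_initcalls, and the functions init_a and init_b are hypothetical. It assumes GCC or Clang targeting ELF with a GNU-compatible linker, which defines __start_<name> and __stop_<name> symbols for any section whose name is a valid C identifier, so no custom linker script is needed.

```c
#include <stddef.h>

typedef int (*initcall_t)(void);

/* Hypothetical registration macro: stores a pointer to fn in a
   section named "my_initcalls". The `used` attribute keeps the
   otherwise unreferenced pointer from being discarded. */
#define MY_INITCALL(fn) \
    static initcall_t initcall_ptr_##fn \
    __attribute__((used, section("my_initcalls"))) = fn

/* Section bounds, provided by the linker (assumption: GNU ld,
   gold, or lld on an ELF target). */
extern initcall_t __start_my_initcalls[];
extern initcall_t __stop_my_initcalls[];

static int calls_made = 0;

static int init_a(void) { calls_made++; return 0; }
static int init_b(void) { calls_made++; return 0; }

MY_INITCALL(init_a);
MY_INITCALL(init_b);

/* Walk the section and call every registered function.
   Returns how many functions were called. */
int run_initcalls(void) {
    int n = 0;
    for (initcall_t *fn = __start_my_initcalls;
         fn < __stop_my_initcalls; fn++) {
        (*fn)();
        n++;
    }
    return n;
}
```

Each source file can register its own functions with MY_INITCALL, and the walker never needs to know their names. The order in which pointers from different files appear in the section depends on link order.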

You really need static linking and, at the same time, to select all matching functions at runtime, right? Because the latter is a typical case for dynamic linking, I'd say.
You obviously need some mechanism to register the available functions. Dynamic linking would provide just this.

I really don't think you can do it. C isn't exactly capable of late-binding or the sort of introspection you seem to be requiring.
Although I don't really understand your question. Do you want the features of dynamically linked libraries while statically linking? Because that doesn't make sense to me... to static link, you need to already have the binary in hand, which would make dynamic loading of functions a waste of time, even if you could easily do it.

In C, main need not be a function?

This code compiles, but no surprises, it fails while linking (no main found):
Listing 1:
void main();
Link error: \mingw\lib\libmingw32.a(main.o):main.c:(.text+0x106) undefined reference to _WinMain#16'
But, the code below compiles and links fine, with a warning:
Listing 2:
void (*main)();
warning: 'main' is usually a function
Questions:
In listing 1, linker should have complained for missing "main". Why is it looking for _WinMain#16?
The executable generated from listing 2 simply crashes. What is the reason?
Thanks for your time.

True, main doesn't need to be a function. This has been exploited in some obfuscated programs that contain binary program code in an array called main.
The return type of main() must be int (not void). If the linker is looking for WinMain, it thinks that you have a GUI application.

In most C compilation systems, there is no type information associated with symbols that are linked. You could declare main as e.g.:
char main[10];
and the linker would be perfectly happy. As you noted, the program would probably crash, unless you cleverly initialized the contents of the array.
Your first example doesn't define main, it just declares it, hence the linker error.
The second example defines main, but incorrectly.

Case 1. is Windows-specific - the compiler probably generates _WinMain symbol when main is properly defined.
Case 2. - you have a pointer, but as static variable it's initialized to zero, thus the crash.
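The zero initialization mentioned in case 2 can be checked directly. A minimal sketch, using a hypothetical pointer named entry instead of main so that it can live in an ordinary program:

```c
#include <stddef.h>

/* File-scope object with no initializer: it has static storage
   duration, so C guarantees it starts out as a null pointer. */
void (*entry)(void);

/* Returns 1 while `entry` has never been assigned. Calling
   through a null function pointer is undefined behavior and
   typically crashes. */
int entry_is_null(void) {
    return entry == NULL;
}
```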

On Windows platforms the program's main unit is WinMain if you don't set the program up as a console app. The "#16" means it is expecting 16 bytes of parameters. So the linker would be quite happy with you as long as you give it a function named WinMain with 16 bytes of parameters.
If you wanted a console app, this is your indication that you messed something up.

You declared a pointer-to-function named main, and the compiler warned you that this wouldn't work.
The _WinMain message has to do with how Windows programs work. Below the level of the C runtime, a Windows executable has a WinMain.

Try redefining it as int main(int argc, char *argv[])
What you have is a linker error. The linker expects to find a function with that "signature" - not void with no parameters
See http://publications.gbdirect.co.uk/c_book/chapter10/arguments_to_main.html etc

In listing 1, you are saying "There's a main() defined elsewhere in my code --- I promise!". Which is why it compiles. But you are lying there, which is why the link fails. The reason you get the missing WinMain16 error, is because the standard libraries (for Microsoft compiler) contain a definition for main(), which calls WinMain(). In a Win32 program, you'd define WinMain() and the linker would use the library version of main() to call WinMain().
In Listing 2, you have a symbol called main defined, so both the compiler & the linker are happy, but the startup code will try to call the function that's at location "main", and discover that there's really not a function there, and crash.

1) A compiler/platform-dependent function is called before the code in main is executed, hence this behavior (_init in the case of Linux/glibc).
2) The crash in the 2nd case is expected: the system cannot execute the contents of the symbol main as a function, because it is actually a function pointer pointing to an arbitrary location.

Can the C main() function be static?

Can the main() function be declared static in a C program? If so then what is the use of it?
Is it possible if I use assembly code and call the static main() function myself (consider embedded programs)?

No. The C spec actually says somewhere in it (I read the spec, believe it or not) that the main function cannot be static.
The reason for this is that static means "don't let anything outside this source file use this object". The benefit is that it protects against name collisions in C when you go to link (it would be bad bad bad if you had two globals both named "is_initialized" in different files... they'd get silently merged, unless you made them static). It also allows the compiler to perform certain optimizations that it wouldn't be able to otherwise. These two reasons are why static is a nice thing to have.
Since you can't access static functions from outside the file, how would the OS be able to access the main function to start your program? That's why main can't be static.
Some compilers treat "main" specially and might silently ignore you when you declare it static.
Edit: Looks like I was wrong that the spec says main can't be static, but it does say it can't be inline in a hosted environment (if you have to ask what "hosted environment" means, then you're in one). But on OS X and Linux, if you declare main static, then you'll get a link error because the linker can't find the definition of "main".

You could have a static function called main() in a source file, and it would probably compile, but it would not be the main() function because it would be invisible to the linker when the start-up code (crt0.o on many (older) Unix systems) calls main().
Given the code:
static int main(int argc, char **argv)
{
return(**argv + argc);
}
extern int x(int argc, char **argv)
{
return(main(argc, argv));
}
GCC with -Wall helpfully says:
warning: 'main' is normally a non-static function
Yes, it can be done. No, it is normally a mistake - and it is not the main() function.

No, you cannot do it. If you do, you will be unable to build your program, because a static function is only visible within the same file, so the linker will not be able to find it and call it.

As others have said, no it can't. And that goes double if you ever intend to port your code to C++, as the C++ Standard specifies that main() need not actually be a function.

C has two meanings for 'static'...
static for a local variable means it keeps its value between calls (it has static storage duration), but it is still only visible inside its function.
static for a global variable means it can only be used in the current file.
static for functions has the exact same impact as denoting a global variable as static ... the static function IS ONLY VISIBLE IN THE CURRENT FILE ...
Thus main can NEVER be static, because it would not be able to serve as the primary entry point for the program.
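A minimal sketch of the different uses of static in C. The names helper, next_id and doubled are hypothetical and only serve the illustration:

```c
/* File-scope `static`: internal linkage, so `helper` is visible
   only in this translation unit. Another file could define its
   own `helper` without a clash at link time. */
static int helper(int x) {
    return x * 2;
}

/* Block-scope `static`: `counter` keeps its value between calls,
   but its name is still only visible inside this function. */
int next_id(void) {
    static int counter = 0;
    return ++counter;
}

/* External linkage (the default for functions): other files can
   call `doubled`, which in turn may use the file-local `helper`. */
int doubled(int x) {
    return helper(x);
}
```

The startup code that calls main lives in a different object file, which is why it needs a main with external linkage, like doubled here, and cannot see one declared like helper.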