Win32, WinMain vs custom Entry Point (huge size difference), why?

As the title says.
I noticed that if I use WinMain or any other default entry point, a C application can be around 70 KB.
But if I specify a custom entry point instead, say "RawMain" (int RawMain()),
then the file is only around 6 KB.
So I am wondering why this is. What does the default entry point add to or reference in the file?
I could understand some small difference in size, but the difference is huge for an empty application.
Thanks!

When building for Windows in most environments, the actual program entry point is provided by a function in a small runtime library. That function does some environment preparation and then calls a function you provide, such as main, wmain, WinMain, etc.
The code that runs before your user-provided main function includes running global C++ constructors, enabling TLS variables, initializing global mutexes so that standard-library calls work properly in a multithreaded environment, setting up the standard locale, and so on.
One thing that setting the entry point does is start the linker off with an undefined symbol with the name you give the entry point. For example, if you're using mingw32 with the default settings, the linker starts out assuming that it needs to link libmingw32.a and that the symbol __tmainCRTStartup is undefined.
The linker will find (hopefully) __tmainCRTStartup in libmingw32.a, and include the object file crtexe.o which contains it, along with anything else needed to satisfy undefined symbols emanating from crtexe.o, which is where the extra size comes from.
When you set your own entry point, you override this, and just set the linker to look for whatever function you specify. You get a smaller executable, but you have to be careful that features you're using don't rely on any of the global initialization that would be done by the runtime's startup function.
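To make the difference concrete, here is a toy model in plain C of the two paths. Every name in it (fake_crt_startup, init_stdio, init_locale, raw_entry) is a hypothetical stand-in, not an actual mingw or MSVC runtime function; the sketch only shows that a custom entry point reaches user code without the preparation steps having run.

```c
#include <assert.h>

/* Toy model only: these flags and functions are hypothetical and merely
 * stand in for work that a real C runtime startup routine performs. */
static int stdio_ready;
static int locale_ready;

static void init_stdio(void)  { stdio_ready = 1; }
static void init_locale(void) { locale_ready = 1; }

/* Stands in for the user's main/WinMain: reports whether the runtime
 * facilities it relies on have been prepared (1) or not (0). */
static int user_main(void)
{
    return stdio_ready && locale_ready;
}

/* Default path: the runtime's entry point prepares the environment first,
 * then calls the user's function. */
int fake_crt_startup(void)
{
    init_stdio();
    init_locale();
    assert(stdio_ready && locale_ready);
    return user_main();
}

/* Custom entry point: the loader jumps straight to user code, so none of
 * the preparation above has happened. The flags are cleared here only to
 * model a fresh process. */
int raw_entry(void)
{
    stdio_ready = 0;
    locale_ready = 0;
    return user_main();
}
```

In this model raw_entry() reports that nothing was prepared, while fake_crt_startup() reports a ready environment. The real startup object files and everything they pull in are what account for the extra kilobytes.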

Related

Using pointer functions - 2 separate applications on 1 device

I asked some time ago this question How can I use one function from main application and bootloader? (embedded) and started to implement proposed solution but ran into a few problems.
On my Cortex-M4, I have 2 separate applications: a bootloader and a user application. I had some (many) functions which were the same for both apps, so I compiled them only for the bootloader and then created an array of function pointers at a specified address that is known to the user application. In the application, I didn't compile the files with those functions again; I use those pointers whenever needed.
This is example of code I tried to make common for both applications:
static uint8_t m_var_1;
// Sends events to the application.
static void send_event(fs_op_t const * const p_op, fs_ret_t result)
{
uint8_t var_2;
[...]
}
My application ends in a HardFault, which happens e.g. when dividing by zero or calling through a NULL function pointer. I am not sure why yet, but I started wondering what happens with those variables. var_2 will most surely be located on the stack, so this is no problem. But what about m_var_1? In the map file, it has a specified place in RAM. But I don't have separate RAM sections for app and bootloader. I am not sure, but I have a feeling that this variable may use the same RAM location as when created for the bootloader. Is this possible? Maybe there are some other issues?

Yes, you are right: the code will attempt to access the global variable at the location where it was linked for the loader. This is because linking involves replacing all occurrences of identifiers (including function names and variable names) with the addresses determined at link time.
In your application, the variable, even if it exists there too, is likely to be at a different address.
The calling of the functions happens to work, because they are located in ROM and cannot be different for application and loader. Calling them via const pointers, which are also stored in ROM, bypasses the problem.
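As an illustration of the pointer-table idea, a minimal sketch follows. The struct layout and all names (shared_api_t, loader_api, app_demo) are assumptions made for the example. On real hardware the table would be placed at a fixed ROM address through the linker, which is not shown here. The sketch also passes state in through a parameter instead of a static variable, which sidesteps the RAM-layout problem for that piece of state.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical shared interface: the loader exports its functions through a
 * const table; the application only needs to know where the table lives. */
typedef struct {
    uint8_t (*checksum)(const uint8_t *data, uint32_t len);
    void    (*count_event)(uint8_t *counter);
} shared_api_t;

static uint8_t checksum_impl(const uint8_t *data, uint32_t len)
{
    uint8_t sum = 0;
    for (uint32_t i = 0; i < len; i++) {
        sum = (uint8_t)(sum + data[i]);
    }
    return sum;
}

/* State is passed in by the caller instead of living in a static variable,
 * so the function does not depend on the loader's RAM layout. */
static void count_event_impl(uint8_t *counter)
{
    assert(counter != NULL);
    (*counter)++;
}

/* On a target this table would sit at a fixed address in ROM (assumption:
 * arranged through a linker section); here it is an ordinary const object. */
const shared_api_t loader_api = { checksum_impl, count_event_impl };

/* Application side: calls go through the table, and the counter is the
 * application's own variable. */
uint8_t app_demo(void)
{
    static const uint8_t msg[3] = { 1, 2, 3 };
    uint8_t app_counter = 0;

    loader_api.count_event(&app_counter);
    loader_api.count_event(&app_counter);
    return (uint8_t)(loader_api.checksum(msg, 3) + app_counter);
}
```

Functions that really do need persistent shared state still require the common section described in the steps that follow.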
The solution is using a file system simulator, if you can find one for your hardware.
Otherwise you will hate having to do the following.
Part 1, setup:
introduce a special linker section with all the variables accessed by both system parts (application and loader)
let one linker fill it
set it up for the other linker as don't-touch
be careful with the initialisation
preferably do not assume any initialisation value
if you need initialisation, e.g. "bss" (init to 0) or "data" (init to specified value),
do so explicitly at the start of the system part which is not associated to the linker you let setup the variables
for safety, it is recommended to do the init the same way in both system parts
"data" init uses a special non-volatile linker section with a copy of the to-be-initialised variables, accessing that is possible
Part 2, access:
option 1)
store const pointers to those variables, like you did for the functions
option 2)
get the second linker (the other one, which did not do the actual setup of the common variable section) to create an identically structured and identically located section as the one from first linker; more studying of your linker needed here
Part 3, reusing values stored by the other system part
(e.g. you want to leave some kind of message from the loader, to be read by the application)
design which system part initialises which variable, the other one only reads them
separate the common variables in four sections,
written and read by both system parts, initialised by both
written and read by x, only read by y, initialised by x
written and read by y, only read by x, initialised by y
written by both system parts, not initialised, uses checksums and plausibility checks,
if a variable has not been initialised, init to default
init each section only in the corresponding writer system part
setup as "no init" in the other linker
setup as "no init" in both linkers for the fourth case
use getters and setters with checksum update and plausibility for the fourth case
To do all that, intense study of your linker features and syntax is needed.
So I recommend not to try, if you can get around it. Consider using an existing file system simulator; because that is basically what above means.
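For the fourth case in Part 3 (written by both parts, not initialised, protected by checksum and plausibility checks), the getter/setter pair could look roughly like the sketch below. The record layout, the complement check byte, the default value, and the plausibility limit are all assumptions chosen for illustration. Placing the record in a "no init" section is linker-specific and not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout for one variable in the "written by both, not
 * initialised" section: the value travels with a check byte. */
typedef struct {
    uint8_t value;
    uint8_t check;   /* bitwise complement of value when the record is valid */
} shared_u8_t;

#define SHARED_DEFAULT  0u
#define SHARED_MAX      100u   /* plausibility limit, chosen arbitrarily */

/* Setter: store the value and update the check byte together. */
void shared_set(shared_u8_t *s, uint8_t v)
{
    assert(s != NULL);
    s->value = v;
    s->check = (uint8_t)~v;
}

/* Getter: if the record was never initialised (check mismatch) or holds an
 * implausible value, fall back to the default and repair the record. */
uint8_t shared_get(shared_u8_t *s)
{
    assert(s != NULL);
    if (s->check != (uint8_t)~s->value || s->value > SHARED_MAX) {
        shared_set(s, SHARED_DEFAULT);
    }
    return s->value;
}
```

Both system parts would compile the same getter and setter, so whichever one runs first after power-up repairs a garbage record instead of trusting it.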

How to circumvent dlopen() caching?

According to its man page, dlopen() will not load the same library twice:
If the same shared object is loaded again with dlopen(), the same
object handle is returned. The dynamic linker maintains reference
counts for object handles, so a dynamically loaded shared object is
not deallocated until dlclose() has been called on it as many times
as dlopen() has succeeded on it. Any initialization returns (see
below) are called just once. However, a subsequent dlopen() call
that loads the same shared object with RTLD_NOW may force symbol
resolution for a shared object earlier loaded with RTLD_LAZY.
(emphasis mine).
But what actually determines the identity of shared objects? I tried to look into the code, but did not come very far. Is it:
some form of normalized path name (e.g. realpath?)
the inode ?
the contents of the library?
I am pretty sure that I can rule out this last point, since an actual filesystem copy yields two different handles.
To explain the motivation behind this question: I am working with some code that has static global variables. I need multiple instances of that code to run in a thread-safe manner. My current approach is to compile and link said code into a dynamic library and load that library multiple times. With some linker magic, it appears to create several copies of the globals and resolve access in each library to its own copies. The only problem is that my prototype copies the generated library n times for n concurrent uses. This is not only somewhat ugly but I also suspect that it might break on a different platform.
So what is the exact behaviour of dlopen() according to the POSIX standard?
edit: Because it came up in a comment and an answer: no, refactoring the code is definitely not an option. It would involve months or even years of work and potentially sacrifice all benefits of using the code in the first place. There exists an ongoing research project that might solve this problem in a much cleaner way, but it is actual research and might fail. I need a solution now.
edit2: Because people still seem not to believe the use case is actually valid: I am working on a pure functional language that shall be embedded into a larger C/C++ application. Because I need a prototype with a garbage collector, a proven typechecker, and reasonable performance ASAP, I used OCaml as intermediate code. Right now, I compile a source module into an OCaml module, link the generated object code (including startup etc.) into a shared library with the OCaml runtime, and dlopen() that shared library. Every .so has its own copy of the runtime, including several global variables (e.g. the pointer to the young generation), and that is, or rather should be, totally fine. The library exposes exactly two functions: an initializer and a single export that does whatever the original module is intended to do. No symbols of the OCaml runtime are exported/shared. When I load the library, its internal symbols are relocated as expected; the only issue I have right now is that I actually need to copy the .so file for each instance of the job at runtime.
Regarding thread-local-storage: That is actually an interesting idea, as the modification to the runtime is indeed rather simple. But the problem is the machine code generated by the OCaml compiler, as it cannot emit loading instructions for tls symbols (yet?).

POSIX says:
Only a single copy of an object file is brought into the address space, even if dlopen() is invoked multiple times in reference to the file, and even if different pathnames are used to reference the file.
So the answer is "inode". Copying the library file "should work", but hard links won't. There is a catch, though: the copies will expose the same global symbols, and when that happens all (portability) bets are off. You're in the middle of weakly defined behavior that has evolved through bug fixes rather than good design.
Don't dig deeper when you're in a hole. The approach to add additional horrible hacks to make a fundamentally broken library work just leads to additional breakage. Just spend a few hours to fix the library to not use globals instead of spending days to hack around dynamic linking (which will be unportable at best).
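For a small library, "not using globals" usually means collecting the file-scope state into a context object that the caller owns. The sketch below shows the shape of that change on a made-up accumulator; the names (accum_ctx, accum_add, and so on) are hypothetical. Whether such a change is feasible for a given code base, such as the generated OCaml code the question describes, is a separate matter.

```c
#include <assert.h>
#include <stddef.h>

/* Before (sketch): state lives in a file-scope global, so only one instance
 * can exist per loaded copy of the library.
 *
 *     static int g_total;
 *     void add(int x) { g_total += x; }
 */

/* After: the same state is collected in a context object owned by the
 * caller, so any number of instances can coexist in one process. */
typedef struct {
    int total;
} accum_ctx;

void accum_init(accum_ctx *ctx)
{
    assert(ctx != NULL);
    ctx->total = 0;
}

void accum_add(accum_ctx *ctx, int x)
{
    assert(ctx != NULL);
    ctx->total += x;
}

int accum_total(const accum_ctx *ctx)
{
    assert(ctx != NULL);
    return ctx->total;
}
```

With this shape, each thread or job creates its own accum_ctx, and no dlopen() tricks are needed to keep the instances apart.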

What is "my_main()" in the code

I've seen code like below in a project:
extern void my_main(void) __attribute__ ((__noreturn__, asection(".main","f=ax")));
What does this do?
The project does not have a direct main() function in it. Does the above code indicate to the compiler that my_main() should be treated as main()?
Also, what does the .main memory section indicate?

What the above declaration basically does is declare an extern function called my_main() with no arguments.
The __attribute__ section is a GNU/LLVM attribute syntax. Attributes are basically pragmas that describe some non-standard or extended feature of the function in question - in this case, my_main().
There are two attributes applied to my_main().
__noreturn__ indicates that the function will never return.
This is different from returning void. In void-type functions, calls to the function still return at some point, even without a value; execution jumps back to the caller.
For noreturn functions (spelled _Noreturn in C11, or __noreturn__ as a GNU attribute), this tells the compiler, among other things, that it need not preserve a way back to the caller (it may drop any code that follows the call, for example), as the function itself will either exit before execution returns, or will long jump to another point in execution.
It is also used in places where adding the return address to the stack will disrupt the stack in a way that interferes with the called function (though this is rare and I've only ever seen it used for this reason once).
The second attribute, asection(".main","f=ax"), is a little more vague. I can't seem to find specific documentation for it, but it seems fairly straightforward.
What it appears to be doing is specifying a linker section along with what look like section flags: "ax" matches the GNU assembler's flags for an allocatable, executable section, though I could be wrong.
When you write native code, all functionality is placed into appropriate sections of the target binary format (e.g. ELF, Mach-O, PE, etc.) The most common sections are .text, .rodata, and .data.
However, when invoking ld, the GNU linker, you can supply a linker script to specify exactly how you want the target binary to be constructed.
This includes sections, sizes, and even the object files you want to use to make the file, specifying where they should go and their size limits.
One common misconception is that you never use ld. This isn't the case; when you run gcc or g++ or the clang family of compilers without the -c flag, you implicitly invoke ld with a default linker script used to link your binaries.
Linker scripts are important especially for embedded hardware where ROM must be built to memory specification.
So back to your line of code: it places my_main() into an arbitrary section called .main. That's all it does. Ultimately, somewhere in your project, there is a linker script that specifies how .main is used and where it goes.
I would imagine the goal of this code was to place my_main() at an exact address in the target binary/executable, so whatever is using it knows the exact location of that function (asection(".main")) and can use it as an entry point (__noreturn__).
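A small sketch of the two ideas, using only attributes that GCC and Clang document. The asection(...) form from the question is toolchain-specific and not reproduced; the standard section attribute is used as a stand-in, and only on ELF targets. The function and variable names (bail_out, run_until_bail, resume_point) are hypothetical. The noreturn function here leaves by long-jumping, one of the two exits mentioned above; a real embedded entry point would more likely loop forever.

```c
#include <assert.h>
#include <setjmp.h>

/* Stand-in for the question's asection(".main", ...): GCC's documented
 * section attribute, enabled only where a plain ELF section name is valid. */
#if defined(__ELF__)
#define IN_MAIN_SECTION __attribute__((section(".main")))
#else
#define IN_MAIN_SECTION
#endif

static jmp_buf resume_point;
static int bail_count;

/* noreturn: control never comes back to the caller. The function leaves by
 * jumping to the point saved in resume_point. */
__attribute__((noreturn)) IN_MAIN_SECTION
static void bail_out(void)
{
    bail_count++;
    longjmp(resume_point, 1);
}

/* Calls bail_out() once and returns how often it has run so far. */
int run_until_bail(void)
{
    if (setjmp(resume_point) == 0) {
        bail_out();
        /* not reached: the compiler may discard anything placed here */
    }
    /* Reached only through the longjmp in bail_out(). */
    assert(bail_count > 0);
    return bail_count;
}
```

On an ELF toolchain, a tool such as objdump -h should list a .main section in the resulting object file; where that section ends up in the final image is decided by the linker script.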

manually setting function address gcc

I've got a working binary used in an embedded system. Now I want to write some kind of patch for it. The patch will be loaded into RAM below the main program and will then be called from the main program. The question is how to tell gcc to use manually set addresses for some functions that the patch will use. In other words:
The old code has a function sin(), and I can use nm to find out the address of sin() in the old code. My patched code will use sin() (or something else from the main program), and I want to tell gcc (or maybe ld, or maybe something else) to use the fixed address of sin() when linking the patched code. Is it possible?

The problem is that you would have to replace all references to the original sin() function with references to the patched code. That would require the runtime system to contain all the object code data used to resolve references, and the original code to be modifiable (i.e. not in ROM, for example).
Wind River's RTOS VxWorks can do something close to what you are suggesting. The way it does it is that you use "partial linking" (GNU linker option -r) to generate an object file with links that will be resolved at runtime; this allows an object file to be created with unresolved links, i.e. an incomplete executable. VxWorks itself contains a loader and runtime "linker" that can dynamically load partially linked object files and resolve references. A loaded object file, however, must be resolvable entirely using already loaded object code, so no circular dependencies are possible, and in your example you would have to reload/restart the system so that the object file containing sin() is loaded before those that reference it; otherwise only those loaded after it would use the new implementation.
So if you were to use VxWorks (or an OS with similar capabilities), the solution is perhaps simple, if not you would have to implement your own loader/linker, which is of course possible, but not trivial.
Another, perhaps simpler possibility is to have all your code call functions through pointers that you hold in variables, so that all calls (or at least all calls you might want to replace) are resolved at runtime. You would have to load the patch and then modify the sin() function's pointer so that all calls thereafter are made to the new function. The problem with this approach is that you would either have to know a priori which functions you might later want to replace, or have all functions called that way (which may be prohibitively expensive in memory terms). It would perhaps be useful for this solution to have some sort of preprocessor or code generator that would allow you to mark functions that would be "dynamic" in this way and could automatically generate the pointers and calling code. So for example you might write code thus:
__dynamic void myFunction( void ) ;
...
myFunction() ;
and your custom preprocessor would generate:
void myFunction( void ) ;
void (*__dynamic_myFunction)(void) = myFunction ;
...
__dynamic_myFunction() ;
then your patch/loader code would reassign __dynamic_myFunction with the address of the replacement function.
You could generate a "dynamic symbol table" containing just the names and addresses of the __dynamic_xxxxx symbols and include that in your application, so that a loader could change the __dynamic_xxxxx variables by matching the xxxxx name with the symbols in the loaded object file. If you load a plain binary, however, you would have to provide the link information to the loader, i.e. which __dynamic_xxxxx variable is to be reassigned and the address to assign to it.
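The pointer-plus-symbol-table scheme can be sketched in plain C as follows. All names (dynamic_scale, dyn_table, patch_symbol, scale_v1, scale_v2) are hypothetical, and the __dynamic_ prefix from the text is shortened to dynamic_ because identifiers containing a double underscore are reserved for the implementation. The replacement function is compiled into the same program here; on a real system it would arrive with the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*int_fn)(int);

/* Original implementation, linked into the main program. */
static int scale_v1(int x) { return x * 2; }

/* The pointer every caller goes through instead of calling scale_v1
 * directly. */
int_fn dynamic_scale = scale_v1;

/* Hypothetical "dynamic symbol table": name -> address of the pointer. */
typedef struct {
    const char *name;
    int_fn     *slot;
} dyn_symbol;

static const dyn_symbol dyn_table[] = {
    { "scale", &dynamic_scale },
};

/* Loader side: look the name up and redirect the pointer.
 * Returns 0 on success, -1 if the name is not in the table. */
int patch_symbol(const char *name, int_fn replacement)
{
    assert(name != NULL && replacement != NULL);
    for (size_t i = 0; i < sizeof dyn_table / sizeof dyn_table[0]; i++) {
        if (strcmp(dyn_table[i].name, name) == 0) {
            *dyn_table[i].slot = replacement;
            return 0;
        }
    }
    return -1;
}

/* Stands in for code delivered by a patch. */
int scale_v2(int x) { return x * 3; }
```

Callers write dynamic_scale(5) throughout. After patch_symbol("scale", scale_v2) succeeds, the same call sites reach the new function without being relinked.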

Some general C questions

I am trying to fully understand the process from writing code in some language to its execution by the OS. In my case, the language would be C and the OS would be Windows. So far, I have read many different articles, but I am not sure whether I understand the process correctly, and I would like to ask whether you know some good articles on the subjects I couldn't find.
So, what I think I know about C (and basically other languages):
The C compiler itself handles only data types, basic math operations, pointer operations, and work with functions. By work with functions I mean how to pass arguments to a function and how to get its output. During compilation, a function call is replaced by pushing the arguments onto the stack, and then, if the function is not inline, the call is replaced by some symbol for the linker. The linker then finds the function definition and replaces the symbol with a jump to the address of that function (and of course a jump back to the program afterwards).
If the above is generally true and I got it right, where in the final .exe file does the linker actually save the functions? After the main() function? And what creates the .exe header, the compiler or the linker?
Now, the additional capabilities of C, today known as the C standard library, are a set of functions and their declarations that other programmers wrote to extend and simplify the use of the C language. But functions like printf() were (or could be?) written in a different language, or in assembler. And here comes my next question: can a function such as printf() be written in pure C without the use of assembler?
I know this is quite a big question, but I mostly just want to know whether I am right or not. And trust me, I have read a lot of articles on the web, and I would not ask you if I could find this information together in one place, in one article. Instead I must gather the information piece by piece, so I am not sure if I am right. Thanks.

I think that you're exposed to some information that is less relevant as a beginning C programmer and that might be confusing you - part of the goal of using a higher level language like this is to not have to initially think about how this process works. Over time, however, it is important to understand the process. I think you generally have the right understanding of it.
The C compiler merely takes C code and generates object files that contain machine language. Most of the object file is taken up by the content of the functions. A simple function call in C, for example, would be represented in the compiled form as low-level instructions that push things onto the stack, change the instruction pointer, etc.
The C library and any other libraries you would use are already available in this compiled form.
The linker is the thing that combines all the relevant object files, resolves all the dependencies (e.g., one object file calling a function in the standard library), and then creates the executable.
As for the language libraries are written in: Think of every function as a black box. As long as the black box has a standard interface (the C calling convention; that is, it takes arguments in a certain way, returns values in a certain way, etc.), how it is written internally doesn't matter. Most typically, the functions would be written in C or directly in assembly. By the time they make it into an object file (or as a compiled library), it doesn't really matter how they were initially created, what matters is that they are now in the compiled machine form.
The format of an executable depends on the operating system, but much of the body of an executable on Windows is very similar to that of the object files. Imagine that someone merged together all the object files and then added some glue. The glue does loading-related work and then invokes main(). When I was a kid, for example, people got a kick out of "changing the glue" to add another function before main() that would display a splash screen with their name.
One thing to note, though is that regardless of the language you use, eventually you have to make use of operating system services. For example, to display stuff on the screen, to manage processes, etc. Most operating systems have an API that is also callable in a similar way, but its contents are not included in your EXE. For example, when you run your browser, it is an executable, but at some point there is a call to the Windows API to create a window or to load a font. If this was part of your EXE, your EXE would be huge. So even in your executable, there are "missing references". Usually, these are addressed at load time or run time, depending on the operating system.

I am a new user and this system does not allow me to post more than one link. To get around that restriction, I have posted some idea at my blog http://zhinkaas.blogspot.com/2010/04/how-does-c-program-work.html. It took me some time to get all links, but in totality, those should get you started.

The compiler is responsible for translating all your functions written in C into machine code, which it saves in an object file (.obj or .o). So, if you write a .c file that has a main function and a few other functions, the compiler will translate all of them, and the linker will then combine the resulting object code into the EXE file. When you run the file, the loader (which is part of the OS) jumps to the executable's entry point, which is normally the runtime's startup code, and that in turn calls main. Otherwise, the main function is just like any other function for the compiler.
The linker is responsible for resolving any references between functions and variables in one object file and their definitions in other files. For example, if you call printf(), since you do not define the function printf() yourself, the linker is responsible for making sure that the call to printf() goes to the right system library where printf() is defined. This is done at link time.
printf() can indeed be written in pure C. What it does is format its arguments and then invoke an operating-system call that knows how to actually send characters to the standard output (like a terminal window). When you call printf() in your program, the linker is responsible for linking your call to the printf() function in the standard C library. When the function is called at run time, printf() formats the arguments properly and then calls the appropriate OS system call to actually display the characters.
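To show that the formatting half needs nothing beyond C itself, here is a deliberately tiny formatter written for this example. The name mini_format and its feature set (%d, %s, %c, %%) are made up; a real printf() handles far more conversions, flags, and widths. It writes into a caller-supplied buffer. Handing that buffer to the operating system (for example through write() or WriteFile()) is the only step that leaves the language, and it is not shown.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>

/* Append one character if room remains (one byte is kept for the '\0'). */
static void put(char *buf, size_t cap, size_t *len, char c)
{
    if (*len + 1 < cap) {
        buf[(*len)++] = c;
    }
}

/* Minimal formatter: understands %d, %s, %c and %%. Returns the number of
 * characters stored, excluding the terminator; excess output is truncated. */
size_t mini_format(char *buf, size_t cap, const char *fmt, ...)
{
    size_t len = 0;
    va_list ap;

    assert(buf != NULL && cap > 0 && fmt != NULL);
    va_start(ap, fmt);
    for (; *fmt != '\0'; fmt++) {
        if (*fmt != '%') {
            put(buf, cap, &len, *fmt);
            continue;
        }
        fmt++;
        if (*fmt == '\0') {
            break;                      /* lone '%' at the end: stop */
        }
        switch (*fmt) {
        case 'd': {
            int v = va_arg(ap, int);
            /* Work in unsigned so that INT_MIN negates safely. */
            unsigned int u = (v < 0) ? 0u - (unsigned int)v : (unsigned int)v;
            char digits[24];
            int n = 0;
            if (v < 0) {
                put(buf, cap, &len, '-');
            }
            do {                        /* digits come out in reverse */
                digits[n++] = (char)('0' + u % 10u);
                u /= 10u;
            } while (u != 0u);
            while (n > 0) {
                put(buf, cap, &len, digits[--n]);
            }
            break;
        }
        case 's': {
            const char *s = va_arg(ap, const char *);
            while (*s != '\0') {
                put(buf, cap, &len, *s++);
            }
            break;
        }
        case 'c':
            put(buf, cap, &len, (char)va_arg(ap, int));
            break;
        case '%':
            put(buf, cap, &len, '%');
            break;
        default:                        /* unknown conversion: copy it through */
            put(buf, cap, &len, '%');
            put(buf, cap, &len, *fmt);
            break;
        }
    }
    va_end(ap);
    buf[len] = '\0';
    return len;
}
```

A call such as mini_format(out, sizeof out, "%s=%d", "x", -42) leaves the text x=-42 in out.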