Few WinMain questions - c

I have a few very simple questions. I searched the web for them, but I found different answers, so I just want to know which to follow.
So, first, I believe WinMain is NOT part of the C or C++ standard, but was only added by Microsoft to determine when to load different CRT startup code, am I right?
And second, is WinMain called by the OS, in a way similar to, let's say, dynamic linking, or is it just a program startup point like main?
Why do I ask? I have mainly used C for programming MCUs. I am more HW oriented than SW, so I like MCUs; I find them, and programming for them, more "clear".
But when I started to get interested in the C language itself and its standard, I found that it's very hard. I mean, for example, on an MCU you need no int return type for main, just as a win32 app needs different startup code than plain main has.
So, I like C but I find its standard to be somehow old. Thanks.

I believe WinMain is NOT part of the C or C++ standard, but was only added by Microsoft to determine when to load different CRT startup code, am I right?
Yes. All C and C++ standards define main() (and only main()) as the program entry point (although its exact signature may vary between languages and standard versions).
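And second, is WinMain called by the OS, in a way similar to, let's say, dynamic linking, or is it just a program startup point like main?
Neither exactly. The OS jumps to the executable's entry point, which is the CRT startup code (WinMainCRTStartup in Microsoft's runtime). That code initializes the runtime and then calls WinMain, just as the equivalent startup code calls main() in a console program.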

Although it's all the same, consider C as being 3 languages:
1. Standard freestanding implementation
2. Standard hosted implementation
3. Extended hosted implementation
What you describe (WinMain) belongs to type 3.
Type 3 programs work only on systems that provide the specific extensions they use.
Type 2 has a lot of rules, but offers a guarantee that programs written in that type will work the same on every computer system with a standard C compiler (virtually every computer with a keyboard attached (including PDA, wrist watch, ..., ...)).
Type 1 is the same as type 2 minus a few of the rules and the standard library, and it should work for every processor on Earth.
The text of the Standard is from 1999, with corrigenda from 2001, 2004 and 2007. You can find a PDF at the ISO working group (WG14) site ( http://www.open-std.org/jtc1/sc22/wg14/www/standards )
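To see which of the first two kinds a given build is, the standard macro `__STDC_HOSTED__` can be checked (1 for hosted, 0 for freestanding). A minimal sketch; the function name `implementation_kind` is made up for illustration:

```c
/* Reports which kind of implementation this translation unit was built for.
 * __STDC_HOSTED__ comes from the C99 standard; the function name is
 * hypothetical. */
const char *implementation_kind(void)
{
#if defined(__STDC_HOSTED__) && __STDC_HOSTED__ == 1
    return "hosted";
#else
    return "freestanding";
#endif
}
```

On an ordinary desktop build this should report a hosted implementation; many compilers have a switch (for example GCC's -ffreestanding) that flips it.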

Related

Which mechanism knows the entry point of a program is main()

How does an application program know its entry point is the main() function?
I know an application doesn't know its entry point is main() -- it is directed to main() function by means of the language specification whatever it is.
At that point, where is the specification actually declared? For example in C, entry point shall be main() function. Who provides this mechanism to the program? An operating system or compiler?
I came to the question after disassembling a canonical simple "Hello World" example in Visual Studio.
In this code there are only a few lines and a function main().
But after disassembling it, there are lots of definitions and macros in the memory space, and main() is not the only declaration and definition.
A screenshot of the disassembly is below. I also know there is a strict rule in the language definition that exactly one main() function must be defined.
To summarize my question: I wonder which mechanism directs or sets main() function as an entry point of an application program.
The application does not know that main() is the entry point. Firstly, we assume C not C++ here despite your picture.
For C, the "C" entry point is main(). But you can't just start execution there, because C makes assumptions (more than that, rules): for example, .data needs to be initialized and .bss zeroed.
unsigned x = 1;
unsigned int y;
We expect that when main() is hit, x == 1. The standard also specifies that y == 0 at that time, because objects with static storage duration and no initializer are zero-initialized; something has to make that true before main() runs.
We also need a stack pointer and need to deal with argc/argv. For C++ other work has to be done as well, and even for C there can be more, depending on the implementation.
The APPLICATION does not generally know any of this. You are likely working with a C library, and that library is (or should be) responsible for the bootstrap code that precedes main() as well as for a linker script, since the bootstrap and the linker script are intimately related. One could argue, based on some implementations, that the C library is separable from the toolchain: with GNU you can choose among different C libraries, and those have different bootstraps and linker scripts. But I am sure there are many toolchains where the two are tightly coupled. There is also a relationship between the library and the operating system, as so many C library calls end up in one or more system calls.
Suppose you design an operating system. If it supports runtime-loadable applications, part of its design is the file format that the operating system's loader supports, the features the loader wants to support, and how those overlap with the file format. It is not uncommon for the OS to define the file format, but with ELF and others (no doubt not created accidentally or independently) a new OS has the opportunity to use an existing container. The OS design and its loader determine a lot of things, and the C library that mates up with all of that has to follow all of those rules; if the library is integrated into the compiler, then the compiler has to play along as well.
It is not the application that knows; this is part of the system design, and the application is simply a slave to all of that. When you compile on that platform for that platform, all of these rules and relationships are in play. You are putting in a very small piece of the puzzle; the rest is already in place: what file formats are supported, what information each format requires, and what rules the compiler/library solution must satisfy. The system design dictates whether .data and .bss are set up by the loader or by the application, and by "the application" I mean the bootstrap, not the user's portion of the program. You can't bootstrap C in C, because that C would need a bootstrap, and if that bootstrap were in C it would need a bootstrap too, and so on.
int main(void)
{
    return 0;
}
There are a lot of things going on in the background when you compile that program, not just the few instructions that might be needed to implement that code.
Compile that program on Windows, Linux, and Mac, on different versions of each, with different compilers or C libraries and different versions of those, and so on. What you should expect to see, even for the same target ISA or the same computer, is that some percentage of the combinations MIGHT choose the same few instructions for the function, while what is wrapped around it is maybe similar but not the same. There would be no reason to be surprised if some of the implementations are very different from each other.
And this is all for full-blown operating systems that load programs into RAM and run them; for embedded targets, don't be surprised if the differences are even bigger. Within a full-blown OS you would expect to see an MMU, and the application gets a perhaps zero-based address space for .text, .data, and .bss at a minimum. So all the solutions might have a favorite place or a favorite number of sections in the same order in the binary, but the size of each may be specific to the implementation. The order and size might vary by C library version, compiler version, etc.
The magic is in the system design, and that is not magic, that is design. main() cannot be entered directly while still having the various parts of the language work, such as .data and .bss initialization. The stack pointer can be set up before the entry, but how and where .data and .bss live is application specific, so it can't be handled by a simple branch to main from the OS.
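The order of work a bootstrap does can be sketched as a hosted simulation. Real bootstraps are normally a few lines of assembly driven by linker-script symbols; the arrays and the names `fake_bootstrap` and `app_main` here are stand-ins invented for illustration, written in C only so the sequence can be run anywhere:

```c
#include <string.h>

/* Stand-ins for regions a linker script would normally describe. */
static const unsigned char data_load_image[4] = { 1, 0, 0, 0 }; /* initial values stored in the binary */
static unsigned char data_ram[4];  /* where .data lives at run time */
static unsigned char bss_ram[4];   /* where .bss lives at run time */

/* Hypothetical user entry point; a real bootstrap would call main(). */
static int app_main(void)
{
    /* By now the first .data byte must be 1 and .bss must read as 0. */
    return data_ram[0] == 1 && bss_ram[0] == 0;
}

/* What a bootstrap does, in order, before the C entry point can run. */
int fake_bootstrap(void)
{
    /* Pretend RAM holds garbage at power-up. */
    memset(data_ram, 0xAA, sizeof data_ram);
    memset(bss_ram, 0xAA, sizeof bss_ram);

    memcpy(data_ram, data_load_image, sizeof data_ram); /* 1. copy .data initial values */
    memset(bss_ram, 0, sizeof bss_ram);                 /* 2. zero .bss */
    return app_main();                                  /* 3. branch to the C entry point */
}
```

Skipping step 1 or 2 makes `app_main` see the garbage pattern, which is exactly the failure a missing or mismatched bootstrap produces on a bare-metal target.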
The linker for your toolchain can be told in various ways where the entry point is: it could be assumed or dictated for that tool and target, or given by a command line option, a linker script option, a special symbol you put on a label, or whatever the designers choose. main is assumed to be the C entry point, although that doesn't actually mean it is; there might be some C code that precedes it. In general, though, there is some amount of asm (you can't bootstrap C with C) and then one or more steps to main().

Should a Fortran-compiled and C-compiled DLL be able to import interchangeably? (x86 target)

The premise: I'm writing a plug-in DLL which conforms to an industry standard interface / function signature. This will be used in at least two different software packages used internally at my company, both of which have some example skeleton code or empty shells of this particular interface. One vendor authors their example in C/C++, the other in Fortran.
Ideally I'd like to just have to write and maintain this library code in one language and not duplicate it (especially as I'm only just now getting some comfort level in various flavors of C, but haven't touched Fortran).
I've emailed off to both our vendors to see if there's anything specific their solvers need when they import this DLL, but this has made me curious at a more fundamental level. If I compile a DLL with an exposed method void foo(int bar) in both C and Fortran, then by the time it's down to x86 machine instructions, does it make any difference in how that method is called by program "X"? I've gathered so far that if I were to do it in C++ I'd need the extern "C" bit to avoid "mangling". Is there anything else I should be aware of?
It matters. The exported function must use a specific calling convention, and there are several incompatible ones in common use in 32-bit code. The calling convention dictates where the function arguments are stored, in what order they are passed, and how they are removed again, as well as how the function return value is passed back.
The name of the function matters too, since exported function names are often decorated with extra characters. That is what extern "C" is all about: it suppresses the name mangling that a C++ compiler uses to prevent overloaded functions from having the same exported name, so the name is one that the linker for a C compiler can recognize.
The way a C compiler makes function calls is pretty much the standard if you interop with code written in other languages. Any modern Fortran compiler will support declarations to make them compatible with a C program. And surely this is something that's already used by whatever software vendor you are working with that provides an add-on that was written in Fortran. And the other way around, as long as you provide functions that can be used by a C compiler then the Fortran programmer has a good chance at being able to call it.
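On the C side, pinning down both the name and the calling convention usually looks something like the sketch below. The macro names `PLUGIN_EXPORT` and `PLUGIN_CALL` are made up, the int return value is added only so the function is easy to check, and whether the host really wants `__cdecl` is an assumption to confirm with each vendor:

```c
/* Hypothetical plug-in interface; macro and function names are illustrative. */
#if defined(_WIN32)
#  define PLUGIN_EXPORT __declspec(dllexport)
#  define PLUGIN_CALL   __cdecl   /* assumed; must match what the host expects */
#else
#  define PLUGIN_EXPORT
#  define PLUGIN_CALL
#endif

#ifdef __cplusplus
extern "C" {                      /* suppress C++ name mangling */
#endif

PLUGIN_EXPORT int PLUGIN_CALL foo(int bar);

#ifdef __cplusplus
}
#endif

/* Implementation: only plain C types cross the DLL boundary. */
PLUGIN_EXPORT int PLUGIN_CALL foo(int bar)
{
    return bar * 2;
}
```

Keeping the convention behind one macro means a host that expects a different one needs a one-line change rather than an edit to every exported function.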
Yes it has been discussed here many many times. Study answers and questions in this tag https://stackoverflow.com/questions/tagged/fortran-iso-c-binding .
The equivalent of extern "C" in fortran is bind(C). The equivalency of the datatypes is done using the intrinsic module iso_c_binding.
Also be sure to use the same calling conventions. If you do not specify anything manually, the default is usually the same for both. On Linux this is a non-issue.
extern "C" is used in C++ code. So if your DLL is written in C++, you mustn't pass any C++ objects (classes) across the interface.
If you stick with C types, you need to make sure the function passes parameters in a single agreed way, e.g. use C's default of __cdecl. Not sure what Fortran uses.

What 'mark' does a language leave on a library such that we need language bindings?

What mark does a language leave on a compiled library such that we need language bindings if we have to call its functions from a different language?
Object code looks 'language free' to me.
While learning OpenGL in C in a Linux environment I came across language bindings.
Binding provides a simple and consistent way for applications to present and interact with data.
Source: The tag under your question
I'm guessing that you're either young or haven't been programming for more than a decade or so.
Object code should look language free, but it ain't, due to history. Back in the 1970s and 1980s, on Intel 80x86 and Motorola 680x0 CPUs, function call arguments were passed on the stack. In the 'Pascal' convention, the number of arguments was fixed and the called function removed them from the stack before returning. In the 'C' convention, the number of arguments was variable (e.g. printf), so the calling code had to remove them when the function returned. This cost 2 extra bytes per function call, which is nothing today but was significant back then when PCs only came with 128K or so of RAM. So Microsoft chose to use the Pascal calling convention for the Windows API, even though it was written in C. If your object code called a Windows function with the C convention by mistake, kaboom. This is why the header files are still cluttered up with WINAPI and __stdcall and __fastcall and whatnot.
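The reason the 'C' convention leaves cleanup to the caller shows up in any variadic function: the callee is compiled once and cannot know how many arguments a particular call pushed. A small sketch using the standard <stdarg.h> macros (the function name `sum_ints` is made up):

```c
#include <stdarg.h>

/* Sums `count` ints passed as variable arguments. The function body is the
 * same for every call site, so only the caller knows how much argument
 * space each call used. */
int sum_ints(int count, ...)
{
    va_list ap;
    int total = 0;

    va_start(ap, count);
    for (int i = 0; i < count; i++)
        total += va_arg(ap, int);
    va_end(ap);
    return total;
}
```

A call such as `sum_ints(3, 1, 2, 3)` and a call such as `sum_ints(1, 7)` pass different amounts of data to the same code, which a fixed callee-side cleanup could not handle.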
Starting in the 1990s operating system authors realized this was silly and started imposing standard calling conventions on everyone. The C convention could handle both cases, so it got used everywhere. With the moves to MacOS X, 64 bit Windows, and ARM; we are finally getting language free object code.
Now, OpenGL was designed to be used from C and Fortran. (Which was in the 1990s still an important language for scientific calculations and visualization.) Both languages have integers, floating point numbers, and arrays of various sized ints/floats. C has structs but Fortran doesn't, and I suspect this is a major reason why the OpenGL API never uses any structs. There are also differences in the memory layout of 2D or higher dimension arrays between C and Fortran, and again note that the OpenGL API never specifies 2D arrays, only 1D.
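The layout difference is easy to see once a 2D array is flattened to 1D, which is the form an API like this sticks to. A sketch of the two index calculations (function names are made up):

```c
#include <stddef.h>

/* Offset of element (row, col) in a rows x cols matrix stored as a flat
 * 1D array, in the layout C uses: each row is contiguous. */
size_t index_row_major(size_t row, size_t col, size_t rows, size_t cols)
{
    (void)rows;                 /* not needed for this layout */
    return row * cols + col;
}

/* Same element in the layout Fortran uses: each column is contiguous. */
size_t index_col_major(size_t row, size_t col, size_t rows, size_t cols)
{
    (void)cols;                 /* not needed for this layout */
    return col * rows + row;
}
```

For a 2 x 3 matrix, element (1, 0) sits at offset 3 in the C layout and at offset 1 in the Fortran layout, so passing a 2D array between the two languages without agreeing on this silently transposes the data.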
A C API works for most languages. This is partly because C is 'portable assembler' that works on almost any CPU and operating system. It's also because most other programming languages in common use are either supersets of C (C++, Objective-C) or implemented in C themselves (Python, Perl, Ruby), so they can be made to call the OpenGL C API reasonably easily.
Java and C# have more problems, because they define their own object code, so to speak, and memory access is more tightly controlled. The C/OpenGL notion of 'here is a pointer to a block of memory, do what you like with it' breaks the security model of the JVM/CLR. So you end up having to use Java NIO ByteBuffer objects instead of just passing arrays.
A lot of it also comes down to the skill of the language binding designer. For one example, Python-OpenGL by Mike Fletcher is a really good binding. All the functions and constants have exactly the same names, so a lot of code can be just copied from C and pasted into Python. Python doesn't have C style arrays directly, but the language binding will silently translate any Python sequences/tuples you pass as "arrays" into the underlying C format for you. It feels natural for a Python programmer and still exposes the full capabilities of OpenGL.
For a bad example, JOGL is a pain in the arse. There's no automatic conversion from Java arrays to C, so you have to futz around with NIO ByteBuffers yourself. It's so annoying that it's actually easier to use glBegin..glEnd blocks. And extra array offset arguments got added to a lot of OpenGL functions, so your code no longer looks the same as C/C++ and you waste a lot of time sticking ,0 on the end of function calls. Some of this is due to the JVM as mentioned before, but a lot of it is just bad design by (I suspect) somebody who never actually wrote much OpenGL themselves.
A long and rambling answer to a vague question.
Well, all you have to do is think about the myriad of calling conventions in C and C++. In order to prevent serious mishaps, the compiler mangles the function names based on calling convention so that you do not accidentally call a stdcall function using fastcall conventions. Each language has its own set of superfluous details like this that a language independent API should never have to burden itself with. Language bindings serve as an adapter/bridge that separates the language-specific stuff from the standardized API, filling in the gaps wherever necessary.
The OpenGL API is generally implemented in a single language (C) and programs written in other languages interface with the system's implementation through language bindings. OpenGL uses null-terminated ASCII strings for GLSL and has numerous functions that use pointers, things that make perfect sense for an API that is designed to be implemented in C. In Java, strings are not null-terminated and they are UTF-16 encoded; you can see why a bridge is needed. The Java GL bindings take care of string conversion and alter glVertexPointer (...)-like functions to fit Java's conditions for "pointing to" contiguous blocks of memory.
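The string part of that bridge might look like the following on the native side. This is only a sketch of the idea, not how any particular Java binding does it: the function name is made up, and it simply rejects anything outside ASCII rather than transcoding it.

```c
#include <stdint.h>
#include <stdlib.h>

/* Converts a length-counted UTF-16 string (as a managed runtime might hand
 * over) into a newly allocated, NUL-terminated ASCII string of the kind a
 * C API expects. Returns NULL on allocation failure or if a code unit is
 * NUL or outside ASCII. The caller frees the result. */
char *utf16_to_ascii(const uint16_t *src, size_t len)
{
    char *dst = malloc(len + 1);
    if (dst == NULL)
        return NULL;

    for (size_t i = 0; i < len; i++) {
        if (src[i] == 0 || src[i] > 0x7F) {  /* embedded NUL or non-ASCII */
            free(dst);
            return NULL;
        }
        dst[i] = (char)src[i];
    }
    dst[len] = '\0';
    return dst;
}
```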

A Windows C compiler that doesn't split arguments in its runtime libraries?

I have heard that in Windows, the command line is passed to a program as a single string, and the program then splits it into arguments, either in its runtime libraries or sometimes in the actual code.
I've heard that most C/C++ compilers do it in runtime libraries (for example TCC, the Tiny C Compiler, which I downloaded).
Are there any C compilers I can download that don't? Any links to them?
And in such a compiler, would argv[0] have the whole string?
Added
It's based on what this person (jdedb) said in Super User question Can't pipe or redirect Cygwin grep output, after seeming to suggest that I ask on Stack Overflow.
"It's up to the called program to split the command tail into words, if it wants to operate in Unix (and C language) fashion. (The runtime support libraries of most C and C++ language implementations for Win32 do this splitting behind the scenes."
He said it's the compilers.. But according to Necrolis, it's not the compiler.
(added- Necrolis commented correcting my misreading, compiler!=runtime library)
If you are on Windows, just use GetCommandLine. This is how most CRT wrappers get the command line to split to start with.
As for your actual question, it's not the compiler, but the CRT startup wrapper that they use. If you implement mainCRTstartup, and override the entrypoint with it, you can do whatever you want. A good example of how it works can be seen here.
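If you do take over the startup code, the splitting becomes your job. Here is a deliberately simplified splitter to show the idea; it is not the rule set any real CRT uses (backslash escapes are ignored), and the function name is made up. On Windows the input would be a writable copy of the string returned by GetCommandLine.

```c
#include <stddef.h>

/* Splits `cmdline` in place into at most `max_args` words. Whitespace
 * separates words; double quotes group words and are removed. Backslash
 * escapes are NOT handled. Returns the number of words found. */
int split_command_line(char *cmdline, char **argv, int max_args)
{
    int argc = 0;
    char *src = cmdline;

    while (*src != '\0' && argc < max_args) {
        while (*src == ' ' || *src == '\t')   /* skip separators */
            src++;
        if (*src == '\0')
            break;

        char *dst = src;          /* word is compacted over its own text */
        int in_quotes = 0;
        argv[argc++] = dst;

        while (*src != '\0' && (in_quotes || (*src != ' ' && *src != '\t'))) {
            if (*src == '"')
                in_quotes = !in_quotes;   /* toggle and drop the quote */
            else
                *dst++ = *src;
            src++;
        }
        if (*src != '\0')
            src++;                /* step past the separator first */
        *dst = '\0';              /* then terminate the word */
    }
    return argc;
}
```

Given the line `prog "hello world" x`, this yields three words, with the quoted pair kept together as the second one.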
That "parameter splitting" is the way mandated by the C99 Standard (PDF file) in 5.1.2.2.1.
If an implementation (compiler + library + options) recognizes but does not separate the program name from the other parameters (and parameters from each other) it is not conforming.
Of course, if you use a free-standing implementation none of this applies.

What can you do in C without "std" includes? Are they part of "C," or just libraries?

I apologize if this is a subjective or repeated question. It's sort of awkward to search for, so I wasn't sure what terms to include.
What I'd like to know is what the basic foundation tools/functions are in C when you don't include standard libraries like stdio and stdlib.
What could I do if there's no printf(), fopen(), etc?
Also, are those libraries technically part of the "C" language, or are they just very useful and effectively essential libraries?
The C standard has this to say (5.1.2.3/5):
The least requirements on a conforming implementation are:
- At sequence points, volatile objects are stable in the sense that previous accesses are complete and subsequent accesses have not yet occurred.
- At program termination, all data written into files shall be identical to the result that execution of the program according to the abstract semantics would have produced.
- The input and output dynamics of interactive devices shall take place as specified in 7.19.3.
So, without the standard library functions, the only behavior that a program is guaranteed to have, relates to the values of volatile objects, because you can't use any of the guaranteed file access or "interactive devices". "Pure C" only provides interaction via standard library functions.
Pure C isn't the whole story, though, since your hardware could have certain addresses which do certain things when read or written (whether that be a SATA or PCI bus, raw video memory, a serial port, something to go beep, or a flashing LED). So, knowing something about your hardware, you can do a whole lot writing in C without using standard library functions. Potentially, you could implement the C standard library, although this might require access to special CPU instructions as well as special memory addresses.
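Driving hardware through a special address typically comes down to a volatile pointer. In this sketch an ordinary variable stands in for the register so it can run on any machine; on a real part the address and bit position would come from the datasheet, and every name below is hypothetical.

```c
#include <stdint.h>

/* On real hardware this would be a fixed address cast to a pointer, for
 * example (volatile uint32_t *)0x40020014 (an invented address). Here a
 * plain variable plays the part of the register. */
static volatile uint32_t fake_led_register;

#define LED_REG  (&fake_led_register)
#define LED_PIN  (1u << 5)        /* hypothetical bit position */

/* volatile forces every read and write below to actually happen. */
void led_on(void)    { *LED_REG |= LED_PIN; }
void led_off(void)   { *LED_REG &= ~LED_PIN; }
int  led_is_on(void) { return (*LED_REG & LED_PIN) != 0; }
```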
But in pure C, with no extensions, and the standard library functions removed, you basically can't do anything other than read the command line arguments, do some work, and return a status code from main. That's not to be sniffed at, it's still Turing complete subject to resource limits, although your only resource is automatic and static variables, no heap allocation. It's not a very rich programming environment.
The standard libraries are part of the C language specification, but in any language there does tend to be a line drawn between the language "as such", and the libraries. It's a conceptual difference, but ultimately not a very important one in principle, because the standard says they come together. Anyone doing something non-standard could just as easily remove language features as libraries. Either way, the result is not a conforming implementation of C.
Note that a "freestanding" implementation of C only has to implement a subset of standard includes not including any of the I/O, so you're in the position I described above, of relying on hardware-specific extensions to get anything interesting done. If you want to draw a distinction between the "core language" and "the libraries" based on the standard, then that might be a good place to draw the line.
What could I do if there's no printf(), fopen(), etc?
As long as you know how to interface the system you are using you can live without the standard C library. In embedded systems where you only have several kilobytes of memory, you probably don't want to use the standard library at all.
Here is a Hello World! example on Linux and Windows without using any standard C functions:
For example on Linux you can invoke the Linux system calls directly in inline assembly:
/* 64 bit linux. */
#define SYSCALL_EXIT 60
#define SYSCALL_WRITE 1

/* Terminates the process with the given exit status; does not return. */
void sys_exit(int error_code)
{
    asm volatile
    (
        "syscall"
        :
        : "a"(SYSCALL_EXIT), "D"(error_code)
        : "rcx", "r11", "memory"
    );
}

/* Returns the number of bytes written, or a negative error code. */
int sys_write(unsigned fd, const char *buf, unsigned count)
{
    int ret;
    asm volatile
    (
        "syscall"
        : "=a"(ret)
        : "a"(SYSCALL_WRITE), "D"(fd), "S"(buf), "d"(count)
        : "rcx", "r11", "memory"
    );
    return ret;
}

void _start(void)
{
    const char hwText[] = "Hello world!\n";
    /* sizeof - 1 so the terminating NUL byte is not written to stdout. */
    sys_write(1, hwText, sizeof(hwText) - 1);
    sys_exit(12);
}
You can look up the manual page for "syscall", which describes how to make system calls. On Intel x86_64 you put the system call number into RAX, and the return value will be stored in RAX. The arguments must be put into RDI, RSI, RDX, R10, R8 and R9, in this order (as many as the call uses).
Once you have this you should look up how to write inline assembly in gcc.
The syscall instruction changes the RCX and R11 registers, and the kernel may touch memory, so we add these to the clobber list to make GCC aware of it.
The default entry point for the GNU linker is _start. Normally the standard library provides it, but without it you need to provide it.
It isn't really a function as there is no caller function to return to. So we must make another system call to exit our process.
Compile this with:
gcc -nostdlib nostd.c
And it outputs Hello world!, and exits.
On Windows the system calls are not published; instead they are hidden behind another layer of abstraction, kernel32.dll, which is always loaded when your program starts whether you want it or not. So you can simply include windows.h from the Windows SDK and use the Win32 API as usual:
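#include <windows.h>
void _start(void)
{
    const char str[] = "Hello world!\n";
    /* Named hStdout rather than stdout to avoid clashing with the stdio name. */
    HANDLE hStdout = GetStdHandle(STD_OUTPUT_HANDLE);
    DWORD written;
    /* sizeof(str) - 1 leaves out the terminating NUL byte. */
    WriteFile(hStdout, str, sizeof(str) - 1, &written, NULL);
    ExitProcess(12);
}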
The windows.h has nothing to do with the standard C library, as you should be able to write Windows programs in any other language too.
You can compile it using the MinGW tools like this:
gcc -nostdlib C:\Windows\System32\kernel32.dll nostdlib.c
Then the compiler is smart enough to resolve the import dependencies and compile your program.
If you disassemble the program, you can see only your code is there, there is no standard library bloat in it.
So you can use C without the standard library.
What could you do? Everything!
There is no magic in C, except perhaps the preprocessor.
The hardest, perhaps is to write putchar - as that is platform dependent I/O.
It's a good undergrad exercise to create your own version of varargs and, once you've got that, do your own version of vprintf, then printf and sprintf.
I did all of them on a Macintosh in 1986 when I wasn't happy with the stdio routines that were provided with Lightspeed C: I wrote my own window handler with win_putchar, win_printf, win_getchar, and win_scanf.
This whole process is called bootstrapping and it can be one of the most gratifying experiences in coding - working with a basic design that makes a fair amount of practical sense.
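As a rough idea of what that exercise looks like today, here is a tiny printf built on a single character-output routine. It uses the compiler's <stdarg.h> rather than hand-rolled varargs, writes into a buffer instead of a real device so it can run anywhere, supports only %d, %s, %c and %%, and every name in it is made up:

```c
#include <stdarg.h>
#include <stddef.h>

/* Output sink. On a real port my_putchar would poke a UART or a window. */
static char out_buf[128];
static size_t out_len;

static void my_putchar(char c)
{
    if (out_len < sizeof out_buf - 1) {
        out_buf[out_len++] = c;
        out_buf[out_len] = '\0';
    }
}

/* Lets callers inspect what has been "printed" so far. */
const char *my_output(void) { return out_buf; }

static void put_unsigned(unsigned value)
{
    if (value >= 10)
        put_unsigned(value / 10);          /* higher digits first */
    my_putchar((char)('0' + value % 10));
}

void my_printf(const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    for (; *fmt != '\0'; fmt++) {
        if (*fmt != '%') {
            my_putchar(*fmt);
            continue;
        }
        fmt++;
        if (*fmt == '\0')
            break;                          /* lone '%' at end of format */
        switch (*fmt) {
        case 'd': {
            int v = va_arg(ap, int);
            unsigned u = (unsigned)v;
            if (v < 0) {
                my_putchar('-');
                u = 0u - u;                 /* negate without signed overflow */
            }
            put_unsigned(u);
            break;
        }
        case 's': {
            const char *s = va_arg(ap, const char *);
            while (*s != '\0')
                my_putchar(*s++);
            break;
        }
        case 'c':
            my_putchar((char)va_arg(ap, int));
            break;
        default:                            /* "%%" and unknown specifiers */
            my_putchar(*fmt);
            break;
        }
    }
    va_end(ap);
}
```

Only my_putchar touches the platform, which is why porting a routine like this to a new target is mostly a matter of rewriting that one function.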
You're certainly not obligated to use the standard libraries if you have no need for them. Quite a few embedded systems either have no standard library support or can't use it for one reason or another. The standard even specifically talks about implementations with no library support, C99 standard 5.1.2.1 "Freestanding environment":
In a freestanding environment (in which C program execution may take place without any benefit of an operating system), the name and type of the function called at program startup are implementation-defined. Any library facilities available to a freestanding program, other than the minimal set required by clause 4, are implementation-defined.
The headers required by C99 to be available in a freestanding implementation are <float.h>, <iso646.h>, <limits.h>, <stdarg.h>, <stdbool.h>, <stddef.h>, and <stdint.h>. These headers define only types and macros so there's no need for a function library to support them.
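Those headers are already enough to write ordinary utility code. A short sketch that uses only <limits.h>, <stddef.h> and <stdint.h> and no library functions (the function names are made up):

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Length of a NUL-terminated string, without <string.h>. */
size_t my_strlen(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0')
        n++;
    return n;
}

/* Adds two ints, clamping to INT_MAX or INT_MIN instead of overflowing. */
int saturating_add(int a, int b)
{
    if (b > 0 && a > INT_MAX - b)
        return INT_MAX;
    if (b < 0 && a < INT_MIN - b)
        return INT_MIN;
    return a + b;
}

/* Simple 8-bit additive checksum over a byte buffer. */
uint8_t checksum8(const uint8_t *data, size_t len)
{
    uint8_t sum = 0;
    for (size_t i = 0; i < len; i++)
        sum = (uint8_t)(sum + data[i]);
    return sum;
}
```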
Without the standard library, you're entirely reliant on your own code, any non-standard libraries that might be available to you, and any operating system calls that you might be able to interface to (which might be considered non-standard library calls). Quite possibly you'd have to have your C program call assembly routines to interface to devices and/or whatever operating system might be on the platform.
You can't do a lot, since most of the standard library functions rely on system calls; you are limited to what you can do with the built-in C keywords and operators. It also depends on the system; in some systems you may be able to manipulate bits in a way that results in some external functionality, but this is likely to be the exception rather than the rule.
C's elegance is in its simplicity, however. Unlike Fortran, which includes much functionality as part of the language, C is quite dependent on its library. This gives it a great degree of flexibility, at the expense of being somewhat less consistent from platform to platform.
This works well, for example, in the operating system, where completely separate "libraries" are implemented, to provide similar functionality with an implementation inside the kernel itself.
Some parts of the libraries are specified as part of ANSI C; they are part of the language, I suppose, but not at its core.
None of them is part of the language keywords. However, all C distributions must include an implementation of these libraries. This ensures portability of many programs.
First of all, you could theoretically implement all these functions yourself using a combination of C and assembly, so you could theoretically do anything.
In practical terms, library functions are primarily meant to save you the work of reinventing the wheel. Some things (like string and library functions) are easier to implement. Other things (like I/O) very much depend on the operating system. Writing your own version would be possible for one O/S, but it is going to make the program less portable.
But you could write programs that do a lot of useful things (e.g., calculate PI or the meaning of life, or simulate an automaton). Unless you directly used the OS for I/O, however, it would be very hard to observe what the output is.
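The PI case really does need nothing beyond the core language. A sketch with no headers at all, using the Nilakantha series (my choice of series for illustration; the function name is made up):

```c
/* Approximates pi as 3 + 4/(2*3*4) - 4/(4*5*6) + 4/(6*7*8) - ...
 * No headers and no library calls are needed. */
double approximate_pi(int terms)
{
    double pi = 3.0;
    double sign = 1.0;

    for (int i = 0; i < terms; i++) {
        double n = 2.0 * (i + 1);          /* 2, 4, 6, ... */
        pi += sign * 4.0 / (n * (n + 1.0) * (n + 2.0));
        sign = -sign;
    }
    return pi;
}
```

The catch is exactly the one described above: the function computes the value, but showing it to anyone requires I/O from somewhere.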
In day to day programming, the success of a programming language typically necessitates the availability of a useful high-quality standard library and libraries for many useful tasks. These can be first-party or third-party, but they have to be there.
The std libraries are "standard" libraries, in that for a C compiler to be compliant to a standard (e.g. C99), these libraries must be "include-able." For an interesting example that might help in understanding what this means, have a look at Jessica McKellar's challenge here:
http://blog.ksplice.com/2010/03/libc-free-world/
Edit: The above link has died (thanks Oracle...)
I think this link mirrors the article: https://sudonull.com/post/178679-Hello-from-the-libc-free-world-Part-1
The CRT is part of the C language just as much as the keywords and the syntax. If you are using C, your compiler MUST provide an implementation for your target platform.
Edit:
It's the same as the STL for C++. All languages have a standard library. Maybe assembler as the exception, or some other seriously low level languages. But most medium/high levels have standard libs.
The Standard C Library is part of ANSI C89/ISO C90. I've recently been working on the library for a C compiler that previously was not ANSI-compliant.
The book The Standard C Library by P.J. Plauger was a great reference for that project. In addition to spelling out the requirements of the standard, Plauger explains the history of each .h file and the reasons behind some of the API design. He also provides a full implementation of the library, something that helped me greatly when something in the standard wasn't clear.
The standard describes the macros, types and functions for each of 15 header files (including stdio.h, stdlib.h, but also float.h, limits.h, math.h, locale.h and more).
A compiler can't claim to be ANSI C unless it includes the standard library.
Assembly language has simple instructions that move values between CPU registers and memory and perform the basic calculations the machine is capable of. C libraries ultimately compile down to the same machine instructions, and some of their lowest-level parts are written directly in assembly. You can also use assembly code in your C programs through compiler extensions such as inline assembly. Assembly code is the human-readable form of machine code, which is the encoding the processor's circuits actually execute.
So while the machine code, and therefore the assembly code, is built into the machine, a C program is a combination of all kinds of pre-formed pieces of code, including your own functions, which might be partly assembly language and partly calls to other assembly routines or other C libraries. So the assembly code is the foundation of all the programming, and after that it's anyone's guess about what is what. That's why there are so many languages and so few true standards.
Yes you can do a ton of stuff without libraries.
The lifesaver is __asm__ in GCC. It is a keyword so yes you can.
Mostly because every programming language is built on Assembly, and you can make system calls directly under some OSes.
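For a taste of what __asm__ looks like, here is a sketch that adds two ints with one x86-64 instruction in GCC/Clang extended-asm syntax. It assumes the default AT&T assembler dialect and falls back to plain C on other compilers or architectures so it still builds; the function name is made up.

```c
/* Adds two ints using extended inline assembly where available. */
int add_with_asm(int a, int b)
{
#if (defined(__GNUC__) || defined(__clang__)) && defined(__x86_64__)
    __asm__ ("addl %1, %0"    /* AT&T order: a = a + b */
             : "+r"(a)        /* a is both read and written */
             : "r"(b)         /* b is read only */
             : "cc");         /* the add changes the flags register */
    return a;
#else
    return a + b;             /* portable fallback */
#endif
}
```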

Resources