Standard functions defined in header files or automatically linked? - c

When writing a basic C program:
#include <stdio.h>
int main(void){
printf("program");
}
Is the definition of printf in "stdio.h" or is the printf function automatically linked?

Usually, in stdio.h there's only the prototype; the definition should be inside a library that your object module is automatically linked against (the various msvcrt for VC++ on Windows, libcsomething for gcc on Linux).
By the way, it's <stdio.h>, not "stdio.h".

Usually they are automatically linked, but the compiler is allowed to implement them as it pleases (even by compiler magic).
The #include is still necessary, because it brings the standard functions into scope.

Stricto sensu, the compiler and the linker are different things (and I am not sure that the C standard speaks of compilation & linking, it more abstractly speaks of translation and implementation issues).
For instance, on Linux, you often use gcc to translate your hello.c source file, and gcc is a "driving program" which runs the compiler cc1, the assembler as, the linker ld etc.
On Linux, the <stdio.h> header is an ordinary file. Run gcc -v -Wall -H hello.c -o hello to understand what is happening. The -v option asks gcc to show you the actual programs (cc1 and others) that are used. The -Wall flag asks for all warnings (don't ignore them!). The -H flag asks the compiler to show you the header files which are included.
The header file /usr/include/stdio.h itself #includes other headers. At some point, the declaration of printf is seen, and the compiler parses it and adjusts its state accordingly.
Later, the gcc command would run the linker ld and ask it to link the standard C library (on my system /usr/lib/x86_64-linux-gnu/libc.so). This library contains the [object] code of printf.
I am not sure I understand your question. Reading Wikipedia's pages about compilers, linkers, the Linux kernel and system calls should be useful.
You should not want gcc to automagically link your own additional libraries. That would be confusing (but if you really wanted to do that with GCC, read about GCC spec files).

Related

Using msvcrt with Mingw-w64

I'm trying to use the Mingw-w64 toolchain provided by LH_Mouse, version 10.2.1.
When compiling this small program:
#include <windows.h>
#include <stdio.h>
int wmain()
{
wprintf(L"Hello world!\n");
ExitProcess(0);
}
with the following command:
gcc.exe test.c -o test64.exe -municode -s -Os -Wall -nostdlib -lmsvcrt -lkernel32 -e wmain
I get an error: undefined reference to `__mingw_vfwprintf'
Z:/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/10.2.1/../../../../x86_64-w64-mingw32/bin/ld.exe: R:\Temp\ccDSsj2M.o:test.c:(.text+0x3d): undefined reference to `__mingw_vfwprintf'
collect2.exe: error: ld returned 1 exit status
I previously used Mingw-w64 7.3.0 provided on SourceForge and everything worked fine.
Of course I can use the stdlib (and in this case there is no error) but this increases the size of the executable.
I have read that some changes have been done recently to support UCRT. Is it related?
Is there a way to use msvcrt like before?
Edit: using -lmingwex does not really help. If located after -lmsvcrt, it produces a huge amount of varied "undefined reference" errors. If located before, we got several "undefined reference" to ___chkstk_ms, and one to _pei386_runtime_relocator.
The first ones can be removed by adding -lgcc after -lmingwex. For the last "undefined reference", I have found nothing. Adding -lmingw32 does not help, no matter its location.
So for now, the only possibilities I see are to either modify stdio.h, or create my own declarations.
Well, I have used MinGW for many years and it's not the biggest issue I have encountered. But I am a bit disappointed that it affects such basic functions.
You are compiling your application with the -nostdlib option, which means that standard C functions (like printf, scanf, wprintf, ...) are not available to your application. Therefore you should not include standard C header files (like stdio.h, stdlib.h, ...), and you should not use wprintf in your application.
If you really want to use standard C functions with the -nostdlib option, then you have to write your own C header files and your own implementation of the standard C functions (like wprintf).
What you did was include MinGW's stdio.h header file, which calls a lot of standard C functions as well as internal MinGW functions, and then tell gcc (via -nostdlib) not to link either the standard functions or the internal MinGW functions. That obviously resulted in a lot of linker errors.
Then you tried to link the msvcrt.dll library manually, but that failed too, because this Windows library does not contain the internal MinGW functions used by MinGW's stdio.h header file.
If you really do not want to use the standard C functions from gcc/MinGW (via the -nostdlib option) but rather call them directly from the Windows msvcrt.dll library, then you also have to use the Windows stdio.h header file and not the MinGW one. But beware that the Windows stdio.h header file is not supported by the gcc compiler, so it is not guaranteed to work.
So you should first answer the question: why are you trying to use standard C functions (like wprintf) without the standard C library with the gcc compiler? This sounds really strange; such a thing is common only in freestanding environments or bare-metal applications (like firmware, bootloaders, kernels, ...). Are you working around some other bug which you have not mentioned?
And to answer your question of why it (probably) worked fine with the previous version: older versions of gcc compiled C code according to the C89 standard (with GCC extensions), while newer versions of gcc compile C code according to the C11 standard. The Windows library msvcrt.dll provides only a subset of the standard C functions, and only for the C89 standard. For this reason MinGW provides its own implementations of (or fixes for) some standard C functions in order to be compliant with the C99 and C11 standards. Because the older gcc compiled your application as C89 code, and the manually linked msvcrt.dll library provided a reasonably compatible implementation of printf and wprintf, the result was a final exe file without errors. Now, with the new gcc, you are compiling your application as C11 code and trying to call the C11 wprintf function, but this function in msvcrt.dll is not compatible with C11, and therefore you get a lot of errors.
Windows provides the C11 standard functions in the UCRT libraries. msvcrt.dll provides only a subset of C89.
So you really should not compile your application with -nostdlib if your intention is to produce an executable application that uses standard C functions.
You can instruct the new gcc to compile your C application as C89 with the option -std=c89. This should make the new gcc behave like the old gcc.

Is GCC a compiler or is it a collection of tools for the compilation process?

I'm new to GNU and GCC, sorry if my question sounds dumb.
We know that GCC stands for GNU Compiler Collection, so I think gcc is just a compiler (from a compiler collection).
But I also read that gcc is a compiler driver which contains Pre-processor (cpp), Compiler (cc1), Assembler (as) and Linker (ld).
So it looks like GCC is not a compiler, but then why does the wiki say:
"GCC is a key component of the GNU toolchain and the standard compiler for most projects related to GNU and Linux"
And what does the "1" mean in cc1? Why is it called cc1, and not cc2, cc3, etc.?
In most cases you (a little inaccurately) call gcc the compiler. A reason is that you can run the whole tool chain, at least for simple projects, with a single gcc command. Let's say you have this main.c
// main.c
#include <stdio.h>
int main(void)
{
printf("Hello, world!\n");
}
and compile it with
gcc main.c
Then everything you mentioned (cpp, cc1, as and ld) will be involved in creating the executable a.out. Well, almost. cpp is old, and newer versions of the compiler have the preprocessor integrated.
If you want to see the output of the preprocessor, use gcc -E main.c
As I mentioned, the preprocessor and the compiler are integrated nowadays, so you cannot really run cc1 without the preprocessor. But you can generate an assembly file with gcc -S main.c, which will produce main.s. You can assemble that into an object file with gcc -c main.s, which will produce main.o, and then you can link it with gcc main.o to produce your final a.out
https://renenyffenegger.ch/notes/development/languages/C-C-plus-plus/GCC/cc1/index (Emphasis mine)
cc1 is also referred to as the compiler proper.
cc1 preprocesses a c translation unit and compiles it into assembly code. The assembly code is converted into an object file with the assembler.
Earlier versions of cc1 used /usr/bin/cpp for the preprocessing stage.
https://renenyffenegger.ch/notes/Linux/fhs/usr/bin/cpp (Emphasis mine)
The preprocessor.
cpp is not to be confused with c++.
The preprocessor is concerned with things like
macro expansion
removing comments
trigraph conversion
escaped newline splicing
processing of directives
Newer versions of gcc don't invoke /usr/bin/cpp directly for preprocessing a translation unit. Rather, the preprocessing is performed by the compiler proper cc1.
I would almost consider this a duplicate of "Relationship between cc1 and gcc?", but it's impossible to create cross-site duplicates.
Related: 'Compiler proper' command for C program
And what does the "1" mean in cc1? Why is it called cc1, and not cc2, cc3, etc.?
Don't know. My first guess was that they just added a 1 to cc, which was and is the standard compiler on Unix (excluding Linux) systems. On most Linux systems, cc is just a link to gcc. But another good guess is that it stands for the first phase in compilation. I have not found a good source though.

How does a standard library differ from a user-defined header file (.h) and its implementation file (.c) in C?

How does a standard library like libc.a (a static library), which is included using #include <stdio.h> in our main.c, differ from a user-defined header file (cube.h) included in main.c together with its implementation file (cube.c) in C?
I mean both are header files, but one's implementation is a static library (.a) and the other's is a source file (.c).
You would have the definition (implementation) in, say, cube.c
#include "cube.h"
int cube( int x ) {
return x * x * x;
}
Then we'll put the function declaration in another file. By convention, this is done in a header file, cube.h in this case.
int cube( int x );
We can now call the function from somewhere else, main.c for instance, by using the #include directive (which is part of the C preprocessor) .
#include "cube.h"
#include <stdio.h>
int main() {
int c = cube( 10 );
printf("%d", c);
...
}
Also if I included include guards in cube.h what would happen when I include cube.h in both main.c and cube.c . Where it will get included?
A programming language is not the same as its implementation.
A programming language is a specification (written on paper; you should read n1570, which practically is the C11 standard); it is not software. The C standard specifies a C standard library and defines the headers to be #include-d.
(you could run your C program with a bunch of human slaves and without any computers; that would be very unethical; you could also use some interpreter like Ch and avoid any compiler or object or executable files)
How a standard library like libc.a (static library) which is included using #include <stdio.h> ... differs from a user file cube.c
The above sentence is utterly wrong (and makes no sense). libc.a does not #include -or is not included by- the <stdio.h> header (i.e. file /usr/include/stdio.h and other internal headers e.g. /usr/include/bits/stdio2.h). That inclusion happens when you compile your main.c or cube.c.
In principle, <stdio.h> might not be any file on your computer (e.g. #include <stdio.h> could trigger some magic in your compiler). In practice, the compiler is parsing /usr/include/stdio.h (and other included files) when you #include <stdio.h>.
Some standard headers (notably <setjmp.h>, <stdnoreturn.h>, <stdarg.h>, ...) are specified by the standard but are implemented with the help of special builtins or attributes (that is, "magic" things) of the GCC compiler.
The C standard knows about translation units.
Your GCC compiler processes source files (roughly speaking, implementing translation units) and starts with a preprocessing phase (processing #include and other directives and expanding macros). And gcc runs not only the compiler proper (some cc1) but also the assembler as and the linker ld (read Levine's Linkers and Loaders book for more).
For good reasons, your header file cube.h should practically start with include guards. In your simplistic example they are probably useless (but you should get that habit).
You practically should almost always use gcc -Wall -Wextra -g (to get all warnings and debug info). Read the chapter about Invoking GCC.
You may pass also -v to gcc to understand what programs (e.g. cc1, ld, as) are actually run.
You may pass -H to gcc to understand what source files are included during preprocessing phase. You can also get the preprocessed form of cube.c as the cube.i file obtained with gcc -C -E cube.c > cube.i and later look into that cube.i file with some editor or pager.
You (or gcc) would need (in your example) to compile cube.c (the translation unit given by that file and every header file it #includes) into the cube.o object file (assuming a Linux system). You would also compile main.c into main.o. At last gcc would link cube.o, main.o, some startup files (read about crt0) and the libc.so shared library (implementing the POSIX C standard library specification and a bit more) to produce an executable. Relocatable object files, shared libraries (and static libraries, if you use some) and executables use the ELF file format on Linux.
If you code a C program with several source files (and translation units) you practically should use a build automation tool like GNU make.
If I included include guards in cube.h what would happen when I include cube.h in both main.c and cube.c ?
These should be two different translation units. And you would compile them in several steps. First you compile main.c into main.o using
gcc -Wall -Wextra -g -c main.c
and the above command is producing a main.o object file (with the help of cc1 and as)
Then you compile (another translation unit) cube.c using
gcc -Wall -Wextra -g -c cube.c
hence obtaining cube.o
(notice that adding include guards in your cube.h doesn't change the fact that it would be read twice, once when compiling cube.c and the other time when compiling main.c)
At last you link both object files into yourprog executable using
gcc -Wall -Wextra -g cube.o main.o -o yourprog
(I invite you to try all these commands, and also to try them with gcc -v instead of gcc above).
Notice that gcc -Wall -Wextra -g cube.c main.c -o yourprog is running all the steps above (check with gcc -v). You really should write a Makefile to avoid typing all these commands (and just compile using make, or even better make -j to run compilation in parallel).
Finally you can run your executable using ./yourprog (but read about PATH), but you should learn how to use gdb and try gdb ./yourprog.
Where will cube.h get included?
It will get included at both translation units; once when running gcc -Wall -Wextra -g -c main.c and another time when running gcc -Wall -Wextra -g -c cube.c. Notice that object files (cube.o and main.o) don't contain included headers. Their debug information (in DWARF format) retains that inclusion (e.g. the included path, not the content of the header file).
BTW, look into existing free software projects (and study some of their source code, at least for inspiration). You might look into GNU glibc or musl-libc to understand what a C standard library really contains on Linux (it is built above system calls, listed in syscalls(2), provided and implemented by the Linux kernel). For example printf would ultimately sometimes use write(2) but it is buffering (see fflush(3)).
PS. Perhaps you dream of programming languages (like Ocaml, Go, ...) knowing about modules. C is not one.
TL;DR: the most crucial difference between the C standard library and your library function is that the compiler might intimately know what the standard library functions do without seeing their definition.
First of all, there are 2 kinds of libraries:
1. The C standard library (and possibly other libraries that are part of the C implementation, like libgcc)
2. Any other libraries, which includes all those other libraries in /usr/lib, /lib, etc., or those in your project.
The most crucial difference between a library in category 1 and a library in category 2 is that the compiler is allowed to assume that every identifier you use from a category 1 library behaves exactly as the standard specifies, and it can use this fact to optimize things as it sees fit, even without actually linking against the relevant routine from the standard library or executing it at runtime. Look at this example:
% cat foo.c
#include <math.h>
#include <stdio.h>
int main(void) {
printf("%f\n", sqrt(4.0));
}
We compile it, and run:
% gcc foo.c -Wall -Werror
% ./a.out
2.000000
%
and correct result is printed out.
So what happens when we ask the user for the number:
% cat foo.c
#include <math.h>
#include <stdio.h>
int main(void) {
double n;
scanf("%lf", &n);
printf("%f\n", sqrt(n));
}
then we compile the program:
% gcc foo.c -Wall -Werror
/tmp/ccTipZ5Q.o: In function `main':
foo.c:(.text+0x3d): undefined reference to `sqrt'
collect2: error: ld returned 1 exit status
Surprise, it doesn't link. That is because sqrt is in the math library -lm and you need to link against it to get the definition. But how did it work in the first place? Because the C compiler is free to assume that any function from standard library behaves as if it was as written in the standard, so it can optimize all invocations to it out; this even when we weren't using any -O switches.
Notice that it isn't even necessary to include the header. C11 7.1.4p2 allows this:
Provided that a library function can be declared without reference to any type defined in a header, it is also permissible to declare the function and use it without including its associated header.
Therefore in the following program, the compiler can still assume that the sqrt is the one from the standard library, and the behaviour here is still conforming:
% cat foo.c
int printf(const char * restrict format, ...);
double sqrt(double x);
int main(void) {
printf("%f\n", sqrt(4.0));
}
% gcc foo.c -std=c11 -pedantic -Wall -Werror
% ./a.out
2.000000
If you drop the prototype for sqrt, and compile the program,
int printf(const char * restrict format, ...);
int main(void) {
printf("%f\n", sqrt(4));
}
A conforming C99 or C11 compiler must diagnose a constraint violation for the implicit function declaration. The program is now an invalid program, but it still compiles (the C standard allows that too). GCC still calculates sqrt(4) at compilation time. Notice that we use int here instead of double, so for an ordinary function this wouldn't even work at runtime without a proper declaration: without a prototype, the compiler doesn't know that the int that was passed in must be converted to a double. But it still works.
% gcc foo.c -std=c11 -pedantic
foo.c: In function ‘main’:
foo.c:4:20: warning: implicit declaration of function ‘sqrt’
[-Wimplicit-function-declaration]
printf("%f\n", sqrt(4));
^~~~
foo.c:4:20: warning: incompatible implicit declaration of built-in function ‘sqrt’
foo.c:4:20: note: include ‘<math.h>’ or provide a declaration of ‘sqrt’
% ./a.out
2.000000
This is because an implicit function declaration is one with external linkage, and C standard says this (C11 7.1.3):
[...] All identifiers with external linkage in any of the following subclauses (including the future library directions) and errno are always reserved for use as identifiers with external linkage. [...]
and Appendix J.2. explicitly lists as undefined behaviour:
[...] The program declares or defines a reserved identifier, other than as allowed by 7.1.4 (7.1.3).
I.e. if the program did actually have its own sqrt then the behaviour is simply undefined, because the compiler can assume that the sqrt is the standard-conforming one.

gcc switches - what do these do?

I am new with using gcc and so I have a couple of questions.
What do the following switches accomplish:
gcc -v -lm -lfftw3 code.c
I know that lfftw3 is an .h file used with code.c but why is it part of the command?
I couldn't find out what -lm does in my search. What does it do?
I think I found out -v causes gcc to display programs invoked by it.
-l specifies a library to link against. In this case, you're linking the math library (-lm) and the fftw3 library (-lfftw3). The library will be somewhere in your library path, possibly /usr/lib, and will be named something like libfftw3.so
From GCC's man page:
-v Print (on standard error output) the commands executed to run the
stages of compilation. Also print the version number of the
compiler driver program and of the preprocessor and the compiler
proper.
-l library
Search the library named library when linking. (The second
alternative with the library as a separate argument is only for
POSIX compliance and is not recommended.)
It makes a difference where in the command you write this option;
the linker searches and processes libraries and object files in the
order they are specified. Thus, foo.o -lz bar.o searches library z
after file foo.o but before bar.o. If bar.o refers to functions in
z, those functions may not be loaded.
The linker searches a standard list of directories for the library,
which is actually a file named liblibrary.a. The linker then uses
this file as if it had been specified precisely by name.
The directories searched include several standard system
directories plus any that you specify with -L.
Normally the files found this way are library files---archive files
whose members are object files. The linker handles an archive file
by scanning through it for members which define symbols that have
so far been referenced but not defined. But if the file that is
found is an ordinary object file, it is linked in the usual
fashion. The only difference between using an -l option and
specifying a file name is that -l surrounds library with lib and .a
and searches several directories.
libm is the library that math.h uses, so -lm includes that library. You might want to get a better grasp of the concept of linking. Basically, that switch adds a bunch of compiled code to your program.
-lm links your program with the math library.
-v is the verbose (extra output) flag for the compiler.
-lfftw3 links your program with fftw3 library.
You just include headers by using #include "fftw3.h". If you want to actually include the code associated to it, you need to link it. -l is for that. Linking with libraries.
arguments starting with -l specify a library which is linked into the program. Like Pablo Santa Cruz said, -lm is the standard math library, -lfftw3 is a library for fourier transformation.
Try man when you're trying to learn about a command.
From man gcc
-v Print (on standard error output) the commands executed to run the
stages of compilation. Also print the version number of the
compiler driver program and of the preprocessor and the compiler
proper.
As Pablo stated, -lm links your math library.
-lfftw3 links in a library used for Fourier transforms. The project page, with more info can be found here:
http://www.fftw.org/
The net gist of all these statements is that they compile your code file into a program, which will be given the default name (a.out) and depends on function calls from the math and Fourier transform libraries. The -v option just helps you keep track of the compilation process and diagnose errors should they occur.
In addition to man gcc, which should be the first stop for questions about any command, you can also try the almost standard --help option. Even for commands that don't support it, an unsupported option usually causes the command to print an error containing usage information that should hint at a similar option. In this case, gcc will display a terse (for gcc, it's only about 50 lines long) help summary listing the small number of options that are understood by the gcc program itself rather than passed on to its component programs. After the description of the --help option itself, it lists --target-help and -v --help as ways to get more information about the target architecture and the component programs.
My MinGW GCC 3.4.5 installation generates more than 1200 lines of output from gcc -v --help on Windows XP. I'm pretty sure that doesn't get much smaller in other installations.
It would also be a good idea to read the official manual for GCC. It is also helpful to read the documentation for the linker (ld) and assembler (often gas or just as, but it may be some platform specific assembler as well); aside from a platform-specific assembler, these are documented as part of the binutils collection.
General familiarity with the command line style of Unix tools is also helpful. The idea that a single-character option's value might not be delimited from the option name is a convention that goes back essentially as far as Unix does. The modern convention (promulgated by GNU) that multiple-character option names are introduced by -- instead of just - implies that -lm might be a synonym for -l m (or the pair of options -l -m in some conventions but that happens not to be the case for gcc) but it is probably not a single option named -lm. You will see a similar pattern with the -f options that control specific optimizations or the -W options that control warnings, for example.

Question on gcc compiler commands

I had to compile a small little C program using the following;
gcc sine.c -o sine -lm
I needed the "-lm" because the program included math.h.
Looking this up under compiler commands, man shows it as either -llibrary or -l library.
I could not find any information on what other libraries there are. Apparently -lm is needed for math.h;
what other library options might be needed?
Thanks
-lm means to link the "m" library, which as you said contains math stuff. If you need other libraries for your code, your documentation for those functions will show that.
If it links without errors, you don't need anything else. On some platforms you don't even need to specify -lm, because the math functions are linked automatically along with the standard C library; with gcc on Linux, however, -lm is generally still required.

Resources