clang not recognizing uninitialized pointer found in static library - C

I've found a curiosity when compiling with clang (on a MacBook, if it helps). Suppose I have two files:
blah.c
int *p;
main.c
#include <stdio.h>
extern int *p;
int main() {
    printf("%p\n", p);
    return 0;
}
If I compile with
clang blah.c main.c
everything works out fine. However, if I do
clang -c blah.c
ar rcs libblah.a blah.o
clang main.c libblah.a
I get a linker error:
Undefined symbols for architecture x86_64:
"_p", referenced from:
_main in test-4bf0d6.o
ld: symbol(s) not found for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)
Interestingly, if I initialize the variable in blah.c,
#include <stddef.h>
int *p = NULL;
the error goes away.
Also, compiling with gcc doesn't produce this behavior. What exactly is going on with clang here?
Here's the output from clang --version:
Apple clang version 13.0.0 (clang-1300.0.29.30)
Target: x86_64-apple-darwin21.2.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin

What exactly is going on with clang here?
TL;DR: Your Clang has a bug. You can probably work around it without modifying your code by adding -fno-common to your compile options.
More detail
Both variations of your code are correct, and as far as the C language specification is concerned, they have the same meaning. On my Linux machine, GCC 8.5 and Clang 12 both accept both variations and successfully build working executables, whether blah.o is linked directly or from a library.
But if you use nm to examine the library built with and without the initializer for p, you will likely get a hint about what is happening. Without an initializer, I see (with either compiler) that p has type 'C' (common). With an initializer (to null), I see that it has type 'B' (BSS).
That is reflective of a traditional behavior of Unix C implementations: to merge multiple definitions of the same symbol as long as no more than one is defined with an explicit initializer. That is an extension to standard C, for the language requires that there be exactly one definition of each external symbol that a program references. Among other things, that extension covers the common error of omitting extern from variable declarations in headers, provided that the header does not specify an initializer.
To implement that, the toolchain needs to distinguish between symbols defined with an explicit initializer and those defined without, and that's where (for C) symbol type "common" comes in -- it is used to convey a symbol that is defined, but without an explicit initializer. Typical linker behavior would be to treat all such symbols as undefined ones if one of the objects being linked has a definition for that symbol with a different type, or else to treat all but one of them as undefined, and the other as having type B (implying default initialization).
But the MacOS development toolchain seems to have hatched a bug. In your example, it is erroneously failing to recognize the type C symbol as a viable definition when that appears in a library. The issue might be either in the Clang front end or in the system linker, or in a combination of both. Perhaps this arrived together with Apple's recent tightening (and subsequent re-loosening) of the compiler's default conformance settings.
You can probably work around this issue by adding -fno-common to your C compiler flags. GCC and Clang both accept that for disabling the symbol merging described above, and, at least on my machine, they both implement it by emitting the symbol as type B when it is defined without an explicit initializer, just as if it had been explicitly initialized to a null pointer. Note well, however, that this will break any code that is presently relying on that merging behavior.

Related

How does a standard library differ from a user-defined header file (.h) and its implementation file (.c) in C?

How does a standard library like libc.a (a static library), which is included using #include <stdio.h> in our main.c, differ from a user-defined header file (cube.h) included in main.c with its implementation file (cube.c) in C?
I mean, both are header files, but one's implementation is a static library (.a) and the other's is a source file (.c).
You would have the definition (implementation) in, say, cube.c
#include "cube.h"
int cube( int x ) {
    return x * x * x;
}
Then we'll put the function declaration in another file. By convention, this is done in a header file, cube.h in this case.
int cube( int x );
We can now call the function from somewhere else, main.c for instance, by using the #include directive (which is part of the C preprocessor).
#include "cube.h"
#include <stdio.h>
int main() {
    int c = cube( 10 );
    printf("%d", c);
    ...
}
Also, if I included include guards in cube.h, what would happen when I include cube.h in both main.c and cube.c? Where will it get included?
A programming language is not the same as its implementation.
A programming language is a specification (written on paper; you should read n1570, which in practice is the C11 standard); it is not software. The C standard specifies a C standard library and defines the headers to be #include-d.
(you could run your C program with a bunch of human slaves and without any computers; that would be very unethical; you could also use some interpreter like Ch and avoid any compiler or object or executable files)
How a standard library like libc.a (static library) which is included using #include <stdio.h> ... differs from a user file cube.c
The above sentence is utterly wrong (and makes no sense). libc.a does not #include -or is not included by- the <stdio.h> header (i.e. file /usr/include/stdio.h and other internal headers e.g. /usr/include/bits/stdio2.h). That inclusion happens when you compile your main.c or cube.c.
In principle, <stdio.h> might not be any file on your computer (e.g. #include <stdio.h> could trigger some magic in your compiler). In practice, the compiler is parsing /usr/include/stdio.h (and other included files) when you #include <stdio.h>.
Some standard headers (notably <setjmp.h>, <stdnoreturn.h>, <stdarg.h>, ...) are specified by the standard but are implemented with the help of special builtins or attributes (that is, "magic" things) of the GCC compiler.
The C standard knows about translation units.
Your GCC compiler processes source files (roughly speaking, implementing translation units) and starts with a preprocessing phase (processing #include and other directives and expanding macros). And gcc runs not only the compiler proper (some cc1) but also the assembler as and the linker ld (read Levine's Linkers and Loaders book for more).
For good reasons, your header file cube.h should practically start with include guards. In your simplistic example they are probably useless (but you should get into that habit).
You practically should almost always use gcc -Wall -Wextra -g (to get all warnings and debug info). Read the chapter about Invoking GCC.
You may pass also -v to gcc to understand what programs (e.g. cc1, ld, as) are actually run.
You may pass -H to gcc to understand what source files are included during the preprocessing phase. You can also get the preprocessed form of cube.c as the cube.i file obtained with gcc -C -E cube.c > cube.i and later look into that cube.i file with some editor or pager.
You -or gcc- would need (in your example) to compile cube.c (the translation unit given by that file and every header file it is #include-ing) into the cube.o object file (assuming a Linux system). You would also compile main.c into main.o. At last gcc would link cube.o, main.o, some startup files (read about crt0) and the libc.so shared library (implementing the POSIX C standard library specification and a bit more) to produce an executable. Relocatable object files, shared libraries (and static libraries, if you use some) and executables use the ELF file format on Linux.
If you code a C program with several source files (and translation units) you practically should use a build automation tool like GNU make.
If I included include guards in cube.h what would happen when I include cube.h in both main.c and cube.c ?
These should be two different translation units. And you would compile them in several steps. First you compile main.c into main.o using
gcc -Wall -Wextra -g -c main.c
and the above command produces a main.o object file (with the help of cc1 and as).
Then you compile (another translation unit) cube.c using
gcc -Wall -Wextra -g -c cube.c
hence obtaining cube.o
(notice that adding include guards in your cube.h doesn't change the fact that it would be read twice, once when compiling cube.c and the other time when compiling main.c)
At last you link both object files into yourprog executable using
gcc -Wall -Wextra -g cube.o main.o -o yourprog
(I invite you to try all these commands, and also to try them with gcc -v instead of gcc above).
Notice that gcc -Wall -Wextra -g cube.c main.c -o yourprog is running all the steps above (check with gcc -v). You really should write a Makefile to avoid typing all these commands (and just compile using make, or even better make -j to run compilation in parallel).
Finally you can run your executable using ./yourprog (but read about PATH), but you should learn how to use gdb and try gdb ./yourprog.
Where will cube.h get included?
It will get included in both translation units: once when running gcc -Wall -Wextra -g -c main.c and another time when running gcc -Wall -Wextra -g -c cube.c. Notice that object files (cube.o and main.o) don't contain included headers. Their debug information (in DWARF format) retains that inclusion (e.g. the include path, not the content of the header file).
BTW, look into existing free software projects (and study some of their source code, at least for inspiration). You might look into GNU glibc or musl-libc to understand what a C standard library really contains on Linux (it is built above system calls, listed in syscalls(2), provided and implemented by the Linux kernel). For example printf would ultimately sometimes use write(2) but it is buffering (see fflush(3)).
PS. Perhaps you dream of programming languages (like Ocaml, Go, ...) knowing about modules. C is not one.
TL;DR: the most crucial difference between the C standard library and your library function is that the compiler might intimately know what the standard library functions do without seeing their definition.
First of all, there are 2 kinds of libraries:
The C standard library (and possibly other libraries that are part of the C implementation, like libgcc)
Any other libraries - which includes all those other libraries in /usr/lib, /lib, etc.., or those in your project.
The most crucial difference between a library in category 1 and a library in category 2 is that the compiler is allowed to assume that every single identifier you use from a category 1 library behaves as the standard describes, and it can use this fact to optimize things as it sees fit -- even without actually linking against the relevant routine from the standard library, or executing it at runtime. Look at this example:
% cat foo.c
#include <math.h>
#include <stdio.h>
int main(void) {
    printf("%f\n", sqrt(4.0));
}
We compile it, and run:
% gcc foo.c -Wall -Werror
% ./a.out
2.000000
%
and the correct result is printed out.
So what happens when we ask the user for the number?
% cat foo.c
#include <math.h>
#include <stdio.h>
int main(void) {
    double n;
    scanf("%lf\n", &n);
    printf("%f\n", sqrt(n));
}
then we compile the program:
% gcc foo.c -Wall -Werror
/tmp/ccTipZ5Q.o: In function `main':
foo.c:(.text+0x3d): undefined reference to `sqrt'
collect2: error: ld returned 1 exit status
Surprise, it doesn't link. That is because sqrt is in the math library -lm and you need to link against it to get the definition. But how did it work in the first place? Because the C compiler is free to assume that any function from the standard library behaves as if it were written as in the standard, so it can optimize all invocations of it away -- even though we weren't using any -O switches.
Notice that it isn't even necessary to include the header. C11 7.1.4p2 allows this:
Provided that a library function can be declared without reference to any type defined in a header, it is also permissible to declare the function and use it without including its associated header.
Therefore in the following program, the compiler can still assume that sqrt is the one from the standard library, and the behaviour here is still conforming:
% cat foo.c
int printf(const char * restrict format, ...);
double sqrt(double x);
int main(void) {
    printf("%f\n", sqrt(4.0));
}
% gcc foo.c -std=c11 -pedantic -Wall -Werror
% ./a.out
2.000000
If you drop the prototype for sqrt, and compile the program,
int printf(const char * restrict format, ...);
int main(void) {
    printf("%f\n", sqrt(4));
}
a conforming C99 or C11 compiler must diagnose the constraint violation for an implicit function declaration. The program is now invalid, but it still compiles (the C standard allows that too), and GCC still calculates sqrt(4) at compile time. Notice that we pass an int here instead of a double, so for an ordinary function this wouldn't even work at runtime: without a prototype, the compiler doesn't know that the int argument must be converted to a double. But for sqrt it still works.
% gcc foo.c -std=c11 -pedantic
foo.c: In function ‘main’:
foo.c:4:20: warning: implicit declaration of function ‘sqrt’
[-Wimplicit-function-declaration]
printf("%f\n", sqrt(4));
^~~~
foo.c:4:20: warning: incompatible implicit declaration of built-in function ‘sqrt’
foo.c:4:20: note: include ‘<math.h>’ or provide a declaration of ‘sqrt’
% ./a.out
2.000000
This is because an implicit function declaration is one with external linkage, and C standard says this (C11 7.1.3):
[...] All identifiers with external linkage in any of the following subclauses (including the future library directions) and errno are always reserved for use as identifiers with external linkage. [...]
and Appendix J.2. explicitly lists as undefined behaviour:
[...] The program declares or defines a reserved identifier, other than as allowed by 7.1.4 (7.1.3).
I.e. if the program did actually have its own sqrt then the behaviour is simply undefined, because the compiler can assume that sqrt is the standard-conforming one.

Why does this C program compile with no errors?

I have the two C files, main.c and weird.c:
// main.c
int weird(int *);
int
main(void)
{
    int x, *y;
    y = (int *)7;
    x = weird(y);
    printf("x = %d\n", x);
    return (0);
}
// weird.c
char *weird = "weird";
However, when I run the following:
clang -Wall -Wextra -c main.c
clang -Wall -Wextra -c weird.c
clang -o program main.o weird.o
I do not get any errors. Why is this? Shouldn't there at least be linking errors? Note that I am just talking about compiling the files — not running them. Running gives a segmentation fault.
Should there be a linker error?
The short answer to "Shouldn't there at least be linking errors?" is "There is no guarantee that there'll be a linking error". The C standard doesn't mandate it.
As Raymond Chen noted in a comment:
The language-lawyer answer is that the standard does not require a diagnostic for this error. The practical answer is that C does not type-decorate symbols with external linkage, so the type mismatch goes undetected.
One of the reasons C++ has type-safe linkage is to avoid problems with code analogous to this (though the main reason is to allow for function name overloading — resolving this sort of problem is, perhaps, more a side-effect).
The C standard says:
§6.9 External definitions
¶5 An external definition is an external declaration that is also a definition of a function
(other than an inline definition) or an object. If an identifier declared with external
linkage is used in an expression (other than as part of the operand of a sizeof or
_Alignof operator whose result is an integer constant), somewhere in the entire
program there shall be exactly one external definition for the identifier; otherwise, there
shall be no more than one.
§5.1.1.1 Program structure
¶1 A C program need not all be translated at the same time. The text of the program is kept in units called source files, (or preprocessing files) in this International Standard. A source file together with all the headers and source files included via the preprocessing directive #include is known as a preprocessing translation unit. After preprocessing, a preprocessing translation unit is called a translation unit. Previously translated translation units may be preserved individually or in libraries. The separate translation units of a program communicate by (for example) calls to functions whose identifiers have external linkage, manipulation of objects whose identifiers have external linkage, or manipulation of data files. Translation units may be separately translated and then later linked to produce an executable program.
5.1.1.2 Translation phases
All external object and function references are resolved. Library components are linked to satisfy external references to functions and objects not defined in the current translation. All such translator output is collected into a program image which contains information needed for execution in its execution environment.
The linking is done based on the names of external definitions, not on the types of the objects identified by the name. The onus is on the programmer to ensure that the type of the function or object for each external definition is consistent with the way it is used.
Avoiding the problem
In a comment, I said:
This [question] is an argument for making use of headers to ensure that different parts of a program are coherent. If you never declare an external function in a source file but only in headers, and use the headers wherever the relevant symbol (in this case weird) is used or defined, then the code would not all compile. You could either have a function or a string, but not both. You'd have a header weird.h which contains either extern char *weird; or extern int weird(int *p); (but not both), and both main.c and weird.c would include the header, and only one of them would compile successfully.
To which there came the response:
What could I add to these files to ensure that the error is detected and thrown when main.c is compiled?
You'd create 3 source files. The code shown here is slightly more complicated than you'd normally use because it allows you to use conditional compilation to compile the code with either a function or a variable as the 'external identifier with external linkage' called weird. Normally, you'd select one intended representation for weird and only allow that to be exposed.
weird.h
#ifndef WEIRD_H_INCLUDED
#define WEIRD_H_INCLUDED
#ifdef USE_WEIRD_STRING
extern const char *weird;
#else
extern int weird(int *p);
#endif
#endif /* WEIRD_H_INCLUDED */
main.c
#include <stdio.h>
#include "weird.h"
int main(void)
{
    int x, *y;
    y = (int *)7;
    x = weird(y);
    printf("x = %d\n", x);
    return (0);
}
weird.c
#include "weird.h"
#ifdef USE_WEIRD_STRING
const char *weird = "weird";
#else
int weird(int *p)
{
    if (p == 0)
        return 42;
    else
        return 99;
}
#endif
Valid compilation sequences
gcc -c weird.c
gcc -c main.c
gcc -o program weird.o main.o
gcc -o program -DUSE_WEIRD_FUNCTION main.c weird.c
Both these work because the code is compiled to use the weird() function. The header, in both cases, ensures that the compilations are consistent.
Invalid compilation sequence
gcc -c -DUSE_WEIRD_STRING weird.c
gcc -c main.c
gcc -o program weird.o main.o
This is basically the same as the setup in the question. The weird.c file is compiled to create a string called weird, but the main.c code is compiled expecting to use a function weird(). The linker does link the code, but things go disastrously wrong when the function call in main() is retargeted at the string "weird". The chances are that the memory where it is stored is not executable and the execution fails because of that. Otherwise, the string is interpreted as machine code and it probably doesn't do anything meaningful and leads to a crash. Neither is desirable; neither is guaranteed -- this is a result of invoking undefined behaviour.
If you tried to compile main.c with -DUSE_WEIRD_STRING, the compilation would fail because the header would indicate that weird is a char * and the code would try to use it as a function.
If you replaced the conditional code in weird.c with either the string or the function (unconditionally), then:
Either the compilation would fail if the file contained the function but -DUSE_WEIRD_STRING was set on the command line,
Or the compilation would fail if the file contained the string but you did not set -DUSE_WEIRD_STRING.
Normally, the header would contain an unconditional declaration for weird, either as a function or as a pointer (but without any provision for choosing between them at compile time).
The key point is that the header is included in both source files, so unless the conditional compilation flags make a difference, the compiler can check the code in the source files for consistency with the header, and therefore the two object files stand a chance of working together. If you subvert the checking by setting the compilation flags so that the two source files see different declarations in the header, then you're back to square one.
The header, therefore, declares the interfaces, and the source files are checked to ensure that they adhere to the interface. The headers are the glue that hold the system together. Consequently, any function (or variable) that must be accessed outside its source file should be declared in a header (one header only), and that header should be used in the source file where the function (or variable) is defined, and also in every source file that references the function (or variable). You should not write extern … weird …; in a source file; such declarations belong in a header. All functions (or variables) that are not referenced outside the source file where they're defined should be defined with static. This gives you the maximum chance of spotting problems before you run the program.
You can use GCC to help you. For functions, you can insist on prototypes being in scope before a (non-static) function is referenced or defined (and before a static function is referenced — you can simply define a static function before it is referenced without a separate prototype). I use:
gcc -O3 -g -std=c11 -Wall -Wextra -Wmissing-prototypes -Wstrict-prototypes \
-Wold-style-definition -Wold-style-declaration …
The -Wall and -Wextra imply some, but not all, of the other -W… options, so that isn't a minimal set. And not all versions of GCC support both the -Wold-style-… options. But together, these options ensure that functions have a full prototype declaration before the function is used.
Neither file on its own contains any error that would cause a problem with compilation. main.c correctly declares (but doesn't define) a function called weird and calls it, and weird.c correctly defines a char * named weird. After compilation, main.o contains an unresolved reference to weird and weird.o contains a definition.
Now, here's the fun part: neither .o file necessarily[*] contains anything about the type of weird. Just names and addresses. By the time linking is happening, it's too late to say "hey, main expects this to be an int(*)(int *) and what you provided is actually a char *!" All the linker does is see that the name weird is provided by one object and referenced by another, and fits the pieces together like a jigsaw puzzle. In C, it's entirely the programmer's job to make sure that all compilation units that use a given external symbol declare it with compatible types (not necessarily identical; there are intricate rules as to what are "compatible types"). If you don't, the resulting behavior is undefined and probably wrong.
[*]: actually I can think of several cases where the object files do contain the types — for example, certain kinds of debugging information, or special .o files for link-time optimization. But as far as I know, even when the type information does exist, the linker doesn't use it to warn about things like this.
I am taking a Linux point of view. Details could be different on other OSes.
Of course your latest edit cannot compile, since:
char *weird = "weird";
printf(weird); // wrong, remove this line
contains a statement (the printf) outside of any function. So let's assume you have removed that line. Then clang-3.7 -Wall -Wextra -c main.c gives several warnings:
main.c:7:9: warning: cast to 'int *' from smaller integer type 'int' [-Wint-to-pointer-cast]
w = (int *)x;
^
main.c:8:9: warning: cast to 'int *' from smaller integer type 'int' [-Wint-to-pointer-cast]
y = (int *)z;
^
main.c:7:16: warning: variable 'x' is uninitialized when used here [-Wuninitialized]
w = (int *)x;
^
main.c:6:10: note: initialize the variable 'x' to silence this warning
int x, *y, z, *w;
^
= 0
main.c:8:16: warning: variable 'z' is uninitialized when used here [-Wuninitialized]
y = (int *)z;
^
main.c:6:17: note: initialize the variable 'z' to silence this warning
int x, *y, z, *w;
^
= 0
4 warnings generated.
Technically, I guess that your example is some undefined behavior. In that case the implementation is not required to warn you, and bad things can happen (or not!).
You might get a warning (but I am not sure) if you enable link-time optimization at both compile time and link time, perhaps with
gcc -flto -Wall -Wextra -O -c main.c
gcc -flto -Wall -Wextra -O -c weird.c
gcc -flto -Wall -Wextra -O main.o weird.o -o program
and you could replace gcc by clang if you wish to. I guess that asking for optimization (-O) is relevant.
Actually I am getting no warnings with clang-3.7 -flto but I am getting a warning (at the last link command) with gcc 6:
% gcc -flto -O -Wall -Wextra weird.o main.o -o program
main.c:1:5: error: variable ‘weird’ redeclared as function
int weird(int *);
^
weird.c:3:7: note: previously declared here
char *weird = "weird";
^
lto1: fatal error: errors during merging of translation units
compilation terminated.
(I am explaining for GCC which I know well, including some of its internals; for clang it should be similar)
With -flto the compiler (e.g. lto1 with GCC) is also running for linking (so can optimize then, e.g. inlining calls between translation units). It is using compiler intermediate representations stored in object files (and these representations contain typing information). Without it, the last command (e.g. your clang main.o weird.o -o program) is simply invoking the linker ld with appropriate options (e.g. for crt0 & C standard library)
Current linkers don't keep or handle any type information (pedantically doing some type erasure, mostly done by the compiler itself). They just manage symbols (that is C identifiers) in some simple symbol table and process relocations. Lack of type information in object files (more precisely, the symbol tables known to the linker) is why name mangling is required for C++.
Read more about ELF, e.g. elf(5), the format used by object files and executables.
Replace clang or gcc by clang -v or gcc -v to understand what is happening (so it would show you the underlying cc1 or lto1 or ld processes).
As others explained, you really should share a common #include-d header file (if the C code is not machine generated but hand written). Some C code generators might avoid generating header files and would generate relevant (and identical) declarations in every generated C file.
When you want to build an executable program, you have to link the object files.
But here, you just compiled the sources.
The compiler thought:
"ah, you'll compile weird.c later. Okay, I'll just compile this one."

linking pgi compiled library with gcc linker

I would like to know how to link pgc++-compiled code (blabla.a) with a main program compiled with the GNU C++ compiler (g++).
For the moment, linking with the default GNU C++ linker gives errors like:
undefined reference to `__pgio_initu'
As the previous person already pointed out, PGI supports G++ name mangling when using the pgc++ command. Judging from this output, I'm guessing that you're linking with g++ rather than pgc++. I've had the most success when using pgc++ as the linker so that it finds the PGI libraries. If that's not an option, you can link an executable with pgc++ -dryrun to get the full link line and paste the -L and -l options from there to get the same libraries.
Different C++ compilers use different name-mangling conventions
to generate the names that they expose to the linker, so the member function
name int A::foo(int) will be emitted to the linker by compiler A as one string
of gobbledegook, and by compiler B as quite a different string of gobbledegook,
and the linker has no way of knowing they refer to the same function. Hence
you can't link object files produced by different C++ compilers unless they
employ the same name-mangling convention (and quite possibly not even then: name-mangling
is just one aspect of ABI compatibility.)
That being said, according to this document,
PGC++ supported name-mangling compatibility with g++ three and a half years ago, provided that the PGI C++ compiler was invoked with precisely the command pgc++ or pgcpp --gnu. It may be that the library you are dealing with was not built in that specific way, or perhaps was built with an older PGI C++ compiler.
Anyhow, if g++ compiles the headers of your blabla.a and emits different
symbols from the ones in blabla.a, you can't link g++ code with blabla.a.
You'd need to rebuild blabla.a with g++, which perhaps is not an option.

GCC flags during linking

Is there any situation in which flags such as -ansi, -Wall, and -pedantic might be relevant during the linking part of the process?
What about the -O optimization flags? Are they only relevant during the compile steps or are they also relevant during linking?
Thanks!
In practice, no - but in theory, -ansi is a dialect option, so it could conceivably affect linking. I've seen similar behaviour with older versions of clang that use libc++ or libstdc++, when using C++11 or C++03 respectively. I find it easier to put these flags in the CC variable: CC = gcc -std=c99 or CC = gcc -std=c90 (ansi).
I just invoke C++ (or C) with $CXX or $CC out of habit. And they are passed by default to configure scripts.
I'm not aware of this being an issue with C, as long as the ABI and calling conventions haven't changed. C++, on the other hand, requires changes to the C++ runtime to support new language features. In either case, it's the compiler that invokes the linker with the relevant libraries.
There is link-time optimization in gcc:
-flto[=n]
This option runs the standard link-time optimizer. When invoked
with source code, it generates GIMPLE (one of GCC's internal
representations) and writes it to special ELF sections in the
object file. When the object files are linked together, all the
function bodies are read from these ELF sections and instantiated
as if they had been part of the same translation unit.
To use the link-time optimizer, -flto needs to be specified at
compile time and during the final link.

Standard functions defined in header files or automatically linked?

When writing a basic C program:
#include <stdio.h>
main(){
    printf("program");
}
Is the definition of printf in "stdio.h" or is the printf function automatically linked?
Usually, in stdio.h there's only the prototype; the definition should be inside a library that your object module is automatically linked against (the various msvcrt for VC++ on Windows, libcsomething for gcc on Linux).
By the way, it's <stdio.h>, not "stdio.h".
Usually they are automatically linked, but the compiler is allowed to implement them as it pleases (even by compiler magic).
The #include is still necessary, because it brings the standard functions into scope.
Stricto sensu, the compiler and the linker are different things (and I am not sure that the C standard speaks of compilation & linking, it more abstractly speaks of translation and implementation issues).
For instance, on Linux, you often use gcc to translate your hello.c source file, and gcc is a "driving program" which runs the compiler cc1, the assembler as, the linker ld etc.
On Linux, the <stdio.h> header is an ordinary file. Run gcc -v -Wall -H hello.c -o hello to understand what is happening. The -v option asks gcc to show you the actual programs (cc1 and others) that are used. The -Wall flag asks for all warnings (don't ignore them!). The -H flag asks the compiler to show you the header files which are included.
The header file /usr/include/stdio.h is itself #include-ing other headers. At some point, the declaration of printf is seen, and the compiler parses it and adjusts its state accordingly.
Later, the gcc command runs the linker ld and asks it to link the standard C library (on my system /usr/lib/x86_64-linux-gnu/libc.so). This library contains the [object] code of printf.
I am not sure I understand your question. Reading wikipedia's pages about compilers, linkers, the Linux kernel, and system calls should be useful.
You should not want gcc to link automagically your own additional libraries. That would be confusing. (but if you really wanted to do that with GCC, read about GCC specs file)
