I am a C beginner, so I tried poking around to see how things work.
I read stdio.h and I found this line:
extern int printf (const char *__restrict __format, ...);
So I wrote this code, and I have no idea why it works.
code:
extern int printf (const char *__restrict __format, ...);
main()
{
printf("Hello, World!\n");
}
output:
sh-5.1$ ./a.out
Hello, World!
sh-5.1$
Where did GCC find the function printf? It also works with other compilers.
I am a beginner in C and I find this very strange.
By default, gcc links your program with the C library, libc, which implements printf:
$ ldd ./a.out
linux-vdso.so.1 (0x00007ffd5d7d3000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fdf2d307000)
/lib64/ld-linux-x86-64.so.2 (0x00007fdf2d4f0000)
$ nm -D /lib/x86_64-linux-gnu/libc.so.6 | grep ' printf' | head -1
0000000000056cf0 T printf@@GLIBC_2.2.5
If you build your program with -nolibc you have to satisfy a few symbols on your own (see
Compiling without libc):
$ gcc -nolibc ./1.c
/usr/bin/ld: /usr/lib/gcc/x86_64-linux-gnu/10/../../../x86_64-linux-gnu/Scrt1.o: in function `_start':
(.text+0x12): undefined reference to `__libc_csu_fini'
/usr/bin/ld: (.text+0x19): undefined reference to `__libc_csu_init'
/usr/bin/ld: (.text+0x26): undefined reference to `__libc_start_main'
/usr/bin/ld: /tmp/user/1000/ccCFGFhf.o: in function `main':
1.c:(.text+0xc): undefined reference to `puts'
collect2: error: ld returned 1 exit status
You need to understand the difference between the compile and link phases of program compilation.
In the compilation phase you describe to the compiler the various things you intend to call that may be in this file, in other files or in libraries. This is done using function declarations.
int woodle(char*);
for example. This is what header files are full of.
If the function is in the same file then the compiler will work out how to call it while it compiles that file. But for other functions it leaves a note in the generated code that says
please wire up the woodle function here so I can call it.
This is usually called an import, and there are tools for listing the imports in an object file; the tool name depends on the platform and toolset.
The linker's job is to find those imports and resolve them. It looks at the object files passed on the command line, at libraries named on the command line, and also at the standard libraries that the C standard says should be available to all programs.
In your printf case, the linker found printf in the C standard library, which the linker includes automatically.
BTW, the linker looks for 'exports' from objects and libraries, and there are tools to look at those too. The linker's job is to match each 'import' to an 'export'.
First, realize what the gcc program is. Technically, it is not a compiler, but a compiler driver. A compiler driver is responsible for driving the various other tools which perform compilation-related tasks. Some of the tools are found in PATH, whereas others are in internal compiler directories.
There are various ways to check what the driver is doing. I won't go into much detail about how I made the rest of this post, but briefly:
strace -f -e %process gcc is a Linux-specific way of showing all the programs executed (elsewhere in this answer, I assume Linux when specifying details but it doesn't matter)
gcc -v will dump out various information, but you have to learn what parts actually matter for whatever you are doing.
there exists a "specs" file that controls some of the argument-related stuff the driver does
Now for the actual data:
Here's the tree of processes that gcc might execute:
gcc, the "driver" (input various, output various. Some arguments are handled by the driver itself, but most are passed to the various subprocesses)
(these are repeated for every input file. If -pipe is passed, temporary files are omitted and processes are run in parallel; if --save-temps is passed, intermediate files are preserved):
cc1 -E -lang-asm, the "preprocessor" for assembly code (input .S, output .s - yes, case matters. Only relevant if you're trying to compile separate ASM files that need preprocessing)
cc1 -E, the "preprocessor" for C code (input .c; output .i. Only a separate process if -fno-integrated-cpp is passed, which is rare. Note that the cpp program in PATH is never called, even though it is provided by GCC - rather, it calls this. If -E is passed, the driver stops after this)
cc1, the "compiler" proper (input (usually) .c or (rarely) .i; output .s. If -S is passed, the driver stops after this; if -fsyntax-only is passed, this stage doesn't even complete)
(For other languages, replace cc1 with cc1plus, cc1d, cc1obj, f951, gnat1, etc. Note that the different drivers like g++, gdc, etc. only affect what extra libraries are linked by default)
as, the "assembler" (input .s; output .o. This is looked up in PATH; it is shipped as part of Binutils, not GCC. If -c is passed, the driver stops here)
collect2, the "linker" wrapper (supposedly this has something to do with constructors, and potentially calls ld twice, but in practice I've never seen it. Just think of it as forwarding all its arguments to ld, even if you have constructors normally)
ld, the "linker" proper (input .o or others (assumed to be libraries); output executable or shared library. Like as, this is actually part of Binutils, not GCC, so it is looked up in PATH)
The driver has a lot of logic, so it is important that you use it. Notably, you should never call as or ld yourself, since that will omit arguments that rely on the driver's sense of "exact current platform".
Now, getting to your specific question:
Ignoring irrelevant arguments and simplifying paths, the ld call ends up looking like:
ld -o foo Scrt1.o crti.o crtbeginS.o foo.o -lgcc -lgcc_s -lc -lgcc -lgcc_s crtendS.o crtn.o
The various "crt" loose object files are a mixture of parts of GLIBC and GCC, needed to support the C runtime (note that there are others as well; which are linked depends on arguments). The gcc and gcc_s libraries are needed to run code on the platform at all; they are repeated because they rely on the c library which also relies on them.
Since -lc is passed by default (regardless of language), the printf symbol can be resolved. Notably, -lm, -lrt, -lpthread and others are not passed by default, so other symbols from different parts of the C library will not be resolved unless you pass them manually.
All of this is completely independent of what headers are included.
That your program compiles without a header present means that the compiler settings were lenient. You should still get a warning though. The reason that your program links is that the C standard library, which contains the code of the function printf, is linked automatically. Almost every C program needs it because input and output, or generally interaction with peripherals, which that library handles, are the general means of generating a "side effect", an effect outside the program. The opposite is so uncommon that one must make the wish to not link with it explicit.
So why does your compiler accept a call to a function which has not been declared?
C emerged at a time when programs were much smaller and software development as an engineering discipline didn't formally exist:
Four years later [i.e., in 1978], as a still-junior faculty member, I tried to get my colleagues [...] to create an undergraduate computer-science degree. A senior mechanical engineer of forbidding mien snorted surely not: Harvard had never offered a degree in automotive science, why would we create one in computer science? I waited until I had tenure before trying again (and succeeding) in 1982. -Harry R. Lewis
That was about 10 years after Dennis Ritchie had started to develop this versatile new programming language, the successor to B. The problems involved in creating and maintaining large programs back then were simply not as pressing and not as well-understood as they are, perhaps, today.
Among the many things that help us today, at least in most compiled languages, is strong typing. Every identifier we use is declared with a static type. But the importance and benefits of that were not that obvious in the 1970s, and early C permitted mixing and matching integers and pointers at will. It's all numbers, right? And a function is just a name for a jump address, right? The user will know what to put on the stack, and the function will read it off the stack — I really don't see a problem here ;-). This attitude brought us functions like printf().
After this stage-setting we are slowly getting to the point. Because a function is just a jump address, no function declaration needed to be present in order to call one. The assumed parameters were what you presented, and the presumed return type defaulted to int, which was often correct or at least didn't hurt. And for a long time C kept this backward compatibility. I think the C99 standard forbade the use of undeclared identifiers, and the standard drafts for C11 and C2x both say:
An identifier is a primary expression, provided it has been declared as designating an object (in which case it is an lvalue) or a function (in which case it is a function designator). [91]
Footnote 91 says "Thus, an undeclared identifier is a violation of the syntax." (All emphasis by me.)
All compilers I tried compile it anyway (with a warning), perhaps because some ancient code that still gets compiled frequently depends on it.
Related
I am trying to build a Fortran program, but I get errors about an undefined reference or an unresolved external symbol. I've seen another question about these errors, but the answers there are mostly specific to C++.
What are common causes of these errors when writing in Fortran, and how do I fix/prevent them?
This is a canonical question for a whole class of errors when building Fortran programs. If you've been referred here or had your question closed as a duplicate of this one, you may need to read one or more of several answers. Start with this answer which acts as a table of contents for solutions provided.
A link-time error like these messages can be for many of the same reasons as for more general uses of the linker, rather than just having compiled a Fortran program. Some of these are covered in the linked question about C++ linking and in another answer here: failing to specify the library, or providing them in the wrong order.
However, there are common mistakes in writing a Fortran program that can lead to link errors.
Unsupported intrinsics
If a subroutine reference is intended to refer to an intrinsic subroutine then this can lead to a link-time error if that subroutine intrinsic isn't offered by the compiler: it is taken to be an external subroutine.
implicit none
call unsupported_intrinsic
end
With unsupported_intrinsic not provided by the compiler we may see a linking error message like
undefined reference to `unsupported_intrinsic_'
If we are using a non-standard, or not commonly implemented, intrinsic we can help our compiler report this in a couple of ways:
implicit none
intrinsic :: my_intrinsic
call my_intrinsic
end program
If my_intrinsic isn't a supported intrinsic, then the compiler will complain with a helpful message:
Error: ‘my_intrinsic’ declared INTRINSIC at (1) does not exist
We don't have this problem with intrinsic functions because we are using implicit none:
implicit none
print *, my_intrinsic()
end
Error: Function ‘my_intrinsic’ at (1) has no IMPLICIT type
With some compilers we can use the Fortran 2018 implicit statement to do the same for subroutines
implicit none (external)
call my_intrinsic
end
Error: Procedure ‘my_intrinsic’ called at (1) is not explicitly declared
Note that it may be necessary to specify a compiler option when compiling to request the compiler support non-standard intrinsics (such as gfortran's -fdec-math). Equally, if you are requesting conformance to a particular language revision but using an intrinsic introduced in a later revision it may be necessary to change the conformance request. For example, compiling
intrinsic move_alloc
end
with gfortran and -std=f95:
intrinsic move_alloc
1
Error: The intrinsic ‘move_alloc’ declared INTRINSIC at (1) is not available in the current standard settings but new in Fortran 2003. Use an appropriate ‘-std=*’ option or enable ‘-fall-intrinsics’ in order to use it.
External procedure instead of module procedure
Just as we can try to use a module procedure in a program, but forget to give the object defining it to the linker, we can accidentally tell the compiler to use an external procedure (with a different link symbol name) instead of the module procedure:
module mod
implicit none
contains
integer function sub()
sub = 1
end function
end module
use mod, only :
implicit none
integer :: sub
print *, sub()
end
Or we could forget to use the module at all. Equally, we often see this when mistakenly referring to external procedures instead of sibling module procedures.
Using implicit none (external) can help us when we forget to use a module but this won't capture the case here where we explicitly declare the function to be an external one. We have to be careful, but if we see a link error like
undefined reference to `sub_'
then we should think we've referred to an external procedure sub instead of a module procedure: there's the absence of any name mangling for "module namespaces". That's a strong hint where we should be looking.
Mis-specified binding label
If we are interoperating with C then we can specify the link names of symbols incorrectly quite easily. It's so easy when not using the standard interoperability facility that I won't bother pointing this out. If you see link errors relating to what should be C functions, check carefully.
If using the standard facility there are still ways to trip up. Case sensitivity is one way: link symbol names are case sensitive, but your Fortran compiler has to be told the case if it's not all lower:
interface
function F() bind(c)
use, intrinsic :: iso_c_binding, only : c_int
integer(c_int) :: f
end function f
end interface
print *, F()
end
tells the Fortran compiler to ask the linker about a symbol f, even though we've called it F here. If the symbol really is called F, we need to say that explicitly:
interface
function F() bind(c, name='F')
use, intrinsic :: iso_c_binding, only : c_int
integer(c_int) :: f
end function f
end interface
print *, F()
end
If you see link errors which differ by case, check your binding labels.
The same holds for data objects with binding labels. Also make sure that any data object with linkage association has a matching name in any C definition and link object.
Equally, forgetting to specify C interoperability with bind(c) means the linker may look for a mangled name with a trailing underscore or two (depending on compiler and its options). If you're trying to link against a C function cfunc but the linker complains about cfunc_, check you've said bind(c).
Not providing a main program
A compiler will often assume, unless told otherwise, that it's compiling a main program in order to generate (with the linker) an executable. If we aren't compiling a main program that's not what we want. That is, if we're compiling a module or external subprogram, for later use:
module mod
implicit none
contains
integer function f()
f = 1
end function f
end module
subroutine s()
end subroutine s
we may get a message like
undefined reference to `main'
This means that we need to tell the compiler that we aren't providing a Fortran main program. This will often be with the -c flag, but there will be a different option if trying to build a library object. The compiler documentation will give the appropriate options in this case.
There are many possible ways you can see an error like this. You may see it when trying to build your program (link error) or when running it (load error). Unfortunately, there's rarely a simple way to see which cause of your error you have.
This answer provides a summary of and links to the other answers to help you navigate. You may need to read all answers to solve your problem.
The most common cause of getting a link error like this is that you haven't correctly specified external dependencies or do not put all parts of your code together correctly.
When trying to run your program you may have a missing or incompatible runtime library.
If building fails and you have specified external dependencies, you may have a programming error which means that the compiler is looking for the wrong thing.
Not linking the library (properly)
The most common reason for the undefined reference/unresolved external symbol error is the failure to link the library that provides the symbol (most often a function or subroutine).
For example, when a subroutine from the BLAS library, like DGEMM is used, the library that provides this subroutine must be used in the linking step.
In the most simple use cases, the linking is combined with compilation:
gfortran my_source.f90 -lblas
The -lblas tells the linker (here invoked by the compiler) to link the libblas library. It can be a dynamic library (.so, .dll) or a static library (.a, .lib).
In many cases, it will be necessary to provide the library object defining the subroutine after the object requesting it. So, the linking above may succeed where switching the command line options (gfortran -lblas my_source.f90) may fail.
Note that the name of the library can be different as there are multiple implementations of BLAS (MKL, OpenBLAS, GotoBLAS,...).
But it will always be shortened from lib... to l... as in libopenblas.so and -lopenblas.
If the library is in a location where the linker does not see it, you can use the -L flag to explicitly add the directory for the linker to consider, e.g.:
gfortran my_source.f90 -L/usr/local/lib -lopenblas
You can also try to add the path into some environment variable the linker searches, such as LIBRARY_PATH, e.g.:
export LIBRARY_PATH=$LIBRARY_PATH:/usr/local/lib
When linking and compilation are separated, the library is linked in the linking step:
gfortran -c my_source.f90 -o my_source.o
gfortran my_source.o -lblas
Not providing the module object file when linking
We have a module in a separate file module.f90 and the main program program.f90.
If we do
gfortran -c module.f90
gfortran program.f90 -o program
we receive an undefined reference error for the procedures contained in the module.
If we want to keep separate compilation steps, we need to link the compiled module object file
gfortran -c module.f90
gfortran module.o program.f90 -o program
or, when separating the linking step completely
gfortran -c module.f90
gfortran -c program.f90
gfortran module.o program.o -o program
Problems with the compiler's own libraries
Most Fortran compilers need to link your code against their own libraries. This should happen automatically without you needing to intervene, but this can fail for a number of reasons.
If you are compiling with gfortran, this problem will manifest as undefined references to symbols in libgfortran, which are all named _gfortran_.... These error messages will look like
undefined reference to '_gfortran_...'
The solution to this problem depends on its cause:
The compiler library is not installed
The compiler library should have been installed automatically when you installed the compiler. If the compiler did not install correctly, this may not have happened.
This can be solved by correctly installing the library, by correctly installing the compiler. It may be worth uninstalling the incorrectly installed compiler to avoid conflicts.
N.B. proceed with caution when uninstalling a compiler: if you uninstall the system compiler it may uninstall other necessary programs, and may render other programs unusable.
The compiler cannot find the compiler library
If the compiler library is installed in a non-standard location, the compiler may be unable to find it. You can tell the compiler where the library is using LD_LIBRARY_PATH, e.g. as
export LD_LIBRARY_PATH="/path/to/library:$LD_LIBRARY_PATH"
If you can't find the compiler library yourself, you may need to install a new copy.
The compiler and the compiler library are incompatible
If you have multiple versions of the compiler installed, you probably also have multiple versions of the compiler library installed. These may not be compatible, and the compiler might find the wrong library version.
This can be solved by pointing the compiler to the correct library version, e.g. by using LD_LIBRARY_PATH as above.
The Fortran compiler is not used for linking
If you are linking by invoking the linker directly, or indirectly through a C (or other) compiler, then you may need to tell this compiler/linker to include the Fortran compiler's runtime library. For example, if using GCC's C frontend:
gcc -o program fortran_object.o c_object.o -lgfortran
Code as below:
int main (int argc, char *argv[]) {
long pid = (long)getpid();
long test = pid + 1;
}
I have not included any header files, yet the code still compiles and the program still runs successfully.
Why?
Environment info: Ubuntu 18.04.2 LTS, gcc (Ubuntu 4.8.5-4ubuntu8) 4.8.5
I have not included any header files, yet the code still compiles
and the program still runs successfully. Why?
Why not?
All questions of whether the particular code presented conforms to the language standard notwithstanding, language non-conformance does not imply that compilation or execution must fail. Instead, you get undefined behavior, which can manifest in any manner within the power of the machine to produce, including compiling successfully and running as intended.
In your particular case, however, you are using GCC 4.8.5. The GCC 4.8 series defaults to compiling for the C90 standard, with GNU extensions. C90 allows calls to functions with no in-scope declaration, for compatibility with earlier, pre-standardization practice. That is no longer allowed in C99 or later, but many implementations nevertheless continue to accept it as an extension.
It should be understood, however, that C interprets some argument lists differently when the called function has an in-scope prototype than when it doesn't (which may be the case even for functions that are declared, because not all declarations provide prototypes). You may be able to get away with calling undeclared functions under some circumstances, but it is poor style, and if you do it enough then it will bite you at some point.
Note also that GCC 4 definitely has the ability to emit warnings about usage such as yours, even when compiling in C90 or GNU90 mode. You would be well served to turn on the -Wall option when you compile, and maybe additional warning options as well.
Normally, to use a function from a library you have to link against that library at build time with the -l switch, for example gcc prog.c -lm when using the mathematical functions (from libm).
However, since it's so common, GCC always links the standard C library by default, meaning that doing gcc prog.c is effectively the same as doing gcc prog.c -lc. This is always done whether you include any header or not.
The second thing that makes this work is that, in C90 mode, GCC implicitly declares any function that has not been declared as int func(), that is, returning int and taking unspecified arguments. In this case, that is close enough to the real getpid() function, which takes no arguments and returns a pid_t (an integer type).
If you take a look at your compiled program with the ldd tool to show which dynamic libraries are required, you'll see that the program is linked against libc:
$ ldd prog
linux-vdso.so.1 (0x00007ffede7d0000)
==> libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f0f47018000)
/lib64/ld-linux-x86-64.so.2 (0x00007f0f475b9000)
When your program runs, it asks the dynamic loader where (at which address) to find the getpid function before calling it the first time. The dynamic loader checks the loaded libraries and finds a function with that name in libc, so everything seems to work without a problem.
You can tell GCC to not link against the standard library by default using the -nostdlib compiler switch, but this is not that useful in your case. The real solution is to always treat warnings about implicit function declarations as errors (-Werror=implicit-function-declaration).
If your compiler doesn't give this warning by default, I would suggest upgrading it to a newer version. GCC 4 is definitely not the latest version available for Ubuntu 18.
$ sudo apt update
$ sudo apt upgrade
Without #include<ctype.h>, the following program outputs 1 and 0.
With the include, it outputs 1 and 1.
I am using TDM-GCC 4.9.2 64-bit. I wonder what the implementation of isdigit is in the first case, and why it is able to link.
#include<stdio.h>
//#include<ctype.h>
int main()
{
printf("%d %d\n",isdigit(48),isdigit(48.4));
return 0;
}
By default this version of GCC uses the C90 standard (with GNU extensions (reference)), which allows implicit declarations. An implicit declaration has the form int isdigit(), which says nothing about the parameters, so the compiler cannot convert an argument to the type the function expects; it only applies the default argument promotions. The call isdigit(48.4) therefore passes a double to a library function that expects an int, which means the function is called with the wrong kind of argument at run-time and you have undefined behavior.
When you include the <ctype.h> header file, there is a correct prototype, and then the compiler knows that isdigit takes an int argument and can convert the double literal 48.4 to the integer 48 for the call.
As for why it's linking, it's because while these functions may be implemented as macros, that's not a requirement. What is a requirement is that those functions, at least in the C11 standard (I don't have any older version available at the moment), have to be aware of the current locale which will make their implementation as macros much harder, and much easier as normal library functions. And as the standard library is always linked (unless you tell GCC otherwise) the functions will be available.
First of all, #include directives don't have anything to do with linking. Remember that any line starting with # in C is meant for the preprocessor, not the compiler or the linker.
But that said, the function has to be linked, doesn't it?
Let's go through the stages one at a time.
$ gcc -c -Werror --std=c99 st.c
st.c: In function ‘main’:
st.c:5:22: error: implicit declaration of function ‘isdigit’ [-Werror=implicit-function-declaration]
printf("%d %d\n",isdigit(48),isdigit(48.4));
^
cc1: all warnings being treated as errors
As you can see, gcc's built-in static checking is in action!
Anyway, we will proceed to ignore it...
$ gcc -c --std=c99 st.c
st.c: In function ‘main’:
st.c:5:22: warning: implicit declaration of function ‘isdigit’ [-Wimplicit-function-declaration]
printf("%d %d\n",isdigit(48),isdigit(48.4));
This time it is only a warning. Now we have an object file in the current directory. Let's inspect it...
$ nm st.o
U isdigit
0000000000000000 T main
U printf
As you can see, both printf and isdigit are listed as undefined. So the code has to come from somewhere, doesn't it?
Let's proceed to link it...
$ gcc st.o
$ nm a.out | grep 'printf\|isdigit'
U isdigit@@GLIBC_2.2.5
U printf@@GLIBC_2.2.5
As you can see, the situation has mildly improved: isdigit and printf are no longer the helpless loners they were in st.o. Both functions are now marked as provided by GLIBC_2.2.5. But where is that GLIBC?
Well, let's examine the final executable a bit more...
$ ldd a.out
linux-vdso.so.1 => (0x00007ffe58d70000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fb66f299000)
/lib64/ld-linux-x86-64.so.2 (0x000055b26631d000)
AHA... there is that libc. So it turns out that, though you have not given any instruction, the linker is linking with 3 libraries by default, one of which is libc, which contains both printf and isdigit.
You can see the default behaviour of the linker with:
$ gcc -dumpspecs
*link:
%{!r:--build-id} %{!static:--eh-frame-hdr} %{!mandroid|tno-android-ld:%{m16|m32|mx32:;:-m elf_x86_64} %{m16|m32:-m elf_i386} %{mx32:-m elf32_x86_64} --hash-style=gnu --as-needed %{shared:-shared} %{!shared: %{!static: %{rdynamic:-export-dynamic} %{m16|m32:-dynamic-linker %{muclibc:/lib/ld-uClibc.so.0;:%{mbionic:/system/bin/linker;:/lib/ld-linux.so.2}}} %{m16|m32|mx32:;:-dynamic-linker %{muclibc:/lib/ld64-uClibc.so.0;:%{mbionic:/system/bin/linker64;:/lib64/ld-linux-x86-64.so.2}}} %{mx32:-dynamic-linker %{muclibc:/lib/ldx32-uClibc.so.0;:%{mbionic:/system/bin/linkerx32;:/libx32/ld-linux-x32.so.2}}}} %{static:-static}};:%{m16|m32|mx32:;:-m elf_x86_64} %{m16|m32:-m elf_i386} %{mx32:-m elf32_x86_64} --hash-style=gnu --as-needed %{shared:-shared} %{!shared: %{!static: %{rdynamic:-export-dynamic} %{m16|m32:-dynamic-linker %{muclibc:/lib/ld-uClibc.so.0;:%{mbionic:/system/bin/linker;:/lib/ld-linux.so.2}}} %{m16|m32|mx32:;:-dynamic-linker %{muclibc:/lib/ld64-uClibc.so.0;:%{mbionic:/system/bin/linker64;:/lib64/ld-linux-x86-64.so.2}}} %{mx32:-dynamic-linker %{muclibc:/lib/ldx32-uClibc.so.0;:%{mbionic:/system/bin/linkerx32;:/libx32/ld-linux-x32.so.2}}}} %{static:-static}} %{shared: -Bsymbolic}}
What are the other two libraries?
Well, remember that when you dug into a.out, both printf and isdigit were still shown as U, which means undefined. In other words, there was no memory address associated with these symbols.
In reality this is where the magic lies. These libraries are actually loaded at run time, not combined into the executable at link time as on older systems.
How is it implemented? The usual term is lazy binding. The call does not go straight to the function; it goes through a small stub in the executable's procedure linkage table (PLT). On the first call, the stub hands control to the dynamic loader, which looks up the symbol in the loaded libraries, stores its address in a table (the global offset table, GOT) and jumps to it. Later calls find the address already in the table and go directly to the function.
There are multiple theoretical reasons why doing things lazily is better than doing them beforehand. I guess that's a whole new topic, which we will discuss at some other time.
I'm new to using C.
Header files for libraries like stdlib do not contain the actual implementation code for the functions they provide access to. I understand that the actual source text for libraries like this isn't needed to compile, but how does this work specifically? Are the implementation details for these libraries contained within the compiler?
When you use a function like printf(), including the header file essentially pastes in code for the declaration of the function, but normally the implementation code would need to be available as well.
What form is it stored in? (and where?) Is this compiler specific? Would it be possible to write custom code and reference it in this way without modifying the behavior of the compiler?
I've been searching around and found some info that is relevant but nothing specific. This could be related to not formulating the question well. Thanks.
When you link a program, the compiler will implicitly add some extra libraries to your program:
$ ls
main.c
$ cc -c main.c
$ cc main.o
$ ls
main.c main.o a.out
You can discover the extra libraries a program uses with ldd. Here, there are three libraries linked into the program, and I didn't ask for any of them:
$ ldd a.out
linux-vdso.so => (0x00...)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00...)
/lib64/ld-linux-x86-64.so.2 (0x00...)
So, what happens if we link without these libraries? That's easy enough, just use the linker (ld) directly, instead of calling it through cc. When you use ld, it doesn't give you these extra libraries, so you get an error:
$ ld main.o
Undefined symbols:
"_printf", referenced from:
_main in main.o
The implementation for printf() is stored in the standard C library, which is usually just another library on your system... the only difference is that it gets automatically included into your program when you compile C.
You can use nm to find out what symbols are in a library, so I can use it to find printf() in libc:
$ nm -D /lib/x86_64-linux-gnu/libc-2.13.so | grep printf
...
000000000004e4b0 T printf
...
So, now that we know that libc has printf(), we can use -lc to tell the linker to include libc, and that will get rid of the errors about printf() being missing:
$ ld main.o -lc
There might be some other bits missing, and that's why we use cc to link our programs instead of ld: cc gives us all the default libraries.
When you compile a file, you only need to promise the compiler that certain functions and symbols exist. A function call is compiled into a call [some_address].
The compiler compiles each C file into an object file that just has placeholders for calls to functions declared in the headers. That is, [some_address] does not need to be known at this point.
A number of object files can be collected into what is known as a library.
After that, it is the linker's job to look through all the object files and libraries it knows of, find the real value of every unknown [some_address], and translate the call to, e.g., call 0x1234 if the particular function you are calling starts at 0x1234 (or it might be a relative offset from the current program counter).
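The promise-then-resolve flow described above can be sketched in C. This is a minimal illustration, not part of the original answer: the function name say_hello is hypothetical, and the file deliberately declares puts() by hand instead of including <stdio.h>, to show that the compiler only needs the declaration while the linker supplies the address from the C library.

```c
/* Declaration only: a promise to the compiler that puts() exists somewhere.
 * No header is included, so the compiler never sees an implementation. */
extern int puts(const char *s);

/* say_hello is a hypothetical name used for illustration.
 * The compiler emits a call to puts with an unresolved address;
 * the linker fills that address in later from the C library. */
int say_hello(void)
{
    /* puts returns a non-negative value on success and EOF on failure. */
    return puts("Hello from a hand-declared puts()");
}
```

Compiling this file with cc -c should succeed on its own. The address of puts is only needed once the object file is linked into an executable together with a main() that calls say_hello().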
Stdlib and other library functions are implemented in an object library. A library is a collection of code that is linked with your program. By default, C programs are linked against the standard C library, which is usually provided by the operating system. Most modern operating systems use a dynamic linker. That is, your program is not linked against the library until it is executed. When it is being loaded, the linker-loader combines your code and the library code in your program's address space. Your code can then make a call to the printf() code that is located in that library.
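To make the idea of run-time resolution more concrete, here is a hedged sketch that asks the dynamic loader for a libc function by name while the program is running. It assumes a POSIX system with a dynamically linked C library. The helper name runtime_strlen is hypothetical, and on older glibc releases the program may need to be linked with -ldl. This is not what the loader does for ordinary calls, only an explicit version of the same kind of lookup.

```c
#include <dlfcn.h>   /* dlopen, dlsym, dlclose (POSIX) */
#include <stddef.h>  /* size_t, NULL */

/* Function-pointer type matching the prototype of strlen(). */
typedef size_t (*strlen_fn)(const char *);

/* runtime_strlen is a hypothetical helper: it looks up the address of
 * strlen() at run time and calls it through a pointer.
 * Returns (size_t)-1 if the lookup fails. */
size_t runtime_strlen(const char *text)
{
    /* A NULL filename yields a handle for the running program and the
     * shared libraries already loaded with it, libc among them. */
    void *handle = dlopen(NULL, RTLD_LAZY);
    if (handle == NULL)
        return (size_t)-1;

    /* Ask the loader where "strlen" lives in this process. */
    strlen_fn fn = (strlen_fn)dlsym(handle, "strlen");
    size_t result = (fn != NULL) ? fn(text) : (size_t)-1;

    dlclose(handle);
    return result;
}
```

The symbol is found by its name as a string, which is the same information the object file records for an unresolved call.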
Usually a header file contains only a function prototype, while the implementation is either in a separate source file or in a precompiled library. In the case of stdlib (and other libraries, whether shipped with a compiler or available separately), the precompiled library gets linked at the end of the compilation process. (There's also a distinction between static and dynamic libraries, but I won't go into detail about that.)
The implementations of standard libraries (which are shipped with a compiler) are usually compiler specific: there is a standard describing which functions have to be in a library, but the compiler programmer can decide exactly how to implement them. It is (in theory) possible to exchange these libraries for some of your own without modifying the behaviour of the compiler, though this is not recommended, as you would have to rewrite the entire library to ensure that all functions are present.
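The split between prototype and implementation applies to your own code as well, with no change to the compiler. The sketch below keeps everything in one file so it builds as a single unit, but the comments mark what would normally go in a header and what would go in a separately compiled source file or library. The names woodle, woodle_twice, woodle.h and woodle.c are hypothetical.

```c
/* What a hypothetical header "woodle.h" would contain:
 * a prototype only, no implementation. */
int woodle(const char *text);

/* Code that uses the function needs nothing more than the prototype.
 * At this point the compiler has not seen the body of woodle(). */
int woodle_twice(const char *text)
{
    return 2 * woodle(text);
}

/* What a hypothetical "woodle.c" would contain. It could be compiled
 * separately into woodle.o, or archived into a library, and the linker
 * would connect the call above to this definition. */
int woodle(const char *text)
{
    int count = 0;
    while (text[count] != '\0')
        count++;
    return count;   /* number of characters before the terminator */
}
```

If the definition were moved to its own file, the first part would still compile unchanged. Only the link step would need to be given the extra object file or library.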
I have the following code (stripped down to the bare basics for this question):
#include<stdio.h>
#include<math.h>
double f1(double x)
{
double res = sin(x);
return 0;
}
/* The main function */
int main(void)
{
return 0;
}
When compiling it with gcc test.c I get the following error, and I can't work out why:
/tmp/ccOF5bis.o: In function `f1':
test2.c:(.text+0x13): undefined reference to `sin'
collect2: ld returned 1 exit status
However, I've written various test programs that call sin from within the main function, and those work perfectly. I must be doing something obviously wrong here - but what is it?
You have compiled your code with references to the correct math.h header file, but when you attempted to link it, you forgot the option to include the math library. As a result, you can compile your .o object files, but not build your executable.
As Paul has already mentioned, add "-lm" to link with the math library in the step where you generate your executable.
In the comment, linuxD asks:
Why for sin() in <math.h>, do we need -lm option explicitly; but,
not for printf() in <stdio.h>?
Because both these functions are implemented as part of the "Single UNIX Specification". The history of this standard is interesting, and it is known by many names (IEEE Std 1003.1, X/Open Portability Guide, POSIX, Spec 1170).
This standard specifically separates out the "Standard C library" routines from the "Standard C Mathematical Library" routines (page 277). The pertinent passage is copied below:
Standard C Library
The Standard C library is automatically searched by cc to resolve external references. This library supports all of the interfaces of the Base System, as defined in Volume 1, except for the Math Routines.
Standard C Mathematical Library
This library supports the Base System math routines, as defined in Volume 1. The cc option -lm is used to search this library.
The reasoning behind this separation was influenced by a number of factors:
The UNIX wars led to increasing divergence from the original AT&T UNIX offering.
The number of UNIX platforms added difficulty in developing software for the operating system.
An attempt to define the lowest common denominator for software developers was launched, called 1988 POSIX.
Software developers programmed against the POSIX standard to provide their software on "POSIX compliant systems" in order to reach more platforms.
UNIX customers demanded "POSIX compliant" UNIX systems to run the software.
The pressures that fed into the decision to put the math routines in a separate library (searched with -lm) probably included, but were not limited to:
It seems like a good way to keep the size of libc down, as many applications don't use functions embedded in the math library.
It provides flexibility in math library implementation, where some math libraries rely on larger embedded lookup tables while others may rely on smaller lookup tables (computing solutions).
For truly size constrained applications, it permits reimplementations of the math library in a non-standard way (like pulling out just sin() and putting it in a custom built library).
In any case, the standard now specifies that the math library is not searched automatically the way the Standard C library is, and that's why you must add -lm.
I still have the problem with -lm added:
gcc -Wall -lm mtest.c -o mtest.o
mtest.c: In function 'f1':
mtest.c:6:12: warning: unused variable 'res' [-Wunused-variable]
/tmp/cc925Nmf.o: In function `f1':
mtest.c:(.text+0x19): undefined reference to `sin'
collect2: ld returned 1 exit status
I discovered recently that it does not work if you specify -lm first. The order matters. You must specify -lm last, like this:
gcc mtest.c -o mtest.o -lm
That links without problems.
So, you must specify the libraries at the end.
You need to link with the math library, libm:
$ gcc -Wall foo.c -o foo -lm
I had the same problem, which went away after I listed my library last: gcc prog.c -lm