Modular programming and compiling a C program in linux - c

So I have been studying modular programming, where each file of the program is compiled on its own. Say we have FILE.c and OTHER.c that both belong to the same program. To compile them, we do this at the prompt
$gcc FILE.c OTHER.c -c
Using the -c flag to compile each file into an object file (FILE.o and OTHER.o), and only once that is done do we turn them into an executable using
$gcc FILE.o OTHER.o -o PROGRAM
I know I can do it in one step and skip the middle part, but everywhere I look, people compile to object files first and only then produce the executable, which I can't understand at all.
May I know why?

If you are working on a project with several modules, you don't want to recompile all of them when only some have been modified. The final linking command, however, is always needed. Build tools such as make are used to keep track of which modules need to be compiled or recompiled.

Doing it in two steps separates the compiling and linking phases more clearly.
The output of the compiling step is object (.o) files: machine code, but with the external references of each module (i.e. each .c file) left unresolved. For instance, file.c might use a function defined in other.c, but the compiler doesn't care about that dependency at this stage.
The input of the linking step is the object files, and its output is the executable. The linking step binds the object files together by filling in the blanks (i.e. resolving the dependencies between object files). That's also where libraries are added to your executable.

This part of another answer responds to your question:
You might ask why there are separate compilation and linking steps.
First, it's probably easier to implement things that way. The compiler
does its thing, and the linker does its thing -- by keeping the
functions separate, the complexity of the program is reduced. Another
(more obvious) advantage is that this allows the creation of large
programs without having to redo the compilation step every time a file
is changed. Instead, it is necessary to recompile only those source
files that have changed; for the rest, the object files are
sufficient input for the linker.
Finally, this makes it simple to implement libraries of pre-compiled
code: just create object files and link them just like any other
object file. (The fact that each file is compiled separately from
information contained in other files, incidentally, is called the
"separate compilation model".)
It was too long to put in a comment; please give credit to the original answer.

Related

Building a C program in parallel

I have a medium-size project which consists of many *.c unit files.
In a "normal" compilation exercise, the program is built from its *.o object files, which are listed as prerequisites of the main program in the Makefile recipe. This works well for parallel builds: with make -j, all these object files are compiled in parallel, then linked together at the end. It makes the whole build a lot faster.
However, in other cases, the list of prerequisites is passed as a list of *.c unit files, not *.o object files. The original intention is to not build these object files, as they could pollute the cache.
Now, this could probably be done differently, but for this specific project, the Makefile is an immutable object, so it can't be updated, and we have to live with it.
In this case, using make -j is not effective, as gcc receives the full list of units on a single command line, which means make is no longer able to organize any parallelism.
The only opportunity I have left is to pass flags and parameters to make. I was trying to find one that would make gcc compile a list of units in parallel, internally, but I couldn't find any. Searching around on the Internet, I found conjectures stating that "since make can do parallel builds, gcc doesn't need to replicate this functionality", but no solution.
So the question is: how do I deal with this?
Assuming a compilation line like gcc *.c -o final_exe, which can be altered through standard flags (CC, CFLAGS, CPPFLAGS, LDFLAGS), is there any option available to make it build these units in parallel ?

List "never linked against" source files in a C project

I would like to know if someone is aware of a trick to retrieve the list of files that were (or ideally will be) used by the linker to produce an executable.
Some kind of solution must exist: a static source analyzer, or a hack such as compiling with some unusual flags and analyzing the produced executable with another tool, or forcing the linker to output this information.
The goal is to provide a tool that strips unused source files from a list of source files.
The end goal is to ease the build process by allowing the user to give a list of usable source files. My tool would then compile only the ones actually used by the linker instead of everything.
This would allow some unit tests to remain runnable even if others are broken and can't compile, without asking the user to manually list every test's dependencies in the CMake files.
I am targeting Linux for now, but will be interested in doing the same trick on other OSes in the future. So I would like a cross-platform solution, even though I doubt I will get one :)
Thanks for your help
Edit, because I see that it is confusing. What I mean by
allowing the user to give a list of usable source files
is that, in CMake for example, if you use add_executable(name, sources), then sources is taken as the set of sources to compile and link.
I want to wrap add_executable so that sources is viewed as a set of source files that are usable if necessary.
I'm afraid the idea of detecting never linked source files is not a fruitful one.
To build a program, CMake will not compile a source file if it is not going to link the resulting object
file into the program. I can understand how you might think that this happens, but it doesn't.
CMake already does what you would like it to do and the same is true of every other build automation system going back to
their invention in the 1970s. The fundamental purpose of all
such systems is to ensure that the building of a program
compiles a source file name.(c|cc|f|m|...) if and only if
the object file name.o is going to be linked into the program
and is out of date or does not exist. You can always defeat this purpose by
egregiously bad coding of the project's build spec (CMakeLists.txt, Makefile, SConstruct, etc.),
but with CMake you would need to be really trying to do it, and
trying quite expertly.
If you do not want name.c to be compiled and the object file name.o
linked into a target program, then you do not tell the build system
that name.o or name.c is a prerequisite of the program. Don't tell
it what you know is not true. It is elementary competence not to specify redundant prerequisites of
a build system target.
The linker will link all its input object files into an output
program without question. It does not ask whether or not they are "needed"
by the program because it cannot answer that question. Neither the
linker nor any possible static analysis tool can know what program
you intend to produce when you input some object files for linkage.
It can only be assumed that you intend to produce the program that
results from the linkage of those object files, assuming the
linkage is successful.
If those object files cannot be linked into a program at all, the linker will tell you
that, and why. Otherwise, if you have linked object files that you didn't
intend to link, you can only discover that for yourself, by noticing
the mistake in the build log, or failing that by testing the program and/or inspecting its contents and comparing
your observations with your expectations.
Given your choice of object files for linkage, you can instruct the linker
to detect any code or data sections it extracts from those object files in
which no symbols are defined that can be referenced by the program, and to
throw away all such unreferenced input sections instead of linking them
into the program. This is called link-time "garbage collection". You tell the
linker to do it by passing the option -Wl,--gc-sections in the
gcc linkage command. See this question
to learn how to maximise the collectible garbage. This is what you
can do to remove redundant object code from the linkage.
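A small sketch of link-time garbage collection with the GNU toolchain (file and function names are made up; -ffunction-sections is added at compile time so the linker can discard individual functions):

```shell
cat > gcdemo.c <<'EOF'
#include <stdio.h>
void unused_helper(void) { puts("never called"); }  /* referenced by nothing */
int main(void) { puts("hello"); return 0; }
EOF

# Put each function in its own section, then let the linker drop
# any section in which no referenced symbol is defined.
gcc -ffunction-sections -fdata-sections -Wl,--gc-sections gcdemo.c -o gcdemo

# If the section was collected, grep finds no symbol for it.
nm gcdemo | grep unused_helper || echo "unused_helper was collected"
```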
But you can only collect any garbage from a program in this way if the program
is dynamically opaque, i.e. not linked with the option -rdynamic: then the global symbols defined in the program's static image are not visible
to the OS loader and cannot be referenced from outside its static image by dynamic
libraries in the same process. In this case the linker can determine by static
analysis that a symbol whose definition is not referenced in the program's static
image cannot be referenced at all, since it cannot be referenced dynamically,
and if all symbols defined in an input section are statically unreferenced then
it can garbage-collect the section.
If the program has been linked with -rdynamic then -Wl,--gc-sections will
collect no garbage, and this is quite right, because if the program is
not dynamically opaque then it is impossible for static analysis to determine that anything
defined in its linkage cannot be referenced.
It's noteworthy that although -rdynamic is not a default linkage
option for GCC, it is a default linkage option for CMake projects using
the GCC toolchain. So to use link-time garbage collection in CMake projects
you would always have to override the -rdynamic default. And obviously it would only be
valid to do this if you have determined that it is alright for the program to
be dynamically opaque.

Optimization: Faster compilation

Separating a program into header and source files might yield faster compilation if handled by a smart compilation manager, which is what I am working on.
In theory this should work:
Create a thread for each source file and compile each source file into an object file at once.
Then link those object files together.
The build still has to wait for the slowest source file.
This shouldn't be a problem, as a simple n != nSources counter can be implemented that increments for each .o generated.
I don't think GCC does this by default. When it invokes the assembler,
it seems to process the files one by one.
Is this a valid approach and how could I optimize compilation time even further?
All modern (as in post-2000-ish) makes offer this feature. Both GNU make and the various flavours of BSD make will compile source files in separate jobs with the -j flag. It just requires that you have a makefile, of course. Ninja also does this by default. It vastly speeds up compilation.

Does the linker refer to the main code

Let's assume I have three source files: main.c, a.c and b.c. main.c calls some (not all) of the functions defined in a.c; none of the functions defined in b.c are called by main.c. The main function is in main.c. Then we have a makefile that compiles all three source files and links them to produce an executable, in my case an Intel HEX file.

My question is: does the linker know in which file the main function resides, and use that to determine which parts of the object files to link together? I mean, if the linker produces the executable based only on the recipe of the rule that makes the target, then no matter how many functions are actually called in our application code, the size of the executable will be the same, because the recipe says to link all the object files. For example, we compile the three source files and get three object files: main.o, a.o and b.o (the bigger the object files are, the bigger the executable is).

I know you would say "if you don't want anything from b.c then don't include it in the build", but that means every time I want to change the application (include/exclude modules) I need to change the makefile too. And another thing: how does the linker know which parts of an object file to take? Does it understand the C language? I hope you understand my question; excuse my bad English.
1) Does the linker know in which file the main function resides and knowing that to determine what part of the object files to link together?
Maybe there are options in your toolchain (compiler/linker) to enable this kind of optimization, I mean removing unused functions from the link, but I have big doubts for global functions (it could be possible for static functions).
2) And another thing is how the linker knows what part of the object file to take, does it understand the C language?
The linker may detect that a function or variable is not used by the application (once again, check the available options), but that is not really the objective of this tool. However, if you compile/link some functions as library functions (see the options), you can generate a "library" file and then link this library with the other object files. The functions of the library will then be included by the linker ONLY if they are used.
What I suggest: use compilation flags (#ifdef ...) to include or exclude parts of the code from compilation/linking.
If you want only those functions in the executable that are eventually called from main, use a library of object files.
Basically the smallest unit the linker will extract from a library is the object file. Whatever symbols are in that object file will also be resolved, until all symbols are resolved.
In other words, if none of the symbols in an object file are needed, it won't end up in the result. If at least one symbol is needed, it will get linked in its entirety.
No, the linker does not understand C. Note that a lot of language compilers create object files (C++, FORTRAN, ..., and assemblers). A linker resolves symbols, which are names attached to values.
John Levine has written a book, "Linkers and Loaders", available on the 'net, which will give you an in-depth understanding of linkers, symbols, and object files.

How to link two files in C

I am currently working on a class assignment. The assignment is to create a linked list in C. But because it's a class assignment, we have some constraints:
We have a header file that we cannot modify.
We have a C file that implements the linked list.
We have a C file containing just a main method, used to test the linked list.
The header file has a main method declared, so when I attempt to build the linked list alone it fails because there is no main method. What should I do to resolve the issue? Include the test file (this causes another error)?
I'm assuming your three files are called header.h, main.c, and linkedlist.c
gcc main.c linkedlist.c -o executable
This will create an executable binary called "executable"
Note this also assumes you're using gcc as a compiler.
Like most languages, C supports modules. What I assume your assignment requires is compiling a module. Modules, unlike full programs, lack entry points. Roughly speaking, they are collections of functions, in the manner of a library. When compiling a module, no linking is performed.
You would compile a module like this: gcc -c linkedlist.c. This produces linkedlist.o, which is a module. Try executing this linkedlist.o (after changing its mode to executable, since it won't be by default). The reason you fail to execute this module is, in part, that it is not in the proper format to be executed: among other things, it lacks an entry point (what we know as 'main') and linkage.

Your assignment seems to provide a test main.c. If you wanted to use it, you would only have to link main.c (actually compiled into main.o) with linkedlist.o. To do that, simply type gcc -o name_of_your_program main.c linkedlist.o. What happens here is that your compiler first compiles main.c into a main.o module, then links the two modules together under the name you gave with the -o option; the compiler is smart enough to need nothing explicit about these steps.

If you want to know more about this, try learning about how compilers do what they do. Google can help you with that more than I ever could. Good luck.
