Change access/scope of C variable from static to global at compile time, without changing source code? - c

Let's say, in my C executable project, I use a C library to which I have source access - in fact, the .c files of the library get built as part of the build process for my C executable.
Then, let's say in some of the library .c files, let's say exotic.c, there is something like:
static struct exotic_type all_exotic_array[NUM_EXOTIC_ITEMS][8];
... which the author of the library believes should remain hidden (ergo static) - but which, it turns out, I need a reference to in my code; therefore I'd like to make this variable's symbol publicly accessible.
So, in principle I could just erase the static keyword there and recompile - and I'd get the all_exotic_array as global. However, let's say I also do not want to change the source code of the library.
So in this context, I can intervene in the Makefile, when the individual library .c files get compiled. So:
Is there a compilation switch, with which I could tell the compiler gcc (or linker ld?), while compiling exotic.c, something like: "please ignore the static storage class specifier of the symbol all_exotic_array, and make it available globally"?
If not - is there any other action I could take (maybe after compilation of exotic.c but before linking my executable), to make the symbol all_exotic_array globally available?
I have seen
Access of static variable from one file to another file
Renaming symbols at compile time without changing the code in a cross platform way
Access a global static variable from a .so file without modifying library
Keep all exported symbols when creating a shared library from a static library
... which hint that there may be a possibility to do what I want, however I couldn't find exactly what to do in my use case.

Related

List "never linked against" source file in C project

I would like to know if someone is aware of a trick to retrieve the list of files that had been (or ideally will be) used by linker to produce an executable.
Some kind of solution must exist. A a static source analyzer, or a hack, such as compiling with some weird flags, and analyzing produced executable with another tool, or force the linker to output this information.
The goal is to provide a tool that strip useless source files from a list of source files.
The end goal is to ease the build process, by allowing him to give a list of usable source files. Then my tool would only compile the ones actually used by linker instead of everything.
This would allow for some unit_test to still be runnable even if some others are broken and can't compile, while not asking the user to manually list every test dependencies manually in the cmake.
I am targetting linux for now, but will be intersted in the futur to do the same trick on others OS. So I would like a cross-platform solution, eventhought I doubt I will have it :)
Thanks for your help
Edit because I see that it is confusing, what I mean by
allowing him to give a list of usable source file
is that, in cmake, for exemple. If you use add_executable(name, sources), then sources is considered as the sources to compile and link on.
I want to wrap add_executable, so sources is viewed as a set of usable if necessary sources files.
I'm afraid the idea of detecting never linked source files is not a fruitful one.
To build a program, CMake will not compile a source file if it not going to link the resulting object
file into the program. I can understand how you might think that this happens, but it doesn't happen.
CMake already does what you would like it to do and the same is true of every other build automation system going back to
their invention in the 1970s. The fundamental purpose of all
such systems is to ensure that the building of a program
compiles a source file name.(c|cc|f|m|...) if and only if
the object file name.o is going to be linked into the program
and is out of date or does not exist. You can always defeat this purpose by
egregiously bad coding of the project's build spec (CMakeLists.txt, Makefile, SConstruct, etc.),
but with CMake you would need to be really trying to do it, and
trying quite expertly.
If you do not want name.c to be compiled and the object file name.o
linked into a target program, then you do not tell the build system
that name.o or name.c is a prerequisite of the program. Don't tell
it what you know is not true. It is elementary competence not to specify redundant prerequisites of
a build system target.
The linker will link all its input object files into an output
program without question. It does not ask whether or not they are "needed"
by the program because it cannot answer that question. Neither the
linker nor any possible static analysis tool can know what program
you intend to produce when you input some object files for linkage.
It can only be assumed that you intend to produce the program that
results from the linkage of those object files, assuming the
linkage is successful.
If those object files cannot be linked into a program at all, the linker will tell you
that, and why. Otherwise, if you have linked object files that you didn't
intend to link, you can only discover that for yourself, by noticing
the mistake in the build log, or failing that by testing the program and/or inspecting its contents and comparing
your observations with your expectations.
Given your choice of object files for linkage, you can instruct the linker
to detect any code sections or data sections it extracts those object files in
which no symbols are defined that can be referenced by the program, and to
throw away all such unreferenced input sections instead of linking them
into the program. This is called linktime "garbage collection". You tell the
linker to do it by passing the option -Wl,-gc-sections in the
gcc linkage command. See this question
to learn how to maximise the collectible garbage. This is what you
can do to remove redundant object code from the linkage.
But you can only collect any garbage from a program in this way if the program
is dynamically opaque, i.e not linked with the option -rdynamic
: then the global symbols defined in the program's static image are not visible
to the OS loader and cannot be referenced from outside its static image by dynamic
libraries in the same process. In this case the linker can determine by static
analysis that a symbol whose definition is not referenced in the program's static
image cannot be referenced at all, since it cannot be referenced dynamically,
and if all symbols defined in an input section are statically unreferenced then
it can garbage-collect the section.
If the program has been linked -rdynamic then -Wl,-gc-sections will
collect no garbage, and this is quite right, because if the program is
not dynamically opaque then it is impossible for static analysis to determine that anything
defined in its linkage cannot be referenced.
It's noteworthy that although -rdynamic is not a default linkage
option for GCC, it is a default linkage option for CMake projects using
the GCC toolchain. So to use linktime garbage collection in CMake projects
you would always have to override the -rdynamic default. And obviously it would only be
valid to do this if you have determined that it is alright for the program to
be dynamically opaque.

Undefined reference to error when .so libraries are involved while building the executable

I have a .so library and while building it I didn't get any undefined reference errors.
But now I am building an executable using the .so file and I can see the undefined reference errors during the linking stage as shown below:
xy.so: undefined reference to `MICRO_TO_NANO_ULL'
I referred to this and this but couldn't really understand the dynamic linking.
Also reading from here lead to more confusion:
Dynamic linking is accomplished by placing the name of a sharable
library in the executable image. Actual linking with the library
routines does not occur until the image is run, when both the
executable and the library are placed in memory. An advantage of
dynamic linking is that multiple programs can share a single copy of
the library.
My questions are:
Doesn't dynamic linking means that when I start the executable using
./executable_name then if the linker not able to locate the .so
file on which executable depends it should crash?
What actually is dynamic linking if all external entity references are
resolved while building? Is it some sort of pre-check performed by dynamic linker? Else
dynamic linker can make use of
LD_LIBRARY_PATH to get additional libraries to resolve the undefined
symbols.
Doesn't dynamic linking means that when I start the executable using ./executable_name then if the linker not able to locate the .so file on which executable depends it should crash?
No, linker will exit with "No such file or directory" message.
Imagine it like this:
Your executable stores somewhere a list of shared libraries it needs.
Linker, think of it as a normal program.
Linker opens your executable.
Linker reads this list. For each file.
It tries to find this file in linker paths.
If it finds the file, it "loads" it.
If it can't find the file, it get's errno with No Such file or directory from open() call. And then prints a message that it can't find the library and terminates your executable.
When running the executable, linker dynamically searches for a symbol in shared libraries.
When it can't find a symbol, it prints some message and the executable teerminates.
You can for example set LD_DEBUG=all to inspect what linker is doing. You can also inspect your executable under strace to see all the open calls.
What actually is dynamic linking if all external entity references are resolved while
building?
Dynamic linking is when you run the executable then the linker loads each shared library.
When building, your compiler is kind enough to check for you, that all symbols that you use in your program exist in shared libraries. This is just for safety. You can for example disable this check with ex. --unresolved-symbols=ignore-in-shared-libs.
Is it some sort of pre-check performed by dynamic linker?
Yes.
Else dynamic linker can make use of LD_LIBRARY_PATH to get additional libraries to resolve the undefined symbols.
LD_LIBRARY_PATH is just a comma separated list of paths to search for the shared library. Paths in LD_LIBRARY_PATH are just processed before standard paths. That's all. It doesn't get "additional libraries", it gets additional paths to search for the libraries - libraries stay the same.
It looks like there is a #define missing when you compile your shared library. This error
xy.so: undefined reference to `MICRO_TO_NANO_ULL'
means, that something like
#define MICRO_TO_NANO_ULL(sec) ((unsigned long long)sec * 1000)
should be present, but is not.
The compiler assumes then, that it is an external function and creates an (undefined) symbol for it, while it should be resolved at compile time by a preprocessor macro.
If you include the correct file (grep for the macro name) or put an appropriate definition at the top of your source file, then the linker error should vanish.
Doesn't dynamic linking means that when I start the executable using ./executable_name then if the linker not able to locate the .so file on which executable depends it should crash?
Yes. If the .so file is not present at run-time.
What actually is dynamic linking if all external entity references are resolved while building? Is it some sort of pre-check performed by dynamic linker? Else dynamic linker can make use of LD_LIBRARY_PATH to get additional libraries to resolve the undefined symbols.
It allows for libraries to be upgraded and have applications still be able to use the library, and it reduces memory usage by loading one copy of the library instead of one in every application that uses it.
The linker just creates references to these symbols so that the underlying variables or functions can be used later. It does not link the variables and functions directly into the executable.
The dynamic linker does not pull in any libraries unless those libraries are specified in the executable (or by extension any library the executable depends on). If you provide an LD_LIBRARY_PATH directory with a .so file of an entirely different version than what the executable requires the executable can crash.
In your case, it seems as if a required macro definition has not been found and the compiler is using implicit declaration rules. You can easily fix this by compiling your code with -pedantic -pedantic-errors (assuming you're using GCC).
Doesn't dynamic linking means that when I start the executable using
./executable_name then if the linker not able to locate the .so file
on which executable depends it should crash?
It will crash. The time of crash does depend on the way you call a certain exported function from the .so file.
You might retrieve all exported functions via functions pointers by yourself by using dlopen dlysm and co. In this case the program will crash at first call in case it does not find the exported method.
In case of the executable just calling an exported method from a shared object (part of it's header) the dynamic linker uses the information of the method to be called in it's executable (see second answer) and crashes in case of not finding the lib or a mismatch in symbols.
What actually is dynamic linking if all external entity references are resolved while building? Is it some sort of pre-check performed by dynamic linker? Else dynamic linker can make use of LD_LIBRARY_PATH to get additional libraries to resolve the undefined symbols.
You need to differentiate between the actual linking and the dynamic linking. Starting off with the actual linking:
In case of linking a static library, the actual linking will copy all code from the method to be called inside the executable/library using it.
When linking a dynamic library you will not copy code but symbols. The symbols contain offsets or other information pointing to the acual code in the dynamic library. If the executable does invoke a method which is not exported by the dynamic library though, it will already fail at the actual linking part.
Now when starting your executable, the OS will at some point try to load the shared object into memory where the code actually resides in. If it does not find it or also if it is imcotable (i.e.: the executable was linked to a library using different exports), it might still fail at runtime.

How do the preprocessor's and linker's jobs not interfere? [duplicate]

Okay, until this morning I was thoroughly confused between these terms. I guess I have got the difference, hopefully.
Firstly, the confusion was that since the preprocessor already includes the header files into the code which contains the functions, what library functions does linker link to the object file produced by the assembler/compiler? Part of the confusion primarily arose due to my ignorance about the difference between a header file and a library.
After a bit of googling, and stack-overflowing (is that the term? :p), I gathered that the header file mostly contains the function declarations whereas the actual implementation is in another binary file called the library (I am still not 100% sure about this).
So, suppose in the following program:-
#include<stdio.h>
int main()
{
printf("whatever");
return 0;
}
The preprocessor includes the contents of the header file in the code. The compiler/compiler+assembler does its work, and then finally linker combines this object file with another object file which actually has stored the way printf() works.
Am I correct in my understanding? I may be way off...so could you please help me?
Edit: I have always wondered about the C++ STL. It always confused me as to what it exactly is, a collection of all those headers or what? Now after reading the responses, can I say that STL is an object file/something that resembles an object file?
And also, I thought where I could read the function definitions of functions like pow(), sqrt() etc etc. I would open the header files and not find anything. So, is the function definition in the library in binary unreadable form?
A C source file goes through two main stages, (1) the preprocessor stage where the C source code is processed by the preprocessor utility which looks for preprocessor directives and performs those actions and (2) the compilation stage where the processed C source code is then actually compiled to produce object code files.
The preprocessor is a utility that does text manipulation. It takes as input a file that contains text (usually C source code) that may contain preprocessor directives and outputs a modified version of the file by applying any directives found to the text input to generate a text output.
The file does not have to be C source code because the preprocessor is doing text manipulation. I have seen the C Preprocssor used to extend the make utility by allowing preprossor directives to be included in a make file. The make file with the C Preprocessor directives is run through the C Preprocessor utility and the resulting output then fed into make to do the actual build of the make target.
Libraries and linking
A library is a file that contains object code of various functions. It is a way to package the output from several source files when they are compiled into a single file. Many times a library file is provided along with a header file (include file), typically with a .h file extension. The header file contains the function declarations, global variable declarations, as well as preprocessor directives needed for the library. So to use the library, you include the header file provided using the #include directive and you link with the library file.
A nice feature of a library file is that you are providing the compiled version of your source code and not the source code itself. On the other hand since the library file contains compiled source code, the compiler used to generate the library file must be compatible with the compiler being used to compile your own source code files.
There are two types of libraries commonly used. The first and older type is the static library. The second and more recent is the dynamic library (Dynamic Link Library or DLL in Windows and Shared Library or SO in Linux). The difference between the two is when the functions in the library are bound to the executable that is using the library file.
The linker is a utility that takes the various object files and library files to create the executable file. When an external or global function or variable is used the C source file, a kind of marker is used to tell the linker that the address of the function or variable needs to be inserted at that point.
The C compiler only knows what is in the source it compiles and does not know what is in other files such as object files or libraries. So the linker's job is to take the various object files and libraries and to make the final connections between parts by replacing the markers with actual connections. So a linker is a utility that "links" together the various components, replacing the marker for a global function or variable in the object files and libraries with a link to the actual object code that was generated for that global function or variable.
During the linker stage is when the difference between a static library and a dynamic or shared library becomes evident. When a static library is used, the actual object code of the library is included in the application executable. When a dynamic or shared library is used, the object code included in the application executable is code to find the shared library and connect with it when the application is run.
In some cases the same global function name may be used in several different object files or libraries so the linker will normally just use the first one it comes across and issue a warning about others found.
Summary of compile and link
So the basic process for a compile and link of a C program is:
preprocessor utility generates the C source to be compiled
compiler compiles the C source into object code generating a set of object files
linker links the various object files along with any libraries into executable file
The above is the basic process however when using dynamic libraries it can get more complicated especially if part of the application being generated has dynamic libraries that it is generating.
The loader
There is also the stage of when the application is actually loaded into memory and execution starts. An operating system provides a utility, the loader, which reads the application executable file and loads it into memory and then starts the application running. The starting point or entry point for the executable is specified in the executable file so after the loader reads the executable file into memory it will then start the executable running by jumping to the entry point memory address.
One problem the linker can run into is that sometimes it may come across a marker when it is processing the object code files that requires an actual memory address. However the linker does not know the actual memory address because the address will vary depending on where in memory the application is loaded. So the linker marks that as something for the loader utility to fix when the loader is loading the executable into memory and getting ready to start it running.
With modern CPUs with hardware supported virtual address to physical address mapping or translation, this issue of actual memory address is seldom a problem. Each application is loaded at the same virtual address and the hardware address translation deals with the actual, physical address. However older CPUs or lower cost CPUs such as micro-controllers that are lacking the memory management unit (MMU) hardware support for address translation still need this issue addressed.
Entry points and the C Runtime
A final topic is the C Runtime and the main() and the executable entry point.
The C Runtime is object code provided by the compiler manufacturer that contains the entry point for an application that is written in C. The main() function is the entry point provided by the programmer writing the application however this is not the entry point that the loader sees. The main() function is called by the C Runtime after the application is started and the C Runtime code sets up the environment for the application.
The C Runtime is not the Standard C Library. The purpose of the C Runtime is to manage the runtime environment for the application. The purpose of the Standard C Library is to provide a set of useful utility functions so that a programmer doesn't have to create their own.
When the loader loads the application and jumps to the entry point provided by the C Runtime, the C Runtime then performs the various initialization actions needed to provide the proper runtime environment for the application. Once this is done, the C Runtime then calls the main() function so that the code created by the application developer or programmer starts to run. When the main() returns or when the exit() function is called, the C Runtime performs any actions needed to clean up and close out the application.
This is an extremely common source of confusion. I think the easiest way to understand what's happening is to take a simple example. Forget about libraries for a moment and consider the following:
$ cat main.c
extern int foo( void );
int main( void ) { return foo(); }
$ cat foo.c
int foo( void ) { return 0; }
$ cc -c main.c
$ cc -c foo.c
$ cc main.o foo.o
The declaration extern int foo( void ) is performing exactly the same function as the header file of a library. foo.o is performing the function of the library. If you understand this example, and why neither cc main.c nor cc main.o work, then you understand the difference between header files and libraries.
Yes, almost correct. Except that the linker does not links object files, but also libraries - in thise case, it's the C standard library (libc) is what is linked to your object file. The rest of your assumptions appear to be true about the compilation stages + difference between a header and a library.

Does the linker refer to the main code

Let assume I am having three source files main.c, a.c and b.c. In the main.c are called some of the functions (not all) that are defined in a.c. None of the functions defined in b.c are called (used) by main.c. In main.c is the main function. Then we have a makefile that compiles all the source files(main.c, a.c and b.c) and then links them to produce executable file, in my case intel hex file. My question is: Does the linker know in which file the main function resides and knowing that to determine what part of the object files to link together? I mean if the linker produces the exe file based only on the recipe of the rule to make the target then no matter how many functions are called in our application code the size of the executable will be the same because the recipe says to link all the object files. For example we compile the three source files and we get three object files: main.o a.o and b.o (the bigger the object files are, the bigger the exe file is). I know you would say if you dont want anything from the b.c then do not include it in the build. But it means that every time I want to change the application (include/exclide modules) I need to change the makefile too. And another thing is how the linker knows what part of the object file to take, does it understand the C language? I hope you understand my question, excuse my bad English.
1) Does the linker know in which file the main function resides and knowing that to determine what part of the object files to link together?
Maybe there are options of your toolchain (compiler/linker) to enable this kind of optimizations, I mean removing unused functions from link, but I have big doubt for global functions (could be possible for static functions).
2) And another thing is how the linker knows what part of the object file to take, does it understand the C language?
Linker may detect if a function or variable is not used by the application (once again, check the available options), but it is not really the objective of this tool. However if you compile/link some functions as library functions (see options), you can generate a "library" file and then link this library with other object files. The functions of the library will then be included by the linker ONLY if they are used.
What I suggest: use compilation flags (#ifdef...) to include or exclude parts of code from compilation/link.
If you want only those functions in the executable that are eventually called from main, use a library of object files.
Basically the smallest unit the linker will extract from a library is the object file. Whatever symbols are in that object file will also be resolved, until all symbols are resolved.
In other words, if none of the symbols in an object file are needed, it won't end up in the result. If at least one symbol is needed, it will get linked in its entirety.
No, the linker does not understand C. Note that a lot of language compilers create object files (C++, FORTRAN, ..., and assemblers). A linker resolves symbols, which are names attached to values.
John Levine has written a book, "Linkers and Loaders", available on the 'net, which will give you an in-depth understanding of linkers, symbols, and object files.

Determining the space used by library functions in C

I have a bunch of code I need to analyse that I don't know how to do. I have a pile of code that here and there are using math functions from a header file I have included called math.h that came with my IDE. I am being asked to see how much space is used to include this. Specifically is the compiler including all of the library functions or just the ones we use. There is no object file being created. So I think the library code is being compiled into the individual files. Any ideas of a slick way to figure this out? I can't just comment out the includes because then the code wont complie and I won't know a size and if I comment out all the lines that use math functions it is not really representative.
You can use the objdump command to see the individual symbols inside your object files and the space they require.
Note that unless you're doing static compilation, library methods aren't generally copied into your produced binary, but only referenced (and brought in via the dynamic linker when your program is loaded).
As math.h is part of the standard C library, a copy of that library is basically guaranteed to always be in memory, so the additional memory and space requirements on dynamic linking are minimal. (During static linking, all symbols which aren't directly required by your program are discarded, and math functions don't tend to be very big, so usage should be fairly minimal there too).
The code in the header file is being complied into the object file of the .c you are using if your header has the definitions of the functions and just being referenced to if they are simply the declarations. The linker will then find a definition for each symbol and place it in your executable if you are using dynamic linking the OS will pull in the definition at run time.

Resources