Compile coreutils as shared objects? - c

I am trying to compile GNU Coreutils as a set of shared libraries, instead of a set of executables. I thought that make would let me pass in a flag to tell it to do this, but from what I can see I would actually have to modify the configure.ac and Makefile.am in order to make this work. I would prefer not to do this, since this potentially introduces bugs into code that I can currently rely on being bug-free. I tried manually turning the object files into so's by entering:
make CFLAGS='-fpic'
gcc -shared -o ls.so coreutils/src/ls.o
I am able to create the .so file, but there seem to be a number of flags that I am missing, and I don't see any way to get a list of the flags needed to compile and link the code (even though the build system clearly has this information). The only thing I can think to do is go through all of the linker errors by hand and try to figure out which flags are missing, but I'm hoping there is a less tedious way of getting what I want.
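One way to recover those flags without touching configure.ac or Makefile.am is to ask make to print the full command lines it runs; the coreutils build is automake-based, so make V=1 turns the quiet "CC src/ls.o" lines back into complete gcc invocations (a rough sketch, the log file name is just an example):
$ ./configure
$ make V=1 2>&1 | tee build.log   # every compile and link command, with all its flags, ends up in build.log
$ grep 'src/ls' build.log         # pick out the flags actually used to build and link ls
Those are the flags to reuse when turning the objects into shared objects.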

Not sure what you're trying to do, but related to this is the ./configure --enable-single-binary option which links all objects to a single executable.
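For reference, that option is passed at configure time; roughly (the exact accepted values may vary between coreutils releases):
$ ./configure --enable-single-binary=symlinks
$ make
# produces one 'coreutils' binary, with ls, cp, etc. installed as symlinks to it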

Related

Clang Coverage Mapping with -fprofile-instr-generate

I'm trying to generate a code coverage report for some individually compiled tests on Ubuntu 18.04 and running into a strange problem. If I compile with clang 5.0.0 and pass it the -fprofile-instr-generate and -fcoverage-mapping flags, it does actually work and running the compiled test causes it to spit out a .profraw file I can process with llvm-cov and turn into a coverage report. However, the only coverage that it seems to track is that of the test harness and any code directly included via #include, completely ignoring code that was linked. As an example, if a header file is included via #include it will show coverage for that file but not for the associated .c file that the actual called code is stored in. From some research it seemed like the solution was to add -fprofile-instr-generate to the linking step as well, but this didn't change the result at all. It's terrible practice (and unsustainable) to manually #include any files I want to see the coverage of, but I don't see another option that lets me view the coverage of linked code (specifically, the coverage of the function I'm calling in the test harness and anything that function calls). Is this a problem that other people have had, and does anyone know how to solve it?
You need to add the flags when compiling the units to be covered. The compiler instruments the object code; put simply, it adds code that counts how often each piece of the control flow is executed.
This means that you need to recompile the unit under test before you link it to your test harness. The test harness itself, by contrast, does not need to be compiled with the flags, because you are most probably not interested in its coverage.
There is no need to #include the unit's source into another source file; as you found, that is too hard to maintain.
However, the linker needs the flags, too.
For the production build you will compile your units without the flags.
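As a sketch of how that looks on the command line (the file names unit.c and test_harness.c are just placeholders):
$ clang -fprofile-instr-generate -fcoverage-mapping -c unit.c -o unit.o   # unit under test: instrumented
$ clang -c test_harness.c -o test_harness.o                               # harness: no instrumentation needed
$ clang -fprofile-instr-generate test_harness.o unit.o -o test            # the link step also gets the flag
$ ./test                                                                  # writes default.profraw
$ llvm-profdata merge -sparse default.profraw -o test.profdata
$ llvm-cov report ./test -instr-profile=test.profdata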
I ran into this problem. It turns out that I needed the --object option for each binary.
I got a report that covers both the library and the executable with this command:
llvm-cov report --object mylib.so --object my_executable --instr-profile somewhere.profdata
Fixed my own issue! Turns out the test was linking against a .so shared library that contained my target functions, so the problem was solved by doing two things. First, instrumenting the step that compiles the .o objects into the shared library gave visibility into the target functions. Second, using llvm-cov show/report with the .profdata generated by the test, and passing it the shared library as the binary instead of the test binary, allowed it to report coverage information for the target functions. Lesson learned: always check whether your instrumented test is pulling in shared libraries, and make sure those shared libraries are compiled with the right flags to instrument them!
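For the shared-library case described above, the whole flow looks roughly like this (the library and test names are hypothetical):
$ clang -fprofile-instr-generate -fcoverage-mapping -fPIC -shared mylib.c -o libmylib.so   # instrument the library itself
$ clang -fprofile-instr-generate test.c -L. -lmylib -o my_test                             # link the test against it
$ LD_LIBRARY_PATH=. LLVM_PROFILE_FILE=test.profraw ./my_test                               # run; emits test.profraw
$ llvm-profdata merge -sparse test.profraw -o test.profdata
$ llvm-cov report --object ./libmylib.so --object ./my_test --instr-profile test.profdata  # coverage for both binaries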

gcc generating a list of function and variable names

I am looking for a way to get a list of all the function and variable names for a set of C source files. I know that gcc breaks down those elements when compiling and linking, so is there a way to piggyback on that process? Or any other tool that could do the same thing?
EDIT: It's mostly because I am curious. I have been playing with things like make auto-dependency generation and graphing include trees, and I would like to be able to get more stats on the source files. It seems like something that would already exist, but I haven't found any options or flags for it.
If you are only interested in the names of global functions and variables, you might (assuming you are on Linux) use the nm or objdump utilities on the ELF binary executable or object files.
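For example, something along these lines (foo.c is a placeholder):
$ gcc -c foo.c -o foo.o
$ nm -g --defined-only foo.o   # global symbols defined in this object: T = function, D/B = data
$ objdump -t foo.o             # full symbol table, one line per symbol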
Otherwise, you might customize the GCC compiler (assuming you have a recent version, e.g. 5.3 or 6 at least) through plugins. You could code them directly in C++, or you might consider using GCC MELT, a Lisp-like domain-specific language for customizing GCC. Perhaps even the findgimple mode of GCC MELT might be enough.
If you consider extending GCC, be aware that you'll need to spend a significant amount of time (perhaps months) understanding its internal representations (notably Generic Trees & Gimple) in detail. The links and slides on the GCC MELT documentation page might be useful.
Your main issue is that you probably need to understand most of the details about GCC internal representations, and that takes time!
Also, the details of GCC internals are slightly changing from one version of GCC to the next one.
You could also consider (instead of working inside GCC) using the Clang/LLVM framework (but learning that also takes a lot of time). Maybe you might also look into Frama-C or Coccinelle.
Another approach might be to compile with debug info and parse DWARF information.
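A quick way to see what the DWARF route yields (readelf is part of GNU binutils; the tag names are standard DWARF, and the grep context is just enough to catch the nearby DW_AT_name attribute):
$ gcc -g -c foo.c -o foo.o
$ readelf --debug-dump=info foo.o | grep -E 'DW_TAG_(subprogram|variable)' -A 3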
PS. My point is that your problem is probably much more difficult than you believe. Parsing C is not that simple... You might spend months or even years working on that, and the details could be target-processor, system, and compiler specific...

Modular programming and compiling a C program in linux

So I have been studying modular programming, where each file of the program is compiled separately. Say we have FILE.c and OTHER.c, which both belong to the same program. To compile them, we do this at the prompt:
$ gcc FILE.c OTHER.c -c
The -c flag compiles each file into a .o object file (FILE.o and OTHER.o), and only once that has happened do we link them into an executable using
$ gcc FILE.o OTHER.o -o program
I know I can do it in one step and skip the middle part, but everywhere I look people produce the object files first and only then build the executable, and I can't understand why.
May I know why?
If you are working on a project with several modules, you don't want to recompile all modules if only some of them have been modified. The final linking command is, however, always needed. Build tools such as make are used to keep track of which modules need to be compiled or recompiled.
Doing it in two steps separates the compiling and linking phases more clearly.
The output of the compiling step is object (.o) files, which are machine code but are missing the external references of each module (i.e. each .c file); for instance, file.c might use a function defined in other.c, but the compiler doesn't care about that dependency at that point.
The input of the linking step is the object files, and its output is the executable. The linking step binds the object files together by filling in the blanks (i.e. resolving the dependencies between object files). That's also where you add libraries to your executable.
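As a concrete illustration of why the two-step build pays off, using the file names from the question: if only FILE.c has changed, you recompile just that one translation unit and relink against the untouched OTHER.o.
$ gcc -c FILE.c                    # recompile only the file that changed
$ gcc FILE.o OTHER.o -o program    # relink; OTHER.o is reused as-is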
This part of another answer responds to your question:
You might ask why there are separate compilation and linking steps. First, it's probably easier to implement things that way. The compiler does its thing, and the linker does its thing -- by keeping the functions separate, the complexity of the program is reduced. Another (more obvious) advantage is that this allows the creation of large programs without having to redo the compilation step every time a file is changed. Instead, using so-called "conditional compilation", it is necessary to compile only those source files that have changed; for the rest, the object files are sufficient input for the linker. Finally, this makes it simple to implement libraries of pre-compiled code: just create object files and link them just like any other object file. (The fact that each file is compiled separately from information contained in other files, incidentally, is called the "separate compilation model".)
It was too long to put in a comment; please give credit to the original answer.

Using Perl's ExtUtils::MakeMaker, how can I compile an executable using the same settings as my XS module?

Given a Perl XS module using a C library, assume there is a Makefile.PL that is set up correctly so that all header and library locations, compiler and linker flags etc work correctly.
Now, let's say I want to include a small C program with said XS module that uses the same underlying C library. What is the correct, platform independent way to specify the target executable so that it gets built with the same settings and flags?
If I do the following
sub MY::postamble {
    # %Config comes from the core Config module (use Config;)
    return <<FRAG;
target$Config{exe_ext}: target$Config{obj_ext}
target$Config{obj_ext}: target.c
FRAG
}
I don't get the include locations, lists of libraries, etc. that I set up in the arguments to WriteMakefile. If I start writing rules manually, I have to account for at least make, dmake, and nmake. And I can't figure out a straightforward way to specify libraries to link against if I use ExtUtils::CBuilder.
I must be missing something. I would appreciate it if you can point it out.
EU::MM does not know how to create an executable. If you need to do this, you will need to take into account various compiler toolchains. I've done this at least once, but I don't claim it's completely portable (just portable enough).
A long term solution would be a proper compilation framework, I've been working on that but it's fairly non-trivial.
You might want to look at Dist::Zilla. I use it because I can never remember how to do this either. With a fairly small and boilerplate-heavy dist.ini file to tell it which plugins to use, it'll generate all the right build machinery for you, including the whole of the necessary Makefile.PL. Think of it as a makefile-maker-maker. I use it for one of my smaller and more broken C-based CPAN modules: https://github.com/morungos/p5-Algorithm-Diff-Fast, which doesn't work especially well as a module but has a decent build. The magic needed is in inc/DiffMakeMaker.pm.
However, the short answer is to look at the extra settings this component drops into Makefile.PL:
LIBS => [''],
INC => '-I.',
OBJECT => '$(O_FILES)', # link all the C files too
Just adding these options to your Makefile.PL should get it to build a Makefile that handles C as well as XS, and links them together for Perl.
Because, although EU::MM doesn't know how to create an executable, most of what it does is generate a Makefile. And that Makefile is more than happy to make whatever is needed to glue the C and Perl together properly.

How to relink existing shared library with extra object file

I have an existing Linux shared object file (shared library) which has been stripped. I want to produce a new version of the library with some additional functions included. I had hoped that something like the following would work, but it does not:
ld -o newlib.so newfuncs.o --whole-archive existinglib.so
I do not have the source to the existing library. I could get it but getting a full build environment with the necessary dependencies in place would be a lot of effort for what seems like a simple problem.
You might like to try coming at this from a slightly different angle by loading your object using preloading.
Set LD_PRELOAD to point at your new code, built as a shared object rather than a plain relocatable .o (the run-time linker will only preload shared libraries):
$ gcc -shared -o /my/newfuncs/dir/newfuncs.so newfuncs.o   # newfuncs.o must have been compiled with -fPIC
$ export LD_PRELOAD=/my/newfuncs/dir/newfuncs.so
and make the existing library findable in the usual way via your LD_LIBRARY_PATH.
This will then instruct the run time linker to search for needed symbols in your object before looking in objects located in your LD_LIBRARY_PATH.
BTW, from your preloaded object you can still call the function that would have been called had you not specified an LD_PRELOAD object (typically by looking it up with dlsym(RTLD_NEXT, ...)). This is why the technique is sometimes called interposing.
This is how many memory-allocation analysis tools work. They interpose versions of malloc() and free() that record each call before handing it on to the real system malloc and free to perform the actual memory management.
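A minimal usage sketch, assuming your wrappers live in a hypothetical myalloc.c and call the real functions via dlsym(RTLD_NEXT, ...):
$ gcc -shared -fPIC -o libmyalloc.so myalloc.c -ldl   # build the interposer as a shared object
$ LD_PRELOAD=$PWD/libmyalloc.so ./target_program      # every malloc/free in target_program now goes through the wrappers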
There are plenty of tutorials on the web about using LD_PRELOAD. One of the original and best is still "Building library interposers for fun and profit". Though it was written nine years ago and targets Solaris, it is still an excellent resource.
HTH and good luck.
Completely untested idea:
# mv existinglib.so existinglib-real.so
# ld -o existinglib.so -shared newfuncs.o existinglib-real.so
The dynamic linker, when loading a program that expects to load existinglib.so, will find your version and also load existinglib-real.so, on which it depends. It doesn't quite achieve the stated goal of your question, but to the program loading the library it should look as if it does.
Shared libraries are not archives, they are really more like executables. You cannot therefore just stuff extra content into them in the same way you could for a .a static library.
Short answer: exactly what you asked for can't be done.
Longer answer: depending on why you want to do this, and exactly how existinglib.so has been linked, you may get close to desired behavior. In addition to LD_PRELOAD and renaming existinglib.so already mentioned, you could also use a linker script (cat /lib/libc.so to see what I mean).
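A sketch of the linker-script variant mentioned above (untested; note it only helps when linking new programs against the library at build time, since the run-time loader does not read linker scripts):
$ mv existinglib.so existinglib-real.so
$ cat > existinglib.so <<'EOF'
/* GNU ld script: pull in the real stripped library plus the extra object */
INPUT ( existinglib-real.so newfuncs.o )
EOF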
