What does the GNU ld --undefined option do? - c

Can somebody explain what the GNU ld option --undefined does?
I'm working on a LiteOS project. The app is linked with many -u options, for example -utask_shellcmd.
The GNU linker manual for --undefined=symbol simply says:
Force symbol to be entered in the output file as an undefined symbol. Doing this may, for example, trigger linking of additional modules from standard libraries.
So the symbol will be included in the output file as an undefined symbol. What if the symbol is already defined in one of the linked object files? If it really is undefined, when will the linking of additional modules happen, and how?

The -u option is only relevant when archive (.a) libraries are involved (maybe also .so libraries with --as-needed in effect).
Unlike individual object files (.o) on the linker command line, which are all linked in the order in which they appear, object files from an archive library are only linked when they satisfy one or more undefined symbol references at the point where the archive appears in the link command line order. Once one .o file from the archive is pulled into the link, the process is repeated recursively, so that if it introduces more undefined symbol references, other object files from the same (or later) archives will be pulled in to satisfy them.
Using -u allows you to cause a particular symbol (and, indirectly, all dependencies of the object file it was defined in) to be pulled into the link. Of course you could just put all .o files on the command line directly, without using any archive libraries, but by using libraries you can avoid linking unused object files (this is especially useful if large parts of the code may be unused depending on build-time-configurable settings in other files!) while getting the ones you need.
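To make the pull-in rule concrete, here is a toy model in Python. It is only a sketch of the behaviour described above, not how ld is implemented. Apart from task_shellcmd, every file and symbol name in it is made up for illustration.

```python
# Toy model of how a linker treats objects versus archive members.
# Each object file is modelled as (defined symbols, referenced symbols).
# All file and symbol names except task_shellcmd are hypothetical.

def link(inputs, forced_undefined=()):
    """Process inputs left to right and return the names of linked objects.

    inputs: list of ("obj", name, defs, refs) or ("lib", name, members),
    where members is a list of (name, defs, refs) tuples.
    forced_undefined: symbols seeded as undefined, like ld's -u option.
    """
    defined, undefined, linked = set(), set(forced_undefined), []

    def add(name, defs, refs):
        linked.append(name)
        defined.update(defs)
        undefined.difference_update(defs)
        undefined.update(set(refs) - defined)

    for item in inputs:
        if item[0] == "obj":
            _, name, defs, refs = item
            add(name, defs, refs)      # plain objects are always linked
        else:
            _, _, members = item
            progress = True
            while progress:            # rescan until nothing new is pulled
                progress = False
                for name, defs, refs in members:
                    if name not in linked and undefined & set(defs):
                        add(name, defs, refs)
                        progress = True
    return linked

libshell = ("lib", "libshell.a", [
    ("task_cmd.o", {"task_shellcmd"}, {"print_table"}),
    ("table.o", {"print_table"}, set()),
    ("unused.o", {"never_called"}, set()),
])
main = ("obj", "main.o", {"main"}, set())

print(link([main, libshell]))                     # ['main.o']
print(link([main, libshell], ["task_shellcmd"]))  # ['main.o', 'task_cmd.o', 'table.o']
```

Without the forced symbol nothing in the archive is wanted, so only main.o is linked. Seeding task_shellcmd pulls in task_cmd.o and, through its own reference, table.o, while unused.o stays out. If a plain object on the command line already defined task_shellcmd, the model would remove it from the undefined set and pull nothing from the archive for it.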

Related

What does it mean to 'add an index to an archive file'?

My C textbook creates an archive using the -s command line option which adds an index to the following archive file.
ar -rcs libfile.a file1.o file2.o
However, I don't understand what an index is or what its purpose is.
Converting comments into an answer.
There's a long, complicated history tied up in part with the ranlib program. Basically, for the linker to be able to scan a library efficiently, some program — either ar itself or ranlib — adds a special 'index' file which identifies the symbols defined in the library and the object file within the archive that defines each of those symbols. Many versions of ar add that automatically if any of the saved files is an object file. Some, it seems, require prodding with the -s option. Others don't add the index file at all and rely on ranlib to do the job.
The ar on macOS documents:
-s — Write an object-file index into the archive, or update an existing one, even if no other change is made to the archive. You may use this modifier flag either with any operation, or alone. Running ar s on an archive is equivalent to running ranlib on it.
I've not needed to use this option explicitly on macOS for a long time (nor have I run ranlib) — I think things changed sometime in the middle of the first decade of the millennium.
Don't object files already come with symbol tables? Wouldn't it seem redundant to add another symbol table to the start of the archive file?
Each object file contains information about what's in that one object file (and information about referenced objects as well as defined ones); the archive index contains much simpler information about which of the many object files in the archive defines each symbol, so that the linker doesn't have to scan each object file in the archive separately.
So, would it be correct to say that the index at the start of the archive is just a giant symbol table which replaces the individual symbol tables in each object file so the linker has an easier job?
Not replaces — augments. It allows the linker to identify which object file(s) to pull into the linked executable. The linker then processes the object files just as it does any other object file, checking for definitions and references, marking newly defined references as satisfied and identifying previously unused references that are not yet defined. But the linker doesn't have to read every file in the archive to work out which symbols are defined by the file — it knows from the index file which ones are defined.
To clarify: the index allows the linker to find the specific object file that defines a symbol, rather than having to scan every object file to resolve it?
Yes, that’s right.
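A small Python sketch of the idea, treating the index as a map from symbol name to archive member. The member names are taken from the ar command in the question; the symbols assigned to them are invented, and the real on-disk index format is not modelled here.

```python
# Toy model of an archive index (what "ar s" or ranlib produces).
# The symbols assigned to file1.o and file2.o are hypothetical.

archive = {
    "file1.o": {"defined": {"parse", "tokenize"}, "referenced": {"malloc"}},
    "file2.o": {"defined": {"render"}, "referenced": {"parse"}},
}

def build_index(archive):
    """Map each defined symbol to the member that defines it."""
    index = {}
    for member, symtab in archive.items():
        for symbol in symtab["defined"]:
            index.setdefault(symbol, member)   # first definition wins
    return index

def find_without_index(archive, symbol):
    """Scan every member's own symbol table until one defines the symbol."""
    for member, symtab in archive.items():
        if symbol in symtab["defined"]:
            return member
    return None

index = build_index(archive)
print(index.get("render"))                    # file2.o, one dictionary lookup
print(find_without_index(archive, "render"))  # file2.o, after scanning members
print(index.get("malloc"))                    # None: referenced, not defined here
```

The per-member symbol tables are still there and still used once a member is pulled in. The index only records definitions, which is why it augments those tables rather than replacing them.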

Removing symbols from `.a`s

I'm compiling a C++ static library using g++ via CMake. I want to remove symbols relating to the internal implementation so they don't show up in nm. (See here and here for the same with shared libraries.)
This answer tells you how to do it on iOS, and I'm trying to understand what happens under the hood so I can replicate on Linux. They invoke ld with:
-r/--relocatable: Generate relocatable output, i.e., generate an output file that can in turn serve as input to ld.
-x/--discard-all: Delete all local symbols.
AFAICS the -r glues all the modules into one module, and then the -x removes symbols only used inside that module. Is that right?
It's not clear how the linker 'knows' which symbols will be exported externally? Does it rely on __attribute__((visibility("hidden/default"))) as in the .so case?
Edit: clearly I'm confused... I thought CMake invoked ld to link the .o files into the .a. Googled + clarified above.
Question still stands: how do I modify the build process to exclude most symbols?

GNU linker get objects through INPUT

I want to use the suggestion made in the GNU linker manual, page 40, i.e. INPUT (subr.o), thus specifying object members in a script file.
Eventually I want to specify all the object members of my program that the linker has to use.
The script file looks like this (only the file-related part is shown)
SEARCH_DIR(../lib)
STARTUP(boot.o)
ENTRY(_start)
GROUP (libkernel.a libkflib.a)
INPUT (
lowcore.o
init.o
kfalloc.o
kflog.o
kfprintf.o
)
The linker replies with:
attempt to open boot.o failed
attempt to open ../lib/boot.o failed
m68k-rtems4.11-ld: cannot find boot.o
I have specified the search path, the libraries and a list of object members; the object members are definitely in the libraries.
I was expecting the linker to look for object members in the working directory and, if they are not there, to use the search path and libraries.
Obviously there is something wrong but I cannot figure it out.
Suggestions are welcome in order to achieve the desired way of linking: specifying all objects to link and no more than that.
Thanks
Ben
You have misread the manual:
INPUT(file file ...)
The INPUT command directs the linker to include the named files in the link,
as though they were named on the command line.
...
In case a sysroot prefix is configured, and the filename starts with the / character,
and the script being processed was located inside the sysroot prefix, the
filename will be looked for in the sysroot prefix. Otherwise, the linker will
try to open the file in the current directory. If it is not found, the linker
will search through the archive library search path. See the description of
-L in Command Line Options.
If you use INPUT (-lfile), ld will transform the name to libfile.a, as with
the command line argument -l.
...
GROUP(file file ...)
The GROUP command is like INPUT, except that the named files should all be
archives, and they are searched repeatedly until no new undefined references
are created.
STARTUP(filename)
The STARTUP command is just like the INPUT command, except that filename will
become the first input file to be linked, as though it were specified first on the command line.
Or perhaps you have read some other documentation that is in error.
You have got the wrong impression that the command GROUP(libboo.a...) is complementary to INPUT(foo.o...)
and STARTUP(bar.o), with the effect that the files bar.o, foo.o... will be searched for in the archives
libboo.a... and, if found, will be extracted from the archives and added to the linkage.
INPUT(foo.o) has the same effect as specifying foo.o on the linker
commandline except that if not found in the current directory it will be searched
for in the same way that a static library specified with the -l option would be, with any SEARCH_DIR(path)
commands in the linker script having the same influence as commandline options
-Lpath.
STARTUP(bar.o) has the same effect as specifying bar.o first in the linker
commandline.
GROUP(libboo.a...) has the same effect as the commandline options
--start-group -lboo... --end-group
again with any SEARCH_DIR(path) commands acting like -Lpath.
INPUT(foo.o) and STARTUP(bar.o) are unconnected with GROUP(libboo.a...), just as in the commandline options:
bar.o --start-group -lboo... --end-group foo.o
bar.o and foo.o are input files unconnected with --start-group -lboo... --end-group.
The linker never looks inside static libraries for an input object file that it otherwise
fails to find.
So this command:
INPUT (
lowcore.o
init.o
kfalloc.o
kflog.o
kfprintf.o
)
in your linker script requires the linker to find those object files in the current
directory or in ../lib, and they are not there. Similarly for STARTUP(boot.o). Hence
the linkage errors.
Those object files aren't there, but static libraries libkernel.a and libkflib.a are,
and you tell us they contain all those object files as members. In that case
you simply don't need:
INPUT (
lowcore.o
init.o
kfalloc.o
kflog.o
kfprintf.o
)
because the linker will search static libraries to find object files that
provide definitions of any symbols for which definitions are called for
by references within object files that it has already linked. You don't have to
tell it to.
But to give the linker any reason to search a static library at all you
must have told it to link some object file that refers to some undefined symbol(s)
before it considers the static library.
That's the reason why, on the commandline, you must place object files before the
libraries to which they refer and it is why, in a self-contained linker script, if you
add libraries to the linkage with GROUP(libboo.a...) or with INPUT (-lboo...),
you must also specify an object file to be linked first, with a STARTUP
command.
So while your failing INPUT command is unnecessary and can just be
deleted, your failing STARTUP(boot.o) must remain, to initiate
any linkage. And must not fail.
To make STARTUP(boot.o) succeed, you have to put the object file boot.o
itself in a place where the script will tell the linker to look for it,
either the current directory or ../lib. Presumably, the current directory.
And when you do that, it becomes pointless to have boot.o remain as a member
of one of your static libraries, since it can't be linked from within one. Best
delete it from whichever library you have it in. Leaving it there is similar
to putting the main function of a program into one of the libraries you
will link it with.
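To illustrate why a first object is needed and what GROUP adds, here is a toy model in Python. The object names are the ones from your script; which symbols each one defines or references is purely my assumption, and the model only sketches the search rule described above, not ld's real implementation.

```python
# Toy model: archives are searched only for symbols that are already undefined.
# Object names come from the question; the symbols they define are assumptions.

def pull_from(members, defined, undefined, linked):
    """Pull every member that defines a wanted symbol; report whether any was added."""
    added = False
    progress = True
    while progress:
        progress = False
        for name, defs, refs in members:
            if name not in linked and undefined & defs:
                linked.append(name)
                defined.update(defs)
                undefined.difference_update(defs)
                undefined.update(refs - defined)
                added = progress = True
    return added

def link_script(startup, archives, group):
    """Link the startup object, then search the archives once or as a GROUP."""
    name, defs, refs = startup
    linked, defined, undefined = [name], set(defs), set(refs) - set(defs)
    if group:
        # GROUP / --start-group: rescan all archives until nothing new is pulled
        while any([pull_from(m, defined, undefined, linked) for m in archives]):
            pass
    else:
        for members in archives:   # each archive is searched once, in order
            pull_from(members, defined, undefined, linked)
    return linked, sorted(undefined)

libkernel = [("init.o", {"kmain"}, {"kflog"}),
             ("lowcore.o", {"lowcore_panic"}, set())]
libkflib = [("kflog.o", {"kflog"}, {"lowcore_panic"})]
boot = ("boot.o", {"_start"}, {"kmain"})

print(link_script(boot, [libkernel, libkflib], group=False))
# (['boot.o', 'init.o', 'kflog.o'], ['lowcore_panic'])
print(link_script(boot, [libkernel, libkflib], group=True))
# (['boot.o', 'init.o', 'kflog.o', 'lowcore.o'], [])
print(link_script(("empty.o", set(), set()), [libkernel, libkflib], group=True))
# (['empty.o'], [])
```

In this model a single pass leaves lowcore_panic unresolved, because libkernel.a was already searched before kflog.o asked for it; the group rescan picks it up. A startup object that references nothing gives the archives no reason to contribute any member at all.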

Why do I need to manually link the C runtime library when creating an EXE out of static libraries without any object files?

I'm pretty new to working with libraries and I'm in the process of trying to understand some specifics regarding static libraries and object files.
Summary
The behavior I'm noticing is that I can link several objects to make an executable with no problem, but if I take an intermediate step of combining those objects into static libraries, I cannot link those static libraries to make an executable without additionally specifying the needed C Run-time library in the link command.
Also, for the record, I'm doing the compiling/linking with Visual Studio 2010 from the command line. More details of the process I'm following are below.
First, let's say I have four source files in a project: main.c, util1.c, util2.c, and util3.c.
What works
1. I can compile these sources with the following command:
cl -c main.c util1.c util2.c util3.c
As a result, I now have four object files: main.obj, util1.obj, util2.obj, and util3.obj. These object files each contain a DEFAULTLIB statement intended to inform the linker that it should additionally check the static C Run-time library libcmt.lib for any unresolved external dependencies in these object files when linking them.
2. I can create an executable named "app_objs.exe" by linking these objects with the following command:
link -out:app_objs.exe main.obj util1.obj util2.obj util3.obj
As mentioned in step 1, the linker used the runtime library due to the compiler's step of adding a default library statement to the objects.
Where I'm confused
Let's say I want to have an intermediate step of combining these objects into static libraries, and then linking those resulting LIB files to create my executable. First, I can create these libraries with the following commands:
link -lib -out:main.lib main.obj
link -lib -out:util.lib util1.obj util2.obj util3.obj
Now, my original thought was that I could simply link these libraries and have the same executable that I created in step 2 of "What works". I tried the following command and received linker error LNK1561, which states that an entry point needs to be specified:
link -out:app_libs.exe main.lib util.lib
From Microsoft's documentation, it is evident that linking libraries without any object files may require entry points to be specified, so I modified the command to set the subsystem as "console" to specify that the executable is intended to be a console application (which seems to imply certain entry points, thereby resolving that error):
link -out:app_libs.exe -subsystem:console main.lib util.lib
Unfortunately, now I get a linker error stating that mainCRTStartup is an unresolved external symbol. I understand that this is defined in the C runtime library, so I can resolve this issue by manually specifying that I want to link against libcmt.lib, and this gives me a functioning executable:
link -out:app_libs.exe -subsystem:console main.lib util.lib libcmt.lib
What I'm not understanding is why the default library info that the compiler placed in each object file couldn't be used to resolve the dependency on libcmt.lib. If I can link object files without explicitly stating I want libcmt.lib, and I created static libraries that are containers for the object files, why can't I link those static libraries without having to explicitly state that I want libcmt.lib? Is this just the way things are, or is there some way I could create the static libraries so that the linker will know to check for unresolved symbols in the runtime library?
Thanks for your help. If I have some fundamentally incorrect ideas here, I'd love suggestions on good references to learn all of this correctly.
Well the answer to your misunderstanding is that .lib files are often a product in themselves, and the compiler can't make those assumptions safely. That's what "external" is for.
If I produce binaries for someone's platform because its users are totally helpless, and they want/need static linkage, I have to give them foo.h and libfoo.lib without tying them to a specific runtime entry point. They may very well have defined their own entry point already for their final product, whether DLL or EXE.
You either want the runtime, or you want your own .obj that contains your entry point. Be warned that declaring and defining mainCRTStartup on your own may mean you're not executing important instructions for the target platform.

linking object files and linking static libraries containing these files

Hello Stack Overflow Community,
I am working on a C project to interleave multiple C programs into one binary, which can run the interleaved programs as threads or forks for benchmarking purposes.
To do that, I run make in each program folder of the desired programs and prelink all .o files with "ld -r" into one new .o file. After that I add a specifically named function to each of these "big" .o files, which does nothing but run the main() of each program, providing argc and argv. Then I use objcopy to localize every global symbol except the undefined ones and the one for my specific function that runs main(). Finally, I link these manipulated .o files together with my program, which runs the specifically named functions as threads, as forks, or one after another.
Now to my question/problem:
I ran into a problem with static libs. I was using ffmpeg for testing, and it builds static libs such as libavcodec, libavutil and so on. Unfortunately, "ld -r" does not link .a files. So I tried to extract these libs with ar -x and then link the extracted .o files into the "big" new .o file in the way mentioned above. But it did not work, because libavcodec and libavutil both include the file ff_inverse.o. That is obviously not a problem when I just build ffmpeg, which links these static libraries. But still, both libraries include it, so there must be a mechanism that chooses which ff_inverse.o to use and link. So my question: how does this work? Where is the difference?
The way ld does it with normal linking is to prioritize the libraries. Libraries listed first on the command line are searched first, and only if symbols are still unresolved does it move on to the next library. When linking static libraries, it ignores the name of each .o member; only the symbols each member defines matter. You may want to emulate that behavior by extracting the libraries in the order they are linked and skipping members whose symbols are already defined.
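A rough Python sketch of that priority rule, as a toy model only. The archive and member names for ff_inverse.o come from the question; the other members and all symbol names are assumptions made for illustration, and references between members are left out to keep it short.

```python
# Toy model: the first archive in link order that defines a wanted symbol
# supplies it. Symbol names and the members decode.o / mem.o are hypothetical.

def resolve(wanted, archives):
    """Return (archive, member) pairs pulled in to satisfy `wanted`.

    archives: ordered list of (archive_name, [(member_name, defined_symbols)]).
    Members are chosen by the symbols they define; member names are never compared.
    """
    undefined, pulled = set(wanted), []
    for archive_name, members in archives:
        for member_name, defs in members:
            if undefined & defs:
                pulled.append((archive_name, member_name))
                undefined -= defs
    return pulled

libavcodec = ("libavcodec.a", [("ff_inverse.o", {"ff_inverse"}),
                               ("decode.o", {"decode_frame"})])
libavutil = ("libavutil.a", [("ff_inverse.o", {"ff_inverse"}),
                             ("mem.o", {"av_malloc"})])
wanted = {"decode_frame", "ff_inverse", "av_malloc"}

print(resolve(wanted, [libavcodec, libavutil]))
# [('libavcodec.a', 'ff_inverse.o'), ('libavcodec.a', 'decode.o'), ('libavutil.a', 'mem.o')]
print(resolve(wanted, [libavutil, libavcodec]))
# [('libavutil.a', 'ff_inverse.o'), ('libavutil.a', 'mem.o'), ('libavcodec.a', 'decode.o')]
```

Only one copy of ff_inverse.o is ever pulled, because by the time the second archive is searched the symbol is no longer undefined. Feeding both extracted copies to "ld -r" skips that selection step, which is where the two cases differ.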
