How does static linking without an archive file work?

I have two files.
main.c:
    void swap();
    int buf[2] = {1, 2};
    int main()
    {
        swap();
        return 0;
    }
swap.c:
    extern int buf[];
    int* bufp0 = &buf[0]; /* .data */
    int* bufp1; /* .bss */
    void swap()
    {
        int temp;
        bufp1 = &buf[1];
        temp = *bufp0;
        *bufp0 = *bufp1;
        *bufp1 = temp;
    }
Here are 2 excerpts from a book
During this scan, the linker maintains a set E of relocatable object files that
will be merged to form the executable, a set U of unresolved symbols
(i.e., symbols referred to, but not yet defined), and a set D of symbols that
have been defined in previous input files.
Initially, E, U, and D are empty.
For each input file f on the command line, the linker determines if f is an
object file or an archive. If f is an object file, the linker adds f to E, updates
U and D to reflect the symbol definitions and references in f, and proceeds
to the next input file.
If f is an archive, the linker attempts to match the unresolved symbols in U
against the symbols defined by the members of the archive. If some archive
member, m, defines a symbol that resolves a reference in U, then m is added
to E, and the linker updates U and D to reflect the symbol definitions and
references in m. This process iterates over the member object files in the
archive until a fixed point is reached where U and D no longer change. At
this point, any member object files not contained in E are simply discarded
and the linker proceeds to the next input file.
If U is nonempty when the linker finishes scanning the input files on the
command line, it prints an error and terminates. Otherwise, it merges and
relocates the object files in E to build the output executable file.
The general rule for libraries is to place them at the end of the command
line. If the members of the different libraries are independent, in that no member
references a symbol defined by another member, then the libraries can be placed
at the end of the command line in any order.
If, on the other hand, the libraries are not independent, then they must be
ordered so that for each symbol s that is referenced externally by a member of an
archive, at least one definition of s follows a reference to s on the command line.
For example, suppose foo.c calls functions in libx.a and libz.a that call functions
in liby.a. Then libx.a and libz.a must precede liby.a on the command
line:
unix> gcc foo.c libx.a libz.a liby.a
I ran the following command to compile and statically link the two files (without creating any archive file):
gcc -static -o main.o main.c swap.c
I expected the command to fail because main.c and swap.c each reference a symbol defined in the other. Contrary to my expectation, it succeeded. I expected it to succeed only if I passed main.c again at the end of the command line.
How did the linker resolve the references in both files in this case? Does a linker work differently when it statically links multiple object files rather than archive files? My guess is that the linker circled back to main.c to resolve the reference to buf in swap.c.

Generally, the default behavior of linkers is to include everything from each object module file given to it and to take from a library only the object modules that define references the linker is aware of when processing the library.
So, when the linker processes main.o, it prepares everything in it to go into the output file it is building. That includes remembering (whether in memory or with auxiliary files the linker maintains temporarily) all the symbols defined by main.o and all the symbols that main.o has unresolved references to. When the linker processes swap.o, it adds everything from swap.o into the output file it is building. Further, for any references in main.o that are satisfied by definitions in swap.o, it resolves those references. And, for any references in swap.o that are satisfied by definitions in main.o, it resolves those references.
As the text you quote says, for an object module file:
“(...) the linker adds f to E, updates U and D to reflect the symbol definitions and references in f, and proceeds to the next input file.”
That step is actually the same for each object module the linker adds to the executable, whether the object module comes from an object module file or comes from a library file. The difference is that if the object module is in a file, then the linker adds it to the executable unconditionally, but, if the object module is in a library, the linker adds it to the executable only if it defines a symbol the linker is currently seeking.
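The object-file case can be sketched as a toy model of the quoted algorithm. This is only an illustration, not how ld is implemented: the bit-per-symbol encoding, struct object and scan_objects are hypothetical names invented here. Because every object file is added unconditionally, D ends up holding the definitions from both files and U ends up empty, whichever file comes first.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical symbol IDs, one bit each; not a real linker data structure. */
enum {
    SYM_MAIN  = 1u << 0,
    SYM_BUF   = 1u << 1,
    SYM_SWAP  = 1u << 2,
    SYM_BUFP0 = 1u << 3,
    SYM_BUFP1 = 1u << 4
};

struct object {
    unsigned defs;  /* symbols this object file defines */
    unsigned refs;  /* symbols this object file references */
};

/* main.o defines main and buf and references swap. */
const struct object main_o = { SYM_MAIN | SYM_BUF, SYM_SWAP };
/* swap.o defines swap, bufp0 and bufp1 and references buf. */
const struct object swap_o = { SYM_SWAP | SYM_BUFP0 | SYM_BUFP1, SYM_BUF };

/* Scan plain object files left to right. Each one is added
   unconditionally; the return value is the final unresolved set U. */
unsigned scan_objects(const struct object *objs, size_t n)
{
    unsigned D = 0, U = 0;
    assert(objs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        D |= objs[i].defs;                /* record new definitions */
        U = (U | objs[i].refs) & ~D;      /* keep only still-undefined refs */
    }
    return U;
}
```

Calling scan_objects on {main_o, swap_o} or on {swap_o, main_o} returns 0 in this model, while main_o alone leaves SYM_SWAP unresolved. No circling back is needed, because definitions seen earlier stay in D.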

A gcc command like yours (without -c) produces an executable image. It compiles every .c file on the command line into a .o object file, then calls the linker (ld) with all of those .o files. The linker resolves the references and generates an executable, named main.o in your case (-o names the output file, which here is the executable, not an object file). You can run it.
A static archive library is just a collection of .o files that were compiled separately. The linker searches the archive's members to resolve symbols that are still undefined. You can precompile your .c files with the -c option to generate .o files and then use them on the command line, or create an archive from them and use the archive instead.

Related

C compiler / linker - How to hide internal symbols of a static library

MY GOAL IS: to compile a static library (normally .a extension) and have internal symbols hidden (function and variable names) when I open the library file with a text editor.
I am currently using the XC32 compiler from Microchip Technology, which makes the well-known PIC microcontrollers.
I can't share my actual library here, but I have created a very simple one for testing.
The C source code is just:
int variable;
int sum (int a, int b)
{
    return a+b;
}
and the header is just:
int sum (int a, int b);
I created a library project in the MPLAB X IDE and selected the XC32 compiler. The project properties show these options:
I changed these options:
xc32-as -> Have symbols in production build: unchecked
xc32-gcc -> Have symbols in production build: unchecked
xc32-ld -> Symbols & Macros -> Symbols: selected the option "strip all symbol info"
Then I compiled. The library (.a extension) was generated. I opened it with a text editor and saw that the symbols are still shown:
For each of the compiler sections (xc32-as, xc32-gcc, xc32-ld, etc.), there is an "Additional options" field.
I tried entering these options, but none of them hid the symbols:
-fvisibility=hidden on gcc and ld
-fvisibility=hidden on gcc and strip -r -S -x on ld
If I try strip -r -S -x on gcc I get:
pic32m-gcc.exe: error: strip: No such file or directory
pic32m-gcc.exe: error: missing argument to '-x'
Then I removed the additional options for ld and gcc, placed "static" in front of the variable and the function, and the C source became:
static int variable;
static int sum (int a, int b)
{
    return a+b;
}
Doing this (adding static in front of everything), the symbols are correctly hidden. But then I can't use the library from my app project, because the build reports "undefined reference" to the library's functions. This is because: [From Wikipedia] In the C programming language, static is used with global variables and functions to set their scope to the containing file.
So how can I reach my goal in this case (see the first sentence of this post)?
Regards.
EDIT:
This is what the project properties show in the IDE that I am compiling with.
All the internal symbols are shown near the bottom of the library file, next to the name of the C file; this is the only place where the C filename appears.

What does it mean to 'add an index to an archive file'?

My C textbook creates an archive using the -s command-line option, which adds an index to the archive file:
ar -rcs libfile.a file1.o file2.o
However, I don't understand what an index is or what its purpose is.

Converting comments into an answer.
There's a long, complicated history tied up in part with the ranlib program. Basically, for the linker to be able to scan a library efficiently, some program — either ar itself or ranlib — adds a special 'index' file which identifies the symbols defined in the library and the object file within the archive that defines each of those symbols. Many versions of ar add that automatically if any of the saved files is an object file. Some, it seems, require prodding with the -s option. Others don't add the index file at all and rely on ranlib to do the job.
The ar on macOS documents:
-s — Write an object-file index into the archive, or update an existing one, even if no other change is made to the archive. You may use this modifier flag either with any operation, or alone. Running ar s on an archive is equivalent to running ranlib on it.
I've not needed to use this option explicitly on macOS for a long time (nor have I run ranlib) — I think things changed sometime in the middle of the first decade of the millennium.
Don't object files already come with symbol tables? Wouldn't it seem redundant to add another symbol table to the start of the archive file?
Each object file contains information about what's in that one object file (and information about referenced objects as well as defined ones); the archive index contains much simpler information about which of the many object files in the archive defines each symbol, so that the linker doesn't have to scan each object file in the archive separately.
So, would it be correct to say that the index at the start of the archive is just a giant symbol table which replaces the individual symbol tables in each object file so the linker has an easier job?
Not replaces — augments. It allows the linker to identify which object file(s) to pull into the linked executable. The linker then processes the object files just as it does any other object file, checking for definitions and references, marking newly defined references as satisfied and identifying previously unused references that are not yet defined. But the linker doesn't have to read every file in the archive to work out which symbols are defined by the file — it knows from the index file which ones are defined.
To clarify: the index allows the linker to find the specific object file that defines a symbol, rather than having to scan every object file to resolve it?
Yes, that’s right.
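As a rough illustration of what such an index provides, here is a minimal sketch in C. It is not the on-disk format used by ar or ranlib; struct index_entry, find_member and the symbol names are hypothetical, and only file1.o and file2.o come from the command in the question. The point is that the linker can map a symbol name to the member that defines it without opening every member.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One entry of a toy archive index: which member defines which symbol. */
struct index_entry {
    const char *symbol;  /* a global symbol defined somewhere in the archive */
    const char *member;  /* the archive member (object file) that defines it */
};

/* Hypothetical index for libfile.a; the symbol names are invented. */
const struct index_entry libfile_index[] = {
    { "parse_line", "file1.o" },
    { "print_line", "file1.o" },
    { "sum",        "file2.o" },
};

/* Look up the member that defines `symbol`; NULL if the archive has none.
   A real index is typically searched more cleverly than linearly. */
const char *find_member(const struct index_entry *idx, size_t n,
                        const char *symbol)
{
    assert(idx != NULL && symbol != NULL);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(idx[i].symbol, symbol) == 0)
            return idx[i].member;
    }
    return NULL;
}
```

In this model, find_member(libfile_index, 3, "sum") yields "file2.o", so only that member would need to be read in full; the member's own symbol table is still used afterwards for its definitions and references.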

GNU linker get objects through INPUT

I want to use the suggestion made on page 40 of the GNU linker manual, i.e. INPUT (subr.o), and thus specify object members in a script file.
Eventually I want to specify all the object members of my program that the linker has to use.
The script file looks like this (only the file-related part is shown):
SEARCH_DIR(../lib)
STARTUP(boot.o)
ENTRY(_start)
GROUP (libkernel.a libkflib.a)
INPUT (
lowcore.o
init.o
kfalloc.o
kflog.o
kfprintf.o
)
The linker replies with:
attempt to open boot.o failed
attempt to open ../lib/boot.o failed
m68k-rtems4.11-ld: cannot find boot.o
I have specified the search path, the libraries and a list of object members; the object members are definitely in the libraries.
I was expecting the linker to look for the object members in the working directory and, if they are not there, to use the search path and the libraries.
Obviously something is wrong, but I cannot figure out what.
Suggestions are welcome on how to achieve the desired way of linking: specifying all the objects to link and no more than that.
Thanks
Ben

You have misread the manual:
INPUT(file file ...)
The INPUT command directs the linker to include the named files in the link,
as though they were named on the command line.
...
In case a sysroot prefix is configured, and the filename starts with the / character,
and the script being processed was located inside the sysroot prefix, the
filename will be looked for in the sysroot prefix. Otherwise, the linker will
try to open the file in the current directory. If it is not found, the linker
will search through the archive library search path. See the description of
-L in Command Line Options.
If you use INPUT (-lfile), ld will transform the name to libfile.a, as with
the command line argument -l.
...
GROUP(file file ...)
The GROUP command is like INPUT, except that the named files should all be
archives, and they are searched repeatedly until no new undefined references
are created.
STARTUP(filename)
The STARTUP command is just like the INPUT command, except that filename will
become the first input file to be linked, as though it were specified first on the command line.
Or perhaps you have read some other documentation that is in error.
You have got the wrong impression that the command GROUP(libboo.a...) is complementary to INPUT(foo.o...)
and STARTUP(bar.o), with the effect that the files bar.o, foo.o... will be searched for in the archives
libboo.a... and, if found, will be extracted from the archives and added to the linkage.
INPUT(foo.o) has the same effect as specifying foo.o on the linker
commandline except that if not found in the current directory it will be searched
for in the same way that a static library specified with the -l option would be, with any SEARCH_DIR(path)
commands in the linker script having the same influence as commandline options
-Lpath.
STARTUP(bar.o) has the same effect as specifying bar.o first in the linker
commandline.
GROUP(libboo.a...) has the same effect as the commandline options
--start-group -lboo... --end-group
again with any SEARCH_DIR(path) commands acting like -Lpath.
INPUT(foo.o) and STARTUP(bar.o) are unconnected with GROUP(libboo.a...), just as in the commandline options:
bar.o --start-group -lboo... --end-group foo.o
bar.o and foo.o are input files unconnected with --start-group -lboo... --end-group.
The linker never looks inside static libraries for an input object file that it otherwise
fails to find.
So this command:
INPUT (
lowcore.o
init.o
kfalloc.o
kflog.o
kfprintf.o
)
in your linker script requires the linker to find those object files in the current
directory or in ../lib, and they are not there. Similarly for STARTUP(boot.o). Hence
the linkage errors.
Those object files aren't there, but static libraries libkernel.a and libkflib.a are,
and you tell us they contain all those object files as members. In that case
you simply don't need:
INPUT (
lowcore.o
init.o
kfalloc.o
kflog.o
kfprintf.o
)
because the linker will search static libraries to find object files that
provide definitions of any symbols for which definitions are called for
by references within object files that it has already linked. You don't have to
tell it to.
But to give the linker any reason to search a static library at all you
must have told it to link some object file that refers to some undefined symbol(s)
before it considers the static library.
That's the reason why, on the commandline, you must place object files before the
libraries to which they refer and it is why, in a self-contained linker script, if you
add libraries to the linkage with GROUP(libboo.a...) or with INPUT (-lboo...),
you must also specify an object file to be linked first, with a STARTUP
command.
So while your failing INPUT command is unnecessary and can just be
deleted, your failing STARTUP(boot.o) must remain, to initiate
any linkage. And must not fail.
To make STARTUP(boot.o) succeed, you have to put the object file boot.o
itself in a place where the script will tell the linker to look for it,
either the current directory or ../lib. Presumably, the current directory.
And when you do that, it becomes pointless to have boot.o remain as a member
of one of your static libraries, since it can't be linked from within one. Best
delete it from whichever library you have it in. Leaving it there is similar
to putting the main function of a program into one of the libraries you
will link it with.
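The "searched repeatedly" behaviour of GROUP can be sketched with a toy model in C. This is an illustration under stated assumptions, not ld's implementation: the symbol IDs, the struct names, and the assignment of init.o, lowcore.o and kflog.o to particular libraries are all hypothetical. The model starts from one object file (as STARTUP does) and then either scans the archives once in order or keeps rescanning them until no member is pulled in.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical symbol IDs, one bit each. */
enum { SYM_START = 1u << 0, SYM_KERNEL_INIT = 1u << 1,
       SYM_KFLOG = 1u << 2, SYM_KERNEL_LOCK = 1u << 3 };

struct object  { unsigned defs, refs; };
struct archive { const struct object *members; size_t count; unsigned pulled; };
struct link_state { unsigned D, U; };

/* Assumed layout: libkernel.a holds init.o and lowcore.o, libkflib.a holds kflog.o. */
const struct object kernel_members[] = {
    { SYM_KERNEL_INIT, SYM_KFLOG },   /* init.o: needs kflog */
    { SYM_KERNEL_LOCK, 0 },           /* lowcore.o */
};
const struct object kflib_members[] = {
    { SYM_KFLOG, SYM_KERNEL_LOCK },   /* kflog.o: needs kernel_lock */
};
const struct object boot_o = { SYM_START, SYM_KERNEL_INIT };

/* Pull members of one archive that define something in U, to a fixed point. */
static bool scan_archive(struct link_state *s, struct archive *a)
{
    bool pulled_any = false, changed = true;
    while (changed) {
        changed = false;
        for (size_t i = 0; i < a->count; i++) {
            if ((a->pulled & (1u << i)) || !(a->members[i].defs & s->U))
                continue;
            a->pulled |= 1u << i;
            s->D |= a->members[i].defs;
            s->U = (s->U | a->members[i].refs) & ~s->D;
            changed = pulled_any = true;
        }
    }
    return pulled_any;
}

/* Link `first`, then the archives; with group=true, rescan all archives
   until a whole pass pulls nothing. Returns the unresolved set U. */
unsigned link_archives(const struct object *first, struct archive *ars,
                       size_t n, bool group)
{
    struct link_state s;
    bool again;
    assert(first != NULL && ars != NULL);
    s.D = first->defs;
    s.U = first->refs & ~first->defs;
    do {
        again = false;
        for (size_t i = 0; i < n; i++)
            if (scan_archive(&s, &ars[i]))
                again = group;
    } while (again);
    return s.U;
}
```

With the archives in the order libkernel.a, libkflib.a, a single pass leaves SYM_KERNEL_LOCK unresolved in this model, because lowcore.o was not yet needed when libkernel.a was scanned; the grouped scan goes back and pulls it in. In both cases nothing is pulled at all unless the first object file leaves something undefined.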

What does the GNU ld --undefined option do?

Can somebody explain what the GNU ld option --undefined does?
I'm working on a LiteOS project. The app is linked with many -u options, for example -utask_shellcmd.
The GNU linker manual for --undefined=symbol simply says:
Force symbol to be entered in the output file as an undefined symbol. Doing this may, for example, trigger linking of additional modules from standard libraries.
So the symbol will be included in the output file as undefined. What if the symbol is already defined in one of the linked object files? If it really is undefined, when does the linking of additional modules happen, and how?

The -u option is only relevant when archive (.a) libraries are involved (maybe also .so libraries with --as-needed in effect).
Unlike individual object files (.o) on the link command line, which are all linked in the order in which they appear, object files from an archive library are linked only when they satisfy one or more undefined symbol references at the point where the archive appears on the command line. Once one .o file from the archive is pulled into the link, the process repeats, so that if it introduces more undefined symbol references, other object files from the same (or later) archives are pulled in to satisfy them.
Using -u allows you to cause a particular symbol (and, indirectly, all dependencies of the object file it was defined in) to be pulled into the link. Of course you could just put all .o files on the command line directly, without using any archive libraries, but by using libraries you can avoid linking unused object files (this is especially useful if large parts of the code may be unused depending on build-time-configurable settings in other files!) while getting the ones you need.
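The effect of -u can be sketched with a toy model in C. This is an illustration only, not ld's implementation: the bit-per-symbol encoding, link_with_archive, the archive contents and the symbol shell_init are hypothetical; only task_shellcmd comes from the question. The model seeds the undefined set with the forced symbol before the archive is scanned.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical symbol IDs, one bit each. */
enum { SYM_MAIN = 1u << 0, SYM_TASK_SHELLCMD = 1u << 1,
       SYM_SHELL_INIT = 1u << 2 };

struct object { unsigned defs, refs; };

/* Hypothetical archive: member 0 defines task_shellcmd and needs
   shell_init; member 1 defines shell_init. */
const struct object lib[] = {
    { SYM_TASK_SHELLCMD, SYM_SHELL_INIT },
    { SYM_SHELL_INIT, 0 },
};

/* An object file that references nothing in the archive. */
const struct object main_o = { SYM_MAIN, 0 };

/* Link one object file, then one archive. `forced_undefined` models the
   symbols named with -u. Returns a bitmask of the members pulled in. */
unsigned link_with_archive(unsigned forced_undefined,
                           const struct object *obj,
                           const struct object *ar, size_t n)
{
    unsigned D, U, pulled = 0;
    bool changed = true;
    assert(obj != NULL && ar != NULL && n <= 32);
    D = obj->defs;
    U = (forced_undefined | obj->refs) & ~D;  /* already-defined symbols drop out */
    while (changed) {
        changed = false;
        for (size_t i = 0; i < n; i++) {
            if ((pulled & (1u << i)) || !(ar[i].defs & U))
                continue;                 /* member defines nothing we need */
            pulled |= 1u << i;
            D |= ar[i].defs;
            U = (U | ar[i].refs) & ~D;
            changed = true;
        }
    }
    return pulled;
}
```

Without the forced symbol nothing is pulled from the archive in this model; with SYM_TASK_SHELLCMD forced, member 0 is pulled and then member 1 to satisfy its reference. If the object file already defined the forced symbol, it would simply never be in U, so the option would have no further effect here.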

Why do I get 'multiple definition' errors when linking against an archive?

I'm using CppUTest to test the C code defined in a fornol.c source file. That file defines the main production main() function.
I also have an AllTests.cpp file that also has a main() function, but that main() is supposed to be used only when running the unit tests.
AllTests.cpp gets compiled to a .o file, whereas fornol.c gets compiled to a libfornol.a archive.
Then CppUTest tries to link everything together, but here is what I get instead:
Linking fornol_tests
cc -o fornol_tests objs/tests/AllTests.o objs/tests/FornolTests.o lib/libfornol.a ../../CppUTest/lib/libCppUTest.a ../../CppUTest/lib/libCppUTestExt.a -lstdc++ -lgcov
lib/libfornol.a(fornol.o): In function `main':
/home/dlindelof/Work/endor/nol/fornol/fornol.c:453: multiple definition of `main'
objs/tests/AllTests.o:/home/dlindelof/Work/endor/nol/fornol/tests/AllTests.cpp:4: first defined here
collect2: ld returned 1 exit status
It looks as if the main() function defined in fornol.c and present in the archive libfornol.a conflicts with the main() defined in AllTests.cpp. But my understanding was that archive/library files are searched only if/when a given symbol hasn't been referenced yet. It should therefore not be a problem to have the same symbol defined more than once, provided all definitions are in archive/library files.
What am I doing wrong here?

You need to remove main() from fornol.c and put it in its own source file, so that it ends up in its own object file in the archive.
When a linker links in a library, it can't split object files in the library; it has to either link or omit each object file in the library as a unit. (I know LLVM is different, but that's another topic.) This is why, if you look at the source for a library like glibc, each function gets its own source file.
So what's happening to you is that the linker needs to pull in an object file (fornol.o) from the library (libfornol.a) to satisfy dependencies, but that object file carries a duplicate symbol with it (main).
It's perfectly okay to put test code in a library (we do this routinely where I work), but keep it in its own source files (we traditionally use main.cc). (This is a better test anyway, because test code should not have access to static-declared symbols.)
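Why the whole member matters can be sketched with a toy model in C. It is an illustration, not ld's implementation: duplicate_symbols, the symbol fornol_run (standing in for whatever the tests reference in fornol.c) and the member fornol_main.o are hypothetical names, and the two test objects are merged into one for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical symbol IDs; fornol_run is an invented name. */
enum { SYM_MAIN = 1u << 0, SYM_FORNOL_RUN = 1u << 1 };

struct object { unsigned defs, refs; };

/* AllTests.o and FornolTests.o, merged into one object for brevity:
   defines main, references something in fornol.c. */
const struct object tests_o = { SYM_MAIN, SYM_FORNOL_RUN };

/* libfornol.a as in the question: one member that also defines main. */
const struct object lib_combined[] = {
    { SYM_MAIN | SYM_FORNOL_RUN, 0 },   /* fornol.o */
};

/* libfornol.a with main moved into its own member. */
const struct object lib_split[] = {
    { SYM_FORNOL_RUN, 0 },              /* fornol.o */
    { SYM_MAIN, 0 },                    /* hypothetical fornol_main.o */
};

/* Link one object file and one archive; return the symbols that end up
   defined twice because a pulled member comes in as a whole. */
unsigned duplicate_symbols(const struct object *obj,
                           const struct object *ar, size_t n)
{
    unsigned D, U, dups = 0, pulled = 0;
    bool changed = true;
    assert(obj != NULL && ar != NULL && n <= 32);
    D = obj->defs;
    U = obj->refs & ~D;
    while (changed) {
        changed = false;
        for (size_t i = 0; i < n; i++) {
            if ((pulled & (1u << i)) || !(ar[i].defs & U))
                continue;                /* member not needed, left out */
            pulled |= 1u << i;
            dups |= ar[i].defs & D;      /* everything in the member is linked */
            D |= ar[i].defs;
            U = (U | ar[i].refs) & ~D;
            changed = true;
        }
    }
    return dups;
}
```

In this model the combined archive reports SYM_MAIN as a duplicate, while the split archive reports none, because the member that defines only main is never needed and is therefore never pulled in.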

A library is not supposed to have a main() function.
You should remove that main() from fornol.c and compile it again.
main() is the entry point of an executable; a library (especially a static ".a" library) is only a collection of precompiled object code, so it should not normally define main().
If you want a main production entry point in your library, you could rename the main() in fornol.c to something more explicit and less reserved, such as "fornolMain()".
An archive member that the linker pulls in is linked into your executable as a whole, not symbol by symbol. Here that is exactly the same as compiling fornol.c and linking fornol.o together with your other .o files.
