What does it mean to 'add an index to an archive file'?

My C textbook creates an archive using the -s command line option which adds an index to the following archive file.
ar -rcs libfile.a file1.o file2.o
However, I don't understand what an index is or what its purpose is.

Converting comments into an answer.
There's a long, complicated history tied up in part with the ranlib program. Basically, for the linker to be able to scan a library efficiently, some program — either ar itself or ranlib — adds a special 'index' file which identifies the symbols defined in the library and the object file within the archive that defines each of those symbols. Many versions of ar add that automatically if any of the saved files is an object file. Some, it seems, require prodding with the -s option. Others don't add the index file at all and rely on ranlib to do the job.
The ar on macOS documents:
-s — Write an object-file index into the archive, or update an existing one, even if no other change is made to the archive. You may use this modifier flag either with any operation, or alone. Running ar s on an archive is equivalent to running ranlib on it.
I've not needed to use this option explicitly on macOS for a long time (nor have I run ranlib) — I think things changed sometime in the middle of the first decade of the millennium.
Don't object files already come with symbol tables? Wouldn't it seem redundant to add another symbol table to the start of the archive file?
Each object file contains information about what's in that one object file (and information about referenced objects as well as defined ones); the archive index contains much simpler information about which of the many object files in the archive defines each symbol, so that the linker doesn't have to scan each object file in the archive separately.
So, would it be correct to say that the index at the start of the archive is just a giant symbol table which replaces the individual symbol tables in each object file so the linker has an easier job?
Not replaces — augments. It allows the linker to identify which object file(s) to pull into the linked executable. The linker then processes the object files just as it does any other object file, checking for definitions and references, marking newly defined references as satisfied and identifying previously unused references that are not yet defined. But the linker doesn't have to read every file in the archive to work out which symbols are defined by the file — it knows from the index file which ones are defined.
To clarify, the index allows the linker to find the specific object file which defines a symbol, rather than having to scan every object file to resolve a symbol?
Yes, that’s right.
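The difference between the per-object symbol tables and the archive index can be sketched with a toy model. This is only an illustration in Python, not the real on-disk format; the member and symbol names (file1.o, parse, emit, and so on) are made up.

```python
# Toy model of a static library: member name -> (defined symbols, referenced symbols).
# All member and symbol names here are hypothetical.
archive = {
    "file1.o": ({"parse", "tokenize"}, {"malloc"}),
    "file2.o": ({"emit"}, {"parse"}),
}

def build_index(members):
    """Roughly what 'ar s' or ranlib records: each defined symbol -> its member."""
    index = {}
    for member, (defined, _referenced) in members.items():
        for symbol in defined:
            index.setdefault(symbol, member)  # first definition wins
    return index

def find_without_index(members, symbol):
    """Without an index, every member has to be inspected in turn."""
    for member, (defined, _referenced) in members.items():
        if symbol in defined:
            return member
    return None

index = build_index(archive)
print(index.get("emit"))                    # one lookup in the index
print(find_without_index(archive, "emit"))  # same answer, found by scanning
```

The index holds only definitions, so a symbol such as malloc that is merely referenced does not appear in it; the full per-object tables are still needed once a member has been selected.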

Related

Getting undefined reference to `hmac_sha1' in C

This is my current workspace. I have the headers in the same folder as otp.c, but whenever I compile and run it, it returns an error telling me that hmac_sha1 is undefined. Hope someone can help me.
Short Background
Including a header file enables you to compile the source file into an object file by declaring the function.
However, to get an executable, you need to link the object files together, whereby a function used in one object file may be defined (i.e. implemented) in another object file. When listing inputs for the linker, arrange them in order of dependency, e.g. if a depends on b, then a should appear before b on the command line; this ordering is essential when b is a static library (for circular dependencies, please find a post on the topic).
Solution
The way you run gcc makes it first compile the sources into object files and then link them. otp.c requires the function hmac_sha1, which is probably defined in hmac-sha1.c (I am guessing from the header file name), so you should run:
gcc otp.c hmac-sha1.c -o otp
Note that otp.c depends on hmac-sha1.c hence the order.

GNU linker get objects through INPUT

I want to use the suggestion made in the GNU linker manual, page 40, i.e. INPUT (subr.o), thus specifying object members in a script file.
Eventually I want to specify all the object members of my program that the linker has to use.
The script file looks like this (only the part dealing with files is shown):
SEARCH_DIR(../lib)
STARTUP(boot.o)
ENTRY(_start)
GROUP (libkernel.a libkflib.a)
INPUT (
lowcore.o
init.o
kfalloc.o
kflog.o
kfprintf.o
)
The linker replies with:
attempt to open boot.o failed
attempt to open ../lib/boot.o failed
m68k-rtems4.11-ld: cannot find boot.o
I have specified the search path, the libraries and a list of object members; the object members are definitely in the libraries.
I was expecting the linker to look for object members in the working directory and, if they are not there, to use the search path and libraries.
Obviously there is something wrong but I cannot figure it out.
Suggestions are welcome on how to achieve the desired way of linking: specifying all objects to link and no more than that.
Thanks
Ben
You have misread the manual:
INPUT(file file ...)
The INPUT command directs the linker to include the named files in the link,
as though they were named on the command line.
...
In case a sysroot prefix is configured, and the filename starts with the / character,
and the script being processed was located inside the sysroot prefix, the
filename will be looked for in the sysroot prefix. Otherwise, the linker will
try to open the file in the current directory. If it is not found, the linker
will search through the archive library search path. See the description of
-L in Command Line Options.
If you use INPUT (-lfile), ld will transform the name to libfile.a, as with
the command line argument -l.
...
GROUP(file file ...)
The GROUP command is like INPUT, except that the named files should all be
archives, and they are searched repeatedly until no new undefined references
are created.
STARTUP(filename)
The STARTUP command is just like the INPUT command, except that filename will
become the first input file to be linked, as though it were specified first on the command line
Or perhaps you have read some other documentation that is in error.
You have got the wrong impression that the command GROUP(libboo.a...) is complementary to INPUT(foo.o...)
and STARTUP(bar.o), with the effect that the files bar.o, foo.o... will be searched for in the archives
libboo.a... and, if found, will be extracted from the archives and added to the linkage.
INPUT(foo.o) has the same effect as specifying foo.o on the linker
commandline except that if not found in the current directory it will be searched
for in the same way that a static library specified with the -l option would be, with any SEARCH_DIR(path)
commands in the linker script having the same influence as commandline options
-Lpath.
STARTUP(bar.o) has the same effect as specifying bar.o first in the linker
commandline.
GROUP(libboo.a...) has the same effect as the commandline options
--start-group -lboo... --end-group
again with any SEARCH_DIR(path) commands acting like -Lpath.
INPUT(foo.o) and STARTUP(bar.o) are unconnected with GROUP(libboo.a...), just as in the commandline options:
bar.o --start-group -lboo... --end-group foo.o
bar.o and foo.o are input files unconnected with --start-group -lboo... --end-group.
The linker never looks inside static libraries for an input object file that it otherwise
fails to find.
So this command:
INPUT (
lowcore.o
init.o
kfalloc.o
kflog.o
kfprintf.o
)
in your linker script requires the linker to find those object files in the current
directory or in ../lib, and they are not there. Similarly for STARTUP(boot.o). Hence
the linkage errors.
Those object files aren't there, but static libraries libkernel.a and libkflib.a are,
and you tell us they contain all those object files as members. In that case
you simply don't need:
INPUT (
lowcore.o
init.o
kfalloc.o
kflog.o
kfprintf.o
)
because the linker will search static libraries to find object files that
provide definitions of any symbols for which definitions are called for
by references within object files that it has already linked. You don't have to
tell it to.
But to give the linker any reason to search a static library at all you
must have told it to link some object file that refers to some undefined symbol(s)
before it considers the static library.
That's the reason why, on the commandline, you must place object files before the
libraries to which they refer and it is why, in a self-contained linker script, if you
add libraries to the linkage with GROUP(libboo.a...) or with INPUT (-lboo...),
you must also specify an object file to be linked first, with a STARTUP
command.
So while your failing INPUT command is unnecessary and can just be
deleted, your failing STARTUP(boot.o) must remain, to initiate
any linkage. And must not fail.
To make STARTUP(boot.o) succeed, you have to put the object file boot.o
itself in a place where the script will tell the linker to look for it,
either the current directory or ../lib. Presumably, the current directory.
And when you do that, it becomes pointless to have boot.o remain as a member
of one of your static libraries, since it can't be linked from within one. Best
delete it from whichever library you have it in. Leaving it there is similar
to putting the main function of a program into one of the libraries you
will link it with.
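The lookup rule described above can be modelled in a few lines. This is a simplified sketch in Python under stated assumptions: it ignores the sysroot case, and resolve_input is a hypothetical helper name, not part of any linker API. The directory layout only mimics the question (a working directory plus a lib directory holding libkernel.a).

```python
import os
import tempfile

def resolve_input(name, search_dirs, cwd):
    """Return the path that would be opened for INPUT(name) or STARTUP(name).

    Only the file system is consulted: the current directory first, then each
    SEARCH_DIR / -L directory. Members stored inside .a archives are never
    candidates. Returns None when nothing is found.
    """
    for directory in [cwd] + list(search_dirs):
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            return candidate
    return None

with tempfile.TemporaryDirectory() as root:
    work = os.path.join(root, "work")
    lib = os.path.join(root, "lib")
    os.makedirs(work)
    os.makedirs(lib)
    # libkernel.a exists on disk; boot.o exists only as a member inside it,
    # so there is no boot.o file anywhere on the search path.
    open(os.path.join(lib, "libkernel.a"), "wb").close()
    found_lib = resolve_input("libkernel.a", [lib], work)
    found_boot = resolve_input("boot.o", [lib], work)
    print(found_lib is not None, found_boot)
```

In this model the archive is found through the search directory, while boot.o is reported as missing after the same two attempts that the error messages in the question show.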

What does the GNU ld --undefined option do?

Can somebody explain what the GNU ld option --undefined does?
Working on a LiteOS project. The app is linked with many -u options. For example -utask_shellcmd.
The GNU linker manual for --undefined=symbol simply says:
Force symbol to be entered in the output file as an undefined symbol. Doing this may, for example, trigger linking of additional modules from standard libraries.
So the symbol will be included in the output file as an undefined. What if the symbol is already defined in one of the linked obj files? If it is really undefined, when the linking of additional modules will happen and how does that happen?
The -u option is only relevant when archive (.a) libraries are involved (maybe also .so libraries with --as-needed in effect).
Unlike individual object files (.o) on the linking command line, which are all linked in the order in which they appear, object files from an archive library are only linked when they satisfy one or more undefined symbol references at the point they appear in the link command line order. Once one .o file from the archive is pulled into the link, the process is repeated recursively, so that if it introduces more undefined symbol references, other object files from the same (or later) archives will be pulled in to satisfy them.
Using -u allows you to cause a particular symbol (and, indirectly, all dependencies of the object file it was defined in) to be pulled into the link. Of course you could just put all .o files on the command line directly, without using any archive libraries, but by using libraries you can avoid linking unused object files (this is especially useful if large parts of the code may be unused depending on build-time-configurable settings in other files!) while getting the ones you need.
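To make the effect of -u concrete, here is a toy left-to-right linker model in Python. It is a sketch of the behaviour described above, not of ld's actual implementation; link is a hypothetical function, and every name except task_shellcmd (taken from the question) is invented.

```python
def link(inputs, forced_undefined=()):
    """Toy left-to-right link.

    inputs: items in command-line order, each either
      ("obj", name, defined, referenced) or
      ("lib", name, {member: (defined, referenced)}).
    forced_undefined: symbols named with -u.
    Returns (names of linked objects, symbols still undefined).
    """
    defined, undefined, linked = set(), set(forced_undefined), []

    def add(name, defs, refs):
        linked.append(name)
        defined.update(defs)
        undefined.difference_update(defs)
        undefined.update(r for r in refs if r not in defined)

    for item in inputs:
        if item[0] == "obj":
            _, name, defs, refs = item
            add(name, defs, refs)  # plain object files are always linked
        else:
            _, _, members = item
            progress = True
            while progress:  # rescan until no member satisfies anything new
                progress = False
                for name, (defs, refs) in members.items():
                    if name not in linked and undefined & set(defs):
                        add(name, defs, refs)
                        progress = True
    return linked, undefined

main_o = ("obj", "main.o", {"main"}, {"printf_like"})
libapp = ("lib", "libapp.a", {
    "io.o": ({"printf_like"}, set()),
    "shell.o": ({"task_shellcmd"}, {"shell_helper"}),
    "helper.o": ({"shell_helper"}, set()),
})

plain, _ = link([main_o, libapp])
forced, _ = link([main_o, libapp], forced_undefined=["task_shellcmd"])
print(plain)
print(forced)
```

Without the forced symbol only io.o is pulled from the archive; with it, shell.o comes in as well, and helper.o follows because shell.o refers to it.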

Combining static libraries

Suppose I have three C static libraries, say libColor.a, which depends on libRGB.a, which in turn depends on libPixel.a. The library libColor.a is said to depend on library libRGB.a since there are some references in libColor.a to some of the symbols defined in libRGB.a. How do I combine all the above libraries into a new libNewColor.a which is independent?
Independent means the new library should have all symbols defined, so while linking I just need to give -lNewColor. The size of the new library should be minimal, i.e. it should not contain any symbols from libRGB.a that are not used by libColor.a, etc.
I tried my luck using various options in ar command (used to create and update static libraries/archives).
A little-used feature of the GNU archiver is the archive script. It is a simple but powerful interface, and it can do exactly what you want. For example, if the following script is called script.ar:
CREATE libNewColor.a
ADDLIB libColor.a
ADDLIB libRGB.a
ADDLIB libPixel.a
SAVE
END
Then you could invoke ar as follows:
ar -M < script.ar
and you would get libNewColor.a that contains all of the .o files from libColor.a libRGB.a and libPixel.a.
Additionally you can also add regular .o files as well with the ADDMOD command:
CREATE libNewColor.a
ADDLIB libColor.a
ADDLIB libRGB.a
ADDLIB libPixel.a
ADDMOD someRandomCompiledFile.o
SAVE
END
Furthermore it is super easy to generate these scripts in Makefiles, so I typically create a somewhat generic makefile rule for creating archives which actually generates the script and invokes ar on the script. Something like this:
$(OUTARC): $(OBJECTS)
	$(SILENT)echo "CREATE $@" > $(ODIR)/$(ARSCRIPT)
	$(SILENT)for a in $(ARCHIVES); do (echo "ADDLIB $$a" >> $(ODIR)/$(ARSCRIPT)); done
	$(SILENT)echo "ADDMOD $(OBJECTS)" >> $(ODIR)/$(ARSCRIPT)
	$(SILENT)echo "SAVE" >> $(ODIR)/$(ARSCRIPT)
	$(SILENT)echo "END" >> $(ODIR)/$(ARSCRIPT)
	$(SILENT)$(AR) -M < $(ODIR)/$(ARSCRIPT)
Though now that I look at it I guess it doesn't work if $(OBJECTS) is empty (i.e. if you just want to combine archives without adding extra object files) but I will leave it as an exercise for the reader to fix that issue if needed... :D
Here are the docs for this feature:
https://sourceware.org/binutils/docs/binutils/ar-scripts.html#ar-scripts
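One way to handle the empty $(OBJECTS) case is to generate the script with a small helper that writes one ADDMOD line per object and none when there are no objects. The following is a minimal sketch in Python; make_ar_script is a hypothetical helper name, and only the script commands already shown above (CREATE, ADDLIB, ADDMOD, SAVE, END) are used.

```python
def make_ar_script(output, archives, objects=()):
    """Build the text of a script suitable for feeding to 'ar -M'."""
    lines = ["CREATE " + output]
    lines += ["ADDLIB " + archive for archive in archives]
    # One ADDMOD per object, and none at all when the list is empty.
    lines += ["ADDMOD " + obj for obj in objects]
    lines += ["SAVE", "END"]
    return "\n".join(lines) + "\n"

script = make_ar_script("libNewColor.a", ["libColor.a", "libRGB.a", "libPixel.a"])
print(script, end="")
```

The resulting text can then be passed to ar on standard input, just as with the ar -M < script.ar invocation above.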
1/ Extract ALL of the object files from each library (using ar) and try to compile your code without the libraries or any of the object files. You'll probably get an absolute bucket-load of undefined symbols. If you get no undefined symbols, go to step 5.
2/ Grab the first one and find out which object file satisfies that symbol (using nm).
3/ Write down that object file then compile your code, including the new object file. You'll get a new list of undefined symbols or, if there's none, go to step 5.
4/ Go to step 2.
5/ Combine all the object files in your list (if any) into a single library (again with ar).
Bang! There you have it. Try to link your code without any of the objects but with the new library.
This whole thing could be relatively easily automated with a shell script.
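As a rough idea of what such automation might compute, the sketch below works on symbol tables directly instead of repeating link attempts, which is a different route to the same list of object files. It is written in Python; needed_objects is a hypothetical function, the object and symbol names are invented, and the input dictionary is assumed to have been built by parsing nm output for the extracted object files.

```python
def needed_objects(objects, wanted):
    """Find the object files required to provide the wanted symbols.

    objects: {name: (defined symbols, referenced symbols)} for every object
             extracted from the libraries.
    wanted:  symbols the combined library must provide.
    Returns (object names to keep, symbols that no object defines).
    """
    provider = {}
    for name, (defined, _refs) in objects.items():
        for symbol in defined:
            provider.setdefault(symbol, name)

    keep, missing = [], set()
    pending, seen = list(wanted), set()
    while pending:
        symbol = pending.pop()
        if symbol in seen:
            continue
        seen.add(symbol)
        name = provider.get(symbol)
        if name is None:
            missing.add(symbol)  # e.g. supplied by the C library at link time
        elif name not in keep:
            keep.append(name)
            pending.extend(objects[name][1])  # follow that object's references
    return keep, missing

objects = {
    "color.o": ({"color_blend"}, {"rgb_pack"}),
    "rgb.o": ({"rgb_pack"}, {"pixel_put", "memcpy"}),
    "rgb_debug.o": ({"rgb_dump"}, {"printf"}),
    "pixel.o": ({"pixel_put"}, set()),
}
keep, missing = needed_objects(objects, ["color_blend"])
print(sorted(keep), sorted(missing))
```

Here rgb_debug.o is left out because nothing reachable from color_blend refers to it, and memcpy is reported as missing because it is expected to come from elsewhere.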
A static library is not much more than an archive of some object files (.o). What you can do is extract all the objects in the two libraries (using "ar x") and then use "ar" to link them together in a new library.

What exactly does "ar" utility do?

I don't really understand what ar utility does on Unix systems.
I know it can be somehow used for creating c libraries, but all that man page tells me is that it is used to make archives from files, which sounds similar to, for example, tar....
The primary purpose is to take individual object files (*.o) and bundle them together into a static library file (*.a). The .a file contains an index that allows the linker to quickly locate symbols in the library.
Tar doesn't create files that linkers understand.
ar is a general purpose archiver, just like tar. It just "happens" to be used mostly for creating static library archives, one of its traditional uses, but you can still use it for general purpose archiving, though tar would probably be a better choice. ar is also used for Debian .deb packages.
Exactly, ar is an archiver. It simply takes a set of object files (*.o) and puts them in an archive that you call a static library.
It takes code in the form of object files (.obj, .o, etc) and makes a static library (archive). The library can then be included when linking with ld to include the object code into your executable.
Take a look at the example usage in the Wikipedia article.
You might want to run man ar to get the full picture. Here's a copy of that on the web.
To quote:
The GNU ar program creates, modifies, and extracts from archives. An
archive is a single file holding a collection of other files in a
structure that makes it possible to retrieve the original individual
files (called members of the archive).
ar is considered a binary utility because archives of this sort are
most often used as libraries holding commonly needed subroutines.
ar is specifically for archives (or libraries) of object code; tar is for archives of arbitrary files. It's anybody's guess why GNU refers to these as 'archives'; in other environments this utility is called the 'librarian', and the resulting files are just called libraries.
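To show how simple the container itself is, here is a minimal sketch in Python that builds a tiny archive in memory and lists its members. It relies on the common ar layout (an 8-byte magic string followed by 60-byte text headers, with member data padded to an even length) and GNU-style names ending in "/". The helper names are hypothetical, and long file names and other format variants are not handled.

```python
import io

MAGIC = b"!<arch>\n"

def make_member(name, data):
    """Build one member: a 60-byte text header, the data, and even-length padding."""
    # Fields: name(16) mtime(12) uid(6) gid(6) mode(8) size(10) terminator(2)
    header = "%-16s%-12d%-6d%-6d%-8s%-10d`\n" % (name + "/", 0, 0, 0, "644", len(data))
    assert len(header) == 60
    body = data + (b"\n" if len(data) % 2 else b"")
    return header.encode("ascii") + body

def list_members(blob):
    """Return [(name, size)] for each member of a simple ar archive."""
    stream = io.BytesIO(blob)
    if stream.read(8) != MAGIC:
        raise ValueError("not an ar archive")
    members = []
    while True:
        header = stream.read(60)
        if len(header) < 60:
            break
        raw = header[0:16].decode("ascii").rstrip()
        # "/" and "//" are special members in GNU archives; keep them as they are.
        name = raw if raw in ("/", "//") else raw.rstrip("/")
        size = int(header[48:58].decode("ascii"))
        members.append((name, size))
        stream.seek(size + size % 2, io.SEEK_CUR)  # skip data plus padding
    return members

blob = MAGIC + make_member("hello.txt", b"hello") + make_member("file1.o", b"\x7fELF")
print(list_members(blob))
```

Nothing in the container cares whether a member is object code or plain text. In GNU archives the symbol index is itself stored as a special member named "/", which is what a linker consults.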

Resources