Is there a way to know which compiler generated a static library? - linker

A third party gave me a static library (.a) to link against on a Solaris workstation.
I tried to build with SunPro, and it failed at the link step.
I suspect the problem comes from the compiler I use (should it be gcc instead?) or simply from its version: as far as I know, the standard library shipped with a compiler can differ from the version the library expects, which could lead to errors at the link step.
How can I find out which compiler was used to generate this library? Are there tools that do that? Some option in SunPro, gcc, or anything else?
A hint: I read some time ago that compilers use different name-mangling conventions when generating object files (is that true?). Still, "nm --demangle" correctly prints all the function names from the debug symbols in this static library. How does that work? If my assumption is right, nm must have a way to work out which convention a static library uses, doesn't it? Or does it simply mean that the library was generated by GNU gcc, since nm is part of GNU binutils?
I am not at my workstation, so I can't copy and paste the linker's error output right now (I can add it in a later edit).

Extract the object files from the archive then run the strings command on some of them (first on the smaller ones since there'd be less noise to sift through). Many compilers insert ASCII signatures in the object files.
For example, the following meaningless source file, foo.c:
extern void blah();
when compiled on my Fedora 10 machine into foo.o via gcc -c -o foo.o foo.c results in a 647 byte foo.o object file. Running strings on foo.o results in
GCC: (GNU) 4.3.2 20081105 (Red Hat 4.3.2-7)
.symtab
.strtab
.shstrtab
.text
.data
.bss
.comment
.note.GNU-stack
foo.c
which makes it clear the compiler was GCC. Even if I had compiled it with -fno-ident, the .note.GNU-stack ELF section would still have been present.
You can extract the object files using the ar utility, or using Midnight Commander (which integrates ar), or you can simply run strings on the archive (which might give you more noise and be less relevant, but would still help.)
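The extract-then-scan procedure above could be scripted roughly as follows. This is only a sketch: libfoo.a and the function name scan_signatures are made up for illustration, the patterns are just the GCC and Sun signature prefixes quoted in this thread (other compilers will need other patterns), and the tr fallback is an assumption for machines that lack strings.

```shell
#!/bin/sh
# Print the distinct compiler-signature lines found in the given files.
# Patterns: "GCC: (" as emitted by gcc, plus the "acomp:", "cg:" and
# "iropt:" lines quoted from a Sun-compiled library. Not exhaustive.
scan_signatures() {
    for f in "$@"; do
        if command -v strings >/dev/null 2>&1; then
            strings -a "$f"     # -a: scan the whole file
        else
            # Assumed fallback: turn every non-printable byte into a newline.
            LC_ALL=C tr -c '[:print:]' '\n' < "$f"
        fi
    done | grep -E '^(GCC: \(|acomp: |cg: |iropt: )' | sort -u
}

# Hypothetical usage on an archive named libfoo.a:
#   mkdir objs && cd objs && ar x ../libfoo.a
#   scan_signatures *.o
```

No output means only that none of these particular patterns matched, not that the compiler left no trace.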

I tend to use the strings program (with the '-a' option, or my own variant where the '-a' behaviour is standard) and look for the tell-tale signs. For example, in one of my own libraries, I find:
/work1/gcc/v4.2.3/bin/../lib/gcc/sparc-sun-solaris2.10/4.2.3/include
/work1/gcc/v4.3.0/bin/../lib/gcc/sparc-sun-solaris2.10/4.3.0/include
/work1/gcc/v4.3.1/bin/../lib/gcc/sparc-sun-solaris2.10/4.3.1/include
/work1/gcc/v4.3.3/bin/../lib/gcc/sparc-sun-solaris2.10/4.3.3/include
That suggests that the code in the library has been compiled with a variety of versions of GCC over a period of years (actually, I'm quite startled to find so many versions in a single library).
Another library contains:
cg: Sun Compiler Common 11 Patch 120760-06 2006/05/26
acomp: Sun C 5.8 Patch 121015-02 2006/03/29
iropt: Sun Compiler Common 11 Patch 120760-06 2006/05/26
/compilers/v11/SUNWspro/prod/bin/cc -O -v -Xa -xarch=v9 ...
So, there are usually fingerprints in the object files indicating which compiler was used. But you have to know how to look for them.

Is the library supposed to be a C or C++ library?
If it is a C library, then name mangling cannot be the problem, as there is none in C. The library could, however, be in the wrong object format. Unices used to have libraries in the a.out format, but almost all newer versions switched to more powerful formats like ELF.
If it is a C++ library, then name mangling can be an issue. Most compilers embed some compiler-specific symbols in the code, so if you have a tool like nm to list the symbols, you can hopefully deduce which compiler it came from.
For example, g++ creates a symbol
__gxx_personality_v0
in its libraries.
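As a rough illustration of that symbol check, the filter below reads nm output on standard input and reports whether the g++ marker appears. The function name has_gxx_marker and the archive name libfoo.a are hypothetical, and the only symbol tested is the one named above. A missing marker proves nothing, since code compiled without exception support may not reference it.

```shell
#!/bin/sh
# Succeeds (exit status 0) if the nm listing on stdin mentions the
# g++ exception-handling personality routine.
has_gxx_marker() {
    grep -q '__gxx_personality_v0'
}

# Hypothetical usage:
#   if nm libfoo.a | has_gxx_marker; then echo "looks like g++"; fi
```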

You can try the unix utility file:
file foo.a

Related

gcc object file linking

I'm learning C by rehashing some Project Euler problems, as I did for Python. In Python, I created a file of general mathematical utilities such as prime number checking, which I pulled functions out of as and when I needed them. I was wondering if there was a way to simply do a similar thing with C, other than compiling alongside the utilities file each time?
I'm running Linux and using gcc as my compiler, if that helps.
It looks like you need some basic knowledge about separate compilation and libraries (archives and shared libraries). You can read about it in chapter "2.3 Writing and Using Libraries" of
Advanced Linux Programming, 1st Edition by CodeSourcery LLC, Mark L. Mitchell, Alex Samuel, Jeffrey Oldham.
This book is also available as a PDF from http://www.advancedlinuxprogramming.com/ (although the site is down at the moment). Perhaps you can search for other places to legally download the PDF.
A crash course:
You create a number of object (*.o) files, one per source file, via
gcc -c name.c -o name.o
Each file has a header that declares the functions in the source file. You might have several source files using a single header if the functions are related. The source files such as name.c include that header. Your code that uses those functions also includes that header.
You create a static library (archive) with ar
ar ruv libXYZ.a name1.o name2.o ... nameN.o
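The prefix lib is important: the -lXYZ option makes the linker look for a file named libXYZ.a (or libXYZ.so).
You link to the library with
gcc prog.o -L. -lXYZ -o prog
This command will create an executable named prog from the object file prog.o and from the object files, extracted from libXYZ.a, that are required to satisfy symbol references from prog.o. The -L. option adds the current directory to the library search path; it can be dropped if libXYZ.a is installed in a standard library directory.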
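Put together, the crash course might look like the script below. It is a minimal sketch, not the answerer's own code: the file names (mathutil.h, mathutil.c, prog.c, libmathutil.a) and the is_prime helper are invented for illustration, build_demo is a hypothetical wrapper, and ar rcs is used in place of ar ruv (both create the archive; the s modifier also asks ar to write the symbol index).

```shell
#!/bin/sh
# Build a small static library in directory $1 and link a program to it.
# Runs in a subshell so the caller's working directory is unchanged.
build_demo() (
cd "$1" || exit 1

# Header: declares what the library offers.
cat > mathutil.h <<'EOF'
#ifndef MATHUTIL_H
#define MATHUTIL_H
int is_prime(unsigned n);
#endif
EOF

# Library source: includes its own header.
cat > mathutil.c <<'EOF'
#include "mathutil.h"
int is_prime(unsigned n)
{
    unsigned d;
    if (n < 2) return 0;
    for (d = 2; d <= n / d; d++)
        if (n % d == 0) return 0;
    return 1;
}
EOF

# Client code: includes the same header.
cat > prog.c <<'EOF'
#include <stdio.h>
#include "mathutil.h"
int main(void)
{
    printf("%d %d\n", is_prime(97), is_prime(91));
    return 0;
}
EOF

gcc -c mathutil.c -o mathutil.o &&   # compile only, no link
gcc -c prog.c -o prog.o &&
ar rcs libmathutil.a mathutil.o &&   # create the archive
gcc prog.o -L. -lmathutil -o prog    # link against libmathutil.a
)

# Hypothetical usage:
#   build_demo /tmp/demo && /tmp/demo/prog
```

Running the resulting prog should print 1 0, since 97 is prime and 91 = 7 x 13. Once libmathutil.a exists, only the last gcc line has to be repeated for each new program that uses the utilities.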

Re-export Shared Library Symbols from Other Library (OS X / POSIX)

My question is fairly OS X on x86-64 specific but a universal solution that works on other POSIX OSes is even more appreciated.
I have a list of symbol names from some shared library (called the original library in the following), and I want my shared library to re-export these symbols. By re-export I mean: if someone tries to resolve a symbol against my library, I either provide my version of this symbol or (if my library doesn't have this symbol) forward to the original library's symbol.
I don't know the types of the symbols, I only know whether they are functions (type T in nm output) or other symbols (type S in nm output).
For functions, I already have a solution: for every function I want to re-export, I generate an assembly stub that resolves the symbol dynamically (using dlsym()) and then jumps into the resolved function with the very same environment (registers rdi, rsi, rdx, rcx, r8, r9, stack pointer, ...). I'm basically generating universal proxy functions. With some macro trickery, these can be generated fairly easily without writing code for each and every symbol.
For non-function symbols the problem seems to be harder: I cannot generate such a universal proxy function, because the resolving party never calls a function.
Using a constructor function static void init(void) __attribute__((constructor)); I can execute code whenever someone loads my library, that would be a good point to resolve and re-export all non-function symbols if that's possible.
In other words, I'd like to write the symbol table of my library to point to the respective symbols of another shared library. Doing the rewriting at compile or run time is okay (run time preferred). Or put yet another way, the behaviour of DYLD_INSERT_LIBRARIES (LD_PRELOAD) is exactly what I need but I don't want to insert a new library, I want to replace one (in the file system). EDIT: The reason I don't want/can't use DYLD_INSERT_LIBRARIES or any other environment variable of the DYLD_* family is that they are ignored for code signed, restricted, ... binaries.
I'm aware of the -reexport-l, -reexport_library and -reexported_symbols_list linker flags, but I could not get them to work, especially when my library is a "replacement" for frameworks that are part of umbrella frameworks (example: /System/Library/Frameworks/CoreServices.framework/Frameworks/SearchKit.framework/SearchKit), because ld refuses to link directly against parts of umbrella frameworks.
EDIT: Because I explained it somewhat ambiguously: I can't change the way the actual program is linked. The goal is to produce a shared library that is a replacement for the original library. (Apparently called filter library.)
Found it out now (OS X specific): clang -o replacement-lib.dylib ... -Xlinker -reexport_library PATH_TO_ORIGINAL_LIB does the trick. PATH_TO_ORIGINAL_LIB could for example be /System/Library/Frameworks/CoreServices.framework/Frameworks/SearchKit.framework/Versions/Current/SearchKit.
If PATH_TO_ORIGINAL_LIB is a library that is part of an umbrella framework (as in the example above), then replace PATH_TO_ORIGINAL_LIB by the path of some other lib (I created a lib empty.dylib for that) and as a second step do
install_name_tool -change /usr/local/lib/empty.dylib PATH_TO_ORIGINAL_LIB replacement-lib.dylib
To see if the actual reexporting worked use:
otool -l replacement-lib.dylib | grep -A2 LC_REEXPORT_DYLIB
The output should look like
cmd LC_REEXPORT_DYLIB
cmdsize XX
name empty.dylib (offset YY)
After running install_name_tool, it should instead show
cmd LC_REEXPORT_DYLIB
cmdsize XX
name /System/Library/Frameworks/CoreServices.framework/Frameworks/SearchKit.framework/Versions/Current/SearchKit (offset YY)
You could link against both libraries and use the link order to make sure to link against the right symbols. This works on both OS X and Linux:
cc -o executable -lmylib -loriglib
Here origlib is the original library and mylib contains the symbols that are supposed to override symbols in origlib. The executable will then be linked against your symbols from mylib first, and all remaining unresolved symbols will be linked against origlib.
This works in the same way when linking against OS X frameworks. Just link against your library that replaces symbols first and against the framework after.
cc -o executable -lmylib -framework SomeFramework
Edit: If you just want to replace symbols at runtime, you can preload your library instead, using LD_PRELOAD on Linux or its OS X counterpart DYLD_INSERT_LIBRARIES:
cc -o executable -framework SomeFramework
DYLD_INSERT_LIBRARIES=libmylib.dylib ./executable

Can the object files output by gcc vary between compilations of the same source with the same options?

Does the object file that gcc outputs (C language) vary between compilations? There is no time-specific information and no change in the compilation options or the source code. There is no change in linked libraries or environment variables either. This is a VxWorks MIPS64 cross compiler, if that helps. I personally think it shouldn't change, but I observe that sometimes, seemingly at random, the generated instructions change. I don't know the reason. Can anyone shed some light on this?
How is this built? For example, when I build the very same Linux kernel, it includes a counter that is incremented on each build. GCC also has options to use profiler information to guide code generation; if the profiling information changes, so will the code.
What did you analyze? The generated assembly, an objdump of object files or the executable? How did you compare the different versions? Are you sure you looked at executable code, not compiler/assembler/linker timestamps?
Did anything change in the environment? New libraries (and header files/declarations/macro definitions!)? New compiler, linker? New kernel (yes, some header files originate with the kernel source and are shipped with it)?
Any changes in environment variables (another user doing the compiling, a different machine, a different network connection giving a different IP address that makes its way into the build)?
I'd try tracing the build process in detail (run a build and capture the output in a file, and do so again; compare those).
Completely mystified...
I had a similar problem with g++. Pre-4.3 versions produced exactly the same object files each time. With 4.3 (and later?), some of the mangled symbol names are different for each run, even without -g or other recorded information. Perhaps they use a time stamp or random number (I hope not). Obviously some of those symbols make it into the .o symbol table and you get a difference. (GCC documents a -frandom-seed=string option that replaces the random numbers used in certain generated symbol names, so it may be worth trying.)
Stripping the object file(s) makes them equal again (wrt. binary comparison).
g++ -c file.C ; strip file.o; cmp file.o origfile.o
Why should it vary? It is the same result always. Try this:
for i in `seq 1000`; do gcc 1.c; md5sum a.out; done | sort | uniq | wc -l
The answer is always 1. Replace 1.c and a.out to suit your needs.
The above counts how many distinct executables gcc generates when compiling the same source 1000 times.
I've found that in at least some environments, the same source may yield a different executable if the source tree for the subsequent build is located in a different directory. Example:
Checkout a pristine copy of your project to dir1. Do a full rebuild from scratch.
Then, with the same user on the same machine, checkout the same exact copy of your source code to dir2 (dir1 != dir2). Do another full rebuild from scratch.
These builds are minutes apart, with no change in the toolchain or in any third-party libraries or code. A binary comparison shows that the two source trees are identical. However, the executable in dir1 has a different md5sum than the executable in dir2.
If I compare the different executables in BeyondCompare's hex editor, the difference is not just some tiny section that could plausibly be a timestamp.
I do get the same executable if I build in dir1, then rebuild again in dir1. Same if I keep building the same source over and over from dir2.
My only guess is that some sort of absolute paths of the include hierarchy are embedded in the executable.
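One way to test that guess is to compile the same file in two different directories and compare the results. The sketch below does this with hypothetical names (same_across_dirs, unit.c) and assumes gcc, mktemp and cmp are available; it says nothing about any particular project's build system.

```shell
#!/bin/sh
# Usage: same_across_dirs SOURCE [extra gcc options...]
# Compiles SOURCE in two fresh directories with identical options and
# prints "identical" or "different" for the resulting object files.
same_across_dirs() (
    src=$1
    shift
    a=$(mktemp -d) || exit 2
    b=$(mktemp -d) || exit 2
    cp "$src" "$a/unit.c" || exit 2
    cp "$src" "$b/unit.c" || exit 2
    # Same relative file name in both places; only the directory differs.
    (cd "$a" && gcc "$@" -c unit.c -o unit.o) || exit 2
    (cd "$b" && gcc "$@" -c unit.c -o unit.o) || exit 2
    if cmp -s "$a/unit.o" "$b/unit.o"; then
        echo identical
    else
        echo different
    fi
    rm -rf "$a" "$b"
)

# Hypothetical usage:
#   same_across_dirs foo.c        # plain compile
#   same_across_dirs foo.c -g     # with debug information
```

With debug information enabled, GCC records the compilation directory in the object file, so the second call would be expected to report a difference where the first does not. GCC documents -fdebug-prefix-map=old=new for rewriting that recorded directory; whether it alone makes two builds match depends on the compiler version and on what else (for example __FILE__ expansions of absolute paths) ends up in the binary.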
My gcc sometimes produces different code for exactly the same input. The output object files differ in exactly one byte.
Sometimes this causes linker errors, because one of the possible object files is invalid. Recompiling to get the other version usually fixes the linker error.
The gcc version is 4.3.4 on SUSE Linux Enterprise.
The gcc parameters are:
cc -std=c++0x -Wall -fno-builtin -march=native -g -I<path1> -I<path2> -I<path3> -o obj/file.o -c file.cpp
If someone experiences the same effect, then please let me know.

Statically linking libclang in C code

I'm trying to write a simple syntax checker for C code using the frontend available in libclang. Due to deployment concerns, I need to be able to statically link all the libraries in libclang, and not pass around the .so file that has all the libraries.
I'm building clang/llvm from source, and in llvm/Release+Asserts/lib I have a bunch of .a files that I think I should be able to use, but it never seems to work (the linker spews out thousands of errors about missing symbols). However, when I compile it using the libclang.so also present in that directory as follows:
clang main.c -o bin/dlc -I../llvm/tools/clang/include -L../llvm/Release+Asserts/lib/ -lclang
Everything seems to work well.
What is the minimum set of .a files I need to include to make this work? I've tried including absolutely all of the .a files in the build output directory, with them provided to clang/gcc in different orders, without any success. I only need the functions mentioned in libclang's Index.h, but there don't seem to be any resources or documentation on what the various libclang*.a files are for. It would be very helpful to know which files libclang.so pulls in.
The following is supposed to work, as long as the whole project is built as static libraries (I counted 116 in my Release/lib directory).
clang main.c -o bin/dlc -I../llvm/tools/clang/include ../llvm/Release/lib/*.a
[edit: clang main.c -o bin/dlc -I../llvm/tools/clang/include ../llvm/Release/lib/libclang.a ../llvm/Release/lib/*.a]
Note that the output binary is not static, so you don't need any -static flag for gcc or ld, if you're using this syntax.
If that doesn't work, you might need to list the libraries in order: if one library requires a function defined in another library, the library that needs the function may have to come before the library that provides it on the command line. See the comments about link order at:
http://gcc.gnu.org/onlinedocs/gcc-4.7.2/gcc/Link-Options.html#Link-Options
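The effect of link order can be reproduced with two tiny static libraries. The sketch below uses invented names (link_order_demo, liba.a, libb.a, a(), b()) and has nothing to do with the actual LLVM/clang archives: liba.a calls into libb.a, so liba.a is listed first.

```shell
#!/bin/sh
# Build liba.a (which needs b()) and libb.a (which defines b()) in
# directory $1, then link main.o with the dependent library first.
link_order_demo() (
cd "$1" || exit 1

cat > a.c <<'EOF'
int b(void);
int a(void) { return b() + 1; }
EOF
cat > b.c <<'EOF'
int b(void) { return 41; }
EOF
cat > main.c <<'EOF'
#include <stdio.h>
int a(void);
int main(void) { printf("%d\n", a()); return 0; }
EOF

gcc -c a.c b.c main.c &&
ar rcs liba.a a.o &&
ar rcs libb.a b.o &&
# main.o needs a() from liba.a, which in turn needs b() from libb.a.
gcc main.o liba.a libb.a -o demo
)

# Hypothetical usage:
#   link_order_demo /tmp/order && /tmp/order/demo
```

With a traditional single-pass linker such as GNU ld, swapping the last two archives (libb.a liba.a) typically fails with an undefined reference to b. For archives that depend on each other in a cycle, GNU ld also accepts -Wl,--start-group ... -Wl,--end-group around the archive list, which makes it rescan the group until no new symbols are resolved.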

Fortran g77 compiler can't recognize o.f or comment "c"

I was using Fortran g77 and experienced this problem:
c this program calculates runoff and sediment
1 2
Unrecognized statement name at (1) and invalid form for assignment or statement-function definition at (2)
Also, the compiler only recognizes the .for file extension, not .f.
Does anyone know where the problem is? I downloaded it from http://www.cse.yorku.ca/~roumani/fortran/ftn.htm.
The compiler is not recognizing that statement as a comment. It should ignore a comment line, but it is trying to parse it. Are you sure that the "C" is in the first column?
Why are you using g77? It hasn't been supported for years. gfortran is the current GNU Fortran compiler. It can compile FORTRAN 77, Fortran 90, 95 and portions of 2003 and 2008.
EDIT: Perhaps it wants an upper-case "C".
The page you have linked to states that the f2exe wrapper passes -ffree-form to the compiler:
Compilation Command
The above f2exe command is just a batch file that invokes g77, the "real" compilation command. The command:
g77 -ffree-form prog.for -oprog.exe
directs the compiler to compile the file prog.for and stores the output in the file prog.exe. The -ffree-form switch indicates free-form style (remove it if you are using the old style).
In free-form Fortran the only allowed comment format is that of a line starting with !. As a matter of fact, this is also written on the same page directly under the above text:
Comments
In free-form style, use ! for both full-line and in-line comments. In the old style, use a "C" in column-1.
If you are not using the provided f2exe wrapper, don't pass -ffree-form option when compiling fixed-form FORTRAN 77 code.
I'll assume you want to stick with this compiler.
As noted above, the problems you have come from using the F2EXE batch file, which is not very useful: first, it automatically adds ".for" to the file name, so you can't compile ".f" files; second, it assumes free-form syntax, which is unusual when programming in Fortran 77 (if you want Fortran 90, find another compiler; other answers give you links).
Now, suppose you have written a program myprogram.f, and you are at a Windows command line, in the directory where the program resides (use "cd C:\mydirectory", for example, to change to it).
You will compile with
g77 myprogram.f
If you use SLATEC, you use
g77 myprogram.f -lslatec
If you want to specify a name for your .exe file (default is a.exe), you write
g77 myprogram.f -o myprogram.exe
There are other useful options:
g77 -O2 myprogram.f      to optimize (within g77 2.95 limitations)
g77 -Wall myprogram.f    to enable all compiler warnings, very useful to find errors in your code
g77 -c myprogram.f       to only compile (you get a .o file); this is useful to compile functions and subroutines, to later build a static library (.a file), like libslatec.a, which is given with the compiler
And to build a library, using ar.exe:
ar cru mylib.a myfunc1.o myfunc2.o ...
Then you can use it with
g77 myprogram.f mylib.a
G77 runs in command line under Windows. You write programs in a text editor.
Notepad++ is fairly good and it's free. See http://notepad-plus-plus.org/
If you have problems with compilation, they may come from environment variables, so here are some details. You have to tell Windows where to find the G77 compiler (g77.exe).
You can follow the instructions on the site where you downloaded it to change the Windows environment variables PATH and LIBRARY_PATH. They require you to install the compiler in the C:\F directory: that is, you will have C:\F\G77\bin, etc.
Slight modification to the instructions on that page:
You should set PATH to C:\F\G77\bin
And LIBRARY_PATH to C:\F\G77\lib;C:\F\SLATEC\lib
This modification to LIBRARY_PATH allows you to compile with SLATEC simply with "-lslatec" as above.
A note about the compiler. It's G77, also known as GNU Fortran 77: an old compiler, integrated with the well-known GCC suite until GCC 3.4.6 (we are at GCC 4.7.2 now). The compiler you downloaded corresponds to GCC 2.95.
It's a good Fortran 77 compiler, but it's not very well optimized, and of course, you don't get any support for new processor features such as Intel SSE.
Modern Fortran compilers can still understand most if not all of Fortran 77, plus all the newer features of Fortran 90 and newer standards, which are extremely useful.
It may also be interesting to know that there is another place to download the same compiler (except that there is no SLATEC), just in case the page disappears:
http://www.mbr-pwrc.usgs.gov/software/g77.html

Resources