Make LLVM inline a function from a library

Make LLVM inline a function from a library - linker

I am trying to make LLVM inline a function from a library.
I have LLVM bitcode files (manually generated) that I linked together with llvm-link, and I also have a library (written in C) compiled into bitcode by clang and archived with llvm-ar. I manage to link everything together and to execute but I can't manage to get LLVM to inline a function from the library. Any clue about how this should be done?

After you link the bitcode files together with the library, do you run an Internalize pass on the linked bitcode? The internalize pass makes all functions (besides main()) static and tells optimizer/code generator that the functions can be safely inlined without keeping a copy available for some (non-existent) external reference.
I manually link my bitcode files and bitcode libraries together using code borrowed from llvm-ld and I do the internalize pass, but I'm not sure if llvm-link does the internalize pass or not.

Related

Why do some system libraries require a -l option while others do not?

I recently took a class where we used pthreads and when compiling we were told to add -lpthread. But how come when using other #include <> statements for system header files, it seems that the linking of the object implementation code happens automatically? For example, if I just want to get the header file #include <stdio.h>, I don't need a -l option in compilation, the linking of that .o implementation file file just happens.

For this file
#include <stdio.h>
int main() {
return 0;
}
run
gcc -v -o simple simple.c
and you will see what, actually, gcc does. You will see that it links with libraries behind your back. This is why you don't specify system libraries explicitly.

Basic Answer: -lpthreads tells the compiler/linker to link to the pthreads library.
Longer answer
Headers tell the compiler that a certain function is going to be available (sometimes that function is defined in the header, perhaps inline) when the code is later linked. So the compiler marks the function essentially available but later during the linking phase the linker has to connect the function call to an actual function. A lot of function in the system headers are part of the "C Runtime Library" your linker automatically uses but others are provided by external libraries (such as pthreads). In the case where it is provided by an external library you have to use the '-lxxx' so the compiler/linker knows which external library to include in the process so it gets the address of the functions correctly.
Hope that helps

A C header file does not contain the implementations of the functions. It only contains function prototypes, so that the compiler could generate proper function calls.
The actual implementation is stored in a library. The most commonly used functions (such as printf and malloc) are implemented in the Standard C Library (LibC), which is implicitly linked to any executable file unless you request not to link it. The threads support is implemented in a separate library that has to be linked explicitly by passing the -pthread option to the linker.
NB: You can pass the option -pthread to the compiler that will also link the appropriate library.

The compiler links the standard C library libc.a implicitly so a -lc argument is not necessary. Pthreads is a system library but not a part of the C standard library, so must be explicitly linked.
If you were to specify -nolibc you would then need to explicitly link libc (or some alternative C library).
If all system libraries were linked implicitly, gcc would have to be have a different implementation for each system (for example pthreads is not a system library on Windows), and if a system introduced a new library, gcc would have to change in lock step. Moreover the link time would increase as each library were searched in some unknown sequence to resolve symbols. The C standard library is the one library the compiler can rely on to be provided in any particular implementation, so implicit linking is generally safe and a simple convenience.

Some library files are are searched by default, simply for convenience, basically the C standard library. Linkers have options to disable this convenience and have no default libraries (for if you are doing something unusual and don't want them). Non-default libraries, you have to tell linker to use them.
There could be a mechanism to tell what libraries are needed in the source code (include files are plain text which is just "copy-pasted" at #include line), there's no technical difficulty. But there just isn't (as far as I know, not even as non-standard compiler extension).
There are some solutions to the problem though, such as pkg-config for unixy platforms, which tackle the problem of include and library files by providing the compiler and linker options for a library easily.

the -l option is for linking against an external library in this case libpthread, against system libraries is linked by default
http://www.network-theory.co.uk/docs/gccintro/gccintro_17.html
C++: How to add external libraries

How does a compiler find out which dynamic link library will be used in my code, if I only include headers-files, where is not describe it?

How does a compiler find out which dynamic link library will be used in my code, if I only include headers-files, where is not describe it?
#include <stdio.h>
void main()
{
printf("Hello world\n");
}
There is I only include
stdio.h
and my code is used
printf function
How it is known, in headers-files prototypes , macros and constant are described, but nothing about in which file "printf" is implement. How does then it works?

When you compile a runnable executable, you don't just specify the source code, but also a list of libraries from which undefined references are looked up. With the C standard library, this happens implicitly (unless you tell GCC -nostdinc), so you may not have been consciously aware of this.
The libraries are only consumed by the linker, not the compiler. The linker locates all the undefined references in the libraries. If the library is a static one, the linker just adds the actual machine code to your final executable. On the other hand, if the library is a shared one, the linker only records the name (and version?) of the library in the executable's header. It is then the job of the loader to find appropriate libraries at load time and resolve the missing dependencies on the fly.
On Linux, you can use ldd to list the load-time dependencies of a dynamically linked executable, e.g. try ldd /bin/ls. (On MacOS, you can use otool -L for the same purpose.)

As others have answered, the standard c library is implicitly linked. If you are using gcc you can use the -Wl,--trace option to see what the linker is doing.
I tested your example code:
gcc -Wl,--trace main.c
Gives:
/usr/bin/ld: mode elf_x86_64
/usr/lib/gcc/x86_64-linux-gnu/4.6/../../../x86_64-linux-gnu/crt1.o
/usr/lib/gcc/x86_64-linux-gnu/4.6/../../../x86_64-linux-gnu/crti.o
/usr/lib/gcc/x86_64-linux-gnu/4.6/crtbegin.o
/tmp/ccCjfUFN.o
-lgcc_s (/usr/lib/gcc/x86_64-linux-gnu/4.6/libgcc_s.so)
/lib/x86_64-linux-gnu/libc.so.6
(/usr/lib/x86_64-linux-gnu/libc_nonshared.a)elf-init.oS
/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
-lgcc_s (/usr/lib/gcc/x86_64-linux-gnu/4.6/libgcc_s.so)
/usr/lib/gcc/x86_64-linux-gnu/4.6/crtend.o
/usr/lib/gcc/x86_64-linux-gnu/4.6/../../../x86_64-linux-gnu/crtn.o
This shows that the linker is using libc.so (and also ld-linux.so).

The library glibc is linked by default by GCC. There is no need to mention -l library when you are building your executable. Hence you find that the functions printf and others which are a part of glibc do not need any linking exclusively.

Technically your compiler does not figure out which libraries will be used. The linker (commonly ld) does this. The header files only tell the compiler what interface your library functions use and leaves it up to the linker to figure out where they are.
A source file goes a long path until it becomes an executable. Commonly
source.c -[preprocess]> source.i -[compile]> source.s -[assemble]> source.o -[link]> a.out
When you invoke cc source.c all those steps are done transparently for you in one go and the standard libraries (commonly libc.so) and executable loader (commonly crt0.o) are linked together.
Any additional libraries have to be passed as additional linker flags i.e. -lpthread.

I would say that depends on IDE or the compiler and system. Header file just contains interface information like name of function parameters it expects any attributes others and that's how compiler first convert your code to an intermediate object file.
After that comes linking where in code for printf is getting added to the executable either through static library or dynamic library.
Functions and other facilities like STL are part of C/C++ so they are either delivered by compiler or system. e.g on Solaris there is no debug version of C library unless you are using gcc. But on Visual Studio you have debug version msvcrt.dll and you can also link C library statically.
In short the answer is that code for printf and other functions in C library are added by compiler at link time.

How do you create a lua plugin that calls the c lua api?

I'm getting errors in the lua plugin that I'm writing that are symptomatic of linking in two copies of the lua runtime, as per this message:
http://lua-users.org/lists/lua-l/2008-01/msg00671.html
Quote:
Which in turn means the equality test for dummynode is failing.
This is the usual symptom, if you've linked two copies of the Lua
core into your application (causing two instances of dummynode to
appear).
A common error is to link C extension modules (shared libraries)
with the static library. The linker command line for extension
modules must not ever contain -llua or anything similar!
The Lua core symbols (lua_insert() and so on) are only to be
exported from the executable which contains the Lua core itself.
All C extension modules loaded afterwards can then access these
symbols. Under ELF systems this is what -Wl,-E is for on the
linker line. MACH-O systems don't need this since all non-static
symbols are exported.
This is exactly the error I'm seeing... what I don't know is what I should be doing instead.
I've added the lua src directory to the include path of the DLL that is the c component of my lua plugin, but when I link it I get a pile of errors like:
Creating library file: libmo.dll.a
CMakeFiles/moshared.dir/objects.a(LTools.c.obj): In function `moLTools_dump':
d:/projects/mo-pong/deps/mo/src/mo/lua/LTools.c:38: undefined reference to `lua_gettop'
d:/projects/mo-pong/deps/mo/src/mo/lua/LTools.c:47: undefined reference to `lua_type'
d:/projects/mo-pong/deps/mo/src/mo/lua/LTools.c:48: undefined reference to `lua_typename'
d:/projects/mo-pong/deps/mo/src/mo/lua/LTools.c:49: undefined reference to `lua_tolstring'
So, in summary, I have this situation:
A parent binary that is statically linked to the lua runtime.
A lua library that loads a DLL with C code in it.
The C code in the DLL needs to invoke the lua c api (eg. lua_gettop())
How do I link that? Surely the dynamic library can't 'see' the symbols in the parent binary, because the parent binary isn't loading them from a DLL, they're statically linked.
...but if I link the symbols in as part of the plugin, I get the error above.
Help? This seems like a problem that should turn up a lot (dll depends on symbols in parent binary, how do you link it?) but I can't seem to see any useful threads about it.
(before you ask, no, I dont have control over the parent binary and I cant get it to load the lua symbols from the DLL)

It's probably best to use libtool for this to make your linking easier and more portable. The executable needs to be linked with -export-dynamic to export all the symbols in it, including the Lua symbols from the static library. The module needs to then be linked with -module -shared -avoid-version and, if on Windows, additionall -no-undefined; if on MacOS, additionally -no-undefined -flat_namespace -undefined suppress -bundle; Linux and FreeBSD need no other symbols. This will leave the module with undefined symbols that are satisfied in the parent. If there are any missing, the module will fail to be dlopened by the parent.
The semantics are slightly different for each environment, so it might take some fiddling. Sometimes order of the flags matters. Again, libtool is recommended since it hides much of the inconsistency.

LLVM - linking problem

I am writing a LLVM code generator for the language Timber, the current compiler emits C-code. My problem is that I need to call C functions from the generated LLVM files, for example the compiler has a real-time garbage collector and i need to call functions to notify when new objects are allocated on the heap. I have no idea on how to link these functions with my generated LLVM files.
The code generation is made by generate .ll-files and then manually compile these.
I'm trying to call an external function from LLVM but i have no luck. In the examples I've >found only C standard functions like "puts" and "printf" are called, but I want to call a >homemade function. I'm stuck.

I'm assuming you're writing an LLVM transformation, and you want to add calls to external functions into transformed code. If this is not the case, edit your question and include more information.
Before you can call an external function from LLVM code, you need to insert a declaration for it. For example:
virtual bool runOnModule(Module &m) {
Constant *log_func = m.getOrInsertFunction("log_func",
Type::VoidTy,
PointerType::getUnqual(Type::Int8Ty),
Type::Int32Ty,
Type::Int32Ty,
NULL);
...
}
The code above declares a function log_func which returns void and takes three arguments: a byte pointer (string), and two 32-bit integers. getOrInsertFunction is a method of Module.
To actually call the function, you have to insert a CallInst. There are several static Create methods for this.

Compile your LLVM assembly files normally with llvm-as:
llvm-as *.ll
Compile the bitcode files to .s assembly language files:
llc *.bc
GCC them in with the runtime library:
gcc *.s runtime.c -o executable
Substitute in real makefiles, shared libraries, etc. if necessary. You get the idea.

I'm interpreting your question as being "how do I implement a runtime library in C or C++ for my language that gets compiled to LLVM?"
One approach is, as detailed by Jonathan Tang, to transform the output of your compiler from LLVM IR to bitcode to assembly, and have vanilla gcc link the assembly against the runtime source (or object files).
An alternative, possibly more flexible approach is to use llvm-gcc to compile the runtime itself into LLVM bitcode, and then use llvm-ld to link the bitcode from your compiler with the bitcode of your runtime. This bitcode can then be re-optimized with opt, converted back to IR with llvm-dis, interpreted directly with lli (this will, afaik, only work if LLVM was built against libffi), or compiled to assembly with llc (and then to a native binary with vanilla gcc).

How do linkers decide what parts of libraries to include?

Assume library A has a() and b(). If I link my program B with A and call a(), does b() get included in the binary? Does the compiler see if any function in the program call b() (perhaps a() calls b() or another lib calls b())? If so, how does the compiler get this information? If not, isn't this a big waste of final compile size if I'm linking to a big library but only using a minor feature?

Take a look at link-time optimization. This is necessarily vendor dependent. It will also depend how you build your binaries. MS compilers (2005 onwards at least) provide something called Function Level Linking -- which is another way of stripping symbols you don't need. This post explains how the same can be achieved with GCC (this is old, GCC must've moved on but the content is relevant to your question).
Also take a look at the LLVM implementation (and the examples section).
I suggest you also take a look at Linkers and Loaders by John Levine -- an excellent read.

It depends.
If the library is a shared object or DLL, then everything in the library is loaded, but at run time. The cost in extra memory is (hopefully) offset by sharing the library (really, the code pages) between all the processes in memory that use that library. This is a big win for something like libc.so, less so for myreallyobscurelibrary.so. But you probably aren't asking about shared objects, really.
Static libraries are a simply a collection of individual object files, each the result of a separate compilation (or assembly), and possibly not even written in the same source language. Each object file has a number of exported symbols, and almost always a number of imported symbols.
The linker's job is to create a finished executable that has no remaining undefined imported symbols. (I'm lying, of course, if dynamic linking is allowed, but bear with me.) To do that, it starts with the modules named explicitly on the link command line (and possibly implicitly in its configuration) and assumes that any module named explicitly must be part of the finished executable. It then attempts to find definitions for all of the undefined symbols.
Usually, the named object modules expect to get symbols from some library such as libc.a.
In your example, you have a single module that calls the function a(), which will result in the linker looking for module that exports a().
You say that the library named A (on unix, probably libA.a) offers a() and b(), but you don't specify how. You implied that a() and b() do not call each other, which I will assume.
If libA.a was built from a.o and b.o where each defines the corresponding single function, then the linker will include a.o and ignore b.o.
However, if libA.a included ab.o that defined both a() and b() then it will include ab.o in the link, satisfying the need for a(), and including the unused function b().
As others have mentioned, there are linkers that are capable of splitting individual functions out of modules, and including only those that are actually used. In many cases, that is a safe thing to do. But it is usually safest to assume that your linker does not do that unless you have specific documentation.
Something else to be aware of is that most linkers make as few passes as they can through the files and libraries that are named on the command line, and build up their symbol table as they go. As a practical matter, this means that it is good practice to always specify libraries after all of the object modules on the link command line.

It depends on the linker.
eg. Microsoft Visual C++ has an option "Enable function level linking" so you can enable it manually.
(I assume they have a reason for not just enabling it all the time...maybe linking is slower or something)

Usually (static) libraries are composed of objects created from source files. What linkers usually do is include the object if a function that is provided by that object is referenced. if your source file only contains one function than only that function will be brought in by the linker. There are more sophisticated linkers out there but most C based linkers still work like outlined. There are tools available that split C source that contain multiple functions into artificially smaller source files to make static linking more fine granular.
If you are using shared libraries then you don't impact you compiled size by using more or less of them. However your runtime size will include them.

This lecture at Academic Earth gives a pretty good overview, linking is talked about near the later half of the talk, IIRC.

Without any optimization, yes, it'll be included. The linker, however, might be able to optimize out by statically analyzing the code and trying to remove unreachable code.

It depends on the linker, but in general only functions that are actually called get included in the final executable. The linker works by looking up the function name in the library and then using the code associated with the name.
There are very few books on linkers, which is strange when you think how important they are. The text for a good one can be found here.

It depends on the options passed to the linker, but typically the linker will leave out the object files in a library that are not referenced anywhere.
$ cat foo.c
int main(){}
$ gcc -static foo.c
$ size
text data bss dec hex filename
452659 1928 6880 461467 70a9b a.out
# force linking of libz.a even though it isn't used
$ gcc -static foo.c -Wl,-whole-archive -lz -Wl,-no-whole-archive
$ size
text data bss dec hex filename
517951 2180 6844 526975 80a7f a.out

It depends on the linker and how the library was built. Usually libraries are a combination of object files (import libraries are a major exception to this). Older linkers would pull things into the output file image at a granularity of the object files that were put into the library. So if function a() and function b() were both in the same object file, they would both be in the output file - even if only one of the 2 functions were actually referenced.
This is a reason why you'll often see library-oriented projects with a policy of a single C function per source file. That way each function is packaged in its own object file and linkers have no problem pulling in only what is referenced.
Note however that newer linkers (certainly newer Microsoft linkers) have the ability to pull in only parts of object files that are referenced, so there's less of a need today to enforce a one-function-per-source-file policy - though there are reasonable arguments that that should be done anyway for maintainability.

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight