Only link certain symbols from a library - c

I am developing an embedded system with GCC, and would like to only use a few symbols from libc. For instance, I would like to use the basic memcpy, memmove, memset, strlen, strcpy, etc. However, I would like to provide my own (smaller) printf function, so I do not want libc to provide printf. I don't want dynamic allocation on this platform, so I do not want malloc to resolve at all.
Is there a way to tell GCC "only provide these symbols" from libc?
edit: To be clear, I am asking if there is a way I can only provide a few specific symbols from a library, not just override a library function with my own implementation. If the code uses a symbol that is in the library but not specified, the linker should fail with "unresolved symbol". If another question explains how to do this, I haven't yet seen it.

This should happen "automatically" as long as your libc and linker setup supports it. You haven't said what your platform is, so here is one where it does work.
So, let's create a silly example using snprintf.
/*
* main.c
*/
#include <stdio.h>
int main(int argc, char **argv) {
char l[100];
snprintf(l, 100, "%s %d\n", argv[0], argc);
return 0;
}
try to compile and link it
$ CC=/opt/gcc-arm-none-eabi-4_7-2013q3/bin/arm-none-eabi-gcc
$ CFLAGS="-mcpu=arm926ej-s -Wall -Wextra -O6"
$ LDFLAGS="-nostartfiles -L. -Wl,--gc-sections,-emain"
$ $CC $CFLAGS -c main.c -o main.o
$ $CC $LDFLAGS main.o -o example
/opt/gcc-arm-none-eabi-4_7-2013q3/bin/../lib/gcc/arm-none-eabi/4.7.4/../../../../arm-none-eabi/lib/libc.a(lib_a-sbrkr.o): In function `_sbrk_r':
sbrkr.c:(.text._sbrk_r+0x18): undefined reference to `_sbrk'
collect2: error: ld returned 1 exit status
It needs _sbrk because newlib *printf functions use malloc which needs a way to allocate system memory. Let's provide it a dummy one.
/*
* sbrk.c
*/
#include <stdint.h>
#include <unistd.h>
void *_sbrk(intptr_t increment) {
return 0;
}
and compile it
$ $CC $CFLAGS -c sbrk.c -o sbrk.o
$ $CC $LDFLAGS -Wl,-Map,"sbrk.map" main.o sbrk.o -o with-sbrk
$ /opt/gcc-arm-none-eabi-4_7-2013q3/bin/arm-none-eabi-size with-sbrk
text data bss dec hex filename
28956 2164 56 31176 79c8 with-sbrk
Well, that's the reason you'd like to get rid of printf and friends, isn't it? Now, replace snprintf with our function
/*
* replace.c
*/
#include <stdio.h>
#include <string.h>
int snprintf(char *str, size_t size, const char *format, ...) {
return strlen(format);
}
then compile
$ $CC $CFLAGS -c replace.c -o replace.o
$ $CC $LDFLAGS -Wl,-Map,"replace.map" main.o replace.o -o with-replace
$ /opt/gcc-arm-none-eabi-4_7-2013q3/bin/arm-none-eabi-size with-replace
text data bss dec hex filename
180 0 0 180 b4 with-replace
Note that we did not use the _sbrk stub at all. As long as you don't provide _sbrk, you can be sure that malloc is not (can't be) linked and used.
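The replacement above is only a stub that returns strlen(format). As a rough sketch of what a "smaller printf" might look like, here is a hypothetical mini_snprintf (the name and the supported subset are assumptions, not part of newlib) that handles only %s, %d and %% and never allocates:

```c
#include <stdarg.h>
#include <stddef.h>

/* Store one character if it fits, but always count it (snprintf semantics). */
static void put_char(char *buf, size_t size, size_t *len, char c)
{
    if (*len + 1 < size)
        buf[*len] = c;
    (*len)++;
}

/* Hypothetical minimal formatter: supports only %s, %d and %%. */
int mini_snprintf(char *buf, size_t size, const char *fmt, ...)
{
    va_list ap;
    size_t len = 0;

    va_start(ap, fmt);
    for (; *fmt != '\0'; fmt++) {
        if (*fmt != '%') {
            put_char(buf, size, &len, *fmt);
            continue;
        }
        fmt++;
        if (*fmt == 's') {
            const char *s = va_arg(ap, const char *);
            while (*s != '\0')
                put_char(buf, size, &len, *s++);
        } else if (*fmt == 'd') {
            int v = va_arg(ap, int);
            /* Negate in unsigned arithmetic so INT_MIN does not overflow. */
            unsigned int u = (v < 0) ? 0u - (unsigned int)v : (unsigned int)v;
            char digits[12];
            int n = 0;

            if (v < 0)
                put_char(buf, size, &len, '-');
            do {
                digits[n++] = (char)('0' + u % 10u);
                u /= 10u;
            } while (u != 0u);
            while (n > 0)
                put_char(buf, size, &len, digits[--n]);
        } else if (*fmt == '%') {
            put_char(buf, size, &len, '%');
        } else if (*fmt == '\0') {
            break; /* lone '%' at the end of the format: stop */
        }
        /* Any other conversion is skipped without consuming an argument. */
    }
    va_end(ap);

    /* Terminate inside the buffer, truncating if the output was too long. */
    if (size > 0)
        buf[(len < size) ? len : size - 1] = '\0';
    return (int)len;
}
```

Like snprintf, it returns the length the full output would have had, so a caller can detect truncation by comparing the result with the buffer size.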

The simplest solution is probably to use a wrapper which defines the symbols and resolves them at runtime using dlfcn:
#include <dlfcn.h>
void* (*memcpy)(void *dest, const void *src, size_t n);
char* (*strncpy)(char *dest, const char *src, size_t n);
...
void init_symbols (void) {
void *handle = dlopen("/lib/libc.so.6", RTLD_LAZY);
memcpy = dlsym(handle, "memcpy");
strncpy = dlsym(handle, "strncpy");
...
}
and link your binary with -nostdlib. This gives you the best control over which symbols to use from which source.

Related

dl_iterate_phdr() returns 0 as the main executable base address? [duplicate]

For various purposes, I am trying to obtain the address of the ELF header of the main executable without parsing /proc/self/maps. I have tried parsing the link_list chain given by dlopen/dlinfo functions but they do not contain an entry where l_addr points to the base address of the main executable. Is there any way to do this (Standard or not) without parsing /proc/self/maps?
An example of what I'm trying to do:
#include <stdio.h>
#include <elf.h>
int main()
{
Elf32_Ehdr* header = /* Somehow obtain the address of the ELF header of this program */;
printf("%p\n", header);
/* Read the header and do stuff, etc */
return 0;
}
The void * pointer returned by dlopen(0, RTLD_LAZY) gives you a struct link_map *, that corresponds to the main executable.
Calling dl_iterate_phdr also returns the entry for the main executable on the very first execution of callback.
You are likely confused by the fact that .l_addr == 0 in the link map, and that dlpi_addr == 0 when using dl_iterate_phdr.
This is happening, because l_addr (and dlpi_addr) don't actually record the load address of an ELF image. Rather, they record the relocation that has been applied to that image.
Usually the main executable is built to load at 0x400000 (for x86_64 Linux) or at 0x08048000 (for ix86 Linux), and is loaded at that same address (i.e. it is not relocated).
But if you link your executable with the -pie flag, then it will be linked at 0x0, and it will be relocated to some other address.
So how do you get to the ELF header?
2023 Update:
Isn't a simpler method (if relying on undocumented details), just to call dladdr on the l_ld address in the struct link_map, and then use dli_fbase out of that? – Simon Kissane
Indeed it is. Here is a much simpler solution:
#define _GNU_SOURCE
#include <dlfcn.h>
#include <link.h>
#include <stdio.h>
int main()
{
void *dyn = _DYNAMIC;
Dl_info info;
if (dladdr(dyn, &info) != 0) {
printf("a.out loaded at %p\n", info.dli_fbase);
}
return 0;
}
gcc -g -Wall -Wextra x.c -ldl && ./a.out
a.out loaded at 0x556433ea0000 # high address here because my GCC defaults to PIE.
gcc -g -Wall -Wextra x.c -ldl -no-pie && ./a.out
a.out loaded at 0x400000
gcc -g -Wall -Wextra x.c -ldl -no-pie -m32 && ./a.out
a.out loaded at 0x8048000
Original 2012 solution:
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <link.h>
#include <stdio.h>
#include <stdlib.h>
static int
callback(struct dl_phdr_info *info, size_t size, void *data)
{
int j;
static int once = 0;
if (once) return 0;
once = 1;
printf("relocation: 0x%lx\n", (long)info->dlpi_addr);
for (j = 0; j < info->dlpi_phnum; j++) {
if (info->dlpi_phdr[j].p_type == PT_LOAD) {
printf("a.out loaded at %p\n",
(void *) (info->dlpi_addr + info->dlpi_phdr[j].p_vaddr));
break;
}
}
return 0;
}
int
main(int argc, char *argv[])
{
dl_iterate_phdr(callback, NULL);
exit(EXIT_SUCCESS);
}
$ gcc -m32 t.c && ./a.out
relocation: 0x0
a.out loaded at 0x8048000
$ gcc -m64 t.c && ./a.out
relocation: 0x0
a.out loaded at 0x400000
$ gcc -m32 -pie -fPIC t.c && ./a.out
relocation: 0xf7789000
a.out loaded at 0xf7789000
$ gcc -m64 -pie -fPIC t.c && ./a.out
relocation: 0x7f3824964000
a.out loaded at 0x7f3824964000
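To get from the load address to the ELF header the question asks about, something like the following might work. It is a sketch under one assumption that the source does not state: that the PT_LOAD segment with file offset 0 is the one that maps the ELF header, which is the usual layout but not guaranteed. The function names are hypothetical. The result is checked against the ELF magic before it is returned.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <elf.h>
#include <link.h>
#include <stddef.h>
#include <string.h>

/* Callback for dl_iterate_phdr; only the first object (the main program) is inspected. */
static int find_main_ehdr(struct dl_phdr_info *info, size_t size, void *data)
{
    const ElfW(Ehdr) **out = data;
    int j;

    (void)size;
    for (j = 0; j < info->dlpi_phnum; j++) {
        const ElfW(Phdr) *ph = &info->dlpi_phdr[j];

        /* Assumption: the PT_LOAD segment at file offset 0 contains the ELF header. */
        if (ph->p_type == PT_LOAD && ph->p_offset == 0) {
            /* relocation + link-time address = run-time address */
            *out = (const ElfW(Ehdr) *)(info->dlpi_addr + ph->p_vaddr);
            break;
        }
    }
    return 1; /* a non-zero return stops the iteration after the first object */
}

/* Returns the ELF header of the main executable, or NULL if it was not found. */
const ElfW(Ehdr) *main_executable_ehdr(void)
{
    const ElfW(Ehdr) *ehdr = NULL;

    dl_iterate_phdr(find_main_ehdr, &ehdr);
    /* Sanity check: the first bytes must be the ELF magic "\177ELF". */
    if (ehdr != NULL && memcmp(ehdr->e_ident, ELFMAG, SELFMAG) != 0)
        ehdr = NULL;
    return ehdr;
}
```

Returning non-zero from the callback replaces the static "once" flag used above, since dl_iterate_phdr stops as soon as the callback returns a non-zero value.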
Update:
Why does the man page say "base address" and not relocation?
It's a bug ;-)
I am guessing that the man page was written long before prelink, PIE, and ASLR existed. Without prelink, shared libraries are always linked to load at address 0x0, and then relocation and base address become one and the same.
how come dlpi_name points to an empty string when info refers to the main executable?
It's an accident of implementation.
The way this works, is that the kernel open(2)s the executable and passes the open file descriptor to the loader (in the auxv[] vector, as AT_EXECFD). Everything the loader knows about the executable it gets by reading that file descriptor.
There is no easy way on UNIX to map a file descriptor back to the name it was opened as. For one thing, UNIX supports hard-links, and there could be multiple filenames that refer to the same file.
Newer Linux kernels also pass in the name that was used to execve(2) the executable (also in auxv[], as AT_EXECFN). But that is optional, and even when it is passed in, glibc doesn't put it into .l_name / dlpi_name in order to not break existing programs which became dependent on the name being empty.
Instead, glibc saves that name in __progname and __progname_full.
The loader could readlink(2) the name from /proc/self/exe on systems that didn't use AT_EXECFN, but the /proc file system is not guaranteed to be mounted either, so that would still leave it with an empty name sometimes.
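If all you need is the executable's name, a program can read the same information itself. The sketch below assumes glibc on Linux: getauxval(3) returns the AT_EXECFN entry when the kernel supplied one, and program_invocation_name is the documented GNU name for what glibc keeps in __progname_full. The function name executable_name is hypothetical.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>      /* program_invocation_name (GNU extension) */
#include <stddef.h>
#include <sys/auxv.h>   /* getauxval, AT_EXECFN */

/* Prefer the pathname the kernel recorded for execve(2); fall back to glibc's saved name. */
const char *executable_name(void)
{
    /* getauxval returns 0 when the entry is absent, which becomes a NULL pointer here. */
    const char *from_auxv = (const char *)getauxval(AT_EXECFN);

    if (from_auxv != NULL && from_auxv[0] != '\0')
        return from_auxv;
    return program_invocation_name;
}
```

Neither value is necessarily an absolute path; both reflect how the program was invoked.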
There is the glibc dl_iterate_phdr() function. I'm not sure it gives you exactly what you want, but that is as close as I know:
"The dl_iterate_phdr() function allows an application to inquire at run time to find out which shared objects it has loaded."
http://linux.die.net/man/3/dl_iterate_phdr

How to automatically run user-code upon loading Fortran module

In C using GCC, one can use the following function to have some code called upon loading a shared library:
static void __attribute__((constructor)) _my_initializer(void)
{
...
}
After some search on the web, I could not find the equivalent in Fortran using GCC (i.e. gfortran). For sure that this feature must exist in gfortran since it comes from GCC (thus it should be available in all languages supported by GCC). Any pointers?
"For sure that this feature must exist in gfortran since it comes from GCC" That is clearly false. It simply does not have to exist. gfortran does support the !GCC$ ATTRIBUTES directive, but the number of attributes supported is limited.
You can write your constructor in C and let it be part of the same library and call any Fortran code you want.
Example:
library.f90:
subroutine sub() bind(C)
write(*,*) "Hello!"
end subroutine
init_library.c:
void sub(void);
static void __attribute__((constructor)) _init(void)
{
sub();
}
load_library.c:
#include <stdio.h>
#include <dlfcn.h>
int main(void)
{
/* "./" makes dlopen use this path directly instead of searching the library path */
void *lib = dlopen("./library.so", RTLD_NOW);
if (lib == NULL) {
fprintf(stderr, "ERROR: Cannot load library: %s\n", dlerror());
return 1;
}
dlclose(lib);
return 0;
}
compile and run:
> gfortran -c -fPIC init_library.c
> gfortran -c -fPIC library.f90
> gfortran -shared library.o init_library.o -o library.so
> gfortran load_library.c -ldl
> ./a.out
Hello!
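The constructor mechanism itself can be checked without Fortran or dlopen. In this minimal sketch (the flag and function names are hypothetical) the constructor is compiled into the program rather than a shared library, so it runs before main() instead of at load time of a library:

```c
/* Hypothetical flag, set by the constructor before main() starts. */
static int library_initialized = 0;

/* GCC/Clang extension: run automatically when the containing object is loaded. */
static void __attribute__((constructor)) my_initializer(void)
{
    library_initialized = 1;
}

/* Lets callers check whether the constructor has already run. */
int is_library_initialized(void)
{
    return library_initialized;
}
```

In the Fortran setup above, the body of the constructor would call the bind(C) subroutine instead of setting a flag.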

Can't preload a function in C program

I wrote some code in nasm and I'm trying to implement it in a C program as a replacement of strlen through a shared library, but it doesn't work.
nasm code:
section .text
global strlen:function
strlen:
mov rax, 42
ret
C code:
#include <stdio.h>
size_t strlen(const char *s);
int main()
{
printf("%zu\n", strlen("foobar"));
return (0);
}
I compile the C program just using gcc without any arguments, and I create the shared library with the following commands:
nasm -f elf64 strlen.asm
gcc -shared -fPIC -o libasm.so strlen.o
Finally, I include the shared library:
export LD_PRELOAD=`pwd`/libasm.so
But it displays '6' where I expect it to display '42'.
I don't think the problem comes from my library, because I get segmentation fault when I execute the ls command with LD_PRELOAD.
I'm working on Ubuntu 16.04.
This is not related to nasm at all. A C equivalent of your strlen() function does not work either.
$ cat strlen.c
#include <stddef.h>
size_t strlen(const char *s)
{
return 43;
}
$ cat s.c
#include <stdio.h>
size_t strlen(const char *s);
int main()
{
printf("%zu\n", strlen("foobar"));
return 0;
}
$ make s
cc s.c -o s
$ gcc -shared -fPIC -o strlen.so strlen.c
$ LD_PRELOAD=$PWD/strlen.so ./s
6
What is happening here is that gcc is using its own built-in version of strlen() that cannot be overridden. If the C program that calls strlen() is recompiled to not use this built-in version of strlen(), then your shared library can override it.
$ rm s
$ make s CFLAGS=-fno-builtin-strlen
cc -fno-builtin-strlen s.c -o s
$ LD_PRELOAD=$PWD/strlen.so ./s
43
$ LD_PRELOAD=$PWD/libasm.so ./s
42
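If recompiling with -fno-builtin-strlen is not an option, another way to force a real call is to go through a volatile function pointer, since the compiler then cannot fold the call at compile time. This is a sketch of that alternative, not what the answer above does; the wrapper name is hypothetical, and whether a preloaded strlen is picked up still depends on the program being dynamically linked.

```c
#include <stddef.h>
#include <string.h>

/* Calls strlen through a pointer the compiler must reload at run time. */
size_t strlen_no_builtin(const char *s)
{
    /* volatile: the pointer value is opaque to the optimizer, so no constant folding */
    size_t (*volatile call_strlen)(const char *) = strlen;

    return call_strlen(s);
}
```

Without a preloaded library, strlen_no_builtin("foobar") simply returns 6 from the libc implementation.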

Using dlsym() to look for a variable in a statically linked library

We have a program that links in a number of static libraries, which may or may not define a number of symbols depending on compilation options. On OS X, we use dlsym(3) with a NULL handle to obtain the symbol addresses. However, on Linux, dlsym(3) always returns NULL.
Consider a trivial program (sources below) that links in a static library containing a function and a variable and tries to print their addresses. We can check that the program contains the symbols:
$ nm -C test | grep "test\(func\|var\)"
0000000000400715 T testFunc
0000000000601050 B testVar
However, when the program is run, neither can be located:
$ ./test
testVar: (nil)
testFunc: (nil)
Is what we are trying to do possible on Linux, using glibc's implementation of dlsym(3)?
 Makefile
(Sorry about the spaces)
LDFLAGS=-L.
LDLIBS=-Wl,--whole-archive -ltest -Wl,--no-whole-archive -ldl
libtest.o: libtest.c libtest.h
libtest.a: libtest.o
test: test.o libtest.a
clean:
-rm -f test test.o libtest.o libtest.a
libtest.h
#pragma once
extern void *testVar;
extern int testFunc(int);
libtest.c
#include "libtest.h"
void *testVar;
int testFunc(int x) { return x + 42; }
test.c
#include <stdlib.h>
#include <stdio.h>
#include <dlfcn.h>
int main(int argc, char *argv[]) {
void *handle = dlopen(NULL, 0);
void *symbol = dlsym(handle, "testVar");
printf("testVar: %p\n", symbol);
symbol = dlsym(handle, "testFunc");
printf("testFunc: %p\n", symbol);
return 0;
}
You should link your program with -rdynamic (or --export-dynamic for ld(1)) so
LDFLAGS += -rdynamic -L.
Then all the symbols are in the dynamic symbol table, the one used by dlsym.
BTW, the visibility attribute could be of interest.
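When a lookup comes back as NULL, dlerror() usually says why. Here is a small sketch of a lookup helper with error reporting; the name lookup_global_symbol is hypothetical, and it passes RTLD_NOW rather than the 0 used in the question, since dlopen expects one of RTLD_LAZY or RTLD_NOW in its flags. On older glibc versions this needs -ldl at link time.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical helper: look a name up in the global symbol scope of the running program. */
void *lookup_global_symbol(const char *name)
{
    void *handle = dlopen(NULL, RTLD_NOW);   /* NULL selects the main program */
    void *addr;

    if (handle == NULL) {
        fprintf(stderr, "dlopen: %s\n", dlerror());
        return NULL;
    }
    dlerror();                               /* clear any stale error state */
    addr = dlsym(handle, name);
    if (addr == NULL) {
        const char *err = dlerror();
        fprintf(stderr, "dlsym(%s): %s\n", name, err ? err : "symbol value is NULL");
    }
    dlclose(handle);
    return addr;
}
```

Symbols exported by shared libraries such as libc are found this way without extra flags; symbols defined in the executable itself, such as testVar and testFunc, are only found once the program is linked with -rdynamic.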

How to access the global variable of executable in shared library (c - linux)

I want to access a global variable of the executable from a shared library. I have tried compiling with the option -export-dynamic, but no luck.
I have also tried the extern keyword; this is not working either.
Any help or suggestion would be appreciable.
Environment c - Linux
executable:-
tst.c
int tstVar = 5;
void main(){
funInso();
printf("tstVar %d", tstVar);
}
lib:-
tstLib.c
extern int tstVar;
void funInso(){
tstVar = 50;
}
Since my code is very big, I just gave the sample which I have used in my program.
It should work. BTW, your tst.c is lacking a #include <stdio.h>. And its main should return an int and end with e.g. return 0;.
With
/* file tst.c */
#include <stdio.h>
int tstVar = 5;
extern void funInso(void);
int main(){
funInso();
printf("tstVar %d\n", tstVar);
return 0;
}
and
/* file tstlib.c */
extern int tstVar;
void funInso(){
tstVar = 50;
}
I compiled the first file with gcc -Wall -c tst.c and the second file with gcc -Wall -c tstlib.c. I made the second into a library with
ar r libtst.a tstlib.o
ranlib libtst.a
Then I linked the first file to the library with gcc -Wall tst.o -L. -ltst -o tst
The common practice is to have with your library a header file tstlib.h which would contain e.g.
#ifndef TSTLIB_H_
#define TSTLIB_H_
/* a useful explanation about tstVar. */
extern int tstVar;
/* the role of funInso. */
extern void funInso(void);
#endif /* TSTLIB_H_ */
and have both tst.c and tstlib.c contain an #include "tstlib.h"
If the library is shared, you should:
compile the library file in position independent code mode
gcc -Wall -fpic -c tstlib.c -o tstlib.pic.o
link the library with -shared
gcc -shared tstlib.pic.o -o libtst.so
Note that you can link a shared object with other libraries. You could have appended -lgdbm to that command, if your tstlib.c is e.g. calling gdbm_open hence including <gdbm.h>. This is one of the many features shared libraries give you that static libraries don't.
link the executable with -rdynamic
gcc -rdynamic tst.o -L. -ltst -o tst
Please take time to read the Program Library Howto
Your tstVar variable could instead be defined in the library, and you can share it through functions:
setFunction: to edit this variable
void setFunction (int v)
{
tstVar = v;
}
getFunction: to return the variable
int getFunction ()
{
return tstVar;
}
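Put together, the library side of this approach might look like the sketch below. Making tstVar static is an assumption added here so that the accessors are the only way to reach it; the initial value 5 is taken from the question.

```c
/* Sketch of tstLib.c with the variable owned by the library. */
static int tstVar = 5;  /* static: not visible outside this translation unit */

/* Change the shared value. */
void setFunction(int v)
{
    tstVar = v;
}

/* Read the shared value. */
int getFunction(void)
{
    return tstVar;
}
```

The executable then calls setFunction(50) and getFunction() instead of touching tstVar directly, so no symbol has to be exported from the executable to the library.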
