Accessing main arguments outside of main on Linux - c

Is it possible to access the arguments to main outside of main (namely in a shared library constructor) on Linux other than by parsing /proc/self/cmdline?

You can do this by putting the constructor in the .init_array section. Functions in the .init_array (unlike .init) are called with the same arguments main will be called with: argc, argv and env.
Here's a simple example. I used LD_PRELOAD simply to avoid complicating the example with code which actually links and uses a shared library, but it would work in a more normal scenario as well.
file: printargs.c
#include <stdio.h>
static int printargs(int argc, char** argv, char** env) {
puts("In printargs:");
for (int i = 0; i < argc; ++i)
printf(" Arg %d (%p) '%s'\n", i, (void*)argv[i], argv[i]);
return 0;
}
/* Put the function into the init_array */
__attribute__((section(".init_array"))) static void *ctr = &printargs;
Build and use the shared library
(If you use -Wall, you will see a warning, because ctr is unused.)
$ gcc -o printargs.so -std=c11 -shared -fpic printargs.c
$ LD_PRELOAD=./printargs.so /bin/echo Hello, world.
In printargs:
Arg 0 (0x7ffc7617102f) '/bin/echo'
Arg 1 (0x7ffc76171039) 'Hello,'
Arg 2 (0x7ffc76171040) 'world.'
Hello, world.
This solution comes from a suggestion by Mike Frysinger in the libc-help mailing list and there is an even more laconic version of this answer here on SO.

Related

Where do Linux shells look for interpreters for ELF binaries? [duplicate]

So everyone probably knows that glibc's /lib/libc.so.6 can be executed in the shell like a normal executable in which cases it prints its version information and exits. This is done via defining an entry point in the .so. For some cases it could be interesting to use this for other projects too. Unfortunately, the low-level entry point you can set by ld's -e option is a bit too low-level: the dynamic loader is not available so you cannot call any proper library functions. glibc for this reason implements the write() system call via a naked system call in this entry point.
My question now is, can anyone think of a nice way how one could bootstrap a full dynamic linker from that entry point so that one could access functions from other .so's?
Update 2: see Andrew G Morgan's slightly more complicated solution which does work for any GLIBC (that solution is also used in libc.so.6 itself (since forever), which is why you can run it as ./libc.so.6 (it prints version info when invoked that way)).
Update 1: this no longer works with newer GLIBC versions:
./a.out: error while loading shared libraries: ./pie.so: cannot dynamically load position-independent executable
Original answer from 2009:
Building your shared library with -pie option appears to give you everything you want:
/* pie.c */
#include <stdio.h>
int foo()
{
printf("in %s %s:%d\n", __func__, __FILE__, __LINE__);
return 42;
}
int main()
{
printf("in %s %s:%d\n", __func__, __FILE__, __LINE__);
return foo();
}
/* main.c */
#include <stdio.h>
extern int foo(void);
int main()
{
printf("in %s %s:%d\n", __func__, __FILE__, __LINE__);
return foo();
}
$ gcc -fPIC -pie -o pie.so pie.c -Wl,-E
$ gcc main.c ./pie.so
$ ./pie.so
in main pie.c:9
in foo pie.c:4
$ ./a.out
in main main.c:6
in foo pie.c:4
$
P.S. glibc implements write(3) via system call because it doesn't have anywhere else to call (it is the lowest level already). This has nothing to do with being able to execute libc.so.6.
I have been looking to add support for this to pam_cap.so, and found this question. As #EmployedRussian notes in a follow-up to their own post, the accepted answer stopped working at some point. It took a while to figure out how to make this work again, so here is a worked example.
This worked example involves 5 files to show how things work with some corresponding tests.
First, consider this trivial program (call it empty.c):
int main(int argc, char **argv) { return 0; }
Compiling it, we can see how it resolves the dynamic symbols on my system as follows:
$ gcc -o empty empty.c
$ objcopy --dump-section .interp=/dev/stdout empty ; echo
/lib64/ld-linux-x86-64.so.2
$ DL_LOADER=/lib64/ld-linux-x86-64.so.2
That last line sets a shell variable for use later.
Here are the two files that build my example shared library:
/* multi.h */
void multi_main(void);
void multi(const char *caller);
and
/* multi.c */
#include <stdio.h>
#include <stdlib.h>
#include "multi.h"
void multi(const char *caller) {
printf("called from %s\n", caller);
}
__attribute__((force_align_arg_pointer))
void multi_main(void) {
multi(__FILE__);
exit(42);
}
const char dl_loader[] __attribute__((section(".interp"))) =
DL_LOADER ;
(Update 2021-11-13: The forced alignment is to help __i386__ code be SSE compatible - without it we get hard to debug glibc SIGSEGV crashes.)
We can compile and run it as follows:
$ gcc -fPIC -shared -o multi.so -DDL_LOADER="\"${DL_LOADER}\"" multi.c -Wl,-e,multi_main
$ ./multi.so
called from multi.c
$ echo $?
42
So, this is a .so that can be executed as a stand alone binary. Next, we validate that it can be loaded as shared object.
/* opener.c */
#include <dlfcn.h>
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char **argv) {
void *handle = dlopen("./multi.so", RTLD_NOW);
if (handle == NULL) {
perror("no multi.so load");
exit(1);
}
void (*multi)(const char *) = dlsym(handle, "multi");
multi(__FILE__);
}
That is we dynamically load the shared-object and run a function from it:
$ gcc -o opener opener.c -ldl
$ ./opener
called from opener.c
Finally, we link against this shared object:
/* main.c */
#include "multi.h"
int main(int argc, char **argv) {
multi(__FILE__);
}
Where we compile and run it as follows:
$ gcc main.c -o main multi.so
$ LD_LIBRARY_PATH=./ ./main
called from main.c
(Note, because multi.so isn't in a standard system library location, we need to override where the runtime looks for the shared object file with the LD_LIBRARY_PATH environment variable.)
I suppose you'd have your ld -e point to an entry point which would then use the dlopen() family of functions to find and bootstrap the rest of the dynamic linker. Of course you'd have to ensure that dlopen() itself was either statically linked or you might have to implement enough of your own linker stub to get at it (using system call interfaces such as mmap() just as libc itself is doing.
None of that sounds "nice" to me. In fact just the thought of reading the glibc sources (and the ld-linux source code, as one example) enough to assess the size of the job sounds pretty hoary to me. It might also be a portability nightmare. There may be major differences between how Linux implements ld-linux and how the linkages are done under OpenSolaris, FreeBSD, and so on. (I don't know).

Is it possible for an LD_PRELOAD to only affect the main executable?

The Actual Problem
I have an executable that by default uses EGL and SDL 1.2 to handle graphics and user input respectively. Using LD_PRELOAD, I have replaced both with GLFW.
This works normally unless the user has installed the Wayland version of GLFW, which depends on EGL itself. Because all the EGL calls are either stubbed to do nothing or call GLFW equivalents, it doesn't work (ie. eglSwapBuffers calls glfwSwapBuffers which calls eglSwapBuffers and so on). I can't remove the EGL stubs because then it would call both EGL and GLFW and the main executable is closed-source so I can't modify that.
Is there any way to make LD_PRELOAD affect the main executable but not GLFW? Or any other solution to obtain the same effect?
The Simplified Problem
I made a simplified example to demonstrate the problem.
Main Executable:
#include <stdio.h>
extern void do_something();
int main() {
do_something();
fputs("testing B\n", stderr);
}
Shared Library:
#include <stdio.h>
void do_something() {
fputs("testing A\n", stderr);
}
Preloaded Library:
#include <stdio.h>
int fputs(const char *str, FILE *file) {
// Do Nothing
return 0;
}
When the preloaded library isn't used, the output is:
testing A
testing B
When it is used, the output is nothing.
I'm looking for a way to make the preloaded library only affect the main executable, that the output would be:
testing A
Thank you!
You can check if the return address is in the executable or the library, and then call either the "real" function or do your stub code, like this:
#define _GNU_SOURCE
#include <dlfcn.h>
#include <link.h>
#include <stdio.h>
#include <stdlib.h>
static struct {
ElfW(Addr) start, end;
} *segments;
static int n;
static int (*real_fputs)(const char *, FILE *);
static int callback(struct dl_phdr_info *info, size_t size, void *data) {
n = info->dlpi_phnum;
segments = malloc(n * sizeof *segments);
for(int i = 0; i < n; ++i) {
segments[i].start = info->dlpi_addr + info->dlpi_phdr[i].p_vaddr;
segments[i].end = info->dlpi_addr + info->dlpi_phdr[i].p_vaddr + info->dlpi_phdr[i].p_memsz;
}
return 1;
}
__attribute__((__constructor__))
static void setup(void) {
real_fputs = dlsym(RTLD_NEXT, "fputs");
dl_iterate_phdr(callback, NULL);
}
__attribute__((__destructor__))
static void teardown(void) {
free(segments);
}
__attribute__((__noinline__))
int fputs(const char *str, FILE *file) {
ElfW(Addr) addr = (ElfW(Addr))__builtin_extract_return_addr(__builtin_return_address(0));
for(int i = 0; i < n; ++i) {
if(addr >= segments[i].start && addr < segments[i].end) {
// Do Nothing
return 0;
}
}
return real_fputs(str, file);
}
This has some caveats, though. For example, if your executable calls a library function that tail-calls a function you're hooking, then this will incorrectly consider that library call an executable call. (You could mitigate this problem by adding wrappers for those library functions too, that unconditionally forward to the "real" function, and compiling the wrapper code with -fno-optimize-sibling-calls.) Also, there's no way to distinguish whether anonymous executable memory (e.g., JITted code) originally came from the executable or a library.
To test this, save my code as hook_fputs.c, your main executable as main.c, and your shared library as libfoo.c. Then run these commands:
clang -fPIC -shared hook_fputs.c -ldl -o hook_fputs.so
clang -fPIC -shared libfoo.c -o libfoo.so
clang main.c ./libfoo.so
LD_PRELOAD=./hook_fputs.so ./a.out
Implement the interposing library separately for the two cases.
Create a wrapper script or program that uses ldd to find out the exact EGL library version and their paths the target binary is dynamically linked against; then, using ldd on the the GLFW library, to find out whether it is linked against EGL or not. Finally, have it execute the target binary with the path to the appropriate interposing library in LD_PRELOAD environment variable.

How to execute a code whose main function looks like this?

The code is like (real noob question) :
int main(int argc, char **argv){
//some code
}
I know, it means I have to give some arguments while executing in the terminal, but the code does not require any arguments or information from the user. I don't know what to give as the argument?
For example:
#include <stdio.h>
int main(int argc, char **argv)
{
printf("Hello World\n");
}
Compile with GCC,
$gcc prog.c -o prog
$./prog
Hello World
So, If you do not use agave in your code, then, there is no need to provide an argument.
I have to give some arguments while executing in the terminal,
No, you don't have to. You may give some arguments. There are conventions regarding program arguments (but these are just conventions, not requirements).
It is perfectly possible to write some C code with a main without argument, or with ignored arguments. Then you'll compile your program into some executable myprog and you just type ./myprog (or even just myprog if your PATH variable mentions at the right place the directory containing your myprog) in your terminal.
The C11 standard n1570 specifies in ยง5.1.2.2.1 [Program startup] that
The function called at program startup is named main. The implementation declares no
prototype for this function. It shall be defined with a return type of int and with no
parameters:
int main(void) { /* ... */ }
or with two parameters (referred to here as argc and argv, though any names may be
used, as they are local to the function in which they are declared):
int main(int argc, char *argv[]) { /* ... */ }
or equivalent) or in some other implementation-defined manner.
The POSIX standard specifies further the relation between the command line, the execve function, and the main of your program. See also this.
In practice I strongly recommend, in any serious program running on a POSIX system, to give two argc & argv arguments to main and to parse them following established conventions (In particular, I hate serious programs not understanding --help and --version).
You can always pass some number of arguments or pass nothing unless you are checking for the number of arguments and arguments passed (or forcing compiler to do so). Your command interpreter has no idea what your program is going to do with the passed argument or whether the program need any argument. It's your program which takes care of all these things.
For example,
int main(void){
return 0;
}
you can pass any number of arguments to the above program
$ gcc hello.c -o hello
$ ./hello blah blah blah
In case of
int main(int argc, char **argv){
return 0;
}
you can pass no arguments.
$ gcc hello.c -o hello
$ ./hello
For
int main(int argc, char **argv){
if(argc < 3){
printf("You need to pass two arguments to print those on the terminal\n");
exit(0);
}
else{
printf("%s %s\n", argv[1], arv[2]);
}
return 0;
}
You have to pass two arguments because the program checking the number of arguments passed and using them
$ gcc hello.c -o hello
$ ./hello Hello world

dlsym returns NULL, even though the symbol exists

I am using dlsym to look up symbols in my program, but it always returns NULL, which I am not expecting. According to the manpage, dlsym may return NULL if there was an error somehow, or if the symbol indeed is NULL. In my case, I am getting an error. I will show you the MCVE I have made this evening.
Here is the contents of instr.c:
#include <stdio.h>
void * testing(int i) {
printf("You called testing(%d)\n", i);
return 0;
}
A very simple thing containing only an unremarkable example function.
Here is the contents of test.c:
#include <dlfcn.h>
#include <stdlib.h>
#include <stdio.h>
typedef void * (*dltest)(int);
int main(int argc, char ** argv) {
/* Declare and set a pointer to a function in the executable */
void * handle = dlopen(NULL, RTLD_NOW | RTLD_GLOBAL);
dlerror();
dltest fn = dlsym(handle, "testing");
if(fn == NULL) {
printf("%s\n", dlerror());
dlclose(handle);
return 1;
}
dlclose(handle);
return 0;
}
As I step through the code with the debugger, I see the dlopen is returning a handle. According to the manpage, If filename is NULL, then the returned handle is for the main program. So if I link a symbol called testing into the main program, dlsym should find it, right?
Here is the way that I am compiling and linking the program:
all: test
instr.o: instr.c
gcc -ggdb -Wall -c instr.c
test.o: test.c
gcc -ggdb -Wall -c test.c
test: test.o instr.o
gcc -ldl -o test test.o instr.o
clean:
rm -f *.o test
And when I build this program, and then do objdump -t test | grep testing, I see that the symbol testing is indeed there:
08048632 g F .text 00000020 testing
Yet the output of my program is the error:
./test: undefined symbol: testing
I am not sure what I am doing wrong. I would appreciate if someone could shed some light on this problem.
I don't think you can do that, dlsym works on exported symbols. Because you're doing dlsym on NULL (current image), even though the symbols are present in the executable ELF image, they're not exported (since it's not a shared library).
Why not call it directly and let the linker take care of it? There's no point in using dlsym to get symbols from the same image as your dlsym call. If your testing symbol was in a shared library that you either linked against or loaded using dlopen then you would be able to retrieve it.
I believe there's also a way of exporting symbols when building executables (-Wl,--export-dynamic as mentioned in a comment by Brandon) but I'm not sure why you'd want to do that.
I faced the similar issue in my code.
I did the following to export symbols
#ifndef EXPORT_API
#define EXPORT_API __attribute__ ((visibility("default")))
#endif
Now for each of the function definition I used the above attribute.
For example the earlier code was
int func() { printf(" I am a func %s ", __FUNCTION__ ) ;
I changed to
EXPORT_API int func() { printf(" I am a func %s ", __FUNCTION__ ) ;
Now it works.
dlsym gives no issues after this.
Hope this works for you as well.

With MACH-O is there a way to register a function that will run before main?

Under Linux, I can register a routine that will run before main. For example:
#include <stdio.h>
void myinit(int argc, char **argv, char **envp) {
printf("%s: %s\n", __FILE__, __FUNCTION__);
}
__attribute__((section(".init_array"))) typeof(myinit) *__init = myinit;
By compiling this with GCC and linking it in, the function myinit will be run before main.
Is there way to do this under Mac OSX and MACH-O?
Thanks.
You could place the function in __mod_init_func data section of Mach-O binary.
From Mach-O format reference:
__DATA,__mod_init_func
Module initialization functions. The C++ compiler places static constructors here.
example.c
#include <stdio.h>
void myinit(int argc, char **argv, char **envp) {
printf("%s: %s\n", __FILE__, __FUNCTION__);
}
__attribute__((section("__DATA,__mod_init_func"))) typeof(myinit) *__init = myinit;
int main() {
printf("%s: %s\n", __FILE__, __FUNCTION__);
return 0;
}
I build your example with clang on OS X platform:
$ clang -Wall example.c
$ ./a.out
example.c: myinit
example.c: main
Easiest way is to specify the function to be constructor using constructor attribute. The constructor attribute causes the function to be called automatically before execution enters main(). Similarly, the destructor attribute causes the function to be called automatically after main() completes or exit() is called. You can also specify optional priority if you have several functions
e.g. __attribute__((constructor(100)))
#include <stdio.h>
__attribute__((constructor)) void myinit() {
printf("my init\n");
}
int main() {
printf("my main\n");
return 0;
}
__attribute__((destructor)) void mydeinit() {
printf("my deinit\n");
}
$ clang -Wall example.c
$ ./a.out
my init
my main
my deinit
Disclaimer: I generally discourage what I'm about to say. Having code running before or after main makes things less predictable. I'm not sure why you wouldn't just let the first line of main invoke your myinit, but I suppose everyone has a reason. Here goes.
I don't know much about Mach-O, but the simplest way to run code before main, is to link in a C++ class that has a corresponding global instance defined. You can do this independently of your "C" code without having to alter anything else. You can also have this C++ code invoke C functions defined elsewhere in your code. In the example below, I show a simple example of how I would invoke your myinit.
In a standalone .cpp (or .cc) file, declare a very simple C++ class with a constructor that calls your "myinit function".
foo.cpp
// forward declare your myinit function and designate "C" linkage
extern "C" myinit(int, char**, char**);
class CodeToRunBeforeMain
{
public:
CodeToRunBeforeMain()
{
// invoke your myinit function here
myinit(0, NULL, NULL);
}
};
// global instance - constructor will run before main.
CodeToRunBeforeMain g_runBeforeMain;
The above approach doesn't recognize argc, argv, or envp. Hopefully, that isn't important.

Resources