LD_PRELOAD affects new child even after unsetenv("LD_PRELOAD") - c

My code is as follows. preload.c has the following content:
#include <stdio.h>
#include <stdlib.h>
int __attribute__((constructor)) main_init(void)
{
printf("Unsetting LD_PRELOAD: %x\n",unsetenv("LD_PRELOAD"));
FILE *fp = popen("ls", "r");
pclose(fp);
}
Then, in the shell (run the second command with care!):
gcc preload.c -shared -Wl,-soname,mylib -o mylib.so -fPIC
LD_PRELOAD=./mylib.so bash
!!! Be careful with the last command: it results in an endless loop of forking "sh -c ls". Stop it after 2 seconds with ^C (or better, ^Z, and then look at ps).
More info
This problem is related to bash in some way: either as the command that the user runs, or as the shell that popen() executes.
Additional key factors: 1) the popen() is performed from the preloaded library, and 2) the popen() probably needs to happen in the initialization section of the library.
If you use:
LD_DEBUG=all LD_DEBUG_OUTPUT=/tmp/ld-debug LD_PRELOAD=./mylib.so bash
instead of the last command, you will get many ld-debug files named /tmp/ld-debug.*, one for each forked process. In ALL of these files you'll see that symbols are first searched in mylib.so, even though LD_PRELOAD was removed from the environment.

Edit: so the problem/question actually was: why can't you reliably unset LD_PRELOAD using a preloaded main_init() from within bash?
The reason is that execve, which is called after you popen, takes the environment from (probably)
extern char **environ;
which is some global state variable that points to your environment. unsetenv() normally modifies your environment and will therefore have an effect on the contents of **environ.
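As a rough illustration of that relationship, the sketch below sets a variable, unsets it again, and scans environ after each step. The helper names in_environ and set_then_unset are made up for this example, and it assumes the C library's own setenv()/unsetenv() are the ones being called, which is exactly the assumption that turns out not to hold inside bash.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>

extern char **environ;

/* Return 1 if a "NAME=..." entry for `name` is present in environ, else 0.
 * (Hypothetical helper, not part of any library.) */
int in_environ(const char *name)
{
    size_t len = strlen(name);
    for (char **e = environ; e != NULL && *e != NULL; e++) {
        if (strncmp(*e, name, len) == 0 && (*e)[len] == '=')
            return 1;
    }
    return 0;
}

/* Set a variable, then unset it, checking environ after each step.
 * Returns 2 when the entry was visible after setenv() and gone after
 * unsetenv(); returns -1 if either call fails. */
int set_then_unset(const char *name)
{
    int seen = 0;
    if (setenv(name, "1", 1) != 0)
        return -1;
    seen += in_environ(name);     /* entry should be present here */
    if (unsetenv(name) != 0)
        return -1;
    seen += !in_environ(name);    /* entry should be gone here */
    return seen;
}
```

With libc's implementations, set_then_unset("SOME_VAR") should come back as 2. If the process supplies its own unsetenv(), as bash appears to, the second check is where things can differ.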
If bash tries to do something special with the environment (well... would it? being a shell?) then you may be in trouble.
Apparently, bash overrides unsetenv() with its own version even before main_init() runs. Changing the example code to:
extern char**environ;
int __attribute__((constructor)) main_init(void)
{
int i;
printf("Unsetting LD_PRELOAD: %x\n",unsetenv("LD_PRELOAD"));
printf("LD_PRELOAD: \"%s\"\n",getenv("LD_PRELOAD"));
printf("Environ: %lx\n",environ);
printf("unsetenv: %lx\n",unsetenv);
for (i=0;environ[i];i++ ) printf("env: %s\n",environ[i]);
fflush(stdout);
FILE *fp = popen("ls", "r");
pclose(fp);
}
shows the problem. In normal runs (running cat, ls, etc) I get this version of unsetenv:
unsetenv: 7f4c78fd5290
unsetenv: 7f1127317290
unsetenv: 7f1ab63a2290
however, running bash or sh:
unsetenv: 46d170
So, there you have it. bash has got you fooled ;-)
So just modify the environment in place using your own unsetenv, acting on **environ:
for (i=0;environ[i];i++ )
{
if ( strncmp(environ[i],"LD_PRELOAD=",11) == 0 ) /* match the name at the start of the entry only; needs <string.h> */
{
printf("hacking out LD_PRELOAD from environ[%d]\n",i);
environ[i][0] = 'D';
}
}
which can be seen to work in the strace:
execve("/bin/sh", ["sh", "-c", "ls"], [... "DD_PRELOAD=mylib.so" ...]) = 0
Q.E.D.
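Renaming the variable leaves a stray DD_PRELOAD entry behind. A slightly tidier variant, sketched here under the same idea of editing the array in place, drops the entry and shifts the rest down. The function name remove_env_entry is invented for this example; it works on any NULL-terminated array of "NAME=value" strings, so it could be handed environ from the constructor.

```c
#include <string.h>

/* Remove every "NAME=value" entry for `name` from a NULL-terminated
 * environment array by shifting the later entries down.
 * Returns the number of entries removed.
 * (Hypothetical helper; the strings themselves are not freed.) */
int remove_env_entry(char **env, const char *name)
{
    size_t len = strlen(name);
    int removed = 0;
    char **dst = env;

    for (char **src = env; *src != NULL; src++) {
        if (strncmp(*src, name, len) == 0 && (*src)[len] == '=') {
            removed++;          /* skip this entry */
        } else {
            *dst++ = *src;      /* keep this entry */
        }
    }
    *dst = NULL;                /* re-terminate the shortened array */
    return removed;
}
```

In main_init() this would be called as remove_env_entry(environ, "LD_PRELOAD") before the popen(). I have only reasoned about it for the case above, where the constructor runs before bash has read its environment.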

(This answer is pure speculation and may be incorrect.)
Perhaps, when you fork your process, the context of the loaded libraries persists. So, mylib.so was loaded when you invoked the main program via LD_PRELOAD. When you unset the variable and forked, it wasn't loaded again; however it already has been loaded by the parent process. Maybe, you should explicitly unload it after forking.
You may also try to "demote" the symbols in mylib.so. To do this, reopen it via dlopen with flags that place it at the end of the symbol resolution queue:
dlopen("mylib.so", RTLD_NOLOAD | RTLD_LOCAL);

The answer from mvds is incorrect!
popen() spawns a child process that inherits the preloaded .so residing in the parent process. This child process does not care about the LD_PRELOAD environment variable.

Related

Why is this C program doing nothing in Ubuntu?

My very simple C program just hangs and I don’t know why.
I am trying to make a simple executable to handle multiple monotonous actions for me every time I start a new programming session.
So I decided to start with something simple (below), yet every time I run it the app just hangs and never returns, so I have to Ctrl-C out of it. I have added printf commands to see if it goes anywhere, but those never appear.
My build command returns no error messages:
gcc -o tail tail.c
Just curious what I am missing.
#include <stdio.h>
#include <unistd.h>
int main() {
chdir("\\var\\www");
return 0;
}
There are at least two problems with the source code:
It is unlikely that you have a sub-directory called \var\www in your current directory — Ubuntu uses / and not \ for path separators.
Even if there was a sub-directory with the right name, your program would change directory to it but that wouldn't affect the calling program.
You should check the return value from chdir() — at minimum:
if (chdir("/var/www") != 0)
{
perror("chdir");
exit(EXIT_FAILURE);
}
And, as Max pointed out, calling your program by the name of a well-known utility such as tail is likely to lead to confusion. Use a different name.
Incidentally, don't use test as a program name either. That, too, will lead to confusion as it is a shell built-in as well as an executable in either /bin or /usr/bin. There is also a program /bin/cd or /usr/bin/cd on your machine — it will check that it can change directory, but won't affect the current directory of your shell. You have to invoke it explicitly by the full pathname to get it to run at all because cd is another shell built-in.
Two things:
First, that's not what Linux paths look like.
Second, check the return value from chdir().
i.e.
if (chdir("/var/www") != 0)
printf("failed to change directory");
Finally, the effect of chdir() lasts for the duration of the program. It will not change the current directory of your shell once this program finishes.
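To see that a directory change stays inside the process that made it, here is a small sketch that lets a forked child call chdir() and then compares the parent's working directory before and after. The function name is made up for the example; the shell that launches your program is in the same position as the parent here.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that calls chdir(dir), then check that the parent's
 * working directory is unchanged.
 * Returns 1 if unchanged, 0 if it changed, -1 on error.
 * (Hypothetical helper for illustration.) */
int parent_cwd_survives_child_chdir(const char *dir)
{
    char before[4096], after[4096];

    if (getcwd(before, sizeof before) == NULL)
        return -1;

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: the directory change is private to this process. */
        _exit(chdir(dir) == 0 ? 0 : 1);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (getcwd(after, sizeof after) == NULL)
        return -1;
    return strcmp(before, after) == 0;
}
```

Calling parent_cwd_survives_child_chdir("/var/www") should return 1 whether or not the child's chdir() succeeded, which is why a standalone program cannot replace the shell's cd built-in.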
The other answers adequately cover the issues in your C code. However, the reason you are seeing it hang is because you chose the name tail for your program.
In Linux, tail is a command in /usr/bin in most setups, and if you just type tail at the command line, the shell searches the $PATH first, and runs this. Without any parameters, it waits for input on its stdin. You can end it by pressing control-d to mark the end of file.
You can bypass the $PATH lookup by typing ./tail instead.
$ tail
[system tail]
$ ./tail
[tail in your current directory]
It is a good idea to use ./ as a habit, but you can also avoid confusion by not naming your program the same as common commands. Another name to avoid is test which is a shell built-in for testing various aspects of files, but appears to do nothing as it reports results in its system return code.

How to pass run time arguments to a function in c through a shell script

I have a shell script which has to take arguments from the command line and pass it to a function in C. I tried to search but didn't find understandable solutions. Kindly help me out.
Should the arguments be passed via an option as a command in the shell script?
I have a main function like this:
int main(int argc, char *argv[])
{
if(argc>1)
{
if(!strcmp(argv[1], "ABC"))
{
}
else if(!strcmp(argv[1], "XYZ"))
{
}
}
}
How to pass the parameters ABC/XYZ from the command line through a shell script which in turn uses a makefile to compile the code?
You cannot meaningfully compare strings with == which is a pointer equality test. You could use strcmp as something like argc>1 && !strcmp(argv[1], "XYZ"). The arguments of main have certain properties, see here.
BTW, main's argc is at least 1. So your test argc==0 is never true. Generally argv[0] is the program name.
However, if you use GNU glibc (e.g. on Linux), it provides several ways for parsing program arguments.
There are conventions and habits regarding program arguments, and you had better follow them. POSIX specifies getopt(3), but on GNU systems, getopt_long is even handier.
Be also aware that globbing is done by the shell on Unix-like systems. See glob(7).
(On Windows, things are different, and the command line might be parsed by some startup routine à la crt0)
In practice, you had better use some system functions for parsing program arguments. Some libraries provide a way to do that, e.g. GTK has gtk_init_with_args. Otherwise, if you have it, use getopt_long ...
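A minimal sketch of what getopt_long usage could look like for this question, assuming you choose to pass ABC/XYZ through an option. The option name --mode (short form -m) and the function name parse_mode are my own inventions for illustration; nothing in the question requires them.

```c
#include <getopt.h>
#include <stddef.h>

/* Parse "--mode VALUE" (or "-m VALUE") and return VALUE, or NULL if the
 * option is absent or the command line is malformed.
 * The option name "mode" is a hypothetical choice for this sketch. */
const char *parse_mode(int argc, char *argv[])
{
    static const struct option longopts[] = {
        { "mode", required_argument, NULL, 'm' },
        { NULL, 0, NULL, 0 }
    };
    const char *mode = NULL;
    int opt;

    optind = 1;   /* restart scanning, in case getopt was used before */
    opterr = 0;   /* suppress getopt's own error messages */
    while ((opt = getopt_long(argc, argv, "m:", longopts, NULL)) != -1) {
        if (opt == 'm')
            mode = optarg;
        else
            return NULL;   /* unknown option or missing argument */
    }
    return mode;
}
```

From main you would call parse_mode(argc, argv) and then strcmp the result against "ABC" or "XYZ". The shell script would simply invoke the program as yourexecutable --mode ABC.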
Look also, for inspiration, into the source code of some free software program. You'll find many of them on github or elsewhere.
How to pass the parameters ABC/XYZ from the command line through a shell script
If you compile your C program into an executable, e.g. /some/path/to/yourexecutable, you just have to run a command like
/some/path/to/yourexecutable ABC
and if the directory /some/path/to/ containing yourexecutable is in your PATH variable, you can simply run yourexecutable ABC. How to set that PATH variable (which you can query using echo $PATH in your Unix shell) is a different question (you could edit some shell startup file, perhaps your $HOME/.bashrc, with a source code editor such as GNU emacs, vim, gedit, etc...; you could run some export PATH=.... command with an appropriate, colon-separated, sequence of directories).
which in turn uses a makefile to compile the code?
Then you should look into that Makefile and you'll know what is the executable file.
You are using and coding on/for Linux, so you should read something about Linux programming (e.g. ALP or something newer; see also intro(2) & syscalls(2)...) and you need to understand more about operating systems (so read Operating Systems: Three Easy Pieces).
See following simple example:
$ cat foo.c
#include <stdio.h>
int main(int argc, char ** argv)
{
int i;
for (i = 0; i < argc; ++i) {
printf("[%d] %s\n", i, argv[i]);
}
return 0;
}
$ gcc foo.c
$ ./a.out foo bar
[0] ./a.out
[1] foo
[2] bar
$

Execlp vs Execl

Is there any occasion in which it is better to use execl instead of execlp?
I think that maybe when a program is in two different folders using execlp could lead to confusion but I don't know if it is the only case.
I ask because one could think that writing execlp("ls", ...) is easier than writing execl("/bin/ls", ...).
Security
Looking programs up via PATH is convenient, but it can also be insecure. If a directory in a user's PATH is world writable, it's possible to inject a malicious program into the PATH lookup. This would affect execlp but not execl.
For example, if you had a PATH like /foo/bar/bin:/home/you/bin:/usr/bin:/bin and /foo/bar/bin was world writable, someone with access to that machine could copy a malicious program to /foo/bar/bin/ls. Then executing ls would run /foo/bar/bin/ls rather than /bin/ls. They'd be able to execute commands as you and gain greater access.
For this reason, it's often a good idea to refer to specific executables in known locations. Or to hard wire a secure PATH in the executable.
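One way to read "hard wire a secure PATH" is sketched below: the child overwrites PATH with a fixed list of system directories before execlp() does its lookup. The function name run_with_fixed_path and the directory list "/usr/bin:/bin" are assumptions for this example, not a recommendation for any particular system.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run `prog` (no arguments) in a child whose PATH is forced to a fixed
 * list of system directories before execlp() performs its lookup.
 * Returns the child's exit status, or -1 on error.
 * (Hypothetical helper for illustration.) */
int run_with_fixed_path(const char *prog)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* The directory list is an assumption; adjust for your system. */
        if (setenv("PATH", "/usr/bin:/bin", 1) != 0)
            _exit(127);
        execlp(prog, prog, (char *)NULL);
        _exit(127);   /* only reached if the exec failed */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

With this, run_with_fixed_path("ls") can no longer pick up /foo/bar/bin/ls from the caller's PATH. Using execl("/bin/ls", ...) directly avoids the lookup altogether.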
Compatibility
While there is a common set of Unix commands and features specified by POSIX, many programs rely on extensions. If your program uses those extensions, grabbing the first one in the PATH might not be a good idea.
For example, here on OS X the installed utilities in /bin and /usr/bin are BSD-flavored. But I have GNU versions installed earlier in my PATH. A program designed to run on OS X would want to explicitly use, for example, /bin/ls or /usr/bin/tar to be sure they get a known version of those utilities.
$ /usr/bin/tar --version
bsdtar 2.8.3 - libarchive 2.8.3
$ tar --version
tar (GNU tar) 1.29
Both execl() and execlp() work whether your executables are in the same folder or in different folders, but execlp() only finds a bare name if its folder is listed in $PATH.
execl() is needed when the executable named on the command line is given as a path that is not on $PATH (for example, a user-built program in the current directory), because execlp() searches $PATH for names without a slash. I added a snippet below.
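#include <stdio.h>
#include <unistd.h>
int main(int argc, char *argv[]) {
if (argc != 2) {
printf("Usage Msg: ./a.out userdefined_executable\n");
return 1; // main returns int, so report the usage error
}
// execl() treats argv[1] as a path, so "p1" in the current directory works:
//execl(argv[1], argv[1], (char *)NULL);
// execlp() searches $PATH for "p1" and fails unless its directory is listed there:
execlp(argv[1], argv[1], (char *)NULL);
perror("execlp"); // only reached if the exec failed
return 1;
}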
// Input will be like this, here "p1" is an user-defined executable.
// xyz#xyz:~/stack_overflow$ ./a.out p1

How do C programs pass whitespace arguments to the libc system(3) calls?

When a C program calls system() to run a Unix command, I know it's
possible to pass arguments to the command, and according to a
StackOverflow answer (from a very high-rep user), the system() call
uses the shell to execute the
command.
It surprised me to see system("ls -lh >/dev/null 2>&1"); as an
example system() call for a C program, since it looks like this is
using the same whitespace delimited "words" as the shell uses
interactively.
From my standpoint as a sysadmin, I'd like to understand what the
provisos and pitfalls are when a system() call is going to be executed
from within a C program on some files or commands on my system.
Passing whitespace-containing filenames into a shell script is
very issue-prone;
are there similar issues when a C program is calling a command?
Or to make the point even blunter (though less exact): Is a C program
written by a novice just as likely to break on whitespace-containing
filenames as a shell script?
There is nothing to do in the C source here: the string you pass to system() is run in a shell context. The shell parses that string; your C program doesn't.
If you look at system() function prototype:
#include <stdlib.h>
int system(const char *command);
The argument passed to system() is a string. system() does nothing with the whitespace characters in that string; it takes the string and passes it on to another call. It is the same as if you ran:
sh -c 'ls -l'
Here system() uses execve():
$ cat <<\CODE | gcc -xc -
#include <stdlib.h>
int main(void) {
system("ls -l");
return 0;
}
CODE
$ strace -fe execve ./a.out
execve("./a.out", ["./a.out"], [/* 64 vars */]) = 0
Process 16281 attached
[pid 16281] execve("/bin/sh", ["sh", "-c", "ls -l"], [/* 64 vars */]) = 0
Process 16282 attached
[pid 16282] execve("/bin/ls", ["ls", "-l"], [/* 64 vars */]) = 0
...
Yes and no.
The system() function is a C library call that runs a shell to start another program, and waits for its result. It is useful for the common need of wanting to run an external command to check if some external condition is satisfied. It does indeed use the shell to start the external command.
It is not a system call. A system call is a direct request to the kernel; things like 'give me more memory', 'open a file' and 'write this stuff to a file descriptor' are. The system() function is implemented in terms of (at least) three system calls:
fork(), to spawn off a child process.
One of the exec() family, called by the child, to replace that child process by another executable
wait(), called by the parent, to wait for the child process to finish and read its exit state.
The exec() call starts the process without involving any shell, and therefore no shell word splitting or expansion occurs. It is, in theory, possible to bypass the system() function, and then you wouldn't have whitespace issues. That's a lot of work, though, and not usually worth it, because there's a lot more that can go wrong there.
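For the curious, a bare-bones sketch of bypassing system() with the three calls listed above might look like this. The function name run_argv is made up, error handling is minimal, and it ignores the signal handling that a real system() implementation does.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run argv[0] with the given argument vector, without involving a shell.
 * Each argv element reaches the program as exactly one argument,
 * whitespace included. Returns the exit status, or -1 on error.
 * (Hypothetical helper for illustration.) */
int run_argv(char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127);   /* only reached if the exec failed */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A call such as run_argv((char *[]){ "ls", "-l", "my file.txt", NULL }) hands "my file.txt" to ls as one argument, with no quoting needed, because nothing ever splits the string on whitespace.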
Which brings me to my point: if you're worried about novice people dealing with shell scripts, you shouldn't let them deal with C compilers. It is possible to do a lot more wrong in C than in shell...
If a C program uses system() to invoke another program, it must create a command to be parsed by the shell (specifically, /bin/sh).
The implementation of system(const char *command) will normally end up calling something like execl("/bin/sh", "sh", "-c", command, (char *)NULL) in the child process (and the execl library function will end up calling the execve system call).

Problem with gcc tracker/make/fork/exec/wait

This is a most singular problem, with many interdisciplinary ramifications.
It focuses on this piece of code (file name mainpp.c):
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char *argv[])
{
int status;
if (fork())
{
FILE *f=fopen("/tmp/gcc-trace","a");
fprintf(f,"----------------------------------------------------------------\n");
int i;
for(i=0;i<argc;i++)
{
fprintf(f,"%s:",argv[i]);
}
wait(&status);
fprintf(f,"\nstatus=%d",status);
fprintf(f,"\n");
fclose(f);
}
else
{
execv("g++.old",argv);
}
sleep(10);
return status;
}
This is used with a bash script:
#!/bin/sh
gcc -g main.c -o gcc
gcc -g mainpp.c -o g++
mv /usr/bin/gcc /usr/bin/gcc.old
mv /usr/bin/g++ /usr/bin/g++.old
cp ./gcc /usr/bin/gcc
cp ./g++ /usr/bin/g++
The purpose of this code (and a corresponding main.c for gcc) is hopefully clear. It replaces g++ and logs calls to g++ plus all command-line arguments; it then proceeds to call the g++ compiler (now called g++.old).
The plan is to use this to log all the calls to g++/gcc. (Since make -n does not trace recursive makes, this is a way of capturing calls "in the wild".)
I tried this out on several programs and it worked well. ( Including compiling the program itself. ) I then tried it out on the project I was interested in, libapt-pkg-dev ( Ubuntu repository ).
The build seemed to go well but when I checked some executables were missing. Counting files in the project directory I find that an unlogged version produces 1373 whereas a logged version produces 1294. Making a list of these files, I discover that all the missing files are executables, shared libraries or object files.
Capturing the standard out of both logged makes and unlogged makes gives the same output.
The recorded return value of all processes called by exec is 0.
I've placed sleeps in various positions in the code. They do not seem to make any difference. ( The code with the traced version seems to compile much faster per file. I suspected that the exec might have caused the program to terminate while leaving gcc running. I thought that might cause failure because some object files might not be finishing when others need them. )
I have only one more diagnostic to run to see if I can diagnose the problem and then I am out of ideas. Suggestions?
I'm not sure if this will solve your problem, but have you considered using strace instead of your custom code?
strace executes a command (or attaches to a running process) and lists all the system calls it makes. So for instance, instead of running make directly, you might run:
strace -f -q -e trace=execve make
-f means attach to new processes as they are forked
-q means suppress attach/detach messages
-e trace=execve means only report calls to execve
You can then grep through the output for messages about /usr/bin/gcc.
