Hooking fopen() function throws Segmentation fault - c

I'm trying to log access to a particular directory by hooking the fopen() function and using LD_PRELOAD.
My first question is: Is hooking fopen() enough to log operations that open a file?
My code throws Segmentation fault. In particular, the code is like this (ignoring error checks):
FILE* (my_fopen)(const char filename, const char* mode);
void* libc_handle;
void __attribute__ ((constructor)) init(void){
    libc_handle = dlopen("libc.so.6", RTLD_LAZY);
    *(void**)(&my_fopen) = dlsym(libc_handle,"fopen");
}
FILE* fopen(const char* filename, const char* mode){
    printf("Hello world\n");
    return my_fopen(filename, mode);
}
After compiling and specifying the new library in LD_PRELOAD, I ran
ls
and it throws a segmentation fault. Any idea why that happened? I even tried removing the printf(), but that did not help.

You have some issues in your code, which are fixed in the sample below (I've also added the relevant headers and provided a main to give a complete program):
#include <stdio.h>
#include <dlfcn.h>
/* Pointer to the real fopen from libc. */
FILE* (*my_fopen)(const char *filename, const char* mode);
void* libc_handle;
/* Runs when the object is loaded, before main. */
void __attribute__ ((constructor)) init(void){
    libc_handle = dlopen("libc.so.6", RTLD_LAZY);
    my_fopen = dlsym(libc_handle,"fopen");
}
/* Replacement fopen: log, then forward to the real one. */
FILE* fopen(const char* filename, const char* mode){
    printf("Hello, Pax\n");
    return my_fopen(filename, mode);
}
int main (void) {
    FILE *fout = fopen ("xyzzy.txt","w");
    fclose (fout);
    return 0;
}
The changes from what you provided are as follows:
The my_fopen function pointer should be exactly that, a pointer. I suspect you may have thought the FILE* made it so but that's not actually correct. To specify a function pointer returning a FILE pointer, you need FILE * (*fn)(blah, blah).
Similarly, the first argument of that function must be a const char *, a pointer in other words. You had it as simply const char.
You don't actually need that convoluted expression for setting the my_fopen pointer (casting, taking address of, de-referencing). You can just use the much simpler my_fopen = .... In fact, I think the casting may be actually what's preventing gcc from reporting an error in this case since it's assuming, if you cast, that you know what you're doing.
You probably should also check the return value of dlopen. I haven't done that in this code but, if you don't find (or can't load) the library for some reason, the line after that will probably cause you grief.
When I compile and run this program on Red Hat Enterprise Linux Workstation release 6.4 (Santiago), I get the output of Hello, Pax and the file xyzzy.txt is created.
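The dlopen check mentioned above could look something like the following. This is only a sketch: the names real_fopen and resolve_real_fopen are made up for illustration, it assumes a glibc system where the C library is called libc.so.6, and it logs to stderr rather than stdout so that the wrapper does not mix its output with the program's own.

```c
#include <dlfcn.h>
#include <stdio.h>
#include <stdlib.h>

/* Pointer to libc's fopen (hypothetical name, filled in on first use). */
static FILE *(*real_fopen)(const char *filename, const char *mode);

/* Look up libc's fopen once; give up loudly if it cannot be found. */
static void resolve_real_fopen(void)
{
    void *libc_handle;

    if (real_fopen != NULL)
        return;
    libc_handle = dlopen("libc.so.6", RTLD_LAZY);   /* assumed library name */
    if (libc_handle == NULL) {
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        abort();
    }
    real_fopen = (FILE *(*)(const char *, const char *))dlsym(libc_handle, "fopen");
    if (real_fopen == NULL) {
        fprintf(stderr, "dlsym failed: %s\n", dlerror());
        abort();
    }
}

__attribute__((constructor)) static void init(void)
{
    resolve_real_fopen();
}

/* Replacement fopen: log the call, then forward it to libc. */
FILE *fopen(const char *filename, const char *mode)
{
    resolve_real_fopen();   /* also covers calls made before init() has run */
    fprintf(stderr, "fopen(\"%s\", \"%s\")\n", filename, mode);
    return real_fopen(filename, mode);
}
```

Resolving the pointer inside the wrapper as well as in the constructor means an early call to fopen from another library's constructor does not jump through a null pointer.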
And, just as an aside, there are other functions which may be used to gain access to the file-system, things like open, opendir, freopen, creat, mkfifo (I think).
Depending on your needs, you may have some extra work to do.
One thing you may want to consider is that ls may not even use fopen. It can actually be built with just opendir/readdir and stat.
So, let's use a program that we know calls fopen. Enter the following program qqtest.c:
#include <stdio.h>
int main (void) {
    FILE *fh = fopen ("xyzzy.txt", "w");
    fclose (fh);
    return 0;
}
and compile it with gcc -o qqtest qqtest.c, then run it. You should see no output but the file xyzzy.txt should be created. Once you've confirmed that, delete the xyzzy.txt file, then enter the following program qq.c:
#include <stdio.h>
#include <dlfcn.h>
FILE* (*my_fopen)(const char *filename, const char* mode);
void* libc_handle;
void __attribute__ ((constructor)) init(void){
    libc_handle = dlopen("libc.so.6", RTLD_LAZY);
    my_fopen = dlsym(libc_handle,"fopen");
}
FILE* fopen(const char* filename, const char* mode){
    printf("Hello, Pax\n");
    return my_fopen(filename, mode);
}
Compile this with gcc -shared -fPIC -o qq.so qq.c -ldl and then run your qqtest program (changing the shared object path to your own directory of course):
LD_PRELOAD=/home/pax/qq.so ./qqtest
This time, you should see the Hello, Pax string output before the xyzzy.txt file is created, proof that it's calling your wrapper function, which in turn calls the original fopen.
Now, that's all very well but, even once you get this bit working, you have to intercept quite a few different calls to ensure you catch all changes.
That's going to take you quite a while to get done and, as Chris Stratton points out in a comment, the Linux kernel already has the capability to report file-system changes to you.
If your goal is to just track file-system changes rather than educate yourself on how it could be done, look into inotify to see how to do this without having to re-invent the wheel.
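For a rough idea of what the inotify route looks like, here is a minimal sketch. The helper names watch_dir and drain_events are invented for this example, the set of event flags is just one plausible choice, and the code is Linux-only.

```c
#include <stdio.h>
#include <sys/inotify.h>
#include <unistd.h>

/* Start watching one directory; returns an inotify fd, or -1 on error. */
int watch_dir(const char *dir)
{
    int fd = inotify_init1(IN_NONBLOCK);
    if (fd < 0)
        return -1;
    if (inotify_add_watch(fd, dir, IN_CREATE | IN_OPEN | IN_MODIFY | IN_DELETE) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Print the events currently queued on fd; returns how many were seen. */
int drain_events(int fd)
{
    char buf[4096] __attribute__((aligned(__alignof__(struct inotify_event))));
    int count = 0;
    ssize_t len;

    /* The fd is non-blocking, so read() fails with EAGAIN once the queue is empty. */
    while ((len = read(fd, buf, sizeof buf)) > 0) {
        for (char *p = buf; p < buf + len; ) {
            const struct inotify_event *ev = (const struct inotify_event *)p;
            printf("mask=0x%x name=%s\n", (unsigned)ev->mask,
                   ev->len ? ev->name : "(watched directory)");
            count++;
            p += sizeof(struct inotify_event) + ev->len;
        }
    }
    return count;
}
```

A real monitor would normally leave the descriptor blocking, or wait on it with poll(), and loop forever; the non-blocking drain here just keeps the sketch short.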

Related

Can I intercept normal stdio calls in a C program, do some work and then call the original ones?

I have a bunch of C files that try to read and write CSV and other random data to and from disk using stdio functions like fread(), fwrite(), fseek(). (If it matters, it's for a university assignment where we are to experiment with IO performance using different block sizes, and various structures to track data on disk files and so on.)
What I wanted to do was compile these source files (there are dozens of them)
without the definitions for fopen(), fread(), fwrite() that come from <stdio.h>. I want to supply my own fopen(), fread(), fwrite() where I track some information, like which process tried to read which file, and how many blocks/pages were read and things like that, and then call the normal stdio functions.
I don't want to have to go through every line of every file and change fopen() to my_fopen() .... is there better way to do this at compile time?
I am half way working on a Python program that scans the source files and changes these calls with my functions but it's getting a bit messy and I am kind of lost. I thought maybe there is a better way to do this; if you could point me in the right direction, like what to search for that would be great.
Also I don't want to use some Linux profiling stuff that reports which syscalls were made and whatnot; I just want to execute some code before calling these functions.
As an alternative to the LD_PRELOAD trick (which requires you to write a separate library and works only on Linux), you can use the --wrap option of the GNU linker. See here for an example of this technique.
Main differences with LD_PRELOAD:
no external library needed - it's all in the executable;
no runtime options needed;
works on any platform as long as you are using the GNU toolchain;
works only for the calls that are resolved at link time - dynamic libraries will still use the original functions
No but yes but no. The best way I know of is to LD_PRELOAD a library that provides your own versions of those functions. You can get at the originals by dlopening libc.so (the dlopen NULL trick to get at libc functions isn't applicable here because your library will have already been loaded).
One way of doing it is by redefining all the stdio functions you need. fopen becomes my_fopen, fread becomes my_fread, then have your my_fopen call fopen. This can be done in a header file that you include in the files where you want to replace the calls to fopen. See example below.
main.c:
#include <stdio.h>
#include "my_stdio.h"
int main(void)
{
    FILE *f;
    char buf[256];
    f = fopen("test.cvs", "r");
    if(f == NULL)
    {
        printf("Couldn't open file\n");
        return 1;
    }
    fread(buf, sizeof(char), sizeof(buf), f);
    fclose(f);
    return 0;
}
my_stdio.c:
#include <stdio.h>
FILE *my_fopen(const char *path, const char *mode)
{
    FILE *fp;
    printf("%s before fopen\n", __FUNCTION__);
    fp = fopen(path,mode);
    printf("%s after fopen\n", __FUNCTION__);
    return fp;
}
int my_fclose(FILE *fp)
{
    int rv;
    printf("%s before fclose\n", __FUNCTION__);
    rv = fclose(fp);
    printf("%s after fclose\n", __FUNCTION__);
    return rv;
}
size_t my_fread(void *ptr, size_t size, size_t nmemb, FILE *stream)
{
    size_t s;
    printf("%s before fread\n", __FUNCTION__);
    s = fread(ptr,size,nmemb,stream);
    printf("%s after fread\n", __FUNCTION__);
    return s;
}
size_t my_fwrite(const void *ptr, size_t size, size_t nmemb, FILE *stream)
{
    size_t s;
    printf("%s before fwrite\n", __FUNCTION__);
    s = fwrite(ptr,size,nmemb,stream);
    printf("%s after fwrite\n", __FUNCTION__);
    return s;
}
my_stdio.h:
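#ifndef _MY_STDIO_H_
#define _MY_STDIO_H_
#include <stdio.h>
/* Prototypes, so callers see the correct return types (FILE *, size_t). */
FILE *my_fopen(const char *path, const char *mode);
int my_fclose(FILE *fp);
size_t my_fread(void *ptr, size_t size, size_t nmemb, FILE *stream);
size_t my_fwrite(const void *ptr, size_t size, size_t nmemb, FILE *stream);
/* Do not include this header in my_stdio.c, or the wrappers would call themselves. */
#define fopen my_fopen
#define fclose my_fclose
#define fread my_fread
#define fwrite my_fwrite
#endif /* _MY_STDIO_H_ */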
Makefile:
main: main.o my_stdio.o
	$(CC) -g -o $@ main.o my_stdio.o
main.o: main.c
	$(CC) -g -c -o $@ $<
my_stdio.o: my_stdio.c my_stdio.h
	$(CC) -g -c -o $@ $<
Another way: Add -Dfread=my_fread to the Makefile CFLAGS for any .o files you wish to "spy" on. Add in my_fread.o that defines my_fread [which has no -D tricks].
Repeat the above for any functions you wish to intercept. About the same as the LD_PRELOAD [in terms of effectiveness and probably easier to implement]. I've done both.
Or create a my_func.h that does the defines and insert a #include "my_func.h" in each file. Dealer's choice
UPDATE
Forgot about another way. Compile normally. Mangle the symbol names in the target .o's [symbol table] (via a custom program or ELF/hex editor): Change fread into something with the same length that doesn't conflict with anything [you can control this]. Target name: qread or frea_ or whatever.
Add your intercept .o's using the new names.
This might seem "dirty", but what we're doing here is a "dirty" job. This is an "old school" [sigh :-)] method that I've used for .o's for which I didn't have the source and before LD_PRELOAD existed.

Getting directory of binary in C

How do I get the absolute path to the directory of the currently executing command in C? I'm looking for something similar to the command dirname "$(readlink -f "$0")" in a shell script. For instance, if the C binary is /home/august/foo/bar and it's executed as foo/bar I want to get the result /home/august/foo.
Maybe try POSIX realpath() with argv[0]; something like the following (works on my machine):
#include <limits.h> /* PATH_MAX */
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char **argv) {
    char buf[PATH_MAX];
    char *res = realpath(argv[0], buf);
    (void)argc; /* make compiler happy */
    if (res) {
        printf("Binary is at %s.\n", buf);
    } else {
        perror("realpath");
        exit(EXIT_FAILURE);
    }
    return 0;
}
One alternative to argv[0] and realpath(3) on Linux is to use /proc/self/exe, which is a symbolic link pointing to the executable. You can use readlink(2) to get the pathname from it. See proc(5) for more information.
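A sketch of the /proc/self/exe approach, combined with dirname(3) to get the directory rather than the binary itself, might look like this. The function name exe_dir is made up for the example, and the whole thing assumes Linux with /proc mounted.

```c
#define _POSIX_C_SOURCE 200809L
#include <libgen.h>   /* dirname */
#include <limits.h>   /* PATH_MAX */
#include <stdio.h>
#include <unistd.h>   /* readlink */

/* Store the directory containing the running executable in out.
   Returns 0 on success, -1 on failure. Linux-only: relies on /proc. */
int exe_dir(char *out, size_t outsz)
{
    char buf[PATH_MAX];
    ssize_t n = readlink("/proc/self/exe", buf, sizeof buf - 1);

    if (n < 0)
        return -1;
    buf[n] = '\0';                      /* readlink does not terminate the string */

    const char *dir = dirname(buf);     /* may modify buf in place */
    if (snprintf(out, outsz, "%s", dir) >= (int)outsz)
        return -1;                      /* result did not fit in out */
    return 0;
}
```

For the example in the question, calling exe_dir from /home/august/foo/bar should fill the buffer with /home/august/foo, regardless of how the program was invoked.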
argv[0] is allowed to be NULL by the way (though this usually wouldn't happen in practice). It is also not guaranteed to contain the path used to run the command, though it will when starting programs from the shell.
I have come to the conclusion that there is no portable way for a compiled executable to get the path to its directory. The obvious alternative is to pass an environment variable to the executable telling it where it is located.

Segmentation fault on file write

When I am trying to print to a file it gives a segmentation fault. How can I print date and time to file?
#include <time.h>
#include <stdio.h>
main()
{
    FILE *fp;
    time_t mytime;
    mytime=time(NULL);
    fp=("sys.txt","w+");
    fprintf(fp,"%s",ctime(&mytime));
    fclose(fp);
    return 0;
}
You forgot to (a) call fopen() and (b) pay attention to your compiler's warnings. If you didn't get compiler warnings, turn them on or get a better compiler.
A segmentation fault usually occurs when your program tries to access memory that it doesn't own or have access permission to.
This problem occurs in your program because you have incorrectly tried to open a file. You have to use the fopen() call to open a file:
FILE *fopen(const char *path, const char *mode);
In its present state, fp=("sys.txt","w+"); is a comma expression that evaluates to its right operand, so fp ends up pointing at the string literal "w+" rather than at a valid FILE object. fprintf() then treats those bytes as a FILE structure, which is what triggers the segmentation fault.
If you compiled your code with gcc, you would see this notifying you of a potential problem:
warning: assignment from incompatible pointer type
Replace fp=("sys.txt","w+"); with fp=fopen("sys.txt","w+");
And read about File Operations in C
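Put together, the corrected logic could be sketched as below. It is wrapped in a helper (write_timestamp, a name chosen only for this example) and adds the check on fopen()'s return value that the original program lacked.

```c
#include <stdio.h>
#include <time.h>

/* Write the current date and time to path.
   Returns 0 on success, -1 on error. */
int write_timestamp(const char *path)
{
    time_t now = time(NULL);
    FILE *fp = fopen(path, "w+");

    if (fp == NULL) {
        perror("fopen");    /* e.g. no write permission in the directory */
        return -1;
    }
    fprintf(fp, "%s", ctime(&now));     /* ctime() already ends with a newline */
    return fclose(fp) == 0 ? 0 : -1;
}
```

Calling write_timestamp("sys.txt") from main() gives the behaviour the question was after.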

unix command result to a variable - char*

How can I assign the result of "pwd" (or any other command, for that matter), i.e. the present working directory, to a variable of type char*?
command can be anything. Not bounded to just "pwd".
Thanks.
Start with popen. That will let you run a command with its standard output directed to a FILE * that your parent can read. From there it's just a matter of reading its output like you would any normal file (e.g., with fgets, getchar, etc.)
Generally, however, you'd prefer to avoid running an external program for that -- you should have getcwd available, which will give the same result much more directly.
Why not just call getcwd()? It's not part of C's standard library, but it is POSIX, and it's very widely supported.
Anyway, if pwd was just an example, have a look at popen(). That will run an external command and give you a FILE* with which to read its output.
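For the general case, a popen()-based helper might look roughly like this. The name run_command is invented here, the result is a malloc'ed string the caller must free, and trailing newlines are stripped in the same way shell command substitution does; treat it as a sketch rather than a hardened implementation.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>

/* Run cmd through the shell and return its standard output as a
   malloc'ed string (caller frees). Returns NULL on failure. */
char *run_command(const char *cmd)
{
    size_t cap = 256, len = 0, n;
    char *out, *tmp;
    FILE *pipe = popen(cmd, "r");

    if (pipe == NULL)
        return NULL;
    out = malloc(cap);
    if (out == NULL) {
        pclose(pipe);
        return NULL;
    }
    /* Always leave one byte free for the terminating '\0'. */
    while ((n = fread(out + len, 1, cap - len - 1, pipe)) > 0) {
        len += n;
        if (cap - len <= 1) {           /* buffer full: double it */
            tmp = realloc(out, cap * 2);
            if (tmp == NULL) {
                free(out);
                pclose(pipe);
                return NULL;
            }
            out = tmp;
            cap *= 2;
        }
    }
    pclose(pipe);
    while (len > 0 && out[len - 1] == '\n')   /* drop trailing newlines */
        len--;
    out[len] = '\0';
    return out;
}
```

With this, char *cwd = run_command("pwd"); leaves the working directory in cwd, although getcwd() remains the better tool for that particular job.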
There is a POSIX function, getcwd() for this - I'd use that.
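#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
int main(int argc, char* argv[]) {
    char *dir;
    dir = getcwd(NULL, 0);
    if (dir == NULL) { /* getcwd can fail, e.g. if the directory was removed */
        perror("getcwd");
        return EXIT_FAILURE;
    }
    printf("Current directory is: %s\n", dir);
    free(dir);
    return 0;
}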
I'm lazy, and like the NULL, 0 parameters, which is a GNU extension to allocate as large a buffer as necessary to hold the full pathname. (It can probably still fail, if you're buried a few hundred thousand characters deep.)
Because it is allocated for you, you need to free(3) it when you're done. I'm done with it quickly, so I free(3) it quickly, but that might not be how you need to use it.
You can fork and use one of the execv* functions to call pwd from your C program, but getting the result of that would be messy at best.
The proper way to get the current working directory in a C program is to call char* getcwd(char* name, size_t size);

Executing machine code in memory

I'm trying to figure out how to execute machine code stored in memory.
I have the following code:
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char* argv[])
{
    FILE* f = fopen(argv[1], "rb");
    fseek(f, 0, SEEK_END);
    unsigned int len = ftell(f);
    fseek(f, 0, SEEK_SET);
    char* bin = (char*)malloc(len);
    fread(bin, 1, len, f);
    fclose(f);
    return ((int (*)(int, char *)) bin)(argc-1, argv[1]);
}
The code above compiles fine in GCC, but when I try and execute the program from the command line like this:
./my_prog /bin/echo hello
The program segfaults. I've figured out the problem is on the last line, as commenting it out stops the segfault.
I don't think I'm doing it quite right, as I'm still getting my head around function pointers.
Is the problem a faulty cast, or something else?
You need a page with write execute permissions. See mmap(2) and mprotect(2) if you are under unix. You shouldn't do it using malloc.
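As an illustration of the mmap/mprotect route, here is a small sketch that copies a hand-assembled routine into a fresh page, flips the page from writable to executable, and calls it. The function name run_tiny_routine is made up, the byte sequences are my own encodings of "return 42" for x86 and AArch64 only, and __builtin___clear_cache is a GCC/Clang builtin, so treat all of that as assumptions rather than portable C.

```c
#define _DEFAULT_SOURCE 1   /* for MAP_ANONYMOUS on glibc */
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#if defined(__x86_64__) || defined(__i386__)
/* mov eax, 42 ; ret */
static const unsigned char code[] = { 0xb8, 0x2a, 0x00, 0x00, 0x00, 0xc3 };
#elif defined(__aarch64__)
/* mov w0, #42 ; ret */
static const unsigned char code[] = { 0x40, 0x05, 0x80, 0x52, 0xc0, 0x03, 0x5f, 0xd6 };
#else
#error "this sketch only carries machine code for x86 and AArch64"
#endif

/* Returns the routine's result (42), or -1 if the page could not be set up. */
int run_tiny_routine(void)
{
    size_t pagesz = (size_t)sysconf(_SC_PAGESIZE);
    void *page = mmap(NULL, pagesz, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    int (*fn)(void);
    int result;

    if (page == MAP_FAILED)
        return -1;
    memcpy(page, code, sizeof code);

    /* Drop write permission before executing (write XOR execute). */
    if (mprotect(page, pagesz, PROT_READ | PROT_EXEC) != 0) {
        munmap(page, pagesz);
        return -1;
    }
    /* Needed on CPUs with separate instruction caches; harmless on x86. */
    __builtin___clear_cache((char *)page, (char *)page + sizeof code);

    fn = (int (*)(void))page;
    result = fn();
    munmap(page, pagesz);
    return result;
}
```

Because mmap returns page-aligned memory, the mprotect call applies to exactly the region that holds the code, which is the property a malloc'ed buffer does not give you.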
Also, read what the others said, you can only run raw machine code using your loader. If you try to run an ELF header it will probably segfault all the same.
Regarding the content of replies and downmods:
1- OP said he was trying to run machine code, so I replied on that rather than executing an executable file.
2- See why you don't mix malloc and mman functions:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <sys/mman.h>
int main()
{
    char *a=malloc(10);
    char *b=malloc(10);
    char *c=malloc(10);
    /* NOTE: these writes overrun the 10-byte allocations (undefined behaviour). */
    memset (a,'a',4095);
    memset (b,'b',4095);
    memset (c,'c',4095);
    puts (a);
    memset (c,0xc3,10); /* return */
    /* c is not aligned to a page boundary so this is a NOOP.
       Many implementations include a header to malloc'ed data so it's always a NOOP. */
    mprotect(c,10,PROT_READ|PROT_EXEC);
    b[0]='H'; /* oops it is still writeable. If you provided an aligned
                 address it would segfault */
    char *d=mmap(0,4096,PROT_READ|PROT_WRITE|PROT_EXEC,MAP_PRIVATE|MAP_ANON,-1,0);
    memset (d,0xc3,4096);
    ((void(*)(void))d)();
    ((void(*)(void))c)(); /* oops it isn't executable */
    return 0;
}
It displays exactly this behavior on Linux x86_64; other ugly behavior is sure to arise on other implementations.
Using malloc works fine.
OK this is my final answer, please note I used the original poster's code.
I'm loading from disk the compiled version of this code into a heap-allocated area "bin", just as the original code did (the name is fixed rather than taken from argv, and the value 0x674 comes from the following):
objdump -F -D foo|grep -i hoho
08048674 <hohoho> (File Offset: 0x674):
This can be looked up at run time with BFD (the Binary File Descriptor library) or something else. You can call other binaries (not just yourself) so long as they are statically linked to the same set of libraries.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
unsigned char *charp;
unsigned char *bin;
void hohoho()
{
    printf("merry mas\n");
    fflush(stdout);
}
int main(int argc, char **argv)
{
    int what;
    charp = malloc(10101);
    memset(charp, 0xc3, 10101);
    mprotect(charp, 10101, PROT_EXEC | PROT_READ | PROT_WRITE);
    /* 32-bit x86 inline assembly: call into the block of 0xc3 (ret) bytes */
    __asm__("leal charp, %eax");
    __asm__("call (%eax)" );
    printf("am I alive?\n");
    char *more = strdup("more heap operations");
    printf("%s\n", more);
    FILE* f = fopen("foo", "rb");
    fseek(f, 0, SEEK_END);
    unsigned int len = ftell(f);
    fseek(f, 0, SEEK_SET);
    bin = malloc(len);
    printf("read in %zu\n", fread(bin, 1, len, f)); /* fread returns size_t */
    printf("%p\n", bin);
    fclose(f);
    mprotect(&bin, 10101, PROT_EXEC | PROT_READ | PROT_WRITE);
    /* jump to hohoho inside the loaded image: base address + file offset 0x674 */
    asm volatile ("movl %0, %%eax"::"g"(bin));
    __asm__("addl $0x674, %eax");
    __asm__("call %eax" );
    fflush(stdout);
    return 0;
}
running...
co tmp # ./foo
am I alive?
more heap operations
read in 30180
0x804d910
merry mas
You can use UPX to manage the load/modify/exec of a file.
It seems to me you're loading an ELF image and then trying to jump straight into the ELF header? http://en.wikipedia.org/wiki/Executable_and_Linkable_Format
If you're trying to execute another binary, why don't you use the process creation functions for whichever platform you're using?
A typical executable file has:
a header
entry code that is called before main(int, char **)
The first means that you can't generally expect byte 0 of the file to be executable; instead, the information in the header describes how to load the rest of the file in memory and where to start executing it.
The second means that when you have found the entry point, you can't expect to treat it like a C function taking arguments (int, char **). It may, perhaps, be usable as a function taking no parameters (and hence requiring nothing to be pushed prior to calling it). But you do need to populate the environment that will in turn be used by the entry code to construct the command line strings passed to main.
Doing this by hand under a given OS would go into some depth which is beyond me; but I'm sure there is a much nicer way of doing what you're trying to do. Are you trying to execute an external file as a one-off operation, or load an external binary and treat its functions as part of your program? Both are catered for by the C libraries in Unix.
It is more likely that the code jumped to by the call through the function pointer is causing the segfault, rather than the call itself. There is no way to determine from the code you have posted whether the code loaded into bin is valid. Your best bet is to use a debugger, switch to assembler view, break on the return statement and step into the function call to determine that the code you expect to run is indeed running, and that it is valid.
Note also that in order to run at all the code will need to be position independent and fully resolved.
Moreover, if your processor/OS enables data execution prevention, then the attempt is probably doomed. It is at best ill-advised in any case; loading code is what the OS is for.
What you are trying to do is something akin to what interpreters do. Except that an interpreter reads a program written in an interpreted language like Python, compiles that code on the fly, puts executable code in memory and then executes it.
You may want to read more about just-in-time compilation too:
Just in time compilation
Java HotSpot JIT runtime
There are libraries available for JIT code generation, such as GNU lightning and libJIT, if you are interested. You'd have to do a lot more than just reading from a file and trying to execute code, though. An example usage scenario would be:
Read a program written in a scripting language (maybe your own).
Parse and compile the source into an intermediate language understood by the JIT library.
Use the JIT library to generate code for this intermediate representation, for your target platform's CPU.
Execute the JIT generated code.
And for executing the code you'd have to use techniques such as using mmap() to map the executable code into the process's address space, marking that page executable and jumping to that piece of memory. It's more complicated than this, but it's a good start in order to understand what's going on beneath all those interpreters of scripting languages such as Python, Ruby etc.
The online version of the book "Linkers and Loaders" will give you more information about object file formats, what goes on behind the scenes when you execute a program, the roles of the linkers and loaders and so on. It's a very good read.
You can dlopen() a file, look up the symbol "main" and call it with 0, 1, 2 or 3 arguments (all of type char*) via a cast to pointer-to-function-returning-int-taking-0,1,2,or3-char*
Use the operating system for loading and executing programs.
On unix, the exec calls can do this.
Your snippet in the question could be rewritten:
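#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
int main(int argc, char* argv[])
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s program [args...]\n", argv[0]);
        return EXIT_FAILURE;
    }
    /* argv + 1 gives the target its own path as argv[0], followed by its arguments. */
    execv(argv[1], argv + 1);
    perror("execv"); /* only reached if execv failed */
    return EXIT_FAILURE;
}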
Executable files contain much more than just code. Header, code, data, more data: all of this is separated and loaded into different areas of memory by the OS and its libraries. You can't load a program file into a single chunk of memory and expect to jump to its first byte.
If you are trying to execute your own arbitrary code, you need to look into dynamic libraries because that is exactly what they're for.
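To show the shape of the dynamic-library route, here is a sketch that loads a shared library at run time and calls one function from it. It uses the system math library purely as a stand-in for "your own code"; the name libm.so.6 assumes glibc on Linux, and cosine_via_dlopen is a made-up helper name.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Load the math library at run time and call its cos().
   Returns 0 on success and stores the value in *result; -1 on failure. */
int cosine_via_dlopen(double x, double *result)
{
    double (*cosine)(double);
    void *handle = dlopen("libm.so.6", RTLD_LAZY);   /* assumed glibc soname */

    if (handle == NULL) {
        fprintf(stderr, "dlopen: %s\n", dlerror());
        return -1;
    }
    cosine = (double (*)(double))dlsym(handle, "cos");
    if (cosine == NULL) {
        fprintf(stderr, "dlsym: %s\n", dlerror());
        dlclose(handle);
        return -1;
    }
    *result = cosine(x);
    dlclose(handle);
    return 0;
}
```

The dynamic loader takes care of mapping the code with the right permissions and resolving its symbols, which is exactly the work the hand-rolled malloc-and-jump approach skips. On older glibc versions the program needs to be linked with -ldl.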

Resources