Catching a system call just before control enters a shared library - c

I have wrapped a number of system call functions such as write() and open(), and I use LD_PRELOAD to override the original system calls. I have also defined a few more functions and built all of this into a shared library.
I would like to catch all system calls from different application processes to these shared libraries before they enter the shared library. How can I do that?
Thanks

LD_PRELOAD is not necessarily a good way to interpose system calls, because a) it only allows you to intercept library calls and b) it only allows you to intercept library calls. ;)
A) While in general, system calls are wrapped by the shared libC in your system, no one prevents you from calling a system call yourself, e.g., by setting up the right register contents and then issuing INT 0x80 on an x86 system. If the program you're interested in does so, you'll never catch those with LD_PRELOAD-based libc-interposition.
B) While in general, most programs use the shared libC in your system to make system calls, sometimes applications are linked statically, which means the libC code is part of the application and does not come from the shared lib. In such cases, LD_PRELOAD also does not help.
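To illustrate point A, here is a minimal sketch, assuming Linux and glibc. The name raw_write is made up for this example. It uses the generic syscall() entry point as a stand-in for hand-written INT 0x80 code, so a preloaded write() wrapper is never on the call path. (syscall() is itself a libc function, so a program using real inline assembly would be even harder to interpose.)

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: issue the write system call without going through
 * libc's write() wrapper, so an LD_PRELOAD'ed write() never sees it. */
ssize_t raw_write(int fd, const void *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    /* SYS_write is the kernel's system call number for write. */
    return (ssize_t)syscall(SYS_write, fd, buf, len);
}
```

Calling raw_write(1, "hi\n", 3) prints to standard output exactly like write() would, but an interposer that only overrides write() is not invoked.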
A comment already suggested to use strace/ltrace -- my generalized advice would be to have a look at ptrace() which both of these tools use and which should give you what you want without the need of modifying the kernel.
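As a rough idea of what the ptrace() approach looks like, here is a sketch, assuming Linux and that ptrace is permitted in your environment. The function name count_syscall_stops is invented for illustration. It runs a program as a traced child and counts how often the child stops at a system call boundary (entry and exit are separate stops). A real tool would read the registers at each stop to find out which call was made.

```c
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv[0] under ptrace and return the number of
 * syscall stops seen (entry and exit each count once), or -1 on error. */
long count_syscall_stops(char *const argv[])
{
    assert(argv != NULL && argv[0] != NULL);

    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: ask to be traced, then stop so the parent can set options. */
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) < 0)
            _exit(126);
        raise(SIGSTOP);
        execvp(argv[0], argv);
        _exit(127);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (!WIFSTOPPED(status))
        return -1; /* child exited before it could be traced */

    /* Mark syscall stops as SIGTRAP|0x80 so they can be told apart. */
    if (ptrace(PTRACE_SETOPTIONS, pid, NULL,
               (void *)(long)PTRACE_O_TRACESYSGOOD) < 0) {
        kill(pid, SIGKILL);
        waitpid(pid, &status, 0);
        return -1;
    }

    long stops = 0;
    int deliver = 0; /* signal to pass on when resuming the child */
    for (;;) {
        /* Resume until the next syscall entry or exit (or a signal). */
        if (ptrace(PTRACE_SYSCALL, pid, NULL, (void *)(long)deliver) < 0 ||
            waitpid(pid, &status, 0) < 0) {
            kill(pid, SIGKILL);
            waitpid(pid, &status, 0);
            return -1;
        }
        if (WIFEXITED(status) || WIFSIGNALED(status))
            return stops;

        deliver = 0;
        int sig = WSTOPSIG(status);
        if (sig == (SIGTRAP | 0x80))
            stops++;            /* a system call boundary */
        else if (sig != SIGTRAP)
            deliver = sig;      /* forward ordinary signals to the child */
    }
}
```

Because the tracing happens in the kernel, this also sees calls made by statically linked programs or by hand-written INT 0x80 code, which is what LD_PRELOAD cannot do.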

Patch-free User-level Link-time intercepting of system calls and interposing on library functions may do the trick but I have not tested it.

I'm pretty sure the only way you can do this is by modifying the system call table. HIDS systems (such as Samhain) will report this as an intrusion and Linux kernel developers frown upon this, heavily. The implementation details are very specific to the OS (i.e. what works on FreeBSD won't necessarily work on Linux), but the general implementation details are going to be the same. A kernel module might be a better way to go with cleaner, more standardized APIs.

Related

How to prevent a dlopened library from using certain libc functions?

I'm writing a Linux/Unix program that has a lot of implementation in plugins that are dlopened by the program on-demand.
I'd like to prevent these plugin libraries from using some libc functions that mess with global state of the host process (such as manipulating signal handlers and suchlike).
What would be the best way to do this?
As far as I know I can't employ the classical LD_PRELOAD trick here since the libs are dlopened.
In practical terms, you can't. Code running from a library runs with the full privileges of the host application. Don't load libraries that you don't trust to not do stupid things.
You could conceivably examine the library before loading it and (for instance) reject libraries which have unexpected dependencies, or which have relocations for functions which they shouldn't be using. (This could be accomplished using ldd or readelf, for instance.) However, this will never be entirely reliable; there are numerous ways that a malicious library could hide its use of various functions.

Libcurl and curl_global_init in shared library loaded at runtime

I am developing a photo booth application that uses 3 modules to provide printing, capturing, and triggering functionality. The idea is that people can develop modules for it that extend this functionality. These modules are implemented as shared libraries that are loaded at runtime when the user clicks "start".
I am trying to implement a printer module that "prints" to a facebook image gallery. I want to use libcurl for this. My problem is with the initialization function: curl_global_init() The libcurl API documentation states that this function is absolutely not thread safe. From the docs:
This function is not thread safe. You must not call it when any other thread in the program (i.e. a thread sharing the same memory) is running. This doesn't just mean no other thread that is using libcurl. Because curl_global_init() calls functions of other libraries that are similarly thread unsafe, it could conflict with any other thread that uses these other libraries.
Elsewhere in the documentation it says:
The global constant situation merits special consideration when the code you are writing to use libcurl is not the main program, but rather a modular piece of a program, e.g. another library. As a module, your code doesn't know about other parts of the program -- it doesn't know whether they use libcurl or not. And its code doesn't necessarily run at the start and end of the whole program.
A module like this must have global constant functions of its own, just like curl_global_init() and curl_global_cleanup(). The module thus has control at the beginning and end of the program and has a place to call the libcurl functions.
...which seems to address the issue. However, this seems to imply that my module's init() and finalize() functions would be called at the program's beginning and end. Since the modules are designed to be swappable at runtime, there is no way I can do this. Even if I could, my application uses GLib, and per its documentation it is never safe to assume there are no threads running:
...Since version 2.32, the GLib threading system is automatically initialized at the start of your program, and all thread-creation functions and synchronization primitives are available right away.
Note that it is not safe to assume that your program has no threads even if you don't call g_thread_new() yourself. GLib and GIO can and will create threads for their own purposes...
My question is: is there any way to safely call curl_global_init() in my application? Can I put the calls to curl_global_init() and curl_global_cleanup() in my module's init() and finalize() functions? Do I need to find another HTTP library?
First, you won't really find any other library without these restrictions since they are inherited by libcurl from 3rd party (SSL mostly) libraries with those restrictions. For example OpenSSL.
This said, the thread safe situation for global_init is very unfortunate and something we (in the curl project) really strongly dislike but cannot do much about as long as we use those other libraries. This also means that the exact situation for you depends on exactly which dependency libraries your libcurl is built to use.
You will in most situations be perfectly fine with calling curl_global_init() from your module's init() function the way you suggest. I can't guarantee this to be safe with 100% certainty of course since there are a few unknowns here that I cannot speak to.
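If several modules might end up doing this, one possible shape for the module's own global functions is a reference-counted guard. This is only a sketch: module_global_acquire, module_global_release and the fake_* stand-ins are names invented here, and the callbacks stand in for curl_global_init() and curl_global_cleanup() so the example builds without libcurl. Note the lock only serializes callers of this guard. It does not make the underlying init safe against unrelated threads that are already running.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

static pthread_mutex_t init_lock = PTHREAD_MUTEX_INITIALIZER;
static int init_refs = 0;

/* Stand-ins for the real global init/cleanup (assumption for the demo). */
static int fake_init_calls = 0;
static int fake_cleanup_calls = 0;
static int fake_global_init(void) { fake_init_calls++; return 0; }
static void fake_global_cleanup(void) { fake_cleanup_calls++; }

/* Call from a module's init(): runs global_init only for the first user.
 * Returns 0 on success, or global_init's nonzero error code. */
int module_global_acquire(int (*global_init)(void))
{
    int rc = 0;
    pthread_mutex_lock(&init_lock);
    if (init_refs == 0 && global_init != NULL)
        rc = global_init();
    if (rc == 0)
        init_refs++;
    pthread_mutex_unlock(&init_lock);
    return rc;
}

/* Call from a module's finalize(): runs global_cleanup for the last user. */
void module_global_release(void (*global_cleanup)(void))
{
    pthread_mutex_lock(&init_lock);
    assert(init_refs > 0); /* an unbalanced release is a caller bug */
    if (--init_refs == 0 && global_cleanup != NULL)
        global_cleanup();
    pthread_mutex_unlock(&init_lock);
}
```

With this, swapping modules at runtime triggers the global init once when the first module is loaded and the cleanup once when the last one is finalized.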

Functions available when writing a new system call

For a college assignment we have to add a system call to the Linux kernel. I have "Hello, World" done no problem. In terms of adding a more complicated call, I know (or at least think) I can't use C functions like malloc, but I'm wondering can I use syscall() to use other system calls?
The kernel has its own specific calls for pretty much everything. You don't have access to system calls or <sys/xxxx.h> header files.
For your example: yes, you can't use malloc(), but you can use kmalloc().
In older versions of the kernel (2.4) you could make system calls from kernel code via the _syscallN() macros. I'm pretty sure that's been removed.
In general, making system calls from inside the kernel is not a good idea. System calls are really just a way for user space to go into the kernel to do something, so if you're already in the kernel there should be a better way to do what you're trying to do.

Writing a POSIX-compliant kernel

I've wanted to write a kernel for some time now. I already have a sufficient knowledge of C and I've dabbled in x86 Assembler. You see, I've wanted to write a kernel that is POSIX-compliant in C so that *NIX applications can be potentially ported to my OS, but I haven't found many resources on standard POSIX kernel functions. I have found resources on the filesystem structure, environment variables, and more on the Open Group's POSIX page.
Unfortunately, I haven't found anything explaining what calls and kernel functions a POSIX-compliant kernel must have (in other words, what kind of internal structure must a kernel have to comply with POSIX). If anyone could find that information, please tell me.
POSIX doesn't define the internal structure of the kernel, the kernel-to-userspace interface, or even libc, at all. Indeed, even Windows has a POSIX-compliant subsystem. Just make sure the POSIX interfaces defined at your link there work somehow. Note, however, that POSIX does not require anything to be implemented specifically in the kernel - you can implement things in the C library using simpler kernel interfaces of your own design where possible, if you prefer.
It just so happens that a lot of the POSIX compliant OSes (BSD, Linux, etc) have a fairly close relationship between many of those calls and the kernel layer, but there are exceptions. For example, on Linux, a write() call is a direct syscall, invoking a sys_write() function in the kernel. However on Windows, write() is implemented in a POSIX support DLL, which translates the file descriptor to an NT handle and calls NtWriteFile() to service it, which in turn invokes a corresponding system call in ntoskrnl.exe. So you have a lot of freedom in how to do things - which makes things harder, if anything :)
The Open Group (opengroup.org) leaves the decisions about kernel syscalls to each implementation.
write(), for example, has to look and behave as stated, but what it calls underneath is not defined. A lot of calls like write, read, lseek are free to call whatever entrypoint they want inside the kernel.
So, no, there really is nothing that says you have to have a certain function name with a defined set of semantics available in the kernel. It just has to be available in the C runtime library.

What's the difference between "C system calls" and "C library routines"?

There are multiple sections in the manpages. Two of them are:
2 Unix and C system calls
3 C Library routines for C programs
For example there is getmntinfo(3) and getfsstat(2), both look like they do the same thing. When should one use which and what is the difference?
System calls are functions provided by the operating system itself. On UNIX, for example, the malloc() library function is built on top of the sbrk() system call (which resizes the process's memory space).
Libraries are just application code that's not part of the operating system and will often be available on more than one OS. They're basically the same as function calls within your own program.
The line can be a little blurry but just view system calls as kernel-level functionality.
Libraries of common functions are built on top of the system call interface, but applications are free to use both.
System calls are like authentication keys which have the access to use kernel resources.
[Image: diagram from Advanced Linux Programming showing how user applications interact with the kernel]
System calls are the interface between user-level code and the kernel. C Library routines are library calls like any other, they just happen to be really commonly provided (pretty much universally). A lot of standard library routines are wrappers (thin or otherwise) around system calls, which does tend to blur the line a bit.
As to which one to use, as a general rule, use the one that best suits your needs.
The calls described in section 2 of the manual are all relatively thin wrappers around actual calls to system services that trap to the kernel. The C standard library routines described in section 3 of the manual are client-side library functions that may or may not actually use system calls.
This posting has a description of system calls and trapping to the kernel (in a slightly different context) and explains the underlying mechanism behind system calls with some references.
As a general rule, you should always use the C library version. They often have wrappers that handle esoteric things like restarts on a signal (if you have requested that). This is especially true if you have already linked with the library. All rules have reasons to be broken. Reasons to use the direct calls:
You want to be libc agnostic; Maybe with an installer. Such code could run on Android (bionic), uClibc, and more traditional glibc/eglibc systems, regardless of the library used. Also, dynamic loading with wrappers to make a run-time glibc/bionic layer allowing a dual Android/Linux binary.
You need extreme performance, although this is probably rare and most likely misguided. Rethinking the problem will probably give better performance benefits, and not calling into the system at all is often a performance win, which the libc can occasionally achieve.
You are writing some initramfs or init code without a library; to create a smaller image or boot faster.
You are testing a new kernel/platform and don't want to complicate life with a full blown file system; very similar to the initramfs.
You wish to do something very quickly on program startup, but eventually want to use the libc routines.
To avoid a known bug in the libc.
The functionality is not available through libc.
Sorry, most of the examples are Linux specific, but the rationales should apply to other Unix variants. The last item is quite common when new features are introduced into a kernel. For example, when kqueue or epoll were first introduced, there was no libc support for them. This may also happen if the system has an older library but a newer kernel, and you wish to use this functionality.
If your process hasn't used the libc, then most likely something in the system will have. By coding your own variants, you can negate the cache by providing two paths to the same end goal. Also, Unix will share the code pages between processes. Generally there is no reason not to use the libc version.
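For the "functionality is not available through libc" case, a well-known Linux example is gettid: glibc only gained a wrapper for it in version 2.30, so on older systems the usual workaround was to go through syscall(). A minimal sketch, assuming Linux, with my_gettid as an invented name:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: fetch the calling thread's kernel thread ID
 * without relying on a libc gettid() wrapper being present. */
pid_t my_gettid(void)
{
    long tid = syscall(SYS_gettid);
    assert(tid > 0); /* gettid is documented as always succeeding */
    return (pid_t)tid;
}
```

In the main thread of a process the value equals getpid(). Once the libc on your target systems provides the wrapper, switching back to it is the better choice, for the reasons given above.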
Other answers have already done a stellar job on the difference between libc and system calls.
