Is there a way to intercept file handling requests in Linux?

I was wondering if there's a way to intercept filesystem requests:
I'm trying to write a logger in C that can log newly created, modified and deleted files under Linux.
It would be good if the file entry could be logged in ~realtime.
Any help would be really appreciated, thanks.

I would recommend doing this in kernel space. To be more precise, use the VFS layer: you would need to modify all the relevant filesystem syscalls (open, write, etc.) in the Linux kernel. Look for the sys_open() procedure in the kernel sources and try to modify it, then look at the other syscalls (read, write, etc.).
But if you don't know how to program kernel space - try some user-space solution like kaylum wrote.

Related

Linux Kernel - Read/Write to a File

I'm working on an LKM which needs to retrieve a certain set of information and write it to files. I looked up common ways to do so, but could not find a working one for Linux 4.x. I also found out that it is possible to retrieve system calls from memory and effectively call them.
As I have not found a better way so far, I'd be interested to know whether it would be feasible to find the system call table and call open, read/write and close this way.

This is strongly discouraged in most situations.
https://www.linuxjournal.com/article/8110 was a really good read for me the first time I thought I had to do this as well.
From within the Linux kernel, however, reading data out of a file for configuration information is considered to be forbidden. This is due to a vast array of different problems that could result if a developer tries to do this.
Indeed this is possible to do using system calls from within the kernel, but the practice of calling system calls from within the kernel is also generally discouraged. They're designed as interfaces for userspace applications to ask things of the kernel, not for the kernel to get itself to do work.
What kind of files do you want to manipulate from within the kernel? If the kind of file you'd like to manipulate is provided by the proc filesystem or the sysfs filesystem or the dev filesystem, you can modify the contents of the file from within the kernel (since the kernel provides these to userspace itself) -- this should be done NOT with file manipulation calls. If it's a normal userspace file, almost never do you want the kernel to be able to modify it.
If you provide more specifics I'd be interested to hear them, but this is usually a bad idea.

What is the best method to detect which files are used/modified/created/deleted by a process?

I want to write software that will detect all used/created/modified/deleted files during the execution of a process (and its child processes). The process has not yet run - the user provides a command line which will later be subprocessed via bash, so we can do things before and after execution, and control the environment the command is run in.
I have thought of four methods so far that could be useful:
1. Parse the command line to identify the files and directories mentioned. Assume all files explicitly mentioned are used. Check directories before/after for created/deleted files. MD5 existing files before/after to see if any are modified. This works on all operating systems and environments, but obviously has serious limitations (it doesn't work when the command is "./script.sh").
2. Run the process via another process like strace (dtruss for OSX, and there are equivalent Windows programs), which listens for system calls. Parse the output file to find files used/modified/deleted/created. The pro is that it's more sensitive than the MD5 method and can deal with script.sh. The con is that it's very OS specific (dtruss requires root privileges even if the process being run does not, and the outputs of the tools are all different). It could also create huge log files if there are a lot of read/write operations, and will certainly slow things down.
3. Integrate something similar to the above into the kernel. Obviously still OS specific, but at least now we are calling the shots, creating a common output format for all OSes. It wouldn't create huge log files, and could even stop hooking syscalls to, say, read() after the process has requested the first read() on the file. I think this is what the tool inotify is doing, but I'm not familiar with it at all, nor with kernel programming!
4. Run the process using the LD_PRELOAD trick (called DYLD_INSERT_LIBRARIES on OSX, not sure if it exists in Windows), which basically overrides any call to open() by the process with our own version of open() that logs what we're opening. Same for write, read, etc. It's very simple to do, and very performant since you're essentially teaching the process to log itself. The downside is that it only works for dynamically linked processes, and I have no idea of the prevalence of dynamically vs. statically linked programs. I don't even know if it is possible to tell before execution whether a process is dynamically or statically linked (with the intention of using this method by default, but falling back to a less performant method if it's not possible).

I need help choosing the optimal path to go down. I have already implemented the first method because it was simple and gave me a way to work on the logging backend (http://ac.gt/log), but really I need to upgrade to one of the other methods. Your advice would be invaluable :)
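On the question of telling dynamic from static linking before execution: on Linux, LD_PRELOAD is processed by the dynamic loader, and an ELF executable requests that loader through a PT_INTERP program header. Here is a rough sketch of such a check, assuming a 64-bit ELF file with the same byte order as the host. The function name has_elf_interpreter is made up for this example, and scripts such as ./script.sh are simply reported as "not an ELF file".

```c
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 if the 64-bit ELF file at `path` has a PT_INTERP program header
 * (it asks for a dynamic loader, so LD_PRELOAD can apply), 0 if it has none,
 * and -1 if the file cannot be read or is not a 64-bit ELF file. */
int has_elf_interpreter(const char *path)
{
    FILE *f = fopen(path, "rb");
    if (!f)
        return -1;

    Elf64_Ehdr eh;
    int result = -1;

    if (fread(&eh, sizeof eh, 1, f) == 1 &&
        memcmp(eh.e_ident, ELFMAG, SELFMAG) == 0 &&
        eh.e_ident[EI_CLASS] == ELFCLASS64) {
        result = 0;
        /* Walk the program header table looking for PT_INTERP. */
        for (unsigned i = 0; i < eh.e_phnum; i++) {
            Elf64_Phdr ph;
            long off = (long)(eh.e_phoff + (Elf64_Off)i * eh.e_phentsize);
            if (fseek(f, off, SEEK_SET) != 0 ||
                fread(&ph, sizeof ph, 1, f) != 1) {
                result = -1;   /* truncated or corrupt file */
                break;
            }
            if (ph.p_type == PT_INTERP) {
                result = 1;
                break;
            }
        }
    }

    fclose(f);
    return result;
}
```

A wrapper could call this on the resolved executable and fall back to the strace-style method when the result is not 1. Note that it only inspects the top-level program, not the children it may spawn.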

Take a look at the source code of "strace" (and its -f option to trace children). It does basically what you are trying to do. It captures all the system calls of the process (or its children), so you can grep for operations like "open", etc.
The following link provides some examples of implementing your own strace by using the ptrace system call:
https://blog.nelhage.com/2010/08/write-yourself-an-strace-in-70-lines-of-code/

File in both LKM and user space

I remember reading about this concept somewhere, though I do not remember where.
I have a file, say file.c, which I compile along with some other files as a library for use by applications.
Now suppose I compile the same file and build it into a kernel module. The same file object is then in both user space and kernel space, and it allows me to access kernel data structures without invoking a system call. I mean I can have APIs in the library through which applications can access kernel data structures without system calls. I am not sure if I can write anything into the kernel this way (which I think is impossible), but would reading some data structures from the kernel this way be fine?
Can anyone give me more details about this approach? I could not find anything on Google about it.

I believe this is a conceptually flawed approach, unless I misunderstand what you're talking about.
If I understand you correctly, you want to take the same file and compile it twice: once as a module and once as a userspace program. Then you want to run both of them, so that they can share memory.
So, the obvious problem with that is that even though the programs come from the same source code, they would still exist as separate executables. The module won't be its own process: it would only get invoked when the kernel calls into it (e.g. on system calls). So by itself, it doesn't let you escape the system call nonsense.
A better solution depends on what your goal is: (1) do you simply want to access kernel data structures because you need something that you can't normally get at? Or (2) are you concerned about performance and want to access these structures faster than a system call allows?
For (1), you can create a character device or a procfs file. Both of these allow your userspace programs to reach their dirty little fingers into the kernel.
For (2), you are in a tough spot, and the problem gets a lot nastier (and more interesting). Solving the speed issue depends a lot on exactly what data you're trying to extract.
Does this help?

There are two ways to do this, the most common being what's called a Character Device, and the other being a Block Device (i.e. something "disk-like").
Here's a guide on how to create drivers that register chardevs.

How to ensure that when my process is operating on a file, no other process tries to write to it or delete it?

If my process is trying to read from a file, then how do I ensure from my code (C Language) that no other process either writes to it or deletes it (include system commands for deleting the file)?
Also, can this be achieved on all OS (Windows, Linux, Solaris, HP-UX, VxWorks etc)?

Edit: I'll answer for Unix/Linux
As gspr and others said, take a look at file locking using fcntl, flock, etc. However, be warned that those are ADVISORY LOCKING methods.
What does this mean? It means you can warn other processes that you are currently accessing a file, or a portion of it, but you can't forcibly keep them from ignoring you and writing all over your file.
There are no COMPULSORY locking primitives. You can use permissions to your advantage, but you'll never have full guarantees -- the root user can always override your limitations. I don't think there's a way to work around that.

For POSIX systems, take a look at fcntl. flock could also be of interest, although I don't think it's a part of POSIX.

You have to open the file with READ sharing permissions, which means anyone else trying to open it can only get read access. As @Pointy says, this is OS specific and you'll probably have to code this separately for Windows, Linux, etc. However, most modern OSes should support this.

Copy the file, make your changes, then rename it back over the original.
Another method, if the file doesn't yet exist, is to create the file using exclusive flags, then delete it from the filesystem. Perform your writing, then hard link it back into the filesystem (it presently exists only as an open inode). You can make use of /proc for the source path.
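The first suggestion (copy, change, rename over the original) relies on rename() replacing the target atomically on POSIX systems, so readers see either the old file or the new one, never a half-written mix. A minimal sketch follows, assuming the new contents are already in memory and that the temporary file can live in the same directory as the target. The name replace_file is made up for this example.

```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Write `len` bytes from `data` to a temporary file next to `path`, flush it
 * to disk, then rename it over `path`. Returns 0 on success, -1 on error.
 * Note: mkstemp() creates the file with mode 0600, so the original
 * permissions are not preserved in this sketch. */
int replace_file(const char *path, const void *data, size_t len)
{
    char tmp[4096];
    int n = snprintf(tmp, sizeof tmp, "%s.XXXXXX", path);
    if (n < 0 || (size_t)n >= sizeof tmp)
        return -1;

    /* Same directory as the target, so rename() stays on one filesystem. */
    int fd = mkstemp(tmp);
    if (fd < 0)
        return -1;

    const char *p = data;
    size_t left = len;
    while (left > 0) {
        ssize_t w = write(fd, p, left);
        if (w < 0) {
            close(fd);
            unlink(tmp);
            return -1;
        }
        p += w;
        left -= (size_t)w;
    }

    if (fsync(fd) != 0) {
        close(fd);
        unlink(tmp);
        return -1;
    }
    if (close(fd) != 0) {
        unlink(tmp);
        return -1;
    }

    /* The atomic step: the name now refers to the new contents. */
    if (rename(tmp, path) != 0) {
        unlink(tmp);
        return -1;
    }
    return 0;
}
```

Processes that already have the old file open keep reading the old contents, which is usually what you want here.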

Linux kernel code that uses procfs: what should I be aware of?

I have a very nice idea for a kernel patch, and I want to conduct some research and see code examples before I shape my idea.
I'm looking for interesting code examples that would demonstrate advanced usage of procfs (the Linux /proc file system). By interesting, I mean more than just reading a documented value.
My idea is to provide every process with an easy broadcast mechanism. For example, let's consider a process that runs multiple instances of rsync and wants to check the transfer status (how many bytes have been transferred so far) for each child. Currently, I don't know of any way that can be done.
I intend to provide the process with a minimal interface to write data to the procfs. That data would be placed under the PID directory. For example:
/procfs/1343/data_transfered/incoming
I can think of numerous advantages to this, mainly in the concurrency field.
By the way, if such a mechanism already exists, do tell...

Yes, I've written stuff that pokes around in /proc. I suspect you are unlikely to get linux kernel patches accepted that do anything with proc, unless they are just fixing something that is already there that was broken in some way.*
/sysfs seems to be where things are moving.
/proc was originally for process information, but a lot of misc. driver stuff ended up in there.
*well, maybe they'll take it if whatever you're doing has to do with processes, and isn't in a driver.

Go look at the source code for the procps package for code that uses /proc

http://github.com/tialaramex/leakdice/tree/master
Uses proc to figure out the memory address layout of a process, and dump random pages from its heap (for reasons which are explained in its documentation).
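As a small taste of what "using /proc to figure out the memory layout" looks like from user space, here is a sketch that scans a maps file (such as /proc/self/maps) for a named region like "[heap]" or "[stack]". This is not code from leakdice; the function name find_mapping is made up for this example.

```c
#include <stdio.h>
#include <string.h>

/* Scan a /proc/<pid>/maps style file for the first mapping whose pathname
 * column equals `name` (for example "[heap]" or "[stack]").
 * Returns 1 and fills *start / *end if found, 0 if not found,
 * -1 if the file cannot be opened. */
int find_mapping(const char *maps_path, const char *name,
                 unsigned long *start, unsigned long *end)
{
    FILE *f = fopen(maps_path, "r");
    if (!f)
        return -1;

    char line[8192];
    int found = 0;

    while (fgets(line, sizeof line, f)) {
        unsigned long lo, hi;
        char path[4096] = "";

        /* Line format: start-end perms offset dev inode [pathname] */
        if (sscanf(line, "%lx-%lx %*s %*s %*s %*s %4095[^\n]",
                   &lo, &hi, path) >= 2 &&
            strcmp(path, name) == 0) {
            *start = lo;
            *end = hi;
            found = 1;
            break;
        }
    }

    fclose(f);
    return found;
}
```

For another process you would pass "/proc/<pid>/maps" and need permission to read it. Actually reading the pages (as leakdice does) is a separate step that this sketch does not attempt.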
