Re-using an inode field - c

I am in a project where I need to do some book-keeping i.e to indicate whether a particular file has been accessed by a program A. I plan to store this information in the inode as using other additional datastructure would be inefficient.
I plan to reuse the field i_mode in the inode datastructure. Any suggestions. Moreover I don't know how to write to the inode data structure from user space. How do I do that? thanks...

The file system looks after the inode; it won't even let super-user modify the inode directly (though root can always access the unmounted (block or character) device to change it).
Unless you write code to modify the file system - a kernel module - you will not be able to do as you wish. Find another way to do it.

File system is not designed to solve users problem. You want bookkeeping changed files, other want bookkeeping of new/deleted files.
I see only the following options:
inotify
keep status of interested files/directories and check for changes once a time
Just for fun you can consider:
kernel module
implement your own file system

After a bit of googling around saw that the "sticky bit" is not much in use today and we can use it as well as modify it from user space.

Related

simulating memfd_create on Linux 2.6

I'm backporting a piece of code that uses a virtual memory trick involving a file descriptor that gets passed to mmap, but doesn't have a mount point. Having the physical file would be an unnecessary overhead in this application. The original code uses memfd_create which is great.
Since Linux 2.6 doesn't have memfd_create or the O_TMPFILE flag for open, I'm currently creating a file with mkstemp and then unlinking it without closing it first. This works, but it doesn't please me at all.
Is there a better way to get a file descriptor for mmap purposes without ever touching the file system in 2.6?
Before somebody says "XY problem," what I really need is two different virtual memory addresses to the same data in memory. This is implemented by mmap'ing the same anonymous file to two different addresses. Any other "Y" to my "X" also welcome.
Thanks
I considered two approaches:
Creating my temporary under /dev/shm/ rather than /tmp/
Using shm_open to get a file descriptor.
Although irrelevant to the specific problem at hand, /dev/shm/ is not guaranteed to exist on all distributions, so #2 felt more correct to me.
In order to not have to worry about unique names of the shared memory objects, I just generate UUIDs.
I think I'm happy with this.
Shout out to #NominalAnimal.

Ensure that UID/GID check in system call is executed in RCU-critical section

Task
I have a small kernel module I wrote for my RaspBerry Pi 2 which implements an additional system call for generating power consumption metrics. I would like to modify the system call so that it only gets invoked if a special user (such as "root" or user "pi") issues it. Otherwise, the call just skips the bulk of its body and returns success.
Background Work
I've read into the issue at length, and I've found a similar question on SO, but there are numerous problems with it, from my perspective (noted below).
Question
The linked question notes that struct task_struct contains a pointer element to struct cred, as defined in linux/sched.h and linux/cred.h. The latter of the two headers doesn't exist on my system(s), and the former doesn't show any declaration of a pointer to a struct cred element. Does this make sense?
Silly mistake. This is present in its entirety in the kernel headers (ie: /usr/src/linux-headers-$(uname -r)/include/linux/cred.h), I was searching in gcc-build headers in /usr/include/linux.
Even if the above worked, it doesn't mention if I would be getting the the real, effective, or saved UID for the process. Is it even possible to get each of these three values from within the system call?
cred.h already contains all of these.
Is there a safe way in the kernel module to quickly determine which groups the user belongs to without parsing /etc/group?
cred.h already contains all of these.
Update
So, the only valid question remaining is the following:
Note, that iterating through processes and reading process's
credentials should be done under RCU-critical section.
... how do I ensure my check is run in this critical section? Are there any working examples of how to accomplish this? I've found some existing kernel documentation that instructs readers to wrap the relevant code with rcu_read_lock() and rcu_read_unlock(). Do I just need to wrap an read operations against the struct cred and/or struct task_struct data structures?
First, adding a new system call is rarely the right way to do things. It's best to do things via the existing mechanisms because you'll benefit from already-existing tools on both sides: existing utility functions in the kernel, existing libc and high-level language support in userland. Files are a central concept in Linux (like other Unix systems) and most data is exchanged via files, either device files or special filesystems such as proc and sysfs.
I would like to modify the system call so that it only gets invoked if a special user (such as "root" or user "pi") issues it.
You can't do this in the kernel. Not only is it wrong from a design point of view, but it isn't even possible. The kernel knows nothing about user names. The only knowledge about users in the kernel in that some privileged actions are reserved to user 0 in the root namespace (don't forget that last part! And if that's new to you it's a sign that you shouldn't be doing advanced things like adding system calls). (Many actions actually look for a capability rather than being root.)
What you want to use is sysfs. Read the kernel documentation and look for non-ancient online tutorials or existing kernel code (code that uses sysfs is typically pretty clean nowadays). With sysfs, you expose information through files under /sys. Access control is up to userland — have a sane default in the kernel and do things like calling chgrp, chmod or setfacl in the boot scripts. That's one of the many wheels that you don't need to reinvent on the user side when using the existing mechanisms.
The sysfs show method automatically takes a lock around the file, so only one kernel thread can be executing it at a time. That's one of the many wheels that you don't need to reinvent on the kernel side when using the existing mechanisms.
The linked question concerns a fundamentally different issue. To quote:
Please note that the uid that I want to get is NOT of the current process.
Clearly, a thread which is not the currently executing thread can in principle exit at any point or change credentials. Measures need to be taken to ensure the stability of whatever we are fiddling with. RCU is often the right answer. The answer provided there is somewhat wrong in the sense that there are other ways as well.
Meanwhile, if you want to operate on the thread executing the very code, you can know it wont exit (because it is executing your code as opposed to an exit path). A question arises what about the stability of credentials -- good news, they are also guaranteed to be there and can be accessed with no preparation whatsoever. This can be easily verified by checking the code doing credential switching.
We are left with the question what primitives can be used to do the access. To that end one can use make_kuid, uid_eq and similar primitives.
The real question is why is this a syscall as opposed to just a /proc file.
See this blogpost for somewhat elaborated description of credential handling: http://codingtragedy.blogspot.com/2015/04/weird-stuff-thread-credentials-in-linux.html

Removing bytes from File in (C) without creating new File

I have a file let's log. I need to remove some bytes let's n bytes from starting of file only. Issue is, this file referenced by some other file pointers in other programs and may these pointer write to this file log any time. I can't re-create new file otherwise file-pointer would malfunction(i am not sure about it too).
I tried to google it but all suggestion for only to re-write to new files.
Is there any solution for it?
I can suggest two options:
Ring bufferUse a memory mapped file as your logging medium, and use it as a ring buffer. You will need to manually manage where the last written byte is, and wrap around your ring appropriately as you step over the end of the ring. This way, your logging file stays a constant size, but you can't tail it like a regular file. Instead, you will need to write a special program that knows how to walk the ring buffer when you want to display the log.
Multiple number of small log filesUse some number of smaller log files that you log to, and remove the oldest file as the collection of files grow beyond the size of logs you want to maintain. If the most recent log file is always named the same, you can use the standard tail -F utility to follow the log contents perpetually. To avoid issues of multiple programs manipulating the same file, your logging code can send logs as messages to a single logging daemon.
So... you want to change the file, but you cannot. The reason you cannot is that other programs are using the file. In general terms, you appear to need to:
stop all the other programs messing with the file while you change it -- to chop now unwanted stuff off the front;
inform the other programs that you have changed it -- so they can re-establish their file-pointers.
I guess there must be a mechanism to allow the other programs to change the file without tripping over each other... so perhaps you can extend that ? [If all the other programs are children of the main program, then if the children all O_APPEND, you have a fighting chance of doing this, perhaps with the help of a file-lock or a semaphore (which may already exist ?). But if the programs are this intimately related, then #jxh has other, probably better, suggestions.]
But, if you cannot change the other programs in any way, you appear to be stuck, except...
...perhaps you could try 'sparse' files ? On (recent-ish) Linux (at least) you can fallocate() with FALLOC_FL_PUNCH_HOLE, to remove the stuff you don't want without affecting the other programs file-pointers. Of course, sooner or later the other programs may overflow the file-pointer, but that may be a more theoretical than practical issue.

How do I open a directory at kernel level using the file descriptor for that directory?

I'm working on a project where I must open a directory and read the files/directories inside at kernel level. I'm basically trying to find out how ls is implemented at kernel level.
Right now I've figured out how to get a file descriptor for a directory using sys_open() and the O_DIRECTORY flag, but I don't know how to read the fd that I receive. If anyone has any tips or other suggestions I'd appreciate it. (Keep in mind this has to be done at kernel level).
Edit:For a long story short, For a school project I am implementing file/directory attributes. Where I'm storring the attributes is a hidden folder at the same level of the file with a given attribute. (So a file in Desktop/MyFolder has an attributes folder called Desktop/MyFolder/.filename_attr). Trust me I don't care to mess around in kernel for funsies. But the reason I need to read a dir at kernel level is because it's apart of project specs.
To add to caf's answer mentioning vfs_readdir(), reading and writing to files from within the kernel is is considered unsafe (except for /proc, which acts as an interface to internal data structures in the kernel.)
The reasons are well described in this linuxjournal article, although they also provide a hack to access files. I don't think their method could be easily modified to work for directories. A more correct approach is accessing the kernel's filesystem inode entries, which is what vfs_readdir does.
Inodes are filesystem objects such as regular files, directories, FIFOs and other
beasts. They live either on the disc (for block device filesystems)
or in the memory (for pseudo filesystems).
Notice that vfs_readdir() expects a file * parameter. To obtain a file structure pointer from a user space file descriptor, you should utilize the kernel's file descriptor table.
The kernel.org files documentation says the following on doing so safely:
To look up the file structure given an fd, a reader
must use either fcheck() or fcheck_files() APIs. These
take care of barrier requirements due to lock-free lookup.
An example :
rcu_read_lock();
file = fcheck_files(files, fd);
if (file) {
// Handling of the file structures is special.
// Since the look-up of the fd (fget() / fget_light())
// are lock-free, it is possible that look-up may race with
// the last put() operation on the file structure.
// This is avoided using atomic_long_inc_not_zero() on ->f_count
if (atomic_long_inc_not_zero(&file->f_count))
*fput_needed = 1;
else
/* Didn't get the reference, someone's freed */
file = NULL;
}
rcu_read_unlock();
....
return file;
atomic_long_inc_not_zero() detects if refcounts is already zero or
goes to zero during increment. If it does, we fail fget() / fget_light().
Finally, take a look at filldir_t, the second parameter type.
You probably want vfs_readdir() from fs/readdir.c.
In general though kernel code does not read directories, user code does.

Using C, how can I know when a file is created?

I'm making a program in C for linux that scans a directory every x seconds during a time period for modifications, but I'm having trouble finding out when a file or directory is created. Here are a few options I considered:
Using the stat struct, check if the last status change and data modification timestamps are the same. This brings up the problem that you can create a file, modify it before the program has a chance to check it, which changes the data modification timestamp and no longer sees it as a new file.
Keep a log of the name of every file/directory in the directory and check for new ones. This has the problem where you delete a file and then create a new one with the same name, it doesn't get interpreted as a new file.
Keep a count of the number of file/directories. Similliar problem to the last idea.
With that said, does anyone have any idea on how I can uniquely identify the creation of a file/directory?
You cannot, at least not this way. POSIX has no provisions for storing the file creation time in the file system, like Windows and some other OSes do. It only keeps the status change, access and modification times. Most Unix filesystems do not store that information either.
One of the reasons for this is the existence of hard links, since file timestamps are stored in their inodes and not in the directory references. What would you consider the creation time to be for a file that was created at 10am and then hard-linked into another directory at 11am? What if a file is copied?
Your best, but unfortunately OS-specific, approach would be to use whatever framework is available in your platform to monitor filesystem events, e.g. inotify on Linux and kqueue on FreeBSD and MacOS X...
EDIT:
By the way, Ext4fs on Linux does store inode creation times (crtime). Unfortunately getting to that information from userspace is still at least a bit awkward.
Perhaps you should use inotify?
Check out inotify (Linux-specific).

Resources