Information about a file on a Linux system? - C

I want to know whether a given file is in use by a process, i.e. whether the file is open (say, in read-only mode) by that process.
I thought about searching through the /proc/[pid]/fd directories, but that way I waste a lot of time, and I don't think it is an elegant approach.
Is there any Linux API to determine whether file X is open by any process? Or maybe some data structure like /proc, but for files?

Not that I know of. The lsof and fuser tools do precisely what you suggest: they wander through /proc/*/fd.
Note that it is possible for an open file to have no name (if the file was deleted after being opened), and it is possible for a file to be open without the process holding a file descriptor to it (through mmap). Even the combination of both is possible: that would be a process-private swap file that is automatically cleaned up on process exit.
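To make that last point concrete, here is a minimal sketch (the temp-file template is just an example) of such a process-private, automatically cleaned-up file: create it, unlink it so it has no name, map it, then close the descriptor so no fd refers to it either.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    char path[] = "/tmp/private-swap-XXXXXX";      /* example template */
    int fd = mkstemp(path);
    if (fd < 0) { perror("mkstemp"); return 1; }

    unlink(path);                                  /* the open file now has no name */
    if (ftruncate(fd, 4096) != 0) { perror("ftruncate"); return 1; }

    char *mem = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (mem == MAP_FAILED) { perror("mmap"); return 1; }
    close(fd);                                     /* no fd either: only the mapping keeps it alive */

    strcpy(mem, "backed by a deleted file");
    printf("%s\n", mem);
    munmap(mem, 4096);                             /* last reference gone: the space is reclaimed */
    return 0;
}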

Determining if a process is using a file is easy. The inverse is less so. The reason is that the kernel does not keep track of the inverse directly. The information that IS kept is:
A file knows how many links refer to it (inode table)
A process knows what files it has open (file descriptor table)
This is why lsof's /proc walking is necessary. The file descriptors in use by a particular process are kept under /proc/$PID (among other things), and so lsof can use this (and other things) to spit out all of the pid <-> fd <-> inode relationships.
This is a nice article on lsof. As with any Linux util, you can always check out its source code for all of the details :)
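To make the "forward" information concrete, here is a minimal sketch (the default path is arbitrary) that prints what the inode table does record about a file; note that it says nothing about which processes currently have the file open:

#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

int main(int argc, char **argv)
{
    const char *path = argc > 1 ? argv[1] : "/etc/hostname";   /* example path */
    struct stat st;

    if (stat(path, &st) != 0) { perror(path); return 1; }
    printf("%s: inode %llu, hard links %llu\n", path,
           (unsigned long long)st.st_ino,
           (unsigned long long)st.st_nlink);
    /* Which processes have the file open is not stored anywhere per-file;
     * for that you have to walk /proc/<pid>/fd, which is what lsof does. */
    return 0;
}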

lsof might be the tool you're searching for.
EDIT: I didn't realize you are specifically searching for something to integrate into your application, so my answer may seem a little simplistic. But anyway, I think that this question is closely related to yours.

Related

How to find processes that hold a file in C

I want to find which processes hold a given file open, in C code (on Linux).
One way that comes to my mind is looking at /proc/<PID>/fd for all running processes.
However, that would take a long time, because it means sweeping all the fd entries of all processes.
Could you suggest another method that is more lightweight?
Thanks in advance.
Enumerating all the numerical pseudo-files under /proc, and then examining the fd/ directory for each one, is the standard way of doing this. It is the way that utilities like "lsof" are typically implemented. All this data is held in memory, so accessing it should be fast enough for most purposes.
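For reference, a minimal sketch of that standard approach (the matching here is by exact path, which misses deleted files; comparing st_dev/st_ino from stat() would be more robust):

#include <dirent.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Print the pid of every process that has `target` open. */
static void who_has_open(const char *target)
{
    DIR *proc = opendir("/proc");
    if (!proc) { perror("/proc"); return; }

    struct dirent *de;
    while ((de = readdir(proc)) != NULL) {
        char *end;
        long pid = strtol(de->d_name, &end, 10);
        if (*end != '\0')
            continue;                               /* not a numeric entry, so not a pid */

        char fddir[64];
        snprintf(fddir, sizeof fddir, "/proc/%ld/fd", pid);
        DIR *fds = opendir(fddir);                  /* may fail: permissions, process exited */
        if (!fds)
            continue;

        struct dirent *fe;
        while ((fe = readdir(fds)) != NULL) {
            if (fe->d_name[0] == '.')
                continue;
            char link[PATH_MAX], path[PATH_MAX];
            snprintf(link, sizeof link, "%s/%s", fddir, fe->d_name);
            ssize_t n = readlink(link, path, sizeof path - 1);
            if (n < 0)
                continue;
            path[n] = '\0';
            if (strcmp(path, target) == 0)
                printf("pid %ld has %s open (fd %s)\n", pid, target, fe->d_name);
        }
        closedir(fds);
    }
    closedir(proc);
}

Call it as who_has_open("/path/to/file"); it scales with the number of open descriptors on the system, which is exactly why the question calls this approach heavyweight.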

How do I make files safe for concurrent C access?

I have several C programs which access (read: fprintf/write via fopen) various files on the file system at the same time. What is the best way to make this concurrent access safe? Should I write some sort of file locking (and what's the best way to do that?), or are there better methods (preferably in the C99 standard library; additional dependencies would be a problem)? Or should I use something like SQLite?
edit:
I am using Linux as operating system.
edit:
I don't really want to have different processes writing to the same files. I'm dealing with legacy monolithic code which saves intermediate steps to files for later reuse. I want to speed the calculations up by running several calculations at the same time, which share the same intermediate results.
You could use fcntl() with F_SETLK or F_SETLKW:
struct flock lock;
...
fcntl( fd, F_SETLKW, &lock );
See the fcntl(2) man page or this article for more details.
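A fuller sketch of the same call (the file name is just an example): F_SETLKW blocks until the lock is granted, and the lock is released with F_UNLCK or automatically when the descriptor is closed.

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/tmp/shared-results.dat", O_RDWR | O_CREAT, 0644);  /* example file */
    if (fd < 0) { perror("open"); return 1; }

    struct flock lock = { 0 };
    lock.l_type   = F_WRLCK;      /* exclusive write lock; use F_RDLCK for shared readers */
    lock.l_whence = SEEK_SET;
    lock.l_start  = 0;
    lock.l_len    = 0;            /* 0 = lock the whole file */

    if (fcntl(fd, F_SETLKW, &lock) == -1) {          /* W = wait for the lock */
        perror("fcntl(F_SETLKW)");
        return 1;
    }

    /* ... read or write the file while holding the lock ... */
    if (write(fd, "step done\n", 10) < 0)
        perror("write");

    lock.l_type = F_UNLCK;                           /* release the lock */
    fcntl(fd, F_SETLK, &lock);
    close(fd);
    return 0;
}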
You can make sure that your files do not get corrupted by concurrent writes from multiple threads/processes by using the copy-on-write technique (a sketch follows this list):
A writer opens the file it would like to update for reading.
The writer creates a new file with a unique name (mkostemps) and copies the original file into it.
The writer modifies the copy.
The writer renames the copy to the original name using rename. This happens atomically, so that users of the file see either the old version or the new one, but never a partially updated file.
See Things UNIX can do atomically for more details.
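A hedged sketch of that sequence (the helper name and buffer sizes are invented; mkstemp is used instead of mkostemps for brevity). The temp file is created next to the original so that rename() stays on the same filesystem and is atomic:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Copy `path` to a unique temp file in the same directory, modify the copy,
 * then atomically rename it over the original. Returns 0 on success. */
static int update_file_atomically(const char *path)
{
    char tmp[4096];
    snprintf(tmp, sizeof tmp, "%s.XXXXXX", path);

    int out = mkstemp(tmp);                       /* unique name, created exclusively */
    if (out < 0) { perror("mkstemp"); return -1; }

    FILE *in = fopen(path, "rb");                 /* step 1: open the original for reading */
    if (!in) { perror(path); close(out); unlink(tmp); return -1; }

    char buf[8192];
    size_t n;
    while ((n = fread(buf, 1, sizeof buf, in)) > 0)   /* step 2: copy it */
        if (write(out, buf, n) != (ssize_t)n) { perror("write"); break; }
    fclose(in);

    /* step 3: ... modify the copy through `out` here ... */

    if (close(out) != 0) { perror("close"); unlink(tmp); return -1; }
    if (rename(tmp, path) != 0) {                 /* step 4: atomic replace */
        perror("rename");
        unlink(tmp);
        return -1;
    }
    return 0;
}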

How do I open a directory at kernel level using the file descriptor for that directory?

I'm working on a project where I must open a directory and read the files/directories inside at kernel level. I'm basically trying to find out how ls is implemented at kernel level.
Right now I've figured out how to get a file descriptor for a directory using sys_open() and the O_DIRECTORY flag, but I don't know how to read the fd that I receive. If anyone has any tips or other suggestions I'd appreciate it. (Keep in mind this has to be done at kernel level).
Edit: To make a long story short: for a school project I am implementing file/directory attributes. I'm storing the attributes in a hidden folder at the same level as the file with a given attribute (so a file in Desktop/MyFolder has an attributes folder called Desktop/MyFolder/.filename_attr). Trust me, I don't care to mess around in the kernel for fun, but I need to read a directory at kernel level because it's part of the project specs.
To add to caf's answer mentioning vfs_readdir(): reading and writing files from within the kernel is considered unsafe (except for /proc, which acts as an interface to internal data structures in the kernel).
The reasons are well described in this linuxjournal article, although they also provide a hack to access files. I don't think their method could be easily modified to work for directories. A more correct approach is accessing the kernel's filesystem inode entries, which is what vfs_readdir does.
Inodes are filesystem objects such as regular files, directories, FIFOs and other beasts. They live either on the disc (for block device filesystems) or in the memory (for pseudo filesystems).
Notice that vfs_readdir() expects a file * parameter. To obtain a file structure pointer from a user space file descriptor, you should utilize the kernel's file descriptor table.
The kernel.org files documentation says the following on doing so safely:
To look up the file structure given an fd, a reader
must use either fcheck() or fcheck_files() APIs. These
take care of barrier requirements due to lock-free lookup.
An example:
rcu_read_lock();
file = fcheck_files(files, fd);
if (file) {
    // Handling of the file structures is special.
    // Since the look-up of the fd (fget() / fget_light())
    // are lock-free, it is possible that look-up may race with
    // the last put() operation on the file structure.
    // This is avoided using atomic_long_inc_not_zero() on ->f_count
    if (atomic_long_inc_not_zero(&file->f_count))
        *fput_needed = 1;
    else
        /* Didn't get the reference, someone's freed */
        file = NULL;
}
rcu_read_unlock();
....
return file;
atomic_long_inc_not_zero() detects if refcounts is already zero or
goes to zero during increment. If it does, we fail fget() / fget_light().
Finally, take a look at filldir_t, the second parameter type.
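To tie those pieces together, here is a heavily hedged sketch of how they might be combined on an older kernel (roughly the 2.6/3.x era) where vfs_readdir() and this filldir_t signature still exist; newer kernels replaced them with iterate_dir() and struct dir_context, so treat this purely as an illustration:

#include <linux/fdtable.h>
#include <linux/file.h>
#include <linux/fs.h>
#include <linux/kernel.h>
#include <linux/rcupdate.h>
#include <linux/sched.h>

/* Called once per directory entry by vfs_readdir(). */
static int my_filldir(void *buf, const char *name, int namelen,
                      loff_t offset, u64 ino, unsigned int d_type)
{
    printk(KERN_INFO "entry: %.*s (inode %llu)\n",
           namelen, name, (unsigned long long)ino);
    return 0;                                     /* non-zero stops the iteration */
}

/* `fd` is a descriptor the current process obtained with O_DIRECTORY. */
static void list_dir_from_fd(unsigned int fd)
{
    struct file *file;

    rcu_read_lock();
    file = fcheck_files(current->files, fd);      /* lock-free fd -> struct file lookup */
    if (file && !atomic_long_inc_not_zero(&file->f_count))
        file = NULL;                              /* lost the race with the last fput() */
    rcu_read_unlock();

    if (!file)
        return;

    vfs_readdir(file, my_filldir, NULL);          /* walk the directory entries */
    fput(file);                                   /* drop the reference taken above */
}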
You probably want vfs_readdir() from fs/readdir.c.
In general, though, kernel code does not read directories; user code does.

How to know if a file is being copied?

I am currently trying to check whether the copy of a file from one directory to another has finished.
I would like to know if the target file is still being copied.
So I would like to get the number of file descriptors open on this file.
I am using the C language and can't really find a way to solve this problem.
If you have control of it, I would recommend using the copy-move idiom on the program doing the copying:
cp file1 otherdir/.file1.tmp
mv otherdir/.file1.tmp otherdir/file1
The mv just changes some filesystem entries and is atomic and very fast compared to the copy.
If you're able to open the file for writing, there's a good chance that the OS has finished the copy and has released its lock on it. Different operating systems may behave differently for this, however.
Another approach is to open both the source and destination files for reading and compare their sizes. If they're of identical size, the copy has very likely finished. You can use fseek() and ftell() to determine the size of a file in C:
fseek(fp, 0L, SEEK_END);   /* move to the end of the file */
long sz = ftell(fp);       /* the offset there is the size in bytes */
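A small sketch of that size comparison (the helper names are made up); it assumes both files can be opened for reading:

#include <stdio.h>

/* Return the size of a file in bytes, or -1 if it cannot be opened. */
static long file_size(const char *path)
{
    FILE *fp = fopen(path, "rb");
    if (!fp)
        return -1;
    fseek(fp, 0L, SEEK_END);
    long sz = ftell(fp);
    fclose(fp);
    return sz;
}

/* The copy has very likely finished once the sizes match. */
static int copy_probably_done(const char *src, const char *dst)
{
    long s = file_size(src), d = file_size(dst);
    return s >= 0 && s == d;
}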
On Linux, try the lsof command, which lists all of the open files on your system.
edit 1: The only C-level call that comes to mind is the fstat function. You might be able to use its st_mtime (last modification time) field: once that value stops changing (for, say, a period of 10 seconds), you could assume the file copy operation has stopped.
edit 2: Also, on Linux, you could traverse /proc/[pid]/fd to see which files are open. The entries in there are symlinks, and readlink() can tell you the path each one points to, so you can see whether your file is still open. Using getpid(), you would know your own process ID (if you are doing the file copy from within your program) to know where to look in /proc.
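A rough sketch of the polling idea from edit 1 (using stat() on the path rather than fstat() on a descriptor; the quiet period is the same arbitrary threshold mentioned above):

#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

/* Block until `path` has not been modified for `quiet_secs` seconds. */
static void wait_until_stable(const char *path, time_t quiet_secs)
{
    struct stat st;

    for (;;) {
        if (stat(path, &st) != 0) {               /* the file may not exist yet */
            sleep(1);
            continue;
        }
        if (time(NULL) - st.st_mtime >= quiet_secs)
            return;                               /* no writes for a while: assume done */
        sleep(1);
    }
}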
I think your basic mistake is trying to synchronize a C program with a shell tool/external program that's not intended for synchronization. If you have some degree of control over the program/script doing the copying, you should modify it to perform advisory locking of some sort (preferably fcntl-based) on the target file. Then your other program can simply block on acquiring the lock.
If you don't have any control over the program performing the copy, the only solutions depend on non-portable hacks like lsof or Linux inotify API.
(This answer makes the big, big assumption that this will be running on Linux.)
The C source code of lsof, a tool that tells which programs currently have an open file descriptor to a specific file, is freely available. However, just to warn you, I couldn't make any sense out of it. There are references to reading kernel memory, so to me it's either voodoo or black magic.
That said, nothing prevents you from running lsof from your own program. Running third-party programs from your own program is normally something you try to avoid for several reasons, like security (if a rogue user replaces lsof with a malicious program, it will run with your program's privileges, with potentially catastrophic consequences). But after inspecting the lsof source code, I came to the conclusion that there's no public API to determine which program has which file open. If you're not afraid of people changing programs in /usr/sbin, you might consider this:
#define _GNU_SOURCE   /* asprintf() is a GNU extension */
#include <stdio.h>
#include <stdlib.h>

int isOpen(const char* file)
{
    char* command;
    // BE AWARE THAT THIS WILL NOT WORK IF THE FILE NAME CONTAINS A DOUBLE QUOTE
    // OR IF IT CAN SOMEHOW BE ALTERED THROUGH SHELL EXPANSION.
    // You should either try to fix it yourself, or use a function of the `exec`
    // family that won't trigger shell expansion.
    // It would be an EXTREMELY BAD idea to call `lsof` without an absolute path
    // since it could result in another program being run. If this is not where
    // `lsof` resides on your system, change it to the appropriate absolute path.
    asprintf(&command, "/usr/sbin/lsof \"%s\"", file);
    int result = system(command);   /* lsof exits with 0 when it finds at least one open instance */
    free(command);
    return result;
}
If you also need to know which program has your file open (presumably cp?), you can use popen to read the output of lsof in a similar fashion. popen descriptors behave like fopen descriptors, so all you need to do is fread them and see if you can find your program's name. On my machine, lsof output looks like this:
$ lsof document.pdf
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
SomeApp 873 felix txt REG 14,3 303260 5165763 document.pdf
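A hedged sketch of that popen() approach (again assuming lsof lives in /usr/sbin, and taking "cp" as the hypothetical copying program):

#include <stdio.h>
#include <string.h>

/* Return 1 if `program` appears in lsof's output for `file`, 0 otherwise. */
static int file_open_by(const char *file, const char *program)
{
    char command[4096], line[1024];
    int found = 0;

    /* Same caveat as above: the file name is not escaped for the shell. */
    snprintf(command, sizeof command, "/usr/sbin/lsof \"%s\"", file);

    FILE *out = popen(command, "r");
    if (!out)
        return 0;

    while (fgets(line, sizeof line, out) != NULL)
        if (strncmp(line, program, strlen(program)) == 0)
            found = 1;                            /* COMMAND is the first column */

    pclose(out);
    return found;
}

For example, file_open_by("otherdir/file1", "cp") returns non-zero while cp still has the target open.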
As poundifdef mentioned, the fstat() function can give you the current modification time. But fstat also gives you the size of the file.
Back in the dim dark ages of C, when I was monitoring files being copied by various programs I had no control over, I always:
Waited until the target file size was >= the source size, and
Waited until the target's modification time was at least N seconds older than the current time, N being a number such as 5, set larger if experience showed that was necessary. Yes, 5 seconds seems extreme, but it is safe.
If you don't know what the final size of the target file should be, then the only real choice you have is #2, but use a larger N to allow for worst-case network and local CPU delays, with a healthy safety factor.
Using the Boost libraries may solve the issue:
boost::filesystem::fstream fileStream(filePath, std::ios_base::in | std::ios_base::binary);
if (fileStream.is_open()) {
    // not being copied
} else {
    // wait, the file is still being copied
}

Open system call

I'm studying for my operating systems midterm and was wondering if I can get some help.
Can someone explain the checks and what the kernel does during the open() system call?
Thanks!
Very roughly, you can think of the following steps:
Translate the file name into an inode, which is the actual file system object describing the contents of the file, by traversing the filesystem data structures.
During this traversal, the kernel will check that you have sufficient access through the directory path to the file, and check access on the file itself. The precise checks depend on what modes were passed to open.
Create what's sometimes called an open file description within the kernel. There is one of these objects for each file the kernel has opened on behalf of any process.
Allocate an unused index in the per-process file descriptor table, and point it at the open file description.
Return this index from the system call as the file descriptor.
This description should be essentially correct for opening plain files and/or directories, but things are different for various sorts of special files, in particular for devices.
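A small user-space sketch of the visible contract described above (the path is just an example): the returned descriptor is the lowest unused index in the process's table, and failed checks are reported through errno:

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    /* 0, 1 and 2 are taken by stdin/stdout/stderr, so the first open()
     * normally returns 3: the lowest unused slot in the fd table. */
    int fd = open("/etc/hostname", O_RDONLY);     /* example path */
    if (fd < 0) {
        /* Path traversal and permission checks surface here, e.g. ENOENT or EACCES. */
        fprintf(stderr, "open failed: %s\n", strerror(errno));
        return 1;
    }
    printf("got file descriptor %d\n", fd);

    close(fd);                                    /* frees that index for reuse */
    int fd2 = open("/etc/hostname", O_RDONLY);
    printf("after close, open returns %d again\n", fd2);
    close(fd2);
    return 0;
}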
I would go back to what the prof told you: there are a lot of things that happen during open(), depending on what you're opening (i.e. a device, a file, a directory), and unless you write what the professor is looking for, you'll lose points.
That being said, it mostly involves checks to see whether the open is valid (i.e. does the file exist, does the user have permission to read/write it, etc.), and then an entry in the kernel's handle table is allocated to keep track of the fd and its current file position (and, of course, some other things).

Resources