I've just recently learned how to check if a file exists, in C, without opening it with open(). What I wanted to ask is if there is an option or a flag or other instruction that checks if a file exists, but blocks the process until the file exists. Something like this:
while(access("socket", F_OK) !=0);
But without all this processing cycle... Something like select but to check if a file exists.
There's no way to do this portably, and on some filesystems (e.g. certain network filesystems) there simply isn't any way to do it at all without periodically checking for the file's existence.
That said, there are nonportable approaches which can cover the majority of platforms in wide use:
OS X: FSEvents
Linux: inotify
*BSD: kqueue
Windows: ReadDirectoryChangesW
No, there is no standard function that does what you're asking for.
The way you're checking is poor form in that it will consume as much CPU time as it possibly can while checking. At a minimum, you want to add a delay in your loop, such as:
while(access("socket", F_OK) !=0) sleep(1);
A better solution is to monitor a directory for changes. There isn't any standard function to do this but there are various operating system specific methods. See Monitor directory for new files only for some possibilities.
Related
I want to implement a C program in Linux (Ubuntu distro) that mimics tail -f. Note that I do not want to actually call tail -f from my C code, rather implement its behaviour. At the moment I can think of two ways to implement it.
When the program is called, I seek to the end of file. Afterwards, I would read to the end of file periodically and print whatever I read if it is not empty.
The second method, which can potentially be more efficient, is to again seek to the end of file. But this time I "somehow" listen for changes to that file and read to the end of file only if it has changed.
With that being said, my question is how to implement the second approach and if someone can share if it is worth the effort. Also, are these the only two options?
NOTE: Thanks for the comments, the question is changed based on them.
There is no standardized mechanism for monitoring changes to a file, so you'll need to implement a "polling" solution anyway (that is, when you hit the end of file, wait a short amount of time and try again.)
On Linux, you can use the inotify family of system calls, but be aware that it won't always work. It doesn't work for special files or remote filesystems, for example, and it may not work for some local filesystems. It is complicated in the case of symlinks. And so on. There is a Windows equivalent, but I believe it suffers from some of the same issues.
So even if you use a notification system, you'll need the polling solution as a backup, and since OS notifications are not guaranteed to be reliable (that is, if the system is under load, notifications might be dropped), you'll need to poll on timeout even if you are using a notification system.
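A rough sketch of that combination on Linux: watch the file with inotify, but wake up on a timeout anyway and read whatever has been appended. The function names follow and drain, the one-second timeout and the rounds parameter are assumptions made for the example. It does not handle truncation, rotation or symlinks, which GNU tail does.

```c
#include <fcntl.h>
#include <poll.h>
#include <stdio.h>
#include <sys/inotify.h>
#include <unistd.h>

/* Copy everything currently readable from fd to out.
 * Returns the number of bytes copied, or -1 on a read error. */
static long drain(int fd, FILE *out)
{
    char buf[4096];
    long total = 0;
    ssize_t n;
    while ((n = read(fd, buf, sizeof buf)) > 0) {
        fwrite(buf, 1, (size_t)n, out);
        total += n;
    }
    fflush(out);
    return n < 0 ? -1 : total;
}

/* Hypothetical helper: follow `path` like `tail -f` for `rounds` wake-ups
 * (a negative value means forever). Returns 0, or -1 on error. */
int follow(const char *path, FILE *out, int rounds)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    if (lseek(fd, 0, SEEK_END) < 0) {
        close(fd);
        return -1;
    }

    /* Notifications are only an optimisation. If inotify is unavailable,
     * ifd stays -1, poll() ignores it and acts as a plain 1 s sleep. */
    int ifd = inotify_init1(IN_NONBLOCK);
    if (ifd >= 0 && inotify_add_watch(ifd, path, IN_MODIFY) < 0) {
        close(ifd);
        ifd = -1;
    }

    int rc = 0;
    for (int i = 0; rounds < 0 || i < rounds; i++) {
        struct pollfd p = { .fd = ifd, .events = POLLIN };
        poll(&p, 1, 1000);                /* wake on event or after 1 s */
        if (ifd >= 0 && (p.revents & POLLIN)) {
            char ev[4096];
            while (read(ifd, ev, sizeof ev) > 0)
                ;                          /* discard queued events */
        }
        if (drain(fd, out) < 0) {          /* read regardless of why we woke */
            rc = -1;
            break;
        }
    }

    if (ifd >= 0)
        close(ifd);
    close(fd);
    return rc;
}
```

Calling follow("some.log", stdout, -1) would then behave roughly like the first approach from the question, just with lower latency when notifications do arrive.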
You might want to take a look at the implementation of the GNU tail utility (http://git.savannah.gnu.org/cgit/coreutils.git/tree/src/tail.c) to see how the special cases are handled.
You can implement the requirement by following steps:
1) fopen with 'a+' mode;
2) call select on the opened file descriptor (convert the FILE * to a file descriptor with fileno()) and do the read.
The flag O_DIRECTORY can be used with the syscalls open(2) and openat(2) to avoid denial-of-service vulnerabilities when opening directories. However: How can I avoid the same kind of race conditions for regular files?
Some background information: I am trying to develop some kind of backup tool. The programs walks over a directory tree, reads all regular files and only stats other files. If I first call fstatat(2) for each directory entry, test the result for regular files and open them with openat(2), then there is a race condition between the syscalls. An attacker could replace the regular file with a FIFO, and my program would hang on the FIFO.
How can I avoid this race condition? For directories, there is O_DIRECTORY, for symbolic links, O_PATH can be used. However, I have found no solution for regular files. I only need a solution that works on recent Linux versions.
If your only concern is fifos, O_NONBLOCK will prevent blocking and allow you to open a fifo even if it has no writers (see http://pubs.opengroup.org/onlinepubs/9699919799/functions/open.html for where this is specified). However, there are also a few other concerns:
Device nodes
Fake files in Linux /proc with bad properties
...
Since these normally can't be created in arbitrary locations by non-root users, O_NOFOLLOW should be sufficient to avoid following symlinks to them.
With that said, on modern Linux there is an even safer solution: perform the initial open with O_PATH|O_NOFOLLOW, then perform stat on /proc/self/fd/%d to check the file type. You can then open /proc/self/fd/%d and be completely certain it corresponds to the same file you just stat'd.
Note that on sufficiently new Linux, you don't need to use /proc/self/fd/%d to reach the file to which you obtained an inode handle with O_PATH. You can use fstat and openat on it directly to "stat" it and get a descriptor to a real open file description, respectively. However, O_PATH file descriptors had a lot of broken/unimplemented corner cases like this in the range of late 2.6.x (when they were first added) to 3.8 or so, and I find the /proc method the most reliable. Of course you could always try the direct method and fall back to /proc if it fails.
Open with O_RDONLY|O_NONBLOCK, check that the result isn't -1, then do an fstat() on the resulting file descriptor and compare st_mode (and possibly st_dev and st_ino) with what you expected.
Remember to set the AT_SYMLINK_NOFOLLOW flag on your fstatat.
The Linux function access() allows me to check file permissions for the current user.
Is there a similar function that gives me the same information - but instead of checking the current user it checks the permissions of any given system user?
Something like int access_for(const char *pathname, uid_t uid, int mode); or whatever
I can't use seteuid() as I need this for a multithreaded process (POSIX threads) which would affect all threads at the same time. That's why I need to check file permissions myself.
Edit: The process itself is known/assumed to have at least the privileges of the relevant user. So, in theory I could also walk the file system and calculate the rights by hand, but I'd need something much more efficient as the check needs to be done several (up to hundreds) times per second. Possible?
Not sure how it could work. If you're running as user X, you couldn't reliably check if user Y has access to something, because the check would be done under YOUR permissions. You may not have access to something that Y does, meaning you'd get a false negative.
Beware of TOCTOU. If you check NOW that a file can be accessed, it doesn't mean that NOW it can (or can't), because the time it took you to read those words between "NOW" and "NOW", the file privileges may well have changed.
So, the CORRECT solution is to run in a thread/process as the user that you want to access the file as. Otherwise, you run a risk of "the file privileges changed while you were working" problem.
Of course, this applies to any type of access to "things that may be restricted based on who I am running as".
On Linux, fundamentally all set*id operations are thread-local. This is wrong and non-conforming (the standard specifies that a process, not a thread, has ids that are set by these functions) and the userspace code (in libc) has to work around the issue via delicate and error-prone logic to change all the thread uids in a synchronized way. However, you may be able to invoke the syscall directly (via syscall()) to change the ids for just one thread.
Also, Linux has the concept of "filesystem uid" set by the setfsuid function. If I'm not mistaken, libc leaves this one thread-local, since it's not specified by any standard (and thus does not have any requirements imposed on it) and since the whole purpose of this function is thread-local use like what you're doing. I think this second option is much better if it works.
Finally, there's one solution that's completely portable but slow: fork then use seteuid in the child, call access there, pass the result back to the parent, and _exit.
Assuming Linux, or more generally a sufficiently POSIX compliant system, is there a ready made method of checking if opening a file with a given name would succeed? Most optimistically I am seeking an implementation of a function with the same prototype as open(2)
int test_open(const char *pathname, int flags);
which would return result according to anticipated success or failure of open(2) system call with the same parameters, but without actually creating or opening any file. It should be suitably licensed (reusable in proprietary software project) open source.
The open(2) manual page lists many reasons for open(2) failing. One errno value can decode multiple reasons, and the errno is different between Linux and POSIX. But nevertheless roughly speaking:
I think in general the following cases as itemized by errno are most relevant: EACCES, EEXIST, ENOENT, EISDIR, ENOTDIR (both POSIX and Linux).
Less important: ELOOP, EMFILE, ENFILE, ENAMETOOLONG, ENODEV, ENXIO, EOVERFLOW, EPERM, EROFS, ETXTBSY, EWOULDBLOCK (POSIX adds EAGAIN).
Irrelevant (more transient conditions): ENOMEM, EINTR, ENOSPC (POSIX adds EIO, ENOSR).
(I am now unable to quickly find online POSIX manual page for open(), I am personally referring to POSIX manual pages installed in my Linux machine - I will edit the question when I find online link.)
Background and Expectations: My application/system configuration architecture mandates that an input value needs to be validated before it is stored permanently. Only after the validation and storage steps are performed is the file going to be used for writing. Accepting bad values would be a huge inconvenience (also, trying to actually switch to a bad file path would disturb the operation). I cannot or do not want to make an exception for this special case (it is just one of over a hundred configuration values).
I would prefer to not introduce side effects for the validation by creating a file (the flags for open() include O_CREAT). It is obvious that the check I am seeking for cannot be implemented 100% reliably in the most general case, which is the underlying reason for my categorizing the possible error conditions into three groups. We could have a very educated guess by analyzing the directory permissions, existence of directories, and whether there is already something with the same name which hinders opening the file, and whether the file name makes sense (my group 1 conditions). (Group 2 checks for number of symbolic links, file descriptor limits, name length limit, O_NOATIME permission, writability of the file system, and maybe EWOULDBLOCK and POSIX EAGAIN cases could be done but they are more cumbersome and probably less portable to do, and are expected to be less likely to happen unless evil input, which were the reasons for categorizing them less important).
P.S. I added tag c which is my programming language now, but the language is not very relevant.
There is no fail-proof way to do that, because (as Jite commented) some other process could have changed the environment (e.g. removed the parent directory, or filled up the filesystem, exceeded the disk quota, ....) between your test_open and the further open or creat syscall. Or the disk (or the media containing the filesystem, e.g. some USB stick) could have burned or have been unplugged.
The good practice is to check the result of open and use errno when it has failed.
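In code, that practice is just a matter of attempting the open and reporting errno. A small sketch, where the wrapper name open_or_report is made up for the example:

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical wrapper: try the real open() and, if it fails, report why.
 * Returns the descriptor, or -1 with errno preserved for the caller. */
int open_or_report(const char *path, int flags, mode_t mode)
{
    int fd = open(path, flags, mode);
    if (fd < 0) {
        int saved = errno;          /* fprintf may overwrite errno */
        fprintf(stderr, "cannot open %s: %s\n", path, strerror(saved));
        errno = saved;
    }
    return fd;
}
```

The caller can then branch on errno (ENOENT, EACCES, EISDIR and so on) at the moment it actually matters, rather than trusting an earlier prediction.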
You could use access to check a few things before. But since there is no fail-proof way, why bother?
You might validate the directory part of your file path using the realpath(3) function, but even that is useless: some other process could have created or deleted the directory between your test_open and the real open.
I am currently trying to check whether the copy of a file from one directory to another is done.
I would like to know if the target file is still being copied.
So I would like to get the number of file descriptors opened on this file.
I use the C language and can't really find a way to solve that problem.
If you have control of it, I would recommend using the copy-move idiom on the program doing the copying:
cp file1 otherdir/.file1.tmp
mv otherdir/.file1.tmp otherdir/file1
The mv just changes some filesystem entries and is atomic and very fast compared to the copy.
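If the copying program is itself written in C, the same idiom might look roughly like this. The function name copy_then_rename is invented for the example. The sketch assumes the temporary name is on the same filesystem as the destination, because rename() is only atomic within one filesystem.

```c
#include <stdio.h>

/* Hypothetical helper: copy `src` to `tmp`, then rename `tmp` to `dst`.
 * Readers looking for `dst` never see a half-written file.
 * Returns 0 on success, -1 on failure (the temporary file is removed). */
int copy_then_rename(const char *src, const char *tmp, const char *dst)
{
    FILE *in = fopen(src, "rb");
    if (in == NULL)
        return -1;
    FILE *out = fopen(tmp, "wb");
    if (out == NULL) {
        fclose(in);
        return -1;
    }

    char buf[65536];
    size_t n;
    int ok = 1;
    while ((n = fread(buf, 1, sizeof buf, in)) > 0) {
        if (fwrite(buf, 1, n, out) != n) {
            ok = 0;
            break;
        }
    }
    if (ferror(in))
        ok = 0;
    fclose(in);
    if (fclose(out) != 0)           /* flushes; can report a write error */
        ok = 0;

    /* Only publish the file under its final name once it is complete. */
    if (!ok || rename(tmp, dst) != 0) {
        remove(tmp);
        return -1;
    }
    return 0;
}
```

The watching program then only has to wait for otherdir/file1 to exist; once it does, the copy is finished.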
If you're able to open the file for writing, there's a good chance that the OS has finished the copy and has released its lock on it. Different operating systems may behave differently for this, however.
Another approach is to open both the source and destination files for reading and compare their sizes. If they're of identical size, the copy has very likely finished. You can use fseek() and ftell() to determine the size of a file in C:
fseek(fp, 0L, SEEK_END);
sz = ftell(fp);
On Linux, try the lsof command, which lists all of the open files on your system.
edit 1: The only C language feature that comes to mind is the fstat function. You might be able to use that with the struct's st_mtime (last modification time) field - once that value stops changing (for, say, a period of 10 seconds), then you could assume that file copy operation has stopped.
edit 2: also, on linux, you could traverse /proc/[pid]/fd to see which files are open. The files in there are symlinks, but C's readlink() function could tell you its path, so you could see whether it is still open. Using getpid(), you would know the process ID of your program (if you are doing a file copy from within your program) to know where to look in /proc.
I think your basic mistake is trying to synchronize a C program with a shell tool/external program that's not intended for synchronization. If you have some degree of control over the program/script doing the copying, you should modify it to perform advisory locking of some sort (preferably fcntl-based) on the target file. Then your other program can simply block on acquiring the lock.
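A minimal sketch of the fcntl-based locking, assuming both programs cooperate. The function names are invented for the example. The copier takes a write lock on the whole file before writing and drops it when done (or when it closes the file or exits); the waiting program blocks in F_SETLKW until then.

```c
#include <fcntl.h>
#include <unistd.h>

/* Lock or unlock the whole file behind fd. `type` is F_RDLCK, F_WRLCK or
 * F_UNLCK; if `wait` is nonzero the call blocks until the lock is granted. */
static int lock_whole_file(int fd, short type, int wait)
{
    struct flock fl = {
        .l_type = type,
        .l_whence = SEEK_SET,
        .l_start = 0,
        .l_len = 0,                 /* 0 means "to the end of the file" */
    };
    return fcntl(fd, wait ? F_SETLKW : F_SETLK, &fl);
}

/* Copier side (hypothetical): call right after opening the target for
 * writing and before writing any data. fd must be open for writing. */
int lock_for_writing(int fd)
{
    return lock_whole_file(fd, F_WRLCK, 1);
}

/* Waiting side (hypothetical): blocks until no writer holds the lock,
 * then releases its own read lock again. fd must be open for reading. */
int wait_until_unlocked(int fd)
{
    if (lock_whole_file(fd, F_RDLCK, 1) < 0)
        return -1;
    return lock_whole_file(fd, F_UNLCK, 0);
}
```

These locks are advisory: they only help if the copier actually takes the lock, and the waiter must not start waiting before the copier has locked the file.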
If you don't have any control over the program performing the copy, the only solutions depend on non-portable hacks like lsof or the Linux inotify API.
(This answer makes the big, big assumption that this will be running on Linux.)
The C source code of lsof, a tool that tells which programs currently have an open file descriptor to a specific file, is freely available. However, just to warn you, I couldn't make any sense out of it. There are references to reading kernel memory, so to me it's either voodoo or black magic.
That said, nothing prevents you from running lsof through your own program. Running third-party programs from your own program is normally something you try to avoid for several reasons, like security (if a rogue user replaces lsof with a malicious program, it will run with your program's privileges, with potentially catastrophic consequences), but after inspecting the lsof source code, I came to the conclusion that there's no public API to determine which program has which file open. If you're not afraid of people changing programs in /usr/sbin, you might consider this.
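#define _GNU_SOURCE /* for asprintf */
#include <stdio.h>
#include <stdlib.h>

// Returns the status from system(): 0 when lsof exits successfully (which,
// per lsof's documented exit status, means it found the file open), nonzero
// otherwise. Returns -1 if the command string could not be built.
int isOpen(const char* file)
{
    char* command;
    // BE AWARE THAT THIS WILL NOT WORK IF THE FILE NAME CONTAINS A DOUBLE QUOTE
    // OR IF IT CAN SOMEHOW BE ALTERED THROUGH SHELL EXPANSION
    // you should either try to fix it yourself, or use a function of the `exec`
    // family that won't trigger shell expansion.
    // It would be an EXTREMELY BAD idea to call `lsof` without an absolute path
    // since it could result in another program being run. If this is not where
    // `lsof` resides on your system, change it to the appropriate absolute path.
    if (asprintf(&command, "/usr/sbin/lsof \"%s\"", file) < 0)
        return -1; // allocation failed; `command` is undefined here
    int result = system(command);
    free(command);
    return result;
}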
If you also need to know which program has your file open (presumably cp?), you can use popen to read the output of lsof in a similar fashion. popen returns a FILE * stream just like fopen does, so all you need to do is read from it (for example with fgets or fread) and see if you can find your program's name. On my machine, lsof output looks like this:
$ lsof document.pdf
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
SomeApp 873 felix txt REG 14,3 303260 5165763 document.pdf
As poundifdef mentioned, the fstat() function can give you the current modification time. But fstat also gives you the size of the file.
Back in the dim dark ages of C when I was monitoring files being copied by various programs I had no control over I always:
Waited until the target file size was >= the source size, and
Waited until the target modification time was at least N seconds older than the current time, N being a number such as 5, set larger if experience showed that was necessary. Yes, 5 seconds seems extreme, but it is safe.
If you don't know what the target file is then the only real choice you have is #2, but use a larger N to allow for the worst-case network and local CPU delays, with a healthy safety factor.
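The two conditions above can be sketched with stat() like this. The function name copy_looks_done and the quiet_secs parameter (the N from the text) are made up for the example. It is a heuristic, not a guarantee.

```c
#include <sys/stat.h>
#include <time.h>

/* Hypothetical helper: returns 1 if `dst` looks like a finished copy of
 * `src`, 0 if it is probably still being written, -1 if stat() fails. */
int copy_looks_done(const char *src, const char *dst, int quiet_secs)
{
    struct stat s, d;
    if (stat(src, &s) != 0 || stat(dst, &d) != 0)
        return -1;

    /* Condition 1: the target has reached at least the source size. */
    if (d.st_size < s.st_size)
        return 0;

    /* Condition 2: nobody has modified the target for quiet_secs seconds. */
    return time(NULL) - d.st_mtime >= quiet_secs;
}
```

You would call this in a loop with a sleep between attempts, for example with quiet_secs set to 5 as suggested above.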
Using the Boost libraries will solve the issue:
#include <boost/filesystem/fstream.hpp>

boost::filesystem::fstream fileStream(filePath, std::ios_base::in | std::ios_base::binary);
if (fileStream.is_open()) {
    // not getting copied
} else {
    // wait, the file is getting copied
}