Using C, how can I know when a file is created?

Using C, how can I know when a file is created? - c

I'm making a program in C for linux that scans a directory every x seconds during a time period for modifications, but I'm having trouble finding out when a file or directory is created. Here are a few options I considered:
Using the stat struct, check if the last status change and data modification timestamps are the same. This brings up the problem that you can create a file, modify it before the program has a chance to check it, which changes the data modification timestamp and no longer sees it as a new file.
Keep a log of the name of every file/directory in the directory and check for new ones. This has the problem where you delete a file and then create a new one with the same name, it doesn't get interpreted as a new file.
Keep a count of the number of file/directories. Similliar problem to the last idea.
With that said, does anyone have any idea on how I can uniquely identify the creation of a file/directory?

You cannot, at least not this way. POSIX has no provisions for storing the file creation time in the file system, like Windows and some other OSes do. It only keeps the status change, access and modification times. Most Unix filesystems do not store that information either.
One of the reasons for this is the existence of hard links, since file timestamps are stored in their inodes and not in the directory references. What would you consider the creation time to be for a file that was created at 10am and then hard-linked into another directory at 11am? What if a file is copied?
Your best, but unfortunately OS-specific, approach would be to use whatever framework is available in your platform to monitor filesystem events, e.g. inotify on Linux and kqueue on FreeBSD and MacOS X...
EDIT:
By the way, Ext4fs on Linux does store inode creation times (crtime). Unfortunately getting to that information from userspace is still at least a bit awkward.

Perhaps you should use inotify?

Check out inotify (Linux-specific).

Related

Modified time after copying a file

In perl 5, Using stat on a file that was freshly copied from an old file, it seems that mtime is the time in which the old file was initially created, not copied.
How do I get the copy time? ctime seems the closest, however there are warnings in the documentation about compatibility across OSs.

Simple answer: you can't in most cases.
ctime is not the file creation time. Quoting from man 7 inode on a Linux system:
Last status change timestamp (ctime)
stat.st_ctime; statx.stx_ctime
This is the file's last status change timestamp. It is changed
by writing or by setting inode information (i.e., owner, group,
link count, mode, etc.).
Unless
you don't require portability
are running on kernel with support for it
the file is on a file system that has support for it.
Quoting from man 7 inode on a Linux system:
File creation (birth) timestamp (btime)
(not returned in the stat structure); statx.stx_btime
The file's creation timestamp. This is set on file creation and
not changed subsequently.
The btime timestamp was not historically present on UNIX systems
and is not currently supported by most Linux filesystems.

There are a few unspecified details, but it seems that ctime timestamp is the best builtin tool at your disposal. (For Windows, also see the module linked below.)
On Windows, according to perlport (Files and Filesystems), the inode change time time-stamp
... may really be the "creation timestamp" (which it is not in Unix).
This would directly work for you, if the file is created by copying.
In Unix ctime does track the inode change (along with metadata changes), what seems to be what you need. This timestamp can also be obtained simply with the -C file-test operator.
If the copying may update the contents of an existing file I don't see why ctime wouldn't work.
Note the module Win32API::File::Time, with the purpose to
provide maximal access to the file creation, modification, and access times under MSWin32
Please see some caveats in docs. I haven't used it and can't test on Windows now.

How to preserve ownership and permissions when doing an atomic file replace?

So, the normal POSIX way to safely, atomically replace the contents of a file is:
fopen(3) a temporary file on the same volume
fwrite(3) the new contents to the temporary file
fflush(3)/fsync(2) to ensure the contents are written to disk
fclose(3) the temporary file
rename(2) the temporary file to replace the target file
However, on my Linux system (Ubuntu 16.04 LTS), one consequence of this process is that the ownership and permissions of the target file change to the ownership and permissions of the temporary file, which default to uid/gid and current umask.
I thought I would add code to stat(2) the target file before overwriting, and fchown(2)/fchmod(2) the temporary file before calling rename, but that can fail due to EPERM.
Is the only solution to ensure that the uid/gid of the file matches the current user and group of the process overwriting the file? Is there a safe way to fall back in this case, or do we necessarily lose the atomic guarantee?

Is the only solution to ensure that the uid/gid of the file matches the current user and group of the process overwriting the file?
No.
In Linux, a process with the CAP_LEASE capability can obtain an exclusive lease on the file, which blocks other processes from opening the file for up to /proc/sys/fs/lease-break-time seconds. This means that technically, you can take the exclusive lease, replace the file contents, and release the lease, to modify the file atomically (from the perspective of other processes).
Also, a process with the CAP_CHOWN capability can change the file ownership (user and group) arbitrarily.
Is there a safe way to [handle the case where the uid or gid does not match the current process], or do we necessarily lose the atomic guarantee?
Considering that in general, files may have ACLs and xattrs, it might be useful to create a helper program, that clones the ownership including ACLs, and extended attributes, from an existing file to a new file in the same directory (perhaps with a fixed name pattern, say .new-################, where # indicate random alphanumeric characters), if the real user (getuid(), getgid(), getgroups()) is allowed to modify the original file. This helper program would have at least the CAP_CHOWN capability, and would have to consider the various security aspects (especially the ways it could be exploited). (However, if the caller can overwrite the contents, and create new files in the target directory -- the caller must have write access to the target directory, so that they can do the rename/hardlink replacement --, creating a clone file on their behalf with empty contents ought to be safe. I would personally exclude target files owned by root user or group, though.)
Essentially, the helper program would behave much like the mktemp command, except it would take the path to the existing target file as a parameter. It would then be relatively straightforward to wrap it into a library function, using e.g. fork()/exec() and pipes or sockets.
I personally avoid this problem by using group-based access controls: dedicated (local) group for each set. The file owner field is basically just an informational field then, indicating the user that last recreated (or was in charge of) said file, with access control entirely based on the group. This means that changing the mode and the group id to match the original file suffices. (Copying ACLs would be even better, though.) If the user is a member of the target group, they can do the fchown() to change the group of any file they own, as well as the fchmod() to set the mode, too.

I am by no means an expert in this area, but I don't think it's possible. This answer seems to back this up. There has to be a compromise.
Here are some possible solutions. Every one has advantages and disadvantages and weighted and chosen depending on the use case and scenario.
Use atomic rename.
Advantage: atomic operation
Disadvantage: possible to not keep owner/permissions
Create a backup. Write file in place
This is what some text editor do.
Advantage: will keep owner/permissions
Disadvantage: no atomicity. Can corrupt file. Other application might get a "draft" version of the file.
Set up permissions to the folder such that creating a new file is possible with the original owner & attributes.
Advantages: atomicity & owner/permissions are kept
Disadvantages: Can be used only in certain specific scenarios (knowledge at the time of creation of the files that would be edited, the security model must allow and permit this). Can decrease security.
Create a daemon/service responsible for editing the files. This process would have the necessary permissions to create files with the respective owner & permissions. It would accept requests to edit files.
Advantages: atomicity & owner/permissions are kept. Higher and granular control to what and how can be edited.
Disadvantages. Possible in only specific scenarios. More complex to implement. Might require deployment and installation. Adding an attack surface. Adding another source of possible (security) bugs. Possible performance impact due to the added intermediate layer.

Do you have to worry about the file that's named being a symlink to a file somewhere else in the file system?
Do you have to worry about the file that's named being one of multiple links to an inode (st_nlink > 1).
Do you need to worry about extended attributes?
Do you need to worry about ACLs?
Does the user ID and group IDs of the current process permit the process to write in the directory where the file is stored?
Is there enough disk space available for both the old and the new files on the same file system?
Each of these issues complicates the operation.
Symlinks are relatively easy to deal with; you simply need to establish the realpath() to the actual file and do file creation operations in the directory containing the real path to the file. From here on, they're a non-issue.
In the simplest case, where the user (process) running the operation owns the file and the directory where the file is stored, can set the group on the file, the file has no hard links, ACLs or extended attributes, and there's enough space available, then you can get atomic operation with more or less the sequence outlined in the question — you'd do group and permission setting before executing the atomic rename() operation.
There is an outside risk of TOCTOU — time of check, time of use — problems with file attributes. If a link is added between the time when it is determined that there are no links and the rename operation, then the link is broken. If the owner or group or permissions on the file change between the time when they're checked and set on the new file, then the changes are lost. You could reduce the risk of that by breaking atomicity but renaming the old file to a temporary name, renaming the new file to the original name, and rechecking the attributes on the renamed old file before deleting it. That is probably an unnecessary complication for most people, most of the time.
If the target file has multiple hard links to it and those links must be preserved, or if the file has ACLs or extended attributes and you don't wish to work out how to copy those to the new file, then you might consider something along the lines of:
write the output to a named temporary file in the same directory as the target file;
copy the old (target) file to another named temporary file in the same directory as the target;
if anything goes wrong during steps 1 or 2, abandon the operation with no damage done;
ignoring signals as much as possible, copy the new file over the old file;
if anything goes wrong during step 4, you can recover from the extra backup made in step 2;
if anything goes wrong in step 5, report the file names (new file, backup of original file, broken file) for the user to clean up;
clean up the temporary output file and the backup file.
Clearly, this loses all pretense at atomicity, but it does preserve links, owner, group, permissions, ACLS, extended attributes. It also requires more space — if the file doesn't change size significantly, it requires 3 times the space of the original file (formally, it needs size(old) + size(new) + max(size(old), size(new)) blocks). In its favour is that it is recoverable even if something goes wrong during the final copy — even a stray SIGKILL — as long as the temporary files have known names (the names can be determined).
Automatic recovery from SIGKILL probably isn't feasible. A SIGSTOP signal could be problematic too; a lot could happen while the process is stopped.
I hope it goes without saying that errors must be detected and handled carefully with all the system calls used.
If there isn't enough space on the target file system for all the copies of the files, or if the process cannot create files in the target directory (even though it can modify the original file), you have to consider what the alternatives are. Can you identify another file system with enough space? If there isn't enough space anywhere for both the old and the new file, you clearly have major issues — irresolvable ones for anything approaching atomicity.
The answer by Nominal Animal mentions Linux capabilities. Since the question is tagged POSIX and not Linux, it isn't clear whether those are applicable to you. However, if they can be used, then CAP_LEASE sounds useful.
How crucial is atomicity vs accuracy?
How crucial is POSIX compliance vs working on Linux (or any other specific POSIX implementation)?

How do I make files save for concurrent C access?

I have several C-programs, which are accessing (read: fprintf/ write fopen) at the same time different files on the file system. What is the best way to do this concurrent access save? should I write some sort of file locks (and whats the best way to do this?) or are there any better reading methods (preferably in the C99 standard lib, additional dependencies would be a problem)? or should I use something like SQLite?
edit:
I am using Linux as operating system.
edit:
I don't really want to write with different processes in same files, I'm dealing with a legacy monolith code, which saves intermediate steps in files for recycling. I want a way to speed the calculations up by running several calculations at the same time, which have the same intermediate results.

You could use fcntl() with F_SETLK or F_SETLKW:
struct flock lock;
...
fcntl( fd, F_SETLKW, &lock );
See more from man page fcntl(3) or this article.

You can make sure that your files do not get corrupted on concurrent writes from multiple threads/processes by using copy-on-the-write technique:
A writer opens the file it would like to update for reading.
The writer creates a new file with a unique name (mkostemps) and copies the original file into the copy.
The writer modifies the copy.
The writer renames the copy to the original name using rename. This happens atomically, so that users of the file either see the old version of it or the new, but never a partially updated file.
See Things UNIX can do atomically for more details.

Re-using an inode field

I am in a project where I need to do some book-keeping i.e to indicate whether a particular file has been accessed by a program A. I plan to store this information in the inode as using other additional datastructure would be inefficient.
I plan to reuse the field i_mode in the inode datastructure. Any suggestions. Moreover I don't know how to write to the inode data structure from user space. How do I do that? thanks...

The file system looks after the inode; it won't even let super-user modify the inode directly (though root can always access the unmounted (block or character) device to change it).
Unless you write code to modify the file system - a kernel module - you will not be able to do as you wish. Find another way to do it.

File system is not designed to solve users problem. You want bookkeeping changed files, other want bookkeeping of new/deleted files.
I see only the following options:
inotify
keep status of interested files/directories and check for changes once a time
Just for fun you can consider:
kernel module
implement your own file system

After a bit of googling around saw that the "sticky bit" is not much in use today and we can use it as well as modify it from user space.

Is it possible to find the modification time for a socket (or file), given the inode?

I have the inode of a socket - taken from /proc/net/tcp for example, and wish to find more data on that socket, specifically the creation or modification time.
I am working in C on linux (2.6 kernel).
This is similar to the question Get file details by inode - but that was from bash. The conclusion there is that there is no easy way, and relies on trawling directories for a match. I was hoping for something more efficient.

I'm afraid not. The file creation time is not held, the three date/time stamps (see man 2 stat) held are the time of last access (atime), the time of last modification (mtime) and the time of the last file status change (ctime).
When the creation time is needed it is common practice to include it somewhere in the file name, obviously not an option with /proc/net/tcp.

It appears that on the systems I investigated, there is little about time of socket creation or modification stored in an accessible manner.
It is possible to find the inode from the entries in /proc/net/tcp, and then search through all file handles in all the processes in /proc//fd for a match.
This doesn't really help though, as the time-stamps there appear to be when that directory is first accessed. i.e. the pseudo-directory is only created when it is queried.

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight