Modified time after copying a file - file

In perl 5, Using stat on a file that was freshly copied from an old file, it seems that mtime is the time in which the old file was initially created, not copied.
How do I get the copy time? ctime seems the closest, however there are warnings in the documentation about compatibility across OSs.

Simple answer: you can't in most cases.
ctime is not the file creation time. Quoting from man 7 inode on a Linux system:
Last status change timestamp (ctime)
stat.st_ctime; statx.stx_ctime
This is the file's last status change timestamp. It is changed
by writing or by setting inode information (i.e., owner, group,
link count, mode, etc.).
Unless
you don't require portability
are running on kernel with support for it
the file is on a file system that has support for it.
Quoting from man 7 inode on a Linux system:
File creation (birth) timestamp (btime)
(not returned in the stat structure); statx.stx_btime
The file's creation timestamp. This is set on file creation and
not changed subsequently.
The btime timestamp was not historically present on UNIX systems
and is not currently supported by most Linux filesystems.

There are a few unspecified details, but it seems that ctime timestamp is the best builtin tool at your disposal. (For Windows, also see the module linked below.)
On Windows, according to perlport (Files and Filesystems), the inode change time time-stamp
... may really be the "creation timestamp" (which it is not in Unix).
This would directly work for you, if the file is created by copying.
In Unix ctime does track the inode change (along with metadata changes), what seems to be what you need. This timestamp can also be obtained simply with the -C file-test operator.
If the copying may update the contents of an existing file I don't see why ctime wouldn't work.
Note the module Win32API::File::Time, with the purpose to
provide maximal access to the file creation, modification, and access times under MSWin32
Please see some caveats in docs. I haven't used it and can't test on Windows now.

Related

Get filesystem creation date in C

I need to know the creation datetime of the filesystem on a disk (in a Linux machine) with C. I would like to avoid using shell commands, such as
tune2fs -l /dev/sdb2 | grep 'Filesystem created:'
and make a parser.
Thanks
From a program coded in C (or in any language capable of calling C routines) you would use the stat(2) system call (or, with recent kernels and some file systems, the statx(2) one) to query the creation time of a given file (or directory). Of course, commands like ls(1) or stat(1) are using internally that stat(2) system calll.
There is no standard, and file system neutral, way to get the creation time of a given file system. That information is not always kept. I guess that FAT filesystems, or distributed file systems such as NFS, don't keep that.
You might use stat(2) on the mount point of that file system.
The statfs(2) system call retrieves some filesystem information, but does not give any time stamps.
For ext4 file systems, see ext4(5) and use proc(5). You might parse /proc/mounts and some /proc/fs/ext4/*/ directory. The pseudofiles in /proc/ can be parsed quickly and usually won't involve physical disk IO.
You could also work at the ext2/3/4 disk partition level, on an unmounted file ext[234] system, with a library like (or programs from) e2fsprogs. You should not access (even just read) a disk partition containing some file system if that file system is mounted.
(your question should give some motivation and context)

Find most recently accessed file from give files in C

How to get most recently accessed file in Linux?
I used stat() call checking for st_atime, but it is not updating if i open and read the file.
You can check if your filesystem is mounted with the noatime or relatime option:
greek0#orest:/home/greek0$ cat /proc/mounts
/dev/md0 / ext3 rw,noatime,errors=remount-ro,data=ordered 0 0
...
These mount options are often used because they increase filesystem performance. Without them, every single read of a file turns into a write to the disk (for updating the atime).
In general, you can't rely on atime to have any useful meaning on most computers.
If it's Ok to only detect accesses to files that happen while your program is running, you can look into inotify. It provides a method to be notified of currently ongoing filesystem accesses.
If that doesn't satisfy your requirements, I'm afraid you're out of luck.

Is it possible to find the modification time for a socket (or file), given the inode?

I have the inode of a socket - taken from /proc/net/tcp for example, and wish to find more data on that socket, specifically the creation or modification time.
I am working in C on linux (2.6 kernel).
This is similar to the question Get file details by inode - but that was from bash. The conclusion there is that there is no easy way, and relies on trawling directories for a match. I was hoping for something more efficient.
I'm afraid not. The file creation time is not held, the three date/time stamps (see man 2 stat) held are the time of last access (atime), the time of last modification (mtime) and the time of the last file status change (ctime).
When the creation time is needed it is common practice to include it somewhere in the file name, obviously not an option with /proc/net/tcp.
It appears that on the systems I investigated, there is little about time of socket creation or modification stored in an accessible manner.
It is possible to find the inode from the entries in /proc/net/tcp, and then search through all file handles in all the processes in /proc//fd for a match.
This doesn't really help though, as the time-stamps there appear to be when that directory is first accessed. i.e. the pseudo-directory is only created when it is queried.

Using C, how can I know when a file is created?

I'm making a program in C for linux that scans a directory every x seconds during a time period for modifications, but I'm having trouble finding out when a file or directory is created. Here are a few options I considered:
Using the stat struct, check if the last status change and data modification timestamps are the same. This brings up the problem that you can create a file, modify it before the program has a chance to check it, which changes the data modification timestamp and no longer sees it as a new file.
Keep a log of the name of every file/directory in the directory and check for new ones. This has the problem where you delete a file and then create a new one with the same name, it doesn't get interpreted as a new file.
Keep a count of the number of file/directories. Similliar problem to the last idea.
With that said, does anyone have any idea on how I can uniquely identify the creation of a file/directory?
You cannot, at least not this way. POSIX has no provisions for storing the file creation time in the file system, like Windows and some other OSes do. It only keeps the status change, access and modification times. Most Unix filesystems do not store that information either.
One of the reasons for this is the existence of hard links, since file timestamps are stored in their inodes and not in the directory references. What would you consider the creation time to be for a file that was created at 10am and then hard-linked into another directory at 11am? What if a file is copied?
Your best, but unfortunately OS-specific, approach would be to use whatever framework is available in your platform to monitor filesystem events, e.g. inotify on Linux and kqueue on FreeBSD and MacOS X...
EDIT:
By the way, Ext4fs on Linux does store inode creation times (crtime). Unfortunately getting to that information from userspace is still at least a bit awkward.
Perhaps you should use inotify?
Check out inotify (Linux-specific).

Alternatives to using stat() to get file type?

Are there any alternatives to stat (which is found on most Unix systems) which can determine the file type? The manpage says that a call to stat is expensive, and I need to call it quite often in my app.
The alternative is fstat() if you already have the file open (so you have a file descriptor for it). Or lstat() if you want to find out about symbolic links rather than the file the symlink points to.
I think the man page is exaggerating the cost; it is not much worse than any other system call that has to resolve the name of the file into an inode. It is more costly than getpid(); it is less costly than open().
The "file type" that stat() gives you is whether the file is a regular file or something like a device file or directory, among other things like its size and inode number. If that's what you need to know, then you must use stat().
If what you actually need to know is the type of the file's contents -- e.g. text file, JPEG image, MP3 audio -- then you have two options. You can guess based on the filename extension (if it ends in ".mp3", the file probably contains MP3 audio), or you can use libmagic, which actually opens the file and reads some of its contents to figure out what it is. The libmagic approach is more expensive (if you're trying to avoid stat(), you probably want to avoid open() too), but less prone to error (in case that ".mp3" file is actually a JPEG image, for example).
Under Linux with some filesystems the file type (regular, char device, block device, directory, pipe, sym link, ...) is stored in the linux_dirent struct, which is what the kernel supplies applications directory entries in via the getdents system call. If the only thing in the stat structure you needed was the file type and you needed to get that for all or many entries of a directory, you could use getdents directly (rather than readdir) and attempt to get the file type out of that, only using stat if you found an invalid file type in linux_dirent. Depending on the your application's filesystem usage pattern this could be faster than using stat if you are using Linux, but stat should be fast in many cases.
Stat's speed has mostly to do with locating the data that is being asked for on disk. If you are traversing a directory recursively stat-ing all of the files then each stat should end up being fairly quick overall because most of the work getting the data stat needs ends up cached before you ask the kernel for it by a previous call to stat. If on the other hand you stat the same number of files randomly distributed around the system then the kernel will likely have to read from disk several directories for each file you are going to call stat on.
fstat should always be very fast since the kernel should already have the data you're asking for in RAM, as it needs to access it for the file to be in the open state, and the kernel won't have to go through the trouble of traversing the path of the filename to see if each component is in RAM or on disk and possibly reading in a directory from disk (but likely not having to), only to discover that it has the data that you are asking for in RAM.
That being said, calling stat on an open file should be faster than calling it on an unopened file.
Are you aware of the "magic" file on *nix systems? By querying a file from the command line with something like file myfile.ext you can get the real file type.
This is done by reading the contents of the file rather than looking at its extension, and is widely used on *nix (Linux, Unix, ...) systems.
If your application is expected to run on Linux systems, why don't you try inotify(7). It is definitely faster than stating many files.

Resources