Get filesystem creation date in C - c

I need to know the creation datetime of the filesystem on a disk (in a Linux machine) with C. I would like to avoid using shell commands, such as
tune2fs -l /dev/sdb2 | grep 'Filesystem created:'
and make a parser.
Thanks

From a program coded in C (or in any language capable of calling C routines) you would use the statx(2) system call (available on recent kernels, and only meaningful on file systems that record it) to query the creation time of a given file (or directory); the older stat(2) call reports only access, modification and status-change times. Of course, commands like ls(1) or stat(1) use these system calls internally.
There is no standard, and file system neutral, way to get the creation time of a given file system. That information is not always kept. I guess that FAT filesystems, or distributed file systems such as NFS, don't keep that.
You might use stat(2) on the mount point of that file system.
The statfs(2) system call retrieves some filesystem information, but does not give any time stamps.
For ext4 file systems, see ext4(5) and use proc(5). You might parse /proc/mounts and some /proc/fs/ext4/*/ directory. The pseudofiles in /proc/ can be parsed quickly and usually won't involve physical disk IO.
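Parsing /proc/mounts does not need a hand-written parser, since getmntent(3) reads that format. The sketch below looks up the device and file system type for a mount point; the name mount_info and its parameters are hypothetical. The device name it returns is what one would then use to locate a matching /proc/fs/ext4/ directory, and which pseudofiles exist there is not assumed here.

```c
#include <mntent.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copies the device and file system type mounted at
 * `mountpoint` into the caller's buffers.
 * Returns 0 when found, -1 when not found or /proc/mounts is unreadable. */
int mount_info(const char *mountpoint,
               char *dev, size_t devlen,
               char *type, size_t typelen)
{
    FILE *fp = setmntent("/proc/mounts", "r");
    struct mntent *ent;
    int found = -1;

    if (fp == NULL)
        return -1;
    while ((ent = getmntent(fp)) != NULL) {
        if (strcmp(ent->mnt_dir, mountpoint) == 0) {
            snprintf(dev, devlen, "%s", ent->mnt_fsname);
            snprintf(type, typelen, "%s", ent->mnt_type);
            found = 0;  /* keep scanning: a later entry shadows an earlier one */
        }
    }
    endmntent(fp);
    return found;
}
```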
You could also work at the ext2/3/4 disk partition level, on an unmounted ext[234] file system, with a library like (or programs from) e2fsprogs. You should not access (even just read) a disk partition containing some file system if that file system is mounted.
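As an illustration of the partition-level approach without linking against e2fsprogs, the sketch below reads the ext2/3/4 superblock by hand and extracts the s_mkfs_time field, which is the value tune2fs prints as "Filesystem created". This is a swapped-in technique, not the e2fsprogs API. The function names are hypothetical, the offsets follow the documented on-disk layout (superblock at byte 1024, s_magic at 0x38, s_mkfs_time at 0x108), and only the 32-bit field is read, so any high-order extension bits are ignored. Older file systems may have this field set to zero.

```c
#include <fcntl.h>
#include <stdint.h>
#include <time.h>
#include <unistd.h>

#define EXT_SB_OFFSET 1024   /* superblock starts 1024 bytes into the volume */
#define EXT_SB_SIZE   1024
#define EXT_MAGIC_OFF 0x38   /* s_magic, little-endian 16 bit */
#define EXT_MKFS_OFF  0x108  /* s_mkfs_time, little-endian 32 bit */
#define EXT_MAGIC     0xEF53

/* Hypothetical helper: decode s_mkfs_time from an in-memory superblock.
 * Returns 0 on success, -1 if the magic number does not match. */
int ext_sb_mkfs_time(const unsigned char *sb, time_t *out)
{
    uint16_t magic = (uint16_t)(sb[EXT_MAGIC_OFF] |
                                (sb[EXT_MAGIC_OFF + 1] << 8));
    uint32_t t;

    if (magic != EXT_MAGIC)
        return -1;
    t = (uint32_t)sb[EXT_MKFS_OFF] |
        ((uint32_t)sb[EXT_MKFS_OFF + 1] << 8) |
        ((uint32_t)sb[EXT_MKFS_OFF + 2] << 16) |
        ((uint32_t)sb[EXT_MKFS_OFF + 3] << 24);
    *out = (time_t)t;
    return 0;
}

/* Hypothetical helper: read the superblock from an unmounted partition
 * or an image file. Returns 0 on success, -1 on any failure. */
int ext_mkfs_time(const char *path, time_t *out)
{
    unsigned char sb[EXT_SB_SIZE];
    ssize_t n;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;
    n = pread(fd, sb, sizeof sb, EXT_SB_OFFSET);
    close(fd);
    if (n != (ssize_t)sizeof sb)
        return -1;
    return ext_sb_mkfs_time(sb, out);
}
```

Reading a partition such as /dev/sdb2 normally requires root privileges; an image file created with mkfs.ext4 is a safer place to try it.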
(your question should give some motivation and context)

Related

How to create files which share an extent?

The Linux programmer's manual manpage fallocate(2) states:
If the FALLOC_FL_UNSHARE flag is specified in mode, shared file data extents will be made private to the file to guarantee that a subsequent write will not fail due to lack of space. Typically, this will be done by performing a copy-on-write operation on all shared data in the file. This flag may not be supported by all filesystems.
That's cool, but… How do I create shared file data extents in the first place?
Shared data extents are created when the underlying filesystem supports reflinks (example: XFS and BTRFS) and you perform a cp with the --reflink flag or use the ioctl_ficlonerange(2) syscall.
Looking at the kernel code, I see FALLOC_FL_UNSHARE_RANGE being handled only in case of XFS, so maybe this flag to fallocate works only on XFS as of now.
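A minimal sketch of creating shared extents from C, using the whole-file FICLONE ioctl described on the same manual page as FICLONERANGE. The helper name reflink_file is hypothetical. On a file system without reflink support the ioctl is expected to fail (typically with EOPNOTSUPP), and both files must live on the same file system.

```c
#include <errno.h>
#include <fcntl.h>
#include <linux/fs.h>    /* FICLONE */
#include <sys/ioctl.h>
#include <unistd.h>

/* Hypothetical helper: make `dst` share all data extents with `src`.
 * Returns 0 on success, -1 on failure with errno preserved. */
int reflink_file(const char *src, const char *dst)
{
    int in, out, rc, saved;

    in = open(src, O_RDONLY);
    if (in < 0)
        return -1;
    out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (out < 0) {
        saved = errno;
        close(in);
        errno = saved;
        return -1;
    }
    /* The destination descriptor receives the ioctl; the source fd is the argument. */
    rc = ioctl(out, FICLONE, in);
    saved = errno;
    close(out);
    close(in);
    errno = saved;
    return rc == 0 ? 0 : -1;
}
```

After a successful call, fallocate(2) with FALLOC_FL_UNSHARE_RANGE on either file is what would make the shared extents private again.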

What is the fastest way to detect file size is not zero without knowing the file descriptor?

To explain shortly why I need this,
I am currently doing the detection with stat(2). I don't have control over the file descriptor (it may get used by some other thread, as my code is injected to replace syscalls), so I can't use fstat(2) (which is faster). I need to do this check a lot of times, so is there a faster way to do the same thing?
I am checking the same file in different processes which do not have a parent child relation.
You should probably benchmark it for yourself.
I've measured
//Real-time System-time
272.58 ns(R) 170.11 ns(S) //lseek
366.44 ns(R) 366.28 ns(S) //fstat
812.77 ns(R) 711.69 ns(S) //stat("/etc/profile",&sb)
on my Linux laptop. It fluctuates a little between runs but lseek is usually a bunch of ns faster than fstat, but you also need a fd for it and opening is quite expensive at about 1.6µs, so stat is probably the best choice for your case.
As tom-karzes has noted, the cost of stat should depend on the number of directory components in the path. I tried it on a PATH_MAX-long "/foo/foo/.../foo" directory and there I'm getting about 80µs.
The most efficient approach, knowing the filesystem you are searching in, is to open the block device associated and search (block by block) the inode table, and check the actual size from the inodes there (open the block device, so you get the inodes from the in-memory images, and not from the disk). This allows you to get all the zero length inodes of a filesystem in a quick and dirty way. The drawback is that you first need to get the info of the filesystem, and then to access the block device directly, which is normally forbidden for a non-root process. After that, you have to search the filesystem to get the names of the files involved, just in case you need to do something on those files.
By the way, your assumption of not being able to use fstat(2) on a file descriptor shared with another thread is wrong: the fstat system call operates on an open file descriptor and doesn't do anything to the file (it's nonblocking), and the system guarantees that the inode is locked while the stat structure is being accessed.
The approach of using lseek(2) is not valid in this case, because it actually moves the file pointer to the end of file, and then back to the saved place, and this requires two system calls to do and undo the move, and there are many race scenarios that can happen if another thread uses another system call (does a write(2), between the two) while you have the file pointer at another place.
Unix (including all POSIX systems: Linux, BSD, etc.) guarantees that a nonblocking system call (as stat(2) is) is atomic in nature, locking the inode of the file while the process (or thread) is executing the system call. So no other thread can be using the file while your stat(2) system call is getting the data. Even on blocking calls, Unix guarantees that a different system call made on the same descriptor will be queued, and the process/thread will have to wait until the stat(2) syscall ends.
The problem with stat(2) is that it has to resolve all the path elements until it gets to the final inode of the file (this is where the length of the file is stored), and this is done one element at a time. Until it gets to the final inode, no lock is taken on it (indeed, the inode is unknown until we get to it, so we cannot lock it until the namei() resolution finishes), and from there it proceeds like fstat(2).
CONCLUSION
Use fstat(2) with the other thread's file descriptor without fear of data corruption; it cannot happen. Don't hesitate to do this, as nothing is going to happen to the inode of the file while you are gathering the stat info.
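The check on an already-open descriptor can be sketched as follows. The helper name fd_is_nonempty is hypothetical; it only reads metadata and does not move the file offset, which is why it does not interfere with reads or writes on the same descriptor.

```c
#include <sys/stat.h>

/* Hypothetical helper: 1 if the file behind fd has size > 0,
 * 0 if it is empty, -1 if fstat() fails (for example on a closed fd). */
int fd_is_nonempty(int fd)
{
    struct stat sb;

    if (fstat(fd, &sb) != 0)
        return -1;
    return sb.st_size > 0 ? 1 : 0;
}
```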

Find most recently accessed file from given files in C

How to get most recently accessed file in Linux?
I used the stat() call and checked st_atime, but it is not updated when I open and read the file.
You can check if your filesystem is mounted with the noatime or relatime option:
greek0@orest:/home/greek0$ cat /proc/mounts
/dev/md0 / ext3 rw,noatime,errors=remount-ro,data=ordered 0 0
...
These mount options are often used because they increase filesystem performance. Without them, every single read of a file turns into a write to the disk (for updating the atime).
In general, you can't rely on atime to have any useful meaning on most computers.
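The same check can be done from C instead of reading the cat output by eye. This sketch uses getmntent(3) and hasmntopt(3) on /proc/mounts; the function name atime_mode and its return codes are hypothetical.

```c
#include <mntent.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reports the atime policy of the file system
 * mounted at `mountpoint`.
 * Returns 2 for noatime, 1 for relatime, 0 for neither,
 * -1 if the mount point is not listed or /proc/mounts is unreadable. */
int atime_mode(const char *mountpoint)
{
    FILE *fp = setmntent("/proc/mounts", "r");
    struct mntent *ent;
    int mode = -1;

    if (fp == NULL)
        return -1;
    while ((ent = getmntent(fp)) != NULL) {
        if (strcmp(ent->mnt_dir, mountpoint) != 0)
            continue;
        /* Later entries shadow earlier ones, so keep the last match. */
        if (hasmntopt(ent, "noatime") != NULL)
            mode = 2;
        else if (hasmntopt(ent, "relatime") != NULL)
            mode = 1;
        else
            mode = 0;
    }
    endmntent(fp);
    return mode;
}
```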
If it's Ok to only detect accesses to files that happen while your program is running, you can look into inotify. It provides a method to be notified of currently ongoing filesystem accesses.
If that doesn't satisfy your requirements, I'm afraid you're out of luck.
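For the inotify route, a minimal sketch of watching a directory for read accesses might look like this. Both function names are hypothetical, the buffer size is arbitrary, and only IN_ACCESS is requested; a real program would probably also handle IN_OPEN and loop over read().

```c
#include <stdio.h>
#include <sys/inotify.h>
#include <unistd.h>

/* Hypothetical helper: returns an inotify fd watching `dir` for
 * IN_ACCESS events, or -1 on failure. */
int watch_dir_access(const char *dir)
{
    int fd = inotify_init1(IN_CLOEXEC);

    if (fd < 0)
        return -1;
    if (inotify_add_watch(fd, dir, IN_ACCESS) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Hypothetical helper: blocks until events arrive, prints the name of each
 * accessed entry, and returns the number of events seen (-1 on error). */
int print_access_events(int fd)
{
    char buf[4096]
        __attribute__((aligned(__alignof__(struct inotify_event))));
    ssize_t len = read(fd, buf, sizeof buf);
    int count = 0;
    char *p;

    if (len <= 0)
        return -1;
    /* One read() may return several variable-length events. */
    for (p = buf; p < buf + len; ) {
        const struct inotify_event *ev = (const struct inotify_event *)p;
        if ((ev->mask & IN_ACCESS) && ev->len > 0)
            printf("accessed: %s\n", ev->name);
        count++;
        p += sizeof(struct inotify_event) + ev->len;
    }
    return count;
}
```

The most recently accessed file is then simply the name from the last event printed, as long as the program has been running the whole time.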

Direct access to hard disk with no FS from C program on Linux

I want to access the whole hard disk directly from a C program. There's no FS on it and never's gonna be one.
I just want to open /dev/sda (for example) and do I/O at the block/sector level of the disk.
I'm planning to write some programs for learning C programming in the Linux environment (I know C language, Python, Perl and Java) but lack confidence with the Linux environment.
For my learning purposes I'm thinking about playing with kyoto-cabinet and saving the value corresponding to the computed hash directly into a "block/sector" of the hard disk, recording the pair: "hash, block/sector reference" into a kyoto-cabinet hash database file.
I don't know if this is feasible using standard C I/O functions or otherwise I'd have to write a "device driver" or something like...
As mentioned elsewhere, under *NIX systems, block devices like /dev/sda can be accessed as plain files. Note that if a file system is mounted from the device, opening the device for writing may fail.
If you want to play with block devices, I would advise to first use the loop device, which presents a plain file as a block device. For example:
dd if=/dev/zero of=./loop_file_10MB bs=1024 count=10K
losetup /dev/loop0 $PWD/loop_file_10MB
After that, /dev/loop0 would behave as if it was a block device, but all information written would be stored in the file.
As device files for drives (e.g. /dev/sda) are block devices, this means you can open, seek and use the file almost like a normal file.
Yes, as others have noted, you can simply open the block device.
However, it's a really good idea to do IO (writes anyway) on block boundaries and whole blocks. You can use something like pread() and pwrite() to do these IO, or mmap some or all of the device.
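A small sketch of whole-block I/O with pread() and pwrite(). The helper names and the idea of passing the block size as a parameter are illustrative assumptions; the same code works on /dev/loop0, a raw disk, or a plain file, since they are all just file descriptors.

```c
#include <fcntl.h>      /* open() and O_* flags, used by callers */
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>     /* pread(), pwrite() */

/* Hypothetical helper: read exactly one block at index `blockno`.
 * Returns 0 on success, -1 on error or short read (e.g. past the end). */
int read_block(int fd, uint64_t blockno, void *buf, size_t block_size)
{
    off_t off = (off_t)(blockno * (uint64_t)block_size);
    ssize_t n = pread(fd, buf, block_size, off);

    return n == (ssize_t)block_size ? 0 : -1;
}

/* Hypothetical helper: write exactly one block at index `blockno`.
 * Returns 0 on success, -1 on error or short write. */
int write_block(int fd, uint64_t blockno, const void *buf, size_t block_size)
{
    off_t off = (off_t)(blockno * (uint64_t)block_size);
    ssize_t n = pwrite(fd, buf, block_size, off);

    return n == (ssize_t)block_size ? 0 : -1;
}
```

Because pread() and pwrite() take an explicit offset, they do not touch the shared file position, so no lseek() bookkeeping is needed between calls.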
There are a bunch of ioctls which can be used, see "man sd" for some more info. They don't seem to all be documented in the same place.
BLKROSET and a bunch of other ioctls are defined in linux/fs.h; you have to look around to find out how to use them. You can do useful things like find out how big the device is, and what the block size is.
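As an example of those two queries, the sketch below uses BLKGETSIZE64 (device size in bytes) and BLKSSZGET (logical sector size), both defined in linux/fs.h. The helper name blockdev_geometry is hypothetical, and the ioctls are expected to fail on anything that is not a block device.

```c
#include <fcntl.h>
#include <linux/fs.h>    /* BLKGETSIZE64, BLKSSZGET */
#include <stdint.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Hypothetical helper: query total size in bytes and logical sector size.
 * Returns 0 on success, -1 if the device cannot be opened or queried. */
int blockdev_geometry(const char *dev, uint64_t *size_bytes, int *sector_size)
{
    int rc = 0;
    int fd = open(dev, O_RDONLY);

    if (fd < 0)
        return -1;
    if (ioctl(fd, BLKGETSIZE64, size_bytes) != 0 ||
        ioctl(fd, BLKSSZGET, sector_size) != 0)
        rc = -1;
    close(fd);
    return rc;
}
```

Dividing the size by the sector size gives the number of addressable sectors, which is a natural upper bound for a "hash to sector" mapping like the one described in the question.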
The source code of the util-linux-ng package is your friend, it contains examples.

Alternatives to using stat() to get file type?

Are there any alternatives to stat (which is found on most Unix systems) which can determine the file type? The manpage says that a call to stat is expensive, and I need to call it quite often in my app.
The alternative is fstat() if you already have the file open (so you have a file descriptor for it). Or lstat() if you want to find out about symbolic links rather than the file the symlink points to.
I think the man page is exaggerating the cost; it is not much worse than any other system call that has to resolve the name of the file into an inode. It is more costly than getpid(); it is less costly than open().
The "file type" that stat() gives you is whether the file is a regular file or something like a device file or directory, among other things like its size and inode number. If that's what you need to know, then you must use stat().
If what you actually need to know is the type of the file's contents -- e.g. text file, JPEG image, MP3 audio -- then you have two options. You can guess based on the filename extension (if it ends in ".mp3", the file probably contains MP3 audio), or you can use libmagic, which actually opens the file and reads some of its contents to figure out what it is. The libmagic approach is more expensive (if you're trying to avoid stat(), you probably want to avoid open() too), but less prone to error (in case that ".mp3" file is actually a JPEG image, for example).
Under Linux, with some filesystems, the file type (regular, char device, block device, directory, pipe, symlink, ...) is stored in the linux_dirent struct, which is the form in which the kernel hands directory entries to applications via the getdents system call. If the only thing in the stat structure you needed was the file type, and you needed to get that for all or many entries of a directory, you could use getdents directly (rather than readdir) and attempt to get the file type out of that, only using stat if you found an invalid file type in linux_dirent. Depending on your application's filesystem usage pattern, this could be faster than using stat if you are using Linux, but stat should be fast in many cases.
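The sketch below shows the "type from the directory entry, stat only as a fallback" pattern. Note that it uses readdir(3) rather than raw getdents: on glibc, struct dirent exposes the same type information as d_type, which keeps the example short. The function name count_subdirs is hypothetical.

```c
#define _DEFAULT_SOURCE   /* for d_type and the DT_* constants */
#include <dirent.h>
#include <fcntl.h>        /* AT_SYMLINK_NOFOLLOW */
#include <string.h>
#include <sys/stat.h>

/* Hypothetical helper: counts subdirectories of `dirpath`, excluding
 * "." and "..". Returns -1 if the directory cannot be opened. */
int count_subdirs(const char *dirpath)
{
    DIR *d = opendir(dirpath);
    struct dirent *e;
    int count = 0;

    if (d == NULL)
        return -1;
    while ((e = readdir(d)) != NULL) {
        int is_dir;

        if (strcmp(e->d_name, ".") == 0 || strcmp(e->d_name, "..") == 0)
            continue;
        if (e->d_type != DT_UNKNOWN) {
            /* The file system filled in the type: no stat call needed. */
            is_dir = (e->d_type == DT_DIR);
        } else {
            /* Fallback for file systems that do not report a type. */
            struct stat sb;
            is_dir = fstatat(dirfd(d), e->d_name, &sb,
                             AT_SYMLINK_NOFOLLOW) == 0 &&
                     S_ISDIR(sb.st_mode);
        }
        count += is_dir;
    }
    closedir(d);
    return count;
}
```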
Stat's speed has mostly to do with locating the data that is being asked for on disk. If you are traversing a directory recursively stat-ing all of the files then each stat should end up being fairly quick overall because most of the work getting the data stat needs ends up cached before you ask the kernel for it by a previous call to stat. If on the other hand you stat the same number of files randomly distributed around the system then the kernel will likely have to read from disk several directories for each file you are going to call stat on.
fstat should always be very fast since the kernel should already have the data you're asking for in RAM, as it needs to access it for the file to be in the open state, and the kernel won't have to go through the trouble of traversing the path of the filename to see if each component is in RAM or on disk and possibly reading in a directory from disk (but likely not having to), only to discover that it has the data that you are asking for in RAM.
That being said, calling stat on an open file should be faster than calling it on an unopened file.
Are you aware of the "magic" file on *nix systems? By querying a file from the command line with something like file myfile.ext you can get the real file type.
This is done by reading the contents of the file rather than looking at its extension, and is widely used on *nix (Linux, Unix, ...) systems.
If your application is expected to run on Linux systems, why not try inotify(7)? It is definitely faster than stat-ing many files.

Resources