What is a 'whiteout' (S_IFWHT) in Unix?

One of the possible file types that can be obtained using stat(2) is S_IFWHT, also called a whiteout. What is it?

The official Linux kernel contains no such thing. On UNIX systems where it does exist, and possibly in some unofficial patches for Linux, it's a type of file that stops further lookup for a file but reports that it doesn't exist. It's useful with union and overlay filesystems, to be able to remove files that exist in the base image. The Linux kernel's overlayfs does have whiteouts, but they're S_IFCHR files with major and minor number 0, not S_IFWHT.

Related

Cross-platform way to determine if file has been edited?

I am writing a cross-platform (big 3 - Linux, Mac, Windows) backup program, so I need to know if a file has been edited since last time. My plan is to save the last save time in a file and check the real situation of a folder against the data in the file to determine which files need to be backed up or updated.
I would like to avoid methods that require a lot of processing power (like diff, or counting bytes).
In this similar post, people suggested using fstat(), but that solution would be a last resort for me because I was hoping for a cross-platform solution that can be solved with pure C. As far as I know, fstat is a (2), and in my man page it appears as (1), which (to my understanding) means that it is a system function in Linux and isn't a part of the standard C library. I have searched for fstat on Windows, but could only find some Android version.
Is there some other way to access file metadata? Is there some other solution to this? I am open to any suggestions and am ok if it sometimes false-flags, as long as it backs up data correctly and doesn't waste resources on backing up everything all the time.
Please help!
Thank you!
fstat is still the way to do this, but on Windows it's called _fstat. You can check for the _MSC_VER macro which will be defined if you're building with MSVC, and if so create a macro alias for fstat.
You can do the same for struct stat which MSVC calls struct _stat:
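#include <sys/types.h>
#include <sys/stat.h>

#ifdef _MSC_VER
/* MSVC spells the POSIX names with a leading underscore. */
#define fstat(fd, buf) _fstat((fd), (buf))
typedef struct _stat stat_struct;
#else
typedef struct stat stat_struct;
#endif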
Then you can use fstat and pass it a pointer to a stat_struct as the second argument.
I have a decently sized cross platform open source application that uses this technique.
My plan is to save the last save time in a file and check the real situation of a folder against the data in the file to determine which files need to be backed up or updated.
Ok.
I was hoping for a cross-platform solution that can be solved with pure C.
If by "pure C" you mean relying on only language features and library functions defined by the C language specification, then I'm afraid you're out of luck. Pure C (in that sense) has no concept of persistent file metadata such as modification timestamps. All functions and data structures dealing with such things are extensions or third-party libraries.
You can rely on standard POSIX facilities (such as fstat()) for both Linux and Mac, but Windows does not provide that. At least, Windows does not provide it exactly. The Microsoft C library does provide some POSIX compatibility functions, but it somewhat maddeningly uses modified names for them. In particular, it offers several flavors of _fstat() (note leading underscore). With a little bit of macro glue, it should not be too hard to make your program use POSIX fstat() on Linux and Mac, and use one of the _fstat() flavors on Windows.
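The macro glue mentioned in the answers could be applied to the backup use case roughly as follows. This is a minimal sketch, not a tested backup routine: it uses the path-based stat() and _stat() siblings of fstat() so that no file descriptor is needed, and the names portable_stat and modified_since are invented for the example.

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <time.h>

#ifdef _MSC_VER
/* MSVC offers the POSIX-style call under an underscore-prefixed name. */
typedef struct _stat stat_struct;
#define portable_stat(path, buf) _stat((path), (buf))
#else
typedef struct stat stat_struct;
#define portable_stat(path, buf) stat((path), (buf))
#endif

/* Hypothetical helper: returns 1 if the file's modification time is later
 * than last_backup, 0 if it is not, and -1 if the file cannot be examined. */
int modified_since(const char *path, time_t last_backup)
{
    stat_struct st;

    if (portable_stat(path, &st) != 0)
        return -1;
    return st.st_mtime > last_backup ? 1 : 0;
}
```

Comparing timestamps at one-second resolution can miss an edit made within the same second as the last backup, which fits the question's tolerance for occasional false flags only if the comparison is made conservative (for example, using >= instead of >).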

unistd_64.h in Ubuntu

As I understand it (and my understanding is limited), unistd_64.h contains the system call numbers. When I search for the file from the terminal, it shows more than one result, in different directories, as below:
/usr/include/x86_64-linux-gnu/asm/unistd_64.h
/usr/src/linux-headers-3.5.0-23/arch/sh/include/asm/unistd_64.h
/usr/src/linux-headers-3.5.0-23-generic/arch/x86/include/generated/asm/.unistd_64.h.cmd
/usr/src/linux-headers-3.5.0-23-generic/arch/x86/include/generated/asm/unistd_64.h
I don't understand the difference between these files or what each one is used for. Also, file number 3 ends in .cmd; what does that mean?
If you are writing an ordinary C program that needs to know system call numbers, you should not use any of those headers. Instead, you should use <sys/syscall.h>. Your C program does not need to know the full pathname of this header; #include <sys/syscall.h> is all that is necessary. However, if you want to read it, it will be found somewhere in /usr/include, probably either /usr/include/sys/syscall.h or /usr/include/x86_64-linux-gnu/sys/syscall.h.
Now, I will explain the files you found:
/usr/include/x86_64-linux-gnu/asm/unistd_64.h: This is a header file that may be used internally by sys/syscall.h. You can read it, but do not include it directly in your program. It probably defines a whole bunch of names that begin with __NR_. Those names should never be used in an ordinary, "userspace" program: always use the names beginning with SYS_ instead.
/usr/src/linux-headers-3.5.0-23/arch/sh/include/asm/unistd_64.h and /usr/src/linux-headers-3.5.0-23-generic/arch/x86/include/generated/asm/unistd_64.h: These are private kernel headers. They exist for the sake of people trying to build kernel modules that are developed separately from the kernel proper. It's possible that one of them is textually the same as /usr/include/x86_64-linux-gnu/asm/unistd_64.h, but that is not something you should rely on.
/usr/src/linux-headers-3.5.0-23-generic/arch/x86/include/generated/asm/.unistd_64.h.cmd: This is not a header file at all; it is used by the Linux kernel's build system.
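To make the advice about <sys/syscall.h> and the SYS_ names concrete, here is a small sketch that invokes getpid through the generic syscall() wrapper. The function name raw_getpid is invented for the example, and it assumes Linux with glibc, where syscall() is declared in <unistd.h>.

```c
#define _GNU_SOURCE          /* for the syscall() declaration in glibc */
#include <sys/syscall.h>     /* SYS_* system call numbers */
#include <unistd.h>          /* syscall() */

/* Hypothetical helper: calls getpid by number instead of through the
 * usual libc wrapper, using the SYS_ name rather than an __NR_ name. */
long raw_getpid(void)
{
    return syscall(SYS_getpid);
}
```

The result should match what getpid() returns; the point is only that the program never names any of the asm/unistd_64.h files directly.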
The first file, which resides in /usr/include (the system include directory) is the one you would include.
The others reside in /usr/src, which is a source code directory that should not be referenced.

pstatus_t not found in procfs.h (Linux)

I am reading the /proc/PID/status file using my C program and I want to use the pstatus_t struct to directly read the values from the file into this struct. However, my compiler reports that this struct is not present in procfs.h.
I have checked a few examples on the Internet that use the same header file, but in my case it is not working.
When you say "reading /proc/PID/status", I'm assuming that you are running in userspace (as opposed to in the kernel). In this case, the pstatus_t structure is worthless to you. Most files under /proc, including status, are a text-formatted representation of the kernel data structures. There is no way to directly get the binary contents of a kernel pstatus_t structure.
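Since the status file is text, one workable approach is to parse it line by line. The sketch below looks up a single "Key:<whitespace>value" line; the function name read_status_field and its return convention are assumptions made for this example, not part of any system header.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copies the value of the line starting with "key:"
 * from a /proc/PID/status style file into out. Returns 0 on success and
 * -1 if the file cannot be opened or the key is not present. */
int read_status_field(const char *path, const char *key,
                      char *out, size_t outlen)
{
    char line[512];
    size_t keylen = strlen(key);
    int found = -1;
    FILE *fp;

    if (outlen == 0)
        return -1;
    fp = fopen(path, "r");
    if (fp == NULL)
        return -1;

    while (fgets(line, sizeof line, fp) != NULL) {
        if (strncmp(line, key, keylen) == 0 && line[keylen] == ':') {
            const char *val = line + keylen + 1;
            size_t n;

            val += strspn(val, " \t");      /* skip padding after the colon */
            n = strcspn(val, "\n");         /* drop the trailing newline */
            if (n >= outlen)
                n = outlen - 1;             /* truncate to fit the buffer */
            memcpy(out, val, n);
            out[n] = '\0';
            found = 0;
            break;
        }
    }
    fclose(fp);
    return found;
}
```

For example, read_status_field("/proc/self/status", "Pid", buf, sizeof buf) should leave the calling process's PID in buf as a decimal string.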

Moving a file on Linux in C

Platform: Debian Wheezy 3.2.0-4-686-pae
Compiler: GCC (Debian 4.7.2-5) 4.7.2 (Code::Blocks)
I want to move a file from one location to another. Nothing complex like moving to different drives or to different file systems. I know the "standard" way to do this would be simply copying the file and then removing the original. But I want some way of preserving the file's ownership, mode, last access/modification, etc. I am assuming that I will have to copy the file and then edit the new file's ownership, mode, etc. afterwards, but I have no idea how to do this.
The usual way to move a file in C is to use rename(2), which can sometimes fail.
If you cannot use the rename(2) syscall (e.g. because source and target are on different filesystems), you have to query the size, permissions, and other metadata of the source file with stat(2); copy the data with open(2), a loop of read(2) and write(2) (using a buffer of several kilobytes), and close(2); and copy the metadata using chmod(2), chown(2), and utime(2). You might also care about copying extended attributes using getxattr(2), setxattr(2), listxattr(2). You could also in some cases use sendfile(2), as commented by David C. Rankin.
If the source and target are on different filesystems, there is no way to make the move atomic and avoid race conditions, so using rename(2) is preferable when possible, because it is atomic according to its man page. The source file can always be modified by another process during the move.
So a practical way to move files is to first try doing a rename(2), and if that fails with EXDEV (when oldpath and newpath are not on the same mounted filesystem), then you need to copy bytes and metadata. Several libraries provide functions doing that, e.g. Qt QFile::rename.
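The rename-then-fall-back strategy described above might look like the following. This is a sketch under several assumptions rather than a robust mv replacement: the names move_file and copy_with_metadata are invented here, it uses the descriptor-based fchown(2), fchmod(2), and futimens(2) instead of the path-based calls named above, it refuses to overwrite an existing target in the fallback path, it does not retry on EINTR, and it ignores extended attributes.

```c
#define _POSIX_C_SOURCE 200809L  /* futimens(), st_atim, st_mtim */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>      /* rename() */
#include <sys/stat.h>
#include <unistd.h>

/* Copy src to a new file dst, then carry over owner, mode, and timestamps. */
static int copy_with_metadata(const char *src, const char *dst)
{
    char buf[65536];
    struct stat st;
    struct timespec times[2];
    ssize_t n;
    int in, out, ok = 0;

    in = open(src, O_RDONLY);
    if (in < 0)
        return -1;
    if (fstat(in, &st) != 0) {
        close(in);
        return -1;
    }
    /* O_EXCL: fail rather than clobber an existing target (a design choice). */
    out = open(dst, O_WRONLY | O_CREAT | O_EXCL, 0600);
    if (out < 0) {
        close(in);
        return -1;
    }

    while (ok == 0 && (n = read(in, buf, sizeof buf)) > 0) {
        char *p = buf;
        while (n > 0) {                     /* handle short writes */
            ssize_t w = write(out, p, (size_t)n);
            if (w <= 0) {
                ok = -1;
                break;
            }
            p += w;
            n -= w;
        }
    }
    if (ok == 0 && n < 0)
        ok = -1;                            /* read error */

    if (ok == 0) {
        times[0] = st.st_atim;
        times[1] = st.st_mtim;
        /* Changing ownership usually needs privileges; best effort only. */
        if (fchown(out, st.st_uid, st.st_gid) != 0) {
            /* ignored on purpose */
        }
        if (fchmod(out, st.st_mode & 07777) != 0 || futimens(out, times) != 0)
            ok = -1;
    }
    close(in);
    if (close(out) != 0)
        ok = -1;
    if (ok != 0)
        unlink(dst);                        /* do not leave a partial copy */
    return ok;
}

/* Try the atomic rename first; copy and delete only on EXDEV. */
int move_file(const char *src, const char *dst)
{
    if (rename(src, dst) == 0)
        return 0;
    if (errno != EXDEV)
        return -1;
    if (copy_with_metadata(src, dst) != 0)
        return -1;
    return unlink(src);
}
```

Ownership is set before the mode because changing the owner can clear set-user-ID and set-group-ID bits. The fallback is not atomic, so the caveats about concurrent modification still apply.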
Read Advanced Linux Programming - and see syscalls(2) - for more (and also try to strace some mv command to understand what it is doing). That book is freely and legally downloadable (so you could find several copies on the Web).
The /bin/mv command (see mv(1)) is part of GNU coreutils which is free software. You could either study its source code, or use strace(1) to understand what that command does (in terms of syscalls(2)). In some open source Unix shells like sash or busybox, mv might be a shell builtin. See also path_resolution(7) and glob(7).
There are subtle corner cases (imagine another process or pthread doing some file operations on the same filesystem, directory, or files). Read some operating system textbook for more.
Using a mix of snprintf(3), system(3), mv(1) could be tricky if the file name contains weird characters such as tabs or newlines, or starts with an initial -. See errno(3).
If the original and new location for the file are on the same filesystem then a "move" is conceptually identical to a "rename."
#include <stdio.h>
int rename(const char *oldname, const char *newname);

Finding file type in Linux programmatically

I am trying to find the file type of a file (.pdf, .doc, .docx, etc.) programmatically, without using a shell command. I have to make an application which blocks access to files of a particular extension. I have already hooked sys_call_table in an LKM, and now I want my LKM to check the file type whenever an open/read system call is triggered.
I know that we have a current pointer which gives access to the current process structure, and that we can use it to find the file name stored in the dentry structure. I also know that in Linux a file type is identified by a magic number stored in the starting bytes of the file. But I don't know how to find the file type or exactly where it is stored.
Linux doesn't "store" the file type for its files (unlike Mac OS' resource fork, which I think is the most well-known platform to do this). Files are just named streams of bytes, they have no structure implied by the operating system.
Either you just tell programs which file to use (and then it Does What You Say), or programs use higher-level features to figure it out.
There are programs that re-invent this particular wheel (I'm responsible for one of those), but you can also use e.g. file(1). Of course that requires your program to parse and "understand" the textual output you'll get, which in a sense only moves the problem.
However, I don't think calling into file from kernel space is very wise, so it's probably best to re-create the test for whatever set of types you need, to keep it small.
In other words, I mean you should simply re-implement the required tests. This is quite complicated in general, so if you really need to do it for as large a set of types as possible, it might not be a very good idea. :/
Actually i have to make an application which blocks access to files of a particular extension.
That's a flawed requirement. If you check by file extension, you'll miss files that don't use the extension, which is quite common on Linux since it does not rely on file extensions.
The officially sanctioned way of detecting file type in Linux is by the file's magic number. The shell command file is basically just a wrapper for libmagic, so you have the option of linking to that library.
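For a small, fixed set of types, re-implementing the magic-number test by hand can be sketched as below. This is plain userspace C for illustration only, not kernel code, and the enum and function names are invented. It relies on three well-known signatures: PDF files begin with "%PDF-", ZIP containers (which is what .docx files are) begin with "PK" followed by bytes 0x03 0x04, and legacy .doc files use the OLE2 compound file header.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical result codes for this example. */
enum file_kind { KIND_UNKNOWN, KIND_PDF, KIND_ZIP, KIND_OLE2 };

/* Classify a file by the first bytes of its content. buf holds the start
 * of the file and len says how many bytes of it are valid. */
enum file_kind classify_magic(const unsigned char *buf, size_t len)
{
    static const unsigned char ole2[] = {
        0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1
    };

    if (len >= 5 && memcmp(buf, "%PDF-", 5) == 0)
        return KIND_PDF;
    if (len >= 4 && memcmp(buf, "PK\x03\x04", 4) == 0)
        return KIND_ZIP;    /* .docx, but also any other ZIP archive */
    if (len >= sizeof ole2 && memcmp(buf, ole2, sizeof ole2) == 0)
        return KIND_OLE2;   /* .doc, but also .xls, .ppt, and others */
    return KIND_UNKNOWN;
}
```

Note that the ZIP and OLE2 signatures identify the container rather than the document type, so telling a .docx from an arbitrary ZIP archive would need a deeper look inside the file, which is part of why the general problem is best left to libmagic.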

Resources